Age | Commit message | Author |
|
the POSIX standard explicitly requires in section 6.2 that "the POSIX
locale shall contain 256 single-byte characters", see:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap06.html#tag_06_02
So the current behaviour of treating non-ASCII bytes in an LC_CTYPE=POSIX
input stream as if they were characters is not a POSIX violation, but
actually required by the standard - and not just for awk(1), but for
utility programs in general and even for library functions in general.
Consequently, delete the wrong sentence I added to the STANDARDS section
last year.
Thanks to millert@ and jmc@ for making me realize my mistake.
OK millert@ jmc@
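
As a rough illustration of the requirement quoted above, the following C sketch switches LC_CTYPE to the POSIX locale and checks that the longest character is one byte. The function name posix_locale_is_single_byte is made up for this example and is not part of awk.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Switch LC_CTYPE to the "POSIX" locale and report whether every
 * character in it occupies a single byte.
 * Hypothetical helper for illustration only.
 */
int
posix_locale_is_single_byte(void)
{
	if (setlocale(LC_CTYPE, "POSIX") == NULL)
		return 0;
	/* MB_CUR_MAX is the maximum number of bytes per character. */
	return MB_CUR_MAX == 1;
}
```

In such a locale each input byte, including bytes above 0x7f, is one character, which matches the behaviour the commit message describes.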
|
|
* Fixed readcsvrec resize segfault when reading CSV records longer than 8k.
* Rewrote if-else chain in quoted as a switch.
|
|
|
|
For u8_isutf() the conditionals already ensure that a NUL byte won't
match. For substr() we can use the byte offset of 'm' to avoid
re-scanning the initial part of the string. From Jonas Bechtel.
|
|
From jsg@ via upstream.
|
|
|
|
ok tb@ deraadt@
|
|
Fixes a use-after-free bug with ARGV for "delete ARGV".
|
|
* Fixed regex engine gototab reallocation issue that was introduced
during the Nov 24 rewrite.
* Fixed use-after-free bug in fnematch due to adjbuf invalidating
the pointers to buf.
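
The second item is an instance of a general hazard: growing a buffer with realloc() may move it, so pointers into the old storage become dangling. A minimal sketch of one common way to avoid this is to track positions as offsets. The struct growbuf type and both functions are hypothetical and do not reflect the actual adjbuf() interface.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer; not awk's actual adjbuf() interface. */
struct growbuf {
	char *data;
	size_t size;
};

/* Grow b to hold at least need bytes. Returns 0 on failure. */
int
growbuf_reserve(struct growbuf *b, size_t need)
{
	char *p;
	size_t nsize;

	if (need <= b->size)
		return 1;
	nsize = b->size ? b->size : 16;
	while (nsize < need) {
		if (nsize > SIZE_MAX / 2)
			return 0;	/* doubling would overflow */
		nsize *= 2;
	}
	if ((p = realloc(b->data, nsize)) == NULL)
		return 0;
	b->data = p;		/* the storage may have moved */
	b->size = nsize;
	return 1;
}

/*
 * Append one byte at offset *pos. The position is kept as an offset,
 * not a pointer, so it stays valid if realloc() moves the storage.
 */
int
growbuf_putc(struct growbuf *b, size_t *pos, int c)
{
	if (!growbuf_reserve(b, *pos + 1))
		return 0;
	b->data[(*pos)++] = (char)c;
	return 1;
}
```

Code that must keep a raw pointer across a resize can instead save the offset before growing and recompute the pointer from the new base afterwards.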
|
|
|
|
|
|
|
|
This includes a rewrite of the fnematch() function as well as a
refactoring of the sub and gsub implementation.
|
|
From Arnold Robbins. https://github.com/onetrueawk/awk/pull/213
|
|
fnematch() expects to store a NUL byte when EOF is encountered.
However, the rewrite broke this assumption because r.len from getrune()
is zero on EOF. This results in j becoming negative on EOF, causing an
out-of-bounds access. It is simplest to just force r.len to 1 on EOF
to copy a single NUL byte--the rune is initialized to zero even for EOF.
This also fixes the call to adjbuf(). We cannot use 'k' to determine
when we need to expand the buffer now that we are potentially reading
more than a single byte at a time.
https://github.com/onetrueawk/awk/pull/211
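
A simplified sketch of the idea described above, not the actual fnematch() code: the struct rune_in type and store_rune() are hypothetical stand-ins for the result of getrune() and the copy step. On EOF the length is forced to 1 so that a single NUL byte is stored, and the space check uses the byte length rather than a count of characters read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the result of getrune(). */
struct rune_in {
	char bytes[4];	/* raw bytes of the character */
	int len;	/* number of bytes, 0 on end of input */
};

/*
 * Copy the bytes of r into buf at offset *j. On EOF (r.len == 0) a
 * single NUL byte is stored instead, so *j always advances by at
 * least one. Returns 0 if buf has no room for the bytes.
 */
int
store_rune(char *buf, size_t bufsize, size_t *j, struct rune_in r)
{
	if (r.len == 0) {
		r.bytes[0] = '\0';
		r.len = 1;
	}
	/* Check space using the byte length, not a character count. */
	if (*j + (size_t)r.len > bufsize)
		return 0;
	memcpy(buf + *j, r.bytes, (size_t)r.len);
	*j += (size_t)r.len;
	return 1;
}
```

A real caller would grow the buffer instead of failing when the check does not pass.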
|
|
This is really just a version number bump as we already have the
fixes committed.
|
|
|
|
|
|
From upstream.
|
|
We need to use u8_strlen(), not strlen(), to compute the length.
Otherwise, there may be an out of bounds write when writing the NUL
terminator to set the length of the substring.
https://github.com/onetrueawk/awk/pull/205
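
To show why the two lengths differ, here is a minimal sketch of a UTF-8 character counter. utf8_char_count() is a hypothetical helper that assumes well-formed input; it is not awk's u8_strlen().

```c
#include <stddef.h>

/*
 * Count characters in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (bit pattern 10xxxxxx). Assumes the input is
 * well-formed UTF-8. Hypothetical helper, not awk's u8_strlen().
 */
size_t
utf8_char_count(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For a string containing multi-byte characters, strlen() returns a larger value than this count, so mixing the two units when computing an index can point past the intended position.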
|
|
We cannot use set_gototab() to reset all the entries for a state,
it will leave existing entries as-is. Add a new reset_gototab()
function that zeroes the table entries for the specified state.
There is no need to reset the goto table immediately after
resize_state(), it is already initialized via calloc().
Fixes https://github.com/onetrueawk/awk/issues/199
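
The difference between setting one entry and resetting a whole state can be sketched with a dense table. The layout and the names table_alloc, table_set and table_reset are assumptions made for this example; awk's real goto table is organized differently.

```c
#include <stdlib.h>
#include <string.h>

#define NCHARS 256	/* assumed alphabet size for this sketch */

/* Allocate a table for nstates states; calloc() zero-fills it. */
int *
table_alloc(size_t nstates)
{
	return calloc(nstates, NCHARS * sizeof(int));
}

/* Set a single transition; other entries of the state are untouched. */
void
table_set(int *tab, size_t state, unsigned char ch, int next)
{
	tab[state * NCHARS + ch] = next;
}

/* Clear every transition of one state. */
void
table_reset(int *tab, size_t state)
{
	memset(&tab[state * NCHARS], 0, NCHARS * sizeof(int));
}
```

Because the allocation is already zeroed, calling table_reset() right after table_alloc() would be redundant, which mirrors the remark about resize_state() above.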
|
|
fnematch and getrune have been overhauled to solve issues around
Unicode FS and RS. Also fixed gsub null match issue with Unicode.
Big thanks to Arnold Robbins.
|
|
|
|
We must store a UTF-32 empty string, not UTF-8 empty string, for
an empty CCL. Found running the awk test suite with address sanitizer.
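
A minimal sketch of the distinction, assuming code points are stored as int (as a later entry in this log states): an empty UTF-8 string occupies one byte, while an empty string of int code points needs a full int for its terminator. empty_u32_string() is a hypothetical helper, not a function from awk.

```c
#include <stdlib.h>

/*
 * Allocate an empty string of int-sized code points: a single
 * zero-valued int serves as the terminator. Reading a one-byte ""
 * through an int pointer would access sizeof(int) bytes instead.
 * Hypothetical helper for illustration only.
 */
int *
empty_u32_string(void)
{
	return calloc(1, sizeof(int));
}
```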
|
|
Based on a diff from millert@ with additions by me.
Feedback and OK millert@.
|
|
Upstream awk has supported --version for a long time but does not
support -V like our awk does. Both options are supported by gawk.
|
|
|
|
|
|
Now that awk stores chars as int we need to cast the Node * to int *.
|
|
This makes it possible to get the old awk behavior (where chars are
bytes) by setting LC_CTYPE to C or POSIX. OK schwarze@
|
|
reformat a lengthy line in awk.1;
ok millert
|
|
in scope. But a third similar situation also exists in the same scope,
which does not create a new variable and uses the upper scope variable.
Pretty sloppy stuff.
ok millert
|
|
This corresponds to the 2nd edition of "The AWK Programming Language"
and adds support for UTF-8 and comma-separated value inputs.
|
|
it will be published in 2023 with a copyright date of 2024
ok jmc@ millert@
|
|
|
|
Force hex escapes in strings to be no more than two characters, as
they already are in regular expressions. This brings internal
consistency, as well as consistency with gawk.
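
A sketch of what limiting a hex escape to two digits could look like. parse_hex_escape() is a hypothetical function written for this example and is not taken from the awk sources.

```c
#include <ctype.h>

/*
 * Parse the hex digits that follow "\x". At most two digits are
 * consumed; *used receives the number of digits actually read.
 * Hypothetical helper for illustration only.
 */
int
parse_hex_escape(const char *s, int *used)
{
	int i, n = 0;

	for (i = 0; i < 2 && isxdigit((unsigned char)s[i]); i++) {
		int c = tolower((unsigned char)s[i]);

		n = n * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
	}
	*used = i;
	return n;
}
```

With this rule the digits "41BC" after "\x" yield the value 0x41 and leave "BC" as ordinary text.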
|
|
Fix undefined behavior and a use-after-free in cat().
|
|
Various leaks and use-after-free issues plugged/fixed.
|
|
Allocated string erroneously marked DONTFREE. From Miguel Pineiro Jr.
|
|
entry in ARGV (since circa 1989). From Miguel Pineiro Jr.
|
|
since the files array was first initialized with stdin, stdout, and
stderr (circa 1992). From Miguel Pineiro Jr.
|
|
Fixes error handling in closefile() and closeall(). Long standing
warnings had been made fatal and some fatal errors went undetected.
|
|
We already had the fix so no actual code changes.
|
|
|
|
Fixes a decision bug with trailing stuff in lib.c:is_valid_number.
All other fixes were already present.
|
|
getrec() returns 0 on EOF and leaves the contents of buf unchanged.
From https://github.com/onetrueawk/awk/pull/134
|
|
For a null string, this was not the case. Since awk(1) documents
that a null string for FS has a specific behavior, make -F '' behave
consistently with -v FS="". https://github.com/onetrueawk/awk/pull/128
|
|
https://github.com/onetrueawk/awk/issues/121
|
|
It is not sufficient to check for the EOF flag on a stream.
From https://github.com/onetrueawk/awk/pull/117
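
The message does not say which case the fix covers. One general reason the EOF flag alone can be insufficient in C is that a read may also stop because of an I/O error, which sets a separate flag. A minimal sketch, with stream_status() as a hypothetical helper:

```c
#include <stdio.h>

/*
 * Classify the state of a stream after reading from it:
 *   0  neither flag is set, more data may be available
 *   1  end of file was reached
 *  -1  an I/O error occurred
 * Hypothetical helper for illustration only.
 */
int
stream_status(FILE *fp)
{
	if (ferror(fp))
		return -1;
	if (feof(fp))
		return 1;
	return 0;
}
```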
|
|
Without this fix, when reading the first record of an input file named
on the command line, the regular expression engine will be
misconfigured, precluding a successful match. From Miguel Pineiro Jr
|