Age | Commit message | Author |
|
|
|
|
|
|
|
This includes a rewrite of the fnematch() function as well as a
refactoring of the sub and gsub implementation.
|
|
From Arnold Robbins. https://github.com/onetrueawk/awk/pull/213
|
|
fnematch() expects to store a NUL byte when EOF is encountered.
However, the rewrite broke this assumption because r.len from getrune()
is zero on EOF. This results in j becoming negative on EOF, causing an
out-of-bounds access. It is simplest to just force r.len to 1 on EOF
to copy a single NUL byte--the rune is initialized to zero even for EOF.
This also fixes the call to adjbuf(). We cannot use 'k' to determine
when we need to expand the buffer now that we are potentially reading
more than a single byte at a time.
https://github.com/onetrueawk/awk/pull/211
|
|
This is really just a version number bump as we already have the
fixes committed.
|
|
|
|
|
|
From upstream.
|
|
We need to use u8_strlen(), not strlen(), to compute the length.
Otherwise, there may be an out-of-bounds write when writing the NUL
terminator to set the length of the substring.
https://github.com/onetrueawk/awk/pull/205
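
To illustrate why the two lengths differ, here is a minimal sketch, not the code from the pull request. The helper name count_utf8_chars() is hypothetical, and the sketch assumes the input is valid UTF-8. strlen() counts bytes, while a character-oriented length counts code points, so mixing the two gives the wrong offset for the NUL terminator.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not awk's u8_strlen()): count UTF-8 code points
 * by skipping continuation bytes, which all have the form 10xxxxxx.
 * Assumes s is valid, NUL-terminated UTF-8.
 */
static size_t count_utf8_chars(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;    /* lead byte or ASCII byte starts a new character */
    }
    return n;
}
```

For the string "h\xc3\xa9llo" (an "e" with an acute accent encoded as two bytes), strlen() reports 6 bytes while count_utf8_chars() reports 5 characters.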
|
|
We cannot use set_gototab() to reset all the entries for a state;
it will leave existing entries as-is. Add a new reset_gototab()
function that zeroes the table entries for the specified state.
There is no need to reset the goto table immediately after
resize_state() because it is already initialized via calloc().
Fixes https://github.com/onetrueawk/awk/issues/199
|
|
fnematch and getrune have been overhauled to solve issues around
Unicode FS and RS. Also fixed a gsub null match issue with Unicode.
Big thanks to Arnold Robbins.
|
|
|
|
We must store a UTF-32 empty string, not UTF-8 empty string, for
an empty CCL. Found running the awk test suite with address sanitizer.
|
|
Based on a diff from millert@ with additions by me.
Feedback and OK millert@.
|
|
Upstream awk has supported --version for a long time but does not
support -V like our awk does. Both options are supported by gawk.
|
|
|
|
|
|
Now that awk stores chars as int we need to cast the Node * to int *.
|
|
This makes it possible to get the old awk behavior (where chars are
bytes) by setting LC_CTYPE to C or POSIX. OK schwarze@
|
|
reformat a lengthy line in awk.1;
ok millert
|
|
in scope. but a 3rd similar situation in the same scope exists also,
which does not create a new variable, and uses the upper scope variable.
Pretty sloppy stuff.
ok millert
|
|
This corresponds to the 2nd edition of "The AWK Programming Language"
and adds support for UTF-8 and comma-separated value inputs.
|
|
it will be published in 2023 with a copyright date of 2024
ok jmc@ millert@
|
|
|
|
Force hex escapes in strings to be no more than two characters, as
they already are in regular expressions. This brings internal
consistency, as well as consistency with gawk.
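
The two-digit limit can be sketched as follows. This is an illustration only, not awk's lexer code; parse_hex_escape() is a hypothetical name, and the sketch assumes the caller has already consumed the leading "\x".

```c
#include <ctype.h>

/*
 * Hypothetical helper: s points just past "\x". Consume at most two
 * hex digits, store their value in *out, and return the number of
 * digits consumed (0, 1 or 2). Any further characters are left for
 * the caller to treat as ordinary text.
 */
static int parse_hex_escape(const char *s, int *out)
{
    int n = 0, val = 0;

    while (n < 2 && isxdigit((unsigned char)s[n])) {
        int c = tolower((unsigned char)s[n]);

        val = val * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        n++;
    }
    *out = val;
    return n;
}
```

Under this rule "\x41B" is the byte 0x41 followed by a literal "B", rather than a single escape built from three hex digits.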
|
|
Fix undefined behavior and a use-after-free in cat().
|
|
Various leaks and use-after-free issues plugged/fixed.
|
|
Allocated string erroneously marked DONTFREE. From Miguel Pineiro Jr.
|
|
entry in ARGV (since circa 1989). From Miguel Pineiro Jr.
|
|
since the files array was first initialized with stdin, stdout, and
stderr (circa 1992). From Miguel Pineiro Jr.
|
|
Fixes error handling in closefile() and closeall(). Long-standing
warnings had been made fatal and some fatal errors went undetected.
|
|
We already had the fix so no actual code changes.
|
|
|
|
Fixes a decision bug with trailing stuff in lib.c:is_valid_number.
All other fixes were already present.
|
|
getrec() returns 0 on EOF and leaves the contents of buf unchanged.
From https://github.com/onetrueawk/awk/pull/134
|
|
For a null string, this was not the case. Since awk(1) documents
that a null string for FS has a specific behavior, make -F '' behave
consistently with -v FS="". https://github.com/onetrueawk/awk/pull/128
|
|
https://github.com/onetrueawk/awk/issues/121
|
|
It is not sufficient to check for the EOF flag on a stream.
From https://github.com/onetrueawk/awk/pull/117
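
One general reading of this point, sketched under assumptions and not taken from the pull request: a failed read may leave the error indicator set without ever setting the EOF indicator, so a loop that only tests feof() may never terminate. The helper name read_line_checked() is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one line and tell a clean end of file
 * apart from a read error.
 * Returns 1 if a line was read, 0 at end of file, -1 on error.
 */
static int read_line_checked(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) != NULL)
        return 1;
    if (ferror(fp))
        return -1;  /* error indicator set; feof() alone would miss this */
    return 0;       /* no data and no error: end of file */
}
```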
|
|
Without this fix, when reading the first record of an input file named
on the command line, the regular expression engine will be
misconfigured, precluding a successful match. From Miguel Pineiro Jr.
|
|
|
|
and installing USD/SMM/PSD docs.
jmc@ agrees with the direction, ok millert@ on an earlier diff
|
|
This resulted in the NUL terminator being written to the end of the
buffer which was not the same as the end of the string. That in
turn caused garbage bytes from malloc() to be processed. Also
change the NUL termination to be less error prone by writing the
NUL immediately after the last byte copied. OK sthen@
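
The less error-prone termination style can be sketched like this. It is a generic illustration, not the code from the commit; copy_and_terminate() and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy at most dstsize - 1 bytes from src and
 * write the NUL immediately after the last byte copied, instead of
 * at the end of the buffer. Returns the number of bytes copied.
 */
static size_t copy_and_terminate(char *dst, size_t dstsize,
    const char *src, size_t srclen)
{
    size_t i;

    if (dstsize == 0)
        return 0;
    for (i = 0; i < srclen && i < dstsize - 1; i++)
        dst[i] = src[i];
    dst[i] = '\0';  /* end of the string, not dst[dstsize - 1] */
    return i;
}
```

Terminating at dst[dstsize - 1] instead would leave any uninitialized bytes between the copied data and the end of the buffer inside the string.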
|
|
|
|
Includes the official fix for +-inf and +-nan handling.
|
|
|
|
Prevents strings beginning with "inf" or "nan" from being interpreted
as infinity or not-a-number respectively while still parsing "inf"
and "nan" (with or without a leading sign) correctly.
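
A minimal sketch of the distinction described above, not the actual check in awk: accept only an exact "inf" or "nan" after an optional sign, and reject longer strings that merely begin with those letters. The name is_inf_or_nan() is hypothetical, and case-insensitive matching is an assumption of this sketch.

```c
#include <stdbool.h>
#include <strings.h>    /* strcasecmp() (POSIX) */

/*
 * Hypothetical helper: true only if s is exactly "inf" or "nan",
 * optionally preceded by one '+' or '-'. Case-insensitive matching
 * is an assumption made for this sketch.
 */
static bool is_inf_or_nan(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;
    return strcasecmp(s, "inf") == 0 || strcasecmp(s, "nan") == 0;
}
```

With this check "inf", "+inf" and "-nan" are recognized, while strings such as "info" or "nancy" are not.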
|
|
|
|
This is the only missing time function compared to those two
implementations. Doc changes OK jmc@
|