path: root/usr.bin/awk
Age | Commit message | Author
2024-08-11 | Even though US-ASCII (= ANSI X3.4-1986) only defines 128 characters, | Ingo Schwarze
the POSIX standard explicitly requires in section 6.2 that "the POSIX locale shall contain 256 single-byte characters", see:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap06.html#tag_06_02
So the current behaviour of treating non-ASCII bytes in an LC_CTYPE=POSIX input stream as if they were characters is not a POSIX violation, but actually required by the standard - and not just for awk(1), but for utility programs in general and even for library functions in general.
Consequently, delete the wrong sentence I added to the STANDARDS section last year.
Thanks to millert@ and jmc@ for making me realize my mistake.
OK millert@ jmc@
2024-08-03 | Update awk to the July 28, 2024 version. | Todd C. Miller
* Fixed readcsvrec resize segfault when reading csv records longer than 8k.
* Rewrite if-else chain in quoted as a switch.
2024-07-30 | bump posix spec from 2008 to 2024; ok millert | Jason McIntyre
2024-06-04 | Avoid unnecessary string traversals in u8_isutf() and substr(). | Todd C. Miller
For u8_isutf() the conditionals already ensure that a NUL byte won't match. For substr() we can use the byte offset of 'm' to avoid re-scanning the initial part of the string. From Jonas Bechtel.
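
The substr() change above can be pictured with a minimal sketch. This is not the code from the commit: `u8_advance` and `substr_bytes` are hypothetical names, and well-formed UTF-8 input is assumed. The point is only that the end of the substring is found by resuming at the byte offset already computed for character `m`, rather than scanning again from the start of the string.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of awk): starting at byte offset `off`
 * in `s`, step forward by up to `nchars` UTF-8 characters and return
 * the new byte offset. Stops early at the terminating NUL.
 * Assumes well-formed UTF-8.
 */
static size_t
u8_advance(const char *s, size_t off, size_t nchars)
{
	while (nchars > 0 && s[off] != '\0') {
		off++;
		/* skip the continuation bytes (10xxxxxx) of this character */
		while (((unsigned char)s[off] & 0xC0) == 0x80)
			off++;
		nchars--;
	}
	return off;
}

/*
 * Compute the byte range [*start, *end) of the substring that begins
 * at 1-based character position m and is n characters long.
 */
static void
substr_bytes(const char *s, size_t m, size_t n, size_t *start, size_t *end)
{
	*start = u8_advance(s, 0, m > 0 ? m - 1 : 0);
	/* resume at m's byte offset instead of re-scanning from s[0] */
	*end = u8_advance(s, *start, n);
}
```

For the six-byte string "h\xc3\xa9llo" with m = 2 and n = 3, the sketch yields the byte range 1 to 5.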
2024-06-03 | Spelling fixes and removal of unneeded prototypes and extern. | Todd C. Miller
From jsg@ via upstream.
2024-06-03 | Build with WARNINGS=Yes and fix resulting warnings. | Todd C. Miller
2024-05-05 | add upstream change to fix the build | Jonathan Gray
ok tb@ deraadt@
2024-05-04 | Update awk to the May 4, 2024 version. | Todd C. Miller
Fixes a use-after-free bug with ARGV for "delete ARGV".
2024-04-25 | Update awk to the Apr 22, 2024 version. | Todd C. Miller
* fixed regex engine gototab reallocation issue that was introduced during the Nov 24 rewrite.
* fixed use-after-free bug in fnematch due to adjbuf invalidating the pointers to buf.
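
The second item describes a common class of bug: a pointer into a buffer goes stale once the buffer is reallocated. The sketch below illustrates the usual remedy and is not awk's code. `grow` is a hypothetical stand-in for a buffer-growing helper such as adjbuf, and `append_byte` is likewise made up for the example. The cursor is saved as an offset before growing and rebuilt from the new base afterwards.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a buffer-growing helper: make *buf at
 * least `need` bytes long. Returns 1 on success, 0 on failure.
 * realloc() may move the block, so old pointers into it become invalid.
 */
static int
grow(char **buf, size_t *size, size_t need)
{
	char *nb;

	if (need <= *size)
		return 1;
	nb = realloc(*buf, need);
	if (nb == NULL)
		return 0;
	*buf = nb;
	*size = need;
	return 1;
}

/*
 * Append byte c at the cursor *pos (a pointer into *buf) and keep the
 * buffer NUL-terminated. The cursor is recomputed from its offset
 * whenever the buffer had to grow.
 */
static int
append_byte(char **buf, size_t *size, char **pos, char c)
{
	size_t off = (size_t)(*pos - *buf);	/* position as an offset */

	if (off + 1 >= *size) {
		if (!grow(buf, size, *size * 2 + 2))
			return 0;
		*pos = *buf + off;	/* old pointer may dangle; rebuild it */
	}
	**pos = c;
	(*pos)++;
	**pos = '\0';
	return 1;
}
```

Without the `*pos = *buf + off` line, the write through `*pos` after a successful grow would be a use-after-free whenever realloc() moved the block.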
2024-01-25 | Update awk to the Jan 22, 2024 version. | Todd C. Miller
2023-11-28 | Update awk to the Nov 27, 2023 version. | Todd C. Miller
2023-11-25 | Update awk to the Nov 24, 2023 version. | Todd C. Miller
2023-11-22 | Update awk to the Nov 20, 2023 version. | Todd C. Miller
This includes a rewrite of the fnematch() function as well as a refactoring of the sub and gsub implementation.
2023-11-15 | fnematch: fix a bug that could result in extra chars being pushed back. | Todd C. Miller
From Arnold Robbins. https://github.com/onetrueawk/awk/pull/213
2023-11-15 | fnematch: fix out-of-bounds access on EOF | Todd C. Miller
fnematch() expects to store a NUL byte when EOF is encountered. However, the rewrite broke this assumption because r.len from getrune() is zero on EOF. This results in j becoming negative on EOF, causing an out-of-bounds access. It is simplest to just force r.len to 1 on EOF to copy a single NUL byte--the rune is initialized to zero even for EOF. This also fixes the call to adjbuf(). We cannot use 'k' to determine when we need to expand the buffer now that we are potentially reading more than a single byte at a time. https://github.com/onetrueawk/awk/pull/211
2023-10-31 | Update awk to Oct 30, 2023 version. | Todd C. Miller
This is really just a version number bump as we already have the fixes committed.
2023-10-30 | This is the OpenBSD version of Awk. | Todd C. Miller
2023-10-30 | Minor cosmetic changes to make our awk match my github branch. | Todd C. Miller
2023-10-30 | Include strings.h for the strncasecmp() prototype. | Todd C. Miller
From upstream.
2023-10-28 | substr: fix buffer overflow with utf-8 strings | Todd C. Miller
We need to use u8_strlen(), not strlen(), to compute the length. Otherwise, there may be an out of bounds write when writing the NUL terminator to set the length of the substring. https://github.com/onetrueawk/awk/pull/205
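
To see why the two length functions disagree, here is a minimal sketch. `count_u8_chars` is a hypothetical counter written for this note, not awk's own u8_strlen(), and it assumes well-formed UTF-8. It counts characters by skipping continuation bytes, whereas strlen() counts bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical character counter (not awk's u8_strlen): every byte
 * that is not a UTF-8 continuation byte (10xxxxxx) starts a new
 * character. Assumes well-formed input.
 */
static size_t
count_u8_chars(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For "na\xc3\xafve" (the word "naïve"), strlen() reports 6 bytes while the sketch reports 5 characters. Code that mixes the two units when computing where a substring ends can therefore land past the intended position.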
2023-10-06 | Correctly reset the goto table for a state. | Todd C. Miller
We cannot use set_gototab() to reset all the entries for a state, it will leave existing entries as-is. Add a new reset_gototab() function that zeroes the table entries for the specified state. There is no need to reset the goto table immediately after resize_state(), it is already initialized via calloc(). Fixes https://github.com/onetrueawk/awk/issues/199
2023-10-06 | Update awk to Sep 24, 2023 version. | Todd C. Miller
fnematch and getrune have been overhauled to solve issues around unicode FS and RS. also fixed gsub null match issue with unicode. big thanks to Arnold Robbins.
2023-09-21 | --csv is an extension; ok millert | Jason McIntyre
2023-09-21 | Fix a potential out-of-bounds read caused by the big-endian fix. | Todd C. Miller
We must store a UTF-32 empty string, not UTF-8 empty string, for an empty CCL. Found running the awk test suite with address sanitizer.
2023-09-21 | Document LC_CTYPE. | Ingo Schwarze
Based on a diff from millert@ with additions by me. Feedback and OK millert@.
2023-09-20 | Support --version option like upstream awk but don't document it. | Todd C. Miller
Upstream awk has supported --version for a long time but does not support -V like our awk does. Both options are supported by gawk.
2023-09-20 | Use awk_mb_cur_max in nawk_convert() instead of MB_CUR_MAX. | Todd C. Miller
2023-09-19 | Compare int value against 0, not '\0', for consistency. | Todd C. Miller
2023-09-18 | Fix a bad cast to char * that causes incorrect results on big endian. | Todd C. Miller
Now that awk stores chars as int we need to cast the Node * to int *.
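
The endianness problem can be reproduced in isolation. The sketch below is a generic illustration, not the code touched by the commit, and all three function names are hypothetical. It contrasts reading an `int` object through a byte pointer with reading it as an `int`.

```c
/*
 * Read only the first byte of an int object, which is what a
 * dereference through a char * cast does.
 */
static int
first_byte_of(const int *p)
{
	return *(const unsigned char *)p;
}

/* Read the whole int, which is what a dereference through int * does. */
static int
whole_int(const int *p)
{
	return *p;
}

/* Report whether the host stores the low-order byte first. */
static int
is_little_endian(void)
{
	int one = 1;

	return *(const unsigned char *)&one == 1;
}
```

For a small value such as 'a', the first byte happens to equal the value on a little-endian host, so the wrong cast appears to work there. On a big-endian host with an int wider than one byte, the first byte is the high-order byte and reads as 0.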
2023-09-18 | Disable utf-8 for non-multibyte locales, such as C or POSIX. | Todd C. Miller
This makes it possible to get the old awk behavior (where chars are bytes) by setting LC_CTYPE to C or POSIX. OK schwarze@
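
One standard way for a C program to tell whether the current LC_CTYPE locale is multibyte is to inspect MB_CUR_MAX after calling setlocale(). The sketch below shows that check in isolation. `locale_is_multibyte` is a hypothetical name, and this is an assumption about how such a test can be written, not a copy of awk's logic.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: select the LC_CTYPE locale `name` and report
 * whether it is multibyte. Returns 1 for multibyte, 0 for single-byte,
 * and -1 if the locale cannot be selected.
 */
static int
locale_is_multibyte(const char *name)
{
	if (setlocale(LC_CTYPE, name) == NULL)
		return -1;
	/* MB_CUR_MAX is the maximum bytes per character in this locale */
	return MB_CUR_MAX > 1;
}
```

In the "C" locale this returns 0, which is the case where the commit falls back to treating chars as bytes. From the shell, the same behavior is selected by running awk with LC_CTYPE set to C or POSIX.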
2023-09-18 | add --csv to usage(), and reformat it to match manual; while here, | Jason McIntyre
reformat a lengthy line in awk.1; ok millert
2023-09-18 | 2 cases of c99 for-scope variable decl, when a variable already exists | Theo de Raadt
in scope. but a 3rd similar situation in the same scope exists also, which does not create a new variable, and uses the upper scope variable. Pretty sloppy stuff. ok millert
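
The distinction drawn in this message is plain C99 scoping. The sketch below is a generic illustration with made-up function names, not the awk code in question. A `for` loop that declares its variable creates a new object that shadows the outer one. A loop that only assigns reuses the outer variable.

```c
/*
 * The loop declares its own i, so the outer i is shadowed inside the
 * loop and keeps its original value afterwards.
 */
static int
outer_after_shadowing_loop(void)
{
	int i = -1;

	for (int i = 0; i < 3; i++)
		continue;
	return i;	/* the outer i, untouched by the loop */
}

/*
 * The loop has no declaration, so it modifies the outer i directly.
 */
static int
outer_after_shared_loop(void)
{
	int i = -1;

	for (i = 0; i < 3; i++)
		continue;
	return i;	/* the loop's final value */
}
```

The first function returns -1 and the second returns 3. Mixing both forms in one scope, as the message describes, makes it easy to misread which `i` a later statement refers to.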
2023-09-17 | Update to the One True Awk, 2nd edition (Sep 12, 2023). | Todd C. Miller
This corresponds to the 2nd edition of "The AWK Programming Language" and adds support for UTF-8 and comma-separated value inputs.
2023-09-15 | update awk book reference for the second edition | Jonathan Gray
it will be published in 2023 with a copyright date of 2024 ok jmc@ millert@
2023-09-10 | Update awk to Sep 6, 2023 version. | Todd C. Miller
2023-09-09 | Update awk to Dec 15, 2022 version. | Todd C. Miller
Force hex escapes in strings to be no more than two characters, as they already are in regular expressions. This brings internal consistency, as well as consistency with gawk.
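
A two-digit limit on hex escapes can be sketched as follows. `parse_hex_escape` is a hypothetical function written for this note and is not taken from awk's lexer. It consumes at most two hex digits after a `\x` and leaves any further characters for the caller to treat as ordinary text.

```c
#include <ctype.h>

/*
 * Hypothetical parser: `s` points just past "\x". Consume at most two
 * hex digits, return their value, and store in *endp a pointer to the
 * first unconsumed character. Returns 0 with *endp == s if no hex
 * digit is present.
 */
static int
parse_hex_escape(const char *s, const char **endp)
{
	int val = 0;
	int n;

	for (n = 0; n < 2 && isxdigit((unsigned char)s[n]); n++) {
		int c = tolower((unsigned char)s[n]);

		val = val * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
	}
	*endp = s + n;
	return val;
}
```

Under this rule, the digits "41B" following `\x` produce the value 0x41 and leave "B" as a literal character, instead of being read as one three-digit number.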
2022-09-21 | Update awk to Sep 12, 2022 version. | Todd C. Miller
Fix undefined behavior and a use-after-free in cat().
2022-09-01 | Update awk to Aug 30, 2022 version. | Todd C. Miller
Various leaks and use-after-free issues plugged/fixed.
2022-06-03 | Memory leak when assigning a string to some of the built-in variables. | Todd C. Miller
Allocated string erroneously marked DONTFREE. From Miguel Pineiro Jr.
2022-06-03 | The fulfillment of an assignment operand had been truncating its | Todd C. Miller
entry in ARGV (since circa 1989). From Miguel Pineiro Jr.
2022-06-03 | Fix a file management memory leak that appears to have been there | Todd C. Miller
since the files array was first initialized with stdin, stdout, and stderr (circa 1992). From Miguel Pineiro Jr.
2022-01-27 | Update awk to Dec 8, 2021 version. | Todd C. Miller
Fixes error handling in closefile() and closeall(). Long standing warnings had been made fatal and some fatal errors went undetected.
2021-11-12 | Update awk to Nov 03, 2021 version. | Todd C. Miller
We already had the fix so no actual code changes.
2021-11-08 | missing full stop; | Jason McIntyre
2021-11-02 | Update awk to October 12, 2021 version. | Todd C. Miller
Fixes a decision bug with trailing stuff in lib.c:is_valid_number. All other fixes were already present.
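
Deciding whether a string "looks numeric" typically hinges on what follows the converted prefix. The sketch below shows one common strtod()-based approach and is not the actual lib.c:is_valid_number. `looks_numeric` is a hypothetical name, and the sketch ignores special cases a real implementation may need to handle (strtod() also accepts hexadecimal, "inf" and "nan" forms).

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical check: return 1 if `s` is a number optionally followed
 * by whitespace only, storing the value in *result when result is not
 * NULL. Return 0 if nothing converts or other text trails the number.
 */
static int
looks_numeric(const char *s, double *result)
{
	char *ep;
	double r;

	r = strtod(s, &ep);
	if (ep == s)
		return 0;	/* no conversion was performed */
	while (isspace((unsigned char)*ep))
		ep++;		/* trailing blanks are tolerated */
	if (*ep != '\0')
		return 0;	/* trailing non-blank text */
	if (result != NULL)
		*result = r;
	return 1;
}
```

With this rule " 42 " is accepted and "42abc" is rejected. The rejection is decided entirely by what remains after the end pointer.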
2021-11-01 | awkgetline: do not access uninitialized data on EOF | Todd C. Miller
getrec() returns 0 on EOF and leaves the contents of buf unchanged. From https://github.com/onetrueawk/awk/pull/134
2021-07-27 | POSIX mandates that -F str be treated the same as -v FS=str. | Todd C. Miller
For a null string, this was not the case. Since awk(1) documents that a null string for FS has a specific behavior, make -F '' behave consistently with -v FS="". https://github.com/onetrueawk/awk/pull/128
2021-07-08 | Avoid a potential buffer overflow in backslash escaping. | Todd C. Miller
https://github.com/onetrueawk/awk/issues/121
2021-06-10 | Fix readrec's definition of a record | Todd C. Miller
It is not sufficient to check for the EOF flag on a stream. From https://github.com/onetrueawk/awk/pull/117
2021-04-19 | RS ^-anchoring needs to know if it's reading the first record of a file. | Todd C. Miller
Without this fix, when reading the first record of an input file named on the command line, the regular expression engine will be misconfigured, precluding a successful match. From Miguel Pineiro Jr