author Todd C. Miller <millert@cvs.openbsd.org> 2024-04-25 18:33:54 +0000
committer Todd C. Miller <millert@cvs.openbsd.org> 2024-04-25 18:33:54 +0000
commit df30f24b2f34a4763c60a172f2ef6142c381667c
tree c3854272fd57ede5fce0c3338f0ab5ca26a7f846 /usr.bin/awk
parent 2664480f5c625d6ebfbd5cc09cf0dcc1a4c0c784
Update awk to the Apr 22, 2024 version.
* fixed regex engine gototab reallocation issue that was introduced during the Nov 24 rewrite.
* fixed use-after-free bug in fnematch due to adjbuf invalidating the pointers to buf.
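
The use-after-free class of bug named in the commit message can be illustrated with a small sketch. This is not the awk code: `grow_and_rebase` is a hypothetical helper, and plain `realloc` stands in for `adjbuf`. It shows the general idea behind the fix, namely that any pointer into a buffer must be re-derived after the buffer may have moved. Unlike the fnematch change, which rebases `i`, `j`, `k` and `patbeg` from the saved old base `obuf`, this variant records the offset before reallocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of awk): make sure *pbuf holds at least
 * `need` bytes and keep *cursor pointing at the same logical position.
 * Returns 0 on success, -1 if the allocation fails (nothing is changed).
 */
static int grow_and_rebase(char **pbuf, size_t *psize, size_t need,
    char **cursor)
{
    size_t off;
    char *nbuf;

    /* The cursor must point into (or one past the end of) the buffer. */
    assert(*cursor >= *pbuf && *cursor <= *pbuf + *psize);

    if (need <= *psize)
        return 0;               /* large enough already, nothing moves */

    /* Save the offset BEFORE realloc: the old pointer is dead afterwards. */
    off = (size_t)(*cursor - *pbuf);

    nbuf = realloc(*pbuf, need);
    if (nbuf == NULL)
        return -1;

    *pbuf = nbuf;               /* publish the new base and size ... */
    *psize = need;
    *cursor = nbuf + off;       /* ... and re-derive the interior pointer */
    return 0;
}
```

A caller holding several interior pointers would rebase each of them the same way, and would publish the new base and size to its own caller immediately rather than at the end of the function.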
Diffstat (limited to 'usr.bin/awk')
-rw-r--r-- usr.bin/awk/FIXES 53
-rw-r--r-- usr.bin/awk/README.md 5
-rw-r--r-- usr.bin/awk/b.c 42
-rw-r--r-- usr.bin/awk/main.c 4
-rw-r--r-- usr.bin/awk/run.c 6
5 files changed, 66 insertions, 44 deletions
diff --git a/usr.bin/awk/FIXES b/usr.bin/awk/FIXES
index 3b059250d5c..c4eef3bd8ea 100644
--- a/usr.bin/awk/FIXES
+++ b/usr.bin/awk/FIXES
@@ -25,19 +25,33 @@ THIS SOFTWARE.
This file lists all bug fixes, changes, etc., made since the
second edition of the AWK book was published in September 2023.
+Apr 22, 2024:
+ fixed regex engine gototab reallocation issue that was
+ introduced during the Nov 24 rewrite. Thanks to Arnold Robbins.
+ Fixed a scan bug in split in the case the separator is a single
+ character. thanks to Oguz Ismail for spotting the issue.
+
+Mar 10, 2024:
+ fixed use-after-free bug in fnematch due to adjbuf invalidating
+ the pointers to buf. thanks to github user caffe3 for spotting
+ the issue and providing a fix, and to Miguel Pineiro Jr.
+ for the alternative fix.
+ MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max.
+ thanks to Miguel Pineiro Jr.
+
Jan 22, 2024:
Restore the ability to compile with g++. Thanks to
Arnold Robbins.
Dec 24, 2023:
- matchop dereference after free problem fix when the first
- argument is a function call. thanks to Oguz Ismail Uysal.
+ Matchop dereference after free problem fix when the first
+ argument is a function call. Thanks to Oguz Ismail Uysal.
Fix inconsistent handling of --csv and FS set in the
command line. Thanks to Wilbert van der Poel.
- casting changes to int for is* functions.
+ Casting changes to int for is* functions.
Nov 27, 2023:
- Fix exit status of system on MacOS. update to REGRESS.
+ Fix exit status of system on MacOS. Update to REGRESS.
Thanks to Arnold Robbins.
Fix inconsistent handling of -F and --csv, and loss of csv
mode when FS is set.
@@ -45,7 +59,7 @@ Nov 27, 2023:
Nov 24, 2023:
Fix issue #199: gototab improvements to dynamically resize the
table, qsort and bsearch to improve the lookup speed as the
- table gets larger for multibyte input. thanks to Arnold Robbins.
+ table gets larger for multibyte input. Thanks to Arnold Robbins.
Nov 23, 2023:
Fix Issue #169, related to escape sequences in strings.
@@ -54,29 +68,29 @@ Nov 23, 2023:
by Miguel Pineiro Jr.
Nov 20, 2023:
- rewrite of fnematch to fix a number of issues, including
+ Rewrite of fnematch to fix a number of issues, including
extraneous output, out-of-bounds access, number of bytes
to push back after a failed match etc.
- thanks to Miguel Pineiro Jr.
+ Thanks to Miguel Pineiro Jr.
Nov 15, 2023:
- Man page edit, regression test fixes. thanks to Arnold Robbins
- consolidation of sub and gsub into dosub, removing duplicate
- code. thanks to Miguel Pineiro Jr.
+ Man page edit, regression test fixes. Thanks to Arnold Robbins
+ Consolidation of sub and gsub into dosub, removing duplicate
+ code. Thanks to Miguel Pineiro Jr.
gcc replaced with cc everywhere.
Oct 30, 2023:
- multiple fixes and a minor code cleanup.
- disabled utf-8 for non-multibyte locales, such as C or POSIX.
- fixed a bad char * cast that causes incorrect results on big-endian
- systems. also fixed an out-of-bounds read for empty CCL.
- fixed a buffer overflow in substr with utf-8 strings.
- many thanks to Todd C Miller.
+ Multiple fixes and a minor code cleanup.
+ Disabled utf-8 for non-multibyte locales, such as C or POSIX.
+ Fixed a bad char * cast that causes incorrect results on big-endian
+ systems. Also fixed an out-of-bounds read for empty CCL.
+ Fixed a buffer overflow in substr with utf-8 strings.
+ Many thanks to Todd C Miller.
Sep 24, 2023:
fnematch and getrune have been overhauled to solve issues around
- unicode FS and RS. also fixed gsub null match issue with unicode.
- big thanks to Arnold Robbins.
+ unicode FS and RS. Also fixed gsub null match issue with unicode.
+ Big thanks to Arnold Robbins.
Sep 12, 2023:
Fixed a length error in u8_byte2char that set RSTART to
@@ -101,9 +115,8 @@ Sep 12, 2023:
of a string of 3 emojis is 3, not 12 as it would be if bytes
were counted.
- Regular expressions are processes as UTF-8.
+ Regular expressions are processed as UTF-8.
Unicode literals can be written as \u followed by one
to eight hexadecimal digits. These may appear in strings and
regular expressions.
-
diff --git a/usr.bin/awk/README.md b/usr.bin/awk/README.md
index e83be10d55a..ea232e87689 100644
--- a/usr.bin/awk/README.md
+++ b/usr.bin/awk/README.md
@@ -38,6 +38,7 @@ Regular expressions may include UTF-8 code points, including `\u`.
The option `--csv` turns on CSV processing of input:
fields are separated by commas, fields may be quoted with
double-quote (`"`) characters, quoted fields may contain embedded newlines.
+Double-quotes in fields have to be doubled and enclosed in quoted fields.
In CSV mode, `FS` is ignored.
If no explicit separator argument is provided,
@@ -112,6 +113,8 @@ move this to some place like `/usr/bin/awk`.
If your system does not have `yacc` or `bison` (the GNU
equivalent), you need to install one of them first.
+The default in the `makefile` is `bison`; you will have
+to edit the `makefile` to use `yacc`.
NOTE: This version uses ISO/IEC C99, as you should also. We have
compiled this without any changes using `gcc -Wall` and/or local C
@@ -131,4 +134,4 @@ We don't usually do releases.
#### Last Updated
-Mon 30 Oct 2023 12:53:07 MDT
+Mon 05 Feb 2024 08:46:55 IST
diff --git a/usr.bin/awk/b.c b/usr.bin/awk/b.c
index bc3f06fd320..89f4918b7a0 100644
--- a/usr.bin/awk/b.c
+++ b/usr.bin/awk/b.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: b.c,v 1.50 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: b.c,v 1.51 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
@@ -613,11 +613,11 @@ static void resize_gototab(fa *f, int state)
size_t orig_size = f->gototab[state].allocated; // 2nd half of new mem is this size
memset(p + orig_size, 0, orig_size * sizeof(gtte)); // clean it out
- f->gototab[state].allocated = new_size; // update gotottab info
+ f->gototab[state].allocated = new_size; // update gototab info
f->gototab[state].entries = p;
}
-static int get_gototab(fa *f, int state, int ch) /* hide gototab inplementation */
+static int get_gototab(fa *f, int state, int ch) /* hide gototab implementation */
{
gtte key;
gtte *item;
@@ -644,7 +644,7 @@ static int entry_cmp(const void *l, const void *r)
return left->ch - right->ch;
}
-static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplementation */
+static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab implementation */
{
if (f->gototab[state].inuse == 0) {
f->gototab[state].entries[0].ch = ch;
@@ -657,8 +657,8 @@ static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplem
if (tab->inuse + 1 >= tab->allocated)
resize_gototab(f, state);
- f->gototab[state].entries[f->gototab[state].inuse-1].ch = ch;
- f->gototab[state].entries[f->gototab[state].inuse-1].state = val;
+ f->gototab[state].entries[f->gototab[state].inuse].ch = ch;
+ f->gototab[state].entries[f->gototab[state].inuse].state = val;
f->gototab[state].inuse++;
return val;
} else {
@@ -683,9 +683,9 @@ static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplem
gtt *tab = & f->gototab[state];
if (tab->inuse + 1 >= tab->allocated)
resize_gototab(f, state);
- ++tab->inuse;
f->gototab[state].entries[tab->inuse].ch = ch;
f->gototab[state].entries[tab->inuse].state = val;
+ ++tab->inuse;
qsort(f->gototab[state].entries,
f->gototab[state].inuse, sizeof(gtte), entry_cmp);
@@ -836,8 +836,6 @@ int nematch(fa *f, const char *p0) /* non-empty match, for sub */
}
-#define MAX_UTF_BYTES 4 // UTF-8 is up to 4 bytes long
-
/*
* NAME
* fnematch
@@ -874,16 +872,28 @@ bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum)
do {
/*
- * Call u8_rune with at least MAX_UTF_BYTES ahead in
+ * Call u8_rune with at least awk_mb_cur_max ahead in
* the buffer until EOF interferes.
*/
- if (k - j < MAX_UTF_BYTES) {
- if (k + MAX_UTF_BYTES > buf + bufsize) {
+ if (k - j < awk_mb_cur_max) {
+ if (k + awk_mb_cur_max > buf + bufsize) {
+ char *obuf = buf;
adjbuf(&buf, &bufsize,
- bufsize + MAX_UTF_BYTES,
+ bufsize + awk_mb_cur_max,
quantum, 0, "fnematch");
+
+ /* buf resized, maybe moved. update pointers */
+ *pbufsize = bufsize;
+ if (obuf != buf) {
+ i = buf + (i - obuf);
+ j = buf + (j - obuf);
+ k = buf + (k - obuf);
+ *pbuf = buf;
+ if (patlen)
+ patbeg = buf + (patbeg - obuf);
+ }
}
- for (n = MAX_UTF_BYTES ; n > 0; n--) {
+ for (n = awk_mb_cur_max ; n > 0; n--) {
*k++ = (c = getc(f)) != EOF ? c : 0;
if (c == EOF) {
if (ferror(f))
@@ -920,10 +930,6 @@ bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum)
s = 2;
} while (1);
- /* adjbuf() may have relocated a resized buffer. Inform the world. */
- *pbuf = buf;
- *pbufsize = bufsize;
-
if (patlen) {
/*
* Under no circumstances is the last character fed to
diff --git a/usr.bin/awk/main.c b/usr.bin/awk/main.c
index 481e98b078a..11296ce12bf 100644
--- a/usr.bin/awk/main.c
+++ b/usr.bin/awk/main.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: main.c,v 1.68 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: main.c,v 1.69 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
@@ -23,7 +23,7 @@ ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
****************************************************************/
-const char *version = "version 20240122";
+const char *version = "version 20240422";
#define DEBUG
#include <stdio.h>
diff --git a/usr.bin/awk/run.c b/usr.bin/awk/run.c
index 73ab148952b..bf24e29bc73 100644
--- a/usr.bin/awk/run.c
+++ b/usr.bin/awk/run.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: run.c,v 1.84 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: run.c,v 1.85 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
@@ -1831,7 +1831,7 @@ Cell *split(Node **a, int nnn) /* split(a[0], a[1], a[2]); a[3] is type */
for (;;) {
n++;
t = s;
- while (*s != sep && *s != '\n' && *s != '\0')
+ while (*s != sep && *s != '\0')
s++;
temp = *s;
setptr(s, '\0');
@@ -2527,7 +2527,7 @@ void backsub(char **pb_ptr, const char **sptr_ptr);
Cell *dosub(Node **a, int subop) /* sub and gsub */
{
fa *pfa;
- int tempstat;
+ int tempstat = 0;
char *repl;
Cell *x;
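
The gototab hunks in b.c above change the order of "store the entry" and "increment inuse". A minimal sketch of that append-then-sort pattern follows. The `entry` and `table` types and the `table_set` and `table_get` functions are illustrative stand-ins, not the `gtte`/`gtt` structures or the `set_gototab`/`get_gototab` functions from awk, and the growth policy and the "0 means not found" return value are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int ch; int state; } entry;    /* stand-in for gtte */
typedef struct {
    entry *entries;
    size_t inuse;       /* number of valid entries */
    size_t allocated;   /* capacity of entries[] */
} table;                /* stand-in for gtt */

static int entry_cmp(const void *l, const void *r)
{
    const entry *a = l, *b = r;
    return (a->ch > b->ch) - (a->ch < b->ch);
}

/* Look up ch; returns its state, or 0 if absent (assumed sentinel). */
static int table_get(const table *t, int ch)
{
    entry key = { ch, 0 };
    const entry *hit;

    if (t->inuse == 0)
        return 0;
    hit = bsearch(&key, t->entries, t->inuse, sizeof(entry), entry_cmp);
    return hit != NULL ? hit->state : 0;
}

/* Insert or update ch -> val; returns val, or -1 if allocation fails. */
static int table_set(table *t, int ch, int val)
{
    entry key = { ch, 0 };
    entry *hit = NULL;

    if (t->inuse > 0)
        hit = bsearch(&key, t->entries, t->inuse, sizeof(entry), entry_cmp);
    if (hit != NULL) {
        hit->state = val;       /* existing key: update in place */
        return val;
    }
    if (t->inuse == t->allocated) {
        size_t n = t->allocated ? t->allocated * 2 : 4;
        entry *p = realloc(t->entries, n * sizeof(entry));
        if (p == NULL)
            return -1;
        t->entries = p;
        t->allocated = n;
    }
    /* Write the first free slot, entries[inuse], THEN bump the count.
     * Writing entries[inuse - 1] would overwrite the last valid entry. */
    t->entries[t->inuse].ch = ch;
    t->entries[t->inuse].state = val;
    t->inuse++;
    /* Keep the array sorted so bsearch stays valid. */
    qsort(t->entries, t->inuse, sizeof(entry), entry_cmp);
    return val;
}
```

Starting from `table t = { NULL, 0, 0 };`, repeated calls to `table_set` grow the array on demand, and `table_get` finds every key that was stored.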