path: root/lib/libc/regex
Age Author: Commit message
2022-12-27 Jason McIntyre: spelling fixes; from paul tagliamonte
any changes not taken noted on tech, but chiefly here i did not take the cancelation - cancellation changes;
2022-09-11 Jason McIntyre: .Li -> .Vt where appropriate;
from josiah frentsos, tweaked by schwarze ok schwarze
2022-08-06 Ingo Schwarze: Improve HISTORY and add AUTHORS.
In particular, mention the 4.0BSD and v8/Tahoe APIs that were supported until OpenBSD 5.4 and that matter for the evolution of RE functions in the BSD libc. Joint work with and OK jsg@. Regarding authorship of the v8 functions, Russ Cox writes near the end of https://swtch.com/~rsc/regexp/regexp1.html : "While writing the text editor sam in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on. (In his defense, Spencer knew the routines could be slow, and he didn't know that a more efficient algorithm existed. He even warned in the documentation, "Many users have found the speed perfectly adequate, although replacing the insides of egrep with this code would be a mistake.") Pike's regular expression implementation, extended to support Unicode, was made freely available with sam in late 1992, but the particularly efficient regular expression search algorithm went unnoticed." [...]
2022-08-06 Ingo Schwarze: Delete the ridiculous first three sentences of BUGS
and fix some minor markup nits: get rid of useless .Tn macros and add one missing .Fn macro. No objection from jsg@.
2021-07-07 Martijn van Duren: Mention that there are alternatives for ERE '+' and '?' in BRE.
OK kn@, millert@
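As a rough illustration of those alternatives, the sketch below checks through the standard POSIX regcomp()/regexec() interface that the ERE ab+c and the BRE ab\{1,\}c accept the same inputs, as do ab?c and ab\{0,1\}c. The helper name matches() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern with cflags (0 for a BRE,
 * REG_EXTENDED for an ERE) and report whether it matches anywhere
 * in text. A compilation error is treated as "no match". */
static bool matches(const char *pattern, int cflags, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return false;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

In BRE syntax the interval form is the portable spelling: \{1,\} stands in for ERE '+', and \{0,1\} for ERE '?'.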
2021-01-03 Theo Buehler: Make CHIN() Boolean-valued and use this to turn an expression with a
quintuple negation into one with a simple negation. From miod, ok millert
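A Boolean-valued membership test could look roughly like the sketch below. The set layout (a shared byte array plus a per-set bit mask) is loosely modelled on the character sets in regcomp.c, but the type and function names cset, chin() and chadd() here are assumptions for illustration, not the libc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified character set: each set owns one bit
 * (mask) in a shared array of 256 bytes indexed by byte value. */
typedef struct {
	unsigned char *ptr;	/* shared array of 256 bytes */
	unsigned char mask;	/* the bit that belongs to this set */
} cset;

/* Membership test that returns true/false rather than the raw
 * masked bits, so a caller can negate it with a single '!'. */
static inline bool chin(const cset *cs, int c)
{
	assert(cs != NULL && cs->ptr != NULL);
	return (cs->ptr[(unsigned char)c] & cs->mask) != 0;
}

/* Add byte value c to the set. */
static inline void chadd(cset *cs, int c)
{
	cs->ptr[(unsigned char)c] |= cs->mask;
}
```

Because chin() already yields 0 or 1, a test such as !chin(cs, c) needs no further normalisation.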
2021-01-03 Theo Buehler: Turn macros into inline functions so that there is no need to document in
comments that they will evaluate their arguments multiple times. From miod, ok millert
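The hazard that those comments warned about can be shown with a small, self-contained sketch. The macro, the inline function and the counting helper below are hypothetical and only demonstrate the general C behavior: a function-like macro evaluates its argument once per appearance in the expansion, while an inline function evaluates each argument exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Old style: x appears twice in the expansion, so an argument with
 * side effects may be evaluated twice. */
#define IN_RANGE_MACRO(x, lo, hi) ((x) >= (lo) && (x) <= (hi))

/* New style: each argument is evaluated exactly once at the call. */
static inline bool in_range(int x, int lo, int hi)
{
	assert(lo <= hi);
	return x >= lo && x <= hi;
}

/* Hypothetical value source that counts how often it is called. */
static int calls;

static int next_value(void)
{
	calls++;
	return 5;
}
```

With the macro, IN_RANGE_MACRO(next_value(), 0, 9) calls next_value() twice; with in_range() it is called once.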
2021-01-02 Todd C. Miller: Remove two now-unused functions; a result of the categories removal.
From miod@, OK tb@
2020-12-31 Todd C. Miller: More regular error handling with the REQUIRE macro.
Changing it from ((condition) || function call) to an if() wrapped in a do/while is easier to read and more stylistically consistent. The seterr() function no longer needs to return a value. From miod@, OK tb@
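The shape of the rewritten macro can be sketched as follows. struct parse here is a cut-down, hypothetical version with only the fields the example needs, and expect_rbrace() is an invented caller; the do/while form and a seterr() that returns nothing follow the description above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical, reduced parser state. */
struct parse {
	const char *next;	/* next character to parse */
	const char *end;	/* one past the last character */
	int error;		/* first error seen, 0 if none */
};

/* Empty input the parser is pointed at after an error, so that any
 * later "is there more input?" check fails. */
static const char nuls[10];

/* Keep the first error only; no return value is needed. */
static void seterr(struct parse *p, int e)
{
	if (p->error == 0)
		p->error = e;
	p->next = nuls;
	p->end = nuls;
}

/* An if() wrapped in do/while(0): behaves as one statement and
 * leaves no computed-but-unused value behind. */
#define REQUIRE(co, e)	do { if (!(co)) seterr(p, (e)); } while (0)

/* Hypothetical caller: consume a '}' or record REG_EBRACE. */
static void expect_rbrace(struct parse *p)
{
	assert(p != NULL);
	REQUIRE(p->end - p->next > 0 && *p->next == '}', REG_EBRACE);
	if (p->error == 0)
		p->next++;
}
```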
2020-12-31 Todd C. Miller: Remove unused categories in re_guts; they are written to but never read.
From miod@, OK tb@
2020-12-31 Todd C. Miller: Strings in struct parse can be const, they are never modified.
Also, the temporary array in nonnewline() can be made static const. From miod@, OK tb@
2020-12-30 Theo Buehler: regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char that "start" and "end" both point to. This is fine, unless "start + count" goes beyond one past the last element of the array. In this case, pedantic interpretation of the C standard makes the comparison of such a pointer against "end" undefined, and optimizers from hell will happily remove as much code as possible because of this. An example of this occurs in regcomp.c's bothcases(), which defines bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"... Because bothcases() and p_bracket() are static functions in regcomp.c, there is a real risk of miscompilation if aggressive inlining happens. The following diff rewrites the "start + count < end" constructs into "end - start > count". Assuming "end" and "start" are always pointing in the array (such as "bracket[3]" above), "end - start" is well-defined and can be compared without trouble. As a bonus, MORE2() implies MORE(), so SEETWO() can be simplified a bit. from miod, ok millert
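A minimal sketch of the safe idiom, assuming a simplified cursor type in the spirit of struct parse (struct cursor, more_than(), more(), more2() and seetwo() are illustrative names, not the regcomp.c definitions). Using the bracket[3] case from the message, "end - next" evaluates to 2, which is well-defined, whereas forming "next + 5" would point outside the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor over a char array. */
struct cursor {
	const char *next;	/* current position */
	const char *end;	/* one past the last usable byte */
};

/* True if more than n bytes remain. Subtracting two pointers into
 * the same array is well-defined; "next + n < end" may form an
 * out-of-range pointer when n is large. */
static inline bool more_than(const struct cursor *c, ptrdiff_t n)
{
	assert(c->next <= c->end);
	return c->end - c->next > n;
}

static inline bool more(const struct cursor *c)  { return more_than(c, 0); }
static inline bool more2(const struct cursor *c) { return more_than(c, 1); }

/* more2() already implies more(), so a single check suffices. */
static inline bool seetwo(const struct cursor *c, char a, char b)
{
	return more2(c) && c->next[0] == a && c->next[1] == b;
}
```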
2020-12-30 Theo Buehler: Constify the strings in regerror.c and make use of the strlcpy()
return value to avoid a redundant strlen() call. from miod, ok millert
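The saving comes from strlcpy() returning strlen(src), which is exactly what a regerror()-style function needs in order to report the full buffer size. Because strlcpy() is not available in every libc, the sketch defines a local stand-in, sketch_strlcpy(), with the BSD semantics; copy_errmsg() is a hypothetical helper, not the regerror.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with BSD strlcpy() semantics: copy at most size - 1
 * bytes, NUL-terminate when size > 0, and return strlen(src). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len >= size) ? size - 1 : len;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* regerror()-style copy: return the buffer size needed for the whole
 * message including its NUL. The strlcpy return value already holds
 * strlen(msg), so no separate strlen() call is made here. */
static size_t copy_errmsg(const char *msg, char *buf, size_t bufsize)
{
	assert(msg != NULL);
	return sketch_strlcpy(buf, msg, bufsize) + 1;
}
```

As with regerror(), a caller can pass a NULL buffer with size 0 to ask only for the required size.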
2020-12-30 Theo Buehler: cclasses[] multis field is always an empty string. Remove it and code
dealing with it. This code was incomplete anyway. from miod, ok millert
2020-12-30 Theo Buehler: Constify the strings in cnames[]. No functional change.
from miod, ok millert
2020-12-28 Todd C. Miller: Fix an off-by-one error in the marking of the O_CH operator following
an OOR2 operator. Also includes a regress test for the issue. From FreeBSD via miod@
2020-10-13 Philip Guenther: Do some easy .data -> .rodata/.data.rel.ro conversions
ok millert@ deraadt@
2019-02-05 Todd C. Miller: Fix typo in last commit.
2019-02-05 Todd C. Miller: Avoid an out of bounds read when regcomp() is passed a bad expression.
When an invalid regular expression is passed, seterr() is called which sets p->error to the appropriate error code and sets p->next and p->end to nuls[]. However, p->next is decremented in the default case in p_ere_exp() and p_simp_re() which makes it point to one byte before nuls[]. From FreeBSD. OK tedu@ deraadt@
2018-07-11 Martijn van Duren: Drop a const-bomb on regexec. It's probably not a good idea to remove a
const promise when processing it in the regex engine. Minor tweak and OK schwarze@
2017-10-30 Otto Moerbeek: fix oob read; from llvm via Vlad Tsyrklevich; ok millert@
2016-12-22 Kenneth R Westerback: Clarify code by eliminating unused #define's MUSTSEE, MUSTNOTSEE and inlining
MUSTEAT. ok tom@
2016-12-21 Kenneth R Westerback: Adopt relevant part of NetBSD's r1.7 commit to discard unused results of the
expressions generated by the REQUIRE() macro. Thus eliminating from build output 100 lines or so of gcc complaints about "computed but not used". cluebat & ok tom@
2016-09-21 Philip Guenther: Delete casts to off_t and size_t that are implied by assignments
or prototypes. Ditto for some of the char* and void* casts too. verified no change to instructions on ILP32 (i386) and LP64 (amd64) ok natano@ abluhm@ deraadt@ millert@
2016-05-26 Martijn van Duren: Change the way regexec handles REG_STARTEND combined with REG_NOTBOL.
The new code sees this combination as a continuation of string at offset pmatch[0].rm_so, instead of a new string which starts at that offset. This change fixes a search quirk in vi and is needed for upcoming fixes in ed/sed/vi. This new behaviour is also used in gnu regex. Lots of help from schwarze@ Manpage bits by schwarze@ OK schwarze@ and millert@
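A typical consumer of this behaviour is a loop that finds successive matches in one buffer, as ed, sed and vi do. The sketch below assumes a libc that supports REG_STARTEND (BSD libcs do, and glibc also defines it); count_matches() is a hypothetical helper. It passes REG_NOTBOL on every search after the first, so each new window is treated as a continuation of the same string, and it relies on the documented rule that returned offsets are relative to the string, not to the window start.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of re in the first len bytes of s.
 * pmatch[0] carries the search window in; on success it carries the
 * match out, with offsets relative to s. */
static int count_matches(const regex_t *re, const char *s, size_t len)
{
	regmatch_t m;
	size_t off = 0;
	int n = 0;
	int eflags = REG_STARTEND;	/* first window starts at s[0] */

	assert(re != NULL && s != NULL);
	while (off <= len) {
		m.rm_so = (regoff_t)off;
		m.rm_eo = (regoff_t)len;
		if (regexec(re, s, 1, &m, eflags) != 0)
			break;
		n++;
		/* Resume after this match; step past empty matches. */
		off = (size_t)m.rm_eo + (m.rm_eo == m.rm_so ? 1 : 0);
		/* Later windows continue the same string, so they must
		 * not be treated as the beginning of a line. */
		eflags = REG_STARTEND | REG_NOTBOL;
	}
	return n;
}
```

For example, searching "^b" in "foo bar" with the window starting at offset 4 and REG_STARTEND | REG_NOTBOL finds no match, because the window does not begin a line.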
2016-05-25 Ingo Schwarze: KNF with respect to indentation; no code change
2016-05-25 Ingo Schwarze: Fix another one-byte buffer underflow (read access only).
This change touches code that only runs when REG_BASIC is given and the regular expression is anchored with [[:<:]] or \< _and_ uses backreferences. Simplify the logic while here, already looking at the previous character if REG_STARTEND and REG_NOTBOL are both in use, in anticipation of martijn@'s upcoming patch which will further improve REG_STARTEND. OK millert@ martijn@ Also tested by Pedro Giffuni (pfg) on FreeBSD.
2016-05-17 Ingo Schwarze: Fix a one-byte buffer underflow (read access only).
This change touches code that only runs when REG_BASIC is given and the regular expression is anchored with ^ _and_ uses backreferences. The segfault could only be triggered when the ^ anchor was inside a leading () subexpression quantified with *. OK martijn@ Patch also proofread by Pedro Giffuni <pfg at FreeBSD dot org>.
2016-05-04 Vadim Zhukov: Remove old cruft.
okay millert@
2016-03-30 Jason McIntyre: for some time now mandoc has not required MLINKS to function
correctly - logically complete that now by removing MLINKS from base; authors need only to ensure there is an entry in NAME for any function/ util being added. MLINKS will still work, and remain for perl to ease upgrades; ok nicm (curses) bcook (ssl) ok schwarze, who provided a lot of feedback and assistance ok tb natano jung
2015-12-28 mmcc: Remove NULL-checks before free() and needless argument casts.
ok tb@
2015-12-28 mmcc: Remove NULL-checks before free() and unnecessary argument casts.
ok tb@
2015-12-28 mmcc: Remove NULL-checks before free() and a few related dead assignments.
ok and valuable input from millert@
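These three cleanups rely on the C standard guarantee that free(NULL) does nothing, and on the implicit conversion of any object pointer to void *. A small hypothetical example (struct buf and buf_clear() are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer. */
struct buf {
	char *data;
	size_t len;
};

static void buf_clear(struct buf *b)
{
	assert(b != NULL);
	/* free(NULL) is a no-op, so no "if (b->data != NULL)" guard is
	 * needed; nor is a (void *) cast, since the conversion from
	 * char * is implicit. */
	free(b->data);
	b->data = NULL;
	b->len = 0;
}
```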
2015-11-10 Jason McIntyre: update NAME section to include all documented functions,
or otherwise change Dt to reflect the name of an existing function; feedback/ok schwarze
2015-11-01 Philip Guenther: delete old lint ARGSUSED comments
2015-09-14 Ingo Schwarze: Avoid .Ns right after .Pf, it's pointless.
In some cases, do additional cleanup in the immediate vicinity.
2015-09-14 Philip Guenther: Wrap <langinfo.h> and <regexp.h> so internal calls go direct and
the symbols are weak
2015-02-28 Anthony J. Bentley: Reduce usage of predefined strings in manpages.
Predefined strings are not very portable across troff implementations, and they make the source much harder to read. Usually the intended character can be written directly. No output changes, except for two instances where the incorrect escape was used in the first place. tweaks + ok schwarze@
2014-12-09 Ingo Schwarze: put back some information what the character classes actually mean;
while here, remove the lie that regex(3) character classes would depend on the locale; ok jmc@
2014-12-09 Jason McIntyre: no more ctype(3);
2014-10-18 Theo de Raadt: reallocarray() -- a little tricky to review
ok doug millert
2014-10-11 Doug Hogan: Userland reallocarray() audit.
Avoid potential integer overflow in the size argument of malloc() and realloc() by using reallocarray() to avoid unchecked multiplication. ok deraadt@
2014-10-09 Theo de Raadt: use reallocarray(NULL, a, b) instead of malloc(a * b), which gives us
proper mult int overflow detection. The existing code already handles malloc failure properly, of course.
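The point of the switch is that an unchecked malloc(a * b) silently wraps when the product exceeds SIZE_MAX. The sketch below shows the overflow check in the spirit of reallocarray(3); checked_reallocarray() is a hypothetical name and a simplified stand-in, not OpenBSD's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Refuse any request whose nmemb * size would not fit in size_t,
 * failing like an allocation failure (NULL with errno = ENOMEM)
 * instead of allocating a wrapped-around, too-small block. */
static void *checked_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

Passing NULL as ptr makes it behave like an overflow-checked malloc(nmemb * size), which matches the reallocarray(NULL, a, b) usage described above.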
2014-09-10 Jason McIntyre: zap trailing whitespace;
2014-09-10 Jonathan Gray: document \<word\> as being non-standard
from Pedro F. Giffuni in FreeBSD pr 153257 ok millert@ tedu@
2014-09-08 Ted Unangst: add \<word\> support to regcomp. prompted by renewed interest from jsg
because such support is reportedly common and in somewhat wide use. undocumented for now because we don't endorse this. ok jsg millert
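The effect of the word anchors can be seen through the ordinary POSIX interface. This sketch assumes a libc whose regcomp() accepts \< and \> in basic REs (OpenBSD after this change; glibc also accepts them as a GNU extension). Because they are non-standard, portable code should not rely on them. The helper name bre_matches() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does the basic RE pattern match in text? */
static bool bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return false;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With these anchors, \<bar\> finds "bar" as a whole word in "foo bar baz" but not inside "foobarbaz" or "foo bars".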
2014-05-06 Ted Unangst: reallocarray for things which are arrays. ok deraadt
2014-01-22 Philip Guenther: Use consistent phrasing for bitmask flags.
tweaking and ok millert@ jmc@
2014-01-21 Ingo Schwarze: obvious .Pa fixes; found with mandocdb(8)
2013-11-28 Philip Guenther: The print() routine here can be passed at least some of the non-characters
OUT to EOW, making its domain CHAR_MIN...CODEMAX. It makes sense to have pchar() take the same domain and output those non-characters appropriately, so the (unsigned char) cast for isprint() goes in pchar(). Constipate pchar() while we're here, and let print() pass through NUL to it, as it knows how to output it unambiguously. ok otto@ millert@
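The (unsigned char) cast matters because passing a negative plain char to isprint() is undefined behavior. The sketch below is simplified: it handles byte values only, not the extra non-character codes up to CODEMAX. format_char() is a hypothetical helper, and its octal-escape output format is an assumption for illustration; NUL is escaped like any other non-printable byte, so the output stays unambiguous.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Format one byte for debugging output into buf: printable bytes are
 * copied as-is, everything else (including NUL) becomes a backslash
 * followed by its octal value. buf needs room for "\377" plus NUL. */
static void format_char(int ch, char *buf, size_t size)
{
	/* Convert first: isprint() is only defined for EOF and values
	 * representable as unsigned char. */
	unsigned char uc = (unsigned char)ch;

	assert(buf != NULL && size >= 5);
	if (isprint(uc))
		snprintf(buf, size, "%c", uc);
	else
		snprintf(buf, size, "\\%o", (unsigned)uc);
}
```

In the default "C" locale, a byte such as 0xE9 is not printable, so it comes out as "\351" whether plain char is signed or unsigned on the platform.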