Age | Commit message | Author |
|
any changes not taken were noted on tech@, but chiefly here i did not take the
cancelation -> cancellation changes;
|
|
from josiah frentsos, tweaked by schwarze
ok schwarze
|
|
In particular, mention the 4.0BSD and v8/Tahoe APIs that were
supported until OpenBSD 5.4 and that matter for the evolution
of RE functions in the BSD libc.
Joint work with and OK jsg@.
Regarding authorship of the v8 functions, Russ Cox writes
near the end of https://swtch.com/~rsc/regexp/regexp1.html :
"While writing the text editor sam in the early 1980s, Rob Pike
wrote a new regular expression implementation, which Dave Presotto
extracted into a library that appeared in the Eighth Edition.
Pike's implementation incorporated submatch tracking into an efficient
NFA simulation but, like the rest of the Eighth Edition source, was
not widely distributed. Pike himself did not realize that his
technique was anything new.
Henry Spencer reimplemented the Eighth Edition library interface
from scratch, but using backtracking, and released his implementation
into the public domain. It became very widely used, eventually
serving as the basis for the slow regular expression implementations
mentioned earlier: Perl, PCRE, Python, and so on. (In his defense,
Spencer knew the routines could be slow, and he didn't know that a
more efficient algorithm existed. He even warned in the documentation,
"Many users have found the speed perfectly adequate, although
replacing the insides of egrep with this code would be a mistake.")
Pike's regular expression implementation, extended to support
Unicode, was made freely available with sam in late 1992, but the
particularly efficient regular expression search algorithm went
unnoticed." [...]
|
|
and fix some minor markup nits:
get rid of useless .Tn macros and add one missing .Fn macro.
No objection from jsg@.
|
|
OK kn@, millert@
|
|
quintuple negation into one with a simple negation.
From miod, ok millert
|
|
comments that they will evaluate their arguments multiple times.
From miod, ok millert
|
|
From miod@, OK tb@
|
|
Changing it from ((condition) || function call) to an if() wrapped
in a do/while is easier to read and more stylistically consistent.
The seterr() function no longer needs to return a value.
From miod@, OK tb@
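The REQUIRE() change can be sketched as follows. This is a reduced, hypothetical model rather than the regcomp.c source: `struct parse` keeps only an error field, and `check()` is an invented caller. The real seterr() also redirects the parser to nuls[], which is left out here. Wrapping the if() in do/while (0) lets the macro be used as a single statement, and since nothing consumes a result any more, seterr() can return void.

```c
#include <assert.h>

/* Hypothetical, heavily reduced parser state. */
struct parse {
	int error;	/* first error recorded, 0 if none */
};

static void
seterr(struct parse *p, int e)
{
	assert(e != 0);		/* error codes are assumed to be nonzero */
	if (p->error == 0)	/* keep only the first error */
		p->error = e;
}

#define SETERROR(e)	seterr(p, (e))

/*
 * Old form: ((co) || SETERROR(e)), which needed seterr() to return a value
 * and produced an expression whose result was thrown away.
 * New form: an if() wrapped in do/while (0), so it acts as one statement.
 */
#define REQUIRE(co, e)	do { if (!(co)) SETERROR(e); } while (0)

/* Invented caller: requires 0 <= n < 10. */
static void
check(struct parse *p, int n)
{
	REQUIRE(n >= 0, 1);
	REQUIRE(n < 10, 2);
}
```

Calling check() with 42 records error 2. A later call with -1 leaves it at 2, because only the first error is kept.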
|
|
From miod@, OK tb@
|
|
Also, the temporary array in nonnewline() can be made static const.
From miod@, OK tb@
|
|
"count" bytes available in an array of char "start" and "end" both point
to.
This is fine, unless "start + count" points more than one element past the
end of the array. In that case, the C standard makes both computing such a
pointer and comparing it against "end" undefined, and optimizers
from hell will happily remove as much code as possible because of this.
An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...
Because bothcases() and p_bracket() are static functions in regcomp.c,
there is a real risk of miscompilation if aggressive inlining happens.
The following diff rewrites the "start + count < end" constructs into
"end - start > count". Assuming "end" and "start" always point into the
same array (such as "bracket[3]" above), "end - start" is well-defined
and can be compared without trouble.
As a bonus, MORE2() implies MORE(), so SEETWO() can be simplified
a bit.
from miod, ok millert
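Here is a minimal sketch of the rewritten bounds checks, modelled on the parser macros named above. `struct parse` is a reduced, hypothetical stand-in that holds only the two pointers involved, and `more_than()` is an invented helper. The macro bodies follow the "end - start > count" form from the commit message. Both pointers stay inside the same array, so `end - next` is well-defined even in the three-byte `bracket` case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced parser state: only the two pointers that matter. */
struct parse {
	const char *next;	/* next character in the pattern */
	const char *end;	/* one past the last character */
};

/* Each check compares a pointer difference, never a pointer past "end". */
#define MORE()		(p->end - p->next > 0)
#define MORE2()		(p->end - p->next > 1)
#define PEEK()		(*p->next)
#define PEEK2()		(*(p->next + 1))
/* MORE2() implies MORE(), so a separate MORE() test is unnecessary. */
#define SEETWO(a, b)	(MORE2() && PEEK() == (a) && PEEK2() == (b))

/* Replacement for "p->next + n < p->end": more than n characters remain. */
static int
more_than(const struct parse *p, ptrdiff_t n)
{
	assert(p->next <= p->end);	/* both point into the same array */
	return p->end - p->next > n;
}
```

In the bothcases() scenario, `more_than(p, 5)` simply returns false. No pointer beyond `bracket + 3` is ever formed.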
|
|
return value to avoid a redundant strlen() call.
from miod, ok millert
|
|
dealing with it. This code was incomplete anyway.
from miod, ok millert
|
|
from miod, ok millert
|
|
an OOR2 operator. Also includes a regress test for the issue.
From FreeBSD via miod@
|
|
ok millert@ deraadt@
|
|
|
|
When an invalid regular expression is passed, seterr() is called, which
sets p->error to the appropriate error code and sets p->next and
p->end to nuls[]. However, p->next is decremented in the default
case in p_ere_exp() and p_simp_re(), which leaves it pointing one
byte before the start of nuls[]. From FreeBSD. OK tedu@ deraadt@
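The hazard can be modelled with a reduced parser. Everything here is hypothetical except the behaviour described in the log: after seterr(), both pointers refer to nuls[], so an unconditional `p->next--` would step outside that array. This sketch guards the decrement by skipping it once `p->error` is set. That is one possible guard, and the sketch does not claim to reproduce the FreeBSD patch.

```c
#include <assert.h>

/* Hypothetical, reduced parser state. */
struct parse {
	const char *next;	/* next character to read */
	const char *end;	/* one past the last character */
	int error;		/* first error recorded, 0 if none */
};

/* Empty string the parser is redirected to after an error. */
static const char nuls[10];

static void
seterr(struct parse *p, int e)
{
	assert(e != 0);		/* error codes are assumed to be nonzero */
	if (p->error == 0)
		p->error = e;
	p->next = nuls;
	p->end = nuls;
}

/*
 * Push back the character just read. Once an error has been recorded,
 * p->next points at nuls[0], and decrementing it would leave the array.
 */
static void
unget(struct parse *p)
{
	if (p->error != 0)
		return;
	p->next--;
}
```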
|
|
const promise when processing it in the regex engine.
Minor tweak and OK schwarze@
|
|
|
|
MUSTEAT.
ok tom@
|
|
expressions generated by the REQUIRE() macro. This eliminates 100 or so
lines of gcc complaints about "computed but not used" from the build output.
cluebat & ok tom@
|
|
or prototypes. Ditto for some of the char* and void* casts.
Verified no change to instructions on ILP32 (i386) and LP64 (amd64)
ok natano@ abluhm@ deraadt@ millert@
|
|
The new code sees this combination as a continuation of the string at offset
pmatch[0].rm_so, instead of a new string which starts at that offset.
This change fixes a search quirk in vi and is needed for upcoming fixes in
ed/sed/vi.
GNU regex behaves the same way.
Lots of help from schwarze@
Manpage bits by schwarze@
OK schwarze@ and millert@
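The sketch below shows the kind of search loop this enables. It uses REG_STARTEND, a BSD extension that glibc also provides, and adds REG_NOTBOL to every search after the first. The function name find_all() is invented for illustration. With REG_STARTEND, the offsets written back to pmatch stay relative to the start of `s`, not to the rm_so offset passed in.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: store the start offset of each match of pattern
 * in s (at most max of them) and return the count, or -1 if the pattern
 * does not compile. Searches after the first pass REG_NOTBOL, so each
 * one continues the same string instead of starting a new one.
 */
static int
find_all(const char *pattern, const char *s, regoff_t *starts, int max)
{
	regex_t re;
	regmatch_t m[1];
	int eflags = REG_STARTEND;
	int n = 0;
	regoff_t len = (regoff_t)strlen(s);

	assert(max > 0);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	m[0].rm_so = 0;			/* search window is [rm_so, rm_eo) */
	m[0].rm_eo = len;
	while (n < max && regexec(&re, s, 1, m, eflags) == 0) {
		starts[n++] = m[0].rm_so;	/* relative to s */
		/* Resume after this match; step past empty matches. */
		m[0].rm_so = (m[0].rm_eo > m[0].rm_so) ?
		    m[0].rm_eo : m[0].rm_so + 1;
		if (m[0].rm_so > len)
			break;
		m[0].rm_eo = len;
		eflags = REG_STARTEND | REG_NOTBOL;
	}
	regfree(&re);
	return n;
}
```

For "o" in "foo boo", this reports offsets 1, 2, 5 and 6. With "^f" against "fff", only the first search matches, because the later searches carry REG_NOTBOL.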
|
|
|
|
This change touches code that only runs when REG_BASIC is given and the
regular expression is anchored with [[:<:]] or \< _and_ uses backreferences.
Simplify the logic while here, already looking at the previous character
if REG_STARTEND and REG_NOTBOL are both in use, in anticipation of
martijn@'s upcoming patch which will further improve REG_STARTEND.
OK millert@ martijn@
Also tested by Pedro Giffuni (pfg) on FreeBSD.
|
|
This change touches code that only runs when REG_BASIC is given and
the regular expression is anchored with ^ _and_ uses backreferences.
The segfault could only be triggered when the ^ anchor was inside
a leading () subexpression quantified with *.
OK martijn@
Patch also proofread by Pedro Giffuni <pfg at FreeBSD dot org>.
|
|
okay millert@
|
|
correctly - logically complete that now by removing MLINKS from base;
authors need only to ensure there is an entry in NAME for any function/
util being added. MLINKS will still work, and remain for perl to ease
upgrades;
ok nicm (curses) bcook (ssl)
ok schwarze, who provided a lot of feedback and assistance
ok tb natano jung
|
|
ok tb@
|
|
ok tb@
|
|
ok and valuable input from millert@
|
|
or otherwise change Dt to reflect the name of an existing function;
feedback/ok schwarze
|
|
|
|
In some cases, do additional cleanup in the immediate vicinity.
|
|
the symbols are weak
|
|
Predefined strings are not very portable across troff implementations,
and they make the source much harder to read. Usually the intended
character can be written directly.
No output changes, except for two instances where the incorrect escape
was used in the first place.
tweaks + ok schwarze@
|
|
while here, remove the lie that regex(3) character classes would
depend on the locale;
ok jmc@
|
|
|
|
ok doug millert
|
|
Avoid potential integer overflow in the size argument of malloc() and
realloc() by using reallocarray() to avoid unchecked multiplication.
ok deraadt@
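reallocarray(3) is part of OpenBSD libc. The portable sketch below shows the kind of check it performs before calling realloc(). The name checked_reallocarray() is hypothetical. The division is needed only when one factor is at least 2^(half the bits of size_t), because two factors below that bound cannot overflow a size_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* sqrt(SIZE_MAX + 1): products of two smaller factors cannot overflow. */
#define MUL_NO_OVERFLOW	((size_t)1 << (sizeof(size_t) * 4))

/* Hypothetical name: realloc(optr, nmemb * size) with overflow detection. */
static void *
checked_reallocarray(void *optr, size_t nmemb, size_t size)
{
	if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
	    nmemb > 0 && SIZE_MAX / nmemb < size) {
		errno = ENOMEM;		/* the product would wrap around */
		return NULL;
	}
	return realloc(optr, nmemb * size);
}
```

A call such as `checked_reallocarray(NULL, SIZE_MAX / 2 + 1, 2)` fails with ENOMEM instead of allocating a wrapped-around, far too small buffer.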
|
|
proper mult int overflow detection. The existing code already handles
malloc failure properly, of course.
|
|
|
|
from Pedro F. Giffuni in FreeBSD pr 153257
ok millert@ tedu@
|
|
because such support is reportedly common and in somewhat wide use.
undocumented for now because we don't endorse this.
ok jsg millert
|
|
|
|
tweaking and ok millert@ jmc@
|
|
|
|
OUT to EOW, making its domain CHAR_MIN...CODEMAX. It makes sense to have
pchar() take the same domain and output those non-characters appropriately,
so the (unsigned char) cast for isprint() goes in pchar(). Constipate
pchar() while we're here, and let print() pass through NUL to it, as it
knows how to output it unambiguously.
ok otto@ millert@
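The (unsigned char) cast matters because isprint() is defined only for values representable as unsigned char (or EOF), while a plain char can be negative. The following is a hypothetical sketch of a pchar()-like printer over the domain CHAR_MIN up to the engine's pseudo-characters above CHAR_MAX. The "<%d>" notation for non-characters is invented here and is not the libc output format.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical sketch: ch is a char value (CHAR_MIN..CHAR_MAX) or an
 * engine pseudo-character above CHAR_MAX. Non-printing characters,
 * including NUL, are shown as an octal escape so output is unambiguous.
 */
static const char *
pchar_sketch(int ch)
{
	static char pbuf[16];

	assert(ch >= CHAR_MIN);
	if (ch > CHAR_MAX) {
		/* Not a character at all: show the pseudo-code number. */
		(void)snprintf(pbuf, sizeof pbuf, "<%d>", ch);
	} else if (isprint((unsigned char)ch)) {
		/* The cast keeps negative chars out of isprint()'s domain. */
		(void)snprintf(pbuf, sizeof pbuf, "%c", ch);
	} else {
		(void)snprintf(pbuf, sizeof pbuf, "\\%o", (unsigned char)ch);
	}
	return pbuf;
}
```

In the C locale, a newline comes out as `\12` and NUL as `\0`.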
|