author | Kaleb Keithley <kaleb@freedesktop.org> | 2003-11-14 16:49:22 +0000 |
---|---|---|
committer | Kaleb Keithley <kaleb@freedesktop.org> | 2003-11-14 16:49:22 +0000 |
commit | 0a193e032ba1ecf3f003e027e833dc9d274cb740 (patch) | |
tree | a1dcc00cb7f5d26e437e05e658c38fc323fe919d /lisp/re/README |
Initial revision
Diffstat (limited to 'lisp/re/README')
-rw-r--r-- | lisp/re/README | 121 |
1 files changed, 121 insertions, 0 deletions
diff --git a/lisp/re/README b/lisp/re/README
new file mode 100644
index 0000000..848e1e9
--- /dev/null
+++ b/lisp/re/README
@@ -0,0 +1,121 @@
+$XFree86: xc/programs/xedit/lisp/re/README,v 1.4 2002/11/15 07:01:32 paulo Exp $
+
+LAST UPDATED: $Date$
+
+  This is a small regex library for fast matching of tokens in text. It was
+built to be used by xedit and its syntax highlight code. It is not compliant
+with IEEE Std 1003.2, but is expected to be used where very fast matching is
+required, and exotic patterns will not be used.
+
+  To understand what kind of patterns this library is expected to be used with,
+see the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and some
+samples in the file tests.txt, with comments for patterns that will not work,
+or may give incorrect results.
+
+  The library is not built upon the standard regex library by Henry Spencer;
+it is written completely from scratch, although its syntax is heavily based
+on that library. The only reason for it to exist is that, unfortunately,
+the standard version does not fit the requirements of xedit.
+Anyway, I would like to thank Henry for his regex library; it is a really
+very useful tool.
+
+  Small description of understood tokens:
+
+               M A T C H I N G
+------------------------------------------------------------------------
+.               Any character (won't match newline if compiled with RE_NEWLINE)
+\w              Any word letter (shortcut to [a-zA-Z0-9_])
+\W              Not a word letter (shortcut to [^a-zA-Z0-9_])
+\d              Decimal number
+\D              Not a decimal number
+\s              A space
+\S              Not a space
+\l              A lower case letter
+\u              An upper case letter
+\c              A control character, currently the range 1-32 (minus tab)
+\C              Not a control character
+\o              Octal number
+\O              Not an octal number
+\x              Hexadecimal number
+\X              Not a hexadecimal number
+\<              Beginning of a word (matches an empty string)
+\>              End of a word (matches an empty string)
+^               Beginning of a line (matches an empty string)
+$               End of a line (matches an empty string)
+[...]           Matches one of the characters inside the brackets;
+                ranges are specified by separating two characters with "-".
+                If the first character is "^", matches only if the
+                character is not in this range. To add a "]" make it
+                the first character, and to add a "-" make it the last.
+\1 to \9        Backreference, matches the text that was matched by a group,
+                that is, text that was matched by the pattern inside
+                "(" and ")".
+
+
+               O P E R A T O R S
+------------------------------------------------------------------------
+()              Any pattern inside works as a backreference, and is also
+                used to group patterns.
+|               Alternation, allows choosing different possibilities, like
+                character ranges, but allows patterns of different lengths.
+
+
+               R E P E T I T I O N
+------------------------------------------------------------------------
+<re>*           <re> may occur any number of times, including zero
+<re>+           <re> must occur at least once
+<re>?           <re> is optional
+<re>{<e>}       <re> must occur exactly <e> times
+<re>{<n>,}      <re> must occur at least <n> times
+<re>{,<m>}      <re> must not occur more than <m> times
+<re>{<n>,<m>}   <re> must occur at least <n> times, but no more than <m>
+
+
+  Note that "." is a special character, and when used with a repetition
+operator its meaning changes completely. For example, ".*" matches
+anything up to the end of the input string (unless the pattern was compiled
+with RE_NEWLINE, in which case it will match anything but a newline).
+
+
+  Limitations:
+
+o Only minimal matches are supported. The engine has only one level of
+  "backtracking", so it does only minimal matches, to allow backreferences to
+  work properly and to avoid failing to match depending on the input.
+
+o Only one level of "grouping". For example, with the pattern:
+        (a(b)c)
+  If "abc" is anywhere in the input, it will be in "\1", but there will
+  be no "\2" for "b".
+
+o Some "special repetitions" were not implemented. These are:
+        .{<e>}
+        .{<n>,}
+        .{,<m>}
+        .{<n>,<m>}
+
+o Some patterns will never match, for example:
+        \w*\d
+  Since "\w*" already includes all possible matches of "\d", "\d" will
+  only be tested when "\w*" failed. There are no plans to make such
+  patterns work.
+
+
+  Some of these limitations may be worked on in future versions of the
+library, but this is not what the library is expected to do, and adding
+support for correct handling of these would probably make the library
+slower, which is not the reason for it to exist in the first place.
+
+  If you need "true" regex then this library is not for you, but if all
+you need is support for very quickly finding simple patterns, then this
+library can be a very powerful tool; on some patterns it can run more
+than 200 times faster than "true" regex implementations! And this is
+the reason it was written.
+
+
+
+  Send comments and code to me (paulo@XFree86.Org) or to the XFree86
+mailing/patch lists.
+
+--
+Paulo
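
The "\w*\d" limitation listed in the README can be reproduced outside the library. What follows is a minimal sketch in Python, assuming Python's standard "re" module and not this library's own API, which the README does not show. The function names are hypothetical. It contrasts an ordinary backtracking match with one where "\w*" is not allowed to give back characters it has consumed, emulated here with a lookahead plus backreference. This only approximates the behaviour the README describes; it is not a model of the engine itself.

```python
import re

# Ordinary backtracking: \w* may give characters back so that \d can match.
WORD_THEN_DIGIT = re.compile(r"\w*\d")

# Assumption for illustration: emulate a repetition that never gives back
# what it consumed. The lookahead captures the longest run of word
# characters, and \1 then consumes exactly that run. Python does not
# backtrack into a lookahead, so \d is only tried after the run ends.
WORD_THEN_DIGIT_NO_GIVE_BACK = re.compile(r"(?=(\w*))\1\d")


def backtracking_match(text):
    """Return the text matched by the backtracking pattern, or None."""
    m = WORD_THEN_DIGIT.search(text)
    return m.group(0) if m else None


def no_give_back_match(text):
    """Return the text matched when \\w* keeps everything it consumed."""
    m = WORD_THEN_DIGIT_NO_GIVE_BACK.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("abc1", "x9 y", "12"):
        print(repr(sample), backtracking_match(sample), no_give_back_match(sample))
```

Because every digit is also a word character, the character that stops "\w*" can never be a digit, so the second pattern finds nothing in any input. That matches the README's explanation of why such patterns never match.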