author | Ingo Schwarze <schwarze@cvs.openbsd.org> | 2015-10-13 23:30:43 +0000 |
---|---|---|
committer | Ingo Schwarze <schwarze@cvs.openbsd.org> | 2015-10-13 23:30:43 +0000 |
commit | 9925e5168895883876a7016b4e1fb1e97961064f | |
tree | 37f1c77c0a8e22ce3db34a9f414ae931714075e8 /regress | |
parent | c50fea35705e12062d913205519e3a1b4b80703a | |
Reject the escape sequences \[uD800] to \[uDFFF] in the parser.
These surrogates are not valid Unicode codepoints,
so treat them just like any other undefined character escapes:
Warn about them and do not produce output.
Issue noticed while talking to stsp@, semarie@, and bentley@.
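
The rule described in the commit message can be illustrated with a minimal sketch. This is not mandoc's parser code (mandoc is written in C); the function name, the return values, and the assumed escape shape of four to six uppercase hexadecimal digits are illustrative assumptions. Only the surrogate range U+D800 to U+DFFF and the treatment as an invalid escape come from the commit message.

```python
import re

# Assumed escape shape for this sketch: \[uXXXX] with 4 to 6 uppercase
# hex digits. The real mandoc parser may apply further rules.
_UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def classify_unicode_escape(escape: str) -> str:
    """Classify a \\[uXXXX] escape as 'ok', 'invalid', or 'not-unicode'.

    Hypothetical helper: the names and return values are illustrative.
    """
    match = _UNICODE_ESCAPE.fullmatch(escape)
    if match is None:
        return "not-unicode"
    codepoint = int(match.group(1), 16)
    # Surrogates are reserved for UTF-16 and are not valid codepoints,
    # so they are rejected like any other undefined character escape.
    if 0xD800 <= codepoint <= 0xDFFF:
        return "invalid"
    # Anything beyond the Unicode range is rejected as well.
    if codepoint > 0x10FFFF:
        return "invalid"
    return "ok"


if __name__ == "__main__":
    for esc in (r"\[uD7FF]", r"\[uD800]", r"\[uDFFF]", r"\[uE000]"):
        print(esc, classify_unicode_escape(esc))
```

In this sketch the neighbours U+D7FF and U+E000 stay valid, which matches the regression output, where only the two surrogate lines change.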
Diffstat (limited to 'regress')
-rw-r--r-- | regress/usr.bin/mandoc/char/unicode/input.out_ascii | 4 |
-rw-r--r-- | regress/usr.bin/mandoc/char/unicode/input.out_lint | 2 |
-rw-r--r-- | regress/usr.bin/mandoc/char/unicode/input.out_utf8 | 4 |
3 files changed, 6 insertions, 4 deletions
diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_ascii b/regress/usr.bin/mandoc/char/unicode/input.out_ascii
index a9946d1b528..7711574c61d 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_ascii
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_ascii
@@ -37,8 +37,8 @@ DDEESSCCRRIIPPTTIIOONN
 U+CFFF 0xecbfbf <?><?> end of last normal middle byte
 U+D000 0xed8080 <?><?> begin of strange middle byte
 U+D7FF 0xed9fbf <?><?> highest public three-byte
-U+D800 0xeda080 <?>??? lowest surrogate
-U+DFFF 0xedbfbf <?>??? highest surrogate
+U+D800 0xeda080 ??? lowest surrogate
+U+DFFF 0xedbfbf ??? highest surrogate
 U+E000 0xee8080 <?><?> lowest private use
 U+FFFF 0xefbfbf <?><?> highest three-byte
diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_lint b/regress/usr.bin/mandoc/char/unicode/input.out_lint
index 77b6161cbab..8ac05edcef0 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_lint
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_lint
@@ -24,9 +24,11 @@ mandoc: input.in:34:19: ERROR: skipping bad character: 0xbf
 mandoc: input.in:41:25: ERROR: skipping bad character: 0xed
 mandoc: input.in:41:26: ERROR: skipping bad character: 0xa0
 mandoc: input.in:41:27: ERROR: skipping bad character: 0x80
+mandoc: input.in:41:17: WARNING: invalid escape sequence: \[uD800]
 mandoc: input.in:42:25: ERROR: skipping bad character: 0xed
 mandoc: input.in:42:26: ERROR: skipping bad character: 0xbf
 mandoc: input.in:42:27: ERROR: skipping bad character: 0xbf
+mandoc: input.in:42:17: WARNING: invalid escape sequence: \[uDFFF]
 mandoc: input.in:50:19: ERROR: skipping bad character: 0xf0
 mandoc: input.in:50:20: ERROR: skipping bad character: 0x80
 mandoc: input.in:50:21: ERROR: skipping bad character: 0x80
diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_utf8 b/regress/usr.bin/mandoc/char/unicode/input.out_utf8
index 44813b8d7ae..89aa6719533 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_utf8
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_utf8
@@ -37,8 +37,8 @@ DDEESSCCRRIIPPTTIIOONN
 U+CFFF 0xecbfbf 쿿쿿 end of last normal middle byte
 U+D000 0xed8080 퀀퀀 begin of strange middle byte
 U+D7FF 0xed9fbf ퟿퟿ highest public three-byte
-U+D800 0xeda080 í €??? lowest surrogate
-U+DFFF 0xedbfbf í¿¿??? highest surrogate
+U+D800 0xeda080 ??? lowest surrogate
+U+DFFF 0xedbfbf ??? highest surrogate
 U+E000 0xee8080  lowest private use
 U+FFFF 0xefbfbf ï¿¿ï¿¿ highest three-byte
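
The byte values in the second column of the test output follow from the generic three-byte UTF-8 pattern. The sketch below reproduces that arithmetic and shows that a strict UTF-8 decoder refuses the two surrogate sequences, which is consistent with the "skipping bad character" errors in input.out_lint. The helper names are hypothetical and the code is independent of mandoc's implementation.

```python
def utf8_three_byte(codepoint: int) -> bytes:
    """Apply the three-byte UTF-8 bit pattern 1110xxxx 10xxxxxx 10xxxxxx.

    Hypothetical helper: it deliberately does NOT reject surrogates, so it
    can show which bytes a naive encoder would produce for them.
    """
    if not 0x0800 <= codepoint <= 0xFFFF:
        raise ValueError("codepoint outside the three-byte range")
    return bytes([
        0xE0 | (codepoint >> 12),          # top 4 bits
        0x80 | ((codepoint >> 6) & 0x3F),  # middle 6 bits
        0x80 | (codepoint & 0x3F),         # low 6 bits
    ])


def strict_decode_ok(raw: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
        raw = utf8_three_byte(cp)
        print(f"U+{cp:04X} 0x{raw.hex()} strict decode ok: {strict_decode_ok(raw)}")
```

The design choice here is to keep encoding and validation separate: the bit pattern alone happily yields 0xeda080 for U+D800, and it is the validation step that has to reject it.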