author | Ingo Schwarze <schwarze@cvs.openbsd.org> | 2019-02-24 04:54:37 +0000 |
---|---|---|
committer | Ingo Schwarze <schwarze@cvs.openbsd.org> | 2019-02-24 04:54:37 +0000 |
commit | 3d32e6b92c48da3621e92f72abab9354e09cf387 | |
tree | 9d0cc00c7bb1a376cf8599d25b42f0e6b64f676d /usr.bin/less | |
parent | 4ba254967eda4f7b00607c0abb7cf1604d8011f3 | |
To measure the display width of a wide character in pwidth(), use
the standard function wcwidth(3) instead of several hand-rolled
functions accessing outdated local character tables, making this
part of the code conform to our in-tree Unicode 10.
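The sketch below roughly illustrates measuring display width with wcwidth(3). It is not code from less(1). It sums wcwidth(3) over a wide string and treats non-printable characters as zero width, the same way the new pwidth() does. The function name display_width() is hypothetical. Widths for non-ASCII characters depend on the LC_CTYPE locale, so a real program would normally call setlocale(3) first.

```c
#define _XOPEN_SOURCE 700	/* glibc declares wcwidth(3) only for X/Open */

#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: number of terminal columns needed to display
 * the first n wide characters of s, stopping early at a null wide
 * character.  Non-printable characters (wcwidth() == -1) count as
 * zero columns instead of turning the whole result into -1.
 */
int
display_width(const wchar_t *s, size_t n)
{
	int total = 0;
	int w;
	size_t i;

	for (i = 0; i < n && s[i] != L'\0'; i++) {
		w = wcwidth(s[i]);
		if (w > 0)
			total += w;
	}
	return total;
}
```

Here is the design choice. wcswidth(3) returns -1 as soon as it meets a non-printable character, so wcswidth(L"a\tb", 3) is -1. This helper keeps counting and returns 2 for the same input, because the tab contributes zero columns.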
Of course, with the current hand-rolled (and buggy) UTF-8 parser
contained in less(1), this only works if wchar_t stores UCS-4 values
and is more than 31 bits wide, but both will always be true on
OpenBSD, and ultimately, we shall switch to mbtowc(3) for parsing
anyway, lifting these restrictions.
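The following sketch illustrates the two assumptions about wchar_t and the planned switch to mbtowc(3). It is an assumption-based illustration, not what less(1) currently does, and decode_mb() is a hypothetical name. Only the width requirement can be checked at compile time. Whether wchar_t values are UCS-4 code points cannot be tested this way, although many C libraries advertise it by defining __STDC_ISO_10646__. mbtowc(3) decodes according to the current LC_CTYPE locale, so it only decodes UTF-8 after setlocale(3) has selected a UTF-8 locale.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Compile-time check of the "more than 31 bits wide" requirement. */
_Static_assert(sizeof(wchar_t) * CHAR_BIT > 31,
    "wchar_t must be more than 31 bits wide");

/*
 * Hypothetical helper: decode len bytes of multibyte text (UTF-8 when
 * a UTF-8 locale is active) into at most outlen wide characters with
 * mbtowc(3).  Returns the number of wide characters stored, or -1 if
 * an invalid or incomplete sequence is found or out is too small.
 */
int
decode_mb(const char *s, size_t len, wchar_t *out, size_t outlen)
{
	size_t nout = 0;
	int n;

	(void)mbtowc(NULL, NULL, 0);	/* reset the conversion state */
	while (len > 0) {
		if (nout == outlen)
			return -1;
		n = mbtowc(&out[nout], s, len);
		if (n == -1) {
			(void)mbtowc(NULL, NULL, 0);	/* state is now undefined */
			return -1;
		}
		if (n == 0)	/* embedded NUL byte: stored as L'\0', 1 byte */
			n = 1;
		s += n;
		len -= (size_t)n;
		nout++;
	}
	return (int)nout;
}
```

Unlike the hand-rolled parser, this approach makes no assumption about how wide characters are encoded in wchar_t. It leaves that to the C library, which is how the switch to mbtowc(3) would lift the restrictions.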
The existence of the outdated character tables was originally
called out by Evan Silberman on bugs@.
OK stsp@
Diffstat (limited to 'usr.bin/less')
-rw-r--r-- | usr.bin/less/line.c | 75 |
1 files changed, 40 insertions, 35 deletions
diff --git a/usr.bin/less/line.c b/usr.bin/less/line.c
index 42d1a24502b..c39a639fef4 100644
--- a/usr.bin/less/line.c
+++ b/usr.bin/less/line.c
@@ -15,6 +15,8 @@
  * in preparation for output to the screen.
  */
 
+#include <wchar.h>
+
 #include "charset.h"
 #include "less.h"
 
@@ -373,50 +375,53 @@ attr_ewidth(int a)
  * attribute sequence to be inserted, so this must be taken into account.
  */
 static int
-pwidth(LWCHAR ch, int a, LWCHAR prev_ch)
+pwidth(wchar_t ch, int a, wchar_t prev_ch)
 {
 	int w;
 
-	if (ch == '\b')
-		/*
-		 * Backspace moves backwards one or two positions.
-		 * XXX - Incorrect if several '\b' in a row.
-		 */
-		return ((utf_mode && is_wide_char(prev_ch)) ? -2 : -1);
-
-	if (!utf_mode || is_ascii_char(ch)) {
-		if (control_char((char)ch)) {
-			/*
-			 * Control characters do unpredictable things,
-			 * so we don't even try to guess; say it doesn't move.
-			 * This can only happen if the -r flag is in effect.
-			 */
-			return (0);
-		}
-	} else {
-		if (is_composing_char(ch) || is_combining_char(prev_ch, ch)) {
-			/*
-			 * Composing and combining chars take up no space.
-			 *
-			 * Some terminals, upon failure to compose a
-			 * composing character with the character(s) that
-			 * precede(s) it will actually take up one column
-			 * for the composing character; there isn't much
-			 * we could do short of testing the (complex)
-			 * composition process ourselves and printing
-			 * a binary representation when it fails.
-			 */
-			return (0);
-		}
+	/*
+	 * In case of a backspace, back up by the width of the previous
+	 * character. If that is non-printable (for example another
+	 * backspace) or zero width (for example a combining accent),
+	 * the terminal may actually back up to a character even further
+	 * back, but we no longer know how wide that may have been.
+	 * The best guess possible at this point is that it was
+	 * hopefully width one.
+	 */
+	if (ch == L'\b') {
+		w = wcwidth(prev_ch);
+		if (w <= 0)
+			w = 1;
+		return (-w);
 	}
 
+	w = wcwidth(ch);
+
+	/*
+	 * Non-printable characters can get here if the -r flag is in
+	 * effect, and possibly in some other situations (XXX check that!).
+	 * Treat them as zero width.
+	 * That may not always match their actual behaviour,
+	 * but there is no reasonable way to be more exact.
+	 */
+	if (w == -1)
+		w = 0;
+
+	/*
+	 * Combining accents take up no space.
+	 * Some terminals, upon failure to compose them with the
+	 * characters that precede them, will actually take up one column
+	 * for the combining accent; there isn't much we could do short
+	 * of testing the (complex) composition process ourselves and
+	 * printing a binary representation when it fails.
+	 */
+	if (w == 0)
+		return (0);
+
 	/*
 	 * Other characters take one or two columns,
 	 * plus the width of any attribute enter/exit sequence.
 	 */
-	w = 1;
-	if (is_wide_char(ch))
-		w++;
 	if (curr > 0 && !is_at_equiv(attr[curr-1], a))
 		w += attr_ewidth(attr[curr-1]);
 	if ((apply_at_specials(a) != AT_NORMAL) &&
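To show the new width rules in isolation, here is a sketch of the character part of the new pwidth(). The attribute-sequence handling (curr, attr[], attr_ewidth()) is left out. char_columns() is a hypothetical name, not a function in less(1). It returns how far the cursor moves when ch is printed after prev_ch.

```c
#define _XOPEN_SOURCE 700	/* glibc declares wcwidth(3) only for X/Open */

#include <wchar.h>

/*
 * Hypothetical standalone version of the character-width rules of
 * the new pwidth(), without attribute enter/exit sequences.
 */
int
char_columns(wchar_t ch, wchar_t prev_ch)
{
	int w;

	/*
	 * Backspace: back up by the width of the previous character,
	 * guessing one column if that width is unknown or zero.
	 */
	if (ch == L'\b') {
		w = wcwidth(prev_ch);
		if (w <= 0)
			w = 1;
		return -w;
	}

	w = wcwidth(ch);

	/* Non-printable characters are treated as zero width. */
	if (w == -1)
		w = 0;

	/* Zero width (combining accents) or one or two columns. */
	return w;
}
```

In a UTF-8 locale, char_columns(L'\b', 0x4E00) should yield -2, because wcwidth(3) reports two columns for that CJK ideograph. This reproduces the old special case for a backspace after a wide character, but it now comes from the system's Unicode tables rather than from less(1)'s own.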