author     Ingo Schwarze <schwarze@cvs.openbsd.org> 2024-05-14 18:38:14 +0000
committer  Ingo Schwarze <schwarze@cvs.openbsd.org> 2024-05-14 18:38:14 +0000
commit     de628b3172c196b4c8e91f4e9d554f4ade647bf0
tree       b411fbd53f73fc31dc521c46e3d26099ed35cd39 /gnu/usr.bin
parent     ac5798f9ce85058c7112837ba18178c9db6c85c8
The makewhatis(8) program already provided a "-T utf8" option to put UTF-8 strings into the database, but that only worked for input files containing the manually written, mnemonic roff(7) character escape sequences documented in mandoc_char(7). Even though mandoc(1), man(1), and man.cgi(8) have been able to properly handle UTF-8 and ISO-Latin-1 encoded input files for many years, makewhatis(8) unconditionally replaced all non-ASCII bytes in all input files with ASCII question marks ("?").

Improve this by changing two aspects of non-ASCII character handling in makewhatis(8) at the same time:

1. In the makewhatis(8) main program, when configuring the roff(7) parser, enable UTF-8 and ISO-Latin-1 autorecognition and translation to \[uXXXX] roff(7) Unicode character escape sequences. The man(1) and man.cgi(8) programs prove that this option has been working very reliably for many years, so there is no risk.

2. In the makewhatis(8) string rendering code, if "-T utf8" was requested, translate these escape sequences to UTF-8 strings, just as makewhatis(8) already did for ESCAPE_SPECIAL sequences. Otherwise, i.e. if an ASCII-only database is desired, replace all character escape sequences with ASCII transliterations, again as was already done for ESCAPE_SPECIAL sequences.

With this change, giving UTF-8 command line arguments to apropos(1) allows searching in UTF-8 and ISO-Latin-1 encoded manual pages if the respective mandoc.db(5) has been built with "makewhatis(8) -T utf8".

Issue found while investigating a question from Valid-Amirali-Averiva at rambler dot ru, who is using mandoc on FreeBSD to process documents containing Cyrillic letters.
Diffstat (limited to 'gnu/usr.bin')
0 files changed, 0 insertions, 0 deletions