path: root/usr.bin/mandoc/roff.c
Age / Author / Commit message
2019-07-01 Ingo Schwarze: delete trailing whitespace and space-tab sequences; no code change;
patch from Michal Nowak <mnowak at startmail dot com> who found these with git pbchk in the illumos tree
2019-04-21 Ingo Schwarze: When calling an empty macro, do not clobber existing arguments.
Fixing a bug found with the groffer(1) version 1.19 manual page following a report from Jan Stary.
2019-04-21 Ingo Schwarze: Implement the roff .break request (break out of a .while loop).
Jan Stary <hans at stare dot cz> found it in an ancient groffer(1) manual page (version 1.19) on MacOS X Mojave. Having .break not implemented wasn't a particularly bright idea because obviously, it tended to cause infinite loops.
2019-02-06 Ingo Schwarze: Let roff_getname() end the roff identifier at a tab character
and audit all its callers whether termination is handled correctly. Resulting improvements:
* An escape or tab ending the macro name in a macro invocation is discarded, and argument processing is started after it.
* An escape or tab ending a name in ".if d" and ".if r" is preserved.
* An escape ending a name in ".ds" causes the whole request to be ignored.
* A tab ending a name in ".ds" becomes part of the string.
* An escape or tab ending a name in ".rm" causes the rest of the line to be ignored.
* An escape or tab ending the first name in ".als", ".rn", or ".nr" causes the whole request to be ignored.
Kurt Jaeger <pi at FreeBSD> made me aware of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235456#c0 and in that bug report, comment 0 item (3) is a special case of this class of issues. Yes, the "mh" manual pages are no doubt among the worst on the planet.
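The termination rule described in this entry can be sketched as follows. This is only an illustration: name_len() is a hypothetical helper, not the actual roff_getname(), and it assumes that a name ends at the first blank, tab, backslash (the start of an escape), or at the end of the string.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the real roff_getname(): return the
 * length of the identifier starting at s.  The name is assumed to
 * end at the first blank, tab, backslash, or NUL byte.
 */
size_t
name_len(const char *s)
{
    size_t n;

    for (n = 0; s[n] != '\0'; n++)
        if (s[n] == ' ' || s[n] == '\t' || s[n] == '\\')
            break;
    return n;
}
```

What a caller does with the terminating character (discard it, preserve it, or ignore the request) is the per-request policy listed above and is not modelled here.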
2019-02-06 Ingo Schwarze: adjust style and comments in roff_getname(); no functional change
2019-01-05 Ingo Schwarze: no-fill mode has to be suspended during tbl(7) rendering, too
2019-01-05 Ingo Schwarze: Some high-level block macros have an effect similar to temporarily
suspending no-fill mode during their head. Model this with an additional roff parser state flag ROFF_NONOFILL. That is much simpler than it would be to save and restore the ROFF_NOFILL flag itself, in particular since the latter can be switched (with lasting effect) by the .nf and .fi requests even while its effect is temporarily suspended. This commit does not change formatting yet, but prepares for future formatting simplifications and improvements.
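A minimal sketch of why a second flag is simpler than saving and restoring the first. The flag names ROFF_NOFILL and ROFF_NONOFILL come from this entry; the bit values and the helper nofill_active() are assumptions made for illustration only.

```c
/* Flag names follow the commit message; the bit values are assumed. */
#define ROFF_NOFILL   (1 << 0)  /* switched on by .nf, off by .fi */
#define ROFF_NONOFILL (1 << 1)  /* no-fill temporarily suspended */

/* Hypothetical helper: is no-fill mode in effect right now? */
int
nofill_active(int flags)
{
    return (flags & ROFF_NOFILL) && !(flags & ROFF_NONOFILL);
}
```

With this layout, a .fi request seen during the suspension simply clears ROFF_NOFILL, and the change is still in effect once ROFF_NONOFILL is cleared again; no saved copy can go stale.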
2018-12-31 Ingo Schwarze: Store the fill mode with a new flag NODE_NOFILL in every node,
like it is already done with NODE_SYNPRETTY, such that the fill mode becomes more directly available to the formatters. Not used yet, but will be used by upcoming commits.
2018-12-31 Ingo Schwarze: Move parsing of the .nf and .fi (fill mode) requests from the man(7)
parser to the roff(7) parser. As a side effect, .nf and .fi are now also parsed in mdoc(7) input, though the mdoc(7) formatters still ignore most of their effect.
2018-12-31 Ingo Schwarze: Cleanup, minus 15 LOC, no functional change:
Simplify the way the man(7) and mdoc(7) validators are called. Reset the parser state with a common function before calling them. There is no need to reset the parser state again afterwards because the parsers are no longer used after validation. This allows getting rid of man_node_validate() and mdoc_node_validate() as separate functions.
2018-12-30 Ingo Schwarze: Cleanup, no functional change:
The struct roff_man used to be a bad mixture of internal parser state and public parsing results. Move the public results to the parsing result struct roff_meta, which is already public. Move the rest of struct roff_man to the parser-internal header roff_int.h. Since the validators need access to the parser state, call them from the top level parser during mparse_result() rather than from the main programs, also reducing code duplication. This keeps parser internal state out of the main programs (five in mandoc portable) and out of eight formatters.
2018-12-21 Ingo Schwarze: Rename mandoc_getarg() to roff_getarg() and pass it the roff parser
struct as an argument such that after copy-in, it can call roff_expand() once again, which used to be called roff_res() before this. This fixes a subtle low-level roff(7) parsing bug reported by Fabio Scotoni <fabio at esse dot ch> in the 4.4BSD-Lite2 mdoc.samples(7) manual page, because that page used an escaped escape sequence in a macro argument. To expand escaped escape sequences in quoted mdoc(7) arguments, too, stop bypassing the call to roff_getarg() in mdoc_argv.c, function args() for this case. This does not solve the case of escaped escape sequences in quoted .Bl -column phrases yet. Because roff_expand() can make the string longer, roff_getarg() can no longer operate in-place but needs to malloc(3) the returned string. In the high-level parsers, free(3) that string after processing it.
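The last point, that an expanding reader cannot work in place, can be illustrated with a small sketch. expand_copy() is a hypothetical stand-in, not roff_getarg() or roff_expand(): it assumes a toy syntax in which every '@' in the input is replaced by a replacement string, so the result may be longer than the input and has to be allocated.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an expanding argument reader.
 * Every '@' in src is replaced by rep.  Because the result can be
 * longer than src, it is returned in freshly allocated memory that
 * the caller must free(3).  Returns NULL if allocation fails.
 */
char *
expand_copy(const char *src, const char *rep)
{
    size_t replen, len;
    const char *p;
    char *dst, *q;

    /* First pass: compute the length of the expanded string. */
    replen = strlen(rep);
    len = 0;
    for (p = src; *p != '\0'; p++)
        len += (*p == '@') ? replen : 1;

    if ((dst = malloc(len + 1)) == NULL)
        return NULL;

    /* Second pass: copy, substituting as we go. */
    q = dst;
    for (p = src; *p != '\0'; p++) {
        if (*p == '@') {
            memcpy(q, rep, replen);
            q += replen;
        } else
            *q++ = *p;
    }
    *q = '\0';
    return dst;
}
```

A caller in the style described above would process the returned string and then pass it to free(3).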
2018-12-20 Ingo Schwarze: Bugfix:
When after a \\, \t, or \a, another \t or \a had to be resolved in copy mode within the same argument, the argument got corrupted. Found while working on a loosely related bug report from Fabio Scotoni <fabio at esse dot ch>.
2018-12-18 Ingo Schwarze: As a first step towards making roff_res() callable from mandoc_getarg(),
move the function mandoc_getarg() from mandoc.c to roff.c. It was misplaced in mandoc.c in the first place; that file is intended for utilities needed both by parsers and by formatters, while reading macro arguments in copy mode is purely a task of the roff(7) parser. Needed as a preliminary for an upcoming bugfix. No code change.
2018-12-15 Ingo Schwarze: Several improvements to escape sequence handling.
* Add the missing special character \_ (underscore).
* Partial implementations of \a (leader character) and \E (uninterpreted escape character).
* Parse and ignore \r (reverse line feed).
* Add a WARNING message about undefined escape sequences.
* Add an UNSUPP message about unsupported escape sequences.
* Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported.
* Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets.
* For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid.
* In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names.
* Starting a special character name with a blank is an error.
* Undefined escape sequences never abort formatting of the input string, not even in HTML output mode.
* Document the newly handled escapes, and a few that were missing.
* Regression tests for most of the above.
2018-12-14 Ingo Schwarze: Cleanup, no functional change:
Now that message handling is properly encapsulated, remove struct mparse pointers from four structs (roff, roff_man, tbl_node, eqn_node) and from the argument lists of five functions (roff_alloc, roff_man_alloc, mandoc_getarg, tbl_alloc, eqn_alloc). Except for being passed to the main program as an opaque object, it now only occurs in read.c, as it should, and not across 15 files like in the past.
2018-12-14 Ingo Schwarze: Almost mechanical diff to remove the "struct mparse *" argument
from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
2018-12-13 Ingo Schwarze: Cleanup, no functional change:
Split the top level parser interface out of the utility header mandoc.h, into a new header mandoc_parse.h, for use in the main program and in the main parser only. Move enum mandoc_os into roff.h because struct roff_man is the place where it is stored. This allows removal of mandoc.h from seven files in low-level parsers and in formatters.
2018-12-13 Ingo Schwarze: Cleanup, no functional change:
No need to expose the eqn(7) syntax tree data structures everywhere. Move them to their own include file, "eqn.h". While here, delete the unused enum eqn_pilet.
2018-12-13 Ingo Schwarze: Cleanup, no functional change:
In libroff.h, nothing was left except the eqn(7) parser interface, which isn't really part of the roff(7) parser, so rename it to eqn_parse.h. While here, move struct eqn_def to eqn.c because that's the only file using it, and let eqn_box_free() and eqn_free() handle NULL.
2018-12-13 Ingo Schwarze: Cleanup, no functional change:
Move tbl(7)-specific parser internals out of libroff.h. Move some tbl(7)-internal processing from roff.c to tbl.c.
2018-12-12 Ingo Schwarze: Cleanup, no functional change:
No need to expose the tbl(7) syntax tree data structures everywhere. Move them to their own include file, "tbl.h", and improve comments.
2018-12-04 Ingo Schwarze: Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner. This resolves some erratic duplicate handling, handles a number of missing cases, and improves diagnostics in various respects. Move validation of .br and .sp to the roff validation module rather than doing that twice in the mdoc and man validation modules. Move the node relinking function to the roff library where it belongs. In validation functions, only look at the node itself, at previous nodes, and at descendants, not at following nodes or ancestors, such that only nodes are inspected which are already validated.
2018-11-26 Ingo Schwarze: When a conditional block is closed by putting "\}" on a text line
by itself (which is somewhat unusual but not invalid; most authors use the empty macro line ".\}" instead), agree more closely with groff and do not produce a double space in the output. Quirk reported by millert@. While here, tweak the rest of the function body of roff_cond_text() to more closely match roff_cond_sub(). The subtly different handling could make people (including myself) wonder whether there is any point in being different. Testing shows there is not.
2018-10-25 Ingo Schwarze: Implement the \f(CW and \f(CR (constant width font) escape sequences
for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
2018-08-25 Ingo Schwarze: Rudimentary implementation of the roff(7) .char (output glyph
definition) request, used for example by groff_hdtbl(7). This simplistic implementation may interact incorrectly with the .tr (input character translation) request. But come on, you are not only using .char *and* .tr, but you do so with respect to the same character in the same manual page?
2018-08-24 Ingo Schwarze: Rudimentary implementation of the roff(7) .while request.
Needed for example by groff_hdtbl(7). There are two limitations: It does not support nested .while requests yet, and each .while loop must start and end in the same scope. The roff_parseln() return codes are now more flexible and allow OR'ing options.
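One common way to make return codes accept OR'ed options is to reserve the low bits for the basic action and the higher bits for independent option flags. The sketch below shows that general technique only; all names and values (PL_IGN, PL_CONT, PL_MASK, PL_WHILE, PL_LOOPEXIT, pl_action(), pl_has()) are hypothetical and are not the constants used in roff.c.

```c
/* Hypothetical names and values, not the constants used in roff.c. */
#define PL_IGN      0x01  /* basic action: ignore this line */
#define PL_CONT     0x02  /* basic action: continue parsing this line */
#define PL_MASK     0x0f  /* bits reserved for the basic action */
#define PL_WHILE    0x10  /* option: this line started a loop */
#define PL_LOOPEXIT 0x20  /* option: leave the current loop */

/* Extract the basic action from a combined return code. */
int
pl_action(int rc)
{
    return rc & PL_MASK;
}

/* Test whether an option flag is set in a combined return code. */
int
pl_has(int rc, int opt)
{
    return (rc & opt) != 0;
}
```

A caller can then dispatch on pl_action(rc) as before and check the loop-related options separately.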
2018-08-23 Ingo Schwarze: Implement the roff(7) .shift and .return requests,
for example used by groff_hdtbl(7) and groff_mom(7). Also correctly interpolate arguments during nested macro execution even after .shift and .return, implemented using a stack of argument arrays. Note that only read.c, but not roff.c can detect the end of a macro execution, and the existence of .shift implies that arguments cannot be interpolated up front, so unfortunately, this includes a partial revert of roff.c rev. 1.209, moving argument interpolation back into the function roff_res().
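The idea of a stack of argument arrays can be sketched as follows. All names here (struct argframe, struct argstack, arg_push(), arg_pop(), arg_shift(), arg_get(), MAXDEPTH) are hypothetical and the fixed depth limit is an arbitrary simplification; the sketch only shows why arguments have to be looked up at interpolation time once .shift exists.

```c
#include <stddef.h>

#define MAXDEPTH 8  /* arbitrary nesting limit for this sketch */

struct argframe {
    const char **argv;  /* arguments of one macro call */
    int argc;
};

struct argstack {
    struct argframe frames[MAXDEPTH];
    int depth;          /* number of macro calls in progress */
};

/* Enter a macro call; returns -1 if nested too deeply. */
int
arg_push(struct argstack *st, const char **argv, int argc)
{
    if (st->depth >= MAXDEPTH)
        return -1;
    st->frames[st->depth].argv = argv;
    st->frames[st->depth].argc = argc;
    st->depth++;
    return 0;
}

/* Leave the innermost macro call (end of macro or .return). */
void
arg_pop(struct argstack *st)
{
    if (st->depth > 0)
        st->depth--;
}

/* .shift n: drop the first n arguments of the innermost call. */
void
arg_shift(struct argstack *st, int n)
{
    struct argframe *f;

    if (st->depth == 0 || n <= 0)
        return;
    f = &st->frames[st->depth - 1];
    if (n > f->argc)
        n = f->argc;
    f->argv += n;
    f->argc -= n;
}

/* Look up argument n (1-based) of the innermost call, or "". */
const char *
arg_get(const struct argstack *st, int n)
{
    const struct argframe *f;

    if (st->depth == 0)
        return "";
    f = &st->frames[st->depth - 1];
    return (n >= 1 && n <= f->argc) ? f->argv[n - 1] : "";
}
```

Because arg_shift() changes what the first argument refers to while the macro body is still running, substituting all arguments before execution would give the wrong result; the lookup has to happen when each reference is expanded.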
2018-08-21 Ingo Schwarze: Implement the \\$@ escape sequence (insert all macro arguments,
quoted) in addition to the already supported \\$* (similar, but unquoted). Then use \\$@ to improve the implementation of the .als request (macro alias). Needed by groff_hdtbl(7). Gosh, it feels like the manual pages of the groff package are exercising every bloody roff(7) feature under the sun. In the manual page source code itself, not merely in the implementation of the used macro packages, that is.
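The difference between the quoted and the unquoted form can be sketched with a small joining function. join_args() is a hypothetical helper, not code from roff.c, and it deliberately ignores the question of how double quotes inside an argument would have to be escaped.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join argc arguments into buf, separated by
 * single blanks.  If quoted is non-zero, wrap each argument in
 * double quotes (the quoted form); otherwise insert the arguments
 * bare (the unquoted form).  Embedded quotes are not handled.
 * Returns 0 on success, -1 if buf is too small.
 */
int
join_args(char *buf, size_t sz, const char **argv, int argc, int quoted)
{
    const char *q = quoted ? "\"" : "";
    size_t used = 0;
    int i, n;

    if (sz == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < argc; i++) {
        n = snprintf(buf + used, sz - used, "%s%s%s%s",
            i > 0 ? " " : "", q, argv[i], q);
        if (n < 0 || (size_t)n >= sz - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For the arguments `one` and `two words`, the unquoted form yields `one two words`, which reparses as three arguments, while the quoted form yields `"one" "two words"` and keeps the original argument boundaries. That is presumably why the quoted form suits forwarding arguments through an alias.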
2018-08-20 Ingo Schwarze: Expand \n(.$ (the number of macro arguments) right in roff_userdef(),
before even reparsing the expanded macro. That is the least dirty way to fix the bug that \n(.$ remained set after execution of the user-defined macro ended. Any other way to fix it would probably require changes to read.c, which really shouldn't be bothered with such roff(7) internals.
2018-08-19 Ingo Schwarze: Mostly complete implementation of the 'c' (character available)
roff conditional, except that the .char request still isn't supported and that behaviour differs from groff in many edge cases. But at least valid character names and numbers are now distinguished from invalid ones. This also fixes the bug that parsing of the 'c' conditional was incomplete, which resulted in leaking the tested character to the input parser at the beginning of the body when the condition was inverted.
2018-08-18 Ingo Schwarze: Bugfix: When a line ends with '\ \"', don't strip the trailing space
because that turned it into a bogus line continuation.
2018-08-18 Ingo Schwarze: support the highly surprising escape sequence \# (line continuation
with comment); used for example by gropdf(1)
2018-08-18 Ingo Schwarze: implement the GNU man-ext .SY/.YS (synopsis block) macro in man,
used in most manual pages of the groff package
2018-08-16 Ingo Schwarze: implement the GNU man-ext .TQ macro in man(7),
used for example by groff_diff(7)
2018-08-16 Ingo Schwarze: Implement the \*(.T predefined string (interpolate device name)
by allowing the preprocessor to pass it through to the formatters. Used for example by the groff_char(7) manual page.
2018-08-10 Ingo Schwarze: Implement the roff(7) .nop (no operation) request.
Examples of manual pages (ab)using it include groff(7), chem(1), groff_mom(7), and groff_hdtbl(7).
2018-08-01 Ingo Schwarze: After rewriting the parse buffer from scratch, we also have to reset
the parse point to the beginning of the new buffer or we risk out of bounds accesses. Bug found by Leah Neukirchen <leah at vuxu dot org> with valgrind on Void Linux.
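The class of bug fixed here can be sketched in a few lines. struct buf and buf_rewrite() are hypothetical and much simpler than the real parse buffer handling; the point is only that an offset into the old buffer has no meaning for the new one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified parse buffer. */
struct buf {
    char *data;
    size_t sz;   /* allocated size, including the NUL byte */
};

/*
 * Replace the buffer content with a fresh copy of text and reset
 * the parse position.  Without the reset, *pos could point past
 * the end of a new buffer that is shorter than the old one.
 * Returns 0 on success, -1 if allocation fails.
 */
int
buf_rewrite(struct buf *b, size_t *pos, const char *text)
{
    size_t len = strlen(text) + 1;
    char *n;

    if ((n = malloc(len)) == NULL)
        return -1;
    memcpy(n, text, len);
    free(b->data);
    b->data = n;
    b->sz = len;
    *pos = 0;    /* the old offset is meaningless for the new buffer */
    return 0;
}
```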
2018-04-11 Ingo Schwarze: preserve comments before .Dd when converting mdoc(7) to man(7)
with mandoc -Tman; suggested by Thomas Klausner <wiz at NetBSD>
2018-04-10 Ingo Schwarze: Two new low-level roff(7) features:
* .nr optional third argument (auto-increment step size)
* \n+ and \n- numerical register auto-increment and -decrement
bentley@ reported on Dec 9, 2013 that lang/sbcl(1) uses these.
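A minimal sketch of the register semantics involved, assuming the usual roff behaviour that \n+ and \n- first change the register by its step size and then interpolate the new value. struct reg, reg_set(), and reg_get() are hypothetical names, not the data structures used in roff.c.

```c
/* Hypothetical number register with an auto-increment step. */
struct reg {
    int val;
    int step;
};

/* Models ".nr name val step". */
void
reg_set(struct reg *r, int val, int step)
{
    r->val = val;
    r->step = step;
}

/*
 * Models register interpolation: sign > 0 for \n+, sign < 0 for
 * \n-, sign == 0 for plain \n.  The step is applied before the
 * value is returned.
 */
int
reg_get(struct reg *r, int sign)
{
    if (sign > 0)
        r->val += r->step;
    else if (sign < 0)
        r->val -= r->step;
    return r->val;
}
```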
2018-04-09 Ingo Schwarze: When accessing an undefined number register, define it to be zero, like
the previous commit for strings and macros, only technically simpler. Desired behaviour also mentioned by Werner Lemberg in 2011. This diff adds functionality but is -21 +19 LOC. :-)
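The define-on-access behaviour can be sketched as a lookup that creates missing entries. The fixed-size table and the names struct regtab, reg_lookup(), and MAXREG are assumptions for illustration; the real parser uses its own data structures.

```c
#include <string.h>

#define MAXREG 16  /* arbitrary table size for this sketch */

/* Hypothetical fixed-size register table. */
struct regtab {
    char name[MAXREG][8];
    int val[MAXREG];
    int n;         /* number of defined registers */
};

/*
 * Return a pointer to the value of the named register.  If the
 * register does not exist yet, define it with the value zero.
 * Returns NULL if the table is full or the name is too long.
 */
int *
reg_lookup(struct regtab *t, const char *name)
{
    int i;

    for (i = 0; i < t->n; i++)
        if (strcmp(t->name[i], name) == 0)
            return &t->val[i];
    if (t->n >= MAXREG || strlen(name) >= sizeof(t->name[0]))
        return NULL;
    strcpy(t->name[t->n], name);
    t->val[t->n] = 0;
    return &t->val[t->n++];
}
```

After the first access, a test for whether the register is defined would succeed, which is the observable effect this entry describes.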
2018-04-09 Ingo Schwarze: Using an undefined string or macro will cause it to be defined as empty.
Observed by Werner Lemberg on Nov 14, 2011 and rotting on my TODO list ever since.
2017-07-14 Ingo Schwarze: The .Dd and .TH macros must interrupt .ce, too;
fixing tree corruption and assertion failure found by jsg@ with afl(1)
2017-07-14 Ingo Schwarze: Explicitly initialize a variable where the compiler is (understandably)
unable to figure out that it is never used uninitialized. While here, tweak the content of the variable to make its usage easier to understand. No functional change.
2017-07-13 Ingo Schwarze: eqn(7) .EQ has to break man(7) next-line scope, or tree corruption
and use after free may ensue; again found by jsg@ with afl(1)
2017-07-08 Ingo Schwarze: Simplify by creating struct roff_node syntax tree nodes for tbl(7)
right from roff_parseln() rather than delegating to read.c, similar to what I just did for eqn(7). The interface function roff_span() becomes obsolete and is deleted, the former interface function roff_addtbl() becomes static, the interface functions tbl_read() and tbl_cdata() become void, and minus twelve lines of code. No functional change.
2017-07-08 Ingo Schwarze: fix an assertion failure triggered by .ce in next-line scope;
found by jsg@ with afl(1)
2017-07-08 Ingo Schwarze:
1. Eliminate struct eqn, instead use the existing members of struct roff_node which is allocated for each equation anyway.
2. Do not keep a list of equation parsers, one parser is enough.
Minus fifty lines of code, no functional change.
2017-07-04 Ingo Schwarze: Fix handling of \} on roff request lines.
Cures bogus error messages in pages generated with pod2man(1).
2017-06-25 Anthony J. Bentley: Add support for the MT and ME mailto macros, used for example in wg(8).
feedback and ok schwarze@