Age | Commit message (Collapse) | Author |
|
amendments to his diff are noted on tech
|
|
line, use the current output position as the reference position
for tabs on that input line. This brings mandoc in line with the
behaviour of GNU, Heirloom, and Plan 9 roff.
|
|
whatsoever (for example \fR) and escape sequences that produce
invisible zero-width output (for example \&). No, i'm not joking,
groff does make that distinction, and it has consequences in some
situations, for example for vertical spacing in no-fill mode.
Heirloom and Plan 9 behaviour is subtly different, but in case of
doubt, we want to follow groff.
While this fixes the behaviour for the majority of escape sequences,
in particular for those most likely to occur in practice, it is not
perfect yet because some of the more exotic ESCAPE_IGNORE sequences
are actually of the "no output whatsoever" type but treated
as "invisible zero-width" for now. With the new ASCII_NBRZW mechanism
in place, switching them over one by one when the need arises will
no longer be very difficult.
|
|
not to *output* lines. In particular, if an input line gets broken in
fill mode and a tab occurs in the second output line, it advances to a
position of at least (width of the first output line) + (width of a
space character even though this is never printed) + (width of the part
of the second output line that precedes the tab).
Implement the same logic in mandoc.
Again, do not use tabs in filled text: they have surprising effects,
including this one.
|
|
non-breakable in exactly the same way as "\ ". That is, the preceding
word, the tab character, and the following word are always kept
together on the same output line. If filling is enabled and an
output line break is required before the end of the following word,
the break occurs before the beginning of the preceding word.
Make mandoc behave in the same way.
Of course, using literal tab characters in filled text remains a
bad idea, and the "WARNING: tab in filled text" remains unchanged.
|
|
1. The combination \z\h is a no-op whatever the argument may be.
In the past, the \z only affected the first space character generated
by the \h, which was wrong.
2. For the conbination \zX\h with a positive argument, the first
space resulting from the \h is not printed but consumed by the \z.
3. For the combination \zX\h with a negative argument, application
of the \z needs to be completed before the \h can be started.
In the past, if this combination occurred at the beginning of an
output line, the \h backed up to the beginning of the line and
after that, the \z attempted to back up even further, triggering
an assertion.
Bugs found during an audit of assignments to termp->col that i
started after the bugfix tbl_term.c rev. 1.65. The assertion
triggered by bug 3 was *not* yet found by afl(1).
|
|
sequence in -T ps and -T pdf output mode, use an appropriate
horizontal distance by correctly using the term_len() utility
function. Output from the -T ascii, -T utf8, and -T html modes
was already correct and remains unchanged.
Lennart Jablonka <hummsmith42 at gmail dot com> found and reported
this unit conversion bug (misinterpreting AFM units as if they were
en units) when rendering scdoc-generated manuals (which is a low
quality generator, but that's no excuse for mandoc misformatting \h)
on Alpine Linux. Lennart also tested this patch.
|
|
and resetting the internal state to the initial state.
Call this function from the proper place in term_free().
With the way the module is currently used, this does not imply any
functional change, but doing proper cleanup is more robust, makes
it easier during code review to understand what is going on, and
makes it explicit that there is no memory leak.
|
|
in the tbl(7) layout font modifier.
Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use
the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead,
which simplifies and unifies some code.
While here, also support CB and CI in roff(7) \f escape sequences
and in roff(7) .ft requests for all output modes. Using those is
certainly not recommended because portability is limited even with
groff, but supporting them makes some existing third-party manual
pages look better, in particular in HTML output mode.
Bug-compatible with groff as far as i'm aware, except that i consider
font names starting with the '\n' (ASCII 0x0a line feed) character
so insane that i decided to not support them.
Missing feature reported by nabijaczleweli dot xyz in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002.
I used none of the code from the initial patch submitted by
nabijaczleweli, but some of their ideas.
Final patch tested by them, too.
|
|
While here, drop two unused arguments from the function term_field();
the related work was already done by term_fill() before this commit.
I found the bug in an afl run
that was performed by Jan Schreiber <jes at posteo dot de>.
|
|
are exhaustive. While there is no bug, being explicit has no downside
is is potentially safer for the future.
Michal Nowak <mnowak at startmail dot com> reported that gcc 4.4.4
and 7.4.0 on illumos throw -Wuninitialized false positives.
|
|
inter-word spacing, let's try again with 250 AFM units.
Regression caused during my recent term_flushln() reorg in rev. 1.138,
reported by brynet@ (sorry and many thanks for reporting).
|
|
the terminal filling routine, controlled by new flags TERMP_CENTER
and TERMP_RIGHT.
This became possible by the recent term_flushln() rewrite.
No functional change yet, but to be used by upcoming commits.
|
|
This function has always been among the most complicated parts of
mandoc, and it repeatedly needed substantial functional enhancements.
The present rewrite is required to prepare for the implementation
of simultaneous filling and centering of output lines.
The previous implementation looked at each word in turn and printed
it to the output stream as soon as it was found to still fit on the
current output line. Obviously, that approach neither allows
centering nor adjustment to the right margin.
The new implementation first decides which part of the paragraph
to put onto the current output line, also measuring the display
width of that part, even if that part consists of multiple words
including intervening whitespace. This will allow moving the whole
output line to the right as desired before printing it, for example
to center it or to adjust it to the right margin.
The function is split into three parts, each much shorter, solving a
better defined task, much easier to understand and better commented:
1. the steering function term_flushln() looping over output lines;
2. the calculation function term_fill() looping over input characters;
3. and the output function term_field() looping over printed characters.
No functional change yet.
|
|
* Add the missing special character \_ (underscore).
* Partial implementations of \a (leader character)
and \E (uninterpreted escape character).
* Parse and ignore \r (reverse line feed).
* Add a WARNING message about undefined escape sequences.
* Add an UNSUPP message about unsupported escape sequences.
* Mark \! and \? (transparent throughput)
and \O (suppress output) as unsupported.
* Treat the various variants of zero-width spaces as one-byte escape
sequences rather than as special characters, to avoid defining bogus
forms with square brackets.
* For special characters with one-byte names, do not define bogus
forms with square brackets, except for \[-], which is valid.
* In the form with square brackets, undefined special characters do not
fall back to printing the name verbatim, not even for one-byte names.
* Starting a special character name with a blank is an error.
* Undefined escape sequences never abort formatting of the input
string, not even in HTML output mode.
* Document the newly handled escapes, and a few that were missing.
* Regression tests for most of the above.
|
|
for HTML output. Somewhat relevant because pod2man(1) relies on this.
Missing feature reported by Pali dot Rohar at gmail dot com.
Note that constant width font was already correctly selected before
this when required by semantic markup. Only attempting physical
markup with the low-level escape sequence was ineffective.
|
|
by allowing the preprocessor to pass it through to the formatters.
Used for example by the groff_char(7) manual page.
|
|
OK schwarze
|
|
used for example by zoem(1)
|
|
|
|
in horizontal orientation in the terminal formatter
|
|
|
|
inside individual table cells that contain text blocks.
This cures overlong lines in various Xenocara manuals.
|
|
a pointer to the end of the parsed data, making it easier to
parse subsequent bytes
|
|
second step: make the per-column byte pointer persistent across
term_flushln() calls, such that a subsequent call can continue at
the point where the previous call left. If more than one column
is in use, return from term_flushln() when the column is full,
rather than breaking the output line.
No functional change, because nothing sets up multiple columns yet.
|
|
first step: split column data out of the terminal state struct into
a new column state struct and use an array of such column state
structs. No functional change.
|
|
and after that, previously written output gets overwritten, but
overwriting with blanks does *not* erase previously written content.
Yes, manual pages exist that are crazy enough to rely on that...
|
|
The Tcl/Tk manual pages use this extensively.
Delete the TERM_MAXMARGIN hack, it breaks .mc inside .nf;
instead, implement a proper TERMP_BRNEVER flag.
|
|
Eliminate the "overstep" state variable.
The information is already contained in "viscol".
Minus 60 lines of code, no functional change intended.
|
|
A full implementation would require access to output device properties
and state variables (both only available after the main parser has
finalized the parse tree) before numerical expansions in the roff
preprocessor (i.e., before the main parser is even started).
Not trying to pull that stunt right now because the static-width
implementation committed here is sufficient for tcl-style manual pages
and already more complicated than i would have suspected.
|
|
Good enough to cope with the average DocBook insanity.
|
|
This is the first feature made possible by the parser reorganization.
Improves the formatting of the SYNOPSIS in many Xenocara GL manuals.
Also important for ports, as reported by many, including naddy@.
|
|
Reported by jsg@ after an afl(1) run long ago.
|
|
sequences that jsg@ found with afl(1):
* Avoid writing \t\b in term.c.
* Handle trailing \b in term_ps.c.
|
|
"the" with the obviously intended word.
Started with a "the the" spotted by Mihal Mazurek.
|
|
where only sizeof(enum termfont) is needed.
Fixes CID 1288941. From christos@ via wiz@, both at NetBSD.
|
|
fixing input like \fB\('e; issue reported by bentley@
|
|
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module:
There is no need to pass a pointer around, we most certainly
never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in;
that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_chars(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail
that originally triggered this patch).
No functional change, minus 100 LOC, and i don't see a performance change.
|
|
that were right between two adjacent case statement. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.
|
|
|
|
|
|
in mdoc(7) .Bl -tag and man(7) .TP, but not in man(7) .IP.
Quirk reported by Jan Stary <hans at stare dot cz> on ports@.
|
|
escape sequences; that's cleaner for all output modes, and it's required
to prevent the PostScript/PDF formatter from dying on assertions.
Bug found by jsg@ with afl.
|
|
implementation. As a side effect, minus ten lines of code.
As another side effect, this also fixes the assertion failure that
used to be triggered by "\z\o'ab'c" at the beginning of an output
line, found by jsg@ with afl (test case 022/Apr27).
|
|
There is a first rounding to basic units on the input side.
After that, rounding rules differ between requests and macros.
Requests round to the nearest possible character position.
Macros round to the next character position to the left.
Implement that by changing the return value of term_hspan()
to basic units and leaving the second scaling and rounding stage
to the formatters instead of doing it in the terminal handler.
Improves for example argtable2(3).
|
|
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written of the train from London to Exeter on the way to p2k15.
|
|
|
|
font stack. The latter fail after the stack is grown with realloc().
Fixing an assertion failure found by jsg@ with afl some time ago
(test case number 51).
|
|
This is of some relevance because the pod2man(1) preamble abuses it
for the icelandic letter Thorn, instead of simply using \(TP and \(Tp.
Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
|
|
Not exactly recommended for use, rather for groff compatibility.
While here, introduce similar SHRT_MAX limits as in man(7),
fixing a few cases of infinite output found by jsg@ with afl.
|