path: root/usr.bin/mandoc/html.c
Age  Author  Commit message
2017-02-05  Ingo Schwarze
mark up .Ar, .Fa, .Va, .Ft, and .Vt with <var> rather than <i>; suggested by bentley@ long ago, but needed lots of cleanup first
2017-02-05  Ingo Schwarze
for .Rs, use <cite>
2017-02-05  Ingo Schwarze
Improve <table> syntax: The <col> element can only appear inside <colgroup>, so use <colgroup>. The <tbody> element is optional and useless, so don't use it. Even if we ever needed <thead> or <tfoot>, <tbody> would still be optional and useless; besides, we will likely never need <thead> or <tfoot>, simply because our languages don't support such functionality.
2017-01-29  Ingo Schwarze
eliminate one useless struct and one level of indirection; no functional change
2017-01-28  Ingo Schwarze
Simplify usage of print_otag() even more: accept NULL to skip the attribute or format.
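The "NULL means skip" convention can be illustrated with a minimal sketch. The helper below is hypothetical and is not mandoc's print_otag(); its name, parameters, and buffer-based output are assumptions chosen only to show the idea that a NULL attribute value produces no attribute at all.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not mandoc's print_otag(): format an opening
 * tag into buf, omitting any attribute whose value is NULL.
 * Returns the length of the complete tag, as snprintf(3) does.
 */
int
open_tag(char *buf, size_t bufsz, const char *name,
    const char *cls, const char *id)
{
	assert(buf != NULL && bufsz > 0 && name != NULL);
	return snprintf(buf, bufsz, "<%s%s%s%s%s%s%s>", name,
	    cls != NULL ? " class=\"" : "",	/* class attribute, if any */
	    cls != NULL ? cls : "",
	    cls != NULL ? "\"" : "",
	    id != NULL ? " id=\"" : "",		/* id attribute, if any */
	    id != NULL ? id : "",
	    id != NULL ? "\"" : "");
}
```

Callers that have no class or id simply pass NULL instead of building a separate code path for the attribute-free case.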
2017-01-26  Ingo Schwarze
Fix -man -Thtml formatting after .nf (which has nothing to do with "literal", by the way, it means "no fill"):
* Use <pre> such that whitespace is preserved.
* Preserve line breaks.
* For font alternating macros, avoid node recursion which required scary juggling with the fill state. Instead, simply print the text children directly.
Missing feature first noticed by kristaps@ in 2011, then again reported by afresh1@ in 2016, and finally reported here: https://github.com/Debian/debiman/issues/21 , which i only found because of Shane Kerr's comment here: https://plus.google.com/110314300533310775053/posts/H1eaw9Yskoc
2017-01-25  Ingo Schwarze
Improve HTML formatting of .Bl -tag.
In particular, when using the style sheet, put the body on the same line as the head for short heads, or on the next line for long heads, in a way that preserves both correct indentation and correct vertical spacing with and without -compact, and with one or more heads per body (hi, Zaphod) - eight use cases so far - and with and without -tag, and with and without -offset, 32 use cases grand total.
Using many ideas from zhuk@, from <David dot Dahlberg at fkie dot fraunhofer dot de>, and from Benny Lofgren <bl dash lists at lofgren dot biz>, and a few of my own. This is an excellent demonstration that CSS is an extremely hostile language, much more trapful and much harder to use than, say, C.
When matthew@ reported this in July 2014 (!), it was already a known issue, and i no longer remember for how long. My first serious attempt at fixing it (in November 2015) failed miserably. I'd love to see simplifications of both the generated HTML code and of the style sheet, but without breaking any of the 32 use cases, please.
2017-01-21  Ingo Schwarze
slightly simplify header and footer styles
2017-01-19  Ingo Schwarze
clean up markup of .Bd, .D1, .Dl, .Li, and .Ql; in particular, stop abuse of <blockquote>
2017-01-19  Ingo Schwarze
Clean up CSS rules for sections and paragraphs. Start using real macro names for CSS classes.
2017-01-19  Ingo Schwarze
Implement line breaking of the generated HTML code at space characters in filled text. This does not affect HTML semantics, but makes the HTML code even more humanly readable. While here,
- collapse multiple consecutive space characters in filled text
- and insert a blank between style entries.
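Collapsing consecutive spaces is a small in-place string operation. The function below is an illustrative sketch only; its name and interface are assumptions, and it does not reproduce how the formatter itself tracks output columns or line breaks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: collapse every run of consecutive spaces in s
 * to a single space, in place.  Returns the new string length.
 */
size_t
collapse_spaces(char *s)
{
	size_t	 r, w;

	assert(s != NULL);
	w = 0;
	for (r = 0; s[r] != '\0'; r++) {
		/* Drop a space that directly follows a kept space. */
		if (s[r] == ' ' && w > 0 && s[w - 1] == ' ')
			continue;
		s[w++] = s[r];
	}
	s[w] = '\0';
	return w;
}
```

Because the write index never overtakes the read index, no second buffer is needed.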
2017-01-18  Ingo Schwarze
Make HTML output more human readable by overhauling line break logic around tags and by introducing some simple indentation. No change of HTML semantics intended.
2017-01-17  Ingo Schwarze
Completely delete the buf field of struct html and all the buf*() interfaces. Such a static buffer was a bad idea in the first place, causing unfixable truncation that was only prevented by triggering an assertion failure. Instead, let the small number of remaining users allocate and free their own, temporary dynamic buffers, or for the case of .Xr and .In, pass the original data to be assembled in print_otag().
2017-01-17  Ingo Schwarze
Simplify the usage of print_otag() by making it accept a variable number of arguments. Delete struct htmlpair and all the PAIR_*() macros. Delete enum htmlattr, handle that in print_otag() instead. Minus 190 lines of code; no functional change except better ordering of attributes (class before style) in three cases.
2017-01-08  Ingo Schwarze
style: missing blank between case statement and label; from Tiago Silva <tiagofilipesilva at icloud dot com> long ago
2015-12-25  Anthony J. Bentley
Generate simpler in-page links: just replace spaces with underscores.
So http://example.com/OpenBSD-current/man1/ls.1#x546865204c6f6e6720466f726d6174 becomes http://example.com/OpenBSD-current/man1/ls.1#The_Long_Format
ok schwarze@
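The old fragment in that example is the title "The Long Format" spelled out as hexadecimal byte values (54 68 65 20 ...), prefixed with "x". The replacement scheme can be sketched in a few lines; the function name and the malloc-based interface are assumptions made for illustration, not the code from this commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative only: return a newly allocated copy of title in which
 * every space is replaced by an underscore, or NULL if allocation
 * fails.  The caller frees the result.
 */
char *
make_anchor(const char *title)
{
	size_t	 i, len;
	char	*id;

	assert(title != NULL);
	len = strlen(title);
	if ((id = malloc(len + 1)) == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		id[i] = title[i] == ' ' ? '_' : title[i];
	id[len] = '\0';
	return id;
}
```

The result stays readable in a URL, which is the point of the change; the sketch makes no attempt to handle characters other than spaces.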
2015-10-13  Ingo Schwarze
Major character table cleanup:
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_chars(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch).
No functional change, minus 100 LOC, and i don't see a performance change.
2015-10-12  Ingo Schwarze
Fix an obvious bug found during the /* FALLTHROUGH */ cleanup: ASCII_NBRSP has to be rendered as "&nbsp;", not "-".
2015-10-12  Ingo Schwarze
To make the code more readable, delete 283 /* FALLTHROUGH */ comments that were right between two adjacent case statements. Keep only those 24 where the first case actually executes some code before falling through to the next case.
2015-10-06  Ingo Schwarze
modernize style: "return" is not a function; ok cmp(1)
2015-09-26  Ingo Schwarze
/* NOTREACHED */ after abort() is silly, delete it
2015-03-27  Ingo Schwarze
Actually use the new man.conf(5) "output" directive. Additional functionality, yet minus 45 lines of code.
2015-01-21  Ingo Schwarze
Rudimentary implementation of the roff(7) \o escape sequence (overstrike). This is of some relevance because the pod2man(1) preamble abuses it for the Icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
2014-12-20  Ingo Schwarze
resolve some code duplication; no functional change
2014-12-02  Ingo Schwarze
Fix the implementation and documentation of \c (continue text input line). In particular, make it work in no-fill mode, too. Reminded by Carsten dot Kunze at arcor dot de (Heirloom roff).
2014-12-01  Ingo Schwarze
The header libmandoc.h is part of the internal parser interface, but html.c is not part of the parser at all, so it cannot include that header, and actually, it doesn't need it. Found while auditing includes after Theo's recent *.h commit.
2014-10-29  Ingo Schwarze
In terminal output, unify handling of Unicode and numbered character escape sequences just like it was earlier implemented for -Thtml. Do not let control characters other than ASCII 9 (horizontal tab) propagate to the output, even though groff allows them; but that really doesn't look like a great idea. Let mchars_num2char() return int such that we can distinguish invalid \N syntax from \N'0'. This also reduces the danger of signed char issues popping up.
2014-10-28  Ingo Schwarze
Make the character table available to libroff so it can check the validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
2014-10-27  Ingo Schwarze
Handle output encoding for unicode, numbered and named escape sequences in one common, safe way instead of three different ways. In particular,
* skip NUL, it is used to mean "no output desired"
* deny 0x01-0x1F and 0x7F-0x9F, print REPLACEMENT CHARACTER instead
* print 0x20-0x7E literally or name-encoded, as required
* print characters above 0x9F numerically
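The four rules above map onto a single decision function. The following is a sketch under stated assumptions, not the code from this commit: the function name, the buffer-based interface, the choice of which ASCII characters get named entities (<, >, & and "), and the hexadecimal form of the numeric references are all illustrative.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sketch of the policy, not the actual mandoc code: write the HTML
 * form of code point cp into buf and return its length.
 */
int
encode_cp(char *buf, size_t bufsz, int cp)
{
	assert(buf != NULL && bufsz > 0);
	if (cp == 0) {			/* NUL: no output desired */
		buf[0] = '\0';
		return 0;
	}
	/* Control characters (and negative input) are denied. */
	if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f))
		return snprintf(buf, bufsz, "&#xFFFD;");
	switch (cp) {			/* name-encode HTML specials */
	case '<':
		return snprintf(buf, bufsz, "&lt;");
	case '>':
		return snprintf(buf, bufsz, "&gt;");
	case '&':
		return snprintf(buf, bufsz, "&amp;");
	case '"':
		return snprintf(buf, bufsz, "&quot;");
	default:
		break;
	}
	if (cp <= 0x7e)			/* printable ASCII: literal */
		return snprintf(buf, bufsz, "%c", cp);
	/* Everything above 0x9F: numeric character reference. */
	return snprintf(buf, bufsz, "&#x%X;", (unsigned int)cp);
}
```

Routing all three kinds of escape sequence through one such function is what makes the policy hard to bypass.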
2014-10-27  Ingo Schwarze
Fix a regression in term.c rev. 1.89 reported by bentley@: In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&). While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the using code. In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes, mchars_spec2cp() already does that.
2014-10-26  Ingo Schwarze
Improve -Tascii output for Unicode escape sequences: For the first 512 code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points. A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
2014-10-13  Charles Longeau
Add missing */ after $OpenBSD$ tag
ok schwarze@
2014-10-10  Ingo Schwarze
Partial eqn(7) rewrite by kristaps@ in order to get operator precedence right.
2014-10-09  Ingo Schwarze
parse and render "from" and "to" clauses in eqn, and render matrices; written by kristaps@ during EuroBSDCon
2014-10-09  Ingo Schwarze
initial bits of MathML rendering for eqn(7) -Thtml; written by kristaps@ during EuroBSDCon
2014-10-07  Ingo Schwarze
Switch HTML output to polyglot HTML5; have only one single -Thtml mode. Replace hard-coded widths and alignments with a minimal embedded stylesheet. Do not use <p> because it cannot appear inside block macros. Remove the "summary" attribute because it is not HTML5. Written by kristaps@ some months ago, finished during EuroBSDCon.
2014-08-14  Ingo Schwarze
Revert previous, as requested by kristaps@.
The .Bf block can contain subblocks, so it has to render as an element that can contain flow content. But <em> cannot contain flow content, only phrasing content. Rendering .Em and .Bf differently would be unfortunate, and closing out .Bf before subblocks and re-opening it afterwards would merely complicate both the C code of the program and the generated HTML code. Besides, converting .Em to semantic HTML markup would require some content to be put into <em> and some into <i>, but we cannot automatically distinguish which is which, so strictly speaking, we can't use semantic HTML here but have to fall back to physical markup. Wonders of HTML...
2014-08-13  Ingo Schwarze
Begin cleanup of scaling units. Note that we use 240u := 1i for all devices, even -Tps and -Tpdf.
* Big fix of -Tascii rendering of f, m, and u.
* Small fix of -Tascii rendering of c.
* Big fix of -Thtml rendering of u.
* Big fix of -Tps rendering of m, p, and u.
* Clarify -Tps rendering of c.
* Correct documentation of scaling units, in particular with respect to u.
This for example improves rendering of the OpenGL manuals. Joint work with kristaps@.
2014-08-13  Ingo Schwarze
Use <em> for .Em and .Bf -emphasis.
The vast majority of .Em in real-world manuals is stress emphasis, for which <em> is the correct markup. Admittedly, there are some instances of .Em usage for alternate quality, for which <i> would be a better match. Most of these are technical terms that neither allow semantic markup nor are keywords - for the latter, .Sy would be preferable. A typical example is that the shell breaks input into .Em words . Alternate voice or mood, which would also require <i>, is almost absent from manuals. We cannot satisfy both stress emphasis and alternate quality, so pick the one that fits more often and looks less wrong when off.
Patch from Guy Harris <guy at alum dot mit dot edu>. ok bentley@ joerg@NetBSD
2014-07-23  Ingo Schwarze
Security fix: After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-( Found myself while auditing the HTML formatter for safe output handling.
2014-07-22  Ingo Schwarze
Security fix: The function print_encode() is used both for plain text and for quoted attribute values. Escape the '"' character such that malicious manuals cannot pull off XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe others) to trigger the latter case. In the former case, escaping does no harm. Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
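Why the double quote matters is easiest to see with a single escaper shared by text and attribute values. The function below is a hypothetical stand-in, not mandoc's print_encode(); its name and fixed-buffer interface are assumptions, and a real formatter would more likely stream to its output than fill a buffer that can run out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical escaper: copy src to dst, replacing <, >, & and "
 * with entities so the result is safe both as text and inside a
 * double-quoted attribute value.  Stops early rather than emit a
 * partial entity if dst is too small.  Returns the output length.
 */
size_t
html_escape(char *dst, size_t dstsz, const char *src)
{
	const char	*rep;
	size_t		 len, n;
	char		 one[2] = { '\0', '\0' };

	assert(dst != NULL && src != NULL && dstsz > 0);
	n = 0;
	for (; *src != '\0'; src++) {
		switch (*src) {
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '&':
			rep = "&amp;";
			break;
		case '"':
			rep = "&quot;";
			break;
		default:
			one[0] = *src;
			rep = one;
			break;
		}
		len = strlen(rep);
		if (n + len >= dstsz)
			break;
		memcpy(dst + n, rep, len);
		n += len;
	}
	dst[n] = '\0';
	return n;
}
```

With this in place, a link target such as `x" onclick="y` can no longer close the attribute it was written into; in running text the extra &quot; is harmless, which matches the reasoning in the commit message.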
2014-04-23  Ingo Schwarze
Audit strlcpy(3)/strlcat(3) usage.
* Repair three instances of silent truncation, use asprintf(3).
* Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+... to use asprintf(3) instead to make them less error prone.
* Cast the return value of four instances where the destination buffer is known to be large enough to (void).
* Completely remove three useless instances of strlcpy(3)/strlcat(3).
* Mark two places in -Thtml with XXX that can cause information loss and crashes but are not easy to fix, requiring design changes of some internal interfaces.
* The file mandocdb.c remains to be audited.
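The benefit of sizing the buffer from the data can be shown in a few lines. asprintf(3) is a BSD/GNU extension, so this sketch swaps in the standard two-pass snprintf(3) idiom, which achieves the same effect portably; the function name and the "name(section)" format are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "name(sec)" in freshly allocated memory
 * sized from the actual data, so nothing can be truncated.
 * Returns NULL on failure; the caller frees the result.
 */
char *
format_xref(const char *name, const char *sec)
{
	char	*buf;
	int	 len;

	assert(name != NULL && sec != NULL);
	/* First pass: measure the formatted length. */
	if ((len = snprintf(NULL, 0, "%s(%s)", name, sec)) < 0)
		return NULL;
	if ((buf = malloc((size_t)len + 1)) == NULL)
		return NULL;
	/* Second pass: format into a buffer that is exactly big enough. */
	(void)snprintf(buf, (size_t)len + 1, "%s(%s)", name, sec);
	return buf;
}
```

Compared with a chain of strlen(3), malloc(3), strlcpy(3) and strlcat(3) calls, the format string states the result in one place and there is no hand-computed length to get wrong.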
2014-04-20  Ingo Schwarze
KNF: case (FOO): -> case FOO, remove /* LINTED */ and /* ARGSUSED */, remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
2014-03-21  Ingo Schwarze
The files mandoc.c and mandoc.h contained both specialised low-level functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
2014-01-22  Ingo Schwarze
Implement the \: (optional line break) escape sequence, documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
2014-01-05  Ingo Schwarze
Fix one case where a non-literal is used as format string. Fix another case where a variable is formatted using the wrong type. Patch from Joerg Sonnenberger <joerg@NetBSD>.
2013-08-08  Ingo Schwarze
Implement the roff(7) font-escape sequence \f(BI "bold+italic". This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
2012-05-28  Ingo Schwarze
Implement the roff \z escape sequence, intended to output the next character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten.
With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".". Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases.
2011-10-09  Ingo Schwarze
Sync to version 1.12.0; all code by kristaps@:
* Implement .Rv in -Tman.
* Let -man -Tman work a bit like cat(1).
* Add the -Ofragment option to -T[x]html.
* Minor fixes in -T[x]html.
* Lots of apropos(1) and -Tman code cleanup.
2011-07-08  Ingo Schwarze
clean up .HP, .IP, .TP, .nf, and \c handling in -T[x]html; from kristaps@