author Todd C. Miller <millert@cvs.openbsd.org> 2000-04-09 07:58:38 +0000
committer Todd C. Miller <millert@cvs.openbsd.org> 2000-04-09 07:58:38 +0000
commit 098fe4a0b368c914c7d1f7ce086634958df8796a
tree 35c4467b0223be7d6cd8bf4a8d03b0010b342e2a /gnu/usr.bin/groff/grohtml/design.ms
parent 972922b0b73ac8052cf5ab98e029ac4e27c752f3
groff 1.15
Diffstat (limited to 'gnu/usr.bin/groff/grohtml/design.ms')
-rw-r--r-- gnu/usr.bin/groff/grohtml/design.ms 156
1 files changed, 156 insertions, 0 deletions
diff --git a/gnu/usr.bin/groff/grohtml/design.ms b/gnu/usr.bin/groff/grohtml/design.ms
new file mode 100644
index 00000000000..e62e2233096
--- /dev/null
+++ b/gnu/usr.bin/groff/grohtml/design.ms
@@ -0,0 +1,156 @@
+.nr PS 12
+.nr VS 14
+.LP
+.TL
+Design of grohtml
+.sp 1i
+.SH
+What is grohtml
+.LP
+Grohtml is a back end for groff that generates html.
+The aim of grohtml is to produce respectable html given
+fairly typical groff input.
+.SH
+Limitations of grohtml
+.LP
+Although basic text can be translated
+in a straightforward fashion, there are some areas where grohtml
+has to guess how pieces of text relate to each other. In particular, whenever
+grohtml encounters text tables, indented paragraphs, or
+two-column mode, it tries to use the html table construct
+to preserve the columns. Grohtml also attempts to work out which
+lines should be automatically formatted by the browser.
+Because these decisions are guesses, grohtml will be right most of the time
+but will occasionally make mistakes.
+.PP
+Output from tbl, pic, and eqn is rendered as images, which may also be
+considered a limitation.
+.SH
+Overview of html.cc
+.LP
+This section gives a brief overview of how html.cc operates.
+The html device driver works as follows:
+.IP (i) .5i
+it first builds a linked list of all the words on a page.
+.IP (ii) .5i
+it scans the page to find the leftmost margin, which it later removes
+when generating the page.
+.IP (iii) .5i
+it scans the page and builds two kinds of regions: ascii text and graphical.
+The graphical regions consist of tbl, eqn, and pic output
+(basically anything that cannot be displayed as text).
+It also scans the page for lines (such as those in a footer)
+and places these into tiny graphical regions. Text in certain fonts,
+for example Greek mathematical symbols, is also treated as a graphical
+region, because html has no easy equivalent.
+.LP
+Finally, all graphical regions are translated into png files and
+all text regions into html text.
+.PP
+To give grohtml a sporting chance of accurately deciding which
+regions are graphical and which are text, the preprocessors
+tbl, eqn, and pic have all been tweaked to bracket tables, equations,
+and pictures with the following lines:
+.sp
+.nf
+\f[CR]\&.if '\\*(.T'html' \\X(graphic-start(\\c

+\&.if '\\*(.T'html' \\X(graphic-end(\\c\fP
+.fi
+.sp
+These appear to grohtml as:
+.sp
+.nf
+\f[CR]\&x X graphic-start
+
+\&...
+
+\&x X graphic-end\fP
+.fi
+.sp
+.LP
+In addition to graphic-start and graphic-end, two other
+device control sequences are used.
+.sp
+\f[CR]\&x X index:N\fP
+.sp
+where N is a number. This sequence stops the html device from
+automatically producing links to headings whose level is greater
+than N.
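+.LP
+As an illustration, assuming a document wants automatic links only for
+headings of level 1 and 2, it could emit the sequence itself with the
+\f[CR]\\X\fP escape, guarded by the same \f[CR].if\fP test as the
+preprocessor lines above so that other devices never see it:
+.sp
+.nf
+\f[CR]\&.if '\\*(.T'html' \\X'index:2'\fP
+.fi
+.sp
+This reaches grohtml as \f[CR]x X index:2\fP.
+.LP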
+The line:
+.sp
+\f[CR]\&x X html:STRING\fR
+.sp
+.LP
+allows STRING to be passed through to the output file with
+no processing whatsoever. That is, it allows users to include html
+markup via macros such as:
+.sp
+\f[CR]\&.URL "Latest Emacs" "ftp://somewonderful.gnu.software"\fP
+.sp
+.LP
+Here the URL macro bundles its arguments into the STRING above.
+For more information, consult \f[CR]tmac/tmac.arkup\fP.
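+.LP
+As a minimal sketch that does not rely on any macro package, a
+horizontal rule could be passed straight through by emitting the
+sequence directly, guarded by the same \f[CR].if\fP test as the
+preprocessor lines above:
+.sp
+.nf
+\f[CR]\&.if '\\*(.T'html' \\X'html:<hr>'\fP
+.fi
+.sp
+Grohtml then receives \f[CR]x X html:<hr>\fP and copies \f[CR]<hr>\fP
+to its output unchanged.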
+.PP
+While scanning through a page, the html device copies headings and titles
+into a list of links, which is later written at the beginning
+of the html document.
+.SH
+Table handling code
+.LP
+Unless the -t option is given when grohtml is run, the grohtml
+driver attempts to find textual tables and generate html tables.
+This allows the .RS and .RE macros to work with automatic formatting,
+and it should also allow grohtml to process .2C correctly. However, the
+table handling code has to examine the troff output and \fIguess\fR where
+a table starts and finishes. It is worth knowing the limitations of this
+approach, as it sometimes makes the wrong decision.
+.LP
+Here are some of the rules that grohtml uses for terminating an html table:
+.LP
+.IP "(i)" .5i
+A table is terminated when grohtml finds a line that is entirely in bold
+font (it assumes that such a line is a heading outside of a table).
+This might be considered incorrect behaviour, especially if you use .2C,
+which generates a heading in the left column when the corresponding
+row of the right column is blank.
+.IP "(ii)" .5i
+A table is terminated when grohtml sees that the complete line
+has been spanned by words, that is, no gaps exist.
+.IP "(nb)" .5i
+the documentation of these rules is particularly incomplete and needs finishing
+when time permits.
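+.LP
+As a hypothetical illustration of rule (i): in two-column mode, an
+\f[CR].SH\fP heading that lands in the left column opposite a blank
+stretch of the right column forms an all-bold line, so ms input along
+these lines may cause grohtml to end its html table at the heading:
+.sp
+.nf
+\f[CR]\&.2C
+\&.SH
+Results
+\&.LP
+Body text of the left column continues here.\fP
+.fi
+.sp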
+.SH
+To do
+.LP
+.IP (i) .5i
+finish working out the maximum and minimum x and y extents for splines.
+.IP (ii) .5i
+thoroughly check and test all the character descriptions in devhtml
+(originally taken from devX100).
+.IP (iii) .5i
+improve tmac.arkup
+.IP (iv) .5i
+also improve the documentation.
+.IP (v) .5i
+fix the bugs exposed by Eric Raymond's pic guide,
+\fBMaking Pictures With GNU PIC\fR. It appears that grohtml becomes confused
+about which sections of the document are text and which need
+to be rendered as images.
+.IP (vi) .5i
+it would be nice to modularise the source. A natural division might be
+to move the table handling code from html.cc into table.cc.
+The table.cc code could then be extended to recognise output from tbl and
+generate html tables with lines, rules, and boxes. The code as it stands
+should cope with very simple plain text tables, but at present
+it never gets the chance, because the output of gtbl is
+bracketed by \f[CR]graphic-start\fP and \f[CR]graphic-end\fP.
+.IP (vii) .5i
+introduce anti-aliasing for the images, as mentioned by Werner.
+.SH
+Dependencies
+.LP
+Grohtml depends on grops and gs, which are invoked to
+generate all png files. Png files are generated whenever a table, picture,
+equation, or line is encountered.