mandoc - UNIX manpage compiler toolset

	Commit message	Author	Age	Files	Lines
*	Instead of the custom <div class="manual-text">, use the standard	Ingo Schwarze	2022-07-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	HTML <main> element. The benefit is that it has the ARIA landmark role "main" by default. To ease the transition for people using their own CSS file instead of mandoc.css, retain the custom class for now. I had this idea in a discussion with Anna Vyalkova <cyber at sysrq dot in>. Patch from Anna, slightly tweaked by me.
*	Improve accessibility of -T html -O toc output by using the <nav> element	Ingo Schwarze	2022-06-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	in the DPUB-ARIA doc-toc role. Patch from Anna Vyalkova <cyber at sysrq dot in> slightly tweaked by me. This is hopefully the start of a collaboration to improve accessibility of Unix manual pages using the WAI-ARIA, HTML-ARIA, and DPUB-ARIA standards. Progress appears to be possible without changing anything with respect to the way manual pages are written. Instead, it seems sufficient to properly translate semantic cues already implied by existing mdoc(7) markup into the appropriate HTML elements and ARIA attributes. Overall, the total length of HTML output is likely to increase slightly, but not much.
*	If the layout or data of an individual cell in a tbl(7) contains	Ingo Schwarze	2021-09-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	only "_", "-", or "=", requesting a horizontal line to be drawn across the middle of the cell, print <hr/> in that cell in HTML output. That is arguably slightly ugly because HTML 5 regards <hr/> as semantic markup, meaning "thematic break". If somebody knows a better way to render a horizontal line across the middle of a table cell with pure HTML and CSS, and without implying a specific meaning, please tell me. Missing feature reported by <Oliver dot Corff at email dot de>.
*	When a .Tg is attached to a paragraph, attach the permalink	Ingo Schwarze	2020-04-18	1	-0/+2
\| \| \| \|	to the first word, or the first few words if they are short.
*	Split tagging into a validation part including prioritization	Ingo Schwarze	2020-03-13	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in tag.{h,c} and {mdoc,man}_validate.c and into a formatting part including command line argument checking in term_tag.{h,c}, html.c, and {mdoc\|man}_{term\|html}.c. Immediate functional benefits include: * Improved prioritization of automatic tags for .Em and .Sy. * Avoiding bogus automatic tags when .Em, .Fn, or .Sy are explicitly tagged. * Explicit tagging of .Er and .Fl now works in HTML output. * Automatic tagging of .IP and .TP now works in HTML output. But mainly, this patch provides clean earth to build further improvements on. Technical changes: * Main program: Write a tag file for ASCII and UTF-8 output only. * All formatters: There is no more need to delay writing the tags. * mdoc(7)+man(7) formatters: No more need for elaborate syntax tree inspection. * HTML formatter: If available, use the "string" attribute as the tag. * HTML formatter: New function to write permalinks, to reduce code duplication. Style cleanup in the vicinity while here: * mdoc(7) terminal formatter: To set up bold font for children, defer to termp_bold_pre() rather than calling term_fontpush() manually. * mdoc(7) terminal formatter: Garbage collect some duplicate functions. * mdoc(7) HTML formatter: Unify <code> handling, delete redundant functions. * Where possible, use switch statements rather than if cascades. * Get rid of some more Yoda notation. The necessity for such changes was first discussed with kn@, but i didn't bother him with a request to review the resulting -673/+782 line patch.
*	Introduce a new mdoc(7) macro .Tg ("tag") to explicitly mark a place	Ingo Schwarze	2020-01-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	as defining a term. Please only use it when automatic tagging does not work. Manual page authors will not be required to add the new macro; using it remains optional. HTML output is still rudimentary in this version and will be polished later. Thanks to kn@ for reminding me that i have been considering since BSDCan 2014 whether something like this might be useful. Given that possibilities of making automatic tagging better are running out and there are still several situations where automatic tagging cannot do the job, i think the time is now ripe. Feedback and no objection from millert@; OK espie@ inoguchi@ kn@.
*	delete the TAG_IDIV crutch, which is no longer used	Ingo Schwarze	2019-09-01	1	-1/+0
\|
*	In the HTML formatter, assert(3) that no HTML nesting violation occurs.	Ingo Schwarze	2019-08-29	1	-11/+11
\| \| \| \| \| \| \|	Tested on the complete manual page trees of Version 7 AT&T UNIX, 4.4BSD-Lite2, POSIX-2013, OpenBSD 2.2 to 6.5 and -current, FreeBSD 10.0 to 12.0, NetBSD 6.1.5 to 8.1, DragonFly 3.8.2 to 5.6.1, and Linux 4.05 to 5.02.
*	In HTML output, allow switching the desired font for subsequent	Ingo Schwarze	2019-04-30	1	-12/+3
\| \| \| \| \| \| \| \|	text without printing an opening tag right away, and use that in the .ft request handler. While here, garbage collect redundant enum htmlfont and reduce code duplication in print_text(). Fixing an assertion failure reported by Michael <Stapelberg at Debian> in pmRegisterDerived(3) from libpcp3-dev.
*	Wrap .Sh/.SH sections and .Ss/.SS subsections in HTML <section> elements	Ingo Schwarze	2019-03-01	1	-0/+1
\| \| \| \| \| \|	as recommended for accessibility by the HTML 5 standard. Triggered by a similar, but slightly different suggestion from Laura Morales <lauretas at mail dot com>.
*	The .UR and .MT blocks in man(7) are represented by <a> elements	Ingo Schwarze	2019-01-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	which establish phrasing context, but they can contain paragraph breaks (which is relevant for terminal formatting, so we can't just change the structure of the syntax tree), which are represented by <p> elements and cannot occur inside <a>. Fix this by prematurely closing the <a> element in the HTML formatter. This means that the clickable text in HTML output is shorter than what is represented as the link text in terminal output, but in HTML, it is frankly impossible to have the clickable area of a hyperlink extend across a paragraph break. The difference in presentation is not a major problem, and besides, paragraph breaks inside .UR are rather poor style in the first place. The implementation is quite tricky. Naively closing out the <a> prematurely would result in accessing a stale pointer when later reaching the physical end of the .UR block. So this commit separates visual and structural closing of "struct tag" stack items. Visual closing means that the HTML element is closed but the "struct tag" remains on the stack, to avoid later access to a stale pointer and to avoid closing the same HTML element a second time later. This also needs reference counting of pointers to "struct tag" stack items because often more than one child holds a pointer to the same parent item, and only the outermost child can safely do the physical closing. In the whole corpus of nearly half a million manual pages on man.openbsd.org, this problem occurs in exactly one page: the groff(1) version 1.20.1 manual contained in DragonFly-3.8.2, which contains a formatting error triggering the bug.
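The separation of visual and structural closing described in this commit message can be illustrated with a minimal sketch. This is not the code from html.c: the struct layout, the function names tag_open(), tag_ref(), tag_close_visual(), and tag_release(), and the fixed output buffer are all assumptions made for illustration. Only the idea is taken from the message: an element can be closed in the output while its stack item stays allocated until the last holder lets go.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stack item for one open HTML element. */
struct tag {
	struct tag	*next;		/* item further down the stack */
	const char	*name;		/* element name, e.g. "a" */
	int		 refcnt;	/* number of holders of this pointer */
	int		 closed;	/* closing tag was already printed */
};

/* Hypothetical formatter state; output goes to a small buffer. */
struct tagstack {
	struct tag	*top;
	char		 out[256];
};

static void
emit(struct tagstack *s, const char *pre, const char *name)
{
	size_t	 len = strlen(s->out);

	snprintf(s->out + len, sizeof(s->out) - len, "%s%s>", pre, name);
}

/* Open an element; the caller holds the first reference. */
struct tag *
tag_open(struct tagstack *s, const char *name)
{
	struct tag	*t;

	if ((t = calloc(1, sizeof(*t))) == NULL)
		return NULL;
	t->name = name;
	t->refcnt = 1;
	t->next = s->top;
	s->top = t;
	emit(s, "<", name);
	return t;
}

/* Another child keeps a pointer to the same stack item. */
void
tag_ref(struct tag *t)
{
	t->refcnt++;
}

/*
 * Visual closing: print the closing tags of all still-open items
 * from the top of the stack down to "until", but keep every item
 * on the stack so that existing pointers remain valid.
 */
void
tag_close_visual(struct tagstack *s, struct tag *until)
{
	struct tag	*t;

	for (t = s->top; t != NULL; t = t->next) {
		if (t->closed == 0) {
			emit(s, "</", t->name);
			t->closed = 1;
		}
		if (t == until)
			break;
	}
}

/*
 * Structural closing: drop one reference.  Only the last holder
 * unlinks and frees the item; the closing tag is printed here
 * only if it was not printed earlier.
 */
void
tag_release(struct tagstack *s, struct tag *t)
{
	struct tag	**pp;

	if (--t->refcnt > 0)
		return;
	if (t->closed == 0)
		tag_close_visual(s, t);
	for (pp = &s->top; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
	}
	free(t);
}
```

In this sketch, a paragraph break inside a link would call tag_close_visual() on the <a> item, and the end of the .UR block would later call tag_release(), which prints nothing because the element is already marked as closed.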
*	Represent mdoc(7) .Pp (and .sp, and some SYNOPSIS and .Rs features)	Ingo Schwarze	2019-01-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	by the <p> HTML element and use the html_fillmode() mechanism for .Bd -unfilled, just like it was done for man(7) earlier, finally getting rid both of the horrible <div class="Pp"></div> hack and of the worst HTML syntax violations caused by nested displays. Care is needed because in some situations, paragraphs have to remain open across several subsequent macros, whereas in other situations, they must get closed together with a block containing them. Some implementation details include: * Always close paragraphs before emitting HTML flow content. * Let html_close_paragraph() also close <pre> for extra safety. * Drop the old, now unused function print_paragraph(). * Minor adjustments in the top-level man(7) node formatter for symmetry. * Bugfix: .Ss heads suspend no-fill mode, even though .Ss doesn't end it. * Bugfix: give up on .Op semantic markup for now, see the comment.
*	Finally, represent the man(7) .PP and .HP macros by the natural	Ingo Schwarze	2019-01-06	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	choice, which is the <p> HTML element. On top of the previous fill-mode improvements, the key to making this possible is to automatically close the <p> when required: before headers, subsequent paragraphs, lists, indented blocks, synopsis blocks, tbl(7) blocks, and before blocks using no-fill mode. In man(7) documents, represent the .sp request by a blank line in no-fill mode and in the same way as .PP in fill mode.
*	Now that the NODE_NOFILL flag in the syntax tree is accurate,	Ingo Schwarze	2019-01-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	use it in the man(7) HTML formatter rather than keeping fill mode state locally, resulting in massive simplification (minus 40 LOC). Move the html_fillmode() state handler function to the html.c module such that both the man(7) and the roff(7) formatter (and in the future, also the mdoc(7) formatter) can use it. Give it a query mode, to be invoked with TOKEN_NONE.
*	drop flag HTML_LITERAL which is no longer used	Ingo Schwarze	2018-12-31	1	-1/+0
\|
*	Yet another round of improvements to manual font selection.	Ingo Schwarze	2018-12-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	Unify handling of \f and .ft. Support \f4 (bold+italic). Support ".ft BI" and ".ft CW" for terminal output. Support the .ft request in HTML output. Reject the bogus fonts \f(C1, \f(C2, \f(C3, and \f(CP. In regress.pl, only strip leading whitespace in math mode.
*	Implement the \f(CW and \f(CR (constant width font) escape sequences	Ingo Schwarze	2018-10-25	1	-0/+1
\| \| \| \| \| \| \| \| \|	for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
*	Add an option -T html -O toc to add a brief table of contents near	Ingo Schwarze	2018-10-02	1	-0/+2
\| \| \| \| \|	the top of HTML pages containing at least two non-standard sections. Suggested by Adam Kalisz and discussed with kristaps@ during EuroBSDCon 2018.
*	Support a second argument to -O man,	Ingo Schwarze	2018-10-02	1	-1/+2
\| \| \| \| \| \|	selecting the format according to local existence of the file. Suggested by kristaps@ during EuroBSDCon 2018. Written on the train Frankfurt-Karlsruhe returning from EuroBSDCon.
*	Delete substantial amounts of code	Ingo Schwarze	2018-06-25	1	-1/+0
\| \| \| \|	now that we no longer use variable style= attributes.
*	Do not write <colgroup> elements. Their only purpose is to enforce	Ingo Schwarze	2018-06-25	1	-2/+0
\| \| \| \| \| \|	author-specified column widths, which can harm responsive design and provide no real benefit: HTML rendering engines usually do just fine automatically selecting appropriate column widths.
*	Do not write duplicate id= attributes, they violate HTML syntax.	Ingo Schwarze	2018-05-25	1	-1/+1
\| \| \| \| \|	Append suffixes for disambiguation. Issue first reported by Jakub Klinkovsky <j dot l dot k at gmx dot com> (Arch Linux).
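A minimal sketch of disambiguating id= attributes by appending suffixes follows. The registry type, the name id_make_unique(), the fixed table sizes, and the "_2", "_3" suffix format are assumptions for illustration; the commit message only states that suffixes are appended.

```c
#include <stdio.h>
#include <string.h>

#define MAX_IDS	64
#define ID_LEN	64

/* Hypothetical registry of ids already written to the page. */
struct id_set {
	char	 ids[MAX_IDS][ID_LEN];
	int	 count;
};

static int
id_seen(const struct id_set *set, const char *id)
{
	int	 i;

	for (i = 0; i < set->count; i++)
		if (strcmp(set->ids[i], id) == 0)
			return 1;
	return 0;
}

/*
 * Write a page-unique variant of "want" into "out" and remember it.
 * The first use keeps the name unchanged; later uses get the
 * suffixes _2, _3, and so on (an assumed format).
 * Returns 0 on success, or -1 if the registry is full or the
 * resulting name does not fit.
 */
int
id_make_unique(struct id_set *set, const char *want, char *out, size_t outsz)
{
	int	 n, len;

	if (set->count >= MAX_IDS)
		return -1;
	len = snprintf(out, outsz, "%s", want);
	for (n = 2; len >= 0 && (size_t)len < outsz && len < ID_LEN &&
	    id_seen(set, out); n++)
		len = snprintf(out, outsz, "%s_%d", want, n);
	if (len < 0 || (size_t)len >= outsz || len >= ID_LEN)
		return -1;
	strcpy(set->ids[set->count++], out);
	return 0;
}
```

With this sketch, requesting the id "NAME" three times on one page would yield "NAME", "NAME_2", and "NAME_3", so no two elements share an id.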
*	Fix a long-standing issue:	Ingo Schwarze	2018-05-09	1	-0/+1
\| \| \| \| \| \| \| \| \|	Some macros (Nd, Oo) can contain blocks but were rendered as elements that can only contain phrasing content, resulting in invalid HTML nesting. Switch them to <div>. Also move the related "display: inline" style from the HTML to the CSS. Reminded during a conversation with John Gardner.
*	preserve comments before .Dd and .TH (typically Copyright and license)	Ingo Schwarze	2018-04-13	1	-1/+2
\| \| \| \| \|	in full HTML output, but not with -Ofragment, e.g. in man.cgi(8); suggested by Thomas Klausner <wiz at NetBSD>
*	1. Eliminate struct eqn, instead use the existing members	Ingo Schwarze	2017-07-08	1	-2/+2
\| \| \| \| \| \|	of struct roff_node which is allocated for each equation anyway. 2. Do not keep a list of equation parsers, one parser is enough. Minus fifty lines of code, no functional change.
*	Write text boxes as <mi>, <mn>, or <mo> as appropriate,	Ingo Schwarze	2017-06-23	1	-0/+1
\| \| \| \| \|	and write fontstyle or fontweight attributes where required. Missing features reported by bentley@.
*	Start roff formatter modules for HTML and terminal output,	Ingo Schwarze	2017-05-04	1	-0/+2
\| \| \| \| \| \| \|	used by both the mdoc and man formatters, with the ultimate goal of reducing code duplication between the two macro formatters. Made possible by the parser unification. Add the first formatting function (for the .br request).
*	Minimal support for deep linking into man(7) pages.	Ingo Schwarze	2017-03-15	1	-0/+2
\| \| \| \| \|	As the man(7) language does not provide semantic markup, only .SH, .SS, and .UR become anchors for now.
*	mark up .Ar, .Fa, .Va, .Ft, and .Vt with <var> rather than <i>;	Ingo Schwarze	2017-02-05	1	-0/+1
\| \| \| \|	suggested by bentley@ long ago, but needed lots of cleanup first
*	for .Rs, use <cite>	Ingo Schwarze	2017-02-05	1	-0/+1
\|
*	Improve <table> syntax:	Ingo Schwarze	2017-02-05	1	-1/+1
\| \| \| \| \| \| \| \|	The <col> element can only appear inside <colgroup>, so use <colgroup>. The <tbody> element is optional and useless, so don't use it. Even if we would ever need <thead> or <tfoot>, <tbody> would still be optional and useless; besides, we will likely never need <thead> or <tfoot>, simply because our languages don't support such functionality.
*	eliminate one useless struct and one level of indirection;	Ingo Schwarze	2017-01-29	1	-5/+1
\| \| \| \|	no functional change
*	Fix -man -Thtml formatting after .nf (which has nothing to do	Ingo Schwarze	2017-01-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with "literal", by the way, it means "no fill"): * Use <pre> such that whitespace is preserved. * Preserve line breaks. * For font alternating macros, avoid node recursion which required scary juggling with the fill state. Instead, simply print the text children directly. Missing feature first noticed by kristaps@ in 2011, then again reported by afresh1@ in 2016, and finally reported here: https://github.com/Debian/debiman/issues/21 , which i only found because of Shane Kerr's comment here: https://plus.google.com/110314300533310775053/posts/H1eaw9Yskoc
*	clean up markup of .Bd, .D1, .Dl, .Li, and .Ql;	Ingo Schwarze	2017-01-19	1	-1/+0
\| \| \| \|	in particular, stop abuse of <blockquote>
*	Implement line breaking of the generated HTML code at space characters	Ingo Schwarze	2017-01-19	1	-2/+5
\| \| \| \| \| \| \| \| \|	in filled text. This does not affect HTML semantics, but makes the HTML code even more humanly readable. While here, - collapse multiple consecutive space characters in filled text - and insert a blank between style entries.
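The two behaviours named in this commit message, breaking output lines at space characters and collapsing consecutive spaces in filled text, can be sketched as follows. The function name fill_text(), its signature, and the notion of a fixed target width are assumptions for illustration and are not taken from the mandoc sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy filled text from "in" to "out", collapsing runs of spaces
 * into one separator and replacing that separator with a newline
 * whenever the next word would make the output line longer than
 * "width" columns.  Words are never split.  Leading and trailing
 * spaces are dropped.  Output that does not fit into "outsz" bytes
 * is truncated at a word boundary.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t
fill_text(const char *in, char *out, size_t outsz, size_t width)
{
	size_t	 col = 0, len = 0, wlen;

	if (outsz == 0)
		return 0;
	for (;;) {
		in += strspn(in, " ");		/* skip a run of spaces */
		wlen = strcspn(in, " ");	/* length of the next word */
		if (wlen == 0)
			break;
		if (len + 1 + wlen >= outsz)	/* separator, word, NUL */
			break;
		if (col > 0) {
			if (col + 1 + wlen > width) {
				out[len++] = '\n';
				col = 0;
			} else {
				out[len++] = ' ';
				col++;
			}
		}
		memcpy(out + len, in, wlen);
		len += wlen;
		col += wlen;
		in += wlen;
	}
	out[len] = '\0';
	return len;
}
```

Because a browser treats a newline in filled text like a space, breaking only at positions that already held a space leaves the rendered result unchanged, which matches the statement that HTML semantics are not affected.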
*	Make HTML output more human readable by overhauling line break logic	Ingo Schwarze	2017-01-18	1	-0/+3
\| \| \| \| \|	around tags and by introducing some simple indentation. No change of HTML semantics intended.
*	Completely delete the buf field of struct html and all the buf*()	Ingo Schwarze	2017-01-17	1	-15/+0
\| \| \| \| \| \| \| \| \|	interfaces. Such a static buffer was a bad idea in the first place, causing unfixable truncation that was only prevented by triggering an assertion failure. Instead, let the small number of remaining users allocate and free their own, temporary dynamic buffers, or for the case of .Xr and .In, pass the original data to be assembled in print_otag().
*	Simplify the usage of print_otag() by making it accept a variable	Ingo Schwarze	2017-01-17	1	-35/+2
\| \| \| \| \| \| \| \| \| \|	number of arguments. Delete struct htmlpair and all the PAIR_*() macros. Delete enum htmlattr, handle that in print_otag() instead. Minus 190 lines of code; no functional change except better ordering of attributes (class before style) in three cases.
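The idea of replacing attribute pair structures with a variable argument list can be sketched as below. This is a simplified illustration, not the print_otag() from html.c: the name format_otag(), the buffer-based interface, and the attribute letters 'c', 'i', and 'h' are assumptions, and attribute values are not escaped here.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an opening tag into "buf".
 * Each letter in "fmt" selects one attribute and consumes one
 * const char * argument: 'c' = class, 'i' = id, 'h' = href.
 * Values are copied verbatim, without HTML escaping.
 * Returns the length written, or -1 on an unknown letter
 * or if the buffer is too small.
 */
int
format_otag(char *buf, size_t sz, const char *name, const char *fmt, ...)
{
	va_list		 ap;
	const char	*attr, *val;
	int		 len, n;

	len = snprintf(buf, sz, "<%s", name);
	if (len < 0 || (size_t)len >= sz)
		return -1;
	va_start(ap, fmt);
	for (; *fmt != '\0'; fmt++) {
		switch (*fmt) {
		case 'c':
			attr = "class";
			break;
		case 'i':
			attr = "id";
			break;
		case 'h':
			attr = "href";
			break;
		default:
			va_end(ap);
			return -1;
		}
		val = va_arg(ap, const char *);
		n = snprintf(buf + len, sz - len, " %s=\"%s\"", attr, val);
		if (n < 0 || (size_t)n >= sz - len) {
			va_end(ap);
			return -1;
		}
		len += n;
	}
	va_end(ap);
	n = snprintf(buf + len, sz - len, ">");
	if (n < 0 || (size_t)n >= sz - len)
		return -1;
	return len + n;
}
```

A call such as format_otag(buf, sizeof(buf), "a", "ch", "Xr", "ls.1.html") needs no temporary pair array at the call site, and the attribute order in the output follows the order of letters in the format string.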
*	Use __attribute__((__format__ throughout.	Ingo Schwarze	2016-07-19	1	-4/+2
\| \| \| \| \| \| \|	Triggered by a smaller patch from Christos Zoulas. While here, unify style, move several config tests to config.h, and delete the useless MANDOC_CONFIG_H.
*	In private header files, __BEGIN_DECLS and __END_DECLS are pointless.	Ingo Schwarze	2015-11-07	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Because these work slightly differently on different systems, they are becoming a maintenance burden in the portable version, so delete them. Besides, one of the chief design goals of the mandoc toolbox is to make sure that nothing related to documentation requires C++. Consequently, linking mandoc against any kind of C++ program would defeat the purpose and is not supported. I don't understand why kristaps@ added them in the first place.
*	Major character table cleanup:	Ingo Schwarze	2015-10-13	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Use ohash(3) rather than a hand-rolled hash table. * Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently. * No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them. * Sort the characters to agree with the mandoc_chars(7) manual page. * Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch). No functional change, minus 100 LOC, and i don't see a performance change.
*	Fix the implementation and documentation of \c (continue text input line).	Ingo Schwarze	2014-12-02	1	-0/+1
\| \| \| \| \|	In particular, make it work in no-fill mode, too. Reminded by Carsten dot Kunze at arcor dot de (Heirloom roff).
*	header cleanup:	Ingo Schwarze	2014-12-01	1	-2/+5
\| \| \| \| \| \|	* add missing forward declarations * remove needless header inclusions * some style unification
*	remove unnecessary inclusion protection; patch from deraadt@	Ingo Schwarze	2014-12-01	1	-4/+0
\|
*	Make the character table available to libroff so it can check the	Ingo Schwarze	2014-10-28	1	-1/+1
\| \| \| \| \| \| \| \|	validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
*	sync Copyright years after merge to OpenBSD; no code change	Ingo Schwarze	2014-10-10	1	-1/+1
\|
*	Re-write of eqn(7) parser and MathML output.	Kristaps Dzonsons	2014-10-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	This adds parser-level support for the grammar described by the eqn second-edition technical paper, "Typesetting Mathematics — User's Guide" (Kernighan, Cherry). The reason for this re-write is the grouping rules, which were not possible given the existing implementation. The re-write has also considerably simplified the HTML (and, if it ever is completed, terminal) front-end.
*	Change "to" and "from" commands to use munder, mover, and munderover.	Kristaps Dzonsons	2014-09-28	1	-0/+3
\|
*	Add support for some MathML elements and attributes in our HTML5.	Kristaps Dzonsons	2014-09-28	1	-0/+15
\|
*	Don't pretend we have a separate XHTML and HTML mode any more.	Kristaps Dzonsons	2014-09-27	1	-6/+0
\|