mandoc - UNIX manpage compiler toolset

	Commit message	Author	Age	Files	Lines
*	Restore the traditional behaviour of the man(7) single-font	Ingo Schwarze	2022-08-16	1	-10/+9
	macros .B, .I, .SM, and .SB that the next-line scope extends to the end of the next logical input line and is not extended if that line ends with a \c (no-space) escape sequence. While improving a loosely related feature in the man(7) .TP macro, a regression entered the groff codebase in groff commit 3549fd9f (28-Apr-2017) caused by the usual sloppiness of Bjarni Ingi Gislason. Since that time, groff wrongly had \c extend next-line scope to a second line for these macros. In man.c rev. 1.127 (25-Aug-2018) i synched mandoc behaviour with groff in this respect, unfortunately failing to notice the recent regression in groff. The groff regression was finally fixed by gbranden@ in commit 09c028f3 (07-Jun-2022). With the present commit, mandoc is back in sync with both GNU and Heirloom roff regarding the interaction of single-font macros with \c.
*	New tests of tabs in fill mode, in particular	Ingo Schwarze	2022-08-16	3	-2/+140
	when multiple input or output lines are involved.
*	Adjust the desired output after the improvements in term.c rev. 1.290.	Ingo Schwarze	2022-08-16	1	-1/+1
	The new version of this file was generated with groff-current. Heirloom nroff produces exactly the same output for the content of the DESCRIPTION.
*	When starting a new input line, even when continuing the same output	Ingo Schwarze	2022-08-16	5	-11/+49
	line, use the current output position as the reference position for tabs on that input line. This brings mandoc in line with the behaviour of GNU, Heirloom, and Plan 9 roff.
*	Even though the constant ASCII_ESC is only used in the roff pre-parser roff.c,	Ingo Schwarze	2022-08-16	2	-9/+8
	move it to the top level include file mandoc.h to reduce the risk of causing clashes when introducing new ASCII_* constants in the future.
*	Some more tests of no-fill mode similar to mdoc/Bd/blank.in	Ingo Schwarze	2022-08-15	2	-10/+29
	after vertical spacing was improved in man_term.c rev. 1.239.
*	Simplify handling of no-fill mode in man(7) by inspecting NODE_NOFILL	Ingo Schwarze	2022-08-15	1	-24/+20
	at the beginning of the node handler, in the same way as it is done in the mdoc(7) node handler. As a side effect, this also fixes a bug: if an input line contained nothing but an escape sequence producing no output whatsoever (for example, \fR), the old code incorrectly emitted a blank line anyway, whereas the new code only emits such a blank line if the input line actually produces output (even invisible zero-width output). To make the distinction, the ASCII_NBRZW -> lastcol -> term_newln() mechanism established in term.c rev. 1.289 is used.
*	Distinguish between escape sequences that produce no output	Ingo Schwarze	2022-08-15	7	-17/+52
	whatsoever (for example \fR) and escape sequences that produce invisible zero-width output (for example \&). No, i'm not joking, groff does make that distinction, and it has consequences in some situations, for example for vertical spacing in no-fill mode. Heirloom and Plan 9 behaviour is subtly different, but in case of doubt, we want to follow groff. While this fixes the behaviour for the majority of escape sequences, in particular for those most likely to occur in practice, it is not perfect yet because some of the more exotic ESCAPE_IGNORE sequences are actually of the "no output whatsoever" type but treated as "invisible zero-width" for now. With the new ASCII_NBRZW mechanism in place, switching them over one by one when the need arises will no longer be very difficult.
*	In GNU, Heirloom, and Plan 9 roff, tab positions apply to input lines,	Ingo Schwarze	2022-08-15	3	-7/+12
	not to output lines. In particular, if an input line gets broken in fill mode and a tab occurs in the second output line, it advances to a position of at least (width of the first output line) + (width of a space character even though this is never printed) + (width of the part of the second output line that precedes the tab). Implement the same logic in mandoc. Again, do not use tabs in filled text: they have surprising effects, including this one.
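	A minimal C sketch of the tab arithmetic described above, not the actual term.c code. It assumes tab stops are evenly spaced every `interval` columns measured from the start of the input line, and that a tab always moves strictly to the right; `next_tab_stop()` and `tab_column_after_break()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: next tab stop strictly to the right of pos,
 * with stops assumed to be evenly spaced every `interval` columns. */
static size_t
next_tab_stop(size_t pos, size_t interval)
{
	assert(interval > 0);
	return (pos / interval + 1) * interval;
}

/*
 * Hypothetical helper: output column reached by a tab on the second
 * output line of an input line that was broken in fill mode.
 *   first_width: width of the first output line
 *   space_width: width of the space at the break (never printed)
 *   prefix:      width on the second output line before the tab
 */
static size_t
tab_column_after_break(size_t first_width, size_t space_width,
    size_t prefix, size_t interval)
{
	/* Position along the input line, as if it had not been broken. */
	size_t virtual_pos = first_width + space_width + prefix;
	size_t stop = next_tab_stop(virtual_pos, interval);

	/* The tab advances the physical output line by the same amount. */
	return prefix + (stop - virtual_pos);
}
```

	For example, with 8-column tab stops, a 60-column first output line, a 1-column space, and 5 columns of text before the tab, the tab lands in column 11 of the second output line, not in column 8 as it would if tab stops were measured from the start of the output line.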
*	In GNU, Heirloom, and Plan 9 roff, literal tab characters are	Ingo Schwarze	2022-08-15	1	-19/+15
	non-breakable in exactly the same way as "\ ". That is, the preceding word, the tab character, and the following word are always kept together on the same output line. If filling is enabled and an output line break is required before the end of the following word, the break occurs before the beginning of the preceding word. Make mandoc behave in the same way. Of course, using literal tab characters in filled text remains a bad idea, and the "WARNING: tab in filled text" remains unchanged.
*	two ideas from RGBDS	Ingo Schwarze	2022-08-09	1	-0/+8
*	prevent breakable hyphens in segment identifiers	Ingo Schwarze	2022-08-09	1	-2/+5
	from being turned into underscores; bug reported by <Eldred dot fr> Habert
*	For clarity and consistency, refer to ".Bx 4.0" rather than ".Bx 4".	Ingo Schwarze	2022-08-04	1	-1/+3
	Also, mention /usr/ucb/man because /usr/bin/man did not provide -f in 4.0BSD.
*	If the body of a man(7) .MT or .UR block is empty, do not emit a warning.	Ingo Schwarze	2022-08-02	4	-9/+1
	Leaving the body empty is legitimate in this case if the author only wants to display a mail address or URI without providing a link text. Output modules already handle this correctly: terminal output shows just the URI without an accompanying text, HTML output uses the URI for both the href= attribute and as the content of the <a> element. The documentation was also wrong and claimed that an .MT or .UR block with an empty body would produce no output. As explained above, this isn't true. Bogus warning reported by Alejandro Colomar <alx dot manpages at gmail dot com>.
*	Delete OpenBSD-only rules from the regress/roff/de Makefile	Ingo Schwarze	2022-08-02	1	-38/+0
	after they were changed in OpenBSD. Tracking these rules here would be useless.
*	For accessibility, label the last two widgets in the search form.	Ingo Schwarze	2022-07-06	1	-4/+6
	Patch from Anna Vyalkova <cyber at sysrq dot in>, significantly tweaked by me.
*	https://www.w3.org/WAI/ARIA/apg/practices/names-and-descriptions/ says:	Ingo Schwarze	2022-07-06	3	-7/+7
	"Start names with a capital letter; it helps some screen readers speak them with appropriate inflection." Anna Vyalkova already did that correctly when sending patches, but i ruined it when committing, so fix it now.
*	improve the description of header.html and footer.html	Ingo Schwarze	2022-07-06	1	-4/+6
*	assign the ARIA role "doc-subtitle" to the .Nd element;	Ingo Schwarze	2022-07-06	1	-1/+1
	discussed with Anna Vyalkova <cyber at sysrq dot in>
*	While the HTML standard allows multiple <h1> elements in the same	Ingo Schwarze	2022-07-06	14	-41/+41
	document, <h1> is intended for top level headers, and most of the sections in a manual page can hardly be considered top-level. It is more usual to use <h1> only for the main title of the document or for the site name. Consequently, move .Sh/.SH from <h1> to <h2> and .Ss/.SS from <h2> to <h3>, freeing <h1> for use by header.html in man.cgi(8). Discussed with Anna Vyalkova <cyber at sysrq dot in>.
*	Finally get rid of the archaic <table> markup for header and footer lines	Ingo Schwarze	2022-07-05	4	-41/+49
	and use flexbox CSS instead. Improve accessibility by adding role and aria-label attributes to these header and footer lines. Using ideas from both Anna Vyalkova <cyber at sysrq dot in> and myself. As a welcome side effect, this also resolves the long-standing issue that the rendering was always 65em wide, requiring horizontal scrolling when the window was narrower. Now, rendering nicely adapts to browser windows of arbitrary narrowness.
*	Somehow, the content of header.html ended up	Ingo Schwarze	2022-07-05	1	-30/+42
	before and outside the <header> element. Fix this by moving it into the <header> element where it belongs. While here, also wrap footer.html in a <footer> element.
*	Improve accessibility of man.cgi(8) in various respects,	Ingo Schwarze	2022-07-04	1	-15/+32
	in particular adding <header>, <main>, and <nav> elements and role and aria-label attributes in several places. Patch from Anna Vyalkova <cyber at sysrq dot in>, minimally tweaked by me.
*	Put the HTML comment containing the Copyright header (if any)	Ingo Schwarze	2022-07-04	2	-4/+4
	between the <head> and the <body> rather than before the <head> because the <meta charset="utf-8"/> element ought to be within the first 1024 bytes of the HTML code. Issue found with validator.w3.org.
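	To illustrate the 1024-byte constraint, here is a hypothetical C check, not something mandoc itself performs. It assumes the charset declaration is emitted exactly as <meta charset="utf-8"/> and requires the whole element to end within the first `limit` bytes, so a long copyright comment placed before <head> can push it out of range.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: does the complete charset element end within
 * the first `limit` bytes of the serialized HTML document? */
static int
charset_in_first_bytes(const char *html, size_t limit)
{
	const char *needle = "<meta charset=\"utf-8\"/>";
	const char *p = strstr(html, needle);

	if (p == NULL)
		return 0;	/* no declaration at all */
	return (size_t)(p - html) + strlen(needle) <= limit;
}
```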
*	spelling; patch from jsg@	Ingo Schwarze	2022-07-03	1	-1/+1
*	Instead of the custom <div class="manual-text">, use the standard	Ingo Schwarze	2022-07-03	5	-6/+7
	HTML <main> element. The benefit is that it has the ARIA landmark role "main" by default. To ease the transition for people using their own CSS file instead of mandoc.css, retain the custom class for now. I had this idea in a discussion with Anna Vyalkova <cyber at sysrq dot in>. Patch from Anna, slightly tweaked by me.
*	In groff commit 78e66624 on May 7 20:15:33 2021 +1000,	Ingo Schwarze	2022-06-26	2	-2/+2
	G. Branden Robinson changed the -T ascii rendering of \(sd, the "second" symbol, U+2033 DOUBLE PRIME, from '' to ". Follow suit in mandoc.
*	additional info regarding the .nf <br/> issue	Ingo Schwarze	2022-06-26	1	-1/+3
*	If an .Xr macro contains a section argument, write an aria-label attribute	Ingo Schwarze	2022-06-25	2	-11/+21
	such that users of screen readers aren't forced to listen to lengthy and distracting readings like "mdoc, left parenthesis, 7, right parenthesis". Based on a patch from Anna Vyalkova <cyber at sysrq dot in>, significantly tweaked by me.
*	Improve accessibility of -T html -O toc output by using the <nav> element	Ingo Schwarze	2022-06-24	4	-2/+12
	in the DPUB-ARIA doc-toc role. Patch from Anna Vyalkova <cyber at sysrq dot in> slightly tweaked by me. This is hopefully the start of a collaboration to improve accessibility of Unix manual pages using the WAI-ARIA, HTML-ARIA, and DPUB-ARIA standards. Progress appears to be possible without changing anything with respect to the way manual pages are written. Instead, it seems sufficient to properly translate semantic cues already implied by existing mdoc(7) markup into the appropriate HTML elements and ARIA attributes. Overall, the total length of HTML output is likely to increase slightly, but not much.
*	Delete the statement that the default stylesheet only used CSS1	Ingo Schwarze	2022-06-22	1	-1/+0
	because that has no longer been true for some time now. I would certainly like to adhere to a coherent standard and state which one that is. Unfortunately, the W3C deliberately smashed the CSS standard into pieces such that a coherent standard no longer exists and such that statements about standard conformance have become next to meaningless. Consequently, i now remain reluctantly silent regarding CSS standard(s) conformance. Going back to CSS2.1, published in 2011, which was the last CSS standard in the proper sense of the word, is not an option because it has gaping holes in functionality and is no longer adequate for use on today's WWW.
*	#include <stddef.h>, needed for NULL; bug reported by op@	Ingo Schwarze	2022-06-21	1	-0/+2
*	When looking for the next block to tag, we aren't interested in children	Ingo Schwarze	2022-06-08	1	-1/+2
	of the current block but really want the next block instead. This fixes a segfault reported by Evan Silberman <evan at jklol dot net> on bugs@.
*	Surprisingly, every escape sequence can also be used as an argument	Ingo Schwarze	2022-06-08	29	-52/+1053
	delimiter for an outer escape sequence, in which case the delimiting escape sequence retains its syntax but usually ignores its argument and loses its inherent effect. Add rudimentary support for this syntax quirk in order to improve parsing compatibility with groff.
*	Split the excessively generic diagnostic message "invalid escape sequence"	Ingo Schwarze	2022-06-07	10	-21/+44
	into the more specific messages "invalid escape argument delimiter" and "invalid escape sequence argument".
*	Purge duplicate error reporting from the .tr request parser:	Ingo Schwarze	2022-06-07	1	-11/+2
	the error was already reported earlier when roff_expand() called roff_escape().
*	To better match groff parsing, reject digits and some mathematical	Ingo Schwarze	2022-06-06	1	-7/+9
	operators as argument delimiters for some escape sequences that take numerical arguments, in the same way as it had already been done for \h. Argument delimiter parsing for escape sequences taking numerical arguments is not perfect yet. In particular, when a character representing a scaling unit is abused as the argument delimiter, parsing for that character becomes context-dependent, and it is no longer possible to find the end of the escape sequence without calling the full numerical expression parser, which i refrain from attempting in this commit. For now, continuing to misparse insane constructions like \Bc1c+1cc (which is valid in groff and resolves to "1" because 1c+1c = two centimeters is a valid numerical expression and 'c' is also a valid delimiter) is a small price to pay for keeping complexity at bay and for not losing focus in the ongoing series of refinements.
*	adjust two desired error messages after roff_escape.c rev. 1.11	Ingo Schwarze	2022-06-06	1	-2/+2
	improved diagnostics for the \C escape sequence
*	Allow arbitrary argument delimiters for \C, like groff does.	Ingo Schwarze	2022-06-06	1	-4/+5
	The restriction of only allowing ' as the delimiter was introduced by kristaps@ on 2011/04/09 when he first supported \C. For most other escape sequences, similar restrictions were relaxed later on, but for the rarely used \C, it was apparently forgotten. While here, reject empty character names: they are never valid.
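	A minimal C sketch of the relaxed \C argument parsing, not the actual roff_escape.c code: whatever character follows \C is taken as the delimiter, the character name runs to the next occurrence of that same character, and an empty name is rejected. `parse_C_arg()` is a hypothetical name, and the sketch ignores the case of escape sequences used as delimiters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: `s` points just past the "\C".  The first
 * character is the delimiter; the name starts at s + 1 and ends
 * before the next occurrence of the delimiter.  Returns the number
 * of bytes consumed and stores the name length, or returns 0 if the
 * argument is incomplete or the name is empty.
 */
static size_t
parse_C_arg(const char *s, size_t *name_len)
{
	const char *end;

	if (s[0] == '\0')
		return 0;	/* incomplete escape sequence */
	end = strchr(s + 1, s[0]);
	if (end == NULL)
		return 0;	/* closing delimiter missing */
	if (end == s + 1)
		return 0;	/* empty character names are never valid */
	*name_len = (size_t)(end - (s + 1));
	return (size_t)(end - s) + 1;
}
```

	Under this scheme, \C'em' and \C|em| both yield the two-byte name "em", while \C'' is rejected.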
*	add and update a few entries	Ingo Schwarze	2022-06-06	1	-0/+11
*	With the improved escape sequence parser, it becomes easy to also improve	Ingo Schwarze	2022-06-05	16	-96/+147
	diagnostics. Distinguish "incomplete escape sequence", "invalid special character", and "unknown special character" from the generic "invalid escape sequence", also promoting them from WARNING to ERROR because incomplete escape sequences are severe syntax violations and because encountering an invalid or unknown special character makes it likely that part of the document content intended by the authors gets lost.
*	Small cleanup of error reporting:	Ingo Schwarze	2022-06-05	1	-11/+12
	call mandoc_msg() only once at the end, not sometimes in the middle, classify incomplete, non-expanding escape sequences as ESCAPE_ERROR, and also reduce the number of return statements; no formatting change intended.
*	During identifier parsing, handle undefined escape sequences	Ingo Schwarze	2022-06-03	18	-58/+222
	in the same way as groff:
	* \\ is always reduced to \
	* \. is always reduced to .
	* other undefined escape sequences are usually reduced to the escape name, for example \G to G, except during the expansion of expanding escape sequences having the standard argument form (in particular \* and \n), in which case the backslash is preserved literally.
	Yes, this is confusing indeed. For example, the following have the same meaning:
	* .ds \. and .ds . which is not the same as .ds \\.
	* \*[\.] and \*[.] which is not the same as \*[\\.]
	* .ds \G and .ds G which is not the same as .ds \\G
	* \*[\G] and \*[\\G] which is not the same as \*[G] <- sic!
	To feel less dirty, have a leaning toothpick, if you are so inclined. This patch also slightly improves the string shown by the "escaped character not allowed in a name" error message.
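	The three reduction rules listed for this commit can be sketched in C as follows. This is a hypothetical illustration, not mandoc's code: `reduce_ident()` is an invented name, it assumes the identifier contains only undefined escape sequences, and the `in_expand` flag stands for "inside the argument of \* or \n".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy identifier `in` to `out` (which must hold
 * at least strlen(in) + 1 bytes), applying to each backslash sequence:
 *   \\ -> \   always
 *   \. -> .   always
 *   \X -> X   for other undefined escapes, except that \X is kept
 *             literally when in_expand is set (inside \* or \n)
 * A trailing lone backslash is copied unchanged.
 */
static void
reduce_ident(const char *in, char *out, int in_expand)
{
	while (*in != '\0') {
		if (in[0] != '\\' || in[1] == '\0') {
			*out++ = *in++;
			continue;
		}
		if (in[1] == '\\' || in[1] == '.') {
			*out++ = in[1];		/* \\ and \. always reduce */
		} else if (in_expand) {
			*out++ = '\\';		/* backslash preserved */
			*out++ = in[1];
		} else {
			*out++ = in[1];		/* reduce to the escape name */
		}
		in += 2;
	}
	*out = '\0';
}
```

	With in_expand set, both \G and \\G reduce to the two-character name \G, whereas outside of expansion \G reduces to plain G.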
*	Since \. is not a character escape sequence, re-classify it from the	Ingo Schwarze	2022-06-02	3	-11/+15
	wrong parsing class ESCAPE_SPECIAL to the better-suited parsing class ESCAPE_UNDEF, exactly like it is already done for the similar \\, which isn't a character escape sequence either. No formatting change is intended just yet, but this will matter for upcoming improvements in the parser for roff(7) macro, string, and register names. See the node "5.23.2 Copy Mode" in "info groff" regarding what \\ and \. really mean.
*	Avoid the layering violation of re-parsing for \E in roff_expand().	Ingo Schwarze	2022-06-02	3	-31/+25
	To that end, add another argument to roff_escape() returning the index of the escape name. This also makes the code in roff_escape() a bit more uniform in so far as it no longer needs the "char esc_name" local variable but now does everything with indices into buf[]. No functional change.
*	Fix a buffer overrun in the roff(7) escape sequence parser that could	Ingo Schwarze	2022-06-01	1	-0/+3
	be triggered by macro arguments ending in double backslashes, for example if people wrote .Sq "\\" instead of the correct .Sq "\e".
	The bug was hard to find because it caused a segfault only very rarely, according to my measurements with a probability of less than one permille. I'm sorry that the first one to hit the bug was an arm64 release build run by deraadt@. Thanks to bluhm@ for providing access to an arm64 machine for debugging purposes. In the end, the bug turned out to be architecture-independent.
	The reason for the bug was that i assumed an invariant that does not exist. The function roff_parse_comment() is very careful to make sure that the input buffer does not end in an escape character before passing it on, so i assumed this is still true when reaching roff_expand() immediately afterwards. But roff_expand() can also be reached from roff_getarg(), in which case there can be a lone escape character at the end of the buffer in case copy mode processing found and converted a double backslash. Fix this by handling a trailing escape character correctly in the function roff_escape().
	The lesson here probably is to refrain from assuming an invariant unless verifying that the invariant actually holds is reasonably simple. In some cases, in particular for invariants that are important but not simple, it might also make sense to assert(3) rather than just assume the invariant. An assertion failure is so much better than a buffer overrun...
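	A minimal C sketch of this failure mode, assuming for simplicity that the escape character is a plain backslash; `count_escapes()` is a hypothetical function, not part of roff.c. A scanner that unconditionally steps over two bytes per escape sequence walks past the terminating NUL when the string ends in a lone escape character; the explicit check below prevents that instead of relying on an unverified invariant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner: count the escape sequences in buf without
 * assuming that buf never ends in a lone escape character. */
static size_t
count_escapes(const char *buf)
{
	const char *p = buf;
	size_t n = 0;

	while (*p != '\0') {
		if (*p != '\\') {
			p++;
			continue;
		}
		/* Trailing escape character: stop here rather than
		 * stepping over the terminating NUL with p += 2. */
		if (p[1] == '\0')
			break;
		n++;
		p += 2;		/* skip the escape character and its name */
	}
	return n;
}
```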
*	Rudimentary implementation of the \A escape sequence, following groff	Ingo Schwarze	2022-05-31	3	-5/+40
	semantics (test identifier for syntactical validity), not at all following the completely unrelated Heirloom semantics (define hyperlink target position). The main motivation for providing this implementation is to get \A into the parsing class ESCAPE_EXPAND that corresponds to groff parsing behaviour, which is quite similar to the \B escape sequence (test numerical expression for syntactical validity). This is likely to improve parsing of nested escape sequences in the future. Validation isn't perfect yet. In particular, this implementation rejects \A arguments containing some escape sequences that groff allows to slip through. But that is unlikely to cause trouble even in documents using \A for non-trivial purposes. Rejecting the nested escapes in question might even improve robustness because the rejected names are unlikely to really be usable for practical purposes - no matter that groff dubiously considers them syntactically valid.
*	Trivial patch to put the roff(7) \g (interpolate format of register)	Ingo Schwarze	2022-05-31	3	-2/+5
	escape sequence into the correct parsing class, ESCAPE_EXPAND. Expansion of \g is supposed to work exactly like the expansion of the related escape sequence \n (interpolate register value), but since we ignore the .af (assign output format) request, we just interpolate an empty string to replace the \g sequence. Surprising as it may seem, this actually makes a formatting difference for deviant input like ".O\gNx" which used to raise bogus "escaped character not allowed in a name" and "skipping unknown macro" errors and printed nothing, whereas now it correctly prints "OpenBSD".
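	A hypothetical C sketch of interpolating an empty string for \g, not mandoc's implementation. It handles the three standard name forms \gN, \g(NN, and \g[name], copies other escape sequences through untouched (so that \\g is not mistaken for \g), and leaves incomplete \g sequences verbatim; `strip_g()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy `in` to `out` (at least strlen(in) + 1
 * bytes), replacing every complete \g escape sequence with nothing. */
static void
strip_g(const char *in, char *out)
{
	const char *end;

	while (*in != '\0') {
		if (in[0] != '\\' || in[1] == '\0') {
			*out++ = *in++;
			continue;
		}
		if (in[1] != 'g') {
			/* Some other escape: copy backslash and name. */
			*out++ = *in++;
			*out++ = *in++;
			continue;
		}
		/* in points to "\g": find the end of the register name. */
		if (in[2] == '(' && in[3] != '\0' && in[4] != '\0')
			end = in + 5;			/* \g(NN */
		else if (in[2] == '[' && (end = strchr(in + 3, ']')) != NULL)
			end++;				/* \g[name] */
		else if (in[2] != '\0' && in[2] != '(' && in[2] != '[')
			end = in + 3;			/* \gN */
		else
			end = NULL;			/* incomplete */

		if (end == NULL) {
			*out++ = *in++;	/* keep incomplete input verbatim */
			continue;
		}
		in = end;	/* interpolate the empty string */
	}
	*out = '\0';
}
```

	Applied to the input line from this commit, ".O\gNx" becomes ".Ox", the macro that prints "OpenBSD".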
*	Dummy implementation of the roff(7) \V (interpolate environment variable)	Ingo Schwarze	2022-05-30	7	-9/+47
	escape sequence. This is needed to get \V into the correct parsing class, ESCAPE_EXPAND. It is intentional that mandoc(1) output is not influenced by environment variables, so interpolate the name of the variable with some decorating punctuation rather than interpolating its value.
*	Re-classify the roff(7) \r (reverse line feed) escape sequence	Ingo Schwarze	2022-05-20	6	-10/+36
	from "ignore" to "unsupported" because when an input file uses it, mandoc(1) is likely to significantly misformat the output, usually showing parts of the output in a different order than the author intended.