mandoc - UNIX manpage compiler toolset

	Commit message	Author	Age	Files	Lines
*	remove a bogus warning that went away as a side effect	Ingo Schwarze	2022-05-19	1	-1/+0
	of the roff_expand() reorganization in roff.c rev. 1.388
*	Make roff_expand() parse left-to-right rather than right-to-left.	Ingo Schwarze	2022-05-19	6	-646/+677
	Some escape sequences have side effects on global state, implying that the order of evaluation matters. For example, this fixes the long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to print "321"; now it correctly prints "123".
	Right-to-left parsing was convenient because it implicitly handled nested escape sequences. With correct left-to-right parsing, nesting now requires an explicit implementation, here solved as follows:
	1. Handle nested expanding escape sequences iteratively. When finding one, expand it, then retry parsing the enclosing escape sequence from the beginning, which will ultimately succeed as soon as it no longer contains any nested expanding escape sequences.
	2. Handle nested non-expanding escape sequences recursively. When finding one, the escape sequence parser calls itself to find the end of the inner sequence, then continues parsing the outer sequence after that point.
	This requires the mandoc_escape() function to operate in two different modes. The roff(7) parser uses it in a mode where it generates diagnostics and may return an expansion request instead of a parse result. All other callers, in particular the formatters, use it in a simpler mode that never generates diagnostics and always returns a definite parsing result, but that requires all expanding escape sequences to already have been expanded earlier. The bulk of the code is the same for both modes.
	Since this required a major rewrite of the function anyway, move it into its own new file roff_escape.c and out of the file mandoc.c, which was misnamed in the first place and lacks a clear focus.
	As a side benefit, this also fixes a number of assertion failures that tb@ found with afl(1), for example "\n\\\\0", "\v\-\\0", and "\w\-\\\\\$0*0".
As another side benefit, it also resolves some code duplication between mandoc_escape() and roff_expand() and centralizes all handling of escape sequences (except for expansion) in roff_escape.c, hopefully easing maintenance and feature improvements in the future. While here, also move end-of-input handling out of the complicated function roff_expand() and into the simpler function roff_parse_comment(), making the logic easier to understand. Since this is a major reorganization of a central component of mandoc(1), stability of the program might slightly suffer for a few weeks, but i believe that's not a problem at this point of the release cycle. The new code already satisfies the regression suite, but more tweaking and regression testing to further improve the handling of various escape sequences will likely follow in the near future.
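Why the evaluation order matters can be shown with a toy model in C. This is a sketch, not mandoc's roff_expand(): the struct reg type, the expand() function, and the restriction to the single escape \n+x with a one-character register name are assumptions made for illustration. Each evaluation increments the register, so visiting the escapes left-to-right yields "123" while right-to-left yields "321".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXESC	32	/* toy limit on escapes per line */

/* Toy model of one auto-incrementing register, as after ".nr x 0 1". */
struct reg {
	char	name;	/* one-character register name */
	int	val;	/* current value */
	int	step;	/* auto-increment */
};

/* Increment the register and return the new value, like \n+x. */
static int
reg_preinc(struct reg *r)
{
	r->val += r->step;
	return r->val;
}

/*
 * Replace every "\n+x" in src with the value of register x, evaluating
 * the escapes left-to-right if ltr is nonzero, else right-to-left.
 * Because each evaluation has a side effect, the order shows in the output.
 */
static void
expand(const char *src, struct reg *r, int ltr, char *out, size_t outsz)
{
	const char	*pos[MAXESC];	/* start of each escape */
	int		 vals[MAXESC];	/* value substituted for it */
	const char	 pat[] = { '\\', 'n', '+', r->name, '\0' };
	const char	*p;
	size_t		 i, k, n = 0, len;

	assert(outsz > 0);
	for (p = src; n < MAXESC && (p = strstr(p, pat)) != NULL; p += 4)
		pos[n++] = p;

	/* Evaluate in the requested order: this is where order matters. */
	for (i = 0; i < n; i++) {
		k = ltr ? i : n - 1 - i;
		vals[k] = reg_preinc(r);
	}

	/* Splice the values into the output in source order. */
	out[0] = '\0';
	for (p = src, i = 0; i < n; p = pos[i] + 4, i++) {
		len = strlen(out);
		snprintf(out + len, outsz - len, "%.*s%d",
		    (int)(pos[i] - p), p, vals[i]);
	}
	len = strlen(out);
	snprintf(out + len, outsz - len, "%s", p);
}
```

Calling expand() on two fresh registers initialized to { 'x', 0, 1 }, once with ltr set and once without, reproduces the new and the old output respectively.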
*	improve a comment explaining a particularly nasty hack; no code change	Ingo Schwarze	2022-05-19	1	-1/+6
*	Split a new function roff_parse_comment() out of roff_expand() because this	Ingo Schwarze	2022-05-01	5	-99/+154
	functionality is not needed when called from roff_getarg(). This makes the long and complicated function roff_expand() significantly shorter, and also simpler insofar as it no longer needs to return ROFF_APPEND. No functional change intended.
*	Provide a new function roff_req_or_macro() to parse and handle a request	Ingo Schwarze	2022-04-30	9	-43/+160
	or macro, including context-dependent error handling inside tbl(7) code and inside .ce/.rj blocks. Use it both in the top level roff(7) parser and inside conditional blocks. This fixes an assertion failure triggered by ".if 1 .ce" inside tbl(7) code, found by tb@ using afl(1). As a side benefit for readability, only one place remains in the code that calls the main handler functions for the various roff(7) requests. This patch also improves column numbers in some error messages and various comments.
*	Add comments to some of the enum roff_tok values;	Ingo Schwarze	2022-04-30	1	-12/+12
	particularly useful for values that have non-obvious semantics like ROFF_MAX, ROFF_cblock, ROFF_RENAMED, and TOKEN_NONE; no code change.
*	Refactor the handler function roff_block_sub() for clarity and simplicity.	Ingo Schwarze	2022-04-30	1	-16/+9
	1. Do not needlessly access the function pointer table roffs[]. Instead, simply call the block closing function directly.
	2. Sort code: handle both cases of block closing at the beginning of the function rather than one at the beginning and one at the end.
	3. Trim excessive, partially repetitive and obvious comments, also making the comments considerably more precise.
	No functional change.
*	The syntax of the roff(7) .mc request is quite special	Ingo Schwarze	2022-04-28	9	-3/+135
	and the roff_onearg() parsing function is too generic, so provide a dedicated parsing function instead. This fixes an assertion failure when an \o escape sequence is passed as the argument; the bug was found by tb@ using afl(1). It also makes mandoc output more similar to groff in various cases.
*	Element next-line scopes may nest, so man_breakscope() may have to	Ingo Schwarze	2022-04-28	5	-9/+59
	break multiple element next-line scopes at the same time, similar to what man_descope() already does for unconditional rewinding. This fixes an assertion failure that tb@ found with afl(1), caused by .SH .I .I .BI and similar sequences of macros without arguments.
*	The .AT, .DT, and .UC macros are allowed inside next-line scope	Ingo Schwarze	2022-04-27	12	-8/+104
	and never produce output at the place of their invocation. Minibugs found while investigating unrelated afl(1) reports from tb@.
*	Fix three bugs regarding the interaction of \z and \h:	Ingo Schwarze	2022-04-27	6	-7/+58
	1. The combination \z\h is a no-op whatever the argument may be. In the past, the \z only affected the first space character generated by the \h, which was wrong.
	2. For the combination \zX\h with a positive argument, the first space resulting from the \h is not printed but consumed by the \z.
	3. For the combination \zX\h with a negative argument, application of the \z needs to be completed before the \h can be started. In the past, if this combination occurred at the beginning of an output line, the \h backed up to the beginning of the line and after that, the \z attempted to back up even further, triggering an assertion.
	Bugs found during an audit of assignments to termp->col that i started after the bugfix tbl_term.c rev. 1.65. The assertion triggered by bug 3 was not yet found by afl(1).
*	typo in example text: unsused -> unused; noticed by tb@	Ingo Schwarze	2022-04-26	4	-5/+5
*	At the end of every tbl(7) cell, clear the \z state.	Ingo Schwarze	2022-04-26	6	-5/+65
	This is needed because the TERMP_MULTICOL mode is designed such that term_tbl() buffers all the cells of the table row before the normal reset logic near the end of term_flushln() can be reached. This fixes an assertion failure triggered by \z near the end of a table cell, found by tb@ using afl(1).
*	If a node is tagged explicitly, skip implicit tagging for that node.	Ingo Schwarze	2022-04-26	7	-7/+67
	Apart from making sense in the first place, this fixes an assertion failure that happened when the calculated implicit tag did not match the string value of the first child of the node. Bug found by tb@ using afl(1).
*	When we open a new .while loop, let's not attempt to close out	Ingo Schwarze	2022-04-24	1	-2/+4
	another enclosing .while loop at the same time. Instead, postpone the closing until the next iteration of ROFF_RERUN. This prevents one-line constructions like ".while 0 .while 0 something" and ".while rx .while rx .rr x" (which admittedly aren't particularly useful) from dying of abort(3), which was a bug tb@ found with afl(1).
*	If a .shift request has a negative argument, do not use a negative array	Ingo Schwarze	2022-04-24	7	-11/+32
	index but use 0 instead of the argument, just like groff. Warn about the invalid argument. While here, fix the column number in another warning message. Segfault reported by tb@, found with afl(1).
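The clamping rule can be sketched in C as follows. The struct macro_args type, the shift_args() function, and the warning text are hypothetical; only the rule that a negative argument is reported and replaced by 0 comes from the commit message. The upper clamp to argc is this sketch's own safety measure, not a claim about mandoc or groff.

```c
#include <assert.h>
#include <stdio.h>

#define MAXARG	16	/* toy limit */

/* Hypothetical argument list of the user-defined macro being expanded. */
struct macro_args {
	const char	*argv[MAXARG];
	int		 argc;
};

/*
 * Model of ".shift n": drop the first n arguments and return how many
 * were actually dropped.  A negative n is reported and replaced by 0,
 * so it can never become a negative array index.
 */
static int
shift_args(struct macro_args *ma, int n)
{
	int	i;

	assert(ma->argc >= 0 && ma->argc <= MAXARG);
	if (n < 0) {
		fprintf(stderr, "shift: negative argument %d, using 0\n", n);
		n = 0;
	}
	if (n > ma->argc)	/* this sketch's own upper bound */
		n = ma->argc;
	for (i = 0; i + n < ma->argc; i++)
		ma->argv[i] = ma->argv[i + n];
	ma->argc -= n;
	return n;
}
```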
*	If the last data row of a tbl(7) contains nothing but a horizontal line,	Ingo Schwarze	2022-04-23	1	-2/+4
	do not skip closing the table and cleaning up memory at the end of the table in the HTML output module. This bug resulted in skipping the tblcalc() function and reusing the existing roffcol array for the next tbl(7) processed. If the next table had more columns than the one ending with a horizontal line in the last data row, uninitialized memory was read, potentially resulting in near-infinite output. The bug was introduced in rev. 1.29 (2018/11/26) but only fully exposed by rev. 1.38 (2021/09/09). Until rev. 1.37, it could only cause misformatting and invalid HTML output syntax but not huge output because up to that point, the function did not use the roffcol array. Nasty bug found the hard way by Michael Stapelberg on the production server manpages.debian.org. Michael also supplied example files and excellent instructions on how to reproduce the bug, which was very difficult because no real-world manual page is known that triggers the bug by itself, so to reproduce the bug, mandoc(1) had to be invoked with at least two file name arguments.
*	support for hunting memory leaks;	Ingo Schwarze	2022-04-14	19	-66/+848
	designed and written last autumn, polished today
*	some HTML/CSS issues from John Gardner	Ingo Schwarze	2022-04-14	1	-1/+12
*	prefer https links in man pages	Ingo Schwarze	2022-04-14	1	-3/+3
	patch from jsg@ ok gnezdo@ miod@ jmc@
*	To prevent infinite recursion while expanding eqn(7) definitions,	Ingo Schwarze	2022-04-13	5	-19/+94
	we must not reset the recursion counter when moving beyond the end of the previous expansion, but we may only do so when moving beyond the rightmost position reached by any expansion in the current equation. This matters because definitions can nest; consider:
	.EQ
	define inner "content"
	define outer "inner outer"
	outer
	.EN
	This endless loop was found by tb@ using afl(1). Incidentally, GNU eqn(1) also performs an infinite loop in this situation and then crashes when memory runs out, but that's not an excuse for nasty behaviour of mandoc(1). While here, consistently print the expanded content even when the expansion is finally truncated. While that is not likely to help end-users, it may help authors of eqn(7) code to understand what's going on. Besides, it sends a very clear signal that something is amiss, which was easy to miss in the past unless people enabled -W error or used -T lint.
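The reset rule can be modelled with a toy in-place expander in C. Everything here is an assumption for illustration: words are separated by single spaces, struct def, lookup(), eqn_expand(), and the limit EXPAND_LIMIT are hypothetical, and this is not mandoc's eqn.c. The point is that maxend records the rightmost position reached by any expansion so far, and the counter is reset only once the scan moves past it, so the nested "outer" example above hits the limit instead of looping forever.

```c
#include <assert.h>
#include <string.h>

#define EXPAND_LIMIT	64	/* hypothetical recursion limit */

struct def {
	const char	*name;
	const char	*val;
};

static const char *
lookup(const struct def *defs, size_t ndefs, const char *w, size_t len)
{
	size_t	i;

	for (i = 0; i < ndefs; i++)
		if (strlen(defs[i].name) == len &&
		    strncmp(defs[i].name, w, len) == 0)
			return defs[i].val;
	return NULL;
}

/*
 * Expand defined words in buf in place, rescanning each replacement.
 * Return 0 on success, -1 if the limit or the buffer size is exceeded.
 */
static int
eqn_expand(char *buf, size_t bufsz, const struct def *defs, size_t ndefs)
{
	const char	*val;
	size_t		 pos = 0, maxend = 0, len, vlen, rest;
	int		 count = 0;

	while (buf[pos] != '\0') {
		if (buf[pos] == ' ') {
			pos++;
			continue;
		}
		len = strcspn(buf + pos, " ");
		if ((val = lookup(defs, ndefs, buf + pos, len)) == NULL) {
			pos += len;
			continue;
		}
		/*
		 * Reset only after moving past the rightmost position
		 * reached by ANY expansion so far, not merely past the
		 * end of the most recent one.
		 */
		if (pos >= maxend)
			count = 0;
		if (++count > EXPAND_LIMIT)
			return -1;
		vlen = strlen(val);
		rest = strlen(buf + pos + len);
		if (pos + vlen + rest + 1 > bufsz)
			return -1;
		memmove(buf + pos + vlen, buf + pos + len, rest + 1);
		memcpy(buf + pos, val, vlen);
		/* Text after the replaced word moved by vlen - len. */
		if (maxend > pos + len)
			maxend = maxend - len + vlen;
		if (pos + vlen > maxend)
			maxend = pos + vlen;
		/* pos is not advanced: the replacement is rescanned. */
	}
	return 0;
}
```

With the buggy rule (resetting whenever the scan passes the end of the most recent expansion), the second "outer" would start at a position beyond the end of "content" and the counter would be reset on every round.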
*	Do not die on an assertion if an input file contains no section	Ingo Schwarze	2022-04-13	4	-5/+8
	whatsoever and ends with a broken next-line scope. Obviously, this cannot happen in a real manual page, but mandoc(1) should not die even when fed absurd input. This bug was independently reported by both jsg@ and tb@ who both found it with afl(1).
*	Surprisingly, groff supports multiple copy mode escapes at the	Ingo Schwarze	2022-04-13	5	-7/+55
	beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do the same outside copy mode, so let them do the same in mandoc(1), too. This fixes an assertion failure triggered by \EEX that tb@ found with afl(1). The first E was consumed by roff_expand(), but that function failed to recognize the escape sequence as the expansion of a user-defined string and handed it over to mandoc_escape(), which consumed the second E and then died on an assertion because it is not prepared to handle user-defined strings. Fix this by letting both functions handle arbitrary numbers of 'E's correctly.
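A minimal C sketch of "handle arbitrary numbers of 'E's": skip all of them in one place before classifying the escape, so that \*x, \E*x, and \EE*x are all recognized as interpolations of a user-defined string. The names skip_copy_mode_e() and is_string_interp() are hypothetical and not taken from mandoc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * p points just past the backslash that starts an escape sequence.
 * Skip any number of copy-mode 'E' characters, so that \X, \EX,
 * \EEX, and so on are all recognized the same way.
 */
static const char *
skip_copy_mode_e(const char *p)
{
	assert(p != NULL);
	while (*p == 'E')
		p++;
	return p;
}

/* Is this the interpolation of a user-defined string, \*name? */
static int
is_string_interp(const char *p)
{
	return *skip_copy_mode_e(p) == '*';
}
```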
*	When calculating the width of spanned columns, which for example matters	Ingo Schwarze	2022-04-08	1	-2/+5
	for centering text spanning multiple tbl(7) columns, correctly account for the spacing between columns instead of wrongly assuming the default spacing of 3n. Patch from Simon Branch <simonmbranch at gmail dot com>.
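The arithmetic can be sketched in C: the width of a spanning cell is the sum of the spanned column widths plus the actual spacing between each adjacent pair, not a fixed 3n per gap. The struct col type and span_width() are hypothetical, and so is the convention that each column stores the spacing to its right neighbour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column: content width and spacing to the next column. */
struct col {
	size_t	width;
	size_t	spacing;
};

/*
 * Total width available to a cell spanning columns first..last:
 * all spanned widths plus the real spacing between adjacent columns.
 */
static size_t
span_width(const struct col *cols, size_t first, size_t last)
{
	size_t	i, w;

	assert(first <= last);
	w = cols[first].width;
	for (i = first + 1; i <= last; i++)
		w += cols[i - 1].spacing + cols[i].width;
	return w;
}
```

For columns of widths 5, 4, and 6 separated by spacings of 1 and 2, a cell spanning all three gets 18 units; assuming 3n per gap would have given 21.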
*	new TODO entry: handle Unicode letters in tags	Ingo Schwarze	2022-03-27	1	-0/+5
*	The demandoc(1) program neither reads nor writes any databases, so	Ingo Schwarze	2022-03-20	1	-2/+1
	delete a sentence talking about databases. Having that sentence in the first place probably was a copy-and-paste mistake when adopting some text from the makewhatis(8) manual page. Triggered by a smaller patch sent to discuss@ by Paul A. Patience <paul at apatience dot com>.
*	Avoid legacy CSS2 syntax for the "display" property and use the CSS3	Ingo Schwarze	2022-03-17	1	-3/+3
	two-value syntax "display: inline flow;" instead. In particular, there is no need to establish a new block formatting context with "flow-root", and in fact that's detrimental because it appears to introduce spurious soft-wrap opportunities. jmc@ reported a bogus line break between the opening angle bracket generated by .Aq Mt and the following email address.
*	Just say that the databases are intended for use by apropos(1), whatis(1),	Ingo Schwarze	2022-03-16	1	-4/+2
	and man(1), without restricting that statement to "man -k". Suggested by and patch OK'ed by jmc@. While only apropos(1) and whatis(1) strictly require the database and while our man(1) implementation can find many manual pages even when no database is available or when the database is incomplete or corrupt, it does use the database even without -k whenever possible. Consequently, this change makes the manual page less confusing.
*	In the first example, use "mandoc -a" directly rather than "mandoc -l".	Ingo Schwarze	2022-02-08	1	-1/+1
	It feels more natural to me to use -a directly when asking mandoc(1) to use a pager. The reason that "mandoc -l" does exactly the same as "mandoc -a" is that "mandoc" is essentially "man -lc", so the -a implied by -l negates the -c and the -l has no effect because it is already the default for mandoc(1). The more usual command for doing the same is "man -l foo.1 bar.1 ..." but that's off-topic for the mandoc(1) manual page. Patch on tech@ from Anders Damsgaard <anders at adamsgaard dot dk>.
*	remove "please" from manual page;	Ingo Schwarze	2022-02-08	1	-1/+1
	patch from jsg@, ok jmc@ sthen@ millert@
*	Tedu support for the -xsh4.2 argument to the mdoc(7) .St macro	Ingo Schwarze	2022-01-13	2	-7/+3
	because all of the following hold:
	* It is an alias for a part of an ancient standard that is no longer important.
	* To refer to that old standard, -xpg4.2 is readily available and portable.
	* It is unused in OpenBSD, FreeBSD, and NetBSD.
	* Groff never supported it.
	I agreed with G. Branden Robinson that deleting this from mandoc is preferable to adding it to groff.
*	Only sort the result array if it contains more than one element,	Ingo Schwarze	2022-01-13	1	-2/+2
	making the mansearch() function easier to read for human auditors. No functional change on OpenBSD. As observed by Mark Millard <marklmi at yahoo dot com>, neither the latest version of POSIX 2008 nor C11 defines what qsort(3) should do for base == NULL && nmemb == 0. My impression is it is indeed undefined behaviour because the standards say that base shall point to an array, NULL does not point to an array, and while there is special wording saying that compar() shall not be called if nmemb == 0, i fail to see any similar wording stating that base shall not be accessed if nmemb == 0. Consequently, this patch is also likely to improve standard conformance and portability. Minor issue found by Stefan Esser <se at FreeBSD> with UBSAN. He sent a patch to bugs@, but my patch differs in a minor way.
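The guard can be sketched in C: calling qsort(3) only when there are at least two elements means a NULL base with nmemb == 0 is never passed to it. The functions sort_results() and cmp_str() are hypothetical stand-ins, not the actual mansearch() code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two elements of an array of C strings. */
static int
cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort only when there is something to sort.  With 0 or 1 elements,
 * res may be NULL and qsort(3) is not called at all.
 */
static void
sort_results(const char **res, size_t sz)
{
	if (sz > 1)
		qsort(res, sz, sizeof(*res), cmp_str);
}
```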
*	More accurately represent cells containing horizontal lines in -T tree	Ingo Schwarze	2022-01-12	1	-4/+8
	output. In particular, do not represent "_" as "-", and distinguish "_" from "\_" and "=" from "\=". Output tweak following a related question from Ted Bullock <tbullock at comlore dot com>.
*	According to the tbl(7) manual, if a data cell contains only the	Ingo Schwarze	2022-01-12	1	-2/+4
	two character sequence "\_" or "\=", a single or double horizontal line is supposed to be drawn inside the cell, not joining its neighbours. I am not aware of any way to do that with HTML and/or CSS. Still, it seems closer to the intent of the document author to draw a horizontal line with <hr/>, even though that line will join the neighbour cells, rather than printing a literal '_' or '=' character. Formatting tweak inspired by a related question from Ted Bullock <tbullock at comlore dot com>.
*	In one of the examples, the tbl(7) source code displayed	Ingo Schwarze	2022-01-12	1	-2/+2
	contains a backslash that needs to be escaped, and the missing escaping resulted in very misleading formatting. Documentation bug found due to a question from Ted Bullock <tbullock at comlore dot com>.
*	column width specifications in tbl(7) HTML output	Ingo Schwarze	2022-01-12	1	-0/+4
*	When rendering the \h (horizontal motion) low-level roff(7) escape	Ingo Schwarze	2022-01-10	1	-5/+7
	sequence in -T ps and -T pdf output mode, use an appropriate horizontal distance by correctly using the term_len() utility function. Output from the -T ascii, -T utf8, and -T html modes was already correct and remains unchanged. Lennart Jablonka <hummsmith42 at gmail dot com> found and reported this unit conversion bug (misinterpreting AFM units as if they were en units) when rendering scdoc-generated manuals (which is a low quality generator, but that's no excuse for mandoc misformatting \h) on Alpine Linux. Lennart also tested this patch.
*	merge OpenBSD commit by jmc@:	Ingo Schwarze	2021-12-06	1	-1/+1
	sytle -> style; adapted from changes by SAITOH masanobu (NetBSD)
*	Make sure that the configuration file is always read, even when	Ingo Schwarze	2021-11-05	1	-48/+37
	running with the -M option or with a MANPATH environment variable that has neither a leading nor a trailing ":" nor any "::". If -M or MANPATH override the configuration file rather than adding to it, just ignore any "manpath" directives while processing the configuration file. This fixes a bug reported by Jan Stary <hans at stare dot cz> on misc@.
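The decision described above can be sketched in C. The function name use_conf_manpath() is hypothetical, and treating an empty MANPATH like an unset one is an assumption of this sketch. The configuration file is read in every case; this predicate only decides whether its "manpath" directives take effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Should "manpath" directives from the configuration file be used?
 * -M always overrides them.  MANPATH overrides them unless it has a
 * leading ':', a trailing ':', or contains "::", in which case it
 * adds to the configured search path instead.
 */
static bool
use_conf_manpath(const char *opt_M, const char *env_manpath)
{
	size_t	len;

	if (opt_M != NULL)
		return false;
	/* Assumption: an empty MANPATH counts as unset. */
	if (env_manpath == NULL || *env_manpath == '\0')
		return true;
	len = strlen(env_manpath);
	return env_manpath[0] == ':' || env_manpath[len - 1] == ':' ||
	    strstr(env_manpath, "::") != NULL;
}
```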
*	Commit and commit message by deraadt@:	Ingo Schwarze	2021-11-05	2	-3/+6
	For open/openat, if the flags parameter does not contain O_CREAT, the 3rd (variadic) mode_t parameter is irrelevant. Many developers in the past have passed mode_t (0, 044, 0644, or such), which might lead future people to copy this broken idiom, and perhaps even believe this parameter has some meaning or implication or application. Delete them all. This comes out of a conversation where tb@ noticed that a strange (but intentional) pledge behaviour is to always knock-out high-bits from mode_t on a number of system calls as a safety factor, and his bewilderment that this appeared to be happening against valid modes (at least visually), but no sorry, they are all irrelevant junk. They could all be 0xdeafbeef. ok millert
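The idiom can be illustrated with two short C wrappers around open(2); the wrapper names are hypothetical. Only the call that includes O_CREAT passes a mode; the read-only call omits the third argument entirely. (On Linux, O_TMPFILE is a further case where the mode is consulted.)

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Without O_CREAT, open(2) never consults the mode argument, so do
 * not pass one: a value such as 0644 here would suggest a meaning
 * it does not have.
 */
static int
open_for_reading(const char *path)
{
	return open(path, O_RDONLY);
}

/* Only when the call may create the file is the mode relevant. */
static int
create_for_writing(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```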
*	simplify a few accesses to fields of structs, using auxiliary pointer	Ingo Schwarze	2021-10-17	1	-5/+3
	variables that are already present (and used nearby) in the code; no functional change
*	Simplify the code building lists of spans, no output change intended.	Ingo Schwarze	2021-10-17	1	-6/+7
	A comment in the code claimed that the list of spans would be sorted, but the sorting did not actually work. The layout "LSSS,LLSL" resulted in the list "0-3, 1-2", whereas the layout "LLSL,LSSS" resulted in the list "1-2, 0-3". Since sorting serves no purpose, just leave the list unsorted.
*	better error message if mandocd is not found	Ingo Schwarze	2021-10-15	1	-1/+1
*	Clean up memory handling in spawn_pager(), free(3)ing everything	Ingo Schwarze	2021-10-04	1	-14/+12
	that is malloc(3)ed. In addition to being less confusing, the new code is also shorter by two lines.
*	In man(1) mode, properly clean up the resn[] result array	Ingo Schwarze	2021-10-04	1	-0/+7
	after processing each name given on the command line. Failure to do so resulted in a memory leak of about 50 kilobytes per name given on the command line. Since man(1) uses a few Megabytes of memory anyway and people rarely give hundreds of names on the command line, this leak did not cause practical problems, but cleaning up properly is better in any case.
*	Provide a cleanup function for the term_tab module, freeing memory	Ingo Schwarze	2021-10-04	3	-8/+20
	and resetting the internal state to the initial state. Call this function from the proper place in term_free(). With the way the module is currently used, this does not imply any functional change, but doing proper cleanup is more robust, makes it easier during code review to understand what is going on, and makes it explicit that there is no memory leak.
*	store the operating system name obtained from uname(3) in the adequate	Ingo Schwarze	2021-10-04	3	-7/+8
	struct together with similar state data rather than in a function-scope static variable, such that it can be free(3)d in roff_man_free(); no functional change
*	Do not leak 64 bytes of heap memory every time a manual page calls	Ingo Schwarze	2021-10-04	1	-2/+0
	a user-defined macro. Calls of standard mdoc(7) and man(7) macros were unaffected, so the effect on OpenBSD manual pages was small, about 80 Kilobytes grand total for a full run of "makewhatis /usr/share/man". Argument expansion contexts for user-defined macros are stored on a stack that grows as needed if calls of user-defined macros are nested or recursive. Individual stack entries contain dynamically allocated arrays of pointers to arguments; these argument arrays also grow as needed if user-defined macros take more than eight arguments. The mistake was that argument arrays of already initialized expansion contexts were leaked rather than reused on subsequent macro calls. I found this issue in a systematic hunt for memory leaks after Michael <Stapelberg at Debian> reported memory exhaustion problems on the production server manpages.debian.org. This sub-Megabyte leak is not the cause of Michael's trouble, though, where Gigabytes of memory are being wasted. We are still investigating whether the original problem may be related to his supervisor process, which is written in Go, rather than to mandoc.
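The data structure described above can be sketched in C: a stack of expansion contexts whose argument arrays start at eight slots and grow on demand, and which are reused rather than reallocated when the same stack slot is entered again. All names (struct mctx, struct mstack, mstack_push(), mctx_addarg(), mstack_pop(), mstack_free(), xrealloc()) are hypothetical; this is a model of the fix, not mandoc's roff.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical expansion context for one level of macro call. */
struct mctx {
	char	**argv;		/* pointers to arguments, not owned */
	int	  argc;		/* arguments in use */
	int	  argsz;	/* allocated slots in argv */
};

/* Stack of contexts; it grows when macro calls nest or recurse. */
struct mstack {
	struct mctx	*ctx;
	int		 depth;	/* contexts currently in use */
	int		 sz;	/* contexts ever initialized */
};

static void *
xrealloc(void *p, size_t sz)
{
	if ((p = realloc(p, sz)) == NULL)
		abort();
	return p;
}

/*
 * Enter a macro call.  A context is initialized only the first time
 * its stack slot is used; later calls reuse the argv array already
 * allocated for that slot instead of allocating (and leaking) a new one.
 */
static struct mctx *
mstack_push(struct mstack *ms)
{
	struct mctx	*c;

	if (ms->depth == ms->sz) {
		ms->ctx = xrealloc(ms->ctx, (ms->sz + 1) * sizeof(*ms->ctx));
		c = ms->ctx + ms->sz++;
		c->argsz = 8;
		c->argv = xrealloc(NULL, c->argsz * sizeof(*c->argv));
	}
	c = ms->ctx + ms->depth++;
	c->argc = 0;
	return c;
}

/* Add an argument, growing argv only when it is full. */
static void
mctx_addarg(struct mctx *c, char *arg)
{
	if (c->argc == c->argsz) {
		c->argsz *= 2;
		c->argv = xrealloc(c->argv, c->argsz * sizeof(*c->argv));
	}
	c->argv[c->argc++] = arg;
}

/* Leave a macro call: keep the context and its argv for reuse. */
static void
mstack_pop(struct mstack *ms)
{
	assert(ms->depth > 0);
	ms->depth--;
}

static void
mstack_free(struct mstack *ms)
{
	int	i;

	for (i = 0; i < ms->sz; i++)
		free(ms->ctx[i].argv);
	free(ms->ctx);
	ms->ctx = NULL;
	ms->depth = ms->sz = 0;
}
```

After a call with ten arguments grows a slot's array to sixteen entries, the next call that lands in the same slot starts with argc reset to 0 but keeps the sixteen-entry array, so no allocation is lost.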
*	tagging issues from weerd@ regarding hyphens	Ingo Schwarze	2021-10-01	1	-0/+11
*	Revert part of the previous diff to fix a regression (another endless loop)	Ingo Schwarze	2021-09-28	1	-3/+17
	reported by Michael <Stapelberg at Debian> in the Linux md(4) manual. The reason the colwidth[] array is needed is not that it stores widths different from those in tbl->cols[].width, but that only part of the columns participate in the comparisons, i.e. only those intersecting at least one span that still requires width distribution.