mandoc - UNIX manpage compiler toolset

	Commit message	Author	Age	Files	Lines
*	When starting a new input line, even when continuing the same output	Ingo Schwarze	2022-08-16	1	-6/+37
\| \| \| \| \| \|	line, use the current output position as the reference position for tabs on that input line. This brings mandoc in line with the behaviour of GNU, Heirloom, and Plan 9 roff.
*	Distinguish between escape sequences that produce no output	Ingo Schwarze	2022-08-15	1	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	whatsoever (for example \fR) and escape sequences that produce invisible zero-width output (for example \&). No, i'm not joking, groff does make that distinction, and it has consequences in some situations, for example for vertical spacing in no-fill mode. Heirloom and Plan 9 behaviour is subtly different, but in case of doubt, we want to follow groff. While this fixes the behaviour for the majority of escape sequences, in particular for those most likely to occur in practice, it is not perfect yet because some of the more exotic ESCAPE_IGNORE sequences are actually of the "no output whatsoever" type but treated as "invisible zero-width" for now. With the new ASCII_NBRZW mechanism in place, switching them over one by one when the need arises will no longer be very difficult.
*	In GNU, Heirloom, and Plan 9 roff, tab positions apply to input lines,	Ingo Schwarze	2022-08-15	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	not to output lines. In particular, if an input line gets broken in fill mode and a tab occurs in the second output line, it advances to a position of at least (width of the first output line) + (width of a space character even though this is never printed) + (width of the part of the second output line that precedes the tab). Implement the same logic in mandoc. Again, do not use tabs in filled text: they have surprising effects, including this one.
*	In GNU, Heirloom, and Plan 9 roff, literal tab characters are	Ingo Schwarze	2022-08-15	1	-19/+15
\| \| \| \| \| \| \| \| \| \| \| \| \|	non-breakable in exactly the same way as "\ ". That is, the preceding word, the tab character, and the following word are always kept together on the same output line. If filling is enabled and an output line break is required before the end of the following word, the break occurs before the beginning of the preceding word. Make mandoc behave in the same way. Of course, using literal tab characters in filled text remains a bad idea, and the "WARNING: tab in filled text" remains unchanged.
*	Fix three bugs regarding the interaction of \z and \h:	Ingo Schwarze	2022-04-27	1	-3/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. The combination \z\h is a no-op whatever the argument may be. In the past, the \z only affected the first space character generated by the \h, which was wrong. 2. For the combination \zX\h with a positive argument, the first space resulting from the \h is not printed but consumed by the \z. 3. For the combination \zX\h with a negative argument, application of the \z needs to be completed before the \h can be started. In the past, if this combination occurred at the beginning of an output line, the \h backed up to the beginning of the line and after that, the \z attempted to back up even further, triggering an assertion. Bugs found during an audit of assignments to termp->col that i started after the bugfix tbl_term.c rev. 1.65. The assertion triggered by bug 3 was not yet found by afl(1).
*	When rendering the \h (horizontal motion) low-level roff(7) escape	Ingo Schwarze	2022-01-10	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	sequence in -T ps and -T pdf output mode, use an appropriate horizontal distance by correctly using the term_len() utility function. Output from the -T ascii, -T utf8, and -T html modes was already correct and remains unchanged. Lennart Jablonka <hummsmith42 at gmail dot com> found and reported this unit conversion bug (misinterpreting AFM units as if they were en units) when rendering scdoc-generated manuals (which is a low quality generator, but that's no excuse for mandoc misformatting \h) on Alpine Linux. Lennart also tested this patch.
*	Provide a cleanup function for the term_tab module, freeing memory	Ingo Schwarze	2021-10-04	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	and resetting the internal state to the initial state. Call this function from the proper place in term_free(). With the way the module is currently used, this does not imply any functional change, but doing proper cleanup is more robust, makes it easier during code review to understand what is going on, and makes it explicit that there is no memory leak.
*	Support two-character font names (BI, CW, CR, CB, CI)	Ingo Schwarze	2021-08-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in the tbl(7) layout font modifier. Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead, which simplifies and unifies some code. While here, also support CB and CI in roff(7) \f escape sequences and in roff(7) .ft requests for all output modes. Using those is certainly not recommended because portability is limited even with groff, but supporting them makes some existing third-party manual pages look better, in particular in HTML output mode. Bug-compatible with groff as far as i'm aware, except that i consider font names starting with the '\n' (ASCII 0x0a line feed) character so insane that i decided to not support them. Missing feature reported by nabijaczleweli dot xyz in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002. I used none of the code from the initial patch submitted by nabijaczleweli, but some of their ideas. Final patch tested by them, too.
*	Do not indent by SIZE_MAX/2 when .ce occurs inside explicit no-fill mode.	Ingo Schwarze	2020-09-02	1	-12/+9
\| \| \| \| \| \| \| \|	While here, drop two unused arguments from the function term_field(); the related work was already done by term_fill() before this commit. I found the bug in an afl run that was performed by Jan Schreiber <jes at posteo dot de>.
*	Explicitly state that the cases in the inner switch in term_fill()	Ingo Schwarze	2019-06-03	1	-0/+2
\| \| \| \| \| \| \| \|	are exhaustive. While there is no bug, being explicit has no downside and is potentially safer for the future. Michal Nowak <mnowak at startmail dot com> reported that gcc 4.4.4 and 7.4.0 on illumos throw -Wuninitialized false positives.
*	In PostScript and PDF output, one AFM unit is not nearly enough	Ingo Schwarze	2019-01-15	1	-2/+3
\| \| \| \| \| \|	inter-word spacing, let's try again with 250 AFM units. Regression caused during my recent term_flushln() reorg in rev. 1.278, reported by brynet@ (sorry and many thanks for reporting).
*	Implement centering and adjustment to the right margin directly in	Ingo Schwarze	2019-01-04	1	-1/+15
\| \| \| \| \| \| \|	the terminal filling routine, controlled by new flags TERMP_CENTER and TERMP_RIGHT. This became possible by the recent term_flushln() rewrite. No functional change yet, but to be used by upcoming commits.
*	Rewrite the line filling function for terminal output yet again.	Ingo Schwarze	2019-01-03	1	-178/+252
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function has always been among the most complicated parts of mandoc, and it repeatedly needed substantial functional enhancements. The present rewrite is required to prepare for the implementation of simultaneous filling and centering of output lines. The previous implementation looked at each word in turn and printed it to the output stream as soon as it was found to still fit on the current output line. Obviously, that approach neither allows centering nor adjustment to the right margin. The new implementation first decides which part of the paragraph to put onto the current output line, also measuring the display width of that part, even if that part consists of multiple words including intervening whitespace. This will allow moving the whole output line to the right as desired before printing it, for example to center it or to adjust it to the right margin. The function is split into three parts, each much shorter, solving a better defined task, much easier to understand and better commented: 1. the steering function term_flushln() looping over output lines; 2. the calculation function term_fill() looping over input characters; 3. and the output function term_field() looping over printed characters. No functional change yet.
*	Several improvements to escape sequence handling.	Ingo Schwarze	2018-12-15	1	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add the missing special character \_ (underscore). * Partial implementations of \a (leader character) and \E (uninterpreted escape character). * Parse and ignore \r (reverse line feed). * Add a WARNING message about undefined escape sequences. * Add an UNSUPP message about unsupported escape sequences. * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported. * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets. * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid. * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names. * Starting a special character name with a blank is an error. * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode. * Document the newly handled escapes, and a few that were missing. * Regression tests for most of the above.
*	Implement the \f(CW and \f(CR (constant width font) escape sequences	Ingo Schwarze	2018-10-25	1	-0/+1
\| \| \| \| \| \| \| \| \|	for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
*	Implement the \*(.T predefined string (interpolate device name)	Ingo Schwarze	2018-08-16	1	-1/+26
\| \| \| \| \|	by allowing the preprocessor to pass it through to the formatters. Used for example by the groff_char(7) manual page.
*	fix typo in TERMP_MULTICOL flag test;	Ingo Schwarze	2017-07-28	1	-1/+1
\| \| \| \|	patch from florian@, found with clang
*	implement so-called absolute horizontal motion: \h'\|...',	Ingo Schwarze	2017-06-14	1	-1/+6
\| \| \| \|	used for example by zoem(1)
*	let \l use the right fill character	Ingo Schwarze	2017-06-14	1	-5/+5
*	improve rounding rules for scaling units	Ingo Schwarze	2017-06-14	1	-3/+17
\| \| \| \|	in horizontal orientation in the terminal formatter
*	implement the roff(7) \p (break output line) escape sequence	Ingo Schwarze	2017-06-14	1	-0/+32
*	Implement automatic line breaking	Ingo Schwarze	2017-06-12	1	-24/+39
\| \| \| \| \|	inside individual table cells that contain text blocks. This cures overlong lines in various Xenocara manuals.
*	make the internal a2roffsu() interface more powerful by returning	Ingo Schwarze	2017-06-08	1	-13/+4
\| \| \| \| \|	a pointer to the end of the parsed data, making it easier to parse subsequent bytes
*	Prepare the terminal driver for filling multiple columns in parallel,	Ingo Schwarze	2017-06-07	1	-24/+39
\| \| \| \| \| \| \| \| \| \|	second step: make the per-column byte pointer persistent across term_flushln() calls, such that a subsequent call can continue at the point where the previous call left off. If more than one column is in use, return from term_flushln() when the column is full, rather than breaking the output line. No functional change, because nothing sets up multiple columns yet.
*	Prepare the terminal driver for filling multiple columns in parallel,	Ingo Schwarze	2017-06-07	1	-66/+68
\| \| \| \| \| \|	first step: split column data out of the terminal state struct into a new column state struct and use an array of such column state structs. No functional change.
*	The \h escape sequence provides another method for moving backwards,	Ingo Schwarze	2017-06-07	1	-14/+27
\| \| \| \| \| \|	and after that, previously written output gets overwritten, but overwriting with blanks does not erase previously written content. Yes, manual pages exist that are crazy enough to rely on that...
*	Implement the roff(7) .mc (right margin character) request.	Ingo Schwarze	2017-06-04	1	-39/+60
\| \| \| \| \| \|	The Tcl/Tk manual pages use this extensively. Delete the TERM_MAXMARGIN hack, it breaks .mc inside .nf; instead, implement a proper TERMP_BRNEVER flag.
*	Make term_flushln() simpler and more robust:	Ingo Schwarze	2017-06-04	1	-61/+24
\| \| \| \| \| \|	Eliminate the "overstep" state variable. The information is already contained in "viscol". Minus 60 lines of code, no functional change intended.
*	Partial implementation of \l (horizontal line drawing function).	Ingo Schwarze	2017-06-02	1	-1/+57
\| \| \| \| \| \| \| \| \| \| \|	A full implementation would require access to output device properties and state variables (both only available after the main parser has finalized the parse tree) before numerical expansions in the roff preprocessor (i.e., before the main parser is even started). Not trying to pull that stunt right now because the static-width implementation committed here is sufficient for tcl-style manual pages and already more complicated than i would have suspected.
*	Minimal implementation of the \h (horizontal motion) escape sequence.	Ingo Schwarze	2017-06-01	1	-0/+22
\| \| \| \|	Good enough to cope with the average DocBook insanity.
*	Basic implementation of the roff(7) .ta (define tab stops) request.	Ingo Schwarze	2017-05-07	1	-11/+14
\| \| \| \| \| \|	This is the first feature made possible by the parser reorganization. Improves the formatting of the SYNOPSIS in many Xenocara GL manuals. Also important for ports, as reported by many, including naddy@.
*	Fix an assertion failure caused by \z\[u00FF] with -Tps/-Tpdf.	Ingo Schwarze	2017-01-08	1	-2/+14
\| \| \| \|	Reported by jsg@ after an afl(1) run long ago.
*	Fix assertion failures caused by whitespace inside \o'' (overstrike)	Ingo Schwarze	2016-08-10	1	-3/+5
\| \| \| \| \| \|	sequences that jsg@ found with afl(1): * Avoid writing \t\b in term.c. * Handle trailing \b in term_ps.c.
*	sed 's/the the/the/' in a comment; from krw@	Ingo Schwarze	2016-04-12	1	-1/+1
*	This code wasted memory by allocating sizeof(enum termfont *)	Ingo Schwarze	2016-01-07	1	-1/+1
\| \| \| \| \|	where only sizeof(enum termfont) is needed. Fixes CID 1288941. From christos@ via wiz@, both at NetBSD.
*	apply bold and italic to all non-ASCII Unicode codepoints,	Ingo Schwarze	2015-10-23	1	-1/+1
\| \| \| \|	fixing input like \fB\('e; issue reported by bentley@
*	Major character table cleanup:	Ingo Schwarze	2015-10-13	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Use ohash(3) rather than a hand-rolled hash table. * Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently. * No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them. * Sort the characters to agree with the mandoc_chars(7) manual page. * Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch). No functional change, minus 100 LOC, and i don't see a performance change.
*	To make the code more readable, delete 283 /* FALLTHROUGH */ comments	Ingo Schwarze	2015-10-12	1	-4/+0
\| \| \| \| \| \|	that were right between two adjacent case statements. Keep only those 24 where the first case actually executes some code before falling through to the next case.
*	modernize style: "return" is not a function	Ingo Schwarze	2015-10-06	1	-6/+6
*	/* NOTREACHED */ after abort() is silly, delete it	Ingo Schwarze	2015-09-26	1	-1/+0
*	Trailing whitespace is significant when determining the width of a tag	Ingo Schwarze	2015-09-21	1	-0/+6
\| \| \| \| \|	in mdoc(7) .Bl -tag and man(7) .TP, but not in man(7) .IP. Quirk reported by Jan Stary <hans at stare dot cz> on ports@.
*	Drop leading, internal, and trailing blank characters in \o (overstrike)	Ingo Schwarze	2015-08-30	1	-1/+7
\| \| \| \| \| \|	escape sequences; that's cleaner for all output modes, and it's required to prevent the PostScript/PDF formatter from dying on assertions. Bug found by jsg@ with afl.
*	Replace the kludge for the \z escape sequence by an actual	Ingo Schwarze	2015-04-29	1	-42/+26
\| \| \| \| \| \| \| \|	implementation. As a side effect, minus ten lines of code. As another side effect, this also fixes the assertion failure that used to be triggered by "\z\o'ab'c" at the beginning of an output line, found by jsg@ with afl (test case 022/Apr27).
*	Rounding rules for horizontal scaling widths are more complicated.	Ingo Schwarze	2015-04-04	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	There is a first rounding to basic units on the input side. After that, rounding rules differ between requests and macros. Requests round to the nearest possible character position. Macros round to the next character position to the left. Implement that by changing the return value of term_hspan() to basic units and leaving the second scaling and rounding stage to the formatters instead of doing it in the terminal handler. Improves for example argtable2(3).
*	Third step towards parser unification:	Ingo Schwarze	2015-04-02	1	-3/+3
\| \| \| \| \|	Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta. Written on the train from London to Exeter on the way to p2k15.
*	prevent the skipvsp flag from creeping past actual text	Ingo Schwarze	2015-03-06	1	-0/+1
*	Use relative offsets instead of absolute pointers for the terminal	Ingo Schwarze	2015-01-31	1	-14/+6
\| \| \| \| \| \|	font stack. The latter fail after the stack is grown with realloc(). Fixing an assertion failure found by jsg@ with afl some time ago (test case number 51).
*	Rudimentary implementation of the roff(7) \o escape sequence (overstrike).	Ingo Schwarze	2015-01-21	1	-1/+26
\| \| \| \| \| \|	This is of some relevance because the pod2man(1) preamble abuses it for the Icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
*	Support negative indentations for mdoc(7) displays and lists.	Ingo Schwarze	2014-12-24	1	-1/+1
\| \| \| \| \| \|	Not exactly recommended for use, rather for groff compatibility. While here, introduce similar SHRT_MAX limits as in man(7), fixing a few cases of infinite output found by jsg@ with afl.
*	When a man(7) document contains unreasonably large numbers for	Ingo Schwarze	2014-12-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	indentations or paragraph distances, large output may be generated, which is practically the same as an endless loop; found by jsg@ with afl. Reject such unreasonably large numbers beyond arbitrary limits similar to those used by groff (max. 65 blank lines between paragraphs and max. SHRT_MAX characters per output line) and fall back to defaults when exceeded. Having the limits behave in exactly the same way is not relevant.
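
The limit handling described in the last entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the helper limit_or_default() and the macro names MAX_PAR_DIST and MAX_LINE_CHARS are hypothetical, and only the two numeric limits (65 blank lines, SHRT_MAX characters) come from the commit message.

```c
#include <limits.h>

/*
 * Hypothetical names; only the values are taken from the commit
 * message (max. 65 blank lines between paragraphs and max.
 * SHRT_MAX characters per output line).
 */
#define MAX_PAR_DIST	65
#define MAX_LINE_CHARS	SHRT_MAX

/*
 * Return the requested value if it does not exceed the limit,
 * otherwise fall back to the given default.  This sketch checks
 * the upper bound only; how negative values are treated is left
 * open here.
 */
long
limit_or_default(long requested, long limit, long dflt)
{
	if (requested > limit)
		return dflt;
	return requested;
}
```

With such a helper, a request for 1000 blank lines between paragraphs would yield limit_or_default(1000, MAX_PAR_DIST, 1), that is, the default of 1 rather than 1000 lines of output. The default value 1 is again just an assumption chosen for the example.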