summaryrefslogtreecommitdiffstats
path: root/roff.c
Commit message (Collapse)AuthorAgeFilesLines
* Do not fail an assertion when a high level macro occurs in the bodyIngo Schwarze2019-12-261-1/+13
| | | | | | | of a conditional inside a .ce request block. Instead, abort the .ce block just like when there is no conditional in between. Bug found by espie@ working on the textproc/fstrcmp port.
* In the past, generating comment nodes stopped at the .TH or .DdIngo Schwarze2019-11-091-3/+7
| | | | | | | | | | | macro, which is usually close to the beginning of the file, right after the Copyright header comments. But espie@ found horrible input files in the textproc/fstrcmp port that generate lots of parse nodes before even getting to the header macro. In some formatters, comment nodes after some kinds of real content triggered assertions. So make sure generation of comment nodes stops once real content is encountered.
* delete trailing whitespace and space-tab sequences; no code change;Ingo Schwarze2019-07-011-2/+2
| | | | | patch from Michal Nowak <mnowak at startmail dot com> who found these with git pbchk in the illumos tree
* When calling an empty macro, do not clobber existing arguments.Ingo Schwarze2019-04-211-1/+6
| | | | | Fixing a bug found with the groffer(1) version 1.19 manual page following a report from Jan Stary.
* Implement the roff .break request (break out of a .while loop).Ingo Schwarze2019-04-211-8/+45
| | | | | | | Jan Stary <hans at stare dot cz> found it in an ancient groffer(1) manual page (version 1.19) on MacOS X Mojave. Having .break not implemented wasn't a particularly bright idea because obviously, it tended to cause infinite loops.
* Let roff_getname() end the roff identifier at a tab characterIngo Schwarze2019-02-061-7/+18
| | | | | | | | | | | | | | | | | | | | | | and audit all its callers whether termination is handled correctly. Resulting improvements: * An escape or tab ending the macro name in a macro invocation is discarded, and argument processing is started after it. * An escape or tab ending a name in ".if d" and ".if r" is preserved. * An escape ending a name in ".ds" causes the whole request to be ignored. * A tab ending a name in ".ds" becomes part of the string. * An escape or tab ending a name in ".rm" causes the rest of the line to be ignored. * An escape or tab ending the first name in ".als", ".rn", or ".nr" causes the whole request to be ignored. Kurt Jaeger <pi at FreeBSD> made me aware of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235456#c0 and in that bug report, comment 0 item (3) is a special case of this class of issues. Yes, the "mh" manual pages are no doubt among the worst on the planet.
* adjust style and comments in roff_getname(); no functional changeIngo Schwarze2019-02-061-11/+14
|
* no-fill mode has to be suspended during tbl(7) rendering, tooIngo Schwarze2019-01-051-0/+2
|
* Some high-level block macros have an effect similar to temporarilyIngo Schwarze2019-01-051-2/+2
| | | | | | | | | | | | suspending no-fill mode during their head. Model this with an additional roff parser state flag ROFF_NONOFILL. That is much simpler than it would be to save and restore the ROFF_NOFILL flag itself, in particular since the latter can be switched (with lasting effect) by the .nf and .fi requests even while its effect is temporarily suspended. This commit does not change formatting yet, but prepares for future formatting simplifications and improvements.
* Store the fill mode with a new flag NODE_NOFILL in every node,Ingo Schwarze2018-12-311-0/+4
| | | | | | like it is already done with NODE_SYNPRETTY, such that the fill mode becomes more directly available to the formatters. Not used yet, but will be used by upcoming commits.
* Move parsing of the .nf and .fi (fill mode) requests from the man(7)Ingo Schwarze2018-12-311-20/+28
| | | | | | parser to the roff(7) parser. As a side effect, .nf and .fi are now also parsed in mdoc(7) input, though the mdoc(7) formatters still ignore most of their effect.
* Cleanup, minus 15 LOC, no functional change:Ingo Schwarze2018-12-311-5/+12
| | | | | | | | | Simplify the way the man(7) and mdoc(7) validators are called. Reset the parser state with a common function before calling them. There is no need to again reset the parser state afterwards, the parsers are no longer used after validation. This allows getting rid of man_node_validate() and mdoc_node_validate() as separate functions.
* Cleanup, no functional change:Ingo Schwarze2018-12-301-16/+13
| | | | | | | | | | | | | | The struct roff_man used to be a bad mixture of internal parser state and public parsing results. Move the public results to the parsing result struct roff_meta, which is already public. Move the rest of struct roff_man to the parser-internal header roff_int.h. Since the validators need access to the parser state, call them from the top level parser during mparse_result() rather than from the main programs, also reducing code duplication. This keeps parser internal state out of thee main programs (five in mandoc portable) and out of eight formatters.
* Rename mandoc_getarg() to roff_getarg() and pass it the roff parserIngo Schwarze2018-12-211-20/+53
| | | | | | | | | | | | | | | | | | struct as an argument such that after copy-in, it can call roff_expand() once again, which used to be called roff_res() before this. This fixes a subtle low-level roff(7) parsing bug reported by Fabio Scotoni <fabio at esse dot ch> in the 4.4BSD-Lite2 mdoc.samples(7) manual page, because that page used an escaped escape sequence in a macro argument. To expand escaped escape sequences in quoted mdoc(7) arguments, too, stop bypassing the call to roff_getarg() in mdoc_argv.c, function args() for this case. This does not solve the case of escaped escape sequences in quoted .Bl -column phrases yet. Because roff_expand() can make the string longer, roff_getarg() can no longer operate in-place but needs to malloc(3) the returned string. In the high-level parsers, free(3) that string after processing it.
* Bugfix:Ingo Schwarze2018-12-201-1/+1
| | | | | | | When after a \\, \t, or \a, another \t or \a had to be resolved in copy mode within the same argument, the argument got corrupted. Found while working on a loosely related bug report from Fabio Scotoni <fabio at esse dot ch>.
* As a first step towards making roff_res() callable from mandoc_getarg(),Ingo Schwarze2018-12-181-0/+97
| | | | | | | | | move the function mandoc_getarg() from mandoc.c to roff.c. It was misplaced in mandoc.c in the first place; that file is intended for utilities needed both by parsers and by formatters, while reading macro arguments in copy mode is purely a task of the roff(7) parser. Needed as a preliminary for an upcoming bugfix. No code change.
* Several improvements to escape sequence handling.Ingo Schwarze2018-12-151-14/+33
| | | | | | | | | | | | | | | | | | | | | | | * Add the missing special character \_ (underscore). * Partial implementations of \a (leader character) and \E (uninterpreted escape character). * Parse and ignore \r (reverse line feed). * Add a WARNING message about undefined escape sequences. * Add an UNSUPP message about unsupported escape sequences. * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported. * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets. * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid. * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names. * Starting a special character name with a blank is an error. * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode. * Document the newly handled escapes, and a few that were missing. * Regression tests for most of the above.
* Cleanup, no functional change:Ingo Schwarze2018-12-141-9/+5
| | | | | | | | | | Now that message handling is properly encapsulated, remove struct mparse pointers from four structs (roff, roff_man, tbl_node, eqn_node) and from the argument lists of five functions (roff_alloc, roff_man_alloc, mandoc_getarg, tbl_alloc, eqn_alloc). Except for being passed to the main program as an opaque object, it now only occurs in read.c, as it should, and not across 15 files like in the past.
* Almost mechanical diff to remove the "struct mparse *" argumentIngo Schwarze2018-12-141-91/+76
| | | | | | | | from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-0/+1
| | | | | | | | | | Split the top level parser interface out of the utility header mandoc.h, into a new header mandoc_parse.h, for use in the main program and in the main parser only. Move enum mandoc_os into roff.h because struct roff_man is the place where it is stored. This allows removal of mandoc.h from seven files in low-level parsers and in formatters.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-2/+1
| | | | | | No need to expose the eqn(7) syntax tree data structures everywhere. Move them to their own include file, "eqn.h". While here, delete the unused enum eqn_pilet.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-5/+3
| | | | | | | | In libroff.h, nothing was left except the eqn(7) parser interface, which isn't really part of the roff(7) parser, so rename it to eqn_parse.h. While here, move struct eqn_def to eqn.c because that's the only file using it, and let eqn_box_free() and eqn_free() handle NULL.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-21/+13
| | | | | Move tbl(7)-specific parser internals out of libroff.h. Move some tbl(7)-internal processing from roff.c to tbl.c.
* Cleanup, no functional change:Ingo Schwarze2018-12-121-1/+2
| | | | | No need to expose the tbl(7) syntax tree data structures everywhere. Move them to their own include file, "tbl.h", and improve comments.
* Clean up the validation of .Pp, .PP, .sp, and .br. Make sure allIngo Schwarze2018-12-041-0/+8
| | | | | | | | | | | | | | combinations are handled, and are handled in a systematic manner. This resolves some erratic duplicate handling, handles a number of missing cases, and improves diagnostics in various respects. Move validation of .br and .sp to the roff validation module rather than doing that twice in the mdoc and man validation modules. Move the node relinking function to the roff library where it belongs. In validation functions, only look at the node itself, at previous nodes, and at descendants, not at following nodes or ancestors, such that only nodes are inspected which are already validated.
* When a conditional block is closed by putting "\}" on a text lineIngo Schwarze2018-11-261-6/+28
| | | | | | | | | | | | | by itself (which is somewhat unusual but not invalid; most authors use the empty macro line ".\}" instead), agree more closely with groff and do not produce a double space in the output. Quirk reported by millert@. While here, tweak the rest of the function body of roff_cond_text() to more closely match roff_cond_sub(). The subtly different handling could make people (including myself) wonder whether there is any point in being different. Testing shows there is not.
* Implement the \f(CW and \f(CR (constant width font) escape sequencesIngo Schwarze2018-10-251-0/+1
| | | | | | | | | for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
* Rudimentary implementation of the roff(7) .char (output glyphIngo Schwarze2018-08-251-1/+72
| | | | | | | | | definition) request, used for example by groff_hdtbl(7). This simplistic implementation may interact incorrectly with the .tr (input character translation) request. But come on, you are not only using .char *and* .tr, but you do so with respect to the same character in the same manual page?
* Rudimentary implementation of the roff(7) .while request.Ingo Schwarze2018-08-241-124/+145
| | | | | | | | | | | Needed for example by groff_hdtbl(7). There are two limitations: It does not support nested .while requests yet, and each .while loop must start and end in the same scope. The roff_parseln() return codes are now more flexible and allow OR'ing options.
* Implement the roff(7) .shift and .return requests,Ingo Schwarze2018-08-231-170/+196
| | | | | | | | | | | | | | for example used by groff_hdtbl(7) and groff_mom(7). Also correctly interpolate arguments during nested macro execution even after .shift and .return, implemented using a stack of argument arrays. Note that only read.c, but not roff.c can detect the end of a macro execution, and the existence of .shift implies that arguments cannot be interpolated up front, so unfortunately, this includes a partial revert of roff.c rev. 1.337, moving argument interpolation back into the function roff_res().
* Implement the \\$@ escape sequence (insert all macro arguments,Ingo Schwarze2018-08-211-4/+18
| | | | | | | | | | | | quoted) in addition to the already supported \\$* (similar, but unquoted). Then use \\$@ to improve the implementation of the .als request (macro alias). Needed by groff_hdtbl(7). Gosh, it feels like the manual pages of the groff package are exercising every bloody roff(7) feature under the sun. In the manual page source code itself, not merely in the implementation of the used macro packages, that is.
* Expand \n(.$ (the number of macro arguments) right in roff_userdef(),Ingo Schwarze2018-08-201-7/+34
| | | | | | | | before even reparsing the expanded macro. That is the least dirty way to fix the bug that \(.$ remained set after execution of the user-defined macro ended. Any other way to fix it would probably require changes to read.c, which really shouldn't be bothered with such roff(7) internals.
* Mostly complete implementation of the 'c' (character available)Ingo Schwarze2018-08-191-4/+43
| | | | | | | | | | | roff conditional, except that the .char request still isn't supported and that behaviour differs from groff in many edge cases. But at least valid character names and numbers are now distinguished from invalid ones. This also fixes the bug that parsing of the 'c' conditional was incomplete, which resulted in leaking the tested character to the input parser at the beginning of the body when the condition was inverted.
* Bugfix: When a line ends with '\ \"', don't strip the trailing spaceIngo Schwarze2018-08-181-1/+2
| | | | because that turned it into a bogus line continuation.
* support the highly surprising escape sequence \# (line continuationIngo Schwarze2018-08-181-1/+8
| | | | with comment); used for example by gropdf(1)
* implement the GNU man-ext .SY/.YS (synopsis block) macro in man(7),Ingo Schwarze2018-08-181-1/+2
| | | | used in most manual pages of the groff package
* implement the GNU man-ext .TQ macro in man(7),Ingo Schwarze2018-08-161-0/+1
| | | | used for example by groff_diff(7)
* Implement the \*(.T predefined string (interpolate device name)Ingo Schwarze2018-08-161-0/+13
| | | | | by allowing the preprocessor to pass it through to the formatters. Used for example by the groff_char(7) manual page.
* Implement the roff(7) .nop (no operation) request.Ingo Schwarze2018-08-101-1/+11
| | | | | Examples of manual pages (ab)using it include groff(7), chem(1), groff_mom(7), and groff_hdtbl(7).
* After rewriting the parse buffer from scratch, we also have to resetIngo Schwarze2018-08-011-0/+3
| | | | | | the parse point to the beginning of the new buffer or we risk out of bounds accesses. Bug found by Leah Neukirchen <leah at vuxu dot org> with valgrind on Void Linux.
* preserve comments before .Dd when converting mdoc(7) to man(7)Ingo Schwarze2018-04-111-6/+29
| | | | with mandoc -Tman; suggested by Thomas Klausner <wiz at NetBSD>
* Two new low-level roff(7) features:Ingo Schwarze2018-04-101-14/+43
| | | | | | * .nr optional third argument (auto-increment step size) * \n+ and \n- numerical register auto-increment and -decrement bentley@ reported on Dec 9, 2013 that lang/sbcl(1) uses these.
* When accessing an undefined number register, define it to be zero, likeIngo Schwarze2018-04-091-23/+19
| | | | | | the previous commit for strings and macros, only technically simpler. Desired behaviour also mentioned by Werner Lemberg in 2011. This diff adds functionality but is -21 +19 LOC. :-)
* Using an undefined string or macro will cause it to be defined as empty.Ingo Schwarze2018-04-091-42/+81
| | | | | Observed by Werner Lemberg on Nov 14, 2011 and rotting on my TODO list ever since.
* The .Dd and .TH macros must interrupt .ce, too;Ingo Schwarze2017-07-141-1/+2
| | | | fixing tree corruption and assertion failure found by jsg@ with afl(1)
* Explicitly initialize a variable where the compiler is (understandably)Ingo Schwarze2017-07-141-5/+6
| | | | | | unable to figure out that it is never used uninitialized. While here, tweak the content of the variable to make its usage easier to understand. No functional change.
* eqn(7) .EQ has to break man(7) next-line scope, or tree corruptionIngo Schwarze2017-07-131-0/+2
| | | | and use after free many ensue; again found by jsg@ with afl(1)
* Simplify by creating struct roff_node syntax tree nodes for tbl(7)Ingo Schwarze2017-07-081-36/+36
| | | | | | | | | | | | right from roff_parseln() rather than delegating to read.c, similar to what i just did for eqn(7). The interface function roff_span() becomes obsolete and is deleted, the former interface function roff_addtbl() becomes static, the interface functions tbl_read() and tbl_cdata() become void, and minus twelve linus of code. No functional change.
* fix an assertion failure triggered by .ce in next-line scope;Ingo Schwarze2017-07-081-1/+2
| | | | found by jsg@ with afl(1)
* 1. Eliminate struct eqn, instead use the existing membersIngo Schwarze2017-07-081-47/+37
| | | | | | of struct roff_node which is allocated for each equation anyway. 2. Do not keep a list of equation parsers, one parser is enough. Minus fifty lines of code, no functional change.