summaryrefslogtreecommitdiffstats
path: root/mandoc.c
Commit message (Collapse)AuthorAgeFilesLines
* Make roff_expand() parse left-to-right rather than right-to-left.Ingo Schwarze2022-05-191-384/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some escape sequences have side effects on global state, implying that the order of evaluation matters. For example, this fixes the long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to print "321"; now it correctly prints "123". Right-to-left parsing was convenient because it implicitly handled nested escape sequences. With correct left-to-right parsing, nesting now requires an explicit implementation, here solved as follows: 1. Handle nested expanding escape sequences iteratively. When finding one, expand it, then retry parsing the enclosing escape sequence from the beginning, which will ultimately succeed as soon as it no longer contains any nested expanding escape sequences. 2. Handle nested non-expanding escape sequences recursively. When finding one, the escape sequence parser calls itself to find the end of the inner sequence, then continues parsing the outer sequence after that point. This requires the mandoc_escape() function to operate in two different modes. The roff(7) parser uses it in a mode where it generates diagnostics and may return an expansion request instead of a parse result. All other callers, in particular the formatters, use it in a simpler mode that never generates diagnostics and always returns a definite parsing result, but that requires all expanding escape sequences to already have been expanded earlier. The bulk of the code is the same for both modes. Since this required a major rewrite of the function anyway, move it into its own new file roff_escape.c and out of the file mandoc.c, which was misnamed in the first place and lacks a clear focus. As a side benefit, this also fixes a number of assertion failures that tb@ found with afl(1), for example "\n\\\\*0", "\v\-\\*0", and "\w\-\\\\\$0*0". As another side benefit, it also resolves some code duplication between mandoc_escape() and roff_expand() and centralizes all handling of escape sequences (except for expansion) in roff_escape.c, hopefully easing maintenance and feature improvements in the future. While here, also move end-of-input handling out of the complicated function roff_expand() and into the simpler function roff_parse_comment(), making the logic easier to understand. Since this is a major reorganization of a central component of mandoc(1), stability of the program might slightly suffer for a few weeks, but i believe that's not a problem at this point of the release cycle. The new code already satisfies the regression suite, but more tweaking and regression testing to further improve the handling of various escape sequences will likely follow in the near future.
* Surprisingly, groff supports multiple copy mode escapes at theIngo Schwarze2022-04-131-3/+3
| | | | | | | | | | | | | beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do the same outside copy mode, so let them do the same in mandoc(1), too. This fixes an assertion failure triggered by \EE*X that tb@ found with afl(1). The first E was consumed by roff_expand(), but that function failed to recognize the escape sequence as the expansion of a user-defined string and handed it over to mandoc_escape(), which consumed the second E and then died on an assertion because it is not prepared to handle user-defined strings. Fix this by letting *both* functions handle arbitrary numbers of 'E's correctly.
* Support two-character font names (BI, CW, CR, CB, CI)Ingo Schwarze2021-08-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | in the tbl(7) layout font modifier. Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead, which simplifies and unifies some code. While here, also support CB and CI in roff(7) \f escape sequences and in roff(7) .ft requests for all output modes. Using those is certainly not recommended because portability is limited even with groff, but supporting them makes some existing third-party manual pages look better, in particular in HTML output mode. Bug-compatible with groff as far as i'm aware, except that i consider font names starting with the '\n' (ASCII 0x0a line feed) character so insane that i decided to not support them. Missing feature reported by nabijaczleweli dot xyz in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002. I used none of the code from the initial patch submitted by nabijaczleweli, but some of their ideas. Final patch tested by them, too.
* Treat \*[.T] in the same way as \*(.T rather than calling abort(3).Ingo Schwarze2020-10-241-8/+15
| | | | | Bug found because the groff-current manual pages started using the variant form of this predefined string.
* Align to the new, sane behaviour of the groff_mdoc(7) .Dd macro:Ingo Schwarze2020-01-191-20/+34
| | | | | | | | without an argument, use the empty string, and always concatenate all arguments, no matter their number. This allows reducing the number of arguments of mandoc_normdate() and some other simplifications, at the same time polishing some error messages by adding the name of the macro in question.
* Fix mandoc_normdate() and the way it is used.Ingo Schwarze2019-06-271-3/+8
| | | | | | | | | | In the past, it could return NULL but the calling code wasn't prepared to handle that. Make sure it always returns an allocated string. While here, simplify the code by handling the "quick" attribute inside mandoc_normdate() rather than at multiple callsites. Triggered by deraadt@ pointing out that snprintf(3) error handling was incomplete in time2a().
* Do not print the style message "missing date" when the date is givenIngo Schwarze2019-05-211-2/+2
| | | | | | as "$Mdocdate$" without an actual date. That is the canonical way to write a new manual page and not bad style at all. Misleading message reported by kn@ on tech@.
* Cleanup, no functional change:Ingo Schwarze2018-12-301-1/+2
| | | | | | | | | | | | | | The struct roff_man used to be a bad mixture of internal parser state and public parsing results. Move the public results to the parsing result struct roff_meta, which is already public. Move the rest of struct roff_man to the parser-internal header roff_int.h. Since the validators need access to the parser state, call them from the top level parser during mparse_result() rather than from the main programs, also reducing code duplication. This keeps parser internal state out of thee main programs (five in mandoc portable) and out of eight formatters.
* As a first step towards making roff_res() callable from mandoc_getarg(),Ingo Schwarze2018-12-181-96/+0
| | | | | | | | | move the function mandoc_getarg() from mandoc.c to roff.c. It was misplaced in mandoc.c in the first place; that file is intended for utilities needed both by parsers and by formatters, while reading macro arguments in copy mode is purely a task of the roff(7) parser. Needed as a preliminary for an upcoming bugfix. No code change.
* Yet another round of improvements to manual font selection.Ingo Schwarze2018-12-161-41/+54
| | | | | | | | | Unify handling of \f and .ft. Support \f4 (bold+italic). Support ".ft BI" and ".ft CW" for terminal output. Support the .ft request in HTML output. Reject the bogus fonts \f(C1, \f(C2, \f(C3, and \f(CP. In regress.pl, only strip leading whitespace in math mode.
* Several improvements to escape sequence handling.Ingo Schwarze2018-12-151-16/+78
| | | | | | | | | | | | | | | | | | | | | | | * Add the missing special character \_ (underscore). * Partial implementations of \a (leader character) and \E (uninterpreted escape character). * Parse and ignore \r (reverse line feed). * Add a WARNING message about undefined escape sequences. * Add an UNSUPP message about unsupported escape sequences. * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported. * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets. * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid. * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names. * Starting a special character name with a blank is an error. * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode. * Document the newly handled escapes, and a few that were missing. * Regression tests for most of the above.
* Cleanup, no functional change:Ingo Schwarze2018-12-141-1/+1
| | | | | | | | | | Now that message handling is properly encapsulated, remove struct mparse pointers from four structs (roff, roff_man, tbl_node, eqn_node) and from the argument lists of five functions (roff_alloc, roff_man_alloc, mandoc_getarg, tbl_alloc, eqn_alloc). Except for being passed to the main program as an opaque object, it now only occurs in read.c, as it should, and not across 15 files like in the past.
* Almost mechanical diff to remove the "struct mparse *" argumentIngo Schwarze2018-12-141-11/+8
| | | | | | | | from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
* Implement the \f(CW and \f(CR (constant width font) escape sequencesIngo Schwarze2018-10-251-1/+6
| | | | | | | | | for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
* \f[] means \fP, not \fRIngo Schwarze2018-08-201-4/+7
|
* Implement the \*(.T predefined string (interpolate device name)Ingo Schwarze2018-08-161-0/+7
| | | | | by allowing the preprocessor to pass it through to the formatters. Used for example by the groff_char(7) manual page.
* handle the non-portable GNU-style \[charNN], \[charNNN] characterIngo Schwarze2018-08-101-3/+21
| | | | escape sequences, used for example in the groff_char(7) manual page
* Issue a STYLE message when normalizing the date format in .Dd/.TH.Ingo Schwarze2018-07-281-1/+4
| | | | | | | | Leah Neukirchen pointed out that mdoclint(1) used to warn about a leading zero before the day number, so we know that both NetBSD and Void Linux want the message. It does no harm on OpenBSD because Mdocdate always does the right thing anyway. jmc@ agrees that it makes sense in contexts not using Mdocdate.
* warn about time machines; suggested by Thomas Klausner <wiz @ NetBSD>Ingo Schwarze2017-07-031-2/+10
|
* implement the roff(7) \p (break output line) escape sequenceIngo Schwarze2017-06-141-0/+2
|
* Style message about legacy man(7) date format in mdoc(7) documentsIngo Schwarze2017-06-111-6/+10
| | | | | and operating system dependent messages about missing or unexpected Mdocdate; inspired by mdoclint(1).
* Partial implementation of \h (horizontal line drawing function).Ingo Schwarze2017-06-021-2/+12
| | | | | | | | | | | A full implementation would require access to output device properties and state variables (both only available after the main parser has finalized the parse tree) before numerical expansions in the roff preprocessor (i.e., before the main parser is even started). Not trying to pull that stunt right now because the static-width implementation committed here is sufficient for tcl-style manual pages and already more complicated than i would have suspected.
* Minimal implementation of the \h (horizontal motion) escape sequence.Ingo Schwarze2017-06-011-1/+1
| | | | Good enough to cope with the average DocBook insanity.
* Simplify the logic in mandoc_normdate() and add some comments.Ingo Schwarze2015-11-121-15/+31
| | | | | | | | Also add a comment in time2a() explaining why it isn't possible to use just one single call to strftime(). Do some style cleanup while here. No functional change. Triggered by a very different patch from des@FreeBSD.
* Delete two preprocessor constants that are no longer used.Ingo Schwarze2015-10-151-2/+0
| | | | Patch from Michael Reed <m dot reed at mykolab dot com>.
* Reject the escape sequences \[uD800] to \[uDFFF] in the parser.Ingo Schwarze2015-10-131-0/+3
| | | | | | | These surrogates are not valid Unicode codepoints, so treat them just like any other undefined character escapes: Warn about them and do not produce output. Issue noticed while talking to stsp@, semarie@, and bentley@.
* To make the code more readable, delete 283 /* FALLTHROUGH */ commentsIngo Schwarze2015-10-121-31/+0
| | | | | | that were right between two adjacent case statement. Keep only those 24 where the first case actually executes some code before falling through to the next case.
* modernize style: "return" is not a functionIngo Schwarze2015-10-061-25/+26
|
* Parse and ignore the escape sequences \, and \/ (italic corrections).Ingo Schwarze2015-08-291-0/+4
| | | | | | | | Actually using these is very stupid because they are groff extensions and other roff(7) implementations typically print unintended characters at the places where they are used. Nevertheless, some manuals contain them, for example ocserv(8). Problem reported by Kurt Jaeger <pi at FreeBSD>.
* For selecting a two-digit font size, support the historic syntax \s12Ingo Schwarze2015-02-201-0/+8
| | | | | | | in addition to the classic syntax \s(12, the modern syntax \s[12], and the alternative syntax \s'12'. The historic syntax only works for the font sizes 10-39. Real-world usage found by naddy@ in plan9/rc.
* Rudimentary implementation of the roff(7) \o escape sequence (overstrike).Ingo Schwarze2015-01-211-4/+6
| | | | | | This is of some relevance because the pod2man(1) preamble abuses it for the icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
* Fix a read buffer overrun triggered by trailing \s- or trailing \s+Ingo Schwarze2015-01-011-3/+3
| | | | without the required subsequent argument; found by jsg@ with afl.
* Catch localtime() failure for additional safety;Ingo Schwarze2014-12-151-0/+2
| | | | patch from Jan Stary <hans at stare dot cz> some time ago.
* Tighten Unicode escape name parsing.Ingo Schwarze2014-10-281-4/+9
| | | | | Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0. This simplifies mchars_num2uc().
* Stricter syntax checking of Unicode character names:Ingo Schwarze2014-10-131-12/+11
| | | | | | | Require exactly 4, 5 or 6 hex digits and allow nothing else. This avoids mishandling stuff like \[ua] and \C'uA' as Unicode and also fixes underlining in eqn(7) -Thtml output which uses \[ul]. Problem found and semantics suggested by kristaps@.
* Fix a corner case where \H<nil> (where <nil> is the \0 character) wouldKristaps Dzonsons2014-08-181-1/+2
| | | | | | cause mandoc_escape() to read past the end of an allocated string. Found when a script scanning of all Mac OSX manual accidentally also scanned binary (gzip'd) files, discussed with schwarze@ on tech@.
* Improve build system and autodetection.Ingo Schwarze2014-08-161-1/+1
| | | | | | | | | * Make ./configure standalone, that's what people expect. * Let people write a ./configure.local from scratch, not edit existing files. * Autodetect wchar, sqlite3, and manpath and act accordingly. * Autodetect the need for -L/usr/local/lib and -lutil. * Get rid of config.h.p{re,ost}, let ./configure only write what's needed. * Let ./configure write a Makefile.local snippet, that's quite flexible.
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat.Ingo Schwarze2014-08-101-2/+0
| | | | | | Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but i cannot find a *.c file where it is missing.
* Clean up messages related to plain text and to escape sequences.Ingo Schwarze2014-07-061-2/+2
| | | | | * Mention invalid escape sequences and string names, and fallbacks. * Hierarchical naming.
* Fix handling of escape sequences taking numeric arguments.Ingo Schwarze2014-07-061-1/+3
| | | | | | | * Repair detection of invalid delimiters. * Discard the invalid delimiter together with the invalid sequence. Note to self: In general, strchr("\0...", c) is a thoroughly bad idea.
* Clean up the warnings related to document structure.Ingo Schwarze2014-07-011-1/+1
| | | | | | | | | * Hierarchical naming of the related enum mandocerr items. * Mention the offending macro, section title, or string. While here, improve some wordings: * Descriptive instead of imperative style. * Uniform style for "missing" and "skipping". * Where applicable, mention the fallback used.
* Start systematic improvements of error reporting.Ingo Schwarze2014-06-201-2/+2
| | | | | | | | | | | So far, this covers all WARNINGs related to the prologue. 1) hierarchical naming of MANDOCERR_* constants 2) mention the macro name in messages where that adds clarity 3) add one missing MANDOCERR_DATE_MISSING msg 4) fix the wording of one message related to the man(7) prologue Started on the plane back from Ottawa.
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */,Ingo Schwarze2014-04-201-61/+61
| | | | | remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* Fully implement the \B (validate numerical expression) andIngo Schwarze2014-04-081-5/+2
| | | | | | | | | | | partially implement the \w (measure text width) escape sequence in a way that makes them usable in numerical expressions and in conditional requests, similar to how \n (interpolate number register) and \* (expand user-defined string) are implemented. This lets mandoc(1) handle the baroque low-level roff code found at the beginning of the ggrep(1) manual. Thanks to pascal@ for the report.
* Accept arbitrary argument delimiters for various roff(7) escape sequences.Ingo Schwarze2014-04-071-4/+4
| | | | Needed for example by the new Perl pod2man(1) preamble.
* The files mandoc.c and mandoc.h contained both specialised low-levelIngo Schwarze2014-03-231-68/+1
| | | | | | | functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Simplify: Remove an unused argument from the mandoc_eos() function.Ingo Schwarze2013-12-311-4/+4
| | | | No functional change.
* Remove duplicate const specifiers from the declaration of mandoc_escape().Ingo Schwarze2013-12-301-1/+1
| | | | | Found by Thomas Klausner <wiz at NetBSD dot org> using clang. No functional change.
* I have no idea how it happened that \B, \H, \h, \L, and \l gotIngo Schwarze2013-12-261-7/+5
| | | | | | | | | | | mapped to ESCAPE_NUMBERED (which is for \N and only for \N), that made no sense at all. Properly remap them to ESCAPE_IGNORE. While here, move \B and \w from the group taking number arguments to the group taking string arguments; right now, that doesn't imply any functional change, but if we ever go ahead and implement a parser for roff(7) numerical expressions, it will suddenly start to matter, and cause confusion.
* Parse and ignore the roff(7) escape sequences \d (move half line down)Ingo Schwarze2013-12-251-0/+8
| | | | und \u (move half line up). Found by bentley@ in some DocBook crap.