path: root/regress/roff
Commit message | Author | Age | Files | Lines
* Surprisingly, every escape sequence can also be used as an argument | Ingo Schwarze | 2022-06-08 | 18 | -23/+724
| delimiter for an outer escape sequence, in which case the delimiting
| escape sequence retains its syntax but usually ignores its argument
| and loses its inherent effect.
|
| Add rudimentary support for this syntax quirk in order to improve
| parsing compatibility with groff.
* Split the excessively generic diagnostic message "invalid escape sequence" | Ingo Schwarze | 2022-06-07 | 5 | -8/+8
| into the more specific messages "invalid escape argument delimiter"
| and "invalid escape sequence argument".
* adjust two desired error messages after roff_escape.c rev. 1.11 | Ingo Schwarze | 2022-06-06 | 1 | -2/+2
| improved diagnostics for the \C escape sequence
* With the improved escape sequence parser, it becomes easy to also improve | Ingo Schwarze | 2022-06-05 | 8 | -44/+44
| diagnostics. Distinguish "incomplete escape sequence",
| "invalid special character", and "unknown special character"
| from the generic "invalid escape sequence", also promoting them
| from WARNING to ERROR because incomplete escape sequences are
| severe syntax violations and because encountering an invalid
| or unknown special character makes it likely that part of the
| document content intended by the authors gets lost.
* During identifier parsing, handle undefined escape sequences | Ingo Schwarze | 2022-06-03 | 17 | -48/+174
| in the same way as groff:
|
| * \\ is always reduced to \
| * \. is always reduced to .
| * other undefined escape sequences are usually reduced to the
|   escape name, for example \G to G, except during the expansion
|   of expanding escape sequences having the standard argument form
|   (in particular \* and \n), in which case the backslash
|   is preserved literally.
|
| Yes, this is confusing indeed.
| For example, the following have the same meaning:
|
| * .ds \. and .ds . which is not the same as .ds \\.
| * \*[\.] and \*[.] which is not the same as \*[\\.]
| * .ds \G and .ds G which is not the same as .ds \\G
| * \*[\G] and \*[\\G] which is not the same as \*[G]  <- sic!
|
| To feel less dirty, have a leaning toothpick, if you are so inclined.
|
| This patch also slightly improves the string shown by the "escaped
| character not allowed in a name" error message.
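The reduction rules in this commit message can be modelled in a few lines. What follows is only a minimal Python sketch, not the C code in mandoc: the function name `reduce_name` and the `expanding` flag are hypothetical, and the model assumes that every escape in the name is an undefined one-character escape.

```python
def reduce_name(name, expanding):
    """Reduce undefined one-character escapes in a roff identifier.

    Hypothetical model of the rules quoted in the commit message.
    `expanding` is True when the name is the argument of an expanding
    escape sequence with the standard argument form (string or
    number register interpolation).
    """
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "\\" or i + 1 == len(name):
            out.append(ch)          # ordinary character, or trailing backslash
            i += 1
            continue
        nxt = name[i + 1]
        if nxt == "\\":
            out.append("\\")        # an escaped backslash always becomes one backslash
        elif nxt == ".":
            out.append(".")         # an escaped dot always becomes a dot
        elif expanding:
            out.append("\\" + nxt)  # backslash preserved literally
        else:
            out.append(nxt)         # reduced to the escape name
        i += 2
    return "".join(out)


# The four equivalences listed in the commit message:
assert reduce_name(r"\.", False) == reduce_name(".", False)
assert reduce_name(r"\.", True) == reduce_name(".", True)
assert reduce_name(r"\G", False) == reduce_name("G", False)
assert reduce_name(r"\G", True) == reduce_name(r"\\G", True)
# ... and the surprising inequality marked "sic!":
assert reduce_name(r"\G", True) != reduce_name("G", True)
```

The single `expanding` flag is the whole source of the asymmetry: the same input `\G` names the string `G` in a `.ds` request but the string `\G` inside `\*[...]`.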
* Dummy implementation of the roff(7) \V (interpolate environment variable) | Ingo Schwarze | 2022-05-30 | 4 | -3/+32
| escape sequence. This is needed to get \V into the correct parsing
| class, ESCAPE_EXPAND.
|
| It is intentional that mandoc(1) output is *not* influenced by
| environment variables, so interpolate the name of the variable
| with some decorating punctuation rather than interpolating its
| value.
* Re-classify the roff(7) \r (reverse line feed) escape sequence | Ingo Schwarze | 2022-05-20 | 4 | -5/+31
| from "ignore" to "unsupported" because when an input file uses it,
| mandoc(1) is likely to significantly misformat the output,
| usually showing parts of the output in a different order
| than the author intended.
* Test the handling of some additional one-character escape sequences | Ingo Schwarze | 2022-05-20 | 3 | -13/+43
| that take no argument and are ignored: \% \& \^ \a \d \t \u \{ \| \}
|
| No change to parsing or formatting needed.
* following the fixed parsing direction of roff_expand() in roff.c rev. 1.388, | Ingo Schwarze | 2022-05-19 | 2 | -24/+24
| some diagnostics now appear in a more reasonable order, too
* Adjust a column number in an error message | Ingo Schwarze | 2022-05-19 | 1 | -1/+1
| after the roff_expand() reorganization in roff.c rev. 1.388.
|
| The new parsing direction has two effects:
|
| 1. Correct output when a line contains more than one expanding escape
|    sequence that has a side effect.
| 2. Column numbers in diagnostic messages now report the changed column
|    numbers after any expansions left of them have taken place;
|    in the past, column numbers referred to the original input line.
|
| Arguably, item 2 was a bit better in its old state, but slightly less
| helpful diagnostics are a small price to pay for correct output.
| Besides, when the expansion of user-defined strings or macros is
| involved, in many cases, mandoc(1) is already unable to report
| meaningful line and column numbers, so item 2 is not a noteworthy
| regression. The effort and code complication for fixing that
| would probably be excessive, in particular since well-written
| manual pages are not supposed to use such features in the first place.
* fix a wrong column number that got fixed as a side effect | Ingo Schwarze | 2022-05-19 | 1 | -1/+1
| of the roff_expand() reorganization in roff.c rev. 1.388
* remove a bogus warning that went away as a side effect | Ingo Schwarze | 2022-05-19 | 1 | -1/+0
| of the roff_expand() reorganization in roff.c rev. 1.388
* Split a new function roff_parse_comment() out of roff_expand() because this | Ingo Schwarze | 2022-05-01 | 4 | -3/+48
| functionality is not needed when called from roff_getarg().
| This makes the long and complicated function roff_expand()
| significantly shorter, and also simpler in so far as it no longer
| needs to return ROFF_APPEND. No functional change intended.
* Provide a new function roff_req_or_macro() to parse and handle a request | Ingo Schwarze | 2022-04-30 | 3 | -2/+64
| or macro, including context-dependent error handling inside tbl(7) code
| and inside .ce/.rj blocks. Use it both in the top level roff(7) parser
| and inside conditional blocks.
|
| This fixes an assertion failure triggered by ".if 1 .ce" inside tbl(7)
| code, found by tb@ using afl(1).
|
| As a side benefit for readability, only one place remains in the
| code that calls the main handler functions for the various roff(7)
| requests. This patch also improves column numbers in some error
| messages and various comments.
* The syntax of the roff(7) .mc request is quite special | Ingo Schwarze | 2022-04-28 | 5 | -2/+65
| and the roff_onearg() parsing function is too generic,
| so provide a dedicated parsing function instead.
|
| This fixes an assertion failure when an \o escape sequence is
| passed as the argument; the bug was found by tb@ using afl(1).
|
| It also makes mandoc output more similar to groff in various cases.
* Fix three bugs regarding the interaction of \z and \h: | Ingo Schwarze | 2022-04-27 | 5 | -4/+41
|
| 1. The combination \z\h is a no-op whatever the argument may be.
|    In the past, the \z only affected the first space character
|    generated by the \h, which was wrong.
| 2. For the combination \zX\h with a positive argument, the first
|    space resulting from the \h is not printed but consumed by the \z.
| 3. For the combination \zX\h with a negative argument, application
|    of the \z needs to be completed before the \h can be started.
|    In the past, if this combination occurred at the beginning of an
|    output line, the \h backed up to the beginning of the line and
|    after that, the \z attempted to back up even further, triggering
|    an assertion.
|
| Bugs found during an audit of assignments to termp->col that i
| started after the bugfix tbl_term.c rev. 1.65.
|
| The assertion triggered by bug 3 was *not* yet found by afl(1).
* If a .shift request has a negative argument, do not use a negative array | Ingo Schwarze | 2022-04-24 | 3 | -6/+13
| index but use 0 instead of the argument, just like groff.
| Warn about the invalid argument.
| While here, fix the column number in another warning message.
|
| Segfault reported by tb@, found with afl(1).
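The fix described here amounts to validating the count before it is used as an index. A minimal Python sketch of that idea follows; `shift_args` is a hypothetical name and the warning text is invented for illustration. In Python a negative index would silently give a wrong slice rather than crash, but the guard is the same.

```python
import sys


def shift_args(args, count=1):
    """Return the macro arguments left after shifting by `count`.

    Hypothetical model: a negative count is reported and then
    treated as 0, so it is never used as an index.
    """
    if count < 0:
        print("warning: negative .shift argument, using 0", file=sys.stderr)
        count = 0
    return args[count:]


print(shift_args(["a", "b", "c"], 1))
print(shift_args(["a", "b", "c"], -2))
```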
* Surprisingly, groff supports multiple copy mode escapes at the | Ingo Schwarze | 2022-04-13 | 3 | -2/+50
| beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do
| the same outside copy mode, so let them do the same in mandoc(1), too.
|
| This fixes an assertion failure triggered by \EE*X that tb@ found
| with afl(1). The first E was consumed by roff_expand(), but that
| function failed to recognize the escape sequence as the expansion
| of a user-defined string and handed it over to mandoc_escape(),
| which consumed the second E and then died on an assertion because
| it is not prepared to handle user-defined strings. Fix this by
| letting *both* functions handle arbitrary numbers of 'E's correctly.
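The essence of the fix is that every parser stage skips the same prefix, a backslash followed by any number of 'E' characters, before it looks at the escape name. The Python sketch below shows only that skipping step, outside copy mode; `escape_name_start` is a hypothetical helper, not a function from mandoc.

```python
def escape_name_start(text, pos):
    """Return the index of the escape name for the escape at text[pos].

    Hypothetical model: text[pos] must be a backslash. Any number of
    'E' characters directly after it are skipped, so one, two, or no
    'E's all lead to the same escape name.
    """
    if text[pos] != "\\":
        raise ValueError("not at an escape sequence")
    pos += 1
    while pos < len(text) and text[pos] == "E":
        pos += 1
    return pos


for sample in (r"\*X", r"\E*X", r"\EE*X"):
    start = escape_name_start(sample, 0)
    print(sample, "->", sample[start:])
```

If two stages disagreed about how many 'E's to skip, the second stage would see a different escape than the first, which is the mismatch the commit message describes.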
* Support two-character font names (BI, CW, CR, CB, CI) | Ingo Schwarze | 2021-08-10 | 2 | -4/+5
| in the tbl(7) layout font modifier.
|
| Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use
| the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h
| instead, which simplifies and unifies some code.
|
| While here, also support CB and CI in roff(7) \f escape sequences
| and in roff(7) .ft requests for all output modes. Using those is
| certainly not recommended because portability is limited even with
| groff, but supporting them makes some existing third-party manual
| pages look better, in particular in HTML output mode.
|
| Bug-compatible with groff as far as i'm aware, except that i consider
| font names starting with the '\n' (ASCII 0x0a line feed) character
| so insane that i decided to not support them.
|
| Missing feature reported by nabijaczleweli dot xyz in
| https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002.
| I used none of the code from the initial patch submitted by
| nabijaczleweli, but some of their ideas. Final patch tested by
| them, too.
* delete the two pairs of extra blank lines from expected man(7) terminal | Ingo Schwarze | 2021-06-28 | 54 | -216/+0
| output that are no longer printed since man_term.c rev. 1.236
* Rename syntax test of the \O escape sequence (suppress output groff | Ingo Schwarze | 2020-12-21 | 6 | -26/+26
| extension; mandoc only implements syntax checking but ignores
| the sequence) to please Bill Gates and didickman@: avoid path
| names that only differ by case, like o.in vs. O.in.
* Treat \*[.T] in the same way as \*(.T rather than calling abort(3). | Ingo Schwarze | 2020-10-24 | 7 | -13/+20
| Bug found because the groff-current manual pages started using the
| variant form of this predefined string.
* In HTML output, avoid printing a newline right after <pre> | Ingo Schwarze | 2020-10-16 | 2 | -5/+2
| and right before </pre> because that resulted in vertical
| whitespace not requested by the manual page author.
|
| Formatting bug reported by
| Aman Verma <amanraoverma plus vim at gmail dot com> on discuss@.
* Fix two issues with .po (page offset) formatting: | Ingo Schwarze | 2020-09-03 | 3 | -2/+53
|
| 1. Truncate excessive offsets to a width reasonable in the context
|    of manual pages instead of printing excessively long lines
|    and sometimes causing assertion failures; found in an afl run
|    performed by Jan Schreiber <jes at posteo dot de>.
| 2. Remember both the requested and the applied page offset;
|    otherwise, subtracting an excessive width, then adding it again,
|    would end up with an incorrectly large offset.
|
| While here, simplify the code by reverting the previous offset
| up front, and also add some comments to make the general ideas
| easier to understand.
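Item 2 is easiest to see with numbers. The Python sketch below keeps the requested and the applied offset separately, as the commit message describes. It is only an illustration: the class name, the limit `MAX_OFFSET = 60`, and the lower bound of 0 are assumptions, since the actual limits are not stated in this log.

```python
MAX_OFFSET = 60  # hypothetical limit, not taken from mandoc


class PageOffset:
    """Track a page offset that is clamped for output."""

    def __init__(self):
        self.requested = 0  # what the document asked for, unclamped
        self.applied = 0    # what is actually used for output

    def adjust(self, delta):
        """Handle a relative change and return the applied offset."""
        self.requested += delta
        # Clamp only the applied value; keep the request intact so that
        # a later opposite change restores the original offset.
        self.applied = max(0, min(self.requested, MAX_OFFSET))
        return self.applied


po = PageOffset()
print(po.adjust(5))     # a small offset is applied as requested
print(po.adjust(-100))  # an excessive subtraction is clamped
print(po.adjust(100))   # adding it back restores the small offset
```

If only the clamped value were remembered, the last step would start from the clamped offset, add 100, and end up at the maximum instead of at the original small offset.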
* If .ti had an excessive argument, using it was attempted, in some | Ingo Schwarze | 2020-09-03 | 3 | -2/+49
| cases resulting in an assertion failure. Instead, truncate the
| temporary indent to a width reasonable in a manual page.
|
| I found the issue in an afl run that was performed by
| Jan Schreiber <jes at posteo dot de>.
* Do not indent by SIZE_MAX/2 when .ce occurs inside explicit no-fill mode. | Ingo Schwarze | 2020-09-02 | 2 | -4/+20
| While here, drop two unused arguments from the function term_field();
| the related work was already done by term_fill() before this commit.
|
| I found the bug in an afl run that was performed by
| Jan Schreiber <jes at posteo dot de>.
* Put the code handling \} into a new function roff_cond_checkend() | Ingo Schwarze | 2020-08-03 | 7 | -6/+95
| and call that function not only from both places where copies
| existed - when processing text lines and when processing
| request/macro lines in conditional block scope - but also when
| closing a macro definition request, such that this construction works:
|
|   .if n \{.de macroname
|   macro content
|   .. \} ignored arguments
|   .macroname
|
| This fixes a bug reported by John Gardner <gardnerjohng at gmail dot com>.
|
| While here, avoid a confusing decrement of the line scope counter
| in roffnode_cleanscope() for conditional blocks that do not have
| line scope in the first place (no functional change for this part).
| Also improve validation of an internal invariant in roff_cblock()
| and polish some comments.
* trivial sync with OpenBSD | Ingo Schwarze | 2020-07-30 | 1 | -6/+6
| in parts of these files that are not used by -portable;
| consequently, no functional change
* adapt to new <p> output logic (html.c rev. 1.260) | Ingo Schwarze | 2019-09-03 | 4 | -14/+6
|
* In HTML output, allow switching the desired font for subsequent | Ingo Schwarze | 2019-04-30 | 1 | -5/+4
| text without printing an opening tag right away, and use that in
| the .ft request handler. While here, garbage collect redundant
| enum htmlfont and reduce code duplication in print_text().
|
| Fixing an assertion failure reported by Michael <Stapelberg at Debian>
| in pmRegisterDerived(3) from libpcp3-dev.
* When calling an empty macro, do not clobber existing arguments. | Ingo Schwarze | 2019-04-21 | 3 | -3/+30
| Fixing a bug found with the groffer(1) version 1.19 manual page
| following a report from Jan Stary.
* Implement the roff .break request (break out of a .while loop). | Ingo Schwarze | 2019-04-21 | 3 | -2/+27
| Jan Stary <hans at stare dot cz> found it in an ancient groffer(1)
| manual page (version 1.19) on MacOS X Mojave.
| Having .break not implemented wasn't a particularly bright idea
| because obviously, it tended to cause infinite loops.
* Wrap .Sh/.SH sections and .Ss/.SS subsections in HTML <section> elements | Ingo Schwarze | 2019-03-01 | 1 | -1/+1
| as recommended for accessibility by the HTML 5 standard.
| Triggered by a similar, but slightly different suggestion
| from Laura Morales <lauretas at mail dot com>.
* Let roff_getname() end the roff identifier at a tab character | Ingo Schwarze | 2019-02-06 | 17 | -15/+199
| and audit all its callers whether termination is handled correctly.
|
| Resulting improvements:
|
| * An escape or tab ending the macro name in a macro invocation
|   is discarded, and argument processing is started after it.
| * An escape or tab ending a name in ".if d" and ".if r" is preserved.
| * An escape ending a name in ".ds" causes the whole request
|   to be ignored.
| * A tab ending a name in ".ds" becomes part of the string.
| * An escape or tab ending a name in ".rm" causes the rest of the line
|   to be ignored.
| * An escape or tab ending the first name in ".als", ".rn", or ".nr"
|   causes the whole request to be ignored.
|
| Kurt Jaeger <pi at FreeBSD> made me aware of
| https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235456#c0
| and in that bug report, comment 0 item (3) is a special case
| of this class of issues.
| Yes, the "mh" manual pages are no doubt among the worst on the planet.
* Test handling of escaped backslashes because the code related to | Ingo Schwarze | 2019-01-17 | 5 | -2/+97
| copy mode is complicated and prone to regressions.
* Represent mdoc(7) .Pp (and .sp, and some SYNOPSIS and .Rs features) | Ingo Schwarze | 2019-01-07 | 5 | -30/+14
| by the <p> HTML element and use the html_fillmode() mechanism
| for .Bd -unfilled, just like it was done for man(7) earlier,
| finally getting rid both of the horrible <div class="Pp"></div>
| hack and of the worst HTML syntax violations caused by nested displays.
|
| Care is needed because in some situations, paragraphs have to remain
| open across several subsequent macros, whereas in other situations,
| they must get closed together with a block containing them.
|
| Some implementation details include:
|
| * Always close paragraphs before emitting HTML flow content.
| * Let html_close_paragraph() also close <pre> for extra safety.
| * Drop the old, now unused function print_paragraph().
| * Minor adjustments in the top-level man(7) node formatter for symmetry.
| * Bugfix: .Ss heads suspend no-fill mode, even though .Ss doesn't end it.
| * Bugfix: give up on .Op semantic markup for now, see the comment.
* Finally, represent the man(7) .PP and .HP macros by the natural | Ingo Schwarze | 2019-01-06 | 4 | -3/+53
| choice, which is the <p> HTML element. On top of the previous
| fill-mode improvements, the key to making this possible is to
| automatically close the <p> when required: before headers,
| subsequent paragraphs, lists, indented blocks, synopsis blocks,
| tbl(7) blocks, and before blocks using no-fill mode.
|
| In man(7) documents, represent the .sp request by a blank line
| in no-fill mode and in the same way as .PP in fill mode.
* test the roff(7) .ce and .rj requests; | Ingo Schwarze | 2019-01-04 | 4 | -2/+43
| they were already supported in the past
* merge a test update from OpenBSD that was forgotten in April | Ingo Schwarze | 2018-12-21 | 2 | -1/+11
|
* Rename mandoc_getarg() to roff_getarg() and pass it the roff parser | Ingo Schwarze | 2018-12-21 | 1 | -6/+10
| struct as an argument such that after copy-in, it can call
| roff_expand() once again, which used to be called roff_res()
| before this. This fixes a subtle low-level roff(7) parsing bug
| reported by Fabio Scotoni <fabio at esse dot ch> in the
| 4.4BSD-Lite2 mdoc.samples(7) manual page, because that page
| used an escaped escape sequence in a macro argument.
|
| To expand escaped escape sequences in quoted mdoc(7) arguments,
| too, stop bypassing the call to roff_getarg() in mdoc_argv.c,
| function args() for this case. This does not solve the case
| of escaped escape sequences in quoted .Bl -column phrases yet.
|
| Because roff_expand() can make the string longer, roff_getarg()
| can no longer operate in-place but needs to malloc(3) the returned
| string. In the high-level parsers, free(3) that string after
| processing it.
* Yet another round of improvements to manual font selection. | Ingo Schwarze | 2018-12-16 | 9 | -25/+68
| Unify handling of \f and .ft.
| Support \f4 (bold+italic).
| Support ".ft BI" and ".ft CW" for terminal output.
| Support the .ft request in HTML output.
| Reject the bogus fonts \f(C1, \f(C2, \f(C3, and \f(CP.
| In regress.pl, only strip leading whitespace in math mode.
* Several improvements to escape sequence handling. | Ingo Schwarze | 2018-12-15 | 15 | -38/+218
|
| * Add the missing special character \_ (underscore).
| * Partial implementations of \a (leader character)
|   and \E (uninterpreted escape character).
| * Parse and ignore \r (reverse line feed).
| * Add a WARNING message about undefined escape sequences.
| * Add an UNSUPP message about unsupported escape sequences.
| * Mark \! and \? (transparent throughput)
|   and \O (suppress output) as unsupported.
| * Treat the various variants of zero-width spaces as one-byte escape
|   sequences rather than as special characters, to avoid defining bogus
|   forms with square brackets.
| * For special characters with one-byte names, do not define bogus
|   forms with square brackets, except for \[-], which is valid.
| * In the form with square brackets, undefined special characters do not
|   fall back to printing the name verbatim, not even for one-byte names.
| * Starting a special character name with a blank is an error.
| * Undefined escape sequences never abort formatting of the input
|   string, not even in HTML output mode.
| * Document the newly handled escapes, and a few that were missing.
| * Regression tests for most of the above.
* Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all | Ingo Schwarze | 2018-12-04 | 2 | -0/+3
| combinations are handled, and are handled in a systematic manner.
| This resolves some erratic duplicate handling, handles a number
| of missing cases, and improves diagnostics in various respects.
|
| Move validation of .br and .sp to the roff validation module
| rather than doing that twice in the mdoc and man validation modules.
| Move the node relinking function to the roff library where it belongs.
|
| In validation functions, only look at the node itself, at previous
| nodes, and at descendants, not at following nodes or ancestors,
| such that only nodes are inspected which are already validated.
* When a conditional block is closed by putting "\}" on a text line | Ingo Schwarze | 2018-11-26 | 3 | -4/+30
| by itself (which is somewhat unusual but not invalid; most authors
| use the empty macro line ".\}" instead), agree more closely with
| groff and do not produce a double space in the output.
| Quirk reported by millert@.
|
| While here, tweak the rest of the function body of roff_cond_text()
| to more closely match roff_cond_sub(). The subtly different
| handling could make people (including myself) wonder whether there
| is any point in being different. Testing shows there is not.
* Rudimentary implementation of the roff(7) .char (output glyph | Ingo Schwarze | 2018-08-25 | 7 | -2/+61
| definition) request, used for example by groff_hdtbl(7).
|
| This simplistic implementation may interact incorrectly
| with the .tr (input character translation) request.
| But come on, you are not only using .char *and* .tr, but you do so
| with respect to the same character in the same manual page?
* If man(7) next-line scope is open and the line ends with \c, | Ingo Schwarze | 2018-08-25 | 2 | -3/+29
| the scope remains open. Needed for example for groff_man(7).
* Rudimentary implementation of the roff(7) .while request. | Ingo Schwarze | 2018-08-24 | 16 | -2/+182
| Needed for example by groff_hdtbl(7).
|
| There are two limitations: It does not support nested .while
| requests yet, and each .while loop must start and end in the same
| scope.
|
| The roff_parseln() return codes are now more flexible
| and allow OR'ing options.
* Implement the roff(7) .shift and .return requests, | Ingo Schwarze | 2018-08-23 | 14 | -8/+184
| for example used by groff_hdtbl(7) and groff_mom(7).
|
| Also correctly interpolate arguments during nested macro execution
| even after .shift and .return, implemented using a stack of argument
| arrays.
|
| Note that only read.c, but not roff.c can detect the end of a macro
| execution, and the existence of .shift implies that arguments cannot
| be interpolated up front, so unfortunately, this includes a partial
| revert of roff.c rev. 1.337, moving argument interpolation back into
| the function roff_res().
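The "stack of argument arrays" mentioned here can be pictured as follows. This is a minimal Python sketch of the data structure only, with hypothetical names (`MacroArgs`, `call`, `ret`, `shift`, `interpolate`); it says nothing about how read.c and roff.c divide the work.

```python
class MacroArgs:
    """Stack of argument arrays, one array per active macro call."""

    def __init__(self):
        self.stack = []

    def call(self, args):
        """Enter a macro: push a fresh copy of its arguments."""
        self.stack.append(list(args))

    def ret(self):
        """Leave the innermost macro: drop its arguments."""
        self.stack.pop()

    def shift(self, count=1):
        """Drop the first `count` arguments of the innermost macro."""
        del self.stack[-1][:max(count, 0)]

    def interpolate(self, n):
        """Return argument number n (1-based) of the innermost macro,
        or the empty string if there is no such argument."""
        top = self.stack[-1] if self.stack else []
        return top[n - 1] if 1 <= n <= len(top) else ""


m = MacroArgs()
m.call(["a", "b", "c"])
m.shift()
print(m.interpolate(1))  # the outer macro now sees its second argument first
m.call(["x"])
print(m.interpolate(1))  # the nested macro sees only its own arguments
m.ret()
print(m.interpolate(1))  # back in the outer macro, the shift is still in effect
```

The sketch also shows why arguments cannot be substituted into the macro body up front: what the first argument refers to depends on how many shifts have run by the time a given line is reached.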
* Disable one test for now that is broken after the addition of \). | Ingo Schwarze | 2018-08-19 | 2 | -4/+3
| It is not broken because of \), which is correctly implemented,
| but the addition merely reveals a hidden bug elsewhere, almost
| certainly in \\ handling. Given that \\ is among the most mysterious
| escape sequences and using it is very strongly discouraged in manual
| pages, fixing that is not urgent - and may be hard.
* Implement the \*(.T predefined string (interpolate device name) | Ingo Schwarze | 2018-08-16 | 6 | -1/+90
| by allowing the preprocessor to pass it through to the formatters.
| Used for example by the groff_char(7) manual page.