summaryrefslogtreecommitdiffstats
path: root/mandoc.h
Commit message (Collapse)AuthorAgeFilesLines
* Even though the constant ASCII_ESC is only used in the roff pre-parser roff.c,Ingo Schwarze2022-08-161-0/+2
| | | | | move it to the top level include file mandoc.h to reduce the risk of causing clashes when introducing new ASCII_* constants in the future.
* Distinguish between escape sequences that produce no outputIngo Schwarze2022-08-151-2/+3
| | | | | | | | | | | | | | | | | whatsoever (for example \fR) and escape sequences that produce invisible zero-width output (for example \&). No, i'm not joking, groff does make that distinction, and it has consequences in some situations, for example for vertical spacing in no-fill mode. Heirloom and Plan 9 behaviour is subtly different, but in case of doubt, we want to follow groff. While this fixes the behaviour for the majority of escape sequences, in particular for those most likely to occur in practice, it is not perfect yet because some of the more exotic ESCAPE_IGNORE sequences are actually of the "no output whatsoever" type but treated as "invisible zero-width" for now. With the new ASCII_NBRZW mechanism in place, switching them over one by one when the need arises will no longer be very difficult.
* Split the excessively generic diagnostic message "invalid escape sequence"Ingo Schwarze2022-06-071-1/+2
| | | | | into the more specific messages "invalid escape argument delimiter" and "invalid escape sequence argument".
* With the improved escape sequence parser, it becomes easy to also improveIngo Schwarze2022-06-051-0/+5
| | | | | | | | | diagnostics. Distinguish "incomplete escape sequence", "invalid special character", and "unknown special character" from the generic "invalid escape sequence", also promoting them from WARNING to ERROR because incomplete escape sequences are severe syntax violations and because encountering an invalid or unknown special character makes it likely that part of the document content intended by the authors gets lost.
* Make roff_expand() parse left-to-right rather than right-to-left.Ingo Schwarze2022-05-191-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some escape sequences have side effects on global state, implying that the order of evaluation matters. For example, this fixes the long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to print "321"; now it correctly prints "123". Right-to-left parsing was convenient because it implicitly handled nested escape sequences. With correct left-to-right parsing, nesting now requires an explicit implementation, here solved as follows: 1. Handle nested expanding escape sequences iteratively. When finding one, expand it, then retry parsing the enclosing escape sequence from the beginning, which will ultimately succeed as soon as it no longer contains any nested expanding escape sequences. 2. Handle nested non-expanding escape sequences recursively. When finding one, the escape sequence parser calls itself to find the end of the inner sequence, then continues parsing the outer sequence after that point. This requires the mandoc_escape() function to operate in two different modes. The roff(7) parser uses it in a mode where it generates diagnostics and may return an expansion request instead of a parse result. All other callers, in particular the formatters, use it in a simpler mode that never generates diagnostics and always returns a definite parsing result, but that requires all expanding escape sequences to already have been expanded earlier. The bulk of the code is the same for both modes. Since this required a major rewrite of the function anyway, move it into its own new file roff_escape.c and out of the file mandoc.c, which was misnamed in the first place and lacks a clear focus. As a side benefit, this also fixes a number of assertion failures that tb@ found with afl(1), for example "\n\\\\*0", "\v\-\\*0", and "\w\-\\\\\$0*0". As another side benefit, it also resolves some code duplication between mandoc_escape() and roff_expand() and centralizes all handling of escape sequences (except for expansion) in roff_escape.c, hopefully easing maintenance and feature improvements in the future. While here, also move end-of-input handling out of the complicated function roff_expand() and into the simpler function roff_parse_comment(), making the logic easier to understand. Since this is a major reorganization of a central component of mandoc(1), stability of the program might slightly suffer for a few weeks, but i believe that's not a problem at this point of the release cycle. The new code already satisfies the regression suite, but more tweaking and regression testing to further improve the handling of various escape sequences will likely follow in the near future.
* The syntax of the roff(7) .mc request is quite specialIngo Schwarze2022-04-281-0/+2
| | | | | | | | | and the roff_onearg() parsing function is too generic, so provide a dedicated parsing function instead. This fixes an assertion failure when an \o escape sequence is passed as the argument; the bug was found by tb@ using afl(1). It also makes mandoc output more similar to groff in various cases.
* If a .shift request has a negative argument, do not use a negative arrayIngo Schwarze2022-04-241-1/+2
| | | | | | | | index but use 0 instead of the argument, just like groff. Warn about the invalid argument. While here, fix the column number in another warning message. Segfault reported by tb@, found with afl(1).
* print a BAGARG message if -T markdown is requested on man(7) input;Ingo Schwarze2021-08-141-0/+1
| | | | suggested by Michael Stapelberg at debian dot org
* Support two-character font names (BI, CW, CR, CB, CI)Ingo Schwarze2021-08-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | in the tbl(7) layout font modifier. Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead, which simplifies and unifies some code. While here, also support CB and CI in roff(7) \f escape sequences and in roff(7) .ft requests for all output modes. Using those is certainly not recommended because portability is limited even with groff, but supporting them makes some existing third-party manual pages look better, in particular in HTML output mode. Bug-compatible with groff as far as i'm aware, except that i consider font names starting with the '\n' (ASCII 0x0a line feed) character so insane that i decided to not support them. Missing feature reported by nabijaczleweli dot xyz in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002. I used none of the code from the initial patch submitted by nabijaczleweli, but some of their ideas. Final patch tested by them, too.
* The mandoc(1) manual already mentions that -T man output modeIngo Schwarze2021-07-041-1/+3
| | | | | | | | | | neither supports tbl(7) nor eqn(7) input. If an input file contains such code anyway, tell the user rather than failing an assert(3)ion. Fixing a crash reported by Bjarni Ingi Gislason <bjarniig at rhi dot hi dot is> in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=901636 which the Debian maintainer of mandoc, Michael at Stapelberg dot ch, forwarded to me.
* add a style message about overlong text lines,Ingo Schwarze2021-06-271-0/+1
| | | | | | | | trying very hard to avoid false positives, not at all trying to catch as many cases as possible; feature originally suggested by tb@, OK tb@ kn@ jmc@
* In -W style mode, check .Xr links along the full manpath becauseIngo Schwarze2021-06-021-1/+1
| | | | | | | | | | | that is more useful for validating manuals of non-base software. Nothing changes in -W all mode: by default for -T lint, we still assume we want to check base system conventions, including usually not wanting to link to non-base manual pages. The use case, a partial idea how to handle it, and a preliminary patch was originally presented by kn@, then refined by me. Final patch tested and OK'ed by kn@.
* Ignore unreasonably large spacing modifiers in tbl layouts.Ingo Schwarze2020-09-011-0/+1
| | | | | | Jan Schreiber <jes at posteo dot de> ran afl on mandoc and it turned out mandoc tried to use spacing modifiers so large that they would trigger assertion failures in term_ascii.c, function locale_advance().
* provide a STYLE message when mandoc knows the file name and the extensionIngo Schwarze2020-04-241-0/+1
| | | | | disagrees with the section number given in the .Dt or .TH macro; feature suggested and patch tested by jmc@
* Remove some stray argument names from function prototypes,Ingo Schwarze2020-04-031-3/+4
| | | | | | for consistency with the dominant style used in mandoc. No functional change. Patch from Martin Vahlensieck <academicsolutions dot ch>.
* Introduce a new mdoc(7) macro .Tg ("tag") to explicitly mark a placeIngo Schwarze2020-01-191-0/+1
| | | | | | | | | | | | | | | as defining a term. Please only use it when automatic tagging does not work. Manual page authors will not be required to add the new macro; using it remains optional. HTML output is still rudimentary in this version and will be polished later. Thanks to kn@ for reminding me that i have been considering since BSDCan 2014 whether something like this might be useful. Given that possibilities of making automatic tagging better are running out and there are still several situations where automatic tagging cannot do the job, i think the time is now ripe. Feedback and no objection from millert@; OK espie@ inoguchi@ kn@.
* Align to the new, sane behaviour of the groff_mdoc(7) .Dd macro:Ingo Schwarze2020-01-191-2/+2
| | | | | | | | without an argument, use the empty string, and always concatenate all arguments, no matter their number. This allows reducing the number of arguments of mandoc_normdate() and some other simplifications, at the same time polishing some error messages by adding the name of the macro in question.
* If messages are shown and output is printed without a pager, displayIngo Schwarze2019-07-141-0/+1
| | | | | | | | | | a heads-up on stderr at the end because otherwise, users may easily miss the messages: because messages typically occur while parsing, they typically preceed the output. This is most useful with flag combinations like "-c -W all" but may also help in some unusual error scenarios. Inconvenient ordering of output originally pointed out by espie@ for the example situation that /tmp/ is not writeable.
* Some time ago, i simplified mandoc_msg() such that it can be usedIngo Schwarze2019-07-101-2/+30
| | | | | | | | | everywhere and not only in the parsers. For more uniform messages, use it at more places instead of err(3), in particular in the main program. While here, integrate a few trivial functions called at exactly one place into the main option parser, and let a few more functions use the normal convention of returning 0 for success and -1 for error.
* Yet another round of improvements to manual font selection.Ingo Schwarze2018-12-161-0/+1
| | | | | | | | | Unify handling of \f and .ft. Support \f4 (bold+italic). Support ".ft BI" and ".ft CW" for terminal output. Support the .ft request in HTML output. Reject the bogus fonts \f(C1, \f(C2, \f(C3, and \f(CP. In regress.pl, only strip leading whitespace in math mode.
* Several improvements to escape sequence handling.Ingo Schwarze2018-12-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | * Add the missing special character \_ (underscore). * Partial implementations of \a (leader character) and \E (uninterpreted escape character). * Parse and ignore \r (reverse line feed). * Add a WARNING message about undefined escape sequences. * Add an UNSUPP message about unsupported escape sequences. * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported. * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets. * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid. * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names. * Starting a special character name with a blank is an error. * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode. * Document the newly handled escapes, and a few that were missing. * Regression tests for most of the above.
* Almost mechanical diff to remove the "struct mparse *" argumentIngo Schwarze2018-12-141-4/+2
| | | | | | | | from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
* Major cleanup; may imply minor changes in edge cases of error reporting.Ingo Schwarze2018-12-141-3/+11
| | | | | | | | | | | Finally, drop support for the run-time configurable mandocmsg() callback. It was over-engineered from the start, never used for anything in a decade, and repeatedly caused maintenance headaches. Consolidate reporting infrastructure into two files, mandoc.h and mandoc_msg.c, mopping up the bits and pieces that were scattered around main.c, read.c, mandoc_parse.h, libmandoc.h, the prototypes of four parsing-related functions, and both parser structs.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-34/+3
| | | | | | | | | | Split the top level parser interface out of the utility header mandoc.h, into a new header mandoc_parse.h, for use in the main program and in the main parser only. Move enum mandoc_os into roff.h because struct roff_man is the place where it is stored. This allows removal of mandoc.h from seven files in low-level parsers and in formatters.
* Cleanup, no functional change:Ingo Schwarze2018-12-131-68/+0
| | | | | | No need to expose the eqn(7) syntax tree data structures everywhere. Move them to their own include file, "eqn.h". While here, delete the unused enum eqn_pilet.
* Cleanup, no functional change:Ingo Schwarze2018-12-121-106/+1
| | | | | No need to expose the tbl(7) syntax tree data structures everywhere. Move them to their own include file, "tbl.h", and improve comments.
* In tbl(7) -T html output,Ingo Schwarze2018-11-251-1/+2
| | | | | | | | | | span cells horizontally and vertically as requested by the layout. Does not handle spans requested in the data section yet. To be able to do this, record the number of rows spanned in the first data cell (struct tbl_dat) of a vertical span. Missing feature reported by Pali dot Rohar at gmail dot com.
* Implement the \f(CW and \f(CR (constant width font) escape sequencesIngo Schwarze2018-10-251-0/+1
| | | | | | | | | for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com. Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
* Rudimentary implementation of the roff(7) .char (output glyphIngo Schwarze2018-08-251-0/+2
| | | | | | | | | definition) request, used for example by groff_hdtbl(7). This simplistic implementation may interact incorrectly with the .tr (input character translation) request. But come on, you are not only using .char *and* .tr, but you do so with respect to the same character in the same manual page?
* Rudimentary implementation of the roff(7) .while request.Ingo Schwarze2018-08-241-0/+4
| | | | | | | | | | | Needed for example by groff_hdtbl(7). There are two limitations: It does not support nested .while requests yet, and each .while loop must start and end in the same scope. The roff_parseln() return codes are now more flexible and allow OR'ing options.
* The upcoming .while request will have to re-execute roff(7) linesIngo Schwarze2018-08-231-2/+1
| | | | | | | parsed earlier, so they will have to be saved for reuse - but the read.c preparser does not know yet whether a line contains a .while request before passing it to the roff parser. To cope with that, save all parsed lines for now. Even shortens the code by 20 lines.
* Implement the roff(7) .shift and .return requests,Ingo Schwarze2018-08-231-0/+4
| | | | | | | | | | | | | | for example used by groff_hdtbl(7) and groff_mom(7). Also correctly interpolate arguments during nested macro execution even after .shift and .return, implemented using a stack of argument arrays. Note that only read.c, but not roff.c can detect the end of a macro execution, and the existence of .shift implies that arguments cannot be interpolated up front, so unfortunately, this includes a partial revert of roff.c rev. 1.337, moving argument interpolation back into the function roff_res().
* Implement the \*(.T predefined string (interpolate device name)Ingo Schwarze2018-08-161-0/+1
| | | | | by allowing the preprocessor to pass it through to the formatters. Used for example by the groff_char(7) manual page.
* Issue a STYLE message when normalizing the date format in .Dd/.TH.Ingo Schwarze2018-07-281-1/+2
| | | | | | | | Leah Neukirchen pointed out that mdoclint(1) used to warn about a leading zero before the day number, so we know that both NetBSD and Void Linux want the message. It does no harm on OpenBSD because Mdocdate always does the right thing anyway. jmc@ agrees that it makes sense in contexts not using Mdocdate.
* Style message about bad input encoding of em-dashes as -- instead of \(em.Ingo Schwarze2018-03-161-0/+1
| | | | Suggested by Thomas Klausner <wiz at NetBSD>; discussed with jmc@.
* be less assertive when warning about a possible typo;Ingo Schwarze2017-11-101-1/+1
| | | | from jca@, ok jmc@
* 1. Eliminate struct eqn, instead use the existing membersIngo Schwarze2017-07-081-11/+0
| | | | | | of struct roff_node which is allocated for each equation anyway. 2. Do not keep a list of equation parsers, one parser is enough. Minus fifty lines of code, no functional change.
* garbage collect unused enum member EQN_ROOTIngo Schwarze2017-07-071-1/+0
|
* Now that we have the -Wstyle message level, downgrade six warningsIngo Schwarze2017-07-061-8/+8
| | | | | | that are not syntax mistakes and that do not cause wrong formatting or content to style suggestions. Also upgrade two warnings that may cause information loss to errors.
* The EQN_LISTONE box type is pointless.Ingo Schwarze2017-07-051-1/+0
| | | | | | Simplify by just using EQN_LIST with expectargs = 1. Noticed while investigating a bug report from bentley@. No functional change.
* report trailing delimiters after macros where they are usually a mistake;Ingo Schwarze2017-07-031-2/+2
| | | | the idea came up in a discussion with Thomas Klausner <wiz at NetBSD>
* warn about time machines; suggested by Thomas Klausner <wiz @ NetBSD>Ingo Schwarze2017-07-031-0/+1
|
* add warning "cross reference to self"; inspired by mdoclintIngo Schwarze2017-07-021-0/+1
|
* Basic reporting of .Xrs to manual pages that don't existIngo Schwarze2017-07-011-0/+1
| | | | | | | | | | | | in the base system, inspired by mdoclint(1). We are able to do this because (1) the -mdoc parser, the -Tlint validator, and the man(1) manual page lookup code are all in the same program and (2) the mandoc.db(5) database format allows fast lookup. Feedback from, previous versions tested by, and OK jmc@. A few features will be added to this in the tree, step by step.
* warn about some non-portable idioms in .Bl -column;Ingo Schwarze2017-06-291-0/+2
| | | | triggered by a question from Yuri Pankov (illumos)
* Catch typos in .Sh names; suggested by jmc@.Ingo Schwarze2017-06-251-0/+1
| | | | | | I'm using a very simple, linear time / zero space fuzzy string matching heuristic rather than a full Levenshtein metric, to keep the code both simple and fast.
* operating system dependent message about unknown architecture;Ingo Schwarze2017-06-241-0/+1
| | | | inspired by mdoclint
* in the base system, suggest leaving .Os blank; inspired by mdoclintIngo Schwarze2017-06-241-0/+1
|
* Split -Wstyle into -Wstyle and the even lower -Wbase, and addIngo Schwarze2017-06-241-3/+13
| | | | | | | | | | | | | | | -Wopenbsd and -Wnetbsd to check conventions for the base system of a specific operating system. Mark operating system specific messages with "(OpenBSD)" at the end. Please use just "-Tlint" to check base system manuals (defaulting to -Wall, which is now -Wbase), but prefer "-Tlint -Wstyle" for the manuals of portable software projects you maintain that are not part of OpenBSD base, to avoid bogus recommendations about base system conventions that do not apply. Issue originally reported by semarie@, solution using an idea from tedu@, discussed with jmc@ and jca@.
* style message about duplicate RCS ids; inspired by mdoclintIngo Schwarze2017-06-171-0/+1
|