| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
move it to the top level include file mandoc.h to reduce the risk of causing
clashes when introducing new ASCII_* constants in the future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
whatsoever (for example \fR) and escape sequences that produce
invisible zero-width output (for example \&). No, i'm not joking,
groff does make that distinction, and it has consequences in some
situations, for example for vertical spacing in no-fill mode.
Heirloom and Plan 9 behaviour is subtly different, but in case of
doubt, we want to follow groff.
While this fixes the behaviour for the majority of escape sequences,
in particular for those most likely to occur in practice, it is not
perfect yet because some of the more exotic ESCAPE_IGNORE sequences
are actually of the "no output whatsoever" type but treated
as "invisible zero-width" for now. With the new ASCII_NBRZW mechanism
in place, switching them over one by one when the need arises will
no longer be very difficult.
|
|
|
|
|
| |
into the more specific messages "invalid escape argument delimiter"
and "invalid escape sequence argument".
|
|
|
|
|
|
|
|
|
| |
diagnostics. Distinguish "incomplete escape sequence", "invalid special
character", and "unknown special character" from the generic "invalid
escape sequence", also promoting them from WARNING to ERROR because
incomplete escape sequences are severe syntax violations and because
encountering an invalid or unknown special character makes it likely
that part of the document content intended by the authors gets lost.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some escape sequences have side effects on global state, implying
that the order of evaluation matters. For example, this fixes the
long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to
print "321"; now it correctly prints "123".
Right-to-left parsing was convenient because it implicitly handled
nested escape sequences. With correct left-to-right parsing, nesting
now requires an explicit implementation, here solved as follows:
1. Handle nested expanding escape sequences iteratively.
When finding one, expand it, then retry parsing the enclosing escape
sequence from the beginning, which will ultimately succeed as soon
as it no longer contains any nested expanding escape sequences.
2. Handle nested non-expanding escape sequences recursively.
When finding one, the escape sequence parser calls itself to find
the end of the inner sequence, then continues parsing the outer
sequence after that point.
This requires the mandoc_escape() function to operate in two different
modes. The roff(7) parser uses it in a mode where it generates
diagnostics and may return an expansion request instead of a parse
result. All other callers, in particular the formatters, use it
in a simpler mode that never generates diagnostics and always returns
a definite parsing result, but that requires all expanding escape
sequences to already have been expanded earlier. The bulk of the
code is the same for both modes.
Since this required a major rewrite of the function anyway, move
it into its own new file roff_escape.c and out of the file mandoc.c,
which was misnamed in the first place and lacks a clear focus.
As a side benefit, this also fixes a number of assertion failures
that tb@ found with afl(1), for example "\n\\\\*0", "\v\-\\*0",
and "\w\-\\\\\$0*0".
As another side benefit, it also resolves some code duplication
between mandoc_escape() and roff_expand() and centralizes all
handling of escape sequences (except for expansion) in roff_escape.c,
hopefully easing maintenance and feature improvements in the future.
While here, also move end-of-input handling out of the complicated
function roff_expand() and into the simpler function roff_parse_comment(),
making the logic easier to understand.
Since this is a major reorganization of a central component of
mandoc(1), stability of the program might slightly suffer for a few
weeks, but i believe that's not a problem at this point of the
release cycle. The new code already satisfies the regression suite,
but more tweaking and regression testing to further improve the
handling of various escape sequences will likely follow in the near
future.
|
|
|
|
|
|
|
|
|
| |
and the roff_onearg() parsing function is too generic,
so provide a dedicated parsing function instead.
This fixes an assertion failure when an \o escape sequence is
passed as the argument; the bug was found by tb@ using afl(1).
It also makes mandoc output more similar to groff in various cases.
|
|
|
|
|
|
|
|
| |
index but use 0 instead of the argument, just like groff.
Warn about the invalid argument.
While here, fix the column number in another warning message.
Segfault reported by tb@, found with afl(1).
|
|
|
|
| |
suggested by Michael Stapelberg at debian dot org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in the tbl(7) layout font modifier.
Get rid of the TBL_CELL_BOLD and TBL_CELL_ITALIC flags and use
the usual ESCAPE_FONT* enum mandoc_esc members from mandoc.h instead,
which simplifies and unifies some code.
While here, also support CB and CI in roff(7) \f escape sequences
and in roff(7) .ft requests for all output modes. Using those is
certainly not recommended because portability is limited even with
groff, but supporting them makes some existing third-party manual
pages look better, in particular in HTML output mode.
Bug-compatible with groff as far as i'm aware, except that i consider
font names starting with the '\n' (ASCII 0x0a line feed) character
so insane that i decided to not support them.
Missing feature reported by nabijaczleweli dot xyz in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=992002.
I used none of the code from the initial patch submitted by
nabijaczleweli, but some of their ideas.
Final patch tested by them, too.
|
|
|
|
|
|
|
|
|
|
| |
neither supports tbl(7) nor eqn(7) input.
If an input file contains such code anyway, tell the user
rather than failing an assert(3)ion.
Fixing a crash reported by Bjarni Ingi Gislason <bjarniig at rhi dot hi dot is>
in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=901636 which the
Debian maintainer of mandoc, Michael at Stapelberg dot ch, forwarded to me.
|
|
|
|
|
|
|
|
| |
trying very hard to avoid false positives,
not at all trying to catch as many cases as possible;
feature originally suggested by tb@,
OK tb@ kn@ jmc@
|
|
|
|
|
|
|
|
|
|
|
| |
that is more useful for validating manuals of non-base software.
Nothing changes in -W all mode: by default for -T lint, we still
assume we want to check base system conventions, including usually
not wanting to link to non-base manual pages.
The use case, a partial idea how to handle it, and a preliminary
patch was originally presented by kn@, then refined by me.
Final patch tested and OK'ed by kn@.
|
|
|
|
|
|
| |
Jan Schreiber <jes at posteo dot de> ran afl on mandoc and it turned
out mandoc tried to use spacing modifiers so large that they would
trigger assertion failures in term_ascii.c, function locale_advance().
|
|
|
|
|
| |
disagrees with the section number given in the .Dt or .TH macro;
feature suggested and patch tested by jmc@
|
|
|
|
|
|
| |
for consistency with the dominant style used in mandoc.
No functional change.
Patch from Martin Vahlensieck <academicsolutions dot ch>.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as defining a term. Please only use it when automatic tagging does
not work. Manual page authors will not be required to add the new
macro; using it remains optional. HTML output is still rudimentary
in this version and will be polished later.
Thanks to kn@ for reminding me that i have been considering since
BSDCan 2014 whether something like this might be useful. Given
that possibilities of making automatic tagging better are running
out and there are still several situations where automatic tagging
cannot do the job, i think the time is now ripe.
Feedback and no objection from millert@; OK espie@ inoguchi@ kn@.
|
|
|
|
|
|
|
|
| |
without an argument, use the empty string, and always concatenate
all arguments, no matter their number.
This allows reducing the number of arguments of mandoc_normdate()
and some other simplifications, at the same time polishing some
error messages by adding the name of the macro in question.
|
|
|
|
|
|
|
|
|
|
| |
a heads-up on stderr at the end because otherwise, users may easily
miss the messages: because messages typically occur while parsing,
they typically preceed the output. This is most useful with flag
combinations like "-c -W all" but may also help in some unusual
error scenarios.
Inconvenient ordering of output originally pointed out by espie@
for the example situation that /tmp/ is not writeable.
|
|
|
|
|
|
|
|
|
| |
everywhere and not only in the parsers.
For more uniform messages, use it at more places instead of err(3),
in particular in the main program.
While here, integrate a few trivial functions called at exactly one
place into the main option parser, and let a few more functions use
the normal convention of returning 0 for success and -1 for error.
|
|
|
|
|
|
|
|
|
| |
Unify handling of \f and .ft.
Support \f4 (bold+italic).
Support ".ft BI" and ".ft CW" for terminal output.
Support the .ft request in HTML output.
Reject the bogus fonts \f(C1, \f(C2, \f(C3, and \f(CP.
In regress.pl, only strip leading whitespace in math mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add the missing special character \_ (underscore).
* Partial implementations of \a (leader character)
and \E (uninterpreted escape character).
* Parse and ignore \r (reverse line feed).
* Add a WARNING message about undefined escape sequences.
* Add an UNSUPP message about unsupported escape sequences.
* Mark \! and \? (transparent throughput)
and \O (suppress output) as unsupported.
* Treat the various variants of zero-width spaces as one-byte escape
sequences rather than as special characters, to avoid defining bogus
forms with square brackets.
* For special characters with one-byte names, do not define bogus
forms with square brackets, except for \[-], which is valid.
* In the form with square brackets, undefined special characters do not
fall back to printing the name verbatim, not even for one-byte names.
* Starting a special character name with a blank is an error.
* Undefined escape sequences never abort formatting of the input
string, not even in HTML output mode.
* Document the newly handled escapes, and a few that were missing.
* Regression tests for most of the above.
|
|
|
|
|
|
|
|
| |
from mandoc_msg(), where it is no longer used.
While here, rename mandoc_vmsg() to mandoc_msg() and retire the
old version: There is really no point in having another function
merely to save "%s" in a few places.
Minus 140 lines of code.
|
|
|
|
|
|
|
|
|
|
|
| |
Finally, drop support for the run-time configurable mandocmsg()
callback. It was over-engineered from the start, never used for
anything in a decade, and repeatedly caused maintenance headaches.
Consolidate reporting infrastructure into two files, mandoc.h and
mandoc_msg.c, mopping up the bits and pieces that were scattered
around main.c, read.c, mandoc_parse.h, libmandoc.h, the prototypes
of four parsing-related functions, and both parser structs.
|
|
|
|
|
|
|
|
|
|
| |
Split the top level parser interface out of the utility header
mandoc.h, into a new header mandoc_parse.h, for use in the main
program and in the main parser only.
Move enum mandoc_os into roff.h because struct roff_man is the
place where it is stored.
This allows removal of mandoc.h from seven files in low-level
parsers and in formatters.
|
|
|
|
|
|
| |
No need to expose the eqn(7) syntax tree data structures everywhere.
Move them to their own include file, "eqn.h".
While here, delete the unused enum eqn_pilet.
|
|
|
|
|
| |
No need to expose the tbl(7) syntax tree data structures everywhere.
Move them to their own include file, "tbl.h", and improve comments.
|
|
|
|
|
|
|
|
|
|
| |
span cells horizontally and vertically as requested by the layout.
Does not handle spans requested in the data section yet.
To be able to do this, record the number of rows spanned
in the first data cell (struct tbl_dat) of a vertical span.
Missing feature reported by Pali dot Rohar at gmail dot com.
|
|
|
|
|
|
|
|
|
| |
for HTML output. Somewhat relevant because pod2man(1) relies on this.
Missing feature reported by Pali dot Rohar at gmail dot com.
Note that constant width font was already correctly selected before
this when required by semantic markup. Only attempting physical
markup with the low-level escape sequence was ineffective.
|
|
|
|
|
|
|
|
|
| |
definition) request, used for example by groff_hdtbl(7).
This simplistic implementation may interact incorrectly
with the .tr (input character translation) request.
But come on, you are not only using .char *and* .tr, but you do so
with respect to the same character in the same manual page?
|
|
|
|
|
|
|
|
|
|
|
| |
Needed for example by groff_hdtbl(7).
There are two limitations:
It does not support nested .while requests yet,
and each .while loop must start and end in the same scope.
The roff_parseln() return codes are now more flexible
and allow OR'ing options.
|
|
|
|
|
|
|
| |
parsed earlier, so they will have to be saved for reuse - but the
read.c preparser does not know yet whether a line contains a .while
request before passing it to the roff parser. To cope with that,
save all parsed lines for now. Even shortens the code by 20 lines.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
for example used by groff_hdtbl(7) and groff_mom(7).
Also correctly interpolate arguments during nested macro execution
even after .shift and .return, implemented using a stack of argument
arrays.
Note that only read.c, but not roff.c can detect the end of a macro
execution, and the existence of .shift implies that arguments cannot
be interpolated up front, so unfortunately, this includes a partial
revert of roff.c rev. 1.337, moving argument interpolation back into
the function roff_res().
|
|
|
|
|
| |
by allowing the preprocessor to pass it through to the formatters.
Used for example by the groff_char(7) manual page.
|
|
|
|
|
|
|
|
| |
Leah Neukirchen pointed out that mdoclint(1) used to warn about a
leading zero before the day number, so we know that both NetBSD and
Void Linux want the message. It does no harm on OpenBSD because
Mdocdate always does the right thing anyway.
jmc@ agrees that it makes sense in contexts not using Mdocdate.
|
|
|
|
| |
Suggested by Thomas Klausner <wiz at NetBSD>; discussed with jmc@.
|
|
|
|
| |
from jca@, ok jmc@
|
|
|
|
|
|
| |
of struct roff_node which is allocated for each equation anyway.
2. Do not keep a list of equation parsers, one parser is enough.
Minus fifty lines of code, no functional change.
|
| |
|
|
|
|
|
|
| |
that are not syntax mistakes and that do not cause wrong formatting
or content to style suggestions.
Also upgrade two warnings that may cause information loss to errors.
|
|
|
|
|
|
| |
Simplify by just using EQN_LIST with expectargs = 1.
Noticed while investigating a bug report from bentley@.
No functional change.
|
|
|
|
| |
the idea came up in a discussion with Thomas Klausner <wiz at NetBSD>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
in the base system, inspired by mdoclint(1).
We are able to do this because (1) the -mdoc parser, the -Tlint validator,
and the man(1) manual page lookup code are all in the same program
and (2) the mandoc.db(5) database format allows fast lookup.
Feedback from, previous versions tested by, and OK jmc@.
A few features will be added to this in the tree, step by step.
|
|
|
|
| |
triggered by a question from Yuri Pankov (illumos)
|
|
|
|
|
|
| |
I'm using a very simple, linear time / zero space fuzzy string
matching heuristic rather than a full Levenshtein metric, to keep
the code both simple and fast.
|
|
|
|
| |
inspired by mdoclint
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-Wopenbsd and -Wnetbsd to check conventions for the base system of
a specific operating system. Mark operating system specific messages
with "(OpenBSD)" at the end.
Please use just "-Tlint" to check base system manuals (defaulting
to -Wall, which is now -Wbase), but prefer "-Tlint -Wstyle" for the
manuals of portable software projects you maintain that are not
part of OpenBSD base, to avoid bogus recommendations about base
system conventions that do not apply.
Issue originally reported by semarie@, solution using
an idea from tedu@, discussed with jmc@ and jca@.
|
| |
|