| Commit message | Author | Age | Files | Lines |
| |
escape sequence into the correct parsing class, ESCAPE_EXPAND.
Expansion of \g is supposed to work exactly like the expansion
of the related escape sequence \n (interpolate register value),
but since we ignore the .af (assign output format) request,
we just interpolate an empty string to replace the \g sequence.
Surprising as it may seem, this actually makes a formatting difference
for deviant input like ".O\gNx" which used to raise bogus "escaped
character not allowed in a name" and "skipping unknown macro" errors
and printed nothing, whereas now it correctly prints "OpenBSD".
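The effect on the example input can be modeled in a few lines. This is a minimal Python sketch, not mandoc(1) code: the helper name drop_format_escapes is hypothetical, and it assumes the usual roff name syntax after \g (one character, "(xx", or "[name]").

```python
import re

# Hypothetical model: \g followed by a register name in one of the
# three usual roff forms: \gx, \g(xx, or \g[name].
G_ESCAPE = re.compile(r"\\g(?:\[[^\]]*\]|\(..|.)")


def drop_format_escapes(line):
    """Replace every \\g sequence in line with the empty string."""
    return G_ESCAPE.sub("", line)
```

Under these assumptions, drop_format_escapes(r".O\gNx") yields ".Ox": the \g consumes "N" as the register name, and the remaining .Ox macro is what prints "OpenBSD".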
| |
escape sequence. This is needed to get \V into the correct parsing
class, ESCAPE_EXPAND.
It is intentional that mandoc(1) output is *not* influenced by environment
variables, so interpolate the name of the variable with some decorating
punctuation rather than interpolating its value.
| |
from "ignore" to "unsupported" because when an input file uses it,
mandoc(1) is likely to significantly misformat the output,
usually showing parts of the output in a different order
than the author intended.
| |
that take no argument and are ignored: \% \& \^ \a \d \t \u \{ \| \}
No change to parsing or formatting needed.
| |
some diagnostics now appear in a more reasonable order, too
| |
after the roff_expand() reorganization in roff.c rev. 1.388.
The new parsing direction has two effects:
1. Correct output when a line contains more than one expanding
escape sequence that has a side effect.
2. Column numbers in diagnostic messages now report the changed
column numbers after any expansions left of them have taken place;
in the past, column numbers referred to the original input line.
Arguably, item 2 was a bit better in its old state, but slightly
less helpful diagnostics are a small price to pay for correct
output. Besides, when the expansion of user-defined strings or
macros is involved, in many cases, mandoc(1) is already unable to
report meaningful line and column numbers, so item 2 is not a
noteworthy regression. The effort and code complication for fixing
that would probably be excessive, in particular since well-written
manual pages are not supposed to use such features in the first place.
| |
of the roff_expand() reorganization in roff.c rev. 1.388
| |
of the roff_expand() reorganization in roff.c rev. 1.388
| |
Some escape sequences have side effects on global state, implying
that the order of evaluation matters. For example, this fixes the
long-standing bug that "\n+x\n+x\n+x" after ".nr x 0 1" used to
print "321"; now it correctly prints "123".
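The dependence on evaluation order can be reproduced with a toy model. The following Python sketch is not the mandoc(1) implementation; the function name expand_autoinc and its parameters are hypothetical, and it only understands the one-character form \n+x.

```python
import re

# Toy model of the auto-incrementing register escape \n+x.
AUTOINC = re.compile(r"\\n\+(\w)")


def expand_autoinc(line, values, steps, right_to_left=False):
    """Expand every \\n+x in line, evaluating in the given direction.

    values maps register names to current values (updated in place),
    steps maps register names to their auto-increment.
    """
    matches = list(AUTOINC.finditer(line))
    order = reversed(matches) if right_to_left else matches
    replacement = {}
    for m in order:
        name = m.group(1)
        values[name] += steps[name]          # the side effect
        replacement[m.start()] = str(values[name])
    out, pos = [], 0
    for m in matches:                        # rebuild the text in reading order
        out.append(line[pos:m.start()])
        out.append(replacement[m.start()])
        pos = m.end()
    out.append(line[pos:])
    return "".join(out)
```

With values {"x": 0} and steps {"x": 1}, which corresponds to ".nr x 0 1", the input "\n+x\n+x\n+x" gives "123" when evaluated left to right and "321" when evaluated right to left.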
Right-to-left parsing was convenient because it implicitly handled
nested escape sequences. With correct left-to-right parsing, nesting
now requires an explicit implementation, here solved as follows:
1. Handle nested expanding escape sequences iteratively.
When finding one, expand it, then retry parsing the enclosing escape
sequence from the beginning, which will ultimately succeed as soon
as it no longer contains any nested expanding escape sequences.
2. Handle nested non-expanding escape sequences recursively.
When finding one, the escape sequence parser calls itself to find
the end of the inner sequence, then continues parsing the outer
sequence after that point.
This requires the mandoc_escape() function to operate in two different
modes. The roff(7) parser uses it in a mode where it generates
diagnostics and may return an expansion request instead of a parse
result. All other callers, in particular the formatters, use it
in a simpler mode that never generates diagnostics and always returns
a definite parsing result, but that requires all expanding escape
sequences to already have been expanded earlier. The bulk of the
code is the same for both modes.
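Strategy 1 (expand the nested sequence, then retry the enclosing one from its beginning) can be sketched as follows. This is a simplified Python model restricted to the bracketed string syntax \*[name]; the function name expand_strings, the strings dictionary, and the max_steps guard are assumptions made for illustration and do not mirror mandoc_escape() or roff_expand().

```python
OPEN = "\\*["   # start of a bracketed user-defined string, \*[name]


def expand_strings(buf, strings, max_steps=1000):
    """Expand \\*[name] sequences left to right, innermost first."""
    pos = 0
    for _ in range(max_steps):
        start = buf.find(OPEN, pos)
        if start < 0:
            return buf                      # nothing left to expand
        close = buf.find("]", start + len(OPEN))
        if close < 0:
            return buf                      # unterminated: leave untouched
        # The last opener before the first "]" is the innermost sequence.
        innermost = buf.rfind(OPEN, start, close)
        name = buf[innermost + len(OPEN):close]
        buf = buf[:innermost] + strings.get(name, "") + buf[close + 1:]
        # Retry the enclosing sequence from its beginning.
        pos = start
    raise RuntimeError("too many expansions")
```

For example, with strings {"b": "c", "ac": "done"}, the input "x\*[a\*[b]]y" first becomes "x\*[ac]y" and then "xdoney".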
Since this required a major rewrite of the function anyway, move
it into its own new file roff_escape.c and out of the file mandoc.c,
which was misnamed in the first place and lacks a clear focus.
As a side benefit, this also fixes a number of assertion failures
that tb@ found with afl(1), for example "\n\\\\*0", "\v\-\\*0",
and "\w\-\\\\\$0*0".
As another side benefit, it also resolves some code duplication
between mandoc_escape() and roff_expand() and centralizes all
handling of escape sequences (except for expansion) in roff_escape.c,
hopefully easing maintenance and feature improvements in the future.
While here, also move end-of-input handling out of the complicated
function roff_expand() and into the simpler function roff_parse_comment(),
making the logic easier to understand.
Since this is a major reorganization of a central component of
mandoc(1), stability of the program might slightly suffer for a few
weeks, but i believe that's not a problem at this point of the
release cycle. The new code already satisfies the regression suite,
but more tweaking and regression testing to further improve the
handling of various escape sequences will likely follow in the near
future.
| |
| |
functionality is not needed when called from roff_getarg(). This makes the
long and complicated function roff_expand() significantly shorter, and also
simpler in so far as it no longer needs to return ROFF_APPEND.
No functional change intended.
| |
or macro, including context-dependent error handling inside tbl(7) code
and inside .ce/.rj blocks. Use it both in the top level roff(7) parser
and inside conditional blocks.
This fixes an assertion failure triggered by ".if 1 .ce" inside tbl(7)
code, found by tb@ using afl(1).
As a side benefit for readability, only one place remains in the
code that calls the main handler functions for the various roff(7)
requests. This patch also improves column numbers in some error
messages and various comments.
| |
particularly useful for values that have non-obvious semantics
like ROFF_MAX, ROFF_cblock, ROFF_RENAMED, and TOKEN_NONE;
no code change.
| |
1. Do not needlessly access the function pointer table roffs[].
Instead, simply call the block closing function directly.
2. Sort code: handle both cases of block closing at the beginning
of the function rather than one at the beginning and one at the end.
3. Trim excessive, partially repetitive and obvious comments, also
making the comments considerably more precise.
No functional change.
| |
and the roff_onearg() parsing function is too generic,
so provide a dedicated parsing function instead.
This fixes an assertion failure when an \o escape sequence is
passed as the argument; the bug was found by tb@ using afl(1).
It also makes mandoc output more similar to groff in various cases.
| |
break multiple element next-line scopes at the same time, similar to
what man_descope() already does for unconditional rewinding.
This fixes an assertion failure that tb@ found with afl(1), caused
by .SH .I .I .BI and similar sequences of macros without arguments.
| |
and never produce output at the place of their invocation.
Minibugs found while investigating unrelated afl(1) reports from tb@.
| |
1. The combination \z\h is a no-op whatever the argument may be.
In the past, the \z only affected the first space character generated
by the \h, which was wrong.
2. For the combination \zX\h with a positive argument, the first
space resulting from the \h is not printed but consumed by the \z.
3. For the combination \zX\h with a negative argument, application
of the \z needs to be completed before the \h can be started.
In the past, if this combination occurred at the beginning of an
output line, the \h backed up to the beginning of the line and
after that, the \z attempted to back up even further, triggering
an assertion.
Bugs found during an audit of assignments to termp->col that i
started after the bugfix tbl_term.c rev. 1.65. The assertion
triggered by bug 3 was *not* yet found by afl(1).
| |
| |
This is needed because the TERMP_MULTICOL mode is designed such
that term_tbl() buffers all the cells of the table row before the
normal reset logic near the end of term_flushln() can be reached.
This fixes an assertion failure triggered by \z near the end
of a table cell, found by tb@ using afl(1).
| |
Apart from making sense in the first place, this fixes an assertion
failure that happened when the calculated implicit tag did not match
the string value of the first child of the node.
Bug found by tb@ using afl(1).
| |
another enclosing .while loop at the same time.
Instead, postpone the closing until the next iteration of ROFF_RERUN.
This prevents one-line constructions like ".while 0 .while 0 something"
and ".while rx .while rx .rr x" (which admittedly aren't particularly
useful) from dying of abort(3), which was a bug tb@ found with afl(1).
| |
index but use 0 instead of the argument, just like groff.
Warn about the invalid argument.
While here, fix the column number in another warning message.
Segfault reported by tb@, found with afl(1).
| |
do not skip closing the table and cleaning up memory at the end of the
table in the HTML output module.
This bug resulted in skipping the tblcalc() function and reusing
the existing roffcol array for the next tbl(7) processed. If the
next table had more columns than the one ending with a horizontal
line in the last data row, uninitialized memory was read, potentially
resulting in near-infinite output.
The bug was introduced in rev. 1.29 (2018/11/26) but only fully exposed
by rev. 1.38 (2021/09/09). Until rev. 1.37, it could only cause
misformatting and invalid HTML output syntax but not huge output
because up to that point, the function did not use the roffcol array.
Nasty bug found the hard way by Michael Stapelberg on the production
server manpages.debian.org. Michael also supplied example files
and excellent instructions on how to reproduce the bug, which was very
difficult because no real-world manual page is known that triggers
the bug by itself, so to reproduce the bug, mandoc(1) had to be
invoked with at least two file name arguments.
| |
designed and written last autumn, polished today
| |
| |
patch from jsg@
ok gnezdo@ miod@ jmc@
| |
we must not reset the recursion counter when moving beyond the end
of the *previous* expansion, but we may only do so when moving
beyond the rightmost position reached by *any* expansion in the
current equation. This matters because definitions can nest;
consider:
  .EQ
  define inner "content"
  define outer "inner outer"
  outer
  .EN
This endless loop was found by tb@ using afl(1).
Incidentally, GNU eqn(1) also performs an infinite loop in this
situation and then crashes when memory runs out, but that's not an
excuse for nasty behaviour of mandoc(1).
While here, consistently print the expanded content even when the
expansion is finally truncated. While that is not likely to help
end-users, it may help authors of eqn(7) code to understand what's
going on. Besides, it sends a very clear signal that something is
amiss, which was easy to miss in the past unless people
enabled -W error or used -T lint.
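The reset rule for the recursion counter described above can be sketched on a token list. This is a hypothetical Python model, not the eqn(7) parser: the name expand_definitions, the token representation, and the limit of 16 are assumptions chosen for illustration.

```python
def expand_definitions(tokens, defs, limit=16):
    """Expand defined names left to right.

    Returns (tokens, truncated).  The counter is reset only after
    moving beyond the rightmost position reached by any expansion.
    """
    buf = list(tokens)
    pos = 0
    count = 0        # expansions since the last reset
    rightmost = 0    # end of the furthest-reaching expansion so far
    while pos < len(buf):
        if pos >= rightmost:
            count = 0
        body = defs.get(buf[pos])
        if body is None:
            pos += 1                 # ordinary token, move on
            continue
        count += 1
        if count > limit:
            return buf, True         # give up, keep what was expanded
        buf[pos:pos + 1] = body      # replace the name by its definition
        rightmost = max(rightmost, pos + len(body))
    return buf, False
```

With defs {"inner": ["content"], "outer": ["inner", "outer"]} and the input ["outer"], the current position never gets beyond rightmost, so the counter keeps growing and the expansion is truncated instead of looping forever. Resetting at the end of the previous expansion instead would clear the counter each time "inner" has been replaced, which is the endless loop described above.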
| |
whatsoever and ends with a broken next-line scope. Obviously, this
cannot happen in a real manual page, but mandoc(1) should not die
even when fed absurd input.
This bug was independently reported by both jsg@ and tb@ who both
found it with afl(1).
| |
beginning of an escape sequence: \, \E, \EE, \EEE, and so on all do
the same outside copy mode, so let them do the same in mandoc(1), too.
This fixes an assertion failure triggered by \EE*X that tb@ found
with afl(1). The first E was consumed by roff_expand(), but that
function failed to recognize the escape sequence as the expansion
of a user-defined string and handed it over to mandoc_escape(),
which consumed the second E and then died on an assertion because
it is not prepared to handle user-defined strings. Fix this by
letting *both* functions handle arbitrary numbers of 'E's correctly.
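The rule that any number of 'E's after the backslash is equivalent can be modeled in a few lines. This Python sketch is illustrative only; the helper name escape_name_offset is hypothetical and not a function in mandoc(1).

```python
def escape_name_offset(text, start):
    """Return the index of the first character after the backslash at
    text[start] and after any run of 'E' characters following it."""
    if text[start] != "\\":
        raise ValueError("not at the beginning of an escape sequence")
    i = start + 1
    while i < len(text) and text[i] == "E":
        i += 1
    return i
```

For the input "\EE*X", the function returns 3, the position of the "*", so a caller sees the same string expansion as for "\*X" however many 'E's precede it.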
| |
for centering text spanning multiple tbl(7) columns, correctly account
for the spacing between columns instead of wrongly assuming the default
spacing of 3n.
Patch from Simon Branch <simonmbranch at gmail dot com>.
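The width calculation for a span can be sketched as follows. This is a hypothetical Python model: the names span_width and center_offset, and the convention that spacings[i] is the spacing after column i, are assumptions and not taken from the tbl(7) formatter.

```python
def span_width(widths, spacings, first, last):
    """Width available to a cell spanning columns first..last.

    widths[i] is the width of column i, spacings[i] the spacing
    between column i and column i + 1.
    """
    inner_gaps = sum(spacings[first:last])
    return sum(widths[first:last + 1]) + inner_gaps


def center_offset(text_len, widths, spacings, first, last):
    """Number of blank positions to print before centered text."""
    free = span_width(widths, spacings, first, last) - text_len
    return max(free, 0) // 2
```

With widths [4, 5, 6] and spacings [1, 3], a span over the first two columns is 10 wide; assuming the default spacing of 3 for every gap would give 12 and shift the centered text.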
| |
| |
delete a sentence talking about databases. Having that sentence in
the first place probably was a copy-and-paste mistake when adopting
some text from the makewhatis(8) manual page.
Triggered by a smaller patch sent to discuss@
by Paul A. Patience <paul at apatience dot com>.
| |
two-value syntax "display: inline flow;" instead. In particular, there
is no need to establish a new block formatting context with "flow-root",
and in fact that's detrimental because it appears to introduce spurious
soft-wrap opportunities.
jmc@ reported a bogus line break between the opening angle bracket
generated by .Aq Mt and the following email address.
| |
and man(1), without restricting that statement to "man -k".
Suggested by and patch OK'ed by jmc@.
While only apropos(1) and whatis(1) strictly require the database
and while our man(1) implementation can find many manual pages even
when no database is available or when the database is incomplete or
corrupt, it does use the database even without -k whenever possible.
Consequently, this change makes the manual page less confusing.
| |
It feels more natural to me to use -a directly when asking mandoc(1)
to use a pager. The reason that "mandoc -l" does exactly the same
as "mandoc -a" is that "mandoc" is essentially "man -lc", so the -a
implied by -l negates the -c and the -l has no effect because it is
already the default for mandoc(1).
The more usual command for doing the same is "man -l foo.1 bar.1 ..."
but that's off-topic for the mandoc(1) manual page.
Patch on tech@ from Anders Damsgaard <anders at adamsgaard dot dk>.
| |
patch from jsg@, ok jmc@ sthen@ millert@
| |
because all of the following hold:
* It is an alias for a part of an ancient standard that is no longer important.
* To refer to that old standard, -xpg4.2 is readily available and portable.
* It is unused in OpenBSD, FreeBSD, and NetBSD.
* Groff never supported it.
I agreed with G. Branden Robinson that deleting this from mandoc
is preferable to adding it to groff.
| |
making the mansearch() function easier to read for human auditors.
No functional change on OpenBSD.
As observed by Mark Millard <marklmi at yahoo dot com>, neither the
latest version of POSIX 2008 nor C11 defines what qsort(3) should do
for base == NULL && nmemb == 0.
My impression is it is indeed undefined behaviour because the
standards say that base shall point to an array, NULL does not point
to an array, and while there is special wording saying that compar()
shall not be called if nmemb == 0, i fail to see any similar wording
stating that base shall not be accessed if nmemb == 0.
Consequently, this patch is also likely to improve standard conformance
and portability.
Minor issue found by Stefan Esser <se at FreeBSD> with UBSAN.
He sent a patch to bugs@, but my patch differs in a minor way.
| |
output. In particular, do not represent "_" as "-", and distinguish "_"
from "\_" and "=" from "\=".
Output tweak following a related question from
Ted Bullock <tbullock at comlore dot com>.
| |
two character sequence "\_" or "\=", a single or double horizontal
line is supposed to be drawn inside the cell, not joining its
neighbours.
I am not aware of any way to do that with HTML and/or CSS.
Still, it seems closer to the intent of the document author to draw
a horizontal line with <hr/>, even though that line will join the
neighbour cells, rather than printing a literal '_' or '=' character.
Formatting tweak inspired by a related question from
Ted Bullock <tbullock at comlore dot com>.
| |
contains a backslash that needs to be escaped, and the
missing escaping resulted in very misleading formatting.
Documentation bug found due to a question from
Ted Bullock <tbullock at comlore dot com>.
| |
| |
sequence in -T ps and -T pdf output mode, use an appropriate
horizontal distance by correctly using the term_len() utility
function. Output from the -T ascii, -T utf8, and -T html modes
was already correct and remains unchanged.
Lennart Jablonka <hummsmith42 at gmail dot com> found and reported
this unit conversion bug (misinterpreting AFM units as if they were
en units) when rendering scdoc-generated manuals (scdoc is a low
quality generator, but that's no excuse for mandoc misformatting \h)
on Alpine Linux. Lennart also tested this patch.
| |
sytle -> style; adapted from changes by SAITOH masanobu (NetBSD)
| |
running with the -M option or with a MANPATH environment variable
that has no leading or trailing ":" and no "::". If -M or
MANPATH overrides the configuration file rather than adding to it,
just ignore any "manpath" directives while processing the configuration
file.
This fixes a bug reported by Jan Stary <hans at stare dot cz>
on misc@.
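The distinction between overriding and adding can be sketched as follows. This Python model covers only MANPATH, not -M; the function names are hypothetical, and the placement of the configured paths for the leading, trailing, and double colon cases is an assumption based on the usual man(1) conventions.

```python
def split_path(value):
    """Split a colon-separated path list, dropping empty entries."""
    return [p for p in value.split(":") if p]


def search_path(manpath, conf_paths):
    """Combine MANPATH with the "manpath" directives of the config file."""
    conf = list(conf_paths)
    if not manpath:
        return conf                              # no MANPATH: config only
    if manpath.startswith(":"):
        return conf + split_path(manpath)        # add after the config
    if manpath.endswith(":"):
        return split_path(manpath) + conf        # add before the config
    if "::" in manpath:
        head, tail = manpath.split("::", 1)
        return split_path(head) + conf + split_path(tail)
    # No leading or trailing ":" and no "::": MANPATH overrides, so
    # "manpath" directives from the configuration file are ignored.
    return split_path(manpath)
```

For example, search_path("/opt/man", ["/usr/share/man"]) returns only ["/opt/man"], whereas search_path(":/opt/man", ["/usr/share/man"]) keeps the configured directory as well.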
| |
For open/openat, if the flags parameter does not contain O_CREAT, the
3rd (variadic) mode_t parameter is irrelevant. Many developers in the past
have passed mode_t (0, 044, 0644, or such), which might lead future people
to copy this broken idiom, and perhaps even believe this parameter has some
meaning or implication or application. Delete them all.
This comes out of a conversation where tb@ noticed that a strange (but
intentional) pledge behaviour is to always knock-out high-bits from
mode_t on a number of system calls as a safety factor, and his bewilderment
that this appeared to be happening against valid modes (at least visually),
but no sorry, they are all irrelevant junk. They could all be 0xdeafbeef.
ok millert
| |
variables that are already present (and used nearby) in the code;
no functional change
| |
A comment in the code claimed that the list of spans would be sorted,
but the sorting did not actually work. The layout "LSSS,LLSL" resulted
in the list "0-3, 1-2", whereas the layout "LLSL,LSSS" resulted
in the list "1-2, 0-3". Since sorting serves no purpose, just leave
the list unsorted.
|
| |
|