| Commit message (Collapse) | Author | Age | Files | Lines |
| |
wrong parsing class ESCAPE_SPECIAL to the better-suited parsing class
ESCAPE_UNDEF, exactly like it is already done for the similar \\,
which isn't a character escape sequence either.
No formatting change is intended just yet, but this will matter for
upcoming improvements in the parser for roff(7) macro, string, and
register names.
See the node "5.23.2 Copy Mode" in "info groff" regarding
what \\ and \. really mean.
| |
Noticed because Branden Robinson worked on related documentation in groff.
| |
* Add the missing special character \_ (underscore).
* Partial implementations of \a (leader character)
and \E (uninterpreted escape character).
* Parse and ignore \r (reverse line feed).
* Add a WARNING message about undefined escape sequences.
* Add an UNSUPP message about unsupported escape sequences.
* Mark \! and \? (transparent throughput)
and \O (suppress output) as unsupported.
* Treat the various variants of zero-width spaces as one-byte escape
sequences rather than as special characters, to avoid defining bogus
forms with square brackets.
* For special characters with one-byte names, do not define bogus
forms with square brackets, except for \[-], which is valid.
* In the form with square brackets, undefined special characters do not
fall back to printing the name verbatim, not even for one-byte names.
* Starting a special character name with a blank is an error.
* Undefined escape sequences never abort formatting of the input
string, not even in HTML output mode.
* Document the newly handled escapes, and a few that were missing.
* Regression tests for most of the above.
| |
Finally, drop support for the run-time configurable mandocmsg()
callback. It was over-engineered from the start, never used for
anything in a decade, and repeatedly caused maintenance headaches.
Consolidate reporting infrastructure into two files, mandoc.h and
mandoc_msg.c, mopping up the bits and pieces that were scattered
around main.c, read.c, mandoc_parse.h, libmandoc.h, the prototypes
of four parsing-related functions, and both parser structs.
| |
and of the playing card suits to match groff, using feedback
from Ralph Corderoy <ralph at inputplus dot co dot uk>.
| |
* Add two missing characters, \('Y and \('y.
* The Weierstrass p is not capital, see http://unicode.org/notes/tn27/.
* Add a groff-compatible ASCII transliteration for U+02DC: "~".
| |
intentionally leave it undocumented. Abused for example in the
groff(7) manual page.
| |
that can be changed unilaterally because groff fails to render them
at all.
| |
letters as in groff commit babca15f from trying to imitate the
characters' graphical shapes, which resulted in unintelligible
renderings in many cases, to transliterations conveying the characters'
meanings. One benefit is making these characters usable for portable
manual pages.
Solving a problem pointed out by bentley@.
| |
abused by mail/nmh; groff_char(7) confirms that this really exists
| |
triggered by multimedia/mkvtoolnix mkvmerge(1) using \(S2
| |
ugly in -Tascii output. For that reason, bentley@ submitted patches
to render "..." instead to groff in November 2014 (yes, more than
two years ago). Carsten Kunze yesterday merged them for the upcoming
groff-1.22.4 release. Yay!
Consequently, do the same in mandoc: Render \(Lq and \(Rq (which
are used for .Do, .Dq, .Lb, and .St) as '"' in -Tascii output.
All other output modes including -Tutf8 remain unchanged.
| |
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module:
There is no need to pass a pointer around, we most certainly
never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in;
that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_chars(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail
that originally triggered this patch).
No functional change, minus 100 LOC, and I don't see a performance change.
| |
of .Do/.Dc, .Dq, .Lb, and .St untouched.
Reduces groff-mandoc differences in OpenBSD base by about 7%.
Reminded of the issue by naddy@.
| |
escape sequences just like it was earlier implemented for -Thtml.
Do not let control characters other than ASCII 9 (horizontal tab)
propagate to the output. groff allows them, but that
really doesn't look like a great idea.
Let mchars_num2char() return int such that we can distinguish invalid \N
syntax from \N'0'. This also reduces the danger of signed char issues
popping up.
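
The point about returning int can be illustrated with a minimal sketch. This is not the mandoc implementation: the function name num_escape_value() and the accepted range 0 to 255 are assumptions made for illustration only. Because the result is an int, -1 can signal a syntax error while 0 remains a valid value for \N'0'.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not mandoc's mchars_num2char():
 * parse the decimal digits found between the quotes of \N'...'.
 * Returns the character number, or -1 for invalid syntax.
 * The upper bound of 255 is an assumption for this sketch.
 */
int
num_escape_value(const char *digits)
{
	char	*end;
	long	 v;

	/* Reject empty input, signs, and leading whitespace. */
	if (*digits < '0' || *digits > '9')
		return -1;

	errno = 0;
	v = strtol(digits, &end, 10);
	if (errno != 0 || *end != '\0' || v > 255)
		return -1;

	return (int)v;
}
```

With a char return type, the error case and the value 0 would be indistinguishable, and values above 127 could turn negative on platforms where char is signed.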
| |
validity of character escape names and warn about unknown ones.
This requires mchars_spec2cp() to report unknown names again.
Fortunately, that doesn't require changing the calling code because
according to groff, invalid character escapes should not produce
output anyway, and now that we warn about them, that's fine.
| |
Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0.
This simplifies mchars_num2uc().
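
As a rough sketch of the three accepted forms, the check below validates a string of hexadecimal digits. The function name unicode_form_ok() is hypothetical and this is not the code from the commit. Accepting lowercase hex digits is also an assumption of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not mandoc's actual code: check whether the
 * hexadecimal digit string (without any prefix) has one of the shapes
 * XXXX, YXXXX with Y != 0, or 10XXXX.
 * Returns 1 if the form is accepted, 0 otherwise.
 */
int
unicode_form_ok(const char *hex)
{
	size_t	 len, i;

	len = strlen(hex);
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)hex[i]))
			return 0;

	switch (len) {
	case 4:
		return 1;		/* 0xXXXX */
	case 5:
		return hex[0] != '0';	/* 0xYXXXX, Y != 0 */
	case 6:
		/* 0x10XXXX: the only six-digit range within Unicode */
		return hex[0] == '1' && hex[1] == '0';
	default:
		return 0;
	}
}
```

Restricting the input to these shapes means every accepted string is at most 0x10FFFF and has no redundant leading zeros, so the caller needs no separate range check.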
| |
In UTF-8 output, do not print anything if mchars_spec2cp() returns 0.
In particular, this repairs handling of zero-width spaces (\&).
While here, let mchars_spec2cp() return 0xFFFD instead of -1
if the character is not found, simplifying the using code.
In HTML output, do not print obfuscated ASCII characters and
do not test for one-char escapes, mchars_spec2cp() already does that.
| |
sequences above codepoint 512 by doing a reverse lookup in the
existing mandoc_char(7) character table.
Again, groff isn't smart enough to do this and silently discards such
escape sequences without printing anything.
| |
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.
A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
and remove the workarounds on the higher level.
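
The error fallback described above could look roughly like this. The enum and the function name error_glyph() are hypothetical and do not come from the mandoc sources. The byte sequence is the standard UTF-8 encoding of U+FFFD.

```c
/* Hypothetical output modes for this sketch only. */
enum out_mode {
	OUT_ASCII,
	OUT_UTF8
};

/*
 * Hypothetical helper: return the string to print when a character
 * escape cannot be resolved.  U+FFFD REPLACEMENT CHARACTER encodes
 * to the three bytes EF BF BD in UTF-8; ASCII output uses "<?>".
 */
const char *
error_glyph(enum out_mode mode)
{
	return mode == OUT_UTF8 ? "\xEF\xBF\xBD" : "<?>";
}
```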
| |
Include <sys/types.h> where needed, it does not belong in config.h.
Remove <stdio.h> from config.h; if it is missing somewhere, it should
be added, but I cannot find a *.c file where it is missing.
| |
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found it myself while auditing the HTML formatter for safe output handling.
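
The class of fix described here can be sketched as follows. The helper name html_encode_char() is hypothetical, and this is not the code from the commit. The idea is that a character obtained by decoding an escape sequence goes through the same encoding step as any other output character.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not mandoc's actual code: map one decoded
 * ASCII character to its HTML entity.  Returns NULL when the
 * character needs no encoding and can be printed verbatim.
 */
const char *
html_encode_char(int c)
{
	switch (c) {
	case '<':
		return "&lt;";
	case '>':
		return "&gt;";
	case '&':
		return "&amp;";
	case '"':
		return "&quot;";
	default:
		return NULL;
	}
}
```

A caller would decode \N'60' or \< to the character '<' first and only then ask this function what to emit, so the escaped form cannot bypass the encoding.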
| |
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change
| |
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.
| |
documented in the Ossanna-Kernighan-Ritter troff manual
and also supported by groff.
Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
| |
* Parsing macro arguments has to be done in copy mode,
which implies replacing "\t" by a literal tab character.
* Otherwise, render "\t" as the empty string, not as a 't' character.
This fixes formatting of the distfile example in the oldrdist(1) manual.
This also shows up in the unzip(1) manual as one of several issues
preventing the removal of USE_GROFF from the archivers/unzip port.
Thanks to espie@ for attracting my attention to the unzip(1) manual.
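
A minimal sketch of the copy-mode replacement of "\t" follows. The function name copy_mode_tabs() is hypothetical and this is not mandoc's parser. The sketch handles only "\t" and deliberately leaves every other escape sequence untouched, although real copy mode interprets more of them.

```c
/*
 * Hypothetical helper, not mandoc's actual code: rewrite the buffer
 * in place, replacing each two-character sequence backslash-t with a
 * literal tab.  Other escape sequences are copied unchanged.
 */
void
copy_mode_tabs(char *buf)
{
	char	*src, *dst;

	src = dst = buf;
	while (*src != '\0') {
		if (src[0] == '\\' && src[1] == 't') {
			*dst++ = '\t';
			src += 2;
		} else if (src[0] == '\\' && src[1] != '\0') {
			/*
			 * Copy both bytes of any other escape so that
			 * an escaped backslash followed by 't' is not
			 * misread as a tab escape.
			 */
			*dst++ = *src++;
			*dst++ = *src++;
		} else
			*dst++ = *src++;
	}
	*dst = '\0';
}
```

The output is never longer than the input, which is why the rewrite can safely happen in place.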
| |
data pointed to, pass the size of the right pointer type to calloc;
cosmetic issue reported by Ulrich Spoerlein <uqs@spoerlein.net>
found in Coverity Scan CID 978734.
No binary change - ok cmp(1).
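
One common way to avoid this kind of size mismatch is to derive the element size from the pointer being assigned. The sketch below shows that idiom only. The struct and function names are hypothetical, and whether the commit used exactly this form is not stated in the message.

```c
#include <stdlib.h>

/* Hypothetical element type for this sketch only. */
struct entry {
	int	 code;
};

/*
 * Hypothetical helper: allocate a zeroed table of nslots pointers.
 * sizeof(*table) is the size of one "struct entry *", so the element
 * size always matches the type of the table itself.
 */
struct entry **
table_alloc(size_t nslots)
{
	struct entry	**table;

	table = calloc(nslots, sizeof(*table));
	return table;
}
```

When the two candidate sizes happen to be equal, as with two pointer types on most platforms, the generated code does not change, which is consistent with the "no binary change" remark.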
| |
dumbness on my part.
| |
found while syncing to OpenBSD
| |
* Do not pass integers outside the ASCII range to isprint().
* Make sure escaped characters are really printed verbatim
when the escape sequence has no special meaning.
ok kristaps@
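
The first item can be sketched as a small guard. The name safe_isprint() is hypothetical and this is not the code from the commit. The isprint() function is only defined for arguments representable as unsigned char or equal to EOF, so Unicode codepoints must be range-checked first.

```c
#include <ctype.h>

/*
 * Hypothetical helper, not mandoc's actual code: return 1 if c is a
 * printable ASCII character, 0 otherwise.  The range check runs
 * first, so isprint() never sees a value it is not defined for.
 */
int
safe_isprint(int c)
{
	return c >= 0 && c <= 127 && isprint(c);
}
```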
| |
effect.
| |
hash chain or an extra function for checking matches.
| |
the libroff point. This clears up a nice chunk of code.
| |
to predefs.in. This also makes "BOTH" entries directly into CHAR. The
res2str and spec2str are now effectively the same function.
| |
only used once and simply bloated the binary. Also fix mchars_num2char
to correctly render the character instead of using atoi(). This makes
the conversion more strict, but it's more correct.
| |
(oops). Do the same for -Thtml (oops^2).
| |
consist of type "int". This will take more work (especially in encode and
friends), but this is a strong start. This commit also consists of some
harmless lint fixes.
| |
change any code but for renaming functions and types to be consistent
with other mandoc.h stuff. The reason for moving into libmandoc is that
the rendering of special characters is part of mandoc itself---not an
external part. From mandoc(1)'s perspective, this changes nothing, but
for other utilities, it's important to have these be part of libmandoc.
Note this isn't documented [yet] in mandoc.3 because there are some
parts I'd like to change around beforehand.
| |
groff's tmac.doc package. Originally noted by Matthew Dempsky.
Feedback by Jason McIntyre, joerg@, and schwarze@. Also add some
documentation about predefined strings, tweaked by schwarze@.
| |
necessary to all [real] front-ends, so stop pretending it's special.
While here, add some documentation to the variable types.
| |
so that everybody can use them. This follows the convention of
libXXXX.h being internal to a library and XXXX.h being the external
interface. Not only does this allow the removal of lots of redundant
NULL-checking code, it also sets the tone for adding new mandoc-global
routines.
| |
Don't use it in new manuals, it is inherently non-portable, but we
need it for backward-compatibility with existing manuals, for example
in Xenocara driver pages.
ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)
| |
existing 'struct tbl' as 'struct tbl_node', then move all option stuff
into a 'struct tbl' in mandoc.h.
This conflicted with a structure in chars.c, which was renamed.