| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
that can be changed unilaterally because groff fails to render them
at all.
|
|
|
|
|
|
|
|
|
|
| |
letters as in groff commit babca15f from trying to imitate the
characters' graphical shapes, which resulted in unintelligible
renderings in many cases, to transliterations conveying the characters'
meanings. One benefit is making these characters usable for portable
manual pages.
Solving a problem pointed out by bentley@.
|
|
|
|
| |
abused by mail/nmh; groff_char(7) confirms that this really exists
|
|
|
|
| |
triggered by multimedia/mkvtoolnix mkvmerge(1) using \(S2
|
|
|
|
|
|
|
|
|
|
|
| |
ugly in -Tascii output. For that reason, bentley@ submitted patches
to render "..." instead to groff in November 2014 (yes, more than
two years ago). Carsten Kunze yesterday merged them for the upcoming
groff-1.22.4 release. Yay!
Consequently, do the same in mandoc: Render \(Lq and \(Rq (which
are used for .Do, .Dq, .Lb, and .St) as '"' in -Tascii output.
All other output modes including -Tutf8 remain unchanged.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module:
There is no need to pass a pointer around, we most certainly
never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in;
that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_chars(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail
that originally triggered this patch).
No functional change, minus 100 LOC, and i don't see a performance change.
|
| |
|
|
|
|
|
|
| |
of .Do/.Dc, .Dq, .Lb, and .St untouched.
Reduces groff-mandoc differences in OpenBSD base by about 7%.
Reminded of the issue by naddy@.
|
|
|
|
|
|
|
|
|
|
|
| |
escape sequences just like it was earlier implemented for -Thtml.
Do not let control characters other than ASCII 9 (horizontal tab)
propagate to the output, even though groff allows them; but that
really doesn't look like a great idea.
Let mchars_num2char() return int such that we can distinguish invalid \N
syntax from \N'0'. This also reduces the danger of signed char issues
popping up.
|
|
|
|
|
|
|
|
| |
validity of character escape names and warn about unknown ones.
This requires mchars_spec2cp() to report unknown names again.
Fortunately, that doesn't require changing the calling code because
according to groff, invalid character escapes should not produce
output anyway, and now that we warn about them, that's fine.
|
|
|
|
|
| |
Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0.
This simplifies mchars_num2uc().
|
|
|
|
|
|
|
|
|
|
| |
In UTF-8 output, do not print anything if mchars_spec2cp() returns 0.
In particular, this repairs handling of zero-width spaces (\&).
While here, let mchars_spec2cp() return 0xFFFD instead of -1
if the character is not found, simplifying the using code.
In HTML output, do not print obfuscated ASCII characters and
do not test for one-char escapes, mchars_spec2cp() already does that.
|
|
|
|
|
|
|
|
| |
sequences above codepoint 512 by doing a reverse lookup in the
existing mandoc_char(7) character table.
Again, groff isn't smart enough to do this and silently discards such
escape sequences without printing anything.
|
|
|
|
|
|
|
|
|
|
|
|
| |
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.
A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
and remove the workarounds on the higher level.
|
|
|
|
|
|
| |
Include <sys/types.h> where needed, it does not belong in config.h.
Remove <stdio.h> from config.h; if it is missing somewhere, it should
be added, but i cannot find a *.c file where it is missing.
|
|
|
|
|
|
|
|
|
|
| |
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found myself while auditing the HTML formatter for safe output handling.
|
|
|
|
|
| |
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change
|
|
|
|
|
|
|
| |
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.
|
|
|
|
|
|
|
| |
documented in the Ossanna-Kernighan-Ritter troff manual
and also supported by groff.
Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
|
|
|
|
|
|
|
|
|
|
|
| |
* Parsing macro arguments has to be done in copy mode,
which implies replacing "\t" by a literal tab character.
* Otherwise, render "\t" as the empty string, not as a 't' character.
This fixes formatting of the distfile example in the oldrdist(1) manual.
This also shows up in the unzip(1) manual as one of several issues
preventing the removal of USE_GROFF from the archivers/unzip port.
Thanks to espie@ for attracting my attention to the unzip(1) manual.
|
|
|
|
|
|
|
| |
data pointed to, pass the size of the right pointer type to calloc;
cosmetic issue reported by Ulrich Spoerlein <uqs@spoerlein.net>
found in Coverity Scan CID 978734.
No binary change - ok cmp(1).
|
|
|
|
| |
dumbness on my part.
|
|
|
|
| |
found while syncing to OpenBSD
|
|
|
|
|
|
|
| |
* Do not pass integers outside the ASCII range to isprint().
* Make sure escaped characters are really printed verbatim
when the escape sequence has no special meaning.
ok kristaps@
|
| |
|
|
|
|
| |
effect.
|
|
|
|
| |
hash chain or an extra function for checking matches.
|
|
|
|
| |
the libroff point. This clears up a nice chunk of code.
|
|
|
|
|
| |
to predefs.in. This also makes "BOTH" entries directly into CHAR. The
res2str and spec2str are now effectively the same function.
|
| |
|
|
|
|
|
|
| |
only used once and simply bloated the binary. Also fix mchars_num2char
to correctly render the character instead of using atoi(). This makes
the conversation more strict, but it's more correct.
|
|
|
|
| |
(oops). Do the same for -Thtml (oops^2).
|
|
|
|
|
|
| |
consist of type "int". This will take more work (especially in encode and
friends), but this is a strong start. This commit also consists of some
harmless lint fixes.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
change any code but for renaming functions and types to be consistent
with other mandoc.h stuff. The reason for moving into libmandoc is that
the rendering of special characters is part of mandoc itself---not an
external part. From mandoc(1)'s perspective, this changes nothing, but
for other utilities, it's important to have these part of libmandoc.
Note this isn't documented [yet] in mandoc.3 because there are some
parts I'd like to change around beforehand.
|
|
|
|
|
|
| |
groff's tmac.doc package. Originally noted by Matthew Dempsky.
Feedback by Jason McIntyre, joerg@, and schwarze@. Also add some
documentation about predefined strings, tweaked by schwarze@.
|
|
|
|
|
| |
necessary to all [real] front-ends, so stop pretending it's special.
While here, add some documentation to the variable types.
|
|
|
|
|
|
|
|
| |
so that everybody can use them. This follows the convention of
libXXXX.h being internal to a library and XXXX.h being the external
interface. Not only does this allow the removal of lots of redundant
NULL-checking code, it also sets the tone for adding new mandoc-global
routines.
|
|
|
|
|
|
|
| |
Don't use it in new manuals, it is inherently non-portable, but we
need it for backward-compatibility with existing manuals, for example
in Xenocara driver pages.
ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)
|
|
|
|
|
|
|
| |
existing 'struct tbl' as 'struct tbl_node', then move all option stuff
into a 'struct tbl' in mandoc.h.
This conflicted with a structure in chars.c, which was renamed.
|
|
|
|
|
|
| |
O- because the underlying macro depends on \(*W, which a prior pod2man
preamble `tr' macro rewrites as "-". This is an error in groff as this
tramples on the real \(*W, or Greek omega.
|
| |
|
|
|
|
| |
assigned within the pod2man preamble.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now have sufficient practical experience to know what we want,
so this is intended to be final:
- provide -Wlevel (warning, error or fatal) to select what you care about
- provide -Wstop to stop after parsing a file with warnings you care about
- provide consistent exit status codes for those warnings you care about
- fully document what warnings, errors and fatal errors mean
- remove all other cruft from the user interface, less is more:
- remove all -f knobs along with the whole -f option
- remove the old -Werror because calling warnings "fatal" is silly
- always finish parsing each file, unless fatal errors prevent that
This commit also includes a couple of related simplifications behind
the scenes regarding error handling.
Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and
Sascha Wildner (DragonFly BSD) agree with the general direction.
|
|
|
|
|
|
| |
when being used in manuals. Since we now support `ds', it's no longer
necessary to account for it. From a bug report originally by Thomas
Jeunet.
|
|
|
|
|
| |
I checked that substantial changes were committed
to these files during these years.
|
|
|
|
|
|
| |
the overhead of running strlen() for ASCII strings (yes, I benchmarked
this running mandoc_char(7) as input again and again with
hundredth-second penalties... on my slow-ass alpha).
|