summaryrefslogtreecommitdiffstats
path: root/chars.c
Commit message (Collapse)AuthorAgeFilesLines
* In groff commit 78e66624 on May 7 20:15:33 2021 +1000,Ingo Schwarze2022-06-261-1/+1
| | | | | | G. Branden Robinson changed the -T ascii rendering of \(sd, the "second" symbol, U+2033 DOUBLE PRIME, from '' to ". Follow suit in mandoc.
* Since \. is not a character escape sequence, re-classify it from theIngo Schwarze2022-06-021-2/+1
| | | | | | | | | | | | | wrong parsing class ESCAPE_SPECIAL to the better-suited parsing class ESCAPE_UNDEF, exactly like it is already done for the similar \\, which isn't a character escape sequence either. No formatting change is intended just yet, but this will matter for upcoming improvements in the parser for roff(7) macro, string, and register names. See the node "5.23.2 Copy Mode" in "info groff" regarding what \\ and \. really mean.
* Digit-width and narrow spaces are non-breaking.Ingo Schwarze2020-02-131-2/+3
| | | | Noticed because Branden Robinson worked on related documentation in groff.
* Several improvements to escape sequence handling.Ingo Schwarze2018-12-151-15/+5
| | | | | | | | | | | | | | | | | | | | | | | * Add the missing special character \_ (underscore). * Partial implementations of \a (leader character) and \E (uninterpreted escape character). * Parse and ignore \r (reverse line feed). * Add a WARNING message about undefined escape sequences. * Add an UNSUPP message about unsupported escape sequences. * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported. * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets. * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid. * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names. * Starting a special character name with a blank is an error. * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode. * Document the newly handled escapes, and a few that were missing. * Regression tests for most of the above.
* Major cleanup; may imply minor changes in edge cases of error reporting.Ingo Schwarze2018-12-141-0/+1
| | | | | | | | | | | Finally, drop support for the run-time configurable mandocmsg() callback. It was over-engineered from the start, never used for anything in a decade, and repeatedly caused maintenance headaches. Consolidate reporting infrastructure into two files, mandoc.h and mandoc_msg.c, mopping up the bits and pieces that were scattered around main.c, read.c, mandoc_parse.h, libmandoc.h, the prototypes of four parsing-related functions, and both parser structs.
* Improve the ASCII rendering of \(Po (Pound Sterling)Ingo Schwarze2018-08-211-5/+5
| | | | | and of the playing card suits to match groff, using feedback from Ralph Corderoy <ralph at inputplus dot co dot uk>.
* Fix some issues found looking at groff_char(7):Ingo Schwarze2018-08-211-1/+4
| | | | | | * Add two missing characters, \('Y and \('y. * The Weierstrass p is not capital, see http://unicode.org/notes/tn27/. * Add a groff-compatible ASCII transliteration for U+02DC: "~".
* Add the \) special character, a variant of \& so arcane that iIngo Schwarze2018-08-191-0/+1
| | | | | intentionally leave it undocumented. Abused for example in the groff(7) manual page.
* Improve ASCII rendering of a few rare character escape sequencesIngo Schwarze2017-08-231-7/+7
| | | | | that can be changed unilaterally because groff fails to render them at all.
* Switch ASCII rendering of the same mathematical symbols and greekIngo Schwarze2017-08-231-77/+77
| | | | | | | | | | letters as in groff commit babca15f from trying to imitate the characters' graphical shapes, which resulted in unintelligible renderings in many cases, to transliterations conveying the characters' meanings. One benefit is making these characters usable for portable manual pages. Solving a problem pointed out by bentley@.
* add the \(ru (0.5m baseline ruler) character escape sequence,Ingo Schwarze2017-06-141-0/+1
| | | | abused by mail/nmh; groff_char(7) confirms that this really exists
* add about 15 missing character escape sequences found in groff_char(7);Ingo Schwarze2017-06-021-1/+17
| | | | triggered by multimedia/mkvtoolnix mkvmerge(1) using \(S2
* Many people have been complaining for a long time that ``...'' looksIngo Schwarze2017-02-171-2/+2
| | | | | | | | | | | ugly in -Tascii output. For that reason, bentley@ submitted patches to render "..." instead to groff in November 2014 (yes, more than two years ago). Carsten Kunze yesterday merged them for the upcoming groff-1.22.4 release. Yay! Consequently, do the same in mandoc: Render \(Lq and \(Rq (which are used for .Do, .Dq, .Lb, and .St) as '"' in -Tascii output. All other output modes including -Tutf8 remain unchanged.
* Major character table cleanup:Ingo Schwarze2015-10-131-80/+399
| | | | | | | | | | | | | * Use ohash(3) rather than a hand-rolled hash table. * Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently. * No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them. * Sort the characters to agree with the mandoc_chars(7) manual page. * Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch). No functional change, minus 100 LOC, and i don't see a performance change.
* modernize style: "return" is not a functionIngo Schwarze2015-10-061-11/+11
|
* Render \(lq and \(rq as '"' in -Tascii mode but leave the renderingIngo Schwarze2015-02-171-1/+1
| | | | | | of .Do/.Dc, .Dq, .Lb, and .St untouched. Reduces groff-mandoc differences in OpenBSD base by about 7%. Reminded of the issue by naddy@.
* In terminal output, unify handling of Unicode and numbered characterIngo Schwarze2014-10-291-5/+3
| | | | | | | | | | | escape sequences just like it was earlier implemented for -Thtml. Do not let control characters other than ASCII 9 (horizontal tab) propagate to the output, even though groff allows them; but that really doesn't look like a great idea. Let mchars_num2char() return int such that we can distinguish invalid \N syntax from \N'0'. This also reduces the danger of signed char issues popping up.
* Make the character table available to libroff so it can check theIngo Schwarze2014-10-281-1/+1
| | | | | | | | validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
* Tighten Unicode escape name parsing.Ingo Schwarze2014-10-281-8/+3
| | | | | Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0. This simplifies mchars_num2uc().
* Fix a regression in term.c rev. 1.229 reported by bentley@:Ingo Schwarze2014-10-271-1/+1
| | | | | | | | | | In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&). While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the using code. In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes, mchars_spec2cp() already does that.
* In -Tascii mode, provide approximations even for some Unicode escapeIngo Schwarze2014-10-261-0/+11
| | | | | | | | sequences above codepoint 512 by doing a reverse lookup in the existing mandoc_char(7) character table. Again, groff isn't smart enough to do this and silently discards such escape sequences without printing anything.
* Improve -Tascii output for Unicode escape sequences: For the first 512Ingo Schwarze2014-10-261-15/+6
| | | | | | | | | | | | code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points. A few minor fixes while here: * Handle Unicode escape sequences in the ASCII range. * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output. * Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat.Ingo Schwarze2014-08-101-2/+2
| | | | | | Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but i cannot find a *.c file where it is missing.
* Security fix:Ingo Schwarze2014-07-231-1/+12
| | | | | | | | | | After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-( Found myself while auditing the HTML formatter for safe output handling.
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */,Ingo Schwarze2014-04-201-8/+9
| | | | | remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-levelIngo Schwarze2014-03-231-0/+1
| | | | | | | functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence,Ingo Schwarze2014-01-221-1/+1
| | | | | | | documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Improve handling of the roff(7) "\t" escape sequence:Ingo Schwarze2013-06-201-1/+1
| | | | | | | | | | | * Parsing macro arguments has to be done in copy mode, which implies replacing "\t" by a literal tab character. * Otherwise, render "\t" as the empty string, not as a 't' character. This fixes formatting of the distfile example in the oldrdist(1) manual. This also shows up in the unzip(1) manual as one of several issues preventing the removal of USE_GROFF from the archivers/unzip port. Thanks to espie@ for attracting my attention to the unzip(1) manual.
* Even though the size of a pointer should not depend on the type of theIngo Schwarze2013-05-181-1/+1
| | | | | | | data pointed to, pass the size of the right pointer type to calloc; cosmetic issue reported by Ulrich Spoerlein <uqs@spoerlein.net> found in Coverity Scan CID 978734. No binary change - ok cmp(1).
* Const-ify some mchars arguments. I think these are non-const for historicalKristaps Dzonsons2011-11-081-6/+9
| | | | dumbness on my part.
* forgotten Copyright bumps; no code changeIngo Schwarze2011-09-181-1/+1
| | | | found while syncing to OpenBSD
* Regression fixes after merging 1.11.3 to OpenBSD (rev. 1.20):Ingo Schwarze2011-07-311-2/+4
| | | | | | | * Do not pass integers outside the ASCII range to isprint(). * Make sure escaped characters are really printed verbatim when the escape sequence has no special meaning. ok kristaps@
* Add support for 1/2, 1/4, and 3/4 (needed by eqn).Kristaps Dzonsons2011-07-221-1/+1
|
* Support `size' constructs in eqn.7. Generalise mandoc_strontou to thisKristaps Dzonsons2011-07-211-2/+2
| | | | effect.
* Simplify chars.c---there's really no need for extra code to reorder theKristaps Dzonsons2011-07-071-60/+7
| | | | hash chain or an extra function for checking matches.
* Remove all references to ESCAPE_PREDEF, which is now not exposed passedKristaps Dzonsons2011-05-241-31/+0
| | | | the libroff point. This clears up a nice chunk of code.
* Remove predefined strings from the chars.in file, as they're now localKristaps Dzonsons2011-05-241-22/+11
| | | | | to predefs.in. This also makes "BOTH" entries directly into CHAR. The res2str and spec2str are now effectively the same function.
* Flip on unicode output (via \[uNNNN]) in -T[x]html. Here we go!Kristaps Dzonsons2011-05-171-2/+16
|
* Remove function calls to res() and so forth in term_word(). These wereKristaps Dzonsons2011-05-151-3/+2
| | | | | | only used once and simply bloated the binary. Also fix mchars_num2char to correctly render the character instead of using atoi(). This makes the conversation more strict, but it's more correct.
* Fix missing support for \N'n' when calculating string widths in -TasciiKristaps Dzonsons2011-05-151-0/+1
| | | | (oops). Do the same for -Thtml (oops^2).
* Make character engine (-Tascii, -Tpdf, -Tps) ready for Unicode: make bufferKristaps Dzonsons2011-05-141-1/+2
| | | | | | consist of type "int". This will take more work (especially in encode and friends), but this is a strong start. This commit also consists of some harmless lint fixes.
* Filter all \N'' values with isprint(). Ok schwarze@.Kristaps Dzonsons2011-05-011-9/+5
|
* Make mchars_num2char() return a char like it says.Kristaps Dzonsons2011-04-301-10/+10
|
* Rename mchars_init() -> mchars_alloc() for consistency.Kristaps Dzonsons2011-04-301-1/+1
|
* Remove enum mcharst, which hasn't been used in quite some time.Kristaps Dzonsons2011-04-301-3/+1
|
* Move "chars" interface out of out.h and into mandoc.h. This doesn'tKristaps Dzonsons2011-04-291-28/+20
| | | | | | | | | | change any code but for renaming functions and types to be consistent with other mandoc.h stuff. The reason for moving into libmandoc is that the rendering of special characters is part of mandoc itself---not an external part. From mandoc(1)'s perspective, this changes nothing, but for other utilities, it's important to have these part of libmandoc. Note this isn't documented [yet] in mandoc.3 because there are some parts I'd like to change around beforehand.
* Add \*(Ai (ANSI) and \*(Px (POSIX) predefined strings, which are part ofKristaps Dzonsons2011-04-201-1/+1
| | | | | | groff's tmac.doc package. Originally noted by Matthew Dempsky. Feedback by Jason McIntyre, joerg@, and schwarze@. Also add some documentation about predefined strings, tweaked by schwarze@.
* Step 4: merge chars.h into out.h. The functions in this file areKristaps Dzonsons2011-03-221-1/+1
| | | | | necessary to all [real] front-ends, so stop pretending it's special. While here, add some documentation to the variable types.
* Move mandoc_{realloc,malloc,calloc} out of libmandoc.h and into mandoc.hKristaps Dzonsons2011-03-171-11/+2
| | | | | | | | so that everybody can use them. This follows the convention of libXXXX.h being internal to a library and XXXX.h being the external interface. Not only does this allow the removal of lots of redundant NULL-checking code, it also sets the tone for adding new mandoc-global routines.
* Implement the \N'number' (numbered character) roff escape sequence.Ingo Schwarze2011-01-301-0/+22
| | | | | | | Don't use it in new manuals, it is inherently non-portable, but we need it for backward-compatibility with existing manuals, for example in Xenocara driver pages. ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)