summaryrefslogtreecommitdiffstats
path: root/chars.c
Commit message (Collapse)AuthorAgeFilesLines
* Many people have been complaining for a long time that ``...'' looksIngo Schwarze2017-02-171-2/+2
| | | | | | | | | | | ugly in -Tascii output. For that reason, bentley@ submitted patches to render "..." instead to groff in November 2014 (yes, more than two years ago). Carsten Kunze yesterday merged them for the upcoming groff-1.22.4 release. Yay! Consequently, do the same in mandoc: Render \(Lq and \(Rq (which are used for .Do, .Dq, .Lb, and .St) as '"' in -Tascii output. All other output modes including -Tutf8 remain unchanged.
* Major character table cleanup:Ingo Schwarze2015-10-131-80/+399
| | | | | | | | | | | | | * Use ohash(3) rather than a hand-rolled hash table. * Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently. * No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them. * Sort the characters to agree with the mandoc_chars(7) manual page. * Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch). No functional change, minus 100 LOC, and i don't see a performance change.
* modernize style: "return" is not a functionIngo Schwarze2015-10-061-11/+11
|
* Render \(lq and \(rq as '"' in -Tascii mode but leave the renderingIngo Schwarze2015-02-171-1/+1
| | | | | | of .Do/.Dc, .Dq, .Lb, and .St untouched. Reduces groff-mandoc differences in OpenBSD base by about 7%. Reminded of the issue by naddy@.
* In terminal output, unify handling of Unicode and numbered characterIngo Schwarze2014-10-291-5/+3
| | | | | | | | | | | escape sequences just like it was earlier implemented for -Thtml. Do not let control characters other than ASCII 9 (horizontal tab) propagate to the output, even though groff allows them; but that really doesn't look like a great idea. Let mchars_num2char() return int such that we can distinguish invalid \N syntax from \N'0'. This also reduces the danger of signed char issues popping up.
* Make the character table available to libroff so it can check theIngo Schwarze2014-10-281-1/+1
| | | | | | | | validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
* Tighten Unicode escape name parsing.Ingo Schwarze2014-10-281-8/+3
| | | | | Accept only 0xXXXX, 0xYXXXX, 0x10XXXX with Y != 0. This simplifies mchars_num2uc().
* Fix a regression in term.c rev. 1.229 reported by bentley@:Ingo Schwarze2014-10-271-1/+1
| | | | | | | | | | In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&). While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the using code. In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes, mchars_spec2cp() already does that.
* In -Tascii mode, provide approximations even for some Unicode escapeIngo Schwarze2014-10-261-0/+11
| | | | | | | | sequences above codepoint 512 by doing a reverse lookup in the existing mandoc_char(7) character table. Again, groff isn't smart enough to do this and silently discards such escape sequences without printing anything.
* Improve -Tascii output for Unicode escape sequences: For the first 512Ingo Schwarze2014-10-261-15/+6
| | | | | | | | | | | | code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points. A few minor fixes while here: * Handle Unicode escape sequences in the ASCII range. * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output. * Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat.Ingo Schwarze2014-08-101-2/+2
| | | | | | Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but i cannot find a *.c file where it is missing.
* Security fix:Ingo Schwarze2014-07-231-1/+12
| | | | | | | | | | After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-( Found myself while auditing the HTML formatter for safe output handling.
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */,Ingo Schwarze2014-04-201-8/+9
| | | | | remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-levelIngo Schwarze2014-03-231-0/+1
| | | | | | | functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence,Ingo Schwarze2014-01-221-1/+1
| | | | | | | documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Improve handling of the roff(7) "\t" escape sequence:Ingo Schwarze2013-06-201-1/+1
| | | | | | | | | | | * Parsing macro arguments has to be done in copy mode, which implies replacing "\t" by a literal tab character. * Otherwise, render "\t" as the empty string, not as a 't' character. This fixes formatting of the distfile example in the oldrdist(1) manual. This also shows up in the unzip(1) manual as one of several issues preventing the removal of USE_GROFF from the archivers/unzip port. Thanks to espie@ for attracting my attention to the unzip(1) manual.
* Even though the size of a pointer should not depend on the type of theIngo Schwarze2013-05-181-1/+1
| | | | | | | data pointed to, pass the size of the right pointer type to calloc; cosmetic issue reported by Ulrich Spoerlein <uqs@spoerlein.net> found in Coverity Scan CID 978734. No binary change - ok cmp(1).
* Const-ify some mchars arguments. I think these are non-const for historicalKristaps Dzonsons2011-11-081-6/+9
| | | | dumbness on my part.
* forgotten Copyright bumps; no code changeIngo Schwarze2011-09-181-1/+1
| | | | found while syncing to OpenBSD
* Regression fixes after merging 1.11.3 to OpenBSD (rev. 1.20):Ingo Schwarze2011-07-311-2/+4
| | | | | | | * Do not pass integers outside the ASCII range to isprint(). * Make sure escaped characters are really printed verbatim when the escape sequence has no special meaning. ok kristaps@
* Add support for 1/2, 1/4, and 3/4 (needed by eqn).Kristaps Dzonsons2011-07-221-1/+1
|
* Support `size' constructs in eqn.7. Generalise mandoc_strontou to thisKristaps Dzonsons2011-07-211-2/+2
| | | | effect.
* Simplify chars.c---there's really no need for extra code to reorder theKristaps Dzonsons2011-07-071-60/+7
| | | | hash chain or an extra function for checking matches.
* Remove all references to ESCAPE_PREDEF, which is now not exposed passedKristaps Dzonsons2011-05-241-31/+0
| | | | the libroff point. This clears up a nice chunk of code.
* Remove predefined strings from the chars.in file, as they're now localKristaps Dzonsons2011-05-241-22/+11
| | | | | to predefs.in. This also makes "BOTH" entries directly into CHAR. The res2str and spec2str are now effectively the same function.
* Flip on unicode output (via \[uNNNN]) in -T[x]html. Here we go!Kristaps Dzonsons2011-05-171-2/+16
|
* Remove function calls to res() and so forth in term_word(). These wereKristaps Dzonsons2011-05-151-3/+2
| | | | | | only used once and simply bloated the binary. Also fix mchars_num2char to correctly render the character instead of using atoi(). This makes the conversation more strict, but it's more correct.
* Fix missing support for \N'n' when calculating string widths in -TasciiKristaps Dzonsons2011-05-151-0/+1
| | | | (oops). Do the same for -Thtml (oops^2).
* Make character engine (-Tascii, -Tpdf, -Tps) ready for Unicode: make bufferKristaps Dzonsons2011-05-141-1/+2
| | | | | | consist of type "int". This will take more work (especially in encode and friends), but this is a strong start. This commit also consists of some harmless lint fixes.
* Filter all \N'' values with isprint(). Ok schwarze@.Kristaps Dzonsons2011-05-011-9/+5
|
* Make mchars_num2char() return a char like it says.Kristaps Dzonsons2011-04-301-10/+10
|
* Rename mchars_init() -> mchars_alloc() for consistency.Kristaps Dzonsons2011-04-301-1/+1
|
* Remove enum mcharst, which hasn't been used in quite some time.Kristaps Dzonsons2011-04-301-3/+1
|
* Move "chars" interface out of out.h and into mandoc.h. This doesn'tKristaps Dzonsons2011-04-291-28/+20
| | | | | | | | | | change any code but for renaming functions and types to be consistent with other mandoc.h stuff. The reason for moving into libmandoc is that the rendering of special characters is part of mandoc itself---not an external part. From mandoc(1)'s perspective, this changes nothing, but for other utilities, it's important to have these part of libmandoc. Note this isn't documented [yet] in mandoc.3 because there are some parts I'd like to change around beforehand.
* Add \*(Ai (ANSI) and \*(Px (POSIX) predefined strings, which are part ofKristaps Dzonsons2011-04-201-1/+1
| | | | | | groff's tmac.doc package. Originally noted by Matthew Dempsky. Feedback by Jason McIntyre, joerg@, and schwarze@. Also add some documentation about predefined strings, tweaked by schwarze@.
* Step 4: merge chars.h into out.h. The functions in this file areKristaps Dzonsons2011-03-221-1/+1
| | | | | necessary to all [real] front-ends, so stop pretending it's special. While here, add some documentation to the variable types.
* Move mandoc_{realloc,malloc,calloc} out of libmandoc.h and into mandoc.hKristaps Dzonsons2011-03-171-11/+2
| | | | | | | | so that everybody can use them. This follows the convention of libXXXX.h being internal to a library and XXXX.h being the external interface. Not only does this allow the removal of lots of redundant NULL-checking code, it also sets the tone for adding new mandoc-global routines.
* Implement the \N'number' (numbered character) roff escape sequence.Ingo Schwarze2011-01-301-0/+22
| | | | | | | Don't use it in new manuals, it is inherently non-portable, but we need it for backward-compatibility with existing manuals, for example in Xenocara driver pages. ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)
* Churn to get parts of 'struct tbl' visible from mandoc.h: rename theKristaps Dzonsons2011-01-021-11/+11
| | | | | | | existing 'struct tbl' as 'struct tbl_node', then move all option stuff into a 'struct tbl' in mandoc.h. This conflicted with a structure in chars.c, which was renamed.
* Remove last pod2man escapes. These render ok, although \*(-- renders asKristaps Dzonsons2010-09-151-1/+1
| | | | | | O- because the underlying macro depends on \(*W, which a prior pod2man preamble `tr' macro rewrites as "-". This is an error in groff as this tramples on the real \(*W, or Greek omega.
* Churny commit to quiet lint. No functional changes.Kristaps Dzonsons2010-09-041-2/+2
|
* Remove the pod2man table entries. They can now be properly read andKristaps Dzonsons2010-08-291-1/+1
| | | | assigned within the pod2man preamble.
* Implement a simple, consistent user interface for error handling.Ingo Schwarze2010-08-201-2/+2
| | | | | | | | | | | | | | | | | We now have sufficient practical experience to know what we want, so this is intended to be final: - provide -Wlevel (warning, error or fatal) to select what you care about - provide -Wstop to stop after parsing a file with warnings you care about - provide consistent exit status codes for those warnings you care about - fully document what warnings, errors and fatal errors mean - remove all other cruft from the user interface, less is more: - remove all -f knobs along with the whole -f option - remove the old -Werror because calling warnings "fatal" is silly - always finish parsing each file, unless fatal errors prevent that This commit also includes a couple of related simplifications behind the scenes regarding error handling. Feedback and OK kristaps@; Joerg Sonnenberger (NetBSD) and Sascha Wildner (DragonFly BSD) agree with the general direction.
* Remove \*(C+ from the pre-predefined strings. It is always `ds'-definedKristaps Dzonsons2010-08-161-1/+1
| | | | | | when being used in manuals. Since we now support `ds', it's no longer necessary to account for it. From a bug report originally by Thomas Jeunet.
* Sync to OpenBSD: add missing Copyright years.Ingo Schwarze2010-07-311-1/+1
| | | | | I checked that substantial changes were committed to these files during these years.
* Remove asciisz from chars.in. It frees up a nice chunk of memory and atKristaps Dzonsons2010-07-261-9/+8
| | | | | | the overhead of running strlen() for ASCII strings (yes, I benchmarked this running mandoc_char(7) as input again and again with hundredth-second penalties... on my slow-ass alpha).
* Clean up mandoc_special() (in order later to catch \m). It also flagsKristaps Dzonsons2010-07-181-1/+2
| | | | | | several syntactic errors that weren't caught before. Also un-puke chars.c on zero-length \[].
* By letting strncmp() do its job and not helping it with a prior lengthKristaps Dzonsons2010-07-171-9/+8
| | | | | check, we can remove the hard-coded length of all escape patterns. This frees up a nice chunk of memory.
* Change chars.in HTML encoding to be a Unicode codepoint (int), which isKristaps Dzonsons2010-07-161-23/+64
| | | | later formatted in html.c.
* Churn as I finish email address migration kth.se -> bsd.lv.Kristaps Dzonsons2010-06-191-1/+1
|