author | Ingo Schwarze <schwarze@openbsd.org> | 2014-10-26 17:12:03 +0000 |
---|---|---|
committer | Ingo Schwarze <schwarze@openbsd.org> | 2014-10-26 17:12:03 +0000 |
commit | 769a036f3a9f484327108011e3bfbe984e435947 (patch) | |
tree | 79c751b46195aae7e4a581337e647055584884f7 /html.c | |
parent | 90de6f743cde657a20885806bb1ea6bce6741b71 (diff) | |
Improve -Tascii output for Unicode escape sequences: For the first 512
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.

A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
  and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
  and remove the workarounds on the higher level.
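
The fallback policy described in the commit message can be sketched roughly as follows. This is a minimal illustration, not mandoc's code: `out_mode`, `approx_tab`, `utf8_encode()` and `render_cp()` are hypothetical names, the three table entries are made-up examples rather than the real approximation table, and the sketch assumes that ASCII-range code points are emitted as themselves and that code points without an approximation also fall back to "<?>".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output modes standing in for -Tascii and -Tutf8. */
enum out_mode { OUT_ASCII, OUT_UTF8 };

/* Tiny illustrative table; the entries are assumptions, not mandoc's. */
struct approx { int cp; const char *str; };
static const struct approx approx_tab[] = {
	{ 0x00a9, "(C)" },	/* COPYRIGHT SIGN */
	{ 0x00e9, "e" },	/* LATIN SMALL LETTER E WITH ACUTE */
	{ 0x0141, "L" },	/* LATIN CAPITAL LETTER L WITH STROKE */
};

/* Encode a valid code point as UTF-8; buf needs room for 5 bytes. */
static void
utf8_encode(int cp, char *buf)
{
	size_t i = 0;

	if (cp < 0x80) {
		buf[i++] = (char)cp;
	} else if (cp < 0x800) {
		buf[i++] = (char)(0xc0 | (cp >> 6));
		buf[i++] = (char)(0x80 | (cp & 0x3f));
	} else if (cp < 0x10000) {
		buf[i++] = (char)(0xe0 | (cp >> 12));
		buf[i++] = (char)(0x80 | ((cp >> 6) & 0x3f));
		buf[i++] = (char)(0x80 | (cp & 0x3f));
	} else {
		buf[i++] = (char)(0xf0 | (cp >> 18));
		buf[i++] = (char)(0x80 | ((cp >> 12) & 0x3f));
		buf[i++] = (char)(0x80 | ((cp >> 6) & 0x3f));
		buf[i++] = (char)(0x80 | (cp & 0x3f));
	}
	buf[i] = '\0';
}

/*
 * Render code point cp into buf (at least 5 bytes) for the given mode.
 * In this sketch, cp <= 0, surrogates, and values above U+10FFFF
 * count as errors.
 */
static void
render_cp(int cp, enum out_mode mode, char *buf)
{
	size_t i;
	int bad;

	bad = cp <= 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff);
	if (mode == OUT_UTF8) {
		/* Errors become REPLACEMENT CHARACTER U+FFFD. */
		utf8_encode(bad ? 0xfffd : cp, buf);
		return;
	}
	if (!bad && cp < 0x80) {
		/* ASCII range: the character stands for itself. */
		buf[0] = (char)cp;
		buf[1] = '\0';
		return;
	}
	for (i = 0; !bad && i < sizeof(approx_tab) / sizeof(approx_tab[0]); i++) {
		if (approx_tab[i].cp == cp) {
			strcpy(buf, approx_tab[i].str);
			return;
		}
	}
	/* Errors and unmapped code points in ASCII mode. */
	strcpy(buf, "<?>");
}
```

With an 8-byte buffer, `render_cp(0xe9, OUT_ASCII, buf)` yields "e" in this sketch, while an invalid value such as -1 yields "<?>" in ASCII mode and the UTF-8 encoding of U+FFFD in UTF-8 mode.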
Diffstat (limited to 'html.c')
-rw-r--r-- | html.c | 14 |
1 files changed, 12 insertions, 2 deletions
@@ -437,8 +437,18 @@ print_encode(struct html *h, const char *p, int norecurse)
 		case ESCAPE_UNICODE:
 			/* Skip past "u" header. */
 			c = mchars_num2uc(seq + 1, len - 1);
-			if ('\0' != c)
-				printf("&#x%x;", c);
+
+			/*
+			 * XXX Security warning:
+			 * For now, forbid Unicode obfuscation of ASCII
+			 * characters.  An audit of the callers is
+			 * required before this can be removed.
+			 */
+
+			if (c < 0x80)
+				c = 0xFFFD;
+
+			printf("&#x%x;", c);
 			break;
 		case ESCAPE_NUMBERED:
 			c = mchars_num2char(seq, len);
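
The effect of this hunk can be reproduced in isolation with a small helper. The sketch below is an illustration only: `html_unicode_ref()` is a hypothetical name that does not exist in mandoc, and it writes into a caller-supplied buffer instead of calling printf() directly so that the result can be inspected.

```c
#include <stdio.h>

/*
 * Format the HTML numeric character reference for code point c.
 * As in the hunk above, any value below 0x80 is replaced by
 * U+FFFD before formatting.  Hypothetical helper, not mandoc API.
 * Returns the snprintf() result.
 */
static int
html_unicode_ref(int c, char *buf, size_t bufsz)
{
	if (c < 0x80)
		c = 0xFFFD;
	return snprintf(buf, bufsz, "&#x%x;", (unsigned int)c);
}
```

Two consequences are visible here. An escape that spells an ASCII character, for example code point 0x41, no longer yields a reference to that character but to U+FFFD. And because 0 is also below 0x80, a failed conversion (assuming mchars_num2uc() still returns 0 on failure, as the removed '\0' check suggests) now prints U+FFFD where the old code printed nothing.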