author Ingo Schwarze <schwarze@openbsd.org> 2014-10-26 17:12:03 +0000
committer Ingo Schwarze <schwarze@openbsd.org> 2014-10-26 17:12:03 +0000
commit 769a036f3a9f484327108011e3bfbe984e435947
tree 79c751b46195aae7e4a581337e647055584884f7 /html.c
parent 90de6f743cde657a20885806bb1ea6bce6741b71
Improve -Tascii output for Unicode escape sequences: For the first 512
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.

A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
  and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
  and remove the workarounds on the higher level.
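
The error fallback named in the second bullet might be sketched as follows. This is only an illustration of the stated rule, not code from the commit: the `outmode` enum and the function `invalid_escape_fallback()` are hypothetical names introduced here.

```c
/* Hypothetical output modes, standing in for -Tascii and -Tutf8. */
enum outmode { OUT_ASCII, OUT_UTF8 };

/*
 * Hypothetical helper: return the text to print when an escape
 * sequence cannot be resolved to a valid character.
 */
static const char *
invalid_escape_fallback(enum outmode mode)
{
	/* U+FFFD encoded in UTF-8 is the byte sequence EF BF BD. */
	return mode == OUT_UTF8 ? "\xEF\xBF\xBD" : "<?>";
}
```

Keeping the choice in one place means callers never have to decide for themselves what an unresolvable escape should look like in a given output mode.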
Diffstat (limited to 'html.c')
-rw-r--r-- html.c 14
1 file changed, 12 insertions, 2 deletions
diff --git a/html.c b/html.c
index 1bed87d4..f1d3ad1a 100644
--- a/html.c
+++ b/html.c
@@ -437,8 +437,18 @@ print_encode(struct html *h, const char *p, int norecurse)
 		case ESCAPE_UNICODE:
 			/* Skip past "u" header. */
 			c = mchars_num2uc(seq + 1, len - 1);
-			if ('\0' != c)
-				printf("&#x%x;", c);
+
+			/*
+			 * XXX Security warning:
+			 * For now, forbid Unicode obfuscation of ASCII
+			 * characters.  An audit of the callers is
+			 * required before this can be removed.
+			 */
+
+			if (c < 0x80)
+				c = 0xFFFD;
+
+			printf("&#x%x;", c);
 			break;
 		case ESCAPE_NUMBERED:
 			c = mchars_num2char(seq, len);