Improve -Tascii output for Unicode escape sequences: For the first 512
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.

A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
  and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
  and remove the workarounds on the higher level.
author: Ingo Schwarze <schwarze@openbsd.org> 2014-10-26 17:12:03 +0000
committer: Ingo Schwarze <schwarze@openbsd.org> 2014-10-26 17:12:03 +0000
commit: 769a036f3a9f484327108011e3bfbe984e435947 (patch)
tree: 79c751b46195aae7e4a581337e647055584884f7 /html.c
parent: 90de6f743cde657a20885806bb1ea6bce6741b71 (diff)
download: mandoc-769a036f3a9f484327108011e3bfbe984e435947.tar.gz
1 files changed, 12 insertions, 2 deletions
diff --git a/html.c b/html.c
index 1bed87d4..f1d3ad1a 100644
--- a/html.c
+++ b/html.c
@@ -437,8 +437,18 @@ print_encode(struct html *h, const char *p, int norecurse)
 		case ESCAPE_UNICODE:
 			/* Skip past "u" header. */
 			c = mchars_num2uc(seq + 1, len - 1);
-			if ('\0' != c)
-				printf("&#x%x;", c);
+
+			/*
+			 * XXX Security warning:
+			 * For now, forbid Unicode obfuscation of ASCII
+			 * characters.  An audit of the callers is
+			 * required before this can be removed.
+			 */
+
+			if (c < 0x80)
+				c = 0xFFFD;
+
+			printf("&#x%x;", c);
 			break;
 		case ESCAPE_NUMBERED:
 			c = mchars_num2char(seq, len);
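
The effect of the ESCAPE_UNICODE hunk above can be tried in isolation with a small sketch. The helper name format_unicode_ref() and its buffer-based interface are assumptions made for illustration; they are not part of mandoc, which calls printf() directly inside print_encode(). The sketch only mirrors the rule visible in the diff: any code point below 0x80, including the 0 that previously produced no output at all, is now emitted as U+FFFD.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: write the HTML numeric
 * character reference for code point c into buf, applying the same
 * rule as the patched ESCAPE_UNICODE case.  Returns what snprintf()
 * returns, i.e. the length the full reference needs.
 */
static int
format_unicode_ref(int c, char *buf, size_t bufsz)
{
	assert(buf != NULL && bufsz > 0);

	/*
	 * Forbid Unicode obfuscation of ASCII characters: anything
	 * below 0x80 (0 and negative values included) is replaced by
	 * the REPLACEMENT CHARACTER.
	 */
	if (c < 0x80)
		c = 0xFFFD;

	return snprintf(buf, bufsz, "&#x%x;", (unsigned int)c);
}
```

Under these assumptions, passing 0x41 (an ASCII "A" spelled as a Unicode escape) fills the buffer with "&#xfffd;", while 0xe9 yields "&#xe9;" unchanged.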