path: root/html.c
Commit message | Author | Age | Files | Lines
* Generate simpler in-page links: just replace spaces with underscores. | Ingo Schwarze | 2016-01-04 | 1 | -3/+3
| Patch from bentley@.
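As a rough illustration of the rule named in this commit, a link target could be derived from a section title as sketched below. The function name `make_anchor` is hypothetical and is not taken from html.c; only the "spaces become underscores" rule comes from the commit message.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the code from html.c: copy a heading
 * and turn every space into an underscore.  Returns a malloc(3)ed
 * string that the caller must free(3), or NULL if allocation fails.
 */
char *
make_anchor(const char *title)
{
	size_t	 len = strlen(title);
	char	*id = malloc(len + 1);
	size_t	 i;

	if (id == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		id[i] = title[i] == ' ' ? '_' : title[i];
	id[len] = '\0';
	return id;
}
```

Under these assumptions, `make_anchor("RETURN VALUES")` produces `RETURN_VALUES`, which can be used both as an `id` and, prefixed with `#`, as an `href`.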
* Major character table cleanup: | Ingo Schwarze | 2015-10-13 | 1 | -4/+2
| * Use ohash(3) rather than a hand-rolled hash table.
| * Make the character table static in the chars.c module: There is no need to pass a pointer around, we most certainly never want to use two different character tables concurrently.
| * No need to keep the characters in a separate file chars.in; that merely encourages downstream porters to mess with them.
| * Sort the characters to agree with the mandoc_chars(7) manual page.
| * Specify Unicode codepoints in hex, not decimal (that's the detail that originally triggered this patch).
| No functional change, minus 100 LOC, and i don't see a performance change.
* Fix an obvious bug found during the /* FALLTHROUGH */ cleanup: | Ingo Schwarze | 2015-10-12 | 1 | -1/+1
| ASCII_NBRSP has to be rendered as " ", not "-".
* To make the code more readable, delete 283 /* FALLTHROUGH */ comments | Ingo Schwarze | 2015-10-12 | 1 | -10/+1
| that were right between two adjacent case statements. Keep only those 24 where the first case actually executes some code before falling through to the next case.
* modernize style: "return" is not a function | Ingo Schwarze | 2015-10-06 | 1 | -7/+7
* /* NOTREACHED */ after abort() is silly, delete it | Ingo Schwarze | 2015-09-26 | 1 | -1/+0
* Actually use the new man.conf(5) "output" directive. | Ingo Schwarze | 2015-03-27 | 1 | -28/+9
| Additional functionality, yet minus 45 lines of code.
* Rudimentary implementation of the roff(7) \o escape sequence (overstrike). | Ingo Schwarze | 2015-01-21 | 1 | -1/+8
| This is of some relevance because the pod2man(1) preamble abuses it for the Icelandic letter Thorn, instead of simply using \(TP and \(Tp. Missing feature found by sthen@ in DateTime::Locale::is_IS(3p).
* resolve some code duplication; no functional change | Ingo Schwarze | 2014-12-20 | 1 | -26/+18
* Fix the implementation and documentation of \c (continue text input line). | Ingo Schwarze | 2014-12-02 | 1 | -1/+2
| In particular, make it work in no-fill mode, too.
| Reminded by Carsten dot Kunze at arcor dot de (Heirloom roff).
* The header libmandoc.h is part of the internal parser interface, | Ingo Schwarze | 2014-12-01 | 1 | -1/+0
| but html.c is not part of the parser at all, so it cannot include that header, and actually, it doesn't need it. Found while auditing includes after Theo's recent *.h commit.
* In terminal output, unify handling of Unicode and numbered character | Ingo Schwarze | 2014-10-29 | 1 | -3/+6
| escape sequences just like it was earlier implemented for -Thtml.
| Do not let control characters other than ASCII 9 (horizontal tab) propagate to the output, even though groff allows them; but that really doesn't look like a great idea.
| Let mchars_num2char() return int such that we can distinguish invalid \N syntax from \N'0'. This also reduces the danger of signed char issues popping up.
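The reason for returning int can be sketched as follows: a char return value has no spare value to signal an error that differs from the legitimate result 0, and plain char may be signed on some platforms. The function below is a hypothetical stand-in written for this illustration; it is not mchars_num2char() and makes no claim about that function's actual signature or range checks.

```c
#include <stddef.h>

/*
 * Hypothetical parser for the digits inside \N'...'.
 * Returns the character number (0 to 255) on success
 * and -1 for invalid syntax, so that the valid input "0"
 * and an invalid input remain distinguishable.
 */
int
parse_numbered(const char *p, size_t sz)
{
	size_t	 i;
	int	 val = 0;

	if (sz == 0)
		return -1;		/* empty argument */
	for (i = 0; i < sz; i++) {
		if (p[i] < '0' || p[i] > '9')
			return -1;	/* not a decimal digit */
		val = val * 10 + (p[i] - '0');
		if (val > 255)
			return -1;	/* out of range for this sketch */
	}
	return val;
}
```

With this shape, the caller can treat -1 as "ignore the malformed escape" and 0 as "the escape was valid but produces no output", two cases that a char return value would conflate.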
* Make the character table available to libroff so it can check the | Ingo Schwarze | 2014-10-28 | 1 | -21/+3
| validity of character escape names and warn about unknown ones. This requires mchars_spec2cp() to report unknown names again. Fortunately, that doesn't require changing the calling code because according to groff, invalid character escapes should not produce output anyway, and now that we warn about them, that's fine.
* Handle output encoding for unicode, numbered and named escape sequences | Ingo Schwarze | 2014-10-27 | 1 | -22/+10
| in one common, safe way instead of three different ways. In particular,
| * skip NUL, it is used to mean "no output desired"
| * deny 0x01-0x1F and 0x7F-0x9F, print REPLACEMENT CHARACTER instead
| * print 0x20-0x7E literally or name-encoded, as required
| * print characters above 0x9F numerically
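The four rules listed in this commit can be sketched as a single function. This is a minimal illustration under stated assumptions, not the code from html.c: the name `encode_cp`, the buffer-based interface, the hexadecimal `&#x...;` form, and the treatment of negative input are all choices made for this example.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: encode one code point for HTML output.
 * Writes a NUL-terminated string into buf and returns the
 * length that snprintf(3) reports.
 */
int
encode_cp(int cp, char *buf, size_t bufsz)
{
	/* NUL means "no output desired". */
	if (cp == 0)
		return snprintf(buf, bufsz, "%s", "");

	/* Deny control characters (and, as an assumption, negatives). */
	if (cp < 0 || cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F))
		return snprintf(buf, bufsz, "&#xFFFD;");

	/* Printable ASCII: literal, or name-encoded where required. */
	if (cp <= 0x7E) {
		switch (cp) {
		case '<':
			return snprintf(buf, bufsz, "&lt;");
		case '>':
			return snprintf(buf, bufsz, "&gt;");
		case '&':
			return snprintf(buf, bufsz, "&amp;");
		case '"':
			return snprintf(buf, bufsz, "&quot;");
		default:
			return snprintf(buf, bufsz, "%c", cp);
		}
	}

	/* Everything above 0x9F is printed numerically. */
	return snprintf(buf, bufsz, "&#x%.4X;", (unsigned int)cp);
}
```

Funnelling every kind of escape sequence through one function of this shape is what makes the approach safe: no decoding path can reach the output without passing the same checks.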
* Fix a regression in term.c rev. 1.229 reported by bentley@: | Ingo Schwarze | 2014-10-27 | 1 | -4/+5
| In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&).
| While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the using code.
| In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes, mchars_spec2cp() already does that.
* Improve -Tascii output for Unicode escape sequences: For the first 512 | Ingo Schwarze | 2014-10-26 | 1 | -2/+12
| code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points.
| A few minor fixes while here:
| * Handle Unicode escape sequences in the ASCII range.
| * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output.
| * Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
* sync Copyright years after merge to OpenBSD; no code change | Ingo Schwarze | 2014-10-10 | 1 | -1/+1
* Re-write of eqn(7) parser and MathML output. | Kristaps Dzonsons | 2014-10-10 | 1 | -0/+1
| This adds parser-level support for the grammar described by the eqn second-edition technical paper, "Typesetting Mathematics — User's Guide" (Kernighan, Cherry). The reason for this re-write is the grouping rules, which were not possible given the existing implementation. The re-write has also considerably simplified the HTML (and, if it ever is completed, terminal) front-end.
* Change "to" and "from" commands to use munder, mover, and munderover. | Kristaps Dzonsons | 2014-09-28 | 1 | -0/+3
* Add support for some MathML elements and attributes in our HTML5. | Kristaps Dzonsons | 2014-09-28 | 1 | -0/+15
* Don't pretend we have a separate XHTML and HTML mode any more. | Kristaps Dzonsons | 2014-09-27 | 1 | -14/+7
* Remove <p> in favour of <div class="spacer">. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+14
| This is good because <p> is brittle: it can't appear within other block macros. This fixes a regression of the original HTML5 patch as noted by schwarze@ on the tech@ list, 14/8/2014.
* Remove last hard-coded width attribute. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+3
* HTML5-isation: remove more alignments. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+2
* Continue in HTML5-ing by kicking out some hard-coded alignments. | Kristaps Dzonsons | 2014-09-27 | 1 | -1/+5
* Kick out "summary" attribute, which isn't HTML5. | Kristaps Dzonsons | 2014-09-27 | 1 | -1/+0
* Kick out two attributes we don't use any more in HTML5. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+0
* First, add space for default styling for HTML5 (non-fragment) output. | Kristaps Dzonsons | 2014-09-27 | 1 | -0/+6
| This uses a <style /> block right before the <link /> for the stylesheet. Use this to kick out hardcoded header and footer table widths.
* First steps in HTML5: use UTF8 meta-charset and HTML5 doctype identifier. | Kristaps Dzonsons | 2014-09-27 | 1 | -38/+5
* Revert previous, as requested by kristaps@. | Ingo Schwarze | 2014-08-14 | 1 | -1/+0
| The .Bf block can contain subblocks, so it has to render as an element that can contain flow content. But <em> cannot contain flow content, only phrasing content. Rendering .Em and .Bf differently would be unfortunate, and closing out .Bf before subblocks and re-opening it afterwards would merely complicate both the C code of the program and the generated HTML code.
| Besides, converting .Em to semantic HTML markup would require some content to be put into <em> and some into <i>, but we cannot automatically distinguish which is which, so strictly speaking, we can't use semantic HTML here but have to fall back to physical markup. Wonders of HTML...
* Begin cleaning up scaling units. | Kristaps Dzonsons | 2014-08-13 | 1 | -0/+2
| Start with the horizontal terminal specifiers, making sure that they match up with troff. Then move on to PS, PDF, and HTML, noting that we stick to the terminal default width for "u". Lastly, fix some completely-wrong documentation and note that we diverge from troff w/r/t "u".
* Use <em> for .Em and .Bf -emphasis. | Ingo Schwarze | 2014-08-13 | 1 | -0/+1
| The vast majority of .Em in real-world manuals is stress emphasis, for which <em> is the correct markup.
| Admittedly, there are some instances of .Em usage for alternate quality, for which <i> would be a better match. Most of these are technical terms that neither allow semantic markup nor are keywords - for the latter, .Sy would be preferable. A typical example is that the shell breaks input into .Em words . Alternate voice or mood, which would also require <i>, is almost absent from manuals.
| We cannot satisfy both stress emphasis and alternate quality, so pick the one that fits more often and looks less wrong when off.
| Patch from Guy Harris <guy at alum dot mit dot edu>.
| ok joerg@ bentley@
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat. | Ingo Schwarze | 2014-08-10 | 1 | -2/+0
| Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but i cannot find a *.c file where it is missing.
* Security fix: | Ingo Schwarze | 2014-07-23 | 1 | -26/+37
| After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-(
| Found myself while auditing the HTML formatter for safe output handling.
* Security fix: | Ingo Schwarze | 2014-07-22 | 1 | -1/+4
| The function print_encode() is used both for plain text and for quoted attribute values. Escape the '"' character such that malicious manuals cannot pull off XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe others) to trigger the latter case. In the former case, escaping does no harm.
| Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
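The attack and the fix can be sketched with one shared escaping routine. The functions `print_escaped` and `print_attr` below are hypothetical and written for this illustration only; they are not print_encode() from html.c, and the set of escaped characters is an assumption based on the two security commits above.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: write s to out with the HTML-special
 * characters replaced by named character references.  Because
 * '"' is escaped as well, the same routine is safe for plain
 * text and for double-quoted attribute values.
 */
void
print_escaped(FILE *out, const char *s)
{
	for (; *s != '\0'; s++) {
		switch (*s) {
		case '<':
			fputs("&lt;", out);
			break;
		case '>':
			fputs("&gt;", out);
			break;
		case '&':
			fputs("&amp;", out);
			break;
		case '"':
			fputs("&quot;", out);
			break;
		default:
			fputc(*s, out);
			break;
		}
	}
}

/* Emit one attribute, for example the href of a link. */
void
print_attr(FILE *out, const char *name, const char *value)
{
	fprintf(out, " %s=\"", name);
	print_escaped(out, value);
	fputc('"', out);
}
```

Without the `'"'` case, a value such as `x"><script>` would close the attribute early and inject markup; with it, the whole value stays inside the quotes. In plain text the extra `&quot;` is rendered as an ordinary quotation mark, which is why escaping there does no harm.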
* Audit strlcpy(3)/strlcat(3) usage. | Ingo Schwarze | 2014-04-23 | 1 | -0/+6
| * Repair three instances of silent truncation, use asprintf(3).
| * Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+... to use asprintf(3) instead to make them less error prone.
| * Cast the return value of four instances where the destination buffer is known to be large enough to (void).
| * Completely remove three useless instances of strlcpy(3)/strlcat(3).
| * Mark two places in -Thtml with XXX that can cause information loss and crashes but are not easy to fix, requiring design changes of some internal interfaces.
| * The file mandocdb.c remains to be audited.
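The idea behind replacing fixed-size copies is that the result buffer is sized from the data, so nothing can be cut off silently. The commit uses asprintf(3), a BSD and GNU extension; the sketch below swaps in the portable two-pass snprintf(3) pattern, which has the same effect, so that it builds without feature-test macros. The function name `join_alloc` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: concatenate two strings into freshly
 * allocated memory of exactly the required size.  Returns NULL
 * on failure; the caller must free(3) the result.
 */
char *
join_alloc(const char *prefix, const char *name)
{
	int	 len;
	char	*buf;

	/* First pass: ask snprintf(3) how many bytes are needed. */
	len = snprintf(NULL, 0, "%s%s", prefix, name);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (buf == NULL)
		return NULL;

	/* Second pass: format into the exactly sized buffer. */
	snprintf(buf, (size_t)len + 1, "%s%s", prefix, name);
	return buf;
}
```

Compared with a chain of strlcpy(3) and strlcat(3) calls into a fixed array, there is no buffer size to get wrong and no return value whose truncation signal can be ignored by accident.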
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */, | Ingo Schwarze | 2014-04-20 | 1 | -66/+58
| remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-level | Ingo Schwarze | 2014-03-23 | 1 | -0/+1
| functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence, | Ingo Schwarze | 2014-01-22 | 1 | -2/+8
| documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff.
| Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Tag functions with format strings as arguments as printf-like. | Joerg Sonnenberger | 2014-01-05 | 1 | -1/+1
| Fix one case where a non-literal is used as format string. Fix another case where a variable is formatted using the wrong type.
* Implement the roff(7) font-escape sequence \f(BI "bold+italic". | Ingo Schwarze | 2013-08-08 | 1 | -9/+34
| This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
* Implement the roff \z escape sequence, intended to output the next | Ingo Schwarze | 2012-05-31 | 1 | -24/+57
| character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten.
| With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".".
| Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases.
| "just commit" kristaps@
* Add the -Ofragment option to -T[x]html. This accommodates embedding | Kristaps Dzonsons | 2011-10-05 | 1 | -2/+6
| manual output in existing HTML or XHTML documents, e.g., when invoking mandoc from an SSI or CGI.
* Fix handling of the `\c' escape in -T[x]html. | Kristaps Dzonsons | 2011-07-07 | 1 | -1/+3
* The bufcat() function in -T[x]html was eating one byte off the end of its | Kristaps Dzonsons | 2011-07-04 | 1 | -1/+0
| concatenated string. This for some reason hasn't been found before now... ? Anyway, fixed, and the IDs that are created are again correctly prefixed by a letter as per the HTML spec.
* Use the correct Unicode value for the zero-width space, which means that | Kristaps Dzonsons | 2011-05-24 | 1 | -22/+5
| spec2cp never needs to fall through to spec2str. Then clean out html.c of its unnecessary print_res() function.
* Remove all references to ESCAPE_PREDEF, which is now not exposed past | Kristaps Dzonsons | 2011-05-24 | 1 | -25/+0
| the libroff point. This clears up a nice chunk of code.
* Make any un-recognised font be considered a call for the Roman font. | Kristaps Dzonsons | 2011-05-18 | 1 | -0/+4
| This makes sequences of \f[unknown] \fP not completely puke. From a TODO by schwarze@.
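The fallback described in this commit amounts to a lookup whose default branch selects Roman instead of failing. The sketch below is hypothetical: the enum, the function name `font_from_name`, and the small set of recognised names are assumptions for illustration and do not reproduce the font handling in html.c.

```c
#include <string.h>

enum font {
	FONT_ROMAN,
	FONT_BOLD,
	FONT_ITALIC
};

/*
 * Hypothetical sketch: map the argument of a \f escape to a font.
 * Any name that is not recognised selects Roman, so a later \fP
 * still has a well-defined font to switch back from.
 */
enum font
font_from_name(const char *name)
{
	if (strcmp(name, "B") == 0)
		return FONT_BOLD;
	if (strcmp(name, "I") == 0)
		return FONT_ITALIC;
	return FONT_ROMAN;	/* "R" and everything unknown */
}
```

Treating the unknown case as an ordinary font change keeps the font history consistent, which is what allows a sequence like \f[unknown] followed by \fP to behave sensibly.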
* Flip on unicode output (via \[uNNNN]) in -T[x]html. Here we go! | Kristaps Dzonsons | 2011-05-17 | 1 | -0/+8
* Clean-up fallout: differentiate ID's and HREF's (where to put the `#'). | Kristaps Dzonsons | 2011-05-17 | 1 | -3/+2
| Make buffmt functions internally bufinit(), too.