path: root/html.c
Commit message | Author | Age | Files | Lines
* Fix a regression in term.c rev. 1.229 reported by bentley@: | Ingo Schwarze | 2014-10-27 | 1 | -4/+5
  In UTF-8 output, do not print anything if mchars_spec2cp() returns 0. In particular, this repairs handling of zero-width spaces (\&).
  While here, let mchars_spec2cp() return 0xFFFD instead of -1 if the character is not found, simplifying the calling code.
  In HTML output, do not print obfuscated ASCII characters and do not test for one-char escapes; mchars_spec2cp() already does that.
* Improve -Tascii output for Unicode escape sequences: For the first 512 | Ingo Schwarze | 2014-10-26 | 1 | -2/+12
  code points, provide ASCII approximations. This is already much better than what groff does, which prints nothing for most code points.
  A few minor fixes while here:
  * Handle Unicode escape sequences in the ASCII range.
  * In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8 and the string "<?>" for -Tascii output.
  * Handle all one-character escape sequences in mchars_spec2{cp,str}() and remove the workarounds on the higher level.
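The -Tascii fallback described above can be pictured as a small table lookup. This is only a sketch: the struct, the table contents, and the function name ascii_approx() are hypothetical and are not taken from mandoc; only the "<?>" fallback string comes from the commit message.

```c
#include <stddef.h>

/*
 * Hypothetical table of ASCII approximations for a few code points
 * below 512. The real table in mandoc is not reproduced here.
 */
struct approx {
	int		 cp;	/* Unicode code point */
	const char	*ascii;	/* ASCII approximation */
};

static const struct approx approx_tab[] = {
	{ 0x00a9, "(C)" },	/* COPYRIGHT SIGN */
	{ 0x00ab, "<<" },	/* LEFT-POINTING DOUBLE ANGLE QUOTATION MARK */
	{ 0x00bb, ">>" },	/* RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK */
	{ 0x00e9, "e" },	/* LATIN SMALL LETTER E WITH ACUTE */
};

/*
 * Return an ASCII rendering for a code point.
 * Printable ASCII maps to itself, known code points map to their
 * approximation, and everything else maps to "<?>".
 * The one-character case uses a static buffer, so the result is
 * only valid until the next call.
 */
const char *
ascii_approx(int cp)
{
	static char one[2];
	size_t i;

	if (cp >= 0x20 && cp < 0x7f) {
		one[0] = (char)cp;
		one[1] = '\0';
		return one;
	}
	for (i = 0; i < sizeof(approx_tab) / sizeof(approx_tab[0]); i++)
		if (approx_tab[i].cp == cp)
			return approx_tab[i].ascii;
	return "<?>";
}
```

A UTF-8 formatter would take the other branch of the same decision and emit U+FFFD instead of "<?>" when the lookup fails.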
* sync Copyright years after merge to OpenBSD; no code change | Ingo Schwarze | 2014-10-10 | 1 | -1/+1
* Re-write of eqn(7) parser and MathML output. | Kristaps Dzonsons | 2014-10-10 | 1 | -0/+1
  This adds parser-level support for the grammar described by the eqn second-edition technical paper, "Typesetting Mathematics — User's Guide" (Kernighan, Cherry).
  The reason for this re-write is the grouping rules, which were not possible given the existing implementation. The re-write has also considerably simplified the HTML (and, if it ever is completed, terminal) front-end.
* Change "to" and "from" commands to use munder, mover, and munderover. | Kristaps Dzonsons | 2014-09-28 | 1 | -0/+3
* Add support for some MathML elements and attributes in our HTML5. | Kristaps Dzonsons | 2014-09-28 | 1 | -0/+15
* Don't pretend we have a separate XHTML and HTML mode any more. | Kristaps Dzonsons | 2014-09-27 | 1 | -14/+7
* Remove <p> in favour of <div class="spacer">. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+14
  This is good because <p> is brittle: it can't appear within other block macros. This fixes a regression of the original HTML5 patch as noted by schwarze@ on the tech@ list, 14/8/2014.
* Remove last hard-coded width attribute. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+3
* HTML5-isation: remove more alignments. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+2
* Continue in HTML5-ing by kicking out some hard-coded alignments. | Kristaps Dzonsons | 2014-09-27 | 1 | -1/+5
* Kick out "summary" attribute, which isn't HTML5. | Kristaps Dzonsons | 2014-09-27 | 1 | -1/+0
* Kick out two attributes we don't use any more in HTML5. | Kristaps Dzonsons | 2014-09-27 | 1 | -2/+0
* First, add space for default styling for HTML5 (non-fragment) output. | Kristaps Dzonsons | 2014-09-27 | 1 | -0/+6
  This uses a <style /> block right before the <link /> for the stylesheet. Use this to kick out hardcoded header and footer table widths.
* First steps in HTML5: use UTF8 meta-charset and HTML5 doctype identifier. | Kristaps Dzonsons | 2014-09-27 | 1 | -38/+5
* Revert previous, as requested by kristaps@. | Ingo Schwarze | 2014-08-14 | 1 | -1/+0
  The .Bf block can contain subblocks, so it has to render as an element that can contain flow content. But <em> cannot contain flow content, only phrasing content. Rendering .Em and .Bf differently would be unfortunate, and closing out .Bf before subblocks and re-opening it afterwards would merely complicate both the C code of the program and the generated HTML code.
  Besides, converting .Em to semantic HTML markup would require some content to be put into <em> and some into <i>, but we cannot automatically distinguish which is which, so strictly speaking, we can't use semantic HTML here but have to fall back to physical markup. Wonders of HTML...
* Begin cleaning up scaling units. | Kristaps Dzonsons | 2014-08-13 | 1 | -0/+2
  Start with the horizontal terminal specifiers, making sure that they match up with troff. Then move on to PS, PDF, and HTML, noting that we stick to the terminal default width for "u". Lastly, fix some completely-wrong documentation and note that we diverge from troff w/r/t "u".
* Use <em> for .Em and .Bf -emphasis. | Ingo Schwarze | 2014-08-13 | 1 | -0/+1
  The vast majority of .Em in real-world manuals is stress emphasis, for which <em> is the correct markup. Admittedly, there are some instances of .Em usage for alternate quality, for which <i> would be a better match. Most of these are technical terms that neither allow semantic markup nor are keywords - for the latter, .Sy would be preferable. A typical example is that the shell breaks input into .Em words .
  Alternate voice or mood, which would also require <i>, is almost absent from manuals. We cannot satisfy both stress emphasis and alternate quality, so pick the one that fits more often and looks less wrong when off.
  Patch from Guy Harris <guy at alum dot mit dot edu>.
  ok joerg@ bentley@
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat. | Ingo Schwarze | 2014-08-10 | 1 | -2/+0
  Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but I cannot find a *.c file where it is missing.
* Security fix: | Ingo Schwarze | 2014-07-23 | 1 | -26/+37
  After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-(
  Found myself while auditing the HTML formatter for safe output handling.
* Security fix: | Ingo Schwarze | 2014-07-22 | 1 | -1/+4
  The function print_encode() is used both for plain text and for quoted attribute values. Escape the '"' character such that malicious manuals cannot pull off XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe others) to trigger the latter case. In the former case, escaping does no harm.
  Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
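The two security fixes above come down to one rule: every character that reaches the output, whether it came from plain text or from a decoded escape, passes through the same encoder, and that encoder also handles the double quote so it is safe inside attribute values. The following is a minimal sketch of such an encoder. It is not the actual print_encode() from html.c; the name html_escape(), its signature, and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an HTML encoder: copy src into dst,
 * replacing the four HTML-special characters with entities.
 * Returns 0 on success, -1 if dst is too small (dst contents are
 * then unspecified and must not be used).
 */
int
html_escape(char *dst, size_t dstsz, const char *src)
{
	size_t len = 0;

	for (; *src != '\0'; src++) {
		const char *rep;
		char one[2];
		size_t n;

		switch (*src) {
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '&':
			rep = "&amp;";
			break;
		case '"':
			/* Needed for quoted attribute values; harmless in text. */
			rep = "&quot;";
			break;
		default:
			one[0] = *src;
			one[1] = '\0';
			rep = one;
			break;
		}
		n = strlen(rep);
		if (len + n >= dstsz)
			return -1;	/* no room for rep plus the NUL */
		memcpy(dst + len, rep, n);
		len += n;
	}
	if (len >= dstsz)
		return -1;		/* covers dstsz == 0 */
	dst[len] = '\0';
	return 0;
}
```

With this shape, a URI such as `x" onmouseover="...` taken from a .Lk argument can no longer close the surrounding href attribute, because the quote is emitted as `&quot;`.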
* Audit strlcpy(3)/strlcat(3) usage. | Ingo Schwarze | 2014-04-23 | 1 | -0/+6
  * Repair three instances of silent truncation, use asprintf(3).
  * Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+... to use asprintf(3) instead to make them less error prone.
  * Cast the return value of four instances where the destination buffer is known to be large enough to (void).
  * Completely remove three useless instances of strlcpy(3)/strlcat(3).
  * Mark two places in -Thtml with XXX that can cause information loss and crashes but are not easy to fix, requiring design changes of some internal interfaces.
  * The file mandocdb.c remains to be audited.
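The audit above replaces fixed-size copy-and-concatenate chains with a single formatting call that sizes its own buffer. Since asprintf(3) is a BSD/GNU extension, the sketch below swaps in a portable equivalent built on two passes of vsnprintf(3). The function name fmt_alloc() is hypothetical and does not appear in mandoc.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated string, so the result can never be
 * silently truncated. Returns NULL on formatting or allocation
 * failure; the caller frees the result.
 */
char *
fmt_alloc(const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int n;

	va_start(ap, fmt);
	va_copy(ap2, ap);

	/* First pass: measure the required length. */
	n = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (n < 0) {
		va_end(ap2);
		return NULL;
	}

	/* Second pass: format into a buffer of exactly that size. */
	buf = malloc((size_t)n + 1);
	if (buf != NULL)
		vsnprintf(buf, (size_t)n + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}
```

A call such as `fmt_alloc("%s#%s", file, anchor)` replaces a strlen+malloc+strlcpy+strlcat sequence and leaves only one failure mode to check, the NULL return.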
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */, | Ingo Schwarze | 2014-04-20 | 1 | -66/+58
  remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-level | Ingo Schwarze | 2014-03-23 | 1 | -0/+1
  functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence, | Ingo Schwarze | 2014-01-22 | 1 | -2/+8
  documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Tag functions with format strings as arguments as printf-like. | Joerg Sonnenberger | 2014-01-05 | 1 | -1/+1
  Fix one case where a non-literal is used as format string. Fix another case where a variable is formatted using the wrong type.
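Tagging a function as printf-like lets the compiler check each call's arguments against the format string, which is how both bugs mentioned above tend to surface. The sketch below assumes GCC or Clang for the attribute; the macro PRINTF_LIKE and the function buf_fmt() are hypothetical names, not the ones used in mandoc.

```c
#include <stdarg.h>
#include <stdio.h>

/* Expands to the format attribute on GCC/Clang, to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmtarg, firstvararg) \
	__attribute__((__format__(__printf__, fmtarg, firstvararg)))
#else
#define PRINTF_LIKE(fmtarg, firstvararg)
#endif

/* Argument 3 is the format string; variadic arguments start at 4. */
int	buf_fmt(char *, size_t, const char *, ...) PRINTF_LIKE(3, 4);

/*
 * Format into a caller-supplied buffer.
 * Returns what vsnprintf(3) returns: the length the full result
 * would have, or a negative value on error.
 */
int
buf_fmt(char *buf, size_t sz, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sz, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call that passes a long where "%d" expects an int draws a warning under -Wformat, and text that comes from a manual page is passed as `buf_fmt(buf, sz, "%s", text)` rather than as the format string itself.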
* Implement the roff(7) font-escape sequence \f(BI "bold+italic". | Ingo Schwarze | 2013-08-08 | 1 | -9/+34
  This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
* Implement the roff \z escape sequence, intended to output the next | Ingo Schwarze | 2012-05-31 | 1 | -24/+57
  character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten.
  With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".". Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases.
  "just commit" kristaps@
* Add the -Ofragment option to -T[x]html. This accommodates embedding | Kristaps Dzonsons | 2011-10-05 | 1 | -2/+6
  manual output in existing HTML or XHTML documents, e.g., when invoking mandoc from an SSI or CGI.
* Fix handling of the `\c' escape in -T[x]html. | Kristaps Dzonsons | 2011-07-07 | 1 | -1/+3
* The bufcat() function in -T[x]html was eating one byte off the end of its | Kristaps Dzonsons | 2011-07-04 | 1 | -1/+0
  concatenated string. This for some reason hasn't been found before now... ? Anyway, fixed, and make the created IDs again be correctly prefixed by a letter as per the HTML spec.
* Use the correct Unicode value for the zero-width space, which means that | Kristaps Dzonsons | 2011-05-24 | 1 | -22/+5
  spec2cp never needs to fall through to spec2str. Then clean out html.c of its unnecessary print_res() function.
* Remove all references to ESCAPE_PREDEF, which is no longer exposed past | Kristaps Dzonsons | 2011-05-24 | 1 | -25/+0
  the libroff point. This clears up a nice chunk of code.
* Make any un-recognised font be considered a call for the Roman font. | Kristaps Dzonsons | 2011-05-18 | 1 | -0/+4
  This makes sequences of \f[unknown] \fP not completely puke. From a TODO by schwarze@.
* Flip on unicode output (via \[uNNNN]) in -T[x]html. Here we go! | Kristaps Dzonsons | 2011-05-17 | 1 | -0/+8
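One plausible way to turn a \[uNNNN] escape into HTML is to parse the hexadecimal digits and emit a numeric character reference. The sketch below shows that idea only; the function names uni_escape_cp() and html_charref() are hypothetical, and the exact output format used by html.c is not confirmed by this log.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the name found inside \[...], e.g. "u2014".
 * Returns the code point, or -1 if the name is not of the form
 * 'u' followed by hexadecimal digits within the Unicode range.
 */
long
uni_escape_cp(const char *name)
{
	const char *p;
	long cp;

	if (name[0] != 'u' || name[1] == '\0')
		return -1;
	for (p = name + 1; *p != '\0'; p++)
		if (!isxdigit((unsigned char)*p))
			return -1;
	/* On overflow strtol() yields LONG_MAX, rejected below. */
	cp = strtol(name + 1, NULL, 16);
	if (cp < 0 || cp > 0x10FFFF)
		return -1;
	return cp;
}

/*
 * Write a hexadecimal numeric character reference such as "&#x2014;".
 * Returns what snprintf(3) returns.
 */
int
html_charref(char *buf, size_t sz, long cp)
{
	return snprintf(buf, sz, "&#x%lX;", (unsigned long)cp);
}
```

Because a numeric character reference consists only of ASCII characters, this approach does not depend on the encoding declared for the generated page.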
* Clean-up fallout: differentiate ID's and HREF's (where to put the `#'). | Kristaps Dzonsons | 2011-05-17 | 1 | -3/+2
  Make buffmt functions internally bufinit(), too.
* Cleanups in -T[x]html: make html_idcat() use the buffer and be called | Kristaps Dzonsons | 2011-05-17 | 1 | -24/+5
  bufcat_id(), then collapse it into a little function without so much crap. Next, make bufinit() only be called when we really need to do so, and not simply before pre/post calls.
* Clean-ups in -T[x]html: inline print_num(), as it was just a single | Kristaps Dzonsons | 2011-05-17 | 1 | -63/+24
  conditional; same for print_xmltype() and print_doctype(), same reason; make bufncat() be static, as it was only being called from html.c; have bufcat() simply call through to strlcat(). Finally, assert() whenever we truncate. Also rename buffmt() -> bufcat_fmt() to differentiate from buffmt_man et al., which do not concatenate.
* Clean up -T[x]html by using a table instead of a switch statement for | Kristaps Dzonsons | 2011-05-17 | 1 | -41/+16
  the roff units. Also remove a comment about CSS and number types (they all accept decimal numbers).
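Replacing a switch with a table, as this commit does for the roff units, usually means indexing an array of strings by an enum value. The sketch below illustrates the pattern only: the enum, the table contents, and css_length() are hypothetical and do not reproduce the unit list or names in html.c.

```c
#include <stdio.h>

/* Hypothetical subset of scaling units; the last entry is a count. */
enum unit {
	UNIT_CM,
	UNIT_IN,
	UNIT_PT,
	UNIT_PC,
	UNIT_EM,
	UNIT_MAX
};

/* CSS suffix for each unit, indexed by enum unit. */
static const char *const unit_css[UNIT_MAX] = {
	"cm",	/* UNIT_CM */
	"in",	/* UNIT_IN */
	"pt",	/* UNIT_PT */
	"pc",	/* UNIT_PC */
	"em",	/* UNIT_EM */
};

/*
 * Format a length as a CSS value such as "2.50cm".
 * Returns -1 for an out-of-range unit, otherwise what snprintf(3)
 * returns. CSS lengths accept decimal numbers, so one format string
 * serves every unit.
 */
int
css_length(char *buf, size_t sz, double val, enum unit u)
{
	if ((int)u < 0 || (int)u >= UNIT_MAX)
		return -1;
	return snprintf(buf, sz, "%.2f%s", val, unit_css[u]);
}
```

Adding a unit then means adding one enum value and one table entry, with no new control flow.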
* Fix missing support for \N'n' when calculating string widths in -Tascii | Kristaps Dzonsons | 2011-05-15 | 1 | -2/+3
  (oops). Do the same for -Thtml (oops^2).
* Give -Thtml and -Txhtml the gift of recognising escapes when calculating | Kristaps Dzonsons | 2011-05-14 | 1 | -0/+36
  widths (e.g., `Bl -tag -width "\s[blahblah]bar"'). This has long since been done for -Tascii but escaped notice with -T[x]html.
* Make mchars_num2char() return a char like it says. | Kristaps Dzonsons | 2011-04-30 | 1 | -4/+3
* Rename mchars_init() -> mchars_alloc() for consistency. | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* Remove enum mcharst, which hasn't been used in quite some time. | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* No code change: fixing spelling errors. From a patch by uqs@. Thanks! | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* Move "chars" interface out of out.h and into mandoc.h. This doesn't | Kristaps Dzonsons | 2011-04-29 | 1 | -7/+7
  change any code but for renaming functions and types to be consistent with other mandoc.h stuff. The reason for moving into libmandoc is that the rendering of special characters is part of mandoc itself---not an external part. From mandoc(1)'s perspective, this changes nothing, but for other utilities, it's important to have these be part of libmandoc.
  Note this isn't documented [yet] in mandoc.3 because there are some parts I'd like to change around beforehand.
* Remove a2roffdeco() and mandoc_special() functions and replace them with | Kristaps Dzonsons | 2011-04-09 | 1 | -46/+41
  a public (mandoc.h) function mandoc_escape(), which merges the functionality of both prior functions.
  Reason: code duplication. The a2roffdeco() and mandoc_special() functions were pretty much the same thing and both quite complex. This allows one function to receive improvements in (e.g.) subexpression handling and performance, instead of having to replicate functionality.
  As such, the mandoc_escape() function already handles a superset of the escapes handled in previous versions and has improvements in performance (using strcspn(), for example) and reliable handling of subexpressions.
  This code Works For Me, but may need work to catch any regressions. Since the benefits are great (leaner code, simpler API), I'd rather have it in-tree than floating as a patch.
* Move mandoc_isdelim() back into libmdoc.h. This fixes an unreported | Kristaps Dzonsons | 2011-03-22 | 1 | -8/+0
  error where (1) -man pages were punctuating delimiters (e.g., `.B a ;') and where (2) standalone punctuation in -mdoc or -man (e.g., ";" on its own line) would also be punctuated. This introduces a small amount of complexity, as mdoc_{html,term}.c must manage their own spacing when running print_word() or print_text(). The check for delimiting now happens in mdoc_macro.c's dword().
* Step 4: merge chars.h into out.h. The functions in this file are | Kristaps Dzonsons | 2011-03-22 | 1 | -1/+0
  necessary to all [real] front-ends, so stop pretending it's special. While here, add some documentation to the variable types.
* Move mdoc_isdelim() into mandoc.h as mandoc_isdelim(). This allows the | Kristaps Dzonsons | 2011-03-17 | 1 | -37/+5
  removal of manual delimiter checks in html.c and term.c. Finally, add the escaped period as a closing delimiter, removing a TODO to this effect.