path: root/html.c
Commit message | Author | Age | Files | Lines
* Revert previous, as requested by kristaps@. | Ingo Schwarze | 2014-08-14 | 1 | -1/+0
  The .Bf block can contain subblocks, so it has to render as an element that can contain flow content. But <em> cannot contain flow content, only phrasing content. Rendering .Em and .Bf differently would be unfortunate, and closing out .Bf before subblocks and re-opening it afterwards would merely complicate both the C code of the program and the generated HTML code. Besides, converting .Em to semantic HTML markup would require some content to be put into <em> and some into <i>, but we cannot automatically distinguish which is which, so strictly speaking, we can't use semantic HTML here but have to fall back to physical markup. Wonders of HTML...
* Begin cleaning up scaling units. | Kristaps Dzonsons | 2014-08-13 | 1 | -0/+2
  Start with the horizontal terminal specifiers, making sure that they match up with troff. Then move on to PS, PDF, and HTML, noting that we stick to the terminal default width for "u". Lastly, fix some completely-wrong documentation and note that we diverge from troff w/r/t "u".
* Use <em> for .Em and .Bf -emphasis. | Ingo Schwarze | 2014-08-13 | 1 | -0/+1
  The vast majority of .Em in real-world manuals is stress emphasis, for which <em> is the correct markup. Admittedly, there are some instances of .Em usage for alternate quality, for which <i> would be a better match. Most of these are technical terms that neither allow semantic markup nor are keywords - for the latter, .Sy would be preferable. A typical example is that the shell breaks input into .Em words . Alternate voice or mood, which would also require <i>, is almost absent from manuals. We cannot satisfy both stress emphasis and alternate quality, so pick the one that fits more often and looks less wrong when off. Patch from Guy Harris <guy at alum dot mit dot edu>. ok joerg@ bentley@
* Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat. | Ingo Schwarze | 2014-08-10 | 1 | -2/+0
  Include <sys/types.h> where needed, it does not belong in config.h. Remove <stdio.h> from config.h; if it is missing somewhere, it should be added, but I cannot find a *.c file where it is missing.
* Security fix: | Ingo Schwarze | 2014-07-23 | 1 | -26/+37
  After decoding numeric (\N) and one-character (\<, \> etc.) character escape sequences, do not forget to HTML-encode the resulting ASCII character. Malicious manuals were able to smuggle XSS content by roff-escaping the HTML-special characters they need. That's a classic bug type in many web applications, actually... :-( Found myself while auditing the HTML formatter for safe output handling.
* Security fix: | Ingo Schwarze | 2014-07-22 | 1 | -1/+4
  The function print_encode() is used both for plain text and for quoted attribute values. Escape the '"' character such that malicious manuals cannot pull off XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe others) to trigger the latter case. In the former case, escaping does no harm. Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
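The two security fixes above come down to one rule: every character that reaches the HTML output, whether typed literally or produced by decoding a roff escape, goes through the same encoder, and that encoder also covers the double quote so the result is safe inside a quoted attribute value. The following is only a minimal sketch of that rule; html_entity() and html_encode() are hypothetical names for illustration and are not mandoc's print_encode().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not mandoc's print_encode()):
 * return the entity for an HTML-special character, or NULL
 * if the character can be emitted unchanged.
 */
const char *
html_entity(char c)
{
	switch (c) {
	case '<':
		return "&lt;";
	case '>':
		return "&gt;";
	case '&':
		return "&amp;";
	case '"':
		/* Needed for quoted attribute values; harmless in text. */
		return "&quot;";
	default:
		return NULL;
	}
}

/*
 * Copy src into dst (capacity dstsz bytes), encoding special
 * characters.  Returns 0 on success, -1 if the result does not
 * fit; dst is NUL-terminated in both cases when dstsz > 0.
 */
int
html_encode(char *dst, size_t dstsz, const char *src)
{
	size_t used = 0;

	if (dstsz == 0)
		return -1;
	for (; *src != '\0'; src++) {
		const char *ent = html_entity(*src);
		size_t len = (ent != NULL) ? strlen(ent) : 1;

		/* Keep one byte free for the terminating NUL. */
		if (used + len >= dstsz) {
			dst[used] = '\0';
			return -1;
		}
		if (ent != NULL)
			memcpy(dst + used, ent, len);
		else
			dst[used] = *src;
		used += len;
	}
	dst[used] = '\0';
	return 0;
}
```

A caller that decodes an escape such as \N'60' would pass the resulting character through the same function rather than writing it to the output directly.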
* Audit strlcpy(3)/strlcat(3) usage. | Ingo Schwarze | 2014-04-23 | 1 | -0/+6
  * Repair three instances of silent truncation, use asprintf(3).
  * Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+... to use asprintf(3) instead to make them less error prone.
  * Cast the return value of four instances where the destination buffer is known to be large enough to (void).
  * Completely remove three useless instances of strlcpy(3)/strlcat(3).
  * Mark two places in -Thtml with XXX that can cause information loss and crashes but are not easy to fix, requiring design changes of some internal interfaces.
  * The file mandocdb.c remains to be audited.
* KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */, | Ingo Schwarze | 2014-04-20 | 1 | -66/+58
  remove trailing whitespace and blanks before tabs, improve some indenting; no functional change
* The files mandoc.c and mandoc.h contained both specialised low-level | Ingo Schwarze | 2014-03-23 | 1 | -0/+1
  functions used for multiple languages (mdoc, man, roff), for example mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary functions. Split the auxiliaries out into their own file and header. While here, do some #include cleanup.
* Implement the \: (optional line break) escape sequence, | Ingo Schwarze | 2014-01-22 | 1 | -2/+8
  documented in the Ossanna-Kernighan-Ritter troff manual and also supported by groff. Missing feature reported by Steffen Nurpmeso <sdaoden at gmail dot com>.
* Tag functions with format strings as arguments as printf-like. | Joerg Sonnenberger | 2014-01-05 | 1 | -1/+1
  Fix one case where a non-literal is used as format string. Fix another case where a variable is formatted using the wrong type.
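For readers unfamiliar with the technique, tagging a function as printf-like lets GCC and Clang check its format string against its arguments, which is how both bugs mentioned above (a non-literal format string and a mismatched argument type) can be diagnosed at compile time. The sketch below assumes a GCC-compatible compiler; PRINTF_LIKE and buf_fmt() are hypothetical names and not taken from the mandoc sources.

```c
#include <stdarg.h>
#include <stdio.h>

/* Expands to nothing on compilers without the GNU attribute syntax. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmtarg, firstvararg) \
	__attribute__((__format__(__printf__, fmtarg, firstvararg)))
#else
#define PRINTF_LIKE(fmtarg, firstvararg)
#endif

/* Argument 3 is the format string, the variadic arguments start at 4. */
int	buf_fmt(char *, size_t, const char *, ...) PRINTF_LIKE(3, 4);

/*
 * Hypothetical formatting helper: format into buf and return
 * what vsnprintf(3) returns (the length the full result needs).
 */
int
buf_fmt(char *buf, size_t sz, const char *fmt, ...)
{
	va_list	 ap;
	int	 n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sz, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call such as buf_fmt(buf, sizeof(buf), user_string) can be flagged by -Wformat-security, and the safe form is buf_fmt(buf, sizeof(buf), "%s", user_string).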
* Implement the roff(7) font-escape sequence \f(BI "bold+italic". | Ingo Schwarze | 2013-08-08 | 1 | -9/+34
  This improves the formatting of about 40 base manuals and reduces groff-mandoc formatting differences in base by about 5%.
* Implement the roff \z escape sequence, intended to output the next | Ingo Schwarze | 2012-05-31 | 1 | -24/+57
  character without advancing the cursor position; implement it to simply skip the next character, as it will usually be overwritten. With this change, the pod2man(1) preamble user-defined string \*:, intended to render as a diaeresis or umlaut diacritic above the preceding character, is rendered in a slightly less ugly way, though still not correctly. It was rendered as "z.." and is now rendered as ".". Given that the definition of \*: uses elaborate manual \h positioning, there is little chance for mandoc(1) to ever render it correctly, but at least we can refrain from printing out a spurious "z", and we can make the \z do something semi-reasonable for easier cases. "just commit" kristaps@
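The "skip the next character" behaviour described above can be pictured with a deliberately simplified sketch. It assumes that the character following \z is a single plain byte, whereas real roff input may place another escape sequence there; skip_z() is a hypothetical name and this is not the code from html.c.

```c
#include <stddef.h>

/*
 * Hypothetical illustration: copy src to dst, dropping every "\z"
 * escape together with the single character that follows it.
 * Other two-byte escapes (including "\\") are copied untouched.
 * dst must have room for strlen(src) + 1 bytes.
 */
void
skip_z(char *dst, const char *src)
{
	while (*src != '\0') {
		if (src[0] == '\\' && src[1] == 'z') {
			src += 2;		/* drop the escape itself */
			if (*src != '\0')
				src++;		/* drop the next character */
		} else if (src[0] == '\\' && src[1] != '\0') {
			*dst++ = *src++;	/* keep other escapes intact */
			*dst++ = *src++;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}
```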
* Add the -Ofragment option to -T[x]html. This accommodates embedding | Kristaps Dzonsons | 2011-10-05 | 1 | -2/+6
  manual output in existing HTML or XHTML documents, e.g., when invoking mandoc from an SSI or CGI.
* Fix handling of the `\c' escape in -T[x]html. | Kristaps Dzonsons | 2011-07-07 | 1 | -1/+3
* The bufcat() function in -T[x]html was eating one byte off the end of its | Kristaps Dzonsons | 2011-07-04 | 1 | -1/+0
  concatenated string. This for some reason hasn't been found before now... ? Anyway, fixed, and make the IDs created again be correctly prefixed by a letter as per the HTML spec.
* Use the correct Unicode value for the zero-width space, which means that | Kristaps Dzonsons | 2011-05-24 | 1 | -22/+5
  spec2cp never needs to fall through to spec2str. Then clean out html.c of its unnecessary print_res() function.
* Remove all references to ESCAPE_PREDEF, which is now not exposed past | Kristaps Dzonsons | 2011-05-24 | 1 | -25/+0
  the libroff point. This clears up a nice chunk of code.
* Make any un-recognised font be considered a call for the Roman font. | Kristaps Dzonsons | 2011-05-18 | 1 | -0/+4
  This makes sequences of \f[unknown] \fP not completely puke. From a TODO by schwarze@.
* Flip on unicode output (via \[uNNNN]) in -T[x]html. Here we go! | Kristaps Dzonsons | 2011-05-17 | 1 | -0/+8
* Clean-up fallout: differentiate ID's and HREF's (where to put the `#'). | Kristaps Dzonsons | 2011-05-17 | 1 | -3/+2
  Make buffmt functions internally bufinit(), too.
* Cleanups in -T[x]html: make html_idcat() use the buffer and be called | Kristaps Dzonsons | 2011-05-17 | 1 | -24/+5
  bufcat_id(), then collapse it into a little function without so much crap. Next, make bufinit() only be called when we really need to do so, and not simply before pre/post calls.
* Clean-ups in -T[x]html: inline print_num(), as it was just a single | Kristaps Dzonsons | 2011-05-17 | 1 | -63/+24
  conditional; same for print_xmltype() and print_doctype(), same reason; make bufncat() be static, as it was only being called from html.c; have bufcat() simply call through to strlcat(). Finally, assert() whenever we truncate. Also rename buffmt() -> bufcat_fmt() to differentiate from buffmt_man et al., which do not concatenate.
* Clean up -T[x]html by using a table instead of a switch statement for | Kristaps Dzonsons | 2011-05-17 | 1 | -41/+16
  the roff units. Also remove a comment about CSS and number types (they all accept decimal numbers).
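The table-instead-of-switch cleanup named above is a common C idiom: index a constant array by the enum value instead of spelling out one case per unit. The sketch below only illustrates the idiom; the enum, the array contents, and unit_suffix() are hypothetical and do not reproduce the unit list or the mapping used in html.c.

```c
#include <stddef.h>

/* Hypothetical subset of scaling units, for illustration only. */
enum unit {
	UNIT_CM,
	UNIT_IN,
	UNIT_PT,
	UNIT_EM,
	UNIT_MAX
};

/* One CSS suffix per unit, indexed by the enum value. */
static const char *const unit_css[UNIT_MAX] = {
	"cm",	/* UNIT_CM */
	"in",	/* UNIT_IN */
	"pt",	/* UNIT_PT */
	"em"	/* UNIT_EM */
};

/*
 * Look up the CSS suffix for a unit; out-of-range values fall
 * back to "em" (an arbitrary choice made for this sketch).
 */
const char *
unit_suffix(enum unit u)
{
	if ((unsigned int)u >= UNIT_MAX)
		return "em";
	return unit_css[u];
}
```

Adding a unit then means adding one enum constant and one array entry, with no control flow to update.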
* Fix missing support for \N'n' when calculating string widths in -Tascii | Kristaps Dzonsons | 2011-05-15 | 1 | -2/+3
  (oops). Do the same for -Thtml (oops^2).
* Give -Thtml and -Txhtml the gift of recognising escapes when calculating | Kristaps Dzonsons | 2011-05-14 | 1 | -0/+36
  widths (e.g., `Bl -tag -width "\s[blahblah]bar"'). This has long since been done for -Tascii but escaped notice with -T[x]html.
* Make mchars_num2char() return a char like it says. | Kristaps Dzonsons | 2011-04-30 | 1 | -4/+3
* Rename mchars_init() -> mchars_alloc() for consistency. | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* Remove enum mcharst, which hasn't been used in quite some time. | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* No code change: fixing spelling errors. From a patch by uqs@. Thanks! | Kristaps Dzonsons | 2011-04-30 | 1 | -1/+1
* Move "chars" interface out of out.h and into mandoc.h. This doesn't | Kristaps Dzonsons | 2011-04-29 | 1 | -7/+7
  change any code but for renaming functions and types to be consistent with other mandoc.h stuff. The reason for moving into libmandoc is that the rendering of special characters is part of mandoc itself---not an external part. From mandoc(1)'s perspective, this changes nothing, but for other utilities, it's important to have these as part of libmandoc. Note this isn't documented [yet] in mandoc.3 because there are some parts I'd like to change around beforehand.
* Remove a2roffdeco() and mandoc_special() functions and replace them with | Kristaps Dzonsons | 2011-04-09 | 1 | -46/+41
  a public (mandoc.h) function mandoc_escape(), which merges the functionality of both prior functions. Reason: code duplication. The a2roffdeco() and mandoc_special() functions were pretty much the same thing and both quite complex. This allows one function to receive improvements in (e.g.) subexpression handling and performance, instead of having to replicate functionality. As such, the mandoc_escape() function already handles a superset of the escapes handled in previous versions and has improvements in performance (using strcspn(), for example) and reliable handling of subexpressions. This code Works For Me, but may need work to catch any regressions. Since the benefits are great (leaner code, simpler API), I'd rather have it in-tree than floating as a patch.
* Move mandoc_isdelim() back into libmdoc.h. This fixes an unreported | Kristaps Dzonsons | 2011-03-22 | 1 | -8/+0
  error where (1) -man pages were punctuating delimiters (e.g., `.B a ;') and where (2) standalone punctuation in -mdoc or -man (e.g., ";" on its own line) would also be punctuated. This introduces a small amount of complexity, as mdoc_{html,term}.c must manage their own spacing when running print_word() or print_text(). The check for delimiting now happens in mdoc_macro.c's dword().
* Step 4: merge chars.h into out.h. The functions in this file are | Kristaps Dzonsons | 2011-03-22 | 1 | -1/+0
  necessary to all [real] front-ends, so stop pretending it's special. While here, add some documentation to the variable types.
* Move mdoc_isdelim() into mandoc.h as mandoc_isdelim(). This allows the | Kristaps Dzonsons | 2011-03-17 | 1 | -37/+5
  removal of manual delimiter checks in html.c and term.c. Finally, add the escaped period as a closing delimiter, removing a TODO to this effect.
* Move mandoc_{realloc,malloc,calloc} out of libmandoc.h and into mandoc.h | Kristaps Dzonsons | 2011-03-17 | 1 | -10/+2
  so that everybody can use them. This follows the convention of libXXXX.h being internal to a library and XXXX.h being the external interface. Not only does this allow the removal of lots of redundant NULL-checking code, it also sets the tone for adding new mandoc-global routines.
* Make lint shut up a little bit. | Kristaps Dzonsons | 2011-03-15 | 1 | -2/+1
* Implement the \N'number' (numbered character) roff escape sequence. | Ingo Schwarze | 2011-01-30 | 1 | -1/+17
  Don't use it in new manuals, it is inherently non-portable, but we need it for backward-compatibility with existing manuals, for example in Xenocara driver pages. ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)
* Change how -Thtml behaves with tables: use multiple rows, with widths | Kristaps Dzonsons | 2011-01-13 | 1 | -0/+13
  set by COL, until an external macro is encountered. At this point in time, close out the table and process the macro. When the first table row is again re-encountered, re-start the table. This requires a bit of tracking added to "struct html", but the change is very small and follows the logic of meta-fonts. This all follows a bug-report by joerg@.
* In case an ID attribute is written in pieces, only protect the first | Ingo Schwarze | 2010-12-27 | 1 | -9/+13
  piece with a prepended 'x', not each piece, such that quoted and unquoted .Sh, .Ss, and .Sx arguments are compatible with each other. Fixing a bug reported by Nicolas Joly <njoly at NetBSD dot org>, avoiding a regression in my first patch as pointed out by njoly as well. "feel free to do so" kristaps@
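The rule described above (prefix the ID once, no matter how many pieces it is assembled from) can be sketched as follows. This is only an illustration under simplifying assumptions: id_append() is a hypothetical helper, the ID lives in a caller-supplied NUL-terminated buffer, and no character translation is performed on the pieces.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append one piece to an ID under construction.
 * The protective 'x' is written only when the buffer is still empty,
 * so an ID built from several pieces and an ID built from one quoted
 * argument come out identical.
 * Returns 0 on success, -1 if the piece does not fit.
 */
int
id_append(char *id, size_t idsz, const char *piece)
{
	size_t used = strlen(id);
	size_t plen = strlen(piece);
	size_t need = plen + (used == 0 ? 1 : 0);

	/* Keep one byte free for the terminating NUL. */
	if (used + need >= idsz)
		return -1;
	if (used == 0)
		id[used++] = 'x';
	memcpy(id + used, piece, plen + 1);	/* copies the NUL too */
	return 0;
}
```

Appending "RETURN" and then "_VALUES" to an empty buffer yields the same ID as appending "RETURN_VALUES" in one call.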
* Apparently the U tag is deprecated, so use a SPAN instead (blah). Bump | Kristaps Dzonsons | 2010-12-24 | 1 | -1/+0
  version date for release.
* Drastically fix -T[x]html's handling of font-escape mode changes (i.e., | Kristaps Dzonsons | 2010-12-24 | 1 | -26/+24
  using \fI or \fP). Now, using these modes will cause a font to be rendered for each word; furthermore, setting mode within a word will do the correct thing. Second, make -man use real font tags (B, I, SMALL) to set its font instead of using font modes and fix up the pre-macro unsetting of the current mode. This fixes how roff.7 wasn't validating (<P> closing out a font mode) and has been checked against gcc.1 (more will come). I considered failure to validate OUR manual to be a show-stopper for the up-coming release.
* Implement reference-counted version of original union mdoc_data. This | Kristaps Dzonsons | 2010-12-22 | 1 | -1/+2
  simplifies clean-up and allows for more types without extra hassle. Also made in-line literal types in -T[x]html use CODE instead of SPAN to match how literal blocks use PRE.
* More use default tags, this time I and U. Also fix a stack overflow | Kristaps Dzonsons | 2010-12-20 | 1 | -0/+2
  segfault in the last commit.
* Give header and footer table cells default widths (using WIDTH and ALIGN | Kristaps Dzonsons | 2010-12-20 | 1 | -14/+14
  attributes) if no style is specified. Give the default-bold elements a B tag instead of a SPAN tag, as this can be overridden in the stylesheet. Prune some unused attributes from html.h.
* Make literal `Bd' use a PRE in -Thtml. Make `Bd' output in general use | Kristaps Dzonsons | 2010-12-17 | 1 | -0/+1
  only a single DIV or PRE. Tag all displays with display class.
* Have synopsis_pre() in -Thtml emit P or BR, not DIVs. | Kristaps Dzonsons | 2010-12-17 | 1 | -0/+4
  Banish header and footer TABLE styling to example.style.css.
* Use a single P tag for paragraph breaks (which can be configured for | Kristaps Dzonsons | 2010-12-15 | 1 | -0/+1
  paragraph breaking in CSS). Use -man's handling of `sp' and `br', which accommodates scaling widths (-mdoc wasn't).
* Remove stupid outer DIV tag in favour of regular BODY and HTML that can | Kristaps Dzonsons | 2010-12-15 | 1 | -0/+1
  be handled in CSS. Clarified "lit" tag (will be the subject of future clarification). Removed CSS2 note in mandoc.1, which is no longer the case.
* In-progress move from -T[x]html using DIVs for its lists to using DL, | Kristaps Dzonsons | 2010-12-15 | 1 | -7/+4
  OL, and UL. Issue raised by Will Backman, solution proposed by schwarze@.