Commit message | Author | Age | Files | Lines
* Several improvements to escape sequence handling. | Ingo Schwarze | 2018-12-15 | 40 | -128/+507
  * Add the missing special character \_ (underscore).
  * Partial implementations of \a (leader character) and \E (uninterpreted escape character).
  * Parse and ignore \r (reverse line feed).
  * Add a WARNING message about undefined escape sequences.
  * Add an UNSUPP message about unsupported escape sequences.
  * Mark \! and \? (transparent throughput) and \O (suppress output) as unsupported.
  * Treat the various variants of zero-width spaces as one-byte escape sequences rather than as special characters, to avoid defining bogus forms with square brackets.
  * For special characters with one-byte names, do not define bogus forms with square brackets, except for \[-], which is valid.
  * In the form with square brackets, undefined special characters do not fall back to printing the name verbatim, not even for one-byte names.
  * Starting a special character name with a blank is an error.
  * Undefined escape sequences never abort formatting of the input string, not even in HTML output mode.
  * Document the newly handled escapes, and a few that were missing.
  * Regression tests for most of the above.
* zap trailing whitespace; from jmc@ | Ingo Schwarze | 2018-12-14 | 1 | -1/+1
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-14 | 14 | -63/+19
  Now that message handling is properly encapsulated, remove struct mparse pointers from four structs (roff, roff_man, tbl_node, eqn_node) and from the argument lists of five functions (roff_alloc, roff_man_alloc, mandoc_getarg, tbl_alloc, eqn_alloc). Except for being passed to the main program as an opaque object, it now only occurs in read.c, as it should, and not across 15 files like in the past.
* Almost mechanical diff to remove the "struct mparse *" argument | Ingo Schwarze | 2018-12-14 | 19 | -566/+424
  from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
* Fold mparse_parse_buffer() into mparse_readfd(), making the code | Ingo Schwarze | 2018-12-14 | 1 | -40/+41
  considerably more readable. This is possible now that i finally deleted mparse_readmem() from mandoc portable - an unused function that never existed in OpenBSD. This cleanup already made me find a minor bug: after a recursive parse, restoring the line number of the parent file was forgotten. This is fixed now.
* Delete the function mparse_readmem() that has been unused for almost a | Ingo Schwarze | 2018-12-14 | 2 | -15/+0
  decade but regularly makes maintenance harder. Mandoc is not a general-purpose library, and being as pluggable as possible is not among the goals of the project.
* Major cleanup; may imply minor changes in edge cases of error reporting. | Ingo Schwarze | 2018-12-14 | 24 | -477/+426
  Finally, drop support for the run-time configurable mandocmsg() callback. It was over-engineered from the start, never used for anything in a decade, and repeatedly caused maintenance headaches. Consolidate reporting infrastructure into two files, mandoc.h and mandoc_msg.c, mopping up the bits and pieces that were scattered around main.c, read.c, mandoc_parse.h, libmandoc.h, the prototypes of four parsing-related functions, and both parser structs.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 23 | -116/+148
  Split the top level parser interface out of the utility header mandoc.h, into a new header mandoc_parse.h, for use in the main program and in the main parser only. Move enum mandoc_os into roff.h because struct roff_man is the place where it is stored. This allows removal of mandoc.h from seven files in low-level parsers and in formatters.
* libmdoc.h no longer needs mdoc.h | Ingo Schwarze | 2018-12-13 | 2 | -14/+9
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 4 | -85/+50
  Finally merge the pointless file st.in into st.c. Nobody should do operating systems dependent changes to standards: By definition, standards are the same for every operating system. While here, libmdoc.h no longer requires mdoc.h.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 4 | -10/+26
  Move the roffhash_*() functions from roff.h to roff_int.h because they are only intended for use by parsers, neither by main programs nor by formatters.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 11 | -86/+117
  No need to expose the eqn(7) syntax tree data structures everywhere. Move them to their own include file, "eqn.h". While here, delete the unused enum eqn_pilet.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 7 | -98/+117
  In libroff.h, nothing was left except the eqn(7) parser interface, which isn't really part of the roff(7) parser, so rename it to eqn_parse.h. While here, move struct eqn_def to eqn.c because that's the only file using it, and let eqn_box_free() and eqn_free() handle NULL.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-13 | 12 | -103/+206
  Move tbl(7)-specific parser internals out of libroff.h. Move some tbl(7)-internal processing from roff.c to tbl.c.
* Cleanup, no functional change: | Ingo Schwarze | 2018-12-12 | 15 | -125/+179
  No need to expose the tbl(7) syntax tree data structures everywhere. Move them to their own include file, "tbl.h", and improve comments.
* HTML syntax audit: render \p as <br/>, not as <div>. | Ingo Schwarze | 2018-12-04 | 1 | -4/+1
  It can occur anywhere, in particular in phrasing context.
* Restrict "vertical-align: middle;" to <td> descendants of class="tbl" | Ingo Schwarze | 2018-12-04 | 1 | -2/+2
  elements, we don't want that for other tables.
* Make sure all borders in a table are drawn in the same color. | Ingo Schwarze | 2018-12-04 | 1 | -1/+6
  Required because browsers tend to have inconsistent defaults: For example, Firefox 62.0.2 sets border-color for tbody, but not for table, and Pali Rohar reports that Chrome set it for td, but not for tr or tbody. The td part is from Pali Rohar, the tbody and tr parts from me.
* During validation, drop .br before a text line starting with a | Ingo Schwarze | 2018-12-04 | 1 | -0/+8
  blank, rather than teaching each formatter individually to ignore the .br in such situations. That's simpler and also results in better diagnostics. Mark Harris <mark dot hsj at gmail dot com> reported that -T html got confused in particular.
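The idea of dropping the redundant .br once, during validation, can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `(kind, text)` tuples and the function name `drop_redundant_br` are hypothetical and do not correspond to mandoc's C data structures. In line with the validation rule described in the next entry, it only looks at the node just before the one being validated.

```python
def drop_redundant_br(nodes):
    """Drop a "br" node directly followed by a text line starting with a blank.

    nodes is a list of (kind, text) tuples (hypothetical representation),
    where kind is "br" or "text".
    """
    out = []
    for kind, text in nodes:
        # When validating a text node with a leading blank, look back at
        # the previous node only and discard it if it is a line break.
        if kind == "text" and text.startswith(" ") and out and out[-1][0] == "br":
            out.pop()
        out.append((kind, text))
    return out
```

With this shape, no formatter ever sees the dropped node, so none of them needs its own special case.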
* Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all | Ingo Schwarze | 2018-12-04 | 16 | -97/+149
  combinations are handled, and are handled in a systematic manner. This resolves some erratic duplicate handling, handles a number of missing cases, and improves diagnostics in various respects. Move validation of .br and .sp to the roff validation module rather than doing that twice in the mdoc and man validation modules. Move the node relinking function to the roff library where it belongs. In validation functions, only look at the node itself, at previous nodes, and at descendants, not at following nodes or ancestors, such that only nodes are inspected which are already validated.
* In the validators, translate obsolete macro aliases (Lp, Ot, LP, P) | Ingo Schwarze | 2018-12-03 | 9 | -38/+132
  to the standard forms (Pp, Ft, PP) up front, such that later code does not need to look for the obsolete versions. This reduces the risk of incomplete handling.
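The "translate up front" approach amounts to one lookup before any other validation code runs. The Python sketch below uses the alias pairs named in the commit message; the table name `OBSOLETE_ALIASES` and the helper functions are hypothetical, not mandoc identifiers.

```python
# Alias pairs taken from the commit message; the names are illustrative.
OBSOLETE_ALIASES = {
    "Lp": "Pp",  # mdoc(7)
    "Ot": "Ft",  # mdoc(7)
    "LP": "PP",  # man(7)
    "P": "PP",   # man(7)
}


def canonical_macro(name):
    """Return the standard macro name for an obsolete alias, else the name itself."""
    return OBSOLETE_ALIASES.get(name, name)


def normalize(macros):
    """Translate a sequence of macro names once, before further validation."""
    return [canonical_macro(m) for m in macros]
```

After such a pass, later checks only need to compare against Pp, Ft, and PP.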
* Render .br as <br/>, not as an empty <div>. | Ingo Schwarze | 2018-12-03 | 1 | -5/+1
  The element <br/> was already employed for many other purposes, so there is nothing wrong with using it. Also, it is safer because <br/> is permitted in phrasing content, whereas <div> is only allowed in flow content.
  This is the first part of the HTML syntax audit which i wanted to do for a long time. Reminded by a loosely related bug report from Mark Harris <mark dot hsj at gmail dot com>.
  Examples of where this caused HTML nesting syntax errors:
  * in man(7) code between .nf and .fi
  * in mdoc(7) code between .Bd -unfilled and .Ed
  * in mdoc(7) code between .Ql Xo and .Xc
  * in mdoc(7) code between .Rs and .Re
* Do not draw horizontal lines through vertical spans | Ingo Schwarze | 2018-11-29 | 1 | -4/+18
  which are requested in the data section rather than in the layout. Mini-feature found in misc/pfm(1).
* Now that it is better understood how borders work, | Ingo Schwarze | 2018-11-29 | 1 | -71/+105
  rewrite tbl_hrule() in a simpler way. Fix several bugs in the process. No more special flags, just use the existing TBL_OPT_* from mandoc.h. Reduce the number of tracked rows from three to two, which is more logical: one above the line and one below is sufficient to figure out crossings. No more magic quirks, all conditions are readily comprehensible now. Add comments.
* Better handle automatic column width assignments in the presence of | Ingo Schwarze | 2018-11-29 | 2 | -47/+205
  horizontal spans, by implementing a moderately difficult iterative algorithm. The benefit is that spans containing long text no longer cause an excessive width of their starting column. The result is likely not optimal, in particular in the presence of many spans overlapping in complicated ways nor when spans interact with equalizing or maximizing columns. But i doubt the practical usefulness of making this more complicated. Issue originally reported in synaptics(4), which now looks better, by tedu@ three years ago, and reminded by Pali Rohar this summer.
* Bugfix: never set termp->enc to the ambiguous value TERMENC_LOCALE, | Ingo Schwarze | 2018-11-28 | 1 | -3/+3
  but instead set it to TERMENC_UTF8 or TERMENC_ASCII. Makes tbl(7) box drawing work under -T locale (that is, by default when LC_CTYPE is defined appropriately).
* additional check needed after the previous (box drawing) patch | Ingo Schwarze | 2018-11-28 | 1 | -4/+7
* In -T utf8 output mode, render tbl(7) borders with the Unicode | Ingo Schwarze | 2018-11-28 | 2 | -159/+317
  box drawing characters, U+2500 to U+257F. Originally suggested by bentley@ four years ago, reminded this summer by Pali Rohar. Binary and decimal arithmetics are boring, so let's use some ternary arithmetics for a change. That said, some other aspects are too complicated for my liking, so this could use some polishing in the future.
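One way to read the remark about ternary arithmetic: each of the four arms of a crossing (up, right, down, left) can be absent, single, or double, so a crossing fits into a four-digit base-3 number with 81 possible values. The Python sketch below illustrates that encoding only. The digit weights, the names, and the partial glyph table are assumptions made for illustration and are not taken from the mandoc sources; the code points themselves are from the Unicode Box Drawing block.

```python
# Base-3 weights for the four arms of a crossing (an assumed layout).
RIGHT, DOWN, LEFT, UP = 1, 3, 9, 27
NONE, SINGLE, DOUBLE = 0, 1, 2


def border_index(up=NONE, right=NONE, down=NONE, left=NONE):
    """Pack four three-state arms into one number in range(81)."""
    return up * UP + right * RIGHT + down * DOWN + left * LEFT


# Partial lookup table; a complete one would cover all 81 indices.
GLYPHS = {
    border_index(left=SINGLE, right=SINGLE): "\u2500",       # horizontal line
    border_index(up=SINGLE, down=SINGLE): "\u2502",          # vertical line
    border_index(right=SINGLE, down=SINGLE): "\u250c",       # top left corner
    border_index(left=SINGLE, down=SINGLE): "\u2510",        # top right corner
    border_index(up=SINGLE, right=SINGLE): "\u2514",         # bottom left corner
    border_index(up=SINGLE, left=SINGLE): "\u2518",          # bottom right corner
    border_index(SINGLE, SINGLE, SINGLE, SINGLE): "\u253c",  # full crossing
    border_index(left=DOUBLE, right=DOUBLE): "\u2550",       # double horizontal
    border_index(up=DOUBLE, down=DOUBLE): "\u2551",          # double vertical
}


def glyph(up=NONE, right=NONE, down=NONE, left=NONE):
    """Look up the glyph for a crossing; a blank stands in for missing entries."""
    return GLYPHS.get(border_index(up, right, down, left), " ")
```

The appeal of such an encoding is that a crossing is found by adding up what the neighbouring lines contribute, followed by a single table lookup.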
* Implement tbl(7) lines in -T html output, | Ingo Schwarze | 2018-11-26 | 2 | -69/+128
  as far as they are on the edges of table cells rather than going through the middle of cells:
  * the box, doublebox, and allbox options;
  * the | and || layout modifiers;
  * and the _ and = data lines;
  - but not yet _ and = in individual layout and data cells.
  Missing feature reported by Pali dot Rohar at gmail dot com.
* When a conditional block is closed by putting "\}" on a text line | Ingo Schwarze | 2018-11-26 | 4 | -10/+58
  by itself (which is somewhat unusual but not invalid; most authors use the empty macro line ".\}" instead), agree more closely with groff and do not produce a double space in the output. Quirk reported by millert@.
  While here, tweak the rest of the function body of roff_cond_text() to more closely match roff_cond_sub(). The subtly different handling could make people (including myself) wonder whether there is any point in being different. Testing shows there is not.
* Mark Harris pointed out that people might have doubts whether all files | Ingo Schwarze | 2018-11-26 | 1 | -4/+4
  contained in the mandoc toolkit are "code and documentation", and whether this is of any consequence for licensing, so clarify.
* Place mandoc.css into the public domain. | Ingo Schwarze | 2018-11-26 | 1 | -0/+5
  The reason for doing this rather than using the ISC license is that i guess that in some contexts, a requirement to preserve a Copyright and license header might be inconvenient, and i really don't care at all how people use it. What matters is that they do use it, or something similar - attempts to use mandoc without any CSS are a constant source of grief and bogus bug reports because HTML without CSS doesn't look very good: the more structural and semantic and the less presentational and old-fashioned the HTML, the more so.
  Thanks to Mark Harris <mark dot hsj at gmail dot com> for pointing out that the permissions on this particular file were unclear.
* Simplify writing of tbl(7) cells by using the new feature of passing | Ingo Schwarze | 2018-11-26 | 1 | -16/+4
  a NULL pointer for the value of a style attribute, in which case the attribute is omitted from the HTML element. Minus 12 lines of ugly and repetitive code, no functional change.
* Support more than one style attribute on the same HTML element. | Ingo Schwarze | 2018-11-26 | 2 | -15/+32
  In fact, this is already required when a table uses non-default horizontal and vertical alignment in the same cell.
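To illustrate why several style declarations on one element are useful, and how the later convention of passing NULL for a value keeps the calling code short, here is a minimal Python sketch. The function `open_tag` and its argument format are hypothetical; mandoc's HTML writer is written in C and has a different interface.

```python
from html import escape


def open_tag(name, styles):
    """Render an opening tag from (property, value) pairs.

    Pairs whose value is None are skipped; if nothing remains,
    the style attribute is left out entirely.
    """
    decls = [f"{prop}: {val};" for prop, val in styles if val is not None]
    if not decls:
        return f"<{name}>"
    style = escape(" ".join(decls), quote=True)
    return f'<{name} style="{style}">'
```

A caller can then pass both alignments unconditionally, for example `open_tag("td", [("text-align", halign), ("vertical-align", valign)])`, and leave either variable as None when the default applies.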
* Let cells containing nothing but \^ extend the cell above. | Ingo Schwarze | 2018-11-25 | 4 | -25/+39
  Missing feature reported by Pali dot Rohar at gmail dot com.
* In tbl(7) -T html output, | Ingo Schwarze | 2018-11-25 | 7 | -73/+140
  span cells horizontally and vertically as requested by the layout. Does not handle spans requested in the data section yet. To be able to do this, record the number of rows spanned in the first data cell (struct tbl_dat) of a vertical span. Missing feature reported by Pali dot Rohar at gmail dot com.
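Recording the row count in the first cell of a vertical span is what an HTML writer needs in order to emit a rowspan attribute before it reaches the lower rows. The Python sketch below shows that bookkeeping on a grid of tbl(7) layout keys, where "^" marks a cell that continues the one above. The function name and the list-of-lists representation are assumptions; the grid is assumed to be rectangular.

```python
def rowspans(layout):
    """Compute per-cell row counts for a rectangular grid of layout keys.

    The first cell of a vertical span gets the number of rows it covers,
    continuation cells ("^") get 0, and all other cells get 1.
    """
    spans = [[1] * len(row) for row in layout]
    for r, row in enumerate(layout):
        for c, key in enumerate(row):
            if key != "^":
                continue
            spans[r][c] = 0
            # Walk upward to the cell that starts this span.
            top = r - 1
            while top >= 0 and layout[top][c] == "^":
                top -= 1
            if top >= 0:
                spans[top][c] += 1
    return spans
```

A formatter could then skip every cell with count 0 and write the count as the rowspan of the starting cell whenever it is greater than 1.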
* HTML formatting of .IP | Ingo Schwarze | 2018-11-25 | 1 | -0/+6
* Implement horizontal and vertical alignment of tbl(7) cell content | Ingo Schwarze | 2018-11-24 | 3 | -10/+36
  in -T html output. This does not handle spanned cells yet. Missing feature reported by Pali dot Rohar at gmail dot com.
* When a font escape appears in the middle of a string, | Ingo Schwarze | 2018-11-23 | 1 | -1/+4
  make sure it doesn't cause output of bogus whitespace. Fixing a bug reported by Pali dot Rohar at gmail dot com.
* Correct and shorten the description of the sort order of apropos(1) | Ingo Schwarze | 2018-11-22 | 1 | -18/+2
  results. As a matter of fact, which manpath the page comes from does not matter in that context. That only matters for the priority of pages in man(1) mode (without -a, -f, and -k). Noticed while working on a patch from Yuri Pankov <yuripv at FreeBSD>.
* In apropos(1) output, stop sorting .Nm search results by name | Ingo Schwarze | 2018-11-22 | 3 | -9/+2
  priorities (bits). The obscure feature wasn't documented and merely confused people - for example Edward Tomasz Napierala <trasz at FreeBSD>, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227408. Smaller patch provided by Yuri Pankov <yuripv at FreeBSD>, but i'm also retiring the now unused "bits" member from struct manpage. Simplification is good.
* In -T locale (the default), -T ascii, and -T utf8 mode, provide a new | Ingo Schwarze | 2018-11-22 | 7 | -12/+60
  output option -O tag[=term] to move right to the definition of "term" when opening the manual page in a pager, effectively porting the -T html fragment name feature - https://man.openbsd.org/ksh#ulimit - to the terminal. Try:
    $ man -O tag uvm_sysctl
    $ man -O tag=ulimit ksh
    $ man -O tag 3 compress
  Feature development triggered by a question from kn@. Klemens also tested, provided feedback that resulted in improvements, and provided an OK.
* Improve POSIX compliance by making case-insensitive extended | Ingo Schwarze | 2018-11-19 | 2 | -13/+31
  regular expressions the default in man(1) -k searches, also matching what the man-db package used by many Linux distributions does. Originally requested by Wolfram Schneider <wosch at FreeBSD> via Yuri Pankov <yuripv at FreeBSD>. Feedback and OK cheloha@, and no objections when shown on tech@. Thanks to cheloha@ for pointing out that POSIX requires this behaviour and for the suggestion to explicitly say that *extended* regular expressions are used here.
  While here, unify spelling of case-[in]sensitive, fix a typo, update the EXAMPLES, and add a STANDARDS section.
* Correctly construct empty lists in dbm_page_get(). | Ingo Schwarze | 2018-11-19 | 1 | -3/+3
  Original commit message by the author of this bugfix patch, bluhm@:
  lstmatch() expects a list of strings separated by \0 and terminated with \0\0. In the NULL case dbm_page_get() returned only simple strings so correct processing was depending on data layout. Use an additional \0 to terminate the single string lists. Found by mandoc regress since llvm linker on amd64 arranges strings differently.
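The list format described here, strings separated by \0 with an extra \0 closing the list, can be illustrated in a few lines of Python. The reader `split_list` is a hypothetical stand-in for the kind of loop a consumer such as lstmatch() has to run. It shows why a single string with only one terminator is not a valid list: the reader keeps going into whatever bytes happen to follow.

```python
def split_list(buf, start=0):
    """Read a NUL-separated, double-NUL-terminated string list from bytes."""
    items = []
    pos = start
    # An empty string at the current position marks the end of the list.
    while buf[pos] != 0:
        end = buf.index(0, pos)
        items.append(buf[pos:end].decode("ascii"))
        pos = end + 1
    return items
```

For example, `split_list(b"ls\0\0")` yields one entry, whereas a buffer such as `b"ls\0junk\0\0"`, where a singly terminated "ls" merely happens to be followed by unrelated data, yields two. Which bytes follow depends on how the linker arranges strings, which matches the symptom described in the commit message.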
* in -man -Thtml, vertical spacing is required before .IP | Ingo Schwarze | 2018-10-25 | 1 | -0/+6
* Implement the \f(CW and \f(CR (constant width font) escape sequences | Ingo Schwarze | 2018-10-25 | 7 | -1/+21
  for HTML output. Somewhat relevant because pod2man(1) relies on this. Missing feature reported by Pali dot Rohar at gmail dot com.
  Note that constant width font was already correctly selected before this when required by semantic markup. Only attempting physical markup with the low-level escape sequence was ineffective.
* The ctags(1) file format uses whitespace as a field delimiter, and | Ingo Schwarze | 2018-10-23 | 1 | -9/+26
  there is no escaping mechanism, so tags cannot contain whitespace. Consequently, we used to simply not tag macro arguments containing space characters. Instead, let's tag the first word, unless there is a proper match for that word somewhere else. For example, this makes ":tquery" work in ntpd.conf(5). Feature suggested by kn@, who also thinks the implementation looks reasonable and works in his testing.
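The rule "tag the first word, unless there is a proper match for that word somewhere else" can be sketched as two passes over the candidate arguments. This Python version is a simplified illustration: the function name, the input format, and the sample strings in the usage note are hypothetical, and mandoc's real tagging code handles more cases.

```python
def choose_tags(arguments):
    """Map each tag to the index of the macro argument it should point to.

    arguments is a list of macro argument strings in document order.
    """
    tags = {}
    # First pass: arguments without whitespace are proper matches.
    for i, arg in enumerate(arguments):
        if arg.split() == [arg]:
            tags.setdefault(arg, i)
    # Second pass: for arguments containing whitespace, fall back to the
    # first word, but never override a proper match for that word.
    for i, arg in enumerate(arguments):
        words = arg.split()
        if words and words[0] != arg:
            tags.setdefault(words[0], i)
    return tags
```

With the made-up input `["tquery address", "listen on address"]`, both first words become tags. If a standalone "tquery" argument also occurred elsewhere, that proper match would win.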
* Input lines that are not blank but generate no output, | Ingo Schwarze | 2018-10-23 | 1 | -2/+5
  for example lines containing nothing but "\&", are significant in no-fill mode and can be represented by blank lines inside <pre>. Fixing a bug that Pali dot Rohar at gmail dot com found in pod2man(1) output, for example Email::Address::XS(3p).
  While here, inside no-fill mode, there is no need to encode totally blank input lines by emulating .PP - just let them through as we are inside <pre> anyway.
* Rewrite parse_path_info() to be four lines shorter, simplify ownership | Ingo Schwarze | 2018-10-19 | 1 | -46/+40
  of allocated strings, do not write to the input string, and improve diagnostic output. The confusing error message "invalid arch" as a reaction to mistyping the release name was noticed by tb@, who likes the new code and message.
* update DESCRIPTION and COMPATIBILITY, mostly correcting statements | Ingo Schwarze | 2018-10-04 | 1 | -18/+17
  from the past that are no longer true