************************************************************************ * Official mandoc TODO. * $Id$ ************************************************************************ Many issues are annotated for difficulty as follows: - loc = locality of the issue * single file issue, affects file only, or very few ** single module issue, affects several files of one module *** cross-module issue, significantly impacts multiple modules and may require substantial changes to internal interfaces - exist = difficulty of the existing code in this area * affected code is straightforward and easy to read and change ** affected code is somewhat complex, but once you understand the design, not particularly difficult to understand *** affected code uses a special, exceptionally tricky design - algo = difficulty of the new algorithm to be written * the required logic and code is straightforward ** the required logic is somewhat complex and needs a careful design *** the required logic is exceptionally tricky, maybe an approach to solve that is not even known yet - size = the amount of code to be written or changed * a small number of lines (at most 100, usually much less) ** a considerable amount of code (several dozen to a few hundred) *** a large amount of code (many hundreds, maybe thousands) - imp = importance of the issue * mostly for completeness ** would be nice to have *** issue causes considerable inconvenience Obviously, as the issues have not been solved yet, these annotations are mere guesses, and some may be wrong. ************************************************************************ * crashes ************************************************************************ - The abort() in bufcat(), html.c, can be triggered via buffmt_includes() by running -Thtml -Oincludes on a file containing a long .In argument. Fixing this will probably require reworking the whole bufcat() concept. loc ** exist * algo * size ** imp ** ************************************************************************ * missing features ************************************************************************ --- missing roff features ---------------------------------------------- - .ad (adjust margins) .ad l -- adjust left margin only (flush left) .ad r -- adjust right margin only (flush right) .ad c -- center text on line .ad b -- adjust both margins (alias: .ad n) .na -- temporarily disable adjustment without changing the mode .ad -- re-enable adjustment without changing the mode Adjustment mode is ignored while in no-fill mode (.nf). loc *** exist *** algo ** size ** imp ** (parser reorg would help) - .fc (field control) found by naddy@ in xloadimage(1) loc ** exist *** algo * size * imp * - .nr third argument (auto-increment step size, requires \n+) found by bentley@ in sbcl(1) Mon, 9 Dec 2013 18:36:57 -0700 loc * exist * algo * size * imp ** - .ns (no-space mode) occurs in xine-config(1) reported by brad@ Sat, 15 Jan 2011 15:45:23 -0500 loc *** exist *** algo *** size ** imp * - .ta (tab settings) #1 most important issue naddy@ Mon, 16 Feb 2015 20:59:17 +0100 ircbug(1) gnats(1) reported by brad@ Sat, 15 Jan 2011 15:50:51 -0500 also Tcl_NewStringObj(3) via wiz@ Wed, 5 Mar 2014 22:27:43 +0100 also posix2time(3) Carsten Kunze Mon, 1 Dec 2014 13:03:10 +0100 loc ** exist *** algo ** size ** imp *** - .ti (temporary indent) found by naddy@ in xloadimage(1) [devel/libvstr] vstr(3) found by bentley@ in nmh(1) Mon, 23 Apr 2012 13:38:28 -0600 loc ** exist ** algo ** size * imp ** (parser reorg helps a lot) - .while and .shift found by jca@ in ratpoison(1) Sun, 30 Jun 2013 12:01:09 +0200 loc * exist ** algo ** size ** imp ** - \h horizontal move #2 most important issue naddy@ Mon, 16 Feb 2015 20:59:17 +0100 found in cclive(1) nasm(1) bogofilter(1) asciidoc/DocBook output bentley@ on discuss@ Sat, 21 Sep 2013 22:29:34 -0600 naddy@ Thu, 4 Dec 2014 16:26:41 +0100 loc ** exist ** algo ** size * imp *** (parser reorg helps a lot) - \n+ and \n- numerical register increment and decrement found by bentley@ in sbcl(1) Mon, 9 Dec 2013 18:36:57 -0700 loc * exist * algo * size * imp ** - \n(.$ macro argument count number register; ocserv(8) by autogen found by sthen@ Thu, 19 Feb 2015 22:03:01 +0000 loc * exist ** algo * size * imp ** - \w'' improve width measurements would not be very useful without an expression parser, see below needed for Tcl_NewStringObj(3) via wiz@ Wed, 5 Mar 2014 22:27:43 +0100 loc ** exist *** algo *** size * imp *** - \\ in high-level macro arguments Currently, \\ is expanded in two situations: 1) macro and string definition (roff.c setstrn()) 2) macro argument parsing (mandoc.c mandoc_getarg()) For user defined macros, the second happens in time because of ROFF_REPARSE. But for standard high-level macros, it only happens after entering the high level parsers, which is too late because the code doesn't get back to roff.c roff_res() from that point. Because this requires distinguishing requests, user-defined macros and standard macros on the roff_res() level, it is hard to solve without the parser reorg. Found by naddy@ in devel/cutils cobfusc(1) Mon, 16 Feb 2015 19:10:52 +0100 loc *** exist *** algo *** size ** imp * - using undefined strings or macros defines them to be empty wl@ Mon, 14 Nov 2011 14:37:01 +0000 loc * exist * algo * size * imp * --- missing mdoc features ---------------------------------------------- - .Bl -column .Xo support is missing ultimate goal: restore .Xr and .Dv to lib/libc/compat-43/sigvec.3 lib/libc/gen/signal.3 lib/libc/sys/sigaction.2 loc * exist *** algo *** size * imp ** - edge case: decide how to deal with blk_full bad nesting, e.g. .Sh .Nm .Bk .Nm .Ek .Sh found by jmc@ in ssh-keygen(1) from jmc@ Wed, 14 Jul 2010 18:10:32 +0100 loc * exist *** algo *** size ** imp ** - .Bd -centered implies -filled, not -unfilled, which is not easy to implement; it requires code similar to .ce, which we don't have either. Besides, groff has bug causing text right *before* .Bd -centered to be centered as well. loc *** exist *** algo ** size ** imp ** (parser reorg would help) - .Bd -filled should not be the same as .Bd -ragged, but align both the left and right margin. In groff, it is implemented in terms of .ad b, which we don't have either. Found in cksum(1). loc *** exist *** algo ** size ** imp ** (parser reorg would help) - implement blank `Bl -column', such as .Bl -column .It foo Ta bar .El loc * exist *** algo *** size * imp * - explicitly disallow nested `Bl -column', which would clobber internal flags defined for struct mdoc_macro loc * exist * algo * size * imp ** - In .Bl -column .It, the end of the line probably has to be regarded as an implicit .Ta, if there could be one, see the following mildly ugly code from login.conf(5): .Bl -column minpasswordlen program xetcxmotd .It path Ta path Ta value of Dv _PATH_DEFPATH .br Default search path. reported by Michal Mazurek via jmc@ Thu, 7 Apr 2011 16:00:53 +0059 loc * exist *** algo ** size * imp ** - inside `.Bl -column' phrases, punctuation is handled like normal text, e.g. `.Bl -column .It Fl x . Ta ...' should give "-x -." - inside `.Bl -column' phrases, TERMP_IGNDELIM handling by `Pf' is not safe, e.g. `.Bl -column .It Pf a b .' gives "ab." but should give "ab ." - check whether it is correct that `D1' uses INDENT+1; does it need its own constant? loc * exist ** algo ** size * imp ** - prohibit `Nm' from having non-text HEAD children (e.g., NetBSD mDNSShared/dns-sd.1) (mdoc_html.c and mdoc_term.c `Nm' handlers can be slightly simplified) - support translated section names e.g. x11/scrotwm scrotwm_es.1:21:2: error: NAME section must be first that one uses NOMBRE because it is spanish... deraadt tends to think that section-dependent macro behaviour is a bad idea in the first place, so this may be irrelevant loc ** exist ** algo ** size * imp ** - When there is free text in the SYNOPSIS and that free text contains the .Nm macro, groff somehow understands to treat the .Nm as an in-line macro, while mandoc treats it as a block macro and breaks the line. No idea how the logic for distinguishing in-line and block instances should be, needs investigation. uqs@ Thu, 2 Jun 2011 11:03:51 +0200 uqs@ Thu, 2 Jun 2011 11:33:35 +0200 loc * exist ** algo *** size * imp ** --- missing man features ----------------------------------------------- - -T[x]html doesn't stipulate non-collapsing spaces in literal mode --- missing tbl features ----------------------------------------------- - look at the POSIX manuals in the books/man-pages-posix port, they use some unsupported tbl(7) features. loc * exist ** algo ** size ** imp *** - use Unicode U+2500 to U+256C for table borders in tbl(7) -Tutf-8 output suggested by bentley@ Tue, 14 Oct 2014 04:10:55 -0600 loc * exist ** algo * size * imp ** - allow standalone `.' to be interpreted as an end-of-layout delimiter instead of being thrown away as a no-op roff line reported by Yuri Pankov, Wed 18 May 2011 11:34:59 CEST loc ** exist ** algo ** size * imp ** --- missing eqn features ----------------------------------------------- - The "size" keyword is parsed, but ignored by the formatter. loc * exist * algo * size * imp * - The spacing characters `~', `^', and tab are currently ignored, see User's Guide (Second Edition) page 2 section 4. loc * exist * algo ** size * imp ** - Mark and lineup are parsed and ignored, see User's Guide (Second Edition) page 5 section 15. loc ** exist ** algo ** size ** imp ** --- missing misc features ---------------------------------------------- - italic correction (\/) in PostScript mode Werner LEMBERG on groff at gnu dot org Sun, 10 Nov 2013 12:47:46 loc ** exist ** algo * size * imp * - When makewhatis(8) encounters a FATAL parse error, it silently treats the file as formatted, which makes no sense at all for paths like man1/foo.1 - and which also contradicts what the manual says at the end of the description. The end result will be ENOENT for file names returned by mansearch() in manpage.file. loc * exist * algo * size * imp ** - makewhatis(8) for preformatted pages: parse the section number from the header line and compare to the section number from the directory name loc * exist * algo * size * imp ** - Does makewhatis(8) detect missing NAME sections, missing names, and missing descriptions in all the file formats? loc * exist * algo * size * imp *** - clean up escape sequence handling, creating three classes: (1) fully implemented, or parsed and ignored without loss of content (2) unimplemented, potentially causing loss of content or serious mangling of formatting (e.g. \n) -> ERROR see textproc/mgdiff(1) for nice examples (3) undefined, just output the character -> perhaps WARNING loc *** exist ** algo ** size ** imp *** (parser reorg helps) - kettenis wants base roff, ms, and me Fri, 1 Jan 2010 22:13:15 +0100 (CET) loc ** exist ** algo ** size *** imp * --- compatibility checks ----------------------------------------------- - write a configure check for [[:<:]] support and provide some fallback for whatis(1) when it doesn't work; Svyatoslav Mishyn Wed, 17 Dec 2014 11:07:10 +0200 - is .Bk implemented correctly in modern groff? sobrado@ Tue, 19 Apr 2011 22:12:55 +0200 - compare output to Heirloom roff, Solaris roff, and http://repo.or.cz/w/neatroff.git http://litcave.rudi.ir/ - look at AT&T DWB http://www2.research.att.com/sw/download Carsten Kunze has patches Mon, 4 Aug 2014 17:01:28 +0200 - look at pages generated from reStructeredText, e.g. devel/mercurial hg(1) These are a weird mixture of man(7) and custom autogenerated low-level roff stuff. Figure out to what extent we can cope. For details, see http://docutils.sourceforge.net/rst.html noted by stsp@ Sat, 24 Apr 2010 09:17:55 +0200 reminded by nicm@ Mon, 3 May 2010 09:52:41 +0100 - look at pages generated from ronn(1) github.com/rtomayko/ronn (based on markdown) - look at pages generated from Texinfo source by yat2m, e.g. security/gnupg First impression is not that bad. - look at pages generated by pandoc; see https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/Man.hs porting planned by kili@ Thu, 19 Jun 2014 19:46:28 +0200 - check compatibility with Plan9: http://swtch.com/usr/local/plan9/tmac/tmac.an http://swtch.com/plan9port/man/man7/man.html "Anthony J. Bentley" 28 Dec 2010 21:58:40 -0700 - check compatibility with the man(7) formatter https://raw.githubusercontent.com/rofl0r/hardcore-utils/master/man.c - check compatibility with http://ikiwiki.info/plugins/contrib/mandoc/ https://github.com/schmonz/ikiwiki/compare/mandoc Amitai Schlair Mon, 19 May 2014 14:05:53 -0400 ************************************************************************ * formatting issues: ugly output ************************************************************************ - revisit empty in-line macros look at the difference between "Em x Em ." and "Sq x Em ." Carsten Kunze Fri, 12 Dec 2014 00:15:41 +0100 loc *** exist *** algo *** size * imp ** - a column list with blank `Ta' cells triggers a spurious start-with-whitespace printing of a newline - In .Bl -column, .It a"bc" shows the quotes in groff, but not in mandoc loc * exist *** algo ** size * imp ** - In .Bl -column, .It Em AuthenticationKey Length ought to render "Key Length" with emphasis, too, see OpenBSD iked.conf(5). reported again Nicolas Joly via wiz@ Wed, 12 Oct 2011 00:20:00 +0200 loc * exist *** algo *** size ** imp *** - empty phrases in .Bl column produce too few blanks try e.g. .Bl -column It Ta Ta reported by millert Fri, 02 Apr 2010 16:13:46 -0400 loc * exist *** algo *** size * imp ** - .%T can have trailing punctuation. Currently, it puts the trailing punctuation into a trailing MDOC_TEXT element inside its own scope. That element should rather be outside its scope, such that the punctuation does not get underlines. This is not trivial to implement because .%T then needs some features of in_line_eoln() - slurp all arguments into one single text element - and one feature of in_line() - put trailing punctuation out of scope. Found in mount_nfs(8) and exports(5), search for "Appendix". loc ** exist ** algo *** size * imp ** - Trailing punctuation after .%T triggers EOS spacing, at least outside .Rs (eek!). Simply setting ARGSFL_DELIM for .%T is not the right solution, it sends mandoc into an endless loop. reported by Nicolas Joly Sat, 17 Nov 2012 11:49:54 +0100 loc * exist ** algo ** size * imp ** - global variables in the SYNOPSIS of section 3 pages .Vt vs .Vt/.Va vs .Ft/.Va vs .Ft/.Fa ... from kristaps@ Tue, 08 Jun 2010 11:13:32 +0200 - in enclosures, mandoc sometimes fancies a bogus end of sentence reminded by jmc@ Thu, 23 Sep 2010 18:13:39 +0059 loc * exist ** algo *** size * imp *** - a line starting with "\fB something" counts as starting with whitespace and triggers a line break; found in audio/normalize-mp3(1) loc ** exist * algo ** size * imp ** - formatting /usr/local/man/man1/latex2man.1 with groff and mandoc reveals lots of bugs both in groff and mandoc... reported by bentley@ Wed, 22 May 2013 23:49:30 -0600 --- PDF issues --------------------------------------------------------- - PDF output doesn't use a monospaced font for .Bd -literal Example: "mandoc -Tpdf afterboot.8 > output.pdf && pdfviewer output.pdf". Search the text "Routing tables". Also check what PostScript mode does when fixing this. reported by juanfra@ Wed, 04 Jun 2014 21:44:58 +0200 instructions from juanfra@ Wed, 11 Jun 2014 02:21:01 +0200 add a new <> block to the PDF files with /BaseFont /Courier and change the /Name from /F0 to the new font (/F5 (?)). loc * exist ** algo ** size * imp ** --- HTML issues -------------------------------------------------------- -
formatting is ugly hints are easy to find on the web, e.g. http://stackoverflow.com/questions/1713048/ see also matthew@ Fri, 18 Jul 2014 19:25:12 -0700 loc * exist * algo ** size * imp *** - jsg on icb, Nov 3, 2014: try to guess Xr in man(7) for hyperlinking - The tables used to render the three-part page headers actually force the width of the to the max-width given for . Not yet sure how to fix that... Observed by an Anonymous Coward on undeadly.org: http://undeadly.org/cgi?action=article&sid=20140925064244&pid=1 loc * exist * algo ** size * imp *** - consider whether can be used for Ar Dv Er Ev Fa Va. from bentley@ Wed, 13 Aug 2014 09:17:55 -0600 - check https://github.com/trentm/mdocml ************************************************************************ * formatting issues: gratuitous differences ************************************************************************ - .Fn reopens a new scope after punctuation in mandoc, but closes its scope for good in groff. Do we want to change mandoc or groff? Steffen Nurpmeso Sat, 08 Nov 2014 13:34:59 +0100 loc * exist ** algo ** size * imp ** - In .Bl -enum -width 0n, groff continues one the same line after the number, mandoc breaks the line. mail to kristaps@ Mon, 20 Jul 2009 02:21:39 +0200 loc * exist ** algo ** size * imp ** - .Pp between two .It in .Bl -column should produce one, not two blank lines, see e.g. login.conf(5). reported by jmc@ Sun, 17 Apr 2011 14:04:58 +0059 reported again by sthen@ Wed, 18 Jan 2012 02:09:39 +0000 (UTC) loc * exist *** algo ** size * imp ** - If the *first* line after .It is .Pp, break the line right after the tag, do not pad with space characters before breaking. See the description of the a, c, and i commands in sed(1). loc * exist ** algo ** size * imp ** - If the first line after .It is .D1, do not assert a blank line in between, see for example tmux(1). reported by nicm@ 13 Jan 2011 00:18:57 +0000 loc * exist ** algo ** size * imp ** - Trailing punctuation after .It should trigger EOS spacing. reported by Nicolas Joly Sat, 17 Nov 2012 11:49:54 +0100 Probably, this should be fixed somewhere in termp_it_pre(), not sure. loc * exist ** algo ** size * imp ** - .Nx 1.0a should be "NetBSD 1.0A", not "NetBSD 1.0a", see OpenBSD ccdconfig(8). loc * exist * algo * size * imp ** - In .Bl -tag, if a tag exceeds the right margin and must be continued on the next line, it must be indented by -width, not width+1; see "rule block|pass" in OpenBSD ifconfig(8). loc * exist *** algo ** size * imp ** - When the -width string contains macros, the macros must be rendered before measuring the width, for example .Bl -tag -width ".Dv message" in magic(5), located in src/usr.bin/file, is the same as -width 7n, not -width 11n. The same applies to .Bl -column column widths; reported again by Nicolas Joly Thu, 1 Mar 2012 13:41:26 +0100 via wiz@ 5 Mar reported again by Franco Fichtner Fri, 27 Sep 2013 21:02:28 +0200 loc *** exist *** algo *** size ** imp *** An easy partial fix would be to just skip the first word if it starts with a dot, including any following white space, when measuring. loc * exist * algo * size * imp *** - The \& zero-width character counts as output. That is, when it is alone on a line between two .Pp, we want three blank lines, not two as in mandoc. loc ** exist ** algo ** size * imp ** - Header lines of excessive length: Port OpenBSD man_term.c rev. 1.25 to mdoc_term.c and document it in mdoc(7) and man(7) COMPATIBILITY found while talking to Chris Bennett loc * exist * algo * size * imp * - trailing whitespace must be ignored even when followed by a font escape, see for example makes \fBdig \fR operate in batch mode in dig(1). loc ** exist ** algo ** size * imp ** ************************************************************************ * portability ************************************************************************ - word boundaries in regular expressions for whatis(1) set up config tests to use [[:<:]], \<, or nothing reminded by Peter Bray Fri, 03 Apr 2015 23:02:16 +1100 ************************************************************************ * warning issues ************************************************************************ - check that MANDOCERR_BADTAB is thrown in the right cases, i.e. when finding a literal tab character in fill mode, and possibly change the wording of the warning message to refer to fill mode, not literal mode See the mail from Werner LEMBERG on the groff list, Fri, 14 Feb 2014 18:54:42 +0100 (CET) loc * exist ** algo ** size * imp ** - warn about attempts to call non-callable macros Steffen Nurpmeso Tue, 11 Nov 2014 22:55:16 +0100 Note that formatting is inconsistent in groff. .Fn Po prints "Po()", .Ar Sh prints "file ..." and no "Sh". Relatively hard because the relevant code is scattered all over mdoc_macro.c and all subtly different. loc ** exist ** algo ** size ** imp ** - warn about "new sentence, new line" loc ** exist ** algo *** size * imp ** - mandoc_special does not really check the escape sequence, but just the overall format loc ** exist ** algo *** size ** imp ** - integrate mdoclint into mandoc ("end-of-line whitespace" thread) from jmc@ Mon, 13 Jul 2009 17:12:09 +0100 from kristaps@ Mon, 13 Jul 2009 18:34:53 +0200 from jmc@ Mon, 13 Jul 2009 17:45:37 +0059 from kristaps@ Mon, 13 Jul 2009 19:02:03 +0200 (mostly done, check what remains) - -Tlint parser errors and warnings to stdout to tech@mdocml, naddy@ Wed, 28 Sep 2011 11:21:46 +0200 wait! kristaps@ Sun, 02 Oct 2011 17:12:52 +0200 - for system errors, use errno/strerror/warn/err ************************************************************************ * documentation issues ************************************************************************ - mention hyphenation rules: breaking at letter-letter in text mode (not macro args) proper hyphenation is unimplemented - talk about spacing around delimiters to jmc@, kristaps@ Sat, 23 Apr 2011 17:41:27 +0200 - mark macros as: page structure domain, manual domain, general text domain is this useful? - mention /usr/share/misc/mdoc.template in mdoc(7)? - Is all the content from http://www.std.com/obi/BSD/doc/usd/28.tbl/tbl covered in tbl(7)? ************************************************************************ * performance issues ************************************************************************ - Why are we using MAP_SHARED, not MAP_PRIVATE for mmap(2)? How does SQLITE_CONFIG_PAGECACHE actually work? Document it! from kristaps@ Sat, 09 Aug 2014 13:51:36 +0200 Several areas can be cleaned up to make mandoc even faster. These are - improve hashing mechanism for macros (quite important: performance) - improve hashing mechanism for characters (not as important) - the PDF file is HUGE: this can be reduced by using relative offsets - instead of re-initialising the roff predefined-strings set before each parse, create a read-only version the first time and copy it loc * exist ** algo ** size * imp ** ************************************************************************ * structural issues ************************************************************************ - Improve -O suboption parsing. Do it in the main program such that errors can be reported. Pay attention to distinguishing the mandoc(1) and apropos(1) styles of both options. loc ** exist * algo ** size ** imp *** - Use libz directly instead of forking gunzip(1). Suggested by bapt at FreeBSD among others. - We use the input line number at several places to distinguish same-line from different-line input. That plainly doesn't work with user-defined macros, leading to random breakage. - Find better ways to prevent endless loops in roff(7) macro and string expansion. - Finish cleanup of date handling. Decide which formats should be recognized where. Update both mdoc(7) and man(7) documentation. Triggered by Tim van der Molen Tue, 22 Feb 2011 20:30:45 +0100 - struct mparse refactoring Steffen Nurpmeso Thu, 04 Sep 2014 12:50:00 +0200 - Consider creating some views that will make the database more readable from the sqlite3 shell. Consider using them to abstract from the database structure, too. suggested by espie@ Sat, 19 Apr 2014 14:52:57 +0200 ************************************************************************ * CGI issues ************************************************************************ - Enable HTTP compression by detecting gzip encoding and filtering output through libz. - Sandbox (see OpenSSH). - Enable caching support via HTTP 304 and If-Modified-Since. - Allow for cgi.h to be overridden by CGI environment variables. Otherwise, binary distributions will inherit the compile-time behaviour, which is not optimal. - Have Mac OSX systems automatically disable -static compilation of the CGI: -static isn't supported. ************************************************************************ * to improve in the groff_mdoc(7) macros ************************************************************************ - use uname(1) to set doc-default-operating-system at install time tobimensch Mon, 1 Dec 2014 00:25:07 +0100