author | Kristaps Dzonsons <kristaps@bsd.lv> | 2011-05-26 14:43:07 +0000 |
---|---|---|
committer | Kristaps Dzonsons <kristaps@bsd.lv> | 2011-05-26 14:43:07 +0000 |
commit | 45ff372f0eebb1e7c8a2dff9d3d6d03438ad2cc1 | |
tree | dcafdf0e73b6ef61439b3caafa83aca5ef25b776 /preconv.1 | |
parent | 0a5a4862af168a9bcde9fb52486639d47bbba300 | |
preconv is now on encoding-recognition parity with groff. This last
commit adds parsing of "File Variables" in the first two lines in order
to grok the encoding. This completes groff's recognition sequence (-e,
BOM, File variables, -D, default). I've also cleaned up the manual to
indicate this and for some general readability.
preconv is now compiled by default in the Makefile.
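
The recognition sequence named above (-e, BOM, File Variables, -D, default) can be sketched roughly as follows. This is a minimal illustration in Python, not the C code from this commit: the names `choose_encoding`, `coding_tag`, `FILE_VARS`, and `KNOWN` are hypothetical, the regular expression is an assumed reading of the file-variable line format, and unknown encoding names simply fall through to the next step, which is a simplification that the real utility may not share.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Encoding names the manual lists as recognised (matched case-insensitively).
KNOWN = {"utf-8", "us-ascii", "latin-1"}

# Assumed shape of a file-variable line: a roff comment (.\") wrapping
# "-*- key: val [; key: val ]* -*-". The exact grammar is a guess.
FILE_VARS = re.compile(rb'^\.\\"\s*-\*-(.*?)-\*-')


def coding_tag(data: bytes):
    """Return the value of the "coding" key from the first two lines, if any."""
    for line in data.splitlines()[:2]:
        m = FILE_VARS.match(line)
        if m is None:
            continue
        for pair in m.group(1).split(b";"):
            key, sep, val = pair.partition(b":")
            # Assumption: the key itself is compared exactly.
            if sep and key.strip() == b"coding":
                return val.strip().decode("ascii", "replace")
    return None


def choose_encoding(data: bytes, e_opt=None, d_opt=None) -> str:
    """Walk the candidates in precedence order and return the first known one."""
    candidates = (
        e_opt,                                           # 1. -e enc
        "utf-8" if data.startswith(UTF8_BOM) else None,  # 2. BOM
        coding_tag(data),                                # 3. file variables
        d_opt,                                           # 4. -D enc
        "latin-1",                                       # 5. default
    )
    for name in candidates:
        if name is not None and name.lower() in KNOWN:
            return name.lower()
    return "latin-1"
```

Keeping the candidates in one ordered tuple makes the precedence explicit: an `-e` argument always wins, and `-D` is consulted only when neither a BOM nor a coding tag is present.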
Diffstat (limited to 'preconv.1')
-rw-r--r-- | preconv.1 | 71 |
1 files changed, 45 insertions, 26 deletions
@@ -42,18 +42,8 @@ Its arguments are as follows:
 .Bl -tag -width Ds
 .It Fl D Ar enc
 The default encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
 .It Fl e Ar enc
 The document's encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
 .It Ar file
 The input file.
 .El
@@ -63,27 +53,23 @@ If
 is not provided,
 .Nm
 accepts standard input.
-Output is written to standard output.
-Unicode characters in the ASCII range are printed as regular ASCII
-characters; those above this range are printed using the
+See
+.Sx Algorithm
+for encoding choice.
+.Pp
+The recoded input is written to standard output: Unicode characters in
+the ASCII range are printed as regular ASCII characters, while those
+above this range are printed using the
 .Sq \e[uNNNN]
 format documented in
 .Xr mandoc_char 7 .
 .Pp
 If input bytes are improperly formed in the current encoding, they're
 passed unmodified to standard output.
-.Ss Encodings
-The
+For some encodings, such as UTF-8, unrecoverable input sequences will
+cause
 .Nm
-utility accepts the
-.Ar utf\-8 ,
-.Ar us\-ascii ,
-and
-.Ar latin\-1
-encodings as arguments to
-.Fl D Ar enc
-or
-.Fl e Ar enc .
+to stop processing and exit.
 .Ss Algorithm
 An encoding is chosen according to the following steps:
 .Bl -enum
@@ -91,13 +77,41 @@ An encoding is chosen according to the following steps:
 From the argument passed to
 .Fl e Ar enc .
 .It
-If a BOM exists, utf\-8 encoding is selected.
+If a BOM exists, UTF\-8 encoding is selected.
+.It
+From the coding tags parsed from
+.Qq File Variables
+on the first two lines of input.
+A file variable is an input line of the form
+.Pp
+.Dl \%.\e\(dq -*- key: val [; key: val ]* -*-
+.Pp
+where
+.Cm key
+is
+.Qq coding
+and
+.Cm val
+is the name of the encoding.
+A typical usage may be
+.Pp
+.Dl \%.\e\(dq -*- mode: troff; coding: utf-8 -*-
 .It
 From the argument passed to
 .Fl D Ar enc .
 .It
 If all else fails, Latin\-1 is used.
 .El
+.Pp
+The
+.Nm
+utility recognises the UTF\-8, us\-ascii, and latin\-1 encodings as
+passed to the
+.Fl e
+and
+.Fl D
+arguments, or as coding tags.
+Encodings are matched case-insensitively.
 .\" .Sh IMPLEMENTATION NOTES
 .\" Not used in OpenBSD.
 .\" .Sh RETURN VALUES
@@ -107,7 +121,12 @@ If all else fails, Latin\-1 is used.
 .\" .Sh FILES
 .Sh EXIT STATUS
 .Ex -std
-.\" .Sh EXAMPLES
+.Sh EXAMPLES
+Explicitly page a UTF\-8 manual
+.Pa foo.1
+in the current locale:
+.Pp
+.Dl $ preconv \-e utf\-8 foo.1 | mandoc -Tlocale | less
 .\" .Sh DIAGNOSTICS
 .\" For sections 1, 4, 6, 7, & 8 only.
 .\" .Sh ERRORS
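
The output rule in the revised manual text (ASCII passed through, higher code points written as `\[uNNNN]`, malformed bytes passed unmodified) might look like the sketch below. It is an illustration in Python rather than preconv's C implementation: the function name `recode` is hypothetical, the choice of at least four uppercase hex digits is an assumption about the escape format, and the sketch models only the pass-through case, not the "stop processing and exit" behaviour the manual mentions for unrecoverable sequences.

```python
def recode(data: bytes, encoding: str = "latin-1") -> bytes:
    """Recode input bytes to ASCII plus \\[uNNNN] escapes (illustrative only)."""
    # "surrogateescape" turns each undecodable byte 0xNN into the lone
    # surrogate U+DC00 + 0xNN, so it can be recognised and restored below.
    text = data.decode(encoding, errors="surrogateescape")
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)                # ASCII range: emitted unchanged
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append(cp - 0xDC00)       # malformed byte: passed unmodified
        else:
            out += b"\\[u%04X]" % cp      # everything else: \[uNNNN] escape
    return bytes(out)
```

For example, `recode("é".encode("utf-8"), "utf-8")` yields the eight bytes `\[u00E9]`, and the same escape results from `recode(b"\xe9", "latin-1")`.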