author Kristaps Dzonsons <kristaps@bsd.lv> 2011-05-26 14:43:07 +0000
committer Kristaps Dzonsons <kristaps@bsd.lv> 2011-05-26 14:43:07 +0000
commit 45ff372f0eebb1e7c8a2dff9d3d6d03438ad2cc1
tree dcafdf0e73b6ef61439b3caafa83aca5ef25b776 /preconv.1
parent 0a5a4862af168a9bcde9fb52486639d47bbba300
preconv is now on encoding-recognition parity with groff. This last commit adds parsing of "File Variables" in the first two lines in order to grok the encoding. This completes groff's recognition sequence (-e, BOM, File Variables, -D, default). I've also cleaned up the manual to indicate this and for some general readability. preconv is now compiled by default in the Makefile.
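
The recognition sequence named in the commit message (-e, BOM, File Variables, -D, default) can be sketched roughly as follows. This is an illustrative Python model, not the C code in preconv: the names choose_encoding, coding_tag and normalise are hypothetical, and silently skipping an unrecognised encoding name is an assumption made here for brevity rather than documented behaviour.

```python
import re

# Encoding names listed in the manual; matching is case-insensitive.
KNOWN = {"utf-8", "us-ascii", "latin-1"}
UTF8_BOM = b"\xef\xbb\xbf"

# A roff comment line carrying file variables, for example:
#   .\" -*- mode: troff; coding: utf-8 -*-
FILE_VARS = re.compile(rb'^\.\\"\s*-\*-(.*?)-\*-')


def coding_tag(data):
    """Return the 'coding' value found on the first two lines, or None."""
    for line in data.split(b"\n")[:2]:
        m = FILE_VARS.match(line)
        if m is None:
            continue
        for pair in m.group(1).split(b";"):
            key, sep, val = pair.partition(b":")
            if sep and key.strip().lower() == b"coding":
                return val.strip().decode("ascii", "replace")
    return None


def normalise(name):
    """Lower-case a name; return None if it is absent or unrecognised.

    Assumption: unknown names are skipped instead of reported as errors.
    """
    if name is None:
        return None
    name = name.lower()
    return name if name in KNOWN else None


def choose_encoding(data, e_opt=None, d_opt=None):
    """Pick an encoding in the order: -e, BOM, file variables, -D, default."""
    candidates = (
        normalise(e_opt),
        "utf-8" if data.startswith(UTF8_BOM) else None,
        normalise(coding_tag(data)),
        normalise(d_opt),
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return "latin-1"
```

With this model, choose_encoding(data, e_opt="LATIN-1") returns "latin-1" even when the input carries a utf-8 coding tag, because the -e argument is consulted first.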
Diffstat (limited to 'preconv.1')
-rw-r--r-- preconv.1 71
1 files changed, 45 insertions, 26 deletions
diff --git a/preconv.1 b/preconv.1
index d035784f..3e63788f 100644
--- a/preconv.1
+++ b/preconv.1
@@ -42,18 +42,8 @@ Its arguments are as follows:
.Bl -tag -width Ds
.It Fl D Ar enc
The default encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
.It Fl e Ar enc
The document's encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
.It Ar file
The input file.
.El
@@ -63,27 +53,23 @@ If
is not provided,
.Nm
accepts standard input.
-Output is written to standard output.
-Unicode characters in the ASCII range are printed as regular ASCII
-characters; those above this range are printed using the
+See
+.Sx Algorithm
+for encoding choice.
+.Pp
+The recoded input is written to standard output: Unicode characters in
+the ASCII range are printed as regular ASCII characters, while those
+above this range are printed using the
.Sq \e[uNNNN]
format documented in
.Xr mandoc_char 7 .
.Pp
If input bytes are improperly formed in the current encoding, they're
passed unmodified to standard output.
-.Ss Encodings
-The
+For some encodings, such as UTF-8, unrecoverable input sequences will
+cause
.Nm
-utility accepts the
-.Ar utf\-8 ,
-.Ar us\-ascii ,
-and
-.Ar latin\-1
-encodings as arguments to
-.Fl D Ar enc
-or
-.Fl e Ar enc .
+to stop processing and exit.
.Ss Algorithm
An encoding is chosen according to the following steps:
.Bl -enum
@@ -91,13 +77,41 @@ An encoding is chosen according to the following steps:
From the argument passed to
.Fl e Ar enc .
.It
-If a BOM exists, utf\-8 encoding is selected.
+If a BOM exists, UTF\-8 encoding is selected.
+.It
+From the coding tags parsed from
+.Qq File Variables
+on the first two lines of input.
+A file variable is an input line of the form
+.Pp
+.Dl \%.\e\(dq -*- key: val [; key: val ]* -*-
+.Pp
+where
+.Cm key
+is
+.Qq coding
+and
+.Cm val
+is the name of the encoding.
+A typical usage may be
+.Pp
+.Dl \%.\e\(dq -*- mode: troff; coding: utf-8 -*-
.It
From the argument passed to
.Fl D Ar enc .
.It
If all else fails, Latin\-1 is used.
.El
+.Pp
+The
+.Nm
+utility recognises the UTF\-8, us\-ascii, and latin\-1 encodings as
+passed to the
+.Fl e
+and
+.Fl D
+arguments, or as coding tags.
+Encodings are matched case-insensitively.
.\" .Sh IMPLEMENTATION NOTES
.\" Not used in OpenBSD.
.\" .Sh RETURN VALUES
@@ -107,7 +121,12 @@ If all else fails, Latin\-1 is used.
.\" .Sh FILES
.Sh EXIT STATUS
.Ex -std
-.\" .Sh EXAMPLES
+.Sh EXAMPLES
+Explicitly page a UTF\-8 manual
+.Pa foo.1
+in the current locale:
+.Pp
+.Dl $ preconv \-e utf\-8 foo.1 | mandoc -Tlocale | less
.\" .Sh DIAGNOSTICS
.\" For sections 1, 4, 6, 7, & 8 only.
.\" .Sh ERRORS
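
The output rule described in the revised manual text (ASCII passes through, anything above it becomes a \[uNNNN] escape) can be modelled in a few lines. This is a sketch only: the function name recode is hypothetical, it works on already-decoded text and so does not model the pass-through of malformed bytes, and the choice of upper-case hex padded to at least four digits is an assumption about the exact output format.

```python
def recode(text):
    """Rewrite decoded text so that only ASCII characters remain.

    Characters below U+0080 are copied as-is; all others are written as
    \\[uNNNN] escapes (assumed here: upper-case hex, at least four digits).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            out.append("\\[u%04X]" % cp)
    return "".join(out)
```

For example, recode("na\u00efve") gives the string na\[u00EF]ve, which a formatter that understands the mandoc_char(7) escapes can render again.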