author Ingo Schwarze <schwarze@openbsd.org> 2016-08-01 12:27:15 +0000
committer Ingo Schwarze <schwarze@openbsd.org> 2016-08-01 12:27:15 +0000
commit 650e61d7ec09f05839e8744e003a7dc376facf73
tree c4439cc21465b7d2b4e171f4cc8c7e83aeef1aaf
parent 5ce556afa6b71ea3e8cb0f4bd06ab4cbee499ef9
document the new file format
-rw-r--r-- mandoc.db.5 230
1 files changed, 151 insertions, 79 deletions
diff --git a/mandoc.db.5 b/mandoc.db.5
index d0ccfb6c..2f688193 100644
--- a/mandoc.db.5
+++ b/mandoc.db.5
@@ -1,6 +1,6 @@
.\" $Id$
.\"
-.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
+.\" Copyright (c) 2014, 2016 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
@@ -23,7 +23,7 @@
.Sh DESCRIPTION
The
.Nm
-SQLite3 file format is used to store information about installed manual
+file format is used to store information about installed manual
pages to facilitate semantic searching for manuals.
Each manual page tree contains its own
.Nm
@@ -34,87 +34,156 @@ for examples.
Such database files are generated by
.Xr makewhatis 8
and used by
+.Xr man 1 ,
.Xr apropos 1
and
.Xr whatis 1 .
.Pp
-One line in the following tables describes:
-.Bl -tag -width Ds
-.It Sy mpages
-One physical manual page file, no matter how many times and under which
-names it may appear in the file system.
-.It Sy mlinks
-One entry in the file system, no matter which content it points to.
-.It Sy names
-One manual page name, no matter whether it appears in a page header,
-in a NAME or SYNOPSIS section, or as a file name.
-.It Sy keys
-One chunk of text from some macro invocation.
+The file format uses three datatypes:
+.Pp
+.Bl -dash -compact -offset 2n -width 1n
+.It
+32-bit signed integer numbers in big endian (network) byte ordering
+.It
+NUL-terminated strings
+.It
+lists of NUL-terminated strings, terminated by a second NUL character
.El
.Pp
-Each record in the latter three tables uses its
-.Va pageid
-column to point to a record in the
-.Sy mpages
-table.
+Numbers are aligned to four-byte boundaries; where they follow
+strings or lists of strings, padding with additional NUL characters
+occurs.
+Some, but not all, numbers point to positions in the file.
+These pointers are measured in bytes, and the first byte of the
+file is considered to be byte 0.
+.Pp
+Each file consists of:
+.Pp
+.Bl -dash -compact -offset 2n -width 1n
+.It
+One magic number, 0x3a7d0cdb.
+.It
+One version number, currently 1.
+.It
+One pointer to the macros table.
+.It
+One pointer to the final magic number.
+.It
+The pages table (variable length).
+.It
+The macros table (variable length).
+.It
+The magic number once again, 0x3a7d0cdb.
+.El
.Pp
-The other columns are as follows; unless stated otherwise, they are
-of type
-.Vt TEXT .
-.Bl -tag -width mpages.desc
-.It Sy mpages.desc
-The description line
-.Pq Sq \&Nd
-of the page.
-.It Sy mpages.form
-An
-.Vt INTEGER
-bit field.
-If bit
-.Dv FORM_GZ
-is set, the page is compressed and requires
-.Xr gunzip 1
-for display.
-If bit
-.Dv FORM_SRC
-is set, the page is unformatted, that is in
+The pages table contains one entry for each physical manual page
+file, no matter how many hard and soft links it may have in the
+file system.
+The pages table consists of:
+.Pp
+.Bl -dash -compact -offset 2n -width 1n
+.It
+The number of pages in the database.
+.It
+For each page:
+.Bl -dash -compact -offset 2n -width 1n
+.It
+One pointer to the list of names.
+.It
+One pointer to the list of sections.
+.It
+One pointer to the list of architectures
+or 0 if the page is machine-independent.
+.It
+One pointer to the one-line description string.
+.It
+One pointer to the list of filenames.
+.El
+.It
+For each page, the list of names.
+Each name is preceded by a single byte indicating the sources of the name.
+The meaning of the bits is:
+.Bl -dash -compact -offset 2n -width 1n
+.It
+0x10: The name appears in a filename.
+.It
+0x08: The name appears in a header line, i.e. in a .Dt or .TH macro.
+.It
+0x04: The name is the first one in the title line, i.e. it appears
+in the first .Nm macro in the NAME section.
+.It
+0x02: The name appears in any .Nm macro in the NAME section.
+.It
+0x01: The name appears in an .Nm block in the SYNOPSIS section.
+.El
+.It
+For each page, the list of sections.
+Each section is given as a string, not as a number.
+.It
+For each architecture-dependent page, the list of architectures.
+.It
+For each page, the one-line description string taken from the .Nd macro.
+.It
+For each page, the list of filenames relative to the root of the
+respective manpath.
+This list includes hard links, soft links, and links simulated
+with .so
+.Xr roff 7
+requests.
+The first filename is preceded by a single byte
+having the following significance:
+.Bl -dash -compact -offset 2n -width 1n
+.It
+.Dv FORM_SRC No = 0x01 :
+The file format is
.Xr mdoc 7
or
-.Xr man 7
-format, and requires
-.Xr mandoc 1
-for display.
-If bit
-.Dv FORM_SRC
-is not set, the page is formatted, i.e. a
-.Sq cat
-page.
-.It Sy mlinks.sec
-The manual section as found in the subdirectory name.
-.It Sy mlinks.arch
-The manual architecture as found in the subdirectory name, or
-.Qq any .
-.It Sy mlinks.name
-The manual name as found in the file name.
-.It Sy names.bits
-An
-.Vt INTEGER
-bit mask telling whether the name came from a header line, from the
-NAME or SYNOPSIS section, or from a file name.
-Bits are defined in
-.In mansearch.h .
-.It Sy names.name
-The name itself.
-.It Sy keys.bits
-An
-.Vt INTEGER
-bit mask telling which semantic contexts the key was found in;
-defined in
-.In mansearch.h ,
-documented in
+.Xr man 7 .
+.It
+.Dv FORM_CAT No = 0x02 :
+The manual page is preformatted.
+.El
+.It
+Zero to three NUL bytes for padding.
+.El
+.Pp
+The macros table consists of:
+.Pp
+.Bl -dash -compact -offset 2n -width 1n
+.It
+The number of different macro keys, currently 36.
+The ordering of macros is defined in
+.In mansearch.h
+and the significance of the macro keys is documented in
.Xr apropos 1 .
-.It Sy keys.key
-The string found in those contexts.
+.It
+For each macro key, one pointer to the respective macro table.
+.It
+For each macro key, the macro table (variable length).
+.El
+.Pp
+Each macro table consists of:
+.Pp
+.Bl -dash -compact -offset 2n -width 1n
+.It
+The number of entries in the table.
+.It
+For each entry:
+.Bl -dash -compact -offset 2n -width 1n
+.It
+One pointer to the value of the macro key.
+Each value is a string of text taken from some macro invocation.
+.It
+One pointer to the list of pages.
+.El
+.It
+For each entry, the value of the macro key.
+.It
+Zero to three NUL bytes for padding.
+.It
+For each entry, one or more pointers to pages in the pages table,
+pointing to the pointer to the list of names,
+followed by the number 0.
.El
.Sh FILES
.Bl -tag -width /usr/share/man/mandoc.db -compact
@@ -128,10 +197,16 @@ Window System.
The same for
.Xr packages 7 .
.El
+.Pp
+A program to dump
+.Nm
+files in a human-readable format suitable for
+.Xr diff 1
+is provided in the directory
+.Pa /usr/src/regress/usr.bin/mandoc/db/dbm_dump/ .
.Sh SEE ALSO
.Xr apropos 1 ,
.Xr man 1 ,
-.Xr sqlite3 1 ,
.Xr whatis 1 ,
.Xr makewhatis 8
.Sh HISTORY
@@ -140,7 +215,7 @@ A manual page database
first appeared in
.Bx 2 .
The present format first appeared in
-.Ox 5.6 .
+.Ox 6.1 .
.Sh AUTHORS
.An -nosplit
The original version of
@@ -148,9 +223,6 @@ The original version of
was written by
.An Bill Joy
in 1979.
-An SQLite3 version was first implemented by
-.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
-in 2012.
The present database format was designed by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
-in 2014.
+in 2016.
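
The layout documented by this patch can be illustrated with a small reader. What follows is a minimal sketch in Python, not the mandoc implementation: the function names (read_i32, read_list, parse_header, parse_pages, build_toy) are hypothetical, strings are assumed to decode as UTF-8, and the toy file built for the demonstration carries a macros table with zero keys, whereas the manual text says real files currently have 36. Only the header and the pages table are parsed.

```python
import struct

MAGIC = 0x3A7D0CDB  # magic number stated in the manual text


def read_i32(buf, pos):
    """Read one 32-bit signed big-endian integer at byte offset pos."""
    return struct.unpack_from(">i", buf, pos)[0]


def read_list(buf, pos):
    """Read NUL-terminated strings until the second NUL that ends the list."""
    items = []
    while buf[pos] != 0:
        end = buf.index(b"\0", pos)
        items.append(buf[pos:end])
        pos = end + 1
    return items


def parse_header(buf):
    """Return (version, macros pointer, pointer to the final magic number)."""
    magic, version, macros_ptr, end_ptr = struct.unpack_from(">4i", buf, 0)
    if magic != MAGIC or read_i32(buf, end_ptr) != MAGIC:
        raise ValueError("bad magic number")
    return version, macros_ptr, end_ptr


def parse_pages(buf):
    """Decode the pages table, which starts right after the 16-byte header."""
    pages = []
    for i in range(read_i32(buf, 16)):
        names, sects, archs, desc, files = struct.unpack_from(
            ">5i", buf, 20 + 20 * i)
        flist = read_list(buf, files)
        pages.append({
            # Each name carries a leading byte of source bits (0x10 ... 0x01).
            "names": [(n[0], n[1:].decode()) for n in read_list(buf, names)],
            "sections": [s.decode() for s in read_list(buf, sects)],
            # A pointer of 0 marks a machine-independent page.
            "archs": [a.decode() for a in read_list(buf, archs)] if archs else [],
            "desc": buf[desc:buf.index(b"\0", desc)].decode(),
            # Only the first filename is preceded by the FORM_* byte.
            "form": flist[0][0],
            "files": [flist[0][1:].decode()] + [f.decode() for f in flist[1:]],
        })
    return pages


def build_toy():
    """Build a one-page file in memory (assumption: zero macro keys)."""
    names = b"\x1fls\0\0"              # all five source bits set
    sects = b"1\0\0"
    desc = b"list directory contents\0"
    files = b"\x01man1/ls.1\0\0"       # FORM_SRC = 0x01
    body = names + sects + desc + files
    body += b"\0" * (-len(body) % 4)   # pad so the next number is aligned
    base = 40                          # header (16) + page count (4) + 5 pointers (20)
    p_sects = base + len(names)
    p_desc = p_sects + len(sects)
    p_files = p_desc + len(desc)
    macros_ptr = base + len(body)
    end_ptr = macros_ptr + 4           # macros table here is just a zero count
    head = struct.pack(">4i", MAGIC, 1, macros_ptr, end_ptr)
    pages = struct.pack(">6i", 1, base, p_sects, 0, p_desc, p_files)
    return head + pages + body + struct.pack(">2i", 0, MAGIC)
```

Calling parse_pages(build_toy()) yields one dictionary for the synthetic ls page. The sketch relies on the leading name and filename bytes being nonzero, so that a zero byte at the start of a list position can only be the list terminator; a reader for untrusted files would also need bounds checks on every pointer.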