path: root/mandocdb.c
Commit message | Author | Age | Files | Lines
* Do not sync to disk after each individual manual page (duh!), | Ingo Schwarze | 2014-01-06 | 1 | -4/+6
  only sync to disk one single time when all data is ready.
  Rebuild times for /usr/share/man/mandoc.db shrink on my notebook:
    In standard mode from 45 seconds to 11 seconds (75% reduction)
    In -Q mode from 25 seconds to 3.1 seconds (87% reduction)
    For comparison: makewhatis(8): 4.2 seconds
  That is, in -Q mode, we are now *faster* than the existing
  makewhatis(8), and careful profiling shows there is still
  a lot of room for improvement.
* Fix mandocdb(8) -d and -u. | Ingo Schwarze | 2014-01-06 | 1 | -14/+35
  It was broken by recent optimizations.
* Rename dbindex() to dbadd() to be less confusing. | Ingo Schwarze | 2014-01-06 | 1 | -5/+5
  The concept of an index file is gone since the switch to SQLite.
  No functional change.
* Remove the redundant "file" column from the "mlinks" table. | Ingo Schwarze | 2014-01-06 | 1 | -3/+1
  The contents can easily be reconstructed from sec, arch, name, form.
  Shrinks the database by another 3% in standard mode and 9% in -Q mode.
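To illustrate why the "file" column carries no extra information, here is a minimal sketch that rebuilds a relative file name from the four remaining columns. The function name mlink_path() and the directory layout it assumes (man<sec>/[<arch>/]<name>.<sec> for source pages, cat<sec>/[<arch>/]<name>.0 for preformatted ones) are assumptions made for this example, not taken from mandocdb.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rebuild a relative file name from sec, arch, name and form.
 * Assumed layout (hypothetical, for illustration only):
 *   source pages:        man<sec>/[<arch>/]<name>.<sec>
 *   preformatted pages:  cat<sec>/[<arch>/]<name>.0
 * Returns what snprintf(3) returns: the length that was needed.
 */
int
mlink_path(char *buf, size_t bufsz, const char *sec, const char *arch,
    const char *name, int is_source)
{
	const char *dir = is_source ? "man" : "cat";
	const char *suffix = is_source ? sec : "0";

	assert(buf != NULL && sec != NULL && name != NULL);

	/* An empty or missing arch means "machine independent". */
	if (arch != NULL && *arch != '\0')
		return snprintf(buf, bufsz, "%s%s/%s/%s.%s",
		    dir, sec, arch, name, suffix);
	return snprintf(buf, bufsz, "%s%s/%s.%s", dir, sec, name, suffix);
}
```

Called with sec "1", an empty arch, name "cat" and is_source set, the sketch yields man1/cat.1 under the assumed layout.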
* Drop Nd from the mpages table, it is still in the keys table. | Ingo Schwarze | 2014-01-06 | 1 | -14/+1
  This shrinks the database in standard mode by 3%, in -Q mode by 9%,
  without loss of functionality.
* Add an option -Q (quick) to mandocdb(8) | Ingo Schwarze | 2014-01-05 | 1 | -7/+11
  for accelerated generation of reduced-size databases.
  Implement this by allowing the parsers to optionally
  abort the parse sequence after the NAME section.

  While here, garbage collect the unused void *arg attribute of
  struct mparse and mparse_alloc() and fix some errors in mandoc(3).

  This reduces the processing time of mandocdb(8) on /usr/share/man
  by a factor of 2 and the database size by a factor of 4.
  However, it still takes 5 times the time and 6 times the space
  of makewhatis(8), so more work is clearly needed.
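The idea of aborting the parse after the NAME section can be sketched on plain input lines. This is only an illustration: the real change lives inside the mandoc parsers, and quick_scan() below is a hypothetical helper that merely reports how many lines a quick scan would have to look at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the number of leading lines that must be examined in
 * "quick" mode: scanning stops at the first section header that
 * follows the NAME section.  Both mdoc(7) ".Sh" and man(7) ".SH"
 * headers are recognized.  Hypothetical helper, not mandoc code.
 */
size_t
quick_scan(const char *const *lines, size_t nlines)
{
	size_t i;
	int in_name = 0;

	assert(lines != NULL || nlines == 0);

	for (i = 0; i < nlines; i++) {
		if (strncmp(lines[i], ".Sh ", 4) != 0 &&
		    strncmp(lines[i], ".SH ", 4) != 0)
			continue;
		if (in_name)
			break;		/* NAME is complete: abort here */
		in_name = strcmp(lines[i] + 4, "NAME") == 0;
	}
	return i;
}
```

Everything after the returned index is never looked at, which is where the time and space savings described above would come from.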
* Rip out the complete "reachable" checks, without replacement. | Ingo Schwarze | 2014-01-05 | 1 | -68/+6
  It's a pity I spent time during t2k13 writing this; however, when
  an entire concept is busted, let us not look back.

  There is no such thing as an unreachable page.  Even if you are
  crazy enough to put a page starting with ".Dt NAMEI 9" into a file
  man1/cat.1, we now make sure that it can be found by all of the
  following:
    Nm=namei
    Nm=cat
    sec=1
    sec=9
  It will always be displayed as:
    cat(1) - pathname lookup
  So you know that you have to type `man cat` to get at it.
  That obsoletes the concept of "unreachable manuals" for good.
* Remove the obsolete file name column from the mpages table. | Ingo Schwarze | 2014-01-05 | 1 | -13/+1
  This column wasn't helpful because one manpage can have multiple MLINKS.
  Use the file name column in the mlinks table, instead.
* Remove the obsolete sec and arch columns from the mpages table. | Ingo Schwarze | 2014-01-05 | 1 | -7/+3
  They were confusing because a manpage can have MLINKS in different
  sections and architectures.
* Reimplement apropos -s NUM -S ARCH EXPR by internally converting it to | Ingo Schwarze | 2014-01-05 | 1 | -2/+4
    apropos \( EXPR \) -a 'sec~^NUM$' -a 'arch~^(ARCH|any)$'
  in preparation for removal of sec and arch from the mpage table.
  Almost no functional change except for the following bonus:
  This also makes sure that for cross-section and cross-arch MLINKs,
  all of the following work:
    apropos -s 1 encrypt
    apropos -s 8 encrypt
    apropos -s 1 makekey
    apropos -s 8 makekey
  While here, print error messages about invalid regexps to stderr.
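The rewrite described above can be sketched as plain string construction. The helper wrap_expr() is hypothetical; how apropos actually builds its internal expression is not shown in this log, so the code only reproduces the equivalent command line quoted in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the equivalent of
 *   \( EXPR \) -a 'sec~^NUM$' -a 'arch~^(ARCH|any)$'
 * in buf.  sec and arch may be NULL when -s or -S was not given.
 * Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper for illustration only.
 */
int
wrap_expr(char *buf, size_t bufsz, const char *expr,
    const char *sec, const char *arch)
{
	size_t used;
	int n;

	assert(buf != NULL && expr != NULL);

	n = snprintf(buf, bufsz, "\\( %s \\)", expr);
	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	used = (size_t)n;

	if (sec != NULL) {
		n = snprintf(buf + used, bufsz - used,
		    " -a 'sec~^%s$'", sec);
		if (n < 0 || (size_t)n >= bufsz - used)
			return -1;
		used += (size_t)n;
	}
	if (arch != NULL) {
		n = snprintf(buf + used, bufsz - used,
		    " -a 'arch~^(%s|any)$'", arch);
		if (n < 0 || (size_t)n >= bufsz - used)
			return -1;
	}
	return 0;
}
```

Because the section and architecture become ordinary search terms, a page is matched through any of its MLINKS, which is what makes the four encrypt/makekey queries above work.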
* Put section and architecture info into the keys table, | Ingo Schwarze | 2014-01-05 | 1 | -2/+10
  in preparation for removing them from the mpages table,
  aiming for cleaner and more uniform interfaces.

  Database growth is below 4%, part of which will be reclaimed.

  As a bonus, this allows searches like:
    ./obj/apropos An=kettenis -a arch=ppc
    ./obj/apropos An=kettenis -a sec~[^4]
* Avoid "utf8" in the names of a function and a struct member | Ingo Schwarze | 2014-01-02 | 1 | -18/+18
  that don't necessarily have anything to do with UTF-8.
  Just renaming, no functional change.
* Do not put UTF-8-encoded strings into the database by default, use ASCII. | Ingo Schwarze | 2014-01-02 | 1 | -16/+34
  Just like for mandoc(1), provide a -Tutf8 option for people who want that.
* Polish the mlink_add() interface: | Ingo Schwarze | 2014-01-02 | 1 | -26/+20
  Allocate memory inside, not in the callers.
  No functional change.
* Check all MLINKS for consistency with the content of the manual page, | Ingo Schwarze | 2014-01-02 | 1 | -42/+59
  not just the first one.
  This doesn't change how the check is done, but just which MLINKS
  are checked.
* Yet another regression introduced by Kristaps when he switched from | Ingo Schwarze | 2013-12-31 | 1 | -33/+5
  Berkeley DB to SQLite3: In the .In parser, the logic got inverted.
  The resulting NULL pointer access was found by clang;
  scan log provided by Ulrich Spoerlein <uqs at FreeBSD>.

  The best fix is to simply remove the whole, pointless custom handler
  function for .In and let the framework do its work.
  Now searching for included header files actually works.

  While here, remove the similarly pointless custom .St handler,
  fix the return value of the .Fd handler
  and disentangle the spaghetti in the .Nm handler.
* remove assignments that will be overwritten right afterwards, | Ingo Schwarze | 2013-12-31 | 1 | -1/+0
  and remove pointless local variables;
  found in a clang output from Ulrich Spoerlein <uqs at FreeBSD>
* Oops, that segfaulted after deleting an mlink from the list. | Ingo Schwarze | 2013-12-27 | 1 | -7/+9
  Fix the loop logic in mlinks_undupe().
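Crashes of this kind typically come from advancing through a list via a node that was just freed. The sketch below shows one common way to delete while iterating safely, using a pointer to the previous link. It is a generic illustration with hypothetical names; the actual loop in mlinks_undupe() is not reproduced in this log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mlink {
	struct mlink	*next;
	const char	*name;	/* not copied; owned by the caller */
};

/* Prepend a new entry; returns 0 on success, -1 if malloc fails. */
int
mlink_push(struct mlink **head, const char *name)
{
	struct mlink *ml;

	assert(head != NULL && name != NULL);
	if ((ml = malloc(sizeof(*ml))) == NULL)
		return -1;
	ml->name = name;
	ml->next = *head;
	*head = ml;
	return 0;
}

/* Delete every entry called name; returns the number removed. */
int
mlink_remove_named(struct mlink **head, const char *name)
{
	struct mlink **prev = head;
	struct mlink *cur;
	int removed = 0;

	while ((cur = *prev) != NULL) {
		if (strcmp(cur->name, name) == 0) {
			*prev = cur->next;	/* unlink before freeing */
			free(cur);
			removed++;
		} else
			prev = &cur->next;	/* advance only past kept nodes */
	}
	return removed;
}
```

The loop never reads cur->next after free(cur), because the successor is stored into *prev first.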
* Split mlinks_undupe() out of mpages_merge() | Ingo Schwarze | 2013-12-27 | 1 | -36/+52
  such that the check for source manuals of the same name can be
  done for multiple mlinks pointing to the same preformatted mpage.
* Save the MLINK name into the database, too; | Ingo Schwarze | 2013-12-27 | 1 | -2/+4
  apropos(1) will need it to display its results.
* Write more than one mlink per mpage to the database. | Ingo Schwarze | 2013-12-27 | 1 | -7/+10
  Not yet used by apropos(1).
* Allow saving more than one mlink per mpage in the mlinks ohash. | Ingo Schwarze | 2013-12-27 | 1 | -23/+0
  We are still only using one of them for now.
  Actually, we are now using a different one,
  but the order the mlinks are found is random anyway.
* Enable the framework code to allow more than one mlink per mpage. | Ingo Schwarze | 2013-12-27 | 1 | -2/+3
  Not used yet.
* Clean up the interface of mlink_add(). | Ingo Schwarze | 2013-12-27 | 1 | -69/+66
  Consistently use "fsec" and "fform" for info derived from the file name.
  No functional change.
* Another step on the way to clear naming, this time regarding mlinks: | Ingo Schwarze | 2013-12-27 | 1 | -76/+36
  * rename global ohash filenames to mlinks
  * rename ofadd() to mlink_add()
  * fold fileadd() and inoadd() into mlink_add()
  * fold filecheck() into mpages_merge()
  Still no functional change.
* Split struct mlink out of struct mpage. | Ingo Schwarze | 2013-12-27 | 1 | -82/+123
  Still a 1:1 relation, no functional change yet.
* Add an additional mlinks table to the database, redundant for now, | Ingo Schwarze | 2013-12-27 | 1 | -21/+52
  both because it contains nothing but a subset of the data of the
  existing mpages table and because the relationship of mpage and
  mlink entries is still 1:1.  But all that will eventually change.
* Drop the mpages_list, use the existing mpages ohash for iteration. | Ingo Schwarze | 2013-12-26 | 1 | -23/+27
  No functional change except that the order of database entries changes,
  which doesn't matter anyway.
* The name "id" is terrible for a struct. | Ingo Schwarze | 2013-12-26 | 1 | -14/+14
  Make this more searchable by calling it "inodev".
  No functional change.
* To better support MLINKS, we will have to split the "docs" database | Ingo Schwarze | 2013-12-26 | 1 | -178/+183
  table into two tables, one for actual files on disk, one for (often
  multiple) directory entries pointing to them.  That implies splitting
  struct of into two structs, to be called "mpage" and "mlink",
  respectively.  As a preparation, globally rename "of" and "inos"
  to "mpage".
  No functional change.
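The planned split can be pictured as a one-to-many relation between a page and its names. The sketch below uses the struct names from the commit message, but every member and the helper mpage_link() are assumptions made for illustration; the real definitions in mandocdb.c are not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* One directory entry (file name) pointing to a manual page. */
struct mlink {
	struct mlink	*next;	/* next name of the same page */
	const char	*name;	/* e.g. "man1/encrypt.1"; not copied */
};

/* One actual file on disk, reachable through one or more mlinks. */
struct mpage {
	struct mlink	*mlinks;	/* list of all names */
	size_t		 nlinks;	/* length of that list */
};

/*
 * Allocate an mlink and attach it to page.  The allocation happens
 * here rather than in the caller.  Returns NULL if calloc fails.
 * Hypothetical helper, not the real mlink_add().
 */
struct mlink *
mpage_link(struct mpage *page, const char *name)
{
	struct mlink *ml;

	assert(page != NULL && name != NULL);
	if ((ml = calloc(1, sizeof(*ml))) == NULL)
		return NULL;
	ml->name = name;
	ml->next = page->mlinks;
	page->mlinks = ml;
	page->nlinks++;
	return ml;
}
```

With this shape, a page installed under two names simply carries two mlink entries instead of being stored twice.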
* Stop parsing man(7) input when we found all we were searching for, | Ingo Schwarze | 2013-12-26 | 1 | -1/+4
  such that we don't trigger an assertion on a duplicate NAME section.
* The man(7) language has no syntax to specify architectures, but it | Ingo Schwarze | 2013-10-27 | 1 | -10/+11
  can still be used to write architecture-specific manuals, of course.
  So just derive the architecture a man(7) manual belongs to from the
  directory where it is located and refrain from warning about each
  and every architecture-specific man(7) manual found.

  While here, delete some trailing whitespace in the neighbourhood.
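Deriving the architecture from the location can be sketched as taking the middle component of a relative path. The helper arch_from_path() and the assumed path shape man<sec>/<arch>/<file> are illustrative only; the log does not show how mandocdb.c extracts the directory name.

```c
#include <assert.h>
#include <string.h>

/*
 * If path looks like "<secdir>/<arch>/<file>", copy <arch> into
 * arch and return 1.  Return 0 for machine-independent paths such
 * as "<secdir>/<file>" or if the buffer is too small.
 * Hypothetical helper for illustration only.
 */
int
arch_from_path(const char *path, char *arch, size_t archsz)
{
	const char *first, *second;
	size_t len;

	assert(path != NULL && arch != NULL);

	if ((first = strchr(path, '/')) == NULL)
		return 0;
	if ((second = strchr(first + 1, '/')) == NULL)
		return 0;	/* only a section directory */

	len = (size_t)(second - first - 1);
	if (len == 0 || len >= archsz)
		return 0;
	memcpy(arch, first + 1, len);
	arch[len] = '\0';
	return 1;
}
```

For an input such as man4/i386/apm.4 the sketch reports i386, while man1/cat.1 is treated as machine independent.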
* The code in ofmerge() only tried the source parsers if at least one | Ingo Schwarze | 2013-10-27 | 1 | -1/+1
  of the path (/man1/ .. /man9/) or the file name suffix (*.1 .. *.9)
  indicated a source manual.  That missed source manuals with unusual
  names in unusual locations.

  Instead, as the existing comment right above already suggests, try
  the source parsers unless both the path and the file name suffix
  unambiguously indicate a preformatted manual (/cat*/*.0).

  This change is not expensive in practice because no real-world
  system will have large numbers of preformatted pages outside
  /cat*/*.0.

  The only way to make information loss even less probable would be
  to try the source parsers on all files, even /cat*/*.0, which
  wouldn't buy us much because no real-world system will call source
  manuals /cat*/*.0, but it will be expensive in practice, because
  many real-world systems have large numbers of preformatted pages
  called /cat*/*.0.
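The new rule can be written as a small predicate. This is a sketch of the decision as the commit message states it; the function name try_source_parsers() and its arguments (the section directory name and the file name suffix, already split out by the caller) are assumptions for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the source parsers should be tried, 0 if the file is
 * unambiguously preformatted.  Only the combination of a "cat*"
 * directory AND a "0" suffix counts as unambiguous.
 * Hypothetical helper for illustration only.
 */
int
try_source_parsers(const char *dir, const char *suffix)
{
	int dir_is_cat, suffix_is_zero;

	assert(dir != NULL && suffix != NULL);

	dir_is_cat = strncmp(dir, "cat", 3) == 0;
	suffix_is_zero = strcmp(suffix, "0") == 0;

	return !(dir_is_cat && suffix_is_zero);
}
```

Under the old rule, a file with an unusual suffix in an unusual directory would have skipped the source parsers; here it is parsed as source because neither indicator says "preformatted".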
* delete duplicate NULL check and polish style; | Ingo Schwarze | 2013-10-27 | 1 | -5/+5
  no functional change
* Fix an assertion in dbindex(): Null strings are never entered into the | Ingo Schwarze | 2013-10-18 | 1 | -1/+1
  string table.  Fortunately, they never need UTF-8 translation either.
* Manuals to be checked with "mandocdb -t" need not be in the current | Ingo Schwarze | 2013-10-18 | 1 | -1/+1
  directory or one of its subdirectories.
* For the strings table, ohash_init is only called in ofmerge(), | Ingo Schwarze | 2013-07-02 | 1 | -16/+15
  so move the str_info structure into that function.
  No functional change.
* Turning off synchronous mode logically belongs to opening the database, | Ingo Schwarze | 2013-07-02 | 1 | -9/+11
  so move the statement into the function dbopen().
* Restore the check whether each page added to the database | Ingo Schwarze | 2013-07-02 | 1 | -18/+86
  is actually reachable by man(1).
  This check got lost when the database backend
  was changed from Berkeley to sqlite.
* The mdoc_handler flags are unused and will never be used. | Ingo Schwarze | 2013-06-07 | 1 | -126/+123
  Having a mask is sufficient to trigger putmdockey.
  Simplify by dropping the flags; no functional change.
* In .Xr database entries, mention the manual section again; | Ingo Schwarze | 2013-06-07 | 1 | -2/+13
  the section was dropped when switching from db to sqlite.
  Use the customary format foo(N).
* The string hash table is only needed to combine multiple occurrences | Ingo Schwarze | 2013-06-07 | 1 | -109/+67
  of the same string within the same manual, so initialize and purge
  it for each manual in ofmerge() instead of one single time in main().

  There is no point in saving manual names and descriptions in that
  table because each of them occurs only once, or very few times.
  There is no point in saving section numbers there because they are
  so much shorter than the descriptions.

  Testing with the complete tree /usr/share/man/ on my notebook shows
  that this change reduces memory consumption by about 20% while there
  is no measurable difference in execution time.

  As a bonus, this allows deleting the functions stradd() and
  stradds(), the "next" member from struct str, and the global
  struct str *words.

  While adapting the places in the code using stradd(), I noticed
  that parsing of the mdoc(7) .Nd macro was completely broken and
  that for formatted manual pages with unusable NAME section, the
  description was never set in the struct of.  This commit fixes
  both bugs as well.
* Optimize stradds() and putkeys() to not call ohash_qlookupi() | Ingo Schwarze | 2013-06-06 | 1 | -44/+27
  and ohash_find() twice.  As a bonus, this allows dropping hashget().
  While here, rename index to slot to match the terminology in the
  ohash manual; it also prevents potential clashes with index(3).
  Drop the slot variable altogether where it is used only once.
* Drop wordaddbuf() which is identical to putkeys(). | Ingo Schwarze | 2013-06-06 | 1 | -21/+8
  Also rename straddbuf() to stradds() to be more similar to putkeys().
  Just cleanup, no functional change.
* In dbopen(), check success of remove("mandoc.db~"). | Ingo Schwarze | 2013-06-06 | 1 | -21/+14
  While here, simplify dbopen() and dbclose(): No need for strlcpy()
  and strlcat() when dealing with constant strings only.
* In parse_catpage(), the comment saying that the filename would be | Ingo Schwarze | 2013-06-06 | 1 | -0/+1
  used as a default page description if no usable NAME section was
  found was preserved when moving from db to sqlite, but the code line
  actually doing that was removed without replacement.
  So, put it back.
* The return value from parse_man() is completely unused, | Ingo Schwarze | 2013-06-05 | 1 | -13/+9
  so make the function void; no functional change.
* Two sanity checks got lost in treescan() | Ingo Schwarze | 2013-06-05 | 1 | -29/+36
  during the switch from db to sqlite; restore these:
  * Warn and skip when directory and file name mismatch.
  * Warn and skip when finding special files.
  * Warning about "mandocdb.db" is useless, it is always present.
  * While here, do not hardcode "mandocdb.db", use MANDOC_DB.
* Add back the realpath() checks that got lost during the change from | Ingo Schwarze | 2013-06-05 | 1 | -121/+159
  db to sqlite; they are needed to prevent corruption of the database
  when paths containing dot, dotdot, or symlinks are given on the
  command line.
  Also make sure the exit-code is really non-zero on system errors
  and use mandoc(1) exit codes.

  To make all this simpler,
  * Drop the "basedir" argument from almost every function and make it
    global because it is really state info used all over the place.
  * Move "startdir" and "fd" as local vars into set_basedir() because
    they are only used for this one purpose, i.e. to move out of
    basedir again.

  While here,
  * Clarify the name of path_arg in the main program; in the -C case,
    it is not a dir, and anyway there are lots of different dirs around.
  * Include missing <stdio.h> needed for perror().
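A check of this kind might look like the sketch below: both the base directory and the argument are canonicalized with realpath(3) before comparing them, so that dot, dotdot, and symlinks cannot make a file appear to be inside or outside the tree. The function names are hypothetical, and the sketch relies on the POSIX.1-2008 behaviour of realpath() allocating the result when the second argument is NULL.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for realpath(3) on strict compilers */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if resolved equals base or lies below it.
 * Both arguments must already be canonical absolute paths.
 */
int
path_is_below(const char *base, const char *resolved)
{
	size_t len;

	assert(base != NULL && resolved != NULL);

	len = strlen(base);
	if (strncmp(base, resolved, len) != 0)
		return 0;
	if (len > 0 && base[len - 1] == '/')
		return 1;	/* base is "/": everything is below */
	/* Reject siblings that merely share a prefix, e.g. man vs manual. */
	return resolved[len] == '\0' || resolved[len] == '/';
}

/*
 * Canonicalize both paths, then compare.
 * Returns 1 or 0 like path_is_below(), or -1 if realpath() fails.
 */
int
arg_is_below(const char *base, const char *arg)
{
	char *rbase, *rarg;
	int rc = -1;

	rbase = realpath(base, NULL);
	rarg = realpath(arg, NULL);
	if (rbase != NULL && rarg != NULL)
		rc = path_is_below(rbase, rarg);
	free(rbase);
	free(rarg);
	return rc;
}
```

Comparing the raw strings instead would accept an argument such as /usr/share/man/../../../etc as being inside /usr/share/man, which is the kind of input the restored checks are meant to guard against.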
* Some places used PATH_MAX from <limits.h>, some MAXPATHLEN from <sys/param.h>. | Ingo Schwarze | 2013-06-05 | 1 | -16/+16
  Consistently use the PATH_MAX since it is specified by POSIX,
  while MAXPATHLEN is not.
  In preparation for using this at a few more places.