Diffstat (limited to 'tlgu/tlgu.1')
-rw-r--r-- | tlgu/tlgu.1 | 192 |
1 files changed, 192 insertions, 0 deletions
diff --git a/tlgu/tlgu.1 b/tlgu/tlgu.1
new file mode 100644
index 0000000..d3cc149
--- /dev/null
+++ b/tlgu/tlgu.1
@@ -0,0 +1,192 @@
+.TH tlgu 1 "Feb, 2005" "Version 1.1" "TLG to Unicode Converter"
+.SH NAME
+
+tlgu \- convert TLG (D) CD-ROM txt files to Unicode
+
+.SH SYNOPSIS
+.B tlgu
+[
+.I options
+]
+.I input_file
+.I output_file
+
+.SH DESCRIPTION
+.B tlgu
+will convert an \fIinput_file\fP from Thesaurus Linguae Graecae (TLG) representation
+to a Unicode (UTF-8) \fIoutput_file\fP. The TLG representation consists of \fBbeta-code\fP
+text and \fBcitation\fP information.
+
+.SH OPTIONS
+.TP
+.B \-b
+inserts a form feed and citation information (levels a, b, c, d) on every "book" citation
+change. By default the program will output line feeds only (see also \fB\-p\fP).
+.TP
+.B \-p
+observes paging instructions.
+By default the program will output line feeds only.
+.TP
+.B \-r
+primarily Roman text. Some TLG texts, notably doccan1.txt and doccan2.txt, are mainly
+Roman texts lacking explicit language change codes. Setting this option will force
+a change to Roman text after each citation block is encountered.
+.TP
+.B \-v
+highest-level reference citation is included before each text line (v-level)
+.TP
+.B \-w
+reference citation is included before each text line (w-level)
+.TP
+.B \-x
+reference citation is included before each text line (x-level)
+.TP
+.B \-y
+reference citation is included before each text line (y-level)
+.TP
+.B \-z
+lowest-level reference citation is included before each text line (z-level).
+.sp 1
+.TP
+.B \-B
+inserts blank space (a tab) before each and every line.
+.TP
+.B \-C
+citation debug information is output.
+.TP
+.B \-S
+special code debug information is output.
+.TP
+.B \-V
+block processing information is output (verbose).
+.TP
+.B \-W
+each work (book) is output as a separate file in the form output_file-xxx.txt
+
+.SH HISTORY AND INTENDED USE
+The purpose of \fBtlgu\fP is to translate binary TLG-format files into readable and editable text.
+It is based on an earlier program written in 80x86 assembly language (1996) outputting codes for
+a home-made font which used the prevalent Hellenic font encodings of that time complemented
+by dead accent characters - not very attractive, but readable.
+.sp 1
+Then came Unicode and a plethora of accented character glyphs; nice-looking but
+with the well-known drawback that special processing is needed to do wild-card searches.
+Nice polytonic fonts have now been made available (Cardo, Gentium, Athena, Athenian,
+Porson) and, surely, these will be expanded as special-use code points are included
+in the Unicode definition (musical symbols, other special symbols) and more fonts will be created.
+.sp 1
+So, at this point in time, \fBtlgu\fP will crunch a file which has been formatted
+according to the published TLG-D format and produce codes for most glyphs
+generally available. No attempt has been made to introduce multi-character sequences
+or formatting codes (font changes). If a code has not been defined, the program will output
+the respective "code family" glyph. You may use the \fB\-S\fP option to check such codes
+against the published beta code definition.
+.sp 1
+You may not like the character output for a specific code. Check out the \fBtlgcodes.h\fP file
+containing the special symbol and punctuation codes and select one to suit you better. It will
+probably be a while before the beta to Unicode correspondence settles down.
+
+
+.SH EXAMPLES
+.B ./tlgu -r DOCCAN2.TXT doccanu.txt
+Translate the TLG canon to a Unicode text file. Note the use of the \fB-r\fP option (this file
+expects Roman as the default font).
+.TP
+.B ./tlgu -x -y -z TLG1799.TXT tlg1799u.txt
+Generate a continuous file with the texts of grandpa Euclides. Available citations (-x -y -z)
+are Book//demonstratio/line as shown in the respective "cit" field of doccan2.txt.
+.TP
+.B ./tlgu -b -B TLG1799.TXT tlg1799u.txt
+Generate the same texts, this time with a page feed and book citation information on the first
+page of each book and a tab before each line (use with OOo versions earlier than 1.1.4).
+.TP
+.B ./tlgu -C TLG1799.TXT tlg1799u.txt
+See how the citation information changes within each TLG block.
+.TP
+.B ./tlgu -S TLG1799.TXT tlg1799u.txt | sort > symbols1799.txt
+Check out the symbols used in a work. Book and x, y, z references are printed on a separate
+line for each symbol. Sort / grep the output to locate specific symbols of interest; save in
+a file for later use.
+.TP
+.B ./tlgu -W TLG0006.TXT tlg0006u
+Will produce separate files for each work, named tlg0006u-001.txt etc.
+
+.SH POST-PROCESSING EXAMPLES
+I use the OpenOffice suite for most of my work. This example shows one of many possible
+ways of using the search and replace facility to create a readable version of the Suda lexicon.
+.TP
+.B ./tlgu -B TLG4085.TXT tlg4085u.txt
+A Unicode file with the text is created
+.TP
+.B Open the generated file with OOo:
+File | Open | Filename: tlg4085u.txt,
+File Type: Text Encoded \-\- Press Open
+.sp 1
+The ASCII Filter Options window appears. Select the Unicode (UTF-8) character set and
+a proper Unicode font installed in your machine (e.g. Cardo). Press OK.
+.TP
+.B Replace angle brackets with expanded text
+Lexicon terms are enclosed in <angle brackets>. The actual beta codes indicate the use of
+expanded text for emphasis. Select Edit | Find & Replace. The \fBFind & Replace\fP window appears.
+.sp 1
+In the \fBSearch For\fP field, type the following expression: \fB<[^<>]*>\fP
+This means "find any characters between angle brackets, not including angle brackets".
+.sp 1
+In the \fBReplace With\fP window insert a single ampersand: \fB&\fP
+This means that we need to \fBadd\fP formatting information (this case) or additional text to
+the text found. Press \fBFormat...\fP and select the \fBPosition\fP tab; select Spacing
+Expanded by 2.0 points. Press OK.
+.sp 1
+Check the \fBRegular Expressions\fP box and press \fBReplace All\fP.
+.sp 1
+You may now replace the angle brackets with nothing.
+.sp 1
+Repeat the above procedure for titles enclosed in {braces}. Write a macro...
+.TP
+.B Other useful information
+In the "Execute" tab of the "Properties" window of my KDesktop Link to Application
+I have the following command (single line):
+.br
+\fBLC_CTYPE=el_GR.UTF-8 /whereitsat/OpenOffice.org1.1.x/soffice\fP
+.br
+The prefix, an environment variable, allows you to use the same program with different locales;
+in this case, Hellenic Unicode (UTF-8).
+.sp 1
+I put my default locale and keyboard definitions in my \fB.profile\fP:
+.br
+.na
+.B export LC_CTYPE=el_GR.UTF-8
+.br
+.na
+.B setxkbmap us+el polytonic -option grp:ctrl_shift_toggle
+.br
+.sp 1
+This way multi-lingual text can be entered; keyboard layout switching is done by pressing Ctrl/Shift.
+.SH REFERENCES
+There are several texts describing the internal representation of \fBPHI\fP and
+\fBTLG\fP text, ID data, citation data and index files. The originator of this
+format is the Packard Humanities Institute. The TLG is maintained by UCI \- see
+\fBwww.tlg.uci.edu\fP \- where you may find the \fBTLG Beta Code Manual\fP and the
+\fBTLG Beta Code Quick Reference Guide\fP.
+.sp 1
+Unicode Consortium publications pertaining to the codification
+of characters used in Hellenic literature, scientific and musical texts.
+.sp 1
+The OpenOffice suite (\fBwww.openoffice.org\fP) includes a word processor that you
+can use to load, process and create new polytonic texts.
+
+.SH COPYRIGHT
+Copyright (C) 2004, 2005 Dimitri Marinakis (dm ssa gr).
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License (version 2) as published by
+the Free Software Foundation.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA