aboutsummaryrefslogtreecommitdiffstats
path: root/doc/expert/exa_inde
blob: c3ddb934c4b500dfec25d2cb2aecd3ccf8b28d84 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
@Section
   @Title { Merged index entries }
   @Tag { exa_inde }
@Begin
@PP
Getting index entries to merge correctly has been quite a struggle.
It is easy to specify what is wanted, but Lout lacks the lists and
objects (in the object-oriented sense) that would make the
implementation straightforward.  The whole problem was reanalysed
for Version 3.26, reimplemented, tested more carefully than is
usually necessary in Lout, and proved correct as follows.
@PP
We ignore page number ranges in this proof.  It is not hard to
show that they will be handled correctly too, provided they
do not overlap with other entries with the same key.  The
effect of such overlaps is undefined, leaving us nothing to
prove.  We also assume that every entry with a given
key has the same label, including any format (that is, the same
initial part before the page number).  If labels differ the
result is undefined and there is nothing to prove.
@PP
We will prove that raw entries always have the form
@ID @Code "label &0.03fu {}"
and that non-raw entries always have the form
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2"
where the pattern may repeat for any number of page numbers
{@Code pn1}, {@Code pn2}, etc.  In addition, the page numbers
will be distinct, monotone increasing, and consist of exactly
the numbers in the original unmerged entries.
@PP
These expressions are not the simplest that would give the
correct appearance.  Without @Code "&0.03fu {}" the code
would not work correctly, as will be explained below.  Without
@@OneCol the commas would be subject to an optimization which
can merge them into the previous word.  It's too difficult to
explain when this optimization will and will not be applied;
suffice to say that it will sometimes not happen when melding,
and this will cause @@Meld to get its equality testing wrong,
so it must be prevented from happening at all.
@PP
Our proof is by induction on the number of entries merged
together.  First, we need to establish the base cases.  If the
index entry is raw, the following expression is used to define
its value:
@ID @Code "label &0.03fu {}"
If the index entry is non-raw, the following expression is
used to define its value:
@ID @Code "label &"0.03fu" {}{@OneCol ,} pn"
where @Code pn is the page number or page number range of
the entry.  In each case we clearly have an entry that
satisfies all the requirements of the theorem.
@PP
Now consider what happens when we come to merge two
entries.  The code used to carry out this merge is
@ID @OneRow @Code @Verbatim {
def @Merge left x right y
{
    { x @Rump { x @Meld y } } @Case
    {
        ""   @Yield  x
        else @Yield  { { x{@OneCol ,} } @Meld y }
    }

}
}
where @Code x is the first entry and {@Code y} is
the second.
@PP
We call the expression
@ID @Code "x @Rump { x @Meld y }"
the {@I discriminant}, since it determines which case
to apply.  We will track this in detail below, but
approximately, its function is to determine whether @Code y
contains something that is different from anything in
{@Code x}.  If so, then @Code "x @Meld y" differs from
@Code "x" and the discriminant is non-empty; if not,
@Code "x @Meld y" is equal to @Code "x" and the discriminant
is empty.
@PP
The first entry, @Code { x }, may be raw or non-raw, and the
second, @Code { y }, may also be raw or non-raw, together
giving four cases, which we take in turn.
@PP
If both entries are raw, then by assumption they have the
same labels and so are identical.  Thus, @Code "x @Meld y"
equals @Code { x }, the discriminant is empty,
and the result is @Code { x }, which is correct.
@PP
If @Code { x } is raw and @Code { y } is non-raw, then
the discriminant is non-empty and the result is the meld
of two objects, the first having the form
@ID @Code "label &0.03fu {}{@OneCol ,}"
being @Code "x" with a comma appended, and
the second being some non-raw entry such as
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2"
where the pattern may repeat.  We are assuming by induction
that @Code y has this form.  Clearly, this meld gives a
value equal to @Code { y }, which is the correct result.
@PP
If @Code { x } is non-raw and @Code { y } is raw, the
@@Meld in the discriminant melds two values typified by
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2"
and
@ID @Code "label &0.03fu {}"
The result of this is @Code { x } with an empty object added
at the end.  This empty object is the second element of @Code { y },
which is not equal to any element of @Code { x }:  the second
element of @Code x is not @Code "{}" but rather
@Code { "{}{@OneCol ,}" }, because @@Meld treats immediately adjacent
objects as single elements.  The result of @@Rump is then this
extra empty object, so the discriminant is the empty object and
we return @Code { x }, correctly.  It is this case that requires
us to use {@Code "0.03fu"}; without it we would be melding
@ID @Code "label{@OneCol ,} pn1{@OneCol ,} pn2"
with
@ID @Code "label"
producing
@ID @Code "label{@OneCol ,} pn1{@OneCol ,} pn2 label"
leading to a non-empty discriminant and the wrong answer.
@PP
This leaves just the case where both @Code x and @Code y
are non-raw.  We will divide this last case into three
sub-cases, but first we need some general observations.
@PP
Index entries are sorted for merging in the order in
which their anchor points appear in the final printed
document.  This means that over the course of these
entries the page numbers are non-decreasing.  It is
therefore clear that, although the order of merging
is undefined (actually a balanced tree order is used),
whenever two entries are presented for merging, all the
page numbers in the first entry are no larger than all
the page numbers in the second entry.  We are also
assuming inductively that the page numbers in each entry
are distinct and monotone increasing.  Thus, there can
be at most one page number common to any two entries
being merged, and if there is one in common it is the
last page number of the first entry and the first of
the second.
@PP
Our first sub-case is when the two entries have no
page number in common.  Since @Code { y } is non-raw,
it has a page number not equal to any page number in
@Code { x }.  Therefore the discriminant is non-empty
and the result is the meld of @Code "x{@OneCol ,}"
with @Code { y }, which for example could be the
meld of
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2{@OneCol ,}"
with
@ID @Code "label &0.03fu {}{@OneCol ,} pn3{@OneCol ,} pn4"
This will give the right answer, since @@Meld treats
adjacent objects as single elements, and always incorporates
elements from the first parameter first when it has a choice.
@PP
Our second sub-case is when the two entries have a
page number in common and @Code { y } has two or more
page numbers.  The common page number must be the last
of @Code x and the first of @Code { y }, so again
@Code { y } has something (its last page number)
distinct from @Code { x }, the discriminant is non-empty,
and we end up for example melding
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2{@OneCol ,}"
with
@ID @Code "label &0.03fu {}{@OneCol ,} pn2{@OneCol ,} pn3"
Again it's clear that the meld will produce the right
answer; in fact, this second sub-case could be unified
with the first sub-case.
@PP
Our third sub-case is when the two entries have a page
number in common and @Code { y } has only one page
number.  In this case, typified by @Code { x } with value 
@ID @Code "label &0.03fu {}{@OneCol ,} pn1{@OneCol ,} pn2"
and @Code y with value
@ID @Code "label &0.03fu {}{@OneCol ,} pn2"
it is clear that @Code y offers nothing new, the
discriminant is empty, and the result, quite correctly,
is @Code { x }.  This completes the proof.
@End @Section