Page 254 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
There is no advantage to be found by adding adjectives to the phrase rule.
It is also important not to forget individual nouns, as they seem to convey much meaning.
The comparison between these last two graphs points strongly in favor of noun phrases as the
phrase rule.
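A noun-phrase rule of this kind can be sketched in a few lines of Python. This is only an illustration, not the method used in the experiments: it assumes the text has already been part-of-speech tagged, that the tag "N" marks a noun, and that the rule accepts every run of one to three consecutive nouns (the patterns N, NN and NNN). The function name noun_phrases and the tag set are hypothetical.

```python
from typing import List, Tuple

NOUN_TAG = "N"   # hypothetical tag for nouns; every other tag counts as a non-noun
MAX_LEN = 3      # longest accepted pattern (NNN)


def noun_phrases(tagged: List[Tuple[str, str]], max_len: int = MAX_LEN) -> List[str]:
    """Return every run of 1..max_len consecutive nouns in the tagged tokens."""
    phrases: List[str] = []
    n = len(tagged)
    for start in range(n):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n:
                break
            window = tagged[start:end]
            if all(tag == NOUN_TAG for _, tag in window):
                phrases.append(" ".join(word for word, _ in window))
            else:
                # A non-noun ends the run, so longer windows cannot match either.
                break
    return phrases


tagged_sentence = [
    ("information", "N"), ("retrieval", "N"), ("system", "N"),
    ("is", "V"), ("fast", "A"),
]
print(noun_phrases(tagged_sentence))
```

Because single nouns match the pattern N, they are kept alongside the longer phrases, which is in line with the remark above about not forgetting individual nouns. The adjective "fast" is ignored, since adjectives are not part of the rule.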
Sample vs. Whole
Figure 11.11: Precision versus recall (both axes from 0 to 100), with curves labelled Original, TIPSAMP and TIPFULL.
The difference between a thesaurus constructed with a full database (TIPFULL) and one constructed
by taking 1 of every 5 documents from this same collection (TIPSAMP) is practically nonexistent.
The ideal sample size was not calculated, though. The phrase rule used is “noun-phrases” (that is,
{NNN, NN, N}).
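Taking 1 of every 5 documents can be illustrated with a simple systematic sample. The text does not say how the documents were picked (at regular intervals or at random), so the sketch below is an assumption: it keeps every fifth document in collection order. The function name sample_every_kth and the offset parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_every_kth(documents: Sequence[T], k: int = 5, offset: int = 0) -> List[T]:
    """Keep one document out of every k, starting at position `offset`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 <= offset < k:
        raise ValueError("offset must satisfy 0 <= offset < k")
    return list(documents[offset::k])


collection = [f"doc{i}" for i in range(100)]   # stand-in for the full database
sample = sample_every_kth(collection, k=5)
print(len(collection), len(sample))
```

With k = 5 the sample holds one fifth of the collection, so a thesaurus built from it is processed from far less text than one built from the full database.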
A related result shows that a thesaurus built for one collection can be used successfully to improve
search in a separate but similar one.
Online use of the thesaurus (short queries)
Figure 11.12: Precision (0 to 80) versus recall (0 to 100), with curves labelled Original, Dup, Nodup and Both.
LOVELY PROFESSIONAL UNIVERSITY 249