Page 232 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics

Again, an algorithm could generate this. It is much more compact than a full coverage. The problem
is it expects too much of the user. First, the user often has to do two lookups instead of one. In
addition, users often don’t know the terms they need to look up. For instance, a reader does not
know the name of the LowObject, but only of the Mid1Object. The reader then has to find Mid2Object
to find the name LowObject.
Good human indexers can produce an index of any reasonable length that minimizes user lookup
time and maximizes user success rates. The result, for our example, is almost always somewhere
between the complete coverage and absolute minimal coverage.
Human indexers can do that because they work with three maps in their heads: the map of the book
or text being indexed, the map of the subject area, and the map of the knowledge levels and mental
habits of likely users. A good indexer will know in great detail when to use full coverage and when
to be selective. Could a computer program and database accomplish the same? None do yet. When
creating an index of a book that has subject hierarchy issues, a human indexer will rely heavily on
the concept of “importance”.

Importance

The main point of book indexing is to speed up human retrieval of meaningful information. For
that reason, over-indexing, which may lead to multiple fruitless searches, is not a good solution. At
the same time, printing a complete (in terms of coverage) index is usually ruled out by cost
considerations.
One decision professionals weigh carefully while indexing a work is which topics require
entries and which do not. Indexers of books who do not understand the
subject matter may take a machine-like approach to this task. Their rule might be: if it is a noun,
index it. If the publisher is not interested in providing the reader of the book with an index that is
a quarter as long as the book itself, even when set in 6-point type, the indexer will be asked to
shorten the index, which is to say, guess at which entries are important.
As usual, professional indexers have provided some rules of thumb for this. The most basic is: the
more the author writes about a subject, the more important it is. A topic with an entire chapter
devoted to it is more important than a topic that has a couple of pages devoted to it, which in turn
is more important than a topic covered in a single paragraph or sentence. At the bottom of the priority
list are topics that are merely mentioned.
We can imagine, if the page-range problem can be solved, that an MI could use the above general
rule to measure the importance of a term becoming an index entry. Given the allowed length of the
printed index, the terms with least importance could be eliminated with great precision.
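
The length-based rule of thumb can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any existing indexing program: the `Candidate` class, the `words_devoted` measure, the `trim_index` function, and the sample terms and word counts are all assumptions made up for the example. It also assumes the amount of text devoted to each term has already been measured.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    # Hypothetical measure: number of words the author devotes to the term.
    words_devoted: int


def trim_index(candidates, max_entries):
    """Keep the max_entries terms with the most text devoted to them.

    Returns the surviving terms in alphabetical order, as a printed index would.
    """
    ranked = sorted(candidates, key=lambda c: c.words_devoted, reverse=True)
    kept = ranked[:max_entries]
    return sorted(c.term for c in kept)


# Invented sample data, ordered from a whole chapter down to a passing mention.
candidates = [
    Candidate("classification", 6000),  # an entire chapter
    Candidate("thesaurus", 900),        # a couple of pages
    Candidate("descriptor", 120),       # a single paragraph
    Candidate("Dewey, Melvil", 5),      # merely mentioned
]

selected = trim_index(candidates, 3)
print(selected)
```

With room for three entries, the term that is merely mentioned is the one dropped. The ranking uses text length as its only signal.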
But we know that a single sentence, say a key definition, may be more important than covering
longer lengths of text that add little to the discussion. So, given a goal of helping a human user, the
indexer’s knowledge base and judgement are going to do far better at sorting terms in order of
importance than any algorithm based on text length.
In fact, human indexers make judgements as to importance as they read the text; they are often able
to draft an extremely usable index approximating a required length (say 5% of the overall text
length) on a single pass.

Helping the User

Professional indexers have many rules and guidelines for constructing indexes, some of which have
been identified above. Naturally, some indexers are more rule-oriented than others. Whenever a
set of rules must be obeyed in a complex terrain, one or more rules will at times conflict with
each other.
The overriding rule, when creating book indexes, is to help the user. This may seem like a very
vague rule, but all human index writers are also index users. Hopefully they use indexes of books

LOVELY PROFESSIONAL UNIVERSITY 227
