Page 230 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
to the topic because the same information is simply repeated at each point? Often this is an indicator
of poor writing or thinking by the author, but certainly it is not the indexer’s job to torture users by
compounding the error.
The current run of software that produces indexes is particularly bad at this, since a topic may be
mentioned dozens or even hundreds of times in a book. A professional indexer wanting to keep so
many entries on a topic would break them down into second-level entries. This is something software
cannot do using a simple algorithm. Only understanding the relationships of subentries to entries,
including the meanings of words, would allow this to be accomplished.
Given the use of modern word processors by authors, repetition is sometimes word-for-word. In
that case a computer indexing program would be able to recognize repetition. If the repetition is not
word-for-word, a program that does not understand the actual meanings of words will not spot the
repetition.
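The word-for-word case can be illustrated with a minimal Python sketch. It is not taken from any actual indexing product; the function name `find_verbatim_repeats` and the page-number-to-text input format are assumptions made for illustration. It normalizes each sentence and reports those that occur on more than one page.

```python
import re
from collections import defaultdict


def find_verbatim_repeats(pages):
    """Return sentences that occur word-for-word on more than one page.

    `pages` maps a page number to that page's text (a hypothetical
    input format chosen for this sketch).
    """
    seen = defaultdict(set)
    for page_no, text in pages.items():
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            # Normalize case and spacing so layout differences are ignored.
            key = " ".join(sentence.lower().split())
            if key:
                seen[key].add(page_no)
    # Keep only sentences found on two or more pages.
    return {s: sorted(p) for s, p in seen.items() if len(p) > 1}
```

A sentence repeated verbatim on two pages is reported with both page numbers. A paraphrase such as "Always save a copy of your data beforehand" is not matched against "Back up your files first", which is exactly the limitation described above: string comparison cannot see that the two sentences mean the same thing.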
Conversely, sometimes a passage that is in some sense repetitious is still important to index. An
example might be a warning about potential software errors. The wording might be the same, but it
may be important to the reader to be able to find all the cases that may generate an error. Only a
knowledgeable indexer with a sense of “importance” can correctly make case-by-case decisions
about whether an entry is likely to be useful instead of noisy.
Word Boundaries
There are many problems analogous to page-range determination (requiring the drawing of
boundaries) in the human language domain. Almost every ordinary word in the English language
carries with it the question of coverage. No adult adept at English would dispute that the following
sentence can be used to accurately describe a situation:
“That is not a cat; that’s a lion!”
And yet few would dispute the following assertion:
“A lion is a cat.”
Simplistic logic is not much help here. Some would argue that more precise use of the English
language would help: “That’s not a house cat!” Any particular difficulty might be overcome in this
way, but it is humans as a group who sort out such uses of language. If only a few nouns lacked
a tight definition, we might be tempted by the project of defining each one precisely.
Even in science and technology, precise, clearly limited subjects are in short supply. Make a definition
of most things in the world, and a set of questions can be easily generated (by humans) that point
out the tendency of the real world to blur. Take “Light Emitting Diode.” Well, what if it emits infrared
radiation? What if it is faulty? What if something appears to me to be an LED on an instrument panel,
but its light isn’t produced by a diode?
This problem is remarkably similar to (and in practical indexing connected to) the page range
problem. If a text switches from discussing a laser to discussing a maser, do I terminate the laser
locator and create a separate entry for maser? Is light a general term for electromagnetic radiation
(as in: the speed of light), or is it specific to the frequencies visible to the human eye? If there are 3
pages total on the topic of amplification by stimulated emission of radiation, and the laser/maser
divide appears to be accidental rather than fundamental, an indexer should take a different approach
than if there are 5 pages on lasers and 23 on masers. [I might create an entry such as: lasers, 23-27.
See also masers]
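Once the indexer has decided where a topic begins and ends, writing out the entry is purely mechanical. The following Python sketch shows only that mechanical step. The helper names `collapse_pages` and `format_entry` are hypothetical, and the separator conventions are an assumption modeled on the sample entry above. The boundary decision itself remains a human judgment that this code does not attempt.

```python
def collapse_pages(pages):
    """Collapse page numbers into consecutive runs, e.g. [23, 24, 25] -> '23-25'."""
    runs = []
    for p in sorted(set(pages)):
        if runs and p == runs[-1][1] + 1:
            # Page continues the current run: extend its end.
            runs[-1][1] = p
        else:
            # Gap found: start a new run.
            runs.append([p, p])
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def format_entry(heading, pages, see_also=()):
    """Build an index entry string with an optional 'See also' cross-reference."""
    entry = f"{heading}, {collapse_pages(pages)}"
    if see_also:
        entry += ". See also " + "; ".join(see_also)
    return entry
```

Calling `format_entry("lasers", range(23, 28), see_also=["masers"])` reproduces the sample entry "lasers, 23-27. See also masers". Whether page 27 is really the last page about lasers is the part no such function can answer.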
Verbs as well as nouns have their boundary issues. Concepts expressed in phrases, sentences, and
whole books have boundary issues. While it is true that there are mathematical models for probability,
overlaps, and topologies which have been applied with great success to problems such as quantum
physics, so far they have not been successfully applied to clarifying the meanings of human languages
for MIs.
LOVELY PROFESSIONAL UNIVERSITY 225