Page 230 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




            to the topic because the same information is simply repeated at each point? Often this is an indicator
            of poor writing or thinking by the author, but certainly it is not the indexer’s job to torture users by
            compounding the error.
            The current generation of index-producing software is particularly bad at this, since a topic may be
            mentioned dozens or even hundreds of times in a book. A professional indexer wanting to keep so
            many entries on a topic would break them down into second-level entries. This is something software
            cannot do using a simple algorithm. Only understanding the relationships of subentries to entries,
            including the meanings of words, would allow this to be accomplished.
            Given the use of modern word processors by authors, repetition is sometimes word-for-word. In
            that case a computer indexing program would be able to recognize repetition. If the repetition is not
            word-for-word, a program that does not understand the actual meanings of words will not spot the
            repetition.
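The distinction above can be illustrated with a minimal Python sketch. It assumes the text is already split into (page, passage) pairs; the function names and sample passages are hypothetical, chosen only for illustration. Hashing the normalized text catches word-for-word repetition, but a paraphrase produces a different hash and goes unnoticed.

```python
import hashlib
import re
from collections import defaultdict


def normalize(passage: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return re.sub(r"\s+", " ", passage.strip().lower())


def find_verbatim_repeats(passages):
    """Return lists of pages whose normalized passage text is identical.

    `passages` is a hypothetical list of (page, text) pairs.
    """
    pages_by_digest = defaultdict(list)
    for page, text in passages:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        pages_by_digest[digest].append(page)
    # Keep only passages that occur on more than one page.
    return [pages for pages in pages_by_digest.values() if len(pages) > 1]


# Illustrative data: pages 12 and 47 repeat word-for-word (apart from
# spacing); page 90 says much the same thing in different words.
sample = [
    (12, "Save your work before running the update."),
    (47, "Save  your work before running the update."),
    (90, "Back up your files prior to updating."),
]
repeats = find_verbatim_repeats(sample)
print(repeats)
```

In this sketch only pages 12 and 47 are grouped together. Page 90 is missed because the comparison works on characters, not on meaning.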
            Conversely, sometimes a passage that is in some sense repetitious is still important to index. An
            example might be a warning about potential software errors. The wording might be the same, but it
            may be important to the reader to be able to find all the cases that may generate an error. Only a
            knowledgeable indexer with a sense of “importance” can correctly make case-by-case decisions
            about whether an entry is likely to be useful instead of noisy.

            Word Boundaries

            There are many problems analogous to page-range determination (requiring the drawing of
            boundaries) in the human language domain. Almost every ordinary word in the English language
            carries with it the question of coverage. No adult adept at English would dispute that the following
            sentence can be used to accurately describe a situation:
            “That is not a cat; that’s a lion!”
            And yet few would dispute the following assertion:
            “A lion is a cat.”
            Simplistic logic is not much help here. Some would argue that more precise use of the English
            language would help: “That’s not a house cat!” Any particular difficulty might be overcome in this
            way, but it is humans as a group who sort out such uses of language. If just a few nouns were
            lacking a tight definition, we might be tempted by the project.
            Even in science and technology, precise, clearly limited subjects are in short supply. Make a definition
            of most things in the world, and a set of questions can be easily generated (by humans) that point
            out the tendency of the real world to blur. “Light Emitting Diode.” Well, what if it emits infrared
            radiation? What if it is faulty? What if something appears to me to be an LED on an instrument panel,
            but its light isn’t produced by a diode?
            This problem is remarkably similar to (and in practical indexing connected to) the page range
            problem. If a text switches from discussing a laser to discussing a maser, do I terminate the laser
            locator and create a separate entry for maser? Is light a general term for electromagnetic radiation
            (as in: the speed of light), or is it specific to the frequencies visible to the human eye? If there are 3
            pages total on the topic of amplification by stimulated emission of radiation, and the laser/maser
            divide appears to be accidental rather than fundamental, an indexer should take a different approach
            than if there are 5 pages on lasers and 23 on masers. [I might create an entry such as: lasers, 23-27.
            See also masers]
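The mechanical half of such an entry can be sketched in Python, under the assumption that the indexer has already decided which pages belong to the topic. The helper names `collapse_pages` and `format_entry` are hypothetical. The code only merges consecutive page numbers and appends a cross-reference; whether consecutive pages really form one discussion, and whether lasers and masers deserve separate entries, remains the indexer's decision.

```python
def collapse_pages(pages):
    """Merge page numbers into (start, end) runs of consecutive pages."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page  # extend the current run
        else:
            runs.append([page, page])  # start a new run
    return [tuple(run) for run in runs]


def format_entry(heading, pages, see_also=()):
    """Format an index entry with page ranges and optional cross-references."""
    locators = ", ".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in collapse_pages(pages)
    )
    entry = f"{heading}, {locators}"
    if see_also:
        entry += ". See also " + "; ".join(see_also)
    return entry


# The pages are assumed to have been chosen by a human indexer.
entry = format_entry("lasers", [23, 24, 25, 26, 27], see_also=["masers"])
print(entry)
```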
            Verbs as well as nouns have their boundary issues. Concepts expressed in phrases, sentences, and
            whole books have boundary issues. While it is true that there are mathematical models for probability,
            overlaps, and topologies which have been applied with great success to problems such as quantum
            physics, so far they have not been successfully applied to clarifying the meanings of human languages
            for MIs.



                                             LOVELY PROFESSIONAL UNIVERSITY                                   225