Page 228 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
Language is again connected to human action and to cultural and social issues,
and a given natural language is not one single well-defined thing, which is why
forms of sublanguages also have to be considered. Research in automatic indexing
is no longer primarily a question of better computers, but of better
understanding of human language and the social actions that this language
serves.
Assigned indexing that is not merely a simple substitution of document terms with synonyms,
but that represents independent conceptualizations of document contents, may turn out to be the
most important area in which human indexing performs better than automatic indexing (for example,
assigning “romantic poem” to a poem that does not describe itself as such).
Indexing Books: Lessons in Language Computations
A common reaction from computer professionals, when told that back-of-book indexes are still
written by human beings, is: “Don’t they use computers to do that now?” The answer, “No,” must
be followed by the almost redundant explanation: “Because no one has been able to write a
software program that can index books well.” The key word here is “well.” (Those indexers who
like ripostes might ask computer programmers why humans are still needed to do computer
programming.)
What computers can do easily is generate a list of words or phrases in a work (book or manual) with
the pages (or other locators or pointers) where the words or phrases appear, and arrange the list in
alphabetical order. This gives the product some of the appearance of a professionally produced
index. Such an index can be of some value to humans who need to find information in a book, but
it is nowhere near as valuable as an index produced by a professional human indexer.
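The kind of word list described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method given in the text: the function names `build_concordance` and `format_concordance`, the `sample_pages` data, and the simple rule for what counts as a word are all assumptions made for the example.

```python
import re
from collections import defaultdict


def build_concordance(pages):
    """Map each lowercase word to the sorted page numbers where it appears.

    `pages` is a hypothetical input format: a dict of page number -> page text.
    """
    locators = defaultdict(set)
    for page_number, text in pages.items():
        # Assumed tokenization: runs of letters, with an optional apostrophe part.
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            locators[word].add(page_number)
    # Arrange the entries alphabetically, as a printed index would be.
    return {word: sorted(nums) for word, nums in sorted(locators.items())}


def format_concordance(concordance):
    """Render each entry as 'word, locator, locator, ...'."""
    return [
        f"{word}, {', '.join(str(n) for n in nums)}"
        for word, nums in concordance.items()
    ]


# Made-up sample text, for illustration only.
sample_pages = {
    1: "Indexing is an art.",
    2: "A computer can list words.",
    3: "Words alone are not an index.",
}

if __name__ == "__main__":
    for line in format_concordance(build_concordance(sample_pages)):
        print(line)
```

Even on this tiny sample the limits are visible: the list gives "a" and "an" the same standing as "index", and it treats "index" and "indexing" as unrelated entries. It records where words occur, not what a passage is about.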
If a book is in electronic format it may sometimes be easier for a user to use a search function than
an index to find information. But to search well (efficiently and effectively) a user must have some
of the same skills and knowledge of the book’s topics as a professional book indexer. A good index
is not just a list of words with pointers (locators, in publishing jargon). A good index is a structure
optimized to help two human minds meet.
In addition to knowing the formal rules of indexing, professional indexers have developed a number
of rules of thumb that help them to produce indexes that are highly valuable to book users.
Moreover, even mediocre human book indexers perform certain activities, with definite results and
with little conscious effort, that are exceptionally hard for a computer program to do at present.
Human intelligence is clearly still superior to machine intelligence in the indexing game.
Examining why quality indexing is such a difficult task for machine indexers (MIs) will illuminate
aspects of the nature of the relationships between indexes, books, language, and real-world
knowledge. A number of paradigms will be considered for computer models of language, structuring
data, and creating useful indexes (both back-of-book and of more generalized sorts).
Useful indexes could be used to solve a number of problems aside from rapidly
looking up a subject in a book. They have applications in many aspects of real-world
problem solving. In fact, the vocabulary of a language is itself an index (of sorts). For
humans, starting as babies, learning about the world is an indexing process. Failing to
index the world properly leads directly to failure to function well in the world. Adding
indexing intelligence to machines would greatly enhance their ability to function in the world.
LOVELY PROFESSIONAL UNIVERSITY 223