Page 244 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
Retrieval and ranking tools
Document abstracting
Book indexing
Indexicon
Effect of automatic methods on professionals
References
Introduction
This unit examines developments in automatic indexing and abstracting, in which the computer
creates the index and abstract with little or no human intervention. The emphasis is on practical
applications rather than theoretical studies. The unit does not cover computer-aided indexing, in
which computers enhance the work of human indexers, or indexing of the Internet.
Research into automatic indexing and abstracting has been under way since the late 1950s. Early
reports claimed success, but practical applications have been limited. Computer indexing and
abstracting are now used commercially, and wider use is expected. The history of automatic
indexing and abstracting is well covered by Lancaster (1991).
Database Indexing
The simplest method for indexing articles for bibliographic databases is extraction indexing, in
which terms are extracted from the text of the article for inclusion in the index. The frequency of
words in the article is determined, and the words which are found most often are included in the
index. Alternatively, the words which occur most often in the article compared to their occurrence
in the rest of the database, or in normal language, are included. This method can also take into
account word stems (so that run and running are recognised as referring to the same concept), and
can recognise phrases as well as single words.
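The two selection rules described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not a description of any commercial system: the stop list,
the crude_stem function (a rough suffix stripper, not a real stemming algorithm) and the scoring
ratio are all hypothetical choices made for the example, and phrase recognition is left out.

```python
import re
from collections import Counter

# Hypothetical stop list for the example; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are", "for", "on", "with"}


def crude_stem(word):
    """Very rough suffix stripping, for illustration only (not a real stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo a doubled final consonant: running -> runn -> run
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def tokenize(text):
    """Lower-case the text, drop stop words, and reduce each word to a crude stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def extract_terms(article, top_n=5):
    """Rule 1: index the stems that occur most often in the article."""
    counts = Counter(tokenize(article))
    return [term for term, _ in counts.most_common(top_n)]


def extract_distinctive_terms(article, database, top_n=5):
    """Rule 2: index the stems that are frequent in the article relative to the database."""
    article_counts = Counter(tokenize(article))
    db_counts = Counter(t for doc in database for t in tokenize(doc))
    article_total = sum(article_counts.values())
    db_total = sum(db_counts.values())

    def score(term):
        article_rate = article_counts[term] / article_total
        # Add 1 so that stems absent from the database do not cause division by zero.
        db_rate = (db_counts[term] + 1) / (db_total + 1)
        return article_rate / db_rate

    return sorted(article_counts, key=score, reverse=True)[:top_n]
```

With an article in which cat is the most frequent word, extract_terms would pick cat, whereas
extract_distinctive_terms would prefer a less frequent word if cat is also common throughout the
rest of the database.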
Computer extraction indexing is more consistent than human extraction indexing.
However, most human indexing is not simple extraction indexing, but is
assignment indexing, in which the terms used in the index are not necessarily
those found in the text.
Assignment Indexing
For assignment indexing, the computer has a thesaurus, or controlled vocabulary, which lists all
the subject headings which may be used in the index. For each of these subject headings it also has
a list of profile words. These are words which, when found in the text of the article, indicate that the
thesaurus term should be allocated.
For example, for the thesaurus term childbirth, the profile might include the words: childbirth,
birth, labour, labor, delivery, forceps, baby, and born. As well as the profile, the computer also
has criteria for inclusion: instructions as to how often, and in what combination, the profile
words must be present for that thesaurus term to be allocated.
The criteria might say, for example, that if the word childbirth is found ten times in an article, then
the thesaurus term childbirth will be allocated. However, if the word delivery is found ten times in
an article, this in itself is not enough to warrant allocation of the term childbirth, as delivery could
be referring to other subjects such as mail delivery. The criteria in this case would specify that the
word delivery must occur a certain number of times, along with one or more of the other words in
the profile.
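The childbirth example can be sketched as a small Python routine. This is a minimal illustration
under stated assumptions: the PROFILES structure, the key names sufficient and needs_support, and
the function assign_headings are all hypothetical, and the threshold of ten for delivery is chosen
only for the example (the text says just "a certain number of times").

```python
import re
from collections import Counter

# Hypothetical profile table: one entry per thesaurus heading.
PROFILES = {
    "childbirth": {
        # Profile words taken from the example in the text.
        "words": {"childbirth", "birth", "labour", "delivery", "forceps", "baby", "born"},
        # Words that justify the heading on their own once they reach this count.
        "sufficient": {"childbirth": 10},
        # Ambiguous words: they must reach this count AND at least one other
        # profile word must also appear. The value 10 is an assumed threshold.
        "needs_support": {"delivery": 10},
    },
}


def assign_headings(text, profiles=PROFILES):
    """Return the thesaurus headings whose inclusion criteria the text satisfies."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    assigned = []
    for heading, profile in profiles.items():
        # Criterion 1: an unambiguous word occurs often enough by itself.
        if any(counts[w] >= n for w, n in profile["sufficient"].items()):
            assigned.append(heading)
            continue
        # Criterion 2: an ambiguous word occurs often enough and is backed up
        # by at least one other word from the same profile.
        for word, n in profile["needs_support"].items():
            others = profile["words"] - {word}
            if counts[word] >= n and any(counts[o] > 0 for o in others):
                assigned.append(heading)
                break
    return assigned
```

Under these assumed criteria, an article that mentions delivery ten times but contains no other
profile word (a piece on mail delivery, say) is not given the heading childbirth, while the same
article with a single mention of forceps is.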
LOVELY PROFESSIONAL UNIVERSITY 239