Page 244 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




            Retrieval and ranking tools
            Document abstracting
            Book indexing
            Indexicon
            Effect of automatic methods on professionals
            References
            Introduction
            This section examines developments in automatic indexing and abstracting, in which the computer
            creates the index and abstract with little or no human intervention. The emphasis is on practical
            applications rather than theoretical studies. It does not cover computer-aided indexing, in which
            computers enhance the work of human indexers, or indexing of the Internet.
            Research into automatic indexing and abstracting has been under way since the late 1950s. Early
            reports claimed success, but practical applications have been limited. Computer indexing and
            abstracting are now used commercially, with prospects for wider use in the future. The
            history of automatic indexing and abstracting is well covered by Lancaster (1991).

            Database Indexing

            The simplest method for indexing articles for bibliographic databases is extraction indexing, in
            which terms are extracted from the text of the article for inclusion in the index. The frequency of
            words in the article is determined, and the words which are found most often are included in the
            index. Alternatively, the words which occur most often in the article compared to their occurrence
            in the rest of the database, or in normal language, are included. This method can also take into
            account word stems (so that run and running are recognised as referring to the same concept), and
            can recognise phrases as well as single words.
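
            The two extraction approaches described above can be sketched in a few lines of Python. This is
            a minimal illustration, not a production indexer: the stop list, the crude suffix-stripping
            stemmer, the function names and the smoothing used for the background comparison are all
            assumptions made for the example, and phrase recognition is left out.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "was", "on"}


def crude_stem(word):
    """Very rough suffix stripping (an illustrative stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling so that "running" -> "runn" -> "run".
            if len(stem) >= 4 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def term_counts(text):
    """Count stemmed words in the text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words if w not in STOP_WORDS)


def top_terms(text, k=5):
    """Method 1: index under the k most frequent stems in the article."""
    return [term for term, _ in term_counts(text).most_common(k)]


def distinctive_terms(text, background_counts, background_total, k=5):
    """Method 2: prefer stems that are frequent in the article relative to
    their frequency in the rest of the database (background_counts is
    assumed to be keyed by stem)."""
    counts = term_counts(text)
    total = sum(counts.values())

    def ratio(term):
        doc_rate = counts[term] / total
        # Add-one smoothing (an assumption) avoids division by zero
        # for stems never seen in the background collection.
        bg_rate = (background_counts.get(term, 0) + 1) / (background_total + 1)
        return doc_rate / bg_rate

    return sorted(counts, key=ratio, reverse=True)[:k]


if __name__ == "__main__":
    article = "The runner was running. Running a run is fun. The index lists the run."
    print(top_terms(article, 3))
    print(distinctive_terms(article, {"run": 1000, "index": 1}, 10000, 3))
```

            The first function ranks purely by frequency within the article; the second favours words that
            are unusually common in the article compared with the wider collection, so a word that is
            frequent everywhere is less likely to be chosen.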




                        Computer extraction indexing is more consistent than human extraction indexing.
                        However, most human indexing is not simple extraction indexing, but is
                        assignment indexing, in which the terms used in the index are not necessarily
                        those found in the text.

            Assignment Indexing
            For assignment indexing, the computer has a thesaurus, or controlled vocabulary, which lists all
            the subject headings which may be used in the index. For each of these subject headings it also has
            a list of profile words. These are words which, when found in the text of the article, indicate that the
            thesaurus term should be allocated.
            For example, for the thesaurus term childbirth, the profile might include the words: childbirth,
            birth, labor, labour, delivery, forceps, baby, and born. As well as the profile, the computer also has
            criteria for inclusion: instructions as to how often, and in what combination, the profile words
            must be present for that thesaurus term to be allocated.
            The criteria might say, for example, that if the word childbirth is found ten times in an article, then
            the thesaurus term childbirth will be allocated. However, if the word delivery is found ten times in
            an article, this in itself is not enough to warrant allocation of the term childbirth, as delivery could
            refer to other subjects, such as mail delivery. The criteria in this case would specify that the
            word delivery must occur a certain number of times, along with one or more of the other words in
            the profile.
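
            The childbirth example can be sketched as follows. This is a hedged illustration only: the data
            structures PROFILES and CRITERIA, the function assign_terms, the threshold of ten for every
            profile word and the split into "strong" and ambiguous words are assumptions made for the
            example, not a description of any particular indexing system.

```python
import re
from collections import Counter

# Hypothetical profile for one thesaurus term, based on the example in the text
# (both spellings of "labour" are included as an assumption).
PROFILES = {
    "childbirth": {
        "childbirth", "birth", "labor", "labour",
        "delivery", "forceps", "baby", "born",
    },
}

# Hypothetical inclusion criteria. A "strong" word allocates the term on its
# own once it reaches the threshold; any other profile word must also be
# accompanied by at least one different profile word.
CRITERIA = {
    "childbirth": {
        "strong": {"childbirth"},
        "threshold": 10,
        "min_supporting_words": 1,
    },
}


def assign_terms(text):
    """Return the thesaurus terms whose inclusion criteria the text satisfies."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    assigned = []
    for term, profile in PROFILES.items():
        rule = CRITERIA[term]
        for word in sorted(profile):
            if counts[word] < rule["threshold"]:
                continue  # this word does not occur often enough
            if word in rule["strong"]:
                assigned.append(term)
                break
            # Ambiguous word: count how many *other* profile words appear.
            supporting = sum(
                1 for other in profile if other != word and counts[other] > 0
            )
            if supporting >= rule["min_supporting_words"]:
                assigned.append(term)
                break
    return assigned


if __name__ == "__main__":
    print(assign_terms("delivery " * 10 + "of the mail"))
    print(assign_terms("delivery " * 10 + "of the baby"))
```

            With these assumed criteria, ten occurrences of delivery in a text about mail do not trigger the
            term, whereas the same count alongside another profile word such as baby does.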



