Page 164 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 9: Cataloguing and Subject Indexing: Principles and Practices



            First Step in Indexing

            The first step in indexing is to decide on the subject matter of the document. In manual indexing, the
            indexer considers the subject matter by answering a set of questions such as “Does the
            document deal with a specific product, condition or phenomenon?” Because the analysis is influenced by
            the knowledge and experience of the indexer, two indexers may analyse the same content
            differently and so arrive at different index terms. This inconsistency affects the success of retrieval.

            Automatic vs. Manual Subject Analysis

            Automatic indexing follows set processes that analyse the frequencies of word patterns and compare
            the results with other documents in order to assign documents to subject categories. This requires no
            understanding of the material being indexed and therefore leads to more uniform indexing, but at the
            expense of interpreting the true meaning. A computer program does not understand the meaning of
            statements and may therefore fail to assign some relevant terms, or may assign terms incorrectly.
            Human indexers focus their attention on certain parts of the document, such as the title, abstract,
            summary and conclusions, because analysing the full text in depth is costly and time-consuming. An
            automated system removes the time constraint and allows the entire document to be analysed, but it
            can also be directed to particular parts of the document.
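
            The frequency-and-comparison process described above can be illustrated with a minimal sketch.
            It assumes cosine similarity over raw word counts as the comparison method, which is one common
            choice and not a method the text prescribes. The reference documents, category names and
            function names are all hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count how often each lowercase alphabetic word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical, already-categorised reference texts (invented for this sketch).
REFERENCE = {
    "Medicine": "insulin glucose patients treatment diabetes clinical insulin",
    "Librarianship": "catalogue index library retrieval index vocabulary catalogue",
}

def assign_category(text, reference=REFERENCE):
    """Assign the category whose reference text is most similar to the document."""
    counts = word_counts(text)
    scores = {
        category: cosine_similarity(counts, word_counts(ref_text))
        for category, ref_text in reference.items()
    }
    return max(scores, key=scores.get)

print(assign_category("The library built an index to improve retrieval."))
print(assign_category("Glucose levels fell after insulin treatment."))
```

            The sketch compares word counts only. It has no notion of what a sentence means, so a document
            that discusses a subject in unfamiliar wording can be placed in the wrong category.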

            Second Stage
            The second stage of indexing involves translating the subject analysis into a set of index terms.
            This can involve extracting terms from the document or assigning them from a controlled vocabulary.
            With full-text search now widely available, many people have come to rely on their own
            expertise in conducting information searches, and full-text search has become very popular.
            Subject indexing and its experts (professional indexers, cataloguers and librarians) remain crucial
            to information organization and retrieval. These experts understand controlled vocabularies and
            are able to find information that cannot be located by full-text search. The cost of expert analysis to
            create subject indexing is not easily compared with the cost of the hardware, software and labor needed
            to produce a comparable set of full-text, fully searchable materials. With new web applications
            that allow every user to annotate documents, social tagging has gained popularity, especially on the
            Web.
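
            The difference between assigning terms from a controlled vocabulary and relying on full-text
            search can be sketched as follows. The vocabulary, the sample sentences and the function name
            are hypothetical, and the matching is deliberately naive (plain substring lookup).

```python
# Hypothetical controlled vocabulary: variant wording -> preferred index term.
CONTROLLED_VOCABULARY = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}

def assign_index_terms(text, vocabulary=CONTROLLED_VOCABULARY):
    """Return the preferred terms whose variant wordings appear in the text."""
    lowered = text.lower()
    return sorted(
        {preferred for variant, preferred in vocabulary.items() if variant in lowered}
    )

doc_a = "Patients with high blood pressure face a greater risk of heart attack."
doc_b = "Hypertension is a known risk factor for myocardial infarction."

# Both documents receive the same preferred terms despite different wording.
print(assign_index_terms(doc_a))
print(assign_index_terms(doc_b))

# A literal full-text search for "heart attack" would miss the second document.
print("heart attack" in doc_b.lower())
```

            Because both documents are indexed under the same preferred terms, a search on the index term
            retrieves both, whereas a literal full-text query matches only the document that happens to use
            the searcher's wording.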




                     One application of indexing, the book index, remains relatively unchanged despite
                     the information revolution.

            Extraction Indexing

            Extraction indexing involves taking words directly from the document. It uses natural language and
            lends itself well to automated techniques, where word frequencies are calculated and those words with a
            frequency over a pre-determined threshold are used as index terms. A stop-list containing common
            words such as “the” and “and” is consulted, and these stop words are excluded from the index terms.
            Automated extraction indexing may lead to a loss of meaning by indexing single words as
            opposed to phrases. Although it is possible to extract commonly occurring phrases, this becomes
            more difficult if key concepts are worded inconsistently. Automated extraction indexing
            also has the problem that, even when a stop-list is used to remove common words such as “the,” some
            frequent words may not be useful for discriminating between documents.
            For example, the term “glucose” is likely to occur frequently in any document related to diabetes.
            In a database of such documents, using this term would therefore likely return most or all of the
            documents.
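
            A minimal sketch of frequency-based extraction with a stop-list, followed by a check of how well
            a term discriminates between documents, might look like this. The stop-list, the threshold of 2,
            the toy documents and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "in", "is", "to", "for", "by", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_index_terms(text, threshold=2):
    """Return non-stop words that occur at least `threshold` times, sorted."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return sorted(term for term, n in counts.items() if n >= threshold)

def document_frequency(term, documents):
    """Fraction of documents in which the term occurs at least once."""
    hits = sum(1 for doc in documents if term in tokenize(doc))
    return hits / len(documents)

# Toy diabetes collection, invented for this sketch.
DOCS = [
    "Insulin therapy lowers blood glucose. Glucose monitoring guides insulin dosing.",
    "Diet and exercise affect glucose. Exercise improves glucose control.",
    "Retinopathy screening is advised when glucose stays high. "
    "Screening detects retinopathy early, before glucose damage spreads.",
]

for doc in DOCS:
    print(extract_index_terms(doc))

# "glucose" occurs in every document, so it cannot tell them apart.
print(document_frequency("glucose", DOCS))
print(document_frequency("insulin", DOCS))
```

            In this toy collection “glucose” passes the frequency threshold in every document and has a
            document frequency of 1.0, so searching on it returns the whole collection, while “insulin”
            appears in only one document and discriminates far better.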





                                             LOVELY PROFESSIONAL UNIVERSITY                                   159