Page 164 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 9: Cataloguing and Subject Indexing: Principles and Practices

First Step in Indexing

The first step in indexing is to decide on the subject matter of the document. In manual indexing, the
indexer would consider the subject matter in terms of answers to a set of questions such as “Does the
document deal with a specific product, condition or phenomenon?” As the analysis is influenced by
the knowledge and experience of the indexer, it follows that two indexers may analyse the content
differently and so come up with different index terms. This inconsistency can affect the success of retrieval.

Automatic vs. Manual Subject Analysis

Automatic indexing follows set processes of analysing the frequencies of word patterns and comparing
the results with other documents in order to assign the document to subject categories. This requires no
understanding of the material being indexed and therefore leads to more uniform indexing, but at the
expense of interpreting the true meaning. A computer program does not understand the meaning of
statements and may therefore fail to assign some relevant terms or may assign terms incorrectly. Human
indexers focus their attention on certain parts of the document, such as the title, abstract, summary and
conclusions, because analysing the full text in depth is costly and time-consuming. An automated system
removes this time constraint and allows the entire document to be analysed, but it can also be directed
to particular parts of the document.
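The text does not say how an automatic indexer compares word-frequency patterns with other documents. A minimal sketch, assuming each document is reduced to word counts and compared by cosine similarity (a common measure, swapped in here for illustration) with profiles built from hypothetical example documents for each category:

```python
import math
import re
from collections import Counter

def term_frequencies(text: str) -> Counter:
    """Count how often each lower-cased word occurs in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_category(text: str, examples: dict[str, list[str]]) -> str:
    """Return the category whose example documents the text most resembles.

    Each category profile is the summed word frequencies of its examples.
    """
    doc = term_frequencies(text)
    profiles = {
        category: sum((term_frequencies(d) for d in docs), Counter())
        for category, docs in examples.items()
    }
    return max(profiles, key=lambda category: cosine(doc, profiles[category]))

# Hypothetical example documents for two subject categories.
examples = {
    "diabetes": ["insulin glucose blood sugar diabetes treatment",
                 "glucose monitoring in diabetes patients"],
    "cardiology": ["heart attack blood pressure artery",
                   "cardiac arrhythmia and heart rate"],
}
print(assign_category("effect of insulin on blood glucose", examples))
```

The program assigns the sample title to "diabetes" purely because it shares the words insulin, blood and glucose with that profile; it has no notion of what the title means, which is the limitation described above.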

Second Stage
The second stage of indexing involves translating the subject analysis into a set of index terms.
This can involve extracting terms from the document or assigning them from a controlled vocabulary.
With full-text search now widely available, many people have come to rely on their own expertise in
conducting information searches, and full-text searching has become very popular. Nevertheless,
subject indexing and its experts (professional indexers, cataloguers and librarians) remain crucial
to information organization and retrieval. These experts understand controlled vocabularies and
are able to find information that cannot be located by full-text search. The cost of the expert analysis
needed to create subject indexing is not easily compared with the cost of the hardware, software and
labor needed to produce a comparable set of full-text, fully searchable materials. With new web
applications that allow every user to annotate documents, social tagging has gained popularity,
especially on the Web.
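Assigning terms from a controlled vocabulary means replacing whatever wording a document uses with the vocabulary's preferred term. A minimal sketch, assuming a tiny hypothetical thesaurus whose entries and USE-style mappings are illustrative only and not taken from any published vocabulary:

```python
# Hypothetical mini-thesaurus: each non-preferred (entry) term points to the
# preferred descriptor that indexers must assign instead, like a USE reference.
USE = {
    "blood sugar": "blood glucose",
    "diabetes": "diabetes mellitus",
    "heart attack": "myocardial infarction",
}
# Preferred descriptors: every USE target plus descriptors with no synonyms.
PREFERRED = set(USE.values()) | {"insulin"}

def assign_terms(candidate_terms: list[str]) -> list[str]:
    """Translate candidate terms into controlled-vocabulary descriptors.

    Preferred terms are kept, entry terms are replaced by their preferred
    form, and terms outside the vocabulary are dropped. Duplicates are removed.
    """
    assigned = []
    for term in candidate_terms:
        term = term.lower()
        descriptor = term if term in PREFERRED else USE.get(term)
        if descriptor is not None and descriptor not in assigned:
            assigned.append(descriptor)
    return assigned

print(assign_terms(["Blood sugar", "diabetes", "insulin", "exercise"]))
```

Two documents that speak of "blood sugar" and "blood glucose" both receive the descriptor "blood glucose", so a single search on that descriptor retrieves both, whereas a literal full-text search for either phrase finds only one of them.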

One application of indexing, the book index, remains relatively unchanged despite the information revolution.

Extraction Indexing

Extraction indexing involves taking words directly from the document. It uses natural language and
lends itself well to automated techniques, in which word frequencies are calculated and words whose
frequency reaches a predetermined threshold are used as index terms. A stop-list of common words
such as “the” and “and” is consulted, and these stop words are excluded as index terms.
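The frequency-threshold approach with a stop-list can be sketched as follows. The stop-list contents and the default threshold of 2 are assumptions chosen for illustration; real stop-lists are far longer and the threshold is set by the system designer.

```python
import re
from collections import Counter

# Illustrative stop-list (an assumption); real stop-lists are much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "in", "on", "is", "to", "for", "with"}

def extract_index_terms(text: str, threshold: int = 2) -> list[str]:
    """Return words occurring at least `threshold` times, ignoring stop words.

    Terms are ordered by descending frequency, then alphabetically.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    terms = [w for w, n in counts.items() if n >= threshold]
    return sorted(terms, key=lambda w: (-counts[w], w))

doc = ("Insulin lowers blood glucose. The insulin dose and the timing of "
       "insulin injections affect glucose control in the patient.")
print(extract_index_terms(doc))
```

Here "the" occurs three times but is removed by the stop-list, leaving insulin and glucose as index terms. Note that glucose is extracted as a single word, although in the text it appears within the phrases "blood glucose" and "glucose control".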
Automated extraction indexing may lead to a loss of meaning because it indexes single words rather
than phrases. Although it is possible to extract commonly occurring phrases, this becomes more
difficult if key concepts are worded inconsistently. Automated extraction indexing also has the
problem that, even with a stop-list to remove common words such as “the,” some frequent words may
not help to discriminate between documents.
For example, the term glucose is likely to occur frequently in any document related to diabetes.
In a database of diabetes literature, therefore, searching on this term would likely return most or all
of the documents.
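The text does not say how such non-discriminating terms might be detected. One widely used measure, swapped in here for illustration, is inverse document frequency (IDF): log(N / n), where N is the number of documents in the collection and n is the number containing the term. A minimal sketch over a small hypothetical collection of diabetes titles:

```python
import math
import re

def words(text: str) -> set[str]:
    """Distinct lower-cased words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def idf(term: str, documents: list[str]) -> float:
    """Inverse document frequency log(N / n), where n documents contain term.

    A term found in every document scores 0: it cannot discriminate.
    Terms absent from the collection also score 0 here (a simplifying choice).
    """
    n = sum(1 for doc in documents if term in words(doc))
    return math.log(len(documents) / n) if n else 0.0

# Hypothetical document titles, all related to diabetes.
diabetes_collection = [
    "Glucose monitoring in type 1 diabetes",
    "Dietary fibre and blood glucose in type 2 diabetes",
    "Exercise, insulin sensitivity and glucose uptake",
    "Retinopathy screening for patients with raised glucose",
]
for term in ("glucose", "diabetes", "retinopathy"):
    print(term, round(idf(term, diabetes_collection), 2))
```

Because glucose occurs in every title, its IDF is 0, matching the observation that it would retrieve the whole collection; rarer terms such as retinopathy score higher and separate documents more effectively.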

LOVELY PROFESSIONAL UNIVERSITY 159
