Page 92 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL
Unit 9: Trends in Indexing
Assigning terms that are not simple substitutions of synonyms, but that represent independent
conceptualizations of document contents, may turn out to be the most important area in which
human indexing is better than automatic indexing.
As traditional classification is a time-consuming and expensive process, it is obvious that
investigations into the use of automated solutions are worthwhile. At the same time, classification
is an activity that needs a significant level of human expertise, abstract thinking and understanding,
and this is not easy to replace with artificial intelligence or expert systems. There are no known
examples of traditional library classification being undertaken completely by computer software.
Knowledge structuring on the Internet has to cope with far larger numbers of resources, exponential
growth rates and a high risk of changes occurring in documents which already exist.
This is the background to a growing number of research projects and experimental systems which
are trying to support knowledge-structuring activities on the Internet with automatic methods.
Most of these projects use methods of derived indexing, i.e. they extract information from the
documents and then use it for structuring tasks.
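As a rough illustration of derived indexing, the sketch below extracts the most frequent words from a document's own text and uses them, unaltered, as index terms. The function name `derive_index_terms`, the stop-word list and the sample sentence are assumptions made for this example; they are not taken from any of the projects mentioned here, which use more elaborate methods.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def derive_index_terms(text, max_terms=5):
    """Return the most frequent non-stop-words found in `text`.

    The terms come straight from the document (lowercased only),
    which is what makes this a form of derived indexing.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(max_terms)]


sample = "Bridge design and bridge loads: a guide to the design of steel bridge decks"
print(derive_index_terms(sample, max_terms=2))  # ['bridge', 'design']
```

Word frequency is only one possible signal. It is used here because it is simple to follow, not because it is the method of any particular system.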
Automated classification will probably not replace intellectual classification as far as quality subject
services are concerned, but will rather support and complement selection and subject indexing
efforts. Intellectual classification is always needed to validate and improve the automatic methods.
However, robot-generated databases, as an add-on to quality services in a subject area, will be
automatically classified. One practical goal in DESIRE II is to explore simple applications of
automated classification methods on a robot-generated subject index to the Web.
Many different tests will be carried out on the ‘All’ Engineering (AE) robot-generated
database of engineering documents from the Internet.
The effort required will be studied and the resulting outcomes evaluated. A pilot service of the ‘All’
Engineering Web index will offer a full classification and browsing structure with the most suitable
solution found during the project. In addition, a comprehensive state-of-the-art report on projects,
methods, alternatives and problems concerning automatic classification will also be presented.
9.2 Assigned Indexing
Assigned terms may, on the one hand, simply substitute terms represented in the document with
other terms, e.g. from a controlled vocabulary. On the other hand, an assigned term may represent a
conceptualization of the document, which is not expressed in the document with any terms. A romantic
poem, for example, does not describe itself as such, but may be assigned the term “romantic poem”.
It is common to classify documents according to an organization of disciplines.
Documents may or may not describe their disciplinary memberships. Even if they do, the authors'
organization of disciplines may differ from the one chosen by a library or an
information system. Assigning terms that are not simple substitutions of synonyms, but that
represent independent conceptualizations of document contents, may turn out to be the most
important area in which human indexing is better than automatic indexing.
From the preceding discussion, it is clear that if the terms are selected from the title or the text of a
document and used without any alteration as index terms, then this is referred to as natural language
indexing or derived indexing. If, however, the selected terms are translated or encoded into authorized
terms with the help of a prescribed list, then the indexing language becomes controlled or artificial.
This process is called Assigned Indexing.
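The contrast between the two approaches can be sketched in a few lines of code. In this minimal example the prescribed list is a small dictionary, `AUTHORIZED_TERMS`, whose entries are invented for illustration; the function names and the sample title are likewise assumptions, not part of any real controlled vocabulary.

```python
# Hypothetical prescribed list: natural-language term -> authorized term.
AUTHORIZED_TERMS = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "automobile": "Automobiles",
    "lorry": "Trucks",
    "truck": "Trucks",
}


def derived_terms(title):
    """Derived indexing: words taken from the title and used as they are."""
    return title.lower().split()


def assigned_terms(title):
    """Assigned indexing: translate title words into authorized terms.

    Words with no entry in the prescribed list are dropped, and each
    authorized term is listed once, in order of first appearance.
    """
    result = []
    for word in derived_terms(title):
        authorized = AUTHORIZED_TERMS.get(word)
        if authorized is not None and authorized not in result:
            result.append(authorized)
    return result


title = "Car and lorry maintenance"
print(derived_terms(title))   # ['car', 'and', 'lorry', 'maintenance']
print(assigned_terms(title))  # ['Automobiles', 'Trucks']
```

This sketch covers only the substitution side of assigned indexing. Assigning a term such as "romantic poem" to a text that never uses those words calls for a conceptual judgement that a lookup table of this kind cannot make.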
LOVELY PROFESSIONAL UNIVERSITY 87