Page 200 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
The most important property of an indexing language is whether the indexer has to assign a given
unit to a pre-established conceptual system or not. If the indexer has to assign units to a
pre-established system, the most important property is whether the concepts or classes meet the
indexing needs: whether they adequately reflect the subject to be indexed and whether the level of
specificity is appropriate. Only when these conditions have been met do other considerations become
important. For example, hierarchical classifications are more difficult to adapt to new developments
than alphabetical systems.
Like other semantic tools, indexing languages are systems of concepts with more or less information
about semantic relations.
11.2 Types of Indexing Languages
Controlled indexing language: Only approved terms can be used by the indexer to describe the
document.
Natural language indexing language: Any term from the document in question can be used to
describe the document.
Free indexing language: Any term (not only from the document) can be used to describe the
document.
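The three types differ only in which terms the indexer is permitted to assign. The rule can be
sketched in Python as follows. This is a minimal illustration, not a real indexing system: the
approved-term list, the sample document, and the function name `is_permitted` are all hypothetical,
and "occurs in the document" is approximated by a plain substring test.

```python
# Hypothetical approved-term list, for illustration only.
APPROVED_TERMS = {"association football", "rugby union", "information retrieval"}

def is_permitted(term, document_text, language_type):
    """Return True if `term` may be assigned under the given indexing language type."""
    term = term.lower()
    if language_type == "controlled":
        # Only approved terms may be used.
        return term in APPROVED_TERMS
    if language_type == "natural":
        # Any term that occurs in the document itself may be used
        # (simplified here to a substring check).
        return term in document_text.lower()
    if language_type == "free":
        # Any term at all may be used, whether or not it occurs in the document.
        return True
    raise ValueError(f"unknown indexing language type: {language_type}")

doc = "The match report covers a soccer final played in heavy rain."

print(is_permitted("soccer", doc, "natural"))
print(is_permitted("soccer", doc, "controlled"))
print(is_permitted("association football", doc, "controlled"))
```

In this sketch "soccer" is acceptable as a natural language index term because it occurs in the
text, but it is rejected by the controlled language, which only admits the approved form.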
When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that is,
the level of detail in which the document is described. For example, with low indexing exhaustivity,
minor aspects of the work are not described with index terms. In general, the higher the indexing
exhaustivity, the more terms are assigned to each document.
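One way to picture exhaustivity is as a cut-off on how many aspects of a document receive an index
term. The sketch below assumes, purely for illustration, that each aspect has already been given a
centrality weight; the weights, the sample aspects, and the function name `select_index_terms` are
hypothetical and do not come from any standard.

```python
def select_index_terms(weighted_aspects, exhaustivity):
    """Keep the `exhaustivity` most central aspects as index terms."""
    ranked = sorted(weighted_aspects.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _weight in ranked[:exhaustivity]]

# Hypothetical aspects of one document, with assumed centrality weights.
aspects = {
    "indexing languages": 0.9,
    "controlled vocabulary": 0.7,
    "football": 0.2,
    "rugby": 0.1,
}

low = select_index_terms(aspects, 2)   # low exhaustivity: minor aspects are dropped
high = select_index_terms(aspects, 4)  # high exhaustivity: every aspect is indexed
print(low)
print(high)
```

With the lower setting the minor aspects (the sports used as examples) receive no index term; with
the higher setting every aspect is described.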
In recent years, free text search as a means of access to documents has become popular. This involves
using natural language indexing with indexing exhaustivity set to the maximum (every word in the
text is indexed). Many studies have compared the efficiency and effectiveness of free text searches
with searches of documents that have been indexed by experts using a few well-chosen controlled
vocabulary descriptors.
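Indexing every word of the text is commonly implemented with an inverted index, which maps each word
to the documents that contain it. The following is a minimal sketch under simplifying assumptions:
words are lowercased runs of letters and digits, there is no stemming or stop-word removal, and the
three sample documents are invented for illustration.

```python
import re
from collections import defaultdict

def build_full_text_index(documents):
    """Map every word in every document to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

# Hypothetical documents.
docs = {
    1: "Football rules in Australia",
    2: "American football season preview",
    3: "Cataloguing rules for libraries",
}

index = build_full_text_index(docs)
print(sorted(index["football"]))
print(sorted(index["rules"]))
```

Every word becomes a searchable entry without any intellectual effort from an indexer, which is why
this approach corresponds to maximum exhaustivity.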
Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example
by reducing the number of irrelevant items in the retrieval list. These irrelevant items (false
positives) are often caused by the inherent ambiguity of natural language. Take the English word
football, for example.
Football is the name given to a number of different team sports. Worldwide, the most popular of
these team sports is association football, which is also called soccer in several countries.
The English word football is also applied to rugby football (rugby union and rugby league),
American football, Australian rules football, Gaelic football, and Canadian football. A search
for football will therefore retrieve documents that are about several completely different sports.
Controlled vocabulary solves this problem by tagging the documents in such a way that the
ambiguities are eliminated.
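The effect of such tagging can be sketched with a toy collection. The documents, their texts, and
their descriptors below are invented for illustration, and the two search functions are deliberately
simplistic stand-ins for a real retrieval system.

```python
# Hypothetical collection: each document has free text and expert-assigned descriptors.
documents = [
    {"id": 1, "text": "Football world cup final report",
     "descriptors": {"Association football"}},
    {"id": 2, "text": "Football playoffs and the Super Bowl",
     "descriptors": {"American football"}},
    {"id": 3, "text": "Gaelic football county championship",
     "descriptors": {"Gaelic football"}},
]

def free_text_search(query):
    """Return ids of documents whose text contains the query string."""
    return [d["id"] for d in documents if query.lower() in d["text"].lower()]

def descriptor_search(descriptor):
    """Return ids of documents tagged with exactly this controlled descriptor."""
    return [d["id"] for d in documents if descriptor in d["descriptors"]]

print(free_text_search("football"))
print(descriptor_search("Association football"))
```

The free text search returns all three documents, although they concern three different sports; the
descriptor search returns only the document tagged with the intended sport.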
Compared to free text searching, the use of a controlled vocabulary can dramatically increase the
performance of an information retrieval system, if performance is measured by precision (the
percentage of documents in the retrieval list that are actually relevant to the search topic).
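Precision as defined here can be computed directly from the retrieval list and the set of relevant
documents. The sketch below uses made-up document ids, and returning 0.0 for an empty retrieval list
is a convention assumed for this example.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if nothing was retrieved)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical results for one search topic.
relevant_docs = {1, 4}
free_text_hits = [1, 2, 3, 4]   # two of the four hits are false positives
controlled_hits = [1, 4]        # only relevant documents are retrieved

print(precision(free_text_hits, relevant_docs))
print(precision(controlled_hits, relevant_docs))
```

Multiplying the returned fraction by 100 gives the percentage form used in the definition.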
In some cases, controlled vocabulary can enhance recall as well, because unlike natural language
schemes, once the correct authorized term is searched, the searcher does not need to worry about
searching for other terms that might be synonyms of that term.
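A common way to achieve this is an entry vocabulary that maps each synonym to the single authorized
term. The mapping, the descriptor index, and the function name `controlled_search` below are
hypothetical and only sketch the idea.

```python
# Hypothetical entry vocabulary: every variant leads to one authorized term.
ENTRY_VOCABULARY = {
    "soccer": "Association football",
    "association football": "Association football",
}

# Hypothetical index from authorized term to the documents tagged with it.
DESCRIPTOR_INDEX = {"Association football": {1, 2, 3}}

def controlled_search(user_term):
    """Resolve the user's term to its authorized form, then look up tagged documents."""
    authorized = ENTRY_VOCABULARY.get(user_term.lower())
    if authorized is None:
        return set()  # the term is not part of the vocabulary
    return DESCRIPTOR_INDEX.get(authorized, set())

print(sorted(controlled_search("soccer")))
print(sorted(controlled_search("Association football")))
```

Both queries reach the same set of documents, because the synonym is resolved before the lookup
rather than being matched against the wording of each text.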
However, a controlled vocabulary search may also lead to unsatisfactory recall, in that it will fail to
retrieve some documents that are actually relevant to the search question.
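Recall can be computed in the same spirit as precision, as the share of all relevant documents that
the search actually retrieves. The example below assumes, for illustration only, one possible cause
of a shortfall: a relevant document to which the indexer did not assign the descriptor. The document
ids are made up.

```python
def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved (0.0 if none are relevant)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical search topic with four relevant documents.
relevant_docs = {1, 2, 3, 4}
# Assumed scenario: document 4 is relevant but was never tagged with the descriptor.
descriptor_hits = {1, 2, 3}

print(recall(descriptor_hits, relevant_docs))
```

Here the descriptor search misses one of the four relevant documents, so a quarter of the relevant
material is not retrieved.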
LOVELY PROFESSIONAL UNIVERSITY