Page 121 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL

Information Storage and Retrieval



                                12.3 Indexing Languages


                                There are three main types of indexing languages.
                                  •  Controlled indexing language: Only approved terms can be used by the indexer to describe
                                     the document.
                                  •  Natural language indexing language: Any term from the document in question can be used
                                     to describe the document.
                                  •  Free indexing language: Any term (not only from the document) can be used to describe the
                                     document.
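
                                The three types above differ only in which terms the indexer is permitted to assign. A minimal
                                sketch of that rule follows; the approved-term list, the function names and the sample text are
                                hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical approved-term list for a controlled indexing language.
CONTROLLED_TERMS = {"association football", "rugby football", "stadiums"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def allowed(term, document_text, language):
    """Return True if `term` may be assigned under the given indexing language."""
    if language == "controlled":
        # Only approved terms may be used.
        return term.lower() in CONTROLLED_TERMS
    if language == "natural":
        # Any term that occurs in the document itself may be used.
        doc_tokens = tokenize(document_text)
        term_tokens = tokenize(term)
        n = len(term_tokens)
        return n > 0 and any(
            doc_tokens[i:i + n] == term_tokens
            for i in range(len(doc_tokens) - n + 1)
        )
    if language == "free":
        # Any term at all may be used, whether or not it occurs in the document.
        return True
    raise ValueError(f"unknown indexing language: {language}")


doc = "The soccer match was played in a new stadium."
for language in ("controlled", "natural", "free"):
    print(language, allowed("soccer", doc, language))
```

                                In this sketch the word soccer is rejected by the controlled language, because it is not on the
                                approved list, but accepted by the other two.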
                                When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that
                                is, the level of detail in which the document is described. For example, with low indexing
                                exhaustivity, minor aspects of the work will not be described with index terms. In general, the
                                higher the indexing exhaustivity, the more index terms are assigned to each document.
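
                                One way to picture exhaustivity is as a cut-off on how prominent a topic must be before it
                                receives an index term. The sketch below assumes that each topic has been given a weight (for
                                instance, the share of the document devoted to it); the weights, the thresholds and the function
                                name are hypothetical.

```python
def select_index_terms(topic_weights, min_weight):
    """Keep the topics whose weight is at least `min_weight`.

    A lower threshold corresponds to higher indexing exhaustivity,
    because more minor aspects of the work receive an index term.
    """
    return sorted(t for t, w in topic_weights.items() if w >= min_weight)


# Hypothetical topic weights for a single document.
topic_weights = {
    "association football": 0.70,
    "stadiums": 0.20,
    "ticket pricing": 0.10,
}

low_exhaustivity = select_index_terms(topic_weights, min_weight=0.5)
high_exhaustivity = select_index_terms(topic_weights, min_weight=0.05)

print(low_exhaustivity)
print(high_exhaustivity)
```

                                With the high threshold only the main topic is indexed; with the low threshold the minor aspects
                                are indexed as well, so the document receives more terms.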
                                Controlled vocabularies are often claimed to improve on the accuracy of free text searching, for
                                example by reducing the number of irrelevant items in the retrieval list. These irrelevant items
                                (false positives) are often caused by the inherent ambiguity of natural language. Take the English
                                word football, for example. Football is the name given to a number of different team sports.
                                Worldwide, the most popular of these is Association football, which is called soccer in several
                                countries. A free text search for football can therefore retrieve documents about any of these
                                sports, whereas a controlled vocabulary assigns each sport its own authorized term.
                                Compared to free text searching, the use of a controlled vocabulary can dramatically increase the
                                performance of an information retrieval system, if performance is measured by precision (the
                                percentage of documents in the retrieval list that are actually relevant to the search topic).
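
                                The precision measure just defined can be computed directly from the retrieval list. The sketch
                                below returns precision as a fraction between 0 and 1 rather than a percentage, and compares a
                                free text search with a controlled vocabulary search on a small hypothetical collection; the
                                documents, index terms and relevance judgements are invented for illustration.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if nothing was retrieved)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical collection: document id -> (text, assigned controlled term).
DOCS = {
    1: ("Football world cup final report", "association football"),
    2: ("American football draft picks", "american football"),
    3: ("Football league table and fixtures", "association football"),
    4: ("Rugby football rules explained", "rugby football"),
}

# The searcher wants documents about association football only.
relevant = {1, 3}

# Free text search: match the ambiguous word anywhere in the text.
free_text_hits = {d for d, (text, _) in DOCS.items() if "football" in text.lower()}

# Controlled vocabulary search: match the authorized term exactly.
controlled_hits = {d for d, (_, term) in DOCS.items() if term == "association football"}

print(precision(free_text_hits, relevant))   # 0.5
print(precision(controlled_hits, relevant))  # 1.0
```

                                Here the free text search retrieves all four documents, two of which are false positives, while
                                the controlled search retrieves only the two relevant ones.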
                                In some cases controlled vocabulary can enhance recall as well, because unlike natural language
                                schemes, once the correct authorized term is searched, you don’t need to worry about searching for
                                other terms that might be synonyms of that term.
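
                                The effect of synonym control on recall can be sketched in the same way. Recall is taken here in
                                its usual sense, the fraction of relevant documents that are actually retrieved. The synonym
                                table, the documents and the relevance judgements below are hypothetical.

```python
def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved (0.0 if none are relevant)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical synonym control: entry term -> authorized term.
AUTHORIZED = {
    "soccer": "association football",
    "association football": "association football",
}

# Hypothetical collection: document id -> (text, assigned controlled term).
DOCS = {
    1: ("Soccer results from the weekend", "association football"),
    2: ("Association football tactics", "association football"),
    3: ("Cricket scores", "cricket"),
}
relevant = {1, 2}

query = "soccer"

# Free text search finds only documents that use the searcher's own word.
free_text_hits = {d for d, (text, _) in DOCS.items() if query in text.lower()}

# Controlled search first maps the query to its authorized term.
authorized_term = AUTHORIZED[query]
controlled_hits = {d for d, (_, term) in DOCS.items() if term == authorized_term}

print(recall(free_text_hits, relevant))   # 0.5
print(recall(controlled_hits, relevant))  # 1.0
```

                                The free text search misses the document that says Association football instead of soccer; the
                                controlled search retrieves both because they share one authorized term.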
                                However, a controlled vocabulary search may also lead to unsatisfactory recall, in that it will fail to
                                retrieve some documents that are actually relevant to the search question.
                                This is particularly problematic when the search question involves terms that are tangential to
                                the subject area, so that the indexer may have tagged a document with a different term from the
                                one the searcher has in mind, even though the searcher would consider the two equivalent.
                                Essentially, this can be avoided only by an experienced user of the controlled vocabulary whose
                                understanding of the vocabulary coincides with the way it is used by the indexer.
                                Controlled vocabularies can also become outdated quickly. In fast-developing fields of knowledge,
                                the terms needed might not be available unless the vocabulary is updated regularly. Even in the
                                best-case scenario, controlled language is often not as specific as the words of the text itself.
                                Indexers trying to choose the appropriate index terms might misinterpret the author, while a free
                                text search is in no danger of doing so, because it uses the author’s own words.
                                The use of controlled vocabularies can be costly compared to free text searches because human
                                experts or expensive automated systems are necessary to index each entry. Furthermore, the user
                                has to be familiar with the controlled vocabulary scheme to make best use of the system. But as
                                already mentioned, the control of synonyms and homographs can help increase precision.
                                Numerous methodologies have been developed to assist in the creation of controlled vocabularies,
                                including faceted classification, which enables a given data record or document to be described in
                                multiple ways.
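
                                As a rough illustration of faceted classification, each record can carry one value per facet, so
                                that the same record is reachable along several independent dimensions. The facet names, values
                                and records below are hypothetical, and the sketch is not tied to any particular classification
                                scheme.

```python
# Hypothetical records, each described along several independent facets.
RECORDS = [
    {"id": 1, "facets": {"sport": "association football", "region": "Europe", "form": "report"}},
    {"id": 2, "facets": {"sport": "rugby football", "region": "Europe", "form": "rulebook"}},
    {"id": 3, "facets": {"sport": "association football", "region": "Asia", "form": "report"}},
]


def filter_by_facets(records, **criteria):
    """Return the ids of records whose facets match every given criterion."""
    return [
        r["id"]
        for r in records
        if all(r["facets"].get(facet) == value for facet, value in criteria.items())
    ]


print(filter_by_facets(RECORDS, sport="association football"))  # [1, 3]
print(filter_by_facets(RECORDS, region="Europe"))               # [1, 2]
print(filter_by_facets(RECORDS, region="Europe", form="report"))  # [1]
```

                                Record 1 is found under its sport, its region and its form, which is the sense in which a faceted
                                scheme describes a record in multiple ways.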


                                Types of Controlled Vocabularies

                                Currier (2005) distinguishes between the following kinds of controlled vocabularies, to which we
                                have added metadata schemes.





                                           LOVELY PROFESSIONAL UNIVERSITY