Page 200 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




            The most important property of an indexing language is whether or not the indexer has to assign a
            given unit to a pre-established conceptual system. If so, the most important property is whether
            its concepts or classes fit the needs being served: whether they adequately reflect the subject to be
            indexed and whether their level of specificity is appropriate. Only when these conditions have been
            met do other considerations become important. For example, hierarchical classifications are more
            difficult to adapt to new developments than alphabetical systems are.
            Like other semantic tools, indexing languages are systems of concepts that carry more or less
            information about the semantic relations between them.


            11.2 Types of Indexing Languages

            Controlled indexing language: Only approved terms can be used by the indexer to describe the
            document.
            Natural language indexing language: Any term from the document in question can be used to
            describe the document.
            Free indexing language: Any term (not only from the document) can be used to describe the
            document.
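The three types differ only in which terms the indexer may assign. The sketch below expresses each rule as a check on a candidate term. The approved-term list and the function name `is_allowed` are hypothetical, and matching a term against the document text on word boundaries is a simplifying assumption about what "a term from the document" means.

```python
import re

# Hypothetical controlled vocabulary (the list of approved terms).
APPROVED_TERMS = {"Association football", "Rugby union", "American football"}


def is_allowed(term: str, language: str, document_text: str) -> bool:
    """Return True if `term` may be assigned to the document under `language`."""
    if language == "controlled":
        # Only terms from the approved list may be used.
        return term in APPROVED_TERMS
    if language == "natural":
        # Only terms that actually occur in the document may be used
        # (case-insensitive, matched on word boundaries).
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, document_text, re.IGNORECASE) is not None
    if language == "free":
        # Any term may be used, whether or not it occurs in the document.
        return True
    raise ValueError(f"unknown indexing language: {language!r}")


doc = "Match report: soccer fans celebrate the cup final."
print(is_allowed("soccer", "controlled", doc))             # False
print(is_allowed("soccer", "natural", doc))                # True
print(is_allowed("Association football", "natural", doc))  # False
print(is_allowed("Association football", "free", doc))     # True
```

The same term can be rejected under one language and accepted under another: "soccer" occurs in the text but is not an approved term, while "Association football" is approved but does not occur in the text.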
            When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that is,
            the level of detail in which the document is described. For example, at low exhaustivity, minor
            aspects of the work are not described by index terms. In general, the higher the indexing
            exhaustivity, the more terms are assigned to each document.
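Exhaustivity can be pictured as a cap on how many terms each document receives. The sketch below assumes, purely for illustration, that term frequency ranks the aspects of a document by importance; a human indexer uses judgement rather than counts. The stopword list and the function name `index_terms` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "on", "for"}


def index_terms(text: str, exhaustivity: int) -> list[str]:
    """Return up to `exhaustivity` index terms, most frequent first.

    Term frequency is only a crude stand-in for an indexer's judgement
    of which aspects of a document matter most.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Counter.most_common breaks ties by order of first occurrence.
    return [term for term, _ in Counter(words).most_common(exhaustivity)]


text = ("Football tactics: the pressing game in football. Pressing "
        "requires fitness; fitness training is covered briefly.")
print(index_terms(text, 1))         # ['football']
print(index_terms(text, 3))         # ['football', 'pressing', 'fitness']
print(len(index_terms(text, 100)))  # 9
```

At exhaustivity 1 only the main topic is indexed; raising the level lets minor aspects such as "training" into the index as well.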
            In recent years, free text searching has become a popular means of access to documents. It amounts
            to natural language indexing with exhaustivity set to the maximum (every word in the text is
            indexed). Many studies have compared the efficiency and effectiveness of free text searching with
            searching documents that experts have indexed using a few well-chosen controlled vocabulary
            descriptors.
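Free text searching is commonly implemented with an inverted index, which maps every word to the documents containing it. The sketch below is a minimal version under simple assumptions (lower-casing, alphabetic tokens only, a query matches documents containing all of its words); the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index every word of every document (maximum exhaustivity)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def free_text_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "d1": "Rugby league football in Australia",
    "d2": "Association football, also called soccer",
    "d3": "Gardening tips for spring",
}
index = build_inverted_index(docs)
print(sorted(free_text_search(index, "football")))         # ['d1', 'd2']
print(sorted(free_text_search(index, "football soccer")))  # ['d2']
```

No indexer is involved: every word becomes an access point, which is why the quality of retrieval depends entirely on the words the author happened to use.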
            Controlled vocabularies are often claimed to give more accurate searching than free text searching,
            for example by reducing the number of irrelevant items in the retrieval list. These irrelevant items
            (false positives) are often caused by the inherent ambiguity of natural language. Take the English
            word football, for example. Football is the name given to a number of different team sports.
            Worldwide, the most popular of these is Association football, which is also called soccer in several
            countries. The English word football is also applied to Rugby football (Rugby union and Rugby
            league), American football, Australian rules football, Gaelic football, and Canadian football. A
            search for football will therefore retrieve documents about several completely different sports.
            A controlled vocabulary solves this problem by tagging documents with terms chosen so that these
            ambiguities are eliminated, for example a separate descriptor for each sport.
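The football example can be reproduced on a toy collection. In the sketch below, each hypothetical document carries both its text and the descriptor an indexer might assign from a controlled vocabulary; the documents, descriptors and function names are invented for illustration.

```python
# Hypothetical mini-collection: each document has its text plus the
# descriptor(s) an indexer assigned from a controlled vocabulary.
DOCS = {
    "d1": ("World Cup football final report", {"Association football"}),
    "d2": ("Rugby league football season preview", {"Rugby league"}),
    "d3": ("College football playoff rankings", {"American football"}),
    "d4": ("Soccer transfer news", {"Association football"}),
}


def free_text_hits(word: str) -> set[str]:
    """Documents whose text contains the word (ambiguous natural language)."""
    return {d for d, (text, _) in DOCS.items() if word.lower() in text.lower().split()}


def descriptor_hits(descriptor: str) -> set[str]:
    """Documents tagged with the given controlled descriptor."""
    return {d for d, (_, descriptors) in DOCS.items() if descriptor in descriptors}


print(sorted(free_text_hits("football")))               # ['d1', 'd2', 'd3']
print(sorted(descriptor_hits("Association football")))  # ['d1', 'd4']
```

The word search mixes three sports and misses the soccer article, while the descriptor search returns exactly the two Association football documents.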
            Compared to free text searching, the use of a controlled vocabulary can dramatically increase the
            performance of an information retrieval system, if performance is measured by precision (the
            percentage of documents in the retrieval list that are actually relevant to the search topic).
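Precision as defined above is the relevant share of the retrieval list. The sketch below computes it for two hypothetical retrieval lists against invented relevance judgements for documents d1 to d4; the convention of returning 0.0 for an empty list is an assumption of this sketch, since precision is otherwise undefined there.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        # Precision is undefined for an empty retrieval list; 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


relevant = {"d1", "d4"}          # hypothetical: documents about Association football
free_text = {"d1", "d2", "d3"}   # hypothetical hits for the word "football"
controlled = {"d1", "d4"}        # hypothetical hits for the descriptor
print(round(precision(free_text, relevant), 3))  # 0.333
print(precision(controlled, relevant))           # 1.0
```

Only one of the three free text hits is relevant, so precision is one third; every hit from the descriptor search is relevant, so precision is 1.0 (100 per cent).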
            In some cases a controlled vocabulary can improve recall as well: unlike with natural language
            schemes, once the correct authorized term has been searched, there is no need to search separately
            for other terms that might be synonyms of it.
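The recall benefit comes from mapping every synonym to one authorized term before searching. The sketch below models this with a small lookup table in the manner of USE references in a thesaurus; the table, the index contents and the function names are hypothetical.

```python
# Hypothetical entry vocabulary: non-preferred terms point to the
# authorized term, in the manner of USE references in a thesaurus.
USE = {
    "soccer": "Association football",
    "association football": "Association football",
}

# Hypothetical controlled index: authorized term -> document identifiers.
INDEX = {"Association football": {"d1", "d4"}, "Rugby league": {"d2"}}


def authorized_term(term: str) -> str:
    """Map a user's term to the authorized term, if one is recorded."""
    return USE.get(term.strip().lower(), term)


def search(term: str) -> set[str]:
    """Search the controlled index with the authorized form of the term."""
    return set(INDEX.get(authorized_term(term), set()))


print(sorted(search("Soccer")))                # ['d1', 'd4']
print(sorted(search("association football")))  # ['d1', 'd4']
```

Whichever synonym the searcher types, the same authorized term is looked up, so documents are not missed merely because they were described with a different word.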
            However, a controlled vocabulary search may also lead to unsatisfactory recall (the percentage of
            the relevant documents that are actually retrieved), in that it may fail to retrieve some documents
            that are relevant to the search question.
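Recall is the counterpart of precision: it divides by the number of relevant documents rather than the number retrieved. In the hypothetical case below, one relevant document was tagged with a different descriptor and is therefore missed; the documents are invented, and returning 0.0 when nothing is relevant is a convention of this sketch.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        # Recall is undefined when nothing is relevant; 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical case: d5 is relevant, but the indexer tagged it with a
# different descriptor, so a search on "Association football" misses it.
relevant = {"d1", "d4", "d5"}
retrieved = {"d1", "d4"}
print(round(recall(retrieved, relevant), 3))  # 0.667
```

Every retrieved document is relevant here (precision 1.0), yet recall is only two thirds, which is the trade-off the paragraph above describes.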






                                             LOVELY PROFESSIONAL UNIVERSITY                                   195