Page 215 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



                                 Self Assessment


                                 Multiple Choice Questions:
                                  1.   ...... are a proper subset of context-sensitive languages.
                                        (a)  Syntactic structure      (b) Syndectic structure  (c) Indexed languages.
                                  2.   ...... are kinds of metadata.
                                        (a)  Indexing languages       (b) Vocabulary tools    (c) IR thesaurus
                                  3.   The systematic selection of standardised terms is known as ...... .
                                        (a)  IR thesaurus             (b) Vocabulary control  (c) Need analysis
                                  4.   ...... is a reference work that lists words grouped together according to similarity of meaning.
                                        (a)  Indexed languages        (b) Vocabulary tools    (c) Thesaurus


                                 11.7 Automatic Indexing

                                 Automatic indexing is indexing performed by algorithmic procedures. The algorithm works on a
                                 database containing document representations (which may be full-text representations,
                                 bibliographical records, or partial text representations and, in principle, also value-added
                                 databases).



                                             Automatic indexing may also be performed on non-text databases, e.g. images or
                                             music.

                                 In text databases the algorithm may perform string searching, but it is mostly based on searching
                                 the words in the single document representation as well as in the total database (via inverted
                                 files). The use of words is mostly based on stemming. Algorithms may count co-occurrences of words
                                 (or references), they may consider levels of proximity between words, and so on.
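
                                 The operations named above (stemming, an inverted file, co-occurrence counts and proximity)
                                 can be illustrated with a minimal sketch. Everything in it is an assumption made for
                                 illustration: the function names, the two sample documents, and especially the crude
                                 suffix-stripping stemmer, which only stands in for a real stemming algorithm.

```python
import re
from collections import defaultdict
from itertools import combinations


def naive_stem(word):
    """Crude suffix stripping; an illustrative stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lower-case the text, split it into words and stem each word."""
    return [naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_inverted_index(docs):
    """Inverted file: map each stem to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, stem in enumerate(tokenize(text)):
            index[stem][doc_id].append(pos)
    return index


def cooccurrence_counts(docs):
    """Count in how many documents each pair of stems occurs together."""
    counts = defaultdict(int)
    for text in docs.values():
        for pair in combinations(sorted(set(tokenize(text))), 2):
            counts[pair] += 1
    return counts


def min_distance(index, a, b, doc_id):
    """Smallest distance in word positions between stems a and b, or None."""
    pos_a = index.get(a, {}).get(doc_id, [])
    pos_b = index.get(b, {}).get(doc_id, [])
    if not pos_a or not pos_b:
        return None
    return min(abs(x - y) for x in pos_a for y in pos_b)


# Hypothetical sample documents (titles only).
docs = {
    "d1": "Indexing languages and automatic indexing",
    "d2": "Automatic indexes for music databases",
}
index = build_inverted_index(docs)
print(dict(index["index"]))                            # postings for the stem "index"
print(cooccurrence_counts(docs)[("automatic", "index")])
print(min_distance(index, "automatic", "index", "d1"))
```

                                 Because "indexing" and "indexes" are reduced to the same stem, both documents are posted
                                 under one entry in the inverted file; a production system would use a proper stemmer and
                                 a stopword list.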
                                 Automatic indexing may be contrasted with human indexing. It should be considered, however, that
                                 if humans are taught strict rules on how to index, their indexing should also be considered
                                 mechanical or algorithmic. If, for example, a librarian mechanically matches words from titles with
                                 words from a controlled vocabulary, this corresponds to primitive forms of automatic indexing.
                                 It is also an open question whether the principles developed by the facet-analytic approach can be
                                 automated.
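
                                 The librarian's mechanical matching of title words against a controlled vocabulary can be
                                 sketched in a few lines. The vocabulary below and the function name are hypothetical and
                                 serve only to show how rule-bound such matching is.

```python
import re

# Hypothetical controlled vocabulary: entry term -> preferred descriptor.
CONTROLLED_VOCABULARY = {
    "cataloguing": "Cataloging",
    "cataloging": "Cataloging",
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "retrieval": "Information retrieval",
}


def assign_descriptors(title, vocabulary=CONTROLLED_VOCABULARY):
    """Return preferred descriptors whose entry terms occur in the title."""
    descriptors = []
    for word in re.findall(r"[a-z]+", title.lower()):
        preferred = vocabulary.get(word)
        # Keep each descriptor once, in order of first appearance.
        if preferred and preferred not in descriptors:
            descriptors.append(preferred)
    return descriptors


print(assign_descriptors("Thesaurus construction for cataloguing and retrieval"))
```

                                 The same rule gives the same result whether a person or a program applies it, which is the
                                 sense in which such human indexing is itself algorithmic.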
                                 For this reason, manual indexing and machine indexing should not necessarily be considered two
                                 fundamentally different approaches to indexing; rather, the principles and assumptions underlying
                                 both kinds of indexing should be uncovered. For example, assigned indexing and derived indexing
                                 are approaches which may be applied, although differently, by both humans and machines. As pointed
                                 out by Anderson & Pérez-Carballo (2001), we know more about computer indexing than about
                                 human indexing because “machine methods must be rigorously described in detail for the computer
                                 to carry them out”. Automatic indexing may thus inspire us to ask more precise questions about
                                 human indexing as well.
                                 The earliest and most primitive forms of automatic indexing were the KWIC/KWAC/KWOC systems,
                                 based just on simple, mechanical manipulations of terms derived from document titles.
                                 Related forms are the Permuterm Subject Index and KeyWords Plus, known from ISI’s citation
                                 indexes (the latter is based on assigning terms from cited titles).
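
                                 A KWIC (keyword in context) index of the kind described above can be approximated by a short
                                 sketch: every significant title word becomes an entry, shown with the words to its left and
                                 right, and the entries are sorted by keyword. The stopword list, the sample titles and the
                                 function names are assumptions made for illustration, not a reconstruction of any historical
                                 system.

```python
# Hypothetical stopword list: words that do not become index entries.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_entries(titles):
    """Return (sort_key, left_context, keyword, right_context), sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))
    # Sort by keyword, then by left context so the order is deterministic.
    entries.sort(key=lambda e: (e[0], e[1].lower()))
    return entries


def format_kwic(entries, width=24):
    """Align the keywords in one column, with the context on either side."""
    lines = []
    for _, left, word, right in entries:
        lines.append(f"{left.rjust(width)}  {word}  {right}".rstrip())
    return lines


titles = ["Automatic Indexing of Music", "Indexing Languages"]
for line in format_kwic(kwic_entries(titles)):
    print(line)
```

                                 No judgement about the subject of a document is involved: the index is derived entirely
                                 from the words of the titles.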







            210                              LOVELY PROFESSIONAL UNIVERSITY