Page 246 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




            The process whereby the creators of documents structure them to enhance retrieval is known as
            bottom-up indexing. A role for professional indexers in bottom-up indexing is as guides and trainers
            to document authors (Locke 1993).
            One reason that automatic indexing may be unsuited to book indexing is that book indexes are not
            usually available electronically, and cannot be used in conjunction with powerful search software
            (Mulvany and Milstead 1994).

            Document Abstracting
            Computers abstract documents (that is, condense their text) by searching for high-frequency words
            in the text, and then selecting sentences in which clusters of these high-frequency words occur.
            These sentences are then used in the order in which they appear in the text to make up the abstract.
            Flow can be improved by adding extra sentences (for example, if a sentence begins with ‘Hence’ or
            ‘However’ the previous sentence can be included as well) but the abstract remains an awkward
            collection of grammatically unrelated sentences.
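            The procedure described above can be sketched in a few lines of Python. This is a minimal
            illustration, not a production abstracting system: the stop list, the function names and the
            parameters top_words and max_sentences are assumptions made for the example, and a sentence
            is scored simply by counting its high-frequency words rather than by measuring how tightly
            they cluster.

```python
import re
from collections import Counter

# Illustrative stop list (an assumption); a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are",
              "that", "it", "on", "for", "as", "by", "with", "this"}

# Sentence openers that signal dependence on the previous sentence.
LINKING_WORDS = ("hence", "however")


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return the lower-cased alphabetic words of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def make_abstract(text, top_words=5, max_sentences=2):
    """Pick the sentences richest in high-frequency words, in document order."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in tokenize(s)
                     if w not in STOP_WORDS)
    frequent = {w for w, _ in counts.most_common(top_words)}

    # Score each sentence by how many high-frequency word occurrences it holds.
    scores = [sum(1 for w in tokenize(s) if w in frequent) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:max_sentences])

    # Improve flow: if a chosen sentence opens with 'Hence' or 'However',
    # include the sentence before it as well.
    for i in list(chosen):
        if i > 0 and sentences[i].lower().startswith(LINKING_WORDS):
            chosen.add(i - 1)

    # Sentences are used in the order in which they appear in the text.
    return [sentences[i] for i in sorted(chosen)]
```

            Even in this small sketch the weakness noted above is visible: the sentence pulled in ahead
            of 'However' is included only because of its position, whether or not it is about the subject.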
            To try to show the subject content, weighting can be given to sentences from certain locations in
            the document (e.g., the introduction) and to sentences containing cue words (e.g., ‘finally’, which
            suggests that a conclusion is starting). In addition, an organisation can give a weighting to words
            which are important to them: a footwear producer, for example, could require that every sentence
            containing the words foot or shoe should be included in the abstract.
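            The weighting rules just described might be expressed as follows. This is a sketch under stated
            assumptions: the bonus values, the intro_length parameter and the function names are
            hypothetical, and only the cue word 'finally' and the organisation words 'foot' and 'shoe'
            come from the text.

```python
import re

CUE_WORDS = {"finally"}        # cue word from the text: a conclusion is starting
ORG_WORDS = {"foot", "shoe"}   # words the footwear producer requires


def word_set(sentence):
    """Return the set of lower-cased alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def sentence_weight(sentence, index, intro_length=1,
                    location_bonus=1.0, cue_bonus=1.0):
    """Weight a sentence by its location and by any cue words it contains."""
    weight = 0.0
    if index < intro_length:            # sentence lies in the introduction
        weight += location_bonus
    if word_set(sentence) & CUE_WORDS:  # sentence contains a cue word
        weight += cue_bonus
    return weight


def must_include(sentence):
    """True if the sentence contains a word the organisation always wants."""
    return bool(word_set(sentence) & ORG_WORDS)


def select_sentences(sentences, threshold=1.0):
    """Keep required sentences plus any whose weight reaches the threshold."""
    return [s for i, s in enumerate(sentences)
            if must_include(s) or sentence_weight(s, i) >= threshold]
```

            Note that this sketch matches whole words only, so 'shoes' would not trigger the rule for
            'shoe'; a practical system would need stemming or an extended word list.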
            Function: noun
            Text: 1
            Synonyms FOOL 3, butt, chump, dupe, fall guy, gudgeon, gull, pigeon, sap, sucker
            || 2
            Synonyms DOLLAR, bill, ||bone, ||buck, ||frogskin, ||iron man, one, ||skin, ||smacker, ||smackeroo
            The aims of an IR-oriented thesaurus are quite different. Consider, for example, this excerpt of
            the INSPEC Thesaurus (built to assist IR in the fields of physics, electrical engineering,
            electronics, computers and control):
            THESAURUS search words: natural languages
            UF natural language processing (UF=used for natural language processing)
            BT languages (BT=broader term is languages)
            TT languages (TT=top term in a hierarchy of terms)
            RT artificial intelligence (RT=related term/s)
                   computational linguistics
                   formal languages
                   programming languages
                   query languages
                   specification languages
                   speech recognition
                   user interfaces
            CC C4210L; C6140D; C6180N; C7820 (CC=classification code)
            DI January 1985 (DI=date [1985])
            PT high level languages (PT=prior term to natural languages)
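            One way to picture such a record is as a small data structure. The Python sketch below is an
            illustration only: the class name, field names and lookup functions are assumptions made for
            the example, not part of INSPEC, and only a subset of the related terms is included.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str                                          # preferred (indexing) term
    used_for: list = field(default_factory=list)       # UF: non-preferred entry terms
    broader: list = field(default_factory=list)        # BT: broader terms
    top: list = field(default_factory=list)            # TT: top terms of the hierarchy
    related: list = field(default_factory=list)        # RT: related terms
    classification_codes: list = field(default_factory=list)  # CC
    date_introduced: str = ""                          # DI
    prior_term: str = ""                               # PT


# The record from the excerpt, with a subset of its related terms.
ENTRY = ThesaurusEntry(
    term="natural languages",
    used_for=["natural language processing"],
    broader=["languages"],
    top=["languages"],
    related=["artificial intelligence", "formal languages", "speech recognition"],
    classification_codes=["C4210L", "C6140D", "C6180N", "C7820"],
    date_introduced="January 1985",
    prior_term="high level languages",
)


def build_lookup(entries):
    """Map every preferred and non-preferred term to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.term] = entry.term
        for alternative in entry.used_for:
            lookup[alternative] = entry.term
    return lookup


def preferred_term(lookup, phrase):
    """Return the controlled term for a phrase, or None if it is not listed."""
    return lookup.get(phrase.strip().lower())
```

            With this structure, an indexer or searcher who enters 'natural language processing' is led
            to the controlled term 'natural languages', and the broader and related terms are available
            for widening or narrowing a search.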
            This is still a manually generated thesaurus (more on this later), but the differences are already
            apparent: its objective is no longer to provide a better, richer vocabulary to a writer. Instead, it
            aims to:
             —   Assist indexing by providing a common, precise and controlled vocabulary. For example,
                 libraries commonly use a similar hierarchy to classify their books.





                                             LOVELY PROFESSIONAL UNIVERSITY                                   241