Page 226 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




            This may mean that research on automatic indexing has so far given little consideration to
            historicist and pragmatic approaches. As Svenonius claims, automating subject determination seems
            to belong to logical positivism: a subject is considered to be a string that occurs above a certain
            frequency, is not a stop word, and/or is found in a given location (e.g., the title); or, in clustering
            algorithms, inferences are made such as “if document A is on subject X, then if document B is
            sufficiently similar to document A (above a certain threshold), then document B is on that subject.”
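The two heuristics just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the source: the stop-word list, the function names (candidate_subjects, cosine_similarity, inherit_subject), the default thresholds, and the choice of cosine similarity over raw term counts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are"}


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_terms(text):
    """Count the tokens of a text that are not stop words."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)


def candidate_subjects(title, body, min_count=2):
    """Return terms that are not stop words and that either occur at
    least min_count times in title plus body, or appear in the title."""
    counts = content_terms(title + " " + body)
    title_terms = set(content_terms(title))
    return {t for t, n in counts.items() if n >= min_count or t in title_terms}


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the two texts' term-count vectors."""
    a, b = content_terms(text_a), content_terms(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def inherit_subject(doc_a, subject, doc_b, threshold=0.5):
    """Clustering-style inference: if doc_b is similar enough to doc_a,
    assign doc_a's subject to doc_b; otherwise return None."""
    return subject if cosine_similarity(doc_a, doc_b) >= threshold else None
```

Note that nothing in this sketch looks beyond the strings themselves: the intended users, their terminology, and the purpose of the index play no role, which is the limitation the passage attributes to such approaches.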
            A classification of approaches according to their epistemological point of view might look like
            this:
            Empiricist approaches (inductive, bottom-up methods)
              •  Classical IR
              •  Tf-idf
              •  Neural network
              •  Forms of: Bibliometric Knowledge Organization
            Rationalist approaches (deductive, top-down approaches, rule-based systems)
              •  Approaches based on universalistic assumptions about language and the mind
              •  Semantic primitives
              •  Semantics, compositional
            Historicist approaches (contextualizing approaches)
              •  Forms of: Bibliometric Knowledge Organization
              •  Sublanguage approaches
              •  Genre analysis (e.g., Rehm, 2002)
            Pragmatic approaches (approaches considering values, goals, interests, “paradigms”, epistemologies)
              •  Forms of: Bibliometric Knowledge Organization
              •  Approaches based on exemplary documents
            “For the past ten years DRTC/ISI have had several projects on automatic indexing and automatic
            classification based on the conceptual principles of faceted classifications by Ranganathan and
            Bhattacharyya’s theory of ‘deep structure of subject indexing languages’. E.g. POPSI (knowledge
            representation model chosen to support inference rules for syntax synthesis), PROMETHEUS (parses
            expressive titles and extracts noun phrases within documents which are then processed through a
            knowledge representation model to generate meaningful strings) and VYASA (a knowledge
            representation system for automatic maintenance of analytico-synthetic scheme).” Aida Slavic,
            2006-09-03, message posted on iskol@lists.gseis.ucla.edu.

            Automatic and human indexing: Comparative aspects

            Martin Tulic expresses a skeptical attitude towards automatic indexing:
            “The primary reason computers cannot automatically generate usable indexes is that, in indexing,
            abstraction is more important than alphabetization. Abstractions result from intellectual processes
            based on judgments about what to include and what to exclude. Computers are good at algorithmic
            processes such as alphabetization, but not good at inexplicable processes such as abstraction.
            Another reason is that headings in an index do not depend solely on terms used in the document;
            they also depend on terminology employed by intended users of the index and on their familiarity
            with the document. For example: in medical indexing, separate entries may need to be provided for
            brand names of drugs, chemical names, popular names and names used in other countries, even
            when certain of the names are not mentioned in the text. A third reason is that indexes should not
            contain headings for topics for which there is no information in the document.



                                             LOVELY PROFESSIONAL UNIVERSITY                                   221