Page 218 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 11: Indexing Language: Types and Characteristics




                               Pragmatic
                               Discourse
                               Semantic
                               Syntactic
                               Lexical
                               Morphological
                               Phonetic
                               Liddy's model (2003) of Natural Language Processing


            Hjørland has suggested in his writings that approaches to Library and Information Science (LIS)
            are basically epistemological approaches, which is why they may be classified according to
            epistemological positions, e.g. into empiricist, rationalist, historicist and pragmatist approaches.
            (For the application of these categories to indexing in general, see indexing theory.) Is this
            classification also possible and valid for automatic indexing?
            In principle, this should be the case. However, as Liddy (2003) points out, it is the “lower levels”
            of language that have been thoroughly researched and implemented in natural language processing.
            These lower levels (sounds, words, sentences) are more closely related to automatic indexing, while
            the higher levels (meaning, semantics, pragmatics, discourse) are more closely related to human
            understanding and indexing.
            This may mean that research on automatic indexing has so far paid little attention to historicist
            and pragmatic approaches. As Svenonius (2000, pp. 46-49) claims, automating subject determination
            seems to belong to logical positivism: a subject is considered to be a string that occurs above
            a certain frequency, is not a stop word, and/or is found in a given location (e.g. the title). In
            clustering algorithms, inferences are made such as “if document A is on subject X, then if document
            B is sufficiently similar to document A (above a certain threshold), then document B is on that
            subject.”
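            The two rules described above can be sketched in a few lines of Python. This is only a minimal
            illustration, not Svenonius's own procedure: the stop-word list, the frequency and similarity
            thresholds, the use of Jaccard overlap as the similarity measure, and all function names
            (candidate_subjects, jaccard, inherit_subject) are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word strings."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_subjects(title, body, min_freq=3):
    """Return strings that are not stop words and that either reach
    min_freq occurrences in the body or appear in the title."""
    counts = Counter(t for t in tokenize(body) if t not in STOP_WORDS)
    frequent = {t for t, n in counts.items() if n >= min_freq}
    in_title = {t for t in tokenize(title) if t not in STOP_WORDS}
    return frequent | in_title


def jaccard(a, b):
    """Share of subject strings that two documents have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def inherit_subject(known_terms, known_subject, new_terms, threshold=0.5):
    """If the new document is sufficiently similar to a document whose
    subject is known, assign that subject; otherwise return None."""
    if jaccard(known_terms, new_terms) >= threshold:
        return known_subject
    return None
```

            Note that nothing in the sketch looks at meaning, context or purpose: a “subject” is whatever
            string passes the counting and location tests, which is the point of the logical-positivist
            characterization.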
            A classification of approaches according to their epistemological point of view might look like
            this:
            Empiricist approaches (inductive, bottom-up methods)
              •  Classical IR
              •  Tf-idf
              •  Neural networks
              •  Forms of: Bibliometric Knowledge Organization
            Rationalist approaches (deductive, top-down approaches, rule-based systems)
              •  Approaches based on universalistic assumptions about language and the mind
              •  Semantic primitives
              •  Compositional semantics
            Historicist approaches (contextualizing approaches)
              •  Forms of: Bibliometric Knowledge Organization
              •  Sublanguage approaches
              •  Genre analysis (e.g., Rehm, 2002).
            Pragmatic approaches (approaches considering values, goals, interests, “paradigms”,
            epistemologies).
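
            To make the empiricist, bottom-up character of the tf-idf entry in the list above concrete, the
            following sketch computes one common variant of the weighting (relative term frequency multiplied
            by the logarithm of the inverse document frequency). Other variants exist; the function name
            tf_idf and the input format (a list of token lists) are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists. Return one {term: weight} dict
    per document, using tf = count / document length and
    idf = log(number of documents / documents containing the term)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

            The weights are induced entirely from counts in the collection itself: a term that occurs in every
            document receives a weight of zero, whatever it means to a human reader.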





                                             LOVELY PROFESSIONAL UNIVERSITY                                   213