Page 218 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
[Figure: Liddy's model (2003) of Natural Language Processing. Levels shown, from highest to
lowest: Pragmatic, Discourse, Semantic, Syntactic, Lexical, Morphological, Phonetic.]
Hjørland has suggested in his writings that approaches to Library and Information Science (LIS)
are basically epistemological approaches, which is why they may be classified according to
epistemological positions, e.g. into empiricist, rationalist, historicist and pragmatist approaches.
(For the application of these categories to indexing in general, see indexing theory.) Is this
classification also possible and valid for automatic indexing?
In principle, this should be the case. However, as Liddy (2003) points out, it is the “lower levels”
of language that have been thoroughly researched and implemented in natural language processing.
Such lower levels (sounds, words, sentences) are more closely related to automatic indexing, while
the higher levels (meaning, semantics, pragmatics, discourse) are more closely related to human
understanding and indexing.
This may mean that research on automatic indexing has so far paid little attention to historicist
and pragmatic approaches. As Svenonius (2000, pp. 46-49) claims, automating subject determination
seems to belong to logical positivism: a subject is considered to be a string that occurs above
a certain frequency, is not a stop word, and/or is found in a given location (e.g. the title); or, in
clustering algorithms, inferences are made such as “if document A is on subject X, then if document
B is sufficiently similar to document A (above a certain threshold), then document B is on that
subject.”
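The two rules Svenonius describes can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from Svenonius or from any particular indexing system: the stop list, the frequency threshold `min_freq`, the similarity threshold and all function names are assumptions made for the example, and cosine similarity over raw term counts stands in for whatever similarity measure a real clustering algorithm would use.

```python
from collections import Counter
import math
import re

# Hypothetical stop list, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "for", "to", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_subjects(title, body, min_freq=2):
    """Return non-stop-word strings that occur at least min_freq times
    in the body, or that appear in the title (a 'given location')."""
    counts = Counter(t for t in tokenize(body) if t not in STOP_WORDS)
    frequent = {term for term, n in counts.items() if n >= min_freq}
    in_title = {t for t in tokenize(title) if t not in STOP_WORDS}
    return frequent | in_title


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-count vectors of two texts."""
    a = Counter(t for t in tokenize(text_a) if t not in STOP_WORDS)
    b = Counter(t for t in tokenize(text_b) if t not in STOP_WORDS)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def inherit_subject(doc_a, subject_of_a, doc_b, threshold=0.5):
    """If doc_b is sufficiently similar to doc_a (above the threshold),
    infer that doc_b is on the same subject; otherwise return None."""
    if cosine_similarity(doc_a, doc_b) >= threshold:
        return subject_of_a
    return None
```

For example, `candidate_subjects("Automatic indexing", "Indexing assigns terms. Automatic indexing of documents uses term frequency in the documents.")` selects the title words plus every non-stop word occurring at least twice in the body. Note that nothing in the sketch refers to the purpose, context or users of the document, which is the sense in which such procedures ignore historicist and pragmatic considerations.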
A classification of approaches according to their epistemological point of view might look like
this:
Empiricist approaches (inductive, bottom-up methods)
• Classical IR
• Tf-idf
• Neural networks
• Forms of: Bibliometric Knowledge Organization
Rationalist approaches (deductive, top-down approaches, rule-based systems)
• Approaches based on universalistic assumptions about language and the mind
• Semantic primitives
• Compositional semantics
Historicist approaches (contextualizing approaches)
• Forms of: Bibliometric Knowledge Organization
• Sublanguage approaches
• Genre analysis (e.g., Rehm, 2002).
Pragmatic approaches (approaches considering values, goals, interests, “paradigms”,
epistemologies).
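To make the empiricist, bottom-up character of tf-idf weighting concrete, the following minimal sketch computes term weights purely from counts in a collection. It assumes one common variant (term frequency as the raw count divided by document length, inverse document frequency as log(N / df)); other weightings exist, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf weights for a list of tokenized documents.

    Assumed variant: tf = raw count / document length,
    idf = log(number of documents / document frequency).
    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["index", "term", "index"],
    ["term", "weight"],
    ["term", "query"],
]
weights = tf_idf(docs)
# The highest-weighted term of the first document is its index-term candidate.
best_term = max(weights[0], key=weights[0].get)
```

In this toy collection the word "term" occurs in every document, so its weight is zero everywhere, while "index" is both frequent in the first document and rare in the collection and therefore receives the highest weight there. The weights are induced entirely from the data, with no rules about language and no reference to users' goals.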
LOVELY PROFESSIONAL UNIVERSITY 213