Page 217 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



                   Notes         Automatic indexing may be related to particular views on semantics and on systems evaluation
                                 that differ from the philosophies associated with “intellectual indexing”. In this view, semantic
                                 relations such as synonymy may be understood as a high degree of co-occurrence.
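                                 The idea that a semantic relation can be read off co-occurrence data can be made concrete with an
                                 association measure. The sketch below is illustrative only: it assumes naive whitespace tokenization
                                 and uses the cosine (Ochiai) coefficient co(a, b) / sqrt(df(a) · df(b)), where df is the number of
                                 documents containing a term and co is the number containing both. This is one of several possible
                                 measures, and the toy documents and helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def association_table(docs):
    """Cosine (Ochiai) association for every pair of terms that co-occur in at
    least one document: co(a, b) / sqrt(df(a) * df(b))."""
    term_sets = [set(doc.lower().split()) for doc in docs]  # naive whitespace tokens
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing both terms of a pair
    for terms in term_sets:
        df.update(terms)
        for pair in combinations(sorted(terms), 2):  # pairs stored in alphabetical order
            co[pair] += 1
    return {(a, b): n / math.sqrt(df[a] * df[b]) for (a, b), n in co.items()}

def association(table, a, b):
    """Look up a pair regardless of argument order; 0.0 if the terms never co-occur."""
    return table.get(tuple(sorted((a, b))), 0.0)

docs = [
    "automobile engine repair",
    "car automobile dealer",
    "car automobile engine",
    "library catalogue indexing",
]
table = association_table(docs)
print(round(association(table, "car", "automobile"), 3))  # 0.816
print(association(table, "car", "library"))               # 0.0
```

                                 On this toy data "engine" is associated with "automobile" exactly as strongly as "car" is, which shows
                                 that raw co-occurrence captures general relatedness rather than synonymy alone.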
                                 Anderson and Pérez-Carballo write:
                                 “Throughout the history of automatic indexing, two major theoretical models have emerged: the
                                 ‘vector-space model’ and the probabilistic model. Sparck Jones, Walker and Robertson (2000) have
                                 provided a thorough review of the development, versions, results, and current status of the
                                 probabilistic model. In comparing this model to others, they conclude that ‘by far the best-developed
                                 non-probabilistic view of IR is the vector-space model (VSM), most famously embodied in the SMART
                                 system (Salton, 1975, Salton & McGill, 1983a). In some respect the basic logic of the VSM is common
                                 to many other approaches, including our own [i.e., the probabilistic model] . . . In practice the
                                 difference [between these two models] has become somewhat blurred.
                                 Each approach has borrowed ideas from the other, and to some extent the original motivations
                                 have become disguised by the process. . . . This mutual learning is reflected in the results of successive
                                 round[s] of TREC. . . . It may be argued that the performance differences that do appear have more
                                 to do with choices of the device set used, and detailed matters of implementation, than with
                                 foundational differences of approach’.
                                 The focus of our discussion will be on the automatic indexing of language texts. The various tactics
                                 and strategies are emphasized, rather than the underlying theoretical models”.
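                                 To make the two models concrete, the sketch below ranks a few toy documents against a query in two
                                 ways: a vector-space score (tf-idf weights compared by cosine similarity, in the spirit of SMART) and
                                 a probabilistic score (Okapi BM25, the weighting function associated with the Sparck Jones, Walker and
                                 Robertson model). The details are assumptions chosen for brevity: whitespace tokenization, raw
                                 tf × log(N/df) weighting (only one of many SMART variants), BM25 parameters k1 = 1.2 and b = 0.75, and
                                 a common non-negative idf variant, log(1 + (N − n + 0.5)/(n + 0.5)). It is a teaching sketch, not
                                 either group's implementation.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenization (an assumption; real systems also stem, drop stopwords, etc.)
    return text.lower().split()

def vsm_scores(query, docs):
    """Vector-space model: tf-idf weighted vectors compared by cosine similarity."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / n) for t, n in df.items()}

    def weigh(tokens):
        tf = Counter(tokens)
        return {t: f * idf.get(t, 0.0) for t, f in tf.items()}  # unseen terms weigh 0

    q = weigh(tokenize(query))
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    scores = []
    for doc in tokenized:
        d = weigh(doc)
        d_norm = math.sqrt(sum(w * w for w in d.values()))
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Probabilistic model: Okapi BM25 term weights summed over the query terms."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    df = Counter(t for doc in tokenized for t in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in set(tokenize(query)):
            if tf[t] == 0:
                continue
            # Non-negative idf variant: log(1 + (N - n + 0.5) / (n + 0.5))
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(score)
    return scores

docs = [
    "automatic indexing of texts",
    "probabilistic model of retrieval",
    "vector space model of retrieval",
]
query = "vector space retrieval"
for name, scorer in [("VSM", vsm_scores), ("BM25", bm25_scores)]:
    scores = scorer(query, docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(name, ranking)  # both print [2, 1, 0]
```

                                 On this toy collection both functions give the same ranking. That illustrates, but does not prove, the
                                 quoted point that in practice the two models have converged and differ mainly in implementation details.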
                                 Sparck Jones, Walker and Robertson (2000) compare their own probabilistic approach with other
                                 “approaches, models, methods and techniques”:
                                 The vector space model
                                 Probabilistic indexing and a unified model
                                 Dependency
                                 Logical information retrieval
                                 Networks
                                 Regression
                                 Other models (Hidden Markov Model)
                                 Golub (2005) made a distinction between “text categorization” and “document clustering”. The latter
                                 approach is based on the information retrieval tradition, while text categorization is based on
                                 machine learning in the artificial intelligence tradition.
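                                 The practical difference is that text categorization assigns documents to predefined classes learned
                                 from labelled examples, whereas document clustering groups documents by mutual similarity without any
                                 predefined classes. The toy sketch below shows that contrast and is not Golub's method: the Jaccard
                                 word-overlap measure, the average-similarity classifier and the single-link threshold clustering (with a
                                 hypothetical threshold of 0.2) are choices made here for brevity.

```python
from collections import defaultdict

def words(text):
    # Naive bag-of-words: lower-cased whitespace tokens as a set
    return set(text.lower().split())

def jaccard(a, b):
    # Word-overlap similarity between two sets of words
    return len(a & b) / len(a | b) if a | b else 0.0

def categorize(text, training):
    """Text categorization (supervised): return the predefined class whose
    labelled examples are, on average, most similar to the new text."""
    by_class = defaultdict(list)
    for example, label in training:
        by_class[label].append(words(example))
    w = words(text)
    return max(by_class, key=lambda c: sum(jaccard(w, e) for e in by_class[c]) / len(by_class[c]))

def cluster(texts, threshold=0.2):
    """Document clustering (unsupervised): single-link grouping in which texts
    linked by a chain of pairs with Jaccard >= threshold share a cluster."""
    parent = list(range(len(texts)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [words(t) for t in texts]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(texts)):
        groups[find(i)].append(i)
    return sorted(groups.values())

training = [
    ("thesaurus construction for indexing", "LIS"),
    ("subject indexing and classification", "LIS"),
    ("engine repair for cars", "Automotive"),
    ("car dealer prices", "Automotive"),
]
print(categorize("automatic indexing with a thesaurus", training))  # LIS

texts = [
    "automatic indexing of texts",
    "automatic indexing of documents",
    "car engine repair",
    "car engine prices",
]
print(cluster(texts))  # [[0, 1], [2, 3]]
```

                                 Note that the categorizer can only ever answer with one of the labels it was trained on, while the
                                 clustering step discovers its groups from the data and leaves them unnamed.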
                                 Luckhardt (2006) presents the following approaches:
                                 The general linguistic approach
                                 The morpho-syntactic approach to automatic tagging
                                 The sublanguage approach: How can different domains be dealt with?
                                 The semantic relations approach: towards a semantic interlingua
                                 The semantic (text) knowledge approach: classification and thesauri and their use in NLP.
                                 As we see, different authors writing on automatic indexing seem to disagree on which approaches
                                 actually exist. One way to compare the approaches would be to consider the different levels of
                                 language (morphological, syntactic, semantic, and so on) that each of them deals with.









            212                              LOVELY PROFESSIONAL UNIVERSITY