Page 120 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Unit 6: Information Retrieval Model and Search Strategies




            Model types

            [Figure 6.2: grid of IR models. Rows (mathematical basis): set-theoretic, algebraic,
            probabilistic. Columns (properties of the model): without term-interdependencies; with
            term-interdependencies, subdivided into immanent and transcendent term-interdependencies.]

            Categorization of IR-models (translated from German entry, original source Dominik Kuropka).
            For information retrieval to be efficient, documents are typically transformed into a suitable
            representation, and several such representations exist. The picture above illustrates the
            relationships among some common models. In the picture, the models are categorized along two
            dimensions: the mathematical basis and the properties of the model.
            First dimension: mathematical basis
               Set-theoretic models represent documents as sets of words or phrases. Similarities are usually
                 derived from set-theoretic operations on those sets. Common models are:
                 • Standard Boolean model
                 • Extended Boolean model
                 • Fuzzy retrieval
               Algebraic models usually represent documents and queries as vectors, matrices, or tuples.
                 The similarity of the query vector and the document vector is expressed as a scalar value.
                 Common models are:
                 • Vector space model
                 • Generalized vector space model
                 • (Enhanced) Topic-based Vector Space Model
                 • Extended Boolean model
                 • Latent semantic indexing, also known as latent semantic analysis
               Probabilistic models treat the process of document retrieval as probabilistic inference.
                 Similarities are computed as the probability that a document is relevant to a given query.
                 Probabilistic theorems such as Bayes' theorem are often used in these models.
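
            The three mathematical bases above can be contrasted in a small sketch. This is an
            illustration only, not an implementation from the source. The whitespace tokenizer, the
            function names, the sample documents and the probability values are all assumptions made
            for this example.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def boolean_and(query, doc):
    """Set-theoretic view: the document matches if it contains every query term."""
    return set(tokens(query)) <= set(tokens(doc))


def cosine(query, doc):
    """Algebraic view: cosine of the angle between raw term-count vectors."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def posterior_relevance(prior, p_term_given_rel, p_term_given_nonrel):
    """Probabilistic view: Bayes' theorem for P(relevant | term observed)."""
    evidence = p_term_given_rel * prior + p_term_given_nonrel * (1 - prior)
    return p_term_given_rel * prior / evidence if evidence else 0.0


# Hypothetical two-document collection and query.
docs = ["information retrieval models", "boolean retrieval with sets"]
query = "retrieval models"

for doc in docs:
    print(doc, "|", boolean_and(query, doc), "|", round(cosine(query, doc), 3))

# Illustrative probabilities only; they are not taken from any data set.
print(round(posterior_relevance(prior=0.1, p_term_given_rel=0.8,
                                p_term_given_nonrel=0.2), 3))
```

            The Boolean test gives a yes/no answer, the cosine gives a scalar that can be used for
            ranking, and the Bayes computation gives a probability of relevance. That is the
            distinction the three categories draw.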




                                             LOVELY PROFESSIONAL UNIVERSITY                                   115