
Unit 10: Information Storage and Retrieval System




          the study of what would later be called bibliometrics. In the 1930s and 1940s, S. C. Bradford used the
          term “relevant” to characterize articles relevant to a subject (cf. Bradford’s law). In the 1950s, the first
          information retrieval systems emerged, and researchers noted the retrieval of irrelevant articles as a
          significant concern. In 1958, B. C. Vickery made the concept of relevance explicit in an address at the
          International Conference on Scientific Information. Since 1958, information scientists have explored
          and debated definitions of relevance.




                   A particular focus of the debate was the distinction between “relevance to a subject”
                   or “topical relevance” and “user relevance”.

          Evaluation

          The information retrieval community has emphasized the use of test collections and benchmark tasks
          to measure topical relevance, starting with the Cranfield Experiments of the early 1960s and culminating
          in the TREC evaluations that continue to this day as the main evaluation framework for information
          retrieval research. To evaluate how well an information retrieval system retrieves topically
          relevant results, the relevance of retrieved results must be quantified. In Cranfield-style evaluations,
          this typically involves assigning a relevance level to each retrieved result, a process known as relevance
          assessment. Relevance levels can be binary (indicating a result is relevant or that it is not relevant), or
          graded (indicating results have a varying degree of match between the topic of the result and the
          information need).
          Once relevance levels have been assigned to the retrieved results, information retrieval performance
          measures can be used to assess the quality of a retrieval system’s output. In contrast to this focus
          solely on topical relevance, the information science community has emphasized user studies that
          consider user relevance. These studies often focus on aspects of human-computer interaction.
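The Cranfield-style procedure described above can be illustrated with a small sketch. It assumes relevance assessments are already available, as a set of relevant document IDs for the binary case and as a dictionary of relevance levels for the graded case. The text does not name specific performance measures; precision, recall, and normalised discounted cumulative gain (nDCG) are used here only as common examples, and the function names and document IDs are hypothetical.

```python
import math

def precision_recall(ranked_ids, relevant_ids, k):
    """Precision and recall at cutoff k, using binary relevance judgements."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

def dcg(gains):
    """Discounted cumulative gain: each gain is discounted by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(ranked_ids, graded, k):
    """nDCG at cutoff k, using graded relevance levels (unjudged documents count as 0)."""
    gains = [graded.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(graded.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical system output and assessments for one topic.
ranked = ["d3", "d1", "d7", "d2"]
binary_judgements = {"d1", "d2", "d5"}
graded_judgements = {"d1": 2, "d2": 1, "d5": 2}

p, r = precision_recall(ranked, binary_judgements, k=4)
score = ndcg(ranked, graded_judgements, k=4)
```

Here two of the four retrieved documents are relevant, so precision at 4 is 0.5, and two of the three relevant documents are retrieved, so recall is 2/3. The graded measure additionally rewards placing highly relevant documents near the top of the ranking.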

          Clustering and Relevance

          The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are
          similar to each other have a high likelihood of being relevant to the same information need. With
          respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally.
          The global interpretation assumes that there exists some fixed set of underlying topics derived from
          inter-document similarity. These global clusters or their representatives can then be used to relate
          the relevance of two documents (e.g., two documents in the same cluster should both be relevant to the
          same request). Methods in this spirit include cluster-based information retrieval and cluster-based
          document expansion, such as latent semantic analysis or its language modelling equivalents. It is
          important to ensure that clusters, either in isolation or in combination, successfully model the set
          of possible relevant documents.
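As a rough illustration of the global interpretation, the sketch below matches a query against cluster representatives and then ranks only the members of the best-matching cluster. It is a minimal sketch under stated assumptions, not a specific published method: the clusters are supplied by hand rather than derived from inter-document similarity, documents are represented as simple term-count vectors, and all names and sample texts are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (assumption: whitespace tokenisation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Cluster representative: summed term counts (cosine ignores the scale)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total

def cluster_retrieve(query, docs, clusters):
    """Pick the cluster whose centroid best matches the query, then rank its members."""
    q = vectorize(query)
    vecs = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    best = max(
        clusters,
        key=lambda name: cosine(q, centroid([vecs[d] for d in clusters[name]])),
    )
    ranked = sorted(clusters[best], key=lambda d: cosine(q, vecs[d]), reverse=True)
    return best, ranked

# Hypothetical collection with a fixed, hand-assigned clustering.
docs = {
    "d1": "ranking documents by relevance to a query",
    "d2": "relevance feedback improves query ranking",
    "d3": "citation counts in bibliometrics journals",
    "d4": "bibliometrics studies journals and citation patterns",
}
clusters = {"retrieval": ["d1", "d2"], "bibliometrics": ["d3", "d4"]}

best_cluster, ranked_docs = cluster_retrieve("query relevance", docs, clusters)
```

The sketch also shows why cluster quality matters: a relevant document assigned to a different cluster would never be returned, which is the risk the text points to when it says the clusters must successfully model the set of possible relevant documents.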
          A second interpretation, most notably advanced by Ellen Voorhees, focuses on the local relationships
          between documents. The local interpretation avoids having to model the number or size of clusters
          in the collection and allows relevance at multiple scales. Methods in this spirit include multiple
          cluster retrieval, spreading activation, and relevance propagation methods.
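The local interpretation can be sketched as a single round of score propagation: each document keeps part of its own retrieval score and borrows the rest from its nearest neighbours, weighted by similarity. No clusters are modelled. This is only an illustrative sketch in the spirit of the propagation methods named above, not any particular published algorithm; the mixing weight `alpha`, the neighbourhood size `n_neighbors`, and the similarity values (assumed to lie between 0 and 1) are all hypothetical.

```python
def propagate(scores, similarity, alpha=0.5, n_neighbors=1):
    """One smoothing round over a nearest-neighbour similarity graph.

    scores:     initial retrieval score per document
    similarity: similarity[a][b] for every pair of distinct documents
    alpha:      share of the final score borrowed from neighbours (assumption)
    """
    smoothed = {}
    for doc, own in scores.items():
        # Nearest neighbours of this document only; no global clustering is needed.
        neighbours = sorted(
            (other for other in scores if other != doc),
            key=lambda other: similarity[doc][other],
            reverse=True,
        )[:n_neighbors]
        if neighbours:
            borrowed = sum(
                similarity[doc][other] * scores[other] for other in neighbours
            ) / len(neighbours)
        else:
            borrowed = own  # isolated document: nothing to borrow from
        smoothed[doc] = (1 - alpha) * own + alpha * borrowed
    return smoothed

# Hypothetical data: only "a" matched the query; "b" is very similar to "a", "c" is not.
initial = {"a": 1.0, "b": 0.0, "c": 0.0}
sim = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1},
}

result = propagate(initial, sim)
```

After one round, "b" receives a much higher score than "c" because it is close to a document that matched the query, which is the behaviour the cluster hypothesis predicts for similar documents.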






