Page 104 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL

Unit 10: Information Storage and Retrieval System




          In a classification task, the precision for a class is the number of true positives (i.e. the number of
          items correctly labeled as belonging to the positive class) divided by the total number of elements
          labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are
          items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of
          true positives divided by the total number of elements that actually belong to the positive class (i.e.
          the sum of true positives and false negatives, which are items which were not labeled as belonging
          to the positive class but should have been).
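          The definitions above can be sketched directly from the raw counts. A minimal illustration (the counts below are hypothetical, chosen only to make the arithmetic visible):

          ```python
          # Precision and recall from classification counts (hypothetical values).
          true_positives = 8    # items correctly labeled as positive
          false_positives = 2   # items incorrectly labeled as positive
          false_negatives = 4   # positive items the classifier missed

          # Precision: true positives over everything labeled positive.
          precision = true_positives / (true_positives + false_positives)  # 8 / 10 = 0.8

          # Recall: true positives over everything that is actually positive.
          recall = true_positives / (true_positives + false_negatives)     # 8 / 12 ~ 0.667
          ```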
          Often, there is an inverse relationship between precision and recall, where it is possible to increase
          one at the cost of reducing the other. For example, an information retrieval system (such as a search
          engine) can often increase its recall by retrieving more documents, at the cost of an increasing number
          of irrelevant documents retrieved (decreasing precision). Similarly, a classification system for
          deciding whether or not, say, a fruit is an orange, can achieve high precision by only classifying
          fruits with the exact right shape and color as oranges, but at the cost of low recall due to the number
          of false negatives from oranges that did not quite match the specification.

          Information Retrieval Context

          In information retrieval contexts, precision and recall are defined in terms of a set of retrieved
          documents (e.g. the list of documents produced by a web search engine for a query) and a set of
          relevant documents (e.g. the list of all documents on the internet that are relevant for a certain topic).


          10.3   Precision

          In the field of information retrieval, precision is the fraction of retrieved documents that are relevant
          to the search:

               Precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|

          Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off
          rank, considering only the topmost results returned by the system. This measure is called precision
          at n or P@n.

          For example, for a text search on a set of documents, precision is the number of correct results divided
          by the number of all returned results.
          Precision is also used with recall, the percentage of all relevant documents that are returned by the
          search. The two measures are sometimes combined in the F1 score (or F-measure) to provide a
          single measurement for a system.
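          Precision at a cut-off rank (P@n) and the F1 score can be sketched over a ranked result list. The relevance judgments and collection size below are hypothetical:

          ```python
          # Relevance of each result in rank order (True = relevant); hypothetical data.
          ranked_relevance = [True, False, True, True, False, False, True, False]
          total_relevant = 6  # relevant documents in the whole collection (assumed)

          def precision_at(n, relevance):
              """P@n: fraction of the top n results that are relevant."""
              return sum(relevance[:n]) / n

          p_at_5 = precision_at(5, ranked_relevance)                 # 3 of top 5 -> 0.6

          precision = sum(ranked_relevance) / len(ranked_relevance)  # 4 / 8 = 0.5
          recall = sum(ranked_relevance) / total_relevant            # 4 / 6 ~ 0.667

          # F1: harmonic mean of precision and recall.
          f1 = 2 * precision * recall / (precision + recall)
          ```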





                   The meaning and usage of “precision” in the field of Information Retrieval differs
                   from the definition of accuracy and precision within other branches of science and
                   technology.


          10.4 Recall

          Recall in information retrieval is the fraction of the documents that are relevant to the query that are
          successfully retrieved:

               Recall = |{relevant documents} ∩ {retrieved documents}| / |{relevant documents}|

          For example, for a text search on a set of documents, recall is the number of correct results divided by
          the number of results that should have been returned.
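          In this set-based view, both measures follow from the overlap between the retrieved set and the relevant set. A short sketch with hypothetical document identifiers:

          ```python
          # Set-based precision and recall for a search query (hypothetical IDs).
          retrieved = {"d1", "d2", "d3", "d5"}  # documents the system returned
          relevant  = {"d1", "d3", "d4", "d6"}  # documents that answer the query

          hits = retrieved & relevant             # correctly retrieved documents

          precision = len(hits) / len(retrieved)  # 2 / 4 = 0.5
          recall = len(hits) / len(relevant)      # 2 / 4 = 0.5
          ```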





                                            LOVELY PROFESSIONAL UNIVERSITY                                   99