Page 104 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL
Unit 10: Information Storage and Retrieval System
In a classification task, the precision for a class is the number of true positives (i.e. the number of
items correctly labeled as belonging to the positive class) divided by the total number of elements
labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are
items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of
true positives divided by the total number of elements that actually belong to the positive class (i.e.
the sum of true positives and false negatives, which are items which were not labeled as belonging
to the positive class but should have been).
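The definitions above can be sketched directly in code. This is a minimal illustration (not taken from the text): precision and recall computed from true-positive, false-positive and false-negative counts.

```python
def precision(tp, fp):
    """True positives divided by everything labeled positive (tp + fp)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """True positives divided by everything that truly is positive (tp + fn)."""
    return tp / (tp + fn)

# Example: 8 items correctly labeled positive, 2 labeled positive in error,
# and 4 positives the classifier missed.
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 0.666...
```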
Often there is an inverse relationship between precision and recall: it is possible to increase
one at the cost of reducing the other. For example, an information retrieval system (such as a search
engine) can often increase its recall by retrieving more documents, at the cost of an increasing
number of irrelevant documents being retrieved (decreasing precision). Similarly, a classification
system for deciding whether or not, say, a fruit is an orange can achieve high precision by classifying
only fruits with exactly the right shape and color as oranges, but at the cost of low recall, owing to
the false negatives from oranges that did not quite match the specification.
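This trade-off can be demonstrated with a toy scoring classifier (the scores and relevance labels below are made up for illustration): lowering the decision threshold admits more true positives (raising recall) but also more false positives (lowering precision).

```python
# Hypothetical (score, is_relevant) pairs for items ranked by a classifier.
items = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, False),
]

def p_r_at_threshold(items, threshold):
    """Precision and recall when everything scoring >= threshold is labeled positive."""
    predicted = [(s, rel) for s, rel in items if s >= threshold]
    tp = sum(rel for _, rel in predicted)
    fn = sum(rel for s, rel in items if s < threshold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(p_r_at_threshold(items, 0.85))  # strict threshold: precision 1.0, recall 0.5
print(p_r_at_threshold(items, 0.45))  # lax threshold: precision ~0.67, recall 1.0
```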
Information Retrieval Context
In information retrieval contexts, precision and recall are defined in terms of a set of retrieved
documents (e.g. the list of documents produced by a web search engine for a query) and a set of
relevant documents (e.g. the list of all documents on the internet that are relevant for a certain topic).
10.3 Precision
In the field of information retrieval, precision is the fraction of retrieved documents that are relevant
to the search:
precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|
Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off
rank, considering only the topmost results returned by the system. This measure is called precision
at n or P@n.
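Precision at n can be sketched as follows; the ranked result list and relevance judgements are hypothetical, invented for this example.

```python
def precision_at_n(ranked_results, relevant, n):
    """Fraction of the top-n retrieved documents that are relevant."""
    top_n = ranked_results[:n]
    hits = sum(1 for doc in top_n if doc in relevant)
    return hits / n

# Made-up document ids: a ranked result list and the set of relevant documents.
ranked = ["d3", "d7", "d1", "d9", "d4", "d8"]
relevant = {"d1", "d3", "d4"}

print(precision_at_n(ranked, relevant, 3))  # 2/3: of the top 3, d3 and d1 are relevant
```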
For example for a text search on a set of documents precision is the number of correct results divided
by the number of all returned results.
Precision is also used with recall, the percentage of all relevant documents that are returned by the
search. The two measures are sometimes combined in the F1 score (or F-measure) to provide a
single measurement for a system.
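As a quick sketch, the F1 score is the harmonic mean of precision and recall:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A system with precision 0.5 and recall 1.0 scores about 0.67, not 0.75:
# the harmonic mean penalises imbalance more than the arithmetic mean would.
print(f1_score(0.5, 1.0))
```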
The meaning and usage of “precision” in the field of Information Retrieval differs
from the definition of accuracy and precision within other branches of science and
technology.
10.4 Recall
Recall in information retrieval is the fraction of the documents that are relevant to the query that are
successfully retrieved:
recall = |{relevant documents} ∩ {retrieved documents}| / |{relevant documents}|
For example, for a text search on a set of documents, recall is the number of correct results divided
by the number of results that should have been returned.
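The set-based definitions of precision and recall can be computed together; the document ids below are invented for illustration.

```python
# Hypothetical sets: what the system returned vs. what is actually relevant.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d3", "d4", "d5", "d6", "d7", "d8"}

hits = retrieved & relevant                  # relevant documents that were retrieved
precision = len(hits) / len(retrieved)       # 3 / 5 = 0.6
recall = len(hits) / len(relevant)           # 3 / 6 = 0.5

print(precision, recall)
```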
LOVELY PROFESSIONAL UNIVERSITY 99