An object is an entity that is represented by information in a database. User queries are matched against the database information. Depending on the application, the data objects may be, for example, text documents, images, audio, mind maps or videos. Often the documents themselves are not kept or stored directly in the IR system; instead, they are represented in the system by document surrogates or metadata.
Most IR systems compute a numeric score indicating how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may be iterated if the user wishes to refine the query.
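The score-and-rank loop just described can be sketched in a few lines of Python. The term-overlap scoring function below is purely illustrative, a stand-in for whatever relevance model (vector space, probabilistic, etc.) a real IR system would use; the document identifiers and texts are invented for the example.

def score(query_terms, document_text):
    """Toy relevance score: number of query terms present in the document."""
    words = set(document_text.lower().split())
    return sum(1 for term in query_terms if term.lower() in words)

def rank(query, documents, top_k=10):
    """Score every document against the query and return the top_k
    highest-scoring document identifiers, as most IR systems do."""
    terms = query.split()
    scored = [(score(terms, text), doc_id) for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for s, doc_id in scored[:top_k] if s > 0]

docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "audio and video objects in a database",
    "d3": "metadata describes a document surrogate",
}
print(rank("rank documents", docs))   # -> ['d1']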
Performance and Correctness Measures
Many different measures for evaluating the performance of information retrieval systems have been proposed. These measures require a collection of documents and a query. All common measures described here assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query. In practice, queries may be ill-posed and there may be different shades of relevance.
Precision
Precision is the fraction of the documents retrieved that are relevant to the user’s information need.
\[
\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}
\]
In binary classification, precision is analogous to positive predictive value. Precision takes all
retrieved documents into account. It can also be evaluated at a given cut-off rank, considering only
the topmost results returned by the system. This measure is called precision at n or P@n.
Note that the meaning and usage of “precision” in the field of Information Retrieval differs from
the definition of accuracy and precision within other branches of science and technology.
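Under the binary ground-truth assumption above, precision and precision at n can be sketched directly from the formula. The document identifiers below are hypothetical; relevance is given as a set of known-relevant documents.

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / len(retrieved)

def precision_at_n(ranked, relevant, n):
    """Precision computed over only the top n ranked results (P@n)."""
    return precision(ranked[:n], relevant)

ranked = ["d1", "d4", "d2", "d5"]
relevant = {"d1", "d2", "d3"}
print(precision(ranked, relevant))          # 2/4 = 0.5
print(precision_at_n(ranked, relevant, 2))  # 1/2 = 0.5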
Recall
Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.
\[
\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}
\]
In binary classification, recall is called sensitivity. It can be viewed as the probability that a relevant document is retrieved by the query.
It is trivial to achieve a recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough; one also needs to measure the number of non-relevant documents retrieved, for example by computing the precision.
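A matching sketch of recall, under the same binary-relevance assumption and with the same illustrative document sets as above. The second call shows why recall alone is uninformative: returning every document trivially yields a recall of 1.0.

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    retrieved = set(retrieved)
    hits = sum(1 for d in relevant if d in retrieved)
    return hits / len(relevant)

ranked = ["d1", "d4", "d2", "d5"]
relevant = {"d1", "d2", "d3"}
print(recall(ranked, relevant))                      # 2/3 ≈ 0.667
print(recall(["d1", "d2", "d3", "d4", "d5"], relevant))  # 1.0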
Fall-Out
The proportion of non-relevant documents that are retrieved, out of all non-relevant documents
available:
\[
\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}
\]
In binary classification, fall-out is closely related to specificity: fall-out equals 1 − specificity. It can be viewed as the probability that a non-relevant document is retrieved by the query.
It is trivial to achieve a fall-out of 0% by returning zero documents in response to any query; like recall, fall-out is therefore only meaningful in combination with other measures.
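Fall-out can be sketched in the same style, again assuming binary ground truth. Here the set of non-relevant documents is derived as the collection minus the relevant set; all names are illustrative.

def fall_out(retrieved, relevant, collection):
    """Fraction of non-relevant documents that were (wrongly) retrieved."""
    non_relevant = set(collection) - set(relevant)
    if not non_relevant:
        return 0.0
    hits = sum(1 for d in retrieved if d in non_relevant)
    return hits / len(non_relevant)

collection = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d2", "d3"}
retrieved = ["d1", "d4", "d2", "d5"]
print(fall_out(retrieved, relevant, collection))  # 2/2 = 1.0
print(fall_out([], relevant, collection))         # 0.0 (retrieve nothing)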