Page 103 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL
Information Storage and Retrieval
An information retrieval process begins when a user enters a query into the system. Queries are
formal statements of information needs, for example search strings in web search engines. In
information retrieval, a query does not uniquely identify a single object in the collection. Instead,
several objects may match the query, perhaps with different degrees of relevance.
An object is an entity that is represented by information in a database. User queries are matched
against the database information. Depending on the application, the data objects may be, for example,
text documents, images, audio, mind maps or videos. Often the documents themselves are not kept
or stored directly in the IR system, but are instead represented in the system by document surrogates
or metadata.
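The idea of matching a query against surrogates rather than full documents can be sketched as follows. This is a minimal illustration, not a description of any particular IR system: the `Surrogate` record, the `score` function, and the sample collection are all hypothetical, and the scoring rule (fraction of query terms found among a surrogate's keywords) is a toy assumption chosen only to show that several objects can match one query with different degrees of relevance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surrogate:
    """Hypothetical metadata record standing in for a stored object."""
    doc_id: str
    title: str
    keywords: frozenset


def score(query: str, surrogate: Surrogate) -> float:
    """Toy relevance score: fraction of query terms found in the keywords."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & surrogate.keywords) / len(terms)


def search(query: str, collection: list) -> list:
    """Return (doc_id, score) pairs for every matching surrogate, best first."""
    hits = [(s.doc_id, score(query, s)) for s in collection]
    hits = [h for h in hits if h[1] > 0]  # drop surrogates with no matching term
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical collection: only metadata is held, not the documents themselves.
collection = [
    Surrogate("d1", "Intro to IR", frozenset({"information", "retrieval", "query"})),
    Surrogate("d2", "Database Systems", frozenset({"database", "query", "sql"})),
    Surrogate("d3", "Mind Maps", frozenset({"mind", "map", "diagram"})),
]

results = search("information retrieval query", collection)
print(results)
```

Here two surrogates match the same query: `d1` contains every query term, while `d2` shares only one, so it is returned with a lower score.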
Self Assessment
Multiple Choice Questions:
1. The physical and electronic diversity of an ......, along with the existence of multiple operating
platforms, enhances robustness, flexibility, and adaptability.
(a) IRSE (b) ISRS
(c) DBMS (d) P2P
2. In information retrieval, a query does not uniquely identify a ...... object in the collection.
(a) Single (b) Double
(c) Triple (d) Fourth
10.2 Precision and Recall
Precision and recall are two widely used metrics for evaluating the correctness of a pattern recognition
algorithm. They can be seen as extended versions of accuracy, a simple metric that computes the
fraction of instances for which the correct result is returned.
When using precision and recall, the set of possible labels for a given instance is divided into two
subsets, one of which is considered “relevant” for the purposes of the metric. Recall is then computed
as the fraction of correctly labeled instances among all instances that actually belong to the relevant
subset, while precision is the fraction of correctly labeled instances among those that the algorithm
assigns to the relevant subset.
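These definitions are often written as formulas. The notation below is an assumption introduced here for illustration and does not appear in the text above: TP (true positives) counts relevant instances the algorithm assigns to the relevant subset, FP (false positives) counts non-relevant instances it assigns there, FN (false negatives) counts relevant instances it misses, and TN (true negatives) counts non-relevant instances it correctly leaves out.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
\]
```

Accuracy counts correct results over both subsets, whereas precision and recall focus only on the relevant subset.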
Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure
of completeness.
In even simpler terms, high recall means you have missed few of the relevant items, but you may have
a lot of useless results to sift through (which would imply low precision). High precision means that
nearly everything returned was a relevant result, but you might not have found all the relevant items
(which would imply low recall).
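The trade-off can be illustrated with a small sketch. It assumes, purely for illustration, that a search returns a ranked list and that only the top k results are shown; the function name `precision_recall_at_k`, the ranking, and the relevance judgments are all hypothetical.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when only the top-k ranked items are returned."""
    returned = ranking[:k]
    hits = sum(1 for doc in returned if doc in relevant)
    # Convention assumed here: an empty result or empty relevant set scores 0.0.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output of a search and hypothetical relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5"]
relevant = {"d3", "d1", "d2"}

narrow = precision_recall_at_k(ranking, relevant, 1)             # return one result
broad = precision_recall_at_k(ranking, relevant, len(ranking))   # return everything

print("narrow:", narrow)
print("broad:", broad)
```

Returning a single result gives perfect precision but finds only one of the three relevant documents. Returning the whole list finds all three, at the cost of five irrelevant results.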
As an example, in an information retrieval scenario, the instances are documents and the task is to
return a set of relevant documents given a search term; or equivalently, to assign each document to
one of two categories, “relevant” and “not relevant”. In this case, the “relevant” documents are simply
those that belong to the “relevant” category. Recall is defined as the number of relevant documents
retrieved by a search divided by the total number of existing relevant documents, while precision is
defined as the number of relevant documents retrieved by a search divided by the total number of
documents retrieved by that search.
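The two definitions above translate directly into set operations. The sketch below is one possible way to compute them; the function names and the document identifiers are hypothetical, and returning 0.0 when a denominator is empty is a convention assumed here, not something the text specifies.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Relevant documents retrieved divided by all documents retrieved."""
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Relevant documents retrieved divided by all existing relevant documents."""
    if not relevant:
        return 0.0  # assumed convention when no relevant documents exist
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search: 4 documents retrieved, 5 relevant documents exist,
# and 2 of the retrieved documents (d1, d3) are relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5", "d6", "d7"}

p = precision(retrieved, relevant)
r = recall(retrieved, relevant)
print(f"precision = {p}, recall = {r}")
```

In this example, precision is 2/4 and recall is 2/5: half of what was returned is relevant, but more than half of the relevant documents were missed.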
LOVELY PROFESSIONAL UNIVERSITY