Page 137 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
• The goal of information retrieval (IR) is to provide users with those documents that will satisfy their information need.
• An information retrieval process begins when a user enters a query into the system.
• Precision is the fraction of the documents retrieved that are relevant to the user’s information need:
$$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$
• Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:
$$\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}$$
• The traditional F-measure, or balanced F-score, is the harmonic mean of precision and recall:
$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
This is also known as the $F_1$ measure, because recall and precision are evenly weighted.
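The general formula for non-negative real β is:
$$F_\beta = \frac{(1 + \beta^2) \cdot (\text{precision} \cdot \text{recall})}{\beta^2 \cdot \text{precision} + \text{recall}}$$
Setting β = 1 recovers $F_1$; β > 1 weights recall more heavily, and β < 1 weights precision more heavily.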
• Average precision emphasizes ranking relevant documents higher. It is the average of precisions computed at the point of each of the relevant documents in the ranked sequence:
$$\text{AveP} = \frac{\sum_{r=1}^{N} \left( P(r) \times \text{rel}(r) \right)}{\text{number of relevant documents}}$$
where r is the rank, N the number of documents retrieved, rel(r) a binary function that equals 1 if the document at rank r is relevant and 0 otherwise, and P(r) the precision at cut-off rank r:
$$P(r) = \frac{|\{\text{relevant retrieved documents of rank } r \text{ or less}\}|}{r}$$
• Mean average precision (MAP) for a set of queries is the mean of the average precision scores for each query:
$$\text{MAP} = \frac{\sum_{q=1}^{Q} \text{AveP}(q)}{Q}$$
where Q is the number of queries.
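The set-based measures (precision, fall-out and the F-measures) can be computed directly from sets of document identifiers. The following is a minimal Python sketch for illustration, not a reference implementation. The function names (`precision`, `recall`, `fall_out`, `f_beta`), the sample document IDs, and the choice to return 0.0 when a denominator is empty are all assumptions. Recall, which the F-measure formulas depend on, uses its standard definition: the fraction of relevant documents that are retrieved.

```python
def precision(relevant: set, retrieved: set) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumption: define precision as 0 when nothing is retrieved
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """Fraction of relevant documents that were retrieved (standard definition)."""
    if not relevant:
        return 0.0  # assumption: define recall as 0 when nothing is relevant
    return len(relevant & retrieved) / len(relevant)


def fall_out(non_relevant: set, retrieved: set) -> float:
    """Fraction of all non-relevant documents that were retrieved."""
    if not non_relevant:
        return 0.0  # assumption: define fall-out as 0 when nothing is non-relevant
    return len(non_relevant & retrieved) / len(non_relevant)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * p * r / (beta^2 * p + r); beta = 1 gives F1."""
    b2 = beta * beta
    denom = b2 * p + r
    if denom == 0:
        return 0.0  # assumption: F is 0 when precision and recall are both 0
    return (1 + b2) * p * r / denom


# Hypothetical collection of 10 documents, 4 of them relevant to the query.
collection = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5", "d6", "d7"}
non_relevant = collection - relevant

p = precision(relevant, retrieved)        # 2/5 = 0.4
r = recall(relevant, retrieved)           # 2/4 = 0.5
fo = fall_out(non_relevant, retrieved)    # 3/6 = 0.5
f1 = f_beta(p, r)                         # 0.4 / 0.9, about 0.444
f2 = f_beta(p, r, beta=2.0)               # 1.0 / 2.1, about 0.476
print(p, r, fo, round(f1, 3), round(f2, 3))
```

With β = 2 the score moves toward recall (0.5) and away from precision (0.4), which matches the weighting described for the general $F_\beta$ formula.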
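Average precision and MAP depend on the order of results, so they need a ranked list of relevance judgements rather than a set. The Python sketch below is illustrative only. It assumes each query's results are given as a list of 0/1 values in rank order (the rel(r) values), together with the total number of relevant documents for that query, including any that were not retrieved. The function names and sample judgements are hypothetical.

```python
from typing import Sequence, Tuple


def precision_at(rels: Sequence[int], r: int) -> float:
    """P(r): number of relevant documents at rank r or better, divided by r."""
    return sum(rels[:r]) / r


def average_precision(rels: Sequence[int], num_relevant: int) -> float:
    """AveP: sum of P(r) * rel(r) for r = 1..N, divided by the number of
    relevant documents for the query (assumed to include unretrieved ones)."""
    if num_relevant == 0:
        return 0.0  # assumption: AveP is 0 when a query has no relevant documents
    total = 0.0
    for r in range(1, len(rels) + 1):                 # N = len(rels) documents retrieved
        total += precision_at(rels, r) * rels[r - 1]  # rel(r) is 0 or 1
    return total / num_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[int], int]]) -> float:
    """MAP: mean of AveP(q) over the Q queries; each run is (rels, num_relevant)."""
    if not runs:
        return 0.0
    return sum(average_precision(rels, n) for rels, n in runs) / len(runs)


# Hypothetical judgements in rank order: 1 = relevant, 0 = not relevant.
query1 = ([1, 0, 1, 0, 0, 1], 4)  # 4 relevant documents exist; 3 were retrieved
query2 = ([0, 1, 0, 1], 2)

ap1 = average_precision(*query1)               # (1 + 2/3 + 3/6) / 4 = 13/24, about 0.542
ap2 = average_precision(*query2)               # (1/2 + 2/4) / 2 = 0.5
m = mean_average_precision([query1, query2])   # (13/24 + 1/2) / 2 = 25/48, about 0.521
print(round(ap1, 3), round(ap2, 3), round(m, 3))
```

Multiplying P(r) by rel(r) mirrors the AveP formula term by term. Skipping the non-relevant ranks would give the same sum. Because the unretrieved fourth relevant document for `query1` still counts in the denominator, missing relevant documents lower the score.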
6.7 Keywords
Information Retrieval: The process of providing users with those documents that will satisfy their information need.
Search Strategies: Comprehensive plans for finding information.
132 LOVELY PROFESSIONAL UNIVERSITY