Page 119 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
R-Precision
R-precision is the precision at the R-th position in the ranking of results for a query that has R
relevant documents. This measure is highly correlated with average precision.
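The definition above can be illustrated with a minimal Python sketch. The function name `r_precision`, the document identifiers, and the choice to return 0.0 when a query has no relevant documents are assumptions made for illustration; they do not come from the text.

```python
def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    relevant = set(relevant_ids)
    r = len(relevant)
    if r == 0:
        # Assumption: treat R-precision as 0.0 when nothing is relevant.
        return 0.0
    # Count how many of the first R ranked documents are relevant.
    hits = sum(1 for doc in ranked_ids[:r] if doc in relevant)
    return hits / r


# Hypothetical query with R = 3 relevant documents.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d3"}
score = r_precision(ranking, relevant)  # 2 of the top 3 are relevant
print(round(score, 4))
```

Because the cutoff equals the number of relevant documents, a perfect ranking always scores 1.0, whatever the value of R.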
Mean average precision
Mean average precision for a set of queries is the mean of the average precision scores for each query.
$$\mathrm{MAP} = \frac{\sum_{q=1}^{Q} \mathrm{AveP}(q)}{Q}$$
where Q is the number of queries.
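As a rough sketch of the formula, the Python below computes MAP over a small set of queries. This section does not define AveP itself, so the code assumes the usual definition: the sum of the precision values at each rank where a relevant document appears, divided by the number of relevant documents. The function names and sample data are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """AveP for one query (assumed standard definition, see lead-in)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # assumption: no relevant documents gives a score of 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of AveP(q) over Q queries; `runs` is a list of (ranking, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries (Q = 2).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AveP = (1/1 + 2/3) / 2 = 5/6
    (["x", "y", "z"], {"y"}),            # AveP = (1/2) / 1 = 1/2
]
map_score = mean_average_precision(runs)  # (5/6 + 1/2) / 2 = 2/3
print(round(map_score, 4))
```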
Discounted cumulative gain
DCG uses a graded relevance scale of documents from the result set to evaluate the usefulness, or
gain, of a document based on its position in the result list. The premise of DCG is that highly relevant
documents appearing lower in a search result list should be penalized, so the graded relevance value
is reduced logarithmically in proportion to the position of the result.
The DCG accumulated at a particular rank position p is defined as:
$$\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
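The following Python sketch evaluates this definition directly: the first result contributes its full relevance grade, and each later result at position i is divided by log2(i). The function name `dcg_at_p` and the sample grades (on an assumed 0 to 3 scale) are illustrative only.

```python
import math


def dcg_at_p(relevances, p):
    """DCG at rank p: rel_1 plus the sum of rel_i / log2(i) for i = 2..p."""
    rels = relevances[:p]
    if not rels:
        return 0.0
    total = float(rels[0])  # position 1 is not discounted
    for i, rel in enumerate(rels[1:], start=2):
        total += rel / math.log2(i)  # logarithmic discount by position
    return total


# Hypothetical graded relevance judgements for the top six results.
grades = [3, 2, 3, 0, 1, 2]
value = dcg_at_p(grades, 6)
print(round(value, 3))
```

Note that position 2 is also undiscounted in this form, since log2(2) = 1; the penalty only starts to apply from position 3 onward.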
Since result sets may vary in size among different queries or systems, the normalised version of DCG
uses an ideal DCG to compare performance. To this end, it sorts the documents of a result list by
relevance, producing an ideal DCG at position p ($\mathrm{IDCG}_p$), which normalizes the score:
$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$
The nDCG values for all queries can be averaged to obtain a measure of the average performance of
a ranking algorithm. Note that in a perfect ranking algorithm, the $\mathrm{DCG}_p$ will be the same as the
$\mathrm{IDCG}_p$, producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval
0.0 to 1.0 and so are cross-query comparable.
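A minimal Python sketch of the normalisation step follows. It assumes, as the text describes, that the ideal ordering is obtained by sorting the grades of the same result list in decreasing order, and it uses the discount in which position 1 is undiscounted and position i is divided by log2(i). The function names, the sample grades, and the choice to return 0.0 when the ideal DCG is zero are assumptions for illustration.

```python
import math


def dcg_at_p(relevances, p):
    """DCG at rank p: rel_1 plus the sum of rel_i / log2(i) for i = 2..p."""
    rels = relevances[:p]
    if not rels:
        return 0.0
    total = float(rels[0])
    for i, rel in enumerate(rels[1:], start=2):
        total += rel / math.log2(i)
    return total


def ndcg_at_p(relevances, p):
    """nDCG at rank p: DCG_p divided by the DCG_p of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)  # best possible ordering
    idcg = dcg_at_p(ideal, p)
    if idcg == 0:
        return 0.0  # assumption: no relevant results gives a score of 0.0
    return dcg_at_p(relevances, p) / idcg


# Hypothetical graded judgements for two queries.
query_grades = [
    [3, 2, 3, 0, 1, 2],  # imperfect ordering, nDCG below 1.0
    [3, 3, 2, 2, 1, 0],  # already ideally ordered, nDCG equals 1.0
]
scores = [ndcg_at_p(g, 6) for g in query_grades]
mean_ndcg = sum(scores) / len(scores)  # average performance across queries
print([round(s, 4) for s in scores], round(mean_ndcg, 4))
```

Because every score is divided by its own ideal value, the per-query results fall between 0.0 and 1.0 and can be averaged across queries.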
This model has been very productive and has promoted our understanding of information retrieval
in many ways. However, as Kuhn noted, major models that are as central to a field as this one
eventually begin to show inadequacies as testing leads to greater and greater understanding of the
processes being studied. The limitations of the original model’s representation of the phenomenon
of interest become more and more evident.
It is only fitting, then, that in recent years the above classic model has come under attack in various
ways. Oddy and Belkin et al. have asked why it is necessary for the searcher to find a way to represent
the information need in a query understandable by the system. Why cannot the system make it
possible for the searcher to express the need directly as they would ordinarily, instead of in an
artificial query representation for the system’s consumption?
114 LOVELY PROFESSIONAL UNIVERSITY