
Information Analysis and Repackaging

                   Notes            •  The goal of information retrieval (IR) is to provide users with those documents that will
                                      satisfy their information need.
                                    •  An information retrieval process begins when a user enters a query into the system.
                                    •  Precision is the fraction of the documents retrieved that are relevant to the user’s information
                                      need:
                                      \[ \text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \]
                                    •  Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant
                                      documents available:
                                      \[ \text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|} \]

                                    •  The traditional F-measure, or balanced F-score, is the harmonic mean of precision and recall:
                                      \[ F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
                                      This is also known as the \(F_1\) measure, because recall and precision are evenly weighted.
                                      The general formula, a weighted harmonic mean, for non-negative real β is:
                                      \[ F_\beta = \frac{(1 + \beta^2) \cdot (\text{precision} \cdot \text{recall})}{\beta^2 \cdot \text{precision} + \text{recall}} \]
                                    •  Average precision emphasizes ranking relevant documents higher. It is the average of the
                                      precision values computed at the rank of each relevant document in the ranked sequence:
                                      \[ \text{AveP} = \frac{\sum_{r=1}^{N} P(r) \times \text{rel}(r)}{\text{number of relevant documents}} \]
                                      where r is the rank, N the number of documents retrieved, rel(r) a binary function on the
                                      relevance of the document at rank r, and P(r) the precision at cut-off rank r:
                                      \[ P(r) = \frac{|\{\text{relevant retrieved documents of rank } r \text{ or less}\}|}{r} \]
                                    •  Mean average precision for a set of queries is the mean of the average precision scores for
                                      each query:
                                      \[ \text{MAP} = \frac{\sum_{q=1}^{Q} \text{AveP}(q)}{Q} \]
                                      where Q is the number of queries.
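
The measures summarized above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the document identifiers, and the choice to return 0.0 when a denominator is empty are all assumptions made for the example. Recall is not restated in this summary, so the sketch uses its standard definition (relevant retrieved documents divided by all relevant documents).

```python
from typing import Hashable, Sequence, Set, Tuple


def precision(relevant: Set[Hashable], retrieved: Set[Hashable]) -> float:
    """|relevant ∩ retrieved| / |retrieved| (0.0 if nothing was retrieved)."""
    return len(relevant & retrieved) / len(retrieved) if retrieved else 0.0


def recall(relevant: Set[Hashable], retrieved: Set[Hashable]) -> float:
    """|relevant ∩ retrieved| / |relevant| (standard definition, assumed here)."""
    return len(relevant & retrieved) / len(relevant) if relevant else 0.0


def fall_out(non_relevant: Set[Hashable], retrieved: Set[Hashable]) -> float:
    """|non-relevant ∩ retrieved| / |non-relevant|."""
    return len(non_relevant & retrieved) / len(non_relevant) if non_relevant else 0.0


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision p and recall r; beta=1 gives F1."""
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


def average_precision(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Sum of P(r) * rel(r) over ranks r = 1..N, divided by |relevant|."""
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranking, start=1):
        if doc in relevant:      # rel(r) = 1
            hits += 1
            total += hits / r    # P(r) = relevant documents at rank <= r, over r
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of AveP over Q queries; each run is (ranking, relevant set)."""
    if not runs:
        return 0.0
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Hypothetical six-document collection and one ranked result list.
collection = {"d1", "d2", "d3", "d4", "d5", "d6"}
relevant = {"d1", "d3", "d5"}
ranking = ["d1", "d2", "d3", "d4"]
retrieved = set(ranking)

p = precision(relevant, retrieved)
r = recall(relevant, retrieved)
fo = fall_out(collection - relevant, retrieved)
f1 = f_beta(p, r)
avep = average_precision(ranking, relevant)
map_score = mean_average_precision([(ranking, relevant), (["a", "b"], {"b"})])
```

For this made-up example, two of the four retrieved documents are relevant, so precision is 2/4 and recall is 2/3. The relevant documents sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 3 = 5/9; the third relevant document is never retrieved and contributes nothing to the sum, which is why the denominator is the number of relevant documents rather than the number of hits.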

                                 6.7   Keywords

                                 Information Retrieval : Is to provide users with those documents that will satisfy their information need.
                                 Search Strategies   : Are comprehensive plans for finding information.

            132                              LOVELY PROFESSIONAL UNIVERSITY