


An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy.

An object is an entity that is represented by information in a database. User queries are matched against the database information. Depending on the application, the data objects may be, for example, text documents, images, audio, mind maps or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata.
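To make the matching idea concrete, the sketch below scores a query against a small collection of document surrogates by counting shared terms, so that several documents can match the same query with different degrees of relevancy. The collection, the scoring function and all names are illustrative assumptions for this example, not part of any particular IR system.

    # Illustrative sketch: matching a query against document surrogates.
    # The documents, the score() function and all names are assumptions
    # made for this example, not a specific IR system's API.

    def score(query, surrogate):
        """Count how many query terms also appear in the document surrogate."""
        query_terms = set(query.lower().split())
        doc_terms = set(surrogate.lower().split())
        return len(query_terms & doc_terms)

    # Document surrogates (e.g. titles or metadata) standing in for full documents.
    surrogates = {
        "doc1": "information retrieval evaluation precision recall",
        "doc2": "image and audio storage formats",
        "doc3": "web search engines and information needs",
    }

    query = "information retrieval in web search"

    # Several objects may match the query, with different degrees of relevancy.
    ranked = sorted(surrogates.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    for doc_id, text in ranked:
        print(doc_id, score(query, text))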


Self Assessment

Multiple Choice Questions:

1. The physical and electronic diversity of an ......, along with the existence of multiple operating platforms, enhances robustness, flexibility, and adaptability.
   (a) IRSE    (b) ISRS
   (c) DBMS    (d) P2P

2. In information retrieval a query does not uniquely identify a ...... object in the collection.
   (a) Single    (b) Double
   (c) Triple    (d) Fourth

                                10.2 Precision and Recall

Precision and recall are two widely used metrics for evaluating the correctness of a pattern recognition algorithm. They can be seen as extended versions of accuracy, a simple metric that computes the fraction of instances for which the correct result is returned.

When using precision and recall, the set of possible labels for a given instance is divided into two subsets, one of which is considered "relevant" for the purposes of the metric. Recall is then computed as the fraction of correct instances among all instances that actually belong to the relevant subset, while precision is the fraction of correct instances among those that the algorithm believes to belong to the relevant subset.
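One common way to write these definitions uses counts of true positives (TP, items correctly assigned to the relevant subset), false positives (FP, items wrongly assigned to it) and false negatives (FN, relevant items that were missed); this notation is not introduced in the text above but follows directly from the definitions:

    \[
      \text{Precision} = \frac{TP}{TP + FP}
      \qquad
      \text{Recall} = \frac{TP}{TP + FN}
    \]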





Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure of completeness.

In even simpler terms, a high recall means you haven't missed anything, but you may have a lot of useless results to sift through (which would imply low precision). High precision means that everything returned was a relevant result, but you might not have found all the relevant items (which would imply low recall).

As an example, in an information retrieval scenario, the instances are documents and the task is to return a set of relevant documents given a search term; or equivalently, to assign each document to one of two categories, "relevant" and "not relevant". In this case, the "relevant" documents are simply those that belong to the "relevant" category. Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
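These definitions translate directly into set operations on document identifiers. The following is a minimal sketch under that assumption; the document identifiers and the function name are hypothetical and chosen only for illustration.

    # Minimal sketch of precision and recall for a single search,
    # assuming documents are identified by arbitrary IDs.

    def precision_recall(retrieved, relevant):
        """Compute precision and recall from sets of document IDs."""
        retrieved = set(retrieved)
        relevant = set(relevant)
        true_positives = retrieved & relevant   # relevant documents that were retrieved
        precision = len(true_positives) / len(retrieved) if retrieved else 0.0
        recall = len(true_positives) / len(relevant) if relevant else 0.0
        return precision, recall

    # Hypothetical example: the search returns four documents,
    # three of which are among the five truly relevant ones.
    retrieved = {"d1", "d2", "d3", "d7"}
    relevant = {"d1", "d2", "d3", "d4", "d5"}
    p, r = precision_recall(retrieved, relevant)
    print(f"precision = {p:.2f}, recall = {r:.2f}")   # precision = 0.75, recall = 0.60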




