
Unit 6: Information Retrieval Model and Search Strategies




            US appellate courts hand down approximately 500 new cases per day, meaning that an accurate
            legal information retrieval system must incorporate methods of both sorting past data and managing
            new data.


            Techniques
            Boolean searches

            Boolean searches, in which a user may specify criteria such as the presence of specific words or
            judgments by a specific court, are the most common type of search available via legal information
            retrieval systems. They are widely implemented by services such as Westlaw, LexisNexis, and
            Findlaw. However, they overcome few of the problems discussed above.
            The recall and precision rates of these searches vary depending on the implementation and the
            searches analyzed. One study found a basic Boolean search's recall rate to be roughly 20% and its
            precision rate to be roughly 79%. Another study implemented a generic search (that is, one not
            designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal
            professionals. Both numbers increased when searches were run by non-legal professionals, to a
            68% recall rate and a 77% precision rate. This is most likely explained by the legal professionals'
            use of complex legal terminology.
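
            As a rough illustration of how such a search works and how the figures above are computed, the
            sketch below runs a Boolean query (required words plus a court restriction) over a tiny invented
            corpus and then calculates precision and recall. The documents, query, and relevance judgments
            are hypothetical and are not drawn from any of the services named above.

# Minimal sketch of Boolean retrieval over a toy legal corpus (hypothetical data).
documents = [
    {"id": 1, "court": "9th Circuit", "text": "negligence standard of care breach"},
    {"id": 2, "court": "Supreme Court", "text": "negligence and contributory fault"},
    {"id": 3, "court": "9th Circuit", "text": "contract breach and damages"},
]

def boolean_search(docs, required_terms, court=None):
    """Return ids of documents containing every required term, optionally limited to one court."""
    hits = []
    for doc in docs:
        has_terms = all(term in doc["text"].split() for term in required_terms)
        right_court = court is None or doc["court"] == court
        if has_terms and right_court:
            hits.append(doc["id"])
    return hits

retrieved = set(boolean_search(documents, ["negligence", "breach"], court="9th Circuit"))
relevant = {1, 2}  # documents a human judged relevant to the query (hypothetical)

# The effectiveness measures reported in the studies cited above.
precision = len(retrieved & relevant) / len(retrieved)  # share of retrieved documents that are relevant
recall = len(retrieved & relevant) / len(relevant)      # share of relevant documents that were retrieved
print(precision, recall)  # 1.0 and 0.5 for this toy example

            Precision and recall tend to pull in opposite directions: broadening a query usually raises recall
            while lowering precision, which is why the studies above report both figures together.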

            Manual classification
            In order to overcome the limits of basic boolean searches, information systems have attempted to
            classify case laws and statutes into more computer friendly structures. Usually, this results in the
            creation of an ontology to classify the texts, based on the way a legal professional might think about
            them. These attempt to link texts on the basis of their type, their value, and/or their topic areas. Most
            major legal search providers now implement some sort of classification search, such as Westlaw’s
            “Natural Language” or LexisNexis’ Headnote searches. Additionally, both of these services allow
            browsing of their classifications, via Westlaw’s West Key Numbers or Lexis’ Headnotes. Though
            these two search algorithms are proprietary and secret, it is known that they employ manual
            classification of text (though this may be computer-assisted).
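
            Westlaw's and LexisNexis' classification schemes are proprietary, but the general idea of linking
            texts by their type, value, and topic can be pictured with a simple structure such as the sketch
            below. The topic labels, document types, and titles are illustrative assumptions, not actual West
            Key Numbers or Headnotes.

# Illustrative sketch of a manually assigned classification layer over legal texts.
# Types, values, topic labels, and titles are invented for the example.
classified_texts = [
    {"title": "Case A v. B", "type": "appellate opinion",
     "value": "binding precedent", "topics": ["torts/negligence", "torts/duty of care"]},
    {"title": "Statute X", "type": "statute",
     "value": "controlling law", "topics": ["torts/negligence"]},
    {"title": "Case C v. D", "type": "trial court opinion",
     "value": "persuasive only", "topics": ["contracts/breach"]},
]

def browse_by_topic(texts, topic):
    """Return the titles of all texts filed under the given topic label."""
    return [t["title"] for t in texts if topic in t["topics"]]

print(browse_by_topic(classified_texts, "torts/negligence"))
# -> ['Case A v. B', 'Statute X']

            Because each text is filed by a human reader rather than matched on its surface wording, browsing
            by topic in this way can surface a landmark case even when its text never uses the searcher's exact
            terms.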
            These systems can help overcome the majority of problems inherent in legal information retrieval,
            in that manual classification has the greatest chance of identifying landmark cases and
            understanding the issues that arise in the text. In one study, ontological searching resulted in a
            precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included,
            however, were carefully limited to a few areas of law in a specific jurisdiction.
            The major drawback of this approach is that it requires highly skilled legal professionals and
            large amounts of time to classify texts. As the amount of text available continues to increase,
            some have argued that manual classification is unsustainable.

            Natural language processing
            In order to reduce the reliance on legal professionals and the amount of time needed, efforts have
            been made to create a system to automatically classify legal text and queries. Adequate translation
            of both would allow accurate information retrieval without the high cost of human classification.
            These automatic systems generally employ Natural Language Processing (NLP) techniques that are
            adapted to the legal domain, and also require the creation of a legal ontology.
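
            The published details of these systems are limited, so the sketch below stands in with a deliberately
            naive approach: it assigns a text to any ontology topic whose keyword list it overlaps sufficiently.
            Real systems rely on much richer NLP; the topics, keyword lists, and threshold here are assumptions
            made only for illustration.

import re

# Deliberately naive automatic classifier: map a text to ontology topics by keyword
# overlap. Topics, keyword lists, and the overlap threshold are invented examples.
topic_keywords = {
    "torts/negligence": {"negligence", "duty", "breach", "care"},
    "contracts/breach": {"contract", "breach", "damages"},
}

def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def auto_classify(text, threshold=2):
    """Return every topic whose keyword list shares at least `threshold` words with the text."""
    words = tokenize(text)
    return [topic for topic, keywords in topic_keywords.items()
            if len(words & keywords) >= threshold]

print(auto_classify("The defendant breached a duty of care, a clear case of negligence."))
# -> ['torts/negligence']

            A real system would replace the keyword lists with NLP techniques adapted to the legal domain,
            as described above.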
            Though multiple systems have been postulated, few have reported results. One system, “SMILE,”
            which attempted to automatically extract classifications from case texts, achieved an f-measure
            (a combined measure of recall and precision) of under 0.3, compared to a perfect f-measure of
            1.0. This is probably much lower than an acceptable rate for general usage.
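
            For reference, the f-measure cited for SMILE is the harmonic mean of precision and recall (the
            balanced F1 form). A minimal calculation is sketched below; the input values are chosen only to
            illustrate the scale of the scores discussed in this unit.

# Balanced f-measure (F1): the harmonic mean of precision and recall.
def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs roughly at the levels discussed above (illustrative only).
print(round(f_measure(0.82, 0.97), 2))  # 0.89, close to the ontological search figures above
print(round(f_measure(0.30, 0.30), 2))  # 0.3, around the level reported for SMILE

            Because the harmonic mean is pulled toward the lower of the two values, a system cannot reach a
            high f-measure by doing well on recall alone or on precision alone.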





