Page 115 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging

6.1 History of Information Retrieval Model

The idea of using computers to search for relevant pieces of information was popularized in the
article “As We May Think” by Vannevar Bush in 1945. The first automated information retrieval systems
were introduced in the 1950s and 1960s. By 1970 several different techniques had been shown to
perform well on small text corpora such as the Cranfield collection (several thousand documents).
Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s.
In 1992, the US Department of Defense, along with the National Institute of Standards and Technology
(NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program.
Its aim was to support the information retrieval community by supplying the infrastructure
needed to evaluate text retrieval methodologies on a very large text collection. This
catalyzed research on methods that scale to huge corpora. The introduction of web search engines
has increased the need for very-large-scale retrieval systems even further.
The use of digital methods for storing and retrieving information has led to the phenomenon of
digital obsolescence, where a digital resource ceases to be readable because the physical media, the
reader required to read the media, the hardware, or the software that runs on it, is no longer available.
The information is initially easier to retrieve than if it were on paper, but is then effectively lost.

6.2 General Model of Information Retrieval

The goal of information retrieval (IR) is to provide users with those documents that will satisfy their
information need. We use the word “document” as a general term that could also include non-textual
information, such as multimedia objects. Figure 1 (shown ahead) provides a general overview of the
information retrieval process, which has been adapted from Lancaster and Warner (1993). Users have
to formulate their information need in a form that can be understood by the retrieval mechanism.
There are several steps involved in this translation process that we will briefly discuss below. Likewise,
the contents of large document collections need to be described in a form that allows the retrieval
mechanism to identify the potentially relevant documents quickly. In both cases, information may be
lost in the transformation process leading to a computer-usable representation. Hence, the matching
process is inherently imperfect.
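
The two representation steps and the matching step described above can be illustrated with a minimal sketch. It is not the model of Lancaster and Warner; it assumes a deliberately simple representation (a set of lowercase index terms) and a simple matching rule (count of shared terms). The names `represent`, `retrieve`, `STOPWORDS`, and the three sample documents are hypothetical and chosen only for illustration.

```python
import re

# A tiny common-word list; an assumption made for this sketch only.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to"}

def represent(text):
    """Reduce free text to a set of lowercase index terms.

    Word order, word forms, and stopwords are discarded, so information
    is lost in this transformation.
    """
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def retrieve(query, index):
    """Rank documents by the number of index terms they share with the query."""
    q = represent(query)
    scored = [(len(q & terms), doc_id) for doc_id, terms in index.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

# Hypothetical document collection, described once in computer-usable form.
collection = {
    "d1": "Evaluation of text retrieval on large collections",
    "d2": "Retrieving documents from big corpora",
    "d3": "The history of paper catalogues",
}
index = {doc_id: represent(text) for doc_id, text in collection.items()}

results = retrieve("retrieval of large text collections", index)
print(results)
```

In this sketch the query retrieves only `d1`. Document `d2` is arguably about the same topic, but it uses different words ("retrieving", "big", "corpora"), so the simplified representations share no terms and the match is missed. This is one concrete way in which the matching process is imperfect.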
Information seeking is a form of problem solving (Marcus 1994, Marchionini 1992). It proceeds
through the interaction of eight subprocesses: problem recognition and acceptance, problem
definition, search system selection, query formulation, query execution, examination of results
(including relevance feedback), information extraction, and reflection/iteration/termination. To
be able to perform effective searches, users have to develop the following expertise: knowledge
about various sources of information, skills in defining search problems and applying search
strategies, and competence in using electronic search tools.
Marchionini (1992) contends that some sort of spreadsheet is needed that supports users in the
problem definition as well as other information seeking tasks. The Info Crystal is such a spreadsheet
because it assists users in the formulation of their information needs and the exploration of the
retrieved documents, using a visual interface that supports a “what-if” functionality. He further
predicts that advances in computing power and speed, together with improved information retrieval
procedures, will continue to blur the distinctions between problem articulation and examination of
results.

The Info Crystal is both a visual query language and a tool for visualizing retrieval
results.
The information need can be understood as forming a pyramid, where only its peak is made visible
by users in the form of a conceptual query (see Figure 6.1). The conceptual query captures the key
