Page 121 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
• Binary Independence Model
• Probabilistic relevance model, on which the Okapi (BM25) relevance function is based
• Uncertain inference
• Language models
• Divergence-from-randomness model
• Latent Dirichlet allocation
Feature-based retrieval models view documents as vectors of values of feature functions (or
just features) and seek the best way to combine these features into a single relevance score,
typically by learning-to-rank methods. Feature functions are arbitrary functions of document
and query, and as such can easily incorporate almost any other retrieval model as just
another feature.
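As a minimal sketch of the idea above, the following Python fragment combines two feature functions into a single relevance score with a weighted sum. The feature functions (`term_overlap`, `doc_length`) and the weights are hypothetical and chosen only for illustration; in practice a learning-to-rank method would fit the weights from labelled data, and the combination need not be linear.

```python
from typing import Callable, List

# A feature function maps a (query, document) pair to a number.
Feature = Callable[[str, str], float]


def term_overlap(query: str, doc: str) -> float:
    """Number of distinct query terms that also occur in the document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def doc_length(query: str, doc: str) -> float:
    """Document length in tokens (this feature ignores the query)."""
    return float(len(doc.split()))


# Hypothetical feature set and hand-picked weights (assumptions, not learned).
FEATURES: List[Feature] = [term_overlap, doc_length]
WEIGHTS: List[float] = [1.0, -0.01]


def feature_vector(query: str, doc: str) -> List[float]:
    """Represent the document (relative to the query) as a vector of feature values."""
    return [f(query, doc) for f in FEATURES]


def relevance_score(query: str, doc: str) -> float:
    """Combine the features into one score with a weighted sum."""
    return sum(w * x for w, x in zip(WEIGHTS, feature_vector(query, doc)))


def rank(query: str, docs: List[str]) -> List[str]:
    """Order documents by decreasing relevance score."""
    return sorted(docs, key=lambda d: relevance_score(query, d), reverse=True)
```

Any other retrieval model can be plugged in the same way: its score for a (query, document) pair simply becomes one more entry in `FEATURES`.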
Second dimension: properties of the model
Models without term interdependencies treat different terms/words as independent. This
assumption is usually represented in vector space models by the orthogonality of term
vectors, or in probabilistic models by an independence assumption for term variables.
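The consequence of the orthogonality assumption can be illustrated with a small sketch. Here each distinct term is its own axis, documents are raw term-count vectors, and similarity is the cosine of the angle between them; the helper names are illustrative assumptions, and real systems usually apply term weighting such as tf-idf rather than raw counts.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term counts; every distinct term is treated as a separate axis."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product,
    # because different terms are assumed to be orthogonal.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this assumption `cosine(term_vector("car"), term_vector("automobile"))` is zero: the two words share no axis, so the model sees no relationship between them even though they are synonyms.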
Models with immanent term interdependencies allow a representation of interdependencies
between terms. However, the degree of the interdependency between two terms is defined by
the model itself. It is usually directly or indirectly derived (e.g., by dimensionality reduction)
from the co-occurrence of those terms in the whole set of documents.
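A rough sketch of deriving term interdependency directly from co-occurrence follows. It counts how many documents contain both terms of a pair and, as one possible normalisation, divides by the number of documents containing either term (a Jaccard coefficient). The function names and the choice of Jaccard are assumptions made for illustration; models in this class often use other measures, or derive the relationship indirectly through dimensionality reduction.

```python
from collections import Counter
from itertools import combinations
from typing import List


def cooccurrence_counts(docs: List[str]) -> Counter:
    """Count, for each unordered term pair, the documents containing both terms."""
    counts: Counter = Counter()
    for doc in docs:
        # Sort so each pair is always stored in the same (alphabetical) order.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def association(docs: List[str], t1: str, t2: str) -> float:
    """Jaccard coefficient: documents with both terms / documents with either term."""
    term_sets = [set(d.lower().split()) for d in docs]
    both = sum(1 for s in term_sets if t1 in s and t2 in s)
    either = sum(1 for s in term_sets if t1 in s or t2 in s)
    return both / either if either else 0.0
```

The degree of interdependency comes entirely from the collection itself: changing the documents changes the association values, with no outside knowledge involved.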
Models with transcendent term interdependencies allow a representation of interdependencies
between terms, but they do not specify how the interdependency between two terms is
defined. They rely on an external source for the degree of interdependency between two terms
(for example, a human or sophisticated algorithms).
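To make the contrast concrete, the sketch below takes the degree of interdependency from an external table rather than from the document collection. The `RELATEDNESS` table, its values, and the scoring function are hypothetical; they stand in for whatever external source (a human expert, a thesaurus, another algorithm) supplies the values.

```python
from typing import Dict, FrozenSet

# Hypothetical, externally supplied relatedness values (e.g. assigned by a human).
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset(("car", "automobile")): 0.9,
}


def term_relatedness(t1: str, t2: str) -> float:
    """Look up the relatedness of two terms; identical terms are fully related."""
    if t1 == t2:
        return 1.0
    return RELATEDNESS.get(frozenset((t1, t2)), 0.0)


def soft_match_score(query: str, doc: str) -> float:
    """Sum the relatedness over every (query term, document term) pair."""
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    return sum(term_relatedness(q, d) for q in q_terms for d in d_terms)
```

With this table, the query "car" gives a non-zero score against a document that mentions only "automobile", which a model without term interdependencies would not.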
Self Assessment
Multiple Choice Questions:
1. Automated information retrieval systems are used to reduce ......
(a) digital obsolescence (b) information overload
(c) information need
2. ...... is the fraction of the documents retrieved that are relevant to the user’s information
need.
(a) recall (b) precision (c) fall-out
3. ...... is the fraction of documents that are relevant to the query that are successfully retrieved.
(a) recall (b) F-measure (c) fall-out.
4. The proportion of non-relevant documents that are retrieved, out of all non-relevant
documents available, is known as ......
(a) recall (b) F-measure (c) fall-out.
6.3 Search Strategies
A search strategy is a comprehensive plan for finding information. It includes defining the information
need and determining the form in which it is needed, whether it exists, where it is located, how it is
organized, and how to retrieve it.
Advances in technology, and in particular the high volume of content accessible through the Internet,
have led to an explosion of information available on a global scale.
116 LOVELY PROFESSIONAL UNIVERSITY