Page 215 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
Self Assessment
Multiple Choice Questions:
1. ...... are a proper subset of context-sensitive languages.
(a) Syntactic structure (b) Syndetic structure (c) Indexed languages.
2. ...... are kinds of metadata.
(a) Indexing languages (b) Vocabulary tools (c) IR thesaurus
3. The systematic selection of standardised terms is known as ...... .
(a) IR thesaurus (b) Vocabulary control (c) Need analysis.
4. ...... is a reference work that lists words grouped together according to similarity of meaning.
(a) Indexed languages (b) Vocabulary tools (c) Thesaurus.
11.7 Automatic Indexing
Automatic indexing is indexing performed by algorithmic procedures. The algorithm works on a database
containing document representations (which may be full-text representations, bibliographical records
or partial-text representations, and in principle also value-added databases).
Automatic indexing may also be performed on non-text databases, e.g. images or music.
In text databases the algorithm may perform string searching, but it is mostly based on searching the
words in the single document representation as well as in the total database (via inverted files). The
use of words is mostly based on stemming. Algorithms may count co-occurrences of words (or
references), they may consider levels of proximity between words, and so on.
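The operations named above (an inverted file, stemming, co-occurrence counts and proximity) can be
sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the
toy collection DOCS, the crude suffix-stripping stem function and the helper names are all
hypothetical, and a real system would use a proper stemmer and a disk-based index.

```python
from collections import defaultdict
from itertools import combinations
import re

# Hypothetical toy collection: document id -> document representation (a title).
DOCS = {
    1: "Indexing languages and thesauri",
    2: "Automatic indexing of indexed images",
    3: "Thesaurus construction for image retrieval",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Very crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_inverted_index(docs):
    """Map each stem to {document id: [word positions]} (the inverted file)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[stem(word)].setdefault(doc_id, []).append(position)
    return index

def cooccurrence_counts(index):
    """Count in how many documents each pair of stems occurs together."""
    terms_by_doc = defaultdict(set)
    for term, postings in index.items():
        for doc_id in postings:
            terms_by_doc[doc_id].add(term)
    counts = defaultdict(int)
    for terms in terms_by_doc.values():
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts

def within(index, term_a, term_b, max_distance):
    """Return ids of documents where the two stems occur within max_distance words."""
    postings_a, postings_b = index.get(term_a, {}), index.get(term_b, {})
    hits = []
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(i - j) <= max_distance
               for i in postings_a[doc_id] for j in postings_b[doc_id]):
            hits.append(doc_id)
    return hits

if __name__ == "__main__":
    idx = build_inverted_index(DOCS)
    print(dict(idx["index"]))               # postings for the stem "index"
    print(within(idx, "index", "image", 1))  # documents with the stems adjacent
```

Because "indexing" and "indexed" reduce to the same stem, both word forms share one entry in the
inverted file, which is the practical point of stemming before indexing.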
Automatic indexing may be contrasted with human indexing. It should be considered, however, that if
humans are taught strict rules on how to index, their indexing should also be considered
mechanical or algorithmic. If, for example, a librarian mechanically matches words from titles with
words from a controlled vocabulary, this corresponds to primitive forms of automatic indexing.
It is also an open question whether the principles developed by the facet analytic approach can be
automated.
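The librarian's mechanical matching of title words against a controlled vocabulary can be written
down as an algorithm, which illustrates why it counts as a primitive form of automatic indexing. The
sketch below is an assumption-laden illustration: the VOCABULARY table and the function
assign_descriptors are hypothetical, and a real vocabulary would also handle multi-word terms.

```python
import re

# Hypothetical controlled vocabulary: entry word -> preferred descriptor.
# Non-preferred spellings and variant forms point to the same descriptor.
VOCABULARY = {
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "indexing": "Indexing",
    "cataloguing": "Cataloging",
    "cataloging": "Cataloging",
}

def assign_descriptors(title, vocabulary=VOCABULARY):
    """Return the preferred descriptors whose entry words appear in the title."""
    words = re.findall(r"[a-z]+", title.lower())
    descriptors = {vocabulary[w] for w in words if w in vocabulary}
    return sorted(descriptors)

if __name__ == "__main__":
    print(assign_descriptors("Cataloguing and indexing in public libraries"))
```

A title with no matching words simply receives no descriptors, which is one obvious limitation of
purely mechanical matching.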
For this reason, manual indexing and machine indexing should not necessarily be considered two
fundamentally different approaches to indexing; rather, the principles and assumptions underlying
both kinds of indexing should be uncovered. For example, assigned indexing and derived indexing are
approaches that may be applied, although differently, by both humans and machines. As pointed
out by Anderson & Pérez-Carballo (2001), we know more about computer indexing than about
human indexing because “machine methods must be rigorously described in detail for the computer
to carry them out”. Automatic indexing may thus inspire us to ask more precise questions about
human indexing as well.
The earliest and most primitive forms of automatic indexing were the KWIC/KWAC/KWOC
systems, based on simple, mechanical manipulations of terms derived from document titles.
Related forms are the Permuterm Subject Index and KeyWords Plus, known from ISI’s citation
indexes (the latter system is based on assigning terms from cited titles).
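The mechanical manipulation behind a KWIC (keyword in context) index can be sketched as follows.
This is a minimal illustration, not a reconstruction of any historical system: the stopword list, the
column width and the function name kwic_lines are assumptions made for the example.

```python
# Hypothetical stopword list: words that do not get their own index entry.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def kwic_lines(titles, width=30, stopwords=STOPWORDS):
    """Return one line per significant title word, sorted by keyword.

    Each line shows the words before the keyword right-aligned in a column of
    `width` characters, then two spaces, then the keyword and the rest of the
    title, so that all keywords start in the same column.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stopwords:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            line = left[-width:].rjust(width) + "  " + right
            entries.append((word.lower(), line))
    return [line for _, line in sorted(entries)]

if __name__ == "__main__":
    for line in kwic_lines(["Automatic indexing of images",
                            "Thesaurus construction for image retrieval"]):
        print(line)
```

Every significant word of every title becomes an access point without any human judgement, which
is why such indexes are derived purely from the titles themselves.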
LOVELY PROFESSIONAL UNIVERSITY