Page 201 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
This is particularly problematic when the search question involves terms that are sufficiently
tangential to the subject area that the indexer might have tagged the document with a different
term from the one the searcher would choose, even though both have the same concept in mind.
Essentially, this can be avoided only by an experienced user of the controlled vocabulary whose
understanding of the vocabulary coincides with the way it is used by the indexer.
Another possibility is that the article is simply not tagged by the indexer because indexing
exhaustivity is low. For example, an article might mention football as a secondary focus, and the
indexer might decide not to tag it with “football” because it is not important enough compared to
the main focus. If the searcher nevertheless considers that article relevant, recall suffers. A free
text search would pick up that article automatically.
On the other hand, free text searches have high exhaustivity (every word is searchable), so they
have the potential for high recall (provided the searcher handles synonyms by entering every
variant), but they will have much lower precision.
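The trade-off described above can be illustrated with a minimal sketch. The four documents, their index terms, and the relevance judgments below are invented for illustration only; the function names are likewise hypothetical and do not belong to any real retrieval system.

```python
# Toy corpus (invented): full text plus the index terms a hypothetical indexer assigned.
DOCS = {
    "d1": {"text": "the football league announced new rules for the season",
           "terms": {"football"}},
    # Football is only a secondary focus here, so the indexer did not tag it.
    "d2": {"text": "city budget debate also covers funding for the football stadium",
           "terms": {"municipal finance"}},
    # The word appears, but only as a figure of speech.
    "d3": {"text": "the senator treated the pension issue as a political football",
           "terms": {"pensions"}},
    "d4": {"text": "new tax rules for small businesses",
           "terms": {"taxation"}},
}

# Assumed relevance judgments of one searcher interested in football.
RELEVANT = {"d1", "d2"}


def controlled_search(term):
    """Retrieve documents tagged with the given controlled-vocabulary term."""
    return {doc_id for doc_id, doc in DOCS.items() if term in doc["terms"]}


def free_text_search(word):
    """Retrieve documents whose text contains the given word."""
    return {doc_id for doc_id, doc in DOCS.items() if word in doc["text"].split()}


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall(controlled_search("football"), RELEVANT))
print(precision_recall(free_text_search("football"), RELEVANT))
```

In this toy data the controlled search retrieves only d1, so everything it returns is relevant but it misses d2. The free text search finds both relevant documents, but it also retrieves d3, where the word is used figuratively.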
Controlled vocabularies also become outdated quickly: in fast-developing fields of knowledge, the
authorized terms may not cover new concepts unless the vocabulary is updated regularly. Even in the
best case, controlled language is often not as specific as the words of the text itself.
Indexers trying to choose the appropriate index terms might misinterpret the author,
while a free text search is in no danger of doing so, because it uses the author’s own
words.
The use of controlled vocabularies can be costly compared to free text searches because human
experts or expensive automated systems are necessary to index each entry. Furthermore, the user
has to be familiar with the controlled vocabulary scheme to make best use of the system. But, as
already mentioned, the control of synonyms and homographs can help increase precision.
The thesaurus is a controlled vocabulary of some cheese terms based on the ANSI/NISO Z39.19-1993
standard, Guidelines for the Construction, Format, and Management of Monolingual Thesauri. There
is a mixture of single-word and multi-word terms representing several aspects of the cheeses sold
in The Epicurean Cheese Shop. Scope notes are used to clarify the meaning of some of the terms,
especially terms that are not obvious in their meaning, such as barnyardy, and common terms used
in a specific way, such as low fat, medium fat, and high fat [3.2.2]. Nearly all the terms are from
English, although some French and Italian words may later be added as non-preferred terms, to
meet the needs of cheese connoisseurs with a working knowledge of those languages.
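As a rough sketch of how such a thesaurus might be represented, the following models preferred terms, scope notes, and non-preferred entry terms. The class names, the scope-note wording, and the non-preferred terms ("farmyardy", "allégé") are assumptions made for illustration; none of them are taken from the actual cheese thesaurus.

```python
from dataclasses import dataclass


@dataclass
class Term:
    preferred: str            # the authorized term used for indexing
    scope_note: str = ""      # clarifies how the term is used in this thesaurus
    used_for: tuple = ()      # non-preferred entry terms that point to this term


class Thesaurus:
    def __init__(self, terms):
        self.terms = {t.preferred: t for t in terms}
        # Map every entry term, preferred or not, to its preferred term.
        self.lookup = {}
        for t in terms:
            self.lookup[t.preferred.lower()] = t.preferred
            for alt in t.used_for:
                self.lookup[alt.lower()] = t.preferred

    def resolve(self, entry):
        """Return the preferred term for an entry term, or None if unknown."""
        return self.lookup.get(entry.strip().lower())


# Hypothetical entries: the scope notes and non-preferred terms are invented.
cheese_terms = Thesaurus([
    Term("barnyardy",
         scope_note="Earthy, farm-like aroma (illustrative wording only).",
         used_for=("farmyardy",)),
    Term("low fat",
         scope_note="Fat content below a threshold set by the shop (illustrative).",
         used_for=("allégé",)),
])

print(cheese_terms.resolve("Farmyardy"))
print(cheese_terms.terms["low fat"].scope_note)
```

Resolving every entry term to a single preferred term is what gives the vocabulary its control over synonyms: whichever variant the searcher types, the query runs against the same authorized term the indexer used.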
Pre-coordinate or Post-coordinate Retrieval
The thesaurus terms will eventually be integrated into a highly structured online database, which
will enable users to search for cheese varieties based on a range of characteristics. Users will be
able to select characteristics from several different pop-up boxes that reflect the main classes of
terms in the thesaurus used to describe cheese varieties, such as fat content, flavour, flavour intensity,
texture, milk type, or national origin. The thesaurus terms will, therefore, be post-coordinated at
the retrieval stage, with the possibility of using Boolean operators to combine or restrict specific
cheese characteristics. Indexers will apply all applicable terms to the cheese varieties.
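Post-coordination at the retrieval stage can be sketched as set operations over an inverted index, with one posting set per characteristic. The records, facet values, and function names below are illustrative assumptions and do not reproduce the shop's actual database.

```python
from collections import defaultdict

# Illustrative records: each cheese is indexed with every applicable term,
# one value per facet (facet names follow the classes named in the text).
CHEESES = {
    "Brie": {"milk type": "cow", "national origin": "France", "texture": "soft"},
    "Roquefort": {"milk type": "sheep", "national origin": "France", "texture": "crumbly"},
    "Parmesan": {"milk type": "cow", "national origin": "Italy", "texture": "hard"},
    "Feta": {"milk type": "sheep", "national origin": "Greece", "texture": "crumbly"},
}


def build_index(records):
    """Build an inverted index mapping (facet, value) to the set of cheese names."""
    index = defaultdict(set)
    for name, facets in records.items():
        for facet, value in facets.items():
            index[(facet, value)].add(name)
    return index


INDEX = build_index(CHEESES)


def having(facet, value):
    """Return the cheeses indexed with the given facet value (empty set if none)."""
    return INDEX.get((facet, value), set())


# Terms are combined only at search time, using Boolean operators on sets.
french_and_cow = having("national origin", "France") & having("milk type", "cow")   # AND
sheep_or_hard = having("milk type", "sheep") | having("texture", "hard")            # OR
french_not_soft = having("national origin", "France") - having("texture", "soft")   # NOT

print(french_and_cow)
print(sorted(sheep_or_hard))
print(french_not_soft)
```

Because each characteristic is stored as an independent term, no combination has to be anticipated by the indexer. A pre-coordinate system would instead need a ready-made compound heading for every combination a searcher might want.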
Numerous methodologies have been developed to assist in the creation of controlled vocabularies,
including faceted classification, which enables a given data record or document to be described in
multiple ways.
LOVELY PROFESSIONAL UNIVERSITY