Page 201 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



This is particularly problematic when the search question involves terms that are tangential enough
to the subject area that the indexer may have tagged the document with a different term, one the
searcher would consider equivalent. Essentially, this can be avoided only by an experienced user of
the controlled vocabulary whose understanding of the vocabulary matches the way the indexer uses it.
Another possibility is that the indexer simply did not tag the article because indexing exhaustivity
is low. For example, an article might mention football as a secondary focus, and the indexer might
decide not to tag it with "football" because that topic is minor compared with the main focus. If the
article turns out to be relevant to the searcher, recall suffers. A free-text search would pick up that
article regardless, because the word appears in its text.
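The recall failure described above can be sketched in a few lines of Python. The three articles, their
texts, and the index terms assigned to them are hypothetical. The controlled search looks only at the
terms an indexer assigned, while the free-text search matches the word anywhere in the text.

```python
import re

# Hypothetical mini-collection. "index_terms" holds only what a low-exhaustivity
# indexer assigned: the main focus of each article.
ARTICLES = {
    "a1": {"text": "Council approves stadium budget; the local football club will share costs.",
           "index_terms": {"Municipal finance"}},
    "a2": {"text": "Football tactics in the modern league.",
           "index_terms": {"Football"}},
    "a3": {"text": "Council tax rises to cover the municipal budget.",
           "index_terms": {"Municipal finance"}},
}


def controlled_search(term: str) -> set[str]:
    """Return IDs of articles whose assigned index terms include `term`."""
    return {aid for aid, art in ARTICLES.items() if term in art["index_terms"]}


def free_text_search(word: str) -> set[str]:
    """Return IDs of articles whose text contains `word` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return {aid for aid, art in ARTICLES.items() if pattern.search(art["text"])}


print(sorted(controlled_search("Football")))  # ['a2']
print(sorted(free_text_search("football")))   # ['a1', 'a2']
```

Article a1 mentions football only in passing, so the controlled search misses it while the free-text
search finds it.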
On the other hand, free-text searches have high exhaustivity (every word is searchable), so they have
the potential for high recall (assuming the synonym problem is solved by entering every variant), but
they will typically have much lower precision.
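The trade-off can be made concrete with the standard definitions: precision is the fraction of
retrieved documents that are relevant, and recall is the fraction of relevant documents that are
retrieved. The document IDs and relevance judgements below are invented for illustration only.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgements: the searcher wants articles on association football
# (soccer); d1, d2 and d3 are relevant.
relevant = {"d1", "d2", "d3"}

# Controlled search: the indexer tagged d1 and d2; d3 mentions soccer only in
# passing and was not tagged.
controlled = {"d1", "d2"}

# Free-text search on "football" OR "soccer": finds d3 too, but also d4 and d5,
# which use "football" in the American sense.
free_text = {"d1", "d2", "d3", "d4", "d5"}

print(precision_recall(controlled, relevant))  # precision 1.0, recall 2/3
print(precision_recall(free_text, relevant))   # (0.6, 1.0)
```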
Controlled vocabularies can also become outdated quickly: in fast-developing fields of knowledge,
authorized terms for new concepts may not be available unless the vocabulary is updated regularly.
Even in the best-case scenario, controlled language is often less specific than the words of the text
itself.




                                         Indexers trying to choose the appropriate index terms might misinterpret the author,
                                         while a free text search is in no danger of doing so, because it uses the author’s own
                                         words.

The use of controlled vocabularies can be costly compared with free-text searching, because human
experts or expensive automated systems are needed to index each entry. Furthermore, users have to be
familiar with the controlled vocabulary scheme to make the best use of the system. But, as already
mentioned, controlling synonyms and homographs can help increase precision.
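A minimal sketch of how synonym and homograph control raise precision, assuming a tiny hypothetical
vocabulary and index. Non-preferred words point to a preferred term (a USE reference), and the
homograph "football" is split into two preferred terms distinguished by parenthetical qualifiers. The
names ENTRY_VOCAB, INDEX, resolve and search are illustrative, not part of any real system.

```python
# Hypothetical entry vocabulary: searcher's word -> preferred term(s).
ENTRY_VOCAB = {
    "soccer": ["Football (Association)"],
    "association football": ["Football (Association)"],
    "gridiron football": ["Football (American)"],
    # Homograph: the bare word has two senses, each a qualified preferred term.
    "football": ["Football (American)", "Football (Association)"],
}

# Hypothetical index: document ID -> preferred terms assigned by the indexer.
INDEX = {
    "d1": {"Football (Association)"},
    "d2": {"Football (Association)"},
    "d3": {"Football (American)"},
}


def resolve(word: str) -> list[str]:
    """Map a searcher's word to preferred term(s); empty list if not in the vocabulary."""
    return ENTRY_VOCAB.get(word.strip().lower(), [])


def search(preferred_term: str) -> set[str]:
    """Retrieve documents indexed under exactly this preferred term."""
    return {doc for doc, terms in INDEX.items() if preferred_term in terms}


print(resolve("Soccer"))                    # ['Football (Association)']
print(sorted(search(resolve("soccer")[0]))) # ['d1', 'd2']
```

A searcher who types "soccer" is led to the qualified term and never sees d3, which a free-text search
for "football" would also have returned.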
The example thesaurus discussed here is a controlled vocabulary of cheese terms based on ANSI/NISO
Z39.19-1993, Guidelines for the Construction, Format, and Management of Monolingual Thesauri. It
mixes single-word and multi-word terms representing several aspects of the cheeses sold in The
Epicurean Cheese Shop. Scope notes clarify the meaning of some terms, especially terms whose meaning
is not obvious, such as barnyardy, and common terms used in a specific way, such as low fat, medium
fat, and high fat [3.2.2]. Nearly all the terms are in English, although some French and Italian words
may later be added as non-preferred terms to meet the needs of cheese connoisseurs with a working
knowledge of those languages.
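One way to hold such thesaurus entries in a program is sketched below, assuming Z39.19-style
relationships: SN (scope note), BT (broader term), and UF (used for, i.e. non-preferred terms). The
broader terms "Fat content", "Flavour" and "Milk type" are taken from the classes named later in this
section, but every scope note is placeholder text and the French and Italian UF entries are
hypothetical, since the thesaurus does not yet contain them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One preferred term with Z39.19-style relationships."""
    label: str
    scope_note: Optional[str] = None                    # SN: how the term is used here
    broader: Optional[str] = None                       # BT: the broader preferred term
    used_for: list[str] = field(default_factory=list)   # UF: non-preferred synonyms


# Hypothetical entries; scope-note wording is placeholder text, not the shop's definitions.
TERMS = [
    Term("Fat content"),
    Term("Low fat", scope_note="Placeholder: the fat range the shop calls low.", broader="Fat content"),
    Term("Medium fat", scope_note="Placeholder: the fat range the shop calls medium.", broader="Fat content"),
    Term("High fat", scope_note="Placeholder: the fat range the shop calls high.", broader="Fat content"),
    Term("Flavour"),
    Term("Barnyardy", scope_note="Placeholder: what the shop means by barnyardy.", broader="Flavour"),
    Term("Milk type"),
    Term("Goat milk", broader="Milk type",
         used_for=["lait de chèvre", "latte di capra"]),  # hypothetical future UF terms
]
THESAURUS = {t.label: t for t in TERMS}


def narrower(label: str) -> list[str]:
    """NT is the reciprocal of BT, so derive it rather than storing it twice."""
    return sorted(t.label for t in TERMS if t.broader == label)


def preferred(entry: str) -> Optional[str]:
    """Map a searcher's word (preferred or non-preferred) to its preferred term."""
    key = entry.strip().lower()
    for t in TERMS:
        if t.label.lower() == key or key in (u.lower() for u in t.used_for):
            return t.label
    return None


print(narrower("Fat content"))       # ['High fat', 'Low fat', 'Medium fat']
print(preferred("Latte di capra"))   # Goat milk
```

Deriving narrower terms from the broader links keeps the BT/NT pairs reciprocal, which Z39.19 expects,
without the risk of the two lists drifting apart.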

Pre-coordinate or Post-coordinate Retrieval
The thesaurus terms will eventually be integrated into a highly structured online database that will
let users search for cheese varieties by a range of characteristics. Users will be able to select
characteristics from several pop-up boxes that reflect the main classes of terms in the thesaurus used
to describe cheese varieties, such as fat content, flavour, flavour intensity, texture, milk type, or
national origin. The thesaurus terms will therefore be post-coordinated at the retrieval stage, with
Boolean operators available to combine or restrict specific cheese characteristics. Indexers will
apply all applicable terms to each cheese variety.
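Post-coordinated Boolean retrieval over such facets can be sketched with Python sets, where AND, OR and
NOT become intersection, union and difference. The cheese records and their facet values are
illustrative, and the facet names ("milk", "origin", "texture") are assumptions standing in for the
database's pop-up boxes. Each facet holds a set so that an indexer can apply every applicable term.

```python
# Illustrative records: facet -> set of values an indexer applied.
CHEESES = {
    "Brie de Meaux": {"milk": {"cow"}, "origin": {"France"}, "texture": {"soft"}},
    "Camembert de Normandie": {"milk": {"cow"}, "origin": {"France"}, "texture": {"soft"}},
    "Parmigiano Reggiano": {"milk": {"cow"}, "origin": {"Italy"}, "texture": {"hard"}},
    "Pecorino Romano": {"milk": {"sheep"}, "origin": {"Italy"}, "texture": {"hard"}},
    "Mozzarella di Bufala Campana": {"milk": {"buffalo"}, "origin": {"Italy"}, "texture": {"soft"}},
}


def having(facet: str, value: str) -> set[str]:
    """All cheeses indexed with `value` under `facet`."""
    return {name for name, facets in CHEESES.items() if value in facets.get(facet, set())}


# Terms are combined only at search time: AND -> &, OR -> |, NOT -> -
italian_sheep_or_buffalo = having("origin", "Italy") & (having("milk", "sheep") | having("milk", "buffalo"))
soft_not_french = having("texture", "soft") - having("origin", "France")

print(sorted(italian_sheep_or_buffalo))  # ['Mozzarella di Bufala Campana', 'Pecorino Romano']
print(sorted(soft_not_french))           # ['Mozzarella di Bufala Campana']
```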
Numerous methodologies have been developed to assist in the creation of controlled vocabularies,
including faceted classification, which enables a given data record or document to be described in
multiple ways, along several independent facets such as milk type, texture, and national origin.
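Because a faceted record keeps each aspect separate, the same description can serve either of the
retrieval styles this section contrasts. The sketch below is a simplification under stated
assumptions: the citation order is hypothetical, and pre-coordinate lookup is modelled as a prefix
match on a fixed-order heading string, whereas post-coordinate lookup accepts any combination of facet
values in any order.

```python
# Hypothetical citation order for building a pre-coordinated heading.
CITATION_ORDER = ["origin", "milk", "texture"]


def precoordinated_heading(record: dict[str, str]) -> str:
    """Join facet values into one fixed-order heading, as a pre-coordinate index would."""
    return " -- ".join(record[f] for f in CITATION_ORDER if f in record)


def precoordinate_match(heading: str, query: str) -> bool:
    """Pre-coordinate lookup (simplified): the query must follow the citation order."""
    return heading.startswith(query)


def postcoordinate_match(record: dict[str, str], wanted: set[str]) -> bool:
    """Post-coordinate lookup: every wanted value must be present, in any order."""
    return wanted <= set(record.values())


brie = {"texture": "Soft", "milk": "Cow milk", "origin": "France"}
heading = precoordinated_heading(brie)

print(heading)                                          # France -- Cow milk -- Soft
print(precoordinate_match(heading, "Cow milk -- France"))  # False: wrong order
print(postcoordinate_match(brie, {"Soft", "France"}))       # True
```

The pre-coordinated heading only answers queries phrased in its citation order, which is why the
cheese database combines terms at retrieval time instead.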






            196                              LOVELY PROFESSIONAL UNIVERSITY