Page 249 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



                                    •  Taxonomy and Synonymy: same meaning, different levels of specificity
                                    •  Antonymy: opposite (in some sense) meaning

                                 However, these are not easily found during automatic thesaurus generation, as they require a great
                                 deal of “semantic” knowledge that is not easy to capture from the documents alone. Instead, the
                                 multi-purpose “associated with” relation is used.
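As an illustration of how a generic "associated with" relation might be derived from documents alone, the sketch below links two terms when they appear together in enough documents. Co-occurrence counting is an assumption made here for illustration; the text does not prescribe a particular method. The function name `build_associations`, the `min_cooccurrence` threshold and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_associations(documents, min_cooccurrence=2):
    """Link two terms when they share at least `min_cooccurrence` documents.

    This is an assumed, simplified stand-in for the "associated with"
    relation; it captures no synonymy, taxonomy or antonymy.
    """
    pair_counts = defaultdict(int)
    for doc in documents:
        # Count each unordered pair of distinct terms once per document.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1

    associated = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            # The relation is symmetric, so record it in both directions.
            associated[a].add(b)
            associated[b].add(a)
    return dict(associated)


# Hypothetical three-document collection.
docs = [
    "thesaurus indexing retrieval",
    "thesaurus retrieval query",
    "query expansion retrieval",
]
associations = build_associations(docs)
print(associations["retrieval"])
```

With these sample documents, "retrieval" ends up associated with "thesaurus" and "query", while "indexing" and "expansion" get no entry because none of their pairs reaches the threshold.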


                                 Normalization
                                 Manual thesauri use a very complex set of rules (few adjectives, strip some prepositions, noun
                                 form, capitalization) to achieve vocabulary “normalization”: store only the “base form” of each
                                 term, instead of all its variants. Normalization can be critical for reducing the storage space needed.
                                 The problem with this complex normalization is that the user must know the normalized
                                 form in order to use the thesaurus.
                                 In automatic thesauri, a simpler (but less precise) approach is usually taken:
                                    •  Apply a stoplist filter
                                    •  Use a standard stemmer on the remaining words (e.g., Porter)
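The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the stoplist is a tiny hypothetical sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a standard stemmer. It does not implement the Porter rules.

```python
# Hypothetical, very small stoplist; real stoplists are much longer.
STOPLIST = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Suffixes tried in order. A crude stand-in, NOT the Porter algorithm.
SUFFIXES = ("ing", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop stoplist words, then stem the remaining words."""
    tokens = [t.strip(".,;:()\"'").lower() for t in text.split()]
    return [crude_stem(t) for t in tokens if t and t not in STOPLIST]


print(normalize("The indexing of documents"))  # ['index', 'document']
```

In practice the stand-in would be replaced by a real stemmer; the point here is only the order of operations: filter first, then stem what remains.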
                                 The other side of the problem (a single word with multiple meanings) arises with “homographs”.
                                 Homographs can be handled in manual thesauri via parenthetical specification (in INSPEC, the
                                 terms “bond(chemical)” and “bond(cohesive)”). This is not so easy to do in automatically generated
                                 thesauri, as the meaning can only be extracted from the term’s context.
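To make the parenthetical convention concrete, the sketch below splits entries such as "bond(chemical)" into a term and a qualifier and groups the senses of each term. The helper names `parse_entry` and `group_senses` and the entry syntax accepted by the regular expression are assumptions for illustration, not INSPEC's actual data format.

```python
import re

# Assumed entry syntax: "term(qualifier)" with a single pair of parentheses.
QUALIFIED = re.compile(r"^(?P<term>[^()]+)\((?P<qualifier>[^()]+)\)$")


def parse_entry(entry):
    """Return (term, qualifier); qualifier is None for unqualified entries."""
    match = QUALIFIED.match(entry.strip())
    if match:
        return match.group("term").strip(), match.group("qualifier").strip()
    return entry.strip(), None


def group_senses(entries):
    """Map each term to the list of qualifiers (senses) recorded for it."""
    senses = {}
    for entry in entries:
        term, qualifier = parse_entry(entry)
        senses.setdefault(term, []).append(qualifier)
    return senses


senses = group_senses(["bond(chemical)", "bond(cohesive)", "thesaurus"])
print(senses)
```

A term that maps to more than one qualifier is a homograph in this scheme. An automatic thesaurus has no such labels and would have to infer the sense from the surrounding context instead.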

                                 Automated Thesauri
                                 Manual vs. Automatic thesauri for IR
                                 This unit deals with the differences between manually and automatically generated
                                 thesauri for the field of IR. The following tables illustrate those differences in terms of structure, goal,
                                 construction and verification.

                                                                     Table 11.3

                                                Manual                                        Automatic
                                    Structure   – Hierarchy of thesaural terms                – Many different approaches, but not
                                                                                                always hierarchical
                                                – High level of coordination                  – Lower level of coordination (phrase
                                                                                                selection not easy to do)
                                                – Many types of relations between terms
                                                – Complex normalization rules                 – Simple normalization rules; hard to
                                                                                                separate homographs
                                                – Field limits are specified by the creators  – Field limits are specified by the
                                                                                                collection

                                    Goal        – Main goal is to precisely define the        – Depending on level of coordination,
                                                  vocabulary to be used in a technical field    can be used for indexing
                                                – Due to this precise definition, useful to   – Main use is to assist in retrieval
                                                  index documents                               through (possibly automated) query
                                                                                                expansion/contraction
                                                – Assistance in developing search strategy
                                                – Assistance in retrieval through query
                                                  expansion/contraction






