Page 175 - DLIS405_INFORMATION_STORAGE_AND_RETRIEVAL
Information Storage and Retrieval
for refining or expanding a free-text query (either interactively or automatically). Alternatively, a
thesaurus can be used in both searching and indexing with controlled-vocabulary indexed datasets,
and this latter use is the immediate application of our current work (although we also see the
techniques as useful with free-text searching).
In retrieval, thesaurus relationships are conventionally used to expand a query with synonyms and
sometimes with narrower terms, but the FACET system also performs more general semantic term
expansion (to broader and to related concepts). Reasoning over the semantic relationships in the
thesaurus permits imprecise matching between query and index terms. This allows matching items
to be ranked in a result list, and supports a 'More like this' option for finding similar but not
necessarily identically indexed items.
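The expansion-and-ranking idea can be illustrated with a minimal sketch. It assumes a thesaurus stored as a dictionary of `(relationship, term)` pairs and assigns each relationship type a weight; closeness along a path is the product of those weights, so broader and related terms count for less than synonyms or narrower terms. The weights, the threshold, and the names `REL_WEIGHTS`, `expand` and `rank` are hypothetical illustrations, not the values or functions used by FACET.

```python
from collections import deque

# Hypothetical weights per thesaurus relationship (illustrative, not FACET's):
# UF = synonym (used for), NT = narrower term, BT = broader term, RT = related term.
REL_WEIGHTS = {"UF": 1.0, "NT": 0.8, "BT": 0.6, "RT": 0.5}


def expand(term, thesaurus, threshold=0.3):
    """Return {term: closeness} for every term reachable from `term`.

    Closeness is the product of relationship weights along a path; the best
    score found for each term is kept, and paths whose score falls below
    `threshold` are not followed any further.
    """
    scores = {term: 1.0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, neighbour in thesaurus.get(current, []):
            score = scores[current] * REL_WEIGHTS[rel]
            if score >= threshold and score > scores.get(neighbour, 0.0):
                scores[neighbour] = score
                queue.append(neighbour)
    return scores


def rank(query_terms, items, thesaurus):
    """Rank items by the summed best closeness of their index terms to each query term."""
    expansions = [expand(q, thesaurus) for q in query_terms]
    results = []
    for item_id, index_terms in items.items():
        score = sum(
            max((exp.get(t, 0.0) for t in index_terms), default=0.0)
            for exp in expansions
        )
        if score > 0:  # items with no semantically close index term are dropped
            results.append((item_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    thesaurus = {
        "chairs": [("NT", "armchairs"), ("BT", "seating furniture"), ("RT", "tables")],
        "seating furniture": [("NT", "chairs"), ("NT", "benches")],
    }
    items = {"item1": {"armchairs"}, "item2": {"benches"},
             "item3": {"chairs"}, "item4": {"lamps"}}
    for item_id, score in rank(["chairs"], items, thesaurus):
        print(item_id, round(score, 2))
```

An item indexed exactly with the query term ranks first, items indexed with nearby concepts follow in order of closeness, and unrelated items are excluded. A 'More like this' request could reuse the same function by passing a selected item's own index terms as the query.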
Faceted systems are based on a primary division of terminology into fundamental, high-level
categories, or facets. A knowledge organisation system can be considered enumerative, when all
possible simple and compound terms are explicitly listed in their hierarchical position, or synthetic.
Faceted systems are normally synthetic: they do not attempt to include the vast number of possible
multi-concept headings or descriptors in a domain, but instead combine terms from a limited number
of fundamental facets as needed when indexing or querying. This flexibility allows highly specific,
nuanced metadata descriptions (or annotations). However, matching such compound descriptors
poses significant challenges for searching, and their full potential for retrieval has remained untapped.
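The difference between enumerative and synthetic description, and one way of matching compound descriptors, can be sketched as follows. A descriptor is composed on demand from `(facet, term)` pairs drawn from a small fixed set of facets, rather than looked up in a list of every possible combination. Matching then compares terms only within the same facet. The facet names, the crude `closeness` rule (1.0 for identical terms, 0.5 for a direct broader/narrower pair) and the averaging are all hypothetical simplifications for illustration, not FACET's matching function.

```python
# Hypothetical fixed set of fundamental facets (illustrative only).
FACETS = ("Objects", "Materials", "Styles and Periods")


def compose(parts):
    """Build a synthetic descriptor from (facet, term) pairs.

    Only facets from the fixed set are accepted; combinations are created
    as needed instead of being enumerated in advance.
    """
    descriptor = {}
    for facet, term in parts:
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        descriptor[facet] = term
    return descriptor


def closeness(a, b, broader):
    """Crude term closeness: 1.0 if identical, 0.5 if one term is the
    direct broader term of the other, otherwise 0.0."""
    if a == b:
        return 1.0
    if broader.get(a) == b or broader.get(b) == a:
        return 0.5
    return 0.0


def match(query, item, broader):
    """Mean per-facet closeness over the facets the query uses.

    A query facet that the item does not describe contributes 0.
    """
    if not query:
        return 0.0
    total = sum(closeness(query[f], item[f], broader) for f in query if f in item)
    return total / len(query)


if __name__ == "__main__":
    broader = {"armchairs": "chairs", "oak": "hardwood"}
    query = compose([("Objects", "chairs"), ("Materials", "oak")])
    item = compose([("Objects", "armchairs"), ("Materials", "oak"),
                    ("Styles and Periods", "Georgian")])
    print(match(query, item, broader))
```

Here an oak armchair partially matches a query for oak chairs (the Objects facet matches at one level of the hierarchy, the Materials facet exactly), which is the kind of imprecise but facet-aware matching the paragraph above describes.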
Objectives
The overarching objectives of the research were to:
• Develop and evaluate retrieval tools based on a matching function incorporating thesaurus
semantic closeness measures.
• Derive heuristics to guide automatic and interactive expansion/refinement of strings of
thesaurus terms, taking advantage of the context provided by facets.
• Experiment with techniques for creating complex queries using a query editor with knowledge
of the semantic roles of thesaurus facets. This will draw on previous work in the cultural
heritage domain.
• Design and implement semantic closeness measures based on thesaurus relationships.
Beneficiaries of the research
The research is directly relevant to cultural heritage organisations and the users of their digital
collections, as well as to collection management vendors and commercial image providers. Thesauri
are among the most common Knowledge Organisation Systems and frequently underpin higher-level
schemas and ontologies. Initiatives to update international thesaurus standards are currently
underway, and various groups are working on XML/RDF representations for thesauri. Thesauri
and faceted approaches have also been applied to website architecture and to hierarchical browsing
interfaces for web databases.
FACET Architecture and Interfaces
The final FACET system has a tiered, component-based architecture (Fig. 14.1) that accesses a
SQL Server relational database. Queries and their associated results are stored persistently as
XML data.
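Storing a query together with its results as XML can be pictured with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`savedQuery`, `terms`, `term`, `results`, `result`, `facet`, `id`, `rank`, `score`) and the functions `query_to_xml` and `xml_to_query` are hypothetical; the actual FACET XML schema is not described here, and the database layer is not shown.

```python
import xml.etree.ElementTree as ET


def query_to_xml(query_terms, results):
    """Serialise a faceted query and its ranked results to an XML string.

    query_terms: list of (facet, term) pairs.
    results: list of (item_id, score) pairs, already in ranked order.
    All element and attribute names here are hypothetical.
    """
    root = ET.Element("savedQuery")
    terms = ET.SubElement(root, "terms")
    for facet, term in query_terms:
        ET.SubElement(terms, "term", facet=facet).text = term
    hits = ET.SubElement(root, "results")
    for rank, (item_id, score) in enumerate(results, start=1):
        ET.SubElement(hits, "result", id=item_id, rank=str(rank), score=f"{score:.3f}")
    # encoding="unicode" returns a str rather than bytes, without an XML declaration
    return ET.tostring(root, encoding="unicode")


def xml_to_query(xml_text):
    """Restore (query_terms, results) from XML produced by query_to_xml."""
    root = ET.fromstring(xml_text)
    terms = [(t.get("facet"), t.text) for t in root.find("terms")]
    results = [(r.get("id"), float(r.get("score"))) for r in root.find("results")]
    return terms, results


if __name__ == "__main__":
    xml_text = query_to_xml([("Objects", "chairs"), ("Materials", "oak")],
                            [("item3", 1.0), ("item1", 0.8)])
    print(xml_text)
    print(xml_to_query(xml_text))
```

Keeping the query and its results in one self-describing document means a saved search can be reloaded and re-run, or its earlier results redisplayed, without reconstructing it from separate relational tables; where such XML is actually stored is left open in this sketch.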
170 LOVELY PROFESSIONAL UNIVERSITY