Page 246 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Unit 11: Indexing Language: Types and Characteristics
The process whereby the creators of documents structure them to enhance retrieval is known as
bottom-up indexing. A role for professional indexers in bottom-up indexing is as guides and trainers
to document authors (Locke 1993).
One reason that automatic indexing may be unsuited to book indexing is that book indexes are not
usually available electronically, and cannot be used in conjunction with powerful search software
(Mulvany and Milstead 1994).
Document Abstracting
Computers abstract documents (that is, condense their text) by searching for high-frequency words
in the text, and then selecting sentences in which clusters of these high-frequency words occur.
These sentences are then used in the order in which they appear in the text to make up the abstract.
Flow can be improved by adding extra sentences (for example, if a sentence begins with ‘Hence’ or
‘However’ the previous sentence can be included as well) but the abstract remains an awkward
collection of grammatically unrelated sentences.
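The procedure described above can be illustrated with a short Python sketch. It is a simplified illustration rather than the method of any particular abstracting system: the stopword list, the function names and the parameters top_words and max_sentences are assumptions made for this example, and a sentence is scored by a plain count of high-frequency words rather than by true clusters.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real system would use a longer one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "it",
             "that", "this", "for", "on", "as", "by", "with", "hence", "however"}

# Openers that signal dependence on the previous sentence (examples from the text).
LINKING_OPENERS = ("hence", "however")


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lower-case alphabetic words only.
    return re.findall(r"[a-z]+", sentence.lower())


def make_abstract(text, top_words=5, max_sentences=2):
    sentences = split_sentences(text)

    # Step 1: find the high-frequency (non-stopword) words of the document.
    counts = Counter(w for s in sentences for w in tokenize(s)
                     if w not in STOPWORDS)
    frequent = {w for w, _ in counts.most_common(top_words)}

    # Step 2: score each sentence by how many high-frequency words it contains.
    scores = [sum(1 for w in tokenize(s) if w in frequent) for s in sentences]

    # Step 3: pick the best-scoring sentences (earlier sentence wins a tie).
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:max_sentences])

    # Step 4: if a chosen sentence opens with 'Hence' or 'However',
    # include the sentence before it as well to improve flow.
    for i in list(chosen):
        if i > 0 and sentences[i].lower().startswith(LINKING_OPENERS):
            chosen.add(i - 1)

    # Step 5: return the sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling make_abstract(text) on a document returns a list of sentences in document order. Because the sentences are copied verbatim, the result shows the same lack of connection between sentences that the text describes.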
To reflect the subject content more closely, weighting can be given to sentences from certain
locations in the document (e.g., the introduction) and to sentences containing cue words (e.g.,
‘finally’, which suggests that a conclusion is starting). In addition, an organisation can give a
weighting to words which are important to it: a footwear producer, for example, could require that
every sentence containing the words foot or shoe should be included in the abstract.
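The weighting ideas in this paragraph can be sketched in Python as follows. The bonus values, the assumption that the first two sentences form the introduction, and all function and parameter names are hypothetical choices made for illustration; the cue word ‘finally’ and the words foot and shoe come from the examples in the text. The base scores are assumed to come from some earlier step, such as a count of high-frequency words.

```python
import re

CUE_WORDS = {"finally"}          # cue word example from the text
MUST_INCLUDE = {"foot", "shoe"}  # footwear producer example from the text


def tokenize(sentence):
    # Lower-case alphabetic words only.
    return re.findall(r"[a-z]+", sentence.lower())


def weighted_score(index, sentence, base_score,
                   intro_length=2, location_bonus=1.0, cue_bonus=1.0):
    """Add location and cue-word bonuses to a sentence's base score."""
    words = set(tokenize(sentence))
    score = float(base_score)
    if index < intro_length:   # assumed: the first sentences are the introduction
        score += location_bonus
    if words & CUE_WORDS:      # sentence contains a cue word
        score += cue_bonus
    return score


def select_sentences(sentences, base_scores, max_sentences=2):
    scores = [weighted_score(i, s, b)
              for i, (s, b) in enumerate(zip(sentences, base_scores))]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:max_sentences])

    # Organisation rule: every sentence containing a required word is included,
    # whatever its score. Matching is on exact words, so 'shoes' would not match.
    chosen |= {i for i, s in enumerate(sentences)
               if set(tokenize(s)) & MUST_INCLUDE}

    # Keep the original document order.
    return [sentences[i] for i in sorted(chosen)]
```

In this sketch the organisation's words act as a hard rule rather than a weight, which follows the footwear example; an alternative design would simply add a large bonus to such sentences.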
Function: noun
Text: 1
Synonyms FOOL 3, butt, chump, dupe, fall guy, gudgeon, gull, pigeon, sap, sucker
|| 2
Synonyms DOLLAR, bill, ||bone, ||buck, ||frogskin, ||iron man, one, ||skin, ||smacker, ||smackeroo
The aims of an IR-oriented thesaurus, by contrast, are completely different. Consider, for example,
this excerpt from the INSPEC Thesaurus (built to assist IR in the fields of physics, electrical
engineering, electronics, computers and control):
THESAURUS search words: natural languages
UF natural language processing (UF=used for natural language processing)
BT languages (BT=broader term is languages)
TT languages (TT=top term in a hierarchy of terms)
RT artificial intelligence (RT=related term/s)
   computational linguistics
   formal languages
   programming languages
   query languages
   specification languages
   speech recognition
   user interfaces
CC C4210L; C6140D; C6180N; C7820 (CC=classification code)
DI January 1985 (DI=date [1985])
PT high level languages (PT=prior term to natural languages)
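One way to picture such a thesaurus record is as a small data structure. The Python sketch below is only an illustration: the class and function names are assumptions made for this example and do not describe how INSPEC stores or serves its thesaurus. The field values are copied from the excerpt above, with only a subset of the related terms shown.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str                                      # preferred (controlled) term
    used_for: list = field(default_factory=list)   # UF: variants mapped to this term
    broader: list = field(default_factory=list)    # BT: broader terms
    top: list = field(default_factory=list)        # TT: top terms of the hierarchy
    related: list = field(default_factory=list)    # RT: related terms
    prior: list = field(default_factory=list)      # PT: prior terms


natural_languages = ThesaurusEntry(
    term="natural languages",
    used_for=["natural language processing"],
    broader=["languages"],
    top=["languages"],
    # Subset of the RT list in the excerpt.
    related=["artificial intelligence", "formal languages", "speech recognition"],
    prior=["high level languages"],
)


def build_lookup(entries):
    """Map each preferred term and each UF variant to the preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.term.lower()] = entry.term
        for variant in entry.used_for:
            lookup[variant.lower()] = entry.term
    return lookup


def preferred_term(lookup, phrase):
    """Return the controlled term for a phrase, or None if it is not listed."""
    return lookup.get(phrase.strip().lower())
```

With lookup = build_lookup([natural_languages]), the call preferred_term(lookup, "Natural Language Processing") returns the controlled term natural languages, so an indexer and a searcher who start from different phrases arrive at the same term.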
This is still a manually generated thesaurus (more on this later), but the differences are already
apparent: its objective is no longer to provide a better, richer vocabulary to a writer. Instead, it
aims to:
— Assist indexing by providing a common, precise and controlled vocabulary. For example,
libraries commonly use a similar hierarchy to classify their books.
LOVELY PROFESSIONAL UNIVERSITY 241