Page 227 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
A typical document includes many terms signifying topics about which it contains no information.
Computer programs include those terms in their results because they lack the intelligence required
to distinguish terms signifying topics about which information is presented from terms about which
no information is presented. A fourth reason is that headings and subheadings should be tailored
to the needs and viewpoints of anticipated users.
Some are aimed at users who are very knowledgeable about topics addressed in the document;
others at users with little knowledge. Some are reminders to those who read the document already;
others are enticements to potential readers. To date, no one has found a way to provide computer
programs with the judgment, expertise, intelligence or audience awareness that is needed to create
usable indexes. Until they do, automatic indexing will remain a pipe dream.”
Anderson and Pérez-Carballo, on the other hand, find that human indexing has to be limited to
specific kinds of tasks that can justify its high cost, and they conclude their discussion of automatic
indexing as follows:
“The bottom line is clear: automatic indexing works! And it appears to work just as well as human
indexing, just differently.”
An important aspect is, of course, the qualifications of the human indexer. Should the author, for
example, be the indexer of his or her own works? (Cf. author-supplied keywords.)
What computers can do (and humans cannot):
Organize all the words in a text and in a given database and perform statistical operations on them
(e.g., tf-idf).
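As an illustration of the kind of statistical operation meant here, the sketch below computes tf-idf weights for a small collection. It assumes one common variant (term frequency = count divided by document length, inverse document frequency = log(N / df)) and a naive whitespace tokenizer; real indexing systems use many other weighting and smoothing schemes. The function names are hypothetical.

```python
import math
import string
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, split on whitespace, strip punctuation
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if word:
            tokens.append(word)
    return tokens


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document; idf = log(N / df)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the box was in the pen", "the pen is on the desk", "the toy box is empty"]
for w in tf_idf(docs):
    print(sorted(w.items(), key=lambda kv: -kv[1])[:3])
```

A word such as "the", which occurs in every document, gets weight 0, while a word confined to one document ("toy") is weighted highest. The computation is purely statistical: nothing in it knows what "pen" or "box" means.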
What humans can do (and computers cannot):
Understand words and texts against the background of implicit knowledge.
For example, consider these sentences from Bar-Hillel (1960): “Little John was looking for his toy
box. Finally he found it. The box was in the pen.” The word “pen” can have at least two meanings (a
container for animals or children, and a writing implement). In the sentence “The box was in the pen,”
one knows that only the first meaning is plausible; the second meaning is excluded by one’s
knowledge of the normal sizes of (writing) pens and boxes. Bar-Hillel contended that no computer
program could conceivably deal with such “real world” knowledge without recourse to a vast
encyclopedic store.
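The following toy sketch shows what resolving Bar-Hillel's example mechanically would take. The size table and sense labels are hypothetical, hand-entered values chosen only for illustration. The program picks the right sense of "pen" only because someone typed in the relevant fact. This is exactly Bar-Hillel's point: an open-ended text would need such facts about everything.

```python
# Hypothetical, hand-coded "world knowledge": rough typical sizes in centimetres.
# These numbers are illustrative assumptions, not data from any source.
TYPICAL_SIZE_CM = {
    "toy box": 50,
    "pen (writing implement)": 14,
    "pen (enclosure)": 120,
}

# Hypothetical sense inventory for the ambiguous word
SENSES = {"pen": ["pen (writing implement)", "pen (enclosure)"]}


def plausible_senses(container_word: str, contained: str) -> list[str]:
    # Keep only the senses of the container word that are big enough
    # to hold the contained object ("X was in the Y" implies Y > X)
    contained_size = TYPICAL_SIZE_CM[contained]
    return [
        sense
        for sense in SENSES[container_word]
        if TYPICAL_SIZE_CM[sense] > contained_size
    ]


print(plausible_senses("pen", "toy box"))
```

Remove the size entries and the program has no basis for choosing; add a new sentence about a different object and it fails until another fact is supplied.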
Warner (x) expresses the view that only syntactic labour, not semantic labour, can be automated.
Semantic and syntactic labour are defined in, for example, Warner (2002):
“Semantic labour is concerned with the content, meaning, or, in semiotic terms, the signified of
messages. The intention of semantic labour may be the construction of further messages, for instance,
a description of the original message or a dialogic response.
Syntactic labour is concerned with the form, expression, or signifier of the original message.
Transformations operating on the form alone may produce further messages (classically, this would
be exemplified in the logic formalised by Boole).”
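One way to picture syntactic labour in indexing (our illustration, not Warner's) is Boolean retrieval over an inverted index. The sketch below matches word forms only. It returns every document containing the string "pen", whether the pen is a writing implement or an enclosure. The function names and sample documents are hypothetical.

```python
import string
from collections import defaultdict


def build_inverted_index(docs: list[str]) -> dict[str, set[int]]:
    # Map each word form to the set of document ids containing it
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for raw in text.lower().split():
            word = raw.strip(string.punctuation)
            if word:
                index[word].add(doc_id)
    return index


def boolean_and(index: dict[str, set[int]], *terms: str) -> set[int]:
    # Boolean AND = intersection of posting sets; operates on form alone
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = [
    "The box was in the pen.",
    "She signed the form with a pen.",
    "The sheep stayed in the pen.",
]
index = build_inverted_index(docs)
print(boolean_and(index, "pen"))
print(boolean_and(index, "box", "pen"))
```

The query "pen" mixes both senses in its result. Separating them would require semantic labour that the index itself does not perform.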
Conclusion
Automatic indexing may, at first, look like a reasonably limited and well-defined research topic.
Important developments have taken place, and most of us use their practical results almost every
day. However, there seem to be no limits to how much automatic indexing may be improved or to
how far its theoretical outlook opens up. Nearly every aspect of human language may be involved in
improving the machine processing of language (and each natural language may need special
consideration).
222 LOVELY PROFESSIONAL UNIVERSITY