Page 227 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



                                 A typical document includes many terms signifying topics about which it contains no information.
                                 Computer programs include those terms in their results because they lack the intelligence required
                                 to distinguish terms signifying topics about which information is presented from terms about which
                                 no information is presented. A fourth reason is that headings and subheadings should be tailored
                                 to the needs and viewpoints of anticipated users.
                                 Some are aimed at users who are very knowledgeable about topics addressed in the document;
                                 others at users with little knowledge. Some are reminders to those who read the document already;
                                 others are enticements to potential readers. To date, no one has found a way to provide computer
                                 programs with the judgment, expertise, intelligence or audience awareness that is needed to create
                                 usable indexes. Until they do, automatic indexing will remain a pipe dream.”
                                 Anderson and Pérez-Carballo, on the other hand, find that human indexing has to be limited to
                                 specific kinds of tasks that can justify its high cost, and they conclude their discussion of automatic
                                 indexing:
                                 “The bottom line is clear: automatic indexing works! And it appears to work just as well as human
                                 indexing, just differently.”
                                 An important aspect is, of course, the qualifications of the human indexer. Should the author, for
                                 example, be the indexer of his or her own works? (Cf. author-supplied keywords.)
                                 What computers can do (and humans cannot):
                                 Organize all words in a text and in a given database and perform statistical operations on them (e.g.
                                 tf-idf).
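                                 As an illustration of the kind of statistical operation meant here, the following is a minimal
                                 sketch of tf-idf weighting in Python. The helper names (tokenize, tf_idf, top_terms), the
                                 three sample sentences, and the particular variant used (relative term frequency multiplied by
                                 the natural logarithm of the inverse document frequency) are assumptions made for this example;
                                 other weighting variants are in common use.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lower-case the text and split it on whitespace, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical three-document "database" for illustration only.
docs = [
    "The box was in the pen.",
    "The pen ran out of ink.",
    "The index lists terms in the document.",
]
scores = tf_idf(docs)
candidate_index_terms = top_terms(scores[0], k=2)
```

                                 A word such as "the", which occurs in every document, receives a weight of zero, while words
                                 confined to one document are ranked highest. The program counts word forms only; it has no
                                 way of knowing which meaning of "pen" is intended.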
                                 What humans can do (and computers cannot):
                                 Understand words and texts on the background of implicit knowledge.
                                 For example, consider these sentences from Bar-Hillel (1960): “Little John was looking for his toy
                                 box. Finally he found it. The box was in the pen.” The word pen can have at least two meanings (a
                                 container for animals or children, and a writing implement). In the sentence “The box was in the pen”
                                 one knows that only the first meaning is plausible; the second meaning is excluded by one’s
                                 knowledge of the normal sizes of (writing) pens and boxes. Bar-Hillel contended that no computer
                                 program could conceivably deal with such “real world” knowledge without recourse to a vast
                                 encyclopedic store.
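                                 The reasoning in Bar-Hillel's example can be made concrete with a small Python sketch. The
                                 size table, its values, and the function name plausible_containers are hypothetical and were
                                 entered by hand for this illustration. The program selects the right sense only because a
                                 human has supplied the relevant real-world knowledge in advance, which is the difficulty
                                 Bar-Hillel describes.

```python
# Hypothetical typical sizes in centimetres: illustrative values, not measured data.
TYPICAL_SIZE_CM = {
    "toy box": 60,
    "pen (writing implement)": 15,
    "pen (enclosure)": 150,
}

def plausible_containers(contained, candidate_senses, sizes=TYPICAL_SIZE_CM):
    """Keep only the senses that are large enough to hold the contained object."""
    return [sense for sense in candidate_senses if sizes[sense] > sizes[contained]]

senses_of_pen = ["pen (writing implement)", "pen (enclosure)"]
result = plausible_containers("toy box", senses_of_pen)
print(result)
```

                                 Every further ambiguous word would need its own hand-entered facts, so the approach does not
                                 scale without the encyclopedic store mentioned above.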
                                 Warner (x) expresses the view that only syntactic labour, not semantic labour, can be automated.
                                 Semantic and syntactic labour are defined in, for example, Warner (2002):
                                 “Semantic labour is concerned with the content, meaning, or, in semiotic terms, the signified of
                                 messages. The intention of semantic labour may be the construction of further messages, for instance,
                                 a description of the original message or a dialogic response.
                                 Syntactic labour is concerned with the form, expression, or signifier of the original message.
                                 Transformations operating on the form alone may produce further messages (classically, this would
                                 be exemplified in the logic formalised by Boole).”

                                 Conclusion

                                 Automatic indexing may at first look like a reasonably limited and well-defined research topic.
                                 Important developments have taken place, the practical implications of which most of us use almost
                                 every day. However, there seem to be no limits to how automatic indexing may be improved and
                                 how the theoretical outlook opens up. Nearly every aspect of human language may be involved in
                                 the improvement of machine processing of language (and each natural language may need special
                                 consideration).







            222                              LOVELY PROFESSIONAL UNIVERSITY