Page 235 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
Unit 12: Natural Language Processing
to be created can be determined when new stories are read. Generalizations are analyzed as to
their critical parts and evaluated in light of later evidence. The knowledge structures used and a
number of examples of the system in operation are presented.
12.5.1 Natural Language Systems
START is a natural language system: a software system designed to answer questions that are
posed to it in natural language. START parses incoming questions, matches the queries created
from the parse trees against its knowledge base, and presents the appropriate information
segments to the user. In this way, START gives untrained users speedy access to knowledge that
in many cases would take an expert some time to find.
START (SynTactic Analysis using Reversible Transformations) was developed by Boris Katz at
MIT’s Artificial Intelligence Laboratory. Currently, the system is undergoing further
development by the InfoLab Group, led by Boris Katz. START was first connected to the
World Wide Web in December 1993 and, in its several forms, has to date answered millions of
questions from users around the world.
A key technique called “natural language annotation” helps to connect information seekers to
information sources. This technique uses natural language sentences and phrases (annotations)
as descriptions of content, and associates them with information segments at various
granularities. An information segment is retrieved when its annotation matches an input question.
Because the match is made against the annotation rather than the content itself, START can
handle a wide variety of media, including text, diagrams, images, video and audio clips, data
sets, and Web pages.
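The retrieval step described above can be illustrated with a minimal sketch. It is not START's actual matcher: START matches queries built from parse trees, whereas this sketch substitutes a simple word-overlap score. The segments, the stopword list, the threshold, and the names content_words and retrieve are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stopword list; a real system would rely on syntactic analysis instead.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "what", "does", "do", "how", "to", "in"}

def content_words(text):
    """Lowercase the text and keep only the tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

# Hypothetical information segments of different media types, each described
# by a natural language annotation.
SEGMENTS = [
    {"annotation": "Mars has two moons", "media": "text",
     "content": "Phobos and Deimos orbit Mars."},
    {"annotation": "map of the Boston subway", "media": "image",
     "content": "subway_map.png"},
    {"annotation": "the heart pumps blood", "media": "video",
     "content": "heart_clip.mp4"},
]

def retrieve(question, segments=SEGMENTS, threshold=0.5):
    """Return the segment whose annotation best overlaps the question, or None.

    The score is the fraction of the annotation's content words that also
    appear in the question (an assumed stand-in for structural matching).
    """
    q = content_words(question)
    best, best_score = None, 0.0
    for seg in segments:
        a = content_words(seg["annotation"])
        score = len(q & a) / len(a) if a else 0.0
        if score > best_score:
            best, best_score = seg, score
    return best if best_score >= threshold else None
```

Calling retrieve("Show me a map of the Boston subway") selects the image segment even though the image itself contains no text to search, which is the point of matching against annotations. A question that shares no content words with any annotation returns None.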
The natural language processing component of START consists of two modules that share
the same grammar. The understanding module analyzes English text and produces a knowledge
base that encodes the information found in the text. Given an appropriate segment of the
knowledge base, the generating module produces English sentences. Used in conjunction with the
technique of natural language annotation, these modules put the power of sentence-level natural
language processing to use in the service of multimedia information access.
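The idea of two modules sharing one grammar can be sketched in a few lines. This is a toy illustration under stated assumptions, not START's grammar: the sentence frames, the relation names, the triple representation of the knowledge base, and the functions understand and generate are all hypothetical.

```python
import re

# A toy "grammar" shared by both modules: each rule pairs a sentence frame
# with the relation it encodes. Frames and relation names are hypothetical.
GRAMMAR = [
    ("{subject} is the capital of {object}", "capital-of"),
    ("{subject} wrote {object}", "wrote"),
]

def _frame_to_regex(frame):
    """Turn a frame such as '{subject} wrote {object}' into an anchored regex."""
    pattern = re.escape(frame)
    pattern = pattern.replace(r"\{subject\}", r"(?P<subject>.+?)")
    pattern = pattern.replace(r"\{object\}", r"(?P<object>.+?)")
    return re.compile("^" + pattern + r"\.?$")

def understand(sentence, kb):
    """Understanding module: analyze a sentence and store a triple in kb."""
    for frame, relation in GRAMMAR:
        m = _frame_to_regex(frame).match(sentence.strip())
        if m:
            triple = (m.group("subject"), relation, m.group("object"))
            kb.add(triple)
            return triple
    return None  # the sentence is outside the toy grammar

def generate(triple):
    """Generating module: produce an English sentence from a stored triple."""
    subject, relation, obj = triple
    for frame, rel in GRAMMAR:
        if rel == relation:
            return frame.format(subject=subject, object=obj) + "."
    return None  # no rule in the grammar expresses this relation
```

Because both functions read the same GRAMMAR table, any sentence the understanding module can analyze can also be regenerated from the knowledge base: understand("Paris is the capital of France.", kb) stores a triple, and passing that triple to generate reproduces the sentence.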
12.5.2 Recognition and Classification Process
It is generally easy for a person to differentiate the sound of a human voice from that of a
violin, a handwritten numeral “3” from an “8”, and the aroma of a rose from that of an onion.
However, it is difficult for a programmable computer to solve these kinds of perceptual
problems. They are difficult because each pattern usually contains a large amount of
information, and the recognition problems typically have an inconspicuous, high-dimensional
structure.
Pattern recognition is the science of making inferences from perceptual data, using tools from
statistics, probability, computational geometry, machine learning, signal processing, and
algorithm design. Thus, it is of central importance to artificial intelligence and computer vision,
and has far-reaching applications in engineering, science, medicine, and business. In particular,
advances made during the last half century now allow computers to interact more effectively
with humans and the natural world (e.g., through speech recognition software). However, the
most important problems in pattern recognition are yet to be solved.
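As a small illustration of making an inference from perceptual data, the sketch below classifies a handwritten "3" versus an "8" with a nearest-centroid rule, one of the simplest statistical classifiers. The two features (number of enclosed loops and fraction of ink on the left edge) and all sample values are hypothetical; real recognizers work with far higher-dimensional data. The sketch assumes Python 3.8 or later for math.dist.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def fit_centroids(samples, labels):
    """Average the feature vectors of each class to obtain one centroid per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(x, centroids):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(x, centroids[y]))

# Hypothetical training data: (enclosed loops, fraction of ink on the left edge).
samples = [(0, 0.2), (0, 0.3), (2, 0.5), (2, 0.6)]
labels = ["3", "3", "8", "8"]
centroids = fit_centroids(samples, labels)

print(classify((0, 0.35), centroids))  # a new pattern with no enclosed loops
print(classify((2, 0.45), centroids))  # a new pattern with two enclosed loops
```

With two hand-picked features the problem is easy. The difficulty described in the text arises because raw perceptual data (pixels, audio samples) has many dimensions and the features that separate the classes are not obvious in advance.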
It is natural that we should seek to design and build machines that can recognize patterns.
Given applications such as automated speech recognition, fingerprint identification, optical
character recognition, DNA sequence identification, and many more, it is clear that reliable,
accurate pattern recognition by machine would be immensely useful. Moreover, in solving the
many problems required to build such systems, we gain a deeper understanding of, and
appreciation for, pattern recognition systems. For some problems, such as speech and visual
recognition, our design