Page 146 - DCAP506_ARTIFICIAL_INTELLIGENCE
field. The Redmond-based Natural Language Processing group is focused on developing efficient
algorithms to process texts and to make their information accessible to computer applications.
Since text can contain information at many different granularities, from simple word or token-
based representations, to rich hierarchical syntactic representations, to high-level logical
representations across document collections, the group seeks to work at the right level of analysis
for the application concerned.
11.1 Natural Language Processing – Overview
The goal of the Natural Language Processing (NLP) group is to design and build software that
will analyze, understand, and generate languages that humans use naturally, so that eventually
you will be able to address your computer as though you were addressing another person.
This goal is not easy to reach. “Understanding” language means, among other things, knowing
what concepts a word or phrase stands for and knowing how to link those concepts together in
a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans
to learn and use, is hardest for a computer to master. Long after machines have proven capable
of inverting large matrices with speed and grace, they still fail to master the basics of our spoken
and written languages.
The challenges we face stem from the highly ambiguous nature of natural language.
As an English speaker you effortlessly understand a sentence like “Flying planes can be
dangerous”. Yet this sentence presents difficulties to a software program that lacks both your
knowledge of the world and your experience with linguistic structures. Is the more plausible
interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can”
be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant?
Depending on context, “plane” could refer to, among other things, an airplane, a geometric
object, or a woodworking tool. How much and what sort of context needs to be brought to bear
on these questions in order to adequately disambiguate the sentence?
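The scale of this ambiguity can be made concrete with a small sketch. The lexicon, tag names, and sense list below are hypothetical and hand-built for this one sentence; they are not taken from any real tagger or dictionary. The sketch only counts how many candidate analyses a program would have to choose among before any context is applied.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the part-of-speech tags it
# could take in isolation. Tag names and entries are illustrative only.
LEXICON = {
    "flying": ["VERB-ING", "ADJ"],
    "planes": ["NOUN", "VERB"],
    "can": ["MODAL", "NOUN", "VERB"],
    "be": ["VERB"],
    "dangerous": ["ADJ"],
}

# Hypothetical sense inventory for the noun "plane", following the three
# meanings named in the text.
PLANE_SENSES = ["airplane", "geometric object", "woodworking tool"]


def tag_sequences(sentence):
    """Return every combination of candidate tags for the sentence."""
    words = sentence.lower().split()
    candidates = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*candidates)]


def count_readings(sentence):
    """Count tag sequences, multiplying in the noun senses of 'planes'."""
    total = 0
    for sequence in tag_sequences(sentence):
        tags = dict(sequence)
        # Only the noun reading of "planes" carries the three senses.
        total += len(PLANE_SENSES) if tags.get("planes") == "NOUN" else 1
    return total


sentence = "Flying planes can be dangerous"
print(len(tag_sequences(sentence)))  # 2 * 2 * 3 * 1 * 1 = 12
print(count_readings(sentence))      # 6 noun readings * 3 senses + 6 = 24
```

Even this five-word sentence yields two dozen candidate readings under these assumptions, most of which a human reader never consciously considers. A disambiguation system has to rule them out using grammar, context, and knowledge of the world.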
We address these problems using a mix of knowledge-engineered and statistical/machine-
learning techniques to disambiguate and respond to natural language input. Our work has
implications for applications like text critiquing, information retrieval, question answering,
summarization, gaming, and translation. The grammar checkers in Office for English, French,
German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve
answers to user questions; Intellishrink uses natural language technology to compress cellphone
messages; Microsoft Product Support uses our machine translation software to translate the
Microsoft Knowledge Base into other languages. As our work evolves, we expect it to benefit
any area where human users gain from communicating with their computers in a natural
way.
11.1.1 Evaluation in NLP
The first evaluation campaign on written texts appears to have been a campaign dedicated to message
understanding in 1987 (Pallet 1998). The Parseval/GEIG project then compared phrase-structure
grammars (Black 1991). A series of campaigns within the Tipster project was carried out on tasks such as
summarization, translation, and searching (Hirschman 1998). In 1994, in Germany, the
Morpholympics compared German taggers. The Senseval and Romanseval campaigns
were then conducted with the objective of semantic disambiguation. In 1996, the Sparkle campaign
compared syntactic parsers in four different languages (English, French, German, and Italian).
In France, the Grace project compared a set of 21 taggers for French in 1997 (Adda 1999). In 2004,
during the Technolangue/Easy project, 13 parsers for French were compared. Large-scale