Page 147 - DCAP506_ARTIFICIAL_INTELLIGENCE

Unit 11: Natural Language Processing

evaluation of dependency parsers were performed in the context of the CoNLL shared tasks in
2006 and 2007. In Italy, the EVALITA campaign was conducted in 2007 to compare various tools for
Italian (see the EVALITA web site). In France, within the ANR-Passage project (end of 2007),
10 parsers for French were compared (see the Passage web site).

11.1.2 Tasks and Limitations of NLP

In theory, natural-language processing is a very attractive method of human-computer interaction.
Early systems such as SHRDLU, working in restricted “blocks worlds” with restricted
vocabularies, worked extremely well, leading researchers to excessive optimism, which was
soon lost when the systems were extended to more realistic situations with real-world ambiguity
and complexity. Natural-language understanding is sometimes referred to as an AI-complete
problem, because natural-language recognition seems to require extensive knowledge about
the outside world and the ability to manipulate it. The definition of “understanding” is one of
the major problems in natural-language processing.

11.1.3 Sub-problems of NLP

Speech Segmentation

In most spoken languages, the sounds representing successive letters blend into each other, so
the conversion of the analog signal to discrete characters can be a very difficult process. Also, in
natural speech there are hardly any pauses between successive words; the location of those
boundaries usually must take into account grammatical and semantic constraints, as well as the
context.
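
The boundary problem described above can be sketched in a few lines of Python. This is only a toy illustration, not a speech recognizer: it assumes the audio has already been converted to a sequence of phoneme symbols, and the four-entry pronunciation lexicon (`LEXICON`) and the function name `segmentations` are made up for the example. It lists every way the phoneme sequence can be split into known words, which shows why the sound alone does not fix the word boundaries.

```python
# Hypothetical toy pronunciation lexicon: phoneme sequence -> word.
# The phoneme symbols are ARPAbet-style and chosen only for illustration.
LEXICON = {
    ("AY",): "I",
    ("AY", "S"): "ice",
    ("K", "R", "IY", "M"): "cream",
    ("S", "K", "R", "IY", "M"): "scream",
}


def segmentations(phones):
    """Return every way to split `phones` into words found in LEXICON."""
    if not phones:
        return [[]]  # one way to segment nothing: the empty word list
    results = []
    for end in range(1, len(phones) + 1):
        word = LEXICON.get(tuple(phones[:end]))
        if word is not None:
            # Keep this word and segment the remaining phonemes recursively.
            for rest in segmentations(phones[end:]):
                results.append([word] + rest)
    return results


phones = "AY S K R IY M".split()
for words in segmentations(phones):
    print(" ".join(words))
# prints:
# I scream
# ice cream
```

Both outputs fit the same sounds equally well, so choosing between them needs the grammatical, semantic and contextual information mentioned in the text.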

Text Segmentation

Some written languages, such as Chinese, Japanese and Thai, do not mark word boundaries with
spaces either, so any significant text parsing usually requires identifying the word boundaries
first, which is often a non-trivial task.
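
One simple baseline for finding word boundaries is greedy longest-match segmentation (often called maximum matching). The sketch below is an assumption-laden illustration rather than a production segmenter: the function name `max_match` and the tiny vocabulary are invented for the example, and unspaced English text stands in for a language written without spaces.

```python
def max_match(text, vocabulary):
    """Greedy left-to-right segmentation: always take the longest known word."""
    longest = max(len(word) for word in vocabulary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback so the loop always advances.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


vocabulary = {"the", "cat", "cats", "sat", "at"}  # hypothetical toy dictionary
print(max_match("thecatsat", vocabulary))
# prints: ['the', 'cats', 'at']
```

The greedy choice picks "cats" and is then left with "at", although "the cat sat" is the more plausible reading. This is one reason the task is non-trivial: a dictionary lookup alone is not enough, and practical systems also score the alternatives against their context.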

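Part-of-speech Tagging

Each word in a sentence has to be assigned its grammatical category (noun, verb, adjective and
so on), which often depends on the context in which the word is used.

Word Sense Disambiguation

Many words have more than one meaning; we have to select the meaning which makes the most
sense in context.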
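
A classic way to pick a meaning from context is the simplified Lesk approach: choose the sense whose dictionary definition shares the most words with the surrounding sentence. The sketch below follows that idea under loud assumptions: the two-sense inventory for "bank", its glosses, the stop-word list and the function name `simplified_lesk` are all hand-written for this example and do not come from any real dictionary.

```python
# Hypothetical sense inventory: word -> {sense label: short definition}.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river side": "sloping land beside a river or lake",
    }
}

# Very common words that should not count as evidence for a sense.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on"}


def content_words(text):
    """Lower-case the text and drop stop words."""
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the sense label whose definition overlaps most with the sentence."""
    context = content_words(sentence)
    best_label, best_overlap = None, -1
    for label, gloss in senses[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


print(simplified_lesk("bank", "she sat on the bank of the river"))
# prints: river side
print(simplified_lesk("bank", "he went to the bank to deposit money"))
# prints: financial institution
```

With such short definitions the overlap is often zero or tied, in which case this sketch simply returns the first sense listed; real systems need much richer knowledge about the context.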

Syntactic Ambiguity

The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse
trees for a given sentence. Choosing the most appropriate one usually requires semantic and
contextual information. Specific problem components of syntactic ambiguity include sentence
boundary disambiguation.
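
To make the idea of multiple parse trees concrete, the sketch below counts the parse trees of a sentence with a CYK-style chart. Everything in it is an assumption made for illustration: the six-rule grammar in Chomsky normal form (`BINARY_RULES`), the small lexicon (`LEXICON`) and the function name `count_parses` are invented for the example and are far smaller than any realistic grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the prepositional phrase modifies the verb phrase
    ("NP", "NP", "PP"),   # the prepositional phrase modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` rooted in `start` (CYK chart)."""
    n = len(words)
    # chart[(i, j, symbol)] = number of trees for `symbol` spanning words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[(i, i + 1, symbol)] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]


print(count_parses("i saw the man with the telescope".split()))
# prints: 2
print(count_parses("i saw the man".split()))
# prints: 1
```

The first sentence gets two trees because "with the telescope" can attach either to the seeing or to the man. The grammar cannot decide between them, which is why the choice usually requires semantic and contextual information.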

LOVELY PROFESSIONAL UNIVERSITY 141