Page 146 - DCAP506_ARTIFICIAL_INTELLIGENCE

Artificial Intelligence




                                   field. The Redmond-based Natural Language Processing group is focused on developing efficient
                                   algorithms to process texts and to make their information accessible to computer applications.
                                   Since text can contain information at many different granularities, from simple word or
                                   token-based representations, to rich hierarchical syntactic representations, to high-level logical
                                   representations across document collections, the group seeks to work at the right level of analysis
                                   for the application concerned.

                                   11.1 Natural Language Processing – Overview


                                   The goal of the Natural Language Processing (NLP) group is to design and build software that
                                   will analyze, understand, and generate languages that humans use naturally, so that eventually
                                   you will be able to address your computer as though you were addressing another person.

                                   This goal is not easy to reach. “Understanding” language means, among other things, knowing
                                   what concepts a word or phrase stands for and knowing how to link those concepts together in
                                   a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans
                                   to learn and use, is the hardest for a computer to master. Long after machines have proven capable
                                   of inverting large matrices with speed and grace, they still fail to master the basics of our spoken
                                   and written languages.

                                   The challenges we face stem from the highly ambiguous nature of natural language.
                                   As an English speaker, you effortlessly understand a sentence like “Flying planes can be
                                   dangerous”. Yet this sentence presents difficulties to a software program that lacks both your
                                   knowledge of the world and your experience with linguistic structures. Is the more plausible
                                   interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can”
                                   be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant?
                                   Depending on context, “plane” could refer to, among other things, an airplane, a geometric
                                   object, or a woodworking tool. How much context, and of what sort, needs to be brought to bear
                                   on these questions in order to disambiguate the sentence adequately?
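
The combinatorial side of this ambiguity can be sketched in a few lines of Python. The lexicon below is a hypothetical toy, not a resource used by the group: the name `TOY_LEXICON`, the function `candidate_readings`, and the sense labels are all assumptions made for illustration, with the candidate readings taken from the examples in the paragraph above. The sketch only enumerates the combinations a program would have to choose among before any context is applied; it does not attempt to disambiguate.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to candidate (category, sense) readings.
# The entries are illustrative assumptions based on the examples in the text.
TOY_LEXICON = {
    "flying": [("verb", "piloting"), ("adjective", "airborne")],
    "planes": [
        ("noun", "airplane"),
        ("noun", "geometric object"),
        ("noun", "woodworking tool"),
    ],
    "can": [("verb", "modal: be able to"), ("noun", "container")],
    "be": [("verb", "copula")],
    "dangerous": [("adjective", "hazardous")],
}


def candidate_readings(sentence):
    """Return every combination of per-word readings for the sentence."""
    words = sentence.lower().rstrip(".").split()
    options = [TOY_LEXICON[word] for word in words]
    # Each element of the Cartesian product picks one reading per word.
    return [list(zip(words, combo)) for combo in product(*options)]


readings = candidate_readings("Flying planes can be dangerous")
print(len(readings))  # 2 * 3 * 2 * 1 * 1 = 12 candidate readings
```

Even with five words and a lexicon this small, a program that lacks context faces twelve candidate readings, and the count multiplies with every additional ambiguous word.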

                                   We address these problems using a mix of knowledge-engineered and statistical/machine-learning
                                   techniques to disambiguate and respond to natural language input. Our work has
                                   implications for applications like text critiquing, information retrieval, question answering,
                                   summarization, gaming, and translation. The grammar checkers in Office for English, French,
                                   German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve
                                   answers to user questions; Intellishrink uses natural language technology to compress cellphone
                                   messages; and Microsoft Product Support uses our machine translation software to translate the
                                   Microsoft Knowledge Base into other languages. As our work evolves, we expect it to support
                                   any application area in which users can benefit from communicating with their computers in a
                                   natural way.

                                   11.1.1 Evaluation in NLP

                                   The first evaluation campaign on written texts appears to have been one dedicated to message
                                   understanding in 1987 (Pallet 1998). The Parseval/GEIG project then compared phrase-structure
                                   grammars (Black 1991). A series of campaigns within the Tipster project addressed tasks such as
                                   summarization, translation, and searching (Hirschman 1998). In 1994, in Germany, the
                                   Morpholympics compared German taggers. The Senseval and Romanseval campaigns were
                                   then conducted with the objective of semantic disambiguation. In 1996, the Sparkle campaign
                                   compared syntactic parsers in four different languages (English, French, German, and Italian).
                                   In France, the Grace project compared a set of 21 taggers for French in 1997 (Adda 1999). In 2004,
                                   during the Technolangue/Easy project, 13 parsers for French were compared. Large-scale




          140                               LOVELY PROFESSIONAL UNIVERSITY