Page 128 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS

Introduction to Artificial Intelligence & Expert Systems




                                   human or natural language input. What distinguishes these tasks from other potential
                                   and actual NLP tasks is not only the volume of research devoted to them but the fact that for each
                                   one there is typically a well-defined problem setting, a standard metric for evaluating the task,
                                   standard corpora on which the task can be evaluated, and competitions devoted to the specific
                                   task.

                                   6.13.1 Defining Natural Language

                                   The following is a list of some of the most commonly researched tasks in NLP. Note that some
                                   of these tasks have direct real-world applications, while others more commonly serve as subtasks
                                   that are used to aid in solving larger tasks.
                                       Automatic Summarization: Produce a readable summary of a chunk of text. Often used to
                                       provide summaries of text of a known type, such as articles in the financial section of a
                                       newspaper.
                                       Co-reference Resolution: Given a sentence or larger chunk of text, determine which words
                                       (“mentions”) refer to the same objects (“entities”). Anaphora resolution is a specific example
                                       of this task, and is specifically concerned with matching up pronouns with the nouns or
                                       names that they refer to. The more general task of co-reference resolution also includes
                                       identifying so-called “bridging relationships” involving referring expressions. For
                                       example, in a sentence such as “He entered John’s house through the front door”, “the
                                       front door” is a referring expression and the bridging relationship to be identified is the
                                       fact that the door being referred to is the front door of John’s house (rather than of some
                                       other structure that might also be referred to).

                                       Discourse Analysis: This rubric includes a number of related tasks. One task is identifying
                                       the discourse structure of connected text, i.e. the nature of the discourse relationships
                                       between sentences (e.g. elaboration, explanation, contrast). Another possible task is
                                       recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content
                                       question, statement, assertion).

                                       Machine Translation: Automatically translate text from one human language to another.
                                       This is one of the most difficult problems, and is a member of a class of problems colloquially
                                       termed “AI-complete”, i.e. requiring all of the different types of knowledge that humans
                                       possess (grammar, semantics, facts about the real world, etc.) in order to solve properly.

                                       Morphological Segmentation: Separate words into individual morphemes and identify
                                       the class of the morphemes. The difficulty of this task depends greatly on the complexity
                                       of the morphology (i.e. the structure of words) of the language being considered. English
                                       has fairly simple morphology, especially inflectional morphology, and thus it is often
                                       possible to ignore this task entirely and simply model all possible forms of a word (e.g.
                                       “open”, “opens”, “opened”, and “opening”) as separate words. In languages such as Turkish,
                                       however, such an approach is not possible, as each dictionary entry has thousands of
                                       possible word forms.
                                       Named Entity Recognition (NER): Given a stream of text, determine which items in the
                                       text map to proper names, such as people or places, and what the type of each such name
                                       is (e.g. person, location, organization). Note that, although capitalization can aid in
                                       recognizing named entities in languages such as English, this information cannot aid in
                                       determining the type of named entity, and in any case is often inaccurate or insufficient.
                                       For example, the first word of a sentence is also capitalized, and named entities often span
                                       several words, only some of which are capitalized. Furthermore, many languages written in
                                       non-Western scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and
                                       even languages with capitalization may not consistently use it to distinguish names.
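
The anaphora resolution task described under Co-reference Resolution can be illustrated with a very small sketch. It links each pronoun to the most recent preceding name of matching gender. The lexicons PRONOUN_GENDER and NAME_GENDER and the function resolve_pronouns are hypothetical names introduced only for this illustration; the "most recent matching name" rule is an assumed heuristic, not a method given in the text, and it handles neither bridging relationships nor mentions that are not names.

```python
import re

# Hypothetical toy lexicons, assumed for illustration only.
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f"}
NAME_GENDER = {"John": "m", "Mary": "f"}

def resolve_pronouns(text):
    """Link each pronoun to the most recent preceding name of matching gender."""
    links = []
    seen_names = []  # names encountered so far, in text order
    for position, token in enumerate(re.findall(r"[A-Za-z]+", text)):
        if token in NAME_GENDER:
            seen_names.append(token)
        elif token.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[token.lower()]
            # Search backwards for the nearest name with the same gender.
            antecedent = next(
                (name for name in reversed(seen_names) if NAME_GENDER[name] == gender),
                None,
            )
            links.append((position, token, antecedent))
    return links

links = resolve_pronouns("Mary told John that she had found his keys.")
unresolved = resolve_pronouns("He entered John's house through the front door")
```

In the second call the pronoun "He" has no preceding name, so the heuristic leaves it without an antecedent, which shows why a practical system needs more context than word order alone.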
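
For Morphological Segmentation, the following is a minimal sketch of separating an English word into a stem and an inflectional suffix and labelling the class of each morpheme. The suffix list INFLECTIONAL_SUFFIXES and the function segment are hypothetical and deliberately naive: plain suffix stripping is assumed here only because English inflection is simple, and it will mis-segment many words (for example "glass").

```python
# Hypothetical, deliberately tiny suffix list for English inflection.
INFLECTIONAL_SUFFIXES = ("ing", "ed", "s")

def segment(word):
    """Split a word into (morpheme, class) pairs using naive suffix stripping."""
    for suffix in INFLECTIONAL_SUFFIXES:
        # Require a stem of at least three letters so "sing" is not split.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            return [(stem, "stem"), (suffix, "inflectional suffix")]
    return [(word, "stem")]

# The four word forms mentioned in the text all reduce to the stem "open".
segmented = {w: segment(w) for w in ("open", "opens", "opened", "opening")}
```

With only four forms per verb, English can afford to store each form as a separate word; the same table for a language with thousands of forms per dictionary entry would be impractical, which is why segmentation matters there.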
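
The limits of capitalization noted under Named Entity Recognition can be seen in a short sketch. It collects maximal runs of capitalized words as candidate names. The function capitalized_spans and the example sentence are assumptions made for illustration; this is a baseline heuristic, not a real NER system.

```python
import re

def capitalized_spans(text):
    """Return maximal runs of capitalized tokens as candidate named entities."""
    spans, current = [], []
    for token in re.findall(r"[A-Za-z]+", text):
        if token[0].isupper():
            current.append(token)
        elif current:
            # A lowercase token ends the current run of capitalized tokens.
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

candidates = capitalized_spans("Yesterday the Bank of England raised rates in London.")
# candidates == ["Yesterday", "Bank", "England", "London"]
```

The result shows each problem the text describes: the sentence-initial word "Yesterday" is wrongly proposed, "Bank of England" is broken apart because "of" is not capitalized, and nothing in the output says whether a candidate is a person, location, or organization.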




          122                               LOVELY PROFESSIONAL UNIVERSITY