Page 128 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS

Introduction to Artificial Intelligence & Expert Systems

human or natural language input. What distinguishes these tasks from other potential
and actual NLP tasks is not only the volume of research devoted to them but the fact that for each
one there is typically a well-defined problem setting, a standard metric for evaluating the task,
standard corpora on which the task can be evaluated, and competitions devoted to the specific
task.

6.13.1 Defining Natural Language

The following is a list of some of the most commonly researched tasks in NLP. Note that some
of these tasks have direct real-world applications, while others more commonly serve as subtasks
that are used to aid in solving larger tasks.

Automatic Summarization: Produce a readable summary of a chunk of text. Often used to
provide summaries of text of a known type, such as articles in the financial section of a
newspaper.

Co-reference Resolution: Given a sentence or larger chunk of text, determine which words
(“mentions”) refer to the same objects (“entities”). Anaphora resolution is a specific example
of this task, and is specifically concerned with matching up pronouns with the nouns or
names that they refer to. The more general task of co-reference resolution also includes
identifying so-called “bridging relationships” involving referring expressions. For
example, in a sentence such as “He entered John’s house through the front door”, “the
front door” is a referring expression and the bridging relationship to be identified is the
fact that the door being referred to is the front door of John’s house (rather than of some
other structure that might also be referred to).

Discourse Analysis: This rubric includes a number of related tasks. One task is identifying
the discourse structure of connected text, i.e. the nature of the discourse relationships
between sentences (e.g. elaboration, explanation, contrast). Another possible task is
recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content
question, statement, assertion).

Machine Translation: Automatically translate text from one human language to another.
This is one of the most difficult problems, and is a member of a class of problems colloquially
termed “AI-complete”, i.e. requiring all of the different types of knowledge that humans
possess (grammar, semantics, facts about the real world, etc.) in order to solve properly.

Morphological Segmentation: Separate words into individual morphemes and identify
the class of the morphemes. The difficulty of this task depends greatly on the complexity
of the morphology (i.e. the structure of words) of the language being considered. English
has fairly simple morphology, especially inflectional morphology, and thus it is often
possible to ignore this task entirely and simply model all possible forms of a word (e.g.
“open, opens, opened, and opening”) as separate words. In languages such as Turkish,
however, such an approach is not possible, as each dictionary entry has thousands of
possible word forms.
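
The English example above can be illustrated with a minimal sketch of suffix-based segmentation. The suffix table, the class labels and the function name `segment` are assumptions made for this illustration only; they are not taken from any particular NLP library, and a practical segmenter would need far richer rules (spelling changes, irregular forms, derivational morphology).

```python
# Hypothetical suffix table: (suffix, morpheme class).
# Labels are illustrative; "s" is treated only as a verb ending here.
SUFFIXES = [
    ("ing", "PROGRESSIVE"),
    ("ed", "PAST"),
    ("s", "3SG_PRESENT"),
]


def segment(word, stems):
    """Split `word` into (morpheme, class) pairs using a set of known stems."""
    if word in stems:
        return [(word, "STEM")]
    for suffix, label in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Accept the split only if what remains is a known stem.
            if stem in stems:
                return [(stem, "STEM"), (suffix, label)]
    # No analysis found: report the whole word as unanalysed.
    return [(word, "UNKNOWN")]


stems = {"open"}
for w in ["open", "opens", "opened", "opening"]:
    print(w, segment(w, stems))
```

With a single stem and three suffixes, all four forms of "open" are covered. For a language such as Turkish, where one dictionary entry may have thousands of forms, a table of this kind quickly becomes inadequate, which is why the task is much harder there.
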
Named Entity Recognition (NER): Given a stream of text, determine which items in the
text map to proper names, such as people or places, and what the type of each such name
is (e.g. person, location, organization). Note that, although capitalization can aid in
recognizing named entities in languages such as English, this information cannot aid in
determining the type of named entity, and in any case is often inaccurate or insufficient.
For example, the first word of a sentence is also capitalized, and named entities often span
several words, only some of which are capitalized. Furthermore, many other languages in
non-Western scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and
even languages with capitalization may not consistently use it to distinguish names.
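
The limits of capitalization as a cue can be seen in a small sketch. The function below is a naive heuristic written for illustration only (the name `capitalized_spans` and the example sentence are assumptions, not part of any NER toolkit): it simply groups consecutive capitalized words into candidate names.

```python
import re


def capitalized_spans(text):
    """Return runs of consecutive capitalized words as candidate names."""
    tokens = re.findall(r"[A-Za-z]+", text)
    spans, current = [], []
    for tok in tokens:
        if tok[0].isupper():
            current.append(tok)
        else:
            # A lowercase word ends the current candidate name, if any.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


sentence = "Yesterday the Bank of England raised rates in London."
print(capitalized_spans(sentence))
# prints ['Yesterday', 'Bank', 'England', 'London']
```

The output shows each problem described above: "Yesterday" is picked up only because it begins the sentence, "Bank of England" is split because "of" is not capitalized, and nothing in the result says whether a candidate is a person, a location or an organization.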

122 LOVELY PROFESSIONAL UNIVERSITY
