Page 158 - DCAP506_ARTIFICIAL_INTELLIGENCE
Artificial Intelligence
as incompatible approaches, but Moser and Moore (1996), as well as Marcu (1997), have suggested
that this is not the case.
We now briefly mention some NLP methods used for discourse processing.
Methods
The primary problem with the methods is (at least in the author's eyes) not which method to
choose, but which units to manipulate: which features, and in what relation to each other?
In discourse processing a mixture of NLP methods can be used. This is because discourse
processing needs the whole arsenal of tools for the lower levels, e.g. tagging and parsing.
The really difficult problem with discourse processing is to isolate the relevant features
and to make use of them in an efficient way: what kind of information is relevant to tag, what
information is relevant to store, and what kinds of information are needed in different
applications? How should we categorise, how should we remember, and what perspective should we
take? Technically we are free to choose any method that we might use for, e.g., PoS tagging,
but we must first isolate which units or categories are relevant to mark up. This means that
for one application there may be no need to account for the full complexity of discourse
processing, and a shallow analysis will do. In another application, a more fine-grained
analysis might be needed.
Finite-state methods have been used for discourse processing in the form of information
extraction (Hobbs et al. 1997). The FASTUS system uses a cascaded non-deterministic finite-state
automaton. The system works in five steps: (1) recognising names and fixed expressions,
(2) recognising basic noun groups, verb groups, prepositions and other particles, (3) building
complex noun groups and verb groups, (4) identifying the corresponding event structures, and
(5) detecting and merging distinct event structures that describe the same event. The “lean”
finite-state method was claimed to be very successful for the task compared to the more
complicated TACITUS system (Hobbs et al., 1993), which included a representation of discourse
relations based on abductive inference.
Example: FASTUS seems to be a successful example of limiting the discourse processing steps
that are carried out for the task of information extraction.
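The idea of a cascade, where each stage groups the output of the previous one into larger units, can be illustrated with a small sketch. This is not the FASTUS implementation: the one-letter tags, the patterns, the function names and the sample sentence are all assumptions made for illustration, Python regular expressions stand in for the finite-state automata, and step (5), merging events, is left out.

```python
import re

# One-letter tags, so a position in the tag string equals a token index.
# Assumed input tags: U proper noun, D determiner, A adjective, N noun,
#                     X auxiliary, V verb, P preposition.
# Each stage rewrites a matching tag sequence into one token with a new tag.
CASCADE = [
    (r"U+", "M"),          # step 1: names (runs of proper nouns)
    (r"D?A*N+|M", "G"),    # step 2: basic noun groups
    (r"X*V", "W"),         # step 2: basic verb groups
    (r"G(?:PG)*", "C"),    # step 3: complex noun groups (attached prepositional phrases)
]


def apply_stage(tokens, pattern, label):
    """Replace every span whose tags match `pattern` by one token tagged `label`."""
    tags = "".join(tag for _, tag in tokens)
    out, pos = [], 0
    for m in re.finditer(pattern, tags):
        out.extend(tokens[pos:m.start()])          # copy unmatched tokens
        words = " ".join(word for word, _ in tokens[m.start():m.end()])
        out.append((words, label))                 # one grouped token
        pos = m.end()
    out.extend(tokens[pos:])
    return out


def extract_events(tokens):
    """Run the cascade, then read off event structures (step 4)."""
    for pattern, label in CASCADE:
        tokens = apply_stage(tokens, pattern, label)
    tags = "".join(tag for _, tag in tokens)
    # Toy event pattern: noun group, verb group, noun group.
    return [
        {"agent": tokens[m.start()][0],
         "action": tokens[m.start() + 1][0],
         "object": tokens[m.start() + 2][0]}
        for m in re.finditer(r"CWC", tags)
    ]


# Hypothetical pre-tagged sentence.
SENTENCE = [("Acme", "U"), ("Motors", "U"), ("has", "X"), ("formed", "V"),
            ("a", "D"), ("joint", "A"), ("venture", "N"), ("in", "P"),
            ("Taiwan", "U")]

print(extract_events(SENTENCE))
```

For the sample sentence the cascade groups "Acme Motors" as a name, "has formed" as a verb group and "a joint venture in Taiwan" as a complex noun group, and then reports a single event built from those three units. No stage needs any deeper discourse representation, which is the sense in which the method is "lean".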
Statistical methods for anaphora resolution have been reported by Mitkov & Schmidt (1996). The
strategy used was an uncertainty reasoning approach, i.e. each candidate antecedent was given a
score, and in the end the candidate with the highest score was chosen. This strategy performed
slightly worse than one based on constraints and preferences, i.e. a more rule-based approach
(Mitkov and Schmidt, 1996).
Machine learning has been used for discourse segmentation by Passonneau & Litman (1997). The
machine learning algorithm was trained on discourse segmentations produced by humans. The
results were about as accurate as those of human annotators.
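The score-based strategy for anaphora resolution mentioned above (score every candidate antecedent and pick the highest) can be sketched as follows. The features, the weights and the example are hypothetical and are not taken from Mitkov & Schmidt (1996); the sketch only shows the general shape of such a scoring system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    gender: str       # "m", "f" or "n"
    number: str       # "sg" or "pl"
    distance: int     # sentences between candidate and pronoun
    is_subject: bool


# Hypothetical weights: agreement counts most, then subjecthood and recency.
WEIGHTS = {"gender": 2.0, "number": 2.0, "subject": 1.0, "recency": 1.0}


def score(cand, pronoun):
    """Sum the evidence that `cand` is the antecedent of `pronoun`."""
    total = 0.0
    if cand.gender == pronoun["gender"]:
        total += WEIGHTS["gender"]
    if cand.number == pronoun["number"]:
        total += WEIGHTS["number"]
    if cand.is_subject:
        total += WEIGHTS["subject"]
    # Closer candidates receive a larger recency bonus.
    total += WEIGHTS["recency"] / (1 + cand.distance)
    return total


def resolve(pronoun, candidates):
    """Return the candidate with the highest score."""
    return max(candidates, key=lambda c: score(c, pronoun))


# "Mary gave the book to John. He thanked her."
PRONOUN = {"text": "He", "gender": "m", "number": "sg"}
CANDIDATES = [
    Candidate("Mary", "f", "sg", 1, True),
    Candidate("the book", "n", "sg", 1, False),
    Candidate("John", "m", "sg", 1, False),
]

print(resolve(PRONOUN, CANDIDATES).text)
```

A rule-based approach built on constraints and preferences would instead discard "Mary" and "the book" outright because they violate gender agreement, and apply preferences only to the candidates that remain. In the scoring version a mismatch merely lowers the score.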
Applications
In this step the second question posed by Barbara Grosz (1997) becomes important, i.e. how
should the issue of adoption be understood in general? This issue is less investigated than the
issues addressed earlier, so the discussion here avoids this field as much as possible.
1. Natural Language Understanding: In the NLU system TACITUS, Hobbs et al. (1993) made use
of coherence relations, aiming to obtain a full representation of the message. The
system is to a certain extent based on coherence relations as described by Hobbs (1985),
but the inference machinery is based on abduction. The knowledge base is an important
factor.
2. Automatic Summarization: Automatic text summarization is an application where
discourse understanding is crucial, i.e. it is important to be able to extract what is central