Page 158 - DCAP506_ARTIFICIAL_INTELLIGENCE

Artificial Intelligence




                                   as incompatible approaches, but Moser and Moore (1996), as well as Marcu (1997), have suggested
                                   that this is not the case.
                                   We now briefly mention some NLP methods used for discourse processing.

                                   Methods

                                   The primary problem with the methods is (at least in the author's eyes) not which method to
                                   choose, but which units to manipulate: which features, and in what relation to each other?
                                   In discourse processing a mixture of NLP methods can be used, because discourse processing
                                   needs the whole arsenal of tools for the lower levels, e.g. tagging and parsing. The really
                                   difficult problem with discourse processing is to isolate the relevant features and to make
                                   use of them in an efficient way: what kind of information is relevant to tag, what information
                                   is relevant to store, and what kinds of information are needed in different applications?
                                   How should we categorise, how should we remember, and what perspective should we take?
                                   Technically we are free to choose any method that we might use for, e.g., PoS tagging, but we
                                   must first isolate what kinds of units or categories are relevant to mark up. This means that
                                   for a certain application there may be no need to account for the full complexity of discourse
                                   processing, and a shallow analysis would do. In another application, a more fine-grained analysis
                                   might be needed.
                                   Finite-state methods have been used for discourse processing in terms of information extraction
                                   (Hobbs et al., 1997). The system FASTUS uses a cascaded non-deterministic finite-state automaton.
                                   The system works in five steps: (1) names and fixed expressions are extracted, followed by
                                   (2) basic noun groups, verb groups, prepositions and other particles, (3) complex noun groups
                                   and verb groups, and (4) corresponding event structures; finally, (5) distinct event structures
                                   that describe the same event are detected and merged. The "lean" finite-state method was claimed
                                   to be very successful for the task, as compared to the more complicated TACITUS system
                                   (Hobbs et al., 1993), which included a representation of discourse relations based on abductive
                                   inferences.
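
                                   The five-step cascade described above can be illustrated with a toy pipeline. This is only a
                                   minimal sketch under stated assumptions, not the FASTUS implementation: the lexicon, the chunk
                                   labels (NAME, NG, VG, TOK) and all function names are hypothetical, and each stage is a simple
                                   deterministic pattern matcher standing in for a finite-state transducer.

```python
# Toy lexicon: an assumption for illustration, not FASTUS's actual resources.
NAMES = {"Acme Corp", "John Smith"}
DETERMINERS = {"the", "a", "an"}
NOUNS = {"company", "president", "engineer"}
VERBS = {"hired", "appointed"}


def stage_names(sentence):
    """Step 1: recognise names and fixed expressions as single units."""
    for name in NAMES:
        sentence = sentence.replace(name, name.replace(" ", "_"))
    chunks = []
    for tok in sentence.split():
        restored = tok.replace("_", " ")
        chunks.append(("NAME", restored) if restored in NAMES else ("TOK", tok))
    return chunks


def stage_basic_groups(chunks):
    """Step 2: build basic noun groups (NG) and verb groups (VG)."""
    out, i = [], 0
    while i < len(chunks):
        kind, text = chunks[i]
        nxt = chunks[i + 1] if i + 1 < len(chunks) else None
        if (kind == "TOK" and text.lower() in DETERMINERS and nxt is not None
                and nxt[0] == "TOK" and nxt[1].lower() in NOUNS):
            out.append(("NG", text + " " + nxt[1]))  # determiner + noun
            i += 2
        elif kind == "NAME":
            out.append(("NG", text))                 # a name is a noun group
            i += 1
        elif kind == "TOK" and text.lower() in VERBS:
            out.append(("VG", text))
            i += 1
        else:
            out.append((kind, text))                 # prepositions, particles
            i += 1
    return out


def stage_complex_groups(chunks):
    """Step 3: attach 'of' phrases to form complex noun groups."""
    out, i = [], 0
    while i < len(chunks):
        if (i + 2 < len(chunks) and chunks[i][0] == "NG"
                and chunks[i + 1] == ("TOK", "of") and chunks[i + 2][0] == "NG"):
            out.append(("NG", chunks[i][1] + " of " + chunks[i + 2][1]))
            i += 3
        else:
            out.append(chunks[i])
            i += 1
    return out


def stage_events(chunks):
    """Step 4: map NG VG NG sequences to event structures."""
    events = []
    for i in range(len(chunks) - 2):
        a, v, b = chunks[i], chunks[i + 1], chunks[i + 2]
        if a[0] == "NG" and v[0] == "VG" and b[0] == "NG":
            events.append({"agent": a[1], "action": v[1], "object": b[1]})
    return events


def stage_merge(events):
    """Step 5: merge event structures that describe the same event."""
    merged = []
    for event in events:
        if event not in merged:  # toy criterion: identical structures
            merged.append(event)
    return merged


def extract(text):
    """Run the cascade sentence by sentence, then merge across sentences."""
    events = []
    for sentence in text.split("."):
        if sentence.strip():
            chunks = stage_complex_groups(stage_basic_groups(stage_names(sentence)))
            events.extend(stage_events(chunks))
    return stage_merge(events)
```

                                   Calling extract("Acme Corp hired John Smith. Acme Corp hired John Smith.") yields a single
                                   merged event, since both sentences produce the same structure. Each stage only consumes the
                                   output of the previous one, which is what keeps the cascade shallow.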

                                          Example: FASTUS seems to be an example of successfully limiting the discourse
                                   processing steps that are carried out for the task of information extraction.
                                   Statistical methods for anaphora resolution have been reported by Mitkov & Schmidt (1996). The
                                   strategy used was an Uncertainty Reasoning approach, i.e. a scoring system was used, and in the
                                   end the candidate with the highest score was chosen. This strategy performed slightly worse
                                   than one based on constraints and preferences, i.e. a more rule-based approach (Mitkov &
                                   Schmidt, 1996). Machine learning has been used for discourse segmentation by Passonneau & Litman
                                   (1997). The machine learning algorithm was trained on discourse segmentations produced by humans.
                                   The results reached about the same accuracy as human annotators.
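
                                   The scoring strategy for anaphora resolution mentioned above (score every candidate antecedent,
                                   then choose the highest-scoring one) can be sketched as follows. The features, the weights and
                                   all names here are assumptions made for illustration; they are not the ones used by Mitkov &
                                   Schmidt (1996).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    gender: str       # "m", "f" or "n"
    number: str       # "sg" or "pl"
    is_subject: bool  # was the candidate a grammatical subject?
    distance: int     # sentences back from the pronoun (0 = same sentence)


# Hypothetical weights, chosen by hand for this sketch.
WEIGHTS = {"gender": 2.0, "number": 2.0, "subject": 1.0, "recency": 1.0}


def score(pronoun_gender, pronoun_number, cand):
    """Sum the evidence in favour of one candidate antecedent."""
    total = 0.0
    if cand.gender == pronoun_gender:
        total += WEIGHTS["gender"]
    if cand.number == pronoun_number:
        total += WEIGHTS["number"]
    if cand.is_subject:
        total += WEIGHTS["subject"]
    # Closer candidates receive a larger recency bonus.
    total += WEIGHTS["recency"] / (1 + cand.distance)
    return total


def resolve(pronoun_gender, pronoun_number, candidates):
    """Return the candidate with the highest score."""
    return max(candidates, key=lambda c: score(pronoun_gender, pronoun_number, c))
```

                                   For the pronoun "she" (gender "f", number "sg") and the candidates "Mary" (subject, one sentence
                                   back), "the report" and "the boys" (both in the same sentence), resolve picks "Mary": agreement
                                   in gender and number outweighs the recency bonus of the other two. A constraint-based variant
                                   would instead discard disagreeing candidates before ranking the rest.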

                                   Applications

                                   In this step the second question posed by Barbara Grosz (1997) becomes important, i.e. how to
                                   generally understand the issue of adoption. This issue is less investigated than the issues
                                   addressed earlier, and this field is avoided here as much as possible.
                                   1.  Natural Language Understanding: In the NLU system TACITUS, Hobbs et al. (1993) made use
                                       of coherence relations, with the aim of obtaining a full representation of the message.
                                       The system is to a certain extent based on coherence relations, as described by Hobbs
                                       (1985), but the inference machinery is based on abduction. The knowledge base is an
                                       important factor.

                                   2.  Automatic Summarization: Automatic text summarization is an application where
                                       discourse understanding is crucial, i.e. it is important to be able to extract what is central




          152                               LOVELY PROFESSIONAL UNIVERSITY