Page 229 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 229

Unit 12: Natural Language Processing




          We proceed by matching words to the right hand sides of rules, and then matching the resulting  Notes
          symbols, or sequences of symbols, to the right hand sides of rules until we have an S covering
          the whole sentence. Again, we have magically made the right choices at each point. We can
          provide a complete algorithm for one type of left-to-right, bottom up parsing procedure in
          terms of operations on items like those used in the top down parser. (Since we shall only be
          dealing with completely parsed constituents we can dispense with the dots.)
          (i)  If there is a rule whose right hand side matches the next word in the input, create a new
               item with a new tree made from that rule on the end of the tree list, and the remainder of
               the input.


                 Example:
            <[], [they fish]>

            becomes:
            <[NP they], [fish] >
          (ii)  If there is a rule with n daughters matching the n most recent trees on the list, in order,
               create a new item which combines those daughters into an instance of the mother:

               < [NP they], [VP [Vi fish]], []>
               becomes:
               < [S [NP they][VP [Vi fish]]], []>

          A successful parse is represented by an item with no more input and a single S rooted tree on the
          list.
          The sequence of items produced by this method in parsing ‘they sh’ is:

          1.   []                               [they fish]
          2.   [NP they],                       [fish]
          3.   [NP they], [NP fish]              []
          4.   [NP they], [Vi fish]              []
          5.   [NP they], [VP [Vi fish]]         []

          6.   [S [NP they] [VP [Vi fish]]]      []
          Nothing can happen to item 3 and it is discarded. This particular type of bottom up algorithm is
          known as a ‘shift-reduce’ parser. Operation (i) shifts a word from the input on to the list of trees,
          which is operating like a ‘stack’ (a last-in-first-out store) insofar as it is only possible to get at the
          most recent items. Operation (ii) ‘reduces’ the list of trees by combining several of them from
          the top of the stack into one constituent. A generalization of this type of algorithm is familiar
          from computer science: the LR(k) family can be seen as shift-reduce algorithms with a certain
          amount (‘k’ words) of look ahead to determine, for a set of possible states of the parser, which
          action to take. The sequence of actions from a given grammar can be pre computed to give a
          ‘parsing table’ saying whether a shift or reduce is to be performed, and which state to go to next.
          The use of this technique for natural language parsing has been promoted by Tomita (1987),
          among others.
          While in general, bottom up algorithms are more efficient than top down algorithms, one
          particular phenomenon that they deal with only clumsily are ‘empty rules’: rules in which the
          right hand side is the empty string. Such rules are theoretically dispensable in a context free





                                           LOVELY PROFESSIONAL UNIVERSITY                                   223
   224   225   226   227   228   229   230   231   232   233   234