Page 229 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 229
Unit 12: Natural Language Processing
We proceed by matching words to the right hand sides of rules, and then matching the resulting Notes
symbols, or sequences of symbols, to the right hand sides of rules until we have an S covering
the whole sentence. Again, we have magically made the right choices at each point. We can
provide a complete algorithm for one type of left-to-right, bottom up parsing procedure in
terms of operations on items like those used in the top down parser. (Since we shall only be
dealing with completely parsed constituents we can dispense with the dots.)
(i) If there is a rule whose right hand side matches the next word in the input, create a new
item with a new tree made from that rule on the end of the tree list, and the remainder of
the input.
Example:
<[], [they fish]>
becomes:
<[NP they], [fish] >
(ii) If there is a rule with n daughters matching the n most recent trees on the list, in order,
create a new item which combines those daughters into an instance of the mother:
< [NP they], [VP [Vi fish]], []>
becomes:
< [S [NP they][VP [Vi fish]]], []>
A successful parse is represented by an item with no more input and a single S rooted tree on the
list.
The sequence of items produced by this method in parsing ‘they sh’ is:
1. [] [they fish]
2. [NP they], [fish]
3. [NP they], [NP fish] []
4. [NP they], [Vi fish] []
5. [NP they], [VP [Vi fish]] []
6. [S [NP they] [VP [Vi fish]]] []
Nothing can happen to item 3 and it is discarded. This particular type of bottom up algorithm is
known as a ‘shift-reduce’ parser. Operation (i) shifts a word from the input on to the list of trees,
which is operating like a ‘stack’ (a last-in-first-out store) insofar as it is only possible to get at the
most recent items. Operation (ii) ‘reduces’ the list of trees by combining several of them from
the top of the stack into one constituent. A generalization of this type of algorithm is familiar
from computer science: the LR(k) family can be seen as shift-reduce algorithms with a certain
amount (‘k’ words) of look ahead to determine, for a set of possible states of the parser, which
action to take. The sequence of actions from a given grammar can be pre computed to give a
‘parsing table’ saying whether a shift or reduce is to be performed, and which state to go to next.
The use of this technique for natural language parsing has been promoted by Tomita (1987),
among others.
While in general, bottom up algorithms are more efficient than top down algorithms, one
particular phenomenon that they deal with only clumsily are ‘empty rules’: rules in which the
right hand side is the empty string. Such rules are theoretically dispensable in a context free
LOVELY PROFESSIONAL UNIVERSITY 223