Page 226 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 226

Introduction to Artificial Intelligence & Expert Systems




                    Notes          complexity and termination properties may be different. Parsing algorithms are usually designed
                                   for classes of grammar rather than tailored towards individual grammars.




                                     Notes There are several important properties that a parsing algorithm should have if it is
                                     to be practically useful. It should be ‘sound’ with respect to a given grammar and lexicon;
                                     that is, it should not assign to an input sentence analyses which cannot arise from the
                                     grammar in question.

                                   It should also be ‘complete’; that it should assign to an input sentence all the analyses it can have
                                   with respect to the current grammar and lexicon. Ideally, the algorithm should also be ‘efficient’,
                                   entailing the minimum of computational work consistent with fulfilling the first two
                                   requirements, and ‘robust’: behaving in a reasonably sensible way when presented with a
                                   sentence that it is unable to fully analyse successfully. In this discussion, we generally ignore the
                                   latter two requirements: to some extent, they are covered by the companion section on ‘chart
                                   parsing’. There are several other dimensions on which is useful to characterise the behaviour of
                                   parsing algorithms. One can characterise their search strategy in terms of the characteristic
                                   alternatives of depth-first or breadth-first (q.v.). Orthogonally, one can characterise them in
                                   terms of the direction in which a structure is built: from the words upwards (‘bottom up’), or
                                   from the root node downwards (‘top down’). A third dimension is in terms of the sequential
                                   processing of input words: usually this is left-to-right, but right-to-left or ‘middle-out’ strategies
                                   are also feasible and may be preferable in some applications (e.g. parsing the output of a speech
                                   recogniser).

                                   12.3.1 Top Down Parsing

                                   We can illustrate a simple top down algorithm for a CFG as follows. Let the grammar contain
                                   the following rules (which for simplicity also introduce lexical entries instead of presupposing
                                   a separate component for this):
                                     S → NP VP
                                     NP → they | fish
                                     VP → Aux VP
                                     VP → Vi
                                     VP → Vt NP
                                     Aux → can

                                     Vi → fish
                                     Vt → can
                                   This will assign to the sentence ‘they can sh’ two distinct analyses corresponding to two
                                   interpretations ‘they are able/permitted to sh’ and ‘they put sh in cans’. (It will also generate
                                   several other good sentences and lots of odd ones). The top down algorithm is very simple. We
                                   begin with the start symbol, in this case, S, and see what rules it figures in as the mother. Then
                                   we look at the daughters of these rules and see whether the first one matches the next word in the
                                   input. If it does, we do the same for the remaining daughters. If not, we go through the same
                                   loop again, this time matching the daughters of the rules already found with mothers of other
                                   rules. The algorithm is easily illustrated on the input sentence ‘they sh’: the ‘dot’ precedes the
                                   daughter of the rule we are currently looking at, and the level of indentation indicates which
                                   rule has invoked the current action.



          220                               LOVELY PROFESSIONAL UNIVERSITY
   221   222   223   224   225   226   227   228   229   230   231