Page 225 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 225

Unit 12: Natural Language Processing




                                                   3
          A context-free language can be recognized in O(n ) time. For every context-free language, a  Notes
                                                                  3
          machine can be built that takes a string as input and determines in O(n ) time whether the string
          is a member of the language, where n is the length of the string. Deterministic context-free
          languages are a subset of context-free languages that can be recognized in linear time. There
          exist various algorithms that target either this set of languages or some subset of it.
          In regular grammars, the left hand side is again only a single non-terminal symbol, but now the
          right-hand side is also restricted. The right side may be the empty string, or a single terminal
          symbol, or a single terminal symbol followed by a non-terminal symbol, but nothing else.
                                                            n n
          The language defined above is not regular, but the language {a b |m, n ≥ 1} (at least 1 a followed
          by at least 1 b, where the numbers may be different) is, as it can be defined by the grammar G
                                                                                      3
          with N = {S, A, B}, Σ = {a, b}, S the start symbol, and the following production rules:
          1.   S → aA

          2.   A → aA
          3.   A → bB
          4.   B → bB
          5.   B → ∈

          All languages generated by a regular grammar can be recognized in linear time by a finite state
          machine. Although, in practice, regular grammars are commonly expressed using regular
          expressions, some forms of regular expression used in practice do not strictly generate the
          regular languages and do not show linear recognitional performance due to those deviations.

          Self Assessment

          State whether the following statements are true or false:
          4.   A grammar is a set of production rules for strings in a formal language.

          5.   Grammar is the process of recognizing an utterance by breaking it down to a set of
               symbols and analyzing each one against the grammar of the language.
          6.   A production rule is applied to a string by replacing one occurrence of its right-hand side
               in the string by its left-hand side.

          12.3 Basic Parsing Techniques

          ‘Parsing’ is the term used to describe the process of automatically building syntactic analyses of
          a sentence in terms of a given grammar and lexicon. The resulting syntactic analyses may be
          used as input to a process of semantic interpretation, (or perhaps phonological interpretation,
          where aspects of this, like prosody, are sensitive to syntactic structure). Occasionally, ‘parsing’ is
          also used to include both syntactic and semantic analysis. We use it in the more conservative
          sense here, however. In most contemporary grammatical formalisms, the output of parsing is
          something logically equivalent to a tree, displaying dominance and precedence relations between
          constituents of a sentence, perhaps with further annotations in the form of attribute-value
          equations (‘features’) capturing other aspects of linguistic description. However, there are many
          different possible linguistic formalisms, and many ways of representing each of them, and
          hence many different ways of representing the results of parsing. We shall assume here a simple
          tree representation, and an underlying context-free grammatical (CFG) formalism. However,
          all of the algorithms described here can usually be used for more powerful unification based
          formalisms, provided these retain a context-free ‘backbone’, although in these cases their




                                           LOVELY PROFESSIONAL UNIVERSITY                                   219
   220   221   222   223   224   225   226   227   228   229   230