Page 224 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 224

Introduction to Artificial Intelligence & Expert Systems




                    Notes          A formal grammar is a set of rules for rewriting strings, along with a “start symbol” from which
                                   rewriting starts. Therefore, a grammar is usually thought of as a language generator. However,
                                   it can also sometimes be used as the basis for a “recognizer”—a function in computing that
                                   determines whether a given string belongs to the language or is grammatically incorrect. To
                                   describe such recognizers, formal language theory uses separate formalisms, known as automata
                                   theory.



                                     Did u know? One of the interesting results of automata theory is that it is not possible to
                                     design a recognizer for certain formal languages.
                                   Parsing is the process of recognizing an utterance (a string in natural languages) by breaking it
                                   down to a set of symbols and analyzing each one against the grammar of the language. Most
                                   languages have the meanings of their utterances structured according to their syntax—a practice
                                   known as compositional semantics. As a result, the first step to describing the meaning of an
                                   utterance in language is to break it down part by part and look at its analyzed form.
                                   A grammar mainly consists of a set of rules for transforming strings. (If it only consisted of these
                                   rules, it would be a semi-Thue system.) To generate a string in the language, one begins with a
                                   string consisting of only a single start symbol. The production rules are then applied in any
                                   order, until a string that contains neither the start symbol nor designated non-terminal symbols
                                   is produced. A production rule is applied to a string by replacing one occurrence of its left-hand
                                   side in the string by its right-hand side (cf. the operation of the theoretical Turing machine). The
                                   language formed by the grammar consists of all distinct strings that can be generated in this
                                   manner. Any particular sequence of production rules on the start symbol yields a distinct string
                                   in the language. If there are multiple ways of generating the same single string, the grammar is
                                   said to be ambiguous.

                                          Example: Assume the alphabet consists of a and b, the start symbol is S, and we have the
                                   following production rules:
                                               S → aSb
                                               S → ba
                                   Then we start with S, and can choose a rule to apply to it. If we choose rule 1, we obtain the string
                                   aSb. If we then choose rule 1 again, we replace S with aSb and obtain the string aaSbb. If we now
                                   choose rule 2, we replace S with ba and obtain the string aababb, and are done. We can write this
                                   series of choices more briefly, using symbols:
                                               S ⇒ aSb ⇒ aaSbb ⇒ aababb.

                                                                             n
                                                                                n
                                   The language of the grammar is then the infinite set {a bab |n ≥ 0} = {ba, abab, aababb, aaababbb,…},
                                         k
                                   where a  is a repeated k times.
                                   A context-free grammar is a grammar in which the left-hand side of each production rule
                                   consists of only a single non-terminal symbol. This restriction is non-trivial; not all languages
                                   can be generated by context-free grammars. Those that can are called context-free languages.
                                   The language defined above is not a context-free language, and this can be strictly proven using
                                   the pumping lemma for context-free languages, but for example the language {a b |n ≥ 1} (at
                                                                                                    n n
                                   least 1 a followed by the same number of b’s) is context-free, as it can be defined by the grammar
                                   G  with N = {S}, Σ = {a, b}, S the start symbol, and the following production rules:
                                    2
                                   1.  S → aSb

                                   2.  S → ab




          218                               LOVELY PROFESSIONAL UNIVERSITY
   219   220   221   222   223   224   225   226   227   228   229