A formal grammar is a set of rules for rewriting strings, along with a “start symbol” from which
rewriting starts. Therefore, a grammar is usually thought of as a language generator. However,
it can also sometimes be used as the basis for a “recognizer”—a function in computing that
determines whether a given string belongs to the language or is grammatically incorrect. To
describe such recognizers, formal language theory uses separate formalisms, known as automata
theory.
Did u know? One of the interesting results of automata theory is that it is not possible to
design a recognizer for certain formal languages.
Parsing is the process of recognizing an utterance (a string, in natural languages) by breaking it
down into a sequence of symbols and analyzing each one against the grammar of the language. Most
languages structure the meanings of their utterances according to their syntax, a practice known as
compositional semantics. As a result, the first step in describing the meaning of an utterance is to
break it down part by part and look at its analyzed form.
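As a rough illustration of this idea, the sketch below (an invented toy example, not part of the formal development that follows; the grammar, lexicon and function name are made up purely for illustration) breaks an utterance into symbols and then checks whether the toy grammar accounts for that sequence:

# Toy grammar: Sentence -> "the" Noun Verb "the" Noun
lexicon = {"Noun": {"dog", "cat"}, "Verb": {"sees", "chases"}}

def recognize(utterance):
    """Break the utterance into symbols and analyse them against the toy grammar."""
    words = utterance.split()                     # step 1: break the string into symbols
    return (len(words) == 5                       # step 2: check the only shape the toy grammar allows
            and words[0] == "the" and words[3] == "the"
            and words[1] in lexicon["Noun"]
            and words[2] in lexicon["Verb"]
            and words[4] in lexicon["Noun"])

print(recognize("the dog chases the cat"))        # True: the grammar accepts it
print(recognize("chases the the cat dog"))        # False: ungrammatical ordering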
A grammar mainly consists of a set of rules for transforming strings. (If it only consisted of these
rules, it would be a semi-Thue system.) To generate a string in the language, one begins with a
string consisting of only a single start symbol. The production rules are then applied in any
order, until a string that contains neither the start symbol nor designated non-terminal symbols
is produced. A production rule is applied to a string by replacing one occurrence of its left-hand
side in the string by its right-hand side (cf. the operation of the theoretical Turing machine). The
language formed by the grammar consists of all distinct strings that can be generated in this
manner. Any particular sequence of rule applications to the start symbol yields one particular string
in the language. If there are essentially different ways of generating the same single string, the
grammar is said to be ambiguous.
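The generation procedure just described can be sketched directly in Python. This is only an illustrative sketch under the conventions of this section (non-terminals are assumed to be upper-case letters, the grammar is represented as a list of left-hand/right-hand side pairs, and the function name derive is made up):

import random

def derive(rules, start, max_steps=50):
    """Rewrite the start symbol with the production rules until the
    string contains no non-terminal (upper-case) symbols."""
    s = start
    for _ in range(max_steps):
        if not any(c.isupper() for c in s):       # only terminals remain: done
            return s
        applicable = [(lhs, rhs) for lhs, rhs in rules if lhs in s]
        if not applicable:                        # no rule applies: derivation is stuck
            return None
        lhs, rhs = random.choice(applicable)      # rules may be applied in any order
        s = s.replace(lhs, rhs, 1)                # replace one occurrence of the left-hand side
    return None                                   # gave up after too many steps

# The grammar of the example below: S -> aSb and S -> ba
print(derive([("S", "aSb"), ("S", "ba")], "S"))   # e.g. 'ba', 'abab', 'aababb', ...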
Example: Assume the alphabet consists of a and b, the start symbol is S, and we have the
following production rules:
1. S → aSb
2. S → ba
Then we start with S, and can choose a rule to apply to it. If we choose rule 1, we obtain the string
aSb. If we then choose rule 1 again, we replace S with aSb and obtain the string aaSbb. If we now
choose rule 2, we replace S with ba and obtain the string aababb, and are done. We can write this
series of choices more briefly, using symbols:
S ⇒ aSb ⇒ aaSbb ⇒ aababb.
The language of the grammar is then the infinite set {aⁿbabⁿ | n ≥ 0} = {ba, abab, aababb, aaababbb, …},
where aᵏ is a repeated k times.
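Membership in this set can also be checked directly. The short sketch below (an illustrative addition; the function name is arbitrary) strips matching a's and b's from the two ends of a string and accepts it exactly when "ba" is left in the middle:

def in_language(s):
    """Return True if s has the form a^n ba b^n with n >= 0."""
    while s.startswith("a") and s.endswith("b"):
        s = s[1:-1]                   # peel one leading 'a' and one trailing 'b'
    return s == "ba"                  # what remains must be exactly 'ba'

for w in ["ba", "abab", "aababb", "aaababbb", "aabb", "baba"]:
    print(w, in_language(w))          # True for the first four, False for the rest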
A context-free grammar is a grammar in which the left-hand side of each production rule
consists of only a single non-terminal symbol. This restriction is non-trivial; not all languages
can be generated by context-free grammars. Those that can are called context-free languages.
The language {aⁿbabⁿ | n ≥ 0} defined above is in fact context-free, since each of its production rules
has only the single non-terminal S on its left-hand side. Not every language is context-free:
{aⁿbⁿcⁿ | n ≥ 1}, for example, can be strictly proven not to be context-free using the pumping lemma
for context-free languages. The language {aⁿbⁿ | n ≥ 1} (at least one a followed by the same number of
b’s), on the other hand, is context-free, as it can be defined by the grammar G with N = {S},
Σ = {a, b}, S the start symbol, and the following production rules:
1. S → aSb
2. S → ab
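Because every left-hand side is a single non-terminal, a grammar like G translates almost line by line into a recursive recognizer. The following sketch (an illustrative addition; the function name is arbitrary) mirrors the two production rules of G:

def match_S(s):
    """Recognize the language of G: S -> aSb | ab."""
    if s == "ab":                                       # rule 2: S -> ab
        return True
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        return match_S(s[1:-1])                         # rule 1: S -> aSb
    return False

for w in ["ab", "aabb", "aaabbb", "abab", "aab"]:
    print(w, match_S(w))              # True for a^n b^n with n >= 1, False otherwise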