Page 225 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 225
Unit 12: Natural Language Processing
3
A context-free language can be recognized in O(n ) time. For every context-free language, a Notes
3
machine can be built that takes a string as input and determines in O(n ) time whether the string
is a member of the language, where n is the length of the string. Deterministic context-free
languages are a subset of context-free languages that can be recognized in linear time. There
exist various algorithms that target either this set of languages or some subset of it.
In regular grammars, the left hand side is again only a single non-terminal symbol, but now the
right-hand side is also restricted. The right side may be the empty string, or a single terminal
symbol, or a single terminal symbol followed by a non-terminal symbol, but nothing else.
n n
The language defined above is not regular, but the language {a b |m, n ≥ 1} (at least 1 a followed
by at least 1 b, where the numbers may be different) is, as it can be defined by the grammar G
3
with N = {S, A, B}, Σ = {a, b}, S the start symbol, and the following production rules:
1. S → aA
2. A → aA
3. A → bB
4. B → bB
5. B → ∈
All languages generated by a regular grammar can be recognized in linear time by a finite state
machine. Although, in practice, regular grammars are commonly expressed using regular
expressions, some forms of regular expression used in practice do not strictly generate the
regular languages and do not show linear recognitional performance due to those deviations.
Self Assessment
State whether the following statements are true or false:
4. A grammar is a set of production rules for strings in a formal language.
5. Grammar is the process of recognizing an utterance by breaking it down to a set of
symbols and analyzing each one against the grammar of the language.
6. A production rule is applied to a string by replacing one occurrence of its right-hand side
in the string by its left-hand side.
12.3 Basic Parsing Techniques
‘Parsing’ is the term used to describe the process of automatically building syntactic analyses of
a sentence in terms of a given grammar and lexicon. The resulting syntactic analyses may be
used as input to a process of semantic interpretation, (or perhaps phonological interpretation,
where aspects of this, like prosody, are sensitive to syntactic structure). Occasionally, ‘parsing’ is
also used to include both syntactic and semantic analysis. We use it in the more conservative
sense here, however. In most contemporary grammatical formalisms, the output of parsing is
something logically equivalent to a tree, displaying dominance and precedence relations between
constituents of a sentence, perhaps with further annotations in the form of attribute-value
equations (‘features’) capturing other aspects of linguistic description. However, there are many
different possible linguistic formalisms, and many ways of representing each of them, and
hence many different ways of representing the results of parsing. We shall assume here a simple
tree representation, and an underlying context-free grammatical (CFG) formalism. However,
all of the algorithms described here can usually be used for more powerful unification based
formalisms, provided these retain a context-free ‘backbone’, although in these cases their
LOVELY PROFESSIONAL UNIVERSITY 219