Page 159 - DCAP506_ARTIFICIAL_INTELLIGENCE
Unit 11: Natural Language Processing
to the message. Summarization systems have been developed by, e.g., Marcu (1997). The
approach was to parse the discourse relations (rhetorical relations; Mann & Thompson,
1988) and, on the basis of the parse, choose the discourse segments that were most
relevant. The approach relied to a large extent on surface cues, such as discourse markers.
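To make the idea of choosing relevant segments concrete, the Python sketch below ranks the segments of a hand-built rhetorical structure tree. It assumes the parse already exists (the cue-based parsing step is not shown) and uses a simplified version of the RST intuition that nuclei matter more than satellites: a segment is scored by how close to the root it is promoted through nucleus links. The `Node` class, the functions `promotion` and `rank_segments`, and the example sentences are hypothetical; this is not a reproduction of Marcu's system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical RST-style tree node; only leaves carry a text segment."""
    text: str | None = None
    children: list[Node] = field(default_factory=list)
    nuclear: list[bool] = field(default_factory=list)  # nuclear[i]: is child i a nucleus?


def promotion(node: Node) -> list[str]:
    """Segments promoted to `node`: a leaf promotes itself; an inner node
    promotes whatever its nucleus children promote (satellites are skipped)."""
    if node.text is not None:
        return [node.text]
    promoted: list[str] = []
    for child, is_nucleus in zip(node.children, node.nuclear):
        if is_nucleus:
            promoted.extend(promotion(child))
    return promoted


def rank_segments(root: Node) -> list[str]:
    """Rank segments by the shallowest depth at which they are promoted
    (depth 0 = root); shallower promotion means more salient."""
    best_depth: dict[str, int] = {}

    def visit(node: Node, depth: int) -> None:
        for seg in promotion(node):
            best_depth[seg] = min(best_depth.get(seg, depth), depth)
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return sorted(best_depth, key=best_depth.__getitem__)


# Hypothetical tree for "The meeting is postponed because the room is unavailable.
# A new date will follow.": the cause clause is a satellite of the first clause,
# and the second sentence is treated as a satellite (elaboration) of the first.
cause = Node(children=[Node("The meeting is postponed"),
                       Node("because the room is unavailable")],
             nuclear=[True, False])
root = Node(children=[cause, Node("A new date will follow")],
            nuclear=[True, False])

print(rank_segments(root))
# ['The meeting is postponed', 'A new date will follow', 'because the room is unavailable']
```

Keeping only the top-ranked segments gives a crude extract; the quality of such a summary depends entirely on the discourse parse, which is the hard part the sketch skips.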
3. Natural Language Generation: Marcu (1997) also tested rhetorical relations for Natural
Language Generation (NLG). On the basis of rhetorical relations, the ordering preferences of
discourse segments were scored: the higher the score, the more likely it was that the discourse
structure was coherent. Kibble & Power (1999) use Centering theory to plan the
most coherent stretch of utterances. These brief examples are, of course, only a very limited
sample of applications that use some kind of discourse-related information. Still, I will
finish here, make some concluding remarks, and connect them to my own dissertation
subject.
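The idea of scoring orderings and preferring the most coherent one can be illustrated with Centering theory's transition types. The Python below is a minimal sketch, not Kibble & Power's planner or Marcu's scoring: it assumes each utterance has already been reduced to its forward-looking centers (entities ranked by salience, the first one being the preferred center), classifies each pair of adjacent utterances as Continue, Retain, Smooth Shift, or Rough Shift, and picks the ordering with the highest total by trying every permutation. The numeric weights, the NOCB penalty for utterances that share no entity, and the entity names are assumptions.

```python
from itertools import permutations

# Assumed weights: the usual centering preference Continue > Retain >
# Smooth Shift > Rough Shift, plus a penalty when no entity is shared (NOCB).
WEIGHTS = {"CONTINUE": 3, "RETAIN": 2, "SMOOTH-SHIFT": 1, "ROUGH-SHIFT": 0, "NOCB": -1}


def backward_center(prev_cf, cur_cf):
    """Cb(Un): the highest-ranked entity of Cf(Un-1) that is realized in Un."""
    for entity in prev_cf:
        if entity in cur_cf:
            return entity
    return None


def transition(prev_cb, cb, cp):
    """Classify a transition from Cb(Un-1), Cb(Un), and Cp(Un)."""
    if cb is None:
        return "NOCB"
    same_cb = prev_cb is None or cb == prev_cb  # an undefined previous Cb counts as kept
    if same_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def coherence_score(order):
    """Sum the transition weights over consecutive utterances."""
    total, prev_cb = 0, None
    for prev_cf, cur_cf in zip(order, order[1:]):
        cb = backward_center(prev_cf, cur_cf)
        total += WEIGHTS[transition(prev_cb, cb, cur_cf[0])]
        prev_cb = cb
    return total


def best_order(utterances):
    """Brute force over all orderings (only feasible for a handful of utterances)."""
    return max(permutations(utterances), key=coherence_score)


# Hypothetical utterances, each reduced to ranked entities (Cf):
a = ["john", "store"]   # "John went to the store."
b = ["john", "milk"]    # "He bought some milk."
c = ["milk", "fridge"]  # "The milk went into the fridge."

print(best_order([c, a, b]))
# (['john', 'store'], ['john', 'milk'], ['milk', 'fridge'])
```

Enumerating permutations grows factorially with the number of utterances; it is used here only to keep the scoring idea visible.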
Task: Illustrate the term “discourse”.
Self Assessment
Fill in the blanks:
6. In .............................. Analysis, individual words are broken down into their components, and
non-word tokens, such as punctuation, are separated from the words.
7. In .............................., linear sequences of words are transformed into structures that show
how the words relate to each other.
8. .............................. focuses on analyzing the words in a sentence so as to reveal the
grammatical structure of the sentence.
9. Design patterns aim to increase the flexibility of a model by .............................. some
aspects of a class.
10. The term .............................. includes both spoken and written forms, as well as both
monologue and dialogue.
11. .............................. is what makes a collection of sentences/utterances a discourse.
11.3 Spell Checking
The goal of spell checking is to detect and correct typographic and orthographic errors
in text at the level of individual word occurrences, each considered out of its context.
No one can write without making any errors. Even people well acquainted with the rules of a language can,
just by accident, press a wrong key on the keyboard (perhaps one adjacent to the correct one) or
miss out a letter. Moreover, when typing, one sometimes fails to coordinate the
movements of the hands and fingers correctly. All such errors are known as typos, or typographic errors.
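The typo types described above (a wrong, possibly adjacent key; a missed letter; letters typed in the wrong order when the hands are out of step, which is our reading of the coordination problem) each correspond to a single edit operation. The Python sketch below computes the optimal string alignment distance, a restricted form of Damerau-Levenshtein distance that counts insertions, deletions, substitutions, and adjacent transpositions. It is one common way to measure how far a typo is from the intended word, not a method prescribed by this text; the function name `osa_distance` is our own.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: the minimum number of insertions,
    deletions, substitutions, and adjacent transpositions turning a into b,
    where no substring is edited more than once."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


# Hypothetical typos of "letter": 'w' is next to 'e' on a QWERTY keyboard,
# one 't' is missing, and 'e'/'t' are swapped.
for typo in ["lettwr", "leter", "lteter"]:
    print(typo, osa_distance("letter", typo))  # each prints distance 1
```

A spell checker can use such a distance to propose corrections: dictionary words within a small distance of a flagged string are natural candidates.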
On the other hand, some people do not know the correct spelling of some words, particularly in
a foreign language. Such errors are known as spelling errors.
At its most basic, a spell checker simply detects strings that are not valid words in a given
natural language. It is assumed that most orthographic or typographic errors produce
strings that are impossible as standalone words in this language. Detecting errors that
accidentally turn one word into another existing word, such as English ‘then’ into ‘than’ or
Spanish ‘czar’ into ‘Caesar’, is a task that requires much more powerful tools.
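The basic check described above, looking each word up on its own, can be sketched as follows. This Python example assumes a tiny hypothetical lexicon (a real checker would load a full word list) and a simple regular-expression tokenizer; it flags every token that is not found in the lexicon. Because each word is checked out of context, the real-word error in the example ('then' typed for 'than') passes unnoticed, which is exactly the limitation the paragraph points out.

```python
import re

# Hypothetical toy lexicon; a real spell checker would load a full word list.
LEXICON = {"i", "would", "rather", "walk", "than", "then", "drive"}


def flag_non_words(text, lexicon=LEXICON):
    """Return tokens not found in the lexicon.

    Each token is looked up on its own (case-insensitively), with no context,
    so only strings that are not words at all can be detected.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    return [token for token in tokens if token.lower() not in lexicon]


# 'wlak' is a typo for 'walk'; 'then' should be 'than', but it is a valid word.
print(flag_non_words("I would rather wlak then drive"))  # ['wlak']
```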
LOVELY PROFESSIONAL UNIVERSITY 153