Page 208 - DCAP208_Management Support Systems


Unit 12: Applications of Neural Network

12.2.7 Speechreading (Lipreading)

Within the research program Neuroinformatik, the IPVR is developing a neural speechreading
system as part of a user interface for a workstation. The system has three main parts:
a face tracker (done by Marco Sommerau), lip modeling and speech processing (done by Michael
Vogt), and the development and application of SNNS for neural network training (done by
Günter Mamier).
Automatic speechreading is based on a robust lip image analysis. In this approach, no special
illumination or lip make-up is used. The analysis is based on true-color video images. The
system allows for real-time tracking and storage of the lip region and robust off-line lip model
matching. The proposed model is based on cubic outline curves. A neural classifier detects
the visibility of teeth edges and other attributes. At this stage, the edge between the
closed lips is modeled automatically where applicable, based on a neural network’s decision.

To achieve high flexibility during lip-model development, a model description language has
been defined and implemented. The language allows the definition of general edge models
based on knots and edge functions. Inner model forces stabilize the overall model shape.
User-defined image processing functions may be applied along the model edges. These functions
and the inner forces contribute to an overall energy function.
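
The text does not give the form of the overall energy function. As an illustration only, one common way to combine such terms is a weighted sum. The symbols below are assumptions rather than the notation of the original system: $\mathbf{k}$ is the vector of knot positions, $F_i$ are inner-force terms, $g_j$ are user-defined image processing functions evaluated along edge $j$ of image $I$, and $\alpha_i$, $\beta_j$ are weights.

```latex
E(\mathbf{k}) \;=\; \sum_{i} \alpha_i \, F_i(\mathbf{k})
\;+\; \sum_{j} \beta_j \int_{\text{edge}_j(\mathbf{k})} g_j(I, s)\, \mathrm{d}s
```

Under this reading, adapting the model means searching for knot positions $\mathbf{k}$ that make $E(\mathbf{k})$ small.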

Caution: Adaptation of the model is done by gradient descent or simulated-annealing-like
algorithms.
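
To make the idea of adaptation by gradient descent concrete, the following minimal sketch minimizes a toy energy. It is not the energy used by the speechreading system: the image term (squared distance of each knot to a hypothetical target position), the inner term (squared differences between neighbouring knots), and all names and parameter values are assumptions chosen for illustration.

```python
def energy(knots, targets, stiffness):
    """Toy energy: image term plus inner (smoothness) term. Both are assumed forms."""
    image_term = sum((k - t) ** 2 for k, t in zip(knots, targets))
    inner_term = sum((b - a) ** 2 for a, b in zip(knots, knots[1:]))
    return image_term + stiffness * inner_term


def energy_gradient(knots, targets, stiffness):
    """Analytic gradient of the toy energy with respect to each knot."""
    grad = [2.0 * (k - t) for k, t in zip(knots, targets)]
    for i in range(len(knots) - 1):
        diff = knots[i + 1] - knots[i]
        grad[i] -= 2.0 * stiffness * diff
        grad[i + 1] += 2.0 * stiffness * diff
    return grad


def adapt(knots, targets, stiffness=0.5, step=0.05, iterations=500):
    """Plain gradient descent: repeatedly move the knots against the gradient."""
    knots = list(knots)
    for _ in range(iterations):
        grad = energy_gradient(knots, targets, stiffness)
        knots = [k - step * g for k, g in zip(knots, grad)]
    return knots


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0]      # hypothetical initial knot heights
    targets = [1.0, 3.0, 3.0, 1.0]    # hypothetical edge positions found in the image
    fitted = adapt(start, targets)
    print("energy before:", energy(start, targets, 0.5))
    print("energy after: ", energy(fitted, targets, 0.5))
```

The inner term keeps the fitted knots from following the targets exactly, which mirrors the stabilizing role the text assigns to the inner model forces. A simulated-annealing-like variant would replace the deterministic update with random moves that are accepted with a probability depending on the change in energy.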
Figure 12.13 shows one configuration of the lip model, consisting of an upper lip edge and a
lower lip edge. The model edges are defined by Bezier functions. Outer control knots stabilize
the position of the corners of the mouth.
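
As a sketch of how an edge defined by a Bezier function can be evaluated, the code below samples a cubic Bezier curve from four control points. The cubic degree is an assumption taken from the earlier mention of cubic outline curves, and the function names and coordinates are hypothetical, not part of the original system.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 are the end points (e.g. the mouth corners);
    p1 and p2 are inner control points that shape the curve.
    """
    u = 1.0 - t
    # Cubic Bernstein basis polynomials
    b0 = u ** 3
    b1 = 3.0 * u ** 2 * t
    b2 = 3.0 * u * t ** 2
    b3 = t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_edge(control_points, samples=20):
    """Return evenly spaced (in t) points along one model edge."""
    p0, p1, p2, p3 = control_points
    return [
        cubic_bezier(p0, p1, p2, p3, i / (samples - 1))
        for i in range(samples)
    ]


if __name__ == "__main__":
    # Hypothetical upper lip edge in image coordinates
    upper_lip = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
    for point in sample_edge(upper_lip, samples=5):
        print(point)
```

The curve always starts at the first control point and ends at the last, so fixing those two points pins the mouth corners while the inner control points bend the lip outline.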

Figure 12.13: Configuration of the Lip Model

Source: http://tralvex.com/pub/nap/#CoEvolution of Neural Networks for Control of Pursuit & Evasion
The model interpreter enables continuous measurement of model knot positions and color
blends along the model edges while the model adapts to an utterance.
The resulting parameters may then be used for speech recognition tasks.

Task: Analyze the use of Bezier functions.

LOVELY PROFESSIONAL UNIVERSITY 201
