Unit 10: Matching Techniques
Syntax:
Here is the generic SQL syntax of the SELECT command with a LIKE clause to fetch data from a
MySQL table:
SELECT field1, field2, ... fieldN
FROM table_name1, table_name2, ...
WHERE field1 LIKE condition1 [AND [OR]] field2 = 'somevalue';
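For instance, assuming a hypothetical customers table with name and city columns, the
following query matches every customer whose name begins with 'A' (the % wildcard stands
for any sequence of characters):
SELECT name, city
FROM customers
WHERE name LIKE 'A%';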
Self Assessment
State whether the following statements are true or false:
7. The LIKE conditions specify a test involving pattern matching.
8. LIKE2 uses UCS4 code points.
9. Char1 is a character expression, such as a character column, called the pattern.
10.4 Prediction by Partial Matching (PPM)
Prediction by Partial Matching (PPM) is an adaptive statistical data compression technique
based on context modeling and prediction. PPM models use a set of previous symbols in the
uncompressed symbol stream to predict the next symbol in the stream. PPM algorithms can also
be used to cluster data into predicted groupings in cluster analysis.
Predictions are usually reduced to symbol rankings. The number of previous symbols, n,
determines the order of the PPM model which is denoted as PPM(n). Unbounded variants where
the context has no length limitations also exist and are denoted as PPM*. If no prediction can be
made based on all n context symbols, a prediction is attempted with n – 1 symbols. This process
is repeated until a match is found or no more symbols remain in the context. At that point, a fixed
prediction is made. PPM compression implementations vary greatly in other details. The actual
symbol selection is usually recorded using arithmetic coding, though it is also possible to use
Huffman encoding or even some type of dictionary coding technique.
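To make the fall-back concrete, here is a minimal Python sketch of the idea, not a production
coder: it keeps one frequency table per context length and, when the longest context has never
been seen, escapes to the next shorter one. It simply returns the most frequent symbol rather
than driving an arithmetic coder, and all the names (ToyPPM and so on) are our own:
from collections import defaultdict, Counter

class ToyPPM:
    # PPM(n)-style predictor: one frequency table per context length 0..n.
    def __init__(self, order):
        self.order = order
        self.tables = [defaultdict(Counter) for _ in range(order + 1)]

    def update(self, history, symbol):
        # Record the symbol under every context suffix of length 0..n.
        for k in range(min(self.order, len(history)) + 1):
            context = history[len(history) - k:]
            self.tables[k][context][symbol] += 1

    def predict(self, history):
        # Try the longest available context first; on a miss, "escape"
        # to order k - 1, down to the empty (order-0) context.
        for k in range(min(self.order, len(history)), -1, -1):
            counts = self.tables[k].get(history[len(history) - k:])
            if counts:
                return counts.most_common(1)[0][0]
        return None  # the fixed-prediction point: no statistics gathered yet

model = ToyPPM(order=2)
history = ""
for ch in "abracadabra":
    model.update(history, ch)
    history += ch
print(model.predict("ab"))  # prints 'r': "ab" was followed by 'r' twice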
The underlying model used in most PPM algorithms can also be extended to predict multiple
symbols. It is also possible to use non-Markov modeling to either replace or supplement Markov
modeling. The symbol size is usually static, typically a single byte, which makes generic handling
of any file format easy. Prediction by Partial Matching is a method for predicting the next
symbol from the n previous ones; it is also called prediction by a Markov model of order n. For
text in a natural language such as English, it is intuitively clear, and has been confirmed by
researchers, that the probability of each symbol depends strongly on the preceding symbols: in
English, for example, the letter 'u' almost always follows 'q'. The most obvious way to exploit
this is to combine the model with one of the known entropy coders that gather statistics; the
natural candidates are Huffman and arithmetic coding, in either adaptive or static variants.
The key idea is to change the alphabet to an alphabet of sequences rather than an alphabet of
single symbols:
Assume a Markov model of order n.
Let the input stream be the sequence X.
For each symbol x_i at position i in the sequence X:
    Update the probability of the sequence y = [x_{i-n} ... x_i].
    Let y be a symbol in the target alphabet.
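A direct Python rendering of this loop may help; it is only a sketch under the assumptions
above (the function name and test string are ours). Each window of n + 1 symbols is treated as
one symbol y of the enlarged alphabet, and its relative frequency approximates the probability
that an entropy coder such as Huffman or arithmetic coding would then use:
from collections import Counter

def gather_sequence_stats(stream, n):
    # Count every window y = x[i-n .. i] as one symbol of the new alphabet.
    counts = Counter()
    for i in range(n, len(stream)):
        counts[stream[i - n:i + 1]] += 1
    total = sum(counts.values())
    # Relative frequencies stand in for the model's probabilities.
    return {y: c / total for y, c in counts.items()}

probs = gather_sequence_stats("the theatre thesis", n=2)
print(sorted(probs.items(), key=lambda kv: -kv[1])[:3])  # 'the' dominates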