Page 199 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS

Unit 10: Matching Techniques




          Syntax:
          Here is the generic SQL syntax of the SELECT command with a LIKE clause to fetch data
          from a MySQL table:
             SELECT field1, field2, ... fieldN
             FROM table_name1, table_name2 ...
             WHERE field1 LIKE condition1 [AND | OR] field2 = 'somevalue'
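          As a concrete illustration, here is a minimal sketch using Python's sqlite3 module; the
          table and rows are hypothetical, and SQLite's LIKE handles the % and _ wildcards in the
          same way as MySQL's:

```python
import sqlite3

# Hypothetical table and rows, for demonstration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, city TEXT)")
cur.executemany("INSERT INTO employees VALUES (?, ?)",
                [("Alice", "Delhi"), ("Bob", "Mumbai"), ("Arun", "Delhi")])

# '%' matches any sequence of characters, '_' matches exactly one character.
cur.execute("SELECT name FROM employees "
            "WHERE name LIKE 'A%' AND city = 'Delhi'")
print([row[0] for row in cur.fetchall()])  # ['Alice', 'Arun']
```

          Only the rows whose name begins with 'A' and whose city equals 'Delhi' satisfy both the
          LIKE condition and the equality condition.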

          Self Assessment

          State whether the following statements are true or false:
          7.   The LIKE conditions specify a test involving pattern matching.
          8.   LIKE2 uses UCS4 code points.

          9.   Char1 is a character expression, such as a character column, called the pattern.
          10.4 Prediction by Partial Matching (PPM)


          Prediction by Partial Matching (PPM) is an adaptive statistical data compression technique
          based on context modeling and prediction. PPM models use a set of previous symbols in the
          uncompressed symbol stream to predict the next symbol in the stream. PPM algorithms can also
          be used to cluster data into predicted groupings in cluster analysis.

          Predictions are usually reduced to symbol rankings. The number of previous symbols, n,
          determines the order of the PPM model, which is denoted PPM(n). Unbounded variants,
          where the context has no length limitation, also exist and are denoted PPM*. If no
          prediction can be made from all n context symbols, a prediction is attempted with n – 1
          symbols. This process is repeated until a match is found or no more symbols remain in the
          context, at which point a fixed prediction is made. PPM compression implementations vary
          greatly in other details. The actual symbol selection is usually recorded using arithmetic
          coding, though it is also possible to use Huffman encoding or even some type of dictionary
          coding technique.
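          The fallback from order n to shorter contexts can be sketched as follows. This is a
          hypothetical Python model in which contexts are mapped directly to predicted symbols; a
          real PPM coder would store frequency counts and emit escape symbols instead:

```python
def ppm_predict(history, n, model):
    """Try contexts of length n, n-1, ..., 0; return the first prediction found."""
    for order in range(min(n, len(history)), -1, -1):
        context = history[len(history) - order:] if order else ""
        if context in model:
            return model[context]
    return "?"  # fixed fallback prediction when even the order-0 context is unknown

# Hypothetical hand-built model: context -> most likely next symbol.
model = {"ra": "c", "a": "b", "": "a"}
print(ppm_predict("abra", 2, model))  # 'c': the order-2 context "ra" is known
```

          Had "ra" been absent from the model, the function would have fallen back to the order-1
          context "a" and predicted 'b', exactly the order-reduction behaviour described above.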
          The underlying model used in most PPM algorithms can also be extended to predict multiple
          symbols. It is also possible to use non-Markov modeling to either replace or supplement Markov
          modeling. The symbol size is usually static, typically a single byte, which makes generic handling
          of any file format easy. Prediction by Partial Matching predicts the next symbol from the
          n previous symbols; for this reason it is also called prediction by a Markov model of
          order n. For natural-language text such as English, it is intuitively clear, and has been
          confirmed by researchers, that the probability of each symbol depends strongly on the
          preceding symbols. The most obvious way to exploit this property is to combine it with
          one of the known entropy coders that gather statistics; the natural candidates are
          Huffman and arithmetic coding, in either their adaptive or static variants.
          The key idea is to change the alphabet from an alphabet of single symbols to an alphabet
          of sequences:
               Assume a Markov model of order n.

               Let the input stream be the sequence x.
               For each symbol x_i at position i in the sequence x:
                    Update the probability of the sequence y = [x_{i-n} ... x_i].
                    Let y be a symbol in the target alphabet.
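          The loop above can be sketched in Python. This is an illustrative counting model, not a
          full PPM coder: the function names and the training string are made up for the example,
          and the arithmetic coding of the selected symbol is omitted:

```python
from collections import Counter, defaultdict

def train(text, max_order):
    """For every context of length 0..max_order, count which symbol follows it."""
    counts = defaultdict(Counter)
    for i, sym in enumerate(text):
        for order in range(max_order + 1):
            if i >= order:
                counts[text[i - order:i]][sym] += 1
    return counts

def predict(counts, context, max_order):
    """PPM-style fallback: longest matching context first, then shorter ones."""
    for order in range(min(max_order, len(context)), -1, -1):
        ctx = context[len(context) - order:]
        if ctx in counts:
            return counts[ctx].most_common(1)[0][0]
    return None  # a real coder makes a fixed prediction here

counts = train("abracadabra", 2)
print(predict(counts, "ab", 2))  # 'r': in the training text, "ab" is always followed by 'r'
```

          Each position in the input updates the statistics for every context length up to the
          model order, which is precisely the "alphabet of sequences" bookkeeping described above.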






                                           LOVELY PROFESSIONAL UNIVERSITY                                   193