classifiers. Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is made to simplify the computations involved and, in this sense, is considered "naive". Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
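Under class conditional independence, the class-conditional likelihood factorizes as P({v} | c) = \prod_j P(v_j \mid c), so a naive Bayesian classifier can be trained by simple counting. Below is a minimal Python sketch; the weather attributes, labels, and counts are invented for illustration and are not data from the text.

    from collections import Counter, defaultdict

    def train_naive_bayes(rows, labels):
        """Estimate P(c) and the per-attribute P(v_j | c) by counting."""
        class_counts = Counter(labels)          # N(c)
        value_counts = defaultdict(Counter)     # (c, j) -> counts of value v_j under class c
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                value_counts[(c, j)][v] += 1
        priors = {c: n / len(labels) for c, n in class_counts.items()}
        return priors, value_counts, class_counts

    def predict(row, priors, value_counts, class_counts):
        """Return argmax_c P(c) * prod_j P(v_j | c), using class conditional independence."""
        def score(c):
            p = priors[c]
            for j, v in enumerate(row):
                p *= value_counts[(c, j)][v] / class_counts[c]
            return p
        return max(priors, key=score)

    # Tiny invented training set: each row is (outlook, temperature).
    rows = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool")]
    labels = ["no", "yes", "yes"]
    priors, value_counts, class_counts = train_naive_bayes(rows, labels)
    print(predict(("sunny", "cool"), priors, value_counts, class_counts))  # -> yes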
Apply Bayes' rule, where c is the class and {v} the observed attribute values:

    P(c \mid \{v\}) = \frac{P(\{v\} \mid c) \, P(c)}{P(\{v\})}
If we assume K possible disjoint diagnoses c_1, …, c_K:

    P(c_k \mid \{v\}) = \frac{P(c_k) \, P(\{v\} \mid c_k)}{P(\{v\})}
P({v}) may not be known, but the total probability of the diagnoses is 1, so summing the posteriors recovers the evidence P({v}):

    \sum_{k=1}^{K} \frac{P(c_k) \, P(\{v\} \mid c_k)}{P(\{v\})} = 1 \;\Rightarrow\; P(\{v\}) = \sum_{k=1}^{K} P(c_k) \, P(\{v\} \mid c_k)

We therefore need to know P(c_k) and P({v} | c_k) for all k.
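As a worked illustration with invented numbers: suppose two diagnoses with priors P(c_1) = 0.7 and P(c_2) = 0.3, and likelihoods P({v} | c_1) = 0.2 and P({v} | c_2) = 0.6. Then

    P(\{v\}) = 0.7 \times 0.2 + 0.3 \times 0.6 = 0.32

    P(c_1 \mid \{v\}) = \frac{0.7 \times 0.2}{0.32} = 0.4375, \qquad P(c_2 \mid \{v\}) = \frac{0.3 \times 0.6}{0.32} = 0.5625

and the two posteriors sum to 1, as required.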
Bayes Rule: MAP vs. ML
Rather than computing the full posterior, we can simplify the computation if we are only interested in classification (both rules are sketched in code after the list):

1.   ML (Maximum Likelihood) Hypothesis
     Assume all hypotheses are equiprobable a priori and simply maximize the data likelihood:

         c_{ML} = \arg\max_{c \in C} P(\{v\} \mid c)
2.   MAP (Maximum A Posteriori) Class Hypothesis

         c_{MAP} = \arg\max_{c \in C} P(c \mid \{v\}) = \arg\max_{c \in C} \frac{P(\{v\} \mid c) \, P(c)}{P(\{v\})}

     We can ignore the denominator P({v}) because it is the same for all c.
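A minimal Python sketch of both decision rules; the class names, likelihoods, and priors below are assumptions invented for the example:

    def ml_class(classes, likelihood):
        # c_ML = argmax_c P({v} | c): priors ignored (assumed equal).
        return max(classes, key=likelihood)

    def map_class(classes, likelihood, prior):
        # c_MAP = argmax_c P({v} | c) * P(c): the shared denominator P({v}) is dropped.
        return max(classes, key=lambda c: likelihood(c) * prior(c))

    # Illustrative numbers: c2 fits the data slightly better, but c1 has a larger prior.
    likelihoods = {"c1": 0.5, "c2": 0.6}
    priors = {"c1": 0.7, "c2": 0.3}
    print(ml_class(likelihoods, likelihoods.get))               # -> c2
    print(map_class(likelihoods, likelihoods.get, priors.get))  # -> c1

The two rules disagree here precisely because the prior is far from uniform; with equiprobable classes the ML and MAP choices always coincide.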
                                   Bayes theorem
                                   Bayes’ theorem relates the conditional and marginal probabilities of events A and B, where B has
                                   a non-vanishing probability:
    P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
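For instance, with invented numbers P(A) = 0.01, P(B | A) = 0.9, and P(B | ¬A) = 0.1, expanding P(B) by total probability as above gives

    P(A \mid B) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} = \frac{0.009}{0.108} \approx 0.083

so a strong conditional P(B | A) can still yield a small posterior P(A | B) when the prior P(A) is small.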