In other words, Naïve Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. This assumption is called class conditional independence. It is made to simplify the computation and is, in this sense, considered "Naïve".

This is a fairly strong assumption and often does not hold. However, the resulting bias in the estimated probabilities frequently makes no difference in practice: it is the order of the probabilities, not their exact values, that determines the classification.
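Stated formally (a standard restatement of the assumption, not a formula taken from this text): for a class C and variable values x1, ..., xn, class conditional independence means

    P(x1, ..., xn | C) = P(x1 | C) × P(x2 | C) × ... × P(xn | C)

so, by Bayes' theorem, the classifier only has to compare P(C) × P(x1 | C) × ... × P(xn | C) across the classes and assign the class with the largest product.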

                                   Naive Bayes Classifier

The Naive Bayes classifier technique is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.
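To make the mechanics concrete, here is a minimal sketch of such a classifier for categorical variables, built by counting value frequencies per class. The names fit and predict and the toy records are illustrative assumptions, not part of this text or of any particular library; a practical implementation would also add smoothing for values unseen in training.

    from collections import Counter, defaultdict

    def fit(records, labels):
        """Estimate class counts and per-variable value counts per class."""
        priors = Counter(labels)                 # class -> number of objects
        value_counts = defaultdict(Counter)      # (class, variable index) -> value counts
        for record, label in zip(records, labels):
            for i, value in enumerate(record):
                value_counts[(label, i)][value] += 1
        return priors, value_counts, len(labels)

    def predict(record, priors, value_counts, total):
        """Choose the class with the largest P(C) * product of P(x_i | C)."""
        best_label, best_score = None, -1.0
        for label, class_count in priors.items():
            score = class_count / total          # prior P(C)
            for i, value in enumerate(record):
                # likelihood P(x_i | C), estimated from the counts
                score *= value_counts[(label, i)][value] / class_count
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    # Usage: two GREEN objects and one RED object, each with two variables
    records = [("round", "small"), ("round", "large"), ("square", "small")]
    labels = ["GREEN", "GREEN", "RED"]
    model = fit(records, labels)
    print(predict(("round", "small"), *model))   # prints GREEN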
[Illustration: a collection of objects, each labelled either GREEN or RED]
To demonstrate the concept of Naïve Bayes classification, consider the example displayed in the illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., to decide which class label they belong to, based on the currently existing objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which has not been observed yet) is twice as likely to have membership GREEN rather than RED. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen.
                                   Thus, we can write:
Prior probability for GREEN ∝ Number of GREEN objects / Total number of objects

Prior probability for RED ∝ Number of RED objects / Total number of objects
                                   Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for
                                   class membership are:


Prior probability for GREEN ∝ 40 / 60

Prior probability for RED ∝ 20 / 60
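As a quick check of the arithmetic, the same priors can be computed directly from the counts (a trivial sketch; the 40 and 20 are the class counts from the example above):

    # Priors from the class counts in the GREEN/RED example
    green_count, red_count = 40, 20
    total = green_count + red_count       # 60 objects in all
    prior_green = green_count / total     # 40/60, about 0.667
    prior_red = red_count / total         # 20/60, about 0.333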








