
Unit 4: Data Mining Classification




          Each term in Bayes’ theorem has a conventional name:
          1.   P(A) is the prior probability or marginal probability of A. It is “prior” in the sense that it
               does not take into account any information about B.
          2.   P(A|B) is the conditional probability of A, given B. It is also called the posterior probability
               because it is derived from or depends upon the specified value of B.

          3.   P(B|A) is the conditional probability of B given A.
          4.   P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
          Intuitively, Bayes’ theorem in this form describes the way in which one’s beliefs about observing
          ‘A’ are updated by having observed ‘B’.
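          For a two-hypothesis problem, the normalizing constant expands as P(B) = P(B|A)P(A) + P(B|¬A)P(¬A). The following Python sketch shows this update; the prior and likelihood values are made-up illustrative numbers, not taken from the text:

          # A minimal sketch of Bayes' theorem for two hypotheses (A and not-A).
          # The prior and likelihood values below are illustrative only.

          def posterior(prior_a, p_b_given_a, p_b_given_not_a):
              """Return P(A|B) = P(B|A)P(A) / P(B)."""
              prior_not_a = 1.0 - prior_a
              # P(B) = P(B|A)P(A) + P(B|not-A)P(not-A) acts as the normalizing constant
              p_b = p_b_given_a * prior_a + p_b_given_not_a * prior_not_a
              return p_b_given_a * prior_a / p_b

          # Example: P(A) = 0.01, P(B|A) = 0.90, P(B|not-A) = 0.05
          print(posterior(0.01, 0.90, 0.05))   # about 0.154, the belief in A after observing B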

          Bayes’ Theorem: Example

          Use training examples to estimate the class-conditional probability density functions for white
          blood cell count (W).

          [Figure: estimated class-conditional densities P(W|D_i) plotted against white blood cell count (W), for W ranging from 0 to 60]

          These estimated densities could then be used to select the maximum likelihood hypothesis.
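          As an illustration, the sketch below models each class-conditional density P(W|D_i) as a Gaussian whose mean and standard deviation would be estimated from the training examples; the parameter values and class names (D1, D2) used here are hypothetical:

          import math

          def gaussian_pdf(x, mean, std):
              """Value of a Gaussian density with the given mean and standard deviation."""
              return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

          # Hypothetical parameters for two diseases D1 and D2, as if estimated from training data
          class_params = {
              "D1": (30.0, 5.0),    # (mean, std) of W among patients with disease D1
              "D2": (45.0, 7.0),    # (mean, std) of W among patients with disease D2
          }

          def maximum_likelihood_hypothesis(w):
              # Choose the class D_i whose likelihood P(W = w | D_i) is largest
              return max(class_params, key=lambda d: gaussian_pdf(w, *class_params[d]))

          print(maximum_likelihood_hypothesis(38))   # prints the class with the higher likelihood at W = 38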

          4.4 Naive Bayesian Classification


          Suppose your data consist of fruits, described by their color and shape. Bayesian classifiers
          operate by asking: “If you see a fruit that is red and round, which type of fruit is it most likely
          to be, based on the observed data sample? In the future, classify red and round fruit as that type
          of fruit.”
          A difficulty arises when you have more than a few variables and classes: you would require an
          enormous number of observations (records) to estimate these probabilities.
          Naive Bayes classification gets around this problem by not requiring that you have many
          observations for each possible combination of the variables. Rather, the variables are assumed to
          be independent of one another, and therefore the probability that a fruit that is red, round, firm,
          3” in diameter, etc. will be an apple can be calculated from the independent probabilities that a
          fruit is red, that it is round, that it is firm, that it is 3” in diameter, etc., as sketched below.
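          The following Python sketch applies this independence assumption to a small, entirely hypothetical training table of (color, shape, label) records; counts for each attribute are kept separately and multiplied together with the class prior:

          from collections import Counter, defaultdict

          # Hypothetical training records: (color, shape, fruit label)
          training = [
              ("red", "round", "apple"), ("red", "round", "apple"),
              ("green", "round", "apple"), ("red", "oblong", "pepper"),
              ("green", "oblong", "pepper"), ("red", "round", "cherry"),
          ]

          labels = Counter(rec[-1] for rec in training)
          # Per-attribute counts, kept separately because of the independence assumption
          attr_counts = defaultdict(Counter)
          for color, shape, lbl in training:
              attr_counts[("color", lbl)][color] += 1
              attr_counts[("shape", lbl)][shape] += 1

          def classify(color, shape):
              scores = {}
              for lbl, n in labels.items():
                  prior = n / len(training)
                  p_color = attr_counts[("color", lbl)][color] / n
                  p_shape = attr_counts[("shape", lbl)][shape] / n
                  scores[lbl] = prior * p_color * p_shape   # product of independent estimates
              return max(scores, key=scores.get)

          print(classify("red", "round"))   # most likely fruit type given the observed sample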





