In other words, Naïve Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. This assumption is called class conditional independence. It is made to simplify the computation and is, in this sense, considered "Naïve".

This is a fairly strong assumption and often does not hold. However, the resulting bias in the estimated probabilities frequently makes no difference in practice: it is the order of the probabilities, not their exact values, that determines the classification.
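Stated formally (a standard restatement of the assumption, not a formula taken from this text): for a class C and variable values x1, ..., xn, class conditional independence means

    P(x1, ..., xn | C) = P(x1 | C) × P(x2 | C) × ... × P(xn | C)

so, by Bayes' theorem, the classifier only has to compare P(C) × P(x1 | C) × ... × P(xn | C) across the classes and assign the class with the largest product.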

                                   Naive Bayes Classifier

The Naive Bayes classifier technique is based on Bayes' theorem and is particularly suited to problems where the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.
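To make the mechanics concrete, here is a minimal sketch of such a classifier for categorical variables, built by counting value frequencies per class. The names fit and predict and the toy records are illustrative assumptions, not part of this text or of any particular library; a practical implementation would also add smoothing for values unseen in training.

    from collections import Counter, defaultdict

    def fit(records, labels):
        """Estimate class counts and per-variable value counts per class."""
        priors = Counter(labels)                 # class -> number of objects
        value_counts = defaultdict(Counter)      # (class, variable index) -> value counts
        for record, label in zip(records, labels):
            for i, value in enumerate(record):
                value_counts[(label, i)][value] += 1
        return priors, value_counts, len(labels)

    def predict(record, priors, value_counts, total):
        """Choose the class with the largest P(C) * product of P(x_i | C)."""
        best_label, best_score = None, -1.0
        for label, class_count in priors.items():
            score = class_count / total          # prior P(C)
            for i, value in enumerate(record):
                # likelihood P(x_i | C), estimated from the counts
                score *= value_counts[(label, i)][value] / class_count
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    # Usage: two GREEN objects and one RED object, each with two variables
    records = [("round", "small"), ("round", "large"), ("square", "small")]
    labels = ["GREEN", "GREEN", "RED"]
    model = fit(records, labels)
    print(predict(("round", "small"), *model))   # prints GREEN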
[Illustration: a collection of objects, each labelled either GREEN or RED]
To demonstrate the concept of Naïve Bayes classification, consider the example displayed in the illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., to decide which class label they belong to, based on the currently existing objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which has not been observed yet) is twice as likely to have membership GREEN rather than RED. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen.
                                   Thus, we can write:
Prior probability for GREEN ∝ Number of GREEN objects / Total number of objects

Prior probability for RED ∝ Number of RED objects / Total number of objects
                                   Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for
                                   class membership are:


Prior probability for GREEN ∝ 40 / 60

Prior probability for RED ∝ 20 / 60
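As a quick check of the arithmetic, the same priors can be computed directly from the counts (a trivial sketch; the 40 and 20 are the class counts from the example above):

    # Priors from the class counts in the GREEN/RED example
    green_count, red_count = 40, 20
    total = green_count + red_count       # 60 objects in all
    prior_green = green_count / total     # 40/60, about 0.667
    prior_red = red_count / total         # 20/60, about 0.333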








