For example, to decide whether to play golf or not, consider the following decision tree (see Figure 4.4).

Figure 4.4: Decision tree for the golf example
In order to determine the decision (classification) for a given set of weather conditions from the decision tree, first look at the value of Outlook. There are three possibilities (a code sketch of this traversal follows the list):
1.   If the value of Outlook is sunny, next consider the value of Humidity. If the value is less than or equal to 75, the decision is play. Otherwise the decision is don’t play.
2.   If the value of Outlook is overcast, the decision is play.
3.   If the value of Outlook is rain, next consider the value of Windy. If the value is true, the decision is don’t play; otherwise the decision is play.
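Read as a program, this traversal is simply a chain of nested conditionals. The following Python sketch mirrors Figure 4.4; the function and argument names are ours, while the attribute tests and the threshold of 75 come from the tree itself.

    # Decision procedure for the golf tree of Figure 4.4.
    def classify_golf(outlook, humidity, windy):
        if outlook == "sunny":
            # Sunny branch: the decision depends on humidity.
            return "play" if humidity <= 75 else "don't play"
        if outlook == "overcast":
            # Overcast days are always classified as play.
            return "play"
        # Remaining branch: rain; the decision depends on wind.
        return "don't play" if windy else "play"

    print(classify_golf("sunny", 70, False))   # play
    print(classify_golf("rain", 80, True))     # don't play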
Decision trees are useful for predicting discrete outcomes. Applying the decision tree algorithm to a training dataset results in the formation of a tree that allows the user to map a path to a predicted outcome. At every internal node of the tree, the user answers a question about an attribute (makes a “decision”), and each leaf gives the final classification, such as “play” or “don’t play”.
The decision tree algorithm would be useful for a bank that wants to ascertain the characteristics of good customers. In this case, the predicted outcome is whether or not the applicant represents a bad credit risk. The outcome of a decision tree may be a Yes/No result (the applicant is/is not a bad credit risk) or a list of numeric values, with each value assigned a probability.
          The training dataset consists of the historical data collected from past loans. Attributes that affect
          credit risk might include the customer’s educational level, the number of kids the customer has,
          or the total household income. Each split on the tree represents a decision that influences the final
          predicted variable. For example, a customer who graduated from high school may be more likely
          to pay back the loan. The variable used in the first split is considered the most significant factor.
          So if educational level is in the first split, it is the factor that most influences credit risk.
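As a concrete illustration, the sketch below fits such a tree with scikit-learn’s DecisionTreeClassifier. The attribute columns follow the text (education level, number of kids, household income), but the data values are invented purely for illustration; a real model would be trained on the historical loan data described above.

    from sklearn.tree import DecisionTreeClassifier

    # Toy training set: [years of education, number of kids, income in $1000s]
    X = [[12, 2, 45], [16, 0, 90], [10, 3, 30],
         [14, 1, 60], [8, 4, 25], [18, 0, 120]]
    # Target: 1 = bad credit risk, 0 = good credit risk
    y = [1, 0, 1, 0, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

    applicant = [[13, 2, 55]]
    print(tree.predict(applicant))        # Yes/No result, e.g. [0]
    print(tree.predict_proba(applicant))  # a probability for each class

The first split of the fitted tree (viewable with sklearn.tree.export_text) identifies the attribute the model treats as the most significant factor, matching the reasoning above.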

          Decision trees have been used in many application areas ranging from medicine to game theory
          and business. Decision trees are the basis of several commercial rule induction systems.

4.7.1 Basic Algorithm for Learning Decision Trees

Algorithm: Generate a decision tree from the given training data.
Input: The training samples, samples, represented by discrete-valued attributes; the set of candidate attributes, attribute-list.
Output: A decision tree.
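As a rough Python rendering of this basic recursive procedure (in the spirit of ID3, choosing splits by information gain), consider the sketch below. The helper names entropy and build_tree are ours; samples (a list of attribute-value records) and attribute_list correspond to the inputs just listed.

    import math
    from collections import Counter

    def entropy(samples, target):
        # Shannon entropy of the class labels in samples.
        counts = Counter(s[target] for s in samples)
        total = len(samples)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def build_tree(samples, attribute_list, target="class"):
        labels = [s[target] for s in samples]
        if len(set(labels)) == 1:
            return labels[0]                              # all samples agree: leaf
        if not attribute_list:
            return Counter(labels).most_common(1)[0][0]   # majority-class leaf

        def gain(attr):
            # Information gain of splitting samples on attr.
            remainder = 0.0
            for v in set(s[attr] for s in samples):
                subset = [s for s in samples if s[attr] == v]
                remainder += len(subset) / len(samples) * entropy(subset, target)
            return entropy(samples, target) - remainder

        best = max(attribute_list, key=gain)              # attribute with highest gain
        rest = [a for a in attribute_list if a != best]
        # One branch per observed value of the chosen attribute.
        return {best: {v: build_tree([s for s in samples if s[best] == v], rest, target)
                       for v in set(s[best] for s in samples)}}

    # Usage on a fragment of the golf data:
    data = [{"outlook": "sunny", "windy": "false", "class": "don't play"},
            {"outlook": "overcast", "windy": "true", "class": "play"},
            {"outlook": "rain", "windy": "true", "class": "don't play"},
            {"outlook": "rain", "windy": "false", "class": "play"}]
    print(build_tree(data, ["outlook", "windy"]))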





