Page 170 - DCAP208_Management Support Systems
Unit 10: Data Mining Tools and Techniques
have been used for problems ranging from credit card attrition prediction to time series prediction
of the exchange rates of different international currencies. There are also some problems where
decision trees will not do as well. Some very simple problems, where the prediction is just a
simple multiple of the predictor, can be solved much more quickly and easily by linear regression.
Note: Usually the models to be built and the interactions to be detected are much more
complex in real-world problems, and this is where decision trees excel.
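As a small illustration of the simple case described above, where the target is just a multiple of the predictor, the sketch below fits a one-parameter linear model by least squares. The data and the function name fit_through_origin are hypothetical and chosen only for illustration.

```python
# Hypothetical data: the target is exactly 3 times the predictor.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x for x in xs]


def fit_through_origin(xs, ys):
    """Least-squares slope for the model y = slope * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


slope = fit_through_origin(xs, ys)

# The fitted line also extends to a predictor value outside the training range.
prediction = slope * 10.0
```

A single multiplication recovers the relationship here. A decision tree would typically have to approximate the same straight line with a series of step-like splits, which is one reason linear regression is the quicker choice for problems of this kind.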
Using Decision Trees for Exploration
The decision tree technology can be used for exploration of the dataset and business problem.
This is often done by looking at the predictors and values that are chosen for each split of the
tree. Often these predictors provide usable insights or raise questions that need to be
answered. For instance, if you ran across the following rule in your database for cellular phone churn,
you might seriously wonder about the way your tele-sales operators were making their calls
and maybe change the way that they are compensated: “IF customer lifetime < 1.1 years AND
sales channel = tele-sales THEN chance of churn is 65%.”
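One way to follow up on a rule like the one quoted above is to compare the churn rate inside the segment it describes with the overall churn rate. The sketch below does this on a handful of made-up customer records. The field names and the data are assumptions for illustration only, and the figures it produces are not the 65% from the text.

```python
# Hypothetical customer records; field names and values are invented.
customers = [
    {"lifetime_years": 0.5, "channel": "tele-sales", "churned": True},
    {"lifetime_years": 0.8, "channel": "tele-sales", "churned": True},
    {"lifetime_years": 1.0, "channel": "tele-sales", "churned": False},
    {"lifetime_years": 0.3, "channel": "tele-sales", "churned": True},
    {"lifetime_years": 2.5, "channel": "tele-sales", "churned": False},
    {"lifetime_years": 0.9, "channel": "retail", "churned": False},
    {"lifetime_years": 3.0, "channel": "retail", "churned": True},
    {"lifetime_years": 4.2, "channel": "web", "churned": False},
]


def rule_matches(customer):
    """The IF part of the rule: lifetime < 1.1 years AND channel = tele-sales."""
    return customer["lifetime_years"] < 1.1 and customer["channel"] == "tele-sales"


def churn_rate(records):
    """Fraction of the given records that churned."""
    return sum(1 for r in records if r["churned"]) / len(records)


segment = [c for c in customers if rule_matches(c)]
segment_rate = churn_rate(segment)
overall_rate = churn_rate(customers)
```

A segment rate well above the overall rate is the kind of finding that would prompt a closer look at how that sales channel operates.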
Using Decision Trees for Data Preprocessing
Another way that the decision tree technology has been used is for preprocessing data for other
prediction algorithms. Because the algorithm is fairly robust with respect to a variety of predictor
types (e.g. numeric, categorical, etc.) and because it can be run relatively quickly, decision trees
can be used on the first pass of a data mining run to create a subset of possibly useful predictors
that can then be fed into neural networks, nearest neighbor and standard statistical routines,
which can take a considerable amount of time to run if there are large numbers of possible
predictors to be used in the model.
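The screening idea described above can be sketched roughly as follows: score each predictor by the best single split it offers, then keep only the top-scoring ones for the slower algorithm. This is a simplified stand-in for a full decision tree run. It assumes numeric predictors, a binary target, and Gini impurity as the split measure, and all names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gain(values, labels):
    """Largest impurity reduction from any single threshold split on one predictor."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


# Hypothetical data: 1 = churned, 0 = stayed.
churned = [1, 1, 1, 0, 0, 0, 1, 0]
predictors = {
    "tenure_years": [0.5, 0.8, 1.0, 2.0, 3.0, 4.0, 0.3, 5.0],
    "monthly_bill": [20, 50, 30, 40, 20, 50, 40, 30],
    "age": [25, 30, 45, 50, 35, 60, 28, 40],
}

gains = {name: best_gain(values, churned) for name, values in predictors.items()}
ranked = sorted(gains, key=gains.get, reverse=True)

# Keep only the two most promising predictors for the downstream model.
selected = ranked[:2]
```

In this toy data, tenure_years separates the two classes cleanly and monthly_bill offers no useful split, so only tenure_years and age would be passed on.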
Decision Trees for Prediction
Although some forms of decision trees were initially developed as exploratory tools to refine
and preprocess data for more standard statistical techniques like logistic regression, they have
also been used, and are increasingly being used, for prediction. This is interesting because
many statisticians will still use decision trees for exploratory analysis, effectively building a
predictive model as a by-product, but then ignore the predictive model in favor of techniques
that they are most comfortable with. Sometimes veteran analysts will do this even when the
predictive model is superior to that produced by other techniques. With a host of new
products and skilled users now appearing, this tendency to use decision trees only for exploration
now seems to be changing.
The First Step is Growing the Tree
The first step in the process is that of growing the tree. Specifically, the algorithm seeks to create
a tree that works as perfectly as possible on all the data that is available. Most of the time it is not
possible to have the algorithm work perfectly. There is always noise in the database to some
degree (for example, there are variables that are not being collected that have an impact on the target you are
trying to predict).
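The effect of noise described above can be made concrete with a small sketch. When two records have identical values for every collected predictor but different targets, no tree built on those predictors can classify both correctly. The code below computes the best accuracy any such model could reach on a made-up dataset; the records and the function name best_possible_accuracy are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (predictor values, target). The first three records
# share the same predictor values but do not all share the same target.
records = [
    (("under_40", "tele-sales"), "churn"),
    (("under_40", "tele-sales"), "churn"),
    (("under_40", "tele-sales"), "stay"),
    (("over_40", "retail"), "stay"),
    (("over_40", "retail"), "stay"),
    (("under_40", "retail"), "churn"),
]


def best_possible_accuracy(records):
    """Upper bound on accuracy for any model that sees only these predictors."""
    groups = defaultdict(Counter)
    for predictor_values, target in records:
        groups[predictor_values][target] += 1
    # Within each group of identical records, at best the majority target is right.
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(records)


ceiling = best_possible_accuracy(records)
```

Here one of the six records is bound to be misclassified, so the ceiling is five out of six. A variable that was never collected might explain the odd record, which is the sense in which uncollected variables show up as noise.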
The name of the game in growing the tree is in finding the best possible question to ask at each
branch point of the tree. At the bottom of the tree you will come up with nodes that you would
like to be all of one type or the other. Thus the question: “Are you over 40?” probably does not