Page 170 - DCAP208_Management Support Systems

Unit 10: Data Mining Tools and Techniques




          have been used for problems ranging from credit card attrition prediction to time series prediction
          of the exchange rates of different international currencies. There are also some problems where
          decision trees will not do as well. Some very simple problems, where the prediction is just a
          simple multiple of the predictor, can be solved much more quickly and easily by linear regression.
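To illustrate the point about simple multiples, the sketch below fits a straight line by ordinary least squares to made-up data in which the target is exactly three times the predictor. The data, the function name `fit_line` and the factor of 3 are assumptions chosen for illustration only.

```python
# Hypothetical data: the target is exactly 3 times the predictor.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)

# The fitted line also covers predictor values outside the training range.
prediction_at_10 = slope * 10.0 + intercept
print(slope, intercept, prediction_at_10)
```

A single slope captures the whole relationship here. A decision tree predicts one constant value per leaf, so it would need many splits to approximate the same line and could not extend it beyond the range of the training data.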




             Notes  Usually the models to be built and the interactions to be detected are much more
            complex in real-world problems, and this is where decision trees excel.

          Using Decision Trees for Exploration

          Decision tree technology can be used to explore the dataset and the business problem.
          This is often done by looking at the predictors and values that are chosen for each split of the
          tree. Often these predictors provide usable insights or raise questions that need to be
          answered. For instance, if you ran across the following rule in your database for cellular phone churn,
          you might seriously wonder about the way your tele-sales operators were making their calls
          and maybe change the way that they are compensated: “IF customer lifetime < 1.1 years AND
          sales channel = tele-sales THEN chance of churn is 65%.”
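A rule like this can be checked directly against the data by comparing the churn rate inside the segment with the overall churn rate. The following is a minimal sketch on invented records; the field layout, the sample values and the helper `churn_rate` are assumptions, and the resulting rates are not the 65% quoted in the rule.

```python
# Hypothetical customer records: (lifetime_years, sales_channel, churned)
customers = [
    (0.5, "tele-sales", True),
    (0.9, "tele-sales", True),
    (1.0, "tele-sales", False),
    (0.8, "retail", False),
    (2.5, "tele-sales", False),
    (3.0, "retail", False),
    (0.4, "retail", True),
    (4.2, "retail", False),
]

def churn_rate(rows):
    """Fraction of rows whose churned flag is True (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for _, _, churned in rows if churned) / len(rows)

# The rule's condition: lifetime < 1.1 years AND channel = tele-sales.
segment = [r for r in customers if r[0] < 1.1 and r[1] == "tele-sales"]

segment_rate = churn_rate(segment)
overall_rate = churn_rate(customers)
print(f"segment: {segment_rate:.2f}, overall: {overall_rate:.2f}")
```

A segment rate well above the overall rate is what would prompt a closer look at how that sales channel operates.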

          Using Decision Trees for Data Preprocessing

          Decision tree technology has also been used to preprocess data for other
          prediction algorithms. Because the algorithm is fairly robust with respect to a variety of predictor
          types (e.g. numeric, categorical, etc.) and because it can be run relatively quickly, decision trees
          can be used on the first pass of a data mining run to create a subset of possibly useful predictors.
          That subset can then be fed into neural networks, nearest neighbor and standard statistical routines,
          which can take a considerable amount of time to run if there are large numbers of possible
          predictors to be used in the model.
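One way to picture this first pass is to score every predictor by how well a single tree-style split on it separates the target, and keep only the best-scoring ones. The sketch below assumes Gini impurity as the scoring criterion (the text does not name one) and uses invented data; `gini`, `best_split_gain` and the column names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split_gain(values, labels):
    """Largest impurity reduction from one binary split on this predictor."""
    n = len(labels)
    parent = gini(labels)
    numeric = all(isinstance(v, (int, float)) for v in values)
    best = 0.0
    for cut in set(values):
        if numeric:
            # Numeric predictor: split at a threshold.
            left = [y for v, y in zip(values, labels) if v <= cut]
            right = [y for v, y in zip(values, labels) if v > cut]
        else:
            # Categorical predictor: one category against the rest.
            left = [y for v, y in zip(values, labels) if v == cut]
            right = [y for v, y in zip(values, labels) if v != cut]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

# Hypothetical records; "shoe_size" is deliberately unrelated to churn.
data = {
    "lifetime_years": [0.5, 0.9, 1.0, 2.5, 3.0, 4.2, 0.4, 3.5],
    "channel": ["tele", "tele", "tele", "retail",
                "retail", "retail", "tele", "retail"],
    "shoe_size": [9, 10, 9, 10, 9, 10, 10, 9],
}
churned = [1, 1, 1, 0, 0, 0, 1, 0]

gains = {name: best_split_gain(col, churned) for name, col in data.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
selected = ranked[:2]  # keep the two most promising predictors
print(selected)
```

The same scoring function handles numeric and categorical columns, which mirrors the robustness to predictor types described above. Only the selected columns would then be passed to the slower technique.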

          Decision Trees for Prediction

          Although some forms of decision trees were initially developed as exploratory tools to refine
          and preprocess data for more standard statistical techniques like logistic regression, they have
          also been used, and are increasingly being used, for prediction. This is interesting because
          many statisticians still use decision trees for exploratory analysis, effectively building a
          predictive model as a by-product, but then ignore that model in favor of techniques
          that they are most comfortable with. Sometimes veteran analysts will do this even when the
          decision tree's predictive model is superior to that produced by other techniques. With a host of new
          products and skilled users now appearing, this tendency to use decision trees only for exploration
          seems to be changing.

          The First Step is Growing the Tree

          The first step in the process is growing the tree. Specifically, the algorithm seeks to create
          a tree that works as perfectly as possible on all the data that is available. Most of the time it is not
          possible for the algorithm to work perfectly, because there is always some degree of noise in the
          database (there are variables that are not being collected that have an impact on the target you are
          trying to predict).
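The effect of noise can be made concrete: when two records have identical predictor values but different outcomes, no tree built on those predictors can classify both correctly. The sketch below computes the resulting ceiling on training accuracy for a small invented dataset; the records and the function name `best_possible_accuracy` are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: ((age_band, channel), churned). The first three rows
# share the same predictor values but do not all share the same outcome.
records = [
    (("under_40", "tele-sales"), 1),
    (("under_40", "tele-sales"), 0),
    (("under_40", "tele-sales"), 1),
    (("over_40", "retail"), 0),
    (("over_40", "retail"), 0),
    (("over_40", "tele-sales"), 1),
]

def best_possible_accuracy(records):
    """Upper bound on training accuracy using only these predictors."""
    outcomes = defaultdict(Counter)
    for predictors, target in records:
        outcomes[predictors][target] += 1
    # For each distinct predictor combination, the best any model can do
    # is to predict the majority outcome for that combination.
    correct = sum(max(counts.values()) for counts in outcomes.values())
    return correct / len(records)

ceiling = best_possible_accuracy(records)
print(f"best possible training accuracy: {ceiling:.3f}")
```

Here one of the six records is necessarily misclassified, however deep the tree is grown, because the variable that would explain the difference was never collected.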
          The name of the game in growing the tree is finding the best possible question to ask at each
          branch point of the tree. At the bottom of the tree you will come up with nodes that you would
          like to be all of one type or the other. Thus the question: “Are you over 40?” probably does not



                                           LOVELY PROFESSIONAL UNIVERSITY                                   163