Page 168 - DCAP208_Management Support Systems

Unit 10: Data Mining Tools and Techniques




          Table 10.2: Some of the Differences between the Nearest-Neighbor Data Mining
                                Technique and Clustering

          Nearest Neighbor                          Clustering

          Used for prediction as well as            Used mostly for consolidating data into a high-level
          consolidation.                            view and general grouping of records into like
                                                    behaviors.

          Space is defined by the problem to be     Space is defined as default n-dimensional space, or
          solved (supervised learning).             is defined by the user, or is a predefined space
                                                    driven by past experience (unsupervised learning).

          Generally only uses distance metrics to   Can use other metrics besides distance to determine
          determine nearness.                       nearness of two records, for example linking two
                                                    points together.

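
The supervised/unsupervised contrast in Table 10.2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the records (tenure in years, age), the labels, the starting cluster centers, and the choice of Euclidean distance. It is not a description of any particular tool.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two records of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_predict(labeled, query):
    """Supervised: return the label of the closest labeled record."""
    _, label = min(labeled, key=lambda rec: euclidean(rec[0], query))
    return label

def assign_to_clusters(records, centers):
    """Unsupervised: group unlabeled records by their nearest center."""
    groups = {i: [] for i in range(len(centers))}
    for rec in records:
        nearest = min(range(len(centers)), key=lambda i: euclidean(rec, centers[i]))
        groups[nearest].append(rec)
    return groups

# Hypothetical records: (tenure in years, age)
labeled = [((1.0, 30), "churn"), ((5.0, 60), "stay"), ((0.5, 25), "churn")]
prediction = nearest_neighbor_predict(labeled, (4.0, 58))
print(prediction)

records = [(1.0, 30), (5.0, 60), (0.5, 25), (6.0, 62)]
centers = [(1.0, 28), (5.5, 61)]  # assumed starting centers, not learned
groups = assign_to_clusters(records, centers)
print(groups)
```

The nearest-neighbor function needs labeled records because it makes a prediction, while the clustering function only groups records and never sees a label.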
          10.2.4 Decision Trees


          A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically,
          each branch of the tree is a classification question, and the leaves of the tree are partitions of the
          dataset with their classification.
          For example, if we were going to classify customers who churn (don’t renew their phone contracts)
          in the cellular telephone industry, a decision tree might look something like that found in
          Figure 10.2.
             Figure 10.2: A Decision Tree is a Predictive Model that makes a Prediction on the Basis of
                          a Series of Decisions, much Like the Game of 20 Questions

          50 Churners, 50 Non-Churners
          New Technology Phone?
              Yes: 30 Churners, 50 Non-Churners
                   Customer < 2.3 years?
                       Yes: 25 Churners, 10 Non-Churners
                            Age < 55?
                                Yes: 20 Churners, 0 Non-Churners
                                No:  5 Churners, 10 Non-Churners
                       No:  5 Churners, 40 Non-Churners
              No:  20 Churners, 0 Non-Churners
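
As a rough illustration, the tree in Figure 10.2 can be written as a nested structure and walked one question at a time. The field names (`new_technology_phone` and so on) are hypothetical, and labeling a leaf by its majority class is an assumption made for this sketch; the figure itself only gives the counts.

```python
# Internal nodes hold a yes/no question; leaves hold the record counts from Figure 10.2.
tree = {
    "question": "new_technology_phone",
    "yes": {
        "question": "customer_under_2_3_years",
        "yes": {
            "question": "age_under_55",
            "yes": {"churners": 20, "non_churners": 0},
            "no": {"churners": 5, "non_churners": 10},
        },
        "no": {"churners": 5, "non_churners": 40},
    },
    "no": {"churners": 20, "non_churners": 0},
}

def classify(node, customer):
    """Follow the answers down to a leaf, then return the leaf's majority class."""
    while "question" in node:
        node = node["yes"] if customer[node["question"]] else node["no"]
    # Assumption: a leaf predicts whichever class has more records in it.
    return "churner" if node["churners"] > node["non_churners"] else "non-churner"

# Hypothetical customer: new phone, short tenure, aged 55 or older.
customer = {
    "new_technology_phone": True,
    "customer_under_2_3_years": True,
    "age_under_55": False,
}
print(classify(tree, customer))
```

Each prediction asks at most three questions, which is the "20 Questions" behavior the caption describes.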



          You may notice some interesting things about the tree:

               It divides up the data on each branch point without losing any of the data (the number of
               total records in a given parent node is equal to the sum of the records contained in its two
               children).

               The number of churners and non-churners is conserved as you move up or down the tree.

               It is pretty easy to understand how the model is being built (in contrast to the models from
               neural networks or from standard statistics).
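
The first two observations can be checked arithmetically. The sketch below copies the counts from Figure 10.2; the helper name `is_conserved` is hypothetical.

```python
# (churners, non_churners) for each parent node and its Yes and No children, from Figure 10.2.
splits = {
    "New Technology Phone?": ((50, 50), (30, 50), (20, 0)),
    "Customer < 2.3 years?": ((30, 50), (25, 10), (5, 40)),
    "Age < 55?": ((25, 10), (20, 0), (5, 10)),
}

def is_conserved(parent, yes_child, no_child):
    """True if each class count in the parent equals the sum over its two children."""
    return all(p == y + n for p, y, n in zip(parent, yes_child, no_child))

for question, (parent, yes_child, no_child) in splits.items():
    print(question, is_conserved(parent, yes_child, no_child))
```

Because both class counts are conserved at every split, the total number of records is conserved as well.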




                                           LOVELY PROFESSIONAL UNIVERSITY                                   161