Page 74 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




[Figure 4.3: Effects of rotation and scaling on Euclidean distance]

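Figure 4.3 is not reproduced here. The minimal Python sketch below illustrates the general effect. The points, the rotation angle, and the scale factor of 10 are made-up values, not taken from the figure. Rotating every point leaves all Euclidean distances unchanged. Stretching only one axis, however, can change which point is the nearest neighbour. The helper names (`euclidean`, `rotate`, `scale`, `nearest`) are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rotate(point, theta):
    """Rotate a 2-D point counter-clockwise by theta radians about the origin."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def scale(point, sx, sy):
    """Scale each axis independently (non-uniform scaling)."""
    x, y = point
    return (sx * x, sy * y)

def nearest(query, candidates):
    """Return the name of the candidate point closest to query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

# Made-up example: "a" is 1 unit away along x, "b" is 2 units away along y.
query = (0.0, 0.0)
points = {"a": (1.0, 0.0), "b": (0.0, 2.0)}
print(nearest(query, points))  # a

# Rotating every point by the same angle preserves all pairwise distances.
theta = math.pi / 3
rotated = {k: rotate(v, theta) for k, v in points.items()}
print(nearest(rotate(query, theta), rotated))  # a

# Stretching the x-axis tenfold makes "a" 10 units away, so "b" becomes nearest.
scaled = {k: scale(v, 10.0, 1.0) for k, v in points.items()}
print(nearest(scale(query, 10.0, 1.0), scaled))  # b
```

A rotation is an orthogonal transformation, so it cannot change lengths of difference vectors. A non-uniform scaling changes them unevenly, and that is what lets the ranking flip.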
Other Distances

Many other distance measures exist, each suited to particular kinds of data. Edit distance fits sequence and text data, while the Tanimoto distance is suitable for data with binary-valued features.
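As a minimal illustration, the sketch below computes two of these measures; the function names are hypothetical. Edit distance is computed in its Levenshtein form by dynamic programming. The Tanimoto distance is computed on 0/1 vectors. It uses the Tanimoto coefficient T = c / (a + b - c), where a and b count the 1s in each vector and c counts positions where both are 1. The distance is assumed to be 1 - T, which is one common convention; other definitions exist.

```python
def edit_distance(s, t):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def tanimoto_distance(x, y):
    """1 - c / (a + b - c) for two equal-length 0/1 vectors (assumed convention)."""
    a, b = sum(x), sum(y)
    c = sum(1 for xi, yi in zip(x, y) if xi and yi)
    union = a + b - c
    if union == 0:  # both vectors are all zeros: treat them as identical
        return 0.0
    return 1.0 - c / union

print(edit_distance("kitten", "sitting"))                 # 3
print(tanimoto_distance([1, 1, 0, 1], [1, 0, 0, 1]))      # 1 - 2/3, about 0.333
```

The edit-distance loop keeps only two rows of the dynamic-programming table, so memory grows with the length of `t` rather than with the product of both lengths.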

Data normalization is one way to overcome the limitations of distance functions. For example, normalizing the data to the same scale can overcome the scaling problem of Euclidean distance; however, normalization may lead to information loss and lower classification accuracy.
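A minimal sketch of normalization before a distance computation follows. It assumes min-max normalization to [0, 1], which is one common choice; z-score standardization is another. The income/age records are made up. On the raw values, income (measured in thousands of dollars) swamps age (measured in years), so the nearest neighbour of the first record is chosen almost entirely by income. After rescaling, both attributes contribute. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_normalize(rows):
    """Rescale every column to [0, 1] via (v - min) / (max - min).
    A constant column is mapped to 0.0 to avoid division by zero."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def nearest_index(rows):
    """Index of the row (other than row 0) closest to row 0."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

# Hypothetical records: [annual income in dollars, age in years].
data = [
    [50_000, 25],  # query record
    [51_000, 60],  # similar income, very different age
    [58_000, 26],  # somewhat different income, similar age
    [90_000, 70],  # different on both attributes
]

print(nearest_index(data))                     # 1: income dominates raw distance
print(nearest_index(min_max_normalize(data)))  # 2: both attributes now count
```

The rescaled values depend on the minimum and maximum found in the data, so a single extreme record compresses every other value in its column toward one end of the range.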

4.7 Classification by Decision Tree

A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
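The structure just described can be sketched as follows. This is a minimal illustration, not a tree-induction algorithm. Each internal node tests one attribute and keys its branches by test outcome, and classification follows branches from the root until a leaf is reached. For simplicity, each leaf stores a single class label rather than a class distribution. The tree, its attributes (`age`, `student`, `credit_rating`) and its values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    """Leaf node: holds a class label."""
    label: str

@dataclass
class Node:
    """Internal node: tests one attribute; each branch is keyed by a test outcome."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree, record):
    """Start at the root and follow the branch matching each test outcome
    until a leaf is reached; return that leaf's class label."""
    node = tree
    while isinstance(node, Node):
        outcome = record[node.attribute]  # apply the test at this node
        node = node.branches[outcome]     # follow the matching branch
    return node.label

# Hypothetical tree; attribute names and values are made up for illustration.
root = Node("age", {
    "youth": Node("student", {"no": Leaf("no"), "yes": Leaf("yes")}),
    "middle_aged": Leaf("yes"),
    "senior": Node("credit_rating", {"fair": Leaf("yes"), "excellent": Leaf("no")}),
})

print(classify(root, {"age": "youth", "student": "yes"}))               # yes
print(classify(root, {"age": "senior", "credit_rating": "excellent"}))  # no
```

A record only needs values for the attributes tested along its own path from the root, which is why the second example never mentions `student`.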











Lovely Professional University