Data Warehousing and Data Mining
Figure 4.3: Effects of Rotation and Scaling on Euclidean Distance
Other Distances
There are also many other distances that can be used for different data. Edit distance fits sequence
and text data. The Tanimoto distance is suitable for data with binary-valued features.
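To make these two distances concrete, the following is a minimal sketch rather than a reference implementation. It assumes that "edit distance" means the common Levenshtein variant (unit-cost insertions, deletions and substitutions) and that the Tanimoto distance on binary vectors is one minus the ratio of shared 1-features to features set in either vector. The function names and the convention of returning 0.0 for two all-zero vectors are illustrative choices, not taken from the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed variant): the minimum number of
    single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tanimoto_distance(x, y) -> float:
    """Tanimoto distance for binary feature vectors of equal length:
    1 - (features set in both) / (features set in either)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for p, q in zip(x, y) if p and q)
    either = sum(1 for p, q in zip(x, y) if p or q)
    if either == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return 1.0 - both / either
```

For instance, edit_distance("kitten", "sitting") counts two substitutions and one insertion, while tanimoto_distance([1, 1, 0, 0], [1, 0, 1, 0]) compares one shared feature against three features present in either vector.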
Data normalization is one way to overcome the limitations of distance functions. For example,
normalizing the data to the same scale can overcome the scaling problem of Euclidean distance.
However, normalization may lead to information loss and lower classification accuracy.
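The scaling problem can be illustrated with a small sketch. It assumes min-max normalization (rescaling each feature to the range [0, 1]), which is only one of several possible normalization schemes, and it uses made-up data points with a hypothetical height feature in metres and an income feature in currency units. The function names are illustrative.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_normalize(rows):
    """Rescale every feature (column) to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Hypothetical data: (height in metres, income in currency units).
points = [(1.60, 30000.0), (1.90, 30100.0), (1.62, 31000.0), (1.75, 40000.0)]
a, b, c, _ = points

# On the raw data the income axis dominates, so b looks closer to a than c does.
raw_b_closer = euclidean(a, b) < euclidean(a, c)

# After rescaling, both features contribute comparably and c becomes the closer point.
na, nb, nc, _ = min_max_normalize(points)
scaled_c_closer = euclidean(na, nc) < euclidean(na, nb)
```

In this toy data the nearest neighbour of the first point changes once both features share a scale. The rescaled values no longer carry the original units or ranges, which is one way to read the information-loss caveat above.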
4.7 Classification by Decision Tree
A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node.
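The structure described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a tree-induction algorithm: the class names Node and Leaf, the classify helper, and the toy weather attributes and labels are all hypothetical, and the tree is written by hand rather than learned from data.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf node holds a class label."""
    label: str


@dataclass
class Node:
    """An internal node tests one attribute; each branch is one outcome of the test."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow branches from the root until a leaf is reached, then return its label."""
    node = tree
    while isinstance(node, Node):
        outcome = record[node.attribute]  # result of the test at this node
        node = node.branches[outcome]     # descend along the matching branch
    return node.label


# Hand-built toy tree (hypothetical attributes and classes); `root` is the topmost node.
root = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})
```

Calling classify(root, {"outlook": "sunny", "humidity": "normal"}) tests outlook at the root, follows the "sunny" branch to the humidity test, and ends at a leaf. A leaf could equally store a class distribution instead of a single label.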
Lovely Professional University