Page 73 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 73

Unit 4: Data Mining Classification




          city Block Distance                                                                   notes
          City block distance, sometimes called Manhattan distance is defines as

          Let x, y € X, where x = {x , x , …., x }and y = {y , y , …., y }.
                                      k
                                2
                                                       k
                              1
                                               1
                                                 2
                            k
          Then, d     (x, y) = ∑   │x  – y  │
                CityBlock     I=1   i  i
          This measure reflects the sum of the absolute distances along each coordinate axis. In Figure 4.2,
          the city block distance between P  and P  is given by
                                    1
                                          2
          D(P , P ) = │1 – 5│ + │3 – 1│ = 6
             1  2
          Although the city block distance is easy to compute, it is variant to scaling, rotation and many
          other transformations. In other words, the similarity is not preserved by the city block distance
          after these transformations. Such a distance measure would not be appropriate for many types of
          data (e.g., images) which may be invariant to rotation and scaling.
                         figure 4.2: city clock Distance between two points in 2D space

























          euclidean Distance

          Euclidean distance is the most common distance used as the dissimilarity measure. It is defined
          as







          Figure 4.3 illustrate the effects the rotations of scaling on Euclidean distance in a 2D space. It
          is obvious from Figure  4.3 that dissimilarity is preserved after rotation. But after scaling the
          x-axis, the dissimilarity between objects is changed. So Euclidean distance is invariant to rotation,
          but not to scaling. If rotation is the only acceptable operation for an image database, Euclidean
          distance would be a good choice.










                                           LoveLy professionaL university                                    67
   68   69   70   71   72   73   74   75   76   77   78