Page 73 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 4: Data Mining Classification
City Block Distance
City block distance, sometimes called Manhattan distance, is defined as follows.
Let $x, y \in X$, where $x = \{x_1, x_2, \ldots, x_k\}$ and $y = \{y_1, y_2, \ldots, y_k\}$. Then,
$$d_{CityBlock}(x, y) = \sum_{i=1}^{k} |x_i - y_i|$$
This measure reflects the sum of the absolute distances along each coordinate axis. In Figure 4.2,
the city block distance between $P_1$ and $P_2$ is given by
$$D(P_1, P_2) = |1 - 5| + |3 - 1| = 6$$
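The definition above translates directly into a short function. This is a minimal Python sketch, not taken from the text; the function name `city_block_distance` is hypothetical, and the coordinates $P_1 = (1, 3)$ and $P_2 = (5, 1)$ are an assumption inferred from the arithmetic in the worked example, since Figure 4.2 itself is not reproduced here.

```python
def city_block_distance(x, y):
    """Sum of absolute coordinate differences (Manhattan / L1 distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Hypothetical coordinates inferred from D(P1, P2) = |1 - 5| + |3 - 1|
p1 = (1, 3)
p2 = (5, 1)
print(city_block_distance(p1, p2))  # → 6
```

The same function works for any number of coordinates $k$, because it simply sums one absolute difference per axis.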
Although the city block distance is easy to compute, it is not invariant to scaling, rotation, and many
other transformations. In other words, the similarity between objects is not preserved by the city block
distance after these transformations. Such a distance measure would therefore not be appropriate for many
types of data (e.g., images) whose similarity should be invariant to rotation and scaling.
Figure 4.2: City Block Distance between Two Points in 2D Space
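The claim that city block distance changes under rotation and scaling can be checked numerically. The sketch below is an illustration under stated assumptions: the points $(1, 3)$ and $(5, 1)$ are hypothetical (inferred from the Figure 4.2 arithmetic), and the helpers `rotate` and `scale_x` are hypothetical names for a rotation about the origin and a stretch of the x-axis only.

```python
import math


def city_block_distance(x, y):
    """Sum of absolute coordinate differences (Manhattan / L1 distance)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def rotate(p, angle):
    """Rotate a 2D point p about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def scale_x(p, factor):
    """Stretch a 2D point along the x-axis only."""
    return (p[0] * factor, p[1])


p1, p2 = (1, 3), (5, 1)  # hypothetical example points
theta = math.pi / 4      # 45-degree rotation

print(city_block_distance(p1, p2))                                    # → 6
print(round(city_block_distance(rotate(p1, theta), rotate(p2, theta)), 3))  # → 5.657
print(city_block_distance(scale_x(p1, 2), scale_x(p2, 2)))            # → 10
```

Rotating both points by 45 degrees changes the distance from 6 to $4\sqrt{2} \approx 5.657$, and doubling the x-axis changes it to 10, so neither transformation preserves the city block distance.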
Euclidean Distance
Euclidean distance is the most commonly used dissimilarity measure. It is defined as
$$d_{Euclidean}(x, y) = \left( \sum_{i=1}^{k} (x_i - y_i)^2 \right)^{1/2}$$
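Euclidean distance is the square root of the sum of squared coordinate differences. The following is a minimal Python sketch; the function name `euclidean_distance` is hypothetical, and the points $(1, 3)$ and $(5, 1)$ are the same assumed coordinates as in the city block example, so the two measures can be compared directly.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


p1, p2 = (1, 3), (5, 1)  # hypothetical example points
print(round(euclidean_distance(p1, p2), 3))  # → 4.472
# Python 3.8+ also provides math.dist, which computes the same value.
print(round(math.dist(p1, p2), 3))           # → 4.472
```

For these points the Euclidean distance is $\sqrt{20} \approx 4.472$, smaller than the city block distance of 6, because it measures the straight-line path rather than the path along the axes.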
Figure 4.3 illustrates the effects of rotation and scaling on Euclidean distance in 2D space. Figure 4.3
shows that dissimilarity is preserved after rotation, but after the x-axis is scaled, the dissimilarity
between objects changes. Euclidean distance is therefore invariant to rotation but not to scaling. If
rotation is the only acceptable transformation for an image database, Euclidean distance would be a good
choice.
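The behaviour described for Figure 4.3 can be reproduced with a small experiment. This sketch does not use the actual points from Figure 4.3, which are not available here; the points $(1, 3)$ and $(5, 1)$ and the helpers `rotate` and `scale_x` are hypothetical, chosen only to show a rotation about the origin versus a stretch of the x-axis.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def rotate(p, angle):
    """Rotate a 2D point p about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def scale_x(p, factor):
    """Stretch a 2D point along the x-axis only."""
    return (p[0] * factor, p[1])


p1, p2 = (1, 3), (5, 1)  # hypothetical example points
theta = math.pi / 4      # 45-degree rotation

original = euclidean_distance(p1, p2)
rotated = euclidean_distance(rotate(p1, theta), rotate(p2, theta))
scaled = euclidean_distance(scale_x(p1, 2), scale_x(p2, 2))

print(round(original, 3), round(rotated, 3), round(scaled, 3))  # → 4.472 4.472 8.246
```

The rotated distance matches the original (up to floating-point rounding), while stretching only the x-axis changes it from $\sqrt{20}$ to $\sqrt{68}$. Scaling every axis by the same factor would multiply all distances equally; it is scaling one axis independently that distorts the relative dissimilarities.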
LoveLy professionaL university 67