Page 126 - DMGT308_CUSTOMER_RELATIONSHIP_MANAGEMENT
Unit 5: Closed Loop Marketing
is deferred until classification. It can also be used for regression. The k-nearest neighbour
algorithm is amongst the simplest of all machine learning algorithms. An object is classified by
a majority vote of its neighbours, with the object being assigned to the class most common
amongst its k nearest neighbours. k is a positive integer, typically small. If k = 1, then the object
is simply assigned to the class of its nearest neighbour. In binary (two-class) classification
problems, it is helpful to choose k to be an odd number, as this avoids tied votes.
The same method can be used for regression, by simply assigning the property value for the
object to be the average of the values of its k nearest neighbours. It can be useful to weight the
contributions of the neighbours, so that the nearer neighbours contribute more to the average
than the more distant ones.
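This weighted averaging can be sketched as follows; the toy data, the helper name `knn_regress`, and the inverse-distance weighting scheme are illustrative assumptions, not prescribed by the text:

```python
import math

def knn_regress(train, query, k=3, weighted=True):
    """Predict a value for `query` as the (optionally distance-weighted)
    average of the values of its k nearest neighbours.

    `train` is a list of (feature_vector, value) pairs; no explicit
    training step is needed -- the whole set is stored and searched.
    """
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda p: dist(p[0], query))[:k]
    if not weighted:
        # Plain average of the k nearest values
        return sum(v for _, v in nearest) / k
    # Inverse-distance weights: nearer neighbours contribute more
    eps = 1e-9  # guards against division by zero on exact matches
    weights = [1.0 / (dist(x, query) + eps) for x, _ in nearest]
    return sum(w * v for (_, v), w in zip(nearest, weights)) / sum(weights)

# 1-D toy data: values roughly track position
train = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]
print(knn_regress(train, (1.0,), k=3, weighted=False))  # → 2.0
```

With `weighted=True`, an exact match in the training set dominates the prediction, which is usually the desired behaviour for this weighting scheme.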
The neighbours are taken from a set of objects for which the correct classification (or, in the case
of regression, the value of the property) is known. This can be thought of as the training set for
the algorithm, though no explicit training step is required. In order to identify neighbours, the
objects are represented by position vectors in a multidimensional feature space. It is usual to use
the Euclidean distance, though other distance measures, such as the Manhattan distance, could in
principle be used instead. The k-nearest neighbour algorithm is sensitive to the local structure
of the data.
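The majority-vote classification described above can be sketched in a few lines; the two-class toy data and the function name `knn_classify` are illustrative assumptions:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by a majority vote of its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; there is no
    explicit training step -- the labelled set itself is the model.
    """
    # Euclidean distance in the multidimensional feature space
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two-class toy data; an odd k avoids tied votes.
train = [((1.0, 1.0), "blue"), ((1.5, 2.0), "blue"),
         ((5.0, 5.0), "red"), ((5.5, 4.5), "red"), ((6.0, 5.5), "red")]
print(knn_classify(train, (1.2, 1.5), k=3))  # → blue
```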
Figure 5.2: Example of k-NN Classification
The test sample (green circle) should be classified either to the first class of blue squares or to the second
class of red triangles. If k = 3 it is classified to the second class because there are 2 triangles and only 1
square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the
outer circle).
Source: http://scn.sap.com/docs/DOC-5036
Algorithm of Nearest Neighbour
The training examples are vectors in a multidimensional feature space. The space is partitioned
into regions by the locations and labels of the training samples. A point in the space is assigned
to a class c if c is the most frequent class label among the k nearest training samples.
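The k = 3 versus k = 5 flip shown in Figure 5.2 can be reproduced with a hypothetical point layout (the coordinates below are illustrative and not taken from the figure):

```python
import math
from collections import Counter

def classify(train, query, k):
    """Assign `query` the most frequent label among its k nearest samples."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Layout echoing the figure: 2 triangles and 1 square fall inside the
# "inner circle", with 2 further squares inside the "outer circle".
train = [((1.0, 0.0), "triangle"), ((0.0, 1.2), "triangle"),
         ((1.4, 0.0), "square"),   ((0.0, 1.6), "square"),
         ((1.8, 0.0), "square")]
query = (0.0, 0.0)
print(classify(train, query, k=3))  # → triangle
print(classify(train, query, k=5))  # → square
```

The same query point changes class as k grows, which illustrates why the algorithm is said to be sensitive to the local structure of the data.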
LOVELY PROFESSIONAL UNIVERSITY 121