
Unit 5: Closed Loop Marketing




is deferred until classification. It can also be used for regression. The k-nearest neighbour algorithm is amongst the simplest of all machine learning algorithms. An object is classified by a majority vote of its neighbours, with the object being assigned to the class most common amongst its k nearest neighbours. k is a positive integer, typically small. If k = 1, then the object is simply assigned to the class of its nearest neighbour. In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes.
          The same method can be used for regression, by simply assigning the property value for the
          object to be the average of the values of its k nearest neighbours. It can be useful to weight the
          contributions of the neighbours, so that the nearer neighbours contribute more to the average
          than the more distant ones.
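A minimal Python sketch of this regression rule; the function name knn_regress and the 1/distance weighting scheme are illustrative assumptions (other weighting functions would serve equally well). It averages the values of the k nearest neighbours, optionally letting nearer neighbours contribute more:

```python
import math

def knn_regress(query, training_set, k=3, weighted=True):
    """Predict a value for `query` from its k nearest neighbours.

    `training_set` is a list of (feature_vector, value) pairs.
    With weighted=True, each neighbour's value is weighted by
    1/distance so that nearer neighbours dominate the average.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank the training pairs by distance to the query; keep the k nearest.
    ranked = sorted((euclidean(x, query), v) for x, v in training_set)[:k]
    if not weighted:
        return sum(v for _, v in ranked) / k
    weights = [1.0 / max(d, 1e-9) for d, _ in ranked]  # guard exact matches
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)
```

For example, knn_regress((1.5,), [((1.0,), 2.0), ((2.0,), 3.0), ((4.0,), 5.0)], k=2) returns 2.5, since the two nearest neighbours are equidistant and their values are averaged.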
The neighbours are taken from a set of objects for which the correct classification (or, in the case of regression, the value of the property) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. In order to identify neighbours, the objects are represented by position vectors in a multidimensional feature space. It is usual to use the Euclidean distance, though other distance measures, such as the Manhattan distance, could in principle be used instead. The k-nearest neighbour algorithm is sensitive to the local structure of the data.
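A minimal Python sketch of the classification rule just described; the helper names (knn_classify, euclidean, manhattan) are illustrative. Note that the "training" step is simply storing the labelled vectors, with all computation deferred to query time:

```python
from collections import Counter
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance, a possible alternative metric."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(query, training_set, k=3, distance=euclidean):
    """Assign `query` the most common label among its k nearest neighbours.

    `training_set` is a list of (feature_vector, label) pairs. For binary
    problems, an odd k avoids tied votes, as noted above.
    """
    nearest = sorted(training_set, key=lambda pair: distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```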
Figure 5.2: Example of k-NN Classification

The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3, it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5, it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).

Source: http://scn.sap.com/docs/DOC-5036

Nearest Neighbour Algorithm

The training examples are vectors in a multidimensional feature space. The space is partitioned into regions by the locations and labels of the training samples. A point in the space is assigned to the class c if c is the most frequent class label among the k nearest training samples.
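As a quick numeric check, the condensed sketch below reproduces the k = 3 versus k = 5 flip of Figure 5.2. The coordinates are invented for illustration, arranged so that the three nearest neighbours of the query are two triangles and one square, while the five nearest are three squares and two triangles:

```python
from collections import Counter
import math

def knn_classify(query, training_set, k):
    """Condensed 2-D variant of the knn_classify sketch above."""
    nearest = sorted(
        training_set,
        key=lambda p: math.hypot(p[0][0] - query[0], p[0][1] - query[1]),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Distances from the query (0, 0): 1.0 (triangle), 1.1 (square),
# 1.2 (triangle), 1.5 (square), ~1.64 (square).
training = [
    ((0.0, 1.0), "triangle"),
    ((1.1, 0.0), "square"),
    ((0.0, -1.2), "triangle"),
    ((-1.5, 0.0), "square"),
    ((1.0, 1.3), "square"),
]
print(knn_classify((0.0, 0.0), training, k=3))  # triangle (second class)
print(knn_classify((0.0, 0.0), training, k=5))  # square (first class)
```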


