Usually, Euclidean distance is used as the distance metric; however, this works only with numerical values. For cases such as text classification, another metric, such as the overlap metric (or Hamming distance), can be used.
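As a minimal sketch, the two metrics can be written in Python as follows; the helper names euclidean and overlap are illustrative only and are not taken from any particular library.

import math

def euclidean(a, b):
    # Euclidean distance: only meaningful for numeric feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def overlap(a, b):
    # Overlap (Hamming-style) distance: number of positions whose symbols differ.
    return sum(1 for x, y in zip(a, b) if x != y)

print(euclidean([1.0, 2.0], [4.0, 6.0]))          # 5.0
print(overlap(list("karolin"), list("kathrin")))  # 3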

The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, the test sample (whose class is not known) is represented as a vector in the feature space. Distances from this new vector to all stored vectors are computed and the k closest samples are selected. There are a number of ways to assign the new vector to a class; one of the most widely used is to predict the most common class amongst its k nearest neighbours. A major drawback of this approach is that classes with more frequent examples tend to dominate the prediction, simply because their large number makes them more likely to appear among the k nearest neighbours. One way to overcome this problem is to take into account the distance of each of the k nearest neighbours from the new vector and to weight their votes by these distances.
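The distance-weighted variant can be sketched as follows. This is an illustrative Python implementation rather than a reference one: it assumes numeric feature vectors, Euclidean distance, and inverse-distance weighting as one possible way to weight the k nearest neighbours.

import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, new_vector, k=3):
    # train is a list of (feature_vector, class_label) pairs stored during the training phase.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], new_vector))[:k]
    votes = defaultdict(float)
    for vector, label in neighbours:
        # Weight each neighbour by the inverse of its distance so that close
        # samples count for more than merely frequent ones.
        votes[label] += 1.0 / (euclidean(vector, new_vector) + 1e-9)
    return max(votes, key=votes.get)

# Usage with made-up two-dimensional points.
train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"), ([5.0, 5.0], "B"), ([5.5, 4.8], "B")]
print(knn_predict(train, [1.1, 0.9], k=3))  # -> "A"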

                                   Decision Trees

A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules. With each successive division, the members of the resulting sets become more and more similar to one another. The familiar division of living things into kingdoms, phyla, classes, orders, families, genera, and species, invented by the Swedish botanist Carl Linnaeus in the 1730s, provides a good example. Within the animal kingdom, a particular animal is assigned to the phylum Chordata if it has a spinal cord. Additional characteristics are used to further subdivide the chordates into birds, mammals, reptiles, and so on. These classes are further subdivided until, at the lowest level in the taxonomy, members of the same species are not only morphologically similar but also capable of breeding and producing fertile offspring.
Decision trees are a simple knowledge representation that classifies examples into a finite number of classes. The nodes are labelled with attribute names, the edges are labelled with the possible values of those attributes, and the leaves are labelled with the different classes. An object is classified by following a path down the tree, taking the edges that correspond to the values of its attributes.

The following is an example of objects that describe the weather at a given time. The objects contain information on the outlook, humidity, and so on. Some objects are positive examples, denoted by P, and others are negative, denoted by N. Classification in this case means constructing a tree structure, illustrated in the following diagram, which can be used to classify all the objects correctly.

Figure 5.3: Decision Tree Structure

Source: http://scn.sap.com/docs/DOC-5036
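To make the idea concrete, the sketch below hand-codes a small weather tree of the kind shown in Figure 5.3 and classifies an object by following the edges that match its attribute values. The attributes and splits used here (outlook, humidity, windy) follow the classic weather data set and are an assumption for illustration; the exact tree in Figure 5.3 may differ.

# Each internal node names an attribute; each branch is labelled with one of
# its possible values; leaves hold the class labels P (positive) or N (negative).
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny":    {"attribute": "humidity",
                     "branches": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rain":     {"attribute": "windy",
                     "branches": {"true": "N", "false": "P"}},
    },
}

def classify(node, obj):
    # Follow the path down the tree until a leaf (class label) is reached.
    while isinstance(node, dict):
        node = node["branches"][obj[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high", "windy": "false"}))  # -> N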


