Page 127 - DMGT308_CUSTOMER_RELATIONSHIP_MANAGEMENT
Usually, Euclidean distance is used as the distance metric; however, it works only with
numerical values. In cases such as text classification, another metric, such as the overlap metric
(or Hamming distance), can be used.
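As a minimal sketch, the Hamming distance simply counts the positions at which two equal-length sequences differ; the function name and error handling below are illustrative choices, not taken from the text.

```python
def hamming(a, b):
    # Hamming (overlap) distance: number of positions where two
    # equal-length sequences differ; works for strings or token lists.
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))  # 3
```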
The training phase of the algorithm consists only of storing the feature vectors and class labels
of the training samples. In the classification phase, the test sample (whose class is not known) is
represented as a vector in the feature space. Distances from this new vector to all stored vectors
are computed, and the k closest samples are selected. There are several ways to assign the new
vector to a class; the most common is to predict the class that occurs most frequently among the
k nearest neighbours. A major drawback of this simple majority vote is that classes with more
frequent examples tend to dominate the prediction, simply because their large numbers make
them more likely to appear among the k nearest neighbours. One way to overcome this problem
is to weight each of the k nearest neighbours by its distance to the new vector and predict the
class of the new vector on the basis of these weighted votes.
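The phases above can be sketched as follows: training stores the samples, and classification computes distances and takes a distance-weighted vote, so that nearer neighbours count for more. The inverse-distance weighting scheme and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    # Euclidean distance between two numerical feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # train is a list of (feature_vector, label) pairs; "training" is
    # nothing more than keeping this list around.
    neighbours = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    # Weight each neighbour's vote by inverse distance, so a frequent
    # but distant class does not automatically dominate the vote.
    votes = defaultdict(float)
    for vec, label in neighbours:
        votes[label] += 1.0 / (euclidean(vec, query) + 1e-9)
    return max(votes, key=votes.get)

# Toy data: two classes P and N in a 2-D feature space.
train = [((1.0, 1.0), "P"), ((1.2, 0.9), "P"),
         ((5.0, 5.0), "N"), ((5.2, 4.8), "N")]
print(knn_predict(train, (1.1, 1.0), k=3))  # P
```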
Decision Trees
A decision tree is a structure that can be used to divide up a large collection of records into
successively smaller sets of records by applying a sequence of simple decision rules. With each
successive division, the members of the resulting sets become more and more similar to one
another. The familiar division of living things into kingdoms, phyla, classes, orders, families,
genera, and species, invented by the Swedish botanist Carl Linnaeus in the 1730s, provides a
good example. Within the animal kingdom, a particular animal is assigned to the phylum
chordata if it has a spinal cord. Additional characteristics are used to further subdivide the
chordates into the birds, mammals, reptiles, and so on. These classes are further subdivided
until, at the lowest level in the taxonomy, members of the same species are not only
morphologically similar, they are capable of breeding and producing fertile offspring.
Decision trees are a simple knowledge representation that classifies examples into a finite
number of classes: the nodes are labelled with attribute names, the edges with possible values
of those attributes, and the leaves with the different classes. An object is classified by following
a path down the tree, taking at each node the edge corresponding to the value of that attribute
in the object.
The following is an example of objects that describe the weather at a given time. The objects
contain information on the outlook, humidity, etc. Some objects are positive examples, denoted
by P, and others are negative, denoted by N. Classification in this case is the construction of a
tree structure, illustrated in the following diagram, which can be used to classify all the objects
correctly.
Figure 5.3: Decision Tree Structure
Source: http://scn.sap.com/docs/DOC-5036
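A tree like the one in the figure can be sketched as nested nodes of the form (attribute, branches), with leaves holding the class labels P or N. The attribute names and splits below follow the classic weather example but are illustrative assumptions, not read verbatim from the figure.

```python
# Hand-built decision tree: internal nodes are (attribute, {value: subtree})
# pairs, leaves are the class labels "P" (positive) or "N" (negative).
tree = ("outlook", {
    "sunny":    ("humidity", {"high": "N", "normal": "P"}),
    "overcast": "P",
    "rain":     ("windy", {True: "N", False: "P"}),
})

def classify(node, obj):
    # Follow the edge matching the object's value for each attribute
    # until a leaf (a plain class label) is reached.
    while isinstance(node, tuple):
        attr, branches = node
        node = branches[obj[attr]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))  # N
print(classify(tree, {"outlook": "overcast"}))                   # P
```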
122 LOVELY PROFESSIONAL UNIVERSITY