the objects or groups close to one another, until all of the groups are merged into one (the
topmost level of the hierarchy), or until a termination condition holds. The divisive approach,
also called the “top-down” approach, starts with all the objects in the same cluster. In each
successive iteration, a cluster is split up into smaller clusters, until eventually each object forms
its own cluster, or until a termination condition holds.
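To make the bottom-up idea concrete, the following Python sketch (an illustration added here, not part of the original text) uses SciPy's agglomerative linkage routine; the sample points and the cut-off of two flat clusters are assumed purely for demonstration.

# A minimal sketch of bottom-up (agglomerative) clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # one tight group
                   [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])  # another tight group

# linkage() repeatedly merges the two closest groups ("bottom-up")
# until everything sits in a single cluster at the top of the hierarchy.
merge_tree = linkage(points, method="average")

# Cutting the hierarchy at a chosen level acts as the termination
# condition mentioned above; here we ask for two flat clusters.
labels = fcluster(merge_tree, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 1 2 2 2]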
Hierarchical methods suffer from the fact that once a step (merge or split) is done, it can never be
undone. This rigidity is useful in that it leads to smaller computation costs by not worrying
about a combinatorial number of different choices. However, a major problem of such techniques
is that they cannot correct erroneous decisions.
It can be advantageous to combine iterative relocation and hierarchical agglomeration by first
using a hierarchical agglomerative algorithm and then refining the result using iterative
relocation. Some scalable clustering algorithms, such as BIRCH and CURE, have been developed
based on such an integrated approach.
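As a rough sketch of such an integrated approach (in the spirit of, though not identical to, BIRCH or CURE), the example below first builds an agglomerative partition and then refines it by iterative relocation with k-means; the synthetic data and parameter values are assumptions made only for illustration.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(50, 2))])

# Step 1: hierarchical agglomeration gives an initial partition.
init_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Step 2: the centroids of that partition seed k-means, which refines
# the result by iteratively relocating points between clusters.
centroids = np.array([X[init_labels == k].mean(axis=0) for k in range(2)])
refined = KMeans(n_clusters=2, init=centroids, n_init=1).fit(X)
print(refined.labels_[:10])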
Density-based Methods
Most partitioning methods cluster objects based on the distance between objects. Such methods
can find only spherical-shaped clusters and encounter difficulty in discovering clusters of arbitrary
shapes. Other clustering methods have been developed based on the notion of density. Their
general idea is to continue growing the given cluster as long as the density (number of objects
or data points) in the “neighbourhood” exceeds some threshold, i.e., for each data point within
a given cluster, the neighbourhood of a given radius has to contain at least a minimum number
of points. Such a method can be used to filter out noise (outliers), and discover clusters of
arbitrary shape.
DBSCAN is a typical density-based method which grows clusters according to a density threshold.
OPTICS is a density-based method which computes an augmented clustering ordering for
automatic and interactive cluster analysis.
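A minimal sketch of this density-based idea, assuming scikit-learn's DBSCAN implementation, is given below; the neighbourhood radius (eps) and the minimum point count (min_samples) are illustrative values rather than recommendations.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
dense_blob = rng.normal(loc=0.0, scale=0.3, size=(60, 2))       # compact cluster
xs = np.linspace(3.0, 6.0, 60)
curved_band = np.column_stack([xs, np.sin(xs) + 4.0])           # arbitrary-shaped cluster
outliers = np.array([[10.0, 10.0], [-8.0, 9.0]])                # sparse noise points
X = np.vstack([dense_blob, curved_band, outliers])

# A point joins a cluster only if its eps-neighbourhood contains at
# least min_samples points; points that never qualify are labelled -1.
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)
print(set(labels))   # cluster ids plus -1 for the filtered-out noise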
Grid-based Methods
Grid-based methods quantize the object space into a finite number of cells which form a grid
structure. All of the clustering operations are performed on the grid structure (i.e., on the
quantized space). The main advantage of this approach is its fast processing time which is
typically independent of the number of data objects, and dependent only on the number of cells
in each dimension in the quantized space. STING is a typical example of a grid-based method.
CLIQUE and WaveCluster are two clustering algorithms that are both grid-based and
density-based.
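The toy sketch below illustrates only the quantization idea behind grid-based methods; it is not STING, CLIQUE, or WaveCluster, and the grid resolution and uniform sample data are assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(low=0.0, high=10.0, size=(1000, 2))   # points in a 10x10 region

cells_per_dim = 5                                      # grid resolution per dimension
edges = np.linspace(0.0, 10.0, cells_per_dim + 1)

# Map each point to the index of its cell along every dimension.
cell_ids = np.stack([np.digitize(X[:, d], edges[1:-1]) for d in range(2)], axis=1)

# Per-cell summaries (here just counts); subsequent clustering works on
# this 5x5 grid, so its cost depends on the number of cells, not points.
counts = np.zeros((cells_per_dim, cells_per_dim), dtype=int)
np.add.at(counts, (cell_ids[:, 0], cell_ids[:, 1]), 1)
print(counts)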
Model-based Methods
Model-based methods hypothesize a model for each of the clusters, and find the best fit of the
data to the given model. A model-based algorithm may locate clusters by constructing a density
function that reflects the spatial distribution of the data points. It also leads to a way of
automatically determining the number of clusters based on standard statistics, taking “noise” or
outliers into account and thus yielding robust clustering methods.
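As a hedged illustration of the model-based idea (not any specific algorithm named in the text), the sketch below fits Gaussian mixture models with different numbers of components and keeps the one with the lowest BIC, a standard statistic for choosing the number of clusters; the synthetic data and candidate range are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
               rng.normal(loc=[6.0, 0.0], scale=1.0, size=(100, 2)),
               rng.normal(loc=[0.0, 8.0], scale=1.0, size=(100, 2))])

# Each candidate model hypothesises k Gaussian clusters and is fitted to
# the data; the Bayesian Information Criterion (BIC) balances fit against
# model complexity, giving a standard-statistics way to pick k.
candidates = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(1, 6)]
best = min(candidates, key=lambda m: m.bic(X))
print(best.n_components, best.predict(X)[:10])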
Nearest Neighbour
In pattern recognition, the k-nearest neighbour algorithm (k-NN) is a method for classifying
objects based on the closest training examples in the feature space. k-NN is a type of instance-based
learning, or lazy learning, where the function is only approximated locally and all computation