Page 125 - DMGT308_CUSTOMER_RELATIONSHIP_MANAGEMENT

the objects or groups close to one another, until all of the groups are merged into one (the topmost level of the hierarchy), or until a termination condition holds. The divisive approach, also called the "top-down" approach, starts with all the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually each object forms its own cluster, or until a termination condition holds.

Hierarchical methods suffer from the fact that once a step (a merge or a split) is done, it can never be undone. This rigidity is useful in that it leads to lower computation costs, because the algorithm need not worry about a combinatorial number of different choices. However, a major problem of such techniques is that they cannot correct erroneous decisions.
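As a minimal sketch (not from the text), the agglomerative, bottom-up procedure can be written in a few lines of Python: start with each object in its own cluster and repeatedly merge the two closest clusters until a termination condition holds — here, a target number of clusters k. The single-linkage distance and the sample points are illustrative assumptions.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(c1, c2):
    # Distance between two clusters = distance between their closest members.
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerate(points, k):
    # Start with each object in its own cluster (the bottom of the hierarchy).
    clusters = [[p] for p in points]
    # Merge the two closest clusters until the termination condition (k clusters).
    while len(clusters) > k:
        pairs = [(single_linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(len(c) for c in agglomerate(points, 2)))  # → [2, 2]
```

Note that each merge is final — the while-loop never revisits an earlier decision, which is exactly the rigidity the text describes.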

It can be advantageous to combine iterative relocation and hierarchical agglomeration by first using a hierarchical agglomerative algorithm and then refining the result using iterative relocation. Some scalable clustering algorithms, such as BIRCH and CURE, have been developed based on such an integrated approach.
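The integrated approach can be illustrated with a hedged sketch: take an initial partition (such as one produced by hierarchical agglomeration) and refine it by iterative relocation, k-means style. The data and the deliberately poor initial split are assumptions for illustration; this is not BIRCH or CURE itself.

```python
def centroid(cluster):
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def relocate(points, clusters, iterations=5):
    for _ in range(iterations):
        centers = [centroid(c) for c in clusters]
        new = [[] for _ in centers]
        for p in points:
            # Reassign each point to its nearest current centroid.
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            new[d.index(min(d))].append(p)
        clusters = [c for c in new if c]  # drop clusters that emptied out
    return clusters

# A deliberately bad initial split that relocation corrects:
points = [(0, 0), (0, 1), (9, 9), (9, 10)]
initial = [[(0, 0), (0, 1), (9, 9)], [(9, 10)]]
refined = relocate(points, initial)
print(sorted(sorted(c) for c in refined))  # → [[(0, 0), (0, 1)], [(9, 9), (9, 10)]]
```

The relocation pass moves (9, 9) to the cluster whose centroid it is actually closest to — the kind of correction a purely hierarchical method can never make.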

                                   Density-based Methods

Most partitioning methods cluster objects based on the distance between objects. Such methods can find only spherical-shaped clusters and have difficulty discovering clusters of arbitrary shapes. Other clustering methods have been developed based on the notion of density. Their general idea is to continue growing a given cluster as long as the density (number of objects or data points) in the "neighbourhood" exceeds some threshold; that is, for each data point within a given cluster, the neighbourhood of a given radius has to contain at least a minimum number of points. Such a method can be used to filter out noise (outliers) and discover clusters of arbitrary shape.

DBSCAN is a typical density-based method which grows clusters according to a density threshold. OPTICS is a density-based method which computes an augmented cluster ordering for automatic and interactive cluster analysis.
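The density-growing idea behind DBSCAN can be sketched compactly: a cluster keeps expanding while each point's eps-neighbourhood contains at least min_pts points, and points that fall in no dense neighbourhood are reported as noise (label -1). The parameter values and sample data below are illustrative assumptions, not the published algorithm's defaults.

```python
def region(points, i, eps):
    # Indices of all points within distance eps of point i.
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)           # None = unvisited, -1 = noise
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1                  # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point: relabel, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region(points, j, eps)
            if len(more) >= min_pts:        # j is a core point: keep growing
                seeds.extend(more)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 regardless of their shape, while the isolated point (50, 50) is filtered out as noise — the two properties the text attributes to density-based methods.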

                                   Grid-based Methods

Grid-based methods quantize the object space into a finite number of cells, which form a grid structure. All of the clustering operations are performed on the grid structure (i.e., on the quantized space). The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and dependent only on the number of cells in each dimension of the quantized space. STING is a typical example of a grid-based method. CLIQUE and Wave-Cluster are two clustering algorithms which are both grid-based and density-based.
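A minimal sketch of the grid-based idea (the cell size and density threshold are assumptions): quantize the space into cells in a single pass over the data, then do all further work on the occupied cells — here, grouping edge-adjacent dense cells into clusters.

```python
from collections import Counter

def dense_cells(points, cell_size, threshold):
    # Map each point to its grid cell; this count is the only pass over the data.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell for cell, n in counts.items() if n >= threshold}

def cluster_cells(cells):
    # Group edge-adjacent dense cells into clusters (flood fill on the grid),
    # operating on cells only -- independent of the number of data objects.
    clusters, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            cx, cy = stack.pop()
            if (cx, cy) in seen or (cx, cy) not in cells:
                continue
            seen.add((cx, cy))
            group.add((cx, cy))
            stack += [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
        clusters.append(group)
    return clusters

pts = [(0.1, 0.2), (0.4, 0.3), (0.9, 1.1), (5.0, 5.1), (5.2, 5.3)]
cells = dense_cells(pts, cell_size=1.0, threshold=2)
print(sorted(sorted(g) for g in cluster_cells(cells)))  # → [[(0, 0)], [(5, 5)]]
```

Once the counts are built, the cost of the clustering step depends only on how many cells the grid has, which is the speed advantage the text describes.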

                                   Model-based Methods

Model-based methods hypothesize a model for each of the clusters and find the best fit of the data to the given model. A model-based algorithm may locate clusters by constructing a density function that reflects the spatial distribution of the data points. This also leads to a way of automatically determining the number of clusters based on standard statistics, taking "noise" or outliers into account, and thus yields robust clustering methods.
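As a toy illustration of the model-based idea: hypothesize a model — here a mixture of two 1-D Gaussians — and fit it to the data with a few expectation-maximization (EM) steps, so the fitted density function reflects the data's distribution. The data, initial parameters, and iteration count are illustrative assumptions.

```python
import math

def gauss(x, mu, var):
    # Gaussian probability density at x.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, mu=(0.0, 1.0), var=(1.0, 1.0), w=(0.5, 0.5), steps=50):
    for _ in range(steps):
        # E-step: responsibility of component 0 for each point.
        r = []
        for x in xs:
            p0 = w[0] * gauss(x, mu[0], var[0])
            p1 = w[1] * gauss(x, mu[1], var[1])
            r.append(p0 / (p0 + p1))
        # M-step: re-estimate weights, means, and variances from responsibilities.
        n0 = sum(r)
        n1 = len(xs) - n0
        w = (n0 / len(xs), n1 / len(xs))
        mu = (sum(ri * x for ri, x in zip(r, xs)) / n0,
              sum((1 - ri) * x for ri, x in zip(r, xs)) / n1)
        var = (max(1e-6, sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0),
               max(1e-6, sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1))
    return mu

xs = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]    # two well-separated groups
print(sorted(em_two_gaussians(xs)))     # means converge near 0.0 and 5.0
```

The fitted component means land on the two groups, and the mixture weights double as a statistic for judging how many components the data supports — the "standard statistics" route to choosing the number of clusters mentioned above.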

                                   Nearest Neighbour

In pattern recognition, the k-nearest neighbours algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation



          120                               LOVELY PROFESSIONAL UNIVERSITY