Page 124 - DMGT308_CUSTOMER_RELATIONSHIP_MANAGEMENT
P. 124

Unit 5: Closed Loop Marketing




               semantic interpretations and  applications. It is important  to study  how an application  Notes
               goal may influence the selection of clustering methods.
          With these requirements in mind, our study of cluster analysis proceeds as follows. Firstly, we
          study different types of data and how they can influence clustering methods. Secondly, we
          present a general categorization of clustering methods. We then study each clustering method
          in  detail,  including  partitioning  methods, hierarchical  methods,  density-based  methods,
          grid-based methods, and model-based methods. We also examine clustering in high-dimensional
          space and outlier analysis.

          A Categorization of Major Clustering Methods

          There exist a large number of clustering algorithms in the literature. The choice of clustering
          algorithm depends  both on  the type  of data  available and on  the particular purpose and
          application. If cluster analysis is used as a descriptive or exploratory tool, it is possible to try
          several algorithms on the same data to see what the data may disclose.

          In general, major clustering methods can be classified into the following categories.
          Partitioning Methods


          Given a database of n objects or data tuples, a partitioning method constructs k partitions of the
          data, where each partition represents a cluster, and k< n. That is, it classifies the data into k
          groups, which together satisfy the following requirements:
          1.   Each group must contain at least one object, and
          2.   Each object must belong to exactly one group. Notice that the second requirement can be
               relaxed in some fuzz partitioning techniques.3
          Given  k, the  number of  partitions  to  construct, a  partitioning method  creates  an  initial
          partitioning. It  then uses an iterative relocation technique  which attempts  to improve  the
          partitioning by  moving objects from one group to  another. The  general criterion of a  good
          partitioning is that objects in the same  cluster are  “close” or  related to each other, whereas
          objects of different clusters are “far apart” or very different. There are various kinds of other
          criteria for judging the quality of partitions.

          To achieve global optimality in partitioning-based clustering would  require the  exhaustive
          enumeration of all of the possible partitions. Instead, most applications adopt one of two popular
          heuristic methods:
          1.   The k-means algorithm, where each cluster is represented by the mean value of the objects
               in the cluster; and

          2.   The k-medoids algorithm, where each cluster is represented by one of the objects located
               near the centre of the cluster.
          These heuristic clustering methods work well for finding spherical-shaped clusters in small to
          medium sized databases. For finding clusters with complex shapes and for clustering very large
          data sets, partitioning-based methods need to be extended.

          Hierarchical Methods

          A hierarchical method creates a hierarchical decomposition of the given set of data objects. A
          hierarchical method can be classified as being either agglomerative or divisive, based on how
          the  hierarchical  decomposition is  formed.  The  agglomerative  approach,  also  called  the
          “bottom-up” approach, starts with each object forming a separate group. It successively merges



                                           LOVELY PROFESSIONAL UNIVERSITY                                   119
   119   120   121   122   123   124   125   126   127   128   129