Page 124 - DMGT308_CUSTOMER_RELATIONSHIP_MANAGEMENT
P. 124
Unit 5: Closed Loop Marketing
semantic interpretations and applications. It is important to study how an application Notes
goal may influence the selection of clustering methods.
With these requirements in mind, our study of cluster analysis proceeds as follows. Firstly, we
study different types of data and how they can influence clustering methods. Secondly, we
present a general categorization of clustering methods. We then study each clustering method
in detail, including partitioning methods, hierarchical methods, density-based methods,
grid-based methods, and model-based methods. We also examine clustering in high-dimensional
space and outlier analysis.
A Categorization of Major Clustering Methods
There exist a large number of clustering algorithms in the literature. The choice of clustering
algorithm depends both on the type of data available and on the particular purpose and
application. If cluster analysis is used as a descriptive or exploratory tool, it is possible to try
several algorithms on the same data to see what the data may disclose.
In general, major clustering methods can be classified into the following categories.
Partitioning Methods
Given a database of n objects or data tuples, a partitioning method constructs k partitions of the
data, where each partition represents a cluster, and k< n. That is, it classifies the data into k
groups, which together satisfy the following requirements:
1. Each group must contain at least one object, and
2. Each object must belong to exactly one group. Notice that the second requirement can be
relaxed in some fuzz partitioning techniques.3
Given k, the number of partitions to construct, a partitioning method creates an initial
partitioning. It then uses an iterative relocation technique which attempts to improve the
partitioning by moving objects from one group to another. The general criterion of a good
partitioning is that objects in the same cluster are “close” or related to each other, whereas
objects of different clusters are “far apart” or very different. There are various kinds of other
criteria for judging the quality of partitions.
To achieve global optimality in partitioning-based clustering would require the exhaustive
enumeration of all of the possible partitions. Instead, most applications adopt one of two popular
heuristic methods:
1. The k-means algorithm, where each cluster is represented by the mean value of the objects
in the cluster; and
2. The k-medoids algorithm, where each cluster is represented by one of the objects located
near the centre of the cluster.
These heuristic clustering methods work well for finding spherical-shaped clusters in small to
medium sized databases. For finding clusters with complex shapes and for clustering very large
data sets, partitioning-based methods need to be extended.
Hierarchical Methods
A hierarchical method creates a hierarchical decomposition of the given set of data objects. A
hierarchical method can be classified as being either agglomerative or divisive, based on how
the hierarchical decomposition is formed. The agglomerative approach, also called the
“bottom-up” approach, starts with each object forming a separate group. It successively merges
LOVELY PROFESSIONAL UNIVERSITY 119