The guideline of striving for high intraclass similarity and low interclass similarity still applies. In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases.
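To make the intraclass/interclass guideline concrete, the short Python sketch below compares the average distance between points inside the same cluster with the average distance between points of different clusters. The two small clusters, their values, and the helper function are purely illustrative assumptions and are not taken from the text.

import numpy as np

def avg_pairwise_distance(a, b):
    # Mean Euclidean distance between every point of `a` and every point of `b`.
    diffs = a[:, None, :] - b[None, :, :]        # shape: (len(a), len(b), n_dims)
    return np.linalg.norm(diffs, axis=2).mean()

# Two small, made-up clusters of two-dimensional records (illustrative only).
cluster_1 = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8]])
cluster_2 = np.array([[5.0, 5.2], [4.8, 5.1], [5.3, 4.9]])

# Intraclass: average distance within each cluster (self-distances of zero are
# included here for simplicity). Interclass: average distance across clusters.
intraclass = (avg_pairwise_distance(cluster_1, cluster_1) +
              avg_pairwise_distance(cluster_2, cluster_2)) / 2
interclass = avg_pairwise_distance(cluster_1, cluster_2)

print("average intraclass distance:", round(intraclass, 2))   # small: similar within a cluster
print("average interclass distance:", round(interclass, 2))   # large: dissimilar across clusters

A good clustering yields a small intraclass value relative to the interclass value, which is exactly the guideline stated above.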
Clustering is a challenging field of research, and its potential applications pose their own special requirements. The following are typical requirements of clustering in data mining:
1. Scalability: Many clustering algorithms work well on small data sets containing fewer than 200 data objects; however, a large database may contain millions of objects. Clustering on a sample of a given large data set may lead to biased results. Highly scalable clustering algorithms are needed.
2. Ability to deal with different types of attributes: Many algorithms are designed to cluster interval-based (numerical) data. However, applications may require clustering other types of data, such as binary, categorical (nominal), and ordinal data, or mixtures of these data types.

3. Discovery of clusters with arbitrary shape: Many clustering algorithms determine clusters based on Euclidean or Manhattan distance measures. Algorithms based on such distance measures tend to find spherical clusters with similar size and density. However, a cluster could be of any shape. It is important to develop algorithms which can detect clusters of arbitrary shape.
4. Minimal requirements for domain knowledge to determine input parameters: Many clustering algorithms require users to input certain parameters in cluster analysis (such as the number of desired clusters). The clustering results are often quite sensitive to input parameters. Parameters are often hard to determine, especially for data sets containing high-dimensional objects. This not only burdens users, but also makes the quality of clustering difficult to control.

5. Ability to deal with noisy data: Most real-world databases contain outliers or missing, unknown, or erroneous data. Some clustering algorithms are sensitive to such data and may lead to clusters of poor quality.
6. Insensitivity to the order of input records: Some clustering algorithms are sensitive to the order of input data; that is, the same set of data, when presented to such an algorithm in different orderings, may generate dramatically different clusters. It is important to develop algorithms which are insensitive to the order of input.

7. High dimensionality: A database or a data warehouse may contain numerous dimensions or attributes. Many clustering algorithms are good at handling low-dimensional data, involving only two to three dimensions. Human eyes are good at judging the quality of clustering for up to three dimensions. It is challenging to cluster data objects in high-dimensional space, especially considering that data in high-dimensional space can be very sparse and highly skewed.
8. Constraint-based clustering: Real-world applications may need to perform clustering under various kinds of constraints. Suppose that your job is to choose the locations for a given number of new automated teller machines (ATMs) in a city. To decide upon this, you may cluster households while considering constraints such as the city’s rivers and highway networks, and customer requirements per region. A challenging task is to find groups of data with good clustering behaviour that satisfy specified constraints.
9. Interpretability and usability: Users expect clustering results to be interpretable, comprehensible, and usable. That is, clustering may need to be tied up with specific

