Page 121 - DMGT308_CUSTOMER_RELATIONSHIP

Customer Relationship Management

An oft-stated goal of data mining is the discovery of patterns and relationships among different
variables in the database. This is no different from some of the goals of statistical inference:
consider, for instance, simple linear regression. Similarly, the pairwise relationships between
the products sold above can be represented by an undirected weighted graph, with products as
the nodes and an edge joining each pair of products, weighted in proportion to the number of
transactions that contain both. While undirected graphs provide a graphical display, directed
acyclic graphs are perhaps more interesting: they provide understanding of the phenomena
driving the relationships between the variables. The nature of these relationships can be
analyzed using classical and modern statistical tools such as regression, neural networks
and so on.
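The weighted product graph described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name cooccurrence_graph and the sample baskets are assumptions made for the example and do not come from the text. Each edge weight is simply the count of transactions that contain both products.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(transactions):
    """Return {(product_a, product_b): weight} for an undirected graph.

    The weight of an edge is the number of transactions that contain
    both products of the pair.
    """
    weights = Counter()
    for basket in transactions:
        # sorted(set(...)) removes repeats within a basket and gives each
        # unordered pair one canonical key, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(set(basket)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical baskets, invented purely for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
]
graph = cooccurrence_graph(transactions)
```

Here graph[("bread", "milk")] is 2, because two of the four baskets contain both items. Dividing each count by the number of transactions would give weights that are proportions rather than raw counts.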
Another aspect of knowledge discovery is supervised learning. Statistical tools such as
discriminant analysis or classification trees often need to be refined for these problems. Some
additional methods to be investigated here are k-nearest neighbour methods, bootstrap
aggregation (bagging), and boosting, which originally evolved in the machine learning
literature but whose statistical properties have been analyzed in recent years by statisticians.
Boosting is particularly useful in the context of data streams – when we have rapid data flowing
into the system and real-time classification rules are needed. Such capability is especially desirable
in the context of financial data, to guard against credit card and calling card fraud, when transactions
are streaming in from several sources and an automated split-second determination of fraudulent
or genuine use has to be made, based on past experience.
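As a small illustration of supervised learning, the sketch below implements the k-nearest neighbour rule mentioned above (not boosting, which is more involved). The function name knn_predict, the two numeric features and the labelled observations are all assumptions invented for the example; a real fraud system would use many more features and far more past data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical past transactions: two made-up numeric features per record.
train = [
    ((1.0, 1.0), "genuine"),
    ((1.2, 0.8), "genuine"),
    ((0.9, 1.1), "genuine"),
    ((5.0, 5.2), "fraud"),
    ((5.1, 4.9), "fraud"),
    ((4.8, 5.0), "fraud"),
]

label_a = knn_predict(train, (1.1, 1.0))  # near the "genuine" group
label_b = knn_predict(train, (5.0, 5.0))  # near the "fraud" group
```

The rule labels a new transaction from past experience alone: it needs no fitted model, only the stored labelled observations and a distance measure.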
Another important aspect of knowledge discovery is unsupervised learning or clustering, which
is the categorization of the observations in a dataset into an a priori unknown number of
groups, based on some characteristic of the observations. This is a very difficult problem, and is
only compounded when the database is massive. Hierarchical clustering, probability-based
methods, and optimization partitioning algorithms are all difficult to apply here. Maitra
(2001) develops, under restrictive Gaussian equal-dispersion assumptions, a multipass scheme
which clusters an initial sample, filters out observations that can be reasonably classified by
these clusters, and iterates the above procedure on the remainder. This method is scalable:
because each pass clusters only a sample and then sets aside the observations already
classified, it can be applied to very large datasets.
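The sample, filter and iterate idea can be sketched as follows. This is a loose illustration under stated assumptions, not the procedure of Maitra (2001): it swaps in plain k-means for the model-based clustering step and a fixed distance threshold (radius) for the rule that decides which observations are "reasonably classified". All function names, parameters and data points are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of coordinate tuples; returns k centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centre.
        centres = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


def multipass_cluster(points, k, sample_size, radius, max_passes=10, seed=0):
    """Cluster a sample, set aside points near a centre, repeat on the rest."""
    rng = random.Random(seed)
    remaining = list(points)
    all_centres = []
    passes = 0
    while remaining and passes < max_passes:
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        centres = kmeans(sample, min(k, len(sample)), seed=seed)
        all_centres.extend(centres)
        # Assumed filter: a point counts as classified if it lies within
        # `radius` of one of this pass's centres.
        remaining = [
            p for p in remaining
            if min(math.dist(p, c) for c in centres) > radius
        ]
        passes += 1
    return all_centres, remaining


# Hypothetical data: three small, well-separated groups of points.
offsets = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.4), (0.2, 0.1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx, dy in offsets]
centres, leftover = multipass_cluster(points, k=3, sample_size=6, radius=3.0)
```

Each pass touches only a sample for the expensive clustering step, and the single sweep that filters the remainder is linear in the number of observations still unclassified.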
The field of data mining, like statistics, concerns itself with “learning from data” or “turning
data into information”.

5.2.3 Clustering

Cluster analysis is used to form groups or clusters of similar records based on several measures
made on these records. The key idea is to characterize the clusters in ways that would be useful
for the aims of the analysis. This technique has been applied in many areas, including astronomy,
archaeology, medicine, chemistry, education, psychology, linguistics and sociology.
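To make the idea of grouping similar records concrete, the sketch below uses single-linkage agglomerative clustering, one simple hierarchical method among many. The function name single_linkage and the six two-measure records are assumptions made for the example; in practice the measures would usually be standardized first so that no single one dominates the distance.

```python
import math


def single_linkage(records, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    The distance between two clusters is the smallest Euclidean distance
    between any record in one and any record in the other.
    """
    clusters = [[r] for r in records]  # start with one cluster per record
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j leaves index i unchanged.
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical records, each described by two made-up measures.
records = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
clusters = single_linkage(records, n_clusters=2)
```

With these records the procedure ends with two groups of three, one around (1, 1) and one around (9, 9). The repeated all-pairs search is slow on large datasets, which is one reason hierarchical methods are hard to apply to massive databases.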

Example: Biologists have made extensive use of classes and subclasses to organize species.
A spectacular success of the clustering idea in chemistry was Mendeleev’s periodic table of the
elements.

One popular use of cluster analysis in marketing is for market segmentation: customers are
segmented based on demographic and transaction history information and a marketing strategy
is tailored for each segment. Another use is market structure analysis: identifying groups of
similar products according to competitive measures of similarity. In marketing and political
forecasting, clustering of neighbourhoods using U.S. postal zip codes has been used successfully
to group neighbourhoods by lifestyles. Claritas, a company that pioneered this approach, grouped
neighbourhoods into 40 clusters using various measures of consumer expenditure and
demographics. Examining the clusters enabled Claritas to come up with evocative names, such

116 LOVELY PROFESSIONAL UNIVERSITY
