Page 146 - DCAP208_Management Support Systems
Unit 9: Data Mining
The data mining functionalities and the variety of knowledge they discover are briefly presented
in the following list:
Characterization: Data characterization is a summarization of general features of objects
in a target class, and produces what is called characteristic rules. The data relevant to a user-
specified class are normally retrieved by a database query and run through a
summarization module to extract the essence of the data at different levels of abstraction.
Example: One may want to characterize the OurVideoStore customers who regularly
rent more than 30 movies a year. With concept hierarchies on the attributes describing the target
class, the attribute-oriented induction method can be used, for example, to carry out data
summarization. Note that with a data cube containing summarization of data, simple OLAP
operations fit the purpose of data characterization.
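The idea of summarizing a target class through concept hierarchies can be illustrated with a small sketch. It is only a simplified illustration of attribute-oriented generalization, not a full implementation of the method. The customer records, the city-to-province hierarchy, and the names age_group and characterize are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical customer records: (age, city, rentals_per_year).
customers = [
    (17, "Edmonton", 42),
    (34, "Calgary", 35),
    (16, "Edmonton", 31),
    (45, "Vancouver", 12),
    (29, "Calgary", 38),
]

# Hypothetical concept hierarchy: city -> province.
CITY_TO_PROVINCE = {
    "Edmonton": "Alberta",
    "Calgary": "Alberta",
    "Vancouver": "British Columbia",
}


def age_group(age):
    """Hypothetical concept hierarchy: exact age -> age group."""
    if age < 20:
        return "teen"
    if age < 40:
        return "young adult"
    return "older adult"


def characterize(records, min_rentals=30):
    """Summarize the target class (customers renting more than min_rentals)."""
    # Step 1: retrieve the data relevant to the target class.
    target = [r for r in records if r[2] > min_rentals]
    # Step 2: replace low-level values with higher-level concepts and
    # merge identical generalized tuples by counting them.
    generalized = Counter(
        (age_group(age), CITY_TO_PROVINCE[city]) for age, city, _ in target
    )
    # Step 3: report each generalized tuple with its share of the class.
    total = len(target)
    return {concepts: count / total for concepts, count in generalized.items()}


rules = characterize(customers)
for concepts, share in sorted(rules.items()):
    print(concepts, f"{share:.0%}")
```

Each printed line can be read as a simple characteristic rule: the share of regular renters that falls under a given combination of generalized attribute values.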
Discrimination: Data discrimination produces what are called discriminant rules and is
basically the comparison of the general features of objects between two classes referred to
as the target class and the contrasting class.
Example: One may want to compare the general characteristics of the customers who
rented more than 30 movies in the last year with those whose rental count is lower than 5.
The techniques used for data discrimination are very similar to the techniques used for
data characterization with the exception that data discrimination results include comparative
measures.
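A comparative measure of this kind can be sketched as follows. The sketch assumes a single generalized attribute (an age group) and uses the share of each group within the target class and within the contrasting class as the comparative measure. The data and the function names share_by_group and discriminate are hypothetical.

```python
from collections import Counter

# Hypothetical customer records: (age_group, rentals_last_year).
customers = [
    ("teen", 45), ("teen", 33), ("adult", 31), ("adult", 40),
    ("adult", 2), ("adult", 4), ("teen", 1), ("adult", 3),
    ("adult", 12),
]


def share_by_group(records):
    """Return the fraction of records that falls in each age group."""
    counts = Counter(group for group, _ in records)
    total = len(records)
    return {group: count / total for group, count in counts.items()}


def discriminate(records, target_min=30, contrast_max=5):
    """Compare the target class (more than target_min rentals) with the
    contrasting class (fewer than contrast_max rentals)."""
    target = [r for r in records if r[1] > target_min]
    contrast = [r for r in records if r[1] < contrast_max]
    target_share = share_by_group(target)
    contrast_share = share_by_group(contrast)
    groups = sorted(set(target_share) | set(contrast_share))
    # For each group: (share in target class, share in contrasting class).
    return {
        g: (target_share.get(g, 0.0), contrast_share.get(g, 0.0))
        for g in groups
    }


comparison = discriminate(customers)
for group, (in_target, in_contrast) in comparison.items():
    print(f"{group}: target {in_target:.0%}, contrasting {in_contrast:.0%}")
```

Customers who belong to neither class (here, the one with 12 rentals) are ignored, since discrimination only contrasts the two chosen classes.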
Association analysis: Association analysis is the discovery of what are commonly called
association rules. It studies the frequency of items occurring together in transactional
databases, and based on a threshold called support, identifies the frequent item sets. Another
threshold, confidence, which is the conditional probability that an item appears in a
transaction when another item appears, is used to pinpoint association rules. Association
analysis is commonly used for market basket analysis.
Example: It could be useful for the OurVideoStore manager to know what movies are
often rented together or if there is a relationship between renting a certain type of movie and
buying popcorn or pop.
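The role of the support threshold can be illustrated with a brute-force sketch that counts every item set of up to two items and keeps those that are frequent. This is a naive enumeration for illustration only, not an efficient mining algorithm. The transactions and the function name frequent_itemsets are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of items in each store visit.
transactions = [
    {"action movie", "popcorn", "pop"},
    {"action movie", "popcorn"},
    {"comedy movie", "pop"},
    {"action movie", "popcorn", "pop"},
    {"comedy movie", "popcorn"},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return item sets of up to max_size items whose support (fraction of
    transactions containing them) is at least min_support."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives every item set one canonical tuple form.
            for itemset in combinations(sorted(items), size):
                counts[itemset] += 1
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in sorted(frequent.items()):
    print(itemset, f"support={support:.0%}")
```

With a minimum support of 60%, only item sets that appear in at least three of the five transactions are kept.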
The discovered association rules are of the form: P -> Q [s,c], where P and Q are conjunctions
of attribute value-pairs, and s (for support) is the probability that P and Q appear together
in a transaction and c (for confidence) is the conditional probability that Q appears in a
transaction when P is present.
Example: The hypothetical association rule: RentType(X, “game”) AND Age(X, “13-19”) ->
Buys(X, “pop”) [s=2%, c=55%] would indicate that 2% of the transactions considered are of
customers aged between 13 and 19 who are renting a game and buying pop, and that there is
a certainty of 55% that teenage customers who rent a game also buy pop.
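The two measures attached to a rule P -> Q can be computed directly from their definitions. The following is a minimal sketch, assuming each transaction is represented as a set of attribute-value labels. The four transactions and the function name rule_metrics are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of transactions containing both P and Q
    confidence = fraction of transactions containing P that also contain Q
    """
    n = len(transactions)
    with_p = sum(1 for t in transactions if antecedent <= t)
    with_p_and_q = sum(
        1 for t in transactions if antecedent <= t and consequent <= t
    )
    support = with_p_and_q / n
    confidence = with_p_and_q / with_p if with_p else 0.0
    return support, confidence


# Hypothetical transactions described by attribute-value labels.
transactions = [
    {"rents game", "age 13-19", "buys pop"},
    {"rents game", "age 13-19"},
    {"rents game", "age 20-29", "buys pop"},
    {"rents movie", "age 13-19", "buys pop"},
]

s, c = rule_metrics(
    transactions,
    antecedent={"rents game", "age 13-19"},
    consequent={"buys pop"},
)
print(f"s={s:.0%}, c={c:.0%}")
```

In this toy data, one transaction in four contains both sides of the rule, and one of the two transactions matching the antecedent also contains the consequent.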
Classification: Classification analysis is the organization of data in given classes. Also
known as supervised classification, the classification uses given class labels to order the
objects in the data collection. Classification approaches normally use a training set where
all objects are already associated with known class labels. The classification algorithm
learns from the training set and builds a model. The model is used to classify new objects.
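The train-then-classify cycle can be sketched with a very simple model. The text does not prescribe a particular algorithm, so this sketch assumes a nearest-centroid classifier chosen only for brevity. The features (age, rentals per year), the class labels, and the function names train and classify are hypothetical.

```python
import math
from collections import defaultdict


def train(training_set):
    """Build the model: one centroid (mean feature vector) per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in sums[label]]
        for label in sums
    }


def classify(model, features):
    """Assign a new object to the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical training set: ((age, rentals_per_year), known class label).
training_set = [
    ((16, 40), "frequent"),
    ((18, 36), "frequent"),
    ((45, 3), "occasional"),
    ((52, 5), "occasional"),
]

model = train(training_set)
prediction = classify(model, (20, 30))
print(prediction)
```

The training step uses only objects with known labels, and the resulting model is then applied to an object whose label is unknown.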
LOVELY PROFESSIONAL UNIVERSITY 139