Page 37 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 2: Data Mining Concept
2.7 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
We have seen that data mining can be performed on various types of data stores and
database systems. When mining databases, two kinds of patterns can be discovered,
depending on the data mining tasks employed:
1. Descriptive data mining tasks that describe the general properties of the existing data.
These include data characterisation and discrimination.
2. Predictive data mining tasks that attempt to make predictions based on inference from the
available data.
The data mining functionalities and the variety of knowledge they discover are briefly presented
in the following list:
Characterisation
Data characterisation is a summarisation of the general features of objects in a target class, and
produces what are called characteristic rules. The data relevant to a user-specified class are
normally retrieved by a database query and run through a summarisation module to extract the
essence of the data at different levels of abstraction.
Example: One may want to characterise the OurVideoStore customers who regularly
rent more than 30 movies a year. With concept hierarchies on the attributes describing the
target class, the attribute-oriented induction method can be used, for example, to carry out data
summarisation. Note that with a data cube containing summarisation of data, simple OLAP
operations fit the purpose of data characterisation.
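
The summarisation step described above can be illustrated with a minimal sketch. It is not the attribute-oriented induction method itself, only a simplified stand-in: the customer records, the city names, the `age_band` concept hierarchy and the `characterise` helper are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical customer records: (age, city, rentals_per_year).
customers = [
    (23, "Edmonton", 42),
    (27, "Calgary", 35),
    (45, "Edmonton", 31),
    (52, "Edmonton", 4),
    (19, "Calgary", 55),
    (33, "Calgary", 2),
]

def age_band(age):
    """Assumed concept hierarchy: generalise an exact age to a coarser band."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

def characterise(records, min_rentals=30):
    """Return the share of target-class customers falling in each age band."""
    # Target class: customers who rent more than `min_rentals` movies a year.
    target = [r for r in records if r[2] > min_rentals]
    if not target:
        return {}
    counts = Counter(age_band(age) for age, _city, _rentals in target)
    return {band: n / len(target) for band, n in counts.items()}

summary = characterise(customers)
```

Here `summary` maps each age band to the fraction of heavy renters it covers, which is the kind of generalised description a characteristic rule would express.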
Discrimination
Data discrimination produces what are called discriminant rules and is basically the comparison
of the general features of objects between two classes referred to as the target class and the
contrasting class.
Example: One may want to compare the general characteristics of the customers who
rented more than 30 movies in the last year with those whose rental count is lower than 5.
The techniques used for data discrimination are very similar to the techniques used for data
characterisation with the exception that data discrimination results include comparative
measures.
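
As a rough sketch of what a comparative measure might look like, the following computes the share of each age band in the target class and in the contrasting class, side by side. The records, the pre-generalised age bands and the helper names `band_shares` and `discriminate` are hypothetical, and this simple share comparison is only one of many possible comparative measures.

```python
from collections import Counter

# Hypothetical records: (age_band, rentals_last_year).
customers = [
    ("under 30", 42), ("under 30", 35), ("30-49", 31), ("under 30", 55),
    ("50+", 4), ("30-49", 2), ("50+", 1), ("under 30", 3),
]

def band_shares(records):
    """Fraction of the given records that fall in each age band."""
    counts = Counter(band for band, _rentals in records)
    total = len(records)
    return {band: n / total for band, n in counts.items()}

def discriminate(records, target_min=30, contrast_max=5):
    """Map each age band to (share in target class, share in contrasting class)."""
    target = [r for r in records if r[1] > target_min]      # rented more than 30
    contrast = [r for r in records if r[1] < contrast_max]  # rented fewer than 5
    t, c = band_shares(target), band_shares(contrast)
    return {b: (t.get(b, 0.0), c.get(b, 0.0)) for b in sorted(set(t) | set(c))}

comparison = discriminate(customers)
```

A band whose two shares differ widely, such as "50+" in this toy data, is a candidate for a discriminant rule separating the two classes.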
Association Analysis
Association analysis is based on association rules. It studies the frequency of items occurring
together in transactional databases and, based on a threshold called support, identifies the
frequent item sets. Another threshold, confidence, which is the conditional probability that an
item appears in a transaction when another item appears, is used to pinpoint association rules.
Association analysis is commonly used for market basket analysis.
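
The two thresholds can be made concrete with a small sketch. The transactions, the threshold values and the helper names (`support`, `confidence`, `is_strong_rule`) are hypothetical. Support is computed here as the fraction of transactions containing an item set, and confidence as support(P and Q) divided by support(P).

```python
# Hypothetical rental transactions, each a set of items.
transactions = [
    {"action movie", "popcorn", "pop"},
    {"action movie", "popcorn"},
    {"comedy movie", "pop"},
    {"action movie", "comedy movie", "popcorn"},
    {"comedy movie"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(p, q, transactions):
    """Estimated conditional probability of Q appearing given that P appears."""
    p_support = support(p, transactions)
    if p_support == 0:
        return 0.0  # P never occurs, so the rule has no evidence
    return support(set(p) | set(q), transactions) / p_support

def is_strong_rule(p, q, transactions, min_support=0.5, min_confidence=0.8):
    """Check a candidate rule P -> Q against both (assumed) thresholds."""
    s = support(set(p) | set(q), transactions)
    c = confidence(p, q, transactions)
    return s >= min_support and c >= min_confidence

strong = is_strong_rule({"action movie"}, {"popcorn"}, transactions)
weak = is_strong_rule({"pop"}, {"popcorn"}, transactions)
```

In this toy data every transaction with an action movie also contains popcorn, so that rule passes both thresholds, while the rule from pop to popcorn fails on support and confidence alike.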
Example: It could be useful for the OurVideoStore manager to know what movies are
often rented together or if there is a relationship between renting a certain type of movie and
buying popcorn or pop. The discovered association rules are of the form: P→Q [s, c], where P
and Q are conjunctions of attribute-value pairs, and s (for support) is the probability that P and