Page 37 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 2: Data Mining Concept




          2.7  Data Mining Functionalities: What Kinds of Patterns Can Be Mined?

          We have seen that data mining can be performed on various types of data stores and
          database systems. Two kinds of patterns can be discovered when mining a database,
          depending on the data mining task employed:
          1.   Descriptive data mining tasks, which describe the general properties of the existing data.
               These include data characterisation and discrimination.
          2.   Predictive data mining tasks, which attempt to make predictions by inference from the
               available data.

          The data mining functionalities and the variety of knowledge they discover are briefly presented
          in the following list:

          Characterisation

          Data characterisation is a summarisation of the general features of objects in a target class, and
          produces what are called characteristic rules. The data relevant to a user-specified class are
          normally retrieved by a database query and run through a summarisation module to extract the
          essence of the data at different levels of abstraction.


                 Example: One may want to characterise the OurVideoStore customers who regularly
          rent more than 30 movies a year. With concept hierarchies on the attributes describing the
          target class, the attribute-oriented induction method can be used to carry out data
          summarisation. Note that with a data cube containing a summarisation of the data, simple OLAP
          operations fit the purpose of data characterisation.
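
          The characterisation steps described above (select the target class, generalise its
          attributes through concept hierarchies, then merge identical tuples) can be sketched in
          Python. This is only an illustration, not the full attribute-oriented induction method:
          the customer records, the city-to-region hierarchy and the age bands are all made up
          for the example.

```python
from collections import Counter

# Made-up customer records: (age, city, movies rented per year).
CUSTOMERS = [
    (23, "Edmonton", 42),
    (27, "Calgary", 35),
    (45, "Edmonton", 31),
    (52, "Toronto", 12),
    (19, "Calgary", 48),
    (38, "Toronto", 33),
]

# Assumed concept hierarchy: city -> region.
CITY_TO_REGION = {"Edmonton": "West", "Calgary": "West", "Toronto": "East"}


def age_group(age):
    """Assumed concept hierarchy: exact age -> age band."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


def characterise(records, min_rentals=30):
    """Summarise the target class (customers renting more than min_rentals)."""
    # Step 1: retrieve the target class, as a database query would.
    target = [r for r in records if r[2] > min_rentals]
    # Step 2: generalise each tuple and merge identical ones, keeping a count.
    counts = Counter(
        (age_group(age), CITY_TO_REGION[city]) for age, city, _ in target
    )
    # Step 3: report each generalised tuple as a share of the target class.
    return {key: n / len(target) for key, n in counts.items()}


summary = characterise(CUSTOMERS)
for (band, region), share in sorted(summary.items()):
    print(f"{band:12s} {region:5s} {share:.0%}")
```

          Each line of the output is a generalised tuple with its share of the target class,
          which is the raw material for a characteristic rule.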

          Discrimination

          Data discrimination compares the general features of objects in two classes, referred to
          as the target class and the contrasting class, and produces what are called discriminant
          rules.


                 Example: One may want to compare the general characteristics of the customers who
          rented more than 30 movies in the last year with those of customers whose rental count is
          lower than 5. The techniques used for data discrimination are very similar to those used
          for data characterisation, except that data discrimination results include comparative
          measures.
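
          A comparative measure of the kind mentioned above can be as simple as the share of each
          generalised attribute value in the target class set against its share in the contrasting
          class. The following Python sketch assumes made-up records that are already generalised
          to age bands; the thresholds of 30 and 5 rentals come from the example, everything else
          is illustrative.

```python
from collections import Counter

# Made-up records, already generalised: (age band, rentals in the last year).
CUSTOMERS = [
    ("under 30", 42), ("under 30", 35), ("30-49", 31), ("under 30", 48),
    ("50 and over", 2), ("50 and over", 4), ("30-49", 1), ("30-49", 12),
]


def class_profile(records):
    """Share of each age band within one class of customers."""
    counts = Counter(band for band, _ in records)
    return {band: n / len(records) for band, n in counts.items()}


def discriminate(records, high=30, low=5):
    """Compare the target class (> high rentals) with the contrasting class (< low)."""
    target = class_profile([r for r in records if r[1] > high])
    contrasting = class_profile([r for r in records if r[1] < low])
    bands = sorted(set(target) | set(contrasting))
    # For each band, pair its share in the target class with its share
    # in the contrasting class: this pair is the comparative measure.
    return {b: (target.get(b, 0.0), contrasting.get(b, 0.0)) for b in bands}


comparison = discriminate(CUSTOMERS)
for band, (t_share, c_share) in comparison.items():
    print(f"{band:12s} target={t_share:.0%} contrasting={c_share:.0%}")
```

          Customers who fall in neither class (here, the one with 12 rentals) are ignored, since
          discrimination only contrasts the two chosen classes.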

          Association analysis

          Association analysis discovers association rules. It studies the frequency of items occurring
          together in transactional databases and, based on a threshold called support, identifies the
          frequent item sets. Another threshold, confidence, which is the conditional probability that an
          item appears in a transaction when another item appears, is used to pinpoint association rules.
          Association analysis is commonly used for market basket analysis.
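
          The two thresholds can be made concrete with a small Python sketch. It uses the usual
          definitions: the support of an item set is the fraction of transactions that contain
          all of its items, and the confidence of a rule is the support of both sides together
          divided by the support of the left-hand side. The transactions and the function names
          are made up for illustration.

```python
# Made-up rental transactions; each one is the set of items in a single visit.
TRANSACTIONS = [
    {"action movie", "popcorn", "pop"},
    {"action movie", "popcorn"},
    {"comedy", "pop"},
    {"action movie", "comedy", "popcorn"},
    {"comedy"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Conditional probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"action movie", "popcorn"}, TRANSACTIONS)
c = confidence({"action movie"}, {"popcorn"}, TRANSACTIONS)
print(f"action movie -> popcorn [s={s:.2f}, c={c:.2f}]")
```

          A rule is kept only if both values reach the minimum support and minimum confidence
          chosen by the analyst.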


                 Example: It could be useful for the OurVideoStore manager to know which movies are
          often rented together, or whether there is a relationship between renting a certain type of
          movie and buying popcorn or pop. The discovered association rules are of the form P→Q [s, c],
          where P and Q are conjunctions of attribute-value pairs, and s (for support) is the probability that P and




                                           Lovely Professional University                                    31