Page 146 - DCAP208_Management Support Systems

Unit 9: Data Mining




          The data mining functionalities and the variety of knowledge they discover are briefly presented
          in the following list:

               Characterization: Data characterization is a summarization of general features of objects
               in a target class, and produces what is called characteristic rules. The data relevant to a user-
               specified class are normally retrieved by a database query and run through a
               summarization module to extract the essence of the data at different levels of abstraction.


                 Example: One may want to characterize the OurVideoStore customers who regularly
          rent more than 30 movies a year. With concept hierarchies on the attributes describing the target
          class, the attribute-oriented induction method can be used, for example, to carry out data
          summarization. Note that with a data cube containing summarization of data, simple OLAP
          operations fit the purpose of data characterization.
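
A minimal sketch of this kind of summarization, written in the spirit of attribute-oriented induction rather than as a full implementation of it. The customer records, the age-group hierarchy, and the names CUSTOMERS, age_group, and characterize are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical customer records: (age, movies rented per year).
CUSTOMERS = [
    (17, 42),
    (19, 35),
    (34, 31),
    (45, 4),
    (16, 50),
    (52, 2),
]


def age_group(age):
    """Assumed concept hierarchy: generalize an exact age to an age group."""
    if age < 20:
        return "under 20"
    if age < 40:
        return "20-39"
    return "40+"


def characterize(records, min_rentals=30):
    """Summarize the target class (customers renting more than min_rentals)."""
    # Step 1: retrieve the target class, as a database query would.
    target = [r for r in records if r[1] > min_rentals]
    # Step 2: replace each age by its higher-level concept and count.
    counts = Counter(age_group(age) for age, _rentals in target)
    total = len(target)
    # Step 3: report the share of the target class in each generalized group.
    return {group: count / total for group, count in counts.items()}


summary = characterize(CUSTOMERS)
for group, share in sorted(summary.items()):
    print(f"{group}: {share:.0%} of frequent renters")
```

Each entry of the result can be read as a simple characteristic rule for the target class, for instance the share of frequent renters that falls in a given age group.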
               Discrimination: Data discrimination produces what are called discriminant rules and is
               basically the comparison of the general features of objects between two classes referred to
               as the target class and the contrasting class.


                 Example: One may want to compare the general characteristics of the customers who
          rented more than 30 movies in the last year with those whose rental count is lower than 5.
               The techniques used for data discrimination are very similar to the techniques used for
               data characterization with the exception that data discrimination results include comparative
               measures.
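
The comparison between a target class and a contrasting class can be sketched as follows. This is an illustrative example under stated assumptions: the records, the already-generalized age groups, and the names RECORDS, distribution, and discriminate are hypothetical.

```python
from collections import Counter

# Hypothetical records: (generalized age group, rentals in the last year).
RECORDS = [
    ("under 20", 42), ("under 20", 35), ("20-39", 31), ("under 20", 50),
    ("40+", 4), ("40+", 2), ("20-39", 3), ("40+", 1),
]


def distribution(records):
    """Share of each age group within one class of records."""
    counts = Counter(group for group, _rentals in records)
    total = len(records)
    return {group: count / total for group, count in counts.items()}


def discriminate(records, high=30, low=5):
    """Compare the target class (> high rentals) with the contrasting class (< low)."""
    target = distribution([r for r in records if r[1] > high])
    contrast = distribution([r for r in records if r[1] < low])
    groups = sorted(set(target) | set(contrast))
    # The comparative measure here is the pair (share in target, share in contrast).
    return {g: (target.get(g, 0.0), contrast.get(g, 0.0)) for g in groups}


comparison = discriminate(RECORDS)
for group, (t_share, c_share) in comparison.items():
    print(f"{group}: {t_share:.0%} of target class vs {c_share:.0%} of contrasting class")
```

The summarization step is the same as in characterization; the only addition is that each generalized group carries one measure per class, so the two classes can be compared side by side.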
               Association analysis: Association analysis is the discovery of what are commonly called
               association rules. It studies the frequency of items occurring together in transactional
               databases, and based on a threshold called support, identifies the frequent item sets. Another
               threshold, confidence, which is the conditional probability that an item appears in a
               transaction when another item appears, is used to pinpoint association rules. Association
               analysis is commonly used for market basket analysis.


                 Example: It could be useful for the OurVideoStore manager to know what movies are
          often rented together or if there is a relationship between renting a certain type of movie and
          buying popcorn or pop.
               The discovered association rules are of the form: P -> Q [s,c], where P and Q are conjunctions
               of attribute-value pairs, and s (for support) is the probability that P and Q appear together
               in a transaction and c (for confidence) is the conditional probability that Q appears in a
               transaction when P is present.


                 Example: The hypothetical association rule RentType(X, “game”) AND Age(X, “13-19”) ->
          Buys(X, “pop”) [s=2%, c=55%] would indicate that 2% of the transactions considered are of
          customers aged between 13 and 19 who are renting a game and buying pop, and that there is
          a certainty of 55% that teenage customers who rent a game also buy pop.
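
Support and confidence for a rule P -> Q can be computed directly from their definitions. The sketch below assumes a tiny hypothetical transaction list in which each transaction is a set of attribute-value strings; the data and the names TRANSACTIONS, support, and confidence are illustrative, and the resulting figures are not the 2% and 55% of the rule above.

```python
# Hypothetical transactions, each a set of attribute-value pairs.
TRANSACTIONS = [
    {"rent:game", "age:13-19", "buy:pop"},
    {"rent:game", "age:13-19"},
    {"rent:comedy", "age:20-29", "buy:popcorn"},
    {"rent:game", "age:13-19", "buy:pop"},
    {"rent:drama", "age:30-39"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, p, q):
    """Conditional probability that Q appears in a transaction when P is present."""
    p_support = support(transactions, p)
    if p_support == 0:
        return 0.0  # P never occurs, so the rule cannot be evaluated
    return support(transactions, p | q) / p_support


P = {"rent:game", "age:13-19"}
Q = {"buy:pop"}

s = support(TRANSACTIONS, P | Q)     # P and Q together: 2 of 5 transactions
c = confidence(TRANSACTIONS, P, Q)   # 2 of the 3 transactions containing P
print(f"P -> Q [s={s:.0%}, c={c:.0%}]")
```

A rule is reported only if s and c both reach the chosen support and confidence thresholds.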

               Classification: Classification analysis is the organization of data in given classes. Also
               known as supervised classification, the classification uses given class labels to order the
               objects in the data collection. Classification approaches normally use a training set where
               all objects are already associated with known class labels. The classification algorithm
               learns from the training set and builds a model. The model is used to classify new objects.
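
The train-then-classify cycle can be sketched with a nearest-centroid classifier. The text does not name a specific algorithm, so this choice is an assumption made for brevity; the training data and the names TRAINING_SET, train, and classify are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical training set: ((age, rentals per year), known class label).
TRAINING_SET = [
    ((16, 45), "frequent"),
    ((18, 38), "frequent"),
    ((22, 33), "frequent"),
    ((45, 3), "occasional"),
    ((52, 2), "occasional"),
    ((38, 6), "occasional"),
]


def train(training_set):
    """Build the model: one centroid (mean feature vector) per class label."""
    grouped = defaultdict(list)
    for features, label in training_set:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(model, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


model = train(TRAINING_SET)
print(classify(model, (17, 40)))
print(classify(model, (50, 4)))
```

The model is learned once from the labeled objects and then applied to new, unlabeled ones, which is the pattern described above regardless of the particular algorithm used.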






