Page 87 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 4: Data Mining Classification




          1R Algorithm

          One of the simplest approaches to finding classification rules is called 1R, as it generates a one-
          level decision tree. This algorithm examines the “rules that classify an object on the basis of a
          single attribute”.
          The basic idea is to construct rules that test a single attribute and branch on each value
          of that attribute. For each branch, the predicted class is the one occurring most often in the
          training data. The error rate of a rule set is then determined by counting the training instances
          that do not belong to the majority class of their branch. Finally, the error rate for each
          attribute’s rule set is evaluated, and the rule set with the minimum error rate is chosen.
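
          The procedure above can be sketched in a few lines. The toy data set, attribute layout, and
          function name below are hypothetical, chosen only to illustrate how 1R picks the single best
          attribute:

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (outlook, windy, class); last column is the label.
data = [
    ("sunny",    "false", "no"),
    ("sunny",    "true",  "no"),
    ("overcast", "false", "yes"),
    ("rainy",    "false", "yes"),
    ("rainy",    "true",  "no"),
    ("overcast", "true",  "yes"),
]

def one_r(rows, n_attrs):
    """Return (best_attribute_index, rules, error_count) for a 1R classifier.

    For each attribute: build one rule per attribute value predicting the
    majority class among matching rows, count misclassified rows, and keep
    the attribute whose rule set makes the fewest errors.
    """
    best = None
    for a in range(n_attrs):
        # Class frequencies for every value of attribute a.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[a]][row[-1]] += 1
        # Rule set: attribute value -> majority class.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors: covered rows that are not in the branch's majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best

attr, rules, errors = one_r(data, 2)
```

          On this toy data, attribute 0 (outlook) wins with a single error, since “windy” misclassifies
          two rows.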
          A comprehensive comparative evaluation of the performance of 1R and other methods was
          performed on 16 datasets (many of them commonly used in machine learning research).
          Despite its simplicity, 1R produced surprisingly accurate rules, just a few percentage points
          lower in accuracy than the decision trees produced by the state-of-the-art algorithm C4. The
          decision trees produced by C4 were in most cases considerably larger than 1R’s rules, and the
          rules generated by 1R were much easier to interpret. 1R therefore provides a baseline
          performance using a rudimentary technique, to be used before progressing to more
          sophisticated algorithms.

          Other Algorithms

          Basic covering algorithms construct rules that classify the training data perfectly; that is, they
          tend to overfit the training set, causing insufficient generalization and difficulty in processing
          new data. For applications in real-world domains, however, methods for handling noisy data,
          mechanisms for avoiding overfitting even on the training data, and relaxed constraint
          requirements are needed. Pruning is one way of dealing with these problems: it approaches
          overfitting by learning a general concept from the training set “to improve the prediction of
          unseen instances”. In Reduced Error Pruning (REP), some of the training examples are withheld
          as a test set and the performance of the rule is measured on them. Incremental Reduced Error
          Pruning (IREP) has also proven efficient in handling overfitting, and it forms the basis of
          RIPPER. SLIPPER (Simple Learner with Iterative Pruning to Produce Error Reduction) uses
          “confidence-rated boosting to learn an ensemble of rules.”
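
          The held-out-set idea behind REP can be sketched as follows. This is an illustrative
          simplification, assuming a rule is a list of (attribute_index, value) conditions; the data,
          names, and greedy drop-last-condition strategy are assumptions for the example, not the
          published algorithm:

```python
def covers(rule, row):
    # A rule covers a row when every (attribute_index, value) condition matches.
    return all(row[a] == v for a, v in rule)

def error_rate(rule, target, rows):
    # Fraction of covered held-out rows whose class (last column) is not `target`.
    covered = [r for r in rows if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r[-1] != target for r in covered) / len(covered)

def reduced_error_prune(rule, target, held_out):
    # Greedily drop the final condition while the held-out error does not rise,
    # generalizing the rule without hurting measured performance.
    pruned = list(rule)
    while len(pruned) > 1:
        candidate = pruned[:-1]
        if error_rate(candidate, target, held_out) <= error_rate(pruned, target, held_out):
            pruned = candidate
        else:
            break
    return pruned

# Hypothetical held-out rows: (outlook, windy, class).
held_out = [("sunny", "true", "no"), ("sunny", "false", "no")]
rule = [(0, "sunny"), (1, "true")]   # outlook == sunny AND windy == true
pruned = reduced_error_prune(rule, "no", held_out)
```

          Here the second condition is dropped because the shorter rule performs just as well on the
          held-out rows.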

          Applications of Rule-based Algorithms

          Rule-based algorithms are widely used for deriving classification rules applied in the medical
          sciences for diagnosing illnesses, in business planning, in banking and government, and in
          different disciplines of science. Covering algorithms in particular have deep roots in machine
          learning. Within data mining, covering algorithms including SWAP-1, RIPPER, and DAIRY
          are used in text classification and have been adapted in gene expression programming for
          discovering classification rules.




              Task    Explain how you would remove the training data covered by rule R.
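
          As a hint, in a covering algorithm this step amounts to filtering out the instances the rule
          matches before learning the next rule. The rule representation and names below are
          illustrative assumptions:

```python
def covers(rule, row):
    # rule: list of (attribute_index, value) conditions; row: tuple of values.
    return all(row[a] == v for a, v in rule)

def remove_covered(rows, rule):
    # Keep only the training rows that rule R does not cover, so the next
    # rule is learned from the remaining instances.
    return [r for r in rows if not covers(rule, r)]

rows = [("sunny", "no"), ("rainy", "yes"), ("sunny", "no")]
rule = [(0, "sunny")]                  # R: outlook == sunny
remaining = remove_covered(rows, rule)
```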


          4.10 Combining Techniques


          Data mining is an application-driven field where research questions tend to be motivated by
          real-world data sets. In this context, a broad spectrum of formalisms and techniques has been
          proposed by researchers across a large number of applications. Organizing them is inherently
          rather difficult; that is why we highlight the central role played by the various types of data
          motivating the current research.




                                            Lovely Professional University                                    81