Page 91 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 4: Data Mining Classification





          •    Data cleaning, relevance analysis and data transformation are the preprocessing steps that
               may be applied to the data in order to help improve the accuracy, efficiency, and scalability
               of the classification or prediction process.

          •    Classification and prediction methods can be compared and evaluated according to the
               criteria of predictive accuracy, speed, robustness, scalability and interpretability.
          •    A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on
               an attribute, each branch represents an outcome of the test, and leaf nodes represent classes
               or class distributions. The topmost node in a tree is the root node.

          4.12 Keywords

          Bayes theorem: The rule P(C|X) = P(X|C) P(C) / P(X), which expresses the posterior probability
          of a class C given data X in terms of the prior probability of C and the likelihood of X under C.
          Bayesian classification is based on Bayes theorem.
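As a minimal sketch of the arithmetic behind Bayes theorem, the snippet below computes a posterior P(C|X) from a prior, a likelihood, and the evidence. All of the probability values are hypothetical, chosen only to make the calculation concrete.

```python
# Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X).
# The numbers below are hypothetical, used only to illustrate the arithmetic.
p_c = 0.4           # prior probability of class C
p_x_given_c = 0.7   # likelihood of observing data X within class C
p_x = 0.5           # overall probability of observing X (the evidence)

p_c_given_x = p_x_given_c * p_c / p_x  # posterior probability of C given X
print(p_c_given_x)  # 0.56
```

Note that observing X raised the probability of C from the prior 0.4 to the posterior 0.56, because X is more likely under C than it is overall.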
          Bayesian belief networks: These are graphical models which, unlike naive Bayesian classifiers,
          allow the representation of dependencies among subsets of attributes.
          Bayesian classification: Bayesian classifiers are statistical classifiers.
          Classification: Classification is a data mining technique used to predict group membership for
          data instances.
          Data cleaning: Data cleaning refers to the preprocessing of data in order to remove or reduce
          noise (by applying smoothing techniques, for example), and the treatment of missing values (e.g.,
          by replacing a missing value with the most commonly occurring value for that attribute, or with
          the most probable value based on statistics).
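The "replace a missing value with the most commonly occurring value" strategy from the definition above can be sketched in a few lines. The attribute column and its values are hypothetical, purely for illustration.

```python
from collections import Counter

# Hypothetical attribute column with missing values marked as None.
values = ["red", "blue", None, "red", "green", None, "red"]

# Find the most commonly occurring non-missing value for this attribute.
mode = Counter(v for v in values if v is not None).most_common(1)[0][0]

# Replace every missing value with that mode.
cleaned = [mode if v is None else v for v in values]
print(cleaned)  # ['red', 'blue', 'red', 'red', 'green', 'red', 'red']
```

A more statistically careful cleaner would instead impute the most probable value conditioned on the sample's other attributes, as the definition also mentions.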

          Decision Tree: A decision tree is a flow-chart-like tree structure, where each internal node denotes
          a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent
          classes or class distributions. The topmost node in a tree is the root node.
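The structure described above (internal nodes test an attribute, branches are test outcomes, leaves hold class labels) can be sketched with nested dictionaries. The tree and attribute names below are hypothetical, not taken from the text.

```python
# A minimal, hypothetical decision tree: dicts are internal nodes, each key
# under "branches" is one outcome of the node's test, and plain strings are
# leaf nodes holding class labels. The root node tests 'outlook'.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",  # internal node: test on 'humidity'
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",            # leaf node: class label
        "rain": "no",
    },
}

def classify(node, sample):
    """Follow the outcome of each test from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Classification is thus a single root-to-leaf walk, which is why decision trees are fast at prediction time and easy to interpret.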

          Decision tree induction: The automatic construction of a decision tree from a set of class-labelled
          training samples; the related automatic generation of decision rules from examples is known as
          rule induction or automatic rule induction.
          Interpretability: This refers to the level of understanding and insight that is provided by the
          learned model.
          Naive Bayesian classifiers: They assume that the effect of an attribute value on a given class is
          independent of the values of the other attributes.
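The class-conditional independence assumption stated above means the joint likelihood of all attribute values given a class is taken as the product of the per-attribute likelihoods. The sketch below illustrates that product; every probability in it is hypothetical.

```python
# Naive (class-conditional independence) assumption:
#   P(X | C) = P(x1 | C) * P(x2 | C) * ...
# All probabilities here are hypothetical, chosen only for illustration.
p_attr_given_c = {"outlook=sunny": 0.2, "humidity=high": 0.5}
p_c = 0.6  # prior probability of class C

p_x_given_c = 1.0
for p in p_attr_given_c.values():
    p_x_given_c *= p  # multiply the individual attribute likelihoods

# This product times the prior is proportional to the posterior P(C | X);
# a naive Bayesian classifier picks the class with the largest such score.
score = p_x_given_c * p_c
print(round(score, 6))  # 0.06
```

This independence assumption is what makes the classifier "naive": it rarely holds exactly, but it reduces estimation to simple per-attribute counts and often works well in practice.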
          Overfitting: Decision trees that are too large are susceptible to a phenomenon known as
          overfitting, in which the tree fits noise in the training data rather than the underlying pattern.
          Prediction: Prediction is similar to classification, except that the outcomes to be predicted lie in
          the future and are not yet known.
          Predictive accuracy: This refers to the ability of the model to correctly predict the class label of
          new or previously unseen data.
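Predictive accuracy is usually measured as the proportion of test samples the model classifies correctly. The sketch below shows that calculation on hypothetical label lists.

```python
# Predictive accuracy = correctly classified test samples / all test samples.
# The labels below are hypothetical, used only to show the calculation.
actual    = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no",  "yes", "no"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 0.8
```

To estimate accuracy on previously unseen data, the labels compared here should come from a held-out test set, never from the training samples themselves.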

          Scalability: This refers to the ability of the learned model to perform efficiently on large amounts
          of data.
          Supervised learning: The learning of the model is ‘supervised’ if it is told to which class each
          training sample belongs.
          Tree pruning: After building a decision tree, a tree-pruning step can be performed to reduce the
          size of the decision tree.
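One common pruning move is to collapse an entire subtree into a single leaf labelled with its majority class. The sketch below shows that collapse on a hypothetical nested-dict tree (internal nodes carry "attribute"/"branches"; strings are leaf class labels); a real pruner would keep the collapse only if accuracy on a separate validation set does not suffer.

```python
from collections import Counter

def leaf_labels(node):
    """Collect the class labels at every leaf of a (sub)tree."""
    if isinstance(node, str):
        return [node]
    labels = []
    for child in node["branches"].values():
        labels.extend(leaf_labels(child))
    return labels

def prune_to_leaf(node):
    """Collapse a subtree into one leaf labelled with its majority class."""
    return Counter(leaf_labels(node)).most_common(1)[0][0]

# Hypothetical subtree: two of its three leaves predict 'yes'.
subtree = {"attribute": "humidity",
           "branches": {"high": "no", "normal": "yes", "low": "yes"}}
print(prune_to_leaf(subtree))  # yes
```

Replacing the three-leaf subtree with the single leaf "yes" shrinks the tree, trading a little training-set fit for a simpler, less overfitted model.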
          Unsupervised learning: In unsupervised learning, the class labels of the training samples are not
          known, and the number or set of classes to be learned may not be known in advance.





                                           Lovely Professional University