DCAP603: Data Warehousing and Data Mining
Unit 4: Data Mining Classification
•	Data cleaning, relevance analysis and data transformation are the preprocessing steps that may be applied to the data in order to help improve the accuracy, efficiency, and scalability of the classification or prediction process.
•	Classification and prediction methods can be compared and evaluated according to the criteria of predictive accuracy, speed, robustness, scalability and interpretability.
•	A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
4.12 Keywords
Bayes theorem: Bayesian classification is based on Bayes theorem.
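Bayes theorem computes a posterior probability P(C|X) from a prior P(C), a likelihood P(X|C), and the evidence P(X). The sketch below applies it to a toy two-class problem; all probability values are invented for illustration.

```python
# Illustrative use of Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X).
# The numbers below are made-up values for a toy two-class problem.
p_c = {"buys": 0.6, "not_buys": 0.4}            # prior P(C)
p_x_given_c = {"buys": 0.44, "not_buys": 0.19}  # likelihood P(X|C)

# Evidence P(X) via the law of total probability.
p_x = sum(p_x_given_c[c] * p_c[c] for c in p_c)

# Posterior P(C|X) for each class.
posterior = {c: p_x_given_c[c] * p_c[c] / p_x for c in p_c}
print(max(posterior, key=posterior.get))  # class with the highest posterior
```

A Bayesian classifier assigns the sample to whichever class has the highest posterior probability.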
Bayesian belief networks: These are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
Bayesian classification: Bayesian classifiers are statistical classifiers.
Classification: Classification is a data mining technique used to predict group membership for
data instances.
Data cleaning: Data cleaning refers to the preprocessing of data in order to remove or reduce
noise (by applying smoothing techniques, for example), and the treatment of missing values (e.g.,
by replacing a missing value with the most commonly occurring value for that attribute, or with
the most probable value based on statistics).
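The missing-value treatment mentioned above, replacing a missing entry with the attribute's most commonly occurring value, can be sketched in a few lines; the attribute values below are invented for illustration.

```python
from collections import Counter

# Toy attribute column with missing values (None). A common cleaning step
# is to replace each missing entry with the attribute's most frequent value.
income = ["low", "high", None, "medium", "high", None, "high"]

most_common = Counter(v for v in income if v is not None).most_common(1)[0][0]
cleaned = [v if v is not None else most_common for v in income]
print(cleaned)  # missing entries replaced by the modal value "high"
```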
Decision Tree: A decision tree is a flow-chart-like tree structure, where each internal node denotes
a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent
classes or class distributions. The topmost node in a tree is the root node.
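The structure just described, internal nodes testing attributes, branches carrying outcomes, leaves holding class labels, can be modeled directly as nested dictionaries. The attributes and labels below are invented for illustration.

```python
# A tiny hand-built decision tree as nested dicts: each internal node tests
# an attribute, each branch is an outcome of the test, and each leaf is a
# class label. Attribute names and values are invented for illustration.
tree = {
    "attr": "outlook",                      # root node: test on "outlook"
    "branches": {
        "sunny": {"attr": "humidity",       # internal node
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",                  # leaf node: class label
        "rain": "no",
    },
}

def classify(node, sample):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["attr"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Classifying a sample is just a walk from the root to a leaf, applying one attribute test per internal node.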
Decision tree induction: The automatic learning of a decision tree from class-labeled training examples; more generally, the automatic generation of decision rules from examples is known as rule induction or automatic rule induction.
Interpretability: This refers to the level of understanding and insight that is provided by the learned model.
Naive Bayesian classifiers: They assume that the effect of an attribute value on a given class is
independent of the values of the other attributes.
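Under this conditional-independence assumption, the joint likelihood P(X|C) factors into a product of per-attribute likelihoods. The sketch below scores two classes this way; all probabilities are invented toy values.

```python
from math import prod

# Naive (conditional independence) assumption: approximate P(X|C) by the
# product of per-attribute likelihoods P(x_k|C). All values are invented.
p_attr_given_c = {
    "yes": {"age=youth": 0.22, "income=medium": 0.44, "student=yes": 0.67},
    "no":  {"age=youth": 0.60, "income=medium": 0.40, "student=yes": 0.20},
}
prior = {"yes": 0.64, "no": 0.36}

# Score each class by P(C) * product of P(x_k|C); the evidence P(X) is the
# same for every class, so it can be ignored when comparing scores.
scores = {c: prior[c] * prod(p_attr_given_c[c].values()) for c in prior}
print(max(scores, key=scores.get))
```

The sample is assigned to the class with the largest score, exactly as in full Bayesian classification, but with a far cheaper likelihood estimate.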
Overfitting: Decision trees that are too large are susceptible to a phenomenon known as overfitting, in which the tree fits noise in the training data and consequently generalizes poorly to unseen data.
Prediction: Prediction is similar to classification, except that for prediction, the results lie in the
future.
Predictive accuracy: This refers to the ability of the model to correctly predict the class label of
new or previously unseen data.
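Predictive accuracy is simply the fraction of unseen samples whose class label the model predicts correctly. The labels below are a made-up test set for illustration.

```python
# Predictive accuracy: fraction of previously unseen samples whose class
# label the model predicts correctly. Labels below are a made-up test set.
actual    = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no",  "yes", "no"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 4 of 5 correct -> 0.8
```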
Scalability: This refers to the ability of the learned model to perform efficiently on large amounts
of data.
Supervised learning: The learning of the model is ‘supervised’ if it is told to which class each
training sample belongs.
Tree pruning: After a decision tree has been built, a tree-pruning step can be performed to reduce the size of the decision tree.
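One simple form of post-pruning replaces an internal node's subtree with a single leaf carrying the majority class among its leaves. The nested-dict tree representation and labels below are invented for illustration.

```python
# Minimal sketch of post-pruning: collapse an internal node to a leaf
# labeled with the majority class among its leaf children, shrinking the
# tree. The tree representation and class labels are invented.
subtree = {
    "attr": "humidity",
    "branches": {"high": "no", "normal": "yes", "low": "yes"},
}

def prune_to_leaf(node):
    """Replace a node whose children are all leaves with its majority class."""
    leaves = [b for b in node["branches"].values() if not isinstance(b, dict)]
    return max(set(leaves), key=leaves.count)

print(prune_to_leaf(subtree))  # majority class among the leaves: "yes"
```

Practical pruning methods (e.g. reduced-error or cost-complexity pruning) additionally check that the replacement does not hurt accuracy on held-out data.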
Unsupervised learning: In unsupervised learning, the class labels of the training samples are not
known, and the number or set of classes to be learned may not be known in advance.