




          14.7 Inductive Bias

          The inductive bias of a learning algorithm is the set of assumptions that the learner uses to
          predict outputs for inputs that it has not encountered (Mitchell, 1980).

          In machine learning, one aims to construct algorithms that are able to learn to predict a
          certain target output. To achieve this, the learning algorithm is presented with training
          examples that demonstrate the intended relation between input and output values. The learner
          is then expected to approximate the correct output, even for examples that were not shown
          during training. Without additional assumptions this task cannot be solved exactly, since an
          unseen situation might have an arbitrary output value. The necessary assumptions about the
          nature of the target function are subsumed under the term inductive bias (Mitchell, 1980;
          desJardins and Gordon, 1995). A classical example of an inductive bias is Occam's Razor, which
          assumes that the simplest hypothesis consistent with the training data is actually the best one.
          Here, consistent means that the hypothesis of the learner yields correct outputs for all of the
          examples that have been given to the algorithm. Approaches to a more formal definition of
          inductive bias are based on mathematical logic: there, the inductive bias is a logical formula
          that, together with the training data, logically entails the hypothesis generated by the learner.
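
          To make the Occam's Razor bias concrete, the short Python sketch below searches a small
          hypothesis space of polynomials in order of increasing degree and returns the first hypothesis
          that is consistent with the training examples. The function name, the data, and the tolerance
          are illustrative assumptions rather than a standard algorithm.

# A minimal sketch of the Occam's Razor bias: consider hypotheses in order of
# increasing complexity (polynomial degree) and keep the first one that is
# consistent with every training example.
import numpy as np

def simplest_consistent_polynomial(x, y, max_degree=5, tol=1e-6):
    """Return (degree, coefficients) of the lowest-degree polynomial that
    reproduces every training output to within `tol`, or (None, None)."""
    for degree in range(max_degree + 1):           # simplest hypotheses first
        coeffs = np.polyfit(x, y, degree)          # fit one candidate hypothesis
        consistent = np.all(np.abs(np.polyval(coeffs, x) - y) <= tol)
        if consistent:                             # Occam's Razor: stop at the
            return degree, coeffs                  # simplest consistent hypothesis
    return None, None                              # nothing in this space is consistent

# Training data generated by a quadratic target function
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2

degree, coeffs = simplest_consistent_polynomial(x, y)
print("chosen degree:", degree)                    # degree 2 is preferred over higher degrees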


             Caution   Unfortunately, this strict formalism fails in many practical cases, where the
             inductive bias can only be given as a rough description (e.g. in the case of neural networks),
             or not at all.

          The following is a list of common inductive biases in machine learning algorithms.

                Maximum Conditional Independence: If the hypothesis can be cast in a Bayesian framework,
                try to maximize conditional independence. This is the bias used in the Naive Bayes
                classifier; a minimal sketch of such a classifier follows this list.
                Minimum Cross-validation Error: When trying to choose among hypotheses, select the
                hypothesis with the lowest cross-validation error (see the cross-validation sketch after
                this list). Although cross-validation may seem to be free of bias, the No Free Lunch
                theorems show that cross-validation must be biased.

               Maximum Margin: When drawing a boundary between two classes, attempt to maximize
               the width of the boundary. This is the bias used in Support Vector Machines. The
               assumption is that distinct classes tend to be separated by wide boundaries.

               Minimum Description Length: When forming a hypothesis, attempt to minimize the length
               of the description of the hypothesis. The assumption is that simpler hypotheses are more
               likely to be true.

               Minimum Features: Unless there is good evidence that a feature is useful, it should be
               deleted. This is the assumption behind feature selection algorithms.
                Nearest Neighbors: Assume that most of the cases in a small neighborhood in feature space
                belong to the same class. Given a case for which the class is unknown, guess that it belongs
                to the same class as the majority in its immediate neighborhood. This is the bias used in
                the k-nearest neighbor algorithm; a minimal sketch follows this list. The assumption is
                that cases that are near each other tend to belong to the same class.
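
          The sketch below illustrates the Maximum Conditional Independence bias as embodied in a
          Naive Bayes classifier: features are treated as independent given the class, so the class
          score is the class prior multiplied by per-feature likelihoods. The tiny data set, the
          add-one smoothing, and all names are illustrative assumptions.

# A minimal sketch of a Naive Bayes classifier for categorical features.
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)    # (feature index, class) -> value counts
    vocabulary = defaultdict(set)            # feature index -> distinct values seen
    for features, label in zip(examples, labels):
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            vocabulary[i].add(value)
    return class_counts, feature_counts, vocabulary

def predict(class_counts, feature_counts, vocabulary, features):
    """Return the class with the highest Naive Bayes score for `features`."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, class_count in class_counts.items():
        score = class_count / total                          # prior P(class)
        for i, value in enumerate(features):
            # Conditional independence assumption: multiply per-feature
            # likelihoods, with add-one (Laplace) smoothing to avoid zeros.
            count = feature_counts[(i, label)][value]
            score *= (count + 1) / (class_count + len(vocabulary[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny categorical data set: (outlook, wind) -> play tennis?
X = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
y = ["yes", "yes", "yes", "no"]
model = train_naive_bayes(X, y)
print(predict(*model, ("rain", "weak")))                     # prints "yes"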
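
          The next sketch illustrates the Minimum Cross-validation Error bias: several hypothesis
          classes (here, polynomial degrees) are compared by leave-one-out cross-validation, and the
          class with the lowest estimated error is selected. The data set, the noise level, and the
          function names are illustrative assumptions.

# A minimal sketch of hypothesis selection by leave-one-out cross-validation.
import numpy as np

def loo_cv_error(x, y, degree):
    """Leave-one-out cross-validation error of a polynomial fit of `degree`."""
    squared_errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i                   # hold out example i
        coeffs = np.polyfit(x[keep], y[keep], degree)   # train on the rest
        squared_errors.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
    return float(np.mean(squared_errors))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 8)
y = x ** 2 + rng.normal(0.0, 0.1, size=x.size)          # noisy quadratic target

errors = {degree: loo_cv_error(x, y, degree) for degree in range(5)}
best_degree = min(errors, key=errors.get)               # bias: lowest CV error wins
print("cross-validation error by degree:", errors)
print("selected degree:", best_degree)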
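
          Finally, the Nearest Neighbors bias can be sketched as a majority vote among the k closest
          training examples; the two-cluster data set and all names below are illustrative.

# A minimal sketch of k-nearest-neighbor classification by majority vote.
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated clusters in a two-dimensional feature space
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (0.3, 0.2)))   # prints "A"
print(knn_predict(points, labels, (5.1, 5.2)))   # prints "B"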









