Page 175 - DCAP208_Management Support Systems





                                   In order to accomplish this, the neural network tries to have the hidden nodes extract features
                                   from the input nodes that efficiently describe the record represented at the input layer. This
                                   forced “squeezing” of the data through the narrow hidden layer makes the neural network
                                   extract only those predictors and combinations of predictors that are best at recreating the input
                                   record. The link weights used to create the inputs to the hidden nodes are effectively creating
                                   features that are combinations of the input node values.
                                   The concepts of neural networks are discussed in detail in the next unit.
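
The idea that each hidden node receives a weighted combination of the input node values can be illustrated with a minimal sketch. The input values and link weights below are made up for illustration, and a real network would also apply an activation function and learn the weights during training.

```python
# Hypothetical record with four input node values.
inputs = [1.0, 0.0, 1.0, 1.0]

# Hypothetical link weights: one row per hidden node, one weight per input node.
# Two hidden nodes for four inputs gives the "narrow" hidden layer.
weights = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.4, 0.6, 0.2, 0.1],
]

# The input to each hidden node is the weighted sum of the input node values,
# i.e. a feature built as a combination of the inputs.
hidden_inputs = [
    sum(w * x for w, x in zip(row, inputs))
    for row in weights
]

print(hidden_inputs)
```

Each entry of `hidden_inputs` is one extracted feature; with fewer hidden nodes than input nodes, the record has to be summarized by fewer numbers than it started with.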

                                   10.2.6 Rule Induction


                                   Rule induction is one of the major forms of data mining and is perhaps the most common form
                                   of knowledge discovery in unsupervised learning systems. It is also perhaps the form of data
                                   mining that most closely resembles the process that most people think about when they think
                                   about data mining, namely “mining” for gold through a vast database. The gold in this case
                                   would be a rule that is interesting – that tells you something about your database that you didn’t
                                   already know and probably weren’t able to explicitly articulate (aside from saying “show me
                                   things that are interesting”).

                                   Rule induction on a database can be a massive undertaking in which all possible patterns are
                                   systematically pulled out of the data and each is then assigned an accuracy and a significance
                                   that tell the user how strong the pattern is and how likely it is to occur again. In general these
                                   rules are relatively simple. For example, in a market basket database of items scanned at the
                                   checkout, you might find interesting correlations such as:
                                       If bagels are purchased then cream cheese is purchased 90% of the time and this pattern
                                       occurs in 3% of all shopping baskets.

                                       If live plants are purchased from a hardware store then plant fertilizer is purchased 60% of
                                       the time and these two items are bought together in 6% of the shopping baskets.
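
The two percentages quoted in each rule above can be computed directly from a list of baskets. The following is a minimal sketch with five made-up baskets. The function name `rule_stats` and the variable names are illustrative assumptions, not terms from the text.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, joint_frequency) for 'if antecedent then consequent'.

    accuracy        : share of baskets containing the antecedent that also
                      contain the consequent ("purchased 90% of the time")
    joint_frequency : share of all baskets containing both items
                      ("occurs in 3% of all shopping baskets")
    """
    total = len(baskets)
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    accuracy = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    joint_frequency = len(with_both) / total if total else 0.0
    return accuracy, joint_frequency


# Hypothetical data: each basket is the set of items scanned together.
baskets = [
    {"bagels", "cream cheese"},
    {"bagels", "cream cheese", "milk"},
    {"bagels", "juice"},
    {"milk"},
    {"bread", "milk"},
]

accuracy, joint_frequency = rule_stats(baskets, "bagels", "cream cheese")
print(f"accuracy={accuracy:.2f}, joint frequency={joint_frequency:.2f}")
```

In this toy data bagels appear in three baskets and cream cheese accompanies them in two, so the rule is correct two times out of three and the pattern occurs in two of the five baskets.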



                                     Did u know? The rules that are pulled from the database are extracted and ordered to be
                                     presented to the user based on the percentage of times that they are correct and how often
                                     they apply.
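
As a rough illustration of this ordering, the sketch below ranks the two example rules from this section. The `Rule` class is hypothetical, and sorting by accuracy first and by frequency second is an assumption, since the text does not say which measure takes priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    accuracy: float   # how often the rule is correct when it applies
    frequency: float  # share of baskets in which the pattern occurs


# The two example rules from the text, with their stated percentages.
rules = [
    Rule("live plants", "plant fertilizer", accuracy=0.60, frequency=0.06),
    Rule("bagels", "cream cheese", accuracy=0.90, frequency=0.03),
]

# Assumed ordering: most accurate first, ties broken by frequency.
ranked = sorted(rules, key=lambda r: (r.accuracy, r.frequency), reverse=True)

for r in ranked:
    print(f"If {r.antecedent} then {r.consequent}: "
          f"{r.accuracy:.0%} accurate, in {r.frequency:.0%} of baskets")
```

Swapping the two elements of the sort key would put frequently occurring rules first instead, which may suit a user who cares more about how widely a rule applies than about how often it is correct.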
                                   The bane of rule induction systems is also their strength: they retrieve all possible interesting
                                   patterns in the database. This is a strength in the sense that no stone is left unturned, but it can
                                   also be viewed as a weakness because the user can easily be overwhelmed by so many rules
                                   that it is difficult to look through all of them. You almost need a second pass of data mining
                                   over the list of interesting rules generated by the rule induction system in order to find the
                                   most valuable gold nugget among them. This overabundance of patterns can also be problematic
                                   for the simple task of prediction: because all possible patterns are culled from the database,
                                   equally interesting rules may make conflicting predictions. Automating the process of culling
                                   the most interesting rules and of combining the recommendations of a variety of rules is well
                                   handled by many of the commercially available rule induction systems on the market today
                                   and is also an area of active research.
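
One simple way to combine the recommendations of conflicting rules is accuracy-weighted voting, sketched below. This is an illustrative assumption and not a description of how any particular commercial system works. The rules, attribute names, and the `predict` function are all hypothetical.

```python
from collections import defaultdict


def predict(record, rules):
    """Combine every rule that applies to the record by summing accuracies per label.

    record : set of attribute values describing one customer
    rules  : iterable of (conditions, predicted_label, accuracy) tuples
    Returns the label with the highest total, or None if no rule applies.
    """
    votes = defaultdict(float)
    for conditions, label, accuracy in rules:
        if conditions <= record:  # rule applies when all its conditions hold
            votes[label] += accuracy
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical rules that can disagree about the same customer.
rules = [
    (frozenset({"age<30"}), "buys", 0.70),
    (frozenset({"income=low"}), "does not buy", 0.80),
    (frozenset({"student"}), "buys", 0.60),
]

record = {"age<30", "income=low", "student"}
prediction = predict(record, rules)
print(prediction)
```

Here the single most accurate rule says "does not buy", but two weaker rules agree on "buys" and together outweigh it. Choosing only the most accurate applicable rule is an equally plausible policy and would give the opposite answer.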

                                   Self Assessment

                                   Fill in the blanks:
                                   5.  ................... is a branch of mathematics concerning the collection and the description of
                                       data.



          168                               LOVELY PROFESSIONAL UNIVERSITY