Page 175 - DCAP208_Management Support Systems
To accomplish this, the neural network tries to have the hidden nodes extract features
from the input nodes that efficiently describe the record represented at the input layer. This
“squeezing” of the data through the narrow hidden layer forces the neural network to
extract only those predictors and combinations of predictors that are best at recreating the input
record. The link weights used to create the inputs to the hidden nodes are effectively creating
features that are combinations of the input node values.
The concepts of neural networks are discussed in detail in the next unit.
10.2.6 Rule Induction
Rule induction is one of the major forms of data mining and is perhaps the most common form
of knowledge discovery in unsupervised learning systems. It is also perhaps the form of data
mining that most closely resembles what most people picture when they think about data
mining, namely “mining” for gold in a vast database. The gold in this case is a rule that is
interesting: one that tells you something about your database that you didn’t already know
and probably couldn’t articulate explicitly (beyond asking “show me things that are
interesting”).
Rule induction on a database can be a massive undertaking in which all possible patterns are
systematically pulled out of the data and each is then assigned an accuracy and a significance
that tell the user how strong the pattern is and how likely it is to occur again. In general these
rules are relatively simple. For example, in a market basket database of items scanned in
consumer market baskets, you might find interesting correlations such as:
If bagels are purchased then cream cheese is purchased 90% of the time and this pattern
occurs in 3% of all shopping baskets.
If live plants are purchased from a hardware store then plant fertilizer is purchased 60% of
the time and these two items are bought together in 6% of the shopping baskets.
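The two percentages quoted in each rule can be computed directly from a list of baskets. The following is a minimal sketch, not the method of any particular product: the function name `rule_stats`, the names `accuracy` and `frequency`, and the five sample baskets are all hypothetical and chosen only for illustration. Here `accuracy` is the share of baskets containing the "if" item that also contain the "then" item (often called confidence), and `frequency` is the share of all baskets containing both items (often called support).

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, frequency) for the rule 'if antecedent then consequent'.

    accuracy  = baskets with both items / baskets with the antecedent
    frequency = baskets with both items / all baskets
    """
    total = len(baskets)
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    accuracy = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    frequency = len(with_both) / total if total else 0.0
    return accuracy, frequency


# Hypothetical sample data: each basket is the set of items in one purchase.
baskets = [
    {"bagels", "cream cheese"},
    {"bagels", "cream cheese", "milk"},
    {"bagels", "juice"},
    {"milk", "bread"},
    {"bread"},
]

accuracy, frequency = rule_stats(baskets, "bagels", "cream cheese")
print(f"accuracy={accuracy:.2f}, frequency={frequency:.2f}")
```

In this toy data, three baskets contain bagels and two of those also contain cream cheese, so the rule holds in two of three cases and the pair appears in two of the five baskets.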
Did u know? The rules that are pulled from the database are extracted and ordered to be
presented to the user based on the percentage of times that they are correct and how often
they apply.
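The ordering step can be sketched as a simple sort. This is only an illustration under stated assumptions: the `Rule` type and the `rank_rules` function are hypothetical, the two rules reuse the figures from the examples above, and sorting by accuracy first with frequency as the tie-breaker is one possible choice rather than a prescribed one.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str   # the "if" item
    consequent: str   # the "then" item
    accuracy: float   # fraction of the time the rule is correct
    frequency: float  # fraction of baskets in which the pattern occurs


def rank_rules(rules):
    """Order rules by accuracy, breaking ties by frequency (highest first)."""
    return sorted(rules, key=lambda r: (r.accuracy, r.frequency), reverse=True)


rules = [
    Rule("live plants", "plant fertilizer", 0.60, 0.06),
    Rule("bagels", "cream cheese", 0.90, 0.03),
]

ranked = rank_rules(rules)
for r in ranked:
    print(f"If {r.antecedent} then {r.consequent}: "
          f"{r.accuracy:.0%} accurate, occurs in {r.frequency:.0%} of baskets")
```

A different key, such as frequency first, would put the more widely applicable plant fertilizer rule at the top instead.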
The bane of rule induction systems is also their strength: they retrieve all possible interesting
patterns in the database. This is a strength in the sense that it leaves no stone unturned, but it can
also be viewed as a weakness because the user can easily be overwhelmed by so many rules
that it is difficult to look through all of them. You almost need a second
pass of data mining to go through the list of interesting rules generated by the
rule induction system in order to find the most valuable gold nugget among
them. This overabundance of patterns can also be problematic for the simple task of prediction:
because all possible patterns are culled from the database, equally interesting rules may make
conflicting predictions. Automating the process of culling the most interesting rules
and of combining the recommendations of a variety of rules is well handled by many of the
commercially available rule induction systems on the market today and is also an area of active
research.
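One simple way to combine the recommendations of several rules is to let every rule that applies cast a vote weighted by its accuracy. The sketch below is an assumption made for illustration only: it is not how any particular commercial system resolves conflicts, and the `predict` function, the rule list, and the accuracy figures other than the bagels rule are hypothetical.

```python
from collections import defaultdict


def predict(basket, rules):
    """Suggest the item most strongly recommended by the rules that apply.

    rules is a list of (antecedent, consequent, accuracy) tuples. Each rule
    whose antecedent is in the basket votes for its consequent, weighted by
    its accuracy. Returns None when no rule applies.
    """
    votes = defaultdict(float)
    for antecedent, consequent, accuracy in rules:
        if antecedent in basket and consequent not in basket:
            votes[consequent] += accuracy
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical rules that recommend different items for the same basket.
rules = [
    ("bagels", "cream cheese", 0.90),
    ("bagels", "butter", 0.40),
    ("coffee", "butter", 0.30),
]

prediction = predict({"bagels", "coffee"}, rules)
print(prediction)
```

Here two rules recommend butter and one recommends cream cheese; weighting by accuracy lets the single strong rule outweigh the two weaker ones.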
Self Assessment
Fill in the blanks:
5. ................... is a branch of mathematics concerning the collection and the description of
data.
168 LOVELY PROFESSIONAL UNIVERSITY