Page 138 - DCAP208_Management Support Systems

Unit 9: Data Mining




          Like the term artificial intelligence, data mining is an umbrella term that can be applied to a
          number of varying activities. In the corporate world, data mining is used most frequently to
          determine the direction of trends and to predict future outcomes. It is employed to build models
          and decision support systems that give people information they can act on. Data mining has also
          been applied in counter-terrorism; it was reportedly used to identify the leader of the 9/11
          attacks.
          Data miners are often statisticians who use techniques with names like nearest-neighbor models,
          k-means clustering, the holdout method, k-fold cross-validation, the leave-one-out method, and
          so on. Regression techniques are used to fit relationships between variables, which helps
          separate useful patterns from irrelevant variation. The term Bayesian is seen frequently in the
          field, referring to a class of inference techniques that estimate the likelihood of an event by
          combining prior probabilities with conditional probabilities based on observed evidence.
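
          The holdout, k-fold, and leave-one-out methods named above differ only in how the data is
          divided into training and test portions. The following is a minimal sketch of that division,
          not a definitive implementation. The function names `k_fold_splits` and `holdout_split` are
          hypothetical, and the sketch assumes the samples are already in random order, so it does no
          shuffling.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must be at least 2 and at most n_samples")
    indices = list(range(n_samples))
    # Spread any remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def holdout_split(n_samples, test_fraction=0.25):
    """Single split: the last portion of the data is held out for testing."""
    n_test = max(1, int(n_samples * test_fraction))
    indices = list(range(n_samples))
    return indices[:-n_test], indices[-n_test:]


# Five folds over ten samples: every sample is tested exactly once.
folds = list(k_fold_splits(10, 5))

# Leave-one-out is the special case k == n_samples.
leave_one_out = list(k_fold_splits(4, 4))

# Holdout: one fixed train/test division.
train_idx, test_idx = holdout_split(8, test_fraction=0.25)
```

          In each case a model would be fitted on the training indices and scored on the test indices;
          with k-fold and leave-one-out, the scores are then averaged over the folds.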




             Notes  Spam filtering is arguably a form of data mining: it automatically brings
             relevant messages to the surface from a chaotic sea of phishing attempts and Viagra
             pitches.
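
          One way to connect spam filtering with the Bayesian idea of combining a prior probability
          with conditional probabilities is Bayes' rule applied to a single word. The sketch below is
          illustrative only: the function name `posterior_spam` is hypothetical, the probabilities
          are made-up numbers rather than measured values, and it is not a description of how any
          particular spam filter works.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) using Bayes' rule.

    prior_spam        -- P(spam) before looking at the message
    p_word_given_spam -- P(word appears | message is spam)
    p_word_given_ham  -- P(word appears | message is legitimate)
    """
    # Total probability of seeing the word in any message.
    evidence = (p_word_given_spam * prior_spam
                + p_word_given_ham * (1 - prior_spam))
    if evidence == 0:
        raise ValueError("the word has zero probability under both classes")
    return p_word_given_spam * prior_spam / evidence


# Assumed, illustrative numbers: half of all mail is spam, and the word
# appears in 40% of spam messages but only 1% of legitimate ones.
p = posterior_spam(prior_spam=0.5,
                   p_word_given_spam=0.4,
                   p_word_given_ham=0.01)
print(f"P(spam | word) = {p:.3f}")
```

          With these assumed inputs, seeing the word raises the estimated probability of spam well
          above the 0.5 prior, which is the sense in which prior and conditional probabilities are
          combined.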
          Decision trees are used to filter mountains of data. In a decision tree, all data passes through a
          root (entry) node, where a test separates the data into streams depending on its
          characteristics. Each stream can then be split again by further tests at lower nodes.


                 Example: Data about consumer behavior is likely to be filtered based on demographic
          factors.
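
          The routing of records through a decision tree can be sketched as follows. This is a
          hand-built tree for illustration, not one learned from data; the `Node` class, the `route`
          function, the field names `age` and `income`, the thresholds, and the segment labels are all
          hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    label: str
    test: Optional[Callable[[dict], bool]] = None  # None marks a leaf
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None


def route(record, node):
    """Send one record from the root down to a leaf and return the leaf label."""
    while node.test is not None:
        node = node.if_true if node.test(record) else node.if_false
    return node.label


# Root node tests age; older customers are split again by income.
tree = Node(
    label="age under 30?",
    test=lambda r: r["age"] < 30,
    if_true=Node("younger shoppers"),
    if_false=Node(
        label="income above 50000?",
        test=lambda r: r["income"] > 50000,
        if_true=Node("older, higher income"),
        if_false=Node("older, lower income"),
    ),
)

# Made-up consumer records, each described by demographic factors.
customers = [
    {"age": 24, "income": 30000},
    {"age": 45, "income": 80000},
    {"age": 52, "income": 20000},
]

segments = [route(c, tree) for c in customers]
print(segments)
```

          In practice the tests and thresholds would be chosen by a tree-learning algorithm from the
          data itself rather than written by hand.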

          Data mining is not primarily about fancy graphs and visualization techniques, but it does
          employ them to show what it has found. People generally absorb statistical information more
          readily in visual than in verbal form, so this format for presentation can be very persuasive
          and powerful if used in the right context.

          As our civilization becomes increasingly data-saturated and sensors are distributed en masse
          into our local environments, we will inevitably overlook things on the first pass over the
          data. Data mining will let us correct these oversights and discover new insights based on
          past data, giving us more bang for our data storage buck.

          9.1.1 Types of Information

          We have been collecting a myriad of data, from simple numerical measurements and text
          documents to more complex information such as spatial data, multimedia channels, and hypertext
          documents. Here is a non-exhaustive list of the kinds of information collected in digital form
          in databases and in flat files.
               Business transactions: Transactions in business are often “memorized” in perpetuity.
               Such transactions are usually time related and can be inter-business deals such as
               purchases, exchanges, banking, stock, etc., or intra-business operations such as
               management of in-house wares and assets. Large department stores, for example, thanks
               to the widespread use of bar codes, store millions of transactions daily, often
               representing terabytes of data. Storage space is not the major problem, as the price of
               hard disks is continuously dropping. The most important problem for businesses that
               struggle to survive in a highly competitive world is using the data effectively, within
               a reasonable time frame, for competitive decision-making.





                                           LOVELY PROFESSIONAL UNIVERSITY