Page 176 - DCAP208_Management Support Systems

Unit 10: Data Mining Tools and Techniques




          6.   ................... is a technique that classifies each record in a dataset based on a combination of
               the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1).
          7.   ................... is the method by which like records are grouped together.

          8.   A ................... is a predictive model that, as its name implies, can be viewed as a tree.
          9.   Artificial ................... are computer programs implementing sophisticated pattern detection
               and machine learning algorithms on a computer.

          10.  ................... is the form of data mining that most closely resembles the process that most
               people think about when they think about data mining, namely “mining” for gold through
               a vast database.

          10.3 Text Mining

          Text mining is the process of using computer technology to sift through text documents for the
          purposes of research and analysis. It is often considered very similar to the process known as
          data mining, but it relies on special programming to look in uncategorized text and find meaning
          or patterns instead of analyzing pre-categorized database information. Text mining has many
          applications in areas like science, marketing, and data organization.
          The complexity involved in organizing words into language has long made free text too difficult
          for computers to interpret fully, but scientists have worked hard to improve this kind of programming. Many methods
          have been developed that let scientists identify phrases and discover facts about text. This is
          generally not the same as fully deciphering the meaning, but it allows for shortcuts that achieve
          many of the same goals. Text mining takes advantage of some of these techniques, and as this
          technology improves, text mining is generally expected to improve as well.
          Experts use text information analysis primarily to do research into written documents. Large
          amounts of written data can be hard to analyze because of the tremendous amount of time
          required. Computers can go through this text much more quickly, but they cannot understand it. Text
          mining techniques allow computers to find useful trends in text, presenting the data in a way
          that may reveal new facts or allow experts to make discoveries.
          An example of a use for this technology would be market research. Experts could analyze search
          results on a product name and have the program look for phrases
          that express user sentiment. In this way, they may find out how people really feel about their
          product in a very detailed way. They could also simply look for their product and see which
          phrases were popping up most often, and this might help them develop new ideas about how to
          please their customers.
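
The market research idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the sample reviews, the two phrase lists, and the function names (`tokenize`, `bigrams`, `mine_reviews`) are all assumptions made for the example, and real sentiment analysis would rely on a much larger, curated lexicon.

```python
import re
from collections import Counter

# Hypothetical two-word sentiment phrases, chosen only for illustration.
POSITIVE_PHRASES = {"works great", "love it"}
NEGATIVE_PHRASES = {"stopped working", "too expensive"}

# Hypothetical snippets standing in for search results on a product name.
SAMPLE_DOCS = [
    "The blender works great and is easy to clean.",
    "Works great at first, but it stopped working after a month.",
    "Too expensive for what it does.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return every pair of adjacent words as a space-joined phrase."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def mine_reviews(documents):
    """Count two-word phrases and tally matches against the phrase lists."""
    phrase_counts = Counter()
    sentiment = Counter()
    for doc in documents:
        phrases = bigrams(tokenize(doc))
        phrase_counts.update(phrases)
        for phrase in phrases:
            if phrase in POSITIVE_PHRASES:
                sentiment["positive"] += 1
            elif phrase in NEGATIVE_PHRASES:
                sentiment["negative"] += 1
    return phrase_counts, sentiment


if __name__ == "__main__":
    counts, sentiment = mine_reviews(SAMPLE_DOCS)
    print(counts.most_common(3))  # phrases that appear most often
    print(dict(sentiment))        # number of positive and negative matches
```

Note that the program only matches and counts phrases; it does not understand the reviews, which is the kind of shortcut the section describes.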
          Another use for mining text is analyzing scientific papers on similar subjects looking for new
          trends or agreements. This has allowed some scientists to make predictive assumptions that
          have proven useful in fields like protein analysis. Some experts think these sorts of applications
          may eventually provide unexpected discoveries.
          Data mining is quite similar to the mining of text, but it is generally less complex to do
          because it relies on data that has already been formatted into categories.
          For example, data mining software could go through all the information for job applicants in a
          database, looking for trends.
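
As a rough sketch of the job applicant example, the snippet below groups records by one category and compares outcomes across groups. The records, the field names (`degree`, `years_experience`, `hired`), and the function `hire_rate_by` are hypothetical and exist only to show how little parsing is needed when the data is already categorized.

```python
from collections import defaultdict

# Hypothetical applicant records; every field is already a labeled category.
APPLICANTS = [
    {"degree": "BSc", "years_experience": 2, "hired": False},
    {"degree": "MSc", "years_experience": 5, "hired": True},
    {"degree": "BSc", "years_experience": 6, "hired": True},
    {"degree": "MSc", "years_experience": 1, "hired": True},
    {"degree": "BSc", "years_experience": 1, "hired": False},
]


def hire_rate_by(records, field):
    """Return the fraction of hired applicants for each value of `field`."""
    totals = defaultdict(int)
    hires = defaultdict(int)
    for record in records:
        key = record[field]
        totals[key] += 1
        if record["hired"]:
            hires[key] += 1
    return {key: hires[key] / totals[key] for key in totals}


if __name__ == "__main__":
    for degree, rate in hire_rate_by(APPLICANTS, "degree").items():
        print(f"{degree}: {rate:.0%} hired")
```

Because each field already has a fixed meaning, the trend can be read off with simple grouping and counting, with no tokenizing or phrase matching.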

             Caution: Text mining is more difficult for computers to do because pure text is harder to
             analyze than data with categories.



                                           LOVELY PROFESSIONAL UNIVERSITY