Page 176 - DCAP208_Management Support Systems
Unit 10: Data Mining Tools and Techniques
6. ................... is a technique that classifies each record in a dataset based on a combination of
the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1).
7. ................... is the method by which like records are grouped together.
8. A ................... is a predictive model that, as its name implies, can be viewed as a tree.
9. Artificial ................... are computer programs implementing sophisticated pattern detection
and machine learning algorithms on a computer.
10. ................... is the form of data mining that most closely resembles the process that most
people think about when they think about data mining, namely “mining” for gold through
a vast database.
10.3 Text Mining
Text mining is the process of using computer technology to sift through text documents for the
purposes of research and analysis. It is often considered very similar to data mining, but it relies
on specialized programs that search uncategorized (unstructured) text to find meaning or patterns,
instead of analyzing database information that has already been categorized. Text mining has many
applications in areas like science, marketing, and data organization.
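A first step in most text mining is turning free text into something countable. The Python sketch below is illustrative only: it assumes a simple regular-expression tokenizer and a tiny hand-picked stopword list, and it converts each uncategorized document into a table of word counts (a "bag of words").

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real projects use much larger lists).
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it", "for", "this"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_counts(text):
    """Turn one uncategorized document into a word -> count mapping."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)

docs = [
    "The battery life is great and the screen is great.",
    "Screen cracked after a week; battery is fine.",
]

# One row of counts per document: free text becomes structured data.
table = [term_counts(d) for d in docs]
for row in table:
    print(row.most_common(3))
```

Once text is in this form, the counting and grouping methods used in ordinary data mining can be applied to it.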
Natural language is far too complex for computers to understand fully, but scientists have worked
hard to improve this kind of programming. They have developed many methods for identifying phrases
and extracting facts from text. These methods generally fall short of fully deciphering the meaning,
but they provide shortcuts that achieve many of the same goals. Text mining takes advantage of some
of these techniques, so as the techniques improve, text mining is generally expected to improve as well.
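One simple shortcut for identifying phrases without understanding them is to count pairs of adjacent words (bigrams) and keep the pairs that recur. The sketch below is a minimal illustration under that assumption; `frequent_phrases` and its `min_count` threshold are hypothetical names, and, for simplicity, it treats words separated by punctuation as adjacent.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_phrases(texts, min_count=2):
    """Count adjacent word pairs (bigrams); keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}

reviews = [
    "Customer service was slow but battery life is excellent.",
    "Great battery life, poor customer service.",
    "Battery life could be better.",
]
print(frequent_phrases(reviews))
```

The program never learns what "battery life" means; it only notices that the two words keep appearing together, which is often enough to flag a topic worth a closer look.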
Experts use text analysis primarily to research written documents. Large amounts of written data
can be hard to analyze because of the tremendous amount of time required. Computers can go through
this text much more quickly, but they cannot understand it. Text mining techniques allow computers
to find useful trends in text and present the data in a way that may reveal new facts or allow
experts to make discoveries.
Market research is one example of how this technology can be used.
Experts could analyze search results for a product name and have the program look for phrases
that express user sentiment. In this way, they may learn in considerable detail how people really
feel about their product. They could also simply search for their product and see which phrases
appear most often, which might give them new ideas about how to please their customers.
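The sentiment idea can be sketched with a small word list. The example below is a simplified, lexicon-based approach, not a production method: the `POSITIVE` and `NEGATIVE` word sets, the product name "AcmePhone", and the sample snippets are all hypothetical. It keeps only snippets that mention the product and scores each one as positive words minus negative words.

```python
import re

# Hypothetical sentiment lexicon; real systems use much larger curated word lists.
POSITIVE = {"love", "great", "excellent", "reliable"}
NEGATIVE = {"hate", "broken", "slow", "disappointing"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words) for one snippet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def product_sentiment(snippets, product):
    """Score only the snippets that mention the product name."""
    name = product.lower()
    return [(s, sentiment_score(s)) for s in snippets if name in s.lower()]

results = [
    "I love my AcmePhone, the camera is excellent.",
    "AcmePhone arrived broken and support was slow.",
    "Thinking about buying a tablet instead.",
]
for snippet, score in product_sentiment(results, "AcmePhone"):
    print(score, snippet)
```

Word counting like this misses negation ("not great") and sarcasm, which is one reason the section notes that these methods fall short of fully deciphering meaning.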
Another use of text mining is analyzing scientific papers on related subjects to look for new
trends or points of agreement. This has allowed some scientists to form predictive hypotheses that
have proven useful in fields such as protein analysis. Some experts think these sorts of applications
may eventually lead to unexpected discoveries.
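A common way to look for such agreements across many papers is term co-occurrence: counting how often two terms of interest appear in the same document. The sketch below assumes single-word terms and uses made-up abstract snippets; `cooccurrence` is a hypothetical helper name, not a standard library function.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(abstracts, terms):
    """Count how many abstracts mention each pair of the given terms together."""
    counts = Counter()
    for text in abstracts:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        # Sort so each pair is always stored in the same order, e.g. ("inhibitor", "kinase").
        present = sorted(t for t in terms if t in words)
        counts.update(combinations(present, 2))
    return counts

abstracts = [
    "Kinase inhibitor binding was measured in vitro.",
    "A new inhibitor shows strong kinase binding.",
    "Protein folding rates depend on temperature.",
]
pairs = cooccurrence(abstracts, ["kinase", "inhibitor", "binding", "folding"])
print(pairs.most_common(3))
```

Pairs that co-occur far more often than expected can point researchers toward relationships worth testing, although co-occurrence alone does not establish that the relationship is real.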
Data mining is quite similar to text mining, but it is generally less complex because it works on
data that has already been organized into categories, such as the fields of a database.
For example, data mining software could go through all the information on job applicants in a
database, looking for trends.
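Because the applicant data is already categorized, finding a trend can be as simple as grouping records by one field. The records and field names below (`degree`, `years_experience`, `hired`) are hypothetical, chosen only to illustrate the idea; the sketch computes the share of applicants hired for each degree.

```python
from collections import defaultdict

# Hypothetical applicant records: each field is already a labelled category or value.
applicants = [
    {"degree": "MBA", "years_experience": 5, "hired": True},
    {"degree": "MBA", "years_experience": 2, "hired": False},
    {"degree": "BSc", "years_experience": 3, "hired": True},
    {"degree": "BSc", "years_experience": 1, "hired": False},
    {"degree": "BSc", "years_experience": 4, "hired": True},
]

def hire_rate_by(records, field):
    """Group records by one categorical field and return the share hired in each group."""
    totals = defaultdict(int)
    hires = defaultdict(int)
    for r in records:
        totals[r[field]] += 1
        hires[r[field]] += r["hired"]  # True counts as 1, False as 0
    return {k: hires[k] / totals[k] for k in totals}

print(hire_rate_by(applicants, "degree"))
```

No tokenizing or phrase detection is needed here, which illustrates why mining categorized data is usually simpler than mining raw text.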
Caution Text mining is more difficult for computers to do because pure text is harder to
analyze than data with categories.
LOVELY PROFESSIONAL UNIVERSITY 169