Page 141 - DCAP208_Management Support Systems
Data integration: At this stage, multiple data sources, often heterogeneous, may be
combined in a common source.
Data selection: At this step, the data relevant to the analysis is identified and retrieved
from the data collection.
Data transformation: Also known as data consolidation, this is the phase in which the
selected data is transformed into forms appropriate for the mining procedure.
Data mining: This is the crucial step in which intelligent techniques are applied to extract
potentially useful patterns.
Pattern evaluation: In this step, only the truly interesting patterns that represent knowledge
are identified, based on given measures.
Knowledge representation: It is the final phase in which the discovered knowledge is
visually represented to the user. This essential step uses visualization techniques to help
users understand and interpret the data mining results.
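The steps above can be sketched as a chain of small functions, one per step. This is a minimal illustration under stated assumptions, not a reference implementation: the two store sources, the RELEVANT item set, the function names and the 0.5 support threshold are all hypothetical, data cleaning is left out to keep the sketch short, and the "mining" step is plain co-occurrence counting of item pairs rather than any particular published algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sources: sales records as (store, customer_id, item)
STORE_A = [("A", 1, "bread"), ("A", 1, "milk"), ("A", 2, "bread"), ("A", 2, "butter")]
STORE_B = [("B", 3, "bread"), ("B", 3, "milk"), ("B", 4, "milk"), ("B", 4, "eggs"),
           ("B", 4, "newspaper")]
RELEVANT = {"bread", "milk", "butter", "eggs"}  # hypothetical scope of the analysis

def integrate(*sources):
    """Data integration: combine several sources into one common list."""
    return [row for source in sources for row in source]

def select(records, relevant):
    """Data selection: keep only the records relevant to the analysis."""
    return [r for r in records if r[2] in relevant]

def transform(records):
    """Data transformation: group records into one basket (item set) per customer."""
    baskets = {}
    for store, customer, item in records:
        baskets.setdefault((store, customer), set()).add(item)
    return list(baskets.values())

def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep only pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}

def represent(patterns):
    """Knowledge representation: a simple text bar chart for the user."""
    for pair, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
        print(f"{' & '.join(pair):<15} {'#' * int(support * 20)} {support:.2f}")

baskets = transform(select(integrate(STORE_A, STORE_B), RELEVANT))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
represent(patterns)
```

Here only the pair bread & milk appears in at least half of the four baskets, so it is the single pattern shown to the user.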
It is common to combine some of these steps. For instance, data cleaning and data
integration can be performed together as a pre-processing phase to generate a data warehouse.
Data selection and data transformation can also be combined, either when the consolidation of the
data is the result of the selection or, as in the case of data warehouses, when the selection is done
on transformed data.
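A minimal sketch of such combined steps, assuming two hypothetical branch sources (BRANCH_1, BRANCH_2) with inconsistent region labels and one missing value. Cleaning and integration happen in a single pre-processing pass that fills a warehouse-style table; the data is then consolidated by region, and selection (here a hypothetical threshold of 100 sales) is applied to the already transformed data.

```python
# Hypothetical raw sources with inconsistent labels, string-typed numbers and a missing value
BRANCH_1 = [{"region": " North ", "sales": "120"}, {"region": "south", "sales": "80"}]
BRANCH_2 = [{"region": "NORTH", "sales": "95"}, {"region": "South", "sales": None}]

def preprocess(*sources):
    """Cleaning and integration in one pre-processing pass: each record is
    cleaned as it is merged into a single warehouse-style table."""
    warehouse = []
    for source in sources:
        for row in source:
            if row["sales"] is None:  # cleaning: drop records with missing values
                continue
            warehouse.append({
                "region": row["region"].strip().title(),  # cleaning: consistent labels
                "sales": int(row["sales"]),               # cleaning: consistent types
            })
    return warehouse

def consolidate(warehouse):
    """Transformation: aggregate (consolidate) sales by region."""
    totals = {}
    for row in warehouse:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

warehouse = preprocess(BRANCH_1, BRANCH_2)
totals = consolidate(warehouse)
# Selection performed on transformed (consolidated) data, as in a data warehouse
selected = {region: total for region, total in totals.items() if total >= 100}
print(selected)
```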
KDD is an iterative process. Once the discovered knowledge is presented to the user, the
evaluation measures can be enhanced, the mining can be further refined, new data can be
selected or further transformed, or new data sources can be integrated, in order to get different,
more appropriate results.
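The iterative character of KDD can be illustrated with a loop that refines one evaluation measure. In this sketch the baskets, the `refine` function and its stopping rule (the user is assumed to want at most `max_patterns` patterns) are hypothetical; tightening a minimum-count threshold is only one of the refinements the text mentions, alongside selecting new data or integrating new sources.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets that have already been selected and transformed
BASKETS = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk"},
           {"milk", "eggs"}, {"bread", "milk", "eggs"}]

def pair_counts(baskets):
    """Mining step: count the baskets that contain each pair of items."""
    return Counter(pair for basket in baskets
                   for pair in combinations(sorted(basket), 2))

def refine(baskets, min_count=1, max_patterns=2):
    """Re-evaluate the mined pairs with a stricter minimum count until the
    number of patterns shown to the user is small enough (assumed stopping rule)."""
    counts = pair_counts(baskets)
    while True:
        patterns = {pair: c for pair, c in counts.items() if c >= min_count}
        print(f"min_count={min_count}: {len(patterns)} patterns")
        if len(patterns) <= max_patterns:
            return min_count, patterns
        min_count += 1  # tighten the evaluation measure and iterate again

threshold, patterns = refine(BASKETS)
print(threshold, patterns)
```

Integer counts are used instead of fractional support so that the threshold comparisons are exact; the loop always terminates for a non-negative `max_patterns`, because a high enough threshold leaves no patterns at all.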
Data mining derives its name from the similarities between searching for valuable information
in a large database and mining rocks for a vein of valuable ore. Both imply either sifting
through a large amount of material or ingeniously probing the material to pinpoint exactly
where the value resides. The term is, however, a misnomer: mining for gold in rocks is usually
called “gold mining”, not “rock mining”, so by analogy data mining should have been
called “knowledge mining” instead. Nevertheless, data mining became the customary term,
and very rapidly a trend that even overshadowed more general terms, such as knowledge
discovery in databases (KDD), that describe a more complete process.
Caution Other similar terms referring to data mining are: data dredging, knowledge
extraction and pattern discovery.
9.1.3 Types of Data
In principle, data mining is not specific to one type of media or data. Data mining should be
applicable to any kind of information repository. However, algorithms and approaches may
differ when applied to different types of data. Indeed, the challenges presented by different
types of data vary significantly. Data mining is being applied to, and studied for, relational,
object-relational and object-oriented databases; data warehouses; transactional databases;
unstructured and semi-structured repositories such as the World Wide Web; advanced databases
such as spatial, multimedia, time-series and textual databases; and even flat files. Here are some
examples in more detail:
134 LOVELY PROFESSIONAL UNIVERSITY