Page 141 - DCAP208_Management Support Systems

Management Support Systems




                                       Data integration: At this stage, multiple data sources, often heterogeneous, may be
                                       combined in a common source.

                                       Data selection: At this step, the data relevant to the analysis is decided on and retrieved
                                       from the data collection.
                                       Data transformation: Also known as data consolidation, this is the phase in which the
                                       selected data is transformed into forms appropriate for the mining procedure.
                                       Data mining: This is the crucial step in which clever techniques are applied to extract
                                       potentially useful patterns.

                                       Pattern evaluation: In this step, only the truly interesting patterns, those that
                                       represent knowledge, are identified based on given measures.

                                       Knowledge representation: It is the final phase in which the discovered knowledge is
                                       visually represented to the user. This essential step uses visualization techniques to help
                                       users understand and interpret the data mining results.
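
                                   The steps listed above can be sketched as a small pipeline. This is only an illustrative
                                   sketch under stated assumptions, not a prescribed implementation: the two toy sources
                                   (store_a, store_b), their field names, the 0.5 support threshold and every function name
                                   are hypothetical. Co-occurrence counting of item pairs stands in for the mining technique,
                                   and a plain text bar stands in for visualization.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two heterogeneous sources with different field names.
store_a = [
    {"txn": 1, "items": "bread,milk", "cashier": "c1"},
    {"txn": 2, "items": "bread,butter,milk", "cashier": "c2"},
]
store_b = [
    {"id": 3, "basket": ["milk", "butter"]},
    {"id": 4, "basket": ["bread", "milk"]},
]

def integrate(a, b):
    """Data integration: bring both sources into one common record layout."""
    merged = [{"id": r["txn"], "items": r["items"], "cashier": r["cashier"]} for r in a]
    merged += [{"id": r["id"], "items": ",".join(r["basket"]), "cashier": None} for r in b]
    return merged

def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: turn each comma-separated string into a set of items."""
    return [frozenset(s.split(",")) for s in item_strings]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the given measure."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

def represent(patterns):
    """Knowledge representation: render each pattern as a text bar."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        f"{' + '.join(pair):<15} {'#' * round(support * 10)} {support:.2f}"
        for pair, support in ordered
    ]

baskets = transform(select(integrate(store_a, store_b)))
patterns = evaluate(mine(baskets), len(baskets))
for line in represent(patterns):
    print(line)
```

                                   Each function takes the output of the previous one, which mirrors the order of the steps
                                   and makes it easy to swap a single stage, for example a different evaluation measure,
                                   without touching the others.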

                                   It is common to combine some of these steps together. For instance, data cleaning and data
                                   integration can be performed together as a pre-processing phase to generate a data warehouse.
                                   Data selection and data transformation can also be combined where the consolidation of the data is
                                   the result of the selection, or, as for the case of data warehouses, the selection is done on
                                   transformed data.
                                   KDD is an iterative process. Once the discovered knowledge is presented to the user, the
                                   evaluation measures can be enhanced, the mining can be further refined, new data can be
                                   selected or further transformed, or new data sources can be integrated, in order to get different,
                                   more appropriate results.
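
                                   The iterative nature of the process can be illustrated with a minimal sketch. It assumes
                                   a hypothetical set of baskets and a hypothetical stopping rule (the user wants at least
                                   `wanted` patterns), and it refines only one thing, the evaluation threshold; in practice
                                   the selection, transformation or data sources could be revised in the same loop.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, already cleaned, selected and transformed.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "jam"},
]

def frequent_pairs(baskets, min_count):
    """Mine item pairs and keep those seen in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}

def refine(baskets, wanted=2):
    """Relax the threshold step by step until enough patterns are found.

    `wanted` simulates the user's feedback after each presentation of results.
    """
    history = []   # (threshold tried, number of patterns found)
    patterns = {}
    for min_count in range(len(baskets), 0, -1):
        patterns = frequent_pairs(baskets, min_count)
        history.append((min_count, len(patterns)))
        if len(patterns) >= wanted:
            break  # the user is satisfied, stop iterating
    return patterns, history

patterns, history = refine(baskets)
for min_count, found in history:
    print(f"threshold {min_count}: {found} pattern(s)")
```

                                   Integer counts are used for the threshold instead of fractional support so that repeated
                                   decrements do not accumulate floating-point error.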

                                   Data mining derives its name from the similarities between searching for valuable information
                                   in a large database and mining rocks for a vein of valuable ore. Both imply either sifting
                                   through a large amount of material or ingeniously probing the material to exactly pinpoint
                                   where the values reside. It is, however, a misnomer: mining for gold in rocks is usually
                                   called “gold mining” and not “rock mining”, so by analogy data mining should have been
                                   called “knowledge mining” instead. Nevertheless, data mining became the accepted customary
                                   term, and it very rapidly became a trend that even overshadowed more general terms such as
                                   knowledge discovery in databases (KDD), which describes a more complete process.

                                     Caution: Other similar terms referring to data mining are data dredging, knowledge
                                     extraction and pattern discovery.

                                   9.1.3 Types of Data

                                   In principle, data mining is not specific to one type of media or data. Data mining should be
                                   applicable to any kind of information repository. However, algorithms and approaches may
                                   differ when applied to different types of data. Indeed, the challenges presented by different
                                   types of data vary significantly. Data mining is being used and studied for databases,
                                   including relational, object-relational and object-oriented databases; data warehouses;
                                   transactional databases; unstructured and semi-structured repositories such as the
                                   World Wide Web; advanced databases such as spatial, multimedia, time-series and textual
                                   databases; and even flat files. Here are some examples in more detail:






          134                               LOVELY PROFESSIONAL UNIVERSITY