Page 142 - DCAP208_Management Support Systems

Unit 9: Data Mining




               Flat files: Flat files are actually the most common data source for data mining algorithms,
               especially at the research level. Flat files are simple data files in text or binary format with
               a structure known by the data mining algorithm to be applied. The data in these files can
               be transactions, time-series data, scientific measurements, etc.
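A minimal sketch of the idea that a flat file is only usable when its structure is known to the algorithm reading it. The file contents, the field order in FIELDS and the helper read_transactions are all hypothetical and chosen purely for illustration; they do not come from the source text.

```python
import csv
import io

# Hypothetical flat file: one transaction per line, fields in a fixed order.
FLAT_FILE = """\
T001,C1,2024-01-05,3.50
T002,C2,2024-01-05,4.00
T003,C1,2024-01-06,3.50
"""

# Assumed layout. The file itself carries no header, so the reader
# must know this structure in advance.
FIELDS = ("transaction_id", "customer_id", "date", "amount")


def read_transactions(stream):
    """Parse each line into a dict using the agreed field order."""
    rows = []
    for record in csv.reader(stream):
        row = dict(zip(FIELDS, record))
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


transactions = read_transactions(io.StringIO(FLAT_FILE))

# A trivial "analysis" step: total amount spent per customer.
total_by_customer = {}
for t in transactions:
    key = t["customer_id"]
    total_by_customer[key] = total_by_customer.get(key, 0.0) + t["amount"]

print(total_by_customer)
```

If the field order in the file changed, the same code would silently mislabel the values, which is why the structure has to be agreed on beforehand.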

               Relational Databases: Briefly, a relational database consists of a set of tables containing
               either values of entity attributes, or values of attributes from entity relationships. Tables
               have columns and rows, where columns represent attributes and rows represent tuples.
               A tuple in a relational table corresponds to either an object or a relationship between
               objects and is identified by a set of attribute values representing a unique key.
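The role of a unique key can be sketched with Python's built-in sqlite3 module. The column names (customerID, name, address) and the sample rows are assumptions made for illustration and are not taken from Figure 9.2.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Columns are attributes; customerID is the unique key.
# The column names are assumed, not taken from the figure.
conn.execute("""
    CREATE TABLE Customer (
        customerID INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT
    )
""")

# Each inserted row is one tuple describing one customer.
conn.execute("INSERT INTO Customer VALUES (1, 'John Smith', '120 Main Street')")
conn.execute("INSERT INTO Customer VALUES (2, 'Jane Doe', '45 Oak Avenue')")

# A second tuple with the same key value is rejected by the database.
try:
    conn.execute("INSERT INTO Customer VALUES (1, 'Duplicate Key', 'Nowhere')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key value identifies exactly one tuple.
row = conn.execute(
    "SELECT name FROM Customer WHERE customerID = ?", (2,)
).fetchone()
row_count = conn.execute("SELECT count(*) FROM Customer").fetchone()[0]

print(duplicate_rejected, row, row_count)
```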


                 Example: In Figure 9.2 we present some relations, Customer, Items, and Borrow,
          representing business activity in a fictitious video store, OurVideoStore. These relations are just
          a subset of what could be a database for the video store and are given as an example.

              Figure 9.2: Fragments of Some Relations from a Relational Database for OurVideoStore






















          Source:  http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/notes/Chapter1/
          The most commonly used query language for relational databases is SQL, which allows retrieval
          and manipulation of the data stored in the tables, as well as the calculation of aggregate functions
          such as average, sum, min, max and count. For instance, an SQL query to count the videos
          in each category would be:
          SELECT category, count(*) FROM Items WHERE type = 'video' GROUP BY category;
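A runnable variant of this query can be sketched with Python's built-in sqlite3 module. It quotes the string literal 'video' and selects the category column so that each count is labelled. The Items columns other than type and category, and all sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only "type" and "category" appear in the text;
# itemID and title are illustrative additions.
conn.execute(
    "CREATE TABLE Items ("
    "itemID INTEGER PRIMARY KEY, type TEXT, title TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?, ?)",
    [
        (1, "video", "Title A", "Comedy"),
        (2, "video", "Title B", "Drama"),
        (3, "video", "Title C", "Comedy"),
        (4, "game", "Title D", "Action"),  # filtered out by the WHERE clause
    ],
)

# Count the videos in each category; ORDER BY makes the output stable.
counts = conn.execute(
    "SELECT category, count(*) FROM Items "
    "WHERE type = 'video' GROUP BY category ORDER BY category"
).fetchall()

print(counts)
```

Without the category column in the SELECT list, the query would return one bare count per group, with nothing to say which category each count belongs to.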
          Data mining algorithms using relational databases can be more versatile than data mining
          algorithms specifically written for flat files, since they can take advantage of the structure
          inherent to relational databases. While data mining can benefit from SQL for data selection,
          transformation and consolidation, it goes beyond what SQL can provide, covering tasks such as
          prediction, comparison and deviation detection.

               Data Warehouses: A data warehouse, as a storehouse, is a repository of data collected from
               multiple data sources (often heterogeneous) and is intended to be used as a whole under
               the same unified schema. A data warehouse gives the option to analyze data from different
               sources under the same roof. Let us suppose that OurVideoStore becomes a franchise in
               North America. Many video stores belonging to the OurVideoStore company may have
               different databases and different structures. If the executive of the company wants to
               access the data from all stores for strategic decision-making, future direction, marketing,





                                           LOVELY PROFESSIONAL UNIVERSITY                                   135