Page 20 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   of data warehouses is that data is stored at its most elemental level for use in reporting and
                                   information analysis.
                                   Within this generic intent, there are two primary approaches to organising the data in a data
                                   warehouse.
                                   The first is the “dimensional” approach. In this style, information is stored as “facts”,
                                   which are numeric or text data that capture specific details of a single transaction or event,
                                   and “dimensions”, which contain reference information that allows each transaction or event
                                   to be classified in various ways. As an example, a sales transaction would be broken up into
                                   facts such as the number of products ordered and the price paid, and dimensions such as date,
                                   customer, product, geographical location and salesperson. The main advantage of the dimensional
                                   approach is that the data warehouse is easy for business staff with limited information
                                   technology experience to understand and use. Also, because the data is pre-processed into the
                                   dimensional form, the data warehouse tends to operate very quickly. The main disadvantage of
                                   the dimensional approach is that the structure is quite difficult to extend or change later if the
                                   company changes the way in which it does business.
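                                   The split between facts and dimensions can be sketched in a few lines of Python. This is
                                   only an illustration of the idea, not a real warehouse design: the table contents, the
                                   field names and the helper total_by are all invented for this example.

```python
from collections import defaultdict

# Dimension tables: reference information keyed by an id.
# All names and values below are invented for illustration.
customers = {
    1: {"name": "Acme Ltd", "city": "Delhi"},
    2: {"name": "Bharat Traders", "city": "Mumbai"},
}
products = {
    10: {"name": "Notebook", "category": "Stationery"},
    11: {"name": "Pen", "category": "Stationery"},
    12: {"name": "Lamp", "category": "Furniture"},
}

# Fact table: one row per sales transaction. The facts (quantity,
# price_paid) sit next to keys that point into the dimension tables.
sales_facts = [
    {"date": "2024-01-15", "customer_id": 1, "product_id": 10,
     "quantity": 5, "price_paid": 250.0},
    {"date": "2024-01-20", "customer_id": 2, "product_id": 11,
     "quantity": 20, "price_paid": 200.0},
    {"date": "2024-02-03", "customer_id": 1, "product_id": 12,
     "quantity": 1, "price_paid": 900.0},
]


def total_by(facts, dimension, key_field, attribute, measure):
    """Sum one measure, classifying each fact by a dimension attribute."""
    totals = defaultdict(float)
    for fact in facts:
        label = dimension[fact[key_field]][attribute]
        totals[label] += fact[measure]
    return dict(totals)


# The same facts classified in two different ways.
revenue_by_city = total_by(
    sales_facts, customers, "customer_id", "city", "price_paid")
revenue_by_category = total_by(
    sales_facts, products, "product_id", "category", "price_paid")
```

                                   Because every fact row already carries the keys of its dimensions, classifying the same
                                   sales by city or by product category needs only a lookup and a sum.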
                                   The second approach uses database normalization. In this style, the data in the data warehouse is
                                   stored in third normal form. The main advantage of this approach is that it is quite straightforward
                                   to add new information to the database. The primary disadvantage is that it can be rather slow
                                   to produce information and reports, because the data must first be joined across many tables.
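                                   The trade-off of the normalized approach can be illustrated with a small in-memory SQLite
                                   database. This is a sketch under stated assumptions: the schema, the table names and the
                                   sample rows are hypothetical, chosen only to show that new information is easy to add
                                   while a simple report already needs several joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical schema in third normal form: each fact is stored once,
# and tables refer to each other through keys.
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, name TEXT,
        city_id INTEGER REFERENCES city(city_id));
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        customer_id INTEGER REFERENCES customer(customer_id));
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES product(product_id),
        quantity INTEGER,
        PRIMARY KEY (order_id, product_id));

    INSERT INTO city VALUES (1, 'Delhi'), (2, 'Mumbai');
    INSERT INTO customer VALUES (1, 'Acme Ltd', 1), (2, 'Bharat Traders', 2);
    INSERT INTO product VALUES (10, 'Notebook', 50.0), (11, 'Pen', 10.0);
    INSERT INTO orders VALUES (100, '2024-01-15', 1), (101, '2024-01-20', 2);
    INSERT INTO order_line VALUES (100, 10, 5), (100, 11, 3), (101, 11, 20);
""")

# Reporting revenue per city has to join five tables.
REPORT_SQL = """
    SELECT city.name, SUM(order_line.quantity * product.unit_price)
    FROM order_line
    JOIN orders   ON orders.order_id = order_line.order_id
    JOIN customer ON customer.customer_id = orders.customer_id
    JOIN city     ON city.city_id = customer.city_id
    JOIN product  ON product.product_id = order_line.product_id
    GROUP BY city.name
    ORDER BY city.name
"""
report = conn.execute(REPORT_SQL).fetchall()

# Adding new information is a local change: a new table, with no
# restructuring of the existing ones.
conn.execute(
    "CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
report_after_change = conn.execute(REPORT_SQL).fetchall()
```

                                   The report is unchanged after the supplier table is added, which reflects the ease of
                                   extension; the five-table join reflects the cost paid at reporting time.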



                                      Task     A database keeps records in a particular format. What about a data
                                     warehouse?


                                   1.8 Getting Multidimensional Data out of the Warehouse

                                   A data warehouse, generally speaking, is an analytical tool used in business. Since the
                                   amount of data in such a database is usually very large, and the inter-relationships
                                   among the data are often intertwined, multidimensional data, with its multidimensional
                                   structure, is a popular way to organise such a database.
                                   A key feature of multidimensional data is that it can be viewed from different aspects, leading to
                                   a broader picture of the whole data and a clear view of trends. Therefore, this kind of data is
                                   often used as part of OLAP to serve users with fast and effective queries.
                                   The main considerations in getting multidimensional data out of the data warehouse are as follows:
                                   1.   Large data volumes, e.g., sales, telephone calls
                                       (a)   Giga-, Tera-, Peta-, Exa-byte
                                   2.   OLAP = On-Line Analytical Processing

                                       (a)   Interactive analysis
                                       (b)   Explorative discovery
                                       (c)   Fast response times required
                                   3.   OLAP operations

                                       (a)   Aggregation of data
                                       (b)   Standard aggregation operators, e.g., SUM
                                       (c)   Starting level, e.g., (Quarter, City)
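                                   The aggregation operation listed above can be sketched as follows. This is a minimal
                                   illustration, assuming sales figures held at the starting level (Quarter, City) and SUM
                                   as the aggregation operator; the numbers and the helper roll_up are invented.

```python
from collections import defaultdict

# Cells at the starting level (Quarter, City); the figures are invented.
sales = {
    ("Q1", "Delhi"): 120,
    ("Q1", "Mumbai"): 80,
    ("Q2", "Delhi"): 150,
    ("Q2", "Mumbai"): 95,
}


def roll_up(cells, keep):
    """Aggregate with SUM, keeping only the dimension positions in `keep`.

    Position 0 is Quarter and position 1 is City. Dropping a position
    sums over every value of that dimension.
    """
    result = defaultdict(int)
    for key, value in cells.items():
        reduced_key = tuple(key[i] for i in keep)
        result[reduced_key] += value
    return dict(result)


by_quarter = roll_up(sales, keep=[0])   # sum over all cities
by_city = roll_up(sales, keep=[1])      # sum over all quarters
grand_total = roll_up(sales, keep=[])   # sum over both dimensions
```

                                   Each call starts from the same (Quarter, City) cells and removes one or both dimensions,
                                   which is how a coarser view is derived from the starting level.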




          14                               Lovely Professional University