Page 264 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   14.1 Quality-Driven Data Warehouse Design

                                   Now that the basic online transaction processing (OLTP) infrastructure is in place in many
                                   organizations, not least through standardized enterprise resource planning tools such as
                                   SAP R/3, the focus of interest is broadening in at least three directions:

                                   1.   A broader range of multimedia data sources inside and outside the organization,
                                   2.   A broader range of clients with diverse interest and capability profiles as well as situational
                                       parameters, and
                                   3.   The conversion of the massive experiential data created by transaction processing into
                                       knowledge relevant for organizational learning and action.
                                   A wide range of information flow logistics architectures is being proposed under labels such as
                                   supply chain management and business-to-business e-commerce. In such architectures, databases
                                   can be considered the short- and medium-term intermediate stores of information, whereas data
                                   warehouses serve as long-term memory and support knowledge creation and management. One
                                   could also say that a data warehouse is a long-term information broker between the operational
                                   part and the reporting/planning part of a company, supporting the controlling department.

                                   The traditional data warehouse (DW) architecture is shown in Figure 14.1. Physically, a data
                                   warehouse system consists of databases (source databases and materialized views in the data
                                   warehouse), data transport agents that ship data from one database to another, and a repository
                                   that stores metadata about the system and its evolution. In this architecture, heterogeneous
                                   information sources are first made accessible in a uniform way through extraction mechanisms
                                   called wrappers; mediators then take on the task of information integration and conflict
                                   resolution. The separation between wrappers and mediators is a deliberate design decision,
                                   reflecting the separation between service wrappers and request brokers in middleware systems
                                   such as CORBA.
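
The division of labour between wrappers and mediators can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the text: the two sources ("crm" and "erp"), their field names, the uniform record layout, and the conflict rule (prefer the source listed first in `priority`). Real mediators typically apply much richer integration and conflict-resolution logic.

```python
# Hypothetical raw rows from two heterogeneous sources with different schemas.
crm_rows = [{"cust_no": 1, "town": "Aachen"}, {"cust_no": 2, "town": "Koeln"}]
erp_rows = [{"id": 2, "city": "COLOGNE"}, {"id": 3, "city": "BONN"}]


def crm_wrapper(rows):
    """Wrapper: expose the (hypothetical) CRM source in the uniform record layout."""
    return [{"customer_id": r["cust_no"], "city": r["town"], "source": "crm"}
            for r in rows]


def erp_wrapper(rows):
    """Wrapper: expose the (hypothetical) ERP source in the same uniform layout."""
    return [{"customer_id": r["id"], "city": r["city"].title(), "source": "erp"}
            for r in rows]


def mediator(wrapped_sources, priority):
    """Mediator: integrate wrapped records and resolve conflicts.

    Assumed conflict rule: when two sources describe the same customer,
    keep the record from the source that appears earlier in `priority`.
    """
    rank = {name: position for position, name in enumerate(priority)}
    merged = {}
    for records in wrapped_sources:
        for record in records:
            key = record["customer_id"]
            current = merged.get(key)
            if current is None or rank[record["source"]] < rank[current["source"]]:
                merged[key] = record
    # Return the integrated records ordered by customer id.
    return [merged[key] for key in sorted(merged)]


integrated = mediator(
    [crm_wrapper(crm_rows), erp_wrapper(erp_rows)],
    priority=["erp", "crm"],
)
```

In this sketch the wrappers know only their own source format, while the mediator knows only the uniform layout. A new source can therefore be added by writing one more wrapper, without touching the integration logic.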

                                                     Figure 14.1: Traditional Data Warehouse Architecture



























                                   The resulting standardized and integrated data are stored as materialized views in the data
                                   warehouse. These base views (often called the operational data store, or ODS) are usually only
                                   slightly aggregated. To customize them for different analyst users, data marts with more highly
                                   aggregated data about specific domains of interest are constructed as second-level caches. These
                                   are then accessed by data analysis tools ranging from query facilities through spreadsheet tools
                                   to full-fledged data mining systems.
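
The two levels of aggregation can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the sales records, their fields, and the helper `materialize` are hypothetical, and a plain dictionary stands in for a materialized view that a real warehouse would store in a database.

```python
from collections import defaultdict

# Hypothetical integrated records: (date, region, store, product, amount).
integrated_sales = [
    ("2024-01-03", "north", "s1", "tea", 10.0),
    ("2024-01-03", "north", "s1", "tea", 5.0),
    ("2024-01-17", "north", "s2", "tea", 7.0),
    ("2024-02-01", "south", "s3", "tea", 4.0),
]


def materialize(rows, key_fn):
    """Group rows by key_fn(row) and store the summed amount (last field).

    The returned dict stands in for a materialized view: it is computed
    once and then read by later stages instead of the underlying rows.
    """
    view = defaultdict(float)
    for row in rows:
        view[key_fn(row)] += row[-1]
    return dict(view)


# Base view (ODS level): only slightly aggregated,
# one total per day, region, store and product.
ods = materialize(integrated_sales, lambda r: (r[0], r[1], r[2], r[3]))

# Data mart (second-level cache): computed from the base view rather than
# from the sources, and more highly aggregated: one total per month and region.
ods_rows = [key + (amount,) for key, amount in ods.items()]
sales_mart = materialize(ods_rows, lambda r: (r[0][:7], r[1]))
```

Here an analysis tool that only needs monthly regional totals would read `sales_mart`, while a tool that needs store-level detail would fall back to `ods`.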


                                   Lovely Professional University