Page 158 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   8.1 Data Warehouse Refreshment

                                   The availability of “fresh data” in a warehouse is a key factor for success in business
                                   applications. In many activities, such as retail, business applications rely on the proper
                                   refreshment of their warehouses. For instance, Jahnke mentions the case of WalMart, the
                                   world’s most successful retailer. Many of WalMart’s large-volume suppliers, such as Procter &
                                   Gamble, have direct access to the WalMart data warehouse, so they deliver goods to specific
                                   stores as needed. WalMart pays such companies for their products only when they are sold.
                                   Procter & Gamble ships 40% of its items in this way, eliminating paperwork and sales calls on both
                                   sides. It is essential for the supplier to use fresh data in order to establish accurate shipment plans
                                   and to know how much money is due from the retailer.

                                   Refreshment Process within the Data Warehouse Lifecycle

                                   The data warehouse can be defined as a hierarchy of data stores that goes from source data
                                   to highly aggregated data. Between these two extremes there can be other data stores, depending on
                                   the requirements of OLAP applications. One of these stores is the Corporate Data Warehouse
                                   store (CDW), which groups all aggregated views used for the generation of the data marts. The
                                   corporate data store can be complemented by an Operational Data Store (ODS), which groups the
                                   base data collected and integrated from the sources. Data extracted from each source can also be
                                   stored in different data structures. This hierarchy of data stores is a logical way to represent the
                                   data flow between the sources and the data marts. In practice, all the intermediate states between
                                   the sources and the data marts can be represented in the same database.
                                   We distinguish four levels in the construction of the hierarchy of stores. The first level includes three
                                   major steps:

                                   1.   The extraction of data from the operational data sources.
                                   2.   Their cleaning with respect to the common rules defined for the data warehouse store.
                                   3.   Their possible archiving, in the case when integration needs some synchronization between
                                        extractions.



                                      Note: This decomposition is only logical, however. The extraction step and part
                                     of the cleaning step can be grouped into the same software component, such as a wrapper or
                                     a data migration tool.

                                   When the extraction and cleaning steps are separated, data need to be stored in between. This can
                                   be done using one storage medium per source or one shared medium for all sources.
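
The first level can be illustrated with a minimal Python sketch, in which extraction and cleaning are separate steps and the raw rows are held in a staging area in between. Everything in it is an assumption made for illustration: the in-memory SOURCES dictionary, the function names, and the cleaning rules (trimmed upper-case SKU, integer quantity, rows without a quantity dropped) do not come from any particular tool.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical operational sources; real ones would be databases or files.
SOURCES: Dict[str, List[Row]] = {
    "store_a": [{"sku": " p1 ", "qty": "3"}, {"sku": "P2", "qty": None}],
    "store_b": [{"sku": "p1", "qty": "5"}],
}


def extract_to_staging(shared: bool = False) -> Dict[str, List[Row]]:
    """Step 1: copy raw rows into staging.

    shared=False uses one staging area per source;
    shared=True uses a single area for all sources.
    """
    staging: Dict[str, List[Row]] = {}
    for name, rows in SOURCES.items():
        area = "shared" if shared else name
        for row in rows:
            # Tag each row with its origin so it stays traceable in a shared area.
            staging.setdefault(area, []).append({**row, "source": name})
    return staging


def clean_from_staging(staging: Dict[str, List[Row]]) -> List[Row]:
    """Step 2: apply the (assumed) common warehouse rules to the staged rows."""
    cleaned: List[Row] = []
    for rows in staging.values():
        for row in rows:
            if row["qty"] is None:
                continue  # assumed rule: drop rows with a missing quantity
            cleaned.append({
                "source": row["source"],
                "sku": str(row["sku"]).strip().upper(),
                "qty": int(row["qty"]),
            })
    return cleaned
```

Calling clean_from_staging(extract_to_staging()) and clean_from_staging(extract_to_staging(shared=True)) yields the same cleaned rows; only the layout of the intermediate storage differs.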
                                   The second level is the integration step. This phase is often coupled with rich data transformation
                                   capabilities in the same software component, which usually performs the loading into the ODS
                                   when it exists, or otherwise into the CDW. The third level concerns data aggregation for the purpose of
                                   cube construction. Finally, the fourth level is a step of cube customization. All these steps can
                                   also be grouped into the same software, such as a multi-database system.
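
Levels two to four can be sketched in the same spirit. The code below is only an illustration under stated assumptions: the function names, the row layout (sku, qty), and the choice of a (store, sku) aggregation key are hypothetical, and a real system would typically perform these steps inside a database or an ETL tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def integrate(cleaned_by_source: Dict[str, List[Row]]) -> List[Row]:
    """Level 2: merge cleaned rows from all sources into one base store (ODS role)."""
    ods: List[Row] = []
    for source, rows in cleaned_by_source.items():
        for row in rows:
            ods.append({"store": source, **row})
    return ods


def aggregate(ods: List[Row]) -> Dict[Tuple[str, str], int]:
    """Level 3: build an aggregated view (CDW role), total quantity per (store, sku)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in ods:
        totals[(row["store"], row["sku"])] += row["qty"]
    return dict(totals)


def customize(cube: Dict[Tuple[str, str], int], sku: str) -> Dict[str, int]:
    """Level 4: derive a data-mart view, here quantities per store for one SKU."""
    return {store: qty for (store, s), qty in cube.items() if s == sku}


# Usage with hypothetical cleaned data:
cleaned = {
    "store_a": [{"sku": "P1", "qty": 3}, {"sku": "P1", "qty": 2}, {"sku": "P2", "qty": 1}],
    "store_b": [{"sku": "P1", "qty": 5}],
}
cube = aggregate(integrate(cleaned))
p1_view = customize(cube, "P1")
```

Keeping each level as a separate function mirrors the logical decomposition in the text; as noted there, nothing prevents a single component from performing all of them.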















          152                              Lovely Professional University