Page 158 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
8.1 Data Warehouse Refreshment
Having “fresh data” in a warehouse is a key factor for success in business
applications. In many activities, such as retail, business applications rely on the proper
refreshment of their warehouses. For instance, Jahnke mentions the case of WalMart, the
world’s most successful retailer. Many of WalMart’s large-volume suppliers, such as Procter &
Gamble, have direct access to the WalMart data warehouse, so they deliver goods to specific
stores as needed. WalMart pays such companies for their products only when they are sold.
Procter & Gamble ships 40% of its items in this way, eliminating paperwork and sales calls on both
sides. It is essential for the supplier to use fresh data in order to establish accurate shipment plans
and to know how much money is due from the retailer.
Refreshment Process within the Data Warehouse Lifecycle
The data warehouse can be defined as a hierarchy of data stores which goes from source data
to highly aggregated data. Between these two extremes there can be other data stores, depending on
the requirements of OLAP applications. One of these stores is the Corporate Data Warehouse
store (CDW), which groups all aggregated views used for the generation of the data marts. The
corporate data store can be complemented by an Operational Data Store (ODS), which groups the
base data collected and integrated from the sources. Data extracted from each source can also be
stored in different data structures. This hierarchy of data stores is a logical way to represent the
data flow between the sources and the data marts. In practice, all the intermediate states between
the sources and the data marts can be represented in the same database.
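The idea that the ODS, the CDW, and a data mart can live in one database can be illustrated with a small sketch. It uses Python's standard sqlite3 module with an in-memory database. The table and view names (ods_sales, cdw_sales_by_store_product, mart_sales_by_product) and the sample rows are assumptions made up for this illustration. The CDW and the data mart are modelled as plain views, which is a simplification: a real warehouse would typically materialize them and refresh them explicitly.

```python
import sqlite3


def build_store_hierarchy(conn):
    """Create a hypothetical ODS table plus CDW and data mart views in one database."""
    conn.executescript(
        """
        -- ODS: base data collected and integrated from the sources
        CREATE TABLE ods_sales (store TEXT, product TEXT, qty INTEGER);

        -- CDW: aggregated view built on the ODS
        CREATE VIEW cdw_sales_by_store_product AS
            SELECT store, product, SUM(qty) AS total_qty
            FROM ods_sales
            GROUP BY store, product;

        -- Data mart: a further aggregation derived from the CDW
        CREATE VIEW mart_sales_by_product AS
            SELECT product, SUM(total_qty) AS total_qty
            FROM cdw_sales_by_store_product
            GROUP BY product;
        """
    )


def load_ods(conn, rows):
    """Insert integrated base rows into the ODS."""
    conn.executemany("INSERT INTO ods_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def mart_totals(conn):
    """Read the data mart view, ordered by product name."""
    cur = conn.execute(
        "SELECT product, total_qty FROM mart_sales_by_product ORDER BY product"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_store_hierarchy(conn)
load_ods(conn, [("S1", "soap", 3), ("S2", "soap", 2), ("S1", "tea", 5)])
print(mart_totals(conn))  # [('soap', 5), ('tea', 5)]
```

Because the upper levels are views here, they reflect new ODS rows immediately; with materialized stores, each level would need its own refresh step.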
We can distinguish four levels in the construction of the hierarchy of stores. The first level includes three
major steps:
1. The extraction of data from the operational data sources.
2. Their cleaning with respect to the common rules defined for the data warehouse store.
3. Their possible archiving, in the case when integration needs some synchronization between
extractions.
Note: This decomposition is, however, only logical. The extraction step and part
of the cleaning step can be grouped into the same software component, such as a wrapper or
a data migration tool.
When the extraction and cleaning steps are separated, data need to be stored in between. This can
be done using one storage medium per source or one shared medium for all sources.
The second level is the integration step. This phase is often coupled with rich data transformation
capabilities in the same software component, which usually performs the loading into the ODS
when it exists, or into the CDW otherwise. The third level concerns data aggregation for the purpose of
cube construction. Finally, the fourth level is a step of cube customization. All these steps can
also be grouped into the same software, such as a multi-database system.
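The four levels described above can be sketched as a chain of small functions. This is only a minimal illustration under stated assumptions, not a reference implementation: the record layout (store, product, qty), the cleaning rules, and all function names are hypothetical, and the optional archiving step of the first level is left out for brevity.

```python
from collections import defaultdict


def extract(source):
    """Level 1, step 1: pull raw records from one operational source."""
    return list(source)


def clean(records):
    """Level 1, step 2: apply common rules (assumed here: normalize names,
    drop records whose quantity is missing or negative)."""
    cleaned = []
    for rec in records:
        qty = rec.get("qty")
        if not isinstance(qty, int) or qty < 0:
            continue
        cleaned.append(
            {
                "store": rec["store"].strip().upper(),
                "product": rec["product"].strip().lower(),
                "qty": qty,
            }
        )
    return cleaned


def integrate(cleaned_per_source):
    """Level 2: merge the cleaned records of all sources into one ODS list."""
    ods = []
    for records in cleaned_per_source:
        ods.extend(records)
    return ods


def aggregate(ods):
    """Level 3: build a tiny 'cube' of total quantity per (store, product)."""
    totals = defaultdict(int)
    for rec in ods:
        totals[(rec["store"], rec["product"])] += rec["qty"]
    return dict(totals)


def customize(cube, store):
    """Level 4: customize the cube by slicing it for a single store."""
    return {product: qty for (s, product), qty in cube.items() if s == store}


def refresh(sources, store):
    """Run all four levels in sequence for the given sources."""
    staged = [clean(extract(src)) for src in sources]
    ods = integrate(staged)
    cube = aggregate(ods)
    return customize(cube, store)


source_a = [
    {"store": " s1 ", "product": "Soap", "qty": 3},
    {"store": "s1", "product": "tea", "qty": -1},  # rejected by cleaning
]
source_b = [
    {"store": "S1", "product": "soap ", "qty": 2},
    {"store": "s2", "product": "tea", "qty": 5},
]
print(refresh([source_a, source_b], "S1"))  # {'soap': 5}
```

Keeping each level as a separate function mirrors the logical decomposition; in practice, as the text notes, several of these steps may be merged into a single tool.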
152 Lovely Professional University