Page 159 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 8: Data Warehouse Refreshment
To understand which kinds of tools the refreshment process needs, it is important to
locate it within the global data warehouse lifecycle, which is defined by the following three phases:
Design Phase
The design phase consists of the definition of user views, auxiliary views, source extractors,
data cleaners, data integrators and all other features that guarantee an explicit specification of
the data warehouse application. These specifications can be made with respect to abstraction
levels (conceptual, logical and physical) and user perspectives (source view, enterprise view, client
views). The result of the design is a set of formal or semiformal specifications which constitute
the metadata used by the data warehouse system and applications.
Loading Phase
The loading phase consists of the initial data warehouse instantiation, that is, the initial
computation of the data warehouse content. This initial loading is, overall, a sequential process
of four steps:
1. Preparation
2. Integration
3. High level aggregation
4. Customization
The first step is done for each source and consists of data extraction, data cleaning and possibly
data archiving before or after cleaning. The second step consists of data integration, which is the
reconciliation of data originating from heterogeneous sources and the derivation of the base relations
of the ODS. The third step consists of the computation of aggregated views from base views. In
all three steps, not just the loading of data but also the loading of indexes is of crucial importance
for query and update performance. While the data extracted from the sources and integrated in
the ODS are considered ground data with very low-level aggregation, the data in aggregated
views are generally highly summarized using aggregation functions. These aggregated views
constitute what is sometimes called the CDS, i.e. the set of materialized views from which data
marts are derived. The fourth step consists of the derivation and customization of the user views
which define the data marts. Customization refers to the various presentations needed by the users
for multidimensional data.
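The four loading steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a definitive implementation: the two sources, their field names, the cleaning rules and all function names (prepare, integrate, aggregate, customize, initial_load) are hypothetical and do not come from the text.

```python
from collections import defaultdict

# Hypothetical raw rows from two heterogeneous sources (illustrative data only).
SOURCE_A = [
    {"cust": " alice ", "region": "north", "amount": "10.5"},
    {"cust": "bob", "region": "south", "amount": "4"},
]
SOURCE_B = [
    {"customer_name": "Carol", "area": "NORTH", "total": 7.0},
    {"customer_name": "", "area": "south", "total": 1.0},  # dirty row: no name
]


def prepare(rows, mapping):
    """Step 1 (per source): extract rows, map them to a common schema, clean them."""
    cleaned = []
    for row in rows:
        rec = {target: row[src] for target, src in mapping.items()}
        rec["customer"] = str(rec["customer"]).strip().title()
        rec["region"] = str(rec["region"]).strip().lower()
        rec["amount"] = float(rec["amount"])
        if rec["customer"]:  # assumed cleaning rule: drop rows without a customer
            cleaned.append(rec)
    return cleaned


def integrate(*prepared_sources):
    """Step 2: reconcile the prepared sources into one ODS base relation."""
    ods = []
    for rows in prepared_sources:
        ods.extend(rows)
    return ods


def aggregate(ods):
    """Step 3: compute an aggregated view (total amount per region)."""
    totals = defaultdict(float)
    for rec in ods:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


def customize(aggregated, regions):
    """Step 4: derive a user view (data mart) restricted to the given regions."""
    return {r: aggregated[r] for r in regions if r in aggregated}


def initial_load():
    """Run the four steps in sequence and return the ODS, the CDS and one data mart."""
    a = prepare(SOURCE_A, {"customer": "cust", "region": "region", "amount": "amount"})
    b = prepare(SOURCE_B, {"customer": "customer_name", "region": "area", "amount": "total"})
    ods = integrate(a, b)
    cds = aggregate(ods)
    mart = customize(cds, ["north"])
    return ods, cds, mart
```

Calling initial_load() runs the whole pipeline once over the full source content, which is what distinguishes the initial load from a later refresh. Index loading, archiving and real reconciliation logic are deliberately left out of this sketch.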
Refreshment Phase
The refreshment phase has a data flow similar to that of the loading phase but, while the loading process
is a massive feeding of the data warehouse, the refreshment process captures the differential
changes that occurred in the sources and propagates them through the hierarchy of data stores.
The preparation step extracts from each source the data that characterize the changes that have
occurred in this source since the last extraction. As in the loading phase, these data are cleaned
and possibly archived before their integration. The integration step reconciles the source changes
coming from multiple sources and adds them to the ODS. The aggregation step incrementally
computes the hierarchy of aggregated views using these changes. The customization step
propagates the summarized data to the data marts.
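The difference between a full load and a refresh can be illustrated with a minimal Python sketch of incremental propagation. It rests on several assumptions that are not in the text: each source keeps a change log with a sequence number, the ODS is a dictionary keyed by record identifier, and the only aggregated view is a per-region total. All names (extract_changes, refresh, propagate) are hypothetical.

```python
def extract_changes(source_log, last_seq):
    """Preparation: keep only the changes recorded after the last extraction."""
    return [change for change in source_log if change["seq"] > last_seq]


def refresh(ods, region_totals, changes):
    """Integration and aggregation: apply changes to the ODS and adjust the
    per-region totals incrementally instead of recomputing them from scratch."""
    for change in changes:
        key = change["key"]
        old = ods.get(key)
        if old is not None:
            # Remove the old contribution of this record from the aggregate.
            region_totals[old["region"]] -= old["amount"]
        if change["op"] == "delete":
            ods.pop(key, None)
        else:  # "insert" or "update"
            new = {"region": change["region"], "amount": change["amount"]}
            ods[key] = new
            region_totals[new["region"]] = (
                region_totals.get(new["region"], 0.0) + new["amount"]
            )
    return ods, region_totals


def propagate(region_totals, mart_regions):
    """Customization: push the refreshed summaries to a data mart view."""
    return {r: region_totals[r] for r in mart_regions if r in region_totals}
```

For example, with an ODS holding two records and a change log containing one insert, one update and one delete after the last extracted sequence number, refresh touches only those three records; the aggregated totals are corrected by subtracting old contributions and adding new ones. Cleaning, archiving and multi-source reconciliation are omitted here for brevity.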
Requirements and Difficulties of Data Warehouse Refreshment
The refreshment of a data warehouse is an important process which determines the effective
usability of the data collected and aggregated from the sources. Indeed, the quality of data
provided to the decision makers depends on the capability of the data warehouse system to
Lovely Professional University 153