Page 179 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 9: Data Warehouse Refreshment – II
aggregated data and can be organized into a multidimensional structure. Data extracted from
each source can also be stored in intermediate data stores. This hierarchy of data stores is a
logical way to represent the data flows that go from the sources to the data marts.
Not all of these stores are necessarily materialized, and if they are, they may simply constitute
different layers of the same database.
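The hierarchy of stores described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Layer` class, its `materialized` flag, the helper functions and the sample sales rows are assumptions made for illustration, not part of any product. Each layer is derived from the one below it; a virtual layer is recomputed on every read, whereas a materialized layer keeps a stored copy that stays unchanged until it is refreshed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Rows = List[Dict]


@dataclass
class Layer:
    """One store in a hypothetical source -> warehouse -> data mart hierarchy."""
    name: str
    derive: Callable[[Rows], Rows] = lambda rows: rows  # builds this layer from its input
    source: Optional["Layer"] = None                    # layer below; None for a data source
    materialized: bool = True                           # stored copy vs. recomputed on read
    base_rows: Rows = field(default_factory=list)       # only used by data sources
    _stored: Optional[Rows] = None                      # cached copy of a materialized layer

    def read(self) -> Rows:
        if self.source is None:
            return self.base_rows
        if not self.materialized:
            return self.derive(self.source.read())  # virtual layer: always current
        if self._stored is None:
            self._stored = self.derive(self.source.read())
        return self._stored  # may be stale until refresh() is called

    def refresh(self) -> None:
        self._stored = None  # full recomputation happens on the next read


def clean(rows: Rows) -> Rows:
    # stand-in for cleaning: drop rows whose amount is unknown
    return [r for r in rows if r["amount"] is not None]


def total_by_region(rows: Rows) -> Rows:
    # stand-in for aggregation: sum of amount per region
    totals: Dict[str, int] = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return [{"region": k, "total": v} for k, v in sorted(totals.items())]


source = Layer("sales_source", base_rows=[
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": None},
    {"region": "north", "amount": 7},
])
warehouse = Layer("warehouse", derive=clean, source=source, materialized=False)
mart = Layer("sales_mart", derive=total_by_region, source=warehouse)

before = mart.read()   # [{'region': 'north', 'total': 17}, {'region': 'south', 'total': 5}]
source.base_rows.append({"region": "south", "amount": 3})
stale = mart.read()    # unchanged: the mart is materialized and not yet refreshed
mart.refresh()
after = mart.read()    # [{'region': 'north', 'total': 17}, {'region': 'south', 'total': 8}]
```

In this toy run the virtual warehouse layer sees the new source row immediately, while the materialized mart keeps returning the old totals until `refresh()` recomputes it; that gap is what refreshment has to close.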
Figure 9.2 shows a typical data warehouse architecture. This is a logical view, which data
warehousing products implement operationally in many different ways. Depending on the data
source, extraction and cleaning can be done by the same wrapper or by distinct tools. Similarly,
data reconciliation (also called multi-source cleaning) can be separated from or merged with
data integration (multi-source operations). High-level aggregation can be seen as a set of
computation techniques ranging from simple statistical functions to advanced data mining
algorithms. Customisation techniques may vary from one data mart to another, depending on
the way decision makers want to see the elaborated data.
Figure 9.2: Data Warehouse Architecture
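The stages described above (per-source extraction and cleaning, reconciliation, integration, aggregation and per-mart customisation) can be sketched as a chain of plain Python functions. This is a minimal illustration under stated assumptions: the two source layouts, the function names (`crm_wrapper`, `shop_wrapper`, `reconcile_and_integrate`, `aggregate`, `customise_for_mart`) and the use of the e-mail address as the matching key are all hypothetical. Here reconciliation is merged with integration, which is one of the two options mentioned above, and aggregation is reduced to a simple count.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical raw extracts from two sources with different layouts.
CRM_ROWS = [("Alice ", "ALICE@EXAMPLE.COM", "FR"), ("Bob", "bob@example.com", "DE")]
SHOP_ROWS = [{"mail": "alice@example.com", "country": "fr"},
             {"mail": "carol@example.com", "country": "IT"}]


def crm_wrapper(rows) -> List[Record]:
    """Extraction + single-source cleaning for the CRM (assumed tuple layout)."""
    return [{"email": e.strip().lower(), "name": n.strip(), "country": c.upper()}
            for n, e, c in rows]


def shop_wrapper(rows) -> List[Record]:
    """Extraction + single-source cleaning for the web shop (assumed dict layout)."""
    return [{"email": r["mail"].strip().lower(), "name": "", "country": r["country"].upper()}
            for r in rows]


def reconcile_and_integrate(*extracts: List[Record]) -> List[Record]:
    """Multi-source cleaning merged with integration: one record per e-mail key."""
    merged: Dict[str, Record] = {}
    for extract in extracts:
        for rec in extract:
            current = merged.setdefault(rec["email"], dict(rec))
            for key, value in rec.items():
                if not current.get(key):   # fill gaps left by a source lacking a field
                    current[key] = value
    return sorted(merged.values(), key=lambda r: r["email"])


def aggregate(records: List[Record]) -> Dict[str, int]:
    """High-level aggregation: here just a count of customers per country."""
    return dict(Counter(r["country"] for r in records))


def customise_for_mart(counts: Dict[str, int], countries: List[str]) -> Dict[str, int]:
    """Data-mart customisation: each mart keeps only the view its users want."""
    return {c: counts.get(c, 0) for c in countries}


integrated = reconcile_and_integrate(crm_wrapper(CRM_ROWS), shop_wrapper(SHOP_ROWS))
counts = aggregate(integrated)
french_mart = customise_for_mart(counts, ["FR"])
print(counts)        # {'FR': 1, 'DE': 1, 'IT': 1}
print(french_mart)   # {'FR': 1}
```

Alice appears in both sources but is counted once, because reconciliation collapses records that share the assumed matching key before aggregation runs.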
The refreshment of a data warehouse is an important process which determines the effective
usability of the data collected and aggregated from the sources. Indeed, the quality of data
provided to decision makers depends on the capability of the data warehouse system to
convey the changes made at the data sources to the data marts within a reasonable time. Most
design decisions therefore concern the choice of data structures and update techniques that
optimise the refreshment of the data warehouse.
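One simple way to make "a reasonable time" measurable is to track how long source changes wait before reaching a data mart. The sketch below assumes a particular definition of staleness (the age of the oldest source change not yet propagated) and two hypothetical freshness bounds passed as `max_lag`; both are illustrative choices rather than a standard metric, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def staleness(change_times: List[datetime], last_propagated: datetime,
              now: datetime) -> timedelta:
    """Age of the oldest source change not yet reflected in the data mart.

    Assumed definition: changes stamped after `last_propagated` are pending;
    if none is pending, the mart is considered fully fresh (zero lag).
    """
    pending = [t for t in change_times if t > last_propagated]
    if not pending:
        return timedelta(0)
    return now - min(pending)


def meets_freshness(change_times: List[datetime], last_propagated: datetime,
                    now: datetime, max_lag: timedelta) -> bool:
    """True if the mart is within the freshness bound its users require."""
    return staleness(change_times, last_propagated, now) <= max_lag


now = datetime(2024, 1, 1, 12, 0)
changes = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0),
           datetime(2024, 1, 1, 11, 30)]
last_refresh = datetime(2024, 1, 1, 10, 0)   # the 9:00 change was already propagated

lag = staleness(changes, last_refresh, now)  # 1 hour (pending since 11:00)
ok_for_daily_reports = meets_freshness(changes, last_refresh, now, timedelta(hours=2))
ok_for_monitoring = meets_freshness(changes, last_refresh, now, timedelta(minutes=30))
```

The same warehouse state can thus be fresh enough for one group of decision makers and too stale for another, which is why the required freshness shapes the refreshment design.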
There is considerable confusion in the literature concerning data warehouse refreshment. This
process is often either reduced to the view maintenance problem or confused with the data
loading phase. Our purpose here is to show that data warehouse refreshment is more complex
than the view maintenance problem, and different from the loading process. We define the
refreshment process as a workflow whose activities depend on the available products for data
extraction, cleaning and integration, and whose triggering events depend on the application
domain and on the required quality in terms of data freshness.
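The workflow view of refreshment can be sketched as an ordered list of activities combined with a triggering policy. The `RefreshWorkflow` class, the policy names `"immediate"` and `"periodic"` and the toy activities are hypothetical; in a real system each activity would call an extraction, cleaning or integration product, and the policy would be chosen according to the freshness required by the application domain.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Activity = Callable[[list], list]


@dataclass
class RefreshWorkflow:
    """Hypothetical refreshment workflow: ordered activities plus a trigger policy."""
    activities: List[Activity]
    policy: str = "immediate"   # assumed policy names: "immediate" or "periodic"
    period: int = 1             # for "periodic": refresh every `period` clock ticks
    pending: list = field(default_factory=list)     # source changes not yet propagated
    delivered: list = field(default_factory=list)   # what has reached the warehouse
    runs: int = 0

    def on_source_change(self, change) -> None:
        self.pending.append(change)
        if self.policy == "immediate":
            self._flush()   # event-driven trigger: propagate every change at once

    def on_clock_tick(self, tick: int) -> None:
        if self.policy == "periodic" and tick % self.period == 0:
            self._flush()   # time-driven trigger: propagate the accumulated batch

    def _flush(self) -> None:
        if not self.pending:
            return
        batch = self.pending
        for activity in self.activities:   # e.g. cleaning, then integration
            batch = activity(batch)
        self.delivered.extend(batch)
        self.pending = []
        self.runs += 1


def clean(changes: list) -> list:
    # toy cleaning: trim whitespace and drop empty values
    return [c.strip() for c in changes if c.strip()]


def integrate(changes: list) -> list:
    # toy integration: remove duplicates within the batch (sorted for determinism)
    return sorted(set(changes))


immediate = RefreshWorkflow([clean, integrate], policy="immediate")
periodic = RefreshWorkflow([clean, integrate], policy="periodic", period=3)

for tick, change in enumerate([" a", "", "b ", "a"], start=1):
    for workflow in (immediate, periodic):
        workflow.on_source_change(change)
        workflow.on_clock_tick(tick)
```

With the immediate policy every change runs the whole chain (four runs for four changes), which favours freshness; the periodic policy runs once, at tick 3, and leaves the change from tick 4 pending, trading freshness for fewer runs.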
9.3.1 View Maintenance, Data Loading and Data Refreshment
Data refreshment in data warehouses is generally confused either with data loading as done during
the initial phase or with update propagation through a set of materialized views. Both analogies
are wrong. The following paragraphs discuss the differences between data loading and data
refreshment, and between view maintenance and data refreshment.
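To fix the vocabulary, the two notions that data refreshment is being compared with can be contrasted on a single materialized aggregate view. In this hypothetical sketch, `initial_load` builds the view from the full source, as a loading phase would, while `maintain` applies only inserted and deleted rows, which is the essence of incremental view maintenance. The view stores a row count next to each sum so that a deletion can tell when a group disappears; the data and names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float]               # (region, amount): a hypothetical fact row
View = Dict[str, Tuple[float, int]]   # region -> (sum of amount, number of rows)


def initial_load(rows: Iterable[Row]) -> View:
    """Loading: build the materialized view from scratch over the whole source."""
    acc: Dict[str, list] = defaultdict(lambda: [0.0, 0])
    for region, amount in rows:
        acc[region][0] += amount
        acc[region][1] += 1
    return {region: (s, c) for region, (s, c) in acc.items()}


def maintain(view: View, inserted: Iterable[Row], deleted: Iterable[Row]) -> View:
    """View maintenance: apply only the delta (inserted and deleted rows).

    Assumes every deleted row was previously present in the source.
    """
    new = dict(view)
    for sign, rows in ((1, inserted), (-1, deleted)):
        for region, amount in rows:
            s, c = new.get(region, (0.0, 0))
            s, c = s + sign * amount, c + sign
            if c == 0:
                new.pop(region, None)   # last row of the group deleted: drop the group
            else:
                new[region] = (s, c)
    return new


base = [("north", 10.0), ("south", 5.0), ("north", 7.0)]
view = initial_load(base)   # {'north': (17.0, 2), 'south': (5.0, 1)}

inserted = [("east", 4.0)]
deleted = [("south", 5.0)]
view = maintain(view, inserted, deleted)

current = [("north", 10.0), ("north", 7.0), ("east", 4.0)]   # source after the changes
```

Applying the delta yields the same view as recomputing it over the updated source, which is the correctness condition usually expected of view maintenance.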
Lovely Professional University 173