Page 183 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 9: Data Warehouse Refreshment – II
describe the refreshment activities and their organization as a workflow. Then we give examples
of different workflow scenarios to show how refreshment may be a dynamic and evolving
process. Finally, we summarize the different perspectives through which a given refreshment
scenario should be considered.
The refreshment process is similar to the loading process in its data flow but, while the loading
process is a massive feeding of the data warehouse, the refreshment process captures the
differential changes held in the sources and propagates them through the hierarchy of data stores
in the data warehouse. The preparation step extracts from each source the data that characterises
the changes that have occurred in this source since the last extraction. As for loading, this data
is cleaned and possibly archived before its integration. The integration step reconciles the
source changes coming from multiple sources and adds them to the ODS. The aggregation
step recomputes incrementally the hierarchy of aggregated views using these changes. The
customisation step propagates the summarized data to the data marts. As with the loading
phase, this is a logical decomposition; data warehouse products implement it operationally in
many different ways. This logical view allows a certain traceability of the
refreshment process. Figure 9.4 shows the activities of the refreshment process as well as a sample
of the coordinating events.
Figure 9.4: The Generic Workflow for the Refreshment Process
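The four logical steps described above can be illustrated with a minimal Python sketch. It is not how any particular product implements refreshment: the function names, the record fields (seq, region, amount) and the cleaning and reconciliation rules are assumptions made for illustration, and the archiving step is omitted.

```python
from collections import defaultdict

def prepare(source_log, last_extracted):
    """Preparation: keep only changes recorded since the last extraction, then clean them."""
    changes = [c for c in source_log if c["seq"] > last_extracted]
    # Hypothetical cleaning rule: drop records with a missing amount.
    return [c for c in changes if c["amount"] is not None]

def integrate(ods, change_sets):
    """Integration: reconcile changes from several sources and add them to the ODS."""
    delta = []
    for source_name, changes in change_sets.items():
        for c in changes:
            # Hypothetical reconciliation rule: normalise the region spelling.
            row = {"source": source_name,
                   "region": c["region"].strip().upper(),
                   "amount": c["amount"]}
            ods.append(row)
            delta.append(row)
    return delta

def aggregate(view, delta):
    """Aggregation: update the summary incrementally, using the delta only."""
    for row in delta:
        view[row["region"]] += row["amount"]
    return view

def customise(view, regions):
    """Customisation: propagate the relevant part of the summary to a data mart."""
    return {r: view[r] for r in regions if r in view}

# Two hypothetical sources with their change logs.
source_a = [{"seq": 1, "region": "north", "amount": 10.0},
            {"seq": 2, "region": "south", "amount": 5.0},
            {"seq": 3, "region": "north", "amount": None}]
source_b = [{"seq": 1, "region": " North ", "amount": 2.5}]

ods = []
view = defaultdict(float)
delta = integrate(ods, {"a": prepare(source_a, 0), "b": prepare(source_b, 0)})
aggregate(view, delta)
mart = customise(view, ["NORTH"])
```

The aggregated view is updated from the delta alone rather than recomputed from the whole ODS, which is the point of the incremental aggregation step.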
In workflow systems, activities are coordinated by control flows, which may be notifications of
process commitment, emails issued by agents, temporal events, or any other trigger events. In the
refreshment process, coordination is done through a wide range of event types.
You can distinguish several event types which may trigger and synchronize the refreshment
activities. They might be temporal events, termination events or any other user-defined event.
Depending on the refreshment scenario, one can choose an appropriate set of event types which
makes it possible to achieve the correct level of synchronization.
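As a rough illustration of event-based coordination, the following Python sketch lets activities subscribe to events and emits a termination event each time an activity finishes. The class, the event names (every_night, analyst_request, and the _done suffix) and the activity names are hypothetical, and the activities themselves do no real work.

```python
from collections import defaultdict, deque

class RefreshmentWorkflow:
    """Toy coordinator: activities are triggered by named events."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # event name -> activity names
        self.trace = []                         # order in which activities ran

    def on(self, event, activity):
        """Register an activity to be triggered by an event."""
        self.subscriptions[event].append(activity)

    def emit(self, event):
        """Fire an event and follow the chain of termination events it causes."""
        queue = deque([event])
        while queue:
            current = queue.popleft()
            for activity in self.subscriptions.get(current, []):
                self.trace.append(activity)        # stand-in for running the activity
                queue.append(f"{activity}_done")   # termination event

wf = RefreshmentWorkflow()
wf.on("every_night", "extraction")             # temporal event
wf.on("extraction_done", "cleaning")           # termination events
wf.on("cleaning_done", "integration")
wf.on("integration_done", "aggregation")
wf.on("aggregation_done", "customisation")
wf.on("analyst_request", "customisation")      # user-defined event

wf.emit("every_night")
wf.emit("analyst_request")
```

Changing the set of subscriptions changes the level of synchronization: for example, subscribing aggregation to a temporal event instead of integration_done would decouple it from integration.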
Activities of the refreshment workflow are not necessarily executed as soon as they are triggered; they may
depend on the current state of the input data stores. For example, if the extraction is triggered
periodically, it is actually executed only when there are effective changes in the source log file. If