Page 187 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 9: Data Warehouse Refreshment – II
of these changes. The triggering of the extraction may also differ from one source to another.
Different triggering events can be defined: temporal events (periodic, or at a fixed absolute
time), each change detected on the source, or an on-demand request from the integration
process.
3. ODS-driven refreshment, which defines the part of the process that is automatically
monitored by the data warehouse system. This part concerns the integration phase. It may
be triggered at a synchronization point defined with respect to the end of the preparation
phase. Integration can be considered as a whole, covering all the source changes at the
same time. In this case, it can be triggered by an external event, which might be a temporal
event or the end of the preparation phase of the last source. Integration can also
be sequenced with respect to the termination of the preparation phase of each source; that
is, each source's extraction is integrated as soon as its cleaning is finished. The ODS can also
monitor the preparation phase and the aggregation phase by generating the relevant events that
trigger the activities of these phases.
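The two ways of triggering ODS-driven integration described in item 3 can be illustrated with a minimal Python sketch. It assumes that each source raises a hypothetical `on_preparation_end` event when its cleaning finishes. In `"whole"` mode the integration waits for the synchronization point (the last source finishing its preparation), while in `"sequenced"` mode each source is integrated immediately. The class name, the mode strings and the `integrate` callback are illustrative assumptions, and triggering by a temporal event is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class IntegrationTrigger:
    """Hypothetical ODS-side trigger for the integration phase.

    mode="whole":     integrate all source changes together, once the
                      preparation phase of the last source has ended.
    mode="sequenced": integrate each source as soon as its cleaning ends.
    """
    sources: Set[str]
    mode: str
    integrate: Callable[[List[str]], None]
    _prepared: Set[str] = field(default_factory=set)

    def on_preparation_end(self, source: str) -> None:
        # Event raised when the preparation (cleaning) of one source completes.
        if source not in self.sources:
            raise ValueError(f"unknown source: {source}")
        if self.mode == "sequenced":
            self.integrate([source])
        elif self.mode == "whole":
            self._prepared.add(source)
            if self._prepared == self.sources:
                # Synchronization point reached: every source is prepared.
                self.integrate(sorted(self._prepared))
                self._prepared.clear()
        else:
            raise ValueError(f"unknown mode: {self.mode}")


if __name__ == "__main__":
    batches: List[List[str]] = []
    trigger = IntegrationTrigger({"S1", "S2", "S3"}, "whole", batches.append)
    for src in ["S2", "S1", "S3"]:
        trigger.on_preparation_end(src)
    print(batches)  # [['S1', 'S2', 'S3']]
```

Switching the mode to `"sequenced"` with the same event order would instead produce three single-source batches, `[['S2'], ['S1'], ['S3']]`.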
In the simplest case, one of the first two approaches is used as a single strategy. In a more
complex case, there may be as many strategies as there are sources or high-level aggregated
views. In between, there may be, for example, four different strategies corresponding to the
previous four phases. For some user views, one can apply the client-driven strategy (pull
strategy), while for other views one can apply the ODS-driven strategy (push strategy). Similarly,
some sources are solicited through a pull strategy while others apply a push strategy.
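Mixing strategies per view can be sketched as follows. This is a hypothetical illustration, not the refreshment mechanism of any particular product: each materialized view is assigned a push (ODS-driven) or pull (client-driven) policy. Push views are recomputed as soon as the ODS reports a change, whereas pull views are only marked stale and recomputed when a client queries them. The view names and the `ViewRefresher` class are invented for the example; per-source pull or push extraction would follow the same pattern.

```python
from enum import Enum
from typing import Dict, List, Set


class Strategy(Enum):
    PULL = "pull"  # client-driven: refresh when a user queries the view
    PUSH = "push"  # ODS-driven: refresh as soon as the ODS propagates a change


class ViewRefresher:
    """Hypothetical per-view refresh policy mixing pull and push strategies."""

    def __init__(self, policies: Dict[str, Strategy]) -> None:
        self.policies = policies
        self.stale: Set[str] = set()
        self.refresh_log: List[str] = []  # order in which views were recomputed

    def _refresh(self, view: str) -> None:
        # Placeholder for recomputing the materialized view from the ODS.
        self.refresh_log.append(view)
        self.stale.discard(view)

    def on_ods_change(self, affected_views: List[str]) -> None:
        # Push views are refreshed immediately; pull views are only marked stale.
        for view in affected_views:
            if self.policies[view] is Strategy.PUSH:
                self._refresh(view)
            else:
                self.stale.add(view)

    def on_client_query(self, view: str) -> None:
        # A client request triggers the refresh of a stale pull view.
        if view in self.stale:
            self._refresh(view)


if __name__ == "__main__":
    r = ViewRefresher({"daily_sales": Strategy.PUSH, "yearly_trends": Strategy.PULL})
    r.on_ods_change(["daily_sales", "yearly_trends"])
    print(r.refresh_log)  # ['daily_sales']
    r.on_client_query("yearly_trends")
    print(r.refresh_log)  # ['daily_sales', 'yearly_trends']
```

The trade-off this makes visible: push views pay the refresh cost on every change but are always current, while pull views defer the cost until someone actually asks for them.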
The strategy to choose depends not only on the semantic parameters but also on the tools
available to perform the refreshment activities (extraction, cleaning, integration). Some
extraction tools also perform cleaning on the fly, while some integrators immediately propagate
changes up to the high-level views. Hence, the generic workflow in Figure 9.4 is a logical view
of the refreshment process: it shows the main identified activities and the potential event types
that can trigger them.
9.5 Implementation Issues
With respect to implementation, different solutions can be considered. The conceptual
definition of the refreshment process by means of a workflow naturally leads to envisioning an
implementation under the control of a commercially available workflow system, provided that
this system supplies the event types and all the features needed by the refreshment scenario.
Another solution, which we have preferred, consists in using active rules executed under a
well-defined operational semantics. The rationale behind our choice is the flexibility and
evolvability provided by active rules. Indeed, the refreshment strategy is not defined once and
for all; it may evolve with the user needs, which may result in changes to the definition of
materialized views or to the desired quality factors. It may also evolve when the actual values
of the quality factors degrade as the data warehouse feeding, or the technology used to
implement it, evolves. Consequently, in order to master the complexity and evolvability of the
data warehouse, it is important to provide a flexible technology that can accommodate them.
This is what active rules are meant to provide. A prototype has been developed and demonstrated
in the context of the DWQ European research project on data warehouses. However, active rules
are not an alternative to the workflow representation: the workflow is a conceptual view of the
refreshment process, while active rules are an operational implementation of the workflow.
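To make the idea of active rules concrete, the following is a minimal event-condition-action (ECA) sketch in Python. It is not the DWQ prototype: the `Rule` and `RuleEngine` classes, the event names and the FIFO firing order are assumptions chosen to illustrate one possible operational semantics. Each activity of the refreshment workflow (extraction, cleaning, integration, aggregation) becomes a rule triggered by the event that signals the end of the previous activity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List, Tuple

# An event is a name plus a payload (hypothetical representation).
Event = Tuple[str, Dict[str, Any]]


@dataclass
class Rule:
    """Minimal event-condition-action (ECA) rule."""
    name: str
    event: str                                       # event type that triggers the rule
    condition: Callable[[Dict[str, Any]], bool]      # evaluated on the event payload
    action: Callable[[Dict[str, Any]], List[Event]]  # may raise follow-up events


class RuleEngine:
    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.trace: List[str] = []  # names of the rules that fired, in order

    def add(self, rule: Rule) -> None:
        # Rules can be added or replaced at run time.
        self.rules[rule.name] = rule

    def remove(self, name: str) -> None:
        self.rules.pop(name, None)

    def run(self, initial: Event) -> None:
        # Assumed operational semantics: events are processed in FIFO order;
        # for each event, matching rules fire in insertion order, and the events
        # returned by their actions are appended to the queue.
        queue: Deque[Event] = deque([initial])
        while queue:
            event, payload = queue.popleft()
            for rule in list(self.rules.values()):
                if rule.event == event and rule.condition(payload):
                    self.trace.append(rule.name)
                    queue.extend(rule.action(payload))


if __name__ == "__main__":
    engine = RuleEngine()
    # One rule per workflow activity, chained by "activity finished" events.
    engine.add(Rule("extract", "source_changed", lambda p: True,
                    lambda p: [("extracted", p)]))
    engine.add(Rule("clean", "extracted", lambda p: True,
                    lambda p: [("cleaned", p)]))
    engine.add(Rule("integrate", "cleaned", lambda p: True,
                    lambda p: [("integrated", p)]))
    engine.add(Rule("aggregate", "integrated",
                    lambda p: p.get("view_affected", True), lambda p: []))
    engine.run(("source_changed", {"source": "S1"}))
    print(engine.trace)  # ['extract', 'clean', 'integrate', 'aggregate']
```

Because the strategy lives in the rule set rather than in the engine, changing it (for example, adding a condition that postpones aggregation, or replacing the integration rule) amounts to calling `add` or `remove`. This mirrors the flexibility argument made for active rules, while the workflow remains the conceptual description the rules implement.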
Lovely Professional University 181