Page 187 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 9: Data Warehouse Refreshment – II




               of these changes. The triggering of the extraction may also differ from one source
               to another. Different events can be defined, such as temporal events (periodic or at a
               fixed absolute time), each change detected on the source, or a demand from the
               integration process.
          3.   ODS-driven refreshment, which defines the part of the process that is automatically
               monitored by the data warehouse system. This part concerns the integration phase. It may
               be triggered at a synchronization point defined with respect to the end of the preparation
               phase. Integration can be considered as a whole, concerning all the source changes at the
               same time. In this case, it can be triggered by an external event, which might be a temporal
               event or the end of the preparation phase of the last source. Integration can also
               be sequenced with respect to the termination of the preparation phase of each source; that
               is, each extraction is integrated as soon as its cleaning is finished. The ODS can also monitor
               the preparation phase and the aggregation phase by generating the relevant events that
               trigger the activities of these phases.
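
          The two ways of triggering integration described in item 3 can be sketched as follows.
          This is a minimal illustration, not the mechanism of any particular warehouse product:
          the class name IntegrationScheduler, the mode names "global" and "per_source", and the
          source names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationScheduler:
    """Decides when integration runs, given 'preparation finished' events.

    mode="global":     integrate all sources together, once the last
                       source has finished its preparation phase.
    mode="per_source": integrate each source as soon as its own
                       preparation (extraction + cleaning) has finished.
    All names here are hypothetical.
    """
    sources: set
    mode: str = "global"
    prepared: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def __post_init__(self):
        if self.mode not in ("global", "per_source"):
            raise ValueError(f"unknown mode: {self.mode}")

    def on_preparation_finished(self, source):
        """Event handler: the preparation phase of `source` has ended."""
        if source not in self.sources:
            raise ValueError(f"unknown source: {source}")
        self.prepared.add(source)
        if self.mode == "per_source":
            # Sequenced integration: one integration step per source.
            self.log.append(("integrate", (source,)))
        elif self.prepared == self.sources:
            # Synchronization point: every source is prepared.
            self.log.append(("integrate", tuple(sorted(self.prepared))))
            self.prepared = set()  # start a new refreshment cycle
```

          In "global" mode nothing is logged until the last source reports the end of its
          preparation; in "per_source" mode each report produces its own integration step.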
          In the simplest case, one of the first two approaches is used as a single strategy. In a more
          complex case, there may be as many strategies as there are sources or high-level aggregated
          views. In between, there may be, for example, four different strategies corresponding to the
          previous four phases. For some user views, one can apply the client-driven strategy (pull
          strategy), while for other views one can apply the ODS-driven strategy (push strategy). Similarly,
          some sources are solicited through a pull strategy while others apply a push strategy.
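
          As an illustration of mixing strategies per view, the sketch below assigns each view
          either a pull or a push strategy and computes which views a given event should refresh.
          The view names, the event names "query" and "ods_change", and the function
          views_to_refresh are hypothetical; the sketch also assumes that a queried push view
          needs no refresh because it is kept up to date as changes arrive.

```python
from enum import Enum


class Strategy(Enum):
    PULL = "pull"   # client-driven: refresh when a user asks for the view
    PUSH = "push"   # ODS-driven: refresh when integrated changes arrive


def views_to_refresh(strategies, event, view=None):
    """Return the sorted list of views that `event` should refresh.

    strategies: dict mapping view name -> Strategy
    event:      "query" (a user reads `view`) or "ods_change"
    """
    if event == "query":
        if view not in strategies:
            raise KeyError(f"unknown view: {view}")
        # Only a pull view is refreshed at query time.
        return [view] if strategies[view] is Strategy.PULL else []
    if event == "ods_change":
        # Every push view is refreshed when the ODS changes.
        return sorted(v for v, s in strategies.items() if s is Strategy.PUSH)
    raise ValueError(f"unknown event: {event}")


# Hypothetical assignment of strategies to user views.
STRATEGIES = {
    "sales_by_region": Strategy.PULL,
    "daily_stock": Strategy.PUSH,
    "churn_summary": Strategy.PUSH,
}
```

          The same table-driven idea could be applied to sources, with one entry per source
          stating whether it is polled (pull) or notifies its own changes (push).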

          The strategy to choose depends on the semantic parameters but also on the tools available to
          perform the refreshment activities (extraction, cleaning, integration). Some extraction tools
          also do the cleaning on the fly, while some integrators immediately propagate changes up to the
          high-level views. The generic workflow in Figure 9.4 is therefore a logical view of the
          refreshment process. It shows the main identified activities and the potential event types
          which can trigger them.

          9.5 Implementation Issues


          With respect to implementation, different solutions can be considered. The conceptual
          definition of the refreshment process by means of a workflow leads naturally to envisioning an
          implementation under the control of a commercial workflow system, provided that the system
          supplies the event types and all the features needed by the refreshment scenario. Another
          solution, which we have preferred, consists in using active rules, which should be executed
          under a certain operational semantics. The rationale behind our choice is the flexibility and
          evolvability provided by active rules. Indeed, the refreshment strategy is not defined once and
          for all; it may evolve with the user needs, which may result in a change in the definition of
          materialized views or a change in the desired quality factors. It may also evolve when the actual
          values of the quality factors degrade with the evolution of the data warehouse feeding or with
          the technology used to implement it. Consequently, in order to master the complexity and the
          evolvability of the data warehouse, it is important to provide a flexible technology that can
          accommodate them. This is what active rules are meant to provide. A prototype has
          been developed and demonstrated in the context of the DWQ European research project on
          data warehouses. However, active rules cannot be considered an alternative to the workflow
          representation. The workflow is a conceptual view of the refreshment process, while active rules
          are an operational implementation of the workflow.
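
          To make the relation between workflow and active rules concrete, the sketch below
          encodes a three-step refreshment workflow as event-condition-action rules. It is only
          an illustration and does not reproduce the rule language of the DWQ prototype: the
          Rule class, the event names, and the first-in first-out processing order (one possible
          operational semantics) are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str                           # activity carried out when the rule fires
    event: str                          # event type that wakes the rule
    condition: Callable[[dict], bool]   # checked against the event payload
    action: Callable[[dict], list]      # returns follow-up (event, payload) pairs


def run(rules, initial_events):
    """Process events first-in first-out; return the trace of fired rules."""
    queue = deque(initial_events)
    trace = []
    while queue:
        event, payload = queue.popleft()
        for rule in rules:
            if rule.event == event and rule.condition(payload):
                trace.append((rule.name, payload["source"]))
                queue.extend(rule.action(payload))
    return trace


# Hypothetical rules: each activity signals its end with a new event,
# which in turn triggers the next activity of the workflow.
RULES = [
    Rule("extract", "source_changed",
         lambda p: p["rows"] > 0,               # skip empty change sets
         lambda p: [("extraction_done", p)]),
    Rule("clean", "extraction_done",
         lambda p: True,
         lambda p: [("cleaning_done", p)]),
    Rule("integrate", "cleaning_done",
         lambda p: True,
         lambda p: []),                         # last step: no follow-up event
]
```

          Changing the refreshment strategy then amounts to editing the rule list, for example
          replacing the "source_changed" event by a periodic temporal event, without touching the
          engine itself.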














                                           Lovely Professional University