Page 181 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 9: Data Warehouse Refreshment – II




          The difference between the refreshment process and the loading process lies mainly in the
          following. First, the refreshment process may be completely asynchronous across its
          different activities (preparation, integration, aggregation and customisation). Second, there may
          be a high level of parallelism within the preparation activity itself, since each data source has
          its own availability window and its own extraction strategy. Synchronization is performed by the
          integration activity. Another difference lies in source availability. While the loading phase
          requires a long period of availability, the refreshment phase should not overload the operational
          applications that use the data sources. Each source therefore provides a specific access frequency
          and a restricted availability duration. Finally, there are more constraints on response time for
          the refreshment process than for the loading process. Indeed, from the users' point of view, the
          data warehouse does not exist before the initial loading, so the computation time is included
          within the duration of the design project. After the initial loading, the data becomes visible and
          should satisfy user requirements in terms of data availability, accessibility and freshness.
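
          The parallel preparation and the synchronizing role of integration can be illustrated with a
          minimal Python sketch. It is not taken from the text: the Source class, its fields and the
          prepare and refresh functions are hypothetical names, each source is reduced to an in-memory
          list, and availability windows are assumed not to cross midnight.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import time


@dataclass
class Source:
    """Hypothetical data source with its own availability window."""
    name: str
    window_start: time  # start of the availability window
    window_end: time    # end of the availability window (same day, by assumption)
    rows: list          # stand-in for the data an extraction would return

    def is_available(self, now: time) -> bool:
        return self.window_start <= now <= self.window_end


def prepare(source: Source, now: time) -> list:
    """Extract from one source, but only while its window is open."""
    if not source.is_available(now):
        return []  # skip, so the operational application is not disturbed
    return [(source.name, row) for row in source.rows]


def refresh(sources: list, now: time) -> list:
    # Preparation runs in parallel, one task per source.
    with ThreadPoolExecutor() as pool:
        prepared = list(pool.map(lambda s: prepare(s, now), sources))
    # Integration is the synchronization point: it starts only after
    # every preparation task has returned.
    integrated = [item for batch in prepared for item in batch]
    return sorted(integrated)
```

          In this sketch a source whose window is closed is simply skipped for the current run; a real
          refreshment process would more likely reschedule the extraction for the next window.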

          View Maintenance vs. Data Refreshment

          The propagation of changes during the refreshment process is done through a set of independent
          activities, among which we find the maintenance of the views stored at the ODS and CDW levels.
          The view maintenance phase consists of propagating a change raised in a given source
          over a set of views stored at the ODS or CDW level. Such a phase is a classical materialized
          view maintenance problem except that, in data warehouses, the changes to propagate into the
          aggregated views are not exactly those that occurred in the sources, but the result of
          pre-treatments performed by other refreshment activities such as data cleaning and multi-source
          data reconciliation.
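
          A small Python sketch may help to show why the changes reaching the views differ from the raw
          source changes. The record layout (customer, amount, ts, src) and the cleaning and
          reconciliation rules are assumptions chosen for illustration only; the text does not
          prescribe any particular rule.

```python
def clean(changes):
    """Drop records with a missing customer or amount; normalize the customer key."""
    return [
        {**c, "customer": c["customer"].strip().lower()}
        for c in changes
        if c.get("customer") and c.get("amount") is not None
    ]


def reconcile(changes):
    """Keep one record per customer: the most recent one across all sources."""
    latest = {}
    for c in changes:
        key = c["customer"]
        if key not in latest or c["ts"] > latest[key]["ts"]:
            latest[key] = c
    return sorted(latest.values(), key=lambda c: c["customer"])


def changes_to_propagate(source_changes):
    # View maintenance receives this result, not the raw source changes.
    return reconcile(clean(source_changes))
```

          For example, two records for the same customer coming from two sources are reduced to a
          single change before any view is touched, and a record with a missing amount never reaches
          the views at all.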
          The view maintenance problem has been intensively studied in the database research community.
          Most of the references focus on the problems raised by the maintenance of a set of materialized
          (also called concrete) views derived from a set of base relations when the current state of the base
          relations is modified. The main results concern:
          1.   Self-maintainability: Results concerning self-maintainability are generalized to a set
               of views: a set of views V is self-maintainable with respect to changes to the underlying
               base relations if the changes can be propagated to every view in V without querying the
               base relations (i.e. the information stored in the concrete views plus the instance of the
               changes is sufficient to maintain the views).
          2.   Coherent and efficient update propagation: Various algorithms are provided to schedule
               update propagation through each individual view, taking care of interdependencies
               between views, which may lead to inconsistencies. For this purpose, auxiliary views
               are often introduced to facilitate update propagation and to enforce self-maintainability.
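
          Self-maintainability can be illustrated with a minimal Python sketch of an aggregate view.
          The class and method names are hypothetical, and changes are assumed to arrive as batches of
          inserted and deleted (region, amount) rows. The per-region count plays the role of auxiliary
          data: no user asks for it, but storing it lets the average be maintained from the changes
          alone.

```python
from dataclasses import dataclass, field


@dataclass
class SalesByRegionView:
    """Materialized view: total and average sale amount per region."""
    totals: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)  # auxiliary data

    def apply_changes(self, inserted, deleted):
        """Propagate (region, amount) changes without reading the base relation."""
        for region, amount in inserted:
            self.totals[region] = self.totals.get(region, 0) + amount
            self.counts[region] = self.counts.get(region, 0) + 1
        for region, amount in deleted:
            # Assumes each deleted row was previously inserted.
            self.totals[region] -= amount
            self.counts[region] -= 1
            if self.counts[region] == 0:
                del self.totals[region], self.counts[region]

    def average(self, region):
        return self.totals[region] / self.counts[region]
```

          By contrast, a view storing only the maximum amount per region would not be self-maintainable
          under deletions: when the current maximum is deleted, the new maximum cannot be derived from
          the view and the change alone.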
          Results on the self-maintainability of a set of views are of great interest in the data warehouse
          context, and it is commonly accepted that the set of views stored in a data warehouse has
          to be globally self-maintainable. The rationale behind this recommendation is that
          self-maintainability is a strong requirement imposed by the operational sources so that their
          regular activity is not overloaded.
          Research on data warehouse refreshment has mainly focused on update propagation through
          materialized views. Many papers have been published on this topic, but very few are devoted to
          the whole refreshment process as defined before. We consider view maintenance as just one step
          of the complete refreshment process. Other steps concern data cleaning, data reconciliation, data
          customisation and, if needed, data archiving. In addition, extraction and cleaning strategies
          may vary from one source to another, and update propagation may vary from
          one user view to another, depending for example on the desired data freshness. The data
          warehouse refreshment process therefore cannot be reduced to a view maintenance process.
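
          The point that update propagation may differ from one user view to another can be sketched as
          follows. This is only an illustration under stated assumptions: the UserView class, the
          max_staleness_min freshness bound and the views_due function are hypothetical, and time is
          reduced to a minute counter.

```python
from dataclasses import dataclass


@dataclass
class UserView:
    name: str
    max_staleness_min: int  # desired freshness bound, in minutes (assumed parameter)
    last_refresh_min: int   # minute at which the view was last refreshed


def views_due(views, now_min):
    """Return the names of the views whose data is older than their freshness bound."""
    return [
        v.name
        for v in views
        if now_min - v.last_refresh_min >= v.max_staleness_min
    ]
```

          With such a rule, a dashboard view requiring data at most 15 minutes old is refreshed far more
          often than a monthly reporting view, even though both are fed by the same sources.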





                                           LoveLy professionaL university                                   175