Page 159 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 8: Data Warehouse Refreshment




          To understand which kinds of tools the refreshment process needs, it is important to
          locate it within the global data warehouse lifecycle, which is defined by the following three phases:

          Design Phase

          The design phase consists of the definition of user views, auxiliary views, source extractors,
          data cleaners, data integrators and all other features that guarantee an explicit specification of
          the data warehouse application. These specifications can be made with respect to abstraction
          levels (conceptual, logical and physical) and user perspectives (source view, enterprise view, client
          views). The result of the design is a set of formal or semiformal specifications that constitute
          the metadata used by the data warehouse system and applications.

          Loading Phase

          The loading phase consists of the initial data warehouse instantiation, that is, the initial
          computation of the data warehouse content. This initial loading is, overall, a sequential process
          of four steps:
          1.   Preparation
          2.   Integration
          3.   High level aggregation
          4.   Customization
          The first step is done for each source and consists of data extraction, data cleaning and possibly
          data archiving before or after cleaning. The second step consists of data integration, which is the
          reconciliation of data originating from heterogeneous sources and the derivation of the base relations
          of the ODS. The third step consists of the computation of aggregated views from base views. In
          all three steps, not just the loading of data but also the loading of indexes is of crucial importance
          for query and update performance. While the data extracted from the sources and integrated in
          the ODS are considered ground data with a very low level of aggregation, the data in aggregated
          views are generally highly summarized using aggregation functions. These aggregated views
          constitute what is sometimes called the CDS, i.e. the set of materialized views from which data
          marts are derived. The fourth step consists of the derivation and customization of the user views
          which define the data marts. Customization refers to the various presentations needed by the users
          for multidimensional data.
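
          The four loading steps described above can be sketched in a few lines of Python. This is only
          an illustrative sketch, not a description of any particular warehouse product: the source names,
          the row layout (product, amount), the cleaning rule and all function names are assumptions made
          for the example, and index loading is left out.

```python
from collections import defaultdict

# Hypothetical source extracts; names and row layout are assumptions.
SOURCES = {
    "store_a": [{"product": " Widget ", "amount": "10.0"},
                {"product": "gadget", "amount": "5.5"}],
    "store_b": [{"product": "WIDGET", "amount": "2.5"},
                {"product": "gadget", "amount": None}],  # dirty row
}

def prepare(rows):
    """Step 1 (per source): extract and clean; rows without an amount are dropped."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append({"product": row["product"].strip().lower(),
                        "amount": float(row["amount"])})
    return cleaned

def integrate(prepared_by_source):
    """Step 2: reconcile the cleaned sources into one base relation (the ODS)."""
    ods = []
    for source, rows in prepared_by_source.items():
        for row in rows:
            ods.append({"source": source, **row})
    return ods

def aggregate(ods):
    """Step 3: compute an aggregated view (total amount per product)."""
    totals = defaultdict(float)
    for row in ods:
        totals[row["product"]] += row["amount"]
    return dict(totals)

def customize(aggregated, products):
    """Step 4: derive a data mart limited to the products one user group needs."""
    return {p: aggregated[p] for p in products if p in aggregated}

def initial_load(sources):
    """Run the four steps in sequence and return every intermediate store."""
    prepared = {name: prepare(rows) for name, rows in sources.items()}
    ods = integrate(prepared)
    aggregated = aggregate(ods)
    mart = customize(aggregated, ["widget"])
    return ods, aggregated, mart

ods, aggregated, mart = initial_load(SOURCES)
```

          Each function consumes only the output of the previous one, which mirrors the sequential
          nature of the initial load: the ODS holds low-level rows, while the aggregated view and the
          data mart hold summarized values.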

          Refreshment Phase

          The refreshment phase has a data flow similar to that of the loading phase. However, while the
          loading process is a massive feeding of the data warehouse, the refreshment process captures the
          differential changes that occurred in the sources and propagates them through the hierarchy of
          data stores. The preparation step extracts from each source the data that characterize the changes
          that have occurred in this source since the last extraction. As in the loading phase, these data are
          cleaned and possibly archived before their integration. The integration step reconciles the changes
          coming from multiple sources and adds them to the ODS. The aggregation step incrementally
          computes the hierarchy of aggregated views using these changes. The customization step
          propagates the summarized data to the data marts.
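
          The difference between a full load and a refreshment can be illustrated with a minimal Python
          sketch. It assumes, purely for the example, that each source keeps a change log with an
          increasing sequence number (seq), that changes are insert-only, and that the only aggregated
          view is a total per product. All names are hypothetical.

```python
def extract_changes(source_log, last_extracted_seq):
    """Preparation: keep only the log entries recorded since the last extraction."""
    return [entry for entry in source_log if entry["seq"] > last_extracted_seq]

def refresh(ods, totals, changes_by_source):
    """Integration and aggregation: append the changes to the ODS and
    update the per-product totals incrementally, without recomputing them."""
    for source, changes in changes_by_source.items():
        for change in changes:
            ods.append({"source": source,
                        "product": change["product"],
                        "amount": change["amount"]})
            totals[change["product"]] = (
                totals.get(change["product"], 0.0) + change["amount"]
            )
    return ods, totals

def propagate_to_mart(totals, products):
    """Customization: push the refreshed summaries to a data mart."""
    return {p: totals[p] for p in products if p in totals}

# State left behind by an earlier load (assumed for the example).
ods = [{"source": "store_a", "product": "widget", "amount": 10.0}]
totals = {"widget": 10.0}

# Hypothetical change log of one source; seq 1 was already extracted.
store_a_log = [
    {"seq": 1, "product": "widget", "amount": 10.0},
    {"seq": 2, "product": "widget", "amount": 2.5},
    {"seq": 3, "product": "gadget", "amount": 4.0},
]

changes = extract_changes(store_a_log, last_extracted_seq=1)
ods, totals = refresh(ods, totals, {"store_a": changes})
mart = propagate_to_mart(totals, ["widget"])
```

          Only the two new log entries travel through the data stores; the existing ODS rows and totals
          are reused rather than rebuilt. Handling updates and deletions in the sources would need
          additional logic that this sketch does not cover.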

          Requirements and Difficulties of Data Warehouse Refreshment

          The refreshment of a data warehouse is an important process that determines the effective
          usability of the data collected and aggregated from the sources. Indeed, the quality of the data
          provided to the decision makers depends on the capability of the data warehouse system to


