
Data Warehousing and Data Mining




                                   Data Loading vs. Data Refreshment

                                   The data warehouse loading phase consists of the initial data warehouse instantiation, that is,
                                   the initial computation of the data warehouse content. This initial loading is broadly a sequential
                                   process of four steps (Figure 9.3): (i) preparation, (ii) integration, (iii) high-level aggregation and
                                   (iv) customisation. The first step is performed for each source and consists of data extraction, data
                                   cleaning and possibly data archiving before or after cleaning. Archiving data in a history can
                                   be used both for synchronisation between sources that have different access frequencies
                                   and for some specific temporal queries. The second step consists of data reconciliation and
                                   integration, that is, multi-source cleaning of data originating from heterogeneous sources,
                                   and derivation of the base relations (or base views) of the operational data store (ODS). The third
                                   step consists of the computation of aggregated views from base views. While the data extracted
                                   from the sources and integrated in the ODS is considered ground data with a very low level of
                                   aggregation, the data in the corporate data warehouse (CDW) is generally highly summarised
                                   using aggregation functions. The fourth step consists of the derivation and customisation of the
                                   user views which define the data marts. Customisation refers to the various presentations of
                                   multidimensional data needed by the users.
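
The four steps above can be illustrated with a minimal Python sketch. It is not taken from the text: the sample sources, the field names (`product`, `region`, `amount`) and the function names are hypothetical, and cleaning is reduced to trimming strings and dropping rows with a missing amount. It is only meant to show how the output of each step feeds the next.

```python
from collections import defaultdict

# Hypothetical source extracts (assumption): each source delivers raw sales rows.
SOURCES = {
    "store_a": [{"product": " Tea ", "region": "north", "amount": "10.5"},
                {"product": "coffee", "region": "north", "amount": None}],
    "store_b": [{"product": "TEA", "region": "south", "amount": "4.5"},
                {"product": "Coffee", "region": "south", "amount": "7.0"}],
}

def prepare(rows, archive):
    """Step (i), per source: extract, archive the raw rows, then clean them."""
    archive.extend(rows)  # here archiving happens before cleaning
    cleaned = []
    for row in rows:
        if row["amount"] is None:  # drop rows whose measure is missing
            continue
        cleaned.append({"product": row["product"].strip(),
                        "region": row["region"],
                        "amount": float(row["amount"])})
    return cleaned

def integrate(prepared_by_source):
    """Step (ii): reconcile all sources into one base relation (the ODS)."""
    ods = []
    for source, rows in prepared_by_source.items():
        for row in rows:
            # Multi-source cleaning: unify product spelling across sources.
            ods.append({**row, "product": row["product"].lower(), "source": source})
    return ods

def aggregate(ods):
    """Step (iii): compute summarised views (the CDW) from the base relation."""
    totals = defaultdict(float)
    for row in ods:
        totals[(row["product"], row["region"])] += row["amount"]
    return dict(totals)

def customise(cdw, region):
    """Step (iv): derive a user view (a data mart) for one region."""
    return {product: total for (product, r), total in cdw.items() if r == region}

def initial_load(sources):
    """Run the four steps once, in sequence."""
    archive = []
    prepared = {name: prepare(rows, archive) for name, rows in sources.items()}
    ods = integrate(prepared)
    cdw = aggregate(ods)
    marts = {region: customise(cdw, region) for region in ("north", "south")}
    return archive, ods, cdw, marts
```

Calling `initial_load(SOURCES)` returns the archive, the ODS rows, the aggregated CDW view and one data mart per region. In this sketch the ODS keeps one row per source record, while the CDW holds only totals per product and region.
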
                                                            Figure 9.3: Data Loading Activities

                                   The main feature of the loading phase is that it constitutes the last stage of the data warehouse
                                   design project. Before the end of the data loading, the data warehouse does not yet exist for the
                                   users.
                                   Consequently, there is no constraint on the response time. In contrast, the loading phase
                                   requires higher availability of the data sources.
                                   The data flow which describes the loading phase can serve as a basis to define the refreshment
                                   process, but the corresponding workflows are different. The workflow of the refreshment process
                                   is dynamic and can evolve with users’ needs and with source evolution, while the workflow of
                                   the initial loading process is static and defined with respect to current user requirements and
                                   current sources.
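
One way to picture this distinction is the sketch below, in which the same step definitions (the data flow) are driven by two different workflows. All names here (`make_step`, `LOADING_WORKFLOW`, `RefreshWorkflow`, `prepare_new_source`) are hypothetical, and the steps are placeholders that only record that they ran. The sketch assumes that a "dynamic" workflow means the step sequence can be changed after the warehouse is in use.

```python
def make_step(name):
    """Build a placeholder step that only records that it ran."""
    def step(log):
        log.append(name)
        return log
    return step

# Shared data flow: both workflows draw on the same step definitions.
STEPS = {name: make_step(name)
         for name in ("prepare", "integrate", "aggregate", "customise")}

# Initial loading: a fixed sequence, decided once for the current sources
# and the current user requirements.
LOADING_WORKFLOW = ("prepare", "integrate", "aggregate", "customise")

def run_initial_load():
    """Run the static loading workflow once and return the step log."""
    log = []
    for name in LOADING_WORKFLOW:
        STEPS[name](log)
    return log

class RefreshWorkflow:
    """Refreshment: starts from the loading data flow but can be revised."""

    def __init__(self, registry, order):
        self.registry = dict(registry)  # private copy, so STEPS is not changed
        self.order = list(order)

    def add_step(self, name, after):
        """Insert a new step, e.g. when a source is added or user needs change."""
        self.registry[name] = make_step(name)
        self.order.insert(self.order.index(after) + 1, name)

    def run(self):
        """Run the current step sequence and return the step log."""
        log = []
        for name in self.order:
            self.registry[name](log)
        return log
```

For example, after `refresh = RefreshWorkflow(STEPS, LOADING_WORKFLOW)` and `refresh.add_step("prepare_new_source", after="prepare")`, calling `refresh.run()` executes five steps, while `run_initial_load()` still executes the original four.
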


                                   Lovely Professional University