Page 179 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 9: Data Warehouse Refreshment – II




          aggregated data and can be organized into a multidimensional structure. Data extracted from
          each source can also be stored in intermediate data stores. This hierarchy of data stores is a
          logical way to represent the data flows that go from the sources to the data marts. Not all of
          these stores are necessarily materialized, and those that are may simply constitute different
          layers of the same database.
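The difference between a logical store and a materialized one can be sketched in Python. In the minimal model below, which assumes a hypothetical `Layer` class and made-up sales rows, each store in the hierarchy either keeps its own copy of the data (materialized) or recomputes it from the store below on every read (virtual). A materialized layer reflects a source change only after it has been refreshed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Rows = List[dict]

@dataclass
class Layer:
    """One logical store in the source-to-data-mart hierarchy (hypothetical model)."""
    name: str
    derive: Callable[[Rows], Rows]            # computes this layer from the one below
    source: Optional["Layer"] = None          # None means this is a source extract
    materialized: bool = False
    raw: Rows = field(default_factory=list)   # rows held by a source extract
    stored: Optional[Rows] = None             # copy kept by a materialized layer

    def read(self) -> Rows:
        if self.source is None:
            return self.raw
        if self.materialized:
            return self.stored or []              # served from the stored copy
        return self.derive(self.source.read())  # virtual: recomputed on every read

    def refresh(self) -> None:
        """Recompute and store the contents of a materialized layer."""
        if self.materialized and self.source is not None:
            self.stored = self.derive(self.source.read())

source = Layer("sales_extract", derive=lambda r: r,
               raw=[{"region": "N", "amount": 10}, {"region": "S", "amount": 5}])
warehouse = Layer("warehouse", derive=lambda r: [x for x in r if x["amount"] > 0],
                  source=source, materialized=True)
north_mart = Layer("north_mart", derive=lambda r: [x for x in r if x["region"] == "N"],
                   source=warehouse)                 # virtual layer over the warehouse
warehouse.refresh()
source.raw.append({"region": "N", "amount": 7})     # a change at the source
before = len(north_mart.read())                      # 1: warehouse not yet refreshed
warehouse.refresh()
after = len(north_mart.read())                       # 2
print(before, after)
```

Materializing the warehouse layer trades storage and refresh work for cheaper reads, while the virtual data mart always reflects whatever the warehouse currently stores.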
          Figure 9.2 shows a typical data warehouse architecture. This is a logical view; its operational
          implementation differs considerably from one data warehousing product to another. Depending
          on the data source, extraction and cleaning can be done by the same wrapper or by distinct
          tools. Similarly, data reconciliation (also called multi-source cleaning) can be separated from or
          merged with data integration (multi-source operations). High-level aggregation can be seen as a
          set of computation techniques ranging from simple statistical functions to advanced data mining
          algorithms. Customisation techniques may vary from one data mart to another, depending on
          how decision makers want to see the elaborated data.
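The activities of this architecture can be sketched as a chain of Python functions: a per-source wrapper that both extracts and cleans, a reconciliation step merged with integration (one of the two options the text allows), a simple statistical aggregation, and a per-mart customisation. The record layout, function names and cleaning and duplicate rules are hypothetical assumptions chosen for illustration; real products divide these steps differently.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]

def extract_and_clean(rows: List[Record], source: str) -> List[Record]:
    """Per-source wrapper: extraction and single-source cleaning in one step."""
    out = []
    for r in rows:
        if r.get("amount") is None:            # assumed rule: drop incomplete rows
            continue
        out.append({"id": r["id"], "region": str(r["region"]).strip().upper(),
                    "amount": float(r["amount"]), "source": source})
    return out

def reconcile_and_integrate(*batches: List[Record]) -> List[Record]:
    """Multi-source cleaning merged with integration: rows sharing an id across
    sources are treated as one fact and kept once (first source wins, assumed)."""
    merged: Dict[object, Record] = {}
    for batch in batches:
        for r in batch:
            merged.setdefault(r["id"], r)
    return list(merged.values())

def aggregate(rows: List[Record]) -> Dict[str, float]:
    """High-level aggregation with a simple statistical function: sum per region."""
    totals: Dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def customise(totals: Dict[str, float], regions: List[str]) -> Dict[str, float]:
    """Per-data-mart customisation: keep only the regions this mart's users need."""
    return {k: v for k, v in totals.items() if k in regions}

crm = extract_and_clean([{"id": 1, "region": " north", "amount": 100},
                         {"id": 2, "region": "south", "amount": None}], "crm")
erp = extract_and_clean([{"id": 1, "region": "NORTH", "amount": 100},
                         {"id": 3, "region": "south", "amount": 40}], "erp")
warehouse = reconcile_and_integrate(crm, erp)
north_mart = customise(aggregate(warehouse), ["NORTH"])
print(north_mart)  # {'NORTH': 100.0}
```

Splitting `reconcile_and_integrate` into two functions, or merging `extract_and_clean` into a single extraction tool per source, would change the packaging but not the data flow.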

                                  Figure 9.2: Data Warehouse Architecture

          The refreshment of a data warehouse is an important process that determines the effective
          usability of the data collected and aggregated from the sources. Indeed, the quality of the data
          provided to decision makers depends on the ability of the data warehouse system to convey the
          changes made at the data sources to the data marts within a reasonable time. Most design
          decisions therefore concern the choice of data structures and update techniques that optimise
          the refreshment of the data warehouse.
          There is considerable confusion in the literature concerning data warehouse refreshment. This
          process is often either reduced to the view maintenance problem or confused with the data
          loading phase. Our purpose here is to show that data warehouse refreshment is more complex
          than the view maintenance problem and different from the loading process. We define the
          refreshment process as a workflow whose activities depend on the available products for data
          extraction, cleaning and integration, and whose triggering events depend on the application
          domain and on the required quality in terms of data freshness.
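The workflow view can be made concrete with a small sketch in which the same refreshment activities run for every data mart, but the events that trigger them depend on each mart's freshness requirement. The `RefreshPolicy` fields, the activity list and the thresholds are hypothetical assumptions, not taken from any particular product.

```python
from dataclasses import dataclass
from typing import List

# Refreshment activities, in order; the list itself is an illustrative assumption.
ACTIVITIES = ["extract", "clean", "integrate", "aggregate", "customise"]

@dataclass
class RefreshPolicy:
    """Hypothetical freshness requirement attached to one data mart."""
    mart: str
    max_staleness_s: float    # tolerated age of the mart's data, in seconds
    on_source_change: bool    # refresh as soon as a source reports a change

def should_refresh(policy: RefreshPolicy, last_refresh: float, now: float,
                   source_changed: bool) -> bool:
    """Triggering rule: event-driven marts react to source changes; every mart
    is refreshed once its data is older than the tolerated staleness."""
    if policy.on_source_change and source_changed:
        return True
    return now - last_refresh > policy.max_staleness_s

def run_workflow(policy: RefreshPolicy) -> List[str]:
    """Run the activities in order (stubbed here) and return a trace of the steps."""
    return [f"{policy.mart}:{step}" for step in ACTIVITIES]

finance = RefreshPolicy("finance", max_staleness_s=86_400, on_source_change=False)
stock = RefreshPolicy("stock", max_staleness_s=3_600, on_source_change=True)

# Two hours after the last refresh, a source reports a change.
triggered = [p.mart for p in (finance, stock)
             if should_refresh(p, last_refresh=0.0, now=7_200.0, source_changed=True)]
print(triggered)                 # ['stock']
print(run_workflow(stock)[-1])   # stock:customise
```

Under this rule the daily finance mart ignores the change and waits for its staleness bound, while the stock mart is refreshed at once; only the triggering events differ between the two.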

          9.3.1 View Maintenance, Data Loading and Data Refreshment

          Data refreshment in data warehouses is generally confused either with data loading, as done
          during the initial phase, or with update propagation through a set of materialized views. Both
          analogies are wrong. The following paragraphs discuss the differences between data loading and
          data refreshment, and between view maintenance and data refreshment.
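To fix terms before the comparison, the sketch below shows the two operations that refreshment is often confused with, on a hypothetical materialized view holding total sales per product: an initial load that computes the view from the full source, and view maintenance that propagates only the inserted and deleted source rows. The data and the sum/count view are illustrative assumptions; neither function is meant as a model of refreshment itself.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]          # (product, amount)
View = Dict[str, List[int]]    # product -> [sum of amounts, row count]

def initial_load(source: List[Row]) -> View:
    """Data loading: compute the materialized view from the full source."""
    view: View = {}
    for product, amount in source:
        agg = view.setdefault(product, [0, 0])
        agg[0] += amount
        agg[1] += 1
    return view

def maintain(view: View, inserted: List[Row], deleted: List[Row]) -> View:
    """View maintenance: apply only the source deltas to the existing view.
    Assumes every deleted row was previously reflected in the view. The row
    count lets a group be dropped when its last row is deleted."""
    for product, amount in inserted:
        agg = view.setdefault(product, [0, 0])
        agg[0] += amount
        agg[1] += 1
    for product, amount in deleted:
        agg = view[product]
        agg[0] -= amount
        agg[1] -= 1
        if agg[1] == 0:
            del view[product]
    return view

source = [("pen", 3), ("ink", 5), ("pen", 2)]
view = initial_load(source)                      # initial phase: full computation
maintain(view, inserted=[("pad", 4)], deleted=[("ink", 5)])

# The incrementally maintained view matches a full reload of the updated source.
updated_source = [("pen", 3), ("pen", 2), ("pad", 4)]
print(view == initial_load(updated_source))      # True
print(view)                                      # {'pen': [5, 2], 'pad': [4, 1]}
```

Both functions only keep one view consistent with one source; the extraction, cleaning and integration activities of the refreshment workflow lie outside them.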



                                           Lovely Professional University                                   173