
Data Warehousing and Data Mining




                                   propagate the changes made at the data sources in reasonable time. Most of the design decisions
                                   are then influenced by the choice of data structures and updating techniques that optimize the
                                   refreshment of the data warehouse.
                                   Building an efficient refreshment strategy depends on various parameters related to the
                                   following:
                                   1.   Application requirements: e.g., data freshness, computation time of queries and views,
                                        data accuracy.
                                   2.   Source constraints: e.g., availability windows, frequency of change.
                                   3.   Data warehouse system limits: e.g., storage space limit, functional limits.
                                   Most of these parameters may evolve during the data warehouse lifetime, leading to frequent
                                   reconfiguration of the data warehouse architecture and changes in the refreshment strategies.
                                   Consequently, data warehouse administrators must be provided with powerful tools that enable
                                   them to efficiently redesign data warehouse applications.

                                   For those corporations in which an ODS makes sense, Inmon proposes to distinguish among
                                   three classes of ODSs, depending on the speed of refreshment demanded.
                                   1.   The first class of ODSs is refreshed within a few seconds after the operational data sources
                                        are updated. Very few transformations are performed as the data pass from the
                                        operational environment into the ODS. A typical example of such an ODS is given by a
                                        banking environment where data sources keep individual accounts of a large multinational
                                        customer, and the ODS stores the total balance for this customer.
                                   2.   With the second class of ODSs, integrated and transformed data are first accumulated
                                        and stored in an intermediate data store and then periodically forwarded to the ODS
                                        on, say, an hourly basis. This class usually involves more integration and transformation
                                        processing. To illustrate this, consider a bank that stores in the ODS an integrated
                                        individual bank account on a weekly basis, including the number of transactions during
                                        the week, the starting and ending balances, the largest and smallest transactions, etc. The
                                        daily transactions processed at the operational level are stored and forwarded on an hourly
                                        basis. Each change received by the ODS triggers the updating of a composite record of the
                                        current week.
                                   3.   Finally, the third class of ODSs is strongly asynchronous. Data are extracted from the
                                        sources and used to refresh the ODS on a day-or-more basis. As an example of this class,
                                        consider an ODS that stores composite customer records computed from different sources.
                                        As customer data change very slowly, it is reasonable to refresh the ODS less
                                        frequently.
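
                                   The second class can be made concrete with a minimal Python sketch of the weekly bank
                                   account example. It is an illustration under stated assumptions, not a prescribed design:
                                   the names WeeklyAccountRecord, IntermediateStore and Ods are hypothetical, amounts are
                                   signed (deposits positive, withdrawals negative), "largest" and "smallest" are taken by
                                   signed amount, and each forwarded batch is assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Transaction = Tuple[str, date, float]  # (account_id, day, signed amount); hypothetical layout


@dataclass
class WeeklyAccountRecord:
    """Composite ODS record for one account and one week (illustrative fields)."""
    account_id: str
    week_start: date
    starting_balance: float
    ending_balance: float
    transaction_count: int = 0
    largest: Optional[float] = None
    smallest: Optional[float] = None

    def apply(self, amount: float) -> None:
        # Each change received by the ODS updates the composite record.
        self.ending_balance += amount
        self.transaction_count += 1
        self.largest = amount if self.largest is None else max(self.largest, amount)
        self.smallest = amount if self.smallest is None else min(self.smallest, amount)


class IntermediateStore:
    """Accumulates operational transactions between two forwards to the ODS."""

    def __init__(self) -> None:
        self._pending: List[Transaction] = []

    def record(self, account_id: str, day: date, amount: float) -> None:
        self._pending.append((account_id, day, amount))

    def drain(self) -> List[Transaction]:
        # Hand over everything accumulated so far and start a new batch.
        batch, self._pending = self._pending, []
        return batch


class Ods:
    def __init__(self, opening_balances: Dict[str, float]) -> None:
        self._balances = dict(opening_balances)  # latest known balance per account
        self.records: Dict[Tuple[str, date], WeeklyAccountRecord] = {}

    def refresh(self, batch: List[Transaction]) -> None:
        for account_id, day, amount in batch:
            week_start = day - timedelta(days=day.weekday())  # Monday of that week
            key = (account_id, week_start)
            record = self.records.get(key)
            if record is None:
                balance = self._balances.get(account_id, 0.0)
                record = WeeklyAccountRecord(account_id, week_start, balance, balance)
                self.records[key] = record
            record.apply(amount)
            self._balances[account_id] = record.ending_balance
```

                                   In this sketch, a scheduler (not shown) would call ods.refresh(store.drain()) once per
                                   hour; the weekly record is then updated incrementally rather than recomputed.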
                                   Quite similar distinctions also apply to the refreshment of a global data warehouse, except
                                   that there is usually no counterpart for ODSs of the first class. The period for refreshment is
                                   considered to be larger for global data warehouses. Nevertheless, different data warehouses
                                   demand different speeds of refreshment. Besides the speed of the refreshment, which can be
                                   determined statically after analyzing the requirements of the information processing application,
                                   other dynamic parameters may influence the refreshment strategy of the data warehouse. For
                                   instance, one may consider the volume of changes in the data sources, as given by the number of
                                   update transactions. Coming back to the previous example of an ODS of the second class, such
                                   a parameter may determine dynamically the moment at which the changes accumulated in an
                                   intermediate data store should be forwarded to the ODS. Another parameter can be determined
                                   by the profile of queries that execute on the data warehouse. Some strategic queries that require
                                   fresh data may entail the refreshment of the data warehouse, for instance using the changes
                                   that have been previously logged between the sources and the ODS or between the sources and
                                   the global data warehouse.
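
                                   The interplay of a static refresh period with the two dynamic parameters discussed above
                                   (volume of changes and query profile) can be sketched as a small decision rule in Python.
                                   This is only one possible formulation: the class RefreshPolicy, its field names and the
                                   threshold semantics are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshPolicy:
    period_seconds: float      # static period derived from application requirements
    max_pending_changes: int   # volume threshold on logged update transactions

    def should_refresh(
        self,
        seconds_since_refresh: float,
        pending_changes: int,
        query_max_staleness: Optional[float] = None,
    ) -> bool:
        """Decide whether logged changes should be propagated now."""
        if pending_changes == 0:
            return False  # nothing logged, so a refresh would change nothing
        if seconds_since_refresh >= self.period_seconds:
            return True   # static criterion: the refresh period has elapsed
        if pending_changes >= self.max_pending_changes:
            return True   # dynamic criterion: volume of changes at the sources
        if query_max_staleness is not None and seconds_since_refresh > query_max_staleness:
            return True   # dynamic criterion: a query requires fresher data
        return False
```

                                   For example, with RefreshPolicy(period_seconds=3600, max_pending_changes=1000), a strategic
                                   query that tolerates at most 60 seconds of staleness forces a refresh when 5 changes have
                                   been logged and the last refresh ran 120 seconds ago, even though neither the hourly period
                                   nor the volume threshold has been reached.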





                                   Lovely Professional University