Page 160 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
propagate the changes made at the data sources in reasonable time. Most of the design decisions
are then influenced by the choice of data structures and updating techniques that optimize the
refreshment of the data warehouse.
Building an efficient refreshment strategy depends on various parameters related to the
following:
1. Application requirements: e.g., data freshness, computation time of queries and views,
data accuracy.
2. Source constraints: e.g., availability windows, frequency of change.
3. Data warehouse system limits: e.g., storage space limit, functional limits.
Most of these parameters may evolve over the lifetime of the data warehouse, leading to
frequent reconfiguration of the data warehouse architecture and changes in the refreshment
strategies. Consequently, data warehouse administrators must be provided with powerful tools
that enable them to redesign data warehouse applications efficiently.
For those corporations in which an ODS makes sense, Inmon proposes distinguishing among
three classes of ODSs, according to the speed of refreshment demanded.
1. The first class of ODSs is refreshed within a few seconds after the operational data sources
are updated. Very few transformations are performed as the data passes from the
operational environment into the ODS. A typical example of such an ODS is given by a
banking environment where data sources keep the individual accounts of a large multinational
customer, and the ODS stores the total balance for this customer.
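The banking example of the first class can be sketched in a few lines of Python. This is an illustrative sketch, not Inmon's design: the class `RealTimeODS` and the method `on_source_update` are hypothetical names, and it assumes each source update carries the new balance of one account. Each update is propagated at once, and only the difference from the previous balance is applied to the customer total, which keeps the transformation minimal.

```python
from collections import defaultdict


class RealTimeODS:
    """Hypothetical first-class ODS: refreshed on every source update."""

    def __init__(self) -> None:
        # Last known balance of each (customer, account) pair at the sources.
        self.account_balance: dict[tuple[str, str], float] = {}
        # ODS content: total balance per customer across all of their accounts.
        self.total_balance: dict[str, float] = defaultdict(float)

    def on_source_update(self, customer: str, account: str, new_balance: float) -> None:
        """Propagate one account update immediately, with minimal transformation."""
        old_balance = self.account_balance.get((customer, account), 0.0)
        self.account_balance[(customer, account)] = new_balance
        # Apply only the delta, so the total is never recomputed from scratch.
        self.total_balance[customer] += new_balance - old_balance


ods = RealTimeODS()
ods.on_source_update("acme", "acct-fr", 1000.0)
ods.on_source_update("acme", "acct-us", 2500.0)
ods.on_source_update("acme", "acct-fr", 800.0)
print(ods.total_balance["acme"])  # 3300.0
```

Applying deltas rather than re-summing all accounts is one way to keep the per-update cost small enough for refreshment within seconds.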
2. In the second class of ODSs, integrated and transformed data are first accumulated
and stored in an intermediate data store and then periodically forwarded to the ODS,
say on an hourly basis. This class usually involves more integration and transformation
processing. To illustrate, consider a bank that stores in the ODS an integrated
individual bank account on a weekly basis, including the number of transactions during
the week, the starting and ending balances, the largest and smallest transactions, etc. The
daily transactions processed at the operational level are stored and forwarded on an hourly
basis. Each change received by the ODS triggers the updating of a composite record of the
current week.
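The store-and-forward behaviour of the second class can be sketched as follows. The names `WeeklyAccountRecord` and `StoreAndForward` are hypothetical, the hourly scheduler that would call `forward` is not shown, and "largest" and "smallest" are assumed here to compare signed transaction amounts. Transactions wait in the intermediate store; only when they are forwarded does each one update the composite weekly record in the ODS.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WeeklyAccountRecord:
    """Composite ODS record for one account over the current week."""
    starting_balance: float
    ending_balance: float
    num_transactions: int = 0
    largest: Optional[float] = None
    smallest: Optional[float] = None

    def apply(self, amount: float) -> None:
        # Each forwarded change updates the composite record incrementally.
        self.num_transactions += 1
        self.ending_balance += amount
        self.largest = amount if self.largest is None else max(self.largest, amount)
        self.smallest = amount if self.smallest is None else min(self.smallest, amount)


@dataclass
class StoreAndForward:
    """Intermediate data store that buffers transactions until the next forward."""
    pending: list[tuple[str, float]] = field(default_factory=list)

    def record(self, account: str, amount: float) -> None:
        self.pending.append((account, amount))

    def forward(self, ods: dict[str, WeeklyAccountRecord]) -> int:
        # Intended to be called periodically (e.g. hourly) by a scheduler.
        for account, amount in self.pending:
            ods[account].apply(amount)
        count = len(self.pending)
        self.pending.clear()
        return count


ods = {"acct-42": WeeklyAccountRecord(starting_balance=500.0, ending_balance=500.0)}
buffer = StoreAndForward()
buffer.record("acct-42", 200.0)
buffer.record("acct-42", -50.0)
buffer.record("acct-42", 75.0)
forwarded = buffer.forward(ods)  # e.g. run on the hourly tick
rec = ods["acct-42"]
print(forwarded, rec.num_transactions, rec.ending_balance, rec.largest, rec.smallest)
# 3 3 725.0 200.0 -50.0
```

Until `forward` runs, the ODS record is unchanged, which is exactly the trade-off of this class: lower refresh overhead in exchange for data that may be up to one period old.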
3. Finally, the third class of ODSs is strongly asynchronous. Data are extracted from the
sources and used to refresh the ODS on a day-or-more basis. As an example of this class,
consider an ODS that stores composite customer records computed from different sources.
As customer data change very slowly, it is reasonable to refresh the ODS less
frequently.
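For the third class, a periodic full rebuild is often simpler than tracking individual changes. The sketch below assumes two hypothetical sources (`crm` and `billing`, each a dictionary keyed by customer id) and a one-day refresh period; none of these names come from the text. When the period has elapsed, the composite customer records are recomputed entirely from the sources; otherwise the existing ODS is returned untouched.

```python
from datetime import datetime, timedelta

REFRESH_PERIOD = timedelta(days=1)  # assumed "day-or-more" refresh period


def build_composite_customers(crm: dict[str, dict], billing: dict[str, dict]) -> dict[str, dict]:
    """Recompute composite customer records from two hypothetical sources."""
    composite = {}
    for cust_id in crm.keys() | billing.keys():
        record = {"customer_id": cust_id}
        record.update(crm.get(cust_id, {}))      # e.g. name, address
        record.update(billing.get(cust_id, {}))  # e.g. customer segment
        composite[cust_id] = record
    return composite


def refresh_if_due(ods, last_refresh, now, crm, billing):
    """Rebuild the ODS only if the refresh period has elapsed."""
    if now - last_refresh < REFRESH_PERIOD:
        return ods, last_refresh
    # Full extraction and rebuild; acceptable because customer data change slowly.
    return build_composite_customers(crm, billing), now


crm = {"c1": {"name": "Ada", "city": "Paris"}}
billing = {"c1": {"segment": "gold"}, "c2": {"segment": "silver"}}
t0 = datetime(2024, 1, 1)
ods, last = refresh_if_due({}, t0 - timedelta(days=2), t0, crm, billing)
print(ods["c1"])  # {'customer_id': 'c1', 'name': 'Ada', 'city': 'Paris', 'segment': 'gold'}
ods2, last2 = refresh_if_due(ods, last, t0 + timedelta(hours=6), {}, {})
print(ods2 is ods)  # True: six hours later, no refresh is due yet
```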
Quite similar distinctions also apply to the refreshment of a global data warehouse, except
that there is usually no counterpart to the first class of ODS. The refreshment period is
considered to be longer for global data warehouses. Nevertheless, different data warehouses
demand different speeds of refreshment. Besides the speed of refreshment, which can be
determined statically after analyzing the requirements of the information processing application,
other dynamic parameters may influence the refreshment strategy of the data warehouse. For
instance, one may consider the volume of changes in the data sources, as given by the number of
update transactions.
Coming back to the previous example of an ODS of the second class, such
a parameter may dynamically determine the moment at which the changes accumulated in the
intermediate data store should be forwarded to the ODS. Another parameter can be determined
by the profile of the queries that execute on the data warehouse. Some strategic queries that require
fresh data may trigger the refreshment of the data warehouse, for instance by using the changes
that have been previously logged between the sources and the ODS, or between the sources and the
global data warehouse.
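The two dynamic parameters just described, the volume of changes and the query profile, can be combined in one small sketch. Everything here is hypothetical: the class `DynamicRefresher`, the `volume_threshold` parameter, and the `needs_fresh_data` flag are illustrative names, and the "warehouse" is simplified to a key-value table updated by overwriting. Changes are logged as they arrive; they are applied either when enough update transactions have accumulated or when a strategic query explicitly asks for fresh data.

```python
from typing import Callable


class DynamicRefresher:
    """Sketch of volume-triggered and query-triggered refreshment."""

    def __init__(self, volume_threshold: int) -> None:
        self.volume_threshold = volume_threshold
        self.change_log: list[tuple[str, float]] = []  # changes logged since last refresh
        self.warehouse: dict[str, float] = {}           # simplified target table
        self.refresh_count = 0

    def log_change(self, key: str, value: float) -> None:
        self.change_log.append((key, value))
        # Volume trigger: forward once enough update transactions have accumulated.
        if len(self.change_log) >= self.volume_threshold:
            self.refresh()

    def refresh(self) -> None:
        # Apply the logged changes in arrival order, then empty the log.
        for key, value in self.change_log:
            self.warehouse[key] = value
        self.change_log.clear()
        self.refresh_count += 1

    def query(self, fn: Callable[[dict[str, float]], float], needs_fresh_data: bool = False) -> float:
        # Query-profile trigger: a strategic query demanding fresh data
        # first applies the changes logged so far.
        if needs_fresh_data and self.change_log:
            self.refresh()
        return fn(self.warehouse)


r = DynamicRefresher(volume_threshold=3)
total = lambda w: sum(w.values(), 0.0)
r.log_change("a", 1.0)
r.log_change("b", 2.0)
print(r.query(total))                         # 0.0: routine query sees stale data
print(r.query(total, needs_fresh_data=True))  # 3.0: strategic query forces a refresh
for key, value in [("c", 4.0), ("d", 5.0), ("e", 6.0)]:
    r.log_change(key, value)                  # third change reaches the threshold
print(r.refresh_count, r.query(total))        # 2 18.0
```

Routine queries tolerate stale data, so only the volume threshold drives refreshment for them; the per-query flag lets the few queries that need current data pay the refresh cost themselves.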
154 LoveLy professionaL university