Page 274 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 274
Data Warehousing and Data Mining
notes 14.4 optimizing and Materialization of DW views
Among the techniques adopted in relational implementations of data warehouses to improve
query performance, view materialization and indexing are presumably the most effective ones.
Materialized views are physical structures that improve data access time by pre-computing
intermediary results. Then, user queries can be efficiently processed by using data stored within
views and do not need to access the original data. Nevertheless, the use of materialized views
requires additional storage space and entails maintenance overhead when refreshing the data
warehouse.
One of the most important issues in data warehouse physical design is to select an appropriate
set of materialized views, called a configuration of views, which minimizes total query response
time and the cost of maintaining the selected views, given a limited storage space. To achieve this
goal, views that are closely related to the workload queries must be materialized.
Materialized views are schema objects that can be used to summarize, precompute, replicate, and
distribute data, e.g., to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of a query in a
separate schema object. Unlike an ordinary view, which does not take up any storage space or
contain any data.
The motivation for using materialized views is to improve performance, but the overhead
associated with materialized view management can become a significant system management
problem. When reviewing or evaluating some of the necessary materialized view management
activities, consider some of the following:
1. Identifying what materialized views to create initially
2. Indexing the materialized views
3. Ensuring that all materialized views and materialized view indexes are refreshed properly
each time the database is updated
4. Checking which materialized views have been used
5. Determining how effective each materialized view has been on workload performance
6. Measuring the space being used by materialized views
7. Determining which new materialized views should be created
8. Determining which existing materialized views should be dropped
9. Archiving old detail and materialized view data that is no longer useful
After the initial effort of creating and populating the data warehouse or data mart, the major
administration overhead is the update process, which involves:
1. Periodic extraction of incremental changes from the operational systems
2. Transforming the data
3. Verifying that the incremental changes are correct, consistent, and complete
4. Bulk-loading the data into the warehouse
5. Refreshing indexes and materialized views so that they are consistent with the detail data
268 LoveLy professionaL university