DCAP603: Data Warehousing and Data Mining
Unit 8: Data Warehouse Refreshment
Gurwinder Kaur, Lovely Professional University
Contents
Objectives
Introduction
8.1 Data Warehouse Refreshment
8.2 Incremental Data Extraction
8.3 Data Cleaning
8.3.1 Data Cleaning for Missing Values
8.3.2 Noisy Data
8.4 Summary
8.5 Keywords
8.6 Self Assessment
8.7 Review Questions
8.8 Further Readings
Objectives
After studying this unit, you will be able to:
• Understand data warehouse refreshment
• Explain incremental data extraction
• Describe data cleaning
Introduction
A distinguishing characteristic of data warehouses is the temporal character of warehouse data,
i.e., the management of histories over an extended period of time. Historical data is necessary for
business trend analysis, which can be expressed in terms of analysing the temporal development
of real-time data. For the refreshment process, maintaining histories in the data warehouse (DWH)
means that either periodic snapshots of the corresponding operational data or the relevant
operational updates are propagated and stored in the warehouse, without overwriting previous
warehouse states.
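The two refreshment options can be illustrated with a minimal Python sketch, assuming a toy in-memory model: the operational data is a dictionary of current values, and the warehouse is an append-only list of (load date, key, value) rows. The function names, the price-table demo and the dates are hypothetical. The snapshot variant stores a full copy of the operational data at every refresh; the update variant stores only values that changed. In both cases earlier rows are never overwritten, so any past state can be rebuilt from the history.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical model: an operational table is a dict {key: current value};
# the warehouse keeps (load_date, key, value) rows and is only ever appended to.
Row = Tuple[date, str, float]


def state_as_of(warehouse: List[Row], when: date) -> Dict[str, float]:
    """Rebuild the state valid on `when` from the stored history."""
    state: Dict[str, float] = {}
    # Stable sort by load date; later loads replace earlier values in `state`,
    # while the stored history itself is left untouched.
    for load_date, key, value in sorted(warehouse, key=lambda row: row[0]):
        if load_date <= when:
            state[key] = value
    return state


def refresh_by_snapshot(warehouse: List[Row], operational: Dict[str, float], load_date: date) -> None:
    """Snapshot strategy: append a full, time-stamped copy of the operational data."""
    for key, value in operational.items():
        warehouse.append((load_date, key, value))


def refresh_by_updates(warehouse: List[Row], updates: Dict[str, float], load_date: date) -> None:
    """Update strategy: append only values that differ from the current warehouse state."""
    current = state_as_of(warehouse, load_date)
    for key, value in updates.items():
        if current.get(key) != value:
            warehouse.append((load_date, key, value))


# Demo with a hypothetical price table: P1 changes between two refreshes, P2 does not.
wh_snap: List[Row] = []
operational = {"P1": 10.0, "P2": 20.0}
refresh_by_snapshot(wh_snap, operational, date(2024, 1, 1))
operational["P1"] = 12.0
refresh_by_snapshot(wh_snap, operational, date(2024, 2, 1))

wh_delta: List[Row] = []
refresh_by_updates(wh_delta, {"P1": 10.0, "P2": 20.0}, date(2024, 1, 1))
refresh_by_updates(wh_delta, {"P1": 12.0, "P2": 20.0}, date(2024, 2, 1))

print(len(wh_snap), len(wh_delta))               # 4 3
print(state_as_of(wh_delta, date(2024, 1, 15)))  # {'P1': 10.0, 'P2': 20.0}
```

The row counts show the trade-off: the snapshot history grows by the full table size on every refresh, while the update history grows only by the number of changed values, at the cost of having to detect which values changed.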
Extraction is the operation of retrieving data from a source system for further use in a data
warehouse environment. It is the first step of the ETL (extract, transform, load) process. After
extraction, the data can be transformed and loaded into the data warehouse.
The source systems for a data warehouse are typically transaction processing applications. For
example, one of the source systems for a sales analysis data warehouse might be an order entry
system that records all of the current order activities.
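The extract, transform and load sequence can be sketched in Python with the standard `sqlite3` module, assuming two in-memory databases stand in for an order entry system and the warehouse. The table names `orders` and `sales_fact`, the sample rows, and the cleanup applied in `transform` are hypothetical. The sketch performs a full extraction of every order on each run.

```python
import sqlite3
from typing import List, Tuple

OrderRow = Tuple[int, str, str, float]  # (order_id, order_date, customer, amount)


def extract(source: sqlite3.Connection) -> List[OrderRow]:
    """Extract: read the order rows from the operational (source) system."""
    return source.execute(
        "SELECT order_id, order_date, customer, amount FROM orders"
    ).fetchall()


def transform(rows: List[OrderRow]) -> List[OrderRow]:
    """Transform: trim and upper-case customer names, round amounts to cents."""
    return [
        (order_id, order_date, customer.strip().upper(), round(amount, 2))
        for order_id, order_date, customer, amount in rows
    ]


def load(warehouse: sqlite3.Connection, rows: List[OrderRow]) -> None:
    """Load: insert the transformed rows into the warehouse table."""
    with warehouse:  # commits the transaction if no exception is raised
        warehouse.executemany(
            "INSERT INTO sales_fact (order_id, order_date, customer, amount) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


# Demo: in-memory stand-ins for the order entry system and the warehouse.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, "
    "customer TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "2024-01-05", " acme ", 120.0), (2, "2024-01-06", "globex", 75.5)],
)
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (order_id INTEGER, order_date TEXT, "
    "customer TEXT, amount REAL)"
)

load(warehouse, transform(extract(source)))
print(warehouse.execute(
    "SELECT customer, amount FROM sales_fact ORDER BY order_id"
).fetchall())  # [('ACME', 120.0), ('GLOBEX', 75.5)]
```

Keeping the three steps as separate functions mirrors the ETL stages: extraction only reads from the source system and never modifies it, so the operational application is left undisturbed.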