Page 243 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 12: Metadata and Warehouse Quality
support” (Inmon, 1996). In the “Data Warehouse Toolkit”, Ralph Kimball gives a more concise
definition: “a copy of transaction data specifically structured for query and analysis” (Kimball,
1998). Both definitions stress the data warehouse’s analysis focus, and highlight the historical
nature of the data found in a data warehouse.
Figure 12.8: Data Warehousing Structure
Stages of Data Warehousing Susceptible to Data Quality Problems
The purpose here is to formulate a descriptive taxonomy of the data quality issues that arise at
each stage of data warehousing. The phases are:
1. Data Source
2. Data Integration and Data Profiling
3. Data Staging and ETL
4. Database Schema (Modeling)
Data quality can be compromised at any point, depending on how data is received, entered,
integrated, maintained, processed (extracted, transformed and cleansed) and loaded. Numerous
processes bring data into the data environment, and most of them affect its quality to some
extent. Every phase of data warehousing therefore shares responsibility for the quality of the
data in the warehouse. Despite all cleansing efforts, a certain percentage of dirty data remains.
This residual dirty data should be reported, together with the reasons the cleansing failed.
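The reporting of residual dirty data can be illustrated with a minimal Python sketch. It is not a method taken from the text: the record fields (customer_id, email, age), the three validation rules, and the function names cleanse and residual_report are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def cleanse(records):
    """Split records into clean rows and rejected rows with reasons.

    Field names and rules are illustrative assumptions, not from the text.
    """
    clean, rejected = [], []
    for rec in records:
        reasons = []
        if not rec.get("customer_id"):
            reasons.append("missing customer_id")
        # Normalise the email before validating it.
        email = (rec.get("email") or "").strip().lower()
        if "@" not in email:
            reasons.append("invalid email")
        age = rec.get("age")
        if not isinstance(age, int) or not 0 <= age <= 120:
            reasons.append("age out of range")
        if reasons:
            # Keep the original record so the failure can be investigated.
            rejected.append({"record": rec, "reasons": reasons})
        else:
            clean.append({**rec, "email": email})
    return clean, rejected


def residual_report(rejected):
    """Count how often each failure reason occurs among rejected rows."""
    return Counter(reason for row in rejected for reason in row["reasons"])


rows = [
    {"customer_id": "C1", "email": " Ann@Example.com ", "age": 34},
    {"customer_id": "", "email": "bob@example.com", "age": 41},
    {"customer_id": "C3", "email": "no-at-sign", "age": 430},
]
clean, rejected = cleanse(rows)
report = residual_report(rejected)
```

Keeping each rejected record alongside its list of reasons, rather than silently dropping it, is what makes it possible to state why cleansing failed for a given row.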
Data quality problems can occur in many different ways. The most common include:
1. Poor data handling procedures and processes.
2. Failure to adhere to data entry and maintenance procedures.
3. Errors in the migration process from one system to another.
4. External and third-party data that may not conform to your company's data standards or may
otherwise be of doubtful quality.
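As an illustration of the last cause, external data can be checked against internal standards before it is integrated. The sketch below is an assumption-laden example in Python: the text does not define any company standards, so the STANDARDS dictionary (an allowed set of country codes and an ISO-style date format), the field names, and the function conformance are all hypothetical.

```python
import re

# Hypothetical company standards; the text does not specify any.
STANDARDS = {
    "country": lambda v: v in {"IN", "US", "GB"},
    # Checks the YYYY-MM-DD format only, not whether the date exists.
    "order_date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}


def conformance(rows, standards=STANDARDS):
    """Return the fraction of rows that satisfy each field's standard."""
    result = {}
    for field, check in standards.items():
        passed = sum(
            1 for r in rows
            if isinstance(r.get(field), str) and check(r[field])
        )
        result[field] = passed / len(rows) if rows else 0.0
    return result


feed = [
    {"country": "IN", "order_date": "2024-01-15"},
    {"country": "India", "order_date": "15/01/2024"},
    {"country": "US", "order_date": "2024-02-30"},
    {"country": "GB", "order_date": None},
]
scores = conformance(feed)
```

A low conformance score for a field signals that the external feed needs mapping or cleansing before it reaches the warehouse. Because the date rule here tests format alone, an impossible date such as 2024-02-30 still passes, so a stricter rule would be needed to catch it.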
Lovely Professional University