Data Warehousing and Data Mining
data quality, in its greater part, is treated as a second-level factor, namely believability. In our
model, however, the remaining factors proposed elsewhere are treated as process quality factors.
Figure 5.1: Data Quality Factors
The completeness factor describes the percentage of the relevant real-world information that was
entered in the sources and/or the warehouse. For example, completeness could rate the extent to which
a string describing an address actually fit within the size of the attribute that represents the address.
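The address example above can be made concrete with a short sketch. It assumes the attribute is a fixed-length text column of `max_len` characters, so any longer address string would be truncated on load and the stored value would be incomplete. The function name `truncated_percentage` and the sample addresses are hypothetical, chosen only for illustration.

```python
def truncated_percentage(addresses: list[str], max_len: int) -> float:
    """Percentage of address strings that do NOT fit in an attribute of
    max_len characters (hypothetical check; such values would be stored
    truncated, i.e. incomplete with respect to the real-world value)."""
    if not addresses:
        return 0.0
    too_long = sum(1 for a in addresses if len(a) > max_len)
    return 100.0 * too_long / len(addresses)


if __name__ == "__main__":
    sample = ["12 Main St", "Flat 4B, 221 Long Avenue Name, Springfield"]
    print(truncated_percentage(sample, max_len=20))  # 50.0
```

The complement (100 minus this value) can be read as a simple completeness score for that one attribute. A real warehouse would combine many such checks.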
The credibility factor describes how trustworthy the source that provided the information is. The
accuracy factor describes the accuracy of the data entry process that took place at the sources.
The consistency factor describes the logical coherence of the information with respect to logical
rules and constraints. The data interpretability factor is concerned with data description (e.g., data
layout for legacy systems and external data, table descriptions for relational databases, primary
and foreign keys, aliases, defaults, domains, explanation of coded values, etc.).
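The consistency factor can be sketched as a set of logical rules evaluated against each stored record. This is a minimal illustration under stated assumptions: records are plain dictionaries, and the two rules (`ship_after_order`, `positive_quantity`) and their field names are hypothetical examples of business constraints, not rules taken from the text.

```python
from datetime import date
from typing import Callable

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical business rules: each returns True when the record satisfies it.
RULES: dict[str, Rule] = {
    "ship_after_order": lambda r: r["ship_date"] >= r["order_date"],
    "positive_quantity": lambda r: r["quantity"] > 0,
}


def violated_rules(record: Record, rules: dict[str, Rule]) -> list[str]:
    """Names of the rules that this record breaks."""
    return [name for name, rule in rules.items() if not rule(record)]


def inconsistent_percentage(records: list[Record], rules: dict[str, Rule]) -> float:
    """Percentage of records that break at least one rule."""
    if not records:
        return 0.0
    bad = sum(1 for r in records if violated_rules(r, rules))
    return 100.0 * bad / len(records)


if __name__ == "__main__":
    orders = [
        {"order_date": date(2024, 1, 1), "ship_date": date(2024, 1, 3), "quantity": 2},
        {"order_date": date(2024, 1, 5), "ship_date": date(2024, 1, 2), "quantity": 1},
        {"order_date": date(2024, 1, 1), "ship_date": date(2024, 1, 1), "quantity": 0},
        {"order_date": date(2024, 2, 1), "ship_date": date(2024, 2, 2), "quantity": 5},
    ]
    print(violated_rules(orders[1], RULES))       # ['ship_after_order']
    print(inconsistent_percentage(orders, RULES))  # 50.0
```

Keeping the rules in a named dictionary lets the same records be re-checked when constraints change, and the rule names explain *why* a record was flagged.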
Factor | Method of measurement | Formula
Completeness | Performance of statistical checks | The percentage of stored information detected to be incomplete with respect to the real-world values
Credibility | Documentation of the source that provided the information | The percentage of inaccurate information provided by each specific source
Accuracy | Documentation of the person or machine that entered the information, and performance of statistical checks | The percentage of stored information detected to be inaccurate with respect to the real-world values, due to data entry reasons
Consistency | Performance of statistical checks | The percentage of stored information detected to be inconsistent
Data Interpretability | Data description (e.g., data layout for legacy systems and external data, table descriptions for relational databases, primary and foreign keys, aliases, defaults, domains, explanation of coded values, etc.) | The number of pieces of information not fully described
5.7 Major Research Projects in Data Warehousing
Major research projects in data warehousing include the following:
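1. Michael Hahne, Lothar Burow, and Torben Elvers. "XML-Datenimport in das SAP Business
Information Warehouse bei Bayer MaterialScience" [XML data import into the SAP Business Information
Warehouse at Bayer MaterialScience]. In: Auf dem Weg zur Integration Factory [On the way to the
integration factory], pp. 231-251, 2005.
2. H. Fan and A. Poulovassilis. "Using AutoMed Metadata in Data Warehousing Environments".
In: DOLAP, 2003.
3. Michael Hahne. "Transformation mehrdimensionaler Datenmodelle" [Transformation of
multidimensional data models]. In: Vom Data Warehouse zum Corporate Knowledge Center [From the
data warehouse to the corporate knowledge center], pp. 399-420, 2002.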
98 Lovely Professional University