Page 242 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
12.4 Defining Data Warehouse Quality
The existence of data alone does not ensure that all management functions and decisions
can be undertaken smoothly. One definition of data quality focuses on bad data: data
that is missing, incorrect, or invalid in some context. A broader definition is that data quality
is achieved when an organization uses data that is timely. Understanding the key data quality
dimensions is the first step toward data quality improvement. To be processable and interpretable in
an effective and efficient manner, data has to satisfy a set of quality criteria. Data satisfying those
quality criteria is said to be of high quality. Many attempts have been made to define data
quality and to identify its dimensions. Dimensions of data quality typically include accuracy,
reliability, importance, consistency, precision, timeliness, fineness, understandability, conciseness,
and usefulness. For our research paper, we have taken six key dimensions as the quality
criteria, as depicted in Figure 12.7 below.
Figure 12.7: Data Quality Criteria
1. Completeness: Is all the requisite information available? Are some
data values missing, or in an unusable state?
2. Consistency: Do distinct occurrences of the same data instances agree with each other or
provide conflicting information? Are values consistent across data sets?
3. Validity: Refers to the correctness and reasonableness of data.
4. Conformity: Are there expectations that data values conform to specified formats? If so,
do all the values conform to those formats? Maintaining conformance to specific formats is
important.
5. Accuracy: Do data objects accurately represent the “real world” values they are expected
to model? Incorrect spellings of product or person names, addresses, and even untimely or
not current data can impact operational and analytical applications.
6. Integrity: What data is missing important relationship linkages? The inability to link
related records together may actually introduce duplication across your systems.
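Several of the dimensions listed above can be expressed as simple automated checks. The following Python sketch is illustrative only: the record layouts, field names, sample values, and the e-mail pattern are all assumptions made for this example and are not taken from the text. Accuracy is deliberately left out, because it requires comparison against a trusted real-world reference that a self-contained example cannot provide.

```python
import re
from datetime import date

# Hypothetical sample data; every field name and value is an assumption.
customers = [
    {"id": 1, "name": "Asha Rao", "email": "asha@example.com",
     "birth_date": date(1985, 4, 12)},
    {"id": 2, "name": "Ben Cole", "email": None,
     "birth_date": date(1990, 7, 1)},
    {"id": 3, "name": "Chen Li", "email": "chen.li(at)example",
     "birth_date": date(2190, 1, 1)},
]
# A second, hypothetical data set holding names for some of the same customers.
billing_names = {1: "Asha Rao", 2: "Benjamin Cole"}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 9},  # no such customer
]

# Deliberately simple format rule, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness_issues(rows, required_fields):
    """Completeness: ids of rows where a required field is missing or empty."""
    return [r["id"] for r in rows
            if any(r.get(f) in (None, "") for f in required_fields)]


def consistency_issues(rows, other_names):
    """Consistency: ids whose name disagrees with the second data set."""
    return [r["id"] for r in rows
            if r["id"] in other_names and other_names[r["id"]] != r["name"]]


def validity_issues(rows, today):
    """Validity: ids with an unreasonable value (a birth date in the future)."""
    return [r["id"] for r in rows if r["birth_date"] > today]


def conformity_issues(rows):
    """Conformity: ids whose e-mail is present but breaks the format rule."""
    return [r["id"] for r in rows
            if r["email"] and not EMAIL_PATTERN.match(r["email"])]


def integrity_issues(order_rows, customer_rows):
    """Integrity: order ids that cannot be linked to any known customer."""
    known_ids = {c["id"] for c in customer_rows}
    return [o["order_id"] for o in order_rows
            if o["customer_id"] not in known_ids]


if __name__ == "__main__":
    print("completeness:", completeness_issues(customers, ["name", "email"]))
    print("consistency: ", consistency_issues(customers, billing_names))
    print("validity:    ", validity_issues(customers, date(2024, 1, 1)))
    print("conformity:  ", conformity_issues(customers))
    print("integrity:   ", integrity_issues(orders, customers))
```

Each function returns the identifiers of the offending records rather than a single pass/fail flag, so the failures can be traced back to their source. In this sample, customer 2 fails the completeness and consistency checks, customer 3 fails validity and conformity, and order 101 fails the integrity check.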
Data Warehousing
Data warehouses are one of the foundations of the Decision Support Systems of many IS
operations. As defined by the “father of data warehousing”, William H. Inmon, a data warehouse
is “a collection of Integrated, Subject-Oriented, Non-Volatile and Time-Variant databases where
each unit of data is specific to some period of time”. Data warehouses can contain detailed data,
lightly summarized data and highly summarized data, all formatted for analysis and decision
236 Lovely Professional University