Page 242 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   12.4 Defining Data Warehouse Quality

                                   The existence of data alone does not ensure that all management functions and decisions
                                   can be smoothly undertaken. One definition of data quality focuses on bad data: data
                                   that is missing, incorrect, or invalid in some context. A broader definition is that data quality
                                   is achieved when an organization uses data that is timely. Understanding the key data quality
                                   dimensions is the first step toward data quality improvement. To be processable and interpretable in
                                   an effective and efficient manner, data has to satisfy a set of quality criteria. Data satisfying those
                                   quality criteria is said to be of high quality. Numerous attempts have been made to define data
                                   quality and to identify its dimensions. Dimensions of data quality typically include accuracy,
                                   reliability, importance, consistency, precision, timeliness, fineness, understandability, conciseness
                                   and usefulness. Here we take six key dimensions as the quality criteria, as depicted in
                                   Figure 12.7.

                                                             Figure 12.7: Data Quality Criteria





















                                   1.   Completeness: Is all the requisite information available? Are some data values
                                       missing, or in an unusable state?

                                   2.   Consistency: Do distinct occurrences of the same data instances agree with each other, or
                                       do they provide conflicting information? Are values consistent across data sets?
                                   3.   Validity: Are the data values correct and reasonable?
                                   4.   Conformity: Are there expectations that data values conform to specified formats? If so,
                                       do all the values conform to those formats? Maintaining conformance to specific formats is
                                       important.
                                   5.   Accuracy: Do data objects accurately represent the “real world” values they are expected
                                       to model? Incorrect spellings of product or person names, addresses, and even untimely or
                                       not current data can impact operational and analytical applications.
                                   6.   Integrity: Which data is missing important relationship linkages? The inability to link
                                       related records together may actually introduce duplication across your systems.
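
The dimensions listed above can be turned into simple automated checks. The following is a minimal Python sketch, not a definitive implementation: the sample tables, field names (`email`, `birth_year`, `customer_id`), the e-mail pattern and the 1900 cut-off year are all hypothetical assumptions chosen for illustration. Accuracy is not included, because checking it requires comparing the data against a trusted real-world reference that the sketch does not have.

```python
import re
from datetime import date

# Hypothetical sample data; all names, fields and rules are illustrative assumptions.
customers = [
    {"id": 1, "name": "Asha Rao", "email": "asha@example.com", "birth_year": 1985},
    {"id": 2, "name": "Ben Cole", "email": None, "birth_year": 1990},
    {"id": 3, "name": "Chen Li", "email": "chen.li(at)example", "birth_year": 1875},
]
orders = [
    {"order_id": 10, "customer_id": 1, "customer_name": "Asha Rao"},
    {"order_id": 11, "customer_id": 2, "customer_name": "Benjamin Cole"},
    {"order_id": 12, "customer_id": 9, "customer_name": "Dana Fox"},
]

# Assumed format rule for the conformity check (deliberately simple).
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, fields):
    """Return ids of rows where any required field is missing or empty."""
    return [r["id"] for r in rows if any(r.get(f) in (None, "") for f in fields)]


def conformity(rows):
    """Return ids of rows whose non-missing email does not match the format."""
    return [r["id"] for r in rows if r["email"] and not EMAIL_FORMAT.match(r["email"])]


def validity(rows, min_year=1900, max_year=date.today().year):
    """Return ids of rows whose birth year falls outside a reasonable range."""
    return [r["id"] for r in rows if not (min_year <= r["birth_year"] <= max_year)]


def integrity(order_rows, customer_rows):
    """Return order ids whose customer_id links to no customer record."""
    known = {c["id"] for c in customer_rows}
    return [o["order_id"] for o in order_rows if o["customer_id"] not in known]


def consistency(order_rows, customer_rows):
    """Return order ids whose stored name disagrees with the customer table."""
    names = {c["id"]: c["name"] for c in customer_rows}
    return [
        o["order_id"]
        for o in order_rows
        if o["customer_id"] in names and o["customer_name"] != names[o["customer_id"]]
    ]


report = {
    "completeness": completeness(customers, ["name", "email", "birth_year"]),
    "conformity": conformity(customers),
    "validity": validity(customers),
    "integrity": integrity(orders, customers),
    "consistency": consistency(orders, customers),
}
print(report)
```

Each function returns the identifiers of the records that fail one dimension, so a single record can be flagged under several dimensions at once. In this sample, customer 3 fails both the conformity and the validity check, while order 12 fails the integrity check because it points to a customer that does not exist.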

                                   Data Warehousing

                                   Data warehouses are one of the foundations of the Decision Support Systems of many IS
                                   operations. As defined by the “father of data warehouse”, William H. Inmon, a data warehouse
                                   is “a collection of Integrated, Subject-Oriented, Non Volatile and Time Variant databases where
                                   each unit of data is specific to some period of time. Data Warehouses can contain detailed data,
                                   lightly summarized data and highly summarized data, all formatted for analysis and decision




          236                              LoveLy professionaL university