
Data Warehousing and Data Mining




data quality, in its greater part, is treated as a second-level factor, namely believability. In our
model, however, the rest of the factors proposed elsewhere are treated as process quality factors.
Figure 5.1: Data Quality Factors

The completeness factor describes the percentage of the interesting real-world information that is
entered in the sources and/or the warehouse. For example, completeness could rate the extent to
which a string describing an address actually fits in the size of the attribute that represents the
address. The credibility factor describes the credibility of the source that provided the
information. The accuracy factor describes the accuracy of the data entry process that took place
at the sources. The consistency factor describes the logical coherence of the information with
respect to logical rules and constraints. The data interpretability factor is concerned with data
description (i.e., data layout for legacy systems and external data, table description for
relational databases, primary and foreign keys, aliases, defaults, domains, explanation of coded
values, etc.).
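The address example for the completeness factor can be illustrated with a minimal sketch. The function name `address_fit_ratio`, the sample addresses, and the attribute size of 20 characters are all hypothetical and chosen only for illustration; the text does not prescribe a particular procedure.

```python
def address_fit_ratio(addresses, attribute_size):
    """Return the fraction of address strings that fit within attribute_size characters.

    This is one possible reading of the completeness example: an address that
    is longer than the attribute would be truncated when stored, so part of
    the real-world information would be lost.
    """
    if not addresses:
        return 1.0  # assumption: with no addresses, nothing is lost
    fitting = sum(1 for address in addresses if len(address) <= attribute_size)
    return fitting / len(addresses)


# Hypothetical sample data; attribute_size stands in for the column width.
addresses = [
    "12 Main St",
    "Flat 4B, 221 Baker Street, London",
    "7 Elm Rd",
]
ratio = address_fit_ratio(addresses, attribute_size=20)
print(f"{ratio:.0%} of the addresses fit in the attribute")
```

A ratio below 1.0 suggests that the attribute is too small for some of the real-world values it is meant to hold.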
Factor                  Methods of Measurement                    Formulae
----------------------  ----------------------------------------  ----------------------------------------
Completeness            Performance of statistical checks         The percentage of stored information
                                                                  detected to be incomplete with respect
                                                                  to the real-world values

Credibility             Documentation of the source which         Percentage of inaccurate information
                        provided the information                  provided by each specific source

Accuracy                Documentation of the person or machine    The percentage of stored information
                        which entered the information and         detected to be inaccurate with respect
                        performance of statistical checks         to the real-world values, due to data
                                                                  entry reasons

Consistency             Performance of statistical checks         The percentage of stored information
                                                                  detected to be inconsistent

Data Interpretability   Data description (i.e., data layout for   Number of pieces of information not
                        legacy systems and external data, table   fully described
                        description for relational databases,
                        primary and foreign keys, aliases,
                        defaults, domains, explanation of coded
                        values, etc.)
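The formulae in the table could be computed roughly as follows. This is a sketch under stated assumptions, not a prescribed implementation: the `Record` class, its boolean flags, and the function names are hypothetical, and the flags are assumed to have been set beforehand by the statistical checks and documentation reviews listed as methods of measurement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """A stored piece of information with hypothetical check results attached."""
    source: str        # source that provided the information
    complete: bool     # no real-world value is missing
    accurate: bool     # no data entry error was detected
    consistent: bool   # satisfies the logical rules and constraints
    described: bool    # fully covered by the data description


def percentage(flags):
    """Percentage of True values in flags; 0.0 for an empty input (assumption)."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def quality_report(records):
    """Compute the completeness, accuracy, consistency and interpretability formulae."""
    records = list(records)
    return {
        "incomplete_pct": percentage(not r.complete for r in records),
        "inaccurate_pct": percentage(not r.accurate for r in records),
        "inconsistent_pct": percentage(not r.consistent for r in records),
        "undescribed_count": sum(1 for r in records if not r.described),
    }


def inaccurate_pct_by_source(records):
    """Credibility formula: percentage of inaccurate information per source."""
    by_source = defaultdict(list)
    for r in records:
        by_source[r.source].append(not r.accurate)
    return {source: percentage(flags) for source, flags in by_source.items()}


# Hypothetical sample: four records from two sources.
records = [
    Record("crm", complete=True, accurate=True, consistent=True, described=True),
    Record("crm", complete=False, accurate=False, consistent=True, described=True),
    Record("erp", complete=True, accurate=False, consistent=True, described=False),
    Record("erp", complete=True, accurate=True, consistent=True, described=True),
]
report = quality_report(records)
credibility = inaccurate_pct_by_source(records)
print(report)
print(credibility)
```

Keeping the check results as per-record flags separates detection (how a record is judged incomplete or inaccurate) from aggregation (the percentages and counts in the table), so either part can be changed independently.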

5.7 Major Research Projects in Data Warehousing

Major research projects in data warehousing include:
1.   Michael Hahne, Lothar Burow, and Torben Elvers, XML-Datenimport in das SAP Business
     Information Warehouse bei Bayer MaterialScience, Auf dem Weg zur Integration Factory,
     231-251, 2005.
2.   H. Fan and A. Poulovassilis, Using AutoMed Metadata in Data Warehousing Environments,
     DOLAP, 2003.
3.   Michael Hahne, Transformation mehrdimensionaler Datenmodelle, Vom Data Warehouse
     zum Corporate Knowledge Center, 399-420, 2002.


