Page 268 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   Figure 14.4: Data Quality Issues and DWQ Research Results





























                                   A closer examination of the quality factor hierarchy reveals several relationships between quality
                                   parameters and the design and operational aspects of DWs. The DWQ project will investigate these
                                   relationships in a systematic manner:
                                   1.   The simple DW concept itself alleviates the problem of accessibility: it saves users the
                                       effort of searching a large, poorly structured information space and keeps data analysis
                                       from interfering with operational data processing. However, delivering the information
                                       efficiently remains an important open problem, because DW workloads differ from
                                       traditional query processing. A DW environment has an increased need for fast and
                                       dynamic aggregate query processing, indexing of aggregate results, and fast update
                                       of the DW content after changes are made to the underlying information sources.
                                   2.   It remains difficult for DW customers to interpret the data because the semantics of data
                                       description languages for data warehouse schemata are weak, do not take into account
                                       domain-specific aspects, and are usually not formally defined and therefore hardly
                                       computer-supported. The DWQ project will ensure interpretability by investigating the
                                       syntax, semantics, and reasoning efficiency of rich schema languages that (a) give more
                                       structure to schemas and (b) allow the integration of concrete domains (e.g., numerical
                                       reasoning, temporal and spatial domains) and aggregate data. This work builds in part on
                                       results obtained in the CLN (Computational Logic) II ESPRIT Project.

                                   3.   The usefulness of data is hampered because it is hard to adapt the contents of the DW to
                                       changing customer needs and to offer a range of policies for ensuring adequate
                                       timeliness of data at acceptable cost. The DWQ project will develop policies that extend
                                       active database concepts so that data caching is optimized for a given transaction load
                                       on the DW and distributed execution of triggers with user-defined cache refreshment
                                       policies becomes possible. This work builds on earlier active database research, e.g., in
                                       ACTNET and the STRETCH and IDEA ESPRIT projects.
                                   4.   The believability of data is hampered because the DW customer often does not know the
                                       credibility of the source or the accuracy of the data. Moreover, schema languages are too
                                       weak to support completeness and consistency testing. To ensure the quality of individual
                                       DW contents, the DWQ project will link rich schema languages to techniques for efficient
                                       integrity checking for relational, deductive, and object-oriented databases. Moreover,




                                   Lovely Professional University