Data Warehousing and Data Mining
Figure 14.4: Data Quality Issues and DWQ Research Results
A closer examination of the quality factor hierarchy reveals several relationships between quality
parameters and design/operational aspects of DWs. The DWQ project will investigate these
relationships in a systematic manner:
1. The simple DW concept itself alleviates the problem of accessibility by saving its users the
effort of searching a large, poorly structured information space and by preventing data
analysis from interfering with operational data processing. However, delivering the
information efficiently remains an important open problem because DW workloads differ from
traditional query processing. A DW environment has an increased need for fast and
dynamic aggregate query processing, indexing of aggregate results, and fast updates
of the DW content after changes are made to the underlying information sources.
2. It remains difficult for DW customers to interpret the data because the semantics of data
description languages for data warehouse schemata are weak, do not take into account
domain-specific aspects, and are usually not formally defined and therefore hardly
computer-supported. The DWQ project will ensure interpretability by investigating the
syntax, semantics, and reasoning efficiency for rich schema languages which (a) give more
structure to schemas, and (b) allow the integration of concrete domains (e.g., numerical
reasoning, temporal and spatial domains) and aggregate data. This work builds in part on
results obtained in the CLN (Computational Logic) II ESPRIT Project.
3. The usefulness of data is hampered because it is hard to adapt the contents of the DW to
changing customer needs, and to offer a range of different policies for ensuring adequate
timeliness of data at acceptable costs. The DWQ project will develop policies to extend
active database concepts such that data caching is optimized for a given transaction load
on the DW, and that distributed execution of triggers with user-defined cache refreshment
policies becomes possible. This work builds on earlier active database research, e.g., in
ACTNET and the STRETCH and IDEA ESPRIT projects.
4. The believability of data is hampered because the DW customer often does not know the
credibility of the source and the accuracy of the data. Moreover, schema languages are too
weak to support completeness and consistency testing. To ensure the quality of individual
DW contents, the DWQ project will link rich schema languages to techniques for efficient
integrity checking for relational, deductive, and object-oriented databases. Moreover,