Page 103 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Unit 5: Data Warehouse Research – Issues and Research
Finally, quality aspects influence several factors of data warehouse design. For instance, the
required storage space can be influenced by the number and volume of the quality indicators
needed (time, believability indicators, etc.). Furthermore, several problems have to be dealt
with: the improvement of query optimization through the use of quality indicators (e.g., better
caching), the modeling of incomplete information about the data sources in the data warehouse,
the reduction of the negative effects that schema evolution has on data quality, and the
extension of data warehouse models and languages so as to make good use of quality information.
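To make the storage remark concrete, the following is a minimal sketch of a stored value that carries the two quality indicators named above (time and believability), together with a rough estimate of the extra space such indicators require. The names TaggedValue and indicator_overhead, the byte sizes, and the 0-to-1 believability scale are assumptions made for illustration only; they do not come from any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TaggedValue:
    """A warehouse value stored together with its quality indicators."""
    value: float
    recorded_at: datetime   # time indicator
    believability: float    # assumed scale: 0.0 (not credible) to 1.0 (fully credible)


def indicator_overhead(bytes_per_value: int,
                       bytes_per_indicator: int,
                       n_indicators: int) -> float:
    """Return the extra storage for indicators as a fraction of the data size."""
    if bytes_per_value <= 0:
        raise ValueError("bytes_per_value must be positive")
    return (n_indicators * bytes_per_indicator) / bytes_per_value


def believable(cells: list[TaggedValue], threshold: float) -> list[TaggedValue]:
    """Keep only the cells whose believability reaches the threshold."""
    return [c for c in cells if c.believability >= threshold]


cells = [
    TaggedValue(10.0, datetime(2020, 1, 1, tzinfo=timezone.utc), 0.9),
    TaggedValue(12.5, datetime(2020, 1, 2, tzinfo=timezone.utc), 0.4),
]
trusted = believable(cells, threshold=0.5)
overhead = indicator_overhead(bytes_per_value=8, bytes_per_indicator=8, n_indicators=2)
```

Under these assumed sizes, two 8-byte indicators per 8-byte value add twice the space of the data itself, which illustrates why the choice of indicators is a design decision rather than an afterthought.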
Models and tools for the management of data warehouse quality can build on substantial previous
work in the field of data quality.
5.6.1 Quality Definition
Quality can be defined and quantified as the ratio of performance to expectation.
Taguchi defined quality as the loss imparted to society from the time a product is shipped. The
total loss to society can be viewed as the sum of the producer’s loss and the customer’s loss. It is
well known that there is a tradeoff between the quality of a product or service and its production
cost, and that an organization must find an equilibrium between these two parameters. If the
equilibrium is lost, the organization loses either way: by paying too much money to
achieve a certain standard of quality, called the “target”, or by shipping low-quality products or
services, which results in a bad reputation and loss of market share.
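The two definitions above can be illustrated with a short sketch. The function names are hypothetical. The quadratic form used in taguchi_loss is the loss function commonly associated with Taguchi; it is not stated in the text and is included here only as an assumption, with k as an assumed cost coefficient.

```python
def quality_ratio(performance: float, expectation: float) -> float:
    """Quality as the ratio of performance to expectation."""
    if expectation <= 0:
        raise ValueError("expectation must be positive")
    return performance / expectation


def total_loss(producer_loss: float, customer_loss: float) -> float:
    """Total loss to society as the sum of producer and customer losses."""
    return producer_loss + customer_loss


def taguchi_loss(measured: float, target: float, k: float) -> float:
    """Assumed quadratic loss: grows with the squared deviation from the target."""
    return k * (measured - target) ** 2


ratio = quality_ratio(performance=8.0, expectation=10.0)
society = total_loss(producer_loss=3.0, customer_loss=2.0)
deviation_cost = taguchi_loss(measured=10.5, target=10.0, k=4.0)
```

A ratio below 1 means the product falls short of expectation. Under the assumed quadratic form, the loss is zero only when the measured value hits the target exactly.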
5.6.2 Data Quality Research
Considerable research has been done in the field of data quality. Both researchers and
practitioners have faced the problem of enhancing the quality of decision support systems,
mainly by improving the quality of their data. In this section we present the related
work in this field, which more or less influenced our approach to data warehouse quality. A
detailed presentation can be found in Wang et al., who present a framework of data analysis based
on the ISO 9000 standard. The framework consists of seven elements adapted from the ISO 9000
standard: management responsibilities, operation and assurance cost, research and development,
production, distribution, personnel management and legal function. This framework reviews a
significant part of the literature on data quality, yet only the research and development aspects
of data quality seem to be relevant to the cause of data warehouse quality design. The three
main issues involved in this field are: analysis and design of the data quality aspects of data
products, design of data manufacturing systems (DMS’s) that incorporate data quality aspects
and definition of data quality dimensions. We should note, however, that data are treated as
products within the proposed framework.
The general terminology of the framework regarding quality is as follows. Data quality policy is
the overall intention and direction of an organization with respect to issues concerning the
quality of data products. Data quality management is the management function that determines
and implements the data quality policy. A data quality system encompasses the organizational
structure, responsibilities, procedures, processes and resources for implementing data quality
management. Data quality control is a set of operational techniques and activities that are used
to attain the quality required for a data product. Data quality assurance includes all the planned
and systematic actions necessary to provide adequate confidence that a data product will satisfy
a given set of quality requirements.
5.6.3 Data Quality
The quality of the data stored in the warehouse is obviously not a process by itself; yet
it is influenced by all the processes that take place in the warehouse environment. As already
mentioned, there has been considerable past research in the field of data quality. We define
data quality as a small subset of the factors proposed in other models. For example, our notion of