The quality meta-model is not instantiated directly with concrete quality factors and goals; instead, it is instantiated with patterns for quality factors and goals. The use of this intermediate instantiation
level enables data warehouse stakeholders to define templates of quality goals and factors. For
example, suppose that the analysis phase of a data warehouse project has detected that the
availability of the source database is critical to ensure that the daily online transaction processing
is not affected by the loading process of the data warehouse. A source administrator might later
instantiate this template of a quality goal with the expected availability of his specific source
database. Thus, the programmers of the data warehouse loading programs know the time
window of the update process.
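As a minimal sketch, the following Python fragment illustrates how such a template and its later instantiation could be represented; all class and attribute names (QualityFactorTemplate, QualityFactor, orders_oltp_db, etc.) are illustrative assumptions, not part of the meta-model described in the text.

```python
from dataclasses import dataclass

@dataclass
class QualityFactorTemplate:
    """Pattern for a quality factor, defined during the analysis phase."""
    name: str           # e.g. "availability"
    object_type: str    # e.g. "source database"

@dataclass
class QualityFactor:
    """Concrete instantiation of a template by a stakeholder for one object."""
    template: QualityFactorTemplate
    concrete_object: str    # a specific source database
    expected_value: str     # the expected availability, here a time window
    defined_by: str         # the stakeholder who instantiated the template

# Analysis phase: the template of the quality goal is defined
availability = QualityFactorTemplate("availability", "source database")

# Later: a source administrator instantiates it for a specific source database
orders_availability = QualityFactor(
    template=availability,
    concrete_object="orders_oltp_db",
    expected_value="22:00-06:00",   # time window for the loading process
    defined_by="source administrator",
)

# The loading programmers can now read the update time window from the repository
print(f"Loading window for {orders_availability.concrete_object}: "
      f"{orders_availability.expected_value}")
```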
Based on the meta-model for data warehouse architectures, we have developed a set of quality factor templates which can be used as an initial set for data warehouse quality management. The methodology is an adaptation of the Total Quality Management approach and consists of the following steps (a sketch of this cycle in code follows the list):
1. Design of object types, quality factors and goals
2. Evaluation of the quality factors
3. Analysis of the quality goals and factors and their possible improvements
4. Re-evaluation of a quality goal due to the evolution of the data warehouse
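The Python sketch below illustrates steps 2-4 as a simple evaluation loop for a single quality goal; the QualityGoal class and the measure/improve callables are hypothetical placeholders for repository objects and stakeholder actions, and step 1 (the design of object types, factors and goals) corresponds to constructing the goal object itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class QualityGoal:
    """A quality goal with a target value and its recorded measurements."""
    name: str
    target: float
    measurements: List[float] = field(default_factory=list)

def quality_management_cycle(goal: QualityGoal,
                             measure: Callable[[QualityGoal], float],
                             improve: Callable[[QualityGoal], None]) -> List[float]:
    """One pass through the TQM-style cycle for a single quality goal."""
    # Step 2: evaluation of the quality factor
    value = measure(goal)
    goal.measurements.append(value)
    # Step 3: analysis and possible improvement
    if value < goal.target:
        improve(goal)
        # Step 4: re-evaluation after the data warehouse has evolved
        goal.measurements.append(measure(goal))
    return goal.measurements

# Usage with stubbed measurement and improvement actions
goal = QualityGoal("accuracy of customer data", target=0.95)
history = quality_management_cycle(goal,
                                   measure=lambda g: 0.90,
                                   improve=lambda g: None)
print(history)
```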
12.3.2 A Quality-oriented Data Warehouse Process Model
As described in the previous section, it is important that all relevant aspects of a data warehouse are represented in the repository. Yet the described architecture and quality model does not represent the workflow that is necessary to build and run a data warehouse, e.g. to integrate a data source or to refresh the data warehouse incrementally. Therefore, we have added a data warehouse process model to our meta-modeling framework. Our goal is a simple process model which captures the most important issues of data warehouses, rather than a huge construction that is difficult to understand and of little use due to its complexity.
Figure 12.4 shows the meta-model for data warehouse processes. A data warehouse process is composed of several processes or process steps which may be further decomposed. Process steps and the processes themselves are executed in a specific order, which is described by the "next" relation between processes. A process works on an object type, e.g. data loading works on a source data store and a data warehouse data store. The process itself must be executed by some object type, usually an agent, which is represented in the physical perspective of the architecture model. The result of a process is some value of a domain; the execution of further processes may depend on this value. For example, the data loading process returns a boolean value representing the completion status of the process, i.e. whether it was successful or not. Further process steps like data cleaning are only executed if the previous loading process was successful. The process is linked to a stakeholder who controls or has initiated the process. Moreover, the result of a process also includes the data produced as its outcome, e.g. the tuples of a relation.
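A minimal sketch of this process meta-model in Python follows; the class names and the example objects (loading, cleaning, agent) are simplified assumptions used only to illustrate the relations described above: decomposition into steps, the "next" relation, the executing agent, the result value, and the controlling stakeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectType:
    """An object type of the architecture model, e.g. a data store or an agent."""
    name: str
    perspective: str    # e.g. "physical"

@dataclass
class Process:
    """A data warehouse process or process step."""
    name: str
    works_on: List[ObjectType]              # e.g. source and DW data stores
    executed_by: ObjectType                 # usually an agent (physical perspective)
    controlled_by: str                      # the stakeholder who controls/initiated it
    steps: List["Process"] = field(default_factory=list)  # decomposition into steps
    next: Optional["Process"] = None        # "next" relation: execution order
    result: Optional[bool] = None           # result value from a domain

# Example: data cleaning only runs if the preceding loading step was successful
source = ObjectType("source data store", "physical")
dw_store = ObjectType("data warehouse data store", "physical")
loader = ObjectType("loading agent", "physical")

cleaning = Process("data cleaning", [dw_store], loader, "DW administrator")
loading = Process("data loading", [source, dw_store], loader, "DW administrator",
                  next=cleaning)

loading.result = True   # boolean result: the loading process completed successfully
if loading.result and loading.next is not None:
    print(f"Executing next process step: {loading.next.name}")
```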
Processes affect a quality factor of an object type, e.g. the availability of a data source or the accuracy of a data store. It might be useful to also store the expected effect on the quality factor, i.e. whether the process improves or degrades the quality factor. However, the achieved effect on the quality factor can only be determined by a new measurement of this factor. A query on the metadata repository can then search for the processes which have improved the quality of a certain object.
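The sketch below shows, under the same simplifying assumptions, how the expected effect of a process and subsequent measurements could be stored, and how a repository query could find the processes that actually improved a quality factor; processes_that_improved is a hypothetical query function, not an interface defined in the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessEffect:
    """Expected (not measured) effect of a process on a quality factor."""
    process: str
    factor: str
    expected_effect: str    # "improves" or "decreases"

@dataclass
class QualityMeasurement:
    """A new measurement of a quality factor, taken after a process has run."""
    factor: str
    value: float
    taken_after: str        # name of the process executed before this measurement

def processes_that_improved(factor: str,
                            measurements: List[QualityMeasurement]) -> List[str]:
    """Return the processes after which the measured value of `factor` increased."""
    relevant = [m for m in measurements if m.factor == factor]
    return [after.taken_after
            for before, after in zip(relevant, relevant[1:])
            if after.value > before.value]

expected = ProcessEffect("data cleaning", "accuracy", "improves")  # expected effect only
history = [
    QualityMeasurement("accuracy", 0.90, "initial load"),
    QualityMeasurement("accuracy", 0.97, "data cleaning"),
]
print(processes_that_improved("accuracy", history))   # ['data cleaning']
```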
Processes can also be subject to quality measurement. Yet the quality of a process is usually determined by the quality of its output. Therefore, we do not discuss process quality in detail, but quality factors can be attached to processes as well.