Page 239 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 12: Metadata and Warehouse Quality




          The quality meta-model is not instantiated directly with concrete quality factors and goals;
          instead, it is instantiated with patterns for quality factors and goals. This intermediate
          instantiation level enables data warehouse stakeholders to define templates of quality goals
          and factors. For example, suppose that the analysis phase of a data warehouse project has
          found that the availability of the source database is critical, because the loading process
          of the data warehouse must not disturb the daily online transaction processing. A source
          administrator can later instantiate this quality goal template with the expected availability
          of their specific source database. The programmers of the data warehouse loading programs
          then know the time window available for the update process.
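          The template mechanism can be illustrated with a minimal Python sketch. It assumes that a
          template fixes the object type and the quality factor, while a stakeholder supplies the
          concrete object and the expected value when instantiating it. The class names
          QualityGoalTemplate and QualityGoal, the source name orders_db and the load window are
          hypothetical and are not part of the meta-model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGoalTemplate:
    """Hypothetical pattern for a quality goal, defined during analysis."""
    object_type: str      # kind of object the goal applies to
    quality_factor: str   # e.g. "availability"
    description: str

    def instantiate(self, obj: str, expected_value: str) -> "QualityGoal":
        # A stakeholder binds the template to one concrete object and value.
        return QualityGoal(template=self, obj=obj, expected_value=expected_value)


@dataclass(frozen=True)
class QualityGoal:
    """Hypothetical concrete quality goal obtained from a template."""
    template: QualityGoalTemplate
    obj: str
    expected_value: str


# Defined once, during the analysis phase of the project.
availability_template = QualityGoalTemplate(
    object_type="SourceDatabase",
    quality_factor="availability",
    description="Loading must not disturb daily OLTP on the source",
)

# Later, the source administrator fills in the value for a specific source.
orders_goal = availability_template.instantiate(
    obj="orders_db",  # hypothetical source database
    expected_value="available for loading 01:00-04:00 daily",
)
print(orders_goal.obj, orders_goal.template.quality_factor)  # orders_db availability
```

          The loading programmers read expected_value from the instantiated goal, so the agreed
          time window lives in the repository rather than in the loading code.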
          Based on the meta-model for data warehouse architectures, we have developed a set of quality
          factor templates that can serve as an initial set for data warehouse quality management. The
          accompanying methodology is an adaptation of the Total Quality Management approach and
          consists of the following steps:

          1.   Design of object types, quality factors and goals;
          2.   Evaluation of the quality factors;
          3.   Analysis of the quality goals and factors and of their possible improvements;
          4.   Re-evaluation of a quality goal as the data warehouse evolves.
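          The evaluation, analysis and re-evaluation steps can be sketched as a simple
          measure-and-compare loop. This is a minimal illustration, not the methodology itself: it
          assumes that every quality factor is numeric, that higher values are better, and that a
          goal is met when the latest measurement reaches a target fixed in the design step.
          QualityFactor, analyze and the factor names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityFactor:
    """Hypothetical quality factor with a numeric target."""
    name: str
    target: float  # step 1: goal fixed at design time
    measurements: List[float] = field(default_factory=list)

    def evaluate(self, value: float) -> None:
        """Step 2: record a new measurement of the factor."""
        self.measurements.append(value)

    def meets_goal(self) -> bool:
        # Assumes higher is better; an unmeasured factor counts as not met.
        return bool(self.measurements) and self.measurements[-1] >= self.target


def analyze(factors: List[QualityFactor]) -> List[str]:
    """Step 3: names of factors whose latest measurement misses the goal."""
    return [f.name for f in factors if not f.meets_goal()]


# Step 1: design the quality factors and their goals.
factors = [QualityFactor("source_availability", target=0.99),
           QualityFactor("store_accuracy", target=0.95)]

# Step 2: evaluate them.
factors[0].evaluate(0.995)
factors[1].evaluate(0.90)
print(analyze(factors))  # ['store_accuracy']

# Step 4: after the warehouse evolves (say, a cleaning step is added),
# re-evaluate the affected factor and analyze again.
factors[1].evaluate(0.97)
print(analyze(factors))  # []
```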

          12.3.2 A Quality-oriented Data Warehouse Process Model

          As described in the previous section, it is important that all relevant aspects of a data
          warehouse are represented in the repository. However, the architecture and quality models
          described so far do not represent the workflow needed to build and run a data warehouse,
          e.g., to integrate a data source or to refresh the data warehouse incrementally. Therefore,
          we have added a data warehouse process model to our metamodeling framework. Our goal is a
          simple process model that captures the most important issues of data warehouses, rather
          than a large construct that is hard to understand and of little practical use because of
          its complexity.
          Figure 12.4 shows the meta-model for data warehouse processes. A data warehouse process is
          composed of several processes or process steps, which may in turn be decomposed further.
          Process steps and processes are executed in a specific order, described by the “next”
          relation between processes. A process works on one or more object types; e.g., data loading
          works on a source data store and a data warehouse data store. The process must be executed
          by some object type, usually an agent represented in the physical perspective of the
          architecture model. The result of a process is a value of some domain, and the execution of
          further processes may depend on this value. For example, the data loading process returns a
          Boolean value indicating whether it completed successfully, and subsequent steps such as
          data cleaning are executed only if loading succeeded. Each process is linked to the
          stakeholder who controls or initiated it. Besides its result value, a process also produces
          data as its outcome, e.g., the tuples of a relation.
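          A minimal Python sketch of the process meta-model follows. It assumes that each process
          returns a Boolean result, as in the data loading example, and that the next process along
          the “next” relation runs only if that result is True. The names Process and run_chain, and
          the agents and stakeholders used, are hypothetical; decomposition into sub-steps and the
          data a process produces are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Process:
    """Hypothetical, simplified node of the process meta-model."""
    name: str
    works_on: List[str]               # object types the process reads or writes
    executed_by: str                  # agent from the physical perspective
    stakeholder: str                  # who controls or initiated the process
    action: Callable[[], bool]        # returns the result value: True = success
    next: Optional["Process"] = None  # the "next" relation


def run_chain(first: Process) -> List[Tuple[str, bool]]:
    """Run processes along the "next" relation; stop after the first failure."""
    log: List[Tuple[str, bool]] = []
    current: Optional[Process] = first
    while current is not None:
        result = current.action()
        log.append((current.name, result))
        if not result:
            break  # e.g. cleaning is skipped when loading failed
        current = current.next
    return log


cleaning = Process("data_cleaning", ["dw_store"], "cleaner_agent", "dw_admin",
                   action=lambda: True)
loading = Process("data_loading", ["source_store", "dw_store"], "loader_agent",
                  "dw_admin", action=lambda: False, next=cleaning)

print(run_chain(loading))  # [('data_loading', False)]
loading.action = lambda: True
print(run_chain(loading))  # [('data_loading', True), ('data_cleaning', True)]
```

          Representing the result as a plain return value keeps the dependency between steps
          explicit: whether cleaning runs is decided solely by the result of loading.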
          Processes affect quality factors of object types, e.g., the availability of a data source
          or the accuracy of a data store. It may also be useful to store the expected effect on the
          quality factor, i.e., whether the process is expected to improve or degrade it. However, the
          effect actually achieved can only be determined by a new measurement of the factor. A query
          on the metadata repository can then find the processes that have improved the quality of a
          given object.
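          Storing expected effects and comparing them with new measurements can be sketched as
          follows. The sketch assumes that the repository keeps, for each process run, a measurement
          of the affected quality factor taken before and after the process, and that a higher value
          means better quality. ProcessEffect, the two query functions and all values are
          hypothetical illustrations, not measured results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProcessEffect:
    """Hypothetical repository entry linking one process run to a quality factor."""
    process: str
    obj: str
    factor: str
    expected: str  # stored expectation: "improve" or "degrade"
    before: float  # measurement taken before the process ran
    after: float   # new measurement taken after the process ran


def processes_that_improved(repo: List[ProcessEffect], obj: str, factor: str) -> List[str]:
    """Query: which processes actually raised this factor of this object?"""
    return [e.process for e in repo
            if e.obj == obj and e.factor == factor and e.after > e.before]


def contradicted_expectations(repo: List[ProcessEffect]) -> List[Tuple[str, str, str]]:
    """Runs whose measured effect disagrees with the stored expected effect."""
    return [(e.process, e.obj, e.factor) for e in repo
            if (e.expected == "improve") != (e.after > e.before)]


# Illustrative values only, not real measurements.
repository = [
    ProcessEffect("data_cleaning", "sales_store", "accuracy", "improve", 0.91, 0.97),
    ProcessEffect("data_loading", "sales_store", "accuracy", "improve", 0.97, 0.95),
    ProcessEffect("data_loading", "orders_db", "availability", "degrade", 0.99, 0.96),
]

print(processes_that_improved(repository, "sales_store", "accuracy"))  # ['data_cleaning']
print(contradicted_expectations(repository))  # [('data_loading', 'sales_store', 'accuracy')]
```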
          Processes can also be subject to quality measurement. However, the quality of a process is
          usually determined by the quality of its output. We therefore do not discuss process quality
          in detail, although quality factors can be attached to processes as well.






                                           Lovely Professional University                                   233