Page 103 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Unit 5: Data Warehouse Research – Issues and Research




          Finally, quality aspects influence several factors of data warehouse design. For instance, the
          required storage space can be influenced by the number and volume of the quality indicators
          needed (time, believability indicators etc.). Furthermore, several problems have to be dealt
          with: the improvement of query optimization through the use of quality indicators (e.g. to
          improve caching), the modeling of incomplete information about the data sources in the data
          warehouse, the reduction of the negative effects that schema evolution has on data quality,
          and the extension of data warehouse models and languages so as to make good use of quality
          information.
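The remark about storage space can be made concrete with a rough estimate. The sketch below is only an illustration: the `QualityIndicators` record, its two fields (a time stamp and a believability score), and the byte sizes are assumptions chosen for the example, not part of any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QualityIndicators:
    """Hypothetical quality indicators stored next to each warehouse row."""
    recorded_at: datetime   # time indicator
    believability: float    # believability indicator, assumed scale 0.0 to 1.0


def indicator_overhead_bytes(rows: int, indicators_per_row: int,
                             bytes_per_indicator: int) -> int:
    """Extra storage needed when every row carries quality indicators."""
    return rows * indicators_per_row * bytes_per_indicator


def overhead_fraction(row_bytes: int, indicators_per_row: int,
                      bytes_per_indicator: int) -> float:
    """Indicator bytes per row, relative to the size of the plain row."""
    return (indicators_per_row * bytes_per_indicator) / row_bytes


if __name__ == "__main__":
    # Assumed figures: one million rows, two 8-byte indicators per row.
    print(indicator_overhead_bytes(1_000_000, 2, 8))
    print(overhead_fraction(row_bytes=64, indicators_per_row=2,
                            bytes_per_indicator=8))
```

Under these assumed sizes, the overhead grows linearly with both the number of rows and the number of indicators kept per row, which is why the choice of indicators is a design decision rather than an afterthought.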
          Models and tools for the management of data warehouse quality can build on substantial previous
          work in the field of data quality.
          5.6.1 Quality Definition


          Quality has been defined and quantified as the fraction of Performance over Expectance.
          Taguchi defined quality as the loss imparted to society from the time a product is shipped. The
          total loss to society can be viewed as the sum of the producer's loss and the customer's loss. It is
          well known that there is a tradeoff between the quality of a product or service and its production
          cost, and that an organization must find an equilibrium between these two parameters. If the
          equilibrium is lost, the organization loses either way: by paying too much money to achieve a
          certain standard of quality (called the "target"), or by shipping low-quality products or
          services, which results in a bad reputation and a loss of market share.
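The two definitions above can be sketched in a few lines of code. This is a minimal illustration, not a method given in the text: the `QualityOption` type, the function names, and all figures are hypothetical, and reading the "equilibrium" as the option with the smallest total loss is an assumption made for the example.

```python
from typing import NamedTuple, Sequence


class QualityOption(NamedTuple):
    """Hypothetical quality target with the losses it implies."""
    target: str            # label for the quality target
    producer_loss: float   # what the producer pays to reach the target
    customer_loss: float   # what the customer loses to remaining defects


def quality_ratio(performance: float, expectance: float) -> float:
    """Quality as the fraction of performance over expectance."""
    if expectance <= 0:
        raise ValueError("expectance must be positive")
    return performance / expectance


def total_loss(option: QualityOption) -> float:
    """Total loss to society: producer's loss plus customer's loss."""
    return option.producer_loss + option.customer_loss


def cheapest_for_society(options: Sequence[QualityOption]) -> QualityOption:
    """Assumed reading of the equilibrium: the option with the lowest total loss."""
    return min(options, key=total_loss)


# Illustrative figures only; they are not taken from the text.
OPTIONS = [
    QualityOption("low", producer_loss=100.0, customer_loss=900.0),
    QualityOption("medium", producer_loss=300.0, customer_loss=250.0),
    QualityOption("high", producer_loss=800.0, customer_loss=50.0),
]

if __name__ == "__main__":
    print(quality_ratio(8.0, 10.0))
    best = cheapest_for_society(OPTIONS)
    print(best.target, total_loss(best))
```

With these made-up numbers, both extremes lose: the low target shifts the loss to the customer, and the high target costs the producer more than it saves.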

          5.6.2 Data Quality Research

          Quite a lot of research has been done in the field of data quality. Both researchers and
          practitioners have faced the problem of enhancing the quality of decision support systems,
          mainly by improving the quality of their data. In this section we present the related work in
          this field, which more or less influenced our approach to data warehouse quality. A detailed
          presentation can be found in Wang et al., who present a framework of data analysis based
          on the ISO 9000 standard. The framework consists of seven elements adapted from the ISO 9000
          standard: management responsibilities, operation and assurance cost, research and development,
          production, distribution, personnel management and legal function. This framework reviews a
          significant part of the literature on data quality, yet only the research and development aspects
          of data quality seem to be relevant to the cause of data warehouse quality design. The three
          main issues involved in this field are: analysis and design of the data quality aspects of data
          products, design of data manufacturing systems (DMSs) that incorporate data quality aspects,
          and definition of data quality dimensions. We should note, however, that data are treated as
          products within the proposed framework.

          The general terminology of the framework regarding quality is as follows. Data quality policy
          is the overall intention and direction of an organization with respect to issues concerning the
          quality of data products. Data quality management is the management function that determines
          and implements the data quality policy. A data quality system encompasses the organizational
          structure, responsibilities, procedures, processes and resources for implementing data quality
          management. Data quality control is a set of operational techniques and activities that are
          used to attain the quality required for a data product. Data quality assurance includes all the
          planned and systematic actions necessary to provide adequate confidence that a data product
          will satisfy a given set of quality requirements.

          5.6.3 Data Quality

          The quality of the data that are stored in the warehouse is obviously not a process by itself; yet
          it is influenced by all the processes that take place in the warehouse environment. As already
          mentioned, there has been quite a lot of research in the field of data quality in the past. We define
          data quality as a small subset of the factors proposed in other models. For example, our notion of




