
Unit 5: Data Warehouse Research – Issues and Research




critical, real-time decision support systems. Below are some of the most common technological
methods developed to address the problems related to data sharing through data propagation.
Bulk Extract: In this method of data propagation, copy management tools or unload utilities are used to extract all or a subset of the operational relational database. Typically, the extracted data is then transported to the target database using file transfer protocol (FTP) or other similar methods. The extracted data may be transformed to the format used by the target, either on the host or on the target server.
The database management system load utilities are then used to refresh the target database. Because this approach does not distinguish changed from unchanged records, it is most efficient for small source files or for files with a high percentage of changes, and least efficient for large files in which only a few records have changed.
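As a rough illustration, the following sketch shows what a bulk extract with a full refresh of the target might look like. It assumes SQLite source and target databases, a three-column customers table, and a local staging file standing in for the FTP transfer; all names are illustrative and not part of the original text.

```python
import csv
import sqlite3

# Bulk-extract sketch: unload the whole source table to a flat file,
# "transport" it (a local staging file stands in for FTP), then refresh
# the target by replacing its contents. Table/column names are assumed.

def bulk_extract(source_db: str, staging_file: str) -> None:
    with sqlite3.connect(source_db) as src, open(staging_file, "w", newline="") as f:
        writer = csv.writer(f)
        for row in src.execute("SELECT cust_id, name, balance FROM customers"):
            writer.writerow(row)

def bulk_load(target_db: str, staging_file: str) -> None:
    with sqlite3.connect(target_db) as tgt, open(staging_file, newline="") as f:
        rows = list(csv.reader(f))
        tgt.execute("DELETE FROM dw_customers")   # full refresh, no delta logic
        tgt.executemany("INSERT INTO dw_customers VALUES (?, ?, ?)", rows)
        tgt.commit()

bulk_extract("operational.db", "customers.csv")
bulk_load("warehouse.db", "customers.csv")
```

Note that the load step deletes and rewrites the entire target table, which is exactly why the technique suits small or heavily changed files rather than large, mostly static ones.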

File Compare: This method is a variation of the bulk extract approach. The newly extracted operational data is compared to the previous version, and a set of incremental change records is created. The processing of the incremental changes is similar to the techniques used in bulk extract, except that the incremental changes are applied as updates to the target within the scheduled process. This approach is recommended for smaller files in which only a few records change.
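The comparison step can be sketched as a diff between two snapshots keyed on a primary key. The CSV layout, key column and file names below are assumptions made for illustration only.

```python
import csv

# File-compare sketch: diff a new extract against the previous snapshot and
# emit incremental change records instead of reloading everything.
# Assumed layout per row: (cust_id, name, balance), keyed on cust_id.

def load_snapshot(path: str) -> dict:
    with open(path, newline="") as f:
        return {row[0]: row for row in csv.reader(f)}

def compare(previous: str, current: str) -> list:
    old, new = load_snapshot(previous), load_snapshot(current)
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("INSERT", row))
        elif row != old[key]:
            changes.append(("UPDATE", row))
    for key, row in old.items():
        if key not in new:
            changes.append(("DELETE", row))
    return changes

# The resulting change set is then applied to the target as updates
# within the scheduled process, as described above.
for action, record in compare("customers_prev.csv", "customers_new.csv"):
    print(action, record)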
Change Data Propagation: This method captures and records changes to the file as part of the application change process. Many techniques can be used to implement Change Data Propagation, such as triggers, log exits, log post-processing or DBMS extensions. A file of incremental changes is created to contain the captured changes. As soon as the source transaction completes, the change records can be transformed and moved to the target database. This type of data propagation is sometimes called near-real-time or continuous propagation, and it is used to keep the target database synchronized with the source system within a very brief period.
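A minimal sketch of the log post-processing variant is shown below. It assumes the capture mechanism (a trigger, log exit or the application itself) appends one JSON change record per committed transaction to a log file; the log format, table and column names are illustrative assumptions.

```python
import json
import sqlite3

# Change-data-propagation sketch via log post-processing: replay captured
# change records against the target shortly after they are written.
# Assumed record shape: {"op": "...", "cust_id": ..., "name": ..., "balance": ...}

def apply_change(cursor, change: dict) -> None:
    if change["op"] == "INSERT":
        cursor.execute("INSERT INTO dw_customers VALUES (?, ?, ?)",
                       (change["cust_id"], change["name"], change["balance"]))
    elif change["op"] == "UPDATE":
        cursor.execute("UPDATE dw_customers SET name = ?, balance = ? WHERE cust_id = ?",
                       (change["name"], change["balance"], change["cust_id"]))
    elif change["op"] == "DELETE":
        cursor.execute("DELETE FROM dw_customers WHERE cust_id = ?", (change["cust_id"],))

def propagate(log_path: str, target_db: str) -> None:
    with sqlite3.connect(target_db) as tgt, open(log_path) as log:
        cur = tgt.cursor()
        for line in log:              # each line is one captured change record
            apply_change(cur, json.loads(line))
        tgt.commit()

propagate("changes.log", "warehouse.db")
```

Running such a propagator continuously, or on a very short cycle, is what keeps the target database within a brief window of the source system.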

          5.6 Modelling and Measuring Data Warehouse Quality

Data quality has been defined as the fraction of performance over expectancy, or as the loss imparted to society from the time a product is shipped. We believe, though, that the best definition is the one that describes data quality as "fitness for use". The nature of this definition directly implies that the concept of data quality is relative. For example, data semantics (the interpretation of information) is different for each distinct user. As has been noted, "the problem of data quality is fundamentally intertwined in how [...] users actually use the data in the system", since the users are ultimately the judges of the quality of the data produced for them: if nobody actually uses the data, then nobody will ever take care to improve its quality.
As a decision support information system, a data warehouse must provide a high level of data quality and quality of service. Coherency, freshness, accuracy, accessibility, availability and performance are among the quality features required by the end users of the data warehouse. Still, many stakeholders are involved in the lifecycle of the data warehouse, and each of them has their own quality requirements. As already mentioned, the Decision Maker usually employs an OLAP query tool to get answers to the questions that interest them. A decision-maker is usually concerned
          with  the  quality  of  the  stored  data,  their  timeliness  and  the  ease  of  querying  them  through
          the  OLAP  tools.  The  Data  Warehouse  Administrator  needs  facilities  such  as  error  reporting,
          metadata accessibility and knowledge of the timeliness of the data, in order to detect changes
          and reasons for them, or problems in the stored information. The Data Warehouse Designer
          needs to measure the quality of the schemata of the data warehouse environment (both existing
          and newly produced) and the quality of the metadata as well. Furthermore, he needs software
          evaluation standards to test the software packages he considers purchasing. The Programmers of
          Data Warehouse Components can make good use of software implementation standards in order
          to accomplish and evaluate their work. Metadata reporting can also facilitate their job, since they




