critical, real-time decision support systems. Below are some of the most common technological methods developed to address the problems of sharing data through data propagation.
Bulk Extract: In this method of data propagation, copy management tools or unload utilities are used to extract all or a subset of the operational relational database. The extracted data is typically transported to the target database using the file transfer protocol (FTP) or a similar mechanism, and it may be transformed into the format used by the target, either on the host or on the target server.
The database management system's load products are then used to refresh the target database. Because this approach does not distinguish changed from unchanged records, it is most efficient for small source files or for files with a high percentage of changed records. Conversely, it is least efficient for large files in which only a few records have changed.
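The bulk-extract flow can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's utility: the orders table, the host ftp.example.com and the credentials are all assumed placeholders.

import csv
import sqlite3
from ftplib import FTP

def bulk_extract(source_db, extract_file):
    # Unload the entire table: no changed-versus-unchanged distinction,
    # exactly as the bulk extract method prescribes.
    conn = sqlite3.connect(source_db)
    rows = conn.execute("SELECT id, customer, amount FROM orders")
    with open(extract_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "customer", "amount"])
        writer.writerows(rows)
    conn.close()

def transfer(extract_file):
    # Ship the extract to the target host over FTP.
    with FTP("ftp.example.com") as ftp:      # placeholder host
        ftp.login("loader", "secret")        # placeholder credentials
        with open(extract_file, "rb") as f:
            ftp.storbinary("STOR " + extract_file, f)

At the target, a DBMS load utility would then refresh the table wholesale.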
File Compare: This method is a variation of the bulk extract approach. The newly extracted operational data is compared with the previous version, and a set of incremental change records is created. The incremental changes are then processed much as in bulk extract, except that they are applied as updates to the target server within the scheduled process. This approach is recommended for smaller files with only a few record changes.
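A minimal sketch of the file-compare step follows, assuming both extracts are CSV files keyed by an id column (the file and column names are illustrative):

import csv

def load_snapshot(path):
    # Read a CSV extract into {primary_key: row} form.
    with open(path, newline="") as f:
        return {row["id"]: row for row in csv.DictReader(f)}

def compare(prev_path, curr_path):
    prev = load_snapshot(prev_path)
    curr = load_snapshot(curr_path)
    changes = []
    for key, row in curr.items():
        if key not in prev:
            changes.append(("INSERT", row))   # new record
        elif row != prev[key]:
            changes.append(("UPDATE", row))   # modified record
    for key, row in prev.items():
        if key not in curr:
            changes.append(("DELETE", row))   # vanished record
    return changes

changes = compare("orders_prev.csv", "orders_curr.csv")

The resulting change records are what the scheduled process applies as updates to the target server.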
Change Data Propagation: This method captures and records changes to the file as part of the application change process. Several techniques can be used to implement change data propagation, such as triggers, log exits, log post-processing or DBMS extensions. The captured changes are written to a file of incremental changes. Once the source transaction completes, the change records can be transformed and moved to the target database. This type of data propagation is sometimes called near-real-time or continuous propagation, and it is used to keep the target database synchronized with the source system within a very brief period.
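Of the techniques listed, the trigger-based one is the easiest to illustrate. The sketch below uses SQLite purely as a stand-in source DBMS; the orders table and its columns are assumptions of the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE order_changes (            -- the file of incremental changes
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, order_id INTEGER, amount REAL,
        captured_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER capture_insert AFTER INSERT ON orders BEGIN
        INSERT INTO order_changes (op, order_id, amount)
        VALUES ('I', NEW.id, NEW.amount);
    END;
    CREATE TRIGGER capture_update AFTER UPDATE ON orders BEGIN
        INSERT INTO order_changes (op, order_id, amount)
        VALUES ('U', NEW.id, NEW.amount);
    END;
""")

conn.execute("INSERT INTO orders VALUES (1, 99.50)")
conn.execute("UPDATE orders SET amount = 120.00 WHERE id = 1")
conn.commit()

# A near-real-time propagator would poll order_changes and apply each
# record to the target shortly after the source transaction completes.
for row in conn.execute("SELECT op, order_id, amount FROM order_changes"):
    print(row)    # ('I', 1, 99.5) then ('U', 1, 120.0)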
5.6 Modelling and Measuring Data Warehouse Quality
Data quality has been defined as the ratio of performance to expectancy, or as the loss imparted to society from the time a product is shipped. We believe, though, that the best definition is the following: data quality is "fitness for use". This definition directly implies that the concept of data quality is relative. For example, data semantics (the interpretation of information) differs from one user to another. As has been observed, "the problem of data quality is fundamentally intertwined in how [...] users actually use the data in the system", since the users are ultimately the judges of the quality of the data produced for them: if nobody actually uses the data, then nobody will ever take care to improve its quality.
As a decision support information system, a data warehouse must provide a high level of data quality and quality of service. Coherency, freshness, accuracy, accessibility, availability and performance are among the quality features required by the end users of a data warehouse.
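Some of these features lend themselves to simple measurement. The sketch below computes a freshness figure and a completeness figure (one facet of accuracy) for a hypothetical warehouse table held as a list of row dictionaries; the loaded_at column and the sample values are assumptions of the example.

from datetime import datetime, timezone

rows = [
    {"id": 1, "customer": "Acme", "loaded_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "customer": None,   "loaded_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

def freshness_days(rows):
    # Age of the most recent load; smaller means fresher data.
    newest = max(r["loaded_at"] for r in rows)
    return (datetime.now(timezone.utc) - newest).days

def completeness(rows, column):
    # Fraction of rows with a non-null value in the given column.
    filled = sum(1 for r in rows if r[column] is not None)
    return filled / len(rows)

print("freshness:", freshness_days(rows), "days old")
print("customer completeness:", completeness(rows, "customer"))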
Still, many stakeholders are involved in the lifecycle of a data warehouse, and each of them has their own quality requirements. As already mentioned, the Decision Maker usually employs an OLAP query tool to obtain answers to the questions that interest him; he is usually concerned with the quality of the stored data, its timeliness and the ease of querying it through the OLAP tools. The Data Warehouse Administrator needs facilities such as error reporting, metadata accessibility and knowledge of the timeliness of the data, in order to detect changes and the reasons for them, or problems in the stored information. The Data Warehouse Designer needs to measure the quality of the schemata of the data warehouse environment (both existing and newly produced), as well as the quality of the metadata. Furthermore, he needs software evaluation standards against which to test the software packages he considers purchasing. The Programmers of Data Warehouse Components can make good use of software implementation standards to accomplish and evaluate their work. Metadata reporting can also facilitate their job, since they