Page 96 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
necessary on the source site. An example of a full extraction may be an export file of a
distinct table or a remote SQL statement scanning the complete source table.
2. Incremental Extraction: At a specific point in time, only the data that has changed since
a well-defined event in the past is extracted. This event may be the last time of
extraction or a more complex business event, such as the last booking day of a fiscal period.
To identify this delta, there must be a way to identify all the information that has changed
since that event. This information can be provided either by the source data itself, for
example an application column that records the last-changed timestamp, or by a change
table in which an additional mechanism keeps track of the changes alongside the
originating transactions. In most cases, using the latter method means adding extraction
logic to the source system.
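The timestamp-based variant of incremental extraction can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `orders` table, its `last_changed` column, and the helper `extract_changed_rows` are hypothetical names, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3

def extract_changed_rows(conn, last_extracted_at):
    """Return rows modified after the previous extraction point."""
    cur = conn.execute(
        "SELECT id, amount, last_changed FROM orders "
        "WHERE last_changed > ? ORDER BY last_changed",
        (last_extracted_at,),
    )
    return cur.fetchall()

# Hypothetical source table with an application-maintained timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_changed TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T09:00:00"),
        (2, 25.5, "2024-01-02T14:30:00"),
        (3, 7.25, "2024-01-03T08:15:00"),
    ],
)

# The well-defined event here is the time of the last extraction.
# ISO-8601 strings of equal length compare correctly as plain text.
last_extracted_at = "2024-01-01T23:59:59"
delta = extract_changed_rows(conn, last_extracted_at)

# Advance the marker so the next run only sees later changes.
if delta:
    last_extracted_at = delta[-1][2]
```

Note that a timestamp column alone does not reveal rows deleted from the source; capturing deletions generally needs a change table or a comparable mechanism.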
Many data warehouses do not use any change-capture techniques as part of the extraction
process. Instead, entire tables from the source systems are extracted to the data warehouse or
staging area, and these tables are compared with a previous extract from the source system to
identify the changed data. This approach may not have a significant impact on the source systems,
but it can clearly place a considerable burden on the data warehouse processes, particularly if the
data volumes are large.
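The comparison of a new full extract with the previous one can be sketched as below. It assumes each extract has been loaded into a dictionary keyed by primary key; the function name `diff_extracts` and the sample rows are illustrative only.

```python
def diff_extracts(previous, current):
    """Compare two full extracts keyed by primary key.

    Returns three dicts: rows that are new, rows whose values changed,
    and rows that no longer exist in the current extract.
    """
    inserted = {k: v for k, v in current.items() if k not in previous}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    deleted = {k: v for k, v in previous.items() if k not in current}
    return inserted, updated, deleted

# Hypothetical extracts: primary key -> remaining column values.
previous_extract = {1: ("Alice", "Berlin"), 2: ("Bob", "Paris"), 3: ("Carol", "Rome")}
current_extract = {1: ("Alice", "Berlin"), 2: ("Bob", "Lyon"), 4: ("Dave", "Oslo")}

inserted, updated, deleted = diff_extracts(previous_extract, current_extract)
```

Both extracts must be held and scanned in full on the warehouse side, which is where the burden described above comes from when volumes are large.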
Physical Extraction Methods
Depending on the chosen logical extraction method and the capabilities and restrictions on the
source side, the extracted data can be physically extracted by two mechanisms. The data can
either be extracted online from the source system or from an offline structure. Such an offline
structure might already exist or it might be generated by an extraction routine.
The methods of physical extraction are as follows:
1. Online Extraction: The data is extracted directly from the source system itself. The
extraction process can connect directly to the source system to access the source tables
themselves or to an intermediate system that stores the data in a preconfigured manner (for
example, snapshot logs or change tables).
Note: The intermediate system is not necessarily physically different from the source
system.
With online extractions, you need to consider whether the distributed transactions are
using original source objects or prepared source objects.
2. Offline Extraction: The data is not extracted directly from the source system but is staged
explicitly outside the original source system. The data either already has an existing
structure (for example, redo logs, archive logs or transportable tablespaces) or was created
by an extraction routine.
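The difference between the two physical methods can be illustrated with a small sketch. It is an assumption-laden toy example: an in-memory SQLite database plays the source system, a `customers` table is invented for the purpose, and an in-memory text buffer stands in for a flat file staged outside the source. All function names are hypothetical.

```python
import csv
import io
import sqlite3

def extract_online(conn):
    """Online extraction: query the source system directly."""
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()

def write_export(conn, out):
    """Extraction routine on the source side: stage rows in a flat file."""
    writer = csv.writer(out)
    writer.writerow(["id", "name"])
    writer.writerows(conn.execute("SELECT id, name FROM customers ORDER BY id"))

def extract_offline(staged):
    """Offline extraction: read the staged file, not the source system."""
    reader = csv.DictReader(staged)
    return [(int(row["id"]), row["name"]) for row in reader]

# Hypothetical source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

online_rows = extract_online(source)

export_file = io.StringIO()  # stands in for a file staged outside the source
write_export(source, export_file)
source.close()               # the offline path no longer needs the source
export_file.seek(0)
offline_rows = extract_offline(export_file)
```

In this sketch both paths yield the same rows; the practical difference is that the offline path touches the source only while the export is written.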
5.2 Data Reconciliation
An important aspect in ensuring the quality of data in business intelligence is the consistency of
the data. As a data warehouse, business intelligence integrates and transforms data and stores it
so that it is made available for analysis and interpretation. The consistency of the data between
the various process steps has to be ensured. Data reconciliation for DataSources allows you to
ensure the consistency of data that has been loaded into business intelligence and is available and
used productively there. You use the scenarios that are described below to validate the loaded