Page 96 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                       necessary on the source site. An example of a full extraction is an export file of a
                                       distinct table or a remote SQL statement that scans the complete source table.
                                   2.   Incremental Extraction: At a specific point in time, only the data that has changed since
                                       a well-defined event in the past is extracted. This event may be the last extraction run
                                       or a more complex business event, such as the last booking day of a fiscal period. To
                                       identify this delta, it must be possible to identify all information that has changed
                                       since that event. This information can be provided either by the source data itself,
                                       such as an application column reflecting the last-changed timestamp, or by a change
                                       table in which an additional mechanism tracks the changes alongside the originating
                                       transactions. In most cases, using the latter method means adding extraction logic to
                                       the source system.
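                                   The difference between full and incremental extraction can be sketched with an in-memory
                                   SQLite database standing in for the source system. This is a minimal illustration, not a
                                   prescribed implementation: the table orders, its column last_modified and the functions
                                   extract_full and extract_incremental are hypothetical. It assumes the source keeps a
                                   last-changed timestamp column stored as ISO-8601 text, so that string comparison follows
                                   chronological order. The change-table alternative is not shown.

```python
import sqlite3

def extract_full(conn):
    """Full extraction: read every row of the hypothetical orders table."""
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders ORDER BY id"
    ).fetchall()

def extract_incremental(conn, last_extracted):
    """Incremental extraction: only rows changed after the previous run.

    Assumes last_modified holds ISO-8601 text, so '>' compares chronologically.
    Returns the changed rows and the new high-water mark to store for the
    next run (unchanged if nothing was modified).
    """
    rows = conn.execute(
        "SELECT id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY id",
        (last_extracted,),
    ).fetchall()
    new_mark = max((r[2] for r in rows), default=last_extracted)
    return rows, new_mark

# Hypothetical source system with a last-changed timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, "2024-01-05T10:00:00"),
     (2, 250.0, "2024-01-20T09:30:00"),
     (3, 75.5, "2024-02-01T14:15:00")],
)

full = extract_full(conn)  # every row, regardless of when it changed
# Delta since the (assumed) previous extraction run on 15 January.
delta, mark = extract_incremental(conn, "2024-01-15T00:00:00")
print(len(full), [r[0] for r in delta], mark)  # prints: 3 [2, 3] 2024-02-01T14:15:00
```

                                   The returned high-water mark would be persisted by the extraction job and passed in on the
                                   next run, so each run picks up only the changes made since the previous one.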

                                   Many data warehouses do not use any change-capture techniques as part of the extraction
                                   process. Instead, entire tables from the source systems are extracted to the data warehouse or
                                   staging area, and these tables are compared with a previous extract from the source system to
                                   identify the changed data. This approach may have little impact on the source systems, but it
                                   can place a considerable burden on the data warehouse processes, particularly if the data
                                   volumes are large.
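                                   The comparison of a new full extract with the previous one can be sketched as a set
                                   difference on primary keys. This is a minimal sketch under stated assumptions: each extract
                                   is assumed to fit in memory as a dictionary mapping a primary key to the row's remaining
                                   values, and the function name diff_snapshots and the sample customer data are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two full extracts keyed by primary key.

    previous, current: dict mapping primary key -> tuple of column values.
    Returns (inserted, updated, deleted) as sorted lists of primary keys.
    """
    prev_keys, curr_keys = previous.keys(), current.keys()
    inserted = sorted(curr_keys - prev_keys)   # present only in the new extract
    deleted = sorted(prev_keys - curr_keys)    # present only in the old extract
    # Keys present in both extracts whose column values differ.
    updated = sorted(k for k in curr_keys & prev_keys if previous[k] != current[k])
    return inserted, updated, deleted

# Hypothetical previous and current full extracts of a customer table.
yesterday = {1: ("Alice", "Delhi"), 2: ("Bob", "Pune"), 3: ("Chen", "Agra")}
today = {1: ("Alice", "Delhi"), 2: ("Bob", "Mumbai"), 4: ("Dev", "Jaipur")}

print(diff_snapshots(yesterday, today))  # prints: ([4], [2], [3])
```

                                   Every row of both extracts has to be read and compared on the warehouse side, even when
                                   almost nothing has changed, which is why the cost grows with the data volume. In a staging
                                   database the same comparison could instead be written as a set-based SQL query.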

                                   Physical Extraction Methods

                                   Depending on the chosen logical extraction method and the capabilities and restrictions of the
                                   source side, the data can be physically extracted by two mechanisms: either online from the
                                   source system or from an offline structure. Such an offline structure might already exist, or
                                   it might be generated by an extraction routine.

                                   There are two methods of physical extraction:
                                   1.   Online Extraction: The data is extracted directly from the source system itself. The
                                       extraction process can connect directly to the source system to access the source tables
                                       themselves, or to an intermediate system that stores the data in a preconfigured manner
                                       (for example, snapshot logs or change tables).

                                      Note     The intermediate system is not necessarily physically different from the source
                                      system.

                                       With online extractions, you need to consider whether the distributed transactions are
                                       using original source objects or prepared source objects.
                                   2.   Offline Extraction: The data is not extracted directly from the source system but is staged
                                       explicitly outside the original source system. The data already has an existing structure
                                       (for example, redo logs, archive logs or transportable tablespaces) or was created by an
                                       extraction routine.
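                                   The two physical mechanisms can be contrasted in a short sketch. Online extraction queries
                                   the live source connection; offline extraction reads a structure staged outside the source.
                                   Here a CSV flat file written by an extraction routine stands in for that offline structure
                                   (real systems may instead use redo logs, archive logs or transportable tablespaces, which are
                                   not modelled). The customers table and all function names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

def online_extract(source_conn):
    """Online extraction: query the hypothetical customers table on the live source."""
    return source_conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()

def export_to_flat_file(source_conn, path):
    """Extraction routine run on the source side: stage the table in a flat file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(source_conn.execute("SELECT id, name FROM customers ORDER BY id"))

def offline_extract(path):
    """Offline extraction: read the staged file; no connection to the source is needed."""
    with open(path, newline="") as f:
        return [(int(row["id"]), row["name"]) for row in csv.DictReader(f)]

# Hypothetical source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

online_rows = online_extract(source)

with tempfile.TemporaryDirectory() as tmp:
    staged = os.path.join(tmp, "customers.csv")
    export_to_flat_file(source, staged)
    source.close()  # the offline path no longer touches the source system
    offline_rows = offline_extract(staged)

print(online_rows == offline_rows)  # prints: True
```

                                   Both paths deliver the same rows; the difference is whether the warehouse-side process
                                   needs a connection to the source at extraction time.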

                                   5.2 Data Reconciliation

                                   An important aspect of ensuring data quality in business intelligence is data consistency. As
                                   a data warehouse, business intelligence integrates and transforms data and stores it so that it
                                   is available for analysis and interpretation. The consistency of the data between the various
                                   process steps must be ensured. Data reconciliation for DataSources allows you to ensure the
                                   consistency of data that has been loaded into business intelligence and is available and used
                                   productively there. You use the scenarios that are described below to validate the loaded




          90                               Lovely Professional University