Page 264 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
14.1 Quality Driven Data Warehouse Design
Now that the basic online transaction processing (OLTP) infrastructure is in place in many
organizations, not least through standardized enterprise resource planning (ERP) tools such as
SAP R/3, the focus of interest is broadening in at least three directions:
1. A broader range of multimedia data sources inside and outside the organization,
2. A broader range of clients with diverse interest and capability profiles and differing
situational parameters, and
3. The conversion of the massive experiential data created by transaction processing into
knowledge relevant for organizational learning and action.
A wide range of information flow logistics architectures is being proposed under labels such as
supply chain management and business-to-business e-commerce. In such architectures, databases
can be considered short- and medium-term intermediate stores of information, whereas data
warehouses serve as long-term memory and support knowledge creation and management. One could
also say that a data warehouse is a long-term information broker between the operational part and
the reporting/planning part of a company, supporting the Controlling department.
The traditional data warehouse (DW) architecture is shown in Figure 14.1. Physically, a data
warehouse system consists of databases (source databases, materialized views in the data
warehouse), data transport agents that ship data from one database to another, and a repository
that stores metadata about the system and its evolution. In this architecture, heterogeneous
information sources are first made accessible in a uniform way through extraction mechanisms
called wrappers; mediators then take on the task of information integration and conflict
resolution. The separation between wrappers and mediators is a deliberate design decision,
reflecting the separation between service wrappers and request brokers in middleware systems
such as CORBA.
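To make the division of labour between wrappers and mediators concrete, here is a minimal Python sketch. It assumes two hypothetical sources (a semicolon-separated ERP export that stores revenue in cents, and a CRM feed with different field names), a hypothetical uniform `CustomerRecord` schema, and a deliberately simple conflict-resolution policy in which the more trusted source wins; real mediators apply much richer integration rules. Each wrapper only translates its own source into the uniform format, while the mediator alone decides how overlapping records are merged.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class CustomerRecord:
    """Uniform record format every wrapper must deliver (hypothetical schema)."""
    customer_id: str
    city: str
    revenue_eur: float
    source: str


class Wrapper(Protocol):
    """Anything that exposes a source as uniform CustomerRecord objects."""
    def fetch(self) -> Iterable[CustomerRecord]: ...


class ErpExportWrapper:
    """Wraps a hypothetical text export with lines 'id;city;revenue_in_cents'."""
    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> Iterable[CustomerRecord]:
        for line in self.text.strip().splitlines():
            cid, city, cents = line.split(";")
            # Translate source-specific conventions (lowercase names, cents) to the uniform format.
            yield CustomerRecord(cid.strip(), city.strip().title(), int(cents) / 100, "erp")


class CrmWrapper:
    """Wraps a hypothetical CRM feed that uses other field names and euro amounts as strings."""
    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows

    def fetch(self) -> Iterable[CustomerRecord]:
        for row in self.rows:
            yield CustomerRecord(row["cust"], row["town"].title(), float(row["rev"]), "crm")


class Mediator:
    """Integrates wrapped sources; on conflicts the source listed first in `priority` wins."""
    def __init__(self, wrappers: List[Wrapper], priority: List[str]) -> None:
        self.wrappers = wrappers
        self.rank = {name: i for i, name in enumerate(priority)}  # lower rank = more trusted

    def integrate(self) -> Dict[str, CustomerRecord]:
        merged: Dict[str, CustomerRecord] = {}
        for wrapper in self.wrappers:
            for rec in wrapper.fetch():
                current = merged.get(rec.customer_id)
                # Conflict resolution: replace an existing record only if the new source is more trusted.
                if current is None or self.rank[rec.source] < self.rank[current.source]:
                    merged[rec.customer_id] = rec
        return merged


# Usage: customer c1 appears in both sources with different revenue figures.
erp = ErpExportWrapper("c1;berlin;150000\nc2;hamburg;99950")
crm = CrmWrapper([
    {"cust": "c1", "town": "Berlin", "rev": "1400.00"},
    {"cust": "c3", "town": "munich", "rev": "500.00"},
])
view = Mediator([erp, crm], priority=["erp", "crm"]).integrate()
for cid in sorted(view):
    print(view[cid])
```

Changing `priority` to `["crm", "erp"]` would make the CRM figure for `c1` win instead. Keeping that policy in the mediator means the wrappers never need to know about each other.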
Figure 14.1: Traditional Data Warehouse Architecture
The resulting standardized and integrated data are stored as materialized views in the data
warehouse. These base views (often called the ODS, or operational data store) are usually only
slightly aggregated. To customize them for different groups of analysts, data marts holding more
highly aggregated data about specific domains of interest are constructed as second-level caches,
which are then accessed by data analysis tools ranging from query facilities and spreadsheet tools
to full-fledged data mining systems.
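The two levels of materialized data described above can be sketched with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this example assumes `CREATE TABLE ... AS SELECT` as a stand-in that stores a query result; the table names `sales`, `ods_daily_sales` and `mart_monthly_region` and the sample rows are hypothetical. The ODS keeps one slightly aggregated row per day, region and product, and the data mart aggregates further to month and region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Integrated, standardized data as delivered by the mediators (hypothetical rows).
cur.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, product TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2024-01-05", "North", "Laptop", 1200.0),
        ("2024-01-05", "North", "Laptop", 800.0),
        ("2024-01-06", "South", "Phone", 300.0),
        ("2024-01-20", "North", "Phone", 200.0),
        ("2024-02-10", "North", "Phone", 450.0),
        ("2024-02-11", "South", "Laptop", 1000.0),
    ],
)

# Base view (ODS): only slightly aggregated, one row per day, region and product.
# SQLite has no materialized views, so CREATE TABLE ... AS SELECT stores the result instead.
cur.execute("""
    CREATE TABLE ods_daily_sales AS
    SELECT sale_date, region, product, SUM(amount) AS amount
    FROM sales
    GROUP BY sale_date, region, product
""")

# Data mart: more aggregated, built from the ODS rather than the sources (second-level cache).
cur.execute("""
    CREATE TABLE mart_monthly_region AS
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS revenue
    FROM ods_daily_sales
    GROUP BY month, region
""")

ods_count = cur.execute("SELECT COUNT(*) FROM ods_daily_sales").fetchone()[0]
mart_rows = cur.execute(
    "SELECT month, region, revenue FROM mart_monthly_region ORDER BY month, region"
).fetchall()
for row in mart_rows:
    print(row)
```

The six source rows shrink to five ODS rows and four mart rows; building the mart from the ODS rather than from the source table mirrors its role as a second-level cache.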
258 Lovely Professional University