
Unit 6: Data Warehousing




                      Figure 6.2: Single-layer Architecture for a Data Warehouse System
          Source: http://www.mhprofessional.com/downloads/products/0071610391/0071610391_chap01.pdf
          This means that a data warehouse is implemented as a multidimensional view of operational
          data created by specific middleware, or an intermediate processing layer. The weakness of this
          architecture lies in its failure to meet the requirement for separation between analytical and
          transactional processing. Analysis queries are submitted to operational data after the middleware
          interprets them, so the queries affect regular transactional workloads. In addition,
          although this architecture can meet the requirement for integration and correctness of data, it
          cannot log more data than the sources do. For these reasons, a virtual approach to data warehouses
          can be successful only if analysis needs are particularly restricted and the volume of data to
          analyze is not huge.
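
          To make the virtual approach concrete, the following minimal Python sketch (the table and
          function names, such as orders and middleware_query, are invented for illustration) shows a
          middleware layer that recomputes a multidimensional product-by-month view directly over the
          operational store on every request, which is exactly why analysis competes with transactional
          workloads:

          import sqlite3

          # Toy operational store; in a real system this would be the live
          # transactional database, not an in-memory copy.
          conn = sqlite3.connect(":memory:")
          conn.executescript("""
              CREATE TABLE orders (order_id INTEGER, product TEXT,
                                   qty INTEGER, price REAL, order_date TEXT);
              INSERT INTO orders VALUES
                  (1, 'widget', 3, 9.99, '2024-01-05'),
                  (2, 'gadget', 1, 24.50, '2024-01-06'),
                  (3, 'widget', 2, 9.99, '2024-02-01');
          """)

          def middleware_query(product=None):
              # There is no separate analytical store: the multidimensional
              # view (product x month) is recomputed from the live table on
              # every call, so each analysis query lands on operational data.
              sql = ("SELECT product, substr(order_date, 1, 7) AS month, "
                     "SUM(qty * price) AS revenue FROM orders")
              args = ()
              if product is not None:
                  sql += " WHERE product = ?"
                  args = (product,)
              sql += " GROUP BY product, month"
              return conn.execute(sql, args).fetchall()

          print(middleware_query())          # full product-by-month view
          print(middleware_query('widget'))  # a restricted analysis need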

          6.3.2 Two-Layer Architecture

          The requirement for separation plays a fundamental role in defining the typical architecture for
          a data warehouse system, as shown in Figure 6.3. Although it is typically called a two-layer
          architecture to highlight the separation between physically available sources and the data
          warehouse, it actually consists of four successive data flow stages:
          1.   Source layer: A data warehouse system uses heterogeneous sources of data. That data is
               originally stored in corporate relational databases or legacy databases, or it may come
               from information systems outside the corporate walls.
          2.   Data staging: The data stored in sources should be extracted, cleansed to remove
               inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one
               common schema. The so-called Extraction, Transformation, and Loading (ETL) tools can
               merge heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source
               data into a data warehouse. Technologically speaking, this stage deals with problems that
               are typical for distributed information systems, such as inconsistent data management
               and incompatible data structures; a toy sketch of this staging flow is given below.
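
          As referenced in the item above, the following self-contained Python sketch walks through
          the staging stage. The source layouts and field names (cust_id, amt, customer, total) are
          invented for illustration, and the "warehouse" target is a plain list rather than a real
          database, so this is a sketch of the staging ideas, not any specific ETL tool's API:

          # Extract: two sources with incompatible schemata and dirty records.
          source_a = [{"cust_id": 1, "amt": "10.50"},
                      {"cust_id": 2, "amt": None}]        # gap to fill
          source_b = [{"customer": "2", "total": 7.25},
                      {"customer": "3", "total": -1.0}]   # invalid value

          def transform(record, id_key, amount_key):
              # Map one source record onto the common target schema,
              # cleansing as it goes: fill gaps, validate, filter.
              amount = record[amount_key]
              amount = float(amount) if amount is not None else 0.0  # fill gap
              if amount < 0:                       # filter out invalid data
                  return None
              return {"customer_id": int(record[id_key]), "amount": amount}

          # Transform and integrate: merge both sources into one common schema.
          staged = []
          for rec in source_a:
              row = transform(rec, "cust_id", "amt")
              if row is not None:
                  staged.append(row)
          for rec in source_b:
              row = transform(rec, "customer", "total")
              if row is not None:
                  staged.append(row)

          # Load: a real loader would write into the warehouse's fact tables.
          warehouse = staged
          print(warehouse)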



