
Unit 6: Data Warehousing




                      Figure 6.2: Single-layer Architecture for a Data Warehouse System
          Source: http://www.mhprofessional.com/downloads/products/0071610391/0071610391_chap01.pdf
          This means that a data warehouse is implemented as a multidimensional view of operational
          data created by specific middleware, or an intermediate processing layer. The weakness of this
          architecture lies in its failure to meet the requirement for separation between analytical and
          transactional processing. Analysis queries are submitted to operational data after the middleware
          interprets them, so the queries affect regular transactional workloads. In addition,
          although this architecture can meet the requirement for integration and correctness of data, it
          cannot log more data than the sources do. For these reasons, a virtual approach to data warehouses
          can be successful only if analysis needs are particularly restricted and the volume of data to
          analyze is not huge.
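
          To make the virtual approach concrete, the following minimal Python sketch (the table and
          function names, such as orders and middleware_query, are invented for illustration) shows a
          middleware layer that recomputes a multidimensional product-by-month view directly over the
          operational store on every request, which is exactly why analysis competes with transactional
          workloads:

          import sqlite3

          # Toy operational store; in a real system this would be the live
          # transactional database, not an in-memory copy.
          conn = sqlite3.connect(":memory:")
          conn.executescript("""
              CREATE TABLE orders (order_id INTEGER, product TEXT,
                                   qty INTEGER, price REAL, order_date TEXT);
              INSERT INTO orders VALUES
                  (1, 'widget', 3, 9.99, '2024-01-05'),
                  (2, 'gadget', 1, 24.50, '2024-01-06'),
                  (3, 'widget', 2, 9.99, '2024-02-01');
          """)

          def middleware_query(product=None):
              # There is no separate analytical store: the multidimensional
              # view (product x month) is recomputed from the live table on
              # every call, so each analysis query lands on operational data.
              sql = ("SELECT product, substr(order_date, 1, 7) AS month, "
                     "SUM(qty * price) AS revenue FROM orders")
              args = ()
              if product is not None:
                  sql += " WHERE product = ?"
                  args = (product,)
              sql += " GROUP BY product, month"
              return conn.execute(sql, args).fetchall()

          print(middleware_query())          # full product-by-month view
          print(middleware_query('widget'))  # a restricted analysis need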

          6.3.2 Two-Layer Architecture

          The requirement for separation plays a fundamental role in defining the typical architecture for
          a data warehouse system, as shown in Figure 6.3. Although it is typically called a two-layer
          architecture to highlight the separation between physically available sources and the data
          warehouse, it actually consists of four successive data flow stages:
          1.   Source layer: A data warehouse system uses heterogeneous sources of data. That data is
               originally stored in corporate relational databases or legacy databases, or it may come
               from information systems outside the corporate walls.
          2.   Data staging: The data stored in sources should be extracted, cleansed to remove
               inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one
               common schema. The so-called Extraction, Transformation, and Loading (ETL) tools can
               merge heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source
               data into a data warehouse. Technologically speaking, this stage deals with problems that
               are typical for distributed information systems, such as inconsistent data management
               and incompatible data structures; a toy sketch of this staging flow is given below.
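
          As referenced in the item above, the following self-contained Python sketch walks through
          the staging stage. The source layouts and field names (cust_id, amt, customer, total) are
          invented for illustration, and the "warehouse" target is a plain list rather than a real
          database, so this is a sketch of the staging ideas, not any specific ETL tool's API:

          # Extract: two sources with incompatible schemata and dirty records.
          source_a = [{"cust_id": 1, "amt": "10.50"},
                      {"cust_id": 2, "amt": None}]        # gap to fill
          source_b = [{"customer": "2", "total": 7.25},
                      {"customer": "3", "total": -1.0}]   # invalid value

          def transform(record, id_key, amount_key):
              # Map one source record onto the common target schema,
              # cleansing as it goes: fill gaps, validate, filter.
              amount = record[amount_key]
              amount = float(amount) if amount is not None else 0.0  # fill gap
              if amount < 0:                       # filter out invalid data
                  return None
              return {"customer_id": int(record[id_key]), "amount": amount}

          # Transform and integrate: merge both sources into one common schema.
          staged = []
          for rec in source_a:
              row = transform(rec, "cust_id", "amt")
              if row is not None:
                  staged.append(row)
          for rec in source_b:
              row = transform(rec, "customer", "total")
              if row is not None:
                  staged.append(row)

          # Load: a real loader would write into the warehouse's fact tables.
          warehouse = staged
          print(warehouse)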



