Page 266 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
Although this data warehouse deals “only” with financial figures, there is a host of
semantic coherency questions to be solved among the different accounting definitions required
by tax laws, stock exchanges, different financial products, and the like. At the same time, there
are massive physical data integration problems to be solved: tens of thousands of
multi-dimensional data cubes must be re-calculated daily to provide close to zero-latency
information to top management. In light of such problems, many architectures discussed in the
literature appear somewhat naive.
The key to solving these enormous problems in a flexible and evolvable manner is enriched
metadata management, used by different kinds of interacting software components. In the
following section, we present our approach to organizing this.
14.2 Interaction between Quality Factors and DW Tasks
Starting from a definition of the basic DW architecture and the relevant data quality issues,
the first project goal is to define the range of design and operational method alternatives for
each of the main architecture components and quality factors. Since a combination of
enabling technologies is usually required, innovations are envisioned both at the design level
(e.g., rich metadata representation and reasoning facilities) and at the operational level
(e.g., when DW contents are treated as views over the underlying information sources, refreshment
techniques and optimal handling of views with aggregate functions become important). In a second step, formal models of
the DW architecture and services will be developed together with associated tools for consistency
checking in the richer model, reuse by subsumption, view materialization strategies, and other
components of the data warehousing software. These models and tools will make the knowledge
about operational alternatives and their configuration available to the data warehouse designer,
in order to allow the dynamic adaptation of the data warehouse structure and quality-of-service
to the ever-changing information sources and analysis patterns.
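The idea of treating DW contents as materialized views with aggregate functions, kept current by a refreshment technique, can be illustrated with a minimal sketch. It assumes a single view that stores a sum and a row count per group, and it assumes that every deleted row was previously inserted. The class and function names (MaterializedSumView, apply_delta, recompute) are hypothetical and do not come from the text; this is one possible refreshment strategy, not the project's method.

```python
from collections import defaultdict


class MaterializedSumView:
    """Hypothetical materialized view: SUM(amount) and COUNT(*) per group key."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply_delta(self, inserted=(), deleted=()):
        """Refresh the view from source changes instead of recomputing it.

        Both arguments are iterables of (group_key, amount) pairs.
        Assumption: every deleted row was inserted earlier.
        """
        for key, amount in inserted:
            self.totals[key] += amount
            self.counts[key] += 1
        for key, amount in deleted:
            self.totals[key] -= amount
            self.counts[key] -= 1
            if self.counts[key] == 0:
                # The group no longer has any source rows, so drop it.
                del self.totals[key]
                del self.counts[key]

    def rows(self):
        """Return the view contents as {group_key: (sum, count)}."""
        return {k: (self.totals[k], self.counts[k]) for k in self.counts}


def recompute(source_rows):
    """Full recomputation from the source, used here only as a reference."""
    view = MaterializedSumView()
    view.apply_delta(inserted=source_rows)
    return view.rows()
```

Keeping the row count next to the sum is what makes deletions cheap to handle: the view can tell when a group has become empty without rereading the source. Aggregates such as MIN or MAX would not be maintainable this way under deletions, which is one reason the choice of refreshment technique depends on the aggregate function.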
The increased accessibility of information over wide-area networks does not solve the problem of
having the right information in the right place at the right time at the right cost.
Data warehousing has become an important strategy to integrate heterogeneous information
sources in organizations, and to enable on-line analytic processing. Its development is a
consequence of the observation by W. Inmon and E. F. Codd in the early 1990s that operational-level
on-line transaction processing (OLTP) and decision support applications (on-line analytic
processing or OLAP) cannot co-exist efficiently in the same database environment, mostly due to
their very different transaction characteristics.
A DW caches selected data of interest to a customer group, so that access becomes faster, cheaper,
and more effective. As the long-term buffer between OLTP and OLAP (Figure 14.3), DWs face two
essential questions: how to reconcile the stream of incoming data from multiple heterogeneous
legacy sources, and how to customize derived data storage to specific OLAP applications. The
trade-offs driving the design decisions on these two issues change continuously with
business needs; design support and change management are therefore of the greatest importance
if DW projects are not to run into dead ends. This is a recognized problem in industry
that cannot be solved without improved formal foundations.
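The two questions above, reconciling heterogeneous sources and customizing derived storage, can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the two source layouts (a ledger that stores amounts in cents, a branch system that stores amounts as decimal strings), the field names, and the function names are hypothetical and are not taken from the text.

```python
from decimal import Decimal


def from_ledger(rec):
    """Map a record of hypothetical source A (amounts already in cents)."""
    return {
        "account": rec["acct_no"],
        "region": rec["region"].upper(),
        "amount_cents": rec["amount_cents"],
    }


def from_branch(rec):
    """Map a record of hypothetical source B (amounts as decimal strings)."""
    return {
        "account": rec["AccountId"],
        "region": rec["Region"].upper(),
        "amount_cents": int(Decimal(rec["Amount"]) * 100),
    }


def reconcile(sources):
    """Bring all sources into one common schema.

    `sources` is a list of (mapping_function, records) pairs.
    """
    rows = []
    for mapper, records in sources:
        rows.extend(mapper(rec) for rec in records)
    return rows


def region_totals(rows):
    """Derived store customized for one OLAP use: total cents per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount_cents"]
    return totals
```

The mapping functions carry the knowledge about each legacy source, while region_totals stands for one derived store among many. When a source format or an analysis need changes, only the corresponding function has to change, which is a small-scale picture of why change management matters in DW design.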
260 Lovely Professional University