The second group of results focuses on enhancing these enriched models with tools that support the
evolution and optimization of DW applications under changing quality goals. The corresponding
tools include: evolution operators, which document the link between design decisions and quality
factors; reasoning methods, which analyze and optimize view definitions with multi-dimensional
aggregated data and allow efficient quality control in bulk data reconciliation from new sources;
and quantitative techniques, which optimize data source selection, integration strategies, and
redundant view materialization with respect to given quality criteria, especially performance criteria.
14.2.2 Quality factors and properties
To carry out data evaluation, we first need to identify which quality factors to evaluate. The
choice of the most appropriate quality factors for a given DIS depends on the user applications
and the way the DIS is implemented. Several works study the quality factors that are most
relevant for different types of systems. Selecting the appropriate quality factors implies
selecting metrics and implementing evaluation algorithms that measure, estimate,
or bound those quality factors.
To calculate the quality values corresponding to those factors, the algorithms need input
information describing system properties, for example the time an activity needs to
execute or a descriptor stating whether an activity materializes data. These properties can be of
two types: (i) descriptions, indicating some feature of the system (costs, delays, policies, strategies,
constraints, etc.), or (ii) measures, indicating a quality value corresponding to a quality factor.
A measure can be an actual value acquired from a source, a calculated value obtained by executing an
evaluation algorithm, or an expected value indicating the user's desired value for the quality factor.
The selection of the appropriate properties depends on the quality factors that are relevant for the
system and on the calculation processes.
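
The two property types described above can be sketched as simple data structures. This is a minimal illustration, not a model taken from the text: the class names (Description, Measure, MeasureKind) and the helper meets_expectation are hypothetical, and the helper assumes a quality factor for which lower values are better.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class MeasureKind(Enum):
    """The three kinds of measure named in the text."""
    ACTUAL = "actual"          # value acquired from a source
    CALCULATED = "calculated"  # value produced by an evaluation algorithm
    EXPECTED = "expected"      # value the user wants for the factor


@dataclass(frozen=True)
class Description:
    """A feature of the system, e.g. a cost, delay, policy, or constraint."""
    name: str
    value: Union[float, bool, str]


@dataclass(frozen=True)
class Measure:
    """A quality value attached to one quality factor."""
    factor: str
    kind: MeasureKind
    value: float


def meets_expectation(calculated: Measure, expected: Measure) -> bool:
    """Compare a calculated value with the user's expected value.

    Assumption (not from the text): lower is better, as for a
    response time or a data age expressed in seconds.
    """
    if calculated.factor != expected.factor:
        raise ValueError("measures refer to different quality factors")
    return calculated.value <= expected.value


# Hypothetical values for illustration only.
materializes = Description("integrate.materializes", True)
calculated = Measure("response_time", MeasureKind.CALCULATED, 4.0)
expected = Measure("response_time", MeasureKind.EXPECTED, 5.0)
```

Keeping descriptions and measures as separate types mirrors the distinction in the text: descriptions are inputs to the evaluation algorithms, while measures are the quality values that are acquired, computed, or requested.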
Example: Consider a system where users are interested in the evaluation of response time
and freshness. To calculate the response time, it is necessary to know which activities materialize
data and the execution cost of the activities that do not materialize data. To calculate data
freshness, it is also necessary to know the refreshment frequencies and costs, as well as the actual
freshness of the data in the sources. Other examples of properties include execution policies,
source constraints, and communication delays.
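
The example can be made concrete with a small sketch. The text does not give formulas, so the calculations below are assumptions chosen for illustration: activities form a linear pipeline from source to user, response time is the total cost of the activities that run after the last materialized one, and freshness is bounded by the source's actual freshness plus every execution cost plus one refresh period per materialized activity. The names Activity, response_time, and freshness_bound are hypothetical, and all values are in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    name: str
    cost: float                             # execution or refresh cost (s)
    materializes: bool = False              # does the activity store its result?
    refresh_period: Optional[float] = None  # time between refreshes (s)


def response_time(pipeline: List[Activity]) -> float:
    """Total cost of the activities that must run at query time.

    Assumption: a query reads from the last materialized activity, so
    only the activities after it are executed when the query arrives.
    """
    last_mat = max(
        (i for i, a in enumerate(pipeline) if a.materializes), default=-1
    )
    return sum(a.cost for a in pipeline[last_mat + 1:])


def freshness_bound(pipeline: List[Activity], source_freshness: float) -> float:
    """Upper bound on the age of the data delivered to the user.

    Assumption: data ages by each activity's cost, and materialized
    data may additionally be up to one refresh period old.
    """
    age = source_freshness
    for a in pipeline:
        age += a.cost
        if a.materializes:
            age += a.refresh_period or 0.0
    return age


# Hypothetical pipeline: extract -> integrate (materialized hourly) -> aggregate.
pipeline = [
    Activity("extract", cost=2.0),
    Activity("integrate", cost=5.0, materializes=True, refresh_period=3600.0),
    Activity("aggregate", cost=1.0),
]
rt = response_time(pipeline)                             # 1.0
fb = freshness_bound(pipeline, source_freshness=600.0)   # 4208.0
```

The sketch shows the trade-off the example points to: materializing the integrate step reduces response time to the cost of aggregate alone, but adds up to one refresh period to the freshness bound.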
14.3 The DWQ Data Warehouse Design Methodology
Data warehouses support business decisions by collecting, consolidating, and organizing data for
reporting and analysis with tools such as online analytical processing (OLAP) and data mining.
Although data warehouses are built on relational database technology, the design of a data
warehouse database differs substantially from the design of an online transaction processing
(OLTP) system database.
14.3.1 Data Warehouses, OLTP, OLAP and Data Mining
A relational database is designed for a specific purpose. Because the purpose of a data warehouse
differs from that of an OLTP system, the design characteristics of a relational database that supports a
data warehouse differ from the design characteristics of an OLTP database.