Page 270 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   The second group of results focuses on enhancing these enriched models with tools that support the
                                   evolution and optimization of DW applications under changing quality goals. The corresponding
                                   tools include: evolution operators, which document the link between design decisions and quality
                                   factors; reasoning methods, which analyze and optimize view definitions over multi-dimensional
                                   aggregated data and allow efficient quality control in bulk data reconciliation from new sources;
                                   and quantitative techniques, which optimize data source selection, integration strategies, and
                                   redundant view materialization with respect to given quality criteria, especially performance criteria.

                                   14.2.2 Quality Factors and Properties

                                   To evaluate data quality, we first need to identify which quality factors to evaluate. The
                                   choice of the most appropriate quality factors for a given data integration system (DIS) depends
                                   on the user applications and on the way the DIS is implemented. Several works study which quality
                                   factors are most relevant for different types of systems. Selecting the quality factors in turn
                                   requires selecting metrics and implementing evaluation algorithms that measure, estimate,
                                   or bound those quality factors.

                                   To calculate the quality values corresponding to those factors, the algorithms need input
                                   information describing system properties, for example the time an activity needs to
                                   execute or a descriptor stating whether an activity materializes data. These properties can be of
                                   two types: (i) descriptions, indicating some feature of the system (costs, delays, policies, strategies,
                                   constraints, etc.), or (ii) measures, indicating a quality value corresponding to a quality factor.
                                   A measure can be an actual value acquired from a source, a calculated value obtained by executing an
                                   evaluation algorithm, or an expected value indicating the value the user desires for the quality factor.
                                   The selection of the appropriate properties depends on the quality factors that are relevant for the
                                   system and on the calculation processes.
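
The two property types described above can be sketched as a small data model. This is only an illustration, not part of any DWQ tool: the class names (`Description`, `Measure`, `MeasureKind`), the property names, the sample values, and the "lower is better" comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class MeasureKind(Enum):
    ACTUAL = "actual"          # value acquired from a source
    CALCULATED = "calculated"  # value obtained by running an evaluation algorithm
    EXPECTED = "expected"      # value the user desires for the quality factor


@dataclass(frozen=True)
class Description:
    """A feature of the system, such as a cost, delay, or policy."""
    name: str
    value: Union[float, bool, str]


@dataclass(frozen=True)
class Measure:
    """A quality value attached to a quality factor."""
    factor: str
    kind: MeasureKind
    value: float


Property = Union[Description, Measure]

# Hypothetical properties; names and numbers are invented for illustration.
properties: List[Property] = [
    Description("execution_cost_hours", 2.5),
    Description("materializes_data", False),
    Measure("freshness_hours", MeasureKind.ACTUAL, 6.0),
    Measure("freshness_hours", MeasureKind.EXPECTED, 12.0),
]


def find_measure(props: List[Property], factor: str, kind: MeasureKind) -> float:
    """Return the value of the first measure matching the factor and kind."""
    for p in props:
        if isinstance(p, Measure) and p.factor == factor and p.kind is kind:
            return p.value
    raise KeyError(f"no {kind.value} measure for {factor}")


def meets_expectation(props: List[Property], factor: str) -> bool:
    """Assumes lower values are better (true for age-like factors such as freshness)."""
    actual = find_measure(props, factor, MeasureKind.ACTUAL)
    expected = find_measure(props, factor, MeasureKind.EXPECTED)
    return actual <= expected
```

Keeping descriptions and measures as separate types makes it explicit which inputs describe the system and which are quality values that an evaluation algorithm reads or produces.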


                                   Example: Consider a system where users are interested in the evaluation of response time
                                   and freshness. To calculate the response time, it is necessary to know which activities materialize
                                   data and the execution cost of the activities that do not materialize data. To calculate the data
                                   freshness, it is also necessary to know the refreshment frequencies and costs, as well as the actual
                                   freshness of the data in the sources. Other examples of properties include execution policies,
                                   source constraints, and communication delays.
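
The example above can be made concrete with a minimal sketch. The text does not give formulas, so the ones below are simplifying assumptions: activities form a single linear chain, a query reads from the last materialized result so only the non-materializing activities after it run at query time, and the worst-case age of the data is the source freshness plus every execution cost plus one full refresh period per materializing activity. The `Activity` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    name: str
    execution_cost: float        # hours needed to run the activity
    materializes: bool           # True if the activity stores its output
    refresh_period: float = 0.0  # hours between refreshes (materializing activities only)


def response_time(chain: List[Activity]) -> float:
    """Cost of the activities that run at query time.

    Assumption: a query starts from the last materialization in the chain,
    so only the non-materializing activities after it are executed.
    """
    total = 0.0
    for activity in reversed(chain):
        if activity.materializes:
            break
        total += activity.execution_cost
    return total


def freshness(chain: List[Activity], source_freshness: float) -> float:
    """Worst-case age (in hours) of the data delivered to the user.

    Assumption: data ages by each activity's execution cost, and a
    materializing activity can add up to one full refresh period.
    """
    age = source_freshness
    for activity in chain:
        age += activity.execution_cost
        if activity.materializes:
            age += activity.refresh_period
    return age


# Hypothetical three-step chain from a source to the user.
chain = [
    Activity("extract", 0.5, materializes=False),
    Activity("integrate", 1.0, materializes=True, refresh_period=24.0),
    Activity("aggregate", 0.25, materializes=False),
]

print(response_time(chain))   # only "aggregate" runs at query time
print(freshness(chain, 2.0))  # source data is assumed to be 2 hours old
```

Under these assumptions the trade-off is visible directly: materializing an activity removes its cost, and the cost of everything before it, from the response time, but its refresh period is added to the worst-case freshness.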

                                   14.3 The DWQ Data Warehouse Design Methodology

                                   Data warehouses support business decisions by collecting, consolidating, and organizing data for
                                   reporting and analysis with tools such as online analytical processing (OLAP) and data mining.
                                   Although data warehouses are built on relational database technology, the design of a data
                                   warehouse database differs substantially from the design of an online transaction processing
                                   (OLTP) system database.

                                   14.3.1 Data Warehouses, OLTP, OLAP and Data Mining

                                   A relational database is designed for a specific purpose. Because the purpose of a data warehouse
                                   differs from that of an OLTP system, the design characteristics of a relational database that
                                   supports a data warehouse differ from the design characteristics of an OLTP database.













                                   Lovely Professional University