Page 182 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
To summarize the previous discussion, a refreshment process is a complex system
that may be composed of asynchronous and parallel activities, which need to be monitored.
The refreshment process is an event-driven system that evolves frequently, following the
evolution of data sources and user requirements. Users, data warehouse administrators and data
source administrators may impose specific constraints such as, respectively, freshness of data,
space limitation of the ODS or CDW, and access frequency to sources. There is no simple and unique
refreshment strategy that is suitable for all data warehouse applications, for all data warehouse
users, or for the whole data warehouse lifetime.
9.3.2 The Refreshment Process is a Workflow
A workflow is a set of coordinated activities, manual or automated, that are
performed by actors. Workflow concepts have been used in various application domains such
as business process modeling, cooperative applications modeling and database transaction
modeling. Depending on the application domain, activities and coordination are defined using
appropriate specification languages such as statechart diagrams, Petri nets, or active rules. In
spite of this diversity of applications and representations, most workflow users refer more
or less to the concepts and terminology defined by the Workflow Management Coalition. Workflow
systems are supposed to provide a high level of flexibility to recursively decompose and merge
activities, and to allow dynamic reorganization of the workflow process. These features are
particularly useful in the context of data warehouse refreshment, as the activities are performed
by market products whose functionalities and scope differ from one product to another.
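The idea of recursively decomposing and merging activities can be sketched in a few lines of Python. This is only an illustrative model, not the terminology or API of any workflow product: the class Activity, its methods decompose and merge, and the sub-activity names format_conversion and duplicate_removal are all hypothetical. The activity names extraction, cleaning and integration are the ones used for the refreshment process in this unit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    """Hypothetical workflow activity: a leaf step or a composite of sub-activities."""
    name: str
    children: List["Activity"] = field(default_factory=list)

    def run(self, trace: List[str]) -> None:
        """Run leaf activities in order; a composite simply runs its children."""
        if not self.children:
            trace.append(self.name)  # placeholder for the real work of the step
            return
        for child in self.children:
            child.run(trace)

    def decompose(self, *parts: str) -> None:
        """Split this activity into finer-grained sub-activities."""
        self.children = [Activity(p) for p in parts]

    def merge(self) -> None:
        """Collapse the sub-activities back into a single step."""
        self.children = []


def execute(workflow: Activity) -> List[str]:
    """Return the names of the leaf activities in execution order."""
    trace: List[str] = []
    workflow.run(trace)
    return trace


refresh = Activity(
    "refresh",
    [Activity("extraction"), Activity("cleaning"), Activity("integration")],
)
before = execute(refresh)

# Reorganize the workflow: cleaning is now handled as two separate steps.
refresh.children[1].decompose("format_conversion", "duplicate_removal")
after = execute(refresh)

# Merging restores the original, coarser-grained workflow.
refresh.children[1].merge()
merged = execute(refresh)
```

Because a composite and a leaf share the same interface, one step can be replaced by several finer steps (or the reverse) without touching the rest of the workflow, which is the kind of flexibility needed when different products cover different parts of the refreshment process.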
Task Suppose that the data for analysis include the attribute age. The age values for
the data tuples are (in increasing order):
13, 15, 16, 19, 20, 21, 22, 22, 25, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 58
1. Use smoothing by bin means to smooth the above data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
2. How might you determine outliers in the data?
3. Use min-max transformation to transform the value 35 for age onto the range
[0.0, 1.0].
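As a starting point for questions 1 and 3, the two techniques can be sketched in Python. This is a minimal sketch under stated assumptions, not a model answer: the function names smooth_by_bin_means and min_max are hypothetical, the bins are taken as equal-depth partitions of the sorted values, and because the list holds 26 values the last bin is assumed to contain only the 2 remaining values.

```python
from typing import List

ages = [13, 15, 16, 19, 20, 21, 22, 22, 25, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 58]


def smooth_by_bin_means(values: List[float], depth: int) -> List[float]:
    """Partition the sorted values into bins of `depth` items and
    replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed: List[float] = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]  # last bin may be shorter
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def min_max(value: float, old_min: float, old_max: float,
            new_min: float = 0.0, new_max: float = 1.0) -> float:
    """Map `value` linearly from [old_min, old_max] onto [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


smoothed = smooth_by_bin_means(ages, depth=3)
normalized_35 = min_max(35, min(ages), max(ages))
```

With this data the minimum is 13 and the maximum is 58, so the value 35 maps to (35 - 13) / (58 - 13) = 22/45, roughly 0.49. Printing smoothed next to ages shows how each bin of three values is flattened to its mean.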
9.4 Implementation of the Approach
In this section, we show how the refreshment process can be defined as a workflow application.
We illustrate the interest of this approach by the ability to define different scenarios depending
on user requirements, source constraints and data warehouse constraints. We show that these
scenarios may evolve over time to accommodate changes in any of these requirements
and constraints.
9.4.1 The Workflow of the Refreshment Process
The refreshment process aims to propagate changes that arise in the data sources to the data
warehouse stores. This propagation is done through a set of independent activities (extraction,
cleaning, integration, ...) that can be organized in different ways, depending on the semantics one
wants to assign to the refreshment process and on the quality one wants to achieve. The ordering
of these activities and the context in which they are executed define this semantics and influence
this quality. Ordering and context result from the analysis of view definitions, data source
constraints and user requirements in terms of quality factors. In the following subsections, we will