Page 182 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




                                   To summarize the previous discussion, the refreshment process is a complex system
                                   that may be composed of asynchronous and parallel activities, which need to be monitored.
                                   It is an event-driven system that evolves frequently, following the
                                   evolution of data sources and user requirements. Users, data warehouse administrators and data
                                   source administrators may each impose specific constraints: respectively, freshness of data, space
                                   limitations of the ODS or CDW, and access frequency to sources. There is no simple and unique
                                   refreshment strategy that is suitable for all data warehouse applications, for all data warehouse
                                   users, or for the whole data warehouse lifetime.

                                   9.3.2 The Refreshment Process is a Workflow


                                   A workflow is a set of coordinated activities, manual or automated, that are
                                   performed by actors. Workflow concepts have been used in various application domains such
                                   as business process modeling, cooperative applications modeling and database transaction
                                   modeling. Depending on the application domain, activities and their coordination are defined using
                                   appropriate specification languages such as statechart diagrams, Petri nets or active rules. In
                                   spite of this diversity of applications and representations, most workflow users refer more
                                   or less to the concepts and terminology defined by the Workflow Management Coalition. Workflow systems
                                   are supposed to provide a high level of flexibility to recursively decompose and merge activities, and
                                   to allow dynamic reorganization of the workflow process. These features are particularly useful in the
                                   context of data warehouse refreshment, as the activities are performed by commercial products whose
                                   functionalities and scope differ from one product to another.
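
The idea of a workflow as a set of coordinated activities can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the notation of any workflow product: the activity names are borrowed from the refreshment activities named in this unit (extraction, cleaning, integration), while `run_workflow`, `actions` and the two dependency maps are hypothetical names introduced here. Coordination is expressed as "which activities must finish first", and the standard-library `graphlib` module (Python 3.9 or later) derives a valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(dependencies, actions):
    """Run each activity after all the activities it depends on."""
    order = list(TopologicalSorter(dependencies).static_order())
    log = [actions[name]() for name in order]
    return order, log


# Hypothetical activities: each one only reports that it ran.
actions = {
    name: (lambda name=name: f"{name} done")
    for name in ("extraction", "cleaning", "integration")
}

# Each activity maps to the set of activities that must finish before it.
dependencies = {
    "extraction": set(),
    "cleaning": {"extraction"},
    "integration": {"cleaning"},
}

order, log = run_workflow(dependencies, actions)
print(order)

# Reorganizing the workflow only means supplying a different dependency map,
# here integrating first and cleaning the integrated data afterwards.
reorganized = {
    "extraction": set(),
    "integration": {"extraction"},
    "cleaning": {"integration"},
}
new_order, _ = run_workflow(reorganized, actions)
print(new_order)
```

Keeping the ordering in a separate data structure, rather than hard-coding it in the activities, is what makes this kind of reorganization cheap.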




                                      Task     Suppose that the data for analysis include the attribute age. The age values for
                                     the data tuples are (in increasing order):
                                     13, 15, 16, 19, 20, 21, 22, 22, 25, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 58
                                     1.   Use smoothing by bin means to smooth the above data, using a bin depth of 3.
                                          Illustrate your steps. Comment on the effect of this technique for the given data.

                                     2.   How might you determine outliers in the data?
                                     3.   Use min-max normalization to transform the value 35 for age onto the range
                                          [0.0, 1.0].
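
Items 1 and 3 of the task can be checked with a short Python sketch. It makes two assumptions that the task does not state: since the 26 values do not divide evenly into bins of depth 3, the last bin is simply left with the two remaining values, and min-max normalization uses the usual formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function names `smooth_by_bin_means` and `min_max` are illustrative only.

```python
ages = [13, 15, 16, 19, 20, 21, 22, 22, 25, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 58]


def smooth_by_bin_means(values, depth):
    """Equal-depth binning: replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]  # last bin may be shorter
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([round(bin_mean, 2)] * len(bin_values))
    return smoothed


def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    scale = (value - old_min) / (old_max - old_min)
    return scale * (new_max - new_min) + new_min


smoothed = smooth_by_bin_means(ages, 3)
scaled_35 = min_max(35, min(ages), max(ages))

print(smoothed)
print(round(scaled_35, 4))
```

Under these assumptions the first bin (13, 15, 16) is replaced by its mean of about 14.67, and the value 35 maps to (35 - 13) / (58 - 13) = 22/45, roughly 0.49.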


                                   9.4 Implementation of the Approach

                                   In this section, we show how the refreshment process can be defined as a workflow application.
                                   We illustrate the interest of this approach by the ability to define different scenarios depending
                                   on user requirements, source constraints and data warehouse constraints. We also show that these
                                   scenarios may evolve over time to accommodate changes in any of these requirements
                                   and constraints.

                                   9.4.1 The Workflow of the Refreshment Process

                                   The refreshment process aims to propagate changes arising in the data sources to the data
                                   warehouse stores. This propagation is done through a set of independent activities (extraction,
                                   cleaning, integration, ...) that can be organized in different ways, depending on the semantics one
                                   wants to assign to the refreshment process and on the quality one wants to achieve. The ordering
                                   of these activities and the context in which they are executed define these semantics and influence
                                   this quality. Ordering and context result from the analysis of view definitions, data source
                                   constraints and user requirements in terms of quality factors. In the following subsections, we will



          176                              Lovely Professional University