Page 162 - DCAP603_DATAWARE_HOUSING_AND

Data Warehousing and Data Mining

It is convenient to associate a wrapper with each data source in order to provide a uniform
description of the capabilities of the data sources. Moreover, the role of the wrapper in a data
warehouse context is enlarged. Its first functionality is to give a description of the data stored by
each data source in a common data model. I assume that this common model is the relational data
model. This is the typical functionality of a wrapper in a classical wrapper/mediator architecture;
therefore, I will call it wrapper functionality. The second functionality is to detect the changes of
interest that have happened in the underlying data source. This is a specific functionality required
by data warehouse architectures in order to support the refreshment of the data warehouse in
an incremental way. For this reason, I reserve the term change monitoring to refer to this kind of
functionality.
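
The idea of change monitoring can be illustrated with a small sketch. The text does not prescribe how changes are detected; the version below assumes one simple possibility, comparing two successive snapshots of a source table keyed by primary key. The function name detect_changes and the sample rows are hypothetical.

```python
def detect_changes(old_snapshot, new_snapshot):
    """Compare two snapshots of a source table and report the differences.

    Each snapshot is a dict mapping a primary key to a row tuple.
    Snapshot comparison is an assumed technique, not one named in the text.
    """
    old_keys = old_snapshot.keys()
    new_keys = new_snapshot.keys()

    # Rows whose key appears only in the new snapshot.
    inserted = {k: new_snapshot[k] for k in new_keys - old_keys}
    # Rows whose key appears only in the old snapshot.
    deleted = {k: old_snapshot[k] for k in old_keys - new_keys}
    # Rows present in both snapshots but with different contents.
    updated = {
        k: (old_snapshot[k], new_snapshot[k])
        for k in old_keys & new_keys
        if old_snapshot[k] != new_snapshot[k]
    }
    return inserted, deleted, updated


# Hypothetical source contents at two points in time.
before = {1: ("Ada", "London"), 2: ("Grace", "New York")}
after = {1: ("Ada", "Paris"), 3: ("Edsger", "Austin")}

inserted, deleted, updated = detect_changes(before, after)
```

Only the three returned sets of differences would need to be propagated to the data warehouse, which is what makes the refreshment incremental rather than a full reload.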

Wrapper functionality

The principal function of the wrapper with respect to this functionality is to make the underlying data
source appear to have the same data format and model as those used in the data warehouse
system. For instance, if the data source is a set of XML documents and the data model used in the
data warehouse is the relational model, then the wrapper must be defined in such a way that it
presents data sources of this type as if they were relational.
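
The XML case can be sketched as follows. This is a minimal illustration, assuming a hypothetical XML source whose element and attribute names (customers, customer, id, name, city) are invented for the example; it uses Python's standard xml.etree.ElementTree module to expose each element as one tuple of a relation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source; the element and attribute names are assumptions.
XML_SOURCE = """
<customers>
  <customer id="1"><name>Ada</name><city>London</city></customer>
  <customer id="2"><name>Grace</name><city>New York</city></customer>
</customers>
"""

# Relational schema that the wrapper exposes to the warehouse (assumed).
SCHEMA = ("id", "name", "city")


def xml_as_relation(xml_text):
    """Present each <customer> element as one tuple over SCHEMA."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall("customer"):
        rows.append((
            int(elem.get("id")),      # attribute becomes the key column
            elem.findtext("name"),    # child element text becomes a column
            elem.findtext("city"),
        ))
    return rows


rows = xml_as_relation(XML_SOURCE)
```

The rest of the system then sees only SCHEMA and a list of tuples, and does not need to know that the source is a set of XML documents.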
The development of wrapper generators has received attention from the research community,
especially in the case of sources that contain semi-structured data such as HTML or SGML
documents. These tools make it possible, for instance, to query the documents through an OQL-based
interface.
Another important function that should be implemented by the wrapper is to establish
communication with the underlying data source and allow the transfer of information between
the data source and the change monitor component. If the data warehouse system and the data
source share the same data model, then the function of the wrapper is just to translate
the data format and to support the communication with the data source. For data sources that
are relational systems, and supposing that the data model used in the data warehouse is also
relational, it is possible to use wrappers that have been developed by software companies such as
database vendors or database-independent companies. These wrappers, also called “middleware”,
“gateways” or “brokers”, have varying capabilities in terms of application programming interface,
performance and extensibility.
In the client-server database environment, several kinds of middleware have already been
developed to enable the exchange of queries and their associated answers between a client
application and a database server, or between database servers, in a transparent way. The term
“transparent” usually means that the middleware hides the underlying network protocol, the
database systems and the database query languages supported by these database systems from
the application.
The usual sequence of steps during the interaction of a client application and a database server
through a middleware agent is as follows. First, the middleware enables the application to connect
to and disconnect from the database server. Then, it allows the preparation and execution of requests.
A request preparation specifies the request with formal parameters, which generally entails its
compilation in the server. A prepared request can then be executed by invoking its name and
passing its actual parameters. Requests are generally expressed in SQL. Another functionality
offered by middleware is the fetching of results, which enables a client application to get back all
or part of the result of a request. When the results are large, they can be cached on the server. The
transfer of requests and results is often built on a protocol supporting remote procedure calls.
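
The sequence of steps above can be sketched with Python's standard sqlite3 module. This is only an approximation under stated assumptions: an in-memory SQLite database stands in for a remote database server, the orders table and its rows are hypothetical, and SQLite prepares the parameterized statement internally rather than exposing a named prepared request.

```python
import sqlite3

# 1. Connect (an in-memory SQLite database stands in for a remote server).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only so the example is self-contained.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "Ada", 120.0), (2, "Grace", 75.5), (3, "Ada", 42.0)],
)

# 2. Specify the request with a formal parameter (the "?" placeholder).
request = "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id"

# 3. Execute the request by passing the actual parameter.
cursor = conn.execute(request, ("Ada",))

# 4. Fetch part of the result first, then the remainder.
first_batch = cursor.fetchmany(1)
remaining = cursor.fetchall()

# 5. Disconnect.
conn.close()
```

Fetching with fetchmany and then fetchall mirrors the idea that a client can retrieve all or only part of a result at a time.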
There has been a significant effort to standardize the programming interface offered by
middleware and the underlying communication protocol. Call Level Interface (CLI) is a
standardized API developed by the X/Open standardization committee. It enables a client
application to extract data from a relational database server through a standard SQL-based

Lovely Professional University
