Page 236 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
12.2.5 The Meta Data Hub
The meta data hub is used for managing the interchange and sharing of technical meta data
between decision processing products. It is intended for use primarily by technical staff during
the development and maintenance of data warehouses. The four main requirements of such a
hub are:
1. A meta data hub should support the interchange of meta data between systems and
products in a distributed meta data environment. The hub should have a documented
and open programmatic object interface (employing COM or CORBA, for example)
that enables third-party tools to use the services of the hub. A file transfer mechanism
supporting industry-recognized file formats (comma-delimited files, Meta Data Coalition
MDIS, Microsoft XML Interchange Format, for example) should also be provided for meta
data interchange.
2. A meta data hub should provide persistent stores for the management and sharing of meta
data. Meta data in a store should be maintainable by the object API and file transfer methods
outlined above and via supplied GUI and Web client interactive interfaces. An interactive and
batch meta data impact analysis and reporting feature is also required. The hub should offer an
agent interface that can scan and capture, at user-defined intervals, local products and systems
for new or modified meta data for adding to the meta data store. The meta data manager used
to maintain meta data in the store should support version and library control features that
can create a historical record of meta data changes and support group development. In large
distributed environments, the administrator should be able to physically partition the meta
data environment across multiple hub servers and meta data stores.
3. The meta data hub should, at a minimum, be able to manage data warehouse information
store definitions. Formats supported should include relational tables and columns, and
multidimensional measures and dimensions. Another type of meta data that could be
handled is information about the data sources used to create data warehouse information
and about the transforms applied to this source data before it is loaded in a warehouse. It
is recognized, however, that current ETL tools use their own proprietary transformation
methods, making it difficult to create a generalized facility for managing this type of
meta data. The product should at least provide the ability to document data source and
transformation meta data in free-form text format. Ideally, the hub should also document
details about the business meta data associated with the common business model discussed
earlier and the business views employed by business intelligence tools and analytic
applications to access warehouse information.
4. The hub should use industry-standard meta data models or supply its own meta-models for
the various types of meta data it manages. These meta-models should be fully documented
and extensible.
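As a rough illustration of the requirements above, the following Python sketch models a small meta data store that holds relational column definitions, keeps a version history of changes, and exchanges definitions as comma-delimited text. It is a minimal sketch, not the design of any actual product: the names ColumnDef, MetaDataStore, put, export_csv and import_csv are hypothetical, the store is held in memory rather than persisted, and the sample table and column are invented for the example.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnDef:
    """Technical meta data for one relational column (hypothetical model)."""
    table: str
    column: str
    data_type: str
    source: str = ""  # free-form note on the data source and transformation


@dataclass
class MetaDataStore:
    """In-memory stand-in for a hub's meta data store, with version history."""
    _versions: dict = field(default_factory=dict)  # (table, column) -> [ColumnDef]

    def put(self, col: ColumnDef) -> int:
        """Store a definition and return its current 1-based version number."""
        history = self._versions.setdefault((col.table, col.column), [])
        if not history or history[-1] != col:
            history.append(col)  # only real changes create a new version
        return len(history)

    def history(self, table: str, column: str) -> list:
        """All recorded versions of one definition, oldest first."""
        return list(self._versions.get((table, column), []))

    def current(self) -> list:
        """Latest version of every definition, ordered by table and column."""
        return [self._versions[key][-1] for key in sorted(self._versions)]

    def export_csv(self) -> str:
        """Write the latest definitions as comma-delimited text."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["table", "column", "data_type", "source"])
        for c in self.current():
            writer.writerow([c.table, c.column, c.data_type, c.source])
        return buf.getvalue()

    def import_csv(self, text: str) -> int:
        """Load definitions from comma-delimited text; return the rows read."""
        rows = list(csv.DictReader(io.StringIO(text)))
        for r in rows:
            self.put(ColumnDef(r["table"], r["column"], r["data_type"], r["source"]))
        return len(rows)


# Usage with invented sample data: change a definition, then share it.
hub_a = MetaDataStore()
hub_a.put(ColumnDef("sales_fact", "amount", "DECIMAL(10,2)", "orders.total, in USD"))
hub_a.put(ColumnDef("sales_fact", "amount", "DECIMAL(12,2)", "orders.total, in USD"))

hub_b = MetaDataStore()
hub_b.import_csv(hub_a.export_csv())  # file-style interchange between two stores
```

In this sketch only the latest version travels through the comma-delimited exchange, so hub_b starts its own history. A real hub would also need the object API, agent scanning, and impact analysis described above, which are left out here.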
12.3 A Repository Model for the DWQ Framework
In the DWQ (Data Warehouse Quality) project we have advocated the need for enriched metadata
facilities for the exploitation of the knowledge collected in a data warehouse.
The proposed categorization of the DW metadata is based on a 3x3 framework, depicted in
Figure 12.2: we identified three perspectives (conceptual, logical and physical) and three levels
(source, data warehouse, client). We observed that the conceptual perspective, which
represents the real world of an enterprise, is missing in most data warehousing projects, with the
risk of incorrectly representing or interpreting the information found in the data warehouse.
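The 3x3 categorization can be pictured as a grid of nine cells, one per combination of perspective and level. The Python sketch below is only an illustration of that grid, assuming hypothetical names (Perspective, Level, empty_cells) and an invented catalog. It shows how the gap described above would appear for a project that documents only its logical and physical models.

```python
from enum import Enum
from itertools import product


class Perspective(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


class Level(Enum):
    SOURCE = "source"
    DATA_WAREHOUSE = "data warehouse"
    CLIENT = "client"


def empty_cells(catalog):
    """Return the (perspective, level) cells that hold no meta data objects."""
    return [cell for cell in product(Perspective, Level) if not catalog.get(cell)]


# Invented catalog: only the logical and physical perspectives are documented.
catalog = {
    (p, lvl): [f"{p.value} model of {lvl.value}"]
    for p, lvl in product((Perspective.LOGICAL, Perspective.PHYSICAL), Level)
}

# The three conceptual cells (source, data warehouse, client) come back empty.
missing = empty_cells(catalog)
```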