
Data Warehousing and Data Mining




                                   12.2.5 The Meta Data Hub

                                   The meta data hub is used for managing the interchange and sharing of technical meta data
                                   between decision processing products. It is intended for use primarily by technical staff during
                                   the development and maintenance of data warehouses. The four main requirements of such a
                                   hub are:
                                   1.   A meta data hub should support the interchange of meta data between systems and
                                        products in a distributed meta data environment. The hub should have a documented
                                        and open programmatic object interface (employing COM or CORBA, for example)
                                        that enables third-party tools to use the services of the hub. A file transfer mechanism
                                        supporting industry-recognized file formats (comma-delimited files, Meta Data Coalition
                                        MDIS, Microsoft XML Interchange Format, for example) should also be provided for meta
                                        data interchange.
                                   2.   A meta data hub should provide persistent stores for the management and sharing of meta
                                       data. Meta data in a store should be maintainable by the object API and file transfer methods
                                        outlined above and via the supplied GUI and Web client interfaces. Meta data impact
                                        analysis and reporting should be available both interactively and in batch. The hub should
                                        offer an agent interface that can scan local products and systems at user-defined intervals
                                        and capture new or modified meta data for addition to the meta data store. The meta data manager used
                                       to maintain meta data in the store should support version and library control features that
                                       can create a historical record of meta data changes and support group development. In large
                                       distributed environments, the administrator should be able to physically partition the meta
                                       data environment across multiple hub servers and meta data stores.
                                   3.   The meta data hub should, at a minimum, be able to manage data warehouse information
                                        store definitions. Formats supported should include relational tables and columns, and
                                        multidimensional measures and dimensions. Another type of meta data that could be
                                        handled is information about the data sources used to create data warehouse information
                                        and about the transforms applied to this source data before it is loaded into a warehouse. It
                                        is recognized, however, that current ETL tools use their own proprietary transformation
                                        methods, making it difficult to create a generalized facility for managing this type of
                                        meta data. The product should at least provide the ability to document data source and
                                        transformation meta data in free-form text format. Ideally, the hub should also document
                                        details about the business meta data associated with the common business model discussed
                                        earlier and the business views employed by business intelligence tools and analytic
                                        applications to access warehouse information.
                                   4.   The hub should use industry-standard meta data models or supply its own meta-models for
                                        the various types of meta data it manages. These meta-models should be fully documented
                                        and extensible.
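
The version-control and impact-analysis requirements above can be illustrated with a minimal sketch. It assumes a purely in-memory store; the class name `MetaDataStore`, its methods `put`, `history` and `impact_of`, and the object names used in the example are all hypothetical and do not correspond to any particular hub product.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDataStore:
    """Hypothetical in-memory stand-in for a hub's persistent meta data store."""

    # object name -> list of definitions, oldest first (the version history)
    versions: dict = field(default_factory=dict)
    # object name -> set of names of the objects it is derived from
    sources: dict = field(default_factory=dict)

    def put(self, name, definition, derived_from=()):
        """Record a new version only if the definition actually changed."""
        history = self.versions.setdefault(name, [])
        if not history or history[-1] != definition:
            history.append(definition)
        self.sources[name] = set(derived_from)
        return len(history)  # current version number, starting at 1

    def history(self, name):
        """Return every recorded version of an object, oldest first."""
        return list(self.versions.get(name, []))

    def impact_of(self, name):
        """Return all objects that depend, directly or indirectly, on `name`."""
        affected, frontier = set(), [name]
        while frontier:
            current = frontier.pop()
            for obj, deps in self.sources.items():
                if current in deps and obj not in affected:
                    affected.add(obj)
                    frontier.append(obj)
        return affected


# Illustrative (made-up) source, warehouse and cube definitions.
store = MetaDataStore()
store.put("src.orders", {"columns": ["id", "amount"]})
store.put("dw.fact_sales", {"columns": ["order_id", "revenue"]},
          derived_from=["src.orders"])
store.put("mart.sales_cube", {"measures": ["revenue"], "dimensions": ["month"]},
          derived_from=["dw.fact_sales"])
# A changed source definition creates a second version of src.orders.
store.put("src.orders", {"columns": ["id", "amount", "currency"]})
```

Calling `store.impact_of("src.orders")` then lists the warehouse table and the cube that would need review after the source change, while `store.history("src.orders")` returns both recorded definitions. A real hub would persist this information and expose it through the object API and file transfer interfaces described in requirement 1.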

                                   12.3 A Repository Model for the DWQ Framework


                                   In the DWQ (Data Warehouse Quality) project we have advocated the need for enriched metadata
                                   facilities for the exploitation of the knowledge collected in a data warehouse.
                                   The proposed categorization of the DW metadata is based on a 3x3 framework, depicted in
                                   Figure 12.2: we identified three perspectives (conceptual, logical and physical) and three levels
                                   (source, data warehouse, client). We observed that the conceptual perspective, which
                                   represents the real world of an enterprise, is missing in most data warehousing projects, with the
                                   risk of incorrectly representing or interpreting the information found in the data warehouse.
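
As a rough illustration of the 3x3 categorization, the sketch below models the nine cells as pairs of perspective and level and reports any perspective that has no metadata registered at any level. The names `framework`, `register` and `empty_perspectives`, as well as the two sample entries, are assumptions made for this example and are not part of the DWQ repository model itself.

```python
from enum import Enum
from itertools import product


class Perspective(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


class Level(Enum):
    SOURCE = "source"
    DATA_WAREHOUSE = "data warehouse"
    CLIENT = "client"


# One slot per (perspective, level) cell; each holds names of metadata objects.
framework = {cell: [] for cell in product(Perspective, Level)}


def register(perspective, level, name):
    """File a metadata object under one cell of the 3x3 grid."""
    framework[(perspective, level)].append(name)


def empty_perspectives():
    """Return the perspectives that have no metadata at any level."""
    return [p for p in Perspective
            if not any(framework[(p, level)] for level in Level)]


# Hypothetical project that documents only logical and physical metadata.
register(Perspective.LOGICAL, Level.SOURCE, "orders schema")
register(Perspective.PHYSICAL, Level.DATA_WAREHOUSE, "fact_sales table")
```

For this sample project, `empty_perspectives()` returns only the conceptual perspective, which mirrors the gap the text describes in most data warehousing projects.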









                                   Lovely Professional University