Page 28 - DCAP606_BUSINESS_INTELLIGENCE
P. 28

Unit 2: Multidimensional Analysis




                                                                                                Notes
            For fact table data that has a much larger number of interesting groups (such as per-
            product sales information of a large retailer), retaining the sparse format becomes critical
            to avoid densification of such high cardinality information. Oracle Data Mining algorithms
            are designed to interpret missing entries in a sparse fact table appropriately, enabling
            increased performance and simpler transformation processing.
            Some steps in the referenced case study are not completely defined (in my opinion), and in
            those situations I will take my best guess as to the intended objective. This approximation
            is sufficient since the intent of this series of posts is to show the power and flexibility of
            Oracle Data Mining on a real-world scenario rather than to match the case study letter-for-
            letter.
            The following files support reproduction of the results in this series of posts:

            telcoddl.sql - SQL which creates the four tables
            telcoloadproc.plb - Obfuscated SQL which creates the procedure that can generate data
            and populate the tables - all data is generated, and patterns are injected to make it interesting
            and “real-world” like
            telcoprep.sql - A SQL create view statement corresponding to the data preparation steps
            from part 2 of this series
            telcomodel.sql - A SQL script corresponding to the steps from part 3 of this series
            In order to prepare a schema that can run the above SQL, a user must be created with the
            following privileges: create table, create view, create mining model, and create procedure
            (for telcoloadproc), as well as any other privs as needed for the database user (e.g., create
            session).Once the schema is prepared, telcoddl.sql and telcoloadproc.plb can be run to
            create the empty tables and the procedure for loading data. The procedure that is created
            is named telco_load, and it takes one optional argument - the number of customers (default
            10000). The results from parts 2 and 3 of this series correspond to loading 10,000 customers.

            The sample code in these posts has been tested against an 11gR2 database. Many new
            features have been added in each release, so some of the referenced routines and syntax
            are not available in older releases; however, similar functionality can be achieved with
            10g. The following modified scripts can be used with 10g (tested with 10gR2):
            telcoprep_10g.sql - A SQL create view statement corresponding to the data preparation
            steps from part 2 of this series, including substitution for the 11g PIVOT syntax and
            inclusion of manual data preparation for nested columns.

            telcomodel_10g.sql - A SQL script corresponding to the steps from part 3 of this series,
            including substitution of the Generalized Linear Model algorithm for 10g Support Vector
            Machine, manual data preparation leveraging the transformation package, use of
            dbms_data_mining.apply instead of 10gR2 built-in data mining scoring functions, explicit
            commit of settings prior to build, and removal of the EXPLAIN routine from the script
            flow.
            In addition, the create mining model privilege is not available in 10g.

            1.   Handling missing values for call data records: The CDR_T table records the number
                 of phone minutes used by a customer per month and per call type (tariff). For
                 example, the table may contain one record corresponding to the number of peak
                 (call type) minutes in January for a specific customer, and another record associated
                 with international calls in March for the same customer. This table is likely to be
                 fairly dense (most type-month combinations for a given customer will be present)
                                                                                 Contd....



                                           LOVELY PROFESSIONAL UNIVERSITY                                   23
   23   24   25   26   27   28   29   30   31   32   33