Page 30 - DCAP606_BUSINESS_INTELLIGENCE
P. 30

Unit 2: Multidimensional Analysis




                                                                                                Notes
              from (<prev query>)
              group by cust_id, tariff, rollup(month))
              group by cust_id;
                 The above query will first aggregate the minutes by cust_id-tariff-month
                 combination, but it will also rollup the month column to produce a total for each
                 cust_id-tariff combination. While the data in the case study was already aggregated
                 at the month level, the above query would also work on data that is at a finer
                 granularity.
                 Once the data is generated by the inner query, there is an outer group by on cust_id
                 with the COLLECT operation. The purpose of this step is to generate an output of
                 one row per customer, but each row contains an entry of type
                 DM_NESTED_NUMERICALS. This entry is a collection of pairs that capture the
                 number of minutes per tariff-month combination.
                 While we performed missing value replacement in the previous transformation
                 step, thereby densifying the data, Oracle Data Mining has a natural treatment for
                 missing rows. When data is presented as a DM_NESTED_NUMERICALS column, it
                 is assumed that any missing entries correspond to a zero in the value - matching the
                 first option for missing value treatment described earlier. If this is the correct
                 interpretation for missing values, then no missing value treatment step is necessary.
                 The data can remain in sparse form, yet the algorithms will correctly interpret the
                 missing entries as having an implicit value of zero.

            3.   Transposing Revenue Records: Again, no need to transpose when using Oracle Data
                 Mining. We add an aggregate to produce the total revenue per customer in addition
                 to the per-month breakout coming from the COLLECT.
             select cust_id, sum(revenue) rev_tot_sum,
              cast(collect(dm_nested_numerical(‘REV-’||month, revenue))
              as dm_nested_numericals) rev_per_mon
             from revenues
             group by cust_id;
             4.  Creating Derived Attributes: The final transformation step in the case study is to
                 generate some additional derived attributes, and connect everything together so
                 that each customer is composed of a single entity that includes all of the attributes
                 that have been identified to this point.
                 The PIVOT operation is used to generate named columns that can be easily combined
                 with arithmetic operations. Binning and filtering steps, as identified in the case
                 study, are included in the above SQL.
                 The query can execute in parallel on SMPs, as well as MPPs using Oracle’s RAC
                 technology. The data can be directly fed to Oracle Data Mining without having to
                 extract it from the database, materialize copies of any parts of the underlying tables,
                 or pivot data that is in a naturally multi-row format.
             Questions:
             1.  How the missing values were handled for call data records?

             2.  Why do we transposing revenue records?
          Source:  http://amozes-oracle.blogspot.in/2010/12/mining-star-schema-telco-churn-case.html





                                           LOVELY PROFESSIONAL UNIVERSITY                                   25
   25   26   27   28   29   30   31   32   33   34   35