Page 29 - DCAP606_BUSINESS_INTELLIGENCE
P. 29

Business Intelligence




                    Notes
                                          due to the coarse level of aggregation, but there may be some missing values.
                                          Missing entries may occur for a number of reasons: the customer made no calls of a
                                          particular type in a particular month, the customer switched providers during the
                                          timeframe, or perhaps there is a data entry problem. In the first situation, the correct
                                          interpretation of a missing entry would be to assume that the number of minutes for
                                          the type-month combination is zero. In the other situations, it is not appropriate to
                                          assume zero, but rather derive some representative value to replace the missing
                                          entries. The referenced case study takes the latter approach. The data is segmented
                                          by customer and call type, and within a given customer-call type combination, an
                                          average number of minutes is computed and used as a replacement value.
                                          In SQL, we need to generate additional rows for the missing entries and populate
                                          those rows with appropriate values. To generate the missing rows, Oracle’s partition
                                          outer join feature is a perfect fit.
                                     select cust_id, cdre.tariff, cdre.month, mins
                                     from cdr_t cdr partition by (cust_id) right outer join
                                      (select distinct tariff, month from cdr_t) cdre
                                      on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);

                                          I have chosen to use a distinct on the CDR_T table to generate the set of values, but
                                          a more rigorous and performant (but less compact) approach would be to explicitly
                                          list the tariff-month combinations in the cdre inlined subquery rather than go directly
                                          against the CDR_T table itself.
                                          Now that the missing rows are generated, we need to replace the missing value
                                          entries with representative values as computed on a per-customer-call type basis.
                                          Oracle’s analytic functions are a great match for this step.
                                     select cust_id, tariff, month,
                                      nvl(mins, round(avg(mins) over (partition by cust_id, tariff))) mins
                                     from (<prev query>);
                                          We can use the avg function, and specify the partition by feature of the over clause
                                          to generate an average within each customer-call type group. The nvl function will
                                          replace the missing values with the tailored, computed averages.

                                     2.   Transposing Call Data Records: The next transformation step in the case study
                                          involves transposing the data in CDR_T from a multiple row per customer format
                                          to a single row per customer by generating new columns for all of the tariff-month
                                          combinations. While this is feasible with a small set of combinations, it will be
                                          problematic when addressing items with higher cardinality. Oracle Data Mining
                                          does not need to transpose the data. Instead, the data is combined using Oracle’s
                                          object-relational technology so that it can remain in its natural, multi-row format.
                                          Oracle Data Mining has introduced two data types to capture such data -
                                          DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS.
                                          In addition, the case study suggests adding an attribute which contains the total
                                          number of minutes per call type for a customer (summed across all months). Oracle’s
                                          rollup syntax is useful for generating aggregates at different levels of granularity.
                                     select cust_id,
                                      cast(collect(dm_nested_numerical(tariff||’-’||nvl(month,’ALL’),mins))
                                      as dm_nested_numericals) mins_per_tariff_mon from
                                      (select cust_id, tariff, month, sum(mins) mins
                                                                                                         Contd....



          24                                LOVELY PROFESSIONAL UNIVERSITY
   24   25   26   27   28   29   30   31   32   33   34