Page 29 - DCAP606_BUSINESS_INTELLIGENCE
P. 29
Business Intelligence
Notes
due to the coarse level of aggregation, but there may be some missing values.
Missing entries may occur for a number of reasons: the customer made no calls of a
particular type in a particular month, the customer switched providers during the
timeframe, or perhaps there is a data entry problem. In the first situation, the correct
interpretation of a missing entry would be to assume that the number of minutes for
the type-month combination is zero. In the other situations, it is not appropriate to
assume zero, but rather derive some representative value to replace the missing
entries. The referenced case study takes the latter approach. The data is segmented
by customer and call type, and within a given customer-call type combination, an
average number of minutes is computed and used as a replacement value.
In SQL, we need to generate additional rows for the missing entries and populate
those rows with appropriate values. To generate the missing rows, Oracle’s partition
outer join feature is a perfect fit.
select cust_id, cdre.tariff, cdre.month, mins
from cdr_t cdr partition by (cust_id) right outer join
(select distinct tariff, month from cdr_t) cdre
on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);
I have chosen to use a distinct on the CDR_T table to generate the set of values, but
a more rigorous and performant (but less compact) approach would be to explicitly
list the tariff-month combinations in the cdre inlined subquery rather than go directly
against the CDR_T table itself.
Now that the missing rows are generated, we need to replace the missing value
entries with representative values as computed on a per-customer-call type basis.
Oracle’s analytic functions are a great match for this step.
select cust_id, tariff, month,
nvl(mins, round(avg(mins) over (partition by cust_id, tariff))) mins
from (<prev query>);
We can use the avg function, and specify the partition by feature of the over clause
to generate an average within each customer-call type group. The nvl function will
replace the missing values with the tailored, computed averages.
2. Transposing Call Data Records: The next transformation step in the case study
involves transposing the data in CDR_T from a multiple row per customer format
to a single row per customer by generating new columns for all of the tariff-month
combinations. While this is feasible with a small set of combinations, it will be
problematic when addressing items with higher cardinality. Oracle Data Mining
does not need to transpose the data. Instead, the data is combined using Oracle’s
object-relational technology so that it can remain in its natural, multi-row format.
Oracle Data Mining has introduced two data types to capture such data -
DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS.
In addition, the case study suggests adding an attribute which contains the total
number of minutes per call type for a customer (summed across all months). Oracle’s
rollup syntax is useful for generating aggregates at different levels of granularity.
select cust_id,
cast(collect(dm_nested_numerical(tariff||’-’||nvl(month,’ALL’),mins))
as dm_nested_numericals) mins_per_tariff_mon from
(select cust_id, tariff, month, sum(mins) mins
Contd....
24 LOVELY PROFESSIONAL UNIVERSITY