Page 30 - DCAP606_BUSINESS_INTELLIGENCE
P. 30
Unit 2: Multidimensional Analysis
Notes
from (<prev query>)
group by cust_id, tariff, rollup(month))
group by cust_id;
The above query will first aggregate the minutes by cust_id-tariff-month
combination, but it will also rollup the month column to produce a total for each
cust_id-tariff combination. While the data in the case study was already aggregated
at the month level, the above query would also work on data that is at a finer
granularity.
Once the data is generated by the inner query, there is an outer group by on cust_id
with the COLLECT operation. The purpose of this step is to generate an output of
one row per customer, but each row contains an entry of type
DM_NESTED_NUMERICALS. This entry is a collection of pairs that capture the
number of minutes per tariff-month combination.
While we performed missing value replacement in the previous transformation
step, thereby densifying the data, Oracle Data Mining has a natural treatment for
missing rows. When data is presented as a DM_NESTED_NUMERICALS column, it
is assumed that any missing entries correspond to a zero in the value - matching the
first option for missing value treatment described earlier. If this is the correct
interpretation for missing values, then no missing value treatment step is necessary.
The data can remain in sparse form, yet the algorithms will correctly interpret the
missing entries as having an implicit value of zero.
3. Transposing Revenue Records: Again, no need to transpose when using Oracle Data
Mining. We add an aggregate to produce the total revenue per customer in addition
to the per-month breakout coming from the COLLECT.
select cust_id, sum(revenue) rev_tot_sum,
cast(collect(dm_nested_numerical(‘REV-’||month, revenue))
as dm_nested_numericals) rev_per_mon
from revenues
group by cust_id;
4. Creating Derived Attributes: The final transformation step in the case study is to
generate some additional derived attributes, and connect everything together so
that each customer is composed of a single entity that includes all of the attributes
that have been identified to this point.
The PIVOT operation is used to generate named columns that can be easily combined
with arithmetic operations. Binning and filtering steps, as identified in the case
study, are included in the above SQL.
The query can execute in parallel on SMPs, as well as MPPs using Oracle’s RAC
technology. The data can be directly fed to Oracle Data Mining without having to
extract it from the database, materialize copies of any parts of the underlying tables,
or pivot data that is in a naturally multi-row format.
Questions:
1. How the missing values were handled for call data records?
2. Why do we transposing revenue records?
Source: http://amozes-oracle.blogspot.in/2010/12/mining-star-schema-telco-churn-case.html
LOVELY PROFESSIONAL UNIVERSITY 25