Page 27 - DCAP606_BUSINESS_INTELLIGENCE
P. 27
Business Intelligence
Notes Self Assessment
State whether the following statements are True or False:
5. A hierarchy is a set of parent-child relationships between attributes within a dimension.
6. Level-based hierarchies can also contain parent-child relationships.
7. A Ragged Hierarchy describes a self-referencing relationship, or self-join, within a
dimension main table.
8. Parent attribute is also known as an unbalanced hierarchy.
9. Skip-level is hierarchies of attributes that are used in service of Microsoft SQL Server to
arrange the members of a dimension into hierarchical structures.
10. User-defined is a hierarchy in which certain members do not have values for certain
higher levels are known as skip-level hierarchy.
11. Ragged Hierarchy consists of values that define the hierarchy in a parent-child relationship.
Case Study Mining a Star Schema
ne of the strengths of Oracle Data Mining is the ability to mine star schemas with
minimal effort. Star schemas are commonly used in relational databases, and
Othey often contain rich data with interesting patterns. While dimension tables
may contain interesting demographics, fact tables will often contain user behaviour, such
as phone usage or purchase patterns. Both of these aspects - demographics and usage
patterns - can provide insight into behaviour.
Churn is a critical problem in the telecommunications industry, and companies go to
great lengths to reduce the churn of their customer base. One case study describes a
telecommunications scenario involving understanding, and identification of, churn, where
the underlying data is present in a star schema. That case study is a good example for
demonstrating just how natural it is for Oracle Data Mining to analyse a star schema, so it
will be used as the basis for this series of posts.
The case study schema includes four tables: CUSTOMERS, SERVICES, REVENUES, and
CDR_T. The CUSTOMERS table contains one row per customer, as does the SERVICES
table, and both contain a customer id that can be used to join the tables together. Most data
mining tools are capable of handling this type of data, where one row of input corresponds
to one case for mining. The other two tables have multiple rows for each customer. The
CDR_T (call data records) table contains multiple records for each customer which captures
calling behaviour. In the case study, this information is already pre-aggregated by type of
call (peak, international, etc.) per month, but the information may also be available at a
finer level of granularity. The REVENUES table contains the revenue per customer on a
monthly basis for a five month history, so there are up to five rows per customer. Capturing
the information in the CDR_T and REVENUES table to help predict churn for a single
customer requires collapsing all of this fact table information into a single “case” per
customer. Most tools will require pivoting the data into columns, which has the drawbacks
of densifying data as well as pivoting data beyond column count limitations. The data in
a fact table is often stored in sparse form (this case study aggregates it to a denser form, but
it need not be this way for other mining activities), and keeping it in sparse form is highly
desirable.
Contd....
22 LOVELY PROFESSIONAL UNIVERSITY