Page 27 - DCAP606_BUSINESS_INTELLIGENCE
P. 27

Business Intelligence




                    Notes          Self Assessment

                                   State whether the following statements are True or False:
                                   5.  A hierarchy is a set of parent-child relationships between attributes within a dimension.
                                   6.  Level-based hierarchies can also contain parent-child relationships.

                                   7.  A Ragged Hierarchy describes a self-referencing relationship, or self-join, within a
                                       dimension main table.
                                   8.  Parent attribute is also known as an unbalanced hierarchy.

                                   9.  Skip-level is hierarchies of attributes that are used in service of Microsoft SQL Server to
                                       arrange the members of a dimension into hierarchical structures.
                                   10.  User-defined is a hierarchy in which certain members do not have values for certain
                                       higher levels are known as skip-level hierarchy.
                                   11.  Ragged Hierarchy consists of values that define the hierarchy in a parent-child relationship.




                                     Case Study  Mining a Star Schema

                                            ne of the strengths of Oracle Data Mining is the ability to mine star schemas with
                                            minimal effort. Star schemas are commonly used in relational databases, and
                                     Othey often contain rich data with interesting patterns. While dimension tables
                                     may contain interesting demographics, fact tables will often contain user behaviour, such
                                     as phone usage or purchase patterns. Both of these aspects - demographics and usage
                                     patterns - can provide insight into behaviour.
                                     Churn is a critical problem in the telecommunications industry, and companies go to
                                     great lengths to reduce the churn of their customer base. One case study describes a
                                     telecommunications scenario involving understanding, and identification of, churn, where
                                     the underlying data is present in a star schema. That case study is a good example for
                                     demonstrating just how natural it is for Oracle Data Mining to analyse a star schema, so it
                                     will be used as the basis for this series of posts.
                                     The case study schema includes four tables: CUSTOMERS, SERVICES, REVENUES, and
                                     CDR_T. The CUSTOMERS table contains one row per customer, as does the SERVICES
                                     table, and both contain a customer id that can be used to join the tables together. Most data
                                     mining tools are capable of handling this type of data, where one row of input corresponds
                                     to one case for mining. The other two tables have multiple rows for each customer. The
                                     CDR_T (call data records) table contains multiple records for each customer which captures
                                     calling behaviour. In the case study, this information is already pre-aggregated by type of
                                     call (peak, international, etc.) per month, but the information may also be available at a
                                     finer level of granularity. The REVENUES table contains the revenue per customer on a
                                     monthly basis for a five month history, so there are up to five rows per customer. Capturing
                                     the information in the CDR_T and REVENUES table to help predict churn for a single
                                     customer requires collapsing all of this fact table information into a single “case” per
                                     customer. Most tools will require pivoting the data into columns, which has the drawbacks
                                     of densifying data as well as pivoting data beyond column count limitations. The data in
                                     a fact table is often stored in sparse form (this case study aggregates it to a denser form, but
                                     it need not be this way for other mining activities), and keeping it in sparse form is highly
                                     desirable.
                                                                                                         Contd....



          22                                LOVELY PROFESSIONAL UNIVERSITY
   22   23   24   25   26   27   28   29   30   31   32