Page 199 - DCAP603_DATAWARE_HOUSING_AND

Page 199 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

P. 199

Unit 10: Multi-dimensional Data Models and Aggregation

The major difference between the snowflake and star schema models is that the dimension tables notes
of the snowflake model may be kept in normalised form. Such a table is easy to maintain and also
saves storage space because a large dimension table can be extremely large when the dimensional
structure is included as columns. Since much of this space is redundant data, creating a normalised
structure will reduce the overall space requirement. However, the snowflake structure can reduce
the effectiveness of browsing since more joins will be needed to execute a query. Consequently,
the system performance may be adversely impacted. Performance benchmarking can be used to
determine what is best for your design.

Example: In Snowflake schema, the example diagram shown in Figure 10.6 has 4
dimension tables, 4 lookup tables and 1 fact table. The reason is that hierarchies (category, branch,
state, and month) are being broken out of the dimension tables (PRODUCT, ORGANISATION,
LOCATION, and TIME) respectively and shown separately. In OLAP, this Snowflake schema
approach increases the number of joins and poor performance in retrieval of data.

Figure 10.6: Example of Snowflake Schema

A compromise between the star schema and the snowflake schema is to adopt a mixed schema
where only the very large dimension tables are normalised. Normalising large dimension tables
saves storage space, while keeping small dimension tables unnormalised may reduce the cost and
performance degradation due to joins on multiple dimension tables. Doing both may lead to an
overall performance gain. However, careful performance tuning could be required to determine
which dimension tables should be normalised and split into multiple tables.

LoveLy professionaL university 193

194 195 196 197 198 199 200 201 202 203 204