3.   Store intermediate results: Sometimes the logic for a query can be quite complex. Often it is possible to achieve the desired result through the use of subqueries, inline views, and UNION-type statements. In those cases, the intermediate results are not stored in the database but are used immediately within the query. This can lead to performance problems, especially when the intermediate results contain a large number of rows.
The way to increase query performance in such cases is to store the intermediate results in a temporary table and break the initial SQL statement up into several SQL statements. In many cases, you can even build an index on the temporary table to speed up query performance even more. Granted, this adds a little complexity to query management (i.e., the temporary tables need to be managed), but the speedup in query performance is often worth the trouble.
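
As an illustration, the following sketch breaks a query into steps using an indexed temporary table. It uses Python's built-in sqlite3 module, and the tables and columns (sales, customers, region, amount) are purely illustrative assumptions, not taken from any particular system:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative base tables with a handful of rows.
cur.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 100.0), (2, 250.0), (1, 75.0);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South');
""")

# Step 1: store the intermediate result (per-customer totals) in a temporary
# table instead of repeating the aggregation inside a subquery.
cur.execute("""
    CREATE TEMP TABLE customer_totals AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    GROUP BY customer_id
""")

# Step 2: an index on the temporary table can speed up the subsequent join.
cur.execute("CREATE INDEX temp.idx_ct_customer ON customer_totals (customer_id)")

# Step 3: the final query joins the small, indexed intermediate table.
cur.execute("""
    SELECT c.region, SUM(t.total_amount)
    FROM customer_totals t
    JOIN customers c ON c.customer_id = t.customer_id
    GROUP BY c.region
""")
print(cur.fetchall())
conn.close()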

Specific query optimization strategies:

1.   Use Index: Using an index is the first strategy one should use to speed up a query. In fact, this strategy is so important that index optimization is discussed as a separate topic.
2.   Aggregate Table: Pre-populate tables at higher levels of summarization so that a smaller amount of data needs to be parsed; the first two strategies are sketched in the example after this list.
3.   Vertical Partitioning: Partition the table by columns. This strategy decreases the amount of data a SQL query needs to process.
4.   Horizontal Partitioning: Partition the table by data value, most often time. This strategy also decreases the amount of data a SQL query needs to process.
5.   De-normalization: The process of de-normalization combines multiple tables into a single table. This speeds up query performance because fewer table joins are needed.
6.   Server Tuning: Each server has its own parameters, and tuning them so that the server can fully take advantage of the hardware resources can significantly speed up query performance.
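
The following short sketch illustrates the first two strategies (index and aggregate table), again using Python's sqlite3 module; the daily_sales table and its columns are illustrative assumptions only:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An illustrative daily-level fact table.
cur.executescript("""
    CREATE TABLE daily_sales (sale_date TEXT, product_id INTEGER, amount REAL);
    INSERT INTO daily_sales VALUES
        ('2009-01-05', 1, 10.0),
        ('2009-01-20', 1, 15.0),
        ('2009-02-02', 2, 30.0);
""")

# Strategy 1 (Use Index): index the column that queries filter or join on.
cur.execute("CREATE INDEX idx_daily_sales_date ON daily_sales (sale_date)")

# Strategy 2 (Aggregate Table): pre-populate a table at the monthly level so
# that month-level reports parse far fewer rows than the daily table.
cur.execute("""
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_id,
           SUM(amount) AS total_amount
    FROM daily_sales
    GROUP BY sale_month, product_id
""")

# A month-level query can now read the small aggregate table directly.
cur.execute("SELECT sale_month, SUM(total_amount) FROM monthly_sales GROUP BY sale_month")
print(cur.fetchall())
conn.close()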

5.5 Update Propagation

Data propagation is the distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules. Data warehouses need to manage large volumes of data every day. A data warehouse may start with only a little data and then grow day by day through constant sharing and receiving of data from various data sources.

As data sharing continues, data warehouse management becomes a significant issue. Database administrators need to manage the corporate data more efficiently, in different subsets, groupings and time frames. As a company grows, it may implement more and more data sources, especially if the company expands outside its current geographical location.
Data warehouses, data marts and operational data stores are becoming indispensable tools in today's businesses. These data resources need to be constantly updated, and the updating process involves moving large volumes of data from one system to another and back and forth to a business intelligence system. It is common for high-volume data movement to be performed in batches within a brief window, without sacrificing the performance or availability of operational applications or of data from the warehouse.
The higher the volume of data to be moved, the more challenging and complex the process becomes. It therefore becomes the responsibility of the data warehouse administrator to find means of moving bulk data more quickly, and of identifying and moving only the data that has changed since the last data warehouse update.
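
One common way to identify and move only the changed data is to keep a watermark timestamp from the previous propagation run. The sketch below, using Python's sqlite3 module, is purely illustrative: the orders table, the last_updated column and the propagate() helper are assumptions for the example, not a prescribed method.

import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Illustrative source table; last_updated records when each row last changed.
source.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT);
    INSERT INTO orders VALUES
        (1, 100.0, '2009-03-01 10:00:00'),
        (2, 250.0, '2009-03-02 09:30:00'),
        (3,  80.0, '2009-03-03 14:15:00');
""")
target.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT)"
)

def propagate(src, dst, watermark):
    """Copy only rows changed since the last propagation run."""
    changed = src.execute(
        "SELECT order_id, amount, last_updated FROM orders WHERE last_updated > ?",
        (watermark,),
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount, last_updated) VALUES (?, ?, ?)",
        changed,
    )
    dst.commit()
    # The new watermark is the latest timestamp seen, ready for the next run.
    return max((row[2] for row in changed), default=watermark)

watermark = propagate(source, target, "2009-03-02 00:00:00")
print(target.execute("SELECT order_id FROM orders").fetchall())  # only rows 2 and 3
print(watermark)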

From these challenges, several new data propagation methods have been developed in business enterprises, resulting in data warehouses and operational data stores evolving into mission-



