3.   Store intermediate results: Sometimes the logic for a query can be quite complex. Often it is possible to achieve the desired result through the use of subqueries, inline views, and UNION-type statements. In those cases, the intermediate results are not stored in the database but are used immediately within the query. This can lead to performance problems, especially when the intermediate results contain a large number of rows.
The way to increase query performance in such cases is to store the intermediate results in a temporary table and break the initial SQL statement up into several SQL statements. In many cases, you can even build an index on the temporary table to speed up query performance even more. Granted, this adds a little complexity to query management (i.e., the temporary tables need to be managed), but the speedup in query performance is often worth the trouble.
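
As an illustration, the following sketch breaks a query into steps using an indexed temporary table. It uses Python's built-in sqlite3 module, and the tables and columns (sales, customers, region, amount) are purely illustrative assumptions, not taken from any particular system:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative base tables with a handful of rows.
cur.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 100.0), (2, 250.0), (1, 75.0);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South');
""")

# Step 1: store the intermediate result (per-customer totals) in a temporary
# table instead of repeating the aggregation inside a subquery.
cur.execute("""
    CREATE TEMP TABLE customer_totals AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    GROUP BY customer_id
""")

# Step 2: an index on the temporary table can speed up the subsequent join.
cur.execute("CREATE INDEX temp.idx_ct_customer ON customer_totals (customer_id)")

# Step 3: the final query joins the small, indexed intermediate table.
cur.execute("""
    SELECT c.region, SUM(t.total_amount)
    FROM customer_totals t
    JOIN customers c ON c.customer_id = t.customer_id
    GROUP BY c.region
""")
print(cur.fetchall())
conn.close()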

Specific query optimization strategies:

1.   Use Index: Using an index is the first strategy one should use to speed up a query. In fact, this strategy is so important that index optimization is discussed as a separate topic.
2.   Aggregate Table: Pre-populate tables at higher levels of summarization so that a smaller amount of data needs to be parsed; the first two strategies are sketched in the example after this list.
3.   Vertical Partitioning: Partition the table by columns. This strategy decreases the amount of data a SQL query needs to process.
4.   Horizontal Partitioning: Partition the table by data value, most often time. This strategy also decreases the amount of data a SQL query needs to process.
5.   De-normalization: The process of de-normalization combines multiple tables into a single table. This speeds up query performance because fewer table joins are needed.
6.   Server Tuning: Each server has its own parameters, and tuning them so that the server can fully take advantage of the hardware resources can significantly speed up query performance.
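
The following short sketch illustrates the first two strategies (index and aggregate table), again using Python's sqlite3 module; the daily_sales table and its columns are illustrative assumptions only:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An illustrative daily-level fact table.
cur.executescript("""
    CREATE TABLE daily_sales (sale_date TEXT, product_id INTEGER, amount REAL);
    INSERT INTO daily_sales VALUES
        ('2009-01-05', 1, 10.0),
        ('2009-01-20', 1, 15.0),
        ('2009-02-02', 2, 30.0);
""")

# Strategy 1 (Use Index): index the column that queries filter or join on.
cur.execute("CREATE INDEX idx_daily_sales_date ON daily_sales (sale_date)")

# Strategy 2 (Aggregate Table): pre-populate a table at the monthly level so
# that month-level reports parse far fewer rows than the daily table.
cur.execute("""
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_id,
           SUM(amount) AS total_amount
    FROM daily_sales
    GROUP BY sale_month, product_id
""")

# A month-level query can now read the small aggregate table directly.
cur.execute("SELECT sale_month, SUM(total_amount) FROM monthly_sales GROUP BY sale_month")
print(cur.fetchall())
conn.close()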

5.5 Update Propagation

Data propagation is the distribution of data from one or more source data warehouses to one or more local access databases, according to propagation rules. Data warehouses need to manage large volumes of data every day. A data warehouse may start with only a little data and then grow day by day through constant sharing and receiving of data from various data sources.

As data sharing continues, data warehouse management becomes a significant issue. Database administrators need to manage the corporate data more efficiently, in different subsets, groupings and time frames. As a company grows, it may implement more and more data sources, especially if the company expands outside its current geographical location.
Data warehouses, data marts and operational data stores are becoming indispensable tools in today's businesses. These data resources need to be constantly updated, and the updating process involves moving large volumes of data from one system to another and back and forth to a business intelligence system. It is common for high-volume data movement to be performed in batches within a brief window, without sacrificing the performance or availability of operational applications or of data from the warehouse.
The higher the volume of data to be moved, the more challenging and complex the process becomes. It therefore becomes the responsibility of the data warehouse administrator to find means of moving bulk data more quickly, and of identifying and moving only the data that has changed since the last data warehouse update.
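
One common way to identify and move only the changed data is to keep a watermark timestamp from the previous propagation run. The sketch below, using Python's sqlite3 module, is purely illustrative: the orders table, the last_updated column and the propagate() helper are assumptions for the example, not a prescribed method.

import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Illustrative source table; last_updated records when each row last changed.
source.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT);
    INSERT INTO orders VALUES
        (1, 100.0, '2009-03-01 10:00:00'),
        (2, 250.0, '2009-03-02 09:30:00'),
        (3,  80.0, '2009-03-03 14:15:00');
""")
target.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT)"
)

def propagate(src, dst, watermark):
    """Copy only rows changed since the last propagation run."""
    changed = src.execute(
        "SELECT order_id, amount, last_updated FROM orders WHERE last_updated > ?",
        (watermark,),
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount, last_updated) VALUES (?, ?, ?)",
        changed,
    )
    dst.commit()
    # The new watermark is the latest timestamp seen, ready for the next run.
    return max((row[2] for row in changed), default=watermark)

watermark = propagate(source, target, "2009-03-02 00:00:00")
print(target.execute("SELECT order_id FROM orders").fetchall())  # only rows 2 and 3
print(watermark)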

From these challenges, several new data propagation methods have been developed in business enterprises, resulting in data warehouses and operational data stores evolving into mission-



