Page 215 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 215

Unit 11: Query Processing and Optimization




          You can define query evaluation as the query-execution engine taking a query-evaluation plan,   notes
          executing that plan, and returning the answers to the query. The query processing involves the
          study of the following concepts:
          How to measure query costs?
          Algorithms for evaluating relational algebraic operations.

          How to evaluate a complete expression using algorithms on individual operations?

          optimisation
          A relational algebra expression may have many equivalent expressions.


                 Example:  σ  (salary  <  5000)  (πsalary  (EMP))  is  equivalent  to  πsalary  (σsalary  <5000
          (EMP)).
          Each relational algebraic operation can be evaluated using one of the several different algorithms.
          Correspondingly, a relational-algebraic expression can be evaluated in many ways.
          An  expression  that  specifies  detailed  evaluation  strategy  is  known  as  evaluation-plan,  for
          example, you can use an index on salary to find employees with salary < 5000 or we can perform
          complete relation scan and discard employees with salary ≥ 5000. The basis of selection of any of
          the scheme will be the cost.
          Query Optimisation: Amongst all equivalent plans choose the one with the lowest cost. Cost
          is estimated using statistical information from the database catalogue, for example, number of
          tuples in each relation, size of tuples, etc.
          Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost estimation
          is made on the basis of heuristic rules.

          11.2 Description and requirements for Data Warehouse Queries

          Most data warehouse queries are designed to follow a star schema and can process hundreds
          of millions of rows in a single query. By default, the query optimizer detects queries against
          star schemas and creates efficient query plans for them. One method the optimizer can use to
          generate an efficient plan is to use bitmap filtering. A bitmap filter uses a compact representation
          of a set of values from a table in one part of the operator tree to filter rows from a second table
          in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the
          rows in the second table that qualify for the join to the first table are processed.
          In SQL Server 2008, bitmap filtering can be introduced in the query plan after optimization,
          as in SQL Server 2005, or introduced dynamically by the query optimizer during query plan
          generation. When the filter is introduced dynamically, it is referred to as an optimized bitmap
          filter. Optimized bitmap filtering can significantly improve the performance of data warehouse
          queries that use star schemas by removing non-qualifying rows from the fact table early in the
          query plan. Without optimized bitmap filtering, all rows in the fact table are processed through
          some part of the operator tree before the join operation with the dimension tables removes the
          non-qualifying rows. When optimized bitmap filtering is applied, the non-qualifying rows in the
          fact table are eliminated immediately.

          11.2.1 understanding Bitmap filtering


          The bitmap filter compares favorably to the bitmap index. A bitmap index is an alternate form
          for representing row ID (RID) lists in a value-list index using one or more bit vectors indicating
          which row in a table contains a certain column value. Both can be very effective in removing
          unnecessary rows from result processing; however, there are important differences between a



                                           LoveLy professionaL university                                   209
   210   211   212   213   214   215   216   217   218   219   220