Page 216 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
P. 216

Unit 12: Query Processing and Optimization




          Partitioning is said to  be  skewed  if  some partitions  have significantly more  tuples than the  Notes
          others. This is the overflow condition. The overflow can be handled in a variety of ways:
          1.   Resolution (during the build phase): The overflow partition s is further partitioned using
               different hash function. The equivalent partition of r must be further partitioned similarily.
          2.   Avoidance (during  build phase): Partition build  relations  into  many partitions,  then
               combines them.
          However, such approaches for handling overflows fail with large numbers of duplicates. One
          option of avoiding such problems is to use block nested-loop join on the overflowed partitions.
                                                                  
          Let us explain the hash join and its cost for the natural join STUDENT  MARKS
          Assume a memory size of 25 blocks  M=25;
          SELECT build s as STUDENT as it has less number of blocks (100 blocks) and r probe as MARKS
          (500 blocks).
          Number of partitions to be created for STUDENT = (block of STUDENT/M)* fudge factor (1.2) =
          (100/25) × 1.2 = 4.8

          Thus, STUDENT relation will be partitioned into 5 partitions of 20 blocks each. MARKS will also
          have 5 partitions of 100 blocks each. The 25 buffers will be used as –20 blocks for one complete
          partition of STUDENT plus 4 more blocks for one block of each of the other 4 partitions. One
          block will be used for input of MARKS partitions.
          The total cost = 3(100+500) = 1800 as no recursive partitioning is needed.

          Hybrid Hash-join

          This is useful when the size of the memory is relatively large, and the build input is larger than
          the memory. Hybrid hash join keeps the first partition of the build relation in the memory. The
          first partition of STUDENT is maintained in the first 20 blocks of the buffer, and not written to
          the disk. The first block of MARKS is used right away for probing, instead of being written and
          read back. Thus, it has a cost of 3(80 + 400) + 20 +100 = 1560 block transfers for hybrid hash-join,
          instead of 1800 with plain hash-join.
          Hybrid hash-join is most useful if M is large, such that we can have bigger partitions.

          12.4.7 Complex Joins

          A join with a conjunctive condition can be handled, either by using the nested loop or block
          nested loop join, or alternatively, the result of one of the simpler joins (on a few conditions) can
          be computed and the final result may be evaluated by finding the tuples that satisfy the remaining
          conditions in the result.
          A join with a disjunctive condition can be calculated either by using the nested loop or block
          nested loop join, or it may be computed as the union of the records in individual joins.



              Task       Where you can use merge join? Explain











                                           LOVELY PROFESSIONAL UNIVERSITY                                   209
   211   212   213   214   215   216   217   218   219   220   221