Page 216 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
P. 216
Unit 12: Query Processing and Optimization
Partitioning is said to be skewed if some partitions have significantly more tuples than the Notes
others. This is the overflow condition. The overflow can be handled in a variety of ways:
1. Resolution (during the build phase): The overflow partition s is further partitioned using
different hash function. The equivalent partition of r must be further partitioned similarily.
2. Avoidance (during build phase): Partition build relations into many partitions, then
combines them.
However, such approaches for handling overflows fail with large numbers of duplicates. One
option of avoiding such problems is to use block nested-loop join on the overflowed partitions.
Let us explain the hash join and its cost for the natural join STUDENT MARKS
Assume a memory size of 25 blocks M=25;
SELECT build s as STUDENT as it has less number of blocks (100 blocks) and r probe as MARKS
(500 blocks).
Number of partitions to be created for STUDENT = (block of STUDENT/M)* fudge factor (1.2) =
(100/25) × 1.2 = 4.8
Thus, STUDENT relation will be partitioned into 5 partitions of 20 blocks each. MARKS will also
have 5 partitions of 100 blocks each. The 25 buffers will be used as –20 blocks for one complete
partition of STUDENT plus 4 more blocks for one block of each of the other 4 partitions. One
block will be used for input of MARKS partitions.
The total cost = 3(100+500) = 1800 as no recursive partitioning is needed.
Hybrid Hash-join
This is useful when the size of the memory is relatively large, and the build input is larger than
the memory. Hybrid hash join keeps the first partition of the build relation in the memory. The
first partition of STUDENT is maintained in the first 20 blocks of the buffer, and not written to
the disk. The first block of MARKS is used right away for probing, instead of being written and
read back. Thus, it has a cost of 3(80 + 400) + 20 +100 = 1560 block transfers for hybrid hash-join,
instead of 1800 with plain hash-join.
Hybrid hash-join is most useful if M is large, such that we can have bigger partitions.
12.4.7 Complex Joins
A join with a conjunctive condition can be handled, either by using the nested loop or block
nested loop join, or alternatively, the result of one of the simpler joins (on a few conditions) can
be computed and the final result may be evaluated by finding the tuples that satisfy the remaining
conditions in the result.
A join with a disjunctive condition can be calculated either by using the nested loop or block
nested loop join, or it may be computed as the union of the records in individual joins.
Task Where you can use merge join? Explain
LOVELY PROFESSIONAL UNIVERSITY 209