Page 214 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM

Page 214 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE

P. 214

Unit 12: Query Processing and Optimization

12.4.6 Hash-join Notes

This is applicable to both the equi-joins and natural joins. A hash function h is used to partition
tuples of both relations, where h maps joining attribute (enroll no in our example) values to
{0, 1, ..., n-1}.
The join attribute is hashed to the join-hash partitions. In the example of Figure 12.4 we have
used mod 100 function to hashing, and n = 100.
Error!

Figure 12.4: A Hash-join Example

Once the partition tables of STUDENT and MARKS are made on the enrolment number, then
only the corresponding partitions will participate in the join as:

A STUDENT tuple and a MARKS tuple that satisfy the join condition will have the same value for
the join attributes. Therefore, they will be hashed to equivalent partition and thus can be joined
easily.

Algorithm for Hash-join

The hash-join of two relations r and s is computed as follows:
1. Partition the relation r and s using hashing function h. (When partitioning a relation, one
block of memory is reserved as the output buffer for each partition).
2. For each partition si of s, load the partition into memory and build an in-memory hash
index on the join attribute.

3. Read the tuples in ri from the disk, one block at a time. For each tuple in ri locate each
matching tuple in si using the in-memory hash index and output the concatenation of their
attributes.

In this method, the relation s is called the build relation and r is called the probe relation. The
value n (the number of partitions) and the hash function h is chosen in such a manner that each
si should fit in to the memory. Typically n is chosen as Number of blocks of s/Number of

LOVELY PROFESSIONAL UNIVERSITY 207

209 210 211 212 213 214 215 216 217 218 219