Page 194 - DCAP605_ADVANCED_DATA_STRUCTURE_AND

Page 194 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS

P. 194

Unit 9: Hashing

9.1 Hashing Notes

In many applications we require to use a data object called symbol table. A symbol table is nothing
but a set of pairs (name, value). Where value represents collection of attributes associated with the

name, and this collection of attributes depends upon the program element identiﬁed by the name.
For example, if a name x is used to identify an array in a program, then the attributes associated
with x are the number of dimensions, lower bound and upper bound of each dimension, and the
element type. Therefore a symbol table can be thought of as a linear list of pairs (name, value),
and hence you can use a list of data object for realizing a symbol table. A symbol table is referred
to or accessed frequently either for adding the name, or for storing the attributes of the name, or

for retrieving the attributes of the name. Therefore accessing efﬁciency is a prime concern while
designing a symbol table. Hence the most common way of getting a symbol table implemented
is to use a hash table. Hashing is a method of directly computing the index of the table by using
some suitable mathematical function called hash function. The hash function operates on the
name to be stored in the symbol table, or whose attributes are to be retrieved from the symbol
table. If h is a hash function and x is a name, then h(x) gives the index of the table where x along
with its attributes can be stored. If x is already stored in the table, then h(x) gives the index of the
table where it is stored to retrieve the attributes of x from the table. There are various methods of
deﬁning a hash function like a division method. In this method, you take the sum of the values of

the characters, divide it by the size of the table, and take the remainder. This gives us an integer
value lying in the range of 0 to (n – 1) if the size of the table is n. The other method is a mid
square method. In this method, the identiﬁ er is ﬁ rst squared and then the appropriate number
of bits from the middle of square is used as the hash value. Since the middle bits of the square

usually depend on all the characters in the identiﬁer, it is expected that different identiﬁ ers will
result into different values. The number of middle bits that you select depends on the table size.
Therefore if r is the number of middle bits that you use to form hash value, then the table size
will be 2r. Hence when you use this method the table size is required to be power of 2. Another
method is folding in which the identiﬁer is partitioned into several parts, all but the last part

being of the same length. These parts are then added together to obtain the hash value.
To store the name or to add attributes of the name, you compute hash value of the name, and
place the name or attributes as the case may be, at that place in the table whose index is the hash
value of the name. For retrieving the attribute values of the name kept in the symbol table, I apply
the hash function to the name to obtain index of the table where you get the attributes of the

name. Hence you ﬁnd that no comparisons are required to be done. Hence the time required for
the retrieval is independent of the table size. Therefore retrieval is possible in a constant amount
of time, which will be the time taken for computing the hash function. Therefore hash table seems
to be the best for realization, of the symbol table, but there is one problem associated with the
hashing, and it is of collisions. Hash collision occurs when the two identiﬁers are mapped into

the same hash value. This happens because a hash function deﬁnes a mapping from a set of valid
identiﬁ ers to the set of those integers, which are used as indices of the table. Therefore you see

that the domain of the mapping deﬁned by the hash function is much larger than the range of the
mapping, and hence the mapping is of many to one nature. Therefore when I implement a hash
table a suitable collision handling mechanism is to be provided which will be activated when
there is a collision.

Collision handling involve ﬁnding out an alternative location for one of the two colliding

symbols. For example, if x and y are the different identiﬁers and if h(x) = h(y), x and y are the
colliding symbols. If x is encountered before y, then the ith entry of the table will be used for
accommodating symbol x, but later on when y comes there is a hash collision, and therefore

you have to ﬁnd out an alternative location either for x or y. This means you ﬁnd out a suitable

alternative location and either accommodate y in that location, or you can move x to that location

LOVELY PROFESSIONAL UNIVERSITY 189

189 190 191 192 193 194 195 196 197 198 199