Page 198 - DCAP605_ADVANCED_DATA_STRUCTURE_AND

Page 198 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS

P. 198

Unit 9: Hashing

of the square are used to get the hash value. Since the middle bits of the square usually Notes

depend on all the characters in the identiﬁer, it is expected that different identiﬁ ers will
result in different values. The number of middle bits that we select depends on table size.
Therefore if r is the number of middle bits used to form hash value, then the table size will
be 2r, hence when you use mid square method the table size should be a power of 2.

9.3 Open Hashing

The simplest form of open hashing deﬁnes each slot in the hash table to be the head of a linked
list. All records that hash to a particular slot are placed on that slot’s linked list. The ﬁ gure below
illustrates a hash table where each slot stores one record and a link pointer to the rest of the list.

Records within a slot’s list can be ordered in several ways: by insertion order, by key value order,
or by frequency-of-access order. Ordering the list by key value provides an advantage in the case
of an unsuccessful search, because I know to stop searching the list once you encounter a key
that is greater than the one being searched for. If records on the list are unordered or ordered by
frequency, then an unsuccessful search will need to visit every record on the list.
Given a table of size M storing N records, the hash function will (ideally) spread the records evenly
among the M positions in the table, yielding on average N/M records for each list. Assuming
that the table has more slots than there are records to be stored, you can hope that few slots will
contain more than one record. In the case where a list is empty or has only one record, a search
requires only one access to the list. Thus, the average cost for hashing should be Θ(1). However,
if clustering causes many records to hash to only a few of the slots, then the cost to access a record
will be much higher because many elements on the linked list must be searched.
Open hashing is most appropriate when the hash table is kept in main memory, with the lists
implemented by a standard in-memory linked list. Storing an open hash table on disk in an

efﬁcient way is difﬁcult, because members of a given linked list might be stored on different disk

blocks. This would result in multiple disk accesses when searching for a particular key value,
which defeats the purpose of using hashing.
Let:
1. U be the universe of keys:
(a) Integers
(b) Character strings
(c) Complex bit patterns

LOVELY PROFESSIONAL UNIVERSITY 193

193 194 195 196 197 198 199 200 201 202 203