Page 194 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 194

Unit 9: Hashing




          9.1 Hashing                                                                           Notes

          In many applications we require to use a data object called symbol table. A symbol table is nothing
          but a set of pairs (name, value). Where value represents collection of attributes associated with the

          name, and this collection of attributes depends upon the program element identified by the name.
          For example, if a name x is used to identify an array in a program, then the attributes associated
          with x are the number of dimensions, lower bound and upper bound of each dimension, and the
          element type. Therefore a symbol table can be thought of as a linear list of pairs (name, value),
          and hence you can use a list of data object for realizing a symbol table. A symbol table is referred
          to or accessed frequently either for adding the name, or for storing the attributes of the name, or

          for retrieving the attributes of the name. Therefore accessing efficiency is a prime concern while
          designing a symbol table. Hence the most common way of getting a symbol table implemented
          is to use a hash table. Hashing is a method of directly computing the index of the table by using
          some suitable mathematical function called hash function. The hash function operates on the
          name to be stored in the symbol table, or whose attributes are to be retrieved from the symbol
          table. If h is a hash function and x is a name, then h(x) gives the index of the table where x along
          with its attributes can be stored. If x is already stored in the table, then h(x) gives the index of the
          table where it is stored to retrieve the attributes of x from the table. There are various methods of
          defining a hash function like a division method. In this method, you take the sum of the values of

          the characters, divide it by the size of the table, and take the remainder. This gives us an integer
          value lying in the range of 0 to (n – 1) if the size of the table is n. The other method is a mid
          square method. In this method, the identifi er is fi rst squared and then the appropriate number
          of bits from the middle of square is used as the hash value. Since the middle bits of the square

          usually depend on all the characters in the identifier, it is expected that different identifi ers will
          result into different values. The number of middle bits that you select depends on the table size.
          Therefore if r is the number of middle bits that you use to form hash value, then the table size
          will be 2r. Hence when you use this method the table size is required to be power of 2. Another
          method is folding in which the identifier is partitioned into several parts, all but the last part

          being of the same length. These parts are then added together to obtain the hash value.
          To store the name or to add attributes of the name, you compute hash value of the name, and
          place the name or attributes as the case may be, at that place in the table whose index is the hash
          value of the name. For retrieving the attribute values of the name kept in the symbol table, I apply
          the hash function to the name to obtain index of the table where you get the attributes of the

          name. Hence you find that no comparisons are required to be done. Hence the time required for
          the retrieval is independent of the table size. Therefore retrieval is possible in a constant amount
          of time, which will be the time taken for computing the hash function. Therefore hash table seems
          to be the best for realization, of the symbol table, but there is one problem associated with the
          hashing, and it is of collisions. Hash collision occurs when the two identifiers are mapped into


          the same hash value. This happens because a hash function defines a mapping from a set of valid
          identifi ers to the set of those integers, which are used as indices of the table. Therefore you see

          that the domain of the mapping defined by the hash function is much larger than the range of the
          mapping, and hence the mapping is of many to one nature. Therefore when I implement a hash
          table a suitable collision handling mechanism is to be provided which will be activated when
          there is a collision.

          Collision handling involve  finding out an alternative location for one of the two colliding

          symbols. For example, if x and y are the different identifiers and if h(x) = h(y), x and y are the
          colliding symbols. If x is encountered before y, then the ith entry of the table will be used for
          accommodating symbol x, but later on when y comes there is a hash collision, and therefore

          you have to find out an alternative location either for x or y. This means you find out a suitable

          alternative location and either accommodate y in that location, or you can move x to that location






                                           LOVELY PROFESSIONAL UNIVERSITY                                   189
   189   190   191   192   193   194   195   196   197   198   199