Page 211 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 211
Unit 11: Knowledge Organization and Management
which is eight characters that we can even memorize. Now suppose you place those characters Notes
in your ‘decrypter’ and you get 1024 TB ‘decompressed’. Most plainly said, we have a simple
equation where we have three variables (a, b and c). Within hash functions we only know the
value of the output which would be (if c = (a + b)) the letter ‘c’. So we need 2*c attempts to find out
the result which means brute-force.
Figure 11.1: Working of Hash Function
A hash function should be referentially transparent (stable), i.e., if called twice on input that is
“equal” (for example, strings that consist of the same sequence of characters), it should give the
same result. There is a construct in many programming languages that allows the user to
override equality and hash functions for an object: if two objects are equal, their hash codes must
be the same. This is crucial to finding an element in a hash table quickly, because two of the same
element would both hash to the same slot.
All hash functions that map a larger set of data to a smaller set of data cause collisions. Such hash
functions try to map the keys to the hash values as evenly as possible because collisions become
more frequent as hash tables fill up. Thus, single-digit hash values are frequently restricted to
80% of the size of the table. Depending on the algorithm used, other properties may be required
as well, such as double hashing and linear probing. Although the idea was conceived in the
1950s, the design of good hash functions is still a topic of active research.
Hash functions are related to (and often confused with) checksums, check digits, fingerprints,
randomization functions, error correcting codes, and cryptographic hash functions. Although
these concepts overlap to some extent, each has its own uses and requirements and is designed
and optimized differently.
Hash Tables
Hash functions are primarily used in hash tables, to quickly locate a data record (e.g., a dictionary
definition) given its search key (the headword). Specifically, the hash function is used to map the
search key to an index; the index gives the place in the hash table where the corresponding
record should be stored. Hash tables, in turn, are used to implement associative arrays and
dynamic sets.
Typically, the domain of a hash function (the set of possible keys) is larger than its range (the
number of different table indexes), and so it will map several different keys to the same index.
Therefore, each slot of a hash table is associated with (implicitly or explicitly) a set of records,
rather than a single record. For this reason, each slot of a hash table is often called a bucket, and
hash values are also called bucket indices.
Thus, the hash function only hints at the record’s location—it tells where one should start
looking for it. Still, in a half-full table, a good hash function will typically narrow the search
LOVELY PROFESSIONAL UNIVERSITY 205