Page 194 - DCAP407_DATA_STRUCTURE
P. 194
Unit 10: Trees
There are two data encoding schemes and Huffman Encoding scheme falls under one of the two
schemes. The following are the two data encoding schemes:
1. Fixed Length Encoding: In fixed length encoding, all symbols are encoded using the same number
of bits. An example of fixed length encoding is ASCII code which uses 7 bits to encode a total of
128 different symbols. The difficulty with fixed length codes is that the probability of occurrence of
the symbols to be encoded is not considered. A symbol that occurs 1000 times is encoded with the
same number of bits as a symbol which occurs only 10 times. This disadvantage makes fixed
length encoding inefficient for data compression.
2. Variable Length Encoding: Variable length encoding assigns less number of bits to symbols which
occur more often and more number of bits to symbols whose frequency is less. The Huffman
Encoding scheme falls in the category of variable length encoding i.e., code for the symbol
depends on the frequency of occurrence of that symbol.
Algorithm for Constructing Huffman tree
The following sequence of steps needs to be followed to construct a Huffman tree:
1. Input all symbols along with their respective frequencies.
2. Create leaf nodes representing the scanned symbols.
3. Let S be a set containing all the nodes created in step 2
When there is only one node in S, the following steps need to be followed:
1. Sort the nodes (symbols) in S with respect to their frequencies.
2. Create a new node to combine the two nodes with least frequencies.
3. Frequency of this new combined node will be equal to the sum of frequencies of nodes which were
combined. This newly created combined node will be the parent of two nodes which were
combined.
4. Replace the two nodes which were combined with the new combined node in S.
5. After the 4 step, you will be left with only one node, which is the root of the Huffman tree,
th
having frequency equal to the sum of all frequencies of all symbols. Thus, a tree is generated with
leaf nodes containing the basic symbols whose code is to be found.
With the help of an example we will learn how to construct a Huffman tree. The table 10.1 shows the
frequency of occurrence of different symbols.
Table 10.1: Symbol Frequency Table
Symbol Frequency of Occurrence
A 24
B 12
C 10
D 8
E 8
Using the symbols and frequencies from the table 10.1, we can create the leaf nodes and then sort them.
Symbols D and E have the least frequency, 8; these two nodes are combined to make a node DE having
LOVELY PROFESSIONAL UNIVERSITY 187