Huffman Code Explained using Heap [Introduction to Algorithms book]

Huffman codes is a widely used technique for compressing data. Huffman codes is a greedy algorithm to build up an optimal way of representing each character as a binary string. Suppose we have a 10,000 character data file that we wish to store. And we observe a character ‘a’ occurs very frequently.
Now, there are many ways to represent a file. We consider a problem of designing a binary character code wherein each character is represented by a unique binary string.

For instance, we need 3 bits to represent five characters [a=000, b=001, c=010, d=011, e=100]. Now if we use variable length codeword, we could save a lot of space, for instance:

Characterabcd
Frequency1001312155
Fixed-length code000001010011100
Variable-length code1010001011000
Huffman Codes

A data file of 200,000 characters contains only the characters (a..e), with above mentioned frequencies. If we use variable-length code instead of fixed-length code, we could save 80,000 bits.

Huffman Code Algorithm

Huffman invented a greedy algorithm that constructs an optimal prefix code (variable length code) called Huffman Code.

Huffman Codes

As we could see from the image above, for the given characters {a, b, c} with frequencies {2, 4, 8} respectively, Huffman Codes (bits 0, 1 on edges) generated are:

a: 00
b: 01
c: 1
Pseudocode

This problem could be solved by using a Priority Queue or Heap data structure.

Huffman(C)
1. n = size[C]
2. Q = C
3. for i = 1 to n-1:
4.   new node = z 
5.   left[z] = Extract-Min[Q]
6.   right[z] = Extract-Min[Q]
7.   f[z] = f[x] + f[y]
8.   Insert(Q, z)
9. Extract-Min(Q)

Note: Extract-Min(Q) operation removes the top (smallest) member of priority-queue Q and left[z] = Extract-Min[Q] operation removes the smallest element from priority-queue Q and assign it to left[z].

Code Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
//
//  main.cpp
//  Huffman Code
//
//  Created by Himanshu on 18/09/21.
//
  
#include <cstdio>
#include <iostream>
#include <vector>
#include <algorithm>
#define MAX 100
using namespace std;
 
struct node {
    int freq;
    char letter = NULL;
    struct node *left, *right;
};
typedef struct node Node;
 
bool cmp(const Node *a, const Node *b) {
    return (a->freq > b->freq);
}
 
Node* newNode(int freq, char ch) {
    Node *p =  new node();
     
    p->freq = freq;
    p->letter = ch;
    p->left = NULL;
    p->right = NULL;
     
    return p;
}
 
void printArray(int A[], int n) {
    for (int i=0; i<n; i++) {
        cout<<A[i];
    }
    cout<<endl;
}
 
Node* BuildHuffmanTree (vector<int> freq) {
    int n = (int)freq.size();
    vector<Node*> minHeap;
 
    for (int i=0; i<n; i++) {
        Node *z = newNode(freq[i], (char)(i + 'a'));
        minHeap.push_back(z);
        push_heap(minHeap.begin(), minHeap.end(), &cmp);
    }
     
    while (minHeap.size() > 1) {
         
        //This is a non-alphabet node, hence -1 for ch
        Node *z = newNode(0, -1);
         
        Node *x = minHeap.front();
        pop_heap(minHeap.begin(), minHeap.end(), &cmp);
        minHeap.pop_back();
        z->left = x;
         
        Node *y = minHeap.front();
        pop_heap(minHeap.begin(), minHeap.end(), &cmp);
        minHeap.pop_back();
        z->right = y;
         
        z->freq = x->freq + y->freq;
         
        minHeap.push_back(z);
        push_heap(minHeap.begin(), minHeap.end(), &cmp);
    }
     
    return minHeap.front();
}
      
void solve (int A[], Node *root, int n) {
     
    if (root->letter != -1) {
        cout<<(char)(root->letter)<<": ";
        printArray(A, n);
    }
     
    if (root->left) {
        A[n] = 0;
        solve(A, root->left, n+1);
    }
     
    if (root->right) {
        A[n] = 1;
        solve(A, root->right, n+1);
    }
 
}
 
int main() {
    //    char    =    {a,  b,  c, d,  e,  f}
    vector<int> freq = {5, 20, 15, 13, 30, 45};
    int temp[MAX];
     
    Node *node = BuildHuffmanTree(freq);
    solve(temp, node, 0);
     
    return 0;
}

Output

b: 00
e: 01
c: 100
a: 1010
d: 1011
f: 11

Time complexity: O(nlogn) (For calling Heapify operation (which is O(logn)), n times)
Auxiliary Space: O(n) (For creating Heap)

n is the number of alphabet characters

Leave a Reply

Your email address will not be published. Required fields are marked *