Huffman Coding & Compression

H446 · 1.3 Exchanging Data · A-Level Computer Science

Component 01

Run-Length Encoding (RLE)

Replace consecutive repeated values with a (count, value) pair. Effective when data has many repeating sequences.

Example

Original: AAABBBBBCCA

RLE output: 3A5B2C1A

Original: 11 chars × 8 bits = 88 bits

Compressed: 4 pairs × 16 bits = 64 bits

Saving: 27%

Best & Worst Cases

Best case — highly repetitive data

AAAAAAAAAAAAAAAA → 16A (huge saving)

Use case: fax transmission, simple bitmap images

Worst case — random/varied data

ABCDEFG → 1A1B1C1D1E1F1G (file GROWS)

RLE is ineffective for natural language text or photographs.

Huffman Coding

A variable-length prefix coding scheme where more frequent characters receive shorter codes. It is lossless — the original data can be perfectly reconstructed.

Algorithm Steps

  1. Count the frequency of each character in the data
  2. Create a leaf node for each character, sorted by frequency (ascending)
  3. Repeatedly take the two nodes with the lowest frequency, combine into a parent node with their combined frequency
  4. Repeat until a single root node remains — this is the Huffman tree
  5. Assign 0 to left branches, 1 to right branches (or vice versa — consistent)
  6. The code for each character is read by tracing from root to leaf

Worked Example: "AABBC"

Frequency Table

CharFreqCodeBits
A201
B2102
C1112

Without Huffman (3 chars → 2 bits each): 5 × 2 = 10 bits

With Huffman: (2×1)+(2×2)+(1×2) = 8 bits → 20% saving

Tree (simplified)

5 A f=2 0 3 1 B f=2 0 C f=1 1

A=0, B=10, C=11 — no code is a prefix of another (prefix-free property).

RLE Encoder

Type a string of characters and see the RLE output with space saving calculation.

Huffman Frequency Table Builder

Enter a short string to see its frequency table and Huffman codes calculated.