Huffman Coding & Compression

H446 · 1.3 Exchanging Data · A-Level Computer Science

Component 01

Run-Length Encoding (RLE)

Replace consecutive repeated values with a (count, value) pair. Effective when data has many repeating sequences.

Example

Original: AAABBBBBCCA

RLE output: 3A5B2C1A

Original: 11 chars × 8 bits = 88 bits

Compressed: 4 pairs × 16 bits = 64 bits

Saving: 27%

Best & Worst Cases

Best case — highly repetitive data

AAAAAAAAAAAAAAAA → 16A (huge saving)

Use case: fax transmission, simple bitmap images

Worst case — random/varied data

ABCDEFG → 1A1B1C1D1E1F1G (file GROWS)

RLE is ineffective for natural language text or photographs.

Huffman Coding

A variable-length prefix coding scheme where more frequent characters receive shorter codes. It is lossless — the original data can be perfectly reconstructed.

Algorithm Steps

Count the frequency of each character in the data
Create a leaf node for each character, sorted by frequency (ascending)
Repeatedly take the two nodes with the lowest frequency, combine into a parent node with their combined frequency
Repeat until a single root node remains — this is the Huffman tree
Assign 0 to left branches, 1 to right branches (or vice versa — consistent)
The code for each character is read by tracing from root to leaf

Worked Example: "AABBC"

Frequency Table

Char	Freq	Code	Bits
A	2	0	1
B	2	10	2
C	1	11	2

Without Huffman (3 chars → 2 bits each): 5 × 2 = 10 bits

With Huffman: (2×1)+(2×2)+(1×2) = 8 bits → 20% saving

Tree (simplified)

A=0, B=10, C=11 — no code is a prefix of another (prefix-free property).

RLE Encoder

Type a string of characters and see the RLE output with space saving calculation.

Huffman Frequency Table Builder

Enter a short string to see its frequency table and Huffman codes calculated.

Past Paper Questions

OCR H446 Style

1. The string 'AABBBCCCCDDDDDD' is compressed using run-length encoding. (a) Show the compressed output. (b) The original uses 8 bits per character. Each RLE pair uses 16 bits (8 for count, 8 for character). Calculate the compression ratio. [4 marks]

2. Explain the process of building a Huffman tree for the string 'AABBC'. Include the frequency table and describe the key steps. [5 marks]

3. Explain why Huffman coding is described as lossless, and explain one property of a valid Huffman code that ensures unambiguous decoding. [3 marks]