Radix Sort — Practice Tasks¶
All tasks must be solved in Go, Java, and Python.
Beginner Tasks¶
Task 1: Implement LSD Radix Sort for non-negative integers in base 10. Verify with [170, 45, 75, 90, 802, 24, 2, 66] → [2, 24, 45, 66, 75, 90, 170, 802].
Go¶
package main
func RadixSortBase10(arr []int) {
// TODO: implement
}
func main() {
data := []int{170, 45, 75, 90, 802, 24, 2, 66}
RadixSortBase10(data)
// print
}
Java¶
public class Task1 {
public static void sort(int[] arr) {
// TODO: implement
}
public static void main(String[] args) {
int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
sort(data);
}
}
Python¶
def radix_sort_base10(arr):
# TODO: implement
pass
if __name__ == "__main__":
data = [170, 45, 75, 90, 802, 24, 2, 66]
radix_sort_base10(data)
print(data)
- Constraints: correct O(d·(n+k)), stable, tested with empty/single/duplicate/all-equal inputs.
- Expected Output:
[2, 24, 45, 66, 75, 90, 170, 802] - Evaluation: correctness, stability, edge cases.
Task 2: Implement byte-radix LSD (base 256) for uint32 keys. Verify with random input by comparing to language built-in sort.
Go¶
package main
func RadixSortBase256(arr []uint32) {
// TODO: 4 passes, shift by 0/8/16/24, mask 0xFF, double-buffered
}
func main() {
// generate random uint32 values, sort, compare against sort.Slice
}
Java¶
public class Task2 {
public static void sortBase256(int[] arr) {
// TODO: 4 passes, shift by 0/8/16/24, mask 0xFF, use >>> for unsigned shift
}
public static void main(String[] args) {
// generate random ints, sort, compare against Arrays.sort
}
}
Python¶
def radix_sort_base256(arr):
# TODO: 4 passes, shifts (0,8,16,24), mask 0xFF
pass
if __name__ == "__main__":
import random
data = [random.randint(0, 2**32 - 1) for _ in range(1000)]
radix_sort_base256(data)
assert data == sorted(data)
- Hint: Use bit-shift (
v >> shift) and mask (& 0xFF) for digit extraction; 4 passes total. - Constraints: 4 passes for 32-bit, double-buffered (no allocation per pass).
Task 3: Write a test that asserts stability of your LSD Radix Sort. Use input pairs [(key, original_idx)] and confirm ties preserve original index order.
Go¶
type Pair struct{ Key, Idx int }
func TestStability(t *testing.T) {
pairs := []Pair{{3, 0}, {1, 1}, {3, 2}, {1, 3}}
// TODO: extract keys, sort indirectly, verify pairs[0].Idx==1, pairs[1].Idx==3, etc.
}
Java¶
import java.util.*;
public class Task3 {
record Pair(int key, int idx) {}
public static void testStability() {
List<Pair> pairs = List.of(new Pair(3, 0), new Pair(1, 1),
new Pair(3, 2), new Pair(1, 3));
// TODO: sort by key using your stable radix; assert ties preserve idx order
}
}
Python¶
def test_stability():
pairs = [(3, 0), (1, 1), (3, 2), (1, 3)]
# TODO: implement stable_sort_by_key from middle.md
# Sort by pairs[i][0], verify ties preserve original index order
expected = [(1, 1), (1, 3), (3, 0), (3, 2)]
# assert sorted == expected
- Hint: Sort
[(3, 0), (1, 1), (3, 2), (1, 3)]bykey; expected[(1, 1), (1, 3), (3, 0), (3, 2)].
Task 4: Implement Counting Sort as a standalone function and call it from your Radix Sort as the inner subroutine. Verify the two-level structure works.
- Constraints: Counting Sort must be stable (right-to-left placement).
- Why this matters: Reinforces the separation: Counting Sort is a standalone primitive; Radix Sort just calls it d times.
Task 5: Generate 1,000 random ints in [0, 10^6], sort with your Radix Sort, compare to language-builtin sort. Assert equal outputs.
- Constraints: Use seeded RNG so tests are reproducible.
Intermediate Tasks¶
Task 6: Implement Radix Sort for signed int32 values. Handle negatives by flipping the sign bit before sort and flipping back after.
Go¶
package main
func RadixSortSigned(arr []int32) {
// Flip sign bit, sort as uint32, flip back
// TODO
}
Java¶
public class Task6 {
public static void sortSigned(int[] arr) {
// Flip sign bit (XOR with Integer.MIN_VALUE)
// TODO
}
}
Python¶
- Constraints: Must handle
INT_MIN, negatives, zero, and positives. - Expected Output: Correctly sorted output including negative values.
Task 7: Implement MSD Radix Sort for strings. Test with ["she", "sells", "seashells", "by", "the", "seashore"] → sorted lexicographically.
Go¶
package main
func MSDStringSort(arr []string) {
aux := make([]string, len(arr))
msdRecurse(arr, aux, 0, len(arr)-1, 0)
}
func msdRecurse(arr, aux []string, lo, hi, d int) {
// TODO: recursive MSD; sentinel bucket for d >= len(s)
}
Java¶
public class Task7 {
public static void msdSort(String[] arr) {
String[] aux = new String[arr.length];
msd(arr, aux, 0, arr.length - 1, 0);
}
private static void msd(String[] arr, String[] aux, int lo, int hi, int d) {
// TODO: recursive MSD; bucket 0 = end-of-string
}
}
Python¶
def msd_string_sort(strings):
aux = [None] * len(strings)
_msd(strings, aux, 0, len(strings), 0)
def _msd(arr, aux, lo, hi, d):
# TODO: recursive MSD with end-of-string sentinel
pass
- Hint: Use a sentinel for end-of-string (so shorter strings sort before their prefix-matched longer counterparts).
- Constraints: Recursive on each bucket; insertion-sort cutoff for buckets ≤ 24.
Task 8: Sort n = 1,000,000 random IPv4 addresses (as strings "a.b.c.d") using Radix Sort on the parsed uint32 value. Output must be in numeric IP order.
Go¶
func SortIPs(ips []string) []string {
nums := make([]uint32, len(ips))
for i, ip := range ips { nums[i] = ipToUint32(ip) }
RadixSortBase256(nums) // from Task 2
out := make([]string, len(nums))
for i, n := range nums { out[i] = uint32ToIP(n) }
return out
}
func ipToUint32(ip string) uint32 { /* TODO */ return 0 }
func uint32ToIP(v uint32) string { /* TODO */ return "" }
Java¶
public class Task8 {
public static String[] sortIPs(String[] ips) {
long[] nums = new long[ips.length];
for (int i = 0; i < ips.length; i++) nums[i] = ipToLong(ips[i]);
// sort nums with radix
String[] out = new String[ips.length];
for (int i = 0; i < nums.length; i++) out[i] = longToIp(nums[i]);
return out;
}
static long ipToLong(String ip) { /* TODO */ return 0; }
static String longToIp(long v) { /* TODO */ return ""; }
}
Python¶
def sort_ips(ips):
def ip_to_int(s):
parts = s.split('.')
return (int(parts[0]) << 24) | (int(parts[1]) << 16) | (int(parts[2]) << 8) | int(parts[3])
def int_to_ip(v):
return f"{(v>>24)&0xFF}.{(v>>16)&0xFF}.{(v>>8)&0xFF}.{v&0xFF}"
nums = [ip_to_int(ip) for ip in ips]
radix_sort_base256(nums)
return [int_to_ip(n) for n in nums]
- Hint: Parse → sort uint32s → format back to "a.b.c.d".
Task 9: Implement double-buffered LSD Radix Sort that does NOT do copy(arr, output) after each pass. Instead, ping-pong two pre-allocated buffers and only copy at the end if needed.
Go¶
func RadixSortDoubleBuffer(arr []uint32) {
n := len(arr)
buf := make([]uint32, n)
src, dst := arr, buf
for shift := uint(0); shift < 32; shift += 8 {
// histogram → prefix sum → place src→dst → swap src and dst
// TODO
src, dst = dst, src
}
if &src[0] != &arr[0] { copy(arr, src) }
}
Java¶
public class Task9 {
public static void sortDoubleBuf(int[] arr) {
int n = arr.length;
int[] buf = new int[n];
int[] src = arr, dst = buf;
for (int shift = 0; shift < 32; shift += 8) {
// TODO: histogram + scatter from src to dst
int[] tmp = src; src = dst; dst = tmp;
}
if (src != arr) System.arraycopy(src, 0, arr, 0, n);
}
}
Python¶
def radix_sort_double_buffer(arr):
n = len(arr)
buf = [0] * n
src, dst = arr, buf
for shift in (0, 8, 16, 24):
# TODO: histogram + scatter from src to dst
src, dst = dst, src
if src is not arr:
arr[:] = src
- Constraints: Same correctness as Task 2 but measurably faster (benchmark to confirm).
Task 10: Implement Radix Sort for floats (float32). Use the IEEE-754 bit transform: positives flip sign bit; negatives flip all bits. Reverse transform after sorting.
Go¶
import "math"
func RadixSortFloat32(arr []float32) {
keys := make([]uint32, len(arr))
for i, v := range arr {
b := math.Float32bits(v)
if b&0x80000000 != 0 { keys[i] = ^b } // negative
else { keys[i] = b | 0x80000000 }
}
RadixSortBase256(keys)
for i, k := range keys {
if k&0x80000000 != 0 { arr[i] = math.Float32frombits(k & 0x7FFFFFFF) }
else { arr[i] = math.Float32frombits(^k) }
}
}
Java¶
public class Task10 {
public static void sortFloats(float[] arr) {
int[] keys = new int[arr.length];
for (int i = 0; i < arr.length; i++) {
int b = Float.floatToRawIntBits(arr[i]);
keys[i] = ((b & 0x80000000) != 0) ? ~b : (b | 0x80000000);
}
// sort keys with byte radix
for (int i = 0; i < arr.length; i++) {
int k = keys[i];
arr[i] = ((k & 0x80000000) != 0)
? Float.intBitsToFloat(k & 0x7FFFFFFF)
: Float.intBitsToFloat(~k);
}
}
}
Python¶
import struct
def radix_sort_float32(arr):
def f_to_key(f):
b = struct.unpack('I', struct.pack('f', f))[0]
return (~b & 0xFFFFFFFF) if (b & 0x80000000) else (b | 0x80000000)
def key_to_f(k):
if k & 0x80000000:
return struct.unpack('f', struct.pack('I', k & 0x7FFFFFFF))[0]
return struct.unpack('f', struct.pack('I', (~k) & 0xFFFFFFFF))[0]
keys = [f_to_key(v) for v in arr]
radix_sort_base256(keys)
arr[:] = [key_to_f(k) for k in keys]
- Constraints: Correctly handles negative floats, positive floats, zero, and small/large magnitudes.
- Hint:
math.Float32bitsin Go,Float.floatToRawIntBitsin Java,struct.unpackin Python.
Advanced Tasks¶
Task 11: Implement hybrid Radix + Insertion Sort: switch to Insertion Sort when sub-array size drops below 32. Benchmark vs pure Radix on n = 10^6 random ints.
- Constraints: Must show measurable speedup (>20%) over pure Radix on the same input.
Task 12: Implement parallel MSD Radix Sort using goroutines (Go) / ExecutorService (Java) / multiprocessing (Python). Spawn one task per bucket. Benchmark on 8-core machine vs sequential.
Go¶
import "sync"
const PARALLEL_CUTOFF = 50_000
func ParallelMSD(arr []uint32) {
var wg sync.WaitGroup
msdPar(arr, 0, len(arr), 24, &wg)
wg.Wait()
}
func msdPar(arr []uint32, lo, hi int, shift uint, wg *sync.WaitGroup) {
// TODO: histogram → scatter → for each bucket, spawn goroutine if size > cutoff
}
Java¶
import java.util.concurrent.*;
public class Task12 {
static ExecutorService pool = Executors.newWorkStealingPool();
static final int CUTOFF = 50_000;
public static void parallelMSD(int[] arr) {
ForkJoinPool.commonPool().invoke(new MSDTask(arr, 0, arr.length, 24));
}
static class MSDTask extends RecursiveAction {
// TODO: implement
int[] arr; int lo, hi, shift;
MSDTask(int[] a, int l, int h, int s){arr=a;lo=l;hi=h;shift=s;}
@Override protected void compute() { /* TODO */ }
}
}
Python¶
from concurrent.futures import ProcessPoolExecutor
import os
CUTOFF = 50_000
def parallel_msd(arr):
# Note: Python's GIL limits threading; use processes for true parallelism
with ProcessPoolExecutor(max_workers=os.cpu_count()) as ex:
# TODO: partition once, dispatch each bucket as a task
pass
- Constraints: Use a threshold (
n > 50_000) below which sub-buckets run sequentially to avoid task-spawn overhead. - Expected: 3-5× speedup on 8 cores for
n = 10^7.
Task 13: Implement external Radix Sort: input is a binary file of int64 values larger than RAM (simulate by setting a small in-memory limit, e.g. 100k records). Partition into 256 bucket files (by top byte), recurse if any bucket > limit, finally concatenate.
- Constraints: Must handle a 10× memory-limit input correctly. Cleanup temp files.
Task 14: Implement Radix Sort with software write-combining: thread-local per-bucket buffers of one cache line (8 int64 keys). When a buffer fills, copy the line to the output. Measure cache-miss reduction with perf (Linux).
- Constraints: Must show fewer cache misses than naive scatter. Pure correctness alone isn't enough — write-combining is a performance optimization.
Task 15: Build a TeraSort-style distributed sort: 1. Sample 0.1% of input → quantile boundaries. 2. Partition each input record to one of p partitions by quantile. 3. Each partition sorted locally using your byte-radix LSD. 4. Concatenate partitions.
Test with synthetic data of size 100M records on a 4-machine simulated cluster (use 4 goroutines / threads to simulate nodes).
- Constraints: Must produce same output as a global sort. Time should beat a single-threaded sort by ≥ 3× on 4 nodes.
Benchmark Task¶
Compare performance across all 3 languages.
Go¶
package main
import (
"fmt"
"math/rand"
"sort"
"time"
)
// Put your byte-radix LSD here:
func RadixSortU32(arr []uint32) {
// implement (see middle.md Example 1)
}
func main() {
sizes := []int{10_000, 100_000, 1_000_000, 10_000_000}
rng := rand.New(rand.NewSource(42))
for _, n := range sizes {
data := make([]uint32, n)
for i := range data {
data[i] = rng.Uint32()
}
radix := append([]uint32(nil), data...)
t1 := time.Now()
RadixSortU32(radix)
rt := time.Since(t1)
native := append([]uint32(nil), data...)
t2 := time.Now()
sort.Slice(native, func(i, j int) bool { return native[i] < native[j] })
nt := time.Since(t2)
fmt.Printf("n=%9d radix=%9.2fms pdqsort=%9.2fms speedup=%.2fx\n",
n, float64(rt.Microseconds())/1000.0,
float64(nt.Microseconds())/1000.0,
float64(nt)/float64(rt))
}
}
Java¶
import java.util.Arrays;
import java.util.Random;
public class BenchmarkTask {
static void radixSort(int[] arr) {
// implement (see middle.md Java Example)
}
public static void main(String[] args) {
int[] sizes = {10_000, 100_000, 1_000_000, 10_000_000};
Random rng = new Random(42);
for (int n : sizes) {
int[] data = new int[n];
for (int i = 0; i < n; i++) data[i] = rng.nextInt() & 0x7FFFFFFF;
int[] radixCopy = data.clone();
long t1 = System.nanoTime();
radixSort(radixCopy);
long rt = System.nanoTime() - t1;
int[] nativeCopy = data.clone();
long t2 = System.nanoTime();
Arrays.sort(nativeCopy);
long nt = System.nanoTime() - t2;
System.out.printf("n=%9d radix=%9.2fms arraysort=%9.2fms speedup=%.2fx%n",
n, rt / 1e6, nt / 1e6, (double) nt / rt);
}
}
}
Python¶
import random
import timeit
def radix_sort_u32(arr):
"""Implement byte LSD radix sort (see middle.md Python example)."""
# TODO: implement
pass
def benchmark():
sizes = [10_000, 100_000, 1_000_000]
for n in sizes:
random.seed(42)
data = [random.randint(0, 2**31 - 1) for _ in range(n)]
rt = timeit.timeit(lambda: radix_sort_u32(data.copy()), number=3) / 3
nt = timeit.timeit(lambda: sorted(data), number=3) / 3
print(f"n={n:>9} radix={rt*1000:9.2f}ms "
f"timsort={nt*1000:9.2f}ms "
f"speedup={nt/rt:.2f}x")
if __name__ == "__main__":
benchmark()
Expected results (modern x86):
| n | Go Radix vs Pdqsort | Java Radix vs Arrays.sort | Python Radix vs sorted() |
|---|---|---|---|
| 10⁴ | 0.5-1.0× | 0.5-1.0× | 0.01× (Python loop slow) |
| 10⁵ | 1.5-2.5× | 1.5-2.5× | 0.05× |
| 10⁶ | 2.0-3.0× | 2.0-3.0× | 0.1× |
| 10⁷ | 2.5-3.5× | 2.5-3.5× | not feasible (pure Python) |
Note: Pure-Python radix is much slower than C-implemented sorted(). For real Python speed, use NumPy np.argsort or compile with numba.
Stretch Tasks¶
Task S1: Implement 3-way Radix Quicksort (Bentley-Sedgewick) for variable-length strings. Compare to MSD.
Task S2: Use Radix Sort to build a suffix array of a 1 MB text file. Compare correctness with naive sorted(suffixes).
Task S3: Implement Radix Sort using SIMD intrinsics (Go: golang.org/x/sys, Java: Vector API, Python: NumPy). Measure speedup.
Task S4: Write a Radix Sort that runs on a GPU via CUDA (Python via cupy or pycuda). Compare to CPU Radix on the same data.
Task S5: Build a streaming partial radix sort: given an unbounded stream of integers, maintain the top-1000 in sorted order using radix-style top-K. Cannot store all input.
Solution Approach Notes¶
For each task, follow this checklist:
- Specify the key model. Non-negative
uint32? Signedint32?float32? String? Get this clear before writing code. - Choose the radix. Default: byte (k=256). Decimal (k=10) for clarity in tasks 1-3.
- Test edge cases: empty array, single element, all equal, already sorted, reverse sorted, all zeros.
- Verify stability with the
(key, idx)pair test. - Measure — radix should beat language built-in by 2-3× for n ≥ 10⁵ on integer keys. If it doesn't, look for allocation in inner loop.