Java Basic Syntax — Optimize the Code¶
Practice optimizing slow, inefficient, or resource-heavy Java code related to basic syntax. Each exercise contains working but suboptimal code — your job is to make it faster, leaner, or more efficient.
How to Use¶
- Read the slow code and understand what it does
- Identify the performance bottleneck
- Write your optimized version
- Compare with the solution and benchmark results
- Understand why the optimization works
Difficulty Levels¶
| Level | Focus |
|---|---|
| 🟢 | Easy — Obvious inefficiencies, simple fixes |
| 🟡 | Medium — Algorithmic improvements, allocation reduction |
| 🔴 | Hard — JIT-aware code, escape analysis, runtime-level optimizations |
Optimization Categories¶
| Category | Icon | Description |
|---|---|---|
| Memory | 📦 | Reduce allocations, reuse buffers, avoid copies |
| CPU | ⚡ | Better algorithms, fewer operations, cache efficiency |
| Concurrency | 🔄 | Better parallelism, reduce contention, avoid locks |
| I/O | 💾 | Batch operations, buffering, connection reuse |
Exercise 1: String Concatenation in a Loop 🟢 📦¶
What the code does: Builds a large string by concatenating 100,000 tokens in a loop.
The problem: Each += creates a new String object, copies the entire previous content, and discards the old object — O(n^2) allocations.
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
String result = "";
for (int i = 0; i < 100_000; i++) {
result += "item" + i + ",";
}
long elapsed = System.nanoTime() - start;
System.out.println("Length: " + result.length());
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Think about `StringBuilder` — it maintains a mutable internal `char[]` buffer and avoids creating a new `String` on every append.⚡ Optimized Code
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
// Pre-size the builder: average token ~9 chars → ~900,000 chars total
StringBuilder sb = new StringBuilder(1_000_000);
for (int i = 0; i < 100_000; i++) {
sb.append("item").append(i).append(',');
}
String result = sb.toString();
long elapsed = System.nanoTime() - start;
System.out.println("Length: " + result.length());
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** `String` is immutable in Java. Every `+=` creates a new `String`, copies all previous characters into it, and the old one becomes garbage. With 100,000 iterations this means copying ~5 billion characters total (quadratic). `StringBuilder` uses a mutable `char[]` buffer that grows by doubling — amortized O(1) per append. **When to apply:** Any loop that builds a string incrementally — log messages, CSV generation, query building, template rendering. **When NOT to apply:** Simple one-line concatenations like `"Hello " + name + "!"` — the compiler already optimizes these into `StringBuilder` calls. Forcing `StringBuilder` for two-three concatenations hurts readability with no measurable gain.Exercise 2: Autoboxing in Tight Loops 🟢 📦¶
What the code does: Sums 10 million integers using a Long wrapper type.
The problem: Every addition autoboxes the long result into a new Long object — millions of unnecessary heap allocations.
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
Long sum = 0L;
for (int i = 0; i < 10_000_000; i++) {
sum += i; // autoboxing: Long.valueOf(sum.longValue() + i)
}
long elapsed = System.nanoTime() - start;
System.out.println("Sum: " + sum);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Use primitive `long` instead of wrapper `Long`. Primitives live on the stack and require zero heap allocation.⚡ Optimized Code
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
long sum = 0L; // primitive — no boxing
for (int i = 0; i < 10_000_000; i++) {
sum += i; // pure arithmetic, no object creation
}
long elapsed = System.nanoTime() - start;
System.out.println("Sum: " + sum);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** `Long` is a heap-allocated object (16 bytes header + 8 bytes data = 24 bytes). Each `sum += i` effectively calls `sum = Long.valueOf(sum.longValue() + i)`, creating a new `Long` per iteration. With 10M iterations that is ~240 MB of garbage. The primitive `long` uses a single 8-byte stack/register slot and the `+=` compiles to a single `iadd` / `ladd` bytecode instruction. **When to apply:** Any arithmetic-heavy loop, accumulators, counters, index variables. Always prefer `int`/`long`/`double` over `Integer`/`Long`/`Double` in computation loops. **When NOT to apply:** When you need nullability (e.g., database values that can be NULL) or when storing in generic collections (`ListExercise 3: System.out.println in Loops 🟢 💾¶
What the code does: Writes 100,000 lines to standard output in a loop.
The problem: System.out.println flushes after every line by default — each call triggers a separate I/O syscall.
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
for (int i = 0; i < 100_000; i++) {
System.out.println("Line " + i + ": some data here");
}
long elapsed = System.nanoTime() - start;
System.err.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Wrap `System.out` in a `BufferedWriter` or `PrintWriter` with a large buffer and flush once at the end. This batches many small writes into fewer large I/O syscalls.⚡ Optimized Code
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.IOException;
public class Main {
public static void main(String[] args) throws IOException {
long start = System.nanoTime();
// 64KB buffer batches ~2,000 lines per syscall
BufferedWriter writer = new BufferedWriter(
new OutputStreamWriter(System.out), 65536
);
for (int i = 0; i < 100_000; i++) {
writer.write("Line ");
writer.write(String.valueOf(i));
writer.write(": some data here");
writer.newLine();
}
writer.flush();
long elapsed = System.nanoTime() - start;
System.err.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** `System.out` is a `PrintStream` that auto-flushes on every `println()`. Each flush triggers a `write()` syscall, which involves a context switch from user space to kernel space (~1-10 microseconds). With 100,000 lines, that overhead dominates. `BufferedWriter` accumulates data in a user-space buffer and only calls `write()` when the buffer is full, amortizing the syscall cost across thousands of lines. **When to apply:** Any program that writes many small chunks to stdout, a file, or a socket — log generators, report builders, CSV exporters. **When NOT to apply:** Interactive programs where the user expects immediate output (shell prompts, progress indicators). In those cases, auto-flush is desired.Exercise 4: Inefficient Array Copying via Manual Loop 🟡 📦¶
What the code does: Copies 1 million integers from one array to another using an element-by-element loop.
The problem: The manual loop prevents the JVM from using optimized native memory copy operations.
public class Main {
public static void main(String[] args) {
int[] source = new int[1_000_000];
for (int i = 0; i < source.length; i++) {
source[i] = i;
}
long start = System.nanoTime();
int[] dest = new int[source.length];
for (int i = 0; i < source.length; i++) {
dest[i] = source[i]; // element-by-element copy
}
long elapsed = System.nanoTime() - start;
System.out.println("Copied: " + dest.length + " elements");
System.out.println("Time: " + (elapsed / 1_000) + " us");
}
}
Current benchmark:
💡 Hint
`System.arraycopy()` is a JVM intrinsic — it compiles to a single `memcpy`/`memmove` operation using SIMD instructions. Also consider `Arrays.copyOf()`.⚡ Optimized Code
public class Main {
public static void main(String[] args) {
int[] source = new int[1_000_000];
for (int i = 0; i < source.length; i++) {
source[i] = i;
}
long start = System.nanoTime();
// System.arraycopy is a JVM intrinsic — uses native memcpy
int[] dest = new int[source.length];
System.arraycopy(source, 0, dest, 0, source.length);
long elapsed = System.nanoTime() - start;
System.out.println("Copied: " + dest.length + " elements");
System.out.println("Time: " + (elapsed / 1_000) + " us");
}
}
📚 Learn More
**Why this works:** `System.arraycopy()` is declared `native` and recognized as a JVM intrinsic by HotSpot. The JIT compiler replaces the call with platform-specific optimized memory copy routines that use SIMD instructions (e.g., `rep movsb` on x86 or AVX vector loads/stores). A manual loop, while potentially auto-vectorized by the JIT, adds bounds-check overhead per element and may not achieve the same throughput. **When to apply:** Array duplication, sliding window buffers, ring buffer implementations, any bulk data movement between arrays. **When NOT to apply:** When copying only a few elements (< 10) — the overhead of setting up the native call exceeds the per-element loop cost. Also not needed if you can avoid the copy entirely (e.g., pass a read-only view).Exercise 5: String.format() in Hot Path 🟡 ⚡¶
What the code does: Formats 1 million log-style messages using String.format().
The problem: String.format() parses the format string on every call, creates a Formatter object, and allocates intermediate buffers — extremely heavy for a hot path.
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
String[] results = new String[1_000_000];
for (int i = 0; i < 1_000_000; i++) {
results[i] = String.format("User %d logged in at %d ms", i, System.nanoTime());
}
long elapsed = System.nanoTime() - start;
System.out.println("Generated: " + results.length + " messages");
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
Generated: 1,000,000 messages
Time: 2850 ms
Allocations: ~5,000,000 objects (Formatter, StringBuilder, char[], etc.)
💡 Hint
Replace `String.format()` with direct `StringBuilder` concatenation or simple `+` concatenation. The JIT can optimize `+` into efficient `StringBuilder` chains, but it cannot optimize away the `Formatter` parsing.⚡ Optimized Code
public class Main {
public static void main(String[] args) {
long start = System.nanoTime();
String[] results = new String[1_000_000];
for (int i = 0; i < 1_000_000; i++) {
// Simple concatenation — JIT optimizes this into StringBuilder
results[i] = "User " + i + " logged in at " + System.nanoTime() + " ms";
}
long elapsed = System.nanoTime() - start;
System.out.println("Generated: " + results.length + " messages");
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** `String.format()` internally creates a `java.util.Formatter`, which parses the format string character by character on every call, creates a `StringBuilder`, and handles argument boxing. For simple integer/string formatting, this is massive overkill. The `+` operator is compiled by `javac` into `StringBuilder.append()` chains (or `invokedynamic` `StringConcatFactory` on Java 9+), which is far simpler. **When to apply:** High-throughput log message generation, CSV row building, any hot loop that formats strings with simple values. **When NOT to apply:** Complex formatting with locale-specific number/date formatting, or when the format string comes from configuration/user input. Also fine for cold paths where readability matters more.Exercise 6: Synchronized Counter vs AtomicLong 🟡 🔄¶
What the code does: Increments a shared counter from 4 threads, 1 million times each.
The problem: synchronized acquires a monitor lock on every increment, causing heavy thread contention and context switches.
public class Main {
private static long counter = 0;
public static synchronized void increment() {
counter++;
}
public static void main(String[] args) throws InterruptedException {
int threadCount = 4;
int perThread = 1_000_000;
Thread[] threads = new Thread[threadCount];
long start = System.nanoTime();
for (int t = 0; t < threadCount; t++) {
threads[t] = new Thread(() -> {
for (int i = 0; i < perThread; i++) {
increment();
}
});
threads[t].start();
}
for (Thread thread : threads) {
thread.join();
}
long elapsed = System.nanoTime() - start;
System.out.println("Counter: " + counter);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Use `java.util.concurrent.atomic.AtomicLong` — it uses CPU-level CAS (Compare-And-Swap) instructions instead of OS-level monitor locks. For even higher throughput, consider `LongAdder`.⚡ Optimized Code
import java.util.concurrent.atomic.LongAdder;
public class Main {
// LongAdder uses striped cells — each thread increments its own cell
private static final LongAdder counter = new LongAdder();
public static void main(String[] args) throws InterruptedException {
int threadCount = 4;
int perThread = 1_000_000;
Thread[] threads = new Thread[threadCount];
long start = System.nanoTime();
for (int t = 0; t < threadCount; t++) {
threads[t] = new Thread(() -> {
for (int i = 0; i < perThread; i++) {
counter.increment(); // lock-free, per-cell increment
}
});
threads[t].start();
}
for (Thread thread : threads) {
thread.join();
}
long elapsed = System.nanoTime() - start;
System.out.println("Counter: " + counter.sum());
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** `synchronized` uses a monitor lock backed by OS mutexes under contention. When 4 threads compete for the same lock, 3 are always blocked — they context-switch to the OS scheduler and back. `LongAdder` uses a `Cell[]` array where each thread hashes to its own cell and increments it independently. The cells are `@Contended`-padded to avoid false sharing. `sum()` reads all cells at the end. This converts a contended single-writer pattern into a distributed multi-writer pattern. **When to apply:** High-contention counters (request counters, metrics, statistics), event counting in high-throughput systems. **When NOT to apply:** When you need the counter value to be immediately visible to other threads (use `AtomicLong` instead). `LongAdder.sum()` is not atomic — it gives an approximate snapshot. Also unnecessary for single-threaded code.Exercise 7: Reading File Line by Line Without Buffering 🟡 💾¶
What the code does: Reads a file line by line using unbuffered InputStreamReader and manual character scanning.
The problem: Without buffering, each read() call triggers a separate I/O syscall to the OS.
import java.io.FileWriter;
import java.io.FileReader;
import java.io.IOException;
public class Main {
public static void main(String[] args) throws IOException {
// Create test file
String filename = "testfile.txt";
FileWriter fw = new FileWriter(filename);
for (int i = 0; i < 100_000; i++) {
fw.write("This is line number " + i + " with some padding data\n");
}
fw.close();
// Slow read: character by character without buffering
long start = System.nanoTime();
FileReader reader = new FileReader(filename);
StringBuilder line = new StringBuilder();
int lineCount = 0;
int ch;
while ((ch = reader.read()) != -1) { // 1 syscall per character!
if (ch == '\n') {
lineCount++;
line.setLength(0);
} else {
line.append((char) ch);
}
}
reader.close();
long elapsed = System.nanoTime() - start;
System.out.println("Lines: " + lineCount);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
new java.io.File(filename).delete();
}
}
Current benchmark:
💡 Hint
Use `BufferedReader` which reads 8KB chunks at a time, or use `Files.lines()` / `Files.readAllLines()` for an even simpler API. The key is to amortize the syscall cost over thousands of characters.⚡ Optimized Code
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class Main {
public static void main(String[] args) throws IOException {
// Create test file
String filename = "testfile.txt";
FileWriter fw = new FileWriter(filename);
for (int i = 0; i < 100_000; i++) {
fw.write("This is line number " + i + " with some padding data\n");
}
fw.close();
// Fast read: BufferedReader reads 8KB chunks, readLine() scans buffer
long start = System.nanoTime();
BufferedReader reader = new BufferedReader(new FileReader(filename), 65536);
int lineCount = 0;
while (reader.readLine() != null) {
lineCount++;
}
reader.close();
long elapsed = System.nanoTime() - start;
System.out.println("Lines: " + lineCount);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
new java.io.File(filename).delete();
}
}
📚 Learn More
**Why this works:** Each `read()` on an unbuffered `FileReader` invokes a native `read(fd, buf, 1)` syscall. Syscalls take 1-10 microseconds due to the user-kernel context switch. `BufferedReader` calls `read(fd, buf, 65536)` once to fill its internal buffer, then subsequent `read()` calls return characters from the buffer (a simple array access — ~1 nanosecond). This amortizes the syscall overhead across 65,536 characters. **When to apply:** Any file or stream reading — config file parsing, log analysis, data import pipelines. **When NOT to apply:** When reading from an already-buffered source (e.g., `ByteArrayInputStream`), adding another buffer layer is wasteful. Also consider memory-mapped files (`FileChannel.map()`) for random access on very large files.Exercise 8: Virtual Method Dispatch vs Static Methods 🟡 ⚡¶
What the code does: Calls a method 100 million times through an interface reference vs directly.
The problem: Virtual method dispatch through an interface requires vtable lookup, which prevents inlining and adds indirection overhead.
public class Main {
interface Calculator {
int compute(int a, int b);
}
static class Adder implements Calculator {
public int compute(int a, int b) {
return a + b;
}
}
public static void main(String[] args) {
Calculator calc = new Adder();
int result = 0;
long start = System.nanoTime();
for (int i = 0; i < 100_000_000; i++) {
result += calc.compute(i, 1); // virtual dispatch each call
}
long elapsed = System.nanoTime() - start;
System.out.println("Result: " + result);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Use a concrete type reference or a `static` method. The JIT can inline monomorphic call sites, but using the concrete type makes it a guaranteed devirtualization. Also consider the `final` keyword on the class.⚡ Optimized Code
public class Main {
// final prevents subclassing — guarantees devirtualization
static final class Adder {
static int compute(int a, int b) {
return a + b;
}
}
public static void main(String[] args) {
int result = 0;
long start = System.nanoTime();
for (int i = 0; i < 100_000_000; i++) {
result += Adder.compute(i, 1); // static dispatch — no vtable
}
long elapsed = System.nanoTime() - start;
System.out.println("Result: " + result);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Why this works:** Virtual dispatch requires loading the object's class pointer, indexing into the vtable, and performing an indirect jump. This adds ~2-5ns of overhead and, critically, prevents inlining. When the JIT inlines a method, it can further optimize: constant folding, dead code elimination, loop vectorization. A `static` method on a `final` class is a direct call — always inlinable. Note: HotSpot's JIT does perform speculative devirtualization for monomorphic sites (only one implementation seen), but this is a guarded optimization that can be deoptimized. **When to apply:** Ultra-hot inner loops in numerical computation, game engines, parsers. When the polymorphism is not needed. **When NOT to apply:** When you genuinely need polymorphism (strategy pattern, plugin architectures). The JIT handles monomorphic call sites well — only megamorphic sites (3+ implementations) cause real problems. Premature devirtualization destroys maintainability.Exercise 9: JIT Compilation Warmup and Benchmark Accuracy 🔴 ⚡¶
What the code does: Benchmarks a simple computation but gets inaccurate results because the JIT has not warmed up.
The problem: The first invocations run in interpreted mode (10-100x slower). The benchmark measures startup cost, not steady-state performance.
public class Main {
static int compute(int x) {
int result = 0;
for (int i = 0; i < 1000; i++) {
result += x * i + (i % 7);
}
return result;
}
public static void main(String[] args) {
// BAD: No warmup — measures interpreted bytecode speed
long start = System.nanoTime();
int total = 0;
for (int i = 0; i < 10_000; i++) {
total += compute(i);
}
long elapsed = System.nanoTime() - start;
System.out.println("Total: " + total);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
Profiling output:
-XX:+PrintCompilation shows:
10ms: compute compiled by C1 (tier 3)
45ms: compute compiled by C2 (tier 4, optimized)
First 5000 iterations run in interpreted/C1 mode
💡 Hint
Add a warmup phase that runs the same code path at least 10,000 times before measuring. This triggers JIT compilation (C1 at ~1,500 invocations, C2 at ~10,000 invocations on HotSpot). Use a `volatile` or `System.identityHashCode()` trick to prevent dead code elimination.⚡ Optimized Code
public class Main {
static int compute(int x) {
int result = 0;
for (int i = 0; i < 1000; i++) {
result += x * i + (i % 7);
}
return result;
}
// volatile prevents the JIT from eliminating the warmup computation
static volatile int blackhole;
public static void main(String[] args) {
// WARMUP: trigger C2 JIT compilation
for (int w = 0; w < 20_000; w++) {
blackhole = compute(w);
}
// MEASUREMENT: JIT-compiled steady-state performance
long start = System.nanoTime();
int total = 0;
for (int i = 0; i < 10_000; i++) {
total += compute(i);
}
long elapsed = System.nanoTime() - start;
System.out.println("Total: " + total);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Advanced concept:** HotSpot JVM uses tiered compilation: Tier 0 (interpreter), Tier 1-3 (C1 compiler — fast compile, basic optimizations), Tier 4 (C2 compiler — slow compile, aggressive optimizations like loop unrolling, vectorization, escape analysis). C2 kicks in after ~10,000 invocations. Without warmup, your benchmark measures a mix of all tiers. For production-grade benchmarks, use JMH (Java Microbenchmark Harness) which handles warmup, dead code elimination, and fork isolation automatically. **JVM flags for investigation:** - `-XX:+PrintCompilation` — shows when methods are compiled and at what tier - `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` — shows inlining decisions - `-XX:CompileThreshold=N` — adjust invocation threshold for compilation **When to apply:** Any microbenchmark, performance regression testing, comparative benchmarks. **When NOT to apply:** In production code, warmup happens naturally. This is purely a benchmarking technique. For serious benchmarks, use JMH instead of hand-rolled timing.Exercise 10: Escape Analysis — Heap vs Stack Allocation 🔴 📦¶
What the code does: Creates a small temporary Point object in every iteration of a hot loop.
The problem: The object is allocated on the heap even though it never escapes the method — escape analysis could eliminate the allocation, but the code structure prevents it.
public class Main {
static class Point {
double x, y;
Point(double x, double y) { this.x = x; this.y = y; }
double distanceTo(Point other) {
double dx = this.x - other.x;
double dy = this.y - other.y;
return Math.sqrt(dx * dx + dy * dy);
}
}
// This list forces the Point to "escape" — prevents scalar replacement
static java.util.List<Point> leaked = new java.util.ArrayList<>();
static double computeDistance(int i) {
Point a = new Point(i, i + 1);
Point b = new Point(i + 2, i + 3);
if (i == Integer.MAX_VALUE) {
leaked.add(a); // rare but JIT sees it — prevents escape analysis
}
return a.distanceTo(b);
}
public static void main(String[] args) {
// Warmup
for (int w = 0; w < 20_000; w++) {
computeDistance(w);
}
long start = System.nanoTime();
double total = 0;
for (int i = 0; i < 10_000_000; i++) {
total += computeDistance(i);
}
long elapsed = System.nanoTime() - start;
System.out.println("Total: " + total);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
Total: 2.828...E7
Time: 185 ms
Allocations: 20,000,000 Point objects on the heap
GC events: 15 minor GCs
Profiling output:
💡 Hint
Remove the code path that leaks the object reference. If the JIT can prove an object never escapes the method, it performs "scalar replacement" — the object's fields become local variables on the stack. No allocation, no GC.⚡ Optimized Code
public class Main {
// final class — helps JIT guarantee no subclass overrides distanceTo
static final class Point {
final double x, y;
Point(double x, double y) { this.x = x; this.y = y; }
double distanceTo(Point other) {
double dx = this.x - other.x;
double dy = this.y - other.y;
return Math.sqrt(dx * dx + dy * dy);
}
}
static double computeDistance(int i) {
// No escape path — JIT performs scalar replacement
Point a = new Point(i, i + 1);
Point b = new Point(i + 2, i + 3);
return a.distanceTo(b);
}
public static void main(String[] args) {
// Warmup
for (int w = 0; w < 20_000; w++) {
computeDistance(w);
}
long start = System.nanoTime();
double total = 0;
for (int i = 0; i < 10_000_000; i++) {
total += computeDistance(i);
}
long elapsed = System.nanoTime() - start;
System.out.println("Total: " + total);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Advanced concept:** Escape Analysis (EA) is a C2 JIT optimization that determines whether an object's reference can "escape" the method or thread. If it cannot, the JIT performs **scalar replacement**: instead of allocating a `Point` object (header + x + y = 32 bytes on heap), it creates two local `double` variables on the stack. Stack allocation is free (just a stack pointer adjustment), and stack-allocated data is automatically reclaimed when the method returns — no GC involved. **JVM flags for investigation:** - `-XX:+PrintEscapeAnalysis` — shows escape analysis decisions - `-XX:+PrintEliminateAllocations` — confirms which allocations were removed - `-XX:+DoEscapeAnalysis` (enabled by default) — toggle EA - `-XX:+EliminateAllocations` (enabled by default) — toggle scalar replacement **Escape categories:** - `NoEscape` — object stays within the method → scalar replacement possible - `ArgEscape` — object passed as argument but doesn't escape thread → possible stack allocation - `GlobalEscape` — object stored in static field or returned → must be heap allocated **When to apply:** Intermediate value objects in numerical code, temporary wrappers (pairs, tuples, ranges), iterator objects. **When NOT to apply:** When you genuinely need the object to persist beyond the method scope. Also, EA has limits: objects larger than ~64 fields, arrays, or objects with complex control flow may not be scalar-replaced. Don't contort your code to help EA — write clean code and profile.Exercise 11: Redundant Exception Creation for Control Flow 🔴 🔄¶
What the code does: Parses 1 million strings as integers, using exception catching for non-numeric strings.
The problem: Integer.parseInt() throws NumberFormatException for invalid input. Exception creation fills in the stack trace — an extremely expensive operation (~10 microseconds per exception).
public class Main {
static int parseOrDefault(String s, int defaultVal) {
try {
return Integer.parseInt(s);
} catch (NumberFormatException e) {
return defaultVal; // exception used for control flow
}
}
public static void main(String[] args) {
String[] data = new String[1_000_000];
for (int i = 0; i < data.length; i++) {
// 50% valid numbers, 50% invalid strings
data[i] = (i % 2 == 0) ? String.valueOf(i) : "not_a_number";
}
// Warmup
for (int w = 0; w < 20_000; w++) {
parseOrDefault(data[w % data.length], -1);
}
long start = System.nanoTime();
long sum = 0;
for (String s : data) {
sum += parseOrDefault(s, 0);
}
long elapsed = System.nanoTime() - start;
System.out.println("Sum: " + sum);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
Current benchmark:
💡 Hint
Pre-validate the string before calling `parseInt()`. A simple character scan to check if all characters are digits is ~100x cheaper than creating and throwing an exception. Alternatively, write a custom `tryParseInt` that returns a sentinel value instead of throwing.⚡ Optimized Code
public class Main {
static int parseOrDefault(String s, int defaultVal) {
// Pre-validate: check if string is a valid integer before parsing
if (s == null || s.isEmpty()) return defaultVal;
int start = 0;
if (s.charAt(0) == '-' || s.charAt(0) == '+') {
if (s.length() == 1) return defaultVal;
start = 1;
}
for (int i = start; i < s.length(); i++) {
if (s.charAt(i) < '0' || s.charAt(i) > '9') {
return defaultVal; // not a digit — return default without exception
}
}
// Safe to parse — we know it's a valid integer string
try {
return Integer.parseInt(s);
} catch (NumberFormatException e) {
return defaultVal; // overflow edge case
}
}
public static void main(String[] args) {
String[] data = new String[1_000_000];
for (int i = 0; i < data.length; i++) {
data[i] = (i % 2 == 0) ? String.valueOf(i) : "not_a_number";
}
// Warmup
for (int w = 0; w < 20_000; w++) {
parseOrDefault(data[w % data.length], -1);
}
long start = System.nanoTime();
long sum = 0;
for (String s : data) {
sum += parseOrDefault(s, 0);
}
long elapsed = System.nanoTime() - start;
System.out.println("Sum: " + sum);
System.out.println("Time: " + (elapsed / 1_000_000) + " ms");
}
}
📚 Learn More
**Advanced concept:** Exception creation in Java is expensive because `Throwable.fillInStackTrace()` is called in the constructor. This native method walks the entire call stack, resolves method names and line numbers, and stores them in a `StackTraceElement[]`. On a typical application with 20-30 frame stacks, this takes 5-15 microseconds. The `throw` and `catch` mechanism itself (stack unwinding) adds another 1-5 microseconds. For comparison, a simple method call takes ~2 nanoseconds — exceptions are 1,000-10,000x more expensive. **Alternative approaches:** - Custom `tryParseInt()` that returns `OptionalInt` — no exception path at all - Pattern matching with pre-compiled regex (slower than char scan but more readable) - On Java 9+: consider `Optional` APIs with `Stream.mapMulti()` **When to apply:** Any hot path where exceptions are expected (parsing user input, data validation, protocol handling). If more than ~1% of calls throw, exceptions dominate the runtime. **When NOT to apply:** Truly exceptional conditions (file not found, network failure, out of memory) where exceptions are rare and carry important diagnostic information. The stack trace is valuable for debugging — don't optimize it away in error paths.Score Card¶
Track your progress:
| Exercise | Difficulty | Category | Found bottleneck? | Your improvement | Target improvement |
|---|---|---|---|---|---|
| 1 | 🟢 | 📦 | ☐ | ___ x | 569x |
| 2 | 🟢 | 📦 | ☐ | ___ x | 9.7x |
| 3 | 🟢 | 💾 | ☐ | ___ x | 16.7x |
| 4 | 🟡 | 📦 | ☐ | ___ x | 4.5x |
| 5 | 🟡 | ⚡ | ☐ | ___ x | 8.4x |
| 6 | 🟡 | 🔄 | ☐ | ___ x | 10.3x |
| 7 | 🟡 | 💾 | ☐ | ___ x | 71x |
| 8 | 🟡 | ⚡ | ☐ | ___ x | 3.8x |
| 9 | 🔴 | ⚡ | ☐ | ___ x | 7.7x |
| 10 | 🔴 | 📦 | ☐ | ___ x | 5.8x |
| 11 | 🔴 | 🔄 | ☐ | ___ x | 56.6x |
Rating:¶
- All targets met → You understand Java performance deeply
- 8-11 targets met → Solid optimization skills
- 5-7 targets met → Good foundation, practice profiling more
- < 5 targets met → Start with JMH and VisualVM basics
Optimization Cheat Sheet¶
Quick reference for common Java optimizations:
| Problem | Solution | Impact |
|---|---|---|
| String concatenation in loop | Use StringBuilder with pre-allocated capacity | High |
| Autoboxing in tight loops | Use primitive types (int, long, double) | High |
System.out.println in loops | Use BufferedWriter with large buffer | High |
| Manual array copying | Use System.arraycopy() — JVM intrinsic | Medium |
String.format() in hot path | Use + concatenation or StringBuilder | High |
| Synchronized counter | Use AtomicLong or LongAdder | High |
| Unbuffered file reading | Use BufferedReader with 64KB buffer | High |
| Virtual method dispatch in hot loop | Use static methods or final classes | Medium |
| Benchmark without warmup | Add 20,000+ warmup iterations, use JMH | Medium |
| Heap allocation of temp objects | Enable escape analysis — avoid escaping references | Medium |
| Exception-based control flow | Pre-validate input before parsing | High |
| Frequent GC pauses | Reduce allocations, reuse objects, use object pools | High |
| Cache line contention | Use @Contended or pad fields manually | Medium |
| Megamorphic call sites | Reduce interface implementations at call site to ≤ 2 | Medium |