Math Operations — Professional Level¶

Table of Contents¶

Introduction
JVM Internals: Numeric Representation
Bytecode Analysis
Math Class JIT Intrinsics
IEEE 754 Deep Dive
BigDecimal Implementation Internals
CPU-Level Math Operations
GC Impact of Numeric Objects
JVM Flags and Tuning
Diagrams & Visual Aids

Introduction¶

Focus: "What happens under the hood?" — JVM, bytecode, GC level

This level explores the JVM internals behind Java math operations: how the HotSpot VM compiles arithmetic to native instructions, what bytecode the compiler generates for different numeric types, how Math methods are JIT-intrinsified to single CPU instructions, and how BigDecimal is implemented internally. Understanding these internals is essential for diagnosing performance anomalies in computation-heavy applications and making informed optimization decisions.

JVM Internals: Numeric Representation¶

JVM Operand Stack and Numeric Types¶

The JVM is a stack-based machine. All arithmetic operations work through the operand stack. The JVM has dedicated bytecode instructions for each numeric type:

Type	Stack Slots	Bytecodes
`int`	1 slot (32 bits)	`iadd`, `isub`, `imul`, `idiv`, `irem`
`long`	2 slots (64 bits)	`ladd`, `lsub`, `lmul`, `ldiv`, `lrem`
`float`	1 slot (32 bits)	`fadd`, `fsub`, `fmul`, `fdiv`, `frem`
`double`	2 slots (64 bits)	`dadd`, `dsub`, `dmul`, `ddiv`, `drem`

Note: byte, short, and char arithmetic is always promoted to int on the JVM operand stack. There are no badd, sadd instructions.

Integer Overflow at the Bytecode Level¶

public class Main {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        int result = max + 1;  // overflow
        System.out.println(result);
    }
}

The JVM specification (JVMS 2.11.1) states:

The Java Virtual Machine does not indicate overflow during operations on integer data types. The only integer operations that can throw an exception are the integer divide and integer remainder instructions.

The iadd instruction simply performs two's complement addition. If the result exceeds 32 bits, the upper bits are discarded. This is not a bug — it is by design for performance.

Math.addExact() Implementation (OpenJDK)¶

// From java.lang.Math (OpenJDK 17)
public static int addExact(int x, int y) {
    int r = x + y;
    // Overflow occurs if both operands have the same sign
    // but the result has a different sign
    if (((x ^ r) & (y ^ r)) < 0) {
        throw new ArithmeticException("integer overflow");
    }
    return r;
}

The overflow check uses XOR and AND on sign bits — no branching on the happy path (the JIT compiles this to a single jo jump-on-overflow instruction on x86).

Bytecode Analysis¶

Arithmetic Bytecodes¶

public class Main {
    public static int arithmetic(int a, int b) {
        return (a + b) * (a - b) / 2;
    }

    public static void main(String[] args) {
        System.out.println(arithmetic(10, 3));
    }
}

Compile and disassemble:

javac Main.java
javap -c Main.class

Bytecode output:

public static int arithmetic(int, int);
  Code:
     0: iload_0           // push a
     1: iload_1           // push b
     2: iadd              // a + b
     3: iload_0           // push a
     4: iload_1           // push b
     5: isub              // a - b
     6: imul              // (a+b) * (a-b)
     7: iconst_2          // push 2
     8: idiv              // result / 2
     9: ireturn           // return int

BigDecimal Bytecode vs Primitive¶

public class Main {
    // Primitive version
    public static double primitiveCalc(double a, double b) {
        return a * b + a / b;
    }

    // BigDecimal version
    public static java.math.BigDecimal bigDecimalCalc(
            java.math.BigDecimal a, java.math.BigDecimal b) {
        return a.multiply(b).add(a.divide(b, 10, java.math.RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        System.out.println(primitiveCalc(10.0, 3.0));
        System.out.println(bigDecimalCalc(
            new java.math.BigDecimal("10"),
            new java.math.BigDecimal("3")));
    }
}

Bytecode comparison:

primitiveCalc:           bigDecimalCalc:
  dload_0                  aload_0
  dload_2                  aload_1
  dmul           vs        invokevirtual multiply  (heap alloc)
  dload_0                  aload_0
  dload_2                  aload_1
  ddiv                     bipush 10
  dadd                     getstatic HALF_UP
  dreturn                  invokevirtual divide    (heap alloc)
                           invokevirtual add       (heap alloc)
                           areturn

Instructions:  6          Instructions: ~15 + 3 object allocations

The primitive version uses only stack operations (no heap allocation). The BigDecimal version allocates at least 3 new objects per call.

Type Conversion Bytecodes¶

public class Main {
    public static void main(String[] args) {
        int i = 42;
        long l = i;       // i2l — widening
        double d = i;     // i2d — widening
        float f = (float) d;  // d2f — narrowing
        int back = (int) d;   // d2i — narrowing (truncates)
        System.out.println(i + " " + l + " " + d + " " + f + " " + back);
    }
}

Conversion bytecodes:

From/To	int	long	float	double
int	—	`i2l`	`i2f`	`i2d`
long	`l2i`	—	`l2f`	`l2d`
float	`f2i`	`f2l`	—	`f2d`
double	`d2i`	`d2l`	`d2f`	—

Widening conversions (i2l, i2d) are always safe. Narrowing conversions (d2i, l2i) may lose precision or truncate.

Math Class JIT Intrinsics¶

What Are Intrinsics?¶

The HotSpot JIT compiler replaces certain Math method calls with direct CPU instructions — no actual method call occurs. These are called intrinsics.

Intrinsified Math methods (x86-64):

Method	x86 Instruction	Notes
`Math.abs(int)`	Inline code (XOR + subtract)	No branch
`Math.abs(double)`	`andpd` (mask sign bit)	Single SSE instruction
`Math.max(int, int)`	`cmov` (conditional move)	No branch
`Math.min(int, int)`	`cmov` (conditional move)	No branch
`Math.sqrt(double)`	`sqrtsd`	Hardware square root
`Math.log(double)`	Software (fdlibm) or hardware	Platform-dependent
`Math.sin(double)`	Software (fdlibm)	Not always intrinsified
`Math.addExact(int, int)`	`add` + `jo` (jump on overflow)	Hardware overflow flag

Verifying Intrinsics with JIT Output¶

# Compile and run with JIT assembly output
javac Main.java
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,Main::testSqrt Main

public class Main {
    static double testSqrt(double x) {
        return Math.sqrt(x);
    }

    public static void main(String[] args) {
        // Warm up JIT
        for (int i = 0; i < 100_000; i++) {
            testSqrt(i);
        }
        System.out.println(testSqrt(144));
    }
}

Expected JIT assembly (x86-64):

; Math.sqrt intrinsic:
vsqrtsd xmm0, xmm0, xmm0    ; single hardware instruction
ret

The entire Math.sqrt() call compiles to a single CPU instruction — zero overhead compared to writing it in C.

Math.addExact JIT Compilation¶

public class Main {
    static int safeAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) safeAdd(i, i);
        System.out.println(safeAdd(100, 200));
    }
}

JIT assembly:

; Math.addExact intrinsic:
add    eax, edx           ; regular integer addition
jo     overflow_handler   ; jump if overflow flag set (CPU hardware)
ret

overflow_handler:
; construct and throw ArithmeticException

The overflow check uses the CPU's hardware overflow flag — it costs essentially zero on the non-overflow path.

IEEE 754 Deep Dive¶

Double-Precision Bit Layout¶

public class Main {
    public static void main(String[] args) {
        double value = -3.14;
        long bits = Double.doubleToRawLongBits(value);

        long sign = (bits >>> 63) & 1;
        long exponent = (bits >>> 52) & 0x7FFL;
        long mantissa = bits & 0xFFFFFFFFFFFFFL;

        System.out.printf("Value:    %f%n", value);
        System.out.printf("Bits:     %s%n", Long.toBinaryString(bits));
        System.out.printf("Sign:     %d (%s)%n", sign, sign == 0 ? "positive" : "negative");
        System.out.printf("Exponent: %d (biased), %d (unbiased)%n", exponent, exponent - 1023);
        System.out.printf("Mantissa: %d%n", mantissa);

        // Explore special values
        System.out.println("\nSpecial values:");
        System.out.printf("NaN bits:      %s%n",
            Long.toHexString(Double.doubleToRawLongBits(Double.NaN)));
        System.out.printf("+Inf bits:     %s%n",
            Long.toHexString(Double.doubleToRawLongBits(Double.POSITIVE_INFINITY)));
        System.out.printf("-0.0 bits:     %s%n",
            Long.toHexString(Double.doubleToRawLongBits(-0.0)));
        System.out.printf("+0.0 bits:     %s%n",
            Long.toHexString(Double.doubleToRawLongBits(0.0)));
        System.out.printf("-0.0 == +0.0:  %b%n", -0.0 == 0.0);  // true!
    }
}

Why 0.1 + 0.2 != 0.3¶

public class Main {
    public static void main(String[] args) {
        // Show exact representation of 0.1 in binary
        double d = 0.1;
        System.out.printf("0.1 exact: %.55f%n", d);
        // 0.1000000000000000055511151231257827021181583404541015625

        // Binary representation of 0.1:
        // 0.0001100110011001100110011001100110011001100110011001101...
        // (repeating pattern, truncated to 52 mantissa bits)

        double sum = 0.1 + 0.2;
        System.out.printf("0.1 + 0.2 = %.55f%n", sum);
        // 0.3000000000000000444089209850062616169452667236328125000

        System.out.printf("0.3 exact: %.55f%n", 0.3);
        // 0.2999999999999999888977697537484345957636833190917968750

        // The closest representable double to the true sum is not the
        // closest representable double to 0.3
    }
}

Negative Zero Behavior¶

public class Main {
    public static void main(String[] args) {
        double posZero = 0.0;
        double negZero = -0.0;

        System.out.println(posZero == negZero);                      // true  (== says equal)
        System.out.println(Double.compare(posZero, negZero));         // 1     (compare says different)
        System.out.println(Double.doubleToRawLongBits(posZero) ==
                          Double.doubleToRawLongBits(negZero));       // false (different bits)

        System.out.println(1.0 / posZero);   // Infinity
        System.out.println(1.0 / negZero);   // -Infinity

        // Math.abs preserves negative zero in some JVMs:
        System.out.println(1.0 / Math.abs(negZero));  // Infinity (abs fixes it)
    }
}

BigDecimal Implementation Internals¶

Internal Representation (OpenJDK 17)¶

// Simplified from java.math.BigDecimal source
public class BigDecimal extends Number implements Comparable<BigDecimal> {
    // For values fitting in long: stored directly (fast path)
    private final transient long intCompact;  // INFLATED if too large

    // For larger values: stored as BigInteger
    private volatile BigInteger intVal;

    // Scale = number of digits to the right of decimal point
    private final int scale;

    // Cached precision (lazy, 0 = not computed)
    private transient int precision;

    // Cached toString (lazy)
    private transient String stringCache;

    static final long INFLATED = Long.MIN_VALUE;  // sentinel value
}

Key insight: BigDecimal has a compact representation for values that fit in a long. The intCompact field stores the unscaled value directly, avoiding BigInteger allocation. Only when the value exceeds long range does it inflate to BigInteger.

How BigDecimal.add() Works Internally¶

public class Main {
    public static void main(String[] args) {
        // Trace what happens inside add()
        java.math.BigDecimal a = new java.math.BigDecimal("123.45");   // intCompact=12345, scale=2
        java.math.BigDecimal b = new java.math.BigDecimal("0.001");    // intCompact=1, scale=3

        // Step 1: Align scales — a must become scale=3
        //   a: 12345 * 10 = 123450, scale=3
        // Step 2: Add compact values
        //   123450 + 1 = 123451
        // Step 3: Create new BigDecimal(123451, 3)
        java.math.BigDecimal result = a.add(b);
        System.out.println(result);          // 123.451
        System.out.println(result.scale());  // 3
    }
}

The scale alignment step can cause long overflow (e.g., multiplying by 10^N), in which case BigDecimal falls back to BigInteger arithmetic.

BigInteger Internals¶

// Simplified from java.math.BigInteger source
public class BigInteger extends Number implements Comparable<BigInteger> {
    final int signum;     // -1, 0, or 1
    final int[] mag;      // magnitude stored as big-endian int array

    // For small values, uses compact representation:
    // mag.length == 1, direct int arithmetic

    // For large values, uses Karatsuba or Toom-Cook multiplication:
    // Karatsuba: O(n^1.585) for > 80 ints
    // Toom-Cook 3: O(n^1.465) for > 240 ints
    // Schoenhage-Strassen (via NTT): for very large numbers
}

public class Main {
    public static void main(String[] args) {
        java.math.BigInteger a = new java.math.BigInteger("123456789012345678901234567890");
        java.math.BigInteger b = a.multiply(a);
        System.out.println("Digits: " + b.toString().length());  // 60 digits
        System.out.println("Bit length: " + b.bitLength());      // ~199 bits
    }
}

CPU-Level Math Operations¶

x86-64 Instruction Mapping¶

Java Operation	JIT Output (x86-64)	Latency (cycles)
`int + int`	`add eax, edx`	1
`int * int`	`imul eax, edx`	3
`int / int`	`idiv ecx`	20-90
`double + double`	`vaddsd xmm0, xmm0, xmm1`	3-5
`double * double`	`vmulsd xmm0, xmm0, xmm1`	3-5
`double / double`	`vdivsd xmm0, xmm0, xmm1`	13-20
`Math.sqrt(double)`	`vsqrtsd xmm0, xmm0, xmm0`	13-19
`Math.abs(int)`	`mov + neg + cmovl` (branchless)	2-3

Key observation: Integer division is extremely expensive (20-90 cycles). When dividing by constants, the JIT often replaces division with multiplication by a magic constant:

public class Main {
    static int divideBy7(int x) {
        return x / 7;
    }
    // JIT compiles to: multiply by magic constant + shift
    // imul rax, rcx, 0x24924925
    // sar rax, 34
    // This avoids the expensive idiv instruction

    public static void main(String[] args) {
        System.out.println(divideBy7(49));  // 7
    }
}

GC Impact of Numeric Objects¶

Autoboxing GC Pressure¶

public class Main {
    public static void main(String[] args) {
        // Each iteration creates a new Integer object (autoboxing)
        long start = System.nanoTime();
        Integer sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += i;  // unbox, add, rebox — creates Integer object each time
        }
        long boxedTime = System.nanoTime() - start;

        // Primitive — no GC pressure
        start = System.nanoTime();
        int sum2 = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum2 += i;
        }
        long primitiveTime = System.nanoTime() - start;

        System.out.printf("Boxed:     %,d ns%n", boxedTime);
        System.out.printf("Primitive: %,d ns%n", primitiveTime);
        System.out.printf("Ratio:     %.1fx slower%n", (double) boxedTime / primitiveTime);
    }
}

Typical output:

Boxed:     89,000,000 ns
Primitive:  5,000,000 ns
Ratio:     17.8x slower

Integer Cache (-128 to 127)¶

public class Main {
    public static void main(String[] args) {
        Integer a = 127;
        Integer b = 127;
        System.out.println(a == b);  // true — cached

        Integer c = 128;
        Integer d = 128;
        System.out.println(c == d);  // false — new objects

        // The cache range can be extended:
        // -XX:AutoBoxCacheMax=1000
    }
}

The JVM caches Integer objects from -128 to 127 (configurable via -XX:AutoBoxCacheMax). Values outside this range allocate new objects on every boxing operation.

JVM Flags and Tuning¶

Flag	Default	Description
`-XX:AutoBoxCacheMax=N`	127	Extend Integer autobox cache to N
`-XX:+AggressiveOpts`	false	Enable aggressive optimizations (deprecated)
`-XX:+UseCompressedOops`	true (64-bit)	Reduce object header size
`-XX:+EliminateAutoBox`	true	JIT eliminates unnecessary boxing/unboxing
`-XX:+UseFMA`	true (JDK 9+)	Use Fused Multiply-Add CPU instruction

Fused Multiply-Add (FMA)¶

Java 9 added Math.fma(a, b, c) which computes a * b + c in a single operation with only one rounding (instead of two):

public class Main {
    public static void main(String[] args) {
        double a = 1.0000001;
        double b = 1.0000002;
        double c = -1.0000003;

        // Standard: two roundings (multiply, then add)
        double standard = a * b + c;

        // FMA: single rounding (more accurate)
        double fma = Math.fma(a, b, c);

        System.out.printf("Standard: %.20f%n", standard);
        System.out.printf("FMA:      %.20f%n", fma);
    }
}

On x86-64 with AVX2, Math.fma() compiles to a single vfmadd instruction.

Diagrams & Visual Aids¶

JVM Bytecode Execution of `a + b * c`¶

sequenceDiagram participant Code as Source Code participant Stack as Operand Stack participant ALU as CPU ALU Code->>Stack: iload a (push a) Code->>Stack: iload b (push b) Code->>Stack: iload c (push c) Stack->>ALU: imul (pop b, c) ALU->>Stack: push b*c Stack->>ALU: iadd (pop a, b*c) ALU->>Stack: push a + b*c Code->>Stack: ireturn (pop result)

Math Method Resolution Pipeline¶

flowchart TD A["Math.sqrt(x)"] --> B{"JIT compiled?"} B -->|No| C["Interpreter: invokestatic Math.sqrt"] C --> D["Java implementation (StrictMath delegation)"] B -->|Yes| E{"Intrinsic available?"} E -->|Yes| F["Direct CPU instruction vsqrtsd xmm0, xmm0"] E -->|No| G["Compiled method call (still fast, but not 1 instruction)"] style F fill:#c8e6c9 style G fill:#fff9c4 style C fill:#ffcdd2

BigDecimal Internal Decision Tree¶

flowchart TD A["BigDecimal operation"] --> B{"Value fits in long?"} B -->|Yes| C["Compact path: long arithmetic"] C --> D{"Result fits in long?"} D -->|Yes| E["Return compact BigDecimal (fast)"] D -->|No| F["Inflate to BigInteger"] B -->|No| G["BigInteger path: int[] arithmetic"] G --> H{"Size > 80 ints?"} H -->|No| I["Schoolbook O(n^2)"] H -->|Yes| J{"Size > 240 ints?"} J -->|No| K["Karatsuba O(n^1.585)"] J -->|Yes| L["Toom-Cook O(n^1.465)"]