The Hidden Cost of Iteration Choices

Every Java developer has written this:

public class IndexedForLoopSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        long sum = 0;
        var n = data.size();
        for (int i = 0; i < n; i++) {
            sum += data.get(i);
        }
        return sum;
    }
}

It’s readable. Expressive. Safe. Clean. Easy to reason about.

But if data is a LinkedList, this innocent loop can take 3.8 seconds for 100,000 elements, while an enhanced for loop over the same data finishes in 0.15 milliseconds.

That’s not a micro-optimization. That’s a performance gap of roughly 25,000×.

⚡ TL;DR (Quick Recap)

  • Safest default: Enhanced for loop (works efficiently everywhere)
  • Fastest on ArrayList: Indexed for, but only marginally
  • Catastrophic: Indexed for on LinkedList (O(n²))
  • Overhyped: parallelStream(), slower in every tested scenario of this lightweight summation (not a universal rule)

Why Loop Strategy Still Matters

Modern Java abstracts a lot — Streams, lambdas, collections interfaces. But abstraction does not eliminate complexity.

The critical detail:

  • ArrayList.get(i) → O(1)
  • LinkedList.get(i) → O(n)

Now multiply that inside a loop. The JVM can optimize bytecode. It cannot fix a bad algorithm.

Benchmark Setup

The benchmark measures average execution time per operation in nanoseconds, using a shared benchmark state, 2 forks (each with 1 warmup fork), 5 warmup iterations of 1 second each, and 10 measurement iterations of 1 second each.

Test parameters:

  • JMH 1.37 on Java 25
  • JVM: -Xms1g -Xmx1g -XX:+UseG1GC

All implementations follow a simple contract:

public interface SumStrategy {
    long sumElements(List<Long> data);
}
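The numbers below come from JMH; reproducing that harness here is out of scope, but a minimal standalone sanity check (class and lambda names are my own, not from the benchmark project) confirms that the strategies agree on the same data:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: verifies two SumStrategy implementations produce the
// same result. Not a benchmark -- JMH handles warmup, forks, and timing.
public class StrategyCheck {

    interface SumStrategy {
        long sumElements(List<Long> data);
    }

    public static void main(String[] args) {
        List<Long> data = new ArrayList<>();
        for (long i = 1; i <= 1_000; i++) {
            data.add(i);
        }

        SumStrategy indexed = list -> {
            long sum = 0;
            for (int i = 0; i < list.size(); i++) {
                sum += list.get(i);
            }
            return sum;
        };
        SumStrategy enhanced = list -> {
            long sum = 0;
            for (Long n : list) {
                sum += n;
            }
            return sum;
        };

        // Gauss sum of 1..1000 is 500500; both strategies must match it.
        System.out.println(indexed.sumElements(data));  // 500500
        System.out.println(enhanced.sumElements(data)); // 500500
    }
}
```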

ArrayList: Performance Differences are Negligible

On ArrayList, everything works well.

At 10,000 elements:

  • Indexed for: ~3,159 ns
  • Enhanced for: ~3,181 ns
  • Iterator: ~3,176 ns
  • Stream: ~4,300 ns (~35% slower)

Differences are minimal because:

  • Random access is constant time
  • JIT optimizes loops aggressively
  • Iterator overhead is mostly eliminated (depends on the JIT and context)

Choose readability. Performance differences are negligible.

Enhanced for vs Iterator: No Real Difference

This:

public class EnhancedForLoopSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        long sum = 0;
        for (Long n : data) {
            sum += n;
        }
        return sum;
    }
}

or this:

public class ExplicitIteratorSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        long sum = 0;
        var it = data.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }
}

Same behavior. Same performance. Use whichever is more readable — typically the enhanced for.

LinkedList: Where Things Break Badly

Now the real story.

At 100,000 elements:

  • Enhanced for: ~159,000 ns (0.15 ms)
  • Indexed for: ~3,859,631,454 ns (3.8 s)

This is pure O(n²) behavior.

Why? Each get(i) walks the list from the start (or end).
So your loop becomes:

n * O(n) = O(n²)

This is not slow — it’s broken.
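One defensive pattern (a sketch of my own, not from the benchmark code): use the java.util.RandomAccess marker interface to decide whether indexed access is safe, and fall back to iteration otherwise:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class SafeSum {

    // Indexed access only when the list guarantees O(1) get(i);
    // otherwise iterate, which stays O(n) total even for LinkedList.
    public static long sumElements(List<Long> data) {
        long sum = 0;
        if (data instanceof RandomAccess) {
            for (int i = 0; i < data.size(); i++) {
                sum += data.get(i);
            }
        } else {
            for (Long n : data) {
                sum += n;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> array = new ArrayList<>(List.of(1L, 2L, 3L));
        List<Long> linked = new LinkedList<>(List.of(1L, 2L, 3L));
        System.out.println(sumElements(array));  // 6
        System.out.println(sumElements(linked)); // 6
    }
}
```

ArrayList implements RandomAccess; LinkedList does not, so the guard routes each list to the iteration style that matches its access cost.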

Streams: Clean, but Not Free

Streams improve readability, but they introduce overhead:

  • mapToLong().sum() → avoids boxing, best stream option
  • forEach() → slower due to lambda + mutable workaround
  • parallelStream() → consistently slower here (lightweight summation; not a universal rule)
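The best stream option from the list above, in code form (a minimal sketch):

```java
import java.util.List;

public class StreamMapToLongSum {

    // mapToLong switches to a primitive LongStream, so sum() adds
    // longs directly with no Long boxing per element.
    public static long sumElements(List<Long> data) {
        return data.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumElements(List.of(10L, 20L, 12L))); // 42
    }
}
```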

Why parallelStream fails here:

  • Thread coordination overhead
  • Small per-element workload
  • Poor memory locality

Parallelism only helps when the work per element is expensive.
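A rough illustration of that point (the workload size and the hash-mixing "work" are invented for this sketch; nothing here is measured): parallelism can only pay off when each element costs real CPU time, and either way the result must match the sequential one.

```java
import java.util.stream.LongStream;

public class ParallelWorkload {

    // Simulated "expensive" per-element work: iterate a hash mix.
    // With work this heavy, splitting across cores can win; for a bare
    // sum, fork/join coordination costs more than the additions saved.
    static long expensive(long n) {
        long h = n;
        for (int i = 0; i < 10_000; i++) {
            h = h * 6364136223846793005L + 1442695040888963407L;
        }
        return h;
    }

    public static void main(String[] args) {
        long sequential = LongStream.rangeClosed(1, 1_000)
                .map(ParallelWorkload::expensive)
                .sum();
        long parallel = LongStream.rangeClosed(1, 1_000)
                .parallel()
                .map(ParallelWorkload::expensive)
                .sum();
        // Same result either way; only the wall-clock time differs.
        System.out.println(sequential == parallel);
    }
}
```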

The Lambda “Mutable Capture” Problem

This pattern shows up often:

public class StreamForEachSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        long[] sum = {0};
        data.stream().forEach(n -> sum[0] += n);
        return sum[0];
    }
}

It works because lambdas can capture only effectively-final variables: the array reference stays final while its contents mutate.

But:

  • Adds indirection
  • Hurts readability
  • Signals wrong abstraction

Prefer:

public class StreamReduceSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        return data.stream().reduce(0L, Long::sum);
    }
}

Array Conversion: Smart Idea, Wrong Context

public class ArrayConversionSum implements SumStrategy {

    @Override
    public long sumElements(List<Long> data) {
        long[] arr = data.stream().mapToLong(Long::longValue).toArray();
        long sum = 0;
        for (int i = 0; i < arr.length; i++) {
            sum += arr[i];
        }
        return sum;
    }
}

The idea: convert once, then iterate over a primitive long[].

In theory:

  • No boxing
  • Better CPU optimization

In practice:

  • Conversion cost dominates
  • 3–4× slower for single-pass operations

Only useful if:

  • You reuse the array multiple times
  • Heavy computation follows
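When the converted array genuinely is reused, the one-time conversion cost amortizes across passes. A sketch (class and method names are mine, for illustration only):

```java
import java.util.List;

public class ReusedArrayStats {

    // Pay the unboxing cost once, then run several primitive
    // passes over the same long[] with no further boxing.
    public static long[] toPrimitive(List<Long> data) {
        return data.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        long[] arr = toPrimitive(List.of(3L, 1L, 4L, 1L, 5L));

        long sum = 0;
        for (long v : arr) {          // first pass
            sum += v;
        }

        long max = Long.MIN_VALUE;
        for (long v : arr) {          // second pass, same array
            max = Math.max(max, v);
        }

        System.out.println(sum); // 14
        System.out.println(max); // 5
    }
}
```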

Final Takeaways

Looping in Java feels trivial — but it’s one of the easiest places to introduce serious performance bugs.

The biggest lessons:

  • Data structure defines performance — not loop syntax
  • Enhanced for is the safest default
  • Indexed loops can be dangerous in the wrong context
  • Streams trade performance for clarity
  • Parallelism is not a free optimization

This isn’t about shaving nanoseconds. It’s about avoiding seconds of latency from a single line of code.

You can find all the code on GitHub.

Originally posted on marconak-matej.medium.com.