Know the Hidden Performance Costs of Java Streams

Since the beginning, the Java collections offer only a minimalistic API. If you wanted to determine any useful information which was not foreseen, you had to implement it yourself or use helper methods offered by various libraries.

With Java 8, streams came to the rescue. You now have a uniform API to query all information you could imagine. It is also easy to code, just type ".stream()" in your IDE and the it helps to explore the needed methods provided by the fluent API.

Unfortunately the stream approach - as handy as it is - comes with a cost in performance which cannot always be neglected. Let's have a look at the two simple methods below which count the number of occurrences of elements in a list – once in the traditional way, once using streams.

static <T> int countIfStream(List<T> list, Predicate<T> filter) {
    return (int) list.stream().filter(filter).count();
}

static <T> int countIfList(List<T> list, Predicate<T> filter) {
    int count = 0;
    for (int i = 0; i < list.size(); i++) {
        T elem = list.get(i);
        if (filter.test(elem)) {
            count++;
        }
    }
    return count;
}

The stream variant is much shorter, so the lazy developer will typically use the one liner shown to accomplish the task. What however if we look behind the scenes of the written source to see what's really going on? We will investigate the methods regarding the topics complexity, performance and memory – whereas these topics are obviously related.

Complexity

To get a feeling for the complexity, I propose that you fire up the debugger and strep through the code executed for an empty list – this way we just see the overhead needed for preparing the iteration.

The number of calls for countIfList() is short as expected (the output shows the calls made by an instrumented execution which is basically what you also can see manually in your debugger):

   Test#countIfList:79: Call: java.util.List#size Parameters: [], This: []

Even if we may expect some overhead for the stream approach, the sheer amount of method calls executed for countIfStream() typically comes quite unexpected:

Test#countIfStream:74: Call: Collection#stream 
  Spliterators#spliterator:420: Call: Objects#requireNonNull 
  Spliterators#spliterator:420: New: IteratorSpliterator#IteratorSpliterator 
  StreamSupport#stream:68: Call: Objects#requireNonNull 
  StreamSupport#stream:70: Call: StreamOpFlag#fromCharacteristics 
     StreamOpFlag#fromCharacteristics:733: Call: Spliterator#characteristics 
  StreamSupport#stream:70: New: ReferencePipeline$Head#Head 
Test#countIfStream:74: Call: Stream#filter 
  ReferencePipeline#filter:161: Call: Objects#requireNonNull 
  ReferencePipeline#filter:162: New: ReferencePipeline$2#2 
    StreamOpFlag#combineOpFlags:691: Call: StreamOpFlag#getMask 
Test#countIfStream:74: Call: Stream#count 
  ReferencePipeline#count:526: Call: ReferencePipeline#mapToLong 
    ReferencePipeline#mapToLong:219: Call: Objects#requireNonNull 
    ReferencePipeline#mapToLong:220: New: ReferencePipeline$5#5 
      StreamOpFlag#combineOpFlags:691: Call: StreamOpFlag#getMask 
  ReferencePipeline#count:526: Call: LongStream#sum 
    LongPipeline#sum:397: Call: LongPipeline#reduce 
      LongPipeline#reduce:439: Call: ReduceOps#makeLong 
         ReduceOps#makeLong:382: Call: Objects#requireNonNull 
         ReduceOps#makeLong:407: New: ReduceOps$8#8 
      LongPipeline#reduce:439: Call: LongPipeline#evaluate 
        AbstractPipeline#evaluate:232: Call: AbstractPipeline#isParallel 
        AbstractPipeline#evaluate:234: Call: TerminalOp#getOpFlags 
        AbstractPipeline#evaluate:234: Call: AbstractPipeline#sourceSpliterator 
          AbstractPipeline#sourceSpliterator:410: Call: AbstractPipeline#isParallel 
        AbstractPipeline#evaluate:234: Call: TerminalOp#evaluateSequential 
          AbstractPipeline#wrapAndCopyInto:472: Call: Objects#requireNonNull 
          AbstractPipeline#wrapAndCopyInto:472: Call: AbstractPipeline#wrapSink 
            AbstractPipeline#wrapSink:515: Call: Objects#requireNonNull 
            AbstractPipeline#wrapSink:518: Call: AbstractPipeline#opWrapSink 
            AbstractPipeline#wrapSink:518: Call: AbstractPipeline#opWrapSink 
          AbstractPipeline#wrapAndCopyInto:472: Call: AbstractPipeline#copyInto 
            AbstractPipeline#copyInto:478: Call: Objects#requireNonNull 
            AbstractPipeline#copyInto:480: Call: AbstractPipeline#getStreamAndOpFlags 
            AbstractPipeline#copyInto:480: Call: StreamOpFlag#isKnown 
            AbstractPipeline#copyInto:481: Call: Spliterator#getExactSizeIfKnown 
            AbstractPipeline#copyInto:481: Call: Sink#begin 
            AbstractPipeline#copyInto:482: Call: Spliterator#forEachRemaining 
            AbstractPipeline#copyInto:483: Call: Sink#end 
      LongPipeline#reduce:439: Call: java.lang.Long#longValue

What performance can we expect if the JVM has executed such an amount of calls. Fortunately, JIT does a great job of getting repetitive things out of the way by inlining such code. Let's use JITWatch for an easy analysis of the whole call chain.

As we can see, almost all calls can be inlined, only 5 methods remain in state "Compiled", with the explanatory message "(resursive) inlining too deep". But of course this only means that the exhibitive cost of calling a function is gone, the code must still be executed.

Performance

For measuring the performance, we execute the two methods with the micro benchmarking tool JMH. With the result above, we can expect that the stream approach will be slower for small lists, but it will be interesting to see whether this changes for larger sizes.

Below you can see results for the benchmark executed for lists with size of 0, 1, 10, 100, 1000.

countIfList         0  thrpt    5  291'458'215.578 ops/s
countIfList         1  thrpt    5  290'634'931.396 ops/s
countIfList        10  thrpt    5   47'959'330.631 ops/s
countIfList       100  thrpt    5    5'695'848.033 ops/s
countIfList      1000  thrpt    5      623'556.135 ops/s

countIfStream       0  thrpt    5    9'866'821.704 ops/s
countIfStream       1  thrpt    5    9'933'108.590 ops/s
countIfStream      10  thrpt    5   15'982'746.360 ops/s
countIfStream     100  thrpt    5    4'699'687.288 ops/s
countIfStream    1000  thrpt    5      595'175.673 ops/s

We can see that for empty lists, the direct iteration is about 30 times faster than the stream approach. For 10 elements, it is still 3 times faster, for 100 elements, the overhead goes down to about 15% and reaches about 5% for 1000 elements.

Memory

Besides performance, there is also another hidden cost which will slow down your application. If you look at the calls executed above, you can also see 5 occurrences of "New" meaning that a new object is allocated.

All objects temporarily allocated for handling the stream must be freed again and add pressure to the GC. So you can easily create a huge amount of garbage per second if use such a stream call in a tight loop.

countIfStream: gc.alloc.rate           0  thrpt    5       2031.985  MB/sec
countIfStream: gc.alloc.rate.norm      0  thrpt    5        216.000    B/op
countIfStream: gc.count                0  thrpt    5         40.000  counts
countIfStream: gc.time                 0  thrpt    5         32.000      ms

It is true that the JVM can eliminate some allocations by doing escape analysis, but unfortunately the setup used for stream seems to be complex – which is easy to understand if we recall the details discovered above.

This required allocation can easily become a more serious problem to the overall performance of an application than the direct performance cost shown above. This cost is directly tied to a method and can easily be acceptable if it occurs in a method which is not performance critical. The cost related to the additional load on the GC however can negatively influence the whole application.

Summary

Each developer should know the hidden cost of Java streams. Each usage of a stream adds overhead both for performance and memory consumption. Where as the performance costs are tied directly to a method and can therefore be acceptable, we must keep in mind that the costs for allocating and freeing memory generate more work for the GC and therefore influences the whole application.

As a simple guideline, the use of Java stream should therefore be avoided in often called methods or within tight loops.

It is therefore better to use static helper methods implementing the needed functionality without streams. Such methods can be found as part of libraries, e.g. Brownies Collections (https://github.com/magicwerk/brownies-collections) offers methods for filtering and mapping like filter() and map(), but also mapFilter() or filterMap() and many more like containsIf(), countIf(), getIf().

Magicwerk.org

Know the Hidden Performance Costs of Java Streams