Know the Hidden Performance Costs of Java Streams
Since the beginning, the Java collections offer only a minimalistic API. If you wanted to determine any useful information which was not foreseen, you had to implement it yourself or use helper methods offered by various libraries.
With Java 8, streams came to the rescue. You now have a uniform API to query all information you could imagine. It is also easy to code, just type ".stream()" in your IDE and the it helps to explore the needed methods provided by the fluent API.
Unfortunately the stream approach - as handy as it is - comes with a cost in performance which cannot always be neglected. Let's have a look at the two simple methods below which count the number of occurrences of elements in a list – once in the traditional way, once using streams.
static <T> int countIfStream(List<T> list, Predicate<T> filter) { return (int) list.stream().filter(filter).count(); } static <T> int countIfList(List<T> list, Predicate<T> filter) { int count = 0; for (int i = 0; i < list.size(); i++) { T elem = list.get(i); if (filter.test(elem)) { count++; } } return count; }
The stream variant is much shorter, so the lazy developer will typically use the one liner shown to accomplish the task. What however if we look behind the scenes of the written source to see what's really going on? We will investigate the methods regarding the topics complexity, performance and memory – whereas these topics are obviously related.
Complexity
To get a feeling for the complexity, I propose that you fire up the debugger and strep through the code executed for an empty list – this way we just see the overhead needed for preparing the iteration.
The number of calls for countIfList() is short as expected (the output shows the calls made by an instrumented execution which is basically what you also can see manually in your debugger):
Test#countIfList:79: Call: java.util.List#size Parameters: [], This: []
Even if we may expect some overhead for the stream approach, the sheer amount of method calls executed for countIfStream() typically comes quite unexpected:
Test#countIfStream:74: Call: Collection#stream Spliterators#spliterator:420: Call: Objects#requireNonNull Spliterators#spliterator:420: New: IteratorSpliterator#IteratorSpliterator StreamSupport#stream:68: Call: Objects#requireNonNull StreamSupport#stream:70: Call: StreamOpFlag#fromCharacteristics StreamOpFlag#fromCharacteristics:733: Call: Spliterator#characteristics StreamSupport#stream:70: New: ReferencePipeline$Head#Head Test#countIfStream:74: Call: Stream#filter ReferencePipeline#filter:161: Call: Objects#requireNonNull ReferencePipeline#filter:162: New: ReferencePipeline$2#2 StreamOpFlag#combineOpFlags:691: Call: StreamOpFlag#getMask Test#countIfStream:74: Call: Stream#count ReferencePipeline#count:526: Call: ReferencePipeline#mapToLong ReferencePipeline#mapToLong:219: Call: Objects#requireNonNull ReferencePipeline#mapToLong:220: New: ReferencePipeline$5#5 StreamOpFlag#combineOpFlags:691: Call: StreamOpFlag#getMask ReferencePipeline#count:526: Call: LongStream#sum LongPipeline#sum:397: Call: LongPipeline#reduce LongPipeline#reduce:439: Call: ReduceOps#makeLong ReduceOps#makeLong:382: Call: Objects#requireNonNull ReduceOps#makeLong:407: New: ReduceOps$8#8 LongPipeline#reduce:439: Call: LongPipeline#evaluate AbstractPipeline#evaluate:232: Call: AbstractPipeline#isParallel AbstractPipeline#evaluate:234: Call: TerminalOp#getOpFlags AbstractPipeline#evaluate:234: Call: AbstractPipeline#sourceSpliterator AbstractPipeline#sourceSpliterator:410: Call: AbstractPipeline#isParallel AbstractPipeline#evaluate:234: Call: TerminalOp#evaluateSequential AbstractPipeline#wrapAndCopyInto:472: Call: Objects#requireNonNull AbstractPipeline#wrapAndCopyInto:472: Call: AbstractPipeline#wrapSink AbstractPipeline#wrapSink:515: Call: Objects#requireNonNull AbstractPipeline#wrapSink:518: Call: AbstractPipeline#opWrapSink AbstractPipeline#wrapSink:518: Call: AbstractPipeline#opWrapSink AbstractPipeline#wrapAndCopyInto:472: Call: AbstractPipeline#copyInto AbstractPipeline#copyInto:478: Call: Objects#requireNonNull AbstractPipeline#copyInto:480: Call: AbstractPipeline#getStreamAndOpFlags AbstractPipeline#copyInto:480: Call: StreamOpFlag#isKnown AbstractPipeline#copyInto:481: Call: Spliterator#getExactSizeIfKnown AbstractPipeline#copyInto:481: Call: Sink#begin AbstractPipeline#copyInto:482: Call: Spliterator#forEachRemaining AbstractPipeline#copyInto:483: Call: Sink#end LongPipeline#reduce:439: Call: java.lang.Long#longValue
What performance can we expect if the JVM has executed such an amount of calls. Fortunately, JIT does a great job of getting repetitive things out of the way by inlining such code. Let's use JITWatch for an easy analysis of the whole call chain.
As we can see, almost all calls can be inlined, only 5 methods remain in state "Compiled", with the explanatory message "(resursive) inlining too deep". But of course this only means that the exhibitive cost of calling a function is gone, the code must still be executed.
Performance
For measuring the performance, we execute the two methods with the micro benchmarking tool JMH. With the result above, we can expect that the stream approach will be slower for small lists, but it will be interesting to see whether this changes for larger sizes.
Below you can see results for the benchmark executed for lists with size of 0, 1, 10, 100, 1000.
countIfList 0 thrpt 5 291'458'215.578 ops/s countIfList 1 thrpt 5 290'634'931.396 ops/s countIfList 10 thrpt 5 47'959'330.631 ops/s countIfList 100 thrpt 5 5'695'848.033 ops/s countIfList 1000 thrpt 5 623'556.135 ops/s countIfStream 0 thrpt 5 9'866'821.704 ops/s countIfStream 1 thrpt 5 9'933'108.590 ops/s countIfStream 10 thrpt 5 15'982'746.360 ops/s countIfStream 100 thrpt 5 4'699'687.288 ops/s countIfStream 1000 thrpt 5 595'175.673 ops/s
We can see that for empty lists, the direct iteration is about 30 times faster than the stream approach. For 10 elements, it is still 3 times faster, for 100 elements, the overhead goes down to about 15% and reaches about 5% for 1000 elements.
Memory
Besides performance, there is also another hidden cost which will slow down your application. If you look at the calls executed above, you can also see 5 occurrences of "New" meaning that a new object is allocated.
All objects temporarily allocated for handling the stream must be freed again and add pressure to the GC. So you can easily create a huge amount of garbage per second if use such a stream call in a tight loop.
countIfStream: gc.alloc.rate 0 thrpt 5 2031.985 MB/sec countIfStream: gc.alloc.rate.norm 0 thrpt 5 216.000 B/op countIfStream: gc.count 0 thrpt 5 40.000 counts countIfStream: gc.time 0 thrpt 5 32.000 ms
It is true that the JVM can eliminate some allocations by doing escape analysis, but unfortunately the setup used for stream seems to be complex – which is easy to understand if we recall the details discovered above.
This required allocation can easily become a more serious problem to the overall performance of an application than the direct performance cost shown above. This cost is directly tied to a method and can easily be acceptable if it occurs in a method which is not performance critical. The cost related to the additional load on the GC however can negatively influence the whole application.
Summary
Each developer should know the hidden cost of Java streams. Each usage of a stream adds overhead both for performance and memory consumption. Where as the performance costs are tied directly to a method and can therefore be acceptable, we must keep in mind that the costs for allocating and freeing memory generate more work for the GC and therefore influences the whole application.
As a simple guideline, the use of Java stream should therefore be avoided in often called methods or within tight loops.
It is therefore better to use static helper methods implementing the needed functionality without streams. Such methods can be found as part of libraries, e.g. Brownies Collections (https://github.com/magicwerk/brownies-collections) offers methods for filtering and mapping like filter() and map(), but also mapFilter() or filterMap() and many more like containsIf(), countIf(), getIf().