The Way of the Lambda and removeIf()
Did you know that removing elements from large collections can be really slow and error prone if you do it the wrong way? Here's one way to do it right.
Join the DZone community and get the full member experience.
Join For FreeThe Way of the Lambda: Elegant and Efficient
In Java 8, you can do:
items.removeIf(i -> predicate(i));
Where you used to do:
for (Iterator it = items.iterator(); it.hasNext();) {
if (predicate(it.next())) {
it.remove();
}
}
Which is not only considerably more elegant but also a more efficient solution in some cases. I have the graphs to prove it!
Background
Recently having spent a few years away from Java development, it is great to be back and find that Java 8 indeed has delivered on what was promised in 2013. With lambdas and streams, it is possible to tackle problems in a functional manner and write code that is both readable and efficient. This over due modernization of the language is by far the most important change in Java 8, but there are also other improvements that deserve mentioning. This post elaborates on the topic of Collection.removeIf()
which allows for conditional removal of elements of a collection.
The Way of the Iterator
Removing elements of a collection while iterating over it is somewhat like cutting off the branch on which you are sitting, but by using Iterators that can be done without a runtime ConcurrentModificationException. Consider the following typical simplistic example, where the elements of the collection items for which predicate() holds true are removed.
for (Iterator it = items.iterator(); it.hasNext();) {
if (predicate(it.next())) {
it.remove();
}
}
If the collection in question is a linked list of size n and m elements are removed, the full iteration with removals is O(n + m) = O(n) since iteration steps and removal from the list are constant time operations and m is limited by n. If, however, the data structure of the collection is one where removal is relatively expensive compared to iteration, the complexity becomes less favorable. For an ArrayList a single removal step is O(n) since all items at higher indices have to be moved. There are still n iteration steps to be made so the full sequence of removals becomes O(m*n) which is O(n^2) assuming worst case with the number of removals being a constant fraction of the array size.
In more pragmatic terms, what happens in the ArrayList case is that for each element that is removed, all elements of higher indices are moved to fill the gap in the backing array. This is clearly suboptimal since that operation is repeated for each removal. Thus, to reach its final destination an element will visit all locations between the starting position and the destination. In case many elements are removed, this may be a considerable amount of work in vain.
The Java 8 Way of the Lambda
In Java 8, the code snippet from above can be expressed as follows.
items.removeIf(i -> predicate(i));
Clearly more concise, this expression is also more efficient on an ArrayList compared to the Iterator approach. Where the Iterator requires completion of each removal operation before continuing to the next one, removeIf
operates on a predicate and can therefore iterate over all items once to determine which elements are to be removed and then shift the remaining elements to the correct new indices all in a second pass. Therefore, removeIf
is O(n) since it performs two linear passes through the collection. Each retained element in the resulting collection is moved just once, while for the Iterator approach it would be moved one step at a time towards its destination. See this gist for the relevant JDK code.
The Difference Visualized
We compare execution times for the two approaches as a function of the array size for a rather extreme case where 999 out of 1000 items are removed from an ArrayList. For each data point 40 runs are made, the 5 fastest and slowest outliers are removed and the sum of the remaining 30 form the execution time value. For details please refer to this gist.
We expect the execution time to be considerably shorter for the Java 8 way of the lambda and as a matter of fact that is what we find. We also note that within the resolution of the graph, the lines are quite noise free due to the many repeats and filtering of outlier results.
For array sizes of 150,000 elements, the removeIf()
approach is more than 3000 times faster. Even more importantly, the factor between the two increases as the size of the array increases which is expected since the way of the lambda has a superior asymptotical time complexity.
To better compare the time complexity of the two, we scale up the green line to match the green one. By multiplying the execution times of the faster algorithm by ~1500, the graphs can be compared in terms of shape. As expected, we find linear behavior for the Java 8 removeIf()
design, while the old ways show a quadratic time complexity due to the inefficient removal of elements.
When scaling up the measurements we get to see some noise in the green line. The sensitivity to random fluctuations in execution time is greater since the execution times are several orders of magnitude shorter compared to the blue. Both lines are fitted to polynomials of power two and as can be clearly seen, the light green line is very similar to a straight line while the blue line clearly aims for the sky in a super linear fashion.
Edit: There is a reddit thread about this blog post. Good points are being made there, for example the fact that the removeIf
implementation is atomic (in the database sense) as it first evaluates the predicate for all items before making any alterations to the list. Thus, if the predicate throws an exception for any of the items in the list, the list will remain unchanged.
Published at DZone with permission of Dan Lawesson, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments