Over a million developers have joined DZone.

Collection Performance - Don't Become Too Lazy

· Java Zone

What every Java engineer should know about microservices: Reactive Microservices Architecture.  Brought to you in partnership with Lightbend.

A few weeks ago I was performance tuning some code that ran quite regularly and took more time than it should, judging from the complexity of what is was doing. As usual, by merely looking that the code there was nothing blatantly, obviously, complete and ridiculously wrong - after all, it worked correctly.

So I fired up trusty old JProfiler and had a look, just to find out that sometimes you can become just too lazy and reliant on libraries without realizing their characteristics deeply enough.

The Code

The following snippet shows the relevant method of the code under analysis. The surroundings are not really important, the heart of the matter is in here:

protected List<String> getCurrentChunkList() {
List<String> all = getContext().getFullList();
List<String> result = new ArrayList<String>(chunkSize + 2);
int tempCount = 0;
for (Iterator<String> allIter = all.iterator(); allIter.hasNext()
&& tempCount < chunkSize; tempCount++) {
result.add(allIter.next());
if (tempCount == chunkSize) {
while (allIter.hasNext()) {
String current = allIter.next();
if (tCurrent.equalsIgnoreCase(result.get(result.size() - 1))) {
result.add(tCurrent);
} else {
break;
}
}
}
}
all.removeAll(result);
return result;
}

Basically all this does is take a portion of a long list of Strings and transfer them to a partial list with some data specific fiddling at the end of a chunk. Once that has been completed, the elements transferred are removed from the primary list.

The whole thing is part of a batch processing system that is only allowed to run for a limited time slice at a time. So the overall list of items that still needs to be processed is stored persistently in a context object and a certain amount of those is extracted, the master list shortened and the resulting chunk returned to the actual processing in each run. Control is then returned to the surrounding framework that can assign a new time slice to this kind of job, or at different one at its discretion. This will be repeated until the master list is empty.

The all list contains about 10,000 elements, the chunk size is set to 1,000. Now, this does not look too bad, does it?

First run

When running the processing job for the mentioned 10,000 elements in chunks of 1,000 the whole process took 490 seconds; of that the above method accounted for a whopping 6,3% or 31 seconds! A little long considering that it is just calling a few supposedly well-optimized collection operations…

But looking at it more closely, these “few” operations are a few more. With 1,000 elements per chunk, there were

  • 10 invocations of getCurrentChunkList
  • 10,000 invocations of Iterator.hasNext
  • 10,000 invocations of Iterator.next
  • 10,000 invocations of ArrayList.add
  • 10 new ArrayLists
  • 10 new Iterators
  • 10 invocations of ArrayList.removeAll

Not exactly minimalistic, but 31 seconds seems a little over the top.

The problem lies with the ArrayList.removeAll call - internally an ArrayList is using a regular array as a backend store. The implementation of removeAll is just a looped call to the remove method for a single element. remove will then internally iterate the array and do an equality check of the element to be removed, finally removing it from the array when found.

Unless the element to be removed is at the end of the backing store array, there is a lot of bookkeeping to do, because removing something from the middle of an array is not an easy thing to do - in any case a good chunk of the array needs to be shifted over to fill the gap.

While this is using the fairly efficient System.arraycopy in its implementation, it still adds up, because we the algorithm shown above removes the elements it picks for processing from the beginning of the list, leading several thousand array shifting and resizing operations.

For reference, this is the relevant part from the source of ArrayList:

public E remove(int location) {
E result;
int size = lastIndex - firstIndex;
if (0 <= location && location < size) {
if (location == size - 1) {
result = array[--lastIndex];
array[lastIndex] = null;
} else if (location == 0) {
result = array[firstIndex];
array[firstIndex++] = null;
} else {
int elementIndex = firstIndex + location;
result = array[elementIndex];
if (location < size / 2) {
System.arraycopy(array, firstIndex, array, firstIndex + 1,
location);
array[firstIndex++] = null;
} else {
System.arraycopy(array, elementIndex + 1, array, elementIndex,
size - location - 1);
array[--lastIndex] = null;
}
}
if (firstIndex == lastIndex) {
firstIndex = lastIndex = 0;
}
} else {
throw new IndexOutOfBoundsException(
// luni.0A=Index: {0}, Size: {1}
Messages.getString("luni.0A", //$NON-NLS-1
Integer.valueOf(location),
Integer.valueOf(lastIndex - firstIndex)));
}
modCount++;
return result;
}

Possible optimisations

As one can see from the source of ArrayList.remove above, the ideal case would be to remove elements from the end of the array, because that does not require any element shifting and array copying whatsoever. The remove operation can simply be done by setting the last element to null and decrementing the lastIndex.

So I changed to code of the initial method to the following:

protected List<String> getCurrentChunkList() {
List<String> all = getContext().getFullList();
List<String> result = new ArrayList<String>(chunkSize + 2);
int tempCount = 0;
for (int i = all.size() - 1; i >= 0 && tempCount < chunkSize; i--, tempCount++) {
result.add(all.remove(i));
}
return result;
}

This code removes strictly from the end of the all list and adds each element to the current result chunk. Going this route also allowed for the special fiddling in the middle of the old method to be removed, even though that did not contribute any measurable amount of overhead to the method’s execution time. The call to removeAll is completely gone, because remove already takes care of that individually.

Second run

Now, with the same data set the profiling comes to a very different conclusion:

  • 10 invocations of getCurrentChunkList
  • 10,000 invocations of ArrayList.remove
  • 10,000 invocations of ArrayList.add
  • 10 new ArrayLists

With this new configuration the percentage of overall runtime for this particular piece of code was reduced to a mere 23ms, less than 0,005% of the total execution time.

Conclusion

Even though the Java Collections framework’s classes are well tested and contain lots of performance enhancing shortcuts, it is still vital to be aware of their characteristics. Even though the original method looked perfectly ok at first glance and for a few runs would not show up on the radar, the repeated execution time added up to quite a remarkable percentage of the overall process.

With just a few minor modifications - that made the code easier to understand, too, getting rid of intermediate Iterators and special handlings for edge cases - that single method’s performance was boosted significantly. Considering that it is part of a more general batch job framework, overall improvements are even greater than what this single isolated test case showed.

From http://www.danielschneller.com/2010/11/collection-performance-dont-become-too.html

Microservices for Java, explained. Revitalize your legacy systems (and your career) with Reactive Microservices Architecture, a free O'Reilly book. Brought to you in partnership with Lightbend.

Topics:

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}