GapList – a Lightning-Fast List Implementation
This article introduces GapList, an implementation which strives for combining the strengths of both ArrayList and LinkedList.
Join the DZone community and get the full member experience.
Join For FreeThis article introduces GapList, an implementation which strives for combining the strengths of both ArrayList and LinkedList. We will show how it is implemented to offer efficient random access to elements by index (as ArrayList does) and at the same time efficient adding and removing elements to and from head and tail of the list (as LinkedList does).
Background
The Java Collections Framework belongs to the most used functionality in the Java standard library. Nearly each application uses it to handle some collections of data.
The framework comprises various interfaces and also implementations of these interfaces. By far the most popular interface is java.util.List which allows to handle an array-like list of elements which can grow and shrink as needed.
The Java standard library offers two main implementations of the List interface: ArrayList and LinkedList. As explained in http://docs.oracle.com/javase/tutorial/collections/implementations/list.html you will typically end up using ArrayList as general-purpose List implementation as it is usually faster and consumes less memory.
But the cited document also mentions that executing add/delete operations on the beginning or interior of an ArrayList will not perform very well. So the question arises: Is ArrayList really suited as general-purpose implementation of List?
Motivation
Before delving further into the details, we want to have a look at a situation which you could encounter in your daily programmer's life. It will illustrate that some needs cannot be met by using either ArrayList or LinkedList.
Consider the following use case: We write a software which has to analyze incoming events. The analysis is always done on a fixed size window containing the last events arrived. So to maintain these fixed size window, we need a data structure which supports efficient adding elements at the beginning and removing elements from the end. The analysis is then done by methods which require random access to the list - either because it is easier to implement or because they really need it.
Let's consider our two alternatives:
- ArrayList would excel in the analysis part, but behave poorly in adding/removing the events
- LinkedList would excel in adding and removing elements, but behave poorly in the analysis part
Which implementation would you choose? As a good programmer this will not be an easy choice because you know that your solution will not handle all cases very well. Of course you know that you should not care about premature optimization, but nevertheless it may give you at least a bad feeling. It may be that you do some performance measurements to keep you sleeping silently at night or you write your concerns in a design documents or in some Javadoc comments to show your co-workers that you have understood the implications of what you have done.
So whatever you do, the solution will not be an optimal one and you may have spent some time measuring performance, writing concerns or doing some brain work. And you may continue to wonder how you could realize a better solution which you could then be really be proud of.
But in fact you would like to have standard libraries which are fast enough so they can be used as building blocks for other components without compromises. And here GapList comes to the rescue: it will excel in all operations.
Review of ArrayList
The implementation of GapList is an improvement of ArrayList. We therefore first have a look at the implementation of ArrayList to see why the performance of some operations tends to be bad.
The following illustration shows how data is stored in an ArrayList.
All elements are stored in a Java array. The first element is at index 0, the second at index 1 and so on. To prevent new allocation on every insert, the allocated array is typically larger than the stored number of elements and a size variable is used to keep track of the used elements.
If we look at the illustration, it becomes quite obvious why different operations have different performance characteristics: operations at the end are very fast because no data must be copied, whereas operations at the beginning need an expensive copy operation.
If we analyze some operations in a row, we see that subsequent operations cannot profit from previous operations. So in the following example, the elements right from the insertion point have to be moved twice for two subsequent insert operations.
As a consequence, ArrayList is not suited for iterating over a list and adding or removing elements. If you really need to iterate over an ArrayList and modify it, you would better build up a new ArrayList instance during interating.
The fact that ArrayList offers a very asymmetric performance behavior can also become problematical if the operations are executed through a GUI where the user would notice that an operation at the end is ways faster than the same operation at the beginning of the list.
Introduction to GapList
To solve the issues brought out, we introduce GapList as another implementation of the java.util.List interface. As main features, GapList provides
- Efficient access to elements by index
- Constant time insertion at head and tail of list
- Exploit the locality of reference often seen in applications
Let's see how GapList is implemented to offer these features.
If we compare how the different kind of inserts are handled by ArrayList, we can quickly come up with a solution to guarantee fast insertion both at the beginning and at the end of the list. Instead of moving all elements to gain space at index 0, we leave the existing elements in place and write the elements at the end of the allocated array if there is space left. So we basically use the array as a kind of rotating buffer.
For accessing the elements in the right order, we have to remember the start position of the first element and use a modulo operation to calculate the physical index from the logical one:
physIndex = (start + index) % capacity
To exploit the locality of reference, we allow a gap to be included in the storage of the list elements. The gap formed by the unused slots in the backing array can be anywhere in the list. There is at most one gap, but there can also be none.
This gap helps you to take advantage of the locality of reference to the list, so if you add an element to the middle of the list, a subsequent addition to the middle will be fast.
If a GapList has no gap, one is created if needed. If the gap is at a wrong place, it is moved. But if the operations happen near to each other, only few data will have to be copied.
GapList also allows removal of elements at the beginning and at the end without any moving of elements.
Removals in the middle are handled similar to insertions: an existing gap may be moved or vanish if no longer needed.
Perfomance of GapList
We now present benchmark figures comparing GapList, ArrayList, and LinkedList for some common operations.
Let's first have a look at the most import operation: retrieving elements.
We can see that the performance of GapList is even slightly better than ArrayList in retrieving elements. The performance of LinkedList is 2'000 times worse if accessing elements in a random way (note that all charts use a logarithmic axis).
The next diagram shows the performance for adding elements.
GapList features good performance irrespective of the insert location whereas ArrayList is 3'000 times slower if adds happen at the beginning of list.
In the case of completely random add and remove operations, GapList is slightly slower than ArrayList. This is what we would expect as there is no locality of reference GapList can exploit and the implementation of GapList is somehow more complex.
But what happens if subsequent operations happen next to each other? Or we even are iterating over the list?
We can see that the performance of GapList improves the more local the subsequent operations happen and GapList can become 100 times faster than ArrayList. It is also nearly as fast as LinkedList in iterating.
The performance of remove operations is similar to the add operations shown.
To summarize, we he have seen that GapList offers great performance in any test case:
- it is as fast as ArrayList in retrieving elements which is the most important operation
- it is as fast as LinkedList in adding/removing elements at head/tail of list
- it is as fast as ArrayList in adding/removing elements at random locations
- it is as fast as LinkedList in adding/removing elements by iterating
Features of GapList
GapList has been designed as drop-in replacement for both ArrayList and LinkedList:
- GapList implements the interface java.util.Deque which is not implemented by ArrayList, but only by LinkedList.
- All public methods provided by ArrayList are also implemented by GapList (ensureCapacty, trimToSize).
A known drawback of the Java Collections Framework is the fact that there is no support for primitive types. So if you need to store a bunch of integers, you will end up with a List<Integer>. This approach uses much more memory than storing the data in a simple int[].
GapList also offers implementations for all simple data types, e.g IntGapList.
As the classes for the primitive data types are generated out of the generic version using a templating mechanism, some bulk operations have been added directly to GapList. They include remove(), move(), copy(), but also sort(), rotate(), or shuffle().
GapList is not thread-safe. It must also be considered that exploiting the locality of reference only works if there is a locality: if two threads are working on different parts of the list, this optimization can even have a negative effect on performance as the gap has to be moved back and forth. Of course this can also happen with a single thread but it is more unlikely.
MaxList - a Use Case for GapList
We now come back to our example presented as motivation and show how GapList can help us to create a really simple and powerful implementation.
As both adding and removing are fast, we can easily use GapList as a fixed size rotating buffer. Elements can be added until the maximum size is reached. If another element should be added, the first element is removed before the addition.
As all our performance considerations are solved by GapList, the implementation of the described functionality is quite simple. You basically have to override the add methods to guarantee that the list will never grow beyond the defined limit.
add(T elem) {
if (size() == maxSize()) {
removeFirst();
}
super.add(elem)
}
Conclusion
GapList offers excellent performance in any situation which makes it superior to ArrayList and LinkedList. This allows to use GapList as drop-in replacement for both ArrayList and LinkedList.
GapList, MaxList and the implementations working with primitive data types are part of the Brownies Collections library and can be downloaded from http://www.magicwerk.org/collections.
Opinions expressed by DZone contributors are their own.
Comments