Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

The Developer's Guide to Collections: Lists

DZone's Guide to

The Developer's Guide to Collections: Lists

It's time for another deep dive into Java collections. This time, we'll explore the concept, proper usage, and tips for implementing lists.

· Java Zone ·
Free Resource

Java-based (JDBC) data connectivity to SaaS, NoSQL, and Big Data. Download Now.

Of all the collection types in Java, the list is arguably the most popular due to its ordered characteristics, allowing developers to insert and retrieve elements at a specified index and to sort elements according to specified criteria. In this article, we continue The Developer's Guide to Collections series with an examination of the list collection, focusing on the conceptual idea of a list in Java, the List interface and how its overridden methods differ from those of the Collection interface, supporting interfaces, and the concrete list types (and their respective advantages and disadvantages).

The reader should note that this article uses the common mathematical convention for indices, where is some arbitrary index, n is the size of the indexed entity, square brackets are used to denote inclusive ranges, and parentheses are used to denote exclusive ranges. Also note that while this article uses notation and illustrations that are normally associated with linked lists (such as previous and next relationships), not all lists are implemented as linked lists; some lists may internally use arrays while others are free to use whichever internal implementation is most suited for the task at hand. 

Before diving into the technical aspects of a Java list, we must first define what a list is and the problem it is intended to solve.

The Concept of a List

A list is simply a collection with ordered elements. By ordering each element, a sequence of elements is created, allowing clients to insert, update, access, or remove an element at a specific index. This concept is illustrated in the figure below:

Image title

The indices of a list are 0-indexed, ranging from 0 to n - 1, where n is the size of the list. Therefore, the element at the first location in the list, or head, has an index of 0 while the element at the last location in the list, or tail, has an index of n - 1. Due to this indexing, new semantics are introduced in lists that are not available in general collections:

  • Retrieving an element at some index i
  • Updating an element at some index i
  • Adding an element at some index i
  • Adding all elements after some index i
  • Removing an element at some index i
  • Obtaining a sublist of the elements ranging from [a, b)
  • Obtaining the first index of a specified element
  • Obtaining the last index of a specified element

In addition to new index-based functionality, the list also introduces a new iterator. Unlike the general iterator, which is unidirectional (advances only in the forward, or next, direction), this list-based iterator allows for bidirectional traversal, adding a concept of a previous element. We will look at this iterator in more detail when we cover the List interface, but it suffices to note that the inclusion of order requires that a more nuanced iterator be created.

Appart from index-based semantics, the sequentiality of lists also allows clients to sort a list, moving some elements forward (to a higher index) in a list and moving other elements backward (to a lower index). In general, sorting is performed in two ways: (1) Using an explicit comparator object that takes two elements of equal types and determines if the first is before, equal, or after the other, or (2) natural ordering, where elements are directly compared to one another to determine if they are before, equal, or after other elements in the list. The default implementation for list sorting uses a merge sort algorithm, resulting in O(n) = n log(n). While this algorithm suffices for most list implementations, some implementations perform optimizations based on the internal data structure used to represent its sequence of elements.

Lastly, the lists can also replace each element in-place using a transformation function, where the input type and the output type of the transform are the same: T(ainput) → aoutput. Combining these responsibilities together, we can devise the additional functionality that makes up the List interface. Before delving into the functionality particular to lists, we will first examine the complete List interface, including the functionality inherited from the Collection interface.

The List Interface

The complete List interface for Java Development Kit (JDK) 9, including the methods inherited from the Collections interface (with comments removed) is listed below:

public interface List<E> extends Collection<E> {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean add(E e);
    boolean remove(Object o);
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean addAll(int index, Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    default void replaceAll(UnaryOperator<E> operator) { /* ... */ }
    default void sort(Comparator<? super E> c) { /* ... */ }
    void clear();
    boolean equals(Object o);
    int hashCode();
    E get(int index);
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
    default Spliterator<E> spliterator() { /* ... */ }
}


The non-index-based addition methods (add and addAll) operate with respect to the tail of the list. For example, if we add some element, e, using the add(E e) method inherited from the Collection interface, e is appended to the tail of the list. Likewise, allAll appends all of the elements in the supplied collection to the tail of the list. It is important to note that these mutation methods are considered optional operations and may not be supported by all List implementations. If a specific List implementation does not support an optional operation, an UnsupportedOperationException is thrown when the method is called.

In a similar fashion, the remove and removeAll methods are semantically equivalent to their Collection interface methods but are considered optional for lists. Likewise, the clear and retainAll methods are semantically identical to the Collection version but are considered optional for lists. The contains and containsAll methods are directly inherited from the Collection interface and are not optional (i.e. all list implementations support these methods).

The semantics of the equals method is noticeably different from the semantics of a generalized collection. In particular, the equals method returns true if all of the following criteria are satisfied:

  1. The supplied object is a List 
  2. Both lists have the same size
  3. All corresponding elements in both lists are equal, where two elements, a and b are considered equal if a.equals(b) returns true 

In a similar fashion, the hash code for a list is a composition of the hash codes of each of its elements, as dictated by the following algorithm:

int hashCode = 1;
for (E e : list)
    hashCode = 31 * hashCode + (e == null ? 0 : e.hashCode());


For more information on the use of the magic number 31 in the hash code computation above, see Effective Java, 3rd Edition (Item 11, pp. 50). The last override that is performed of Collection methods is that of the spliterator method, whose List default implementation is listed below:

@Override
default Spliterator<E> spliterator() {
    if (this instanceof RandomAccess) {
        return new AbstractList.RandomAccessSpliterator<>(this);
    } else {
        return Spliterators.spliterator(this, Spliterator.ORDERED);
    }
}


This default implementation is simply a refinement that specifies that the spliterator for a List should be ordered. This implementation also provides an optimization for random access lists, as denoted by the presence of the marker interface RandomAccess. We will cover this marker interface in more detail later in this article, but it suffices to know that this interface simply denotes that a List implementation supports fast (approximately time-constant) random access.

In general, the List interface methods inherited from the Collection interface follow the same semantics as their Collection ancestors but include more details (i.e. adding an element adds the element to a specific location in a list; namely, to the tail). With an understanding of the List methods that are common with the methods of the Collection interface, we can now examine the methods that are unique to the List interface. The unique methods of the JDK 9 List interface are enumerated in the listing below:

public interface List<E> extends Collection<E> {
    boolean addAll(int index, Collection<? extends E> c);
    default void replaceAll(UnaryOperator<E> operator) { /* ... */ }
    default void sort(Comparator<? super E> c) { /* ... */ }
    E get(int index);
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
}


addAll(int index, Collection c)

This optional operation inserts the elements of the supplied collection (in the order they are returned by the iterator of the collection) at the specified index. Any elements present in the list at the index or greater will have their indices incremented by the number of elements added from the supplied collection. The number of added elements may not be the same as the size of the supplied collection, if, for example, the list does not allow for duplicates and one or more of the elements in the supplied collection are duplicates. For example, if the list contains 3 elements (indices 0 through 2) and a collection of 3 unique elements is added at index 1, the indices of the existing elements will be shifted as illustrated in the following figure:

Image title

In general, if m elements from the supplied collection are added at index i, all existing elements from i to n - 1, where n is the size of the list, are incremented by m. Formally, if m elements are added at index a, the existing indices, i, are incremented according to the following rule: i = i + m for i ≥ a.

replaceAll(UnaryOperation op)

This required method replaces each of the list elements in-place according to the supplied UnaryOperator. These replacements are performed by obtaining a list iterator (see the listIterator section below) and setting each value of the iterator to the return value of the supplied UnaryOperator. The default implementation for this method is defined as follows for JDK 9:

default void replaceAll(UnaryOperator<E> operator) {
    Objects.requireNonNull(operator);
    final ListIterator<E> li = this.listIterator();
    while (li.hasNext()) {
        li.set(operator.apply(li.next()));
    }
}


For reference, the JDK 9 effective interface for UnaryOperator is defined as follows:

@FunctionalInterface
public interface UnaryOperator<T> extends Function<T, T> {
    T apply(T t);
}


Note that the apply method is not actually present in the UnaryOperator interface, but rather, is defined in the Function interface. In general, the Function interface performs a transformation of some input of type T to some output (return value) of type R. In the case of a UnaryOperator, the input type and return type are both of type T and thus, acts as a transformation that does not alter the type of the input, only its state (or replaces the input with some other object of type T).

sort(Comparator c)

This method is optional and is algorithmically composed of the following steps:

  1. Obtain an array representation of the list
  2. Merge sort the array
  3. Replace each element of the list, in-place, with the array element of the matching index

This algorithm is reflected in the JDK 9 default implementation for the List sort method:

@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}


The Comparator interface used to determine the ordering of each element is defined in JDK 9 as follows (note that the default method implementations associated with this interface have been removed to enhance the focus on its single abstract method):

@FunctionalInterface
public interface Comparator<T> {
    int compare(T o1, T o2);
}


The compare method takes two objects of type T and determines if the first parameter is less than, equal to, or greater than the second parameter according to the return value of the method, which is defined as follows:

  • Negative: The first parameter is less than the second
  • Zero: The first parameter is equal to the second
  • Positive: The first parameter is greater than the second

Although any positive or negative integer can be returned, the convention used throughout Java is to return -1 if the first parameter is less than the second and return 1 if the first parameter is greater than the second. In the context of list sorting, the Comparator return values abide by the following rules (which correspond to the general compare definition):

  • Negative: The first parameter belongs before the second
  • Zero: The first parameter is equal to the second
  • Positive: The first parameter belongs after the second

The Comparator parameter to the sort method is used to compare each of the elements to one another and determine their relative location in the list. This does not mean that each element is compared to all other elements (the number of comparisons is determined by the sorting algorithm used), but conceptually, the position of an element is determined using the return value of the Comparator#compare method relative to the other elements in the list. For example, the element sorted to the head of the list the element who, when supplied as the first argument to the compare method, returns a negative value when any other element is supplied as the second argument to the compare method (i.e. returns a negative value when compared to any other element in the list).

If a null Comparator object is supplied to the sort method, the elements are sorted using their natural ordering. The natural order of the elements requires that each of the elements implements the Comparable interface. This interface is defined in JDK 9 as the following:

public interface Comparable<T> {
    public int compareTo(T o);
}


The logic for the compare method is identical to that of the Comparator#compare method, except that the comparison is done relative to the current object. Conceptually, the Comparable#compare method is identical to that of the Comparator#compare method, where the first argument of the latter method is this. If the elements of the list do not implement the Comparable interface, sorting by natural ordering is not possible and a ClassCastException is thrown.

get(int index)

An optional operation that returns the element at the supplied index in the list. If the supplied index is outside of the bounds of the list (less than 0 or greater than or equal to the size of the list), an IndexOutOfBoundsException is thrown.

set(int index, E element)

An optional operation that sets the element at the supplied index to the supplied element value. If the supplied index is outside of the bounds of the list (less than 0 or greater than or equal to the size of the list), an IndexOutOfBoundsException is thrown. If the supplied element value is null and the List implementation does not support null values, a NullPointerException is thrown. If the supplied element value does not meet the criteria for a valid element value as per the List implementation (i.e. the List implementation rejects the supplied element value), an IllegalArgumentException is thrown.

add(int index, E element)

Similar to the indexed addAll method, this optional operation inserts a single element at the supplied index. All existing elements in the list after the supplied index will have their indices incremented by one, as illustrated in the figure below.

Image title

If the supplied index is less than 0 (the index directly before the head of the list) or greater than the size of the list (the index directly after the tail), an IndexOutOfBoundsException is thrown. If an index of 0 is supplied, the added element becomes the head of the list; if an index equal to the size of the list is supplied, the added element becomes the tail of the list. List implementations may reject additions based on criteria particular to the implementation, in which case an IllegalArgumentException is thrown.

remove(int index)

Analogous to the indexed add method, this optional operation removes the element at the supplied index. This process is illustrated in the figure below.

Image title

If the supplied index is less than 0 or greater than or equal to the size of the list, an IndexOutOfBoundsException is thrown. If the supplied index is 0, the head of the list is removed; if the supplied index is equal to the size of the list minus 1, the tail of the list is removed.

indexOf and lastIndexOf

The indexOf method returns the index of the first element that equals the supplied object (in terms of its equals method) in the list. If no matching element is found, -1 is returned. Likewise, the lastIndexOf method returns the index of the last element that equals the supplied object, or -1 if no matching element could be found. Note that if only one matching element is present, or if the List implementation does not support duplicates, the same non-negative index will be returned by indexOf and lastIndexOf.

listIterator and listIterator(int index)

Although the general Iterator can be obtained from a List and is still usable, the ordered semantics of a List requires that a more fine-grained iterator be developed for a List. This more detailed iterator is the ListIterator and is defined as the following in JDK 9:

public interface ListIterator<E> extends Iterator<E> {
    boolean hasNext();
    E next();
    boolean hasPrevious();
    E previous();
    int nextIndex();
    int previousIndex();
    void remove();
    void set(E e);
    void add(E e);
}


While the general Iterator is unidirectional and only has a concept of traversing to the next element, the ListIterator is bidirectional and has the concept of traversing to the next or previous element in a List. Additionally, a ListIterator has a concept of the current location in a List. For example, a ListIterator can be started relative to a supplied index (using the listIterator(int) method in the List interface). It is important to understand that a ListIterator does not maintain its positional information (cursor) based on the elements in a List, but rather, relative to the positions between elements. This concept (depicting the possible location of the cursor) is illustrated in the following figure.

Image title

When the cursor is positioned before the head of the list, its next is the head of the list and it has no previous. When the cursor is positioned between two elements indexed at a and b, respectively, its next is b and its previous is a. When the cursor is positioned after the tail of the list, it has no next and its previous is the tail of the list. In general, when a List has n elements, it has n + 1 possible cursor positions.

When a ListIterator is given a starting index, the cursor is positioned such that the next element of the iterator points to the element at the given index and its previous points to the element before the given index. For example, if a ListIterator is started with an index of 1, the cursor would be positioned as illustrated in the following figure.

Image title

In general, a ListIterator started at index i will have its next equal to the element at index i and its previous equal to the element at index i - 1. In addition to traversal methods, ListIterators also have optional operations to add, set, and remove an element at the given cursor location. For example, if the cursor is positioned between elements at index a and b, calling add will result in the added element being positioned such that its next is the current next element and its previous is the current previous element, as illustrated in the figure below.

Image title

In the case of the set method, the last element returned from either the next or previous methods is set to the supplied element. For example, if the cursor is positioned at the head of the list and next is called (moving the cursor between the head of the list and the element at index 1), calling set would result in the head of the list being replaced with the supplied element. Likewise, if the cursor is between the elements at index 1 and 2 and previous is called, calling set would replace the element at index 1 with the supplied element.

It is important to note that the replaced element is the element returned from the last call to either previous() or next(), whichever was called last. For example, starting at the head of the list and calling next()next(), and then previous() would cause the last element returned by previous() to be set when calling the set method. The remove method follows the same pattern as the set method, removing the last element that was returned from either the next() or previous() methods.

subList(int fromIndex, int toIndex)

This required operation returns a view of the list from the fromIndex, inclusive, to the toIndex, exclusive. For example, if this method is called on a list of 10 elements, where fromIndex is 4 and toIndex is 7, a new list is returned containing the elements from the original list at indices 4, 5, and 6 (with the original order maintained in the returned sublist). If the supplied fromIndex is less than 0 or the toIndex is greater than the size of the list, or if the fromIndex is larger than the toIndex, an IndexOutOfBoundsException is thrown. If the supplied fromIndex equals the supplied toIndex, an empty list is returned.

The RandomAccess Interface

The RandomAccess interface is a marker interface that allows for List implementations or external clients to know whether a List can perform random access of elements in near-constant time. This interface allows clients to check if a List implementation is an instance of the RandomAccess and perform retrieval on the List implementation accordingly. For example, the following code can be used to decide which access algorithm is more effective:

List<Foo> someList;

if (someList instanceof RandomAccess) {
    // Randomly access the elements of the list
}
else {
    // Sequentially access the element of the list
}


In general, this marker interface should be applied to List implementations that perform the following random access operation more effectively than sequential access:

for (int i=0, n=list.size(); i < n; i++) {
    list.get(i);
}


List implementations should forgo this marker interface if the following sequential access is performed quicker than the above random access:

for (Iterator i=list.iterator(); i.hasNext(); ) {
    i.next();
}


The List Hierarchy

With a solid understanding of the general characteristics of a list and details of each method of the List interface, we can examine the concrete implementations of the List interface. The JDK 9 List hierarchy (disregarding concurrent List implementations for brevity) is illustrated in the following figure, where green boxes represent interfaces, blue boxes represent abstract classes, and purple boxes represent concrete implementations.

Image title

As covered in The Developer's Guide to Collections, the implementation root of the List hierarchy is the AbstractList, which is a subclass of the AbstractCollection class. From this root abstract class, the ArrayList and Vector classes are directly derived. In turn, the Stack class directly extends the Vector class. Lastly, the AbstractSequentialList extends the AbstractList class, which in turn is extended by the LinkedList class.

ArrayList

An ArrayList is a general-purpose List implementation that is backed by an internal array that expands to store the elements of the list. At any given time, the length of the internal array is called its capacity and may be greater than or equal to the size of the list (the number of elements in the list, not the size of the array used to store the elements). For example, it is possible that an ArrayList contains 10 elements, but its capacity is 12 (enough to store the 10 elements).

Additionally, the ArrayList class implements all optional operations in the List interface and allows for any element, including duplicates and null elements, to be added. An ArrayList is essentially a Vector without the overhead of synchronization. Unless there is a specific need to use a Vector, an ArrayList should be used in place of a Vector. For more information, see the official ArrayList documentation.

LinkedList

A ListedList is a doubly linked list that allows for bidirectional traversal of the elements in the list. This internal doubly link representation is based on the private static Node class, which includes a data field (called an item) of the type of the elements in the LinkedList (matching the type of the formal generic parameter of the LinkedList), a previous link, and a next link, as illustrated in the listing below.

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}


Using this Node class, the LinkedList implementation stores a reference to the head and tail of the list, allowing for forward traversal from the head of the list or backward traversal from the tail of the list (depending on which is closer to the needed index). This implementation should be used if elements are frequently added to the head of the list or if an iterator is used to delete elements far from the head or the tail of the list, as these operations are performed in constant time as compared to the linear time of an ArrayList. Note, though, that positional access requires linear time in a LinkedList but is a constant time operation for an ArrayList. For more information on the performance trade-offs between LinkedList and ArrayList, see the official List Implementation tutorial.

The LinkedList class also implements the Queue interface (which will be discussed in a later article in this series) and includes the following methods not included in List interface:

  • addFirst
  • getFirst
  • removeFirst
  • addLast
  • getLast
  • removeLast

For more information on the LinkedList implementation, see the official LinkedList documentation.

Vector

A legacy implementation predates JDK 2. This implementation is essentially an ArrayList with a simplistic internal synchronization mechanism (each mutation method is marked as synchronized). Due to the unneeded overhead of synchronization, an ArrayList should generally be used instead of a Vector. For more information, see the official Vector documentation.

Stack

A legacy Last-in-First-out (LIFO) stack implementation. This implementation includes stack-based methods, such as push, pop, and peek to add an element to the stack, retrieve and remove the top element from the stack, and to retrieve the top element without removing it, respectively. This implementation, like Vector, uses internal synchronization and a Deque implementation should generally be used instead. For more information, see the official Stack documentation.

Comparison

The following table captures some of the trade-offs and benefits of each of the standard List implementations covered in this section:

List Type Description
ArrayList
A good, general purpose list implementation that is efficient in most cases. This implementation uses an internal array to store the elements and must expand when the number of elements exceeds the capacity of the array.If a list frequently accesses its elements by index, this implementation provides constant time access. If instead, elements are frequently added to the head of the list or are removed from the center of the list (generally halfway between the head and tail of the list), these operations are linear and a LinkedList should be used instead. When in doubt, use an ArrayList by default.
LinkedList An efficient implementation for adding elements to the head of the list or sequentially removing elements (or traversing the list and removing selective elements along the way). This list is internally implemented as a doubly linked list with references to both the head and tail of the list, allowing for efficient forward traversal from the head of the list or backward from the tail of the list.
Vector
Uses internal synchronization. In most cases, ArrayList should be used in place of a Vector since the overhead of the internal synchronization of the Vector class is inefficient. The synchronization performed by the Vector class is a method-by-method synchronization, where each method that modifies the list is marked as synchronized. Synchronization mechanics for a list can be more efficiency introduced by using a concurrent list implementation, such as the CopyOnWriteArrayList implementation.
Stack
Uses internal synchronization. A legacy implementation that supports stack-based operations (such as push and pop). The Deque list type should be used instead of this implementation.

Conclusion

Lists are one of the most frequently used collection types, and for good reason. By definition, lists are ordered and its elements can be randomly accessed by index, as well as traversed through a special-purpose ListIterator. In addition, due to the ordered nature of a list, the elements in a List are sortable by either an explicit Comparator or through natural ordering. While some of the mutation methods (such as add) are optional operations, the two most common List implementations, ArrayList and LinkedList, provide implementations for all of these optional operations. Although we have not covered every aspect of the List interface and its various implementations, we have covered a wide breadth of knowledge that will allow us to start in the correct direction, digging into the official documentation where needed. In the next entry in this series, we will cover one of the more mathematical collection representations: The set.

Connect any Java based application to your SaaS data.  Over 100+ Java-based data source connectors.

Topics:
java ,lists ,arrays ,collections ,framework ,linked list ,arraylist ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}