List in C#: Implementation and Features
List is one of the most used collections in C#. Let's inspect the features of List and look at how some of its parts are implemented.
This article is about List<T> from the System.Collections.Generic namespace. More specifically, the article describes the internal implementation and some features of List<T>. This is the most used collection in C#, and it's not just my opinion — Andrew Troelsen, Philip Japikse, and Jon Skeet wrote the same in their books. And that's understandable: List<T> is easy to work with. It's quite flexible and thus covers most of the daily tasks for developers. It also has a large number of helpful methods. The collection's abilities are expanded even further with the help of LINQ.
Inside List<T>
The source code of the List<T> class is available on GitHub. This means we can look at its implementation. Let's analyze the most important aspects.
The List<T> class is a sequential and dynamically resizable list of items. Under the hood, List<T> is based on an array.
The List<T> class has three main fields:
- T[] _items is the internal array that holds the list's items.
- int _size stores the number of items in the list.
- int _version stores the version of the collection.
How to Add an Item to the List
The list is dynamically resizable. What happens when we add an item to the list?
public void Add(T item)
{
    _version++;
    T[] array = _items;
    int size = _size;
    if ((uint)size < (uint)array.Length)
    {
        _size = size + 1;
        array[size] = item;
    }
    else
    {
        AddWithResize(item);
    }
}
First, the value of the _version field is increased by one. We'll see why a bit later in this article. After that, two local variables are created: array, of the T[] type, and size, of the int type. They receive the values of the _items and _size fields. Next, if the array still has room for one more element, the item is written at the size index, and _size is set to size + 1. If the array has no space left, the AddWithResize method is called.
private void AddWithResize(T item)
{
    Debug.Assert(_size == _items.Length);
    int size = _size;
    Grow(size + 1);
    _size = size + 1;
    _items[size] = item;
}
Here, the Grow method is called to increase the size of the internal array. After that, the item is added in the same way as in the Add method when space is available.
Let's look closely at the Grow method:
private void Grow(int capacity)
{
    ....
    int newcapacity = _items.Length == 0 ? DefaultCapacity : 2 * _items.Length;
    if ((uint)newcapacity > Array.MaxLength) newcapacity = Array.MaxLength;
    if (newcapacity < capacity) newcapacity = capacity;
    Capacity = newcapacity;
}
The algorithm of the Grow method is as follows:
- if the internal array is empty, the new capacity is DefaultCapacity, which is four; otherwise, it is double the length of the array;
- if the new capacity is greater than the maximum possible array length, it is set to Array.MaxLength;
- if the new capacity is still less than the requested minimum (the capacity parameter), it is set to that minimum;
- finally, newcapacity is written to the Capacity property.
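The growth policy described above can be observed from the outside through the public Capacity property. Below is a minimal sketch, assuming a modern .NET runtime and C# top-level statements; the variable names are illustrative and are not part of List<T>.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();

// The list starts with an empty internal array, so the first capacity is 0.
var observedCapacities = new List<int> { list.Capacity };

for (int i = 0; i < 9; i++)
{
    list.Add(i);

    // Record the capacity only when it changes, that is, when the array was regrown.
    if (list.Capacity != observedCapacities[^1])
    {
        observedCapacities.Add(list.Capacity);
    }
}

Console.WriteLine(string.Join(", ", observedCapacities)); // 0, 4, 8, 16
```

Nine Add calls are enough to see the first allocation of four slots and two doublings.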
Why Do We Need the _version Field?
Why do we need the _version field, the value of which changed in the Add method? As mentioned above, this field allows you to track the list version. This field's value is checked when the list is enumerated. For example, let's look at the ForEach method:
public void ForEach(Action<T> action)
{
    ....
    int version = _version;
    for (int i = 0; i < _size; i++)
    {
        if (version != _version)
        {
            break;
        }
        action(_items[i]);
    }
    if (version != _version)
        ThrowHelper
            .ThrowInvalidOperationException_InvalidOperation_EnumFailedVersion();
}
Before the enumeration starts, the value of the _version field is saved to a local variable. If the list is changed during the traversal, the traversal stops, and System.InvalidOperationException is thrown. The _version field is tracked in the same way in List<T>.Enumerator. Therefore, changing the list while traversing it in foreach also throws an exception.
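The version check is easy to trigger from user code. Here is a small sketch, assuming C# top-level statements; it modifies a list inside foreach and catches the resulting exception. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3 };
bool threw = false;

try
{
    foreach (int n in numbers)
    {
        // Add increments _version, so the enumerator's next MoveNext call fails.
        numbers.Add(n * 10);
    }
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine(threw);         // True
Console.WriteLine(numbers.Count); // 4: only the first Add ran before the exception
```

A common way to avoid the exception is to iterate over a copy (for example, numbers.ToArray()) or to use a for loop with an index.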
Capacity
List<T> has a constructor that takes the initial capacity as its argument.
List<int> list = new List<int>(8);
If a developer knows in advance what list size they need, they can set it. This eliminates unnecessary copying operations and memory allocation for a new array when a new item is added.
By the way, we can also manage the size of the internal array using the Capacity property:
list.Capacity = 8;
Let's look at the code of this property:
public int Capacity
{
    get => _items.Length;
    set
    {
        if (value < _size)
        {
            ThrowHelper.ThrowArgumentOutOfRangeException(....);
        }
        if (value != _items.Length)
        {
            if (value > 0)
            {
                T[] newItems = new T[value];
                if (_size > 0)
                {
                    Array.Copy(_items, newItems, _size);
                }
                _items = newItems;
            }
            else
            {
                _items = s_emptyArray;
            }
        }
    }
}
The get accessor returns the _items.Length value, i.e., the length of the internal array.
The set accessor works as follows:
- if the value is less than the number of items in the collection, an exception is thrown;
- if the value is not equal to the length of the internal array and is greater than zero, a new array of that length is created;
- if the number of items in the list is greater than zero, the items from the old array are copied to the new one;
- if the value is zero, the empty array s_emptyArray is assigned to the _items field.
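The accessor rules above can be checked with a short experiment. This is a sketch under the assumption of C# top-level statements; it only uses the public constructor and the Capacity and Count properties.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>(8);      // internal array of length 8, no items yet
Console.WriteLine(list.Capacity); // 8
Console.WriteLine(list.Count);    // 0

list.AddRange(new[] { 1, 2, 3 });
list.Capacity = 3;                // allowed: the value is not less than Count

bool rejected = false;
try
{
    list.Capacity = 2;            // less than Count, so the setter throws
}
catch (ArgumentOutOfRangeException)
{
    rejected = true;
}

Console.WriteLine(rejected);      // True
Console.WriteLine(list.Capacity); // 3
```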
Other Features of the List<T> Methods
Insert
The Insert method allows us to insert an item only at an index within the collection's range, that is, from 0 to list.Count inclusive. If the number of items in the collection equals the length of the internal array, the array's capacity is increased via the Grow(_size + 1) call. If we try to insert an item at an index greater than list.Count, System.ArgumentOutOfRangeException is thrown.
List<string> list = new List<string>() { "1", "2" };
list.Insert(1, "10");  // OK
list.Insert(2, "15");  // OK
list.Insert(10, "12"); // throws ArgumentOutOfRangeException
Such behavior remains even with explicit management of the internal array's size.
Look at the example:
List<string> list = new List<string>() { "1", "2" };
list.Capacity = 8;
list.Insert(3, "3"); // throws: index 3 is greater than Count, which is 2
The Capacity property is set to eight, and the internal array is resized. However, this doesn't allow us to insert an item at a position greater than list.Count. As a result, the code throws an exception.
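The boundary case is worth seeing in code: an index equal to Count is valid and appends the item, while anything beyond it fails regardless of Capacity. A minimal sketch, assuming C# top-level statements:

```csharp
using System;
using System.Collections.Generic;

var list = new List<string> { "1", "2" };
list.Capacity = 8;

// An index equal to Count is allowed: the item is appended to the end.
list.Insert(list.Count, "3");

bool threw = false;
try
{
    // Count is 3 now, so index 5 is out of range even though Capacity is 8.
    list.Insert(5, "x");
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}

Console.WriteLine(string.Join(",", list)); // 1,2,3
Console.WriteLine(threw);                  // True
```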
Clear
This method clears the collection. After the call, the Count property is zero. Items of reference types are reset to their default value in the internal array. If the collection items are structures that have fields of reference types, these fields are also reset to their default values. Note that the size of the internal array remains unchanged: if the Capacity property was equal to eight before the Clear call, it is still eight afterwards. To release the memory allocated for the array, we need to call TrimExcess after Clear.
TrimExcess
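This method reduces the size of the internal array to the number of items in the list. Note that it does so only when the list uses less than 90 percent of its capacity; otherwise, it does nothing. We should use it when we know that no more items will be added to the collection.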
list.Clear();
list.TrimExcess();
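The effect of the two calls on Capacity can be sketched as follows, assuming C# top-level statements; the local variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>(8) { 1, 2, 3 };

list.Clear();
int countAfterClear = list.Count;       // 0
int capacityAfterClear = list.Capacity; // 8: Clear keeps the internal array

list.TrimExcess();
int capacityAfterTrim = list.Capacity;  // 0: the array is released

Console.WriteLine($"{countAfterClear}, {capacityAfterClear}, {capacityAfterTrim}"); // 0, 8, 0
```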
Sort and OrderBy
There are several differences between these two methods:
- the Sort method belongs to the List<T> class, while OrderBy is an extension method from LINQ;
- the Sort method modifies the original collection, while OrderBy leaves it unchanged and returns a sorted sequence of the IOrderedEnumerable<TSource> type;
- the OrderBy method performs a stable sort, and Sort does not. If we use the Sort method, the relative order of equal items may change.
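These differences can be illustrated with a small sketch, assuming C# top-level statements. The tuple list and its field names (Age, Name) are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<(int Age, string Name)>
{
    (30, "Ann"), (25, "Bob"), (30, "Cid"), (25, "Dee")
};

// OrderBy leaves the list untouched and keeps equal keys in their original order.
var ordered = people.OrderBy(p => p.Age).Select(p => p.Name).ToList();
Console.WriteLine(string.Join(", ", ordered)); // Bob, Dee, Ann, Cid
Console.WriteLine(people[0].Name);             // Ann

// Sort reorders the list itself; the relative order of equal ages is not guaranteed.
people.Sort((a, b) => a.Age.CompareTo(b.Age));
Console.WriteLine(people[0].Age);              // 25
```

The example does not print the names after Sort on purpose: which of the two 25-year-olds comes first is an implementation detail.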
A Bit About Performance
List<T> Versus ArrayList
List<T> is generic. This means that when we create a list, we must specify what type of objects it works with.
List<string> list = new List<string>();
In his book "CLR via C#", Jeffrey Richter describes the following advantages of generics:
- source code protection;
- type safety;
- cleaner code;
- better performance.
The same book (the beginning of the 12th chapter) has a good example that compares List<T> with ArrayList, its non-generic analog. The test adds an item to the list and then assigns that item to a variable, and it does so ten million times.
Below is an example of testing ArrayList with a value type:
public void ValueTypeArrayList()
{
    ArrayList a = new ArrayList();
    for (Int32 n = 0; n < _count; n++)
    {
        a.Add(n);
        Int32 x = (Int32)a[n];
    }
}
Testing was performed with objects of value (Int32) and reference (String) types.
After rewriting the code given in the book and testing it with BenchmarkDotNet, I got the following results:
The results show that List<T> works with Int32 much faster than ArrayList does: 13 times faster! Besides, List<T> allocates four times less memory.
ArrayList performs many boxing operations, so the number of garbage collections also increases. Obtaining an item requires unboxing. All of that decreases performance.
With reference types, the difference is insignificant, since the costly boxing and unboxing operations are not involved. Judging by the code, the small difference in speed comes from the type cast.
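The boxing and type-safety points can be seen in a few lines. This is an illustrative sketch, not the benchmark from the book, and it assumes C# top-level statements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var untyped = new ArrayList();
untyped.Add(42);                      // the int is boxed into an object
int fromArrayList = (int)untyped[0];  // unboxing requires an explicit cast

var typed = new List<int>();
typed.Add(42);                        // stored directly in an int[], no boxing
int fromList = typed[0];              // no cast needed

untyped.Add("forty-two");             // compiles: ArrayList accepts any object
bool castFailed = false;
try
{
    _ = (int)untyped[1];              // fails only at run time
}
catch (InvalidCastException)
{
    castFailed = true;
}

Console.WriteLine(fromArrayList == fromList); // True
Console.WriteLine(castFailed);                // True
```

With List<int>, the equivalent typed.Add("forty-two") call would not compile, which is the type safety advantage from the list above.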
Advantages of Assigning Value to Capacity
As mentioned above, if a developer knows the list size in advance, they can specify it.
Let's do a small test.
public void ListWithoutCapacity()
{
    for (int i = 0; i < Count; i++)
    {
        List<int> list = new List<int>();
        for (int j = 0; j < Length; j++)
        {
            list.Add(j);
        }
    }
}
Here, 150,000 items are added to the list, and this operation is performed 1,000 times. Then, let's compare the performance with the same method, but with a specified capacity equal to the number of Add operations.
The results show that the method without a preset capacity takes twice as long as the one with it. It also allocates four times more memory. Without a preset capacity, the internal array is reallocated 17 times on each iteration of the outer loop (4, 8, 16, and so on up to 262,144 items), and 16 of these reallocations copy the existing items. A preset capacity replaces all of them with a single allocation.
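The number of reallocations can be counted without a profiler by watching the Capacity property. Below is a sketch, assuming C# top-level statements; CountReallocations is a hypothetical helper written for this example, not a library method.

```csharp
using System;
using System.Collections.Generic;

const int Length = 150_000;

int withoutCapacity = CountReallocations(new List<int>(), Length);    // 17
int withCapacity = CountReallocations(new List<int>(Length), Length); // 0

Console.WriteLine($"{withoutCapacity} vs {withCapacity}");

// Hypothetical helper: adds 'count' items and counts how often Capacity changes.
static int CountReallocations(List<int> list, int count)
{
    int reallocations = 0;
    int lastCapacity = list.Capacity;

    for (int j = 0; j < count; j++)
    {
        list.Add(j);
        if (list.Capacity != lastCapacity)
        {
            reallocations++;
            lastCapacity = list.Capacity;
        }
    }

    return reallocations;
}
```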
What Is the Fastest Way to Determine That There Are Items in the List?
Let's take three options for determining that the list is not empty:
- use the Count method from LINQ and compare the result with 0;
- use the Count property and compare the result with 0;
- use the Any extension method from LINQ.
After testing, we get the following results for the list of 1,500,000 items:
The fastest option is accessing the Count property since it returns the value of the _size field.
The Count method tries to cast the source collection to ICollection<T>. If the cast succeeds, the method returns the value of the Count property. If the cast fails, the method has to traverse the whole collection to count the elements. Luckily, List<T> implements this interface.
The Any method returns true if it finds at least one element in the collection.
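For reference, the three checks look like this in code. This is a sketch that only shows the calls being compared, assuming C# top-level statements; it does not measure anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> list = Enumerable.Range(0, 1_500_000).ToList();

bool viaProperty = list.Count > 0;    // reads the Count property directly
bool viaLinqCount = list.Count() > 0; // LINQ extension method
bool viaAny = list.Any();             // LINQ extension method

Console.WriteLine($"{viaProperty}, {viaLinqCount}, {viaAny}"); // True, True, True
```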
Conclusion
We can safely say that List<T> is a more user-friendly version of the array. For example, it's more convenient to work with a list when the number of sequence elements is unknown in advance.
C# has many more collections that help developers in their work. Some of them are more specific, and some of them are less. I hope this article helps you, too, and makes your understanding of lists a little better :).
Published at DZone with permission of Artem Rovenskii. See the original article here.