Arrays: Disarray or Datarray?
Arrays: Disarray or Datarray?
We all know about arrays. But if we work primarily in a single programming language we sometimes forget that they can be subtly different across different languages.
Join the DZone community and get the full member experience.Join For Free
Take 60 minutes to understand the Power of the Actor Model with "Designing Reactive Systems: The Role Of Actors In Distributed Architecture". Brought to you in partnership with Lightbend.
This is the first in the data structure reviews and likely the simplest: the humble array. The first issue is the term Array; its term differs depending on who uses it but we will get to that a bit later.
Generally, I think of an array like this:
An array is a container object that holds a fixed number of values of a single type. The length of an array is established when the array is created. After creation, its length is fixed. – Oracle Java Documentation
Seems simple enough. There are two limits placed on our container: single type and fixed length and both relate to how the array is handled in memory. When an array is created it looks at the type and length and uses that to calculate how much memory is needed to store all of that. For example, if we had an array of 8 items we would get a block of memory allocated for the array like this:
In some systems arrays can just grow, but allocating more memory at the end, these are called dynamic arrays. However, many systems do not allow this because the way memory is handled, there might not be any space after the last item to grow into, thus, the array length is fixed as there isn’t any memory allocated for that array instance.
This has a major the advantage to read performance since I can quickly calculate the where the item will be in memory, thus skipping having to read/navigate all other items.
For example, if my arrays values start at position 100 in memory and I want the 4th item in an
int, it would be 4 (for the position) multiplied by 4 (for the
int size) + 100 (for the start address) & boom value!
This makes every read an O(1) operation!
What happens when we can’t know the size of the items in the array? For example, if we created an
object which can hold anything.
In this scenario, when the array is created, rather than allocating memory based on length multiplied by type size, it allocates length multiplied by the size of a pointer and rather than storing the values themselves in the array memory, it stores pointers to other locations in memory where the value is.
Obviously, this has a slightly worse performance than an array where we can have the values in it—but it is slight. Below is some output from BenchmarkDotNet comparing sequential reads of an
object (code here) and it is close:
Method | Median | StdDev | --------------------------- |----------- |---------- |
IntArraySequentialReads | 52.2905 us | 4.9374 us |
ObjectArraySequentialReads | 58.3718 us | 5.4106 us |
Basically, these all have a key and a value associated to them and you can look up the value by using the key. Implementation details differ though from platform to platform.
For example on C#,
Associative arrays do take up more memory than a traditional array (good example here of PHP where it was 18 times larger).
Multi-dimensional arrays also differ platform to platform. The Java version of it is an array of arrays, which achieves the same goal is basically implemented the same as
object was described above. In C#, these are known as jagged arrays.
C# and other languages have proper multi-dimensional arrays which work differently—they actually take all the dimensions, multiply them together and use that for the length of an array. The dimensions just give different offsets.
Jagged arrays do have one benefit over a multi-dimensional array, since each internal array is independent, they can be different sizes where a multi-dimensional array all the dimensions must be the same size.
If you are working in C#, you might be asking yourself what List<T> is and how it relates to Array since it can grow forever!
List<T> is just an array with initial size of 4! When you call
.Add to add a 5th item, it then does the following:
- Create second array of where the length is double the current array
- Copy all items from first array to second array
- Use second array now
This is SUPER expensive and also why there is an optional constructor where you can override the initial size which helps a lot. Once again using BenchmarkDotNet you can see that it makes a nice difference (code):
Method | Median | StdDev | ------------------------ |------------ |----------- | DefaultConstructorUsed | 701.7312 us | 38.5573 us | ConstructorWithSizeUsed | 548.5436 us | 13.1122 us |
This post is one in a series about stuff formally trained programmers know. The rest of the series can be found here.
Published at DZone with permission of Robert Maclean , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.