Put your fat Collections on a diet!
Background
Every Java program uses data structures of some sort, be it a trivial array, a Fibonacci heap, or something so exotic that only a Google search knows about it. In most cases developers do not write their own implementations of these structures but use those provided by the Java core APIs or by a third-party library, such as Apache Commons Collections or Google Guava. In my 10+ years of Java development not a day has passed without me using some data structure from the Java Collections API. These Lists, Sets and Maps are so natural to me that I don't hesitate for a second before writing
Map<Integer, String> map = new HashMap<Integer, String>();
And everything was fine until recently…
One of the classes inside our Plumbr Java agent needs to store a bunch of integers as one of its fields. The semi-formal requirements are as follows:
- We need a data structure for storing integers.
- No duplicates
- Order is unimportant
- We need to add new elements to this structure
- We need to look up whether some element exists in this structure
- The number of different elements is limited to a couple of hundred at most
- Memory consumption is more important than speed
- Nevertheless, the performance must be decent, so memory-mapped files, databases, etc. are out of the question.
The natural choice for these requirements, at least considering my experience so far, is java.util.HashSet<Integer>. So, without thinking twice, I gave it a try. That was a disaster!
Experiment
Well, in order to illustrate my point, we need some way to measure memory usage of different data structures. For this blog post I used the following procedure:
- Write a Java class with a main method which holds the data structure in question as a local variable
- Add an infinite loop at the end of the main method so that the thread does not die too quickly
- Using Eclipse Memory Analyzer, take a heap dump and find the size of the retained heap for the local variable of interest (a sketch of such a harness follows this list)
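Put together, the harness looks roughly like the sketch below (the class name is mine); the heap dump itself is taken externally, e.g. with jmap, while the loop keeps the local variable alive:

import java.util.HashSet;
import java.util.Set;

// Hypothetical harness class; only the data structure under test changes between runs.
public class RetainedHeapExperiment {
    public static void main(String[] args) throws InterruptedException {
        Set<Integer> set = new HashSet<Integer>();
        int COUNT = 10;
        for (int i = 0; i < COUNT; i++) {
            set.add(i);
        }
        // Park the thread so that 'set' stays reachable while we take
        // a heap dump and inspect it in Eclipse Memory Analyzer.
        while (true) {
            Thread.sleep(1000);
        }
    }
}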
As a baseline I used the fact that in Java an int takes 4 bytes, so for COUNT integers we need 4 * COUNT bytes. The overhead of a given data structure can then be calculated as follows:
Overhead = structure's retained heap / (4 * COUNT)
For example, a structure holding 10 integers (a 40-byte baseline) with a retained heap of 720 bytes has an overhead of 720 / 40 = 18.
Please note that Java distinguishes between primitives and objects, and the Collections API operates only on objects. This means the total overhead consists of the overhead of the collection itself plus the overhead of using boxed Integer objects instead of int primitives.
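For illustration: adding an int to a Set<Integer> autoboxes it, so every element ends up on the heap as a full-blown Integer object:

Set<Integer> set = new HashSet<Integer>();
int i = 42;
set.add(i); // autoboxed behind the scenes, effectively set.add(Integer.valueOf(i))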
Results
Set<Integer> set = new HashSet<Integer>();
int COUNT = 10;
for (int i = 0; i < COUNT; i++) {
    set.add(i);
}
Let us look at the results:
COUNT | Retained heap (bytes) | Overhead
10 | 720 | 18
100 | 6 960 | 17.4
1 000 | 85 424 | 21.36
1 000 000 | 88 774 256 | 22.19
Wow! Just wow! Storing integers in a java.util.HashSet takes about 20(!) times as much memory as the information we are actually storing. Each element drags along not only its boxed Integer, but also an entry object in the HashMap that backs the HashSet, plus a slot in its bucket table. This is a HUGE overhead in my opinion, and taking into account our need to work in really constrained memory conditions, it was unacceptable. We had to find some other way, so we started by reviewing our requirements: what do we really need? It turns out that a plain old Java array suits our requirements just fine. Changing my code to this:
int COUNT = 10;
Integer[] array = new Integer[COUNT];
for (int i = 0; i < COUNT; i++) {
    array[i] = i;
}
resulted in:
COUNT | Retained heap (bytes) | Overhead
10 | 344 | 8.6
100 | 3 224 | 8.06
1 000 | 32 024 | 8.006
1 000 000 | 32 000 024 | 8.000006
Better, but an Integer[] still uses about 8 times the raw data size: on the 64-bit JVM used here each element apparently occupies 32 bytes, an 8-byte reference plus a 24-byte boxed Integer object. The obvious next step was to drop the boxing altogether and switch to a primitive int[]:
int COUNT = 10;
int[] array = new int[COUNT];
for (int i = 0; i < COUNT; i++) {
    array[i] = i;
}
and got:
COUNT | Retained heap (bytes) | Overhead
10 | 64 | 1.6
100 | 424 | 1.06
1 000 | 4 024 | 1.006
1 000 000 | 4 000 024 | 1.000006
Note that the retained heap is now exactly 4 * COUNT plus a constant 24 bytes, which is just the header of the array object itself.
Conclusion
The numbers speak for themselves: java.util.HashSet<Integer> cost roughly 20 times the size of the data itself, Integer[] about 8 times, and a plain int[] almost nothing. With at most a couple of hundred unique values to store and memory at a premium, the primitive array was the clear winner for us; we only had to implement the duplicate check and the lookup ourselves. The broader lesson: the convenience of the Java Collections API comes with a real memory price tag, so when you store primitives in bulk, consider whether a plain primitive array would do the job.
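To make the int[] approach concrete, here is a minimal sketch (my own illustration, not the actual Plumbr code) of a duplicate-free int set meeting the requirements above; it keeps the backing array sorted so lookups can use binary search:

import java.util.Arrays;

// A small, memory-friendly set of ints backed by a sorted primitive array.
// Overhead stays close to the raw 4 bytes per element.
public class IntArraySet {
    private int[] elements = new int[0];

    // Fast membership test via binary search on the sorted array.
    public boolean contains(int value) {
        return Arrays.binarySearch(elements, value) >= 0;
    }

    // Inserts the value at its sorted position unless it is already present.
    public void add(int value) {
        int pos = Arrays.binarySearch(elements, value);
        if (pos >= 0) {
            return; // already present, no duplicates
        }
        int insertAt = -(pos + 1);
        int[] grown = new int[elements.length + 1];
        System.arraycopy(elements, 0, grown, 0, insertAt);
        grown[insertAt] = value;
        System.arraycopy(elements, insertAt, grown, insertAt + 1, elements.length - insertAt);
        elements = grown;
    }
}

Growing the array one element at a time makes add an O(n) operation, which is perfectly acceptable for a couple of hundred elements, and it keeps the memory overhead within a few dozen bytes of the raw data.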