Over a million developers have joined DZone.

The Scalable Array

DZone's Guide to

The Scalable Array

· Big Data Zone ·
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 

In GraphHopper we can create very tiny graphs for indoor routing, but we can also scale to graphs holding the road network from the planet. And we can expand to this size on demand. The reason is a very simple trick, which we didn't invent and can be found in other systems like ElasticSearch as well.

Internally we're using a simple array. But this won't scale for two reasons:

  1. An array e.g. byte[] is limited to only 2GB as it is accessed via integer values.
  2. To expand the size of an array you need a second destination array which is larger

The second point is bad in terms of memory usage if you have a large array and you want increase it a bit you need more than twice the size of the original array.

But we can easily solve both problems when we use a list of arrays instead. Assume the following wrapper class which can be found in real life:

public class RAMDataAccess {
  private int[][] segments = new int[0][];

  public void ensureCapacity( long bytes ) {
    // if you want to increase the size you just need to create new segments
    for (int i = segments.length; i < newSegs.length; i++)
      newSegs[i] = new int[1 << segmentSizeIntsPower];
    segments = newSegs;

  public void setInt( long longIndex, int value ) {
    longIndex >>>= 2;
    int bufferIndex = (int) (longIndex >>> segmentSizeIntsPower);
    int index = (int) (longIndex & indexDivisor);
    segments[bufferIndex][index] = value;

  public final int getInt( long longIndex ) {
    longIndex >>>= 2;
    int bufferIndex = (int) (longIndex >>> segmentSizeIntsPower);
    int index = (int) (longIndex & indexDivisor);
    return segments[bufferIndex][index];

If you look at the get and set methods more closely you'll see that the access is optimized as well. The simplest approach to calculate the access indices would be:

segments[longIndex / segments.length][longIndex % segments.length];

But if your segments have a size ala 2^n you can use the slightly faster bit operations as pointed out e.g. here.

But the story does not end here. E.g. if we hide those internals behind an interface (we called it DataAccess) we can also use byte arrays or ByteBuffers instead of the integer array. A ByteBuffer can be retrieved from a memory mapped file which then enables us to access several hundred of megabytes on mobiles devices like Android where you only have 32MB of RAM.

If you are interested in more memory efficient data structures, OpenStreetMap, efficient JavaScript or routing algorithms in Java fork our repo or even join our GraphHopper team!

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}