Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Store One Billion Files in Alluxio 2.0

DZone 's Guide to

Store One Billion Files in Alluxio 2.0

We look at how this open source framework can handle larger namespace volumes as people mount more storages into it and make greater use of object storage.

· Big Data Zone ·
Free Resource

Alluxio is a virtual distributed file system that enables applications to access files and objects in different external storage like S3 or HDFS in a unified file system namespace with a single API. Scaling the capacity of Alluxio metadata service is vital to Alluxio for a couple of reasons:

  1. Alluxio provides a single namespace where multiple storage systems can be mounted. So the size of Alluxio's namespace needs to match the sum of the sizes of all mounted storages.
  2. Object storage is increasing in popularity, and object stores often hold many more small files compared with file systems like HDFS.

In Alluxio 1.x, the metadata service is limited to around 200 million files in practice. Scaling further would cause garbage collection issues due to the limited JVM heap size of the Alluxio master process. Also, storing 200 million files would require a large memory footprint (around 200GB) of JVM heap in a single machine running Alluxio master.

The metadata service in Alluxio 2.0 is designed to support at least 1 billion files with a significantly reduced memory requirement.  To achieve this, we added support for storing part of the namespace off-heap by RocksDB on disk. Recently-accessed file system metadata is stored in memory, while older data ends up on disk. This tiered metadata storage significantly reduces the memory requirements for serving the Alluxio namespace and also takes pressure off of the Java garbage collector by reducing the number of objects it needs to deal with.

1 Setting Up Off-Heap Metadata

Off-heap metadata is the default in Alluxio 2.0, so no action is required to start using it. If you want to go back to storing all metadata in-memory, set

alluxio-site.properties

alluxio.master.metastore=HEAP
# default is ROCKS for off-heap

The default location for off-heap metadata is ${alluxio.work.dir}/metastore, which is usually under the alluxio home directory. You can configure the location by setting

alluxio-site.properties

alluxio.master.metastore.dir=/path/to/metastore

In-Memory Metadata Storage

While Alluxio has the ability to store all of its metadata on disk, this would result in reduced performance due to the relative slowness of disk compared to memory. To alleviate this, Alluxio keeps a large number of files in memory for fast access. When the number of files grows close to the maximum cache size, Alluxio evicts the less-recently-used files onto disk. Users can trade off between memory usage and speed by setting

alluxio-site.properties

alluxio.master.metastore.inode.cache.max.size=10000000

A cache size of 10 million requires about 10GB of heap.

2 Sizing the Resource

Memory

For the best performance, the metadata of the working set should fit within the master's memory. Estimate the number of files in your working set, and multiply by 1KB per file. For example, for 5 million files, estimate 5GB of master memory. To configure master memory, add a JVM option in conf/alluxio-env.sh

alluxio-env.sh

ALLUXIO_MASTER_JAVA_OPTS+=" -Xmx5G"

If you have the memory available, we recommend giving the master a 31GB heap in production deployments.

alluxio-env.sh

ALLUXIO_MASTER_JAVA_OPTS+=" -Xmx31G"

Update the cache size to match the amount of memory allocated:

alluxio-site.properties

# 31 million
alluxio.master.metastore.inode.cache.max.size=31000000

This will accommodate most workloads, and avoids the need to switch to 64-bit pointers once the heap reaches 32GB.

Disk

Alluxio stores inodes on disk more compactly than it stores them in memory. Instead of 1KB per file, it takes about 500 bytes. To support 1 billion files stored on disk, provision an HDD or SSD with 500GB of space, and make sure the RocksDB metastore is using that disk.

alluxio-site.properties

alluxio.master.metastore.dir=/path/to/metastore

3 Conclusion

Alluxio 2.0 significantly improves metadata scalability by taking advantage of disk resources for cold metadata. This prepares Alluxio to handle larger namespace volumes as people mount more storages into Alluxio and make greater use of object storage. Stay tuned for a followup blog on the technical details of how we leverage RocksDB for storing our metadata.

Topics:
alluxio ,object storage ,scalability ,big data ,metadata

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}