Speeding Up Metadata Operations on S3 by 100x Using Fingerprint and Batching
We take a look at how the Alluxio development team improved the performance of their big data framework.
This blog describes our experience in speeding up Alluxio metadata operations using fingerprints and bulk operations on the Alluxio under store. These latest optimizations can be found in the 1.8.1 release.
One of the major values Alluxio provides is a simple and unified interface to manage files and directories on different underlying storage systems. Alluxio acts as an intermediate layer and exposes a file interface for applications to interact with, even though the underlying storage system might be an object store that has a different interface. This is common when migrating applications from on-prem to the cloud, where the on-prem application uses a file interface such as POSIX to address on-prem storage, while the cloud storage uses an object store interface such as S3. Having Alluxio between the application and the storage system saves valuable development time that would otherwise be spent changing the application from a file interface to an object store interface.
Object stores are often remote and slower to access compared to other underlying storage systems. Users can unintentionally trigger a large number of accesses to these object stores through Alluxio by using Alluxio’s file system interface. This problem is exacerbated when users call certain metadata operations (for example, chgrp) recursively on a large number of files, because of the number of calls made to the underlying object store. This blog looks at a number of optimizations done recently in Alluxio to speed up these metadata operations.
Speeding Up Alluxio/Under Store Synchronization
Since Alluxio v1.7, a fingerprint string that captures information about a file or directory has been used to detect whether that file or directory has changed. This allows Alluxio to quickly determine whether the file in Alluxio and the file in S3 or other object storage (the Under Store) are in sync with each other. An Under Store sync operation takes place when the fingerprints do not match. This can happen when the user modifies a file in the Under Store without going through Alluxio. Version 1.8 improved this feature by partitioning the fingerprint into two components: a metadata component and a content component. This further reduces the number of required synchronizations between Alluxio and the Under Store.
Prior to this change, if the metadata of an Under File System (UFS) file or object (owner, mode, etc.) changed, the entire file in Alluxio was invalidated. This could lead to unnecessary invalidation and reloading of data, adding to the cost of large metadata operations such as recursively changing permissions on a directory containing many files and directories. The metadata operation itself was not necessarily slow, but the subsequent file operation would be slower because Alluxio had to reload the file content from the UFS on the next read.
With the metadata fingerprint independent of the content fingerprint, a change to a file’s metadata alters only the metadata component of the fingerprint. Hence, Alluxio does not need to reload data from the underlying storage and can instead apply the metadata update in memory directly. This is a less expensive way to synchronize between the UFS and Alluxio when only the metadata changes.
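To make the idea concrete, here is a minimal Python sketch of a two-part fingerprint and the decision it enables. It is not Alluxio's actual fingerprint format or code: the `Fingerprint` class, the `sync_action` function, and the choice of fields (owner, group, and mode for metadata; size and modification time for content) are all assumptions made for illustration.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical two-part fingerprint (not Alluxio's real format)."""
    metadata: str  # digest over owner, group, mode
    content: str   # digest over fields assumed to identify the content


def _digest(*parts) -> str:
    joined = "|".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def fingerprint(owner, group, mode, size, mtime) -> Fingerprint:
    # Assumption: size and mtime stand in for "content changed".
    return Fingerprint(
        metadata=_digest(owner, group, mode),
        content=_digest(size, mtime),
    )


def sync_action(cached: Fingerprint, ufs: Fingerprint) -> str:
    """Decide what a sync has to do for one file."""
    if cached.content != ufs.content:
        return "reload_data"      # content differs: cached data is stale
    if cached.metadata != ufs.metadata:
        return "update_metadata"  # only owner/group/mode differ
    return "in_sync"


if __name__ == "__main__":
    cached = fingerprint("alice", "staff", 0o644, 1024, 1_540_000_000)
    after_chgrp = fingerprint("alice", "eng", 0o644, 1024, 1_540_000_000)
    print(sync_action(cached, after_chgrp))  # update_metadata
```

With a single combined fingerprint, the chgrp in the example would be indistinguishable from a content change and would force a reload. Comparing the two components separately is what allows the cheaper in-memory update.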
Speeding Up Recursive ListStatus and DiskUsage
One of the most commonly used metadata operations is the listStatus operation, or the ls command. It is frequently used interactively through the Command Line Interface, where any long delay is undesirable from the user’s perspective. It is also used in many distributed computation frameworks such as Spark and MapReduce, so an improvement in efficiency can lead to computational jobs completing faster. One of the options of listStatus is the recursive option. With the recursive option, users can query the status of an entire folder, recursively. This often leads to a very long list of files and directories being queried and consequently can be quite time-consuming.
Profiling reveals that calls from the file system master to the UFS to query metadata are the bottleneck in this process. This is especially problematic for deployments that use object storage as the UFS, since these object stores are often remote and operations such as listStatus are often slow compared to a co-located file system such as HDFS. The key to improving the performance of recursive listStatus operations is to reduce the number of calls to the UFS.
Alluxio recently introduced two features that improved the performance of listStatus calls. First, Alluxio v1.8 started to leverage the fact that some object stores such as Amazon S3 support recursive listStatus calls. With a single API call, Alluxio can obtain the entire list of files and directories and their metadata. In addition, Alluxio caches the information that is obtained from this recursive call for other purposes such as fingerprint generation and verification.
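The difference between listing directory by directory and listing recursively in one request can be sketched with a small in-memory stand-in for an object store. Everything here is hypothetical (`FakeObjectStore`, `walk_per_directory`, `walk_single_call`), and the fake ignores details of a real store such as paginated responses. It only counts how many list requests each strategy issues.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store; counts list requests."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.list_calls = 0

    def list_children(self, prefix):
        """One request: immediate children of prefix (delimiter-style)."""
        self.list_calls += 1
        children = set()
        for key in self.keys:
            if key.startswith(prefix):
                rest = key[len(prefix):]
                head, sep, _ = rest.partition("/")
                # A trailing "/" marks a pseudo-directory.
                children.add(prefix + head + sep)
        return sorted(children)

    def list_recursive(self, prefix):
        """One request: every key under prefix."""
        self.list_calls += 1
        return [k for k in self.keys if k.startswith(prefix)]


def walk_per_directory(store, prefix=""):
    """Issue one list request per directory, descending recursively."""
    files = []
    for child in store.list_children(prefix):
        if child.endswith("/"):
            files.extend(walk_per_directory(store, child))
        else:
            files.append(child)
    return files


def walk_single_call(store, prefix=""):
    """Fetch the whole subtree with one recursive list request."""
    return store.list_recursive(prefix)


if __name__ == "__main__":
    keys = ["a/b/1.txt", "a/b/2.txt", "a/c/3.txt", "a/4.txt", "d/5.txt"]
    slow, fast = FakeObjectStore(keys), FakeObjectStore(keys)
    assert sorted(walk_per_directory(slow)) == sorted(walk_single_call(fast))
    print(slow.list_calls, fast.list_calls)  # 5 1
```

Both walks return the same five files, but the per-directory walk needs one request for each directory (the root, a/, a/b/, a/c/, and d/), so its request count grows with the size of the tree while the recursive listing stays at one.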
Additionally, the 1.8.1 release of Alluxio contains another optimization. When a file is created in the Alluxio space as a result of loading metadata from the UFS, Alluxio avoids persisting the file information to the UFS. This is possible because the file creation was initiated from the UFS side, hence the file should already exist in the UFS.
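As a toy model of this second optimization, the sketch below tracks how many write requests reach the under store when entries are added to the namespace. The names (`FakeUfs`, `add_to_namespace`, the `loaded_from_ufs` flag) are invented for illustration and do not correspond to Alluxio's internal API. The model also assumes, for simplicity, that client-created files are written to the UFS immediately.

```python
class FakeUfs:
    """In-memory stand-in for an under store; counts write requests."""

    def __init__(self, existing_paths):
        self.paths = set(existing_paths)
        self.write_calls = 0

    def create(self, path):
        self.write_calls += 1
        self.paths.add(path)


def add_to_namespace(namespace, ufs, path, loaded_from_ufs):
    """Record path in the namespace, writing to the UFS only when needed."""
    if not loaded_from_ufs:
        # Created by a client: the UFS has no copy yet, so (in this toy
        # model) it is written right away.
        ufs.create(path)
    # Otherwise the entry was discovered in the UFS and already exists there.
    namespace[path] = {"persisted": True}


if __name__ == "__main__":
    ufs = FakeUfs(["data/a.csv", "data/b.csv"])
    namespace = {}
    for path in sorted(ufs.paths):
        add_to_namespace(namespace, ufs, path, loaded_from_ufs=True)
    print(ufs.write_calls)  # 0
```

Loading metadata for files that already live in the UFS therefore costs no extra write requests, which matters when the load covers thousands of files.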
The combined effect of these two optimizations is that the number of calls to the UFS went from O(n) to O(1), where n is the number of files being queried. An experiment was carried out to evaluate this change. The test data set is a nested directory structure with 10 files or directories at each level and 4 levels deep, for a total of 10000 files. Alluxio is deployed on local machines and Amazon S3 is used as the UFS. Comparing version 1.7.1 with version 1.8.0, the first run of the recursive listStatus improves modestly, with running time reduced by 75%. The running time of the second run of the recursive listStatus, however, is drastically reduced, from over 900 seconds to 8 seconds.
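The numbers in this experiment can be checked with a little arithmetic. The sketch below assumes that the 10000 files are the leaves of a tree with a fan-out of 10 and a depth of 4, and that every interior node is a directory. That interpretation, and the helper names, are ours. The timings are the rounded figures quoted above, so the ratios are approximate.

```python
def tree_size(fanout, depth):
    """Return (leaf files, directories including the root) for a full tree."""
    files = fanout ** depth
    directories = sum(fanout ** level for level in range(depth))
    return files, directories


def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    files, directories = tree_size(10, 4)
    print(files, directories)  # 10000 1111
    print(speedup(900, 8))     # 112.5 (second run, 1.7.1 vs 1.8.0)
    print(1 / (1 - 0.75))      # 4.0 (a 75% reduction is a 4x speedup)
```

Under this interpretation, a per-directory walk of the tree would issue on the order of a thousand list requests, and any per-file request pushes the total toward the number of files. This is where the O(n) behavior comes from.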
Version 1.8.1 significantly improved the first-run time of ls -R. The combined optimizations reduced the first ls -R from more than 2000 seconds to about 20 seconds, and subsequent runs of ls -R take only about 7 seconds. The following table summarizes the running time of recursive listStatus after applying each of our optimizations.
Metadata operations are an important part of any file system. Their performance is even more critical for a system like Alluxio, which often manages several large underlying file systems. This blog post details two optimizations we have made recently in Alluxio v1.8.1 to significantly speed up large recursive load metadata operations. They improve the user experience of exploring Alluxio files from the command line interface with ls and du. In addition, these optimizations speed up a process known as UFS sync, which is designed to keep files in sync between the UFS and the Alluxio namespace.
Metadata management is a key part of Alluxio, and in this blog post we detailed some recent optimizations that improve its loading speed. In parallel with this effort, we are also looking at how to synchronize the UFS metadata and the metadata stored in the Alluxio master more efficiently. In addition to speed and efficiency, we are working on the scalability of metadata management so that it can support a much larger number of files. Some of the improvements listed in this blog, along with other future improvements, are:
- Partition UFS Fingerprint between content-related vs metadata-related info (ALLUXIO-3150)
- Use recursive listStatus in UFS to implement loadMetadata (ALLUXIO-3205)
- Reduce the number of interactions with UFS in loadMetadata (ALLUXIO-3300)
Read More: Alluxio Version 1.8.1 Release Notes
Published at DZone with permission of David Zhu. See the original article here.