DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Curious about the future of data-driven systems? Join our Data Engineering roundtable and learn how to build scalable data platforms.

Data Engineering: The industry has come a long way from organizing unstructured data to adopting today's modern data pipelines. See how.

Threat Detection: Learn core practices for managing security risks and vulnerabilities in your organization — don't regret those threats!

Managing API integrations: Assess your use case and needs — plus learn patterns for the design, build, and maintenance of your integrations.

Related

  • Vector Search: The Hottest Topic in Information Retrieval
  • Designing for Security
  • Rate Limiting Strategies for Efficient Traffic Management
  • Understanding the Integration of Embedded Systems in Consumer Electronics

Trending

  • Model Compression: Improving Efficiency of Deep Learning Models
  • Right-Sizing GPU and CPU Resources for Training and Inferencing Using Kubernetes
  • Multimodal RAG Is Not Scary, Ghosts Are Scary
  • Organizing Logging Between the Three IBM App Connect Form Factors
  1. DZone
  2. Data Engineering
  3. Data
  4. A Deep Dive Into the Directory Quotas Design of a Distributed File System

A Deep Dive Into the Directory Quotas Design of a Distributed File System

This article offers insights into the design choices, their pros and cons, and the final implementation of directory quotas in JuiceFS.

By 
Sandy Xu user avatar
Sandy Xu
·
Nov. 07, 23 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
1.2K Views

Join the DZone community and get the full member experience.

Join For Free

As an open-source distributed file system, JuiceFS introduced the directory quota feature in version 1.1. The released commands support setting quotas for directories, fetching directory quota information, and listing all directory quotas. For details, see the Command Reference document.

In designing this feature, our team went through numerous discussions and trade-offs regarding its accuracy, effectiveness, and performance implications.

In this article, we’ll elaborate on different choices made during the design of this feature, their pros and cons, and share the final implementation. This aims to provide insights for users interested in directory quotas or with similar development needs.

Requirement Analysis

Designing directory quotas must first consider three factors:

1. Dimensions of statistics: Commonly, usage and limits are calculated based on directories, but other dimensions include user-based and user-group-based statistics.

2. Resources for statistics: Generally, these include total file capacity and total file count.

3. The limiting mechanism: The simplest way is to halt further writing when usage reaches a predefined value, known as the hard limit. There is also a common limiting mechanism called the soft limit, which triggers only an alert when the usage reaches this value but does not immediately restrict writing. The restriction occurs either when it hits the hard limit or after a grace period.

Furthermore, requirements for the effectiveness and accuracy of quota statistics must be considered. In distributed systems, multiple clients often access simultaneously. Ensuring that their views of quotas remain consistent at the same point in time can significantly impact performance.

It's also important to consider whether complex configurations, such as nested quotas and setting quotas for non-empty directories, should be supported.

Development Principles

Our main consideration was to keep things as simple and manageable as possible. We aimed to avoid extensive code refactoring, minimize intrusions into critical read and write paths, and ensure that implementing new functionalities wouldn't greatly affect the stability and performance of the existing system. Based on this, we compiled the following to-be-developed functionalities, as indicated in the table:

Function description Supported Evaluation
Directory-based limit Yes Basic functionality
Storage capacity limit Yes Basic functionality
Hard limit Yes Basic functionality
Total file count limit Yes Very simple
Nested quota setting Yes For user-friendliness
Quotas for non-empty directories Yes For user-friendliness
Moving directories across quotas Yes For user-friendliness
User/User-group-based restriction No Out of scope
Soft limit No Non-core
Grace period No Non-core
Real-time quota usage statistics No Non-essential
Precise quota usage statistics No Non-essential

It's worth noting the three items in bold in the table. Initially, we did not plan to support these because their complexity posed a challenge to the overall implementation of the quota feature, and they were not part of our defined core functionalities. However, after extensive discussions with various users, we realized that omitting these features would significantly reduce the practicality of the quota feature. Many users indeed required these functionalities to meet their specific needs. Therefore, in the end, we decided to incorporate these functionalities in JuiceFS 1.1.

Basic Functionalities

The User Interface

When we designed the quota feature, our first consideration was how users would set and manage quotas. This typically involves two methods:

  1. Using specific command-line tools. For example, GlusterFS uses the following command to set a hard limit for a specific directory:
 
$ gluster volume quota <VOLNAME> limit-usage <DIR> <HARD_LIMIT>


2. Using the existing Linux tool but with specific fields. For example, CephFS manages quotas as a special extended attribute:

 
$ setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir     # 100 MB


JuiceFS opted for the first approach, with the command form as follows:

 
$ juicefs quota set <METAURL> --path <PATH> --capacity <LIMIT> --inodes <LIMIT>


We made this choice for three reasons: JuiceFS already had a CLI tool in place. Adding quota management functionality only required adding a new sub-command. This was convenient. Quotas are usually configured by administrators and should not be changed by regular users. Custom commands can require METAURL to ensure permission. The second approach requires mounting the file system locally in advance. Quota settings often need to integrate with management platforms. Including the directory path as a parameter in the command avoids this step. This is more convenient.

The Metadata Structure

JuiceFS supports three major categories of metadata engines:

  1. Redis
  2. SQL types, such as MySQL, PostgreSQL, and SQLite
  3. TiKV types, such as TiKV, FoundationDB, and BadgerDB

Each category has different implementations based on the data structures they support, but the managed information is generally consistent. In the previous section, we decided to use the juicefs quota command to manage quotas. So, we also decided to use separate fields to store information for the metadata engine. Taking the example of the SQL type:

 
// SQL table
type dirQuota struct {
    Inode      Ino   `xorm:"pk"`
    MaxSpace   int64 `xorm:"notnull"`
    MaxInodes  int64 `xorm:"notnull"`
    UsedSpace  int64 `xorm:"notnull"`
    UsedInodes int64 `xorm:"notnull"`
}


The code above shows that JuiceFS creates a new table for directory quotas, with the directory index number (Inode) as the primary key, storing threshold values for capacity and file count in quotas and their respective utilization.

Quota Updates/Checks

The maintenance of quota information mainly involves two tasks: updates and checks. Quota updates typically involve creating and deleting files or directories, which affect the file count. Additionally, writing to files impacts quota usage capacity. The most straightforward way to do this is to commit changes to the database right after each request is completed. This ensures real-time and accurate statistics but can easily lead to significant metadata transaction conflicts.

The reason behind this is that in JuiceFS' architecture, there is no separate metadata service process. Instead, multiple clients concurrently submit changes to metadata engines in the form of optimistic transactions. If they attempt to change the same field, like the usage of quotas, in a short timeframe, it results in severe conflicts.

JuiceFS' approach is to synchronize the maintenance of quota-related caches in each client's memory and asynchronously submit local updates to the database every three seconds. This sacrifices a certain degree of real-time capability but significantly reduces the number of requests and transaction conflicts. Additionally, clients load the latest information from metadata engines during each heartbeat cycle (default 12 seconds), including quota thresholds and usage, to understand the global file system situation.

Quota checks are similar to updates, but they are simpler. Before performing an operation, the client can directly perform synchronous checks in memory if necessary, only proceeding with the process if the check passes.

Design of Complex Functionalities

This section discusses the design approach to nested quotas and recursive statistics.

Nested Quotas

In conversations with users, we often encountered this requirement: a department sets a large quota, but within that department, there might be groups or individuals, each needing their own quotas.

This necessitates adding nested structures to quotas. Without considering nesting, each directory has only two states: no quotas or limited by a single quota. However, introducing nested structures complicates the situation. For example, when updating a file, we need to identify all affected quotas and check or modify them. So, given a directory, how can we quickly find all affected quotas?

Option 1: Cache the Quota Tree and Directory-to-Nearest-Quota Mapping

Taking the figure below as an example:
An example of a nested structure.

An example of a nested structure.

This solution involves maintaining a nested structure of quotas and mapping information for each directory to its nearest quota. The data structure for the given example is as follows:

 
// quotaTree map[quotaID]quotaID
{q1: 0, q2:0, q3: q1}
// dirQuotas map[Inode]quotaID
{d1: q1, d3: q1, d4: q1, d6: q3, d2: q2, d5: q2}


With this information, when updating or searching for quotas, you can quickly find the nearest quota ID based on the directory Inode you are operating on. Then, you can traverse the quotaTree to find all affected quotas. This approach allows for efficient lookups and has an advantage from a static perspective. However, certain dynamic changes can be challenging to handle. Consider the scenario below:
A scenario with dynamic changes.

A scenario with dynamic changes.

Now, if you need to move directory d4 from its original location, d1, to d2, the parent quota for q3 changes from q1 to q2. However, since q3 is configured on d6, detecting this change is difficult. We could traverse all the directories underneath while moving d4 to check if they have quotas. However, it's evident that this would be a substantial undertaking. Thus, this solution is not advisable.

Option 2: Cache Directory-to-Parent-Directory Mapping

The figure shows a scenario with dynamic changes:
A scenario with dynamic changes.

A scenario with dynamic changes.

The second solution is caching mapping relationships for all directories to their parent directories. For the initial data structure represented in the figure above, it looks like this:

 
// dirParent map[Inode]Inode
{d1: 1, d3: d1, d4: d1, d6: d4, d2: 1, d5: d2}


With the same modification, all that's needed in this case is to change the value of d4 from d1 to d2. In this approach, when looking for affected quotas in a specific directory, you need to traverse up the directory tree using dirParent until you reach the root directory. During this process, you check whether each directory passed through has quotas set. Clearly, this solution has a slightly lower lookup efficiency compared to the previous one. However, all this information is cached in the client memory, so the overall efficiency remains acceptable. Therefore, we adopted this solution.

It's worth mentioning that the directory-to-parent-directory mapping is cached in client memory without specific expiration policies. This is primarily due to two considerations:

  1. In most cases, the number of directories in a file system is not large. It requires only a small amount of memory to cache them all.
  2. Changes made to directories by other clients do not need to be immediately detected in this client. When this client accesses related directories again, the cache is updated via kernel's lookup or reading directory (Readdir) requests.

Recursive Statistics

During requirement analysis, in addition to nested quotas, two related issues emerged: setting quotas for non-empty directories and dealing with quota changes after directory movements. These problems essentially share a common challenge: how to quickly obtain statistical information for an entire directory tree.
An example of recursive statistics.

An example of recursive statistics.

Option 1: Default Recursion Statistics for Each Directory

This solution is somewhat similar to the nested quotas functionality, but now it needs to add recursive statistics information to every directory. This approach provides the convenience of one-time queries for the entire tree's size. However, the cost of maintenance is high. When modifying any file, it requires updating the recursive statistics for each directory. This can lead to frequent changes, especially closer to the root directories.

JuiceFS uses an optimistic concurrency mechanism for its metadata. This means that conflicts are resolved through retries. Under high-pressure conditions, this can result in severe transaction conflicts, potentially crashing the metadata engine as the cluster scales.

Option 2: No Regular Interventions; Only Perform Temporary Scans When Needed

This is a simple and straightforward approach. However, when dealing with directories containing a large number of files, temporary scans can be time-consuming. It also places a significant load on the metadata engine. Therefore, this solution is not suitable for direct use.

Option 3: Maintain Usage Statistics for Immediate Subdirectories Only; Perform a Full Scan When Needed

The figure below illustrates this solution:
Maintaining usage statistics for immediate subdirectories.

Maintaining usage statistics for immediate subdirectories.

This solution combines the advantages of the previous two solutions while trying to avoid their drawbacks.

There are a couple of observations about file systems: Most metadata requests inherently contain information about the direct parent directory. This eliminates the need for additional operations and avoids any extra transaction conflicts. Typically, in a file system, the number of directories is 2 to 3 orders of magnitude less than the number of regular files.

Given these observations, JuiceFS implemented the directory statistics feature, which maintains usage statistics for only the immediate subdirectories. When recursive statistics are needed for the quota feature, there's no need to traverse all files; it suffices to calculate the usage of all subdirectories. This is the solution JuiceFS ultimately adopted.

Additionally, with the introduction of directory statistics, JuiceFS found some additional benefits. For example, the previously existing juicefs info -r command, used to replace du for calculating the total usage of a specific directory, has significantly improved in speed. Another new command, juicefs summary, provides a quick analysis of specific directory usage. This makes it easier to find the highest-utilized subdirectories through specific sorting.

Quota Repair

As explained earlier, JuiceFS sacrificed a degree of accuracy for stability and reduced performance impact when implementing directory quotas. When a client process crashes, or a directory is frequently moved, there might be slight discrepancies in quota information over time. JuiceFS offers the juicefs check functionality for rescanning the entire directory tree's statistics and comparing it with the stored quota values. If discrepancies are found, the system reports the issues and provides optional repair options.

Data structure Design File system Statistics Directory systems

Published at DZone with permission of Sandy Xu. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Vector Search: The Hottest Topic in Information Retrieval
  • Designing for Security
  • Rate Limiting Strategies for Efficient Traffic Management
  • Understanding the Integration of Embedded Systems in Consumer Electronics

Partner Resources


Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: