
Best Practices for API Rate Limits and Quotas

Rate limits protect your infrastructure, and quotas help you monetize your APIs. Both are key parts of a healthy API strategy.

By Derric Gilling · Feb. 03, 25 · Analysis

Like users of any online service, your API users expect high availability and good performance. This also means one customer should not be able to starve another customer's access to your API. Adding rate limiting is a defensive measure that can protect your API from being overwhelmed with requests and improve general availability.

Similarly, adding quota management keeps customers within their contract terms and obligations, so you're able to monetize your API. This is even more important for Data and GenAI APIs, where the cost of serving an API can be high and part of your COGS (Cost of Goods Sold). Without quota management, a customer could easily use far more resources than their plan allows, even if they stay within your overall server rate limits.

Yet an incorrect implementation can frustrate customers when their requests are rejected unexpectedly. Worse, the rate limiter itself can fail and cause all requests to be rejected. Clients typically see such failures as 429 errors, the status code indicating that too many requests have been sent in a given amount of time.

This guide walks through different types of rate limits and quotas. Then, it walks through ways to set up rate limiting that protects your API without angering customers. 

How Do Rate Limits and Quotas Work

Both quotas and rate limits work by tracking the number of requests each API user makes within a defined time interval and then taking some action when a user exceeds the limit. That action could be rejecting the request with a 429 Too Many Requests status code, sending a warning email, or adding a surcharge, among other things. Just as different metrics are needed to measure different goals, different rate limits are used to achieve different goals.

Rate Limits vs. Quota Management

There are two different types of rate limiting, each with different use cases. Short-term rate limits are focused on protecting servers and infrastructure from being overwhelmed, whereas long-term quotas are focused on managing the cost and monetization of your API’s resources.

Rate Limits

Short-term rate limits look at the number of requests per second or per minute and help "even out" spikes and bursty traffic patterns to offer backend protection. Because short-term rate limits are calculated in real time, there is usually little customer-specific context. Instead, these rate limits may be measured using a simple counter per IP address or API key.
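The simple per-key counter described above can be sketched as a fixed-window limiter. This is a minimal, in-memory illustration rather than a production design; the class name `FixedWindowLimiter` and its parameters are hypothetical, and the `now` argument exists only so the behavior can be checked deterministically.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Counts requests per client key within fixed time windows (in memory)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps client key -> (window index, requests counted in that window).
        self._state: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        """Return True if the request fits within the current window."""
        if now is None:
            now = time.time()
        window_index = int(now // self.window_seconds)
        stored_window, count = self._state.get(client_key, (window_index, 0))
        if stored_window != window_index:
            count = 0  # A new window has started, so the counter resets.
        if count >= self.limit:
            return False
        self._state[client_key] = (window_index, count + 1)
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print(limiter.allow("key-a", now=0.0))   # first request in the window
print(limiter.allow("key-a", now=1.0))   # second request in the window
print(limiter.allow("key-a", now=2.0))   # third request exceeds the limit
```

A fixed window is the simplest option; it allows short bursts at window boundaries, which is one reason short-term limits are treated as approximate.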

Example Use Cases for Rate Limits

  • Protect downstream services from being overloaded by traffic spikes
  • Increase availability and prevent certain DDoS attacks from bringing down your API
  • Provide a time buffer to handle capacity scaling operations
  • Ensure consistent performance for customers and even out load on databases and other dependent services
  • Reduce costs due to uneven utilization of downstream compute and storage capacity

Identifier

Due to their time sensitivity, short-term rate limits need a mechanism to identify different clients without relying heavily on external context. Some rate-limiting mechanisms will use IP addresses, but this can be inaccurate. For example, some customers may call your API from many different servers. A more robust solution may use the API key or the user_id of the customer.
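One way to express that preference order is a small helper that picks the most specific identifier available and falls back to the IP address only as a last resort. This is a sketch under assumptions: the header name `X-API-Key` and the function `client_identifier` are hypothetical, and the headers mapping is assumed to use that exact casing.

```python
from typing import Mapping, Optional


def client_identifier(
    headers: Mapping[str, str],
    user_id: Optional[str],
    remote_ip: str,
) -> str:
    """Pick the most reliable identifier for rate limiting a request."""
    api_key = headers.get("X-API-Key")  # hypothetical header name
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    # Least accurate option: one customer may call from many servers.
    return f"ip:{remote_ip}"


print(client_identifier({"X-API-Key": "abc"}, None, "203.0.113.7"))
print(client_identifier({}, None, "203.0.113.7"))
```

The prefixes keep the three identifier types from colliding when they share one counter store.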

Scope

Short-term rate limits can be scoped either to a single server or to a distributed cluster of instances that share counters through a cache system like Redis. You can also use information within the request, such as the API endpoint, for additional scope. This can be helpful in offering different rate limits for different services depending on their capacity. For example, certain operations, such as launching batch jobs or running complex queries on a database, are costly to serve and can easily overwhelm a service. Short-term rate limits can be imperfect given their real-time nature, which makes them a poor fit for billing and financial terms but great for protecting your backend.
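Endpoint-level scope could look like the sketch below: a lookup table of per-endpoint limits plus a counter key that includes the client, the endpoint, and the time window. The endpoint paths, limit values, and key format are illustrative assumptions. If every instance builds the same key and increments it in a shared cache, the limit applies across the cluster instead of per server.

```python
# Hypothetical per-endpoint limits (requests per window); costly endpoints get less.
ENDPOINT_LIMITS = {
    "/batch-jobs": 5,
    "/reports/query": 20,
}
DEFAULT_LIMIT = 100


def limit_for(endpoint: str) -> int:
    """Return the request limit that applies to an endpoint."""
    return ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)


def counter_key(client_id: str, endpoint: str, now: float, window_seconds: int = 60) -> str:
    """Build a counter key that is identical on every server in the cluster."""
    window_index = int(now // window_seconds)
    return f"ratelimit:{client_id}:{endpoint}:{window_index}"


print(limit_for("/batch-jobs"))
print(counter_key("key:abc", "/batch-jobs", now=125.0))
```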

Quota Management

Unlike short-term rate limits, the goal of quotas is to enforce business terms such as monetizing your APIs and protecting your business from high-cost overruns by customers. They measure customer utilization of your API over longer durations, such as per hour, per day, or per month. Quotas are not designed to prevent a spike from overwhelming your API. Rather, quotas regulate your API’s resources by ensuring a customer stays within their agreed contract terms. Because you may have a variety of different API service tiers, quotas are usually dynamic for each customer, which makes them more complex to handle than short-term rate limiting. 

Besides quota obligations, historical trends in customer behaviors can be used for spam detection and automatically blocking users who may be violating your API’s terms of service (ToS).
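Because quotas depend on each customer's plan, the tracking logic needs that context. The following is a minimal in-memory sketch; the plan names, quota values, and the `MonthlyQuotaTracker` class are hypothetical, and a real system would keep usage in a durable, shared store so the numbers are accurate across servers.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# Hypothetical plans and their monthly request allowances.
PLAN_MONTHLY_QUOTA = {"free": 1_000, "pro": 100_000}


class MonthlyQuotaTracker:
    """Tracks usage per customer per calendar month against the plan quota."""

    def __init__(self, plan_by_customer: Dict[str, str]):
        self._plan_by_customer = plan_by_customer
        # Maps (customer id, "YYYY-MM") -> units used in that month.
        self._usage: Dict[Tuple[str, str], int] = {}

    def record(self, customer_id: str, when: datetime, units: int = 1) -> int:
        """Add usage for the month containing `when`; return the new total."""
        key = (customer_id, when.strftime("%Y-%m"))
        self._usage[key] = self._usage.get(key, 0) + units
        return self._usage[key]

    def remaining(self, customer_id: str, when: datetime) -> int:
        """Units left in the month containing `when` (never negative)."""
        quota = PLAN_MONTHLY_QUOTA[self._plan_by_customer[customer_id]]
        used = self._usage.get((customer_id, when.strftime("%Y-%m")), 0)
        return max(quota - used, 0)


tracker = MonthlyQuotaTracker({"acme": "free"})
january = datetime(2025, 1, 15, tzinfo=timezone.utc)
tracker.record("acme", january, units=400)
print(tracker.remaining("acme", january))
```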

Example Use Cases for Quota Limits

  • Block intentional abuse such as sending spam messages, scraping, or creating fake reviews
  • Reduce unintentional abuse while allowing a customer’s usage to burst if needed
  • Properly monetize your API via metering and usage-based billing
  • Ensure a customer does not consume too many resources or rack up your cloud bill
  • Enforce contract terms of service and prevent "freeloaders"

Identifier

Long-term quotas are almost always calculated at a per-tenant or customer level. IP addresses won’t work for these cases because an IP address can change, or a single customer may be calling your API from multiple servers, circumventing the enforcement.

Scope

Because quotas are usually enforcing the financial and legal terms of a contract, they should be unified across all servers and be accurate. There can’t be any "guesstimation" when it comes to quotas.

How to Implement Rate Limiting

Usually, a gateway server like NGINX or Amazon API Gateway is the ideal spot to integrate rate limiting, as most external requests will be routed through your gateway layer. For short-term rate limit violations, the universal standard is to reject requests with 429 Too Many Requests. Additional information can be added in the response headers or body instructing the client when the throttle will be cleared or when the request can be retried.
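A rejection that tells the client when to retry might be built as shown below. This is a framework-agnostic sketch: `Retry-After` is a standard HTTP header, but the function name `too_many_requests` and the JSON body fields are assumptions chosen for illustration.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(retry_after_seconds: float) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for a throttled request."""
    # Round up so the client never retries before the throttle clears.
    delay = max(1, math.ceil(retry_after_seconds))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(delay),
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",  # hypothetical error code
        "message": f"Too many requests. Retry in {delay} seconds.",
        "retry_after_seconds": delay,
    })
    return 429, headers, body


status, headers, body = too_many_requests(12.2)
print(status, headers["Retry-After"])
```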

How to Implement Quotas

For long-term quota violations, a number of different actions can be taken. You could reject the requests, as with short-term rate limiting, or you could handle the violation in other ways, such as charging an overage fee.
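As an illustration of the overage-fee option, the charge can be computed from the units used beyond what the plan includes. The function name and the pricing figures below are made up for the example; `Decimal` is used so that currency amounts are not subject to floating-point rounding.

```python
from decimal import Decimal


def overage_charge(used_units: int, included_units: int, price_per_extra_unit: Decimal) -> Decimal:
    """Return the fee for usage beyond the units included in the plan."""
    extra_units = max(used_units - included_units, 0)
    return extra_units * price_per_extra_unit


# Hypothetical plan: 100,000 requests included, 0.002 per extra request.
print(overage_charge(120_000, 100_000, Decimal("0.002")))
```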

Informing Customers of Rate Limit and Quota Violations

As with any fault or error condition, you should have active monitoring and alerting to understand when customers are approaching or exceeding their limits or quotas. Your customer success team should proactively reach out to customers who run into these issues and assist them in optimizing their integration. Because manual outreach can be slow and does not scale, you should also have a system in place that automatically informs customers when they hit rate limits, since rejected transactions can cause issues in their applications. Another easy way to keep customers informed of such issues is via behavioral emails.
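An automated notification system needs a rule for deciding when a customer is "approaching" a limit. One possible rule, sketched here, compares usage against a warning threshold; the 80% default and the level names are assumptions, not values from any particular product.

```python
from typing import Optional


def quota_alert_level(used: int, quota: int, warn_ratio: float = 0.8) -> Optional[str]:
    """Classify usage as 'approaching', 'exceeded', or None (no alert)."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    if used >= quota:
        return "exceeded"
    if used >= warn_ratio * quota:
        return "approaching"
    return None


print(quota_alert_level(850, 1_000))
print(quota_alert_level(1_000, 1_000))
```

The returned level could then drive an email template or an alert for the customer success team.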

Rate Limit Remaining Headers

Besides sending emails, it’s also helpful to inform the customer of any rate limit remaining using HTTP response headers. There is an IETF Internet-Draft that specifies the headers RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset.

By adding these headers, developers can easily set up their HTTP clients to retry once the correct time has passed. Otherwise, you may have unnecessary traffic, as a developer won’t know exactly when to retry a rejected request. This can create a bad customer experience.
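The sketch below shows both sides of this exchange: a server helper that fills in the three headers and a client helper that reads them to decide how long to wait. It assumes `RateLimit-Reset` carries the number of seconds until the window resets, as in the draft versions that define these three fields; the function names are hypothetical.

```python
import math
from typing import Dict, Mapping


def rate_limit_headers(limit: int, remaining: int, reset_at: float, now: float) -> Dict[str, str]:
    """Server side: describe the caller's current rate limit state."""
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(remaining, 0)),
        # Seconds until the window resets, rounded up (assumed semantics).
        "RateLimit-Reset": str(max(0, math.ceil(reset_at - now))),
    }


def seconds_until_retry(headers: Mapping[str, str]) -> int:
    """Client side: how long to wait before sending the next request."""
    if int(headers.get("RateLimit-Remaining", "1")) > 0:
        return 0
    return int(headers.get("RateLimit-Reset", "0"))


headers = rate_limit_headers(limit=100, remaining=0, reset_at=130.0, now=100.5)
print(headers)
print(seconds_until_retry(headers))
```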

Rate Limit Implementation Errors

Even a protection mechanism like rate limiting can have errors. For example, a bad network connection to Redis could cause reads of rate limit counters to fail. In such scenarios, it’s important not to reject all requests or lock out users just because your Redis cluster is inaccessible. Your rate-limiting implementation should fail open rather than fail closed, meaning requests are allowed through while the rate limit implementation is faulting.
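Failing open can be as simple as wrapping the limit check so that any error in the limiter lets the request through. In this sketch, `check_limit` stands in for whatever function queries your counter store; it and `allow_request` are hypothetical names.

```python
import logging
from typing import Callable

logger = logging.getLogger("ratelimit")


def allow_request(check_limit: Callable[[str], bool], client_key: str) -> bool:
    """Apply the rate limit, but fail open if the limiter itself errors."""
    try:
        return check_limit(client_key)
    except Exception:
        # The limiter is faulting (for example, the cache is unreachable).
        # Log it for alerting and allow the request rather than reject it.
        logger.exception("Rate limit check failed; failing open for %s", client_key)
        return True


def unreachable_store(client_key: str) -> bool:
    raise ConnectionError("counter store unavailable")


print(allow_request(unreachable_store, "key:abc"))  # allowed despite the failure
```

Logging the failure matters here: a limiter that fails open silently leaves the backend unprotected without anyone noticing.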

This also means rate limiting is not a workaround for poor capacity planning. You should still have sufficient capacity to handle these requests, or design your system to scale to handle a large influx of new requests. This can be done through autoscaling, timeouts, and automatic trips (such as circuit breakers) that enable your API to keep functioning.

Conclusion

Quotas and rate limits are two tools that enable you to better manage and protect your API resources. Yet rate limits differ from quotas in their business use cases, and it’s critical to understand the differences and limitations of each. It’s also important to provide tooling that keeps customers informed of rate limit issues and gives them a way to audit 4xx errors, including 429.


Published at DZone with permission of Derric Gilling. See the original article here.

Opinions expressed by DZone contributors are their own.
