
Limited Conversations With Distributed Systems

Rate limiting is a mechanism used to control consumption over time. This consumption over time is known as the rate.

By Chinomso Ikwuagwu · Nov. 03, 23 · Presentation

By the way, ChatGPT suggested the title: The Art of Balancing Control and Accessibility

Background

Houston Airport had a big problem: passengers complained about how long it took for luggage to arrive at the terminal building after the airplane had landed. The airport invested millions to solve this pain point. They improved the process, hired more people, and introduced new technology, and eventually succeeded in reducing the wait time to 7 minutes. However, passengers still complained. The airport realized that further optimizing the process had reached the point of diminishing returns, so they did something different: they reframed the problem. They discovered that the problem was not the time it took to get the luggage to the terminal building but the time passengers spent waiting for it. The airport decided to park the airplanes further away from the terminal building. Passengers now took longer to reach the terminal building, which reduced the time they spent waiting for luggage, and voila! Complaints dropped drastically.

One lesson from this story is that speed can have unintended consequences, especially when granted to the wrong client or in the wrong context. Therefore, it makes sense to control which traffic accesses our resources, and how much of it. Whenever such control is lacking or ineffective, developer productivity suffers, as engineers spend more time responding to platform incidents (PIs). Existing controls could benefit from an additional dimension, as long as that dimension does not introduce unnecessary complexity or noticeably increase response time. This article explores rate limiting as an additional dimension of controlling access to our resources.

What Is Rate Limiting?

Rate limiting is a mechanism used to control consumption over time. This consumption over time is known as the rate. Hence, the term rate limiting. The goal of a rate-limiting system is to work well when the system is under heavy load. It needs to be built for the worst 1%, not the good 99%.

More Than Limiting

Rate limiting is more than limiting. It can also be used to shape traffic in various ways, for example by smoothing bursts in traffic, which increases the resiliency of the system.

[Figure: bursts in traffic]

Why Limit Rates?

It is easy to design a system that is 95% resilient. However, moving the resiliency dial to 99.99% requires a well-architected system. This is where rate limiting, amongst other resiliency mechanisms, comes in. These mechanisms are gaining traction for the following reasons:

  • Growth often happens during periods of high load.
  • It is increasingly easy to exploit public resources due to the advent of AI and related tools.

With rate limiting, we can achieve the following:

  • Shape traffic.
  • Prevent attacks, e.g., DDoS and brute-force attacks.
  • Prevent resource starvation. Some unusual traffic is caused not by malicious attacks but by bots, software errors, or misconfigurations in some other part of the system.
  • Improve developer productivity.
  • Save costs.

Common Rate-Limiting Algorithms (Optional Section)

Some common rate-limiting algorithms are:

  • Token bucket
  • Leaky bucket
  • Fixed window counter
  • Sliding window counter
  • Sliding window logs

Token Bucket

A token bucket is a container that has a pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added. When a request arrives, we check if there is at least one token left in the bucket. If there is, we take one token out of the bucket, and the request goes through. If the bucket is empty, the request is dropped.
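The description above can be sketched in a few lines of Java. This is a minimal, single-JVM illustration and not the implementation of any of the libraries discussed later; the class name `TokenBucket`, the `refillIntervalNanos` parameter (one token added per interval), and the injectable clock are assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Minimal token bucket sketch: one token is added per refill interval, up to capacity. */
final class TokenBucket {
    private final long capacity;
    private final long refillIntervalNanos;
    private final LongSupplier nanoClock; // injectable so the behavior can be tested deterministically
    private long tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillIntervalNanos, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillIntervalNanos = refillIntervalNanos;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // the bucket starts full
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false (request dropped) if the bucket is empty. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    /** Adds the tokens that have accrued since the last refill, never exceeding capacity. */
    private void refill() {
        long elapsed = nanoClock.getAsLong() - lastRefillNanos;
        long newTokens = elapsed / refillIntervalNanos;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillNanos += newTokens * refillIntervalNanos;
        }
    }
}
```

In an application, the clock would typically be `System::nanoTime`, for example `new TokenBucket(10, 100_000_000L, System::nanoTime)` for a burst of 10 requests refilled at roughly 10 tokens per second. The `synchronized` keyword only protects the state within one JVM.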

Pros

  • Memory efficient.
  • Accommodates burst/spike in traffic. 
  • Easy to implement.

Cons

  • Needs to be adapted for a distributed system by achieving some atomicity when accessing the shared state of buckets.

Leaky Bucket

A leaky bucket is a container that has a pre-defined capacity. Each incoming request is put in the bucket. Requests are taken out of the bucket and processed at a constant rate. If the rate at which requests arrive is greater than the rate at which requests are processed, the bucket will fill up, and further requests will be dropped until there is space in the bucket.
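A leaky bucket can be sketched as a bounded queue that is drained by a separate, constant-rate caller. The class below is an illustrative assumption (the names `LeakyBucket`, `tryAdd`, and `leak` are not taken from any library); the constant outflow rate comes from whoever calls `leak()`, typically a scheduler.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Minimal leaky bucket sketch: a bounded queue drained one request at a time. */
final class LeakyBucket {
    private final int capacity;
    private final Queue<Runnable> queue = new ArrayDeque<>();

    LeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    /** Queues the request, or returns false (request dropped) if the bucket is full. */
    synchronized boolean tryAdd(Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.add(request);
        return true;
    }

    /** Takes the oldest request out of the bucket and processes it; returns false if the bucket is empty. */
    boolean leak() {
        Runnable next;
        synchronized (this) {
            next = queue.poll();
        }
        if (next == null) {
            return false;
        }
        next.run(); // processed outside the lock so new requests can still be queued
        return true;
    }
}
```

To obtain the constant outflow, `leak()` could be invoked from a `ScheduledExecutorService`, for example `scheduler.scheduleAtFixedRate(bucket::leak, 0, 100, TimeUnit.MILLISECONDS)` for about ten requests per second.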

Pros

  • Memory efficient.
  • Suitable for use cases where a stable outflow rate is required.

Cons

  • Does not accommodate bursts/spikes in traffic, as some recent requests may be dropped in such cases.
  • Needs to be adapted for a distributed system by achieving some atomicity when accessing the shared state of buckets.

Rate Limiting at Scale

Desired

Here are some requirements for rate limiting at scale. Rate limiting should:

  • Be very easy to set up.
  • Support dynamic rate limiting on the fly. For example, conditional rate limiting based on both server state (e.g., JVM memory) and request details (e.g., IP address, user agent).
  • Not increase response times noticeably.
  • Support distributed systems. 
  • Be easy to maintain and evolve.

Needs Rate Limiting

Public facing pages, for example:

  • Contact us and other such forms.
  • Pages where users provide inputs that may need to be processed.

Background services which may suffer from traffic bursts:

  • Image upload.
  • Catalog upload.

Heavy lifting services:

  • Order archive download.
  • Other file download services.

Other public-facing pages, for example:

  • Home page.
  • Search page.
  • Article details page.

Rate-Limiting Libraries

The following rate-limiting libraries can do the heavy lifting of rate-limiting. 

  • guava ratelimiter
  • bucket4j
  • resilience4j-ratelimiter
  • Flex rate limiter

Requirement | guava | bucket4j | resilience4j | flex
Easy to set up | Yes | No | Yes | Yes
Dynamic rate limiting on the fly | No | No | No | Yes
Degrades response times | No | No | No | No
Supports distributed systems | No | Yes | No | Yes
Easy to maintain and evolve | Yes | No | Yes | Yes


From the above list of rate limiters, only the Flex rate limiter allows developers to express various conditions for rate limiting. Fluent expression of such conditions would look like this:

 
IF jvm.memory.available < 5G AND user.role = guest THEN 20 requests / second

ELSE IF jvm.memory.available < 3G THEN 10 requests / second

ELSE IF jvm.memory.available < 2G THEN 5 requests / second


Expressive conditions such as those displayed above allow the rate limiter to change the shape of traffic dynamically.
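To make the idea concrete, here is a small sketch of how such conditional rules could be evaluated in plain Java. It is not the Flex rate limiter's API or implementation: `RequestContext`, `ConditionalRates`, and the method names are hypothetical. One assumption deserves attention: read literally as an IF / ELSE IF chain, the last rule above would never fire, because any memory value below 2G is also below 3G. The sketch therefore assumes the intent is that the strictest matching rule wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.Predicate;

/** Hypothetical snapshot of the values that the conditions refer to. */
final class RequestContext {
    final long availableMemoryBytes;
    final String userRole;

    RequestContext(long availableMemoryBytes, String userRole) {
        this.availableMemoryBytes = availableMemoryBytes;
        this.userRole = userRole;
    }
}

/** Holds condition/limit pairs and resolves the limit for a given context. */
final class ConditionalRates {
    static final long GB = 1024L * 1024L * 1024L;

    private static final class Rule {
        final Predicate<RequestContext> condition;
        final int permitsPerSecond;

        Rule(Predicate<RequestContext> condition, int permitsPerSecond) {
            this.condition = condition;
            this.permitsPerSecond = permitsPerSecond;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    ConditionalRates add(Predicate<RequestContext> condition, int permitsPerSecond) {
        rules.add(new Rule(condition, permitsPerSecond));
        return this;
    }

    /** Returns the strictest limit among all matching rules, or empty if no rule matches. */
    OptionalInt permitsPerSecond(RequestContext context) {
        return rules.stream()
                .filter(rule -> rule.condition.test(context))
                .mapToInt(rule -> rule.permitsPerSecond)
                .min();
    }

    /** The three rules from the text, expressed as predicates. */
    static ConditionalRates articleExample() {
        return new ConditionalRates()
                .add(c -> c.availableMemoryBytes < 5 * GB && "guest".equals(c.userRole), 20)
                .add(c -> c.availableMemoryBytes < 3 * GB, 10)
                .add(c -> c.availableMemoryBytes < 2 * GB, 5);
    }
}
```

With these rules, a guest on a server with 4G of available memory is limited to 20 requests per second, while any user on a server with 1G available is limited to 5. When no rule matches, the result is empty and no limit applies.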

Flex Rate Limiter

Flex rate limiter enables engineers to fluently express rate conditions. The rate limiter is then able to dynamically respond to changes in traffic. The flex rate limiter is based on google-guava. However, rate limiting is not locked into the in-built rate limiter. Third-party rate limiters could be used and still enjoy the power and simplicity provided by the Flex rate limiter. 

Example Usage of Flex Rate Limiter

The Flex rate limiter allows limits to be specified using annotations. For example, two permits per second when the user is not logged in AND the JVM available memory is less than 1.7GB.

Java
 
@Controller
@RequestMapping("/api")
class GreetingResource {

    @Rate(permits = 2, condition = "web.request.user.role=GUEST & jvm.memory.available<1.7GB")
    @GetMapping("/smile")
    String smile() {
        return ":)";
    }
}


A More Contrived Example of Flex Rate Limiter

Unlike other rate limiters, the Flex rate limiter allows rate limiting based on complex conditions. For example:

 
IF jvm.memory.available < 5G AND user.role = guest THEN 20 requests / second
  
ELSE IF jvm.memory.available < 3G THEN 10 requests / second

ELSE IF jvm.memory.available < 2G THEN 5 requests / second


The above could be expressed as:
Java
 
@Controller
@RequestMapping("/api")
class GreetingResource {

    @Rate(permits = 20, condition = "jvm.memory.available < 5G & web.request.user.role = GUEST")
    @Rate(permits = 10, condition = "jvm.memory.available < 3G")
    @Rate(permits = 5,  condition = "jvm.memory.available < 2G")
    @GetMapping("/smile")
    public String smile() {
        return ":)";
    }
}


Flex Rate Limiter Is Based on Three Major Pillars

  • Flexibility: Use of annotations as well as a flexible and expressive language for rate conditions.
  • Evolvability: Modular design and non-exposure of implementation details. 
  • Ease of use: Minimum setup.

Flexibility

Multiple rates may be specified per class or method. Rates at the class level apply to all methods in the class. Using multiple rate conditions, as shown below, allows the rate limiter to dynamically change the shape of traffic. As a result, rate limiting is more responsive and easier to maintain, which boosts developer productivity.

Java
 
@Rate(10) // 10 permits per second for all methods in this class
@Controller
@RequestMapping("/api/v1")
public class GreetingResource {

    @Rate(permits=1, condition="web.request.user.role=GUEST")
    @Rate(permits=5, condition="web.request.user.role=USER")
    @GetMapping("/smile")
    public String smile() {
        return ":)";
    }

    @Rate(permits=5, timeUnit=TimeUnit.MINUTES, condition="sys.memory.available<1gb")
    @Rate(permits=2, condition="web.request.parameter={viewOptions$#/profile/}")
    @GetMapping("/greet")
    public String greet(@RequestParam("who") String who) {
        return "Hello " + who;
    }
}


Composite rates could be built and re-used multiple times. For example, the conditional limit below (i.e., LimitIfNotGermany) may be used multiple times on different classes/methods. 

Java
 
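import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// @Rate and @RateGroup come from the Flex rate limiter annotation module.
@Rate(condition = "web.request.locale != [de_DE|de]", permits = 5)
@RateGroup("not-germany")
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@interface LimitIfNotGermany{ }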

@Controller
@RequestMapping("/api")
class GreetingResource {

    @LimitIfNotGermany
    @GetMapping("/smile")
    String smile() {
        return ":)";
    }
}


The flexibility offered by the Flex rate limiter is made possible by its Rate Condition Expression Language, which supports conditions like web.request.cookie=<cookie-name>  using the following tokens and operators:

Tokens

  • web.request: attribute, auth.scheme, cookie, header, locale, parameter, remote.address, uri, user.principal, user.role
  • web.session: id
  • jvm.memory: available, free, max, total, used
  • jvm.thread.count: daemon, deadlocked, deadlocked.monitor, peak, started
  • jvm.thread.current: count.blocked, count.waited, state, suspended, time.blocked, time.cpu, time.user, time.waited
  • sys.environment
  • sys.property
  • sys.time: current, elapsed

Operators

=     EQUALS
>     GREATER
>=    GREATER_OR_EQUALS
<     LESS
<=    LESS_OR_EQUALS
%     LIKE
^     STARTS_WITH
$     ENDS_WITH
!     NOT (negates other operators, e.g., != or !%)

Evolvability

Flex rate limiter has a modular design, which includes the following modules:

  • rate-limiter: Core module. Inspired by guava rate-limiter. Adapted for distributed systems.
  • rate-limiter-annotation: Annotation module. Built on the core module to support annotations. 
  • rate-limiter-web-core: Web module. Built on the annotation module to support Java web-based systems.
  • rate-limiter-spring: Spring module. Built on web module, based on Spring framework. 
  • rate-limiter-javaee: Javaee module. Built on web module, based on javaee specs.

Rate limiting is not locked into the in-built rate limiter. Third-party rate limiters could be used and still enjoy the power and simplicity provided by annotations and Rate Condition Expression Language.

In addition, to prevent tight coupling, the core modules (i.e., rate-limiter, rate-limiter-annotation, and rate-limiter-web-core) do not expose implementation details.

Ease of Use

Here is how a Spring Boot application could set up rate limiting using the Flex rate limiter.

Java
@SpringBootApplication
@EnableConfigurationProperties(MyApp.MyRateLimitProperties.class)
public class MyApp {

    public static void main(String[] args) {
        SpringApplication.run(MyApp.class, args);
    }

    @ConfigurationProperties(prefix = "rate-limiter", ignoreUnknownFields = false)
    // Declared static so that Spring can instantiate it without an enclosing MyApp instance.
    public static class MyRateLimitProperties extends RateLimitPropertiesSpring { }
    
    @Component 
    public static class MyAppFilter extends ResourceLimitingFilter {
        public MyAppFilter(RateLimitProperties properties) {
            super(properties);
        }
        @Override 
        protected void onLimitExceeded(
                HttpServletRequest request, HttpServletResponse response, FilterChain chain)
                throws IOException { // sendError declares the checked java.io.IOException
            response.sendError(429, "Too many requests");
        }
    }
}


Here are example rate-limit properties:

YAML
rate-limiter:
  resource-packages: com.myapplication.web.rest
  rate-limit-configs:
    task_queue: # Accept only 2 tasks per second 
      permits: 2
      duration: PT1S
    video_download: # Cap streaming of video to 5kb per second
      permits: 5000
      duration: PT1S
    com.myapplication.web.rest.MyResource: # Limit requests to this resource to 10 per minute
      permits: 10
      duration: PT1M 


Putting It All Together

Bot control mechanisms and CAPTCHA are often used to protect resources. This section re-imagines such systems with rate limiting introduced. The aim of introducing rate limiting in general, and a dynamic rate limiter in particular, is to stay ahead of the curve. Instead of spending valuable hours firefighting, developers can focus on what they love doing.

Staying Ahead of the Curve

Today, bots have a variety of tricks up their sleeves, including using multiple user agents, IP addresses, rate-limiting detection, etc. Rate limiting detection often involves sending packets as fast as possible for long enough to trigger rate limiting. Thereafter, requests are sent just under the limit to evade detection as a bot. To counter these tricks, we could use dynamic rate limiting provided by the Flex rate limiter as well as a bot trap.

Dynamic Rate Limiting

Dynamic rate limiting involves triggering rate limiting conditionally. Conditions like the client's IP address are not effective on their own, because bots can use multiple IP addresses. The Flex rate limiter allows rate limiting based on conditions that the client is not privy to, for example, JVM memory state. This makes it harder for the client to detect rate limiting, because the condition for rate limiting changes based on factors outside the client's control.

Bot-Trap

A bot trap is a link whose text is hidden from human vision, so that only bots click or follow it. The text could be hidden by giving it the same color as the web page's background. Any user who follows the link, which is invisible to humans, is marked as a bot.

Custom Solutions per Use Case

Public Facing Pages

A robust solution would involve dynamic rate limiting to evade rate limiting detection. All requests which exceed the limit (probably bots) are redirected to a CAPTCHA page. The CAPTCHA page contains the bot trap at the very top. Bots click the trap without even attempting the CAPTCHA challenge. This means humans may not need to solve the challenge. Anyone who does not click the bot trap is probably human and is redirected to the desired resource. 

  • Home page.
  • Search page.
  • Article details page.

Background Services That May Suffer From Traffic Bursts

These services call for rate limiting with traffic smoothing. In this case, requests are not dropped but delayed, depending on various conditions. This acts like a queue that is privy to the server's memory/responsiveness state.

  • Catalog upload.
  • Order upload.
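The traffic smoothing described above, where requests are delayed rather than dropped, can be sketched as a limiter that hands each request the next free time slot. The class `SmoothingLimiter` and its `reserveDelayNanos` method are hypothetical names chosen for this illustration; Guava's `RateLimiter.acquire()` offers comparable blocking behavior out of the box.

```java
import java.util.function.LongSupplier;

/** Smoothing sketch: requests are spaced at least one interval apart instead of being dropped. */
final class SmoothingLimiter {
    private final long intervalNanos;
    private final LongSupplier nanoClock; // injectable so the behavior can be tested deterministically
    private long nextFreeSlotNanos;

    SmoothingLimiter(long intervalNanos, LongSupplier nanoClock) {
        this.intervalNanos = intervalNanos;
        this.nanoClock = nanoClock;
        this.nextFreeSlotNanos = nanoClock.getAsLong();
    }

    /** Reserves the next free slot and returns how long the caller should wait before proceeding. */
    synchronized long reserveDelayNanos() {
        long now = nanoClock.getAsLong();
        long slot = Math.max(now, nextFreeSlotNanos);
        nextFreeSlotNanos = slot + intervalNanos;
        return slot - now;
    }
}
```

A caller would sleep or schedule the work for the returned delay. A burst of uploads is thus stretched out over time, and an idle period resets the delay to zero. Making the interval depend on server state (for example, widening it when available memory is low) would be one way to add the conditional behavior mentioned above; that part is left out of the sketch.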

Heavy Lifting Services

Plain old vanilla rate limiting. The user gets a limit-exceeded response when there are too many requests. This should not surprise the user, as the requested resource in such cases is large.

  • Order archive download.
  • Other file download services.

Architecture

[Figure: architecture pattern]

Comparison

Property | Central control | Sidecar | Remarks
Can access state of target server (e.g., JVM memory) | No | Yes | Accessing the state of the target server allows for conditional rate limiting based on important metrics like JVM memory.
Rate limiter may serve applications written in other languages | Yes | No |
Rate limiter may be scaled independently of the application | Yes | No |
Ease of implementation and maintenance | No | Yes | Once a filter is set up, any application that imports that filter inherits automatic rate limiting. All that is needed is to add annotations and/or properties specifying rates and conditions for limiting resources.
Low latency | Yes | No |


Whereas central control is the more advantageous of the two patterns compared above, its implementation requires the setup of a control plane. A control plane is not trivial to set up. On the other hand, the possible latency issue of the "quasi" sidecar pattern (due to the shared cache) could be mitigated by asynchronous-eventual-rate-limiting.

Improving Latency

Asynchronous-eventual-rate-limiting means the following:

  • The call to the shared rate limit cache is made asynchronously. This way, requests are not blocked by rate limiting, the very process that was meant to keep the system responsive.
  • A major implication of the asynchronous call is that rate-of-use data will not be strongly consistent but rather eventually consistent. 
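The two points above can be sketched as follows. This is an illustration under stated assumptions, not the Flex rate limiter's implementation: `SharedCounterStore` is a hypothetical in-memory stand-in for the shared cache, `AsyncEventualLimiter` is a made-up name, and resetting counts at the end of each time window is omitted for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the shared rate limit cache (in practice a remote store). */
final class SharedCounterStore {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    long incrementAndGet(String key) {
        return counts.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

/** Decides from locally known state and updates the shared count off the request path. */
final class AsyncEventualLimiter {
    private final SharedCounterStore store;
    private final long limit;
    private final Executor executor;
    private final ConcurrentHashMap<String, Boolean> limitReached = new ConcurrentHashMap<>();

    AsyncEventualLimiter(SharedCounterStore store, long limit, Executor executor) {
        this.store = store;
        this.limit = limit;
        this.executor = executor;
    }

    boolean tryAcquire(String key) {
        // The decision uses only the last known state, so the caller never waits on the shared cache.
        boolean allowed = !limitReached.getOrDefault(key, false);
        // The shared count is updated asynchronously; its result affects later requests only.
        CompletableFuture.runAsync(() -> {
            long count = store.incrementAndGet(key);
            limitReached.put(key, count >= limit);
        }, executor);
        return allowed;
    }
}
```

Because the count is updated after the decision is made, a few requests beyond the limit may be let through while updates are still in flight. That is the eventual consistency trade-off named in the second point: enforcement is slightly less strict in exchange for not blocking requests on the shared cache.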

Conclusion

It is easy to design a system that is 95% resilient. However, moving the resiliency dial to 99.99% requires a well-architected system. There are various resiliency mechanisms that control and shape traffic. Whenever such control is lacking or ineffective, developer productivity suffers, as engineers spend more time responding to platform incidents. Existing controls could benefit from rate limiting, as long as rate limiting does not introduce unnecessary complexity or noticeably increase response time. Accordingly, the Flex rate limiter was presented as a suitable option for improving the resilience of distributed systems. Using the Flex rate limiter and the "quasi" sidecar pattern, rate limiting could be easily set up to protect vulnerable resources. In addition, asynchronous-eventual-rate-limiting could be used to ensure low latency when rate limiting distributed systems with a shared cache.

Service Unavailable

Let us drink to the day when this kind of response will be no more, knowing that day may never come, and we may forever be hungover.

References

  • Flex rate limiter
  • Rate condition expression language: web specification and core specification
  • Systems Design