
Apache RocketMQ: Lessons Learned on How to Ensure Stable Capacity


See how to ensure stable capacity with RocketMQ.


In a previous article, we talked about how Apache RocketMQ was fine-tuned to eliminate latency-related bottlenecks.

Remember Little’s law?

[Figure: Little's law, L = λW]
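
As a quick refresher, and using the standard queueing-theory notation rather than anything RocketMQ-specific: L = λW, where L is the average number of requests in the system, λ is the average arrival rate, and W is the average time a request spends in the system. The practical consequence is that if latency (W) spikes while the arrival rate (λ) stays the same, the number of in-flight requests (L) grows in proportion, and that backlog is exactly what threatens capacity.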


Not surprisingly, there are exceptional situations where performance fluctuates. In those situations, how do we maintain stable capacity? When talking about solutions, we must understand the urgency: if not dealt with immediately, these emergencies can cause a cascading failure of the whole cluster. The solutions come down to three well-known approaches: downgrade, traffic shaping, and the circuit breaker.

Downgrade

Downgrading means the system acknowledges that there is a problem and adapts accordingly. It is really the laziest solution of the three: it simply drops certain messages. Which messages get dropped depends on the QoS configuration and on analysis of the user data; as a result, topics deemed "less important" are closed.
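
As a minimal sketch of the idea, assuming a hypothetical broker-side filter and a precomputed set of low-priority topics (none of this mirrors RocketMQ's actual classes):

import java.util.Set;

// Hypothetical downgrade filter: while the broker is in degraded mode,
// messages for topics ranked "less important" are simply dropped.
public class DowngradeFilter {

    private final Set<String> lowPriorityTopics;   // derived from QoS config and user-data analysis
    private volatile boolean degraded = false;     // flipped when the system detects trouble

    public DowngradeFilter(Set<String> lowPriorityTopics) {
        this.lowPriorityTopics = lowPriorityTopics;
    }

    public void enterDegradedMode() { degraded = true; }
    public void leaveDegradedMode() { degraded = false; }

    // Returns true if the message should be accepted, false if it should be dropped.
    public boolean accept(String topic) {
        if (!degraded) {
            return true;                            // normal operation: accept everything
        }
        return !lowPriorityTopics.contains(topic);  // degraded: shed low-priority topics
    }
}

A send path would consult accept(topic) before enqueueing a message and record how many were shed, so the downgrade stays both bounded and observable.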

Traffic Shaping

There are two classic models for shaping traffic: the leaky bucket and the token bucket. The leaky bucket drains at a steady rate, and as long as it drains it has room to accept more water; if water arrives faster than it leaks out, the bucket overflows and the excess is discarded.

[Figure: the leaky bucket model]
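
To make the model concrete, here is a toy leaky bucket in Java; the class and its parameters are invented for this illustration:

// Toy leaky bucket: work "pours in" with each request and drains at a fixed
// rate; a request that would overfill the bucket is rejected (the overflow).
public class LeakyBucket {

    private final long capacity;         // maximum amount of queued work
    private final double leakPerMilli;   // drain rate, in units of work per millisecond
    private double water = 0;            // current level
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakPerMilli) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerMilli;
    }

    public synchronized boolean tryAccept(int amount) {
        long now = System.currentTimeMillis();
        // Drain whatever has leaked out since the last call.
        water = Math.max(0, water - (now - lastLeakTime) * leakPerMilli);
        lastLeakTime = now;
        if (water + amount > capacity) {
            return false;                // bucket would overflow: reject the request
        }
        water += amount;
        return true;
    }
}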

The token bucket requires every request to acquire a token before it can proceed. Tokens are replenished at a constant rate, so when requests arrive and find no tokens available, that is the signal to cut off the traffic.

[Figure: the token bucket model]
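
The token-bucket counterpart, again only a sketch of the textbook algorithm:

// Toy token bucket: tokens are replenished at a constant rate up to a burst
// limit; each request must take a token, and a request that finds none is cut off.
public class TokenBucket {

    private final long maxTokens;         // burst capacity
    private final double tokensPerMilli;  // refill rate
    private double tokens;
    private long lastRefillTime = System.currentTimeMillis();

    public TokenBucket(long maxTokens, double tokensPerMilli) {
        this.maxTokens = maxTokens;
        this.tokensPerMilli = tokensPerMilli;
        this.tokens = maxTokens;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        tokens = Math.min(maxTokens, tokens + (now - lastRefillTime) * tokensPerMilli);
        lastRefillTime = now;
        if (tokens < 1) {
            return false;                 // no token available: throttle
        }
        tokens -= 1;
        return true;
    }
}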

Both models are used in real software: for example, Guava provides a RateLimiter, and Netty ships traffic-shaping handlers such as ChannelTrafficShapingHandler.
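
For instance, Guava's RateLimiter can be dropped in with a few lines; the permit rate and timeout below are arbitrary values chosen for the example:

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class RateLimiterExample {

    // Allow roughly 1,000 permits per second across this producer.
    private static final RateLimiter LIMITER = RateLimiter.create(1000.0);

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // tryAcquire with a short timeout keeps callers from blocking for long;
            // a false return is the cue to shed or delay the request.
            if (LIMITER.tryAcquire(10, TimeUnit.MILLISECONDS)) {
                System.out.println("request " + i + " admitted");
            } else {
                System.out.println("request " + i + " throttled");
            }
        }
    }
}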

RocketMQ tackles this problem by categorizing traffic. Requests identified as frequent and repetitive are put on a fail-fast track, which means they are terminated as soon as their latency exceeds a threshold. For infrequent or large requests, mechanisms such as a sliding window are applied to adjust the frequency and limit the impact, as sketched below.
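
As a rough illustration of the fail-fast track (a simplified sketch, not RocketMQ's actual broker code), a periodic sweep can evict requests that have already waited too long in the send queue:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical fail-fast sweep: requests that have sat in the queue longer than
// maxWaitMillis are answered immediately with a "system busy" style error
// instead of being allowed to pile up and push latency even higher.
public class FailFastSweeper {

    static class PendingRequest {
        final long enqueueTime = System.currentTimeMillis();
        void rejectAsBusy() { /* respond to the client with a busy error */ }
    }

    private final BlockingQueue<PendingRequest> sendQueue = new LinkedBlockingQueue<>();
    private final long maxWaitMillis;

    public FailFastSweeper(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    public void submit(PendingRequest request) {
        sendQueue.offer(request);
    }

    // Called periodically by a single sweeper thread: fail anything past the threshold.
    public void sweep() {
        PendingRequest head;
        while ((head = sendQueue.peek()) != null
                && System.currentTimeMillis() - head.enqueueTime > maxWaitMillis) {
            PendingRequest expired = sendQueue.poll();
            if (expired != null) {
                expired.rejectAsBusy();   // fail fast rather than let latency cascade
            }
        }
    }
}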

Circuit Breaker

Many Java developers are familiar with Netflix's Hystrix. For those who are not: circuit breaking is a mechanism that rejects or times out requests in order to protect the cluster. Let's take a look at the following example:

[Figure: a request fan-out in which "Dependency I" becomes the bottleneck]

In the figure, "Dependency I" is becoming the bottleneck. If left unattended, more and more requests will hang there and eventually bring down the whole system. A circuit breaker is therefore introduced to reject these requests:

[Figure: the circuit breaker rejecting requests to the failing dependency]

However, many products like Hystrix need 30 seconds or so to determine that something is a bottleneck, especially when the bottleneck involves system resources. That is simply too long for a message queue. RocketMQ instead identifies hardware-level bottlenecks quickly and responds within milliseconds.
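
To make the contrast concrete, here is a toy breaker keyed to a single hardware-level signal, the latency of the most recent storage flush; the signal and threshold are invented for illustration and do not mirror RocketMQ's internals:

// Toy fast circuit breaker: if the last observed flush latency exceeds the
// threshold, incoming requests are rejected immediately; once it drops back,
// traffic is admitted again. No 30-second statistics window is required.
public class FastCircuitBreaker {

    private final long latencyThresholdMillis;
    private volatile long lastFlushLatencyMillis = 0;   // updated by the storage layer

    public FastCircuitBreaker(long latencyThresholdMillis) {
        this.latencyThresholdMillis = latencyThresholdMillis;
    }

    // The storage layer reports each flush/write latency as it is observed.
    public void recordFlushLatency(long millis) {
        lastFlushLatencyMillis = millis;
    }

    // Checked on the hot path for every incoming request.
    public boolean allowRequest() {
        return lastFlushLatencyMillis <= latencyThresholdMillis;
    }
}

The broker would call allowRequest() before accepting each message and return a busy response whenever it comes back false.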

Conclusion

The stable capacity of a message queue is a critical feature, and RocketMQ makes sure that aspect works well. Now, many of us might wonder: how do these three mechanisms work together? Well, like any design, each mechanism targets certain use cases and leaves others out. We use all three to make sure no use case is forgotten.

Topics: message queue, performance tuning
