Message Durability in ActiveMQ 5.X
Message Durability in ActiveMQ 5.X
An in-depth dive into message persistence in AMQ, a notoriously difficult minefield that is important to get right.
Join the DZone community and get the full member experience.Join For Free
I get asked quite a bit to explain the basics of how ActiveMQ works with respect to how it stores messages (or doesn’t in some cases). Here’s the high-level explanation of it. Note, the context is within JMS. If you use ActiveMQ’s non-JMS clients (e.g., STOMP, AMQP, MQTT, etc.) then the behavior may be different in some cases.
The JMS durability guarantees are pretty strong in terms of not losing messages that are marked “persistent.” Let’s see how that applies for ActiveMQ.
Topics are a broadcast mechanism. They allow us to implement publish-subscribe semantics in JMS land. However, what happens if we mark a message “persistent” and there are no subscribers? In any normal broadcast (e.g., I go downtown and start shouting about the awesomeness of ActiveMQ), if there are no subscribers (it’s 3am and there’s nobody around to hear me…. must’ve been a good night out if I'm out at 3am) then what happens? Nothing. Nobody hears it. And we move on. ActiveMQ doesn’t do anything with the message if you publish it (persistent or not persistent) and there are no subscribers (no live subscribers and no durable subscribers).
ActiveMQ will only store the message if there are durable subscribers (active or inactive). For an inactive durable subscription, ActiveMQ will store messages marked “persistent” into a non-volatile store and wait for a subscriber to rejoin the subscription. At that point it will try to deliver messages.
For queues, ActiveMQ treats “persistent” messages with a simple default protocol. We basically block the main producer thread and wait for confirmation that the broker has actually gotten the message:
- Producer sends message
- Producer blocks, waits for ACK from broker
- Producer continues on if successful ACK
- Retries if NACK or timeout or failover
- receives message
- stores message to disk
- sends back ACK
For “non-persistent” sends, the flow is different. We send in a “fire and forget” mode. The main producer thread does not get blocked and any ACK or other response happens asynchronously on the ActiveMQ Connection Transport thread:
- Producer sends message
- Producer continues on with its thread and does not block
- Producer eventually gets ACK on a separate thread than the main producer thread
We can increase performance of sends to the broker by batching up multiple messages to send at once. This utilizes the network as well as the broker storage more effectively. There’s an important distinction you must be aware of when sending transacted. The opening of the TX session and the closing of it (rollback/commit) are all synchronous interactions with the broker, however, the sends for each individual message during the TX window are all sent asynchronously. This is okay if everything works out because the broker batches these messages up. But what happens if there are transport errors? Or the broker runs out of space to save these messages?
We need to set an ExceptionListener to watch for errors during these sends. We also need to (or should) set a client side sending “producer window” to allow us to enforce producer flow control when the broker runs out of resources. See ActiveMQ producer flow control for more.
Changing the defaults
The interesting settings on the producer that can change these behaviors:
- useAsyncSend - always wait for ACKs asynchronously, even in persistent sends and commits
- alwaysSyncSend – force all sends (non-persistent or transactional sends included) to always wait for ACK from the broker
Using the defaults are generally what folks want.
For production usage of ActiveMQ, I recommend the shared-storage approach at the moment. In this case, we need to be aware of what’s happening at the storage layer to understand ActiveMQ’s guarantees.
ActiveMQ by default will implement JMS durability requirements which basically states messages that get stored must survive crashes. For this, we by default will do a “fsync” on the filesystem. Now what happens on each system will be dependent on what OS, network, storage controller, storage devices, etc you use. This is the same you’d expect for any type of database that needs to persistently store messages and is not ActiveMQ-specific, per se.
When we write to the ActiveMQ transaction journal we need to ask the OperatingSystem to flush the journal to disk with a call to fsync. Basically what happens is we force the OS to write back the page-file cache it uses to cache file changes to the storage medium. It also encourages the storage medium to do what it needs to do (depends on implementation) to “store” the data to disk:
Some storage controllers have their own cache that needs to be flushed. The disk drives have their own caches, etc. Some of these caches are backed by battery and may write-back at their own time intervals, etc. For you to understand the durability of your messages running through ActiveMQ, you should understand the guarantees of your storage layer.
Finally the last piece of the puzzle is how we deliver/dispatch messages to consumers and how they acknowledge. The ActiveMQ JMS libraries handle all of this for you, so you don’t need to worry about whether or not you’re going to lose messages.
Messages get dispatched to consumers up to a certain “prefetch” buffer that lives on the consumer. This helps speed up message processing by having an available cache of messages on the consumer ready to process and then refill this cache as the consumer consumes them. In ActiveMQ these prefetched messages are denoted as “in flight” in the console. At this point it’s up to the consumer to process these messages and ACK them. (this will depend on the ack modes… the default of auto ack will send the ACK as the consumer gets the message.. for more important message processing you may wish to use “client” ack where the client explicitly says when to ack the message, i.e., after it’s completed some processing).
If the consumer fails for some reason, any of the non-ack’d messages will be redelivered to another consumer (if available) and follow the same processing as above. The broker will not remove the message from its indexes until it gets an ACK. So this includes failures at both the consumer and network level. If there are errors at either of these levels even after a consumer as “successfully processed” (note, this is very use-case specific what “successfully processed” means), and the broker does not get the ack, then it’s possible the broker will re-send the message. In this case you could end up with duplicates on the consumer side and will probably want to implement an idempotent consumer. For scaling up messaging producers/consumers, you’ll want to have idempotent consumers in place anyway.
Last thing to note: JMS DOES NOT GUARANTEE ONCE AND ONLY ONCE PROCESSING of a message without the use of XA transactions. JMS guarantees once and only once delivery insofar it can mark messages as being “redelivered” and have the consumer check that, but the consumer is responsible for how many times it should be allowed to process (or filter out with idempotent consumer).
Published at DZone with permission of Christian Posta , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.