This is a live blog from MongoSV. Here’s a link to the entire series of posts.
About.me uses MongoDB for several pieces of its infrastructure, but this talk covers only queuing.
They originally ran a 3-node RabbitMQ cluster without disk persistence and had trouble diagnosing issues at scale. They looked at some other AMQP options but settled on MongoDB.
Benefits: async ops, per-message (document) atomicity, batch processing, periodic processing, durability, sharding, and operational familiarity (n.b. that would be the big one for me!). One drawback: the AMQP push model has to be emulated by polling MongoDB. For topic matching, they use a regex. One thing they don't (can't) do with MongoDB: fanout.
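The talk only says a regex is used for topic matching, not how topic patterns are written. As a minimal sketch, assuming AMQP-style dotted routing keys, the hypothetical helper `topic_to_regex` below turns a binding key into an anchored regex: `*` matches exactly one word, and `#` (zero or more words) is supported only as the final word to keep the sketch short.

```python
import re


def topic_to_regex(binding_key: str) -> str:
    """Translate an AMQP-style binding key into an anchored regex (sketch).

    '*' matches exactly one dot-separated word anywhere in the key.
    '#' matches zero or more words, but only as the last word here.
    """
    words = binding_key.split(".")
    pattern = ""
    for i, word in enumerate(words):
        if word == "#":
            if i != len(words) - 1:
                raise ValueError("'#' is only supported as the last word")
            # Zero or more further words; each repetition brings its own dot,
            # so "logs.#" also matches plain "logs".
            pattern += r"(?:\.[^.]+)*" if i > 0 else r"(?:[^.]+(?:\.[^.]+)*)?"
            continue
        if i > 0:
            pattern += r"\."
        # '*' is one word of non-dot characters; anything else is literal.
        pattern += r"[^.]+" if word == "*" else re.escape(word)
    return "^" + pattern + "$"
```

With pymongo, the result could be used in a filter such as `{"queue": {"$regex": topic_to_regex("logs.#")}}`. When the key starts with a literal word, the regex begins with `^` plus a fixed prefix, which lets MongoDB use an index on `queue` to narrow the scan.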
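Why not use a capped collection? Capped collections perform better, but they can't be sharded (so they're limited to a single node) and are limited to strict FIFO ordering. They use an uncapped collection instead, which can be sharded; ordering is roughly FIFO but not strict.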
Each message is a document with a queue field (a string ID) and a payload (serialized data). To create a message, just insert the document.
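A minimal sketch of message creation with pymongo, assuming JSON as the payload serialization (the talk only says "serialized data"). `make_message` and `publish` are hypothetical helper names; `collection` is expected to be a pymongo `Collection`, though anything with a compatible `insert_one` method works.

```python
import json
from typing import Any, Dict


def make_message(queue: str, payload: Any) -> Dict[str, Any]:
    """Build a message document: a queue ID plus a serialized payload.

    JSON is an assumption; any serialization format would do. MongoDB adds
    an ObjectId `_id` on insert, which consumers can later sort by.
    """
    return {"queue": queue, "payload": json.dumps(payload)}


def publish(collection, queue: str, payload: Any):
    """Enqueue a message. Creating a message is just an insert.

    Returns the inserted document's _id (pymongo's InsertOneResult.inserted_id).
    """
    return collection.insert_one(make_message(queue, payload)).inserted_id
```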
To consume a message, they use findAndModify to atomically grab and remove a document. They index on (queue, _id).
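Consumption can be sketched with pymongo's `find_one_and_delete`, which issues a findAndModify with `remove: true`. Sorting on `_id` is an assumption about how the next message is chosen; because ObjectIds begin with a timestamp, this gives roughly insertion order, in line with the semi-FIFO behavior described above. `ensure_queue_index` and `consume` are hypothetical names.

```python
from typing import Any, Dict, Optional

ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_queue_index(collection) -> None:
    """Create the compound (queue, _id) index so each queue's oldest message is cheap to find."""
    collection.create_index([("queue", ASCENDING), ("_id", ASCENDING)])


def consume(collection, queue: str) -> Optional[Dict[str, Any]]:
    """Atomically grab and remove the (approximately) oldest message in `queue`.

    find_one_and_delete runs findAndModify with remove=true, so two consumers
    can never receive the same document. Returns None when the queue is empty.
    """
    return collection.find_one_and_delete(
        {"queue": queue}, sort=[("_id", ASCENDING)]
    )
```

Against a real server this might look like `coll = MongoClient().mydb.messages` (database and collection names are placeholders), then `ensure_queue_index(coll)` once and `consume(coll, "emails")` per message.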
That’s pretty much it! This would be simple to implement in any language (he’s showing examples in the shell and in Python).
In their benchmarks, MongoDB outperformed RabbitMQ at message creation by 19% (FYI, this was a single-node benchmark on a laptop). For consumption, MongoDB again did very well, outperforming RabbitMQ at several levels of concurrency.
findAndModify is blocking, so you’ll see a high lock percentage with lots of concurrent consumers.
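The trade-off can be illustrated without a server. The stdlib sketch below is an analogy, not MongoDB’s actual locking: a single lock makes "find the oldest item and remove it" atomic, so concurrent consumers never receive the same message, but every consumer has to wait on that same lock. `LockedQueue` and `drain` are hypothetical names.

```python
import threading
from collections import deque
from typing import Any, Iterable, List, Optional


class LockedQueue:
    """In-memory analogy of findAndModify with remove: find + delete under one lock."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = deque(items)
        self._lock = threading.Lock()

    def pop_oldest(self) -> Optional[Any]:
        # Reading and removing happen while holding the lock, so no two
        # consumers can get the same item. The same lock is also where all
        # consumers wait on each other, which is the contention cost.
        with self._lock:
            return self._items.popleft() if self._items else None


def drain(queue: LockedQueue, consumers: int) -> List[Any]:
    """Run `consumers` threads until the queue is empty; return everything received."""
    received: List[Any] = []
    received_lock = threading.Lock()

    def worker() -> None:
        while True:
            item = queue.pop_oldest()
            if item is None:
                return
            with received_lock:
                received.append(item)

    threads = [threading.Thread(target=worker) for _ in range(consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Splitting `pop_oldest` into an unlocked "peek" followed by a separate "remove" is what would allow duplicate deliveries; the atomic version trades that risk for serialized access.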
Pros and Cons
Pros: familiar, sharding, durability/persistence, low operational overhead, optional use of advanced queries.
Cons: not AMQP, needs to poll, performance depends on polling frequency and concurrency, fewer libraries available (for Python there’s a library called Kombu), and locking for findAndModify.
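Because consumers have to poll, latency and load both depend on the polling interval. One common way to balance them is exponential backoff on empty polls; this is an assumed technique, not something described in the talk. `pop` is any zero-argument callable returning the next message or `None` (for example, a wrapper around a findAndModify call), and the function name, parameters, and defaults below are illustrative.

```python
import time
from typing import Any, Callable, Optional


def poll(pop: Callable[[], Optional[Any]],
         handle: Callable[[Any], None],
         min_delay: float = 0.05,
         max_delay: float = 2.0,
         max_idle_polls: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Emulate push delivery by polling; return the number of messages handled.

    After each empty poll the delay doubles, capped at max_delay; any message
    resets it to min_delay. The loop stops after max_idle_polls consecutive
    empty polls, or runs forever if max_idle_polls is None.
    """
    delay = min_delay
    idle = 0
    handled = 0
    while max_idle_polls is None or idle < max_idle_polls:
        msg = pop()
        if msg is None:
            idle += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)
            continue
        handle(msg)
        handled += 1
        idle = 0
        delay = min_delay  # busy queue: poll again quickly
    return handled
```

A lower `min_delay` cuts latency when messages are flowing, while the cap keeps idle consumers from hammering the server.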