Architecture of a MongoDB-powered Event Processing System
We’re live-blogging from MongoSV today. Here’s a link to the entire series of posts.
The actual title of this talk is “There’s a Monster in My Closet”, but I thought the subtitle would be more elucidating. This talk is packed! Actually, all of the talks so far today have been pretty packed - great crowd here. This is a live blog of the presentation by Greg Brockman from Stripe.
Monster is the name of the event processing system Greg built for Stripe. They've been using it in production for a few months now, and it’s built on top of MongoDB. The concept of event processing is that you want to glean some information from lots of real-time events as they happen (incremental stats, real-time analytics, trending topics, etc.). Stripe uses it for fraud detection, dashboards, and more. Now we’re going to get a live demo!
He’s showing a blog-post generator that he’s written, and is going to use Monster to monitor the content of the posts that it’s spitting out. Live coding a “model”, which looks like sort of a unit of reporting. Logging a new event for each sentence that gets generated. Now we need a consumer to actually do something with the events. The consumer gets streamed events and just needs to “do something”. It doesn’t worry about storage, generation, etc. It registers for classes of events and has a `consume()` method. Pretty simple, but flexible. The demo consumer logs when generated sentences are “too long”.
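To make the consumer idea concrete, here is a minimal sketch in Python. It is not Monster's actual API (the demo code wasn't captured here); the `Dispatcher` class, the `event_types` attribute, the event dictionary layout, and the `sentence_generated` event name are all assumptions made for illustration. Only the `consume()` method name comes from the talk.

```python
class Consumer:
    """Base class: subclasses list the event classes they want and implement consume()."""

    event_types = ()  # hypothetical attribute: names of event classes to register for

    def consume(self, event):
        raise NotImplementedError


class LongSentenceConsumer(Consumer):
    """Records generated sentences that exceed a word limit, as in the demo."""

    event_types = ("sentence_generated",)  # hypothetical event name

    def __init__(self, max_words=20):
        self.max_words = max_words
        self.flagged = []

    def consume(self, event):
        # The consumer only reacts to the event; it knows nothing about storage.
        if len(event["text"].split()) > self.max_words:
            self.flagged.append(event["text"])


class Dispatcher:
    """Hypothetical router: hands each event to every consumer registered for its type."""

    def __init__(self):
        self._by_type = {}

    def register(self, consumer):
        for event_type in consumer.event_types:
            self._by_type.setdefault(event_type, []).append(consumer)

    def dispatch(self, event):
        for consumer in self._by_type.get(event["type"], []):
            consumer.consume(event)
```

Because the consumer only sees events handed to it, new consumers can be registered without touching the code that logs events.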
Question: Monster vs celery/beanstalkd/resque? Answer: when using a job queue the act of logging implies an “action”/job. With Monster/event queuing the goal is to totally decouple logging from performing actions on logs. Can add new consumers later, etc. Events persist, not ephemeral.
Consumer uses polling to get new events.
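A rough sketch of what polling over a persistent event log can look like, in Python. Everything here is an assumption for illustration: the in-memory `InMemoryEventLog` stands in for the MongoDB-backed store, and the `seq` checkpoint field and function names are hypothetical. The talk did not show how Monster tracks a consumer's position.

```python
import time


class InMemoryEventLog:
    """Stand-in for the persistent event store; events are never removed."""

    def __init__(self):
        self._events = []

    def log(self, event_type, payload):
        # Logging only records the event; it does not trigger any action.
        event = {"seq": len(self._events) + 1, "type": event_type, "payload": payload}
        self._events.append(event)
        return event["seq"]

    def fetch_after(self, last_seq, limit=100):
        return [e for e in self._events if e["seq"] > last_seq][:limit]


def poll_once(log, handler, last_seq):
    """Hand every event newer than last_seq to handler; return the new checkpoint."""
    for event in log.fetch_after(last_seq):
        handler(event)
        last_seq = event["seq"]
    return last_seq


def poll_forever(log, handler, last_seq=0, interval=1.0):
    """Poll loop: one pass, then sleep. Runs until interrupted."""
    while True:
        last_seq = poll_once(log, handler, last_seq)
        time.sleep(interval)
```

Since events persist, a consumer added later can start from checkpoint 0 and replay everything that was logged before it existed, which a job queue that deletes work once it is done cannot offer.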
Now we’re hearing why they chose MongoDB. Replica sets are a major reason, for HA. They also wanted a document store: easy to use, so developers will all use it. They need atomic operations (talking about things like findAndModify). Seems like a lot of the talks today have been mentioning findAndModify. They like automatic collection creation, from a deployment perspective. No migrations, etc. Finally, background index building is really important for Stripe. Can create new indexes w/o compromising availability.
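For readers who haven't used findAndModify: it matches a document and updates it in a single atomic operation, so two workers can't both grab the same document. One common use is claiming work, sketched below in Python with pymongo's `find_one_and_update`. The talk did not show how Monster uses it, so the `state` and `owner` fields and the `claim_next_event` helper are hypothetical.

```python
def claim_next_event(events, worker_id):
    """Atomically claim the oldest unclaimed event document, or return None.

    `events` is expected to be a pymongo Collection. The field names
    ("state", "owner") are hypothetical, not Monster's actual schema.
    """
    return events.find_one_and_update(
        {"state": "new"},                                    # match an unclaimed event
        {"$set": {"state": "claimed", "owner": worker_id}},  # mark it in the same operation
        sort=[("_id", 1)],                                   # oldest first
    )
```

By default `find_one_and_update` returns the document as it was before the update, so the caller sees `state` as `"new"` on the returned copy even though the stored document is now claimed.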
Tradeoff: no transactions. This is the one thing they’d really like for Monster (mainly for DR). The particular case that they need it for is what they call a Stateful Consumer - can modify the state of an event while consuming it. They basically build transactions at the application layer here.
Like the previous talk, they aren’t using capped collections. They don’t expire old events. They also aren’t using sharding (these are in response to audience questions again). Environment is a 3-node replica set on AWS (large instances). Not using EBS except on one of the secondaries.