This was a live-blog from a MongoSV session. Here’s a link to the entire series of posts.
Kyle’s strategy is to start with a normalized representation and then embed for simplicity and optimization. This reminds me of our data-modeling post.
Two strategies; first is to store the canonical hierarchy in a single document. That allows atomic updates of the hierarchy, but can make other ops difficult. The second is to store a list of ancestors in each document; that’s the one we’ll talk about. Each node has a parent_id with the immediate parent. Each also has an array of ancestors. It’s easy to display a single node, but it’s also easy to display all descendants of a node (by looking inside the ancestors array). What about updates?
There’s a new helper to compute ancestors for a node and overwrite the ancestors array (in case we insert in the middle of the hierarchy). When we do an insert in the middle we’ll need to run the helper on each descendant, since they all have a new ancestor.
tl;dr use arrays and multi-key indexes (and the positional operator) to deal with hierarchies.
What about ticketing, don’t you need transactions? How can you do transactions w/o RDBMS style transactions? Distributed systems, long-running transactions, and contentious environments might require different strategies (see paper: your coffee shop doesn’t use two-phase commit). Let’s see an example:
Two collections. One is a map of available seats for an event. The second has one document for each seat, with a state (available or not) and price.
Use find-and-modify as a TAS operation to update the state of seats to “Cart”. If it succeeds all of the seats then we’re good to go. If not, we can manually roll-back state. Basically do each state-change manually and be prepared to roll-back if needed.
This isn’t right for everything but it works for some.
3. Feed reader example
Four collections: users, feeds, entries and buckets.
Users have username and an array of feeds. Each feed in that array is a document with an ID and denormalized name. We’ll index on username.
Feeds collection each has a URL, name, subscriber count (denormalizing here again) and date of last entry.
To add a feed, do an upsert to insert if missing or increment subscriber count otherwise (good use of upsert). Also need to do an $addToSet on the user to add the feed to the feeds array.
Removing means a $pull from the user’s feed array and a $inc of -1 on the feed’s subscriber count.
Entries documents each have content for a single entry, with ID of feed it belongs to.
We need a way to show an individual user’s feed: that query can be expensive if done naively. That’s what bucketing is for: only query for latest entries and then store them in a bucket.
A bucket has a user_id, timestamp, and list of entries. We can just query for buckets to get a users feed. Again, we’re selectively denormalizing here to get good locality and performance. Use rich documents for caching. This gives good sharded locality, too.