MongoDB Christmas optimization

Just add indexes, right? MongoDB actually works at a lower level of abstraction than relational databases, so more fine-tuning is possible on the application side. Whether this is desirable or not is not the point of this article; here are a few techniques we used to increase the throughput of a distributed application based on several instances of this database.

The goal is clear: prepare for the holiday season and avoid being caught with slow queries or write contention in our applications.

The basics: indexes

Do not even start optimizing without explaining your queries first, including the sort part:

db.collection.find(...).sort(...).explain()
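
If explain() shows a BasicCursor, or scanAndOrder is true because the sort cannot be satisfied by an index, a compound index covering both the filter and the sort usually fixes it. A minimal sketch with hypothetical field names (status for the filter, created_at for the sort):

db.collection.ensureIndex({status: 1, created_at: -1})  // hypothetical fields
// the filter matches on status and the sort is satisfied by the index order,
// so explain() should now show a BtreeCursor and scanAndOrder: false
db.collection.find({status: "active"}).sort({created_at: -1}).explain()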

I made the mistake of not explaining first and missed out on some easy wins. Here's a commented output for a query containing a range on the ended_at field and an exact match on the low-cardinality field tags, which only has a few possible values:

OnebipRS:PRIMARY> db.recruiter_monitoring.count()
10240
OnebipRS:PRIMARY> db.recruiter_monitoring.find({ended_at:{$lt:ISODate("2013-12-23T16:34:00.000Z"), $gt:ISODate("2013-12-23T16:33:00.000Z")}, tags:"generic"}).explain()
{
  // BtreeCursor means an index is being used, while a BasicCursor means a full scan is in your way
  "cursor" : "BtreeCursor ended_at_1_tags_1",
  // number of documents returned by the query
  "n" : 65,
  // number of documents that had to be loaded and evaluated
  "nscannedObjects" : 65,
  // number of index entries that had to be evaluated;
  // both these numbers should be much lower than the count of the collection
  "nscanned" : 202,
  // indexOnly will be true if the query can be answered just by using the index,
  // without loading actual documents. I can still improve here
  "indexOnly" : false,
  // how the index is being used
  "indexBounds" : {
    "ended_at" : [
      [
        ISODate("2013-12-23T16:33:00Z"),
        ISODate("2013-12-23T16:34:00Z")
      ]
    ],
    "tags" : [
      [
        "generic",
        "generic"
      ]
    ]
  }
}
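
The cursor name above suggests the collection has a compound index on ended_at and tags. To turn indexOnly to true, the query would also have to project only fields stored in the index; here's a sketch of both steps, assuming the application can live with just those two fields:

// assumption: this is how the ended_at_1_tags_1 index was created
db.recruiter_monitoring.ensureIndex({ended_at: 1, tags: 1})
// excluding _id and projecting only indexed fields lets the index cover the query,
// so explain() should report indexOnly: true
db.recruiter_monitoring.find(
  {ended_at: {$lt: ISODate("2013-12-23T16:34:00.000Z"), $gt: ISODate("2013-12-23T16:33:00.000Z")}, tags: "generic"},
  {_id: 0, ended_at: 1, tags: 1}
).explain()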

Divide et impera

It's almost 2014 and MongoDB still has a database-wide write lock (fortunately not a global lock anymore). This means that if your application has a write-intensive component, you should move its collections into a separate database, so that its writes don't block the rest of the application.
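
To check whether a single database is hogging the write lock, serverStatus() exposes per-database lock statistics (available since MongoDB 2.2):

// the "locks" section is keyed by database name; a database with a
// disproportionate timeLockedMicros.w is a candidate for extraction
db.serverStatus().locks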

If you have encapsulated your choice of database well, making an object read and write from a different database is a matter of injecting a different MongoCollection object:

// the main application keeps reading and writing from my_db
new Application($connection->my_db->collection);
// the write-intensive component gets its own database, and so its own write lock
new HeavyComponent($connection->my_db_component->another_collection);

You'll probably lose the ability to execute cross-database queries, but you shouldn't rely on those between multiple collections anyway. In our case the HeavyComponent was pestering the main database with findAndModify() calls, and moving it to its own database isolated its locks from everything else. Again, if the object is already well-encapsulated, the change is not a big deal.
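
As a sketch of the pattern (collection and field names are hypothetical), this is the kind of job-queue workload that benefits from a dedicated database, since every findAndModify() takes the write lock:

// run against the dedicated my_db_component database:
// each call locks my_db_component for writing, but never my_db
db.another_collection.findAndModify({
  query: {status: "pending"},
  sort: {created_at: 1},                   // oldest job first
  update: {$set: {status: "processing"}},
  new: true                                // return the updated document
})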

(Another possible drawback: since MongoDB preallocates data files for every new database, you'll occupy more disk space. That shouldn't be a problem.)

Capped collections

Capped collections have lots of limitations: you can't remove single documents from them, and you can't update a document if the update grows its size. The oldest documents are automatically removed when room is needed for new ones.

You also can't extend their size: they are basically an automatically rotating log file on steroids. They are also more complex to set up because, unlike ordinary collections, they have to be created explicitly before your code attempts any insert() (so much for schemaless).

Here's a PHP snippet for creating a capped collection:

$db->command(array(
  'create' => 'monitoring_events',
  'capped' => true,
  'autoIndexId' => false,
  'size' => 1024 * 1024, // 1 MB
  'max' => 102400 // number of documents
));

Setting autoIndexId to false avoids creating the index on _id, which is otherwise mandatory. As long as you query this collection with aggregations or scan it in order, you won't need _id, and skipping that index makes insertions and updates faster.
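
For example, a capped collection preserves insertion order, so it can be scanned forwards or backwards with a natural-order sort and no index at all:

// $natural scans need no index on a capped collection
db.monitoring_events.find().sort({$natural: 1})   // oldest events first
db.monitoring_events.find().sort({$natural: -1})  // newest events first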

Conclusions

I hope you have a happy Christmas in which the traffic generated by people on holiday does not hurt the availability of your MongoDB instances (a strange Christmas greeting, I know).
