MongoDB Performance Tuning and Scalability

by Mike Dirolf · Dec. 27, 2011

This post was a live blog from the recent MongoSV conference.  Here’s a link to the entire series of posts.

Kenny is getting started, talking about performance tuning based on his experience at Shutterfly. They have 8 MongoDB clusters in production with ~30 servers, not cloud based: all their own hardware and datacenters.

MongoDB performance tuning is similar to traditional RDBMS tuning: look at queries, indexes, etc. If performance isn’t good on a single server, then don’t look to sharding, reading from replicas, etc.; single-server performance is critical.

Modeling is key. Schema design can be really important for performance (recommends talks later on by Eliot & Kyle).

Know when to stop tuning: prioritize what is important/adequate for the business/application. What needs to be fast? Build tuning into the dev lifecycle; don’t wait until there’s an issue. Tuning is “personal”: you need to know your problem/domain.

MongoDB is really fast when read-only; writes start to impact performance. This is an important consideration during the design phase.

The profiler writes to the db.system.profile collection. The recommendation is to turn it on and leave it on: low overhead. Look for full scans (nreturned vs. nscanned) and updates (ideally you want fastmod, i.e. in-place updates; look for moved and key updates).
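For reference, turning the profiler on from the shell looks roughly like this; the 100 ms slow-op threshold is just an illustrative value, not something recommended in the talk:

```
// Level 1 profiles operations slower than the threshold (in ms); level 2 profiles everything.
// Applies to the current database only.
db.setProfilingLevel(1, 100)

// Profiled operations are written to the capped db.system.profile collection.
db.system.profile.find().sort({ts: -1}).limit(5).pretty()
```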

You should graph response times over time (from the system.profile collection); this shows the performance of the DB over time. To look at the profiling data, just run `show profile` from the shell.
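A couple of shell queries in that spirit, against the profile collection (the numeric thresholds are arbitrary examples):

```
// Pull timestamp and duration for each profiled op so response times can be graphed over time.
db.system.profile.find({}, {ts: 1, millis: 1, op: 1, ns: 1}).sort({ts: 1})

// Ops that scanned far more documents than they returned are index candidates.
db.system.profile.find({nscanned: {$gt: 1000}, nreturned: {$lt: 10}})
```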

Showing examples of data from the profiler: here’s one where nscanned is 10000 and nreturned is 1: we need an index! Another where the document had to be moved because of an update (the keyword “moved” appears in the profile doc). Now an example using $inc: you’ll see “fastmod” in the profile document, and that’s good!
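A counter bumped with $inc is the classic fastmod case; a minimal sketch (the counters collection is made up for illustration):

```
// $inc doesn't grow the document, so the profiler records "fastmod" rather than "moved".
db.counters.update({_id: "pageviews"}, {$inc: {count: 1}})
```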

Now talking about explain(). Use it during development; don’t wait. explain() actually runs the query when you call it. When you find a bad op using the profiler, run explain() on it to get more info: it shows index usage, yields, covered indexes, and nscanned vs. nreturned. Another recommendation: run explain() twice to see the difference once the data is in memory. Showing the difference between a query with and without an index in terms of explain() output.
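A before/after sketch of what that looks like, reusing the db.test / userid example from the covered-index discussion below:

```
// Without an index: explain() shows cursor "BasicCursor" and nscanned much larger than nreturned.
db.test.find({userid: 10}).explain()

db.test.ensureIndex({userid: 1})

// With the index: cursor "BtreeCursor userid_1" and nscanned close to nreturned.
db.test.find({userid: 10}).explain()
```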

Now talking about covered indexes: you need to use a projection that says you don’t need _id: `db.test.find({userid: 10}, {_id: 0, userid: 1})`. When you don’t need _id, it’s possible to answer the query from the index alone.
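With the userid index in place, explain() can confirm the query is covered; indexOnly: true in the output means the answer came from the index alone:

```
// Excluding _id and projecting only indexed fields lets the query be served from the index.
// Look for "indexOnly" : true in the explain() result.
db.test.find({userid: 10}, {_id: 0, userid: 1}).explain()
```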

Architecture tips: split on functional areas first, into different replica set clusters, then worry about sharding those (possibly). Do reads off of slaves when you can, but be sure your app can handle inconsistent reads first. Also use slaves for maintenance (index compaction, etc.), and move reports and backups to slaves too. One mongod instance per machine keeps things simple for introspection.
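In the shell of that era, reads go to the primary unless you explicitly allow secondaries; a minimal sketch (the reports collection and query are hypothetical):

```
// Allow reads on this connection to be served by slaves (secondaries).
// Only do this if the application tolerates slightly stale data.
db.getMongo().setSlaveOk()
db.reports.find({day: "2011-12-06"})
```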

Emphasizing the importance of minimizing writes.

Now we’re talking about data locality. When you’re doing a query, it’s best if the results are as dense as possible (as few blocks on disk as possible). How do you maintain this? Here’s how to see it: finish your query with `.showDiskLoc()` (analogous to `.explain()`), and the results will include a `$diskLoc` field showing where each document lives on disk.
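A quick way to check from the shell:

```
// Each result gains a $diskLoc field ({file, offset}); results that share a file and sit
// at nearby offsets are dense on disk, so the query touches fewer blocks.
db.test.find({userid: 10}).showDiskLoc()
```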

Total performance is a function of write performance. Keep an eye on lock % and queue size: how much the DB is waiting on writes. A trick (for pre-2.0, when data > RAM) is to do a read before the write: spend more time in the read lock rather than the write lock. Tune for fastmods: reduce moves (maybe by pre-padding documents, as in the sketch below). Evaluate indexes for key changes, and drop indexes that aren’t used. Look for places to do inserts instead of updates.
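Pre-padding isn’t a built-in feature; the usual trick is to insert a throwaway filler field sized for the document’s expected growth and immediately $unset it, so the record is allocated with slack. A sketch with made-up collection and field names:

```
// Reserve roughly 1 KB of slack so later growth happens in place instead of moving the document.
var padding = new Array(1024).join("x");
db.events.insert({_id: 1, comments: [], _padding: padding});
db.events.update({_id: 1}, {$unset: {_padding: 1}});
```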

What about scaling reads? They scale easily if writes are tuned. Identify reads that can be performed on slaves. Make sure you have enough RAM for indexes; you can check the mongostat “faults” column for cache misses (page faults). Minimize I/O per query (back to data locality).

Tools: mongostat (look for faults and lock % / queue length), db.currentOp() to see what’s waiting, mtop to get a picture of current session-level information, and iostat to see how much physical I/O is going on. Do load testing before going live. Use MMS (or some other monitoring system).
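For example, db.currentOp() returns an inprog array that can be filtered from the shell for operations stuck behind a lock:

```
// List in-progress operations that are currently waiting for a lock.
db.currentOp().inprog.filter(function (op) {
  return op.waitingForLock;
}).forEach(printjson);
```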

What if you still need more performance after doing all of this tuning? One option is to use SSDs. Shutterfly uses Facebook’s flashcache: a kernel module that caches data on an SSD, originally designed for MySQL/InnoDB. It puts an SSD in front of a disk but exposes them as a single mount point. This only makes sense when you have lots of physical I/O. Shutterfly saw a speedup of 500% with flashcache. A benefit is that you can delay sharding: less complexity.

Source:  http://blog.fiesta.cc/post/13976616772/mongosv-live-blog-performance-tuning-and-scalability

Topics: Database, MongoDB, Scalability

Opinions expressed by DZone contributors are their own.
