MongoDB Sharded Clusters [Q+A]
Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.
This blog post is with Jason Terpko and Antonios Giannopoulos, DBAs at ObjectRocket. Their tutorial is MongoDB: Sharded Cluster Tutorial. This tutorial guides you through the many considerations when deploying a sharded cluster. The talk covers the services that make up a sharded cluster, configuration recommendations for these services, shard key selection, use cases, and how to manage data within a sharded cluster. In our conversation, we discussed how using a sharded cluster can benefit your database environment.
Percona: How did you get into database technology? What do you love about it?
Jason: Nowadays, a DBA must also be part-sysadmin and part-developer (and always awesome). Being a DBA gives me the opportunity to deal with the entire stack. I never get bored.
Antonios: I agree with Jason, and I have to add that today there are probably more databases than programming languages. Designing an application often involves using more than one database technology. The challenge is choosing the right ones each time. Honestly, how can you get bored with that?
Percona: You're presenting a session called "MongoDB: Sharded Cluster Tutorial." What is a MongoDB sharded cluster and how is it useful in databases?
Jason: Scaling is one of the biggest challenges in the database world. A database can scale either vertically or horizontally. Vertical scaling means adding resources to a single server to deal with database underperformance, for example doubling the RAM or moving to faster CPUs. Unfortunately, doubling capacity doesn't necessarily mean doubling the performance. There is a breaking point beyond which adding resources no longer improves performance. With horizontal scaling, we distribute the database workload among multiple servers. Each of the instances serves only a portion of the workload. If the database is underperforming, we simply add more servers. It's faster and cheaper compared to vertical scaling, and at the same time, the capacity-to-performance ratio is much higher with horizontal scaling. Sharding is MongoDB's horizontal scaling mechanism.
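As a rough illustration of the idea Jason describes, the Python sketch below routes each key to the shard that owns its range, so every shard serves only part of the workload. The split points, shard names, and key space are hypothetical values chosen for the example. It is not how MongoDB's router is implemented.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key: three split points give four chunks.
SPLIT_POINTS = [250, 500, 750]
# Hypothetical owners: chunk i is served by CHUNK_OWNERS[i].
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard3"]


def route(key: int) -> str:
    """Return the shard that owns the chunk containing this key."""
    return CHUNK_OWNERS[bisect_right(SPLIT_POINTS, key)]


def workload_per_shard(keys):
    """Count how many operations each shard would serve."""
    counts = {shard: 0 for shard in CHUNK_OWNERS}
    for key in keys:
        counts[route(key)] += 1
    return counts


# One operation per key across an evenly used key space.
counts = workload_per_shard(range(1000))
print(counts)
```

With an evenly used key space, each of the four shards in this toy model serves a quarter of the operations.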
Percona: How can a sharded cluster affect MongoDB performance (both negatively and positively)?
Antonios: Sharded clusters can have an immediate positive impact on application performance when the collection has been pre-distributed with a hashed shard key or manual splitting. These approaches allow your application to make use of all shards and resources from the start. For some newly deployed applications, this throughput is a requirement for a successful release.
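To make the point about hashed shard keys concrete, here is a small Python sketch that compares where a stream of ever-increasing keys would land under range placement and under hashed placement. Everything in it is an assumption for illustration: the shard names, the split points, and the use of MD5 modulo the shard count as a stand-in for a hashed shard key. It is not MongoDB's actual chunk mapping.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2", "shard3"]
SPLIT_POINTS = [1000, 2000, 3000]  # hypothetical range chunks on the raw key


def range_shard(key: int) -> str:
    """Place a key by the range chunk that contains its raw value."""
    return SHARDS[bisect_right(SPLIT_POINTS, key)]


def hashed_shard(key: int) -> str:
    """Place a key by a hash of its value (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# New writes with ever-increasing keys, such as a counter.
new_keys = range(4000, 8000)
by_range = Counter(range_shard(k) for k in new_keys)
by_hash = Counter(hashed_shard(k) for k in new_keys)

print("range placement: ", dict(by_range))
print("hashed placement:", dict(by_hash))
```

In this toy model, every new key is larger than the last split point, so range placement sends all writes to a single shard, while hashed placement spreads them across all four from the first write.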
The distribution of data, with its positive impact on performance, can also have a negative effect. Even with an evenly distributed collection, hot spotting can occur. This causes degradation for both targeted and broadcast operations. Also, write operations cause overhead when you need to move or balance this distributed data. This overhead can impact some applications when added to their workload at specific times.
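The hot spotting described above can be sketched in a few lines of Python. The example assumes a hypothetical cluster of four shards, a simple modulo placement as a stand-in for a real shard key, and a made-up workload in which one popular document receives most of the reads.

```python
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def place(key: int) -> str:
    """Stand-in placement: spread keys evenly with a modulo (an assumption)."""
    return SHARDS[key % len(SHARDS)]


docs = range(1000)
data_per_shard = Counter(place(k) for k in docs)

# Hypothetical workload: every document is read once,
# and one popular document (key 42) is read 3000 more times.
reads = list(docs) + [42] * 3000
reads_per_shard = Counter(place(k) for k in reads)

hot_shard, hot_reads = reads_per_shard.most_common(1)[0]
hot_share = hot_reads / len(reads)

print("documents per shard:", dict(data_per_shard))
print("reads per shard:    ", dict(reads_per_shard))
print(f"{hot_shard} serves {hot_share:.0%} of reads")
```

The data is perfectly balanced at 250 documents per shard, yet one shard serves about 81% of the reads, which is the kind of imbalance that an even data distribution alone does not prevent.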
Percona: What are some of the things you need to watch out for when deploying a sharded cluster?
Jason: First of all, there is the shard key selection. Choosing the right shard key makes your application rock (and you might earn the employee of the month award). Selecting a poor shard key may have a catastrophic effect on your business (and get you a much different type of company notice).
Secondly, after sharding a collection, any existing applications must continue to work error-free. Familiarizing yourself with shard key limitations and what operations may not work on a sharded collection is very important. Doing the research beforehand will prevent issues later.
Thirdly, you need to monitor the resources you have deployed your sharded cluster on. Whether they are physical, virtualized, or containerized, all components should have a similar performance profile and reliable communication. For broadcast operations, your operation is only as fast as the slowest shard. And if internal traffic is not reliable, your cluster can be prone to issues.
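The remark about the slowest shard can be expressed as simple arithmetic. The Python sketch below assumes that a broadcast operation queries all shards in parallel and ignores the cost of merging results. The latency figures and shard names are invented for the example.

```python
# Hypothetical per-shard response times in milliseconds.
shard_latency_ms = {
    "shard0": 12.0,
    "shard1": 15.0,
    "shard2": 11.0,
    "shard3": 95.0,  # one shard on slower hardware
}


def targeted_latency(shard: str) -> float:
    """A targeted operation waits only for the shard that owns the data."""
    return shard_latency_ms[shard]


def broadcast_latency(latencies: dict) -> float:
    """A broadcast operation waits for every shard, so the slowest one decides."""
    return max(latencies.values())


print("targeted at shard0:", targeted_latency("shard0"), "ms")
print("broadcast to all:  ", broadcast_latency(shard_latency_ms), "ms")
```

Under these assumptions, a single underperforming shard sets the response time for every broadcast operation, which is why a similar performance profile across components matters.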
Percona: What do you want attendees to take away from your session? Why should they attend?
Antonios: Our attendees will feel like Hamlet: "To shard or not to shard?" At the end of the session, every attendee will be able to set up, maintain, and troubleshoot a sharded cluster. Additionally, they will get their hands dirty in our labs. Don't get me wrong — our slides are great! But sharded cluster mastery requires practice. Finally, we encourage discussion during the tutorial, so please come and raise your hand and ask us sharding-related questions. We would love to learn about your use cases and help you in any way.
Percona: What are you most looking forward to at Percona Live Europe 2017?
Both: The free drinks, of course! But seriously, Percona Live Europe is how community events should be. It's our fourth time attending, and every single time, we meet people who share our passion for databases. They are open for a conversation and everywhere we discover new ways to solve complex problems, new technologies to look at, and innovative ideas to try out. The Percona Live conferences are getting better and better every year, and we are 100% sure that Percona Live Europe 2017 will be a success.
Published at DZone with permission of Dave Avery, DZone MVB. See the original article here.