Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!
In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.
I had a chance to speak with Anthony and learn a bit more about Vitess:
Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.
Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.
I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!
Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?
Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.
At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.
My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.
Percona: What new features and fixes can people look forward to in the latest release?
Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.
To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.
Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?
Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.
The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.
Percona: Vitess is billed as being for web applications based in the cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?
Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.
Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?
Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.
You can read more about Anthony and Vitess on the Vitess blog.