This is the second part of the partial transcription of my discussion with Adrian Cockcroft, cloud luminary and previous Chief Architect at Netflix. In this installment we discuss tooling and PaaS, containerization, and private vs. public cloud.
Microservices and PaaS
John This leads to my next topic, near and dear to my heart: the tooling around microservices. What does PaaS need in order to be successful with microservices?
Adrian PaaS has been evolving. Initially PaaS was "give us some code and we'll give you back a running service," like Heroku or Google App Engine. They owned everything so it was a straitjacket platform. Nothing to worry about, you just check in your code and you're done.
What Netflix built was at the other extreme: all the components you need to build a loosely coupled platform, with very little straitjacket. It's more a platform for delivering at scale. It had some large-scale high-availability goals, it had to be very distributed, and it solved some very hard distributed systems problems by architecting them away with big rules rather than local constraints.
If you design something to be small scale and fairly consistent you soon discover it's really hard to make it large scale and eventually consistent. If your view of the world is "there's one database that everything talks to" then you end up building a system that's very hard to distribute later on. Netflix started off saying there were lots of data sources and they don't talk to each other, and you have to do your joins and transactions at the application layer, sorry!
You don't use relational databases in this world because they can't scale to be a global system and you can’t have a single big schema. You have to solve these problems anyway, so work it out.
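Editor's illustration (not from the interview): a minimal sketch of what "doing your joins at the application layer" can look like. The two dictionaries stand in for independent data sources that never talk to each other; the store names, fields, and function names are all hypothetical and are not taken from Netflix code.

```python
# Hypothetical illustration: two independent data stores with no shared schema.
# Plain dicts stand in for separate services or databases.
SUBSCRIBERS = {
    1: {"name": "alice"},
    2: {"name": "bob"},
}
VIEWING_HISTORY = {
    1: ["title-a", "title-b"],
    # Subscriber 2 has no record in this store.
}


def fetch_subscriber(subscriber_id):
    """Look up a subscriber in the first store; returns None if absent."""
    return SUBSCRIBERS.get(subscriber_id)


def fetch_history(subscriber_id):
    """Look up viewing history in the second store; empty list if absent."""
    return VIEWING_HISTORY.get(subscriber_id, [])


def subscriber_with_history(subscriber_id):
    """Join the two sources in application code instead of in a database."""
    subscriber = fetch_subscriber(subscriber_id)
    if subscriber is None:
        return None
    return {
        "id": subscriber_id,
        "name": subscriber["name"],
        "history": fetch_history(subscriber_id),
    }
```

Because neither store knows about the other, the application has to decide what a missing record means; here a subscriber without history simply gets an empty list.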
I remember a long time ago when Cloud Foundry first came out, Derek Collison was saying you could fire it up on your laptop in Starbucks, and have a functional platform you could develop on and then use that same deployment to go to production. In contrast, to use NetflixOSS you needed to spend a couple of months studying the blog posts and assemble fifteen different moving parts, and it doesn't really make sense until you have 30 or 40 machines running on AWS. It is a totally different design point.
Merging the Two Approaches
What I think we're seeing now is the two design points are merging back together a bit. Cloud Foundry is much more microservice-based. There are a lot more services and more complex orchestration.
In the other direction Netflix figured out how to jam all the images into Docker so you can actually start it up on your laptop in Starbucks. So NetflixOSS got enough wrappers around it to make it relatively easy to try out, and the Cloud Foundry base is more scalable and has a lot more useful features for teams building larger pieces of distributed software.
These two approaches are moving closer together, with shared ideas and even some integration points, like Spring Cloud, that combine the two, resulting in much more distributable systems.
The other thing is the containerization idea. Cloud Foundry has Warden with basic containerization. Docker is a nice lightweight container, then there's the AMI with a JVM Tomcat container. Tomcat was what Netflix saw as its primary container, and the baked AMI was the only way you delivered anything to production.
The Docker and LXC-like tools are a way of wrapping that into something much more lightweight, agile and faster, and as that matures to be more production ready it will be more widely used.
Public vs. Private Cloud
John Isn’t AWS the straitjacket for Netflix? Isn't Netflix tied to, and dependent on, AWS?
Adrian No, it abstracts a long way from AWS. They demoed NetflixOSS running on Eucalyptus. There's work going on to port parts of it elsewhere. That's not work Netflix is doing, but it's work that's happening externally. It got ported to SoftLayer by a team at IBM, and IBM Watson uses NetflixOSS components running on SoftLayer.
So it has been ported and it wasn't that difficult. Most of what you are coding to is much higher-level concepts, for example auto-scalers and an object store like S3. SQS is useful, but most people have some kind of queuing service. SNS is used a lot internally at Netflix for notifications, but again it's not key to the products and Netflix could go build it itself if it had to.
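Editor's illustration (not from the interview): a minimal sketch of coding to a higher-level concept rather than to one provider. `ObjectStore`, `InMemoryObjectStore`, and `save_report` are hypothetical names invented for this example; a real port would supply an implementation backed by S3 or another provider's object store.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical interface for an S3-like object store."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class InMemoryObjectStore(ObjectStore):
    """Stand-in backend; a cloud-specific backend would implement the same methods."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def save_report(store: ObjectStore, name: str, text: str) -> str:
    """Application code depends only on the interface, not on a provider."""
    key = f"reports/{name}.txt"
    store.put(key, text.encode("utf-8"))
    return key
```

Under this assumption, moving to another cloud means writing one new `ObjectStore` subclass while `save_report` and its callers stay unchanged.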
John Does the Netflix platform work on OpenStack as well?
Adrian There has been some work on OpenStack. PayPal took Netflix open source code and forked it into a portal that they used to manage OpenStack.
OpenStack has pretty good southbound APIs controlling stuff in the data center. Nowadays if you're buying something to install in the data center, it has to have an OpenStack driver of some kind, then the rest of the world knows how to talk to it and make it automate-able into whatever fabric they're building.
The northbound APIs from OpenStack were much later to develop and they are still fairly immature, and there was too much missing functionality to run NetflixOSS last time I looked. OpenStack was focused on solving the operations problems, and didn't do enough to solve problems for developers early on.
The base component that Netflix uses is a cluster of identical machines under control of an autoscaler so that if one died it would automatically get replaced. Nowadays that's basically what Mesos does. So if you need Mesos on OpenStack, that means it didn't get done somewhere else in the stack. It isn't that Amazon is the only place that has self-healing clusters of machines: you can use RightScale, like SoftLayer does, or others. It's not about the code for provisioning an instance or container, it's about the frameworks you need around all that.
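Editor's illustration (not from the interview): a toy sketch of the self-healing idea, where a cluster of identical instances is held at a desired size and dead instances are replaced. The `Cluster` class and its methods are hypothetical; this is not the AWS Auto Scaling or Mesos API, and real systems detect failures through health checks rather than an explicit `mark_dead` call.

```python
import itertools


class Cluster:
    """Hypothetical cluster of identical instances kept at a desired size."""

    def __init__(self, desired_size):
        self.desired_size = desired_size
        self._ids = itertools.count(1)  # source of fresh instance numbers
        self.instances = set()

    def launch(self):
        """Start one new identical instance and return its id."""
        instance_id = f"i-{next(self._ids)}"
        self.instances.add(instance_id)
        return instance_id

    def mark_dead(self, instance_id):
        """Record that an instance has died (stand-in for a failed health check)."""
        self.instances.discard(instance_id)

    def reconcile(self):
        """Launch replacements until the cluster is back at its desired size."""
        launched = []
        while len(self.instances) < self.desired_size:
            launched.append(self.launch())
        return launched
```

Calling `reconcile()` on a loop is the whole trick: the caller never repairs a specific machine, it only restores the count, which works because every instance is identical.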
John I saw that at DockerCon you said Go is getting very popular.
Adrian I was basically saying that new code that I've seen from start-ups is largely written in Go. I was talking to a big early-adopter bank recently, and they are looking at Go. They are also in the process of moving to AWS. They are in banking, but they are trying to disrupt banking by being an early adopter. They use Cloud Foundry or one of the variants of it, and are doing it to specifically get a business advantage. That's where the presenters at the DevOps Enterprise Summit are going: they don't want to be left behind or they want to try and disrupt their industry.
They need to keep up. If you're going to compete with Amazon.com then you don't do that by running on a mainframe and updating your software once a year.
John Speaking of banks: what is your viewpoint on private vs. public cloud? I wonder if private cloud is destined to go the way of old power generators, where every company used to have its own dedicated generator/power plant.
If you talk to Google they say the future is public cloud. Yet banks are still hesitant to use AWS.
Adrian It's a funny quote because Google runs on its own private cloud. How much of Google itself runs on AWS or Google Cloud?
Google's public cloud business is actually very small compared to its private cloud. So Google is a private cloud company that has a little bit of public cloud on the side which they are marketing. It’s very sophisticated as long as you have a Google-like mindset, but enterprises are focused on AWS and Azure.
The real point here is that if you're big enough, you should run your own cloud. Google is too big to run on public cloud, Facebook is too big to run on public cloud.
Amazon.com just about runs on AWS. They are probably the biggest customer of AWS now that they have transitioned in. Netflix is probably a top 10 user of AWS depending on who else is doing what and depending on what year it is, not the biggest user of AWS. However, Netflix is one of the most sophisticated users and drove a lot of the feature-set improvements that AWS implemented.
If you get big enough, you get to a point at which you cannot run on public cloud. Netflix is on the order of 30,000 machines and probably over 100K cores. That's fairly large, and there are other people out there doing things on AWS at the 100K-core kind of level. I don't think there's really anybody doing million-core jobs as a service running on AWS. But AWS must have several million cores in total.
It does also matter what the laws for a region are, and we are seeing Azure and VMware-based clouds in many locations so that companies and governments can get over the jurisdiction issues. I think AWS needs to move faster here, but Google isn’t playing this game.
You've got to be a small fish in a big pond to make cloud work. If you're the largest consumer of a public cloud service, then you may be too big a fish in a small pond. You can’t be a shark in a paddling pool.