There's fog coming to the world of IT. Some people are calling this "The Cloud".
This fog turns everything inside out. It will take your monolithic applications, the guts of which are contained within large bodies of code and break them apart so that discrete units of functionality are contained within their own process, exposed through an interface for others to see. Complexity is moving from the inside of a few large processes to the outside of many smaller processes.
People are rightly scared when they see that this fog, this "Cloud", will soon engulf their IT organization. Some of those that has been exposed to the fog, have embraced it, have applied good design practices and seen its real benefits. These people are singing its praise and telling others, "Don't fear the fog, but be prepared for it."
So why are things moving in this direction? Why we are breaking down our applications into smaller and simpler units of discrete functionality, only to have to wire them back together again on the outside?
There are many reasons. The most cloudy answer is that it helps you scale functionality independently and redundantly across highly available cloud infrastructure. Also, smaller components are easier to reason about. Small efficient teams can focus on smaller components, rather than large teams, whose members have to learn a monolith code-base before they become useful.
Each smaller component can be written in the language best suited to it, with a back-end data-store equally suitable. I've seen a pure Java shop build a five line proxy in Node.js-- when they moved in this direction-- because it was the best tool for the job. Moving from Oracle, or even to Oracle, no longer becomes a multi-year project. It becomes a small project that only affects the teams where the move makes the most sense.
When we have more smaller moving parts, tooling needs to change - and it is. Automation is needed for provisioning machines, operating systems and the software stack to run the software that runs our business. It needs to be fast, consistent and it needs to scale. Filing tickets to provision a machine doesn't scale. Having a human decide where the instances of my application should run doesn't scale.
We now have the tooling to do this and these technologies are improving every single day. Public infrastructure and private solutions have greatly improved to what they were a year or two ago. In a year things will be different again. You have probably heard of Docker. 18 months ago, Docker didn't exist. Now it's everywhere and technology providers are racing to support its portability and interoperability. The ecosystem around Docker grew quickly and continues to flourish.
Cloud adoption is racing across enterprises like a fast moving fog.
The price of public cloud infrastructure is already dropping. Amazon Web Services, Google Compute Engine, Microsoft Azure and newer companies like Digital Ocean are in a race to the bottom, while the trust of these public services is on the rise. Yes, we fear the NSA and all those cyber-no-gooders, but we are starting to realize that no matter how skilled our Operations teams are, they just cannot compete. How long can your Operations team say that they are building more secure infrastructure than these public infrastructure services, when their pool of the smartest engineers on the planet is constantly growing?
This situation will only get worse as more companies start to use public infrastructure services. But there is hope on the horizon for private infrastructure. OpenStack is the open-source alternative to these cloud infrastructure giants and it provides a glimmer of hope for those large enterprises, of which there are many, who do not want to sell their private IT soles just yet.
Although, even those that want to keep their infrastructure in-house do see the benefit of utilizing public infrastructure for certain workloads. Public infrastructure allows developers to rapidly experiment. For instance, a team could spin up a 1000 node Hadoop cluster in Brazil at the drop of a hat without having to build out a data-center. Your average IT department will struggle to compete with that.
All large enterprises will likely use public infrastructure in some way. When they do, they will likely be spread across several, if not all, of these providers. One to get the best price, two because different features offered will be more suitable for different projects and three because organizations are large and getting consensus on which cloud provider to bet the farm on is not always feasible.
For a long time we are going to see enterprise IT straddled across all these public and private infrastructure technologies. The private infrastructure will be straddled across legacy systems and private Cloud solutions such as OpenStack. Things are going to get more complex before they get simpler.
So how do we build software to run across the tectonic plates of this public and private infrastructure? It's three parts. At a high-level, it's the way you architect your applications. Then there's the platform on which you build your applications. And at the lowest level is how you build those applications. Let's talk about architecture first...
If you're a developer, you may have used the Twitter API, Google Maps API or any of the many APIs available out there. How long did you spend in a room with the developers of the Twitter API or Google API before you used them? Unless you are well connected, it is unlikely you will get any facetime with developers of those APIs, even if you hit a problem.
On the other end of the scale to Twitter and Google, there are many startups building their business around their own API. The API Hub, Mashape, lists over 10,000 API services you can choose from; with everything from health, commerce and number crunching to "Yoda Speak". The 10,000+ companies behind these APIs are all trying to monetize access to them. You would probably be able to talk to some of these folks; tell them what features you need and they might work with you to meet your needs. But will they allow your application to connect directly to their back-end database? Hell, no.
How about the programming language the API provider uses? Would you ask them? Would you care? Would you refuse to consume their API service because it was not written in Go? You wouldn't care. You would just look at the API provided, the SLA (service level agreement) and possibly the background of the team supporting it and start integrating with it.
Hopefully the API is versioned, so any changes upstream do not affect you and you can gracefully migrate your application to newer versions of the API.
Ok, so we've got a mental picture of us, the consumer of APIs, and these many APIs that we consume from - for which we care not of how they implement their them. Possibly we are providing the data we consume from these as another API. Maybe it's to our web front-end or maybe it's to other consumers. Maybe it's turtles all the way down.
Now imagine this is your organization. Small teams providing distinct components of your business. Each team can focus on choosing the best tools, the most appropriate data-store, whether it be relational or NoSQL and the most appropriate programming language. Each of these micro services provides an SLA and a versioned API. They continue to support previous versions and deprecate features gracefully. There is a roadmap of upcoming features with time-lines so consumers can plan accordingly. High-level design is just as important as ever, if not more so.
For the teams working on these microservices, things are much simpler. A smaller component with a distinct interface is much easier to build and maintain. They can concentrate on getting their component right rather than having to consider all the other moving parts of the system. They can have their own independant release cycle.
The complexity comes with what happens outside these running services. These are Operational concerns. For instance, where does it run? How do we scale it up? How do we deal with the many system failures we might encounter across machines and availability zones? How do consumers discover where this service is running - especially if it moves location due to failure. How do we handle latency between components and prevent a cascading catastrophe as each service's processes, threads and queues backup. This can happen quickly when latency suddenly goes from 2 milliseconds to 2 seconds. How do we handle errors in the system gracefully? What happens when a new version of service fails? Can we fall back to a previous version?
When we think long and hard about all the problems we might encounter, we want to distill it down and keep these concerns from overly polluting our now smaller simpler applications. We want each application to concentrate on the explicit job that we have assigned to it. This is where an application platform comes in.
Platform-as-a-Service, or PaaS, is a concept that is starting to gain widespread acceptance. The past few years has seen growth of the ecosystems around open-source projects like Cloud Foundry. You are probably familiar with the public Platform-as-a-Service, Heroku, which has pioneered a great deal of what we know of and expect from PaaS today. Containerization, such as Docker, goes hand-in-glove with PaaS. In fact, Docker was born from the lesser-known public PaaS, dotCloud.
The application platform provides a way for developers to quickly deploy their software on a foundation which is agnostic to the underlying infrastructure technology or service provider. For instance, a single cluster of a Cloud Foundry based PaaS, such as ActiveState's Stackato, can span internal hardware, OpenStack and multiple service providers such as Google, Amazon or Microsoft.
So what should we expect from an application platform?
- It should be language agnostic. As a developer I should be able to push any code of my choosing and have it running as an application in minutes, if not seconds.
- It should be data-store agnostic. As a developer I should have a range of diverse data-stores to choose from.
- It should be consistent. Every time I push my application code, it should build and deploy it in the same way, so that I can be sure it will work in development as it works for QA, on staging or in production.
- It should be self-serve. As a developer I should be able to provision all the services I need and deploy my code without requiring to file a support ticket or talk with Operations engineers.
- It should be resilient to partial underlying infrastructure failure. If a machine within the cluster is lost, or even an availability zone, we should recover quickly and gracefully to the levels set in the SLA we give our application consumers. There should be no down-time, but temporary throttling and graceful service degradation may be acceptable. This often involves running an application with some redundancy to handle bumps and applications should be designed to be ephemeral.
- It should be extensible. It should provide a programmable interface, such as a REST API. I should be able to interact with it via my IDE. I should be able automate it. I should be able to integrate it with my Continuous Integration and Continuous Delivery pipelines.
- Ideally the application platform would support the portability of and be inter-operable with Docker containers.
So let's assume we have the perfect architecture. Maybe it's a microservices architecture the likes of what Netflix runs, maybe it's something similar or different. No one size fits all. What is happening down in the weeds? Where does the applications run? How are things changing here when we move to the Cloud?
Through the application platform I just described, we are giving developers more power. If our application platform can provide self-service, if it can free developers to choose the right programming languages, data-stores and the right technologies for the job, then we start to see rapid results and innovation.
If the domain of each application is smaller, then developers will have greater focus and will be able to understand and test it more easily. Small teams will be able to fully own these smaller components. Companies that have adopted this model often have the developers be the ones that carry the pagers. If an issue occurs in a small component owned by a small team, then that team are the best people to address the issue. Through the self-service tools they also have the power to rapidly deploy fixes. Through CI pipelines they would also be able to quickly see potential issues with any fixes.
An amazing thing happens when developers carry the pagers - the code quality goes up. Adrian Cockcroft, previously of Netflix, says this is analogous to car drivers having a 6-inch spike instead of airbag. If thinking about your code a little more during the day means not being woken up during the night, then you are more likely to do it. This is also similar to how code quality generally goes down when a development team first gets a QA team that will find all the bugs for them. It's someone else's concern. When there is nobody else to pass the buck to, and you own the code until it is decommissioned, you make sure it works well and will continue to work well.
So how are the applications themselves changing?
Twelve Factor Apps
In order for applications to become scalable they must be ephemeral. This means that each instance of my application is equal to the other instances, no matter whether it just came on-line or has been running for the past 2 weeks. Equally it must have no affect on the overall system if one instance suddenly dies. This means no state. Application instances should be stateless and any persistent data should be offloaded to data services which are shared between all the instances of that application.
The application platform should handle the routing and load balancing across all healthy instances. The application platform should be able to wire-in the location and credentials of the data-services that the application instances will use.
This ephemeral nature of applications and how they are wired into the overall system are part of a manifesto written by Heroku, called "The Twelve-Factor App". I recommend everyone read this. It's 12 short pages and outlines what the perfect cloud application should look like.
Docker now runs through the cloud from development to production. It is the portable entity that allows developers to package their code and deploy it anywhere that runs Linux. It was born from PaaS and this is where it makes most sense.
Docker is simply a building block and does not provide the orchestration. But the portability and interoperability that it brings means that it is becoming the common-denominator of Cloud technology solutions. This benefits the users of the cloud more than anyone.
Docker's original concept was based on the idea of running a single process in each container. This idea of small discrete units of operation has been skewed by people wanting to use them as virtual machines and running an entire operating system. The "can we do this?" continues to drive development in this area, but just because you can do something, does not mean you should. A Docker container can be tightly coupled to a single process and Docker can manage this well. As soon as we decouple this with our own layers of process management within the container, or daemonized processes, things get messy. It is better to keep things simple, utilize the application platform to do your orchestration and wiring for you.
The world of IT has changed. Applications are getting more granular and with this the culture and structure of IT organizations is changing. Design your architecture to take advantage of the cloud, because building monolithic applications no longer scales to the modern business demands. Monolithic applications cannot adapt quickly enough for the changing pace we are seeing.
Don't fear the fog, but be prepared for it.