The major IT trends are all being driven by what can probably best be summarized as “more.” Some of the stats are fairly eye-popping:
- 40% of the world’s 7 billion people connected to the Internet in 2014
- 3 devices per person by 2018
- Traffic will triple by 2018
- 100 hours of YouTube video are uploaded every minute
- Datacenter traffic alone will grow at a 25% CAGR
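To put the compounding in perspective, here is a small Python sketch of the CAGR arithmetic. The five-year horizon is an assumption (the stats above don’t state a base year), and `growth_multiple` and `implied_cagr` are just illustrative helpers:

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


def implied_cagr(multiple: float, years: int) -> float:
    """Annual rate that compounds to `multiple` over `years` years."""
    return multiple ** (1 / years) - 1


# A 25% CAGR sustained over an assumed five-year horizon roughly triples volume.
print(f"{growth_multiple(0.25, 5):.2f}x")  # 3.05x
# Conversely, tripling over five years implies a CAGR of roughly 24.6%.
print(f"{implied_cagr(3.0, 5):.1%}")  # 24.6%
```

In other words, “traffic will triple” and “25% CAGR” are roughly the same statement over a five-year window.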
The point is not that things are growing, but that they are growing exceedingly fast. And trends like the Internet of Things and Big Data, along with the continued proliferation of media-heavy communications, are acting as further accelerants.
So how do we scale?
Taking a page out of the storage and compute playbooks
Storage and compute have gone through architectural changes to alleviate their initial limitations. While networking is not the same as storage or compute, there are interesting lessons to be learned. So what did they do?
The full history lesson is probably unnecessary, but the punch lines are meaningful. From a storage perspective, the atomic unit shifted from the spinning disk down to the block. Ultimately, storage scaled by reducing the size of its useful atomic unit. With a smaller building block, storage solutions could then assemble lots and lots of these atomic units into massively scaled systems.
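A minimal sketch of the block idea, assuming a hypothetical 4 KiB block size and simple round-robin placement across two hypothetical disks. Real storage systems use far more sophisticated placement; the point is only that one large object becomes many small, independently placeable units:

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be short."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def stripe(blocks: List[bytes], disks: List[str]) -> Dict[str, List[int]]:
    """Assign block indices to disks round-robin (illustrative placement only)."""
    layout: Dict[str, List[int]] = {d: [] for d in disks}
    for idx in range(len(blocks)):
        layout[disks[idx % len(disks)]].append(idx)
    return layout


payload = b"x" * 10_000
blocks = split_into_blocks(payload, 4096)  # hypothetical block size
print(len(blocks))  # 3
print(stripe(blocks, ["disk-a", "disk-b"]))  # {'disk-a': [0, 2], 'disk-b': [1]}
```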
On the compute side, the play was eerily similar. When we needed more compute capacity, we didn’t create more massive CPUs. Rather, the atomic computing unit shifted from the CPU down to the core. We then distributed loads across multiple cores, allowing us to scale up by becoming more atomic.
The costs of going atomic
When you create lots of smaller units of capacity, enabling them does come with a cost. First, there is a management burden in terms of dealing with lots of parallel workers. Unsurprisingly, the rise of software to handle both storage and compute accompanied the move to smaller atomic units.
The second, and perhaps more important, implication is that the applications themselves have to be architected to take advantage of the new infrastructure. Having multiple CPU cores, for example, is not terribly meaningful if your monolithic application can only run inside one of them.
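To make the application point concrete, here is a sketch of the same computation written two ways: as one monolithic loop, and decomposed into independent chunks that a pool of workers can process. The function names are hypothetical, and the workload (a sum of squares) is a stand-in for real work:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def monolithic_sum_of_squares(values: Sequence[int]) -> int:
    # One loop on one worker: extra cores sit idle no matter how many exist.
    return sum(v * v for v in values)


def chunk(values: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Split `values` into roughly equal, independent slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def partitioned_sum_of_squares(values: Sequence[int], workers: int = 4) -> int:
    # Each chunk is independent, so chunks can be mapped onto separate workers
    # and the partial results combined at the end.
    # Note: in standard CPython builds, threads do not speed up CPU-bound work
    # because of the GIL; ProcessPoolExecutor offers the same map() interface.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(monolithic_sum_of_squares, chunk(values, workers)))


data = list(range(1_000))
assert partitioned_sum_of_squares(data) == monolithic_sum_of_squares(data)
```

The decomposition is the real work here. Once the chunks are independent, swapping the thread pool for a process pool (or for workers on other machines) doesn’t change the shape of the code.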
Finally, once you have smaller pools of workload capacity, distributing application loads across those resources requires some attention to utilization. Both storage and compute have gone through a lot of changes around how resources are divvied up, primarily to address the costs of scaling out.
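One common way to pay attention to utilization is greedy least-loaded placement. The sketch below assigns each task to whichever node currently carries the least load, handling larger tasks first. Task sizes and node names are hypothetical, and this is one heuristic among many rather than how any particular scheduler works:

```python
import heapq
from typing import Dict, List, Tuple


def place_least_loaded(tasks: List[Tuple[str, int]],
                       nodes: List[str]) -> Dict[str, List[str]]:
    """Greedy placement: each task goes to the node with the least load so far."""
    heap = [(0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {n: [] for n in nodes}
    # Placing larger tasks first usually evens things out (the LPT heuristic).
    for name, size in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (load + size, node))
    return placement


tasks = [("a", 5), ("b", 4), ("c", 3), ("d", 3), ("e", 1)]
print(place_least_loaded(tasks, ["n1", "n2"]))
# {'n1': ['a', 'd'], 'n2': ['b', 'c', 'e']}  (each node ends up with load 8)
```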
The side effects of going atomic
This might seem obvious, but one of the nice side effects of reducing the size of the atomic unit for infrastructure is that the per-unit costs end up dropping. Indeed, it is much more cost-effective to work with smaller units, especially at higher volumes. By reducing the size of the infrastructure elements, both compute and storage have moved to cheap and cheerful solutions.
The other powerful effect of moving to a distributed workload model was that the meaning of failure changed. When the atomic unit is large, the blast radius for issues is equally large. When the atomic unit is small, the blast radius shrinks. The goal in managing applications across this type of infrastructure shifts from avoiding failure to handling failure capably when it happens. If a block is unavailable for some reason, replication means it can be pulled from somewhere else. If a server fails, application instances can be spun up on different hardware. In an atomic world, you don’t fear failure so much as embrace it (Netflix’s Chaos Monkey, which deliberately terminates production instances, is built on exactly this idea).
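The blast-radius and replication points reduce to a line of arithmetic and a fallback loop. This sketch assumes equal-sized units and a hypothetical three-way replica map; names like `read_block` are illustrative only:

```python
from typing import Dict, List, Set


def blast_radius(units: int) -> float:
    """Fraction of capacity lost when one of `units` equal-sized units fails."""
    return 1 / units


def read_block(block_id: str, replicas: Dict[str, List[str]],
               failed: Set[str]) -> str:
    """Return the first healthy node holding a replica of `block_id`."""
    for node in replicas[block_id]:
        if node not in failed:
            return node
    raise LookupError(f"all replicas of {block_id} are unavailable")


print(blast_radius(4), blast_radius(400))  # 0.25 0.0025
replicas = {"blk-7": ["node-1", "node-5", "node-9"]}  # hypothetical replica map
print(read_block("blk-7", replicas, failed={"node-1"}))  # node-5
```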
So what is networking’s analog?
If the way to reach cost-effective scale for storage and compute was to reduce the size of the atomic unit, what is that unit for networking?
If storage was the disk and compute was the CPU, networking is probably either the device or the routing/switching silicon. But if the goal is to reduce the size of the atomic unit, then what must be the end state?
Ultimately, the network exists primarily to enable communication (this thing talks to that thing). Arguably, the role of the network is shifting more toward distribution (the rise of content-centric networking, for example). Either way, the function the network provides is the interconnect between users, resources, and data. In that model, the smallest thing of importance is the link. By link, I don’t mean wires. I mean the individual connections that allow communication or distribution between any two resources, be they in the same rack or across a continent.
For networking to get more atomic, it means the proliferation of lots and lots of small links.
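Why “lots and lots”? If every resource may need to talk to every other resource, the number of potential links grows with the square of the number of resources. A quick sketch of that count, where `pairwise_links` is an illustrative helper and the result counts endpoint pairs, not physical wires:

```python
from itertools import combinations


def pairwise_links(n: int) -> int:
    """Number of distinct resource pairs that could need a link: n choose 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1_000):
    print(n, pairwise_links(n))
# prints: 10 45, then 100 4950, then 1000 499500

# Cross-check the closed form by enumerating pairs for a small case.
assert pairwise_links(10) == len(list(combinations(range(10), 2)))
```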
Implications of many small links
If there are many small links between resources, those links need to be managed. While SDN was initially conceived as a means to open up routing protocols to research and to create a way to automate workflows, one of the more powerful applications of controller architectures is the global view of the network. That global view provides an intelligent perspective about all the possible paths between communicating resources.
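As an illustration of what a global view makes possible, the sketch below enumerates every loop-free path between two switches in a small hypothetical leaf-spine topology using a plain depth-first search. This is not how any particular SDN controller works, and the number of simple paths explodes in large networks, so real systems would need to bound or prune the search:

```python
from typing import Dict, List

Topology = Dict[str, List[str]]  # adjacency list: node -> neighbors


def all_simple_paths(topo: Topology, src: str, dst: str) -> List[List[str]]:
    """Enumerate every loop-free path from src to dst with depth-first search."""
    paths: List[List[str]] = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in topo[node]:
            if nxt not in path:  # never revisit a node, so paths stay loop-free
                stack.append((nxt, path + [nxt]))
    return paths


# Hypothetical topology: two leaves, two spines, plus a direct leaf-to-leaf link.
topo: Topology = {
    "leaf1": ["spine1", "spine2", "leaf2"],
    "leaf2": ["spine1", "spine2", "leaf1"],
    "spine1": ["leaf1", "leaf2"],
    "spine2": ["leaf1", "leaf2"],
}
for p in all_simple_paths(topo, "leaf1", "leaf2"):
    print(" -> ".join(p))
# three paths: the direct link plus one through each spine
```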
Of course, having multiple paths only matters if you are capable of using them. In this environment, anything that limits the available paths is the enemy. Notably, this means that protocols built on shortest path first (SPF) algorithms are actually constraining: they forward traffic only along the lowest-cost path (or a set of equal-cost paths), leaving longer but perfectly usable paths idle. If the goal is to become atomic, these serve as inhibitors.
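To see the constraint concretely, the sketch below computes which links would carry traffic between two leaves if forwarding follows shortest paths (including equal-cost multipath). It assumes every link has a cost of 1, whereas protocols like OSPF and IS-IS use configured metrics. The topology is a small hypothetical leaf-spine fabric with one direct leaf-to-leaf link, and is assumed to be connected:

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Topology = Dict[str, List[str]]  # adjacency list: node -> neighbors
Link = Tuple[str, ...]           # undirected link as a sorted pair of node names


def hop_distances(topo: Topology, start: str) -> Dict[str, int]:
    """Breadth-first search: hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in topo[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def links_on_shortest_paths(topo: Topology, src: str, dst: str) -> Set[Link]:
    """Links an SPF-style forwarder with equal-cost multipath would use."""
    from_src = hop_distances(topo, src)
    to_dst = hop_distances(topo, dst)  # links are bidirectional, so BFS from dst
    best = from_src[dst]
    used: Set[Link] = set()
    for u in topo:
        for v in topo[u]:
            # A link u->v lies on a shortest path iff it adds no extra hops.
            if from_src[u] + 1 + to_dst[v] == best:
                used.add(tuple(sorted((u, v))))
    return used


def all_links(topo: Topology) -> Set[Link]:
    return {tuple(sorted((u, v))) for u in topo for v in topo[u]}


topo: Topology = {
    "leaf1": ["spine1", "spine2", "leaf2"],
    "leaf2": ["spine1", "spine2", "leaf1"],
    "spine1": ["leaf1", "leaf2"],
    "spine2": ["leaf1", "leaf2"],
}
used = links_on_shortest_paths(topo, "leaf1", "leaf2")
print(f"{len(used)} of {len(all_links(topo))} links carry leaf1->leaf2 traffic")
# 1 of 5 links carry leaf1->leaf2 traffic
```

With the direct link removed, the same function returns four links (two equal-cost paths through the spines). Either way, only the lowest-cost paths carry traffic.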
A note to the “bandwidth is cheap and plentiful” crowd
There is a school of thought that says that bandwidth is cheap and plentiful. If the costs of bandwidth continue to drop, then can we not just overbuild networks with lots of bandwidth?
Costs for storage and compute were in similar decline. If everything was so cheap, why did technologies that drove up utilization emerge? Utilization only matters if you need to get more bang for your buck. It is tempting to think that consumption models stay the same, but the availability of more (compute, storage, networking, whatever) drives higher consumption.
In no technology have we simply overbuilt because a resource seemed inexpensive. And while we talk about bandwidth being cheap, we have no fewer than four major technologies emerging whose primary objective is to drive down cost (either CapEx or OpEx): white box, SDN, overlays, and DevOps.
The bottom line
To scale larger, technology has time and again become more atomic and adopted more distributed architectures. The 1990s were about storage’s transition, the 2000s about compute’s transition, and the 2010s are going to be about networking’s transition. When this transition is done, we will have a larger number of smaller links, we will have different control mechanisms to use them, and the per-link cost will be lower.