The challenge in architecting, building, and managing data centers is one of balance. There are forces competing to both push together and pull apart datacenter resources. Finding an equilibrium point that is technologically sustainable, operationally viable, and business friendly is challenging. The result is frequently a set of compromises that outweigh the advantages.
The datacenter represents a diverse set of orchestrated resources bound together by the applications they serve. At its simplest, these resources are physically co-located. At its extreme, these resources are geographically distributed across many sites. Whatever the physical layout, these resources are under pressure to be treated as a single logical group.
Resource collaboration - The datacenter is a collection of compute and storage resources that must work in concert in support of application workloads. The simple requirement of coordination creates an inward force pulling resources closer together, even if only logically. How can multiple elements work together towards a common goal if they are completely separate?
The answer is that they cannot. And as IT moves increasingly towards distributed applications, the interdependence between resources only grows.
Interestingly, the performance advantages of distributed architectures are only meaningful when communication between servers is uninhibited. If the network that makes communication possible slows down, the efficacy of the distributed architecture decreases. This means that datacenter architects must solve simultaneously for compute and storage demand, and the interconnect capacity required between them.
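One rough way to see this is a toy model. It assumes a job whose compute divides evenly across servers and whose data exchange is a fixed volume limited by interconnect bandwidth. The function name, parameters, and numbers below are illustrative assumptions, not measurements of any real system.

```python
def job_time(compute_seconds, servers, exchange_gb, bandwidth_gbps):
    """Toy model: perfectly parallel compute plus a serialized data exchange."""
    compute = compute_seconds / servers
    # Convert gigabytes to gigabits, then divide by the link rate in Gb/s.
    transfer = exchange_gb * 8 / bandwidth_gbps
    return compute + transfer


# Hypothetical workload: 1000 seconds of compute, 50 GB exchanged when distributed.
single = job_time(1000, 1, 0, 10)   # one server, nothing to exchange
fast = job_time(1000, 10, 50, 40)   # ten servers on a 40 Gb/s interconnect
slow = job_time(1000, 10, 50, 1)    # the same ten servers on a 1 Gb/s interconnect

print(f"speedup on fast network: {single / fast:.1f}x")
print(f"speedup on slow network: {single / slow:.1f}x")
```

Under these assumptions, the same ten servers deliver most of their theoretical speedup on the faster interconnect and only a fraction of it on the slower one, even though compute capacity is identical in both cases.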
Resource availability - Building out a datacenter is an exercise in matching resource capacity to demand. But not just in aggregate.
Individual applications, tenants, and geographies all place localized demands on datacenter resources. If the aggregate demand is sufficient but the resources exist in separate resource pools, you end up in a perpetual state of mismatch. There is always too much or too little workload capacity. The former means you have overbuilt. The latter leaves you wanting for more, which oddly enough means you end up having to overbuild.
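The mismatch can be illustrated with a little arithmetic. The sketch below compares three isolated pools against the same capacity treated as one logical pool. The pool sizes and demand figures are hypothetical, chosen only to show the effect.

```python
def unmet_demand(capacity, demand):
    """Sum the demand each pool cannot serve from its own capacity."""
    return sum(max(d - c, 0) for c, d in zip(capacity, demand))


capacity = [100, 100, 100]  # hypothetical units of capacity per pool
demand = [140, 90, 60]      # hypothetical localized demand per pool

# Isolated pools: a shortfall in one pool cannot borrow from another.
siloed_shortfall = unmet_demand(capacity, demand)

# One logical pool: only the aggregate totals matter.
pooled_shortfall = max(sum(demand) - sum(capacity), 0)

# Capacity sitting idle in pools that have more than they need.
stranded = sum(max(c - d, 0) for c, d in zip(capacity, demand))

print(siloed_shortfall, pooled_shortfall, stranded)
```

In this example aggregate capacity (300) exceeds aggregate demand (290), yet the isolated pools are simultaneously short in one place and idle in another. Pooled, the same resources cover all of the demand.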
Combatting these resource islands requires pulling resources closer together. In the simplest case, this is a physical act. But even if resources cannot be physically co-located, there are entire classes of technologies whose primary function is to allow physically separate resources to behave as if they are in close proximity.
Of course this does not come without a cost. The complexity of managing the disparate technologies required to logically pool physically separate resources can be prohibitively difficult. Even the most skilled specialists have to invest time in creating a properly engineered fabric between sites that accounts for queuing, prioritization, load balancing, and so on. The number of protocols and technologies required is high, and the volume of devices over which they must be applied can be huge. The result is a level of complexity that makes the network more expensive to manage and more difficult to change.
Organizational process - Friction is greatest at boundaries. Whenever a task requires involvement across different organizations or teams, the act of human coordination imposes a tax on both effort and time. In larger organizations, the handoff between teams might be automated to reduce communication mistakes (as with a ticketing system), but the shift in context is still expensive.
This creates organizational pressure to pull together things that might otherwise be separate. If distributed resources can be logically centralized and managed within a common organization, it reduces the dependence on outside teams. The removal of boundaries from common workflows lowers organizational friction and makes the overall task of managing the infrastructure easier.
At the same time that forces are pulling things together, there are equally strong opposing forces exerting outward pressure on datacenter resources.
Business continuity - For many companies, the datacenter represents a mission-critical element of their infrastructure. For companies whose existence depends on the presence of the resources within the datacenter (be they data, servers, or applications), it is untenably risky to rely on a single physical site. This exerts an outward force on resources as companies must create multiple physical sites, typically separated by enough distance that a disaster would not meaningfully impact all sites.
Despite the operational desire to keep things together, the risk to the business dictates that resources be physically separate.
Natural expansion - As resources are added to a datacenter, they are typically installed in racks in relatively close proximity to each other. When racks are empty, there is no reason to unnecessarily create physical separation between resources working in concert. Over time, adjacent rack space is filled through the natural expansion of compute, storage, and networking capacity.
As equipment expands, available rack space is depleted, and new racks and rows are populated. Eventually, the device sprawl can occupy entire data centers.
Imagine now that a cluster of servers occupies a rack in one corner of the datacenter. If that cluster is to be expanded, where does the next server go? If the nearby racks are already built out, that resource must be installed some physical distance away from the resources with which it must coordinate.
It is nearly impossible to plan for all future growth at the time of datacenter inception. Leaving enough space in adjacent racks to account for a decade of growth is impractically expensive. A sparsely populated datacenter suffers from poor space utilization, challenging power distribution, and difficult cabling. Thus, the mere act of expansion actually exerts an outward force leading to physically distributed resources.
Real estate - Sometimes, even when architects want to keep resources together, physical limitations create problems. There is no more immovable object than real estate (which serves as a proxy for all of space, power, and HVAC). In some cases, it is impossible to build out either laterally or even up. In other cases, there is no additional power to be had from the grid. Either of these scenarios forces an expansion to another site, which requires the physical separation of resources that might be expected to function in concert.
Additionally, as land rates change and technologies evolve, the best spots for data centers are not always known. It is difficult at best to predict with enough certainty how a physical site will evolve over an arbitrarily long time horizon. For example, not long ago, the thought of building cooling-hungry data centers in the hot desert was foreign. Today, Las Vegas is home to some of the most cutting-edge facilities in the world. This means that geographical dispersion is a near certainty for large companies. The forces pulling resources physically apart are unlikely to be neutralized.
Finding a balance
Given the strong forces working to keep resources logically together and the equally strong forces keeping them physically separate, how does anyone find a balance?
The price for balance is cost and complexity. You pay for reach directly, and control requires complexity. Both translate into higher carrying costs for the infrastructure. The push-pull dynamic in datacenters is not going away anytime soon. In fact, a move towards more distributed applications will only make the existing balancing act harder.
Newer technology offerings like SDN and datacenter fabrics offer some hope, but only insofar as they offer alternatives to the existing problems. Whatever the solution, architects will need to evaluate approaches based not just on the features but on the long-term costs of those features.