Twitter has been ridiculously successful at embedding itself into the lives of hundreds of millions of people. Part of its success is that the service lends itself to a variety of use cases depending on its users' consumption models. These use cases are actually a social manifestation of common data sharing practices. And the same models that helped Twitter raise close to $2B in its IPO are relevant across the infrastructure that makes up all data centers.
Twitter is essentially a message bus. Individual users choose to publish when they want to, and can subscribe to content that is important to them for some reason. Content can be consumed in whatever way makes sense for the subscriber – in realtime, at set intervals, when it is directed to them specifically, or whenever something particularly interesting is going on. The magic in Twitter is in relaying information, not in dictating a specific consumption model or requisite set of actions as a result of any of that content.
The same consumption models exist in the data center.
The update: When something important happens, it sometimes makes sense to let everyone know. When a new application instance is deployed, it likely starts with the server. The act of setting up the server generates information that might be of interest to other elements within the data center. The specific application might require some allocation of storage or some network configuration (VLAN, ACL, QoS, whatever). By sending out a general update, followers can take appropriate action to ensure a more automated and orchestrated response.
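The update model above is just publish/subscribe. A minimal sketch in Python might look like the following; the topic names, the `MessageBus` class, and the reactions are hypothetical, purely for illustration:

```python
from collections import defaultdict

class MessageBus:
    """A minimal in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Relay the update to every follower of the topic; the bus itself
        # neither produces nor consumes the content.
        for callback in self._subscribers[topic]:
            callback(message)

# A new server instance publishes a deployment update; storage and
# network elements each react in their own way.
bus = MessageBus()
actions = []
bus.subscribe("server.deployed",
              lambda msg: actions.append(f"storage: allocate for {msg}"))
bus.subscribe("server.deployed",
              lambda msg: actions.append(f"network: configure VLAN for {msg}"))
bus.publish("server.deployed", "app-instance-42")
print(actions)
```

The point mirrors the Twitter analogy: the publisher does not know or care who its followers are, and each follower decides independently what action the update warrants.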
The follow: Not all constituents are interesting to everyone. It might not matter to the load balancers what the application performance monitoring tools are doing. Rather than clutter their data timeline, they follow only those elements that are producing content that is relevant to them. This simplifies data consumption and reduces overhead on the subscriber side.
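Selective following is just topic-based filtering on the subscriber side. A small sketch, with hypothetical element and topic names:

```python
# Each element registers only the topics relevant to it, rather than
# consuming every update on the bus. Names are illustrative.
subscriptions = {
    "load-balancer": {"server.deployed", "server.removed"},
    "apm-tool": {"app.latency", "app.errors"},
}

def followers_of(topic, subscriptions):
    """Return the elements that have chosen to follow this topic."""
    return [name for name, topics in subscriptions.items()
            if topic in topics]

print(followers_of("server.deployed", subscriptions))
```

The load balancer never sees application-performance updates at all, so the filtering cost is paid once at subscription time rather than on every message.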
The list: It could be that there are lots of interesting sources to follow, but the sum of all of the updates is overwhelming. In this case, updates can be grouped into relevant streams, each of which is consumed differently. It might be sufficient to simply monitor some updates while other updates require careful consideration and subsequent action. For instance, it might be interesting for servers to monitor changes in network state but not necessarily meaningful to act on all changes. Additionally, some streams might require more constant attention with tighter windows around activity, while others can be periodically parsed for general awareness.
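Grouping sources into streams with different handling policies can be sketched as a simple lookup. The stream names, sources, and policies here are hypothetical:

```python
# Hypothetical grouping of sources into named streams, each with its
# own consumption policy: monitor-only vs. act-on-every-update.
streams = {
    "watch-only": {"sources": {"network-state"}, "policy": "monitor"},
    "act-on":     {"sources": {"storage-alerts"}, "policy": "act"},
}

def handle(source, update, streams):
    """Route an update according to the policy of its stream."""
    for stream in streams.values():
        if source in stream["sources"]:
            if stream["policy"] == "act":
                return f"acting on {update}"
            return f"logged {update}"
    return "ignored"

print(handle("network-state", "link down", streams))
print(handle("storage-alerts", "volume full", streams))
```

The server monitors network-state changes without acting on them, while storage alerts trigger a response, matching the distinction drawn above.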
Intermittent monitoring: Some entities might only parse relevant updates periodically. It is not important to stay up to date in realtime, and it might not even be important to pay careful attention to every update. They want to consume content asynchronously and in batches. Analytics tools, for example, might be able to poll periodically and report overall health without needing to consume a realtime feed.
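This batch-oriented consumption amounts to buffering updates and draining them on a poll. A minimal sketch, with a hypothetical `BatchConsumer` and made-up status values:

```python
from collections import deque

class BatchConsumer:
    """Buffers updates and hands them over in batches when polled,
    instead of processing each one in realtime (illustrative only)."""

    def __init__(self):
        self._queue = deque()

    def receive(self, update):
        self._queue.append(update)  # just buffer; no immediate action

    def poll(self):
        """Drain and return everything accumulated since the last poll."""
        batch = list(self._queue)
        self._queue.clear()
        return batch

# A hypothetical analytics tool polls periodically and reports
# overall health from the accumulated batch.
consumer = BatchConsumer()
for status in ["ok", "ok", "degraded", "ok"]:
    consumer.receive(status)
batch = consumer.poll()
health = batch.count("ok") / len(batch)
print(f"healthy: {health:.0%}")
```

The producer's cadence and the consumer's cadence are fully decoupled, which is exactly what makes intermittent monitoring cheap for the subscriber.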
Trending: Individual updates are interesting, but when multiple sources all report the same thing, it becomes newsworthy. An error message, for example, might indicate a random issue. But a flood of error messages from multiple data center entities might indicate more serious issues that require attention (perhaps a DDoS attack).
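Detecting that many independent sources are reporting the same thing is a simple counting problem. A sketch, with a hypothetical threshold and made-up source names:

```python
from collections import Counter

def newsworthy(updates, threshold=3):
    """Flag messages reported by at least `threshold` sources at once.
    The threshold of 3 is an arbitrary illustrative choice."""
    counts = Counter(msg for _source, msg in updates)
    return [msg for msg, n in counts.items() if n >= threshold]

# One disk warning is routine noise; the same connection-flood
# message from three separate entities is a trend worth acting on.
events = [("fw-1", "conn-flood"), ("fw-2", "conn-flood"),
          ("lb-1", "conn-flood"), ("srv-9", "disk-warning")]
print(newsworthy(events))
```

In practice the aggregation window and threshold would be tuned per message type, but the principle is the same as Twitter's trending topics: volume across independent publishers is itself a signal.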
Message threading: The threading function within Twitter is simply a means of providing context and preserving temporal order around some exchange. This is very similar to reviewing changes or state information during common troubleshooting tasks.
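Reconstructing that temporal order from interleaved updates is just a sort on timestamps. A small sketch; the event shape (timestamp, source, message) is a hypothetical choice:

```python
# Interleaved updates from different sources, received out of order.
# Sorting by timestamp rebuilds the "thread" -- the sequence of state
# changes you would walk through while troubleshooting.
events = [
    (1700000003, "switch-2", "port up"),
    (1700000001, "switch-2", "port down"),
    (1700000002, "controller", "reroute issued"),
]

thread = sorted(events)  # tuples compare element-wise, timestamp first

for ts, source, msg in thread:
    print(ts, source, msg)
```

Seen in order, the port flap and the controller's reroute form a coherent narrative that the raw interleaved feed obscures.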
The thing of central importance in all of the consumption models is the data. In Twitter's case, the 140-character update is the data. The users determine what that data is, with whom that data is shared, and ultimately how that data is consumed. Twitter neither produces the updates nor consumes them. Its sole function is to relay those updates to the appropriate subscribers and to allow data access to those doing searches.
When this is working well, Twitter's message bus is a powerful enabler of human orchestration. Twitter's role in the Arab Spring uprisings has been well-documented. Entire movements have been coordinated across the globe using Twitter as a means to broadcast organizing thoughts. In most of these cases, the origin of the information was not even directly connected to its recipients. Merely publishing information was enough to spur action.
When our industry talks about orchestration in the data center, it need not be that different. Orchestration doesn't require a tight linkage between all elements within the data center ecosystem. Orchestration only requires that data be made available as and when it is needed. The rules for data consumption ought not be uniformly applied. Individual elements will consume information in different ways depending on what their needs are.
This is all to say that delegating application workloads to resources across the data center does not rely on the existence of a tightly-integrated system. Integration and orchestration serve different needs. Integration is about performance – controlling both sides of an interface allows for fine-grained optimization required to eke out every last bit of performance available. Orchestration is about seamless handoff between resources.
The SDN movement broadly can be applied to both performance and workflow automation. Different use cases demand one, the other, or both. But architects and administrators will be best served by explicitly determining whether their objective is integration or orchestration. The differences go well beyond semantics. The architectural implications are profound.