Twitter in the Data Center: A Model for Data Consumption
Twitter has been ridiculously successful at embedding itself into the lives of hundreds of millions of people. Part of its success is that the service lends itself to a variety of use cases depending on its users' consumption models. These use cases are actually a social manifestation of common data sharing practices. And the same models that helped Twitter raise close to $2B in its IPO are relevant across the infrastructure that makes up all data centers.
Twitter is essentially a message bus. Individual users publish when they choose, and subscribe to content that matters to them. Content can be consumed in whatever way makes sense for the subscriber – in real time, at set intervals, when it is directed at them specifically, or whenever something particularly interesting is going on. The magic in Twitter is in relaying information, not in dictating a specific consumption model or a requisite set of actions in response to that content.
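That relay-only role maps naturally onto a publish/subscribe pattern. As a rough sketch (the names and structure here are illustrative, not any particular product's API), a minimal message bus might look like this:

```python
from collections import defaultdict


class MessageBus:
    """Minimal publish/subscribe relay. The bus holds no opinion about
    how subscribers consume messages; it only delivers them."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Relay the message to every follower of this topic; each
        # follower decides what (if anything) to do with it.
        for callback in self._subscribers[topic]:
            callback(message)


# Example: a load balancer follows server events and nothing else.
bus = MessageBus()
events = []
bus.subscribe("server", events.append)
bus.publish("server", "app-instance-42 deployed")
bus.publish("apm", "latency report")  # no followers; the bus doesn't care
```

The publisher never learns who, if anyone, acted on the update – which is exactly the loose coupling the Twitter analogy describes.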
The same consumption models exist in the data center.
- The update: When something important happens, it sometimes makes sense to let everyone know. When a new application instance is deployed, the process likely starts with the server. The act of setting up the server generates information that might be of interest to other elements within the data center. The specific application might require some allocation of storage or some network configuration (VLAN, ACL, QoS, whatever). By sending out a general update, followers can take appropriate action to ensure a more automated and orchestrated response.
- The follow: Not all constituents are interesting to everyone. It might not matter to the load balancers what the application performance monitoring tools are doing. Rather than clutter their data timeline, they follow only those elements that are producing content that is relevant to them. This simplifies data consumption and reduces overhead on the subscriber side.
- The list: It could be that there are lots of interesting sources to follow, but the sum of all of the updates is overwhelming. In this case, updates can be grouped into relevant streams, each of which is consumed differently. It might be sufficient to simply monitor some updates while other updates require careful consideration and subsequent action. For instance, it might be interesting for servers to monitor changes in network state but not necessarily meaningful to act on all changes. Additionally, some streams might require more constant attention with tighter windows around activity, while others can be periodically parsed for general updates.
- Intermittent monitoring: Some entities might only parse relevant updates periodically. It is not important to stay up to date in real time, and it might not even be important to pay careful attention to every update. They want to consume content asynchronously and in batches. Analytics tools, for example, might be able to poll periodically and report overall health without needing to consume a real-time feed.
- Trendspotting: Individual updates are interesting, but when multiple sources all report the same thing, it becomes newsworthy. An error message, for example, might indicate an isolated issue. But a flood of error messages from multiple data center entities might indicate a more serious problem that requires attention (perhaps a DDoS attack).
- Message threading: The threading function within Twitter is simply a sort to help provide context and preserve temporal order around some exchange. This is very similar to reviewing changes or state information during common troubleshooting tasks.
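To make the trendspotting model concrete, here is a sketch of a consumer that only escalates when the same error arrives from several distinct sources inside a sliding time window. The class name, threshold, and window size are all hypothetical choices for illustration:

```python
import time
from collections import deque


class TrendSpotter:
    """Flags an event stream as newsworthy only when reports arrive
    from many distinct sources within a sliding time window."""

    def __init__(self, threshold=3, window_seconds=60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._events = deque()  # (timestamp, source) pairs, oldest first

    def observe(self, source, timestamp=None):
        now = time.time() if timestamp is None else timestamp
        self._events.append((now, source))
        # Drop reports that have aged out of the window.
        while self._events and now - self._events[0][0] > self.window:
            self._events.popleft()
        # Count distinct sources rather than raw messages: one chatty
        # element should not look like a data-center-wide trend.
        distinct_sources = {src for _, src in self._events}
        return len(distinct_sources) >= self.threshold
```

A single server repeating the same complaint never trips the threshold, but the same error from three different elements inside the window does – the difference between a random issue and a possible DDoS.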
The thing of central importance in all of the consumption models is the data. In Twitter's case, the 140-character update is the data. The users determine what that data is, with whom that data is shared, and ultimately how that data is consumed. Twitter neither produces the updates nor consumes them. Its sole function is to relay those updates to the appropriate subscribers and to allow data access to those doing searches.
When this is working well, Twitter's message bus is a powerful enabler of human orchestration. Twitter's role in the Arab Spring uprisings has been well-documented. Entire movements have been coordinated across the globe using Twitter as a means to broadcast organizing thoughts. In most of these cases, the origin of the information was not even directly connected to its recipients. Merely publishing information was enough to spur action.
When our industry talks about orchestration in the data center, it need not be that different. Orchestration doesn't require a tight linkage between all elements within the data center ecosystem. Orchestration only requires that data be made available as and when it is needed. The rules for data consumption ought not be uniformly applied. Individual elements will consume information in different ways depending on what their needs are.
This is all to say that delegating application workloads to resources across the data center does not rely on the existence of a tightly-integrated system. Integration and orchestration serve different needs. Integration is about performance – controlling both sides of an interface allows for fine-grained optimization required to eke out every last bit of performance available. Orchestration is about seamless handoff between resources.
The SDN movement broadly can be applied to both performance and workflow automation. Different use cases demand one, the other, or both. But architects and administrators will be best served by explicitly determining whether their objective is integration or orchestration. The differences go well beyond semantics. The architectural implications are profound.
Published at DZone with permission of Mike Bushong, DZone MVB. See the original article here.