Big Data and SDN – Isn’t it inevitable?
Enter analytics. Or, if you play this out to its logical conclusion, it is likely to be Big Data.
While most people are focused on how we program discrete forwarding elements in a network, there is certainly a question looming about how we handle external inputs to drive these changes. More specifically, what role do analytics play in determining things like path selection? And when it comes to troubleshooting, what is the source of truth for network state when forwarding decisions are made centrally based on network conditions that are not static or necessarily tied to configuration that can be inspected?
These questions get beyond how we build software-defined networks (or data centers) and get more to what it will take to make these designs production-ready. Having something that works is no substitute for having something that has all the necessary production tools and faculties.
So what does this mean practically?
- What is the right granularity for data collection? Today, many analytics tools look at data and average it out over some time interval. If that data is driving real-time changes in the infrastructure, what is the right time interval? If it's too wide, the data is useless. Too narrow and you get an oscillation problem where you never reach a steady state.
- As we move to more and more data informing decisions, is that data collected and processed in batch jobs? Or do you farm off specific calculations to be done in smaller increments? If it is all at once, you have to crunch a lot of stuff, but if you break everything into small tasks, then you have to orchestrate those tasks.
- To what extent are we ready to trust computers to manage data centers? If we move to a highly automated framework, what is the role of the human operator? Does the system offer up suggestions that are reviewed and approved? Or does the system go ahead and make changes?
- How much data do we keep? And for how long? And where? If everything is data-driven, then how much history must be kept for analysis or troubleshooting? And where is that data kept? The answers will have profound impacts on how supporting tools are developed. If data is distributed on local devices, then any data analytics engines need to harvest data, which adds complexity and some interesting failure scenarios. A central repository has its own design issues and failure scenarios.
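To make the granularity question a bit more concrete, here is a minimal Python sketch of the oscillation problem. Everything in it is an assumption for illustration: the utilization samples, the 70% threshold, and the function names are made up, and the "controller" is just a counter of how often a path decision would flip. It is not how any particular product works.

```python
from collections import deque


def windowed_averages(samples, window):
    """Return the trailing average of the last `window` samples.

    Averages are only emitted once the window is full, so the result
    has len(samples) - window + 1 entries.
    """
    recent = deque(maxlen=window)
    averages = []
    for value in samples:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages


def count_path_changes(samples, window, threshold):
    """Count how often a hypothetical controller would switch paths.

    The controller moves traffic to an alternate path when the averaged
    utilization exceeds `threshold`, and moves it back otherwise.
    """
    changes = 0
    on_alternate = False
    for average in windowed_averages(samples, window):
        want_alternate = average > threshold
        if want_alternate != on_alternate:
            changes += 1
            on_alternate = want_alternate
    return changes


# Hypothetical link utilization (percent): bursty around a 70% threshold.
utilization = [60, 85, 55, 90, 58, 88, 62, 86, 59, 91]

narrow = count_path_changes(utilization, window=1, threshold=70)
wide = count_path_changes(utilization, window=4, threshold=70)

print(narrow)  # 9: the decision flips on nearly every sample
print(wide)    # 1: the decision changes once and then holds
```

With these particular numbers, the one-sample window never settles, while the four-sample window settles immediately. Stretch the window far enough, though, and a real change in load would be averaged away, which is the "too wide" half of the problem.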
If you ask me how things evolve, I suspect that we will not see any real industry standardization on answers to these questions. Some tools and use cases will favor one answer while others will favor another. Needs vary so dramatically that there is plenty of space for various types of tools. In the provisioning area alone, we know about Chef, Ansible, Puppet, CFEngine, Salt and others. When you expand more broadly to other types of tools, the list only grows.
So how do we handle a thousand different integrations so we can use all of these tools? I suspect the answer to that question will be what separates vendors to some extent. The prevailing camp thus far seems to advocate APIs everywhere and point integrations. For high-value tools (read: a customer has asked for it), vendors will integrate. For everything else, the existence of APIs so that customers can integrate seems to be the answer.
I think that approach works so long as the point integrations are broadly applicable and customers are happy doing their own integrations. But in an emerging space, the market is more likely to be split among potential tools, and for most shops, the desire to handle heavy integrations just isn't there. A different approach will have to surface. Rather than relying on system APIs, it might be that the right answer is to essentially groom data and present all network data from a single data engine. All data services would integrate with this engine through a common infrastructure, making the integrations less custom. I won't go into too much detail here, but I like this latter approach as it goes a step further than just saying "We have an API for that."
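As a rough illustration of the single-engine idea, here is a minimal Python sketch. The `Record` and `DataEngine` names, the metric names, and the publish/subscribe shape are all hypothetical; the point is only that each device and each tool integrates once with a common record format instead of once per counterpart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """One groomed data point in a common format (hypothetical schema)."""
    device: str
    metric: str
    value: float


class DataEngine:
    """Hypothetical hub: devices publish records, tools subscribe by metric."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Record], None]]] = {}

    def subscribe(self, metric: str, handler: Callable[[Record], None]) -> None:
        """Register a tool's handler for every record of the given metric."""
        self._subscribers.setdefault(metric, []).append(handler)

    def publish(self, record: Record) -> int:
        """Deliver a record to interested tools; return how many received it."""
        handlers = self._subscribers.get(record.metric, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


# Two made-up tools consuming the same feed through the same interface.
history: List[Record] = []
alerts: List[str] = []

engine = DataEngine()
engine.subscribe("latency_ms", history.append)
engine.subscribe(
    "latency_ms",
    lambda r: alerts.append(r.device) if r.value > 10.0 else None,
)

engine.publish(Record(device="leaf1", metric="latency_ms", value=2.5))
engine.publish(Record(device="leaf2", metric="latency_ms", value=14.0))

print(len(history))  # 2
print(alerts)        # ['leaf2']
```

Under this assumption, adding a new tool means writing one subscriber against the common format rather than a separate integration for every device API.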
Regardless of what happens though, I think the intersection of Big Data and SDN is going to happen. It's just a matter of when and where. If you thought one buzzword acronym was tough, the mere mention of two probably makes you uneasy. But I'm afraid this one is unavoidable.