Big data is a polarizing topic whenever it comes up in discussions — which is often. Like any hot trend, the combination of excitement, ignorance, and opportunism creates noise that leads to a healthy amount of skepticism, cynicism, and a few other isms as well.
Cutting through the clutter
Cutting through the clutter is challenging. Legions of analysts and vendors shouting predictions of what might be drown out the stories of what actually is. The result? Big data is on a one-way trip to the hype cycle’s trough of disillusionment unless we can get past poor definitions and see the success stories clearly.
I have the chance to do just that on October 1st in New York as owner of the Big Data Workshop at InterOp. When I was asked to lead it, I immediately saw the opportunity to put a stake in the ground and say what big data is and isn’t — a chance to separate the myth from the reality.
It was clear from the start that accomplishing that goal would require bringing together various players who cover the broad reaches of big data history and use cases — experts who can answer the questions that swirl around big data, such as:
- What are the common myths about big data?
- How are big data requirements discovered and managed?
- What are the risks of finding, gathering, moving, and “freshening up” so much data?
- How do we make sure big data isn’t used in ways that make it creepy data?
- How does big data impact hardware requirements?
- What are some of big data’s best success stories? Highest value use cases?
- What does big data’s future look like?
The balanced approach
The balanced approach involves cross-technology platforms like HP, Talend, Dell, IBM, and TIBCO, big data pure plays like Datameer, and niche vendors like Fabless Labs, a San Francisco startup that specializes in big data-driven, open-source-fueled recommendation engines. There will even be a hardware manufacturer, QLogic, and editors from both VentureBeat and Big Data Republic.
Understanding the myths and realities takes such a diverse crowd because a true 360-degree view of big data has so many facets. The platform technology companies will certainly focus on the need to integrate the many sources and destinations for data, while a pure play will focus on Hadoop and its ecosystem. A series of panel discussions will keep the conversation honest, as the different perspectives will be aired in a conversation that leaves nowhere to hide.
Getting the architecture right
John West, co-founder and CTO of Fabless Labs and a presenter in October, stressed to me that a lot of the articles and advice being given around big data fail to take into account the kinds of things that can curtail the value of big data projects. As an example, he brought up the challenge around how data is persisted:
Traditional enterprise architectures use virtualized, SAN-type storage like Oracle or EMC… a ton of disks, huge arrays, and serious expense, whereas MongoDB, Hadoop, and other big data tools are geared toward local storage. There are huge performance penalties for using the wrong storage architecture, but most people don’t realize these basic facts. Hadoop prefers commodity storage… in this case, the beer budget data storage defeats the champagne architecture. It’s a myth that you can simply add big data technology to your current infrastructure without considering things like system overhead.
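To make the local-storage point concrete, here is a minimal sketch of how an HDFS data node is typically pointed at plain local disks (JBOD) rather than a single SAN volume. The directory paths are illustrative assumptions, not from the article; the property name is `dfs.datanode.data.dir` in recent Hadoop releases (older releases call it `dfs.data.dir`).

```xml
<!-- hdfs-site.xml (sketch): one data directory per physical local disk.
     HDFS stripes block writes across these directories, getting parallel
     I/O from cheap commodity drives instead of paying for a SAN. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- Hypothetical mount points, one per local drive -->
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
  </property>
  <property>
    <!-- Durability comes from software replication across nodes,
         not from RAID or array-level redundancy -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

The design point is that Hadoop gets its fault tolerance and throughput in software, by replicating blocks across many nodes, so layering it on top of expensive virtualized array storage duplicates cost without adding safety.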
I asked John if big data could make a person pretty, rich, or funny. He chuckled and said, “With enough data, I can tell who would be likely to find you pretty or funny, but the rich part is a tougher challenge to solve.”