Originally written by Mat Mathews at the Plexxi blog.
I spent last week at the Strata + Hadoop World Conference in New York City with 5,000 other “big data” customers, vendors, and enthusiasts. In the last six months we’ve seen demand for “big data”-based network infrastructure really start to take off, and I’ve spent a lot of time recently trying to better understand the evolving market, the technology landscape, and the use cases. I’m particularly interested in how network infrastructure can drive a better experience for users of big data applications and for the networking/infrastructure teams that need to support them. Ultimately, though, I want to know what businesses get out of these investments in data, analytics, and infrastructure.
Hadoop World was a great experience. As a relative newbie to Big Data, I have a lot to learn, and this was a great place to soak up actual customer use cases. While there was certainly plenty of feel-good hyperbole about the “making the world a better place” nature of big data (if you haven’t seen HBO’s Silicon Valley, please watch it!), that was more than offset by real-world details of how data is being used to solve day-to-day business problems. Here’s a quick synopsis of some of my personal highlights:
- Finance People have their Sh*t Together: In “Why Marketing Suck at Big Data” Jennifer Zeszut (@jenniferland) from Beckon pleaded with marketers to learn from other functional areas that actually make formal use of data in pretty structured ways (e.g. finance people). Her message was a bit contrarian to the whole Big Data notion of data exploration – she talked about structuring the data on the way in, storing only the data that matters, and avoiding the data “spelunking” approach. She used some great examples of what finance people would never do (like throw all receipts into one big data warehouse without any input classification!).
- Data is Intrinsically Worthless: In “Do you Know What Your Data is Worth?” Brian d’Alessandro from Dstillery (@delbrians) talked about how you can easily double your data, which often doubles the investment (the cost to acquire the data) but doesn’t double the benefit. He captured the “Value of Data (VOD)” in a handy equation: the value of an application with the data minus the value of the same application without it. A key lesson I learned here was that data has no intrinsic value – rather, its value is tied to the applications and actions derived from it.
- Surprise leads to Innovation: In “The Sound of Data Silence” Jana Eggers from Nara Logics (@jeggers) talked about how to better listen to the non-obvious signals in the data. She gave some very practical exercises on how to do this, including going beyond the ‘show me state’ to the ‘curiosity state’ – being hyper curious by channelling your inner Steve Jobs, and remembering not to rationalize away surprises in the data, since those surprises are the true source of innovation.
- The Data Natives Generation: In “The Future of Data” Kim Rees of Periscopic (@krees) gave a fascinating talk on just that – the future of data. To demonstrate how data is much more scalable than algorithms, she used an example of a robot that crowdsources its knowledge of how to handle objects from other robots’ data. Then she led us to the “data natives” phenomenon. Using kids’ familiarity with gadgets as an analogy, the term describes how we’ll very quickly have a generation that is born with the universe of data at its fingertips and, from birth, will never need to remember or figure things out. This marks a new state in our evolution.
- The Kevin Bacon Game for Banking: In “How Goldman Sachs is Using Knowledge to Create an Information Edge” Peter Ferns talked about the GS “Big Graph” application and how it is used to build a relationship graph of people, legal entities, organizational entities, transactional data, and banking transactions like M&A. He then detailed how this information is used for compliance (surveillance, investigations and analytics), information security, technology infrastructure management, and customer relationship management. The best thing is that they have made this information public at http://www.gs.com/engineering!
- Big People: In “Building with Data: Lessons from Etsy” Nellwyn Thomas talked about how she built the data organization at Etsy. She covered the three groups within the “Data Org”: the data engineering/Hadoop team, the data science team, and the analysts. Then she zoomed in on the specific skill sets that are required for analysts – not just analytical skills (which she defined as the ability to understand the problem and the opportunity), but also math/statistics skills (to understand the data), technical skills (to get, parse, and visualize the data), and communication skills (to communicate what matters, and more importantly to not communicate what doesn’t matter).
- Bad Recommendations Make Angry Customers: In “Learning About Music and Listeners” Brian Whitman (@bwhitman) from Spotify gave what was probably my favorite session of the week. He talked about the company he started, The Echo Nest (which was acquired by Spotify), and the work they are doing to move beyond simple collaborative filtering engines (the ‘other customers bought this’ type). Spotify’s goal is to make users loyal by encouraging discovery, understanding that giving bad recommendations really makes music fans angry! They are doing this with content-based recommendations. He talked about the progress they have made in helping computers understand enough about music to make recommendations, and the obvious but almost always overlooked basic human fact that we can have different “modes” that determine what we might want to listen to! He also went through some fascinating analysis derived solely from usage data – like predicting political affiliations. Pretty cool stuff. Best of all was the fact that the entire Echo Nest API and a million-song data set are available for anyone to use for research purposes on http://developer.spotify.com.
- Knowledge is Dangerous: Last but not least, the venerable Shankar Vedantam (@HiddenBrain), author of “The Hidden Brain” and the Social Science Correspondent on NPR, warned us all that more data doesn’t necessarily make us smarter or better. In fact, what the data shows is that the more knowledge we have, the more we amplify our existing biases into stronger positions, because we are ultimately really good at cherry-picking what we want to believe. It was a bit sobering given the tone of the conference, but a practical message nonetheless.
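The “Value of Data” idea from the Dstillery talk can be made concrete with a few lines of Python. This is only a minimal sketch of the description above, not the equation as it was presented in the session: the function names, the subtraction of acquisition cost, and every number below are hypothetical and chosen purely for illustration.

```python
def value_of_data(value_with_data: float, value_without_data: float) -> float:
    """Incremental value the data adds to an application (the "VOD" idea)."""
    return value_with_data - value_without_data


def net_value(value_with_data: float, value_without_data: float,
              acquisition_cost: float) -> float:
    """VOD minus what it cost to acquire the data (an added assumption)."""
    return value_of_data(value_with_data, value_without_data) - acquisition_cost


# Hypothetical figures, in arbitrary units.
baseline = 100.0        # application value with no data at all
value_1x = 160.0        # application value with the original data set
value_2x = 180.0        # application value with twice the data
cost_1x = 30.0          # cost to acquire the original data set
cost_2x = 60.0          # doubling the data doubles the cost

net_1x = net_value(value_1x, baseline, cost_1x)
net_2x = net_value(value_2x, baseline, cost_2x)

print(f"VOD with 1x data: {value_of_data(value_1x, baseline)}, net: {net_1x}")
print(f"VOD with 2x data: {value_of_data(value_2x, baseline)}, net: {net_2x}")
```

With these made-up numbers, doubling the data raises the VOD but lowers the net value, which is the point of the talk: the data is only worth what the application does with it.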
This is just a summary of some of the great content at the conference, and I’m leaving out a great deal. I also spent a ton of time in our booth talking to some really fascinating customers and learning about what they are doing (and of course doing a bit of selling). The bottom line is that Big Data is not only big, but it has real, broad applications beyond the typical web crawling for the search crowd, including customer profiling for marketing/sales across a variety of industries, content and goods recommendations for eCommerce and online media, fraud detection/compliance for banking, resource allocation/inventory planning for retailers/manufacturers, and of course solving world hunger and making the world a better place!
All of this will ultimately drive new infrastructure designs and decisions as the data sets get larger, the users get more diverse and more demanding, and real-time analysis across many data sets becomes both more feasible and more expected.