This post was originally authored by Nati Shalom.
Most of the Big Data topics I have covered in the past focused on the technical aspects of running Big Data.
In this post, I want to take a different route and discuss the business challenges based on a panel discussion that I had the pleasure of moderating.
Is Big Data a Big Hype?
One of the first questions raised in the discussion was whether Big Data is over-hyped due to its overuse in marketing content. Many of those who claim to provide a Big Data solution, or to implement Big Data, are simply using the term to re-brand their product or offering.
There was consensus across the panel that while Big Data is probably over-hyped, like any big trend, it does represent a real change in both technology and business. Let me explain:
In a pre-Big Data world, large-scale data lived in expensive, closed data warehouses, and many of the solutions were tied to specific high-end hardware. In addition, the term “warehouse” itself implied that most of the data had little exposure to, and impact on, the business. Today, Big Data is everywhere and can be handled at a fraction of the cost of a traditional data warehouse. With the addition of the cloud, the barrier to running Big Data has become significantly lower, and now every startup can run Big Data at almost the scale of Google.
Having said that, actual adoption of Big Data varies quite significantly by industry. According to a recent Hitachi Data Systems Corporation (HDS) study, the telecommunications (67%), consumer goods (57%) and financial services (52%) industries lead in recognizing that Big Data can greatly improve their understanding of customer needs. The same study indicates that more than 60% of the firms in the financial services and consumer goods industries haven't started any Big Data programs. Healthcare and life sciences lag further behind: 72% of them haven't started any Big Data programs. This means that whether Big Data is perceived as real or as marketing hype also varies by industry.
How do you make sense out of your Big Data? Do we need a new role for Chief Data Officer?
Many of the questions in the panel had to do with how to make sense of the data.
One common strategy for making sense of the data is to appoint a data scientist or Chief Data Officer and expect them to deal with the data and deliver useful insights. Another common strategy is to put all the data in something like Hadoop, in the hope that having the data at hand will yield a variety of interesting insights that change the way we operate.
Putting the data in Hadoop or assigning a specialized data analyst is often just the beginning of a process.
Eyal Veitzman from Wix described how they deal with this challenge. To really make sense of your data, you first need to change the organizational structure so that the business people, developers and data analysts are part of the same group. That way you can turn insights into action and, more importantly, get a faster feedback loop that allows you to iterate quickly.
What is the business value behind Big Data?
The business value behind Big Data comes in different shapes and forms depending on where you apply it in your system. Insightera uses Big Data to capture user interactions and increase user conversion through personalized responses. Wix uses Big Data to optimize user conversion and satisfaction. Nice Systems uses Big Data to improve customer satisfaction in its call centers. PayPal uses Big Data in all sorts of shapes and forms, both to detect fraud and to optimize its operations. Check Point uses Big Data to collect data from all of its gateways and thereby keep them protected from new virus attacks.
So how much can we gain from applying Big Data practices in our organization?
According to the Hitachi Data Systems Corporation (HDS) study, Big Data can help increase companies' revenue by 25%. This applies mostly to companies that use Big Data to optimize their internal operations and products, like most of the companies on the panel. It is obviously even more true for companies that sell Big Data insights as part of their business, as in the case of Insightera.
Is there a good visualization tool for Big Data?
Big Data visualization is a tough challenge, and there is no simple answer or single framework that fits all cases. An infographic style can help by capturing lots of data in a single, readable report using a variety of visualization dimensions, such as fonts, colors and graphics. There are many open source frameworks, such as Jaspersoft, and many SaaS-based frameworks that aim to provide specialized reporting and visualization tools. A useful reference in that regard is The 36 best tools for data visualization.
Having said that, the important thing is to decide what to measure and how. Quite often, the number of Key Performance Indicators (KPIs) must be limited to a fairly small set.
The second important thing is to realize that as humans, we are not designed to capture and process vast amounts of information and therefore, visualization tools provide only a small aid to a much bigger challenge.
In summary, Big Data visualization is a dangerous slippery slope: fairly quickly we can flood our users with so many graphs and numbers that we reach the point of diminishing returns, producing more complexity and confusion than clarity. Anyone who has used analytics tools such as Google Analytics is probably familiar with this. On the other hand, narrowing the indicators down to a small number of KPIs makes the visualization challenge much simpler.
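To illustrate what deciding what to measure might look like in practice, here is a minimal Python sketch that reduces raw visit records to just two KPIs. The `Visit` record, its fields, and the choice of conversion rate and revenue per visit are hypothetical assumptions for illustration, not metrics named by the panel.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical raw record: one row per website visit."""
    user_id: str
    signed_up: bool
    revenue: float


def compute_kpis(visits: list[Visit]) -> dict[str, float]:
    """Reduce raw visit data to a deliberately small, fixed set of KPIs."""
    total = len(visits)
    if total == 0:
        return {"conversion_rate": 0.0, "revenue_per_visit": 0.0}
    signups = sum(1 for v in visits if v.signed_up)
    revenue = sum(v.revenue for v in visits)
    return {
        "conversion_rate": signups / total,
        "revenue_per_visit": revenue / total,
    }


visits = [
    Visit("u1", True, 20.0),
    Visit("u2", False, 0.0),
    Visit("u3", True, 10.0),
    Visit("u4", False, 0.0),
]
print(compute_kpis(visits))  # {'conversion_rate': 0.5, 'revenue_per_visit': 7.5}
```

Everything else in the raw data is deliberately left out of the report, so adding a KPI becomes an explicit decision rather than a side effect of having the data available.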
Moving from Reports to Actions
The key to capturing most of the value of Big Data is to turn it into actions. In the case of Insightera, action means presenting a user with the most relevant information on a website. In the case of Nice Systems, action means routing a caller in a call center to the right agent based on the caller’s profile and history of behaviour. In the case of GigaSpaces, it means constantly adjusting our provisioning system to detect failures and auto-heal a production environment when something goes wrong.
To get to that level, we need to apply automation practices to how we collect and analyze our data and execute actions based on it. We also need to be able to process data in real time to reduce the time to action.
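As a rough illustration of automating the collect, analyze and act steps, the following Python sketch routes a caller to an agent based on their interaction history, loosely modeled on the call center example. The `CallerProfile` type, the topic labels and the agent names are all hypothetical; this is not how Nice Systems actually implements routing.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CallerProfile:
    """Hypothetical per-caller state built from past interactions."""
    caller_id: str
    history: list[str] = field(default_factory=list)


def collect(profile: CallerProfile, topic: str) -> None:
    """Collect: record the topic of the caller's latest interaction."""
    profile.history.append(topic)


def analyze(profile: CallerProfile) -> str:
    """Analyze: return the caller's most frequent topic (ties go to the earliest seen)."""
    if not profile.history:
        return "general"
    return Counter(profile.history).most_common(1)[0][0]


def act(topic: str, agents: dict[str, str]) -> str:
    """Act: route to the agent for that topic, falling back to the general queue."""
    return agents.get(topic, agents["general"])


agents = {"billing": "agent-billing", "tech": "agent-tech", "general": "agent-general"}
caller = CallerProfile("c-001")
for topic in ["billing", "tech", "billing"]:
    collect(caller, topic)
print(act(analyze(caller), agents))  # agent-billing
```

None of the three steps requires a human in the loop, which is what allows the time from data to action to shrink.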
Final words: How do we make our data instantly actionable?
It is quite clear that the end game of any Big Data initiative is to find a way to make the data instantly actionable. One of the big challenges in that regard is that quite often we need to compromise between scale and speed. In fact, many Big Data systems (Hadoop included) use batch processing to handle large amounts of information, meaning that the insights, and thus the actions, derived from that data become available only when the batch cycle has completed.
To make Big Data instantly actionable we need to figure out a way in which we can process vast amounts of data in real time and still keep the cost down.
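To make the batch-versus-real-time trade-off concrete, here is a minimal Python sketch contrasting the two styles on the same stream of (user, amount) events. The event format and the alert threshold are hypothetical; the point is only that the batch result exists after all events are consumed, whereas the streaming version can trigger an action at the very event that crosses the threshold.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def batch_totals(events: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Batch style: totals become available only after every event is consumed."""
    totals: dict[str, float] = defaultdict(float)
    for user, amount in events:
        totals[user] += amount
    return dict(totals)


class StreamingTotals:
    """Streaming style: state is updated per event, so an action can fire immediately."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical alert threshold
        self.totals: dict[str, float] = defaultdict(float)
        self.alerted: set[str] = set()
        self.alerts: list[tuple[int, str]] = []  # (event index, user)

    def on_event(self, index: int, user: str, amount: float) -> None:
        self.totals[user] += amount
        if self.totals[user] > self.threshold and user not in self.alerted:
            self.alerted.add(user)
            self.alerts.append((index, user))  # the "action" happens here


events = [("alice", 40.0), ("bob", 10.0), ("alice", 70.0), ("bob", 5.0)]

print(batch_totals(events))  # {'alice': 110.0, 'bob': 15.0}

stream = StreamingTotals(threshold=100.0)
for i, (user, amount) in enumerate(events):
    stream.on_event(i, user, amount)
print(stream.alerts)  # [(2, 'alice')]
```

Here the streaming alert fires at event index 2, while the batch result only exists once the last event has been read. With a real batch cycle, that gap is the length of the whole cycle rather than a single event.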
During the panel discussion, PayPal described its use of an In-Memory-based architecture to handle this challenge. Still, pure In-Memory-based solutions tend to be fairly expensive and limited in capacity compared to disk-based alternatives. So the next big thing, IMO, is to devise a way to get In-Memory speed at a cost that is closer to that of disk. This is where new advancements in SSD have the potential to disrupt the entire way in which we manage and process Big Data, by providing high-capacity, durable, SSD-based RAM that can reach terabytes of capacity per machine. Durability means that we could rely entirely on In-Memory databases to manage and store larger chunks of our Big Data, with no need to offload data to, or reload it from, external disks. I also expect this to lead to a new generation of In-Memory databases designed primarily for these new durable RAM devices.
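The offload/reload cost that durable, high-capacity memory would remove can be sketched with a hypothetical two-tier key-value store: a bounded in-memory LRU tier in front of a larger, slower tier. In this Python sketch a plain dict stands in for the disk tier, and the `TieredStore` name and its counters are illustrative assumptions, not any particular product's API.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier key-value store: a bounded in-memory LRU tier
    in front of a larger, slower tier (a plain dict stands in for disk)."""

    def __init__(self, memory_capacity: int) -> None:
        self.memory_capacity = memory_capacity
        self.memory: OrderedDict[str, bytes] = OrderedDict()
        self.disk: dict[str, bytes] = {}  # stand-in for an SSD- or disk-backed tier
        self.offloads = 0  # values pushed out of memory to the slow tier
        self.reloads = 0   # values pulled back from the slow tier into memory

    def put(self, key: str, value: bytes) -> None:
        self.disk.pop(key, None)  # drop any stale copy in the slow tier
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict_if_needed()

    def get(self, key: str) -> bytes | None:
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:
            value = self.disk.pop(key)
            self.reloads += 1
            self.put(key, value)  # may push another key out to the slow tier
            return value
        return None

    def _evict_if_needed(self) -> None:
        while len(self.memory) > self.memory_capacity:
            old_key, old_value = self.memory.popitem(last=False)  # least recently used
            self.disk[old_key] = old_value
            self.offloads += 1


store = TieredStore(memory_capacity=2)
for key, value in [("a", b"1"), ("b", b"2"), ("c", b"3")]:
    store.put(key, value)
print(store.offloads)  # 1 ("a" was pushed to the slow tier)
print(store.get("a"), store.reloads, store.offloads)  # b'1' 1 2
```

Every offload and reload is a round trip to slower storage. If memory were durable and large enough to hold the working set, the slow tier and both counters would, in principle, stay unused.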
About the Panel
The Big Data Business Challenges panel was led by the CICC in collaboration with IGTCloud and focused primarily on the business side of Big Data. The event was held simultaneously in California and Israel and was hosted by eBay. The panel included industry experts in Big Data from a wide range of companies and industries: Allen Kamer, Chief Commercial Officer, Optum Analytic (co-founder of Humedica, which was acquired by Optum Health); Doron Simon, Head of Cloud, Business Development, Nice Systems; Mickey Alon, Co-Founder & CEO, Insightera (acquired by Marketo); Dr. Nachum Shacham, Principal Data Scientist, PayPal; Eyal Veitzman, Director of Business Operations, Wix; and Ron Davidson, Head of Security Services, Check Point.
I had the special honor of organizing and moderating the panel together with Ido Sarig, General Manager, IoT Solutions Group at Wind River, who moderated the panel in California.
The usual factors: the same data explosion that created the urgency for Big Data is also generating demand for making the data instantly actionable. Bandwidth, commodity hardware and, of course, declining memory prices are further forcing the issue: Fast Data is no longer limited to specialized, premium use cases for enterprises with infinite budgets. Not surprisingly, pure in-memory databases are now going mainstream: ..
..Potential use cases for Fast Data could encompass:
- A homeland security agency monitoring the borders requiring the ability to parse, decipher, and act on complex occurrences in real time to prevent suspicious people from entering the country
- Capital markets trading firms requiring real-time analytics and sophisticated event processing to conduct algorithmic or high-frequency trades
- Entities managing smart infrastructure which must digest torrents of sensory data to make real-time decisions that optimize use of transportation or public utility infrastructure
- B2B consumer products firms monitoring social networks, requiring real-time response to understand sudden swings in customer sentiment
The 36 best tools for data visualization: a useful reference for popular visualization frameworks.