What’s Wrong With Big Data
Thanks to big data and the cloud, the power of supercomputers is for the taking. What gets lost in the mix is that the tools we use to analyze and apply big data usually have a fatal flaw.
Big data may be the technology getting all the buzz nowadays, but that does not mean it is infallible. Big data has wreaked havoc in many situations, yet the exact reasons are not always clear. The culprit could be false positives, technical glitches, a lack of tools, or shabby, incorrect, or even unnecessary data.
Needless to say, any of these errors can produce results that differ sharply from what you were expecting. To make matters worse, the results are sometimes never checked, which can lead to some unpleasant consequences.
Flaws of Big Data
Thanks to big data and the cloud, the power of supercomputers is anybody's for the taking. However, what gets lost in the mix is that the tools we use to interpret, analyze, and apply this tsunami of information usually have a fatal flaw. Most of the data analysis we conduct is based on flawed models, which means that mistakes are inevitable. And when our overblown expectations exceed our capacity, the consequences can be dire.
If big data were not so ginormous, this would not be such a problem. Given the sheer volume of data we have, even flawed models can occasionally produce useful results. The issue is that we often mistake those results for omniscience. We are enamored with our own technology, but when the models go haywire, it can get pretty ugly, especially because the mistakes tend to be as large as the data behind them.
Examples of Big Data Failures
Perhaps the largest and best-known big data flop came in 2013 with Google Flu Trends. Google launched this service in 2008 with the goal of predicting flu outbreaks in 25 countries. The logic was simple: analyze Google search queries about the flu in a given region, then compare the volume of those queries with the historical record of flu activity in that geographical area. Based on this comparison, the activity level was classified as low, medium, high, or extreme.
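For illustration, here is a minimal Python sketch of that pipeline for a single region. It is not Google's model: the straight-line fit between query share and recorded flu activity, the function names (`fit_line`, `classify_activity`), and the cut-off values are all assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical level names in ascending order; the cut-offs between them are
# supplied by the caller because the article does not say what Google used.
LEVELS = ["low", "medium", "high", "extreme"]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def classify_activity(query_share, history_queries, history_activity, cutoffs):
    """Estimate flu activity from this week's flu-related query share and
    map the estimate to a level.

    query_share      -- fraction of this week's searches that mention the flu
    history_queries  -- query shares for past weeks in the same region
    history_activity -- recorded flu activity for those same weeks
    cutoffs          -- three ascending activity values separating the levels
    """
    a, b = fit_line(history_queries, history_activity)
    estimate = a + b * query_share
    # bisect_right counts how many cut-offs the estimate has reached or passed.
    return estimate, LEVELS[bisect_right(cutoffs, estimate)]


# Illustrative numbers only: four past weeks for one region.
estimate, level = classify_activity(
    query_share=0.035,
    history_queries=[0.01, 0.02, 0.03, 0.04],
    history_activity=[1.0, 2.0, 3.0, 4.0],
    cutoffs=[1.5, 2.5, 3.8],
)
print(round(estimate, 2), level)  # 3.5 high
```

Because the estimate depends only on how often people search, anything that raises flu-related searches without raising actual flu cases pushes the estimate, and the level, upward.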
At first glance, this may seem like a cool idea, but in reality, it did not work. At the height of the 2013 flu season, Google Flu Trends failed miserably: its estimates were off by an astounding 140%. The algorithm was flawed and did not take several factors into account. For example, people searching for words such as “cold” or “fever” were not necessarily looking up flu symptoms; they could have been searching for other seasonal illnesses. Unfortunately for Google Flu Trends, it never recovered from this disaster, which ultimately led to its demise in 2015.
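The 140% figure is a relative error: the gap between the estimate and reality, divided by the real value. The Python sketch below uses made-up numbers and assumes, for simplicity, one search per sick person. It shows how searches for other illnesses can inflate a naive query-count estimate to 2.4 times the true value, a 140% overshoot. It illustrates the arithmetic only and is not a reconstruction of what Google's model actually did.

```python
def percent_error(predicted: float, actual: float) -> float:
    """Signed error of an estimate relative to the observed value, in percent.
    Positive means the estimate was too high."""
    return (predicted - actual) * 100.0 / actual


# Hypothetical weekly numbers, chosen only to illustrate the arithmetic.
searches_by_flu_patients = 1_000    # searches from people who really have the flu
searches_for_other_illness = 1_400  # "cold"/"fever" searches with other causes

# A naive model that counts every matching search as a flu case
# (simplifying assumption: one search per sick person).
estimated_cases = searches_by_flu_patients + searches_for_other_illness
actual_cases = searches_by_flu_patients

print(f"{percent_error(estimated_cases, actual_cases):.0f}%")  # 140%
```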
Reasons Why Big Data Projects Fail
The unmitigated disaster that was Google Flu Trends is far from the only one. It is not possible to list every big data blunder over the years; however, it is important to analyze the failures so we can learn our lessons and avoid repeating our mistakes in the future. Some of the reasons big data projects fail are the following.
Lack of Data Governance and Data Management
Very often, organizations do not fully understand the data they already have, yet they still decide to undertake new projects based on it. There is a lack of documentation, storage, policies, and other procedures for handling data. It is a good idea to turn to a big data consulting company for a clear roadmap and instructions on how to handle the data you already have, and only after that to tackle the challenges of big data.
Undefined Goals and Strategies
There is a lot of IT jargon and marketing terminology out there, and it can be difficult to make sense of all this white noise. Furthermore, there are a lot of big data products on the market, and choosing the right one is genuinely difficult. Before you decide on anything, figure out which services and technologies you need to accomplish your goals. “Do small data on big data”: in other words, evaluate your big data architecture on small amounts of data to make sure you choose the right products.
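One way to act on “do small data on big data” is to draw a small, uniformly random sample from a large dataset and try candidate tools against it before committing. The Python sketch below uses reservoir sampling, a standard technique we chose for this example rather than one the article prescribes, so the full dataset never has to fit in memory. The function name `reservoir_sample`, the `seed` parameter, and the generated rows are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick k items uniformly at random from a stream of unknown length
    (reservoir sampling, "Algorithm R"), using memory proportional to k."""
    if k <= 0:
        return []
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = item  # keeps item i with probability k / (i + 1)
    return sample


# Hypothetical stand-in for a large table: a generator, so it is never
# materialized in memory all at once.
rows = (f"row-{n}" for n in range(100_000))
small = reservoir_sample(rows, 1_000, seed=42)
print(len(small))  # 1000
```

The resulting sample can then be fed through each candidate tool or architecture; if a product struggles or produces confusing results on a thousand rows, it is unlikely to do better on a billion.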
It’s All Greek to Me
Data science and big data are a complex combination of domain knowledge, mathematical and statistical expertise, and programming skills. At the same time, they must make business sense. Often, the IT department makes changes that management does not understand, and vice versa. Make sure that your big data actions make sense to both IT and business leaders by building a bridge between the two in the big data project. Business people should be deeply involved in every stage of a big data project.
Too Big Too Soon
When you first start implementing big data projects, there are a lot of undefined factors, such as budget, technologies, and courses of action. A big project launched right away is likely doomed to fail. Instead, opt for a small project and measure its success (or lack thereof) incrementally. This way, if something goes wrong, you will notice it immediately and can make the necessary adjustments before it dooms the project. A good way to benchmark your progress is to create prototypes or proofs of concept that validate the work you have accomplished. There is no point in advancing to the next stage of a project if the earlier stages are flawed.
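The incremental approach can be sketched as a sequence of stage gates: each stage (a prototype, a proof of concept, and so on) yields a measurable score, and the project moves on only if that score meets a threshold agreed in advance. The `Stage` class, the `advance_through` function, the scores, and the thresholds below are hypothetical; what the score measures (accuracy, latency, cost) would depend on the project.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One increment of a project: some work plus the bar it must clear."""
    name: str
    run: Callable[[], float]  # performs the work and returns a measured score
    threshold: float          # minimum score required before moving on


def advance_through(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first one whose score misses its
    threshold so later stages never build on a flawed earlier one.
    Returns the names of the stages that passed."""
    passed: List[str] = []
    for stage in stages:
        score = stage.run()
        if score < stage.threshold:
            print(f"Stopped at {stage.name!r}: score {score} < {stage.threshold}")
            break
        passed.append(stage.name)
    return passed


# Hypothetical scores; a real project would measure them.
plan = [
    Stage("prototype on a data sample", lambda: 0.92, threshold=0.90),
    Stage("proof of concept", lambda: 0.75, threshold=0.85),
    Stage("production rollout", lambda: 1.00, threshold=0.95),
]
print(advance_through(plan))  # ['prototype on a data sample']
```

Here the proof of concept misses its bar, so the production rollout is never attempted; the team adjusts the earlier work instead of scaling up a flawed design.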
Lack of IT Talent
Finding and hiring the people you need to successfully complete a project is a daunting task, but the people handling your data are a vital component of the overall project. Moreover, they must be well-versed in new technologies, which is a challenge given the fast-paced IT environment.
A common theme in the list above is that no matter how much we want to focus on the data, people keep getting in the way. Even though we want data to rule the decision-making process, people ultimately rule the big data process. This includes basic decisions such as which data to collect and keep, and which answers to seek from big data.
Innovation via Iteration
Many organizations feel constrained when they decide to undertake a big data project, which is why it is vital to take an iterative approach. Organizations should find ways to set their employees free to experiment with data. The “start small, fail fast” approach is helped by the fact that most significant big data technology is open source. In addition, many platforms are immediately and affordably accessible as cloud services, lowering the barrier to trial and error even further.
Big data is all about asking the right questions, so relying on existing employees is critical. However, even with superior domain knowledge, organizations will not collect all the necessary data or ask the right questions from the very beginning. Such failures should be accepted and expected.
Since the early stages of your big data project can make or break the entire effort, this is where the advice of big data consultants can really pay off. They can advise you on how to create prototypes and proofs of concept, benchmark your efforts, design a microservice architecture, and migrate to new technologies. It is important to employ a flexible, open data infrastructure that lets employees constantly modify and refine their approach until they reap the fruits of their labor. This way, organizations can eliminate fear and iterate toward the effective use of big data.
Published at DZone with permission of Maxim Tereschenko.
Opinions expressed by DZone contributors are their own.