How to Avoid the Big Data Black Hole
It takes a lot of willpower, in our data-obsessed world, to say “too much!” However, there are many ways in which too much information destroys productivity and actually leads to bad decisions, not good ones. Still, it is hard to ignore the world of opportunities that data collection and analysis have opened up. So how do you balance the two? The first step is to understand that there is a big difference between data collection and its utilization. The difference seems subtle, but it is key, and utilization is where many make mistakes.
Many treat data collection and data utilization as the same thing, or at least as very similar.
Collection merely means the storage and analysis of data by automated systems, which may never be viewed by a human eye or mind. The human cost comes when you spend time accessing and understanding the data collected.
“When information is cheap, attention becomes expensive.” ― James Gleick
New tools mask human effort by taking raw data (in hard-to-consume formats) and making it much easier to understand and, more importantly, to query. This makes users feel like they need to look at it all, and that the UI is saving them all the time they need. But in most cases we did not previously collect this information, let alone analyze it. So the opportunity is greater, but any effort is net new.
When I get in front of a great log management tool, I sometimes get the same feeling as when I get addicted to a new iPad game, crushing candies or tending farms. There is a sort of gamification to digging: if I just dig a little deeper, I will reach the goal. Next thing you know, you have spent three hours without any results.
“Information is not knowledge and knowledge is not wisdom.” ― James Gleick
It is a people problem too
It takes effort first to understand, and then to act on, the information we receive. But the enthusiasm for learning more, and the belief that “information is power,” creates a sort of blindness: one that skirts the need to weigh the effort we spend analyzing data against the results (decisions, action items) that come from it. To make things worse, the amount of data it takes to make one decision is often far more than we realize.
So, what is the impact of spending too much time trying to utilize the new pile of information?
- Time suck: Being pulled away from daily tasks by asking the platform a question that takes far more time than expected.
- Dwelling on details that do not impact the business: Leading to zero actionable results.
- The anchoring effect: You may be familiar with the psychological anchoring effect, where new decisions are based on an existing foundation of positions and opinions. With all the data in your log analysis platform, if you are not aware of anchoring, you will ask the questions that produce the answers you seek.
- “That is what the numbers say…”: Platforms that only produce time-series dashboards and numbers lead to the habit of using this data as a default answer to all questions, and as a deflecting mechanism. It’s easy to put the decisions on the system instead of on the people who were previously responsible for them.
These are not light topics, I know, but I have seen time and time again how the abuse of data, and the time wasted on collecting it, eats organizations from within.
Does the tool save your time or steal your eyeballs?
Much of this situation comes down to the tools. Tools can be good or bad at surfacing information. All the latest APM, log analysis, error logging, and usage analytics tools have ways for you to dig in deep, putting the burden of deciding when enough is enough on the user, who is often too absorbed to keep track.
In fact, many of these tools encourage you to go deep, and you eventually spend hours sifting through information that provides no real value, but perhaps a lot of little “oh, that is interesting” moments. Those little hits are just like pulling the lever on a casino slot machine. If you have to become specialized to use the platform, and make it a large portion of your daily effort, you are committed. After all, churn is a major concern for tech companies. But should it be?
“Business is not just doing deals; business is having great products, doing great engineering, and providing tremendous service to customers.” – Ross Perot
A great tool should help you get the data, make decisions (or prepare them for the expanded team), and move on.
It should keep you out of the data black hole and guide you to what is most useful. The tool should also put a layer between you and the data, rather than encourage diving in too deep, which we know can happen quickly.
Here are some ways to avoid these problems:
- Hire a dedicated analysis person: Definitely a nice-to-have. But really, this is just a workaround for a bigger problem.
- Ask discrete questions: If you keep your questions simple and small, then you can move faster. But we all know that there are bigger questions we want answered.
- Plan your logs in advance for utilization, not collection: If you think about how you want to use the information, instead of just gathering data, you will make the analysis go much faster. In a crunch, this seems like a wasted effort.
- Focus: There are a lot of “fun” questions you can ask. But time is better spent on questions decided on in advance and related to current initiatives and activity. At the same time, wouldn’t it be nice for the data to tell you what is interesting without you having to predict it in advance?
- FIND THE RIGHT TOOL! Many analysis tools only help “send information and put it in a nice format,” aka data collection. While this is a must, the focus of the tool should be on utilization, and this should be the focus of its roadmap.
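As a rough illustration of planning logs for utilization rather than collection, here is a minimal Python sketch. It assumes the team has agreed in advance on one question ("how often do checkouts fail?") and tags each log record with it. The names `make_event`, `checkout_failure_rate`, and the `question`, `status`, and `order_id` fields are hypothetical, invented for this example, and not taken from any particular logging tool.

```python
import json


def make_event(question, **fields):
    """Build a JSON log line tagged with the question it is meant to answer."""
    return json.dumps({"question": question, **fields}, sort_keys=True)


def checkout_failure_rate(lines):
    """Answer the pre-agreed question directly from the tagged log lines."""
    events = [json.loads(line) for line in lines]
    relevant = [e for e in events if e["question"] == "checkout_failures"]
    if not relevant:
        return 0.0
    failed = sum(1 for e in relevant if e["status"] == "failed")
    return failed / len(relevant)


# Hypothetical sample data: four checkout attempts, one of which failed.
log_lines = [
    make_event("checkout_failures", status="ok", order_id=1),
    make_event("checkout_failures", status="failed", order_id=2),
    make_event("checkout_failures", status="ok", order_id=3),
    make_event("checkout_failures", status="ok", order_id=4),
]

rate = checkout_failure_rate(log_lines)
print(f"checkout failure rate: {rate:.0%}")
```

Because each record already names the question it serves, the analysis is a short filter and a count, not an open-ended dig through raw text.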
Find it, act on it, move on
Part of creating this value layer is discovering what is happening, but also what is not happening, and what is out of the ordinary. Watching the flow of data can be mesmerizing, but decisions are made on things that are not consistent with the steady state of systems and applications: sudden changes, or things that normally occur in the system but, for some reason, recently did not.
Data collection should be synthesized into meaningful events. Users should get hooked on a platform because of the quality and frequency of the decisions it enables, rather than being encouraged to spin the wheel to see what happens until the tool becomes a fifth limb.
First, admit to yourself that there is too much information, more than you could ever analyze on your own. And then find a tool that takes over analysis, and only shows you what is valuable so that you and your team can get on with your day.
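To make the idea of surfacing what is not happening, or what departs from the steady state, a little more concrete, here is a small Python sketch. It is only one possible approach, assuming hourly event counts are available and that a simple standard-deviation threshold is good enough. The function name `find_unusual`, the event names, and the threshold of 3.0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_unusual(baseline, current, threshold=3.0):
    """Compare current counts to a baseline and report only the exceptions.

    baseline: dict mapping event name -> list of past hourly counts
    current:  dict mapping event name -> count for the hour under review
    """
    findings = []
    for event, history in baseline.items():
        observed = current.get(event, 0)
        avg = mean(history)
        spread = stdev(history) if len(history) > 1 else 0.0
        if observed == 0 and avg > 0:
            # Something that normally occurs did not occur at all.
            findings.append((event, "missing"))
        elif spread == 0:
            # Perfectly steady history: any difference is a departure.
            if observed != avg:
                findings.append((event, "out of normal"))
        elif abs(observed - avg) / spread > threshold:
            findings.append((event, "out of normal"))
    return findings


# Hypothetical data: the backup never ran and server errors spiked.
baseline = {
    "nightly_backup": [1, 1, 1, 1],
    "login": [100, 110, 90, 105],
    "error_500": [2, 3, 2, 3],
}
current = {"login": 102, "error_500": 40}

findings = find_unusual(baseline, current)
for event, reason in findings:
    print(f"{event}: {reason}")
```

The steady stream of logins is never shown; only the missing backup and the error spike reach a person, which is the kind of layer between you and the data that the article argues for.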
Published at DZone with permission of Trevor Parsons, DZone MVB. See the original article here.