Grow a Data Tree Out of the Big Data Swamp

If you don’t know what you want to get out of the data, how can you know what data you need — and what insight you’re looking for?

Big Data. Everyone’s paying for it, collecting it, and talking about it, but what are companies actually doing with it? There’s a lot of data out there, but not too much actionable insight — yet. And that’s the point, isn’t it? Shouldn’t we be using this seemingly unlimited stream of consumer and user data to drive smarter business decisions?

Many companies have turned into pathological data hoarders without much to show for it.

Back in 2015, Forbes projected an average investment of $7.4 million in data-related initiatives. That amount is likely higher now, and nearly 60% of these projects fail, according to Gartner. That is a lot of wasted dollars. And now, naysayers are chirping about the death of big data.

How did this happen? Theory vs. execution. Instead of idyllic “data lakes” (curated, centrally connected, and ripe for pinpoint, targeted analysis), we’ve ended up with “data swamps,” filled to the edges and too muddy to offer any visibility into the insights that we’re all fishing for. But big data isn’t dead; most organizations just haven’t figured out how to manage it.

I even wrote on LinkedIn that the “hoard it all and sort it later” approach is a bit backward. So, I recommend the “data tree” approach (not to be confused with the decision tree, of course). It starts with a thoughtful look at the questions/issues that are keeping you up at night. What are the business problems you’re trying to solve? Next, you must:

  1. Prioritize your business issues
  2. Rank each issue by how difficult it is to assemble the first data breadcrumbs
  3. Identify the highest-value issue that can be solved with the least amount of effort
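The three steps above can be sketched in a few lines of Python. This is only an illustration: the `Issue` class, the 1-to-5 scoring scales, the value-to-effort ratio, and the sample issues are all assumptions made for the example, not something the approach prescribes.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    value: int   # business value, 1 (low) to 5 (high); hypothetical scale
    effort: int  # difficulty of assembling the first data, 1 (easy) to 5 (hard)


def pick_first_target(issues):
    """Return the issue with the best value-to-effort ratio."""
    return max(issues, key=lambda issue: issue.value / issue.effort)


# Made-up issues and scores, for illustration only.
issues = [
    Issue("shipping/delivery delays", value=5, effort=2),
    Issue("customer churn", value=4, effort=4),
    Issue("inventory shrinkage", value=2, effort=1),
]

target = pick_first_target(issues)
print(target.name)
```

A simple ratio is one way to balance value against effort; a weighted score would work just as well if your organization values one side more than the other.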

This approach gives you a path to a “win,” and a better understanding of how to collect and analyze your data. Once you’ve selected a specific target issue, investigate all factors that could have an impact on it. For example, if your big issue is shipping/delivery delays, those factors can be anything from tired drivers and speed limits to weather and traffic patterns. Data for each of those should be relatively easy to assemble and should then be correlated with the delays themselves.
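As a rough sketch of that correlation step, the snippet below ranks a few candidate factors by their Pearson correlation with delivery delay. The factor names and all of the numbers are invented for illustration, and Pearson correlation is just one reasonable choice of measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily observations, for illustration only.
delay_minutes = [5, 20, 12, 45, 8, 30]
factors = {
    "traffic_index": [2, 6, 4, 9, 3, 7],
    "rain_mm": [0, 3, 0, 10, 1, 2],
    "driver_hours": [8, 7, 9, 8, 8, 9],
}

# Strongest relationship first, regardless of sign.
ranked = sorted(
    ((name, pearson(values, delay_minutes)) for name, values in factors.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

With this sample data, traffic comes out on top. Keep in mind that a high correlation flags a factor worth investigating; it does not by itself prove the factor causes the delays.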

Using the same example, you then focus on the factor with the highest correlation (let’s say traffic) and branch out from there: what contributes to that factor? In this case, weather, time of day, location, etc. For each sub-factor, you again either assemble another data breadcrumb or discard it. Just as you would grow and groom a tree, you continue to break down each factor and sub-factor.
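One way to picture the growing and grooming is a small tree structure in which a sub-factor becomes a branch only if it correlates strongly enough with its parent. The `Factor` class, the 0.5 threshold, and the correlation values below are hypothetical, chosen purely to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Factor:
    name: str
    children: list = field(default_factory=list)

    def branch(self, name, correlation, threshold=0.5):
        """Keep a sub-factor as a new branch only if it correlates strongly enough."""
        if abs(correlation) < threshold:
            return None  # discard this breadcrumb
        child = Factor(name)
        self.children.append(child)
        return child

    def render(self, depth=0):
        """Return the tree as indented lines of text."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Correlation values are made up for illustration.
root = Factor("shipping/delivery delays")
traffic = root.branch("traffic", 0.9)
root.branch("speed limits", 0.1)  # below the threshold, so it is discarded
traffic.branch("weather", 0.7)
traffic.branch("time of day", 0.6)
traffic.branch("location", 0.55)

print("\n".join(root.render()))
```

The threshold is the grooming knob: raise it to keep the tree small and focused, or lower it to explore more branches.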

This will ultimately provide a holistic view of every data point that contributes to your main issue — and more importantly, which ones you can control or impact. Best of all, it eliminates the muddiness of data noise, which is extraneous information that doesn’t have an impact on your key business issues.

Once your data tree grows, you’ll end up with big, smart, healthy data.

Putting your issues first is the best way to mine your data for actionable insight — and to get the best return on your investment in big data.

Topics:
big data, data swamp, data tree

Published at DZone with permission of Wolf Ruzicka. See the original article here.

Opinions expressed by DZone contributors are their own.
