
There’s still a big misunderstanding of Big Data


Spending time in London and Paris this week for TIBCO’s Transform Conference was a great reminder that there’s still an enormous swath of our world that doesn’t get Big Data.

Look no further than the leak of the NSA’s PRISM data surveillance project and you’ll see that the officials charged with oversight don’t get it when they pooh-pooh it as “just the metadata.” Talk to a London cabbie who says, “What’s the fuss? I have nothing to hide,” and you’ll immediately realize he doesn’t get it (read the link, it’s provocative). Watch the BBC whipping the Brits into a frenzy over U.S. companies snooping on their conversations and you’ll see they don’t get it.

One British elected official thought the answer was to create a British version of Facebook and Google. Sure…that’ll solve it…good luck with that. 

Not just lay people

The misunderstanding isn’t confined to the U.S. Congress, newsies and London hackies. Just as some are making Big Data sound ominous, business people are confused by the messages of the media and software vendors who would love to make Big Data sound as large and disorganized as possible. This leads to a perception that if data is manageable (and it often can be with the right infrastructure), it can’t be what the world is buzzing about.

Here are some common misperceptions:

  • More data is better – More data is definitely not better and if that data isn’t governed (controlled), it is actually a liability. Securing, cleansing, and refreshing data has a tangible cost.
  • We need data scientists – Just like the arrival of the Web, when it took a team from a consulting company to create a corporate website, the need for data scientists is gradually being overcome by tools that make it far easier to find, visualize and act on enormous data sets. That doesn’t mean über data dinks won’t be valuable, you’re just not dead in the water without them.
  • Everyone has a big data problem – Not even remotely true. Many industries, decades into computerization, still lack the basic infrastructure to capitalize on what they have, much less look to solve problems with Hadoop. The majority of the Big Data success stories to date are coming from retail and web companies but we can expect that to grow.

There’s work to be done that can keep you from getting stranded on the rocks of Big Data. Here are some realities that need to be absorbed that also give you a place to start:

  • First, get your data house in order - Before thinking about Hadoop and “incredible insights,” think about getting your current data infrastructure in order. Despite years of Gartner and Forrester advice to integrate, most large enterprises are still missing the boat.
  • Requirements first – While corporate ‘lab experiments’ sound fun if you’re on the project and learning fast, you probably aren’t delivering the most value to your organization if you don’t have business requirements that can be measured and monitored.
  • Rent what you need – Companies like GoodData offer SaaS solutions that let you pay as you go and realize quick return on your company’s data. Why spend big on a project in an area that’s new to you?

Big Data will eventually live up to the hype. There’s simply too much power in aggregation of data and in tweaking the small based on the patterns found in the enormous.

A great example would be the work done by UPS that was highlighted in a Wired story, “The Astronomical Math Behind UPS’ New Tool to Deliver Packages Faster.” Keep in mind that few companies operate on the scale of UPS (if any at all), and these numbers won’t necessarily be meaningful outside of the UPS context:

$30 million—The cost to UPS per year if each driver drives just one more mile each day than necessary. By that same logic, the company saves $30 million if each driver finds a way to drive one mile less.
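That $30 million figure implies a rough cost per truck-mile, which you can back out from the article’s own numbers. The sketch below assumes the 55,000 package cars cited later in the piece and roughly 260 working days a year (both the day count and the one-driver-per-truck simplification are assumptions, not from the story):

```python
annual_cost = 30_000_000  # cost cited for one extra mile per driver per day
trucks = 55_000           # "package cars" figure from the article
working_days = 260        # assumption: weekday deliveries only

extra_miles_per_year = trucks * 1 * working_days  # one extra mile per truck per day
implied_cost_per_mile = annual_cost / extra_miles_per_year
print(f"${implied_cost_per_mile:.2f} per truck-mile")  # roughly $2.10
```

A couple of dollars per mile is plausible once you fold in fuel, maintenance, and driver time, which is why shaving single miles matters at this scale.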

15 trillion trillion—The number of possible routes a driver with just 25 packages to deliver can choose from. As illustrated by the classic traveling salesman problem, the mathematical phenomenon that makes figuring out the best delivery routes so difficult is called a combinatorial explosion.
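The “15 trillion trillion” figure is just the number of orderings of 25 stops, i.e. 25 factorial. A quick sketch confirms the arithmetic:

```python
import math

# Every possible ordering of 25 delivery stops is a candidate route.
routes = math.factorial(25)
print(routes)  # 15511210043330985984000000

# A "trillion trillion" is 10^24, so this is about 15.5 of them.
print(routes / 1e24)  # ≈ 15.5
```

This is the combinatorial explosion in action: adding a single stop multiplies the route count by the new total, so exact enumeration is hopeless and UPS’ tools have to rely on heuristics instead.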

55,000—The number of “package cars” (the brown trucks) in UPS’ U.S. fleet. If the figures involved in determining the most efficient route for one driver are astronomical in scale, imagine how those numbers look for the entire fleet.

85 million—The number of miles Levis says UPS’ analytics tools are saving UPS drivers per year.

16 million—The number of deliveries UPS makes daily.

30—The maximum number of inches UPS specifies a driver should have to move to select the next package. This is accomplished through a meticulous system for loading packages into the truck in the order in which they’ll be delivered.

200 million—The number of addresses mapped by UPS drivers on the ground.

74—The number of pages in the manual for UPS drivers detailing the best practices for maximizing delivery efficiency.

100 million—The reduction in the number of minutes UPS trucks spend idling thanks in part, the company says, to onboard sensors that helped figure out when in the delivery process to turn the truck on and off.

200—The number of data points monitored on each delivery truck to anticipate maintenance issues and determine the most efficient ways to operate the vehicles.

