A better way to tackle all that data

The single biggest challenge any organization faces in a world awash in data is the time it takes to make a decision. We can amass all of the data in the world, but if it doesn’t help to save a life, allocate resources better, fund the organization, or avoid a crisis, what good is it? Big data’s rise is outstripping our ability to analyze it and reach conclusions fast enough, and a shortage of qualified data scientists to do that analysis makes the gap worse.

Headed for trouble

At the root of this problem is our concept of what constitutes data. The boundaries of what we can digitize and analyze are moving outward every day. If Gartner’s prediction holds that the Internet of Things (essentially, sensors that share data with the Internet) will add 50 billion machine voices to today’s 2 billion connected users, we have to believe that humans’ ability to manage the process of amassing the right data and performing the right analysis is headed for trouble.

The measure of how long it takes analytics to reach a conclusion is often called “time to decision.” If we accept that big data’s holy grail is, as one Information Week commentator puts it, better, faster decisions, then something has to give: as data continue to grow in volume, velocity, and variety, management becomes more complex and time to decision may slow.

Machine learning

This is a problem crying out for a solution that has long been in development but has only recently become effective and economically feasible enough for widespread adoption: machine learning. As the term suggests, machine learning is a branch of computer science in which algorithms learn from and react to data, much as humans do. Machine-learning software identifies hidden patterns in data and uses those patterns both to group similar data and to make predictions. Each time new data are added and analyzed, the software gains a clearer view of data patterns and gets closer to making the optimal prediction or reaching a meaningful understanding.
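
The idea of grouping similar data and refining predictions as new data arrive can be illustrated with a minimal sketch. The class name, labels, and sample points below are hypothetical, and a running-centroid classifier is only one very simple stand-in for the many techniques that fall under machine learning.

```python
from collections import defaultdict


class RunningCentroids:
    """Toy incremental learner: keeps one running mean (centroid) per label."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.centroids = {}

    def learn(self, point, label):
        # Fold the new point into the label's centroid with an incremental mean,
        # so the model is refined every time a new observation is added.
        self.counts[label] += 1
        n = self.counts[label]
        old = self.centroids.get(label, [0.0] * len(point))
        self.centroids[label] = [c + (x - c) / n for c, x in zip(old, point)]

    def predict(self, point):
        # Predict the label whose centroid is closest (squared Euclidean distance).
        def dist(label):
            return sum((x - c) ** 2 for x, c in zip(point, self.centroids[label]))

        return min(self.centroids, key=dist)


# Hypothetical observations: (features, label)
samples = [
    ((1.0, 1.0), "low"),
    ((9.0, 9.0), "high"),
    ((2.0, 1.0), "low"),
    ((8.0, 10.0), "high"),
]

model = RunningCentroids()
for point, label in samples:
    model.learn(point, label)

prediction = model.predict((7.5, 8.0))
print(prediction)
```

Each call to `learn` shifts a centroid slightly, which is the sense in which the model "gains a clearer view" as data accumulate.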

Turning the problem around

Machine learning does this by turning conventional data-mining practice on its head. Rather than scientists beginning with a (possibly biased) hypothesis that they then seek to confirm or disprove in a body of data, the machine starts with a definition of an ideal outcome, which it uses to decide what data matter and how they should factor into solving problems. The idea is that if we know the optimal way for something to operate, we can figure out exactly what to change in a suboptimal situation.
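
One way to picture "starting from the outcome" is to measure how far each observation falls from the ideal and then ask which variables move with that gap. The sketch below is an assumption-laden illustration, not a method described in the original piece: the variable names and numbers are made up, and Pearson correlation is just one simple relevance score.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: how far each observation fell short of the ideal outcome.
gap_from_ideal = [1.0, 2.0, 3.0, 4.0]

# Hypothetical candidate variables recorded alongside each observation.
variables = {
    "variable_a": [2.0, 4.0, 6.0, 8.0],  # rises in step with the gap
    "variable_b": [5.0, 3.0, 6.0, 4.0],  # no linear relationship with the gap
}

# Score each variable by how strongly it tracks the gap, strongest first.
scores = {name: pearson(values, gap_from_ideal) for name, values in variables.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(ranked)
```

The ranking is driven entirely by the definition of the ideal outcome, not by a hypothesis chosen in advance.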

Machine learning in commuter rail

For example, a complex system like commuter train service has targets for the on-time, safe delivery of passengers. Meeting them is a real-time optimization problem that depends on a variety of fluctuating variables, from the weather to load size to the availability and cost of energy. Machine-learning software onboard the trains themselves can take all of these factors into account, running hundreds of calculations a second to direct an engineer to operate at the proper speed.
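
As a rough sketch of this kind of optimization, the function below picks the candidate speed with the lowest combined cost of lateness and energy, subject to a safety cap. Every name, weight, and formula here is a hypothetical toy model chosen for illustration; a real onboard system would be far more elaborate.

```python
def choose_speed(distance_km, minutes_left, max_safe_kmh, energy_price):
    """Pick the candidate speed (km/h) with the lowest combined cost."""
    # Candidate speeds in 10 km/h steps, capped by the current safe limit
    # (which could itself depend on weather or load).
    candidates = range(40, max_safe_kmh + 1, 10)

    def cost(speed):
        travel_min = distance_km / speed * 60
        lateness = max(0.0, travel_min - minutes_left)
        energy = energy_price * speed ** 2 / 1000  # toy model: energy grows with speed squared
        return 10 * lateness + energy  # assumed weighting: lateness is penalized heavily

    return min(candidates, key=cost)


# Clear conditions: the train may run at up to 120 km/h.
speed = choose_speed(distance_km=30, minutes_left=20, max_safe_kmh=120, energy_price=1.0)

# Poor conditions: the safe limit drops to 80 km/h, so the best option changes.
speed_bad_weather = choose_speed(distance_km=30, minutes_left=20, max_safe_kmh=80, energy_price=1.0)
print(speed, speed_bad_weather)
```

Re-running the function whenever an input changes is the simplest version of recalculating the recommended speed in real time.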

Machine learning in our homes

The Nest thermostat is a well-known example of machine learning applied to very local data. As people turn the dial on the Nest thermostat, it learns their temperature preferences and begins to manage the heating and cooling automatically, regardless of time of day and day of week. The system never stops learning, allowing people to continuously define the optimum.
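
A minimal sketch of learning a preference from repeated adjustments follows. It is not Nest's actual algorithm, which the article does not describe; the class name, starting setpoint, and learning rate are assumptions made for illustration.

```python
class LearningThermostat:
    """Toy thermostat that nudges its setpoint toward what the user keeps choosing."""

    def __init__(self, initial_setpoint=20.0, learning_rate=0.5):
        self.setpoint = initial_setpoint
        self.learning_rate = learning_rate  # assumed value: how quickly preferences are adopted

    def user_adjusts(self, chosen_temp):
        # Move the learned setpoint part of the way toward the user's choice.
        self.setpoint += self.learning_rate * (chosen_temp - self.setpoint)
        return self.setpoint


thermostat = LearningThermostat()
for temp in (22.0, 22.0, 22.0):
    thermostat.user_adjusts(temp)

print(thermostat.setpoint)
```

Because every adjustment updates the setpoint, the device never stops learning, and a change in habits gradually redefines the optimum.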

Machine learning in healthcare

The application of machine learning in health care is essential to achieving the goal of personalized medicine (the concept that every patient is subtly different and should be treated uniquely). Nowhere is this more easily seen than in cancer treatment, where genomic medicine is enabling highly customized therapy based on an individual’s type of tumor and myriad other factors. Here machine-learning algorithms help sort the various treatments available to oncologists, classifying them by cost, efficacy, toxicity, and so forth. As patients are treated, these systems grow in intelligence, learning from outcomes and additional evidence-based guidelines. This leaves the oncologists free to focus on optimizing treatment plans and sharing information with their patients.

Machine learning off the shelf

With the rise of off-the-shelf software such as LIONsolver, the winner of a recent crowdsourcing contest to find better ways to recognize Parkinson’s disease, machine learning is at last entering the mainstream. It is now available to a wider variety of businesses than the likes of Yahoo, Google, and Facebook that first made big data headlines. More and more businesses may now see it as a viable alternative to meeting the rapid proliferation of data with ever more data scientists spending ever more time on analysis. Expect to see machine learning used to train supply chain systems, predict weather, spot fraud, and, especially in customer experience management, help decide what variables and context matter for customer response to marketing.

This piece first appeared on the Harvard Business Review and has been lightly edited.

Published at DZone with permission of
