The Cold Start Problem

A discussion of how big data experts can begin to solve problems before they even have data to work with.

How do you operate a data-driven application before you have any data? This is known as the cold start problem.

We faced this problem all the time when I designed clinical trials at MD Anderson Cancer Center. We used Bayesian methods to design adaptive clinical trials, such as trials for determining chemotherapy dose levels. Each patient's treatment assignment would be informed by data from all patients treated previously.

But what about the first patient in a trial? You've got to treat a first patient, and treat them as well as you know how. They're struggling with cancer, so it matters a great deal what treatment they are assigned. So you treat them according to expert opinion. What else could you do?

Thanks to the magic of Bayes' theorem, you don't have to have an ad hoc rule that says, "Treat the first patient this way, then turn on the Bayesian machine to determine how to treat the next patient." No, you use Bayes' theorem from beginning to end. There's no need to handle the first patient differently because expert opinion is already there, captured in the form of prior distributions (and the structure of the probability model).

Each patient is treated according to all information available at the time. At first, all available information is prior information. After you have data on one patient, most of the information you have is still prior information, but Bayes' theorem updates this prior information with your lone observation. As more data becomes available, the Bayesian machine incorporates it all, automatically shifting weight away from the prior and toward the data.
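To make the shift from prior to data concrete, here is a minimal sketch using a conjugate Beta-Binomial model. This is a toy illustration, not the model used in the trials described above: the prior Beta(2, 8) and the patient counts are made-up assumptions, and the function names are hypothetical.

```python
def prior_weight(alpha: float, beta: float, n: int) -> float:
    """Fraction of the posterior mean that still comes from the prior."""
    return (alpha + beta) / (alpha + beta + n)


def posterior_mean(alpha: float, beta: float, successes: int, failures: int) -> float:
    """Mean of the Beta(alpha + successes, beta + failures) posterior."""
    return (alpha + successes) / (alpha + beta + successes + failures)


# Hypothetical expert opinion: about a 20% response rate,
# held with the strength of roughly 10 observed patients.
ALPHA, BETA = 2.0, 8.0

# (successes, failures) seen so far; the first row is the cold start.
for successes, failures in [(0, 0), (1, 0), (5, 5), (45, 45)]:
    n = successes + failures
    mean = posterior_mean(ALPHA, BETA, successes, failures)
    weight = prior_weight(ALPHA, BETA, n)
    print(f"n={n:3d}  posterior mean={mean:.3f}  prior weight={weight:.3f}")
```

The same two functions handle every row, including the one with no data. With zero observations the posterior mean is just the prior mean, and the prior's share of the estimate shrinks as n grows, with no special case for the first patient.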

The cold start problem for business applications is easier than the cold start problem for clinical trials. First of all, most business applications don't have the potential to cost people their lives. Second, business applications typically have fewer competing criteria to balance.

What if you're not sure where to draw your prior information from? Bayes can handle that too. You can use Bayesian model selection or Bayesian model averaging to determine which source (or weighting of sources) best fits the new data as it comes in.
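As a rough sketch of Bayesian model averaging over candidate sources of prior information, suppose each source is summarized as a Beta prior on a success rate. The two priors, the equal starting weights, and the observed counts below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def log_beta_fn(a: float, b: float) -> float:
    """Log of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(a: float, b: float, successes: int, failures: int) -> float:
    """Log probability of the observed sequence under a Beta(a, b) prior."""
    return log_beta_fn(a + successes, b + failures) - log_beta_fn(a, b)


def source_weights(priors, successes, failures):
    """Posterior probability of each prior source, starting from equal weights."""
    logs = [
        math.log(1.0 / len(priors)) + log_marginal_likelihood(a, b, successes, failures)
        for a, b in priors
    ]
    top = max(logs)  # subtract the max before exponentiating to avoid underflow
    unnormalized = [math.exp(value - top) for value in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def averaged_mean(priors, successes, failures):
    """Posterior mean averaged over sources, weighted by how well each fits."""
    weights = source_weights(priors, successes, failures)
    means = [(a + successes) / (a + b + successes + failures) for a, b in priors]
    return sum(w * m for w, m in zip(weights, means))


# Two hypothetical sources: one pessimistic, one optimistic.
sources = [(2.0, 8.0), (8.0, 2.0)]
print(source_weights(sources, 0, 0))  # no data yet: the sources stay equally weighted
print(source_weights(sources, 8, 2))  # data favoring the optimistic source
print(averaged_mean(sources, 8, 2))
```

Before any data arrives, both sources keep their starting weights. As observations accumulate, weight moves toward whichever source predicted them better. Model selection would keep only the top-weighted source, while model averaging keeps the weighted blend.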

Once you've decided to use a Bayesian approach, there's still plenty of work to do, but the Bayesian approach provides scaffolding for that work, a framework for moving forward.

Published at DZone with permission of
