Crowdsourcing Our Way to Better Food Hygiene

The last few years have seen a tremendous boom in the number of online sources relaying information about restaurant quality. Whether it’s review sites or more general social media, there is no shortage of feedback on how people have found a particular restaurant.

I wrote a few years ago about a project from the University of Rochester that aimed to mine Twitter for mentions of eating out, with the hope of producing a detailed and comprehensive map of food hygiene standards throughout restaurants in New York.

The system, called nEmesis, analyzed millions of tweets, looking for people reporting a bout of food poisoning after visiting a restaurant. You might think, or at least hope, that this would be a relatively small number, but over a four-month period the researchers found 480 such mentions in New York City alone, from a total of 23,000 restaurant visitors. What’s more, the data collected correlated well with public health data on those diners.

Crowdsourcing food hygiene

A recent Harvard-led project hopes to give similar assistance to Boston's food hygiene authorities by supplying better information on which to base their inspection checks.

Rather than using Twitter for data, however, the Harvard project is turning to the review website Yelp. The team has launched a Netflix-style competition to create an algorithm that can search through the ratings of restaurants in Boston and recommend which restaurants warrant a visit from the hygiene police.
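To make the idea concrete, here is a minimal sketch of what ranking restaurants from review data could look like. It is not the competition's method or any entrant's algorithm: the keyword list, the 50/50 weighting, the function name `rank_for_inspection`, and the sample reviews are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical keyword list; a real entry would learn its signals from inspection data.
HYGIENE_TERMS = ("food poisoning", "sick", "dirty", "roach", "undercooked")


def rank_for_inspection(reviews, top_n=3):
    """Rank restaurants by a crude risk score built from review data.

    reviews: iterable of (restaurant, stars, text) tuples, stars in 1..5.
    Returns up to top_n (restaurant, risk) pairs, highest risk first.
    """
    stats = defaultdict(lambda: {"count": 0, "stars": 0, "flagged": 0})
    for restaurant, stars, text in reviews:
        entry = stats[restaurant]
        entry["count"] += 1
        entry["stars"] += stars
        lowered = text.lower()
        if any(term in lowered for term in HYGIENE_TERMS):
            entry["flagged"] += 1

    scored = []
    for restaurant, entry in stats.items():
        mean_stars = entry["stars"] / entry["count"]
        flagged_share = entry["flagged"] / entry["count"]
        # Assumed weighting: low ratings and hygiene complaints count equally (score in 0..1).
        risk = 0.5 * (5 - mean_stars) / 4 + 0.5 * flagged_share
        scored.append((restaurant, round(risk, 3)))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented sample data, purely for demonstration.
sample_reviews = [
    ("Cafe A", 5, "Lovely brunch, spotless tables."),
    ("Cafe A", 4, "Good coffee."),
    ("Diner B", 1, "Got food poisoning after the chicken."),
    ("Diner B", 2, "Dirty cutlery and slow service."),
    ("Bistro C", 3, "Average meal, nothing special."),
]
ranking = rank_for_inspection(sample_reviews)
```

In this toy data the restaurant with low stars and hygiene complaints lands at the top of the list. A serious entry would presumably be trained against historical inspection results rather than relying on hand-picked keywords.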

The competition, organized by the data company DrivenData, will see the raw data posted online, with an army of data scientists then charged with solving the puzzle.

The founders observed that whilst the collection of machine-readable data is now mandated by the government, a data literacy problem leaves much of that data dormant and unused.

Bringing data science to the masses

And so the competition was born, to try to make data science affordable for organizations with a clear social need but no budget for what are still very expensive skill sets.

Of course, the food hygiene challenge is but one of the challenges on the DrivenData website, with the venture having come a long way from its first challenge, which sought a better algorithm for improving spending in schools.

The organization tries to ensure that whatever winning entries emerge from the competitions receive support and help to grow and improve. The winner of that initial competition, for instance, eventually turned their algorithm into a software tool for schools to use.

The eventual aim is to establish a community of data scientists who are happy to deploy their talents for socially worthwhile endeavors.

“Our mindset has grown; we want to solve the big-picture data literacy and data capacity problems in the social and public sectors,” the creators say. “We think competitions are a great mechanism to do that right now, but our goal is to do more, to serve that community in other ways.”

Suffice it to say, challenges have come a long way from their beginnings in the 18th century, when the British government launched such a competition to find an easier way of determining longitude.

The likes of the X Prize have taken them to newfound heights, and it’s great to see organizations like DrivenData apply the concept to more manageable challenges.

Of course, they aren’t the only organization seeking to make algorithms more accessible. I wrote last year about the Algorithmia social network, which aims to connect organizations that hold lots of data with algorithms that are being under-utilized. The aim is that this match-up will create not just new insights but extra profits.

Data science is undoubtedly a burgeoning field, and one with a great many exciting developments.

Topics:
big data, twitter, analytics, data science
