
Conjecture: Scalable Machine Learning in Hadoop with Scalding


When it comes to predictive modeling and machine learning, the most visible client-side product of the engineering work is tailored advertising: systems that track your browsing behavior and serve content based on your inferred preferences. The same kind of modeling is particularly important on e-commerce platforms, where it is used to recommend related purchases and to predict other buyer behavior.

A blogger from the Etsy engineering team described part of their process in a post about scalable machine learning:

...we use predictive machine learning models to estimate click rates of items so that we can present high quality and relevant items to potential buyers on the site.  This estimation is particularly important when used for ranking our cost-per-click search ads, a substantial source of revenue. In addition to contributing to on-site experiences, we use machine learning as a component of many internal tools, such as routing and prioritizing our internal support e-mail queue.  By automatically categorizing and estimating an “urgency” for inbound support e-mails, we can assign support requests to the appropriate personnel and ensure that urgent requests are handled by staff more rapidly, helping to ensure a good customer experience.

Their framework for predictive machine learning, Conjecture, is built from three main parts:

  1. Java classes which define the machine learning models and data types.

  2. Scala methods which perform MapReduce training using Scalding.

  3. PHP classes which use the produced models to make predictions in real-time on the web site.
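To make the division of labor concrete, here is a minimal single-process sketch of the same three roles: a model representation, a training step, and a prediction step that consumes the serialized model. It is written in Python for brevity and is not Conjecture's API. The function names, the feature names (`bias`, `title_match`), the toy data, and the choice of logistic regression trained by stochastic gradient descent with a JSON hand-off are all assumptions made for illustration. In the setup described above, training would run as a Scalding MapReduce job and prediction would run in PHP.

```python
import json
import math

# Hypothetical sketch: none of these names come from Conjecture itself.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Serving role: score a sparse feature dict against a weight dict."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def train(examples, epochs=200, learning_rate=0.5):
    """Training role: plain SGD for logistic regression on (features, label) pairs."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights


# Toy click data (assumed): items whose title matches the query get clicked.
examples = [
    ({"bias": 1.0, "title_match": 1.0}, 1),
    ({"bias": 1.0, "title_match": 1.0}, 1),
    ({"bias": 1.0, "title_match": 0.0}, 0),
    ({"bias": 1.0, "title_match": 0.0}, 0),
]

# Model role: the trained model is just named weights, serialized so that a
# separate serving process could load it.
model_json = json.dumps(train(examples))

# Serving side: load the serialized model and estimate click probabilities.
served_weights = json.loads(model_json)
p_match = predict(served_weights, {"bias": 1.0, "title_match": 1.0})
p_no_match = predict(served_weights, {"bias": 1.0, "title_match": 0.0})
print(p_match, p_no_match)
```

Keeping the model as a plain mapping from feature names to weights is what lets training and serving live in different languages: the only contract between them is the serialized file.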

The modeling laid out in the rest of the Etsy post is only a small part of what Etsy does, both externally and internally, to make use of the large amount of data that passes through its hands every day.



