
Conjecture: Scalable Machine Learning in Hadoop with Scalding


When it comes to predictive modeling and machine learning, the most visible client-side product of the engineering work is the tailored ad: systems track your browsing behavior and serve content based on your inferred preferences. This kind of modeling is particularly important on e-commerce platforms, where it is used to recommend related purchases, among other things.

A blogger from the Etsy engineering team shared some of their process in a post about scalable machine learning:

...we use predictive machine learning models to estimate click rates of items so that we can present high quality and relevant items to potential buyers on the site.  This estimation is particularly important when used for ranking our cost-per-click search ads, a substantial source of revenue. In addition to contributing to on-site experiences, we use machine learning as a component of many internal tools, such as routing and prioritizing our internal support e-mail queue.  By automatically categorizing and estimating an “urgency” for inbound support e-mails, we can assign support requests to the appropriate personnel and ensure that urgent requests are handled by staff more rapidly, helping to ensure a good customer experience.

Their predictive machine learning setup consists of three basic parts:

  1. Java classes which define the machine learning models and data types.

  2. Scala methods which perform MapReduce training using Scalding.

  3. PHP classes which use the produced models to make predictions in real-time on the web site.
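
The division of labor in the list above (define a model, train it offline, then use the produced model to predict at request time) can be illustrated with a minimal single-process sketch. It is written in Python purely for brevity and is not Etsy's code: the logistic-regression model, the JSON hand-off, the feature names, and the toy data are all assumptions made for illustration, and the training loop merely stands in for a distributed Scalding job.

```python
import json
import math


def sigmoid(z):
    """Logistic function, mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Serving step: score a sparse feature dict with a trained weight dict."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(score)


def train(examples, epochs=200, learning_rate=0.5):
    """Training step: plain stochastic gradient descent on log loss.

    Hypothetical single-machine stand-in for distributed MapReduce training.
    """
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights


# Made-up toy data: (features, clicked?) pairs. Feature names are hypothetical.
examples = [
    ({"bias": 1.0, "title_match": 1.0}, 1),
    ({"bias": 1.0, "title_match": 1.0}, 1),
    ({"bias": 1.0, "title_match": 0.0}, 0),
    ({"bias": 1.0, "title_match": 0.0}, 0),
]

weights = train(examples)

# Serialize the model so that a separate serving process could load it.
model_json = json.dumps(weights)
loaded = json.loads(model_json)

# Predict click probability for two hypothetical items.
p_match = predict(loaded, {"bias": 1.0, "title_match": 1.0})
p_no_match = predict(loaded, {"bias": 1.0, "title_match": 0.0})
print(p_match > p_no_match)
```

The point of the sketch is the separation: the serving side needs only the serialized weights and a small `predict` function, which is what makes it plausible to train in one language and predict in another.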

The modeling laid out in the rest of the article is only a small part of what Etsy does, both externally and internally, to make use of the large amount of data that passes through its hands every day.
