DZone Refcardz: Distributed Machine Learning with Apache Mahout
We are happy to announce the release of DZone's latest Refcard: Distributed Machine Learning with Apache Mahout by Ian Pointer and Dr. Ir. Linda Terlouw. This Refcard is the Essential Cheat Sheet to get you started using the Apache Mahout library for machine learning tasks. It will introduce you to key machine learning concepts and will show you how to install and start using Mahout.
Apache Mahout is a library for scalable machine learning. Originally a subproject of Apache Lucene (a high-performance text search engine library), Mahout has progressed to be a top-level Apache project. While Mahout has only been around for a few years, it has established itself as a frontrunner in the field of machine learning technologies.
This Refcard will present the basics of Mahout by studying two possible applications:
- Training and testing a Random Forest for handwriting recognition using Amazon Web Services EMR
- Running a recommendation engine on a standalone Spark cluster
The Refcard covers more besides. DZone Refcardz help you develop your skills in a wide assortment of development technologies. You can find the rest of our Refcard Library here.
Want to learn more about Machine Learning and Mahout? Check out these articles from DZone's Most Valuable Bloggers:
- A Beginner's Guide To Enhancing Solr/Lucene Search With Mahout’s Machine Learning - Mahout has all the tidbits, advanced and beginner, that can help us do very interesting machine learning (read: math) processing on our giant term-document matrices. For example, we can cluster the rows or use similarity metrics to calculate distances between our term vectors. We can fill in some of the empty spots in our sparse vectors by detecting relationships between terms (“banana” and “peel” commonly co-occur, so let's score this document for banana too!), and much more.
- Top 4 Machine Learning Use Cases for Energy Forecasting - This article explores the top 4 machine learning use cases for energy forecasting. The idea behind forecasting is to make predictions about future events. Beyond energy, probabilistic forecasting is also used in areas such as weather forecasting and sports betting.
- What is Scalable Machine Learning? - The terms “scalable” and “large scale” were used in machine learning circles long before there was Big Data. There have always been certain problems that lead to a large amount of data, for example in bioinformatics, or when dealing with large numbers of text documents. So finding learning algorithms, or more generally data analysis algorithms, that can deal with a very large set of data has always been a relevant question.