Mahout was started by developers of the Apache Lucene project. The name Mahout comes from the Hindi word for an elephant driver, chosen because of the project's association with Apache Hadoop, whose logo is an elephant. Earlier this year, the Lucene developers decided to build machine learning libraries and algorithms on top of Apache's data-processing systems, such as Hadoop. The goal of Mahout is to provide scalable machine learning libraries with a commercially friendly license and an active community.
Mahout currently supports four use cases:
- Clustering - groups text documents that share similar topics.
- Classification - assigns categories to an unlabeled document by learning from existing categorized documents.
- Frequent itemset mining - identifies items that frequently appear together in item groups such as shopping cart contents.
- Recommendation mining - learns from user behavior and recommends related items; this is the same use case served by Netflix's movie recommendation engine.
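To make "recommendation mining" concrete, here is a minimal sketch of a user-based k-nearest-neighbor recommender, one of the recommender styles mentioned for Mahout 0.2. It is plain Python and does not use Mahout's own collaborative filtering API; the function names, the cosine similarity measure, and the sample ratings are all illustrative assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two users' rating dicts (unrated items count as 0)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    if dot == 0:
        return 0.0
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, k=2, n=3):
    """Recommend up to n items `user` has not rated, using the k most similar users."""
    target = ratings[user]
    # Rank every other user by similarity and keep the k nearest neighbors.
    neighbors = sorted(
        ((cosine(target, other), name) for name, other in ratings.items() if name != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, name in neighbors:
        if sim <= 0:
            continue  # ignore neighbors with no overlap in taste
        for item, rating in ratings[name].items():
            if item in target:
                continue  # only recommend items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Predicted rating = similarity-weighted average of the neighbors' ratings.
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:n]

# Hypothetical sample data: user -> {movie: rating}.
ratings = {
    "alice": {"Alien": 5, "Heat": 3, "Up": 1},
    "bob":   {"Alien": 4, "Heat": 3, "Up": 1, "Brazil": 5},
    "carol": {"Alien": 1, "Up": 5, "Cars": 4},
    "dave":  {"Alien": 5, "Heat": 4, "Brazil": 4, "Cars": 1},
}
print(recommend(ratings, "alice", k=2))  # ['Brazil', 'Cars']
```

Alice's two nearest neighbors are Dave and Bob, who both rate "Brazil" highly, so it comes first. A production engine such as Mahout's must do the same neighbor search over millions of users, which is where distributing the work on Hadoop becomes relevant.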
Version 0.2 of Mahout adds support for Hadoop 0.20.x and includes code cleanup. The contributors have also made API changes and performance improvements to the collaborative filtering engine. Other highlights include k-nearest-neighbor and SVD recommenders, Latent Dirichlet Allocation, random forests, and frequent pattern mining using parallel FP-growth.
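As an illustration of what frequent pattern mining computes, the sketch below finds every itemset of up to two items that appears in at least a minimum number of shopping baskets. It is not FP-growth: it simply enumerates and counts all small itemsets, which is only practical for tiny data. FP-growth produces the same kind of result without enumerating candidates, by compressing the transactions into a prefix tree, and the parallel variant in Mahout spreads that work across a Hadoop cluster. The function name and sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=2):
    """Map each itemset of up to max_size items to its support count,
    keeping only those found in at least min_support baskets (brute force)."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and sort so tuples are canonical
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

# Hypothetical shopping-cart contents.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
for itemset, support in sorted(frequent_itemsets(baskets, min_support=2).items()):
    print(itemset, support)
```

With a minimum support of 2, the pairs ("bread", "milk") and ("butter", "milk") each appear in two baskets and are reported, while ("beer", "bread") is dropped.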
Mahout is still in its early stages, but version 0.2 is a first step toward making machine learning systems easier to build. As Mahout's libraries and algorithms mature, developers may no longer have to start from scratch, as the developers who built the Netflix recommendation engine did.