Apache Mahout Tackles A.I.

Artificial intelligence is a term more often associated with science fiction than with software development. However, A.I. is becoming increasingly viable as a business tool. In development, A.I. is more commonly referred to as "machine learning". Writing a machine learning system can be very profitable if it's done well. In September, a team of developers won $1 million for building a movie recommendation engine for Netflix. FlightCaster received strong praise for its flight delay prediction system. These types of machine learning systems are not easy to build, but this year, Apache started working on a new project that provides the tools needed for building a scalable machine learning system. The project is named Apache Mahout, and it recently released version 0.2, which is the first usable release.

Mahout was started by the developers of the Apache Lucene project. The name Mahout comes from the Hindi word for an elephant driver. The name was chosen because of Mahout's association with Apache Hadoop, which has an elephant logo. Earlier this year, the Lucene developers decided to create machine learning libraries and algorithms on top of Apache's data systems, such as Hadoop. The goal of Mahout is to create a scalable machine learning solution with a commercially friendly license and an active community.

Mahout currently supports four use cases:
  • Clustering - groups text documents with similar topics.
  • Classification - assigns categories to an unlabeled document by learning from existing categorized documents.
  • Frequent itemset mining - identifies items that frequently appear together in item groups such as shopping cart contents.
  • Recommendation mining - learns from user behavior and recommends related items; this is the same use case as the Netflix movie recommendation engine.
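
To make the frequent itemset use case concrete, here is a minimal sketch in plain Java that counts which pairs of items show up together in shopping carts. It is not Mahout's API and not the parallel FP-Growth algorithm; it is a brute-force pair count, and the class name `FrequentPairsSketch`, the method `frequentPairs`, and the sample carts are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is not part of Mahout.
class FrequentPairsSketch {

    // Counts how many carts contain each unordered pair of items and keeps
    // only the pairs that appear in at least minSupport carts.
    static Map<String, Integer> frequentPairs(List<List<String>> carts, int minSupport) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> cart : carts) {
            // De-duplicate and sort so each pair has a single canonical key.
            List<String> items = new ArrayList<>(new TreeSet<>(cart));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop pairs that fall below the support threshold.
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> carts = Arrays.asList(
            Arrays.asList("bread", "milk", "eggs"),
            Arrays.asList("bread", "milk"),
            Arrays.asList("milk", "eggs"),
            Arrays.asList("bread", "butter"));
        // Prints {bread+milk=2, eggs+milk=2}
        System.out.println(frequentPairs(carts, 2));
    }
}
```

Counting every pair grows quadratically with cart size, which is one reason a scalable library would rely on a smarter algorithm and a distributed platform such as Hadoop.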

Version 0.2 of Mahout includes updates for Hadoop 0.20.x and cleaner code. The contributors have made API changes and performance enhancements to the collaborative filtering engine. Other highlights include K-nearest-neighbor and SVD recommenders, Latent Dirichlet Allocation, random forests, and frequent pattern mining using parallel FP-Growth.
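
For readers unfamiliar with nearest-neighbor recommenders, the idea can be sketched in a few lines of plain Java. The example below is an assumption-laden illustration, not Mahout's collaborative filtering API: the class `KnnRecommenderSketch`, its methods, the choice of cosine similarity, and the sample ratings are all hypothetical. It finds the k users most similar to a target user and suggests the unseen item those neighbors rated most highly, weighted by similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Mahout's recommender API.
class KnnRecommenderSketch {

    // Cosine similarity between two sparse rating vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns the unseen item with the highest similarity-weighted rating among
    // the k nearest neighbors, or null if the neighbors offer nothing new.
    // Assumes `user` is present in `ratings`.
    static String recommend(String user, Map<String, Map<String, Double>> ratings, int k) {
        Map<String, Double> mine = ratings.get(user);
        List<String> neighbors = new ArrayList<>();
        for (String other : ratings.keySet()) {
            if (!other.equals(user)) neighbors.add(other);
        }
        // Most similar users first.
        neighbors.sort((x, y) -> Double.compare(
            cosine(mine, ratings.get(y)), cosine(mine, ratings.get(x))));

        Map<String, Double> scores = new HashMap<>();
        for (String n : neighbors.subList(0, Math.min(k, neighbors.size()))) {
            double sim = cosine(mine, ratings.get(n));
            for (Map.Entry<String, Double> e : ratings.get(n).entrySet()) {
                if (!mine.containsKey(e.getKey())) {
                    scores.merge(e.getKey(), sim * e.getValue(), Double::sum);
                }
            }
        }
        String best = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > scores.get(best)) best = e.getKey();
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", new HashMap<>(Map.of("Alien", 5.0, "Brazil", 4.0)));
        ratings.put("bob", new HashMap<>(Map.of("Alien", 5.0, "Brazil", 4.0, "Casablanca", 5.0)));
        ratings.put("carol", new HashMap<>(Map.of("Alien", 1.0, "Dune", 5.0)));
        System.out.println(recommend("alice", ratings, 1));
    }
}
```

In this sample data, bob's ratings closely match alice's, so his one extra title is what gets recommended. A production engine also has to handle rating scale differences, ties, and data sets too large for one machine's memory.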

Mahout is still in its early stages, but the 0.2 version is a first step toward easier creation of machine learning systems. When Mahout's machine learning libraries and algorithms become more mature, developers may not have to start from scratch like the developers who built the Netflix recommendation engine.
