
MADlib: Big Data Machine Learning in SQL for Data Scientists


Data scientists and others working with Big Data may be interested in MADlib, an open-source framework for Big Data machine learning in SQL. As of November 25th, the framework is at version 1.4. Its features include support for Postgres, Pivotal Greenplum Database, and Pivotal HAWQ, and it is released under a BSD license that permits commercial use. According to the MADlib team, the key philosophies behind MADlib's development are the following:

  • Operate on the data locally, in the database. Do not move it between multiple runtime environments unnecessarily.

  • Utilize best-of-breed database engines, but separate the machine learning logic from database-specific implementation details.

  • Leverage MPP shared-nothing technology, such as the Pivotal Greenplum Database, to provide parallelism and scalability.

  • Keep the implementation open, maintaining active ties to ongoing academic research.
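
To make the first principle concrete, here is a minimal sketch of fitting a simple linear regression without pulling the rows out of the database. It does not use MADlib or its API: it assumes an in-memory SQLite database, a hypothetical `points` table, and a hypothetical helper named `fit_line_in_database`. The idea it illustrates is that the database computes the aggregate sums, and only five numbers ever leave it.

```python
import sqlite3


def fit_line_in_database(conn):
    """Fit y = slope * x + intercept using SQL aggregates only.

    Illustrative helper (not part of MADlib). The table name `points`
    and its columns `x` and `y` are assumptions made for this sketch.
    """
    # The database scans the rows and returns just five sufficient statistics.
    n, sx, sy, sxx, sxy = conn.execute(
        """
        SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y)
        FROM points
        """
    ).fetchone()

    # Closed-form ordinary least squares for a single predictor.
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Sample data lying exactly on the line y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 7), (4, 9)],
)

slope, intercept = fit_line_in_database(conn)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The same pattern carries over to a parallel database in principle: sums like these can be computed per segment and then combined, which is why aggregate-based formulations suit shared-nothing systems.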

Check out MADlib's full site for more details, including a list of features and the documentation.
