
Blaze: A Python Compiler for Big Data

Python developers working with NumPy, or with Big Data in general, might be interested in Blaze, a Python library created by Continuum Analytics that Stephen Diehl calls "the next generation of NumPy." Blaze extends NumPy's array model to a wider variety of table and array-like data structures and adds a number of new features. According to Diehl:

...Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. Blaze is able to operate on datasets transparently as if they behaved like in-memory NumPy arrays.

We aim to allow analysts and scientists to productively write robust and efficient code, without getting bogged down in the details of how to distribute computation, or worse, how to transport and convert data between databases, formats, proprietary data warehouses, and other silos.
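The article does not show Blaze's own API, so the example below does not use it. To make "out-of-core" concrete, here is a minimal sketch of the kind of chunked loop such a library is meant to hide. It uses plain NumPy's `np.memmap`, not Blaze. The function name `chunked_mean`, the file name, and the chunk size are hypothetical choices made for illustration. The array lives on disk and is read in fixed-size slices, so the whole dataset never has to fit in RAM at once.

```python
import os
import tempfile

import numpy as np


def chunked_mean(path, dtype, length, chunk_size=1_000_000):
    """Mean of a 1-D on-disk array, reading at most chunk_size elements at a time.

    Illustrative sketch only (plain NumPy, not Blaze's API); assumes length > 0.
    """
    # np.memmap maps the file into virtual memory; only the slices we touch
    # are paged in, so the full dataset never needs to be loaded at once.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(length,))
    total = 0.0
    for start in range(0, length, chunk_size):
        block = data[start:start + chunk_size]
        # Accumulate in float64 to limit rounding error across many chunks.
        total += float(block.sum(dtype=np.float64))
    return total / length


if __name__ == "__main__":
    n = 2_000_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.f32")
        # Write a "large" dataset to disk (kept small so the demo runs quickly).
        np.arange(n, dtype=np.float32).tofile(path)
        print(chunked_mean(path, np.float32, n))  # prints 999999.5
```

As Diehl describes it, the point of Blaze is that analysts should not have to write loops like this by hand. The same computation should read like ordinary NumPy code whether or not the data fits in memory.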

In short, it feels like NumPy but is more flexible and efficient. If you're looking for something different in the world of Python and Big Data, check out the project on GitHub and its documentation.

Opinions expressed by DZone contributors are their own.