Blaze: A Python Compiler for Big Data

Python developers who work with NumPy, or with Big Data in general, might be interested in Blaze, a Python library created by Continuum Analytics and described by Stephen Diehl as "the next generation of NumPy." Blaze extends NumPy's array model with a variety of table and array-like structures and adds a number of new features. According to Diehl:

...Blaze is designed to handle out-of-core computations on large datasets that exceed the system memory capacity, as well as on distributed and streaming data. Blaze is able to operate on datasets transparently as if they behaved like in-memory NumPy arrays.

We aim to allow analysts and scientists to productively write robust and efficient code, without getting bogged down in the details of how to distribute computation, or worse, how to transport and convert data between databases, formats, proprietary data warehouses, and other silos.
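
To make the out-of-core idea in the quote a little more concrete, here is a minimal sketch that uses plain NumPy rather than Blaze. It assumes a flat binary file of float64 values and uses `np.memmap` to reduce it one chunk at a time, so only a single chunk needs to sit in memory. The helper names `chunked_mean` and `write_demo_file` are hypothetical and exist only for this illustration. This is not Blaze's API, just the kind of bookkeeping that a library like Blaze aims to hide.

```python
import os
import tempfile

import numpy as np


def write_demo_file(path, length):
    """Write the values 0..length-1 as float64 to a flat binary file (demo data only)."""
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=(length,))
    arr[:] = np.arange(length, dtype=np.float64)
    arr.flush()
    del arr  # drop the mapping so the file is released


def chunked_mean(path, dtype, length, chunk_size=1_000_000):
    """Mean of a 1-D on-disk array, reading at most chunk_size items at a time."""
    if length <= 0:
        raise ValueError("length must be positive")
    # memmap maps the file into the address space without loading it all at once.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(length,))
    total = 0.0
    for start in range(0, length, chunk_size):
        chunk = data[start:start + chunk_size]
        # Accumulate in float64 regardless of the on-disk dtype.
        total += float(chunk.sum(dtype=np.float64))
    return total / length


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.dat")
        write_demo_file(path, 10_000)
        # Mean of 0..9999
        print(chunked_mean(path, np.float64, 10_000, chunk_size=1_024))  # prints 4999.5
```

With a real dataset the file would be far larger than memory and `chunk_size` would be tuned to the available RAM. The chunk loop is the part a higher-level library would be expected to manage on the user's behalf.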

In short, Blaze looks like NumPy but aims to be more flexible and efficient. If you're looking for something different in the world of Python and Big Data, check out the project's GitHub repository and its documentation.
