
Using Python for Big Data Workloads (Part 1)


Get several resources on using Python for Big Data workloads, and learn about various programming SDKs, APIs, and libraries.



In this first part, we'll go over the basics, along with some examples and tutorials to get you started. In Part 2, we'll look at Python for Spark (PySpark), machine learning, and deep learning in depth.

Get the latest Python for your environment; Linux, OS X, and even Windows are supported. There's an ongoing debate about whether to finally move to Python 3.x; try it and see if it works with all of your tools. Since my Hadoop installation has Python 2.7, I'm going to use that for my work.
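Before committing to one version, a quick probe like the one below (a minimal sketch; run it on each node you care about) confirms which interpreter and version your jobs will actually pick up:

    # Confirm which Python is on the PATH before settling on 2.x or 3.x.
    import sys

    print(sys.version)      # full version string, e.g. 2.7.x on many Hadoop distributions
    print(sys.executable)   # path to the interpreter actually being used

    # Guard against accidentally relying on Python 3-only features on a 2.7 cluster:
    if sys.version_info[0] < 3:
        print("Python 2 detected; avoid 3.x-only syntax such as f-strings.")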

Python is great. I can use it for machine learning, websites, and deep learning, call it from NiFi, and stitch together a lot of jobs with it. Using Apache Zeppelin, I can run Python and PySpark without installing the interpreter and tons of modules on my developer workstation.
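For example, a Zeppelin paragraph whose first line is the %python interpreter binding runs entirely on the Zeppelin host, so nothing needs to be installed on my laptop. The snippet below is a minimal sketch; the log file path is a placeholder, not something from a real cluster:

    %python
    # Zeppelin paragraph: print the ten most frequent words in a log file.
    # /tmp/sample.log is a placeholder path; point it at a real file on the Zeppelin host.
    from collections import Counter

    with open("/tmp/sample.log") as f:
        words = f.read().split()

    # String formatting keeps this compatible with both Python 2.7 and 3.x.
    for word, count in Counter(words).most_common(10):
        print("%s: %d" % (word, count))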

Python Resources


