Using Python for Big Data Workloads (Part 2)

This continuation of the series looks at how, where, and why to use Python for Machine Learning, Deep Learning, and Big Data workloads.

Why should you use Python for Big Data workloads? We have already discussed a few reasons; here, we'll cover more. Some online surveys suggest that Python is approaching a 50% share as the language of choice for Machine Learning.

  1. Deep Learning: TensorFlow, Keras, and PyTorch.

  2. OpenCV Python bindings.

  3. NLTK.

  4. PySpark.

  5. Apache Arrow, Parquet, and other project support.

  6. Apache Beam 2.0 support for Python.

  7. Speech recognition.

  8. API support.

  9. scikit-learn and other cool Machine Learning libraries.

  10. Utilities abound, just a pip install away.

Here's a cool GitHub example of using Keras with OpenCV and Python for face detection.

Step one is to install OpenCV with Python.

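There's big news from Google: the Apache Beam 2.0 release supports Python. Beam pipelines are designed to run on engines such as Flink and Spark, although which runners and streaming features are available from Python depends on the Beam version. Installation is a single command: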

pip install apache-beam 

After a simple pip install, you can run Beam jobs:

python -m apache_beam.examples.wordcount --input MANIFEST.in --output counts 
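To make clear what the bundled wordcount example computes, here is a minimal plain-Python sketch of the same idea. It is not Beam's implementation: the function name `count_words` and the word pattern are assumptions made for illustration. A Beam pipeline expresses the same split-and-count steps as transforms so that a runner can distribute them.

```python
import re
from collections import Counter

# Assumed word pattern for illustration: runs of letters and apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def count_words(lines):
    """Count how often each word appears across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Split the line into words, then add them to the running totals.
        counts.update(WORD_PATTERN.findall(line))
    return counts


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]
    for word, count in sorted(count_words(sample).items()):
        print(f"{word}: {count}")
```

Passing an open file object instead of a list works as well, since the function only needs an iterable of lines.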

Check out some details on speech recognition with Python, Python support for upcoming Apache Arrow and Parquet, and some cool Spark SQL code and UDF with Python.
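As a rough idea of what a Spark SQL UDF written in Python can look like, here is a small sketch. The helper names `word_length` and `register_word_length` are hypothetical, and the code assumes PySpark is installed and that you already have a `SparkSession`. The PySpark import is deferred so the plain function can be used and tested without Spark.

```python
def word_length(text):
    """Return the number of characters in text, or None for missing values."""
    return None if text is None else len(text)


def register_word_length(spark):
    """Register word_length as a Spark SQL UDF on an existing SparkSession."""
    # Imported lazily so the plain function above works without PySpark.
    from pyspark.sql.types import IntegerType

    # After this call, SQL queries on this session can call word_length(...).
    spark.udf.register("word_length", word_length, IntegerType())
```

Once registered, a query such as `SELECT word, word_length(word) FROM words` can call the function, assuming a temporary view named `words` exists. Keeping the logic in an ordinary function also makes it easy to unit test outside of Spark.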

OpenCV has a great Python library and tons of fun examples that work with robots, cars, and drones.

Cool Python image utilities abound for all types of graphic and image manipulation.

APIs are everywhere for Python! Here's an example on Spotify. There are also libraries for Facebook, Twitter, Instagram, Google services, Amazon services, Microsoft services, and tons of other feeds and services.

TensorFlow, NLTK, and Stanford CoreNLP can all be used from Python, and TextBlob adds many cool utilities and helpers. They're all easy to install, and most are well documented.

Python also has scikit-learn, which offers a ton of great Machine Learning goodies.

Python runs everywhere: Windows, macOS, Linux, and lots of devices. Here's an example on an ASUS Tinkerboard.

Many Python libraries run everywhere. Here are a few I recommend installing on every platform:

  • NumPy.

  • SciPy.

  • NLTK.

  • Wheel.

  • Pandas.

  • Matplotlib.

  • PyTorch.

  • TensorFlow.

  • TextBlob.

  • spaCy.
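To see which of the libraries listed above are already present on a given machine, a short standard-library check can help. This is a sketch: the `RECOMMENDED` mapping and the `installed_status` helper are names introduced here for illustration, and the import names are the usual ones for these packages (PyTorch, for instance, is imported as `torch`).

```python
from importlib.util import find_spec

# Display name -> import name (assumed to be the usual import names).
RECOMMENDED = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "NLTK": "nltk",
    "Wheel": "wheel",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "TextBlob": "textblob",
    "spaCy": "spacy",
}


def installed_status(packages=RECOMMENDED):
    """Map each display name to True if its module can be found, else False."""
    # find_spec locates a top-level module without importing it.
    return {name: find_spec(module) is not None
            for name, module in packages.items()}


if __name__ == "__main__":
    for name, present in installed_status().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Anything reported as missing can then be installed with pip.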

Opinions expressed by DZone contributors are their own.
