Using Python for Big Data Workloads (Part 2)
Check out the continuation of this series on how, where, and why to use Python for Machine Learning, Deep Learning, and Big Data workloads.
Why should you use Python for Big Data workloads? We have already discussed a few reasons; here are more. Some surveys suggest that Python is approaching a 50% share as the Machine Learning language of choice.
Deep Learning: TensorFlow, Keras, and PyTorch.
OpenCV Python bindings.
NLTK.
PySpark.
Support for Apache Arrow, Parquet, and other projects.
Apache Beam 2.0 support for Python.
Speech recognition.
Client libraries for countless APIs.
scikit-learn and other cool Machine Learning libraries.
Utilities abound, just a pip install away.
Here's a cool GitHub example of using Keras with OpenCV and Python for face detection.
Step one is to install OpenCV's Python bindings, for example with pip install opencv-python.
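There's some big news from Google and the Apache community: Apache Beam 2.0 has been released, and Python is now supported. Beam lets one pipeline target different execution engines (runners) such as Flink, Spark, and Google Cloud Dataflow, though Python support for each runner has arrived at different times, so check the Beam documentation for your version. Getting started takes one command: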
pip install apache-beam
After a simple pip install, you can run Beam jobs, such as the bundled word-count example:
python -m apache_beam.examples.wordcount --input MANIFEST.in --output counts
Check out some details on speech recognition with Python, Python support for Apache Arrow and Parquet, and some cool Spark SQL code with Python UDFs.
OpenCV has great Python bindings and tons of fun examples that work with robots, cars, and drones.
Cool Python image utilities abound for all types of graphics and image manipulation.
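As one example of those utilities, here is a minimal sketch using Pillow (an assumption: the article does not name a specific image library). The make_thumbnail helper is hypothetical; it shrinks an image while keeping its aspect ratio, converts it to grayscale, and sharpens it.

```python
from PIL import Image, ImageFilter, ImageOps


def make_thumbnail(img: Image.Image, max_size=(128, 128)) -> Image.Image:
    """Return a grayscale, slightly sharpened thumbnail without modifying `img`."""
    thumb = img.copy()
    # thumbnail() resizes in place and preserves the aspect ratio.
    thumb.thumbnail(max_size)
    # Convert to 8-bit grayscale ("L" mode), then apply a mild sharpen filter.
    return ImageOps.grayscale(thumb).filter(ImageFilter.SHARPEN)
```

Usage: make_thumbnail(Image.open("photo.jpg")).save("thumb.png").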
APIs are everywhere for Python! Here's an example for Spotify. There are also libraries for Facebook, Twitter, Instagram, Google services, Amazon services, Microsoft services, and tons of other feeds and services.
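Under the hood, most of those service libraries wrap plain HTTP and JSON. As a library-free sketch (an assumption, not how the Spotify example works), a stdlib-only helper can call a JSON endpoint with an optional OAuth bearer token. fetch_json is a hypothetical name; each real service documents its own endpoints and authentication flow.

```python
import json
import urllib.request
from typing import Optional


def fetch_json(url: str, token: Optional[str] = None, timeout: float = 10.0):
    """GET `url` and decode the JSON response body."""
    headers = {"Accept": "application/json"}
    if token:
        # Many REST APIs, including Spotify's Web API, accept OAuth bearer tokens.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Dedicated client libraries add what this sketch leaves out: token refresh, pagination, and rate-limit handling.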
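For text and language work, NLTK and TextBlob are Python libraries, TensorFlow has a first-class Python API, and Stanford CoreNLP (a Java toolkit) can be driven through Python wrappers. TextBlob in particular has lots of handy utilities and helpers. They're all easy to install, and most are well-documented.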
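As a small taste of NLTK, here is a minimal sketch that tokenizes text and counts word frequencies. It assumes NLTK is installed and uses RegexpTokenizer, which needs no downloaded models (nltk.word_tokenize, by contrast, relies on separately downloaded Punkt data). top_words is a hypothetical helper name.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer


def top_words(text: str, n: int = 3):
    """Return the n most common lowercase word tokens as (word, count) pairs."""
    tokenizer = RegexpTokenizer(r"[A-Za-z']+")
    tokens = [token.lower() for token in tokenizer.tokenize(text)]
    # FreqDist is a Counter subclass with extra NLP-oriented helpers.
    return FreqDist(tokens).most_common(n)
```

For example, top_words("Python is fun. Python is fast. Python wins.", 2) gives [("python", 3), ("is", 2)].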
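Python also has scikit-learn, which offers a wide range of Machine Learning algorithms (classification, regression, clustering, and dimensionality reduction) along with preprocessing and model-evaluation tools.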
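A minimal sketch of a typical scikit-learn workflow follows: load a bundled dataset, split it, fit a pipeline, and score it on held-out data. The choice of the Iris dataset and logistic regression is illustrative, and train_iris_classifier is a hypothetical name.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_iris_classifier(seed: int = 42):
    """Fit a scaled logistic-regression model on Iris; return (model, accuracy)."""
    # Iris ships with scikit-learn, so no download is needed.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # A pipeline keeps preprocessing and the estimator together.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    # score() reports mean accuracy on the held-out split.
    return model, model.score(X_test, y_test)
```

Because every estimator shares the same fit/predict/score interface, swapping LogisticRegression for another classifier is usually a one-line change.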
Python runs everywhere: Windows, OS X, Linux, and lots of devices. Here's an example on an ASUS Tinkerboard.
There are a lot of Python libraries that run everywhere. Here are a few I recommend installing on every platform:
NumPy.
SciPy.
NLTK.
Wheel.
Pandas.
Matplotlib.
PyTorch.
TensorFlow.
TextBlob.
spaCy.
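To check which of these are present on a given machine (a Tinkerboard, a laptop, or a cluster node), a small stdlib-only script like the following sketch can help. It assumes Python 3.8+ for importlib.metadata; the import-name to distribution-name mapping is an assumption and may need adjusting for variants such as tensorflow-cpu.

```python
import importlib.metadata
import importlib.util
import platform

# Import name -> PyPI distribution name for the libraries recommended above.
RECOMMENDED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "nltk": "nltk",
    "wheel": "wheel",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "torch": "torch",  # PyTorch
    "tensorflow": "tensorflow",
    "textblob": "textblob",
    "spacy": "spacy",
}


def library_report() -> dict:
    """Map each recommended library to its installed version, or None if missing."""
    report = {}
    for module, dist in RECOMMENDED.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None:
            report[module] = None
            continue
        try:
            report[module] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[module] = "unknown"
    return report


def print_report() -> None:
    """Print the platform and the status of each recommended library."""
    print(f"Python {platform.python_version()} on {platform.system()} {platform.machine()}")
    for module, version in library_report().items():
        print(f"  {module:<12} {version or 'not installed'}")
```

Running print_report() on each device gives a quick inventory before deploying code that depends on these libraries.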
Opinions expressed by DZone contributors are their own.