
Using Python for Big Data Workloads (Part 2)

This continuation of the series covers how, where, and why to use Python for Machine Learning, Deep Learning, and Big Data workloads.

by Tim Spann · May 31, 2017 · Big Data Zone · Tutorial

Why should you use Python for Big Data workloads? We have already discussed a few reasons; here are more. Some online surveys suggest that Python is approaching 50% share as the language of choice for Machine Learning.

  1. Deep Learning: TensorFlow, Keras, and PyTorch.

  2. OpenCV Python bindings.

  3. NLTK.

  4. PySpark.

  5. Apache Arrow, Parquet, and other project support.

  6. Apache Beam 2.0 support for Python.

  7. Speech recognition.

  8. API support.

  9. scikit-learn and other cool Machine Learning libraries.

  10. Utilities abound, each just a pip install away.

Here's a cool GitHub example of using Keras with OpenCV and Python for face detection.

Step one is to install OpenCV with Python.

There's some big news from Google about a new release of Apache Beam 2.0, and Python is now supported. You can now do streaming with Flink, Spark, and more using Python:

pip install apache-beam 

After a simple pip install, you can run Beam jobs:

python -m apache_beam.examples.wordcount --input MANIFEST.in --output counts 
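
To make clearer what a word count job produces, here is a minimal sketch of roughly the same computation in plain Python, using only the standard library. It is not the Beam example's source code: the function names `count_words` and `format_counts`, the tokenizer pattern, and the sample lines are assumptions chosen for illustration.

```python
import re
from collections import Counter


def count_words(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Treat runs of letters and apostrophes as words (an assumed tokenizer).
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


def format_counts(counts):
    """Render one 'word: count' line per word, sorted for stable output."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical input lines standing in for the contents of a text file.
    sample = ["include README.md", "include LICENSE README.md"]
    for row in format_counts(count_words(sample)):
        print(row)
```

A Beam pipeline expresses the same read, split, count, and write steps as transforms, so the work can be handed to a runner instead of a single local loop.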

Check out some details on speech recognition with Python, Python support for upcoming Apache Arrow and Parquet, and some cool Spark SQL code and UDF with Python.

OpenCV has a great Python library and tons of fun examples that work with robots, cars, and drones.

Cool Python image utilities abound for all types of graphic and image manipulation.

APIs are everywhere for Python! Here's an example on Spotify. There are also libraries for Facebook, Twitter, Instagram, Google services, Amazon services, Microsoft services, and tons of other feeds and services.

TensorFlow, NLTK, Stanford CoreNLP (through a Python wrapper), and TextBlob offer many cool libraries, utilities, and helpers. They're all easy to install, and most are well-documented.

Python also has scikit-learn, which is packed with great Machine Learning goodies.

Python runs everywhere: Windows, macOS, Linux, and lots of devices. Here's an example on an ASUS Tinker Board.

There are a lot of Python libraries that run everywhere. Here are a few I recommend installing on every platform:

  • NumPy.

  • SciPy.

  • NLTK.

  • Wheel.

  • pandas.

  • Matplotlib.

  • PyTorch.

  • TensorFlow.

  • TextBlob.

  • spaCy.
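
To check which of these libraries are already available on a given machine, a small script along these lines may help. It is a sketch using only the standard library: the helper name `missing_modules` and the `RECOMMENDED` list are assumptions for illustration, and the entries are import names, which sometimes differ from the project name (PyTorch imports as `torch`).

```python
import importlib.util

# Import names for the libraries listed above (PyTorch imports as "torch").
RECOMMENDED = [
    "numpy", "scipy", "nltk", "wheel", "pandas",
    "matplotlib", "torch", "tensorflow", "textblob", "spacy",
]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(RECOMMENDED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All recommended libraries are importable.")
```

Using `find_spec` locates each package without importing it, so the check stays fast even for heavy libraries such as TensorFlow.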

