Deep Learning for Data Engineers: Part I

Big Data engineers and data DevOps teams come into play when data and code actually need to be installed, run, and scaled.


Deep Learning is not just for super genius Data Science PhDs. (OK, some of it is.)

Someone needs to get this code running, scale it, distribute it, install it, integrate it, and execute it. This is where Big Data engineers and data DevOps teams come into play.

I have been integrating some of these frameworks, libraries, models and tools into existing Hadoop, Big Data, Spark, and Machine Learning pipelines. Let's make this work!

Step 1: Practice Installing Deep Learning Libraries

Follow the installation instructions for PyTorch, TensorFlow, and MXNet. Make sure you can get them running on CentOS, OS X, and Ubuntu (and any other OS you may run).
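Once the installs are done, it helps to have a quick, repeatable check that each machine can actually import the libraries. Here is a minimal sketch, assuming the usual Python import names (torch, tensorflow, mxnet); the function name check_frameworks is made up for illustration.

```python
import importlib
import importlib.util

# Python import names for the three frameworks (assumed standard installs).
FRAMEWORKS = ("torch", "tensorflow", "mxnet")


def check_frameworks(names=FRAMEWORKS):
    """Return {module_name: version string, or None if missing or broken}."""
    report = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except Exception:
            # Installed but unusable, e.g. a missing native library.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report
```

Running check_frameworks() on each OS and saving the result gives you a simple record of which installs worked and which versions you ended up with.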

Step 2: Compile and Run the Examples and Training

Run, document, and time all of these exercises.
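For the timing part, a small wrapper that logs how long each exercise took is usually enough. This is only a sketch: timed_run and the timings.jsonl log file are hypothetical names, and the wrapped function stands in for whatever example or training script you are running.

```python
import json
import time
from datetime import datetime, timezone


def timed_run(label, func, *args, log_path="timings.jsonl", **kwargs):
    """Run func, append its wall-clock duration to a JSON-lines log, return its result."""
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    record = {"label": label, "started_utc": started, "seconds": round(elapsed, 3)}
    # One JSON object per line keeps the log easy to append to and to load later.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result
```

Because every run appends one line, you can later load the log to compare the same exercise across machines and operating systems.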

Step 3: Download Common Data Sets

Sign up for development accounts with Twitter, Weather Underground, Google, Microsoft, and more. Start looking for interesting new data sets, such as the Movement data from Uber.

Step 3b: Download All the Model Zoos That You Can

Here's a couple:

Step 4: Prepare Ingestion of Streaming and Live Feeds for Scientist Use

I recommend Apache NiFi to ingest:

  • Twitter (at least four feeds following sites of interest, news, and topics for your industry).
  • News sites.
  • NOAA.
  • Weather Underground.
  • State and Federal feeds.
  • Any paid feeds you have.

Step 5: Store Your Data

Store your data as ORC in HDFS with Hive tables on top for fast queries. Also, keep original copies in JSON and XML for other uses.
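To make the "Hive tables on top" part concrete, here is a rough sketch that builds the HiveQL for an external ORC table over an HDFS directory. The helper name orc_table_ddl and the table, columns, and path in the usage note are all assumptions for illustration; adapt them to your own feeds.

```python
def orc_table_ddl(table, columns, location):
    """Build a HiveQL statement for an external ORC table over an HDFS directory.

    columns is a list of (column_name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )
```

For example, orc_table_ddl("tweets", [("id", "BIGINT"), ("msg", "STRING")], "/data/tweets/orc") produces a statement you can run from Beeline or any Hive client. An external table leaves the ORC files in place if the table is dropped, which fits the advice to keep the original copies around.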

Step 6: Create Summary Data

Create your summary data in HBase/Phoenix.
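As one possible shape for that summary data, the sketch below counts ingested records per feed and day, then renders parameterized Phoenix UPSERT statements (Phoenix writes rows with UPSERT rather than INSERT). The table FEED_DAILY_SUMMARY, its columns, and the record fields source and timestamp are hypothetical, and the "?" placeholders assume a driver that accepts that parameter style.

```python
from collections import Counter


def daily_counts(records):
    """Count records per (source, day); each record needs 'source' and an ISO 'timestamp'."""
    # The first 10 characters of an ISO timestamp are the YYYY-MM-DD date.
    return Counter((r["source"], r["timestamp"][:10]) for r in records)


def phoenix_upserts(counts, table="FEED_DAILY_SUMMARY"):
    """Yield (sql, params) pairs, one per summary row, in sorted key order."""
    sql = (
        f"UPSERT INTO {table} (FEED_SOURCE, EVENT_DAY, RECORD_COUNT) "
        "VALUES (?, ?, ?)"
    )
    for (source, day), count in sorted(counts.items()):
        yield sql, (source, day, count)
```

Each yielded pair can be handed to a cursor's execute call, so the same summary job can be rerun safely: an UPSERT on the same key overwrites the earlier row instead of duplicating it.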
