Deep Learning for Data Engineers: Part I

Big Data engineers and data DevOps teams come into play when data and code actually need to be run.

Deep Learning is not just for super genius Data Science PhDs. (OK, some of it is.)

Someone needs to install this code, integrate it, get it running, scale it, and distribute it. This is where the Big Data engineers and data DevOps teams come into play.

I have been integrating some of these frameworks, libraries, models, and tools into existing Hadoop, Big Data, Spark, and Machine Learning pipelines. Let's make this work!

Step 1: Practice Installing Deep Learning Libraries

Follow the installation instructions for PyTorch, TensorFlow, and MXNet. Make sure you can get them running on CentOS, OS X, and Ubuntu (and any other OS you may run).
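A quick way to confirm that an install actually took on a given machine is a small check script. The sketch below assumes the usual import names (`torch`, `tensorflow`, `mxnet`) and that the pip distribution names match them; it only reports whether each library can be found and which version is installed. It does not test GPUs or run a model.

```python
import importlib.util
import platform
from importlib import metadata

# Import name -> pip distribution name (assumed; adjust for your environment).
LIBRARIES = {"torch": "torch", "tensorflow": "tensorflow", "mxnet": "mxnet"}


def check_libraries(libraries=LIBRARIES):
    """Return {import_name: version or None} without importing the packages."""
    report = {}
    for module_name, dist_name in libraries.items():
        # find_spec locates the module on the path but does not import it.
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            report[module_name] = "installed (version unknown)"
    return report


if __name__ == "__main__":
    print(f"OS: {platform.system()} {platform.release()}")
    for name, version in check_libraries().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Running the same script on each OS gives you a comparable record of what is installed where.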

Step 2: Compile and Run the Examples and Training

Run, document, and time all of these exercises.
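To make the "document and time" part repeatable, something like the following helper can wrap each exercise. This is a minimal sketch: the function names and the JSON-lines log format are my own choices for illustration, not part of any framework.

```python
import json
import time
from datetime import datetime, timezone


def time_run(label, func, *args, **kwargs):
    """Run func once and return (result, record), where record documents the run."""
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    record = {"label": label, "started_utc": started, "seconds": round(elapsed, 4)}
    return result, record


def append_record(path, record):
    """Append one JSON line per run so the timings build up into a log."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

For example, `result, rec = time_run("toy-sum", sum, range(1_000_000))` followed by `append_record("timings.jsonl", rec)` leaves a one-line record of the run; in practice `func` would be whatever kicks off a training example.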

Step 3: Download Common Data Sets

Sign up for development accounts with Twitter, Weather Underground, Google, Microsoft, and more. Start looking for cool new data, like the movement data set from Uber.
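For data sets that are plain files behind a URL, a small download-and-cache helper keeps things tidy. The sketch below uses only the standard library; the function name is hypothetical, and it does not handle API keys or authenticated services such as Twitter, which need their own clients.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_dataset(url, dest_dir, filename=None):
    """Download url into dest_dir unless it is already there; return (path, sha256)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / (filename or url.rstrip("/").rsplit("/", 1)[-1])
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a complete file.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    # Hash in chunks so large files do not have to fit in memory.
    sha = hashlib.sha256()
    with open(target, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return target, sha.hexdigest()
```

Recording the checksum next to each file makes it easy to tell later whether two teams are actually working from the same copy.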

Step 3b: Download All the Model Zoos That You Can

Here's a couple:

Step 4: Prepare Ingestion of Streaming and Live Feeds for Scientist Use

I recommend Apache NiFi to ingest:

  • Twitter (at least four feeds following sites of interest, news, and topics for your industry).
  • News sites.
  • NOAA.
  • Weather Underground.
  • State and Federal feeds.
  • Any paid feeds you have.

Step 5: Store Your Data

Store your data as ORC in HDFS with Hive tables on top for fast queries. Also, keep original copies in JSON and XML for other uses.
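As a rough illustration of "Hive tables on top," the helper below builds the HiveQL for an external ORC table over an HDFS directory. The table name, columns, and path in the usage example are made up, and the function does no escaping, so it should only be fed names you control.

```python
def orc_table_ddl(table, columns, location):
    """Build a HiveQL statement for an external ORC table over an HDFS directory."""
    if not columns:
        raise ValueError("at least one column is required")
    # Backticks quote column names so reserved words do not break the DDL.
    column_lines = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table and path, for illustration only.
    print(orc_table_ddl(
        "tweets",
        [("id", "BIGINT"), ("created_at", "TIMESTAMP"), ("text", "STRING")],
        "/data/tweets/orc",
    ))
```

An external table leaves the ORC files where they are, so dropping the table in Hive does not delete the data.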

Step 6: Create Summary Data

Create your summary data in HBase/Phoenix.
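What "summary data" means depends on your feeds. As one possible shape, the sketch below counts ingested records per source per hour and produces rows for a parameterized Phoenix `UPSERT`. The table `feed_summary`, its columns, and the record fields `source` and `timestamp` are all assumptions for illustration; executing the statement would need a Phoenix client, which is not shown.

```python
from collections import Counter
from datetime import datetime

# Hypothetical summary table; Phoenix uses UPSERT rather than INSERT.
UPSERT_SQL = "UPSERT INTO feed_summary (source, hour_utc, record_count) VALUES (?, ?, ?)"


def summarize_by_hour(records):
    """Count records per (source, hour) and return rows matching UPSERT_SQL."""
    counts = Counter()
    for record in records:
        # Truncate each ISO-8601 timestamp to the start of its hour.
        hour = datetime.fromisoformat(record["timestamp"]).strftime("%Y-%m-%d %H:00")
        counts[(record["source"], hour)] += 1
    return [(source, hour, count) for (source, hour), count in sorted(counts.items())]
```

Because `UPSERT` overwrites a row with the same primary key, recomputing a summary for the same source and hour replaces the old count instead of duplicating it, assuming `source` and `hour_utc` form the key.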
