Deep Learning is not just for super-genius Data Science PhDs. (OK, some of it is.)
Someone needs to get this code installed, running, scaled, distributed, and integrated into production. This is where Big Data engineers and data DevOps teams come into play.
I have been integrating some of these frameworks, libraries, models, and tools into existing Hadoop, Spark, Big Data, and Machine Learning pipelines. Let's make this work!
Step 1: Practice Installing Deep Learning Libraries
Follow the installation instructions for PyTorch, TensorFlow, and MXNet, and make sure you can get them running on CentOS, macOS, and Ubuntu (and any other OS you may run).
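Once everything is installed, a quick sanity check saves debugging time later. Here's a minimal sketch (the pip package names in the comment are the common defaults; adjust them for your CUDA version and OS):

```python
# Sanity-check script: verify each framework imports and reports a version.
# Install first, e.g.:  pip install torch tensorflow mxnet
import importlib

for name in ("torch", "tensorflow", "mxnet"):
    try:
        module = importlib.import_module(name)
        print(f"{name} {module.__version__} OK")
    except ImportError as err:
        print(f"{name} missing: {err}")

# If you installed GPU builds, confirm the GPU is visible (PyTorch shown here).
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    pass
```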
Step 2: Compile and Run the Examples and Training Scripts
Run, document, and time all of these exercises.
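For the timing, wrapping a short training loop with `time.perf_counter` is enough to start building a benchmark log. Here's a minimal sketch using a toy PyTorch model (the layer sizes and random data are made up purely for illustration):

```python
import time
import torch
import torch.nn as nn

# Toy regression model and random data, purely for benchmarking the setup.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(1024, 64), torch.randn(1024, 1)

start = time.perf_counter()
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
elapsed = time.perf_counter() - start
print(f"100 epochs in {elapsed:.2f}s, final loss {loss.item():.4f}")
```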
Step 3: Download Common Data Sets
Sign up for development accounts with Twitter, Weather Underground, Google, Microsoft, and more. Start looking for cool new data sets, like the Uber Movement data.
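Many of these data sets come down as flat files over HTTP. Here's a minimal download sketch (the URL and filename are placeholders for whatever data set you sign up for):

```python
import requests

# Placeholder URL: substitute the download link from your developer account.
URL = "https://example.com/datasets/sample.csv"

# Stream the file to disk in chunks so large data sets don't exhaust memory.
response = requests.get(URL, stream=True, timeout=60)
response.raise_for_status()
with open("sample.csv", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print("downloaded", f.name)
```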
Step 3b: Download All the Model Zoos That You Can
Here are a couple of well-known ones to get you started: the Caffe Model Zoo, the TensorFlow models repository, and torchvision's pretrained models.
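Once you have a zoo in hand, pulling a pretrained model is usually one call. Here's a sketch using torchvision's zoo (note the API flag differs across torchvision versions):

```python
import torch
import torchvision.models as models

# Download ResNet-18 weights from the torchvision model zoo (cached locally).
# Older torchvision versions use pretrained=True; newer ones use weights=...
model = models.resnet18(pretrained=True)
model.eval()

# Run a dummy inference to confirm the model works end to end.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print("output shape:", logits.shape)  # torch.Size([1, 1000])
```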
Step 4: Prepare Ingestion of Streaming and Live Feeds for Data Scientists
I recommend Apache NiFi for ingesting these streaming and live feeds.
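One easy pattern is to stand up a ListenHTTP processor in NiFi and have small Python collectors post records into it. Here's a sketch assuming ListenHTTP is listening on port 8081 with its default `contentListener` base path (adjust host, port, and path to your flow):

```python
import json
import requests

# Assumes a NiFi ListenHTTP processor on port 8081 with the default
# base path "contentListener"; adjust host/port/path to your flow.
NIFI_URL = "http://nifi-host:8081/contentListener"

record = {"sensor": "weather-01", "temp_f": 72.4, "ts": "2017-05-01T12:00:00Z"}
response = requests.post(
    NIFI_URL,
    data=json.dumps(record),
    headers={"Content-Type": "application/json"},
    timeout=10,
)
response.raise_for_status()
print("ingested, status", response.status_code)
```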
Step 5: Store Your Data
Store your data as ORC in HDFS with Hive tables on top for fast queries. Also, keep original copies in JSON and XML for other uses.
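With Spark already in the pipeline, landing data as ORC-backed Hive tables is a one-liner on a DataFrame. Here's a sketch assuming a Hive-enabled SparkSession and a hypothetical `events.json` feed already landed in HDFS:

```python
from pyspark.sql import SparkSession

# Hive support lets saveAsTable register the ORC files as a queryable table.
spark = (SparkSession.builder
         .appName("land-orc")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical raw feed already landed in HDFS as JSON.
df = spark.read.json("hdfs:///data/raw/events.json")

# Write ORC files and register them as a Hive table for fast SQL queries.
df.write.format("orc").mode("overwrite").saveAsTable("events_orc")

spark.sql("SELECT COUNT(*) FROM events_orc").show()
```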
Step 6: Create Summary Data
Create your summary data in HBase, with Apache Phoenix providing SQL access on top.
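For the summary layer, Phoenix lets you write plain SQL against HBase. Here's a sketch using the `phoenixdb` Python driver against a Phoenix Query Server (the host, table, and columns are illustrative, not from any specific deployment):

```python
import datetime
import phoenixdb

# Assumes a Phoenix Query Server at this URL; the table and columns
# are illustrative placeholders.
conn = phoenixdb.connect("http://phoenix-host:8765/", autocommit=True)
cursor = conn.cursor()

cursor.execute(
    "CREATE TABLE IF NOT EXISTS daily_summary ("
    "dt DATE PRIMARY KEY, events BIGINT, avg_temp DOUBLE)"
)

# Phoenix uses UPSERT rather than INSERT for writes.
cursor.execute(
    "UPSERT INTO daily_summary VALUES (?, ?, ?)",
    (datetime.date(2017, 5, 1), 1024, 71.3),
)

cursor.execute("SELECT * FROM daily_summary ORDER BY dt")
print(cursor.fetchall())
conn.close()
```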