Deep Learning and Machine Learning Guide: Part I

From doing deep dives and checking out cool projects to working on distributed frameworks yourself, there are many ways to learn about deep learning and machine learning.

There are a ton of great resources for deep learning and machine learning. I have put together some of the most interesting, focusing on applications and use cases.

Step 1 is to learn the languages of ML and DL. C and C++ are strong in DL and Python is everywhere, so learning Python is critical. Next up, you will need to learn a JVM language. Java 8 will work for most use cases, but I recommend adding Scala to your toolkit. Dan Garrette has a great article on the basics of Scala.

Step 2 is to start picking up machine learning frameworks like MXNet, which is now in Apache.

Step 3 is to get your data. Apache Olingo for Open Data Protocol lets you read OData and Open Data sources, which is helpful. Speaking of data, with the Super Bowl just past, NFL Savant has a huge set of NFL data. It’s 8 million URLs and over a terabyte of data. Google has some blog articles that you can check out here, here, and here. This would be a good shared data set for machine learning and deep learning (especially TensorFlow). Do we have a place we can land this? I did not have a few hundred gigabytes of space to store it or the bandwidth to grab it. That's something to note: some data sets come with Python scripts that will start downloading gigabytes of data. The Open Images Database is another great source. Also, Microsoft has another huge image database called Microsoft OpenImages Dataset (9 million image URLs; 654 MB).
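Before kicking off one of those download scripts, it can help to do the arithmetic first. The following is a minimal sketch, not part of any data set's tooling: the function names `download_hours` and `has_room` and the 10% headroom default are assumptions made for illustration. It estimates how long a transfer would take at a given bandwidth and checks whether the target disk has enough free space.

```python
import shutil


def download_hours(size_bytes, megabits_per_second):
    """Estimate transfer time in hours, ignoring protocol overhead."""
    bits = size_bytes * 8
    seconds = bits / (megabits_per_second * 1_000_000)
    return seconds / 3600


def has_room(path, size_bytes, headroom=0.10):
    """Return True if the disk holding `path` can fit the data plus headroom.

    `headroom` is a hypothetical safety margin (10% by default) so the
    download does not fill the disk completely.
    """
    free = shutil.disk_usage(path).free
    return free >= size_bytes * (1 + headroom)


if __name__ == "__main__":
    one_terabyte = 10**12
    # About 22.2 hours for a terabyte on a 100 Mbit/s link.
    print(f"{download_hours(one_terabyte, 100):.1f} hours")
    print("enough space here:", has_room(".", one_terabyte))
```

If `has_room` comes back false, that is the cue to find shared storage before starting, which is exactly the problem described above.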

Step 4 is to dive deep into TensorFlow. Google's library has a lot of tutorials, examples, and documentation. Even if you want to switch to a mature library like DL4J, TensorFlow is a great place to start. Get yourself a machine with a few terabytes of SSD, 64 gigs of RAM, and a nice GPU before you get serious about deep learning. Or, you can use some AWS GPUs in the cloud for running your algorithms. Here are some resources that you might find useful:

Spark is still the choice for running distributed jobs on Hadoop for multiple workloads (e.g., graph, batch, streaming, SQL, ML, DL).

Step 5 is to start reading. Here are some more resources:

Step 6 is to look at cool projects. Eigenfaces and facial recognition are really cool use cases and have many practical security applications. Check out these: 
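To make the eigenfaces idea concrete, here is a toy sketch in pure Python. It is an illustration under loud assumptions, not a production approach: the four-pixel "images", the labels, and every function name are hypothetical, and it keeps only the single strongest eigenface (found by power iteration), where a real system would use a numerical library and many components. The idea is to subtract the average face, find the direction in pixel space along which faces vary most, and recognize a new face by finding the known face with the closest projection onto that direction.

```python
import math
import random


def mean_face(faces):
    """Pixel-wise average of equally sized, flattened images."""
    return [sum(column) / len(faces) for column in zip(*faces)]


def top_eigenface(faces, iterations=200, seed=0):
    """Return (mean face, unit-length first principal component)."""
    mean = mean_face(faces)
    centered = [[p - m for p, m in zip(face, mean)] for face in faces]
    rng = random.Random(seed)
    v = [rng.random() for _ in mean]
    for _ in range(iterations):
        # Power iteration on the scatter matrix A^T A without building it:
        # first A v (one score per face), then A^T applied to those scores.
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all faces identical, no direction of variation
        v = [x / norm for x in w]
    return mean, v


def project(face, mean, eigenface):
    """Coordinate of a face along the eigenface direction."""
    return sum((p - m) * e for p, m, e in zip(face, mean, eigenface))


def recognize(probe, faces, labels, mean, eigenface):
    """Label of the known face whose projection is nearest the probe's."""
    target = project(probe, mean, eigenface)
    distances = [abs(project(f, mean, eigenface) - target) for f in faces]
    return labels[distances.index(min(distances))]


# Hypothetical 4-pixel "faces" for two made-up people.
FACES = [[10, 10, 0, 0], [9, 11, 0, 1], [0, 0, 10, 10], [1, 0, 11, 9]]
LABELS = ["alice", "alice", "bob", "bob"]
MEAN, EIGENFACE = top_eigenface(FACES)

if __name__ == "__main__":
    print(recognize([10, 9, 1, 0], FACES, LABELS, MEAN, EIGENFACE))
```

With real photographs each face would be a vector of thousands of pixels, and you would keep a few dozen eigenfaces rather than one, but the projection-and-nearest-neighbor step stays the same.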

Step 7 is to check out distributed deep learning frameworks like deep learning on Spark and Caffe on Spark from Yahoo.

Finally, here are a couple of interesting examples of image analysis with Spark.
