Machine Learning on Big Data


Check out popular cases of machine learning and data analytics on big data, such as Netflix recommendation engines and AI that can comprehend handwriting.


There is a revolution happening in the field of machine learning and big data. From every coffee that you buy to everything you click (not to mention purchase) online, everything is being tracked and analyzed. From these analyses, many deductions are made to offer you new and better choices according to what you like.

Earlier, technologies like machine learning and artificial intelligence mostly sat in labs and were rarely put into production, but not anymore. With the rise of big data, these technologies have gone mainstream. Using them, you can predict a wide range of outcomes, from which advertisement a user is going to click on next to whether a tumor is cancerous based on image recognition alone.

Let’s look at some popular use cases where machine learning and regular analytics are applied to big data on a day-to-day basis. Along the way, I will also mention how they are covered in the book Big Data Analytics With Java.

Recommendation Engines

I loved watching Marco Polo on Netflix, so I was recommended similar movies and shows that I might like (refer to the image above). This is one of the most common use cases of machine learning: the machine learns from our historical data and makes appropriate recommendations to us.
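To make the idea of learning from historical data concrete, here is a minimal sketch of user-based collaborative filtering in plain Java. It is not how Netflix works and not the book's implementation; the class name `SimpleRecommender`, the user names, and the ratings are all made up for illustration. It scores titles a user has not rated yet by the ratings of similar users, using cosine similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: user-based collaborative filtering on a tiny in-memory data set.
final class SimpleRecommender {

    // Cosine similarity between two users' rating maps (title -> rating).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Ranks titles the user has not rated by the similarity-weighted ratings of other users.
    // Assumes the user is present in the ratings map.
    static List<String> recommend(String user, Map<String, Map<String, Double>> ratings, int topN) {
        Map<String, Double> mine = ratings.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double similarity = cosine(mine, other.getValue());
            if (similarity <= 0) {
                continue; // users with no overlap in taste contribute nothing
            }
            for (Map.Entry<String, Double> rating : other.getValue().entrySet()) {
                if (!mine.containsKey(rating.getKey())) {
                    scores.merge(rating.getKey(), similarity * rating.getValue(), Double::sum);
                }
            }
        }
        List<String> titles = new ArrayList<>(scores.keySet());
        titles.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return titles.subList(0, Math.min(topN, titles.size()));
    }

    public static void main(String[] args) {
        // Made-up sample ratings on a 1 to 5 scale.
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("Marco Polo", 5.0, "Vikings", 4.0));
        ratings.put("bob", Map.of("Marco Polo", 5.0, "Vikings", 5.0, "The Last Kingdom", 5.0));
        ratings.put("carol", Map.of("Cooking Show", 5.0, "Baking Show", 4.0));
        System.out.println(recommend("alice", ratings, 3));
    }
}
```

On real data sets with millions of users, this pairwise comparison is too slow, which is one reason such workloads move to distributed big data tooling.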

Frequently Bought Together

Let’s look at the image above. As you probably know, when you open an item's detail page on an e-commerce store, you'll be shown other products that are frequently sold along with it. This gives the user more options to purchase along with the current item and is done to boost sales.
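A very simple way to approximate a "frequently bought together" list is to count how often other items share an order with the item being viewed. The sketch below does exactly that; it is an assumption-laden illustration (the class `FrequentlyBoughtTogether` and the sample orders are invented), and real stores typically use more refined association-rule or recommendation models.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: co-purchase counting over a list of orders.
final class FrequentlyBoughtTogether {

    // For the given item, counts how often every other item appears in the same order.
    static Map<String, Integer> coPurchaseCounts(String item, List<Set<String>> orders) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> order : orders) {
            if (!order.contains(item)) {
                continue;
            }
            for (String other : order) {
                if (!other.equals(item)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the most frequent companions, highest count first (ties broken alphabetically).
    static List<String> topCompanions(String item, List<Set<String>> orders, int topN) {
        Map<String, Integer> counts = coPurchaseCounts(item, orders);
        List<String> items = new ArrayList<>(counts.keySet());
        items.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return items.subList(0, Math.min(topN, items.size()));
    }

    public static void main(String[] args) {
        // Made-up orders for illustration only.
        List<Set<String>> orders = List.of(
                Set.of("phone", "case", "charger"),
                Set.of("phone", "case"),
                Set.of("phone", "charger", "screen protector", "case"),
                Set.of("laptop", "mouse"));
        System.out.println(topCompanions("phone", orders, 2));
    }
}
```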

Predictive Analytics

Machine learning is widely used to predict the future value of items, as long as historical data is available on which to train the models. The value can be anything, whether it’s the budget needed for a marketing campaign, the expenditure needed to launch a new product, or the price of a product. The book Big Data Analytics With Java uses a real-life case study of predicting the price of a house based on a dataset of different variables released by King County, Washington.
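As a rough illustration of training on historical data and then predicting a value, the following sketch fits a one-variable least-squares line (price against living area). This is not the book's model, which uses many variables; the class name `SimpleLinearRegression` and the numbers are invented for the example.

```java
// Hypothetical illustration: ordinary least squares with a single input variable.
final class SimpleLinearRegression {
    final double slope;
    final double intercept;

    // Fits y = intercept + slope * x to the training points.
    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two matching (x, y) points");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double covariance = 0, variance = 0;
        for (int i = 0; i < x.length; i++) {
            covariance += (x[i] - meanX) * (y[i] - meanY);
            variance += (x[i] - meanX) * (x[i] - meanX);
        }
        if (variance == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        slope = covariance / variance;
        intercept = meanY - slope * meanX;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Made-up training data: living area in square feet and sale price in dollars.
        double[] area = {1000, 1500, 2000};
        double[] price = {200_000, 300_000, 400_000};
        SimpleLinearRegression model = new SimpleLinearRegression(area, price);
        System.out.println(model.predict(2500));
    }
}
```

A real house-price model would add more features (bedrooms, location, year built) and would be evaluated on held-out data before anyone trusted its predictions.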

Spam Detection and Sentiment Analysis

Spam detection is a popular use case. Gmail does it for us, and we have come to rely on it. Let’s look at the image of two emails shown above. The email on the left is clearly spam, while the email on the right is perfectly fine.
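The article does not say which algorithm sits behind the spam filter. One common choice for text classification is multinomial naive Bayes, and the sketch below assumes that technique. The class `NaiveBayesTextClassifier`, the labels, and the training messages are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: multinomial naive Bayes with add-one (Laplace) smoothing.
final class NaiveBayesTextClassifier {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> word total
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> document total
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Returns the label with the highest log-probability, or null if nothing was trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // class prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesTextClassifier classifier = new NaiveBayesTextClassifier();
        // Made-up training messages.
        classifier.train("spam", "win free money now");
        classifier.train("spam", "free prize claim now");
        classifier.train("ham", "meeting agenda for monday");
        classifier.train("ham", "lunch with the team monday");
        System.out.println(classifier.classify("claim your free prize"));
    }
}
```

Swapping the labels from spam and ham to positive and negative turns the same classifier into a basic sentiment analyzer.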

Using the same algorithm that's used for spam detection, Big Data Analytics With Java works through a sample case study that determines the sentiment (positive or negative) expressed by users in a set of tweets about different movies. See the image below.

Social Analytics and Regular Graph Analytics

When you search for a destination on your GPS, a graph search algorithm runs to figure out the shortest path to your destination. Running these algorithms on a small piece of data is one thing, but running them on a huge amount of data requires specialized software such as GraphFrames, which runs on top of Apache Spark. Also, in today’s world of social networks, we have huge social graphs that connect us to people we know, that is, to our friends, our friends of friends, and so on. The image above shows a very simple social graph, yet it hints at how complex these graphs can get.
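To show what a shortest-path search does on a small graph, here is a sketch of Dijkstra's algorithm in plain Java. It runs in memory on a single machine and is not the GraphFrames API; the class `ShortestPath`, the node names, and the edge weights are invented for illustration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration: Dijkstra's algorithm on a small weighted, directed graph.
final class ShortestPath {

    // graph: node -> (neighbor -> non-negative edge weight).
    // Returns the shortest distance from source to every reachable node.
    static Map<String, Double> dijkstra(Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> distance = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue = new PriorityQueue<Map.Entry<String, Double>>(
                (a, b) -> Double.compare(a.getValue(), b.getValue()));
        distance.put(source, 0.0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0.0));

        while (!queue.isEmpty()) {
            Map.Entry<String, Double> current = queue.poll();
            String node = current.getKey();
            if (current.getValue() > distance.get(node)) {
                continue; // stale queue entry, a shorter route was already found
            }
            Map<String, Double> edges = graph.getOrDefault(node, Map.of());
            for (Map.Entry<String, Double> edge : edges.entrySet()) {
                double candidate = current.getValue() + edge.getValue();
                if (candidate < distance.getOrDefault(edge.getKey(), Double.POSITIVE_INFINITY)) {
                    distance.put(edge.getKey(), candidate);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        // Made-up road network: edge weights are travel times in minutes.
        Map<String, Map<String, Double>> roads = Map.of(
                "home", Map.of("junction", 1.0, "mall", 4.0),
                "junction", Map.of("mall", 2.0),
                "mall", Map.of("office", 1.0));
        System.out.println(dijkstra(roads, "home"));
    }
}
```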

Big Data Analytics With Java has an extensive chapter on graph analytics and covers a case study on a real dataset of airports and connecting flights. Using this dataset, we run analytics such as the PageRank algorithm to figure out which airport ranks as the most important, find the shortest path between destinations in the graph, and more.
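For intuition about how PageRank scores the nodes of a flight graph, the sketch below runs the classic power iteration in plain Java. It is a single-machine illustration, not the GraphFrames implementation or the book's code; the class `PageRank`, the routes, and the damping factor of 0.85 are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: PageRank by power iteration on a small directed graph.
final class PageRank {

    // links: node -> list of nodes it points to. Returns node -> rank; ranks sum to about 1.
    static Map<String, Double> rank(Map<String, List<String>> links, double damping, int iterations) {
        Set<String> nodes = new TreeSet<>(links.keySet());
        for (List<String> targets : links.values()) {
            nodes.addAll(targets);
        }
        int n = nodes.size();
        Map<String, Double> rank = new HashMap<>();
        for (String node : nodes) {
            rank.put(node, 1.0 / n);
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            double danglingMass = 0; // rank held by nodes with no outgoing links
            for (String node : nodes) {
                next.put(node, (1 - damping) / n);
                if (links.getOrDefault(node, List.of()).isEmpty()) {
                    danglingMass += rank.get(node);
                }
            }
            for (String node : nodes) {
                List<String> targets = links.getOrDefault(node, List.of());
                for (String target : targets) {
                    next.merge(target, damping * rank.get(node) / targets.size(), Double::sum);
                }
            }
            for (String node : nodes) {
                next.merge(node, damping * danglingMass / n, Double::sum); // spread dangling rank evenly
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Made-up routes: three airports all fly into ORD, and ORD flies back to SEA.
        Map<String, List<String>> routes = Map.of(
                "SEA", List.of("ORD"),
                "DEN", List.of("ORD"),
                "ATL", List.of("ORD"),
                "ORD", List.of("SEA"));
        System.out.println(rank(routes, 0.85, 100));
    }
}
```

In this toy graph the hub that every other airport flies into ends up with the highest score, which is the behavior the airport case study relies on.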

Image Classification and Natural Language Processing

Image classification and NLP are both tough and interesting problems to solve. Artificial neural networks are already very good in these fields and keep getting better. In fact, some convolutional neural networks can classify handwritten digits with over 99% accuracy.


The use cases and examples above are just a few; there are plenty of other use cases for analytics today. Artificial intelligence and other analytical processes are becoming so embedded in our day-to-day routines that we will clearly see the usage of these techniques grow in the near future.


