Recommendation System Using Spark, ML Akka, and Cassandra

Let's look at how to build a recommendation system with Spark, ML Akka, and Cassandra.

by Ederson Corbari · Oct. 09, 19 · Tutorial

[Figure: Recommendation system]

Overview

Building a recommendation system with Spark is straightforward. Spark’s machine learning library already does most of the hard work for us.

In this tutorial, I will show you how to build a scalable big data application using the following technologies:

  • Scala Language
  • Spark with Machine Learning
  • Akka with Actors
  • Cassandra

A recommendation system is an information filtering mechanism that attempts to predict the rating a user would give to a particular product. Several algorithms can be used to build one.

Apache Spark ML implements alternating least squares (ALS), a very popular collaborative filtering algorithm for making recommendations.

The ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It factors the user-to-item rating matrix A into the user-to-feature matrix U and the item-to-feature matrix M, so that A ≈ U·Mᵀ.

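Spark runs the ALS algorithm in parallel. ALS tries to uncover the latent factors that explain the observed user-to-item ratings. It searches for factor weights that minimize the least-squares error between predicted and actual ratings. It does this by alternating between two steps: first it fixes M and solves for U, then it fixes U and solves for M. These two steps repeat for a set number of iterations.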
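For reference, the factorization and the ALS-WR objective can be written as below, following the standard ALS-WR formulation. The notation is as follows:

- Ω is the set of observed (user, item) pairs, and a_{ui} is an observed rating.
- **u**_u and **m**_i are the rows of U and M, and k is the rank.
- n_u and n_i are the number of ratings by user u and for item i.
- I_u is the set of items user u rated.

With M fixed, each user's factor vector has a closed-form least-squares solution. The item update is symmetric.

```latex
A \approx U M^{\top}, \qquad U \in \mathbb{R}^{n_{\text{users}} \times k}, \quad M \in \mathbb{R}^{n_{\text{items}} \times k}

\min_{U, M} \sum_{(u,i) \in \Omega} \left( a_{ui} - \mathbf{u}_u^{\top} \mathbf{m}_i \right)^2
  + \lambda \left( \sum_{u} n_u \lVert \mathbf{u}_u \rVert^2 + \sum_{i} n_i \lVert \mathbf{m}_i \rVert^2 \right)

\mathbf{u}_u = \left( M_{I_u}^{\top} M_{I_u} + \lambda n_u I_k \right)^{-1} M_{I_u}^{\top} \mathbf{a}_{u, I_u}
```

The factor n_u (or n_i) in front of λ is the "weighted" part of weighted-lambda regularization. Users and movies with many ratings are regularized proportionally more.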


In practice, not every user rates every product (here, movies), so many entries of the matrix are unknown. With collaborative filtering, the idea is to approximate the ratings matrix by factorizing it as the product of two matrices. One describes properties of each user, and the other describes properties of each movie. Multiplying them back together produces estimates for the missing entries.
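To make the alternating steps concrete, here is a toy, single-machine sketch of ALS-WR in plain Scala, without Spark. The object name TinyAls, the sample ratings, and the Gaussian-elimination solver are illustrative assumptions. Spark's own implementation is distributed and differs in detail.

Each half-step solves (Σ m_j m_jᵀ + λ·n·I) x = Σ r_j m_j for every user, holding the movie factors fixed. It then does the same for every movie, holding the user factors fixed.

```scala
import scala.util.Random

// Toy ALS with weighted-lambda regularization (ALS-WR); illustrative only, not Spark's code.
object TinyAls {
  type Factors = Array[Array[Double]]

  // Solves the k x k system a * x = b by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = Array.tabulate(n)(i => a(i) :+ b(i)) // augmented matrix [a | b]
    for (col <- 0 until n) {
      val pivot = (col until n).reduce((p, r) => if (math.abs(m(r)(col)) > math.abs(m(p)(col))) r else p)
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col to n) m(r)(c) -= f * m(col)(c)
      }
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) { // back substitution
      val s = (r + 1 until n).map(c => m(r)(c) * x(c)).sum
      x(r) = (m(r)(n) - s) / m(r)(r)
    }
    x
  }

  // One ALS half-step: re-solve every row's factor vector while the other side stays fixed.
  // obs(r) holds the (otherIndex, rating) pairs observed for row r.
  def updateFactors(obs: Seq[Seq[(Int, Double)]], fixed: Factors, rank: Int, lambda: Double): Factors =
    obs.map { ratings =>
      if (ratings.isEmpty) Array.fill(rank)(0.0) // no ratings: leave this factor at zero
      else {
        val a = Array.tabulate(rank, rank) { (p, q) =>
          ratings.map { case (j, _) => fixed(j)(p) * fixed(j)(q) }.sum +
            (if (p == q) lambda * ratings.size else 0.0) // lambda scaled by the rating count
        }
        val b = Array.tabulate(rank)(p => ratings.map { case (j, r) => r * fixed(j)(p) }.sum)
        solve(a, b)
      }
    }.toArray

  def train(ratings: Seq[(Int, Int, Double)], numUsers: Int, numItems: Int,
            rank: Int, iterations: Int, lambda: Double, seed: Long = 42L): (Factors, Factors) = {
    val rnd = new Random(seed)
    val byUser = Seq.tabulate(numUsers)(u => ratings.filter(_._1 == u).map(t => (t._2, t._3)))
    val byItem = Seq.tabulate(numItems)(i => ratings.filter(_._2 == i).map(t => (t._1, t._3)))
    var items: Factors = Array.fill(numItems, rank)(rnd.nextDouble())
    var users: Factors = Array.fill(numUsers, rank)(0.0)
    for (_ <- 1 to iterations) {
      users = updateFactors(byUser, items, rank, lambda) // fix M, solve U
      items = updateFactors(byItem, users, rank, lambda) // fix U, solve M
    }
    (users, items)
  }

  // Predicted rating = dot product of the user and movie factor vectors.
  def predict(users: Factors, items: Factors, u: Int, i: Int): Double =
    users(u).zip(items(i)).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    // (user, movie, rating) triples; every unlisted pair is an unknown entry of A.
    val ratings = Seq((0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0))
    val (users, items) = train(ratings, numUsers = 3, numItems = 3, rank = 2, iterations = 50, lambda = 0.01)
    for ((u, i, r) <- ratings)
      println(f"user $u, movie $i: actual $r%.1f, predicted ${predict(users, items, u, i)}%.2f")
    println(f"user 0, movie 2 (unrated): predicted ${predict(users, items, 0, 2)}%.2f")
  }
}
```

Running main prints the fitted values for the six known ratings and an estimate for the unrated pair (user 0, movie 2). The exact numbers depend on the random initialization seed.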


1. Project Architecture

The architecture used in the project:

[Figure: Project architecture]

2. Dataset

The movie information and user ratings come from the MovieLens site (the 100k dataset). The data was then customized and loaded into Apache Cassandra, which runs in a Docker container.
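If you want to inspect the original MovieLens 100k files before they are customized, here is a minimal Scala sketch for parsing them. In u.data, each line holds a user id, item id, rating, and timestamp separated by tabs. In u.item, fields are separated by pipes, and the first two are the movie id and title.

The object and case-class names are hypothetical. The article says the data was customized before loading, so the project's prepared files may use a different layout.

```scala
import scala.util.Try

// Hypothetical parsers for the raw MovieLens 100k files (u.data and u.item).
object MovieLens {
  // u.data: user id, item id, rating, timestamp, separated by tabs.
  final case class Rating(userId: Int, movieId: Int, rating: Double, timestamp: Long)
  // u.item: pipe-separated; the first two fields are the movie id and title.
  final case class Movie(movieId: Int, title: String)

  def parseRating(line: String): Option[Rating] = line.split('\t') match {
    case Array(u, i, r, t) =>
      Try(Rating(u.trim.toInt, i.trim.toInt, r.trim.toDouble, t.trim.toLong)).toOption
    case _ => None // wrong number of fields
  }

  def parseMovie(line: String): Option[Movie] = {
    val fields = line.split('|')
    if (fields.length < 2) None
    else Try(Movie(fields(0).trim.toInt, fields(1).trim)).toOption
  }
}
```

Returning Option lets a loader skip malformed lines (for example with flatMap) instead of failing the whole import.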

The keyspace is called movies. The data in Cassandra is modeled as follows:

[Figure: Cassandra data model]

3. The Code

The code is available at: https://github.com/edersoncorbari/movie-rec

4. Organization and End-Points

Cassandra tables:

Table            Contents
movies.uitem     Available movies; the dataset used has 1,682 of them.
movies.udata     Movies rated by each user; the dataset used has 100,000 ratings.
movies.uresult   Results computed by the model; empty by default.

The end-points:

Method   End-Point                         Description
POST     /movie-model-train                Trains the model.
GET      /movie-get-recommendation/{ID}    Lists the movies recommended for user {ID}.
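To call these end-points from Scala instead of curl, here is a minimal sketch using the JDK's built-in java.net.http client (Java 11+). It assumes the service is running locally on port 8080, as in the curl examples in section 6. MovieRecClient and its methods are hypothetical names, not part of the project.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical client for the two end-points; assumes the service runs on localhost:8080.
object MovieRecClient {
  val baseUrl = "http://localhost:8080"

  // POST /movie-model-train with an empty body (same as curl -XPOST).
  def trainRequest(base: String = baseUrl): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$base/movie-model-train"))
      .POST(HttpRequest.BodyPublishers.noBody())
      .build()

  // GET /movie-get-recommendation/{ID} for one user id.
  def recommendationRequest(userId: Int, base: String = baseUrl): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$base/movie-get-recommendation/$userId"))
      .GET()
      .build()

  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()
    val train = client.send(trainRequest(), HttpResponse.BodyHandlers.ofString())
    println(s"training request: HTTP ${train.statusCode()}")
    val recs = client.send(recommendationRequest(1), HttpResponse.BodyHandlers.ofString())
    println(recs.body()) // JSON with an "items" array
  }
}
```

Building the requests in separate functions keeps the URL logic testable without a running server.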

5. Hands-on: Running and Configuring Cassandra with Docker

Run the commands below to pull the Cassandra image and start the container:

$ docker pull cassandra:3.11.4
$ docker run --name cassandra-movie-rec -p 127.0.0.1:9042:9042 -p 127.0.0.1:9160:9160 -d cassandra:3.11.4


The project directory (movie-rec) already contains the prepared datasets. The commands below extract them into the container's /tmp directory and run the schema.cql script with cqlsh:

$ cd movie-rec
$ cat dataset/ml-100k.tar.gz | docker exec -i cassandra-movie-rec tar zxvf - -C /tmp
$ docker exec -it cassandra-movie-rec cqlsh -f /tmp/ml-100k/schema.cql


6. Hands-on: Running and Testing

Go to the project root folder and run the command below. The first time, SBT downloads the necessary dependencies.

$ sbt run


In another terminal, run the following command to train the model:

$ curl -XPOST http://localhost:8080/movie-model-train


This starts the model training. Then run the following command to see the recommendations for a user (here, user 1):

$ curl -XGET http://localhost:8080/movie-get-recommendation/1


The response should look similar to this (truncated):

{
    "items": [
        {
            "datetime": "Thu Oct 03 15:37:34 BRT 2019",
            "movieId": 613,
            "name": "My Man Godfrey (1936)",
            "rating": 6.485164882121823,
            "userId": 1
        },
        {
            "datetime": "Thu Oct 03 15:37:34 BRT 2019",
            "movieId": 718,
            "name": "In the Bleak Midwinter (1995)",
            "rating": 5.728434247420009,
            "userId": 1
        },
        ...
    ]
}


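That’s the icing on the cake! Note that the application is configured to return 10 movie recommendations per user. Also note that predicted ratings such as 6.49 exceed the 1–5 MovieLens scale. ALS predictions are unconstrained dot products of latent factor vectors, so they are useful for ranking rather than as literal star ratings.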
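Returning 10 recommendations per user amounts to keeping the highest-scoring movies for each user. In Spark, MatrixFactorizationModel.recommendProducts(user, 10) in the RDD-based API and ALSModel.recommendForAllUsers(10) in the DataFrame-based API both produce such lists.

Here is a minimal, hypothetical plain-Scala sketch of the same selection. TopN and Recommendation are illustrative names, not the project's code. The alreadyRated filter is an optional assumption; the project may not exclude movies a user has already rated.

```scala
// Hypothetical helper: keep the n best-scoring movies for one user, as in the JSON response.
object TopN {
  // Same fields as one entry of "items" in the response (datetime omitted).
  final case class Recommendation(userId: Int, movieId: Int, name: String, rating: Double)

  def topN(userId: Int, scores: Map[Int, Double], titles: Map[Int, String],
           alreadyRated: Set[Int] = Set.empty, n: Int = 10): Seq[Recommendation] =
    scores.toSeq
      .filterNot { case (movieId, _) => alreadyRated(movieId) } // optional: skip rated movies
      .sortWith((x, y) => if (x._2 != y._2) x._2 > y._2 else x._1 < y._1) // best score first, ties by id
      .take(n)
      .map { case (movieId, score) =>
        Recommendation(userId, movieId, titles.getOrElse(movieId, s"movie $movieId"), score)
      }
}
```

The raw scores are kept as-is, matching the unclamped rating values in the response above.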

You can also check the results in the movies.uresult table:

[Figure: Contents of the movies.uresult table]

7. Model Predictions

The model training settings are in src/main/resources/application.conf:

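model {
  rank = 10        # number of latent factors (k) per user and per movie
  iterations = 10  # number of ALS sweeps (alternating U and M updates)
  lambda = 0.01    # regularization strength
}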


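The rank, iterations, and lambda settings are the model's hyperparameters:

  • rank is the number of latent factors (k).
  • iterations is the number of ALS sweeps.
  • lambda is the regularization strength (λ).

In the DataFrame-based Spark ML API, they correspond to rank, maxIter, and regParam. Good values depend on how much data you have and how sparse it is. For more detailed project information, see the link below: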

  • https://github.com/edersoncorbari/movie-rec
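The article does not say how rank = 10, iterations = 10, and lambda = 0.01 were chosen. A common approach works like this:

1. Hold out part of the ratings.
2. Train with each candidate setting on the rest.
3. Keep the setting with the lowest root-mean-square error (RMSE) on the held-out ratings.

The helpers below are a minimal, hypothetical sketch of the two pieces; Rmse and trainTestSplit are not from the project. In Spark ML itself, RegressionEvaluator with the "rmse" metric plays the same role. In the DataFrame-based ALS API, setColdStartStrategy("drop") avoids NaN predictions for users or movies absent from the training split.

```scala
import scala.util.Random

// Hypothetical evaluation helpers for comparing hyperparameter settings.
object Rmse {
  // Root-mean-square error over (actual, predicted) rating pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (actual, predicted) pair")
    val squared = pairs.map { case (actual, predicted) => (actual - predicted) * (actual - predicted) }
    math.sqrt(squared.sum / pairs.size)
  }

  // Seeded random split: each rating lands in the test set with probability testFraction.
  // Returns (train, test).
  def trainTestSplit[A](ratings: Seq[A], testFraction: Double, seed: Long = 7L): (Seq[A], Seq[A]) = {
    val rnd = new Random(seed)
    val (test, train) = ratings.partition(_ => rnd.nextDouble() < testFraction)
    (train, test)
  }
}
```

For example, you could loop over rank in Seq(5, 10, 20) and lambda in Seq(0.01, 0.1), train one model per combination on the training split, and keep the combination with the smallest held-out RMSE.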

8. References

Books that were used:

  • Scala Machine Learning Projects
  • Reactive Programming with Scala and Akka

Spark ML Documentation:

  • https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html
  • https://spark.apache.org/docs/latest/ml-guide.html

Thanks!


Published at DZone with permission of Ederson Corbari. See the original article here.

Opinions expressed by DZone contributors are their own.
