Graph-Based Recommendation System With Milvus

In this article, we discuss how to build a graph-based recommendation system using PinSage (a GCN algorithm), the DGL package, the MovieLens dataset, and Milvus.

By Jun Gu
Feb. 11, 21 · Tutorial

Background

A recommendation system (RS) identifies user preferences from historical data and suggests products or items accordingly. A well-designed recommendation system can bring a company considerable economic benefits.

A complete recommendation system has three elements: a user model, an object model, and the core element, the recommendation algorithm. Established algorithms include collaborative filtering, latent semantic models, graph-based models, hybrid recommendation, and more. In this article, we give brief instructions on how to use Milvus to build a graph-based recommendation system.

Key Techniques

Graph Convolutional Networks (GCN)

PinSage

On Pinterest, users save content that interests them (pins) and organize it into related categories (boards). The site has accumulated 2 billion pins, 1 billion boards, and 18 billion edges (an edge exists only when a pin belongs to a specific board). The following illustration shows a pins-boards bipartite graph.

pins-boards bipartite graph


PinSage uses the pins-boards bipartite graph to generate high-quality pin embeddings for recommendation tasks such as pin recommendation. It has three key innovations:

  1. Dynamic convolutions: Unlike traditional GCN algorithms, which perform convolutions on the feature matrices and the full graph, PinSage samples the neighborhood of each node and performs more efficient local convolutions by constructing the computational graph dynamically.
  2. Constructing convolutions with random walk modeling: Performing convolutions on the entire neighborhood of a node would result in a massive computational graph. To reduce the computation required, traditional GCN algorithms examine k-hop neighbors; PinSage instead simulates random walks, treats the most frequently visited content as the key neighborhood, and constructs the convolution from it.
  3. Efficient MapReduce inference: Performing local convolutions node by node introduces repeated computation, because the k-hop neighborhoods of different nodes overlap. In each aggregation step, PinSage maps all nodes without repeated calculation, links them to the corresponding upper-level nodes, and then retrieves the embeddings of the upper-level nodes.
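The random-walk idea in item 2 can be illustrated with a minimal sketch in plain Python. It is not the PinSage or DGL implementation: the toy graph, the function name `important_neighbors`, and the parameters `num_walks`, `walk_length`, and `top_t` are all assumptions made for this example. Starting from one pin, the sketch runs short random walks over a small pins-boards bipartite graph, counts how often every other pin is visited, and keeps the most-visited pins as the neighborhood a convolution would aggregate over.

```python
import random
from collections import Counter

# Toy pins-boards bipartite graph (hypothetical data): each pin lists its boards
# and each board lists its pins.
pin_to_boards = {
    "p1": ["b1", "b2"],
    "p2": ["b1"],
    "p3": ["b1", "b2"],
    "p4": ["b3"],
}
board_to_pins = {
    "b1": ["p1", "p2", "p3"],
    "b2": ["p1", "p3"],
    "b3": ["p4"],
}


def important_neighbors(start_pin, num_walks=200, walk_length=3, top_t=2, seed=0):
    """Return the top_t pins most often visited by random walks from start_pin."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    visits = Counter()
    for _ in range(num_walks):
        pin = start_pin
        for _ in range(walk_length):
            # One step = pin -> random board -> random pin on that board.
            board = rng.choice(pin_to_boards[pin])
            pin = rng.choice(board_to_pins[board])
            if pin != start_pin:
                visits[pin] += 1
    return [pin for pin, _ in visits.most_common(top_t)]


neighbors = important_neighbors("p1")
print(neighbors)
```

In this toy graph, `p4` sits on a board that `p1` never reaches, so it can never enter the sampled neighborhood, while pins that share more boards with `p1` are visited more often.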

DGL

Deep Graph Library (DGL) is a Python package for building graph-based neural network models on top of existing deep learning frameworks, such as PyTorch, MXNet (Gluon), and more. With its easy-to-use backend interfaces, DGL can be readily integrated into tensor-based frameworks that support automatic differentiation. The PinSage implementation used in this article is built on DGL and PyTorch.

Deep Graph Library


Milvus

After obtaining the embeddings, the next step is to run a similarity search over them to find items that might be of interest.

Milvus is an open-source, AI-powered similarity search engine that supports vectors converted from a wide variety of unstructured data. It has been adopted by 400+ enterprise users and has applications spanning image processing, computer vision, natural language processing (NLP), speech recognition, recommendation engines, search engines, new drug development, gene analysis, and more. The following shows a general similarity search process using Milvus:

  1. The user converts unstructured data to feature vectors with deep learning models and imports the vectors into Milvus.
  2. Milvus stores the feature vectors and builds indexes for them.
  3. On receiving a query vector from the user, Milvus searches for and returns the vectors most similar to it.
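To make step 3 concrete, here is a minimal brute-force sketch of a top-k similarity search in plain Python. It does not use the Milvus API, and Milvus relies on indexes rather than an exhaustive scan like this one; the function names `l2_distance` and `top_k_similar` and the sample vectors are assumptions made for illustration only.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(query, vectors, k):
    """Return the k (id, distance) pairs closest to query, nearest first."""
    scored = ((vec_id, l2_distance(query, vec)) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional feature vectors keyed by ID.
stored = {
    1: [0.0, 0.0, 0.0],
    2: [1.0, 0.0, 0.0],
    3: [0.0, 3.0, 4.0],
}
hits = top_k_similar([0.9, 0.0, 0.0], stored, k=2)
print(hits)
```

The scan costs one distance computation per stored vector for every query, which is the cost that index structures in a vector search engine are designed to avoid.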

Milvus diagram

Implementation of Recommendation System

System Overview

Here we will use the following figure to illustrate the basic process of building a graph-based recommendation system with Milvus. The basic process includes data preprocessing, PinSage model training, data loading, searching, and recommending.

Graph-based recommendation model with Milvus

Data Preprocessing

The recommendation system we build in this article is based on the open MovieLens 1M dataset (ml-1m, available as a zip download), which contains 1,000,000 ratings of 4,000 movies by 6,000 users. Collected by GroupLens Research, the data includes movie information, user characteristics, and movie ratings. In this article, we use users’ viewing history to build a users-movies bipartite graph g whose movie nodes carry categorical features.

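Python
# Build the users-movies bipartite graph.
# PandasGraphBuilder is assumed to be the helper from the DGL PinSage example (builder.py);
# users, movies_categorical, and ratings are pandas DataFrames loaded earlier.
from builder import PandasGraphBuilder

graph_builder = PandasGraphBuilder()
graph_builder.add_entities(users, 'user_id', 'user')
graph_builder.add_entities(movies_categorical, 'movie_id', 'movie')
# Add each rating as an edge in both directions.
graph_builder.add_binary_relations(ratings, 'user_id', 'movie_id', 'watched')
graph_builder.add_binary_relations(ratings, 'movie_id', 'user_id', 'watched-by')
g = graph_builder.build()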
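For readers who want to see what the two relations contain without installing DGL, the following sketch builds the same kind of users-movies bipartite structure with plain dictionaries. It assumes the `UserID::MovieID::Rating::Timestamp` line layout of the ml-1m `ratings.dat` file; the sample rows and the function name `build_bipartite` are made up for illustration and are not part of the project code.

```python
from collections import defaultdict

# A few lines in the ml-1m ratings.dat layout: UserID::MovieID::Rating::Timestamp.
# The values here are made up for illustration.
sample_ratings = """\
1::10::5::978300760
1::20::3::978302109
2::10::4::978301968
"""


def build_bipartite(lines):
    """Return (watched, watched_by) adjacency maps from rating lines."""
    watched = defaultdict(set)     # user_id -> movie_ids ("watched" edges)
    watched_by = defaultdict(set)  # movie_id -> user_ids ("watched-by" edges)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, _rating, _timestamp = line.split("::")
        watched[int(user_id)].add(int(movie_id))
        watched_by[int(movie_id)].add(int(user_id))
    return watched, watched_by


watched, watched_by = build_bipartite(sample_ratings.splitlines())
print(dict(watched))
print(dict(watched_by))
```

Every rating produces one edge in each direction, which mirrors the paired `watched` and `watched-by` relations added to the graph builder.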

PinSage Model Training

The PinSage model generates embedding vectors for pins; here, those embeddings serve as feature vectors for the movies. First, create a PinSage model from the bipartite graph g and the chosen movie feature vector dimension (256 by default). Then, train the model with PyTorch to obtain the h_item embeddings of the 4,000 movies.

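Python
import torch

# Define the model and optimizer.
# PinSAGEModel, g, item_ntype, textset, args, device, and dataloader_test
# are assumed to come from the surrounding training script.
model = PinSAGEModel(g, item_ntype, textset, args.hidden_dims, args.num_layers).to(device)
opt = torch.optim.Adam(model.parameters(), lr=args.lr)

# (training loop not shown in this excerpt)

# Get the item embeddings, batch by batch, without tracking gradients
model.eval()
with torch.no_grad():
    h_item_batches = []
    for blocks in dataloader_test:
        for i in range(len(blocks)):
            blocks[i] = blocks[i].to(device)
        h_item_batches.append(model.get_repr(blocks))
    h_item = torch.cat(h_item_batches, 0)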



Data Loading

Load the movie embeddings h_item generated by the PinSage model into Milvus, and Milvus will return the corresponding IDs. Import the IDs and the corresponding movie information into MySQL.

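Python
# Load the embeddings into Milvus; insert returns the IDs assigned to the vectors.
# The tensor is converted to plain lists before insertion.
status, ids = milvus.insert(milvus_table, h_item.tolist())
# Store the IDs with the corresponding movie information in MySQL
# (load_movies_to_mysql and ids_info are defined elsewhere in the project)
load_movies_to_mysql(milvus_table, ids_info)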

Searching

Retrieve the embeddings of the movies a user likes from Milvus by their IDs, and run a similarity search in Milvus with those embeddings. Then look up the corresponding movie information in the MySQL database.

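Python
# Get the embeddings of the movies the user likes
_, user_like_vectors = milvus.get_entity_by_id(milvus_table, ids)
# Search for the most similar movies (keyword names assume the pymilvus 1.x client)
_, hits_per_query = milvus.search(collection_name=milvus_table, query_records=user_like_vectors, top_k=top_k)
similar_ids = [hit.id for hits in hits_per_query for hit in hits]
# Look up the movie information with a parameterized query instead of string concatenation
placeholders = ", ".join(["%s"] * len(similar_ids))
sql = "select * from " + movies_table + " where milvus_id in (" + placeholders + ");"
cursor.execute(sql, similar_ids)
results = cursor.fetchall()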
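The lookup step can be tried without a MySQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so it is an illustration of the pattern rather than the project's code: the `movies` table, its columns, the sample rows, and the function name `fetch_movies` are all hypothetical. Note that SQLite uses `?` placeholders, whereas common MySQL drivers use `%s`.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (milvus_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO movies (milvus_id, title) VALUES (?, ?)",
    [(101, "Movie A"), (102, "Movie B"), (103, "Movie C")],
)


def fetch_movies(connection, milvus_ids):
    """Fetch rows for the given IDs, preserving the order of milvus_ids."""
    if not milvus_ids:
        return []
    placeholders = ", ".join("?" for _ in milvus_ids)
    query = f"SELECT milvus_id, title FROM movies WHERE milvus_id IN ({placeholders})"
    rows = {row[0]: row for row in connection.execute(query, list(milvus_ids))}
    # Re-order to match the similarity ranking returned by the search step.
    return [rows[i] for i in milvus_ids if i in rows]


movies = fetch_movies(conn, [103, 101])
print(movies)
```

Passing the IDs as query parameters avoids building SQL from raw values, and re-ordering the rows keeps the similarity ranking, which an `IN` clause on its own does not guarantee.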

Recommendation

Finally, the system recommends to users the movies most similar to their search queries. The steps above make up the main workflow for building the recommendation system. For more details, see Milvus-Bootcamp.
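When a user has liked several movies, each liked embedding produces its own list of hits, and these lists have to be combined into one recommendation list. The article does not describe how the project does this, so the sketch below shows only one plausible approach under stated assumptions: it skips movies the user already liked, keeps the smallest distance seen for each remaining movie, and ranks by that distance. The function name `merge_recommendations` and the sample data are hypothetical.

```python
def merge_recommendations(results_per_query, liked_ids, limit):
    """Merge per-query hits into one ranked list of IDs.

    results_per_query: list of hit lists, each hit an (id, distance) pair.
    liked_ids: IDs the user already marked as liked; these are skipped.
    """
    best = {}
    for hits in results_per_query:
        for item_id, distance in hits:
            if item_id in liked_ids:
                continue
            # Keep the smallest distance seen for each candidate.
            if item_id not in best or distance < best[item_id]:
                best[item_id] = distance
    # Rank by distance, breaking ties by ID so the order is deterministic.
    ranked = sorted(best, key=lambda item_id: (best[item_id], item_id))
    return ranked[:limit]


# Hypothetical search output for two liked movies (IDs 1 and 2).
results = [
    [(1, 0.0), (7, 0.2), (5, 0.4)],
    [(2, 0.0), (5, 0.1), (9, 0.3)],
]
recommended = merge_recommendations(results, liked_ids={1, 2}, limit=3)
print(recommended)
```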

System Demo

In addition to a FastAPI interface, the project provides a front-end demo that recommends movies catering to users’ tastes. You can simulate the process by logging into the Movie Recommendation System and marking the movies you like; the demo then recommends movies based on those choices.

Movie system demo


Movie demo gif

Conclusion

PinSage is a graph convolutional neural network that can be used for recommendation tasks. It generates high-quality embeddings of pins via a pins-boards bipartite graph.

We use the MovieLens dataset to create a users-movies bipartite graph, and then use the open-source DGL package and the PinSage model to generate feature vectors for the movies. The vectors are stored in Milvus, an embedding similarity search engine, and movie recommendations are then returned to users.

The Milvus embedding similarity search engine can be integrated with a wide variety of deep learning platforms and used in many AI scenarios. By leveraging optimized vector retrieval algorithms and heterogeneous computing resources, Milvus provides companies with vector retrieval capabilities.
