A Deep Dive Into Recommendation Algorithms With Netflix Case Study and NVIDIA Deep Learning Technology

Take a deep dive into recommendation algorithms, which drive user engagement and revenue on major internet platforms.

Sagar Sidana

Aug. 12, 24 · Analysis


What Are Recommendation Algorithms?

Recommendation engines sit behind much of the activity on the Internet, be it Amazon, Netflix, Flipkart, YouTube, TikTok, or even LinkedIn, Facebook, X (Twitter), Snapchat, Medium, Substack, and HackerNoon. All of these sites, and nearly every content curation or product marketplace site on the Internet, make their big bucks from recommendation algorithms.

Simply put, a recommendation algorithm builds a model of your likes, dislikes, and the genres and items you prefer. When you make a transaction on the site, it uses that model to predict the next product you are most likely to buy, almost as if it were reading your mind. Some of the recommendation algorithms on YouTube and TikTok are so accurate that they can keep users hooked for hours. I would be surprised if even one reader could not recall a YouTube binge that started with just ten minutes of scrolling and clicking or tapping.

This leads to better customer engagement, a better customer experience, and increased revenue for the platform. Addiction is built upon the accuracy and the scary performance of these ultra-optimized algorithms.

This is how these giants build their audience.

Monthly visitors to Facebook, YouTube, Instagram, and TikTok (source):

Facebook: 2.9 Billion
YouTube: 2.2 Billion
Instagram: 1.4 Billion
TikTok: 1 Billion

And the secret to their success: fantastic recommendation algorithms.

Types of Recommendation Algorithms

Collaborative Filtering (User-Based)

User-based collaborative filtering is a recommendation technique that assumes users with similar preferences will have similar tastes. It utilizes user-item interaction data to identify similarities between users, often employing measures such as cosine similarity or Pearson correlation. The method predicts a user's ratings or preferences based on the ratings given by similar users.

However, it can face challenges, such as the cold-start problem for new users who have not yet interacted with the system, and scalability issues may arise when dealing with a large number of users.

Python

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def user_based_cf(ratings_matrix, user_id, k=5):
    similarities = cosine_similarity(ratings_matrix)
    user_similarities = similarities[user_id]
    similar_users = np.argsort(user_similarities)[::-1][1:k+1]
    
    recommendations = np.zeros(ratings_matrix.shape[1])
    for similar_user in similar_users:
        recommendations += ratings_matrix[similar_user]
    
    return recommendations / k
  

Uses cosine similarity to calculate user similarities
Finds the k most similar users to the target user
Aggregates the ratings of similar users to generate recommendations
Returns the average rating for each item from similar users
Simple implementation that can be easily modified or extended
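
The function above averages neighbours' rows directly, so unrated items (zeros) pull the average down. A common refinement is to weight each neighbour by its similarity and to use only neighbours who actually rated the item. The following is a minimal sketch of that idea using NumPy only. The function names, the tiny ratings matrix, and the convention that 0 means "not rated" are all assumptions made for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbours' ratings for `item`.

    Assumes 0 means "not rated"; only neighbours who rated `item` are used.
    """
    candidates = [u for u in range(ratings.shape[0])
                  if u != user and ratings[u, item] > 0]
    sims = {u: cosine_sim(ratings[user], ratings[u]) for u in candidates}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    weight = sum(sims[u] for u in neighbours)
    if weight == 0:
        return 0.0  # no usable neighbours: a cold-start case
    return sum(sims[u] * ratings[u, item] for u in neighbours) / weight

# Hypothetical 4 users x 4 items matrix, 0 = unrated
ratings = np.array([
    [5, 4, 0, 1],
    [5, 5, 4, 1],
    [1, 1, 2, 5],
    [4, 4, 5, 0],
], dtype=float)

prediction = predict_rating(ratings, user=0, item=2, k=2)
print(round(prediction, 2))
```

For user 0, the two closest neighbours are users 1 and 3, who rated item 2 highly, so the prediction lands between their ratings. A brand-new user with an all-zero row has no meaningful neighbours, which is the cold-start problem described earlier.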

Collaborative Filtering (Item-Based)

Item-based collaborative filtering assumes that users will prefer items similar to those they have liked in the past. It calculates the similarity between items based on user ratings or interactions. This approach is often more scalable than user-based collaborative filtering, particularly when there are many users and fewer items. It allows for the pre-computation of item similarities, which can make real-time recommendations faster.

While it handles new users better than user-based methods, it may struggle with new items that lack sufficient ratings. Additionally, it is less affected by changes in user preferences over time.

Python

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def item_based_cf(ratings_matrix, item_id, k=5):
    similarities = cosine_similarity(ratings_matrix.T)
    item_similarities = similarities[item_id]
    similar_items = np.argsort(item_similarities)[::-1][1:k+1]
    
    recommendations = np.zeros(ratings_matrix.shape[0])
    for similar_item in similar_items:
        recommendations += ratings_matrix[:, similar_item]
    
    return recommendations / k
  

Transposes the rating matrix to calculate item-item similarities
Finds the k most similar items to the target item
Aggregates user ratings for similar items
Returns the average rating for each user based on similar items
Efficient for systems with more users than items
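
To show how pre-computed item similarities turn into recommendations for one user, here is a small NumPy-only sketch. It computes the item-item similarity matrix once, then scores the items a user has not rated by how similar they are to the items that user did rate. The function names and the sample matrix are hypothetical, and 0 is assumed to mean "not rated".

```python
import numpy as np

def item_similarity(ratings):
    """Item-item cosine similarity, computed once and reusable across requests."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items with no ratings
    normalized = ratings / norms
    return normalized.T @ normalized

def recommend_for_user(ratings, sim, user, top_n=2):
    """Rank items the user has not rated by similarity-weighted scores."""
    user_ratings = ratings[user]
    scores = sim @ user_ratings
    scores[user_ratings > 0] = -np.inf  # never re-recommend rated items
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]

# Hypothetical 4 users x 5 items matrix, 0 = unrated
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 5, 0, 1],
    [1, 0, 1, 5, 4],
    [0, 1, 0, 4, 5],
], dtype=float)

sim = item_similarity(ratings)          # offline / batch step
recs = recommend_for_user(ratings, sim, user=0)  # online step
print(recs)
```

User 0 liked items 0 and 1, and item 2 is rated similarly to those by other users, so it ranks ahead of item 3. Because `sim` depends only on the ratings matrix, it can be refreshed in a batch job while the per-user scoring stays cheap.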

Matrix Factorization

Matrix factorization decomposes the user-item interaction matrix into lower-dimensional matrices, assuming that user preferences and item characteristics can be represented by latent factors. Techniques such as Singular Value Decomposition (SVD) or Alternating Least Squares (ALS) are commonly used for this purpose.

This approach can efficiently handle large, sparse datasets and often provides better accuracy compared to memory-based collaborative filtering methods. Additionally, it can incorporate regularization techniques to prevent overfitting, enhancing the model's generalization to unseen data.

Python

import numpy as np

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - np.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        p_ik = P[i][k]  # cache so the Q update uses the pre-update value
                        P[i][k] += alpha * (2 * eij * Q[k][j] - beta * p_ik)
                        Q[k][j] += alpha * (2 * eij * p_ik - beta * Q[k][j])
        
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e += pow(R[i][j] - np.dot(P[i,:], Q[:,j]), 2)
                    for k in range(K):
                        e += (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T

  

Implements a basic matrix factorization algorithm
Uses gradient descent to minimize the error between predicted and actual ratings
Incorporates regularization to prevent overfitting
Iteratively updates user and item latent factors
Stops when error falls below a threshold or maximum steps are reached
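
For readers who prefer the math, the loop above can be read as stochastic gradient descent on a regularized squared error. The notation below is chosen to match the code (P for user factors, Q for item factors after the transpose, alpha for the learning rate, beta for the regularization weight). It is a reading of this particular listing, not a statement about how any production system is trained.

```latex
% Prediction error for an observed rating r_{ij} > 0
e_{ij} = r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj}

% Regularized objective summed over observed ratings
E = \sum_{(i,j)\,:\, r_{ij} > 0} \left[ e_{ij}^{2}
    + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^{2} + q_{kj}^{2} \right) \right]

% Gradient step for each latent factor k
p_{ik} \leftarrow p_{ik} + \alpha \left( 2\, e_{ij}\, q_{kj} - \beta\, p_{ik} \right)
\qquad
q_{kj} \leftarrow q_{kj} + \alpha \left( 2\, e_{ij}\, p_{ik} - \beta\, q_{kj} \right)
```

Both updates are meant to use the values of p and q from before the step, which is why the order of the two assignments inside the inner loop matters.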

Content-Based Filtering

Content-based filtering recommends items based on their features and user preferences. It builds a profile for each user and item based on their characteristics.

Techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) for text analysis and cosine similarity for matching are commonly employed. This approach effectively addresses the new item problem, as it does not rely on prior user interactions.

However, it may suffer from overspecialization, resulting in a lack of diversity in recommendations. Additionally, effective implementation requires good feature engineering to ensure that the relevant characteristics of items are accurately captured.

Python

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_based_filtering(item_descriptions, user_profile, k=5):
    vectorizer = TfidfVectorizer()
    item_vectors = vectorizer.fit_transform(item_descriptions)
    user_vector = vectorizer.transform([user_profile])
    
    similarities = cosine_similarity(user_vector, item_vectors)
    top_items = np.argsort(similarities[0])[::-1][:k]
    
    return top_items
  

Uses TF-IDF to convert text descriptions into numerical vectors
Calculates cosine similarity between user profile and item descriptions
Returns the top k most similar items to the user profile
Efficient for systems with well-defined item features
Can be easily extended to include multiple feature types
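
The function above takes the user profile as a ready-made text string. When no such text exists, one simple option is to build the profile from the items the user already liked. The sketch below assumes each item is already described by a numeric feature vector (here, made-up genre weights) and uses the mean of the liked items' vectors as the profile. All names and data are illustrative.

```python
import numpy as np

def build_user_profile(item_features, liked_items):
    """Average the feature vectors of the items the user liked."""
    return item_features[liked_items].mean(axis=0)

def rank_items(item_features, profile, exclude=()):
    """Return item indices sorted by cosine similarity to the profile."""
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(profile)
    norms[norms == 0] = 1.0  # items with no features get similarity 0
    sims = item_features @ profile / norms
    order = np.argsort(sims)[::-1]
    excluded = set(exclude)
    return [int(i) for i in order if int(i) not in excluded]

# Hypothetical feature columns: [action, comedy, sci-fi]
item_features = np.array([
    [1.0, 0.0, 1.0],   # item 0
    [1.0, 0.0, 0.0],   # item 1
    [0.0, 1.0, 0.0],   # item 2
    [0.5, 0.0, 1.0],   # item 3
    [0.0, 1.0, 0.2],   # item 4
])

liked = [0, 1]
profile = build_user_profile(item_features, liked)
ranked = rank_items(item_features, profile, exclude=liked)
print(ranked)
```

Item 3 shares the action and sci-fi weights of the liked items, so it ranks first. The two comedy-heavy items fall to the bottom, which also illustrates the overspecialization risk mentioned above.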

Hybrid Recommendation System

Hybrid recommendation systems combine two or more recommendation techniques to leverage their respective strengths. By integrating multiple approaches, hybrid systems can mitigate the weaknesses of individual methods, such as the cold-start problem. Common combinations include collaborative and content-based filtering. Various methods are used for combining these techniques, such as weighted, switching, mixed, or meta-level approaches.

Hybrid systems often provide more robust and accurate recommendations compared to single-approach systems. However, effective implementation requires careful tuning to balance the different components and ensure optimal performance.

Python

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

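def hybrid_recommender(ratings_matrix, content_matrix, user_id, alpha=0.5, k=5):
    # Both matrices need one row per user (content_matrix holds user-level
    # content features), so the two similarity matrices share a shape.
    cf_similarities = cosine_similarity(ratings_matrix)
    content_similarities = cosine_similarity(content_matrix)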
    
    hybrid_similarities = alpha * cf_similarities + (1 - alpha) * content_similarities
    user_similarities = hybrid_similarities[user_id]
    
    similar_users = np.argsort(user_similarities)[::-1][1:k+1]
    
    recommendations = np.zeros(ratings_matrix.shape[1])
    for similar_user in similar_users:
        recommendations += ratings_matrix[similar_user]
    
    return recommendations / k
  

Combines collaborative filtering and content-based similarities
Uses a weighted sum approach with parameter alpha
Finds similar users based on the hybrid similarity
Generates recommendations from similar users' ratings
Allows for easy adjustment of the balance between CF and content-based approaches
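
The weighted and switching strategies mentioned earlier can also be applied at the score level instead of the similarity level. The sketch below assumes two per-item score arrays are already available (one from a collaborative model, one from a content model) and falls back to content scores alone when the user has too few ratings. The threshold `min_ratings`, the default `alpha`, and the sample scores are hypothetical values chosen for illustration.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1] so the two sources are comparable."""
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span else np.zeros_like(scores)

def hybrid_scores(cf_scores, content_scores, n_user_ratings,
                  alpha=0.7, min_ratings=5):
    """Switching + weighted hybrid.

    Below `min_ratings` interactions the CF signal is treated as unreliable
    (cold start) and only content scores are used; otherwise the two are blended.
    """
    content = min_max(np.asarray(content_scores, dtype=float))
    if n_user_ratings < min_ratings:
        return content
    cf = min_max(np.asarray(cf_scores, dtype=float))
    return alpha * cf + (1 - alpha) * content

cf_scores = [4.5, 1.0, 3.0]        # e.g. predicted ratings
content_scores = [0.2, 0.9, 0.4]   # e.g. cosine similarities

warm = hybrid_scores(cf_scores, content_scores, n_user_ratings=20)
cold = hybrid_scores(cf_scores, content_scores, n_user_ratings=1)
print(int(np.argmax(warm)), int(np.argmax(cold)))
```

Rescaling before blending matters because predicted ratings and similarities live on different scales. Without it, `alpha` would not mean what it appears to mean.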

Singular Value Decomposition (SVD)

Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes a matrix into three components: U, Σ, and V^T. In this decomposition, U and V represent the left and right singular vectors, respectively, while Σ contains the singular values.

SVD reduces dimensionality by retaining only the top k singular values, which helps uncover latent factors in user-item interactions.

This method is efficient for handling large, sparse matrices commonly found in recommendation systems. Additionally, SVD provides a good balance between accuracy and computational efficiency, making it a popular choice for generating recommendations.

Python

import numpy as np
from scipy.sparse.linalg import svds

def svd_recommender(ratings_matrix, k=5):
    U, s, Vt = svds(ratings_matrix, k=k)
    
    sigma = np.diag(s)
    predicted_ratings = np.dot(np.dot(U, sigma), Vt)
    
    return predicted_ratings

Uses scipy's svds function to perform truncated SVD
Reconstructs the rating matrix using only the top k singular values
Returns a dense matrix of predicted ratings for all user-item pairs
Efficient for large, sparse rating matrices
Can be easily integrated into a larger recommendation system
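
To make the effect of the truncation rank k concrete, the following sketch uses NumPy's dense SVD on a tiny matrix and measures how the reconstruction error shrinks as k grows. It is only an illustration: the matrix is made up, zeros are treated as real ratings rather than missing values, and a dense SVD would not be the tool of choice for a large sparse matrix.

```python
import numpy as np

def truncated_svd(ratings, k):
    """Rank-k approximation of `ratings` using NumPy's dense SVD."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    # np.linalg.svd returns singular values in descending order, so the
    # first k components are the k largest.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Hypothetical 4 users x 4 items matrix
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Frobenius-norm reconstruction error for increasing rank
errors = [np.linalg.norm(ratings - truncated_svd(ratings, k)) for k in (1, 2, 4)]
print([round(e, 3) for e in errors])
```

A small k keeps only the dominant taste patterns, while the full rank reproduces the input exactly and therefore predicts nothing new. Choosing k is a trade-off between smoothing and fidelity.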

Tensor Factorization

The technique of tensor factorization extends traditional matrix factorization to multi-dimensional data, allowing the incorporation of contextual information such as time and location into recommendations. It utilizes methods like the CP decomposition (also known as PARAFAC), which decomposes a tensor into a sum of rank-one component tensors, capturing complex interactions between multiple factors. This approach requires more data and computational resources compared to two-dimensional methods, as it deals with higher-dimensional arrays.

However, it can provide highly personalized and context-aware recommendations by leveraging the additional dimensions of data. The increased complexity of the data structure allows for a more nuanced understanding of user preferences in various contexts, enhancing the overall recommendation accuracy.

Python

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def tensor_factorization_recommender(tensor, rank=10):
    factors = parafac(tensor, rank=rank)
    # cp_to_tensor replaces the older kruskal_to_tensor name in recent TensorLy releases
    reconstructed_tensor = tl.cp_to_tensor(factors)
    return reconstructed_tensor
  

Uses the TensorLy library for tensor operations and decomposition
Applies PARAFAC decomposition to the input tensor
Reconstructs the tensor from the decomposed factors
Returns the reconstructed tensor as recommendations
Can handle multi-dimensional data (e.g., user-item-context)
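
It can help to see what "reconstructing the tensor from the decomposed factors" means without a library. In a CP model, each user, item, and context gets a vector of length `rank`, and the score for a (user, item, context) triple is the sum over components of the product of the three matching entries. The NumPy sketch below uses random factor matrices as stand-ins for learned ones. The function name and dimensions are hypothetical.

```python
import numpy as np

def reconstruct_cp(weights, factors):
    """Rebuild a 3-way tensor from CP factors.

    factors = (A, B, C) with shapes (I, R), (J, R), (K, R); the result is
    sum_r weights[r] * outer(A[:, r], B[:, r], C[:, r]).
    """
    A, B, C = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)

rng = np.random.default_rng(0)
n_users, n_items, n_contexts, rank = 4, 5, 3, 2

# Random stand-ins for factor matrices a decomposition would learn
A = rng.random((n_users, rank))
B = rng.random((n_items, rank))
C = rng.random((n_contexts, rank))
weights = np.ones(rank)

scores = reconstruct_cp(weights, (A, B, C))

# Score for user 1, item 2 in context 0, computed directly from the factors
single = float(np.sum(A[1] * B[2] * C[0]))
print(scores.shape, round(single, 4))
```

Because any single score needs only three short vectors, a system does not have to materialize the full tensor to serve one request.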

Neural Collaborative Filtering

Deep learning-based recommendation systems combine collaborative filtering techniques with neural networks. This approach allows for learning non-linear user-item interactions, which traditional matrix factorization methods may struggle with. Deep learning recommenders typically use embedding layers to represent users and items in a dense, low-dimensional space. This enables easy integration of additional features or side information, such as user demographics or item descriptions, to enhance the recommendation performance.

When trained on large datasets, deep learning-based systems can often outperform traditional matrix factorization methods in terms of accuracy. However, this advantage comes at the cost of increased computational complexity and the need for large amounts of data.

Deep learning recommenders also require careful hyperparameter tuning to achieve optimal results, making them more challenging to implement and maintain compared to simpler collaborative filtering approaches.

Python

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

def neural_collaborative_filtering(num_users, num_items, embedding_size=100):
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')

    user_embedding = Embedding(num_users, embedding_size, name='user_embedding')(user_input)
    item_embedding = Embedding(num_items, embedding_size, name='item_embedding')(item_input)

    user_vecs = Flatten()(user_embedding)
    item_vecs = Flatten()(item_embedding)

    concat = Concatenate()([user_vecs, item_vecs])
    dense1 = Dense(128, activation='relu')(concat)
    dense2 = Dense(64, activation='relu')(dense1)
    output = Dense(1, activation='sigmoid')(dense2)

    model = Model(inputs=[user_input, item_input], outputs=output)
    model.compile(optimizer='adam', loss='binary_crossentropy')

    return model
  

Uses TensorFlow and Keras to build a neural network model
Creates embedding layers for users and items
Concatenates user and item embeddings
Adds dense layers for learning non-linear interactions
Returns a compiled model ready for training
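
Because the model above ends in a sigmoid and is compiled with binary cross-entropy, it expects labels of 0 or 1. With implicit feedback (clicks, views) only positive pairs are observed, so a common approach is to sample unobserved items as negatives. The sketch below shows one simple way to do that with NumPy. The function name, the negative-sampling ratio, and the toy interactions are assumptions for illustration, not part of the original model code.

```python
import numpy as np

def build_training_pairs(interactions, num_items, num_negatives=2, seed=0):
    """Turn observed (user, item) pairs into labelled examples.

    Each observed pair gets label 1; for each one, up to `num_negatives`
    items the user has not interacted with are sampled and given label 0.
    """
    rng = np.random.default_rng(seed)
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    users, items, labels = [], [], []
    for user, item in interactions:
        users.append(user)
        items.append(item)
        labels.append(1)
        unseen = [i for i in range(num_items) if i not in seen[user]]
        n = min(num_negatives, len(unseen))
        if n == 0:
            continue  # user has interacted with every item
        for neg in rng.choice(unseen, size=n, replace=False):
            users.append(user)
            items.append(int(neg))
            labels.append(0)
    return np.array(users), np.array(items), np.array(labels)

# Hypothetical implicit-feedback log: (user_id, item_id)
interactions = [(0, 0), (0, 1), (1, 2)]
users, items, labels = build_training_pairs(interactions, num_items=4)
print(len(labels), int(labels.sum()))
```

The three arrays line up with the two model inputs and the target, so they could be passed to Keras as `model.fit([users, items], labels, ...)`.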

The Case Study of Netflix

The journey of Netflix’s recommendation algorithm began with CineMatch in 2000, a collaborative filtering algorithm that used member ratings to estimate how much a user would enjoy a movie. In 2006, the Netflix Prize of 1 million USD was launched to challenge data scientists to create a model that would beat CineMatch by 10%. The prize was awarded in 2009, and Netflix has said that it adopted some techniques from the competition rather than deploying the full winning ensemble in production.

Netflix soon began to accumulate users, and the company shifted to streaming in 2007. Viewers were exposed to reinforcement learning algorithms and clustering algorithms that generated suggestions in real time. As the algorithm improved, more and more users began to switch to Netflix, simply because of the effectiveness of the recommendation algorithm. Almost 80% of the content viewed on Netflix is suggested by the recommendation algorithm.

The company estimates that the recommendation algorithm saves it 1 billion USD annually by reducing the number of subscribers who would otherwise cancel.

Netflix uses advanced machine learning techniques and clustering, with a system of over 1300 clusters based on the metadata of the films that users watch. This allows it to deliver highly optimized suggestions to its users. But Netflix soon ran into a problem: scale. As its user base grew into the hundreds of millions, passing 200 million subscribers, Netflix went all in on cloud computing.

Simply put, it migrated all of its data to Amazon Web Services (AWS), starting in 2008. The transition took years and finished in 2015. Netflix reportedly saves 1 billion a year using AWS. AWS also has built-in support for machine learning, which Netflix uses to the full. Netflix reportedly used over 100,000 AWS servers and 1,000 Kinesis shards for its global audience back in 2022.

Since 2015, Netflix has also offered its own productions, with thousands of movies and shows in a wide variety of formats. Netflix’s recommender algorithms are highly automated and run thousands of A/B tests for users per day. Today’s Netflix subscriber base exceeds 280 million.

While Netflix now faces stiff competition, especially from Disney+, which owns the Marvel and Star Wars franchises, the company aims to hit 500 million subscribers by 2025.

Last year, Netflix earned a whopping 31 billion in revenue.

The major parts of its current recommendation systems involve:

Reinforcement learning: Depending upon user behavior, Netflix changes the content on the screen in real-time. Thus, the system is in a state of constant flux and changes depending upon the user’s interactions.
Deep neural networks: Because of the scale of the data (over 15,000 shows and almost 300 million users), standard ML techniques are not easy to apply. Deep Learning is used extensively, using NVIDIA’s technology. (See the end of this article for a program that uses NVIDIA’s latest Merlin deep learning technology).
Matrix factorization: By effectively performing Singular Value Decomposition (SVD) on very large, highly sparse matrices, Netflix estimates how strongly each user is attracted to certain genres and shows.
Ensemble learning: Clever combinations of the algorithms listed above adjust the recommendations on the fly so that no two users see the same screen. This personalization is what creates the big bucks and keeps Netflix on top of all the OTT platforms.

And all these models and optimizations run hundreds of thousands of times a day for hundreds of millions of users.

Modern Deep Learning Technology

With such scales, no single computer can run these ML models alone. That is why AWS runs ML algorithms in a distributed fashion over thousands of machines.

NVIDIA has recently released several products to enable recommendation systems at scale, and its GPU clusters play a big part in executing these ML algorithms. One of them is Merlin, a high-performance framework for building recommender systems, optimized to run on thousands of machines and deliver superior results. This was perhaps only a matter of time, as dataset sizes grew far beyond what single computers could process.

Modern recommendation systems use deep learning extensively. As a part of DL, GPU/TPU computing systems are extensively used to speed up the computation.

Some of NVIDIA’s recent offerings for Merlin include:

NVIDIA Recommender Systems

(From Announcing NVIDIA Merlin: An Application Framework for Deep Recommender Systems)

Available as open-source projects:

NVTabular

NVTabular is a feature engineering and preprocessing library, designed to quickly and easily manipulate terabyte-scale datasets. It is especially suitable for recommender systems, which require a scalable way to process additional information, such as user and item metadata and contextual information. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS cuDF library. Using NVTabular, with just 10-20 lines of high-level API code, you can set up a data engineering pipeline and achieve up to 10X speedup compared to optimized CPU-based approaches while experiencing no dataset size limitations, regardless of the GPU/CPU memory capacity.

HugeCTR

HugeCTR is a highly efficient GPU framework designed for recommender model training, which targets both high performance and ease of use. It supports both simple deep models and also state-of-the-art hybrid models such as W&D, Deep Cross Network, and DeepFM. We are also working on enabling DLRM with HugeCTR. The model details and hyperparameters can be specified easily in JSON format, allowing for quick selection from a range of common models.

TensorRT and Triton Server for Inference

NVIDIA TensorRT is an SDK for high-performance DL inference. It includes a DL inference optimizer and runtime that delivers low latency and high throughput for inference applications. TensorRT can accept trained neural networks from all DL frameworks using a common interface, the open neural network exchange format (ONNX).

NVIDIA Triton Inference Server provides a cloud-inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. Triton Server can serve DL recommender models using several backends, including TensorFlow, PyTorch (TorchScript), ONNX runtime, and TensorRT runtime.

Code Example

The following code example shows an actual preprocessing workflow required to transform the 1-TB Criteo Ads dataset, implemented with just a dozen lines of code using NVTabular. Briefly, numerical and categorical columns are specified. Next, we define an NVTabular workflow and supply a set of train and validation files. Then, preprocessing operations are added to the workflow, and data is persisted to disk. In comparison, custom-built processing codes, such as the NumPy-based data util in Facebook’s DLRM implementation, can have 500-1000 lines of code for the same pipeline.

Python

import nvtabular as nvt
import glob
 
cont_names = ["I"+str(x) for x in range(1, 14)] # specify continuous feature names
cat_names = ["C"+str(x) for x in range(1, 27)] # specify categorical feature names
label_names = ["label"] # specify target feature
columns = label_names + cat_names + cont_names # all feature names
 
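# initialize Workflow (this listing follows the early NVTabular API from the
# announcement post; later releases changed how workflows are defined)
proc = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_names)
 
# create datasets from input files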
train_files = glob.glob("./dataset/train/*.parquet")
valid_files = glob.glob("./dataset/valid/*.parquet")
 
train_dataset = nvt.dataset(train_files, gpu_memory_frac=0.1)
valid_dataset = nvt.dataset(valid_files, gpu_memory_frac=0.1)
 
# add feature engineering and preprocessing ops to Workflow
proc.add_cont_feature([nvt.ops.ZeroFill(), nvt.ops.LogOp()])
proc.add_cont_preprocess(nvt.ops.Normalize())
proc.add_cat_preprocess(nvt.ops.Categorify(use_frequency=True, freq_threshold=15))
 
# compute statistics, transform data, export to disk
proc.apply(train_dataset, shuffle=True, output_path="./processed_data/train", num_out_files=len(train_files))
proc.apply(valid_dataset, shuffle=False, output_path="./processed_data/valid", num_out_files=len(valid_files))
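
To give a feel for what the preprocessing steps in the workflow do to the data, here is a CPU-only sketch of similar transformations in NumPy and plain Python. It is not NVTabular code. The exact semantics of `ZeroFill`, `LogOp`, `Normalize`, and `Categorify` are assumed here (fill missing values with zero, apply log(x + 1), standardize, and map rare categories to a shared id), and the function names and sample values are hypothetical.

```python
import numpy as np
from collections import Counter

def preprocess_continuous(values):
    """Zero-fill missing values, apply log(x + 1), then standardize."""
    x = np.array(values, dtype=float)
    x = np.nan_to_num(x, nan=0.0)   # fill missing values with zero
    x = np.clip(x, 0.0, None)       # assumption: negatives are clipped before the log
    x = np.log1p(x)                 # compress heavy-tailed counts
    std = x.std()
    return (x - x.mean()) / std if std else x - x.mean()

def categorify(values, freq_threshold=2):
    """Map categories to integer ids; categories seen fewer than
    `freq_threshold` times share id 0."""
    counts = Counter(values)
    frequent = sorted(c for c, n in counts.items() if n >= freq_threshold)
    ids = {c: i + 1 for i, c in enumerate(frequent)}
    return [ids.get(v, 0) for v in values]

cont = preprocess_continuous([1.0, float("nan"), 3.0, 10.0])
cats = categorify(["a", "b", "a", "c", "b", "a"], freq_threshold=2)
print(cont.round(3), cats)
```

Grouping rare categories under one id keeps embedding tables small, which matters when a categorical column has millions of distinct values.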
  

The entire technology stack can be found at the following GitHub repository:

NVIDIA-Merlin (github.com)

Conclusion

Recommendation systems have come a long way.

From simple statistical modeling, content-based filtering, and collaborative filtering, we have moved on to deep neural networks, HPC nodes, matrix factorization, and its extension to higher dimensions, tensor factorization.

The most profitable recommender system for streaming belongs to Netflix, which runs its machine learning workloads in the cloud on AWS.

Recommender systems are used everywhere, from Google to Microsoft to Amazon to Flipkart. They are a critical part of the modern-day enterprise, and there is hardly a company online that does not use them in one form or another.

There are many companies today that offer custom recommendation systems online.

Some of the leading ones include:

Netflix: Known for its sophisticated recommendation engine that analyzes user viewing habits to suggest movies and TV shows
Amazon: Utilizes a powerful recommendation engine that suggests products based on user purchase history and browsing behavior
Spotify: Employs a recommendation system that curates music playlists and song suggestions based on user listening history
YouTube: Uses a recommendation engine to suggest videos based on users' viewing patterns and preferences
LinkedIn: Recommends jobs, connections, and content based on user profiles and professional history
Zillow: Suggests real estate properties tailored to user preferences and search history
Airbnb: Provides accommodation recommendations based on user travel history and preferences
Uber: Recommends ride options based on user preferences and previous rides
IBM Corporation: A leader in the recommendation engine market, offering various AI-driven solutions
Google LLC (Alphabet Inc.): Provides recommendation systems across its platforms, leveraging extensive data analytics

Hopefully, one day, your company will be one among this elite list. And all the best for your enterprise.

Regardless of which sector you are in, if you have an online presence, you need to use recommendation systems one way or another. Continue to explore this segment, and if you build real expertise, rest assured that you will be highly in demand.

Never stop learning. Keep up the enthusiasm. Always believe in your infinite potential for growth. Your future is in your hands. Make it extraordinary!


Opinions expressed by DZone contributors are their own.
