Shopping is a necessity for every human being. And when we shop, we tend to buy products recommended by people we trust. That's why, in the digital age, any online shop you visit uses some sort of recommendation engine.
Recommendation engines are data filtering tools that use algorithms and data to recommend the most relevant items to a particular user. In simpler terms, they're nothing but an automated version of a “shop counter guy.” You ask him for a product, and he shows you not only that product but also related ones you might be interested in (which are often more expensive). Such salespeople are well trained in cross-selling and upselling.
With the growing amount of information on the internet and with a significant rise in the number of users, it is becoming increasingly important for companies to search data and provide users with relevant information according to their preferences and tastes.
How Does a Recommendation Engine Work?
According to the article "Using Machine Learning on Compute Engine to Make Product Recommendations," a typical recommendation engine processes data through the following four phases: collecting, storing, analyzing, and filtering.
The first step in creating a recommendation engine is gathering data. Data can be either explicit or implicit. Explicit data consists of data entered by users, such as ratings and comments on products. Implicit data might include order history, return history, cart events, page views, clicks, and search logs. This data is collected for every user who visits the site.
Behavior data is easy to collect because you can keep a log of user activities on your site, and it doesn’t require any extra action from the user; they’re already using the application, after all. The downside of this approach is that the data is harder to analyze. For example, separating the relevant log entries from the less important ones can be cumbersome.
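To make the log-filtering problem concrete, here is a minimal sketch that turns a raw activity log into per-user, per-product scores. The event names, the sample log, and the weights are all assumptions made up for illustration; a real site would have its own event schema and would tune the weights.

```python
from collections import Counter

# Hypothetical event log: (user_id, event_type, product_id).
EVENTS = [
    ("u1", "page_view", "p1"),
    ("u1", "add_to_cart", "p1"),
    ("u1", "heartbeat", None),  # noise: carries no product signal
    ("u2", "page_view", "p2"),
    ("u2", "purchase", "p2"),
    ("u1", "page_view", "p3"),
]

# Assumed weights: events that show stronger intent count for more.
EVENT_WEIGHTS = {"page_view": 1, "add_to_cart": 3, "purchase": 5}


def implicit_scores(events):
    """Aggregate the relevant events into a (user, product) -> score table."""
    scores = Counter()
    for user, event_type, product in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None or product is None:
            continue  # skip log entries that are not useful for recommendations
        scores[(user, product)] += weight
    return dict(scores)


scores = implicit_scores(EVENTS)
```

Here the `heartbeat` entry is dropped, and the remaining events collapse into one number per user and product, which is a much easier input for the later analysis steps.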
Since each user has different likes and dislikes about a given product, their datasets will be distinct. Over time, as you feed the engine more data, it gets smarter and smarter with its recommendations so that customers are more likely to engage, click, and buy (like how Amazon’s recommendation engine has "Frequently bought together" and "Recommended for you" tabs).
The more data that you make available to your algorithms, the better the recommendations will be. This means that any recommendations project can quickly turn into a big data project.
The type of data that you use to create recommendations can help you decide the type of storage you should use. You could choose to use a NoSQL database, a standard SQL database, or even some kind of object storage. Each of these options is viable depending on whether you’re capturing user input or behavior, as well as on factors such as ease of implementation, the amount of data that the storage can manage, integration with the rest of the environment, and portability.
When saving user ratings or comments, a scalable and managed database minimizes the number of tasks required and helps focus on the recommendation itself. Cloud SQL fulfills both of these needs and also makes it easy to load the data directly from Spark.
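As a rough illustration of storing explicit ratings in a SQL database, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for a managed service such as Cloud SQL. The table layout and function names are assumptions for this example, and `INSERT OR REPLACE` is SQLite syntax, so another database would need its own upsert statement.

```python
import sqlite3


def open_ratings_store(path=":memory:"):
    """Open (or create) a small ratings table; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ratings (
               user_id    TEXT NOT NULL,
               product_id TEXT NOT NULL,
               rating     REAL NOT NULL,
               PRIMARY KEY (user_id, product_id)
           )"""
    )
    return conn


def save_rating(conn, user_id, product_id, rating):
    """Store a rating, keeping only the latest one per (user, product) pair."""
    conn.execute(
        "INSERT OR REPLACE INTO ratings (user_id, product_id, rating) "
        "VALUES (?, ?, ?)",
        (user_id, product_id, rating),
    )
    conn.commit()


def ratings_for_user(conn, user_id):
    """Return {product_id: rating} for one user."""
    rows = conn.execute(
        "SELECT product_id, rating FROM ratings WHERE user_id = ?",
        (user_id,),
    )
    return dict(rows.fetchall())
```

The composite primary key means a user who rates the same product twice ends up with a single, updated row rather than duplicates.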
How do we find items that have similar user engagement data? We analyze the data by using different analysis methods. If you want to provide immediate recommendations to the user as they are viewing the product, you will need a more nimble type of analysis. Some of the ways in which we can analyze the data are the following:
- Real-time systems can process data as it’s created. This type of system usually involves tools that can process and analyze streams of events. A real-time system is required if you want to provide in-the-moment recommendations.
- Batch analysis demands that you process the data periodically. This approach implies that enough data needs to be created in order to make the analysis relevant, such as daily sales volume. A batch system might work fine to send an email at a later date.
- Near-real-time analysis lets you gather data quickly; you can refresh the analytics every few minutes or seconds. A near-real-time system works best for providing recommendations during the same browsing session.
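The difference between these modes is mostly about when the computation runs, not what it computes. The sketch below counts which products are bought together in two ways: once over all accumulated orders (batch) and once incrementally as each order arrives (real-time). The class and function names are hypothetical, and a production real-time system would typically sit behind a stream-processing tool rather than a plain Python object.

```python
from collections import Counter
from itertools import combinations


def pair_counts_batch(orders):
    """Batch: recompute co-purchase counts from all accumulated orders."""
    counts = Counter()
    for order in orders:
        counts.update(combinations(sorted(set(order)), 2))
    return counts


class StreamingPairCounts:
    """Real-time: update the same counts one order at a time."""

    def __init__(self):
        self.counts = Counter()

    def on_order(self, order):
        # Each unordered product pair in the order is counted once.
        self.counts.update(combinations(sorted(set(order)), 2))

    def bought_with(self, product, top_n=3):
        """Products most often bought together with the given product."""
        related = Counter()
        for (a, b), n in self.counts.items():
            if a == product:
                related[b] += n
            elif b == product:
                related[a] += n
        return [p for p, _ in related.most_common(top_n)]
```

A near-real-time setup could be approximated by rerunning `pair_counts_batch` on a short interval over the most recent orders.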
The next step is to filter the data to extract what is needed to provide recommendations to the user. We have to choose the algorithm that best suits the recommendation engine. Some types of filters are:
- Content-based: A popular, recommended product has similar characteristics to what a user views or likes.
- Cluster: Recommended products go well together, no matter what other users have done.
- Collaborative: Other users who like the same products that a user views or likes also liked the recommended product.
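As an illustration of the content-based filter from the list above, here is a minimal sketch that ranks products by how many attributes they share with what a user already likes. The catalog and its attribute tags are invented for the example, and counting shared tags is only one simple way to measure "similar characteristics."

```python
# Hypothetical catalog: product -> set of descriptive attributes.
CATALOG = {
    "trail_shoes": {"footwear", "outdoor", "running"},
    "road_shoes": {"footwear", "running"},
    "tent": {"outdoor", "camping"},
    "blender": {"kitchen", "appliance"},
}


def content_based(liked, catalog, top_n=2):
    """Rank unseen products by the attributes they share with liked ones."""
    profile = set()
    for product in liked:
        profile |= catalog[product]
    scored = [
        (len(attrs & profile), name)
        for name, attrs in catalog.items()
        if name not in liked
    ]
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop products that have nothing in common with the user's profile.
    return [name for score, name in scored if score > 0][:top_n]


suggestions = content_based({"trail_shoes"}, CATALOG)
```

Note that this filter never looks at other users; it relies only on product attributes, which is what separates it from collaborative filtering.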
Collaborative filtering enables you to abstract away product attributes and make predictions based on user tastes. The output of this filtering is based on the assumption that two users who liked the same products in the past will probably like the same ones now or in the future.
You can represent data about ratings or interactions as a set of matrices, with products and users as dimensions. Consider two such matrices: the first holds the ratings, and the second is derived from the first by replacing existing ratings with the number one and missing ratings with the number zero. The resulting matrix is a truth table where a number one represents an interaction by a user with a product.
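The matrix transformation described above can be sketched in a few lines. The sample ratings are made up, and `None` is an assumed marker for a missing rating.

```python
# Rows are users, columns are products; None marks a missing rating.
ratings = [
    [5, None, 3],
    [None, 4, None],
    [1, 2, None],
]


def to_interactions(matrix):
    """Replace existing ratings with 1 and missing ratings with 0."""
    return [[0 if value is None else 1 for value in row] for row in matrix]


interactions = to_interactions(ratings)
```

The result keeps the same shape as the ratings matrix, so row `i` still describes user `i` and column `j` still describes product `j`.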
We use the k-nearest neighbors (k-NN) algorithm, Jaccard’s coefficient, Dijkstra’s algorithm, and cosine similarity to better relate the data sets of different users when recommending based on ratings or products.
Figure: K-nearest algorithm cluster filtering
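Some of the similarity measures named above can be sketched as follows. This is an illustrative example, not the implementation of any particular engine: it covers Jaccard's coefficient, cosine similarity, and a simple nearest-neighbor lookup, and it leaves out Dijkstra's algorithm. The user names and liked-product sets are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient of two sets: size of intersection over union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(target, liked_by_user, k=2):
    """Return the k users whose liked sets are most similar to the target's."""
    others = [u for u in liked_by_user if u != target]
    others.sort(
        key=lambda u: (-jaccard(liked_by_user[target], liked_by_user[u]), u)
    )
    return others[:k]


def recommend(target, liked_by_user, k=2):
    """Suggest products the neighbors liked that the target has not seen."""
    seen = liked_by_user[target]
    suggestions = set()
    for neighbor in nearest_neighbors(target, liked_by_user, k):
        suggestions |= liked_by_user[neighbor] - seen
    return sorted(suggestions)


# Hypothetical data: user -> set of liked products.
LIKES = {
    "ann": {"p1", "p2"},
    "bob": {"p1", "p2", "p3"},
    "cat": {"p4"},
    "dan": {"p2", "p5"},
}
```

Jaccard's coefficient suits the zero-and-one interaction data, while cosine similarity also works on the raw rating values. With the sample data, `recommend("ann", LIKES)` draws on the two users whose tastes overlap most with Ann's.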
Finally, the results obtained from filtering are delivered to the user at a time that suits the type of recommendation, whether that means showing a real-time recommendation or sending an email later.