Building an Online Recommendation Engine with MongoDB and Mahout
Once upon a time there was a pizza baker in Munich who developed a technique to beam pizza out of bright sunshine. He can produce more than a thousand pizzas per second, so he needs a channel to sell that many pizzas and decides to build an online shop. Mario's initial idea was to sell only pizza, but now he is thinking about introducing new product lines such as beverages, salads and pasta. Before we look at how Mario validates this idea, let's take a short look at the existing online shop.
Mario's online shop is based on MongoDB, Apache Wicket and Spring. MongoDB is a document-oriented NoSQL database. Unlike a relational database, it does not store records in tables but in BSON documents. BSON is a binary representation of JSON (JavaScript Object Notation), and its structure is very similar to the object structure in Mario's application. Using MongoDB makes his development easier and his deployment faster.
The figure shows a JSON document that is very similar to a Java object: each property of the JSON document, with its value, corresponds to a property of the Java object with the same value. You can add or remove properties in your Java object, and the structure of the stored documents changes with it, because MongoDB does not enforce a fixed schema. So there is no need to map your Java object model onto a relational schema via Hibernate.
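To make the correspondence between object and document concrete, here is a minimal sketch in plain Java. It does not use the MongoDB driver: a `Map` stands in for the document, and the class `PizzaOrder` with its fields `customer`, `pizza` and `quantity` is a hypothetical example, not taken from Mario's shop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain class; the field names are illustrative assumptions.
class PizzaOrder {
    final String customer;
    final String pizza;
    final int quantity;

    PizzaOrder(String customer, String pizza, int quantity) {
        this.customer = customer;
        this.pizza = pizza;
        this.quantity = quantity;
    }

    // Each field becomes one key/value pair of the document,
    // mirroring how a document property lines up with an object property.
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("customer", customer);
        doc.put("pizza", pizza);
        doc.put("quantity", quantity);
        return doc;
    }

    // The reverse direction: rebuild the object from the document's properties.
    static PizzaOrder fromDocument(Map<String, Object> doc) {
        return new PizzaOrder(
                (String) doc.get("customer"),
                (String) doc.get("pizza"),
                (Integer) doc.get("quantity"));
    }

    public static void main(String[] args) {
        PizzaOrder order = new PizzaOrder("Anna", "Margherita", 2);
        System.out.println(order.toDocument());
    }
}
```

Adding a field to `PizzaOrder` only requires one more `put` and one more `get` here; no table definition has to be migrated, which is the point the paragraph above makes.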
Mario also decided to build his online shop only with open-source technologies such as Apache Wicket and Spring. Wicket is a widely used, lightweight, component-based web application framework that is closely patterned after stateful GUI frameworks such as Swing. The Spring Framework is an open-source application framework and Inversion of Control container for the Java platform that does not impose any specific programming model. Spring has become popular in the Java community as an alternative to, replacement for, or even addition to the Enterprise JavaBeans (EJB) model. Because of this architecture Mario is able to deploy his application on a lightweight application server such as Tomcat or Jetty.
This figure shows Mario's system landscape. Mario has two major systems: on the left-hand side is his online shop, and on the right-hand side is 'PAS', a famous billing system. In the middle is Hadoop, which connects the two systems.
In the business world an application normally does not stand alone. In most cases it must communicate with other applications. The lean architecture of Mario's online shop enables him to connect the billing system 'PAS' to his online shop. Spring for Apache Hadoop provides this integration between the two systems. Hadoop supports data-intensive distributed applications and implements a computational paradigm named MapReduce, in which the computation is divided into many small fragments, each of which may be executed or re-executed on any node in a cluster of commodity hardware. Mario uses Hadoop as an ETL layer that enables him to transfer gigabytes of order information into the billing system. In this setup Hadoop also makes it possible for a financial controller to verify whether all orders were billed correctly.
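The MapReduce idea can be illustrated without a cluster. The following sketch is plain Java and does not use the Hadoop API; it only mimics the map, shuffle and reduce steps in a single process. The input format `"pizza,quantity"` and the class name `MapReduceSketch` are assumptions made for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // Map step: turn one raw order line "pizza,quantity" into a (key, value) pair.
    static Map.Entry<String, Integer> map(String orderLine) {
        String[] parts = orderLine.split(",");
        return new AbstractMap.SimpleEntry<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    // Shuffle step: collect all values that belong to the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: fold the values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in sequence; on a real cluster each map and
    // reduce call could run on a different node.
    static Map<String, Integer> totalsPerPizza(List<String> orderLines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : orderLines) {
            pairs.add(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }
}
```

Because every `map` call looks at a single order line and every `reduce` call at a single key, the fragments are independent of each other, which is what allows them to be executed or re-executed on any node.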
In addition to the online shop itself, Mario has a dashboard that lets him track his sales in real time. The dashboard displays daily and monthly sales statistics for each pizza and contains a map with a geographical overview of customer activity and competitor locations.
Here is a walkthrough of the shop:
Now let's talk about Mario's incredible new idea: Mario wants to sell even more pizza! And other products as well. Mario decides to use Lean Startup methods to test the possible introduction of new product lines, and he plans an experiment to validate his idea with a scientific approach and hard facts instead of hunches. Mario's core assumption is that customers want to buy products other than pizza: drinks, salads and pasta. Furthermore, he is worried about pricing. Mario asks all customers to complete a survey and offers an incentive for participating: a free pizza for every customer who responds.
The result of the survey validated Mario's assumption: customers want to buy beverages, salads and pasta. He also found out that his customers are willing to pay higher prices for high-quality products and that they simply love his easy shopping flow. Currently a pizza order can be completed with only three clicks, so there is a new riskiest assumption to validate: will a more complex shopping flow affect his sales?
The figure shows a Validation Board. A Validation Board is a deceptively simple tool for testing product ideas. It also tracks the pivots that follow from customer feedback.
Mario decides to introduce the beverage, salad and pasta product lines and thinks about how he can extend the product range without destroying the easy shopping flow. He concludes that a recommendation engine is the right approach: panels with recommendations can be integrated into the online shop without changing the shopping flow. Mario hires a statistician to help him implement a recommender system for better cross-selling. He also defines new measurement points to validate his idea. He would like to track the conversion rate of orders as well as cross-selling rates, just as every other important event in the online shop is already tracked in real time. This way Mario can easily perform further experiments to verify more assumptions.
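To give a feeling for what such a recommendation panel could be based on, here is a minimal sketch of a co-occurrence recommender in plain Java: it counts how often two products appear in the same order and recommends the most frequent companions. This is an illustrative assumption, not Mario's implementation, and it does not use the Mahout API; the class name `CoOccurrenceRecommender` and the product names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CoOccurrenceRecommender {
    // counts.get(a).get(b) = number of orders that contained both a and b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Registers one order; duplicates within an order are counted once.
    void addOrder(Collection<String> products) {
        Set<String> unique = new HashSet<>(products);
        for (String a : unique) {
            for (String b : unique) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Returns up to howMany products most often bought together with the
    // given product, most frequent first; ties are broken alphabetically.
    List<String> recommend(String product, int howMany) {
        Map<String, Integer> row = counts.getOrDefault(product, Collections.emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(row.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < howMany; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

A panel on the pizza page could then call `recommend("Margherita", 2)` and show the returned products next to the order button, leaving the three-click flow untouched. A production recommender would also have to deal with popularity bias and with new products that have no order history yet, which this sketch ignores.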
Follow the blog to see how the story continues!