[This article was written by David Bennett]
"There’s no benchmark for how life’s “supposed” to happen. There is no ideal world for you to wait around for. The world is always just what it is now, it’s up to you how you respond to it.”
― Isaac Marion, Warm Bodies
At one time or another, most of us have heard some version of this question: “Sure, the system does fine in the benchmarks, but can it perform in production?” Developers of benchmarking software have tried to address this concern. Features such as configurable workloads and scripting interfaces help tailor a benchmark to different scenarios, but they still require an expert to set them up properly.
A brief overview of LinkBench
The LinkBench benchmark was developed by Tim Armstrong during his internship at Facebook, with guidance and help from a team there. It takes a different approach to the challenge of simulating the real world: LinkBench is designed from the ground up as a replica of the data operations of Facebook’s social graph. It implements an identical data model, along with business methods and workloads directly proportionate to those used in the production social graph. As a result, LinkBench can effectively duplicate the data load that a production social networking application will see.
Anatomy of a Social Graph – The Data Model
LinkBench’s data model consists of just three tables: nodetable, linktable and counttable. With this deceptively simple schema, a very robust application can be built.
The nodetable defines an object, or end-point, within the social graph. Examples of nodes include users, posts, comments, replies, albums, photos and groups. The node’s type attribute is the magic that determines what the node represents, and the data attribute contains the object itself (up to 16 MB).
The linktable is a generic association, or bridge, table that allows any two nodes to be associated in a specific way. The secret sauce in this case is the link_type attribute, which represents a specific relationship between two nodes. Examples of links include two users being friends, a user liking another user’s post, and a user being tagged in another user’s photo.
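The following minimal Java sketch (Java 16 or later) makes the two generic tables concrete. It models a nodetable row and a linktable row as records. The field names loosely follow LinkBench’s tables. The numeric type ids (USER, POST, FRIEND, LIKES) and the helper neighbours() are hypothetical and exist only for illustration. The point is that a single table holds every kind of relationship, and link_type alone tells them apart.

```java
import java.util.List;

// Hypothetical type ids, for illustration only: LinkBench stores types as plain
// numbers and leaves their meaning to the application.
final class Types {
    static final int USER = 1;
    static final int POST = 2;
    static final long FRIEND = 100L;
    static final long LIKES = 101L;

    private Types() {}
}

// One nodetable row: `type` says what the node is, `data` holds the payload.
record Node(long id, int type, long version, long time, String data) {}

// One linktable row: a typed edge from node id1 to node id2.
record Link(long id1, long linkType, long id2, boolean visible, String data, long time, long version) {}

class SocialGraphSketch {
    // Returns the id2 of every visible link of the given type that starts at id1.
    static List<Long> neighbours(List<Link> links, long id1, long linkType) {
        return links.stream()
                .filter(l -> l.id1() == id1 && l.linkType() == linkType && l.visible())
                .map(Link::id2)
                .toList();
    }

    public static void main(String[] args) {
        Node alice = new Node(1, Types.USER, 0, 0, "{\"name\":\"Alice\"}");
        Node bob = new Node(2, Types.USER, 0, 0, "{\"name\":\"Bob\"}");
        Node post = new Node(3, Types.POST, 0, 0, "{\"text\":\"Hello\"}");

        // The same generic table expresses different relationships via linkType.
        List<Link> links = List.of(
                new Link(alice.id(), Types.FRIEND, bob.id(), true, "", 0, 0),
                new Link(bob.id(), Types.FRIEND, alice.id(), true, "", 0, 0),
                new Link(alice.id(), Types.LIKES, post.id(), true, "", 0, 0));

        System.out.println(neighbours(links, alice.id(), Types.FRIEND)); // [2]
        System.out.println(neighbours(links, alice.id(), Types.LIKES));  // [3]
    }
}
```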
The third table, counttable, is very important for performance and scalability in a social network. It maintains, for each node, the number of links of a given type. Counts are updated transactionally whenever an operation that could alter a count occurs. This small investment, one additional write, pays off by giving quick access to the number of likes, shares, posts, friends and other associations between nodes. Without the count table, the application would have to query the database repeatedly to compute up-to-date counts for various relationships, creating a tremendous amount of system load.
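The in-memory sketch below illustrates the count-table trade-off. It is not LinkBench code. The class and method names are hypothetical, and a Java lock stands in for the database transaction that would update linktable and counttable together. Each write does slightly more work, but countLinks becomes a single lookup instead of a scan over every matching link.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store: the lock taken by `synchronized` plays the role
// of the database transaction that writes linktable and counttable together.
class CountedLinkStore {
    private record LinkKey(long id1, long linkType, long id2) {}
    private record CountKey(long id1, long linkType) {}

    private final Set<LinkKey> links = new HashSet<>();
    private final Map<CountKey, Long> counts = new HashMap<>();

    // Inserts the link and bumps its count in one step; a duplicate changes neither.
    synchronized boolean addLink(long id1, long linkType, long id2) {
        if (!links.add(new LinkKey(id1, linkType, id2))) {
            return false;
        }
        counts.merge(new CountKey(id1, linkType), 1L, Long::sum);
        return true;
    }

    // Removes the link and decrements its count in one step; a missing link changes neither.
    synchronized boolean deleteLink(long id1, long linkType, long id2) {
        if (!links.remove(new LinkKey(id1, linkType, id2))) {
            return false;
        }
        counts.merge(new CountKey(id1, linkType), -1L, Long::sum);
        return true;
    }

    // A single hash lookup answers "how many?" without scanning the links.
    synchronized long countLinks(long id1, long linkType) {
        return counts.getOrDefault(new CountKey(id1, linkType), 0L);
    }

    public static void main(String[] args) {
        CountedLinkStore store = new CountedLinkStore();
        long friend = 100L; // hypothetical link_type id
        store.addLink(1, friend, 2);
        store.addLink(1, friend, 3);
        store.addLink(1, friend, 3); // duplicate: ignored
        store.deleteLink(1, friend, 2);
        System.out.println(store.countLinks(1, friend)); // 1
    }
}
```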
The Social Graph Controller
As you can see, the model is very simple. The real magic in the social graph lies in the controller. LinkBench simulates the controller<->model interface through its workload configuration. The included configuration is based on actual production measurements of data payload size and distribution, node and link ‘temperature’ (popularity), and the mix of operations logged over a period of days.
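The sketch below gives a rough idea of how a logged operation mix can drive a workload: it picks each next operation in proportion to a set of weights. The weights and operation labels are hypothetical and are not the production-derived values shipped with LinkBench. The class name OperationMix is also an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Picks the next operation in proportion to configured (hypothetical) weights.
class OperationMix {
    private final String[] ops;
    private final double[] cumulative; // running totals, normalised so the last is 1.0

    OperationMix(Map<String, Double> weights) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("no operations configured");
        }
        ops = weights.keySet().toArray(new String[0]);
        cumulative = new double[ops.length];
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        double running = 0;
        for (int i = 0; i < ops.length; i++) {
            running += weights.get(ops[i]);
            cumulative[i] = running / total;
        }
    }

    // Maps a uniform number u in [0, 1) to an operation name.
    String pick(double u) {
        for (int i = 0; i < ops.length; i++) {
            if (u < cumulative[i]) {
                return ops[i];
            }
        }
        return ops[ops.length - 1]; // guards against rounding at the top end
    }

    String next(Random rng) {
        return pick(rng.nextDouble());
    }

    public static void main(String[] args) {
        // Hypothetical weights, NOT LinkBench's production-derived numbers.
        Map<String, Double> w = new LinkedHashMap<>();
        w.put("getlinklist", 50.0);
        w.put("getnode", 20.0);
        w.put("addlink", 10.0);
        w.put("countlink", 10.0);
        w.put("updatenode", 10.0);

        OperationMix mix = new OperationMix(w);
        Random rng = new Random(42);
        Map<String, Integer> tally = new LinkedHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            tally.merge(mix.next(rng), 1, Integer::sum);
        }
        System.out.println(tally); // counts land close to the 50/20/10/10/10 split
    }
}
```

A full generator would also draw node ids from a skewed distribution, so that "hot" nodes and links are hit far more often than cold ones. That part of the temperature model is left out here.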
The Social Graph In Use
Implementation of the MongoDB / TokuMX plugin for LinkBench
LinkBench is designed to be customizable and extensible so that it can test new persistence technologies and architecture designs. A new database plugin can be developed by extending the abstract class com.facebook.LinkBench.LinkStore and/or implementing the interface com.facebook.LinkBench.NodeStore. There is also a com.facebook.LinkBench.GraphStore class that combines LinkStore and NodeStore and can be subclassed to implement both at once. One disadvantage of the current design is that it is up to the plugin developer to honor all of the social graph’s business requirements in the plugin. This requires careful auditing of each plugin to ensure that it has been implemented to specification. To guarantee one-to-one parity with the MySQL plugin, I used it as a base and converted its methods to MongoDB one at a time, carefully translating each operation.
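The extension pattern described above can be sketched with simplified stand-ins. MiniLinkStore, MiniNodeStore and MiniGraphStore are hypothetical, and they are much smaller than LinkBench’s LinkStore, NodeStore and GraphStore, whose real signatures differ. The sketch also includes a small contract check. That check is one possible way to audit whether a backend honors a business rule such as keeping counts consistent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, heavily simplified stand-ins for LinkBench's LinkStore,
// NodeStore and GraphStore; the real classes have different, richer signatures.
abstract class MiniLinkStore {
    // Returns false if the link already existed.
    abstract boolean addLink(long id1, long linkType, long id2);

    abstract long countLinks(long id1, long linkType);
}

interface MiniNodeStore {
    long addNode(int type, String data);

    Optional<String> getNode(long id);
}

// Same shape as GraphStore: one base class covering both links and nodes.
abstract class MiniGraphStore extends MiniLinkStore implements MiniNodeStore {}

// A toy backend "plugin" that fills in every abstract operation.
class InMemoryGraphStore extends MiniGraphStore {
    private final Map<Long, String> nodes = new HashMap<>();
    private final Set<String> links = new HashSet<>();
    private final Map<String, Long> counts = new HashMap<>();
    private long nextId = 1;

    @Override
    public long addNode(int type, String data) {
        long id = nextId++; // `type` is accepted but not stored in this toy
        nodes.put(id, data);
        return id;
    }

    @Override
    public Optional<String> getNode(long id) {
        return Optional.ofNullable(nodes.get(id));
    }

    @Override
    boolean addLink(long id1, long linkType, long id2) {
        if (!links.add(id1 + ":" + linkType + ":" + id2)) {
            return false;
        }
        // Business rule every plugin must honor: keep the count in step.
        counts.merge(id1 + ":" + linkType, 1L, Long::sum);
        return true;
    }

    @Override
    long countLinks(long id1, long linkType) {
        return counts.getOrDefault(id1 + ":" + linkType, 0L);
    }
}

class ContractCheck {
    // Runs one scenario against any backend and fails if it breaks the rules.
    static void check(MiniGraphStore store) {
        long a = store.addNode(1, "alice");
        long b = store.addNode(1, "bob");
        if (!store.getNode(a).equals(Optional.of("alice"))) throw new AssertionError("getNode");
        store.addLink(a, 100, b);
        store.addLink(a, 100, b); // a duplicate must not be double-counted
        if (store.countLinks(a, 100) != 1) throw new AssertionError("countLinks");
    }

    public static void main(String[] args) {
        check(new InMemoryGraphStore());
        System.out.println("in-memory store passes the contract check");
    }
}
```

The same check could be pointed at every backend, which is one way to compare plugins against a shared specification.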
Along the way, I learned a lot about NoSQL, and MongoDB in particular, and dispelled a few myths I had about NoSQL. I will save that for another article. Here are a few design decisions I made while implementing the plugin:
- Compatibility – To allow comparisons, the LinkBench plugin maintains compatibility with MongoDB 2.x, TokuMX 2.x, MongoDB 3.x and TokuMXse (RC).
- Transactions – The LinkBench MySQL plugin uses MVCC concurrency. To maintain this capability, I implemented a new configuration setting, transaction_support_level. It allows the benchmark to run with no transaction support, with MVCC only where the database supports it, or with simulated transactions using the two-phase commit strategy documented on the MongoDB site.
- Schema – The relationship between nodes and links does not lend itself to embedded documents. It would be possible to embed count documents under their node, but that probably isn’t worth the extra complexity and network traffic it would generate. I opted to leave the schema flat.
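The sketch below uses in-memory objects to show the shape of the two-phase commit pattern from the MongoDB documentation, applied to a counttable-style update. It also defines an enum for the three transaction levels. The enum values, the fallback rule in effective() and all class names are assumptions for illustration, not the plugin’s actual code. The key idea is that each document records the pending transaction id, so a resumed transaction never applies the same change twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enum mirroring the three transaction_support_level options.
enum TransactionSupportLevel { NONE, MVCC_IF_SUPPORTED, SIMULATED_2PC }

class TwoPhaseCommitSketch {
    // Assumption: when MVCC is requested but the server lacks it, fall back to NONE.
    static TransactionSupportLevel effective(TransactionSupportLevel requested, boolean serverHasMvcc) {
        if (requested == TransactionSupportLevel.MVCC_IF_SUPPORTED && !serverHasMvcc) {
            return TransactionSupportLevel.NONE;
        }
        return requested;
    }

    enum State { INITIAL, PENDING, APPLIED, DONE }

    // Stand-in for a count document plus its pendingTransactions marker list.
    static final class CountDoc {
        long count;
        final List<Long> pendingTransactions = new ArrayList<>();
    }

    // Stand-in for the transaction document: which count docs to change, and by how much.
    static final class Txn {
        final long id;
        final long delta;
        final List<String> keys;
        State state = State.INITIAL;

        Txn(long id, long delta, List<String> keys) {
            this.id = id;
            this.delta = delta;
            this.keys = keys;
        }
    }

    final Map<String, CountDoc> docs = new HashMap<>();

    // Phase 1 for one document; skipped if already applied, so retries are safe.
    void applyTo(Txn t, String key) {
        CountDoc d = docs.computeIfAbsent(key, k -> new CountDoc());
        if (!d.pendingTransactions.contains(t.id)) {
            d.count += t.delta;
            d.pendingTransactions.add(t.id);
        }
    }

    // Runs or resumes the protocol: initial -> pending -> applied -> done.
    void run(Txn t) {
        if (t.state == State.INITIAL || t.state == State.PENDING) {
            t.state = State.PENDING;
            for (String key : t.keys) applyTo(t, key);
            t.state = State.APPLIED;
        }
        if (t.state == State.APPLIED) {
            // Phase 2: drop the markers now that every document has been updated.
            for (String key : t.keys) docs.get(key).pendingTransactions.remove(Long.valueOf(t.id));
            t.state = State.DONE;
        }
    }

    public static void main(String[] args) {
        TwoPhaseCommitSketch db = new TwoPhaseCommitSketch();
        // A bidirectional friend link bumps two count documents.
        Txn t = new Txn(1, 1, List.of("1:100", "2:100"));
        // A first attempt marked the transaction pending, updated one document, then crashed.
        t.state = State.PENDING;
        db.applyTo(t, "1:100");
        db.run(t); // recovery resumes the pending transaction
        System.out.println(db.docs.get("1:100").count + " " + db.docs.get("2:100").count); // 1 1
    }
}
```

The documented pattern also defines canceling and cancelled states for rolling a transaction back. They are omitted here for brevity.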
In Part 2, I will dive into the LinkBench Java code a bit to compare the MySQL plugin with the MongoDB plugin.