It is a universal fact that although Hadoop is becoming the standard for big data processing, it is still a struggle to get concurrency into it. While working on one such high throughput use case, "simultaneous read write" access to a set of files was required where there may be modification requests for files while the data is being also queried from them. Ah! this was an interesting problem to solve in the Hadoop world which does not have any "transactional" support per se. Essentially, the problem statement was to "provide simultaneous access for reads and writes to files in Hadoop along with some rollback capability".
The way to attack the problem was to define a multi-version concurrency model which would also provide rollback capability to older versioned files. And ZooKeeper along with some application level logic proved to be the correct choice for the solution.
The design assumes that only one write can be executed at a time. However, there can be multiple reads simultaneously along with a single write. The steps in the design are outlined below.
- Maintain a current global version in ZooKeeper.
- Maintain read counts on each version in ZooKeeper.
- Whenever files are modified/updated, increment the global count and assign a new global version to the modified files.
- For read requests, files which have latest versions <= global version in ZooKeeper are picked up for reading
- On receiving a read request for a file(s), increment the count against that version in ZooKeeper.
- Once the read request is done, decrement the count against the version in ZooKeeper.
- On a periodic basis:
- Archive files related to a version to an archive zone, if there exists a higher global version and there are no files being read for that version (read count for a version = 0)
- Remove archived read count information from ZooKeeper, if there exists a higher global version and there are no files being read for that version (read count for a version = 0)
- Additionally one can rollback files related to a version from the archive zone if required.
ZooKeeper Nodes Topology
The ZooKeeper nodes topology as per the design looks like this. ZooKeeper works like a filesystem starting with a root directory followed with several nodes (analogous to folders) and finally the data nodes (analogous to files). The circles in the image represent the name of a property/folder that we are trying to maintain and the rounded boxes are the values/files for those properties/folders.
So, the image above shows that the “global version” is 100 and there are 10 & 20 read requests being executed on versions 98 and 99 respectively and since there is a write request in progress, no other write request would be taken up until it completes.
Multi-Versioned System Activity Flow
The activity flow shown above depicts the changes in ZooKeeper states as read and write requests are submitted to the application. t1 is the event when the initial files are created with global version 0. Then read requests are submitted on 0th version (mentioned as t1-extracts). At t3, one write request and 10 read requests are submitted. Read requests execute on 0th version. When write request completes, the global version becomes 1 and the next set of reads execute on version 1 (mentioned as t3-extracts). t8 shows that the files modified by the write request is finally archived and the zookeeper node called “t1-extracts” would finally be discarded. Also, note that for the second read request, version 1 is selected for “file1 & file3” whereas version 0 is selected for “file2” since 0th version is still the latest one for file2.
ZooKeeper advantages in the use case
- ZooKeeper gave us a distributed coordination and failover mechanism.
- It can be a good option for a minimal amount of related information without the need to create relational tables.
- Watches can help in monitoring system state and responding to it.
To know more about ZooKeeper, please visit the following links :