Over a million developers have joined DZone.

Infinispan 5.2: Map/Reduce Parallel Execution

DZone's Guide to

Infinispan 5.2: Map/Reduce Parallel Execution

· Big Data Zone
Free Resource

Effortlessly power IoT, predictive analytics, and machine learning applications with an elastic, resilient data infrastructure. Learn how with Mesosphere DC/OS.

Ever since the Infinispan 5.2 release we implemented fully distributed execution of both map and reduce phases of MapReduceTask. For the map phase, MapReduceTask hashes task input keys, groups them by execution node N these keys are hashed to, and sends map function along input keys to each node N. At node N map function gets invoked for each input key and locally loaded corresponding value. However, map function on node N, until recently, got invoked on a single thread regardless of the number of key/value pairs. If we need to invoke map function on many key/value pairs, things would sooner rather than later grind to a halt.

Similarly in order to complete reduce phase, MapReduceTask groups intermediate KOut keys by execution node N they are hashed to. After intermediate phase is completed, MapReduceTask sends a reduce command to each node N where KOut keys are hashed. Once reduce command arrives on target execution node, it looks up temporary cache belonging to MapReduceTask and for each KOut key, grabs a list of VOut values, wraps it with an Iterator and invokes reduce on it. However, even reduce function, until recently, got invoked on a single thread, as well. Even though, due to the nature of map/reduce paradigm, reduce entails significantly smaller number of key/value function invocations compared to a map, current single threaded execution model does not help to speed things up.

Starting with Infinispan community release 7.0.0.Alpha1, map and reduce task phases are executed in parallel. If the eviction is not configured for the cache where key/value pairs involved in the map phase reside, MapReduceTask uses fork/join work-stealing technique for parallel execution of the map and reduce functions. Otherwise, we implement parallel execution using a standard thread executor framework. Reduce phase is always executed using fork/join work-stealing algorithm. Either way, we are hoping that users' large map/reduce tasks will experience a significant execution speedup.  At the moment, we are conducting our own performance tests and will get back to you with the results soon. Stay tuned.


Learn to design and build better data-rich applications with this free eBook from O’Reilly. Brought to you by Mesosphere DC/OS.


Published at DZone with permission of Manik Surtani, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}