Compute Grids - What to Expect?
Let's take Cloudy Akka for example. While the API seems pretty simple from reading this blog post, calling it a compute grid rather overstates reality. I would not go any further than naming it for what it is: a convenient Scala API for RPC.
Having worked on the GridGain compute grid myself for over 5 years (and then on the data grid), and having studied quite a few others, I consider the following features the minimum any product must offer before it can claim to be a compute grid.
- Auto Discovery - all nodes in the grid should auto-discover each other, i.e., users should never have to add nodes to the topology manually.
- MapReduce - support for splitting execution into multiple sub-jobs and then aggregating the results is just a MUST. Otherwise you are offloading most of the dirty work onto your users, which is not fair.
- Auto Failure Detection - a compute grid must be smart enough to automatically detect node crashes and redistribute the load among the remaining nodes.
- Fault Tolerance - all failed grid jobs must be automatically failed over to other nodes that are better suited to execute them.
- Load Balancing - a compute grid should automatically distribute load equally among nodes, typically supporting multiple load-balancing policies. GridGain even supports work-stealing, where less-loaded nodes can steal jobs from overloaded nodes.
- Job Collision Resolution - this gives users control over how many jobs can run in parallel, while the rest wait in a queue ordered by one of several available collision-resolution strategies.
- Auto Deployment - compute grid users should never be forced to manually deploy their libraries on every grid node; that is way too inconvenient and error-prone. The approach I like best (available in GridGain) is auto-deployment, where code automatically propagates throughout the grid without any explicit action from users.
- Nested Jobs And Continuations - compute grid jobs should be able to invoke other compute grid jobs while executing remotely. This is a very powerful feature, especially when grid jobs are recursive. Continuations should allow suspending a job and releasing its resources while it waits for the result of another job within the grid.
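To make the split/aggregate, load-balancing, and fault-tolerance ideas above concrete, here is a minimal in-process sketch in plain Scala. It is not the GridGain API; the `Node` class and all names are hypothetical stand-ins for real grid nodes, used only to show how sub-jobs can be distributed, failed over on a node crash, and reduced into one result:

```scala
// Minimal in-process sketch of MapReduce-style split, failover, and reduce.
// "Node" is a hypothetical stand-in for a real grid node.
object MiniGrid {
  final case class NodeDown(name: String) extends RuntimeException(name)

  // A node either executes a job or throws if it is "down".
  class Node(val name: String, alive: Boolean) {
    def execute[T](job: () => T): T =
      if (alive) job() else throw NodeDown(name)
  }

  // Run each sub-job on some node; on failure, fail the job over
  // to the remaining nodes instead of failing the whole task.
  def mapReduce[T, R](nodes: Seq[Node], jobs: Seq[() => T], reduce: Seq[T] => R): R = {
    def run[A](job: () => A, candidates: Seq[Node]): A = candidates match {
      case Seq() => throw new IllegalStateException("no live nodes left")
      case node +: rest =>
        try node.execute(job)
        catch { case _: NodeDown => run(job, rest) } // failover
    }
    val results = jobs.zipWithIndex.map { case (job, i) =>
      // naive round-robin "load balancing" over the topology
      run(job, nodes.drop(i % nodes.size) ++ nodes.take(i % nodes.size))
    }
    reduce(results)
  }

  def main(args: Array[String]): Unit = {
    val nodes = Seq(
      new Node("n1", alive = true),
      new Node("n2", alive = false), // crashed node; its jobs fail over
      new Node("n3", alive = true))
    val words = "GridGain ROCKS".split(" ").toSeq
    val total = mapReduce[Int, Int](nodes, words.map(w => () => w.length), _.sum)
    println(total) // 8 + 5 = 13
  }
}
```

A real compute grid does all of this (and discovery, deployment, and collision resolution) across machines and transparently; the point of the sketch is only the control flow.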
I could probably continue, but I will stop here. I think it is clear that compute grid products are way more advanced than RPC frameworks. Just for comparison, here are examples implemented in GridGain Scalar that cover the examples provided by Cloudy Akka in a simpler fashion, while also doing all the other cool stuff compute grids do in the background:
// 1. Execute a simple job on some remote node.
grid !!< (() => println("> GridGain ROCKS <"))
// 2. Execute a simple job on all remote nodes.
grid !!! (() => println("> GridGain ROCKS <"))
// 3. Use MapReduce to split a phrase into multiple words and
// print each word on remote nodes.
grid !!~ (for (w <- "GridGain ROCKS".split(" ")) yield () => println(w))
// 4. Use MapReduce to count the number of characters by spreading
// the workload across the grid and reducing on the local node.
val cnt = grid !*~ (for (w <- "GridGain REALLY ROCKS!".split(" "))
yield () => w.length, // Map step.
(s: Seq[Int]) => s.sum) // Reduce step.