Last Saturday Sergio Bossa was at the Spring Italian Meeting in Cagliari for an enjoyable meet-up with colleagues, friends, and passionate Spring users. I especially enjoyed going through the PowerPoint slides, which had many great coding samples.
Here are some takeaway points from the meeting.
Q: Splitting a task into jobs and sending them to grid nodes involves some overhead due to data transfer: do you have any percentage number that shows you when this overhead is too high compared to what you gain by parallelizing your jobs?
A: I don't believe in magic numbers :)
I'd like to answer your question in a different way: just keep your overhead as low as possible by applying data affinity, that is, by keeping jobs and the data they need together, so as to minimize data transfers.
If you don't transfer any data, your overhead will be at its minimum.
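To make the idea of data affinity a bit more concrete, here is a minimal sketch in plain Java. It is not the GridGain API: the `AffinityRouter` class, its method names, and the hash-based mapping are all assumptions made up for illustration. The point is only that data placement and job placement share one mapping, so a job ends up on the node that already holds its data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not the GridGain API.
final class AffinityRouter {
    private final List<String> nodes;

    AffinityRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    // Data and jobs both use this same mapping, so a job for a given key
    // is sent to the node that already stores the data for that key.
    String nodeFor(String dataKey) {
        return nodes.get(Math.floorMod(dataKey.hashCode(), nodes.size()));
    }

    // A job needs a data transfer only when it runs away from its data.
    boolean needsTransfer(String dataKey, String executingNode) {
        return !nodeFor(dataKey).equals(executingNode);
    }
}
```

With this in place, a scheduler would ask `nodeFor(key)` before dispatching a job, and `needsTransfer` would return false for every job dispatched that way. A simple modulo mapping like this reshuffles most keys when nodes join or leave, so a real grid would likely use something more stable.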
Q: You talked about data affinity and data grid solutions: what about my database?
A: To really scale out your application, you must scale your full application stack: hence, your database must scale, too.
I think one of the most effective ways of making your database scale is to partition it, by splitting data across several instances and making every job access a different partition, depending on the data it needs.
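As a rough illustration of partitioning, the sketch below maps a record id to the database instance that owns its id range. Everything here is an assumption for the sake of the example: the `PartitionedDatabase` class, the choice of range-based partitioning, and the partition names are hypothetical and were not part of the talk.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical range-based partition lookup, for illustration only.
final class PartitionedDatabase {
    // Lowest id of each range -> name of the instance holding that range.
    private final TreeMap<Long, String> partitionsByLowestId = new TreeMap<>();

    void addPartition(long lowestId, String partitionName) {
        partitionsByLowestId.put(lowestId, partitionName);
    }

    // Finds the partition whose range starts at or below the given id.
    String partitionFor(long recordId) {
        Map.Entry<Long, String> entry = partitionsByLowestId.floorEntry(recordId);
        if (entry == null) {
            throw new IllegalArgumentException("no partition covers id " + recordId);
        }
        return entry.getValue();
    }
}
```

A job working on record 1500 would then be pointed at whichever instance was registered for the range starting at 1000, and would never need to touch the others.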
Another strategy would be to use a master/replica scenario, where you have a master instance and several read-only replicas, which you map your jobs to for read-intensive operations.
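The master/replica idea can be sketched in a few lines as well. Again, this is only an assumed illustration: the `ReplicaRouter` class, the round-robin policy, and the instance names are hypothetical, and a real setup would also have to deal with replication lag and failover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read/write router, for illustration only.
final class ReplicaRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger nextReplica = new AtomicInteger();

    ReplicaRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = new ArrayList<>(replicas);
    }

    // Writes always go to the master; reads rotate across the replicas.
    String instanceFor(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return master;
        }
        int index = Math.floorMod(nextReplica.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

Read-intensive jobs would call `instanceFor(true)` and get spread over the replicas, while anything that modifies data would call `instanceFor(false)` and hit the master.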
Q: Is there any GridGain success story? Do you really use it?
A: Yes, we do :)
We recently developed a custom Content Management System for the Italian Public Broadcasting Service, with extended capabilities for life cycle management and rule-based publishing of editorial content. The publishing infrastructure is made up of a GridGain-based application that manages the publishing cycle of all public web sites, ranging from the main web portal to all related web sites.
It was designed to scale out the publication process linearly from one to a hundred sites by distributing publishing operations across grid nodes, each one capable of publishing the content of one or more sites independently of the others. This means that, with a number of physical nodes equal to the number of sites to publish, the whole publication process scales linearly, taking the same time as if there were just one site.
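The scaling claim can be checked with a back-of-the-envelope calculation. The sketch below is an idealized model, not a measurement from the project: it assumes every site takes the same time to publish, that sites are fully independent, and that distribution overhead is zero. The `PublishingEstimate` class and the sample figures are hypothetical.

```java
// Idealized estimate, for illustration only: equal-sized sites, no overhead.
final class PublishingEstimate {
    static long wallClockMinutes(int sites, int nodes, long minutesPerSite) {
        if (sites < 0 || nodes <= 0 || minutesPerSite < 0) {
            throw new IllegalArgumentException("sites and minutes must be >= 0, nodes must be > 0");
        }
        // Each node publishes its sites one after another, so the total time
        // is driven by the busiest node: ceil(sites / nodes) rounds.
        long rounds = ((long) sites + nodes - 1) / nodes;
        return rounds * minutesPerSite;
    }
}
```

Under these assumptions, `wallClockMinutes(100, 100, 5)` gives the same result as `wallClockMinutes(1, 1, 5)`, which is exactly the "same time as one site" behavior described above, while a single node would need a hundred times longer for the same hundred sites.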