I have been getting many questions about how to tune GridGain, so I decided to write a brief manual that covers the most important tuning properties.
1. GridGain is multi-threaded - Use It
If cache updates seem slow, ask yourself whether you are utilizing the full computing power (all the cores) of your machine. GridGain is multi-threaded internally, but if you issue operations sequentially, one after another from a single thread, then you are not really using multithreading. Generally it makes sense to populate the grid with about 2 to 3 times as many threads as you have cores. All GridGain APIs are thread-safe, so you don't have to worry about concurrency issues when populating data.
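As a rough illustration of the threading advice, here is a minimal sketch that populates a map from a pool sized at twice the number of cores. It uses only the JDK: the `ConcurrentHashMap` is a stand-in for a thread-safe cache, and the class name, key layout, and the choice of 2x are assumptions made for the example, not GridGain code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelPopulate {
    /** Fills the map with keys 0..total-1 using twice as many threads as cores. */
    static void populate(Map<Integer, String> cache, int total) {
        final int threads = 2 * Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            // Each worker handles an interleaved slice of the key space.
            pool.submit(() -> {
                for (int key = offset; key < total; key += threads) {
                    cache.put(key, "value-" + key);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("population did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while populating", e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        populate(cache, 100_000);
        System.out.println("entries loaded: " + cache.size());
    }
}
```

The same shape applies when the target is a real cache: the workers share one thread-safe instance and each one owns a disjoint slice of the keys.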
2. Use Collocated Computations
GridGain enables you to execute MapReduce computations in memory. However, most computations work on data that is cached on remote grid nodes. Loading that data from remote nodes is usually expensive, and it is much cheaper to send the computation to the node where the data is. The easiest way to do it is to use the GridProjection.affinityRun(...) method; however, GridGain also has plenty of "mapKeysToNodes(...)" methods to help users figure out data ownership within the grid.
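To make the idea of data ownership concrete, the following sketch groups keys by the node that owns them, so that one job per node can work on local data. It is plain JDK code, not the GridGain API: the hash-modulo rule in `ownerOf` is a hypothetical stand-in for the grid's real affinity function, and all names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeyOwnership {
    /** Hypothetical affinity rule: a key belongs to node (hash mod nodeCount). */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Groups keys by owning node so each node only receives the keys it holds. */
    static Map<Integer, List<String>> mapKeysToOwners(List<String> keys, int nodeCount) {
        Map<Integer, List<String>> byNode = new TreeMap<>();
        for (String key : keys) {
            byNode.computeIfAbsent(ownerOf(key, nodeCount), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("order-1", "order-2", "order-3", "order-4");
        mapKeysToOwners(keys, 3).forEach((node, owned) ->
            System.out.println("send one job to node " + node + " for keys " + owned));
    }
}
```

With the keys grouped this way, each job touches only entries that are already in its node's memory, which is the point of collocation.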
3. Use Data Loader
If you need to upload lots of data into the cache, use org.gridgain.grid.GridDataLoader to do it. The data loader batches updates before sending them to remote nodes and controls the number of parallel operations taking place on each node to avoid thrashing. Generally it performs about 10x better than doing a bunch of single-threaded updates.
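The batching part of that behavior can be sketched in a few lines. This is a simplified, single-threaded illustration using only the JDK, not GridDataLoader itself: `BatchingLoader`, its `sender` callback (standing in for one network send), and the batch size of 512 are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingLoader<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender; // stands in for one network send
    private List<T> buffer = new ArrayList<>();

    BatchingLoader(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffers an update and ships the buffer once it reaches batchSize. */
    void add(T update) {
        buffer.add(update);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Ships whatever is buffered; call once at the end of loading. */
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(buffer);
            buffer = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        List<Integer> sends = new ArrayList<>();
        BatchingLoader<String> loader =
            new BatchingLoader<String>(512, batch -> sends.add(batch.size()));
        for (int i = 0; i < 2_000; i++) {
            loader.add("entry-" + i);
        }
        loader.flush();
        System.out.println("sends: " + sends.size() + " for 2000 updates");
    }
}
```

A real loader would also keep one buffer per destination node and cap the number of batches in flight, which this sketch leaves out.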
4. Tune Initial Cache Size
To avoid internal resizing of cache maps, you should always provide a proper cache start size. Not doing so can significantly hurt performance, as CPU cycles will be spent on resizing GridGain's internal cache maps instead of on application logic. You can configure the cache start size via the GridCacheConfiguration.getStartSize() configuration property.
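The cost being avoided is the same one you pay with an undersized `java.util.HashMap`. As an analogy only (GridGain's internal maps may size themselves differently), this sketch computes an initial capacity large enough that a plain `HashMap` never resizes while the expected number of entries is loaded. The helper name `capacityFor` is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class StartSize {
    /** Smallest initial capacity that holds expectedEntries without a resize. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1_000_000;
        float loadFactor = 0.75f;
        // Sized up front: no rehashing happens while the entries are inserted.
        Map<Integer, String> presized = new HashMap<>(capacityFor(expected, loadFactor), loadFactor);
        for (int i = 0; i < expected; i++) {
            presized.put(i, "v");
        }
        System.out.println("loaded " + presized.size() + " entries into a pre-sized map");
    }
}
```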
5. Turn Off Near Cache
When using a Partitioned cache, GridGain fronts it with a local Near cache, so that an entry which does not belong to the local partitions is still kept in a smaller local cache for faster access next time.
However, most usages of GridGain happen from collocated computations, i.e. computations submitted to the grid are automatically routed to the nodes where the data resides. In cases like this the Near cache is redundant, as all data access happens from local memory anyway. To avoid the extra overhead, you can disable the Near cache by setting the GridCacheConfiguration.isNearEnabled() configuration property to false.
6. Use Off-Heap Memory
If you plan to allocate large amounts of memory to your JVM for data caching (usually more than 10GB), then your application will most likely suffer from prolonged stop-the-world GC pauses, which can significantly hurt latencies. To avoid GC pauses, use off-heap memory to cache data: your data is still cached in memory, but the JVM does not manage it as heap objects, so GC is not affected.
The only configuration property you need to set to enable off-heap memory is GridCacheConfiguration.getMaxOffHeapMemory(), which tells GridGain how much off-heap memory to make available for your application. By default off-heap memory is disabled.
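To show what "in memory but outside the heap" means in practice, here is a small JDK-only sketch that stores a value in a direct `ByteBuffer`. It illustrates the general mechanism and is not how GridGain implements its off-heap storage; the class `OffHeapSlot` and its length-prefixed layout are assumptions for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffHeapSlot {
    private final ByteBuffer memory;

    OffHeapSlot(int capacityBytes) {
        // allocateDirect reserves native memory outside the Java heap.
        this.memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Stores a length-prefixed UTF-8 value starting at the given byte offset. */
    void write(int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        memory.putInt(offset, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            memory.put(offset + 4 + i, bytes[i]);
        }
    }

    /** Copies the value back onto the heap only for the duration of the read. */
    String read(int offset) {
        byte[] bytes = new byte[memory.getInt(offset)];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = memory.get(offset + 4 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapSlot slot = new OffHeapSlot(1024);
        slot.write(0, "cached outside the heap");
        System.out.println(slot.read(0));
    }
}
```

The trade-off visible here is that every access serializes or deserializes the value, in exchange for the collector not having to manage the cached bytes as heap objects.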
7. Tune Swap Storage
If you don't plan to use swap storage (i.e. disk overflow storage), you should not change any of the default swap settings (swap storage is disabled by default). If you do need swap storage, enable it via the GridCacheConfiguration.isSwapEnabled() configuration property.
8. Tune Query Indexing
There are several configuration properties that you should watch out for here. First and most importantly, if you don't plan to use cache queries at all, you should disable indexing altogether via the GridCacheConfiguration.isQueryIndexEnabled() configuration property.
If you do plan to use cache queries, you should properly enable or disable indexing of primitive keys and values on GridH2IndexingSpi. Enable indexing of primitive keys on the SPI only if you plan to use primitive cache keys in your cache queries. The same goes for indexing of primitive values, which has its own setting on the SPI.
Also, if every value class has exactly one key class (that is, you don't plan to use different key classes for the same value class), set setDefaultIndexFixedTyping(...) on the SPI to true. This way GridGain will store key types as the corresponding SQL types instead of in binary form, which makes key lookups faster.
9. Tune Eviction Policy
Again, if you don't plan to over-populate your cache, i.e. if you don't need any eviction policy at all, then you should disable eviction altogether via the GridCacheConfiguration.isEvictionEnabled() configuration property.
If you do need GridGain to make sure that data in the cache does not grow beyond the allowed memory limits, you should carefully choose the eviction policy. Most likely you will need either the FIFO or the LRU eviction policy shipped with GridGain; however, depending on your application, you may need to configure LIRS or plug in your own custom eviction policy. Regardless of which eviction policy you use, you should carefully choose the maximum number of entries the policy allows in the cache: once the cache grows beyond this limit, evictions will start occurring. Usually the max size is controlled by the setMaxSize(...) configuration property on the eviction policy instance.
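For readers who want to see what a max-size LRU policy does, here is a minimal JDK-only sketch built on `LinkedHashMap`. It is an illustration of the policy, not GridGain's eviction implementation, and the class name `LruCache` is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruCache(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    /** Called after each insert; returning true evicts the least recently used entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the eviction candidate
        cache.put("c", 3); // exceeds maxSize, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Setting the limit too low causes constant evictions and reloads, while setting it too high defeats the purpose of the memory cap, which is why the max size deserves careful thought.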
You should also almost always set the "setAllowEmptyEntries(...)" configuration property to false. By default GridGain keeps entries with null values in the cache to preserve other properties of the entry, such as time-to-live. However, if you don't use time-to-live, then most likely you should discard an entry once it expires or is invalidated.
10. Use Write-Behind Caching
If you can afford to have your persistent store lag behind your in-memory cache, then use write-behind caching. When write-behind is enabled, GridGain accumulates cache updates and flushes them to the database in batches in the background, which can often provide significant performance benefits. You can enable write-behind caching via the GridCacheConfiguration.isWriteBehindEnabled() configuration property.
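The mechanism can be sketched with plain JDK classes. The example below is an assumption-laden illustration rather than GridGain's implementation: `WriteBehindCache`, the `store` callback (standing in for a batched database write), and the 50 ms flush interval are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class WriteBehindCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private Map<K, V> dirty = new LinkedHashMap<>();
    private final Consumer<Map<K, V>> store; // stands in for a batched database write

    WriteBehindCache(Consumer<Map<K, V>> store) {
        this.store = store;
    }

    /** Updates memory immediately and only records the entry as pending for the store. */
    synchronized void put(K key, V value) {
        memory.put(key, value);
        dirty.put(key, value); // repeated updates to one key collapse into one write
    }

    synchronized V get(K key) {
        return memory.get(key);
    }

    /** Writes all pending updates as one batch; meant to be called from a background thread. */
    void flush() {
        Map<K, V> batch;
        synchronized (this) {
            if (dirty.isEmpty()) {
                return;
            }
            batch = dirty;
            dirty = new LinkedHashMap<>();
        }
        store.accept(batch); // the slow write happens outside the lock
    }

    public static void main(String[] args) throws InterruptedException {
        WriteBehindCache<String, Integer> cache = new WriteBehindCache<String, Integer>(
            batch -> System.out.println("flushing " + batch.size() + " entries to the store"));
        ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleWithFixedDelay(cache::flush, 50, 50, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 10_000; i++) {
            cache.put("key-" + (i % 100), i);
        }
        flusher.shutdown();
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        cache.flush(); // write out anything still pending
    }
}
```

Note the trade-off this makes explicit: updates that are still pending are lost if the process dies before the next flush, which is exactly the "store lags behind the cache" condition you have to be able to afford.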