Having spoken with many customers evaluating
I am noticing that a majority of folks evaluating in-memory computing, whether it be data grid, map reduce, or streaming, do not know how to appropriately perform benchmarking. The right approach to distributed in-memory benchmarking is very different than benchmarking disk-based products, like databases, and generally requires experience and understanding of the delicate details of how network and garbage collections behave under load. With that in mind,
will soon be releasing a benchmarking framework to help easily overcome all these newbie issues, but until then here is a laundry list of things to watch out for.
1. Did you allocate enough memory?
If you have not allocated enough memory, the performance of your application may significantly degrade. From a user's stand point it may not really be noticeable until your application gradually becomes slower and slower, which is the first sign that you may be running out of memory.
Also keep in mind that if you allocate more than 10GB of memory per JVM, your application may start suffering from prolonged garbage collections. To mitigate that, check if the product you are evaluating has support for
which cheats Java garbage collection by utilizing off-heap memory space.
When running benchmarks, you should constantly monitor memory consumption. You can do it with any light weight profiler, like VisualVm which is shipped by default with JDK.
2. Are your tests multi-threaded?
Benchmarking should not be performed from a single thread or a small number of threads, as generally such tests will not load the system. This is especially true for synchronous API calls, as the next API call has to wait for the previous one to complete. It is generally a good practice to run benchmarks from about 60 to 100 threads (on boxes with more than 16 cores, the number of threads can be higher).
When running benchmarks, you should monitor the CPU load on your system and add more threads if system is below 90% load.
3. Do you perform initial warm-up?
JVM HotSpot compiler will usually inline and precompile portions of the code that get executed the most. However, usually in load tests, this warm-up process takes about 20 or 30 seconds. It is generally a good practice to allow benchmarks to warm up for about 30 seconds before you start measuring performance.
When running benchmarks, you should do periodic print outs of your throughput (number of operations per second). The warm-up is usually finished when the throughput numbers stabilize.
4. Did you tune your garbage collection?
If you are seeing spikes in your throughput, it may be due to JVM Garbage Collection (GC). In this case it is best to use concurrent sweep GC which has proven to provide fairly smooth throughput without any large spikes or long pauses.
Here is a good link, which describes several GC tuning parameters: Tune Garbage Collection.
5. Do you use bulk operations?
Often many developers will sequentially execute multiple single cache
'put(key, value)' operations and then notice that performance is not ideal. The main reason is that whenever network I/O is involved, you should always strive to send as few messages as possible. To achieve this, a majority of data grid or caching products provide support for bulk operations, such as
'putAll(...)'. Bulk updates can often improve performance by 100x magnitude.
When using bulk updates, make sure to experiment with batch sizes - making them too big or too small can hurt performance. Also see if the product you are evaluating has already automated bulk-updates for you.