Try Optimising the Memory Consumption First

You would think that if you wanted your application to go faster you would start with CPU profiling. However, when looking for quick wins, it's the memory profiler I target first.

Allocating memory is cheap

Allocating memory has never been cheaper. Memory itself is cheaper: you can get machines with thousands of GBs of memory, and you can buy 16 GB for less than $200.
The memory allocation operation is also cheaper than in the past, and it's multi-threaded so it scales reasonably well.
However, memory allocation is not free. Your CPU cache is a precious resource, especially if you are trying to use multiple threads. While you can buy 16 GB of main memory easily, you might only have 2 MB of cache per logical CPU. If you want these CPUs to run independently, you want to spend as much time as possible within the 256 KB L2 cache.
Cache level | Size | Access time in clock cycles | Concurrency
L1 | 32 KB data, 32 KB instruction | (not shown) | cores independent
L2 | 256 KB | (not shown) | cores independent
L3 | 3 MB - 32 MB | (not shown) | sockets independent
Main memory | 4 MB - 4 TB | (not shown) | each memory region separate
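To get a feel for why the cache sizes above matter, here is a rough sketch (not a rigorous benchmark) that chases pointers through buffers of different sizes and reports the average time per access. The class name `WorkingSetProbe`, the buffer sizes and the iteration counts are my own choices for illustration, and the figures you see will depend on your CPU and JVM.

```java
import java.util.Random;

/** Rough probe of memory access cost versus working-set size. */
final class WorkingSetProbe {
    static int sink; // written at the end so the chase loop is not optimised away

    /** Builds a random single-cycle permutation (Sattolo's algorithm). */
    static int[] randomCycle(int length, long seed) {
        int[] next = new int[length];
        for (int i = 0; i < length; i++) next[i] = i;
        Random random = new Random(seed);
        for (int i = length - 1; i > 0; i--) {
            int j = random.nextInt(i); // 0 <= j < i guarantees one big cycle
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }

    /** Follows the cycle; each load depends on the previous one, which defeats prefetching. */
    static double nanosPerAccess(int[] next, int accesses) {
        int index = 0;
        long start = System.nanoTime();
        for (int i = 0; i < accesses; i++) index = next[index];
        long elapsed = System.nanoTime() - start;
        sink = index;
        return (double) elapsed / accesses;
    }

    public static void main(String[] args) {
        for (int kb : new int[] {16, 128, 2_048, 65_536}) {
            int[] next = randomCycle(kb * 1024 / Integer.BYTES, 42);
            nanosPerAccess(next, 5_000_000); // warm-up so the loop is JIT-compiled
            System.out.printf("%,7d KB working set: %.2f ns per access%n",
                    kb, nanosPerAccess(next, 20_000_000));
        }
    }
}
```

On most machines the time per access should climb as the working set outgrows each cache level, which is the effect you are trying to avoid by keeping your hot data small.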

Allocating memory is not linear

Allocating memory on the heap is not linear. The CPU is very good at doing things in parallel. This means that if memory bandwidth is not your main bottleneck, the rate at which you produce garbage has less impact than whatever your bottleneck is. However, if the allocation rate is high enough (and in most Java systems it is high), it will be a serious bottleneck.
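If you want to see how much garbage a particular piece of code produces without attaching a profiler, one option is the per-thread allocation counter. This is a minimal sketch which assumes a HotSpot-based JVM, as `com.sun.management.ThreadMXBean` is a JDK-specific extension rather than part of the standard API. The class `AllocationCounter` and the two sample methods are hypothetical names used only for illustration.

```java
import java.lang.management.ManagementFactory;

/** Counts the bytes the current thread allocates while running a task. */
final class AllocationCounter {
    // HotSpot-specific extension of the standard ThreadMXBean (an assumption about your JVM).
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static long sink; // keeps the results alive so the loops are not removed

    /** Returns the bytes allocated by the calling thread during task.run(). */
    static long allocatedBytes(Runnable task) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        task.run();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }

    static void boxedSum() {
        Long sum = 0L;
        for (int i = 0; i < 1_000_000; i++) sum += i; // each += may allocate a new Long
        sink = sum;
    }

    static void primitiveSum() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i; // no objects created
        sink = sum;
    }

    public static void main(String[] args) {
        System.out.println("boxed:     " + allocatedBytes(AllocationCounter::boxedSum) + " bytes");
        System.out.println("primitive: " + allocatedBytes(AllocationCounter::primitiveSum) + " bytes");
    }
}
```

The exact numbers vary between runs and JVM versions, since the JIT can sometimes remove boxing altogether, but the boxed version will typically report far more allocation than the primitive one.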
You can tell that the allocation rate is a bottleneck if:
  • You are close to the maximum allocation rate of the machine. Write a small test which creates lots of garbage and measure the allocation rate. If you are close to this, you have a problem.
  • When you reduce the garbage produced by, say, 10%, the 99% latency of the application improves by 10%, and yet the allocation rate hardly drops. This means your application sped up until it reached your bottleneck again.
  • You have very long pause times, e.g. into the seconds. At this point, your memory consumption has a very high impact on your performance, and reducing the memory consumption and allocation rate can improve scalability (how many requests you can process concurrently) as well as reduce your worst case jitter.
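The "small test which creates lots of garbage" from the first point could look something like this. It is a single-threaded sketch with an arbitrary array size and duration, both my own choices. The class name `AllocationRateProbe` is hypothetical, and a real measurement would run one copy per thread you intend to use.

```java
import java.util.concurrent.TimeUnit;

/** Rough allocation-rate probe: creates garbage on one thread and reports the rate. */
final class AllocationRateProbe {
    // Storing into a static field makes each array escape, so the JIT should not remove the allocation.
    static byte[] sink;

    /** Allocates fixed-size arrays for roughly durationMillis and returns payload bytes per second. */
    static double measureBytesPerSecond(int arraySize, long durationMillis) {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long allocated = 0;
        while (System.nanoTime() < deadline) {
            for (int i = 0; i < 1_000; i++) {
                sink = new byte[arraySize];
                allocated += arraySize; // payload only; object headers are not counted
            }
        }
        long elapsed = System.nanoTime() - start;
        return allocated * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        measureBytesPerSecond(1024, 1_000); // warm-up so the loop is JIT-compiled
        double rate = measureBytesPerSecond(1024, 3_000);
        System.out.printf("~%.0f MB/s allocated on one thread%n", rate / (1 << 20));
    }
}
```

Compare the figure this prints with the allocation rate your profiler or GC log reports for the real application. If the two are in the same ballpark, allocation is likely to be your bottleneck.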

Is there a way to see CPU and memory at the same time?

After reducing the allocation rate, I look at the CPU consumption with memory tracing turned on. This gives more weight to the memory allocations and will give you a different view from looking at CPU alone.
Only when this CPU & memory view looks clean, or at least has no quick wins, do I look at CPU profiling alone.
Using these techniques as a starting point, my aim is typically to reduce the 99%tile latency (the worst 1%) by a factor of 10. However, this approach can also increase the throughput of each thread, as well as allow you to run more threads concurrently in an efficient manner.
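To make the 99%tile figure concrete, here is a minimal sketch of computing it from recorded latencies using the nearest-rank method. The class name `LatencyPercentile` and the sample data are made up for illustration. For real measurements you would more likely use a histogram library than sort raw samples.

```java
import java.util.Arrays;

/** Nearest-rank percentile over a set of latency samples. */
final class LatencyPercentile {
    /** Returns the smallest sample such that at least pct% of all samples are <= it. */
    static long percentile(long[] samples, double pct) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMicros = new long[1_000];
        Arrays.fill(latenciesMicros, 100);                        // typical request: 100 us
        for (int i = 0; i < 20; i++) latenciesMicros[i] = 5_000;  // 2% slow, e.g. during a GC pause
        System.out.println("median = " + percentile(latenciesMicros, 50) + " us");
        System.out.println("99%ile = " + percentile(latenciesMicros, 99) + " us");
    }
}
```

With this sample data the median is 100 us while the 99%tile is 5000 us, which is why looking at averages or medians alone can hide the pauses you are trying to remove.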

For more information

The profiler I use is YourKit, the IDE I use is IntelliJ, and an excellent tool for visualising your allocation rate and GC timings is Censum.
We offer Advanced Java Training with hands-on exercises for individuals, as well as Corporate Training, which can be tailored and is more cost effective per person.


Published at DZone with permission of
