Speed Up Your Computations

Learn to speed up your computations and boost performance in your application by changing the structure of the Transaction entity.

Sometimes you find yourself trying to squeeze the last few milliseconds out of your CPU. Or maybe you hate the smell of a melting CPU. Or maybe you are just a performance maniac. If any of these apply to you, keep reading this post.

So let us say you are designing a banking application in which you have a Transaction entity, and let us say that the entity contains these simple fields:

class Transaction {
    String uuid;
    double amount;
    String debitAccountNo;
    String creditAccountNo;
    int failureAttempts;
}

And then you have an account with an array of transactions.

class Account {
    Transaction[] transactions;
}

Now you are tasked with calculating the total amount of those transactions. A simple solution would be:

double sumAmount() {
    double result = 0.0;
    for (int i = 0; i < transactions.length; i++) {
        result += transactions[i].amount;
    }
    return result;
}

(Do not attack me. I know this is not object oriented.)

You test it and it works. The next day, your colleagues start to complain about performance, and given there are millions of transactions, the profiler points to your method.

What are you going to do now? Hmm, let me see... yeah, I got it: parallelism to the rescue. You take this great idea to your colleagues, and they turn you down because the server is already saturated with active threads. Now what? Well, you just need to make better use of your cycles (pun intended).

Let’s change the structure of the Transaction entity.

class Transactions {
    String[] uuid;
    double[] amount;
    String[] debitAccountNo;
    String[] creditAccountNo;
    int[] failureAttempts;
    int length;
}

Now the Account will hold this new Transactions object instead of the Transaction[].

Then the calculation becomes:

double sumAmount() {
    double result = 0.0;
    for (int i = 0; i < transactions.length; i++) {
        result += transactions.amount[i];
    }
    return result;
}

On my rusty machine, after warming up the JVM for a million rounds, this implementation is around 6X faster than the previous one (test the working code for yourselves on GitHub).
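The benchmark itself lives on GitHub, but a minimal sketch of the comparison might look like the following. The class and method names here are my own, and the crude `System.nanoTime()` timing is only indicative; for trustworthy numbers, use a proper harness such as JMH.

```java
import java.util.Random;

// A sketch comparing the array-of-structs (AoS) layout with the
// struct-of-arrays (SoA) layout for summing transaction amounts.
public class LayoutDemo {
    static class Transaction {
        double amount;
        // other fields omitted for brevity
    }

    // AoS: follow a reference per element, load the whole object's line.
    static double sumAos(Transaction[] txs) {
        double result = 0.0;
        for (int i = 0; i < txs.length; i++) {
            result += txs[i].amount;
        }
        return result;
    }

    // SoA: walk a contiguous double[] -- every byte fetched is useful.
    static double sumSoa(double[] amount) {
        double result = 0.0;
        for (int i = 0; i < amount.length; i++) {
            result += amount[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Transaction[] aos = new Transaction[n];
        double[] soa = new double[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            double a = rnd.nextDouble();
            aos[i] = new Transaction();
            aos[i].amount = a;
            soa[i] = a;
        }
        // Same values, same summation order: the totals must agree.
        System.out.println(sumAos(aos) == sumSoa(soa));

        // Rough timing only; JIT warmup and GC noise are not controlled here.
        long t0 = System.nanoTime();
        double s1 = sumAos(aos);
        long t1 = System.nanoTime();
        double s2 = sumSoa(soa);
        long t2 = System.nanoTime();
        System.out.println("AoS ns: " + (t1 - t0) + ", SoA ns: " + (t2 - t1)
                + " (sums " + s1 + " / " + s2 + ")");
    }
}
```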

Now, why is this faster? It turns out that modern CPUs work best when the calculation is applied to data that is packed together. The CPU reads data from memory in cache lines (around 64 bytes on most CPUs) and stores them in the L1 cache. When the CPU wants to fetch data, it looks for the address first in the L1, then the L2, then the L3 cache. If the address is not cached, the CPU reads a whole cache line, starting from the nearest aligned address, and stores it in the L1 cache. (In our first example, this means the CPU loads all the attributes of a Transaction just to use the amount attribute.)
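To make that concrete, here is some back-of-envelope cache-line arithmetic. The 64-byte line and 8-byte double are the common case, not guarantees, and exact Java object sizes vary by JVM, so treat the numbers as illustrative:

```java
// Back-of-envelope cache-line math (assumes 64-byte lines and
// 8-byte doubles; actual Java object layouts vary by JVM).
public class CacheLineMath {
    public static void main(String[] args) {
        int lineBytes = 64;
        int doubleBytes = 8;

        // SoA: one cache line of the amount[] array carries 8 useful amounts.
        int amountsPerLine = lineBytes / doubleBytes;
        System.out.println("amounts per cache line (SoA): " + amountsPerLine);

        // AoS: each Transaction drags its other fields (object header,
        // references, int) into the line, so far fewer amounts arrive
        // per line fetched -- the rest of the bytes are wasted traffic.
    }
}
```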

The first example suffers badly from cache misses: the CPU reads a lot of data it doesn't need, throws it away, and then reads more data it won't need either.

One more advantage is that this approach fits naturally with SIMD operations, since the amounts now sit contiguously in memory.
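As an illustration (this is my sketch, not the article's code), a reduction over the contiguous amount[] array can use several independent accumulators. Removing the serial dependency between additions is exactly what gives the JIT compiler, or a hand-written vector version, room to apply wide SIMD instructions:

```java
// SIMD-friendly reduction sketch: four independent accumulators
// over a contiguous double[]. Note that this reassociates the
// floating-point additions, which can change the result slightly.
public class UnrolledSum {
    static double sum(double[] amount) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        for (; i + 3 < amount.length; i += 4) {
            s0 += amount[i];
            s1 += amount[i + 1];
            s2 += amount[i + 2];
            s3 += amount[i + 3];
        }
        double result = s0 + s1 + s2 + s3;
        for (; i < amount.length; i++) { // leftover tail elements
            result += amount[i];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sum(new double[]{1.0, 2.0, 3.0, 4.0, 5.0}));
    }
}
```

This trick only makes sense in the second layout; with an array of Transaction objects, there is no contiguous run of amounts for the wide instructions to operate on.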

To give an example, a simple read from the L1 cache on an Intel i7 5500 takes around 4 cycles, whereas a read from main memory takes around 120 cycles.

There you have it. That said, most of the time the overhead of the first layout is not worth restructuring your entities over, unless you work in a restricted environment (an embedded system, for example).

And as a disclaimer: you may not benefit from this approach in a large web app, because other scheduled threads will pollute the caches anyway; your best bet there is the large shared L3 cache (on Intel, at least).
