An Inconvenient Latency

Vendors typically publish the numbers they are happy with and avoid telling you about a product's weaknesses. However, there is a dirty secret behind those numbers, if you know where to look.

Why don't we use GPUs for everything?

Some problems naturally break into thousands of independent data points or tasks; for many others, finding that much parallelism is very hard.

GPUs are designed for computing large vector operations.  However, what is "large" and why does it matter?

Say you have a GPU with 1024 cores. This means it can process a vector of length 1024 all at once, and a vector of length 2048 in double the time. But what happens if you only have a vector of 100, or 10, or 1? The inconvenient answer is that it takes the same amount of time, because you can't make use of all of your cores at once. You get only 10%, 1% or just 0.1% of the peak performance. If you want the best efficiency, you want a problem with many thousands of values which can be processed concurrently. If you don't have a problem which is that concurrent, you are not going to get maximum performance.
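
As a rough illustration, here is a minimal sketch of that fall-off. It assumes a toy model in which each core processes one vector element per pass and idle cores do no useful work; the 1024-core GPU is the hypothetical one from the example above, not any particular device.

    // A toy model of GPU utilisation: one element per core per pass.
    public class GpuUtilisation {
        public static void main(String[] args) {
            final int cores = 1024; // the hypothetical GPU from the example above
            for (int length : new int[]{2048, 1024, 100, 10, 1}) {
                int passes = (length + cores - 1) / cores; // ceil(length / cores)
                double fractionOfPeak = (double) length / (passes * cores);
                System.out.printf("vector of %,5d -> %d pass(es), %5.1f%% of peak%n",
                        length, passes, fractionOfPeak * 100);
            }
        }
    }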

Unlike a screen full of pixels, a lot of business logic deals with a small number of values at a time, so you want a small number of fast cores which can perform independent tasks concurrently.

Why web servers favour horizontal scalability

Web services scale well up to one task per active user. To use multiple cores, you need a problem which naturally breaks into many independent tasks, and when you provide a web service, you have many users whose work is largely independent. If you have ten concurrent users, you can expect close to ten times the throughput of a single user; if you have one hundred users, you can get up to ten times the throughput of ten users, and so on.

The limit of your concurrency is around the number of active/concurrent users you have.
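
As a minimal sketch of this idea (the per-user task is made up for illustration), independent requests submitted to a thread pool keep more cores busy as the number of concurrent users grows, so throughput grows with them, up to the number of cores:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Each "user" is an independent task; more concurrent users keep more cores busy.
    public class IndependentUsers {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            for (int users : new int[]{1, 10, 100}) {
                long start = System.nanoTime();
                CountDownLatch done = new CountDownLatch(users);
                for (int i = 0; i < users; i++)
                    pool.execute(() -> { handleRequest(); done.countDown(); });
                done.await();
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("%3d concurrent users -> %,.0f requests/s%n",
                        users, users / seconds);
            }
            pool.shutdown();
        }

        // A stand-in for per-user work; it shares no state with other requests.
        static void handleRequest() {
            long sum = 0;
            for (int i = 0; i < 20_000_000; i++) sum += i;
            if (sum == 42) System.out.println("unlikely"); // defeat dead-code elimination
        }
    }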

Throughput, latency and concurrency

When a vendor benchmarks their product, a common measure is throughput: the total number of operations per second, sustained over a significant period of time, ideally many minutes.

Vendors are increasingly publishing average latency benchmarks.  Average latency is a fairly optimistic number as it is good at hiding small numbers of particularly bad latencies. For example, if you want to hide long GC pauses, use average latency.
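
To see how an average hides outliers, consider a sketch with made-up latencies: a handful of long, GC-like pauses barely move the average, while a percentile makes them obvious.

    import java.util.Arrays;

    // Made-up sample: 990 operations at 1 ms plus 10 GC-like pauses of 100 ms.
    public class AverageHidesPauses {
        public static void main(String[] args) {
            double[] latenciesMs = new double[1000];
            Arrays.fill(latenciesMs, 0, 990, 1.0);
            Arrays.fill(latenciesMs, 990, 1000, 100.0);
            double average = Arrays.stream(latenciesMs).average().orElse(0);
            Arrays.sort(latenciesMs);
            double p99 = latenciesMs[(int) (latenciesMs.length * 0.99)];
            // prints: average: 1.99 ms, 99th percentile: 100 ms
            System.out.printf("average: %.2f ms, 99th percentile: %.0f ms%n", average, p99);
        }
    }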

The problem for vendors is these two measures can be used in combination to determine the minimum concurrency required to achieve the throughput claimed.

A given problem has a "natural concurrency": the number of independent tasks it can easily be broken into. If you have 100 concurrent users, you may have a natural concurrency of about 100. There is also a "minimum concurrency" implied by a benchmark:

minimum-concurrency = throughput * latency.

To achieve the throughput a benchmark suggests, your natural concurrency should be greater than the minimum concurrency in the benchmark. If you have less natural concurrency, you can only expect pro-rata throughput.
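
As a back-of-the-envelope check, the formula is trivial to apply; a minimal sketch, plugging in the benchmark figures from the tables below:

    // minimum concurrency = throughput * latency
    public class MinimumConcurrency {
        static double minimumConcurrency(double opsPerSecond, double latencySeconds) {
            return opsPerSecond * latencySeconds; // tasks that must be in flight at once
        }

        public static void main(String[] args) {
            System.out.printf("Product 1: %,.0f%n", minimumConcurrency(1_000_000, 0.046)); // 46,000
            System.out.printf("Product 2: %,.0f%n", minimumConcurrency(  600_000, 0.010)); //  6,000
            System.out.printf("Product 3: %,.0f%n", minimumConcurrency(2_000_000, 2e-6));  //      4
        }
    }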

Consider these benchmarks for three key-value stores. 

Key-value store   Throughput    Latency
Product 1         1,000,000/s   46 ms    (0.046 s)
Product 2           600,000/s   10 ms    (0.010 s)
Product 3         2,000,000/s   0.002 ms (2e-6 s)

You might look at this table and say they all look good: they all have high throughputs and low enough latencies. The problem is that there is an implied natural concurrency requirement to achieve the stated throughput at the measured latency. Let's see the minimum concurrency needed to achieve these numbers.

Key-value store   Throughput    Latency             Minimum concurrency
Product 1         1,000,000/s   46 ms    (0.046 s)               46,000
Product 2           600,000/s   10 ms    (0.010 s)                6,000
Product 3         2,000,000/s   0.002 ms (2e-6 s)                     4

Many problems have around 4 independent tasks, but 46,000 is pretty high. So what? What happens if you only have 10 concurrent tasks/users?

Key-value store   Concurrency   Latency             Throughput achieved
Product 1         10            46 ms    (0.046 s)              220/s
Product 2         10            10 ms    (0.010 s)            1,000/s
Product 3         10            0.002 ms (2e-6 s)         2,000,000/s

Note: having more concurrency than you need doesn't help throughput much, but a lack of natural concurrency in your problem will hurt your throughput (and horizontal scalability).
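
The same arithmetic, run the other way, gives the figures in the last table; a sketch, assuming throughput is bounded both by the product's peak rate and by concurrency divided by latency (the ~220/s above is this figure, rounded):

    // achievable throughput = min(peak, concurrency / latency)
    public class AchievedThroughput {
        static double achieved(double peakPerSecond, double latencySeconds, int concurrency) {
            return Math.min(peakPerSecond, concurrency / latencySeconds);
        }

        public static void main(String[] args) {
            System.out.printf("Product 1: %,.0f/s%n", achieved(1_000_000, 0.046, 10)); //      ~217/s
            System.out.printf("Product 2: %,.0f/s%n", achieved(  600_000, 0.010, 10)); //     1,000/s
            System.out.printf("Product 3: %,.0f/s%n", achieved(2_000_000, 2e-6,  10)); // 2,000,000/s
        }
    }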


Next time you read a benchmark which includes throughput and average latency, multiply them together to see what level of concurrency would be required to achieve that throughput, and compare this with the natural concurrency of the problem you are trying to solve, to see if the solution fits your problem.

If you have high natural concurrency, you have more solutions to consider; if you have low natural concurrency, you need a solution with a low latency.

Published at DZone with permission of Peter Lawrey, DZone MVB.
