Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

High-Concurrency HTTP Clients on the JVM

DZone's Guide to

High-Concurrency HTTP Clients on the JVM

HTTP is a super popular app protocol with loads of libraries. Here's a look at high-concurrency HTTP clients on Java virtual machines.

· Performance Zone
Free Resource

HTTP is probably the most popular application-level protocol and there are many libraries that implement it on top of network I/O, which is a special (stream-oriented) case of general I/O. Since all I/O has a much in common, let’s start with some discussion about it.

I’ll concentrate on I/O cases with a lots of concurrent HTTP requests, for example micro-services, where a set of higher-level HTTP services invoke several lower-level ones, some concurrently and some sequentially due to data dependencies.

When serving many such requests the total number of concurrently open connections can become big at times; if there are data dependencies, or if the lower-level services are slow (or slowed down due to exceptional conditions). So microservice layers tend to require many concurrent, potentially long-lived connections. To see how many open connections we are required to support without crashing let’s recall Little’s Law with Ψ being the average in-progress requests count, ρ being the average arrival rate and τ being the average completion time:

Ψ = ρ τ

The number of in-progress requests we can support depends on the language runtime, the OS and the hardware; the average request completion time (or latency), depends on what we have to do in order to fulfill the requests, including of course the calls to any lower level services, access to storage etc.

How many concurrent HTTP requests can we support? Each will need an open connection and some runnable primitive that can read/write on it using syscalls. If the memory, I/O subsystem and network bandwidth can keep up, modern OSes can support hundreds of thousands open TCP connections; the runnable primitives they provide to work on sockets are threads. Threads are much more heavyweight than sockets: a single box running a modern OS can only support 5000-15000 of them.

From 10,000 Feet: I/O Performance on the JVM

Nowadays JDK threads are OS threads on most platforms but if at any time there are only few concurrent connections then the “thread-per-connection” model is perfectly fine.

What if not? The answer to this question has changed along history:

  • JDK pre-1.4 only had libraries calling into the OS’ thread-blocking I/O (java.io pkgs), so only the “thread-per-connection” model or thread-pools could be used. If you wanted something better you’d tap into your OS’ additional features through JNI.
  • JDK 1.4 added non-blocking I/O or NIO (java.nio packages) to read/write from connections only if it can be done immediately, without putting the thread to sleep. Even more importantly it added a way for a single thread to work effectively on many channels with socket selection, which means asking the OS to block the current thread and unblock it when it is possible to receive/send data immediately from at least one socket of a set.
  • JDK 1.7 added NIO.2, also known as asynchronous I/O (still java.nio packages). This means asking the OS to perform I/O tasks completely in the background and wake up a thread with a notification later on, only when the I/O has finished.

Calling HTTP From the JVM Either Easily or Efficiently: the Thread-blocking and Async Toolboxes

There’s a wide selection of open-source HTTP client libraries available for the JVM. The thread-blocking APIs are easy to use and to maintain but potentially less efficient with many concurrent requests, while the async ones are efficient but harder to use. Asynchronous APIs also virally affect your code with asynchrony: any method consuming asynchronous data must be asynchronous itself, or block and nullify the advantages of asynchrony.

Here’s a selection of open-source HTTP clients for Java and Clojure:

  • JDK’s URLConnection uses traditional thread-blocking I/O.
  • Apache HTTP Client uses traditional thread-blocking I/O with thread-pools.
  • Apache Async HTTP Client uses NIO.
  • Jersey is a ReST client/server framework; the client API can use several HTTP client backends including URLConnection and Apache HTTP Client.
  • OkHttp uses traditional thread-blocking I/O with thread-pools.
  • Retrofit turns your HTTP API into a Java interface and can use several HTTP client backends including Apache HTTP Client.
  • Grizzly is network framework with low-level HTTP support; it was using NIO but it switched to AIO .
  • Netty is a network framework with HTTP support (low-level), multi-transport, includes NIO and native (the latter uses epoll on Linux).
  • Jetty Async HTTP Client uses NIO.
  • Async HTTP Client wraps either Netty, Grizzly or JDK’s HTTP support.
  • clj-http wraps the Apache HTTP Client.
  • http-kit is an async subset of clj-http implemented partially in Java directly on top of NIO.
  • http async client wraps the Async HTTP Client for Java.

From 10,000 Feet: Making it Easy

Since Java threads are heavy on resources, if we want to perform I/O and scale to many concurrent connections we have to use either NIO or async NIO; on the other hand they are much more difficult to code and maintain. Is there a solution to this dilemma?

If threads weren’t heavy we could just use straightforward blocking I/O, so our question really is: can we have cheap enough threads that could be created in much larger numbers than OS threads?

At present the JVM itself doesn’t provide lightweight threads but Quasar comes to the rescue with fibers, which are very efficient threads, implemented in userspace.

Calling HTTP From the JVM Both Easily and Efficiently: the Comsat Fiber-blocking Toolbox

Comsat integrates some of the existing libraries with Quasar fibers. The Comsat APIs are identical to the original ones and the HTTP clients section) explains how to hook them in; for the rest simply ensure you’re running Quasar properly, fire up your fibers when you need to perform a new HTTP call and use one (or more) of following fiber-blocking APIs (or take inspiration from templates and examples:

  • Java:
    • An extensive subset of the Apache HTTP Client API, integrated by bridging the async one. Apache HTTP Client is mature, efficient, feature-complete and very widely used.
    • The fiber-blocking Retrofit API wraps the Apache client. Retrofit is a modern and high-level HTTP client toolkit that has been drawing a lot of interest also for ReST.
    • The JAXRS synchronous HTTP client API, integrated by bridging Jersey’s async one. Jersey is a very popular JAXRS-compliant framework for ReST, so several micro-services could decide to use both its server and client APIs.
    • The OkHttp synchronous API, integrated by bridging the OkHttp async API. OkHttp performs very well, is cheap on resources and feature-rich yet at the same time it has a very straightforward API for common cases, plus it supports HTTP2 and SPDY as well.
  • Clojure:
    • An extensive subset of the clj-http API, integrated by bridging the async API of http-kit. clj-http is probably the most popular HTTP client API in the Clojure ecosystem.

New integrations can be added easily and of course contributions are always welcome.

Some Load Tests with JBender

jbender is Pinterest’s Quasar-based network load testing framework. It’s efficient and flexible but thanks to Quasar fiber-blocking its source code is tiny and readable; using it is just as straightforward as using traditional thread-blocking I/O.

Consider this project, which builds on JBender and with a tiny amount of code implements HTTP load test clients for all the Comsat-integrated libraries, both in their original thread-blocking version and in Comsat’s fiber-blocking one.

JBender can use any either (plain, heavyweight, OS) threads or fibers to perform requests, both are abstracted by Quasar to a shared abstract class called Strand, so the thread-blocking and fiber-blocking versions share HTTP code: this proves that the Comsat-integrated APIs are exactly the same as the original ones and that fibers and threads are used exactly in the same way.

The load-test clients accept parameters to customize pretty much every aspect of their run but the test cases we’ll consider are the following:

  1. 41000 long-lived HTTP connections fired at the highest possible rate.
  2. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 1 second each with a target rate of 1000 rps.
  3. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 100 milliseconds each with a target rate of 10000 rps.
  4. Executing 10000 requests (plus 1000 of initial client and server warmup) with an immediate reply and a target rate of 100000 rps.

All of the tests have been fired against a server running Dropwizard, optimized to employ fibers on the HTTP server-side with comsat-dropwizard for maximum concurrency. The server simply replies to any request with “Hello!”

Here’s some information about our load test environment:

Parallel Universe Stack Quasar 0.7.4-SNAPSHOT, Comsat 0.5.0
Fiber Server (comsat-dropwizard 0.5.0, Jetty 9.2.9) AWSEC2 Linux m4.xlarge (16 GB, 4 vcpus, high net perf)
Client (GET “/” -> 204 “Hello”) AWSEC2 Linux t2.medium (4 GB, 2 vcpus, moderate-to-low net perf)
OS Settings https://github.com/circlespainter/jbender
JBender load test suite (server + clients) https://github.com/circlespainter/comsat-http-client-bench
CPU/RAM Monitoring method JFR
CPU/RAM sampling interval JFR’s default
JVM Oracle 1.8.0_b66
JVM Settings -XX:+AggressiveOpts
HTTP Client settings No retries, maximum-sized connection pool, I/O threads (only async) = <cpus> (= 2 for m4.medium), connect/read/write/ttl timeout = 1h
Warmup 1000 reqs, both server and client
Request generator buffer = # reqs with pre-generation for throughput tests, = 1 for concurrency tests
Request completion events buffer = # reqs with throughput tests, = 1 for concurrency tests

The first important result is that the Comsat-based clients win hands-down, each compared to its respective non-fiber mode. Apache’s for many long-lasting connections and OkHttp’s for lots of short-lived requests with a very high target rate, both with a small and a bigger heap (resp. 990 MiB and 3 GiB, showing just the first one for brevity):

HTTP Client Load Test (colored is best) Regular (thread-blocking) Apache (4.4.1) Comsat (fiber-blocking) Apache (async 4.1) Regular (thread-blocking) OkHttp 2.4.0 Comsat (fiber-blocking) OkHttp 2.4.0 Regular (thread-blocking) Jersey 2.19 w/JDK connector Comsat (fiber-blocking) Jersey 2.19 w/JDK connector
AHC blocking (BIO) AHC async (NIO) + Quasar fibers OkHttp blocking (BIO) OkHttp async (BIO) + Quasar fibers Jersey blocking Jersey async + Quasar fibers
Long-lived concurrent 41k (maximum rate possible) Max 16715 41k 16358 16608 16713 16713
Error OOM - thread - OOM - thread OOM - thread OOM - thread OOM - thread
Time (s) 8.8 8.8 16.6 16.7 16.5 20.2
Heap max (MiB) N/A 702 N/A N/A N/A N/A
Heap avg (MiB) 246
Threads max 16
Throughput with target rate 1k (response after 1s) Time max (ms) 7139 1138 10209 1301 6341 4370
Time avg (ms) 2359 1002 3031 1008 1902 1477
Heap max (MiB) 227 110 125 119 330 342
Heap avg (MiB) 61 29.8 34.9 30 76.8 73.7
Threads max 4000+ 15 4000+ 1900+ 4300+ 2600+
Throughput with target rate 10k (response after 100ms) Time max (ms) 4898 4085 7939 7079 45198 14512
Time avg (ms) 2479 2717 3423 2125 25885 7594
Heap max (MiB) 338 192 179 165 495 489
Heap avg (MiB) 91.3 67.2 40.9 38.5 147 155
Threads max 7500+ 16 4900+ 3900+ 11000+ 6900+
Throughput with target rate 100k (immediate response) Time max (ms) 6937 3590 4668 1793 9303 9840
Time avg (ms) 1468 1821 1287 826 1659 3442
Heap max (MiB) 226 188 130 113 354 398
Heap avg (MiB) 62.2 66 36.1 33.2 79.5 122
Threads max 3500+ 16 2600+ 2000+ 4000+ 4000+
Notes OkHttp doesn’t use NIO but regular blocking I/O under the hood. Jersey uses one thread per connection even in the async case

OkHttp excels in speed and memory utilization for fast requests. The fiber version for the JVM uses the async API and performs significantly better even though the underlying mechanism is traditional blocking I/O served by a thread pool.

Even more impressive is the measure by which the http-kit-based fiber-blocking comsat-httpkit wins against a traditional clj-http client (still showing just with the small heap):

HTTP Client Load Test (colored is best) clj-http comsat-httpkit
AHC blocking (BIO) http-kit async (NIO) + Quasar fibers
Long-lived concurrent 41k (maximum rate possible) Max 15715 41k
Error OOM - thread -
Time (s) 5
Heap max (MiB) N/A 511
Heap avg (MiB) 127
Threads max 14
Throughput with target rate 1k (response after 1s) Time max (ms) 19059 1102
Time avg (ms) 8720 1003
Heap max (MiB) 405 331
Heap avg (MiB) 94.4 64.3
Threads max 9000+ 16
Throughput with target rate 10k (response after 100ms) Time max (ms) 22045 5545
Time avg (ms) 8960 4102
Heap max (MiB) 406 250
Heap avg (MiB) 117 52.1
Threads max 7000+ 15
Throughput with target rate 100k (immediate response) Time max (ms) 42849 3438
Time avg (ms) 34750 4698
Heap max (MiB) 523 259
Heap avg (MiB) 481 50.2
Threads max 11000+ 16

There are other Jersey providers as well (Grizzly, Jetty and Apache) but Jersey proved the worst of the bunch with a generally higher footprint and an async interface (used by Comsat’s fiber-blocking integration) that unfortunately spawns and blocks a thread for each and every request; for this reason (and probably also due to each provider’s implementation strategy) the fiber version sometimes provides clear performance benefits and sometimes doesn’t. Anyway these numbers are not as interesting as the Apache, OkHttp and http-kit ones so I’m not including them here, but let me know if you’d like to see them.

(Optional) From 100 < 10,000 Feet: More About I/O Performance on the JVM

So you want to know why fibers are better than threads in highly concurrent scenarios.

When only few concurrent sockets are open, the OS kernel can wake up blocked threads with very low latency. But OS threads are general purpose and they add considerable overhead for many use cases: they consume a lot of kernel memory for bookkeeping, synchronization syscalls can be orders of magnitude slower than procedure calls, context switching is expensive, and the scheduling algorithm is too generalist. All of this means that at present OS threads are just not the best choice for fine-grained concurrency with significant communication and synchronization, nor for highly concurrent systems in general .

Blocking I/O syscalls can indeed block expensive OS threads indefinitely, so a “thread-per-connection” approach will tear your system down very fast when you’re serving lots of concurrent connections; on the other hand using a thread-pool will probably make the “accepted” connection queue overflow because we can’t keep the arrival pace or cause unacceptable latencies at the very least. A “fiber-per-connection” approach instead is perfectly sustainable because fibers are so lightweight.

Summing it up: threads can be better at latency with few concurrent connections and fibers are better at throughput with many concurrent connections.

Of course fibers need to run on top of active OS threads because the OS knows nothing about fibers, so fibers are scheduled on a thread pool by Quasar. Quasar is just a library and runs entirely in user-space, which means that a fiber performing a syscall will block its underlying JVM thread for the entire call duration, making it unavailable to other fibers. That’s why it’s important that such calls are as short as possible and especially they shouldn’t wait for long time or, even worse, indefinitely: in practice fibers should only perform non-blocking syscalls. So how can we make blocking HTTP clients run so well on fibers? As those libraries provide a non-blocking (but inconvenient) API as well, we convert that async APIs to a fiber-blocking ones and use it to implement the original blocking API. The new implementation (which is very short and is little more than a wrapper) will:

  1. Block the current fiber.
  2. Start an equivalent asynchronous operation, and pass in a completion handler that will unblock the fiber when finished.

From the fiber’s (and programmer’s) perspective the execution will restart after the library call when I/O completes, just like when using a thread and a regular thread-blocking call.

Wrap-up

With Quasar and Comsat you can easily write and maintain highly concurrent and HTTP-intensive code in Java, Clojure or Kotlin and you can even choose your favorite HTTP client library, without any API lock-ins. Do you want to use something else? Let us know, or integrate it with Quasar yourself.

  1. …and much not in common, for example file I/O (which is block-oriented) supports memory-mapped I/O which doesn’t make sense with stream-oriented I/O.
  2. Read this blog post for further discussion.
  3. Not so before 1.2, when it had (only) Green Threads.
  4. Using thread-pools means dedicating a limited or anyway managed amount (or pool) of threads to fulfill a certain type of tasks, in this case serving HTTP requests: incoming connections are queued until a thread in the pool is free to serve it (as an aside, “connection pooling” is something entirely different and it’s most often about reusing DB connections).
  5. Have a look at this intro for more information.
  6. Read for example this, this and this for more information and benchmarks as well as this guest post on ZeroTurnaround RebelLabs’s blog if you want more insight about why and how fibers are implemented.
Topics:
java ,performance and scalability

Published at DZone with permission of Fabio Tudone, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}