High-Concurrency HTTP Clients on the JVM

HTTP is a super popular app protocol with loads of libraries. Here's a look at high-concurrency HTTP clients on Java virtual machines.

Fabio Tudone

Dec. 08, 15 · Analysis

HTTP is probably the most popular application-level protocol, and there are many libraries that implement it on top of network I/O, which is a special (stream-oriented) case of general I/O. Since all I/O has much in common, let’s start with some discussion about it.

I’ll concentrate on I/O cases with lots of concurrent HTTP requests, for example microservices, where a set of higher-level HTTP services invoke several lower-level ones, some concurrently and some sequentially due to data dependencies.

When serving many such requests, the total number of concurrently open connections can become large at times, especially if there are data dependencies or if the lower-level services are slow (or slowed down by exceptional conditions). So microservice layers tend to require many concurrent, potentially long-lived connections. To see how many open connections we need to support without crashing, let’s recall Little’s Law, with Ψ being the average number of in-progress requests, ρ the average arrival rate and τ the average completion time:

Ψ = ρ τ

The number of in-progress requests we can support depends on the language runtime, the OS and the hardware; the average request completion time (or latency) depends on what we have to do in order to fulfill the requests, including of course the calls to any lower-level services, access to storage, etc.
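As a quick worked example of Little’s Law, the sketch below computes Ψ = ρ τ with ρ in requests per second and τ in milliseconds. The class name and the rates and latencies are illustrative assumptions, not measurements; integer milliseconds keep the arithmetic exact.

```java
/** Little's Law: average number of in-progress requests, Psi = rho * tau. */
final class LittlesLaw {
    private LittlesLaw() {}

    /**
     * @param arrivalRatePerSecond rho, the average arrival rate (requests/second)
     * @param avgCompletionMillis  tau, the average completion time (milliseconds)
     * @return Psi, the average number of in-progress requests (truncated to a whole number)
     */
    static long inProgress(long arrivalRatePerSecond, long avgCompletionMillis) {
        return arrivalRatePerSecond * avgCompletionMillis / 1_000;
    }

    public static void main(String[] args) {
        System.out.println(inProgress(1_000, 1_000));  // 1,000 req/s, 1 s each      -> 1000
        System.out.println(inProgress(10_000, 100));   // 10,000 req/s, 100 ms each  -> 1000
        System.out.println(inProgress(1_000, 5_000));  // downstream slows to 5 s    -> 5000
    }
}
```

The last line mirrors the microservice scenario above: at the same arrival rate, a slower lower-level service multiplies the number of connections that must stay open at once.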

How many concurrent HTTP requests can we support? Each one needs an open connection and some runnable primitive that can read from and write to it using syscalls. If the memory, I/O subsystem and network bandwidth can keep up, modern OSes can support hundreds of thousands of open TCP connections; the runnable primitives they provide to work on sockets are threads. Threads are much more heavyweight than sockets: a single box running a modern OS can only support 5,000 to 15,000 of them.

From 10,000 Feet: I/O Performance on the JVM

Nowadays JDK threads are OS threads on most platforms, but if there are only a few concurrent connections at any time, the “thread-per-connection” model is perfectly fine.

What if not? The answer to this question has changed over time:

Before JDK 1.4, there were only libraries calling into the OS’ thread-blocking I/O (java.io packages), so only the “thread-per-connection” model or thread pools could be used. If you wanted something better, you’d tap into your OS’ additional features through JNI.
JDK 1.4 added non-blocking I/O, or NIO (java.nio packages), to read from or write to connections only when it can be done immediately, without putting the thread to sleep. Even more importantly, it added a way for a single thread to work effectively on many channels through socket selection, which means asking the OS to block the current thread and unblock it when it is possible to receive/send data immediately on at least one socket of a set.
JDK 1.7 added NIO.2, also known as asynchronous I/O (still java.nio packages). This means asking the OS to perform I/O tasks completely in the background and wake up a thread with a notification later on, only when the I/O has finished.
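To illustrate the JDK 1.4 selection model, here is a minimal sketch in which one thread reads from several channels through a single java.nio.channels.Selector. To keep it self-contained it uses java.nio.channels.Pipe instead of sockets (an assumption made only for this example); a non-blocking SocketChannel is registered and selected in the same way. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SelectorSketch {
    private SelectorSketch() {}

    /** Reads one small message from each of n pipes, using a single thread and one Selector. */
    static List<String> readAll(int n) throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < n; i++) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false);  // must be non-blocking to register
                pipe.source().register(selector, SelectionKey.OP_READ);
                // The sink stays blocking; a few bytes fit in the pipe buffer immediately.
                pipe.sink().write(ByteBuffer.wrap(("msg-" + i).getBytes(StandardCharsets.UTF_8)));
            }
            ByteBuffer buf = ByteBuffer.allocate(64);
            while (received.size() < n) {
                selector.select();  // the thread sleeps until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();  // selected keys must be removed by hand
                    buf.clear();
                    if (((Pipe.SourceChannel) key.channel()).read(buf) > 0) {
                        buf.flip();
                        received.add(StandardCharsets.UTF_8.decode(buf).toString());
                        key.cancel();  // one message per pipe in this sketch
                    }
                }
            }
        } finally {
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        }
        return received;
    }
}
```

Messages may come back in any order, since the selector reports whichever channels are ready.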

Calling HTTP From the JVM Either Easily or Efficiently: the Thread-blocking and Async Toolboxes

There’s a wide selection of open-source HTTP client libraries available for the JVM. The thread-blocking APIs are easy to use and to maintain but potentially less efficient with many concurrent requests, while the async ones are efficient but harder to use. Asynchronous APIs also virally affect your code with asynchrony: any method consuming asynchronous data must be asynchronous itself, or block and nullify the advantages of asynchrony.
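The “viral” effect of asynchrony can be sketched with the JDK’s CompletableFuture. Here fetchGreeting is a hypothetical stand-in for an async HTTP call, not any real client library’s API: a caller can either stay asynchronous, and hand its own callers a future too, or block and give up the benefit.

```java
import java.util.concurrent.CompletableFuture;

final class ViralAsync {
    private ViralAsync() {}

    // Hypothetical async "HTTP call": returns at once; the body arrives later on another thread.
    static CompletableFuture<String> fetchGreeting() {
        return CompletableFuture.supplyAsync(() -> "Hello!");
    }

    // Option 1: stay asynchronous. The result type is itself a future,
    // so every caller up the chain has to be asynchronous too.
    static CompletableFuture<Integer> greetingLengthAsync() {
        return fetchGreeting().thenApply(String::length);
    }

    // Option 2: block the calling thread until the value arrives.
    // The signature is simple, but the thread is tied up, which async I/O was meant to avoid.
    static int greetingLengthBlocking() {
        return fetchGreeting().join().length();
    }
}
```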

Here’s a selection of open-source HTTP clients for Java and Clojure:

JDK’s URLConnection uses traditional thread-blocking I/O.
Apache HTTP Client uses traditional thread-blocking I/O with thread-pools.
Apache Async HTTP Client uses NIO.
Jersey is a ReST client/server framework; the client API can use several HTTP client backends including URLConnection and Apache HTTP Client.
OkHttp uses traditional thread-blocking I/O with thread-pools.
Retrofit turns your HTTP API into a Java interface and can use several HTTP client backends including Apache HTTP Client.
Grizzly is a network framework with low-level HTTP support; it used NIO but switched to AIO.
Netty is a multi-transport network framework with low-level HTTP support; its transports include NIO and a native one (the latter uses epoll on Linux).
Jetty Async HTTP Client uses NIO.
Async HTTP Client wraps either Netty, Grizzly or JDK’s HTTP support.
clj-http wraps the Apache HTTP Client.
http-kit is an async subset of clj-http implemented partially in Java directly on top of NIO.
http async client wraps the Async HTTP Client for Java.

From 10,000 Feet: Making it Easy

Since Java threads are heavy on resources, if we want to perform I/O and scale to many concurrent connections we have to use either NIO or async NIO; on the other hand, these are much more difficult to code and maintain. Is there a solution to this dilemma?

If threads weren’t heavy we could just use straightforward blocking I/O, so our question really is: can we have cheap enough threads that could be created in much larger numbers than OS threads?

At present the JVM itself doesn’t provide lightweight threads but Quasar comes to the rescue with fibers, which are very efficient threads, implemented in userspace.

Calling HTTP From the JVM Both Easily and Efficiently: the Comsat Fiber-blocking Toolbox

Comsat integrates some of the existing libraries with Quasar fibers. The Comsat APIs are identical to the original ones, and the HTTP clients section of the Comsat documentation explains how to hook them in; for the rest, simply ensure you’re running Quasar properly, fire up your fibers when you need to perform a new HTTP call, and use one (or more) of the following fiber-blocking APIs (or take inspiration from the templates and examples):

Java:
- An extensive subset of the Apache HTTP Client API, integrated by bridging the async one. Apache HTTP Client is mature, efficient, feature-complete and very widely used.
- The fiber-blocking Retrofit API wraps the Apache client. Retrofit is a modern and high-level HTTP client toolkit that has been drawing a lot of interest also for ReST.
- The JAXRS synchronous HTTP client API, integrated by bridging Jersey’s async one. Jersey is a very popular JAXRS-compliant framework for ReST, so several micro-services could decide to use both its server and client APIs.
- The OkHttp synchronous API, integrated by bridging the OkHttp async API. OkHttp performs very well and is cheap on resources and feature-rich, yet it has a very straightforward API for common cases, and it supports HTTP2 and SPDY as well.
Clojure:
- An extensive subset of the clj-http API, integrated by bridging the async API of http-kit. clj-http is probably the most popular HTTP client API in the Clojure ecosystem.

New integrations can be added easily and of course contributions are always welcome.

Some Load Tests with JBender

JBender is Pinterest’s Quasar-based network load testing framework. It’s efficient and flexible, yet thanks to Quasar fiber-blocking its source code is tiny and readable; using it is just as straightforward as using traditional thread-blocking I/O.

Consider this project, which builds on JBender and with a tiny amount of code implements HTTP load test clients for all the Comsat-integrated libraries, both in their original thread-blocking version and in Comsat’s fiber-blocking one.

JBender can use either (plain, heavyweight, OS) threads or fibers to perform requests; both are abstracted by Quasar into a shared abstract class called Strand, so the thread-blocking and fiber-blocking versions share their HTTP code. This shows that the Comsat-integrated APIs are exactly the same as the original ones and that fibers and threads are used in exactly the same way.

The load-test clients accept parameters to customize pretty much every aspect of their run but the test cases we’ll consider are the following:

41000 long-lived HTTP connections fired at the highest possible rate.
Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 1 second each with a target rate of 1000 rps.
Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 100 milliseconds each with a target rate of 10000 rps.
Executing 10000 requests (plus 1000 of initial client and server warmup) with an immediate reply and a target rate of 100000 rps.
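A “target rate” can be read as a fixed schedule of request start times. The sketch below is illustrative and is not JBender’s code: it assumes a simple open-loop generator that spaces requests evenly, so 1,000 rps means one request start every millisecond.

```java
import java.util.concurrent.TimeUnit;

/** Open-loop request scheduling at a fixed target rate (illustrative, not JBender's code). */
final class ConstantRate {
    private ConstantRate() {}

    /** Gap between consecutive request starts, in nanoseconds, for a target rate in requests/second. */
    static long intervalNanos(long targetRatePerSecond) {
        return TimeUnit.SECONDS.toNanos(1) / targetRatePerSecond;
    }

    /** Planned start time of each of the first n requests, in nanoseconds from the start of the run. */
    static long[] plannedStarts(int n, long targetRatePerSecond) {
        long interval = intervalNanos(targetRatePerSecond);
        long[] starts = new long[n];
        for (int i = 0; i < n; i++) {
            starts[i] = i * interval;  // fixed schedule: does not wait for earlier responses
        }
        return starts;
    }
}
```

Because each request is sent on schedule regardless of outstanding responses, slow responses translate directly into more concurrently open connections.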

All of the tests have been fired against a server running Dropwizard, optimized to employ fibers on the HTTP server-side with comsat-dropwizard for maximum concurrency. The server simply replies to any request with “Hello!”

Here’s some information about our load test environment:

Parallel Universe Stack	Quasar 0.7.4-SNAPSHOT, Comsat 0.5.0
Fiber Server (comsat-dropwizard 0.5.0, Jetty 9.2.9)	AWS EC2 Linux m4.xlarge (16 GB, 4 vCPUs, high net perf)
Client (GET “/” -> 204 “Hello”)	AWS EC2 Linux t2.medium (4 GB, 2 vCPUs, moderate-to-low net perf)
OS Settings	https://github.com/circlespainter/jbender
JBender load test suite (server + clients)	https://github.com/circlespainter/comsat-http-client-bench
CPU/RAM Monitoring method	JFR
CPU/RAM sampling interval	JFR’s default
JVM	Oracle 1.8.0_b66
JVM Settings	-XX:+AggressiveOpts
HTTP Client settings	No retries, maximum-sized connection pool, I/O threads (only async) = <cpus> (= 2 for t2.medium), connect/read/write/ttl timeout = 1h
Warmup	1000 reqs, both server and client
Request generator buffer	= # reqs with pre-generation for throughput tests, = 1 for concurrency tests
Request completion events buffer	= # reqs with throughput tests, = 1 for concurrency tests

The first important result is that the Comsat-based clients win hands down, each compared with its respective non-fiber mode: Apache’s for many long-lived connections and OkHttp’s for lots of short-lived requests with a very high target rate, with both a small and a bigger heap (990 MiB and 3 GiB respectively; only the small-heap results are shown, for brevity):

HTTP Client Load Test	Metric	Regular (thread-blocking) Apache (4.4.1)	Comsat (fiber-blocking) Apache (async 4.1)	Regular (thread-blocking) OkHttp 2.4.0	Comsat (fiber-blocking) OkHttp 2.4.0	Regular (thread-blocking) Jersey 2.19 w/JDK connector	Comsat (fiber-blocking) Jersey 2.19 w/JDK connector
Underlying mechanism		AHC blocking (BIO)	AHC async (NIO) + Quasar fibers	OkHttp blocking (BIO)	OkHttp async (BIO) + Quasar fibers	Jersey blocking	Jersey async + Quasar fibers
Long-lived concurrent 41k (maximum rate possible)	Max	16715	41k	16358	16608	16713	16713
	Error	OOM - thread	-	OOM - thread	OOM - thread	OOM - thread	OOM - thread
	Time (s)	8.8	8.8	16.6	16.7	16.5	20.2
	Heap max (MiB)	N/A	702	N/A	N/A	N/A	N/A
	Heap avg (MiB)		246
	Threads max		16
Throughput with target rate 1k (response after 1s)	Time max (ms)	7139	1138	10209	1301	6341	4370
	Time avg (ms)	2359	1002	3031	1008	1902	1477
	Heap max (MiB)	227	110	125	119	330	342
	Heap avg (MiB)	61	29.8	34.9	30	76.8	73.7
	Threads max	4000+	15	4000+	1900+	4300+	2600+
Throughput with target rate 10k (response after 100ms)	Time max (ms)	4898	4085	7939	7079	45198	14512
	Time avg (ms)	2479	2717	3423	2125	25885	7594
	Heap max (MiB)	338	192	179	165	495	489
	Heap avg (MiB)	91.3	67.2	40.9	38.5	147	155
	Threads max	7500+	16	4900+	3900+	11000+	6900+
Throughput with target rate 100k (immediate response)	Time max (ms)	6937	3590	4668	1793	9303	9840
	Time avg (ms)	1468	1821	1287	826	1659	3442
	Heap max (MiB)	226	188	130	113	354	398
	Heap avg (MiB)	62.2	66	36.1	33.2	79.5	122
	Threads max	3500+	16	2600+	2000+	4000+	4000+
Notes					OkHttp doesn’t use NIO but regular blocking I/O under the hood.		Jersey uses one thread per connection even in the async case

OkHttp excels in speed and memory utilization for fast requests. The fiber version uses OkHttp’s async API and performs significantly better, even though the underlying mechanism is still traditional blocking I/O served by a thread pool.

Even more impressive is the margin by which the http-kit-based, fiber-blocking comsat-httpkit wins against a traditional clj-http client (again showing only the small-heap results):

HTTP Client Load Test	Metric	clj-http	comsat-httpkit
Underlying mechanism		AHC blocking (BIO)	http-kit async (NIO) + Quasar fibers
Long-lived concurrent 41k (maximum rate possible)	Max	15715	41k
	Error	OOM - thread	-
	Time (s)		5
	Heap max (MiB)	N/A	511
	Heap avg (MiB)		127
	Threads max		14
Throughput with target rate 1k (response after 1s)	Time max (ms)	19059	1102
	Time avg (ms)	8720	1003
	Heap max (MiB)	405	331
	Heap avg (MiB)	94.4	64.3
	Threads max	9000+	16
Throughput with target rate 10k (response after 100ms)	Time max (ms)	22045	5545
	Time avg (ms)	8960	4102
	Heap max (MiB)	406	250
	Heap avg (MiB)	117	52.1
	Threads max	7000+	15
Throughput with target rate 100k (immediate response)	Time max (ms)	42849	3438
	Time avg (ms)	34750	4698
	Heap max (MiB)	523	259
	Heap avg (MiB)	481	50.2
	Threads max	11000+	16

There are other Jersey providers as well (Grizzly, Jetty and Apache), but Jersey proved the worst of the bunch, with a generally higher footprint and an async interface (used by Comsat’s fiber-blocking integration) that unfortunately spawns and blocks a thread for each and every request. For this reason (and probably also due to each provider’s implementation strategy), the fiber version sometimes provides clear performance benefits and sometimes doesn’t. Anyway, these numbers are not as interesting as the Apache, OkHttp and http-kit ones, so I’m not including them here, but let me know if you’d like to see them.

(Optional) From 100 < 10,000 Feet: More About I/O Performance on the JVM

So you want to know why fibers are better than threads in highly concurrent scenarios.

When only a few concurrent sockets are open, the OS kernel can wake up blocked threads with very low latency. But OS threads are general purpose and they add considerable overhead for many use cases: they consume a lot of kernel memory for bookkeeping, synchronization syscalls can be orders of magnitude slower than procedure calls, context switching is expensive, and the scheduling algorithm is too generalist. All of this means that, at present, OS threads are just not the best choice for fine-grained concurrency with significant communication and synchronization, nor for highly concurrent systems in general.

Blocking I/O syscalls can indeed block expensive OS threads indefinitely, so a “thread-per-connection” approach will bring your system down very fast when you’re serving lots of concurrent connections. On the other hand, using a thread pool will probably make the queue of “accepted” connections overflow because the pool can’t keep up with the arrival rate, or at the very least cause unacceptable latencies. A “fiber-per-connection” approach, instead, is perfectly sustainable because fibers are so lightweight.
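The thread-pool problem follows from rearranging Little’s Law: a pool of N threads that each block for the whole request duration τ can complete at most N / τ requests per second, and any arrivals beyond that pile up in the queue. The pool size and latency in this sketch are hypothetical values chosen for illustration.

```java
/** Capacity of a fixed pool of threads that each block for the whole request (illustrative). */
final class PoolCapacity {
    private PoolCapacity() {}

    /** Little's Law rearranged: max sustained throughput = threads / latency, in requests/second. */
    static double maxThroughput(int threads, long latencyMillis) {
        return threads * 1_000.0 / latencyMillis;
    }

    /** Net growth of the accepted-connection queue, in requests/second (0 when the pool keeps up). */
    static double queueGrowthPerSecond(double arrivalRatePerSecond, int threads, long latencyMillis) {
        return Math.max(0.0, arrivalRatePerSecond - maxThroughput(threads, latencyMillis));
    }
}
```

With a hypothetical 200-thread pool and a 1 s downstream latency, maxThroughput gives 200 requests/s; at 1,000 arrivals per second the queue grows by 800 requests every second until it overflows or latencies become unacceptable.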

Summing it up: threads can be better at latency with few concurrent connections and fibers are better at throughput with many concurrent connections.

Of course, fibers need to run on top of active OS threads because the OS knows nothing about fibers, so Quasar schedules fibers on a thread pool. Quasar is just a library and runs entirely in user space, which means that a fiber performing a syscall will block its underlying JVM thread for the entire call duration, making it unavailable to other fibers. That’s why it’s important that such calls are as short as possible; above all, they shouldn’t wait for a long time or, even worse, indefinitely: in practice, fibers should only perform non-blocking syscalls. So how can we make blocking HTTP clients run so well on fibers? Since those libraries also provide a non-blocking (but inconvenient) API, we convert that async API into a fiber-blocking one and use it to implement the original blocking API. The new implementation (which is very short and little more than a wrapper) will:

Block the current fiber.
Start an equivalent asynchronous operation, and pass in a completion handler that will unblock the fiber when finished.

From the fiber’s (and programmer’s) perspective the execution will restart after the library call when I/O completes, just like when using a thread and a regular thread-blocking call.
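The two-step wrapper can be sketched with plain JDK classes. This is not Comsat’s implementation: NIO.2’s AsynchronousFileChannel stands in for an HTTP library’s async API, and an ordinary thread blocks on a CompletableFuture where Comsat would block a Quasar fiber. The class and method names are hypothetical. The sketch starts the operation before blocking, which is safe with a thread because the future remembers a completion that happens first.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class AsyncToBlocking {
    private AsyncToBlocking() {}

    /** A blocking read built on NIO.2's callback API (assumes the small file arrives in one read). */
    static String readBlocking(Path file) throws IOException, InterruptedException {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) Files.size(file));
            CompletableFuture<Integer> done = new CompletableFuture<>();
            // Start the asynchronous operation, passing a completion handler that unblocks the caller.
            ch.read(buf, 0L, null, new CompletionHandler<Integer, Void>() {
                @Override public void completed(Integer bytesRead, Void attachment) { done.complete(bytesRead); }
                @Override public void failed(Throwable exc, Void attachment) { done.completeExceptionally(exc); }
            });
            try {
                done.get();  // block the calling thread until the handler runs
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }
}
```

From the caller’s point of view readBlocking is an ordinary blocking call, which is exactly the property the fiber-blocking wrappers preserve while avoiding a blocked OS thread.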

Wrap-up

With Quasar and Comsat you can easily write and maintain highly concurrent, HTTP-intensive code in Java, Clojure or Kotlin, and you can even choose your favorite HTTP client library without any API lock-in. Do you want to use something else? Let us know, or integrate it with Quasar yourself.

Notes
I/O also has much not in common: for example, file I/O (which is block-oriented) supports memory-mapped I/O, which doesn’t make sense with stream-oriented I/O.
Read this blog post for further discussion.
This was not so before JDK 1.2, when the JVM had (only) green threads.
Using thread pools means dedicating a limited, or otherwise managed, number (or pool) of threads to fulfill a certain type of task, in this case serving HTTP requests: incoming connections are queued until a thread in the pool is free to serve them (as an aside, “connection pooling” is something entirely different and is most often about reusing DB connections).
Have a look at this intro for more information.
Read for example this, this and this for more information and benchmarks, as well as this guest post on ZeroTurnaround RebelLabs’s blog if you want more insight into why and how fibers are implemented.


Published at DZone with permission of Fabio Tudone, DZone MVB. See the original article here.
