High-Concurrency HTTP Clients on the JVM

HTTP is a super popular app protocol with loads of libraries. Here's a look at high-concurrency HTTP clients on Java virtual machines.

By Fabio Tudone · Dec. 08, 15 · Analysis

HTTP is probably the most popular application-level protocol and there are many libraries that implement it on top of network I/O, which is a special (stream-oriented) case of general I/O. Since all I/O has much in common, let’s start with some discussion about it.

I’ll concentrate on I/O cases with lots of concurrent HTTP requests, for example micro-services, where a set of higher-level HTTP services invoke several lower-level ones, some concurrently and some sequentially due to data dependencies.

When serving many such requests, the total number of concurrently open connections can become large at times, especially if there are data dependencies or if the lower-level services are slow (or slowed down by exceptional conditions). So microservice layers tend to require many concurrent, potentially long-lived connections. To see how many open connections we need to support without crashing, let’s recall Little’s Law, with Ψ being the average number of in-progress requests, ρ the average arrival rate and τ the average completion time:

Ψ = ρ τ

The number of in-progress requests we can support depends on the language runtime, the OS and the hardware; the average request completion time (or latency) depends on what we have to do in order to fulfill the requests, including of course the calls to any lower-level services, access to storage, etc.

How many concurrent HTTP requests can we support? Each will need an open connection and some runnable primitive that can read/write on it using syscalls. If the memory, I/O subsystem and network bandwidth can keep up, modern OSes can support hundreds of thousands of open TCP connections; the runnable primitives they provide to work on sockets are threads. Threads are much more heavyweight than sockets: a single box running a modern OS can only support 5000-15000 of them.
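As a rough worked example of Little’s Law, the sketch below computes Ψ = ρ τ for two of the target rates used in the load tests later in this article, plus one hypothetical slowdown scenario. The class and method names are made up for illustration.

```java
class LittlesLaw {
    /** In-progress requests = arrival rate (req/s) x average completion time (ms, converted to s). */
    static double inProgress(long arrivalRatePerSecond, long avgCompletionMillis) {
        return arrivalRatePerSecond * avgCompletionMillis / 1000.0;
    }

    public static void main(String[] args) {
        // 1,000 req/s, each taking 1 s
        System.out.println(inProgress(1_000, 1_000));
        // 10,000 req/s, each taking 100 ms
        System.out.println(inProgress(10_000, 100));
        // Hypothetical: the lower-level service slows down to 2 s per request
        System.out.println(inProgress(10_000, 2_000));
    }
}
```

The first two cases both need about 1,000 requests in flight, which a thread per request can still handle. In the hypothetical third case the same arrival rate needs about 20,000, which is beyond the 5000-15000 threads mentioned above.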

From 10,000 Feet: I/O Performance on the JVM

Nowadays JDK threads are OS threads on most platforms, but if at any time there are only a few concurrent connections then the “thread-per-connection” model is perfectly fine.

What if not? The answer to this question has changed over time:

  • JDK pre-1.4 only had libraries calling into the OS’ thread-blocking I/O (java.io packages), so only the “thread-per-connection” model or thread-pools could be used. If you wanted something better you’d tap into your OS’ additional features through JNI.
  • JDK 1.4 added non-blocking I/O or NIO (java.nio packages) to read/write from connections only if it can be done immediately, without putting the thread to sleep. Even more importantly it added a way for a single thread to work effectively on many channels with socket selection, which means asking the OS to block the current thread and unblock it when it is possible to receive/send data immediately from at least one socket of a set.
  • JDK 1.7 added NIO.2, also known as asynchronous I/O (still java.nio packages). This means asking the OS to perform I/O tasks completely in the background and wake up a thread with a notification later on, only when the I/O has finished.
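The socket selection described for JDK 1.4 can be sketched with the standard java.nio classes. This is only a minimal illustration, not how any of the libraries below are implemented: the class name, the loopback setup and the "hello-N" messages are assumptions made for the example, and the number of clients should stay well below the listen backlog because all of them connect before the server starts accepting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SelectorSketch {
    /** Reads one short message from each of `clients` loopback connections using a single thread. */
    static List<String> readAllWithOneThread(int clients) {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Client side (for the demo only): each connection sends one message and closes.
            for (int i = 0; i < clients; i++) {
                try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                    client.write(ByteBuffer.wrap(("hello-" + i).getBytes(StandardCharsets.US_ASCII)));
                }
            }

            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (received.size() < clients) {
                selector.select(); // blocks this thread until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel channel = server.accept();
                        if (channel != null) {
                            channel.configureBlocking(false);
                            // The attachment accumulates this connection's data.
                            channel.register(selector, SelectionKey.OP_READ, new StringBuilder());
                        }
                    } else if (key.isReadable()) {
                        SocketChannel channel = (SocketChannel) key.channel();
                        StringBuilder message = (StringBuilder) key.attachment();
                        buffer.clear();
                        int n = channel.read(buffer); // never blocks: the channel is ready
                        if (n > 0) {
                            buffer.flip();
                            message.append(StandardCharsets.US_ASCII.decode(buffer));
                        } else if (n < 0) { // end of stream: the message is complete
                            received.add(message.toString());
                            channel.close(); // also cancels the key
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(readAllWithOneThread(4));
    }
}
```

The order in which messages are collected depends on which channels the OS reports as ready first, so callers should not rely on it.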

Calling HTTP From the JVM Either Easily or Efficiently: the Thread-blocking and Async Toolboxes

There’s a wide selection of open-source HTTP client libraries available for the JVM. The thread-blocking APIs are easy to use and to maintain but potentially less efficient with many concurrent requests, while the async ones are efficient but harder to use. Asynchronous APIs also virally affect your code with asynchrony: any method consuming asynchronous data must be asynchronous itself, or block and nullify the advantages of asynchrony.
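To illustrate how asynchrony spreads through calling code, here is a minimal sketch using the JDK’s CompletableFuture. The two "services" are hypothetical in-memory stand-ins for real HTTP calls, and all names are invented for the example.

```java
import java.util.concurrent.CompletableFuture;

class AsyncVirality {
    // Hypothetical lower-level services, simulated in memory.
    static String fetchUserBlocking(int id) { return "user-" + id; }
    static String fetchOrdersBlocking(String user) { return user + ":orders"; }

    static CompletableFuture<String> fetchUserAsync(int id) {
        return CompletableFuture.supplyAsync(() -> fetchUserBlocking(id));
    }
    static CompletableFuture<String> fetchOrdersAsync(String user) {
        return CompletableFuture.supplyAsync(() -> fetchOrdersBlocking(user));
    }

    // Thread-blocking style: plain sequential code with a plain return type.
    static String ordersForBlocking(int id) {
        String user = fetchUserBlocking(id);
        return fetchOrdersBlocking(user);
    }

    // Async style: the caller must return a future as well (or block on it),
    // and the data dependency becomes an explicit composition step.
    static CompletableFuture<String> ordersForAsync(int id) {
        return fetchUserAsync(id).thenCompose(AsyncVirality::fetchOrdersAsync);
    }

    public static void main(String[] args) {
        System.out.println(ordersForBlocking(7));
        // join() blocks the calling thread, which gives up the benefit of the async API.
        System.out.println(ordersForAsync(7).join());
    }
}
```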

Here’s a selection of open-source HTTP clients for Java and Clojure:

  • JDK’s URLConnection uses traditional thread-blocking I/O.
  • Apache HTTP Client uses traditional thread-blocking I/O with thread-pools.
  • Apache Async HTTP Client uses NIO.
  • Jersey is a ReST client/server framework; the client API can use several HTTP client backends including URLConnection and Apache HTTP Client.
  • OkHttp uses traditional thread-blocking I/O with thread-pools.
  • Retrofit turns your HTTP API into a Java interface and can use several HTTP client backends including Apache HTTP Client.
  • Grizzly is a network framework with low-level HTTP support; it was using NIO but it switched to AIO.
  • Netty is a multi-transport network framework with low-level HTTP support; its transports include NIO and native (the latter uses epoll on Linux).
  • Jetty Async HTTP Client uses NIO.
  • Async HTTP Client wraps either Netty, Grizzly or JDK’s HTTP support.
  • clj-http wraps the Apache HTTP Client.
  • http-kit is an async subset of clj-http implemented partially in Java directly on top of NIO.
  • http async client wraps the Async HTTP Client for Java.
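For reference, this is roughly what the thread-blocking end of the list looks like with the JDK’s own HttpURLConnection. It is a sketch under assumptions: the throwaway local server (the JDK’s built-in com.sun.net.httpserver.HttpServer replying "Hello!"), the class name and the pool sizing are all chosen for the example, and `readAllBytes` requires Java 9 or later.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingClientSketch {
    /** Thread-blocking GET: the calling thread waits until the whole response has arrived. */
    static String get(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a throwaway local server, fires `requests` concurrent GETs and returns the bodies. */
    static List<String> run(int requests) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "Hello!".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            // One OS thread per in-flight request: simple, but it is the part that does not scale.
            ExecutorService pool = Executors.newFixedThreadPool(requests);
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
                List<Future<String>> futures = new ArrayList<>();
                for (int i = 0; i < requests; i++) {
                    futures.add(pool.submit(() -> get(url)));
                }
                List<String> bodies = new ArrayList<>();
                for (Future<String> f : futures) {
                    bodies.add(f.get());
                }
                return bodies;
            } finally {
                pool.shutdown();
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(8));
    }
}
```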

From 10,000 Feet: Making it Easy

Since Java threads are heavy on resources, if we want to perform I/O and scale to many concurrent connections we have to use either NIO or async NIO; on the other hand they are much more difficult to code and maintain. Is there a solution to this dilemma?

If threads weren’t heavy we could just use straightforward blocking I/O, so our question really is: can we have cheap enough threads that could be created in much larger numbers than OS threads?

At present the JVM itself doesn’t provide lightweight threads but Quasar comes to the rescue with fibers, which are very efficient threads, implemented in userspace.

Calling HTTP From the JVM Both Easily and Efficiently: the Comsat Fiber-blocking Toolbox

Comsat integrates some of the existing libraries with Quasar fibers. The Comsat APIs are identical to the original ones and the HTTP clients section explains how to hook them in; for the rest simply ensure you’re running Quasar properly, fire up your fibers when you need to perform a new HTTP call and use one (or more) of the following fiber-blocking APIs (or take inspiration from templates and examples):

  • Java:
    • An extensive subset of the Apache HTTP Client API, integrated by bridging the async one. Apache HTTP Client is mature, efficient, feature-complete and very widely used.
    • The fiber-blocking Retrofit API wraps the Apache client. Retrofit is a modern and high-level HTTP client toolkit that has been drawing a lot of interest also for ReST.
    • The JAXRS synchronous HTTP client API, integrated by bridging Jersey’s async one. Jersey is a very popular JAXRS-compliant framework for ReST, so several micro-services could decide to use both its server and client APIs.
    • The OkHttp synchronous API, integrated by bridging the OkHttp async API. OkHttp performs very well, is cheap on resources and feature-rich yet at the same time it has a very straightforward API for common cases, plus it supports HTTP2 and SPDY as well.
  • Clojure:
    • An extensive subset of the clj-http API, integrated by bridging the async API of http-kit. clj-http is probably the most popular HTTP client API in the Clojure ecosystem.

New integrations can be added easily and of course contributions are always welcome.

Some Load Tests with JBender

JBender is Pinterest’s Quasar-based network load testing framework. It’s efficient and flexible, yet thanks to Quasar fiber-blocking its source code is tiny and readable; using it is just as straightforward as using traditional thread-blocking I/O.

Consider this project, which builds on JBender and with a tiny amount of code implements HTTP load test clients for all the Comsat-integrated libraries, both in their original thread-blocking version and in Comsat’s fiber-blocking one.

JBender can use either (plain, heavyweight, OS) threads or fibers to perform requests. Both are abstracted by Quasar into a shared abstract class called Strand, so the thread-blocking and fiber-blocking versions share the same HTTP code: this shows that the Comsat-integrated APIs are exactly the same as the original ones and that fibers and threads are used in exactly the same way.

The load-test clients accept parameters to customize pretty much every aspect of their run but the test cases we’ll consider are the following:

  1. 41000 long-lived HTTP connections fired at the highest possible rate.
  2. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 1 second each with a target rate of 1000 rps.
  3. Executing 10000 requests (plus 1000 of initial client and server warmup) lasting 100 milliseconds each with a target rate of 10000 rps.
  4. Executing 10000 requests (plus 1000 of initial client and server warmup) with an immediate reply and a target rate of 100000 rps.

All of the tests have been fired against a server running Dropwizard, optimized to employ fibers on the HTTP server-side with comsat-dropwizard for maximum concurrency. The server simply replies to any request with “Hello!”

Here’s some information about our load test environment:

  • Parallel Universe Stack: Quasar 0.7.4-SNAPSHOT, Comsat 0.5.0
  • Fiber Server (comsat-dropwizard 0.5.0, Jetty 9.2.9): AWS EC2 Linux m4.xlarge (16 GB, 4 vcpus, high net perf)
  • Client (GET “/” -> 204 “Hello”): AWS EC2 Linux t2.medium (4 GB, 2 vcpus, moderate-to-low net perf)
  • OS Settings: https://github.com/circlespainter/jbender
  • JBender load test suite (server + clients): https://github.com/circlespainter/comsat-http-client-bench
  • CPU/RAM Monitoring method: JFR
  • CPU/RAM sampling interval: JFR’s default
  • JVM: Oracle 1.8.0_b66
  • JVM Settings: -XX:+AggressiveOpts
  • HTTP Client settings: No retries, maximum-sized connection pool, I/O threads (only async) = <cpus> (= 2 for m4.medium), connect/read/write/ttl timeout = 1h
  • Warmup: 1000 reqs, both server and client
  • Request generator buffer: = # reqs with pre-generation for throughput tests, = 1 for concurrency tests
  • Request completion events buffer: = # reqs with throughput tests, = 1 for concurrency tests

The first important result is that the Comsat-based clients win hands-down, each compared to its respective non-fiber mode: Apache’s for many long-lived connections and OkHttp’s for lots of short-lived requests with a very high target rate, with both a small and a bigger heap (990 MiB and 3 GiB respectively; only the first is shown for brevity):

HTTP Client Load Test

| Test | Metric | Regular (thread-blocking) Apache (4.4.1) | Comsat (fiber-blocking) Apache (async 4.1) | Regular (thread-blocking) OkHttp 2.4.0 | Comsat (fiber-blocking) OkHttp 2.4.0 | Regular (thread-blocking) Jersey 2.19 w/JDK connector | Comsat (fiber-blocking) Jersey 2.19 w/JDK connector |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Implementation | AHC blocking (BIO) | AHC async (NIO) + Quasar fibers | OkHttp blocking (BIO) | OkHttp async (BIO) + Quasar fibers | Jersey blocking | Jersey async + Quasar fibers |
| Long-lived concurrent 41k (maximum rate possible) | Max | 16715 | 41k | 16358 | 16608 | 16713 | 16713 |
| | Error | OOM - thread | - | OOM - thread | OOM - thread | OOM - thread | OOM - thread |
| | Time (s) | 8.8 | 8.8 | 16.6 | 16.7 | 16.5 | 20.2 |
| | Heap max (MiB) | N/A | 702 | N/A | N/A | N/A | N/A |
| | Heap avg (MiB) | | 246 | | | | |
| | Threads max | | 16 | | | | |
| Throughput with target rate 1k (response after 1s) | Time max (ms) | 7139 | 1138 | 10209 | 1301 | 6341 | 4370 |
| | Time avg (ms) | 2359 | 1002 | 3031 | 1008 | 1902 | 1477 |
| | Heap max (MiB) | 227 | 110 | 125 | 119 | 330 | 342 |
| | Heap avg (MiB) | 61 | 29.8 | 34.9 | 30 | 76.8 | 73.7 |
| | Threads max | 4000+ | 15 | 4000+ | 1900+ | 4300+ | 2600+ |
| Throughput with target rate 10k (response after 100ms) | Time max (ms) | 4898 | 4085 | 7939 | 7079 | 45198 | 14512 |
| | Time avg (ms) | 2479 | 2717 | 3423 | 2125 | 25885 | 7594 |
| | Heap max (MiB) | 338 | 192 | 179 | 165 | 495 | 489 |
| | Heap avg (MiB) | 91.3 | 67.2 | 40.9 | 38.5 | 147 | 155 |
| | Threads max | 7500+ | 16 | 4900+ | 3900+ | 11000+ | 6900+ |
| Throughput with target rate 100k (immediate response) | Time max (ms) | 6937 | 3590 | 4668 | 1793 | 9303 | 9840 |
| | Time avg (ms) | 1468 | 1821 | 1287 | 826 | 1659 | 3442 |
| | Heap max (MiB) | 226 | 188 | 130 | 113 | 354 | 398 |
| | Heap avg (MiB) | 62.2 | 66 | 36.1 | 33.2 | 79.5 | 122 |
| | Threads max | 3500+ | 16 | 2600+ | 2000+ | 4000+ | 4000+ |

Notes: OkHttp doesn’t use NIO but regular blocking I/O under the hood. Jersey uses one thread per connection even in the async case.

OkHttp excels in speed and memory utilization for fast requests. The fiber version for the JVM uses the async API and performs significantly better even though the underlying mechanism is traditional blocking I/O served by a thread pool.

Even more impressive is the margin by which the http-kit-based fiber-blocking comsat-httpkit wins against a traditional clj-http client (again showing only the small-heap results):

HTTP Client Load Test

| Test | Metric | clj-http | comsat-httpkit |
| --- | --- | --- | --- |
| | Implementation | AHC blocking (BIO) | http-kit async (NIO) + Quasar fibers |
| Long-lived concurrent 41k (maximum rate possible) | Max | 15715 | 41k |
| | Error | OOM - thread | - |
| | Time (s) | | 5 |
| | Heap max (MiB) | N/A | 511 |
| | Heap avg (MiB) | | 127 |
| | Threads max | | 14 |
| Throughput with target rate 1k (response after 1s) | Time max (ms) | 19059 | 1102 |
| | Time avg (ms) | 8720 | 1003 |
| | Heap max (MiB) | 405 | 331 |
| | Heap avg (MiB) | 94.4 | 64.3 |
| | Threads max | 9000+ | 16 |
| Throughput with target rate 10k (response after 100ms) | Time max (ms) | 22045 | 5545 |
| | Time avg (ms) | 8960 | 4102 |
| | Heap max (MiB) | 406 | 250 |
| | Heap avg (MiB) | 117 | 52.1 |
| | Threads max | 7000+ | 15 |
| Throughput with target rate 100k (immediate response) | Time max (ms) | 42849 | 3438 |
| | Time avg (ms) | 34750 | 4698 |
| | Heap max (MiB) | 523 | 259 |
| | Heap avg (MiB) | 481 | 50.2 |
| | Threads max | 11000+ | 16 |

There are other Jersey providers as well (Grizzly, Jetty and Apache) but Jersey proved the worst of the bunch with a generally higher footprint and an async interface (used by Comsat’s fiber-blocking integration) that unfortunately spawns and blocks a thread for each and every request; for this reason (and probably also due to each provider’s implementation strategy) the fiber version sometimes provides clear performance benefits and sometimes doesn’t. Anyway these numbers are not as interesting as the Apache, OkHttp and http-kit ones so I’m not including them here, but let me know if you’d like to see them.

(Optional) From 100 < 10,000 Feet: More About I/O Performance on the JVM

So you want to know why fibers are better than threads in highly concurrent scenarios.

When only a few concurrent sockets are open, the OS kernel can wake up blocked threads with very low latency. But OS threads are general purpose and they add considerable overhead for many use cases: they consume a lot of kernel memory for bookkeeping, synchronization syscalls can be orders of magnitude slower than procedure calls, context switching is expensive, and the scheduling algorithm is too generalist. All of this means that at present OS threads are just not the best choice for fine-grained concurrency with significant communication and synchronization, nor for highly concurrent systems in general.

Blocking I/O syscalls can indeed block expensive OS threads indefinitely, so a “thread-per-connection” approach will bring your system down very fast when you’re serving lots of concurrent connections; on the other hand, using a thread-pool will probably make the “accepted” connection queue overflow because we can’t keep up with the arrival rate, or at the very least cause unacceptable latencies. A “fiber-per-connection” approach instead is perfectly sustainable because fibers are so lightweight.
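A back-of-the-envelope sketch of why a fixed thread pool falls behind: if every request holds a pool thread for its whole duration, the pool can complete at most (threads / request time) requests per second, and anything arriving faster piles up in the queue. The pool size and rates below are hypothetical numbers chosen for illustration, not measurements.

```java
class PoolBacklog {
    /** Maximum sustainable rate (req/s) of a pool whose threads each block for the whole request. */
    static double maxRate(int poolThreads, long requestMillis) {
        return poolThreads * 1000.0 / requestMillis;
    }

    /** Requests per second accumulating in the queue; zero when the pool keeps up. */
    static double backlogGrowth(double arrivalRate, int poolThreads, long requestMillis) {
        return Math.max(0.0, arrivalRate - maxRate(poolThreads, requestMillis));
    }

    public static void main(String[] args) {
        // Hypothetical: 200 pool threads, requests that take 1 s, 1,000 req/s arriving.
        System.out.println(maxRate(200, 1_000));
        System.out.println(backlogGrowth(1_000, 200, 1_000));
    }
}
```

With these assumed numbers the pool tops out at 200 requests per second, so the queue grows by 800 requests every second until it overflows or the load drops.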

Summing it up: threads can be better at latency with few concurrent connections and fibers are better at throughput with many concurrent connections.

Of course fibers need to run on top of active OS threads because the OS knows nothing about fibers, so fibers are scheduled on a thread pool by Quasar. Quasar is just a library and runs entirely in user-space, which means that a fiber performing a syscall will block its underlying JVM thread for the entire duration of the call, making it unavailable to other fibers. That’s why it’s important that such calls are as short as possible and, especially, that they don’t wait for a long time or, even worse, indefinitely: in practice fibers should only perform non-blocking syscalls. So how can we make blocking HTTP clients run so well on fibers? As those libraries provide a non-blocking (but inconvenient) API as well, we convert that async API to a fiber-blocking one and use it to implement the original blocking API. The new implementation (which is very short and is little more than a wrapper) will:

  1. Block the current fiber.
  2. Start an equivalent asynchronous operation, and pass in a completion handler that will unblock the fiber when finished.

From the fiber’s (and programmer’s) perspective the execution will restart after the library call when I/O completes, just like when using a thread and a regular thread-blocking call.
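The shape of such a wrapper can be sketched with plain JDK classes. This is an analogy, not Comsat’s actual code: it parks an OS thread on a CompletableFuture where Quasar would park a fiber, and with threads the async operation has to be started before waiting. `AsyncClient` is a hypothetical callback-based client invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class AsyncToBlocking {
    /** Hypothetical callback-based client; callbacks run on a background "I/O" thread. */
    static final class AsyncClient implements AutoCloseable {
        private final ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();

        void get(String path, Consumer<String> onSuccess, Consumer<Throwable> onFailure) {
            // Simulate a response arriving 10 ms later.
            io.schedule(() -> {
                if (path.startsWith("/")) {
                    onSuccess.accept("Hello! " + path);
                } else {
                    onFailure.accept(new IllegalArgumentException("bad path: " + path));
                }
            }, 10, TimeUnit.MILLISECONDS);
        }

        @Override
        public void close() {
            io.shutdown();
        }
    }

    /** Blocking facade over the async API: same result, sequential-looking call site. */
    static String blockingGet(AsyncClient client, String path) {
        CompletableFuture<String> result = new CompletableFuture<>();
        // Start the async operation; the completion handlers wake up the waiter.
        client.get(path, result::complete, result::completeExceptionally);
        // Wait for the handler. This parks a thread; a fiber-blocking wrapper parks a fiber instead.
        return result.join();
    }

    public static void main(String[] args) {
        try (AsyncClient client = new AsyncClient()) {
            System.out.println(blockingGet(client, "/ping"));
        }
    }
}
```

A failed request surfaces at the call site as a CompletionException wrapping the original cause, so callers can handle errors with ordinary try/catch.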

Wrap-up

With Quasar and Comsat you can easily write and maintain highly concurrent and HTTP-intensive code in Java, Clojure or Kotlin and you can even choose your favorite HTTP client library, without any API lock-ins. Do you want to use something else? Let us know, or integrate it with Quasar yourself.

  1. …and much not in common, for example file I/O (which is block-oriented) supports memory-mapped I/O which doesn’t make sense with stream-oriented I/O.
  2. Read this blog post for further discussion.
  3. Not so before 1.2, when it had (only) Green Threads.
  4. Using thread-pools means dedicating a limited or anyway managed amount (or pool) of threads to fulfill a certain type of tasks, in this case serving HTTP requests: incoming connections are queued until a thread in the pool is free to serve it (as an aside, “connection pooling” is something entirely different and it’s most often about reusing DB connections).
  5. Have a look at this intro for more information.
  6. Read for example this, this and this for more information and benchmarks as well as this guest post on ZeroTurnaround RebelLabs’s blog if you want more insight about why and how fibers are implemented.

Published at DZone with permission of Fabio Tudone, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
