Project Loom: Java With a Stronger Fiber
Thanks to Project Loom, a new era of Java is on the horizon.
Popularity comes at a price. Java is, and has been, a very popular language, attracting both praise and critique. While it would be fair to expect a decline after so many years and such a big legacy, Java is actually in pretty good shape and has a strong technical roadmap ahead. A new era of Java is coming, and a few years from now, things might be very different in JVM-land. The OpenJDK has some technically impressive projects that we will hopefully be able to use soonish, and that have the potential to affect not only Java but even other languages.
Apart from Loom, the focus of this article, you should also keep an eye on Valhalla, which might double the performance of Java in some cases, and Graal, which does so many things that I don’t even know where to start! And of course, the language is becoming less verbose, thanks to Amber.
These projects might even change the perception of other languages, so they are potentially really high impact.
One example is Loom + Graal, which gives you continuations (coroutines) and ahead-of-time compilation, potentially making Go less appealing than it is now. Loom + Amber gives you fibers (enabling potentially simpler actor systems) and shorter syntax, making Scala less attractive as well. Valhalla + Graal might reduce the performance gap with C++. And Graal alone might push Python to run on the JVM, or at least PySpark might greatly benefit from it.
But let's focus on Loom. And as there is not much practical information about it at this time, we will further explain, build, and use this experimental JVM to take some benchmarks. Let the numbers do the talking!
Project Loom
Java used to have green threads, at least in Solaris, but modern versions of Java use what's called native threads. Native threads are nice but relatively heavy, and you might need to tune the OS if you want to have tens of thousands of them.
Project Loom introduces continuations (co-routines) and fibers (a type of green threads), allowing you to choose between threads and fibers. With Loom, even a laptop can easily run millions of fibers, opening the door to new, or not so new, paradigms.
A Small Digression: Erlang
You might have heard about Erlang. It’s a very interesting language, much older than Java, with some shortcomings but also some impressive features. Erlang has native support for green threads, and in fact, the VM counts operations and switches between green threads every now and then.
In Erlang, it is common for a program to have many long-lived, not-very-busy threads. It is, in fact, expected to serve every user with a dedicated thread. Many of these threads might execute network operations (after all, Erlang was developed by Ericsson for the telecom industry), and these network operations are synchronous. Yes, synchronous. We might serve a million users with one machine with a lot of RAM, using simple, synchronous network operations.
Synchronous Vs. Asynchronous
For years, we have been told that scalable servers require asynchronous operations, but that’s not completely true.
Sure, if you need to scale using a thread pool (or even one single thread), you basically have no alternatives: You have to use asynchronous operations. And, asynchronous operations can scale very well.
When I joined Opera Software in 2008, I was a bit surprised to hear that Presto, the core of the browser, was single-threaded. Yep, one single thread. But that was enough. Tens of tabs rendering HTML and processing JavaScript, network downloads, file operations, cookies, cache — you name it. And only one thread, lots of asynchronous operations, and callbacks everywhere. And it worked pretty well.
But asynchronous code is hard. It can be very hard. Asynchronous calls break the flow of the operations, and what could be just 20 lines of simple code might need to be split into multiple files and run across threads, and it can take a developer hours to figure out what is actually happening.
Wouldn’t it be nice to get the simplicity of synchronous operations with the performance of asynchronous calls?
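To make the contrast concrete, here is a small sketch of the same operation written in both styles. The fetchGreeting method is a hypothetical stand-in for a blocking call (in real code it would be a network or disk operation); the point is only how the two code shapes differ.

```java
import java.util.concurrent.CompletableFuture;

public class SyncVsAsync {
    // Hypothetical blocking operation, standing in for a network call
    static String fetchGreeting() {
        return "Hello";
    }

    public static void main(String[] args) {
        // Synchronous style: the flow reads top to bottom
        String greeting = fetchGreeting();
        System.out.println(greeting + ", sync world");

        // Asynchronous style: the continuation moves into a callback,
        // and ordering and error handling become the caller's problem
        CompletableFuture
            .supplyAsync(SyncVsAsync::fetchGreeting)
            .thenAccept(g -> System.out.println(g + ", async world"))
            .join();
    }
}
```

With one trivial call the difference looks small, but once several asynchronous operations depend on each other, the callbacks nest and the code fragments in exactly the way described above.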
Fibers to the Rescue
Loom introduces fibers. That’s great, but it is not enough. To do useful things, you need a network stack that is fiber-friendly. When I tried Loom a few months ago, this was not the case. Creating around 40-50 fibers was enough to start seeing network errors. The project was too immature.
In June, JEP 353 (https://openjdk.java.net/jeps/353), which rewrote the legacy Java Socket API to be more fiber-friendly, was accepted into the JDK 13 mainline.
While not everything works, Loom can now be used with network operations.
It’s time to have an Actor System that can leverage the fibers of Loom.
Ok, maybe it is a bit early, as Project Loom is still experimental and the JDK 13 is due in September, but I could not resist. So I created and open-sourced a small Actor System able to take advantage of Loom: Fibry. We will use it to benchmark Loom and see if fibers are really better than threads.
Actors and Fibry
Actors are used in a multi-threaded environment to achieve concurrency in a relatively simple way. In particular, actors are single-threaded, so by definition you have no concurrency issues, as long as they operate only on their own state; you alter the state of an actor by sending messages to it.
Erlang enforces this safety by allowing only immutable values (there are no for-loops, and you can’t even swap two variables in the traditional way…); Java does not. But actors can still be very useful.
An excellent use case for actors is when you have a long-running task that is particularly light, typically because it relies on network operations and just waits for the clients to do something. For example, an IoT network might have all the devices permanently connected to a control server, sending messages only every now and then. A chat is another example of a program that can benefit from actors. And a server supporting WebSockets might be another candidate.
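The single-threaded nature of actors can be sketched with nothing but the standard library. This is not Fibry’s API, just a minimal illustration of the idea: the actor’s state lives on one thread, and other threads influence it only through its mailbox.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MiniActor {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();

        // The actor's state is confined to its own thread, so no locks are needed
        Thread actor = new Thread(() -> {
            int total = 0; // private state, touched only by this thread
            try {
                while (true) {
                    int msg = mailbox.take();
                    if (msg < 0)  // negative message acts as a poison pill
                        break;
                    total += msg;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("Total: " + total);
        });
        actor.start();

        // Other threads mutate the actor's state only by sending messages
        for (int i = 1; i <= 4; i++)
            mailbox.put(i);
        mailbox.put(-1); // shut down
        actor.join();
    }
}
```

An actor system like Fibry packages this pattern up, adding lifecycle management, return values, and the choice of backing each actor with a thread or a fiber.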
Fibry is my Actor System, designed to be small, flexible, simple to use, and, of course, to take advantage of Loom. It works with any version of Java from Java 8 onwards, and it has no dependencies, though it requires Loom to use fibers.
Building Loom
Building Loom is a bit time consuming, yet easy enough. You can get some basic information here.
After installing Mercurial (OpenJDK is still on Mercurial), you need to run these commands:
hg clone http://hg.openjdk.java.net/loom/loom
cd loom
hg update -r fibers
sh configure
make images
You might need to install some packages during the process, but ‘sh configure’ should tell you which commands to run.
That’s it!
You could now create “Hello World” with fibers:
var fiber = FiberScope.background().schedule(() -> System.out.println("Hello World"));
You can get more information here.
We are not going to use fibers directly, but we will use Fibry, as we are primarily concerned with how actors can benefit from them.
Comparing Fibers and Threads
Let’s count how much time we need to create (and keep alive) 3K threads. You can try a higher number if your OS is properly tuned. I am using the standard configuration of a c5.2xlarge VM running the Loom JDK without parameters; it can create 3K threads, but not 4K.
When you run this test with many threads, be prepared; it can be a bit hard on your PC, and you might need a reboot.
for(int i=0; i<3000; i++)
Stereotypes.threads().sink(null);
This code creates 3K “sink threads” that simply discard the messages they receive. In my VM, it takes 210 ms to execute.
Let’s try to create 1M fibers, using the fibers() method instead of threads():
for(int i=0; i<1_000_000; i++)
Stereotypes.fibers().sink(null);
In my VM, I can actually create 3M fibers. Three million.
With Loom, we can roughly create 1000 times more fibers than threads! You can surely tune the VM and the OS to increase the number of threads, but it is my understanding that there is a limit at around 32K.
Fibers are also much faster to create. 3K threads require 210 ms, but in the same amount of time, it is possible to create 200K fibers, meaning that fiber creation is around 70 times faster than thread creation!
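If you want to reproduce the thread side of this measurement without Fibry, a stdlib-only sketch looks like this (the thread count is kept at 1,000 so it runs safely on an untuned machine; the timings will of course vary):

```java
public class ThreadCreationTiming {
    public static void main(String[] args) throws InterruptedException {
        int count = 1000;
        Thread[] threads = new Thread[count];

        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Each "sink" just sleeps until interrupted, mimicking an idle actor
            threads[i] = new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) { }
            });
            threads[i].start();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Created " + count + " threads in " + elapsedMs + " ms");

        // Tear everything down so the JVM can exit cleanly
        for (Thread t : threads)
            t.interrupt();
        for (Thread t : threads)
            t.join();
        System.out.println("Done");
    }
}
```

Each native thread also reserves stack memory (often around 1 MB by default), which is one of the reasons the thread count hits a wall long before the fiber count does.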
Measuring Context Switching
In general, a computer needs to switch from one thread to another, and this takes a small but significant amount of time. We will now see whether fibers are faster at this. To approximate the cost of context switching, we will create two actors that exchange messages synchronously, with code similar to this (you need to call ActorSystem.setDefaultStrategy() to select threads or fibers):
var actorAnswer = ActorSystem.anonymous().newActorWithReturn((Integer n) -> n * n);
Stereotypes.def().runOnceSilent(() -> {
for (int i = 0; i < 250_000; i++)
actorAnswer.sendMessageReturn(i).get();
}).closeOnExit(actorAnswer).waitForExit();
Here actorAnswer returns the square of a number, while another actor asks it to do so 250K times, waiting for each result. The strategy specifies whether threads or fibers are used.
On my VM, the threads need around 4700 ms to complete this task, while fibers need around 1500 ms, so fibers can exchange three times as many synchronous messages as threads.
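The same ping-pong idea can be sketched with plain threads and SynchronousQueue, which forces a handoff (and hence a context switch) on every exchange. This is a stdlib approximation of the benchmark above, not Fibry’s implementation:

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> requests = new SynchronousQueue<>();
        SynchronousQueue<Integer> replies = new SynchronousQueue<>();
        int rounds = 10_000;

        // "Actor" thread answering with the square of each number
        Thread squarer = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    int n = requests.take();
                    replies.put(n * n);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        squarer.start();

        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            requests.put(i);           // every exchange forces a handoff
            checksum += replies.take();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        squarer.join();

        System.out.println("Rounds: " + rounds + ", time: " + elapsedMs + " ms");
        System.out.println("Checksum: " + checksum);
    }
}
```

With fibers, the equivalent handoff is handled by the scheduler in user space, which is why Loom can avoid most of the kernel-level switching cost measured here.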
Network Operations
Let’s now check if network operations are fine.
The following is a simple HTTP HelloWorld code that starts the embedded Java HTTP server:
Stereotypes.def().embeddedHttpServer(12345, exchange -> "Hello World!");
Every time a new client is connected, a new actor is created to process the request. In this case, threads and fibers perform very similarly at around a disappointing 2200 requests per second. Here, the bottleneck is probably the embedded HTTP server, which is not meant for server loads.
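For reference, the embedded server in question is presumably the JDK’s built-in com.sun.net.httpserver (which Fibry’s embeddedHttpServer appears to wrap); a stdlib-only version of the same Hello World looks like this, with the port chosen automatically so the sketch is self-contained:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;

public class EmbeddedHello {
    public static void main(String[] args) throws Exception {
        // Port 0 lets the OS pick a free port
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        int port = server.getAddress().getPort();

        server.createContext("/", exchange -> {
            byte[] body = "Hello World!".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Self-check: fetch the page once, then shut down
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL("http://localhost:" + port + "/").openStream()))) {
            System.out.println(in.readLine());
        }
        server.stop(0);
    }
}
```

This server is convenient but, as the numbers above suggest, it is not built for high request rates, which motivates the hand-rolled version below.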
So, let’s try to write a super simple HTTP server that always answers with the same string:
Stereotypes.def().tcpAcceptorSilent(12345, conn -> {
try (var is = conn.getInputStream(); var os = conn.getOutputStream()) {
// Skips till the end of the HTTP request
while (is.read() != '\n' || is.read() != '\r' || is.read() != '\n') { }
os.write("HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nHello!".getBytes());
}
}, null).waitForExit();
I am testing with Apache Bench, using 100 threads:
ab -k -n 50000 -c 100 http://localhost:12345/
The thread version can serve almost 11K requests per second, while fibers score above 24K. So, in this test, fibers are twice as fast as threads.
Are Fibers Always Faster?
Not exactly. For some reason, threads seem to be slightly faster at sending asynchronous messages, at around 8.5M per second, while fibers peak at around 7.5M per second. In addition, threads seem to suffer less from congestion as the number of threads grows, in this particular benchmark.
This might be solvable by switching to a different messaging system than the one Fibry uses. In addition, let’s not forget that Loom is not yet ready for production, so there is still room to improve this behavior.
If you want to run some benchmarks by yourself, you can find the full code and some more tests here: https://github.com/lucav76/FibryBench/
Conclusions
Loom seems to be in good shape. Fibers behave really well from a performance point of view and have the potential to increase the capacity of a server by wide margins, while, at the same time, simplifying the code. Fibers might not be a solution for every problem, but surely actors systems can greatly benefit from them.
I am looking forward to seeing Loom emerge in the mainline of the OpenJDK. Are you?
Published at DZone with permission of Luca Venturi. See the original article here.