{{announcement.body}}
{{announcement.title}}

Digging Into Sockets With Java Flight Recorder

DZone 's Guide to

Digging Into Sockets With Java Flight Recorder

Let's look at one amazing tool for gathering information about what's going on in your JVM — the Java Flight Recorder.

· Java Zone ·
Free Resource

Let's look at one amazing tool for gathering information about what's going on in your JVM — the Java Flight Recorder.
Let's look at one amazing tool for gathering information about what's going on in your JVM. The tool is called the  Java Flight Recorder, and in today's article, I'm going to use it to dig into how Sockets behave in your Java application.

A lot of people use HTTP APIs to communicate between their services (let's have another article about whether it's a good solution or not), and in a majority of cases, it works fine. However, we can get into situations where performance really matters, and especially for libraries writers, JFR can be a very useful tool to see I/O works in their code.

You may also like: Using Java Flight Recorder With OpenJDK 11

This article is not an introduction to Flight Recorder, it's rather more practical. If you want to learn more about this technology, you can start with JEP 328: Flight Recorder.

Introduction to Our Netty Application

As an example, I decided to use a great project called Netty that simplifies the JDK NIO implementation and provides a lot of tricks we can use to optimize our application. Let's dive into what we are going to use for our experiments.

Feel free to check out my jfr-sockets project on GitHub and try out everything with me. The project actually contains two applications based on TCP. The first one in the package simple is just a dump TCP Client/Server and the client pushes every 1s a string my-message that takes exactly 10 bytes. And the second one is Netty-based Client/Server application where the server pushes every 1s the same string as before but using WebSocket. I deliberately chose those types of applications, and we'll look at the details of why I did it.

You can also notice a String constant called CONFIG. I'm using a trick where the file with the proper configuration is created this way, instead of changing a regular JFR configuration file. If you want to see configuration files that are in JDK, please, open this folder ${JAVA_HOME}/lib/jfrVery likely you spotted two files default.jfc and profile.jfc. If you open it, you can see XML configuration of JFR events. This configuration says when the event is supposed to be emitted. The configuration file default has smaller overhead (~ 1%), and it's useful for continuous profiling whereas the profile configuration is much aggressive and emits more events with lower thresholds (~ 2% overhead).

We needed to create our own settings because, by default, there is a 20 ms threshold for reading and writing but we want to see every operation.

Notice that I'm using a new feature that's available since Java 14 JEP 349: JFR Event Streaming, which means if you want to try it out, you need to download it from https://jdk.java.net/14/.

Run a Simple Blocking Application

The very first example is a class pbouda.jfr.sockets.simple.Start. If you try it out, you can see the output similar to one below. As we mentioned earlier, the application writes UTF String my-message (10 bytes), and if you look inside the methods, we actually got in the stacktrace field, 10-byte string is encoded into 12-byte message.

Since TCP is based on streams, we need a well-known delimiter between two messages, or we need to encode the size of the message somehow. JDK implementation of writeUTF and readUTF is based on the second option. JDK uses the first two bytes as a short type, which determines the number of bytes in the message itself.

Java




x


 
1
[0, 10, 109, 121, 45, 109, 101, 115, 115, 97, 103, 101]



According to jdk.SocketWrite event, we can see that we wrote 12-byte array (2 bytes for length, 10 bytes for a payload).

In terms of jdk.SocketRead, we can notice two events belonging to DataInputStream#readUnsignedShort. As we mentioned above, first, we need to read the size of the payload (unsigned short). This operation emitted two events because it always calls  SocketInputStream#read twice to read short value (one byte per operation). The very last event already contains the payload and calls DataInputStream#readFully.

Java




xxxxxxxxxx
1
79


1
jdk.SocketWrite {
2
    startTime = 20:03:57.651
3
    duration = 0.185 ms
4
    host = "localhost"
5
    address = "127.0.0.1"
6
    port = 52960
7
    bytesWritten = 12 bytes
8
    eventThread = "Thread-0" (javaThreadId = 18)
9
    stackTrace = [
10
        java.net.Socket$SocketOutputStream.write(byte[], int, int) line: 65
11
        java.io.DataOutputStream.write(byte[], int, int) line: 106
12
        java.io.DataOutputStream.writeUTF(String, DataOutput) line: 397
13
        java.io.DataOutputStream.writeUTF(String) line: 325
14
        pbouda.jfr.sockets.simple.Start$ClientHandler.run() line: 67
15
        ...
16
    ]
17
}
18
 
          
19
jdk.SocketRead {
20
    startTime = 20:03:56.558
21
    duration = 1.09 s
22
    host = "localhost"
23
    address = "127.0.0.1"
24
    port = 5000
25
    timeout = 0 s
26
    bytesRead = 1 byte
27
    endOfStream = false
28
    eventThread = "main" (javaThreadId = 1)
29
    stackTrace = [
30
        java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
31
        java.net.Socket$SocketInputStream.read() line: 977
32
        java.io.DataInputStream.readUnsignedShort() line: 341
33
        java.io.DataInputStream.readUTF(DataInput) line: 593
34
        java.io.DataInputStream.readUTF() line: 568
35
        ...
36
    ]
37
}
38
 
          
39
jdk.SocketRead {
40
    startTime = 20:03:57.651
41
    duration = 0.0617 ms
42
    host = "localhost"
43
    address = "127.0.0.1"
44
    port = 5000
45
    timeout = 0 s
46
    bytesRead = 1 byte
47
    endOfStream = false
48
    eventThread = "main" (javaThreadId = 1)
49
    stackTrace = [
50
        java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
51
        java.net.Socket$SocketInputStream.read() line: 977
52
        java.io.DataInputStream.readUnsignedShort() line: 342
53
        java.io.DataInputStream.readUTF(DataInput) line: 593
54
        java.io.DataInputStream.readUTF() line: 568
55
        ...
56
    ]
57
}
58
 
          
59
jdk.SocketRead {
60
    startTime = 20:03:57.652
61
    duration = 0.0699 ms
62
    host = "localhost"
63
    address = "127.0.0.1"
64
    port = 5000
65
    timeout = 0 s
66
    bytesRead = 10 bytes
67
    endOfStream = false
68
    eventThread = "main" (javaThreadId = 1)
69
    stackTrace = [
70
        java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
71
        java.io.DataInputStream.readFully(byte[], int, int) line: 199
72
        java.io.DataInputStream.readUTF(DataInput) line: 613
73
        java.io.DataInputStream.readUTF() line: 568
74
        pbouda.jfr.sockets.simple.Start.main(String[]) line: 49
75
        ...
76
    ]
77
}
78
 
          
79
my-message



If you want to see how those events are represented in JDK, then here is a clue:

Java




xxxxxxxxxx
1


1
jdk.jfr.events.SocketReadEvent 
2
jdk.jfr.events.SocketWriteEvent



Don't hesitate to navigate to where those events are emitted. JFR events are usually cached and reused later on. In this case, we can see that the object is stored in ThreadLocal variable. The fields are always re-set, the event is emitted, and then the object can be used again by the same thread. Everything is done for the sake of being GC friendly.  

What Interesting Stuff Can We See?

The socket events are very rich and we can see a lot of interesting information. The majority of those fields are very descriptive, just a note about the address/hostname/port. We can very easily see what other applications we are communicating to, don't we call any suspicious endpoint? 

However, my favorite field is duration. We can see big differences between SocketRead events in the snippet above. Why the very first events took more than 1 second and what does it actually mean?

We already know that the client has a 1-second delay between messages. However, the duration field is incredibly valuable because now we can see that something is blocking our thread (of course, we can see it even from the stacktrace because InputStream is blocking by nature). What actually the thread does is that it waits until any data arrives to Receive Socket Buffer to be able to copy it into internal temporary DirectBuffer. We can notice that the other operations were much faster because bytes were already in the socket buffer and we just read the exact number of bytes we needed.

Why Do I See Such a Long Duration?

If you use Java Mission Control to visualize your JFR events, then you very likely noticed that Socket I/O section contains incredible long durations and Total I/O Time. This is the result of having some blocking TCP connections opened and waiting for data. In the case below, it's RMI TCP Connection, but you can come across a very similar problem if you have let's say RabbitMQ and don't consume any messages.

Where Are the Events Emitted From My Endpoint?

You can be also surprised: Why don't I see any other traffic? Let's say we have an application with HTTP Endpoint, which is used very often. The reason is that you probably run on the default JFR configuration that says: only operations longer than 20ms will emit an event. That means that you can celebrate because your application very likely doesn't suffer any serious (I know it's relative, 20ms is not a small latency) Socket IO problems.

Run Netty-Based WebSocket Application

Let's proceed with the second application. As we already mentioned, it's a WebSocket application where the server broadcasts the messages to all connected clients. For this experiment, we're going to connect only one client and see what messages the client receives.

If you know Netty project, you probably noticed that Netty contains several implementations of SocketChannel. Let's start with the JDK-based NIO Socket Channel.

Run a class that automatically starts the client and server: pbouda.jfr.sockets.netty.Start.

First, we can see four events belonging to establishing a WebSocket connection. Notice that we already have a meaningful eventThread field that can help us a lot when we start to investigate any problems. 

Java




xxxxxxxxxx
1
74


1
jdk.SocketWrite {
2
    startTime = 21:21:27.135
3
    duration = 0.677 ms
4
    host = ""
5
    address = "127.0.0.1"
6
    port = 8080
7
    bytesWritten = 292 bytes
8
    eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
9
    stackTrace = [ sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
10
        io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
11
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
12
        io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
13
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
14
        ...
15
    ]
16
}
17
 
          
18
jdk.SocketRead {
19
    startTime = 21:21:27.149
20
    duration = 0.0479 ms
21
    host = ""
22
    address = "127.0.0.1"
23
    port = 41244
24
    timeout = 0 s
25
    bytesRead = 292 bytes
26
    endOfStream = false
27
    eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
28
    stackTrace = [
29
        sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
30
        io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
31
        io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
32
        io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
33
        io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
34
        ...
35
    ]
36
}
37
 
          
38
jdk.SocketWrite {
39
    startTime = 21:21:27.177
40
    duration = 0.102 ms
41
    host = ""
42
    address = "127.0.0.1"
43
    port = 41244
44
    bytesWritten = 129 bytes
45
    eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
46
    stackTrace = [
47
        sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
48
        io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
49
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
50
        io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
51
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
52
        ...
53
    ]
54
}
55
 
          
56
jdk.SocketRead {
57
    startTime = 21:21:27.183
58
    duration = 0.0394 ms
59
    host = ""
60
    address = "127.0.0.1"
61
    port = 8080
62
    timeout = 0 s
63
    bytesRead = 129 bytes
64
    endOfStream = false
65
    eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
66
    stackTrace = [
67
        sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
68
        io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247 
69
        io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147 
70
        io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347 
71
        io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148 
72
        ... 
73
    ] 
74
}



Let's see the events related to our payload (it's not so easy to figure out what events belong to what log record but it does not matter so much right now).

We can notice that we have very often one event for reading and one for writing both containing 12 bytes (it can be also different, we can read just a part of the message and then WebSocket support in Netty needs to wait for another read/buffer to be able to parse the whole WebSocket frame).

Someone can be curious why did we transport 12 bytes in a case of the 10-byte message? There is an overhead of 2 bytes in case of messages smaller than 126 bytes, it's related to WebSocket framing mechanism.

Someone can also be surprised that we didn't experience any durations longer than 1-sec in the case of  SocketRead events as we did in the first example. The reason is that NIO is configured in a non-blocking mode, which means that it does not wait on incoming data (just copies what is currently in socket buffer into JDK DirectBuffer and pass it to Netty and don't care about the number of bytes), and the current thread just tries to do something else instead (it can read from a different channel if we have multiple connections for one event-loop). So, we can see that our application is very likely healthy in terms of blocking threads; it can be a problem if you want to pass a C10K problem (handling thousands of connections in parallel).

Java




xxxxxxxxxx
1
39


 
1
Broadcaster-Server 2020-01-14 21:21:29,181 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
2
 
          
3
jdk.SocketWrite {
4
    startTime = 21:21:29.454
5
    duration = 0.0644 ms
6
    host = "" 
7
    address = "127.0.0.1" 
8
    port = 41244 
9
    bytesWritten = 12 bytes 
10
    eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27) 
11
    stackTrace = [ 
12
        sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138 
13
        io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405 
14
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931 
15
        io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354 
16
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898 
17
        ... 
18
    ] 
19
} 
20
 
          
21
jdk.SocketRead { 
22
    startTime = 21:21:29.454 
23
    duration = 0.0197 ms 
24
    host = "" 
25
    address = "127.0.0.1" 
26
    port = 8080 
27
    timeout = 0 s 
28
    bytesRead = 12 bytes 
29
    endOfStream = false 
30
    eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26) 
31
    stackTrace = [ 
32
        sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73 
33
        io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247 
34
        io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147 
35
        io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347 
36
        io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148 
37
        ... 
38
    ] 
39
}



Show Me Some Possible Optimizations With Netty!

Netty allows us to drive our communication belonging to one connection very effectively. I prepared one example showing that we can very easily decrease the number of syscalls using a cache of messages in Netty. Simply, we decided to sacrifice a latency of one message and make our server application much more efficient by removing syscalls. Let's say we want to flush not every message but every fifth message to the given client.

We need to do two tiny changes:

Java




xxxxxxxxxx
1


 
1
pbouda.jfr.sockets.netty.server.SlowConsumerDisconnectHandler 
2
- we need to comment out flushing of every message and use simple write instead 
3
- write method does not automatically write data into the socket, it waits for a flush context.writeAndFlush(obj) -> context.write(obj)



Java




xxxxxxxxxx
1


 
1
pbouda.jfr.sockets.netty.Start#main 
2
- uncomment the section at the end of the method `Flush a bulk of 5 messages`



Now, we can start the Netty app again. We can see that we received five messages in a bulk and transmit 60 bytes instead of 12 using one syscall/invocation.

Java




xxxxxxxxxx
1
43


 
1
Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes) 
2
Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
3
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes) 
4
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes) 
5
Broadcaster-Server 2020-01-14 22:12:00,939 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes) 
6
 
          
7
jdk.SocketWrite { 
8
    startTime = 22:12:01.603 
9
    duration = 2.23 ms 
10
    host = "" 
11
    address = "127.0.0.1" 
12
    port = 42556 
13
    bytesWritten = 60 bytes 
14
    eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27) 
15
    stackTrace = [ 
16
        sun.nio.ch.SocketChannelImpl.write(ByteBuffer[], int, int) line: 167
17
        io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 420
18
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931 
19
        io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354 
20
        io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898 
21
        ... 
22
    ] 
23
} 
24
 
          
25
jdk.SocketRead { 
26
    startTime = 22:12:01.605 
27
    duration = 0.0757 ms 
28
    host = "" 
29
    address = "127.0.0.1" 
30
    port = 8080 
31
    timeout = 0 s 
32
    bytesRead = 60 bytes 
33
    endOfStream = false 
34
    eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26) 
35
    stackTrace = [ 
36
        sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73 
37
        io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247 
38
        io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147 
39
        io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347 
40
        io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148 
41
        ... 
42
    ] 
43
}



I Use Netty and Don't See Any Events! Why?

There are plenty of reasons, but one of them is a bit tricky. If you are running on Linux and you want to be very efficient, you can enable e-poll transport. It's a non-blocking native transport that can be used only on Linux. The big advantage of this transport is that if we have an application handling tens of thousands of connections in parallel, then, in the case of JDK NIO, we have to iterate over all connections to try if there is any data to read — we have O(N) to read data. Simple, e-poll is able to get the information about readable connections - O(1).

The problem is that the e-poll support is fully developed by Netty team and does not emit any JFR events. If you want to try it out, then:

Java




xxxxxxxxxx
1


1
pbouda.jfr.sockets.netty.server.Server  
2
- uncomment EpollEventLoopGroup instead of NioEventLoopGroup 
3
- uncomment EpollServerSocketChannel instead of NioServerSocketChannel



Summary

JFR events can be a very powerful tool to find weaknesses in our applications. It's worth studying what it offers to us and be ready to use it when we run into any problem. In general, it means:

SocketRead events: We are waiting for incoming data. Our Receive Socket Buffer is very likely empty. In the case of non-blocking transport, we don't receive any events because of the non-blocking nature of the implementation. In the case of blocking transport, we can receive events but it can be some kind of connection that is just idle and waits for any data.

SocketWrite events: The situation is just reversed. We can block in case of a blocking transport if the Send Socket Buffer is full. In the case of non-blocking transport, we can do partial write (write just bytes that fit to Send Socket Buffer) and the rest of bytes need to be cached by our application.

If you want to try more events (even other types), then check out this repo jfr-playground. Otherwise, thank you for reading my article and please leave comments below. If you would like to be notified about new posts, then start following me on Twitter!

Further Reading

Using Java Flight Recorder With OpenJDK 11

Java 11 Features: Java Flight Recorder 

Topics:
java ,java flight recorder ,tcp ,socket ,netty

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}