Cometd-2 Throughput vs Latency
With the imminent release of cometd-2.0.0, it's time to publish some of our own lies, damned lies and benchmarks. It has been over 2 years since we published "20,000 reasons that cometd scales", and in that time we have completely reworked both the client side and the server side of cometd, and we have moved to Jetty 7.1.4 from Eclipse as the main web server for cometd. The main improvements include:
- Improved Java API for both client and server side interaction.
- Improved concurrency in the server and client code base.
- Fully pluggable transports.
- Support for a websocket transport (which works with the latest Chromium browsers).
- Improved extensions.
- More comprehensive testing and examples.
- More graceful degradation under extreme load.
The result has been a dramatic increase in throughput while maintaining sub-second latencies and great scalability.
The chart above shows the preliminary results of recent benchmarking carried out by Simone Bordet for a 100-room chat server. The test was run on Amazon EC2 nodes, each with 2 x amd64 CPUs and 8GB of memory, running Ubuntu (Linux 2.6.32) with Sun's 1.6.0_20-b02 JVM. Simone did some tuning of the Java heap and garbage collector, but the operating system was not customized other than to increase the file descriptor limits. The test used the HTTP long-polling transport. A single machine was used as the server, and 4 identical machines generated the load using the cometd Java client that is bundled with the cometd release.
It is worth remembering that the measured latencies and throughput include the time spent in the client load generators, each of which runs the full HTTP/cometd stack for many thousands of clients, whereas in a real deployment each client would have its own computer/browser. It is also noteworthy that the server is not a dedicated comet server but the fully featured Jetty Java Servlet container, and the cometd messages are handled within the rich application context it provides.
As the chart above shows, the message rate has improved significantly from the 3800/s achieved in 2008. All scenarios tested were able to achieve 10,000 messages per second, and latency remained excellent except with 20,000 clients, where the average latency started to climb rapidly once the message rate exceeded 8000/s. The highest average server CPU usage was 140/200 (that is, 140% of the 200% available from the server's 2 CPUs), and for the most part latencies were under 100ms over the Amazon network, which indicates that the server has some spare capacity. Our experience of cometd in the wild indicates that you can expect another 50 to 200ms of network latency when crossing the public internet, but thanks to the asynchronous design of cometd, the extra latency does not reduce throughput.
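In a chat benchmark the delivered message rate is much higher than the publish rate, because each published message is fanned out to every member of its room. The following is a minimal back-of-the-envelope sketch, assuming the clients are spread evenly over the rooms and that every member, including the sender, receives each message; the figures are those of the single load generator whose output appears in this post, and the class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope fan-out arithmetic for a multi-room chat benchmark.
 * Assumptions (not taken from the cometd code): clients are spread evenly
 * across rooms, and every member of a room, including the sender, receives
 * each message published to that room.
 */
public final class FanOut {
    private FanOut() {}

    /** Average number of subscribers per room. */
    static double clientsPerRoom(int clients, int rooms) {
        return (double) clients / rooms;
    }

    /** Expected deliveries when the given number of messages is fanned out to their rooms. */
    static long expectedDeliveries(long published, int clients, int rooms) {
        return Math.round(published * clientsPerRoom(clients, rooms));
    }

    /** Messages per second over an elapsed wall-clock interval in milliseconds. */
    static double ratePerSecond(long messages, long elapsedMillis) {
        return messages * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Figures from one load generator's output in this post.
        long published = 3000;
        int clients = 2500;
        int rooms = 100;
        System.out.println("Clients per room: " + clientsPerRoom(clients, rooms));
        System.out.println("Expected deliveries: " + expectedDeliveries(published, clients, rooms));
        System.out.printf("Publish rate: %.1f messages/s%n", ratePerSecond(published, 30164));
        System.out.printf("Delivery rate: %.1f messages/s%n", ratePerSecond(75081, 30470));
    }
}
```

With these figures it gives 25 clients per room and 75,000 expected deliveries (the generator reported 75,081, possibly because the rooms were not exactly equal in size), a publish rate of about 99 messages/s and a delivery rate of about 2464 messages/s, in line with the Outgoing and Incoming rates in the output.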
Below is an example of the raw output from one of the 4 load generators. It shows some of the capabilities of the Java cometd client, which can be used to develop load generators specific to your own application:
Statistics Started at Mon Jun 21 15:50:58 UTC 2010
Operative System: Linux 2.6.32-305-ec2 amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 16.3-b01 1.6.0_20-b02
System Memory: 93.82409% used of 7.5002174 GiB
Used Heap Size: 2453.7236 MiB
Max Heap Size: 5895.0 MiB
Young Generation Heap Size: 2823.0 MiB
- - - - - - - - - - - - - - - - - - - -
Testing 2500 clients in 100 rooms
Sending 3000 batches of 1x50B messages every 8000µs
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Mon Jun 21 15:51:29 UTC 2010
Elapsed time: 30164 ms
Time in JIT compilation: 12 ms
Time in Young Generation GC: 0 ms (0 collections)
Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 1848.7974 MiB
Garbage Generated in Survivor Generation: 0.0 MiB
Garbage Generated in Old Generation: 0.0 MiB
Average CPU Load: 109.96191/200
Outgoing: Elapsed = 30164 ms | Rate = 99 messages/s - 99 requests/s
Waiting for messages to arrive 74450/75081
All messages arrived 75081/75081
Messages - Success/Expected = 75081/75081
Incoming - Elapsed = 30470 ms | Rate = 2464 messages/s - 2368 responses/s (96.14%)
Messages - Wall Latency Distribution Curve (X axis: Frequency, Y axis: Latency):
@ _ 56 ms (19201, 25.57%)
@ _ 112 ms (33230, 44.26%) ^50%
@ _ 167 ms (10282, 13.69%)
@ _ 222 ms (3438, 4.58%) ^85%
@ _ 277 ms (2479, 3.30%)
@ _ 332 ms (1647, 2.19%)
@ _ 388 ms (1462, 1.95%) ^95%
@ _ 443 ms (971, 1.29%)
@ _ 498 ms (424, 0.56%)
@ _ 553 ms (443, 0.59%)
@ _ 609 ms (309, 0.41%)
@ _ 664 ms (363, 0.48%)
@ _ 719 ms (338, 0.45%) ^99%
@ _ 774 ms (289, 0.38%)
@ _ 829 ms (153, 0.20%) ^99.9%
@ _ 885 ms (46, 0.06%)
@ _ 940 ms (3, 0.00%)
@ _ 995 ms (1, 0.00%)
@ _ 1050 ms (1, 0.00%)
@ _ 1105 ms (1, 0.00%)
Messages - Wall Latency Min/Ave/Max = 1/120/1105 ms
Messages - Network Latency Min/Ave/Max = 1/108/1100 ms
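The ^50%, ^85%, ^95%, ^99% and ^99.9% markers in the distribution curve flag the bucket at which the cumulative message count first reaches each percentile. A minimal Java sketch of that calculation, using the bucket counts from this output, is shown below. It assumes each label is the bucket's upper bound (the last label, 1105 ms, matches the reported maximum latency) and is an illustration, not the code of the cometd load generator; the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Illustrative sketch (not the cometd load generator's code): find the
 * histogram bucket at which the cumulative count first reaches each
 * percentile. Bucket labels are treated as upper bounds in milliseconds.
 */
public final class LatencyPercentiles {
    private LatencyPercentiles() {}

    // Bucket labels and counts copied from the sample output.
    static final long[] UPPER_MS = {
        56, 112, 167, 222, 277, 332, 388, 443, 498, 553,
        609, 664, 719, 774, 829, 885, 940, 995, 1050, 1105
    };
    static final long[] COUNTS = {
        19201, 33230, 10282, 3438, 2479, 1647, 1462, 971, 424, 443,
        309, 363, 338, 289, 153, 46, 3, 1, 1, 1
    };

    /** For each percentile (0 to 100), returns the upper bound of the bucket where it is reached. */
    static long[] percentileBuckets(long[] upperMs, long[] counts, double[] percentiles) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        long[] result = new long[percentiles.length];
        for (int p = 0; p < percentiles.length; p++) {
            double threshold = percentiles[p] / 100.0 * total;
            long cumulative = 0;
            // Fall back to the last bucket in case rounding leaves the threshold unmet.
            result[p] = upperMs[upperMs.length - 1];
            for (int i = 0; i < counts.length; i++) {
                cumulative += counts[i];
                if (cumulative >= threshold) {
                    result[p] = upperMs[i];
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] percentiles = {50, 85, 95, 99, 99.9};
        long[] buckets = percentileBuckets(UPPER_MS, COUNTS, percentiles);
        System.out.println("Total messages: " + Arrays.stream(COUNTS).sum());
        for (int i = 0; i < percentiles.length; i++) {
            System.out.printf("p%s <= %d ms%n", percentiles[i], buckets[i]);
        }
    }
}
```

The counts sum to 75,081, the number of messages received, and the sketch places the 50th, 85th, 95th, 99th and 99.9th percentiles in the 112, 222, 388, 719 and 829 ms buckets, the same buckets marked in the output.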
As time permits, we would like to update our Java client to also support the websocket protocol, so that we can generate the load from 20,000 websocket clients and see how much this new protocol may further improve throughput and latency.