Indirect communication is a widely used communication paradigm in distributed systems. It decouples distributed entities — message senders and receivers from each other. Every indirect communication technique employs some intermediary service or component to achieve the indirection between distributed entities. Among other indirect communication techniques, Message Broker provides the indirection for publish-subscribe and message queue systems.
This article provides a theoretical and technical background of message broker systems and a performance comparison of commonly used message broker systems today like ActiveMQ, RabbitMQ, Kafka, and AWS SQS.
The Architectural Elements in Distributed Systems
George Coulouris, et al. describes system models for distributed systems in their book.
As it depicts in above figure, the distributed systems compose of two main architectural elements: distributed entities and communication paradigms. Distributed entities represent any computation processes that communicate with each other over the network using some communication paradigms.
There are three main communication paradigms in distributed systems: interprocess communication, remote invocation, and indirect communication. Both interprocess communication and remote invocation have two aspects in common.
Firstly, the identities of distributed entities are known to each other. Secondly, the entities exist at the same time. That is, distributed entities in interprocess communication and remote invocation paradigms communicate directly with each other; hence, identities of distributed entities are known to each and should exist at the same time. But most of the modern distributed applications do not follow these aspects.
For example, applications running on mobile devices may not get unique network identities while devices switch over different networks. Also, they can be unavailable at any time. The indirect communication paradigm addresses these problems allowing space and time uncoupling of distributed entities.
Space uncoupling: The distributed/communication entities do not know identities of each other. This property provides great flexibility in designing distributed systems. It allows replication of distributed entities to improve scalability and fault tolerance. Also, it allows easy upgrade and migration of system components.
Time uncoupling: The distributed/communication entities do not need to exist at the same time. This property allows distributed entities to have their own lifetime.
Indirect Communication Paradigm
There are four main techniques in indirect communication paradigm. All these techniques provide space and time uncoupling for distributed entities via some intermediary service component.
As shown in the following figure, the indirect communication techniques can be categorized into two groups based on their technical design. In state-based models, the distributed entities are decoupled via an intermediary service that manages shared states for distributed entities. In the message-based approach, distributed entities communicate with each other by publishing or sending messages to an intermediary service, which ensure reliable message delivery on relevant entities.
Distributed/communication entities in both publish-subscribe and message queue models have two primary roles in common: message sender/publisher and message receiver/subscriber. Also, both has a similar challenge of reliable message delivery. The similar requirements for a message oriented intermediary service has resulted emerge of message broker systems as a separate service component.
Message Broker Systems
The message broker systems provide a message-oriented intermediary for publish-subscribe and message queue communication models. In most cases, the same message broker supports both communication models. The simplicity and message-oriented nature in message brokers have made broad adoption in many distributed application domains.
In addition to its indirect communication aspect, the message brokers are also considered as an integration hub for connecting diverse applications and its protocols, specifically in enterprise application domains. That is, the message brokers intrinsically support for integration of enterprise applications. Gregor Hohpe and Bobby Woolf explain how message brokers are considered as a placeholder for enterprise integration patterns.
Basically, the message indirection service in message brokers is extended to support various message integration patterns such as pipe-filters, message routers, and message translator, etc.
The extension of the message broker as an enterprise integration hub has opened up many application integration possibilities. One major use-case is setting up a pipeline of message flows that spans various enterprise applications, but with the challenge of interoperability between multiple message brokers.
The interoperability between applications is improved and resolved with the introduction of standard message specifications and protocols for message brokers. The following presents a set of commonly known messaging specifications and protocols today.
JMS: Java Message Service is part of Java EE platform. It provides a standard API for Java EE applications to create, send, and receive messages. Any applications that follow JMS API are deployable on any Java EE compliance applications servers. It ensures reliable delivery of messages. It supports both point-to-points and publish-subscribes messaging domains. That is, any JMS complains message broker systems supports both publish-subscribe and message queue indirection services in a single framework.
AMQP: Advanced Message Queuing Protocol mainly designed for supporting interoperability between different vendor products. It supports reliable message service in both Publish-subscribe and Message queue model with additional features such as routing, transaction, and Security.
MQTT: Message Queue Telemetry Transport supports only publish-subscribe messaging model, and specifically designed for resource constrained devices such as mobile and Internet of Things applications.
STOMP: Simple/Streaming Text Oriented Messaging is a simple and light-weight text-based protocol similar to HTTP.
Most of the message broker systems adhere to one or more messaging protocols/specifications to improve their interoperability. Most of the traditional message brokers such as ActiveMQ, RabbitMQ, and WebSpehereMQ support both publish-subscribe and message queue models. In addition, these traditional systems support various other features extended from their supported specifications and protocols.
For example, JMS drives message acknowledgment and transaction options for each message and may add significant overhead to the message flow. For this reason, most of the traditional message brokers are considered inapplicable for modern data-intensive needs. With compare to traditional approaches, modern message broker systems like Kafka and AWS SQS support only base messaging features and has focused on scalability over growing intensive data loads.
A Comparison of Message Broker Systems (ActiveMQ, RabbitMQ, Kafka, and AWS SQS)
The message brokers today can be categorized as modern and traditional systems. ActiveMQ and RabbitMQ are selected as traditional approaches, whereas Kafka and AWS SQS are considered to represent modern distributed approaches for the comparison test. The message latency, the time taken for a message to get from producer to consumer, is measured for different message sizes and batch sizes in this comparison.
Testbed for the Comparison
A simple testbed was designed and implemented for collecting the message latency measurements for different test scenarios. The following diagram shows the design overview of the testbed, which is extendible to apply the same test scenarios with any other message broker systems.
The source code of the above implementation can be found here.
To collect fair measurements, the testbed is set up on AWS cloud environment, which has greater control and flexibility in arranging resources. Each consumer, producer, and message broker components are setup on its own AWS EC2 instances, but in different availability zones. EC2 instances are created from Amazon Linux AMI with instance type t2.micro.
AWS Linux AMI ensures Network Time Protocol (NTP) are configured for all EC2 instances and clocks are in sync. This is an important factor for the message latency measurement. The following figure shows the execution environment setup for all test scenarios:
Three main test cases were considered based on message size and message load. In the first test case, 250 messages in 25 batches are generated at the producer and send to the message broker. This was carried out for messages with size, 32 Bytes, 128 Bytes, 1KB, 4KB, 16KB, 64KB, and 256KB. Each message was piggyback with the timestamp when it was sent. The consumer continuously listened on the message broker and calculated the time difference (latency) when the message was received at the consumer. The latency was calculated for each message and recorded for the data analysis.
In the second test case, total 2400 messages of size 32 bytes was generated and sent to the Message broker in batches, 25, 50, 100, 200, 400, and 800. As in first test case, the latency was calculated for individual messages and recorded for the analysis.
The third test case considered CPU usage of the Message broker. The messages size of 1 KB was
continuously generated in 100 batches by the producer. The consumer continuously listened and read the messages. Meanwhile, CPU usage of each Message broker was monitored and recorded for the analysis.
Results and Analysis
Latency Over Message Size
As it depicts in the following figure, ActiveMQ, RabbitMQ and Kafka show a clear trend that latency gets increased over message size. AWS SQS does not show such a clear trend. The latency range in SQS is high compared to other systems for each message size. It can be seen in almost every message size: RabbitMQ shows a lower latency compare to Kafka.
Latency Over Batch Size
In this test scenario, 32 bytes fixed size messages were generated while increasing the batch size sending a total of 2,400 messages. (Note: AWS SQS was not considered in this test scenario as it showed clear high latency from previous test case). The following figure shows the result of the test scenario. It is clearly seen, Kafka tends to perform well compare to others on high message load.
The figure below shows the CPU usage of each Message broker system, while producers continuously generating 1KB messages in 100 batches, and consumer continuously retrieving messages from the broker. The test case carried out simultaneously on all message brokers. The result shows Kafka still uses low CPU usage while providing better message latency compare to ActiveMQ and RabbitMQ.
CPU usage of message brokers.
The message brokers provide an architecturally simple and scalable solution among other indirect communication approaches based on the technical and theoretical background of indirect
communication technique. The message broker systems today can be categorized as traditional and modern approaches. ActiveMQ and RabbitMQ inherit tradition message broker features, while Kafka and AWS SQS represents modern approaches of message broker systems. The performance comparison of these message broker systems show that traditional approaches like RabbitMQ outperform modern systems like Kafka and SQS but in low message load. Kafka tends to perform well when increasing the message load. The CPU usage result shows that Kafka handles a high throughput of messages still with low CPU usage.
- G. Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design.
- G. Hohpe and B. Woolf. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions.
- E. Jendrock et al. The Java EE 6 Tutorial — Java Message Service Concepts.
- AMQP Advanced Message Queuing Protocol.
- STOMP: The Simple Text-Oriented Messaging Protocol.
- Prakash Malani. Transaction and Redelivery in JMS.