Getting Notified About RabbitMQ Cluster Partitioning

If you are running RabbitMQ in a cluster, it is quite possible that the cluster will get partitioned at some point (part of the cluster losing its connection to the rest). The basic commands to show the status and configure the behaviour are explained in the RabbitMQ partitions guide. When partitioning happens, you want two things: first, to be notified about it, and second, to resolve it.

RabbitMQ can handle the second automatically, through the cluster_partition_handling configuration setting. It has three values: ignore, pause_minority and autoheal. The partitions guide explains that as well (“Which mode should I pick?”). Note that whatever you choose, you still have a problem and you have to restore the connectivity. For example, in a multi-availability-zone setup like the one I explained a while ago, it’s probably better to use pause_minority and then to reconnect manually.

Fortunately, it’s rather simple to detect partitioning. The output of the cluster_status command contains an empty “partitions” element if there is no partitioning. If there are partitions, the element is either non-empty or missing altogether. So this line does the detection, leaving clusterOK at 0 when the cluster is partitioned:

clusterOK=$(sudo rabbitmqctl cluster_status | grep "{partitions,\[\]}" | wc -l)
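The same check can be wrapped in a small function so that it is easy to try against captured output without touching a live broker. This is only a sketch: the function name partition_state is made up for illustration, and it assumes cluster_status prints the Erlang-term style output that the grep above expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): reads the output of
# `rabbitmqctl cluster_status` on stdin and prints "ok" when an empty
# partitions element is present, "partitioned" otherwise.
# A missing element (for example when the command failed) is also
# reported as "partitioned", matching the rule described above.
partition_state() {
    # -F treats the pattern as a fixed string, so the brackets need no escaping
    if grep -qF '{partitions,[]}'; then
        echo "ok"
    else
        echo "partitioned"
    fi
}
```

In the monitoring script it could be called as `sudo rabbitmqctl cluster_status | partition_state`.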

You would want to put that line in a script and schedule it to run regularly, for example every minute. What you do with the result depends on the monitoring tool you use (Nagios, CloudWatch, etc.). For Nagios there is a ready-to-use plugin. For AWS CloudWatch, you can publish a metric as follows:

if [ "$clusterOK" -eq 0 ]; then
    # No empty partitions element found: report 1 (partitioned)
    echo "RabbitMQ cluster is partitioned"
    aws cloudwatch put-metric-data --metric-name "$METRIC_NAME" --namespace "$NAMESPACE" --value 1 --dimensions "Stack=$STACKNAME" --region "$REGION"
else
    aws cloudwatch put-metric-data --metric-name "$METRIC_NAME" --namespace "$NAMESPACE" --value 0 --dimensions "Stack=$STACKNAME" --region "$REGION"
fi
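To try the publishing step without sending anything to CloudWatch, the two calls can be folded into one function with a dry-run switch. This is a sketch under assumptions: the function names and the DRY_RUN variable are made up for illustration, while METRIC_NAME, NAMESPACE, STACKNAME and REGION are the same variables the script above expects to be set.

```shell
#!/bin/sh
# Hypothetical helper: maps the grep match count to the metric value.
# 0 matches of the empty partitions element means partitioned (1),
# anything else means healthy (0).
partition_metric_value() {
    if [ "$1" -eq 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Hypothetical wrapper: publishes the value, or only prints the command
# when DRY_RUN=1 (an assumed variable, not an AWS CLI feature).
publish_partition_metric() {
    value=$(partition_metric_value "$1")
    # Build the command as positional parameters so it can be printed or run
    set -- aws cloudwatch put-metric-data \
        --metric-name "$METRIC_NAME" --namespace "$NAMESPACE" \
        --value "$value" --dimensions "Stack=$STACKNAME" --region "$REGION"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

With this in place, the script would end with `publish_partition_metric "$clusterOK"`, and running it once with DRY_RUN=1 shows the exact command before it is scheduled.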

When partitioning happens, the important thing is getting notified about it. What you do after that depends on the particular application, the problem, and the configuration of your queues (durable, mirrored, etc.).

Published at DZone with permission of Bozhidar Bozhanov, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
