
10 Useful Tips To Implement Distributed Fail-Over



It is no secret that automatic fail-over in distributed environments is no picnic to implement. Here are some useful pointers if you ever decide to do it on your own:

  1. Make sure to implement some sort of heartbeat protocol. A heartbeat is a message that every node emits to tell others that it's alive. It is usually implemented with IP Multicast, although the actual communication protocol is not important here. Other nodes will consider a node failed after it misses a certain pre-configured number of heartbeats.
  2. Account for delays in node discovery. There is always a time window between an actual node crash and when other nodes find out about it.
  3. Store all messages on the sender node until they get processed. This way you can fail them over to other nodes if the processing node fails.
  4. Account for the possibility of receiving multiple notification events for the same node failure - you don't want to process the same fail-over event more than once.
  5. Make sure that your message does not get failed over forever, i.e. does not keep jumping between grid nodes indefinitely. After a certain number of fail-over attempts, let the whole processing of the message fail.
  6. Make sure that your message does not get failed over to the same node it failed on initially - always give preference to other grid nodes.
  7. Make sure that message failure is not limited to node crashes. For example, you may want to fail over a message if it threw an exception on the remote node or returned a bad result.
  8. Avoid sending any messages from within synchronized blocks, i.e. while holding a lock - this is a sure way to introduce deadlocks into your code.
  9. Make sure that fail-over happens automatically at infrastructure level and is transparent to your application logic.
  10. Provide a good interface for your fail-over module and make it pluggable - fail-over logic, such as selecting a new node, may differ based on your application policy, so it is essential to be able to easily switch the underlying implementation.
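
To make the heartbeat-related tips (1, 2 and 4) more concrete, here is a minimal sketch of a failure detector. The class name `HeartbeatMonitor`, its methods, and the interval and threshold parameters are all assumptions made for illustration; they are not taken from GridGain or any other library. Time is passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical failure detector; names and thresholds are illustrative only. */
class HeartbeatMonitor {
    private final long intervalMillis; // expected gap between two heartbeats
    private final int maxMissed;       // missed heartbeats tolerated before failure
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final Set<String> reportedFailed = new HashSet<>();

    HeartbeatMonitor(long intervalMillis, int maxMissed) {
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
    }

    /** Records a heartbeat; a node previously reported as failed counts as rejoined. */
    synchronized void onHeartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
        reportedFailed.remove(nodeId);
    }

    /** Returns nodes newly detected as failed; each failure is reported only once. */
    synchronized List<String> detectFailures(long nowMillis) {
        List<String> newlyFailed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            long silence = nowMillis - entry.getValue();
            // Set.add returns false for a node already reported, which
            // filters out duplicate notifications for the same failure.
            if (silence > intervalMillis * maxMissed && reportedFailed.add(entry.getKey())) {
                newlyFailed.add(entry.getKey());
            }
        }
        return newlyFailed;
    }
}
```

Note that `detectFailures` only returns the list of failed nodes. The caller is expected to send any fail-over notifications after the method returns, outside the lock, which is in line with tip 8. The detection delay from tip 2 shows up here as `intervalMillis * maxMissed`: a crashed node is still considered alive for at least that long.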
Of course, you could always download GridGain and get all of the above right out of the box.
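
If you do roll your own, tips 3, 5, 6, 7 and 10 can be sketched together as a small sender-side dispatcher with a pluggable node-selection policy. Everything below (`FailoverPolicy`, `FirstUntriedPolicy`, `FailoverDispatcher`, and the `send` callback standing in for the remote call) is hypothetical and only meant to illustrate the shape of the idea. As a simplification, this policy never retries a node that already failed, which is stricter than merely preferring other nodes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;

/** Hypothetical pluggable policy: picks the next node for a failed message. */
interface FailoverPolicy {
    Optional<String> selectNode(List<String> aliveNodes, Set<String> alreadyTried);
}

/** One possible policy: the first alive node that has not been tried yet. */
class FirstUntriedPolicy implements FailoverPolicy {
    @Override
    public Optional<String> selectNode(List<String> aliveNodes, Set<String> alreadyTried) {
        return aliveNodes.stream().filter(n -> !alreadyTried.contains(n)).findFirst();
    }
}

/** Hypothetical dispatcher; the message stays with the sender until processed. */
class FailoverDispatcher {
    private final FailoverPolicy policy;
    private final int maxAttempts; // upper bound so a message cannot bounce forever

    FailoverDispatcher(FailoverPolicy policy, int maxAttempts) {
        this.policy = policy;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Tries nodes until one succeeds. The send callback (node, message) stands in
     * for the remote call and is assumed to throw on a node crash, a remote
     * exception, or a result the application considers bad.
     */
    <R> R dispatch(String message, List<String> aliveNodes,
                   BiFunction<String, String, R> send) {
        Set<String> tried = new HashSet<>();
        RuntimeException lastError = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<String> node = policy.selectNode(aliveNodes, tried);
            if (!node.isPresent()) {
                break; // no untried node left
            }
            tried.add(node.get());
            try {
                return send.apply(node.get(), message);
            } catch (RuntimeException e) {
                lastError = e; // remember the cause and fail over to another node
            }
        }
        throw new IllegalStateException(
            "Fail-over exhausted after " + tried.size() + " attempt(s)", lastError);
    }
}
```

Application code only calls `dispatch` and sees either a result or a single final failure, so the retries stay at the infrastructure level as tip 9 suggests. Swapping in a different `FailoverPolicy` changes node selection without touching the dispatcher.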


Published at DZone with permission of Dmitriy Setrakyan, DZone MVB. See the original article here.

