Azure Event Grid WebHooks (Part 3): Retries

Bring some resilience to your app with a Retry pattern set up through Azure Event Grid. Here, we create a Retry policy designed around WebHook requests.

Building distributed systems is challenging. If not carefully designed and implemented, a failure in one component can cause cascading failures that affect the whole system. That's why patterns like Retry and Circuit Breaker should be considered to improve system resilience. In the case of sending WebHooks, the situation might be even worse, because your system is calling a fully external system with no availability guarantees, and it does so over the internet, which is less reliable than your internal network.

Continuing on from the previous parts of this series (Part 1, Part 2), I'll show how to use Azure Event Grid to overcome this challenge.

Azure Event Grid Retry Policy

Azure Event Grid provides a built-in capability to retry failed requests with exponential backoff, which means that if the WebHook request fails, it will be retried with increased delays.

As per the documentation, a failed request is retried after 10 seconds. If it keeps failing, further retries follow after 30 seconds, 1 minute, 5 minutes, 10 minutes, 30 minutes, and 1 hour. These are not exact intervals, because Azure Event Grid adds some randomization to them.

Events that take more than 2 hours to be delivered will expire. This duration should be increased to 24 hours after the preview phase.
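To make the schedule concrete, here is a minimal sketch in C# of how the retry offsets could be laid out. It is not how Azure Event Grid is implemented. The class name RetrySchedule, the jitterFraction parameter, and the rule that the last delay repeats once the list is used up are all assumptions made for illustration, as is the idea that each delay is measured from the previous attempt.

```csharp
using System;
using System.Collections.Generic;

public static class RetrySchedule
{
    // Nominal delays quoted from the documentation; the real service randomizes them.
    private static readonly TimeSpan[] NominalDelays =
    {
        TimeSpan.FromSeconds(10),
        TimeSpan.FromSeconds(30),
        TimeSpan.FromMinutes(1),
        TimeSpan.FromMinutes(5),
        TimeSpan.FromMinutes(10),
        TimeSpan.FromMinutes(30),
        TimeSpan.FromHours(1),
    };

    // Returns the offsets (measured from the first failed attempt) at which
    // retries would fire, stopping once a retry would fall outside the expiry window.
    // jitterFraction is a hypothetical knob: 0.1 varies each delay by up to +/-10%.
    public static List<TimeSpan> RetryOffsets(TimeSpan expiry, double jitterFraction, Random random)
    {
        var offsets = new List<TimeSpan>();
        var elapsed = TimeSpan.Zero;

        for (int attempt = 0; ; attempt++)
        {
            // Assumption: once the listed delays are used up, the last one repeats.
            var nominal = NominalDelays[Math.Min(attempt, NominalDelays.Length - 1)];
            double factor = 1.0 + jitterFraction * (2.0 * random.NextDouble() - 1.0);
            elapsed += TimeSpan.FromSeconds(nominal.TotalSeconds * factor);

            if (elapsed > expiry)
            {
                break;
            }
            offsets.Add(elapsed);
        }

        return offsets;
    }
}
```

Under these assumptions, calling RetrySchedule.RetryOffsets(TimeSpan.FromHours(2), 0.0, new Random()) yields seven retries, the last one 6,400 seconds after the first failure, so an event gets at most eight delivery attempts before the 2-hour expiry.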

This behavior is not trivial to implement, which adds to the reasons why using a service like Azure Event Grid should be considered as an alternative to implementing its capabilities from scratch.

Testing Azure Event Grid Retry

To try this capability, and building on the example used in Part 1, I made a change to the AWS Lambda function that receives the WebHook to introduce random failures:

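public object Handle(Event[] request)
{
    Event data = request[0];

    // Subscription validation handshake: echo the code back to Event Grid.
    // Assumed guard: only validation events carry a validationCode.
    if (data.Data.validationCode != null)
        return new { validationResponse = data.Data.validationCode };

    // Next(1, 11) yields 1..10, so values 6..10 fail about half of the requests.
    var random = new Random(Guid.NewGuid().GetHashCode());
    var value = random.Next(1, 11);

    if (value > 5)
        throw new Exception("Failure!");

    return "";
}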

The random check at the end of the handler fails roughly 50% of requests. When I pushed an event (as shown in the previous posts) to 1,000 WebHook subscribers, the result was the chart below, depicting the number of API calls per minute and the number of 500 errors per minute:

Number of requests per minute (Blue) - Number of 500 Errors per minute (Orange)

We can observe the following:

  • The number of errors (orange) is almost half the number of requests (blue)
  • The number of requests per minute is around 1,500 for the first minute. My explanation is that with 1,000 listeners and a 50% failure rate, Azure Event Grid made about 500 extra requests to retry the failed deliveries.
  • After a bit less than 2 hours (not shown in the chart for size constraints), the number of errors dropped to 5 and no more requests were made. This is due to the expiration period during the preview.
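These observations can be checked with some back-of-the-envelope arithmetic. The sketch below assumes that every delivery fails independently with the same probability, and that an event gets eight delivery attempts (the first attempt plus seven retries) before the 2-hour expiry. That attempt count is an estimate derived from adding up the nominal retry delays, not a documented figure, and the class and method names are hypothetical.

```csharp
using System;

public static class RetryEstimate
{
    // Expected number of subscribers that have failed all of the given attempts,
    // assuming each delivery fails independently with probability failureRate.
    public static double ExpectedStillFailing(int subscribers, double failureRate, int attempts)
    {
        return subscribers * Math.Pow(failureRate, attempts);
    }

    // Expected total number of requests sent across the given number of delivery rounds.
    public static double ExpectedTotalRequests(int subscribers, double failureRate, int attempts)
    {
        double total = 0;
        double pending = subscribers;

        for (int i = 0; i < attempts; i++)
        {
            total += pending;        // every pending subscriber gets one request this round
            pending *= failureRate;  // only the failed deliveries carry over to the next round
        }

        return total;
    }
}
```

With 1,000 subscribers and a failure rate of 0.5, ExpectedTotalRequests for two rounds gives 1,500, in line with the first-minute figure above if two rounds landed in that minute. ExpectedStillFailing for eight attempts gives about 3.9, the same order of magnitude as the handful of errors still outstanding when the events expired.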


Azure Event Grid is a scalable and resilient service that can be used to handle thousands (maybe more) of WebHook receivers. Whether your solution is hosted on-premises or on Azure, you can use this service to offload a lot of work and effort.

I wish that Azure Event Grid could give some insights on how events are pushed and received, which would help a lot in troubleshooting, as the subscriber is usually not under your control. I hope this will become an integrated part of the Azure portal.

It's worth mentioning that other cloud providers support similar functionality, specifically Amazon Simple Notification Service (SNS) and Google Cloud Pub/Sub. Both have overlapping functionality with Azure Event Grid.

Published at DZone with permission of
