Your Lambda Function Might Execute Twice. Deal With It!

Tired of your Lambda functions triggering twice? Let's see why it happens and how to keep your functions working and idempotent.

By Michael Wittig · Nov. 12, 17 · Tutorial

Are you confused when scheduled Lambdas execute twice, SNS messages trigger an invocation three times, or your handmade S3 inventory is out of date because events occurred twice? Bad news: Sooner or later, your Lambda function will be invoked multiple times. You have to deal with it! The reasons are retries on errors and event sources that guarantee at-least-once delivery (e.g., CloudWatch Events, SNS, …).

Your Lambda function will be invoked multiple times

How do you know that your Lambda function is broken (or not idempotent)? If your function is given the same input (AKA event) multiple times, the function MUST produce the same result. If your function produces different results with the same input, the implementation is not idempotent, and you are in big trouble.

You may ask yourself how to fix it. Let’s work with a concrete example. Imagine a Lambda function that ensures a user can only make a specific number of requests per day. A request could be an upload to an S3 bucket, sending a message, whatever. In other words, the Lambda function implements rate limiting. To do so, you need to store some state. A good place to store the state is DynamoDB. Luckily, DynamoDB offers many features to fix your problem.

1. Iteration: The Not Idempotent Implementation

The first iteration is the simplest implementation, but it is also a broken one.

The input event looks like this:

{
    "user": "u1"
}


The function uses a DynamoDB table ratelimit with the primary key id (partition key). The id consists of the user and the current date (yyyy-mm-dd). Additionally, a calls attribute of type number is used to track the number of calls.

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({
    apiVersion: '2012-08-10'
});
const limit = 8;
exports.handler = (event, context, cb) => {
    const date = new Date().toISOString().slice(0, 10);
    const id = `iteration1:${event.user}:${date}`;
    dynamodb.updateItem({
        TableName: `ratelimit`,
        Key: {
            id: {
                S: id
            },
        },
        // ADD creates the calls attribute if it is missing and increments it atomically
        UpdateExpression: 'ADD calls :one',
        ExpressionAttributeValues: {
            ':one': {
                N: '1'
            }
        },
        ReturnValues: 'ALL_NEW'
    }, function(err, data) {
        if (err) {
            cb(err);
        } else {
            const calls = parseInt(data.Attributes.calls.N, 10);
            cb(null, {
                limited: calls > limit
            });
        }
    });
};



The implementation is not idempotent. The calls attribute is incremented even if the invocation is just a retry, so a retried request is counted twice and the user is limited too early. Let’s fix that!
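To see the problem without touching AWS, the following sketch replaces the DynamoDB table with an in-memory Map. The names table and rateLimit, and the explicit date parameter, are hypothetical stand-ins for this illustration; only the counting logic mirrors the handler above.

```javascript
// In-memory stand-in for the ratelimit table (assumption: illustration only).
const table = new Map();
const limit = 8;

// Mirrors the counter logic of the first iteration: every call increments.
function rateLimit(event, date) {
    const id = `iteration1:${event.user}:${date}`;
    const calls = (table.get(id) || 0) + 1;
    table.set(id, calls);
    return { limited: calls > limit, calls };
}

const event = { user: 'u1' };
const first = rateLimit(event, '2017-11-07');
// Lambda retries the invocation with the very same event.
const retry = rateLimit(event, '2017-11-07');

// The retry was counted as a second call.
console.log(first.calls, retry.calls); // prints: 1 2
```

One logical request has consumed two of the eight allowed calls.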

2. Iteration: The Mostly Idempotent Implementation

Let’s try to fix the first iteration. Add a request id to the event.

The input event looks like this:

{
    "user": "u1",
    "request": "r1"
}


All event sources provide some unique id that you can use as the request id. Some examples:

  • Kinesis: Records[].eventID
  • SNS: Records[].Sns.MessageId
  • API Gateway: requestContext.requestId
  • Scheduled CloudWatch Event: id
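As a rough sketch, a helper could pick the request id out of the event depending on where it came from. The function name extractRequestIds and the way the event source is detected are assumptions made for this example; the field names are the ones listed above.

```javascript
// Hypothetical helper: returns the request ids carried by an event.
// Detecting the source by the shape of the event is an assumption.
function extractRequestIds(event) {
    if (Array.isArray(event.Records)) {
        return event.Records.map((record) => {
            if (record.Sns) {
                return record.Sns.MessageId; // SNS
            }
            return record.eventID; // Kinesis
        });
    }
    if (event.requestContext && event.requestContext.requestId) {
        return [event.requestContext.requestId]; // API Gateway
    }
    if (event.id) {
        return [event.id]; // Scheduled CloudWatch Event
    }
    throw new Error('Unsupported event source');
}

// Shortened sample events, reduced to the fields the helper reads.
const snsEvent = { Records: [{ Sns: { MessageId: 'm1' } }] };
const kinesisEvent = { Records: [{ eventID: 'k1' }, { eventID: 'k2' }] };
const scheduledEvent = { id: 'e1' };

console.log(extractRequestIds(snsEvent));
console.log(extractRequestIds(kinesisEvent));
console.log(extractRequestIds(scheduledEvent));
```

Kinesis and SNS deliver records in batches, which is why the helper returns an array.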

The function uses the same DynamoDB table from the first iteration. Instead of the calls attribute, you use a requests attribute of type string set.

The requests attribute keeps track of all request ids that were already counted by the rate limiter. This way, you avoid counting the same request twice. The ConditionExpression also ensures that the requests set never holds more request ids than the limit allows.

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({
    apiVersion: '2012-08-10'
});
const limit = 8;
exports.handler = (event, context, cb) => {
    const date = new Date().toISOString().slice(0, 10);
    const id = `iteration2:${event.user}:${date}`;
    dynamodb.updateItem({
        TableName: `ratelimit`,
        Key: {
            id: {
                S: id
            },
        },
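        // ADD puts the request id into the string set; an id that is already present is not added twice
        UpdateExpression: 'ADD requests :requests',
        // Allow the update if the item is new, the request was already counted (retry), or the set is not full yet
        ConditionExpression: 'attribute_not_exists (requests) OR contains(requests, :request) OR size(requests) < :limit',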
        ExpressionAttributeValues: {
            ':request': {
                S: event.request
            },
            ':requests': {
                SS: [event.request]
            },
            ':limit': {
                N: limit.toString()
            }
        }
    }, function(err) {
        if (err) {
            if (err.code === 'ConditionalCheckFailedException') {
                cb(null, {
                    limited: true
                });
            } else {
                cb(err);
            }
        } else {
            cb(null, {
                limited: false
            });
        }
    });
};


You can invoke this function as often as you like: as long as the input stays the same, the result is the same.

Wait, no. The date (yyyy-mm-dd) is used as part of the key. So if the current date changes to the next day, the result will be different.
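The behavior of the second iteration, including the weak spot around midnight, can be reproduced with an in-memory model. This is a sketch, not the DynamoDB API: table, rateLimit, and the explicit now parameter are assumptions made for this illustration. The if statement mirrors the ConditionExpression of the handler above.

```javascript
const table = new Map(); // item id -> Set of request ids
const limit = 2; // small limit to keep the example short

function rateLimit(event, now) {
    const date = now.toISOString().slice(0, 10);
    const id = `iteration2:${event.user}:${date}`;
    const requests = table.get(id);
    // attribute_not_exists(requests) OR contains(requests, :request) OR size(requests) < :limit
    if (requests === undefined || requests.has(event.request) || requests.size < limit) {
        table.set(id, (requests || new Set()).add(event.request));
        return { limited: false, id };
    }
    return { limited: true, id };
}

const event = { user: 'u1', request: 'r1' };
const original = rateLimit(event, new Date('2017-11-07T23:59:59Z'));
const sameDayRetry = rateLimit(event, new Date('2017-11-07T23:59:59Z'));
const nextDayRetry = rateLimit(event, new Date('2017-11-08T00:00:01Z'));

// The same-day retry is not counted twice.
console.log(table.get(original.id).size); // prints: 1
// The retry after midnight lands in a different item.
console.log(original.id === nextDayRetry.id); // prints: false
```

The retry after midnight counts request r1 a second time, against the next day's limit.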

3. Iteration: The Idempotent Implementation

Let’s fix that as well! The date has to be part of the input event, too. I added a timestamp, but you could choose whatever format you like.

The input event looks like this:

{
    "user": "u1",
    "request": "r1",
    "timestamp": 1510067704370
}


Some event sources provide a date value out of the box. Some examples:

  • Kinesis: not supported but a workaround is possible: put the timestamp in the record data
  • SNS: Records[].Sns.Timestamp
  • API Gateway: not supported and a workaround is problematic because you would rely on the client’s time
  • Scheduled CloudWatch Event: time

The function uses the same DynamoDB table from the first iteration. You can reuse the implementation of the second iteration, but replace the way the date variable is generated:

const date = new Date(event.timestamp).toISOString().slice(0, 10);
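A small sketch shows why this change matters: the item id now depends only on the event, not on the clock of the machine running the function. The helper name itemId and the iteration3 prefix are assumptions that follow the naming of the earlier iterations.

```javascript
// Hypothetical helper: derives the item id from the event alone.
function itemId(event) {
    const date = new Date(event.timestamp).toISOString().slice(0, 10);
    return `iteration3:${event.user}:${date}`;
}

const event = { user: 'u1', request: 'r1', timestamp: 1510067704370 };

// No matter when the retry happens, the id stays the same.
console.log(itemId(event)); // prints: iteration3:u1:2017-11-07

// ISO 8601 strings are parsed by new Date() as well.
console.log(itemId({ user: 'u1', timestamp: '2017-11-07T15:15:04.370Z' })); // prints: iteration3:u1:2017-11-07
```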


That’s it. Idempotent rate limiting is implemented.

Summary: Request ids as a Savior

Use request identifiers to implement idempotent Lambda functions that do not break if a function invocation needs to be retried. Make sure to pass the original request id to all subsequent calls. Generate the original request id as early as possible in the chain, on the client side if possible. Avoid generating your own ids.

Use the original request id as the identity to guarantee idempotency:

  • In DynamoDB, use it as a primary key together with a ConditionExpression or UpdateExpression SET if_not_exists(att, :val).
  • In Step Functions, use it as the execution name.
  • For other services like Kinesis and SQS, put the original request id in the payload. The consumer can then use the original request id again.
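The first two points could look roughly like this. The sketch only builds the parameter objects for putItem (DynamoDB) and startExecution (Step Functions) of the aws-sdk and does not call AWS. The table name, the state machine ARN, and the helper names are hypothetical.

```javascript
// Parameters for DynamoDB putItem: the write fails with a
// ConditionalCheckFailedException if the request id was stored before.
function putOnceParams(requestId) {
    return {
        TableName: 'processed-requests', // hypothetical table with partition key id
        Item: { id: { S: requestId } },
        ConditionExpression: 'attribute_not_exists(id)'
    };
}

// Parameters for Step Functions startExecution: the request id is the execution name.
function startExecutionParams(requestId, stateMachineArn, input) {
    return {
        stateMachineArn,
        name: requestId,
        input: JSON.stringify(input)
    };
}

const arn = 'arn:aws:states:us-east-1:123456789012:stateMachine:example'; // placeholder
console.log(putOnceParams('r1'));
console.log(startExecutionParams('r1', arn, { user: 'u1' }));
```

In both cases, the service rather than your code decides whether the request id has been seen before.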

You are now equipped to implement idempotent Lambda functions using DynamoDB. If you use any other data store, think carefully about idempotency!

Implementation Notes

The implementation does not work well with large limits (> 5000) because the requests set will grow with each request added. Sooner or later you will hit the 400 KB limitation of a DynamoDB item.

You could circumvent the limitation by working with smaller timespans (e.g., yyyy-mm-dd-hh or yyyy-mm-dd-hh-mm), which also leads to a lower limit per timespan but assumes a more or less even distribution of requests during the day.
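A minimal sketch of the smaller timespans, assuming the timestamp is part of the event: the helper names bucketId and bucketLimit are made up for this example, and splitting the daily limit evenly across the timespans is only one possible choice.

```javascript
// Hypothetical helper: builds the item id for a given granularity.
// day -> yyyy-mm-dd, hour -> yyyy-mm-dd-hh, minute -> yyyy-mm-dd-hh-mm
function bucketId(user, timestamp, granularity) {
    const lengths = { day: 10, hour: 13, minute: 16 };
    const bucket = new Date(timestamp).toISOString()
        .slice(0, lengths[granularity])
        .replace(/[T:]/g, '-');
    return `${user}:${bucket}`;
}

// Splits a daily limit evenly across the timespans of a day (assumes even traffic).
function bucketLimit(dailyLimit, granularity) {
    const bucketsPerDay = { day: 1, hour: 24, minute: 1440 };
    return Math.ceil(dailyLimit / bucketsPerDay[granularity]);
}

console.log(bucketId('u1', 1510067704370, 'hour')); // prints: u1:2017-11-07-15
console.log(bucketLimit(12000, 'hour')); // prints: 500
```

With hourly items, a daily limit of 12000 turns into sets of at most 500 request ids.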

If you need a big limit, store each request as a separate item in DynamoDB and sum them asynchronously by using a DynamoDB stream on the table. Keep the current state of whether a user is limited in a separate DynamoDB table. This is not an exact rate limiter, but with large limits, you could tolerate that users can make slightly more requests than allowed.
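The summing part could be sketched as follows. This is only the counting logic of a stream consumer, not a complete handler. It assumes that every request item carries a user attribute of type string and that the stream is configured to include the new image of each item. The function name countNewRequests is hypothetical.

```javascript
// Hypothetical logic for a Lambda function attached to the DynamoDB stream:
// counts newly inserted request items per user.
function countNewRequests(streamEvent) {
    const counts = {};
    for (const record of streamEvent.Records) {
        if (record.eventName !== 'INSERT') {
            continue; // ignore MODIFY and REMOVE records
        }
        const user = record.dynamodb.NewImage.user.S;
        counts[user] = (counts[user] || 0) + 1;
    }
    return counts;
}

// Shortened sample stream event, reduced to the fields the function reads.
const streamEvent = {
    Records: [
        { eventName: 'INSERT', dynamodb: { NewImage: { user: { S: 'u1' } } } },
        { eventName: 'INSERT', dynamodb: { NewImage: { user: { S: 'u1' } } } },
        { eventName: 'INSERT', dynamodb: { NewImage: { user: { S: 'u2' } } } },
        { eventName: 'REMOVE', dynamodb: {} }
    ]
};

console.log(countNewRequests(streamEvent));
```

The resulting counts would then be added to the per-user state in the separate table.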


Published at DZone with permission of Michael Wittig. See the original article here.

Opinions expressed by DZone contributors are their own.
