Optimizing AWS Lambda Performance With MongoDB Atlas and Node.js
In this article, we'll walk step by step through how to optimize the performance of your Lambda function, using Node.js to do the heavy lifting.
I attended an AWS user group a few weeks ago, and many of the questions from the audience concerned caching and performance. In this post, I review the performance implications of using Lambda functions with any database-as-a-service (DBaaS) platform (such as MongoDB Atlas). Based on internal investigations, I offer a specific workaround available for Node.js Lambda functions. Note that other supported languages (such as Python) may only require implementing some parts of the workaround, as the underlying AWS containers may differ in their resource disposal requirements. I will specifically call out below which parts are required for any language and which ones are Node.js-specific.
AWS Lambda is serverless, which means that it is essentially stateless. Well, almost. As stated in its developer documentation, AWS Lambda relies on a container technology to execute its functions. This has several implications:
- The first time your application invokes a Lambda function it will incur a penalty hit in latency – time that is necessary to bootstrap a new container that will run your Lambda code. The definition of "first time" is fuzzy, but word on the street is that you should expect a new container (i.e. a “first time” event) each time your Lambda function hasn’t been invoked for more than 5 minutes.
- If your application makes subsequent calls to your Lambda function within 5 minutes, you can expect that the same container will be reused, thus saving some precious initialization time. Note that AWS makes no guarantee it will reuse the container (i.e. you might just get a new one), but experience shows that in many cases, it does manage to reuse existing containers.
- As mentioned in the How It Works page, any Node.js variable that is declared outside the handler method remains initialized across calls, as long as the same container is reused.
Understanding Container Reuse in AWS Lambda, written in 2014, dives a bit deeper into the whole lifecycle of a Lambda function and is an interesting read, though it may not reflect more recent architectural changes to the service. Note that AWS makes no guarantee that containers are kept alive (even in a "frozen" state) for 5 minutes, so don’t rely on that specific duration in your code.
In our very first attempt to build Lambda functions that would run queries against MongoDB Atlas, our database-as-a-service offering, we noticed the performance impact of repeatedly calling the same Lambda function without trying to reuse the MongoDB database connection. The Lambda function took around 4 to 5 seconds to complete, even with the simplest query, which is unacceptable for any real-world operational application.
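To quantify this kind of overhead in your own function, one option is to time the connection step separately from the query. The helper below is only a sketch and not necessarily how the figures above were measured; the name timed and its return shape are hypothetical.

```javascript
'use strict';

// Hypothetical helper: runs `work` (a function that may return a promise),
// logs how long it took, and resolves with both the result and the duration.
function timed(label, work) {
    const start = process.hrtime();
    return Promise.resolve().then(work).then(value => {
        const elapsed = process.hrtime(start);            // [seconds, nanoseconds]
        const ms = elapsed[0] * 1000 + elapsed[1] / 1e6;
        console.log(label + ': ' + ms.toFixed(1) + ' ms');
        return { value: value, ms: ms };
    });
}
```

For instance, wrapping the connection call as timed('connect', () => MongoClient.connect(uri)) on each invocation should make the difference between a fresh connection and a reused one visible in the function's logs.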
In our subsequent attempts to declare the database connection outside the handler code, we ran into another issue: we had to call db.close() to effectively release the database handle, lest the Lambda function time out without returning to the caller. The AWS Lambda documentation doesn’t explicitly mention this caveat, which seems to be language-dependent since we couldn’t reproduce it with a Lambda function written in Python.
Fortunately, we found out that Lambda’s context object exposes a callbackWaitsForEmptyEventLoop property that effectively allows a Lambda function to return its result to the caller without requiring that the MongoDB database connection be closed (you can find more information about callbackWaitsForEmptyEventLoop in the Lambda developer documentation). This allows the Lambda function to reuse a MongoDB Atlas connection across calls and reduces the execution time to a few milliseconds (instead of a few seconds).
In summary, here are the specific steps you should take to optimize the performance of your Lambda function:
- Declare the MongoDB database connection object outside the handler method, as shown below in Node.js syntax (this step is required for any language, not just Node.js):

    'use strict';
    const MongoClient = require('mongodb').MongoClient;
    // Declared outside the handler so that it survives across calls in a reused container
    let cachedDb = null;

- In the handler method, set context.callbackWaitsForEmptyEventLoop to false before attempting to use the MongoDB database connection object (this step is only required for Node.js Lambda functions):

    exports.handler = (event, context, callback) => {
        context.callbackWaitsForEmptyEventLoop = false;
        // ... query MongoDB here, then call callback(null, result) ...
    };

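- Reuse the cached database connection object if it is not null and db.serverConfig.isConnected() returns true; otherwise, create a new connection with the MongoClient.connect(uri) method and cache it (this step is required for any language, not just Node.js):

    function connectToDatabase(uri) {
        // This check assumes a driver version in which connect() resolves to a Db
        // object exposing serverConfig (as in the 2.x Node.js driver).
        if (cachedDb && cachedDb.serverConfig.isConnected()) {
            console.log('=> using cached database instance');
            return Promise.resolve(cachedDb);
        }
        // No usable cached connection: connect and cache the result for later calls
        return MongoClient.connect(uri)
            .then(db => { cachedDb = db; return cachedDb; });
    }
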
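Putting the three steps together, a handler could look roughly like the sketch below. It is an illustration under stated assumptions rather than a reference implementation: the makeHandler factory, the injected connect function, the items collection, and the empty query are all hypothetical, and the connection check relies on the same serverConfig.isConnected() call used above.

```javascript
'use strict';

// Step 1: cached outside the handler, so it survives while the container is reused.
let cachedDb = null;

// Step 3: reuse the cached connection when it is still connected.
// `connect` is injected (an assumption, to keep this sketch free of dependencies).
function connectToDatabase(connect, uri) {
    if (cachedDb && cachedDb.serverConfig.isConnected()) {
        return Promise.resolve(cachedDb);
    }
    return connect(uri).then(db => {
        cachedDb = db;
        return cachedDb;
    });
}

// Hypothetical factory that builds a Lambda handler around the cached connection.
function makeHandler(connect, uri) {
    return (event, context, callback) => {
        // Step 2 (Node.js only): return as soon as callback is invoked,
        // even though the open connection keeps the event loop busy.
        context.callbackWaitsForEmptyEventLoop = false;
        connectToDatabase(connect, uri)
            .then(db => db.collection('items').findOne({})) // hypothetical query
            .then(doc => callback(null, doc), err => callback(err));
    };
}
```

In a deployed function, this might be wired up as exports.handler = makeHandler(uri => require('mongodb').MongoClient.connect(uri), process.env.MONGODB_ATLAS_URI), where the environment variable name is an assumption. Note that the connection is deliberately never closed, which is what allows the next invocation in the same container to reuse it.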
The Serverless development with Node.js, AWS Lambda, and MongoDB Atlas tutorial post makes use of all these best practices, so I recommend that you take the time to read it. The more experienced developers can also find optimized Lambda Node.js functions (with relevant comments) in:
I’d love to hear from you, so if you have any questions or feedback, don’t hesitate to leave them below.
Published at DZone with permission of Raphael Londner, DZone MVB. See the original article here.