Dealing With Serverless Cold Starts, Once and For All!
Your functions won't always be up and running immediately, but there are strategies for managing and even preventing cold starts.
Serverless technology has been everywhere in the news these past few years, and for good reason. The rapid adoption of this new way of building applications and services is largely due to its unique set of benefits, which include significant cost savings, graceful scaling, and the speed at which developers can publish new features.
As with all good things in this world, serverless technology does come with its share of downsides and amongst these, there’s something new that we developers have yet to overcome: cold starts.
What is a Cold Start?
In short, a cold start is the time it takes for a function to start after a period of downtime in which it hasn’t been used.
Let me clarify that a little bit. Lambda functions are small bits of code that are initialized either by an event or a request. Once that execution is complete, given the ephemeral and stateless nature of these functions, they go away but the container that the function resides in hangs out for a bit longer.
That container that I’ve mentioned is basically the entire infrastructure that the Lambda function gets to use in order to complete its tasks. It has the runtime, the code, and any other necessary plugin contained within it.
If the function doesn’t see activity for a period of time (the current consensus is that the idle time is around 35 minutes) this container gets destroyed. If the function gets called later on, it will take a little more time before it can execute the code because the container needs time to spin up. This is called a cold start.
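To make the container lifecycle a little more concrete, here is a minimal sketch in Python. It assumes the usual Lambda behavior where module-level code runs once when a container is created and the handler runs on every invocation. The names `handler`, `_is_cold`, and `_container_started_at` are illustrative, not part of any AWS API.

```python
import time

# Module scope runs once per container, i.e. only during a cold start.
_container_started_at = time.time()
_is_cold = True  # flipped to False after the first invocation


def handler(event, context):
    """Hypothetical Lambda-style handler that reports cold vs. warm invocations."""
    global _is_cold
    was_cold = _is_cold
    _is_cold = False
    return {
        "cold_start": was_cold,
        "container_age_seconds": round(time.time() - _container_started_at, 3),
    }
```

The first call in a fresh container reports `cold_start` as true, and every later call served by that same container reports it as false until the container is destroyed.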
Why Do Serverless Containers Need to Be Destroyed?
As I mentioned earlier, one of the best things about serverless technology is its scaling capabilities, which let your application serve hundreds of users at once without breaking a sweat (provided you don’t run into concurrency issues). As you might imagine, AWS needs space and compute power to deliver that almost infinite scaling, so it frees up capacity by deleting containers that haven’t been used in a while, allowing it to shift resources to applications with higher demand.
The Impact of a Cold Start on an Application
The overall impact a cold start has depends on two big criteria: the runtime you are using and whether your functions are used in a time-sensitive request (anything user-facing, I’d consider time-sensitive). Suffice it to say, even a 100 ms delay could be a dealbreaker to some.
The language you choose for your function has a big impact on cold starts, so take some time to understand exactly where these functions will end up before deciding which programming language to use for your APIs.
Scripting languages such as Python and Ruby can perform significantly better than compiled runtimes. Yan Cui did an awesome comparison of language startup times in AWS Lambda.
Python was the best out of the bunch, with up to 100x faster startup times than other contenders such as Java, C#, and NodeJS. Whenever possible, consider writing your serverless functions in a lightweight language such as Python. Although the execution of a Python script is slower (due to its interpreted nature), the reduced startup latency may offset this and provide better overall performance (and lower bills from your cloud provider).
Can You Prevent Cold Starts?
Can you prevent cold starts from happening? Yes. Should you? Not always. Preventing cold starts could be as easy as using a library that calls your function every x minutes, ensuring it never gets to that point where AWS just kills the container.
My buddy Renato created something like this and you can “git it” (get it? Because it’s on GitHub) right now. X-Lambda will monitor your AWS Lambda functions, analyze past behavior, and use statistical methods to forecast short-term invocation demand. Running on a scheduled basis, it will keep the right number of containers warm, mitigating cold start latency.
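The simpler "call it every x minutes" idea can be sketched as follows. This is not how X-Lambda works. It is a minimal illustration that assumes the scheduler sends a payload containing a `warmup` flag; that key and the `do_real_work` helper are made up for this example.

```python
def do_real_work(event):
    """Placeholder for the function's actual business logic (hypothetical)."""
    return {"statusCode": 200, "body": "processed"}


def handler(event, context):
    # Assumption: the scheduled ping carries a payload such as {"warmup": true}.
    # Returning early keeps the container alive without running real work.
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}
    return do_real_work(event)
```

A scheduled rule that invokes the function every few minutes with that payload would then keep one container warm. Note that a single ping only keeps a single container alive, so it does not help with concurrent requests.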
Now, the second part of the question is whether or not to actually keep lambdas warm indefinitely. My answer would be: not always. Another reason serverless is awesome is that, while it has its fair share of security vulnerabilities, your functions are only available for a short period of time, so an attacker has less time to mess with a function before access to it is taken away altogether.
Mitigating the Impact of Cold Starts
I’d argue that rather than going with creating a pool of warm lambdas by using a “synthetic method,” the better route would be to take certain steps to ensure that the impact of the cold start is not felt as much by the end-user.
The first and most obvious thing to try is increasing the memory limit on your functions. More memory means faster boot-up time, because AWS Lambda allocates CPU power in proportion to the configured memory. If the cost implication is not an issue for your use case, consider allocating more memory to the functions where you need the best startup performance.
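To reason about the cost side, it helps to remember that Lambda compute is billed roughly in proportion to memory multiplied by duration (GB-seconds). The sketch below shows that arithmetic. The memory sizes and durations are hypothetical numbers chosen for illustration, not measurements.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation, expressed in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


# Hypothetical figures: doubling memory happens to halve the duration.
small = gb_seconds(memory_mb=512, duration_ms=800)
large = gb_seconds(memory_mb=1024, duration_ms=400)

print(f"512 MB:  {small:.3f} GB-seconds")
print(f"1024 MB: {large:.3f} GB-seconds")
```

In this made-up case both configurations consume about the same GB-seconds, so the faster one costs no more. Whether your own function speeds up enough to offset the extra memory is something you have to measure.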
Another great way to mitigate the impact of cold starts is by choosing a fast booting runtime like Python, and since I’ve already spoken about this I’m not going to bore you by repeating myself.
Cut Down on Those Package Sizes
Here’s one that might be overlooked by a lot of people. One of the spinup processes that the container does is unzipping the entire package that was uploaded with the function. Naturally, a big package will slow things down quite a bit.
When we package our code for a serverless function, it’s very common to put everything we have into the zipped file, from README files to unnecessary third-party library files. It’s important to clean up the package before deploying to production, removing everything the function does not use or need at runtime. This contributes to a shorter cold start by reducing internal networking latency, since the function will be fetching a smaller package file.
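One way to do that cleanup is to build the deployment zip with an explicit exclusion list. The following is a minimal sketch using only the Python standard library. The excluded names and suffixes are assumptions for illustration and should be adjusted to your own project layout.

```python
import zipfile
from pathlib import Path

# Assumption: illustrative exclusion rules, not a definitive list.
EXCLUDED_NAMES = {"README.md", "tests", "__pycache__", ".git"}
EXCLUDED_SUFFIXES = {".pyc", ".md"}


def build_package(source_dir, zip_path):
    """Zip source_dir, skipping files the function does not need at runtime.

    zip_path should live outside source_dir so the archive never includes itself.
    """
    source = Path(source_dir)
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source)
            if not path.is_file():
                continue
            # Skip anything inside an excluded directory or with an excluded name.
            if EXCLUDED_NAMES.intersection(relative.parts):
                continue
            if path.suffix in EXCLUDED_SUFFIXES:
                continue
            archive.write(path, relative.as_posix())
            included.append(relative.as_posix())
    return included
```

Calling `build_package("src", "function.zip")` returns the list of files that made it into the archive, which is a handy thing to review before each deployment.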
Lose the Virtual Private Cloud
This one is obvious but might get missed nevertheless, which comes at a big cost for the end-user. Functions running inside a virtual private cloud (VPC) will suffer additional latency, usually taking an extra second or two to start up; try to design your functions to run outside a VPC.
Monitor Your Cold Starts
We touched on the infrastructure factors that drive container startup latency, but our code is also a major contributor. We need to constantly monitor our application’s performance in order to identify bottlenecks and understand what is driving execution time up or down.
To do that, it's recommended to always log timestamps during the execution of a function and to monitor duration outliers in your function’s invocation history. Whenever it performs worse than expected, go to the logs and identify which parts of your code contributed to the bad performance.
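As a rough illustration of logging timestamps per stage, here is a small sketch. The `timed` helper and the stage names are invented for this example. It relies on the fact that anything a Lambda function prints to stdout ends up in its CloudWatch Logs.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, sink):
    """Record how long the wrapped block took, in milliseconds, under sink[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[stage] = round((time.perf_counter() - start) * 1000, 2)


def handler(event, context):
    durations = {}
    with timed("load_input", durations):
        payload = event.get("items", [])
    with timed("process", durations):
        total = sum(payload)
    # One structured log line per invocation makes slow stages easy to search for.
    print(json.dumps({"stage_durations_ms": durations}))
    return {"total": total, "stage_durations_ms": durations}
```

With one structured line per invocation, finding the stage behind a slow request becomes a matter of filtering the logs for unusually large values.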
Services such as AWS X-Ray and Dashbird support this type of analysis out-of-the-box, saving you a lot of time in this performance optimization journey. In case you’re running serverless functions in production for professional projects, using such a service is a must.
I hope this provides a better overview of what cold starts are and how to deal with them. If you think I’ve missed anything or have anything to share just leave a message in the comments section.