Provisioned Concurrency: The Silver Bullet to AWS Lambda Cold Starts
The year 2014 marked the start of the serverless era with Dr. Werner Vogels announcing AWS Lambda to an ecstatic crowd at AWS’ esteemed re:Invent. What was promised was a compute service abstracting away all the pains of cloud orchestration, thus leaving the user to only worry about the business logic they would run on these magic little worker nodes spun out of thin air and pure engineering.
Even though AWS Lambda was not the first FaaS service out there — a startup called PiCloud had been offering FaaS since 2010 — AWS was the first major cloud provider to jump into the race. In fact, it can be argued that AWS is the one that kick-started the serverless era, soon followed by Google Cloud Functions and Microsoft's Azure Functions. By 2017 the cloud wars had intensified, with more and more providers descending upon the battlefield, all championing one promise: no more orchestration needed.
In the following years, serverless grew in popularity, but alas, the initial ardor began to wane. Many blamed the limitations cloud vendors placed on their FaaS offerings for holding back the mass adoption of serverless services. One of the primary deterrents, however, was cold starts, which we'll talk about in depth in the following sections.
AWS, however, may have just announced the silver bullet to the much-dreaded cold start in the form of Provisioned Concurrency. Half a decade after kick-starting the serverless train, AWS has shoveled in new coal with Provisioned Concurrency to accelerate the trend.
In this post, we'll explain how Provisioned Concurrency works and answer the question, does this new feature still keep serverless technologies on the same track as was first laid out?
The Inherent Problem
To understand the solution to cold starts, we need to understand why cold starts occur in the first place. They could generally be defined as the setup time required to get a serverless application’s environment up and running when it is invoked for the first time within a defined period of time. With this understanding, we also accept that cold starts are somewhat of an inherent problem with the serverless model.
Serverless applications run on ephemeral containers: worker nodes whose management is the responsibility of the platform provider. That is where the wonderful features of auto-scalability and pay-as-you-go come from, since vendors such as AWS can manage the resources to match exactly the requirements of your application.
The problem here though is that there is latency in getting these worker nodes in the form of ephemeral containers up for your first invocation. After all, the serverless principle is that you utilize resources when required, and when not required, those resources theoretically do not exist.
This unavoidable latency is what degrades the performance of your applications. This is especially true when you are building serverless applications that are meant to be time-sensitive, which includes almost all customer-facing applications.
This latency varies from vendor to vendor and across programming languages. For example, FaaS functions written in .NET usually have higher latency when compared to other programming languages. Mikhail Shilkov was one of the personalities in the field to confirm this in his famous piece, Comparison of Cold Starts in Serverless Functions across AWS, Azure, and GCP.
Nevertheless, we are witnessing the community improving performance over cold starts. For example, if we refer back to cold starts in .NET, we see that newer versions of .NET demonstrate better performance when compared to their predecessors.
The graph below, for example, illustrates how cold starts improve over newer versions of .NET, across various memory allocations for the AWS Lambda functions. However, we still experience latency, which can be devastating, especially considering that the results below come from a simple hello-world .NET AWS Lambda function.
Of course, this has been an issue that the serverless community has been dealing with for a while now, and they have devised various strategies to overcome latency issues. Moreover, third-party SaaS tools, such as Thundra.io, are constantly providing solutions (cold start monitoring and warming triggers) in an attempt to mitigate the pain that cold starts bring with them.
Unfortunately, none of the solutions are perfect, and this is where Provisioned Concurrency comes into play. On the spectrum of solutions to AWS Lambda cold starts, AWS’ Provisioned Concurrency probably sits closest to achieving the goal of zero cold starts.
The Workings of Provisioned Concurrency
Knowing that the major reason behind cold starts is the time taken to initialize the computing worker nodes, AWS' Provisioned Concurrency solution is quite simple: keep those worker nodes initialized ahead of time!
The concept here is that you can now decide how many of these worker nodes you would like to keep initialized for your time-sensitive serverless applications. These worker nodes reside in a frozen state with your code downloaded and the underlying container infrastructure all set; since a frozen node is not actively executing, it is not technically using up compute resources. The benefit is a near-guaranteed response time in the double-digit milliseconds. This is a considerable improvement compared to latency creeping into the seconds, if not minutes, as in the .NET example whose cold start durations are illustrated above.
That means, depending on the number of provisioned worker nodes you have, invocations will be routed to provisioned worker nodes before on-demand worker nodes, thus avoiding the cold starts caused by initialization. It would thus be wise to provision a higher number of worker nodes for expected spikes in traffic. For example, a movie ticketing system could expect a surge of traffic the moment tickets for a popular show go on sale, as shown below.
If the tickets go on sale at 6 pm, then you would expect a higher number of requests, meaning a higher number of invocations of the function. As ticket sales continue, and all the show’s tickets get sold out, you can then expect traffic to drop. Therefore, you would no longer need as many provisioned concurrent worker nodes.
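For a predictable spike like the 6 pm on-sale time, Provisioned Concurrency can be adjusted on a schedule through Application Auto Scaling rather than by hand. The sketch below assumes a function named `ticketing-handler` with a published alias `prod` and illustrative capacity numbers; adapt these to your own setup.

```shell
# Register the function alias as a scalable target for Provisioned Concurrency.
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:ticketing-handler:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 10 --max-capacity 500

# Scale up shortly before the 6 pm rush (times are UTC in cron expressions)...
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:ticketing-handler:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-up-for-sale \
  --schedule "cron(45 17 * * ? *)" \
  --scalable-target-action MinCapacity=500

# ...and back down once the rush has passed, so you stop paying for idle nodes.
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:ticketing-handler:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-down-after-sale \
  --schedule "cron(0 20 * * ? *)" \
  --scalable-target-action MinCapacity=10
```

This keeps the expensive provisioned pool large only during the window when cold starts would hurt the most.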
If the provisioned concurrent worker nodes fail to accommodate all incoming invocations, the overflow invocations are handled conventionally, with on-demand worker nodes being initialized per request. Overall, however, there is a definite improvement in the latency of your serverless application.
There are various ways to provision concurrent worker nodes. The main methods are the AWS console, the AWS CLI, and the AWS API. Moreover, with the launch, AWS has partnered with third-party AWS partner tools to facilitate the provisioning of these concurrent worker nodes.
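As a rough sketch, provisioning from the CLI targets a published version or alias (Provisioned Concurrency cannot be set on `$LATEST`); the function name `my-function` and version number here are placeholders.

```shell
# Publish an immutable version of the function to attach the config to.
aws lambda publish-version --function-name my-function

# Allocate 100 provisioned concurrent executions to that version.
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier 1 \
  --provisioned-concurrent-executions 100

# Check readiness; the status becomes READY once all environments
# have been initialized and frozen.
aws lambda get-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier 1
```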
For example, Thundra.io is one such partner and allows you to monitor these provisioned worker nodes, including the number of provisioned concurrent nodes compared to spill over invocations that get routed to on-demand worker nodes.
There are, however, some limits on the number of provisioned concurrent worker nodes you can reserve. For example, your account's pool of unreserved concurrency cannot fall below 100.
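Before allocating more provisioned concurrency, you can inspect how much of the account-level concurrency pool remains unreserved. A minimal check (output values will vary by account):

```shell
# Show the account concurrency limit and the unreserved pool,
# which must stay at or above 100.
aws lambda get-account-settings \
  --query 'AccountLimit.[ConcurrentExecutions, UnreservedConcurrentExecutions]'
```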
Provisioned worker nodes can thus be used to avoid cold starts. Those using Lambda functions no longer need to set up extra triggers or perform code changes to mitigate the latency problem. However, Provisioned Concurrency means the worker nodes are always present and ready to take on requests. This leads us to another, more philosophical question: is Provisioned Concurrency cheating?
The Serverless Conundrum
Provisioned Concurrency kills the cold start problem, but alas, the ideal serverless dream suffers collateral damage. After all, serverless was meant to be a fully managed, on-demand service. With Provisioned Concurrency, however, these services are neither fully managed for you nor on-demand, hence redefining what a FaaS service is.
Serverless was built on three great pillars:
- Fully managed.
- Pay-as-you-go.
- Auto-scalable.
With Provisioned Concurrency, however, we see a divergence from the first two characteristics. Firstly, you have to manage the number of reserved worker nodes, and secondly, you have to pay for these worker nodes by the hour, according to AWS’s pricing plans for the feature.
We may still salvage the 'fully-managed' clause of serverless services, though. Even though we have taken on some of the management responsibility for Lambda functions by deciding the number of worker nodes to provision, a large and substantial part of the resource management is still handled by the cloud vendor.
The only reason we are reserving concurrent worker nodes for the Lambda function is to overcome the cold start problem. Apart from the number of resources, all other responsibilities are still managed by AWS. Moreover, when the number of reserved worker nodes falls short, we switch over to on-demand worker nodes seamlessly without any intervention from the user. Therefore, we have the capability of overcoming cold starts with a dent to the "fully-managed" clause, but nonetheless, a dent required to fit the serverless model into the practical caveats of the real-world.
Thus, we can get over the fact that AWS Lambda functions may no longer be fully managed as per the ideological definition of a serverless service. It is a small price to pay to achieve serverless adoption as once envisioned. On the other hand, it is the 'pay-as-you-go' trait of the serverless model that takes the greatest hit.
Provisioned Concurrent worker nodes incur a fixed cost per hour, irrespective of whether or not the worker node is processing requests. The ‘pay-as-you-go’ trait was one of the biggest advantages to the serverless model, but then again, the cold start problem was the greatest disadvantage. Therefore, we have had to woefully sacrifice the queen to checkmate cold starts. A sacrifice with a cloud of contention.
The issue is exacerbated once it is known that the reserved worker nodes do not actually preserve state. FaaS functions are already known to be stateless, and many devise workarounds to the state caching problem of serverless functions. Provisioned worker nodes, even though reserved, do not preserve state either, and as a result still exhibit the statelessness expected of on-demand worker nodes. Consequently, what AWS is charging for is essentially pre-warming, a concept for which Jeremy Daly already maintains an open-source project.
The community may have to concede a scratch on the fine "pay-as-you-go" trait, but the benefits of overcoming cold starts are manifold, tipping the scales in favor of serverless adoption, considering that cold starts are, as we have seen, an inherent problem of the serverless concept. To see the successful mass adoption of serverless as the community envisioned, there are some minor costs we must incur to our ideal serverless environment, and AWS has taken that bold step with Provisioned Concurrency.
If there was anyone who could so poetically capture the opinions and pleas that enwreathe Provisioned Concurrency, it is the Greek fabulist Aesop, who once said: "A crust eaten in peace is better than a banquet partaken in anxiety." We cannot reap the ideal benefits offered by serverless while constantly looking over our shoulders for that dreaded cold start lurking in the uncertainty.
Published at DZone with permission of Sarjeel Yusuf. See the original article here.