The Fundamental Problem: Solving .NET Lambda Cold Starts, Part 1
In this post, we'll highlight why exactly .NET experiences considerably longer cold starts when running on AWS.
As the age-old battle of .NET (C#) vs. Java rages on, there is still no clear victor, despite Java’s larger user base. Both languages emerged from the same aspirations, and both have served similar purposes throughout the evolution of technology. However, as we enter the era of serverless, .NET is on the back foot, with an evident disadvantage regarding cold starts.
Mikhail Shilkov, one of the more prominent and articulate tech evangelists in the domain of serverless cold starts, wrote a piece comparing the three big cloud providers. The main focus was on how they deal with initializing their serverless environments. His results intrigued me, as they illustrated a major problem, especially with AWS, which bills itself as the leader in function-as-a-service platforms.
No, .NET developers are not doomed to exile in this new era of serverless. With an understanding of the problem at hand and a set of best practices, the issue of cold starts can be greatly mitigated and even avoided completely. That is the purpose of this two-part article. In the first part, we'll highlight why exactly .NET experiences considerably longer cold starts. In the second part, we'll discuss the steps a C# enthusiast and .NET developer can take to reduce cold starts and even overcome them completely.
The Curse of Cold Starts
A cold start refers to the set-up time required to get a serverless application’s environment up and running when it is invoked for the first time within a defined period. Serverless applications run on ephemeral containers, and managing these containers is the responsibility of the platform provider. That is where the features of auto-scaling and pay-as-you-go pricing come from: vendors such as AWS manage the underlying resources according to the requirements of your running application.
Unfortunately, that also means that if your application is not used for a considerable amount of time, the ephemeral container that holds your application, which we can call a ‘worker,’ is shut down to save resources and cost. If the application is triggered again, resources must be allocated to it again and its environment set up again. That means latency. That means slower response times. That means rethinking whether serverless is actually the best solution after all.
Certainly, it is one of the best solutions out there, especially when it comes to the cost-effectiveness and scalability of event-driven applications. The question for developers is whether it is worth forgoing the auto-scaling, reliability, and cost benefits of serverless platforms simply because of a bit of latency. Considering the numbers, many do think that a bit of latency is an acceptable price for those benefits, as seen in the graph below, reported in the RightScale State of the Cloud report.
However, can .NET developers say the same? We have seen that C# is the worst-performing language in terms of cold starts. This raises the question: is serverless viable for .NET? To know whether this cold start predicament can be overcome, we first need to understand the problems, so that targeted solutions can be devised.
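To make the worker lifecycle described above concrete, here is a minimal toy model in Python. It is only a sketch: the Worker class, the 2,000 ms set-up cost, the 50 ms handler cost, and the 600-second idle timeout are all made-up assumptions for illustration, not measured values or documented AWS behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    """Toy model of an ephemeral container ('worker') hosting one function."""
    init_ms: float          # assumed cost of setting up the environment
    run_ms: float           # assumed cost of running the handler itself
    idle_timeout_s: float   # assumed idle period after which the worker is reclaimed
    last_used_s: Optional[float] = None  # None means no live worker exists yet

    def invoke(self, now_s: float) -> float:
        """Return the simulated latency in ms for an invocation at time now_s."""
        cold = (
            self.last_used_s is None
            or now_s - self.last_used_s > self.idle_timeout_s
        )
        self.last_used_s = now_s
        # A cold invocation pays the set-up cost on top of the handler cost.
        return self.run_ms + (self.init_ms if cold else 0.0)


# Hypothetical numbers, chosen only to make the effect visible.
worker = Worker(init_ms=2000.0, run_ms=50.0, idle_timeout_s=600.0)
latencies = [worker.invoke(t) for t in (0.0, 30.0, 60.0, 3600.0)]
print(latencies)  # [2050.0, 50.0, 50.0, 2050.0]
```

The first call and the call after a long idle gap both pay the full set-up cost, while the calls in between reuse the warm worker.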
Understanding the Problem
The question that persists is: where does serverless go wrong for .NET? What exactly causes these considerably longer cold starts? In all honesty, there isn’t a single problem, but rather several.
Right off the bat, we know that C# is a statically typed language, and that means that the container environment that is set up needs to be aware of the variable types from the start. However, Java is also statically typed, and we do not see the same horrific cold start durations there as we do in .NET. This leads one to ask: is the Java platform on AWS better optimized than .NET? Chasing that question is unlikely to be fruitful, as the two runtimes, though similar in their paradigms, differ greatly in the way they are compiled and deployed.
The major problem with .NET is jitting, that is, just-in-time (JIT) compilation: the machine-agnostic assemblies produced when the code is built must be compiled into machine-specific code when the function starts on the AWS platform. According to Norm Johanson, a senior developer at AWS working on .NET, “one of the most intensive tasks at startup is jitting your machine agnostic .NET assemblies into machine specific assemblies.” This issue is specific to .NET Lambda functions, and it is important to pinpoint it as the primary culprit behind longer cold starts.
There are, however, other issues that prolong cold starts but are not specific to .NET Lambdas. These include the use of VPCs and the attaching of ENIs to the serverless container. Setting up these resources really does drag out cold starts, but the effect does not differ significantly from one runtime to another. Still, solving them can also speed up your Lambda functions, so addressing each such issue benefits .NET functions overall.
Where We Stand Now
Before we dive into the solutions, it must be acknowledged that AWS is continuously optimizing its serverless platform and finding novel ways to reduce cold start times. These efforts show in the reduced cold start durations observed with .NET Core 2.x support.
With the release of .NET Core 2.0 support by AWS in January 2018, the cold start experience has gradually improved compared to what was reported in the widely cited article written by serverless hero Yan Cui in June 2017. His article, among other things, brought to light the deplorable cold start performance of .NET. Since then, various demonstrations have shown considerable improvements in the runtime's performance, especially with the release of .NET Core 2.0.
For example, comparing basic Lambda functions with .NET Core 1.0 and .NET Core 2.1 yields the results illustrated below.
Additional improvements in building Lambda functions with C# can definitely be expected. From my casual discussions with .NET developers at AWS, and from the general chatter online, it appears that developers at AWS are currently planning to implement tiered compilation support. With tiered compilation, your .NET Lambda’s code is compiled in multiple stages, usually two: the .NET runtime can substitute different compiled versions of the same method over the lifetime of the application. The two phases of tiered compilation can be described as follows:
Startup: The initial phase, in which the runtime quickly generates ‘low-quality’ code, to be replaced by ‘high-quality’ code once the method appears hot. The compromise on quality saves JIT time, which means a quicker startup for the serverless container.
Steady-state: Once a method is finally deemed hot, the ‘higher-quality’ jitted code is substituted and used for the rest of the Lambda’s operation.
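The two phases above can be illustrated with a small toy model in Python. This is a conceptual sketch only, not how the .NET runtime is actually implemented: the TieredMethod class, the two stand-in functions, and the hot threshold of three calls are all hypothetical names and values chosen for illustration.

```python
from typing import Callable


class TieredMethod:
    """Toy model of one method under tiered compilation (not the real .NET runtime)."""

    def __init__(
        self,
        quick: Callable[[int], int],
        optimized: Callable[[int], int],
        hot_threshold: int,
    ) -> None:
        self._impl = quick            # startup tier: cheap to produce, slower to run
        self._optimized = optimized   # steady-state tier: costlier to produce, faster to run
        self._hot_threshold = hot_threshold  # assumed call count that marks the method hot
        self.calls = 0
        self.tier = 0

    def __call__(self, n: int) -> int:
        self.calls += 1
        if self.tier == 0 and self.calls >= self._hot_threshold:
            # The method is now considered hot: swap in the higher-quality version.
            self._impl = self._optimized
            self.tier = 1
        return self._impl(n)


def sum_quick(n: int) -> int:
    # Stands in for unoptimized startup code: a plain loop.
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_optimized(n: int) -> int:
    # Stands in for optimized steady-state code: closed form, same result.
    return n * (n + 1) // 2


method = TieredMethod(sum_quick, sum_optimized, hot_threshold=3)
tiers = []
for _ in range(5):
    result = method(100)
    tiers.append(method.tier)
print(result, tiers)  # 5050 [0, 0, 1, 1, 1]
```

The caller always gets the same result; only the implementation behind the method changes once it has been called often enough to count as hot.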
This compilation strategy shows great promise in terms of performance, and its introduction could well be a game changer. At the moment, the core .NET team at AWS is still working on the feature. Its implementation could yield some interesting results and steadily level the playing field.
Regardless of the innovations being made to improve the .NET runtime, there are still several ways we can reduce cold start durations today and approach start-up performance similar to Java’s. This can be achieved through a series of solutions that span the entire process of creating, uploading, and executing .NET Lambda functions. Now that a basic understanding of the problems of .NET is established, solutions that target the specific problems identified can be discussed and implemented. That discussion follows in Part 2.
Published at DZone with permission of Sarjeel Yusuf. See the original article here.
Opinions expressed by DZone contributors are their own.