Lambda vs. EC2
This in-depth comparison of AWS Lambda and EC2 covers their relative strengths and weaknesses while singling out ideal use cases.
In this article, we’ll review AWS Lambda in a direct comparison with AWS EC2. If you’re here for the conclusion, feel free to skip to the end where we have a lovely summary table.
From EC2 to Lambda
Lambda is a product offered by Amazon as “serverless architecture”. That phrase, like “blockchain”, has become an industry buzzword, leaving many people to ask, “What the hell actually is Lambda?” Lambda, as it turns out, is a framework of ECS (EC2 Container Service) containers that each run a single piece of code or application and scale as needed based on use. Each container is short-lived. It is the latest in a long line of Amazon products that remove the need for infrastructure management. Beginning with EC2, Amazon reduced the time it takes to provision a server, and enhanced this with autoscaling, scheduled provisioning, and built-in monitoring and alerting through CloudWatch. When EC2 was first introduced it was considered a far more volatile environment than it is today: for companies operating at scale, there were issues with noisy neighbors, provisioning failures, sudden machine disappearances, and occasional outages (even at the datacenter level).
Then came EB (Elastic Beanstalk), which wrapped all of this up into a neat little package. It comes in a variety of flavors for programming languages and frameworks (Python/Django, RoR, Java/J2EE, etc.) and lets developers upload code directly to the machines through the AWS GUI as compressed packages. Its autoscaling is configurable, and the EC2 instances that Elastic Beanstalk spins up remain accessible via the EC2 console: sysadmins and devs can still log into them and make modifications, even cutting an AMI and swapping it in for the one Elastic Beanstalk uses so the modifications stick across scaling events. The load balancer for Elastic Beanstalk is abstracted away by Amazon, which gives you an endpoint to use in its place.
Most recently, Amazon released Lambda. Like Elastic Beanstalk, it supports a number of programming languages and frameworks, once again including Python, .NET, Java, and Node.js (if you’re looking for PHP, you’re out of luck). More recently still, Amazon has made newer versions of these languages available (like Python 3.6, up from the original 2.7).
Lambda Language Options (circa 2017/09/20)
Also like Elastic Beanstalk, developers can upload code packages directly to Lambda. Unlike its predecessors, however, the underlying Lambda infrastructure is entirely unavailable to sysadmins and developers. Scale is not configurable; Lambda reacts to usage and scales up automatically. Rather than EC2, Lambdas run on ECS, and the containers are not available for modification. And in place of a load balancer or an Amazon-provided endpoint, if you want to make Lambdas accessible to the web you must do so through an API Gateway, which acts as a URL router to Lambda functions.
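To make the shape of this concrete, here is a minimal sketch of what a Python function behind an API Gateway proxy integration can look like; the handler name, query parameter, and response shape are illustrative rather than taken from our production code.

```python
import json

def lambda_handler(event, context):
    """API Gateway (proxy integration) passes the HTTP request in `event`
    and expects a statusCode/headers/body mapping back."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello, " + name}),
    }
```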
Cost and Use Cases
One of the major advantages touted by Amazon for Lambda is reduced cost. Lambda’s cost model is time-based: you’re charged per request and for request duration. You’re allotted a certain number of seconds of use that varies with the amount of memory you allocate, and the price per ms (millisecond) likewise varies with that memory allocation [2]. Obviously, shorter-running functions are better suited to this model.
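As a rough back-of-the-envelope sketch, the arithmetic looks like the following; the per-request and per-GB-second figures are the published on-demand prices at the time of writing and should be checked against current AWS pricing, and the free tier is ignored.

```python
PRICE_PER_REQUEST = 0.20 / 1000000   # USD per request (published price at time of writing)
PRICE_PER_GB_SECOND = 0.00001667     # USD per GB-second (published price at time of writing)

def monthly_lambda_cost(requests, avg_duration_ms, memory_mb):
    # Duration is billed in 100 ms increments.
    billed_ms = ((avg_duration_ms + 99) // 100) * 100
    gb_seconds = requests * (billed_ms / 1000.0) * (memory_mb / 1024.0)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# e.g. 10M requests/month at 200 ms average with 256 MB allocated:
print(monthly_lambda_cost(10000000, 200, 256))  # roughly $10, before the free tier
```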
We use Lambda to pass information on to other services, acting as a connector. When we first did this, we noticed that whenever those services were unreachable we ended up waiting 60 seconds for a timeout. This added significant cost to the service; after modification, we reduced the timeout to one second. We’ll revisit the maximum running time for Lambdas in a section below.
The code that resides in our Lambdas also reaches out to other services for data. Because each of those requests adds latency, we introduced a caching mechanism that cut the time down significantly: previously every request could take over 1,000 ms (one second), and with the cache the average request time is now ~100 ms (excluding the first request for a given piece of data, obviously).
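A minimal sketch of both adjustments, assuming a hypothetical upstream URL; it relies on the fact that module-level objects survive across warm invocations of the same container.

```python
import urllib.request

# Module-level objects survive across warm invocations of the same container,
# so a plain dict works as a per-container cache.
_CACHE = {}

def fetch_with_cache(url):
    if url not in _CACHE:
        # A one-second timeout keeps billable time down when the upstream
        # service is unreachable (instead of the 60 seconds we saw initially).
        with urllib.request.urlopen(url, timeout=1) as resp:
            _CACHE[url] = resp.read()
    return _CACHE[url]

def lambda_handler(event, context):
    data = fetch_with_cache("https://upstream.example.com/config")  # illustrative URL
    return {"statusCode": 200, "body": data.decode("utf-8")}
```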
At its heart, Lambda seems best suited to independent processing jobs. Given the time-based cost, it’s tempting to say that something like video or image processing is the ideal use case, but those workloads tend to use more memory, require libraries of significant size, and can be protracted. We’ll revisit maximum package sizes for Lambdas in a section below.
In considering costs let’s split it into two sections:
- Setup
- Ongoing
The setup costs for Lambda are as close to none as they can be. There are plenty of “gotchas” that we’ll discuss below, but simply getting a function running is low-effort. After that, hooking up a Lambda to an API Gateway is also low overhead, including getting a development environment and a production environment.
The ongoing costs for Lambda are surprisingly close to what they would be if we were to recreate the services in EC2 with ELB (Elastic Load Balancing) and autoscaling. This was disappointing, as we had hoped that Lambda would cost less given our setup.
For setup, Lambda is the clear winner and does as advertised. For ongoing costs, it’s a tie between more traditional cloud architecture and Lambdas.
Networking
One of the bigger consequences of not being able to directly manage the infrastructure is not being able to control networking. We ran into this when attempting to resolve domains in a VPC belonging to another AWS account. In an EC2 environment, we would have updated resolv.conf to point at the other account’s nameserver IP, which is reachable through a VPN tunnel and routing rules. In a Lambda environment you must:
- Associate the Lambda to your VPC and select the appropriate subnet(s)
- Update your VPC DHCP option set to reflect the nameserver you’d like first in your list
That should set off a violently loud alarm in your mind. Assigning a DHCP option set to a VPC impacts every running machine within that VPC, and automatically replaces the associated config files on running instances over the following minutes or hours. To mitigate this we decided to resolve domain names in code, where we can specify a nameserver by IP and fall back if it fails, as sketched below.
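A sketch of that approach using the third-party dnspython library (bundled with the function); the nameserver IPs are illustrative, and on dnspython 2.x the call is resolver.resolve() rather than resolver.query().

```python
import dns.resolver  # third-party "dnspython", packaged with the function

# Illustrative IPs: the peer account's resolver (over the VPN tunnel) first,
# then the local VPC resolver as a fallback.
NAMESERVERS = ["10.1.0.2", "169.254.169.253"]

def resolve_a_record(hostname):
    for ns in NAMESERVERS:
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ns]
        resolver.lifetime = 1.0  # keep lookups short to limit billable time
        try:
            answer = resolver.query(hostname, "A")
            return answer[0].address
        except Exception:
            continue  # fall back to the next nameserver
    raise RuntimeError("could not resolve " + hostname)
```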
As for public name resolution: “We recommend that you avoid DNS resolution of public hostnames for your VPC. This can take several seconds to resolve, which adds several seconds of billable time on your request.” [1]
A smaller annoyance is that Lambdas only ever get private endpoints; to reach them you must set up an API Gateway. This doesn’t impact our costs or process, but it would have been nice to do it all in one place.
The clear winner here is traditional cloud architecture.
Dependencies
Most, if not all, projects have external dependencies. They rely on libraries that aren’t built into the language or framework. When you have functionality that includes cryptography, image processing, etc., these libraries can be fairly heavy. Without system-level access, you must package these dependencies into the application itself.
For some frameworks, like Ruby on Rails, this is the standard process (though RoR isn’t supported by Lambda), while for others, like Python/Django, it’s more common to install these dependencies to the system or to a virtual environment. Updating your packaging mechanism for this purpose is straightforward, but once done you may encounter another problem: Lambda has hard limits on the size of packages you may upload. The base limit is 50MB, but you may also download dependencies of up to 500MB to “/tmp” on function initialization [2], which of course is likely to incur a significant time cost.
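One way around the 50MB limit, sketched below, is to ship the heavy libraries as an archive in S3 and pull them into /tmp during cold start; the bucket, key, and numpy import are illustrative, and the download cost is paid once per container rather than per invocation.

```python
import os
import sys
import zipfile
import boto3

DEPS_ARCHIVE = "/tmp/deps.zip"
DEPS_DIR = "/tmp/deps"

def _load_heavy_dependencies():
    """Fetch a dependency bundle too large for the package limit and unpack
    it into /tmp (up to 500MB), once per cold start."""
    if not os.path.isdir(DEPS_DIR):
        boto3.client("s3").download_file("my-deps-bucket", "lambda/deps.zip", DEPS_ARCHIVE)
        with zipfile.ZipFile(DEPS_ARCHIVE) as zf:
            zf.extractall(DEPS_DIR)
    if DEPS_DIR not in sys.path:
        sys.path.insert(0, DEPS_DIR)

_load_heavy_dependencies()  # runs at module import, i.e. on cold start only

def lambda_handler(event, context):
    import numpy  # resolved from /tmp/deps once the bundle is unpacked
    return {"statusCode": 200, "body": numpy.__version__}
```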
The winner here is based on your context. For simple applications with few dependencies, Lambda is the winner, for anything more complex or requiring heavier libraries, traditional cloud architecture is the winner.
Security
You may wish to set up encryption between your API Gateway and Lambda function, and again between Lambda and S3 (via Kinesis Firehose). Setting up this type of encryption using a KMS key is straightforward, though it left us a little confused.
A default encryption key is created the first time you create a Lambda function, but it’s recommended that you create your own KMS key, which gives you the flexibility to create, rotate, and disable keys as needed. Once the KMS key is available, you can use it in the Lambda function configuration simply by selecting it. In our use case, the data was being saved to S3 via Kinesis Firehose; likewise, enabling encryption for S3 is just a matter of turning encryption on and selecting the correct key in your S3 configuration.
What was unusual about this setup is that you’ll never know your data is encrypted in S3. A consumer with read access to the S3 bucket can download the data in unencrypted form; according to Amazon, it is decrypted on the fly using the appropriate key. We had expected to see encrypted data in the S3 bucket and then decrypt it ourselves using the KMS key, rather than have AWS decrypt it on the fly, and, of course, this behavior isn’t documented.
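To illustrate the S3 side of this (in our setup Firehose performs the actual write; the bucket name and key alias below are illustrative), a direct write and read with SSE-KMS looks roughly like this:

```python
import boto3

s3 = boto3.client("s3")

# Write an object encrypted with your own CMK rather than the default key.
s3.put_object(
    Bucket="my-encrypted-bucket",
    Key="events/batch-0001.json",
    Body=b'{"example": true}',
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-lambda-data-key",
)

# A reader with s3:GetObject (and kms:Decrypt) permission just downloads the
# object; S3 decrypts it on the fly, which is why the data never *looks*
# encrypted from the consumer's side.
obj = s3.get_object(Bucket="my-encrypted-bucket", Key="events/batch-0001.json")
print(obj["Body"].read())
```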
There’s no clear winner in terms of transport-layer encryption: setting up an encrypted channel between two services is straightforward with REST, simply by using an SSL certificate and HTTPS. For at-rest encryption, the winner is S3 over traditional storage (as opposed to Lambda over EC2), since having all data encrypted and decrypted seamlessly is a win, assuming the data truly is encrypted at rest. The downside is less control over IOPS, which may matter in your context.
When it comes to system-level security, Lambda seems likely to be the winner. Lambdas require no system updates, and the ECS containers AWS uses are presumably updated automatically and often. Additionally, the containers are short-lived and come only with private IP addresses, so exploiting them directly would be next to impossible. Security in a traditional environment is completely in your control, and as control freaks we like that, but it puts the onus of security, including system updates, entirely on your organization. IP addresses for these machines are consistent, even when private only, and as a result there is greater exposure for EC2 instances.
Once again, there’s no decisive winner, but if our assumptions about Lambda are correct, and given the smaller attack surface presented by short-lived containers, we’re inclined to say that Lambda wins the security department.
Environments
Setting up different environments for Lambdas is as easy as setting up a single environment. Given that it’s pay-per-use, this is a large improvement over EC2, where we would need to spin up dev machines (or leave them running) for the service at cost. If you’ve already set up a schedule for your dev environment, or have automation for on-demand dev environments, this won’t be as appealing.
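One way to model this, sketched below, is to use Lambda versions and aliases as environments, assuming `dev` and `prod` aliases were created once up front with `create_alias`; the function name is illustrative.

```python
import boto3

lam = boto3.client("lambda")

# Publish the currently uploaded code as an immutable version, then point the
# dev alias at it; prod stays on whatever version it was already using.
version = lam.publish_version(FunctionName="my-connector-function")["Version"]
lam.update_alias(FunctionName="my-connector-function", Name="dev",
                 FunctionVersion=version)

# Promote to prod only after dev has been exercised:
# lam.update_alias(FunctionName="my-connector-function", Name="prod",
#                  FunctionVersion=version)
```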
The winner here is definitely Lambdas unless you already have on-demand dev environments through containers or otherwise.
Timeouts
As alluded to above, there’s a hard 300-second timeout limit for Lambdas [3]. We’ve already covered how complex or long-running functions aren’t a good fit for Lambda, but the hard timeout makes certain tasks downright impossible, or forces you to absorb additional cost whenever execution runs long, such as when communicating with external services or handling an abnormally complex image-processing request.
A hard limit on this time makes Lambda untenable for applications that have wildly variable execution times, and for many services that require information from an external source.
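Where the work can be split up, one mitigation is to watch the remaining execution time exposed on the context object and return a partial result before the hard limit kills the invocation; `do_work` below is a placeholder for the real per-item processing.

```python
def do_work(item):
    # Placeholder for the real per-item processing.
    return item

def lambda_handler(event, context):
    items = event.get("items", [])
    processed = []
    for item in items:
        # Bail out well before the hard timeout so we can hand back a partial
        # result (and the unfinished work) instead of being killed mid-flight.
        if context.get_remaining_time_in_millis() < 5000:
            break
        processed.append(do_work(item))
    return {"processed": processed, "remaining": items[len(processed):]}
```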
The clear winner here is traditional cloud architecture.
Scale
Scale for Lambdas is a double-edged sword. The scaling process is automatic and, for the most part, fairly seamless, but with the lack of control come the sporadic edge cases we’ve encountered. Lambda is a black box, and at times we see a pileup of errors related to the creation of new Lambda processes. While it’s great not to need to do anything during scale-up events, it’s also a pain not to be able to address and mitigate errors related to spawning new Lambda instances.
In our setup, Lambdas receive requests through an API Gateway, and occasionally we would see a spike in 5XXs being returned from the API Gateway due to integration latency between the API Gateway and Lambda. This has nothing to do with our implementation or the code running on the Lambda. What’s frustrating is that we cannot resolve the issue ourselves, since it occurs between API Gateway and Lambda and prevents the request from ever reaching the Lambda. Our only option during that time is to retry sending the request through the API Gateway, as sketched below.
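A sketch of that retry from the caller’s side, assuming the third-party requests library and an illustrative API Gateway URL:

```python
import time
import requests  # third-party HTTP client

def post_with_retry(url, payload, attempts=3):
    """Retry on 5XX responses from API Gateway (e.g. integration latency
    between the gateway and Lambda), backing off between attempts."""
    resp = None
    for attempt in range(attempts):
        resp = requests.post(url, json=payload, timeout=5)
        if resp.status_code < 500:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s ...
    return resp

# Illustrative endpoint.
response = post_with_retry(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/events",
    {"event": "signup"},
)
```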
An issue we haven’t hit, but certainly could if we migrated more services to Lambda, is the concurrent execution limit per region. Lambda caps concurrent executions at 1,000 per region, a limit that, to be fair, can be raised by reaching out to AWS support [3]. If we were to move all of our services to Lambda, we’d exceed it immediately.
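The current cap for a region can be read programmatically, which is worth doing before migrating more services; a minimal sketch using boto3:

```python
import boto3

# Region-wide concurrency cap for the account (1,000 unless raised via support).
settings = boto3.client("lambda").get_account_settings()
print("Concurrent execution limit:",
      settings["AccountLimit"]["ConcurrentExecutions"])
```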
It’s a tie between Lambdas and traditional cloud architecture in this section: hands-free scale is great, but hands-on yields control and, after setup and tuning, becomes hands-free.
Monitoring
You can enable CloudWatch monitoring for both API Gateway and Lambda, much as you would enable logging for EC2 ELBs, and for Lambda you can view the logs within CloudWatch. When enabling logging for API Gateway, make sure the log level is set to “Info” rather than “Error”, which gives more detailed information in the logs. Even so, the API Gateway logs are lacking in detail, and we have had multiple occasions where the only option was to send the logs to Amazon to review. This is a stark reminder that the whole system is a black box, much of which is outside our visibility.
Additionally, we encountered serious headaches when attempting to review and parse logs through the CloudWatch GUI. The search filter is far from intuitive: to find the right log stream, you need to select the correct Lambda version and alias (which are prepended to the stream names) and then scroll through each log file to find the specific timeframe in question.
There are libraries that you can use to supplement these capabilities (or you could write your own log parsers), but we haven’t experimented with this yet.
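A minimal sketch of pulling a time window of Lambda logs out of CloudWatch with boto3 instead of the GUI; the function name, window, and filter pattern are illustrative.

```python
import time
import boto3

logs = boto3.client("logs")

end = int(time.time() * 1000)
start = end - 15 * 60 * 1000  # last 15 minutes, in milliseconds

resp = logs.filter_log_events(
    logGroupName="/aws/lambda/my-connector-function",  # /aws/lambda/<function-name>
    startTime=start,
    endTime=end,
    filterPattern="ERROR",  # only events containing "ERROR"
)
for event in resp["events"]:
    print(event["timestamp"], event["message"].rstrip())
```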
The clear winner here is traditional EC2, which, when set up correctly, makes logging (and log parsing) easy.
Conclusion
Ultimately, whether you want to use Lambda depends on your context. There are situations where Lambda may fit incredibly well and come in at much lower setup and maintenance costs. This is particularly true for lightweight services that have unpredictable traffic volumes.
The table below summarizes our findings above.
| Category | Lambda | Traditional Cloud Architecture |
| --- | --- | --- |
| Language Choice | × Limited choice of languages and versions | ✓ |
| Setup / Maintenance | ✓ Easy to set up and maintain | × Context-specific |
| Ongoing Cost | = Both cost pretty much the same | = Both cost pretty much the same |
| Networking | × No direct control, no public IP addresses | ✓ Name resolution, public IP addresses |
| Libraries / Dependencies | × Only smaller dependencies | ✓ Any size dependencies |
| Security | ✓ Black box, but less exposure and responsibility | × Control of, and responsibility for, security |
| Environments | ✓ Easy to set up | × Varies by use case, but generally more complex than Lambda |
| Timeouts | × Hard 300-second timeout | ✓ |
| Scale | = Easily scales, low touch | = Scales easily if set up correctly |
| Monitoring / Logging | × Easy monitoring, but logs are huge and difficult to parse | ✓ Easy monitoring and logging using appropriate tools |