lambda warming is an anti-pattern

AWS Lambda functions are a great tool for asynchronous, burstable, serverless processing and transactional computing.

The canonical example AWS gives for Lambda usage is asynchronously generating thumbnails from uploaded images, and this is a prime use case for Lambdas. They provide scalable processing on demand without the need to run servers constantly when there are no jobs to complete.

They are also a great fit for batch media encoding when you don't have the resources or the desire to manage a rendering farm, or when your workload is not consistent enough to justify the operational overhead of the servers required to handle encodes and other tasks that can be offloaded into the background.
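As a rough illustration, the thumbnail case boils down to a handler along these lines (a minimal sketch: assume Pillow is packaged with the function, since it's not in the base runtime, and the bucket names are hypothetical):

```python
# A minimal sketch of the canonical S3-triggered thumbnail Lambda.
# Assumptions: Pillow is bundled with the function (it is not in the base runtime),
# and "my-thumbnails" is a hypothetical output bucket.
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")


def handler(event, context):
    # S3 sends one record per uploaded object in the notification event.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]  # URL-decoding skipped for brevity

        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        image = Image.open(io.BytesIO(original)).convert("RGB")
        image.thumbnail((256, 256))  # resizes in place, preserving aspect ratio

        out = io.BytesIO()
        image.save(out, format="JPEG")

        s3.put_object(Bucket="my-thumbnails", Key=key, Body=out.getvalue())
```

No server sits idle between uploads; the function only exists while there is an image to process.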

What I have seen, however, is that Lambdas have been used in places where they never should have been. Lambdas can certainly provide backend and batch processing to support an API, but they should never be used as the client-facing request handler for an API, due to the cold start issue as well as additional scalability issues that surface in a large enterprise environment.

Lambdas work well when there is no direct client communication with the system, as the cold start time for a Lambda can take multiple seconds before the Lambda code even starts executing. While subsequent requests to a "warm" Lambda will be much quicker, the initial call to a "cold" Lambda will cause noticeable lag for the user, as well as bottleneck any systems that utilize multiple composable micro-services.

Furthermore, if the Lambdas are internally-networked through a customer-owned VPC, there is an additional delay while attaching an ENI - however my contacts at AWS say this will be mitigated in the near future.

Additionally, while Lambdas enable scalable burst request handling, one must be cognizant of the account-wide limit on concurrent Lambda executions imposed by AWS.

While many AWS limits can be modified, this is currently a hard limit which cannot be lifted. Since this is an account-level limit, if a customer runs production and non-production systems in the same account and uses VPC and / or IAM segregation for environment management, load testing in QA can actually take down PROD - a pain one of my teams experienced at a very inopportune time (is there ever a good time for prod to go down?).

To make matters worse, there is no way to view the list of currently running / queued Lambdas, so setting CloudWatch alerts for such an edge case is currently not possible.
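A partial guardrail worth sketching is reserved concurrency: it caps how much of the shared account pool a single function can consume, so a runaway load test can't starve everything else (though it does nothing to raise the overall ceiling). The function name and limit here are hypothetical:

```python
# Rough sketch: reserve (i.e. cap) concurrency for a single function so it cannot
# consume the entire account-wide pool. Function name and value are hypothetical.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.put_function_concurrency(
    FunctionName="qa-load-test-worker",
    ReservedConcurrentExecutions=50,  # this function can never run more than 50 at once
)
```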

I am a proponent of micro-service oriented architecture, and single purpose Lambdas do follow the UNIX philosophy of "do one thing, and do it well". However, the issues mentioned above are compounded when development teams use Step Functions and multiple micro Lambdas for each API call. Not only does each Lambda exhibit the cold start issue, but with potentially hundreds of Lambdas called for each API request, a widely used enterprise-level API could quickly push the account into the concurrent execution limit.

My contacts at AWS explain that the concurrent Lambda issue can be mitigated by splitting team environments into separate accounts, but that makes billing and internal chargebacks more cumbersome. And at enterprise scale, even this may not be enough if a team fully embraces the serverless Lambda philosophy.

However, beyond the edge case of exhausting concurrent executions, the real issue I have seen with Lambda usage is implementing it as an edge API router / handler. And development teams cannot be faulted for using it this way: API Gateway was released as a frontend RESTful layer for Lambda execution, among other AWS services.

Lambdas are nothing more than lightweight micro-VMs that run on top of the Firecracker virtualization system. Like any container or VM, they incur an initial startup cost while resources are allocated and the virtualized environment boots. The issue is that Lambdas were never meant to be continually running processes.

In programming, a lambda function is simply a function that takes input and returns output. That means that a Lambda is not running in an event loop, it's not running in the background, and it's not continually available to process requests. Instead, Lambdas are instantiated with new event input, and exit when the process has completed. When used as an API ingress, this means that for each HTTP request, a Lambda is instantiated, runs, and exits.
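To make that concrete, a Lambda behind an API Gateway proxy integration boils down to something like this sketch (the proxy event shape is standard; everything else here is illustrative):

```python
import json
import time

# Module-level code runs once, during the cold start, while the execution
# environment is being created - this is where the multi-second delay lives.
ENVIRONMENT_STARTED_AT = time.time()


def handler(event, context):
    # The handler runs once per request: event in, response out. Between requests
    # the environment sits idle until AWS either reuses it ("warm") or reclaims it.
    return {
        "statusCode": 200,
        "body": json.dumps({
            "path": event.get("path"),
            "environment_started_at": ENVIRONMENT_STARTED_AT,
        }),
    }
```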

For an API that receives a few requests a day, this may actually be desirable compared to running a full server or K8S cluster to handle those disparate requests. However, for a heavily used API, this adds more overhead and complexity than a server handling the requests, or, at heavier load, an orchestrated container environment that scales with the incoming request load.

As mentioned, while Lambdas experience the cold start issue if called infrequently, a Lambda that is called more regularly will be kept "warm" - effectively the difference between a docker pull && docker run... and a docker stop && docker start....

To address the Lambda cold start issue, dev teams have taken to using "Lambda warmers": Lambdas invoked on a regular schedule which in turn call the API Lambda(s) to keep them "warm" and ready to accept real connections.
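In practice a warmer is little more than a scheduled rule plus a short-circuit in the handler, along these lines (a sketch; the "warmer" flag in the payload is just a convention, not anything AWS defines):

```python
# Sketch of the "warmer" anti-pattern: an EventBridge rule fires every few minutes
# with a payload like {"warmer": true}, purely to keep execution environments resident.


def handler(event, context):
    if event.get("warmer"):
        # A keep-warm ping: do no real work, just return immediately.
        return {"warmed": True}

    # ... the actual API logic lives here ...
    return {"statusCode": 200, "body": "real request handled"}
```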

First of all, this need to "keep a Lambda warm" completely undermines the transactional nature of Lambdas - they should only be data in, data out.

Second, if a Lambda must be kept "warm", it probably indicates that the Lambda is called frequently enough, and needs low enough latency, that the process is better served as a server process, a K8S container, or, if you're all-in with AWS, an ECS Fargate container.

While a server-side process is generally more difficult to scale (ignoring ASGs for a moment), K8S or Fargate containers are made explicitly to handle such dynamic loads and requirements.

Until recently, API Gateway did not have an integration with Fargate or K8S. However, with the recent addition of PrivateLink NLB integration with API Gateway, one can front containerized workloads with API Gateway, so any reasoning to use Lambda as an API ingress is rendered moot.

Furthermore, with API management services such as Apigee, or, even better if you're in the containerized world, dynamic cluster ingress controllers such as NGINX or Traefik, one can manage microservice ingress and API routing entirely within the cluster and application logic.

In sum - Lambdas serve a great purpose for backend processing, batch jobs, transcoding, and other work that does not require an immediate response and does not have direct client interaction.

But for systems which end-users interact with directly, you should probably use an auto-scaling system which has at least a couple of pods / tasks running continually, otherwise you'll end up with more "your app is slow!" complaints than you - or your app support teams - would like to deal with.

last updated 2022-08-20