Introduction
In part 1 of our series about how to develop, run and optimize a Quarkus web application on AWS Lambda, we demonstrated how to write a sample application which uses the Quarkus framework, AWS Lambda, Amazon API Gateway and Amazon DynamoDB. We also made the first Lambda performance (cold and warm start time) measurements and observed quite long cold start times.
In part 2 of the series, we introduced Lambda SnapStart and measured how enabling it reduces the Lambda cold start time by more than 50%.
In part 3 of the series, we introduced Lambda SnapStart priming techniques, starting with DynamoDB request priming, with the goal of further improving the performance of our Lambda functions. We saw that this kind of priming, at the cost of some additional code, significantly reduces the Lambda cold start times compared to simply activating SnapStart.
We also clearly observed the impact of the AWS SnapStart Snapshot tiered cache in our measurements.
In this part of our article series, we'll introduce another Lambda SnapStart priming technique: API Gateway request event priming. We'll then measure the Lambda performance with it applied and compare the results with the approaches introduced so far.
Sample application with activated AWS Lambda SnapStart using API Gateway request event priming
We'll reuse the same sample application introduced in part 1 of our series.
Activating Lambda SnapStart is a prerequisite for this method. In the AWS SAM template (template.yaml) this looks as follows:
Globals:
  Function:
    Handler: io.quarkus.amazon.lambda.runtime.QuarkusStreamHandler::handleRequest
    CodeUri: target/function.zip
    Runtime: java21
    SnapStart:
      ApplyOn: PublishedVersions
    ....
This can be done in the Globals section of the AWS SAM template, in which case SnapStart applies to all Lambda functions defined there, or you can add the two lines
  SnapStart:
    ApplyOn: PublishedVersions
to the definition of an individual Lambda function to activate SnapStart only for that function.
You can read more about the concepts behind Lambda SnapStart in part 2.
SnapStart and runtime hooks offer new possibilities for building Lambda functions with low startup latency. With the pre-snapshot hook, we can prepare our Java application as much as possible for the first call: we load and initialize as much as possible of what our Lambda function needs before the snapshot is created. This technique is known as priming.
Here I'll present another experimental priming technique that preinitializes the entire web request (API Gateway request event). This preinitializes more than the DynamoDB request priming described in part 3, but also requires significantly more code to be written. The idea is nevertheless comparable. Let's take a look at the implementation in the AmazonAPIGatewayPrimingResource class:
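import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Map;
import org.crac.Core;
import org.crac.Resource;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import io.quarkus.amazon.lambda.runtime.QuarkusStreamHandler;
import io.quarkus.runtime.Startup;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
// MockLambdaContext (an implementation of the Lambda Context interface) has to be imported as well.

@Startup
@ApplicationScoped
public class AmazonAPIGatewayPrimingResource implements Resource {

    // Registers this bean as a CRaC resource so that the hooks below are invoked.
    @PostConstruct
    public void init() {
        Core.getGlobalContext().register(this);
    }

    // Runs before the snapshot is taken: sends one mocked API Gateway request through Quarkus.
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        new QuarkusStreamHandler().handleRequest(
                new ByteArrayInputStream(convertAwsProxRequestToJsonBytes()),
                new ByteArrayOutputStream(), new MockLambdaContext());
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
    }

    // Serializes the mocked request event to JSON, the format QuarkusStreamHandler reads.
    private static byte[] convertAwsProxRequestToJsonBytes() throws JsonProcessingException {
        ObjectWriter ow = new ObjectMapper().writer().withDefaultPrettyPrinter();
        return ow.writeValueAsBytes(getAwsProxyRequest());
    }

    // Builds the minimal event GetProductByIdHandler needs: HTTP method and the id path parameter.
    private static APIGatewayProxyRequestEvent getAwsProxyRequest() {
        final APIGatewayProxyRequestEvent aPIGatewayProxyRequestEvent = new APIGatewayProxyRequestEvent();
        aPIGatewayProxyRequestEvent.setHttpMethod("GET");
        aPIGatewayProxyRequestEvent.setPathParameters(Map.of("id", "0"));
        return aPIGatewayProxyRequestEvent;
    }
}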
We use Lambda SnapStart CRaC runtime hooks here. To do this, we need to declare the following dependency in pom.xml:
<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
</dependency>
Please make sure that the @Startup annotation is present on the AmazonAPIGatewayPrimingResource class so that the priming takes effect. As we can see, in the getAwsProxyRequest method we create an object of type APIGatewayProxyRequestEvent and set some of its properties: the HTTP method to "GET" and the path parameter id to "0". This basically mocks the APIGatewayProxyRequestEvent which is the input parameter of the handleRequest method. Only these properties of the APIGatewayProxyRequestEvent need to be set to invoke the GetProductByIdHandler Lambda function, which then accesses the product id by invoking requestEvent.getPathParameters().get("id").
In the CRaC runtime hook method beforeCheckpoint, the APIGatewayProxyRequestEvent is serialized into a JSON byte array and processed by calling QuarkusStreamHandler().handleRequest, which in turn invokes the handleRequest method of the GetProductByIdHandler Lambda function (mapped in template.yaml to the /products/{id} path and the HTTP GET method). The priming request is handled inside the Lambda execution environment, so no network round trip through API Gateway is required.
The purpose of this priming is to instantiate all required classes and to exercise the translation of the AWS Lambda programming model (and invocation) into the Quarkus programming model. Because the primed handleRequest call of the GetProductByIdHandler also calls DynamoDB, the DynamoDB request priming presented in part 3 is carried out automatically as well.
To ensure that only this priming takes effect, please either comment out or remove the @Startup annotation in the AmazonDynamoDBPrimingResource class.
However, I consider this priming technique to be experimental, as it naturally leads to a lot of extra code, which can be significantly simplified using a few utility methods. Therefore, the decision to use this priming method is left to the reader.
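As an illustration of what such a utility method could look like, here is a minimal sketch that builds the JSON bytes of a minimal API Gateway proxy request from an HTTP method and path parameters. The class PrimingEvents and its methods are hypothetical and not part of the sample application; the sketch uses only the JDK and assumes that the field names httpMethod and pathParameters are all the handler needs, as in the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the sample application.
final class PrimingEvents {

    private PrimingEvents() {
    }

    /** Builds a minimal API Gateway proxy request as JSON text. */
    static String apiGatewayProxyRequestJson(String httpMethod, Map<String, String> pathParameters) {
        // TreeMap gives a stable key order, which keeps the output predictable.
        String params = new TreeMap<>(pathParameters).entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(","));
        return "{\"httpMethod\":" + quote(httpMethod)
                + ",\"pathParameters\":{" + params + "}}";
    }

    /** Same request as UTF-8 bytes, ready to be wrapped in a ByteArrayInputStream. */
    static byte[] apiGatewayProxyRequestBytes(String httpMethod, Map<String, String> pathParameters) {
        return apiGatewayProxyRequestJson(httpMethod, pathParameters).getBytes(StandardCharsets.UTF_8);
    }

    // Escapes only backslashes and double quotes; enough for simple ids and method names.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With such a helper, the beforeCheckpoint method could pass new ByteArrayInputStream(PrimingEvents.apiGatewayProxyRequestBytes("GET", Map.of("id", "0"))) to the stream handler, and priming further endpoints would only need one additional line each.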
Measurements of cold and warm start times of our application with Lambda SnapStart and API Gateway request event priming
In the following, we will measure the performance of our GetProductByIdFunction Lambda function, which we will trigger by invoking curl -H "X-API-Key: a6ZbcDefQW12BN56WEV318" https://{$API_GATEWAY_URL}/prod/products/1.
The results of the experiment are based on reproducing more than 100 cold starts and about 100,000 warm starts with the Lambda function GetProductByIdFunction (we ask for the already existing product with ID=1) over a period of about 1 hour. We give the Lambda function 1024 MB of memory, which is a good trade-off between performance and cost. We also use the (default) x86 Lambda architecture. For the load tests I used the load test tool hey, but you can use whatever tool you want, like Serverless-artillery or Postman.
We will measure with tiered compilation (which is the default in Java 21, so we don't need to set anything separately) and with the compilation option -XX:+TieredCompilation -XX:TieredStopAtLevel=1. To use the latter, you have to set it in template.yaml in the JAVA_TOOL_OPTIONS environment variable as follows:
Globals:
  Function:
    Handler: io.quarkus.amazon.lambda.runtime.QuarkusStreamHandler::handleRequest
    ...
    Environment:
      Variables:
        JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
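If you want to double-check which compilation settings the JVM actually runs with, one option is to read them at runtime. The following is a minimal sketch, assuming a HotSpot-based JVM (it will not work in a GraalVM native image); the class name CompilerFlagCheck is hypothetical and not part of the sample application.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the sample application.
final class CompilerFlagCheck {

    private CompilerFlagCheck() {
    }

    /** Returns the current value of a HotSpot VM option, e.g. "TieredStopAtLevel". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // With the JAVA_TOOL_OPTIONS setting shown above, TieredStopAtLevel should report 1.
        System.out.println("TieredCompilation=" + vmOption("TieredCompilation"));
        System.out.println("TieredStopAtLevel=" + vmOption("TieredStopAtLevel"));
    }
}
```

Logging these two values once during initialization of the Lambda function is enough to confirm that the environment variable was picked up.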
Please also note the effect of the AWS SnapStart snapshot tiered cache. With SnapStart activated, we get the longest cold starts during the first measurements; due to the tiered cache, the subsequent cold starts have lower values. For more details about the technical implementation of AWS SnapStart and its tiered cache, I refer you to the presentation by Mike Danilov: "AWS Lambda Under the Hood". Therefore, I will present the Lambda performance measurements with SnapStart activated for all approx. 100 cold start times (labelled as all in the table), but also for the last approx. 70 (labelled as last 70 in the table), so that the effect of the snapshot tiered cache becomes visible. Depending on how often the respective Lambda function is updated, and thus some layers of the cache are invalidated, a Lambda function can experience thousands or tens of thousands of cold starts during its life cycle, so that the first, longer-lasting cold starts no longer carry much weight.
To show the impact of SnapStart with API Gateway request event priming, we'll also present the Lambda performance measurements without SnapStart from part 1, with SnapStart but without priming as measured in part 2, and with SnapStart and DynamoDB request priming as measured in part 3.
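To make the difference between the "all" and "last 70" rows concrete, here is a minimal sketch of how such percentiles can be computed from a list of measured cold start durations. It assumes the nearest-rank percentile definition, which is not necessarily the method used to produce the tables; the class ColdStartPercentiles and the sample values in the usage note are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the sample application.
final class ColdStartPercentiles {

    private ColdStartPercentiles() {
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of all samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Keeps only the last n samples in measurement order, dropping the earliest ones. */
    static double[] lastN(double[] samples, int n) {
        int from = Math.max(0, samples.length - n);
        return Arrays.copyOfRange(samples, from, samples.length);
    }
}
```

For example, for ten made-up durations that start high and then settle (900, 800, 700, 600, 610, 590, 605, 595, 600, 598), percentile(all, 90) picks one of the early slow cold starts, while percentile(lastN(all, 7), 90) only sees the later, faster ones.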
Cold (c) and warm (w) start time with tiered compilation in ms:
Scenario | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
---|---|---|---|---|---|---|---|---|---|---|---|---|
No SnapStart enabled | 3344 | 3422 | 3494 | 3633 | 3904 | 3907 | 5.92 | 6.83 | 8.00 | 19.46 | 50.44 | 1233 |
SnapStart enabled but no priming applied, all | 1643 | 1703 | 1953 | 2007 | 2084 | 2084 | 5.68 | 6.35 | 7.39 | 16.39 | 49.23 | 1386 |
SnapStart enabled but no priming applied, last 70 | 1604 | 1664 | 1728 | 1798 | 1798 | 1798 | 5.64 | 6.30 | 7.33 | 15.87 | 47.30 | 1286 |
SnapStart enabled and DynamoDB request priming applied, all | 666 | 720 | 944 | 1117 | 1317 | 1318 | 5.73 | 6.45 | 7.57 | 16.01 | 39.07 | 566 |
SnapStart enabled and DynamoDB request priming applied, last 70 | 646 | 681 | 774 | 1043 | 1043 | 1043 | 5.64 | 6.40 | 7.51 | 15.75 | 37.54 | 566 |
SnapStart enabled and API Gateway request event priming applied, all | 604 | 675 | 648 | 1181 | 1197 | 1198 | 5.25 | 6.25 | 7.33 | 15.62 | 41.64 | 338 |
SnapStart enabled and API Gateway request event priming applied, last 70 | 588 | 599 | 650 | 790 | 790 | 790 | 5.46 | 6.10 | 7.16 | 14.90 | 39.38 | 241 |
Cold (c) and warm (w) start time with -XX:+TieredCompilation -XX:TieredStopAtLevel=1 compilation in ms:
Scenario | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
---|---|---|---|---|---|---|---|---|---|---|---|---|
No SnapStart enabled | 3357 | 3456 | 3554 | 4039 | 4060 | 4060 | 6.01 | 6.83 | 8.13 | 19.77 | 53.74 | 1314 |
SnapStart enabled but no priming applied, all | 1593 | 1625 | 1722 | 1834 | 1930 | 1930 | 5.55 | 6.21 | 7.16 | 16.08 | 50.44 | 1401 |
SnapStart enabled but no priming applied, last 70 | 1574 | 1621 | 1685 | 1801 | 1801 | 1801 | 5.55 | 6.20 | 7.16 | 15.14 | 49.23 | 1401 |
SnapStart enabled and DynamoDB request priming applied, all | 636 | 701 | 943 | 973 | 1055 | 1055 | 5.50 | 6.20 | 7.21 | 14.66 | 39.07 | 330 |
SnapStart enabled and DynamoDB request priming applied, last 70 | 628 | 654 | 692 | 859 | 859 | 859 | 5.50 | 6.15 | 7.04 | 14.08 | 37.25 | 270 |
SnapStart enabled and API Gateway request event priming applied, all | 616 | 656 | 617 | 941 | 960 | 961 | 5.55 | 6.21 | 7.39 | 16.08 | 41.40 | 486 |
SnapStart enabled and API Gateway request event priming applied, last 70 | 595 | 619 | 653 | 765 | 765 | 765 | 5.47 | 6.21 | 7.27 | 16.08 | 39.94 | 486 |
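To put the numbers into perspective, the relative cold start reduction compared to the baseline without SnapStart can be computed directly from the tables. The sketch below does this for the c p50 values of the "all" rows with tiered compilation; the class ColdStartReduction is only an illustration.

```java
import java.util.Locale;

// Illustration only: relative reduction of the median cold start time.
final class ColdStartReduction {

    private ColdStartReduction() {
    }

    /** Reduction of improvedMs relative to baselineMs, in percent. */
    static double reductionPercent(double baselineMs, double improvedMs) {
        return (baselineMs - improvedMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 3344; // c p50, no SnapStart enabled, tiered compilation
        print("SnapStart, no priming", reductionPercent(baseline, 1643));
        print("SnapStart, DynamoDB request priming", reductionPercent(baseline, 666));
        print("SnapStart, API Gateway request event priming", reductionPercent(baseline, 604));
    }

    private static void print(String scenario, double percent) {
        System.out.println(scenario + ": " + String.format(Locale.ROOT, "%.1f", percent) + "% lower");
    }
}
```

This yields roughly 50.9%, 80.1% and 81.9% lower median cold start times respectively, so the additional gain of API Gateway request event priming over DynamoDB request priming at the median is comparatively small.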
Conclusion
In this part of the series, we introduced another Lambda SnapStart priming technique, API Gateway request event priming, with the goal of further improving the performance of our Lambda functions compared to DynamoDB request priming. We saw that this kind of priming, at the cost of some additional code, further reduces the Lambda cold start times.
We also saw that with this type of priming the -XX:+TieredCompilation -XX:TieredStopAtLevel=1 compilation option outperformed the default tiered compilation mainly at the higher cold start percentiles (p99 and above), while the lower percentiles were close to each other.
We also clearly observed the impact of the AWS SnapStart Snapshot tiered cache in our measurements.
However, as I've already pointed out, I consider this priming technique to be experimental, as it naturally leads to a lot of extra code, which can be significantly simplified using a few utility methods. Therefore, the decision to use this priming method is left to the reader. You can also stick with DynamoDB request priming and accept slightly higher Lambda cold start times.
In the next part of our article series, we'll show how to adjust our sample application so that we can build a GraalVM Native Image from it and deploy it as a Lambda Custom Runtime. We'll then measure the Lambda performance with it and compare the results with the approaches introduced so far.