Forem: Leon Brocard

Profiling Fastly Compute applications

Leon Brocard — Thu, 23 Jan 2025 10:06:47 +0000

As a member of the Fastly Solutions Engineering team, I care deeply about web performance. I work with our customers to configure their services and use Fastly features to make their web applications fast. While I generally use browser developer tools to investigate performance, how can we investigate performance on Fastly Compute, our serverless platform that lets you easily build the best experiences for your users?

Fastly Compute is an advanced edge computing system that runs your code, in your favorite language, on our global edge network. Security and portability are provided by compiling your code to WebAssembly (Wasm), a portable compilation target for programming languages. We run your code using Wasmtime, a fast and secure runtime for WebAssembly from the Bytecode Alliance project.

Setting up the environment

Let’s step through an example. Fastly Compute provides language tooling for JavaScript, Go, and Rust. I wrote a Rust application using the Fastly Rust SDK that generates a picture of a part of the Julia set, a fractal that can be quite pretty.
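The application's source is not reproduced here, so the following is only a minimal sketch of the usual escape-time approach to drawing a Julia set, using nothing beyond the Rust standard library. The function name julia_iterations, the constant c and the image dimensions are all assumptions chosen for illustration, not values from the real application.

```rust
/// Count how many iterations of z -> z^2 + c the starting point (zx, zy)
/// survives before leaving the circle of radius 2, capped at `max_iter`.
/// Hypothetical helper: the real application may be structured differently.
fn julia_iterations(mut zx: f64, mut zy: f64, cx: f64, cy: f64, max_iter: u32) -> u32 {
    let mut i = 0;
    while i < max_iter && zx * zx + zy * zy <= 4.0 {
        let next_zx = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = next_zx;
        i += 1;
    }
    i
}

fn main() {
    // Assumed parameters, picked only to produce a recognisable shape.
    let (cx, cy) = (-0.8, 0.156);
    let (width, height) = (60, 24);
    for py in 0..height {
        let line: String = (0..width)
            .map(|px| {
                // Map the character grid onto a small region of the complex plane.
                let zx = (px as f64 / width as f64) * 3.0 - 1.5;
                let zy = (py as f64 / height as f64) * 2.0 - 1.0;
                if julia_iterations(zx, zy, cx, cy, 100) == 100 { '#' } else { '.' }
            })
            .collect();
        println!("{line}");
    }
}
```

A real image would map the iteration count to a colour rather than a character, but the per-pixel loop is the same shape of work.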

To run this on Fastly Compute, the Fastly CLI invokes the Rust compiler to compile the Rust code to the Wasm platform rather than my laptop’s platform. To run this on your laptop, use

fastly compute serve

Generating these pretty images takes a few hundred milliseconds, which seems a little slow. How can I find out which part is slow? The engineer’s tool of choice is a profiler. I could separate the Rust code out and use Rust’s standard profiling tools. However, the performance on my laptop’s platform might not represent the performance on Wasm.

Capturing performance data

It’s best to profile applications in a way that is as similar to production as possible, so we’ll use the cross-platform Wasmtime guest profiler. The “guest” part of the name indicates that it profiles inside the Wasm process. To serve and profile from your laptop, use

fastly compute serve --profile-guest

Every 50 microseconds Compute notes down the function call stack (that is, which function are we in and which function called it) and then after the HTTP response is sent, it writes the captured profile to a file. The file is in a format supported by the Firefox profiler. Captured profiles can be viewed by dragging and dropping them onto https://profiler.firefox.com/, which processes the profiles using your browser.

Analyzing profiler output

The initial view from the Firefox profiler shows a number of tabs and the call tree, which is split up by function names. These functions come from the Rust runtime, from my application, and from libraries that my application uses.

The guest profiler took 7,922 samples. The entry point of my application is the highlighted ecp_example_fractal::main function and 100% of the samples had it in the function call stack, as indicated by the Total (samples) column. However, the Self column indicates that none of the samples were in the main function itself: all of the work is being done in other functions.
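To make the Total and Self columns concrete, here is a minimal sketch of how a list of sampled call stacks could be reduced to those two counts. It is not the Firefox profiler's implementation: the Counts struct, the aggregate and sampled_micros functions, and the sample stacks are made up for illustration. The last line multiplies the 7,922 samples by the 50 microsecond interval mentioned earlier to get a rough duration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Per-function sample counts, in the spirit of the profiler's columns.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    total: u64, // samples with the function anywhere on the stack
    self_: u64, // samples with the function as the innermost frame
}

/// Each sample is a call stack listed from the entry point to the innermost frame.
fn aggregate(samples: &[Vec<&str>]) -> BTreeMap<String, Counts> {
    let mut table: BTreeMap<String, Counts> = BTreeMap::new();
    for stack in samples {
        // Count each function once per sample, even if it appears twice (recursion).
        let unique: BTreeSet<&str> = stack.iter().copied().collect();
        for name in unique {
            table.entry(name.to_string()).or_default().total += 1;
        }
        if let Some(leaf) = stack.last() {
            table.entry(leaf.to_string()).or_default().self_ += 1;
        }
    }
    table
}

/// Approximate time covered by `samples` taken every `interval_us` microseconds.
fn sampled_micros(samples: u64, interval_us: u64) -> u64 {
    samples * interval_us
}

fn main() {
    // Made-up stacks for illustration; not taken from the real profile.
    let samples = vec![
        vec!["main", "from_fn", "eval_rational"],
        vec!["main", "from_fn"],
        vec!["main", "write_to"],
        vec!["main", "compress"],
    ];
    for (name, c) in aggregate(&samples) {
        println!("{name}: total={} self={}", c.total, c.self_);
    }
    println!("7,922 samples at 50 us each is roughly {} us", sampled_micros(7922, 50));
}
```

In this toy data, main appears in every stack but is never the innermost frame, which is the same pattern as the real main function above: 100% Total and zero Self.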

The key parts of the application are:

image::buffer_::ImageBuffer::from_fn(), from a Rust image library, which runs a function for every pixel. The function itself accounted for 345 samples, while call stacks passing through it accounted for 3,900 samples.
colorous::gradient::Gradient::eval_rational, from a Rust colour scheme library, which assigns a nice colour to each pixel.
brotli::enc::writer::CompressorWriterCustomIo, from a Rust library that compresses the response using Brotli.
image::dynimage::DynamicImage::write_to, from the same Rust image library, which encodes the image in the PNG format.

Another way of visualizing the Self count is the flame graph tab:

This is a good way of seeing the relative amount of time being spent by these functions. By seeing what stacks are above the main function, we can see that the application spends most of its time generating the image (the image:: stacks), picking the pretty colours (the colorous:: stacks) and compressing the image as Brotli (the brotli:: stacks).

One more way of visualizing what functions are on the call stack is the stack chart tab:

By looking at the stacks below the main function, we can see that for half the time the call stack contains functions for generating the image, and for the other half it contains functions for compressing the image with Brotli.

Wait a second: I’m already compressing this image in the image-specific PNG format, so there is no point in compressing it again using general-purpose Brotli! Compressing assets is best practice, but double compression is a waste of time. I must have copied and pasted that bit of code from another project. If I remove the Brotli compression then the application generates the same images but runs three times faster. This updated flame graph shows that the application now spends most of its time generating the image and the rest encoding the image:

Conclusion

Use the guest profiler via the fastly compute serve --profile-guest command to optimize your Compute applications and make them even faster.


Demystifying the HTTP Host header

Leon Brocard — Wed, 28 Jun 2023 12:43:35 +0000

The HTTP Host header is a small, important part of our modern web. It is used more than you might think in Fastly. In this post, we’ll dig into the history of the header and show how crucial it is to serving content stored in an object store through Fastly.

How it started: single domain hosting

When the web was growing up, some early assumptions worked well. As we’ll see, some of our assumptions had to change.

Let’s work through an example. It’s 1996. Gina G’s “Ooh Aah... Just a Little Bit” is playing on the radio. Real physical computers, which were probably beige, ran web servers on port 80, and each server could only serve content for a single domain. You open up Netscape Navigator and browse to www.example.com.

The browser resolves www.example.com, connects to port 80 of the server, and sends the few characters:

GET / HTTP/1.0

The web server listens on port 80, receives the request and sends back an HTML response. Success!

The web took off: virtual hosting

As the global hypertext dream came true in practice, there was a lot of success. So much success that the limitation of only one domain per server became a problem. The Host request header was introduced.

Let’s work through an example. It’s 1997. The Cardigans’ “Lovefool” is playing on the radio. A real computer, which is now a sleek black, runs a pretend computer which in turn runs a web server. You open up Netscape Communicator and browse to www.example.com.

The browser resolves www.example.com, connects to port 80 of the server and sends the few characters:

GET / HTTP/1.1
Host: www.example.com

The web server listens on port 80, receives the request, pays attention to the Host request header and sends back an appropriate HTML response.

The effect of this is that the web server can use the Host header to direct your request to one of many websites hosted on the same machine. It's no longer necessary to have one IP address per website. And thank goodness, because we didn't have anywhere near enough IP addresses for that!
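As a rough illustration of that routing step, the sketch below pulls the Host header out of a raw HTTP/1.x request and uses it to choose a document root. It is not how any particular web server is implemented: the function names host_header and document_root, the site table and the paths are assumptions made for this example, and it ignores details such as ports in the Host value.

```rust
use std::collections::HashMap;

/// Extract the Host header value from a raw HTTP/1.x request, if present.
fn host_header(request: &str) -> Option<&str> {
    request
        .lines()
        .skip(1) // skip the request line, e.g. "GET / HTTP/1.1"
        .take_while(|line| !line.is_empty()) // headers end at the blank line
        .find_map(|line| {
            let (name, value) = line.split_once(':')?;
            // Header names are case-insensitive.
            name.trim().eq_ignore_ascii_case("host").then(|| value.trim())
        })
}

/// Pick a document root for the request, falling back to a default site
/// when there is no Host header or the host is unknown.
fn document_root<'a>(sites: &HashMap<&str, &'a str>, request: &str, default: &'a str) -> &'a str {
    host_header(request)
        .and_then(|host| sites.get(host.to_ascii_lowercase().as_str()).copied())
        .unwrap_or(default)
}

fn main() {
    // Hypothetical sites sharing one machine and one IP address.
    let mut sites = HashMap::new();
    sites.insert("www.example.com", "/var/www/example");
    sites.insert("blog.example.org", "/var/www/blog");

    let request = "GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n";
    println!("{}", document_root(&sites, request, "/var/www/default"));
}
```

An HTTP/1.0 request without a Host header falls through to the default site, which mirrors the single-domain behaviour of the 1996 example.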

TLS

As the web grew, ecommerce started taking off. Sending credit card numbers and other sensitive data in plain text across the network became a problem. We needed to encrypt content in transit across the web. The Secure Sockets Layer (SSL) provided secure communications between web browsers and web servers and led to the Transport Layer Security (TLS) protocol.

It’s 1999. Britney Spears’ “...Baby One More Time” is playing on the radio and our web server now runs on something of indeterminate colour at a mass hosting company. You open up Internet Explorer 5 and browse to www.example.com.

Before the browser and server get to speak HTTP, they first participate in a TLS conversation. The browser connects to port 443. As part of the TLS Server Name Indication extension, the browser indicates the name of the server it is contacting. The secure web server directs your request to one of many web servers.

After TLS is negotiated, the HTTP conversation continues over the secure connection. Sensitive information is no longer carried in plain text across the network and the dot-com boom happens.

Today

It’s today. As we speak, hypertext spans the globe and we now listen to radio using HTTP. Every request sends a Host request header (for HTTP/1.1 requests) or an :authority pseudo-header (for HTTP/2 and HTTP/3 requests).

Fastly’s powerful edge cloud platform enables developers to build exceptional websites and apps. We sit between our customers’ customers and our customers’ servers (origins). It’s common to modify the request Host header in the Fastly layer as your origins might have a different naming convention to your public domains.

For example, www.example.com might be the public domain, whereas the real service might run on production.example.com.

This simple case is easy to configure. For a Fastly Delivery or Compute service, you can specify an override host on the origin. For a Fastly Compute service, use override_host in fastly.toml for development. For production, the Fastly CLI sets the override by default when adding a backend.
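For local development, a fastly.toml fragment along these lines is one way to express that override. Treat it as a sketch: the backend name origin_0 is a placeholder chosen for this example, and the host names are the ones from the example above.

```toml
# fastly.toml: settings used by the local development server only
[local_server.backends.origin_0]          # "origin_0" is a hypothetical backend name
url = "https://production.example.com"
override_host = "production.example.com"  # Host header sent to the origin
```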

Object stores

While running physical servers and software used to be the only way to go, many of our customers use on-demand central cloud computing platforms. In particular, object stores such as Amazon S3 and Google Cloud Storage are great for storing images, assets and even static websites. Having Fastly in front of these provides supreme global performance.

These object stores live on different domains so we need to modify the Host header as the request travels through Fastly.

For Amazon S3, the simplest way to access a bucket is to use virtual-hosted-style access. The override host name should be in the format:

<BUCKET NAME>.s3.<REGION-CODE>.amazonaws.com

For Google Cloud Storage, the simplest way to access a bucket is similar. The override host name should be in the format:

<BUCKET NAME>.storage.googleapis.com

We have a lot of useful information about these settings on Overriding the Host header.
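The two host name formats above can be captured in a couple of small helpers. This is only a sketch: the function names s3_override_host and gcs_override_host, and the bucket and region values, are made up for illustration.

```rust
/// Build the virtual-hosted-style host name for an Amazon S3 bucket.
fn s3_override_host(bucket: &str, region: &str) -> String {
    format!("{bucket}.s3.{region}.amazonaws.com")
}

/// Build the host name for a Google Cloud Storage bucket.
fn gcs_override_host(bucket: &str) -> String {
    format!("{bucket}.storage.googleapis.com")
}

fn main() {
    // Hypothetical bucket and region names.
    println!("{}", s3_override_host("my-assets", "eu-west-2"));
    println!("{}", gcs_override_host("my-assets"));
}
```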

As before, for a Fastly Delivery service, you can specify an override host on the origin. For a Fastly Compute service, use override_host in fastly.toml for development. For production, the Fastly CLI sets the override by default when adding a backend.

Don’t forget to secure access to your buckets using a signed AWS authorization header or Google Cloud HMAC authentication. Enable shielding to reduce latency and the number of requests made to the object store.

Conclusion

We’ve traveled through time and demystified the HTTP Host header, a small part of the large web.