<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Marie Aurore</title>
    <description>The latest articles on Forem by Marie Aurore (@marie_aurore).</description>
    <link>https://forem.com/marie_aurore</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2114098%2F66075f02-b1d5-421d-afdd-30e275ae7bca.png</url>
      <title>Forem: Marie Aurore</title>
      <link>https://forem.com/marie_aurore</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/marie_aurore"/>
    <language>en</language>
    <item>
      <title>⚡🦀 Deploy a blazing-fast &amp; Lightweight LLM app with Rust-Rig-LanceDB</title>
      <dc:creator>Marie Aurore</dc:creator>
      <pubDate>Fri, 22 Nov 2024 16:00:35 +0000</pubDate>
      <link>https://forem.com/marie_aurore/deploy-a-blazing-fast-lightweight-llm-app-with-rust-rig-lancedb-139l</link>
      <guid>https://forem.com/marie_aurore/deploy-a-blazing-fast-lightweight-llm-app-with-rust-rig-lancedb-139l</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A step-by-step walkthrough on deploying an LLM app using &lt;a href="https://github.com/0xPlaygrounds/rig" rel="noopener noreferrer"&gt;&lt;code&gt;Rig&lt;/code&gt;&lt;/a&gt; &amp;amp; &lt;a href="https://lancedb.com" rel="noopener noreferrer"&gt;&lt;code&gt;LanceDB&lt;/code&gt;&lt;/a&gt; on AWS Lambda. You'll learn how to prepare your app, choose the right storage backend (like S3 or EFS), and optimize performance with the help of cloud metrics. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stats: Rig RAG Agent using LanceDB on AWS:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;Low memory usage (96MB - 113MB)&lt;/li&gt;
&lt;li&gt;Fast cold starts (consistently 160ms)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stats: LangChain RAG Agent using LanceDB on AWS:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;Higher memory usage (246MB - 360MB)&lt;/li&gt;
&lt;li&gt;Slower cold starts (1,900ms - 2,700ms)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jump to Metrics&lt;/strong&gt; ⏬ &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
How to Deploy Your Rig App with LanceDB: A Step-by-Step Guide

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;Our use case: Montreal 🌇&lt;/li&gt;
&lt;li&gt;LanceDB Quick Overview 💾&lt;/li&gt;
&lt;li&gt;
LanceDB Storage Backends

&lt;ul&gt;
&lt;li&gt;S3 - Object Store&lt;/li&gt;
&lt;li&gt;Lambda ephemeral storage - Local file system&lt;/li&gt;
&lt;li&gt;EFS - Virtual file system&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Metrics on the cloud ☁️

&lt;ul&gt;
&lt;li&gt;Memory, CPU, and runtime&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Langchain Montreal Agent App 🐍

&lt;ul&gt;
&lt;li&gt;Deployment package&lt;/li&gt;
&lt;li&gt;Memory, CPU, and runtime&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Final Comparison between Rig and LangChain&lt;/li&gt;

&lt;li&gt;Resources&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Welcome back to &lt;strong&gt;Deploy Your Rig Application&lt;/strong&gt;! Apps built with Rig vary in complexity based on LLM usage, vector databases for RAG, and infrastructure deployment. This series explores various configurations for production use.&lt;/p&gt;

&lt;p&gt;⭐ &lt;strong&gt;Today's Highlight&lt;/strong&gt;: Rig's &lt;strong&gt;LanceDB integration&lt;/strong&gt;! ⭐&lt;/p&gt;

&lt;p&gt;We'll deploy a Rig agent using OpenAI's &lt;code&gt;text-embedding-ada-002&lt;/code&gt; and &lt;code&gt;GPT-4o&lt;/code&gt;, relying on the &lt;a href="https://lancedb.com" rel="noopener noreferrer"&gt;LanceDB vector store&lt;/a&gt; and deployed on &lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;💡 If you're new to Rig and want to start from the beginning or are looking for additional tutorials, check out our &lt;a href="https://rig.rs/build-with-rig-guide.html" rel="noopener noreferrer"&gt;blog series&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we begin building, ensure you have the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗ We will &lt;em&gt;not&lt;/em&gt; be covering how to write your RAG app with Rig, only how to deploy it. So make sure you read &lt;a href="https://dev.to/0thtachi/build-a-fast-and-lightweight-rust-vector-search-app-with-rig-lancedb-57h2"&gt;this tutorial&lt;/a&gt; first to help you code your application.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clone of the &lt;a href="https://github.com/garance-buricatu/rig-aws/tree/master/rig-montreal-lancedb" rel="noopener noreferrer"&gt;&lt;code&gt;rig-montreal-lancedb&lt;/code&gt;&lt;/a&gt; crate, which includes two separate binaries: a &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/rig-montreal-lancedb/src/bin/loader.rs" rel="noopener noreferrer"&gt;&lt;code&gt;loader&lt;/code&gt;&lt;/a&gt; (writes data to LanceDB) and an &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/rig-montreal-lancedb/src/bin/app.rs" rel="noopener noreferrer"&gt;&lt;code&gt;app&lt;/code&gt;&lt;/a&gt; (performs RAG on LanceDB).
&lt;/li&gt;
&lt;li&gt;An AWS account and some background knowledge on deployments on AWS, including CloudFormation templates&lt;/li&gt;
&lt;li&gt;An OpenAI API key&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Our use case: Montreal 🌇
&lt;/h2&gt;

&lt;p&gt;The app in &lt;a href="https://github.com/garance-buricatu/rig-aws/tree/master/rig-montreal-lancedb" rel="noopener noreferrer"&gt;&lt;code&gt;rig-montreal-lancedb&lt;/code&gt;&lt;/a&gt; RAGs data from &lt;a href="https://donnees.montreal.ca" rel="noopener noreferrer"&gt;Montreal open data&lt;/a&gt;. The Montréal municipality generates and manages large quantities of data through its activities, such as data about agriculture, politics, transportation, health and much more. The open data portal publishes all these datasets and makes them freely accessible to all citizens! Our app will index the metadata of all the public datasets so that a user can ask questions pertaining to the open data.&lt;br&gt;
The &lt;code&gt;loader&lt;/code&gt; binary indexes all dataset metadata (name, description, tags, ...) into LanceDB and the &lt;code&gt;app&lt;/code&gt; binary performs vector search on the data based on a prompt. For example: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; Give me information on gaseous pollutants in Montreal. How are the concentrations measured?&lt;br&gt;
&lt;strong&gt;App answer:&lt;/strong&gt; The concentrations of gaseous pollutants in Montreal are measured through the Réseau de surveillance de la qualité de l'air (RSQA), which is a network of measurement stations located on the Island of Montreal. These stations continuously determine the atmospheric concentration of various pollutants. The data is transmitted via telemetry, ...&lt;/p&gt;
&lt;/blockquote&gt;
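&lt;p&gt;To make the indexing step concrete, here is a minimal sketch of how a dataset's metadata could be flattened into a single text chunk before being embedded and written to LanceDB. This is illustrative only; the field names are hypothetical and not the actual schema used by the &lt;code&gt;loader&lt;/code&gt; binary.&lt;/p&gt;

```rust
// Hypothetical shape of one dataset's metadata; not the loader's real schema.
struct DatasetMeta {
    name: String,
    description: String,
    tags: Vec<String>,
}

// Flatten the metadata into one chunk of text to be embedded.
fn to_document(meta: &DatasetMeta) -> String {
    format!(
        "Dataset: {}\nDescription: {}\nTags: {}",
        meta.name,
        meta.description,
        meta.tags.join(", ")
    )
}

fn main() {
    let meta = DatasetMeta {
        name: "rsqa-air-quality".to_string(),
        description: "Continuous measurements of gaseous pollutants on the Island of Montreal".to_string(),
        tags: vec!["environment".to_string(), "air-quality".to_string()],
    };
    println!("{}", to_document(&meta));
}
```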
&lt;h2&gt;
  
  
  LanceDB Quick Overview 💾
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lancedb/lance" rel="noopener noreferrer"&gt;&lt;strong&gt;Lance&lt;/strong&gt;&lt;/a&gt; is an &lt;strong&gt;open-source columnar data format&lt;/strong&gt; designed for performant ML workloads.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Written in Rust 🦀.&lt;/li&gt;
&lt;li&gt;Native support for storing, querying and filtering vectors, deeply nested data and multi-modal data (text, images, videos, point clouds, and more).&lt;/li&gt;
&lt;li&gt;Support for vector similarity search, full-text search and SQL.
&lt;/li&gt;
&lt;li&gt;Interoperable with other columnar formats (such as Parquet) via &lt;a href="https://arrow.apache.org/overview/" rel="noopener noreferrer"&gt;Arrow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Disk-based indexes and storage.&lt;/li&gt;
&lt;li&gt;Built to scale to hundreds of terabytes of data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://lancedb.github.io/lancedb/" rel="noopener noreferrer"&gt;&lt;strong&gt;LanceDB&lt;/strong&gt;&lt;/a&gt; is an &lt;strong&gt;open-source vector database&lt;/strong&gt;.    &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Written in Rust 🦀.&lt;/li&gt;
&lt;li&gt;Built on top of Lance.&lt;/li&gt;
&lt;li&gt;Support for Python, JavaScript, and Rust client libraries to interact with the database.&lt;/li&gt;
&lt;li&gt;Allows storage of raw data, metadata, and embeddings all at once.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  LanceDB Storage Backends
&lt;/h2&gt;

&lt;p&gt;LanceDB's underlying optimized storage format, &lt;code&gt;lance&lt;/code&gt;, is flexible enough to be supported by various storage backends, such as local NVMe, &lt;a href="https://aws.amazon.com/ebs/" rel="noopener noreferrer"&gt;EBS&lt;/a&gt;, &lt;a href="https://aws.amazon.com/efs/" rel="noopener noreferrer"&gt;EFS&lt;/a&gt;, &lt;a href="https://aws.amazon.com/s3/" rel="noopener noreferrer"&gt;S3&lt;/a&gt; and other third-party APIs that connect to the cloud. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 All you need to do to use a specific storage backend is define its connection string in the LanceDB client!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's go through some storage options that are compatible with AWS Lambda!&lt;/p&gt;
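&lt;p&gt;Since only the connection string changes between backends, one convenient pattern is to read it from an environment variable so the same binary can target S3, EFS, or ephemeral storage. A minimal sketch; &lt;code&gt;LANCEDB_URI&lt;/code&gt; is a hypothetical variable name, not something LanceDB reads on its own:&lt;/p&gt;

```rust
// Pick the LanceDB connection string from configuration, falling back to
// Lambda's ephemeral storage when nothing is set.
fn uri_or_default(configured: Option<String>) -> String {
    configured.unwrap_or_else(|| "/tmp".to_string())
}

fn main() {
    // In the real app this value would be passed to lancedb::connect(...).
    let uri = uri_or_default(std::env::var("LANCEDB_URI").ok());
    println!("connecting to {uri}");
}
```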
&lt;h2&gt;
  
  
  S3 - Object Store
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;❕Data is stored as individual objects &lt;em&gt;all at the same level&lt;/em&gt;.&lt;br&gt;
❕Objects are kept track of by a distributed hash table (DHT), where each object is identified by a unique ID.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Pros of Object Stores&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Cons of Object Stores&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Unlimited scaling&lt;/strong&gt; ♾️: Objects can be stored across distributed systems, eliminating single-node limitations. This is ideal for ML and AI applications handling large data volumes.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Higher latency&lt;/strong&gt; 🚚: Accessing a remote object store over a network via HTTP/HTTPS adds overhead compared to file system protocols like NFS. Additionally, storing metadata separately from objects introduces some retrieval latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Cheap&lt;/strong&gt; 💸: The simple storage design makes it more affordable than traditional file systems.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Highly available&lt;/strong&gt; and &lt;strong&gt;resilient&lt;/strong&gt; 💪: Affordable storage allows for redundant data storage within and across data centers.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  S3 + LanceDB setup on AWS lambda
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;⚠️ Important&lt;/strong&gt;: LanceDB on S3 does &lt;strong&gt;not support concurrent writes&lt;/strong&gt;. If multiple processes attempt to write to the same table simultaneously, it could lead to data corruption. But there's a solution! Use the &lt;a href="https://lancedb.github.io/lancedb/guides/storage/#dynamodb-commit-store-for-concurrent-writes" rel="noopener noreferrer"&gt;DynamoDB commit store feature in LanceDB&lt;/a&gt; to prevent this.&lt;/p&gt;


&lt;h4&gt;
  
  
  Part I - Write lambda function code
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Create an &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html" rel="noopener noreferrer"&gt;S3 Bucket&lt;/a&gt; where your Lance database will be stored. Ours is called: &lt;code&gt;rig-montreal-lancedb&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the lambda code, connect to the store via the &lt;a href="https://docs.rs/lancedb/latest/lancedb/connection/struct.Connection.html" rel="noopener noreferrer"&gt;&lt;code&gt;LanceDB client&lt;/code&gt;&lt;/a&gt; like so:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Note: Create s3://rig-montreal-lancedb bucket beforehand&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;lancedb&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"s3://rig-montreal-lancedb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// OR&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;lancedb&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"s3+ddb://rig-montreal-lancedb?ddbTableName=my-dynamodb-table"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Part II - Deploy lambdas
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Need a refresher on Lambda deployments? Check out our &lt;a href="https://dev.to/garance_buricatu_a6864136/how-to-deploy-your-rig-app-on-aws-lambda-a-step-by-step-guide-2ge5"&gt;previous blog&lt;/a&gt; for a full walkthrough.&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Lambda that writes to the store&lt;/span&gt;
cargo lambda build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--bin&lt;/span&gt; loader
cargo lambda deploy &lt;span class="nt"&gt;--binary-name&lt;/span&gt; loader &amp;lt;your_loader_function_name&amp;gt;

&lt;span class="c"&gt;# Lambda that reads to the store&lt;/span&gt;
cargo lambda build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--bin&lt;/span&gt; app
cargo lambda deploy &lt;span class="nt"&gt;--binary-name&lt;/span&gt; app &amp;lt;your_app_function_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 Don’t forget to set the necessary &lt;a href="https://lancedb.github.io/lancedb/guides/storage/#aws-iam-permissions" rel="noopener noreferrer"&gt;IAM permissions&lt;/a&gt;! Your lambda functions need appropriate access to the S3 bucket — whether it’s read, write, or both.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Lambda ephemeral storage - Local file system
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-ephemeral-storage.html" rel="noopener noreferrer"&gt;Lambda ephemeral storage&lt;/a&gt; is &lt;strong&gt;temporary and unique&lt;/strong&gt; to each execution environment, it is not intended for persistent storage. In other words, any LanceDB store created during the lambda execution on ephemeral storage will be wiped when the function cold starts.&lt;br&gt;
This option can be used for very specific use cases (mostly for testing) where writing to the store needs to be done in the same process as reading, and data is only read by a single lambda execution.&lt;/p&gt;

&lt;p&gt;Ephemeral storage in a lambda is found in the &lt;code&gt;/tmp&lt;/code&gt; directory. All you need to do is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;lancedb&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/tmp"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  EFS - Virtual file system
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;❕A &lt;strong&gt;serverless&lt;/strong&gt;, &lt;strong&gt;elastic&lt;/strong&gt;, &lt;strong&gt;shared file system&lt;/strong&gt; designed to be consumed by AWS services like EC2 and Lambda.&lt;br&gt;
❕Data is &lt;strong&gt;persisted&lt;/strong&gt; and can be shared across lambda invocations (unlike the S3 without commit store and ephemeral storage options above).&lt;br&gt;
❕Supports up to 25,000 &lt;strong&gt;concurrent connections&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Pros of EFS&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Cons of EFS&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Stateful lambda&lt;/strong&gt;: Mounting an EFS instance on a lambda function provides knowledge of previous and concurrent executions.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Development time&lt;/strong&gt;: More involved cloud setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Low latency&lt;/strong&gt; ⚡: A lambda function resides in the same &lt;strong&gt;VPC&lt;/strong&gt; as the EFS instance, allowing low-latency network calls via the &lt;strong&gt;NFS&lt;/strong&gt; protocol.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Cost&lt;/strong&gt; 💲: More expensive than S3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  EFS + LanceDB setup on AWS Lambda
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Setting up EFS in the cloud can be intricate, so you can use our &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/rig-montreal-lancedb/template.yaml" rel="noopener noreferrer"&gt;CloudFormation template&lt;/a&gt; to streamline the deployment process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Part I - Build Rust code and upload zip files to S3
&lt;/h4&gt;

&lt;p&gt;In the lambda code, connect to the store via the &lt;a href="https://docs.rs/lancedb/latest/lancedb/connection/struct.Connection.html" rel="noopener noreferrer"&gt;LanceDB client&lt;/a&gt; like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;lancedb&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/mnt/efs"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, compile your code, zip the binaries, and upload them to S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Can also do this directly on the AWS console&lt;/span&gt;
aws s3api create-bucket &lt;span class="nt"&gt;--bucket&lt;/span&gt; &amp;lt;your_bucket_name&amp;gt;

cargo lambda build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--bin&lt;/span&gt; loader
cargo lambda build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--bin&lt;/span&gt; app

&lt;span class="nb"&gt;cd &lt;/span&gt;target/lambda/loader
zip &lt;span class="nt"&gt;-r&lt;/span&gt; bootstrap.zip bootstrap
&lt;span class="c"&gt;# Can also do this directly on the AWS console&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;bootstrap.zip s3://&amp;lt;your_bucket_name&amp;gt;/rig/loader/

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
zip &lt;span class="nt"&gt;-r&lt;/span&gt; bootstrap.zip bootstrap
&lt;span class="c"&gt;# Can also do this directly on the AWS console&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;bootstrap.zip s3://&amp;lt;your_bucket_name&amp;gt;/rig/app/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Part II - Understand the CloudFormation template
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/rig-montreal-lancedb/template.yaml" rel="noopener noreferrer"&gt;template&lt;/a&gt; assumes that your AWS account already has the following resources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;VPC&lt;/strong&gt; with at least two private subnets in separate availability zones, each with public internet access.&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;S3 bucket&lt;/strong&gt; (as created in Part I) for storing Lambda code.&lt;br&gt;
💡 If you’re missing these resources, follow this AWS &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-example-private-subnets-nat.html" rel="noopener noreferrer"&gt;tutorial&lt;/a&gt; to set up a basic VPC and subnets.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;EFS setup&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Mount Targets:&lt;/strong&gt; Create two mount targets for your EFS instance — one in each subnet (specified in &lt;code&gt;Parameters&lt;/code&gt; section of CFT template).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Groups:&lt;/strong&gt; Set up an EFS security group with rules to allow &lt;strong&gt;NFS traffic&lt;/strong&gt; from your Lambda functions’ security group.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Lambda functions setup&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Loader and App Lambdas:&lt;/strong&gt; Deploy both Lambda functions (&lt;code&gt;loader&lt;/code&gt; and &lt;code&gt;app&lt;/code&gt;) in the same subnets as your EFS mount targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Groups:&lt;/strong&gt; Assign a security group that enables access to the EFS security group and public internet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EFS Mounting:&lt;/strong&gt; Configure the Lambdas to mount the EFS targets at &lt;code&gt;/mnt/efs&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
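&lt;p&gt;A misconfigured mount tends to surface as a confusing LanceDB error at query time, so a cheap sanity check at function startup can save debugging time. A minimal sketch, purely illustrative; the probe filename is arbitrary:&lt;/p&gt;

```rust
use std::fs;

// Fail fast if the mount point is missing or read-only before handing
// the path to lancedb::connect. Writes and deletes a tiny probe file.
fn is_writable(dir: &str) -> bool {
    let probe = format!("{}/.lancedb-write-probe", dir);
    if fs::write(&probe, "ok").is_err() {
        return false; // directory absent, or mounted read-only
    }
    fs::remove_file(&probe).is_ok()
}

fn main() {
    // In the Lambda this would be "/mnt/efs"; locally, try /tmp.
    println!("/tmp writable: {}", is_writable("/tmp"));
}
```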

&lt;blockquote&gt;
&lt;p&gt;💡 Once everything’s ready, deploy the CloudFormation template to launch your environment with just one click!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Metrics on the cloud ☁️
&lt;/h2&gt;

&lt;p&gt;If you've made it this far, you have the Montreal Rig app deployed on AWS Lambda with EFS as the LanceDB storage backend! 🎉 Now let's look at some metrics from running the app in the cloud.&lt;/p&gt;

&lt;p&gt;For reference, we replicated the Montreal agent using &lt;a href="https://python.langchain.com/" rel="noopener noreferrer"&gt;LangChain 🐍&lt;/a&gt; in this &lt;a href="https://github.com/garance-buricatu/rig-aws/tree/master/langchain-montreal-lancedb" rel="noopener noreferrer"&gt;Python project&lt;/a&gt;, which contains the source code for the &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/langchain-montreal-lancedb/loader.py" rel="noopener noreferrer"&gt;&lt;code&gt;loader&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/langchain-montreal-lancedb/app.py" rel="noopener noreferrer"&gt;&lt;code&gt;app&lt;/code&gt;&lt;/a&gt; lambdas. The Python app uses the same LanceDB vector store on the same EFS instance as the Rig app. To see how the Python app was configured in the cloud, take a look at the &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/rig-montreal-lancedb/template.yaml" rel="noopener noreferrer"&gt;CloudFormation template&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's compare them!&lt;/p&gt;

&lt;h3&gt;
  
  
  Rig - Memory, runtime, and cold starts
&lt;/h3&gt;

&lt;p&gt;We invoked the &lt;code&gt;app&lt;/code&gt; function 50 times at each memory configuration (128MB, 256MB, 512MB, and 1024MB) using the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;power tuner tool&lt;/a&gt;.&lt;br&gt;
The CloudWatch query below computes averages for runtime, memory usage, and cold starts over the 50 invocations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;filter @type = "REPORT"
| stats 
      avg(@maxMemoryUsed) / 1000000 as MemoryUsageMB,
      avg(@duration) / 1000 as AvgDurationSec,
      max(@duration) / 1000 as MaxDurationSec, 
      min(@duration) / 1000 as MinDurationSec, 
      avg(@initDuration) / 1000 as AvgColdStartTimeSec, 
      count(*) as NumberOfInvocations,
      sum(@initDuration &amp;gt; 0) as ColdStartInvocations
by bin(1d) as TimeRange, @memorySize / 1000000 as MemoryConfigurationMB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob4cls9cdynuhxs1yi2s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob4cls9cdynuhxs1yi2s.png" alt="Rig metrics" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Memory and runtime analysis&lt;/strong&gt;&lt;br&gt;
At the memory configuration of &lt;strong&gt;128MB&lt;/strong&gt;, the lambda has the lowest average memory usage of &lt;strong&gt;96.1 MB&lt;/strong&gt; and the highest runtime of &lt;strong&gt;5.1s&lt;/strong&gt;. At a memory configuration of &lt;strong&gt;1GB&lt;/strong&gt;, the lambda has the highest average memory usage of &lt;strong&gt;113.1 MB&lt;/strong&gt; and the lowest runtime of &lt;strong&gt;4.4s&lt;/strong&gt;. In other words, with an extra ~7MB of memory usage, the lambda function was 700ms faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold starts analysis&lt;/strong&gt; ❄️&lt;br&gt;
The average initialization time remains steady around &lt;strong&gt;0.16s&lt;/strong&gt;. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The chart below shows the power tuner results after running the app 50 times with each of the 4 memory configurations.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdy8596b2d8i1np9swh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdy8596b2d8i1np9swh.png" alt="Rig power tuner" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We see that adding memory to the function (and therefore adding computational power) &lt;strong&gt;does improve the lambda's performance, but by less than a second&lt;/strong&gt;.&lt;/p&gt;
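&lt;p&gt;This tradeoff also has a cost dimension: Lambda bills on &lt;em&gt;configured&lt;/em&gt; memory, not memory actually used. A back-of-the-envelope sketch using the measurements above; the per-GB-second price is an assumption (x86 us-east-1 pricing at the time of writing):&lt;/p&gt;

```rust
// Assumed Lambda compute price per GB-second (x86, us-east-1).
const PRICE_PER_GB_SECOND: f64 = 0.0000166667;

// Cost of one invocation, billed on configured memory times duration.
fn invocation_cost(memory_mb: f64, duration_s: f64) -> f64 {
    (memory_mb / 1024.0) * duration_s * PRICE_PER_GB_SECOND
}

fn main() {
    let slow_cheap = invocation_cost(128.0, 5.1);   // 128MB config, 5.1s runtime
    let fast_pricey = invocation_cost(1024.0, 4.4); // 1GB config, 4.4s runtime
    println!("128MB: ${:.8} per call, 1GB: ${:.8} per call", slow_cheap, fast_pricey);
}
```

&lt;p&gt;Under these assumptions, the 1GB configuration costs roughly 7x more per invocation for a ~0.7s latency win, so it is only worthwhile when latency matters more than cost.&lt;/p&gt;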

&lt;h3&gt;
  
  
  LangChain - Memory, runtime, and cold starts
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Deployment package
&lt;/h4&gt;

&lt;p&gt;We are not able to use zip files for the deployment package of the lambda functions because the zip size exceeds the maximum size allowed by AWS. The &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/langchain-montreal-lancedb/loader_requirements.txt" rel="noopener noreferrer"&gt;loader dependencies&lt;/a&gt; and &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/langchain-montreal-lancedb/app_requirements.txt" rel="noopener noreferrer"&gt;app dependencies&lt;/a&gt; produce zip files of around 150 MB.&lt;/p&gt;

&lt;p&gt;Instead, we must use container images. The &lt;a href="https://github.com/garance-buricatu/rig-aws/blob/master/langchain-montreal-lancedb/Dockerfile" rel="noopener noreferrer"&gt;Docker image&lt;/a&gt; is 471.45MB, built on the base Python Lambda image.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z48s2eqw6c92hnxv79y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z48s2eqw6c92hnxv79y.png" alt="LangChain deployment package" width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We did the same experiment as with the Rig app above and got the following metrics:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ekgng4g7iu4pxeokhap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ekgng4g7iu4pxeokhap.png" alt="LangChain metrics" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First of all, the function is &lt;strong&gt;unable to run with a memory allocation of 128MB&lt;/strong&gt;; it is killed at this allocation size for running out of memory. So we will compare the following three memory configurations: 256MB, 512MB, and 1GB. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Memory and runtime analysis&lt;/strong&gt;&lt;br&gt;
At the memory configuration of &lt;strong&gt;256MB&lt;/strong&gt;, the lambda has the lowest average memory usage of &lt;strong&gt;245.8 MB&lt;/strong&gt; and the highest runtime of &lt;strong&gt;4.9s&lt;/strong&gt;. At a memory configuration of &lt;strong&gt;1GB&lt;/strong&gt;, the lambda has the highest average memory usage of &lt;strong&gt;359.6 MB&lt;/strong&gt; and the lowest runtime of &lt;strong&gt;4.0s&lt;/strong&gt;. In other words, with an extra &lt;strong&gt;~113MB&lt;/strong&gt; of memory usage, the lambda function was 1s faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold starts analysis&lt;/strong&gt; ❄️&lt;br&gt;
The average initialization time increases as the memory configuration increases with the lowest being &lt;strong&gt;1.9s&lt;/strong&gt; and the highest being &lt;strong&gt;2.7s&lt;/strong&gt;. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The chart below shows the power tuner results after running the app 50 times with each memory configuration.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvck75kamnxfvuv3p8vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvck75kamnxfvuv3p8vi.png" alt="LangChain power tuner" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We see that adding memory to the function (and therefore adding computational power) also improves the performance of the lambda, here by about a second.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Comparison between Rig and LangChain
&lt;/h3&gt;

&lt;p&gt;Based on the CloudWatch logs produced by both the Rig and LangChain lambdas, we were able to produce the following charts:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzxvnb5fjycxe0au3fpq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzxvnb5fjycxe0au3fpq.png" alt="Memory" width="800" height="299"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjptcfd6sqslrgclttd2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjptcfd6sqslrgclttd2c.png" alt="Cold starts" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Rig is an emerging project in the open-source community, and we're continuously expanding its ecosystem with new integrations and tools. We believe in the power of community-driven development and welcome contributions from developers of all skill levels.&lt;/p&gt;

&lt;p&gt;Stay connected and contribute to Rig's growth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 &lt;a href="https://docs.rs/rig-core/latest/rig/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;: Comprehensive guides and API references&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/0xPlaygrounds/rig" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;: Contribute, report issues, or star the project&lt;/li&gt;
&lt;li&gt;🌐 &lt;a href="https://rig.rs/" rel="noopener noreferrer"&gt;Official Website&lt;/a&gt;: Latest news, tutorials, and resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Join our &lt;a href="https://discord.com/invite/playgrounds" rel="noopener noreferrer"&gt;community&lt;/a&gt; channel to discuss ideas, seek help, and collaborate with other Rig developers.&lt;/p&gt;

&lt;p&gt;Thanks for reading,&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/marieaurore123" rel="noopener noreferrer"&gt;Marie&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Full-stack developer @ &lt;a href="https://playgrounds.network/" rel="noopener noreferrer"&gt;Playgrounds Analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>devops</category>
      <category>aws</category>
    </item>
    <item>
      <title>How to Deploy Your Rig App on AWS Lambda: A Step-by-Step Guide</title>
      <dc:creator>Marie Aurore</dc:creator>
      <pubDate>Fri, 01 Nov 2024 21:13:44 +0000</pubDate>
      <link>https://forem.com/marie_aurore/how-to-deploy-your-rig-app-on-aws-lambda-a-step-by-step-guide-2ge5</link>
      <guid>https://forem.com/marie_aurore/how-to-deploy-your-rig-app-on-aws-lambda-a-step-by-step-guide-2ge5</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A step-by-step walkthrough on deploying a simple AI Agent built with &lt;a href="https://github.com/0xPlaygrounds/rig" rel="noopener noreferrer"&gt;Rig&lt;/a&gt;, a full-stack agent framework, on AWS Lambda using the cargo lambda CLI. &lt;/li&gt;
&lt;li&gt;Comparison of performance metrics (memory usage, execution time, and cold starts) with a similar deployed Agent built with &lt;a href="https://www.langchain.com" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stats: Rig Agent on AWS Lambda:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;Low memory usage (26MB average)&lt;/li&gt;
&lt;li&gt;Fast cold starts (90.9ms)&lt;/li&gt;
&lt;li&gt;Consistent performance across memory configurations&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stats: LangChain Agent on AWS Lambda:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;Higher memory usage (112-130MB)&lt;/li&gt;
&lt;li&gt;Slower cold starts (1,898.52ms)&lt;/li&gt;
&lt;li&gt;Performance improves with more memory allocation&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
How to Deploy Your Rig App on AWS Lambda: A Step-by-Step Guide

&lt;ul&gt;
&lt;li&gt;Table of Contents&lt;/li&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
AWS Lambda Quick Overview

&lt;ul&gt;
&lt;li&gt;
AWS and Rust

&lt;ul&gt;
&lt;li&gt;REST API backend&lt;/li&gt;
&lt;li&gt;Event-based task&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

Rig Entertainer Agent App

&lt;ul&gt;
&lt;li&gt;Now let's deploy it!&lt;/li&gt;
&lt;li&gt;
Metrics on the cloud

&lt;ul&gt;
&lt;li&gt;Deployment package&lt;/li&gt;
&lt;li&gt;Memory, CPU, and runtime&lt;/li&gt;
&lt;li&gt;Cold starts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

Langchain Entertainer Agent App

&lt;ul&gt;
&lt;li&gt;Deployment package&lt;/li&gt;
&lt;li&gt;Memory, CPU, and runtime&lt;/li&gt;
&lt;li&gt;Cold starts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Community and Ecosystem&lt;/li&gt;

&lt;li&gt;The Road Ahead: Rig's Future&lt;/li&gt;

&lt;li&gt;Conclusion and Call to Action&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Welcome to the series &lt;strong&gt;Deploy Your Rig Application&lt;/strong&gt;!&lt;br&gt;
Apps built with Rig can vary in complexity across three core dimensions: LLM usage, knowledge bases for RAG, and the compute infrastructure where the application is deployed. In this series, we’ll explore how different combinations of these dimensions can be configured for production use.   &lt;/p&gt;

&lt;p&gt;Today, we’ll start with a simple Rig agent that uses the &lt;a href="https://platform.openai.com/docs/models/gpt-4o" rel="noopener noreferrer"&gt;OpenAI model GPT-4-turbo&lt;/a&gt;, does not rely on a vector store (i.e., no RAG), and will be deployed on AWS Lambda. &lt;/p&gt;

&lt;p&gt;This blog will provide a step-by-step deployment guide for the simple Rig app, showcase performance metrics of the Rig app running on AWS Lambda, and compare these metrics with those of a &lt;a href="https://www.langchain.com" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; app on the same platform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;💡 If you're new to Rig and want to start from the beginning or are looking for additional tutorials, check out our &lt;a href="https://rig.rs/build-with-rig-guide.html" rel="noopener noreferrer"&gt;blog series&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s dive in!&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we begin building, ensure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clone of the &lt;a href="https://github.com/garance-buricatu/rig-aws/tree/master/rig-entertainer-lambda" rel="noopener noreferrer"&gt;&lt;code&gt;rig-entertainer-lambda&lt;/code&gt;&lt;/a&gt; crate (or your own Rig application).
&lt;/li&gt;
&lt;li&gt;An AWS account
&lt;/li&gt;
&lt;li&gt;An OpenAI API key&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  AWS Lambda Quick Overview
&lt;/h2&gt;

&lt;p&gt;You might deploy your Rust application on AWS Lambda if it’s a task that can complete in under 15 minutes or if your app is a REST API backend.&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS 🤝 Rust
&lt;/h3&gt;

&lt;p&gt;AWS Lambda supports Rust through the use of the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html" rel="noopener noreferrer"&gt;OS-only runtime Amazon Linux 2023&lt;/a&gt; (a lambda runtime) in conjunction with the &lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime" rel="noopener noreferrer"&gt;Rust runtime client&lt;/a&gt;, a Rust crate. &lt;/p&gt;
&lt;h4&gt;
  
  
  REST API backend
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime/tree/main/lambda-http" rel="noopener noreferrer"&gt;&lt;code&gt;lambda-http&lt;/code&gt;&lt;/a&gt; crate (from the runtime client) to write your function’s entrypoint. &lt;/li&gt;
&lt;li&gt;Then, route traffic to your lambda via AWS API services like &lt;a href="https://aws.amazon.com/api-gateway/" rel="noopener noreferrer"&gt;API Gateway&lt;/a&gt;, &lt;a href="https://aws.amazon.com/pm/appsync" rel="noopener noreferrer"&gt;AppSync&lt;/a&gt;, &lt;a href="https://aws.amazon.com/vpc/lattice/" rel="noopener noreferrer"&gt;VPC Lattice&lt;/a&gt;, etc. &lt;/li&gt;
&lt;li&gt;If your lambda handles multiple endpoints of your API, the crate &lt;a href="https://github.com/tokio-rs/axum" rel="noopener noreferrer"&gt;axum&lt;/a&gt; facilitates the routing within the lambda.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Event-based task (15 minutes max.)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Your lambda function is invoked by some event with the event passed as the payload. For example, configure your S3 bucket to trigger the lambda function when a new object is added to the bucket. The function will receive the new object in the payload and can further process it.&lt;/li&gt;
&lt;li&gt;Use the &lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime/tree/main/lambda-runtime" rel="noopener noreferrer"&gt;&lt;code&gt;lambda_runtime&lt;/code&gt;&lt;/a&gt; crate with &lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime/tree/main/lambda-events" rel="noopener noreferrer"&gt;&lt;code&gt;lambda_events&lt;/code&gt;&lt;/a&gt; (from the runtime client) to write your function’s entrypoint.&lt;/li&gt;
&lt;li&gt;Then, invoke your function either via &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/invoke.html" rel="noopener noreferrer"&gt;&lt;code&gt;lambda invoke&lt;/code&gt; command&lt;/a&gt; or with integrated AWS triggers (ie. S3 UploadObject trigger). &lt;/li&gt;
&lt;/ul&gt;
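&lt;p&gt;As a minimal sketch (assuming recent &lt;code&gt;lambda_runtime&lt;/code&gt;, &lt;code&gt;serde_json&lt;/code&gt;, and &lt;code&gt;tokio&lt;/code&gt; APIs; the echo logic is purely illustrative), an event-based entrypoint looks roughly like:&lt;/p&gt;

```rust
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde_json::Value;

// The handler receives the triggering event as its JSON payload.
async fn handler(event: LambdaEvent&lt;Value&gt;) -&gt; Result&lt;Value, Error&gt; {
    // Process the event here; echoing it back keeps the sketch minimal.
    Ok(event.payload)
}

#[tokio::main]
async fn main() -&gt; Result&lt;(), Error&gt; {
    run(service_fn(handler)).await
}
```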

&lt;blockquote&gt;
&lt;p&gt;For both cases, the crate &lt;a href="https://docs.rs/tokio/latest/tokio/" rel="noopener noreferrer"&gt;&lt;code&gt;tokio&lt;/code&gt;&lt;/a&gt; must also be added to your project as the lambda runtime client uses &lt;code&gt;tokio&lt;/code&gt; to handle asynchronous calls.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Rig Entertainer Agent App 🤡
&lt;/h2&gt;

&lt;p&gt;The crate &lt;a href="https://github.com/garance-buricatu/rig-aws-lambda/tree/master/rig-entertainer-lambda" rel="noopener noreferrer"&gt;&lt;code&gt;rig-entertainer-lambda&lt;/code&gt;&lt;/a&gt; implements a simple Rust program that is executed via the &lt;code&gt;lambda_runtime&lt;/code&gt;. It invokes a &lt;code&gt;Rig&lt;/code&gt; agent, backed by the OpenAI API, to entertain users with jokes. It is an event-based task that I will execute with the &lt;code&gt;lambda invoke&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;The main takeaway here is that the app's &lt;code&gt;Cargo.toml&lt;/code&gt; file must include the following dependencies:   &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;rig-core&lt;/code&gt; (our rig crate)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lambda_runtime&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tokio&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
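&lt;p&gt;As a sketch, that dependency section might look like the following (version numbers are illustrative, not taken from the article):&lt;/p&gt;

```toml
[dependencies]
rig-core = "0.2"                                  # the Rig agent framework
lambda_runtime = "0.13"                           # AWS Lambda Rust runtime client
tokio = { version = "1", features = ["macros"] }  # async runtime used by the runtime client
```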
&lt;h3&gt;
  
  
  Now let's deploy it!
&lt;/h3&gt;

&lt;p&gt;There are &lt;em&gt;many&lt;/em&gt; ways to deploy Rust lambdas to AWS. Some out of the box options include the AWS CLI, the &lt;a href="https://www.cargo-lambda.info/guide/getting-started.html" rel="noopener noreferrer"&gt;cargo lambda&lt;/a&gt; CLI, the AWS SAM CLI, the AWS CDK, and more. You can also decide to create a Dockerfile for your app and use that container image in your Lambda function instead. See some useful examples &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/rust-package.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this blog, we'll use the cargo lambda CLI option to deploy the code in &lt;code&gt;rig-entertainer-lambda&lt;/code&gt; from your local machine to an AWS Lambda function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add your AWS credentials to your terminal&lt;/span&gt;
&lt;span class="c"&gt;# Create an AWS Lambda function named ‘rig-entertainer’ with architecture x86_64.&lt;/span&gt;

&lt;span class="nv"&gt;function_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'rig-entertainer'&lt;/span&gt;

&lt;span class="nb"&gt;cd &lt;/span&gt;rig-entertainer-lambda
cargo lambda build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="c"&gt;# Can define different architectures here with --arm64 for example&lt;/span&gt;
cargo lambda deploy &lt;span class="nv"&gt;$function_name&lt;/span&gt; &lt;span class="c"&gt;# Since the name of the crate is the same as the lambda function name, no need to specify a binary file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Metrics on the cloud ☁️
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Deployment package
&lt;/h4&gt;

&lt;p&gt;This is the code configuration of the &lt;code&gt;rig-entertainer&lt;/code&gt; function in AWS. The function’s code package (bundled code and dependencies required for lambda to run) includes the single rust binary called &lt;code&gt;bootstrap&lt;/code&gt;, which is 3.2 MB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh72nanr3um3c8mpb8fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh72nanr3um3c8mpb8fh.png" alt="Deployment Package Rust" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory, CPU, and runtime
&lt;/h4&gt;

&lt;p&gt;The image below gives metrics on memory usage and execution time of the function. Each row represents a single execution of the function. In &lt;strong&gt;yellow&lt;/strong&gt; is the &lt;strong&gt;total memory used&lt;/strong&gt;, in &lt;strong&gt;red&lt;/strong&gt; is the amount of &lt;strong&gt;memory allocated&lt;/strong&gt;, and in &lt;strong&gt;blue&lt;/strong&gt; is the &lt;strong&gt;runtime&lt;/strong&gt;.&lt;br&gt;
Although we tested memory configurations ranging from 128MB to 1024MB, the average memory used by our app is only &lt;strong&gt;26MB&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqonxpllzxwtyfhgdrmy0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqonxpllzxwtyfhgdrmy0.png" alt="Rig Cloudwatch logs" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's get more information on the metrics above by spamming the function and calculating averages. I invoked &lt;code&gt;rig-entertainer&lt;/code&gt; 50 times for each memory configuration (128MB, 256MB, 512MB, and 1024MB) using the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;power tuner tool&lt;/a&gt;, and the results of those invocations are displayed in the chart below. &lt;/p&gt;

&lt;p&gt;The x-axis is the memory allocation, and the y-axis is the average runtime over the 50 executions of &lt;code&gt;rig-entertainer&lt;/code&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Q.&lt;/strong&gt; We know that the function uses on average only 26MB per execution (which is less than the minimum memory allocation of 128MB) so why should we test higher memory configurations?&lt;br&gt;&lt;br&gt;
&lt;strong&gt;A.&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/operatorguide/computing-power.html" rel="noopener noreferrer"&gt;vCPUs are added to the lambda in proportion to memory&lt;/a&gt; so adding memory could still affect the performance.&lt;/p&gt;
&lt;/blockquote&gt;
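&lt;p&gt;As a rough rule of thumb (the AWS docs linked above state that a function gets the equivalent of one full vCPU at about 1,769MB), the fractional vCPU for each tested configuration can be sketched as:&lt;/p&gt;

```rust
// AWS allocates CPU power in proportion to configured memory;
// at roughly 1,769MB a function gets the equivalent of one full vCPU.
fn approx_vcpus(memory_mb: f64) -> f64 {
    memory_mb / 1769.0
}

fn main() {
    for mb in [128.0, 256.0, 512.0, 1024.0] {
        println!("{mb}MB = {:.2} vCPU", approx_vcpus(mb));
    }
}
```

&lt;p&gt;Even at 1024MB the function has well under one full vCPU, which is why extra memory &lt;em&gt;could&lt;/em&gt; have sped up a CPU-bound workload.&lt;/p&gt;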

&lt;p&gt;However, we can see that adding memory to the function (and therefore adding computational power) does not affect its performance at all. Since the &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;cost of a lambda execution&lt;/a&gt; is calculated in GB-seconds, we get the most efficient lambda for the lowest price! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr352m59k0vk7manl4jx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr352m59k0vk7manl4jx.png" alt="Power Tuner Rust" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Cold starts ❄️
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/operatorguide/execution-environments.html" rel="noopener noreferrer"&gt;Cold starts&lt;/a&gt; occur when the lambda function's execution environment needs to be booted up from scratch. This includes setting up the actual compute that the lambda function is running on, and downloading the lambda function code and dependencies in that environment.&lt;br&gt;&lt;br&gt;
Cold start latency doesn't affect all function executions because once the lambda environment has been set up, it will be reused by subsequent executions of the same lambda.   &lt;/p&gt;

&lt;p&gt;In the lambda cloudwatch logs, if a function execution requires a cold start, we see the &lt;code&gt;Init Duration&lt;/code&gt; metric at the end of the execution. &lt;/p&gt;

&lt;p&gt;For &lt;code&gt;rig-entertainer&lt;/code&gt;, we can see that the average cold start time is &lt;strong&gt;90.9ms&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcfgcrm0kipt5ruorior.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcfgcrm0kipt5ruorior.png" alt="Rig cold starts" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that the function was affected by cold starts 9 times out of the 245 times it was executed, so roughly &lt;strong&gt;3.7%&lt;/strong&gt; of the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Langchain Entertainer Agent App 🐍
&lt;/h2&gt;

&lt;p&gt;I replicated the OpenAI entertainer agent using the &lt;a href="https://python.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; Python library in this &lt;a href="https://github.com/garance-buricatu/rig-aws-lambda/tree/master/langchain-entertainer-lambda" rel="noopener noreferrer"&gt;mini Python app&lt;/a&gt;, which I also deployed to AWS Lambda in a function called &lt;code&gt;langchain-entertainer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's compare the metrics outlined above.&lt;/p&gt;

&lt;h4&gt;
  
  
  Deployment package
&lt;/h4&gt;

&lt;p&gt;This is the code configuration of the &lt;code&gt;langchain-entertainer&lt;/code&gt; function in AWS. The function’s code package is a zip file including the lambda function code and all dependencies required for the lambda program to run.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x0w4arsxrvfdvjbkubm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x0w4arsxrvfdvjbkubm.png" alt="Deployment Package LangChain" width="800" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory, CPU, and runtime
&lt;/h4&gt;

&lt;p&gt;The table below shows runs at memory configurations of 128MB, 256MB, 512MB, and 1024MB. When 128MB of memory is allocated, on average about &lt;strong&gt;112MB&lt;/strong&gt; of memory is used; when more than 128MB is allocated, about &lt;strong&gt;130MB&lt;/strong&gt; of memory is used and the &lt;strong&gt;runtime is lower&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6f4m4skc2ndcz3913bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6f4m4skc2ndcz3913bo.png" alt="Cloudwatch Logs LangChain" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's get some more averages for these metrics: I invoked &lt;code&gt;langchain-entertainer&lt;/code&gt; 50 times for each memory configuration (128MB, 256MB, 512MB, and 1024MB) using the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;power tuner tool&lt;/a&gt;, and the results of those invocations are plotted in the graph below. &lt;/p&gt;

&lt;p&gt;We can see that by increasing the memory allocation (and therefore computational power) of &lt;code&gt;langchain-entertainer&lt;/code&gt;, the function becomes more performant (lower runtime). However, note that since you pay per GB-second, a more performant function is also more expensive. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1mswjjhzfum2fn8v7dx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1mswjjhzfum2fn8v7dx.png" alt="Power Tuner Langchain" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Cold starts ❄️
&lt;/h4&gt;

&lt;p&gt;For &lt;code&gt;langchain-entertainer&lt;/code&gt;, the average cold start time is &lt;strong&gt;1,898.52ms&lt;/strong&gt;, i.e., roughly 20x the Rig app's cold start.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzysyfn6hyy3li0aw36bj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzysyfn6hyy3li0aw36bj.png" alt="Cold Starts LangChain" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that the function was affected by cold starts 6 times out of the 202 times it was executed, so roughly &lt;strong&gt;3%&lt;/strong&gt; of the time.&lt;/p&gt;
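&lt;p&gt;As a quick sanity check on the cold start comparison (a sketch using only the two averages reported in this post):&lt;/p&gt;

```rust
// Ratio of the two average cold start times reported above.
fn cold_start_ratio() -> f64 {
    let langchain_ms = 1898.52;
    let rig_ms = 90.9;
    langchain_ms / rig_ms
}

fn main() {
    println!("LangChain cold starts are {:.1}x slower than Rig's", cold_start_ratio());
}
```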

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Rig is an emerging project in the open-source community, and we're continuously expanding its ecosystem with new integrations and tools. We believe in the power of community-driven development and welcome contributions from developers of all skill levels.&lt;/p&gt;

&lt;p&gt;Stay connected and contribute to Rig's growth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 &lt;a href="https://docs.rs/rig-core/latest/rig/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;: Comprehensive guides and API references&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/0xPlaygrounds/rig" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;: Contribute, report issues, or star the project&lt;/li&gt;
&lt;li&gt;🌐 &lt;a href="https://rig.rs/" rel="noopener noreferrer"&gt;Official Website&lt;/a&gt;: Latest news, tutorials, and resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Join our &lt;a href="https://discord.com/invite/playgrounds" rel="noopener noreferrer"&gt;community&lt;/a&gt; channel to discuss ideas, seek help, and collaborate with other Rig developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Road Ahead: Rig's Future
&lt;/h2&gt;

&lt;p&gt;As we continue to develop Rig, we're excited about the possibilities. Our roadmap includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Expanding LLM Provider Support&lt;/strong&gt;: Adding integrations for more LLM providers to give developers even more choices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Performance Optimizations&lt;/strong&gt;: Continuously improving Rig's performance to handle larger-scale applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced AI Workflow Templates&lt;/strong&gt;: Providing pre-built templates for common AI workflows to accelerate development further.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem Growth&lt;/strong&gt;: Developing additional tools and libraries that complement Rig's core functionality.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We're committed to making Rig the go-to library for LLM application development in Rust, and your feedback is crucial in shaping this journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Call to Action
&lt;/h2&gt;

&lt;p&gt;Rig is transforming LLM-powered application development in Rust by providing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A unified, intuitive API for multiple LLM providers&lt;/li&gt;
&lt;li&gt;High-level abstractions for complex AI workflows&lt;/li&gt;
&lt;li&gt;Type-safe development leveraging Rust's powerful features&lt;/li&gt;
&lt;li&gt;Extensibility and seamless ecosystem integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We believe Rig has the potential to significantly improve how developers build AI applications, and we want you to be part of this journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Feedback Matters!&lt;/strong&gt; We're offering a unique opportunity to shape the future of Rig:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build an AI-powered application using Rig.&lt;/li&gt;
&lt;li&gt;Share your experience and insights via this &lt;a href="https://bit.ly/Rig-Review" rel="noopener noreferrer"&gt;feedback form&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Get a chance to win $100 and have your project featured in our showcase!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your insights will directly influence Rig's development, helping us create a tool that truly meets the needs of AI developers. 🦀✨&lt;/p&gt;

&lt;p&gt;Thanks for reading,&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/marieaurore123" rel="noopener noreferrer"&gt;Marie&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Full-stack developer @ &lt;a href="https://playgrounds.network/" rel="noopener noreferrer"&gt;Playgrounds Analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>aws</category>
      <category>llm</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
