Forem: jianzs@pluto-lang

Bridging the Last Mile in LangChain Application Development

jianzs@pluto-lang — Thu, 16 May 2024 14:06:21 +0000

Deploying LangChain applications can be complex due to the need for various cloud services. This article explores the challenges developers face and introduces Pluto, a tool that enables developers to focus on writing business logic rather than getting bogged down in tedious configuration tasks.

LangChain is arguably the most popular framework for AI application development at the moment. Its arrival has greatly simplified building AI applications on top of Large Language Models (LLMs). If we compare an AI application to a person, the LLM is the "brain," while LangChain acts as the "limbs" by providing various tools and abstractions. Combined, they enable AI applications capable of "thinking." This article does not cover how to use LangChain itself. Instead, it discusses the last-mile problem in LangChain application development: how to deploy LangChain applications, using AWS as an example. Why AWS? The free tier is simply too appealing for daily use.

First, let's define the scope of this discussion: we are not talking only about deploying a LangChain application's code to the cloud. If that were the case, we would only need to consider services such as EC2 virtual machines, Fargate containers, or Lambda functions. However, a complete AI application often needs a series of backend services for support, such as databases to save session histories or vector databases to store knowledge base embeddings. More comprehensive AI applications might also need message queues, API gateways, and so on. Therefore, the question we discuss is: how do we deploy a LangChain application and the backend services it depends on to the cloud together?

🔗 LangServe

Those familiar with the LangChain ecosystem might think of LangChain's sub-project LangServe upon reading this. LangServe's goal is to simplify the deployment of LangChain applications. It can package LangChain apps into API servers and provides default endpoints such as stream, async, docs, and playground. But LangServe alone does not resolve the deployment issues of LangChain applications. It ultimately provides an API server based on FastAPI, akin to frameworks like Flask and Django. How to deploy LangServe applications to the cloud and how to create and manage the dependent backend services remain unanswered by LangServe.

Nonetheless, LangChain is actively offering hosted LangServe capabilities on the LangSmith platform, indicating that the LangChain community is aware of the deployment issues and is working towards a solution. Even so, what about the backend services that LangChain applications depend on? Should LangSmith also provide these services? Aren't application hosting and backend services core competencies of cloud service providers? Why not directly use services from AWS, Azure, and others?

🪄 Three Ways to Deploy LangChain Applications

Let's examine how to deploy LangChain applications on AWS. Here, we introduce three different methods for deployment. If you have better approaches, feel free to join the discussion.

⚙️ AWS CDK

In an AWS GenAI Day event, AWS invited Harrison Chase, the CEO of LangChain. The theme was "Building and Deploying Cutting-edge Generative AI Applications with LangChain and Amazon Bedrock." You can watch the recorded event here.

Interestingly, a significant portion of the event was dedicated to introducing how AWS's services like OpenSearch, Bedrock, Kendra, etc., can integrate with LangChain. However, the final demonstration did not show how to create instances of these services or how to deploy LangChain applications on AWS. Instead, it showcased a locally executed LangChain application that utilized pre-deployed instances of AWS Bedrock and Kendra services.

However, I found a langchain-aws-template GitHub repository in the Resources list at the end of the video. It contains two example applications integrating AWS with LangChain, complete with deployment guides. The deployment process includes four steps:

Create a specific Python environment using Conda;
Configure keys and other necessary application data;
Execute a Bash script to package the application;
Deploy the application using AWS CDK.

It seems quite straightforward, right? But if you need to implement more complex features and rely on more backend services, you will have to modify the packaging process and the CDK deployment scripts. This can be challenging for developers unfamiliar with AWS CDK or AWS cloud services.

Additionally, in an earlier comparison we found that IaC-based deployment (for example, with Terraform) required two to three times as much IaC code as business code. Maintaining that IaC code takes a significant amount of time, while developers would clearly rather spend it on business code, since the goal is to implement application functionality.

A quick note here: AWS CDK is one type of Infrastructure as Code (IaC) tool, alongside others like Terraform and Pulumi. They have similar usage patterns and share the issues mentioned above.

⌨️ AWS Console

If we don't use AWS CDK, we can manually create the backend services that applications depend on by logging into the AWS console. However, this method is quite cumbersome, involving repetitive navigation across different console pages to create various service instances and configure permissions between them. Moreover, these processes cannot be automated, making team collaboration, continuous integration, and continuous deployment impractical for complex applications.

As we can see from the above, both the AWS CDK deployment method and manual creation have their difficulties:

Prone to errors: Both methods essentially involve manually creating granular service instances, which can lead to configuration omissions and errors that are hard to detect during deployment and only surface when the application runs.
Requires AWS background knowledge: Whether defining service instances through CDK code or manually creating them via the console, developers need an in-depth understanding of AWS services, including direct dependencies like DynamoDB, S3, and indirect ones like IAM.
Tedious permission configuration: For security reasons, we usually adhere to the principle of least privilege when configuring permissions for resource service instances. If developers manually manage these permissions through CDK or the console, it will undoubtedly be a very cumbersome process. Moreover, it's easy to forget to update permission configurations after modifying business code.
Dependency management: When publishing a LangChain application as an AWS Lambda function instance, we need to package the application's SDK dependencies during the packaging process. This requires manual management by developers, which can lead to missed dependencies. If the local device's operating system or CPU architecture doesn't match AWS's platform, the packaging process becomes even more troublesome.
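To make the permission point concrete, here is a minimal sketch of the kind of least-privilege policy a developer would otherwise have to write by hand for a Lambda function that reads and writes a single DynamoDB table. The region, account ID, and table name are hypothetical placeholders chosen for illustration; only the policy structure and the DynamoDB action names are standard.

```python
import json


def dynamodb_table_policy(region: str, account_id: str, table: str) -> dict:
    """Build an IAM policy document scoped to a single DynamoDB table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the operations the business code actually performs.
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = dynamodb_table_policy("us-east-1", "123456789012", "chat-history")
print(json.dumps(policy, indent=2))
```

If the business code later starts deleting or querying items, the action list has to be extended by hand. This is exactly the kind of update that is easy to forget.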

🤖️ Pluto

The analysis above shows that despite the powerful services offered by large cloud service providers like AWS, there's still a significant learning curve for developers to effectively utilize these services. This led us to an idea: what if we could deduce the infrastructure resource requirements of an application directly from the LangChain application code, and then automatically create the corresponding resource instances on cloud platforms like AWS? This approach could simplify both resource creation and application deployment. Based on this idea, we developed a tool named Pluto.

Pluto is a tool designed for individual developers, aimed at making the construction of cloud and AI applications more convenient, addressing the aforementioned usability issues related to cloud services. Developers can directly define and use the cloud services their application requires within the application code, including AWS DynamoDB, SageMaker, and more. Pluto employs static program analysis to automatically extract the infrastructure requirements from the code and create the necessary service instances on the designated cloud platform.
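To give a feel for what "static program analysis" can mean here, the following toy sketch uses Python's standard ast module to find resource constructor calls in source code without running it. It is only an illustration of the general idea, not Pluto's actual implementation; the set of recognized type names is an assumption made for this example.

```python
import ast

# Hypothetical set of constructor names treated as cloud resources.
RESOURCE_TYPES = {"Router", "KVStore", "Queue"}

SOURCE = '''
from pluto_client import Router

router = Router("pluto")
router.get("/", handler)
'''


def find_resources(source: str) -> list:
    """Return (resource_type, resource_name) for each recognized constructor call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RESOURCE_TYPES
            and node.args
            and isinstance(node.args[0], ast.Constant)
        ):
            found.append((node.func.id, node.args[0].value))
    return found


resources = find_resources(SOURCE)
print(resources)
```

Because the source is only parsed and never executed, the analysis works even though handler is undefined in the snippet. A real tool would additionally have to track method calls such as router.get to derive routes, triggers, and permissions.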

So, what's it like to deploy a LangChain application using Pluto? Let's look at a simple example:



import os

from pluto_client import Router, HttpRequest, HttpResponse
from langchain_core.pydantic_v1 import SecretStr
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")
model = ChatOpenAI(
    model="gpt-3.5-turbo",
    api_key=SecretStr(os.environ["OPENAI_API_KEY"]),
)
output_parser = StrOutputParser()

def handler(req: HttpRequest) -> HttpResponse:
    chain = prompt | model | output_parser
    topic = req.query.get("topic", "ice cream")
    joke = chain.invoke({"topic": topic})
    return HttpResponse(status_code=200, body=joke)

router = Router("pluto")
router.get("/", handler)

The code snippet above is a LangChain application implemented with Pluto, and it looks just like a regular Python application, doesn't it? However, by simply executing pluto deploy, Pluto can construct the application architecture shown in the figure below on the AWS platform. During the process, it will automatically create instances for the API Gateway and Lambda, and configure the routing, triggers, permissions, etc., from API Gateway to Lambda.

Due to space limitations, the example above only demonstrates the integration of a LangChain application with the API Gateway resource. Similarly, using variable definitions, you can integrate more resources such as DynamoDB, S3, SageMaker, and more. You can find additional examples here.

Since the infrastructure configuration is defined alongside the application code, developers can freely alter the code according to their needs. In the next pluto deploy, Pluto will automatically update the infrastructure configuration of the application without any extra steps from the developer, thus solving the previously mentioned issues of error-proneness, code packaging, and cumbersome permission configurations.

💬 Conclusion

The analysis above shows that despite the powerful services offered by large cloud service providers like AWS, there's still a significant learning curve for developers to effectively utilize these services. This might explain the emergence of AI Infra products like LangSmith, Modal, and LeptonAI, which aim to be one-stop service providers for AI applications. We, on the other hand, take a different approach by directly deducing the infrastructure requirements from the application code and automatically creating corresponding service instances on cloud platforms, thus assisting developers with deployment issues. Our hope is to enable developers to focus on writing business logic, allowing even those unfamiliar with AWS to deploy applications to the cloud without getting bogged down in the tedious configuration of infrastructure.

Lastly, if you like the Pluto project and want to give it a try, you can visit our Getting Started guide, which offers various usage options including containers and online. If you have any questions or suggestions, or would like to contribute (very welcome!), feel free to join our community to participate in discussions and co-creation.

And finally, if you've come this far, why not give us a star🌟? GitHub link 👉 https://github.com/pluto-lang/pluto.


Craft a Document QA Assistant for Your Project in Just 5 Minutes!

jianzs@pluto-lang — Mon, 13 May 2024 11:55:36 +0000

Let me introduce you to an incredibly simple way to build a document Q&A assistant, which allows you to create a personalized Web Q&A assistant based on your GitHub documentation repository in just 5 minutes.

First, let's take a look at the result. You can also experience it by opening this link.

Brief Introduction

First, let's quickly go over the working principle and implementation method of this document Q&A assistant.

The main logic of this document assistant is: upon initialization, it downloads documents from the GitHub repository where the documents are stored. Then, through LangChain, it calls OpenAI's Embeddings model to generate vectors for the documents and saves them to AWS S3 to avoid wasting Tokens by recreating document vectors. When a user inputs a question, LangChain is used again to generate vectors for the question. Then, the FAISS vector database retrieves related documents, and finally, GPT-3.5 is used to synthesize an answer. Additionally, this document Q&A assistant automatically updates the document vectors at 0:00 UTC every day.
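The flow described above (embed once, cache the vectors, then retrieve by similarity) can be sketched with the standard library alone. This is a toy stand-in under loud assumptions: word counts replace the OpenAI embeddings, an in-memory dict replaces the S3 cache, and a sorted list replaces FAISS. The document names and texts are invented for the example.

```python
import math
from collections import Counter

# Stand-in for the S3 bucket that stores document vectors.
vector_cache = {}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. The real assistant calls an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def index_documents(docs: dict) -> None:
    """Embed each document once; cached vectors are reused on later runs."""
    for name, text in docs.items():
        if name not in vector_cache:
            vector_cache[name] = embed(text)


def retrieve(question: str, k: int = 1) -> list:
    """Return the names of the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(vector_cache, key=lambda n: cosine(q, vector_cache[n]), reverse=True)
    return ranked[:k]


docs = {
    "deploy.md": "run pluto deploy to publish the application to aws",
    "router.md": "the router resource maps http paths to handler functions",
}
index_documents(docs)
top = retrieve("how do I deploy the application")
print(top)
```

In the real assistant, the retrieved documents would then be passed to the chat model together with the question to synthesize the answer.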

To construct this document Q&A assistant, we utilized a series of frameworks and tools including LangChain, FastUI, and Pluto, and eventually deployed it on AWS. LangChain is used for the main Q&A functionality, invoking models such as OpenAI's Embedding and GPT-3.5; FastUI is used for the Web interface shown above; Pluto is in charge of creating and configuring cloud resources, as well as application deployment. With Pluto, you can create and configure resources such as AWS Lambda and S3 by just creating a few variables in the code.

Getting Started

To make it easy for everyone to get started, I've made this document Q&A assistant into a CodeSandbox template. Simply click the link below to open CodeSandbox's online IDE, then click Fork in the top right corner to copy the project to your account. After that, you are free to modify the code and deploy it to AWS with one click.

Next, I will walk through building your own document Q&A assistant step by step.

Prepare Tokens

Before you start, you need to prepare several key Tokens, including a GitHub Token for downloading document data from GitHub, an OpenAI API Key for invoking OpenAI models, and AWS credentials for application deployment, among others.

GitHub Token: You can create a GitHub Token on this page New personal access token (classic), with just the public_repo permission required.
OpenAI API Key: You can obtain an API Key from the OpenAI platform. You can also use any other service that is compatible with the OpenAI API by setting its address as the base URL in the configuration.
AWS Credentials: You need to get your AWS Access Key and Secret Key from AWS's console for deploying the application to AWS later.

Modify Basic Configuration

Once you enter the development environment, the console will automatically display the Configure AWS Certificate tab. Enter your AWS credentials here to ensure the application can be successfully deployed to AWS. You can leave the output format field blank. After you fill in the other required fields, a green check mark ✔️ appears next to the tab name if everything is correct.

Next, we need to modify the configuration of the assistant. Open the app/main.py file. The lines starting from line 25 contain the assistant's basic configuration, including the GitHub repository where the documents are stored, the repository branch, the relative path of the documents in the repository, the OpenAI API Key, the GitHub Token, and so on. Modify these settings to match your own project.



PROJECT_NAME = "Pluto" # Project name, related to the title of the Web page
REPO = "pluto-lang/website" # GitHub repository storing the documents
BRANCH = "main" # Branch of the repository
DOC_RELATIVE_PATH = "pages" # Relative path of the documents in the repository

OPENAI_BASE_URL = "https://api.openai.com/v1" # Base URL of the OpenAI API
OPENAI_API_KEY = "<replace_with_your_openai_api_key>" # OpenAI API Key
GITHUB_ACCESS_KEY = "<replace_with_your_github_access_key>" # GitHub Token

To customize your assistant, for example to change the style of its responses, modify the prompt variable in the code.
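If you would rather not keep secrets in the source file, one option is to read the settings from environment variables. The sketch below is a suggestion rather than part of the template: read_setting is a hypothetical helper, and it assumes the environment variables are available wherever the code runs, which depends on your deployment setup.

```python
import os
from typing import Optional


def read_setting(name: str, default: Optional[str] = None) -> str:
    """Return an environment variable, failing loudly if it is required but missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required setting: {name}")
    return value


# Non-secret settings can keep a default value.
OPENAI_BASE_URL = read_setting("OPENAI_BASE_URL", "https://api.openai.com/v1")

# Secrets have no default, so a missing variable is reported immediately:
# OPENAI_API_KEY = read_setting("OPENAI_API_KEY")
# GITHUB_ACCESS_KEY = read_setting("GITHUB_ACCESS_KEY")
```

Failing at startup with the name of the missing variable is usually easier to debug than a rejected API call later on.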

One-Click Deployment

After the configuration is complete, simply click the terminal icon and select Deploy from the menu to deploy the document Q&A assistant to AWS, without any additional steps on your part. The deployment may take about a minute. Once it's done, the deployment URL will be displayed in the console, and you can access the document Q&A assistant by clicking this URL!

Note that due to AWS Lambda cold starts and the time needed to build the vector database, the first visit may take anywhere from a few seconds to a few tens of seconds.

Destruction

If you want to take the application offline, simply click Destroy from the menu, and the resources created in AWS will be completely deleted.

Summary

The implementation of this document Q&A assistant is very simple; you only need to modify a few configurations to deploy it on AWS.

The reason it can be implemented so effortlessly is that you don't need to worry about the creation and configuration of cloud resources or the deployment of applications; this is mainly thanks to Pluto's capabilities. Pluto can automatically deduce the resources an application depends on from the code and automatically create and configure these resources, allowing developers to focus on the implementation of business logic.

If you want to learn more about Pluto's features, feel free to visit Pluto's official documentation or the GitHub repository. If you could give a Star🌟, that would be even better! We also welcome everyone to submit issues and PRs.

More Resources

If you want to build on Llama3 and SageMaker, you can refer to this example: Document-Q&A assistant based on Llama3
If you want to support conversation sessions, you can refer to this example: Building a Llama2 Conversational Chatbot with AWS and LangChain
Pluto Official Documentation: https://pluto-lang.vercel.app
Pluto GitHub Repository: https://github.com/pluto-lang/pluto

Deploy LangServe Application to AWS

jianzs@pluto-lang — Tue, 07 May 2024 09:46:55 +0000

This guide shows how to deploy a LangServe application to AWS with one click through Pluto. It requires only AWS access credentials; there is no need to learn AWS operations or log in to the AWS console.

LangServe is a subproject of LangChain that helps developers expose LangChain Runnables and Chains through a REST API. It also provides clients for calling Runnables deployed on the server, available in several languages such as Python and TypeScript, and by default provides a Playground for trying the application online after deployment.

You can get all the code for this example from here. This link provides an online IDE for this sample application. Click the Fork button in the upper right corner to create your own development environment, and then you can directly modify the code and deploy it to AWS in the browser.

⚠️Note:

Since Pluto currently supports only single-file applications, the code of the LangServe application needs to be placed in one file.
Because of Pluto's current packaging method, LangChain's Template Ecosystem is not yet supported. Support is coming soon.

Environment Preparation

If you have not configured the Pluto development environment, follow the local development instructions in Getting Started, or try Pluto in the online sandbox or container that it provides.

Developing LangServe Application

Here we introduce two different ways to develop LangServe applications: one is the development method mentioned in the langserve tutorial, using the langchain app new command to create a new LangChain application; the other is using the pluto new command to create a new Pluto application.

Method 1: langchain app new

Install LangChain CLI

pip install langchain-cli

Create LangServe Application

Use the langchain app new command to create a new LangChain application. This command will create a new directory in the current directory, and the directory name is the application name you specified:

langchain app new --non-interactive my-app
cd my-app

Note: The langchain app new command depends on git, so make sure git is installed in your environment. If you are using the container environment provided by Pluto, install git by running apt-get update && apt-get install -y git.

Write LangServe Application

You can develop AI applications based on LangChain in the app/server.py file according to your needs. In the end, you should develop one or more Runnable instances such as LangChain's Agent, Chain, etc. These instances can be added to FastAPI through the add_routes method provided by LangServe, and then provided to users in the form of HTTP services.

We take the sample application on the LangServe homepage as an example. This example uses the add_routes method to add multiple Runnable instances of LangChain to FastAPI.

Modify Code to Adapt to Pluto

Next, we need to adapt the LangServe application into a Pluto application so that Pluto can deploy it to AWS. The adaptation takes just two steps:

First, put the code related to the FastAPI app into a function and make this function return the FastAPI app instance. Here we assume that the function is named return_fastapi_app.
Then, replace the entire if __name__ == "__main__": code block with the following 4 statements. You can change router_name to any name you like. This name is related to the name of the API Gateway instance created on AWS.

from mangum import Mangum
from pluto_client import Router

router = Router("router_name")
router.all("/*", lambda *args, **kwargs: Mangum(return_fastapi_app(), api_gateway_base_path="/dev")(*args, **kwargs), raw=True)

The final code is as follows:

from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
# from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langserve import add_routes
from langchain.pydantic_v1 import SecretStr

from mangum import Mangum
from pluto_client import Router

OPENAI_API_KEY = SecretStr("<replace_with_your_openai_api_key>")
ANTHROPIC_API_KEY = SecretStr("<replace_with_your_anthropic_api_key>")

model = ChatAnthropic(api_key=ANTHROPIC_API_KEY)
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

def return_fastapi_app():
    # LangServe depends on this package, but it may not come pre-installed.
    # Importing it here ensures that it is installed.
    import sse_starlette

    app = FastAPI(
        title="LangChain Server",
        version="1.0",
        description="A simple api server using Langchain's Runnable interfaces",
    )

    add_routes(
        app,
        ChatOpenAI(api_key=OPENAI_API_KEY),
        path="/openai",
    )

    add_routes(
        app,
        ChatAnthropic(api_key=ANTHROPIC_API_KEY),
        path="/anthropic",
    )

    add_routes(
        app,
        prompt | model,
        path="/joke",
    )

    return app

router = Router("router_name")
router.all(
    "/*",
    lambda *args, **kwargs: Mangum(return_fastapi_app(), api_gateway_base_path="/dev")(*args, **kwargs),
    raw=True,
)
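The api_gateway_base_path="/dev" argument is there because the URL of the deployed application contains the /dev stage prefix, while the FastAPI routes are registered without it. The following simplified sketch illustrates the idea of stripping such a prefix before routing. It is an illustration only, not Mangum's actual code, and strip_base_path is a name invented for this example.

```python
def strip_base_path(path: str, base_path: str) -> str:
    """Remove a leading stage prefix such as "/dev" from a request path."""
    if base_path and base_path != "/" and path.startswith(base_path):
        path = path[len(base_path):]
    # An empty remainder means the request targeted the root of the app.
    return path or "/"


print(strip_base_path("/dev/joke/invoke", "/dev"))  # /joke/invoke
print(strip_base_path("/dev", "/dev"))              # /
print(strip_base_path("/joke/invoke", "/dev"))      # /joke/invoke
```

With the prefix removed, a request for /dev/joke/invoke matches the /joke route that add_routes registered on the FastAPI app.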

Deploy to AWS

Before deploying, we need to initialize this project as a Pluto project so that Pluto can recognize and deploy it. Run the following command in the project root directory. Pluto will guide you through the initialization interactively; choose Python as the programming language:

pluto init

After the initialization is complete, install the necessary dependencies by executing the following two commands:

npm install

# When the Python version does not match, please modify the python version number in pyproject.toml
poetry add pluto-client mangum langchain-openai langchain_anthropic sse_starlette

Finally, we can deploy the LangServe application to AWS by executing the following command:

poetry shell
pluto deploy app/server.py

Note: If your development environment uses the Arm64 architecture, install and start Docker in that environment first. The container environment provided by Pluto already has Docker installed, but you need to pass the --privileged parameter when starting the container and then start the Docker service inside it manually with the following command:

dockerd > /dev/null 2>&1 &

The pluto deploy command deploys your LangServe application to AWS as a serverless application, creating an API Gateway instance and a Lambda function instance to handle requests. The URL of the API Gateway instance is printed in the terminal, and you can access the deployed application by visiting this URL.

Method 2: pluto new

Create Pluto Application

Use the pluto new command to create a new Pluto application. This command will interactively create a new Pluto application and create a new directory in the current directory. The directory name is the application name you specified. Please choose Python for the programming language:

pluto new

After creation, enter the created application directory and install the necessary dependencies:

cd <project name>
npm install
pip install -r requirements.txt

Write LangServe Application

You can develop AI applications based on LangChain in the app/main.py file according to your needs. In the end, you should develop one or more Runnable instances such as LangChain's Agent, Chain, etc. These instances can be added to FastAPI through the add_routes method provided by LangServe, and then provided to users in the form of HTTP services.

Here, however, we need to put the code related to the FastAPI app into a function that returns the FastAPI app instance, and then wrap this function in the all method of Router so that Pluto can deploy it to AWS.

Taking the sample application on the LangServe homepage as an example, the final code is the same as the adapted code in the previous method.

Deploy to AWS

Once all dependencies are installed, you can deploy the LangServe application to AWS by executing the following command:

pluto deploy

Note: If your development environment uses the Arm64 architecture, install and start Docker in that environment first. The container environment provided by Pluto already has Docker installed, but you need to pass the --privileged parameter when starting the container and then start the Docker service inside it manually with the following command:

dockerd > /dev/null 2>&1 &

pluto deploy will deploy your LangServe application to AWS as a serverless application, creating an API Gateway instance and a Lambda function instance to handle requests. The URL of the API Gateway instance is printed in the terminal, and you can access the deployed application by visiting this URL.

Access

After the deployment is complete, you can see the URL output by Pluto from the terminal. You can access your LangServe application through this URL.

⚠️Note:

Pluto does not yet support streaming responses. When you use the astream method of LangServe, the result is still returned all at once.
Because loading the LangChain dependencies for the first time may be slow, the first call to the LangServe service or the first access to the Playground may be slow, and the request times out automatically after 30 seconds. If you encounter a timeout, please try again.
Each instance of an AWS Lambda function can handle only one request at a time, and each LangChain Lambda instance takes close to 2 minutes to initialize, so requests may time out under high concurrency.
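Since the advice for cold-start timeouts is simply to try again, a client can automate the retry. The helper below is a minimal sketch: call_with_retry is a hypothetical name, and it assumes the client surfaces a timeout as Python's built-in TimeoutError. Real HTTP libraries may raise their own exception types, so adjust the except clause accordingly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn, retrying with exponential backoff when it raises TimeoutError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except TimeoutError:
            # Wait longer after each failure: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** attempt)
    return fn()  # Final attempt: let any exception propagate.


calls = {"count": 0}


def flaky_request() -> str:
    """Simulates a cold start: the first two calls time out."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("cold start")
    return "ok"


result = call_with_retry(flaky_request, attempts=3, base_delay=0.0)
print(result, calls["count"])
```

Retries help with the occasional cold start, but they do not remove the concurrency limit described above, so a burst of simultaneous requests may still time out.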

Call via RemoteRunnable

Again taking the client from the sample application on the LangServe homepage as an example, you only need to replace the local URL in the LangServe example with the URL output by Pluto.

We did not use the Anthropic model, so we kept only the calls to the OpenAI and Joke routes. The modified Python client code is shown below. Replace https://fcz1u130w3.execute-api.us-east-1.amazonaws.com/dev in the code with the URL output by Pluto:

import asyncio

from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable


openai = RemoteRunnable(
    "https://fcz1u130w3.execute-api.us-east-1.amazonaws.com/dev/openai/"
)
joke_chain = RemoteRunnable(
    "https://fcz1u130w3.execute-api.us-east-1.amazonaws.com/dev/joke/"
)


def sync_invoke():
    result = joke_chain.invoke({"topic": "parrots"})
    print(
        ">> The result of `joke_chain.invoke({'topic': 'parrots'})` is:\n",
        result.content,
        "\n",
    )


async def async_invoke():
    result = await joke_chain.ainvoke({"topic": "parrots"})
    print(
        ">> The result of `await joke_chain.ainvoke({'topic': 'parrots'})` is:\n",
        result.content,
        "\n",
    )

    prompt = [
        SystemMessage(content="Act like either a cat or a parrot."),
        HumanMessage(content="Hello!"),
    ]

    # astream can be called, but Pluto returns the whole result at once
    print(">> The result of `openai.astream(prompt)` is:")
    async for msg in openai.astream(prompt):
        print(msg.content, end=" | ", flush=True)
    print()


def custom_chain():
    prompt = ChatPromptTemplate.from_messages(
        [("system", "Tell me a long story about {topic}")]
    )

    # Can define custom chains
    chain = prompt | RunnableMap(
        {
            "openai": openai,
            # The Anthropic route is not used here, so the OpenAI runnable stands in for it
            "anthropic": openai,
        }
    )

    result = chain.batch([{"topic": "parrots"}, {"topic": "cats"}])
    print(
        ">> The result of `chain.batch([{'topic': 'parrots'}, {'topic': 'cats'}])` is:\n",
        result,
    )


async def main():
    sync_invoke()
    await async_invoke()
    custom_chain()


asyncio.run(main())

The following figure shows the result of executing the Python client code:

The modified TypeScript client code is shown below. Replace <your-api-gateway-url> in the code with the URL output by Pluto:

import { RemoteRunnable } from "@langchain/core/runnables/remote";

const chain = new RemoteRunnable({
  url: `<your-api-gateway-url>/joke/`,
});
const result = await chain.invoke({
  topic: "cats",
});

Access via curl

Similarly, you only need to replace the <your-api-gateway-url> in the example with the URL output by Pluto:

curl --location --request POST '<your-api-gateway-url>/joke/invoke' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": {
            "topic": "cats"
        }
    }'
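The same request can be built in Python with only the standard library. The sketch below mirrors the curl command: it wraps the input in an "input" field and posts it to the /invoke endpoint of the route. BASE_URL is a placeholder, build_invoke_request is a hypothetical helper, and the actual network call is left commented out because it needs a deployed application.

```python
import json
import urllib.request

# Placeholder: replace with the URL output by Pluto.
BASE_URL = "https://example.execute-api.us-east-1.amazonaws.com/dev"


def build_invoke_request(base_url: str, route: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the /invoke endpoint of a LangServe route."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{route.strip('/')}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request(BASE_URL, "joke", {"topic": "cats"})
print(request.get_method(), request.full_url)

# Sending the request requires a deployed application:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```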

The following figure shows the result of executing the curl command:

Access Playground via Browser

Due to the current routing policy of LangServe, we cannot directly access LangServe's Playground through the browser without modifying the code. After this PR is merged, LangServe's Playground can be directly accessed through the browser.

For now, we need to add a second add_routes call for each existing add_routes call, with the /dev prefix added to the path parameter, so that LangServe's Playground can be accessed in the browser. Here is an example:

add_routes(
    app,
    ChatOpenAI(api_key=OPENAI_API_KEY),
    path="/openai",
)

add_routes(
    app,
    ChatOpenAI(api_key=OPENAI_API_KEY),
    path="/dev/openai",
)

After modifying and deploying, you can access the Playground of the sample application through the following URLs. Note that you need to add /dev to the access path, so the full URL contains /dev twice. The URL may also be redirected; if the redirect changes the path, correct the path and try again.

OpenAI: <your-api-gateway-url>/dev/openai/playground
Anthropic: <your-api-gateway-url>/dev/anthropic/playground
Joke: <your-api-gateway-url>/dev/joke/playground

The two figures below show the results of accessing the Playground of OpenAI and Joke through the browser, respectively:

Cleanup

If you want to take the deployed LangServe application offline from AWS, you only need to execute the following command:

pluto destroy

Conclusion

In this article, we have explored in detail how to use Pluto to deploy the LangServe application to the AWS cloud platform with one click. This method allows you to easily deploy the LangServe application to the cloud and implement remote calls and Playground access, even if you are not familiar with AWS operations.

Pluto can also automatically create resources such as DynamoDB, SNS, and SageMaker. You only need to write code, and pluto deploy will create and configure these resources on AWS, giving you convenient access to the cloud's computing and storage capabilities so that you can build powerful AI applications and realize your ideas💡. You can get more information from More Resources.

We try to make the steps in this article as simple and easy to understand as possible, so even if you are not very familiar with Pluto or AWS, you can easily get started. If you encounter problems during reading and practice, or have new ideas, please feel free to seek help by submitting an issue or joining the Pluto Slack community.

More Resources

Quick Experience

Replace the OPENAI_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION values at the top of this script with your actual values, and then save the script to your local machine.

Executing this script will automatically create a LangServe sample application and deploy it to AWS, finally outputting the deployed URL. You can refer to the Access section in the text above to access the deployed application.

After execution, it will enter an interactive command line, making it easy for you to take the deployed application offline with pluto destroy.

OPENAI_API_KEY="<your-openai-api-key>"
AWS_ACCESS_KEY_ID="<your-aws-access-key-id>"
AWS_SECRET_ACCESS_KEY="<your-aws-secret-access-key>"
AWS_REGION="us-east-1"

# Prepare the modified code of LangServe application
MODIFIED_CODE=$(cat <<EOF
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langserve import add_routes
from langchain.pydantic_v1 import SecretStr

from mangum import Mangum
from pluto_client import Router

OPENAI_API_KEY = SecretStr("${OPENAI_API_KEY}")

model = ChatOpenAI(api_key=OPENAI_API_KEY)
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

def return_fastapi_app():
    # The langserve depends on this, but it may not come pre-installed.
    # So, we write it here to ensure it is installed.
    import sse_starlette

    app = FastAPI(
      title="LangChain Server",
      version="1.0",
      description="A simple api server using Langchain's Runnable interfaces",
    )

    add_routes(
      app,
      ChatOpenAI(api_key=OPENAI_API_KEY),
      path="/openai",
    )

    add_routes(
      app,
      ChatOpenAI(api_key=OPENAI_API_KEY),
      path="/dev/openai",
    )

    add_routes(
      app,
      prompt | model,
      path="/joke",
    )

    add_routes(
      app,
      prompt | model,
      path="/dev/joke",
    )

    return app

router = Router("router_name")
router.all(
    "/*",
    lambda *args, **kwargs: Mangum(return_fastapi_app(), api_gateway_base_path="/dev")(*args, **kwargs),
    raw=True,
)
EOF
)

# Prepare the package.json file, used by Pluto
PACKAGE_JSON=$(cat <<EOF
{
  "name": "my-app",
  "private": true,
  "version": "0.0.0",
  "scripts": {
    "test:dev": "pluto test --sim",
    "test:prod": "pluto test",
    "deploy": "pluto deploy",
    "destroy": "pluto destroy"
  },
  "dependencies": {},
  "devDependencies": {
    "@types/node": "^20",
    "typescript": "^5.2.2",
    "@plutolang/base": "latest",
    "@plutolang/pluto-infra": "latest",
    "@pulumi/pulumi": "^3.88.0"
  },
  "main": "dist/index.js"
}
EOF
)

# Prepare the Pluto configuration file
PLUTO_YML=$(cat <<EOF
current: aws
language: python
stacks:
  - configs: {}
    name: aws
    platformType: AWS
    provisionType: Pulumi
EOF
)

# Prepare the AWS credentials
AWS_CREDENTIALS=$(cat <<EOF
[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOF
)

# Prepare the AWS configuration
AWS_CONFIG=$(cat <<EOF
[default]
region = ${AWS_REGION}
EOF
)

# Prepare the script to run inside the Docker container
cat <<EOF1 > script.sh
#!/bin/bash

apt update
apt install -y git

pip install langchain-cli poetry

langchain app new --non-interactive my-app
cd my-app

cat << EOF2 > app/server.py
${MODIFIED_CODE}
EOF2

cat << EOF3 > package.json
${PACKAGE_JSON}
EOF3

mkdir -p .pluto
cat << EOF4 > .pluto/pluto.yml
${PLUTO_YML}
EOF4

npm install
sed -i 's/\^3.11/\^3.10/' pyproject.toml
poetry add pluto-client mangum langchain-openai sse_starlette

mkdir -p ~/.aws
cat << EOF5 > ~/.aws/credentials
${AWS_CREDENTIALS}
EOF5
cat << EOF6 > ~/.aws/config
${AWS_CONFIG}
EOF6

source \$(poetry env info --path)/bin/activate
pluto deploy -y --force app/server.py

bash
EOF1

# Run the script inside the Docker container
docker run -it --rm \
  --platform linux/amd64 \
  -v "$(pwd)/script.sh:/script.sh" \
  plutolang/pluto:latest bash -c "bash /script.sh"

Integrate Llama3 Into Your Application with Just One Command!

jianzs@pluto-lang — Thu, 25 Apr 2024 12:46:59 +0000

Llama3, heralded as the world's first "open-source GPT4," has finally arrived!

Llama3, the latest open-source Large Language Model (LLM) launched by Meta, includes the Llama3 8B with 8 billion parameters and the Llama3 70B with 70 billion parameters. Llama3 has made significant progress in performance, with the 8B model outperforming Gemma 7B and Mistral 7B Instruct across various benchmarks such as MMLU, GPQA, HumanEval, while the 70B model has surpassed the proprietary Claude 3 Sonnet and is on par with Google's Gemini Pro 1.5. Additionally, Meta is developing a version with more than 400 billion (400B) parameters, expected to have enhanced multilingual processing capabilities and the ability to understand non-textual patterns such as images.

With Llama3 at our disposal, we can craft an array of innovative applications, from engaging chatbots to intelligent Retrieval-Augmented Generation (RAG) QA bots, and beyond. However, deploying Llama3, integrating the deployed Llama3 with one's application, and deploying the application itself can be challenging for many developers.

This article introduces a development approach based on Pluto, which requires only writing application code and executing a single command to deploy Llama3 and release the application. This article will use a RAG-based document QA bot as an example to demonstrate this development method. The main function of this QA bot is to retrieve project documentation from a specified GitHub repository and then use the Llama3 model to answer questions based on the document content.

The following image shows the interaction with this QA bot, with the specified repository being Pluto's documentation repository. Thus, from the content of the image, one can get a basic understanding of what Pluto is:

Application Architecture

The example application to be implemented is based on the LangChain framework and uses OpenAI Embeddings as the document vectorization tool. The entire application will be deployed on AWS, and the deployed architecture is shown in the figure below:

Specifically, the deployed application will include the following AWS resource instances:

The Llama3 model will be deployed on SageMaker.
An S3 bucket will be created to store the document vector database, thus avoiding the need to rebuild the vector database each time a Lambda function is started.
A CloudWatch rule will be created to update the document vector database daily.
Two Lambda instances will be created, one for receiving user query requests and the other for updating the document vector database.
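The reason for keeping the vector database in a bucket can be sketched as a "local copy, else bucket, else rebuild" lookup. The Python below is only an illustration under stated assumptions: InMemoryBucket is a hypothetical stand-in for an object store (it is not the Pluto Bucket API), and the index is a plain byte string rather than a real FAISS index.

```python
import os
import tempfile


class InMemoryBucket:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, path):
        with open(path, "rb") as f:
            self._objects[key] = f.read()

    def get(self, key, path):
        if key not in self._objects:
            raise KeyError(key)
        with open(path, "wb") as f:
            f.write(self._objects[key])


def load_index(bucket, cache_dir, rebuild):
    """Return (index bytes, origin), where origin is 'local', 'bucket', or 'rebuilt'."""
    path = os.path.join(cache_dir, "index.bin")
    if os.path.exists(path):
        origin = "local"  # same function instance, already cached on disk
    else:
        os.makedirs(cache_dir, exist_ok=True)
        try:
            bucket.get("index.bin", path)  # new instance: download instead of rebuilding
            origin = "bucket"
        except KeyError:
            with open(path, "wb") as f:
                f.write(rebuild())  # nothing stored yet: build once and upload
            bucket.put("index.bin", path)
            origin = "rebuilt"
    with open(path, "rb") as f:
        return f.read(), origin


bucket = InMemoryBucket()
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    cold = load_index(bucket, os.path.join(first, "store"), lambda: b"vectors")
    new_instance = load_index(bucket, os.path.join(second, "store"), lambda: b"vectors")
    same_instance = load_index(bucket, os.path.join(second, "store"), lambda: b"vectors")
print(cold[1], new_instance[1], same_instance[1])  # rebuilt bucket local
```

Only the very first start pays the cost of building the index; later function instances download it, and repeated invocations of a running instance reuse the local copy.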

In addition to creating these resource instances, it is also necessary to configure the dependencies between resources, including triggers, IAM roles, and permission policies. However, you don't need to worry about these complex creation and configuration processes, as Pluto can deduce this information from the code and then automatically complete the creation and configuration of resources.

Setting Up Your Development Environment and Tokens

Prepare the Development Environment

First, you need to prepare the development environment for the Pluto application. Pluto provides three different development methods: container development, online development, and local development:

Container development: Create a container based on the plutolang/pluto:latest container image to serve as the development environment for the Pluto application.
Online development: Open and fork the template application created on CodeSandbox to develop the application directly in the browser.
Local development: Refer to the Pluto Local Development Guide to configure the Pluto development environment locally. You can learn about the detailed usage methods of various environments from the Pluto Getting Started Guide.

Prepare AWS Resource Quotas

To deploy the Llama3 8B model, at least the ml.g5.2xlarge instance type is required, and to deploy the Llama3 70B model, at least the ml.p4d.24xlarge instance type is required. The initial quota for these two types of instances is zero, so if you haven't applied for an increase, you may need to do so through the AWS Management Console. For a trial experience, you can also use the TinyLlama-1.1B-Chat-v1.0 model, which is compatible with the ml.m5.xlarge instance.

Prepare Tokens

This example application requires preparing API Keys from several different websites to implement the application functions:

OpenAI API Key: Used to access the OpenAI Embeddings API. You can obtain the API Key from OpenAI.
GitHub Access Token: Used to fetch documents from the GitHub repository. You can create a personal access token from GitHub.
Hugging Face Hub Token: Used to download the model from the Hugging Face Hub when deploying the model on AWS SageMaker. You can obtain the Token from Hugging Face.

Create a Pluto Application

If you are in a local or container environment, execute the following command interactively to create a new Pluto application. After completion, a new project directory will be created in the current directory with the project name.

pluto new

The online development environment already includes the basic project directory, so there is no need to create a new application.

Install Application Dependencies

After entering the project root directory, add the following dependency libraries to the requirements.txt file. These libraries are the dependencies required for this example application:

pluto_client
faiss-cpu
langchain-core
langchain-community
langchain-openai
langchain_text_splitters

Then execute the following two commands to install the dependency libraries:

npm install
pip install -r requirements.txt

Write Application Code

After installing the dependencies, you can start writing the application program in the app/main.py file. The example code for the document QA bot is attached at the end of this article. You can directly copy it into the app/main.py file and then modify the configuration parameters as needed.

Writing Pluto application code is similar to writing pure business code. Developers do not need to worry about how cloud resources are created and configured. They only need to define the resources required for the application and implement the application's business logic by creating objects in the code. When deploying, Pluto will automatically deduce the dependencies between cloud resources from the application logic and create and configure these resources on the cloud platform.
For example, we can define a SageMaker endpoint with the Llama3 model deployed using the following code snippet:

sagemaker = SageMaker(
    "llama3-model",
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi2.0.0-gpu-py310-cu121-ubuntu22.04-v2.0",
    SageMakerOptions(
        instanceType="ml.g5.2xlarge",
        envs={
            "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",
            "HF_TASK": "text-generation",
            "HUGGING_FACE_HUB_TOKEN": HUGGING_FACE_HUB_TOKEN,
        },
    ),
)

In subsequent code, we can directly call the sagemaker.invoke() method to trigger the SageMaker endpoint or use sagemaker.endpoint_name to get the name of the SageMaker endpoint. Pluto will automatically set the correct permissions for the functions that call the SageMaker instance, so that they are allowed to invoke the endpoint.

Resources such as S3 and Lambda are created in the same way. For these common resource types, which most cloud platforms offer, Pluto provides an abstraction layer that lowers the learning cost for developers and lets the same code be deployed to multiple cloud platforms without modification.
In the code snippet below, the Function resource type corresponds to Lambda functions on AWS and to Knative Service on Kubernetes. The Bucket resource type corresponds to S3 buckets on AWS; its Kubernetes counterpart is expected to be PV, which has not been implemented yet. Contributions are welcome.

vector_store_bucket = Bucket("vector-store")
Function(query, FunctionOptions(name="query", memory=512))

Deploy the Application

After completing the application code, you only need to execute the following command to directly deploy the application on AWS:

pluto deploy

Creating the SageMaker endpoint may take a long time, sometimes more than 20 minutes, so please be patient. After deployment, Pluto will return a URL that you can use to access your application.

Test the Application

This URL address accepts POST requests. The request body must be a JSON array containing only one element, corresponding to the parameter of the query function in the code. You can use the curl command or other HTTP client tools to interact with your application. For example, you can replace the URL in the code snippet below with the URL address you received, and also replace the specific question with the one you want to ask. Then, execute the code to interact with your application:

curl -X POST <URL> \
    -H "Content-Type: application/json" \
    -d '["What is Pluto?"]'

Here is a simple interactive script. You can save this script to a file and then execute the file to interact with your application, achieving the effect shown at the beginning of this article.

#!/bin/bash
# set -o xtrace
echo "NOTE: This QA bot cannot keep a conversation. It can only answer the question you just asked."
echo ""

read -p "Input the URL that Pluto has outputted: " URL
if [ -z "$URL" ]; then
    echo "Please enter the URL that Pluto has outputted"
    exit 1
fi

echo -e "\nNow you can ask your question!"
user_message=""
while :; do
    echo "Press 'q' to quit."
    read -p "User > " user_message
    if [[ $user_message == "q" ]]; then
        echo "Bye. 👋"
        break
    fi

    payload=$(jq -n --arg msg "$user_message" '[$msg]')
    response=$(curl -s -w "\n%{http_code}" -X POST "$URL?n=1" -d "$payload" -H 'Content-type: application/json')

    http_code=$(echo "$response" | tail -n1)
    response_content=$(echo "$response" | head -n-1)
    code=$(echo "$response_content" | jq -r '.code')

    if [[ "$http_code" -ne 200 || "$code" -ne 200 ]]; then
        echo "Server responded with error: $http_code, $response_content"
        exit 1
    fi

    body=$(echo "$response_content" | jq -r '.body')
    echo "Bot  > $body"
    echo -e "\n"
done
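The request and response handling in the script above can also be expressed in Python. This is a hedged sketch using only the standard library: the response envelope with code and body fields is inferred from the jq expressions in the script, and the sample answer text is made up for illustration.

```python
import json


def build_payload(question: str) -> bytes:
    """Encode the question as a one-element JSON array, like `jq -n '[$msg]'`."""
    return json.dumps([question]).encode("utf-8")


def parse_response(http_status: int, raw: bytes) -> str:
    """Return the answer text, or raise if either status code signals a failure."""
    envelope = json.loads(raw.decode("utf-8"))
    if http_status != 200 or envelope.get("code") != 200:
        raise RuntimeError(f"Server responded with error: {http_status}, {envelope}")
    return envelope["body"]


# Made-up sample response, shaped like the envelope the script expects.
sample = json.dumps({"code": 200, "body": "Pluto is a development tool."}).encode("utf-8")

print(build_payload("What is Pluto?").decode("utf-8"))  # ["What is Pluto?"]
print(parse_response(200, sample))  # Pluto is a development tool.
```

To talk to a real deployment, the payload would be sent as the body of a POST request with the Content-Type header set to application/json, for example with urllib.request.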

Take Down the Application

If you want to take down the application, you only need to execute the following command:

pluto destroy

Extend the Application

If you want to implement a session-based QA bot, you can use the KVStore resource type to save the session. The example application "Session Chatbot" can be used as a reference.

If you want to rewrite the application as a LangServe application to use LangServe's RemoteRunnable and Playground components, you can refer to the "Deploy LangServe to AWS" document.

More Resources

Example Code

Below is the example code for the document QA bot. You can copy it into the app/main.py file and modify the configuration parameters as needed.

import os
import re
import sys
import json
import logging
from typing import Dict

from pluto_client import FunctionOptions, Function, Bucket, Schedule
from pluto_client.sagemaker import SageMaker, SageMakerOptions

from langchain_openai import OpenAIEmbeddings
from langchain_core.pydantic_v1 import SecretStr
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores.faiss import FAISS
from langchain_text_splitters import MarkdownTextSplitter
from langchain_community.document_loaders.github import GithubFileLoader
from langchain_community.llms.sagemaker_endpoint import (
    SagemakerEndpoint,
    LLMContentHandler,
)


# ====== Configuration ======
# 1. The OpenAI API key is used to access the OpenAI Embeddings API. You can get the API key from
# https://platform.openai.com/account/api-keys
# 2. The GitHub Access Key is used to fetch the documents from the GitHub repository. You can create
# a personal access token from https://github.com/settings/tokens
# 3. The Hugging Face Hub token is used to download the model from the Hugging Face Hub when
# deploying the model on AWS SageMaker. You can get the token from
# https://huggingface.co/settings/tokens

REPO = "pluto-lang/website"
BRANCH = "main"
DOC_RELATIVE_PATH = "pages"
OPENAI_BASE_URL = "https://api.openai.com/v1"
OPENAI_API_KEY = "<replace_with_your_key>"
GITHUB_ACCESS_KEY = "<replace_with_your_key>"
HUGGING_FACE_HUB_TOKEN = "<replace_with_your_key>"
# ===========================


FAISS_INDEX = "index"
PKL_KEY = f"{FAISS_INDEX}.pkl"
FAISS_KEY = f"{FAISS_INDEX}.faiss"

embeddings = OpenAIEmbeddings(
    base_url=OPENAI_BASE_URL, api_key=SecretStr(OPENAI_API_KEY)
)

vector_store_bucket = Bucket("vector-store")

"""
Deploy the Llama3 model on AWS SageMaker using the Hugging Face Text Generation Inference (TGI)
container. If you're unable to deploy the model because of the instance type, consider using the
TinyLlama-1.1B-Chat-v1.0 model, which is compatible with the ml.m5.xlarge instance.

Below are the minimum requirements for each size of the Llama3 model:
Model      Instance Type      # of GPUs per replica
Llama 8B   ml.g5.2xlarge      1
Llama 70B  ml.p4d.24xlarge    8

The initial limit set for these instances is zero. If you need more, you can request an increase
in quota via the [AWS Management Console](https://console.aws.amazon.com/servicequotas/home).
"""
sagemaker = SageMaker(
    "llama3-model",
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi2.0.0-gpu-py310-cu121-ubuntu22.04-v2.0",
    SageMakerOptions(
        instanceType="ml.g5.2xlarge",
        envs={
            "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",
            "HF_TASK": "text-generation",
            # If you want to deploy the Meta Llama3 model, you need to request a permission and
            # prepare the token. You can get the token from https://huggingface.co/settings/tokens
            "HUGGING_FACE_HUB_TOKEN": HUGGING_FACE_HUB_TOKEN,
        },
    ),
)


class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        if "stop" not in model_kwargs:
            model_kwargs["stop"] = ["<|eot_id|>"]
        elif "<|eot_id|>" not in model_kwargs["stop"]:
            model_kwargs["stop"].append("<|eot_id|>")

        input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        raw = output.read()  # type: ignore
        response_json = json.loads(raw.decode("utf-8"))
        content = response_json[0]["generated_text"]

        assistant_beg_flag = "assistant<|end_header_id|>"
        answerStartPos = content.index(assistant_beg_flag) + len(assistant_beg_flag)
        answer = content[answerStartPos:].strip()
        return answer


def build_logger():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    # Create a console handler
    handler = logging.StreamHandler()
    handler.flush = sys.stdout.flush
    handler.setLevel(logging.INFO)
    # Create a formatter and add it to the handler
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    handler.setFormatter(formatter)
    # Add the handler to the logger
    logger.addHandler(handler)
    return logger


logger = build_logger()


def create_vector_store() -> FAISS | None:
    # Explicitly import Faiss to alert Pluto that this function relies on it, ensuring the inclusion
    # of the Faiss package in the deployment bundle.
    import faiss

    def file_filter(file_path):
        # Anchor the pattern at the end so that only .md and .mdx files match.
        return re.match(f"{DOC_RELATIVE_PATH}/.*\\.mdx?$", file_path) is not None

    loader = GithubFileLoader(
        repo=REPO,
        branch=BRANCH,
        access_token=GITHUB_ACCESS_KEY,
        github_api_url="https://api.github.com",
        file_filter=file_filter,
    )
    docs = loader.load()

    if len(docs) == 0:
        logger.info("No documents updated")
        return
    logger.info(f"Loaded {len(docs)} documents")

    for doc in docs:
        doc.metadata["source"] = str(doc.metadata["source"])

    logger.info("Starting to split documents")
    text_splitter = MarkdownTextSplitter(chunk_size=2000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)

    logger.info("Starting to create vector store")
    store = FAISS.from_documents(splits, embeddings)
    logger.info("Finished creating vector store")

    return store


def download_vector_store(vector_store_dir: str):
    ensure_dir(vector_store_dir)
    vector_store_bucket.get(PKL_KEY, os.path.join(vector_store_dir, PKL_KEY))
    vector_store_bucket.get(FAISS_KEY, os.path.join(vector_store_dir, FAISS_KEY))


def flush_vector_store(vector_store_dir: str = "/tmp/vector_store"):
    vector_store = create_vector_store()
    if vector_store is None:
        return

    ensure_dir(vector_store_dir)
    vector_store.save_local(vector_store_dir, index_name=FAISS_INDEX)
    vector_store_bucket.put(PKL_KEY, os.path.join(vector_store_dir, PKL_KEY))
    vector_store_bucket.put(FAISS_KEY, os.path.join(vector_store_dir, FAISS_KEY))


def build_retriever():
    vector_store_dir = "/tmp/vector_store"
    if not os.path.exists(vector_store_dir):
        try:
            logger.info("Vector store not found, downloading...")
            download_vector_store(vector_store_dir)
        except Exception as e:
            logger.error(f"Failed to download vector store: {e}")
            flush_vector_store(vector_store_dir)

    logger.info("Loading vector store")
    vectorstore = FAISS.load_local(
        vector_store_dir, embeddings, allow_dangerous_deserialization=True
    )
    logger.info("Vector store loaded")
    return vectorstore.as_retriever()


def ensure_dir(dir: str):
    if not os.path.exists(dir):
        os.makedirs(dir)


def get_aws_region() -> str:
    aws_region = os.environ.get("AWS_REGION")
    if aws_region is None:
        raise ValueError("AWS_REGION environment variable must be set")
    return aws_region


# Leaving the following variable outside the handler function will allow them to be reused across
# multiple invocations of the function.
retriever = build_retriever()

# Create the prompt template in accordance with the structure provided in the Llama3 documentation,
# which can be found at https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
prompt = PromptTemplate.from_template(
    """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. In case the query requests a link, respond that you don't support links.
Context: {context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
)

llm = SagemakerEndpoint(
    endpoint_name=sagemaker.endpoint_name,  # SageMaker endpoint name
    region_name=get_aws_region(),
    content_handler=ContentHandler(),
    model_kwargs={
        "max_new_tokens": 512,
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
)


def query(query):
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain.invoke(query)
    # The line below is never executed because it follows the return statement. It only serves as
    # a notification to Pluto that this function triggers the SageMaker endpoint, so that Pluto
    # sets the appropriate permissions for the function.
    sagemaker.invoke("")


schd = Schedule("schedule")
schd.cron("0 0 * * *", flush_vector_store)

# This application requires a minimum of 256MB memory to run.
Function(query, FunctionOptions(name="query", memory=512))
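The answer-extraction step in ContentHandler.transform_output can be tried in isolation, without a SageMaker endpoint. The sketch below repeats that logic with the standard library only; the sample response is made up and merely follows the shape the example code reads, a list whose first element has a generated_text field containing the prompt followed by the answer.

```python
import json

ASSISTANT_HEADER = "assistant<|end_header_id|>"


def extract_answer(raw: bytes) -> str:
    """Return the text that follows the assistant header in the generated text."""
    response_json = json.loads(raw.decode("utf-8"))
    content = response_json[0]["generated_text"]
    start = content.index(ASSISTANT_HEADER) + len(ASSISTANT_HEADER)
    return content[start:].strip()


# Made-up sample: the generated text echoes the prompt and then appends the answer.
sample = json.dumps(
    [
        {
            "generated_text": (
                "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
                "What is Pluto?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
                "A development tool."
            )
        }
    ]
).encode("utf-8")

print(extract_answer(sample))  # A development tool.
```

Note that str.index raises ValueError when the header is missing, so a response that does not echo the prompt would fail loudly rather than return a wrong answer.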

Q&A

Why Not Use API Gateway?

There are two reasons:

API Gateway has a fixed 30-second timeout. If generation takes longer than that, the request fails with a 503 Service Unavailable error. Therefore, we use a Lambda function directly to handle requests. We will try to improve the experience by supporting WebSocket in the future.
The example application needs more than the default 128MB of memory. Pluto currently does not support setting the memory size of the routing functions behind API Gateway, whereas the Function resource type, which corresponds to AWS Lambda, does allow the memory size to be set.

Rethinking a Cloud-Native Application Development Paradigm

jianzs@pluto-lang — Tue, 02 Jan 2024 11:45:00 +0000

TL;DR: The existing development model for cloud-native applications faces complexity and orchestration challenges in the FaaS environment. This article introduces a novel development paradigm that combines Monolithic Programming with Compile-Time Splitting to streamline cloud-based execution and enhance development efficiency. The concept involves leveraging a compiler to automatically partition monolithic application code, enabling distributed execution on cloud infrastructure.

Introduction

Cloud-native applications are conventionally identified as those designed and nurtured on cloud infrastructure. Such applications, rooted in cloud technologies, skillfully benefit from the cloud's features of autoscaling and high reliability. This offers robust support for individual developers and SMEs, liberating them from wrestling with the complexities of infrastructure management—an undoubtedly attractive proposition.

Present Scenario

So, how are cloud applications currently being developed? References to cloud-native applications usually conjure images of containers and microservices, both of which are intrinsic to the CNCF's definition of the term. A typical development workflow might encompass writing application code at a microservices level, packaging it into containers, and deploying it onto a PaaS platform.

However, as cloud technologies evolve, Function as a Service (FaaS) has emerged as a pivotal cloud component. Compared to PaaS, FaaS integrates more intimately with cloud capabilities, offering superior performance in scaling and cold starts. For instance, AWS Lambda has achieved cold start latencies on the order of hundreds of milliseconds.

Hurdles

But does the traditional development methodology of microservices and containers still prove effective for FaaS-based cloud-native applications? Consider the following:

Services encapsulated by functions, often called NanoServices, possess a finer granularity than microservices. The process of packaging and deploying each function individually can be tedious, despite the command-line tools provided by cloud vendors.
Dividing an application into numerous functions, each functionally isolated, necessitates inter-function communication via SDKs. This results in a development experience that lacks consistency, with frequent context-switching among functions.
Orchestrating functions entails a steep learning curve, demanding familiarity with the cloud's event mechanisms and orchestration tools.

Essentially, the development model directly based on FaaS falls short of ideal, posing substantial challenges in effectively managing and coordinating functions. It's only natural to aspire for an improved application development experience. To this end, a fresh concept is proposed: Monolithic Programming, Compile-Time Splitting, and Distributed Execution.

Conceptual Analogies

This concept might strike a chord with the parallel programming framework OpenMP. OpenMP leverages the compiler to inject multithreading code for parallel execution within code segments earmarked for concurrency. Similarly, this novel concept uses the compiler to identify and extract code regions capable of independent computation, while the cloud infrastructure assumes the responsibility of managing the distributed execution.

Frameworks like MapReduce, Spark, and the increasingly popular Ray, all operating in the realm of cloud computing, target large-scale distributed computing. The notable difference here lies in the underlying runtime implementation. While these frameworks construct their runtime environments to support the distributed execution of specific task types, the proposed concept harnesses FaaS provided by cloud infrastructure as a uniform underlying environment to enable general computation for a range of cloud-native applications. This approach presents the potential for tighter integration with cloud capabilities and accommodates diverse workloads.

The Concept

Why "Monolithic Programming"?

Developing a monolithic application is a remarkably seamless experience. With all context residing within a single project, tools like linters, formatters, and IDE plugins can verify variable dependencies and function calls prior to execution.

How is "Compile-time Splitting" Achievable?

Files containing code devoid of programming constraints are challenging to split. Nevertheless, by defining keywords, special classes, or functions, the compiler can be directed to partition the code.

Consider the code snippet below. The Function class can be viewed as a special construct, where the function definition passed to its constructor is analyzed and extracted into a standalone computational module. Of course, real-world implementations would necessitate addressing many more nuances.

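class Function {
    constructor(private fn: (...args: any[]) => any) { /* implementation details */ }

    // Local stand-in: calls the wrapped function directly. After compile-time
    // splitting, this call would be routed to the deployed computational module.
    async invoke(...args: any[]): Promise<any> {
        return this.fn(...args);
    }
}

const fn = new Function((a: number, b: number) => { return a + b; });

async function main() {
    const c = await fn.invoke(1, 1);
    console.log("The sum of 1 + 1 is ", c);
}
main();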

Crucially, once computational modules are demarcated, the original code file excluding these modules should also be treated as a computational module. This module serves as the application's entrypoint, akin to the main function in a monolithic app, orchestrating the entire application logic.

Thus, we achieve the development experience of monolithic programming, paired with the capability for cloud-based distributed execution through compile-time splitting. The outcome of this development approach is an application that thrives directly on the cloud infrastructure, embodying a true cloud-native app.
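To make the splitting step more concrete, here is a toy sketch in Python that uses the standard ast module (Python 3.9 or later for ast.unparse). It is not Pluto's implementation, only an illustration of the idea under simple assumptions: every top-level assignment of the form name = Function(<callable>) becomes its own module, and everything else becomes the entrypoint module. The names SOURCE and split_source are hypothetical.

```python
import ast

SOURCE = """
add = Function(lambda a, b: a + b)
square = Function(lambda x: x * x)
print(add.invoke(1, 1), square.invoke(3))
"""


def split_source(source: str, marker: str = "Function"):
    """Split top-level `name = marker(<callable>)` assignments from the rest.

    Returns (modules, entrypoint): modules maps each name to the source of
    the wrapped callable, and entrypoint is the remaining code.
    """
    modules = {}
    remainder = []
    for node in ast.parse(source).body:
        is_marked = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == marker
            and len(node.value.args) == 1
        )
        if is_marked:
            modules[node.targets[0].id] = ast.unparse(node.value.args[0])
        else:
            remainder.append(ast.unparse(node))
    return modules, "\n".join(remainder)


modules, entrypoint = split_source(SOURCE)
print(modules)
print(entrypoint)
```

A real compiler would also have to track the variables and imports each extracted callable depends on, which is one of the nuances mentioned above.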

Example

Take an example program based on this concept. The program applies the Monte Carlo method to compute Pi. The logic is straightforward: spawn 10 workers, each drawing a million samples, then average their estimates.

const calculatePi = new Function((iterations: number): number => {
  let insideCircle = 0;

  for (let i = 0; i < iterations; i++) {
    const x = Math.random();
    const y = Math.random();
    if (x * x + y * y <= 1) {
      insideCircle++;
    }
  }

  const piEstimate = (insideCircle / iterations) * 4;
  return piEstimate;
});

async function main() {
  const workerCount = 10;
  const iterationsPerWorker = 1000000;

  let piPromises: Promise<number>[] = [];
  for (let i = 0; i < workerCount; i++) {
    piPromises.push(calculatePi.invoke(iterationsPerWorker));
  }

  const piResults = await Promise.all(piPromises);

  const piSum = piResults.reduce((sum, current) => sum + current, 0);
  const pi = piSum / workerCount;
  console.log(`Estimated value of π: ${pi}`);
}

main();

The code is expected to execute as follows:

During the compilation stage, two computational modules are extracted: one for calculatePi, and another for the main code, excluding calculatePi.
These modules are subsequently deployed as separate FaaS resource instances.
Upon deployment, invoking the instance corresponding to the main code yields the output results from the logs.
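The three steps above can be imitated locally. The sketch below is a minimal simulation, not Pluto's implementation: `FakeFaaS` is a hypothetical in-memory stand-in for a FaaS platform, and it assumes that arguments and results cross module boundaries as JSON, as they typically would over a network call. The sample count is reduced to keep the run short.

```typescript
// Hypothetical in-memory stand-in for a FaaS platform; these names do not come from Pluto.
type Handler = (payload: string) => Promise<string>;

class FakeFaaS {
  private readonly instances = new Map<string, Handler>();

  // "Deploy": register a module under a name. Inputs and outputs are JSON strings.
  deploy<A extends unknown[], R>(name: string, fn: (...args: A) => R | Promise<R>): void {
    this.instances.set(name, async (payload) => {
      const args = JSON.parse(payload) as A;
      return JSON.stringify(await fn(...args));
    });
  }

  // "Invoke": look up the instance and call it with serialized arguments.
  async invoke<R>(name: string, ...args: unknown[]): Promise<R> {
    const handler = this.instances.get(name);
    if (!handler) throw new Error(`No instance named ${name}`);
    return JSON.parse(await handler(JSON.stringify(args))) as R;
  }
}

const cloud = new FakeFaaS();

// Module 1: the closure extracted from `new Function(...)`.
cloud.deploy("calculatePi", (iterations: number): number => {
  let insideCircle = 0;
  for (let i = 0; i < iterations; i++) {
    const x = Math.random();
    const y = Math.random();
    if (x * x + y * y <= 1) insideCircle++;
  }
  return (insideCircle / iterations) * 4;
});

// Module 2: the remaining main code, which now reaches module 1 by name.
cloud.deploy("main", async (): Promise<number> => {
  const workerCount = 10;
  const estimates = await Promise.all(
    Array.from({ length: workerCount }, () => cloud.invoke<number>("calculatePi", 100000)),
  );
  return estimates.reduce((sum, current) => sum + current, 0) / workerCount;
});

// Step 3: invoking the main instance runs the whole application.
const piPromise = cloud.invoke<number>("main");
piPromise.then((pi) => console.log(`Estimated value of π: ${pi}`));
```

Routing every call through a serialized payload is a deliberate choice in this sketch: it makes visible that, once split, the modules no longer share memory and can only exchange data that survives serialization.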

For more elaborate examples, consider the following:

Pluto is committed to further exploring this concept throughout 2024, with a focus on leveraging static analysis and Infrastructure as Code (IaC) to facilitate implementation. If you find this concept intriguing or have relevant use cases, we encourage you to connect with us.

If you support this idea, please star the project.


Building Cloud-Native Applications Made Easy with Pluto: A Guide for Developers

jianzs@pluto-lang — Wed, 22 Nov 2023 15:13:13 +0000

Developers define variables in their code, and Pluto takes care of automatically creating and managing the required cloud resource components based on those variables. This simplifies the process of deploying and managing cloud infrastructure, enabling developers to make better use of the cloud.

In this context, cloud resources do not refer to Infrastructure as a Service (IaaS), but rather to managed resource components such as Backend as a Service (BaaS) and Function as a Service (FaaS). These managed components generally provide better reliability and cost-effectiveness than building and managing your own instances.

In this article, we will guide you through the steps of getting started with Pluto and help you become familiar with its features.

Installation

Prerequisites

Node.js: Pluto supports writing cloud applications using TypeScript.
Pulumi: Pluto uses Pulumi to interact with cloud platforms (such as AWS or Kubernetes) and deploy cloud resources.

Pluto CLI

The Pluto command-line tool is distributed via npm. Install it by running the following command:

npm install -g @plutolang/cli

Verify your installation:

pluto --version

Hello, Pluto

Now, let's get started with your first Pluto program.

Create your project

Create a Pluto project using the Pluto CLI by running:

pluto new

This command will interactively create a project and create a directory with the provided project name. Here's an example:

$ pluto new
? Project name hello-pluto
? Stack name dev
? Select a platform AWS
? Select an IaC engine Pulumi
Info:  Created a project, hello-pluto

After you have created the new project, go to the project root directory and install the dependencies.

cd <project_root>
npm install

Write your business logic

Use your preferred code editor to write the following code in <project_root>/src/index.ts:

import { Router, Queue, KVStore, CloudEvent, HttpRequest, HttpResponse } from "@plutolang/pluto";

const router = new Router("router");
const queue = new Queue("queue");
const kvstore = new KVStore("kvstore");

// Publish the access time to the queue, and respond with the last access time.
router.get("/access", async (req: HttpRequest): Promise<HttpResponse> => {
  const name = req.query["name"] ?? "Anonym";
  await queue.push(JSON.stringify({ name, accessAt: `${Date.now()}` }));
  const lastAccess = await kvstore.get(name).catch(() => undefined);
  const respMsg = lastAccess
    ? `Hello, ${name}! The last access was at ${lastAccess}`
    : `Hello, ${name}!`;
  return { statusCode: 200, body: respMsg };
});

// Subscribe to messages in the queue and store them in the KV database.
queue.subscribe(async (evt: CloudEvent): Promise<void> => {
  const data = JSON.parse(evt.data);
  await kvstore.set(data["name"], data["accessAt"]);
  return;
});

This code includes 3 resource variables and 2 processes:

An HTTP service called "router" that accepts /access HTTP requests. It publishes the access time to the message queue "queue", retrieves the last access time from the KV database "kvstore", and returns that time in the response.
A message queue named "queue" with a subscriber that saves messages from the queue to the KV database "kvstore".
A KV database named "kvstore" used to store users' last access time.
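To see how the two processes interact, the data flow can be traced locally. The sketch below is an illustration only: `LocalQueue` and `LocalKVStore` are hypothetical in-memory stand-ins, not the `@plutolang/pluto` classes, and `handleAccess` mirrors the /access handler without any HTTP layer. Because delivery here is immediate, the sketch reads the store before publishing so that the reply reflects the previous visit.

```typescript
// In-memory stand-ins for illustration only; these are NOT the @plutolang/pluto classes.
class LocalQueue {
  private readonly subscribers: Array<(data: string) => Promise<void>> = [];

  subscribe(handler: (data: string) => Promise<void>): void {
    this.subscribers.push(handler);
  }

  // Deliver the message to every subscriber before resolving.
  async push(data: string): Promise<void> {
    await Promise.all(this.subscribers.map((handler) => handler(data)));
  }
}

class LocalKVStore {
  private readonly entries = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.entries.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }
}

const localQueue = new LocalQueue();
const localStore = new LocalKVStore();

// Process 2: the subscriber persists each access record in the KV store.
localQueue.subscribe(async (data) => {
  const record = JSON.parse(data) as { name: string; accessAt: string };
  await localStore.set(record.name, record.accessAt);
});

// Process 1: the /access handler, reduced to its business logic.
async function handleAccess(name: string, now: number): Promise<string> {
  const lastAccess = await localStore.get(name);
  await localQueue.push(JSON.stringify({ name, accessAt: `${now}` }));
  return lastAccess
    ? `Hello, ${name}! The last access was at ${lastAccess}`
    : `Hello, ${name}!`;
}

async function demo(): Promise<string[]> {
  const first = await handleAccess("pluto", 1000);
  const second = await handleAccess("pluto", 2000);
  return [first, second];
}

const demoResult = demo();
demoResult.then((replies) => replies.forEach((reply) => console.log(reply)));
// prints "Hello, pluto!" and then "Hello, pluto! The last access was at 1000"
```

The first request finds nothing in the store, while the second one sees the timestamp written by the subscriber after the first request.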

Deploy your application

To deploy your application to the cloud platform you configured initially, run the following command:

pluto deploy

If you specified AWS as the cloud platform, make sure the AWS_REGION environment variable is correctly configured, for example:

export AWS_REGION=us-east-1

Pluto will create 3 resource components and 2 function objects on the specified cloud platform. For example, if you chose AWS, it will create:

An API Gateway named "router"
An SNS topic named "queue"
A DynamoDB table named "kvstore"
Two Lambda functions starting with "function"

Multi-platform deployment

If you want to deploy to another cloud platform, you can create a new stack and specify the stack during deployment. Here are the steps:

Create a new stack:

pluto stack new

Specify the stack during deployment:

pluto deploy --stack <new_stack>

More Resources

Example: Command-line chatbot based on OpenAI
Example: Share a daily computer joke on Slack
Repository: Pluto | GitHub

Pluto's main approach is to combine techniques such as static program analysis and infrastructure as code, so that defining a variable is enough to have the corresponding cloud resource component created automatically on the cloud platform. The goal of Pluto is to help individual developers build cloud-native applications with ease and to flatten the learning curve associated with cloud capabilities.

Please note that Pluto is still in its early stages, and we welcome contributions from interested developers. If you are using AWS or Kubernetes, you can share your requirements with us. We also appreciate any ideas or suggestions you may have, as they can be implemented in future versions. Feel free to join our Slack community.