<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ian Macartney</title>
    <description>The latest articles on Forem by Ian Macartney (@ianmacartney).</description>
    <link>https://forem.com/ianmacartney</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F790051%2Ffe4a3445-f7cd-4dbb-91e4-7e4b5d7a2e74.jpeg</url>
      <title>Forem: Ian Macartney</title>
      <link>https://forem.com/ianmacartney</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ianmacartney"/>
    <language>en</language>
    <item>
      <title>Work Stealing: Load-balancing for compute-heavy tasks</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Wed, 24 Jul 2024 00:17:26 +0000</pubDate>
      <link>https://forem.com/ianmacartney/work-stealing-load-balancing-for-compute-heavy-tasks-1g77</link>
      <guid>https://forem.com/ianmacartney/work-stealing-load-balancing-for-compute-heavy-tasks-1g77</guid>
      <description>&lt;p&gt;For fast, light-weight workloads, you can often get away with a small number of powerful machines. Even when load isn’t distributed evenly, a single backlogged machine won’t noticeably impact a user’s experience.&lt;/p&gt;

&lt;p&gt;However, when your app requires heavy operations, such as running requests on an LLM, transcoding a video, or intensive cryptography, you need a better strategy for handling concurrency. &lt;/p&gt;

&lt;p&gt;Requests that monopolize many CPU or GPU cores require more machines, as each machine is able to handle less parallelism. When you factor in slow requests, a single backlogged machine can cause significant delays and p95 performance degredation, even if the overall system has extra bandwidth.&lt;/p&gt;

&lt;p&gt;So how can you distribute the load across many workers? &lt;/p&gt;

&lt;h3&gt;
  
  
  tl;dr
&lt;/h3&gt;

&lt;p&gt;In this post I’ll explain the “work stealing”&lt;sup id="fnref1"&gt;1&lt;/sup&gt; strategy for task distribution and why you should consider it for workloads that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take significant time to execute.&lt;/li&gt;
&lt;li&gt;Do not share resources well, such as GPU-intensive computation.&lt;/li&gt;
&lt;li&gt;Prioritize throughput and utilization over average-case latency.&lt;/li&gt;
&lt;li&gt;Run locally, behind a NAT, or are otherwise not discoverable from a web server.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;We will look at two strategies for managing resource-intensive workloads:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Push-based&lt;/strong&gt; routing: a load balancer decides where to send requests and waits for a response from the worker, which it then returns to the client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull-based&lt;/strong&gt; “work stealing”: an incoming request is put on one or more queues from which workers pull. They publish results which can be included in the response to the original request, or pulled from the client via a subscription, allowing the original response to return early. Multiple clients can subscribe to the result.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One way to think about this is ordering food at a restaurant. &lt;/p&gt;

&lt;p&gt;A push-based approach would assign you to a chef when you walked in the door, as a load balancer forwards a request. You’d wait for all other parties assigned to the chef to be served, hoping there aren’t many time-intensive dishes ahead of you, and wondering if anyone else was lucky enough to be assigned to an idle chef. If you left the restaurant for any reason, you’d be re-assigned when you came back in, losing your place in line.&lt;/p&gt;

&lt;p&gt;A pull-based approach is more similar to getting an order number. All guests have their orders taken when they walk in, and chefs work on the next order as they become available. You can walk around with your order number, check in on its status, or even cancel your order if it hasn’t been started. It’s more efficient for the chefs, but it requires writing things down and having a way to notify you when your food is ready, since you might not be standing next to the chef waiting.&lt;/p&gt;

&lt;p&gt;As a concrete code example, I recently put out a demo of distributed LLM computing: &lt;a href="https://labs.convex.dev/llama-farm" rel="noopener noreferrer"&gt;llama farm&lt;/a&gt;, where requests to the website that require llama3 are farmed out to workers. I can run these workers from the command line on a spare laptop, in containers hosted on &lt;a href="https://fly.io" rel="noopener noreferrer"&gt;fly.io&lt;/a&gt;, or even from browsers using &lt;a href="https://github.com/mlc-ai/web-llm" rel="noopener noreferrer"&gt;web-llm&lt;/a&gt;. The repo is an implementation of “work stealing,” which enables these llama workers to pull and process jobs at their discretion without exposing a port to http requests or requiring service discovery. Read &lt;a href="https://stack.convex.dev/implementing-work-stealing" rel="noopener noreferrer"&gt;this post&lt;/a&gt; about the implementation, or check out the &lt;a href="https://github.com/get-convex/llama-farm-chat" rel="noopener noreferrer"&gt;code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To learn more about work stealing and how it compares to more traditional load balancing, read on. For the sake of this article, I’ll use the example of processing LLM requests, but the techniques naturally extend to any high-latency or hardware-intensive workload.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need this?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This decision assumes you are controlling your own infrastructure. If you are using a cloud service, such as using &lt;a href="https://replicate.com/" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt; to serve and scale your models, you don’t have to worry about this - you are paying them to make these decisions and scale transparently. Most of the time this is the right way to start.&lt;/p&gt;

&lt;p&gt;Some reasons you might benefit from scaling your own infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Controlling data&lt;/strong&gt;: If you are unwilling to send data to a third party LLM, you can run your own machines and know how your data is being handled and used. This is especially important if you have data governance requirements preventing it from leaving a private network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controlling costs&lt;/strong&gt;: Cloud providers allow you to scale more granularly at a higher per-request cost. By deciding when and how many machines you run, you control your scale.

&lt;ul&gt;
&lt;li&gt;Note: I say controlling rather than reducing because, until you utilize your machines well, this is unlikely to save you money. In the case of &lt;a href="https://labs.convex.dev/llama-farm" rel="noopener noreferrer"&gt;llama farm&lt;/a&gt;, however, we avoid paying for dedicated GPUs altogether by leveraging existing idle hardware.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Controlling latency&lt;/strong&gt;: By controlling the routing and prioritization of requests, you can ensure tighter bounds on latency than you may get from a cloud provider, which is likely sharing resources with other customers and may not expose a mechanism for you to prioritize or cancel requests. Note: you’ll need to decide how to absorb spikes of traffic. Options include:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Over-provisioning&lt;/strong&gt; (or auto-scaling) your hardware to accept additional load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shedding load&lt;/strong&gt; by rejecting requests (often with a 429 status code) and relying on clients to retry later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accepting high latency&lt;/strong&gt; during these periods, ideally isolated to low-priority traffic.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Push-based routing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe760zi2wimy1tyl2chij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe760zi2wimy1tyl2chij.png" alt="diagram of client request to api server to worker, then back after the task" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, the web works via pushing, or sending, requests. A request (usually HTTP) gets routed to a machine based on its IP address. For compute-intensive tasks, a client typically hits an API endpoint, which doesn’t do the CPU-intensive operation itself but rather makes its own request to a pool of dedicated workers. Forwarding the work to other machines isolates the API server’s resources so it is available to serve other requests, while also allowing you to scale the workers on use-case-specific hardware, such as machines with GPUs, separately from the web servers. The API server returns the (potentially streamed) results to the client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless hosting&lt;/strong&gt;. On platforms where you only pay for the duration of a request, you can avoid running the machine between requests. For a worker to pull requests, it needs to be running continuously or auto-scaled by a monitoring service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard&lt;/strong&gt;. It is easier to reason about latency, errors, and work attribution for a traditional request. By comparison, when a worker pulls work and publishes results, it is no longer within the call graph of the original request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless&lt;/strong&gt;. When you hold open the client request and return the result directly from a worker, you don’t have to persist any state if you don’t want to.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenges:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt; needs to keep track of workers.

&lt;ul&gt;
&lt;li&gt;You have to guess which backend to send work to, or poll every worker for their state.&lt;/li&gt;
&lt;li&gt;When a backend starts or stops, something needs to update, whether it’s &lt;a href="https://www.consul.io/" rel="noopener noreferrer"&gt;Consul&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/" rel="noopener noreferrer"&gt;kube-proxy&lt;/a&gt;, &lt;a href="https://aws.amazon.com/elasticloadbalancing/" rel="noopener noreferrer"&gt;ELB&lt;/a&gt;, or otherwise. To stop a worker without incurring failures, you need to prevent the load balancer from sending new requests and then finishing existing ones.&lt;/li&gt;
&lt;li&gt;These updates can fail or take some time.&lt;/li&gt;
&lt;li&gt;All workers need to be discoverable and exposed to inbound http traffic. To run a worker on your local machine, you could use a service like &lt;a href="https://tunnelmole.com/" rel="noopener noreferrer"&gt;Tunnelmole&lt;/a&gt; or &lt;a href="https://ngrok.com/" rel="noopener noreferrer"&gt;ngrok&lt;/a&gt; to proxy traffic, which exposes you to public internet traffic.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Isolated queues&lt;/strong&gt;: if a worker has too many requests, it can only queue or reject.

&lt;ul&gt;
&lt;li&gt;Requests might not be started in the order they were received, and higher priority requests may be queued behind lower priority ones.

&lt;ul&gt;
&lt;li&gt;Often the queueing happens in the TCP socket connection, which can’t distinguish application-layer details, such as request priority or expected duration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Per-machine queues &lt;a href="https://cacm.acm.org/research/the-tail-at-scale/#body-4" rel="noopener noreferrer"&gt;can cause high tail latency in distributed systems&lt;/a&gt;.

&lt;ul&gt;
&lt;li&gt;Some workers might be idle while others have a backlog of slow requests.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;HTTP connection lifecycle&lt;/strong&gt;: the API request needs to hold open both incoming and outgoing connections for the duration of the work.

&lt;ul&gt;
&lt;li&gt;If the client loses the connection, the operation can’t easily resume. Even with sticky connections, an API server could come up on a new machine during a deploy.&lt;/li&gt;
&lt;li&gt;This can results in low CPU utilization on the API server. If this is a serverless function, you may be paying for this idle time.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pull-based work stealing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmurfutg37dxwv4cou3wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmurfutg37dxwv4cou3wn.png" alt="diagram of a user adding a task to a queue, it being run asynchronously, and results flowing to the user via a subscription" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compared to push-based, in a pull-based system, workers take on or ”steal” work when they have capacity, and then publish the result. To see an implementation of work stealing, check out &lt;a href="https://stack.convex.dev/implementing-work-stealing" rel="noopener noreferrer"&gt;this post&lt;/a&gt; and this GitHub repo where I implement it for LLM-powered group chat:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://stack.convex.dev/implementing-work-stealing" rel="noopener noreferrer"&gt;Implementing work stealing with a reactive database&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/get-convex/llama-farm-chat" rel="noopener noreferrer"&gt;llama-farm-chat GitHubRepo&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Benefits
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimizing throughput&lt;/strong&gt; with consistent concurrency.

&lt;ul&gt;
&lt;li&gt;In non-user-facing workloads you want to utilize machines as efficiently as possible, which is especially common in AI applications where you want to crawl large amounts of data to generate embeddings. Instead of controlling how fast you push work and to which machines, having a large work queue consumed by workers that you can dynamically spin up allows for optimal utilization.&lt;/li&gt;
&lt;li&gt;For user-facing workloads, during spikes in load the API server knows how much work is in flight and can decide whether to reject or enqueue the work, as well as whether to re-order or cancel existing jobs based on priority.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;No load-balancing&lt;/strong&gt; or service discovery.

&lt;ul&gt;
&lt;li&gt;Workers can come and go without updating anything - they simply start requesting work.&lt;/li&gt;
&lt;li&gt;Workers only make outbound requests: they can safely run behind a NAT.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;No isolated queues&lt;/strong&gt;: workers don’t accumulate their own backlog.

&lt;ul&gt;
&lt;li&gt;By sharing a global queue, performance (latency) is more uniform and can be FIFO or globally prioritized.&lt;/li&gt;
&lt;li&gt;Workers decide when to take on work, and how much.&lt;/li&gt;
&lt;li&gt;To stop, they finish their requests and don’t request more.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Multiplexed subscriptions&lt;/strong&gt;: clients can start jobs, disconnect, and re-subscribe to results. In fact, many clients can be subscribed to the result, since it is persisted outside of the scope of an active http request.&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Challenges
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless hosting ecosystem&lt;/strong&gt;: workers need to be subscribed and are harder to dynamically wake up, compared to serverless hosting models that &lt;a href="https://fly.io/docs/apps/autostart-stop/" rel="noopener noreferrer"&gt;wake on incoming HTTP requests&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failures are harder to detect&lt;/strong&gt;: the worker needs to periodically let the server know it’s still working. With “push” HTTP requests, failure can be detected automatically by the connection closing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additional overhead&lt;/strong&gt;: every request is persisted and flows through a subscription mechanism such as a pub/sub service, or database queries in Convex. 

&lt;ul&gt;
&lt;li&gt;If the request is otherwise fast, the additional latency might be noticeable. It will affect the "average case" or "p50" performance.&lt;/li&gt;
&lt;li&gt;If the requests are frequent and don’t otherwise require much database bandwidth, the overhead of tracking requests might be noticeable.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making the call: my experience
&lt;/h2&gt;

&lt;p&gt;While pull-based solutions have a lot of benefits, this decision is highly sensitive to your application’s needs. I’ll contextualize this with my own experience deciding between push- and pull-based solutions for task distribution.&lt;/p&gt;

&lt;p&gt;I used to run the team at Dropbox responsible for generating previews of user documents. If you’ve ever used the Dropbox website and looked through images, watched a video, or looked at a pdf preview of a Microsoft Office document, that file was processed by the system my team built and maintained. We thought deeply about load balancing, caching, and reliability. One fun statistic: if you removed the cache and processed the full file for every user’s request, it would amount to processing over one exabyte of data per day. &lt;/p&gt;

&lt;p&gt;When we were re-architecting it, we considered both push &amp;amp; pull.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why we wanted a pull-based solution:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System utilization and maximizing throughput&lt;/strong&gt;. We had different classes of services optimized for different operations - video transcoding, windows emulation for office documents, etc. These machines knew their capabilities, and we were excited about a workflow where a machine could take on different types and quantities of work based on its available memory and CPU utilization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Absorbing and shedding load.&lt;/strong&gt; There were occasionally spikes of load that meant either failing requests, or saturating each service’s http queue and driving up latency for all users. With a queue we could have had more control over which requests we dropped and which requests we could continue to prioritize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoids service discovery.&lt;/strong&gt; Keeping track of which machines to route to introduces many opportunities for failure, especially during deployments:

&lt;ol&gt;
&lt;li&gt;When a backend dies, how soon will service discovery adapt?&lt;/li&gt;
&lt;li&gt;When a machine comes online, how soon will it be discovered?&lt;/li&gt;
&lt;li&gt;When service discovery fails (it does), how well can you keep serving traffic?&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why we ended up with a push-based solution:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Our database wasn’t reactive:&lt;/strong&gt; determining from the backend when a job was available, or detecting when it was finished from the API server would have involved additional infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictability:&lt;/strong&gt; our infrastructure was based around discrete HTTP requests that flowed into gRPC services, not WebSockets and subscriptions. This influenced what tooling was readily available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High volume, low latency&lt;/strong&gt;: although occasional requests took minutes, most of our traffic was very high volume and low latency (~5ms). Incurring a database write per request would have overwhelmed the database, and a pub/sub subscription was too heavyweight. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uniformity:&lt;/strong&gt; there was a case for doing pull-based requests only for heavy operations like video transcoding, but we were a small team and didn’t want to maintain two sets of infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; we wanted to track latency and success statistics in a centralized place close to the request. Splitting the status across multiple metrics reported from different services and machines would have complicated our monitoring setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don’t regret making that decision at the time. However, I do think some things have changed since then that enable the work stealing pattern for modern apps, such as &lt;strong&gt;reactive databases&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why reactive databases change the game
&lt;/h2&gt;

&lt;p&gt;One big challenge with workers pulling work is connecting the result back to the client. If you want to return it in the client’s original request, the worker needs to know which API server to send the work to, and get the result to the right thread or process waiting for the request. Or it needs to leverage a pub/sub system where the worker is subscribed only for its own results. This “return address” problem ends up requiring a lot of nuance to get right at scale. For example, how long should the pub/sub system wait before dropping the message?&lt;/p&gt;

&lt;p&gt;Since using Convex, I’ve come to appreciate separating data flow between queries, which are read-only, side-effect-free, consistent views of data, and mutations which are read-write transactions. This is an increasingly common separation, and greatly simplifies how you reason about data moving through a system, including for work stealing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A client subscribes to a view of data with a query. In the case of a chat app, it subscribes to recent messages in a channel. Whenever those messages are updated, regardless of who updated them, it gets a fresh view of the data.&lt;/li&gt;
&lt;li&gt;The work can be submitted to the API server (Convex in my case) whether it’s within the context of a client request or not. In the case of my chat app, the original request creates a placeholder message, and submits a job to fill out the message. When the worker generates and submits the message — whether it’s a partial update or the final result — all the API endpoint needs to know is what record(s) to update in the database.&lt;/li&gt;
&lt;li&gt;The communication channel — how users end up seeing the message — is the same as the application’s transactional data persistence: the reactive database. There’s never a case where a client receives a result but the result was never persisted, or where the result was recorded but the client missed the update.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I was pleasantly surprised how quickly this pattern came together for &lt;a href="https://github.com/get-convex/llama-farm-chat" rel="noopener noreferrer"&gt;llama farm&lt;/a&gt;, and am excited to see what novel architectures this enables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post we compared push-based load balancing with pull-based work stealing, as ways of distributing resource-intensive workloads. While the former is the traditional strategy, the latter brings a lot of benefits, provided you are able to separate your reads and writes.&lt;/p&gt;

&lt;p&gt;Next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To learn more about optimizing for latency, I recommend reading the paper “&lt;a href="https://cacm.acm.org/research/the-tail-at-scale/" rel="noopener noreferrer"&gt;The Tail at Scale&lt;/a&gt;” if you haven’t read it already.&lt;/li&gt;
&lt;li&gt;To see an implementation of work stealing, read &lt;a href="https://stack.convex.dev/implementing-work-stealing" rel="noopener noreferrer"&gt;this post&lt;/a&gt; about implementing llama farm, or read the code &lt;a href="https://github.com/get-convex/llama-farm-chat" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Learn more about how Convex works &lt;a href="https://stack.convex.dev/how-convex-works" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;If you’re curious where the term “work stealing” comes from, &lt;a href="https://en.wikipedia.org/wiki/Work_stealing" rel="noopener noreferrer"&gt;it is a scheduling algorithm for parallel computing&lt;/a&gt;, specifically when there is one queue of tasks per process where another process can “steal” tasks (threads to execute) while idle. In our example, we simplify it to have one queue for the sake of explanation. At high scale, this technique at scale involves dividing requests into multiple queues, and having each worker interact primarily with a one or a subset of them, and stealing work from other queues when idle. Let me know &lt;a href="https://convex.dev/community" rel="noopener noreferrer"&gt;in Discord&lt;/a&gt; if you'd be interested in an article on doing this at scale. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>backend</category>
      <category>convex</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Supercharge `npm run dev` with package.json scripts</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Wed, 24 Jul 2024 00:07:51 +0000</pubDate>
      <link>https://forem.com/ianmacartney/supercharge-npm-run-dev-with-packagejson-scripts-4l63</link>
      <guid>https://forem.com/ianmacartney/supercharge-npm-run-dev-with-packagejson-scripts-4l63</guid>
      <description>&lt;p&gt;&lt;code&gt;npm run dev&lt;/code&gt; is the standard for "run my website locally," but how does it work? How can we expand its functionality? In this post we'll look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to configure what &lt;code&gt;npm run dev&lt;/code&gt; does.&lt;/li&gt;
&lt;li&gt;How to decompose complex commands into granular units.&lt;/li&gt;
&lt;li&gt;How to run multiple commands in parallel.&lt;/li&gt;
&lt;li&gt;How to run pre-requisites without losing normal &lt;code&gt;Ctrl-C&lt;/code&gt; behavior.&lt;/li&gt;
&lt;li&gt;How to add seed data (if none exists) when starting up a Convex backend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a motivating example, here are some scripts defined in the &lt;a href="https://github.com/get-convex/convex-helpers" rel="noopener noreferrer"&gt;convex-helpers&lt;/a&gt; example app. We'll cover what each piece does&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm-run-all --parallel dev:backend dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsc &amp;amp;&amp;amp; vite build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"predev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev --until-success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vitest"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How and where they're defined
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;npm run&lt;/code&gt; runs commands that are defined in your &lt;code&gt;package.json&lt;/code&gt; in your project's workspace. These commands are often pre-configured when you start your repo from a command like &lt;code&gt;npm create vite@latest&lt;/code&gt; with commands for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dev&lt;/code&gt;: Run a development environment. This often includes autor-reloading the UI when files change. For &lt;a href="https://vitejs.dev/" rel="noopener noreferrer"&gt;Vite&lt;/a&gt; this is &lt;code&gt;vite&lt;/code&gt; and &lt;a href="https://nextjs.org/" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt; is &lt;code&gt;next dev&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;build&lt;/code&gt;: Build the website for deployment. This will generally compile and bundle all your html, css, and javascript. For Vite this is &lt;code&gt;vite build&lt;/code&gt; and Next.js is &lt;code&gt;next build&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test&lt;/code&gt;: Run tests - if you're using &lt;a href="https://jestjs.io" rel="noopener noreferrer"&gt;Jest&lt;/a&gt;, it's just &lt;code&gt;"test": "jest"&lt;/code&gt; or &lt;code&gt;vitest&lt;/code&gt; for &lt;a href="https://vitest.dev/" rel="noopener noreferrer"&gt;Vitest&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a basic example from Next.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// in package.json&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;scripts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dev&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next dev&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next build&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lint&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next lint&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you can run &lt;code&gt;npm run dev&lt;/code&gt; or &lt;code&gt;npm run lint&lt;/code&gt; etc.&lt;/p&gt;

&lt;p&gt;You can learn more about &lt;code&gt;npm run&lt;/code&gt; in &lt;a href="https://docs.npmjs.com/cli/v10/commands/npm-run-script" rel="noopener noreferrer"&gt;the docs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use scripts?
&lt;/h2&gt;

&lt;p&gt;It's a fair question why one would put commands that already so simple into package scripts. Why not just call &lt;code&gt;jest&lt;/code&gt; or &lt;code&gt;vite&lt;/code&gt; or &lt;code&gt;next build&lt;/code&gt;? There's a few good reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can save the default parameters for commands so you don't have to remember or document the "standard" way of starting something. We'll see below how you can configure it to chain commands and run others in parallel.&lt;/li&gt;
&lt;li&gt;It allows you to easily run commands that are installed by &lt;code&gt;npm&lt;/code&gt; but not globally accessible from your shell (terminal).&lt;sup id="fnref1"&gt;1&lt;/sup&gt; When you install things like &lt;code&gt;npm install -D vitest&lt;/code&gt;, it installs &lt;code&gt;vitest&lt;/code&gt; into &lt;code&gt;node_modules/.bin&lt;/code&gt;.&lt;sup id="fnref2"&gt;2&lt;/sup&gt; You can't run &lt;code&gt;vitest&lt;/code&gt; directly in your shell,&lt;sup id="fnref3"&gt;3&lt;/sup&gt; but you can have a config like: &lt;code&gt;"scripts": { "test": "vitest" }&lt;/code&gt; and &lt;code&gt;npm run test&lt;/code&gt; will run &lt;code&gt;vitest&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It always runs with the root of the package folder as the "current directory" even if you're in a subdirectory. So you can define a script like &lt;code&gt;"foo": "./myscript.sh"&lt;/code&gt; and it will always look for &lt;code&gt;myscript.sh&lt;/code&gt; in the package root (in the same directory as package.json). Note: you can access the current directory where it was called via the &lt;code&gt;INIT_CWD&lt;/code&gt; environment variable.&lt;/li&gt;
&lt;li&gt;You can reference variables in the &lt;code&gt;package.json&lt;/code&gt; easily when the script is run from &lt;code&gt;npm run&lt;/code&gt;. For instance, you can access the "version" of your package with the &lt;code&gt;npm_package_version&lt;/code&gt; environment variable, like &lt;code&gt;process.env.npm_package_version&lt;/code&gt; in js or &lt;code&gt;$npm_package_version&lt;/code&gt; in a script.&lt;/li&gt;
&lt;li&gt;If you have multiple workspaces (many directories with their own package.json configured into a parent package.json with a "workspaces" config), you can run the same command in all workspaces with &lt;code&gt;npm test --workspaces&lt;/code&gt; or one with &lt;code&gt;npm run lint --workspace apps/web&lt;/code&gt;. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Does &lt;code&gt;npm run dev&lt;/code&gt; work with yarn / pnpm / bun?
&lt;/h2&gt;

&lt;p&gt;Yes! Even if you install your dependencies with another package manager, you can still run your package scripts with npm.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;yarn &lt;span class="c"&gt;# similar to `npm install`&lt;/span&gt;
npm run dev &lt;span class="c"&gt;# still works!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to remember that &lt;code&gt;npm run dev&lt;/code&gt; maps to &lt;code&gt;yarn dev&lt;/code&gt; (or &lt;code&gt;yarn run dev&lt;/code&gt;). The same goes for &lt;code&gt;npx&lt;/code&gt;: &lt;code&gt;npx convex dev&lt;/code&gt; works regardless of what package manager you used to install things.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running commands in parallel
&lt;/h2&gt;

&lt;p&gt;There are a couple packages you can use to run commands concurrently:&lt;sup id="fnref4"&gt;4&lt;/sup&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/mysticatea/npm-run-all" rel="noopener noreferrer"&gt;&lt;code&gt;npm-run-all&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://npmjs.org/package/concurrently" rel="noopener noreferrer"&gt;&lt;code&gt;concurrently&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We'll just look at &lt;code&gt;npm-run-all&lt;/code&gt; here. Consider our example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm-run-all --parallel dev:backend dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This defines three scripts. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;npm run dev:backend&lt;/code&gt; runs &lt;code&gt;convex dev&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm run dev:frontend&lt;/code&gt; runs &lt;code&gt;vite&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm run dev&lt;/code&gt; runs both &lt;code&gt;convex dev&lt;/code&gt; and &lt;code&gt;vite&lt;/code&gt; in parallel via &lt;code&gt;npm-run-all&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both outputs are streamed out, and doing Ctrl-C will interrupt both scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  predev? postbuild?
&lt;/h2&gt;

&lt;p&gt;You can specify commands to run before (pre) or after (post) another command (say, X) by naming your command &lt;code&gt;preX&lt;/code&gt; or &lt;code&gt;postX&lt;/code&gt;. In the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm-run-all --parallel dev:backend dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"predev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev --until-success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will run &lt;code&gt;convex dev --until-success&lt;/code&gt;, before the "dev" command of &lt;code&gt;npm-run-all --parallel dev:backend dev:frontend&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chaining with "&amp;amp;&amp;amp;"
&lt;/h3&gt;

&lt;p&gt;For those used to shell scripting, you can run two commands in sequence if the previous one succeeds with &lt;code&gt;commandA &amp;amp;&amp;amp; commandB&lt;/code&gt;. This works on both Windows and Unix (mac / linux). &lt;/p&gt;

&lt;p&gt;However, there's a couple advantages to just using &lt;code&gt;pre&lt;/code&gt;-scripts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can run either command with &lt;code&gt;npm run dev --ignore-scripts&lt;/code&gt; to not do the "predev" script, or &lt;code&gt;npm run predev&lt;/code&gt; to  explicitly only do the "predev" step.&lt;/li&gt;
&lt;li&gt;The Ctrl-C behavior is more predictable in my experience. In different shell environments, doing Ctrl-C (which sends an interrupt signal to the current process) would sometimes kill the first script but still run the second script. After many attempts we decided to switch to "predev" as the pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Run interactive steps first
&lt;/h2&gt;

&lt;p&gt;For Convex, when you first run &lt;code&gt;npx convex dev&lt;/code&gt; (or &lt;code&gt;npm run dev&lt;/code&gt; with the above scripts), it will ask you to log in if you aren't already, and ask you to set up your project if one isn't already set up. This is great, but interactive commands that update the output text don't work well when the output is being streamed by multiple commands at once. This is the motivation for running &lt;code&gt;npx convex dev --until-success&lt;/code&gt; before &lt;code&gt;npx convex dev&lt;/code&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;convex dev&lt;/code&gt; syncs your functions and schema whenever it doesn't match what you have deployed, watching for file changes. &lt;/li&gt;
&lt;li&gt;The &lt;code&gt;--until-success&lt;/code&gt; flag syncs your functions and schema only until it succeeds once, telling you what to fix if something is wrong and retrying automatically until it succeeds or you Ctrl-C it. &lt;/li&gt;
&lt;li&gt;By running &lt;code&gt;npx convex dev --until-success&lt;/code&gt;, we can go through the login, project configuration, and an initial sync, all before trying to start up the frontend and backend. &lt;/li&gt;
&lt;li&gt;The initial sync is especially helpful if it catches issues like missing environment variables which need to be set before your app can function. &lt;/li&gt;
&lt;li&gt;This way the frontend doesn't start until the backend is ready to handle requests with the version of functions it expects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Seeding data on startup
&lt;/h2&gt;

&lt;p&gt;If you change your "predev" command for Convex to include &lt;code&gt;--run&lt;/code&gt; it will run a server-side function before your frontend has started.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;//...&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"predev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev --until-success --run init"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;//...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--run init&lt;/code&gt; command will run a function that is the default export in &lt;code&gt;convex/init.ts&lt;/code&gt;. You could also have run &lt;code&gt;--run myFolder/myModule:myFunction&lt;/code&gt;. See &lt;a href="https://docs.convex.dev/functions/query-functions#query-names" rel="noopener noreferrer"&gt;docs on naming here&lt;/a&gt;. See &lt;a href="https://stack.convex.dev/seeding-data-for-preview-deployments" rel="noopener noreferrer"&gt;this post&lt;/a&gt; on seeding data but the gist is that you can define an &lt;a href="https://docs.convex.dev/functions/internal-functions" rel="noopener noreferrer"&gt;internalMutation&lt;/a&gt; that checks if the database is empty, and if so inserts a collection of records for testing / setup purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  tsc?
&lt;/h2&gt;

&lt;p&gt;If you use TypeScript, you can run a type check / compile your typescript files with a bare &lt;code&gt;tsc&lt;/code&gt;. If your &lt;code&gt;tsconfig.json&lt;/code&gt; is configured to emit types, it will write out the types. If not, it will just validate the types. This is great to do as part of the build, so you don't build anything that has type errors. This is why the above example did:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tsc &amp;amp;&amp;amp; vite build&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to pass arguments?
&lt;/h2&gt;

&lt;p&gt;If you want to pass arguments to a command, for instance passing arguments to your testing command to specify what test to run, you can pass them &lt;strong&gt;after&lt;/strong&gt; a &lt;code&gt;--&lt;/code&gt; to separate the command from the argument. Technically you don't need &lt;code&gt;--&lt;/code&gt; if your arguments are positional instead of &lt;code&gt;-&lt;/code&gt;-prefixed, but it doesn't hurt to always do it in case you forget which to do it for.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--grep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pattern"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We looked at some ways of using package.json scripts to simplify our workflows. Who knew how much power could rest behind a simple &lt;code&gt;npm run dev&lt;/code&gt;? Looking at our original example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm-run-all --parallel dev:backend dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsc &amp;amp;&amp;amp; vite build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:backend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev:frontend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"predev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex dev --until-success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vitest"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dev&lt;/code&gt; runs the frontend and backend in parallel, after &lt;code&gt;predev&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;build&lt;/code&gt; does type checking via &lt;code&gt;tsc&lt;/code&gt; before building the static site.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dev:backend&lt;/code&gt; continuously deploys the backend functions to your development environment as you edit files.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dev:frontend&lt;/code&gt; runs a local frontend server that auto-reloads as you edit files.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;predev&lt;/code&gt; runs before &lt;code&gt;dev&lt;/code&gt; and does an initial deployment, handling login, configuration, and an initial sync as necessary.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test&lt;/code&gt; uses Vitest to run tests. Note: &lt;code&gt;npm test&lt;/code&gt; is shorthand for &lt;code&gt;npm run test&lt;/code&gt; along with other commands, but they're special cases. &lt;code&gt;npm run test&lt;/code&gt; is the habit I suggest.&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The way your shell finds which command to run when you type &lt;code&gt;npm&lt;/code&gt; is to check the shell's &lt;code&gt;PATH&lt;/code&gt; environment variable (on unix machines anyways). You can see your own with &lt;code&gt;echo "$PATH"&lt;/code&gt;. It checks all the places specified in &lt;code&gt;$PATH&lt;/code&gt; and uses the first one.  ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Technically you can override &amp;amp; specify where npm installs binaries. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;If you really want to, you can run &lt;code&gt;npm exec vitest&lt;/code&gt;, &lt;code&gt;npx vitest&lt;/code&gt; for short, &lt;code&gt;./npm_modules/.bin/vitest&lt;/code&gt; directly, or add &lt;code&gt;.npm_modules/.bin&lt;/code&gt; to your PATH. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;Some people use a bare &lt;code&gt;&amp;amp;&lt;/code&gt; to run one task in the background, but that is not supported on Windows, and interrupting one command won't necessarily kill the other. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>npm</category>
      <category>convex</category>
    </item>
    <item>
      <title>Convert your .json array to a .jsonl (JSON Lines)</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Wed, 24 Jul 2024 00:05:57 +0000</pubDate>
      <link>https://forem.com/ianmacartney/convert-your-json-array-to-a-jsonl-json-lines-gfp</link>
      <guid>https://forem.com/ianmacartney/convert-your-json-array-to-a-jsonl-json-lines-gfp</guid>
      <description>&lt;p&gt;&lt;a href="https://jsonlines.org/" rel="noopener noreferrer"&gt;JSON Lines&lt;/a&gt; is a file format that stores one JSON object per line in a file. This is more scalable than storing it as a JSON array, since most JSON parsers require loading the full stringified array into memory before it can parse it. With JSON Lines, your code can load one line at a time. If you don't hold onto each object (for instance, you do some operation and save it elsewhere), your code can theoretically read an infinitely long file without running out of memory. It also allows you to easily read a stream of data. &lt;/p&gt;

&lt;p&gt;To make these files, each object needs to be on its own line. and the file extension is .json*&lt;em&gt;l&lt;/em&gt;*&lt;/p&gt;

&lt;p&gt;For Convex &lt;a href="https://docs.convex.dev/database/import-export/import" rel="noopener noreferrer"&gt;import&lt;/a&gt;, we allow you to specify either format: a &lt;code&gt;.json&lt;/code&gt; or &lt;code&gt;.jsonl&lt;/code&gt;. However, we limit the &lt;code&gt;.json&lt;/code&gt; size to 8MB. So one question that comes up is &lt;strong&gt;how can you transform your &lt;code&gt;.json&lt;/code&gt; file into &lt;code&gt;.jsonl&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using &lt;code&gt;jq&lt;/code&gt; to convert .json to .jsonl
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;&lt;code&gt;jq&lt;/code&gt;&lt;/a&gt; is a common tool for working with JSON from the commandline.&lt;/p&gt;

&lt;p&gt;Here's the one-liner to convert a &lt;code&gt;.json&lt;/code&gt; file into &lt;code&gt;.jsonl&lt;/code&gt; using &lt;code&gt;jq&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jq &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'.[]'&lt;/span&gt; ./mydata.json &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; mydata.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Input: &lt;code&gt;./mydata.json&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"foo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"easy_as"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output: &lt;code&gt;./mydata.jsonl&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;foo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;easy_as&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Implementing Rate Limiting with only two numbers</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Wed, 24 Jul 2024 00:03:03 +0000</pubDate>
      <link>https://forem.com/ianmacartney/implementing-rate-limiting-with-only-two-numbers-36i8</link>
      <guid>https://forem.com/ianmacartney/implementing-rate-limiting-with-only-two-numbers-36i8</guid>
      <description>&lt;p&gt;There’s a lot of buzz about rate limiting, in part because of the cost exposure associated with LLM workloads and other resource-intensive patterns. Especially for apps that have a freemium model or any use that isn’t correlated with revenue, a single user can have a meaningful impact on your costs if they’re allowed to fire off thousands of requests . &lt;/p&gt;

&lt;p&gt;There’s a host of advice and services that can help you solve this problem for different applications, but I’d like to show you how simple it can be to implement when you have fast access to a database with strong ACID guarantees&lt;sup id="fnref1"&gt;1&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Specifically, in this post I’m going to look at implementing the following, storing just two numbers per rate limit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token bucket:&lt;/strong&gt; Enable limiting the overall request rate for a sliding window, while also accommodating bounded bursts of traffic after a period of inactivity.  For example, the number of requests should be limited to X per hour plus up to Y more “rollover minutes” from previous hours. Tokens become available continuously over time. &lt;strong&gt;This is what I recommend for most use-cases.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fixed window:&lt;/strong&gt; For example, in each hour there can be no more than X requests to some third-party service, to avoid hitting their rate limits. When the hour is up, all tokens are available again. We’ll also discuss using “jitter” to avoid &lt;a href="https://en.wikipedia.org/wiki/Thundering_herd_problem" rel="noopener noreferrer"&gt;thundering herds&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those who just want to use a library, I made one (&lt;a href="https://github.com/get-convex/convex-helpers/blob/main/packages/convex-helpers/server/rateLimit.ts" rel="noopener noreferrer"&gt;code here&lt;/a&gt;) available in &lt;a href="https://www.npmjs.com/package/convex-helpers#rate-limiting" rel="noopener noreferrer"&gt;&lt;code&gt;convex-helpers&lt;/code&gt;&lt;/a&gt;. You can see examples below.&lt;/p&gt;

&lt;p&gt;For the sake of this article, when I refer to “tokens” it is using the mental model of rate limiting where you are granted a certain number of tokens per some time period, and when a request successfully “consumes” them it can proceed. I’ll say a “debt” or “deficit” when tokens are over-consumed, as we’ll see later in exploring reservations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is application-layer rate limiting
&lt;/h2&gt;

&lt;p&gt;In this article, we are going to talk about application-layer rate limiting. Specifically, the controls you have available when you are aware of a user and the operation they are trying to perform, rather than the networking layer, which is the responsibility domain of the hosting platform provider. To prevent a request from being made in the first place, you can also look into client-side throttling, such as &lt;a href="https://stack.convex.dev/throttling-requests-by-single-flighting" rel="noopener noreferrer"&gt;single-flighting&lt;/a&gt; requests.&lt;/p&gt;

&lt;p&gt;In practice, application layer rate limiting is the most useful and only falls down during extreme load, such as a distributed denial of service attack (DDOS) which thankfully are extremely rare. More commonly there is a small number of users, whether malicious or other otherwise, who are consuming more resources than you expected. The cost incurred of such "attacks" are often in the expensive requests to auto-scaling third party services, such as hosted LLMs for AI apps.&lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits of these implementations
&lt;/h3&gt;

&lt;p&gt;The rate limits discussed here have these properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficient storage and compute&lt;/strong&gt;: in particular, it doesn’t require crons or storage that scales with load. Each rate limit (a combination of a name and a “key”) stores two numbers and does simple math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transactional evaluation:&lt;/strong&gt; You can make multiple decisions using multiple rate limits and be ensured that they’ll all be consumed or none will be (if you roll back by throwing an exception, for instance).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fairness via opt-in credit “reservation”&lt;/strong&gt;: By using &lt;code&gt;reserve: true&lt;/code&gt; below, I’ll show how you can pre-allocate tokens and schedule work that doesn’t require client backoff, and doesn’t starve large requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opt-in “rollover” allowance:&lt;/strong&gt; Allowing clients that have been under-consuming resources to accumulate tokens that “roll over” to the next period up to some limit, so they can service bursts of traffic while limiting average usage by a token bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic:&lt;/strong&gt; The results of these approaches will give you concrete bounds on usage, do not rely on probability, and will not “drift” over time. They also can determine the next time a retry could succeed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail closed&lt;/strong&gt;: If the system is under heavy load, it won’t fling open the gates for the traffic to overwhelm other services and cause &lt;a href="https://en.wikipedia.org/wiki/Cascading_failure#:~:text=A%20cascading%20failure%20is%20a%20failure%20in%20a,probability%20that%20other%20portions%20of%20the%20system%20fail." rel="noopener noreferrer"&gt;cascading failure&lt;/a&gt;, as other “fail open” solutions can. This is an easy property to satisfy, since the application database is being used for the rate limit. If the application database is unavailable, continuing to serve the request is unlikely to succeed anyways. Failing open makes more sense when adding additional infrastructure or services that could introduce a single point of failure (SPOF) such as single-host in-memory service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The algebra of rate limits
&lt;/h2&gt;

&lt;p&gt;Here is how you calculate rate limits, using just to numbers: &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;ts&lt;/code&gt;. Here is the Convex database schema for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;rateLimits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;defineTable&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="c1"&gt;// undefined is singleton&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;// can go negative if capacity is reserved ahead of time&lt;/span&gt;
  &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the following approaches have some basic error checking omitted for brevity, such as checking that the requested number of units is less than the maximum possible (which will never succeed). &lt;/p&gt;

&lt;p&gt;See below for modifications to accommodate reserving tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token bucket
&lt;/h3&gt;

&lt;p&gt;Instead of a traditional approach to a sliding window in which there are discrete jumps in value based on past events, we can use a token bucket to provide similar benefits, with a much more efficient storage and runtime footprint. &lt;/p&gt;

&lt;p&gt;We model tokens as being continuously provided, with a &lt;code&gt;capacity&lt;/code&gt; defined in the config. If your configured &lt;code&gt;rate&lt;/code&gt; is 10 in a &lt;code&gt;period&lt;/code&gt; of a minute and you use 5 tokens, they will be fully restored in 30 seconds. Since we model it continuously, one credit will be restored every six seconds. So you could be consuming one credit every six seconds, five every thirty seconds, or ten every minute. When you don’t use all your tokens, they can accumulate up to the defined &lt;code&gt;capacity&lt;/code&gt; amount. With &lt;code&gt;capacity&lt;/code&gt;, you can use more than the normal rate for a period of time, resting assured that all of that capacity was accumulated during idle time. Because the tokens are issued at a fixed rate, the overall credit consumption is bound to that rate.  If it isn’t set, &lt;code&gt;capacity&lt;/code&gt; defaults to &lt;code&gt;rate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;TokenBucketRateLimit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token bucket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// defaults to rate&lt;/span&gt;
  &lt;span class="nl"&gt;maxReserved&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MEVqAbg7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/59cf030b9be15f734123b48d88a8a9e86681bffe-1252x809.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MEVqAbg7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/59cf030b9be15f734123b48d88a8a9e86681bffe-1252x809.png" alt="Token bucket rate limiting works by continuously adding " width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core calculation is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;We keep track of the last time we calculated the state of the bucket as &lt;code&gt;ts&lt;/code&gt; and the &lt;code&gt;value&lt;/code&gt; at that time.&lt;/li&gt;
&lt;li&gt;We calculate the current value since then, capping it at the capacity of the bucket. If there is no capacity configured, we default to the rate. So if you allow 10 per second and don't specify a capacity, you can have up to 10 tokens in the bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full code, including handling reservations:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Fetch the existing value &amp;amp;amp; ts from the database, if present.&lt;/span&gt;
&lt;span class="c1"&gt;// If the key is undefined, it will fetch the shared value for the name.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rateLimits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// If there isn't a capacity defined, default to the rate&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;consuming&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Start of token-bucket-specific code&lt;/span&gt;
&lt;span class="c1"&gt;// Default to the maximum available right now.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// I appologize for the rate naming.&lt;/span&gt;
&lt;span class="c1"&gt;// The current value is whatever accumulated since the last evaluation up to max.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;consuming&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;consuming&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// not enough capacity currently&lt;/span&gt;
  &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// End of token-bucket-specific code&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reserve&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxReserved&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxReserved&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rateLimits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Some things to keep in mind:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;If you are sensitive to bursts of traffic, for instance if you’re using a third party API that has a hard rate limit cap, note that with this approach you could have as many as &lt;code&gt;rate + capacity&lt;/code&gt; requests in a single window, if the accumulated capacity is all consumed at once, and then the accumulated tokens consumed at the end of the window. For these scenarios, you can:

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;fixed window&lt;/code&gt; to divide capacity into fixed windows.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;capacity&lt;/code&gt; and &lt;code&gt;rate&lt;/code&gt; to both be half of the third party hard cap, so worst case you use a burst of half, then use the accumulated amount before the window is over. The downside is your average consumption is limited to half of what’s available.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;capacity&lt;/code&gt; to be smaller than &lt;code&gt;rate&lt;/code&gt;, which allows using more steady-state bandwidth, so long as the requests are small and somewhat frequent. For example if you had an hourly budget of 70 tokens, you could have a rate of 60 and a capacity of 10. You could consume one token per minute, or 5 tokens every five minutes, but if you didn’t consume anything for 15 minutes, you’d only have accumulated 10 tokens.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;capacity&lt;/code&gt; to zero and always use &lt;code&gt;reserve&lt;/code&gt; and scheduling to perfectly space requests based on their needs. You lose the benefits of accumulated bandwidth, and all requests will suffer some delay, so only use this for time-insensitive workloads.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;If many clients are waiting for the same rate limit to have bandwidth, and you aren’t using the &lt;code&gt;reserve&lt;/code&gt; technique, you should add some jitter before returning it to a client, so each client attempts at a different time.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fixed window
&lt;/h3&gt;

&lt;p&gt;When you need your rate limiting windows to be rigid, you can use this more traditional approach, where tokens are issued at distinct intervals and can be used during that interval. We also extend it with an optional &lt;code&gt;capacity&lt;/code&gt; configuration to allow unused tokens to accumulate, allowing us to control the overall usage while accommodating traffic that isn’t consistent.&lt;/p&gt;

&lt;p&gt;Config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;FixedRateLimit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fixed window&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// defaults to rate&lt;/span&gt;
  &lt;span class="nl"&gt;start&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9BtGQZtn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/90d37a967098fe521eff55348a539796c4f96015-1244x871.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9BtGQZtn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/90d37a967098fe521eff55348a539796c4f96015-1244x871.png" alt="Fixed window rate limiting adds tokens at discrete intervals to be used to service requests. Requests that need more tokens have to wait until the next window." width="800" height="560"&gt;&lt;/a&gt;&lt;br&gt;
The core calculation for value at a given time is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;We use &lt;code&gt;ts&lt;/code&gt; to mark the start of the window, and don’t update it until we are consuming resources in a more recent window, at which point we add tokens for each window that started since then.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;value&lt;/code&gt; holds the tokens available at that timestamp.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full code, including reservations:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Fetch the existing value &amp;amp;amp; ts from the database, if present.&lt;/span&gt;
&lt;span class="c1"&gt;// If the key is undefined, it will fetch the shared value for the name.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rateLimits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// If there isn't a capacity defined, default to the rate&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;consuming&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Start of fixed-window-specific code&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// If there wasn't a value or start time, default to a random time.&lt;/span&gt;
  &lt;span class="nl"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// start at full capacity&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Add value for each elapsed window&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;consuming&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Move ts forward to the start of this window&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;elapsedWindows&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;windowsNeeded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;period&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;windowsNeeded&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// End of fixed-window-specific code&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reserve&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxReserved&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;amp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxReserved&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rateLimits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;code&gt;start&lt;/code&gt; is the offset from &lt;code&gt;0&lt;/code&gt; UTC, in this case to align the start of the period with midnight in the PDT timezone. This is handy for aesthetically aligning requests with a user’s midnight or starting on the hour or on the minute, but if the rate limit will see a lot of concurrent usage, this can lead to many clients all waiting until midnight to fire off requests and causing a &lt;a href="https://en.wikipedia.org/wiki/Thundering_herd_problem" rel="noopener noreferrer"&gt;thundering herd&lt;/a&gt;. For these situations you should add jitter, or omit &lt;code&gt;start&lt;/code&gt;. If you don’t provide &lt;code&gt;start&lt;/code&gt;, it will use the “key” to assign a random time as the start time.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; if you allow for &lt;code&gt;capacity&lt;/code&gt;, the maximum number of tokens used in a given period will be &lt;code&gt;capacity&lt;/code&gt;, whereas the token bucket implementation above could use a maximum &lt;code&gt;capacity&lt;/code&gt; + &lt;code&gt;rate&lt;/code&gt; for a single period (worst case). This makes fixed windows a good fit for maximizing third party API limits, and &lt;code&gt;capacity&lt;/code&gt; a nice feature if the third party has an accommodation for “burst” traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reserving tokens
&lt;/h2&gt;

&lt;p&gt;When you hit a rate limit with these implementations, the rate limiter knows the next time it could plausibly handle your request. However, by that time a smaller request could have come along and consumed tokens, further delaying the larger request. If you are sure you want to eventually serve a request, it’s more efficient to pre-allocate the work and schedule its execution.&lt;sup id="fnref3"&gt;3&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Reserving tokens provides three useful properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fairness:&lt;/strong&gt; By reserving capacity ahead of time, larger requests can allocate capacity without retrying until enough tokens accumulate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fire-and-forget:&lt;/strong&gt; When you require a client to retry an operation later, there’s a chance the client won't be around - e.g. if a user clicks away or refreshes a website. If you know you eventually want to take some action, you can schedule execution for later and free the client from the responsibility of retrying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“Perfect” scheduling:&lt;/strong&gt; Using retries, especially when you add jitter, prevents you from fully utilizing available resources. With reservations, you are given advance authorization to run your operation at the exact time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation for both strategies is relatively straightforward. Both allow the available tokens to go negative, up to some configurable limit (by default unlimited).  &lt;/p&gt;

&lt;p&gt;For example, say you had 3 tokens and a request came in for 5.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You updated the token count to -2 and responded that the work should happen later - specifically when 2 tokens would have been added.&lt;/li&gt;
&lt;li&gt;The caller &lt;a href="https://docs.convex.dev/scheduling/scheduled-functions" rel="noopener noreferrer"&gt;schedules&lt;/a&gt; their work to happen at that later time. &lt;strong&gt;It is important to schedule it and not just run the request right away&lt;/strong&gt; - otherwise it's just equivalent to a higher burst &lt;code&gt;capacity&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;When the scheduled function runs later, it doesn't need to check rate limits because it's already been approved.&lt;/li&gt;
&lt;li&gt;Later on, when another function call checks the rate limit, it calculates how many tokens to add based on the elapsed time and adds it to the -2 value before deciding whether there are enough tokens for the new call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Passing &lt;code&gt;reserve: true&lt;/code&gt; is optional. By default the library will refuse to go negative.&lt;/p&gt;

&lt;p&gt;See the below example for usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using rate limits for common operations
&lt;/h2&gt;

&lt;p&gt;To make the rate limit tradeoffs concrete, let’s consider how we’d use rate limiting in our application. I made a library for  that provides a simple rate limiting API to consume, check, and reset limits. This library is Convex-specific, but it will serve as a representative example of how application-layer rate limits might look.&lt;/p&gt;



&lt;p&gt;In general, the distinctions here will be between “global” rate limits, for which there is one per application for a given rate limit “name,” and ones where each distinct “key” is rate limited independently.&lt;/p&gt;

&lt;p&gt;The examples below assume a flow like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A client calls a &lt;a href="https://docs.convex.dev/functions/mutation-functions" rel="noopener noreferrer"&gt;mutation&lt;/a&gt; to take some action, which may involve a rate-limited behavior. For those unfamiliar with Convex, a mutation runs server-side and encapsulates a database transaction, providing Serializable isolation and Optimistic Concurrency Control with automatic retries.&lt;/li&gt;
&lt;li&gt;The mutation checks the rate limit before taking the action, and if it fails it returns the time when the client should retry.&lt;/li&gt;
&lt;li&gt;Alternatively, the client could call an &lt;a href="https://docs.convex.dev/functions/actions" rel="noopener noreferrer"&gt;action&lt;/a&gt; (a non-transactional non-deterministic general-purpose environment, akin to a normal API endpoint) which could then call a mutation before taking some action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration can be centralized or provided at the call-site. If you use a limit from more than one place, defining them centrally is best, which will produce type-safe functions auto-completing your rate limit names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineRateLimits&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;convex-helpers/server/rateLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;checkRateLimit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resetRateLimit&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineRateLimits&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;createAThing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token bucket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HOUR&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;makeAThirdPartyRequest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fixed window&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MINUTE&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rateLimit&lt;/code&gt; is what you call to consume resources. It will return whether it succeeded.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;checkRateLimit&lt;/code&gt; will do the same thing as &lt;code&gt;rateLimit&lt;/code&gt;, but return the result without consuming any resources, as a way to tell whether it would have failed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;resetRateLimit&lt;/code&gt; will reset a given rate limit.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;doAThing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mutation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myRateLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doTheThing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ok&lt;/code&gt; is whether it successfully consumed the resource and the operation should proceed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retryAt&lt;/code&gt; is when it would have succeeded in the future, which can be used by a client to decide when to retry. We’ll discuss “jitter” later which is important here if it’s highly contended.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only setup required is to add the rate limiting table to your schema in &lt;code&gt;convex/schema.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;rateLimitTables&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./rateLimit.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineSchema&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rateLimitTables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;exampleOtherTable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;defineTable&lt;/span&gt;&lt;span class="p"&gt;({}),&lt;/span&gt;
  &lt;span class="c1"&gt;// other tables&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here on, the mutation context will be assumed. We’ll also define these constants to make it more readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SECOND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// ms&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MINUTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;SECOND&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HOUR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DAY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Failed logins
&lt;/h3&gt;

&lt;p&gt;This will allow 5 failed requests in an hour. Because it’s a bucket implementation, the user will be able to try 5 times immediately if they haven’t tried in an hour, and then can try again every 6 minutes afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;checkRateLimit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resetRateLimit&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineRateLimits&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;failedLogins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token bucket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Hour&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the functions to manage failed logins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkRateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failedLogins&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;throws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;logInAttempt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;otp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// If we successfully logged in, stop limiting us in the future&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;resetRateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failedLogins&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failedLogins&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// So the client can indicated to the user when to try again&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;throws&lt;/code&gt; is a convenience to have it throw when &lt;code&gt;ok&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt; instead of return the values. It works for &lt;code&gt;rateLimit&lt;/code&gt; and &lt;code&gt;checkRateLimit&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Account creation via global limit
&lt;/h3&gt;

&lt;p&gt;To prevent a flood of spam accounts, you can set a global limit on signing up for a free trial. This limits sign-ups to an average of 100 per hour.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;freeTrialSignUp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token bucket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HOUR&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;throws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;config&lt;/code&gt;: The configuration is inlined if you didn’t define it with &lt;code&gt;defineRateLimits.&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: this is a deterrent for spammers, but means that during a flood of attempts, other users will be impacted. See below for tips on authenticating anonymous users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sending messages per user
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sendMessage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token bucket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;key&lt;/code&gt; will isolate the rate limiting, in this case to be a per-user limit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;capacity&lt;/code&gt; here allows accumulating up to 20 unused tokens so a bursty minute won’t fail. See above for details on the implementation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Making LLM requests with reserved capacity
&lt;/h3&gt;

&lt;p&gt;If you’re staying on the &lt;a href="https://platform.openai.com/docs/guides/rate-limits/free-tier-rate-limits" rel="noopener noreferrer"&gt;OpenAI free tier&lt;/a&gt;, you can ensure you don’t go above their rate limits, and when you have too many requests, you can schedule them to happen when you will.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chatCompletion&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;numTokensInRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;reserve&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// There were too many reserved already.&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// We need to wait until later, but we've reserved the tokens.&lt;/span&gt;
  &lt;span class="c1"&gt;// Spread the request across 10s in case of many reservations.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;withJitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;SECOND&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;withJitter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;generateCompletion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// We can run it immediately. Scheduled for after this DB transaction.&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;generateCompletion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;count&lt;/code&gt; can decrease the number of tokens by a custom amount. By default it’s 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reserve: true&lt;/code&gt; instructs it to allow a token deficit if there isn’t enough capacity, provided we are willing to schedule our work for the future time when it would have had enough capacity. See above for more details.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;maxReserved&lt;/code&gt; defines how many tokens it should allow to be reserved before refusing to set aside tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reservations work with either rate limiting approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Jitter: introducing randomness to avoid thundering herds
&lt;/h3&gt;

&lt;p&gt;If we tell all clients to retry just when the next window starts, we are inviting what’s called a “thundering herd” which is about how it sounds. When too many users show up at once, it can cause network congestion, database contention, and consume other shared resources at an unnecessarily high rate. Instead we can return a random time within the next period to retry. Hopefully this is infrequent. This technique is referred to as adding “jitter.”&lt;/p&gt;

&lt;p&gt;A simple implementation could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;withJitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;period&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the fixed window, we also introduce randomness by picking the start time of the window (from which all subsequent windows are based) randomly if &lt;code&gt;config.start&lt;/code&gt; wasn’t provided. This helps from all clients flooding requests at midnight and paging your on-call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling rate limits with shards
&lt;/h2&gt;

&lt;p&gt;As your usage grows, you’ll want to think about scalability for your rate limiting.&lt;/p&gt;

&lt;p&gt;If you are using per-user keys, using each key won’t conflict with the others. However, if you use global rate limits, or a single key might have hundreds of requests per second, you should shard the rate limit by dividing the capacity into multiple rate limits. &lt;/p&gt;

&lt;p&gt;For example, if you’re trying to limit the overall (global) number of tokens sent to an LLM API, you could make 10 rate limits, each at 1/10th the bandwidth. When you go to use the rate limit, you can set the &lt;code&gt;key&lt;/code&gt; to be one of 10 random values, such as &lt;code&gt;0...9&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you’re already using a key (for instance per-team), you can use the same approach by appending a random number to the existing key. All of the keys are just strings at the end of the day.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;teamId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;shard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will decrease your ability to maximize resources, however, since you could get unlucky and try a rate limiter that has gotten more requests. To mitigate this, we can leverage &lt;a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tpds2001.pdf" rel="noopener noreferrer"&gt;The Power of Two Choices in Randomized Load Balancing&lt;/a&gt; by using &lt;code&gt;checkRateLimit&lt;/code&gt; on two shards and using the one with a higher &lt;code&gt;value&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkRateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;teamId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;shard1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkRateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;teamId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;shard2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;shard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;status2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;status2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;shard1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;shard2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;status1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;shard1&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;status2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;shard2&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;status1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;status2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retryAt&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;shard1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;shard2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;teamId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;shard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;myLimit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;throws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What if I rely on multiple rate limits?
&lt;/h2&gt;

&lt;p&gt;If your operations requires consuming multiple rate limits, you can run into issues if you aren’t careful. You could consume resources that you don’t end up using. This can even deadlock if two operations acquire resources in different orders, for instance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request A takes 5 unit of x and fails to take 10 units of y, so returns to the client that it should retry later.&lt;/li&gt;
&lt;li&gt;Request B takes 5 units of y (say there were only 5 units) and fails to take 10 units of x.&lt;/li&gt;
&lt;li&gt;Request A retries, taking another 5 units of x which had accumulated, and fails to take y.&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Here are two strategies to handle this:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;checkRateLimit&lt;/code&gt; ahead of time for each rate limit you’ll depend on, and only continue to consume them if they are all satisfied, otherwise return the largest &lt;code&gt;retryAt&lt;/code&gt; value so the client doesn’t retry before it’d plausibly be accepted.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Roll back the transaction by throwing an exception instead of returning. When an exception is thrown, database writes are not committed, oo any rate limits the request already consumed will be reset to their previous values. When a rate limit fails, no state is persisted by the library. &lt;/p&gt;

&lt;p&gt;To pass information back to the client, you can use &lt;code&gt;ConvexError&lt;/code&gt;. This is what the library does if you specify &lt;code&gt;throws: true&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;throws&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConvexError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RateLimited&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;retryAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Note: this might not be the maximum value of &lt;code&gt;retryAt&lt;/code&gt; for your request, just the first it ran into. Hopefully your application rarely hits rate limits, so it’s ok for the client to retry later and need to wait again.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In general, I’d advise you to consolidate all of your rate limits into a single transaction (mutation) rather than calling out to multiple mutations from an action as you go, if all of the rate limits need to be satisfied for the operation to succeed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authenticating anonymous users
&lt;/h2&gt;

&lt;p&gt;It sounds like an oxymoron, but there are a few strategies for authenticating users that haven’t logged in or authenticated using traditional methods.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client-generated session ID (optimistic):&lt;/strong&gt; This ID is generated in the browser and sent with requests to identify the user. This is only meaningful protection if you authorize the session ID using a strategy below, since a malicious user could generate new ones for each request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Associate the session ID with an IP (lossy)&lt;/strong&gt;: You can have the client make a one-time HTTP request to an API endpoint that associates the session ID provided with an IP. You then rate-limit behaviors based on IP. This is handy, but ultimately a flawed approach, since many real users may share a virtual IP exposed by their ISP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a Captcha or similar to authorize the session ID (robust):&lt;/strong&gt; This is what I’d recommend. Anonymous users submit a captcha to prove they’re not bots, and associate the successful captcha with their session ID to be authorized to do any operation. At this point, you can rate limit their session ID as if it were a userId.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We looked at implementing application-layer rate limiting to help limit operations, either globally or specific to a user or other key. We looked at implementing both a token bucket and fixed window limits using just two numbers.&lt;/p&gt;

&lt;p&gt;This is enabled by having an environment that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides transactional guarantees to avoid race conditions with reading &amp;amp; writing values.&lt;/li&gt;
&lt;li&gt;Automatically retries conflicting transactions.&lt;/li&gt;
&lt;li&gt;Schedules work transactionally, to allow handling multiple rate limits independently and roll back everything if any of them fail.&lt;/li&gt;
&lt;li&gt;Has fast access to a database with indexed lookups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beware: if you plan to implement these with Postgres or similar, beware of the &lt;a href="https://www.2ndquadrant.com/en/blog/postgresql-anti-patterns-read-modify-write-cycles/" rel="noopener noreferrer"&gt;read-modify-write behavior&lt;/a&gt; where, by default, you are exposed to data races, even within a transaction.&lt;/p&gt;

&lt;p&gt;We also looked at adding the ability to reserve capacity ahead of time to ensure fairness, as well as handle bursts of traffic while still maintaining an overall average limit.&lt;/p&gt;

&lt;p&gt;As always, let me know in &lt;a href="https://convex.dev/community" rel="noopener noreferrer"&gt;our Discord&lt;/a&gt; what you think and what else you’d like to see out of the library.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Specifically having serializable isolation is really useful for this use case. “Read committed” isolation (the default for Postgres and other SQL variants) is vulnerable to race conditions for the code in this article, &lt;a href="https://www.2ndquadrant.com/en/blog/postgresql-anti-patterns-read-modify-write-cycles/" rel="noopener noreferrer"&gt;even in a transaction&lt;/a&gt; unless you jump through some hoops. And if you figure out how to do it for Drizzle &lt;a href="https://orm.drizzle.team/docs/transactions" rel="noopener noreferrer"&gt;without adding read locks for every row in a transaction&lt;/a&gt;, &lt;a href="https://github.com/drizzle-team/drizzle-orm/discussions/1337" rel="noopener noreferrer"&gt;this person could use some help&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;For an article about a load balancing strategy that helps control costs and optimizes for throughput, check out my recent article on &lt;a href="https://stack.convex.dev/work-stealing" rel="noopener noreferrer"&gt;work stealing&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;The Convex scheduler is transactional within mutations. If the mutations throws an exception, no database writes will happen and no functions will be scheduled.  ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>fullstack</category>
      <category>database</category>
      <category>convex</category>
    </item>
    <item>
      <title>Streaming HTTP Responses using fetch</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Tue, 23 Jul 2024 23:56:37 +0000</pubDate>
      <link>https://forem.com/ianmacartney/streaming-http-responses-using-fetch-1fm2</link>
      <guid>https://forem.com/ianmacartney/streaming-http-responses-using-fetch-1fm2</guid>
      <description>&lt;p&gt;This post will look at working with the JavaScript &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API" rel="noopener noreferrer"&gt;Streams API&lt;/a&gt; which allows making a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/fetch" rel="noopener noreferrer"&gt;fetch&lt;/a&gt; HTTP call and receiving a streaming response in chunks, which allows a client to start responding to a server response more quickly and build UIs like ChatGPT.&lt;/p&gt;

&lt;p&gt;As a motivating example, we’ll implement a function to handle the streaming LLM response from OpenAI (or any server using the same http streaming API), using no npm dependencies—just the built-in &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/fetch" rel="noopener noreferrer"&gt;&lt;code&gt;fetch&lt;/code&gt;&lt;/a&gt;. &lt;a href="https://gist.github.com/ianmacartney/53dafa51d37469534846105e39d99a25" rel="noopener noreferrer"&gt;The full code is here&lt;/a&gt; including retries with exponential backoff, embeddings, non-streaming chat, and a simpler APIs for interacting with chat completions and embeddings.&lt;/p&gt;

&lt;p&gt;If you’re interested in seeing how to also return an HTTP stream to clients, &lt;a href="https://stack.convex.dev/ai-chat-with-http-streaming" rel="noopener noreferrer"&gt;check out this post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full example code
&lt;/h2&gt;

&lt;p&gt;Here’s the full example. We’ll look at each piece below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createChatCompletion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ChatCompletionCreateParams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Making the request&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.openai.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// Handling errors&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;): &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// the non-streaming case&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No body in response&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Returning an async iterator&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Symbol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;asyncIterator&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Handling the OpenAI HTTP streaming protocol&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trimStart&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[DONE]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Reading the stream  &lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Flush the last fragment now that we're done&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="c1"&gt;// Yield all except for the last part&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="c1"&gt;// Save the last part as the new last fragment&lt;/span&gt;
      &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the code &lt;a href="https://gist.github.com/ianmacartney/53dafa51d37469534846105e39d99a25" rel="noopener noreferrer"&gt;here&lt;/a&gt; for a version that has nice typed overloads for streaming &amp;amp; non-streaming parameter variants, along with retries and other improvements.&lt;/p&gt;

&lt;p&gt;The rest of the post is about understanding what this code does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the request
&lt;/h2&gt;

&lt;p&gt;This part is actually very easy. A streaming HTTP response comes from a normal HTTP request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.openai.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HTTP headers are sent up per usual, and don’t have to set anything in particular to enable streaming. And you can still leverage regular caching headers for HTTP streaming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling errors
&lt;/h2&gt;

&lt;p&gt;The story around errors on the client side is a little unfortunate for HTTP streaming. The upside is that for HTTP streaming, the client gets status codes right away in the initial response and can detect failure there. The downside to the http protocol is that if the server returns success but then breaks mid-stream, there isn’t anything at the protocol level that will tell the client that the stream was interrupted. We’ll see below how OpenAI encodes an “all done” sentinel at the end to work around this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;): &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reading the stream
&lt;/h2&gt;

&lt;p&gt;In order to read an HTTP streaming response, the client can use the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/body" rel="noopener noreferrer"&gt;&lt;code&gt;response.body&lt;/code&gt; property&lt;/a&gt; which is a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream" rel="noopener noreferrer"&gt;ReadableStream&lt;/a&gt; allowing you to iterate over the chunks as they come in from the server using the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader" rel="noopener noreferrer"&gt;&lt;code&gt;.getReader(&lt;/code&gt;) method&lt;/a&gt;.&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="c1"&gt;//... do something with the chunk&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles every bit of data that we get back, but for the OpenAI HTTP protocol we are expecting the data to be JSON separated by newlines, so instead we will split up the response body and “yield” each line as they’re completed. We buffer the in-progress line into &lt;code&gt;lastFragment&lt;/code&gt; and only return full lines that have been separated by two newlines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// stream here is request.body&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Flush the last fragment now that we're done&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lastFragment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="c1"&gt;// Yield all except for the last part&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="c1"&gt;// Save the last part as the new last fragment&lt;/span&gt;
      &lt;span class="nx"&gt;lastFragment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;releaseLock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this &lt;code&gt;function*&lt;/code&gt; and &lt;code&gt;yield&lt;/code&gt; syntax is unfamiliar to you, just treat &lt;code&gt;function*&lt;/code&gt; as a function that can return multiple things in a loop, and &lt;code&gt;yield&lt;/code&gt; as the way of returning something multiple times from a function.&lt;/p&gt;

&lt;p&gt;You can then loop over this &lt;code&gt;splitStream&lt;/code&gt; function like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// data here is a full line of text. For OpenAI, it might look like&lt;/span&gt;
  &lt;span class="c1"&gt;// "data: {...some json object...}" or "data: [DONE]" at the end&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this "&lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of" rel="noopener noreferrer"&gt;for await&lt;/a&gt;" syntax throws you off, it's using what’s called an “async iterator” - like a regular iterator you’d use with a for loop, but every time it gets the next value, it’s awaited. &lt;/p&gt;

&lt;p&gt;For our example, when we’ve gotten some text from OpenAI and we’re waiting for more, the for loop will wait until &lt;code&gt;splitStream&lt;/code&gt; yields another value, which will happen when &lt;code&gt;await reader.read()&lt;/code&gt; returns a value that finishes one or more lines of text.&lt;/p&gt;

&lt;p&gt;Next up we’ll look at another way of returning an async iterator that isn’t a function like &lt;code&gt;splitStream&lt;/code&gt;, so a caller can use a “for await” loop to iterate over this data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Returning an async iterator
&lt;/h2&gt;

&lt;p&gt;Now that we have an async iterator returning full lines of text, we could just return &lt;code&gt;splitStream(response.body)&lt;/code&gt;, but we want to intercept each of the lines and transform them, while still letting the caller of our function to iterate. &lt;/p&gt;

&lt;p&gt;The approach is similar to to the &lt;code&gt;async function*&lt;/code&gt; syntax above. Here we’ll return an async iterator directly, instead of an async function that returns one when it’s called. The difference is the type is &lt;code&gt;AsyncIterator&lt;/code&gt; instead of &lt;code&gt;AsyncGenerator&lt;/code&gt; which needs to be called first. An &lt;code&gt;AsyncIterator&lt;/code&gt; &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols#the_async_iterator_and_async_iterable_protocols" rel="noopener noreferrer"&gt;can be defined&lt;/a&gt; by having a certain named function: &lt;code&gt;Symbol.asyncIterator&lt;/code&gt;.&lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Symbol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;asyncIterator&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;//handle the data&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when you want to return something different from the data coming from &lt;code&gt;splitStream&lt;/code&gt;. Every time a new line comes in from the streaming HTTP request, &lt;code&gt;splitStream&lt;/code&gt; will &lt;code&gt;yield&lt;/code&gt; it, this function will receive it in &lt;code&gt;data&lt;/code&gt; and can do something before yielding it to its caller. &lt;/p&gt;

&lt;p&gt;Next we’ll look at how to interpret this data specifically in the case of OpenAI’s streaming &lt;a href="https://platform.openai.com/docs/api-reference/streaming" rel="noopener noreferrer"&gt;chat completion API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling the OpenAI HTTP streaming protocol
&lt;/h2&gt;

&lt;p&gt;The OpenAI response protocol is a series of lines that start with &lt;code&gt;data:&lt;/code&gt; or &lt;code&gt;event:&lt;/code&gt;, but we’ll just handle the data responses, since that’s the useful part for chat completions. There’s a sentinel of &lt;code&gt;[DONE]&lt;/code&gt; if the stream is done, otherwise it’s just JSON.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;splitStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trimStart&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[DONE]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unexpected data:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bringing it all together
&lt;/h2&gt;

&lt;p&gt;Now that you understand HTTP streaming, you can feel confident working directly with streaming APIs without relying on sdks or libraries. This allows you to hide latency, as your UI can immediately start updating, without consuming more bandwidth with multiple requests. You can use the above function like you would with the official &lt;code&gt;openai&lt;/code&gt; npm package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createChatCompletion&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;your&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://gist.github.com/ianmacartney/53dafa51d37469534846105e39d99a25" rel="noopener noreferrer"&gt;See the code here&lt;/a&gt; that also lets you make some utility functions to make this even easier by pre-configuring the model and extracting the &lt;code&gt;.choices[0].delta.content&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;chatStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you copy the code, try to implement it yourself as an exercise in async functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  More resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;For information about returning HTTP streaming data from your own server endpoint, check out this post on &lt;a href="https://stack.convex.dev/ai-chat-with-http-streaming" rel="noopener noreferrer"&gt;AI Chat with HTTP Streaming&lt;/a&gt; that both streams data from OpenAI (or similar) to your server and simultaneously streams it down to a client, while doing custom logic as it goes (such as saving chunks to a database).&lt;/li&gt;
&lt;li&gt;The MDN docs, as always, are great. Beyond the links above, &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams" rel="noopener noreferrer"&gt;here’s a guide on the readable streams&lt;/a&gt; API that shows how to connect a readable stream to an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag to stream in an image request. Note: this guide uses &lt;code&gt;response.body&lt;/code&gt; as an async iterator, but currently that is &lt;a href="https://github.com/microsoft/TypeScript/issues/57874" rel="noopener noreferrer"&gt;not widely implemented and not in the TypeScript types&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Note: you can only have one reader of the stream at a time, so you generally don’t call &lt;code&gt;.getReader()&lt;/code&gt; multiple times - you probabaly want &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee" rel="noopener noreferrer"&gt;&lt;code&gt;.tee()&lt;/code&gt; in that case&lt;/a&gt;, and if you want to use &lt;code&gt;.getReader()&lt;/code&gt; multiple times for some reason, make sure to have the first &lt;code&gt;.releaseLock()&lt;/code&gt; first. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Or alternatively you can If you aren’t familiar with &lt;code&gt;Symbol&lt;/code&gt;, it’s used in a way to have keys in an object that aren’t strings or numbers. That way they don’t conflict if you added a key named &lt;code&gt;asyncIterator&lt;/code&gt;. You could access the function with &lt;code&gt;myIterator[Symbol.asyncIterator]()&lt;/code&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>javascript</category>
      <category>ai</category>
      <category>chatgpt</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Multiple apps on a single domain hosted on sub-paths</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Tue, 23 Jul 2024 23:52:14 +0000</pubDate>
      <link>https://forem.com/ianmacartney/multiple-apps-on-a-single-domain-hosted-on-sub-paths-1moc</link>
      <guid>https://forem.com/ianmacartney/multiple-apps-on-a-single-domain-hosted-on-sub-paths-1moc</guid>
      <description>&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; It's possible to deploy multiple apps on the same domain by serving each at a different sub-path, using a simple vercel.json configuration, and custom build commands.&lt;/p&gt;

&lt;p&gt;Even a static SPA React app with client-side routing can be configured to be served at a custom path.&lt;/p&gt;

&lt;p&gt;When developing my &lt;a href="https://labs.convex.dev/llama-farm" rel="noopener noreferrer"&gt;llama farm chat demo&lt;/a&gt;, I had been developing it like a normal app, with the index at &lt;code&gt;/&lt;/code&gt; on &lt;code&gt;http://localhost:5173/&lt;/code&gt;, but then needed to host the app entirely under the &lt;code&gt;/llama-farm&lt;/code&gt; sub-path at labs.convex.dev/llama-farm, since labs.convex.dev also hosts &lt;a href="https://labs.convex.dev/auth" rel="noopener noreferrer"&gt;auth&lt;/a&gt;, &lt;a href="https://labs.convex.dev/convex-ents" rel="noopener noreferrer"&gt;Ents&lt;/a&gt;, a &lt;a href="https://labs.convex.dev/million" rel="noopener noreferrer"&gt;million checkboxes clone&lt;/a&gt; and other independent projects. It uses a client-side router (React Router), which needs all routes to get redirected to the same &lt;code&gt;index.html&lt;/code&gt;, and each page needs to get served relative to that subpath, without littering the codebase with a prefix for every route.&lt;/p&gt;

&lt;p&gt;In this post I'll cover:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hosting your Vite-based React app with client-side router under a sub-path on Vercel.&lt;/li&gt;
&lt;li&gt;Hosting multiple apps on subpaths within a single project.&lt;/li&gt;
&lt;li&gt;Configuring multiple sub-paths for the same app in a single vercel.json.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Serving on a subpath
&lt;/h2&gt;

&lt;p&gt;To change my app to work on a sub-path on Vercel, the four changes that I needed to make were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Setting the &lt;a href="https://vitejs.dev/guide/build#public-base-path" rel="noopener noreferrer"&gt;public base path&lt;/a&gt; by specifying &lt;code&gt;--base=/llama-farm&lt;/code&gt; in my &lt;code&gt;vite build&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;Setting the &lt;a href="https://vitejs.dev/config/build-options.html#build-outdir" rel="noopener noreferrer"&gt;&lt;code&gt;build.outDir&lt;/code&gt;&lt;/a&gt; to &lt;code&gt;--outDir=dist/llama-farm&lt;/code&gt; so it would create the assets at the relative path Vercel expected. You can see the full build command below.&lt;/li&gt;
&lt;li&gt;Setting the &lt;a href="https://reactrouter.com/en/main/routers/create-browser-router#optsbasename" rel="noopener noreferrer"&gt;&lt;code&gt;basename&lt;/code&gt;&lt;/a&gt; field for React Router on &lt;code&gt;createBrowserRouter&lt;/code&gt; to &lt;code&gt;{ basename: "/llama-farm" }&lt;/code&gt;. In order to test locally at &lt;code&gt;/&lt;/code&gt; I made this an environment variable &lt;code&gt;VITE_BASEPATH&lt;/code&gt; that I only set in Vercel. You can see my &lt;a href="https://github.com/get-convex/llama-farm-chat/blob/main/src/main.tsx#L41" rel="noopener noreferrer"&gt;code here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Adding a &lt;a href="https://vercel.com/docs/edge-network/rewrites" rel="noopener noreferrer"&gt;&lt;code&gt;rewrites&lt;/code&gt;&lt;/a&gt; configuration to my &lt;code&gt;vercel.json&lt;/code&gt; file:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rewrites&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm(.*)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Vercel my build command ended up looking like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vite build &lt;span class="nt"&gt;--base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$VITE_BASEPATH&lt;/span&gt; &lt;span class="nt"&gt;--outDir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dist&lt;span class="nv"&gt;$VITE_BASEPATH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I set the &lt;code&gt;VITE_BASEPATH&lt;/code&gt; environment variable to &lt;code&gt;/llama-farm&lt;/code&gt; so it was shared between the build command and React Router config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding in Convex
&lt;/h3&gt;

&lt;p&gt;One more layer is if our app uses  for the backend, in which case we can also &lt;a href="https://docs.convex.dev/production/hosting/vercel#deploying-to-vercel" rel="noopener noreferrer"&gt;deploy our Convex backend&lt;/a&gt; at the same time. For this, we wrap our build command with &lt;code&gt;npx convex deploy --cmd '&amp;lt;build command here&amp;gt;'&lt;/code&gt;.&lt;sup id="fnref1"&gt;1&lt;/sup&gt; This will build the Vite app with the &lt;code&gt;VITE_CONVEX_URL&lt;/code&gt; set, which tells the frontend where to find the corresponding backend. This is especially useful for preview deploys which each have their own backend. As an additional safeguard, if the build fails it won't deploy your backend.&lt;/p&gt;

&lt;p&gt;The full updated build command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx convex deploy &lt;span class="nt"&gt;--cmd&lt;/span&gt; &lt;span class="s1"&gt;'vite build --base=$VITE_BASEPATH --outDir=dist$VITE_BASEPATH'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now my chat app is being served on &lt;code&gt;llama-farm-chat.vercel.app/llama-farm&lt;/code&gt;. In the next section I'll connect it to the app running &lt;code&gt;labs.convex.dev&lt;/code&gt; so I can access it on &lt;code&gt;labs.convex.dev/llama-farm&lt;/code&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Serving multiple supaths from another project
&lt;/h2&gt;

&lt;p&gt;To serve my app from the &lt;code&gt;labs.convex.dev&lt;/code&gt; app, I'll edit the &lt;code&gt;vercel.json&lt;/code&gt; for the Convex Labs project to handle redirecting and rewriting requests to my app.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;redirects&lt;/code&gt; will serve assets from the same relative path, but from elsewhere. In my case the &lt;code&gt;.js&lt;/code&gt; and &lt;code&gt;.css&lt;/code&gt; files are in the &lt;code&gt;llama-farm/assets/&lt;/code&gt; directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rewrites&lt;/code&gt; will take into account the rewrites in my llama farm project, both for the base url &lt;code&gt;/llama-farm&lt;/code&gt; and any subpaths. These will all get rewritten to &lt;code&gt;/llama-farm&lt;/code&gt; which will serve the &lt;code&gt;index.html&lt;/code&gt; file there.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;redirects&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm/assets/:filePath&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://llama-farm-chat.vercel.app/llama-farm/assets/:filePath&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;permanent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;//... other projects&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rewrites&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://llama-farm-chat.vercel.app/llama-farm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm/:match*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://llama-farm-chat.vercel.app/llama-farm/:match*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;//... other projects&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can read more about &lt;a href="https://vercel.com/docs/projects/project-configuration#redirects" rel="noopener noreferrer"&gt;redirects&lt;/a&gt; and &lt;a href="https://vercel.com/docs/projects/project-configuration#rewrites" rel="noopener noreferrer"&gt;rewrites&lt;/a&gt; in the Vercel docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring multiple subpaths
&lt;/h2&gt;

&lt;p&gt;You might want to build the same project and host it on different sub-paths on different domains. For example, I also host my app at &lt;a href="https://llamafarm.chat" rel="noopener noreferrer"&gt;llamafarm.chat&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'll start by saying there are &lt;a href="https://github.com/orgs/vercel/discussions/3162" rel="noopener noreferrer"&gt;many&lt;/a&gt; &lt;a href="https://github.com/orgs/vercel/discussions/25" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; &lt;a href="https://github.com/orgs/vercel/discussions/3318" rel="noopener noreferrer"&gt;community&lt;/a&gt; &lt;a href="https://github.com/orgs/vercel/discussions/1054" rel="noopener noreferrer"&gt;discussions&lt;/a&gt; about how using environment variables to configure vercel.json is not supported. The recommendation is to use edge function middleware or a Next.js config. For this app I'd rather not pay for serverless functions or migrate it to Next.js.&lt;/p&gt;

&lt;p&gt;However, I have a configuration that is working for me. I've found that I can put both rewrite configs into one &lt;code&gt;vercel.json&lt;/code&gt; and both apps will route correctly.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vercel.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rewrites&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm(.*)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/llama-farm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;source&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/(.*)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On labs.convex.dev/llama-farm, it makes sense that it will only ever match against incoming urls starting with &lt;code&gt;/llama-farm&lt;/code&gt; since the &lt;code&gt;labs.convex.dev&lt;/code&gt; project only rewrites those urls.&lt;br&gt;
However, I'm a bit confused why llamafarm.chat correctly routes &lt;code&gt;/llama-farm&lt;/code&gt; to &lt;code&gt;/index.html&lt;/code&gt; instead of looking for &lt;code&gt;/llama-farm&lt;/code&gt;. My guess is that it can't find any &lt;code&gt;/llama-farm/index.html&lt;/code&gt; at build time, so ignores the first rule for that project. If you know more, please reach out! I'm just happy that it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We looked at hosting a client-routed SPA app (in my case a Vite app using React Router) on a base path, serving multiple apps on the same project, and configuring the rewrite paths for multiple environments in the one &lt;code&gt;vercel.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I hope it helps you out! Come chat in our &lt;a href="https://convex.dev/community" rel="noopener noreferrer"&gt;community Discord&lt;/a&gt; if you have any comments or feedback.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Note when pointing multiple Vercel projects at one Convex backend: I choose to only deploy the Convex backend from one project, and just do the frontend-only deploy from the other. For the one that is only deploying the frontend, I set the &lt;code&gt;CONVEX_SITE_URL&lt;/code&gt; environment variable instead of the &lt;code&gt;CONVEX_DEPLOY_KEY&lt;/code&gt;. This means that all frontend preview deployments for this project are talking to the prod backend. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>vercel</category>
      <category>vite</category>
      <category>react</category>
    </item>
    <item>
      <title>GPT Streaming With Persistent Reactivity</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Mon, 10 Jul 2023 22:31:14 +0000</pubDate>
      <link>https://forem.com/ianmacartney/gpt-streaming-with-persistent-reactivity-53lg</link>
      <guid>https://forem.com/ianmacartney/gpt-streaming-with-persistent-reactivity-53lg</guid>
      <description>&lt;p&gt;Building ChatGPT-powered experiences feel snappier when the responses show up incrementally. Instead of waiting for the full response before showing the user anything, streaming the text in allows them to start reading immediately. &lt;/p&gt;

&lt;p&gt;OpenAI exposes a streaming API for chat completions. But how do you manage a streaming request, when you have a server between the client and OpenAI? You might be tempted to use HTTP streaming end to end - both from the client to the server and the server to OpenAI. However, there’s another way that comes with some big benefits. Spoiler: it’s possible to use a database as a layer of reactivity that separates client request lifecycles from server requests. Don’t worry if that doesn’t make sense yet - we’ll take it one step at a time.&lt;/p&gt;

&lt;p&gt;This post will look at working with streams with OpenAI’s beta v4.0.0 Node SDK. Beyond just getting streaming for a single user, we’ll look at an approach that enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persisting&lt;/strong&gt; the response even if the user closes their browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiplayer chat&lt;/strong&gt;, including &lt;strong&gt;streaming multiple ChatGPT messages&lt;/strong&gt; at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resuming&lt;/strong&gt; a stream when a user &lt;strong&gt;refreshes their browser&lt;/strong&gt; mid-stream.&lt;/li&gt;
&lt;li&gt;Streaming to &lt;strong&gt;multiple users&lt;/strong&gt; at once.&lt;/li&gt;
&lt;li&gt;Implement &lt;strong&gt;custom stream granularity&lt;/strong&gt;, such as only updating on full words or sentences, rather than on each token.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To do this, we’ll use Convex to store the messages and make the request to OpenAI. This code is &lt;a href="https://github.com/ianmacartney/streaming-chat-gpt"&gt;on GitHub&lt;/a&gt; for you to clone and play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7lkIhpLP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/9a7b8865f6cd1cb6748fdb88c986d6ec7bd26bdb-1200x638.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7lkIhpLP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/9a7b8865f6cd1cb6748fdb88c986d6ec7bd26bdb-1200x638.png" alt="Diagram of browsers talking to Convex, which talks to OpenAI" title="Data flow overview" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Persisting messages
&lt;/h1&gt;

&lt;p&gt;Let’s say we have a chat app, like the one pictured in the gif above. We want to store the messages from each user, as well as messages populated by responses from OpenAI. First let’s look at how data is stored (2), assuming a client sends a message (1).&lt;/p&gt;

&lt;p&gt;When a user sends a message, we immediately commit it to the database, so they’re correctly ordered by creation time. This code is executed on the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;send&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mutation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;scheduler&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Save our message to the DB.&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@gpt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// ...see below&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;a href="https://docs.convex.dev/get-started#mutation-functions"&gt;&lt;code&gt;mutation&lt;/code&gt;&lt;/a&gt; saves the message to the database. When the user wants a response from the GPT model (by adding “&lt;a class="mentioned-user" href="https://dev.to/gpt"&gt;@gpt&lt;/a&gt;” to the message), we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store a placeholder message to update later.&lt;/li&gt;
&lt;li&gt;Make a streaming request to OpenAI in an asynchronous background function. &lt;/li&gt;
&lt;li&gt;Progressively update the message as the response streams in.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By running the streaming request asynchronously (versus blocking in a user request), we can interact with ChatGPT and save the data to the database &lt;strong&gt;even if the client has closed their browser&lt;/strong&gt;. It also allows us to run many requests in &lt;strong&gt;parallel&lt;/strong&gt;, from the same or &lt;strong&gt;multiple users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We also run it asynchronously because, in Convex, mutations are pure transactions and as such can’t do non-deterministic things like making API requests. In order to talk to third-party services, we can use an &lt;code&gt;action&lt;/code&gt;. Actions are non-transactional serverless functions that can talk to third-party services. We trigger the background job to call ChatGPT and update the message body by &lt;a href="https://docs.convex.dev/scheduling/scheduled-functions"&gt;scheduling&lt;/a&gt; the &lt;a href="https://docs.convex.dev/functions/actions"&gt;action&lt;/a&gt; like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ...when the user wants to send a message to OpenAI's GPT model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="c1"&gt;// fetch recent messages to send as context&lt;/span&gt;
&lt;span class="c1"&gt;// Insert a message with a placeholder body.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ChatGPT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Schedule an action that calls ChatGPT and updates the message.&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We schedule it for zero milliseconds later, similar to doing &lt;code&gt;setTimeout(fn, 0)&lt;/code&gt;. The message writing and action scheduling happens transactionally in a &lt;a href="https://docs.convex.dev/functions/mutation-functions"&gt;mutation&lt;/a&gt;, so we will only run the action if the messages are successfully committed to the database.&lt;/p&gt;

&lt;p&gt;When the action wants to update the body of a message as the streaming results come in, it can invoke an &lt;code&gt;update&lt;/code&gt; mutation with the &lt;code&gt;messageId&lt;/code&gt; from above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;internalMutation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: An &lt;code&gt;internalMutation&lt;/code&gt; is just a mutation that isn’t exposed as part of the public API. Next we’ll look at the code that calls this &lt;code&gt;update&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Convex has end-to-end reactivity, so when we update the messages in the database, the UI automatically updates. See below what it looks like to reactively query data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Streaming with the OpenAI node SDK
&lt;/h1&gt;

&lt;p&gt;Streaming is currently available in the beta version of OpenAI’s node SDK. To install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai@4.0.0-beta.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is very similar to previous releases of the &lt;code&gt;openai&lt;/code&gt; package, with a few nuances you can read about &lt;a href="https://github.com/openai/openai-node/discussions/182"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;internal.openai.chat&lt;/code&gt; action we referenced above will live in &lt;code&gt;convex/openai.ts&lt;/code&gt; - see the full code &lt;a href="https://github.com/ianmacartney/streaming-chat-gpt"&gt;here&lt;/a&gt;. One important detail is that it needs to run in the &lt;a href="https://docs.convex.dev/functions/runtimes"&gt;&lt;code&gt;node&lt;/code&gt; runtime&lt;/a&gt; to support some dependencies in the &lt;code&gt;openai&lt;/code&gt; package, which means it needs to have &lt;code&gt;"use node";&lt;/code&gt; as the first line in the file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use node&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;internalAction&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./_generated/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ChatParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Doc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nl"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;internalAction&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;runMutation&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;ChatParams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;//...Create and handle a stream request&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a stream request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// inside the chat function in convex/openai.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// "gpt-4" also works, but is so slow!&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a terse bot in a group chat responding to q's.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ChatGPT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;//...handling the stream&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main difference here from previous &lt;code&gt;openai&lt;/code&gt; versions (aside from their new simplified client configuration) is passing &lt;code&gt;stream: true&lt;/code&gt;. This changes the return format, which &lt;strong&gt;unfortunately does not currently provide token usage&lt;/strong&gt; as the non-streaming version does. I hope this is fixed in a future release, as keeping track of token usage is useful to know how different users or features are affecting your costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling the stream
&lt;/h2&gt;

&lt;p&gt;The API exposed by the &lt;code&gt;openai&lt;/code&gt; SDK makes handling the stream very easy. We use an async iterator to handle each chunk, appending it to the body and updating the message body with everything we’ve received so far:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that here we’re updating the message every time the body updates, but we could implement &lt;strong&gt;custom granularity&lt;/strong&gt; by deciding when to call &lt;code&gt;runMutation&lt;/code&gt;, such as on word breaks or at the end of full sentences.&lt;/p&gt;

&lt;p&gt;This action allows us to stream messages from OpenAI to our server function and into the database. But how does this translate to clients updating in real time? Next, let’s see how the client reactively updates as messages are created and updated.&lt;/p&gt;

&lt;h1&gt;
  
  
  Client “streaming” via subscriptions
&lt;/h1&gt;

&lt;p&gt;After the previous sections, you might be surprised how little is required to get the client to show live updating messages. I put streaming in quotes since we aren’t using HTTP streaming here - instead, we’re just using the reactivity provided out-of-the-box by Convex.&lt;/p&gt;

&lt;p&gt;On the client, we use the &lt;code&gt;useQuery&lt;/code&gt; hook, which calls the &lt;code&gt;api.messages.list&lt;/code&gt; server function in the &lt;code&gt;messages&lt;/code&gt; module, which we’ll see in a second. This hook will give us an updated list of messages every time a message is added or modified. This is a special property of a Convex &lt;a href="https://docs.convex.dev/get-started#query-functions"&gt;query&lt;/a&gt;: it tracks the database requests, and when any of the data is changed it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Invalidates the query cache (which is managed transparently by Convex).&lt;/li&gt;
&lt;li&gt;Recomputes the result.&lt;/li&gt;
&lt;li&gt;Pushes the new data over a WebSocket to all subscribed clients.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;article&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;article&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;))}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because this query is decoupled from the HTTP streaming response from OpenAI, &lt;strong&gt;multiple browsers&lt;/strong&gt; can be subscribed to updates as messages change. And if a user &lt;strong&gt;refreshes or restarts their browser&lt;/strong&gt;, it will just pick up the latest results of the query.&lt;/p&gt;

&lt;p&gt;On the server, this is the &lt;a href="https://docs.convex.dev/database/reading-data#querying-documents"&gt;query&lt;/a&gt; that grabs the most recent 100 messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;}):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Doc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Grab the most recent messages.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Reverse the list so that it's in chronological order.&lt;/span&gt;
    &lt;span class="c1"&gt;// Alternatively, return it reversed and flip the order via flex-direction.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Convex is doing some magic under the hood. If any message is inserted or updated into the database that would match this query - for instance if a new message is added or one of the first 100 messages is edited - then it will automatically re-execute this query (if there are any clients subscribed to it via &lt;code&gt;useQuery&lt;/code&gt;). If the results differ, it will push the new results over a WebSocket to the clients, which will trigger an update to the components using &lt;code&gt;useQuery&lt;/code&gt; for that query. &lt;/p&gt;

&lt;p&gt;To give you a sense of performance, &lt;code&gt;list&lt;/code&gt; takes ~17ms and &lt;code&gt;update&lt;/code&gt; takes ~7ms for me on the server, so the total latency between a new token coming from OpenAI and a new set of messages being sent to the client is very fast. The gifs in this article are real recordings, not sped up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8vpmvvnoxdq3z4ivmur.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8vpmvvnoxdq3z4ivmur.gif" alt="GPT response streaming in quickly" width="960" height="960"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;We looked at how to stream ChatGPT responses into Convex, allowing clients to watch the responses, without the flakiness of browser-based HTTP streaming requests. The full code is available &lt;a href="https://github.com/ianmacartney/streaming-chat-gpt"&gt;here&lt;/a&gt;. Let us know &lt;a href="https://convex.dev/community"&gt;in Discord&lt;/a&gt; what you think!&lt;/p&gt;

&lt;h3&gt;
  
  
  Extra Credit 🤓
&lt;/h3&gt;

&lt;p&gt;Beyond what’s covered here, it would be easy to extend this demo to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store whether a message has finished streaming by storing a boolean on the message updated at the end of the stream.&lt;/li&gt;
&lt;li&gt;Add error handling, to mark a message as failed if the stream fails. See &lt;a href="https://stack.convex.dev/full-stack-chatgpt-app#creating-the-action"&gt;this post&lt;/a&gt; for an example of updating a message in the case of failure.&lt;/li&gt;
&lt;li&gt;Schedule a function to serve as a watchdog, that marks a message as timed out if it hasn’t finished within a certain timeframe, just in case the action failed. See &lt;a href="https://stack.convex.dev/background-job-management#5-monitoring-timeouts"&gt;this post&lt;/a&gt; for more details, as well as other patterns for background jobs.&lt;/li&gt;
&lt;li&gt;Organize the messages by thread or user, using &lt;a href="https://docs.convex.dev/database/indexes/"&gt;indexes&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>react</category>
      <category>chatgpt</category>
      <category>convex</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Using Pinecone and Embeddings</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Mon, 10 Jul 2023 21:34:11 +0000</pubDate>
      <link>https://forem.com/ianmacartney/using-pinecone-and-embeddings-12gn</link>
      <guid>https://forem.com/ianmacartney/using-pinecone-and-embeddings-12gn</guid>
      <description>&lt;p&gt;Looking to implement semantic search or add on-demand context to a GPT prompt so it doesn’t just make shit up (as much)? &lt;a href="https://www.pinecone.io/"&gt;Pinecone&lt;/a&gt; and Convex are a good match when you’re looking to build an app that leverages &lt;a href="https://stack.convex.dev/the-magic-of-embeddings"&gt;embeddings&lt;/a&gt; and also has user data. Pinecone stores and queries over vectors efficiently, and Convex stores relational &amp;amp; document data with strong transaction guarantees and nifty end-to-end data reactivity.&lt;/p&gt;

&lt;p&gt;Let’s walk through how this shakes out in practice. If you want to see some code you can play around with, check out &lt;a href="https://github.com/ianmacartney/embeddings-in-convex/"&gt;this GitHub repo&lt;/a&gt; where you can add your own data and compare it and search over it using Convex and Pinecone.&lt;/p&gt;

&lt;h1&gt;
  
  
  High-level user flow
&lt;/h1&gt;

&lt;p&gt;To start, what’s an &lt;strong&gt;example&lt;/strong&gt;? With Pinecone and Convex, you can have a flow like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user submits a &lt;strong&gt;question&lt;/strong&gt; and starts &lt;strong&gt;subscribing&lt;/strong&gt; to the question’s results. Under the hood, Convex has stored the question in a &lt;strong&gt;document&lt;/strong&gt; and kicked off an asynchronous &lt;strong&gt;action&lt;/strong&gt;. If this question has been asked before, it might re-use previous results.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;action&lt;/strong&gt; creates an &lt;strong&gt;embedding&lt;/strong&gt; using a service like OpenAI or Cohere. It can persist this embedding for posterity or to be able to search for similar questions.&lt;/li&gt;
&lt;li&gt;The action uses the embedding to &lt;strong&gt;query&lt;/strong&gt; Pinecone (or any vector store) for &lt;strong&gt;related documents&lt;/strong&gt;, &lt;strong&gt;products&lt;/strong&gt;, or whatever your embeddings represent.&lt;/li&gt;
&lt;li&gt;The action stores the &lt;strong&gt;results&lt;/strong&gt; in the question &lt;strong&gt;document&lt;/strong&gt;, which automatically reflows to update the user’s client with the new data - potentially returning materialized data pulled from other documents in the database associated with the results.&lt;/li&gt;
&lt;li&gt;If this is part of a broader &lt;strong&gt;chain&lt;/strong&gt; of operations, it might use the related documents to compose a &lt;strong&gt;prompt&lt;/strong&gt; to an &lt;strong&gt;LLM&lt;/strong&gt; like &lt;a href="https://stack.convex.dev/full-stack-chatgpt-app"&gt;ChatGPT&lt;/a&gt;, using both the related documents and the question to get a more &lt;strong&gt;contextual answer&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A word on streaming updates
&lt;/h2&gt;

&lt;p&gt;At every step, updates written to the Convex database will update &lt;strong&gt;all&lt;/strong&gt; subscribed clients. Unlike raw HTTP streaming, Convex subscriptions can be trivially consumed by multiple clients in parallel and are resilient to network issues or page refreshes. The data received will be from a consistent snapshot of the database state, making it easier to reason about correctness.&lt;/p&gt;



&lt;p&gt;Many of the code snippets below can be found in &lt;a href="https://github.com/ianmacartney/embeddings-in-convex/"&gt;this GitHub repo&lt;/a&gt; which you’re welcome to play around with using your own data and API keys. If you are desperate for a hosted playground, let me know &lt;a href="http://discord.gg/convex"&gt;in Discord&lt;/a&gt;!&lt;/p&gt;

&lt;h1&gt;
  
  
  Adding data to Convex and Pinecone
&lt;/h1&gt;

&lt;p&gt;Depending on the application, you may have a large mostly-static corpus of data, or be continually adding data — which I’ll refer to as a &lt;strong&gt;source&lt;/strong&gt; below. The process looks something like the following:&lt;/p&gt;

&lt;h2&gt;
  
  
  Break up your source into bite-sized chunks.
&lt;/h2&gt;

&lt;p&gt;This helps limit how much data you pass (embedding models have context limits), as well as make the embedding more targeted. You could imagine an embedding of this whole post might not rank as highly against “How do you add data to Convex and Pinecone?” as an embedding that just covered this section. &lt;/p&gt;

&lt;p&gt;To do this, you can split it yourself or use a library like LangChain’s &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RecursiveCharacterTextSplitter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;langchain/text_splitter&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ChunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;splitTexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;textSplitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createDocuments&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;pageContent&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;splitTexts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pageContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can tune the size and algorithm to your own needs. One tip is to add some overlap, so each chunk has some text from the previous and next sections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Store the source in the database
&lt;/h2&gt;

&lt;p&gt;The Convex database is a great place to store all the metadata that your app will want. For embeddings based on text, you’ll likely even want to store the chunk of text in the database, so you can quickly access it to return as part of queries for a client or as part of a pipeline. For larger data, like video, it makes more sense to store the data in file storage.&lt;/p&gt;

&lt;p&gt;Importantly, you should &lt;strong&gt;not&lt;/strong&gt; store the text chunk directly in Pinecone metadata, as it can quickly fill up the index because &lt;strong&gt;all metadata is indexed by default in Pinecone&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;addSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DatabaseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}[]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sourceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sources&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkIds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="na"&gt;saved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunkIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chunks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;chunkIds&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a few things to note here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I’m both saving a back-reference from chunks to sources, as well as a forward reference from a source to many chunks. As discussed in &lt;a href="https://stack.convex.dev/relationship-structures-let-s-talk-about-schemas"&gt;this post on database relationships&lt;/a&gt;, this is a way to get quick access in both directions without having to define an extra index when you have a small number of relations (1024 technically but 100 is my rule of thumb).&lt;/li&gt;
&lt;li&gt;I’m saving an empty array at first, then patching it with the chunk IDs once I insert them. Convex generates unique IDs on insertion. At this time you can’t pre-allocate or specify custom primary IDs for documents.&lt;/li&gt;
&lt;li&gt;I’m creating the source with &lt;code&gt;saved: false&lt;/code&gt; - we’ll update this once we’ve saved the embeddings into Pinecone. This allows the client to know the insertion status, as well as help with transient failures, which we’ll see later on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Kick off a background action
&lt;/h2&gt;

&lt;p&gt;The Convex &lt;code&gt;mutation&lt;/code&gt; function is transactional but as a result, we can’t perform a non-transactional operation like talking to a third-party service in the middle of a mutation. A Convex &lt;code&gt;action&lt;/code&gt; is non-transactional and can talk to the outside world. One trick I like to use is to schedule an action to execute after a mutation commits, ensuring that the communication with the outside world only happens if the mutation has successfully run.&lt;/p&gt;

&lt;p&gt;Mutations in Convex are transactions and are prohibited from having non-transactional side effects like calling other cloud services. With actions you can make these sorts of calls, but how do you “call” an action from a mutation if the mutation can’t have side effects? A pattern I really like is to schedule the action from the mutation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;addEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thanks to Convex’s strong transaction guarantees, the action is only invoked if the mutation successfully commits, so you’ll never have an action running for a source that doesn’t exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an embedding
&lt;/h2&gt;

&lt;p&gt;From our &lt;a href="https://docs.convex.dev/functions/actions"&gt;action&lt;/a&gt;, we can fetch embeddings. See &lt;a href="https://stack.convex.dev/the-magic-of-embeddings"&gt;this post&lt;/a&gt; for more information on what embeddings are. See the &lt;a href="https://github.com/ianmacartney/embeddings-in-convex/blob/main/convex/lib/embeddings.ts"&gt;code for fetchEmbeddingBatch here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchEmbeddingBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Upsert into Pinecone
&lt;/h2&gt;

&lt;p&gt;Adding data into Pinecone is a straightforward operation. “Upsert” for those unfamiliar is an update if the specified &lt;code&gt;id&lt;/code&gt; already exists, otherwise it inserts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;upsertVectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chunks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// namespace&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;textLen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;chunkIndex&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tips:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We aren’t including much metadata here - in general, you should &lt;strong&gt;only store metadata&lt;/strong&gt; that you might want to use &lt;strong&gt;to limit Pinecone queries&lt;/strong&gt; - such as keywords, categories, or in this case text length&lt;sup id="fnref1"&gt;1&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;We’re &lt;strong&gt;re-using the Convex document ID&lt;/strong&gt; for the pinecone vector. This isn’t required—you could make up your own ID and store that in the Convex document—but I find it very handy. Results of Pinecone queries, without returning metadata, can be used directly with &lt;code&gt;db.get&lt;/code&gt; which is wicked fast. It also means you can fetch or delete the Pinecone vector for a given chunk, without storing an extra ID.&lt;/li&gt;
&lt;li&gt;I used the table name as the Pinecone namespace for convenience, so queries for chunks wouldn’t return vectors for other data. This isn’t required but helped me with organization and naming fatigue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: Use the &lt;code&gt;@pinecone-database/pinecone&lt;/code&gt; Pinecone client for the best experience in Convex.&lt;/p&gt;

&lt;p&gt;There are two action runtimes in Convex: our optimized runtime, and a generic node environment. When possible I prefer using the optimized runtime, so I can keep the actions in the same file as the queries and mutations, along with some performance benefits. However, our runtime doesn’t support all npm libraries. Thankfully the pinecone package doesn’t depend on any incompatible packages and just uses the &lt;code&gt;fetch&lt;/code&gt; standard under the hood. This is also why I prefer using &lt;code&gt;fetch&lt;/code&gt; and the OpenAI HTTP API directly above. See &lt;a href="https://docs.convex.dev/functions/runtimes"&gt;here&lt;/a&gt; for more information on runtimes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mark the source as “saved”
&lt;/h2&gt;

&lt;p&gt;All we need to do to notify the frontend that the data has been saved is to update the source. Any queries that reference the source document will be updated with the latest data automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;saved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;totalTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embeddingMs&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, our data is in Convex and an embedding vector is saved in Pinecone. &lt;/p&gt;

&lt;h2&gt;
  
  
  Extensions
&lt;/h2&gt;

&lt;p&gt;Beyond saving chunks of the source, you might also consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding an embedding of a summary of the whole source.&lt;/li&gt;
&lt;li&gt;Add a hierarchy of searches - where you could separately search for a category of documents and then provide that category as a &lt;a href="https://docs.pinecone.io/docs/metadata-filtering"&gt;metadata filter&lt;/a&gt; in a later query.&lt;/li&gt;
&lt;li&gt;Namespacing or otherwise segmenting user data so you never accidentally leak context between users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Searching for sources
&lt;/h1&gt;

&lt;p&gt;Similarly to inserting data, to do a semantic search over your documents, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Insert the search into a table of searches.&lt;/strong&gt; If there’s already an identical search, you could even decide to re-use those results. This is handy for iterating on large pipelines and keeping latency and costs low.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;searches&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Kick off an action&lt;/strong&gt; transactionally.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;searchId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create an embedding&lt;/strong&gt; of the search. Aside: I have a hunch there’s a lot of opportunity here for ways of transforming the raw search into a better text input for the embedding.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use the pinecone query&lt;/strong&gt; to find nearby vectors representing chunks.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;matches&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;queryRequest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chunks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update the search results&lt;/strong&gt; by running a mutation.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;searchId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;relatedChunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optional: &lt;strong&gt;store the search embedding in Pinecone&lt;/strong&gt; if you want to be able to search semantically over searches themselves!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note: you could just do steps 2-4 directly in an action if you don’t care about keeping a cache and storing the search vector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Returning results to the client
&lt;/h2&gt;

&lt;p&gt;The client can subscribe to the search document’s ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;semanticSearch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;searchId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query looks for the search and returns the related chunks along with their source’s name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;semanticSearch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;searchId&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;searchId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;searches&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;searchId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relatedChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pruneNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relatedChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;sourceName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is parallelized by using &lt;code&gt;Promise.all&lt;/code&gt; and calls to &lt;code&gt;db.get&lt;/code&gt; are cached.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;In this post, we looked at using Pinecone and Embeddings in Convex. One natural extension of this is to then use the sources as part of a GPT prompt template, but I’ll leave that for a future post. Let us know &lt;a href="https://convex.dev/community"&gt;in discord&lt;/a&gt; what you think and what you’d like to see next!&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;At the Pinecone hackathon, there was a discussion of issues of semantic rankings sometimes behaving oddly - in my case, when I searched over a corpus of famous poems for “what is the meaning of life?” one of the top hits was a “hello world” dummy text I had added. One participant mentioned that a useful filter—after listing a plethora of sophisticated ranking strategies—was to just exclude text less than 200 characters. Intuitively this makes some sense - the longer something is, as short phrases probably have higher semantic variance. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>vectordatabase</category>
      <category>openai</category>
      <category>convex</category>
    </item>
    <item>
      <title>The Magic of Embeddings</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Tue, 06 Jun 2023 16:00:00 +0000</pubDate>
      <link>https://forem.com/ianmacartney/the-magic-of-embeddings-afc</link>
      <guid>https://forem.com/ianmacartney/the-magic-of-embeddings-afc</guid>
      <description>&lt;p&gt;How similar are the strings “I care about strong ACID guarantees” and “I like transactional databases”? While there’s a number of ways we could compare these strings—syntactically or grammatically for instance—one powerful thing AI models give us is the ability to compare these semantically, using something called &lt;em&gt;embeddings&lt;/em&gt;. Given a model, such as OpenAI’s &lt;code&gt;text-embedding-ada-002&lt;/code&gt;, I can tell you that the aforementioned two strings have a similarity of 0.784, and are more similar than “I care about strong ACID guarantees” and “I like MongoDB” 😛. With embeddings, we can do a whole suite of powerful things:&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search&lt;/strong&gt; (where results are ranked by relevance to a query string)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clustering&lt;/strong&gt; (where text strings are grouped by similarity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendations&lt;/strong&gt; (where items with related text strings are recommended)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detection&lt;/strong&gt; (where outliers with little relatedness are identified)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversity measurement&lt;/strong&gt; (where similarity distributions are analyzed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification&lt;/strong&gt; (where text strings are classified by their most similar label)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article will look at working with raw OpenAI embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an embedding?
&lt;/h2&gt;

&lt;p&gt;An embedding is ultimately a list of numbers that describe a piece of text, for a given model. In the case of OpenAI’s model, it’s always a 1,536-element-long array of numbers. Furthermore, for OpenAI, the numbers are all between -1 and 1, and if you treat the array as a vector in 1,536-dimensional space, it has a magnitude of 1 (i.e. it’s “normalized to length 1” in linear algebra lingo).&lt;/p&gt;

&lt;p&gt;On a conceptual level, you can think of each number in the array as capturing some aspect of the text. Two arrays are considered similar to the degree that they have similar values in each element in the array. You don’t have to know what any of the individual values correspond to—that’s both the beauty and the mystery of embeddings—you just need to compare the resulting arrays. We’ll look at how to compute this similarity below.&lt;/p&gt;

&lt;p&gt;Depending on what model you use, you can get wildly different arrays, so it only makes sense to compare arrays that come from the same model. It also means that different models may disagree about what is similar. You could imagine one model being more sensitive to whether the string rhymes. You could fine-tune a model for your specific use case, but I’d recommend starting with a general-purpose one to start, for similar reasons as to why to generally pick Chat GPT over fine-tuned text generation models.&lt;/p&gt;

&lt;p&gt;It’s beyond the scope of this post, but it’s also worth mentioning that we’re just looking at text embeddings here, but there are also models to turn images and audio into embeddings, with similar implications.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do I get an embedding?
&lt;/h2&gt;

&lt;p&gt;There are a few models to turn text into an embedding. To use a hosted model behind an API, I’d recommend &lt;a href="https://platform.openai.com/docs/guides/embeddings"&gt;OpenAI&lt;/a&gt;, and that’s what we’ll be using in this article. For open-source options, you can check out &lt;a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#all-minilm-l6-v2"&gt;all-MiniLM-L6-v2&lt;/a&gt; or &lt;a href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2"&gt;all-mpnet-base-v2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Assuming you have an &lt;a href="https://platform.openai.com/account/api-keys"&gt;API key&lt;/a&gt; in your &lt;a href="https://docs.convex.dev/production/hosting/environment-variables"&gt;environment variables&lt;/a&gt;, you can get an embedding via a simple &lt;code&gt;fetch&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.openai.com/v1/embeddings&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-ada-002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonresults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;jsonresults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For efficiency, I’d recommend fetching multiple embeddings at once in a batch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchEmbeddingBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.openai.com/v1/embeddings&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-ada-002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonresults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allembeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonresults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nl"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}[];&lt;/span&gt;
  &lt;span class="nx"&gt;allembeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;allembeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Where should I store it?
&lt;/h2&gt;

&lt;p&gt;Once you have an embedding vector, you’ll likely want to do one of two things with it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use it to search for similar strings (i.e. search for similar embeddings).&lt;/li&gt;
&lt;li&gt;Store it to be searched against in the future.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you plan to store thousands of vectors, I’d recommend using a dedicated vector database like &lt;a href="https://www.pinecone.io/"&gt;Pinecone&lt;/a&gt;. This allows you to quickly find nearby vectors for a given input, without having to compare against every vector every time. Stay tuned for a future post on using Pinecone alongside Convex.&lt;/p&gt;

&lt;p&gt;If you don’t have many vectors, however, you can just store them directly in a normal database. In my case, if I want to suggest &lt;a href="https://stack.convex.dev"&gt;Stack&lt;/a&gt; posts similar to a given post or search, I only need to compare against fewer than 100 vectors, so I can just fetch them all and compare them in a matter of milliseconds using the &lt;a href="https://convex.dev"&gt;Convex&lt;/a&gt; database.&lt;/p&gt;

&lt;h3&gt;
  
  
  How should I store an embedding?
&lt;/h3&gt;

&lt;p&gt;If you’re storing your embeddings in Pinecone, stay tuned for a dedicated post on it, but the short answer is you configure a Pinecone “Index” and store some metadata along with the vector, so when you get results from Pinecone you can easily re-associate them with your application data. For instance, you can store the document ID for a row that you want to associate with the vector.&lt;/p&gt;

&lt;p&gt;If you’re storing the embedding in Convex, I’d advise storing it as a binary blob rather than a javascript array of numbers. Convex advises to &lt;a href="https://docs.convex.dev/database/types"&gt;not store arrays longer than 1024 elements&lt;/a&gt;. We can achieve this by converting it into a Float32Array pretty easily in JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;numberList&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// number[]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;floatArray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;numberList&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Float32Array&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;floatBytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;floatArray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// ArrayBuffer&lt;/span&gt;
&lt;span class="c1"&gt;// Save floatBytes to the DB&lt;/span&gt;
&lt;span class="c1"&gt;// Later, after you read the bytes back out:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arrayAgain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytesFromDB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Float32Array&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can represent the embedding as a field in a table &lt;a href="https://docs.convex.dev/database/schemas"&gt;in your schema&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;defineTable&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;float32Buffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;textId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;texts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, I store the vector alongside an ID of a document in the “texts” table.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to compare embeddings in JavaScript
&lt;/h2&gt;

&lt;p&gt;If you’re looking to compare two embeddings from OpenAI without using a vector database, it’s very simple. There’s &lt;a href="https://www.pinecone.io/learn/roughly-explained/distance-between-vectors/"&gt;a few ways of comparing vectors&lt;/a&gt;, including Euclidean distance, dot product, and cosine similarity. Thankfully, because OpenAI normalizes all the vectors to be length 1, they will all give the same rankings! With a simple &lt;a href="https://www.notion.so/Product-f8d495cd1e29469289011becf658f547?pvs=21"&gt;dot product&lt;/a&gt; you can get a similarity score ranging from -1 (totally unrelated) to 1 (incredibly similar). There are optimized libraries to do it, but for my purposes, this simple function suffices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * Compares two vectors by doing a dot product.
 *
 * Assuming both vectors are normalized to length 1, it will be in [-1, 1].
 * @returns [-1, 1] based on similarity. (1 is the same, -1 is the opposite)
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;vectorA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;val&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;vectorB&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;In this example, let’s make a function (a Convex &lt;a href="https://docs.convex.dev/functions/query-functions"&gt;query&lt;/a&gt; in this case) that returns all of the vectors and their similarity scores in order based on some query vector, assuming a table of &lt;code&gt;vectors&lt;/code&gt; as we defined above, and the &lt;code&gt;compare&lt;/code&gt; function we just defined.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;compareTo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;vectorId&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectorId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targetArray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;float32Buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vectors&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;vectors&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectorId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nx"&gt;targetArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;float32Buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;textId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;vectorId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_id&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post, we looked at embeddings, why they’re useful, and how we can store and use them in Convex. I’ll be making more posts on working with embeddings, including chunking long input into multiple embeddings and using Pinecone alongside Convex soon. Let us know in &lt;a href="https://convex.dev/community"&gt;our Discord&lt;/a&gt; what you think!&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Copied from &lt;a href="https://platform.openai.com/docs/guides/embeddings/what-are-embeddings"&gt;OpenAI’s guide&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>javascript</category>
      <category>openai</category>
      <category>vectordatabase</category>
      <category>convex</category>
    </item>
    <item>
      <title>Building a Full-Stack ChatGPT app</title>
      <dc:creator>Ian Macartney</dc:creator>
      <pubDate>Mon, 05 Jun 2023 21:11:38 +0000</pubDate>
      <link>https://forem.com/ianmacartney/building-a-full-stack-chatgpt-app-58nm</link>
      <guid>https://forem.com/ianmacartney/building-a-full-stack-chatgpt-app-58nm</guid>
      <description>&lt;p&gt;In this post, we’ll walk through putting together a full-stack chat app, and add some features. &lt;/p&gt;

&lt;p&gt;We’ll use React and Vite, though any React framework works. We’ll use &lt;a href="https://convex.dev"&gt;Convex&lt;/a&gt; as the backend, where our server-side functions will run, and where we’ll store our app’s data. &lt;/p&gt;

&lt;p&gt;To see this in action, the code is &lt;a href="https://github.com/ianmacartney/convex-chat-gpt"&gt;here&lt;/a&gt;, and a running version is &lt;a href="https://convex-chat-gpt-f6rq.vercel.app/"&gt;here&lt;/a&gt; (for now). You can clone it and run it yourself with a few configurations in the README, but read on to see a step by step guide on building your own. Also, the published version uses the "authed" branch in the repo, in case you want to see how simple it is to add auth.&lt;/p&gt;

&lt;p&gt;To run this yourself, you’ll need to make an &lt;a href="https://platform.openai.com/signup"&gt;OpenAI account&lt;/a&gt; and &lt;a href="https://platform.openai.com/account/api-keys"&gt;get an API key&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re familiar with Convex, you can skip ahead to step 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  0. Bootstrap a Vite React app
&lt;/h2&gt;

&lt;p&gt;If you don’t already have an app, we can make one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create vite@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I picked &lt;code&gt;convex-chatgpt&lt;/code&gt; as the project name, &lt;code&gt;React&lt;/code&gt; as the framework, and &lt;code&gt;Javascript&lt;/code&gt; as the variant.&lt;/p&gt;

&lt;p&gt;At this point, if we run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we have a locally running webapp.&lt;/p&gt;

&lt;p&gt;Let’s change &lt;code&gt;src/App.jsx&lt;/code&gt; to list messages and have a form to submit messages:&lt;/p&gt;

&lt;p&gt;App.jsx&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./App.css&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hello, world&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Trying to send: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;newMessageText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setNewMessageText&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"App"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;: &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;whiteSpace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pre-wrap&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;span&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;form&lt;/span&gt; &lt;span class="na"&gt;onSubmit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;preventDefault&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nf"&gt;setNewMessageText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newMessageText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;newMessageText&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="na"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setNewMessageText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="na"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Write a message…"&lt;/span&gt;
        &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"submit"&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Send"&lt;/span&gt; &lt;span class="na"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;newMessageText&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;form&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, we have an app that shows a static list of messages and logs to the console when trying to send a message.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Add Convex
&lt;/h2&gt;

&lt;p&gt;This is similar to the Convex &lt;a href="https://docs.convex.dev/quickstart/react"&gt;quickstart&lt;/a&gt;. In a new terminal (leave the other one running &lt;code&gt;npm run dev&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;convex
npx convex init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you haven’t used convex before, you’ll be prompted to log in, create an account, etc. I picked &lt;code&gt;convex-chatgpt&lt;/code&gt; as the project name.&lt;/p&gt;

&lt;p&gt;Let’s add the ability to send messages and list messages. We will add functions that run in Convex (on a server) that will read and write to the database with a &lt;a href="https://docs.convex.dev/functions/query-functions"&gt;query&lt;/a&gt; and &lt;a href="https://docs.convex.dev/functions/mutation-functions"&gt;mutation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Add &lt;code&gt;convex/messages.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mutation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./_generated/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;send&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;botMessageId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a separate terminal from the one running &lt;code&gt;npm run dev&lt;/code&gt;, run &lt;code&gt;npx convex dev&lt;/code&gt;. This will deploy these functions to your new Convex backend, and as you edit the functions, it will automatically re-deploy. This will ask you to save the deployment URL into your &lt;code&gt;.env&lt;/code&gt; and &lt;code&gt;.env.local&lt;/code&gt; files: these point at your convex backends for your production and development deployments. We’ll just be working with the development deployment.&lt;/p&gt;

&lt;p&gt;To access Convex from inside React, you’ll need to add a &lt;code&gt;&amp;lt;ConvexProvider&amp;gt;&lt;/code&gt; context at the top level of your app. Edit &lt;code&gt;main.jsx&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ConvexProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ConvexReactClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;convex/react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;convex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConvexReactClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;VITE_CONVEX_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ReactDOM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createRoot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;root&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StrictMode&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ConvexProvider&lt;/span&gt; &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;convex&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;ConvexProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StrictMode&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then use these functions from &lt;code&gt;App.jsx&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages:list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages:send&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great! Now we have messages being written to and from the database. Try it out! Those new to Convex will be surprised to see that adding new messages will automatically result in the &lt;code&gt;useQuery("messages:list")&lt;/code&gt; hook returning new messages. This is part of the magic of Convex. Learn more about it &lt;a href="https://docs.convex.dev/understanding/convex-fundamentals/functions#reactivity"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You can check out your data in the dashboard: &lt;code&gt;npx convex dashboard&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;You may note that the query and mutation are named by the &lt;code&gt;filename:function&lt;/code&gt;. See more about this &lt;a href="https://docs.convex.dev/using/writing-convex-functions#defining-convex-functions"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You’ll notice every time we send a message, there’s a “…” message from the assistant. Next let’s update that message from chatGPT. &lt;/p&gt;

&lt;h2&gt;
  
  
  2. Send a message to the ChatGPT API
&lt;/h2&gt;

&lt;p&gt;So far we’ve been working with a &lt;a href="https://docs.convex.dev/functions/query-functions"&gt;query&lt;/a&gt; and &lt;a href="https://docs.convex.dev/functions/mutation-functions"&gt;mutation&lt;/a&gt;, which are Convex functions that interact with the database. In order to interact with external services, we need to do that in an “&lt;a href="https://docs.convex.dev/functions/actions"&gt;action&lt;/a&gt;” - which is a Convex function that isn’t inside a deterministic environment and isn’t part of a database transaction. This frees us to do things with side effects, such as making requests to OpenAI’s API. We’ll do this in a few steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fetching messages to send to ChatGPT
&lt;/h3&gt;

&lt;p&gt;In our &lt;code&gt;send&lt;/code&gt; mutation, we can grab the latest 10 messages to send to ChatGPT using a database query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;send&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;//... insert messages to the table&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;neq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;body&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botMessageId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This orders messages in descending order (sorted by creation time unless you use a different &lt;a href="https://docs.convex.dev/database/indexes"&gt;index&lt;/a&gt;), filters out messages with no body (for instance, the new placeholder system message), and takes the first 21 (which are the 21 most recent messages). It then reverses the list so it’s ordered in ascending time order. I picked 21 so that we’ll have 10 pairs of user/bot messages, followed by the latest prompt. It returns the messages to be used as input to the chat completion API, which we'll add next.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the action
&lt;/h3&gt;

&lt;p&gt;We’ll use the &lt;code&gt;openai&lt;/code&gt; npm package. You can install it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make a new file: &lt;code&gt;convex/openai.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use node&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OpenAIApi&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../_generated/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;runMutation&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botMessageId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages:send&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Grab the API key from environment variables&lt;/span&gt;
  &lt;span class="c1"&gt;// Specify this in your dashboard: `npx convex dashboard`&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Add your OPENAI_API_KEY as an env variable&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;configuration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAIApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openaiResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createChatCompletion&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;})),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OpenAI error: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Response: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will first send the messages with the send mutation we modified, getting in return the list of messages and message ID to update with the bot’s message. It will then make a request to the &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; model, passing in one system message with instructions (hard-coded for now), followed by each message. We’re turning our body &amp;amp; author fields into “role” and “content”. See their docs &lt;a href="https://platform.openai.com/docs/guides/chat"&gt;here&lt;/a&gt; for more details on the API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add your API key to the dashboard
&lt;/h3&gt;

&lt;p&gt;To authenticate requests to OpenAI, you’ll need to add your &lt;a href="https://beta.openai.com/account/api-keys"&gt;API key&lt;/a&gt; to the Convex dashboard, so your action (which runs on Convex’s servers) has access to it. Go to your dashboard with &lt;code&gt;npx convex dashboard&lt;/code&gt; and on the left panel go into Settings and add the key with the name &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;. While you’re there, you can add it as an env variable to both your production and development deployments by toggling the dropdown in the left panel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F_G2ookc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/6ed549cf10845a91f6b6c8697a165b253705f4b8-1166x629.png%3Fw%3D450" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F_G2ookc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.sanity.io/images/ts10onj4/production/6ed549cf10845a91f6b6c8697a165b253705f4b8-1166x629.png%3Fw%3D450" alt="Dashboard side panel" width="450" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Triggering the action from the UI
&lt;/h3&gt;

&lt;p&gt;We can change our call to &lt;code&gt;useQuery("messages:send")&lt;/code&gt; to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai:chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the action is addressed by the path to the file (&lt;code&gt;openai&lt;/code&gt;) and the function (&lt;code&gt;chat&lt;/code&gt;). The API is the same (taking one argument, &lt;code&gt;body&lt;/code&gt;), so it’s a drop-in replacement!&lt;/p&gt;

&lt;p&gt;At this point, if you run it, it will send a request to ChatGPT and console.log the response. Convex prints Convex function logs to the browser’s console, though you can also see them in the &lt;a href="https://dashboard.convex.dev/"&gt;dashboard&lt;/a&gt;, to prove to yourself it’s running in the cloud. Now let’s show them in the UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Updating the message with the reply
&lt;/h3&gt;

&lt;p&gt;In order to get the response into the chat message, we’ll want to update the empty placeholder message we added. Treat each call to a mutation as a transaction. Once we called our first mutation, the messages were committed to the database and the UI could show the new messages before doing the slow call to OpenAI. Once we get the response, we update the bot’s message and the UI will update automatically via the &lt;code&gt;messages:list&lt;/code&gt; query. We’ll add a new mutation in the &lt;code&gt;convex/messages.js&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// An `internalMuation` can only be called from other server functions.&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;internalMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;patch&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This patches the specified message. You could pass in &lt;code&gt;{body: "hi"}&lt;/code&gt; to change the message’s body to “hi,” for example. Since the first mutation returned the &lt;code&gt;botMessageId&lt;/code&gt;, we can use that from the action in &lt;code&gt;convex/openai.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;runMutation&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botMessageId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages:send&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Call OpenAI&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages:update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;botMessageId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="na"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c1"&gt;// Track how many tokens we're using for various messages&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="c1"&gt;// How long it took OpenAI&lt;/span&gt;
      &lt;span class="na"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;openaiResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai-processing-ms&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While we’re here, we can store a bunch of interesting data into the message along with the body. Convex’s database is flexible enough to store &lt;a href="https://docs.convex.dev/using/types"&gt;strings, numbers, javascript objects, &lt;code&gt;Map&lt;/code&gt;s, etc.&lt;/a&gt; while also letting you nail down a &lt;a href="https://docs.convex.dev/database/schemas"&gt;schema&lt;/a&gt; later to get typescript types, autocomplete, etc. &lt;/p&gt;

&lt;p&gt;🎉 Now you have chatGPT replying to your messages!&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post, we made a full-stack web app to chat with OpenAI’s ChatGPT API. Let us know &lt;a href="https://convex.dev/community"&gt;in Discord&lt;/a&gt; what you think, and if you’d like to see some extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://stack.convex.dev/adding-personality-to-chatgpt-3"&gt;Customizing the identity you’re talking to.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stack.convex.dev/moderating-chatgpt-content-full-stack"&gt;Moderating the identities and messages to prevent harmful content.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Create new message threads. (See the &lt;a href="https://github.com/ianmacartney/convex-chat-gpt"&gt;code&lt;/a&gt; for how it's done!)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>react</category>
      <category>openai</category>
      <category>convex</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
