<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kong</title>
    <description>The latest articles on Forem by Kong (@konghq).</description>
    <link>https://forem.com/konghq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F13211%2F63f22eae-9468-4f4b-bdbe-2f4f7977490a.png</url>
      <title>Forem: Kong</title>
      <link>https://forem.com/konghq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/konghq"/>
    <language>en</language>
    <item>
      <title>💰Monetize Your AI Agents with LangChain and Kong</title>
      <dc:creator>Teja Kummarikuntla</dc:creator>
      <pubDate>Tue, 05 May 2026 15:26:47 +0000</pubDate>
      <link>https://forem.com/konghq/how-to-monetize-your-ai-agents-with-langchain-and-kong-1fn0</link>
      <guid>https://forem.com/konghq/how-to-monetize-your-ai-agents-with-langchain-and-kong-1fn0</guid>
      <description>&lt;p&gt;Say you built an AI agent and customers are starting to pay for it. Sooner or later you'll want to charge them by what they actually use, because some customers hammer the agent all day while others send a handful of messages a week. A single flat fee loses money on the heavy users and overcharges the light ones.&lt;/p&gt;

&lt;p&gt;The billing problem is the same whether your agent runs on your own model (self-hosted, fine-tuned, or trained from scratch) or calls a third-party API like &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt;, or &lt;strong&gt;Gemini&lt;/strong&gt;. You still need to know which customer made which call, count the tokens it used, and turn that into a dollar amount on a real invoice. That mapping (request → customer → token count → dollar amount → invoice) is yours to build, and that's what this tutorial sets up.&lt;/p&gt;

&lt;p&gt;The agent uses &lt;a href="https://js.langchain.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/a&gt;, which sits one layer above the model so the same metering code works regardless of what's behind it. The example runs on OpenAI's &lt;code&gt;gpt-4o-mini&lt;/code&gt; for convenience, but swap the chat model and nothing else changes. A small LangChain callback records each call's input and output token counts, tagged with the customer ID. Those records flow to &lt;a href="https://developer.konghq.com/metering-and-billing/" rel="noopener noreferrer"&gt;&lt;strong&gt;Kong Konnect Metering &amp;amp; Billing&lt;/strong&gt;&lt;/a&gt;, which keeps a running per-customer tally, applies your prices (input and output tokens can be priced separately), and produces invoices on a monthly cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  See it in action first
&lt;/h2&gt;

&lt;p&gt;Before getting into the setup, here is what the finished pipeline looks like end to end. The agent runs on one side and reports the tokens it just used. Those same tokens land as a billable line item on the customer's invoice in Kong on the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Agent App
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88hdvr77xlr14kc2zttj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88hdvr77xlr14kc2zttj.png" alt=" " width="800" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user types &lt;code&gt;Hello world&lt;/code&gt;. The agent replies with &lt;code&gt;Hello! How can I assist you today?&lt;/code&gt;. Both ends happen to land on &lt;strong&gt;9 tokens&lt;/strong&gt;: the input count is 9 rather than 2 because OpenAI wraps the prompt in chat-message formatting, which adds a few tokens beyond the literal words, and the output simply happens to be 9 as well. The agent fires off one record for the input tokens and another for the output tokens, both tagged with the customer (&lt;code&gt;acme&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Metering and Billing the Agent in Kong
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ylipk6x82v51tjbd4yc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ylipk6x82v51tjbd4yc.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same call now sits there as a real billable line item. With a simple test pricing of $1 per input token and $2 per output token, the math lines up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Input: 9 tokens × $1 = &lt;strong&gt;$9&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output: 9 tokens × $2 = &lt;strong&gt;$18&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total: &lt;strong&gt;$27&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same numbers on both sides of the pipeline. That is what we are about to build.&lt;/p&gt;

&lt;p&gt;Let's go through it step by step.&lt;/p&gt;

&lt;p&gt;AI Agent App: &lt;a href="https://github.com/tejakummarikuntla/llm-metering-langchian-kong" rel="noopener noreferrer"&gt;github.com/tejakummarikuntla/llm-metering-langchian-kong&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtfliv9s93re0ggm6ogw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtfliv9s93re0ggm6ogw.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every LLM call produces two CloudEvents. One carries the prompt token count, the other carries the response token count. Both events carry a &lt;code&gt;subject&lt;/code&gt; field set to the customer identifier. Kong groups events by subject, sums the token field, multiplies by the rate card configured on the customer's plan, and rolls everything into invoices on the billing cycle.&lt;/p&gt;
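&lt;p&gt;As a concrete sketch of that event pair, here is how the two CloudEvents for one call could be assembled. The field names match the handler walked through later in this tutorial; the &lt;code&gt;buildTokenEvents&lt;/code&gt; helper itself is illustrative, not code from the repo:&lt;/p&gt;

```typescript
// Illustrative helper (not from the repo): the two CloudEvents a single
// LLM call produces, per the architecture described above.
interface TokenEvent {
  specversion: '1.0';
  id: string;        // runId + suffix; with `source`, this is the dedupe key
  source: string;
  type: string;
  subject: string;   // customer identifier Kong groups events by
  data: { type: 'input' | 'output'; tokens: number };
}

function buildTokenEvents(
  runId: string,
  subject: string,
  promptTokens: number,
  completionTokens: number,
): [TokenEvent, TokenEvent] {
  const base = {
    specversion: '1.0' as const,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
  };
  return [
    { ...base, id: `${runId}-input`, data: { type: 'input' as const, tokens: promptTokens } },
    { ...base, id: `${runId}-output`, data: { type: 'output' as const, tokens: completionTokens } },
  ];
}
```

&lt;p&gt;Deriving both &lt;code&gt;id&lt;/code&gt;s from the same &lt;code&gt;runId&lt;/code&gt; is what makes retries safe: Kong deduplicates on &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;, so resending either event cannot double-bill.&lt;/p&gt;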

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this stack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kong Konnect Metering &amp;amp; Billing&lt;/strong&gt; fits this tutorial for two specific reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Open source core.&lt;/strong&gt; The metering side is built on &lt;a href="https://openmeter.io/" rel="noopener noreferrer"&gt;OpenMeter&lt;/a&gt;, which is open source. You can self-host the metering pipeline, or use the managed Konnect service. &lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Configurable billing engine.&lt;/strong&gt; Meters, features, plans, rate cards, and subscriptions are first-class primitives, configured in the portal rather than shipped as code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're not replacing Stripe here; you're using Kong as the metering and invoicing layer that feeds it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What you will build&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; A LangChain callback handler that emits two CloudEvents per LLM call&lt;/li&gt;
&lt;li&gt; A Kong meter that filters &lt;code&gt;kong.llm_request&lt;/code&gt; events and sums the &lt;code&gt;tokens&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt; Two features (input and output tokens) feeding a plan with separate rate cards&lt;/li&gt;
&lt;li&gt; A customer subscribed to that plan, with metered usage and dollar values in the Konnect portal&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Node.js 22.6 or higher&lt;/li&gt;
&lt;li&gt;  pnpm: &lt;code&gt;npm install -g pnpm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  An OpenAI API key&lt;/li&gt;
&lt;li&gt;  A free Kong Konnect account: &lt;a href="https://konghq.com" rel="noopener noreferrer"&gt;konghq.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  A Konnect Personal Access Token with Metering &amp;amp; Billing write permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tutorial map&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Add Metering into the AI Agent app&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Clone the AI agent app&lt;/li&gt;
&lt;li&gt; Configure environment variables&lt;/li&gt;
&lt;li&gt; Walk through the codebase&lt;/li&gt;
&lt;li&gt; Run the AI Agent app&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Connect to Kong Metering &amp;amp; Billing&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create a &lt;strong&gt;Meter&lt;/strong&gt; in Kong M&amp;amp;B&lt;/li&gt;
&lt;li&gt; Create &lt;strong&gt;Features&lt;/strong&gt; for input and output tokens&lt;/li&gt;
&lt;li&gt; Create a &lt;strong&gt;Plan&lt;/strong&gt; with Rate Cards&lt;/li&gt;
&lt;li&gt; Create the &lt;strong&gt;Customer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Add a &lt;strong&gt;Subscription&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Inspect usage and &lt;strong&gt;Invoices&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Connect a Payment provider&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Part 1: Add Metering into the AI Agent app&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clone the AI Agent app
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tejakummarikuntla/llm-metering-langchian-kong
&lt;span class="nb"&gt;cd &lt;/span&gt;llm-metering-langchian-kong
pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reference implementation is two TypeScript files. &lt;code&gt;handler.ts&lt;/code&gt; is the metering callback. &lt;code&gt;index.ts&lt;/code&gt; is a small chain that reads a prompt from stdin so you have something to exercise the handler with. No sidecar service, no separate ingestion worker, no extra runtime dependency beyond LangChain and the OpenAI client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure environment variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;.env&lt;/code&gt; and fill in real values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API_URL=https://us.api.konghq.tech/v3/openmeter/events
API_KEY=your-konnect-personal-access-token
SUBJECT=acme
MODEL=gpt-4o-mini
OPENAI_API_KEY=your-openai-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;API_URL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kong Konnect ingestion endpoint. The default is the US region. EU organizations use &lt;code&gt;https://eu.api.konghq.tech/v3/openmeter/events&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Konnect &lt;a href="https://docs.konghq.com/konnect/api/" rel="noopener noreferrer"&gt;Personal Access Token&lt;/a&gt; with Metering &amp;amp; Billing write scope.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SUBJECT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Customer identifier attached to every event. Use &lt;code&gt;acme&lt;/code&gt; for testing. In production this comes from your authenticated session, not an env var.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MODEL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any chat-completion model. &lt;code&gt;gpt-4o-mini&lt;/code&gt; keeps testing cheap.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standard &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
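&lt;p&gt;If you want the app to fail fast when one of these values is missing, a small startup check does it. This is a sketch rather than the repo's actual loading code; how &lt;code&gt;.env&lt;/code&gt; reaches &lt;code&gt;process.env&lt;/code&gt; is up to you (dotenv, or &lt;code&gt;node --env-file=.env&lt;/code&gt; on Node 20.6+):&lt;/p&gt;

```typescript
// Sketch: validate the five variables from the table above at startup.
const REQUIRED = ['API_URL', 'API_KEY', 'SUBJECT', 'MODEL', 'OPENAI_API_KEY'] as const;

type Config = Record<(typeof REQUIRED)[number], string>;

function readConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Fail at boot, not on the first metered request.
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  return Object.fromEntries(REQUIRED.map((key) => [key, env[key]!])) as Config;
}
```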

&lt;h3&gt;
  
  
  Walk through the codebase
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MeteringCallbackHandler&lt;/code&gt; extends LangChain's &lt;a href="https://js.langchain.com/docs/concepts/callbacks/" rel="noopener noreferrer"&gt;&lt;code&gt;BaseCallbackHandler&lt;/code&gt;&lt;/a&gt; and implements two of its lifecycle hooks. Callbacks are the natural seam for metering: they fire at the same place token counts are reported, you do not need to subclass the LLM client, and the LangChain &lt;code&gt;runId&lt;/code&gt; gives you a stable event ID for free.&lt;/p&gt;

&lt;h4&gt;
  
  
  handleLLMStart
&lt;/h4&gt;

&lt;p&gt;This hook fires immediately before the model is called. The handler captures run metadata so the LLM end hook can build a CloudEvent with the right customer attribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handleLLMStart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Serialized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_extraParams&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_tags&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parentMetadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentMetadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parentMetadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The parent run check matters. LLM calls almost always run inside a chain, agent, or tool-calling flow, which LangChain models as a parent run. When you set metadata at &lt;code&gt;chain.invoke({}, { metadata: { subject: 'acme' } })&lt;/code&gt;, LangChain attaches it to the chain run, not the child LLM run. Without merging parent metadata into the child, the LLM end hook reads an empty metadata object and the &lt;code&gt;subject&lt;/code&gt; is lost.&lt;/p&gt;
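&lt;p&gt;The merge is small enough to see in isolation. Here it is as a standalone function mirroring the handler above (a sketch for illustration, not code from the repo):&lt;/p&gt;

```typescript
// Sketch of handleLLMStart's parent-metadata merge as a pure helper.
// runMetadata maps runId -> metadata, as in the handler above.
function resolveRunMetadata(
  runMetadata: Map<string, Record<string, unknown>>,
  runId: string,
  parentRunId: string | undefined,
  metadata: Record<string, unknown> = {},
): Record<string, unknown> {
  if (parentRunId) {
    const parentMetadata = runMetadata.get(parentRunId);
    // This copy is where `subject`, set at chain.invoke time on the
    // parent (chain) run, lands on the child LLM run.
    if (parentMetadata) Object.assign(metadata, parentMetadata);
  }
  runMetadata.set(runId, metadata);
  return metadata;
}
```

&lt;p&gt;Drop the &lt;code&gt;Object.assign&lt;/code&gt; and the child run's metadata stays empty, which is exactly the lost-&lt;code&gt;subject&lt;/code&gt; failure described above.&lt;/p&gt;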

&lt;h4&gt;
  
  
  handleLLMEnd
&lt;/h4&gt;

&lt;p&gt;This hook fires after the model returns. The handler reads token counts from &lt;code&gt;output.llmOutput.tokenUsage&lt;/code&gt; (the field OpenAI fills on non-streaming completions), builds two CloudEvents, and posts each to the Kong ingestion endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handleLLMEnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;promptTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tokenUsage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
    &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;estimatedTokenUsage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
    &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_model_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;specversion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-input`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kong.llm_request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;promptTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;specversion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-output`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kong.llm_request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputEvent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;outputEvent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few decisions in this block matter for production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;id&lt;/code&gt; field combines the LangChain &lt;code&gt;runId&lt;/code&gt; with &lt;code&gt;-input&lt;/code&gt; or &lt;code&gt;-output&lt;/code&gt;. Kong &lt;a href="https://openmeter.io/docs/getting-started/concepts" rel="noopener noreferrer"&gt;deduplicates events by &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;&lt;/a&gt;, so retries do not double-bill.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;data.type&lt;/code&gt; separates input from output tokens at the event level. That separation is what makes per-token-class pricing possible without running two meters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anything you pass in &lt;code&gt;metadata&lt;/code&gt; at &lt;code&gt;chain.invoke&lt;/code&gt; time spreads into &lt;code&gt;data&lt;/code&gt;. Tenant tier, region, feature flag: add it once at invoke time and filter on it in the meter. No handler changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ingest&lt;/code&gt; is a plain &lt;code&gt;fetch&lt;/code&gt; POST with a Bearer token header. No SDK, no batching layer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
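&lt;p&gt;For reference, that &lt;code&gt;ingest&lt;/code&gt; call can be as small as the following. This is a sketch with an injectable &lt;code&gt;fetch&lt;/code&gt; so it is easy to test; the repo's version may differ, and &lt;code&gt;application/cloudevents+json&lt;/code&gt; is the content type OpenMeter's event API documents for single events:&lt;/p&gt;

```typescript
// Sketch of a minimal ingest: one CloudEvent per POST, Bearer auth,
// injectable fetch for testability. A failed ingest is logged rather than
// thrown, so a metering hiccup never fails the user's request.
async function ingest(
  event: Record<string, unknown>,
  apiUrl: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<boolean> {
  const res = await fetchImpl(apiUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/cloudevents+json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(event),
  });
  if (!res.ok) console.error(`metering ingest failed: ${res.status}`);
  return res.ok;
}
```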

&lt;h4&gt;
  
  
  Read the agent entry point
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;index.ts&lt;/code&gt; wires &lt;code&gt;ChatOpenAI&lt;/code&gt; up with the metering handler and runs a small one-shot chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MeteringCallbackHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;openaiApiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{input}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StringOutputParser&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;kong&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;strong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two lines do the integration. &lt;code&gt;callbacks: [handler]&lt;/code&gt; on the &lt;code&gt;ChatOpenAI&lt;/code&gt; instance attaches the handler to every call made through it. The &lt;code&gt;metadata&lt;/code&gt; block on &lt;code&gt;chain.invoke&lt;/code&gt; carries the customer identifier into the run metadata that &lt;code&gt;handleLLMStart&lt;/code&gt; reads. The &lt;code&gt;kong: 'strong'&lt;/code&gt; field is just a metadata pass-through demonstration: anything you add in that block lands inside &lt;code&gt;data&lt;/code&gt; on the CloudEvent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run the AI Agent app
&lt;/h3&gt;

&lt;h5&gt;
  
  
  Start the app:
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Explain how token-based usage billing works for LLM applications.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler logs both events as it sends them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-input',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'input', tokens: 18, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

AI: Token-based usage billing charges customers based on the number of tokens consumed...

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-output',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'output', tokens: 156, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both events are in Kong. They will not appear in a customer's usage view or invoice until Part 2 is set up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: &lt;strong&gt;Connect to Kong Metering &amp;amp; Billing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The next sections build the meter, features, plan, and subscription that turn the raw event stream into priced, per-customer usage. The flow follows the &lt;a href="https://developer.konghq.com/metering-and-billing/concepts/" rel="noopener noreferrer"&gt;Konnect M&amp;amp;B concepts model&lt;/a&gt;: events feed meters, meters feed features, features attach to plans through rate cards, customers subscribe to plans.&lt;/p&gt;

&lt;p&gt;Open &lt;a href="https://cloud.konghq.com" rel="noopener noreferrer"&gt;cloud.konghq.com&lt;/a&gt; and confirm you're in the region matching &lt;code&gt;API_URL&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the LLM Tokens meter
&lt;/h3&gt;

&lt;p&gt;A meter is a continuously-running query over the event stream. It picks events that match a filter, applies an aggregation, and exposes the result as a numeric usage value.&lt;/p&gt;
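&lt;p&gt;Conceptually, the meter behaves like the following sketch: filter the stream by event type, then sum &lt;code&gt;data.tokens&lt;/code&gt; per subject. This is illustrative TypeScript only; Kong runs the real query server-side.&lt;/p&gt;

```typescript
// Illustrative sketch of a SUM meter: filter by event type,
// then accumulate data.tokens per subject. Not Kong's implementation.
type MeterEvent = { type: string; subject: string; data: { tokens: number } };

function sumTokensBySubject(
  events: MeterEvent[],
  eventType: string,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.type !== eventType) continue; // the event type filter
    totals.set(e.subject, (totals.get(e.subject) ?? 0) + e.data.tokens); // SUM
  }
  return totals;
}

// The two events from the example run, plus one that the filter drops:
const events: MeterEvent[] = [
  { type: 'kong.llm_request', subject: 'acme', data: { tokens: 18 } },
  { type: 'kong.llm_request', subject: 'acme', data: { tokens: 156 } },
  { type: 'unrelated.event', subject: 'acme', data: { tokens: 999 } },
];
console.log(sumTokensBySubject(events, 'kong.llm_request').get('acme')); // 174
```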

&lt;p&gt;In the Konnect console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Left navigation: &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; → &lt;strong&gt;Metering&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Top right: &lt;strong&gt;Create Meter&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Choose template: &lt;strong&gt;LLM Tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uqbf3pi7zss32qq5yqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uqbf3pi7zss32qq5yqn.png" alt=" " width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The LLM Tokens template fills in the right defaults for this handler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Event type filter&lt;/strong&gt;: &lt;code&gt;kong.llm_request&lt;/code&gt; (matches the &lt;code&gt;type&lt;/code&gt; field on every CloudEvent the handler emits)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Aggregation&lt;/strong&gt;: Sum&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Value property&lt;/strong&gt;: &lt;code&gt;tokens&lt;/code&gt; (reads &lt;code&gt;data.tokens&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  CLI alternative
&lt;/h4&gt;

&lt;p&gt;The same meter can be created through the &lt;a href="https://developer.konghq.com/api/konnect/metering-and-billing-api/v3/" rel="noopener noreferrer"&gt;Konnect API&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://us.api.konghq.tech/v3/openmeter/meters &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "LLM Tokens",
    "key": "llm-tokens",
    "description": "LLM token usage",
    "event_type": "kong.llm_request",
    "aggregation": "SUM",
    "value_property": "$.tokens",
    "dimensions": { "type": "$.type", "provider": "$.provider", "model": "$.model" }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Create Features
&lt;/h3&gt;

&lt;p&gt;Features turn a single meter into multiple billable units. You need two: one exposing only input-token events, one exposing only output-token events. The split is what makes asymmetric pricing possible (most providers charge more for output than input).&lt;/p&gt;
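&lt;p&gt;In stream terms, a feature is the meter plus one extra filter on the &lt;code&gt;type&lt;/code&gt; dimension. The sketch below is illustrative only, not how Kong evaluates features internally:&lt;/p&gt;

```typescript
// Sketch only: a feature restricts the meter to one slice of the `type` dimension.
type TokenEvent = { data: { type: 'input' | 'output'; tokens: number } };

const featureTotal = (events: TokenEvent[], kind: 'input' | 'output'): number =>
  events
    .filter((e) => e.data.type === kind) // the meter group filter
    .reduce((sum, e) => sum + e.data.tokens, 0); // same SUM aggregation

// The example run's two events:
const events: TokenEvent[] = [
  { data: { type: 'input', tokens: 18 } },
  { data: { type: 'output', tokens: 156 } },
];
console.log(featureTotal(events, 'input'));  // 18
console.log(featureTotal(events, 'output')); // 156
```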

&lt;p&gt;Left navigation: &lt;strong&gt;Product Catalog&lt;/strong&gt; → &lt;strong&gt;Features&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdxtrtt6wl7mxbos09wi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdxtrtt6wl7mxbos09wi.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Input token feature
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Create Feature&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Input Token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: auto-fills from the name (&lt;code&gt;input_token&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter&lt;/strong&gt;: &lt;code&gt;LLM Tokens&lt;/code&gt; (from the dropdown)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter Group Filters&lt;/strong&gt;: add a single filter

&lt;ul&gt;
&lt;li&gt;  Field: &lt;code&gt;type&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Operator: equals&lt;/li&gt;
&lt;li&gt;  Value: &lt;code&gt;input&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Save.&lt;/p&gt;

&lt;h4&gt;
  
  
  Output token feature
&lt;/h4&gt;

&lt;p&gt;Same form, output values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Output Token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: auto-fills from the name (&lt;code&gt;output_token&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter&lt;/strong&gt;: &lt;code&gt;LLM Tokens&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter Group Filter&lt;/strong&gt;: &lt;code&gt;type&lt;/code&gt; equals &lt;code&gt;output&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save.&lt;/p&gt;

&lt;p&gt;The same meter now feeds two features, each filtered to a different event subset.&lt;/p&gt;




&lt;h3&gt;
  
  
  Create a Plan with usage-based Rate Cards
&lt;/h3&gt;

&lt;p&gt;A plan is what a customer subscribes to. Inside it, rate cards attach prices to features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Catalog&lt;/strong&gt; → &lt;strong&gt;Plans&lt;/strong&gt; → &lt;strong&gt;New Plan&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Pro&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inside the new plan, add two rate cards.&lt;/p&gt;

&lt;h4&gt;
  
  
  Input token rate card
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Add Rate Card&lt;/strong&gt; and select the &lt;code&gt;input token&lt;/code&gt; feature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;: Usage-based&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price per unit&lt;/strong&gt;: &lt;code&gt;1&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh9huala8za7vxtjgizy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh9huala8za7vxtjgizy.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two things about this field bite people on the first run.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;price per unit is the price for a single token&lt;/strong&gt;. Not per thousand, not per million. There is no toggle that switches the unit. Production rates are decimals like &lt;code&gt;0.000003&lt;/code&gt;. The example uses &lt;code&gt;1&lt;/code&gt; here so the dollar values on the test invoice are large and obvious.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;the pricing model selector decides whether the feature is metered or flat&lt;/strong&gt;. Choosing flat-fee here would charge a fixed amount per cycle regardless of usage, which is the opposite of what you want for a metered feature.&lt;/p&gt;

&lt;h4&gt;
  
  
  Output token rate card
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Add Rate Card&lt;/strong&gt; again, select &lt;code&gt;output token&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;: Usage-based&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price per unit&lt;/strong&gt;: &lt;code&gt;2&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save. Output tokens now cost twice the input rate, which roughly mirrors how OpenAI and most other providers price the underlying API.&lt;/p&gt;
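&lt;p&gt;With both rate cards in place, the invoice math for the example run is easy to check by hand. A small sketch using the tutorial's test rates (not production decimals):&lt;/p&gt;

```typescript
// Tutorial test rates: $1 per input token, $2 per output token.
// Production values would be decimals like 0.0000015 and 0.000006.
const INPUT_RATE = 1;
const OUTPUT_RATE = 2;

const lineTotal = (inputTokens: number, outputTokens: number): number =>
  inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;

// The example run logged 18 input and 156 output tokens:
console.log(lineTotal(18, 156)); // 18 * 1 + 156 * 2 = 330
```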




&lt;h3&gt;
  
  
  Create the customer
&lt;/h3&gt;

&lt;p&gt;The customer record needs to be created manually. The &lt;code&gt;subject&lt;/code&gt; field on every CloudEvent ties a token usage event to a specific customer through the customer's &lt;strong&gt;key&lt;/strong&gt;, so the key has to match the &lt;code&gt;SUBJECT&lt;/code&gt; value in your &lt;code&gt;.env&lt;/code&gt; (&lt;code&gt;acme&lt;/code&gt; in this tutorial).&lt;/p&gt;

&lt;p&gt;Left navigation: &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; → &lt;strong&gt;Billing&lt;/strong&gt; → &lt;strong&gt;Customers&lt;/strong&gt;. Top right: &lt;strong&gt;Create new&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9vir1h5lvq3j2vdmt40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9vir1h5lvq3j2vdmt40.png" alt=" " width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fill in the form:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;acme&lt;/code&gt; (display name shown in the portal)
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: &lt;code&gt;acme&lt;/code&gt; (must match the &lt;code&gt;SUBJECT&lt;/code&gt; env value)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The customer is now in the system but does not have any plan attached yet. Token events tagged with &lt;code&gt;subject: acme&lt;/code&gt; will associate to this record once a subscription is in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a subscription
&lt;/h3&gt;

&lt;p&gt;A subscription connects this customer to the &lt;code&gt;Pro&lt;/code&gt; plan you built earlier. Without it, events still flow into the meter but never produce invoice line items.&lt;/p&gt;

&lt;p&gt;Open the &lt;code&gt;acme&lt;/code&gt; customer page and switch to the &lt;strong&gt;Subscriptions&lt;/strong&gt; tab. Click &lt;strong&gt;Create subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy448cbriovo2cmp8x4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy448cbriovo2cmp8x4z.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 1 of the wizard: pick the plan.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Subscription plan&lt;/strong&gt;: &lt;code&gt;Pro&lt;/code&gt; (the plan with input-token and output-token rate cards)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Step 2: timing and billing cycle. Defaults are fine for testing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start subscription&lt;/strong&gt;: Immediately&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bill&lt;/strong&gt;: Monthly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Starting&lt;/strong&gt;: Start of subscription&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt;, then &lt;strong&gt;Start subscription&lt;/strong&gt; on the confirmation step.&lt;/p&gt;

&lt;p&gt;The subscription is now active. The next call from &lt;code&gt;pnpm start&lt;/code&gt; lands inside an active billing window and rolls into an invoice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Track usage and invoices
&lt;/h3&gt;

&lt;p&gt;Run the agent a few times with prompts long enough that the response is more than a handful of tokens; otherwise the input and output counts can look almost identical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Back in Konnect, open the &lt;code&gt;acme&lt;/code&gt; customer page from the &lt;strong&gt;Billing&lt;/strong&gt; section and switch to the &lt;strong&gt;Invoicing&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y7bx5ajbeahct9z38ww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y7bx5ajbeahct9z38ww.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The view shows the active plan, both rate cards, accumulated usage per feature, and the running invoice total. With test rates of &lt;code&gt;$1&lt;/code&gt; per input token and &lt;code&gt;$2&lt;/code&gt; per output token, even four prompts produce a dollar value that is easy to verify against the handler's logged token counts. Switch to production decimals like &lt;code&gt;0.0000015&lt;/code&gt; and &lt;code&gt;0.000006&lt;/code&gt; and the same view continues to work, just with smaller numbers.&lt;/p&gt;




&lt;h3&gt;
  
  
  Connect a payment provider
&lt;/h3&gt;

&lt;p&gt;The metering and billing layer ends at invoice generation. Actually charging the customer needs a payment provider.&lt;/p&gt;

&lt;p&gt;Konnect connects to providers like &lt;strong&gt;Stripe&lt;/strong&gt; to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sync customer payment methods between Konnect and the provider&lt;/li&gt;
&lt;li&gt;  Charge invoices automatically when the billing cycle closes&lt;/li&gt;
&lt;li&gt;  Handle dunning, retries, and failed payments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The metering pipeline doesn't change when payment providers change. Kong owns usage aggregation and invoice generation. The provider only handles collection. That separation makes it possible to support multiple providers, switch between them, or test with one provider in staging and another in production without touching any code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Input and output token counts that look identical.&lt;/strong&gt; Short prompts can produce the same input and output token count by coincidence. The input count includes chat message formatting overhead (role markers, message delimiters) added by OpenAI before the prompt reaches the model, so a two-word prompt is rarely two tokens. Use a longer prompt to see the counts diverge clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Events appear in the meter but not in invoices.&lt;/strong&gt; The subscription started after the events were ingested. Kong only invoices events that fall inside an active subscription window. Run the app again after creating the subscription.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;subject&lt;/code&gt; &lt;strong&gt;missing warning in the logs.&lt;/strong&gt; The handler logs &lt;code&gt;could not find 'subject' in run metadata&lt;/code&gt; when the metadata block doesn't include a subject. Check that &lt;code&gt;.env&lt;/code&gt; exists (not just &lt;code&gt;.env.example&lt;/code&gt;), that &lt;code&gt;SUBJECT&lt;/code&gt; is set, and that the metadata block in &lt;code&gt;index.ts&lt;/code&gt; reads &lt;code&gt;subject&lt;/code&gt; from the env variable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EU vs US endpoint.&lt;/strong&gt; The default &lt;code&gt;API_URL&lt;/code&gt; is the US endpoint. EU Konnect organizations need &lt;code&gt;https://eu.api.konghq.tech/v3/openmeter/events&lt;/code&gt;. Wrong region produces silent ingestion failures. Confirm the region from Konnect organization settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event deduplication.&lt;/strong&gt; Kong deduplicates by &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;. Replaying the same event twice produces one record, not two. The handler builds &lt;code&gt;id&lt;/code&gt; from the LangChain &lt;code&gt;runId&lt;/code&gt;, so this is rarely an issue in normal use, but worth knowing if events are being replayed or generated outside this handler.&lt;/p&gt;
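&lt;p&gt;The effect of the deduplication rule can be sketched like this. Kong applies it server-side; the snippet is illustrative only:&lt;/p&gt;

```typescript
// Sketch of the dedup rule: two events with the same id + source count once.
const seen = new Set<string>();

function ingestOnce(event: { id: string; source: string }): boolean {
  const key = `${event.source}:${event.id}`;
  if (seen.has(key)) return false; // duplicate, dropped
  seen.add(key);
  return true; // first occurrence, counted
}

const e = { id: 'run-123-input', source: 'langchain' };
console.log(ingestOnce(e)); // true
console.log(ingestOnce(e)); // false — the replay is ignored
```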




&lt;h2&gt;
  
  
  Production checklist
&lt;/h2&gt;

&lt;p&gt;The reference app demonstrates the mechanics. A production setup needs a few real changes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;subject&lt;/code&gt; &lt;strong&gt;from auth, not env.&lt;/strong&gt; Replace &lt;code&gt;SUBJECT=acme&lt;/code&gt; with a value pulled from the authenticated user session. Each chain invocation passes the real customer ID into the metadata block.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Per-model pricing.&lt;/strong&gt; Add &lt;code&gt;model&lt;/code&gt; to the meter group filters on each feature and run different rate cards per model. GPT-4o, GPT-4o-mini, Claude, and others can all be priced independently while sharing one meter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom segmentation.&lt;/strong&gt; Any field added to the metadata block lands in data on the CloudEvent. Add tenant tier, region, or provider and filter or group on them in the meter to bill differently per segment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Usage alerts.&lt;/strong&gt; Once events flow, configure usage thresholds in Kong to notify customers, throttle them, or pause subscriptions when they hit a limit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Idempotent retries.&lt;/strong&gt; The handler doesn't retry failed &lt;code&gt;ingest()&lt;/code&gt; calls. Wrap &lt;code&gt;fetch&lt;/code&gt; with a small retry layer (exponential backoff, max attempts) to handle transient network errors without losing billable events. Kong's deduplication on &lt;code&gt;id + source&lt;/code&gt; makes safe retries straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
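&lt;p&gt;A minimal sketch of such a retry layer, assuming a plain &lt;code&gt;fetch&lt;/code&gt; POST to the ingest endpoint (the URL and headers here are illustrative, not the reference app's exact code):&lt;/p&gt;

```typescript
// Minimal retry sketch: exponential backoff with a capped attempt count.
// Endpoint URL and headers are illustrative assumptions.
async function ingestWithRetry(
  url: string,
  event: object,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/cloudevents+json' },
        body: JSON.stringify(event),
      });
      if (res.ok) return;
      throw new Error(`ingest failed with status ${res.status}`);
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of attempts, surface the error
      // Exponential backoff: baseDelayMs, 2x, 4x, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

&lt;p&gt;Because Kong deduplicates on &lt;code&gt;id + source&lt;/code&gt;, retrying an event whose first attempt actually succeeded (but whose response was lost) doesn't double-bill the customer.&lt;/p&gt;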




&lt;p&gt;The full reference AI Agent app is at &lt;a href="https://github.com/tejakummarikuntla/llm-metering-langchian-kong" rel="noopener noreferrer"&gt;https://github.com/tejakummarikuntla/llm-metering-langchian-kong&lt;/a&gt;. Clone, configure, and the metering pipeline runs locally in a few minutes. Adding it to an existing LangChain agent is a single line: &lt;code&gt;callbacks: [handler]&lt;/code&gt; on the LLM client. Everything else is Kong configuration.&lt;/p&gt;

&lt;p&gt;What's the trickiest part of metering an AI agent in production for you? Streaming responses, multi-model pricing, or per-tenant segmentation? Drop a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>💰I Built a Token Billing System for My AI Agent - Here's How It Works</title>
      <dc:creator>Teja Kummarikuntla</dc:creator>
      <pubDate>Tue, 31 Mar 2026 15:39:56 +0000</pubDate>
      <link>https://forem.com/konghq/i-built-a-token-billing-system-for-my-ai-agent-heres-how-it-works-dl2</link>
      <guid>https://forem.com/konghq/i-built-a-token-billing-system-for-my-ai-agent-heres-how-it-works-dl2</guid>
      <description>&lt;p&gt;I've been building an AI agent that routes requests across multiple LLM providers, &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt; etc., based on the task. But pretty quickly, I hit a real problem: &lt;em&gt;how do you charge for this fairly?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.&lt;/p&gt;

&lt;p&gt;I looked at a few options for usage-based billing. &lt;strong&gt;Stripe Billing&lt;/strong&gt; has metered subscriptions, but you have to build your own token tracking pipeline on top. &lt;strong&gt;Orb&lt;/strong&gt; and &lt;strong&gt;Metronome&lt;/strong&gt; are good, but they're separate vendors; you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeci2wp1ljaq0d7kl42f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeci2wp1ljaq0d7kl42f.png" alt=" " width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ended up using &lt;strong&gt;&lt;a href="https://konghq.com/products/kong-ai-gateway" rel="noopener noreferrer"&gt;Kong AI Gateway&lt;/a&gt;&lt;/strong&gt; with &lt;strong&gt;&lt;a href="https://konghq.com/products/kong-konnect/features/usage-based-metering-and-billing" rel="noopener noreferrer"&gt;Konnect Metering &amp;amp; Billing&lt;/a&gt;&lt;/strong&gt; (built on &lt;strong&gt;OpenMeter&lt;/strong&gt;). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.&lt;/p&gt;

&lt;p&gt;So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;🚧 Route requests through AI Gateway&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;🪙 Tokens get metered per consumer&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;💵 Pricing gets applied&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;🧾 Invoice generated&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the whole setup, step by step.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up the gateway&lt;/li&gt;
&lt;li&gt;Step 1: Create a consumer&lt;/li&gt;
&lt;li&gt;Step 2: Configure the AI Proxy&lt;/li&gt;
&lt;li&gt;Step 3: Enable token metering&lt;/li&gt;
&lt;li&gt;Step 4: Create a feature&lt;/li&gt;
&lt;li&gt;Step 5: Create a plan with a rate card&lt;/li&gt;
&lt;li&gt;Step 6: Create a subscription&lt;/li&gt;
&lt;li&gt;Step 7: Validate the invoice&lt;/li&gt;
&lt;li&gt;Step 8: Connect Stripe&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The billing pipeline has three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kong AI Gateway&lt;/strong&gt; proxies the LLM requests. It sits between the app and the provider, handles auth, and, crucially for billing, logs token statistics for every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Konnect Metering &amp;amp; Billing&lt;/strong&gt; (built on &lt;strong&gt;OpenMeter&lt;/strong&gt;) takes those token events and aggregates them per consumer, per billing cycle. It supports defining features, pricing models, and plans on top of the raw usage data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stripe&lt;/strong&gt; collects payment. The metering layer generates invoices that sync to Stripe.&lt;/p&gt;

&lt;p&gt;Let me walk through each piece.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You can do this entirely through the UI or via CLI. I'll cover both as we go.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;a href="https://konghq.com/products/kong-konnect" rel="noopener noreferrer"&gt;Kong Konnect&lt;/a&gt; account&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;OpenAI&lt;/strong&gt; API key (or any LLM provider key of your choice)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For CLI, you'll also need &lt;a href="https://developer.konghq.com/deck/" rel="noopener noreferrer"&gt;decK (v1.43+)&lt;/a&gt; installed and a &lt;a href="https://cloud.konghq.com/global/account/tokens" rel="noopener noreferrer"&gt;PAT from Kong Konnect&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set Up the Gateway
&lt;/h2&gt;

&lt;p&gt;Once you log in, click on &lt;strong&gt;API Gateway&lt;/strong&gt; and create one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fms4m351xq50wk94vsdk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fms4m351xq50wk94vsdk7.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as &lt;code&gt;ai-service&lt;/code&gt; and click &lt;strong&gt;Create and configure&lt;/strong&gt;. Once that's done, click &lt;strong&gt;Add a service and route&lt;/strong&gt; and fill in:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0qxc9dwgjcbqnsbyowd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0qxc9dwgjcbqnsbyowd.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service Name:&lt;/strong&gt; &lt;code&gt;ai-service&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service URL:&lt;/strong&gt; &lt;code&gt;http://httpbin.konghq.com/anything&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route Name:&lt;/strong&gt; &lt;code&gt;ai-chat&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route Path:&lt;/strong&gt; &lt;code&gt;/chat&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;If you prefer the command line, generate your PAT and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KONNECT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_konnect_pat'&lt;/span&gt;
curl &lt;span class="nt"&gt;-Ls&lt;/span&gt; https://get.konghq.com/quickstart | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nv"&gt;$KONNECT_TOKEN&lt;/span&gt; &lt;span class="nt"&gt;--deck-output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DECK_OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_openai_api_key'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set up the service and route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://httpbin.konghq.com/anything&lt;/span&gt;
&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-chat&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/chat"&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it with &lt;code&gt;deck gateway apply&lt;/code&gt;. Now you have a route at &lt;code&gt;/chat&lt;/code&gt; that we'll wire up to an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Consumer
&lt;/h2&gt;

&lt;p&gt;You can't bill anyone if the gateway doesn't know &lt;em&gt;who&lt;/em&gt; is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.&lt;/p&gt;

&lt;p&gt;Add a consumer with a key-auth credential:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gmknorg1j0xdfl4tcip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gmknorg1j0xdfl4tcip.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwgwyce9skaht7ows1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwgwyce9skaht7ows1.png" alt=" " width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter &lt;code&gt;acme-secret-key&lt;/code&gt; as the Key value.&lt;/p&gt;

&lt;p&gt;Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Plugins&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;New Plugin&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Key Authentication&lt;/strong&gt; from the plugin list&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Service&lt;/strong&gt; as the scope or keep it as &lt;strong&gt;Global&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Save&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;consumers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acme-corp&lt;/span&gt;
    &lt;span class="na"&gt;keyauth_credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acme-secret-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then enable the key-auth plugin on the service so the gateway actually requires authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;key-auth&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;key_names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;apikey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply both with &lt;code&gt;deck gateway apply&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now every request to &lt;code&gt;/chat&lt;/code&gt; must include an &lt;code&gt;apikey&lt;/code&gt; header. The gateway identifies the caller as &lt;code&gt;acme-corp&lt;/code&gt;, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.&lt;/p&gt;
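&lt;p&gt;To make the attribution concrete, here's a rough Python sketch of what key-auth buys you (the names and data shapes are mine, not Kong's internals): every usage event carries a consumer identity, and an unauthenticated call is rejected before any event exists:&lt;/p&gt;

```python
# Sketch of what key-auth gives the metering pipeline: every usage event
# carries a consumer identity, and unauthenticated calls are rejected
# before any event exists. Names are illustrative, not Kong internals.
CREDENTIALS = {"acme-secret-key": "acme-corp"}  # apikey -> consumer username

def attribute_usage(apikey, tokens_used):
    consumer = CREDENTIALS.get(apikey)
    if consumer is None:
        # The gateway answers 401 Unauthorized; no anonymous event is emitted.
        raise PermissionError("401 Unauthorized: unknown apikey")
    return {"consumer": consumer, "tokens": tokens_used}

print(attribute_usage("acme-secret-key", 42))
# {'consumer': 'acme-corp', 'tokens': 42}
```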

&lt;h2&gt;
  
  
  Step 2: Configure the AI Proxy
&lt;/h2&gt;

&lt;p&gt;Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Plugins&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;New Plugin&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;AI Proxy&lt;/strong&gt; from the plugin list&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsy0uvct4i4h9fl4siqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsy0uvct4i4h9fl4siqt.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the CLI, use the YAML below and configure the plugin fields accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-proxy&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;route_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm/v1/chat&lt;/span&gt;
      &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;header_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Authorization&lt;/span&gt;
        &lt;span class="na"&gt;header_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bearer ${{ env "DECK_OPENAI_API_KEY" }}&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;
      &lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;log_payloads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;log_statistics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note here:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;log_statistics: true&lt;/code&gt; is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;log_payloads: true&lt;/code&gt; logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.&lt;/p&gt;

&lt;p&gt;Apply with &lt;code&gt;deck gateway apply&lt;/code&gt; and test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KONNECT_PROXY_URL&lt;/span&gt;&lt;span class="s2"&gt;/chat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: acme-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s1"&gt;'{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.&lt;/p&gt;

&lt;p&gt;If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use &lt;a href="https://developer.konghq.com/plugins/ai-proxy-advanced/" rel="noopener noreferrer"&gt;&lt;code&gt;ai-proxy-advanced&lt;/code&gt;&lt;/a&gt; instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Enable Token Metering
&lt;/h2&gt;

&lt;p&gt;Now we connect the gateway's token logs to the metering system.&lt;/p&gt;

&lt;p&gt;In Konnect, go to &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; in the sidebar. You'll see an &lt;strong&gt;AI Gateway Tokens&lt;/strong&gt; section. Click &lt;strong&gt;Enable Related API Gateways&lt;/strong&gt;, select your control plane (the quickstart one), and confirm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5ktuk4v5bkcc0poondr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5ktuk4v5bkcc0poondr.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This activates a built-in meter called &lt;code&gt;kong_konnect_llm_tokens&lt;/code&gt;. It uses SUM aggregation on the token count, grouped by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$.model&lt;/code&gt; : which LLM handled the request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$.type&lt;/code&gt; : whether the tokens are input (&lt;code&gt;request&lt;/code&gt;) or output (&lt;code&gt;response&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, with each token depending on all previous tokens. If your metering doesn't split these, your pricing will be wrong.&lt;/p&gt;

&lt;p&gt;At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.&lt;/p&gt;
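&lt;p&gt;If it helps to see the aggregation concretely, here's a small Python sketch of the SUM-and-group-by behavior (the event shape is illustrative, not Konnect's actual schema):&lt;/p&gt;

```python
# Sketch of the built-in meter's SUM aggregation, grouped by model and
# token type. The event shape is illustrative, not Konnect's schema.
from collections import defaultdict

events = [
    {"model": "gpt-4o", "type": "request", "tokens": 25},
    {"model": "gpt-4o", "type": "response", "tokens": 110},
    {"model": "gpt-4o", "type": "request", "tokens": 30},
]

totals = defaultdict(int)
for e in events:
    totals[(e["model"], e["type"])] += e["tokens"]  # SUM per (model, type)

print(dict(totals))
# {('gpt-4o', 'request'): 55, ('gpt-4o', 'response'): 110}
```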

&lt;h2&gt;
  
  
  Step 4: Create a Feature
&lt;/h2&gt;

&lt;p&gt;A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Metering &amp;amp; Billing → Product Catalog → Features&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;ai-token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meter:&lt;/strong&gt; AI Gateway Tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Group by filters:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Provider = &lt;code&gt;openai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Type = &lt;code&gt;request&lt;/code&gt; (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk1w21y609vz6zp52x3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk1w21y609vz6zp52x3w.png" alt=" " width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features, one per model, one per token direction, to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Create a Plan with a Rate Card
&lt;/h2&gt;

&lt;p&gt;Plans bundle features with pricing. Go to &lt;strong&gt;Product Catalog → Plans&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Starter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing cadence:&lt;/strong&gt; 1 month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnifkwcpt9d3jnpc01qqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnifkwcpt9d3jnpc01qqs.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add a rate card:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature:&lt;/strong&gt; &lt;code&gt;ai-token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing model:&lt;/strong&gt; Usage Based&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price per unit:&lt;/strong&gt; &lt;code&gt;1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entitlement type:&lt;/strong&gt; Boolean (grants access to the feature)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3q5x8ej2aif9p38k39k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3q5x8ej2aif9p38k39k.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering &lt;code&gt;1&lt;/code&gt; means $1.00 per token, which is way too expensive for real use. I'm using it here because the &lt;a href="https://developer.konghq.com/how-to/meter-llm-traffic/" rel="noopener noreferrer"&gt;official tutorial&lt;/a&gt; does the same thing: a round number that makes invoice changes easy to spot during testing.&lt;/p&gt;

&lt;p&gt;For production, you'd enter something like &lt;code&gt;0.0000025&lt;/code&gt; for GPT-4o input tokens ($2.50 per 1M tokens) or &lt;code&gt;0.00001&lt;/code&gt; for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI. You do the math yourself and enter the per-token price as a decimal.&lt;/p&gt;
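&lt;p&gt;The conversion is easy to sanity-check in a few lines of Python (the helper is mine, not part of any Kong tooling; the rates are OpenAI's GPT-4o list prices at the time of writing):&lt;/p&gt;

```python
# Converting published per-1M-token prices into the per-token decimals the
# rate card expects. Helper and workload numbers are mine; rates are OpenAI's
# GPT-4o list prices ($2.50/1M input, $10.00/1M output) at time of writing.
def per_token(price_per_million_usd):
    return price_per_million_usd / 1_000_000

input_rate = per_token(2.50)    # 0.0000025
output_rate = per_token(10.00)  # 0.00001

# A call that used 1,200 input tokens and 350 output tokens would bill:
charge = 1200 * input_rate + 350 * output_rate
print(f"${charge:.6f}")  # $0.006500
```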

&lt;p&gt;Publish the plan. It's now available for subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Create a Customer and Start a Subscription
&lt;/h2&gt;

&lt;p&gt;This is where the consumer from Step 1 connects to the billing system.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Metering &amp;amp; Billing → Billing → Customers&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Acme Corp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include usage from:&lt;/strong&gt; select the &lt;code&gt;acme-corp&lt;/code&gt; consumer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhg2bomizkgass3rhmio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhg2bomizkgass3rhmio.png" alt=" " width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.&lt;/p&gt;
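&lt;p&gt;Conceptually, the join looks like this (illustrative Python, not Konnect's data model):&lt;/p&gt;

```python
# Sketch of the two identity layers being joined: gateway consumers
# (who called the API) mapped to billing customers (who gets invoiced).
# Structures are illustrative, not Konnect's data model.
consumer_to_customer = {"acme-corp": "Acme Corp"}

usage_events = [
    {"consumer": "acme-corp", "tokens": 55},
    {"consumer": "stray-bot", "tokens": 10},  # never linked to a customer
]

billable = [
    {"customer": consumer_to_customer[e["consumer"]], "tokens": e["tokens"]}
    for e in usage_events
    if e["consumer"] in consumer_to_customer
]
print(billable)  # only Acme Corp's usage can ever reach an invoice
```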

&lt;p&gt;Now create a subscription:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the Acme Corp customer, then &lt;strong&gt;Subscriptions → Create a Subscription&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan:&lt;/strong&gt; &lt;code&gt;Starter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start the subscription&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important detail: &lt;strong&gt;metering only invoices events that occur after the subscription starts.&lt;/strong&gt; If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.&lt;/p&gt;
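&lt;p&gt;In pseudocode terms, the invoicing cutoff behaves like a timestamp filter (illustrative Python with made-up data, not Konnect's implementation):&lt;/p&gt;

```python
# Why pre-subscription traffic never gets billed: invoicing only considers
# events timestamped at or after the subscription start. Data is made up.
from datetime import datetime

subscription_start = datetime(2026, 5, 1, 12, 0)

events = [
    {"at": datetime(2026, 5, 1, 11, 50), "tokens": 500},  # early test traffic
    {"at": datetime(2026, 5, 1, 12, 30), "tokens": 80},
]

invoiced_tokens = sum(e["tokens"] for e in events if e["at"] >= subscription_start)
print(invoiced_tokens)  # 80 -- the 500 earlier tokens never appear
```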

&lt;h2&gt;
  
  
  Step 7: Validate the Invoice
&lt;/h2&gt;

&lt;p&gt;Send a few requests through the gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..6&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KONNECT_PROXY_URL&lt;/span&gt;&lt;span class="s2"&gt;/chat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: acme-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s1"&gt;'{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait a minute or two for the events to propagate, then go to &lt;strong&gt;Metering &amp;amp; Billing → Billing → Invoices&lt;/strong&gt;. Click on Acme Corp, go to the &lt;strong&gt;Invoicing&lt;/strong&gt; tab, and hit &lt;strong&gt;Preview Invoice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi98dp2rv7i6qnom8se2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi98dp2rv7i6qnom8se2.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should see the &lt;code&gt;ai-token&lt;/code&gt; feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Stripe
&lt;/h2&gt;

&lt;p&gt;Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering &amp;amp; Billing settings, and invoices flow through automatically at the end of each billing cycle.&lt;/p&gt;

&lt;p&gt;The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things I Ran Into
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The consumer-customer mapping confused me at first.&lt;/strong&gt; Kong Gateway has "consumers" (API identity). Metering &amp;amp; Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input vs. output pricing is a bigger deal than I expected.&lt;/strong&gt; Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.&lt;/p&gt;
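&lt;p&gt;A quick back-of-the-envelope shows how badly a flat rate misses on an output-heavy workload (the rates are GPT-4o list prices; the workload split is invented):&lt;/p&gt;

```python
# Back-of-the-envelope: a flat per-token rate vs. split input/output rates
# on an output-heavy workload. Rates are GPT-4o list prices; the workload
# numbers are invented.
INPUT_RATE = 2.50 / 1_000_000
OUTPUT_RATE = 10.00 / 1_000_000
FLAT_RATE = 2.50 / 1_000_000  # naive: price everything at the input rate

input_tokens, output_tokens = 100_000, 900_000  # short prompts, long answers

true_cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
flat_billed = (input_tokens + output_tokens) * FLAT_RATE

print(round(true_cost, 2), round(flat_billed, 2))  # 9.25 2.5
```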

&lt;p&gt;&lt;strong&gt;The order of operations matters.&lt;/strong&gt; Specifically: create the consumer and link it to a customer &lt;em&gt;before&lt;/em&gt; you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I'd Take This Next
&lt;/h2&gt;

&lt;p&gt;This walkthrough uses a single provider and a single feature. A production setup would look more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple features&lt;/strong&gt;: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tiered pricing&lt;/strong&gt;: lower per-token rates at higher usage thresholds to incentivize growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entitlements with metered limits&lt;/strong&gt;: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Proxy Advanced&lt;/strong&gt;: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)&lt;/li&gt;
&lt;/ul&gt;
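&lt;p&gt;As a sketch of what the tiered-pricing math from that list looks like (the tiers and rates are invented for illustration; this isn't Konnect's rate card logic):&lt;/p&gt;

```python
# Sketch of graduated tiered pricing: cheaper per-token rates once usage
# crosses thresholds. Tiers and rates are invented for illustration; this
# is not Konnect's rate card logic.
TIERS = [  # (tokens in this tier, dollars per token)
    (1_000_000, 5.00 / 1_000_000),     # first 1M at $5 per 1M
    (9_000_000, 3.00 / 1_000_000),     # next 9M at $3 per 1M
    (float("inf"), 1.00 / 1_000_000),  # everything beyond at $1 per 1M
]

def tiered_charge(tokens):
    charge, remaining = 0.0, tokens
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        charge += used * rate
        remaining -= used
        if remaining == 0:
            break
    return charge

# 12M tokens: 1M at $5/1M + 9M at $3/1M + 2M at $1/1M = $34
print(round(tiered_charge(12_000_000), 2))  # 34.0
```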

&lt;p&gt;The docs for all of these are at &lt;a href="https://developer.konghq.com/metering-and-billing/" rel="noopener noreferrer"&gt;developer.konghq.com/metering-and-billing&lt;/a&gt; and &lt;a href="https://developer.konghq.com/ai-gateway/" rel="noopener noreferrer"&gt;developer.konghq.com/ai-gateway&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
