<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kong</title>
    <description>The latest articles on Forem by Kong (@konghq).</description>
    <link>https://forem.com/konghq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F13211%2F63f22eae-9468-4f4b-bdbe-2f4f7977490a.png</url>
      <title>Forem: Kong</title>
      <link>https://forem.com/konghq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/konghq"/>
    <language>en</language>
    <item>
      <title>💰Monetize Your AI Agents with LangChain and Kong</title>
      <dc:creator>Teja Kummarikuntla</dc:creator>
      <pubDate>Tue, 05 May 2026 15:26:47 +0000</pubDate>
      <link>https://forem.com/konghq/how-to-monetize-your-ai-agents-with-langchain-and-kong-1fn0</link>
      <guid>https://forem.com/konghq/how-to-monetize-your-ai-agents-with-langchain-and-kong-1fn0</guid>
      <description>&lt;p&gt;Say you built an AI agent and customers are starting to pay for it. Sooner or later you'll want to charge them by what they actually use, because some customers hammer the agent all day while others send a handful of messages a week. A single flat fee loses money on the heavy users and overcharges the light ones.&lt;/p&gt;

&lt;p&gt;The billing problem is the same whether your agent runs on your own model (self-hosted, fine-tuned, or trained from scratch) or calls a third-party API like &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt;, or &lt;strong&gt;Gemini&lt;/strong&gt;. You still need to know which customer made which call, count the tokens it used, and turn that into a dollar amount on a real invoice. That mapping (request → customer → token count → dollar amount → invoice) is yours to build, and that's what this tutorial sets up.&lt;/p&gt;

&lt;p&gt;The agent uses &lt;a href="https://js.langchain.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/a&gt;, which sits one layer above the model so the same metering code works regardless of what's behind it. The example runs on OpenAI's &lt;code&gt;gpt-4o-mini&lt;/code&gt; for convenience, but swap the chat model and nothing else changes. A small LangChain callback records each call's input and output token counts, tagged with the customer ID. Those records flow to &lt;a href="https://developer.konghq.com/metering-and-billing/" rel="noopener noreferrer"&gt;&lt;strong&gt;Kong Konnect Metering &amp;amp; Billing&lt;/strong&gt;&lt;/a&gt;, which keeps a running per-customer tally, applies your prices (input and output tokens can be priced separately), and produces invoices on a monthly cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  See it in action first
&lt;/h2&gt;

&lt;p&gt;Before getting into the setup, here is what the finished pipeline looks like end to end. The agent runs on one side and reports the tokens it just used. Those same tokens land as a billable line item on the customer's invoice in Kong on the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Agent App
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88hdvr77xlr14kc2zttj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88hdvr77xlr14kc2zttj.png" alt=" " width="800" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user types &lt;code&gt;Hello world&lt;/code&gt;. The agent replies with &lt;code&gt;Hello! How can I assist you today?&lt;/code&gt;. Both ends happen to land on &lt;strong&gt;9 tokens&lt;/strong&gt;: the input count is 9 rather than 2 because OpenAI wraps the prompt in chat-message formatting, which adds a few tokens beyond the literal words, and the output simply happens to be 9 as well. The agent fires off one record for the input tokens and another for the output tokens, both tagged with the customer (&lt;code&gt;acme&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Metering and Billing the Agent in Kong
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ylipk6x82v51tjbd4yc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ylipk6x82v51tjbd4yc.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same call now sits there as a real billable line item. With a simple test pricing of $1 per input token and $2 per output token, the math lines up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Input: 9 tokens × $1 = &lt;strong&gt;$9&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output: 9 tokens × $2 = &lt;strong&gt;$18&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total: &lt;strong&gt;$27&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same numbers on both sides of the pipeline. That is what we are about to build.&lt;/p&gt;

&lt;p&gt;Let's go through it step by step.&lt;/p&gt;

&lt;p&gt;AI Agent App: &lt;a href="https://github.com/tejakummarikuntla/llm-metering-langchian-kong" rel="noopener noreferrer"&gt;github.com/tejakummarikuntla/llm-metering-langchian-kong&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtfliv9s93re0ggm6ogw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtfliv9s93re0ggm6ogw.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every LLM call produces two CloudEvents. One carries the prompt token count, the other carries the response token count. Both events carry a &lt;code&gt;subject&lt;/code&gt; field set to the customer identifier. Kong groups events by subject, sums the token field, multiplies by the rate card configured on the customer's plan, and rolls everything into invoices on the billing cycle.&lt;/p&gt;
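&lt;p&gt;As a concrete sketch of that event pair, here is how the two CloudEvents for one call could be assembled. The field names match the handler walked through later in this tutorial; the &lt;code&gt;buildTokenEvents&lt;/code&gt; helper itself is illustrative, not code from the repo:&lt;/p&gt;

```typescript
// Illustrative helper (not from the repo): the two CloudEvents a single
// LLM call produces, per the architecture described above.
interface TokenEvent {
  specversion: '1.0';
  id: string;        // runId + suffix; with `source`, this is the dedupe key
  source: string;
  type: string;
  subject: string;   // customer identifier Kong groups events by
  data: { type: 'input' | 'output'; tokens: number };
}

function buildTokenEvents(
  runId: string,
  subject: string,
  promptTokens: number,
  completionTokens: number,
): [TokenEvent, TokenEvent] {
  const base = {
    specversion: '1.0' as const,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
  };
  return [
    { ...base, id: `${runId}-input`, data: { type: 'input' as const, tokens: promptTokens } },
    { ...base, id: `${runId}-output`, data: { type: 'output' as const, tokens: completionTokens } },
  ];
}
```

&lt;p&gt;Deriving both &lt;code&gt;id&lt;/code&gt;s from the same &lt;code&gt;runId&lt;/code&gt; is what makes retries safe: Kong deduplicates on &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;, so resending either event cannot double-bill.&lt;/p&gt;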

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this stack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kong Konnect Metering &amp;amp; Billing&lt;/strong&gt; fits this tutorial for two specific reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Open source core.&lt;/strong&gt; The metering side is built on &lt;a href="https://openmeter.io/" rel="noopener noreferrer"&gt;OpenMeter&lt;/a&gt;, which is open source. You can self-host the metering pipeline, or use the managed Konnect service. &lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Configurable billing engine.&lt;/strong&gt; Meters, features, plans, rate cards, and subscriptions are first-class primitives, configured in the portal rather than shipped as code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're not replacing Stripe here; you're using Kong as the metering and invoicing layer that feeds it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What you will build&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; A LangChain callback handler that emits two CloudEvents per LLM call&lt;/li&gt;
&lt;li&gt; A Kong meter that filters &lt;code&gt;kong.llm_request&lt;/code&gt; events and sums the &lt;code&gt;tokens&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt; Two features (input and output tokens) feeding a plan with separate rate cards&lt;/li&gt;
&lt;li&gt; A customer subscribed to that plan, with metered usage and dollar values in the Konnect portal&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Node.js 22.6 or higher&lt;/li&gt;
&lt;li&gt;  pnpm: &lt;code&gt;npm install -g pnpm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  An OpenAI API key&lt;/li&gt;
&lt;li&gt;  A free Kong Konnect account: &lt;a href="https://konghq.com" rel="noopener noreferrer"&gt;konghq.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  A Konnect Personal Access Token with Metering &amp;amp; Billing write permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tutorial map&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Add Metering into the AI Agent app&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Clone the AI agent app&lt;/li&gt;
&lt;li&gt; Configure environment variables&lt;/li&gt;
&lt;li&gt; Walk through the codebase&lt;/li&gt;
&lt;li&gt; Run the AI Agent app&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Connect to Kong Metering &amp;amp; Billing&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create a &lt;strong&gt;Meter&lt;/strong&gt; in Kong M&amp;amp;B&lt;/li&gt;
&lt;li&gt; Create &lt;strong&gt;Features&lt;/strong&gt; for input and output tokens&lt;/li&gt;
&lt;li&gt; Create a &lt;strong&gt;Plan&lt;/strong&gt; with Rate Cards&lt;/li&gt;
&lt;li&gt; Create the &lt;strong&gt;Customer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Add a &lt;strong&gt;Subscription&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Inspect usage and &lt;strong&gt;Invoices&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Connect a Payment provider&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Part 1: Add Metering into the AI Agent app&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clone the AI Agent app
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tejakummarikuntla/llm-metering-langchian-kong
&lt;span class="nb"&gt;cd &lt;/span&gt;llm-metering-langchian-kong
pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reference implementation is two TypeScript files. &lt;code&gt;handler.ts&lt;/code&gt; is the metering callback. &lt;code&gt;index.ts&lt;/code&gt; is a small chain that reads a prompt from stdin so you have something to exercise the handler with. No sidecar service, no separate ingestion worker, no extra runtime dependency beyond LangChain and the OpenAI client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure environment variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;.env&lt;/code&gt; and fill in real values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API_URL=https://us.api.konghq.tech/v3/openmeter/events
API_KEY=your-konnect-personal-access-token
SUBJECT=acme
MODEL=gpt-4o-mini
OPENAI_API_KEY=your-openai-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;API_URL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kong Konnect ingestion endpoint. The default is the US region. EU organizations use &lt;code&gt;https://eu.api.konghq.tech/v3/openmeter/events&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Konnect &lt;a href="https://docs.konghq.com/konnect/api/" rel="noopener noreferrer"&gt;Personal Access Token&lt;/a&gt; with Metering &amp;amp; Billing write scope.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SUBJECT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Customer identifier attached to every event. Use &lt;code&gt;acme&lt;/code&gt; for testing. In production this comes from your authenticated session, not an env var.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MODEL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any chat-completion model. &lt;code&gt;gpt-4o-mini&lt;/code&gt; keeps testing cheap.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standard &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
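&lt;p&gt;If you want the app to fail fast when one of these values is missing, a small startup check does it. This is a sketch rather than the repo's actual loading code; how &lt;code&gt;.env&lt;/code&gt; reaches &lt;code&gt;process.env&lt;/code&gt; is up to you (dotenv, or &lt;code&gt;node --env-file=.env&lt;/code&gt; on Node 20.6+):&lt;/p&gt;

```typescript
// Sketch: validate the five variables from the table above at startup.
const REQUIRED = ['API_URL', 'API_KEY', 'SUBJECT', 'MODEL', 'OPENAI_API_KEY'] as const;

type Config = Record<(typeof REQUIRED)[number], string>;

function readConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Fail at boot, not on the first metered request.
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  return Object.fromEntries(REQUIRED.map((key) => [key, env[key]!])) as Config;
}
```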

&lt;h3&gt;
  
  
  Walk through the codebase
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MeteringCallbackHandler&lt;/code&gt; extends LangChain's &lt;a href="https://js.langchain.com/docs/concepts/callbacks/" rel="noopener noreferrer"&gt;&lt;code&gt;BaseCallbackHandler&lt;/code&gt;&lt;/a&gt; and implements two of its lifecycle hooks. Callbacks are the natural seam for metering: they fire at the same place token counts are reported, you do not need to subclass the LLM client, and the LangChain &lt;code&gt;runId&lt;/code&gt; gives you a stable event ID for free.&lt;/p&gt;

&lt;h4&gt;
  
  
  handleLLMStart
&lt;/h4&gt;

&lt;p&gt;This hook fires immediately before the model is called. The handler captures run metadata so the LLM end hook can build a CloudEvent with the right customer attribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handleLLMStart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;_llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Serialized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_extraParams&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;_tags&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parentMetadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentRunId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentMetadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parentMetadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The parent run check matters. LLM calls almost always run inside a chain, agent, or tool-calling flow, which LangChain models as a parent run. When you set metadata at &lt;code&gt;chain.invoke({}, { metadata: { subject: 'acme' } })&lt;/code&gt;, LangChain attaches it to the chain run, not the child LLM run. Without merging parent metadata into the child, the LLM end hook reads an empty metadata object and the &lt;code&gt;subject&lt;/code&gt; is lost.&lt;/p&gt;
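&lt;p&gt;The merge is small enough to see in isolation. Here it is as a standalone function mirroring the handler above (a sketch for illustration, not code from the repo):&lt;/p&gt;

```typescript
// Sketch of handleLLMStart's parent-metadata merge as a pure helper.
// runMetadata maps runId -> metadata, as in the handler above.
function resolveRunMetadata(
  runMetadata: Map<string, Record<string, unknown>>,
  runId: string,
  parentRunId: string | undefined,
  metadata: Record<string, unknown> = {},
): Record<string, unknown> {
  if (parentRunId) {
    const parentMetadata = runMetadata.get(parentRunId);
    // This copy is where `subject`, set at chain.invoke time on the
    // parent (chain) run, lands on the child LLM run.
    if (parentMetadata) Object.assign(metadata, parentMetadata);
  }
  runMetadata.set(runId, metadata);
  return metadata;
}
```

&lt;p&gt;Drop the &lt;code&gt;Object.assign&lt;/code&gt; and the child run's metadata stays empty, which is exactly the lost-&lt;code&gt;subject&lt;/code&gt; failure described above.&lt;/p&gt;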

&lt;h4&gt;
  
  
  handleLLMEnd
&lt;/h4&gt;

&lt;p&gt;This hook fires after the model returns. The handler reads token counts from &lt;code&gt;output.llmOutput.tokenUsage&lt;/code&gt; (the field OpenAI fills on non-streaming completions), builds two CloudEvents, and posts each to the Kong ingestion endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handleLLMEnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LLMResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;promptTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tokenUsage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
    &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;estimatedTokenUsage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
    &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runMetadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ls_model_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;specversion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-input`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kong.llm_request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;promptTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;specversion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;runId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-output`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langchain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kong.llm_request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;completionTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ls_provider&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputEvent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;outputEvent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few decisions in this block matter for production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;id&lt;/code&gt; field combines the LangChain &lt;code&gt;runId&lt;/code&gt; with &lt;code&gt;-input&lt;/code&gt; or &lt;code&gt;-output&lt;/code&gt;. Kong &lt;a href="https://openmeter.io/docs/getting-started/concepts" rel="noopener noreferrer"&gt;deduplicates events by &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;&lt;/a&gt;, so retries do not double-bill.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;data.type&lt;/code&gt; separates input from output tokens at the event level. That separation is what makes per-token-class pricing possible without running two meters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anything you pass in &lt;code&gt;metadata&lt;/code&gt; at &lt;code&gt;chain.invoke&lt;/code&gt; time spreads into &lt;code&gt;data&lt;/code&gt;. Tenant tier, region, feature flag: add it once at invoke time and filter on it in the meter. No handler changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ingest&lt;/code&gt; is a plain &lt;code&gt;fetch&lt;/code&gt; POST with a Bearer token header. No SDK, no batching layer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
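&lt;p&gt;For reference, that &lt;code&gt;ingest&lt;/code&gt; call can be as small as the following. This is a sketch with an injectable &lt;code&gt;fetch&lt;/code&gt; so it is easy to test; the repo's version may differ, and &lt;code&gt;application/cloudevents+json&lt;/code&gt; is the content type OpenMeter's event API documents for single events:&lt;/p&gt;

```typescript
// Sketch of a minimal ingest: one CloudEvent per POST, Bearer auth,
// injectable fetch for testability. A failed ingest is logged rather than
// thrown, so a metering hiccup never fails the user's request.
async function ingest(
  event: Record<string, unknown>,
  apiUrl: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<boolean> {
  const res = await fetchImpl(apiUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/cloudevents+json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(event),
  });
  if (!res.ok) console.error(`metering ingest failed: ${res.status}`);
  return res.ok;
}
```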

&lt;h4&gt;
  
  
  Read the agent entry point
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;index.ts&lt;/code&gt; wires &lt;code&gt;ChatOpenAI&lt;/code&gt; up with the metering handler and runs a small one-shot chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MeteringCallbackHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;openaiApiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{input}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StringOutputParser&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;kong&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;strong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two lines do the integration. &lt;code&gt;callbacks: [handler]&lt;/code&gt; on the &lt;code&gt;ChatOpenAI&lt;/code&gt; instance attaches the handler to every call made through it. The &lt;code&gt;metadata&lt;/code&gt; block on &lt;code&gt;chain.invoke&lt;/code&gt; carries the customer identifier into the run metadata that &lt;code&gt;handleLLMStart&lt;/code&gt; reads. The &lt;code&gt;kong: 'strong'&lt;/code&gt; field is just a metadata pass-through demonstration: anything you add in that block lands inside &lt;code&gt;data&lt;/code&gt; on the CloudEvent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run the AI Agent app
&lt;/h3&gt;

&lt;h5&gt;
  
  
  Start the app:
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Explain how token-based usage billing works for LLM applications.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler logs both events as it sends them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-input',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'input', tokens: 18, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

AI: Token-based usage billing charges customers based on the number of tokens consumed...

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-output',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'output', tokens: 156, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both events are in Kong. They will not appear in a customer's usage view or invoice until Part 2 is set up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: &lt;strong&gt;Connect to Kong Metering &amp;amp; Billing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The next sections build the meter, features, plan, and subscription that turn the raw event stream into priced, per-customer usage. The flow follows the &lt;a href="https://developer.konghq.com/metering-and-billing/concepts/" rel="noopener noreferrer"&gt;Konnect M&amp;amp;B concepts model&lt;/a&gt;: events feed meters, meters feed features, features attach to plans through rate cards, customers subscribe to plans.&lt;/p&gt;

&lt;p&gt;Open &lt;a href="https://cloud.konghq.com" rel="noopener noreferrer"&gt;cloud.konghq.com&lt;/a&gt; and confirm you're in the region matching &lt;code&gt;API_URL&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the LLM Tokens meter
&lt;/h3&gt;

&lt;p&gt;A meter is a continuously-running query over the event stream. It picks events that match a filter, applies an aggregation, and exposes the result as a numeric usage value.&lt;/p&gt;
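&lt;p&gt;Conceptually, the meter behaves like the following sketch: filter the stream by event type, then sum &lt;code&gt;data.tokens&lt;/code&gt; per subject. This is illustrative TypeScript only; Kong runs the real query server-side.&lt;/p&gt;

```typescript
// Illustrative sketch of a SUM meter: filter by event type,
// then accumulate data.tokens per subject. Not Kong's implementation.
type MeterEvent = { type: string; subject: string; data: { tokens: number } };

function sumTokensBySubject(
  events: MeterEvent[],
  eventType: string,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.type !== eventType) continue; // the event type filter
    totals.set(e.subject, (totals.get(e.subject) ?? 0) + e.data.tokens); // SUM
  }
  return totals;
}

// The two events from the example run, plus one that the filter drops:
const events: MeterEvent[] = [
  { type: 'kong.llm_request', subject: 'acme', data: { tokens: 18 } },
  { type: 'kong.llm_request', subject: 'acme', data: { tokens: 156 } },
  { type: 'unrelated.event', subject: 'acme', data: { tokens: 999 } },
];
console.log(sumTokensBySubject(events, 'kong.llm_request').get('acme')); // 174
```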

&lt;p&gt;In the Konnect console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Left navigation: &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; → &lt;strong&gt;Metering&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Top right: &lt;strong&gt;Create Meter&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Choose template: &lt;strong&gt;LLM Tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uqbf3pi7zss32qq5yqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uqbf3pi7zss32qq5yqn.png" alt=" " width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The LLM Tokens template fills in the right defaults for this handler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Event type filter&lt;/strong&gt;: &lt;code&gt;kong.llm_request&lt;/code&gt; (matches the &lt;code&gt;type&lt;/code&gt; field on every CloudEvent the handler emits)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Aggregation&lt;/strong&gt;: Sum&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Value property&lt;/strong&gt;: &lt;code&gt;tokens&lt;/code&gt; (reads &lt;code&gt;data.tokens&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  CLI alternative
&lt;/h4&gt;

&lt;p&gt;The same meter can be created through the &lt;a href="https://developer.konghq.com/api/konnect/metering-and-billing-api/v3/" rel="noopener noreferrer"&gt;Konnect API&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://us.api.konghq.tech/v3/openmeter/meters &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "LLM Tokens",
    "key": "llm-tokens",
    "description": "LLM token usage",
    "event_type": "kong.llm_request",
    "aggregation": "SUM",
    "value_property": "$.tokens",
    "dimensions": { "type": "$.type", "provider": "$.provider", "model": "$.model" }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Create Features
&lt;/h3&gt;

&lt;p&gt;Features turn a single meter into multiple billable units. You need two: one exposing only input-token events, one exposing only output-token events. The split is what makes asymmetric pricing possible (most providers charge more for output than input).&lt;/p&gt;
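&lt;p&gt;In stream terms, a feature is the meter plus one extra filter on the &lt;code&gt;type&lt;/code&gt; dimension. The sketch below is illustrative only, not how Kong evaluates features internally:&lt;/p&gt;

```typescript
// Sketch only: a feature restricts the meter to one slice of the `type` dimension.
type TokenEvent = { data: { type: 'input' | 'output'; tokens: number } };

const featureTotal = (events: TokenEvent[], kind: 'input' | 'output'): number =>
  events
    .filter((e) => e.data.type === kind) // the meter group filter
    .reduce((sum, e) => sum + e.data.tokens, 0); // same SUM aggregation

// The example run's two events:
const events: TokenEvent[] = [
  { data: { type: 'input', tokens: 18 } },
  { data: { type: 'output', tokens: 156 } },
];
console.log(featureTotal(events, 'input'));  // 18
console.log(featureTotal(events, 'output')); // 156
```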

&lt;p&gt;Left navigation: &lt;strong&gt;Product Catalog&lt;/strong&gt; → &lt;strong&gt;Features&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdxtrtt6wl7mxbos09wi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdxtrtt6wl7mxbos09wi.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Input token feature
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Create Feature&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Input Token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: auto-fills from the name (&lt;code&gt;input_token&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter&lt;/strong&gt;: &lt;code&gt;LLM Tokens&lt;/code&gt; (from the dropdown)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter Group Filters&lt;/strong&gt;: add a single filter

&lt;ul&gt;
&lt;li&gt;  Field: &lt;code&gt;type&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Operator: equals&lt;/li&gt;
&lt;li&gt;  Value: &lt;code&gt;input&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Save.&lt;/p&gt;

&lt;h4&gt;
  
  
  Output token feature
&lt;/h4&gt;

&lt;p&gt;Same form, output values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Output Token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: auto-fills from the name (&lt;code&gt;output_token&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter&lt;/strong&gt;: &lt;code&gt;LLM Tokens&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Meter Group Filter&lt;/strong&gt;: &lt;code&gt;type&lt;/code&gt; equals &lt;code&gt;output&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save.&lt;/p&gt;

&lt;p&gt;The same meter now feeds two features, each filtered to a different event subset.&lt;/p&gt;




&lt;h3&gt;
  
  
  Create a Plan with usage-based Rate Cards
&lt;/h3&gt;

&lt;p&gt;A plan is what a customer subscribes to. Inside it, rate cards attach prices to features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Catalog&lt;/strong&gt; → &lt;strong&gt;Plans&lt;/strong&gt; → &lt;strong&gt;New Plan&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;Pro&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inside the new plan, add two rate cards.&lt;/p&gt;

&lt;h4&gt;
  
  
  Input token rate card
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Add Rate Card&lt;/strong&gt; and select the &lt;code&gt;input token&lt;/code&gt; feature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;: Usage-based&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price per unit&lt;/strong&gt;: &lt;code&gt;1&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh9huala8za7vxtjgizy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh9huala8za7vxtjgizy.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two things about this field bite people on the first run.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;price per unit is the price for a single token&lt;/strong&gt;. Not per thousand, not per million. There is no toggle that switches the unit. Production rates are decimals like &lt;code&gt;0.000003&lt;/code&gt;. The example uses &lt;code&gt;1&lt;/code&gt; here so the dollar values on the test invoice are large and obvious.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;the pricing model selector decides whether the feature is metered or flat&lt;/strong&gt;. Choosing flat-fee here would charge a fixed amount per cycle regardless of usage, which is the opposite of what you want for a metered feature.&lt;/p&gt;

&lt;h4&gt;
  
  
  Output token rate card
&lt;/h4&gt;

&lt;p&gt;Click &lt;strong&gt;Add Rate Card&lt;/strong&gt; again, select &lt;code&gt;output token&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;: Usage-based&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price per unit&lt;/strong&gt;: &lt;code&gt;2&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save. Output tokens now cost twice the input rate, which roughly mirrors how OpenAI and most other providers price the underlying API.&lt;/p&gt;
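&lt;p&gt;With both rate cards in place, the invoice math for the example run is easy to check by hand. A small sketch using the tutorial's test rates (not production decimals):&lt;/p&gt;

```typescript
// Tutorial test rates: $1 per input token, $2 per output token.
// Production values would be decimals like 0.0000015 and 0.000006.
const INPUT_RATE = 1;
const OUTPUT_RATE = 2;

const lineTotal = (inputTokens: number, outputTokens: number): number =>
  inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;

// The example run logged 18 input and 156 output tokens:
console.log(lineTotal(18, 156)); // 18 * 1 + 156 * 2 = 330
```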




&lt;h3&gt;
  
  
  Create the customer
&lt;/h3&gt;

&lt;p&gt;The customer record needs to be created manually. The &lt;code&gt;subject&lt;/code&gt; field on every CloudEvent ties a token usage event to a specific customer through the customer's &lt;strong&gt;key&lt;/strong&gt;, so the key has to match the &lt;code&gt;SUBJECT&lt;/code&gt; value in your &lt;code&gt;.env&lt;/code&gt; (&lt;code&gt;acme&lt;/code&gt; in this tutorial).&lt;/p&gt;

&lt;p&gt;Left navigation: &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; → &lt;strong&gt;Billing&lt;/strong&gt; → &lt;strong&gt;Customers&lt;/strong&gt;. Top right: &lt;strong&gt;Create new&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9vir1h5lvq3j2vdmt40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9vir1h5lvq3j2vdmt40.png" alt=" " width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fill in the form:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;acme&lt;/code&gt; (display name shown in the portal)
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Key&lt;/strong&gt;: &lt;code&gt;acme&lt;/code&gt; (must match the &lt;code&gt;SUBJECT&lt;/code&gt; env value)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The customer is now in the system but does not have any plan attached yet. Token events tagged with &lt;code&gt;subject: acme&lt;/code&gt; will associate to this record once a subscription is in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a subscription
&lt;/h3&gt;

&lt;p&gt;A subscription connects this customer to the &lt;code&gt;Pro&lt;/code&gt; plan you built earlier. Without it, events still flow into the meter but never produce invoice line items.&lt;/p&gt;

&lt;p&gt;Open the &lt;code&gt;acme&lt;/code&gt; customer page and switch to the &lt;strong&gt;Subscriptions&lt;/strong&gt; tab. Click &lt;strong&gt;Create subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy448cbriovo2cmp8x4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy448cbriovo2cmp8x4z.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 1 of the wizard: pick the plan.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Subscription plan&lt;/strong&gt;: &lt;code&gt;Pro&lt;/code&gt; (the plan with input-token and output-token rate cards)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Step 2: timing and billing cycle. Defaults are fine for testing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start subscription&lt;/strong&gt;: Immediately&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bill&lt;/strong&gt;: Monthly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Starting&lt;/strong&gt;: Start of subscription&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt;, then &lt;strong&gt;Start subscription&lt;/strong&gt; on the confirmation step.&lt;/p&gt;

&lt;p&gt;The subscription is now active. The next call from &lt;code&gt;pnpm start&lt;/code&gt; lands inside an active billing window and rolls into an invoice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Track usage and invoices
&lt;/h3&gt;

&lt;p&gt;Run the agent a few times with prompts long enough that the response is more than a handful of tokens; otherwise the input and output counts can look almost identical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Back in Konnect, open the &lt;code&gt;acme&lt;/code&gt; customer page from the &lt;strong&gt;Billing&lt;/strong&gt; section and switch to the &lt;strong&gt;Invoicing&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y7bx5ajbeahct9z38ww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y7bx5ajbeahct9z38ww.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The view shows the active plan, both rate cards, accumulated usage per feature, and the running invoice total. With test rates of &lt;code&gt;$1&lt;/code&gt; per input token and &lt;code&gt;$2&lt;/code&gt; per output token, even four prompts produce a dollar value that is easy to verify against the handler's logged token counts. Switch to production decimals like &lt;code&gt;0.0000015&lt;/code&gt; and &lt;code&gt;0.000006&lt;/code&gt; and the same view continues to work, just with smaller numbers.&lt;/p&gt;




&lt;h3&gt;
  
  
  Connect a payment provider
&lt;/h3&gt;

&lt;p&gt;The metering and billing layer ends at invoice generation. Actually charging the customer needs a payment provider.&lt;/p&gt;

&lt;p&gt;Konnect connects to providers like &lt;strong&gt;Stripe&lt;/strong&gt; to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sync customer payment methods between Konnect and the provider&lt;/li&gt;
&lt;li&gt;  Charge invoices automatically when the billing cycle closes&lt;/li&gt;
&lt;li&gt;  Handle dunning, retries, and failed payments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The metering pipeline doesn't change when payment providers change. Kong owns usage aggregation and invoice generation. The provider only handles collection. That separation makes it possible to support multiple providers, switch between them, or test with one provider in staging and another in production without touching any code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Input and output token counts that look identical.&lt;/strong&gt; Short prompts can produce the same input and output token count by coincidence. The input count includes chat message formatting overhead (role markers, message delimiters) added by OpenAI before the prompt reaches the model, so a two-word prompt is rarely two tokens. Use a longer prompt to see the counts diverge clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Events appear in the meter but not in invoices.&lt;/strong&gt; The subscription started after the events were ingested. Kong only invoices events that fall inside an active subscription window. Run the app again after creating the subscription.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;subject&lt;/code&gt; &lt;strong&gt;missing warning in the logs.&lt;/strong&gt; The handler logs &lt;code&gt;could not find 'subject' in run metadata&lt;/code&gt; when the metadata block doesn't include a subject. Check that &lt;code&gt;.env&lt;/code&gt; exists (not just &lt;code&gt;.env.example&lt;/code&gt;), that &lt;code&gt;SUBJECT&lt;/code&gt; is set, and that the metadata block in &lt;code&gt;index.ts&lt;/code&gt; reads &lt;code&gt;subject&lt;/code&gt; from the env variable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EU vs US endpoint.&lt;/strong&gt; The default &lt;code&gt;API_URL&lt;/code&gt; is the US endpoint. EU Konnect organizations need &lt;code&gt;https://eu.api.konghq.tech/v3/openmeter/events&lt;/code&gt;. Wrong region produces silent ingestion failures. Confirm the region from Konnect organization settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event deduplication.&lt;/strong&gt; Kong deduplicates by &lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;source&lt;/code&gt;. Replaying the same event twice produces one record, not two. The handler builds &lt;code&gt;id&lt;/code&gt; from the LangChain &lt;code&gt;runId&lt;/code&gt;, so this is rarely an issue in normal use, but worth knowing if events are being replayed or generated outside this handler.&lt;/p&gt;
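&lt;p&gt;The effect of the deduplication rule can be sketched like this. Kong applies it server-side; the snippet is illustrative only:&lt;/p&gt;

```typescript
// Sketch of the dedup rule: two events with the same id + source count once.
const seen = new Set<string>();

function ingestOnce(event: { id: string; source: string }): boolean {
  const key = `${event.source}:${event.id}`;
  if (seen.has(key)) return false; // duplicate, dropped
  seen.add(key);
  return true; // first occurrence, counted
}

const e = { id: 'run-123-input', source: 'langchain' };
console.log(ingestOnce(e)); // true
console.log(ingestOnce(e)); // false — the replay is ignored
```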




&lt;h2&gt;
  
  
  Production checklist
&lt;/h2&gt;

&lt;p&gt;The reference app demonstrates the mechanics. A production setup needs a few real changes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;subject&lt;/code&gt; &lt;strong&gt;from auth, not env.&lt;/strong&gt; Replace &lt;code&gt;SUBJECT=acme&lt;/code&gt; with a value pulled from the authenticated user session. Each chain invocation passes the real customer ID into the metadata block.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Per-model pricing.&lt;/strong&gt; Add &lt;code&gt;model&lt;/code&gt; to the meter group filters on each feature and run different rate cards per model. GPT-4o, GPT-4o-mini, Claude, and others can all be priced independently while sharing one meter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom segmentation.&lt;/strong&gt; Any field added to the metadata block lands in data on the CloudEvent. Add tenant tier, region, or provider and filter or group on them in the meter to bill differently per segment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Usage alerts.&lt;/strong&gt; Once events flow, configure usage thresholds in Kong to notify customers, throttle them, or pause subscriptions when they hit a limit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Idempotent retries.&lt;/strong&gt; The handler doesn't retry failed &lt;code&gt;ingest()&lt;/code&gt; calls. Wrap &lt;code&gt;fetch&lt;/code&gt; with a small retry layer (exponential backoff, max attempts) to handle transient network errors without losing billable events. Kong's deduplication on &lt;code&gt;id + source&lt;/code&gt; makes safe retries straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
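&lt;p&gt;A minimal sketch of such a retry layer, assuming a plain &lt;code&gt;fetch&lt;/code&gt; POST to the ingest endpoint (the URL and headers here are illustrative, not the reference app's exact code):&lt;/p&gt;

```typescript
// Minimal retry sketch: exponential backoff with a capped attempt count.
// Endpoint URL and headers are illustrative assumptions.
async function ingestWithRetry(
  url: string,
  event: object,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/cloudevents+json' },
        body: JSON.stringify(event),
      });
      if (res.ok) return;
      throw new Error(`ingest failed with status ${res.status}`);
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of attempts, surface the error
      // Exponential backoff: baseDelayMs, 2x, 4x, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

&lt;p&gt;Because Kong deduplicates on &lt;code&gt;id + source&lt;/code&gt;, retrying an event whose first attempt actually succeeded (but whose response was lost) doesn't double-bill the customer.&lt;/p&gt;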




&lt;p&gt;The full reference AI Agent app is at &lt;a href="https://github.com/tejakummarikuntla/llm-metering-langchian-kong" rel="noopener noreferrer"&gt;https://github.com/tejakummarikuntla/llm-metering-langchian-kong&lt;/a&gt;. Clone, configure, and the metering pipeline runs locally in a few minutes. Adding it to an existing LangChain agent is a single line: &lt;code&gt;callbacks: [handler]&lt;/code&gt; on the LLM client. Everything else is Kong configuration.&lt;/p&gt;

&lt;p&gt;What's the trickiest part of metering an AI agent in production for you? Streaming responses, multi-model pricing, or per-tenant segmentation? Drop a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>💰I Built a Token Billing System for My AI Agent - Here's How It Works</title>
      <dc:creator>Teja Kummarikuntla</dc:creator>
      <pubDate>Tue, 31 Mar 2026 15:39:56 +0000</pubDate>
      <link>https://forem.com/konghq/i-built-a-token-billing-system-for-my-ai-agent-heres-how-it-works-dl2</link>
      <guid>https://forem.com/konghq/i-built-a-token-billing-system-for-my-ai-agent-heres-how-it-works-dl2</guid>
      <description>&lt;p&gt;I've been building an AI agent that routes requests across multiple LLM providers, &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt; etc., based on the task. But pretty quickly, I hit a real problem: &lt;em&gt;how do you charge for this fairly?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.&lt;/p&gt;

&lt;p&gt;I looked at a few options for usage-based billing. &lt;strong&gt;Stripe Billing&lt;/strong&gt; has metered subscriptions, but you have to build your own token tracking pipeline on top. &lt;strong&gt;Orb&lt;/strong&gt; and &lt;strong&gt;Metronome&lt;/strong&gt; are good, but they're separate vendors; you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeci2wp1ljaq0d7kl42f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeci2wp1ljaq0d7kl42f.png" alt=" " width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ended up using &lt;strong&gt;&lt;a href="https://konghq.com/products/kong-ai-gateway" rel="noopener noreferrer"&gt;Kong AI Gateway&lt;/a&gt;&lt;/strong&gt; with &lt;strong&gt;&lt;a href="https://konghq.com/products/kong-konnect/features/usage-based-metering-and-billing" rel="noopener noreferrer"&gt;Konnect Metering &amp;amp; Billing&lt;/a&gt;&lt;/strong&gt; (built on &lt;strong&gt;OpenMeter&lt;/strong&gt;). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.&lt;/p&gt;

&lt;p&gt;So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;🚧 Route requests through AI Gateway&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;🪙 Tokens get metered per consumer&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;💵 Pricing gets applied&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;🧾 Invoice generated&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the whole setup, step by step.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up the gateway&lt;/li&gt;
&lt;li&gt;Step 1: Create a consumer&lt;/li&gt;
&lt;li&gt;Step 2: Configure the AI Proxy&lt;/li&gt;
&lt;li&gt;Step 3: Enable token metering&lt;/li&gt;
&lt;li&gt;Step 4: Create a feature&lt;/li&gt;
&lt;li&gt;Step 5: Create a plan with a rate card&lt;/li&gt;
&lt;li&gt;Step 6: Create a subscription&lt;/li&gt;
&lt;li&gt;Step 7: Validate the invoice&lt;/li&gt;
&lt;li&gt;Step 8: Connect Stripe&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The billing pipeline has three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kong AI Gateway&lt;/strong&gt; proxies the LLM requests. It sits between the app and the provider, handles auth, and, crucially for billing, logs token statistics for every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Konnect Metering &amp;amp; Billing&lt;/strong&gt; (built on &lt;strong&gt;OpenMeter&lt;/strong&gt;) takes those token events and aggregates them per consumer, per billing cycle. It supports defining features, pricing models, and plans on top of the raw usage data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stripe&lt;/strong&gt; collects payment. The metering layer generates invoices that sync to Stripe.&lt;/p&gt;

&lt;p&gt;Let me walk through each piece.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You can do this entirely through the UI or via CLI. I'll cover both as we go.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;a href="https://konghq.com/products/kong-konnect" rel="noopener noreferrer"&gt;Kong Konnect&lt;/a&gt; account&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;OpenAI&lt;/strong&gt; API key (or any LLM provider key of your choice)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For CLI, you'll also need &lt;a href="https://developer.konghq.com/deck/" rel="noopener noreferrer"&gt;decK (v1.43+)&lt;/a&gt; installed and a &lt;a href="https://cloud.konghq.com/global/account/tokens" rel="noopener noreferrer"&gt;PAT from Kong Konnect&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set Up the Gateway
&lt;/h2&gt;

&lt;p&gt;Once you log in, click on &lt;strong&gt;API Gateway&lt;/strong&gt; and create one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fms4m351xq50wk94vsdk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fms4m351xq50wk94vsdk7.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as &lt;code&gt;ai-service&lt;/code&gt; and click &lt;strong&gt;Create and configure&lt;/strong&gt;. Once that's done, click &lt;strong&gt;Add a service and route&lt;/strong&gt; and fill in:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0qxc9dwgjcbqnsbyowd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0qxc9dwgjcbqnsbyowd.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service Name:&lt;/strong&gt; &lt;code&gt;ai-service&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service URL:&lt;/strong&gt; &lt;code&gt;http://httpbin.konghq.com/anything&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route Name:&lt;/strong&gt; &lt;code&gt;ai-chat&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route Path:&lt;/strong&gt; &lt;code&gt;/chat&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;If you prefer the command line, generate your PAT and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KONNECT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_konnect_pat'&lt;/span&gt;
curl &lt;span class="nt"&gt;-Ls&lt;/span&gt; https://get.konghq.com/quickstart | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nv"&gt;$KONNECT_TOKEN&lt;/span&gt; &lt;span class="nt"&gt;--deck-output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DECK_OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_openai_api_key'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set up the service and route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://httpbin.konghq.com/anything&lt;/span&gt;
&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-chat&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/chat"&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it with &lt;code&gt;deck gateway apply&lt;/code&gt;. Now you have a route at &lt;code&gt;/chat&lt;/code&gt; that we'll wire up to an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Consumer
&lt;/h2&gt;

&lt;p&gt;You can't bill anyone if the gateway doesn't know &lt;em&gt;who&lt;/em&gt; is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.&lt;/p&gt;

&lt;p&gt;Add a consumer with a key-auth credential:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gmknorg1j0xdfl4tcip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gmknorg1j0xdfl4tcip.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwgwyce9skaht7ows1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwgwyce9skaht7ows1.png" alt=" " width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter &lt;code&gt;acme-secret-key&lt;/code&gt; as the Key value.&lt;/p&gt;

&lt;p&gt;Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Plugins&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;New Plugin&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Key Authentication&lt;/strong&gt; from the plugin list&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Service&lt;/strong&gt; as the scope or keep it as &lt;strong&gt;Global&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Save&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;consumers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acme-corp&lt;/span&gt;
    &lt;span class="na"&gt;keyauth_credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acme-secret-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then enable the key-auth plugin on the service so the gateway actually requires authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;key-auth&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-service&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;key_names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;apikey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply both with &lt;code&gt;deck gateway apply&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now every request to &lt;code&gt;/chat&lt;/code&gt; must include an &lt;code&gt;apikey&lt;/code&gt; header. The gateway identifies the caller as &lt;code&gt;acme-corp&lt;/code&gt;, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.&lt;/p&gt;
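&lt;p&gt;To make the attribution concrete, here's a rough Python sketch of what key-auth buys you (the names and data shapes are mine, not Kong's internals): every usage event carries a consumer identity, and an unauthenticated call is rejected before any event exists:&lt;/p&gt;

```python
# Sketch of what key-auth gives the metering pipeline: every usage event
# carries a consumer identity, and unauthenticated calls are rejected
# before any event exists. Names are illustrative, not Kong internals.
CREDENTIALS = {"acme-secret-key": "acme-corp"}  # apikey -> consumer username

def attribute_usage(apikey, tokens_used):
    consumer = CREDENTIALS.get(apikey)
    if consumer is None:
        # The gateway answers 401 Unauthorized; no anonymous event is emitted.
        raise PermissionError("401 Unauthorized: unknown apikey")
    return {"consumer": consumer, "tokens": tokens_used}

print(attribute_usage("acme-secret-key", 42))
# {'consumer': 'acme-corp', 'tokens': 42}
```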

&lt;h2&gt;
  
  
  Step 2: Configure the AI Proxy
&lt;/h2&gt;

&lt;p&gt;Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Plugins&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;New Plugin&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;AI Proxy&lt;/strong&gt; from the plugin list&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsy0uvct4i4h9fl4siqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsy0uvct4i4h9fl4siqt.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the CLI, use the YAML below and configure the plugin fields accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_format_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.0"&lt;/span&gt;
&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ai-proxy&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;route_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm/v1/chat&lt;/span&gt;
      &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;header_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Authorization&lt;/span&gt;
        &lt;span class="na"&gt;header_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bearer ${{ env "DECK_OPENAI_API_KEY" }}&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;
      &lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;log_payloads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;log_statistics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note here:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;log_statistics: true&lt;/code&gt; is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;log_payloads: true&lt;/code&gt; logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.&lt;/p&gt;

&lt;p&gt;Apply with &lt;code&gt;deck gateway apply&lt;/code&gt; and test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KONNECT_PROXY_URL&lt;/span&gt;&lt;span class="s2"&gt;/chat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: acme-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s1"&gt;'{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.&lt;/p&gt;

&lt;p&gt;If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use &lt;a href="https://developer.konghq.com/plugins/ai-proxy-advanced/" rel="noopener noreferrer"&gt;&lt;code&gt;ai-proxy-advanced&lt;/code&gt;&lt;/a&gt; instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Enable Token Metering
&lt;/h2&gt;

&lt;p&gt;Now we connect the gateway's token logs to the metering system.&lt;/p&gt;

&lt;p&gt;In Konnect, go to &lt;strong&gt;Metering &amp;amp; Billing&lt;/strong&gt; in the sidebar. You'll see an &lt;strong&gt;AI Gateway Tokens&lt;/strong&gt; section. Click &lt;strong&gt;Enable Related API Gateways&lt;/strong&gt;, select your control plane (the quickstart one), and confirm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5ktuk4v5bkcc0poondr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5ktuk4v5bkcc0poondr.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This activates a built-in meter called &lt;code&gt;kong_konnect_llm_tokens&lt;/code&gt;. It uses SUM aggregation on the token count, grouped by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$.model&lt;/code&gt; : which LLM handled the request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$.type&lt;/code&gt; : whether the tokens are input (&lt;code&gt;request&lt;/code&gt;) or output (&lt;code&gt;response&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, with each token depending on all previous tokens. If your metering doesn't split these, your pricing will be wrong.&lt;/p&gt;

&lt;p&gt;At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.&lt;/p&gt;
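&lt;p&gt;If it helps to see the aggregation concretely, here's a small Python sketch of the SUM-and-group-by behavior (the event shape is illustrative, not Konnect's actual schema):&lt;/p&gt;

```python
# Sketch of the built-in meter's SUM aggregation, grouped by model and
# token type. The event shape is illustrative, not Konnect's schema.
from collections import defaultdict

events = [
    {"model": "gpt-4o", "type": "request", "tokens": 25},
    {"model": "gpt-4o", "type": "response", "tokens": 110},
    {"model": "gpt-4o", "type": "request", "tokens": 30},
]

totals = defaultdict(int)
for e in events:
    totals[(e["model"], e["type"])] += e["tokens"]  # SUM per (model, type)

print(dict(totals))
# {('gpt-4o', 'request'): 55, ('gpt-4o', 'response'): 110}
```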

&lt;h2&gt;
  
  
  Step 4: Create a Feature
&lt;/h2&gt;

&lt;p&gt;A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Metering &amp;amp; Billing → Product Catalog → Features&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;ai-token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meter:&lt;/strong&gt; AI Gateway Tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Group by filters:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Provider = &lt;code&gt;openai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Type = &lt;code&gt;request&lt;/code&gt; (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk1w21y609vz6zp52x3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk1w21y609vz6zp52x3w.png" alt=" " width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features, one per model, one per token direction, to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Create a Plan with a Rate Card
&lt;/h2&gt;

&lt;p&gt;Plans bundle features with pricing. Go to &lt;strong&gt;Product Catalog → Plans&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Starter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing cadence:&lt;/strong&gt; 1 month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnifkwcpt9d3jnpc01qqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnifkwcpt9d3jnpc01qqs.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add a rate card:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature:&lt;/strong&gt; &lt;code&gt;ai-token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing model:&lt;/strong&gt; Usage Based&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price per unit:&lt;/strong&gt; &lt;code&gt;1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entitlement type:&lt;/strong&gt; Boolean (grants access to the feature)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3q5x8ej2aif9p38k39k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3q5x8ej2aif9p38k39k.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering &lt;code&gt;1&lt;/code&gt; means $1.00 per token, which is way too expensive for real use. I'm using it here because the &lt;a href="https://developer.konghq.com/how-to/meter-llm-traffic/" rel="noopener noreferrer"&gt;official tutorial&lt;/a&gt; does the same thing: a round number that makes invoice changes easy to spot during testing.&lt;/p&gt;

&lt;p&gt;For production, you'd enter something like &lt;code&gt;0.0000025&lt;/code&gt; for GPT-4o input tokens ($2.50 per 1M tokens) or &lt;code&gt;0.00001&lt;/code&gt; for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI. You do the math yourself and enter the per-token price as a decimal.&lt;/p&gt;
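&lt;p&gt;The conversion is easy to sanity-check in a few lines of Python (the helper is mine, not part of any Kong tooling; the rates are OpenAI's GPT-4o list prices at the time of writing):&lt;/p&gt;

```python
# Converting published per-1M-token prices into the per-token decimals the
# rate card expects. Helper and workload numbers are mine; rates are OpenAI's
# GPT-4o list prices ($2.50/1M input, $10.00/1M output) at time of writing.
def per_token(price_per_million_usd):
    return price_per_million_usd / 1_000_000

input_rate = per_token(2.50)    # 0.0000025
output_rate = per_token(10.00)  # 0.00001

# A call that used 1,200 input tokens and 350 output tokens would bill:
charge = 1200 * input_rate + 350 * output_rate
print(f"${charge:.6f}")  # $0.006500
```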

&lt;p&gt;Publish the plan. It's now available for subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Create a Customer and Start a Subscription
&lt;/h2&gt;

&lt;p&gt;This is where the consumer from Step 1 connects to the billing system.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Metering &amp;amp; Billing → Billing → Customers&lt;/strong&gt; and create one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Acme Corp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include usage from:&lt;/strong&gt; select the &lt;code&gt;acme-corp&lt;/code&gt; consumer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhg2bomizkgass3rhmio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhg2bomizkgass3rhmio.png" alt=" " width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.&lt;/p&gt;
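&lt;p&gt;Conceptually, the join looks like this (illustrative Python, not Konnect's data model):&lt;/p&gt;

```python
# Sketch of the two identity layers being joined: gateway consumers
# (who called the API) mapped to billing customers (who gets invoiced).
# Structures are illustrative, not Konnect's data model.
consumer_to_customer = {"acme-corp": "Acme Corp"}

usage_events = [
    {"consumer": "acme-corp", "tokens": 55},
    {"consumer": "stray-bot", "tokens": 10},  # never linked to a customer
]

billable = [
    {"customer": consumer_to_customer[e["consumer"]], "tokens": e["tokens"]}
    for e in usage_events
    if e["consumer"] in consumer_to_customer
]
print(billable)  # only Acme Corp's usage can ever reach an invoice
```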

&lt;p&gt;Now create a subscription:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the Acme Corp customer, then &lt;strong&gt;Subscriptions → Create a Subscription&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan:&lt;/strong&gt; &lt;code&gt;Starter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start the subscription&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important detail: &lt;strong&gt;metering only invoices events that occur after the subscription starts.&lt;/strong&gt; If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.&lt;/p&gt;
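&lt;p&gt;In pseudocode terms, the invoicing cutoff behaves like a timestamp filter (illustrative Python with made-up data, not Konnect's implementation):&lt;/p&gt;

```python
# Why pre-subscription traffic never gets billed: invoicing only considers
# events timestamped at or after the subscription start. Data is made up.
from datetime import datetime

subscription_start = datetime(2026, 5, 1, 12, 0)

events = [
    {"at": datetime(2026, 5, 1, 11, 50), "tokens": 500},  # early test traffic
    {"at": datetime(2026, 5, 1, 12, 30), "tokens": 80},
]

invoiced_tokens = sum(e["tokens"] for e in events if e["at"] >= subscription_start)
print(invoiced_tokens)  # 80 -- the 500 earlier tokens never appear
```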

&lt;h2&gt;
  
  
  Step 7: Validate the Invoice
&lt;/h2&gt;

&lt;p&gt;Send a few requests through the gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..6&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KONNECT_PROXY_URL&lt;/span&gt;&lt;span class="s2"&gt;/chat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: acme-secret-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s1"&gt;'{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait a minute or two for the events to propagate, then go to &lt;strong&gt;Metering &amp;amp; Billing → Billing → Invoices&lt;/strong&gt;. Click on Acme Corp, go to the &lt;strong&gt;Invoicing&lt;/strong&gt; tab, and hit &lt;strong&gt;Preview Invoice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi98dp2rv7i6qnom8se2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi98dp2rv7i6qnom8se2.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should see the &lt;code&gt;ai-token&lt;/code&gt; feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Stripe
&lt;/h2&gt;

&lt;p&gt;Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering &amp;amp; Billing settings, and invoices flow through automatically at the end of each billing cycle.&lt;/p&gt;

&lt;p&gt;The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things I Ran Into
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The consumer-customer mapping confused me at first.&lt;/strong&gt; Kong Gateway has "consumers" (API identity). Metering &amp;amp; Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input vs. output pricing is a bigger deal than I expected.&lt;/strong&gt; Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.&lt;/p&gt;
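&lt;p&gt;A quick back-of-the-envelope shows how badly a flat rate misses on an output-heavy workload (the rates are GPT-4o list prices; the workload split is invented):&lt;/p&gt;

```python
# Back-of-the-envelope: a flat per-token rate vs. split input/output rates
# on an output-heavy workload. Rates are GPT-4o list prices; the workload
# numbers are invented.
INPUT_RATE = 2.50 / 1_000_000
OUTPUT_RATE = 10.00 / 1_000_000
FLAT_RATE = 2.50 / 1_000_000  # naive: price everything at the input rate

input_tokens, output_tokens = 100_000, 900_000  # short prompts, long answers

true_cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
flat_billed = (input_tokens + output_tokens) * FLAT_RATE

print(round(true_cost, 2), round(flat_billed, 2))  # 9.25 2.5
```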

&lt;p&gt;&lt;strong&gt;The order of operations matters.&lt;/strong&gt; Specifically: create the consumer and link it to a customer &lt;em&gt;before&lt;/em&gt; you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I'd Take This Next
&lt;/h2&gt;

&lt;p&gt;This walkthrough uses a single provider and a single feature. A production setup would look more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple features&lt;/strong&gt;: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tiered pricing&lt;/strong&gt;: lower per-token rates at higher usage thresholds to incentivize growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entitlements with metered limits&lt;/strong&gt;: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Proxy Advanced&lt;/strong&gt;: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)&lt;/li&gt;
&lt;/ul&gt;
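&lt;p&gt;As a sketch of what the tiered-pricing math from that list looks like (the tiers and rates are invented for illustration; this isn't Konnect's rate card logic):&lt;/p&gt;

```python
# Sketch of graduated tiered pricing: cheaper per-token rates once usage
# crosses thresholds. Tiers and rates are invented for illustration; this
# is not Konnect's rate card logic.
TIERS = [  # (tokens in this tier, dollars per token)
    (1_000_000, 5.00 / 1_000_000),     # first 1M at $5 per 1M
    (9_000_000, 3.00 / 1_000_000),     # next 9M at $3 per 1M
    (float("inf"), 1.00 / 1_000_000),  # everything beyond at $1 per 1M
]

def tiered_charge(tokens):
    charge, remaining = 0.0, tokens
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        charge += used * rate
        remaining -= used
        if remaining == 0:
            break
    return charge

# 12M tokens: 1M at $5/1M + 9M at $3/1M + 2M at $1/1M = $34
print(round(tiered_charge(12_000_000), 2))  # 34.0
```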

&lt;p&gt;The docs for all of these are at &lt;a href="https://developer.konghq.com/metering-and-billing/" rel="noopener noreferrer"&gt;developer.konghq.com/metering-and-billing&lt;/a&gt; and &lt;a href="https://developer.konghq.com/ai-gateway/" rel="noopener noreferrer"&gt;developer.konghq.com/ai-gateway&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
