<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Soumya Prasad</title>
    <description>The latest articles on Forem by Soumya Prasad (@soumyaprasadrana).</description>
    <link>https://forem.com/soumyaprasadrana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2396034%2Faf221698-da95-4030-8a62-f51a730b4489.jpg</url>
      <title>Forem: Soumya Prasad</title>
      <link>https://forem.com/soumyaprasadrana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/soumyaprasadrana"/>
    <language>en</language>
    <item>
      <title>Stop Bleeding Tokens: How We Cut Enterprise API Costs by 46% with a Lossless JSON Encoder</title>
      <dc:creator>Soumya Prasad</dc:creator>
      <pubDate>Sat, 28 Feb 2026 18:26:07 +0000</pubDate>
      <link>https://forem.com/soumyaprasadrana/stop-bleeding-tokens-how-we-cut-enterprise-api-costs-by-46-with-a-lossless-json-encoder-4ol5</link>
      <guid>https://forem.com/soumyaprasadrana/stop-bleeding-tokens-how-we-cut-enterprise-api-costs-by-46-with-a-lossless-json-encoder-4ol5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;You're not paying for what your LLM thinks. You're paying for what your enterprise API sends.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Token Crisis Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Everyone talks about prompt engineering, context windows, and model selection when optimizing LLM costs. But there's a silent killer hiding in plain sight — &lt;strong&gt;the raw JSON payloads from enterprise APIs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you connect an AI agent to systems like IBM Maximo, ServiceNow, or SAP, you're not just getting data back. You're getting &lt;em&gt;infrastructure noise&lt;/em&gt; dressed up as data.&lt;/p&gt;

&lt;p&gt;Here's what a &lt;strong&gt;single&lt;/strong&gt; incident record from a Maximo OSLC API actually looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PRIYA.N"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status_description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"In Progress"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slarecords_collectionref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/slarecords"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"labtrans_collectionref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="s2"&gt;"api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/labtrans"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticketprop_collectionref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/ticketprop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relatedrecord_collectionref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api/os/mxapiincident/_VElDS0VULzEwMDE3Mw--/relatedrecord"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_rowstamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"26338737"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"accumulatedholdtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"class_description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Service Request"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"changeby"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PRIYA.N"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"createdby"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MAXUSER1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ownergroup"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PLANTOPS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"origfromalert"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's &lt;strong&gt;one&lt;/strong&gt; record. Now multiply it by 15–20 records per page. The pattern becomes painfully clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;slarecords_collectionref&lt;/code&gt;, &lt;code&gt;labtrans_collectionref&lt;/code&gt;, &lt;code&gt;ticketprop_collectionref&lt;/code&gt; — pagination handles the LLM &lt;strong&gt;can never follow&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;_rowstamp&lt;/code&gt; — a database concurrency token the LLM has no use for&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PRIYA.N&lt;/code&gt;, &lt;code&gt;SR&lt;/code&gt;, &lt;code&gt;MAXUSER1&lt;/code&gt;, &lt;code&gt;PLANTOPS&lt;/code&gt; — repeated &lt;strong&gt;identically&lt;/strong&gt; across every single record&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;class_description: "Service Request"&lt;/code&gt; alongside &lt;code&gt;class: "SR"&lt;/code&gt; — the same thing, twice, on every row&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A real-world test of 15 Maximo incident records consumed &lt;strong&gt;~5,400 tokens&lt;/strong&gt;. The LLM asked for incident data. It got database plumbing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Insight: Enterprise APIs Weren't Built for LLMs
&lt;/h2&gt;

&lt;p&gt;Enterprise systems like Maximo were designed for browser UIs and system integrations — not token-efficient AI consumption. Their REST/OSLC APIs are built to be complete and self-describing. That's great for a UI developer. It's expensive for an LLM.&lt;/p&gt;

&lt;p&gt;The data your agent actually needs is maybe &lt;strong&gt;40–50% of what gets sent&lt;/strong&gt;. The rest is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repeated strings&lt;/strong&gt; — owner names, status codes, class labels duplicated on every record&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collection refs&lt;/strong&gt; — internal pagination handles the LLM can't follow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal metadata&lt;/strong&gt; — rowstamps, localrefs, origfromalert flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbose field names&lt;/strong&gt; — &lt;code&gt;description_longdescription&lt;/code&gt;, &lt;code&gt;status_description&lt;/code&gt;, &lt;code&gt;class_description&lt;/code&gt; eating characters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML markup&lt;/strong&gt; — embedded in long description fields, consuming tokens on angle brackets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every token of this noise costs money. On high-frequency agentic workflows — think a help desk AI processing hundreds of incident queries per hour — this adds up fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing &lt;code&gt;lean-normalizer&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;We built &lt;a href="https://github.com/soumyaprasadrana/lean-normalizer" rel="noopener noreferrer"&gt;&lt;code&gt;lean-normalizer&lt;/code&gt;&lt;/a&gt; — a pre-processing layer that sits between your enterprise API and your LLM tool call. It encodes the response into &lt;strong&gt;LEAN format&lt;/strong&gt;: a compact, human-readable, fully reversible wire format specifically designed for LLM consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LEAN&lt;/strong&gt; stands for &lt;strong&gt;L&lt;/strong&gt;ossless &lt;strong&gt;E&lt;/strong&gt;nterprise &lt;strong&gt;A&lt;/strong&gt;PI &lt;strong&gt;N&lt;/strong&gt;ormalization.&lt;/p&gt;

&lt;p&gt;The key word is &lt;strong&gt;lossless&lt;/strong&gt;. Nothing is dropped. Every field value is preserved. The LLM can reconstruct any original value without any client-side code.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;

&lt;p&gt;LEAN encoding applies three compression strategies in two passes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Dictionary deduplication&lt;/strong&gt;&lt;br&gt;
Any string that repeats across records gets stored once in a &lt;code&gt;### DICT&lt;/code&gt; block and referenced as a &lt;code&gt;*N&lt;/code&gt; pointer inline. &lt;code&gt;PRIYA.N&lt;/code&gt; appearing 13 times across 15 records? Stored once. Referenced as &lt;code&gt;*0&lt;/code&gt; everywhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Schema key shortening&lt;/strong&gt;&lt;br&gt;
Long field names like &lt;code&gt;status_description&lt;/code&gt;, &lt;code&gt;description_longdescription&lt;/code&gt;, &lt;code&gt;accumulatedholdtime&lt;/code&gt; get replaced with short base-36 keys (&lt;code&gt;d&lt;/code&gt;, &lt;code&gt;i&lt;/code&gt;, &lt;code&gt;0&lt;/code&gt;). The mapping lives in a &lt;code&gt;### SCHEMA&lt;/code&gt; block the LLM reads once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Noise suppression&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;_rowstamp&lt;/code&gt;, &lt;code&gt;*_collectionref&lt;/code&gt; fields, &lt;code&gt;localref&lt;/code&gt;, empty strings, HTML markup — stripped entirely via adapter-specific rules. The LLM never asked for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two things are never compressed:&lt;/strong&gt; &lt;code&gt;href&lt;/code&gt; values and ISO dates. These are always emitted raw so agents can pass them directly to follow-up tool calls (&lt;code&gt;PATCH&lt;/code&gt;, &lt;code&gt;GET&lt;/code&gt;) without any decoding step.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Output Format
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### LEAN FORMAT v1

### DICT
*0=PRIYA.N
*1=SR
*2=Service Request
*3=MAXUSER1
*4=QUEUED

### SCHEMA
0=accumulatedholdtime
1=changeby
c=status
d=status_description
f=ticketid
h=description
i=description_longdescription

### DATA: member
_id:0 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" c:"*4" d:"Queued" f:100171 ...
_id:1 0:0 1:"*0" 3:"*1" 4:"*2" 5:"*3" h:"High Priority: Cooling Tower..." c:"*9" ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The LLM reads &lt;code&gt;### SCHEMA&lt;/code&gt; and &lt;code&gt;### DICT&lt;/code&gt; once, then processes each &lt;code&gt;### DATA&lt;/code&gt; row. Nested child objects (like &lt;code&gt;relatedrecord&lt;/code&gt; arrays) are decomposed into named child tables linked by &lt;code&gt;_p&lt;/code&gt; parent references — flat, indexed, readable.&lt;/p&gt;
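&lt;p&gt;As a concrete illustration of what that decode step involves, here is a rough TypeScript decoder for the sample above. The parsing rules are inferred from that sample output, not taken from the library's source, so treat it as a sketch:&lt;/p&gt;

```typescript
// Minimal LEAN v1 decoder sketch. Parsing rules are inferred from the
// sample output in this post, not from the lean-normalizer source.
function decodeLean(text: string): any[] {
  const dict: any = {};   // *N pointer -> string value
  const schema: any = {}; // short key -> full field name
  const rows: any[] = [];
  let section = "";
  for (const line of text.split("\n")) {
    if (line.startsWith("### ")) {
      // "### DICT", "### SCHEMA", "### DATA: member" -> section name
      section = line.slice(4).split(":")[0].trim();
      continue;
    }
    if (line.trim() === "") continue;
    if (section === "DICT" || section === "SCHEMA") {
      const eq = line.indexOf("=");
      const target = section === "DICT" ? dict : schema;
      target[line.slice(0, eq)] = line.slice(eq + 1);
    } else if (section === "DATA") {
      const row: any = {};
      // Tokens look like  key:"quoted value"  or  key:12345
      const re = /(\w+):("(?:[^"]*)"|\S+)/g;
      let m;
      while ((m = re.exec(line)) !== null) {
        const name = schema[m[1]] !== undefined ? schema[m[1]] : m[1];
        const raw = m[2];
        if (raw.startsWith('"')) {
          let s = raw.slice(1, -1);
          // Quoted values of the form *N are dictionary pointers
          if (s.startsWith("*")) {
            if (dict[s] !== undefined) s = dict[s];
          }
          row[name] = s;
        } else {
          // Unquoted values are numeric (ids, counts, timestamps)
          row[name] = Number(raw);
        }
      }
      rows.push(row);
    }
  }
  return rows;
}
```

&lt;p&gt;The point is not that you need this code — the LLM does the equivalent in-context — but that the format is mechanical enough that a few lines recover the original records.&lt;/p&gt;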


&lt;h2&gt;
  
  
  Real-World Test Results
&lt;/h2&gt;

&lt;p&gt;We tested against a real IBM Maximo instance, querying active incidents via the &lt;code&gt;MXAPIINCIDENT&lt;/code&gt; OSLC Object Structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset:&lt;/strong&gt; 13 incident records, full &lt;code&gt;oslc.select=*&lt;/code&gt; payload including doclinks and related records.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Payload Size&lt;/th&gt;
&lt;th&gt;Tokens (approx)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw JSON (&lt;code&gt;useLean=false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18,407 chars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LEAN encoded (&lt;code&gt;useLean=true&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,921 chars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2,480&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Saving&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8,486 chars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,120 tokens (~46%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Compression ratio: 0.539&lt;/strong&gt; — consistently in the 46–54% range across multiple runs on the same dataset.&lt;/p&gt;
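&lt;p&gt;The numbers are easy to sanity-check from the character counts, using the rough 4-characters-per-token heuristic behind the token estimates:&lt;/p&gt;

```typescript
// Sanity-check the table above. The divide-by-4 chars-per-token rule
// is the usual rough heuristic, not an exact tokenizer count.
const rawChars = 18407;
const leanChars = 9921;

const ratio = leanChars / rawChars;
console.log(ratio.toFixed(3));           // "0.539", as reported
console.log(Math.round(rawChars / 4));   // ~4,602 tokens raw
console.log(Math.round(leanChars / 4));  // ~2,480 tokens encoded
```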

&lt;p&gt;The circuit breaker was not triggered (it fires only when encoding would make the payload &lt;em&gt;larger&lt;/em&gt;, which can happen with very small or highly unique datasets).&lt;/p&gt;
&lt;h3&gt;
  
  
  Was Any Data Lost?
&lt;/h3&gt;

&lt;p&gt;We cross-validated both responses field by field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All 13 records returned in both modes ✅&lt;/li&gt;
&lt;li&gt;All ticket IDs, descriptions, priorities, statuses matched ✅&lt;/li&gt;
&lt;li&gt;Related records and linked work orders preserved ✅&lt;/li&gt;
&lt;li&gt;Attachment/doclink metadata intact ✅&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;actualfinish&lt;/code&gt; dates on resolved tickets present in both ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Zero data loss. 46% token reduction.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  LLM Compatibility
&lt;/h2&gt;

&lt;p&gt;A natural question: can modern LLMs actually decode this format reliably?&lt;/p&gt;

&lt;p&gt;We tested with &lt;strong&gt;Claude (Anthropic)&lt;/strong&gt; and &lt;strong&gt;OpenAI&lt;/strong&gt; tool calling. Both models handle LEAN decoding correctly — they read &lt;code&gt;### SCHEMA&lt;/code&gt; to resolve short keys, read &lt;code&gt;### DICT&lt;/code&gt; to expand &lt;code&gt;*N&lt;/code&gt; pointers, and process &lt;code&gt;### DATA&lt;/code&gt; rows without confusion.&lt;/p&gt;

&lt;p&gt;The format is intentionally designed to be self-documenting. There's no magic — it's structured text with a two-block header. Any enterprise-grade LLM that can follow instructions can decode it. The tool description tells the model the format exists upfront:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Returns open Maximo incidents. When the response contains "### LEAN FORMAT v1", &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;it is LEAN-encoded. Use ### SCHEMA to map short keys back to field names, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;and ### DICT to expand *N pointer values. &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;href values and ISO dates are always emitted raw.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's all the LLM needs. No client SDK. No parsing library on the model side.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using It in Your MCP Tool
&lt;/h2&gt;

&lt;p&gt;Installation is a single package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @soumyaprasadrana/lean-normalizer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it into any MCP tool that retrieves enterprise data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LeanEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MaximoAdapter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@soumyaprasadrana/lean-normalizer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LeanEncoder&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MaximoAdapter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;get_incidents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;maximo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getIncidents&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;encoded&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;_meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;lean_compressed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lean_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ratio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lean_original_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;originalSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lean_encoded_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;encodedSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;_meta&lt;/code&gt; block is optional but useful for monitoring compression stats in tool call traces.&lt;/p&gt;

&lt;p&gt;The circuit breaker handles edge cases automatically — if encoding a small payload would make it &lt;em&gt;larger&lt;/em&gt;, the library returns raw JSON unchanged with &lt;code&gt;compressed: false&lt;/code&gt;. Your agent code doesn't need to handle this specially.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built-in Adapters
&lt;/h2&gt;

&lt;p&gt;The library ships adapters for three enterprise systems out of the box:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IBM Maximo (OSLC / REST API)&lt;/strong&gt;&lt;br&gt;
Detects the root array at &lt;code&gt;payload.member&lt;/code&gt;, derives table names from OSLC Object Structure hrefs, suppresses &lt;code&gt;_rowstamp&lt;/code&gt; / &lt;code&gt;*_collectionref&lt;/code&gt; fields, strips &lt;code&gt;spi:&lt;/code&gt; / &lt;code&gt;rdf:&lt;/code&gt; namespace prefixes, strips HTML from long descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ServiceNow (Table API)&lt;/strong&gt;&lt;br&gt;
Detects root at &lt;code&gt;payload.result&lt;/code&gt;, drops the &lt;code&gt;link&lt;/code&gt; half of reference objects (&lt;code&gt;{ link, value }&lt;/code&gt; → keeps &lt;code&gt;value&lt;/code&gt;), suppresses &lt;code&gt;sys_class_name&lt;/code&gt;, &lt;code&gt;sys_domain&lt;/code&gt;, &lt;code&gt;sys_domain_path&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SAP OData (v2 and v4)&lt;/strong&gt;&lt;br&gt;
Detects root at &lt;code&gt;payload.d.results&lt;/code&gt; or &lt;code&gt;payload.value&lt;/code&gt;, strips &lt;code&gt;__metadata&lt;/code&gt; and &lt;code&gt;__deferred&lt;/code&gt;, converts &lt;code&gt;/Date(ms)/&lt;/code&gt; timestamps to ISO-8601.&lt;/p&gt;

&lt;p&gt;Writing a new adapter is ~30 lines of TypeScript — implement the &lt;code&gt;LeanAdapter&lt;/code&gt; interface, add a fixture JSON, add a test.&lt;/p&gt;
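&lt;p&gt;To give a feel for the shape of that work, here is a hypothetical adapter sketch. The interface and method names below (&lt;code&gt;rootArray&lt;/code&gt;, &lt;code&gt;suppressField&lt;/code&gt;) are illustrative assumptions, not the library's actual &lt;code&gt;LeanAdapter&lt;/code&gt; signature, and Salesforce is a roadmap item rather than a shipped adapter:&lt;/p&gt;

```typescript
// Hypothetical adapter sketch. The real LeanAdapter interface in
// lean-normalizer may differ; the shape here is an assumption made
// for illustration only.
interface AdapterSketch {
  rootArray(payload: any): any[] | null; // locate the record array
  suppressField(key: string): boolean;   // noise-suppression rule
}

// A Salesforce-style adapter: Salesforce query responses put records
// under `records`, each carrying an `attributes` metadata object
// ({ type, url }) the LLM has no use for.
class SalesforceAdapterSketch implements AdapterSketch {
  rootArray(payload: any): any[] | null {
    return Array.isArray(payload?.records) ? payload.records : null;
  }
  suppressField(key: string): boolean {
    return key === "attributes";
  }
}
```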




&lt;h2&gt;
  
  
  The Economics
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete. Suppose your AI agent processes &lt;strong&gt;500 Maximo incident queries per day&lt;/strong&gt; — a moderate load for a help desk automation.&lt;/p&gt;

&lt;p&gt;At ~4,600 tokens per raw query vs ~2,480 tokens with LEAN:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Raw JSON&lt;/th&gt;
&lt;th&gt;LEAN&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per query&lt;/td&gt;
&lt;td&gt;~4,600&lt;/td&gt;
&lt;td&gt;~2,480&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily tokens (500 queries)&lt;/td&gt;
&lt;td&gt;~2,300,000&lt;/td&gt;
&lt;td&gt;~1,240,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily saving&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~1,060,000 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At typical enterprise LLM pricing, that's a meaningful cost line — and it compounds directly with query volume. The heavier your agentic workload, the more LEAN pays for itself.&lt;/p&gt;

&lt;p&gt;And this is just the input token side. Smaller context also means faster responses — the model processes less before it can start reasoning.&lt;/p&gt;
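&lt;p&gt;A quick back-of-envelope check of the table, using an illustrative price. The $3 per million input tokens figure is an assumption for the arithmetic, not a quote from any provider:&lt;/p&gt;

```typescript
// Reproduce the daily-saving row of the table above.
const queriesPerDay = 500;
const rawTokens = 4600;   // per query, raw JSON
const leanTokens = 2480;  // per query, LEAN encoded
const pricePerMTok = 3;   // USD per 1M input tokens -- assumed, for illustration

const dailySavedTokens = queriesPerDay * (rawTokens - leanTokens); // 1,060,000
const dailySavedUsd = (dailySavedTokens / 1_000_000) * pricePerMTok;
console.log(dailySavedTokens, dailySavedUsd.toFixed(2));
```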




&lt;h2&gt;
  
  
  When to Use It (and When Not To)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP tools connecting to Maximo, ServiceNow, SAP&lt;/li&gt;
&lt;li&gt;Any enterprise API that returns repetitive, field-heavy records&lt;/li&gt;
&lt;li&gt;Agentic workflows with high query volume&lt;/li&gt;
&lt;li&gt;Situations where you're approaching context window limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not the right tool:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Very small payloads (1–3 records) — the circuit breaker will likely return raw JSON anyway&lt;/li&gt;
&lt;li&gt;Streaming responses — LEAN is a complete-payload format&lt;/li&gt;
&lt;li&gt;Cases where the LLM needs raw JSON for downstream tool arguments (though &lt;code&gt;href&lt;/code&gt; and dates are always raw)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The library is experimental and actively maintained. A few things on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Selective field encoding&lt;/strong&gt; — let adapters specify which fields are agent-critical vs suppressible, giving finer control than the current skip/keep binary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming-safe chunked mode&lt;/strong&gt; — for APIs that support server-sent events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More adapters&lt;/strong&gt; — Oracle EBS, Salesforce, Dynamics 365 are obvious next targets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks dashboard&lt;/strong&gt; — a public comparison of compression ratios across different enterprise API response shapes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contributions are welcome. If you're connecting an AI agent to any enterprise system and hitting token ceilings, this library might be worth a look.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📦 &lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/@soumyaprasadrana/lean-normalizer" rel="noopener noreferrer"&gt;npmjs.com/package/@soumyaprasadrana/lean-normalizer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/soumyaprasadrana/lean-normalizer" rel="noopener noreferrer"&gt;github.com/soumyaprasadrana/lean-normalizer&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built while wiring up an IBM Maximo incident planning agent. The token bills were the motivation. The 46% reduction was the result.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>encoder</category>
      <category>mcp</category>
      <category>maximo</category>
    </item>
    <item>
      <title>Supercharge Java Debugging in VS Code with Java DebugX: Macro Recording &amp; Playback Made Easy</title>
      <dc:creator>Soumya Prasad</dc:creator>
      <pubDate>Sun, 10 Nov 2024 03:54:20 +0000</pubDate>
      <link>https://forem.com/soumyaprasadrana/supercharge-java-debugging-in-vs-code-with-java-debugx-macro-recording-playback-made-easy-57kl</link>
      <guid>https://forem.com/soumyaprasadrana/supercharge-java-debugging-in-vs-code-with-java-debugx-macro-recording-playback-made-easy-57kl</guid>
      <description>&lt;p&gt;Let’s face it: if you’re a developer, a massive chunk of your time is probably spent on debugging. While development might only take up 20% of the process, the rest is usually about fixing issues, tracing paths, and reproducing bugs. Debugging large Java applications can be especially challenging and time-consuming, as you sift through complex flows and repeatedly retrace steps. But what if there was a way to make this easier?&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;Java DebugX&lt;/strong&gt;—an innovative Visual Studio Code extension designed to transform Java debugging with advanced features like macro recording and automated playback. Let’s dive into how Java DebugX can simplify the debugging process, saving you time and increasing productivity.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setting Up Java DebugX in VS Code
&lt;/h4&gt;

&lt;p&gt;To get started, ensure you have Visual Studio Code set up with the &lt;strong&gt;Red Hat Java Language Support&lt;/strong&gt; and &lt;strong&gt;Lightweight Java Debugger&lt;/strong&gt; extensions installed. These provide essential Java development and debugging support in VS Code.&lt;/p&gt;

&lt;p&gt;Then, install &lt;strong&gt;Java DebugX&lt;/strong&gt; from the marketplace. It’s as simple as searching for "Java DebugX" in the Extensions tab and clicking install. With Java DebugX, you’re ready to take debugging to a new level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz4sy2svz8kmh4z12hjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz4sy2svz8kmh4z12hjz.png" alt="DebugXInMarketplace" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Recording a Debugging Session with Java DebugX
&lt;/h4&gt;

&lt;p&gt;Once installed, you’ll see a "Start Recording" button in the &lt;strong&gt;Stack View&lt;/strong&gt; navigation menu. Start a debugging session as usual and press "Start Recording." Java DebugX will automatically record your actions, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step in, step out, and step over actions&lt;/li&gt;
&lt;li&gt;Setting and removing breakpoints&lt;/li&gt;
&lt;li&gt;Continue and pause actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every action is saved in a macro format, allowing you to replay the session later. This can be invaluable if you’re debugging a complex flow and need to reproduce the exact sequence of actions without repeating each step manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt6g8agvuk8zp9d9k8ed.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt6g8agvuk8zp9d9k8ed.gif" alt="MacroRecord" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Macro Recording Action Buttons
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mdh7kqmwv9n5l4d83yq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mdh7kqmwv9n5l4d83yq.gif" alt="MacroRecordActionButtons" width="548" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Playing Back the Recorded Macro
&lt;/h4&gt;

&lt;p&gt;After recording, you can play back the macro to revisit the debugging flow starting from the exact breakpoint where you initially began recording. Java DebugX allows you to control playback speed by setting &lt;code&gt;java.debugx.macro.stepDelayInSeconds&lt;/code&gt;, adding delays between each automated playback step. Additionally, you can pause, resume, or stop the playback anytime using buttons in the &lt;strong&gt;Stacktrace Navigation Menu&lt;/strong&gt;.&lt;/p&gt;
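&lt;p&gt;For example, a two-second delay between replayed steps goes in your &lt;code&gt;settings.json&lt;/code&gt; (the value of 2 here is just an illustration):&lt;/p&gt;

```json
{
  "java.debugx.macro.stepDelayInSeconds": 2
}
```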

&lt;h4&gt;
  
  
  Play a macro
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79li121n52nwbbl4w08z.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79li121n52nwbbl4w08z.gif" alt="MacroPlayback" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How playback works
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f19x0ywd85xobbaz85o.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f19x0ywd85xobbaz85o.gif" alt="Playback" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Playback actions
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncw66rumn859g4p3i2ij.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncw66rumn859g4p3i2ij.gif" alt="PlaybackActions" width="517" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How Java DebugX Can Boost Productivity
&lt;/h4&gt;

&lt;p&gt;Here’s a typical scenario: you’re debugging a large Java application and find a potential root cause. But after stepping forward, you realize you need to repeat the process to verify something. This is where Java DebugX shines—you can record the session once, then replay it up to the exact point you need to examine again.&lt;/p&gt;

&lt;p&gt;Java DebugX even includes enhanced diagnostics to help when your macro takes a wrong path. If your playback reaches a point that differs from the expected line (like reaching an unexpected catch block or exception), DebugX will try to gather diagnostics and log them to a file, giving you a better understanding of potential issues.&lt;/p&gt;

&lt;h4&gt;
  
  
  Transforming Java Debugging with Java DebugX
&lt;/h4&gt;

&lt;p&gt;With Java DebugX, debugging becomes faster, more manageable, and much less repetitive. This extension helps reduce human error and time spent on manual tasks, letting you focus on what matters—finding and fixing issues efficiently.&lt;/p&gt;

&lt;p&gt;Install Java DebugX today and see how it can change the way you debug!&lt;/p&gt;

&lt;h4&gt;
  
  
  Explore Java DebugX on GitHub
&lt;/h4&gt;

&lt;p&gt;Java DebugX is an open-source project, and you can find the complete codebase, documentation, and updates on its &lt;a href="https://github.com/soumyaprasadrana/vscode-java-debugx" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Whether you’re curious about how it’s built, want to contribute, or need to report an issue, the GitHub repo has everything you need. Join our community of developers and help make Java debugging even better!&lt;/p&gt;

</description>
      <category>java</category>
      <category>vscode</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
