Forem: Raj Kundalia

Why Your Story Points Feel Arbitrary (And How to Fix It)

Raj Kundalia — Tue, 12 May 2026 17:21:03 +0000

This article was originally published on Medium.

I just feel it is "2", no, I think it is "3".

When I first came across story points, I always wondered how the "experienced" people on the team were calling out numbers so confidently. Now that I've become one of those "experienced" people — I realized I still didn't have a stronger framework for it. Like everyone else, I'd gone with gut feeling. Others would nod along, and sometimes we'd walk out with no real sense of why. All I knew was it should follow the Fibonacci series. Everyone had different intuitions, and the loudest voice — or the consensus — won.

So I tried to build a mental model for myself (not used by the team, yet), so I'd have something to point at when I picked a number. This is what I landed on after trying it on a couple of story-pointing sessions.

The four dimensions

Story points are supposed to capture how big the story would be, but the word "big" can pack in a lot of things, so I unpacked it into four things that I could actually measure:

Complexity — How hard is this to build? New tech, tricky logic, or big design decisions push this up.

Effort — How much work is there? A lot of small, easy changes across many files can still be Medium or High here, even if each change is trivial.

Uncertainty / Risk — How clear is the requirement? Open questions, unfamiliar parts of the system, or things that might surprise me mid-way add risk.

Dependencies — Does this depend on other teams or systems? This one is about waiting, not work. And waiting still inflates the point value. Every time the work unblocks, I have to re-page the state back into my head — and a story that crosses sprint boundaries carries its own cognitive overhead and spillover risk. Some people would argue dependencies shouldn't affect the size at all — they're not work, just calendar drag — but for me the cognitive overhead is real enough that they belong in the rubric.

For each, I rate Low, Medium, or High.

Rating guide

Dimension	Low	Medium	High
Complexity	Pattern we've done many times	Some design decisions needed	New tech, design-heavy, or novel problem
Effort	Single small change	Multiple changes, moderate scope	Many files / modules / large scope
Uncertainty / Risk	Requirement fully clear	Some open questions	Significant unknowns
Dependencies	None external	One known, manageable	Multiple, or blocked on another team

How I map ratings to points

Roughly, here's where I thought it made sense. Your numbers will probably differ once you've used this a few times — and they should.

Ratings	Points
All Low	1
One Medium, rest Low	2
Multiple Medium	3
One High	5
Multiple High	8
Mostly High	13

Do not be mechanical about this, it works now for me, can change in future.

Sanity check: if a story lands at 8 or 13, ask whether it should be split before you size it. Stories with several high-rated dimensions are usually epics in disguise.

How I actually use it

I rate each dimension in my head before I name a number. The dimensions are the work; the number is just the output.

The biggest thing I noticed isn't that I'm picking better numbers — it's that I can finally say why. Before, "5" was a feeling. Now I can trace it back: complexity High, dependencies Medium, the rest Low. Even if I don't share the breakdown out loud, having it in my head means I'm offering an estimate instead of a guess. And when I disagree with the room, I have something specific to point at — "I think it's a 5 because the unknowns here are bigger than they look" — instead of defending a number on instinct.

This is a starting point, not a strict formula. Override it when experience says otherwise. After a few sprints, look back at the stories that surprised you — were the surprises about complexity? Dependencies? Something else? Adjust the dimensions and the ratings to match what actually drives your estimation misses.

What this is not

This doesn't convert to hours, and it shouldn't be used to measure individual productivity.

The number isn't the point. The four-dimension conversation that produces it is.

Thank you for reading, suggestions are welcome.

Follow me on LinkedIn: Raj Kundalia

How I Review PRs with AI — Without Losing My Own Judgment

Raj Kundalia — Sun, 26 Apr 2026 14:05:36 +0000

Originally published on Medium:
https://medium.com/@rajkundalia/how-i-review-prs-with-ai-without-losing-my-own-judgment-f930ad30dc60

Over the last few months, my code review queue has changed completely. With agentic coding, PRs are larger, faster, and harder to reason about.

I needed a system that was faster, but I absolutely did not want to just hand things off to an AI and call it a review.

Built-in tools exist. Claude Code has /review or /deep-review, and GitHub Copilot's PR review is decent out of the box. If you just want an AI pass, they work fine. But I am not optimizing for just an AI pass; I am optimizing for understanding and architectural signal.

Here is a repeatable framework I use to let AI handle the heavy scanning, while I keep the heavy thinking and judgment firmly in my own hands.

(Note: All the prompts referenced below are open-source in my GitHub repo: 👉 https://github.com/rajkundalia/ai-code-review-prompts. They are tool-agnostic — paste them into Claude, ChatGPT, Cursor, or whatever you prefer.)

The Golden Rule: Context Isolation

Before we get into the phases, there is one non-negotiable rule that makes this entire system work: One AI session per PR.

If you mix your own daily work, multiple PR reviews, and random questions into a single AI session, you lose context. PR reviews are context-heavy. When a colleague replies to your comment four days later, having a dedicated, preserved AI session helps you instantly remember your mental model and why you left that comment in the first place.

Keep the thread alive from the start of the review through the merge.

The 4-Phase PR Review Workflow

When I load my initial prompt, it gives me a starting point: a high-level summary, the files touched, and the core intent of the PR. From there, I move through four distinct phases. Do not skip ahead.

Phase 1: Build Understanding (Human First)

What happens next is entirely mine. I go file by file, line by line, and ask the AI questions until I have built my own understanding of the flow:

What is this doing?
Where is this data model used further downstream?
What breaks if this assumption changes?

This is deliberately manual. Anything I still do not understand after interrogating the AI, I flag for a human comment.

If you skip this phase, you're not reviewing the code — you're reviewing the AI's opinion of the code.

Phase 2: AI First Pass (Filter the Noise)

This is where the AI does its first real pass, flagging standard issues and inconsistencies. This is intentionally a surface pass.

The reason this is a separate phase from the deep review is simple: I want the obvious stuff caught and out of the way early. It gives me a chance to dismiss irrelevant suggestions immediately, ensuring the next phase isn't cluttered with noise.

👉 Think of this as signal extraction, not decision-making.

Phase 3: The Deep Review (Pressure Testing)

This is the heaviest phase, driven by a few specific forcing functions:

The "Chief Programmer" & "Chief Architect" Persona
Giving the AI a specific role produces sharper, more critical output than a generic "review this code." You can adjust the role to fit your domain, e.g., chief AI engineer if you are reviewing prompt code.

Real Coverage vs. Theater
AI agents generate a massive amount of tests. Left unchecked, they will write tests for data models with no logic, or tests that just verify Python works. I explicitly prompt the AI to look for meaningful behavior validation so we catch the noise upfront. It is better than constantly asking the AI to remove redundant tests.

Tests should prove behavior, not existence.

Playing Devil’s Advocate
I force the LLM to question its own assumptions. What could go wrong? Where would this fail in production three months from now?

This surfaces edge cases that standard reviews easily miss.

Phase 4: The Verdict

Finally, I combine my Phase 1 understanding with the AI's deep review insights. The AI helps me classify the findings into:

Must-fix blockers
Good-to-have stylistic suggestions
Noise to be discarded

The Author's Duty: Self-Review

Before your code ever reaches another human, it is your responsibility to review it.

I converted my PR review framework into a self-review prompt. I run through the exact same phases on my own code. The output here is highly surgical: it tells me the file, the line, what is wrong, and what to do instead.

The goal is simple:

The comments you eventually get from your peers should be about high-level design decisions — not trivial things you could have caught yourself.

You get serious brownie points for consistently raising high-quality, pre-vetted PRs.

Scaling the Process

Not every PR needs all four phases.

A 10-line config change → quick pass
A 1,000-line refactor → full deep review

Match the depth of review to the risk and complexity.

Over-reviewing small changes is wasteful.
Under-reviewing large ones is dangerous.

Final Thoughts

I am not offloading my thinking to an AI. I am using it to explore faster, validate assumptions, and stress-test decisions. The thinking is still mine.

The leverage is new.
The responsibility isn’t.

These tools are incredibly powerful — but you still need to hold the leash.

I’ve open-sourced the prompts and guidelines I use:
👉 https://github.com/rajkundalia/ai-code-review-prompts

If you have better ideas, improvements, or ways to reduce noise — I’d genuinely like to see them.

Following a Database Read to the Metal — A Simple Walkthrough

Raj Kundalia — Sat, 11 Apr 2026 11:01:40 +0000

This is a cross-post from my Medium article.

I wanted to learn about the internals of database indexes. The first step was understanding how Disk I/O works — so I got Claude/Gemini to curate a reading list, which led me to Database Pages — A Deep Dive by Hussein Nasser.

There were things I hadn't understood, so I wrote this mellowed-down version for my own clarity. For complete understanding, do read the original post by Hussein Nasser.

Here it goes.

1. Database Layer

You run:

SELECT NAME FROM STUDENTS WHERE ID = 1008

DB parses the query → looks up STUDENTS in pg_class (an internal catalog, also stored on disk) → finds OID (Object Identifier) 24601
DB knows the file lives at PGDATA/base/<db_oid>/24601 on the filesystem
DB asks the OS to open that file — the OS hands back a temporary integer called a file descriptor (fd), say fd = 7. This is a short-lived handle, valid only for the session. The fd is never stored on disk.

No index on ID, so DB scans pages one by one. For each page it:

Checks its buffer pool first — if the page is already in memory, no disk read needed
If not found, issues a read() to the OS for that page

read(fd, 0,    8192)  → page 0: bytes 0–8191
read(fd, 8192, 8192)  → page 1: bytes 8192–16383

The OS → SSD journey below happens once per page. We trace it for page 0.

Note: The exact syscall used by databases may differ — Postgres uses pread() which takes an explicit offset. The intent here is to show what information is passed, not the exact function signature.

2. File System / OS Layer

OS looks up the inode of file 24601 → finds block mapping

inode (index node): a data structure the Linux filesystem maintains for every file on disk.

bytes 0–4095    → LBA 100
bytes 4096–8191 → LBA 101

OS checks its page cache → blocks not found
OS sends a read command to the NVMe driver with LBA 100 and 101

NVMe (Non-Volatile Memory Express): a communication protocol designed specifically for SSDs.

3. LBA — The Bridge Between OS and SSD

LBA (Logical Block Address) is a sequential numbering system for blocks on a storage device.

The OS doesn't know or care about physical locations on the SSD — it just says:

"Give me LBA 100 and 101."

The NVMe controller receives this and translates internally:

LBA 100 → Physical page 99, offset 0x0001
LBA 101 → Physical page 99, offset 0x1002

This translation is managed by the SSD's Flash Translation Layer (FTL).

The reason this layer exists: the SSD can move data around internally (for wear leveling, bad block management, etc.) without the OS ever knowing.

4. SSD Layer

NVMe controller checks its DRAM cache — page 99 not found
Fetches the entire NAND page 99 (16KB) into DRAM cache
Extracts just the requested 8KB (LBA 100 + 101) and returns it to the OS

5. Back Up the Stack

SSD returns 8KB
      ↓
OS stores blocks 100, 101 in PAGE CACHE (RAM)
      ↓
OS returns 8KB to DB
      ↓
DB stores page 0 in BUFFER POOL (RAM)
      ↓
DB scans page 0 — rows 1–1000, row 1008 not found
      ↓
entire journey repeats for page 1
      ↓
DB stores page 1 in BUFFER POOL (RAM)
      ↓
DB scans page 1 — finds row 1008, returns to user ✓

Layered Abstraction Summary

Each layer only knows its own abstraction and talks to the layer directly below it.

Layer	Abstraction it uses
Database	File + offset (pages)
OS	Inodes + LBAs
NVMe Controller	LBA → physical page (via FTL)
NAND Flash	Physical pages and cells

LBA is the common language between the OS and the SSD — the key handoff point where the OS's logical world meets the SSD's physical world. And the FTL is what keeps the physical complexity invisible to everyone above it.

*Originally published on Medium.

Find me on LinkedIn · Medium

How BAML Brings Engineering Discipline to LLM-Powered Systems

Raj Kundalia — Sat, 21 Mar 2026 14:36:43 +0000

TL;DR

BAML is a domain-specific language and toolchain for defining LLM function interfaces with strict, recoverable output parsing - addressing the reliability gap that makes production LLM systems painful to build and maintain. It generates type-safe client code from schema definitions across Python, TypeScript, Go, Ruby, and several other languages, and uses a parsing approach called Schema Aligned Parsing that recovers structured data even from garbled or partial model responses. For a working reference implementation, see:

GitHub - rajkundalia/error-analyzer-with-baml: Analyze Java compilation and runtime errors using BAML with a local Ollama model.

How I came to know about BAML

I was wondering about if there is something that tries to handle output from an LLM and then suddenly, a talk by Vaibhav Gupta landed. I started exploring more; if you want to explore like how did and not read this post, you can try asking these questions to know it by yourself:

What is BAML?
What is Pydantic? Does it relate to BAML? If yes, how does it relate to BAML?
What is PydanticAI? How does it compare to BAML? Can I use PydanticAI just for what BAML does? Does PydanticAI retry to get right output from the model?
How BAML handles a heavily hallucinated output?
What is instructor? [https://github.com/567-labs/instructor]? Compare it with BAML. - Follow up for clarity: If one is using PydanticAI, there is no point in using Instructor?
Where exactly does BAML fit into a standard RAG pipeline?
How does BAML help in token efficiency?
What is semantic streaming in BAML? What does problems does it solve? How does it help in Generative UI (add information about what Generative UI is in short)?
What is BAML code generator?
What is Schema Aligned Parsing? And what can it handle?
What kind of testing is done or can be done in BAML?
What is union in BAML?
How does logging and tracing or observability work in BAML?
How does BAML use Jinja templating to inject dynamic context, loops, and precise chat roles into prompts without messy string concatenation?
What are dynamic types (or runtime schemas) in BAML?
What aspects can BAML help in?
Will BAML make sense with something like Claude Agent SDK?

What BAML Is and the Problem It Solves

Every engineer who has tried building an LLM-powered feature knows the first hour of optimism and the next two weeks of fire-fighting. The model returns JSON with an extra key, or wraps it in markdown fences, or truncates mid-response. The prompt worked fine in POC/Demo. Now there are three different parsing bugs during production grade implementation, all subtly different.

BAML (or Basically a made-up language) - Boundary ML - exists to solve this class of problem at the right level of abstraction. It is a language-level contract between the application and the model. You define what you want the model to return, write the prompt logic in a dedicated templating layer, and BAML handles parsing, type-checking, retries, and client generation across Python, TypeScript, Go, Ruby, and other languages - with opt-in retry policies when you need them.

The project positions itself as the Pydantic of LLM engineering - a statement about philosophy rather than API compatibility. Just as Pydantic introduced runtime type validation into Python codebases that previously relied on convention and hope, BAML introduces structural guarantees into LLM pipelines that previously relied on prompt tuning and defensive try/except blocks.

How BAML Relates to Pydantic and Tools Like Instructor

Pydantic itself does one thing exceptionally well: it validates Python data structures against declared schemas. Feed it a dictionary, and it tells you whether it conforms to the model definition. It does not know anything about language models, prompts, or API calls - it is a validation library, and a very good one.

Instructor builds on top of Pydantic to handle the LLM layer. It takes a Pydantic model, wraps the OpenAI (or Anthropic, or other) API call, and uses function calling or JSON mode to coax the model into returning something the Pydantic validator can accept. When validation fails, Instructor can retry with the validation error message appended to the conversation, giving the model a chance to self-correct. This is practical, widely used, and works well for straightforward extraction tasks. What Instructor does not do is provide a dedicated authoring layer for prompts, generate client code from schema definitions, or go beyond retry logic when the model output is deeply malformed.

PydanticAI goes further than Instructor. It is an agent framework - it handles tool registration, multi-step agent loops, dependency injection, and result validation as part of a unified system. Validation failures feed back into the agent's run loop through a reflection mechanism, giving the model a chance to self-correct - structurally similar to what Instructor does but integrated at the framework level rather than as a wrapper. Comparing PydanticAI and BAML feature-for-feature would miss the point.

The more accurate comparison is about what layer each tool operates at. PydanticAI and BAML both handle structured output and retry behavior, but they do so with different default assumptions. PydanticAI is a Python framework - everything is Python, configured in Python, tested in Python. BAML is a language-level abstraction with its own syntax, its own code generator, and its own parsing engine that operates below what either Pydantic or the model's native JSON mode provides.

If a team is already using PydanticAI and happy with it, BAML is not a necessary replacement. If the team is hitting parsing failures that retry loops do not reliably fix, or needs multi-language client generation, or wants prompt authoring with first-class tooling support, BAML addresses different parts of the problem.

The BAML DSL and Code Generation

BAML is its own language. Not a Python DSL, not a configuration file format - a purpose-built syntax for describing LLM function signatures, data schemas, and prompt templates in a single, unified file format. A .baml file defines the inputs, the expected output structure, and the prompt template that connects them. The BAML compiler - written in Rust - reads those files and generates native client code in Python, TypeScript, Go, Ruby, and other languages. The Rust foundation is also what makes the SAP parsing engine fast enough to run inline on streaming responses without meaningful latency overhead - error correction applies in under 10ms, orders of magnitude cheaper than a retry API call. This is why BAML can credibly claim to be a language-level abstraction rather than a Python-centric library with thin wrappers for other runtimes.

This matters for a reason that is easy to dismiss as aesthetic but is actually structural: when the schema and the prompt live in the same file, they cannot drift apart. In a typical setup, the Pydantic model is in one file, the prompt string is in another, and the parsing logic is somewhere else. When the prompt changes, the schema might not. When the schema changes, the prompt often does not. This is less about convenience and more about eliminating an entire class of bugs - schema drift between prompt, parser, and application code - that is difficult to catch in review and invisible until it surfaces in production. BAML makes these co-located and co-versioned by design.

The generated client code behaves like a typed function call - call the function, pass the inputs, receive the validated return type. The underlying API call, parsing, and error handling are managed by the runtime. Retry behavior is available but opt-in, defined as an explicit policy in the .baml file rather than applied automatically. There is no boilerplate to maintain per endpoint.

Schema Aligned Parsing - BAML's Core Reliability Mechanism

Most structured output approaches rely on either JSON mode (asking the model to emit valid JSON) or function/tool calling (structured prompting that constrains the output format at the API level). Both of these approaches have the same failure mode: when the model output does not conform, parsing fails.

Without BAML, that failure looks like: model returns slightly malformed JSON, the parser throws, the application retries, the model might produce the same output again, and the request either surfaces an error or silently falls back. With BAML, that same malformed output goes through SAP, which extracts the structured data the model clearly intended to produce, and returns a typed object to the application - no retry required.

Schema Aligned Parsing - SAP - takes a different approach. Rather than requiring the model output to be valid JSON before interpretation begins, BAML's parser extracts structured data from whatever the model actually returns, using the declared schema as a guide for what to look for.

Consider what SAP actually handles in practice. A model that wraps its JSON in a markdown code fence - common with instruction-tuned models - would break a strict JSON parser. SAP strips the fences. A model that emits trailing commas or unquoted string values - technically invalid JSON - would fail JSON.parse. SAP corrects them. A reasoning model that outputs chain-of-thought text before the structured object would confuse most parsers. SAP identifies where the structured content begins and parses from there. An enum value returned in a different capitalisation or with surrounding punctuation gets normalised against the declared enum values in the schema.

What SAP does not do is hallucinate missing data. If the model completely omits a required field and there is no recoverable signal in the output, BAML reports a parse failure. The mechanism is about recovery, not invention. The practical result is a substantial reduction in false-negative parse failures - cases where the model actually produced the right conceptual answer but in a form that strict JSON parsing would reject.

This is the technical core of BAML's reliability claim, and it is a real engineering distinction from approaches that rely entirely on the model's ability to produce valid JSON every time.

Prompt Authoring with Jinja Templating

BAML uses Jinja-style syntax for prompt construction - powered by Minijinja, a Rust-native template engine implementing the Jinja templating language - which brings a mature, well-understood templating model into a space where most alternatives are either string concatenation or ad-hoc formatting functions.

The practical benefits are cleaner than they sound. Dynamic context injection - passing a list of documents, a user's history, or a set of retrieved chunks - is expressed as a loop in the template, not as string building in application code. Chat role separation (system prompt, user turn, assistant turn) is handled inline via role macros directly in the template - _.role("system"), _.role("user") - rather than being assembled through data structures outside the prompt. Conditional prompt logic, like including an extended set of instructions only when a particular flag is set, reads like a template rather than a maze of conditional string appends.

The alternative - building prompts through f-strings or concatenation - works until it does not. When prompts reach several hundred tokens with dynamic sections, the only way to debug them is to log the final assembled string and manually reconstruct how it was built - which requires understanding the application code that generated it, not the prompt itself. In BAML, the prompt template is the source of truth and can be inspected, versioned, and tested directly. The Jinja layer also makes it straightforward to separate prompt structure from the data flowing into it, which helps when iterating on prompt content without touching application logic.

Unions and Dynamic Types

BAML's type system supports union types - the ability to declare that a field or return value could be one of several distinct schemas. A model that might return either a SearchResult or an ErrorResponse depending on the query can express that distinction in the schema definition rather than through runtime inspection of the output.

Dynamic types solve a related but different problem. Unions work when the possible schemas are known at compile time. When the schema itself depends on data that only exists at runtime - categories pulled from a database, fields defined by user configuration, or tenant-specific structures - BAML provides a @@dynamic annotation on the type definition and a TypeBuilder API in the generated client. At runtime, application code uses TypeBuilder to add fields or enum variants before making the call, and the parser uses the extended schema to interpret the response.

A concrete example that illustrates both: an extraction pipeline where the possible document types (invoice, contract, medical record) are fixed and known - that is a union, declared once in the .baml file. If those document types and their fields are instead loaded from a database schema at request time, that is where @@dynamic and TypeBuilder come in. The distinction matters: unions are a schema design choice, dynamic types are a runtime extension mechanism.

Token Efficiency

BAML's schema-aware prompting tends to produce shorter system instructions than equivalent prompt engineering done by hand. Because the output structure is declared in the schema and the runtime handles parsing flexibility, prompts do not need extensive instructions about output formatting, JSON validity, or field naming conventions. Those concerns are handled at the tooling layer. For high-volume applications where token costs are meaningful, this reduction in system prompt overhead accumulates.

Semantic Streaming and Generative UI

LLM responses arrive token by token. In a chat interface, streaming the raw text is straightforward. In a structured output pipeline, streaming creates a problem: the output is not parse-able until it is complete, so the application has to buffer everything, parse at the end, and only then update the UI. This introduces latency from the user's perspective - the model is working, but nothing is happening on screen.

BAML's semantic streaming solves this by parsing the output incrementally as tokens arrive. Because the parser knows the expected schema, it can identify which field is being populated as the stream progresses. Streaming attributes on schema fields give developers explicit control over atomicity - a field can be configured to surface only when fully complete, or to stream token-by-token as a partial value, depending on what makes sense for the UI.

This enables a pattern often called Generative UI - rendering partial structured data into meaningful interface components as the model generates the response. An interface showing a list of extracted line items from a document does not need to wait for all line items to load simultaneously. Each item can appear as it is parsed. A dashboard that displays model-extracted analytics fields can populate each card progressively rather than flipping from empty to complete.

The mechanism is not unique to any particular UI framework - it is a property of the streaming parser that the generated client exposes. Applications consuming the stream receive typed partial objects they can render directly.

Testing in BAML

BAML includes a testing layer that allows declaring test cases directly in .baml files alongside the function definitions they test. A test case specifies the input and optionally assertions about specific field values or structural properties of the result, using @@assert expressions evaluated against the actual model output.

Tests run against live model APIs, either through the VSCode playground interactively or via baml-cli test from the command line. The CLI runner makes it straightforward to integrate BAML tests into CI pipelines, running them selectively on merge or on a scheduled basis.

The tooling also includes a playground - PromptFiddle - that surfaces prompt rendering, model output, and parse results interactively. This shortens the iteration loop on prompt changes considerably compared to editing, deploying, and inspecting logs.

Observability - Logging and Tracing

BAML provides structured trace data for every function call through a Collector API: the rendered prompt, the raw model response, the parsed output, timing, and token usage are all accessible by attaching a collector to a function call. This data can be pushed to Boundary Cloud for production dashboards and alerting, or routed to an external observability system.

For teams already using LLM observability tools like Langfuse (I have not used this!) or similar OpenTelemetry-compatible platforms, BAML's trace events integrate through standard logging hooks. The key value is that traces include the pre-parsing and post-parsing representations side by side - which makes it possible to distinguish whether a failure is a model issue (the model produced conceptually wrong output) or a parsing boundary issue (the model produced the right answer in a form the parser could not handle). That distinction matters when deciding whether to adjust the prompt, the schema, or the model configuration.

Where BAML Fits in a RAG Pipeline and with Agent Frameworks

A typical RAG pipeline has several identifiable layers: retrieval (vector search, keyword search, or hybrid), context assembly (chunking, ranking, formatting), model invocation (the API call), and response handling (parsing, post-processing, returning to the caller).

BAML operates at the model invocation and response handling layers. It does not replace a vector database, a retrieval library like LlamaIndex, or a reranking model. It does not manage document ingestion or embedding generation. BAML does not make retrieval better; it makes the interface between retrieval and generation reliable. What it replaces is the ad-hoc code that sits between the API call and the application: prompt construction, output parsing, retry logic, and client generation.

In a RAG system, BAML would typically receive the assembled context - the retrieved chunks, formatted by the application layer - as input to a BAML function. The function template injects that context into the prompt, calls the model, and returns a typed result to the application. The retrieval and chunking infrastructure remains unchanged.

For agent frameworks - the Claude Agent SDK, LangGraph, Autogen, or similar orchestration tools - BAML serves a similar role. Agent frameworks handle tool registration, loop control, state management, and multi-step planning. BAML-backed functions sit outside that loop as callable tools - the framework invokes them the same way it would any other tool, and BAML handles the structured output guarantees for that specific call. They are not alternatives; they operate at different layers. The combination is particularly useful when tools need to return strongly typed structured data that downstream steps in the agent depend on, rather than freeform text that the orchestrator has to interpret.

What to Do Next

The BAML playground at https://www.promptfiddle.com/ runs entirely in the browser - no installation, no API key setup. It is a good place to experiment with the DSL syntax and see how SAP handles malformed model output before committing to local setup. A broader set of working examples covering extraction, classification, streaming, and agent integration is available at https://baml-examples.vercel.app/.

The documentation at docs.boundaryml.com covers installation, the DSL reference, and integration guides for the major model providers. The thing worth evaluating specifically is SAP behavior under the failure cases that already exist in a current system - feed BAML the actual bad outputs that are currently causing parsing failures and observe how the recovery layer handles them. That test is more informative than any benchmark.

As LLM systems move from prototype to infrastructure, the cost of unreliable parsing compounds. BAML represents a considered answer to where that reliability boundary should live - not in the model, not in retry loops, but in a deterministic layer between them.

Sample Github Repository

GitHub - rajkundalia/error-analyzer-with-baml: Analyze Java compilation and runtime errors using BAML with a local Ollama model.

Resources

These are the resources and links that I used to know more:

Sample projects that I found while exploring:

Try out BAML:

From println to Production Logging: Internals and Performance Across Languages and the OS

Raj Kundalia — Sun, 22 Feb 2026 16:01:56 +0000

If you do not want to read the article, it is A-OK:

I got interested in logging — and because now we have LLM at our fingertips for asking questions, I decided to form a question bank first:

How are loggers implemented in different languages or in OS's?
How efficient is logging in different OS?
How much overhead does loggers bring?
How are they efficiently implemented?
How much of a difference is there between sys out vs. writing to a file vs. a logger vs. streaming logs in terms of efficiency and performance? Can we measure this? Compare similar methods for other languages.
How does logger get information that it is coming from this file? What is the mechanism for this in different languages? — Very important question
What part of logging filters is based on log level?
The first thing the logger does is compare the message's level integer against its own threshold integer; if the message level is lower, it returns immediately and nothing else runs. Is this based on configuration?
Which is the most efficient language to write loggers in that would still be usable in other languages — or does something like this not make sense?
Why are markers used in logging? What does it solve that we cannot already solve without them? I know Java contains Markers, but do other languages contain them?
When I provide a lower log level while writing loggers but keep a higher log level in the configuration, does it create a performance impact? (e.g., having many Debug and Trace loggers while the log level is kept at Info).
In Java, are the placeholders in the loggers — such as Request was successful user={}, userId—concatenations, or is some other mechanism used for them?

If you do not want to read the article, you can skip it and use this question bank to form your own understanding.

GitHub - rajkundalia/logger-internals-java: A Java logging library built from scratch - exploring async handlers, structured fields, granular caller info…

TL;DR

We all assume a disabled log call costs nothing. It doesn't — the level check is cheap, but any string you constructed before passing it to the logger is already gone, whether the log fires or not.
Every time you see a class name and line number in a log output, something paid for that. In Java, when caller info is enabled, it's a runtime stack walk. In C and Rust, it was resolved at compile time and costs nothing at runtime. Most engineers have never had reason to think about the difference.
logger.info("User {}", user) is not just cleaner syntax. It's a different evaluation model — the string is only built if the log actually fires. "User " + user is evaluated before the logger even sees it.
Async logging feels like a free upgrade. It isn't. It changes what you can trust about your logs when something crashes — and the logs you lose are exactly the ones you needed.
In Rust and C/C++, a disabled log call can be removed from the binary entirely at compile time. In Java and Python, it always exists at runtime, even if it does nothing. The language made this choice.
Go and C logging stacks sit closer to the OS than JVM-based logging stacks. There are fewer layers between the log call and the syscall. That distance has a cost, and it compounds under load.

Thanks to LLMs I could create this: https://github.com/rajkundalia/logger-internals-java

Why Logging Is Not Just Printing

Most of us haven't considered how much happens between our code calling logger.info(...) and that string reaching disk: a level check, a formatter, a handler with its own buffering strategy, a lock or queue depending on sync versus async mode, a syscall into the kernel, and sometimes a second system — syslog, journald — that takes over from there. At scale, that pipeline has real cost. String formatting allocates. Synchronous file writes add latency to every thread that logs. A slow disk creates backpressure that stalls application threads. And in a distributed system where logs are your only audit trail, how that pipeline behaves during a crash is not an edge case — it is a design constraint you either chose or inherited without knowing it. None of that is obvious from a println.

The Pipeline: What a Logger Actually Does

Before pulling any of this apart, it helps to see the whole shape at once:

Application
    ↓
Logger
    ↓
Level Filter
    ↓
Formatter
    ↓
Appender / Handler
    ↓
Operating System
    ↓
Disk / Stream

The application emits a log event with a level, message, and arguments. The logger checks whether the configured threshold allows the event through. If it passes, the formatter constructs the final string — interpolating placeholders, appending timestamps, resolving caller location. The appender or handler takes that string and writes it somewhere: a file, stdout, a socket, a rolling buffer. That write becomes a system call, handing control to the OS, which manages buffering and flush behavior before data actually hits disk. Each stage has cost. Each stage is a place where things can go wrong or get optimized. The rest of this post is about what happens at each one.

Log Level Filtering Internals

Here's something that seems obvious until you think about it: a DEBUG log call in a hot loop, in a production service configured at INFO, runs on every single iteration. It doesn't log anything — but it doesn't disappear either.

The level check itself is cheap. Each level maps to an integer, and the check is a comparison — INFO against whatever the event's level is, early return if it doesn't pass. No formatting, no allocation, no appender invocation. In Logback, higher integers map to higher severity — TRACE is 5000, ERROR is 40000. java.util.logging follows the same direction but uses a different numeric scale and different level names: FINE is 500, SEVERE is 1000. The ordering is not inverted — the scales and names just don't align. Either way, the comparison is fast.

What I found more interesting is where in the pipeline the check actually happens. I assumed there was one gate. There are often several. In Java's SLF4J backed by Logback, the logger checks first — that's the fast path. But appenders can have their own filter chains, meaning an event can clear the logger-level check and still be dropped downstream. This is deliberate and useful: you can send WARN and above to a file, ERROR and above to an alert sink, and everything to stdout, all from the same pipeline. But it means filtering is not a single decision — it's a sequence of decisions, each adding a small amount of overhead to events that reach it.

The real cost isn't the check. It's everything you did before the call site. If you constructed a string before passing it to the logger, that work happened regardless of whether the log fires. Which is exactly why placeholder syntax exists, and why it's not just a style preference.

How a Logger Knows Where It Came From

You've probably never thought about how a log line knows it came from UserService.java:142. It just appears. What's actually happening underneath varies so much across languages that it's worth making explicit — because the cost difference is not small.

In Java, two approaches exist. The older one constructs a Throwable and extracts the stack trace — the JVM walks the call stack and allocates an array of frame objects. The newer approach, StackWalker introduced in Java 9, is lazy and stream-based: you only materialize the frames you actually need. Both are runtime operations with real cost, which is why caller location logging is configurable in most Java frameworks and off by default in many Logback configurations. You can see how this plays out in the reference implementation at https://github.com/rajkundalia/logger-internals-java.

Python captures caller information as part of LogRecord creation, inside _log(), which is only reached after the level check passes. The depth of that inspection — whether stack info is captured, whether additional frame walking occurs — depends on configuration and what the formatter requests. The cost is not paid on every call, but it is paid at record creation time, not at formatting time.

Go makes this explicit. runtime.Caller(skip int) returns the file, line, and function name when you ask for it. It's a runtime operation, but controlled — you call it when you need it, rather than it being woven into every log record automatically.

C and C++ sidestep runtime cost entirely. __FILE__ and __LINE__ are preprocessor macros, expanded at compile time. By the time the binary runs, those values are string literals and integers baked into the executable. No stack walking, no frame introspection, nothing.

Rust takes the same approach through the log crate's macro system. log::info!("...") expands at compile time to include the module path and line number as constants. The binary contains no machinery for discovering caller location — it was resolved before the program ran.

The gap between compile-time resolution and runtime stack walking is the kind of thing that's invisible until you're logging at high volume. C/C++ and Rust pay nothing. Java pays on every logged event where caller info is enabled. Go pays when you ask. Most engineers pick a logging framework without knowing which of these models they've signed up for.

Placeholders vs String Concatenation

These two lines look similar. They are not:

// Eager: string is built before the logger is invoked
logger.info("Connected user: " + user.toString());

// Lazy: string is only built if the level check passes
logger.info("Connected user: {}", user);

In the first version, the JVM evaluates user.toString() and concatenates the string before the logger receives anything. If the level check drops the event — which it will, for any DEBUG or TRACE call in a production service configured at INFO — that allocation and work was wasted. At low log volumes this is invisible. Scattered through hot paths at high throughput, it accumulates.

In the second version, user is passed as an object reference. The logger receives the raw argument. Only if the event clears the level filter does the formatter resolve the placeholder and build the final string. toString() is never called otherwise, and no intermediate string is allocated.

This only matters because of how filtering works — specifically the early return discussed in the filtering section. The two design choices reinforce each other: a cheap level check creates the condition under which deferred string construction delivers its benefit. If logging were unconditional, the distinction wouldn't save anything.

OS Interaction: Where Language Logging Ends and the OS Begins

There's a boundary in every logging pipeline that most application engineers have never had reason to think about: the point where your code hands a string to the OS and stops being in control of what happens next.

When an appender writes to a file, it eventually calls write() — a system call. Everything above that boundary is the language runtime: string formatting, in-memory buffering, lock acquisition. Everything below it is the kernel: its own buffers, filesystem cache, eventual persistence to disk. Crossing that boundary involves a context switch from user space to kernel space. It's not free, and it happens on every unbuffered write.

This is why buffered I/O matters. Rather than one write() per log line, most production logging configurations accumulate output in memory and flush periodically or when the buffer is full. Fewer syscalls, higher throughput. The trade-off: a crash can lose whatever is buffered and not yet flushed. You are always choosing between durability and throughput at that boundary, whether you know it or not.

The OS also offers its own logging infrastructure — syslog on POSIX systems, journald on Linux. These are daemons that accept log messages via a socket and handle buffering, rotation, and persistence outside your application entirely. The boundary shifts: your application writes to a socket, and the daemon takes responsibility for the rest. Structured fields are first-class in journald. Log rotation is not your problem. The cost is IPC (Inter-process communication) overhead — a socket write instead of a local file write.

Go and C-adjacent logging stacks sit naturally close to this boundary. Go's os.File.Write is a thin wrapper over write() with minimal overhead between your code and the syscall. JVM logging absolutely works at scale — but it involves more layers: GC-managed heap allocations, object creation for log events, the JVM's own I/O abstraction. Those layers add up under load.

Synchronous vs Asynchronous Logging

At some point, most engineers configure async logging and move on. Throughput goes up, latency on application threads drops, and nothing seems worse. It feels like a free upgrade.

Here's what actually changed: you no longer have a guarantee that a log line you wrote ever reached disk.

Synchronous logging blocks the calling thread until the write completes. The appender acquires a lock, formats the string, calls write(), releases the lock. Every log call has latency. Under high write volume to a slow disk, this becomes a bottleneck that shows up on every application thread that logs.

Async logging breaks this coupling. Your thread drops an event into a queue and returns immediately. A dedicated logging thread drains the queue, formats events, and writes to the appender. Throughput increases because writes get batched. Thread latency drops to the cost of a queue insertion. This sounds like a strict improvement. It is not.

The queue is bounded. Under sustained high load it fills up. At that point the framework has a decision to make: block the calling thread, drop the event, or expand the queue. Many async logging implementations are configured to drop lower-severity events under pressure unless explicitly set to block — Logback's AsyncAppender, for instance, starts discarding TRACE, DEBUG, and INFO events when the queue reaches 80% capacity by default, while WARN and ERROR are retained. Which means under the conditions where your system is most stressed, in the moments just before something breaks, you may be losing the exact log lines that would have told you why.

The crash case is worse. Events sitting in the queue when the application crashes never reach the appender. Your crash logs — the ones you needed most — may not exist.

Async logging is worth using. It is the right choice in many high-throughput systems. But it is an architectural decision about what you are willing to lose and when. Using it without understanding the failure contract means you have made that trade without knowing it.

Compile-Time vs Runtime Filtering

Something I hadn't considered when I started this: in Java, Python, and Go, a disabled log call still exists in the binary. In Java and Python this is unambiguous — the level check runs on every call. Go's compiler is more aggressive about in-lining and dead code elimination, so the picture is less clear-cut and depends on the logging library and how it's implemented. But in none of these languages can the call be eliminated entirely at compile time the way it can in Rust or C/C++.

Take a TRACE call inside a hot loop in a Java service configured at INFO. On every iteration, the JVM executes an integer comparison and branches. The call is suppressed, but it was visited. At high enough frequency, that cost appears.

In Rust and C/C++, this can be eliminated entirely. A trace!() macro in Rust, conditioned on a compile-time feature flag, is removed by the compiler if tracing is disabled at build time. The instruction does not exist in the binary. There is no branch, no comparison, no overhead of any kind. The code was removed before the program ran.

The trade-off is operational flexibility. A Java application can change its log level at runtime — attach to a running JVM, set the Logback threshold to TRACE, watch debug output appear without a restart. A C binary compiled with TRACE disabled cannot do this. The capability is gone. You traded dynamic observability for zero runtime cost.

Which is right depends on context. A long-running service that needs live level adjustment values the runtime flexibility. A systems program where every cycle matters may prefer compile-time elimination. Most languages make this choice implicitly, as part of how their logging ecosystem is designed. It is worth knowing which choice your language made for you.

Cross-Language Comparison Table

Language	Caller Detection	Filter Type	Async Ecosystem	Compile-time Elimination
Java	StackWalker / Throwable	Runtime	Logback AsyncAppender	No
Go	runtime.Caller	Runtime	zap, zerolog (non-block)	No
Python	currentframe / LogRecord	Runtime	QueueHandler	No
C/C++	FILE, LINE macros	Runtime / Compile	spdlog async mode	Yes (preprocessor)
Rust	Compile-time macro expansion	Runtime / Compile	tracing crate	Yes (feature flags)

Markers in Java/SLF4J — A Brief Callout

Log levels give you one axis for filtering: severity. But severity alone can't answer a question like "show me all security-related events, regardless of level." That's what Markers solve. In SLF4J, a Marker is a named tag attached to a log event — SECURITY, AUDIT, BILLING — that appenders can filter on independently of level. You can route all AUDIT-marked events to a dedicated file while dropping untagged DEBUG events entirely. It's multi-dimensional filtering: level is one axis, marker is another. Other ecosystems approximate this — Go's zap uses structured fields, Python's logging has Filter objects that can inspect arbitrary LogRecord attributes — but SLF4J Markers are one of the cleaner formulations of the idea, and they're underused in codebases that reach for custom log levels when what they actually need is a second axis.

What Surprised Me

We all assume async logging was a performance upgrade with no real downside. It's a trade — lower latency on application threads in exchange for weaker guarantees about what survives a crash. That trade is often worth making. It's not invisible.

I didn't expect caller detection to have such variance across languages. The gap between __FILE__ resolved at compile time and StackWalker walking the call stack at runtime is not a footnote — it's an architectural difference that shows up under load, and most engineers pick a logging framework without knowing which model they've chosen.

Filtering being a pipeline of gates, not a single check, was more nuanced than I expected. I assumed one threshold, one decision. In practice, logger-level filters and appender-level filters can conflict, and events can be dropped at multiple points for different reasons.

The syscall boundary reframed how I think about logging performance. Everything above it is yours — allocations, formatting, buffering. Everything below it is the kernel's. Understanding where that boundary sits, and how often you cross it, makes the buffering trade-offs obvious in a way they weren't before.

Compile-time log elimination felt genuinely strange when I first understood it. The log crate in Rust doesn't just suppress a call when a level is disabled — the code is removed entirely from the binary by the compiler. That's a fundamentally different model from anything Java or Python offer, and it matters in contexts where it matters.

Markers are really interesting. The logs that are easiest to reason about in production are the ones where someone thought carefully about how to filter them — not just what level to assign, but what category they belong to. It's a small design decision that compounds over time.

Resources

These are the rabbit holes that led here.

https://stackoverflow.com/questions/26949503/how-exactly-is-the-logger-a-singleton-and-how-are-different-log-files-created-i — The good old StackOverFlow had a question regarding this.
https://docs.oracle.com/javase/6/docs/technotes/guides/logging/overview.html
https://www.reddit.com/r/java/comments/rdv98z/have_you_ever_wondered_how_javas_logging/ — Down the memory lane.
https://www.loggly.com/ultimate-guide/java-logging-basics/
https://www.marcobehler.com/guides/java-logging
https://signoz.io/guides/java-log/ — table for log level is very good
https://github.com/pinojs/pino — JS Library for logging
https://davidagood.com/logging-in-java/ — Java's logging is crazy
https://github.com/TheTechGranth/thegranths/tree/master/src/main/java/SystemDesign/LoggingFramework — a good basic logger
https://www.youtube.com/watch?v=hOzH7ecc8vg&t=2s — a good explanation for LLD for logger
https://www.youtube.com/live/QV4O9u1N_XU?si=lO4YYFxf-jOk5tTb
https://algomaster.io/learn/system-design/logging — logging best practices

Distributed Tracing in Spring Boot: A Practical Guide to OpenTelemetry and Jaeger

Raj Kundalia — Sat, 31 Jan 2026 18:23:48 +0000

TL;DR

Distributed tracing helps you understand how requests flow through microservices by tracking every hop with minimal overhead. This guide covers OpenTelemetry integration in Spring Boot 4 using the native starter, explains core concepts like spans and context propagation, and demonstrates Jaeger-based tracing with best practices for production. Whether you're debugging latency issues or optimizing service dependencies, distributed tracing provides the visibility modern architectures demand.

GitHub Repository: learning-distributed-tracing

The Problem: Debugging in the Dark

In a monolithic application, debugging a slow request is straightforward. Add some logging, attach a profiler, and you can see exactly where time is spent. But microservices change everything. A single user request might touch ten or more services, each with its own logs. Failures often happen between services, not inside them. When something breaks or slows down, where do you even start?

Traditional logging falls short here. Sure, you can correlate logs by request ID, but manually piecing together the journey across services, databases, and queues is tedious and error-prone. You need something that automatically tracks the entire execution path, measures timing at each step, and shows you the complete picture. That's distributed tracing.

Understanding Observability: Metrics, Logs, and Traces

Modern observability rests on three pillars. Metrics are numerical measurements like CPU usage or request count—great for alerting but lacking context for debugging. Logs are discrete events that tell you what happened at a specific moment but struggle with correlation across distributed systems. Traces capture the complete journey of a request through your system, showing execution flow and timing.

These pillars complement each other. Metrics tell you there's a problem, logs provide event details, and traces show you the execution path. Together, they form a complete observability strategy.

It's worth distinguishing observability from monitoring. Monitoring answers "Is the system healthy?" through dashboards and alerts. Observability answers "Why is the system behaving this way?" by designing systems to answer questions you didn't anticipate. Distributed tracing is a core enabler of observability, not a replacement for monitoring.

The Fundamentals of Distributed Tracing

Telemetry refers to automated data collection from remote sources—your application constantly reporting its health and activity. Spans are the building blocks of traces, representing units of work with start time, duration, and metadata. When Service A calls Service B, both create spans that form a parent-child relationship showing the call hierarchy.

Traces are collections of spans representing a single transaction. A trace ID ties all related spans together across service boundaries. Context Propagation maintains trace continuity—when Service A calls Service B, it passes the trace context in HTTP headers, allowing Service B to create child spans under the same trace.

OpenTelemetry: The Industry Standard

Before OpenTelemetry, every observability vendor had proprietary SDKs and formats. If you wanted to switch from Jaeger to Zipkin, you'd re-instrument your entire codebase. This vendor lock-in meant architectural decisions became permanent commitments.

OpenTelemetry is a vendor-neutral framework providing APIs, SDKs, and tools for telemetry data. Formed by merging OpenTracing and OpenCensus, it provides a single instrumentation API that works with any backend. The value proposition is simple: instrument once, send data anywhere.

The architecture includes the API and SDK for creating telemetry, Auto-instrumentation for frameworks like Spring and JDBC, and the Collector—an optional but recommended component that receives, processes, and exports telemetry.

While this article focuses on distributed tracing, it's worth noting that OpenTelemetry standardizes all three pillars of observability—metrics, logs, and traces. The same SDK and protocol handle all three, giving you a unified approach to instrumentation across your entire observability stack.

OTLP (OpenTelemetry Protocol) is the wire format for transmitting telemetry data. Supporting both gRPC and HTTP transports, OTLP defines how traces, metrics, and logs are serialized and sent to collectors or backends. The protocol handles backpressure, retries, and batching for reliable delivery. Most modern observability tools now support OTLP natively, making it the de facto standard.

Spring Boot 4 and OpenTelemetry Integration

Spring Boot 4 brings first-class support for OpenTelemetry through the spring-boot-starter-opentelemetry dependency. This starter provides automatic configuration and instrumentation for common scenarios like HTTP requests, database calls, and messaging.

Previous versions of Spring Boot required manual setup using the OpenTelemetry Java agent or custom configuration. Spring Boot 2 and 3 users could leverage the Java agent for bytecode instrumentation, which worked but added operational complexity. The agent approach meant deploying a JAR alongside your application and configuring it via environment variables or system properties.

With Spring Boot 4, the starter eliminates much of this complexity. Add the dependency, configure a few properties, and you're done. Under the hood, it uses Spring's auto-configuration to set up the OpenTelemetry SDK, register instrumentation libraries, and configure exporters based on your application properties.

The starter automatically instruments:

HTTP requests and responses via Spring MVC and WebFlux
RestTemplate, RestClient, and WebClient calls
JDBC database operations
Logs (automatically includes trace and span IDs)

For additional instrumentation like Kafka messaging, you can use the @WithSpan annotation for manual instrumentation, or use the OpenTelemetry Java Agent which provides automatic instrumentation for 150+ libraries.

Spring Boot Actuator's Role: While Actuator isn't required for tracing, it plays a complementary role in Spring Boot 4's observability story. Actuator's ObservationRegistry is what actually observes requests and framework operations. The OpenTelemetry starter bridges these observations into OTel-compliant traces. Think of Actuator as operational introspection (health, metrics) and OpenTelemetry as behavioral introspection (request flows).

You can still use the Java agent if you need instrumentation for libraries outside Spring's ecosystem, but for typical Spring Boot applications, the starter is sufficient and more maintainable. Framework-level instrumentation gives you baseline visibility automatically, while custom spans should be added only where domain insight is needed. This balance is critical—over-instrumentation creates noise, while under-instrumentation hides intent.

Jaeger: Your Trace Backend

Jaeger is an open-source distributed tracing platform originally developed by Uber, providing storage, querying, and visualization for traces. While OpenTelemetry handles generation and collection, Jaeger handles the backend.

Jaeger's architecture includes agents, collectors, a query service, and a web UI. For development, the all-in-one Docker image combines all components. A common misconception is that Jaeger requires Kubernetes—it doesn't. Jaeger runs on Docker, VMs, or bare metal. The all-in-one image works for local development, while production typically uses separate components with external storage like Cassandra or Elasticsearch.

Jaeger supports multiple ingestion formats, including OTLP. With OpenTelemetry's standardization, OTLP is now recommended, meaning your Spring Boot application sends traces in OTLP format directly to Jaeger without needing Jaeger-specific libraries.

Tracing Beyond Services: Databases and Message Queues

One of the most powerful aspects of distributed tracing is visibility into external dependencies. When your application makes a database call or publishes to Kafka, those operations appear as spans in your trace.

Database tracing works through JDBC instrumentation. When your Spring Boot application executes a SQL query, the OpenTelemetry instrumentation automatically creates a span containing the query, execution time, and database connection details. This visibility is crucial for identifying slow queries or N+1 problems—those situations where you're executing one query to fetch entities, then N additional queries to fetch related data for each entity. Database spans make these anti-patterns immediately visible in your trace timeline. However, be mindful of sensitive data. Database spans can include SQL statements with parameter values, which might contain PII. OpenTelemetry provides span processors to redact or mask sensitive information before export.

Message queue tracing extends traces across asynchronous boundaries. When Service A publishes a message to Kafka, it injects the trace context into message headers. When Service B consumes that message, it extracts the context and continues the trace. This creates a parent-child relationship between the producer and consumer spans, even though they execute at different times. The result is end-to-end visibility into asynchronous workflows, making it much easier to debug message processing issues or track down where data transformations went wrong.

Performance Impact and Production Considerations

Distributed tracing adds overhead from creating spans, serializing data, and network transmission. The impact varies by component:

CPU: Span creation and serialization typically add microseconds per operation. The OpenTelemetry SDK uses efficient batching to minimize per-span overhead.

Memory: The SDK buffers spans before export. Configure batch size and timeout based on traffic patterns and memory constraints to prevent excessive buffering.

Network IO: Sending traces to a local collector over localhost has minimal impact. Remote backends introduce latency and bandwidth usage. Using a collector to batch and compress traces reduces network overhead significantly. Importantly, the collector absorbs most of the performance cost, acting as a buffer between your applications and backends.

In practice, overhead is typically under 5 percent for CPU and memory. The key is intelligent sampling—trace 1-5 percent of traffic in production rather than every request (development should trace 100 percent for debugging). OpenTelemetry supports probability-based sampling for production and rate-limiting to cap traces per second.

Best Practices for Distributed Tracing

Use meaningful span names: "validatePaymentRequest" beats "process" every time. Good naming makes traces self-documenting.

Add relevant attributes: Follow OpenTelemetry semantic conventions for HTTP, databases, and queues. Add custom attributes for business context like user ID or tenant ID.

Don't over-instrument: Creating spans for every method produces noise. Focus on external calls, database queries, and significant business logic.

Implement proper error handling: Mark spans as failed and record exception details when errors occur. This helps identify which service and operation caused failures.

Sample intelligently: Trace everything in development (probability 1.0), but use 1-5 percent sampling in production (probability 0.01-0.05). This gives you statistically significant insights without overloading infrastructure. Consider adaptive sampling that increases rates for slow requests or errors.

Watch for orphaned spans: When requests hand off work to async thread pools, ensure context propagation is maintained. If a new thread loses the trace context, your trace will break, resulting in disconnected "orphaned spans" that can't be correlated. Spring Boot 4 usually handles this automatically, but verify your custom executors are properly instrumented.

Use the Collector: It provides buffering, enrichment, routing, and reliability that SDK exporters alone cannot.

Monitor your telemetry pipeline: Track export success rates and latency. If your pipeline breaks, you're debugging blind.

Querying and Analyzing Traces

Jaeger's UI provides powerful analysis tools. Search for traces by service, operation, tags, duration, and time range. The trace timeline shows the complete request flow with parent-child relationships visually nested. For advanced use cases, Jaeger Query Language (JQL) enables programmatic querying and integration with automated alerting systems. The trace comparison feature helps identify performance regressions by highlighting timing differences between trace versions.

Conclusion

Distributed tracing transforms how you understand and debug microservices. By automatically capturing request flows and timing information, it eliminates the guesswork from performance analysis and incident response. OpenTelemetry provides the standardized instrumentation, OTLP handles reliable transmission, and backends like Jaeger give you the visualization and querying tools to make sense of the data.

Spring Boot 4's native OpenTelemetry support makes adoption straightforward. Add the starter, configure your exporter, and you're tracing HTTP requests, database queries, and message queues with minimal code. The result is a system where every request tells its own story, complete with timing, dependencies, and errors.

Start small. Enable tracing in one service, verify the data reaches Jaeger, and gradually expand to your entire application. The visibility you gain will pay dividends the first time you debug a cross-service issue or optimize a slow endpoint. Distributed tracing isn't just a monitoring tool; it's a fundamental shift in how you understand distributed systems.

For hands-on examples and complete configuration, check out the learning-distributed-tracing repository.

Learning Links:

https://spring.io/blog/2025/11/18/opentelemetry-with-spring-boot
https://opentelemetry.io/docs/zero-code/java/spring-boot-starter/
https://foojay.io/today/spring-boot-4-opentelemetry-explained/
https://last9.io/blog/opentelemetry-for-spring/
https://signoz.io/blog/opentelemetry-spring-boot/
https://vorozco.com/blog/2024/2024-11-18-A-practical-guide-spring-boot-open-telemetry.html
https://medium.com/cloud-native-daily/how-to-send-traces-from-spring-boot-to-jaeger-229c19f544db
https://medium.com/xebia-engineering/jaeger-integration-with-spring-boot-application-3c6ec4a96a6f
https://blog.vinsguru.com/distributed-tracing-in-microservices-with-jaeger/
https://last9.io/blog/distributed-tracing-with-spring-boot/
https://signoz.io/blog/jaeger-vs-zipkin/

LangChain vs LangGraph vs LangSmith: Understanding the Ecosystem

Raj Kundalia — Sat, 17 Jan 2026 13:29:07 +0000

Building LLM apps isn’t just about prompts anymore.
It’s about composition, orchestration, and observability.

TL;DR

LangChain provides the foundational building blocks for creating LLM applications through modular components and a unified interface for working with different AI providers.
LangGraph extends this foundation with stateful, graph-based orchestration for complex multi-agent workflows requiring loops, branching, and persistent state.
LangSmith completes the picture by offering observability, tracing, and evaluation tools for debugging and monitoring LLM applications in production.

Use:

LangChain for straightforward chains and RAG systems
LangGraph when you need sophisticated state management and agent coordination
LangSmith throughout development and production for visibility into behavior

Hands-on GitHub Repositories

LangChain RAG Project → https://github.com/rajkundalia/langchain-rag-project
LangGraph Analyzer → https://github.com/rajkundalia/langgraph-analyzer
LangSmith Learning → https://github.com/rajkundalia/langsmith-learning

Introduction

The landscape of LLM application development has evolved rapidly since 2022.

What began as simple prompt–response interactions has grown into multi-step workflows involving retrieval systems, tool usage, autonomous agents, and long-running processes. This evolution introduced new problems at each stage of the development lifecycle.

The composition problem → How do you connect prompts, models, tools, and data?
The orchestration problem → How do you manage branching, retries, loops, and shared state?
The observability problem → How do you debug, evaluate, and monitor these systems?

The LangChain ecosystem emerged to address each layer:

Problem	Tool	Year
Composition	LangChain	2022
Orchestration	LangGraph	2024
Observability	LangSmith	2023–2024

Each tool targets a specific layer in the LLM application stack.

LangChain: The Foundation

LangChain is the core framework for building LLM-powered applications.

Its primary goal is abstraction: different LLM providers expose different APIs, capabilities, and quirks. LangChain hides these differences behind a unified interface.

Core Building Blocks

LangChain is composed of modular, swappable components:

Prompts – Templates and structured inputs for models
Models – OpenAI, Anthropic, Google, or local LLMs
Memory – Conversation history and contextual state
Tools – Function calls to external systems
Retrievers – Vector databases and RAG pipelines

LCEL: LangChain Expression Language

What ties everything together is LCEL.

LCEL introduces a declarative, pipe-based syntax for composing chains:

prompt | model | output_parser

Instead of writing imperative glue code, you describe data flow.

Why LCEL Matters

LCEL enables:

Automatic async, streaming, and batch execution
Built-in LangSmith tracing
Parallel execution of independent steps
A unified Runnable interface (invoke, batch, stream)

This makes chains faster, cleaner, and easier to reason about.

Multi-Provider Support

LangChain supports dozens of LLM providers and integrations.

You can switch providers by changing one line of configuration, enabling:

Vendor independence
A/B testing across models
Cost and latency optimization

When LangChain Is Enough

Use LangChain when your workflow is primarily:

Input → Process → Output

Typical use cases include:

Chatbots with memory
RAG-based Q&A systems
Natural language → SQL generation
Linear tool pipelines

If your application doesn’t need complex branching or shared long-lived state, LangChain is the right tool.

LangGraph: Stateful Agent Orchestration

LangGraph solves the orchestration problem.

As soon as your application needs to:

make decisions,
loop,
retry,
or coordinate multiple agents, linear chains start to break down.

Graph-Based Architecture

LangGraph models your application as a directed graph:

Nodes → processing steps or agents
Edges → execution flow between nodes

This enables patterns that are hard or impossible with chains:

Loops and retries
Conditional branching
Parallel execution
Shared, persistent state

State as a First-Class Concept

Every LangGraph workflow operates on a shared state object.

Nodes receive the current state
They compute updates
Updates are merged back into state

This allows multiple agents to collaborate naturally.

Example:

Research agent gathers sources
Fact-checking agent validates claims
Synthesis agent produces the final answer

All without complex message passing.

Conditional Routing

LangGraph supports conditional edges.

A function decides which node runs next based on runtime state:

Route customer queries to specialist agents
Loop back when required information is missing
Retry until success conditions are met

Persistence & Checkpointing

LangGraph includes built-in checkpointing:

Persist state across restarts
Resume long-running workflows
Support human-in-the-loop pauses
Enable time-travel debugging

This is critical for production-grade agent systems.

Visualization Support

LangGraph workflows are inspectable and exportable:

Mermaid diagrams for documentation
PNG images for presentations
ASCII graphs for terminal debugging

This makes complex agent systems understandable and communicable.

When You Need LangGraph

Choose LangGraph when you need:

Explicit shared state
Runtime decision-making
Retry and failure recovery
Multi-agent coordination
Long-running workflows

A classic example is an autonomous research agent that iteratively searches, reads, verifies, and synthesizes information.

LangSmith: The Observability Layer

LangSmith answers the question:

“What is my LLM application actually doing?”

It doesn’t build workflows — it illuminates them.

Tracing Everything

LangSmith captures full execution traces:

Prompts and responses
Token usage and latency
Component call stacks
Errors and retries

You can drill down from:

a full workflow run → to a single LLM call.

This makes debugging dramatically easier.

Evaluation & Regression Testing

LangSmith allows you to:

Create evaluation datasets
Run structured tests
Track quality metrics
Compare prompts and models

This enables regression testing for LLM apps — a must-have for production systems.

Production Monitoring

In production, LangSmith tracks:

Response times
Error rates
Token and cost trends
Usage by workflow or user

Alerts help you catch issues early and optimize costs.

Framework-Agnostic

While LangSmith integrates seamlessly with LangChain and LangGraph, it’s not limited to them.

You can instrument any LLM application with LangSmith.

Quick Comparison

Tool	Solves	Use When
LangChain	Composition	Linear workflows, RAG, simple agents
LangGraph	Orchestration	Branching, loops, shared state, multi-agent
LangSmith	Observability	Debugging, evaluation, production monitoring

The Broader Ecosystem

LangFlow

LangFlow provides a visual, drag-and-drop interface for building LangChain workflows.

Great for prototyping
Helpful for non-technical collaboration
Often exported to code for production

Model Context Protocol (MCP)

MCP (by Anthropic) standardizes tool and resource access for LLMs.

Works at the tool/retriever layer
Complements LangChain and LangGraph
Reduces custom integration effort
Framework-agnostic

MCP does not replace orchestration tools — it enhances connectivity.

Conclusion

The LangChain ecosystem is layered, not competitive.

LangChain builds the core logic
LangGraph manages complex workflows
LangSmith makes everything observable

Most serious LLM applications will use more than one of these tools.

Start simple, add complexity only when needed, and never ship without observability.

Understanding Model Context Protocol (MCP): Beyond the Hype

Raj Kundalia — Mon, 08 Dec 2025 17:02:38 +0000

As always, I have created code repositories which will be easier to understand; also, resources much better than what I have here are added at the bottom:
MCP Book Library: https://github.com/rajkundalia/mcp-book-library
MCP Toolbox: https://github.com/rajkundalia/mcp-toolbox

As software engineers, we were and are witnessing a fragmentation problem in the AI ecosystem. Every major model provider (Anthropic, OpenAI, Google) and every tool (Linear, GitHub, Slack) has its own proprietary integration pattern. If you want Claude to talk to your PostgreSQL database, you write a specific integration, and if you switch to GPT-5, you rewrite it.

This “m × n” integration problem — where m models need to connect to n tools — is creating an exponential explosion of custom code. It is one of the primary bottlenecks preventing LLMs from becoming true agents.

Enter the Model Context Protocol (MCP).

What Is MCP?

The Model Context Protocol is an open standard that defines how AI models interact with data and tools. Think of it as a “USB-C port” for AI applications.

In short, MCP removes the need for bespoke integrations between every tool and every AI model. Instead of building a specific connector for every data source to every AI model, MCP provides a universal protocol.

If a tool is “MCP compliant,” any MCP client (like Claude Desktop, Cursor, or Zed) can instantly connect to it without custom glue code.

Why MCP?

The value proposition is decoupling.

For tool builders: You build one MCP server for your API. It now works with Claude, Cursor, and any future MCP-compliant application.
For AI app developers: You build your host application once and gain access to the entire ecosystem of MCP servers (Google Drive, Slack, PostgreSQL, etc.).
For end users: You can switch between AI providers without losing access to your tools.

This solves the m × n problem by reducing it to m + n. The math alone makes the case compelling.

How MCP Works Architecturally

The architecture relies on a triangle of roles. The “Client” is often hidden inside the application you are using.

MCP Hosts: The user-facing application (e.g., Claude Desktop, Zed, or a custom dashboard). The Host orchestrates the flow, manages the UI, and contains the LLM.
MCP Clients: The bridge (often a library) embedded within the Host. It maintains the connection with the Server, negotiates capabilities, and routes requests.
MCP Servers: Where your custom logic lives. A server wraps a capability (Postgres, file system, REST API) and exposes it via standardized primitives.

Core MCP Primitives

When you write an MCP server, you are generally exposing one of these three capabilities.

Resources: Passive data. The client asks to “read” a URI (for example, postgres://logs/latest). These are analogous to file reads—informational only.
Tools: Executable functions, allowing the LLM to take action (for example, execute_sql_query, send_slack_message).
Prompts: Reusable context. A server can define a template (for example, “Analyze Error Logs”) that the host loads to jumpstart a conversation.

Capability Discovery and Schemas

A critical part of the protocol is discovery. When a client connects, it asks the server, “What can you do?” and the server responds with a list of tools and resources, including JSON Schemas for arguments.

This is how the LLM knows exactly which parameters (for example, isbn: string) are required to call a tool, enforcing type safety at the model level.

Why JSON-RPC 2.0?

MCP uses JSON-RPC 2.0 for its wire protocol, and this choice maps naturally to the problem space.

Bidirectional: JSON‑RPC supports both requests and notifications from either side over a single logical session, which maps cleanly onto long‑lived transports like stdio or streaming HTTP.
Session-based: MCP sessions are often long-lived. JSON-RPC handles this persistent state naturally without the overhead of stateless HTTP headers for every interaction.
Transport agnostic: The message shape remains identical whether piped over local stdio (for local dev) or SSE/WebSockets (for remote deployment).

Example: A Full MCP Flow

User: “Check the library database for book availability for ISBN 12345.”

Host (LLM): Recognizes the intent and asks the client to find a relevant tool.
Client: Identifies check_availability via discovery and sends a JSON-RPC request:

{
  "method": "tools/call",
  "params": {
    "name": "check_availability",
    "arguments": { "isbn": "12345" }
  }
}

Server: Receives the request, runs the query, and returns:

{
  "result": {
    "content": [
      { "type": "text", "text": "Available: 5 copies" }
    ]
  }
}

Host: Feeds this back into the LLM context window.
LLM: Responds: “Good news! There are 5 copies available.”

Advanced Mechanisms: Sampling and Roots

MCP extends beyond simple API calls with features that enable sophisticated interaction.

Sampling: Enables the server to delegate complex tasks back to the host. During the execution of a tool, the server can effectively say, “Hey LLM, I need your brain for a second,” and request the host to generate text or analyze code.
Roots: A security boundary mechanism. A server can declare boundaries (for example, “I only have access to /var/www/project”), preventing access to files or resources outside a specific scope.

Real-Time Updates and Transports

Unlike standard APIs where the client must poll for changes, MCP supports server-initiated notifications.

Once a session is established, a server can send streaming responses and JSON‑RPC notifications without additional polling. For example, a filesystem server can notify the host immediately when a watched file changes, or a long-running build process can stream log lines as they appear.

This is supported across the main standard transports.

stdio: For local processes (ideal for desktop apps like Cursor).
SSE (Server-Sent Events): For remote servers sending updates to clients.
Custom transports: The protocol is extensible to additional carriers like WebSockets; draft proposals already explore this on top of the existing HTTP/streaming model.

Is MCP a Silver Bullet?

MCP solves the integration problem, but it is not a magic fix for every scenario.

Use it when you:

Need interactive AI–tool integrations
Expect multiple AI models to use the same tools
Have tooling that evolves frequently

Avoid it when you:

Have a simple one-off integration
Run large batch jobs without interaction
Care about latency more than flexibility

Production Challenges

While the local development story is fantastic, moving to production introduces complexity.

1. The Scaling Challenge

In development, a “one host process → one server process” model via stdio works well. In production, this naive 1:1 model does not scale, because you cannot spawn a new database connection process for every one of 10,000 concurrent users.

The solution: Production architectures use MCP gateways, which sit between clients and servers to handle connection pooling and multiplex many logical sessions over fewer physical connections.

2. Security and Auth

MCP defines the transport, but it does not strictly mandate how you authenticate. In a remote setup, you need to secure the transport layer (for example, via headers in SSE).

Because MCP servers can execute code or read files, strict roots configuration and containerization are essential to prevent privilege escalation.

3. Debugging and Observability

Debugging streaming JSON‑RPC over a long‑lived transport can be opaque. Unlike REST, where you have discrete HTTP logs, MCP is a stream of messages.

Production implementations require robust tracing (for example, correlation IDs) to track a request as it hops from Host → Gateway → Server and back.

Final Thoughts

The Model Context Protocol represents a meaningful step toward standardizing AI-to-tool communication. While Anthropic seeded the ecosystem, there is now broad adoption across open-source tools, IDEs, and infrastructure providers.

However, treat it as a protocol, not a magic solution. It requires ecosystem adoption and careful architectural planning for production scale.

Example MCP Implementations

To explore MCP in practice, here are the implementation repositories built while learning the ecosystem:

MCP Book Library: https://github.com/rajkundalia/mcp-book-library
MCP Toolbox: https://github.com/rajkundalia/mcp-toolbox

These projects demonstrate MCP servers and integrations for realistic data sources and workflows.

Why Use `mcp` Over `fastmcp`?

Short version:

Use mcp (official) if you want to learn the architecture, build custom clients/hosts, or manually configure the HTTP/SSE layers (which is exactly what many project prompts ask for).
Use fastmcp if you just want to ship a tool to Claude Desktop in a few minutes and do not care how the wiring works under the hood.

The best way to understand MCP is to build with it. Start small, implement a simple server for a data source you use regularly, and compare the experience to traditional point-to-point integrations.

Resources That Helped

Some resources that helped deepen understanding of MCP and its ecosystem:

https://youtu.be/5CmAKm1wWW0?si=17DNRC7cQ89UfSLD – a great starter video.
https://huggingface.co/blog/Kseniase/mcp – very good conceptual and practical overview.
https://modelcontextprotocol.io/docs/getting-started/intro – official, well-written documentation.
https://www.descope.com/learn/post/mcp – good discussion of security and auth aspects.
https://zapier.com/blog/mcp/ – promotes Zapier, but still an insightful read on real-world use.
https://norahsakal.com/blog/mcp-vs-api-model-context-protocol-explained/ – useful section on when to use MCP.
https://medium.com/ai-cloud-lab/model-context-protocol-mcp-with-ollama-a-full-deep-dive-working-code-part-1-81a3bb6d16b3 and https://medium.com/ai-cloud-lab/model-context-protocol-mcp-with-ollama-and-llama-3-a-step-by-step-guide-part-2-2a5917c8c745 – detailed deep dives with working code.
https://skywork.ai/skypage/en/ollama-mcp-MCP-Server-The-Definitive-Guide-for-AI-Engineers/1972585330623180800 – explains ollama-mcp, an MCP server that exposes a local Ollama instance as standardized tools.
https://apidog.com/blog/mcp-ollama/ – explains Dolphin MCP, a Python-based MCP client that bridges an LLM and multiple MCP servers.

API Gateway vs Service Mesh: Beyond the North–South/East–West Myth

Raj Kundalia — Thu, 20 Nov 2025 01:41:21 +0000

Please note that the page became big because I had questions on my own and less information would have made things look speculatory. You can skip this and read links added at the end of the page, they are very good.

My Experimental Code Link

Like always, if you just read and not code for this, it pretty much becomes as good as not reading it.

Github Link: https://github.com/rajkundalia/api-gateway-service-mesh-sample

This took a long time, I tried implementing a service mesh but it went above my scope - so things like Intentions in Consul would not work.

Introduction: The Misconception That's Costing Teams

If you've worked with microservices, you've probably heard this oversimplification: "API Gateways handle north–south traffic, while Service Meshes handle east–west traffic."

This directional framing has become microservices folklore - repeated in architecture discussions and echoed in conference talks for years.

Here's the issue: it's fundamentally wrong.

This misconception leads to poor architectural decisions, unnecessary complexity, and recurring confusion about which technology solves which problem. Teams often reach for an API Gateway when a Service Mesh is what they truly need - or vice versa - because they focus on traffic direction rather than the underlying purpose.

The truth is more nuanced:

API Gateways can manage east–west traffic via internal gateways that govern inter-service communication, apply policies, and handle versioning.
Service Meshes can handle north–south traffic through mesh-aware ingress gateways (such as Istio's Ingress Gateway or Linkerd's ingress controller) that bring external traffic into the mesh.

So if traffic direction isn't the real difference, what is?

Purpose and responsibility.

An API Gateway treats services as products - with user governance, access control, monetization, lifecycle management, and business context.

A Service Mesh, by contrast, provides infrastructure-level reliability for service-to-service communication - zero business logic, zero product thinking, purely connectivity.

In this article, we'll cut through the confusion and give you a clear mental model for when to use each technology - or when using both together creates the strongest architecture.

You'll learn:

What problems each technology actually solves (and why traffic direction doesn't matter)
The architectural differences that lead to different use cases
How capabilities like mTLS, retries, and zero-trust security define service meshes
A practical decision framework for choosing the right tool
How API Gateways and Service Meshes complement each other in real-world systems

Let's start by understanding the fundamental problems each technology was designed to solve.

Understanding the Real Problem Each Solves

API Gateway: APIs as a Product

An API Gateway's primary purpose is to expose services as managed, consumable APIs - treating your services like products that internal or external consumers can discover, use, and rely on.

But an API Gateway is far more than a reverse proxy. It embeds business logic and enables API composition: aggregating data from multiple services into a single response, transforming payloads, standardizing errors, and presenting a unified interface that shields clients from backend complexity. This is effectively the Backend-for-Frontend (BFF) pattern.

And once you move past request/response mechanics, the real power emerges. API Gateways participate in the entire API lifecycle - the part most developers overlook:

Creation & design: specs, versioning, schema validation
Testing & documentation: interactive docs, automated tests, sandboxes
Publishing & onboarding: developer portals, marketplaces, self-service access
Monetization: usage metering, billing hooks, tiered plans
Analytics: usage patterns, behavior insights, performance dashboards

This is where the gateway gains business context. It knows concepts like customers, products, API keys, and rate-limit tiers. When a mobile client sends a request, the gateway understands: "This is Acme Corp, a premium tier subscriber, allowed 10,000 requests per hour on the /payments API."

Modern platforms such as Kong, AWS API Gateway, Azure API Management, Apigee, and Ambassador all embody this philosophy - combining policy enforcement with full lifecycle and product-style API management.

Service Mesh: Service Connectivity Infrastructure

A Service Mesh has a fundamentally different purpose: providing decoupled infrastructure for service-to-service communication without requiring changes to application code.

Service Meshes offload network functions from services into a dedicated infrastructure layer. They handle concerns like service discovery, load balancing, circuit breaking, retries, and timeouts - all the complexity that developers would otherwise implement (and often implement inconsistently) across services.

Critically, Service Meshes have no business logic. They're purely connectivity and observability infrastructure. A service mesh doesn't know or care whether it's routing a payment transaction or a product catalog query. Every service is treated equally as a network endpoint with routing rules and policies.

This enables polyglot architectures. Your Python services, Go services, and Java services all get the same networking capabilities without embedding client libraries or writing language-specific code. The infrastructure handles it transparently.

The key insight: A Service Mesh is business-agnostic. It operates at the infrastructure layer, understanding concepts like "service instances," "endpoints," "failure rates," and "latency percentiles" - but never "customers," "API products," or "billing tiers."

Popular implementations include Istio, Linkerd, Consul Connect, and AWS App Mesh.

Quick Comparison

Aspect	API Gateway	Service Mesh
Primary Purpose	Expose services as managed API products	Decouple service communication infrastructure
Context	Business-aware (users, products, billing)	Business-agnostic (endpoints, metrics)
Logic	Can contain transformation, aggregation logic	No business logic, pure infrastructure
Lifecycle Scope	Full API lifecycle (design → retirement)	Runtime connectivity only
Consumer Focus	External developers, partners, clients	Services communicating with each other

Architecture Deep Dive

Deployment Models

The architectural differences between API Gateways and Service Meshes are stark, and understanding these differences clarifies why each excels at different problems.

API Gateway: Centralized Architecture

An API Gateway deploys as a standalone reverse proxy or clustered front-door, creating a single entry point (or small cluster) for API traffic. It lives in its own architectural layer, distinct from your services.

Here's a simplified view:

External Clients (Mobile, Web, Partners)
              ↓
    ┌─────────────────┐
    │  API Gateway    │ ← Centralized, clustered for HA
    │   (Kong/AWS)    │
    └─────────────────┘
         ↓    ↓    ↓
    ┌────┐ ┌────┐ ┌────┐
    │Svc │ │Svc │ │Svc │
    │ A  │ │ B  │ │ C  │
    └────┘ └────┘ └────┘

Traffic flows through the gateway as a dedicated hop. The gateway terminates external connections, applies policies, performs routing decisions, and forwards requests to backend services. Deployment is relatively straightforward - you provision the gateway infrastructure separately from your services.

Service Mesh: Decentralized Architecture

A Service Mesh deploys in a fundamentally different way: a sidecar proxy alongside every service replica. This is a decentralized, peer-to-peer model.

Service A          Service B          Service C
┌─────────┐        ┌─────────┐        ┌─────────┐
│  App    │        │  App    │        │  App    │
│Container│        │Container│        │Container│
└────┬────┘        └────┬────┘        └────┬────┘
     │                  │                  │
┌────┴────┐        ┌────┴────┐        ┌────┴────┐
│ Envoy   │◄──────►│ Envoy   │◄──────►│ Envoy   │
│ Sidecar │        │ Sidecar │        │ Sidecar │
└─────────┘        └─────────┘        └─────────┘
       ▲                 ▲                 ▲
       └─────────────────┴─────────────────┘
              Control Plane (Istio/Linkerd)
              (Configuration, not traffic)

Each service instance gets its own proxy (typically Envoy). When Service A calls Service B, the request flows: App A → Sidecar A → Sidecar B → App B. The service code itself doesn't know about the mesh - it makes standard HTTP or gRPC calls to localhost, and the sidecar handles everything else.

This deployment model is more invasive. It requires modifying your CI/CD pipelines to inject sidecars, updating Kubernetes manifests (or VM configurations), and managing the lifecycle of proxies alongside applications.

Key Insight: In an API Gateway, traffic converges at a central point. In a Service Mesh, traffic flows peer-to-peer between distributed proxies, with the control plane managing configuration but never touching actual requests.

Control Plane vs Data Plane Architecture

This separation of concerns is crucial for understanding Service Meshes, though it applies (less critically) to some API Gateway implementations.

Service Mesh: Deep Dive into Control and Data Planes

The control plane (examples: Istio's Pilot, Linkerd's Controller, Consul's servers) is the brain of the mesh:

Configuration management: Distributes routing rules, traffic policies, and service configurations to all sidecars
Service discovery: Maintains a live registry of all service instances and their endpoints
Certificate authority: Generates and rotates mTLS certificates for service identity
Telemetry aggregation: Collects metrics and traces from data plane proxies
Policy enforcement setup: Configures access control rules and rate limits

Critically: the control plane is NOT on the request path. It handles configuration and management but never sees actual user requests. This is fundamental to mesh scalability.

The data plane (examples: Envoy sidecars in Istio, Linkerd2-proxy in Linkerd) does the heavy lifting:

Handles actual request traffic: Every request flows through data plane proxies
Enforces policies: Implements circuit breakers, retries, timeouts configured by control plane
L4/L7 routing and load balancing: Makes real-time routing decisions
Security enforcement: Performs mTLS handshakes, validates certificates
Telemetry generation: Reports metrics, logs, and traces for observability

Let's make this concrete with service discovery as an example. When Service C scales from 3 to 5 replicas, here's what happens:

Kubernetes (or your orchestrator) starts two new pods with Service C containers and Envoy sidecars
The Envoy sidecars register with the control plane upon startup
The control plane updates its service registry with the two new endpoints
The control plane pushes updated routing configurations to all Envoy sidecars in the mesh
Within seconds, Service A and Service B know about the new Service C instances and start load balancing across all 5 replicas

No DNS propagation delays. No manual configuration updates. No service discovery libraries in application code. The control plane orchestrates everything, while sidecars handle the actual routing.

API Gateway: Simpler Control Plane Model

Some API Gateway implementations (like Kong with its declarative configuration) have control plane concepts, but the separation is less critical. Many gateways bundle control and data plane functions in the same process. Configuration changes might require gateway reloads, and the gateway itself is on the request path - serving as both traffic handler and configuration enforcer.

Organizational and Deployment Challenges

Service Meshes face unique adoption barriers that API Gateways largely avoid:

1. Universal Sidecar Deployment Requirement

To get value from a service mesh, you need sidecars deployed alongside all services you want to manage. This creates organizational friction: it's not something a single team can adopt independently. You need buy-in from every service owner.

2. Shared Control Plane Access

All services must share access to the mesh control plane. This crosses security boundaries - teams that previously had isolated deployments now share infrastructure. Organizations with strict security postures find this challenging.

3. Cannot Control External Services

You can only mesh services you directly control. Third-party APIs, legacy systems outside your infrastructure, and managed services like external databases cannot participate in the mesh. This limits where resilience patterns apply.

4. Certificate Authority Coordination

Services in the same mesh must share a Certificate Authority (CA) for mTLS. This requires cross-team coordination on security policies and trust models. Different teams or products often want separate CAs for isolation - which means separate meshes.

Why This Matters: Service mesh adoption is often limited to team or product boundaries. An API Gateway, deployed as central infrastructure, can span the entire organization much more easily. It doesn't require every team to change their deployment processes.

Now that we understand the architectural differences and deployment realities, let's examine specific capabilities side-by-side.

Capabilities Comparison

Both technologies offer overlapping capabilities, but with different implementations and tradeoffs. Understanding these differences guides architectural decisions.

Service Discovery

API Gateway: Uses external service registries (Consul, Eureka, DNS, Kubernetes Services). The gateway queries the registry to find service endpoints, then routes traffic accordingly.
Service Mesh: Built-in service discovery via the control plane. The control plane automatically tracks all sidecar-enabled services, maintaining a live registry without external dependencies. When a service scales or moves, the mesh knows immediately.

Authentication and Authorization ⭐

This is perhaps the most important architectural differentiator between the two patterns.

API Gateway: Focuses on user and client identity. Validates API keys, OAuth2 tokens, JWT claims. Answers questions like: "Is this mobile app authorized to call the /payments endpoint?" or "Has this partner exceeded their rate limit?" Security is about edge protection - who gets into your system and what they can access.
Service Mesh: Focuses on service identity via mTLS certificates. Every service gets a cryptographic identity. Answers questions like: "Is this really the Payment service calling Fraud Detection?" or "Should Order Service be allowed to communicate with User Profile Service?" Security is about Zero-Trust architecture - no service implicitly trusts another.

Load Balancing

API Gateway: Server-side load balancing at the gateway layer. The gateway distributes requests across service instances based on configured algorithms (round-robin, least connections, weighted).
Service Mesh: Client-side load balancing distributed via sidecars. Each sidecar makes load balancing decisions locally, using health status and latency information from the control plane. This enables more sophisticated strategies like locality-aware routing (prefer same-zone instances).

Rate Limiting

API Gateway: Edge-focused, per-client or per-API-key. Limits like "1000 requests per hour for this developer" or "premium tier customers get 10x capacity." Centralized enforcement at the gateway.
Service Mesh: Can implement distributed rate limiting to prevent service overload. For example, preventing the Notification Service from overwhelming Email Service with requests, regardless of which client triggered the flow. Enforcement happens at sidecars across the mesh.

Circuit Breakers and Retries

API Gateway: Configured at the gateway level to protect against downstream service failures. If Payment Service is down, the gateway can circuit break to avoid cascading failures.
Service Mesh: Configured at the control plane, enforced at every sidecar. Each service gets automatic circuit breakers and retries without code changes. When Inventory Service calls Warehouse Service and detects failures, the sidecar automatically circuit breaks - no retry logic in Inventory Service code.

Health Checks

API Gateway: Gateway actively probes downstream services for health, removing unhealthy instances from its routing pool.
Service Mesh: Sidecars monitor local service health and report to the control plane. Passive health checks based on actual request success rates. Faster reaction to failures because the sidecar sits adjacent to the service.

Observability

API Gateway: Edge metrics and API-level analytics. Tracks which APIs are called, by whom, how often, and with what latency. Great for understanding API usage patterns and client behavior.
Service Mesh: Deep service-to-service metrics and distributed tracing. Tracks every internal call with detailed latency breakdowns, success rates, and request volumes. Enables debugging complex distributed transactions by tracing requests as they flow through multiple services.

Example: When a user checkout fails, the API Gateway shows the client request hit the /checkout endpoint with a 500 error. The service mesh traces reveal that Order Service → Inventory Service succeeded, but Inventory Service → Warehouse Service timed out after 3 retries - pinpointing the exact failure point.

Protocol Support

API Gateway: Primarily HTTP/HTTPS, with increasing support for gRPC, WebSockets, and GraphQL. Focused on application-layer protocols.
Service Mesh: Supports both L4 (TCP) and L7 (HTTP, gRPC) protocols. Can handle raw TLS connections, TCP traffic, and any IP-based protocol. Broader protocol range because it operates at the network infrastructure layer.

Chaos Engineering and Defect Simulation

API Gateway: Limited capabilities - some gateways allow injecting delays or errors, but it's not a primary feature.
Service Mesh: Built-in chaos engineering support. Can inject faults (return 500 errors), add delays (simulate network latency), or abort connections to specific services. Enables testing resilience in production-like conditions. For example, "Make 10% of calls from Order Service to Inventory Service return 503 errors to verify circuit breakers work."

Summary Table

Capability	API Gateway	Service Mesh
Service Discovery	External registry (Consul, DNS)	Built-in via control plane
Authentication/Authorization	User/client identity (OAuth, API keys)	Service identity (mTLS certificates)
Load Balancing	Server-side, centralized	Client-side, distributed
Rate Limiting	Per-client/API key at edge	Per-service, distributed
Circuit Breakers	At gateway	Distributed, no code changes
Health Checks	Gateway probes services	Sidecars monitor local health
Observability	Edge metrics, API analytics	Service-to-service tracing
Protocols	HTTP/HTTPS, gRPC, WebSockets	L4 + L7 (TCP, HTTP, gRPC, TLS)
Chaos Engineering	Limited	Built-in fault injection

Among these capabilities, mutual TLS deserves special attention because it fundamentally changes how services authenticate and trust each other.

Mutual TLS (mTLS) in Service Mesh

How mTLS Works and Why It Matters

The Mechanism:

When a service mesh is deployed, the control plane includes a Certificate Authority (CA). This CA generates unique, short-lived certificates for every service replica. When Service A's sidecar calls Service B's sidecar, both sides present certificates during the TLS handshake, cryptographically proving their identities.

Here's the flow:

Order Service sidecar initiates connection to Payment Service
Payment sidecar presents certificate: "I am payment.production.svc.cluster"
Order sidecar verifies certificate against the mesh CA
Order sidecar presents its own certificate: "I am order.production.svc.cluster"
Payment sidecar verifies Order's certificate
Encrypted, authenticated connection established

Crucially, sidecars automatically handle certificate rotation. Certificates might rotate every few hours, and services never see this complexity - it's entirely transparent.

The Value:

This eliminates the need for service-level authentication code. Previously, Payment Service might check an API key or JWT token to verify the caller. With mTLS, the infrastructure proves identity cryptographically. Your service code doesn't need to know about authentication - it receives requests that have already been authenticated at the network layer.

Additionally:

Encryption by default: All east-west traffic is encrypted, protecting against network sniffing
Audit trail: The mesh knows exactly which services communicated with which other services
Compliance: Meets requirements for data-in-transit encryption (SOC2, PCI-DSS, HIPAA)

Certificate Authority Boundaries

Services in the same mesh must share a Certificate Authority. This has organizational implications.

Consider a large company with two product teams: Banking and Trading. For security isolation, they want separate Certificate Authorities - Banking services shouldn't trust certificates from Trading services. This means they need two separate service meshes (Mesh A and Mesh B).

But what if Banking needs to expose APIs to Trading? This is where API Gateways complement service meshes. An API Gateway can sit at the boundary between meshes, terminating mTLS from one mesh and re-establishing it in another mesh (or using traditional API authentication). The gateway bridges different trust domains.

mTLS and Zero-Trust Networking

mTLS enables Zero-Trust architecture for internal service communication.

Traditional security followed the "castle and moat" model: strong perimeter defenses, but once inside the network, services implicitly trusted each other. An attacker who breached the perimeter had free access to internal systems.

Zero-Trust rejects this model: never trust, always verify. Every request, even between internal services, requires authentication. No service is trusted by default, regardless of network location.

Service meshes with mTLS implement Zero-Trust for east-west traffic. Even if an attacker deploys a rogue container inside your cluster, it cannot communicate with legitimate services because it lacks valid certificates signed by the mesh CA. Every service must cryptographically prove its identity on every request.

With these capabilities and security models in mind, let's turn to practical decision-making: when should you use each technology?

When to Use Each

There's no one-size-fits-all answer. Choosing between API Gateways and Service Meshes depends on your primary challenge, team maturity, and architectural scale. Let's build a decision framework.

Decision Framework: Use API Gateway When…

Primary Challenge: External Access & Client Management

If you need to expose services to external consumers - developers, partners, customers, mobile apps - choose an API Gateway. It excels at edge security, client authentication (API keys, OAuth2), and managing the full API product lifecycle.

Concrete scenario: You're building a SaaS platform where third-party developers integrate with your product catalog API. You need developer onboarding, API key provisioning, documentation portals, usage analytics, and tiered rate limiting. An API Gateway provides all of this out-of-the-box.

Primary Challenge: Service Abstraction & Evolution

If different products or teams need to communicate with governance, versioning, and backward compatibility, choose an API Gateway. It provides abstraction as underlying services evolve.

Concrete scenario: Your mobile team needs stable APIs while your backend undergoes frequent changes. The API Gateway maintains version 1 and version 2 of the /orders endpoint, routing v1 clients to legacy services and v2 clients to the new architecture. Backend teams can refactor without breaking mobile apps.

Primary Challenge: Centralized Control & Simplicity

If you're starting your microservices journey and need immediate value with lower operational complexity, choose an API Gateway. Simpler deployment, easier to understand, lower barrier to entry.

Concrete scenario: You're migrating from a monolith to 5–10 microservices. You need request routing, basic rate limiting, and API documentation. A service mesh would be overkill - too much infrastructure overhead for your scale. An API Gateway solves your immediate needs without the operational burden.

Primary Challenge: Edge Security & Rate Limiting

If your main concern is protecting services from external threats and managing API quotas per customer, choose an API Gateway.

Concrete scenario: Your public APIs face potential DDoS attacks, credential stuffing, and abusive clients. The API Gateway implements rate limiting, IP blocking, JWT validation, and anomaly detection at the edge, before traffic reaches your services.

Decision Framework: Use Service Mesh When…

Primary Challenge: Internal Service Reliability

If you have large-scale internal architecture (dozens to hundreds of services) with complex communication patterns, and services need automatic retries, circuit breakers, and timeouts without code changes, choose a Service Mesh.

Concrete scenario: You have 80 microservices across 12 teams. Services frequently fail partially - timeouts, transient errors, network blips. Rather than each team implementing retry logic differently (or not at all), the service mesh provides consistent resilience patterns across all services. When Recommendation Service calls User Profile Service and gets a timeout, the sidecar automatically retries with exponential backoff - no code change needed.

Primary Challenge: Polyglot Environments & Code Elimination

If you want to eliminate networking code from services and need uniform connectivity across services written in different languages, choose a Service Mesh.

Concrete scenario: Your platform includes Python ML services, Go APIs, Java batch processors, and Node.js real-time services. Rather than maintaining four different HTTP client libraries with circuit breakers, retries, and observability, the service mesh provides identical capabilities to all services regardless of language. Developers focus on business logic, not networking infrastructure.

Primary Challenge: Security Compliance & Zero-Trust

If security compliance requires mTLS encryption for all internal communication, or you need Zero-Trust architecture with cryptographic service identity, choose a Service Mesh.

Concrete scenario: Rather than configuring TLS in every service's application code, the service mesh provides automatic mTLS between all services. Auditors see consistent encryption policies enforced at the infrastructure layer, dramatically simplifying compliance evidence.

Primary Challenge: Deep Observability & Traffic Control

If you require deep east-west observability and distributed tracing across all services, or need advanced traffic management (canary deployments, traffic splitting, A/B testing) for internal services, choose a Service Mesh.

Concrete scenario: You're rolling out a major refactor of Order Service. You want to send 5% of traffic to the new version, monitor error rates and latency, gradually increase to 50%, then 100%. The service mesh enables this with configuration changes - no deployment changes, no feature flags in code. If error rates spike, you roll back instantly by updating traffic weights.

When NOT to Use Service Mesh

Avoiding Unnecessary Complexity:

Service meshes are powerful but operationally complex. Don't use them if:

Small architectures (< 10–15 services): Operational overhead outweighs benefits. You'll spend more time managing the mesh than you save from its features.
Team lacks infrastructure expertise: Service meshes have a steep learning curve. If your team struggles with Kubernetes basics, adding a service mesh will slow you down.
Cannot deploy sidecars: If you depend on external services, legacy systems you don't control, or third-party SaaS APIs, a service mesh can't manage those connections.
Organizational resistance: Service meshes require cross-team adoption. If teams resist sidecar injection or control plane dependencies, forced adoption fails.
Ultra-sensitive performance requirements: Sidecars add latency (typically 1–5ms per hop). For ultra-low-latency scenarios where even milliseconds matter, this overhead is unacceptable.
Limited operational resources: Service meshes require dedicated platform engineering resources. If you lack staff to manage mesh infrastructure, troubleshoot sidecar issues, and handle certificate rotation problems, don't adopt a mesh.

Decision Matrix: Use Both When…

The Comprehensive Approach:

Many mature architectures use both technologies together, leveraging each for its strengths.

Use both when:

You need edge control for external clients (API Gateway) AND in-mesh reliability for internal services (Service Mesh)
You want API-as-a-product capabilities (documentation, monetization, developer portals) AND Zero-Trust security internally (mTLS between services)
You have a mature platform engineering team capable of managing layered infrastructure

Example decision: "We expose our Payment API to mobile apps and partners via API Gateway - handling JWT validation, per-customer rate limiting, and maintaining a developer portal. Internal communication between Payment Service, Fraud Detection Service, and Notification Service uses a service mesh - providing mTLS encryption, circuit breakers, and distributed tracing. The API Gateway itself runs as a service within the mesh, getting the same resilience and observability benefits."

Real-World Architecture Example

Let's walk through a financial institution scenario that illustrates how both technologies complement each other.

Scenario: Multi-Product Financial Platform

A financial institution has two major products:

Banking Platform (account management, transfers, statements)
Trading Platform (stock trading, portfolio management, market data)

Each product has its own engineering team, separate deployments, and independent release cycles. Here's how they use both technologies:

Service Mesh Deployment (Two Separate Meshes)

Banking Mesh: Covers 25 microservices (Account Service, Transaction Service, Statement Generator, etc.) with its own Certificate Authority for security isolation
Trading Mesh: Covers 18 microservices (Order Execution, Portfolio Service, Market Data, etc.) with a separate Certificate Authority

Each mesh provides:

mTLS encryption for all internal communication within that product
Circuit breakers and retries for resilience
Distributed tracing to debug complex transactions
Zero-Trust security - no service trusts another by default

API Gateway Deployment (Multiple Gateways)

Internal API Gateway: Banking Platform exposes select APIs to Trading Platform (e.g., "Get Account Balance" for margin trading). This gateway sits at the boundary between Banking Mesh and Trading Mesh, bridging different trust domains.
Edge API Gateway: Both products expose APIs to mobile applications. This gateway handles:
- JWT validation for user authentication
- Rate limiting per user tier (retail vs institutional)
- API versioning (mobile app v1.2 uses older endpoint, v2.0 uses new schema)
- Developer portal for partner integrations
- Analytics on API usage patterns

Multi-Datacenter Deployment

The architecture spans two datacenters (DC1 and DC2) for high availability:

Each datacenter has full mesh deployment (Banking Mesh and Trading Mesh)
API Gateways in each datacenter for local request handling
Cross-datacenter mesh communication uses mTLS across the WAN
API Gateway load balancers route users to nearest datacenter

Key Architectural Insights:

This architecture demonstrates several principles:

Isolation through separate meshes: Banking and Trading use different CAs, preventing accidental trust relationships
API Gateways bridge trust domains: Internal gateway mediates between meshes when cross-product communication is needed
Layered security: Edge gateway handles user authentication, mesh handles service authentication
Different lifecycle management: API versions can change without mesh reconfiguration; mesh policies can change without API versioning

When a mobile user checks their trading portfolio's buying power, here's the flow:

Mobile app → Edge API Gateway (JWT validation, rate limiting)
Edge API Gateway → Trading Platform's Portfolio Service (via Trading Mesh, with mTLS)
Portfolio Service → Internal API Gateway (requesting account balance from Banking)
Internal API Gateway → Banking Platform's Account Service (via Banking Mesh, with mTLS)
Response flows back through each layer

Each technology layer adds value: the edge gateway protects against external threats and manages API products, while the meshes ensure reliable, secure service-to-service communication.

Pros and Cons Summary

Understanding the tradeoffs helps set realistic expectations and plan for operational challenges.

API Gateway

Pros:

Standardizes API delivery: Consistent authentication, rate limiting, and versioning across all APIs
Simplifies client integration: Single entry point with unified documentation reduces client complexity
High flexibility: Can transform requests, aggregate responses, implement complex routing logic
Easier adoption: Centralized deployment model requires less organizational coordination
Centralized analytics: Single place to monitor API usage, client behavior, and performance trends
Legacy integration: Can front legacy systems, providing modern API interfaces to old infrastructure

Cons:

Single point of failure risk: Though clustering mitigates this, the gateway remains a critical chokepoint
Centralization complexity at scale: As more APIs are added, gateway configuration grows complex
Latency introduction: Extra hop adds latency (typically 5–20ms depending on gateway processing)
Limited internal visibility: Only sees edge traffic, not service-to-service communication patterns
Scaling challenges: While horizontal scaling is possible, it's more complex than distributed architectures

Service Mesh

Pros:

Built-in observability: Comprehensive metrics, distributed tracing, and logging without code instrumentation
Enhanced security: Automatic mTLS, Zero-Trust architecture, cryptographic service identity
Resilience without code: Circuit breakers, retries, timeouts configured centrally, enforced everywhere
Fine-grained traffic control: Canary deployments, traffic splitting, A/B testing at infrastructure level
Chaos engineering capabilities: Inject faults and delays to test system resilience
Abstracts networking from code: Developers focus on business logic, not HTTP clients and retry libraries
Language agnostic: Same capabilities for Go, Python, Java, Node.js services

Cons:

Steep learning curve: Complex architecture requires dedicated platform engineering expertise
Operational complexity: Managing control plane, certificate rotation, sidecar upgrades adds operational burden
Latency overhead: Each sidecar hop adds latency; multiple hops compound this
Resource overhead: Memory and CPU per sidecar
Requires infrastructure maturity: Best suited for Kubernetes environments with GitOps practices
Organizational challenges: Requires cross-team adoption and coordination - can't be implemented in isolation
Deployment complexity: Sidecar injection, control plane dependencies increase deployment complexity

Conclusion

Let's return to where we started: the pervasive north-south/east-west myth that frames API Gateways and Service Meshes as mutually exclusive technologies defined by traffic direction.

This framing is fundamentally flawed. Both technologies can handle both traffic types. API Gateways can manage internal service-to-service communication through private gateways. Service Meshes can expose external traffic through ingress gateways. The real distinction has nothing to do with where traffic flows.

What actually matters is purpose:

API Gateways treat services as products with business context - managing full API lifecycles, understanding users and customers, handling monetization and developer onboarding. They operate at the application edge with business awareness.
Service Meshes provide business-agnostic infrastructure for service connectivity - offloading networking concerns from application code, enabling Zero-Trust security through mTLS, and providing deep observability without instrumentation. They operate at the infrastructure layer with no business logic.

Looking forward, both patterns continue to evolve. Service Meshes are simplifying operationally (Linkerd's focus on simplicity, Istio's ambient mesh reducing sidecar overhead). API Gateways are adding mesh-like features (Kong Mesh, Ambassador's service mesh integration). The boundaries blur, but the fundamental purposes remain distinct.

Choose your tools based on the problems they solve, not the traffic patterns they handle. Your architecture - and your team's sanity - will thank you.

Note

Obviously this content has been generated by LLM, but my approach to writing has been the following:

I read topics from various pages out there.
I come across questions/sub topics that I would want to cover.
I add this questions/subtopics and then generate using LLM.
I read the LLM generated content and then keep what I find necessary.

Micronaut Framework: The Next Generation JVM

Raj Kundalia — Sat, 01 Nov 2025 15:13:11 +0000

Note:
There are chances that you do not want to read the whole page and just see some code implementation. Don't worry — I have a repository for it: https://github.com/rajkundalia/product-catalogue-micronaut

Do read the Development Experience for it at the end.

In the fast-changing world of cloud-native development, Java's long-standing dominance has faced new challenges. Developers love its stability and rich ecosystem, but modern workloads — serverless functions, microservices, and edge computing — demand instant startup, low memory footprint, and scalable concurrency.

Frameworks like Spring Boot have made Java approachable and powerful for enterprise-scale systems, yet their reliance on runtime reflection and classpath scanning adds overhead that feels increasingly dated in a world obsessed with milliseconds and megabytes.

Enter *Micronaut *— a modern, full-stack JVM framework designed from the ground up for cloud-native, serverless, and microservice architectures. Developed by Object Computing, Inc. (OCI) — the same team behind the Grails framework — Micronaut rethinks how dependency injection, configuration, and reflection should work in the JVM world.

Micronaut doesn't merely compete with Spring Boot or Quarkus; it redefines how JVM applications can be lightweight, reactive, and cloud-optimized — all without sacrificing developer productivity.

What is Micronaut?

Micronaut is a modern, full-stack JVM framework designed specifically for building cloud-native applications with minimal resource consumption. Unlike traditional frameworks, it's lightweight by design, not as an afterthought.

The framework provides comprehensive support for Java, Kotlin, and Groovy, allowing teams to choose their preferred JVM language without compromising on features or performance. This multi-language capability extends throughout the entire stack, from dependency injection to HTTP handling to data access.

Micronaut's core philosophy centers on cloud-first, performance-first development. Every architectural decision prioritizes fast startup times and low memory footprints — critical factors for modern deployment models where applications must scale rapidly and run cost-effectively in containerized or serverless environments. Rather than optimizing legacy runtime reflection patterns, Micronaut eliminates them entirely through compile-time code generation.

Key Differentiators & Architecture

Ahead-of-Time (AOT) compilation sits at the core of Micronaut's design. The framework performs all reflection operations, proxy generation, and configuration processing during compilation — not at runtime. This eliminates the startup penalty and memory overhead associated with runtime reflection entirely.

Compile-time Dependency Injection represents a paradigm shift from Spring's runtime approach. While Spring scans the classpath, creates bean definitions, and builds the application context at startup, Micronaut generates all dependency injection code during compilation. The resulting bytecode contains explicit wiring instructions with zero reflection or dynamic proxy creation at runtime.

This architectural approach has profound implications for cloud-native deployments. Low memory footprint and fast startup aren't just performance optimizations — they're fundamental characteristics that enable new deployment models. Serverless functions that must initialize in milliseconds become practical. Container-dense environments can pack more instances per node, directly reducing infrastructure costs. Auto-scaling responds faster because new instances reach readiness in seconds rather than minutes.

Dependency Injection at Compile Time

Dependency injection at compile time is Micronaut's most significant innovation and deserves careful examination.

Zero Reflection at Runtime means exactly that. Micronaut doesn't scan your classpath looking for annotated classes. It doesn't build bean registries in memory. It doesn't create reflection-based proxies. All of this work happens during compilation, producing standard bytecode with explicit constructor calls and method invocations. The result is predictable, low memory consumption with no hidden caches or reflection metadata.

Note: "Zero reflection at Runtime" is accurate for most use cases, but certain integrations (like serialization frameworks, e.g., Jackson) may still perform limited reflection if not configured with Micronaut Serde.

True AOT Compilation generates all the boilerplate that frameworks traditionally create at runtime. When you annotate a class with @Singleton, Micronaut's annotation processors generate the factory code that will instantiate it. When you use @Inject, it generates the wiring code. Aspect-oriented programming (AOP) concerns like @Transactional or @Cacheable become compile-time-generated method interceptors, not runtime proxies.

At compile time, Micronaut generates factory classes and wiring code for these beans. At runtime, there's no reflection — just straightforward object instantiation and method calls. Compare this to Spring, where the application context scans packages, uses reflection to discover beans, creates proxies for AOP, and builds a runtime dependency graph. That entire process consumes time and memory on every startup.

Reactive Programming & Non-Blocking I/O

Micronaut embraces reactive programming as a first-class concern, not an afterthought. The framework provides built-in reactive support throughout its HTTP layer, data access, and client interactions.

Integration with RxJava, Project Reactor, and the Java Flow API means you can choose your preferred reactive library. Micronaut's HTTP server and clients natively support reactive types — return a Mono, Flux, Single, or Flowable from your controller, and the framework handles backpressure and streaming appropriately.

For high-throughput applications handling thousands of concurrent requests, reactive programming enables better resource utilization. Non-blocking I/O allows a small number of threads to handle massive concurrency by avoiding thread-per-request models. This becomes particularly valuable in microservices architectures where services spend most of their time waiting for network calls to complete.

Netty-Based HTTP Server

Micronaut's HTTP layer is built on Netty, the high-performance, non-blocking network framework that powers numerous production systems including Elasticsearch, Cassandra, and gRPC.

The embedded Netty server provides several advantages over traditional servlet containers. It starts in milliseconds, consumes minimal memory, and handles thousands of concurrent connections efficiently through its event loop architecture. There's no separate container to deploy or configure — your application is the server.

For cloud applications, Netty's characteristics align perfectly with containerized deployments. The non-blocking I/O model means you're not wasting resources on idle threads waiting for requests. The lightweight footprint means smaller container images and faster cold starts. The performance consistency means predictable behavior under load.

HTTP Clients

Micronaut revolutionizes HTTP client development with its declarative client approach. Instead of writing boilerplate HTTP code, you define an interface with annotations and Micronaut generates the implementation at compile time.

@Client("https://api.example.com")
public interface UserClient {

    @Get("/users/{id}")
    User getUser(@PathVariable Long id);

    @Post("/users")
    User createUser(@Body User user);

    @Delete("/users/{id}")
    HttpResponse<Void> deleteUser(@PathVariable Long id);
}

At compile time, Micronaut generates a full HTTP client implementation. You simply inject the interface and call methods — no manual HTTP construction, no response parsing, no error handling boilerplate. The generated code handles serialization, deserialization, headers, and error conditions.

For scenarios requiring more control, Micronaut provides a programmatic HTTP client API with full access to requests, responses, headers, and streaming.

Client-side load balancing is built-in, enabling direct service-to-service communication without external load balancers. Combined with service discovery, this creates efficient microservices communication patterns with minimal infrastructure dependencies.

Resilience Features

Distributed systems require resilience patterns, and Micronaut makes them trivial to implement through declarative mechanisms.

The @Retryable annotation adds automatic retry logic with configurable delays and maximum attempts:

@Retryable(attempts = "3", delay = "2s")
public User fetchUser(Long id) {
    return userClient.getUser(id);
}

The @CircuitBreaker annotation protects against cascading failures by opening circuits when error rates exceed thresholds:

@CircuitBreaker(reset = "60s")
public List<Product> getProducts() {
    return productService.fetchAll();
}

Fallback mechanisms allow graceful degradation by specifying alternative methods when primary operations fail. These patterns, which traditionally require separate libraries like Resilience4j or Hystrix, are built directly into Micronaut's AOP layer and generated at compile time.

Threading Model & @ExecuteOn

Understanding Micronaut's threading model is critical for building performant applications. The framework uses an event loop model where a small pool of worker threads handles I/O operations efficiently.

The key distinction is between blocking and non-blocking operations. Non-blocking code — reactive operations, async I/O, HTTP calls returning reactive types — executes on event loop threads without problems. However, blocking code — database queries, file operations, thread sleeps — must not run on event loop threads, as it would prevent other operations from executing.

This is where @ExecuteOn(TaskExecutors.BLOCKING) becomes essential:

@Get("/users/{id}")
@ExecuteOn(TaskExecutors.BLOCKING)
public User getUser(Long id) {
    return userRepository.findById(id); // Blocking database call
}

The annotation tells Micronaut to execute this method on a separate thread pool designed for blocking operations, preventing event loop starvation. Forgetting this annotation when performing blocking operations is a common pitfall that can severely impact application throughput.

For truly non-blocking applications using reactive database drivers (R2DBC) or reactive HTTP clients, you can omit @ExecuteOn and keep everything on the event loop for maximum efficiency.

Cloud-Native Design [Not tried by me]

Micronaut was architected specifically for cloud platforms, and it shows in every integration point.

First-class cloud provider integration means native support for AWS, Google Cloud Platform, and Azure is designed in. Micronaut provides dedicated modules for each cloud provider's services.

Service discovery support includes Consul, Eureka, and Kubernetes service discovery out of the box. Micronaut applications can register themselves and discover other services without external configuration management tools.

Distributed configuration support allows applications to pull configuration from Consul, Vault, AWS Parameter Store, or GCP Cloud Config. Configuration changes can be detected and reloaded without restarting applications.

Distributed tracing integration with Zipkin, Jaeger, and OpenTelemetry provides observability for microservices communication patterns. Tracing context propagates automatically across service boundaries.

Kubernetes readiness is built-in with automatic configuration for health checks, config maps, secrets, and service discovery. Deploy Micronaut applications to Kubernetes without complex configuration or sidecars.

Observability is automatic. Micronaut exposes /health and /metrics endpoints by default, implementing standard health check protocols for Kubernetes liveness and readiness probes. The Micronaut Management module provides comprehensive management endpoints, while Micrometer integration enables metrics export to Prometheus, Datadog, New Relic, and other monitoring platforms without manual instrumentation.

GraalVM Native Image Support [Not tried by me]

GraalVM Native Image compilation represents the ultimate in startup performance and memory efficiency. Native images are ahead-of-time compiled binaries that start in milliseconds and consume a fraction of the memory of JVM applications — often 5–10x less.

For serverless and container deployments, these characteristics are transformative. AWS Lambda functions compiled to native images can initialize in under 100ms instead of 10+ seconds. Container-dense environments can pack 5–10x more instances per node. Cold start penalties nearly disappear.

The trade-offs involve build time — native image compilation can take several minutes compared to seconds for standard JVM compilation. For development workflows, you typically run on the JVM and compile native images only for production deployments.

Micronaut's compile-time architecture makes it uniquely suited for native images. Since there's no runtime reflection or dynamic class loading, the static analysis required for native compilation succeeds without extensive configuration. Most Micronaut applications compile to native images with zero additional configuration.

CRaC (Coordinated Restore at Checkpoint) represents an alternative approach to fast startup. Instead of ahead-of-time compilation, CRaC takes a snapshot of a warmed-up JVM application and restores it nearly instantaneously when needed. This provides native image startup speeds while maintaining full JVM compatibility and avoiding native compilation limitations. Micronaut supports CRaC, giving teams flexibility in optimizing startup performance based on their deployment constraints.

Micronaut CLI & Launch

Getting started with Micronaut is straightforward. The Micronaut CLI provides scaffolding commands for creating projects, generating controllers, clients, and beans, and managing dependencies.

Micronaut Launch (https://micronaut.io/launch/) is a web-based project generator similar to Spring Initializr. Select your build tool, language, features, and cloud integrations, and Launch generates a complete project structure ready for development. This makes starting new Micronaut projects effortless — no manual configuration or dependency management required.

Micronaut Data

Micronaut Data applies the framework's compile-time philosophy to data access, generating repository implementations at compile time rather than runtime.

Support for JPA, JDBC, MongoDB, and R2DBC covers both traditional and reactive data access patterns. You define repository interfaces with query methods, and Micronaut Data generates implementations during compilation:

@JdbcRepository(dialect = Dialect.POSTGRES)
public interface UserRepository extends CrudRepository<User, Long> {

    Optional<User> findByEmail(String email);

    List<User> findByAgeGreaterThan(int age);

    @Query("SELECT * FROM users WHERE status = :status ORDER BY created_at DESC")
    List<User> findRecentActiveUsers(String status);
}

The key advantage is compile-time query validation. Micronaut Data validates your query methods against your entity model during compilation. If you reference a non-existent field or use incorrect syntax, you get a compilation error, not a runtime exception in production.

Compare this to Spring Data, which generates implementations using reflection at startup. Spring Data defers error detection to runtime, meaning invalid queries only fail when executed. Micronaut Data catches these errors at compile time, providing faster feedback and higher confidence.

The compile-time approach also contributes to Micronaut's startup performance — there's no runtime query generation or repository proxy creation. The implementations are standard bytecode ready to execute immediately.

Testing Support

Micronaut provides comprehensive testing support designed for integration testing microservices.

The @MicronautTest annotation starts a Micronaut application context for your tests with dependency injection fully available:

@MicronautTest
public class UserServiceTest {

    @Inject
    UserService userService;

    @Test
    void testUserCreation() {
        User user = userService.create("test@example.com");
        assertNotNull(user.getId());
    }
}

JUnit 5 integration is first-class, with support for parameterized tests, lifecycle callbacks, and test instance per class behavior. Testcontainers integration enables spinning up databases, message brokers, and other infrastructure for integration tests, ensuring tests run against real dependencies.

Micronaut's fast startup makes integration testing practical — even complex applications start in under a second, meaning comprehensive integration test suites remain fast enough for continuous integration pipelines.

Additional Features

Micronaut includes numerous features expected from modern frameworks:

API versioning supports multiple approaches including header-based, URI-based, and parameter-based versioning. Routes can specify version constraints, allowing smooth API evolution.

Validation through @Valid and custom @Constraint annotations integrates with Bean Validation (JSR 380). Validation occurs automatically on controller inputs, with detailed error responses.

Error handling provides customizable exception handlers, global error responses, and standardized error formats. Custom exception handlers can transform application exceptions into appropriate HTTP responses.

Security features via Micronaut Security include OAuth2, JWT, OpenID Connect, basic authentication, session-based authentication, and authorization rules. The security module integrates seamlessly with major identity providers and supports both stateless and stateful authentication patterns.

Configuration management uses YAML or properties files with support for environment-specific configurations, configuration placeholders, and type-safe configuration properties through @ConfigurationProperties.

Federation Projects — Enterprise Ecosystem

Micronaut's ecosystem extends through specialized modules addressing enterprise needs:

Micronaut Data provides compile-time repository generation for JPA, JDBC, MongoDB, and R2DBC. Micronaut Security delivers comprehensive authentication and authorization with OAuth2, JWT, and OIDC support. Integration modules cover gRPC for high-performance RPC, Kafka for event streaming, RabbitMQ for message queuing, and various SQL/NoSQL databases.

Additional modules support GraphQL, caching (Redis, EHCache, Hazelcast), scheduling, email, templating engines, and cloud-specific services. This growing ecosystem provides production-ready components while maintaining Micronaut's performance characteristics.

Detailed Comparison with Spring Boot

The fundamental difference between Micronaut and Spring Boot lies in their core architecture: reflection-based versus compile-time code generation.

Spring Boot's reflection-based approach provides flexibility and extensive third-party library compatibility. The runtime dependency injection allows dynamic bean creation, conditional loading, and runtime configuration. This flexibility comes at a cost: classpath scanning, reflection-based proxy generation, and runtime context initialization consume significant time and memory at startup.

Micronaut's compile-time approach trades some runtime flexibility for performance and predictability. By generating all dependency injection, AOP, and configuration code during compilation, Micronaut eliminates startup overhead entirely. The resulting applications start faster, consume less memory, and behave predictably.

From a philosophy perspective, Spring Boot emerged from enterprise Java tradition, evolving to support cloud deployments. Its massive ecosystem, mature tooling, and extensive third-party integration options reflect decades of evolution. Micronaut was designed specifically for modern cloud-native requirements, prioritizing efficiency and startup performance from inception.

Migration considerations are important: Micronaut offers Spring API compatibility modules that support Spring annotations like @Autowired, @Component, and @RequestMapping. This compatibility layer eases migration for Spring teams, allowing gradual adoption without rewriting all application code immediately. However, to fully benefit from Micronaut's advantages, eventually adopting its native annotations is recommended.

The learning curve for Spring developers is gentle. Most dependency injection and web controller patterns translate directly. @Singleton replaces @Component, @Inject replaces @Autowired, but the concepts remain identical.

When Spring Boot remains the better choice: Large monolithic applications where startup time is irrelevant and memory footprint isn't constrained find little benefit from Micronaut's optimizations. Teams heavily invested in the Spring ecosystem with extensive use of Spring-specific libraries, Spring Batch, Spring Integration, or niche third-party Spring extensions may face integration challenges. Projects requiring specific third-party libraries without Micronaut support should carefully evaluate compatibility before committing to migration.

Detailed Comparison with Quarkus

Micronaut and Quarkus share similar goals — fast startup, low memory consumption, cloud-native design — but approach them differently.

Core philosophy differs significantly. Quarkus prioritizes Jakarta EE and MicroProfile standards compatibility, positioning itself as the natural evolution of Java EE for cloud-native applications. Red Hat's stewardship means strong alignment with enterprise Java standards and specification compliance. Micronaut takes a lightweight, custom annotation approach, designed without legacy specification constraints. While Micronaut supports JAX-RS through extension modules, it doesn't position standards compatibility as a primary goal.

Platform focus reveals different priorities. Quarkus is heavily Red Hat/Kubernetes-focused, with exceptional integration for OpenShift, Kubernetes operators, and Red Hat's cloud offerings. Micronaut is platform-agnostic with strong serverless focus, providing first-class support for AWS Lambda, Azure Functions, and GCP Functions alongside Kubernetes deployments. This makes Micronaut particularly attractive for multi-cloud strategies and serverless-first architectures.

Language support in Micronaut extends more broadly across Java, Kotlin, and Groovy with consistent feature parity. Quarkus focuses primarily on Java with growing Kotlin support but less emphasis on alternative JVM languages.

Choosing between them often comes down to organizational context. Choose Quarkus if you're committed to Jakarta EE standards, heavily invested in Red Hat's ecosystem, or prioritize standards-based portability. Choose Micronaut if you need strong serverless support, prefer platform agnosticism, or want broader JVM language support without specification overhead.

When to Use Micronaut

Micronaut excels in specific scenarios where its architectural advantages deliver maximum value:

Serverless functions on AWS Lambda, Azure Functions, or GCP Functions benefit immensely from fast startup and low memory consumption. Sub-second cold starts make Micronaut ideal for event-driven architectures where functions must respond immediately.

Cloud-native applications with strict resource constraints find Micronaut's efficiency critical. When you're paying for memory and compute by the second, reducing memory footprint by 50–70% (approximation) directly impacts your bill.

Projects prioritizing lowest Total Cost of Ownership (TCO) gain competitive advantage through infrastructure cost reduction. Running the same workload on fewer instances with smaller resource allocations translates to real savings at scale.

Event-driven architectures requiring high message throughput benefit from Micronaut's reactive programming support and efficient threading model. Non-blocking I/O enables handling thousands of concurrent events with minimal resources.

High-throughput applications processing large request volumes appreciate the Netty-based HTTP server's performance characteristics and efficient resource utilization.

Container-dense environments where you need to maximize instance density per node see significant cost benefits. Smaller memory footprints mean more containers per host, reducing infrastructure requirements.

Greenfield microservices projects without legacy constraints can leverage Micronaut's modern architecture without migration concerns. Starting fresh with Micronaut avoids technical debt from older framework patterns.

When NOT to Use Micronaut

Honest assessment requires acknowledging scenarios where Micronaut isn't the best choice:

Large monolithic applications where startup time happens once and memory footprint isn't constrained gain minimal benefit from Micronaut's optimizations. If your application starts once per week and has gigabytes of memory available, Spring Boot's ecosystem advantages outweigh Micronaut's efficiency.

Teams deeply invested in the Spring ecosystem with extensive use of Spring-specific modules may face substantial migration costs. If you depend heavily on Spring Batch, Spring Integration, Spring Cloud Data Flow, or niche Spring extensions, evaluate integration carefully.

Projects requiring niche third-party libraries without Micronaut support should verify compatibility thoroughly. While Micronaut's ecosystem is growing, Spring's decades of development mean broader third-party library support.

When developer familiarity trumps performance needs, sticking with what your team knows may be pragmatic. If your developers are Spring experts and performance isn't a bottleneck, retraining costs might exceed efficiency benefits.

Extensive legacy integration requirements with systems expecting specific Spring behaviors may complicate adoption. While Spring compatibility modules help, some Spring-specific patterns don't translate directly.

Conclusion

Micronaut delivers the full JVM feature set without the traditional JVM performance tax. By eliminating runtime reflection through compile-time code generation, Micronaut achieves startup times and memory footprints previously requiring significant framework compromises.

The framework was designed specifically for the cloud-native and serverless era, where applications must start instantly, consume minimal resources, and scale efficiently. These characteristics directly impact operational costs and user experience in modern deployment models.

A balanced perspective is essential: Micronaut isn't a universal Spring Boot replacement. It excels in specific scenarios — serverless deployments, container-dense environments, cost-sensitive applications, and greenfield microservices — where its architectural advantages deliver measurable value. For monolithic applications, Spring-heavy ecosystems, or teams without performance constraints, Spring Boot's maturity and ecosystem remain compelling.

Key strengths that distinguish Micronaut include compile-time dependency injection eliminating reflection overhead, sub-second startup times enabling serverless viability, memory footprints 50–70% (approximation) smaller than Spring Boot, first-class reactive programming support, native cloud integrations for AWS, GCP, and Azure, and GraalVM native image support with minimal configuration.

Who should seriously consider Micronaut: Organizations deploying serverless functions requiring fast cold starts, teams building microservices with strict resource constraints, projects where infrastructure costs are significant operational expenses, greenfield applications without legacy framework commitments, and engineering teams valuing performance efficiency and modern JVM practices.

The future looks promising. As cloud costs continue rising and serverless adoption accelerates, frameworks optimized for these deployment models gain strategic importance. Micronaut's architecture positions it well for emerging patterns like edge computing, where startup performance and resource efficiency become even more critical. The framework's growing ecosystem, strong community, and backing from Object Computing and Oracle indicate continued investment and evolution.

Code Repository

rajkundalia / product-catalogue-micronaut

This is a sample product catalogue with external call mocked - with Micronaut

Product Catalog REST API

A production-ready Micronaut 4.x REST API demonstrating key framework features including Data JDBC, validation, declarative HTTP clients, resilience patterns, reactive streaming, and observability.

Features

Core Capabilities

RESTful CRUD Operations - Complete product catalog management
Micronaut Data JDBC - Compile-time repository generation with H2 database
Bean Validation - Request validation using Hibernate Validator
Declarative HTTP Client - Type-safe HTTP client with annotations
Resilience Patterns - Retry, circuit breaker, and fallback mechanisms
Server-Sent Events (SSE) - Reactive streaming for real-time updates
Health Checks & Metrics - Production-ready observability endpoints
Global Exception Handling - Consistent error responses
Comprehensive Testing - Unit, integration, and data layer tests

Technology Stack

Framework: Micronaut 4.x
Language: Java 17
Build Tool: Gradle
Database: H2 (in-memory)
ORM: Micronaut Data JDBC
Testing: JUnit 5, Mockito, AssertJ

Setup & Installation

Prerequisites

Java 17 or higher
Gradle 7.0+

Build the Project

…

View on GitHub

Development Experience

Micronaut seemed very clumsy with Maven.
Also, use the https://micronaut.io/launch for starting, works better.
It was crazy getting swagger to work — follow #2 for generating a project, that would make your life easier.
Micronaut community is not as big as Quarkus.
Build takes time but run like Quarkus is super quick.

Quarkus: Revolutionizing Java Development for the Cloud-Native Era

Raj Kundalia — Fri, 26 Sep 2025 19:07:18 +0000

In the rapidly evolving landscape of cloud-native development, Java has faced criticism for being slow to start and memory-hungry compared to newer technologies. Enter Quarkus, a framework that promises to make Java “supersonic and subatomic” while maintaining the rich ecosystem and developer experience Java developers love.

If you’re coming from Spring Boot or other traditional Java frameworks, this comprehensive guide will help you understand what Quarkus brings to the table and whether it’s the right choice for your next project.

Sample project link with a readme: https://github.com/rajkundalia/url-shortener-quarkus

1. Introduction to Quarkus

Quarkus is a Kubernetes-native Java stack tailored for OpenJDK HotSpot and GraalVM, crafted from the best-of-breed Java libraries and adhering to Jakarta EE and MicroProfile standards. Developed by Red Hat, it’s designed to make Java a leading platform in Kubernetes and serverless environments by dramatically reducing startup times and memory consumption.

The “Supersonic Subatomic Java” tagline isn’t just marketing — it reflects Quarkus’s core promise:

Supersonic: Lightning-fast startup times (under 100ms for many applications)
Subatomic: Minimal memory footprint (as low as 12MB for simple applications)

Key Benefits

Performance: Applications start in milliseconds rather than seconds, with memory usage reduced significantly compared to traditional frameworks. This translates to significant cost savings in cloud environments.
Developer Productivity: Live coding capabilities, dev UI, and tooling make development faster and more enjoyable.
Cloud-Native First: Built with containers, Kubernetes, and serverless in mind. Quarkus generates Kubernetes manifests, Docker files, and provides native compilation support out of the box.

Target Use Cases

Quarkus excels in:

Microservices architectures
Serverless applications that need instant startup
Cloud-native applications requiring efficiency
High-throughput, low-latency systems leveraging reactive programming

What sets Quarkus apart is its compile-time optimization approach. Unlike traditional frameworks (e.g., Spring Boot) that rely on runtime classpath scanning and reflection, Quarkus shifts this work to build time, eliminating overhead at runtime.

2. Core Architecture & Development Experience

Dependency Injection

Quarkus uses CDI (Contexts and Dependency Injection) 2.0, specifically the ArC implementation. All DI metadata is processed at build time:

@ApplicationScoped
public class GreetingService {
    public String greeting(String name) {
        return "Hello " + name;
    }
}

@Path("/hello")
public class GreetingResource {
    @Inject
    GreetingService service;

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        return service.greeting("World");
    }
}

Extension Ecosystem

Extensions augment application bytecode at build time, moving runtime operations to compile time. Popular extensions include:

quarkus-resteasy-reactive — REST endpoints
quarkus-hibernate-orm-panache — database operations
quarkus-smallrye-openapi — API documentation
quarkus-micrometer — metrics

Configuration Management

Configuration with application.properties is simple:

# Database configuration
quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost/mydatabase

# Profiles
%dev.quarkus.log.level=DEBUG
%test.quarkus.datasource.jdbc.url=jdbc:h2:mem:test
%prod.quarkus.log.level=WARN

Hot Reloading & Live Coding

Run mvn quarkus:dev for instant updates:

Code recompiles automatically
Config updates apply instantly
Continuous testing runs automatically

DevUI

At http://localhost:8080/q/dev/, you’ll find:

Extension management
Database console
OpenAPI browser
Config editor
Health checks & metrics

Testing

Testing is seamless with @QuarkusTest:

@QuarkusTest
public class GreetingResourceTest {
    @Test
    public void testHelloEndpoint() {
        given()
            .when().get("/hello")
            .then()
            .statusCode(200)
            .body(is("Hello World"));
    }
}

3. Performance & Technology Stack

JVM vs Native Mode

JVM Mode: Faster than Spring Boot (0.8–1.5s startup, 50–100MB memory), build time ~45s
Native Mode: GraalVM compilation gives <100ms startup, ~12–20MB memory, tiny container images, build time ~2–5 minutes [Not tried, obtained from Internet]

Trade-off: Native builds take longer (2–5 minutes).

Performance Comparison

Aspect	Quarkus (JVM)	Spring Boot
Startup Time	0.8–1.5s	2.5–4s
Memory Usage	50–100MB	250–500MB
Build Time	~45s	~30s

Note: Please note, these are approximate numbers.

Technology Stack

Eclipse Vert.x → Reactive, event-driven foundation
Eclipse MicroProfile → Enterprise features (config, health, metrics, security, tracing)
GraalVM Integration → AOT compilation, dead code elimination, pre-initialized structures

4. Cloud-Native & Production Features

Container-First

Quarkus auto-generates optimized Dockerfiles:

FROM registry.access.redhat.com/ubi8/openjdk-11:1.3
COPY target/quarkus-app/ /deployments/

Native images shrink containers to under 100MB.

Kubernetes Integration

Auto-generated manifests
ConfigMaps & Secrets support
Health/readiness probes
Operator integration

Observability

Health endpoints
Metrics via Micrometer/Prometheus
Tracing with Jaeger/OpenTracing
Structured logging

Security

OIDC, OAuth2, JWT
RBAC via annotations
Keycloak integration

5. Framework Comparisons

Aspect	Quarkus	Spring Boot
Startup Time	0.8–1.5s (JVM), <0.1s (Native)	2.5–4s
Memory Usage	50–100MB (JVM), 10–30MB (Native)	250–500MB
Architecture	Build-time optimized	Runtime reflection/scanning
Ecosystem	Growing, EE/MicroProfile aligned	Mature, massive
Native Compilation	Core feature	Experimental (Spring Native)
Reactive	Built-in (Vert.x)	Optional

Please note, numbers can vary.

Choose Quarkus when you need performance, microservices, serverless, or reactive programming.

Quarkus vs Micronaut

Quarkus: Build-time optimization, enterprise-ready (MicroProfile), Red Hat backing
Micronaut: Compile-time DI, small footprint, broader language support (Kotlin, Groovy)

Essential Extensions

quarkus-resteasy-reactive-jackson — REST + JSON
quarkus-hibernate-orm-panache — ORM
quarkus-jdbc-postgresql — Database
quarkus-smallrye-health — Health checks
quarkus-micrometer-registry-prometheus — Metrics
quarkus-container-image-docker — Container builds

Learning Resources

Official Guides
GitHub
Community: Zulip, Stack Overflow, GitHub Discussions
Red Hat training for enterprises
https://code.quarkus.io/?e=rest
https://www.logicmonitor.com/blog/quarkus-vs-spring — a good read
https://medium.com/sysco-labs/quarkus-a-supersonic-subatomic-java-e7c3ba510d79
https://maddevs.io/blog/spring-boot-vs-quarkus/

Conclusion

Quarkus represents a significant evolution in Java development — addressing performance bottlenecks while preserving Java’s strengths.

With sub-second startups, tiny memory usage, and enterprise-ready features, it’s tailor-made for microservices, serverless, and cloud-native apps. Backed by Red Hat and a growing community, Quarkus is positioned for long-term success.

Whether starting fresh or migrating from Spring Boot, Quarkus deserves serious consideration for your next Java application.

Learning TypeScript by Building a Markdown Editor

Raj Kundalia — Wed, 27 Aug 2025 11:47:57 +0000

When I wanted to learn TypeScript, I decided not to just read the docs — I built a small project with the help of an LLM: a Markdown Editor using Next.js and TypeScript.

👉 Code: Markdown Editor

My Observations

UI development is not very intuitive to me, but since TypeScript code is less verbose than Java, generating fixes and iterating with an LLM felt simpler.
I provided the final prompt I gave an LLM (Claude) to generate the code. It took some iterations to refine.
If I had just built something in TypeScript without UI, I would have been more comfortable. Adding a UI layer made it more complex.
I tried to understand most of the code I committed, but I wouldn’t call myself an expert yet.
Will I experiment with UI using LLMs again? Definitely. It was a lot of fun.

Note: You can check out the prompt and the github repository. The rest might bore you, so feel free to skip it.

The Prompt I Used

Here’s the exact prompt that generated the project:

🚀 Prompt: Next.js + TypeScript Markdown Editor with Toolbar

Build a Markdown Editor using Next.js + TypeScript + TailwindCSS with a live preview.

✅ Features to Implement

Core Editor
- Two-pane layout:
  - Left → Markdown input (<textarea>)
  - Right → Live preview (rendered using react-markdown)
- Use CSS Grid/Flexbox for clean split layout

Toolbar (Formatting Buttons)
- Undo & Redo → rely on browser <textarea> undo/redo stack
- Bold → Inserts **selected text**
- Italic → Inserts *selected text*
- Headings H1–H6 → Inserts #, ##, … ######
- Unordered List → Inserts - item
- Ordered List → Inserts 1. item
- Blockquote → Inserts > quote
- Code Block → Inserts fenced triple backticks (```

)
- Table → Inserts a Markdown table skeleton
- URL/Link → Inserts [text](url)

Markdown Rendering
- Use react-markdown with remark-gfm for:
  - Tables
  - Lists
  - Strikethrough
  - URLs
- Add syntax highlighting in code blocks with react-syntax-highlighter 
  (or Prism.js/Highlight.js)

Other Features
- Persistence → Save editor state in localStorage and restore on reload
- File Import/Export →
  - Export current content as .md file
  - Import .md file via <input type="file"> and load into editor
- Light/Dark Theme → Toggle using Tailwind dark mode

🔑 Requirements
- Must be Next.js + TypeScript
- Use TailwindCSS for styling
- Strong typing everywhere (e.g., React.ChangeEvent<HTMLTextAreaElement>)
- Toolbar actions should insert Markdown at the cursor position
- Modular code (components: Editor, Preview, Toolbar, ThemeToggle)
- Add comments where necessary
- Include a README.md with setup instructions
- Give a working project

Why TypeScript?

Coming from Java, here’s what I missed in plain JavaScript:

Types → Catching errors before runtime
Interfaces → Defining object shapes
Tooling → Better autocomplete and refactoring

TypeScript provides these while staying close to JavaScript.

Key TypeScript Features

Here are the beginner-friendly features:

1. Type Annotations


ts
let username: string = "Raj";
let age: number = 30;
let isActive: boolean = true;

// Prevents mistakes like:
age = "thirty"; // ❌ error

2. Functions with Types


ts
function add(a: number, b: number): number {
  return a + b;
}

// TypeScript catches:
add(2, "3"); // ❌ error

3. Arrays & Objects


ts
let numbers: number[] = [1, 2, 3];

let user: { id: number; name: string } = {
  id: 1,
  name: "Raj"
};

4. Interfaces


ts
interface User {
  id: number;
  name: string;
}

let u: User = { id: 1, name: "Alice" };

5. Generics


ts
function identity<T>(arg: T): T {
  return arg;
}

let num = identity<number>(10);   // returns 10
let str = identity<string>("Hi"); // returns "Hi"

This clicked quickly for me since it’s similar to Java Generics.

👉 Check out the full project here:
GitHub Repo – Markdown Editor with TypeScript