Forem: Manjunath G

Building a Reliable LangGraph Workflow: Plan-Execute-Validate (PEV), Automated Retries, and MCP Integration in One Template

Manjunath G — Fri, 10 Apr 2026 21:48:37 +0000

Most LangGraph agent examples stop at "hello world." A basic planner, an executor that calls a search tool, and a printed final result. That is fine for a demo, but the moment you try to ship an autonomous agent in production, you immediately run into the same four problems:

How do I prevent an agent from silently hallucinating missing details between steps?
How do I build a self-correcting loop that catches bad tool outputs before they poison the final answer?
How do I optimize costs so I'm not using expensive reasoning models for simple bookkeeping?
How do I safely integrate this orchestration layer with enterprise data and capabilities?

I spent time solving all of these in a real production environment—a regulated life sciences platform automating scientific workflows over tens of millions of research records—and packaged the result as a template anyone can fork: langgraph-plan-execute-validate.

This post explains the decisions behind each piece.

What is PEV and Why Does it Need a Production Template?

The standard LangGraph Plan-and-Execute pattern has two nodes: Plan and Execute. The gap between that two-node quickstart and a deployable, reliable service is significant.

In production, execution quality is not binary. An agent can technically complete a step while producing output that is incomplete, hallucinated, or missing a critical detail. Without a quality gate, those failures propagate silently to the next step.

You need:

A structured Validator to score outputs
A deterministic Router to handle retries and replanning
An Audit Trail to debug what actually happened
A cost-effective multi-model strategy

The template gives you all of this wired together and working out of the box.

Feature	Standard plan-execute	This template
Planning node	✓	✓
Execution node with tool calls	✓	✓
Validation + confidence score	✗	✓ 0.0 – 1.0
Per-step retry with feedback injection	✗	✓ configurable
Automatic replanning on exhausted retries	✗	✓ with failure context
Multi-model cost optimisation	✗	✓ haiku/sonnet split
Full audit trail (every attempt)	✗	✓ operator.add accumulator
Structured outputs (no string parsing)	✗	✓ Pydantic models

The Architecture

The graph adds two nodes to the standard pattern: a Validator that scores every step output, and a Router (pure Python, no LLM) that decides what happens next.

START
  │
planner ◄──────────────────────────────────────── (replan)
  │
executor ◄─────────────────────────────── (retry)
  │
validator
  │
router ─── score ≥ threshold, more steps ──► executor (next step)
         ─── score ≥ threshold, last step  ──► END (complete)
         ─── score < threshold, retry left ──► executor (retry)
         ─── score < threshold, retry gone ──► planner (replan)
         ─── all limits exhausted           ──► END (failed)

[Figure: System Overview]

[Figure: State Machine]

Problem 1: Silent Failures in 2-Node Workflows

The naive approach is passing the output of the Executor directly to the next step. If Step 1 returns "I couldn't find the data," the Executor often just moves to Step 2 anyway, hallucinating the missing context.

The template solves this by introducing a third node: a structured Validator that scores every step output (0.0–1.0) against the original intent.

cfg = PEVConfig(
    # Quality gate
    pass_threshold = 0.80,    # score >= this -> step passes

    # Loop guards
    max_retries  = 2,         # retries per step before escalating to replan
    max_replans  = 1,         # full replanning cycles before marking failed
)

If the score is below the pass_threshold, the system doesn't move forward—it recovers. The validator produces a structured output with both a numeric score and a one-sentence explanation, which gets injected into the next retry's prompt.
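A minimal sketch of what that structured output could look like. It uses a stdlib dataclass as a stand-in for the template's Pydantic model, and the names ValidationResult, passes, and retry_prompt are hypothetical rather than taken from the template's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    """Hypothetical shape of the validator's structured output."""
    score: float      # confidence in [0.0, 1.0]
    feedback: str     # one-sentence explanation

    def __post_init__(self):
        # Clamp out-of-range scores so the router always compares like with like.
        clamped = min(1.0, max(0.0, float(self.score)))
        object.__setattr__(self, "score", clamped)

    def passes(self, pass_threshold: float) -> bool:
        return self.score >= pass_threshold


def retry_prompt(step: str, result: ValidationResult) -> str:
    """Build the prompt for the next attempt, carrying the feedback forward."""
    return (
        f"Step: {step}\n"
        f"Previous attempt scored {result.score:.2f}. "
        f"Validator feedback: {result.feedback}"
    )


result = ValidationResult(score=0.55, feedback="Missing Y; result was too generic.")
print(result.passes(0.80))   # False
print(retry_prompt("Search for X", result))
```

Keeping the score numeric and the feedback short means the pass/fail decision stays a plain comparison, while the text is only ever used as prompt context.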

Problem 2: Automated Retries and Replanning

When an execution fails validation, throwing an exception or failing the entire run is brittle. The LLM needs a chance to correct itself.

The template uses a pure Python Router node that implements deterministic recovery logic. When a step scores below the threshold, the Router increments retry_count and routes back to the Executor, which injects the Validator's feedback into the next prompt:

# The Router decides how to recover — no LLM involved
if score >= cfg.pass_threshold:
    next_idx = idx + 1
    if next_idx >= len(plan):
        return {"status": "complete", "_next": "complete"}
    return {
        "current_step_idx": next_idx,
        "retry_count": 0,
        "status": "executing",
        "_next": "execute",
    }

if retry_count < cfg.max_retries:
    return {
        "retry_count": retry_count + 1,
        "status": "executing",
        "_next": "retry",
    }

if replan_count < cfg.max_replans:
    return {"status": "planning", "_next": "replan"}

# All recovery options exhausted
return {
    "status": "failed",
    "error": f"Step '{plan[idx]}' failed after {retry_count} retries and {replan_count} replans.",
    "_next": "failed",
}

Design note: LangGraph conditional edges are read-only—they can't update state. The Router is a real node so it can both update current_step_idx / retry_count AND write the routing decision to state["_next"]. A single _dispatch conditional edge then reads that field to pick the next node.

Once max_retries is exhausted, the Router escalates to the Planner, which regenerates the entire remaining plan using the failure context. No step advances until its output passes the quality gate.
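To make the node-plus-dispatch split concrete, here is a self-contained sketch in plain Python with no LangGraph dependency. RouterConfig, route, dispatch, and the DESTINATIONS map are hypothetical names, and the state keys "plan", "score", and "replan_count" are assumptions based on the variables used in the router fragment.

```python
from dataclasses import dataclass


@dataclass
class RouterConfig:
    # Hypothetical stand-in for the three PEVConfig fields the router reads.
    pass_threshold: float = 0.80
    max_retries: int = 2
    max_replans: int = 1


def route(state: dict, cfg: RouterConfig) -> dict:
    """Return a partial state update, including the routing decision in `_next`."""
    plan, idx = state["plan"], state["current_step_idx"]
    if state["score"] >= cfg.pass_threshold:
        if idx + 1 >= len(plan):
            return {"status": "complete", "_next": "complete"}
        return {"current_step_idx": idx + 1, "retry_count": 0,
                "status": "executing", "_next": "execute"}
    if state["retry_count"] < cfg.max_retries:
        return {"retry_count": state["retry_count"] + 1,
                "status": "executing", "_next": "retry"}
    if state["replan_count"] < cfg.max_replans:
        return {"status": "planning", "_next": "replan"}
    return {"status": "failed", "_next": "failed"}


# The dispatch function only reads the decision; it never mutates state.
DESTINATIONS = {"execute": "executor", "retry": "executor",
                "replan": "planner", "complete": "END", "failed": "END"}


def dispatch(state: dict) -> str:
    return DESTINATIONS[state["_next"]]


state = {"plan": ["Search for X", "Summarise X"], "current_step_idx": 0,
         "score": 0.55, "retry_count": 0, "replan_count": 0}
state.update(route(state, RouterConfig()))
print(state["_next"], "->", dispatch(state))   # retry -> executor
```

Because route is a pure function of state and config, every branch can be unit-tested without an LLM or a compiled graph.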

[Figure: Request Lifecycle]

Problem 3: Multi-Model Cost Optimization

Routing every trivial check and structured output through your most capable (and expensive) model destroys the ROI of autonomous agents.

The template routes three roles (planner, executor, validator) across two model tiers, set in the configuration:

cfg = PEVConfig(
    # Cheap model for structured JSON planning
    planner_model   = "claude-haiku-4-5-20251001",

    # Capable model for complex tool calls + reasoning
    executor_model  = "claude-sonnet-4-6",

    # Cheap model for scoring + generating one sentence of feedback
    validator_model = "claude-haiku-4-5-20251001",
)

The planner and validator only produce structured JSON—a cheaper model handles this perfectly. The executor is where reasoning and tool use happen, so we invest in the capable model there.

Cheap  (claude-haiku  ~$0.25 / 1M tokens)
  ├── Planner    — JSON output only
  └── Validator  — score + one sentence

Capable (claude-sonnet  ~$3 / 1M tokens)
  └── Executor   — tool calls + reasoning

This design cuts per-run costs by ~60–70% while maintaining high final quality. A typical 3-step task costs ~$0.01 vs ~$0.027 if you route everything through sonnet.

Problem 4: MCP Integration for Enterprise Capabilities

Reliability doesn't stop at orchestration; it extends to your capabilities. Standard tool calling often means hardcoding database credentials or API integrations directly into the agent code.

The template is designed to pair with the FastMCP Production Template. By separating the Orchestrator (PEV) from the Tools (MCP), you can expose enterprise databases and internal APIs securely via a standalone FastMCP server, and pull them into the PEV loop with just a few lines:

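from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from pev import create_pev_graph, PEVConfig

# Launch an MCP server over stdio (e.g., built with fastmcp-production-template)
server = StdioServerParameters(command="uv", args=["run", "mcp_server.py"])

async def main():  # call with asyncio.run(main())
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            mcp_tools = await load_mcp_tools(session)  # needs an open session
            # Orchestrate with PEV's quality gates; run the graph while the session is open
            graph = create_pev_graph(PEVConfig(tools=mcp_tools, pass_threshold=0.85))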

This is the Full-Stack AI architecture: PEV acts as the Brain (reasoning, planning, validation) and MCP acts as the Hands (standardized, secure access to enterprise data). You can update tools and data integrations without redeploying your core agent logic.

The Audit Trail: Observability at the Step Level

When something goes wrong with an AI-driven workflow, you need to know exactly where the agent struggled and how many attempts it took.

The template preserves every attempt in step_results via operator.add. Nothing is ever overwritten:

result["step_results"]
# [
#   StepResult(step="Search for X",  score=0.55, attempts=1, feedback="Missing Y — result was too generic"),
#   StepResult(step="Search for X",  score=0.88, attempts=2, feedback="Good. All required details present."),
#   StepResult(step="Summarise X",   score=0.92, attempts=1, feedback="Complete and well-structured."),
# ]

This is the operational signal that matters. It proves the system is catching failures and self-correcting before returning data to the user. You can see exactly where the agent struggled, what feedback it received, and how many attempts each step took.

The state is defined with Annotated[list[StepResult], operator.add] so LangGraph handles the accumulation automatically:

import operator
from typing import Annotated, TypedDict

class PEVState(TypedDict):
    # operator.add means append-only — nothing is ever overwritten
    step_results: Annotated[list[StepResult], operator.add]
    ...
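The accumulation can be imitated in plain Python to see what the reducer does. This is a simulation of the merge behaviour, not LangGraph's internals; apply_update is a hypothetical helper, and the StepResult fields follow the audit-trail example above.

```python
import operator
from dataclasses import dataclass


@dataclass
class StepResult:
    # Field names follow the audit-trail example above.
    step: str
    score: float
    attempts: int
    feedback: str


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update the way an operator.add reducer would:
    the accumulated list and the new list are concatenated, never replaced."""
    merged = dict(state)
    merged["step_results"] = operator.add(state["step_results"], update["step_results"])
    return merged


state = {"step_results": []}
state = apply_update(state, {"step_results": [StepResult("Search for X", 0.55, 1, "Too generic")]})
state = apply_update(state, {"step_results": [StepResult("Search for X", 0.88, 2, "Good")]})

# Which steps needed more than one attempt?
retried = sorted({r.step for r in state["step_results"] if r.attempts > 1})
print(len(state["step_results"]), retried)   # 2 ['Search for X']
```

A node therefore returns only the new entries (a one-element list), and the failed first attempt stays in the trail next to the successful retry.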

Getting Started

git clone https://github.com/ManjunathGovindaraju/langgraph-plan-execute-validate.git
cd langgraph-plan-execute-validate
uv sync
cp .env.example .env   # add your ANTHROPIC_API_KEY

Five lines to run your first agent:

from pev import create_pev_graph, initial_state, PEVConfig

graph = create_pev_graph(PEVConfig(pass_threshold=0.85))
result = graph.invoke(initial_state("Research the top 3 vector databases"))

print(result["status"])       # "complete"
print(result["step_results"]) # scored audit trail for every step

Run the included examples:

python examples/research_agent.py        # web search + validate
python examples/code_review_agent.py     # strict threshold, shows retry flow
python examples/data_analysis_agent.py   # no tools, LLM reasoning only

The template also includes a benchmark_reliability.py script that uses a "Flaky Tool" to show the Validator catching generic results on every benchmark run and forcing a retry that retrieves high-fidelity data:

make example-benchmark

Full Configuration Reference

from langchain_community.tools.tavily_search import TavilySearchResults
from pev import PEVConfig

cfg = PEVConfig(
    # Model routing
    planner_model   = "claude-haiku-4-5-20251001",
    executor_model  = "claude-sonnet-4-6",
    validator_model = "claude-haiku-4-5-20251001",

    # Quality gate
    pass_threshold = 0.80,

    # Loop guards
    max_retries  = 2,
    max_replans  = 1,

    # Tools (planner and validator never see these)
    tools = [TavilySearchResults(max_results=3)],
)

Parameter	Default	Description
`planner_model`	`claude-haiku-4-5-20251001`	Structured JSON output only
`executor_model`	`claude-sonnet-4-6`	Tool calls + reasoning
`validator_model`	`claude-haiku-4-5-20251001`	Scoring only
`pass_threshold`	`0.80`	Minimum score [0.0–1.0] for a step to pass
`max_retries`	`2`	Retries per step before triggering replan
`max_replans`	`1`	Full replanning cycles before marking failed
`tools`	`[]`	LangChain tools available to the executor

Project Structure

langgraph-plan-execute-validate/
├── src/pev/
│   ├── __init__.py          # Public API: create_pev_graph, initial_state, PEVConfig
│   ├── graph.py             # StateGraph wiring, router node, _dispatch edge
│   ├── state.py             # PEVState TypedDict, StepResult, Status
│   ├── config.py            # PEVConfig dataclass with validation
│   ├── prompts.py           # All prompt templates (one place, easy to tune)
│   └── nodes/
│       ├── planner.py       # Structured output, replan-aware
│       ├── executor.py      # Tool-call loop, feedback injection on retry
│       └── validator.py     # Confidence scoring, audit trail append
├── examples/
│   ├── research_agent.py
│   ├── code_review_agent.py
│   ├── data_analysis_agent.py
│   └── mcp_agent.py
├── tests/
│   ├── test_planner.py
│   ├── test_executor.py
│   ├── test_validator.py
│   ├── test_retry_replan.py # Router decision tree — 12 routing scenarios
│   └── test_graph.py
└── docs/
    └── architecture.md      # Full architecture with Mermaid diagrams

Testing

The test suite is designed to run without API calls in CI:

# Unit tests — no API calls, runs in ~5 seconds
uv run pytest tests/ -m "not slow" -v

# Integration tests — requires ANTHROPIC_API_KEY
uv run pytest tests/ -m slow -v

The router decision tree is the most critical test target; test_retry_replan.py covers every router branch across 12 routing scenarios:

Test file	What it covers
`test_planner.py`	First-plan vs replan, state resets, step injection
`test_executor.py`	Context injection, retry feedback, tool-call loop
`test_validator.py`	Score/feedback writing, score clamping, audit trail
`test_retry_replan.py`	Every router branch — 12 routing scenarios
`test_graph.py`	Config validation, graph compilation, `initial_state`

What This Is Not

This template is opinionated about the things that are always true in production agent workflows: quality gates, retry determinism, audit trails, and cost optimization. It does not make choices about your domain logic or the specific tools you expose.

Fork it, define your domain-specific tools (or load them via MCP), tune the validator prompts in prompts.py for your use case, and you have a production-grade orchestration engine without building the reliability layer from scratch.

github.com/ManjunathGovindaraju/langgraph-plan-execute-validate

If you found this useful, the companion post on the MCP side of this architecture is here: Building a Production-Ready MCP Server: Async PostgreSQL, OpenTelemetry, and Kubernetes in One Template

Building a Production-Ready MCP Server: Async PostgreSQL, OpenTelemetry, and Kubernetes in One Template

Manjunath G — Fri, 10 Apr 2026 04:01:50 +0000

GitHub: fastmcp-production-template — fork, rename the tools, ship.

Over the past year I built and deployed 20+ MCP servers in production inside a regulated life sciences environment — powering AI agents that search 57M+ research records, automate scientific workflows, and surface real-time data to LLMs via 150+ registered tools.

Every new server started the same way: copy the FastMCP quickstart, then spend days re-solving the same four problems:

Async database connections that deadlock under concurrent agent calls
No guardrails against prompt injection via unauthorized tool invocation
Zero observability — metrics defined but never actually recorded
Kubernetes deployments cobbled together from unrelated examples

After the third server, I extracted the patterns that actually held up in production and built this template. This is not a toy demo. Fork it, rename the tools to match your domain, and ship.

What Is MCP and Why Does It Need a Production Template?

The Model Context Protocol (MCP) is an open standard that lets AI agents call external tools — databases, APIs, internal services — via a structured JSON-RPC interface. FastMCP is the Python framework that implements it.

Most MCP examples stop at "it works on localhost." The moment you move to production you hit the same set of problems. This template solves all of them.

Problem 1: Async PostgreSQL Connection Pooling

Under concurrent agent load, naive database connections either exhaust the pool or serialize requests through a single connection. The fix is a bounded asyncpg pool initialized once at server startup and shared across all tool calls.

from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(server: FastMCP):
    await db_pool.initialize()   # creates the asyncpg pool
    set_pool(db_pool)            # exposes it to tools via module-level singleton
    yield
    await db_pool.close()        # graceful drain on shutdown

The pool is exposed via a module-level singleton (db/pool.py), so tool functions never need to thread connection state through function arguments.

class DatabasePool:
    async def fetch(self, query: str, *args) -> list[dict]:
        async with self.acquire() as conn:
            rows = await conn.fetch(query, *args)
            return [dict(row) for row in rows]

For paginated search tools, asyncio.gather runs the data fetch and the count query concurrently, each on its own pooled connection, which cuts round-trip time roughly in half:

results, total = await asyncio.gather(
    db.fetch(sql, *params),
    db.fetchval(count_sql, *count_params),
)
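The effect is easy to reproduce without a database. In this sketch fake_fetch and fake_count are hypothetical stand-ins for db.fetch and db.fetchval, with asyncio.sleep simulating the network round trip.

```python
import asyncio
import time


async def fake_fetch(delay: float) -> list[dict]:
    # Stand-in for db.fetch(...): the sleep simulates a network round trip.
    await asyncio.sleep(delay)
    return [{"id": 1}, {"id": 2}]


async def fake_count(delay: float) -> int:
    # Stand-in for db.fetchval(...).
    await asyncio.sleep(delay)
    return 2


async def sequential(delay: float):
    rows = await fake_fetch(delay)
    total = await fake_count(delay)
    return rows, total


async def concurrent(delay: float):
    # Both coroutines are in flight at once; gather returns results in argument order.
    rows, total = await asyncio.gather(fake_fetch(delay), fake_count(delay))
    return rows, total


async def timed(coro):
    start = time.monotonic()
    result = await coro
    return result, time.monotonic() - start


async def main():
    _, t_seq = await timed(sequential(0.05))
    result, t_con = await timed(concurrent(0.05))
    print(f"sequential {t_seq:.2f}s, concurrent {t_con:.2f}s")
    return result


rows, total = asyncio.run(main())
```

With a real pool the saving only materialises when a free connection is available for each query, since a single connection executes one query at a time.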

Pool configuration (DB_POOL_MIN=5, DB_POOL_MAX=20) comes from environment variables via Pydantic Settings. The min-5 floor means warm connections are always available; the max-20 ceiling prevents runaway connection exhaustion under burst load.
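As a rough illustration of reading and sanity-checking those two values, here is a stdlib-only sketch. It deliberately swaps Pydantic Settings for a plain dataclass so it runs without dependencies; PoolSettings and from_env are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    min_size: int
    max_size: int

    @classmethod
    def from_env(cls, env=None) -> "PoolSettings":
        env = os.environ if env is None else env
        # Defaults mirror the values quoted above.
        min_size = int(env.get("DB_POOL_MIN", "5"))
        max_size = int(env.get("DB_POOL_MAX", "20"))
        if not 0 <= min_size <= max_size:
            raise ValueError(
                f"DB_POOL_MIN ({min_size}) must be between 0 and DB_POOL_MAX ({max_size})"
            )
        return cls(min_size=min_size, max_size=max_size)


settings = PoolSettings.from_env({})
print(settings)   # PoolSettings(min_size=5, max_size=20)
```

The two fields map onto the min_size and max_size arguments of asyncpg.create_pool, so a bad pair is rejected before any connection is opened.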

Problem 2: Prompt Injection via Tool Invocation

A malicious prompt can ask an LLM to call internal debug tools, admin endpoints, or anything the MCP server exposes. The fix is an explicit allowlist — only tools on the list can be invoked.

# config/allowlist.yaml
allowed_tools:
  - search_records
  - get_record_detail
  - get_statistics
  # get_pool_status intentionally omitted — health is always reachable

The list is loaded once at startup and enforced via a decorator:

import logging
from functools import wraps

logger = logging.getLogger(__name__)

def require_allowlist(tool_name: str):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            if not is_allowed(tool_name):
                logger.warning(f"Blocked access to tool '{tool_name}' — not in allowlist")
                raise PermissionError(
                    f"Tool '{tool_name}' is not permitted."
                )
            return await func(*args, **kwargs)
        return wrapper
    return decorator

Decorator order matters. The @instrument_tool decorator (introduced in the observability section below) must be the outermost layer so that blocked calls are counted in metrics — giving you visibility into prompt injection attempts, not just successful calls:

@instrument_tool("search_records")   # outermost — records ALL attempts including blocked
@require_allowlist("search_records") # inner — raises PermissionError if not on list
async def search_records(query: str, limit: int = 20) -> dict:
    ...
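The ordering effect can be checked with two simplified decorators. count_calls and check_allowlist below are hypothetical stand-ins for @instrument_tool and @require_allowlist, using in-memory dicts instead of OpenTelemetry counters.

```python
import asyncio
import functools

ALLOWED = {"search_records"}          # hypothetical in-memory allowlist
CALLS = {}                            # tool name -> attempts seen
ERRORS = {}                           # tool name -> attempts that raised


def count_calls(tool_name):
    """Simplified stand-in for @instrument_tool: counts attempts and errors."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            CALLS[tool_name] = CALLS.get(tool_name, 0) + 1
            try:
                return await func(*args, **kwargs)
            except Exception:
                ERRORS[tool_name] = ERRORS.get(tool_name, 0) + 1
                raise
        return wrapper
    return decorator


def check_allowlist(tool_name):
    """Simplified stand-in for @require_allowlist."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            if tool_name not in ALLOWED:
                raise PermissionError(f"Tool '{tool_name}' is not permitted.")
            return await func(*args, **kwargs)
        return wrapper
    return decorator


@count_calls("debug_dump")        # outermost: sees the attempt before the block
@check_allowlist("debug_dump")    # inner: raises before the body runs
async def debug_dump() -> dict:
    return {"secret": True}


async def attempt():
    try:
        await debug_dump()
    except PermissionError:
        return "blocked"
    return "allowed"


print(asyncio.run(attempt()), CALLS, ERRORS)
# blocked {'debug_dump': 1} {'debug_dump': 1}
```

Swapping the two decorators would raise PermissionError before the counting wrapper ever runs, so the blocked attempt would leave no trace in either counter.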

SQL injection is prevented at two levels. All user-supplied values use asyncpg parameterized queries ($1, $2). Column names used in dynamic queries are validated against a hardcoded allowlist before the query is constructed:

allowed_filter_cols = {"status", "type", "category"}
if filters:
    invalid = set(filters) - allowed_filter_cols
    if invalid:
        raise ValueError(f"Invalid filter column(s): {invalid}")
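Putting the two levels together, a query builder might look like the sketch below. build_search_query and the table name records are hypothetical; the point is that column names only come from the allowlist while values only travel as $n parameters.

```python
ALLOWED_FILTER_COLS = {"status", "type", "category"}


def build_search_query(filters: dict, limit: int = 20):
    """Return (sql, params). Column names come from the allowlist;
    values only ever travel as $n parameters."""
    invalid = set(filters) - ALLOWED_FILTER_COLS
    if invalid:
        raise ValueError(f"Invalid filter column(s): {sorted(invalid)}")

    clauses, params = [], []
    for col in sorted(filters):               # sorted for a stable query string
        params.append(filters[col])
        clauses.append(f"{col} = ${len(params)}")

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    params.append(limit)
    sql = f"SELECT * FROM records{where} LIMIT ${len(params)}"
    return sql, params


sql, params = build_search_query({"status": "active", "type": "assay"})
print(sql)      # SELECT * FROM records WHERE status = $1 AND type = $2 LIMIT $3
print(params)   # ['active', 'assay', 20]
```

The returned pair can be passed straight to a call such as db.fetch(sql, *params), matching the call shape used in the pagination example.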

Problem 3: Observability

This is the section most MCP templates get wrong — and where I spent the most time getting it right.

The naive approach is to define OpenTelemetry metrics in a setup function and assume they get populated. They don't. Defining a counter and incrementing a counter are two separate things, and many templates (including early versions of this one) only do the first half.

The fix: `@instrument_tool`

Every tool call needs to be wrapped in a decorator that records the metrics. This decorator is placed as the outermost layer on each tool function:

# src/server/observability/instrument.py
import functools
import time

from .context import get_telemetry

def instrument_tool(tool_name: str):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            tel = get_telemetry()
            if tel is None:          # unit tests — zero-overhead pass-through
                return await func(*args, **kwargs)

            attrs = {"tool": tool_name}
            tel.tool_calls.add(1, attrs)
            start = time.monotonic()

            with tel.tracer.start_as_current_span(tool_name):
                try:
                    result = await func(*args, **kwargs)
                    tel.tool_duration.record((time.monotonic() - start) * 1000, attrs)
                    return result
                except Exception:
                    tel.tool_errors.add(1, attrs)
                    raise
        return wrapper
    return decorator

get_telemetry() reads from a module-level singleton in observability/context.py — the same pattern used by db/pool.py for the database pool. It returns None when telemetry is not initialized (unit tests), making the decorator a zero-overhead pass-through in that case.
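A module-level singleton of this kind can be very small. The sketch below is an assumption about what observability/context.py might contain, not a copy of it; only the get_telemetry and set_telemetry names come from the post.

```python
from typing import Any, Optional

_telemetry: Optional[Any] = None   # set once at startup, read by every decorator


def set_telemetry(telemetry: Any) -> None:
    global _telemetry
    _telemetry = telemetry


def get_telemetry() -> Optional[Any]:
    # Returns None until set_telemetry() has run, e.g. in unit tests.
    return _telemetry


print(get_telemetry())            # None
set_telemetry({"service": "demo"})
print(get_telemetry())            # {'service': 'demo'}
```

Returning None here, rather than raising as get_pool() does for an uninitialized pool, is what lets the decorator fall through silently in unit tests.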

Four metrics: three recorded per call, one observed per export interval

Metric	Type	What it tells you
`mcp.tool.calls`	Counter	Total invocations per tool (including blocked)
`mcp.tool.errors`	Counter	Failures per tool — allowlist blocks, DB errors, validation failures
`mcp.tool.duration`	Histogram (ms)	Latency distribution — p50/p95/p99 per tool
`mcp.db.pool_size`	Gauge	Live pool size at each export interval

The pool size gauge — a subtle fix

Observable gauges require a callback registered at creation time. Without it the gauge exists but reports nothing. The callback uses late binding via get_pool() so it works even though the pool is initialized after setup_telemetry() runs:

from opentelemetry.metrics import Observation

def _pool_size_callback(options):
    """Called by the metrics SDK at each 30s export interval."""
    try:
        from ..db.pool import get_pool  # lazy import — avoids circular dependency
        asyncpg_pool = get_pool()._pool
        if asyncpg_pool is not None:
            return [Observation(asyncpg_pool.get_size())]
    except RuntimeError:
        pass  # pool not yet initialized — skip this cycle
    return []

db_pool_size = meter.create_observable_gauge(
    "mcp.db.pool_size",
    callbacks=[_pool_size_callback],   # ← was missing in early versions
    description="Current DB connection pool size",
)
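Late binding is the part that is easy to get wrong, so here is a dependency-free sketch of the idea. FakePool is a hypothetical stand-in for the asyncpg pool, and the callback returns plain integers instead of OpenTelemetry Observation objects.

```python
_pool = None   # module-level singleton, filled in later by the lifespan hook


class FakePool:
    """Hypothetical stand-in for the asyncpg pool."""
    def __init__(self, size: int):
        self._size = size

    def get_size(self) -> int:
        return self._size


def set_pool(pool) -> None:
    global _pool
    _pool = pool


def get_pool() -> FakePool:
    if _pool is None:
        raise RuntimeError("pool not initialized")
    return _pool


def pool_size_callback() -> list:
    # get_pool() is looked up on every call, not captured at registration time.
    try:
        return [get_pool().get_size()]
    except RuntimeError:
        return []          # nothing to report yet


callbacks = [pool_size_callback]      # registered before the pool exists
print(callbacks[0]())                 # []
set_pool(FakePool(size=5))            # startup finishes later
print(callbacks[0]())                 # [5]
```

Had the callback captured the pool object when it was registered, it would have held None forever; resolving it at call time is what makes the gauge start reporting once startup completes.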

OTLP or console — controlled by env var

The server emits standard OTLP and is completely backend-agnostic. Whether it exports to a real collector or falls back to stdout is controlled by a single env var:

telemetry = setup_telemetry(
    settings.service_name,
    otel_enabled=settings.otel_enabled,             # True → OTLP gRPC, False → console
    otlp_endpoint=settings.otel_exporter_otlp_endpoint,
)
set_telemetry(telemetry)  # makes it accessible to @instrument_tool via context.py

# .env
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Multi-backend routing via collector configs

The collector/ directory provides three pre-built configs for routing OTLP to different backends — swap the one your infrastructure uses without touching application code:

File	Routes to
`collector/adot-aws.yaml`	AWS X-Ray (traces) + CloudWatch EMF (metrics) via ADOT sidecar
`collector/otelcol-grafana.yaml`	Grafana Tempo (traces) + Prometheus (metrics)
`collector/otelcol-datadog.yaml`	Datadog APM + Datadog Metrics

FastMCP server  →  OTLP :4317  →  Collector (adot / otelcol)  →  X-Ray / Grafana / Datadog

Local observability in one command

docker/docker-compose.observe.yml adds the full Grafana LGTM stack alongside the MCP server:

make observe-up
# Open http://localhost:3000 — traces and metrics appear immediately, no login
make observe-down

This starts: OpenTelemetry Collector → Grafana Tempo (traces) + Prometheus (metrics) → Grafana (dashboards). OTEL_ENABLED is set to true automatically. Datasources are auto-provisioned — no manual Grafana setup.

Problem 4: Kubernetes Deployment

The Helm chart handles everything a production Kubernetes deployment needs.

Horizontal Pod Autoscaler — scales 2–8 replicas at 70% CPU:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 70

External Secrets Operator — pulls DATABASE_URL and API_KEY from HashiCorp Vault (or any ESO-compatible store) without ever putting secrets in your Helm values:

externalSecrets:
  enabled: true
  secretStoreName: vault-backend
  secrets:
    - remoteKey: mcp-server/database
      data:
        - secretKey: DATABASE_URL
          remoteRef:
            key: database_url

Kubernetes health probes point to /health, which checks the database pool before returning 200:

from starlette.requests import Request
from starlette.responses import JSONResponse

@mcp.custom_route("/health", methods=["GET"])
async def http_health(request: Request) -> JSONResponse:
    try:
        pool = get_pool()
        pool_status = await pool.health_check()
        return JSONResponse({"status": "ok", "pool": pool_status})
    except RuntimeError:
        return JSONResponse(
            {"status": "degraded", "detail": "database pool not initialized"},
            status_code=503,
        )

For AWS deployments, the optional ADOT sidecar annotation in values.yaml routes traces directly to X-Ray without any application code changes — just enable adot.enabled: true and point OTEL_EXPORTER_OTLP_ENDPOINT at localhost:4317.

Production Safeguards in Settings

Two validators in Settings catch the most common production misconfigurations at startup — not at 2am when the first request fails:

# Inside the Settings class (requires: from pydantic import field_validator, model_validator)
@field_validator("database_url")
@classmethod
def validate_database_url(cls, v: str) -> str:
    if not v.startswith(("postgresql://", "postgres://")):
        raise ValueError("DATABASE_URL must be a valid PostgreSQL connection string")
    return v

@model_validator(mode="after")
def validate_api_key_in_production(self) -> "Settings":
    if (
        self.api_key_enabled
        and self.log_level.upper() != "DEBUG"
        and self.api_key == "change-me-in-production"
    ):
        raise ValueError(
            "API_KEY must be changed from the default value before running in production. "
            "Set LOG_LEVEL=DEBUG to bypass this check in development."
        )
    return self

The second validator is deliberately bypassed with LOG_LEVEL=DEBUG — so local dev still works with the default key, but a misconfigured production deploy fails loudly at boot.

Testing

A production-ready template should have production-grade tests. The suite has two tiers:

Unit tests — no database, no network, runs in ~0.5s:

make test-unit     # 28 tests

Covers: allowlist loading and enforcement, @instrument_tool decorator (call counting, error counting, duration recording, span wrapping, functools.wraps preservation), setup_telemetry console and OTLP modes, pool gauge callback with uninitialized pool, all four tool functions with mocked database, and SQL injection prevention.

Integration tests — runs against real PostgreSQL:

# Start postgres first:
docker compose -f docker/docker-compose.yml up -d postgres

# Run against it:
DATABASE_URL=postgresql://mcpuser:mcppassword@localhost:5432/mcpdb \
    make test-integration    # 21 tests

Covers: real DatabasePool lifecycle (initialize, fetch, fetchrow, fetchval, execute, health_check), search_records pagination and filter behaviour, get_record_detail found and not-found paths, get_statistics grouping and totals, and SQL injection prevention against a live query engine.

CI runs both phases automatically — unit tests first, integration tests in a separate job with a postgres:16 service container. Unit tests alone reach 82% coverage; main.py and settings.py are excluded (covered by integration tests) and documented as such.

Getting Started

Two minutes with Docker Compose

git clone https://github.com/ManjunathGovindaraju/fastmcp-production-template.git
cd fastmcp-production-template
cp .env.example .env
docker compose -f docker/docker-compose.yml up

MCP server at http://localhost:8000/mcp. PostgreSQL starts with sample data from docker/init.sql.

With full observability

cp .env.example .env
make observe-up
# MCP server:  http://localhost:8000/mcp
# Grafana:     http://localhost:3000  (traces + metrics, no login)
make observe-down

Deploy to Kubernetes

helm install fastmcp k8s/helm/ \
  --set image.tag=v1.0.0 \
  --set externalSecrets.secretStoreName=your-vault-store \
  --namespace mcp-system \
  --create-namespace

Adding a New Tool

Three steps:

1. Create the tool in src/server/tools/your_tool.py:

from ..config.security import require_allowlist
from ..db.pool import get_pool
from ..observability.instrument import instrument_tool

@instrument_tool("your_tool_name")   # outermost — metrics + trace span
@require_allowlist("your_tool_name") # inner — blocks if not in allowlist
async def your_tool_name(param: str) -> dict:
    row = await get_pool().fetchrow(
        "SELECT * FROM your_table WHERE id = $1", param
    )
    return dict(row) if row else {}

2. Register it in src/server/main.py:

from .tools import your_tool
mcp.add_tool(your_tool.your_tool_name)

3. Add it to the allowlist in config/allowlist.yaml:

allowed_tools:
  - your_tool_name

What This Is Not

This template does not include:

Application-specific business logic — the tools (search_records, get_record_detail, get_statistics) are placeholders. Replace them with your domain.
A specific LLM or agent framework — it is the tool-serving layer. Pair it with LangGraph, LangChain, or any MCP-compatible agent.
A database migration tool — schema versioning (Alembic, Flyway) is intentionally out of scope. Add the migration tooling your team already uses.

Project Structure

fastmcp-production-template/
├── src/server/
│   ├── main.py                     # FastMCP entry point, lifespan hooks
│   ├── config/
│   │   ├── settings.py             # Pydantic Settings (env / .env)
│   │   └── security.py             # Allowlist loader + @require_allowlist
│   ├── db/
│   │   ├── connection.py           # asyncpg DatabasePool
│   │   └── pool.py                 # Module-level singleton
│   ├── observability/
│   │   ├── telemetry.py            # OTel setup — OTLP (prod) or console (dev)
│   │   ├── context.py              # Telemetry singleton — get/set
│   │   └── instrument.py           # @instrument_tool decorator
│   └── tools/
│       ├── search.py               # search_records
│       ├── detail.py               # get_record_detail
│       ├── stats.py                # get_statistics
│       └── health.py               # get_pool_status (no allowlist)
├── collector/
│   ├── adot-aws.yaml               # → AWS X-Ray + CloudWatch
│   ├── otelcol-grafana.yaml        # → Grafana Tempo + Prometheus
│   └── otelcol-datadog.yaml        # → Datadog
├── config/
│   └── allowlist.yaml
├── docker/
│   ├── Dockerfile
│   ├── docker-compose.yml          # server + postgres
│   ├── docker-compose.observe.yml  # adds Grafana LGTM stack
│   └── init.sql
├── k8s/helm/
│   ├── values.yaml
│   └── templates/
├── tests/
│   ├── conftest.py                 # mock_telemetry fixture
│   ├── test_security.py
│   ├── test_telemetry.py           # context + @instrument_tool + setup_telemetry
│   ├── test_tools.py               # tool behaviour, mocked DB
│   └── test_integration.py        # real DB — skipped unless DATABASE_URL is set
└── .github/workflows/
    ├── ci.yml                      # lint → unit tests → integration tests → docker
    └── release.yml

Summary

Problem	Solution
DB connections deadlock under concurrent load	`asyncpg` pool, min/max bounded, module-level singleton
Prompt injection via unauthorized tool calls	YAML allowlist + `@require_allowlist` decorator
Metrics defined but never recorded	`@instrument_tool` decorator + `context.py` singleton
Pool size gauge silent	Late-bound callback registered at gauge creation
Locked to one observability backend	OTLP app + `collector/` configs for ADOT, Grafana, Datadog
No local observability stack	`make observe-up` — full Grafana LGTM in one command
No real test coverage	28 unit tests + 21 integration tests, 82% coverage
Kubernetes complexity	Helm chart — HPA + ESO + optional ADOT sidecar

GitHub: ManjunathGovindaraju/fastmcp-production-template — MIT license, fork freely.

About the author: Manjunath Govindaraju — Principal Software Engineer with 23 years building production systems. Currently focused on AI platform engineering: multi-agent orchestration (LangGraph), MCP servers, async data pipelines, and enterprise Kubernetes deployments.

LinkedIn · GitHub
