Forem: Manoir Yantai

Optimizing API Performance with Connection Pooling

Manoir Yantai — Tue, 26 May 2026 12:47:19 +0000

Stop treating database connections as disposable objects. Every connection handshake wastes CPU cycles and introduces latency. Connection pooling reuses established TCP connections, reducing overhead and increasing throughput. This is non-negotiable for any performance-sensitive API.

The default approach in many frameworks opens a new connection per request. That means three-way handshakes, authentication, and TLS every single time. For high-traffic endpoints, this kills response times and collapses under load. Connection pooling maintains a set of persistent connections, borrowing and returning them on demand. The pool handles lifecycle management—idle connections are kept alive, and failed ones are replaced.

But pools aren't magic. Misconfigured pools cause more harm than no pool at all. Common mistakes: too few connections queue requests, too many exhaust database resources, and no timeout lets stale connections hang forever.

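Here’s a reasonable starting pool setup in Go using database/sql, which already includes a built-in pool:

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // assumed driver; registers the "postgres" name used below
)

// initDB opens a pooled handle; dsn is the PostgreSQL connection string.
func initDB(dsn string) *sql.DB {
    db, err := sql.Open("postgres", dsn) // validates arguments only; no connection is opened yet
    if err != nil {
        log.Fatal(err)
    }

    // Pool configuration
    db.SetMaxOpenConns(25)                  // cap on open connections (in use + idle)
    db.SetMaxIdleConns(5)                   // keep at most 5 idle connections
    db.SetConnMaxLifetime(30 * time.Minute) // recycle connections after 30 minutes
    db.SetConnMaxIdleTime(5 * time.Minute)  // close connections idle for 5+ minutes

    return db
}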

This snippet limits total connections, prevents idle bloat, and forces periodic refresh. Tune these values based on your database server limits and traffic patterns. Start conservative—adding connections is cheaper than recovering from resource exhaustion.

Now, apply this to your API handlers. Every request uses the same *sql.DB instance. The pool handles concurrency safely. No handshake overhead per request.

Beyond basic setup, monitor pool health: db.Stats() reports InUse, Idle, and WaitCount. A rising WaitCount means requests are waiting for a free connection, so consider a higher SetMaxOpenConns limit. High InUse with low Idle suggests code is holding connections too long; check for unclosed row iterators, long transactions, or slow queries.
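To make those two signals concrete, here is a small sketch of the comparison logic. It is written in Python rather than Go, and PoolSnapshot, diagnose, and the wait_spike threshold are hypothetical names and values chosen for illustration; only the three counters come from db.Stats(). Because WaitCount is cumulative, the sketch compares two samples instead of reading one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """Hypothetical mirror of the counters read from db.Stats()."""
    in_use: int
    idle: int
    wait_count: int  # cumulative number of callers that had to wait


def diagnose(prev: PoolSnapshot, curr: PoolSnapshot, max_open: int,
             wait_spike: int = 50) -> list[str]:
    """Compare two snapshots taken one sampling interval apart."""
    findings = []
    # WaitCount only grows, so the delta between samples is what matters.
    if curr.wait_count - prev.wait_count >= wait_spike:
        findings.append("wait spike: consider raising the max open limit")
    # Everything is checked out and nothing is idle: look for long holders.
    if curr.in_use >= max_open and curr.idle == 0:
        findings.append("saturated: check slow queries or unclosed row iterators")
    return findings
```

Feed it snapshots from whatever metrics loop you already run; the threshold of 50 waits per interval is a placeholder to tune, not a recommendation.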

Connection pooling isn’t limited to databases. Redis, gRPC, and HTTP clients all benefit. Standardize pool usage across your stack for consistent performance.

Final rule: never open connections in request handlers. Use pools at the service layer. Your response times will thank you, and your database servers will stay responsive under peak load. Start with the defaults, measure, adjust, and move on to real problems.

Why Your Connection Pool Is Starving Under Load

Manoir Yantai — Tue, 26 May 2026 12:46:48 +0000

You deployed the feature. Load testing looked fine. Traffic spiked, latency climbed, and your database started dropping connections. The first instinct is to scale the app tier. Don’t. The bottleneck isn’t compute. It’s your connection pool.

Most developers treat connection pools as a set-and-forget configuration. You slap a default pool size in your ORM, bump the timeout, and move on. That works until it doesn’t. Connection pooling isn’t magic. It’s a finite resource with hard limits, and misconfiguring it guarantees degraded throughput under real-world concurrency.

The core problem is misunderstanding what max_connections actually controls. It doesn’t scale with your worker threads. It doesn’t auto-tune based on query complexity. It’s a hard ceiling. When every incoming request needs a database handle and your pool is exhausted, your application doesn’t queue gracefully. It blocks. Threads pile up. Memory bloats. Eventually, you hit OS limits or trigger circuit breakers.

Let’s look at the math. If your API runs sixteen worker processes and each one holds its own pool of twenty, the app tier can open up to 320 connections, well past PostgreSQL’s default max_connections of 100. Add background jobs, health checks, and admin endpoints that also hit the database, and starvation is guaranteed. The fix isn’t just increasing the number. It’s aligning pool capacity with actual query concurrency, not request concurrency.

Most frameworks default to pool sizes between ten and twenty. That’s acceptable for local development. It’s terrible for production. A functional baseline for traditional databases starts with (CPU cores * 2) + effective disk spindles, counting the cores and spindles of the database server, not the app host. You must still cap it based on your database’s hard connection limit and your application’s actual concurrent query profile. Measure. Don’t guess.
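As a rough worked example of that sizing logic, the sketch below applies the baseline formula, caps it at the database's connection limit, and splits the result across worker processes. The function names, the reserved parameter (connections held back for admin and monitoring), and the assumption that each worker process owns a separate pool are illustrative, not taken from any framework.

```python
def baseline_pool_size(db_cores: int, effective_spindles: int) -> int:
    """Starting point from the text: (CPU cores * 2) + effective disk spindles."""
    return db_cores * 2 + effective_spindles


def per_process_pool_size(db_cores: int, effective_spindles: int,
                          db_max_connections: int, reserved: int,
                          worker_processes: int) -> int:
    """Split a capped connection budget across worker processes."""
    usable = db_max_connections - reserved  # leave room for admin sessions
    if usable < worker_processes:
        raise ValueError("database limit cannot give every process a connection")
    total = min(baseline_pool_size(db_cores, effective_spindles), usable)
    # Assumes one pool per process; every process gets at least one connection.
    return max(1, total // worker_processes)
```

With an 8-core database server and one effective spindle the baseline is 17 connections in total, so four worker processes would get 4 each. Compare that with sixteen processes at twenty apiece, which adds up to 320.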

Teams routinely set the pool size but ignore connection validation and idle timeout. A stale connection sitting in the pool will throw a socket error when checked out. You retry. You leak. You exhaust the pool faster. You need active validation on checkout, not just on creation.

Here is a reasonable baseline configuration for a Node.js PostgreSQL client (node-postgres). It replaces static defaults with explicit lifecycle controls:

const { Pool } = require('pg');

const pool = new Pool({
  max: 30, // Aligned to DB max_connections and verified concurrency
  idleTimeoutMillis: 10000, // Close clients that sit idle for 10 seconds
  connectionTimeoutMillis: 1000, // Fail fast instead of hanging on checkout
  maxUses: 7500, // Retire a connection after 7500 checkouts
  allowExitOnIdle: false // Default: idle clients keep the Node process alive
});

pool.on('error', (err) => {
  // Errors on idle clients (backend restarts, network drops) arrive here
  console.error('Pool connection error', err);
});

Notice maxUses. It retires a connection after a fixed number of checkouts, which bounds any per-session state that builds up on the database side. The error listener handles failures on idle clients, such as a backend restart or a network drop, that a pool.connect() call never sees. Without a listener, that error event goes unhandled and can crash the Node.js process.

Beyond configuration, audit your code for connection leaks. Every manual checkout requires a release() in a finally block. ORMs abstract this, but raw queries and transaction wrappers bypass safeguards. If your pool usage metrics show a slow climb during low traffic, you’re leaking. Add pool event listeners. Log checkout duration. Alert when the waiting queue exceeds a hard threshold.
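The release-in-finally rule is easy to show with a toy pool. The sketch below is in Python and uses a hypothetical TinyPool class built on a standard-library queue, not a real driver pool; the point is only that the finally clause returns the handle on both the success path and the exception path.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size pool for illustration only (not a real driver pool)."""

    def __init__(self, connections):
        self._free = queue.Queue()
        for conn in connections:
            self._free.put(conn)

    @contextmanager
    def checkout(self, timeout=1.0):
        # Raises queue.Empty if no connection frees up within the timeout.
        conn = self._free.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs on success and on exceptions, so the handle always returns.
            self._free.put(conn)

    def idle_count(self):
        return self._free.qsize()
```

Wrapping every manual checkout in a with block like this means a raised exception inside the block cannot strand a connection.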

You also need to separate read and write pools. Mixing them invites contention: writes that block on row locks keep holding their connections, and reads queue behind them for a free slot. Route queries explicitly. Use a read replica pool with higher capacity and lower validation overhead. Keep the primary pool tight, validated, and reserved exclusively for mutations.

Stop treating connection exhaustion as a transient network glitch. A pool-exhausted error is a capacity planning signal. Implement backpressure. Return a 503 Service Unavailable with a Retry-After header instead of letting your queue back up. Well-behaved clients can then back off and degrade gracefully. Your users will notice the difference between a controlled throttle and a cascading failure.
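A minimal sketch of that backpressure path, in framework-free Python: the handler waits briefly for a connection and sheds the request with a 503 and Retry-After if none frees up. The handle_request function, the plain queue standing in for a pool, and the timeout and retry values are all assumptions made for the example.

```python
import queue


def handle_request(free_connections: queue.Queue, run_query,
                   wait_seconds: float = 0.05, retry_after: int = 2):
    """Return (status, headers, body); shed load when the pool stays empty."""
    try:
        conn = free_connections.get(timeout=wait_seconds)
    except queue.Empty:
        # Controlled throttle: tell the caller when to come back.
        return 503, {"Retry-After": str(retry_after)}, "pool exhausted"
    try:
        return 200, {}, run_query(conn)
    finally:
        free_connections.put(conn)  # always hand the connection back
```

In a real service the same shape would live in middleware, with the wait bounded well below your request timeout.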

Monitor four metrics: active connections, idle connections, waiting requests, and average checkout time. If checkout time exceeds your average query execution time, your pool is undersized or your queries are blocking. If waiting requests climb, you’re either leaking connections or your concurrency model doesn’t match your pool size.

Tuning a connection pool isn’t a deployment step. It’s a continuous feedback loop. Run load tests that simulate real traffic distributions, not synthetic spikes. Watch the pool metrics. Adjust. Validate. Repeat. Stop guessing. Start measuring. Your database won’t scale if your application chokes on its own resource management.

AI Agents in Production: What Actually Works

Manoir Yantai — Tue, 26 May 2026 10:32:36 +0000

The hype around AI agents is deafening, but actual production deployments tell a different story. Most failures aren't from the LLM itself—they stem from poor orchestration, brittle tool chains, and lack of proper error handling. After shipping multiple agentic systems, here’s what actually works.

Reliability over intelligence

The first lesson: treat agents as distributed systems, not magic. Every LLM call can fail, hallucinate, or drift. Production agents must handle all three. The single most effective technique is structured output parsing with validation. Use libraries like Pydantic to enforce schemas on LLM responses, and reject malformed outputs before they touch any system resource.

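from pydantic import BaseModel, ValidationError


class AgentAction(BaseModel):
    """Schema every LLM response must satisfy before it is acted on."""
    tool: str
    args: dict


def parse_action(response: str) -> AgentAction:
    """Parse raw LLM output; return a safe fallback if it is malformed."""
    try:
        # Pydantic v2 API: rejects invalid JSON and missing or mistyped fields
        return AgentAction.model_validate_json(response)
    except ValidationError:
        # fall back to retry or safe default
        return AgentAction(tool="fallback", args={})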

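This pattern catches malformed JSON and missing fields early. To catch hallucinated tool names as well, restrict tool to the set of registered names instead of accepting any string. Build exhaustive validators for every tool input.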
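A schema that types the tool name as a plain string will accept any name the model invents, so rejecting unknown tools needs an explicit allow-list. Here is a dependency-free sketch of that check using only the standard library; ALLOWED_TOOLS and its two tool names are hypothetical, and parse_action_strict is an illustrative name, not part of any library.

```python
import json

ALLOWED_TOOLS = {"search", "calculator"}  # hypothetical tool registry


def parse_action_strict(response: str) -> dict:
    """Accept only well-formed actions that name a registered tool."""
    fallback = {"tool": "fallback", "args": {}}
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    tool, args = data.get("tool"), data.get("args")
    # The isinstance check also guards against unhashable values like lists.
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        return fallback
    if not isinstance(args, dict):
        return fallback
    return {"tool": tool, "args": args}
```

The same constraint can be expressed in a schema library by narrowing the field to a fixed set of literal values.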

Tool design patterns

Tools are the agent’s interface to the world, and they must be designed for failure. Always enforce idempotency where possible. If a tool call times out or returns 500, the agent should retry exactly once with backoff, then escalate. Use explicit tool contracts: describe not just what the tool does, but exactly what inputs it expects and what outputs it returns. Few-shot examples in tool descriptions improve invocation reliability by 40% in our benchmarks.
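The retry-once-then-escalate rule might look like the sketch below. ToolEscalation and call_with_single_retry are hypothetical names, the half-second backoff is a placeholder, and the sleep parameter exists only so tests can skip the delay. Retrying like this is only safe when the tool is idempotent.

```python
import time


class ToolEscalation(Exception):
    """Raised when a tool fails twice and needs a human or a supervisor."""


def call_with_single_retry(tool, *args, backoff_seconds=0.5, sleep=time.sleep):
    """Call tool(*args); on failure wait once, retry once, then escalate."""
    try:
        return tool(*args)
    except Exception:
        sleep(backoff_seconds)  # back off before the single retry
    try:
        return tool(*args)
    except Exception as second_error:
        raise ToolEscalation(f"tool failed twice: {second_error}") from second_error
```

Catching the escalation in the agent loop is where you would hand off to a person or a fallback path.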

Avoid handing the agent raw SQL or shell access. Instead, wrap every tool with authentication, rate limiting, and input sanitization. Log every tool call with request ID, latency, and token count. These logs become the primary debugging surface.

State management done right

Agents accumulate context across turns, and that context balloons quickly. Naive append-only history kills performance. Implement a context policy that keeps only the last N exchanges, plus a summary of earlier turns. Use a cheap LLM call to compress history when it passes a threshold. For long-running agents, store session state in Redis with TTL, not in-memory. This allows horizontal scaling and recovery from crashes.
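One way to sketch the keep-last-N-plus-summary policy: split the history, pass the older turns to a summarizer, and keep the recent turns verbatim. The summarize argument is a hypothetical callable standing in for the cheap LLM call; its signature (previous summary, older turns) is an assumption for this example.

```python
def compact_history(turns, keep_last, summarize, summary=""):
    """Return (summary, recent_turns) holding at most keep_last raw turns."""
    if len(turns) <= keep_last:
        return summary, list(turns)  # under the threshold: nothing to compress
    split = len(turns) - keep_last
    older, recent = turns[:split], turns[split:]
    # Fold the older turns into the running summary.
    return summarize(summary, older), list(recent)
```

The returned pair is small enough to store as session state, for example under a Redis key with a TTL.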

Memory separation is critical. Short-term working memory (recent history) should be separate from long-term knowledge (retrieved documents). Don’t dump both into the same prompt. Use vector storage for retrieval and keep recent turns as raw text. This reduces noise and improves grounding.

Observability is non-negotiable

Standard logging isn’t enough. You need to trace every agent decision: the prompt used, the output generated, the tool call made, the result received. Instrument your agent loop with structured logs that include latency, token counts, retries, and failure types. Store these in a searchable backend. When something goes wrong, replay the exact sequence of events.
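As a sketch of what one such trace record could contain, the function below serializes a single agent step as one JSON line. The field names and the trace_step helper are illustrative assumptions; the clock parameter is injectable only so latency can be tested deterministically.

```python
import json
import time


def trace_step(run_id, step, prompt, output, tool, tool_result,
               started_at, tokens, retries=0, failure=None, clock=time.time):
    """Serialize one agent decision as a single JSON log line."""
    record = {
        "run_id": run_id,        # ties every step of one run together
        "step": step,            # position in the sequence, for replay
        "prompt": prompt,
        "output": output,
        "tool": tool,
        "tool_result": tool_result,
        "latency_ms": round((clock() - started_at) * 1000),
        "tokens": tokens,
        "retries": retries,
        "failure": failure,      # None on success, else a failure type
    }
    return json.dumps(record, sort_keys=True)
```

Sorting by run_id and step in the log backend gives you the replayable sequence.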

Set up alerts for patterns that indicate degradation: increasing retries, falling back to default actions, or repeated tool errors. These leading indicators catch problems before users notice. Also track cost per agent run; unbounded token usage will bankrupt your budget.

Error recovery is a feature

Every agent path must handle errors gracefully. Implement a three-tier recovery: local retries for transient failures, tool fallbacks for persistent errors (e.g., if search fails, try a cached version), and escalation to a human handler when the agent is stuck. Define “stuck” explicitly: three consecutive failures, or uncertainty above a threshold. Escalation should be visible in the UI and logged for review.
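The three tiers can be sketched as one small function. This version treats "stuck" as three consecutive failures of the primary path and leaves out the uncertainty threshold; run_with_recovery and its callable parameters are hypothetical names for the example.

```python
def run_with_recovery(primary, fallback, escalate, max_failures=3):
    """Return (tier, result), where tier names the path that produced it."""
    failures = 0
    # Tier 1: local retries for transient failures.
    while failures < max_failures:
        try:
            return "primary", primary()
        except Exception:
            failures += 1
    # Tier 2: a fixed fallback, such as a cached copy of the search result.
    try:
        return "fallback", fallback()
    except Exception:
        failures += 1
    # Tier 3: hand off to a human, passing along the failure count.
    return "escalated", escalate(failures)
```

The order is fixed in code on purpose, so the agent never chooses its own recovery path.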

Do not let the agent invent recovery strategies unless explicitly designed. Unsupervised creative failure handling is how you get billing agents emailing customer credit card numbers. Rigid recovery beats flexible chaos.

Testing for real conditions

Unit tests for tools are trivial. The hard part is testing agent behavior under ambiguous conditions. Build a simulation harness that replays production logs with slight perturbations—drop a tool response, delay a call, inject a hallucinated output. Measure if the agent reaches the correct final state despite these distortions. This catches fragility before it hits users.

A/B test agent prompts and tool configurations in production. Use canary deployments with 5% traffic and monitor success rates, latency, and user satisfaction. Roll out changes that improve all three, revert anything that degrades one.

What to skip

Don’t chase autonomous agents that “plan and execute” everything. Hierarchical reasoning introduces failure points and latency. Simplified reactive loops with structured tools and memory almost always outperform complex planners in production. Don’t micro-optimize prompt templates daily; standardize on one pattern and iterate slowly. Avoid expensive chain-of-thought calls unless you have concrete evidence they improve outcomes for your specific task.

AI agents in production work when you treat them as opinionated middleware, not general reasoning engines. Enforce structure, log everything, and recover explicitly. The rest is noise.

vibe-coding-realtime

Manoir Yantai — Sun, 24 May 2026 14:35:08 +0000

vibe-coding-universal real-time dialogue version (English)

Reddit r/programming (clickbait style)

Title: I made a tool that forces AI to ask questions before writing any code

Been using vibe coding for a while. The biggest problem? AI jumps straight into generic layouts the moment you say "build me a dashboard." No questions, no clarification, just mediocre code.

So I built something that makes AI stop and ask: What's your brand? Who are your users? What does "good" actually look like to you?

It runs 7 rounds of dialogue first. Then you get a design spec with exact CSS tokens from 71 real design systems (Linear, Stripe, Airbnb, Apple, etc.). Then it generates the full build spec before a single line of code gets written.

Most people I've shown this to either love it or think it's over-engineered. I'm in the love-it camp.

GitHub: https://github.com/mage0535/vibe-coding-universal

Install is literally: curl -fsSL https://raw.githubusercontent.com/mage0535/vibe-coding-universal/main/install.sh | bash

Anyone else built workflow tools to rein in AI coding assistants?
