Forem: Wojciech Wentland

I built a read-only MCP server for Akamai

Wojciech Wentland — Wed, 29 Apr 2026 06:00:00 +0000

I had 200+ CDN properties in Akamai and an agent that couldn't find any of them. Akamai's Property Manager API lists properties by group and contract, but there's no fuzzy search endpoint. If the agent doesn't know the exact property name or ID, it's stuck. The conversation dead-ends with "I couldn't find that property" and the user goes back to the Akamai control panel.

So I built an MCP server that wraps Akamai's APIs: 16 tools for searching properties, browsing EdgeWorker code, querying DNS zones, inspecting network lists, and translating error codes. All read-only. I wrote separately about why I only build read-only MCP servers.

Property search with a preloaded index

Akamai organizes properties under groups and contracts. To search across all of them through the API, you'd iterate every group-contract pair and list properties one by one. Slow, and no fuzzy matching.

The server preloads every property into an in-memory index at startup. It fans out API calls across all group-contract pairs in parallel, deduplicates, and builds a list of names. rapidfuzz handles the matching with WRatio as the scorer. WRatio tries multiple comparison strategies (ratio, partial ratio, token sort, token set) and picks the best one, weighted by string length differences. Slower than a simple ratio, but it means "checkout config" matches "checkout.example.com - Production" without the agent needing to know the exact naming convention.

On a real account with 95 groups and 263 properties, the index loads in about 3 seconds. After that, searches hit memory with zero API calls.

One thing I hit early: fanning out 95 concurrent requests without any throttling. Akamai's PAPI has rate limits, and a burst that size at startup can trigger 429s. The server caps concurrency with a semaphore, 10 requests at a time. Still fast enough, no rejected requests.
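Here is a minimal sketch of that capped fan-out. It assumes a hypothetical `list_properties(group_id, contract_id)` coroutine that returns property dicts keyed by `propertyId`. It illustrates the pattern: an `asyncio.Semaphore` limits in-flight requests to 10, then results are deduplicated by ID. It is not the server's actual code.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical API call: (group_id, contract_id) -> list of property dicts
ListFn = Callable[[str, str], Awaitable[list[dict]]]


async def load_all_properties(
    pairs: list[tuple[str, str]],
    list_properties: ListFn,
    max_concurrency: int = 10,
) -> list[dict]:
    """Fetch properties for every group-contract pair, max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def fetch(group_id: str, contract_id: str) -> list[dict]:
        # Waits here while max_concurrency requests are already in flight
        async with sem:
            return await list_properties(group_id, contract_id)

    batches = await asyncio.gather(*(fetch(g, c) for g, c in pairs))

    # In case one property comes back from several pairs, keep the first copy per ID
    unique: dict[str, dict] = {}
    for batch in batches:
        for prop in batch:
            unique.setdefault(prop["propertyId"], prop)
    return list(unique.values())
```

`asyncio.gather` still creates all 95 coroutines up front. The semaphore only limits how many are talking to the API at any moment, and that is what keeps the startup burst under the rate limit.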

The index refreshes every 5 minutes in a background task. I described this pattern in "Your MCP server is not an API adapter."

EdgeWorker code browsing

Akamai EdgeWorkers are serverless functions that run on CDN edge nodes. The code is stored as tgz archives containing main.js, bundle.json, and supporting files. To read a file, you download the archive, extract it, and find what you need. Doing that on every tool call would be slow.

The server downloads the bundle once, extracts all files into memory, and caches them with cachetools.TTLCache. 1-hour TTL, max 50 entries. After the first download, the agent can list files, read by line range, and search with regex. No repeat downloads.

When the agent asks "what does the main.js of EdgeWorker X look like?", the first call takes a second or two. Follow-up questions like "search for the routing logic" or "show me lines 50-80" are instant.

I considered caching to disk, but these bundles are small (usually under 100KB). Keeping them in memory avoids filesystem management. The cache also evicts entries automatically when their TTL expires or the 50-entry LRU limit is reached. The tradeoff is that bundles disappear on restart, but the reload is cheap enough that it doesn't matter.
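Here is a rough sketch of download-once caching. The server uses `cachetools.TTLCache`. To stay dependency-free, this version swaps in a small `OrderedDict`-based stand-in with the same TTL-plus-LRU behavior. `fetch_bundle` is a hypothetical callable that returns the raw `.tgz` bytes, and the cache key format is an assumption.

```python
from __future__ import annotations

import io
import tarfile
import time
from collections import OrderedDict
from typing import Callable


def extract_tgz(data: bytes) -> dict[str, str]:
    """Read every regular file in a .tgz archive into memory as text."""
    files: dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            fh = tar.extractfile(member)
            if fh is not None:
                files[member.name] = fh.read().decode("utf-8", errors="replace")
    return files


class TTLLRUCache:
    """Dependency-free stand-in for cachetools.TTLCache.

    Entries expire ttl seconds after insertion. Once maxsize is exceeded,
    the least recently used entry is evicted.
    """

    def __init__(self, maxsize: int = 50, ttl: float = 3600.0):
        self.maxsize, self.ttl = maxsize, ttl
        self._data: OrderedDict[str, tuple[float, dict[str, str]]] = OrderedDict()

    def get(self, key: str) -> dict[str, str] | None:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if expires_at <= time.monotonic():
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: dict[str, str]) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used


def get_bundle_files(cache: TTLLRUCache, key: str,
                     fetch_bundle: Callable[[], bytes]) -> dict[str, str]:
    """Download and extract on a cache miss only; later calls are served from memory."""
    files = cache.get(key)
    if files is None:
        files = extract_tgz(fetch_bundle())
        cache.put(key, files)
    return files
```

Suppose the key combines EdgeWorker ID and version, such as a hypothetical `"ew-1:v1"`. Then a new version is a cache miss rather than a stale hit. With cachetools installed, a `TTLCache(maxsize=50, ttl=3600)` plays the same role through its dict interface.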

Response shaping

Akamai property rule trees can be hundreds of KB. A typical production property has nested rules with behaviors, criteria, and options. Sending the full JSON wastes context.

The server strips the rule tree before returning it. Keeps rule names, match criteria, behavior configs, and the recursive structure. Removes template UUIDs, format versions, and other internal metadata the agent doesn't need. Property details, activations, and DNS records get the same treatment.

This is more aggressive than just dropping null fields. The raw rule tree has UUIDs on every node, template links, criteria satisfaction mode flags, locked indicators. None of that helps an agent answer "what caching rules are set for this property?" Stripping it cuts the response to maybe a third of the original size.
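Here is a sketch of that stripping as a recursive copy. The `DROP_KEYS` set is an assumption. It holds field names that seem to match the metadata described above: UUIDs, template links, format version, and the criteria-mode and lock flags. It is not a confirmed list, so check it against a real PAPI rule tree before relying on it.

```python
from typing import Any

# Hypothetical deny-list; verify the names against an actual rule tree response
DROP_KEYS = frozenset({
    "uuid", "templateUuid", "templateLink", "ruleFormat",
    "criteriaMustSatisfy", "criteriaLocked", "locked",
})


def strip_rule_tree(node: Any) -> Any:
    """Return a copy of the tree without internal metadata keys.

    Everything else (name, criteria, behaviors, options, children) is kept,
    so the nested rule structure survives intact.
    """
    if isinstance(node, dict):
        return {
            key: strip_rule_tree(value)
            for key, value in node.items()
            if key not in DROP_KEYS
        }
    if isinstance(node, list):
        return [strip_rule_tree(item) for item in node]
    return node
```

Dropping by key name at every depth is blunt. If a behavior option were ever legitimately named like one of these keys, it would disappear too. Scoping the deny-list to rule and behavior nodes would avoid that.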

EdgeGrid auth from scratch

Akamai uses EdgeGrid for API authentication. There's an official edgegrid-python library, but it wraps requests (sync). I wanted httpx (async) with connection pooling, so the server implements EdgeGrid signing directly: HMAC-SHA256 over a canonical request string, base64-encoded, attached as an Authorization header. About 40 lines.

The signing is straightforward from the public spec. The annoying part is that the query string must be included in the signed data, so you have to build the full URL with parameters before signing, then make the request with that same URL.
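Here is a sketch of the signing steps for a bodyless request with no signed headers. It follows the same steps as the reference edgegrid-python signer. It is not the server's code: `sign_edgegrid` is a hypothetical helper, and POST body hashing and header canonicalization are left out. Compare it against edgegrid-python before relying on it.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import uuid
from datetime import datetime, timezone
from urllib.parse import urlsplit


def _b64_hmac_sha256(key: str, msg: str) -> str:
    digest = hmac.new(key.encode("utf-8"), msg.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def sign_edgegrid(method: str, url: str, client_token: str, client_secret: str,
                  access_token: str, timestamp: str | None = None,
                  nonce: str | None = None) -> str:
    """Build an EG1-HMAC-SHA256 Authorization header for a bodyless request.

    No headers are signed and no content hash is computed, so this only
    covers requests like GET.
    """
    timestamp = timestamp or datetime.now(timezone.utc).strftime("%Y%m%dT%H:%M:%S+0000")
    nonce = nonce or str(uuid.uuid4())
    auth_header = (
        f"EG1-HMAC-SHA256 client_token={client_token};"
        f"access_token={access_token};timestamp={timestamp};nonce={nonce};"
    )
    parts = urlsplit(url)
    # The query string is part of the signed path, so sign the final URL
    path = parts.path + (f"?{parts.query}" if parts.query else "")
    data_to_sign = "\t".join([
        method.upper(),
        parts.scheme,
        parts.netloc,
        path,
        "",  # canonicalized headers: none signed here
        "",  # content hash: only used for POST bodies
        auth_header,
    ])
    # Two HMAC passes: secret + timestamp -> signing key, key + request -> signature
    signing_key = _b64_hmac_sha256(client_secret, timestamp)
    signature = _b64_hmac_sha256(signing_key, data_to_sign)
    return f"{auth_header}signature={signature}"
```

To use it:

1. Build the URL with its final query string first, for example with `urllib.parse.urlencode`.
2. Sign that string.
3. Pass the identical string to the HTTP client.

If anything re-encodes or reorders the parameters after signing, Akamai computes a different signature and rejects the request.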

What the agent can do

With 16 read-only tools, an agent can answer:

"Which CDN property handles checkout.example.com?"
"What caching rules are configured for the API property?"
"Show me the main.js from the latest EdgeWorker version"
"Search the EdgeWorker code for references to the auth header"
"What DNS records exist for example.com?"
"Which IPs are in the production allowlist?"
"What does Akamai error code 9.6f64d440.1318965461.2f2b078 mean?"

Without the MCP server, each of these means a trip to the Akamai control panel.

Setup

Add to Claude Code:

claude mcp add akamai -e AKAMAI_HOST=your-host.akamaiapis.net -e AKAMAI_CLIENT_TOKEN=akab-xxx -e AKAMAI_CLIENT_SECRET=xxx -e AKAMAI_ACCESS_TOKEN=akab-xxx -- uvx readonly-mcp-akamai

Create a read-only API credential in Akamai's Identity and Access Management panel. Source and docs: readonly-mcp-akamai on GitHub.

AI coding agents compressed the feedback loop from hours to seconds. I wrote about why that compression looks a lot like the variable-reward patterns behind slot machines and social media.

Wojciech Wentland — Mon, 13 Apr 2026 17:12:41 +0000


Your coding agent is a slot machine

Wojciech Wentland — Mon, 13 Apr 2026 08:16:25 +0000

Programming used to have a speed limit. You wrote code, fought the compiler, debugged, tested, and eventually deployed. The hit of satisfaction came when the feature shipped or the tests went green. That cycle took hours. Sometimes days. The delay regulated behavior.

You couldn't binge on deploy-satisfaction because the loop was too slow. The friction was structural. Nobody pulled 14-hour days writing Java servlets because the reward came too fast. If anything, people quit too early because it took too long.

AI coding agents removed the speed limit. I'm using "slot machine" as a behavioral metaphor here, not a clinical diagnosis.

Thirty seconds

Prompt. Result. Satisfaction. Next prompt. The entire arc of "I had an idea, I built it, I saw it work" now fits inside 30 seconds. And it repeats. Indefinitely.

A 2023 paper in Addictive Behaviors by Clark and Zack lays out why this matters. Two factors make non-drug activities addictive: reward variability (you don't know exactly what you'll get) and frequency (how many reward cycles you can fit into a unit of time). They call the second one temporal compression. Social media has both. Loot boxes have both. Slot machines are the textbook case.

AI coding agents have both. Each prompt returns something slightly different (variability). And you can run dozens of cycles per hour (compression). The paper's conclusion is blunt: "By enabling near limitless diversity and speed of delivery of non-drug rewards, digital technology has permitted engineering of reinforcers with addictive potential that, delivered under natural conditions, would likely never become addictive."

They were writing about gambling and social media. They could have been writing about your terminal.

Eight tabs, eight reward streams

Running multiple agent sessions in parallel looks like multitasking. It's actually multiple concurrent reward streams. A study analyzing over a million social media posts found that people adjust their posting frequency to maximize the rate of likes they receive, the same way animals in a Skinner box adjust lever presses to maximize food pellets. Agent tabs work on the same principle. Every switch carries a chance that something finished. A small hit.

The sessions don't compete for your attention. They take turns feeding it.

I have about twenty terminal tabs open right now. Some are from days ago, left around because I'll probably get back to them. Five or six are active. One is pulling together an infrastructure report, one is waiting on CI after fixes it wrote itself, one is something I opened mid-sentence to chase a bug that came up while I was reviewing something else. While writing this paragraph I switched tabs twice to check if a build passed. It doesn't feel like a problem while it's happening.

I keep seeing people call this "an assembly line of productivity." An assembly line runs whether you're paying attention or not. You just keep feeding it. Nobody describes their relationship with a useful tool that way.

Parallel sessions and rapid context switching get marketed as productivity features: chase novelty, try things quickly, jump between five terminals. But the behavior this produces looks a lot like the variable-reward patterns behind checking your phone 80 times a day.

The meta-tool trap

People are building elaborate systems to manage the chaos of their agent-assisted workflow. Productivity hubs with skill trees. XP points. Urgency scores. Daily summaries that rate your day out of 10. Automatic task splitting when you miss a deadline.

They gamified the thing that was already acting like a game.

The meta-work around the compulsive work becomes its own loop. Hit from the agent completing a task. Hit from watching the XP bar move. Hit from the daily score. And then you need a system to manage that system, and at some point you're four layers deep in productivity tooling and haven't shipped anything in a week.

Starting is the drug

The first hour of a new project has the highest novelty density. Agents make that phase unusually cheap. Everything is possible, nothing is broken yet, and the code just keeps coming.

Finishing is edge cases. Tests for the boring paths. The last 20% that takes 80% of the time. Reward density drops off a cliff. So you open a new tab and start something else.

You see the pattern everywhere once you look. Bursts of intense output followed by nothing. Somebody produces a hundred pieces of content in six weeks, then drops to zero. Twenty projects in various stages of beginning, none shipped.

And building the thing is the easy part. Finding users, handling support, keeping the service running at 3am, writing docs nobody reads, negotiating contracts, doing the marketing that actually brings people in. None of that gives you a dopamine hit, and none of it fits in a 30-second prompt cycle. The agent can scaffold an app in an afternoon. It can't make anyone care about it. The work that makes a product real is exactly the work that the reward loop skips over.

The behavior tracks novelty, not value. And since starting costs almost nothing now, you run out of interest before you run out of ideas.

The accidental guardrail

The most revealing thing about these tools is the feature nobody asked for: usage limits.

Usage caps end up acting as a hard stop. That's a strange role for a productivity tool, but it's clearly how some people experience these products. Claude's own usage limit docs discuss how to work within the caps. Some users describe the cap running out as the thing that makes them stop for the night. In an r/ClaudeAI thread, one person described deliberately downgrading their plan so it would expire before midnight. Recognized the pattern, reintroduced friction on purpose.

A productivity tool where the most effective safety feature is running out of capacity.

Your text editor doesn't need a cooldown timer. Nobody ships an IDE with "take a break" reminders. Productivity tools don't usually come with harm reduction features.

The token limit is a circuit breaker. Most of the discourse around it is people asking how to get rid of it.

Why I only build read-only MCP servers

Wojciech Wentland — Thu, 09 Apr 2026 13:43:42 +0000

Every MCP server I build is read-only. List, search, get, read. No create, update, delete, activate, purge.

I've been running Claude Code with --dangerously-skip-permissions in environments where the agent has no write-capable MCP tools and no direct path to mutate production systems. I haven't had a single unwanted action against a production system in months. Not because I trust the model to never hallucinate. Because the tools it has access to can't turn a hallucinated action into a real API write.

Read-only doesn't make an agent safe. It removes an entire class of failures.

The failure mode isn't hypothetical

There's a post on r/ClaudeCode where Claude suggested tearing down a GPU instance, then executed it. The user never confirmed. The model said "tear down the H100 too," treated its own suggestion as user confirmation, and destroyed a running instance with hours of cached build artifacts and compiled kernels on it.

The model later admitted: "I hallucinated you saying that. You never said those words. I said it, then executed it as if you'd agreed."

If that agent had read-only tools, it would have read the instance list, maybe suggested tearing something down, and then... nothing. The suggestion dies as text. No one loses a machine.

How I actually use agents

My workflow with Claude Code looks like this: I ask it to investigate something. It reads logs, searches code, pulls data from MCP servers, and comes back with an analysis. If the analysis leads to an action — creating a Jira ticket, updating a config, deploying a change — Claude drafts it. I review the draft, then I do the action myself.

The agent reads and analyzes. I act.

I trust the model's judgment on what to write in a ticket. The problem is it sometimes hallucinates that I asked it to do something I didn't. If the tool is read-only, the worst that happens is it reads data it was going to read anyway. If the tool has write access, the worst that happens is the Reddit post above.

Approval fatigue is the real problem

"But there's a confirmation prompt before destructive actions." Sure. Claude Code asks before running commands. The problem is approval fatigue. After confirming 50 read operations, you stop reading the prompts. You click yes. And then the 51st one is vastai destroy instance 34122719.

Anthropic wrote about this in their sandboxing post. They found that constant permission prompts paradoxically reduce security because users stop paying attention. Their solution was sandboxing: restrict what the agent can access so you don't need to ask as often. They reduced permission prompts by 84% while maintaining security.

Read-only MCP servers follow the same logic. If the server can't write, you don't need to confirm writes. The agent operates freely within the read boundary. No fatigue, no missed confirmation on a destructive action.

That's why I run --dangerously-skip-permissions. It sounds reckless until you realize the agent's entire toolkit is read-only. There's nothing dangerous to skip permission for.

What this doesn't cover

Read-only MCP servers are one boundary, not a complete agent security model. If you also give the agent bash access, cloud CLIs, kubectl, or production credentials through other channels, this design won't save you. Claude Code with --dangerously-skip-permissions can still run shell commands, edit files, and interact with whatever's reachable from the host. Anthropic's own documentation recommends using isolated environments when running in bypass mode, and their sandboxing approach combines filesystem isolation, network restrictions, and permission controls — not just tool-level restrictions.

This article is about the MCP boundary specifically. For me, that boundary matters because my agents talk to external systems almost exclusively through MCP. But it's one layer, not the whole stack.

Beyond the IDE

There's another reason I care about read-only MCP servers: they're portable. My workflow is Claude Code today, but the same servers work in any agent system that speaks MCP.

In a headless agent system — one where there's no human in the loop and no bash shell — the MCP boundary isn't just one layer. It's the only interface the agent has to external systems. If every MCP server it can reach is read-only, the agent literally cannot mutate production state. No sandboxing needed, no permission prompts, no approval fatigue. The tools themselves are the guardrail.

This matters if you're building agent systems for other users. Giving all users read access to your CDN config, build logs, or DNS records is usually fine. Giving all users write access is a different conversation entirely. Read-only MCP servers let you expose data to agents at scale without worrying about what happens when one of them hallucinates an action.

What read-only servers are good for

I run MCP servers for CDN management, CI/CD, log aggregation, DNS, and incident management. All read-only. The questions I ask look like: "What's the current CDN config for checkout?" "Which build failed last night?" "Compare caching rules between production and staging." "Draft a Jira ticket for the DNS change we discussed."

Claude produces the draft text. I copy it into Jira or GitHub myself. Nothing in this workflow needs the agent to write to the target system.

The credential argument

Getting a read-only API credential approved is a conversation. "I need read access to the CDN config API for an AI assistant that helps engineers investigate issues." Most teams say yes.

Getting a write credential is different. "I need an AI agent to be able to modify CDN configurations." That's a meeting, a security review, a discussion about rollback procedures, and probably a "no" or a "let's revisit in Q3."

Read-only credentials have a smaller blast radius and a simpler approval process. They also happen to cover every use case I actually have.

What this means for MCP servers

Every MCP server I publish follows this: read-only by design. The MCP security best practices describe scope minimization as a core principle. Start with the minimum privileges, elevate only when required. My servers don't elevate.

If someone opens a GitHub issue asking for write tools, the answer is: "This server is intentionally read-only. Fork it if you need write operations." That's not laziness. It's a design decision about what I want an AI agent to be able to do when it hallucinates an action at 3am.
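The boundary can also hold in code, not only in the tool list. One way is a guard in the single HTTP helper that every tool goes through. This is a hypothetical sketch; nothing above says the servers described here do exactly this. Some read APIs take search queries via POST, so those paths need an explicit allowlist.

```python
from __future__ import annotations

# HTTP "safe" methods: by definition they don't request a state change
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


class WriteBlockedError(PermissionError):
    """Raised when a tool tries to send a state-changing request."""


def check_read_only(method: str, path: str,
                    read_posts: frozenset[str] = frozenset()) -> None:
    """Allow safe methods, plus POSTs to paths explicitly known to be read-only.

    read_posts is a hypothetical allowlist of query endpoints that use POST
    only to carry a search body.
    """
    verb = method.upper()
    if verb in SAFE_METHODS:
        return
    if verb == "POST" and path in read_posts:
        return
    raise WriteBlockedError(f"{verb} {path} blocked: this client is read-only")
```

Calling it at the top of the shared request function changes how a mistake fails. A write tool added by accident errors out the first time it runs instead of quietly shipping.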

I'm planning a series of production-ready read-only MCP servers for various platforms. More on that soon.

Your MCP server is not an API adapter

Wojciech Wentland — Wed, 08 Apr 2026 19:59:13 +0000

A lot of MCP servers I see in the wild look like this:

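import httpx

@mcp.tool()
async def get_thing(id: str):
    # AsyncClient keeps the request from blocking the event loop
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.example.com/things/{id}")
    return resp.json()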

Fetch, forward, done. A thin HTTP proxy with a JSON Schema wrapper. For some
use cases, that's enough.

The servers I keep coming back to do something different. They hold state and
pre-compute answers. An agent hitting a thin wrapper might need three round
trips and 30 seconds. The same agent hitting a server that does real work gets
its answer in one call, under a millisecond.

Preloaded in-memory index

Here's a failure mode I run into constantly: the agent needs to find something
but doesn't know the exact ID. Most APIs only support exact lookups. No ID, no
result. The conversation dead-ends with "I couldn't find that resource" and the
user gives up.

I built a server that wraps a CDN management API. Hundreds of properties, and
the agent regularly needs to find which one handles a given hostname. The API
has a search endpoint, but it's slow, requires exact matches, and sometimes
returns 403 depending on account permissions.

So the server loads every property into memory at startup:

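from __future__ import annotations

import asyncio
from dataclasses import dataclass, field

from rapidfuzz import fuzz, process

@dataclass
class PropertyIndex:
    _entries: list[PropertyEntry] = field(default_factory=list)
    _name_index: dict[str, int] = field(default_factory=dict)
    _refresh_interval: int = 300
    _refresh_task: asyncio.Task | None = None

    async def load(self, refresh_interval: int = 300):
        self._refresh_interval = refresh_interval  # read by _refresh_loop
        await self._build_index()  # fans out API calls, dedupes; not shown
        self._refresh_task = asyncio.create_task(self._refresh_loop())

    def search_by_name(self, query: str, limit: int = 50):
        names = [e.property_name for e in self._entries]
        # extract() returns (name, score, list_index) tuples, best match first
        matches = process.extract(
            query, names, scorer=fuzz.WRatio,
            limit=limit, score_cutoff=50,
        )
        return [self._entries[idx].to_dict() for _, _score, idx in matches]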

Builds once by fanning out parallel API calls, deduplicates, refreshes every
five minutes in the background. Lookups take under a millisecond.
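The snippet above leaves out the refresh loop. Here is a sketch of one, assuming
`build_entries` is a hypothetical coroutine that returns a fresh list. It
rebuilds into a new list, then swaps the reference in a single assignment, so
searches never see a half-built index. If a refresh fails, it keeps the old
index.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger(__name__)


class RefreshingIndex:
    """Holds a list from build_entries() and rebuilds it every refresh_interval seconds."""

    def __init__(self, build_entries: Callable[[], Awaitable[list]],
                 refresh_interval: float = 300.0):
        self._build_entries = build_entries
        self._refresh_interval = refresh_interval
        self._entries: list = []
        self._refresh_task: asyncio.Task | None = None

    @property
    def entries(self) -> list:
        return self._entries

    async def load(self) -> None:
        self._entries = await self._build_entries()  # first build blocks startup
        self._refresh_task = asyncio.create_task(self._refresh_loop())

    async def _refresh_loop(self) -> None:
        while True:
            await asyncio.sleep(self._refresh_interval)
            try:
                fresh = await self._build_entries()
            except Exception:
                # Keep serving the previous index if the upstream API fails
                log.exception("index refresh failed; keeping previous entries")
                continue
            # One assignment: readers see the old list or the new one, never a partial build
            self._entries = fresh

    async def close(self) -> None:
        if self._refresh_task is not None:
            self._refresh_task.cancel()
            try:
                await self._refresh_task
            except asyncio.CancelledError:
                pass
```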

Without this, the agent guesses at exact property names, picks the wrong one,
retries, burns three turns. With the index, someone types "the CDN config for
checkout" and gets the right answer first try. That's the kind of difference
that decides whether people keep using the agent or go back to doing it
manually.

I did the same thing for a CI/CD server. The API lets you fetch a build config
by ID, but there's no fuzzy search. If you don't know the ID, you're stuck. The
server caches all build configurations at startup and runs fuzzy matching
against them. The agent says "find the deploy job for the payments service" and gets a
ranked list instantly, even though the CI system itself can't do that.

Embedded analytical database

I have another server that sits in front of a relational database. Some tables
have 20 million rows. The agent needs to answer analytical questions, things
like "which providers have the highest volume in this region?" or "show me the
top performers for a given category."

The database wasn't designed for these queries. It was built for a web UI with
narrow, well-indexed lookups. The agent's access patterns are different: it asks
broad analytical questions that require joins across tables the application
never joins. Adding indexes wasn't an option either, because the database is
owned by another team and optimizing it for an AI agent's query patterns wasn't
on anyone's roadmap. Some of these queries took 10 to 30 seconds on a read
replica, and in an agent loop where that latency gets multiplied by however many
tool calls the agent needs, the conversation times out before it gets anywhere.

The server embeds DuckDB in-process and loads pre-aggregated views and lookup
tables at startup. Some are straight copies of small reference tables. Others
are materialized summaries that flatten joins the source database was never
designed to run efficiently, the kind of cross-table aggregations that make
sense for an analytical question but would be expensive on a schema built for
transactional web UI lookups:

import asyncio

import duckdb

class DuckDBCache:
    async def start(self):
        self._conn = duckdb.connect(":memory:")
        # Fast-loading tables go in before the server reports ready
        for key, config in fast_configs.items():
            await self._load_table(key, config)
        self._ready = True
        # Slow tables keep loading in the background
        self._deferred_task = asyncio.create_task(
            self._load_deferred(deferred_configs)
        )
        self._refresh_task = asyncio.create_task(self._refresh_loop())

Each table has a fingerprint query (a cheap COUNT(*) or checksum) that the
refresh loop checks before doing a full reload. Large tables load in the
background after the server is already taking requests. If something asks for a
table that hasn't loaded yet, it falls back to the source database.
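Here is a sketch of the fingerprint gate and the fallback. The DuckDB and
source-database calls are replaced by hypothetical callables, so only the
control flow is left.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CachedTable:
    # All four callables are hypothetical stand-ins for real database calls
    fingerprint: Callable[[], Any]        # cheap check, e.g. COUNT(*) on the source
    reload: Callable[[], None]            # expensive full copy into the local cache
    query_source: Callable[[str], list]   # run SQL against the source database
    query_cache: Callable[[str], list]    # run SQL against the local copy
    loaded: bool = False
    _last_fp: Any = field(default=None, repr=False)

    def refresh_if_changed(self) -> bool:
        """Reload only when the fingerprint moved; return True if a reload ran."""
        fp = self.fingerprint()
        if self.loaded and fp == self._last_fp:
            return False
        self.reload()
        self._last_fp = fp
        self.loaded = True
        return True

    def query(self, sql: str) -> list:
        # Until the background load finishes, answer from the source database
        if not self.loaded:
            return self.query_source(sql)
        return self.query_cache(sql)
```

The fingerprint is read before the reload. A change that lands mid-reload
therefore shows up as a mismatch on the next cycle, which costs one extra
reload instead of leaving stale data.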

The 30-second query now takes under a millisecond. The agent can actually have a
back-and-forth with the user instead of timing out after the first question.

There's a query-result cache on top of this too. It has a prewarm manifest,
basically a list of common queries that run at startup so the first person to
use the agent on Monday morning doesn't sit through a cold start.

class QueryCache:
    async def get_or_compute(self, cache_key, compute_fn, ttl=None):
        cached = self.get(cache_key)
        if cached is not None:
            return cached

        result = await compute_fn()
        if "error" not in result:
            self._put(cache_key, result, ttl or self._default_ttl)
        return result

It skips caching error responses. If a query fails because the database is
temporarily overloaded, you don't want that failure served for the next hour.
That one took a production outage to figure out.
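Here is a sketch of the prewarm step. The manifest shape (pairs of cache key
and query coroutine) is an assumption. `get` and `_put` aren't shown above, so
the cache here is a minimal TTL store written to match the `get_or_compute`
contract.

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable

Compute = Callable[[], Awaitable[dict]]


class QueryCache:
    """Minimal TTL cache with the same get_or_compute contract as above."""

    def __init__(self, default_ttl: float = 3600.0):
        self._default_ttl = default_ttl
        self._store: dict[str, tuple[float, dict]] = {}  # key -> (expires_at, result)

    def get(self, key: str) -> dict | None:
        hit = self._store.get(key)
        if hit is None or hit[0] <= time.monotonic():
            return None
        return hit[1]

    def _put(self, key: str, value: dict, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)

    async def get_or_compute(self, key: str, compute_fn: Compute,
                             ttl: float | None = None) -> dict:
        cached = self.get(key)
        if cached is not None:
            return cached
        result = await compute_fn()
        if "error" not in result:  # never pin a transient failure for a whole TTL
            self._put(key, result, ttl or self._default_ttl)
        return result


async def prewarm(cache: QueryCache, manifest: list[tuple[str, Compute]],
                  max_concurrency: int = 4) -> int:
    """Run every manifest query once at startup; return how many got cached."""
    sem = asyncio.Semaphore(max_concurrency)

    async def warm(key: str, fn: Compute) -> None:
        async with sem:  # don't stampede the source database at boot
            await cache.get_or_compute(key, fn)

    await asyncio.gather(*(warm(k, fn) for k, fn in manifest))
    return sum(cache.get(k) is not None for k, _ in manifest)
```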

Data transformation

Every server I build strips the upstream API response before returning it.
Token usage scales with response size, and most APIs return 10x more data than
the agent will ever look at.

One API I work with returns objects with 60+ fields. The server keeps maybe 8:

def _slim_record(r: dict):
    return _strip_nulls({
        "id": r.get("id"),
        "name": r.get("name"),
        "total_value": _cents_to_major(r.get("total_value_cents")),
        "annual_value": _cents_to_major(r.get("annual_value_cents")),
        "start_date": r.get("start_date"),
        "end_date": r.get("end_date"),
        "status": _effective_status(r),
    })

_cents_to_major converts cents to dollars, because the raw API stores monetary
values in cents. Before I added this conversion, every report the agent
generated had wrong numbers: each dollar amount was off by a factor of 100. A
$2,000 contract (200,000 cents) showed up as $200,000 because the agent read
the cents field as dollars. No amount of prompt engineering fixed it reliably.
Moving the conversion into the server did.

_effective_status is the other one worth mentioning. The API's status field
can say "active" on a record that ended three months ago. The platform's own UI
derives the real status from multiple fields, so the MCP server does the same:

from datetime import date

def _effective_status(r: dict) -> str:
    stage = r.get("stage")
    if stage in ("terminated", "not_renewed"):
        return "inactive"

    if r.get("end_date_not_applicable") or r.get("renewal_type") == "perpetual":
        return r.get("status", "undetermined")

    end_date = r.get("end_date")
    if end_date:
        if date.fromisoformat(end_date) < date.today():
            return "inactive"

    return r.get("status", "undetermined")

Now the agent gives the same answer a human would get looking at the UI.
Stripping nulls across a list of 50 records also saves a few thousand tokens
per response, which adds up.

A log aggregation server I built does something similar: auto-appends
| json auto to queries that don't have a field extraction operator, truncates
raw log lines to 500 characters, converts epoch-millisecond timestamps to
ISO 8601. Small fixes that add up to the agent not wasting turns fighting the
format.
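Here is a sketch of those three fixes. The list of field-extraction operators
is a hypothetical placeholder, since it depends on the log platform's query
language.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

# Hypothetical: operators that already extract fields in the platform's query
# language. Queries containing one of these are left untouched.
_EXTRACTION_OPS = re.compile(r"\|\s*(json|parse|csv|keyvalue|kv|xml)\b", re.IGNORECASE)
MAX_LINE_CHARS = 500


def ensure_field_extraction(query: str) -> str:
    """Append '| json auto' unless the query already extracts fields."""
    if _EXTRACTION_OPS.search(query):
        return query
    return f"{query.rstrip()} | json auto"


def truncate_line(raw: str, limit: int = MAX_LINE_CHARS) -> str:
    """Cap a raw log line at limit characters, marking the cut with an ellipsis."""
    return raw if len(raw) <= limit else raw[:limit] + "…"


def epoch_ms_to_iso(ms: int) -> str:
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp."""
    # Integer split avoids float rounding on large millisecond values
    seconds, millis = divmod(ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)
    return dt.isoformat(timespec="milliseconds")
```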

Download once, serve from cache

Some data is expensive to fetch. PDF documents. Code bundles in tgz archives.
The pattern: download on first access, extract the text, build a line offset
index, serve everything from memory after that.

class CachedFile:
    def __init__(self, content: str):
        self.content = content
        offsets = [0]
        pos = 0
        while True:
            pos = content.find("\n", pos)
            if pos == -1:
                break
            pos += 1
            if pos < len(content):
                offsets.append(pos)
        self._offsets = offsets

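    def _line_end_offset(self, line: int) -> int:
        # A line ends where the next one starts; the last line ends at EOF
        if line < len(self._offsets):
            return self._offsets[line]
        return len(self.content)

    def get_lines(self, start: int, end: int) -> str:
        # start and end are 1-based and inclusive
        return self.content[self._offsets[start-1] : self._line_end_offset(end)]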

I use this for CDN edge function code bundles and PDF documents (extracted with
PyMuPDF). After the first download, the agent reads by line range, searches
with regex, lists the file tree. No repeat downloads. Reading through a
200-page document becomes "just read" instead of "download, extract, read" on
every question.
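The same offset index makes regex search cheap to report by line number. The
search matches against the whole text, then maps each match position to its
line with `bisect`. Here is a sketch, with `search_lines` as a hypothetical
helper rather than the server's tool.

```python
from __future__ import annotations

import bisect
import re


def line_offsets(content: str) -> list[int]:
    """Start offset of every line, built the same way as CachedFile._offsets."""
    offsets = [0]
    pos = content.find("\n")
    while pos != -1 and pos + 1 < len(content):
        offsets.append(pos + 1)
        pos = content.find("\n", pos + 1)
    return offsets


def search_lines(content: str, pattern: str,
                 max_hits: int = 20) -> list[tuple[int, str]]:
    """Return (1-based line number, line text) for each line matching pattern."""
    offsets = line_offsets(content)
    hits: list[tuple[int, str]] = []
    for m in re.finditer(pattern, content, flags=re.MULTILINE):
        # bisect_right counts line starts <= the match position, which is
        # exactly the 1-based number of the line containing the match
        line_no = bisect.bisect_right(offsets, m.start())
        if hits and hits[-1][0] == line_no:
            continue  # report each line once, even with several matches on it
        start = offsets[line_no - 1]
        end = offsets[line_no] if line_no < len(offsets) else len(content)
        hits.append((line_no, content[start:end].rstrip("\n")))
        if len(hits) >= max_hits:
            break
    return hits
```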

When thin is fine

Not everything needs this treatment. A server that translates natural language
to a query language and passes it to an API is fine as a thin wrapper. The
translation is the value there. Same for simple lookup tools.

The question I ask: does the agent hit the same data twice? Does the API return
more than the agent needs? Is the API response time slow enough that the agent
loop feels broken? If yes, the server should be doing work.

The multiplier

When a person uses a web UI, they look at a page, think, click something else.
One request at a time, processed by a human brain. An agent works differently.
It makes five tool calls, stuffs all five responses into its context window, and
reasons over them at once. A slow response gets multiplied by every call. A
60-field JSON blob gets multiplied by every call. It adds up fast.

I've measured the difference. CDN property lookups went from three agent turns
to one once the fuzzy index was in place. Analytical queries went from timing
out at 30 seconds to returning in under a millisecond from DuckDB. And every
single dollar amount in every report was wrong until the server started
converting cents for the agent.

You can try to fix that last one with prompt engineering. I tried for weeks. The
agent still got it wrong often enough that I couldn't trust the output.