<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: gyorgy</title>
    <description>The latest articles on Forem by gyorgy (@gyorgy).</description>
    <link>https://forem.com/gyorgy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3868028%2F54d5507c-1a65-4f3f-917d-7357374a763f.JPG</url>
      <title>Forem: gyorgy</title>
      <link>https://forem.com/gyorgy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gyorgy"/>
    <language>en</language>
    <item>
      <title>MCP annotations are a UX layer, not a security layer</title>
      <dc:creator>gyorgy</dc:creator>
      <pubDate>Tue, 05 May 2026 14:18:58 +0000</pubDate>
      <link>https://forem.com/gyorgy/mcp-annotations-are-a-ux-layer-not-a-security-layer-mdh</link>
      <guid>https://forem.com/gyorgy/mcp-annotations-are-a-ux-layer-not-a-security-layer-mdh</guid>
      <description>&lt;p&gt;When the Model Context Protocol added tool annotations like &lt;code&gt;readOnlyHint&lt;/code&gt;, &lt;code&gt;destructiveHint&lt;/code&gt;, and &lt;code&gt;idempotentHint&lt;/code&gt;, a lot of MCP server authors and host implementers read them as a permission system. The mental model goes something like: a tool declares itself destructive, the host sees that, and the host either prompts the user or refuses outright. Annotations as enforcement, the way file permissions work in a Unix filesystem.&lt;/p&gt;

&lt;p&gt;That's not what they are. A tool annotation is a string the server author typed into a tool definition. The model sees it, the host sees it, and they can use it for confirmation prompts or sorting or color coding. Nothing in the protocol verifies the annotation is true. A server can declare &lt;code&gt;readOnlyHint: true&lt;/code&gt; on a tool that drops your production database, and the protocol won't notice. The host can choose to trust the annotation or not, but the trust is a policy decision the host makes about the server, not something the protocol provides.&lt;/p&gt;

&lt;p&gt;This distinction matters because the annotation system is being asked to carry weight it wasn't designed to carry. Two active spec proposals (&lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1862" rel="noopener noreferrer"&gt;SEP-1862&lt;/a&gt; and &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1913" rel="noopener noreferrer"&gt;SEP-1913&lt;/a&gt;) extend the annotation surface in useful ways. Neither of them changes what annotations fundamentally are. They make a UX layer better. They do not turn it into a security layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What annotations actually are
&lt;/h2&gt;

&lt;p&gt;Annotations are server-declared hints. The server author writes them into the tool definition, the server sends them to the client in &lt;code&gt;tools/list&lt;/code&gt;, and that's the entire chain of custody. There is no signature, no third-party verification, no model-side analysis of what the tool actually does. The annotation is exactly as trustworthy as the server that produced it.&lt;/p&gt;
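&lt;p&gt;To make that chain of custody concrete, here is a sketch of a tool definition as a server might return it from &lt;code&gt;tools/list&lt;/code&gt;. The annotation field names follow the spec's &lt;code&gt;ToolAnnotations&lt;/code&gt; shape; the tool itself is a made-up example.&lt;/p&gt;

```typescript
// A tool definition as a server might return it from tools/list.
// The annotations are plain declared data: nothing in the protocol
// validates that the implementation matches them.
const deleteRecordTool = {
  name: "delete_record",
  description: "Delete a record by id",
  inputSchema: {
    type: "object",
    properties: { id: { type: "string" } },
    required: ["id"],
  },
  annotations: {
    title: "Delete record",
    readOnlyHint: false,   // claims to mutate state
    destructiveHint: true, // claims the mutation is irreversible
    idempotentHint: true,  // claims repeat calls with the same args are safe
    openWorldHint: false,  // claims a closed, known set of targets
  },
};

// A dishonest server could ship the same implementation with
// readOnlyHint: true; the client only ever sees the declaration.
console.log(deleteRecordTool.annotations.readOnlyHint); // false
```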

&lt;p&gt;The MCP specification is explicit about this. From the &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/schema" rel="noopener noreferrer"&gt;schema documentation&lt;/a&gt;: "All properties in ToolAnnotations are hints. They are not guaranteed to provide a faithful description of tool behavior... Clients should never make tool use decisions based on ToolAnnotations received from untrusted servers." That language is in the spec because the working group knows annotations are forgeable.&lt;/p&gt;

&lt;p&gt;Justin Spahr-Summers, one of the MCP co-creators, raised the obvious question during the original review of the annotation system: if a client knows the annotations can't be trusted, what's the point of having them? It's the right question and the spec hasn't really answered it. The working answer in practice is that annotations are useful for two things. First, hosts can build better UX on top of them when the server is trusted (skip the confirmation prompt for a tool that declares itself read-only, render destructive tools in a different color, sort tools so safer ones are surfaced first). Second, hosts can use annotations as one signal among many when scoring how much to scrutinize a tool call.&lt;/p&gt;

&lt;p&gt;Neither of those is enforcement. Both assume the host has already decided the server is honest. The annotation tells the host how to render the tool's intent, not whether to allow it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two SEPs in flight
&lt;/h2&gt;

&lt;p&gt;Two annotation-related proposals are currently working through the MCP spec process, both authored or co-authored by Sam Morrow at GitHub.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1862" rel="noopener noreferrer"&gt;SEP-1862&lt;/a&gt; (Tool Resolution) addresses a real problem with static annotations: a single tool that takes an &lt;code&gt;action&lt;/code&gt; argument and behaves differently based on its value has to declare itself destructive at all times, because the static annotation has to cover the worst case. A &lt;code&gt;manage_files&lt;/code&gt; tool that supports both &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;delete&lt;/code&gt; operations is forced to look as dangerous as its most dangerous mode, even on read calls. The fix is a new &lt;code&gt;tools/resolve&lt;/code&gt; method, inspired by LSP's &lt;code&gt;codeAction/resolve&lt;/code&gt; pattern. Before invoking the tool, the client asks the server: given these specific arguments, what are the real annotations? The server returns refined metadata for that call. Multi-action tools become viable again without sacrificing UX accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/pull/1913" rel="noopener noreferrer"&gt;SEP-1913&lt;/a&gt; (Trust and Sensitivity Annotations), co-authored with OpenAI, works on a different axis. Where existing annotations describe what a tool does, SEP-1913 adds annotations that describe what the data flowing through a tool means. New fields like &lt;code&gt;sensitiveHint&lt;/code&gt; (low/medium/high), &lt;code&gt;privateHint&lt;/code&gt;, &lt;code&gt;maliciousActivityHint&lt;/code&gt;, and &lt;code&gt;attribution&lt;/code&gt; let servers mark returned data with trust and sensitivity metadata, and let that metadata propagate through an agent session so a host can enforce policies like "do not send data marked private to tools marked open-world."&lt;/p&gt;

&lt;p&gt;Both proposals fill genuine gaps. SEP-1862 unblocks a tool design pattern that was effectively forbidden by static annotations. SEP-1913 extends the annotation surface from what tools do to what data they handle, which is the right direction if you care about prompt injection and exfiltration.&lt;/p&gt;

&lt;p&gt;What neither proposal changes is the trust model. SEP-1862's resolved annotations are still server-declared. SEP-1913's data annotations are still server-declared. A server that lies in &lt;code&gt;tools/list&lt;/code&gt; can lie just as easily in &lt;code&gt;tools/resolve&lt;/code&gt; or in a &lt;code&gt;sensitiveHint&lt;/code&gt; field on returned content. The proposals make honest servers more expressive. They do not make dishonest servers detectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for MCP server design today
&lt;/h2&gt;

&lt;p&gt;If annotations are a UX layer, design your server so the UX layer stays accurate without depending on protocol-level enforcement.&lt;/p&gt;

&lt;p&gt;The first decision is tool granularity. A multi-action tool with an &lt;code&gt;action&lt;/code&gt; argument forces a worst-case static annotation, which means honest hosts will over-prompt and well-tuned models will steer around the tool because it looks dangerous. Until SEP-1862 lands, separate tools per action keep static annotations honest. One tool reads, one tool lists, one tool removes. Each declares its real shape and the annotation is true at all times. This costs you a few more tool definitions and saves the host from making bad UX decisions on your behalf.&lt;/p&gt;
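&lt;p&gt;In code, the split looks like this (tool names and shapes are illustrative): each tool declares its real behavior, so the static annotation never has to lie about the worst case.&lt;/p&gt;

```typescript
// One capability, split into per-action tools so that each static
// annotation is true on every call.
const fileTools = [
  {
    name: "read_file",
    annotations: { readOnlyHint: true, destructiveHint: false },
  },
  {
    name: "list_files",
    annotations: { readOnlyHint: true, destructiveHint: false },
  },
  {
    name: "delete_file",
    annotations: { readOnlyHint: false, destructiveHint: true },
  },
];

// A host that trusts this server can skip confirmation prompts for the
// two read-only tools instead of over-prompting on every call.
const needsPrompt = fileTools.filter((t) => t.annotations.destructiveHint);
console.log(needsPrompt.map((t) => t.name)); // ["delete_file"]
```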

&lt;p&gt;The second decision is how to use the existing annotation fields. The boolean grid (&lt;code&gt;readOnlyHint&lt;/code&gt;, &lt;code&gt;destructiveHint&lt;/code&gt;, &lt;code&gt;idempotentHint&lt;/code&gt;, &lt;code&gt;openWorldHint&lt;/code&gt;) is a set of independent flags rather than ordered tiers, but in practice tools cluster into three groups. Read-only tools (&lt;code&gt;readOnlyHint: true&lt;/code&gt;). Mutating but recoverable tools (&lt;code&gt;readOnlyHint: false, destructiveHint: false&lt;/code&gt;). Destructive tools (&lt;code&gt;readOnlyHint: false, destructiveHint: true&lt;/code&gt;). Treating these as tiers internally simplifies host policy, even though the protocol doesn't enforce the structure. It also makes it obvious which tier a new tool belongs to when you add one, which matters at scale.&lt;/p&gt;
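&lt;p&gt;One way to collapse the flags into those three tiers, as a host or server might do internally (the tier names are a convention of this article, not part of MCP):&lt;/p&gt;

```typescript
// Collapse the boolean grid into three tiers. readOnlyHint wins when
// set; otherwise destructiveHint decides between the two mutating tiers.
type ToolAnnotations = { readOnlyHint?: boolean; destructiveHint?: boolean };

function tier(a: ToolAnnotations) {
  if (a.readOnlyHint) {
    return "read-only";
  }
  if (a.destructiveHint) {
    return "destructive";
  }
  return "mutating";
}

console.log(tier({ readOnlyHint: true }));                          // "read-only"
console.log(tier({ readOnlyHint: false, destructiveHint: true }));  // "destructive"
console.log(tier({ readOnlyHint: false, destructiveHint: false })); // "mutating"
```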

&lt;p&gt;The third decision is what to do about the trust gap. The honest answer is that the protocol can't close it for you, so you close it elsewhere. Sandboxed execution, infrastructure-level egress controls, and third-party scanners (&lt;a href="https://github.com/snyk/agent-scan" rel="noopener noreferrer"&gt;Snyk's Agent Scan&lt;/a&gt; is one example) sit outside the protocol and verify or constrain what tools actually do, regardless of what they claim. If your MCP server runs in a context where any of those layers exist, lean on them. The annotations on your tools should be honest, but the security boundary lives somewhere else.&lt;/p&gt;

&lt;p&gt;What you should not do is treat annotation correctness as the security boundary. A server author who annotates carefully and a server author who lies look identical to the protocol. If your design assumes the host can tell them apart through annotations alone, you have a gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual security layer lives outside MCP
&lt;/h2&gt;

&lt;p&gt;Once you accept that annotations are a UX layer, the question of where security actually lives becomes easier to answer. It lives in three places, none of them in the protocol.&lt;/p&gt;

&lt;p&gt;The first is host-level policy on which servers to trust. The host decides which MCP servers it accepts tools from, what scopes those servers operate under, and what the user has approved. That's where the real allow/deny decision happens. Annotations help the host build clearer prompts and better defaults, but the host is the one accepting or rejecting the tool call.&lt;/p&gt;

&lt;p&gt;The second is infrastructure-level enforcement. Sandboxed execution, network egress rules, filesystem permissions, container boundaries. These don't care what a tool's annotations say. A tool that claims to be read-only but tries to write outside its sandbox is stopped by the sandbox, not by the annotation. For any MCP server doing real work in production, this layer is where deletion, exfiltration, and lateral movement actually get prevented.&lt;/p&gt;

&lt;p&gt;The third is third-party verification. Scanners that examine MCP server code or behavior independently of what the server claims. &lt;a href="https://github.com/snyk/agent-scan" rel="noopener noreferrer"&gt;Snyk's Agent Scan&lt;/a&gt; is one example of this category, and more will appear as the ecosystem matures. These tools occupy the space the protocol can't, because by definition they treat the server as untrusted and verify rather than trust.&lt;/p&gt;

&lt;p&gt;None of this makes annotations useless. Annotations let honest servers communicate intent, let hosts build interfaces that match that intent, and give users the right amount of friction at the right moments. SEP-1862 will make that signal sharper for multi-action tools. SEP-1913 will extend it to the data flowing through tools. Both are worth shipping.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>security</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Migrating off AWS App Runner before the April 30 deadline</title>
      <dc:creator>gyorgy</dc:creator>
      <pubDate>Tue, 14 Apr 2026 14:11:34 +0000</pubDate>
      <link>https://forem.com/gyorgy/migrating-off-aws-app-runner-before-the-april-30-deadline-5g8m</link>
      <guid>https://forem.com/gyorgy/migrating-off-aws-app-runner-before-the-april-30-deadline-5g8m</guid>
      <description>&lt;p&gt;AWS is shutting the door on App Runner for new customers effective April 30, 2026. If you're running production workloads on it, existing apps keep working for now, but there are no new features coming, and "maintenance mode" at AWS historically means "start planning your migration."&lt;/p&gt;

&lt;p&gt;I just finished a migration off App Runner for a production Next.js frontend, and wanted to write down what I learned in case it's useful to anyone else facing the same deadline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The options
&lt;/h2&gt;

&lt;p&gt;AWS officially recommends &lt;strong&gt;ECS Express Mode&lt;/strong&gt; as the direct App Runner replacement. It's a newer single-resource abstraction that auto-provisions an ECS cluster, service, ALB, security groups, auto-scaling, and CloudWatch logging. One Terraform resource, one deploy, done.&lt;/p&gt;

&lt;p&gt;The other options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard ECS Fargate&lt;/strong&gt;. More moving parts, years of battle-testing, full control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda + API Gateway&lt;/strong&gt;. True scale-to-zero, good for infrequent API traffic, cold starts on anything else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightsail containers&lt;/strong&gt;. Simpler than ECS, cheaper for small workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Run&lt;/strong&gt;. If you're open to leaving AWS, this is genuinely the best container-in-a-box experience on any cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fly.io / Render / Railway&lt;/strong&gt;. PaaS experience outside AWS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For our use case (production Next.js behind CloudFront with a real VPC, Kong gateway, and backend services on the same infrastructure), ECS Fargate was the natural fit. Express Mode looked appealing on paper, but I went with standard Fargate instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not ECS Express Mode
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Terraform bug.&lt;/strong&gt; The &lt;code&gt;aws_ecs_express_gateway_service&lt;/code&gt; resource had an open issue (hashicorp/terraform-provider-aws#45792, "Provider produced inconsistent result after apply") that would have blocked deploys. Fixable with workarounds, but not something I wanted to own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "Managed abstraction" fatigue.&lt;/strong&gt; App Runner was also supposed to be the easy path. It lasted four years before being sidelined. Express Mode is newer than App Runner was when I first used it. I wasn't willing to bet a second production frontend on another abstraction that might get sunset in 18 months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. ALB duplication.&lt;/strong&gt; Express Mode auto-creates its own ALB. If you already have an ALB for other services (like I did for a Kong gateway routing backend services), you end up paying for two. Around $16/month extra for the overlap. Not huge, but annoying and unnecessary.&lt;/p&gt;

&lt;p&gt;Standard ECS Fargate uses the ALB you already have. Same pattern as every other service in the cluster. Boring, predictable, stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the migration actually looked like
&lt;/h2&gt;

&lt;p&gt;The architecture ended up like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  ↓
CloudFront (caching + WAF)
  ↓ X-Origin-Verify header
ALB (port 443, host-based routing)
  ↓                    ↓
Next.js target      Kong target
group               group
  ↓                    ↓
ECS Fargate         Kong gateway
(Next.js)              ↓
                    Backend services
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next.js containers run in private VPC subnets. ALB listener rules use host-based routing to split frontend traffic (&lt;code&gt;example.com&lt;/code&gt; → Next.js target group) from API traffic (any host + X-Origin-Verify header → Kong target group). CloudFront in front for caching, SSL, and WAF.&lt;/p&gt;

&lt;p&gt;For origin protection, I stuck with &lt;code&gt;X-Origin-Verify&lt;/code&gt; header validation on the ALB rule. The AWS-managed CloudFront prefix list is a cleaner option (allow only CloudFront IPs at the security group level) but it's more moving parts and one more thing to update when AWS changes its prefix list. The header check was good enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas I hit
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Health checks.&lt;/strong&gt; Next.js needs a &lt;code&gt;/health&lt;/code&gt; endpoint returning 200 for ALB target group health checks. This is obvious in retrospect but it was our first failed deploy. Add it to your &lt;code&gt;app/health/route.ts&lt;/code&gt; before you migrate, not during.&lt;/p&gt;
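&lt;p&gt;The route handler can be as small as this. A minimal sketch: any 200 response satisfies the default ALB matcher, so adjust the payload to whatever your monitoring expects.&lt;/p&gt;

```typescript
// app/health/route.ts
// Minimal Next.js App Router route handler for the ALB target group
// health check. Route handlers export HTTP method functions; returning
// a 200 is all the ALB needs.
export function GET() {
  return Response.json({ status: "ok" });
}
```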

&lt;p&gt;&lt;strong&gt;Single-phase deploy.&lt;/strong&gt; The App Runner + CloudFront setup I had was a two-phase deploy: Terraform creates App Runner, CLI collects the URL, Terraform runs again with the URL as a CloudFront origin. With ECS behind an ALB that already exists at plan time, this goes away. One &lt;code&gt;terraform apply&lt;/code&gt;, no two-phase dance. Genuinely nicer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Private subnets from the start.&lt;/strong&gt; App Runner services are publicly routable on the internet, with WAF-only protection and no network-level isolation. ECS Fargate in private subnets gives you proper network boundaries. Don't skip this. Put your container in private subnets with no public IP, only allow ingress from the ALB security group.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-scaling.&lt;/strong&gt; Express Mode gives you auto-scaling for free. Standard Fargate requires configuring target-tracking scaling policies yourself. One extra Terraform resource, but you have actual control over what the scaling metric is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about scale-to-zero?
&lt;/h2&gt;

&lt;p&gt;This is the pain point for everyone moving off App Runner. Standard Fargate does not scale to zero. You always pay for at least one running task. If your workload has long idle periods, this is a real cost difference.&lt;/p&gt;

&lt;p&gt;For production workloads this is usually fine (you want at least one container warm anyway). For dev/staging environments or low-traffic side projects, you have three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run on GCP&lt;/strong&gt;. Actual scale-to-zero, sub-second cold starts, no ALB needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda + API Gateway&lt;/strong&gt;. Scale-to-zero, but cold starts hurt if your app isn't designed for them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled shutdowns&lt;/strong&gt;. &lt;code&gt;eventbridge&lt;/code&gt; rules to scale the ECS service to 0 at night, back to 1 in the morning. Crude but effective for dev environments.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your app is a very-low-traffic FastAPI backend (as in the Reddit thread that prompted this article), honestly, Cloud Run is probably the right answer. AWS just doesn't have a real equivalent right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I do it again?
&lt;/h2&gt;

&lt;p&gt;Yeah, for a production workload with an existing VPC and other services, the standard Fargate path was the right call. The migration was not fun but the result is cleaner than App Runner. Single-phase deploys, private networking, no dependency on a deprecated service.&lt;/p&gt;

&lt;p&gt;If I were starting fresh with a brand new single service and no existing infrastructure, I'd look harder at Cloud Run or fly.io. AWS's container story below ECS is just not compelling anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tsdevstack angle
&lt;/h2&gt;

&lt;p&gt;I build a multi-cloud TypeScript framework called &lt;a href="https://tsdevstack.dev" rel="noopener noreferrer"&gt;tsdevstack&lt;/a&gt; that generates production infrastructure from a config file. The App Runner to ECS Fargate migration above is what shipped in v0.2.0. Framework users who were deploying Next.js frontends via App Runner can now re-run &lt;code&gt;infra:deploy&lt;/code&gt; and the framework handles the migration automatically.&lt;/p&gt;

&lt;p&gt;One thing worth mentioning given the scale-to-zero discussion above: tsdevstack implements scale-to-zero on AWS for services that set &lt;code&gt;minInstances: 0&lt;/code&gt; in config. Since ECS Fargate doesn't have native scale-to-zero, the framework generates a three-layer mechanism: a CloudWatch alarm scales the service to zero when idle (CPU below 5% for 15 minutes); a wake-up Lambda spins it back up when the first request hits the ALB and returns a 502; and Kong catches that 502, fires the wake-up call, and returns a 503 with &lt;code&gt;Retry-After: 30&lt;/code&gt; so the client retries automatically. Cold start is around 30-60 seconds, which is significant compared to Cloud Run or Container Apps, but it's real scale-to-zero on AWS and it works. Kong itself stays at &lt;code&gt;minInstances &amp;gt;= 1&lt;/code&gt; so there's always something to trigger the wake-up.&lt;/p&gt;
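&lt;p&gt;From the client's side, the 503 + &lt;code&gt;Retry-After&lt;/code&gt; contract plays out like the sketch below. This is an illustration of the retry behavior a caller would implement, not the framework's actual Kong plugin, and the function name is made up.&lt;/p&gt;

```typescript
// Retry a request while the backing service wakes from scale-to-zero.
// A 503 with a Retry-After header means "the wake-up has been fired;
// come back shortly"; any other status is returned to the caller.
async function fetchWithWake(url: string, maxAttempts = 3) {
  for (let attempt = 0; attempt !== maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 503) {
      return res;
    }
    // Honor the gateway's Retry-After hint (seconds), defaulting to 30.
    const retryAfter = Number(res.headers.get("Retry-After") ?? "30");
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error("service did not wake up in time");
}
```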

&lt;p&gt;If you're tired of writing Terraform by hand for every AWS migration AWS forces on you, take a look. &lt;a href="https://tsdevstack.dev" rel="noopener noreferrer"&gt;Docs here&lt;/a&gt;, repo at &lt;a href="https://github.com/tsdevstack" rel="noopener noreferrer"&gt;github.com/tsdevstack&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: aws, terraform, devops, cloud&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>I built a TypeScript framework that generates your entire cloud infrastructure</title>
      <dc:creator>gyorgy</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:18:57 +0000</pubDate>
      <link>https://forem.com/gyorgy/i-built-a-typescript-framework-that-generates-your-entire-cloud-infrastructure-1392</link>
      <guid>https://forem.com/gyorgy/i-built-a-typescript-framework-that-generates-your-entire-cloud-infrastructure-1392</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; tsdevstack is an open-source TypeScript microservices framework. You write a config file and application code. It generates Terraform, Docker, Kong gateway routes, CI/CD pipelines, secrets, and observability — across GCP, AWS, and Azure. One command deploys the whole stack.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/6MJ4PPPjxH8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Every TypeScript project I shipped to production followed the same pattern. Write the application code in a week. Spend the next month wiring up infrastructure.&lt;/p&gt;

&lt;p&gt;Terraform for the cloud resources. Docker for local dev. Kong or some other gateway for routing. JWT auth boilerplate. Secrets management across environments. CI/CD pipelines. Observability. WAF rules. SSL certificates. Health checks. Database migrations. And then the same dance for staging and production.&lt;/p&gt;

&lt;p&gt;The application code was the easy part. Everything around it took 10x longer.&lt;/p&gt;

&lt;p&gt;I tried the existing options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heroku-style platforms&lt;/strong&gt; hide too much. The moment you need a WAF, a custom gateway, or VPC isolation, you're stuck.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pulumi/CDK/Terraform modules&lt;/strong&gt; are flexible but you still write and maintain all of it. And you write it differently for each cloud provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templates and starters&lt;/strong&gt; get you a working hello-world but rot the moment you customise them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something in between. A framework that owned the infrastructure layer entirely — generated, managed, deployed — but stayed out of the way of the application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure as Framework.&lt;/strong&gt; You write TypeScript application code and one config file. The framework generates everything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @tsdevstack/cli init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scaffolds a monorepo with NestJS backends, Next.js frontends, a Kong API gateway, Postgres, Redis, and observability. Everything wired together, ready to run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local development matches production. Same gateway, same database engine, same auth flow, same observability stack. No "works on my machine" gap.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsdevstack cloud:init &lt;span class="nt"&gt;--gcp&lt;/span&gt;
npx tsdevstack infra:init &lt;span class="nt"&gt;--env&lt;/span&gt; dev
npx tsdevstack infra:deploy &lt;span class="nt"&gt;--env&lt;/span&gt; dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provisions the full production stack: VPC, managed Postgres, Redis, container registry, Cloud Run services, API gateway, load balancer, WAF, SSL certificates, observability. From a single config file.&lt;/p&gt;

&lt;p&gt;The same flow works on AWS (ECS Fargate) and Azure (Container Apps). Same framework, same patterns, same commands. No rewriting infrastructure when you switch providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the box
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Application layer&lt;/strong&gt; — NestJS backends, Next.js frontends, Rsbuild SPAs. Auto-generated TypeScript API clients with DTOs as separated imports — both frontend and backend apps consume the same type-safe library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API gateway&lt;/strong&gt; — Kong routes auto-generated from your OpenAPI specs. JWT validation, rate limiting, CORS, bot detection. Fully customisable when you need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background processing&lt;/strong&gt; — BullMQ job queues with detached workers running in separate containers. Scale independently from API services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Object storage&lt;/strong&gt; — Add buckets with &lt;code&gt;add-bucket-storage&lt;/code&gt;. MinIO locally, S3/GCS/Azure Blob in production. Unified &lt;code&gt;StorageModule&lt;/code&gt; with pre-signed URLs and per-provider adapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Async messaging&lt;/strong&gt; — Inter-service pub/sub via Redis Streams. Consumer groups, dead letter queues, retry logic. No new infrastructure — runs on the same Redis instance as caching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication&lt;/strong&gt; — JWT token management, protected routes, session handling, email confirmation. Bring your own OIDC or use the built-in auth service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets&lt;/strong&gt; — Local secrets generated automatically for development. Cloud secrets managed separately and pushed to the cloud provider's Secret Manager. Environment isolation, scoped per service. Works with Secret Manager on all three providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; — Prometheus metrics, Grafana dashboards, distributed tracing with Jaeger, structured logging. Configured from day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt; — Generated Terraform for GCP, AWS, and Azure. VPC/VNet, managed databases, Redis, container orchestration, load balancers, WAF, SSL, CDN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD&lt;/strong&gt; — Generated GitHub Actions workflows. OIDC authentication, per-service deploys, environment selection. No secrets in your repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance&lt;/strong&gt; — SOC 2, ISO 27001, GDPR technical controls built in. Encryption at rest and in transit, network isolation, zero-credential runtimes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it actually works
&lt;/h2&gt;

&lt;p&gt;The framework manages a &lt;code&gt;config.json&lt;/code&gt; for your project structure — you don't edit it by hand; you modify it through commands like &lt;code&gt;add-service&lt;/code&gt;, &lt;code&gt;add-bucket-storage&lt;/code&gt;, &lt;code&gt;add-messaging-topic&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your &lt;code&gt;config.json&lt;/code&gt; ends up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"projectName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-saas"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cloud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"services"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auth-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nestjs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hasDatabase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"frontend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nextjs"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"buckets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"uploads"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run &lt;code&gt;npx tsdevstack sync&lt;/code&gt;, the framework reads the config and generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker-compose.yml&lt;/code&gt; with all the services, dependencies, and health checks&lt;/li&gt;
&lt;li&gt;Kong gateway config from OpenAPI specs&lt;/li&gt;
&lt;li&gt;Local secrets in &lt;code&gt;.env&lt;/code&gt; files per service&lt;/li&gt;
&lt;li&gt;Database initialization scripts&lt;/li&gt;
&lt;li&gt;Service stubs if you added new ones&lt;/li&gt;
&lt;/ul&gt;
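&lt;p&gt;As an illustrative sketch only (the function and field names here are assumptions, not the framework's actual internals), the kind of mapping &lt;code&gt;sync&lt;/code&gt; performs when deriving docker-compose service entries from the &lt;code&gt;services&lt;/code&gt; array might look like this:&lt;/p&gt;

```typescript
// Hypothetical sketch of config-to-compose mapping; not tsdevstack source.
interface ServiceConfig {
  name: string;
  type: string;
  hasDatabase?: boolean;
}

function toComposeServices(services: ServiceConfig[]) {
  const compose: { [name: string]: object } = {};
  for (const svc of services) {
    compose[svc.name] = {
      // Assumed directory layout: one folder per service.
      build: "./services/" + svc.name,
      // A service that declares a database gets a dependency on a shared
      // postgres container, so startup order can be gated on health checks.
      ...(svc.hasDatabase ? { depends_on: ["postgres"] } : {}),
    };
  }
  return compose;
}

const compose = toComposeServices([
  { name: "auth-service", type: "nestjs", hasDatabase: true },
  { name: "frontend", type: "nextjs" },
]);
console.log(JSON.stringify(compose, null, 2));
```

&lt;p&gt;The point is the direction of the flow: the JSON config is the single input, and every generated artifact is a pure function of it.&lt;/p&gt;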

&lt;p&gt;You write the &lt;code&gt;infrastructure.json&lt;/code&gt; directly for cloud-specific settings (domains, scaling, environments).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"environments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"services"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"auth-service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"minInstances"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"maxInstances"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"cpu"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"512Mi"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"frontend"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"minInstances"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"maxInstances"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"cpu"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1Gi"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run &lt;code&gt;npx tsdevstack infra:deploy&lt;/code&gt;, it generates Terraform for your chosen provider and applies it. The framework owns the Terraform: you don't write it, and you don't maintain it.&lt;/p&gt;

&lt;p&gt;The escape hatch is intentional. Custom Kong config? Drop in your own. Need a Terraform resource the framework doesn't generate? Add it as a side file. Need a cloud-native service the framework doesn't wrap? Use the SDK directly. The framework isn't a cage — it's a starting point that handles 95% of cases and gets out of your way for the other 5%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why three clouds?
&lt;/h2&gt;

&lt;p&gt;Vendor lock-in is real but slow-moving. You don't switch clouds because you want to — you switch because acquisition, pricing change, region requirements, or a customer with an immovable preference forces you to. When that happens, rewriting infrastructure is brutal.&lt;/p&gt;

&lt;p&gt;tsdevstack generates the equivalent infrastructure on GCP (Cloud Run + Cloud SQL + Memorystore), AWS (ECS Fargate + RDS + ElastiCache), and Azure (Container Apps + Azure Database for PostgreSQL + Azure Cache for Redis). Same application code, same config file, different generated Terraform. Switching providers is a config change and a redeploy.&lt;/p&gt;
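&lt;p&gt;The provider-to-services pairing above can be read as a simple lookup table. The shape of this object is illustrative (it is not how the framework stores it), but the pairings themselves come straight from the list above:&lt;/p&gt;

```typescript
// Provider-to-managed-services mapping as described in the post.
// The object shape is an illustration; the pairings are from the article.
const providerStacks: {
  [provider: string]: { compute: string; database: string; cache: string };
} = {
  gcp: { compute: "Cloud Run", database: "Cloud SQL", cache: "Memorystore" },
  aws: { compute: "ECS Fargate", database: "RDS", cache: "ElastiCache" },
  azure: {
    compute: "Container Apps",
    database: "Azure Database for PostgreSQL",
    cache: "Azure Cache for Redis",
  },
};

console.log(providerStacks["aws"].compute); // ECS Fargate
```

&lt;p&gt;Because each column is a native managed service rather than a lowest-common-denominator wrapper, the generated Terraform can use each provider's idiomatic resources.&lt;/p&gt;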

&lt;p&gt;No abstraction layer trying to hide the differences between clouds. Each provider gets a native, idiomatic implementation. The framework handles the translation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about AI agents?
&lt;/h2&gt;

&lt;p&gt;There's a built-in MCP (Model Context Protocol) server with 54 tools for deploying, querying, and debugging your stack. Claude Code, Cursor, and VS Code Copilot can manage the infrastructure directly — and because the framework has strong conventions, the AI agent actually understands what it's doing instead of hallucinating CLI commands.&lt;/p&gt;

&lt;p&gt;Three permission tiers: SAFE_READ, CLOUD_MUTATE, CLOUD_DESTRUCTIVE. The agent always asks for permission before mutating anything. The MCP server is built into the CLI — no separate package, no extra setup.&lt;/p&gt;
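&lt;p&gt;As a sketch of the policy (not tsdevstack's actual implementation, only the tier names come from the framework), a host gating tool calls on those tiers reduces to one predicate:&lt;/p&gt;

```typescript
// Illustrative sketch: gating MCP tool calls on tsdevstack's three tiers.
// The tier names are the framework's; the function itself is assumed.
type Tier = "SAFE_READ" | "CLOUD_MUTATE" | "CLOUD_DESTRUCTIVE";

function requiresConfirmation(tier: Tier): boolean {
  // Only pure reads run without a prompt; anything that can change cloud
  // state asks the user first, whether or not it is destructive.
  return tier !== "SAFE_READ";
}

console.log(requiresConfirmation("SAFE_READ")); // false
console.log(requiresConfirmation("CLOUD_DESTRUCTIVE")); // true
```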

&lt;h2&gt;
  
  
  Where it stands
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Open source.&lt;/strong&gt; MIT license. Four packages on npm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@tsdevstack/cli&lt;/code&gt; — the CLI, infrastructure generation, deployment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@tsdevstack/nest-common&lt;/code&gt; — shared NestJS modules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@tsdevstack/cli-mcp&lt;/code&gt; — MCP server for AI agents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@tsdevstack/react-bot-detection&lt;/code&gt; — React bot detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;v0.2.0 just shipped&lt;/strong&gt; with object storage, async messaging, AWS App Runner → ECS Fargate migration (App Runner stops accepting new customers April 30), and a batch of WAF and observability improvements across all three providers.&lt;/p&gt;

&lt;p&gt;This is solo work. I'm a developer building this on the side. It started as the framework I wanted for my own projects and grew into something I think other people will find useful. The first users are showing up now.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @tsdevstack/cli init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docs and guides: &lt;a href="https://tsdevstack.dev" rel="noopener noreferrer"&gt;tsdevstack.dev&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/tsdevstack" rel="noopener noreferrer"&gt;github.com/tsdevstack&lt;/a&gt;&lt;br&gt;
Discord: &lt;a href="https://discord.gg/tsdevstack" rel="noopener noreferrer"&gt;discord.gg/tsdevstack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback wanted. Bug reports wanted. Issues, ideas, complaints — all welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: typescript, nestjs, devops, opensource&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>nestjs</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
