Forem: Raju Dandigam

Stop Rebuilding Your AI App on Every Change: Docker Compose Watch for Node.js Developers

Raju Dandigam — Mon, 25 May 2026 17:38:51 +0000

Introduction

Local AI application development often starts simple. You build a Node.js API, call a model provider, add a prompt, and test the response. Then the stack grows. You add Redis for short-term memory, Postgres for application state, a local model endpoint, maybe a worker service, and a frontend to inspect results.

At that point, Docker Compose becomes useful because it can run the whole development environment consistently. The problem is the development loop. If every source code change requires stopping containers, rebuilding images, restarting services, and waiting for the app to come back, Docker starts to feel slower than working directly on the host machine.

Docker Compose Watch helps solve that problem. It lets Compose watch local file changes and either sync files into running containers, rebuild services, or sync files and restart services depending on what changed. Docker's documentation describes Compose Watch as a way to automatically update and preview running Compose services as you edit and save code.

For Node.js AI apps, this can make local development feel much smoother. You keep the benefits of a containerized stack, but you avoid the manual rebuild cycle for every small TypeScript change.

The Local AI Development Loop Problem

A typical local AI stack may include several services:

Node.js API (TypeScript)
    ├── Redis (session/memory)
    ├── Postgres (state)
    ├── Local LLM endpoint (optional)
    └── Frontend debug UI

Without watch mode, a small change can turn into a slow loop:

1. Edit a TypeScript file
2. Stop containers
3. Rebuild the API image
4. Restart containers
5. Wait for services to initialize
6. Test the change

If the frontend also changes, you repeat the same loop there. If the change touches dependencies, you rebuild again.

That delay matters because AI development is highly iterative. You may change the prompt, adjust a tool schema, update response parsing, improve logging, or add one guardrail. These are small changes, but you may make dozens of them in a single session.

The goal is not to avoid rebuilds forever. Dependency and Dockerfile changes should still rebuild the image. The goal is to avoid rebuilding the entire service when only a source file changed.

What Compose Watch Does

Compose Watch is configured under the develop.watch section of a service. The Compose Develop specification defines watch actions such as sync, rebuild, sync+restart, and the newer sync+exec. Most Node.js developers need only the first three: sync, rebuild, and sync+restart.

The sync action copies changed files from your host into the running container. This is useful when the process inside the container already has a watcher, such as tsx watch, nodemon, or Vite.

The rebuild action rebuilds the service image. This is useful when package.json, a lockfile, or a Dockerfile changes.

The sync+restart action copies files and restarts the container. This is useful when the service does not have its own hot-reload process.

You start the environment with:

docker compose up --watch

Docker also provides a standalone docker compose watch command, which watches the build context and rebuilds or refreshes containers when files change.

A Simple Node.js AI API Example

Assume we have a TypeScript API that exposes one endpoint for testing prompts. It talks to Redis for short-term memory and uses an environment variable for the model endpoint.

A simple development Dockerfile, saved as Dockerfile.dev to match the Compose file that follows, can look like this:

FROM node:22-slim

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

CMD ["npx", "tsx", "watch", "src/index.ts"]

This image is intentionally for development. It includes dependencies needed to run TypeScript directly with a watcher. The production Dockerfile should usually be different and use a compiled dist output.

Now add Compose Watch:

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: development
      REDIS_URL: redis://redis:6379
      OPENAI_BASE_URL: ${OPENAI_BASE_URL:-http://host.docker.internal:12434/engines/llama.cpp/v1}
      OPENAI_API_KEY: ${OPENAI_API_KEY:-local-development-key}
    depends_on:
      - redis
    develop:
      watch:
        - action: sync
          path: ./src
          target: /app/src
          ignore:
            - "**/*.test.ts"
            - "**/*.spec.ts"

        - action: rebuild
          path: ./package.json

        - action: rebuild
          path: ./package-lock.json

        - action: rebuild
          path: ./tsconfig.json

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

Now start the stack:

docker compose up --watch

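When you edit a TypeScript file in src, Compose syncs it into /app/src inside the container. Then tsx watch notices the change and reloads the process. When you change package.json or the lockfile, Compose rebuilds the image because the dependency layer needs to change. A tsconfig.json change also triggers a rebuild, since that file is copied into the image at build time and is not covered by the sync rule.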

This gives you a better local loop without abandoning containers.

Why This Helps AI Development

AI development has a different rhythm from many traditional API projects. You often test small changes repeatedly. A developer may adjust a prompt, change a system message, add structured JSON parsing, tweak a retry rule, or update how tool results are summarized.

These changes usually live in source files. They should not require a full image rebuild. Compose Watch lets those files sync quickly while the rest of the stack stays running.

For example, you may have a small prompt helper like this:

export function buildSummaryPrompt(input: string) {
  return [
    {
      role: "system",
      content:
        "You summarize technical logs clearly. Mention the likely cause and next action."
    },
    {
      role: "user",
      content: input
    }
  ];
}

When you change the system message, the API container can reload quickly. Redis stays running. Postgres stays running. Your local model endpoint or cloud model configuration stays the same. You can immediately send another request and compare behavior.
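To make the loop concrete, here is a minimal sketch of how that helper might feed a request body for an OpenAI-compatible chat completions endpoint. The ChatMessage type, buildChatRequest, the temperature value, and the model name are assumptions for illustration, not part of the original service.

```typescript
// Message shape used by OpenAI-compatible chat completion APIs.
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Same helper as above, with an explicit return type.
function buildSummaryPrompt(input: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "You summarize technical logs clearly. Mention the likely cause and next action."
    },
    { role: "user", content: input }
  ];
}

// Hypothetical request shape: wraps the prompt in a request body.
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
}

function buildChatRequest(model: string, input: string): ChatRequest {
  return {
    model,
    messages: buildSummaryPrompt(input),
    temperature: 0.2 // assumed value; lower values make summaries more repeatable
  };
}

// "local-model" is a placeholder; use whatever model your endpoint serves.
const body = buildChatRequest("local-model", "ECONNREFUSED 127.0.0.1:6379");
console.log(JSON.stringify(body, null, 2));
```

The API would POST this body as JSON to the chat completions path under OPENAI_BASE_URL. Because the prompt lives in one small function, editing the system message and saving is enough for the sync rule and tsx watch to pick it up.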

That is the value. Watch mode helps keep the feedback loop close to the speed of normal Node.js development while preserving the consistency of a Compose stack.

Adding a Frontend Debug UI

Many AI apps eventually need a simple UI for testing prompts, reviewing agent traces, or inspecting responses. Compose Watch works well with frontend tools such as Vite or Next.js too.

Here is a small multi-service setup:

services:
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.dev
    ports:
      - "5173:5173"
    environment:
      VITE_API_URL: http://localhost:3000
    develop:
      watch:
        - action: sync
          path: ./frontend/src
          target: /app/src
          ignore:
            - node_modules/

        - action: rebuild
          path: ./frontend/package.json

        - action: rebuild
          path: ./frontend/package-lock.json

  api:
    build:
      context: ./api
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    environment:
      REDIS_URL: redis://redis:6379
    depends_on:
      - redis
    develop:
      watch:
        - action: sync
          path: ./api/src
          target: /app/src

        - action: rebuild
          path: ./api/package.json

        - action: rebuild
          path: ./api/package-lock.json

  redis:
    image: redis:7-alpine

With this setup, frontend changes sync into the frontend container, API changes sync into the API container, and Redis keeps running with its data intact while you iterate. You can work on the UI and backend without rebuilding everything on every change.

Watch Mode vs Bind Mounts

Many developers already use bind mounts for local development:

volumes:
  - ./src:/app/src

That works, but it can be slower or less predictable on macOS and Windows because Docker Desktop runs containers inside a virtualized environment. Large directories, file watchers, and node_modules can create performance issues.

Compose Watch gives you more explicit control. You decide which paths sync, which paths trigger rebuilds, and which paths should be ignored. Docker's file watch documentation also recommends using ignore rules to prevent unnecessary syncs and notes that watch rules can ignore paths relative to the watched path.

For source code, watch mode is often clearer than mounting the entire repository. For persistent data such as Postgres, Redis, uploads, or local cache directories, volumes still make sense.

A good rule is simple: Use watch mode for files you edit frequently. Use volumes for data you need to persist.

A Practical Watch Strategy

For Node.js and TypeScript projects, use sync for source files:

- action: sync
  path: ./src
  target: /app/src

Use rebuild for dependency files:

- action: rebuild
  path: ./package.json

- action: rebuild
  path: ./package-lock.json

Use sync+restart for configuration files if the app does not reload them automatically:

- action: sync+restart
  path: ./config
  target: /app/config

Keep ignored paths explicit:

ignore:
  - node_modules/
  - "**/*.test.ts"
  - "**/*.spec.ts"
  - coverage/

Do not watch everything by default. A broad watch rule can cause unnecessary syncs, rebuilds, and confusing reloads.
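As a mental model, the strategy above boils down to a lookup from a changed path to an action. The sketch below is a toy model of that decision, not how Compose is implemented: the rule list mirrors the examples in this section, and ignore patterns are simplified to suffix checks instead of the glob patterns Compose accepts. All names are hypothetical.

```typescript
type WatchAction = "sync" | "rebuild" | "sync+restart";

interface WatchRule {
  action: WatchAction;
  path: string;        // file or directory, relative to the project root
  ignore?: string[];   // simplified: suffixes instead of glob patterns
}

// Mirrors the rules recommended in this section.
const exampleRules: WatchRule[] = [
  { action: "sync", path: "src", ignore: [".test.ts", ".spec.ts"] },
  { action: "rebuild", path: "package.json" },
  { action: "rebuild", path: "package-lock.json" },
  { action: "sync+restart", path: "config" }
];

// Returns the action for a changed path, or null when nothing should happen.
function actionFor(changedPath: string, rules: WatchRule[]): WatchAction | null {
  for (const rule of rules) {
    const matches =
      changedPath === rule.path || changedPath.startsWith(rule.path + "/");
    if (!matches) continue;

    // An ignored file under a watched path triggers nothing.
    const ignored = (rule.ignore ?? []).some((suffix) =>
      changedPath.endsWith(suffix)
    );
    return ignored ? null : rule.action;
  }
  // Paths outside every rule are not watched at all.
  return null;
}

for (const path of ["src/prompts/summary.ts", "package.json", "README.md"]) {
  console.log(path, "->", actionFor(path, exampleRules));
}
```

Writing the rules out this way makes gaps easy to spot: any path that maps to null is one that Compose will not react to, which is what you want for documentation and test output but not for a forgotten config directory.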

Where This Does Not Belong

Compose Watch is a development feature. It should not be part of your production deployment strategy. Production images should be built, tagged, scanned, and deployed through a normal pipeline.

It also should not replace a good production Dockerfile. A development Dockerfile may run tsx watch or nodemon, but a production Dockerfile should usually compile TypeScript and run the compiled output.

Compose Watch also does not remove the need for test automation. It improves the local loop, but you still need unit tests, integration tests, Cypress or Playwright tests, and CI validation before merging.

Common Mistakes

One common mistake is combining watch mode and bind mounts for the same path. If you mount ./src:/app/src and also configure watch to sync ./src to /app/src, you are doing the same job twice. Pick one.

Another mistake is using sync for dependency changes. If package.json changes, the container needs a rebuild so dependencies are installed correctly.

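A third mistake is expecting depends_on to mean a service is ready. In its short form it controls startup order only: Compose waits for the dependency's container to start, not for the service inside it to accept connections. For databases or APIs, add a healthcheck and use depends_on with condition: service_healthy when the dependent service must be ready before another service starts.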
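Health checks cover ordering at the Compose level. Inside the app, a small retry loop can also tolerate a dependency that is still warming up. A minimal sketch, assuming you supply your own check function (for example, a Redis ping); waitUntilReady and its options are hypothetical names.

```typescript
interface RetryOptions {
  attempts: number; // total number of checks before giving up
  delayMs: number;  // pause between checks
}

// Calls check() until it resolves to true or the attempts run out.
async function waitUntilReady(
  check: () => Promise<boolean>,
  { attempts, delayMs }: RetryOptions
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await check()) return true;
    } catch {
      // Treat errors such as "connection refused" as "not ready yet".
    }
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Example with a fake dependency that becomes ready on the third check.
let checks = 0;
waitUntilReady(async () => ++checks >= 3, { attempts: 5, delayMs: 10 }).then(
  (ready) => console.log(ready ? "dependency ready" : "gave up waiting")
);
```

At startup, the API could call this once per dependency and exit with a clear error if it returns false, rather than failing on the first request.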

Conclusion

Docker Compose Watch is one of those features that can quietly improve daily development. It does not change your architecture, and it does not make your AI app smarter. It simply removes friction from the local development loop.

For Node.js AI apps, that friction matters. Prompt changes, tool schema updates, response parsing fixes, and UI adjustments happen constantly. Rebuilding containers manually after every small change slows down the exact part of development that should feel fast.

The useful pattern is straightforward:

Run your local AI stack with Docker Compose
Use sync for source files
Use rebuild for dependency and build configuration changes
Use sync+restart when a process cannot hot reload by itself
Keep Redis, Postgres, and other services running while you iterate on the code

That gives you the best of both worlds: a repeatable containerized environment and a fast local feedback loop. Compose Watch is not only a Docker convenience feature. For AI app development, it can be the difference between experimenting freely and waiting on rebuilds all day.

Optimizing Docker Images for TypeScript AI Agents with Dive and Multi-Stage Builds

Raju Dandigam — Sat, 23 May 2026 17:47:20 +0000

Introduction

TypeScript AI agents can become surprisingly heavy Docker images.

At first, the service may look small. It is just a Node.js app that calls an LLM, uses a few tools, stores some state, and exposes an API. Then the dependencies start growing. You add the OpenAI SDK, LangChain or another agent framework, Prisma, a database client, Playwright for browser automation, test utilities, TypeScript, build tools, and maybe a few internal packages.

Before long, the Docker image is much larger than expected. It might still run, but the hidden cost shows up in CI, deployments, registry storage, cold starts, and security scans.

This article walks through a practical optimization path for a TypeScript AI agent. The goal is not to chase the smallest possible image. The goal is to remove obvious waste, keep the runtime image focused, and use tools like Dive to understand what is actually inside the container.

Why Image Size Matters for AI Agent Services

Large Docker images are not only a storage problem. They slow down CI pipelines because every build and deployment may need to push or pull hundreds of extra megabytes. They slow down new environments because each new instance needs the image before the app can start. They also increase the security surface because more packages usually mean more things to scan, patch, and maintain.

For AI agent services, this matters even more because the app often includes tooling that is not needed at runtime. TypeScript compilers, test frameworks, browser binaries, local development utilities, and generated artifacts can accidentally end up in the final image. If the agent only needs to run compiled JavaScript and call APIs, the final image should not include everything used to build, test, and develop the project.

The Unoptimized Dockerfile

A common first Dockerfile looks like this:

FROM node:22

WORKDIR /app

COPY . .

RUN npm install
RUN npm run build

EXPOSE 3000

CMD ["node", "dist/index.js"]

This works, but it has several problems.

It uses the full Node.js base image. It copies the entire project into the image, including files that may not be needed. It installs development dependencies. It keeps TypeScript source, tests, local configuration, and build tools in the same image that runs in production. It also makes Docker layer caching less effective because every source change can invalidate dependency installation.

For a TypeScript AI agent, that can mean shipping a runtime image that contains testing libraries, Playwright setup files, development-only packages, local documentation, and other files that are not needed once the app is compiled.

The Better Mental Model

A production Docker image should answer one question: what does this service need to run?

For a TypeScript AI agent, the runtime usually needs compiled JavaScript, production dependencies, package metadata, environment configuration, and maybe Prisma-generated client files or migration-related assets depending on how you deploy.

It usually does not need TypeScript compiler dependencies, unit tests, Cypress or Playwright test specs, coverage reports, local .env files, source maps in some production environments, .git history, or development caches.

Multi-stage builds help enforce that separation.

Optimized Multi-Stage Dockerfile

Here is a cleaner Dockerfile for a TypeScript AI agent API:

# syntax=docker/dockerfile:1.7

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

FROM node:22-slim AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY --from=build /app/dist ./dist

USER node

EXPOSE 3000

CMD ["node", "dist/index.js"]

This is a much better starting point.

The build stage installs all dependencies and compiles the TypeScript code. The runtime stage starts fresh, installs only production dependencies, and copies only the compiled output from the build stage. The final image does not include the TypeScript compiler, test files, or most development tooling.

The node:22-slim base image keeps broad compatibility while avoiding the size of the full Node.js image. Alpine can be smaller, but it can introduce compatibility issues with native dependencies. Many TypeScript AI apps use packages that depend on native modules, database clients, or browser-related libraries, so slim is often the safer first optimization.

Add a Proper .dockerignore

The Dockerfile is only part of the optimization. The build context also matters. If Docker receives your entire repository as context, you may accidentally copy unnecessary files into the image or slow down builds.

A basic .dockerignore for this kind of project can look like this:

node_modules
dist
coverage
playwright-report
cypress/videos
cypress/screenshots
.git
.github
.env
.env.*
.npmrc
# Patterns match from the context root; ** matches at any depth
**/*.log
README.md
docs
tests
**/__tests__
**/*.spec.ts
**/*.test.ts

Be careful with this file. Do not ignore files that your build actually needs. For example, if your app needs Prisma schema files during build, include them intentionally:

COPY prisma ./prisma
RUN npx prisma generate

The key is to be deliberate. A Docker image should not receive the entire repository just because COPY . . was easy.

Where Dive Helps

Multi-stage builds are useful, but they do not tell you exactly what is inside the image. Dive helps with that.

Dive is an open-source tool for exploring Docker images layer by layer. It shows which command created each layer, which files were added or changed, and where wasted space may exist. This makes it easier to see whether your image still contains unexpected files such as test reports, cached package data, source files, or large browser binaries.

Install Dive locally, then analyze the image:

brew install dive
docker build -t ai-agent-api:optimized .
dive ai-agent-api:optimized

When you open the image in Dive, look for a few things. Check which layers are the largest. Look for files that should not exist in production. Verify that the final image does not include your test folders, local .env files, coverage reports, or source repository metadata. Check whether node_modules includes only production dependencies. Look at the efficiency score, but do not treat it as the only goal.

The best use of Dive is not to obsess over every kilobyte. It is to make the image visible. Once you can see the layers, waste becomes much easier to remove.

Example: AI Agent with Playwright

Browser automation is common in agent workflows. A support agent may open a web page, a QA agent may validate a flow, or a research agent may inspect a site. Playwright is powerful, but it can also make images much larger because browsers and system dependencies are heavy.

The important question is whether Playwright is needed in the same runtime image as your API.

If browser automation is only used in tests, do not include it in the production image. Keep Playwright in a separate test image or CI step:

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"

  playwright-tests:
    image: mcr.microsoft.com/playwright:v1.56.1-noble
    working_dir: /app
    volumes:
      - ./:/app
    command: sh -c "npm ci && npx playwright test"

If the agent truly needs browser automation at runtime, consider isolating that capability into a separate browser worker service instead of bloating the main API image.

This separation keeps the main API smaller and makes the browser automation boundary more explicit.

Safer Dependency Installation

For production images, prefer npm ci over npm install. It installs exactly what the lockfile specifies and fails if package.json and the lockfile are out of sync, which makes installs more reproducible:

RUN npm ci --omit=dev && npm cache clean --force

If your project uses pnpm, the same idea applies. Install from the lockfile and avoid shipping development dependencies:

RUN corepack enable
RUN pnpm install --frozen-lockfile --prod

Also avoid installing packages at runtime. An AI agent should not dynamically install npm packages in production unless you have a very controlled sandbox and a strong reason. Runtime package installation makes the supply chain harder to scan and the image harder to reproduce.

Before and After Pattern

The unoptimized version often looks like this:

FROM node:22

WORKDIR /app

COPY . .

RUN npm install
RUN npm run build

CMD ["node", "dist/index.js"]

The optimized version should look more like this:

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

FROM node:22-slim AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY --from=build /app/dist ./dist

USER node

CMD ["node", "dist/index.js"]

The exact size reduction will depend on your project. A small API may only save a few hundred megabytes. An AI agent with browser tooling, dev dependencies, generated reports, and cached files may see a much larger improvement. The "900MB to 150MB" story is realistic for some messy Node.js images, but it should be treated as an example, not a promise.

Add Image Checks to CI

Dive can also run in CI mode with thresholds:

CI=true dive ai-agent-api:optimized --ci-config .dive-ci

A simple .dive-ci file can enforce a minimum efficiency score:

rules:
  lowestEfficiency: 0.90

This should not be your only quality gate, but it can catch obvious regressions. For example, if someone accidentally copies Playwright reports, .git, or local datasets into the image, the image size and efficiency score may change enough to fail the check.

You can also add a simple image size check:

docker image inspect ai-agent-api:optimized \
  --format='{{.Size}}'
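
That command only prints the size in bytes. To turn it into a check, a small script can compare the number against a budget. A minimal sketch, assuming you pass the command output in as a string; the function names and the 200 MB budget are hypothetical.

```typescript
// Parses the output of: docker image inspect <image> --format='{{.Size}}'
function parseImageSizeBytes(inspectOutput: string): number {
  const trimmed = inspectOutput.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Unexpected docker inspect output: "${trimmed}"`);
  }
  return Number(trimmed);
}

interface BudgetResult {
  megabytes: number;
  withinBudget: boolean;
}

// Compares the image size against a budget expressed in MiB-style megabytes.
function checkImageBudget(inspectOutput: string, maxMegabytes: number): BudgetResult {
  const megabytes = parseImageSizeBytes(inspectOutput) / (1024 * 1024);
  return { megabytes, withinBudget: megabytes <= maxMegabytes };
}

// Example: an image of 157286400 bytes checked against an assumed 200 MB budget.
const result = checkImageBudget("157286400\n", 200);
console.log(`${result.megabytes.toFixed(1)} MB, within budget: ${result.withinBudget}`);
```

In CI, the script would exit with a non-zero status when withinBudget is false, so a sudden jump in image size fails the pipeline instead of going unnoticed.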

In mature teams, image optimization becomes part of the same quality loop as tests, linting, and vulnerability scanning.

Practical Trade-offs

Optimization has trade-offs.

The smallest image is not always the best image. Alpine images are smaller, but native Node.js packages may require extra work. Distroless images reduce attack surface, but they are harder to debug because they do not include a shell. Aggressive cleanup can make troubleshooting painful. Multi-stage builds improve runtime images, but they may add complexity to the Dockerfile.

For most TypeScript AI agents, the best starting point is simple: use a slim base image, use multi-stage builds, exclude unnecessary files, install only production dependencies, run as a non-root user, and inspect the result with Dive.

That will usually get you most of the benefit without turning the Dockerfile into a maintenance burden.

Conclusion

Docker image optimization is not just a performance trick. For TypeScript AI agents, it is part of reliability and security.

A smaller image pulls faster, deploys faster, scans faster, and usually contains fewer unnecessary files. A cleaner image also makes it easier to understand what your AI service is actually shipping. That matters when the application has access to model APIs, tools, browsers, databases, and credentials.

Start with the obvious improvements. Replace the one-stage Dockerfile with a multi-stage build. Use a slim base image such as node:22-slim instead of the full image. Add a careful .dockerignore. Install only production dependencies in the final stage. Keep Playwright and other heavy tooling out of the main runtime image unless the agent truly needs them. Then use Dive to inspect the image instead of guessing.

The goal is not to win a smallest-image contest. The goal is to ship a focused, understandable, production-ready image for your AI agent.

Your AI Agent Has a Supply Chain: Securing Node.js Apps with Docker Hardened Images

Raju Dandigam — Wed, 20 May 2026 23:07:42 +0000

Introduction

AI applications often look small from the outside. A Node.js service calls a model, connects to a few tools, stores some state, and returns a response. The codebase may be much smaller than a traditional enterprise application.

The security surface is not small.

A modern Node.js AI app may use model provider APIs, MCP servers, browser automation, Redis or Postgres, private npm packages, GitHub tokens, internal APIs, and local files. An agent may read repository code, open a browser, inspect logs, summarize customer data, or call tools that perform real actions. That means the container running the app is not just serving HTTP traffic. It is sitting near credentials, tools, data, and execution paths.

This is why the Docker image matters. The base image, dependency install process, runtime user, filesystem permissions, SBOM, vulnerability scanning, and secret handling are all part of the AI application architecture.

Many AI tutorials skip this layer. They show how to call a model, build an agent loop, or connect a tool. In production, the question is different: what exactly are we shipping, where did it come from, what can it access, and how much damage can it do if compromised?

Why AI Apps Increase Supply Chain Risk

Traditional Node.js applications already have supply chain risk. They depend on npm packages, operating system packages, base images, CI pipelines, and deployment configuration. AI applications add more moving pieces.

An AI agent may use MCP servers as tool adapters. Each MCP server has its own dependencies, permissions, and credentials. A local LLM workflow may pull model artifacts from registries. A browser automation tool may bring large system dependencies. A code-review agent may need GitHub access. A support assistant may access customer-like data. A test-generation agent may read and write files.

The application may still be "just a Node.js service," but the dependency graph is much wider than it looks.

Docker's 2026 supply chain write-up on Trivy and KICS is a useful reminder of the risk. Docker described two incidents where stolen publisher credentials were used to push malicious images through legitimate publishing flows. Docker stated that its infrastructure was not breached, but anyone who pulled the compromised tags was temporarily exposed through the software supply chain.

That story matters for AI apps because agents often rely on tools they did not build. A compromised image, package, MCP server, or build step can become a path to credentials, source code, cloud systems, or sensitive data.

The Risky Dockerfile

A common Dockerfile for a Node.js AI app may look like this:

FROM node:latest

WORKDIR /app

COPY . .

RUN npm install

ENV OPENAI_API_KEY=sk-example
ENV GITHUB_TOKEN=ghp-example

EXPOSE 3000

CMD ["npm", "start"]

This Dockerfile has several problems.

It uses node:latest, which can change over time and make builds less predictable. It copies the entire local directory into the image, which may accidentally include .env, .npmrc, test artifacts, or local files. It uses npm install instead of a lockfile-based install. It bakes secrets into the image through environment variables. It runs as the default user, which is root in the official node image. It does not separate build dependencies from runtime dependencies.

For a demo, this might work. For an AI app with access to tools and credentials, it is too loose.

A Safer Dockerfile Pattern

A better pattern uses a specific base image, a multi-stage build, production-only dependencies, a non-root user, and no embedded secrets:

# syntax=docker/dockerfile:1.7

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

FROM node:22-slim AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY --from=build /app/dist ./dist

RUN groupadd --system appgroup \
  && useradd --system --gid appgroup --home /app appuser \
  && chown -R appuser:appgroup /app

USER appuser

EXPOSE 3000

CMD ["node", "dist/index.js"]

This is already much safer. It pins the Node major version instead of using latest. It uses npm ci for reproducible dependency installation. It keeps build tooling out of the runtime stage. It installs only production dependencies in the final image. It runs as a non-root user.

This is not perfect security, but it is a better foundation.

Where Docker Hardened Images Fit

Docker Hardened Images take the base-image part of this problem further. Docker describes Docker Hardened Images as secure, minimal, production-ready images, and in December 2025 Docker announced that they were made free and open source under the Apache 2.0 license. Docker also stated that it had hardened more than 1,000 images and Helm charts in the catalog.

The key idea is that the base image should not be an afterthought. A hardened image reduces unnecessary packages, narrows the attack surface, and gives teams a stronger starting point than a general-purpose image.

The Dockerfile pattern stays mostly the same. The base image changes to the hardened equivalent available in your registry and organization:

FROM <your-hardened-node-image>@sha256:<digest> AS build

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

FROM <your-hardened-node-image>@sha256:<digest> AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY --from=build /app/dist ./dist

USER 10001

EXPOSE 3000

CMD ["node", "dist/index.js"]

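The exact image name depends on how your team accesses Docker Hardened Images and how your registry is configured. The important habits are to use a trusted base image, avoid latest, and pin the image by digest when you need stronger reproducibility. Hardened runtime variants often ship without a shell or package manager, so if npm is not available in your runtime image, install production dependencies in a build stage and copy node_modules into the final stage instead.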
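
The pinning habit is easy to check mechanically. As a rough sketch, the following classifies image references by how tightly they are pinned and scans Dockerfile text for FROM lines that use latest or no tag at all. The function names are hypothetical, and the parsing is deliberately simple: it does not handle ARG substitution or line continuations.

```typescript
type PinLevel = "digest" | "tag" | "floating";

// Classify an image reference by how reproducible it is.
function classifyImageRef(ref: string): PinLevel {
  // A full sha256 digest pins the exact image content.
  if (/@sha256:[0-9a-f]{64}$/.test(ref)) return "digest";

  // Only look for a tag in the last path segment, so a registry port
  // such as localhost:5000/node is not mistaken for a tag.
  const lastSegment = ref.slice(ref.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  const tag = colon === -1 ? "" : lastSegment.slice(colon + 1);

  // No tag means Docker falls back to latest.
  return tag === "" || tag === "latest" ? "floating" : "tag";
}

// Return base images in FROM lines that are neither tagged nor digest-pinned.
function findFloatingBases(dockerfile: string): string[] {
  const stageNames = new Set<string>();
  const floating: string[] = [];

  for (const line of dockerfile.split("\n")) {
    const parts = line.trim().split(/\s+/);
    if ((parts[0] ?? "").toUpperCase() !== "FROM") continue;

    // Skip flags such as --platform=linux/amd64.
    const args = parts.slice(1).filter((part) => !part.startsWith("--"));
    const image = args[0];
    if (image === undefined) continue;

    // FROM <earlier-stage> and FROM scratch are not registry images.
    const isRegistryImage = !stageNames.has(image) && image !== "scratch";
    if (isRegistryImage && classifyImageRef(image) === "floating") {
      floating.push(image);
    }
    if ((args[1] ?? "").toUpperCase() === "AS" && args[2] !== undefined) {
      stageNames.add(args[2]);
    }
  }
  return floating;
}

console.log(findFloatingBases("FROM node:latest\nRUN npm install"));
```

A check like this could run in CI next to linting, failing the build when a Dockerfile reintroduces a floating base image.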

Docker's Hardened Images product page also emphasizes drop-in replacements, continuous updates, secure customization, and provenance-preserving workflows.

Do Not Bake Secrets Into the Image

AI apps usually need secrets, but the image should not contain them.

Model provider keys, GitHub tokens, MCP credentials, database passwords, and OAuth tokens should be provided at runtime through your deployment environment or secret manager. They should not appear in the Dockerfile. In Compose, reference them as variables instead of hardcoding values:

services:
  agent:
    image: node-ai-agent:secure
    environment:
      NODE_ENV: production
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      GITHUB_TOKEN: ${GITHUB_TOKEN}
      DATABASE_URL: ${DATABASE_URL}

This is acceptable for local development when values come from a local .env file that is not committed. In production, prefer a managed secret system such as Kubernetes Secrets, AWS Secrets Manager, Google Secret Manager, Azure Key Vault, HashiCorp Vault, or your platform's built-in secret store.
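
On the application side, it helps to fail fast when a secret is missing rather than discovering it on the first model call. A minimal sketch, assuming the three variables from the Compose file above; loadConfig and AgentConfig are hypothetical names, and a real service would pass process.env as the argument.

```typescript
// Anything shaped like process.env can be passed in, which keeps this testable.
type EnvSource = Record<string, string | undefined>;

interface AgentConfig {
  openaiApiKey: string;
  githubToken: string;
  databaseUrl: string;
}

function loadConfig(env: EnvSource): AgentConfig {
  const required = ["OPENAI_API_KEY", "GITHUB_TOKEN", "DATABASE_URL"];

  // Empty strings count as missing: Compose substitutes an empty value
  // when a referenced variable is not set.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report variable names only, never their values.
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }

  return {
    openaiApiKey: env["OPENAI_API_KEY"] as string,
    githubToken: env["GITHUB_TOKEN"] as string,
    databaseUrl: env["DATABASE_URL"] as string
  };
}

// Example with placeholder values; log the field names, not the secrets.
const config = loadConfig({
  OPENAI_API_KEY: "placeholder-key",
  GITHUB_TOKEN: "placeholder-token",
  DATABASE_URL: "postgres://placeholder"
});
console.log(Object.keys(config));
```

Keeping secret access in one function also gives you a single place to audit what the agent can read, and a single place to make sure values never end up in logs.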

Build-time secrets are different. If the image build needs temporary access to a private npm package or private Git repository, use Docker Build Secrets or SSH mounts instead of ARG or ENV. Docker's build secrets documentation explains that secret mounts and SSH mounts are designed for sensitive data needed during the build, and that the process has two steps: pass the secret to docker build, then consume it in the Dockerfile.

RUN --mount=type=secret,id=npm_token \
  npm config set //registry.npmjs.org/:_authToken="$(cat /run/secrets/npm_token)" \
  && npm ci \
  && npm config delete //registry.npmjs.org/:_authToken

Then build with:

docker build \
  --secret id=npm_token,env=NPM_TOKEN \
  -t node-ai-agent:secure .

Docker's GitHub Actions documentation also covers secret mounts and SSH mounts in CI, where secret mounts appear as files under /run/secrets by default.

Add Runtime Guardrails

A secure image is only one layer. The container should also run with limited permissions.

For a Node.js AI app, a practical Compose configuration may look like this:

services:
  agent:
    image: node-ai-agent:secure
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    mem_limit: 1g
    cpus: 1
    environment:
      NODE_ENV: production
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      DATABASE_URL: ${DATABASE_URL}

A read-only filesystem prevents the app from writing to unexpected places. A temporary filesystem gives it a safe place for temporary files. Dropping Linux capabilities reduces what the container can do. no-new-privileges prevents privilege escalation. CPU and memory limits reduce the blast radius of a bad loop, runaway browser process, or unexpected agent behavior.

These settings may need adjustment. Browser automation, file-processing tools, and some native dependencies may require additional permissions or writable directories. The goal is not to blindly copy every restriction. The goal is to start restrictive and open only what the application actually needs.
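With a read-only root filesystem and a tmpfs mounted at /tmp, application code has to keep its scratch files under the temporary directory. Here is a small sketch of that habit using only the Node.js standard library; withScratchDir is a hypothetical helper name, and the sketch assumes os.tmpdir() resolves to the tmpfs mount (it returns /tmp on Linux unless TMPDIR is set).

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs `work` with a fresh directory under the OS temp dir
// and removes that directory afterwards, even if `work` throws.
function withScratchDir<T>(prefix: string, work: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), prefix));
  try {
    return work(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Example: write intermediate tool output to scratch space, then read it back.
const draftLength = withScratchDir("agent-", (dir) => {
  const file = join(dir, "draft.txt");
  writeFileSync(file, "temporary tool output");
  return readFileSync(file, "utf8").length;
});
```

Code that writes anywhere else fails immediately under read_only: true, which is usually the signal you want during development.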

Isolate Tooling and Data Paths

AI agents often call tools. Those tools should not all run with the same permissions.

A GitHub MCP server may need network access to GitHub but should not need write access to the local filesystem. A filesystem tool may need read access to a specific workspace but should not see the entire host machine. A browser automation tool may need temporary writable space but should not need database credentials.

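A simple architecture separates the app, the tools, and the data, so that each tool runs in its own container with only the network access, mounts, and credentials it needs.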

This separation matters because compromise should not mean total access. If the browser tool is compromised, it should not automatically get database credentials. If the filesystem tool is compromised, it should not automatically get a model provider key. If the GitHub tool is compromised, it should have the smallest useful token scope.
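One way to apply this inside the agent process is to give each tool an explicit allowlist of environment variables instead of passing the whole environment through. The sketch below assumes hypothetical tool names (github, filesystem, browser) and a hypothetical WORKSPACE_DIR variable; neither comes from a real MCP server.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical tool names and variable names, for illustration only.
const TOOL_ENV_ALLOWLIST: Record<string, readonly string[]> = {
  github: ["GITHUB_TOKEN"],
  filesystem: ["WORKSPACE_DIR"],
  browser: [],
};

// Returns only the variables a given tool is allowed to see.
function envForTool(tool: string, env: Env): Record<string, string> {
  const allowed = TOOL_ENV_ALLOWLIST[tool];
  if (!allowed) {
    throw new Error(`Unknown tool: ${tool}`);
  }

  const scoped: Record<string, string> = {};
  for (const name of allowed) {
    const value = env[name];
    if (value !== undefined) {
      scoped[name] = value;
    }
  }
  return scoped;
}
```

The result can be passed as the environment of the tool's container or child process, so a compromised browser tool never sees DATABASE_URL or a model provider key in the first place.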

Use SBOMs and Scanning

You cannot secure what you cannot see. A Software Bill of Materials, or SBOM, lists the components inside your image. Docker Scout uses SBOMs to understand the components in an image and cross-reference them with vulnerability data. Docker's Scout documentation describes it as a supply chain security solution that analyzes images, creates an inventory of components, and matches that inventory against vulnerability databases.

A basic scan can be part of your CI workflow:

name: Docker Security Scan

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  scan:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t node-ai-agent:ci .

      - name: Scan image with Docker Scout
        uses: docker/scout-action@v1
        with:
          image: node-ai-agent:ci
          command: cves
          only-severities: critical,high

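Docker Scout's image analysis documentation explains that image analysis extracts the SBOM and image metadata, then evaluates it against vulnerability data from security advisories. The workflow above is a minimal sketch: Docker Scout generally expects the job to be authenticated to Docker Hub, and that login step is omitted here.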

Scanning does not prove an image is safe. It gives you visibility. That visibility is especially important for AI apps because the dependency surface includes npm packages, system packages, browser dependencies, tool servers, and sometimes model artifacts.

Pin What Matters

Tags are convenient, but they can move. For production images, pin important base images by digest:

FROM node:22-slim@sha256:<digest>

This makes builds more predictable because the digest identifies a specific image. You can still update regularly, but updates become intentional instead of accidental.

The same thinking applies to MCP server images and other tool containers. Avoid pulling random latest images in production workflows. Use a known version or digest. Review the source. Keep an update process.

This is especially important for agent systems because tools can perform actions. A compromised or unexpectedly changed tool image is more dangerous than a broken static asset build.
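The digest-pinning rule for base images can be checked mechanically, for example in a pre-merge script. The following is a rough sketch, not a full Dockerfile parser: findUnpinnedBaseImages is a hypothetical helper that looks only at FROM lines, skips scratch and references to earlier build stages, and treats anything without a sha256 digest (including variable references) as unpinned.

```typescript
interface UnpinnedImage {
  line: number; // 1-based line number in the Dockerfile
  image: string;
}

// Hypothetical helper: reports FROM lines whose image is not pinned by digest.
function findUnpinnedBaseImages(dockerfile: string): UnpinnedImage[] {
  const stageNames = new Set<string>();
  const unpinned: UnpinnedImage[] = [];

  dockerfile.split(/\r?\n/).forEach((raw, index) => {
    // FROM [--flag=value ...] image [AS stage]
    const match = raw.trim().match(/^FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?/i);
    if (!match) return;

    const image = match[1] ?? "";
    const alias = match[2];
    const isEarlierStage = stageNames.has(image.toLowerCase());
    if (alias) stageNames.add(alias.toLowerCase());

    if (image === "scratch" || isEarlierStage) return;
    if (!/@sha256:[0-9a-f]{64}$/i.test(image)) {
      unpinned.push({ line: index + 1, image });
    }
  });

  return unpinned;
}
```

A CI step could read the Dockerfile, call this function, and fail the build when the returned list is not empty.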

A Practical Security Checklist

Before shipping a Node.js AI app in Docker, check the basics:

Use a trusted base image, preferably a hardened image when available
Avoid latest for production and pin critical images by digest
Use a multi-stage Dockerfile so build tools do not ship in the runtime image
Install dependencies with npm ci and ship only production dependencies
Run the container as a non-root user
Do not copy .env, .npmrc, local logs, test reports, or unnecessary files into the image
Use Docker Build Secrets for private package installation
Pass model provider keys, GitHub tokens, database credentials, and MCP credentials at runtime through a secret manager
Run image scanning in CI and review high and critical findings
Use read-only filesystems, dropped capabilities, resource limits, and narrow network access where possible
Give every MCP server and tool the smallest useful permission set

This checklist is not glamorous, but it is the work that makes AI systems safer to operate.

Conclusion

Your AI agent has a supply chain. It starts with the base image, continues through npm packages and build steps, extends into MCP servers and browser tools, and reaches all the way to model artifacts, credentials, and runtime permissions.

Docker gives Node.js teams practical controls for this problem. Docker Hardened Images provide a stronger starting point. Multi-stage builds reduce what ships. Build secrets keep private tokens out of image layers. Runtime restrictions limit damage. SBOMs and Docker Scout improve visibility. Digest pinning makes updates intentional.

None of this makes an AI app perfectly secure. It does create defense in depth. That matters because AI agents are not passive services. They read, call, summarize, browse, and sometimes act. The more capable the agent becomes, the more important its container boundary becomes.

A good AI architecture is not only about prompts, tools, and models. It is also about what the application is allowed to run, what it is allowed to access, where its dependencies came from, and what happens when something goes wrong.

Secure the supply chain before the agent becomes part of someone else's.

Docker as the Safety Net for AI-Generated Frontend Code

Raju Dandigam — Sat, 16 May 2026 14:36:09 +0000

Introduction

AI coding assistants can generate React components, Next.js pages, test files, form handlers, and TypeScript utilities very quickly. That speed is useful, but it also creates a new problem for frontend teams. The code may compile, pass linting, and look reasonable in a pull request, but still fail when a user clicks through the actual flow.

Frontend code is full of small runtime details that are easy to miss. A generated component may not handle empty states. A form may work with happy-path data but fail when the API returns an error. A modal may render correctly but break keyboard navigation. A layout may look fine on desktop and collapse on mobile. A test may pass on one developer's laptop and fail in CI because the browser or system dependencies are different.

This is where Docker becomes valuable. Docker does not make AI-generated code correct. It gives teams a repeatable place to verify that code. When Cypress or Playwright tests run inside Docker, the browser dependencies, Node.js version, operating system libraries, and test environment become more consistent across local development and CI.

The goal is not fully autonomous testing. The healthier pattern is supervised automation. Let AI tools generate or modify code. Run that code in a controlled Docker environment. Use Cypress or Playwright to validate important flows. Then let a human review the code with better evidence.

The Trust Gap in AI-Generated UI Code

AI-generated frontend code often looks convincing because it follows familiar patterns. It can produce a clean React component, use TypeScript interfaces, add Tailwind classes, and wire up a simple event handler. But correctness in frontend applications is not only about syntax.

A real user flow depends on rendering, browser behavior, routing, network calls, state updates, accessibility, responsive layout, and integration with the rest of the application. These are exactly the areas where generated code needs verification.

For example, an AI assistant might generate a profile component like this:

type UserProfileProps = {
  name: string;
  email: string;
  avatarUrl?: string;
};

export function UserProfile({ name, email, avatarUrl }: UserProfileProps) {
  return (
    <section data-testid="user-profile">
      {avatarUrl ? (
        <img src={avatarUrl} alt={`${name} avatar`} />
      ) : null}

      <h2>{name}</h2>
      <p>{email}</p>
    </section>
  );
}

The component is simple and probably fine. But several questions remain. What happens when name is empty? Is the avatar accessible enough? Does the component render properly in the page where it is used? Does the route load the expected data? Does the mobile layout still work? Does an existing test flow break?

Static checks cannot answer all of those questions. Browser tests can.
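The empty-name question is a good example of a gap that is easy to close once it has been noticed. As a sketch only, the display logic could be pulled into a small function that makes the fallback explicit; profileDisplay and the fallback labels "Unnamed user" and "User avatar" are assumptions, not part of the generated component.

```typescript
type UserProfileProps = {
  name: string;
  email: string;
  avatarUrl?: string;
};

// Hypothetical helper: computes what the component should show,
// including explicit fallbacks for a blank name.
function profileDisplay(props: UserProfileProps) {
  const name = props.name.trim();
  const hasName = name.length > 0;

  return {
    displayName: hasName ? name : "Unnamed user",
    // Avoids alt text such as " avatar" when the name is blank.
    avatarAlt: hasName ? `${name} avatar` : "User avatar",
    showAvatar: Boolean(props.avatarUrl),
  };
}
```

A browser test can then assert on the fallback text for a user with no name, which turns an open question into a checked behavior.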

Why Docker Belongs in the Testing Workflow

Cypress and Playwright already solve the browser automation problem. Docker solves the environment problem.

Cypress maintains Docker images that include the operating system dependencies needed to run Cypress in containers, with different image options depending on whether you want Cypress and browsers preinstalled or want to install Cypress yourself. The Cypress CI documentation also covers Docker images, CI setup, caching, environment variables, and parallel execution.

Playwright also provides official Docker guidance. Its Docker documentation explains that the Playwright image includes browser system dependencies and browser binaries, while the Playwright package itself should be installed in your project. Playwright's Docker image is intended for CI and other Docker-supported environments.

That consistency matters when reviewing AI-generated changes. If a test fails, you want the failure to be about the application, not a missing browser dependency or a local machine difference.

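The workflow is a loop: the AI tool proposes a change, the app starts in Docker, Cypress or Playwright runs against it in a container, and a human reviews the result with the test evidence in hand.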

The important part is the loop. AI speeds up generation. Docker and browser tests slow the process down just enough to make it safer.

A Simple Docker Compose Setup

A practical setup can use one service for the application and one service for the browser tests. The test container talks to the app container through Docker's internal network.

Here is a simple Compose file for a React or Next.js application with Playwright:

services:
  app:
    build:
      context: .
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: test
    command: npm run start

  playwright:
    image: mcr.microsoft.com/playwright:v1.56.1-noble
    working_dir: /app
    depends_on:
      - app
    environment:
      PLAYWRIGHT_BASE_URL: http://app:3000
    volumes:
      - ./:/app
    command: sh -c "npm ci && npx playwright test"

This setup keeps the example intentionally simple. The app service starts your application. The playwright service runs tests against http://app:3000, which works because both services are on the same Docker Compose network.

For real projects, you should also make sure the test runner waits until the application is actually ready. depends_on controls startup order, but it does not automatically prove the application is ready to accept HTTP requests unless you use health checks. Docker's Compose documentation explains that Compose can wait for dependencies marked with service_healthy when a health check is defined.

A more reliable version adds a health check:

services:
  app:
    build:
      context: .
    ports:
      - "3000:3000"
    command: npm run start
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000"]
      interval: 5s
      timeout: 3s
      retries: 10

  playwright:
    image: mcr.microsoft.com/playwright:v1.56.1-noble
    working_dir: /app
    depends_on:
      app:
        condition: service_healthy
    environment:
      PLAYWRIGHT_BASE_URL: http://app:3000
    volumes:
      - ./:/app
    command: sh -c "npm ci && npx playwright test"

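This avoids a common source of flaky tests: the test runner starting before the app is ready. The health check assumes curl is installed in the app image. Slim Node.js base images do not include it, so either install it in the Dockerfile or probe with a tool the image already has.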

Testing an AI-Generated Component with Playwright

Assume the AI assistant generated the UserProfile component and a page renders it at /profile. A small Playwright test can verify the behavior that matters to users:

import { test, expect } from "@playwright/test";

test("profile page displays the user information", async ({ page }) => {
  await page.goto("/profile");

  const profile = page.getByTestId("user-profile");

  await expect(profile).toBeVisible();
  await expect(profile.getByRole("heading", { name: "Jane Doe" })).toBeVisible();
  await expect(profile.getByText("jane@example.com")).toBeVisible();
});

test("profile page works on a mobile viewport", async ({ page }) => {
  await page.setViewportSize({ width: 390, height: 844 });
  await page.goto("/profile");

  await expect(page.getByTestId("user-profile")).toBeVisible();
  await expect(page.getByText("jane@example.com")).toBeVisible();
});

This test does not try to prove everything. It validates the page from the user's point of view. The profile exists, the key information is visible, and the page still works on a mobile-sized viewport.

You can run it through Docker Compose:

docker compose run --rm playwright

If the generated component breaks the route, fails to render expected content, or behaves differently inside the containerized browser environment, the test gives you a clear signal before the code reaches production.
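The test above navigates to a relative path (/profile), which Playwright resolves against the baseURL option. The Compose file sets PLAYWRIGHT_BASE_URL, but this sketch assumes the project's own configuration has to read that variable and pass it on. The buildPlaywrightConfig function, the ./tests directory, and the localhost fallback are illustrative assumptions; in a real project the returned object would be the default export of playwright.config.ts.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical builder for the object exported from playwright.config.ts.
function buildPlaywrightConfig(env: Env) {
  return {
    // Assumed test location; adjust to your repository layout.
    testDir: "./tests",
    use: {
      // Relative URLs such as page.goto("/profile") resolve against baseURL.
      // Inside Compose this is http://app:3000; the fallback is for local runs.
      baseURL: env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    },
  };
}

// In playwright.config.ts:
// export default buildPlaywrightConfig(process.env);
```

Keeping the URL in an environment variable lets the same test files run against the Compose network in CI and against a locally started app during development.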

The Same Pattern with Cypress

Some teams prefer Cypress because of its developer experience, debugging flow, dashboard features, or existing test suite. The Docker pattern is similar:

services:
  app:
    build:
      context: .
    ports:
      - "3000:3000"
    command: npm run start
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000"]
      interval: 5s
      timeout: 3s
      retries: 10

  cypress:
    image: cypress/included:15.7.0
    working_dir: /e2e
    depends_on:
      app:
        condition: service_healthy
    environment:
      CYPRESS_baseUrl: http://app:3000
    volumes:
      - ./:/e2e
    command: --browser chrome

A Cypress test for the same page can stay simple:

describe("Profile page", () => {
  it("shows user information", () => {
    cy.visit("/profile");

    cy.get("[data-testid='user-profile']").should("be.visible");
    cy.contains("Jane Doe").should("be.visible");
    cy.contains("jane@example.com").should("be.visible");
  });

  it("works on mobile", () => {
    cy.viewport(390, 844);
    cy.visit("/profile");

    cy.get("[data-testid='user-profile']").should("be.visible");
    cy.contains("jane@example.com").should("be.visible");
  });
});

Run it with Docker Compose:

docker compose run --rm cypress

The exact image tag should match your project and CI strategy. The broader point is that Cypress and Playwright both have strong Docker support, so teams do not need to invent a custom browser environment from scratch.

Using Docker as a Sandbox for AI Changes

Testing is one part of the value. Isolation is another.

When an AI assistant changes code, especially in a larger repository, you may not fully understand the consequences immediately. Docker gives you a controlled environment to build and run the application without depending too much on the developer's machine.

For a safer local test environment, you can add basic constraints:

services:
  app:
    build:
      context: .
    read_only: true
    tmpfs:
      - /tmp
    mem_limit: 768m
    cpus: 1
    environment:
      NODE_ENV: test

These settings are not a complete security sandbox, but they reduce accidental damage. A read-only filesystem limits where the process can write. CPU and memory limits reduce the impact of runaway behavior. A temporary /tmp gives the app space for normal temporary files without opening the whole container filesystem.

For frontend validation, the goal is usually not to run completely untrusted code. The goal is to avoid letting generated code run directly against a developer's full local environment before there is some basic confidence.

CI for Pull Requests

The best place to apply this pattern is the pull request. AI-generated code should not get a lighter path to merge just because it was generated quickly. If anything, it needs visible validation.

Here is a simple GitHub Actions workflow:

name: Frontend E2E Tests

on:
  pull_request:
    paths:
      - "src/**"
      - "app/**"
      - "pages/**"
      - "components/**"
      - "tests/**"
      - "cypress/**"
      - "docker-compose.yml"

jobs:
  playwright:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Build app image
        run: docker compose build app

      - name: Run Playwright tests
        run: docker compose run --rm playwright

      - name: Upload Playwright report
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report

  cypress:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Build app image
        run: docker compose build app

      - name: Run Cypress tests
        run: docker compose run --rm cypress

      - name: Upload Cypress artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: cypress-artifacts
          path: |
            cypress/screenshots
            cypress/videos

You may not need to run both Cypress and Playwright in every project. Many teams should choose one primary browser testing framework and use it well. I included both here because many organizations already have Cypress suites while newer projects may prefer Playwright for cross-browser coverage and traces.

Debugging Failures

One reason browser tests are valuable for AI-generated changes is that they provide evidence. A failed test is not just a red checkmark. It can include screenshots, videos, traces, console logs, and network details.

Cypress can record screenshots and videos for failed runs, depending on configuration. Playwright can produce traces that show actions, DOM snapshots, network requests, console logs, and screenshots. These artifacts make it easier to review AI-generated changes because the reviewer can see how the application behaved, not just read the diff.

A useful review comment is not "AI broke the page." A useful review comment is "the generated profile component removed the empty-state branch, and the Playwright trace shows the mobile profile page rendering a blank card when the user has no avatar."

That is the kind of feedback loop teams need.
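For Playwright, whether those artifacts exist after a failure depends on configuration. The following sketch shows settings that keep traces, screenshots, and videos only for failing tests; failureArtifactSettings is a hypothetical helper whose result would be merged into the project's Playwright config, and the single CI retry is an assumption rather than a recommendation.

```typescript
// Hypothetical helper returning the artifact-related part of a Playwright config.
function failureArtifactSettings(isCI: boolean) {
  return {
    // One retry in CI helps separate flaky failures from consistent ones.
    retries: isCI ? 1 : 0,
    // The HTML reporter writes to the playwright-report folder by default.
    reporter: [["html", { open: "never" }]],
    use: {
      trace: "retain-on-failure",
      screenshot: "only-on-failure",
      video: "retain-on-failure",
    },
  };
}
```

With these settings a passing run produces little output, while a failing run leaves a report that can be uploaded as a CI artifact and opened by the reviewer.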

Practical Guidelines

Do not try to test every generated line of code with an end-to-end test. That will slow the team down and create brittle suites. Focus on user-facing flows and integration points.

Use unit tests for pure functions, component tests for isolated UI behavior, and Cypress or Playwright for complete flows. Docker is most useful for the tests where environment consistency matters: browser tests, integration tests, and workflows that depend on app services.

Keep the test environment close to production, but not identical at all costs. A test container should be realistic enough to catch meaningful issues and simple enough that developers can run it repeatedly.

Avoid giving AI-generated code direct access to sensitive local files, broad credentials, or production services during validation. Use test credentials, local services, and constrained containers.

Most importantly, keep a human in the loop. Docker and browser tests can tell you whether important behavior still works. They cannot decide whether the generated code is maintainable, aligned with product intent, accessible enough, or architecturally appropriate.

Conclusion

AI coding tools make frontend development faster, but faster code generation needs stronger verification. A React component that compiles is not automatically safe to merge. A generated page that looks good in a diff still needs to work in a browser, with real routing, layout, user interactions, and error states.

Docker gives teams a repeatable environment for that verification. Cypress and Playwright provide the browser automation. Together, they create a practical safety net for AI-generated frontend code.

The pattern is simple:

Let the AI tool propose the change
Start the app in Docker
Run Cypress or Playwright in a container
Capture screenshots, videos, or traces when something fails
Let a human review the code with evidence instead of guesswork

That is the right balance for 2026. Do not blindly trust generated code, and do not reject useful AI assistance out of fear. Put the code in a container, test the behavior, review the result, and merge only when the evidence supports it.

Testing AI-Generated Node.js Code with Real Dependencies using Docker and Testcontainers

Raju Dandigam — Wed, 13 May 2026 04:56:39 +0000

AI coding tools are becoming part of everyday software development. They can generate API routes, database queries, validation logic, repository classes, test cases, and even Dockerfiles in seconds. That speed is useful, but it also creates a new kind of risk. The generated code may look correct, pass a few mocked tests, and still fail when it meets a real database, a real cache, a real message queue, or a real browser workflow.

This is where many teams start feeling the weakness of mock-heavy testing. Mocks are fast, but they often test our assumptions instead of the actual behavior of the system. A mocked PostgreSQL client will return exactly what we tell it to return. It will not surprise us with a unique constraint violation, a transaction rollback issue, a timestamp behavior difference, a case-sensitive query problem, or a connection pooling edge case. Real systems behave with more friction, and good integration tests should include some of that friction.

Testcontainers helps solve this problem by starting real dependencies in Docker containers during test execution. Instead of mocking PostgreSQL, Redis, MongoDB, LocalStack, or another service, your test can start a short-lived container, connect your Node.js application to it, run the test, and clean everything up afterward. The Node.js implementation of Testcontainers is designed for this kind of workflow, and the project describes it as a way to run lightweight, throwaway instances of common databases, Selenium browsers, or anything else that can run in Docker.

The idea is simple, but the impact is significant. When AI generates or modifies backend code, Testcontainers gives you a safer way to verify whether that code works against real infrastructure behavior. It does not replace unit tests, and it does not remove the need for code review. Instead, it adds a confidence layer between “the code looks fine” and “this is safe enough to merge.”

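The testing flow is short: start the real dependency in a container, point the application at it, run the assertions, and stop the container.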

A typical mocked test might look clean, but it can hide important behavior.

describe("createUser", () => {
  it("creates a user and returns an id", async () => {
    const db = {
      query: vi.fn().mockResolvedValue({
        rows: [{ id: 1 }]
      })
    };

    const result = await createUser(db, {
      email: "demo@example.com",
      passwordHash: "hashed-password"
    });

    expect(result.id).toBe(1);
  });
});

This test is useful for checking that your function handles a successful response, but it does not prove that your SQL works. It does not verify that the users table exists, that the email column has a unique constraint, that the query returns the shape you expect, or that PostgreSQL handles your data types the way your mock suggests. If an AI assistant generated the SQL, this test may give you false confidence.

A better integration test uses a real PostgreSQL container.

import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { Client } from "pg";
import { afterAll, beforeAll, describe, expect, it } from "vitest";

async function createUser(
  client: Client,
  input: { email: string; passwordHash: string }
): Promise<{ id: number; email: string }> {
  const result = await client.query(
    `
      INSERT INTO users (email, password_hash)
      VALUES ($1, $2)
      RETURNING id, email
    `,
    [input.email, input.passwordHash]
  );

  return result.rows[0];
}

describe("createUser integration test", () => {
  let container: StartedPostgreSqlContainer;
  let client: Client;

  beforeAll(async () => {
    container = await new PostgreSqlContainer("postgres:16-alpine")
      .withDatabase("app_test")
      .withUsername("test_user")
      .withPassword("test_password")
      .start();

    client = new Client({
      host: container.getHost(),
      port: container.getPort(),
      database: container.getDatabase(),
      user: container.getUsername(),
      password: container.getPassword()
    });

    await client.connect();

    await client.query(`
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `);
  }, 60_000);

  afterAll(async () => {
    await client.end();
    await container.stop();
  });

  it("creates a user and returns the generated id", async () => {
    const user = await createUser(client, {
      email: "demo@example.com",
      passwordHash: "hashed-password"
    });

    expect(user.id).toBeGreaterThan(0);
    expect(user.email).toBe("demo@example.com");
  });

  it("fails when the email already exists", async () => {
    await createUser(client, {
      email: "duplicate@example.com",
      passwordHash: "first-password"
    });

    await expect(
      createUser(client, {
        email: "duplicate@example.com",
        passwordHash: "second-password"
      })
    ).rejects.toThrow(/duplicate key value violates unique constraint/i);
  });
});

This test does something the mock cannot do. It proves that the table definition, SQL statement, unique constraint, PostgreSQL behavior, and TypeScript application code work together. That matters even more when some of the code was generated or refactored by an AI tool.

The same idea applies to API-level testing. Instead of testing only the repository function, you can test an Express route connected to the real database.

import express from "express";
import request from "supertest";
import { Pool } from "pg";
import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { afterAll, beforeAll, describe, expect, it } from "vitest";

function createApp(pool: Pool) {
  const app = express();
  app.use(express.json());

  app.post("/users", async (req, res) => {
    const { email, password } = req.body;

    if (!email || !password) {
      return res.status(400).json({ error: "Email and password are required" });
    }

    try {
      const existingUser = await pool.query(
        "SELECT id FROM users WHERE email = $1",
        [email]
      );

      // rowCount can be null in pg's typings, so default it to 0.
      if ((existingUser.rowCount ?? 0) > 0) {
        return res.status(409).json({ error: "Email already registered" });
      }

      const result = await pool.query(
        `
          INSERT INTO users (email, password_hash)
          VALUES ($1, $2)
          RETURNING id, email
        `,
        [email, `hashed-${password}`]
      );

      return res.status(201).json(result.rows[0]);
    } catch {
      return res.status(500).json({ error: "Internal server error" });
    }
  });

  return app;
}

describe("POST /users", () => {
  let container: StartedPostgreSqlContainer;
  let pool: Pool;
  let app: ReturnType<typeof createApp>;

  beforeAll(async () => {
    container = await new PostgreSqlContainer("postgres:16-alpine").start();

    pool = new Pool({
      host: container.getHost(),
      port: container.getPort(),
      database: container.getDatabase(),
      user: container.getUsername(),
      password: container.getPassword()
    });

    await pool.query(`
      CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `);

    app = createApp(pool);
  }, 60_000);

  afterAll(async () => {
    await pool.end();
    await container.stop();
  });

  it("registers a new user", async () => {
    const response = await request(app)
      .post("/users")
      .send({
        email: "new-user@example.com",
        password: "secure-password"
      })
      .expect(201);

    expect(response.body.id).toBeDefined();
    expect(response.body.email).toBe("new-user@example.com");
  });

  it("returns conflict for duplicate email", async () => {
    await request(app)
      .post("/users")
      .send({
        email: "same-user@example.com",
        password: "first-password"
      })
      .expect(201);

    const response = await request(app)
      .post("/users")
      .send({
        email: "same-user@example.com",
        password: "second-password"
      })
      .expect(409);

    expect(response.body.error).toBe("Email already registered");
  });
});

This is a more realistic safety check for AI-generated backend code. If an assistant changes the route, modifies the SQL query, renames a column, removes the duplicate check, or mishandles an error path, this test has a much better chance of catching the issue than a mock-based test.
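The route above checks for an existing email before inserting, so two concurrent requests with the same email can both pass the check. The unique constraint then rejects the second insert, and the catch block reports it as a 500. One hedged way to tighten this is to map the database error to a 409 as well. The sketch below relies on two facts: PostgreSQL uses SQLSTATE 23505 for unique violations, and node-postgres exposes the SQLSTATE on the error's code property. The helper names are hypothetical.

```typescript
// SQLSTATE 23505 is PostgreSQL's unique_violation error code.
const UNIQUE_VIOLATION = "23505";

function isUniqueViolation(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === UNIQUE_VIOLATION
  );
}

// Hypothetical helper: picks the HTTP response for a failed INSERT.
function responseForInsertError(error: unknown): { status: number; body: { error: string } } {
  if (isUniqueViolation(error)) {
    return { status: 409, body: { error: "Email already registered" } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}

// Inside the route's catch block:
// const { status, body } = responseForInsertError(error);
// return res.status(status).json(body);
```

Because the integration test runs against a real PostgreSQL container, it exercises this path with the error the database actually raises instead of one a mock invents.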

Testcontainers also works well with browser testing. Cypress and Playwright are often used to test the full user experience, but those tests are only as reliable as the environment behind them. Cypress maintains Docker images with the required dependencies for running Cypress in Docker, and its CI documentation covers Docker images, caching, parallel execution, and environment configuration. Playwright also provides Docker guidance, including images that contain browser system dependencies for running tests in containerized environments.

A useful pattern is to let Testcontainers provide the backend dependency while Playwright or Cypress validates the user flow. For example, a registration flow can use a real PostgreSQL container, a real API server, and a real browser test. This gives you confidence that the user interface, HTTP layer, validation logic, database query, and persistence behavior all work together.

import { test, expect } from "@playwright/test";
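import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
// startApplicationForTests and stopApplicationForTests are project-specific
// helpers (not shown); import them from wherever your test setup lives.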

let container: StartedPostgreSqlContainer;
let baseUrl: string;

test.beforeAll(async () => {
  container = await new PostgreSqlContainer("postgres:16-alpine").start();

  baseUrl = await startApplicationForTests({
    database: {
      host: container.getHost(),
      port: container.getPort(),
      name: container.getDatabase(),
      user: container.getUsername(),
      password: container.getPassword()
    }
  });
});

test.afterAll(async () => {
  await stopApplicationForTests();
  await container.stop();
});

test("a user can register and see the profile page", async ({ page }) => {
  await page.goto(`${baseUrl}/register`);
  await page.fill("[name='email']", "playwright-user@example.com");
  await page.fill("[name='password']", "secure-password");
  await page.click("button[type='submit']");

  await expect(page.locator("[data-testid='profile-email']")).toHaveText(
    "playwright-user@example.com"
  );
});

The startApplicationForTests function depends on your project structure, but the principle is straightforward. Start the dependency first, pass its runtime connection details into the app, then run the browser test against the real stack.
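If the application reads its database settings from a single connection string, the helper needs to turn the container's runtime details into one. Here is a minimal sketch of that step, assuming the same field names as the database object passed to startApplicationForTests above; toConnectionString is a hypothetical helper.

```typescript
interface DatabaseSettings {
  host: string;
  port: number;
  name: string;
  user: string;
  password: string;
}

// Hypothetical helper: builds a PostgreSQL connection string,
// percent-encoding the parts that may contain reserved characters.
function toConnectionString(db: DatabaseSettings): string {
  const user = encodeURIComponent(db.user);
  const password = encodeURIComponent(db.password);
  const name = encodeURIComponent(db.name);
  return `postgres://${user}:${password}@${db.host}:${db.port}/${name}`;
}
```

The important detail is that the host and port come from the started container at runtime, because the mapped port changes from run to run.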

This pattern is especially valuable when AI coding tools are changing frontend and backend code together. A generated form update may look correct in the browser, but it might send a payload that no longer matches the API. A generated API route may compile, but it might break database constraints. A generated repository method may pass unit tests, but fail against PostgreSQL because of an incorrect column name. Real dependency testing helps catch these integration gaps.

Testcontainers is not only for PostgreSQL. The Node.js ecosystem has modules for databases and services such as MongoDB, Redis, and LocalStack, and it also supports generic containers for custom services. The official getting started guide demonstrates using PostgreSQL for Node.js tests, while the broader project describes Testcontainers as a way to test with the same kinds of services used in production instead of relying on mocks or in-memory replacements.

import { GenericContainer, Wait } from "testcontainers";

const service = await new GenericContainer("my-company/search-service:test")
  .withExposedPorts(8080)
  .withEnvironment({
    NODE_ENV: "test"
  })
  .withWaitStrategy(Wait.forHttp("/health", 8080))
  .start();

const searchServiceUrl = `http://${service.getHost()}:${service.getMappedPort(8080)}`;

Readiness checks are important. A container being “started” does not always mean the service inside it is ready to accept requests. Waiting for an HTTP endpoint, a log message, or a health check can prevent flaky tests that fail only because the test ran too early.

There are trade-offs. Testcontainers-based tests are slower than unit tests. A PostgreSQL container may take a few seconds to start, and longer on the first run when Docker needs to pull the image. These tests also require Docker to be available locally and in CI. That is why Testcontainers should not replace your unit test suite. The best approach is layered testing: keep fast unit tests for pure functions and isolated business logic, and use Testcontainers for integration points where real dependency behavior matters.

In practice, you can keep performance reasonable by starting containers once per test file, cleaning data between tests, and avoiding unnecessary container restarts. You can truncate tables, use transactions, or create isolated schemas depending on your application. Recent Testcontainers for Node.js releases also continue to improve operational behavior. The 11.14.0 release added auto cleanup control for containers and compose environments, along with support for running in parallel for distinct UIDs.
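As a rough illustration of the "truncate tables" option, the sketch below builds a single PostgreSQL TRUNCATE statement that a test hook could run between tests. The helper names and the table names are assumptions for illustration; how you execute the statement depends on your database client.

```typescript
// Quote an identifier for PostgreSQL: wrap it in double quotes and double
// any embedded double quote.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build one TRUNCATE statement that empties every listed table.
// RESTART IDENTITY resets owned sequences; CASCADE also empties tables that
// reference the listed ones through foreign keys.
function buildTruncateStatement(tables: string[]): string {
  if (tables.length === 0) {
    throw new Error("At least one table name is required");
  }
  const quoted = tables.map(quoteIdentifier).join(", ");
  return `TRUNCATE TABLE ${quoted} RESTART IDENTITY CASCADE`;
}

// Hypothetical table names.
console.log(buildTruncateStatement(["users", "sessions"]));
// prints: TRUNCATE TABLE "users", "sessions" RESTART IDENTITY CASCADE
```

Truncating in one statement keeps the container running across tests, which is usually much cheaper than restarting it.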

A simple GitHub Actions setup is usually enough because hosted Ubuntu runners already support Docker.

name: Integration Tests

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm

      - run: npm ci

      - name: Run unit and integration tests
        run: npm test

The main requirement is that your CI environment can run Docker containers. Once that is available, your tests can create real dependencies on demand without maintaining long-lived shared test databases.

The most important mindset shift is this: mocks are not bad, but they are not enough. They are great for speed, edge cases, and isolated logic. They are weak when the risk lives in the contract between your application and a real dependency. AI-generated code increases that risk because it can produce code that looks reasonable while subtly misunderstanding the database schema, query behavior, or runtime environment.

Testcontainers gives TypeScript teams a practical way to validate those boundaries. It lets you test Node.js APIs against real databases, run browser flows against realistic backends, verify migrations, check queue or cache behavior, and build more trustworthy CI pipelines. For teams adopting AI-assisted development, that confidence layer becomes even more valuable.

The goal is not to test everything with Docker. The goal is to stop pretending that a mock database proves your application works with a real one. Start with one important flow, such as registration, checkout, booking, authentication, or report generation. Replace the mock-heavy integration test with a Testcontainers-backed test. Run it locally. Add it to CI. Then expand only where the added confidence is worth the extra runtime.

As AI tools make code generation faster, our validation systems need to become more grounded. Testcontainers is one of the most practical ways to bring that grounding into a modern TypeScript and Node.js workflow.

Your AI Agent Dockerfile Might Be Leaking Secrets

Raju Dandigam — Sun, 10 May 2026 17:12:55 +0000

Introduction

Dockerfiles are often treated as boring infrastructure files. We copy a working example, adjust a few commands, install dependencies, and move on. That is understandable, but it is also where many security mistakes begin.

This risk becomes more important when we build AI-enabled Node.js applications. A modern AI app may depend on private npm packages, internal SDKs, GitHub repositories, model provider credentials, MCP server configuration, or private build-time assets. If we are not careful, tokens used during the Docker build can accidentally become part of the image history, image layers, build logs, or final runtime environment.

Docker Build Secrets solve one specific problem: passing sensitive values to the build process without baking them into the final image. Docker's documentation is clear that build arguments and environment variables are not appropriate for secrets because they can persist in the final image, while secret mounts and SSH mounts are designed for securely exposing sensitive data only during a build step.

This article focuses on the practical Node.js and AI-agent case: installing private packages, accessing private repositories, and avoiding the common mistake of treating API keys as normal Dockerfile variables.

The Common Mistake

A common Dockerfile pattern looks like this:

FROM node:22-slim

WORKDIR /app

ARG NPM_TOKEN
ENV NPM_TOKEN=$NPM_TOKEN

COPY package*.json ./

RUN npm config set //registry.npmjs.org/:_authToken=$NPM_TOKEN \
  && npm ci

COPY . .

RUN npm run build

CMD ["node", "dist/index.js"]

At first, this looks reasonable. The build needs an npm token to install private packages, so the token is passed as an argument and used during npm ci.

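The problem is that ARG and ENV were not designed for secrets. ENV values are stored in the image configuration, where anyone who can pull the image can read them with docker inspect, and ARG values can show up in docker history. In this Dockerfile, npm config set also writes the token into an .npmrc file that is committed as part of the layer. Even if the final container runs fine, the image now carries more information than intended.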

This gets worse when developers use the same pattern for AI credentials:

ARG OPENAI_API_KEY
ENV OPENAI_API_KEY=$OPENAI_API_KEY

That is usually the wrong place for a model provider key. An OpenAI key, Anthropic key, GitHub token, or MCP server credential should normally be a runtime secret, not a build-time value. The build process usually does not need it. The running application does.

Why AI Apps Make This Easier to Get Wrong

AI applications often blur the boundary between build time and runtime. A regular Node.js API may only need dependencies during build and database credentials during runtime. An AI-agent application may also need tool credentials, private package access, GitHub access, prompt assets, evaluation data, and model provider keys.

That complexity leads to shortcuts. A developer may add a token to the Dockerfile just to make the build pass. An AI coding assistant may generate a Dockerfile that uses ARG because it looks simple. A CI workflow may pass secrets directly into build arguments because it is easy to wire up.

The safer habit is to ask one question before adding any secret to a Docker build: does this value need to exist while building the image, or only when running the container?

If the secret is needed to install a private npm package, clone a private repository, or download a private build asset, it may be a build secret. If the secret is needed to call a model provider, connect to a database, access an MCP tool, or call an external API at runtime, it should be passed when the container runs.

The Safer Pattern: Build Secrets

Docker BuildKit supports secret mounts. A secret mount exposes a value as a temporary file during a specific RUN instruction. By default, Docker mounts secrets under /run/secrets, and the secret is not automatically copied into the final image unless your command explicitly writes it somewhere permanent. Docker describes this as a two-step process: pass the secret into docker build, then consume it inside the Dockerfile using a secret mount.

Here is a safer version for installing private npm packages:

# syntax=docker/dockerfile:1.7

FROM node:22-slim AS build

WORKDIR /app

COPY package*.json ./

RUN --mount=type=secret,id=npm_token \
  npm config set //registry.npmjs.org/:_authToken="$(cat /run/secrets/npm_token)" \
  && npm ci \
  && npm config delete //registry.npmjs.org/:_authToken

COPY . .

RUN npm run build

FROM node:22-slim AS runtime

WORKDIR /app

ENV NODE_ENV=production

COPY --from=build /app/dist ./dist
COPY --from=build /app/package*.json ./

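# Production dependencies may include private packages too, so this install
# mounts the same secret instead of running an unauthenticated npm ci.
RUN --mount=type=secret,id=npm_token \
  npm config set //registry.npmjs.org/:_authToken="$(cat /run/secrets/npm_token)" \
  && npm ci --omit=dev \
  && npm config delete //registry.npmjs.org/:_authToken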

CMD ["node", "dist/index.js"]

Then build the image like this:

docker build \
  --secret id=npm_token,env=NPM_TOKEN \
  -t ai-agent-api:local .

In this example, the npm token is available only while a RUN instruction that mounts the secret is executing. It is not declared with ARG, not promoted to ENV, and not stored in the runtime image.

Architecture in One View

The important distinction is that build secrets and runtime secrets solve different problems. Build secrets help the image build safely. Runtime secrets help the container run safely.

GitHub Actions Example

Docker also documents secret mounts and SSH mounts for GitHub Actions builds. Secret mounts expose values as files inside the build container for the duration of a single step, while SSH mounts expose SSH agent sockets or keys for operations such as cloning private repositories.

Here is a simple GitHub Actions workflow using Docker's Build Push Action:

name: Build Docker Image

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  docker-build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: ai-agent-api:ci
          secrets: |
            npm_token=${{ secrets.NPM_TOKEN }}

The matching Dockerfile can read the secret as /run/secrets/npm_token:

RUN --mount=type=secret,id=npm_token \
  npm config set //registry.npmjs.org/:_authToken="$(cat /run/secrets/npm_token)" \
  && npm ci \
  && npm config delete //registry.npmjs.org/:_authToken

This is much safer than passing the npm token as a build argument.

What About SSH Keys?

Sometimes the build needs to pull code from a private Git repository. For that, SSH mounts are usually a better fit than copying a private key into the image:

# syntax=docker/dockerfile:1.7

FROM node:22-slim AS build

WORKDIR /app

RUN apt-get update \
  && apt-get install -y --no-install-recommends git openssh-client \
  && rm -rf /var/lib/apt/lists/*

RUN --mount=type=ssh \
  git clone git@github.com:your-org/private-agent-tools.git tools

Build it with SSH forwarding enabled:

docker build --ssh default -t ai-agent-api:local .

The SSH key is not copied into the image. The build step gets temporary access through the SSH mount.

What Should Not Be a Build Secret

Not every secret belongs in docker build --secret.

Model provider keys are usually runtime secrets. If your Node.js application calls a model API when it runs, pass the key at runtime:

docker run \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  ai-agent-api:local

For local development, Docker Compose can read values from your environment or an ignored .env file:

services:
  app:
    image: ai-agent-api:local
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      MCP_GITHUB_TOKEN: ${MCP_GITHUB_TOKEN}

For production, use your platform's secret manager. That may be AWS Secrets Manager, Kubernetes Secrets, Docker Swarm secrets, GitHub environment secrets, or another managed secret store. The key idea is the same: runtime credentials should be provided to the running container, not baked into the image.
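On the application side, it helps to read those runtime credentials in one place and fail fast when one is missing. The sketch below is a minimal example of that idea; loadRuntimeSecrets and requireEnv are hypothetical helpers, and the variable names are the ones from the Compose file above.

```typescript
type Env = Record<string, string | undefined>;

type RuntimeSecrets = {
  openAiApiKey: string;
  mcpGithubToken: string;
};

// Return the variable's value, or throw an error that names the variable
// without ever printing a secret value.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required runtime secret: ${name}`);
  }
  return value;
}

// Collect every runtime credential the app needs at startup.
function loadRuntimeSecrets(env: Env): RuntimeSecrets {
  return {
    openAiApiKey: requireEnv(env, "OPENAI_API_KEY"),
    mcpGithubToken: requireEnv(env, "MCP_GITHUB_TOKEN")
  };
}

// Example with placeholder values; a real app would pass process.env.
const secrets = loadRuntimeSecrets({
  OPENAI_API_KEY: "placeholder-openai-key",
  MCP_GITHUB_TOKEN: "placeholder-github-token"
});
console.log(Object.keys(secrets).join(", "));
// prints "openAiApiKey, mcpGithubToken"
```

Taking the environment as a parameter keeps the loader easy to unit test, and it makes the image's lack of baked-in credentials visible: a container started without them stops immediately with a clear message.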

A Simple Checklist for Node.js AI Apps

Before committing a Dockerfile for an AI application, review it with these questions:

Does the Dockerfile use ARG or ENV for anything that looks like a token, key, password, or credential?
Does the build need the secret, or does only the running app need it?
Are private npm tokens passed through --secret instead of ARG?
Are SSH keys forwarded through --ssh instead of copied?
Does the final runtime image avoid .npmrc, private keys, local .env files, and unnecessary build artifacts?
Is .dockerignore excluding files such as .env, .npmrc, .git, logs, coverage output, and local test data?
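The first question in that checklist can be partly automated. The following is a small sketch of a check that flags ARG and ENV instructions whose names look like credentials. The function name and the keyword list are assumptions, and it only inspects the first name on each line, so treat it as a quick review aid rather than a scanner.

```typescript
type Finding = { line: number; instruction: "ARG" | "ENV"; name: string };

// Assumed keyword list; extend it to match your own naming conventions.
const SECRET_NAME_PATTERN = /(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL)/i;

// Report ARG/ENV instructions whose variable name looks like a credential.
function findSecretLikeInstructions(dockerfile: string): Finding[] {
  const findings: Finding[] = [];
  dockerfile.split(/\r?\n/).forEach((rawLine, index) => {
    const match = /^\s*(ARG|ENV)\s+([A-Za-z_][A-Za-z0-9_]*)/i.exec(rawLine);
    if (!match) {
      return;
    }
    const [, keyword = "", name = ""] = match;
    if (SECRET_NAME_PATTERN.test(name)) {
      findings.push({
        line: index + 1,
        instruction: keyword.toUpperCase() === "ARG" ? "ARG" : "ENV",
        name
      });
    }
  });
  return findings;
}

const sampleDockerfile = [
  "FROM node:22-slim",
  "ARG NPM_TOKEN",
  "ENV NPM_TOKEN=$NPM_TOKEN",
  "ENV NODE_ENV=production"
].join("\n");

for (const finding of findSecretLikeInstructions(sampleDockerfile)) {
  console.log(`line ${finding.line}: ${finding.instruction} ${finding.name}`);
}
// prints:
// line 2: ARG NPM_TOKEN
// line 3: ENV NPM_TOKEN
```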

A basic .dockerignore should usually include these files:

.env
.env.*
.npmrc
.git
node_modules
coverage
dist
*.log

Be careful with dist if your build process expects it from the host. In most production Docker builds, the image should build its own dist output inside the container.

How to Verify You Did Not Leak Something Obvious

You can inspect image history:

docker history ai-agent-api:local

You can also run a quick scan inside the image filesystem:

docker run --rm ai-agent-api:local sh -c "grep -rIil 'sk-' /app || true"

That command is not a full security scanner, but it can catch obvious mistakes. For serious workflows, use dedicated secret scanning and image scanning tools in CI.

This is not theoretical. A 2023 internet-wide study of container images found that exposed secrets in container images are a real issue, including private keys and API secrets discovered across public and private registries.

Conclusion

Docker Build Secrets are not complicated, but they require a clear mental model.

Use build secrets when the build process needs temporary access to sensitive data, such as private npm packages or private source repositories. Use runtime secrets when the running application needs credentials, such as OpenAI keys, GitHub tokens, database passwords, or MCP server credentials.

For AI-agent applications, this distinction matters even more. Agents often connect to powerful tools and sensitive systems. A leaked token can expose private repositories, model usage, customer data, internal APIs, or deployment workflows.

The safer pattern is simple:

Do not put secrets in ARG
Do not promote them to ENV inside the Dockerfile
Do not copy .env or .npmrc into the image
Use RUN --mount=type=secret for build-time secrets
Use RUN --mount=type=ssh for private Git access
Pass runtime credentials through your runtime environment or secret manager

Your Dockerfile is part of your application's security boundary. Treat it that way, especially when the application is powered by AI and connected to real tools.

MCP Without the Setup Pain: Using Docker MCP Toolkit with TypeScript Agents

Raju Dandigam — Fri, 08 May 2026 15:00:06 +0000

Introduction

Model Context Protocol, usually called MCP, has quickly become one of the most important ideas in AI application development. It gives AI tools and agents a standard way to connect to external systems such as filesystems, GitHub, databases, browsers, documentation, and internal APIs.

The protocol is useful because it gives agents a common tool interface. Instead of every AI application inventing its own way to call tools, MCP creates a shared pattern for exposing capabilities.

However, the protocol is only one part of the story. The real pain starts when developers need to run multiple MCP servers locally. One server may need Node.js, another may need Python, another may need browser dependencies, and another may need OAuth or API keys. Suddenly, your agent is not just an AI workflow. It is a small distributed system running on your laptop.

Docker MCP Toolkit tries to solve that operational problem. It does not replace MCP, and it does not make your agent intelligent by itself. Its value is simpler and more practical: it helps you discover, configure, run, and manage MCP servers as containerized tools through Docker Desktop and the Docker MCP Gateway.

The Real MCP Problem Is Setup

A TypeScript agent may look simple at first. It receives a user request, asks an LLM what to do next, and then calls tools. But those tools need to run somewhere.

Imagine a code-review agent that needs three capabilities. It needs GitHub access to read pull request metadata. It needs filesystem access to inspect local files. It needs Playwright access to open a preview deployment and check whether the application still works.

Without Docker, each tool may come with a different setup process. You may need to install Node.js packages for one server, Python packages for another server, browser dependencies for Playwright, and local credentials for each integration. That might be acceptable for one developer on one machine. It becomes painful when a second developer joins, when the setup moves to CI, or when the team needs consistent tool versions.

This is the same problem Docker has always been good at solving. A tool should bring its runtime and dependencies with it. Developers should not need to manually reproduce a long setup document just to run the same agent workflow.

Docker's MCP documentation describes the Toolkit as a Docker Desktop management interface for setting up, managing, and running containerized MCP servers in profiles and connecting them to AI agents. It also highlights profile-based organization, integrated tool discovery, and zero manual setup as key features.

What Docker MCP Toolkit Actually Does

Docker MCP Toolkit sits between your AI client and your MCP servers. The AI client might be Claude Desktop, Cursor, VS Code, Docker AI Agent, or your own local TypeScript agent. The MCP servers are the tools that perform actions.

The Toolkit helps with the operational layer. It lets you browse MCP servers from Docker's MCP Catalog, add servers to profiles, connect clients, and run those servers through the Docker MCP Gateway. Docker's MCP Catalog documentation says the catalog contains more than 300 verified MCP servers packaged as container images with versioning, provenance, and security updates.

That packaging matters. A containerized MCP server can include the runtime it needs, the dependencies it needs, and a more predictable execution environment. The Docker MCP Gateway then manages the server lifecycle. Docker's gateway documentation explains that when an AI application needs a tool, the gateway identifies the correct server, starts it as a Docker container if needed, injects required credentials, applies security restrictions, forwards the request, and returns the result.

The important point is that your agent does not need to know how every MCP server is installed. It only needs to connect through the gateway.

Architecture Overview

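Here is the architecture in one view: the AI client connects to the Docker MCP Gateway, and the gateway starts and routes requests to the containerized MCP servers included in the active profile.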

The profile defines which servers are available for a workflow. For example, a frontend development profile might include GitHub, filesystem, Playwright, and documentation search. A backend profile might include GitHub, PostgreSQL, Redis, and observability tools.

Docker's profile documentation says profiles organize servers into named collections for different projects, and different AI applications can connect to different profiles. It also notes that profiles can be shared with teams through OCI-compliant registries, while credentials are not included in the shared profile for security reasons.

That gives teams a cleaner model. The profile defines the approved toolset. Each developer configures their own credentials. The agent connects to the profile instead of a random collection of local scripts.

Getting Started with Docker MCP Toolkit

The easiest path is through Docker Desktop. In current Docker documentation, Docker recommends using the MCP Toolkit interface in Docker Desktop, especially for discovery and profile management. The get-started guide explains that you can create a profile from the Profiles tab, browse servers from the Catalog tab, add them to the profile, and connect supported clients from the Clients tab.

A simple setup flow looks like this:

Open Docker Desktop
Select MCP Toolkit
Create a profile named frontend-agent
Add GitHub, filesystem, and Playwright servers from the Catalog tab
Configure required credentials or OAuth permissions
Connect your AI client to the profile

For clients that are not directly listed in Docker Desktop, Docker documents a manual stdio configuration using the gateway command:

{
  "servers": {
    "MCP_DOCKER": {
      "command": "docker",
      "args": ["mcp", "gateway", "run", "--profile", "frontend-agent"],
      "type": "stdio"
    }
  }
}

This is a useful pattern because many MCP clients support launching a local MCP server process over stdio. In this case, the process is the Docker MCP Gateway, and the gateway manages the actual MCP server containers behind it.
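If you maintain client configuration for several profiles, generating this object can be less error-prone than editing JSON by hand. The sketch below reproduces the configuration shown above for an arbitrary profile name. The function name and the profile-name check are assumptions, not part of Docker's tooling.

```typescript
type StdioServerConfig = { command: string; args: string[]; type: "stdio" };
type McpClientConfig = { servers: Record<string, StdioServerConfig> };

// Build the stdio client configuration for one gateway profile.
function buildGatewayClientConfig(profile: string): McpClientConfig {
  // Assumed naming rule for this sketch: letters, digits, "-" and "_" only.
  if (!/^[A-Za-z0-9][A-Za-z0-9_-]*$/.test(profile)) {
    throw new Error(`Unexpected profile name: ${profile}`);
  }
  return {
    servers: {
      MCP_DOCKER: {
        command: "docker",
        args: ["mcp", "gateway", "run", "--profile", profile],
        type: "stdio"
      }
    }
  };
}

// Prints the same JSON document as the manual configuration above.
console.log(JSON.stringify(buildGatewayClientConfig("frontend-agent"), null, 2));
```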

A Simple TypeScript Agent Example

The MCP client SDK APIs may vary based on transport and package version, so the example below is intentionally simple and stubs out the tool call. The goal is to show the application shape without burying it under SDK boilerplate.

A TypeScript agent using MCP tools usually follows this pattern:

type ToolCall = {
  name: string;
  arguments: Record<string, unknown>;
};

type ToolResult = {
  content: string;
};

async function callMcpTool(tool: ToolCall): Promise<ToolResult> {
  // In a real application, this call goes through your MCP client transport.
  // Docker MCP Gateway handles routing to the correct containerized server.
  console.log(`Calling MCP tool: ${tool.name}`);

  return {
    content: `Result from ${tool.name}`
  };
}

async function reviewPullRequest(prUrl: string) {
  const prDetails = await callMcpTool({
    name: "github.get_pull_request",
    arguments: { url: prUrl }
  });

  const changedFiles = await callMcpTool({
    name: "github.list_changed_files",
    arguments: { url: prUrl }
  });

  const packageJson = await callMcpTool({
    name: "filesystem.read_file",
    arguments: { path: "/workspace/package.json" }
  });

  return {
    prDetails: prDetails.content,
    changedFiles: changedFiles.content,
    packageJson: packageJson.content
  };
}

reviewPullRequest("https://github.com/example/app/pull/42")
  .then(console.log)
  .catch(console.error);

In a real implementation, callMcpTool would use an MCP client transport connected to the Docker MCP Gateway. The gateway would route github.* calls to the GitHub MCP server container and filesystem.* calls to the filesystem MCP server container.
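To make the routing idea concrete, here is a toy version of prefix-based dispatch. It is only an illustration of the "server.tool" naming used in this article's example; the real gateway's routing is internal to Docker, and routeToolName is a hypothetical helper.

```typescript
type RoutedTool = { server: string; tool: string };

// Split a qualified name such as "github.get_pull_request" into the server
// that should handle it and the tool name on that server.
function routeToolName(qualifiedName: string): RoutedTool {
  const separator = qualifiedName.indexOf(".");
  if (separator <= 0 || separator === qualifiedName.length - 1) {
    throw new Error(`Expected "<server>.<tool>", got "${qualifiedName}"`);
  }
  return {
    server: qualifiedName.slice(0, separator),
    tool: qualifiedName.slice(separator + 1)
  };
}

console.log(routeToolName("github.get_pull_request"));
// server is "github", tool is "get_pull_request"
console.log(routeToolName("filesystem.read_file"));
// server is "filesystem", tool is "read_file"
```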

The agent itself stays clean. It is not installing GitHub dependencies. It is not launching Playwright. It is not managing Python or Node runtimes for individual tool servers. It is asking for tools by name, and Docker handles the operational boundary.

Why This Matters for TypeScript Agents

TypeScript is a strong fit for agent applications because it helps define tool contracts, workflow state, structured outputs, and runtime validation. But TypeScript alone does not solve the environment problem. A typed tool call still fails if the MCP server is not installed correctly, if the browser dependency is missing, or if a credential is configured differently across machines.
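Runtime validation is the part that static types cannot cover, because tool output arrives as untyped data. A minimal sketch, assuming the simplified ToolResult shape from the example above and a tool that returns JSON text:

```typescript
type ToolResult = { content: string };

// Type guard: checks the value at runtime and narrows its type for the compiler.
function isToolResult(value: unknown): value is ToolResult {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { content?: unknown }).content === "string"
  );
}

// Parse raw tool output and reject anything that does not match the contract.
function parseToolResult(raw: string): ToolResult {
  const parsed: unknown = JSON.parse(raw);
  if (!isToolResult(parsed)) {
    throw new Error("Tool returned an unexpected shape");
  }
  return parsed;
}

console.log(parseToolResult('{"content":"ok"}').content);
// prints "ok"
```

In a larger project a schema library would usually replace the hand-written guard, but the boundary is the same: validate once where data enters, then rely on the type everywhere else.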

Docker MCP Toolkit makes the tool layer more repeatable. A team can agree that a specific profile is the standard development toolset. One developer can use it from Cursor. Another can connect it to Claude Desktop. A third can connect a custom TypeScript agent. The server collection stays consistent.

This becomes more important as agents move beyond simple demos. A real code assistant may need repository access, issue tracker access, local file access, test execution, browser automation, and documentation search. Without a management layer, MCP server sprawl becomes a real problem.

Where Docker Helps Most

Docker helps most when your agent needs more than one or two tools. If you are only testing a single local MCP server, manual setup may be fine. But if your workflow depends on several MCP servers, different runtimes, and credentials, the Docker approach becomes much more useful.

It also helps when teams need consistency. A new developer should not need to install five runtimes and follow a long checklist before trying an agent workflow. The closer the setup gets to "pull the profile, configure credentials, connect the client," the easier it becomes to share.

Docker also helps with security boundaries. MCP servers are powerful because they can touch real systems. That also makes them risky. A filesystem tool should not automatically access your entire machine. A browser tool should not have unlimited permissions. A GitHub tool should use scoped credentials. Running tools through a gateway and containerized servers does not remove all risk, but it gives teams a better place to apply isolation and control.

The Docker MCP Gateway repository describes this gateway pattern as AI Client → MCP Gateway → MCP Servers, with servers running as Docker containers and the gateway providing a unified interface, secrets handling, OAuth integration, and dynamic discovery.

What This Does Not Solve

Docker MCP Toolkit is not magic. It does not make a weak agent design reliable. It does not decide which tool should be called. It does not validate every tool result for you. It does not remove the need for approval gates when an agent can modify files, open pull requests, deploy code, or touch production-like systems.
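An approval gate can start as something very small in the agent itself. The sketch below uses a default-deny rule: only tools on an explicit read-only list run without a human decision. The tool names are the illustrative ones used in this article, not real MCP server tool names, and the gate function is hypothetical.

```typescript
type ToolCall = { name: string; arguments: Record<string, unknown> };
type GateDecision = "allow" | "needs-approval";

// Hypothetical list of tools considered read-only for this workflow.
const READ_ONLY_TOOLS: ReadonlySet<string> = new Set([
  "github.get_pull_request",
  "github.list_changed_files",
  "filesystem.read_file"
]);

// Default-deny: anything not explicitly listed as read-only needs approval.
function gate(call: ToolCall): GateDecision {
  return READ_ONLY_TOOLS.has(call.name) ? "allow" : "needs-approval";
}

console.log(gate({ name: "filesystem.read_file", arguments: { path: "/workspace/package.json" } }));
// prints "allow"
console.log(gate({ name: "github.merge_pull_request", arguments: { number: 42 } }));
// prints "needs-approval"
```

Listing what is allowed, rather than what is forbidden, means a newly added server cannot gain write access just because nobody remembered to block it.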

It also does not mean every MCP server is automatically safe. You still need to choose trusted servers, limit permissions, review tool access, and avoid giving broad credentials to experimental workflows. Docker's catalog and container packaging improve the operational story, but security still depends on how the tools are configured and what the agent is allowed to do.

There is also a learning curve. Developers still need to understand MCP concepts such as clients, servers, tools, transports, and permissions. Docker simplifies the runtime and setup problem. It does not eliminate the need to design the agent workflow carefully.

A Practical Use Case

A good first use case is a local code review assistant. Keep it simple. Give it access to GitHub for pull request metadata, filesystem access to the local repository, and Playwright access to a preview URL.

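The agent flow can be straightforward: read the pull request metadata from GitHub, inspect the relevant files in the local repository, open the preview URL with Playwright, and produce a review summary.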

This is useful because it is realistic but still safe enough for a first experiment. The agent is not deploying anything. It is not merging code. It is gathering context and producing a review summary.
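One way to keep that first experiment read-only is to write the flow down as an explicit plan of tool calls before executing anything. The sketch below does only that. All tool names are hypothetical placeholders in the article's "server.tool" style (including playwright.open_page), so map them to the tools your servers actually expose.

```typescript
type ToolCall = { name: string; arguments: Record<string, unknown> };

// Build the ordered list of context-gathering calls for one review.
// Nothing here writes, merges, or deploys.
function buildReviewPlan(prUrl: string, previewUrl: string): ToolCall[] {
  return [
    { name: "github.get_pull_request", arguments: { url: prUrl } },
    { name: "github.list_changed_files", arguments: { url: prUrl } },
    { name: "filesystem.read_file", arguments: { path: "/workspace/package.json" } },
    { name: "playwright.open_page", arguments: { url: previewUrl } }
  ];
}

const plan = buildReviewPlan(
  "https://github.com/example/app/pull/42",
  "https://preview.example.com"
);
console.log(plan.map((call) => call.name).join(" -> "));
// prints "github.get_pull_request -> github.list_changed_files -> filesystem.read_file -> playwright.open_page"
```

A plan that exists as data can be logged, reviewed, or checked against an allowlist before any call reaches the gateway.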

When to Use Docker MCP Toolkit

Use Docker MCP Toolkit when you are building agents that need multiple external tools, when you want repeatable local setup across a team, or when you want MCP servers to run in isolated containers instead of directly on every developer machine.

It is especially useful for TypeScript agent projects that combine GitHub, filesystem, browser automation, documentation search, databases, or cloud service tools. It is also useful when you want the same profile available across multiple AI clients.

Skip it for very small experiments. If you are testing one MCP server for an hour, manual setup may be faster. Bring in Docker MCP Toolkit when the setup starts becoming part of the problem.

Conclusion

MCP standardizes how agents talk to tools. Docker MCP Toolkit standardizes how those tools are discovered, configured, run, and shared.

That distinction matters. The future of agent development is not only about better prompts or smarter models. It is also about safer and more repeatable tool execution. Agents become more useful when they can access real systems, but they become harder to manage when every tool brings its own runtime, secrets, permissions, and setup instructions.

Docker MCP Toolkit gives TypeScript developers a practical way to manage that complexity. It lets teams create profiles, run MCP servers as containers, connect clients through a gateway, and reduce the dependency chaos that comes with multi-tool agents.

For a small prototype, you may not need it. For a real agent workflow that depends on GitHub, files, browsers, databases, or internal tools, Docker MCP Toolkit can make MCP feel less like a pile of scripts and more like a manageable development platform.

Stop Burning API Credits While Building AI Apps: Run Local LLMs with Docker Model Runner

Raju Dandigam — Thu, 07 May 2026 16:24:35 +0000

Building AI features usually starts with a cloud API. That is the fastest path when you are experimenting with chat interfaces, summarization, classification, content generation, or agent workflows. You add an SDK, pass an API key, send a prompt, and get a response back.

That simplicity is great, but during active development it can also become noisy. Every prompt experiment, failed test, retry, debugging session, and local demo sends another request to a paid service. For one developer, the cost may be small. For a team building AI features every day, those calls can add up quickly. There is also another concern: not every development prompt should leave your machine, especially when you are testing with internal documents, customer-like data, logs, or proprietary examples.

Docker Model Runner gives JavaScript developers another option. It lets you run AI models locally using Docker’s workflow and expose them through APIs that feel familiar to developers already using OpenAI-style clients. Docker describes Model Runner as a way to run and manage AI models locally, serve models through OpenAI and Ollama-compatible APIs, and package model files as OCI artifacts. That means AI models can start behaving more like other Docker-managed development dependencies.

This does not mean local models replace cloud models for every use case. They usually do not. Cloud models are still better for production workloads that need high-quality reasoning, scale, reliability, and the latest model capabilities. The more useful point is simpler: local models are very useful during development, especially when you want fast iteration, predictable cost, and better control over data.

Here is the workflow in one view.

The application code can stay almost the same. The main difference is configuration. In development, your OpenAI-compatible client points to Docker Model Runner. In production, it points to your cloud provider.

Docker Model Runner is integrated with Docker Desktop and Docker Engine. Docker’s API reference shows that host processes can access the Model Runner API at http://localhost:12434, while containers can access it through Docker networking patterns such as model-runner.docker.internal:12434 when configured through Compose.

Before writing code, enable Docker Model Runner in Docker Desktop if it is not already enabled. Then confirm the CLI is available.

docker model --help

You can pull a model using the Docker model command. The exact model you choose depends on what is available in your Docker environment and what your machine can run comfortably.

docker model pull ai/llama3.2:3B-Q4_K_M

After pulling a model, you can run a quick prompt from the command line.

docker model run ai/llama3.2:3B-Q4_K_M "Explain Docker containers in one sentence."

This is already useful for quick experiments, but the real value for JavaScript developers comes from calling the local model from a Node.js app.

Install the OpenAI SDK.

npm install openai

Now create a small TypeScript helper that talks to the local Docker Model Runner endpoint.

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY || "local-development-key",
  baseURL: process.env.OPENAI_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1"
});

export async function generateSummary(text: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "ai/llama3.2:3B-Q4_K_M",
    messages: [
      {
        role: "system",
        content: "You summarize technical text clearly and briefly."
      },
      {
        role: "user",
        content: `Summarize this text in three sentences:\n\n${text}`
      }
    ],
    temperature: 0.3
  });

  return response.choices[0]?.message?.content ?? "";
}

Then call it from a simple script.

import { generateSummary } from "./generate-summary";

async function main() {
  const summary = await generateSummary(`
    Docker Model Runner lets developers run AI models locally and call them
    through familiar API formats. This can reduce development cost and keep
    sensitive experimentation data on the developer machine.
  `);

  console.log(summary);
}

main().catch((error) => {
  console.error("Failed to generate summary:", error);
  process.exit(1);
});

The most important part is not the example itself. The important part is the boundary. Your application is not tightly coupled to one provider. It is coupled to an OpenAI-compatible interface. That gives you flexibility.

In local development, you can use this environment configuration.

OPENAI_BASE_URL=http://localhost:12434/engines/llama.cpp/v1
OPENAI_API_KEY=local-development-key

The rest of your application does not need to change. This pattern is valuable because most AI application code should not care whether the model is running locally or remotely. It should care about the contract: send messages, receive a response, handle errors, and validate the output.
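One way to keep that boundary explicit is to resolve the endpoint, key, and model name in a single place. The sketch below is only an illustration of that idea: `resolveModelConfig` and the `OPENAI_MODEL` variable are hypothetical names that do not appear elsewhere in this article, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical helper: resolves model settings from environment-style values.
// OPENAI_MODEL is an assumed variable name; the defaults mirror the local setup above.
type ModelConfig = {
  baseURL: string;
  apiKey: string;
  model: string;
};

type EnvValues = Record<string, string | undefined>;

function resolveModelConfig(env: EnvValues): ModelConfig {
  return {
    baseURL: env.OPENAI_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1",
    apiKey: env.OPENAI_API_KEY || "local-development-key",
    model: env.OPENAI_MODEL || "ai/llama3.2:3B-Q4_K_M",
  };
}

// With no overrides, the local Docker Model Runner defaults apply.
const localConfig = resolveModelConfig({});

// Overriding only the model keeps the other defaults.
const overriddenConfig = resolveModelConfig({ OPENAI_MODEL: "example-model" });
```

In an application you would likely call `resolveModelConfig(process.env)` once at startup and pass the result to both the client constructor and the `model` field of each request, so the model name is no longer hardcoded next to the prompt.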

A practical use case for local models is development-time text processing. For example, imagine you are building an internal support tool that summarizes customer tickets before a human reads them. During development, you may run the same prompt hundreds of times while tuning the wording, testing edge cases, and adjusting the UI. A local model is a good fit for that stage because you are optimizing the workflow, not making final production-quality decisions.

Here is a slightly more realistic example.

type TicketSummary = {
  category: "billing" | "bug" | "account" | "other";
  summary: string;
};

export async function summarizeTicket(ticketText: string): Promise<TicketSummary> {
  const response = await client.chat.completions.create({
    model: "ai/llama3.2:3B-Q4_K_M",
    messages: [
      {
        role: "system",
        content:
          'Classify the support ticket and summarize it. Return only valid JSON with the keys "category" (one of "billing", "bug", "account", "other") and "summary".'
      },
      {
        role: "user",
        content: ticketText
      }
    ],
    temperature: 0.2
  });

  const content = response.choices[0]?.message?.content ?? "{}";

  try {
    return JSON.parse(content) as TicketSummary;
  } catch {
    return {
      category: "other",
      summary: "The model returned an invalid response."
    };
  }
}

This example is intentionally simple. In a real application, you would validate the response with a schema library such as Zod, add retries for invalid JSON, and log model behavior for debugging. The point is that Docker Model Runner lets you build and test this workflow locally without sending every prompt to a cloud API.
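The validation and retry steps mentioned above can be sketched without any library. This is a hand-written stand-in for what a schema library such as Zod would do, not a replacement for it. The names `parseTicketSummary` and `summarizeWithRetry` are hypothetical, and `generate` stands in for the chat completion call so the sketch runs without a model.

```typescript
type TicketCategory = "billing" | "bug" | "account" | "other";
type ValidatedTicket = { category: TicketCategory; summary: string };

const TICKET_CATEGORIES: readonly string[] = ["billing", "bug", "account", "other"];

// Returns a validated ticket, or null when the text is not the expected JSON shape.
function parseTicketSummary(raw: string): ValidatedTicket | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const category = record.category;
  const summary = record.summary;
  if (typeof category !== "string" || !TICKET_CATEGORIES.includes(category)) return null;
  if (typeof summary !== "string") return null;
  return { category: category as TicketCategory, summary };
}

// Calls `generate` up to `maxAttempts` times until the output validates.
// `generate` is a placeholder for the model call shown earlier.
async function summarizeWithRetry(
  generate: () => Promise<string>,
  maxAttempts = 3
): Promise<ValidatedTicket> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseTicketSummary(await generate());
    if (parsed) return parsed;
  }
  // Same fallback as the simple version once every attempt has failed.
  return { category: "other", summary: "The model returned an invalid response." };
}
```

Keeping the parser separate from the retry loop means the parser can be unit tested with plain strings, which is useful when a small local model produces inconsistent output.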

Docker is also moving toward making models fit naturally into Compose-based development. The Docker Compose model reference describes a models section where an AI model can be defined as an OCI artifact, pulled and served by Model Runner, and then exposed to an application through injected connection information.

Conceptually, that means a future local AI development stack can look like this.

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      OPENAI_BASE_URL: http://model-runner.docker.internal:12434/engines/llama.cpp/v1
      OPENAI_API_KEY: local-development-key
    extra_hosts:
      - "model-runner.docker.internal:host-gateway"

This keeps the Node.js application containerized while still allowing it to reach the local Model Runner endpoint. Docker’s API docs specifically note that containers may need an extra_hosts entry to access model-runner.docker.internal through the host gateway.

There are several places where this local-first setup is useful.

It is useful for prompt iteration because you can test many versions without worrying about API usage. It is useful for privacy-sensitive development because test data can stay on your machine. It is useful for offline work after the model is already pulled. It is also useful for CI experiments where you want to run basic LLM-dependent tests without calling a cloud provider, although you should keep those tests small because local inference can be slower and hardware-dependent.

There are also clear limits.

Local models usually do not match the quality of the strongest hosted models. Smaller models can summarize, classify, rewrite, and answer simple questions reasonably well, but they may struggle with complex reasoning or long context tasks. Performance depends heavily on your hardware, especially RAM and GPU availability. A small model may run comfortably on a developer laptop, while a larger model may feel too slow for daily use.

Docker Model Runner is also best understood as a development tool first. Docker’s product page emphasizes local-first inference, no recurring API costs for local usage, privacy, and control. Those are development strengths. They do not automatically make it the right choice for high-scale production serving.

A healthy architecture is to keep both paths available.

Use local inference when you are designing prompts, building UI flows, testing basic behavior, working with sensitive examples, or experimenting with agent workflows. Use cloud inference when you need production reliability, stronger model quality, scale, monitoring, and service-level guarantees.

The bigger lesson is that AI development is starting to look more like normal software development. We want local dependencies. We want repeatable environments. We want clear configuration. We want the ability to run important parts of the system without depending on external services for every test.

Docker Model Runner fits into that shift. It brings AI models closer to the Docker workflow many developers already understand. You pull a model, run it locally, expose an API, and connect your application to it. For JavaScript and TypeScript developers, the OpenAI-compatible API makes the adoption path even easier because the application code can remain familiar.

This is not a replacement for cloud AI platforms. It is a practical addition to the developer toolbox. If you are building AI features in Node.js and you want cheaper prompt iteration, better local privacy, and a Docker-native workflow, Docker Model Runner is worth exploring.

Stop Messy AI Projects: A Clean Folder Structure for Real Agent Systems

Raju Dandigam — Tue, 05 May 2026 20:42:29 +0000

Every AI agent project starts the same way. You create an index.ts, add a prompt, maybe define a couple of tools, and everything works. For a while, it even feels clean and manageable. Then the system starts to grow. You introduce memory, add logging, experiment with multiple agents, and eventually build workflows. At that point, the simplicity disappears and the codebase turns into a collection of loosely connected files with no clear structure.

This is the part most tutorials skip. They show how to call a model, but they rarely show how to organize a system around it.

In a previous article, I discussed why AI agents should be designed as controlled systems where the model proposes actions and the application owns validation, execution, and safety. This article is the practical extension of that idea. If you were starting a TypeScript AI agent project today, this is the folder structure I would use to keep the system understandable and scalable.

At a high level, the structure looks like this:

my-ai-agent/
├── src/
│   ├── agents/
│   ├── tools/
│   ├── memory/
│   ├── workflows/
│   ├── mcp/
│   ├── prompts/
│   ├── middleware/
│   ├── types/
│   └── index.ts
├── config/
├── tests/
├── package.json
└── tsconfig.json

At first glance, it may feel like over-organization. In reality, you do not start with everything. You grow into it. The goal is not to create folders upfront, but to have a clear place for things as complexity increases.

Each folder has a single responsibility, and that clarity is what keeps the system predictable as it grows.

The reason structure matters more in AI systems than in traditional applications is that the execution path is not fixed. In a typical backend, a request follows a known route. In an agent system, the path depends on the model’s decisions. The agent might call different tools, retrieve different memory, or stop midway for approval. That flexibility is powerful, but it also makes systems harder to debug and reason about. Without structure, debugging becomes guesswork. With structure, behavior becomes traceable.

The best way to approach this is to start smaller than you think. A minimal setup is often enough in the beginning:

src/
├── agents/
│   └── researcher.ts
├── tools/
│   └── search.ts
└── index.ts

This is sufficient for a working agent. As the system grows, you introduce additional layers like memory, workflows, and middleware. The structure expands naturally instead of forcing a painful refactor later.

The agents folder is where you define what your system does. Each agent represents a role, typically combining a system prompt, a model configuration, and a set of tools. For example:

export const researcherAgent = {
  name: "researcher",
  systemPrompt: "You are a research assistant...",
  tools: ["web_search"],
  temperature: 0.3,
};

This folder answers a simple but important question: what roles exist in your system?

The tools folder defines what the agent is allowed to do. Tools are where agents become useful, but they are also where risk enters the system. Each tool should be explicit and controlled:

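export const searchTool = {
  name: "web_search",
  execute: async (query: string) => {
    // SEARCH_API_URL is a placeholder; Node's fetch needs an absolute URL.
    const baseUrl = process.env.SEARCH_API_URL ?? "http://localhost:8080";
    // Encode the query so spaces and symbols cannot break the URL.
    const response = await fetch(`${baseUrl}/search?q=${encodeURIComponent(query)}`);
    if (!response.ok) throw new Error(`Search failed: ${response.status}`);
    return response.json();
  },
};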

The key idea is not the implementation of the tool itself, but the boundary it creates. The agent should never have access to everything. It should only see and use tools that you explicitly register.
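Explicit registration can be as small as a map that refuses unknown names. The sketch below is one possible shape, not a prescribed API; `ToolRegistry` and `RegisteredTool` are hypothetical names, and the search tool is a stub that returns a string instead of calling a real service.

```typescript
type RegisteredTool = {
  name: string;
  execute: (input: string) => Promise<unknown>;
};

class ToolRegistry {
  private tools = new Map<string, RegisteredTool>();

  // Registering the same name twice is treated as a configuration error.
  register(tool: RegisteredTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Resolves the tool names an agent declares; unknown names fail loudly.
  resolve(names: string[]): RegisteredTool[] {
    return names.map((toolName) => {
      const tool = this.tools.get(toolName);
      if (!tool) throw new Error(`Tool not registered: ${toolName}`);
      return tool;
    });
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "web_search",
  execute: async (query) => `results for ${query}`, // stub, no network call
});

// An agent that lists "web_search" gets exactly that tool and nothing else.
const resolvedTools = registry.resolve(["web_search"]);
```

An agent definition that lists a tool nobody registered then fails at startup rather than at the moment the model tries to call it.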

The memory folder is where many systems become unnecessarily complex. Instead of pushing everything into prompts, memory should be isolated and managed intentionally. A simple starting point is often enough:

export class ContextMemory {
  private messages: string[] = [];

  add(message: string) {
    this.messages.push(message);
  }

  getAll() {
    return this.messages;
  }
}

You can introduce more advanced memory systems such as vector search only when the need becomes real.

The workflows folder is where individual agent actions become coordinated processes. Most real systems are not single-step interactions. They are sequences of decisions and actions:

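// Assumes each agent exposes a run() method and that analystAgent is
// defined alongside researcherAgent in the agents folder.
export async function researchPipeline(topic: string) {
  const research = await researcherAgent.run(topic);
  const analysis = await analystAgent.run(research);
  return analysis;
}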

This is the point where you move from an agent to a system.

The mcp folder introduces a clean boundary for integrating external systems using the Model Context Protocol. As MCP adoption grows, isolating these integrations becomes increasingly valuable. Even with MCP, your application still needs to control access, validation, and permissions.

The prompts folder is about separating content from logic. As prompts evolve, keeping them inline makes iteration harder. Moving them into dedicated files allows faster updates without touching code.
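Once prompts live in their own files, a small rendering helper is usually all the code needs. The sketch below assumes a `{{placeholder}}` syntax, which is a convention chosen for this example rather than something the folder structure requires. `renderPrompt` is a hypothetical name, and the template is inlined here only so the example is self-contained.

```typescript
// Fills {{placeholders}} in a prompt template. A missing variable throws
// instead of silently producing a broken prompt.
function renderPrompt(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = variables[key];
    if (value === undefined) throw new Error(`Missing prompt variable: ${key}`);
    return value;
  });
}

// In a real project this string would be loaded from a file under prompts/.
const summarizeTemplate = "Summarize the following {{kind}} in {{count}} sentences.";

const renderedPrompt = renderPrompt(summarizeTemplate, { kind: "ticket", count: "three" });
```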

The middleware folder is where production concerns live. This includes token budgets, logging, tracing, and rate limiting:

export class BudgetMiddleware {
  tokens = 0;

  track(usage: number) {
    this.tokens += usage;
  }
}

This layer is often what separates a simple demo from a production-ready system.
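Tracking usage becomes more useful when it can also stop a run. As a minimal sketch, assuming a fixed per-run limit, the class below throws once the total is exceeded. `TokenBudget` is a hypothetical name, and the numbers are arbitrary.

```typescript
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records usage and throws once the cumulative total exceeds the limit.
  track(usage: number): void {
    this.used += usage;
    if (this.used > this.limit) {
      throw new Error(`Token budget exceeded: ${this.used}/${this.limit}`);
    }
  }

  // Tokens still available, never negative.
  remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}

const budget = new TokenBudget(1000);
budget.track(400); // for example, the usage reported by one model call
```

Whether exceeding the budget should throw, pause for approval, or switch to a cheaper model is a design decision; throwing is simply the easiest behavior to test.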

The types folder is where TypeScript provides its real value. Centralizing interfaces ensures that when something changes, the impact is visible across the system:

export type Agent = {
  name: string;
  tools: string[];
};

This makes evolving the system much safer.

What most people miss is that folder structure is not just about organization. It reflects architecture. If your code mixes tools, prompts, memory, and execution logic randomly, your system will behave the same way. If your folders enforce separation of concerns, your system becomes predictable. This aligns directly with the architectural principle that the runtime controls execution, the model proposes actions, and the system validates behavior.

Testing should follow the same philosophy. You do not need a complex setup at the beginning. A simple structure is enough:

tests/
├── unit/
└── integration/

Start by testing tools and memory. Add workflow tests as the system evolves. End-to-end testing can come later once the system stabilizes.

As your project grows, the structure can evolve. You might introduce a providers folder if you support multiple LLMs, or a skills layer if capabilities become reusable across agents. At the same time, if the project remains small, it is perfectly valid to flatten the structure. The goal is not to follow a template rigidly, but to avoid chaos as complexity increases.

Most AI agent tutorials focus heavily on prompts and models. Very few focus on how to structure the system around them. In real-world projects, that is where most of the challenges appear. A good folder structure will not make your agent smarter, but it will make your system understandable, maintainable, and scalable. And in practice, that matters far more.

In the previous article, The TypeScript AI Agent Architecture I Would Use in 2026 (https://dev.to/raju_dandigam/the-typescript-ai-agent-architecture-i-would-use-in-2026-18k6), I covered the architecture behind controlled AI agents and why the model should not own the system. In a future post, I will show how to combine that architecture with this structure to build a minimal but production-ready agent in TypeScript. That is where everything connects.

The TypeScript AI Agent Architecture I Would Use in 2026

Raju Dandigam — Tue, 05 May 2026 05:46:54 +0000

Most AI apps do not fail because the model is bad. They fail because the system surrounding the model lacks structure.

The first version usually starts the same way. A user sends input, the app calls an LLM, and the response is returned. That is enough for a demo, but the moment the system needs to do anything real, the design starts to break.

A real AI system does more than generate text. It may need to call APIs, use tools, remember context, validate outputs, retry on failures, ask for human approval, and explain what happened. At that point, you are not building a chatbot anymore. You are building a system.

In 2026, I would not start with prompts. I would start with architecture.

The model is not the architecture

One of the biggest mistakes I see is treating the LLM as the center of the system. The model can suggest what to do next, but it should not control everything. It should not decide which tools are safe, whether a user has permission, or whether a risky action should proceed.

The model should propose. The application should decide. This simple shift changes how you design everything.

Think in terms of a loop, not a prompt

An agent is not a better prompt. It is a loop. The system gives the model a goal and context. The model suggests the next step. The system validates that step, executes it if allowed, records the result, and continues until the task is completed or blocked. Without this structure, agents become unpredictable. They repeat steps, call the wrong tools, or silently fail. With structure, they become workflows you can reason about.
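The loop described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not a full runtime: `propose` stands in for the model call, the tool map is the allow-list owned by the system, and all names (`runLoop`, `Proposal`, `LoopResult`) are hypothetical.

```typescript
type LoopStatus = "completed" | "blocked" | "failed";

type Proposal =
  | { action: "call_tool"; toolName: string; input: string }
  | { action: "finish"; answer: string };

type LoopResult = { status: LoopStatus; log: string[] };

type ToolFn = (input: string) => Promise<string>;

// `propose` is a placeholder for the model; `tools` is controlled by the application.
async function runLoop(
  goal: string,
  propose: (goal: string, log: string[]) => Promise<Proposal>,
  tools: Map<string, ToolFn>,
  maxSteps = 5
): Promise<LoopResult> {
  const log: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const proposal = await propose(goal, log);
    if (proposal.action === "finish") {
      log.push(`finish: ${proposal.answer}`);
      return { status: "completed", log };
    }
    const tool = tools.get(proposal.toolName);
    if (!tool) {
      // The model asked for something the system never registered.
      log.push(`rejected: ${proposal.toolName}`);
      return { status: "blocked", log };
    }
    // Validated step: execute it and record the result for the next proposal.
    log.push(`${proposal.toolName}: ${await tool(proposal.input)}`);
  }
  // Step limit reached without a finish proposal.
  return { status: "failed", log };
}
```

The step limit is what prevents the repeated-step failure mode mentioned above, and the recorded log is what the model sees on the next iteration.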

Start with a simple state model

Before anything else, define state.

type AgentState = {
  goal: string;
  steps: AgentStep[];
  status: "running" | "blocked" | "completed" | "failed";
};

type AgentStep = {
  name: string;
  input: unknown;
  output?: unknown;
};

This small structure changes everything. The system is no longer a single request-response call. It becomes a stateful workflow. You can inspect it, debug it, resume it, and control it.

The simplest way to think about it: the model suggests, the runtime controls, and the system decides what actually happens. I would keep the architecture simple and consistent.

The five layers that actually matter

The API layer handles requests, users, and permissions.
The runtime layer controls the loop, state, and execution.
The model layer interacts with LLMs through a gateway.
The tool layer defines what the agent is allowed to do.
The control layer handles validation, memory, observability, and approvals.

That is enough for most real systems.

Tools should be contracts, not suggestions

Tools are what make agents useful, but they are also where risk enters the system. If a model can call tools, those tools need structure.

type Tool = {
  name: string;
  risk: "low" | "high";
  execute: (input: unknown) => Promise<unknown>;
};

The key idea is simple. The model can request a tool. The system decides if that request is allowed. This is where most demos fall short. They give the model too much control.
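A minimal sketch of that decision, assuming the two risk levels from the type above and a boolean approval flag supplied by the application. `authorizeToolCall`, `GatedTool`, and the example tools are hypothetical names used only for illustration.

```typescript
type RiskLevel = "low" | "high";

type GatedTool = {
  name: string;
  risk: RiskLevel;
  execute: (input: unknown) => Promise<unknown>;
};

type GateResult = { allowed: true } | { allowed: false; reason: string };

// Decides whether a requested tool call may run.
// Unknown tools are rejected; high-risk tools need explicit approval.
function authorizeToolCall(
  tools: GatedTool[],
  requestedName: string,
  humanApproved: boolean
): GateResult {
  const tool = tools.find((candidate) => candidate.name === requestedName);
  if (!tool) return { allowed: false, reason: "unknown tool" };
  if (tool.risk === "high" && !humanApproved) {
    return { allowed: false, reason: "approval required" };
  }
  return { allowed: true };
}

// Stub tools for the example; neither touches a real system.
const gatedTools: GatedTool[] = [
  { name: "search_docs", risk: "low", execute: async () => "ok" },
  { name: "delete_record", risk: "high", execute: async () => "deleted" },
];

const lowRiskCall = authorizeToolCall(gatedTools, "search_docs", false);
const highRiskCall = authorizeToolCall(gatedTools, "delete_record", false);
```

The model never sees this function. It only sees that its request was either executed or refused with a reason.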

Memory should be intentional

More context does not always mean better results. Instead of sending everything to the model, retrieve only what matters. Think of memory as useful signals, not a full transcript. Short-term memory belongs to the current task. Semantic memory stores reusable facts. Episodic memory stores past actions. The important part is not storing memory. It is retrieving the right memory at the right time.

This keeps the system focused, cheaper, and easier to debug.

Structured outputs make the system usable

Free text works for user responses. It does not work for system decisions. If the model is deciding what to do next, it should return structured data.

type Decision = {
  action: "call_tool" | "finish" | "ask_user";
  toolName?: string;
};

This allows the system to validate behavior instead of guessing from text. The model suggests. The system verifies.
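Verification here means parsing the model output into that type and rejecting anything else. The sketch below does it by hand. A schema library would work equally well, and `parseDecision` is a hypothetical name. It also assumes one extra rule that the type alone cannot express: a `call_tool` decision must name a tool.

```typescript
const DECISION_ACTIONS = ["call_tool", "finish", "ask_user"] as const;
type DecisionAction = (typeof DECISION_ACTIONS)[number];

type ParsedDecision = {
  action: DecisionAction;
  toolName?: string;
};

function isDecisionAction(value: unknown): value is DecisionAction {
  return typeof value === "string" && (DECISION_ACTIONS as readonly string[]).includes(value);
}

// Returns a validated decision, or null if the model output is not usable.
function parseDecision(raw: string): ParsedDecision | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const action = record.action;
  if (!isDecisionAction(action)) return null;
  if (action === "call_tool") {
    // Assumed rule for this sketch: a tool call without a tool name is invalid.
    const toolName = record.toolName;
    if (typeof toolName !== "string" || toolName.length === 0) return null;
    return { action, toolName };
  }
  return { action };
}
```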

Observability is not optional

Agent systems are harder to debug because they are not deterministic. The same input may take a different path. If something goes wrong, you need to know:

What the model saw
What it decided
Which tool it called
What came back

Without this, debugging becomes guesswork. Even a simple step trace makes a big difference.
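A step trace does not need tracing infrastructure to be useful. As a rough sketch, the class below records the four items listed above and formats one line per step. `StepTrace` and its field names are hypothetical, and a real system would likely write these entries to a logger or tracing backend.

```typescript
type TraceEntry = {
  step: number;
  modelInput: string;  // what the model saw
  decision: string;    // what it decided
  toolName?: string;   // which tool it called, if any
  toolOutput?: string; // what came back
};

class StepTrace {
  private entries: TraceEntry[] = [];

  // Step numbers are assigned in recording order, starting at 1.
  record(entry: Omit<TraceEntry, "step">): void {
    this.entries.push({ step: this.entries.length + 1, ...entry });
  }

  // One compact line per step, suitable for logs.
  format(): string[] {
    return this.entries.map(
      (entry) =>
        `#${entry.step} decision=${entry.decision}` +
        (entry.toolName ? ` tool=${entry.toolName}` : "")
    );
  }
}

const trace = new StepTrace();
trace.record({
  modelInput: "Find Docker docs",
  decision: "call_tool",
  toolName: "web_search",
  toolOutput: "3 results",
});
trace.record({ modelInput: "3 results", decision: "finish" });
```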

Where frameworks fit

Frameworks can help, but they do not replace architecture.

Tools like:

Vercel AI SDK
LangGraph
OpenAI Agents SDK
Model Context Protocol

are useful for building agent systems. But they do not define your boundaries.

You still need to decide how state works, how tools are exposed, how outputs are validated, and how failures are handled.

The architecture I would trust

The architecture I would use in 2026 is not the most complex one. It is the one that gives control back to the system.

A stateful workflow.
A controlled loop.
Typed tools.
Structured outputs.
Observable steps.
Clear boundaries between model decisions and system execution.

That is what turns an AI demo into something you can actually trust. Because in real systems, reliability matters more than clever prompts.

What Most Beginners Get Wrong About Building AI Apps

Raju Dandigam — Tue, 14 Apr 2026 05:31:37 +0000

When you first start building AI-powered features, everything sounds deceptively simple. You call an API, pass some text, and get a response back. After a few experiments, it starts to feel like all AI systems are built the same way.

Then you hear terms like workflows, agents, and multi-agent systems, which only makes it more confusing. It is easy to assume these are just different names for the same thing.

That assumption is where most beginners go wrong.

Once you start building something real, something that needs to work consistently, scale, and handle edge cases, you quickly realize that these are fundamentally different ways of designing systems. The choice between them is not just about architecture. It directly affects reliability, cost, performance, and how easy your system is to debug when things break.

The biggest mistake beginners make is not understanding how decisions are made inside their system.

It’s not really about AI; it’s about decision-making

A much simpler way to think about AI systems is to ignore the model for a moment and focus on control.

In some systems, you control every step. In others, the AI decides what to do next. In more complex setups, multiple AI components collaborate and share responsibilities.

That difference in control is what shapes the entire system.

If you design everything as if the AI should always “figure it out,” you will often end up with something harder to manage than it needs to be. If you over-control everything, you may limit flexibility where it actually matters.

Understanding that balance early saves a lot of rework later.

A relatable way to think about it

Imagine you are ordering food.

In one scenario, the process is completely structured. You select items from a menu, enter your address, confirm payment, and receive your order. Every step is predefined and predictable.

In another scenario, you simply say, “I want something quick and healthy,” and the system figures out what you might like, asks follow-up questions, and adapts based on your answers.

Now imagine a third scenario where one system understands your intent, another finds suitable options, and another optimizes delivery timing. Each part focuses on a specific responsibility, and together they complete the task.

These three patterns represent very different ways of building AI applications, even though they might all use the same underlying model.

The simplest starting point: fixed decision paths

Most real-world AI systems start with something very simple. You define the steps, and the system follows them every time.

async function createSummary(text: string) {
  const cleaned = await cleanText(text);
  const summary = await generateSummary(cleaned);
  const keywords = await extractKeywords(summary);

  return { summary, keywords };
}

This approach is straightforward. Every execution follows the same sequence. If something fails, you know exactly where to look. If you need to optimize cost, you know how many model calls are happening. If you need to scale, the behavior is predictable.

This is why many production systems rely heavily on this pattern. It works well for document processing, onboarding flows, reporting pipelines, and content moderation. These are all scenarios in which the steps are known ahead of time and do not change much across requests.

Beginners often underestimate how powerful this approach is because it does not feel “intelligent.” In reality, this level of control is what makes systems reliable.

When the system needs to decide

There are cases where predefined steps start to break down. You may not know the next step until you see the input. The system may need to explore, ask questions, or adapt based on context.

That is where a different approach becomes useful.

async function runAgent(task: string) {
  return await agent({
    goal: task,
    tools: ["search", "summarize", "save"]
  });
}

Here, instead of defining the sequence, you define a goal and give the system a set of capabilities. The system decides whether it should search first, summarize later, or skip certain steps entirely.

This flexibility is valuable in areas like customer support, research, and planning. Every input can be different, and the system needs to adapt rather than follow a fixed path.

However, this comes with trade-offs. The number of steps may vary. The cost may vary. Debugging becomes less straightforward because the path is no longer fixed. You are trading control for flexibility.

This is often where beginners run into trouble. It is tempting to use this approach everywhere because it feels more powerful, but many problems simply do not need that level of adaptability.

When complexity grows further

As systems grow, you may find that a single decision-making unit becomes overloaded. Different parts of the task require different kinds of expertise. One part needs research, another needs writing, and another needs validation.

At that point, splitting responsibilities can help.

async function buildArticle(topic: string) {
  const research = await researchAgent(topic);
  const draft = await writerAgent(research);
  const final = await editorAgent(draft);

  return final;
}

Each component focuses on a specific responsibility. One gathers information, another transforms it, and another refines it. This separation can improve quality and make complex tasks more manageable.

At the same time, it introduces more moving parts. Coordination becomes important. Debugging becomes more complex. Costs can increase. This is why this pattern is usually introduced later rather than at the beginning.

Where most beginners go wrong

A common pattern I see is starting with the most flexible and complex approach first. It feels like the “correct” modern way to build AI systems.

In practice, it often leads to overengineering.

Simple tasks get wrapped in unnecessary complexity. Costs increase without clear benefits. Systems become harder to reason about. Small bugs become difficult to trace because the execution path is not fixed.

Another mistake is forcing a rigid structure onto problems that clearly require flexibility. If your system keeps adding exceptions, retries, and conditional branches to handle different cases, it may be a sign that the design needs to allow more dynamic behavior.

The real skill is not choosing one approach over another. It is knowing when each one makes sense.

A more practical way to build AI systems

Instead of picking one pattern and applying it everywhere, a better approach is to combine them.

Start with a simple, controlled structure and introduce flexibility only where it adds value.

async function handleSupport(message: string) {
  const type = await classify(message);

  if (type === "simple") {
    return searchFAQDatabase(message);
  }

  return runAgent(message);
}

In this example, straightforward questions are handled with a predictable path. More complex issues are handled with a flexible system that can adapt to the situation.

This approach keeps the system efficient and understandable while still allowing intelligence where it matters.

A useful mental model

If you can clearly define the steps, keep it simple and structured.

If the system needs to figure out the steps on its own, allow it more flexibility.

If the problem naturally breaks into multiple specialized responsibilities, consider separating them.

You do not need to start with the most advanced setup. In fact, starting simple often leads to better systems in the long run.

Conclusion

The goal is not to build the smartest system.

The goal is to build something that works reliably, is easy to understand, and can evolve as your requirements grow.

Most successful AI applications are not fully autonomous systems. They are carefully designed combinations of control and flexibility.

If you are just getting started, begin with something simple and predictable. Once you understand where your system needs more intelligence, add it deliberately.

That approach will take you much further than trying to build the most advanced system on day one.

Docker for TypeScript Developers Building AI Agents in 2026

Raju Dandigam — Fri, 10 Apr 2026 00:08:52 +0000

Modern frontend engineers are no longer just building UI layers. Increasingly, we are building systems that orchestrate AI behavior. A simple TypeScript service can now act as a coordinator between large language models, vector databases, background workers, and external tools.

That shift has quietly introduced a new class of problems. Not problems with writing code, but with running it.

You might have already experienced something like this. Your AI agent works perfectly on your machine. It calls an LLM, stores context in a vector database, maybe uses Redis for memory, and even talks to a Python service for embeddings. Then a teammate pulls the repo and tries to run it.

Suddenly, nothing works. Node versions don’t match. Python dependencies break. Redis isn’t running. Environment variables are missing. The system that felt simple is now fragile.

This is where Docker stops being “infrastructure tooling” and becomes something much

Why This Problem Is Different in 2026

Traditional web applications were mostly deterministic. If your code compiled and your dependencies matched, you could reasonably expect consistent behavior.

AI systems don’t behave that way. Even when your code is correct, outcomes vary based on context, prompts, and external services. That makes the execution environment even more critical. If the environment itself is inconsistent, debugging becomes nearly impossible.

On top of that, modern AI applications are rarely single-service systems. A typical setup might include:

A TypeScript API orchestrating agents
A vector database for retrieval
A cache or message queue for coordination
A Python service for embeddings or model execution
Optional local LLMs for development

This is no longer just a Node.js app. It is a distributed system, even during development.

Docker as the Execution Layer for AI Agents

The most useful way to think about Docker in this context is not as a deployment tool, but as a boundary.

Instead of letting your AI agent execute directly on your machine, you introduce a controlled environment where everything runs. The agent still makes decisions, but execution occurs within a container with defined tools, dependencies, and permissions.

This separation solves several problems at once. It makes environments reproducible, isolates dependencies, and gives you a safe place for agents to run code, tests, or workflows.

In practice, this means your TypeScript application becomes the orchestration layer, while Docker provides the execution layer.

Starting Simple: Containerizing a TypeScript Agent

Let’s begin with a minimal example. Imagine a small TypeScript service that acts as an AI agent using an LLM API.

import express from 'express';
import Anthropic from '@anthropic-ai/sdk';

const app = express();
app.use(express.json());

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

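app.post('/agent', async (req, res) => {
  try {
    const response = await client.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 500,
      messages: [{ role: 'user', content: req.body.prompt }],
    });

    res.json(response);
  } catch (error) {
    // Express 4 does not forward rejected promises from async handlers,
    // so handle failures here instead of leaving the request hanging.
    console.error('Agent request failed:', error);
    res.status(500).json({ error: 'Agent request failed' });
  }
});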

app.listen(3000, () => {
  console.log('Agent running on port 3000');
});

This works locally, but we want to make it portable and reproducible. A multi-stage Dockerfile gives us a clean way to do that.

FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src ./src

RUN npm run build

FROM node:20-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY --from=builder /app/dist ./dist

USER node

EXPOSE 3000

CMD ["node", "dist/index.js"]

Now the application runs on the same Node.js version with the same locked dependencies wherever the image is used. That removes most dependency drift, missing tools, and ambiguity about runtime behavior.

Moving to Real Systems: Multi-Agent Architecture

The real value of Docker becomes obvious when you move beyond a single service.

Consider a common multi-agent setup:

A coordinator that receives requests
A research agent that fetches and analyzes information
A code agent that generates or modifies code
Redis for communication
PostgreSQL for persistence

Instead of managing all of this manually, Docker Compose lets you define the entire system in one place.

services:
  coordinator:
    build: ./services/coordinator
    ports:
      - "3000:3000"
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://user:pass@postgres:5432/agents
    depends_on:
      - redis
      - postgres

  research-agent:
    build: ./services/research-agent
    ports:
      - "3001:3001"
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis

  code-agent:
    build: ./services/code-agent
    ports:
      - "3002:3002"
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis

  redis:
    image: redis:7-alpine

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=agents

Running docker compose up brings the entire system to life. Each agent runs in its own container, and the agents reach Redis and Postgres over the Compose network using the service names as hostnames. This is far more stable than stitching the services together by hand.
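One detail worth handling in the agents themselves: the short form of `depends_on` used above only orders container startup; it does not wait for Redis or Postgres to accept connections. A minimal sketch of a retry helper that an agent could wrap around its first connection attempt is shown below. The `retryWithBackoff` name and its default values are assumptions for illustration, not something the original services define.

```typescript
// Retries an async operation with exponential backoff between attempts.
// Rethrows the last error once maxAttempts is exhausted.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Delays grow as baseDelayMs, 2x, 4x, ... between attempts
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }

  throw lastError;
}
```

An agent would pass its own connect call as `operation`, for example the connect method of whichever Redis client it uses. The alternative is to add a healthcheck to the dependency and use the long form of `depends_on` with `condition: service_healthy`, which moves the waiting into Compose.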

When TypeScript Meets Python

Most AI systems today are not purely JavaScript. Even if your orchestration layer is TypeScript, you will likely depend on Python for embeddings, model execution, or specialized libraries.

Docker makes this integration straightforward by separating concerns into services.

services:
  agent:
    build: ./agent-service
    ports:
      - "3000:3000"
    environment:
      - ML_SERVICE_URL=http://ml-service:8000
    depends_on:
      - ml-service

  ml-service:
    build: ./ml-service
    ports:
      - "8000:8000"

Your TypeScript agent can now call the Python service without worrying about local Python installations or dependency conflicts.

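// ML_SERVICE_URL is set in the Compose file; fall back to the service name
const ML_SERVICE_URL = process.env.ML_SERVICE_URL ?? 'http://ml-service:8000';

async function getEmbeddings(text: string) {
  const response = await fetch(`${ML_SERVICE_URL}/embed`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ text })
  });

  // fetch only rejects on network errors, so check the HTTP status explicitly
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  return response.json();
}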

This separation becomes critical as systems grow. Each service can scale independently, and each stack can evolve without breaking others.
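Letting each stack evolve independently works best when the boundary between them is checked at runtime. The sketch below validates the JSON returned by the Python service before the TypeScript side uses it. The response shape `{ embedding: number[] }` is an assumption made for illustration; the article does not specify what `/embed` returns, so adjust the check to the real contract.

```typescript
// Assumed response shape for POST /embed; the real service may differ.
interface EmbeddingResponse {
  embedding: number[];
}

// Validates an untyped JSON payload so a contract change in the Python
// service fails loudly here instead of corrupting data downstream.
function parseEmbeddingResponse(payload: unknown): EmbeddingResponse {
  if (typeof payload !== 'object' || payload === null) {
    throw new Error('Expected a JSON object from /embed');
  }

  const { embedding } = payload as { embedding?: unknown };
  const isNumberArray =
    Array.isArray(embedding) &&
    embedding.every((value) => typeof value === 'number' && Number.isFinite(value));

  if (!isNumberArray) {
    throw new Error('Expected "embedding" to be an array of finite numbers');
  }

  return { embedding: embedding as number[] };
}
```

The caller would pass the parsed JSON body through `parseEmbeddingResponse` and work with the typed result from there on.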

Development Experience Without Friction

One concern developers often have is that Docker slows down iteration. That can happen if you rebuild containers on every change, but it does not have to.

A better approach is to use volume mounts with a watch mode.

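FROM node:20-alpine

WORKDIR /app

# Install the watcher first so this layer stays cached when source files change
RUN npm install -g tsx

COPY package*.json ./
RUN npm install

COPY tsconfig.json ./
COPY src ./src

# tsx watch restarts the process whenever a watched file changes
CMD ["tsx", "watch", "src/index.ts"]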

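As long as your local src directory is bind-mounted or synced into /app/src (for example with a Compose Watch sync action), you can edit code locally and tsx restarts the process inside the container automatically. You get the benefits of Docker without sacrificing developer experience.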

What Actually Changes After Adopting This

The impact of this approach is not theoretical. It shows up immediately in how teams work.

Onboarding becomes faster because new developers do not need to recreate environments manually. Running the system becomes predictable because everything is defined in one place. Debugging improves because you eliminate environment-related variables.

More importantly, it changes how you think about AI systems. Instead of treating them as scripts or services, you start treating them as controlled execution environments. The agent decides what to do, but Docker defines how it is allowed to do it.

A More Useful Mental Model

It helps to think of modern AI systems as having three distinct layers.

The first is the decision layer, where the language model or agent determines what actions to take. The second is the orchestration layer, typically written in TypeScript, where workflows and integrations are defined. The third is the execution layer, where those actions actually run.

Docker fits naturally into that third layer. It provides a reproducible, isolated environment in which those actions run consistently and within limits you define.
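To make the three layers concrete, here is a minimal sketch of how an orchestration layer might turn an agent's decision into a constrained `docker run` invocation. The `AgentAction` shape, the `ALLOWED_TOOLS` allowlist, the `node-eval` tool name, and the specific resource limits are all hypothetical values chosen for illustration; the Docker flags themselves are standard `docker run` options.

```typescript
// Decision layer output: what the agent wants to do (hypothetical shape).
interface AgentAction {
  tool: string;
  args: string[];
}

// Orchestration layer policy: which tools may run, and in which image.
// A Map avoids accidental matches on inherited object properties.
const ALLOWED_TOOLS = new Map<string, string>([
  ['node-eval', 'node:20-alpine'],
]);

// Execution layer: translate an approved action into `docker run` arguments
// that restrict network access, filesystem writes, and resources.
function buildDockerRunArgs(action: AgentAction): string[] {
  const image = ALLOWED_TOOLS.get(action.tool);
  if (image === undefined) {
    throw new Error(`Tool "${action.tool}" is not allowed`);
  }

  return [
    'run',
    '--rm',                 // remove the container when it exits
    '--network', 'none',    // no network access
    '--read-only',          // read-only root filesystem
    '--memory', '256m',     // memory cap
    '--cpus', '0.5',        // CPU cap
    '--pids-limit', '128',  // limit the number of processes
    '--cap-drop', 'ALL',    // drop all Linux capabilities
    image,
    ...action.args,
  ];
}
```

The resulting array can be handed to `execFile('docker', args)` from `node:child_process`, which avoids shell interpolation of agent-supplied arguments. The point of the design is that the agent only chooses the action, while the allowlist and flags decide what that action is permitted to touch.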

Once you start thinking in these terms, Docker no longer feels like an optional tool. It becomes a fundamental part of building reliable AI systems.

Conclusion

The biggest mistake teams make with AI development today is underestimating the importance of the execution environment. It is easy to focus on prompts, models, and frameworks, but those are only part of the system.

What matters just as much is where and how those decisions are executed.

For TypeScript developers, Docker provides a practical way to bring structure and reliability to increasingly complex AI workflows. It bridges the gap between frontend development and distributed systems, without requiring a complete shift in tooling or mindset.

If you are building AI agents in 2026, you are already working with multi-service systems, mixed runtimes, and non-deterministic behavior. Docker is what makes all of that manageable.

Start small. Containerize a single agent. Then add services as your system grows. Over time, you will find that it is not just about making things run, but about making them run in a way you can trust.