<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Abhinav</title>
    <description>The latest articles on Forem by Abhinav (@abhinav-balki).</description>
    <link>https://forem.com/abhinav-balki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862765%2F9b88a2c3-532f-4dea-8332-150e8b147410.jpg</url>
      <title>Forem: Abhinav</title>
      <link>https://forem.com/abhinav-balki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/abhinav-balki"/>
    <language>en</language>
    <item>
      <title>Prompting Without the Menu</title>
      <dc:creator>Abhinav</dc:creator>
      <pubDate>Mon, 27 Apr 2026 06:56:21 +0000</pubDate>
      <link>https://forem.com/abhinav-balki/prompting-without-the-menu-297i</link>
      <guid>https://forem.com/abhinav-balki/prompting-without-the-menu-297i</guid>
      <description>&lt;p&gt;I read a list-format post on prompting techniques over the weekend. Few-shot, chain-of-thought, ReAct, RAG, self-consistency, meta-prompting... fifteen items, flat list, equal weight. Each one explained briefly, each one given the same visual real estate.&lt;/p&gt;

&lt;p&gt;The format is the problem.&lt;/p&gt;

&lt;p&gt;When you list techniques as parallel options, you train the reader to pick the one that &lt;em&gt;sounds&lt;/em&gt; most appropriate to their task. That's how prompting becomes cargo-culting. Someone reads "use chain-of-thought for complex reasoning" and starts adding &lt;em&gt;Let's think step by step&lt;/em&gt; to every prompt that feels hard. The technique gets used, but it's not solving anything specific, because nothing specific was diagnosed in the first place.&lt;/p&gt;

&lt;p&gt;Prompting techniques aren't options. They're responses to failure modes. The list-format post collapses that distinction, and the collapse is what makes prompting feel like trial-and-error instead of engineering.&lt;/p&gt;

&lt;p&gt;Here's how I've started thinking about it instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  A workflow is a pipeline, not a single prompt
&lt;/h2&gt;

&lt;p&gt;Most of the techniques in those lists assume you're optimizing one prompt: one input, one output, get the wording right. That framing is wrong for anything beyond a one-shot question. Real work with an LLM is a pipeline, and the stages aren't symmetric.&lt;/p&gt;

&lt;p&gt;Early stages reduce entropy. Later stages spend tokens on reasoning. If you skip the first kind and jump straight to the second, every downstream technique compounds noise instead of reducing it.&lt;/p&gt;

&lt;p&gt;Compression has to come first. Before generating anything, three questions do most of the work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What exists? (the actual current state: files, schemas, code paths, existing logic)&lt;/li&gt;
&lt;li&gt;What's broken? (the specific gap, not the vague feeling of wrongness)&lt;/li&gt;
&lt;li&gt;What should the output look like? (shape, not content)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skip this and the model fills the gap with probabilistic noise. You'll get plausible-looking output that's wrong in ways that take longer to debug than the original problem. Every prompting technique downstream of a vague problem statement is rearranging deck chairs.&lt;/p&gt;
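
&lt;p&gt;As a concrete sketch (the project details here are invented), a compressed problem statement can be as short as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What exists:   POST /tickets in routes/tickets.py writes to Postgres; embeddings are
               added by a separate nightly job.
What's broken: tickets created after the nightly run are invisible to vector search
               until the next day.
Output shape:  a diff to routes/tickets.py only. No schema changes, no new services.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;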

&lt;h2&gt;
  
  
  Techniques sort by what they're for, not how they sound
&lt;/h2&gt;

&lt;p&gt;Once you've compressed, the techniques fall into four groups based on what kind of failure they address:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern biasing (low-cost control knobs).&lt;/strong&gt; Few-shot, role, format, instruction prompting. These are persistent constraints, not reactive fixes. Set them once at the top of the workflow (&lt;em&gt;prefer minimum diff, no abstraction until justified, assume repo context&lt;/em&gt;) and inject them only when the model drifts. The mistake is treating them as prompts to write fresh each time. They're more like config than instructions.&lt;/p&gt;
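
&lt;p&gt;Treated as config, the whole group fits in a short preamble that rides along with every request. A sketch, using the constraints this post already mentions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Role:     reviewer working inside an existing repo, not a greenfield generator
Format:   return a unified diff, nothing else
Constraints (persistent):
  - prefer the minimum diff
  - no new abstraction until justified
  - assume repo context; do not invent files or APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;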

&lt;p&gt;&lt;strong&gt;Reasoning scaffolds (conditional, expensive).&lt;/strong&gt; Chain-of-thought, tree-of-thought, reflection. Reflection is the cheapest and most local, so it should be the default. Generate, then immediately ask what failure modes and edge cases exist, then patch. Only escalate to CoT or ToT when the model is visibly guessing at structure rather than missing facts. These trade cost and latency for accuracy... that trade is worth it less often than the lists suggest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External state interaction (for missing data).&lt;/strong&gt; Prompt chaining, ReAct, least-to-most. These are about getting information &lt;em&gt;into&lt;/em&gt; the workflow that wasn't there before. Use when the task decomposes cleanly into stages, or when the model needs to act on the world before continuing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Systems we can borrow from (RAG-shaped).&lt;/strong&gt; A repo, a logs directory, a diff history are all RAG layers in disguise. The lesson from RAG isn't &lt;em&gt;use a vector database&lt;/em&gt;. It's &lt;em&gt;don't describe; point&lt;/em&gt;. Specific files, specific versions, specific diffs. Description introduces drift; pointing grounds it.&lt;/p&gt;
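
&lt;p&gt;The difference is easiest to see side by side (the path and commit below are made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Describing: "the auth middleware we refactored recently"
Pointing:   "backend/middleware/auth.py at commit 4f2a91c, plus the diff from the last release"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;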

&lt;h2&gt;
  
  
  The reframe that changed the most
&lt;/h2&gt;

&lt;p&gt;The biggest single shift in how I prompt: stop asking the model whether the output is correct.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is this right?&lt;/em&gt; invites defense. The model has just produced the output. Asking it to grade itself produces a confident affirmative most of the time, because the same machinery that generated the output is now being asked to evaluate it.&lt;/p&gt;

&lt;p&gt;The better question: &lt;strong&gt;what assumptions did you make, and what breaks first?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This exposes seams instead of inviting justification. It forces the model to surface the implicit decisions baked into the output: the input formats it assumed, the edge cases it didn't handle, the constraints it inferred but didn't verify. Once those are visible, you can decide whether each one is acceptable. &lt;em&gt;Is this correct&lt;/em&gt; gets you a yes. &lt;em&gt;What breaks first&lt;/em&gt; gets you a list.&lt;/p&gt;

&lt;p&gt;This is also a better reflection prompt than the standard "review your answer for errors" pattern, because it doesn't depend on the model finding its own mistakes. It depends on the model articulating its own assumptions, which is a much more reliable thing to ask of it.&lt;/p&gt;
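
&lt;p&gt;One way to phrase that follow-up turn, verbatim:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;List every assumption you made while producing that output: input formats you assumed,
edge cases you skipped, constraints you inferred but did not verify.
Then tell me which assumption breaks first, and on what kind of input.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;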

&lt;h2&gt;
  
  
  Rolling state, not history
&lt;/h2&gt;

&lt;p&gt;For anything multi-turn, the default failure mode is context bloat. Every turn appends to the history, the history gets too long, you compact it, and compaction loses the load-bearing context: the early decisions, the constraints, the invariants that everything downstream depends on.&lt;/p&gt;

&lt;p&gt;The fix is to maintain a compressed state explicitly. Not history. State. Key decisions, active constraints, things-that-must-not-change. Update it as you go. When context gets tight, you compact the conversational history but never the state.&lt;/p&gt;

&lt;p&gt;Dense context. Saved tokens. Preserved direction.&lt;/p&gt;
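
&lt;p&gt;Concretely, the state can live as a short pinned block that gets re-sent every turn and edited in place (the entries below are invented):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STATE (never compact)
Decisions:    keep the current auth flow; store embeddings where they already live
Constraints:  minimum diff; no new dependencies
Invariants:   public API routes and response shapes must not change
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;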

&lt;h2&gt;
  
  
  What to skip
&lt;/h2&gt;

&lt;p&gt;Some techniques in the typical list increase entropy without giving you a way to reduce it back. Zero-shot prompting (when you have examples you could show instead), self-consistency for tasks that aren't ambiguous, meta-prompting for problems you haven't compressed yet: these add cost and uncertainty without buying you a corresponding reduction in entropy. They have their place, but they're not the default tools, and listing them with equal weight to reflection or grounding is misleading.&lt;/p&gt;

&lt;p&gt;Worse: they're often the techniques that &lt;em&gt;feel&lt;/em&gt; sophisticated, which means people reach for them first. Self-consistency feels rigorous. Meta-prompting feels meta. Both are easy ways to spend tokens without spending thought.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual diagnostic
&lt;/h2&gt;

&lt;p&gt;Here's the decision logic, condensed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output is wrong in a vague way?&lt;/strong&gt; Reduce entropy. You haven't compressed the problem yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output has wrong structure or format?&lt;/strong&gt; Pattern bias: few-shot, format, instruction. Cheap and high-leverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning slipped somewhere?&lt;/strong&gt; Add scaffolding. Reflection first; CoT or ToT only when the model is guessing at structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output is factually off?&lt;/strong&gt; Ground it. Point at files, versions, diffs. Don't describe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Failure mode → fix type. That's the whole framework. The specific techniques are implementations of these four moves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmf53kvafeats37qbpba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmf53kvafeats37qbpba.png" alt="Prompt Diagnostic" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is also why list-format posts feel unsatisfying after a while. They give you the techniques without the diagnostic, which is like handing someone a toolbox without telling them how to identify what's broken. You end up applying tools by feel rather than by indication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;The reason I've been thinking about this isn't really about prompting techniques. It's about the layer above them... how developers, platforms, and users work with generative AI systems, and where the friction in that interaction comes from.&lt;/p&gt;

&lt;p&gt;The friction isn't usually that the model is bad. It's that the interface between human intent and model output is underspecified at every level. List-format posts on prompting techniques are a symptom of that: they try to make the interface tractable by enumerating its surface, but the actual problem is structural.&lt;/p&gt;

&lt;p&gt;That's a longer thread, and not for this post. But it's where the next few are going.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post grew out of reading one too many "complete guide to prompting" lists. The decision tree above is my attempt to compress what those lists actually need, into something diagnostic instead of encyclopedic.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Dockerized a Production AI System as an Intern. Here's What Actually Mattered.</title>
      <dc:creator>Abhinav</dc:creator>
      <pubDate>Mon, 06 Apr 2026 04:49:03 +0000</pubDate>
      <link>https://forem.com/abhinav-balki/i-dockerized-a-production-ai-system-as-an-intern-heres-what-actually-mattered-2bmd</link>
      <guid>https://forem.com/abhinav-balki/i-dockerized-a-production-ai-system-as-an-intern-heres-what-actually-mattered-2bmd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;No CI/CD. No Kubernetes. Just PuTTY, WinSCP, and a system that needed to stop being fragile.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The System I Walked Into
&lt;/h2&gt;

&lt;p&gt;I'm an intern on an AI team. My project is an internal AI support tool: an augmented RAG-based system that ingests knowledge bases, searches resolved tickets via vector similarity, and synthesizes resolutions using an LLM. FastAPI backend, React frontend, PostgreSQL with pgvector, ChromaDB for embeddings, OpenAI for generation.&lt;/p&gt;

&lt;p&gt;The AI pipeline is interesting. The infrastructure it was running on was not.&lt;/p&gt;

&lt;p&gt;Here's what "deployment" looked like when I joined:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Directory A (test):
  └── git pull → run uvicorn directly on EC2

Directory B (production):
  └── teammate manually copies changed files from Directory A
  └── find-and-replace URLs
  └── run uvicorn directly on EC2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four ports exposed: separate frontend and backend for each environment. No containerization. No build step for the frontend. No rollback mechanism. No isolation between test and production beyond "they're in different folders." The frontend was served straight off Vite's dev server, even in production.&lt;/p&gt;

&lt;p&gt;If the EC2 instance had a bad day, reconstruction was from memory and hope.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtj4192i26w3gweit981.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtj4192i26w3gweit981.png" alt="Docker-Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One directory. Docker Compose with overlay files for environment separation. nginx as a reverse proxy (two ports instead of four). Image versioning with semantic tags and timestamp backups. A deploy script. A rollback script. Full isolation between prod and test: different Docker networks, different data volumes, different container names.&lt;/p&gt;

&lt;p&gt;Deploy went from "copy files and pray" to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./deploy.sh prod 1.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Constraints That Shaped Everything
&lt;/h2&gt;

&lt;p&gt;This is the part I actually want to talk about. The Docker setup isn't novel... anyone can follow a tutorial. What made this interesting was what I couldn't do.&lt;/p&gt;

&lt;h2&gt;
  
  
  No CI/CD
&lt;/h2&gt;

&lt;p&gt;No GitHub Actions. No webhooks. No automated pipelines. My deployment tools are PuTTY (SSH terminal) and WinSCP (file transfer). That's it. So I built a shell script that acts as a poor-man's pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./deploy.sh [stack] [version] [branch]
     │
     ├── git pull origin [branch]
     ├── tag current running images as backup
     ├── docker compose build
     ├── docker compose up -d
     └── prune dangling images
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
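
&lt;p&gt;Filled in, it looks roughly like this. A simplified sketch, not the exact script: only the backend image is shown, and the image name is the one from the rollback example later in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

STACK="${1:?usage: ./deploy.sh [stack] [version] [branch]}"   # prod or test
VERSION="${2:-latest}"
BRANCH="${3:-main}"
STAMP="$(date +%Y%m%d_%H%M)"

git pull origin "$BRANCH"

# keep the currently running image reachable as a rollback target
docker tag resolvyst-backend:latest "resolvyst-backend:${STAMP}" || true

# pick the overlay (and env file) for the requested stack
if [ "$STACK" = "prod" ]; then
  COMPOSE="docker compose -f docker-compose.yml -f docker-compose.prod.yml"
else
  COMPOSE="docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml"
fi

$COMPOSE build
$COMPOSE up -d

# pin the fresh build to the requested version, then drop dangling layers
docker tag resolvyst-backend:latest "resolvyst-backend:v${VERSION}"
docker image prune -f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;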



&lt;p&gt;The branch argument means I can test feature branches on the test stack without merging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./deploy.sh &lt;span class="nb"&gt;test &lt;/span&gt;latest feature/new-rag-pipeline
&lt;span class="c"&gt;# verify on internal IP&lt;/span&gt;
&lt;span class="c"&gt;# merge PR&lt;/span&gt;
./deploy.sh prod 1.3 main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is this as good as GitHub Actions with automated tests and staging environments? No. Does it work reliably for a single-server deployment with one developer? Yes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shared Secrets, Different Environments
&lt;/h2&gt;

&lt;p&gt;The backend reads its config from a single .env file: database credentials, API keys, OIDC settings. Both prod and test use the same file because they're on the same machine talking to the same database.&lt;/p&gt;

&lt;p&gt;But the OIDC redirect URIs must differ between environments. Prod redirects to the public DNS. Test redirects to the internal IP.&lt;/p&gt;

&lt;p&gt;The solution: Docker Compose's precedence rules. environment: in a compose file beats env_file:. So the base compose file loads the shared secrets via env_file:, and each overlay (prod.yml, test.yml) overrides just the OIDC URI via environment:.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml (base)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend/.env&lt;/span&gt;  &lt;span class="c1"&gt;# shared secrets&lt;/span&gt;

&lt;span class="c1"&gt;# docker-compose.prod.yml (overlay)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OIDC_REDIRECT_URI_FRONTEND=https://public.dns.com&lt;/span&gt;

&lt;span class="c1"&gt;# docker-compose.test.yml (overlay)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OIDC_REDIRECT_URI_FRONTEND=http://&amp;lt;IP&amp;gt;:&amp;lt;Port&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a small detail. It's also the kind of thing that causes a two-hour debugging session if you don't know about it. env_file values get silently overridden by environment values, and there's no warning, no log, nothing. You just get the wrong redirect and stare at your OIDC provider's error page.&lt;/p&gt;
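
&lt;p&gt;One cheap way to catch it early, assuming the file names above: render the merged config before starting anything and check which value actually won.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# prints the fully merged service definitions, including the final environment values
docker compose -f docker-compose.yml -f docker-compose.prod.yml config | grep OIDC_REDIRECT_URI_FRONTEND
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;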

&lt;h2&gt;
  
  
  Volume Isolation Without Duplication
&lt;/h2&gt;

&lt;p&gt;ChromaDB stores embeddings on disk. The knowledge base files live on disk. Logs go to disk. Prod and test need completely separate copies of all of these. You don't want a test run corrupting production embeddings.&lt;/p&gt;

&lt;p&gt;Docker Compose variable substitution handles this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./backend/${CHROMA_DIR:-chromaDB}:/app/chromaDB&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./backend/${DATA_DIR:-data}:/app/data&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./backend/${LOGS_DIR:-logs}:/app/logs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env.test
&lt;/span&gt;&lt;span class="py"&gt;CHROMA_DIR&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;chromaDB-test&lt;/span&gt;
&lt;span class="py"&gt;DATA_DIR&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;data-test&lt;/span&gt;
&lt;span class="py"&gt;LOGS_DIR&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;logs-test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default values point to prod directories. When you pass --env-file .env.test, the paths switch to test directories. Same compose file, different data. The deploy script handles this automatically, and you never pass --env-file manually.&lt;/p&gt;
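
&lt;p&gt;The same config-rendering trick from the OIDC section confirms which host directories the bind mounts will actually use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# with the test env file, the bind sources resolve to chromaDB-test, data-test and logs-test
docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml config | grep source:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;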

&lt;h2&gt;
  
  
  The Override File Pattern
&lt;/h2&gt;

&lt;p&gt;Docker Compose has a feature where docker-compose.override.yml is automatically loaded alongside docker-compose.yml, but only when you don't use explicit -f flags.&lt;/p&gt;

&lt;p&gt;I used this to create three distinct modes from the same codebase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Local development (override.yml auto-loaded)&lt;/span&gt;
docker compose watch
→ Vite dev server with HMR, OIDC pointing to localhost

&lt;span class="c"&gt;# Production (explicit -f, override.yml skipped)&lt;/span&gt;
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.yml &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.prod.yml up
→ nginx serves built static files, OIDC pointing to public DNS

&lt;span class="c"&gt;# Test (explicit -f, override.yml skipped)  &lt;/span&gt;
docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env.test &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.yml &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.test.yml up
→ nginx serves built static files, OIDC pointing to internal IP, &lt;span class="nb"&gt;test &lt;/span&gt;volumes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One base file. Three overlays. Three completely different behaviors. The developer never thinks about which files to compose: docker compose watch just works locally, and ./deploy.sh picks the right overlay on EC2.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration
&lt;/h2&gt;

&lt;p&gt;The scariest part was the cutover. Two directories, both running live. I needed to consolidate into one without downtime on prod.&lt;br&gt;
The sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stop old test stack (directory A)&lt;/li&gt;
&lt;li&gt;Stop old prod stack (directory B)
&lt;/li&gt;
&lt;li&gt;Copy prod data (ChromaDB, knowledge base, secrets) into directory A&lt;/li&gt;
&lt;li&gt;Create isolated test directories (start empty)&lt;/li&gt;
&lt;li&gt;Pull latest code with all new compose files&lt;/li&gt;
&lt;li&gt;Deploy prod from directory A → verify public DNS works&lt;/li&gt;
&lt;li&gt;Deploy test from directory A → verify internal IP works&lt;/li&gt;
&lt;li&gt;Archive directory B&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 6 is where you sweat. The public DNS now needs to resolve to the new container, with the right OIDC config, serving the right data. If anything is wrong, users see a broken page.&lt;/p&gt;

&lt;p&gt;It worked on the first try. Which means I probably over-prepared, but I'd rather over-prepare than explain to the team why production is down.&lt;/p&gt;
&lt;h2&gt;
  
  
  Rollback
&lt;/h2&gt;

&lt;p&gt;The deploy script tags current images before rebuilding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resolvyst-backend:latest  →  resolvyst-backend:v1.2
                          →  resolvyst-backend:20260403_1430
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rollback reuses the saved image without touching data volumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./rollback.sh prod v1.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is basic, but it's infinitely better than what existed before (nothing). The timestamp tag is insurance. Even if you forget to bump the version, you can still roll back to any previous deploy by timestamp.&lt;/p&gt;
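
&lt;p&gt;A sketch of the idea (again simplified and backend-only, and assuming the services are pinned to the resolvyst-*:latest tags): re-point latest at the saved image, then recreate the containers without rebuilding.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

STACK="${1:?usage: ./rollback.sh [stack] [tag]}"   # prod or test
TAG="${2:?e.g. v1.2 or 20260403_1430}"

# point latest back at the saved image; data volumes are never touched
docker tag "resolvyst-backend:${TAG}" resolvyst-backend:latest

if [ "$STACK" = "prod" ]; then
  docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --no-build --force-recreate
else
  docker compose --env-file .env.test -f docker-compose.yml -f docker-compose.test.yml up -d --no-build --force-recreate
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;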

&lt;h2&gt;
  
  
  What I'd Do Differently With More Access
&lt;/h2&gt;

&lt;p&gt;If I had CI/CD and wasn't constrained to PuTTY:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Health checks in the compose file. Right now, deploy.sh reports success even if the backend crashes on startup. A curl check post-deploy would catch that (a sketch follows this list).&lt;/li&gt;
&lt;li&gt;Separate secrets per environment. The shared .env file works but is fragile; one wrong edit affects both stacks.&lt;/li&gt;
&lt;li&gt;Automated smoke tests after deploy. Hit the health endpoint, verify the RAG pipeline returns a response, check that OIDC redirects correctly.&lt;/li&gt;
&lt;li&gt;Git working tree check at the top of deploy.sh. Right now, nothing stops you from deploying with uncommitted changes on EC2.&lt;/li&gt;
&lt;/ul&gt;
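
&lt;p&gt;The curl check from the first bullet is small enough to sketch here; the endpoint and port are placeholders for whatever the backend actually exposes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# appended to the end of deploy.sh: poll a health endpoint instead of trusting "up -d" alone
for _ in $(seq 1 30); do
  if curl -fsS -o /dev/null http://localhost:8000/health; then
    echo "backend is healthy"
    exit 0
  fi
  sleep 2
done
echo "backend never became healthy after the deploy"
exit 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;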

&lt;p&gt;But these are improvements to a system that already works. The first version doesn't need to be perfect. It needs to be better than what it replaced, and "someone manually copy-pasting files" is a low bar to clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Takeaway
&lt;/h2&gt;

&lt;p&gt;The interesting skill in infrastructure work isn't knowing Docker or nginx or Compose. It's designing around constraints you can't remove. I couldn't set up CI/CD. I couldn't get a second server. I couldn't change the OIDC provider's configuration beyond adding redirect URIs. I had an intern's access level.&lt;/p&gt;

&lt;p&gt;So I built something that works within those constraints. It's not elegant by industry standards. But it's reproducible, it's rollback-safe, it has environment isolation, and it replaced a process that depended on one person's memory of which files to copy where.&lt;/p&gt;

&lt;p&gt;That's the gap between knowing tools and doing systems design. Tools are things you learn. Systems design is figuring out what to do when the tools you want aren't available.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;I'm an intern working on AI systems: RAG pipelines, support ticket analytics, UX upgrades and apparently now DevOps. If you're working on similar problems or just want to talk about building things under real-world constraints, I'd love to connect.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
