<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Muzammil Shakir</title>
    <description>The latest articles on Forem by Muzammil Shakir (@muzammil_shakir).</description>
    <link>https://forem.com/muzammil_shakir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866282%2Fbcd5778f-ecb2-4497-a71c-5ef0c5eb385b.png</url>
      <title>Forem: Muzammil Shakir</title>
      <link>https://forem.com/muzammil_shakir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/muzammil_shakir"/>
    <language>en</language>
    <item>
      <title>Stop trying to 'train' ChatGPT on your docs</title>
      <dc:creator>Muzammil Shakir</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:16:52 +0000</pubDate>
      <link>https://forem.com/muzammil_shakir/training-chatgpt-on-private-data-a-technical-reference-2k48</link>
      <guid>https://forem.com/muzammil_shakir/training-chatgpt-on-private-data-a-technical-reference-2k48</guid>
      <description>&lt;p&gt;Every few weeks someone on my team gets the same request: &lt;em&gt;"Can you just train ChatGPT on our docs?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Short answer: no — and you almost never actually want to. The long answer is more interesting, because the thing most people call "training" is three very different engineering problems. Pick the wrong one and you'll spend a month on a fine-tune that can't even remember your product's name.&lt;/p&gt;

&lt;h2&gt;"Training" is actually three separate problems&lt;/h2&gt;

&lt;p&gt;When a non-engineer says "train ChatGPT on our data," they mean one of these — and each has a completely different implementation path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; — &lt;em&gt;how&lt;/em&gt; the model talks (tone, format, refusal rules). No code, no data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grounding&lt;/strong&gt; — &lt;em&gt;what&lt;/em&gt; the model can reference (RAG, file uploads, tool calls). This is the one you probably want.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt; — &lt;em&gt;what patterns&lt;/em&gt; the model follows (style, classification, strict formats). Not a knowledge store.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ Warning&lt;/strong&gt;&lt;br&gt;
Fine-tuning a model on your PDFs to "teach it your docs" is one of the most expensive ways to get a worse result than RAG. The model won't reliably recall specific facts — that's not what fine-tuning does.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're building a support bot, an internal knowledge assistant, or a sales enablement tool, grounding (RAG + good instructions) is almost always the right primitive.&lt;/p&gt;
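&lt;p&gt;To make the split concrete, here's a minimal sketch of "instructions + grounding" in code. The product name, chunk shape, and retrieval source are all made up; the point is that the instructions stay static while the knowledge arrives fresh with every request:&lt;/p&gt;

```python
# Minimal sketch: instructions vs. grounding.
# The system prompt carries tone and refusal rules; retrieved chunks
# carry knowledge, supplied per request. `retrieved_chunks` stands in
# for whatever your vector store or search layer returns.

INSTRUCTIONS = (
    "You are a support assistant for Acme. Answer only from the "
    "provided context. If the context is insufficient, say you don't know."
)

def build_messages(question, retrieved_chunks):
    """Assemble a chat payload: stable instructions plus fresh context."""
    context = "\n\n".join(
        f"[{c['source']}] {c['text']}" for c in retrieved_chunks
    )
    return [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

msgs = build_messages(
    "What is the refund window?",
    [{"source": "refunds.md", "text": "Refunds are honored for 30 days."}],
)
```

&lt;p&gt;Update the index and the assistant "knows" something new on the very next request; no retraining, no redeploy. That's the whole argument for grounding over fine-tuning.&lt;/p&gt;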

&lt;h2&gt;The four real options, ranked by "when you should reach for them"&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Instructions&lt;/strong&gt; — your need is consistent tone or formatting. No knowledge involved. Start here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom GPT&lt;/strong&gt; — knowledge set is small (a dozen-ish files), changes rarely, audience is trusted. Zero code, ships in an afternoon.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API + RAG&lt;/strong&gt; — knowledge is large, changes often, needs permissions, or you need tool use (CRM lookups, ticket creation). This is where production lives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt; — you need behavioral constraints: tight output formats, classification, style imitation. &lt;em&gt;Not&lt;/em&gt; for "remembering our docs."&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;💡 Tip&lt;/strong&gt;&lt;br&gt;
A useful rule: if the thing you want the assistant to know could change next month, put it in retrieval. If it's "how we talk," put it in instructions. Never try to fine-tune either into the weights.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;A RAG pipeline that actually ships&lt;/h2&gt;

&lt;p&gt;I'll skip the "what's a vector database" overview. The thing that separates demos from production is the evaluation and governance loop, not the embedding model. Here's the skeleton I use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scope&lt;/strong&gt; — pick 3–5 sources, write down what's out of scope. Never start with "all our Confluence."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean&lt;/strong&gt; — structured formats (Markdown, clean HTML) beat scraping. Kill duplicates and outdated pages before indexing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk&lt;/strong&gt; — 200–800 words per chunk, and keep the section header attached to each chunk. A chunk stripped of its header loses the context retrieval needs to rank and interpret it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index&lt;/strong&gt; — embeddings + metadata (source, last-updated, permissions). Permissions &lt;em&gt;especially&lt;/em&gt; — global-read indexes leak in interesting ways.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer with citations&lt;/strong&gt; — prompt the model to only use retrieved chunks. On weak retrieval, return "I don't know" and ask for clarification. Don't let it guess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate&lt;/strong&gt; — pull 30–100 real questions from support logs. Score: correct, complete, cited, on-tone. Re-run this every prompt change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; — thumbs up / down in the UI, re-index on doc changes, version your prompts.&lt;/li&gt;
&lt;/ol&gt;
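&lt;p&gt;Step 3 is the one people most often get wrong, so here's a toy sketch of header-preserving chunking. The Markdown parsing is deliberately naive (top-level &lt;code&gt;#&lt;/code&gt; headers only) and the word window is an assumption you should tune:&lt;/p&gt;

```python
# Sketch of step 3: split a Markdown doc into chunks that each keep
# their section header, capped at a word-count window.

def chunk_markdown(text, max_words=800):
    # First pass: group body lines under their nearest header.
    sections, header, buf = [], "", []
    for line in text.splitlines():
        if line.startswith("#"):
            if buf:
                sections.append((header, " ".join(buf)))
            header, buf = line.lstrip("# ").strip(), []
        else:
            buf.append(line.strip())
    if buf:
        sections.append((header, " ".join(buf)))

    # Second pass: window each section body, re-prefixing the header
    # so every chunk carries its own context into the index.
    chunks = []
    for header, body in sections:
        words = body.split()
        for i in range(0, max(len(words), 1), max_words):
            piece = " ".join(words[i:i + max_words])
            chunks.append(f"{header}\n{piece}".strip())
    return chunks

doc = "# Refunds\nRefunds are honored for 30 days.\n# Shipping\nWe ship worldwide."
```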

&lt;p&gt;Six of those seven are unglamorous. Teams skip them and then wonder why "ChatGPT" is lying to their customers.&lt;/p&gt;
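&lt;p&gt;The single highest-leverage unglamorous piece is the refusal path from step 5. A minimal version, with toy vectors standing in for real embeddings and a threshold you'd calibrate against your own eval set:&lt;/p&gt;

```python
import math

# Sketch of step 5's refusal path: if the best retrieved chunk scores
# below a similarity threshold, return "I don't know" instead of letting
# the model guess. Vectors here are toy stand-ins for real embeddings,
# and 0.75 is an illustrative threshold, not a recommendation.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer_or_refuse(query_vec, index, threshold=0.75):
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in index),
        reverse=True,
    )
    best_score, best_text = scored[0]
    if best_score >= threshold:
        return f"Answer from: {best_text}"
    return "I don't know. Can you rephrase or add more detail?"
```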

&lt;h2&gt;The governance piece everyone skips&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;th&gt;What to actually do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Empty retrieval&lt;/td&gt;
&lt;td&gt;Model guesses, confidently&lt;/td&gt;
&lt;td&gt;Force refusal + clarification prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stale docs in index&lt;/td&gt;
&lt;td&gt;Assistant cites a 2022 policy&lt;/td&gt;
&lt;td&gt;Re-index on change, track &lt;code&gt;last_updated&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global-read index&lt;/td&gt;
&lt;td&gt;Anyone can query HR / contracts&lt;/td&gt;
&lt;td&gt;Enforce permissions at retrieval time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drift after prompt change&lt;/td&gt;
&lt;td&gt;Quality quietly tanks&lt;/td&gt;
&lt;td&gt;Regression test against golden set&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets in prompts&lt;/td&gt;
&lt;td&gt;Token leaks into training / logs&lt;/td&gt;
&lt;td&gt;Strip them at the gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
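&lt;p&gt;The global-read row deserves a sketch, because the fix is structural: filter by the caller's permissions &lt;em&gt;before&lt;/em&gt; ranking, not after. The metadata shape and the toy term-match scorer are assumptions; the ordering of filter-then-rank is the point:&lt;/p&gt;

```python
# Sketch of permission-aware retrieval: drop chunks the caller can't
# see before scoring, so a shared index can't leak HR docs to everyone.
# The `allowed_groups` metadata field is a made-up convention.

def retrieve(query_terms, index, user_groups, top_k=3):
    allowed = [
        doc for doc in index
        if not set(doc["allowed_groups"]).isdisjoint(user_groups)
    ]
    # Toy relevance: count of query terms appearing in the chunk text.
    def score(doc):
        return sum(term in doc["text"].lower() for term in query_terms)
    return sorted(allowed, key=score, reverse=True)[:top_k]

index = [
    {"text": "Salary bands for 2025", "allowed_groups": ["hr"]},
    {"text": "Refunds are honored for 30 days", "allowed_groups": ["everyone"]},
]
```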

&lt;p&gt;If your assistant is going to face customers or touch sensitive data, none of these are optional.&lt;/p&gt;
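&lt;p&gt;And the drift row is cheap to automate. A hedged sketch of a golden-set regression gate, where &lt;code&gt;ask&lt;/code&gt; is a stand-in for your actual assistant call and the two cases are invented examples:&lt;/p&gt;

```python
# Sketch of a golden-set regression check: replay real questions after
# every prompt change and fail loudly if the pass rate drops. Substring
# matching is the crudest possible scorer; it's still better than nothing.

GOLDEN_SET = [
    {"question": "What is the refund window?", "must_contain": "30 days"},
    {"question": "Do you ship to the EU?", "must_contain": "worldwide"},
]

def run_regression(ask, baseline_pass_rate=1.0):
    passed = sum(
        case["must_contain"].lower() in ask(case["question"]).lower()
        for case in GOLDEN_SET
    )
    rate = passed / len(GOLDEN_SET)
    if rate >= baseline_pass_rate:
        return rate
    raise AssertionError(f"Quality regressed: {rate:.0%} vs {baseline_pass_rate:.0%}")
```

&lt;p&gt;Wire that into CI next to your unit tests and "quality quietly tanks" becomes "the build went red."&lt;/p&gt;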

&lt;h2&gt;Quick answers&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Should I fine-tune GPT-4 on our docs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Almost certainly no. Fine-tuning teaches behavior, not knowledge. Use RAG for knowledge and keep fine-tuning for things like "always output JSON in this schema" or "classify this support ticket."&lt;/p&gt;
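&lt;p&gt;For contrast, this is roughly what fine-tuning data looks like when it's used for what it's good at: one behavioral example per line of a chat-format JSONL file. The labels and ticket text here are invented:&lt;/p&gt;

```python
import json

# One training example for a behavior task (ticket classification),
# in chat-format JSONL: the model learns the input-to-label pattern,
# not a body of facts. Ticket text and labels are made up.

example = {
    "messages": [
        {"role": "system", "content": "Classify the ticket: billing, bug, or feature."},
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
    ]
}
line = json.dumps(example)  # one line of the .jsonl training file
```

&lt;p&gt;Notice there's nowhere in that record to put "our refund window is 30 days" and have it reliably come back out. That's the mismatch in a nutshell.&lt;/p&gt;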

&lt;p&gt;&lt;strong&gt;Q: Is a Custom GPT enough for production?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For an internal prototype, yes. For a customer-facing assistant with permissions, audit logs, or tool use — no. You'll hit Custom GPT's ceiling fast and end up rebuilding on the API anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the cheapest path to a working assistant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Custom Instructions + a Custom GPT with 5–10 of your best documents. That's a zero-code baseline. If the outputs are good enough for internal use, ship it. If users start asking things the documents don't cover, &lt;em&gt;that&lt;/em&gt; is the signal to graduate to RAG.&lt;/p&gt;




&lt;p&gt;This is a condensed version. The full guide with the comparison table across all five methods, a detailed RAG blueprint, FAQs, and a governance checklist lives on our site: &lt;a href="https://musketeerstech.com/blogs/how-to-train-chatgpt-on-your-own-data/" rel="noopener noreferrer"&gt;Read the full guide →&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;What's the dumbest request you've gotten from stakeholders about "training AI on our data"? Drop it below — I'll bet I can match it.&lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
