<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tim Maximov</title>
    <description>The latest articles on Forem by Tim Maximov (@macsart_ai_by_tim).</description>
    <link>https://forem.com/macsart_ai_by_tim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853236%2F04e806aa-60cc-4cd8-87b8-7b6ca3e9ecf4.jpg</url>
      <title>Forem: Tim Maximov</title>
      <link>https://forem.com/macsart_ai_by_tim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/macsart_ai_by_tim"/>
    <language>en</language>
    <item>
      <title>System Instead of Team: Rethinking How Businesses Are Built</title>
      <dc:creator>Tim Maximov</dc:creator>
      <pubDate>Wed, 01 Apr 2026 02:12:43 +0000</pubDate>
      <link>https://forem.com/macsart_ai_by_tim/system-instead-of-team-rethinking-how-businesses-are-built-k1h</link>
      <guid>https://forem.com/macsart_ai_by_tim/system-instead-of-team-rethinking-how-businesses-are-built-k1h</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;System Instead of Team: Rethinking How Businesses Are Built&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most founders believe they are building a team. In practice, they are building a system, just not an explicit one. This system is distributed across people, decisions, and shared context. It exists in habits, implicit rules, and accumulated experience. As long as the original participants remain involved and the context is preserved, such a system appears stable. However, this stability is conditional and does not survive change.&lt;/p&gt;

&lt;p&gt;The problem becomes visible when the environment shifts. Team composition changes, the volume of tasks increases, or the system is applied in a slightly different context. At this point, what previously looked consistent begins to diverge. The same inputs lead to different outputs, decisions vary depending on who makes them, and the overall behavior of the organization becomes less predictable. This is not a failure of execution but a consequence of how the system is structured.&lt;/p&gt;

&lt;p&gt;At an early stage, this variability is often interpreted as noise. It is attributed to growth, complexity, or temporary misalignment. In reality, it is structural. The system has always depended on interpretation rather than definition. Scaling does not introduce this property; it amplifies it. As the number of decisions and participants increases, so does the number of possible interpretations.&lt;/p&gt;

&lt;p&gt;Any functioning team already operates within a system. Decisions are made, tasks are executed, and results are evaluated according to some internal logic. The critical distinction is not whether this logic exists, but where it resides. When it resides in people, it changes with people. When it resides in context, it degrades as context fades. In both cases, the system lacks independence from its carriers.&lt;/p&gt;

&lt;p&gt;This becomes a limiting factor under growth. Scaling is often framed as a problem of capacity, requiring more people, more coordination, and more management. In practice, it is a problem of reproducibility. The question is not how many tasks can be processed, but whether identical conditions produce identical outcomes. If they do not, the system is not scaling; it is fragmenting.&lt;/p&gt;

&lt;p&gt;Teams compensate for this fragmentation through communication and alignment. They fill gaps, resolve ambiguities, and synchronize understanding. While effective in the short term, this approach does not eliminate variability. It redistributes it. Coordination becomes increasingly expensive, and the system remains dependent on continuous human mediation.&lt;/p&gt;

&lt;p&gt;An explicit system addresses this at a different level. It separates logic from the individuals executing it by defining rules, constraints, and decision paths in a form that does not rely on memory or interpretation. This does not eliminate the role of the team but changes it. Instead of carrying the system, the team operates within it. Decisions become reproducible rather than situational, and outcomes become predictable rather than dependent on individual judgment.&lt;/p&gt;
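
&lt;p&gt;One way to see the difference: implicit logic lives in someone's judgment, while explicit logic can be written down as data that any executor applies identically. A toy sketch follows; the policy and names are invented for illustration, not taken from any real system.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A decision path captured as explicit data rather than individual
# judgment. Identical inputs produce identical outputs, regardless
# of who (or what) executes the rule.

REFUND_POLICY = {
    # (damaged_on_arrival, within_30_days) → decision
    (True, True): "full_refund",
    (True, False): "replacement",
    (False, True): "store_credit",
    (False, False): "escalate_to_human",
}

def decide_refund(damaged_on_arrival, within_30_days):
    """Reproducible: the rule lives in data, not in anyone's memory."""
    return REFUND_POLICY[(damaged_on_arrival, within_30_days)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;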

&lt;p&gt;This distinction becomes more pronounced with the introduction of automation. Automation does not create structure; it assumes its existence. When applied to an implicit system, it accelerates existing inconsistencies. Ambiguity is not resolved but encoded, and variability is not reduced but propagated at higher speed. As a result, automation amplifies both correctness and error, depending on the quality of the underlying system.&lt;/p&gt;

&lt;p&gt;Recent advances in AI systems, particularly language models, further expose these structural properties. Unlike humans, such systems do not share implicit context and do not compensate for missing information through experience. They operate strictly on the provided input. When the system contains gaps, contradictions, or undefined elements, these are not smoothed over but translated into inconsistent outputs. What was previously hidden within human interpretation becomes observable at the level of system behavior.&lt;/p&gt;

&lt;p&gt;This shift changes the framing of the problem. The question is no longer how to improve team performance within an implicit structure, but how to define the structure itself. A system must be described in terms of decisions, constraints, and relationships in a way that allows it to function independently of specific individuals. Without this, any attempt to scale will increase variability rather than throughput.&lt;/p&gt;

&lt;p&gt;In this context, the role of the team is redefined. A team is not the source of system logic but its execution layer. It applies rules, handles edge cases, and maintains operation within defined boundaries. The quality of execution depends on the clarity of the system, not on the implicit knowledge of its members. This reduces dependency on individual context and enables consistent behavior across different participants and conditions.&lt;/p&gt;

&lt;p&gt;At small scale, the difference between implicit and explicit systems is negligible. Informal coordination is sufficient, and variability remains within acceptable limits. At larger scale, this difference becomes fundamental. Systems that rely on implicit logic require increasing effort to maintain consistency, while explicit systems can replicate behavior with minimal coordination overhead.&lt;/p&gt;

&lt;p&gt;Ultimately, the transition is not from team to system, but from implicit to explicit structure. A team can maintain a system, but it cannot replace it. As complexity grows, the absence of explicit structure becomes the primary constraint on development.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further exploration
&lt;/h2&gt;

&lt;p&gt;This article is part of a broader exploration of how systems behave under scale, loss of context, and reinterpretation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/macsart_ai_by_tim/series/37809"&gt;https://dev.to/macsart_ai_by_tim/series/37809&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>startup</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Your Knowledge, Your Model — Part 3: Determinism Is Not Accuracy</title>
      <dc:creator>Tim Maximov</dc:creator>
      <pubDate>Tue, 31 Mar 2026 10:46:16 +0000</pubDate>
      <link>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-part-3-determinism-is-not-accuracy-1b6j</link>
      <guid>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-part-3-determinism-is-not-accuracy-1b6j</guid>
      <description>&lt;p&gt;Two agents. Same knowledge base. Same question. Different answers.&lt;/p&gt;

&lt;p&gt;Both answers are internally consistent. Both are traceable to real sources. Neither agent made anything up. And yet they disagree.&lt;/p&gt;

&lt;p&gt;This is not a hallucination problem. It's not an agent quality problem. It's a &lt;strong&gt;determinism problem&lt;/strong&gt; — and it's the one nobody talks about.&lt;/p&gt;




&lt;h2&gt;
  
  
  What determinism means in a knowledge system
&lt;/h2&gt;

&lt;p&gt;Most people ask two things of their knowledge system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the information there? &lt;em&gt;(completeness)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Is it correct? &lt;em&gt;(accuracy)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This method adds a third requirement that almost nobody names explicitly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any agent, reading your sources in any order, must arrive at the same model of the system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not the same as accuracy. Data can be accurate in every individual file and still produce different models depending on reading order. The failure modes are subtle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A connection is described from side A but not from side B&lt;/li&gt;
&lt;li&gt;The same concept has two different names in two different places&lt;/li&gt;
&lt;li&gt;One file says "may", another says "always" — for the same behavior&lt;/li&gt;
&lt;li&gt;A rule exists in one layer but not in the layer where agents expect to find it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are factual errors. Every file is "correct." But the system as a whole is non-deterministic — its output depends on which file the agent happened to read first.&lt;/p&gt;

&lt;p&gt;If two agents read the same knowledge base and build different models — the knowledge base is non-deterministic. That's a bug, not a disagreement, not a matter of interpretation.&lt;/p&gt;
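
&lt;p&gt;This requirement can be tested mechanically. A minimal sketch (the &lt;code&gt;build_model&lt;/code&gt; step here is a crude stand-in for "an agent reads the sources", not a real agent): build from the same files in two different orders and compare the results.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

def build_model(file_texts):
    """Stand-in: collect 'term: definition' lines. The last definition
    read wins, which is exactly how order dependence sneaks in."""
    model = {}
    for text in file_texts:
        for line in text.splitlines():
            if ":" in line:
                term, _, definition = line.partition(":")
                model[term.strip().lower()] = definition.strip()
    return model

def fingerprint(model):
    canonical = "\n".join(sorted(f"{k}={v}" for k, v in model.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

def is_order_invariant(file_texts):
    forward = fingerprint(build_model(file_texts))
    backward = fingerprint(build_model(list(reversed(file_texts))))
    return forward == backward  # False means reading-order dependency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;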




&lt;h2&gt;
  
  
  Why this is harder to catch than hallucination
&lt;/h2&gt;

&lt;p&gt;Hallucination fails in the visible direction. The output doesn't match anything in the sources — you can check.&lt;/p&gt;

&lt;p&gt;Non-determinism is invisible because the output matches &lt;em&gt;something&lt;/em&gt; in the sources. It's just not the right something. If you ask "is this in my knowledge base?" — the answer is yes. The answer is always yes. You just got the wrong version of yes.&lt;/p&gt;

&lt;p&gt;This is why COLLAPSE markers matter so much. Without them, every silent choice looks like a confident answer. With them, you can see exactly where the system branched — and which branch was taken.&lt;/p&gt;

&lt;p&gt;But COLLAPSE markers only help after a choice has been made. Determinism is about preventing the ambiguity that forces the choice in the first place.&lt;/p&gt;




&lt;h2&gt;
  
  
  The three sources of non-determinism
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Asymmetric connections&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your knowledge includes relationships between concepts — A connects to B — both sides need to describe the connection. If only A mentions B but B doesn't mention A, then an agent starting from B will build a model where that connection doesn't exist.&lt;/p&gt;

&lt;p&gt;This is the most common failure. It feels like thorough documentation. It isn't.&lt;/p&gt;

&lt;p&gt;The test: for every connection in your system, can you find the description from both sides? If not — you have asymmetric coverage, and reading order determines what agents know.&lt;/p&gt;
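
&lt;p&gt;A sketch of that test, assuming connections have already been extracted as (source, target) pairs; the extraction itself is domain-specific and out of scope here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def asymmetric_connections(connections):
    """Return every A→B connection with no matching B→A description."""
    described = set(connections)
    return sorted(pair for pair in described
                  if (pair[1], pair[0]) not in described)

links = [("billing", "auth"), ("auth", "billing"), ("billing", "reports")]
for a, b in asymmetric_connections(links):
    print(f"{a} → {b} is described, but {b} → {a} is not")
# billing → reports is described, but reports → billing is not
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;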

&lt;p&gt;&lt;strong&gt;2. Terminology drift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same concept named differently in different places. "Decision" in one file, "resolution" in another, "outcome" in a third — all meaning the same thing.&lt;/p&gt;

&lt;p&gt;Each individual file is internally consistent. But across files, an agent has no way to know these are the same thing. It builds three separate concepts. And when it reasons about them, the models diverge.&lt;/p&gt;

&lt;p&gt;The fix is not renaming everything in one pass — that introduces iatrogenesis. The fix is a terminology map: here are all the names we use for this concept, and this one is canonical. Then agents can normalize before reasoning.&lt;/p&gt;
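
&lt;p&gt;A sketch of such a terminology map, with invented terms: every alias points to one canonical name, and agents normalize at read time instead of rewriting the sources.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;TERMINOLOGY = {
    "decision": "decision",    # canonical form maps to itself
    "resolution": "decision",
    "outcome": "decision",
}

def normalize(term):
    """Known aliases map to the canonical name; unknown terms pass through."""
    return TERMINOLOGY.get(term.lower(), term)

assert normalize("Resolution") == "decision"
assert normalize("outcome") == "decision"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;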

&lt;p&gt;&lt;strong&gt;3. Layer violations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When information lives on the wrong layer, agents reading only the correct layer miss it. An agent doing a deep-spec pass reads specs — and misses the business rule that was written in the navigation layer because it seemed important at the time.&lt;/p&gt;

&lt;p&gt;This creates a specific kind of non-determinism: the model depends not just on reading order, but on reading depth. An agent doing a shallow pass and one doing a deep pass build different models — even from the same starting point.&lt;/p&gt;
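
&lt;p&gt;A sketch of a layer audit along these lines; the rule-marker heuristic is invented for the example and deliberately crude:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Words that suggest a sentence states a rule rather than navigation.
RULE_MARKERS = re.compile(r"\b(must|always|never|only if)\b", re.IGNORECASE)

def navigation_layer_violations(nav_text):
    """Sentences in the navigation layer that read like business rules
    and probably belong on a deeper layer."""
    sentences = [s.strip() for s in nav_text.replace("\n", " ").split(". ")]
    return [s for s in sentences if s and RULE_MARKERS.search(s)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;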




&lt;h2&gt;
  
  
  How to test for determinism
&lt;/h2&gt;

&lt;p&gt;Three practical tests. You don't need multiple agents to run them — you can do them manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 1 — The reverse order test&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pick your five most important questions about your domain. Answer them from your knowledge base, starting from the first file you'd naturally open.&lt;/p&gt;

&lt;p&gt;Now answer them again, but start from a file you'd normally read last.&lt;/p&gt;

&lt;p&gt;Do the answers change? If yes — you have reading-order dependency. Something is determined by entry point, not by content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 2 — The two-path test&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pick one connection or relationship in your system. Find where it's described from side A. Find where it's described from side B (if it exists). Do both descriptions agree on: what the connection is, when it applies, who initiates it?&lt;/p&gt;

&lt;p&gt;If they disagree on any of these — you have a COLLAPSE:RED that hasn't been marked yet.&lt;/p&gt;

&lt;p&gt;If side B doesn't exist — you have an asymmetric connection. An agent starting from B will never know this relationship exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 3 — The empty layer test&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each layer in your pyramid, ask: what would an agent know if it read only this layer and nothing below?&lt;/p&gt;

&lt;p&gt;Then ask: is that the right amount for this layer to communicate?&lt;/p&gt;

&lt;p&gt;If the navigation layer contains business rules — agents reading only the navigation layer will build an overconfident model. If the spec layer is missing fields that appear in the scenario layer — agents building from specs will have an incomplete model.&lt;/p&gt;

&lt;p&gt;The right answer for each layer is: exactly what belongs here, nothing more, nothing less.&lt;/p&gt;




&lt;h2&gt;
  
  
  What done looks like
&lt;/h2&gt;

&lt;p&gt;A knowledge system is deterministic when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every important question has one unambiguous answer traceable to a specific source&lt;/li&gt;
&lt;li&gt;The same question answered by any agent produces the same answer regardless of reading order&lt;/li&gt;
&lt;li&gt;Every connection is described from both sides with consistent details&lt;/li&gt;
&lt;li&gt;Every layer contains exactly what belongs there — no more, no less&lt;/li&gt;
&lt;li&gt;Every contradiction has a COLLAPSE marker — none are silent&lt;/li&gt;
&lt;li&gt;Incomplete coverage is labeled explicitly, not hidden&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than it sounds. "I've covered the main sources" is not a status. "I've read 14 of 20 sources; the remaining 6 require a second pass for: [specific topics]" is a status.&lt;/p&gt;

&lt;p&gt;The difference is whether you know what you don't know. A system that knows its own gaps is more useful than one that presents itself as complete.&lt;/p&gt;
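
&lt;p&gt;What an explicit status could look like as a small helper rather than a sentence; the fields and wording are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def coverage_status(read, total, pending_topics):
    """The gap is part of the report, not hidden by it."""
    status = f"Read {read} of {total} sources."
    if pending_topics:
        remaining = total - read
        topics = ", ".join(pending_topics)
        status += f" The remaining {remaining} require a second pass for: {topics}."
    return status

print(coverage_status(14, 20, ["billing edge cases", "legacy auth flow"]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;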




&lt;h2&gt;
  
  
  The linter vs the method
&lt;/h2&gt;

&lt;p&gt;A linter checks form: file exists, link not broken, syntax valid. You can have a perfect linter score and a completely non-deterministic knowledge system. All files present, all links valid, all formats correct — and two agents still build different models.&lt;/p&gt;

&lt;p&gt;Determinism is semantic. It's checked by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running the hallucination trap catalog on every file&lt;/li&gt;
&lt;li&gt;Marking every COLLAPSE — no exceptions, no obvious choices&lt;/li&gt;
&lt;li&gt;Verifying symmetric connections&lt;/li&gt;
&lt;li&gt;Auditing layer distribution&lt;/li&gt;
&lt;li&gt;Running the three tests above&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a one-time setup. It's a recurring check — every time you add significant content, every time you cross a domain boundary, every time you connect two previously separate knowledge areas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The linter checks form. This method checks meaning.&lt;/strong&gt;&lt;/p&gt;
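
&lt;p&gt;A sketch of what a recurring runner for these checks could look like; the two checks included are trivial stand-ins for the real audits listed above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;SEMANTIC_CHECKS = []

def semantic_check(fn):
    """Register a check so it runs on every significant change."""
    SEMANTIC_CHECKS.append(fn)
    return fn

@semantic_check
def unmarked_contradictions(kb):
    """Files that mention a contradiction without a COLLAPSE marker."""
    return [name for name, text in kb.items()
            if "contradicts" in text.lower() and "[COLLAPSE" not in text]

@semantic_check
def empty_files(kb):
    return [name for name, text in kb.items() if not text.strip()]

def run_all(kb):
    """kb: mapping of file name to file text. Returns findings per check."""
    return {check.__name__: check(kb) for check in SEMANTIC_CHECKS}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;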




&lt;h2&gt;
  
  
  Where we go from here
&lt;/h2&gt;

&lt;p&gt;The series has covered the full method:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1&lt;/strong&gt; — five core principles: write everything explicitly, use layers, catalog hallucination traps, mark collapses, be the gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2&lt;/strong&gt; — agent specialization as protection against iatrogenics, and the failure patterns that look like work but aren't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3&lt;/strong&gt; (this post) — determinism as the third requirement, its three sources, and how to test for it.&lt;/p&gt;

&lt;p&gt;Next: we leave method and go into domain. Real failure modes, real examples — starting with the one that caught OpenAI off guard: confabulation versus hallucination, and why the distinction changes how you build.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Method developed from a real working system. The principle works with any stack, any tools, any domain.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;References: Zhang et al. arXiv:2510.04618 (ACE, 2025), Luhmann (1981).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>architecture</category>
      <category>knowledge</category>
    </item>
    <item>
      <title>Your Knowledge, Your Model — Part 2: Agents, Iatrogenics</title>
      <dc:creator>Tim Maximov</dc:creator>
      <pubDate>Tue, 31 Mar 2026 10:38:16 +0000</pubDate>
      <link>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-part-2-agents-iatrogenics-and-determinism-11ii</link>
      <guid>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-part-2-agents-iatrogenics-and-determinism-11ii</guid>
      <description>&lt;p&gt;In Part 1, I described the problem and five principles: write everything explicitly, use layers, catalog hallucination traps, mark silent collapses, stay the gateway.&lt;/p&gt;

&lt;p&gt;But I left out three things deliberately — they needed more room.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to build agents that don't break each other.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;What not to do when building the system.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;How to know the system actually works.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's this post.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why one smart agent is the wrong architecture
&lt;/h2&gt;

&lt;p&gt;The obvious setup: one powerful agent that reads everything, understands everything, fixes everything. It's flexible, it adapts, it "gets the context."&lt;/p&gt;

&lt;p&gt;The problem: one agent is one point of iatrogenesis.&lt;/p&gt;

&lt;p&gt;Iatrogenesis is a medical term — when the treatment causes another disease. The doctor fixes the knee, damages the nerve. The surgeon removes the tumor, introduces infection. In medicine it's a known, studied risk. In information systems almost nobody names it.&lt;/p&gt;

&lt;p&gt;In agent systems it looks like this: the agent fixes a contradiction in file A, and in doing so creates a new one in file B. Or it summarizes a long document and loses a nuance that was critical three layers down. Or it rewrites a section "for clarity" and subtly shifts the meaning — consistently with its own model, not yours.&lt;/p&gt;

&lt;p&gt;The fix is the same as in surgery: specialization. A surgeon who only does knees doesn't touch nerves. An agent that only reads and builds a model can't accidentally break anything — it has no write access. An agent that only adds content never rewrites existing text — by design.&lt;/p&gt;

&lt;p&gt;Each role, one responsibility, one explicit constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reader       →  extracts understanding. builds a model. never edits.
Verifier     →  runs the hallucination checklist. emits COLLAPSE markers.
Surgeon      →  reads the report. makes targeted edits. never rewrites.
Mirror       →  checks symmetry. is every connection described from both sides?
Filler       →  only adds. never touches existing content.
Auditor      →  checks distribution. where does info live on the wrong layer?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't product names — they're roles. Your implementation might have three of them or fifteen. The principle is the same: every constraint is a removed degree of freedom for error.&lt;/p&gt;

&lt;p&gt;And the number of agents is unlimited. When a new domain appears, you add a role for it. The pipeline scales horizontally without changing the core.&lt;/p&gt;
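
&lt;p&gt;One way to express "every constraint is a removed degree of freedom" in code, sketched with invented classes: each role is handed only the capabilities it needs, so the forbidden operation is not merely discouraged but absent from the role's interface.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class Store:
    def __init__(self):
        self.files = {}
    def read(self, name):
        return self.files.get(name, "")
    def append(self, name, text):
        self.files[name] = self.read(name) + text
    def overwrite(self, name, text):
        self.files[name] = text

class Reader:
    """Builds a model. Never edits: it receives only the read capability."""
    def __init__(self, store):
        self.read = store.read

class Filler:
    """Only adds. Never rewrites: overwrite is not in its interface."""
    def __init__(self, store):
        self.read = store.read
        self.append = store.append
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;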




&lt;h2&gt;
  
  
  Iatrogenics: the patterns that look like work but aren't
&lt;/h2&gt;

&lt;p&gt;Beyond multi-agent design, there are recurring failure modes that appear when building any knowledge system with LLMs. They all share the same shape: they feel productive, they produce output, and they quietly degrade the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The main stuff is covered."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is rationalization of incomplete reading. There are no unimportant parts of a knowledge system. An unread source isn't "background context" — it's a blind spot. The only honest alternative: flag it explicitly. "Read 12 of 20 sources. The remaining 8 require a second pass covering: [list]." Incomplete analysis should not be passed off as analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Exactly N items in the system."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hardcoding counts is a pressure artifact. The agent is implicitly told: fit reality into this number. So it does. It picks the five most prominent issues and ignores three others that are equally real. The count should come from the content, not the prompt. Always.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Top 5 problems."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same mechanism, worse outcome. Limiting the output count is not brevity — it's information loss. If there are eight problems, there are eight problems. The agent that returns five has made an editorial decision on your behalf, without flagging it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Describe it briefly."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Volume should follow content, not instructions to be concise. Nuance lives in the details that get cut first. If a brief description is genuinely sufficient — it will be brief naturally. Forcing brevity on a complex topic produces a description that sounds complete and isn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rigid output format.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the subtlest one. If your prompt requires "five sections" — you will always get five sections. Even when the content calls for three or eight. The agent optimizes for format compliance, not for accuracy. Format should follow content, which is the reverse of what most prompts enforce.&lt;/p&gt;

&lt;p&gt;The ACE paper (Zhang et al., arXiv:2510.04618, 2025) describes the same failure at the architectural level and calls it &lt;em&gt;brevity bias&lt;/em&gt; — the system optimizes for shorter outputs because shorter is what the format rewards. The same dynamic happens in knowledge systems when you constrain the output shape before you understand the content shape.&lt;/p&gt;




&lt;h2&gt;
  
  
  The series so far — and where it goes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 1&lt;/strong&gt; — the problem and five principles: write everything, use layers, catalog hallucination traps, mark collapses, be the gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2&lt;/strong&gt; (this post) — agents and specialization, iatrogenics patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3&lt;/strong&gt; — determinism as an engineering requirement: what it means, how to test for it, and what "done" actually looks like.&lt;/p&gt;

&lt;p&gt;After that: real failure modes from real domains — confabulation vs hallucination, static automation, and why your tech stack is a consequence not a decision.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Method developed from a real working system. The tools — Markdown, Obsidian, open-source Copilot — are one implementation. The principle works with any stack.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;References: Zhang et al. arXiv:2510.04618 (ACE, 2025), Li et al. arXiv:2507.03724 (MemOS, 2025), Luhmann (1981).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Your Knowledge, Your Model — Part 1: A Method for Deterministic Knowledge Externalization</title>
      <dc:creator>Tim Maximov</dc:creator>
      <pubDate>Tue, 31 Mar 2026 10:25:02 +0000</pubDate>
      <link>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-a-method-for-deterministic-knowledge-externalization-3og9</link>
      <guid>https://forem.com/macsart_ai_by_tim/your-knowledge-your-model-a-method-for-deterministic-knowledge-externalization-3og9</guid>
      <description>&lt;p&gt;Knowledge in your head is not knowledge. It's yours — right now, in this context, while the project is active. The moment you switch domains, add a new project, or let a few months pass — it starts to degrade. Not because your memory is bad. Because volume exceeds what a human brain handles linearly and reliably.&lt;/p&gt;

&lt;p&gt;People have tried to solve this forever: index cards, Zettelkasten, GTD, wikis, Second Brain. Each one — an attempt to externalize thinking without losing its structure.&lt;/p&gt;

&lt;p&gt;But LLMs changed the game. Now you need to externalize knowledge so a &lt;em&gt;machine&lt;/em&gt; can read it. And read it correctly. Without hallucinating. Without silently choosing between two contradictory versions of the same fact.&lt;/p&gt;

&lt;p&gt;This is a post about a method. Not a tool. Tools are a variable — Markdown, Obsidian, Notion, plain text files, whatever works for you. The principle is the constant.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why RAG and NotebookLM don't solve this
&lt;/h2&gt;

&lt;p&gt;Both solve a search problem. RAG — vector similarity over chunks. NotebookLM — reads 300 files, builds its own understanding, answers questions.&lt;/p&gt;

&lt;p&gt;The key word is &lt;em&gt;its own&lt;/em&gt;. NotebookLM builds its model of your system — not yours. If your mental model differs from what's "obvious" to the LLM — you won't know. It will answer confidently, fluently, plausibly. And incorrectly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; — solves: finding relevant chunks · doesn't solve: authorial consistency, your structure&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NotebookLM&lt;/strong&gt; — solves: summarization and Q&amp;amp;A · doesn't solve: preserving your interpretation, controlling the output&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This method&lt;/strong&gt; — solves: externalizing your model without distortion · doesn't solve: search; it also requires structural work&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG solves retrieval. This method solves a different problem — preserving authorial epistemology when transferring knowledge to a machine.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Principle 1: If it's not written, it doesn't exist
&lt;/h2&gt;

&lt;p&gt;LLMs don't infer. Don't fill in gaps from context. Don't reconstruct what isn't there.&lt;/p&gt;

&lt;p&gt;Technically — they do fill gaps, but from their own weights, not your logic. If a rule exists only in your head — for the system it doesn't exist. If a critical detail is in one file out of twenty — for the agents reading the other nineteen, it's not there.&lt;/p&gt;

&lt;p&gt;This is not a bug. It's their nature. And you have to work with it architecturally, not with prompts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 2: Layers, not a pile
&lt;/h2&gt;

&lt;p&gt;The most common disease of any knowledge system — gravity toward the entry point. Information accumulates where it's first opened. In wikis — the main page. In Notion — the first dashboard block. In a personal knowledge base — the most-visited note.&lt;/p&gt;

&lt;p&gt;The fix: an explicit layer pyramid. Each layer &lt;em&gt;expands&lt;/em&gt; the one above — never repeats it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1 — Navigation
  One paragraph per topic + a link down.
  Rule: if something takes more than 7 lines — it belongs in the next layer.

Layer 2 — Meaning
  Why, for whom, by what rules. No technical details.
  Readable by someone without specialized knowledge.

Layer 3 — Structure
  Components, architecture, interaction map.
  All specifications grow from here.

Layer 4 — Scenarios
  Step-by-step flows with real data.
  Read like a test case: trigger → step 1 → step 2 → outcome.

Layer 5 — Specs
  Exact fields, types, formats. Facts only —
  explanations already live above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If an agent reads only the top layer — it builds a shallow model with 100% confidence. Explicit layers tell the agent where to go for depth. And if a lower layer is empty — that's not "details not needed", it's "details not written". The difference matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 3: Hallucination traps are predictable
&lt;/h2&gt;

&lt;p&gt;Hallucination is a predictable response to ambiguity in text. Not random. Not a specific model's bug. A structural inevitability given certain writing patterns.&lt;/p&gt;

&lt;p&gt;Which means: you can build a catalog of patterns that &lt;em&gt;deterministically&lt;/em&gt; produce wrong output when read by an LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Floating pronoun&lt;/strong&gt; — critical. "The system receives a request. It passes it for processing. Then the component checks permissions." With two or more subjects, the LLM picks by its weights. Hallucination guaranteed. Fix: explicit names everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heading contradicts body&lt;/strong&gt; — critical. Headings carry more weight in transformers. If a section is titled "Synchronous Processing" but the body describes a queue — the LLM takes the heading. A human reader would notice. The LLM won't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Undefined modal verb&lt;/strong&gt; — high risk. "The component &lt;em&gt;may&lt;/em&gt; call the external service." When? Under what condition? Fix: always state the condition — "only if X", "when Y".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confabulation — fact without source&lt;/strong&gt; — critical, the sneakiest. An agent states something specific about the system — plausible, consistent with context, but nowhere written. Unlike hallucination, confabulation sounds convincing. You can't catch it without checking: is this fact actually in the source?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passive voice without an actor&lt;/strong&gt; — high risk. "The message is normalized and forwarded for processing." Who normalizes? Who forwards? The LLM will decide. In a complex system — almost certainly wrong.&lt;/p&gt;

&lt;p&gt;Run this catalog as a checklist on every file. It's not style editing. It's engineering verification: is this text deterministic when read by an LLM?&lt;/p&gt;
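
&lt;p&gt;A sketch of what running the catalog could look like; the two regex heuristics below are invented stand-ins for a real, richer catalog:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Crude pattern heuristics, for illustration only.
TRAPS = {
    "undefined modal verb":
        re.compile(r"\bmay\b(?!.*\b(only if|when)\b)", re.IGNORECASE),
    "passive without actor":
        re.compile(r"\bis (normalized|forwarded|processed|sent)\b(?!.*\bby\b)",
                   re.IGNORECASE),
}

def audit_file(text):
    """Return (line number, trap name, line) for every suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for trap, pattern in TRAPS.items():
            if pattern.search(line):
                findings.append((number, trap, line.strip()))
    return findings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;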




&lt;h2&gt;
  
  
  Principle 4: Make the silent choice visible
&lt;/h2&gt;

&lt;p&gt;When an LLM encounters two different descriptions of the same fact — it doesn't stop. Doesn't flag the contradiction. It silently collapses: picks one version and continues with 100% confidence. No trace. No marker.&lt;/p&gt;

&lt;p&gt;Why is this worse than regular hallucination? Because the collapsed version is &lt;em&gt;real&lt;/em&gt;. It came from one of your sources. If you check "is this in my notes?" — the answer is yes. Just not the right version.&lt;/p&gt;

&lt;p&gt;Researchers at Stanford and UC Berkeley named this in the ACE paper (Zhang et al., arXiv:2510.04618, 2025): &lt;em&gt;context collapse&lt;/em&gt; — when iterative context rewriting erases accumulated detail. They measured it: at step 60 the context held 18,282 tokens at 66.7% accuracy, then collapsed to 122 tokens — and accuracy dropped to 57.1%, below baseline.&lt;/p&gt;

&lt;p&gt;The fix: make every choice visible with a marker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;COLLAPSE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;CHOSEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;execution&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;time"&lt;/span&gt; &lt;span class="na"&gt;— source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;note from March &lt;/span&gt;&lt;span class="m"&gt;15&lt;/span&gt;
&lt;span class="na"&gt;ALTERNATIVE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;planning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stage"&lt;/span&gt; &lt;span class="na"&gt;— source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;note from February &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;REASON&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;newer source, but contradiction requires human resolution&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four levels by severity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RED&lt;/strong&gt; — changes a fundamental decision. Stop. Don't continue until resolved by a human.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YELLOW&lt;/strong&gt; — implementation details diverge. Continue, mark explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GRAY&lt;/strong&gt; — terminology mismatch. Continue, flag for unification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UNRESOLVED&lt;/strong&gt; — contradiction noticed, source unclear. Minimal marker beats silence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any choice between two versions of one fact = a marker. No exceptions.&lt;/p&gt;
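
&lt;p&gt;A sketch of a gate that enforces that rule mechanically; the marker syntax follows the example above, and the policy mirrors the severity levels:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

MARKER = re.compile(r"\[COLLAPSE:(RED|YELLOW|GRAY|UNRESOLVED)\]")

def collapse_gate(kb):
    """RED stops the pipeline; other levels are reported and passed on.
    kb: mapping of file name to file text."""
    reds, others = [], []
    for name, text in kb.items():
        for level in MARKER.findall(text):
            (reds if level == "RED" else others).append((name, level))
    if reds:
        raise RuntimeError(f"human resolution required first: {reds}")
    return others
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;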




&lt;h2&gt;
  
  
  Principle 5: You are the API gateway
&lt;/h2&gt;

&lt;p&gt;There's a temptation to think of this as automation. "Set up agents, they run, I step back."&lt;/p&gt;

&lt;p&gt;That's the wrong frame.&lt;/p&gt;

&lt;p&gt;The value is not that agents work without you. The value is that &lt;em&gt;you control&lt;/em&gt; what gets passed, to whom, in what format, at what level of detail. You decide what understanding the next agent needs. You decide the depth.&lt;/p&gt;

&lt;p&gt;In software architecture there's the API gateway role: not automation, but control over flow. The backend stores data. The gateway decides what to expose and how. You're the gateway between your externalized model of the world and whoever works with it next.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is not a tool for those who want to delegate thinking. It's for those who want to scale it — while remaining the author.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Where this sits in the landscape
&lt;/h2&gt;

&lt;p&gt;Three active research directions work nearby — none occupies the same point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context engineering&lt;/strong&gt; became a named discipline in 2025. Andrej Karpathy defined it as "the delicate art and science of filling the context window with exactly the right information." But it answers &lt;em&gt;how to present&lt;/em&gt; information to a model — not &lt;em&gt;how to organize&lt;/em&gt; it so your mental model survives the transfer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent memory management&lt;/strong&gt; — ACE (Zhang et al., 2025) and MemOS (Li et al., arXiv:2507.03724, 2025) — work at the infrastructure layer: incremental updates, memory versioning, full lifecycle. This method works one level above — in organizing the knowledge itself before it reaches any agent infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PKM + AI&lt;/strong&gt; — the Obsidian/Notion community. Their key insight: instead of going to the AI, put the AI inside your system. Context lives in files, not in model memory. Close in spirit — but stops at integration. No one goes as far as determinism, a hallucination trap catalog, and an explicit protocol for flagging contradictions.&lt;/p&gt;

&lt;p&gt;The unfilled point is at the intersection of all three. Not context engineering. Not agent memory. Not PKM+AI.&lt;/p&gt;

&lt;p&gt;It's about organizing knowledge so that any agent, reading it in any order, reproduces &lt;em&gt;your&lt;/em&gt; model. Not the averaged one. Not the "obvious" one. Yours.&lt;/p&gt;




&lt;h2&gt;
  
  
  This is not a product
&lt;/h2&gt;

&lt;p&gt;It's a method. A set of principles from which everyone builds their own implementation — for their domain, their volume, their tools, their working style.&lt;/p&gt;

&lt;p&gt;Luhmann built his Zettelkasten for sociology — and everyone's is their own. Forte built his Second Brain for productivity — and everyone adapts it differently. This method is built for working with LLMs — and your implementation will be yours.&lt;/p&gt;

&lt;p&gt;The common thread: information externalized so the authorial model stays authorial. Not simplified by the tool. Not completed by the model. Not silently collapsed at the first contradiction.&lt;/p&gt;

&lt;p&gt;That's the whole principle. One answer. Infinitely many implementations.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Method developed from a real working system. Tools mentioned — Markdown, Obsidian, open-source Copilot — are not required. The principle works with any stack.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;References: Luhmann (1981), Zhang et al. arXiv:2510.04618, Li et al. arXiv:2507.03724, Forte (2022).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
