SDD doesn’t solve AI’s blind spots: it just rearranges them

Renato de Matos — Sat, 16 May 2026 19:05:13 +0000

I adopted Spec-Driven Development (SDD) expecting that a more rigorous process would guarantee the result I was looking for. Detailed specs, acceptance criteria, a well-configured harness. In part, it worked. At line-by-line implementation, the model does well when the tooling is sharp. But in some cases the model made assumptions: about the domain, about the shape of the data, about what the interface needed.

Seen from the initial angle, those assumptions looked correct. They made sense, they got approved, they passed the criteria. The problem is that they carried ambiguities that only surfaced in the final result. SDD didn't make those blind spots disappear; it just distributed them, neatly, across well-documented tasks. Each ambiguity was still right there, intact.

And this is where the cost lives. When the bottleneck only shows up at the end, after the spec, subspecs, and tasks have already been written, the damage isn't limited to reworking code. It means rethinking the entire feature, rewriting artifacts, and spending tokens all over again on the whole planning cycle.

So I learned a lesson: SDD and a good harness aren't enough. What was missing was examining the project's layers and spotting the bottlenecks before the whole apparatus was assembled, not after.

What is a blind spot?

It's worth clarifying which kind of blind spot I mean, because it isn't the kind of problem the tools already solve. Tools like Claude Code already handle explicit ambiguity well: there are harnesses that stop and ask, and that plan before they execute. I test these tools, and they help. But the blind spot I'm talking about here is a different one. It isn't an unanswered question; it's the inconsistency that arises when each part of the project evolves from its own assumptions and those separate evolutions, once you see them working together in practice, start to conflict with one another.

The interface evolved assuming one thing about the requirements. The domain model evolved assuming another about the same requirements. The backend evolved assuming a third. Each one, in isolation, makes sense. It's at the meeting point that the conflict appears, but by then it's already too late.

I would specify a feature end to end, with UI and UX, domain rules, and the backend layer all described in detailed specs, and the result would come back with domain invariants violated: not in explicit code, but in side effects and in inconsistencies where the pieces connected. The method had imposed order on the process, but the errors kept originating in the same place. I couldn't blame the tool or the spec: the problem wasn't the quality of the generated code, but the granularity of what I expected from a single, complete SDD flow.

The reason a feature spec is so dense with assumptions is simple: a feature is not an atomic unit of work. It couples at least three decisions of different kinds:

the presentation layer: navigation, visual hierarchy, interaction states;
the domain model: entities, invariants, relationship cardinality, the rules that guarantee data consistency;
the backend: the API and its contract, the mechanism, persistence, the infrastructure that holds everything up.

These are three modes of reasoning competing for the same attention budget in the model. Each boundary between them, each undeclared coupling, is a blind spot waiting to be filled by an assumption. And in a full-stack feature, all three are always in play at the same time.

Why SDD reorganizes but doesn't solve

Here's what is often said about SDD, and it's true: since AI works better with small tasks and short contexts, SDD helps turn requirements into small, executable tasks. That part does work. But day to day, it alone isn't as effective as it seems.

Decomposing requirements into executable tasks solves half the problem. The other half is using AI in a specialized way at each layer of the project, and that's what took me a while to understand. The AI tooling ecosystem offers strong support for building UI, dedicated resources for documentation and data modeling, and specific support for the backend layer. Each layer has its best instrument.

The problem shows up when you bundle everything into a single request. When you say "run an SDD flow for this UI, following this architecture, aware of this modeling, decompose it into tasks and use subagents," the work does end up organized. But the three layers stay coupled in the same contexts, and the model doesn't use the best instrument for each one: it averages across the three. The spec gets tidier, and that's all; the coupling between the layers stays exactly where it was. Instead of drawing out the best of each layer, the AI's potential gets diluted in the search for common ground between them.

This is where the step that gives my SDD workflow its name comes in: the confrontation. You compare where each layer ended up and observe where they converge and where they diverge. It's this confrontation that brings the blind spot to the surface while it's still cheap to fix.

The thread to follow is this: organizing the work into tasks only spread the blind spot across those tasks. To truly neutralize it, each decision needs to be resolved in its own scope, within its own constraints, drawing on the best tooling available for its type.

The turning point: explore each layer, then confront them

None of this is my invention. Separating presentation, domain, and data is the principle behind layered architecture, separation of concerns, and Domain-Driven Design, and has been for years. What changes in AI-assisted development is that this old discipline stops being just a good architectural practice and becomes a prompting strategy, applied to the process of instructing the model and not only to the code that comes out of it.

1. I design the interface. This is one of the most counterintuitive steps. I build the UI flow fed by sample data: placeholder values and text, just enough for the interface to render. This is deliberate: at this point a stable data model doesn't exist yet; it will be built over the course of the iterations with the other layers. The goal is to derive the interface from the client's requirements, not from a legacy data structure or one imagined out of assumptions and biases. Refactoring the UI is part of the plan. It starts out purely visual, validating the client's expectations against the requirements; later it's confronted with the constraints of the other layers and, over the iterations, gains logic, structure, organization, and modularization.

2. I model the data. Here I flip the focus and set the interface aside. I model the domain on its own: I define entities and attributes, relationship cardinality, and the invariants (the rules that must remain true regardless of any screen). An order doesn't allow a negative total; a booking can't end before it begins. I don't think about which field appears where in the UI, only about the internal consistency of the model. The result is a description of the domain that stands on its own: no UI, no backend, just the model.

3. I build the comparison plan. With the two prototypes ready, I confront one against the other and map the gaps in both directions. On one side, where the UI demands something the model doesn't support: a screen that shows "average response time" while the model never persisted the timestamps needed to derive that metric. On the other, where the model exposes something nobody consumes: entities with no point of use, a symptom of a missing use case or an oversized model. This step is the explicit act of making the blind spot visible. Trade-offs that would otherwise stay implicit become named, documented items, instead of ambushing the implementation later, once it has hardened.

4. I refactor the layers toward convergence. With the divergences mapped, I rewrite each layer toward the point of convergence. In the traditional flow, this adjustment would be the "cleanup" at the end of the task, a leftover; here it's a first-class step, a design decision that heads off emergent complexity as early as possible. If a layer rests on premises far from what the others allow, this step may require a substantial rewrite. But that's exactly why step 3 exists: the earlier the confrontation happens, the lower the cost of the fix.

5. I validate the behaviors. I implement mocks that reproduce the contract and behavior of the real API: endpoints, schemas, status codes, error and loading states. This is only viable after step 4, because a mock presupposes a contract to imitate, and that contract only stabilized after the comparison. With the mocks in place, I validate complete flows, including error cases and intermediate states, without being blocked on the backend, iterating in short cycles. I iterate with AI here too, validating the whole scope. The idea is to reach the backend with as few open dependencies as possible.

6. I implement the backend. Now comes the implementation of the APIs and the infrastructure, anchored in a validated data model and a UI with stabilized requirements. The contract already exists and has already been exercised by the mocks, so the backend doesn't have to guess anything: it has a precise target to chase. All the ingenuity of the implementation can then be invested where it truly matters, in tailored solutions for a problem that's already well defined, instead of being spent dealing with still-uncertain requirements.

7. I do the E2E integration. Finally, I connect the layers and validate the flow end to end. Since each layer has already been verified in isolation, the defects that show up here tend to be genuine integration defects, not business rules forgotten back at the start.

The process is iterative: a first pass through all the layers, then a v2, then a v3. Each cycle is narrower than the last.
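
A minimal sketch of step 1 in Python, used here purely for illustration and assuming a hypothetical ticket dashboard: the screen is typed by its own view model (`TicketCard`, a hypothetical name) and fed by placeholder values, so nothing in it depends on a data model that doesn't exist yet. The fields and `render_dashboard` are assumptions, not part of the original workflow.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical view model: shaped by what the screen must show,
# not by any data model (none exists yet at this step).
@dataclass(frozen=True)
class TicketCard:
    title: str
    status_label: str
    avg_response: str  # display string only; how it is derived is still open


# Placeholder data, just enough for the interface to render.
PLACEHOLDER_CARDS: List[TicketCard] = [
    TicketCard(title="Lorem ipsum dolor", status_label="Open", avg_response="-- min"),
    TicketCard(title="Sit amet consectetur", status_label="Closed", avg_response="-- min"),
]


def render_dashboard(cards: List[TicketCard]) -> str:
    """Render a text stand-in for the screen, including its empty state."""
    if not cards:
        return "No tickets yet"
    return "\n".join(
        f"{card.title} [{card.status_label}] avg reply: {card.avg_response}"
        for card in cards
    )
```

The `avg_response` placeholder is exactly the kind of field that the later comparison step has to check against the model.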
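
Step 2's invariants can be enforced inside the model itself, independent of any screen. A minimal Python sketch, assuming frozen dataclasses and a hypothetical `InvariantViolation` exception; the two rules are the ones from the text (an order can't have a negative total, a booking can't end before it begins), while the `subtotal`/`discount` fields are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


class InvariantViolation(ValueError):
    """Raised when a rule that must hold regardless of any screen is broken."""


@dataclass(frozen=True)
class Order:
    subtotal: Decimal
    discount: Decimal = Decimal("0")  # hypothetical field that could push the total below zero

    @property
    def total(self) -> Decimal:
        return self.subtotal - self.discount

    def __post_init__(self) -> None:
        # Invariant: an order doesn't allow a negative total.
        if self.total < 0:
            raise InvariantViolation("an order cannot have a negative total")


@dataclass(frozen=True)
class Booking:
    starts_at: datetime
    ends_at: datetime

    def __post_init__(self) -> None:
        # Invariant: a booking can't end before it begins.
        if self.ends_at < self.starts_at:
            raise InvariantViolation("a booking cannot end before it begins")
```

Because the checks run at construction time, an invalid object can never exist, no matter which screen or endpoint tries to create it.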
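
The comparison plan in step 3 can be made partly mechanical once each layer declares what it needs and what it offers. A Python sketch with hypothetical field names: `compare_layers` (also hypothetical) reports gaps in both directions, UI needs the model can't support (directly or through a listed derivation) and model fields nothing consumes, using the text's "average response time" example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class GapReport:
    # UI need -> model fields it requires but the model does not have
    unsupported: Dict[str, FrozenSet[str]]
    # model fields that no screen consumes, directly or through a derivation
    unused: FrozenSet[str]


def compare_layers(
    ui_needs: Set[str],
    model_fields: Set[str],
    derivations: Dict[str, Set[str]],
) -> GapReport:
    """Map gaps in both directions between a UI prototype and a domain model.

    A UI need is either a model field itself or a derived value listed in
    `derivations` together with the model fields it would be computed from.
    """
    unsupported: Dict[str, FrozenSet[str]] = {}
    consumed: Set[str] = set()
    for need in ui_needs:
        sources = derivations.get(need, {need})
        missing = sources - model_fields
        if missing:
            unsupported[need] = frozenset(missing)
        consumed |= sources & model_fields
    return GapReport(unsupported=unsupported, unused=frozenset(model_fields - consumed))


report = compare_layers(
    ui_needs={"ticket.title", "ticket.status", "avg_response_time"},
    model_fields={"ticket.title", "ticket.status", "ticket.opened_at", "ticket.internal_score"},
    derivations={"avg_response_time": {"ticket.opened_at", "ticket.first_reply_at"}},
)
```

In this example, `report.unsupported` flags that `avg_response_time` lacks `ticket.first_reply_at`, and `report.unused` flags `ticket.internal_score` as a field no screen consumes. A script only catches gaps that were declared; the judgment about what each gap means stays with you.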
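
One possible outcome of step 4 for the "average response time" gap described in step 3: the model gains the missing timestamp, and the UI's metric is derived from it instead of being shown as a placeholder. A Python sketch; `Ticket`, `first_reply_at`, and `dashboard_summary` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Ticket:
    title: str
    status: str
    opened_at: datetime
    # Hypothetical field added to the model after the comparison step,
    # because the UI's "average response time" could not be derived without it.
    first_reply_at: Optional[datetime] = None


def average_response_minutes(tickets: List[Ticket]) -> Optional[float]:
    """Average wait until the first reply, ignoring tickets not yet answered."""
    waits = [
        (t.first_reply_at - t.opened_at).total_seconds() / 60
        for t in tickets
        if t.first_reply_at is not None
    ]
    return sum(waits) / len(waits) if waits else None


def dashboard_summary(tickets: List[Ticket]) -> Dict[str, str]:
    """The converged boundary between model and UI: display strings only."""
    avg = average_response_minutes(tickets)
    return {
        "open_count": str(sum(1 for t in tickets if t.status == "open")),
        "avg_response": "--" if avg is None else f"{avg:.0f} min",
    }
```

The UI keeps consuming plain display strings, while the rule for computing them now lives next to the model instead of being guessed by the screen.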
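
Step 5's mocks can be as simple as an in-process fake that honors the contract's status codes, which is enough to drive the screen through its loading, success, not-found, and error states. A minimal Python sketch; `MockOrdersApi`, the `GET /orders/{id}` shape, and the state names are assumptions, not a specific framework's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Response:
    status: int
    body: Dict[str, Any]


class MockOrdersApi:
    """Hypothetical in-process mock of a `GET /orders/{id}` contract.

    `fail_with` forces a status code so error states can be exercised
    without a backend.
    """

    def __init__(self, orders: Dict[str, Dict[str, Any]], fail_with: Optional[int] = None) -> None:
        self._orders = orders
        self._fail_with = fail_with

    def get_order(self, order_id: str) -> Response:
        if self._fail_with is not None:
            return Response(self._fail_with, {"error": "simulated failure"})
        order = self._orders.get(order_id)
        if order is None:
            return Response(404, {"error": "order not found"})
        return Response(200, order)


def load_order_view(api: MockOrdersApi, order_id: str) -> List[str]:
    """Return the sequence of UI states the order screen passes through."""
    states = ["loading"]
    response = api.get_order(order_id)
    if response.status == 200:
        states.append("ready")
    elif response.status == 404:
        states.append("not_found")
    else:
        states.append("error")
    return states
```

For example, `load_order_view(MockOrdersApi({}, fail_with=503), "o-1")` walks the screen from `loading` to `error` without any server running.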
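
For step 6, one way to keep the backend anchored to the contract the mocks already exercised is to run the same contract checks against both implementations. A Python sketch under stated assumptions: `OrdersApi` is a hypothetical `typing.Protocol`, the mock is in memory, the "real" backend persists to SQLite through the standard-library `sqlite3` module, and the status codes (201, 404, 422) are illustrative choices.

```python
import sqlite3
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, Protocol, Tuple

# Hypothetical response shape: (HTTP-like status code, JSON-like body).
Response = Tuple[int, Dict[str, Any]]


class OrdersApi(Protocol):
    """The contract both the mock and the real backend must honor."""

    def create_order(self, total: str) -> Response: ...

    def get_order(self, order_id: str) -> Response: ...


def parse_total(total: str) -> Decimal:
    """Shared domain rule: a total must be a finite, non-negative decimal."""
    try:
        amount = Decimal(total)
    except InvalidOperation:
        raise ValueError("total must be a decimal") from None
    if not amount.is_finite():
        raise ValueError("total must be finite")
    if amount < 0:
        raise ValueError("an order cannot have a negative total")
    return amount


class InMemoryOrdersApi:
    """The kind of mock used in step 5: no persistence, same contract."""

    def __init__(self) -> None:
        self._orders: Dict[str, Dict[str, str]] = {}

    def create_order(self, total: str) -> Response:
        try:
            amount = parse_total(total)
        except ValueError as exc:
            return 422, {"error": str(exc)}
        order_id = str(len(self._orders) + 1)
        self._orders[order_id] = {"id": order_id, "total": str(amount)}
        return 201, dict(self._orders[order_id])

    def get_order(self, order_id: str) -> Response:
        order = self._orders.get(order_id)
        if order is None:
            return 404, {"error": "order not found"}
        return 200, dict(order)


class SqliteOrdersApi:
    """Hypothetical real backend that persists orders with sqlite3."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total TEXT NOT NULL)"
        )

    def create_order(self, total: str) -> Response:
        try:
            amount = parse_total(total)
        except ValueError as exc:
            return 422, {"error": str(exc)}
        with self._db:  # commits the transaction on success
            cursor = self._db.execute("INSERT INTO orders (total) VALUES (?)", (str(amount),))
        return 201, {"id": str(cursor.lastrowid), "total": str(amount)}

    def get_order(self, order_id: str) -> Response:
        if not order_id.isdigit():
            return 404, {"error": "order not found"}
        row = self._db.execute(
            "SELECT id, total FROM orders WHERE id = ?", (int(order_id),)
        ).fetchone()
        if row is None:
            return 404, {"error": "order not found"}
        return 200, {"id": str(row[0]), "total": row[1]}


def check_contract(api: OrdersApi) -> None:
    """The same checks the mock passed, run against any implementation."""
    status, created = api.create_order("10.00")
    assert status == 201 and set(created) == {"id", "total"}
    status, fetched = api.get_order(created["id"])
    assert status == 200 and fetched["total"] == "10.00"
    status, body = api.get_order("does-not-exist")
    assert status == 404 and "error" in body
    status, body = api.create_order("-1")
    assert status == 422 and "error" in body
```

Calling `check_contract(InMemoryOrdersApi())` and `check_contract(SqliteOrdersApi())` runs one suite against both sides, so the backend is judged against a target that already exists rather than against fresh guesses.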

Why this works so well with AI

The gain isn't just that the model makes fewer mistakes, though that does happen: narrow specs shrink the space of ambiguity in which the model would otherwise fill gaps with assumptions. The central gain is that errors stop being silent.

When each layer is resolved in isolation, it becomes an independent checkpoint. You can inspect the UI and assert that it meets the requirements without that assessment depending on the backend. You can inspect the model and assert that the invariants are consistent without that depending on the screen. The blind spot doesn't disappear by magic, but it's pushed to the surface, to the comparison step, where someone, or the model itself on the next iteration with a smaller context, can detect it before it reaches production.

This seems to contradict a common intuition. If "the more specific the context, the better," why not hand over fully detailed specifications all at once? Because that level of specificity is a result, not a starting point. A feature's bottlenecks don't reveal themselves in the brief; the blind spots emerge when you push each layer to its limit and confront the results. Only after the confrontation do you have context precise enough to make detailed specs worth writing; before that, they would still be a guess, and an expensive one in tokens.

I've been using this approach even with teams at an S&P 500 company, and the result has been consistent: more predictability and less friction than in typical AI-assisted development.

I don't think it's a silver bullet. But, for me, it has permanently changed the way I work with AI in software development.

And you? How have you been structuring AI-assisted development?

Forem: Renato de Matos