Forem: ke yi

The AI Development Life Cycle (AIDLC): Why Your ML Projects Need More Than SDLC

ke yi — Tue, 26 May 2026 07:37:18 +0000

If you've ever shipped a machine learning model to production, you know the feeling. Everything works beautifully in your notebook, the metrics look great in staging, and then... three weeks after deployment, accuracy quietly tanks. Nobody notices until a stakeholder asks why the recommendations got weird.

This is the gap traditional software development practices don't fill. SDLC was built for deterministic systems: code that does the same thing every time. ML systems aren't deterministic; they're statistical. They decay. They drift. They need to be retrained on schedules that have nothing to do with feature releases.

Enter the AI Development Life Cycle (AIDLC).

What AIDLC Actually Is

AIDLC is a structured framework for building, deploying, and maintaining AI systems. It borrows the discipline of SDLC but adds the loops and feedback mechanisms that ML systems actually need.

The core stages look like this:

Problem Framing → Data Engineering → Model Development 
       ↑                                        ↓
       └── Iteration ← Monitoring ← Deployment ← Evaluation

Notice it's a loop, not a line. That's the whole point.

Why SDLC Falls Short for ML

Traditional SDLC assumes:

Requirements can be fully specified upfront
Code behavior is deterministic
"Done" means shipped
Bugs are reproducible

ML breaks all four assumptions:

You often don't know if a problem is solvable until you try
Models produce probabilistic outputs
Shipping is the start of the real work
Bugs may be data issues, not code issues, and may only appear weeks later

A model that achieves 94% accuracy on Tuesday might hit 81% by Friday because user behavior shifted. Your CI/CD pipeline doesn't know that. It thinks everything is fine because the tests pass.

The Seven Stages, Briefly

1. Problem Framing

This is where most ML projects quietly fail. "Build a churn prediction model" isn't a problem statement—it's a wish. You need:

A measurable business outcome
A clear definition of what counts as a positive/negative example
Constraints (latency, cost, interpretability)
A baseline (what does "doing nothing" look like?)

2. Data Engineering

Pipelines, feature stores, labeling workflows, train/validation/test splits that respect time and entity boundaries. If your data engineering is sloppy here, nothing downstream will save you.

# Time-aware splitting matters for production ML
def temporal_split(df, date_col, train_end, val_end):
    train = df[df[date_col] <= train_end]
    val = df[(df[date_col] > train_end) & (df[date_col] <= val_end)]
    test = df[df[date_col] > val_end]
    return train, val, test
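
The function above covers the time boundary. For the entity boundary, one common approach (my assumption here, not something the pipeline above prescribes) is to hash each entity ID so that every row for a given user lands in the same split. The helper name `entity_bucket`, the `user_id` field, and the 80/10/10 ratios are all illustrative.

```python
import hashlib

def entity_bucket(entity_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically map an entity ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100  # stable value in [0, 100)
    if slot < test_pct:
        return "test"
    if slot < test_pct + val_pct:
        return "val"
    return "train"

# Hypothetical rows: both u1 events must end up in the same split.
rows = [
    {"user_id": "u1", "event": "login"},
    {"user_id": "u2", "event": "purchase"},
    {"user_id": "u1", "event": "purchase"},
]
splits = {"train": [], "val": [], "test": []}
for row in rows:
    splits[entity_bucket(row["user_id"])].append(row)
```

Because the assignment depends only on the ID, it stays stable across pipeline runs, which keeps a user seen in training from leaking into the test set later.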

3. Model Development

The fun part. Also the part teams over-invest in. Spend less time tweaking architectures and more time on stages 2, 5, and 6.

4. Evaluation

Beyond accuracy/F1, you need:

Slice-based metrics (does it work for all user segments?)
Calibration analysis
Robustness tests
Business-metric simulation

# Slice evaluation - check performance across segments
for segment in ['new_users', 'power_users', 'enterprise']:
    subset = test_df[test_df['segment'] == segment]
    score = evaluate(model, subset)
    print(f"{segment}: {score:.3f}")
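
For the calibration item in the list above, here is a minimal sketch of expected calibration error for a binary classifier, in plain Python. It assumes `probs` are predicted probabilities of the positive class and `labels` are 0/1; the function name and the ten equal-width bins are my choices, not a fixed standard.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and
    observed positive rate, computed per equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - pos_rate)
    return ece
```

A value near zero means a predicted 0.8 really does come true about 80% of the time, which matters whenever downstream logic thresholds on the score.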

5. Deployment

Containerize, version, expose. Patterns like shadow deployment and canary rollouts matter here. Your model artifact, training data hash, and code commit should all be linked.

model_version: v2.3.1
training_data_hash: a3f9c2...
git_commit: 8b4d1e2
deployed_at: 2024-11-15T10:30:00Z
shadow_traffic: 100%
production_traffic: 0%
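
When you move from shadow traffic to a canary rollout, you need a stable way to decide which requests hit the new model. A minimal sketch, assuming each request carries a string ID (the function name `route_to_canary` and the hashing scheme are illustrative, not taken from any particular serving stack):

```python
import hashlib

def route_to_canary(request_id: str, canary_percent: float) -> bool:
    """Return True if this request should be served by the canary model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 10_000  # stable slot in 0..9999
    return slot < canary_percent * 100  # 10% -> slots 0..999
```

Hashing instead of calling a random number generator means the same request ID always gets the same answer, so a user doesn't bounce between model versions mid-session.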

6. Monitoring

This is where AIDLC really diverges from SDLC. You're not just watching error rates and latency—you're watching:

Data drift: Are inputs distributionally different from training data?
Concept drift: Has the relationship between inputs and outputs changed?
Prediction drift: Are output distributions shifting?
Performance decay: When ground truth becomes available, how is accuracy holding?

from scipy.stats import ks_2samp

def detect_drift(reference, current, threshold=0.05):
    stat, p_value = ks_2samp(reference, current)
    return p_value < threshold  # True = drift detected
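
The KS test above works per feature on raw values. For prediction drift, a Population Stability Index over binned scores is another common choice. This is a rough sketch in plain Python: the equal-width bins taken from the reference range and the `eps` smoothing for empty bins are my assumptions, and teams bin differently in practice.

```python
import math

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI = sum over bins of (cur_share - ref_share) * ln(cur_share / ref_share)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(idx, n_bins - 1))] += 1  # clamp out-of-range values
        # eps keeps the logarithm defined when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but treat that cutoff as a starting point to tune against your own data rather than a guarantee.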

7. Iteration

Retraining isn't an emergency response—it's a scheduled, automated, governed process. The output of monitoring feeds directly into the next iteration cycle.
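
To make "scheduled, automated, governed" concrete, here is one way the monitoring signals could feed a retraining decision. Everything in it is an assumption for illustration: the function name, the 5-point accuracy tolerance, and the 30-day maximum model age.

```python
from datetime import datetime, timedelta

def should_retrain(drift_detected, current_accuracy, baseline_accuracy,
                   last_trained, now, max_accuracy_drop=0.05,
                   max_age=timedelta(days=30)):
    """Return (decision, reason) so the trigger can be logged and audited."""
    if baseline_accuracy - current_accuracy > max_accuracy_drop:
        return True, "performance_decay"
    if drift_detected:
        return True, "drift"
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "healthy"

# The 94% -> 81% drop from earlier would trip the first rule.
decision, reason = should_retrain(
    drift_detected=False,
    current_accuracy=0.81,
    baseline_accuracy=0.94,
    last_trained=datetime(2024, 11, 12),
    now=datetime(2024, 11, 15),
)
```

Returning the reason alongside the decision is the governance part: every retrain has a recorded cause.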

The Tooling Problem

Most teams cobble AIDLC together from a dozen tools: MLflow for tracking, Airflow for orchestration, custom dashboards for monitoring, Slack for alerts, Confluence for documentation that nobody reads. The integration overhead is real, and the gaps between tools are where production incidents live.

This is the space echloe operates in—giving teams a unified methodology and tooling layer for AIDLC so they're not reinventing the wheel for every new model. The methodology piece matters as much as the tooling, honestly. A tool without process discipline just produces problems faster.

What Adoption Actually Looks Like

Teams that formalize AIDLC tend to see meaningful operational improvements—roughly 3x faster time-to-production is a number that gets thrown around, and from what I've seen it's plausible if you're coming from an ad-hoc baseline. But the real win isn't speed; it's that you stop being surprised by your own systems.

Generative Engine Optimization (GEO): What Devs Need to Know About Getting Cited by AI

ke yi — Mon, 25 May 2026 08:27:08 +0000

If you've shipped a product in the last year, you've probably noticed something weird in your analytics: referral traffic from chat.openai.com, perplexity.ai, or gemini.google.com. Sometimes a trickle. Sometimes a surprising amount.

That's not SEO traffic. That's GEO traffic — visits driven by AI engines citing your content in their generated answers.

I've been digging into this for a few months while building marketing flows at echloe, and the mental model is genuinely different from SEO. Worth writing down.

SEO vs GEO: a quick reframe

Classic SEO is a ranking problem:

Goal: rank in the top 10 blue links
Unit of success: position + CTR
Optimization target: a query → a page

GEO is a citation problem:

Goal: be the source the LLM quotes when synthesizing an answer
Unit of success: being mentioned (often with a link) inside a generated response
Optimization target: a topic/entity → a model's training and retrieval pipeline

You're not trying to outrank a competitor. You're trying to be the most useful, most trustworthy chunk of text that an LLM can grab when it builds an answer.

That distinction changes everything about how you write and structure content.

How AI engines actually pick sources

There's no public algorithm doc, but the pattern across ChatGPT Search, Perplexity, Gemini, and Claude looks roughly like:

Query understanding — break the user's question into sub-claims.
Retrieval — pull candidate documents (web search, vector DB, internal index).
Re-ranking — score chunks for relevance + authority.
Synthesis — generate the answer, citing 2–7 sources.

So your content needs to survive three filters: be retrievable, be re-rankable, and be quotable.
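
To build intuition for those filters, here is a toy version of the retrieve, re-rank, cite flow. It is not how any real engine works: term overlap stands in for retrieval, the `authority` weight is a made-up number, and the URLs are placeholders.

```python
def tokenize(text):
    return set(text.lower().split())

def retrieve(query, docs):
    """Keep documents that share at least one term with the query."""
    q = tokenize(query)
    return [d for d in docs if q & tokenize(d["text"])]

def rerank(query, candidates):
    """Score = term overlap times a hypothetical authority weight."""
    q = tokenize(query)
    return sorted(candidates,
                  key=lambda d: len(q & tokenize(d["text"])) * d["authority"],
                  reverse=True)

def cite(query, docs, k=2):
    """Return the URLs of the top-k chunks an answer would cite."""
    return [d["url"] for d in rerank(query, retrieve(query, docs))[:k]]

docs = [
    {"url": "https://example.com/compose", "authority": 1.0,
     "text": "Docker Compose runs multi-container apps on a single host"},
    {"url": "https://example.com/fluff", "authority": 1.0,
     "text": "In today's fast-moving world many developers wonder about tools"},
    {"url": "https://example.com/k8s", "authority": 0.8,
     "text": "Kubernetes orchestrates containers across a cluster"},
]
sources = cite("docker compose single host", docs)
```

Even in this toy, the padded intro never makes it past retrieval, while the direct answer is the only chunk left to cite.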

Tactics that actually move the needle

1. Write in extractable chunks

LLMs love self-contained paragraphs that answer one question completely. The 12-section listicle padded with intro fluff? Useless. A page where each H2 is a clear question and the first 2–3 sentences answer it definitively? Gold.

Bad:

"In today's fast-moving world of containers, many developers wonder about the differences between tools..."

Good:

"Docker Compose runs multi-container apps on a single host. Kubernetes orchestrates containers across a cluster. Use Compose for local dev; use Kubernetes for production scale."

That second version is quotable. An LLM can lift it verbatim.

2. Add structured data — yes, really

Schema.org markup is having a second life. Models trained on Common Crawl ingest it; retrieval systems use it as metadata.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Generative Engine Optimization Explained",
  "author": {
    "@type": "Person",
    "name": "Jane Dev",
    "url": "https://janedev.com"
  },
  "datePublished": "2024-11-15",
  "about": {
    "@type": "Thing",
    "name": "Generative Engine Optimization"
  },
  "citation": [
    "https://arxiv.org/abs/2311.09735"
  ]
}
</script>

The author, citation, and about fields are particularly useful — they help engines verify expertise and topical relevance.
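
If your pages are generated from templates, it's easier to emit this markup from code than to hand-edit it. A small sketch using only the standard library (the function name `tech_article_jsonld` is mine, and the field values mirror the example above):

```python
import json

def tech_article_jsonld(headline, author_name, author_url,
                        date_published, about, citations):
    """Render a schema.org TechArticle as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": date_published,
        "about": {"@type": "Thing", "name": about},
        "citation": list(citations),
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a stray "</script>" inside a field cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

tag = tech_article_jsonld(
    headline="Generative Engine Optimization Explained",
    author_name="Jane Dev",
    author_url="https://janedev.com",
    date_published="2024-11-15",
    about="Generative Engine Optimization",
    citations=["https://arxiv.org/abs/2311.09735"],
)
```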

3. Be present across platforms

This is the big mindset shift. Your domain isn't enough.

LLMs synthesize from:

Wikipedia (huge weight)
Reddit and Stack Overflow
GitHub READMEs and discussions
YouTube transcripts
Substack/Medium/Dev.to (hi 👋)
Industry-specific forums

If your project only exists on yourdomain.com, you're invisible to much of the retrieval surface. A README with clear language, a few thoughtful Reddit answers, a Stack Overflow presence: these compound.

This is part of why we built echloe the way we did: it tracks where your brand gets cited across AI engines and surfaces the gaps in your cross-platform footprint, because manually checking ChatGPT vs Perplexity vs Gemini for "best [your category] tool" gets old fast.

4. Establish entity authority

LLMs think in entities, not keywords. "Stripe" is an entity. "payment processing API" is a topic. The model maps queries about the topic to entities it associates with that topic.

To become an entity the model recognizes:

Get a Wikipedia or Wikidata entry if you legitimately qualify
Use consistent naming everywhere (don't be "Acme", "Acme Inc.", and "Acme.io" across different sites)
Build co-occurrence: get mentioned alongside well-known entities in your space

A quick check — try this prompt in any LLM:

List the top 5 tools for [your category]. 
For each, give a one-sentence description.

If you're not in the list, the model doesn't have a strong entity association for you yet. That's the gap to close.

5. Monitor citations like you monitor errors

You wouldn't ship without observability. Same here. A simple monitoring loop:


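import openai  # reads OPENAI_API_KEY from the environment

queries = [
    "What is the best tool for X?",
    "How do I solve Y problem?",
    "Compare A vs B for use case Z"
]

def check_citations(brand_name, queries):
    """Ask each query once and record whether the brand shows up in the answer."""
    results = []
    for q in queries:
        response = openai.chat.completions.create(
            model="gpt-4o-search-preview",
            messages=[{"role": "user", "content": q}]
        )
        text = response.choices[0].message.content or ""
        # Assumption: a case-insensitive substring match counts as a mention.
        mentioned = brand_name.lower() in text.lower()
        results.append({"query": q, "mentioned": mentioned})
    return results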

AWS AI-DLC: The Agentic Dev Lifecycle That Works Everywhere

ke yi — Tue, 19 May 2026 09:15:32 +0000

AWS AI-DLC: The Agentic Development Lifecycle That Works Across Every IDE

TL;DR: AI-DLC (AI-Driven Development Life Cycle) is AWS's answer to "vibes-based" AI coding. It's a rule-based steering system — not a tool or library — that transforms AI pair-programming from ad-hoc prompting into a structured, three-phase lifecycle (Inception → Construction → Operations). It runs on any AI coding agent that supports rule files: Kiro, Amazon Q, Cursor, Claude Code, Copilot, and more. The rules are the same everywhere; only the file location changes. Your workflow state persists in plain workspace files, so you can switch IDEs mid-project without losing progress.

Key Takeaways

AI-DLC is a methodology delivered as rule files — it works on 7+ coding assistants because it's agent/IDE/model agnostic. The rules file content is identical across platforms; only the path differs (.kiro/steering/, .cursor/rules/, CLAUDE.md, etc.).
The three-phase architecture (Inception, Construction, Operations) is adaptive — simple bug fixes skip most stages, while complex system migrations get full treatment with per-unit design loops.
Reverse Engineering automatically scans existing codebases to build context artifacts that feed every subsequent stage. Without it, AI coding agents duplicate services, break patterns, and ignore existing architecture.
The per-unit construction loop means complex projects get decomposed into parallelizable work packages, each with its own functional design → NFR → code generation → test cycle.
An extension system lets enterprises layer blocking constraints (HIPAA compliance, internal SDK rules, security baselines) that halt the workflow on violations — not just warnings, actual blockers.
Session continuity works via plain workspace files (aidlc-docs/aidlc-state.md). Start in Cursor on Monday, switch to Claude Code on Wednesday — the AI reads the same state file and resumes exactly where you left off.

The Problem This Solves

We've all been there. You open your AI coding agent — Claude Code, Cursor, Copilot, whatever — and type "build a user API." The agent immediately starts writing code. It picks REST (you wanted GraphQL). It uses Express (your team uses Fastify). It creates a new auth service (you already have one). It ignores your company's error code standards.

The fundamental issue isn't the AI's coding ability. It's that nobody told the AI to ask questions first.

AI-DLC fixes this by inserting a structured questioning and planning phase before any code is written. It's the difference between a junior dev who immediately starts typing and a senior architect who says "wait — let me understand the requirements, check the existing system, and propose an approach before we write a single line."

Core Philosophy

Before diving into the mechanics, here's what drives the design:

Principle	What It Means in Practice
Adaptive Execution	Only execute stages that add value; a bug fix doesn't need user stories
Human in the Loop	AI proposes, human approves — every phase has an explicit gate
Methodology First	Agent/IDE/model agnostic; works anywhere rule files can be loaded
Reproducible	Rules are explicit enough that different AI models produce similar outcomes
No Duplication	Single source of truth; generate artifacts rather than maintain copies

The Three-Phase Architecture

AI-DLC structures development into three phases, each answering a different question:

Inception: WHAT to build and WHY. This is where requirements get clarified, existing code gets understood, and work gets planned. Contains seven stages: Workspace Detection, Reverse Engineering, Requirements Analysis, User Stories, Workflow Planning, Application Design, and Units Generation. For a simple bug fix, most of these get skipped. For a new platform, they all fire.

Construction: HOW to build it. This is the per-unit loop where each piece of the system gets designed, coded, and tested independently. Contains: Functional Design, NFR Requirements, NFR Design, Infrastructure Design, Code Generation (plan + execute), and Build & Test.

Operations: How to DEPLOY and RUN. Currently a placeholder in the framework, reserved for future stages covering deployment, monitoring, incident response, and maintenance.

Adaptive Depth

The framework uses three depth levels that scale documentation rigor to problem complexity:

Depth	Trigger	Example
Minimal	Clear, simple request	Bug fix with known root cause
Standard	Normal complexity, some ambiguity	Feature addition with defined scope
Comprehensive	High-risk, multi-stakeholder	System migration, new platform

Key insight: stage selection is binary (execute or skip), but detail within executed stages is adaptive. All mandatory artifacts are always produced; only their depth of content varies.

Deep Dive: Reverse Engineering — Why It Matters

When you ask an AI to "add a payment feature" to an existing 50-file codebase, the AI needs to understand the current architecture before it can make intelligent additions. Without reverse engineering, the AI might create duplicate services, ignore existing patterns, or break established conventions.

When it triggers: Only for "brownfield" projects (existing code detected in workspace). Skipped entirely for greenfield (empty) projects.

What it produces — imagine you have an existing e-commerce backend:

aidlc-docs/inception/reverse-engineering/
├── business-overview.md        ← "This system handles order processing, inventory, and shipping"
├── architecture.md             ← System diagram: API Gateway → Lambda → DynamoDB
├── code-structure.md           ← File inventory: "src/handlers/order.ts handles POST /orders"
├── api-documentation.md        ← "GET /products returns ProductList, POST /orders creates Order"
├── component-inventory.md      ← "3 Lambda functions, 2 DynamoDB tables, 1 S3 bucket"
├── technology-stack.md         ← "TypeScript 5.x, AWS CDK, Jest for testing"
├── dependencies.md             ← "order-service depends on inventory-service via SQS"
└── code-quality-assessment.md  ← "80% test coverage, ESLint configured, no tech debt"

Why this matters: Every subsequent stage (Requirements, Design, Code Generation) loads these artifacts. When the AI generates code later, it knows which files to modify vs. create new, which patterns to follow, which services already exist, and what the dependency graph looks like.

Staleness detection: If you resume a project months later and the codebase has changed significantly, Workspace Detection compares artifact timestamps against the latest code modifications. Stale artifacts trigger a re-run.
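
AI-DLC expresses this check as instructions in the rule files, not as code, but the idea is easy to sketch. The version below is my simplification: it flags artifacts as stale if any source file is newer than the newest artifact, which is stricter than "changed significantly". The function names and the skipped directories are hypothetical.

```python
import os
from pathlib import Path

def latest_mtime(root: Path, skip: set[str]) -> float:
    """Newest modification time of any file under root, ignoring skipped dirs."""
    newest = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip]  # prune in place
        for name in filenames:
            newest = max(newest, (Path(dirpath) / name).stat().st_mtime)
    return newest

def artifacts_are_stale(workspace: Path, docs_dir: str = "aidlc-docs") -> bool:
    """True if reverse-engineering artifacts are missing or older than the code."""
    artifacts = workspace / docs_dir / "inception" / "reverse-engineering"
    if not artifacts.is_dir():
        return True  # nothing generated yet
    code_mtime = latest_mtime(workspace, skip={docs_dir, ".git", "node_modules"})
    artifact_mtime = latest_mtime(artifacts, skip=set())
    return code_mtime > artifact_mtime
```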

Application Design → Units Generation

This is one of the subtler relationships in AI-DLC, and it confused me at first. The two stages sound similar but serve completely different purposes.

Application Design = "What are the building blocks?" It identifies high-level components, their responsibilities, and how they interact. Think of it as drawing the boxes on an architecture whiteboard.

For a "Task Management SaaS" project, Application Design would produce:

## Components Identified:
- TaskService: CRUD operations for tasks, assignment logic
- NotificationService: Email/push notification delivery
- AuthService: User authentication and authorization
- AnalyticsService: Usage tracking and reporting

## Component Interfaces:
- TaskService.createTask(userId, taskData) → Task
- TaskService.assignTask(taskId, assigneeId) → void
- NotificationService.send(userId, template, data) → void

## Dependencies:
- TaskService → NotificationService (triggers on assignment)
- TaskService → AuthService (validates permissions)
- AnalyticsService → TaskService (reads events)

Units Generation = "How do we break this into parallelizable work packages?" It takes the Application Design output and groups it into units of work — logical scopes that can be developed independently.

The relationship is directional: Application Design is about architecture (what exists, how it connects). Units Generation is about development strategy (what to build first, what can parallelize). Simple projects may need Application Design but skip Units Generation (single unit). Complex projects need both.
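
One way to see what "what can parallelize" means: feed the dependency list from the Task Management example into a topological sort and group the results into waves. This is my own illustration, not something AI-DLC runs, and it assumes each component maps to exactly one unit, which real projects won't always do.

```python
from graphlib import TopologicalSorter

# Component -> components it depends on (from the Dependencies list above).
depends_on = {
    "TaskService": {"NotificationService", "AuthService"},
    "AnalyticsService": {"TaskService"},
}

def build_waves(graph):
    """Group nodes into waves; everything in one wave can be built in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = build_waves(depends_on)
# [['AuthService', 'NotificationService'], ['TaskService'], ['AnalyticsService']]
```

The first wave has no upstream dependencies, so those two units could go to separate developers or AI sessions on day one.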

What Is a "Unit of Work"?

A unit of work is a logical grouping of related stories/features that can be designed, coded, and tested as a cohesive package. It is NOT a microservice, a file, or a sprint. It's the smallest chunk of the system that makes sense to hand to one developer (or one AI session) and say "build this completely."

How it maps to architecture types:

Architecture	Unit =
Microservices	One independently deployable service
Monolith	A logical module with clear boundaries
Full-stack feature	Frontend + backend + database for one capability

The Per-Unit Construction Loop

Each unit goes through its own full design-and-build cycle:

Unit 1 → Functional Design → NFR → Code Generation → ✅ Done
Unit 2 → Functional Design → NFR → Code Generation → ✅ Done
Unit 3 → Functional Design → NFR → Code Generation → ✅ Done
...
All Units Done → Build and Test (integration across all units)

NFR — Non-Functional Requirements Explained

NFR = the qualities and constraints of a system that are NOT about what it does, but HOW WELL it does it.

Category	Functional (What)	Non-Functional (How Well)
Performance	"Users can search products"	"Search returns < 200ms at p99"
Security	"Users can log in"	"Passwords hashed with bcrypt, sessions expire in 24h"
Scalability	"System processes orders"	"Handles 10K concurrent orders during peak"
Availability	"Service is accessible"	"99.9% uptime SLA with automatic failover"
Reliability	"Data is stored"	"Zero data loss, RPO < 1 minute"

AI-DLC has two NFR stages per unit:

NFR Requirements Assessment          NFR Design
(WHAT quality attributes needed?)    (HOW to achieve them technically?)

"We need < 200ms response time"  →  "Use Redis caching layer + CDN"
"We need 99.9% uptime"          →  "Multi-AZ deployment + health checks"
"We need PCI DSS compliance"    →  "Encrypt at rest, tokenize card data"

NFR Requirements Assessment asks: What are your scalability expectations? Performance benchmarks? Security/compliance standards? Tech stack preferences?

NFR Design takes those answers and produces concrete architectural patterns, technology selections with justification, and infrastructure requirements.

The Extension System — Enterprise Rules That Actually Block

Most linting and compliance tools generate warnings that developers ignore. AI-DLC extensions are different — they're blocking constraints. If code violates a rule, the workflow halts.

The Two-File Convention

An opt-in extension consists of two files:

extensions/your-category/your-extension/
├── your-extension.md           ← Full rules (loaded ONLY if user opts in)
└── your-extension.opt-in.md    ← Lightweight question (always loaded)

Opt-in file (lightweight, always loaded):

## Opt-In Prompt
Would you like to enable HIPAA compliance rules for this project?
A) Yes — enforce HIPAA data handling rules
B) No — skip HIPAA compliance checks

Rules file (heavy, loaded only on opt-in):

## Rule HIPAA-01: PHI Data Classification
**Rule**: All data models MUST classify fields as PHI/non-PHI...
**Verification**: No model exists without PHI classification annotations...

## Rule HIPAA-02: Minimum Necessary Access
**Rule**: API endpoints MUST implement role-based access to PHI fields...
**Verification**: No endpoint returns PHI without role validation...

Enforcement Behavior

During Code Generation stage:
  AI generates a DynamoDB table without encryption
  ↓
  Extension SECURITY-01 check: "Encryption at rest enabled?" → NO
  ↓
  ⛔ BLOCKING FINDING — stage cannot complete
  ↓
  User sees ONLY "Request Changes" option (no "Continue")
  ↓
  AI must fix the violation before workflow can proceed
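
In AI-DLC the check and the gate are both natural-language rules that the agent follows. If you modeled the same behavior in code, it might look like the sketch below. The `Finding` class, the resource dictionary shape, and the function names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    rule_id: str
    message: str
    blocking: bool = True

def check_encryption_at_rest(resource: dict) -> list[Finding]:
    """SECURITY-01 style check: DynamoDB tables must enable encryption at rest."""
    if resource.get("type") == "dynamodb_table" and not resource.get("encryption_at_rest"):
        return [Finding("SECURITY-01",
                        f"{resource['name']}: encryption at rest is disabled")]
    return []

def allowed_actions(findings: list[Finding]) -> list[str]:
    """A blocking finding removes the option to continue."""
    if any(f.blocking for f in findings):
        return ["Request Changes"]
    return ["Request Changes", "Approve & Continue"]

findings = check_encryption_at_rest({"type": "dynamodb_table", "name": "orders"})
actions = allowed_actions(findings)
```

The point of the design is in `allowed_actions`: the approval option is withheld entirely instead of being shown next to a warning.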

What Organizations Can Build

Category	Example Extensions
Security	SOC2 controls, PCI DSS requirements, zero-trust networking rules
Compliance	GDPR data handling, HIPAA PHI rules, FedRAMP controls
Coding Standards	Company SDK usage, naming conventions, API versioning policy
Architecture	Microservices boundaries, event-driven patterns, shared-nothing rules
Testing	Minimum coverage thresholds, mutation testing, chaos engineering
Operations	Runbook requirements, observability standards, SLO definitions

Key design decisions: extensions without an opt-in file are always enforced (no user choice). N/A rules are logged but not blocking. Extension compliance is summarized at each stage completion.
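
Those loading rules can be sketched as a small selection function. The directory layout follows the two-file convention above; the function name `rules_to_load` and the `opted_in` set (the extensions the user answered "Yes" to) are my own scaffolding.

```python
from pathlib import Path

def rules_to_load(extensions_root: Path, opted_in: set[str]) -> list[Path]:
    """Pick the full rule files that should be enforced for this project."""
    selected = []
    # Layout assumed: extensions/<category>/<extension>/<extension>.md
    for ext_dir in sorted(p for p in extensions_root.glob("*/*") if p.is_dir()):
        name = ext_dir.name
        rules = ext_dir / f"{name}.md"
        opt_in = ext_dir / f"{name}.opt-in.md"
        if not rules.exists():
            continue
        # No opt-in file means always enforced; otherwise the user must opt in.
        if not opt_in.exists() or name in opted_in:
            selected.append(rules)
    return selected
```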

The Two-Layer File System — The Architectural Insight

This is the thing that clicked for me after staring at the repo for a while. AI-DLC uses a two-layer file system where the layers serve completely different purposes:

Layer 1 — Rule Files (HOW the AI should behave): These are STATIC. They don't change during workflow execution. They're loaded by the platform's native rules mechanism:

Platform	Rule File Location
Kiro IDE	`.kiro/steering/aws-aidlc-rules/core-workflow.md`
Amazon Q	`.amazonq/rules/aws-aidlc-rules/core-workflow.md`
Cursor	`.cursor/rules/ai-dlc-workflow.mdc`
Claude Code	`CLAUDE.md`
Copilot	`.github/copilot-instructions.md`

Layer 2 — Workflow Artifacts (state generated DURING execution): These are DYNAMIC. Created/updated as the workflow progresses. They live in aidlc-docs/ which is a UNIVERSAL location for ALL platforms:

aidlc-docs/
├── aidlc-state.md               ← "Where are we in the workflow?"
├── audit.md                     ← "What happened? Full history"
├── inception/
│   ├── plans/                   ← Execution plans, stage plans
│   ├── reverse-engineering/     ← Architecture, components, APIs
│   ├── requirements/            ← Requirements, verification questions
│   ├── user-stories/            ← Stories, personas
│   └── application-design/      ← Components, services, units
├── construction/
│   ├── plans/                   ← Code-generation plans per unit
│   ├── {unit-name}/             ← Per-unit design docs
│   │   ├── functional-design/
│   │   ├── nfr-requirements/
│   │   ├── nfr-design/
│   │   ├── infrastructure-design/
│   │   └── code/                ← Markdown summaries only, NOT actual code
│   └── build-and-test/
└── operations/                  ← Placeholder for future

The analogy: Rule files = a recipe book (instructions on how to cook). aidlc-docs/ = the kitchen (where actual cooking happens and meals are stored). The recipe book tells you to "check the oven temperature" — the oven is separate from the recipe book.

Session Continuity — How It Actually Works Across Platforms

AI coding assistants don't remember previous conversations. Each new session starts with a blank context. AI-DLC's solution: use workspace files as persistent memory that survives session boundaries.

The Continuity Mechanism Step-by-Step

New session starts → Platform loads rule file (core-workflow.md)
Rule file instructs: "Run Workspace Detection → check for aidlc-docs/aidlc-state.md"
If state file exists → AI reads it and finds:

   ## Current Status
   - **Current Stage**: CONSTRUCTION - Code Generation
   - **Next Stage**: Unit 2 Code Generation
   - **Last Completed**: Unit 1 Code Generation (2024-03-15)

AI loads previous artifacts (requirements, designs, plans from aidlc-docs/)
AI presents "Welcome Back" prompt with options to continue or review
Work continues seamlessly from where it left off
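
In practice the agent reads the state file as plain text, so there is no parser involved. If you want to inspect the same file from a script (say, for a dashboard), a minimal sketch could look like this. It assumes the bullet format shown in the excerpt above; the real file may contain more sections.

```python
import re

# Matches lines such as: - **Current Stage**: CONSTRUCTION - Code Generation
STATE_LINE = re.compile(r"^\s*-\s*\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.+?)\s*$")

def parse_state(markdown: str) -> dict[str, str]:
    """Extract the bold key/value bullets from an aidlc-state.md style file."""
    state = {}
    for line in markdown.splitlines():
        match = STATE_LINE.match(line)
        if match:
            state[match["key"].strip()] = match["value"]
    return state

example = """## Current Status
- **Current Stage**: CONSTRUCTION - Code Generation
- **Next Stage**: Unit 2 Code Generation
- **Last Completed**: Unit 1 Code Generation (2024-03-15)
"""
state = parse_state(example)
```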

Why This Works Across ALL Platforms

The genius of the approach:

Platform-specific  ──▶  INSTRUCTIONS  ──▶  Universal workspace files
rule file location      (same content)      (same for all platforms)

.kiro/steering/    ─┐
.amazonq/rules/    ─┤                       aidlc-docs/
.cursor/rules/     ─┼─▶ core-workflow.md ──▶ ├── aidlc-state.md
CLAUDE.md          ─┤   (same rules!)       ├── audit.md
copilot-instr.md   ─┘                       └── inception/...

The rule file's content is identical regardless of platform; only its location differs. That means you can switch IDEs mid-project: start in Cursor on Day 1, which generates aidlc-docs/, then switch to Claude Code on Day 5, which reads the same aidlc-docs/ and resumes from the saved state.

Platform-Specific Enhancements

Some platforms offer additional session persistence beyond AI-DLC's file-based approach:

Platform	Extra Feature	How AI-DLC Benefits
Claude Code	`--continue` / `--resume` flags	Can resume exact conversation + file state
Claude Code	Auto-memory system	Learns user preferences across projects
Kiro IDE	Conditional steering (file-match patterns)	Can load phase-specific rules
Cursor	`alwaysApply` vs conditional rules	Always-on ensures workflow never forgotten

Important limitation: AI context windows are finite. Even when resuming, if the project has 20+ artifacts, the AI must selectively load what's relevant. AI-DLC handles this with "Smart Context Loading by Stage": early stages load only workspace analysis; design stages load requirements + stories + architecture; code stages load ALL artifacts + existing code.

The Human Approval Gate Pattern

Every stage follows: Generate → Present → Wait → Log

AI generates artifacts
       │
       ▼
Present completion message with:
  📋 REVIEW REQUIRED (link to artifact)
  🚀 WHAT'S NEXT:
     🔧 Request Changes
     ✅ Approve & Continue
       │
       ▼
⛔ GATE: Do NOT proceed until explicit approval
       │
       ▼
Log user's COMPLETE RAW response in audit.md (ISO 8601 timestamp)

This creates a complete audit trail. For regulated industries, you can trace exactly why a particular architectural decision was made, who approved it, and what context was available.
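
The "Log" step boils down to appending the user's words, untouched, with a timestamp. Here is a sketch of that append in Python. The entry layout is my guess at a reasonable format, not the exact one AI-DLC writes, and `log_approval` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path

def log_approval(audit_path: Path, stage: str, raw_response: str) -> str:
    """Append the user's complete, unedited response with an ISO 8601 timestamp."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"## {stage}\n"
        f"- Timestamp: {timestamp}\n"
        f"- Response:\n{raw_response}\n\n"
    )
    # Append only: earlier entries are never rewritten.
    with audit_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return timestamp
```

Opening the file in append mode is the important part: an audit trail you can overwrite isn't much of an audit trail.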

Anti-Overconfidence — My Favorite Design Choice

AI-DLC explicitly prevents the common AI problem of "assuming instead of asking":

OLD AI BEHAVIOR:                       AI-DLC ENFORCED BEHAVIOR:
─────────────────                      ────────────────────────────
User: "Build a user API"               User: "Build a user API"
AI: *immediately writes code*          AI: *generates 8 clarifying questions*
     (assumes REST, assumes Node,           - REST or GraphQL?
      assumes PostgreSQL, assumes           - Authentication method?
      no caching needed...)                 - Expected request volume?
                                            - Data retention requirements?
                                            ...
                                       AI: *waits for answers*
                                       AI: *analyzes answers for ambiguity*
                                       AI: *follows up on vague responses*
                                       AI: *THEN proceeds with full context*

Red flags the AI must detect in user answers:

"depends" → follow up: "What does it depend on? Define the criteria"
"mix of A and B" → follow up: "When do you use A vs B specifically?"
"not sure" → follow up: "What information would help you decide?"
"standard" → follow up: "Define 'standard' in your context"

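The agent does this detection through the rules it is given, not through string matching, but the red-flag list maps neatly onto a lookup table. A toy version, with the flags and follow-ups copied from the list above and the function name as my own invention:

```python
FOLLOW_UPS = {
    "depends": "What does it depend on? Define the criteria",
    "mix of": "When do you use A vs B specifically?",
    "not sure": "What information would help you decide?",
    "standard": "Define 'standard' in your context",
}

def follow_up_questions(answer: str) -> list[str]:
    """Return a follow-up for every vague phrase found in the answer."""
    lowered = answer.lower()
    return [question for flag, question in FOLLOW_UPS.items() if flag in lowered]

questions = follow_up_questions("It depends, probably the standard setup")
```

A clear answer such as "REST with JWT auth" produces no follow-ups, so the workflow can move on.
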
What Makes This Different — The 10 Key Differentiators

Agent-agnostic rule files — same methodology across 7+ coding assistants
Adaptive intelligence — complexity drives depth, not rigid templates
Anti-overconfidence by design — mandatory questioning and ambiguity resolution
Full audit trail — every interaction logged with timestamps in audit.md
Extension system — enterprises layer their own security, compliance, coding rules
Session continuity via files — aidlc-state.md enables cross-session resume on any platform
Separation of concerns — code at project root, docs in aidlc-docs/, rules in platform location
Two-part stage pattern — plan approval before execution prevents wasted effort
Per-unit loop — complex systems decomposed and built incrementally
Blocking constraints — enabled extensions halt progress on non-compliance

Who Should Use This

AI-DLC is most valuable when:

Your team uses AI coding agents daily and wants consistency
You work on brownfield codebases where context matters
You need audit trails for compliance or governance
You want to switch between IDEs without losing workflow state
You're building complex systems that need decomposition before coding

It's probably overkill for solo side projects or quick prototypes. The adaptive depth system mitigates this, but there's inherent overhead in the questioning phase. The extension system makes it particularly powerful for enterprises — you can encode your entire internal development playbook as blocking rules.

FAQ

Is AI-DLC a tool I need to install?

No. AI-DLC is a set of rule files (markdown documents) that you place in your project's rule directory. There's no CLI, no npm package, no binary. Your existing AI coding agent reads the rules and follows the methodology. It works on Kiro, Amazon Q, Cursor, Claude Code, Copilot, and any other agent that supports instruction/rule files.

Can I use AI-DLC with Claude Code and its CLAUDE.md?

Yes. Claude Code is one of the supported platforms. You can either paste the AI-DLC rules directly into your CLAUDE.md file or reference them from it with Claude Code's @path/to/file import syntax. Claude Code's --continue flag provides additional session continuity on top of AI-DLC's file-based state system. The auto-memory feature also learns your preferences across projects.

What happens if I skip Inception and go straight to coding?

The framework is adaptive — for truly simple tasks, the AI will propose a minimal path. But force-skipping on a complex task means you lose reverse engineering context, requirements clarification, and architectural planning. The common failure mode is the AI duplicating existing services or breaking patterns because it never scanned the codebase.

How do AI-DLC extensions differ from regular linting rules?

Two key differences: scope and enforcement. Linting rules check syntax and style at the file level. AI-DLC extensions operate at the architecture level: "all APIs must be versioned," "every data model needs PHI classification," "minimum 3 replicas for stateful services." And they're blocking: the workflow cannot proceed until violations are resolved. There's no "ignore this warning" escape hatch.

How does the two-file extension convention work?

Each opt-in extension has an opt-in.md file (lightweight, always loaded, asks the user a yes/no question) and a full rules file (heavy, loaded only if the user opts in). This keeps the system prompt lean while allowing arbitrarily complex rule sets. Extensions without an opt-in file are always enforced, with no user choice.

Can different team members use different IDEs on the same AI-DLC project?

Yes — this is explicitly supported. The workflow state in aidlc-docs/ is platform-agnostic. Developer A uses Cursor, Developer B uses Claude Code, Developer C uses Amazon Q. They all read and write the same aidlc-docs/aidlc-state.md and artifact files. The only IDE-specific files are the rule file locations, and those contain identical content.

Originally published at fp8.co. Subscribe for weekly AI engineering analysis at fp8.co/newsletters.