<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Frédéric Geens</title>
    <description>The latest articles on Forem by Frédéric Geens (@fr-e-d).</description>
    <link>https://forem.com/fr-e-d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3795317%2Fe31ea398-a6e8-4700-84e7-86dc5215a1c2.jpeg</url>
      <title>Forem: Frédéric Geens</title>
      <link>https://forem.com/fr-e-d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fr-e-d"/>
    <language>en</language>
    <item>
      <title>The Cold Start Pricing Trap</title>
      <dc:creator>Frédéric Geens</dc:creator>
      <pubDate>Tue, 24 Mar 2026 16:12:53 +0000</pubDate>
      <link>https://forem.com/fr-e-d/the-cold-start-pricing-trap-3jff</link>
      <guid>https://forem.com/fr-e-d/the-cold-start-pricing-trap-3jff</guid>
      <description>&lt;p&gt;I spent a full day building a clever pricing mechanism for my marketplace. Dynamic coefficients, maturity thresholds, progressive price increases tied to platform quality metrics. It was elegant.&lt;/p&gt;

&lt;p&gt;Then I killed it.&lt;/p&gt;

&lt;p&gt;Here's what I learned — and the framework that actually matters.&lt;/p&gt;

&lt;h2&gt;The gut check that started it&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://callibrate.io" rel="noopener noreferrer"&gt;Callibrate&lt;/a&gt;, a B2B matching platform that connects businesses with AI/automation experts. The expert pays per qualified lead — a booked call with a prospect who has a confirmed budget and a scoped problem.&lt;/p&gt;

&lt;p&gt;Our pricing grid ties lead cost to prospect budget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prospect budget   Lead price
────────────────  ──────────
Under 5k          €49
5k – 20k          €89
20k – 50k         €149
50k+              €229
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I was reviewing this grid when a question hit me: &lt;em&gt;Would I pay €89 for a single lead from a platform nobody's heard of?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Probably not.&lt;/p&gt;

&lt;p&gt;And that's the problem. The grid was designed for steady state — a platform with proven lead quality and a 30% call-to-project conversion rate. But at launch? Zero track record. Zero conversion data. The matching engine hasn't been calibrated on real leads.&lt;/p&gt;

&lt;p&gt;The price might be correct in theory. But would an expert trust it?&lt;/p&gt;

&lt;h2&gt;The math that should come first&lt;/h2&gt;

&lt;p&gt;Before touching the price, I needed to know: are these numbers actually defensible?&lt;/p&gt;

&lt;p&gt;I modeled the expert's real cost to acquire one paying client at each tier under three conversion scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 30% conversion (steady state hypothesis):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Budget tier   Price/lead   Leads to close 1   Cost per client   % of deal
───────────   ──────────   ────────────────   ───────────────   ─────────
Under 5k      €49          3.3                €163              5.4%
5k – 20k      €89          3.3                €297              2.4%
20k – 50k     €149         3.3                €497              1.4%
50k+          €229         3.3                €763              1.0%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All green. The cost-to-value ratios are excellent — well under the common 5% benchmark for B2B lead generation, which says acquisition cost should stay below 5% of contract value at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 15% conversion (realistic launch — engine not calibrated):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Budget tier   Price/lead   Leads to close 1   Cost per client   % of deal
───────────   ──────────   ────────────────   ───────────────   ─────────
Under 5k      €49          6.7                €327              10.9% ⚠
5k – 20k      €89          6.7                €593              4.7%
20k – 50k     €149         6.7                €993              2.8%
50k+          €229         6.7                €1,527            2.0%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still works for everything except the smallest tier. And at 8% conversion — worst case — the under-5k tier becomes painful (20% of deal value) but the larger tiers remain viable.&lt;/p&gt;
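&lt;p&gt;The tables above are one division away from the inputs: cost per client is the lead price divided by the conversion rate. Here is a minimal TypeScript sketch of the same arithmetic (the &lt;code&gt;dealValue&lt;/code&gt; figures are tier midpoints I assumed for the "% of deal" column; the lead prices are the published grid):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Cost to acquire one paying client at a given conversion rate.
// dealValue figures are assumed tier midpoints, used only for "% of deal".
interface Tier {
  name: string;
  leadPrice: number;  // EUR per qualified lead
  dealValue: number;  // assumed average project value, EUR
}

const tiers: Tier[] = [
  { name: "Under 5k",  leadPrice: 49,  dealValue: 3000 },
  { name: "5k - 20k",  leadPrice: 89,  dealValue: 12500 },
  { name: "20k - 50k", leadPrice: 149, dealValue: 35000 },
  { name: "50k+",      leadPrice: 229, dealValue: 75000 },
];

function costPerClient(leadPrice: number, conversionRate: number): number {
  // At 30% conversion it takes 1 / 0.30 = 3.3 leads to close one client.
  return leadPrice / conversionRate;
}

for (const rate of [0.30, 0.15, 0.08]) {
  console.log(`--- conversion at ${(rate * 100).toFixed(0)}% ---`);
  for (const t of tiers) {
    const cost = costPerClient(t.leadPrice, rate);
    const pctOfDeal = (100 * cost) / t.dealValue;
    console.log(`${t.name.padEnd(9)} EUR ${cost.toFixed(0)} (${pctOfDeal.toFixed(1)}% of deal)`);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running it reproduces the two tables above, plus the 8% worst case: the under-5k tier lands around €613 per client, roughly 20% of an assumed €3,000 deal.&lt;/p&gt;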

&lt;p&gt;Then I benchmarked against how AI consultants actually acquire clients today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold outbound:&lt;/strong&gt; $200-500 per meeting (Belkins, SalesHive — B2B lead gen pricing reports 2025)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn organic:&lt;/strong&gt; $750-3,000 per meeting — opportunity cost at $150/h consultant rates, 5-10h/week for 2-4 meetings/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bark / Thumbtack:&lt;/strong&gt; $14-100 per lead (Bark Help Center, Thumbtack Pro documentation), 1.3/5 quality rating from professionals (G2, Trustpilot 2025-2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B2B lead gen agencies:&lt;/strong&gt; $100-250 per qualified lead (Sopro 2025 B2B cost-per-lead benchmarks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Referrals:&lt;/strong&gt; Free but completely unpredictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A pre-qualified B2B lead with a booked call and confirmed budget, priced at €49-229? That's competitive. The math works.&lt;/p&gt;

&lt;p&gt;So the price isn't the problem.&lt;/p&gt;

&lt;h2&gt;The clever solution I built (and killed)&lt;/h2&gt;

&lt;p&gt;Knowing the math was sound didn't resolve my gut feeling. An expert seeing "€89/lead" on an unknown platform will still hesitate. So I designed what I thought was an elegant solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Platform Maturity Coefficient.&lt;/strong&gt; Lead price would scale with the platform's proven quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0-50 accepted leads: price × 0.65 (35% discount)&lt;/li&gt;
&lt;li&gt;51-200 leads: price × 0.80&lt;/li&gt;
&lt;li&gt;201-500 leads: price × 0.90&lt;/li&gt;
&lt;li&gt;500+ leads: full price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea: charge less while the platform is unproven, increase as quality is demonstrated. Transparent. Published. Tied to real metrics.&lt;/p&gt;

&lt;p&gt;I liked it. It felt fair.&lt;/p&gt;

&lt;p&gt;Then I stress-tested it against what the industry actually does. Five problems emerged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No marketplace uses this.&lt;/strong&gt; I checked Bark, Thumbtack, Angi, Upwork, Fiverr, Contra, Expert360. Zero use a "platform maturity" pricing mechanism. Dynamic pricing exists — based on supply/demand, job value, exclusivity. Never based on "how many leads we've successfully delivered." Novel pricing mechanisms are a risk, not an advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It penalizes loyalty.&lt;/strong&gt; The price goes UP as the platform improves. The more an expert uses the platform and helps it succeed, the more they pay. That's a loyalty penalty. Research is clear: customers churn when they discover others paid less for the same thing. My mechanism guaranteed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. It signals weakness.&lt;/strong&gt; An expert reads "35% discount because we haven't proven our quality yet" and hears: &lt;em&gt;even they don't trust their own product.&lt;/em&gt; That's the opposite of the confidence signal you need at launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. It creates a price cliff.&lt;/strong&gt; Early adopters get welcome credits (part of our Founding Expert program). If those credits buy leads at one price and paid credits buy at another, one credit no longer equals one credit. Transparent pricing means stable units. My co-founder's instinct caught this one: "I prefer transparency and no surprises. Credits must have the same value."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. It's redundant.&lt;/strong&gt; We already have a Founding Expert program that gives early adopters 300 free credits, a 20% permanent recharge bonus, and a 14-day satisfaction window where credits are restored on bad leads. The cold-start trust gap was already covered. I was adding complexity on top of a system that didn't need it.&lt;/p&gt;

&lt;p&gt;I deleted the coefficient.&lt;/p&gt;

&lt;h2&gt;The actual insight&lt;/h2&gt;

&lt;p&gt;The pricing grid was never wrong. But I was asking the wrong question.&lt;/p&gt;

&lt;p&gt;"Is €89 too expensive for an unproven lead?" is a pricing question. The real question is: &lt;strong&gt;"Does the expert trust the platform enough to find out?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a trust question. And trust has different mechanisms than price.&lt;/p&gt;

&lt;p&gt;Here's what I already had in place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trust gap                           Mechanism
──────────────────────────────────  ──────────────────────────────────
"What if the matching is bad?"      14-day window — credits restored
"What if I pay for nothing?"        300 free credits ($5 setup)
"Price too high for the quality?"   20% recharge bonus, for life
"How do I know this is legit?"      Published metrics + methodology
"Lead fine but doesn't convert?"    Normal pay-per-lead risk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trust gap was already covered. Not by pricing tricks — by structural protections.&lt;/p&gt;

&lt;h2&gt;What I built instead&lt;/h2&gt;

&lt;p&gt;One thing was genuinely missing: a pre-defined plan for what to do if the data proves me wrong.&lt;/p&gt;

&lt;p&gt;So instead of a clever pricing mechanism, I wrote a simple rule:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After 50 accepted leads:&lt;/strong&gt; if conversion drops below 15% on any tier AND the flag rate is below 20% (meaning leads are technically fine but not converting), pause that tier, diagnose whether it's a matching problem or a pricing problem, and adjust accordingly.&lt;/p&gt;

&lt;p&gt;That's it. No dynamic coefficients. No progressive discounts. Just: track the data, define the trigger, pre-commit to the response.&lt;/p&gt;
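&lt;p&gt;The rule fits in a few lines. A sketch of the trigger (the field and function names are mine; the thresholds are the ones stated above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Pre-committed review trigger, evaluated per tier.
// A tier pauses when there is enough data, conversion is low, and the
// flag rate is also low: leads look fine but do not convert, which
// points at matching or pricing rather than lead quality.
interface TierStats {
  acceptedLeads: number;
  conversionRate: number; // call-to-project rate, 0..1
  flagRate: number;       // share of leads flagged in the 14-day window, 0..1
}

function below(value: number, threshold: number): boolean {
  return Math.sign(value - threshold) === -1; // strictly below the threshold
}

function shouldPauseTier(s: TierStats): boolean {
  const enoughData = !below(s.acceptedLeads, 50); // at least 50 accepted leads
  const lowConversion = below(s.conversionRate, 0.15);
  const leadsLookFine = below(s.flagRate, 0.20);  // few flags, quality seems ok
  return [enoughData, lowConversion, leadsLookFine].every(Boolean);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A tier that trips the trigger gets paused and diagnosed. A tier with low conversion but a high flag rate is a lead-quality problem and takes a different path.&lt;/p&gt;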

&lt;p&gt;The lesson was clear. I was solving a trust problem with a pricing tool. The right tools for trust are: risk reduction (free trial leads), quality guarantees (flag window), and radical transparency (published metrics). Not discounts.&lt;/p&gt;

&lt;h2&gt;The framework, if you want to steal it&lt;/h2&gt;

&lt;p&gt;For any marketplace founder working through pricing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Model the expert's ROI&lt;/strong&gt; at multiple conversion rates. Not just your optimistic hypothesis — model 30%, 15%, and 8%. If the math doesn't work at 15%, you have a real pricing problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Benchmark against alternatives.&lt;/strong&gt; Not against other marketplaces — against how your supply side currently acquires customers. If an AI consultant spends $500 and 3 weeks to land a client through cold outreach, a €89 pre-qualified lead is cheap. The anchor is their current cost, not Bark's credit price.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Separate price from trust.&lt;/strong&gt; If the price is defensible but adoption is uncertain, don't lower the price. Build trust mechanisms instead: free trial credits, satisfaction guarantees, transparent metrics, quality protection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Pre-commit your response.&lt;/strong&gt; Write down, before launch, exactly what data would trigger a price adjustment. What conversion rate? How many leads? What diagnostic steps? You don't want to make this decision under stress.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building &lt;a href="https://callibrate.io" rel="noopener noreferrer"&gt;Callibrate&lt;/a&gt; in public — a matching platform for AI/automation experts. If marketplace pricing and cold-start problems are your thing, I write about this stuff regularly.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://fredericgeens.substack.com/p/the-cold-start-pricing-trap" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>startup</category>
      <category>saas</category>
      <category>buildinpublic</category>
      <category>marketplace</category>
    </item>
    <item>
      <title>My girlfriend banned my laptop for 6 weeks. I came back with GAAI — a governance framework for AI coding agents.</title>
      <dc:creator>Frédéric Geens</dc:creator>
      <pubDate>Mon, 09 Mar 2026 01:35:41 +0000</pubDate>
      <link>https://forem.com/fr-e-d/my-girlfriend-banned-my-laptop-for-6-weeks-i-came-back-with-a-governance-framework-for-ai-agents-15lk</link>
      <guid>https://forem.com/fr-e-d/my-girlfriend-banned-my-laptop-for-6-weeks-i-came-back-with-a-governance-framework-for-ai-agents-15lk</guid>
      <description>&lt;p&gt;&lt;strong&gt;My girlfriend banned my laptop for 6 weeks. I came back with GAAI — a governance framework that turns AI coding tools into reliable agentic systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;39 stories. 79 decisions. 260 tests. 16,000 lines of code. 4 days.&lt;/p&gt;

&lt;p&gt;That was day 4. Today: 32 epics, 176 stories, 177 decisions, 84,000 lines of code, zero AI drift. Same framework. Same governance. The compound interest is real.&lt;/p&gt;

&lt;p&gt;The enforced pause is what made the results possible. Here's how.&lt;/p&gt;




&lt;h2&gt;Who I Am (Not a CV)&lt;/h2&gt;

&lt;p&gt;I'm Frédéric Geens. Belgian. Program &amp;amp; Operations Manager by day, builder by obsession.&lt;/p&gt;

&lt;p&gt;Ten-plus years of coding alone at night — hundreds of hours, most dropped early (fail fast, learn faster). A dozen projects started, a handful finished, almost none made it to production. Not failure. Training. The kind you don’t get from tutorials.&lt;/p&gt;

&lt;p&gt;In 2015, I co-founded a transport company in Wallonia with a friend. I built the booking platform between rides, laptop on knees in the car. It was one of the first online booking systems for on-demand transport in the region. We ran it for five years. I handed it over in 2019. It survived COVID. It still runs today.&lt;/p&gt;

&lt;p&gt;Somewhere in there I fell in love with SaaS. Read Rob Walling's &lt;em&gt;The SaaS Playbook&lt;/em&gt; and something clicked: bootstrap everything, no VC, start small and stay small. Build something people need badly enough to pay for. Painkillers, not vitamins.&lt;/p&gt;

&lt;p&gt;I'm an IT Program &amp;amp; Operations Manager now — 30 hours a week, remote. The rest of the time, I build. I describe myself as a Swiss Army knife: not the deepest expert in any single thing, but deeply curious, fast to learn, and obsessively focused on building things that actually solve problems.&lt;/p&gt;

&lt;p&gt;I'm not someone who discovered "vibe coding" last month. The distinction matters for everything that follows.&lt;/p&gt;




&lt;h2&gt;The Vietnam Digital Detox (The Part That Started Everything)&lt;/h2&gt;

&lt;p&gt;My partner is my guardrail. She watches over my balance, pulls me out of the cave, forces me to live things that have nothing to do with screens. She's been watching me build project after project for years, most of them destined for the drawer. She knows when I need a break even when I don't.&lt;/p&gt;

&lt;p&gt;So she imposed a rule before our trip to Vietnam: no laptop. Six weeks. (Jan-Feb 2026)&lt;/p&gt;

&lt;p&gt;I agreed. The rule shouldn't have surprised me. Neither should how much it bothered me, because the AI ecosystem was exploding during exactly those six weeks.&lt;/p&gt;

&lt;p&gt;January–February 2026 was chaos. New agent frameworks going viral overnight — some shipped entire products, others wasted nine hours and got nothing. A bash one-liner that just looped AI prompts until tests passed was getting adopted by YC teams. Anthropic launched Claude Cowork with Skills; Fortune called it a threat to dozens of startups. New models every week — Opus 4.6, GPT-5.3-Codex, Sonnet 4.6, Gemini 3.1 Pro — each making last week's feel outdated. A major open-source skill registry got hit with 824 malicious packages. The creator joined OpenAI mid-crisis.&lt;/p&gt;

&lt;p&gt;I was watching all of this from an iPhone 8 in Southeast Asia with no way to respond. Every new tool to try, every new model to test, every drama to follow — and no laptop to get pulled into any of it.&lt;/p&gt;

&lt;p&gt;The itch was unbearable. But not having the laptop turned out to be the best thing that could have happened. Because while everyone else was chasing the shiny objects — trying B-Mad, looping Ralph Wiggum, panicking about Cowork, benchmarking whatever model dropped that week — I couldn't participate. I was forced to sit with a different question entirely. Not "how do I code faster with AI?" but "how do I make agentic coding &lt;em&gt;reliable&lt;/em&gt;?" How do you control the beast? What happens when the agent makes a decision you didn't authorize? What happens when context is lost between sessions? What happens when you can't trace why something was built the way it was?&lt;/p&gt;

&lt;p&gt;The hype was about speed. The question nobody was asking was about governance. And I had six weeks with nothing but a slow phone to think about it.&lt;/p&gt;

&lt;p&gt;So instead of coding, I started thinking. From that painfully slow iPhone 8 — the kind where you watch the progress bar and go make coffee — I read scientific papers. Papers on separating reflection from execution, on context engineering, on persistent memory in agent systems, on isolation between cognitive and operational roles.&lt;/p&gt;

&lt;p&gt;I used Google NotebookLM (Gemini 3) and ChatGPT deep research to condense the state of the art. NotebookLM is genuinely good at this — feed it 15 papers and a handful of forum threads and ask it to synthesize. It did the work my slow phone couldn't.&lt;/p&gt;

&lt;p&gt;Then I used ChatGPT 5.2 to brutally challenge every idea I was forming. No complacency. I'd draft a concept, feed it to ChatGPT, and tell it to find every hole. It found plenty. That friction made the framework tighter.&lt;/p&gt;

&lt;p&gt;Piece by piece, the architecture took shape — captured in Notion via MCP, challenged by ChatGPT, synthesized by NotebookLM, refined on a phone that could barely keep up.&lt;/p&gt;

&lt;p&gt;There's an irony here I want to name explicitly: a governance framework for AI agents, born because a human imposed governance on the builder. The same person who gives me the energy to keep building is the one who forced the pause that made the breakthrough possible. She doesn't need a framework to keep me on track. But apparently AI agents do.&lt;/p&gt;




&lt;h2&gt;What I Built: GAAI in Four Concepts&lt;/h2&gt;

&lt;p&gt;GAAI stands for &lt;strong&gt;Governed Agentic AI Infrastructure&lt;/strong&gt;. It lives in a &lt;code&gt;.gaai/&lt;/code&gt; folder at the root of the project and governs every session.&lt;/p&gt;

&lt;p&gt;4 layers:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Dual-Track Agentic:&lt;/strong&gt; one agent thinks, produces epics and stories, and logs the decision trail. The other builds. Never mix.&lt;br&gt;
→ &lt;strong&gt;Skills:&lt;/strong&gt; execute only what the skill authorizes&lt;br&gt;
→ &lt;strong&gt;Rules:&lt;/strong&gt; govern them all&lt;br&gt;
→ &lt;strong&gt;Persistent Memory:&lt;/strong&gt; agents read yesterday's decisions before today's code&lt;/p&gt;

&lt;p&gt;No improvisation. No drift.&lt;/p&gt;

&lt;p&gt;Here's what each layer does:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Dual-Track (think ≠ do)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are two agents, and they never mix roles. The Discovery Agent thinks: it clarifies intent, asks questions, creates artefacts, and helps the human in the loop define what to build. The Delivery Agent executes: it reads the backlog, picks a validated story, builds it, tests it, opens a PR. An agent that discovers is never also the one that codes — like separating the architect from the construction crew.&lt;/p&gt;

&lt;p&gt;This isn't just conceptual cleanliness. It's what prevents scope creep, rogue decisions, and the particular failure mode where an agent decides to "improve" something while building something else entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Persistent Memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every session starts with loaded context. The agent knows what was decided three days ago, what the billing model is, why Cal.com was replaced, what Reddit told us about expert pricing sensitivity. Pattern: &lt;code&gt;memory-retrieve&lt;/code&gt; runs before any action. Result: no repeated mistakes, no re-litigating decisions.&lt;/p&gt;
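&lt;p&gt;As a pattern, every session opens in the same order (a sketch of the sequence, not GAAI's literal file contents):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session preamble (every session, before any action)
───────────────────────────────────────────────────
1. memory-retrieve   load persistent memory and active decisions
2. read backlog      find the one validated story that authorizes work
3. load skill        the boundaries of what this agent may do
4. execute           build only what the story and skill allow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;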

&lt;p&gt;&lt;strong&gt;3. Decision Trail&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every non-trivial choice gets a decision log entry in &lt;code&gt;DEC-NNN&lt;/code&gt; format. The entry records: what was decided, why, what it replaces, what it impacts. 177 decisions logged at time of writing. Each one is a future mistake prevented.&lt;/p&gt;

&lt;p&gt;When something breaks or needs to change, you (or your Discovery Agent) open the log and find the entry. When you pivot, you (as well as the agent) know exactly what you're replacing and why.&lt;/p&gt;
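&lt;p&gt;To make the shape concrete, here is what a hypothetical entry carries (the four fields are the ones the format records; the layout is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DEC-NNN  Short title of the decision
────────────────────────────────────
Decided:   what was chosen
Why:       the evidence and reasoning behind it
Replaces:  the earlier decision this supersedes, if any
Impacts:   the stories, workers, and contracts that depend on it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;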

&lt;p&gt;&lt;strong&gt;4. Skill-Based Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents do only what their skill authorizes. No improvisation. No scope creep. The backlog authorizes execution; the skill defines how to execute. The agent reads both before touching a file. This is the constraint that makes the whole system predictable.&lt;/p&gt;

&lt;p&gt;Here's a concrete Dual-Track example from the project: while Delivery was building the matching engine (story E06S05), Discovery was simultaneously running listening sessions on Reddit — 27 posts across 20 subreddits, non-revealing engagement. The budget data that came back from Reddit directly changed the pricing model (DEC-67). Two tracks, running in parallel, from day 1.&lt;/p&gt;

&lt;p&gt;Here's what the folder looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.gaai/
├── core/
│   ├── agents/          # Discovery, Delivery — role definitions
│   ├── skills/          # What each agent is authorized to do
│   └── contexts/rules/  # Orchestration constraints
└── project/
    ├── contexts/
    │   ├── backlog/     # Stories: the only thing that authorizes code
    │   ├── memory/      # Persistent context, loaded per session
    │   └── artefacts/   # Decisions, strategies, research
    └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No magic. No custom tooling. A folder, markdown files, and constraints that Claude Code reads before touching anything.&lt;/p&gt;




&lt;h2&gt;The Test: Callibrate&lt;/h2&gt;

&lt;p&gt;I flew home to Belgium. Opened the laptop. Started the first real implementation with Claude Code under GAAI governance. The results were well beyond what I'd expected.&lt;/p&gt;

&lt;p&gt;Not just speed. Confidence. The kind where you push to production and actually sleep afterward, because you know the agent didn't make up decisions in the dark.&lt;/p&gt;

&lt;p&gt;But I didn't want to write about an untested framework. I needed it to hold up on something real and complex — not a toy project, not a demo. So I ran it on Callibrate.&lt;/p&gt;

&lt;p&gt;Callibrate is an AI expert matching marketplace. Businesses with automation needs — SMBs trying to implement n8n, companies building internal AI tools, startups who need someone who actually understands their business logic — get matched with vetted AI consultants. Not LinkedIn. Not Upwork. Specialized, curated, pre-qualified on both sides. The unit of value isn't a profile view or a message. It's a booked call.&lt;/p&gt;

&lt;p&gt;Here's what GAAI governance shipped in 4 days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;39 stories delivered, 46 total stories planned&lt;/li&gt;
&lt;li&gt;79 decisions documented (20 active, 59 archived)&lt;/li&gt;
&lt;li&gt;260 vitest tests passing&lt;/li&gt;
&lt;li&gt;16,246 lines of TypeScript across 97 files&lt;/li&gt;
&lt;li&gt;4 Cloudflare Workers: Core API, Matching Engine, PostHog Proxy, Satellite frontend&lt;/li&gt;
&lt;li&gt;7 epics spanning auth, billing, matching, AI extraction, Google Calendar, analytics, content&lt;/li&gt;
&lt;li&gt;4 working days (2026-02-19 to 2026-02-24)&lt;/li&gt;
&lt;li&gt;Total API cost: $198.29 for 458 million tokens — 96.9% cache reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was day 4. Two and a half weeks later, the same framework has shipped 176 stories across 32 epics, documented 177 decisions, and governs 84,000 lines of TypeScript across 7 Cloudflare Workers. The governance didn't slow things down — it compounded.&lt;/p&gt;

&lt;p&gt;That last number deserves a sentence: 96.9% of all tokens were cache reads. The persistent context system meant the agents weren't regenerating knowledge from scratch every session. They were re-reading what was already established and building forward.&lt;/p&gt;




&lt;h2&gt;How I Validated the Problem Before Writing a Line of Code&lt;/h2&gt;

&lt;p&gt;The matching engine existed before anyone asked for it. That's the usual failure mode of solo builders: we fall in love with solutions before confirming problems.&lt;/p&gt;

&lt;p&gt;The Discovery Agent ran a parallel track while Delivery was building. 27 posts across 20 subreddits. Non-revealing — the product name never appeared in any of them. Just questions, observations, engagement. Community conversations about real pain.&lt;/p&gt;

&lt;p&gt;The posts are still getting comments weeks later.&lt;/p&gt;

&lt;p&gt;Here's what the community actually said:&lt;/p&gt;

&lt;p&gt;On expert lead quality: &lt;em&gt;"90% of inbound is just tire kickers looking for a $50 fix."&lt;/em&gt; — Littlecutsie, r/n8n_ai_agents&lt;/p&gt;

&lt;p&gt;On finding good experts as a buyer: &lt;em&gt;"Took me about 3 months to find someone who actually got our specific business logic."&lt;/em&gt; — Present-Access-2260, r/AiForSmallBusiness&lt;/p&gt;

&lt;p&gt;On the scarcity of real expertise: &lt;em&gt;"Real workflow architects who understand business logic are maybe 10-15% of that pool."&lt;/em&gt; — PathStoneAnalytics, r/AiForSmallBusiness&lt;/p&gt;

&lt;p&gt;On expert billing sophistication: &lt;em&gt;"Money upfront. Card on file. Auto billing enabled."&lt;/em&gt; — MachadoEsq, r/gohighlevel&lt;/p&gt;

&lt;p&gt;Zero competitors named across all 27 posts.&lt;/p&gt;

&lt;p&gt;One anti-signal worth sharing: someone asked &lt;em&gt;"Do you use AI to write all your posts or are you a bot?"&lt;/em&gt; — mycall, r/digitalnomad. The format was too polished. Reddit needs rougher writing. Lesson noted, applied.&lt;/p&gt;

&lt;p&gt;The signals directly changed the architecture. Budget was the #1 filter in every expert conversation — they didn't want leads who couldn't pay. That became DEC-67: a dynamic pricing model based on declared prospect budget (€49 for sub-5k prospects, €263 for 50k+ projects). Expert time waste became the case for making the matching engine the critical path, not a nice-to-have. Expert sophistication signals — "card on file", "modular or nothing", "if they want me to build the brain inside GHL, it's a hard no" — became admissibility controls in expert profiles (DEC-80).&lt;/p&gt;

&lt;p&gt;One quote changed the entire positioning framing: &lt;em&gt;"We stopped selling 'n8n workflows' and started selling 'found time'."&lt;/em&gt; — Littlecutsie, r/n8n_ai_agents. Outcome-based positioning. Not what you build. What you save.&lt;/p&gt;

&lt;p&gt;The Dual-Track proof: Discovery (Reddit) was running in parallel with Delivery (code) from day 1. By the time the matching engine was done, the pricing model and admissibility logic it runs on had already been validated by real people with real opinions.&lt;/p&gt;




&lt;h2&gt;Why Agents Need Governance&lt;/h2&gt;

&lt;p&gt;The problem isn't capability. Today's coding agents can write, test, refactor, and ship. The problem is what they do when you're not watching. They make architectural decisions you didn't authorize. They "improve" code that wasn't in scope. They forget what was decided yesterday and re-invent it differently today. They accumulate technical debt that only surfaces when you try to merge.&lt;/p&gt;

&lt;p&gt;MCP solves tool access. It doesn't solve behavior. Governance is what fills that gap — not at the ecosystem level, but at the session level, where the actual damage happens.&lt;/p&gt;

&lt;p&gt;Here are three concrete examples from this project:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 19-PR disaster (DEC-71).&lt;/strong&gt; Nineteen story branches accumulated unmerged over multiple delivery cycles. When they were finally batch-merged, cascading conflicts required three rounds of resolution — 60 conflict files, more than two hours of remediation work. The root cause: the delivery process had no mandatory merge step. Fix: one rule added to &lt;code&gt;conventions.md&lt;/code&gt;. Every PR gets merged immediately after QA passes. Never accumulate. One process change, problem gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The billing pivot (DEC-68).&lt;/strong&gt; The original billing model had structural friction — per-lead checkout with non-refundable payment processor fees. The decision trail showed exactly what depended on it. Six stories and four decision log entries later: a completely different billing architecture. Without the trail, this problem would have surfaced in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cal.com pivot (DEC-41).&lt;/strong&gt; Cal.com closed its Platform plan — the Managed Users API needed for multi-tenant booking — to new signups on 2025-12-15. Discovered in 30 seconds via the decision trail — the dependency was logged, the impact was clear, the pivot path was obvious. Google Calendar OAuth instead. Time lost: near zero.&lt;/p&gt;

&lt;p&gt;The memory system completes the picture. Session 1 establishes that postgres.js replaces Supabase JS for all DB queries (DEC-66). Session 2 doesn't re-evaluate it. Session 3 doesn't re-implement a Supabase client because the agent "forgot." The decision exists in memory, the agent reads it, and builds forward. Without persistent memory, every session starts from scratch — reinventing decisions, repeating mistakes, rebuilding what was already built.&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fr-e-d/GAAI-framework" rel="noopener noreferrer"&gt;GAAI is open-source on GitHub.&lt;/a&gt; The framework held up on a real, complex project — four days to first proof, two and a half weeks to 176 stories. It's ready to use.&lt;/p&gt;

&lt;p&gt;Callibrate is launching. The matching marketplace — the product GAAI built — will be live for AI experts and businesses shortly. If you're an AI consultant tired of unqualified leads, or a company that's spent months trying to find someone who actually understands your business logic, it was built for you.&lt;/p&gt;

&lt;p&gt;This post is the first public artefact. The build log already exists — 177 decisions, 176 stories, all documented. It just needs to be made public in pieces.&lt;/p&gt;

&lt;p&gt;Follow on X (&lt;a href="https://x.com/frederic_geens" rel="noopener noreferrer"&gt;@frederic_geens&lt;/a&gt;) for build-in-public updates — threads on specific governance decisions, architecture choices, and what the data actually shows.&lt;/p&gt;

&lt;p&gt;Subscribe on Substack for the monthly milestone posts — deeper dives, less frequent, built for reading, not scrolling.&lt;/p&gt;

&lt;p&gt;If you're using Claude Code and wrestling with unstructured sessions — agents making decisions you didn't authorize, context lost between sessions, no way to understand what happened or why — &lt;a href="https://github.com/Fr-e-d/GAAI-framework" rel="noopener noreferrer"&gt;the framework might be exactly what you need.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The digital detox was my partner's idea. This post exists because of her.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
