<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: rp1run</title>
    <description>The latest articles on Forem by rp1run (@rp1run).</description>
    <link>https://forem.com/rp1run</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3833224%2F1e69d4c0-9794-4a20-8b0e-a4b49eeca031.png</url>
      <title>Forem: rp1run</title>
      <link>https://forem.com/rp1run</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rp1run"/>
    <language>en</language>
    <item>
      <title>Why we ship untested prompts (and the supply-chain pattern that fixes it)</title>
      <dc:creator>rp1run</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:00:00 +0000</pubDate>
      <link>https://forem.com/rp1run/why-we-ship-untested-prompts-and-the-supply-chain-pattern-that-fixes-it-291d</link>
      <guid>https://forem.com/rp1run/why-we-ship-untested-prompts-and-the-supply-chain-pattern-that-fixes-it-291d</guid>
      <description>&lt;p&gt;I'd never approve a PR that bypassed CI.&lt;/p&gt;

&lt;p&gt;But I've watched dozens of teams — including ones I've worked on — deploy prompt changes with zero of the verification we'd insist on for a code change. Edit a string in a config file. Push. Hope.&lt;/p&gt;

&lt;p&gt;A prompt change is a logic change. It alters how the system behaves under uncertainty, what it returns under load, and how it handles edge cases nobody enumerated. The fact that it's text and not Python doesn't change what it does.&lt;/p&gt;

&lt;p&gt;The gap between how we deploy code and how we deploy prompts is going to bite hard as agentic systems scale. And the answer might already exist — in the tooling the supply-chain security world has been building for the last five years.&lt;/p&gt;

&lt;h2&gt;The supply-chain parallel&lt;/h2&gt;

&lt;p&gt;Sigstore, SLSA, in-toto. These tools solved a related problem for binaries: how do you cryptographically prove that the artifact in production is the one that passed your checks?&lt;/p&gt;

&lt;h3&gt;The primitives:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content-addressable hashing.&lt;/strong&gt; Identify the artifact by the hash of its content. Two artifacts with the same hash are identical, byte-for-byte.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signed attestations.&lt;/strong&gt; A cryptographic statement: "this hash passed this evaluation, witnessed by this entity."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification gates.&lt;/strong&gt; Deployment refuses any artifact without a valid attestation.&lt;/li&gt;
&lt;/ul&gt;
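&lt;p&gt;The first primitive is a few lines in practice. A minimal sketch in Python; the whitespace normalization is a policy choice I'm adding here, not part of the primitive itself:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib

def prompt_hash(prompt_text):
    # Canonicalize first so a whitespace-only edit doesn't mint a "new" prompt.
    canonical = prompt_text.strip().encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
&lt;/code&gt;&lt;/pre&gt;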

&lt;h3&gt;Applied to prompts:&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Hash the prompt text. &lt;code&gt;prompt[sha256:abc123...]&lt;/code&gt; is now uniquely identifiable.&lt;/li&gt;
&lt;li&gt;Run your eval suite against that exact hash.&lt;/li&gt;
&lt;li&gt;Generate a signed attestation: "prompt[abc123] passed eval suite v2 on date X."&lt;/li&gt;
&lt;li&gt;Production deployment verifies the attestation before promoting.&lt;/li&gt;
&lt;/ol&gt;
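&lt;p&gt;A sketch of steps 3 and 4, with the caveats stated up front: the HMAC is a stand-in for a real detached signature (in practice, Sigstore or an ed25519 key), and the statement fields are illustrative rather than a spec:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib, hmac, json

SIGNING_KEY = b"dev-only-secret"  # stand-in for real key material

def digest(prompt_text):
    return "sha256:" + hashlib.sha256(prompt_text.strip().encode("utf-8")).hexdigest()

def attest(prompt_text, eval_suite, model):
    # Step 3: a signed statement binding the exact prompt hash to an eval run.
    statement = {"subject": digest(prompt_text), "eval_suite": eval_suite, "model": model}
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"statement": statement, "signature": sig}

def gate(prompt_text, attestation):
    # Step 4: deployment refuses anything without a valid attestation.
    payload = json.dumps(attestation["statement"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["signature"]):
        raise SystemExit("refusing deploy: signature does not verify")
    if attestation["statement"]["subject"] != digest(prompt_text):
        raise SystemExit("refusing deploy: attestation covers a different prompt")
&lt;/code&gt;&lt;/pre&gt;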

&lt;p&gt;Now "what prompt is in production?" has an answer that doesn't depend on git archaeology or trusting a config dashboard.&lt;/p&gt;

&lt;h2&gt;What this doesn't solve&lt;/h2&gt;

&lt;p&gt;This is the part most discussions of prompt evaluation skip over.&lt;/p&gt;

&lt;p&gt;Eval reproducibility is non-trivial when the underlying model version drifts. An attestation from last month against &lt;code&gt;gpt-4o-2024-08-06&lt;/code&gt; doesn't tell you anything about behaviour against &lt;code&gt;gpt-4o-2024-11-20&lt;/code&gt;. Either you pin model versions in the attestation (and accept the operational cost of staying on old models), or you re-attest on every model version change (and accept the eval cost). There's no free lunch.&lt;/p&gt;
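&lt;p&gt;If you take the pinning route, the gate can enforce it directly. Extending the hypothetical attestation shape from the sketch above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def covers(attestation, current_model):
    # Pinning policy: the attestation only counts for the exact model version
    # the evals ran against; any drift forces a re-attest.
    return attestation["statement"]["model"] == current_model
&lt;/code&gt;&lt;/pre&gt;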

&lt;p&gt;There's also the question of whether "passing evals" is actually the right gate. Code passes tests but can still ship bugs. Prompt evals are coarser — they sample behaviour; they don't prove correctness.&lt;/p&gt;

&lt;h2&gt;The bigger question&lt;/h2&gt;

&lt;p&gt;Are prompts code or configuration?&lt;/p&gt;

&lt;p&gt;Most teams haven't picked, which is why they fall into the worst of both: edited freely like config, executing logic like code. Picking one would mean deciding whether prompts go through a CI pipeline (code-treated) or a configuration management system with rollback (config-treated). Either is better than the current default of "text in a file, deployed by whoever has commit access."&lt;/p&gt;

&lt;p&gt;Prem Pillai (&lt;a class="mentioned-user" href="https://dev.to/cloud-on-prem"&gt;@cloud-on-prem&lt;/a&gt;) wrote a longer treatment of the architecture and its gaps in a &lt;a href="https://blog.rp1.run/stop-shipping-untested-prompts-content-addressable-eval-attestation-for-agentic-systems-eabe35125454" rel="noopener noreferrer"&gt;post on the rp1 blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're working on prompt evaluation, deployment pipelines for agentic systems, or just struggling with the operational chaos of prompt management at scale — we have a &lt;a href="https://discord.gg/WYQEvaDjpk" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; where engineers are talking through these patterns.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>security</category>
      <category>discuss</category>
    </item>
    <item>
      <title>AI built your codebase in 2 months. Who's going to maintain it?</title>
      <dc:creator>rp1run</dc:creator>
      <pubDate>Wed, 22 Apr 2026 08:47:44 +0000</pubDate>
      <link>https://forem.com/rp1run/ai-built-your-codebase-in-2-months-whos-going-to-maintain-it-30eb</link>
      <guid>https://forem.com/rp1run/ai-built-your-codebase-in-2-months-whos-going-to-maintain-it-30eb</guid>
      <description>&lt;p&gt;Cloudflare shipped EmDash in April 2026 — an open-source CMS written in TypeScript, built in ~2 months by AI coding agents. It's a genuinely impressive achievement and a real signal of where the industry is going.&lt;/p&gt;

&lt;p&gt;But it also surfaces a question that the AI coding conversation has been avoiding: &lt;strong&gt;what happens after the AI ships the first version?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The "plans that read well don't build well" problem&lt;/h2&gt;

&lt;p&gt;There's a failure mode I keep seeing in AI-assisted codebases. The initial build is fast. The prose in the plan reads authoritatively. The code compiles and the tests pass. Three weeks later, the second engineer tries to extend it, and nothing quite fits — because the agent's narrative was persuasive without being correct about the underlying constraints.&lt;/p&gt;

&lt;p&gt;This isn't a model problem. Frontier models will keep getting better at writing plausible code. It's a &lt;strong&gt;workflow problem.&lt;/strong&gt; The missing layer is the one that turns ephemeral agent sessions into durable, reviewable architectural decisions.&lt;/p&gt;

&lt;h2&gt;What the missing layer looks like&lt;/h2&gt;

&lt;p&gt;We have been building &lt;a href="https://rp1.run" rel="noopener noreferrer"&gt;rp1&lt;/a&gt; with this exact gap in mind. Three ideas, each directly addressing a specific failure mode.&lt;/p&gt;

&lt;h3&gt;1. Constitutional prompting&lt;/h3&gt;

&lt;p&gt;Most "prompt engineering" is additive — you stack instructions on top of a model and hope. Constitutional prompting is subtractive: workflows encode the patterns an expert would follow &lt;em&gt;as constraints&lt;/em&gt;. &lt;a href="https://rp1.run/reference/dev/build/" rel="noopener noreferrer"&gt;&lt;code&gt;/build&lt;/code&gt;&lt;/a&gt; isn't a prompt, it's a pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate a blueprint from requirements&lt;/li&gt;
&lt;li&gt;Form a hypothesis about the existing codebase&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Validate the hypothesis against actual code before writing anything&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Implement against the validated plan&lt;/li&gt;
&lt;li&gt;Run verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hypothesis validation step is the one that catches the "plan reads well but is wrong about your ListView" class of bug.&lt;/p&gt;
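&lt;p&gt;A toy sketch of the shape of that gate — not the actual &lt;code&gt;/build&lt;/code&gt; implementation, and with a file-existence check standing in for much richer validation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass
class Evidence:
    confirmed: bool
    reason: str = ""

def validate_hypothesis(hypothesis, codebase_files):
    # Toy check: every file the plan claims to touch must actually exist.
    missing = [f for f in hypothesis["files"] if f not in codebase_files]
    if missing:
        return Evidence(False, "plan references missing files: " + ", ".join(missing))
    return Evidence(True)

def build(hypothesis, codebase_files):
    evidence = validate_hypothesis(hypothesis, codebase_files)
    if not evidence.confirmed:
        # The constraint: a plan that fails validation never reaches implementation.
        raise RuntimeError(evidence.reason)
    return "implement against the validated plan"
&lt;/code&gt;&lt;/pre&gt;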

&lt;h3&gt;2. Knowledge-aware agents&lt;/h3&gt;

&lt;p&gt;Most AI coding sessions start blank. You re-explain your architecture every time. rp1's &lt;a href="https://rp1.run/reference/base/knowledge-build/" rel="noopener noreferrer"&gt;&lt;code&gt;/knowledge-build&lt;/code&gt;&lt;/a&gt; runs once and maps your codebase into a persistent knowledge base that every subsequent command inherits.&lt;/p&gt;

&lt;p&gt;The practical effect: you stop getting generic advice that ignores your patterns. Every &lt;a href="https://rp1.run/reference/dev/build/" rel="noopener noreferrer"&gt;&lt;code&gt;/build&lt;/code&gt;&lt;/a&gt; starts with full awareness of the actual system, not an imagined one.&lt;/p&gt;

&lt;h3&gt;3. Durable artefacts&lt;/h3&gt;

&lt;p&gt;Every rp1 workflow produces inspectable design documents — requirements, design, hypothesis, verification, reports — attached to the project, not trapped in chat scrollback.&lt;/p&gt;

&lt;p&gt;This is the onboarding primitive. When the second engineer joins an AI-built codebase, they can read &lt;em&gt;what was decided and why&lt;/em&gt; instead of re-prompting their way to an understanding.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rp1.run" rel="noopener noreferrer"&gt;rp1&lt;/a&gt; is open source and works across Claude Code, OpenCode, Codex, and GitHub Copilot CLI. Same workflows, different harnesses.&lt;/p&gt;

&lt;p&gt;The full write-up on how this plays out specifically for EmDash-style codebases is on our blog: &lt;a href="https://blog.rp1.run/rp1-on-emdash-the-workflow-layer-that-makes-ai-built-codebases-navigable-73802187141d" rel="noopener noreferrer"&gt;rp1 on EmDash — the workflow layer that makes AI-built codebases navigable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're maintaining a codebase an agent wrote, Prem and I would genuinely like to hear what's broken. That's the feedback that's shaped everything we've built so far.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;rp1 is built by Prem Pillai (&lt;a href="https://x.com/cloud_on_prem" rel="noopener noreferrer"&gt;@cloud_on_prem&lt;/a&gt;) and Mahesh Shivamallappa (&lt;a href="https://x.com/maheshs786" rel="noopener noreferrer"&gt;@maheshs786&lt;/a&gt;).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>emdash</category>
    </item>
  </channel>
</rss>
