<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rahul Kashayp</title>
    <description>The latest articles on Forem by Rahul Kashayp (@rahul_kashayp_700d6c4673a).</description>
    <link>https://forem.com/rahul_kashayp_700d6c4673a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3754630%2Fbb2a2871-e607-458c-9198-fa1fd0e8e975.jpg</url>
      <title>Forem: Rahul Kashayp</title>
      <link>https://forem.com/rahul_kashayp_700d6c4673a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rahul_kashayp_700d6c4673a"/>
    <language>en</language>
    <item>
      <title>I Built Git for LLM Prompts — Here is What 332 Tests Taught Me</title>
      <dc:creator>Rahul Kashayp</dc:creator>
      <pubDate>Thu, 05 Feb 2026 10:13:40 +0000</pubDate>
      <link>https://forem.com/rahul_kashayp_700d6c4673a/i-built-git-for-llm-prompts-here-is-what-332-tests-taught-me-1gki</link>
      <guid>https://forem.com/rahul_kashayp_700d6c4673a/i-built-git-for-llm-prompts-here-is-what-332-tests-taught-me-1gki</guid>
      <description>&lt;h1&gt;
  
  
  I Built "Git for Prompts" — Here is What 332 Tests Taught Me
&lt;/h1&gt;

&lt;p&gt;I was managing 50+ LLM prompts in Google Docs.&lt;/p&gt;

&lt;p&gt;It broke my production AI &lt;strong&gt;3 times in one month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each time, I spent hours manually testing versions to find what changed.&lt;/p&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Git works great for code. But prompts are different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic changes matter more than text diffs&lt;/strong&gt;: changing "be concise" to "be thorough" is a one-word edit but a real behavioral shift&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version history is scattered&lt;/strong&gt;: Google Docs, Notion, or worse, inline comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No way to query by performance&lt;/strong&gt;: which version had the best success rate?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sharing improvements is manual&lt;/strong&gt;: copy-paste and hope you do not break anything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I needed version control that &lt;strong&gt;understands prompts&lt;/strong&gt;, not just tracks text.&lt;/p&gt;




&lt;h2&gt;
  
  
  Meet PIT (Prompt Information Tracker)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;prompt-pit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PIT is "Git for prompts" — semantic version control designed for LLM workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Binary Search for Broken Versions
&lt;/h3&gt;

&lt;p&gt;Your AI started giving weird answers. Which version broke it?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit bisect start &lt;span class="nt"&gt;--failing-input&lt;/span&gt; &lt;span class="s2"&gt;"why is the sky blue?"&lt;/span&gt;
pit bisect good v1
pit bisect bad v50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary search narrows 50 versions down to the culprit in at most six checks. &lt;strong&gt;Minutes, not hours.&lt;/strong&gt;&lt;/p&gt;
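&lt;p&gt;The idea behind bisecting can be sketched in a few lines of Python. This illustrates plain binary search, not PIT's implementation: the &lt;code&gt;bisect_first_bad&lt;/code&gt; helper and the &lt;code&gt;is_bad&lt;/code&gt; check are hypothetical names, and the sketch assumes that once a version fails the input, every later version fails too.&lt;/p&gt;

```python
from typing import Callable, Sequence


def bisect_first_bad(
    versions: Sequence[str], is_bad: Callable[[str], bool]
) -> tuple[str, int]:
    """Return the first bad version and the number of checks used.

    Assumes versions[0] is known good and versions[-1] is known bad.
    """
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    checks = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(versions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return versions[hi], checks


versions = [f"v{i}" for i in range(1, 51)]


# Hypothetical check: pretend the regression landed in v37.
def is_bad(version: str) -> bool:
    return int(version[1:]) >= 37


culprit, checks = bisect_first_bad(versions, is_bad)
print(culprit, checks)
```

&lt;p&gt;Each check halves the remaining range, which is why 50 versions never need more than six checks. In a real run, &lt;code&gt;is_bad&lt;/code&gt; would be whatever decides that the failing input still misbehaves.&lt;/p&gt;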

&lt;h3&gt;
  
  
  2. Time-Travel Replay
&lt;/h3&gt;

&lt;p&gt;Same input. 50 versions. Side-by-side comparison.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit replay run my-prompt &lt;span class="nt"&gt;--input&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt; &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See exactly how behavior evolved. No more "it worked yesterday" mysteries.&lt;/p&gt;
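&lt;p&gt;Conceptually, a replay is a loop over versions with the input held fixed. The sketch below is an assumption about the general technique, not PIT's code: &lt;code&gt;replay&lt;/code&gt;, &lt;code&gt;changed_versions&lt;/code&gt;, and &lt;code&gt;fake_model&lt;/code&gt; are hypothetical names, and &lt;code&gt;fake_model&lt;/code&gt; stands in for a real LLM call.&lt;/p&gt;

```python
from typing import Callable


def replay(
    versions: dict[str, str], user_input: str, run: Callable[[str, str], str]
) -> dict[str, str]:
    """Run the same input against every prompt version, in order."""
    return {name: run(prompt, user_input) for name, prompt in versions.items()}


def changed_versions(outputs: dict[str, str]) -> list[str]:
    """List the versions whose output differs from the previous version."""
    names = list(outputs)
    return [b for a, b in zip(names, names[1:]) if outputs[a] != outputs[b]]


# Stand-in for a model call (hypothetical): it only reacts to the word "shout".
def fake_model(prompt: str, user_input: str) -> str:
    reply = f"Echo: {user_input}"
    return reply.upper() if "shout" in prompt else reply


versions = {
    "v1": "Be concise.",
    "v2": "Be concise. Always shout.",
    "v3": "Be thorough. Always shout.",
}
outputs = replay(versions, "Hello", fake_model)
print(outputs)
print(changed_versions(outputs))
```

&lt;p&gt;With a real model the outputs are rarely byte-identical between runs, so a comparison like this would likely need a similarity score or a fixed sampling seed instead of strict equality.&lt;/p&gt;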

&lt;h3&gt;
  
  
  3. Query by Behavior
&lt;/h3&gt;

&lt;p&gt;Find versions that actually matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit log &lt;span class="nt"&gt;--where&lt;/span&gt; &lt;span class="s2"&gt;"success_rate &amp;gt; 0.9"&lt;/span&gt;
pit log &lt;span class="nt"&gt;--where&lt;/span&gt; &lt;span class="s2"&gt;"content contains 'be concise' AND tags contains production"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query by metrics, not just metadata.&lt;/p&gt;
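&lt;p&gt;To make the two queries above concrete, here is a minimal sketch of the same filters written as plain Python predicates. The &lt;code&gt;VersionRecord&lt;/code&gt; fields and the sample data are assumptions for illustration; PIT's actual storage and query parser may look quite different.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VersionRecord:
    # Hypothetical record shape; field names mirror the query examples.
    name: str
    content: str
    success_rate: float
    tags: set[str] = field(default_factory=set)


def query(
    records: list[VersionRecord], predicate: Callable[[VersionRecord], bool]
) -> list[str]:
    """Return the names of all versions that satisfy the predicate."""
    return [r.name for r in records if predicate(r)]


# Made-up sample data, purely for illustration.
records = [
    VersionRecord("v1", "You are helpful. be concise.", 0.82, {"production"}),
    VersionRecord("v2", "Answer carefully and be thorough.", 0.93, {"staging"}),
    VersionRecord("v3", "be concise and cite sources.", 0.95, {"production"}),
]

# success_rate > 0.9
high = query(records, lambda r: r.success_rate > 0.9)

# content contains 'be concise' AND tags contains production
concise_prod = query(
    records, lambda r: "be concise" in r.content and "production" in r.tags
)

print(high)
print(concise_prod)
```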

&lt;h3&gt;
  
  
  4. Shareable Patches
&lt;/h3&gt;

&lt;p&gt;Your teammate improved a prompt. You want that improvement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit patch create prompt v1 v2 &lt;span class="nt"&gt;--output&lt;/span&gt; fix.patch
pit patch apply fix.patch &lt;span class="nt"&gt;--to&lt;/span&gt; my-prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Like Git patches, but for prompt semantics.&lt;/p&gt;
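&lt;p&gt;For comparison, this is what a purely textual patch between two prompt versions looks like, using Python's standard &lt;code&gt;difflib&lt;/code&gt;. It is only a sketch of the text-level baseline that PIT builds on top of, not PIT's patch format; the prompt lines and the &lt;code&gt;prompt@v1&lt;/code&gt; labels are made up.&lt;/p&gt;

```python
import difflib

# Two made-up prompt versions, one instruction per line.
v1 = ["You are a support assistant.", "Be concise.", "Cite the docs."]
v2 = ["You are a support assistant.", "Be thorough.", "Cite the docs."]

# Build a unified diff, the same text format Git uses for patches.
patch = list(
    difflib.unified_diff(
        v1, v2, fromfile="prompt@v1", tofile="prompt@v2", lineterm=""
    )
)

# Keep only the added and removed lines, skipping the file headers.
changed = [
    line
    for line in patch
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(patch))
print(changed)
```

&lt;p&gt;The diff shows that one line changed, but not that the change flips the prompt's behavior. That gap is the reason a text patch alone is not enough.&lt;/p&gt;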

&lt;h3&gt;
  
  
  5. Git-Style Hooks
&lt;/h3&gt;

&lt;p&gt;Prevent bad prompts from reaching production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit hooks &lt;span class="nb"&gt;install &lt;/span&gt;pre-commit
&lt;span class="c"&gt;# Scans for security issues before every commit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A pre-commit gate for prompts. Finally.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Dependencies
&lt;/h3&gt;

&lt;p&gt;Your prompts depend on other prompts. Track it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pit deps add shared github org/repo/prompts &lt;span class="nt"&gt;--version&lt;/span&gt; v1.0
pit deps &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Like npm for prompts. Version-lock everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Feature Set
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bisect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Binary search to find broken versions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Replay&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Test same input across all versions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Patches&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Export/import prompt changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hooks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-commit, post-checkout automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bundles&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Package and share prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search by behavior metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External prompt packages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Worktrees&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple contexts without switching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Save WIP with test context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic Merge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Smart conflict detection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;332 tests. Production-ready. Open source.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Prompts are becoming &lt;strong&gt;critical infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just as we do not deploy code without version control, we should not deploy prompts without it.&lt;/p&gt;

&lt;p&gt;PIT brings software engineering discipline to prompt engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traceability (who changed what, when, why)&lt;/li&gt;
&lt;li&gt;Reproducibility (checkout any version instantly)&lt;/li&gt;
&lt;li&gt;Collaboration (patches, bundles, dependencies)&lt;/li&gt;
&lt;li&gt;Quality (hooks, testing, metrics)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;prompt-pit
pit init
pit add my-prompt.md &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"my-prompt"&lt;/span&gt;
pit commit my-prompt &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"Initial version"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⭐ &lt;strong&gt;Star it on GitHub&lt;/strong&gt;: &lt;a href="https://github.com/itisrmk/pit" rel="noopener noreferrer"&gt;github.com/itisrmk/pit&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Your Biggest Prompt Management Pain?
&lt;/h2&gt;

&lt;p&gt;I built PIT to solve my own headaches.&lt;/p&gt;

&lt;p&gt;But I am curious — what frustrates you most about managing prompts in production?&lt;/p&gt;

&lt;p&gt;Drop a comment below 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;PIT is free, open source (MIT), and built with Python + Rich + Typer.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
