<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ertugrul</title>
    <description>The latest articles on Forem by Ertugrul (@ertugrulmutlu).</description>
    <link>https://forem.com/ertugrulmutlu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1342871%2F80fb281c-da45-4841-b3a7-762c28f06416.jpg</url>
      <title>Forem: Ertugrul</title>
      <link>https://forem.com/ertugrulmutlu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ertugrulmutlu"/>
    <language>en</language>
    <item>
      <title>OpenAnima v1 — Open-Source Desktop Overlay Engine for Windows</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Fri, 15 May 2026 06:15:20 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/openanima-v1-open-source-desktop-overlay-engine-for-windows-3m9c</link>
      <guid>https://forem.com/ertugrulmutlu/openanima-v1-open-source-desktop-overlay-engine-for-windows-3m9c</guid>
      <description>&lt;h1&gt;
  
  
  OpenAnima v1 — Open-Source Desktop Overlay Engine for Windows
&lt;/h1&gt;

&lt;p&gt;After months of building, experimenting, rewriting systems, debugging strange desktop issues, and testing different asset formats, OpenAnima v1 is finally out.&lt;/p&gt;

&lt;p&gt;OpenAnima is an open-source desktop overlay engine for Windows that lets you place animated assets directly onto your desktop.&lt;/p&gt;

&lt;p&gt;You can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GIFs&lt;/li&gt;
&lt;li&gt;Static images&lt;/li&gt;
&lt;li&gt;Sprite strips&lt;/li&gt;
&lt;li&gt;Spritesheets&lt;/li&gt;
&lt;li&gt;Frame-folder animations&lt;/li&gt;
&lt;li&gt;RPG-style HUD elements&lt;/li&gt;
&lt;li&gt;Transparent animated assets&lt;/li&gt;
&lt;li&gt;Pixel-art characters&lt;/li&gt;
&lt;li&gt;Desktop companions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of the project was simple:&lt;/p&gt;

&lt;p&gt;Create a lightweight desktop overlay system that feels flexible, customizable, and fun to experiment with.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenAnima Can Do
&lt;/h2&gt;

&lt;p&gt;OpenAnima allows you to spawn movable overlay windows directly on top of your desktop.&lt;/p&gt;

&lt;p&gt;Each overlay can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dragged freely&lt;/li&gt;
&lt;li&gt;resized&lt;/li&gt;
&lt;li&gt;locked in place&lt;/li&gt;
&lt;li&gt;made click-through&lt;/li&gt;
&lt;li&gt;set always-on-top&lt;/li&gt;
&lt;li&gt;adjusted for opacity&lt;/li&gt;
&lt;li&gt;animated with custom FPS settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The application also restores overlay states between sessions, so your desktop setup persists after restarting the app.&lt;/p&gt;




&lt;h2&gt;
  
  
  Supported Asset Types
&lt;/h2&gt;

&lt;p&gt;One of the biggest goals of the project was supporting multiple animation workflows instead of only GIFs.&lt;/p&gt;

&lt;p&gt;Currently supported:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GIF animations&lt;/li&gt;
&lt;li&gt;Static images (.png, .jpg, .webp)&lt;/li&gt;
&lt;li&gt;Frame-folder animations&lt;/li&gt;
&lt;li&gt;Horizontal sprite strips&lt;/li&gt;
&lt;li&gt;Vertical sprite strips&lt;/li&gt;
&lt;li&gt;Spritesheets with metadata&lt;/li&gt;
&lt;li&gt;Basic HUD/UI overlay assets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engine also includes an asset analyzer system that tries to detect asset types automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built With
&lt;/h2&gt;

&lt;p&gt;OpenAnima is primarily built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;PyQt6&lt;/li&gt;
&lt;li&gt;QMovie / QPixmap rendering systems&lt;/li&gt;
&lt;li&gt;Custom animation parsing logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of time went into handling edge cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hidden overlays&lt;/li&gt;
&lt;li&gt;broken configs&lt;/li&gt;
&lt;li&gt;off-screen windows&lt;/li&gt;
&lt;li&gt;click-through recovery&lt;/li&gt;
&lt;li&gt;corrupted animation states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;v1 focuses heavily on stability and recovery tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Features Added During Development
&lt;/h2&gt;

&lt;p&gt;Some systems added throughout development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent overlay state saving&lt;/li&gt;
&lt;li&gt;Safer config recovery&lt;/li&gt;
&lt;li&gt;Recovery tools for invisible overlays&lt;/li&gt;
&lt;li&gt;Logging system&lt;/li&gt;
&lt;li&gt;Diagnostics panel&lt;/li&gt;
&lt;li&gt;Asset metadata support&lt;/li&gt;
&lt;li&gt;Import wizard&lt;/li&gt;
&lt;li&gt;Sprite animation handling&lt;/li&gt;
&lt;li&gt;Basic layered UI rendering&lt;/li&gt;
&lt;li&gt;System tray controls&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why I Built It
&lt;/h2&gt;

&lt;p&gt;I always liked desktop customization tools, animated desktop companions, game HUD overlays, and lightweight desktop effects.&lt;/p&gt;

&lt;p&gt;Most existing solutions were either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too limited&lt;/li&gt;
&lt;li&gt;too complicated&lt;/li&gt;
&lt;li&gt;too specific&lt;/li&gt;
&lt;li&gt;abandoned&lt;/li&gt;
&lt;li&gt;or focused on only one asset type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I started experimenting with building my own system.&lt;/p&gt;

&lt;p&gt;The project slowly evolved from “just display GIFs on desktop” into a more general desktop overlay engine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Future Plans
&lt;/h2&gt;

&lt;p&gt;Some ideas planned for future versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better transparent video support&lt;/li&gt;
&lt;li&gt;More advanced animation systems&lt;/li&gt;
&lt;li&gt;Improved UI tools&lt;/li&gt;
&lt;li&gt;Better performance optimizations&lt;/li&gt;
&lt;li&gt;Asset marketplace/import improvements&lt;/li&gt;
&lt;li&gt;More overlay interaction systems&lt;/li&gt;
&lt;li&gt;Possible 3D asset support in the future&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Download
&lt;/h2&gt;

&lt;p&gt;Website / Download:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAnima Website: &lt;a href="https://ertugrulmutlu.github.io/OpenAnima" rel="noopener noreferrer"&gt;https://ertugrulmutlu.github.io/OpenAnima/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub Repository:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAnima GitHub Repository: &lt;a href="https://github.com/Ertugrulmutlu/OpenAnima" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/OpenAnima&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;itch.io Page:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAnima itch.io Page: &lt;a href="https://ertugrulmutlu.itch.io/openanima" rel="noopener noreferrer"&gt;https://ertugrulmutlu.itch.io/openanima&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instagram:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAnima Instagram: &lt;a href="https://www.instagram.com/openanimaengine" rel="noopener noreferrer"&gt;https://www.instagram.com/openanimaengine&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;OpenAnima started as a small side project and slowly became a much larger system than I originally expected.&lt;/p&gt;

&lt;p&gt;There is still a lot to improve, but releasing v1 feels like a huge milestone for the project.&lt;/p&gt;

&lt;p&gt;Feedback, ideas, bug reports, and experiments are always welcome.&lt;/p&gt;

&lt;p&gt;I’m excited to see what people create with it.&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>PromptLedger v0.6 — Turning prompt history into a local workspace dashboard</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Mon, 04 May 2026 16:23:46 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/promptledger-v06-turning-prompt-history-into-a-local-workspace-dashboard-3cen</link>
      <guid>https://forem.com/ertugrulmutlu/promptledger-v06-turning-prompt-history-into-a-local-workspace-dashboard-3cen</guid>
      <description>&lt;h2&gt;
  
  
  Devlog — Part 5
&lt;/h2&gt;

&lt;p&gt;PromptLedger v0.6 is out.&lt;/p&gt;

&lt;p&gt;This release changes how PromptLedger feels to use.&lt;/p&gt;

&lt;p&gt;Until now, PromptLedger was primarily a terminal-first tool with a small read-only viewer. The core workflow already existed: store prompt versions, compare changes, label releases, mark important versions, and keep everything local in SQLite.&lt;/p&gt;

&lt;p&gt;That worked.&lt;/p&gt;

&lt;p&gt;But once a prompt library grows, a simple viewer stops being enough.&lt;/p&gt;

&lt;p&gt;Prompt iteration is not just about storing text. It is about navigating decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which version worked best?&lt;/li&gt;
&lt;li&gt;Which version became stable?&lt;/li&gt;
&lt;li&gt;What changed between versions?&lt;/li&gt;
&lt;li&gt;Which prompt belongs to which workflow?&lt;/li&gt;
&lt;li&gt;Which prompt should be reused?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions are easier to answer when the history is not only stored, but also visible and interactive.&lt;/p&gt;

&lt;p&gt;So v0.6 turns the old viewer into a local prompt workspace dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workspace dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard now starts with a card-based workspace instead of a single prompt view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpruubjud6h477k1pzua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpruubjud6h477k1pzua.png" alt=" " width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each prompt appears as a card showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latest version&lt;/li&gt;
&lt;li&gt;collection and role&lt;/li&gt;
&lt;li&gt;markers such as stable or milestone&lt;/li&gt;
&lt;li&gt;short preview of the prompt&lt;/li&gt;
&lt;li&gt;last updated time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it easier to scan a prompt library and understand what exists without opening each item.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt detail view
&lt;/h2&gt;

&lt;p&gt;Clicking a card opens a detailed view of the prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ukgab6e3ixuvfr39js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ukgab6e3ixuvfr39js.png" alt=" " width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This view includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full prompt text&lt;/li&gt;
&lt;li&gt;version metadata&lt;/li&gt;
&lt;li&gt;markers and labels&lt;/li&gt;
&lt;li&gt;version timeline&lt;/li&gt;
&lt;li&gt;side-by-side comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to make it easy to understand how a prompt evolved over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed in v0.6
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workspace instead of viewer
&lt;/h3&gt;

&lt;p&gt;The dashboard is no longer just a viewer. It is a workspace where prompts can be explored and organized visually.&lt;/p&gt;

&lt;h3&gt;
  
  
  Card-based interaction
&lt;/h3&gt;

&lt;p&gt;Prompts are now treated as objects instead of rows in a list. Cards provide quick context and actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Marker actions in the UI
&lt;/h3&gt;

&lt;p&gt;Stable and milestone markers can now be applied directly from the dashboard. These actions use the same underlying marker system as the CLI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compare workflow
&lt;/h3&gt;

&lt;p&gt;The compare view has been improved to clearly show differences between versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Usability improvements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;keyboard shortcuts for faster navigation&lt;/li&gt;
&lt;li&gt;copy actions with feedback&lt;/li&gt;
&lt;li&gt;better empty states&lt;/li&gt;
&lt;li&gt;improved hover and selection behavior&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Design direction
&lt;/h2&gt;

&lt;p&gt;PromptLedger remains intentionally limited in scope.&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local-first&lt;/li&gt;
&lt;li&gt;SQLite-backed&lt;/li&gt;
&lt;li&gt;CLI-driven for write operations&lt;/li&gt;
&lt;li&gt;dashboard-driven for inspection and workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does not include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cloud services&lt;/li&gt;
&lt;li&gt;telemetry&lt;/li&gt;
&lt;li&gt;external APIs&lt;/li&gt;
&lt;li&gt;AI features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This boundary is important.&lt;/p&gt;

&lt;p&gt;The goal is not to build another platform, but to provide a reliable tool for working with prompt history.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; promptledger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run the dashboard
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;v0.6 is a shift from storing prompt history to working with it.&lt;/p&gt;

&lt;p&gt;The dashboard is not meant to replace the CLI, but to complement it by making prompt iteration easier to inspect, compare, and organize.&lt;/p&gt;

&lt;p&gt;There is still a lot to improve, but this version establishes a clearer direction:&lt;/p&gt;

&lt;h2&gt;
  
  
  PromptLedger is becoming a tool for thinking about prompts, not just storing them.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;PyPI: &lt;a href="https://pypi.org/project/promptledger/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ertugrulmutlu/promptledger" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://www.linkedin.com/in/ertugrul-mutlu/?locale=en" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>python</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>OpenAnima v0.2 Preview: Turning the Windows Desktop into a Living Canvas</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Fri, 01 May 2026 10:31:23 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/openanima-v02-preview-turning-the-windows-desktop-into-a-living-canvas-11lh</link>
      <guid>https://forem.com/ertugrulmutlu/openanima-v02-preview-turning-the-windows-desktop-into-a-living-canvas-11lh</guid>
      <description>&lt;h1&gt;
  
  
  OpenAnima v0.2 Preview: Turning the Windows Desktop into a Living Canvas
&lt;/h1&gt;

&lt;p&gt;I recently published &lt;strong&gt;OpenAnima v0.2 Preview&lt;/strong&gt;, and this release is a big step for the project.&lt;/p&gt;

&lt;p&gt;OpenAnima started as a small experiment: what if I could place animated GIFs directly on my Windows desktop as movable overlay objects?&lt;/p&gt;

&lt;p&gt;That simple idea quickly became more interesting.&lt;/p&gt;

&lt;p&gt;Instead of only supporting GIFs, OpenAnima is now moving toward a more general goal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An open-source desktop asset overlay engine for Windows.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The idea is to make the desktop feel less static. OpenAnima lets you place animated assets, sprites, frame animations, HUD elements, and game-style visual assets directly on your desktop, then control how they behave.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://ertugrulmutlu.github.io/OpenAnima/" rel="noopener noreferrer"&gt;https://ertugrulmutlu.github.io/OpenAnima/&lt;/a&gt;&lt;br&gt;
Itch.io: &lt;a href="https://ertugrulmutlu.itch.io/openanima" rel="noopener noreferrer"&gt;https://ertugrulmutlu.itch.io/openanima&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ertugrulmutlu/OpenAnima" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/OpenAnima&lt;/a&gt;&lt;br&gt;
Release: &lt;a href="https://github.com/Ertugrulmutlu/OpenAnima/releases/tag/v0.2.0-preview" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/OpenAnima/releases/tag/v0.2.0-preview&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What OpenAnima does
&lt;/h2&gt;

&lt;p&gt;OpenAnima is a Windows desktop application that lets users add visual assets on top of their desktop.&lt;/p&gt;

&lt;p&gt;These assets can be moved around, configured, and used as lightweight desktop overlays.&lt;/p&gt;

&lt;p&gt;Some example use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Desktop companions or animated mascots&lt;/li&gt;
&lt;li&gt;Game-style HUD elements&lt;/li&gt;
&lt;li&gt;Stream or recording overlays&lt;/li&gt;
&lt;li&gt;Sprite and animation preview experiments&lt;/li&gt;
&lt;li&gt;Ambient desktop widgets&lt;/li&gt;
&lt;li&gt;Weird little visual experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is still early, but the direction is becoming clearer: OpenAnima is not just a GIF player. It is becoming a small visual layer above the desktop.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why I built it
&lt;/h2&gt;

&lt;p&gt;I like projects that feel small at first, but slowly reveal a larger design space.&lt;/p&gt;

&lt;p&gt;At the beginning, OpenAnima was simply about putting GIFs on the desktop. That was already fun, but it also felt limited.&lt;/p&gt;

&lt;p&gt;Once I started thinking about game assets, HUD elements, sprites, and layered UI assets, the project became more like a desktop rendering playground.&lt;/p&gt;

&lt;p&gt;I wanted to explore questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can desktop overlays be treated like reusable visual assets?&lt;/li&gt;
&lt;li&gt;Can a normal desktop become a small interactive canvas?&lt;/li&gt;
&lt;li&gt;Can game-style assets live outside a game engine?&lt;/li&gt;
&lt;li&gt;Can asset packs become something users can import, configure, and reuse?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where the v0.2 release comes in.&lt;/p&gt;


&lt;h2&gt;
  
  
  What changed in v0.2
&lt;/h2&gt;

&lt;p&gt;The v0.2 preview expands OpenAnima from a GIF overlay tool into a more metadata-driven desktop asset engine.&lt;/p&gt;

&lt;p&gt;The biggest changes are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generic asset analyzer and import wizard&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;asset.json&lt;/code&gt; metadata support&lt;/li&gt;
&lt;li&gt;Support for multiple asset formats&lt;/li&gt;
&lt;li&gt;Sprite strip setup workflow&lt;/li&gt;
&lt;li&gt;Spritesheet rendering with named animations&lt;/li&gt;
&lt;li&gt;Composite UI/HUD assets&lt;/li&gt;
&lt;li&gt;Runtime sliders for layered UI assets&lt;/li&gt;
&lt;li&gt;Improved Editor tab and Asset Setup dialog layouts&lt;/li&gt;
&lt;li&gt;Safer metadata validation and error handling&lt;/li&gt;
&lt;li&gt;Backward compatibility for existing GIF/static/frame-folder workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This release is still a preview, but the foundation is much stronger than the first version.&lt;/p&gt;


&lt;h2&gt;
  
  
  Supported asset types
&lt;/h2&gt;

&lt;p&gt;OpenAnima v0.2 supports several asset types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GIF&lt;/li&gt;
&lt;li&gt;Static image&lt;/li&gt;
&lt;li&gt;Frame-folder animation&lt;/li&gt;
&lt;li&gt;Sprite strip&lt;/li&gt;
&lt;li&gt;Spritesheet&lt;/li&gt;
&lt;li&gt;Composite UI / HUD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first version was mostly focused on simple animated overlays. v0.2 starts building the system needed for more structured assets.&lt;/p&gt;


&lt;h2&gt;
  
  
  Metadata-driven assets
&lt;/h2&gt;

&lt;p&gt;One of the most important changes in v0.2 is &lt;code&gt;asset.json&lt;/code&gt; support.&lt;/p&gt;

&lt;p&gt;Instead of hardcoding every asset type, OpenAnima can now use metadata to understand how an asset should be loaded and rendered.&lt;/p&gt;

&lt;p&gt;A simplified example could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo_hud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"composite_ui"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"layers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"background.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bar.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the project more flexible. Instead of treating every asset as just an image or GIF, OpenAnima can start understanding assets as structured objects.&lt;/p&gt;

&lt;p&gt;That opens the door for asset packs, richer UI overlays, and more advanced configuration later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sprite strips and spritesheets
&lt;/h2&gt;

&lt;p&gt;Another focus of v0.2 was better support for game-style assets.&lt;/p&gt;

&lt;p&gt;Sprite strips and spritesheets are common in game development, but they usually need some setup before they can be used correctly.&lt;/p&gt;

&lt;p&gt;OpenAnima now includes workflows for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frame count&lt;/li&gt;
&lt;li&gt;Frame size&lt;/li&gt;
&lt;li&gt;Crop fields&lt;/li&gt;
&lt;li&gt;Preview grid&lt;/li&gt;
&lt;li&gt;Frame export&lt;/li&gt;
&lt;li&gt;Named animations for spritesheets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is still not perfect, especially for unusual sprite layouts, but it is a good step toward making OpenAnima useful for more than simple GIF overlays.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composite UI and HUD assets
&lt;/h2&gt;

&lt;p&gt;I also added support for composite UI/HUD assets.&lt;/p&gt;

&lt;p&gt;The idea is simple: an overlay does not have to be one image. It can be made from multiple layers.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A health bar background&lt;/li&gt;
&lt;li&gt;A fill layer&lt;/li&gt;
&lt;li&gt;A frame layer&lt;/li&gt;
&lt;li&gt;A text or icon layer&lt;/li&gt;
&lt;li&gt;Runtime sliders controlling values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it possible to experiment with game-like desktop HUDs.&lt;/p&gt;

&lt;p&gt;The current Composite UI editor is functional, but it is not meant to be a professional layout tool yet. It is more of a foundation for future versions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The control panel
&lt;/h2&gt;

&lt;p&gt;OpenAnima has a small control panel with three main areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Library&lt;/li&gt;
&lt;li&gt;Active&lt;/li&gt;
&lt;li&gt;Editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Library tab is used to import and manage assets.&lt;/p&gt;

&lt;p&gt;The Active tab is used to manage overlays currently placed on the desktop.&lt;/p&gt;

&lt;p&gt;The Editor tab is used to adjust selected assets with controls like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale&lt;/li&gt;
&lt;li&gt;Opacity&lt;/li&gt;
&lt;li&gt;Speed&lt;/li&gt;
&lt;li&gt;Always on top&lt;/li&gt;
&lt;li&gt;Click-through&lt;/li&gt;
&lt;li&gt;Locked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to keep the interface practical and lightweight. I do not want OpenAnima to become a huge desktop suite. I want it to stay small, hackable, and focused.&lt;/p&gt;




&lt;h2&gt;
  
  
  Website and distribution
&lt;/h2&gt;

&lt;p&gt;For this release, I also created a small website for the project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ertugrulmutlu.github.io/OpenAnima/" rel="noopener noreferrer"&gt;https://ertugrulmutlu.github.io/OpenAnima/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The website explains the idea, shows the demo, links to the GitHub repository, and provides access to the Windows executable through GitHub Releases.&lt;/p&gt;

&lt;p&gt;I also published the project on itch.io as a free tool:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ertugrulmutlu.itch.io/openanima" rel="noopener noreferrer"&gt;https://ertugrulmutlu.itch.io/openanima&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was mainly to make the project feel more like a small product rather than only a repository.&lt;/p&gt;




&lt;h2&gt;
  
  
  Known limitations
&lt;/h2&gt;

&lt;p&gt;This is still a preview release, so some limitations are expected.&lt;/p&gt;

&lt;p&gt;Known issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sprite strips may require manual frame count or frame size correction.&lt;/li&gt;
&lt;li&gt;Some sprite strips with unusual padding may need manual crop values.&lt;/li&gt;
&lt;li&gt;Spritesheets require metadata or setup through the import wizard.&lt;/li&gt;
&lt;li&gt;Composite UI assets may require manual layer alignment.&lt;/li&gt;
&lt;li&gt;The Composite UI editor is functional but not a full professional layout tool.&lt;/li&gt;
&lt;li&gt;3D model support is not included yet.&lt;/li&gt;
&lt;li&gt;Some unusual asset packs may still need manual &lt;code&gt;asset.json&lt;/code&gt; editing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I prefer being clear about this because v0.2 is not a polished final product. It is a foundation release.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I want to explore next
&lt;/h2&gt;

&lt;p&gt;There are several directions I want to explore after v0.2:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Asset packs&lt;/li&gt;
&lt;li&gt;Better first-run experience&lt;/li&gt;
&lt;li&gt;More polished installer/distribution flow&lt;/li&gt;
&lt;li&gt;Better preview and import tools&lt;/li&gt;
&lt;li&gt;More reliable spritesheet workflows&lt;/li&gt;
&lt;li&gt;Simple 3D overlay experiments&lt;/li&gt;
&lt;li&gt;Linux experiments later&lt;/li&gt;
&lt;li&gt;Better documentation for custom assets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 3D direction is especially interesting, but I do not want to rush it before the 2D asset foundation is stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;This project reminded me that even small desktop tools can become surprisingly deep.&lt;/p&gt;

&lt;p&gt;At first, the problem looked simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Put an animated object on the desktop.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But once I started supporting different asset types, the project became about importing, validating, describing, previewing, rendering, and controlling assets.&lt;/p&gt;

&lt;p&gt;That means the real challenge is not only drawing something on the desktop. The real challenge is creating a flexible system around desktop assets.&lt;/p&gt;

&lt;p&gt;OpenAnima v0.2 is my first serious step in that direction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;Website: &lt;a href="https://ertugrulmutlu.github.io/OpenAnima/" rel="noopener noreferrer"&gt;https://ertugrulmutlu.github.io/OpenAnima/&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ertugrulmutlu/OpenAnima" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/OpenAnima&lt;/a&gt;&lt;br&gt;
Release: &lt;a href="https://github.com/Ertugrulmutlu/OpenAnima/releases/tag/v0.2.0-preview" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/OpenAnima/releases/tag/v0.2.0-preview&lt;/a&gt;&lt;br&gt;
itch.io: &lt;a href="https://ertugrulmutlu.itch.io/openanima" rel="noopener noreferrer"&gt;https://ertugrulmutlu.itch.io/openanima&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback, bug reports, weird desktop overlay ideas, and asset workflow suggestions are very welcome.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>showdev</category>
      <category>sideprojects</category>
      <category>ui</category>
    </item>
    <item>
      <title>PromptLedger v0.4 — Faster prompt logging, lightweight markers, and better prompt organization</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Mon, 27 Apr 2026 15:19:18 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/promptledger-v04-faster-prompt-logging-lightweight-markers-and-better-prompt-organization-2b2g</link>
      <guid>https://forem.com/ertugrulmutlu/promptledger-v04-faster-prompt-logging-lightweight-markers-and-better-prompt-organization-2b2g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Devlog — Part 4&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PromptLedger v0.4 is mostly about making prompt versioning easier to use repeatedly: faster logging, clearer organization, and small release signals for versions worth remembering.&lt;/p&gt;

&lt;p&gt;In the earlier parts of this series, PromptLedger started as a deliberately small local-first prompt version control tool. Then it gained labels, status semantics, better diffs, review workflows, semantic summaries, warnings, and Markdown export.&lt;/p&gt;

&lt;p&gt;Those additions made the history more useful after prompts had already been logged.&lt;/p&gt;

&lt;p&gt;Some of the direction for v0.4 also came from feedback and from watching where the workflow still felt a bit too manual. So before going into the details: thank you to everyone who tried the earlier versions, shared thoughts, pointed out rough edges, or simply asked practical questions about how the tool should behave during real prompt iteration.&lt;/p&gt;

&lt;p&gt;v0.4 focuses more on the moment before review: the actual day-to-day act of adding, organizing, and revisiting prompt versions.&lt;/p&gt;

&lt;p&gt;Because in practice, a prompt version control tool only works if logging a prompt is cheap enough that you actually keep doing it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this part was needed
&lt;/h2&gt;

&lt;p&gt;PromptLedger has always been intentionally limited in scope.&lt;/p&gt;

&lt;p&gt;It is SQLite-backed. It is terminal-first. It does not need a hosted backend. It does not try to execute prompts for you. It does not try to become an evaluation platform.&lt;/p&gt;

&lt;p&gt;The core job is still simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;store prompt versions&lt;/li&gt;
&lt;li&gt;compare them&lt;/li&gt;
&lt;li&gt;review changes&lt;/li&gt;
&lt;li&gt;organize prompt history&lt;/li&gt;
&lt;li&gt;make the history inspectable later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But once I started using it more like a real prompt library, a small problem became obvious.&lt;/p&gt;

&lt;p&gt;Adding a prompt version was technically easy, but repetitive.&lt;/p&gt;

&lt;p&gt;During actual prompt iteration, the prompt text changes often, while the surrounding metadata usually stays the same. The same author. The same environment. The same tags. The same library grouping. The same role.&lt;/p&gt;

&lt;p&gt;Having to retype that metadata every time creates friction.&lt;/p&gt;

&lt;p&gt;And friction matters. If logging is annoying, the history becomes incomplete. If the history is incomplete, review becomes less useful. And if review becomes less useful, the tool stops doing its main job.&lt;/p&gt;

&lt;p&gt;So v0.4 is not a big architectural release.&lt;/p&gt;

&lt;p&gt;It is a workflow release.&lt;/p&gt;

&lt;p&gt;It makes repeated prompt logging faster, adds better organization primitives, and introduces lightweight markers for important versions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick add: less typing during real iteration
&lt;/h2&gt;

&lt;p&gt;The main usability change in v0.4 is the new &lt;code&gt;add --quick&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;A normal add can still be explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger add &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt; &lt;span class="nt"&gt;--collection&lt;/span&gt; support &lt;span class="nt"&gt;--role&lt;/span&gt; system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is useful when creating a new prompt or when metadata should be stated clearly.&lt;/p&gt;

&lt;p&gt;But once a prompt already exists, most iterations do not need all metadata to be typed again. With &lt;code&gt;--quick&lt;/code&gt;, PromptLedger can reuse safe metadata defaults from the latest version of the same prompt id.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger add &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt; &lt;span class="nt"&gt;--quick&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the latest &lt;code&gt;onboarding&lt;/code&gt; version already had metadata like author, tags, env, collection, and role, those values can be reused unless explicitly overridden.&lt;/p&gt;

&lt;p&gt;That means the common workflow becomes much smaller:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;edit the prompt&lt;/li&gt;
&lt;li&gt;add the new version&lt;/li&gt;
&lt;li&gt;keep moving&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a flashy feature, but it changes the feel of the tool.&lt;/p&gt;

&lt;p&gt;Prompt logging becomes closer to a habit than a chore.&lt;/p&gt;

&lt;p&gt;That matters because PromptLedger is not useful because one perfect prompt was saved once. It is useful because a sequence of changes becomes reviewable later.&lt;/p&gt;

&lt;p&gt;Quick add is there to protect that sequence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Collection and role are now first-class metadata
&lt;/h2&gt;

&lt;p&gt;v0.4 also adds two first-class metadata fields on prompt versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;collection&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;role&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reason is simple: once a prompt library grows, ids and tags are not always enough.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;collection&lt;/code&gt; gives prompts a lightweight grouping. It can represent a product area, a project, a use case, a customer support flow, an internal tool, or just a personal folder-like grouping.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger add &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt; &lt;span class="nt"&gt;--collection&lt;/span&gt; support &lt;span class="nt"&gt;--role&lt;/span&gt; system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it easier to ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which prompts belong to the support collection?&lt;/li&gt;
&lt;li&gt;Which prompts are part of an onboarding workflow?&lt;/li&gt;
&lt;li&gt;Which versions were written for a specific environment or use case?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second field, &lt;code&gt;role&lt;/code&gt;, is about what kind of prompt artifact this version represents.&lt;/p&gt;

&lt;p&gt;Built-in roles are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;system&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;user&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;template&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;modelfile&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;eval&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because prompt libraries usually contain different kinds of artifacts.&lt;/p&gt;

&lt;p&gt;A system instruction is not the same thing as a reusable template. An evaluation prompt is not the same thing as a user-facing message. A model file prompt is not the same thing as an onboarding assistant instruction.&lt;/p&gt;

&lt;p&gt;Before v0.4, these differences could be represented with tags, but tags are free-form and tend to become messy over time.&lt;/p&gt;

&lt;p&gt;Making &lt;code&gt;role&lt;/code&gt; first-class gives PromptLedger a small amount of structure without turning it into a large framework.&lt;/p&gt;

&lt;p&gt;That is the balance I wanted here: enough organization to be useful, but not so much that the tool starts dictating how every prompt library must be designed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Markers: small signals for important versions
&lt;/h2&gt;

&lt;p&gt;v0.4 introduces a new marker system for prompt versions.&lt;/p&gt;

&lt;p&gt;The core commands are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger marker &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--version&lt;/span&gt; 8 &lt;span class="nt"&gt;--name&lt;/span&gt; stable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger marker remove &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--version&lt;/span&gt; 8 &lt;span class="nt"&gt;--name&lt;/span&gt; stable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger marker list &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger marker show &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--version&lt;/span&gt; 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also convenience commands for the built-in markers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger stable &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger milestone &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The built-in markers are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;stable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;milestone&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Markers are intentionally lighter than labels.&lt;/p&gt;

&lt;p&gt;That distinction is important.&lt;/p&gt;

&lt;p&gt;Labels in PromptLedger are useful when you want release-like semantics or named pointers. They can represent states such as a current production prompt, a reviewed version, or a version used in a specific workflow.&lt;/p&gt;

&lt;p&gt;Markers are smaller than that.&lt;/p&gt;

&lt;p&gt;A marker says: this version is worth noticing.&lt;/p&gt;

&lt;p&gt;Maybe it was the first version that worked well. Maybe it was a milestone during a rewrite. Maybe it was stable enough to revisit later. Maybe it is just a checkpoint that should not get lost in the version list.&lt;/p&gt;

&lt;p&gt;Not every important prompt version needs full label semantics.&lt;/p&gt;

&lt;p&gt;Sometimes you only need a small flag attached directly to a version.&lt;/p&gt;

&lt;p&gt;That is what markers are for.&lt;/p&gt;

&lt;p&gt;They are deliberately simple. They do not try to encode deployment state, environment ownership, evaluation results, or production routing. They are just lightweight release signals inside the local history.&lt;/p&gt;




&lt;h2&gt;
  
  
  Search, list, and show became more useful
&lt;/h2&gt;

&lt;p&gt;Organization only helps if the tools expose it.&lt;/p&gt;

&lt;p&gt;So v0.4 updates &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;show&lt;/code&gt;, and &lt;code&gt;search&lt;/code&gt; to surface and use collection, role, and marker information.&lt;/p&gt;

&lt;p&gt;For example, searching by metadata becomes more natural:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger search &lt;span class="nt"&gt;--collection&lt;/span&gt; support &lt;span class="nt"&gt;--role&lt;/span&gt; system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search can also work with an empty &lt;code&gt;--contains&lt;/code&gt;, which means metadata-only filtering is now possible.&lt;/p&gt;

&lt;p&gt;That sounds small, but it changes how PromptLedger can be used.&lt;/p&gt;

&lt;p&gt;Before, search was mostly about finding prompt text. Now it can also be used to navigate the prompt library itself.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;show all system prompts in a collection&lt;/li&gt;
&lt;li&gt;find evaluation prompts across a project&lt;/li&gt;
&lt;li&gt;inspect versions marked as stable&lt;/li&gt;
&lt;li&gt;separate templates from user prompts&lt;/li&gt;
&lt;li&gt;review one slice of the library without relying on text search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This moves PromptLedger a little further from “prompt version storage” toward “prompt library workflow”.&lt;/p&gt;

&lt;p&gt;Still local. Still small. Still inspectable.&lt;/p&gt;

&lt;p&gt;But more practical once the number of prompt artifacts grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Streamlit UI is still read-only
&lt;/h2&gt;

&lt;p&gt;PromptLedger also has a small Streamlit UI for inspecting prompt history.&lt;/p&gt;

&lt;p&gt;In v0.4, the UI now surfaces collection, role, and markers in timeline, detail, and comparison views.&lt;/p&gt;

&lt;p&gt;That makes the UI more useful when browsing a prompt library. You can see not only how a prompt changed, but also what kind of prompt it is, which collection it belongs to, and whether a version was marked as stable or as a milestone.&lt;/p&gt;

&lt;p&gt;The important part: the UI is still read-only.&lt;/p&gt;

&lt;p&gt;That is intentional.&lt;/p&gt;

&lt;p&gt;For now, PromptLedger keeps editing and logging in the CLI, while the UI remains focused on inspection. This keeps the implementation smaller and avoids turning the viewer into a second source of write behavior.&lt;/p&gt;

&lt;p&gt;The terminal is where versions are created.&lt;/p&gt;

&lt;p&gt;The UI is where history can be reviewed more comfortably.&lt;/p&gt;

&lt;p&gt;That split still feels right for the project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Database changes stayed local and simple
&lt;/h2&gt;

&lt;p&gt;v0.4 moves the schema version from 3 to 5.&lt;/p&gt;

&lt;p&gt;The main database changes are straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a new &lt;code&gt;markers&lt;/code&gt; table&lt;/li&gt;
&lt;li&gt;new &lt;code&gt;collection&lt;/code&gt; and &lt;code&gt;role&lt;/code&gt; columns on &lt;code&gt;prompt_versions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;migrations for the local SQLite database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no remote migration service. No hosted registry. No account state. No backend coordination.&lt;/p&gt;

&lt;p&gt;The migration story stayed aligned with the rest of the project: local, inspectable, and boring in a good way.&lt;/p&gt;

&lt;p&gt;That matters because PromptLedger should remain easy to understand. A small SQLite-backed tool should not require infrastructure thinking just to store prompt history.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tests expanded around the new workflows
&lt;/h2&gt;

&lt;p&gt;v0.4 also expands the test coverage substantially around the areas that changed most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add workflows&lt;/li&gt;
&lt;li&gt;quick add behavior&lt;/li&gt;
&lt;li&gt;marker commands&lt;/li&gt;
&lt;li&gt;search and metadata filtering&lt;/li&gt;
&lt;li&gt;related list/show behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not the kind of testing expansion that makes for a dramatic release note, but it is important for this project.&lt;/p&gt;

&lt;p&gt;PromptLedger deals with history. If history is stored incorrectly, the tool loses trust quickly.&lt;/p&gt;

&lt;p&gt;So the goal is practical correctness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;metadata should be reused only when expected&lt;/li&gt;
&lt;li&gt;explicit overrides should still win&lt;/li&gt;
&lt;li&gt;markers should attach to the right versions&lt;/li&gt;
&lt;li&gt;search should return the intended slice of the prompt library&lt;/li&gt;
&lt;li&gt;existing workflows should not become harder to use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a local-first developer tool, confidence matters more than feature count.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design tradeoffs
&lt;/h2&gt;

&lt;p&gt;The main tradeoff in v0.4 was deciding how much structure to add.&lt;/p&gt;

&lt;p&gt;PromptLedger could have gone further.&lt;/p&gt;

&lt;p&gt;It could have introduced nested collections, custom role registries, marker categories, richer release channels, or project configuration files.&lt;/p&gt;

&lt;p&gt;I avoided that for now.&lt;/p&gt;

&lt;p&gt;The project works best when the concepts stay small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ids identify prompt histories&lt;/li&gt;
&lt;li&gt;versions preserve changes over time&lt;/li&gt;
&lt;li&gt;labels provide stronger named semantics&lt;/li&gt;
&lt;li&gt;markers provide lightweight version signals&lt;/li&gt;
&lt;li&gt;collection groups prompt versions&lt;/li&gt;
&lt;li&gt;role explains what kind of prompt artifact a version is&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is enough structure to make a growing prompt library easier to navigate, without making the tool feel like a platform.&lt;/p&gt;

&lt;p&gt;This is also why markers are not labels with another name.&lt;/p&gt;

&lt;p&gt;Labels are useful when you want something closer to a maintained pointer or status. Markers are useful when you want to annotate a version as notable.&lt;/p&gt;

&lt;p&gt;Both can exist because they answer different workflow questions.&lt;/p&gt;

&lt;p&gt;A label asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What does this version represent in a larger workflow?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A marker asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is this version worth noticing later?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference is small, but it keeps the model clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  What did not change
&lt;/h2&gt;

&lt;p&gt;v0.4 does not change the basic philosophy of PromptLedger.&lt;/p&gt;

&lt;p&gt;It is still a small, local-first prompt version control tool.&lt;/p&gt;

&lt;p&gt;It still does not add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hosted registry&lt;/li&gt;
&lt;li&gt;prompt execution APIs&lt;/li&gt;
&lt;li&gt;agent tooling&lt;/li&gt;
&lt;li&gt;telemetry pipelines&lt;/li&gt;
&lt;li&gt;cloud sync&lt;/li&gt;
&lt;li&gt;automatic scoring&lt;/li&gt;
&lt;li&gt;evaluation harnesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not bad ideas in general. They are just outside the current scope of this project.&lt;/p&gt;

&lt;p&gt;PromptLedger is not trying to run prompts, score prompts, deploy prompts, or orchestrate agents.&lt;/p&gt;

&lt;p&gt;It is trying to make prompt changes easier to store, compare, review, and organize.&lt;/p&gt;

&lt;p&gt;That boundary is useful.&lt;/p&gt;

&lt;p&gt;Small scope is not a missing feature here. It is part of the design.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;PromptLedger v0.4 is a usability and organization release.&lt;/p&gt;

&lt;p&gt;It does not radically change what the tool is. Instead, it makes the existing workflow smoother.&lt;/p&gt;

&lt;p&gt;Quick add reduces friction during iteration. Collection and role make prompt libraries easier to navigate. Markers create a lightweight way to remember important versions. Search, list, show, and the read-only UI now expose more of that structure.&lt;/p&gt;

&lt;p&gt;The result is still intentionally modest.&lt;/p&gt;

&lt;p&gt;A local SQLite database. A CLI. A read-only inspection UI. Deterministic exports and reviewable history where possible.&lt;/p&gt;

&lt;p&gt;But the day-to-day workflow feels better now.&lt;/p&gt;

&lt;p&gt;And for a tool like this, that matters.&lt;/p&gt;

&lt;p&gt;Because prompt version control only becomes useful when it is easy enough to use consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;PyPI: &lt;a href="https://pypi.org/project/promptledger/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ertugrulmutlu/promptledger" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://www.linkedin.com/in/ertugrul-mutlu/?locale=en" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>python</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>PromptLedger v0.3 — Turning prompt history into a practical review workflow.</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:37:59 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/promptledger-v03-labels-status-and-better-diffs-1ong</link>
      <guid>https://forem.com/ertugrulmutlu/promptledger-v03-labels-status-and-better-diffs-1ong</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Devlog — Part 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Turning prompt history into a practical review workflow.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;In Part 1, I introduced &lt;strong&gt;PromptLedger&lt;/strong&gt; as a deliberately small, local-first tool for treating prompts like code.&lt;/p&gt;

&lt;p&gt;In Part 2, I added &lt;strong&gt;release semantics&lt;/strong&gt;: labels, label history, and status views that made it easier to answer questions like &lt;em&gt;what is in production right now?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;v0.3&lt;/strong&gt;, the next question became harder:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Even if I can diff two prompt versions, can I review them in a way that feels closer to a real release workflow?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is the focus of this release.&lt;/p&gt;

&lt;p&gt;PromptLedger v0.3 adds a small but practical &lt;strong&gt;Prompt Review&lt;/strong&gt; layer on top of the existing history model — while still staying local-first, SQLite-backed, and intentionally limited in scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a third part?
&lt;/h2&gt;

&lt;p&gt;After the release semantics work in v0.2, the project could already answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which prompt does &lt;code&gt;prod&lt;/code&gt; currently point to?&lt;/li&gt;
&lt;li&gt;When was that label changed?&lt;/li&gt;
&lt;li&gt;How does &lt;code&gt;prod&lt;/code&gt; differ from &lt;code&gt;staging&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But another gap became obvious.&lt;/p&gt;

&lt;p&gt;A raw diff is useful, but in practice people often want a slightly higher-level review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the prompt become stricter?&lt;/li&gt;
&lt;li&gt;Did the tone change?&lt;/li&gt;
&lt;li&gt;Was the output format changed from bullets to JSON?&lt;/li&gt;
&lt;li&gt;Did safety or refusal wording get stronger or weaker?&lt;/li&gt;
&lt;li&gt;Is this a release change or a likely regression risk?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not execution questions. They are not observability questions either.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;review questions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So instead of adding prompt execution, external APIs, or any hosted layer, I kept the project focused and added a &lt;strong&gt;review workflow built entirely on top of the existing local data&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The main addition: &lt;code&gt;review&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The new command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger review &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--from&lt;/span&gt; prod &lt;span class="nt"&gt;--to&lt;/span&gt; staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This compares two refs — versions or labels — and produces a structured review output that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resolved refs and versions&lt;/li&gt;
&lt;li&gt;a semantic summary&lt;/li&gt;
&lt;li&gt;metadata changes&lt;/li&gt;
&lt;li&gt;label context&lt;/li&gt;
&lt;li&gt;warning flags&lt;/li&gt;
&lt;li&gt;a few conservative notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is deliberately not an evaluation system. It does not score prompts. It does not call a model. It does not guess too much.&lt;/p&gt;

&lt;p&gt;It simply makes a prompt diff easier to interpret.&lt;/p&gt;




&lt;h2&gt;
  
  
  From line diff to semantic summary
&lt;/h2&gt;

&lt;p&gt;Traditional diffs are still useful, and PromptLedger keeps all previous diff modes.&lt;/p&gt;

&lt;p&gt;But v0.3 adds a new summary-oriented mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger diff &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--from&lt;/span&gt; 7 &lt;span class="nt"&gt;--to&lt;/span&gt; 9 &lt;span class="nt"&gt;--mode&lt;/span&gt; summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces a &lt;strong&gt;heuristic, rule-based semantic summary&lt;/strong&gt; instead of a raw line diff.&lt;/p&gt;

&lt;p&gt;The important design decision here is that the summary is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local&lt;/li&gt;
&lt;li&gt;deterministic&lt;/li&gt;
&lt;li&gt;transparent&lt;/li&gt;
&lt;li&gt;intentionally conservative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: it only says something when the change looks clear enough.&lt;/p&gt;

&lt;p&gt;Current summary categories include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tone changes&lt;/li&gt;
&lt;li&gt;tighter or looser constraints&lt;/li&gt;
&lt;li&gt;output format changes&lt;/li&gt;
&lt;li&gt;broader vs more specific prompts&lt;/li&gt;
&lt;li&gt;safety wording changes&lt;/li&gt;
&lt;li&gt;length requirement changes&lt;/li&gt;
&lt;li&gt;refusal or policy wording changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not meant to replace reading the actual prompt.&lt;br&gt;
It is meant to make review faster and more structured.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why heuristics instead of an LLM?
&lt;/h2&gt;

&lt;p&gt;Because using an external model for review would push the project in exactly the wrong direction.&lt;/p&gt;

&lt;p&gt;It would introduce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;network dependence&lt;/li&gt;
&lt;li&gt;nondeterministic behavior&lt;/li&gt;
&lt;li&gt;more configuration&lt;/li&gt;
&lt;li&gt;harder testing&lt;/li&gt;
&lt;li&gt;less trust in the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PromptLedger is supposed to be inspectable.&lt;br&gt;
If it says &lt;em&gt;“constraints tightened”&lt;/em&gt;, that should come from understandable rules, not hidden inference.&lt;/p&gt;

&lt;p&gt;That made a heuristic system the better fit.&lt;/p&gt;

&lt;p&gt;It is not as flexible as an LLM-based reviewer, but it is much easier to reason about — and much more aligned with the philosophy of the project.&lt;/p&gt;


&lt;h2&gt;
  
  
  Reviews now export cleanly to markdown
&lt;/h2&gt;

&lt;p&gt;Another practical gap in earlier versions was sharing review output.&lt;/p&gt;

&lt;p&gt;Reading a diff in the terminal is fine.&lt;br&gt;
Sharing it in a PR, issue, or internal document is another matter.&lt;/p&gt;

&lt;p&gt;So v0.3 adds markdown export for reviews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger &lt;span class="nb"&gt;export &lt;/span&gt;review &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--from&lt;/span&gt; prod &lt;span class="nt"&gt;--to&lt;/span&gt; staging &lt;span class="nt"&gt;--format&lt;/span&gt; md &lt;span class="nt"&gt;--out&lt;/span&gt; review.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exported markdown is deterministic and structured.&lt;br&gt;
It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a title&lt;/li&gt;
&lt;li&gt;compared refs&lt;/li&gt;
&lt;li&gt;semantic summary&lt;/li&gt;
&lt;li&gt;text diff note&lt;/li&gt;
&lt;li&gt;metadata changes&lt;/li&gt;
&lt;li&gt;warnings&lt;/li&gt;
&lt;li&gt;label information&lt;/li&gt;
&lt;li&gt;a reviewer notes placeholder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes PromptLedger more useful in real workflows without adding any collaboration backend.&lt;/p&gt;

&lt;p&gt;The file is still just a file.&lt;br&gt;
You can paste it into GitHub, attach it to docs, or keep it locally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Metadata changes are now first-class in reviews
&lt;/h2&gt;

&lt;p&gt;Prompt text is only part of the story.&lt;/p&gt;

&lt;p&gt;A release change may also involve metadata updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;reason&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;author&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tags&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;env&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;metrics&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Earlier versions could already diff metadata, but v0.3 makes metadata changes part of the review object itself.&lt;/p&gt;

&lt;p&gt;That matters because some changes are &lt;strong&gt;metadata-only&lt;/strong&gt;.&lt;br&gt;
In those cases, PromptLedger can now say that clearly instead of pretending there was meaningful prompt drift.&lt;/p&gt;

&lt;p&gt;This is a small feature, but an important one.&lt;br&gt;
It avoids overclaiming, which is one of the easiest ways to make a review tool feel unreliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Warning flags and likely drift hotspots
&lt;/h2&gt;

&lt;p&gt;Prompt review is not just about summarizing what changed.&lt;br&gt;
It is also about drawing attention to changes that deserve extra care.&lt;/p&gt;

&lt;p&gt;v0.3 adds simple warning flags for cases such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;comparing the same version to itself&lt;/li&gt;
&lt;li&gt;environment changes&lt;/li&gt;
&lt;li&gt;metadata-only changes&lt;/li&gt;
&lt;li&gt;policy or refusal wording changes that may affect behavior drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These warnings are not meant to be dramatic.&lt;br&gt;
They are meant to make the review output more useful in practice.&lt;/p&gt;

&lt;p&gt;For example, a wording change around refusal or safety does not automatically mean the prompt got worse — but it probably means a reviewer should read it more carefully.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Python API now returns structured review objects
&lt;/h2&gt;

&lt;p&gt;The review workflow is not just a CLI feature.&lt;/p&gt;

&lt;p&gt;The Python API now exposes review results as structured domain objects rather than just formatted strings.&lt;/p&gt;

&lt;p&gt;That means callers can programmatically access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resolved refs&lt;/li&gt;
&lt;li&gt;semantic summary items&lt;/li&gt;
&lt;li&gt;metadata changes&lt;/li&gt;
&lt;li&gt;warnings&lt;/li&gt;
&lt;li&gt;notes&lt;/li&gt;
&lt;li&gt;label context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the CLI and the API aligned while also making formatting a separate concern.&lt;/p&gt;

&lt;p&gt;That separation turned out to be one of the cleaner changes in this version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;review logic lives in one place&lt;/li&gt;
&lt;li&gt;rendering logic lives elsewhere&lt;/li&gt;
&lt;li&gt;markdown export and terminal rendering are both built on the same review result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small project, but still worth keeping modular.&lt;/p&gt;




&lt;h2&gt;
  
  
  UI update: review without write access
&lt;/h2&gt;

&lt;p&gt;The Streamlit UI is still read-only.&lt;br&gt;
That did not change.&lt;/p&gt;

&lt;p&gt;What changed is that the comparison view now surfaces review information more clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic summary&lt;/li&gt;
&lt;li&gt;warnings&lt;/li&gt;
&lt;li&gt;metadata diff&lt;/li&gt;
&lt;li&gt;side-by-side prompt comparison&lt;/li&gt;
&lt;li&gt;line diff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the UI aligned with the CLI review flow without turning it into an editor.&lt;/p&gt;

&lt;p&gt;That constraint still matters.&lt;br&gt;
The UI is there to inspect history, not to mutate it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What did &lt;em&gt;not&lt;/em&gt; change
&lt;/h2&gt;

&lt;p&gt;Just as important as the new features is what was left out.&lt;/p&gt;

&lt;p&gt;v0.3 does &lt;strong&gt;not&lt;/strong&gt; add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a hosted registry&lt;/li&gt;
&lt;li&gt;prompt execution APIs&lt;/li&gt;
&lt;li&gt;agent tooling&lt;/li&gt;
&lt;li&gt;telemetry pipelines&lt;/li&gt;
&lt;li&gt;tracing dashboards&lt;/li&gt;
&lt;li&gt;cloud sync&lt;/li&gt;
&lt;li&gt;automatic scoring&lt;/li&gt;
&lt;li&gt;evaluation harnesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are already plenty of tools going in those directions.&lt;/p&gt;

&lt;p&gt;PromptLedger is still trying to do one narrower thing well:&lt;br&gt;
&lt;strong&gt;store, compare, review, and export prompt changes locally.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  No schema expansion was needed
&lt;/h2&gt;

&lt;p&gt;One part of this release that I particularly liked: the review workflow did not require turning the database into something more complicated.&lt;/p&gt;

&lt;p&gt;SQLite remains the single source of truth.&lt;br&gt;
The review layer is generated from existing prompt versions, labels, and metadata.&lt;/p&gt;

&lt;p&gt;That kept the implementation smaller and the migration story simpler.&lt;/p&gt;

&lt;p&gt;Not every useful feature needs a bigger schema.&lt;br&gt;
Sometimes the better move is to extract more value from the structure that is already there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;v0.3 did not try to make PromptLedger smarter in a flashy way.&lt;br&gt;
It tried to make it &lt;strong&gt;more reviewable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The result is still a local tool.&lt;br&gt;
Still inspectable.&lt;br&gt;
Still deterministic where possible.&lt;br&gt;
Still intentionally limited.&lt;/p&gt;

&lt;p&gt;But now it is easier to answer a more realistic question:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Not just “what changed?” — but “how should I review this change before I move it forward?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is a better place for the project to be.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;PyPI: &lt;a href="https://pypi.org/project/promptledger/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Ertugrulmutlu/promptledger" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://www.linkedin.com/in/ertugrul-mutlu/?locale=en" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>python</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>DataLens: A Read-Only Image Dataset Sanity Checker</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Tue, 20 Jan 2026 13:43:50 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/datalens-a-read-only-image-dataset-sanity-checker-2do0</link>
      <guid>https://forem.com/ertugrulmutlu/datalens-a-read-only-image-dataset-sanity-checker-2do0</guid>
      <description>&lt;h1&gt;
  
  
  DataLens: A Read‑Only Image Dataset Sanity Checker
&lt;/h1&gt;

&lt;p&gt;Training a model rarely fails loudly.&lt;/p&gt;

&lt;p&gt;Most of the time, it &lt;em&gt;kind of works&lt;/em&gt; — loss decreases, accuracy moves, but the results feel unstable, brittle, or just wrong.&lt;/p&gt;

&lt;p&gt;In my experience, when that happens, the root cause is often not the model, but the &lt;strong&gt;dataset&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;DataLens&lt;/strong&gt;: a lightweight, read‑only sanity checker for image datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Silent Dataset Failures
&lt;/h2&gt;

&lt;p&gt;Before training even starts, datasets often contain issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Corrupted or unreadable images&lt;/li&gt;
&lt;li&gt;Duplicate or near‑duplicate samples&lt;/li&gt;
&lt;li&gt;Broken CSV → image mappings&lt;/li&gt;
&lt;li&gt;Large numbers of orphan images&lt;/li&gt;
&lt;li&gt;Severe class imbalance&lt;/li&gt;
&lt;li&gt;Extremely small images or extreme aspect ratios&lt;/li&gt;
&lt;li&gt;Mixed image modes (RGB / RGBA / grayscale)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these necessarily crash training.&lt;br&gt;
They just quietly degrade everything downstream.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Read‑Only Matters
&lt;/h2&gt;

&lt;p&gt;Many tools try to &lt;strong&gt;auto‑fix&lt;/strong&gt; datasets.&lt;/p&gt;

&lt;p&gt;I deliberately didn’t.&lt;/p&gt;

&lt;p&gt;DataLens follows a simple rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Inspect, don’t mutate.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;No files are moved&lt;/li&gt;
&lt;li&gt;No labels are rewritten&lt;/li&gt;
&lt;li&gt;No assumptions are silently applied&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tool’s job is to surface problems clearly, so &lt;em&gt;you&lt;/em&gt; can decide what to do next.&lt;/p&gt;




&lt;h2&gt;
  
  
  What DataLens Does
&lt;/h2&gt;

&lt;p&gt;DataLens is a Streamlit‑based audit tool with two modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mode A — Images Only
&lt;/h3&gt;

&lt;p&gt;For raw image folders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recursively scans images&lt;/li&gt;
&lt;li&gt;Detects corrupted files&lt;/li&gt;
&lt;li&gt;Finds exact and near‑duplicate images&lt;/li&gt;
&lt;li&gt;Optionally infers classes from subfolders&lt;/li&gt;
&lt;li&gt;Correctly handles &lt;strong&gt;unlabeled datasets&lt;/strong&gt; (no fake “missing label” warnings)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mode B — Images + Labels (CSV)
&lt;/h3&gt;

&lt;p&gt;For supervised datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Robust CSV reading (UTF‑8 with fallback)&lt;/li&gt;
&lt;li&gt;Automatic filename &amp;amp; label column detection&lt;/li&gt;
&lt;li&gt;Support for IDs without file extensions&lt;/li&gt;
&lt;li&gt;Optional label normalization&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Coverage analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;How many CSV rows actually resolve to images?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Orphan analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;How many images are never referenced by the CSV?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Duplicate Detection That Actually Helps
&lt;/h2&gt;

&lt;p&gt;Exact duplicates are easy.&lt;br&gt;
Near‑duplicates are not.&lt;/p&gt;

&lt;p&gt;DataLens supports three methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;sha256&lt;/strong&gt; — byte‑exact duplicates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;quick hash&lt;/strong&gt; — fast approximation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pHash&lt;/strong&gt; — visually similar images (resized, recompressed, slightly cropped)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful for datasets collected via scraping or merging multiple sources.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Hygiene Warnings
&lt;/h2&gt;

&lt;p&gt;Beyond basic checks, DataLens flags issues that usually show up &lt;em&gt;too late&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Very small images (e.g. &amp;lt;64px)&lt;/li&gt;
&lt;li&gt;Extreme aspect ratios&lt;/li&gt;
&lt;li&gt;High RGBA share (alpha channel surprises)&lt;/li&gt;
&lt;li&gt;High image mode variance&lt;/li&gt;
&lt;li&gt;Extension mismatches between CSV references and actual files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the kinds of things that quietly break training pipelines or augmentations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Outputs You Can Share
&lt;/h2&gt;

&lt;p&gt;After a run, DataLens produces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;interactive dashboard&lt;/strong&gt; (Streamlit)&lt;/li&gt;
&lt;li&gt;A deterministic &lt;code&gt;dataset_report.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;code&gt;issues.csv&lt;/code&gt; containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing images&lt;/li&gt;
&lt;li&gt;orphan images&lt;/li&gt;
&lt;li&gt;corrupted files&lt;/li&gt;
&lt;li&gt;duplicate groups&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The report is designed to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commit‑friendly&lt;/li&gt;
&lt;li&gt;reviewable&lt;/li&gt;
&lt;li&gt;attachable to issues or PRs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Design Principles
&lt;/h2&gt;

&lt;p&gt;I kept the scope intentionally tight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Read‑only&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deterministic&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transparent&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre‑training focused&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a dataset cleaning tool.&lt;br&gt;
It’s a &lt;strong&gt;dataset inspection tool&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  When I Use DataLens
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Before starting any new training run&lt;/li&gt;
&lt;li&gt;When receiving datasets from external sources&lt;/li&gt;
&lt;li&gt;When debugging unstable or suspicious training behavior&lt;/li&gt;
&lt;li&gt;As a lightweight QA step before investing GPU hours&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;We spend a lot of time tuning models.&lt;/p&gt;

&lt;p&gt;But models are only as good as the data we feed them — and bad data usually doesn’t announce itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DataLens is my way of making datasets talk.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Ertugrulmutlu/DataLens" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/DataLens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/ertugrul-mutlu/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ertugrul-mutlu/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal Website:&lt;/strong&gt; &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;https://ertugrulmutlu.github.io&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>showdev</category>
      <category>testing</category>
      <category>tooling</category>
    </item>
    <item>
      <title>PromptLedger v0.2 — Labels, Status, and Better Diffs</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Tue, 13 Jan 2026 14:44:54 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/promptledger-v02-labels-status-and-better-diffs-137j</link>
      <guid>https://forem.com/ertugrulmutlu/promptledger-v02-labels-status-and-better-diffs-137j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Devlog — Part 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What changed after the first release, and why those changes matter in practice.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;In the first part, I introduced &lt;strong&gt;PromptLedger&lt;/strong&gt; as a deliberately small, local-first tool for treating prompts like code.&lt;/p&gt;

&lt;p&gt;Since then, the project has moved from a minimal prompt history tracker to something closer to a &lt;strong&gt;prompt release ledger&lt;/strong&gt; — without adding servers, agents, or execution layers.&lt;/p&gt;

&lt;p&gt;This post covers what changed in &lt;strong&gt;v0.2&lt;/strong&gt;, why those changes exist, and how they affect real prompt workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a second part?
&lt;/h2&gt;

&lt;p&gt;After publishing Part 1, the most common follow-up questions were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Which prompt is actually in production right now?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;How do I compare prod vs staging without remembering version numbers?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Can I see when a release pointer changed?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Answering these questions required &lt;strong&gt;release semantics&lt;/strong&gt;, not more prompt editing features.&lt;/p&gt;

&lt;p&gt;That is the focus of v0.2.&lt;/p&gt;




&lt;h2&gt;
  
  
  Label history: prompts need an audit trail
&lt;/h2&gt;

&lt;p&gt;In v0.1, labels existed as movable pointers, similar to git tags. However, once a label moved, its previous state was lost.&lt;/p&gt;

&lt;p&gt;In v0.2, &lt;strong&gt;every label update is recorded in an append-only history log&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means you can now answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What prompt was in production yesterday?&lt;/li&gt;
&lt;li&gt;When did &lt;code&gt;prod&lt;/code&gt; move from version 7 to version 9?
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger label &lt;span class="nb"&gt;history&lt;/span&gt; &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, label history is implemented as a separate &lt;code&gt;label_events&lt;/code&gt; table. Label pointers remain mutable, but the history is immutable.&lt;/p&gt;

&lt;p&gt;This keeps the system simple while adding real auditability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Label-based diff: stop thinking in version numbers
&lt;/h2&gt;

&lt;p&gt;Version numbers are great for storage, but humans think in environments.&lt;/p&gt;

&lt;p&gt;v0.2 allows diffs to be expressed in terms of &lt;strong&gt;labels&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger diff &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--from&lt;/span&gt; prod &lt;span class="nt"&gt;--to&lt;/span&gt; staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This resolves labels to their underlying versions and performs a normal diff.&lt;/p&gt;

&lt;p&gt;The important detail is that &lt;strong&gt;nothing new is stored&lt;/strong&gt;. Labels are only references; diffs always operate on immutable prompt versions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Status command: a one-line overview
&lt;/h2&gt;

&lt;p&gt;Another recurring problem was simply understanding the current state of prompts.&lt;/p&gt;

&lt;p&gt;The new &lt;code&gt;status&lt;/code&gt; command provides a concise, git-style overview:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each prompt, it shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The latest version number&lt;/li&gt;
&lt;li&gt;The timestamp of that version&lt;/li&gt;
&lt;li&gt;Active labels and the versions they point to&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intentionally read-only and summary-focused. If you need details, you still drill down using &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;show&lt;/code&gt;, or &lt;code&gt;diff&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Better diff modes
&lt;/h2&gt;

&lt;p&gt;Prompt changes are not always best reviewed line-by-line.&lt;/p&gt;

&lt;p&gt;v0.2 introduces multiple diff modes built on top of Python’s &lt;code&gt;difflib&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;unified&lt;/code&gt; — the default, git-style view&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;context&lt;/code&gt; — useful for wider structural changes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ndiff&lt;/code&gt; — character-level insight for small edits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;metadata&lt;/code&gt; — diff &lt;strong&gt;only&lt;/strong&gt; metadata (&lt;code&gt;reason&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, &lt;code&gt;env&lt;/code&gt;, &lt;code&gt;metrics&lt;/code&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger diff &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--from&lt;/span&gt; 1 &lt;span class="nt"&gt;--to&lt;/span&gt; 2 &lt;span class="nt"&gt;--mode&lt;/span&gt; metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Metadata diffs are always rendered in unified format to keep them readable and deterministic.&lt;/p&gt;




&lt;h2&gt;
  
  
  UI update: history without write access
&lt;/h2&gt;

&lt;p&gt;The Streamlit UI remains strictly read-only, but v0.2 adds &lt;strong&gt;label history visibility&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inspect prompt timelines&lt;/li&gt;
&lt;li&gt;See which labels point where&lt;/li&gt;
&lt;li&gt;Review label movements over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All mutations still go through the CLI or Python API.&lt;/p&gt;

&lt;p&gt;This constraint is intentional: the UI is for inspection, not experimentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  What did &lt;em&gt;not&lt;/em&gt; change
&lt;/h2&gt;

&lt;p&gt;Several things were deliberately left out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No prompt execution or playground&lt;/li&gt;
&lt;li&gt;No agent framework&lt;/li&gt;
&lt;li&gt;No cloud sync or backend service&lt;/li&gt;
&lt;li&gt;No automatic evaluation or scoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PromptLedger is still a &lt;strong&gt;ledger&lt;/strong&gt;, not an environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migration and compatibility
&lt;/h2&gt;

&lt;p&gt;v0.2 introduces a database schema migration, but it is fully backwards compatible.&lt;/p&gt;

&lt;p&gt;Existing databases are upgraded in place, with no data loss.&lt;/p&gt;

&lt;p&gt;The new tables add information, but do not change existing semantics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;v0.2 did not make PromptLedger bigger — it made it clearer.&lt;/p&gt;

&lt;p&gt;By adding release semantics, auditability, and better inspection tools, prompts can now be reviewed and promoted with the same discipline as code.&lt;/p&gt;

&lt;p&gt;No servers. No dashboards. No magic.&lt;/p&gt;

&lt;p&gt;Just history, diffs, and intent — stored locally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;Pypi : &lt;a href="https://pypi.org/project/promptledger/" rel="noopener noreferrer"&gt;Pypi&lt;/a&gt;&lt;br&gt;
Github : &lt;a href="https://github.com/Ertugrulmutlu/promptledger" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;br&gt;
My Linkedin : &lt;a href="https://www.linkedin.com/in/ertu%C4%9Frul-mutlu/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;br&gt;
My Website : &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>python</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Hysteresis in Neural Networks — Part 1</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Fri, 09 Jan 2026 14:09:14 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/hysteresis-in-neural-networks-part-1-3c03</link>
      <guid>https://forem.com/ertugrulmutlu/hysteresis-in-neural-networks-part-1-3c03</guid>
      <description>&lt;h2&gt;
  
  
  Training Order Is Not Innocent
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What if seeing the same data is not enough?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A common, almost implicit assumption in machine learning is that &lt;strong&gt;if a model is trained on the same dataset with the same architecture and optimizer, it should end up learning essentially the same thing&lt;/strong&gt;. Training order is usually treated as an implementation detail — a convenience, not a defining factor.&lt;/p&gt;

&lt;p&gt;In this post, I show a simple but striking counterexample:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Even when two neural networks see exactly the same data, the &lt;em&gt;order&lt;/em&gt; in which the data is presented can determine what the model permanently remembers and what it completely forgets.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the first part of a short series on &lt;em&gt;hysteresis in neural networks&lt;/em&gt;. Here, I focus only on observable behavior (accuracy and forgetting). In the next part, we will look inside the model and explain &lt;em&gt;why&lt;/em&gt; this happens geometrically.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Question
&lt;/h2&gt;

&lt;p&gt;Assume we have two datasets, &lt;strong&gt;A&lt;/strong&gt; and &lt;strong&gt;B&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We train the same model in two different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SAB&lt;/strong&gt;: first on A, then on B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SBA&lt;/strong&gt;: first on B, then on A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The architecture is identical&lt;/li&gt;
&lt;li&gt;The optimizer and hyperparameters are identical&lt;/li&gt;
&lt;li&gt;The random seed is fixed&lt;/li&gt;
&lt;li&gt;The union of the data is the same: A ∪ B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only difference is &lt;strong&gt;chronological order&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The standard intuition is that this should not matter.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This intuition is wrong.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Experimental Setup (Intentionally Simple)
&lt;/h2&gt;

&lt;p&gt;To avoid hiding effects behind complexity, I used a deliberately minimal setup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dataset&lt;/strong&gt;: MNIST&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Split&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A = digits {0,1,2,3,4}&lt;/li&gt;
&lt;li&gt;B = digits {5,6,7,8,9}&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: small CNN + MLP head&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;: 20 epochs total (10 per phase)&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Several things were &lt;em&gt;explicitly disabled&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No Batch Normalization&lt;/li&gt;
&lt;li&gt;No data augmentation&lt;/li&gt;
&lt;li&gt;Deterministic initialization (fixed seed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures that any difference we observe is not an artifact of randomness or regularization, but a consequence of training order alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happens During Training?
&lt;/h2&gt;

&lt;p&gt;Let’s start with the &lt;strong&gt;SAB&lt;/strong&gt; scenario: the model learns A first, then B.&lt;/p&gt;

&lt;h3&gt;
  
  
  SAB: A → B
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Observed behavior&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During the first phase, accuracy on A quickly rises to ~99%&lt;/li&gt;
&lt;li&gt;Accuracy on B remains near 0% (expected)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After switching to B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy on B rises to ~99%&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Accuracy on A collapses to 0%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  SBA: B → A
&lt;/h3&gt;

&lt;p&gt;The mirror experiment produces the mirror result:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observed behavior&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During the first phase, accuracy on B rises to ~99%&lt;/li&gt;
&lt;li&gt;Accuracy on A remains near 0%&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After switching to A:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy on A rises to ~99%&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Accuracy on B collapses to 0%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Accuracy Curves
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SAB accuracy over epochs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp6w9v27qbcusko1htpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp6w9v27qbcusko1htpz.png" alt="SAB accuracy over epochs" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Description:&lt;/em&gt; Line plot showing &lt;code&gt;acc_A&lt;/code&gt;, &lt;code&gt;acc_B&lt;/code&gt;, and &lt;code&gt;acc_full&lt;/code&gt; across epochs. A vertical dashed line marks the phase transition (A → B). Accuracy on A drops sharply after the transition.&lt;/p&gt;




&lt;h3&gt;
  
  
  SBA accuracy over epochs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff82nk9au3a6u99uwt4mf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff82nk9au3a6u99uwt4mf.png" alt="SBA accuracy over epochs" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Description:&lt;/em&gt; Same plot as above, but for SBA. Accuracy on B drops sharply after switching to A.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Subtle but Important Observation
&lt;/h2&gt;

&lt;p&gt;Despite the dramatic forgetting, &lt;strong&gt;overall accuracy on the full test set remains around ~50% in both cases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is not a contradiction.&lt;/p&gt;

&lt;p&gt;Each model performs extremely well on &lt;em&gt;half&lt;/em&gt; of the classes and completely fails on the other half. Aggregated metrics hide this asymmetry.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Two models can have similar overall accuracy while representing fundamentally different worlds internally.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Quantifying the Effect: Hysteresis Loss
&lt;/h2&gt;

&lt;p&gt;We can define a simple, order-dependent metric:&lt;/p&gt;

&lt;p&gt;[\mathcal{L}_{\text{hyst}}(A) = |\text{Acc}_A(SAB) - \text{Acc}_A(SBA)|]&lt;/p&gt;

&lt;p&gt;In this experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acc_A(SAB) ≈ 0.00&lt;/li&gt;
&lt;li&gt;Acc_A(SBA) ≈ 0.99&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This yields a hysteresis loss close to &lt;strong&gt;1.0&lt;/strong&gt;, i.e. the maximum possible difference.&lt;/p&gt;

&lt;p&gt;The same holds symmetrically for dataset B.&lt;/p&gt;




&lt;h3&gt;
  
  
  Hysteresis summary
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fniatonl4y3kwpf2iq8b5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fniatonl4y3kwpf2iq8b5.png" alt="Hysteresis summary" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Description:&lt;/em&gt; Bar chart showing absolute accuracy differences between SAB and SBA. &lt;br&gt;
Hysteresis for the individual subsets A and B is near-maximal, while hysteresis in the aggregate (full test accuracy) is close to zero and visually compressed due to the shared scale.&lt;/p&gt;

&lt;p&gt;This highlights a key point: global performance metrics can remain almost invariant, even when class-conditional behavior is maximally path-dependent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;This result shows that training order is not just an optimization detail.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The network does not converge to a single, order-independent solution&lt;/li&gt;
&lt;li&gt;Learning leaves &lt;em&gt;path-dependent traces&lt;/em&gt; in weight space&lt;/li&gt;
&lt;li&gt;Once the model commits to one subset, returning to a balanced representation is not trivial&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This behavior is strongly reminiscent of &lt;strong&gt;hysteresis in physical systems&lt;/strong&gt;, where the final state depends on the path taken, not only on the endpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Post Does &lt;em&gt;Not&lt;/em&gt; Explain (Yet)
&lt;/h2&gt;

&lt;p&gt;This post only shows &lt;em&gt;that&lt;/em&gt; hysteresis exists.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the two final models lie in different basins of attraction&lt;/li&gt;
&lt;li&gt;Whether their internal representations are aligned or incompatible&lt;/li&gt;
&lt;li&gt;Whether one can smoothly interpolate between them without loss spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions require looking at the &lt;strong&gt;geometry of the weight space&lt;/strong&gt;, not just accuracy curves.&lt;/p&gt;

&lt;p&gt;That is exactly what Part 2 will address.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;All experiments were run with a fixed configuration and deterministic setup. The code used for this post (FAZ 1) is available on GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Ertugrulmutlu/hysteresis-neural-networks" rel="noopener noreferrer"&gt;Hysteresis in Neural Networks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The geometric analysis (weight trajectories, representation similarity, interpolation barriers) will be released together with the next post.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;If training order alone can erase entire subsets of knowledge, then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Curriculum learning has hidden costs&lt;/li&gt;
&lt;li&gt;"Same data" does not imply "same model"&lt;/li&gt;
&lt;li&gt;Optimization is not just minimization — it is a &lt;em&gt;history-dependent process&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next post, we will open the model and examine how these irreversible choices are encoded in the geometry of neural manifolds.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2: Inside the Weight Space — Geometry of Hysteresis&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;Linkedin: &lt;a href="https://www.linkedin.com/in/ertugrul-mutlu/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;br&gt;
Github: &lt;a href="https://github.com/Ertugrulmutlu" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>computerscience</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Deterministic Decision Making in Non-Deterministic Environments: Why all my projects fight the same problem</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Sun, 04 Jan 2026 02:34:46 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/deterministic-decision-making-in-non-deterministic-environments-why-all-my-projects-fight-the-same-c8p</link>
      <guid>https://forem.com/ertugrulmutlu/deterministic-decision-making-in-non-deterministic-environments-why-all-my-projects-fight-the-same-c8p</guid>
      <description>&lt;p&gt;The real world is not deterministic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Data is &lt;em&gt;noisy&lt;/em&gt;.&lt;br&gt;
Sensors &lt;em&gt;drift&lt;/em&gt;.&lt;br&gt;
Timing &lt;em&gt;slips&lt;/em&gt;.&lt;br&gt;
Humans are &lt;em&gt;inconsistent&lt;/em&gt;.&lt;br&gt;
Systems change &lt;em&gt;quietly&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And yet, we expect systems to do one thing reliably:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;make decisions.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ideally, those decisions should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeatable,&lt;/li&gt;
&lt;li&gt;explainable,&lt;/li&gt;
&lt;li&gt;auditable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the question that connects all of my work:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How can decision-making systems remain deterministic in a non-deterministic world?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is not a project list.&lt;br&gt;
It is a description of the problem I keep returning to.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Model Fallacy
&lt;/h2&gt;

&lt;p&gt;For a long time, I followed the standard recipe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more data&lt;/li&gt;
&lt;li&gt;larger models&lt;/li&gt;
&lt;li&gt;better metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real systems challenged that belief.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model can be correct.&lt;br&gt;
The system can still fail.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The root cause is often not the loss function,&lt;br&gt;
not the optimizer,&lt;br&gt;
not the architecture.&lt;/p&gt;

&lt;p&gt;It is the &lt;strong&gt;system design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Hidden state.&lt;br&gt;
Implicit dependencies.&lt;br&gt;
Untracked changes.&lt;br&gt;
Unmeasured timing effects.&lt;/p&gt;

&lt;p&gt;At that point, one thing became clear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Decision-making is not only an ML problem.&lt;br&gt;
&lt;strong&gt;It is a systems problem.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  One Question, Different Layers
&lt;/h2&gt;

&lt;p&gt;That realization forced me to stop thinking in terms of single projects.&lt;/p&gt;

&lt;p&gt;Instead, I began exploring the same question across different layers of the stack —&lt;br&gt;
not to accumulate work,&lt;br&gt;
but to expose failure modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Same Question, Different Layers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Example Projects&lt;/th&gt;
&lt;th&gt;What It Revealed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt / Decision Logic&lt;/td&gt;
&lt;td&gt;PromptLedger&lt;/td&gt;
&lt;td&gt;If prompts are not versioned like code, decision logic becomes unreliable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System Design&lt;/td&gt;
&lt;td&gt;Bad Decision System&lt;/td&gt;
&lt;td&gt;Determinism does not come from the model, but from design discipline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perception → Action&lt;/td&gt;
&lt;td&gt;JumpNet (real-time latency failures)&lt;/td&gt;
&lt;td&gt;A correct decision made at the wrong time is still wrong.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware / Edge&lt;/td&gt;
&lt;td&gt;Pico Trend Alarm, Sound Classifier, Mini SCADA&lt;/td&gt;
&lt;td&gt;Noisy sensors and tight resources force explicit FSMs and hysteresis.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Control&lt;/td&gt;
&lt;td&gt;Custom data collectors, viewers, logging tools&lt;/td&gt;
&lt;td&gt;Data you don’t control will control system behavior.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Decision Logic: The Prompt Layer
&lt;/h2&gt;

&lt;p&gt;While building agentic workflows, I noticed a recurring issue:&lt;/p&gt;

&lt;p&gt;Prompts behave like executable logic,&lt;br&gt;
but are rarely treated as such.&lt;/p&gt;

&lt;p&gt;Which prompt is in production?&lt;br&gt;
Why did it change?&lt;br&gt;
What behavior shifted as a result?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/ertugrulmutlu/promptledger-local-first-prompt-version-control-2pk0"&gt;&lt;strong&gt;PromptLedger&lt;/strong&gt;&lt;/a&gt; emerged from this gap.&lt;/p&gt;

&lt;p&gt;Not as a prompt playground,&lt;br&gt;
but as an attempt to make decision logic inspectable, versioned, and deterministic.&lt;/p&gt;

&lt;p&gt;If the logic that guides decisions is unstable,&lt;br&gt;
the system itself cannot be trusted.&lt;/p&gt;




&lt;h2&gt;
  
  
  System Design: When Correct Models Fail
&lt;/h2&gt;

&lt;p&gt;To isolate system-level failure, I intentionally built a flawed decision pipeline.&lt;/p&gt;

&lt;p&gt;Same inputs.&lt;br&gt;
Same model.&lt;br&gt;
Different outputs across identical runs.&lt;/p&gt;

&lt;p&gt;The cause was not randomness in the model,&lt;br&gt;
but global state, side effects, and hidden coupling.&lt;/p&gt;

&lt;p&gt;That experiment reinforced a core principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Determinism is a property of design,&lt;br&gt;
not of the model.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Perception → Decision → Action: Timing Matters
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/ertugrulmutlu/jumpnet-part-1-from-raw-gameplay-to-labeled-intelligence-building-the-data-foundation-for-2e2f"&gt;JumpNet&lt;/a&gt;, offline metrics looked nearly perfect.&lt;/p&gt;

&lt;p&gt;In real-time execution, the system failed.&lt;/p&gt;

&lt;p&gt;A 50 ms delay was enough to alter behavior.&lt;br&gt;
Minor prediction errors cascaded into visible mistakes.&lt;/p&gt;

&lt;p&gt;The lesson was unavoidable:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A correct decision made at the wrong time&lt;br&gt;
is still a wrong decision.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Determinism is about &lt;em&gt;when&lt;/em&gt; as much as &lt;em&gt;what&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hardware: Constraints Shape Decisions
&lt;/h2&gt;

&lt;p&gt;Deploying models on embedded hardware exposes assumptions that rarely surface in simulation.&lt;/p&gt;

&lt;p&gt;Limited memory.&lt;br&gt;
Noisy sensors.&lt;br&gt;
Strict latency budgets.&lt;br&gt;
Restricted numerical precision.&lt;/p&gt;

&lt;p&gt;Models simplify.&lt;br&gt;
Control logic becomes explicit.&lt;br&gt;
FSMs and hysteresis replace soft heuristics.&lt;/p&gt;

&lt;p&gt;Here, determinism is not optional.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;It is required for the system to function at all.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Data: Control the Inputs or Lose the System
&lt;/h2&gt;

&lt;p&gt;When a system behaves unexpectedly, I start by inspecting the data.&lt;/p&gt;

&lt;p&gt;That led me to build custom:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/ertugrulmutlu/modular-snip-recorder-a-data-collection-tool-for-behavior-cloning-12-5di8"&gt;data collection tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ertugrulmutlu/modular-snip-recorder-a-data-collection-tool-for-behavior-cloning-12-5di8"&gt;logging pipelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ertugrulmutlu/modular-snip-recorder-a-data-collection-tool-for-behavior-cloning-12-5di8"&gt;dataset viewers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;synchronization and inspection utilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because one rule keeps repeating itself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Uncontrolled data produces non-deterministic behavior.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  This Is Not a CV
&lt;/h2&gt;

&lt;p&gt;This text is not a list of skills or frameworks.&lt;/p&gt;

&lt;p&gt;It is a stance.&lt;/p&gt;

&lt;p&gt;I am not optimizing for larger models.&lt;br&gt;
I am not chasing more impressive demos.&lt;/p&gt;

&lt;p&gt;I am interested in one problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reliable decision-making under real-world uncertainty.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Deterministic Decision Making in Non-Deterministic Environments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not a slogan.&lt;br&gt;
It is a filter.&lt;/p&gt;

&lt;p&gt;It determines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which problems I choose to work on,&lt;/li&gt;
&lt;li&gt;which ones I deliberately avoid,&lt;/li&gt;
&lt;li&gt;and where I draw the line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the work spans multiple domains.&lt;/p&gt;

&lt;p&gt;They all wrestle with the same question.&lt;/p&gt;




&lt;p&gt;If you care more about building &lt;strong&gt;reliable systems&lt;/strong&gt; than bigger models,&lt;br&gt;
and about understanding &lt;strong&gt;failure modes&lt;/strong&gt; rather than showcasing demos,&lt;br&gt;
you may find the following useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/ertugrulmutlu" rel="noopener noreferrer"&gt;https://github.com/ertugrulmutlu&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;dev.to series: &lt;a href="https://dev.to/ertugrulmutlu"&gt;https://dev.to/ertugrulmutlu&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Try PromptLedger: &lt;code&gt;pip install promptledger&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>systemdesign</category>
      <category>edgeai</category>
      <category>engineeringphilosophy</category>
    </item>
    <item>
      <title>PromptLedger: Local-first prompt version control</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Sat, 03 Jan 2026 21:18:11 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/promptledger-local-first-prompt-version-control-2pk0</link>
      <guid>https://forem.com/ertugrulmutlu/promptledger-local-first-prompt-version-control-2pk0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Treat prompts like code: version them locally, diff them, label releases, and inspect history — without any backend services.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feed01shndo6r2f06jnrl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feed01shndo6r2f06jnrl.png" alt=" " width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Prompt engineering has quietly become production work. Prompts evolve, regress, get tuned for edge cases, and eventually land in “prod”. Yet most teams still track them in scratch files, notebooks, or chat logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PromptLedger&lt;/strong&gt; is a deliberately small tool that fixes this by treating prompts like code: every change is versioned, diffable, and labeled — all stored locally in a single SQLite database.&lt;/p&gt;

&lt;p&gt;This post is a technical deep dive into how PromptLedger works, what data it stores, and why its design is intentionally boring.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design goals
&lt;/h2&gt;

&lt;p&gt;PromptLedger is built around a few strict constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local-first&lt;/strong&gt;: no server, no SaaS, no telemetry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single source of truth&lt;/strong&gt;: one SQLite file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git-aware&lt;/strong&gt;: works naturally inside repositories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only UI&lt;/strong&gt;: all writes go through the CLI/API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic output&lt;/strong&gt;: exports are stable and diffable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are looking for an agent framework or a prompt playground, this is not it.&lt;/p&gt;




&lt;p&gt;tecture overview&lt;/p&gt;

&lt;p&gt;The codebase is intentionally small and split by responsibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cli.py&lt;/code&gt; – argument parsing and user-facing commands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;core.py&lt;/code&gt; – domain logic (&lt;code&gt;PromptLedger&lt;/code&gt; class)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db.py&lt;/code&gt; – SQLite connection, schema, migrations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ui.py&lt;/code&gt; – Streamlit-based read-only viewer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are no background services and no long-running processes. PromptLedger runs when you invoke it and exits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhf75zf0b2fdc843vlf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhf75zf0b2fdc843vlf2.png" alt="Architecture" width="316" height="306"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Storage and path resolution
&lt;/h2&gt;

&lt;p&gt;PromptLedger always stores data locally in &lt;code&gt;promptledger.db&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Resolution rules are deterministic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inside a git repo: &lt;code&gt;&amp;lt;repo_root&amp;gt;/.promptledger/promptledger.db&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Outside git: &lt;code&gt;&amp;lt;cwd&amp;gt;/.promptledger/promptledger.db&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Override with &lt;code&gt;PROMPTLEDGER_HOME=/custom/path&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Hard override via &lt;code&gt;PromptLedger(db_path="/abs/path/to.db")&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This avoids accidental duplication when running commands from nested directories and keeps prompt data out of git history by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  SQLite schema
&lt;/h2&gt;

&lt;p&gt;Two tables make up the core of PromptLedger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;prompt_versions&lt;/code&gt;: immutable prompt history&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;labels&lt;/code&gt;: mutable pointers to specific versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each prompt version stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;prompt_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;version&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;content&lt;/code&gt; (the prompt text)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;content_hash&lt;/code&gt; (SHA-256)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;created_at&lt;/code&gt; (UTC ISO timestamp)&lt;/li&gt;
&lt;li&gt;optional metadata: &lt;code&gt;reason&lt;/code&gt;, &lt;code&gt;author&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, &lt;code&gt;env&lt;/code&gt;, &lt;code&gt;metrics&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Labels store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;prompt_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;label&lt;/code&gt; (e.g. &lt;code&gt;prod&lt;/code&gt;, &lt;code&gt;staging&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;version&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;updated_at&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation lets you move release pointers without creating new versions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Versioning algorithm
&lt;/h2&gt;

&lt;p&gt;When you add a prompt, PromptLedger:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Normalizes newlines (CRLF/CR → LF)&lt;/li&gt;
&lt;li&gt;Hashes the normalized content&lt;/li&gt;
&lt;li&gt;Fetches the latest version for that prompt&lt;/li&gt;
&lt;li&gt;Skips insertion if the hash matches (no-op)&lt;/li&gt;
&lt;li&gt;Otherwise inserts a new version with incremented number&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps history clean and avoids formatting-only noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  Newline normalization
&lt;/h2&gt;

&lt;p&gt;Cross-platform newline differences are a common source of useless diffs.&lt;/p&gt;

&lt;p&gt;PromptLedger normalizes line endings &lt;strong&gt;before hashing and diffing&lt;/strong&gt;, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows CRLF and Unix LF content are treated as identical&lt;/li&gt;
&lt;li&gt;Diff output focuses on real textual changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Metadata model
&lt;/h2&gt;

&lt;p&gt;PromptLedger supports lightweight metadata for each version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;reason&lt;/code&gt; – why the prompt changed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;author&lt;/code&gt; – who made the change&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tags&lt;/code&gt; – arbitrary labels for grouping&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;env&lt;/code&gt; – &lt;code&gt;dev&lt;/code&gt;, &lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;prod&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;metrics&lt;/code&gt; – JSON blob (accuracy, latency, cost, ratings)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns raw text history into something closer to an audit trail.&lt;/p&gt;




&lt;h2&gt;
  
  
  Labels: release-style pointers
&lt;/h2&gt;

&lt;p&gt;Labels are the feature that pushes PromptLedger beyond simple history tracking.&lt;/p&gt;

&lt;p&gt;Think of labels like git tags that move:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger label &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--version&lt;/span&gt; 7 &lt;span class="nt"&gt;--name&lt;/span&gt; prod
promptledger label &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--id&lt;/span&gt; onboarding &lt;span class="nt"&gt;--version&lt;/span&gt; 9 &lt;span class="nt"&gt;--name&lt;/span&gt; staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What prompt is currently in production?&lt;/li&gt;
&lt;li&gt;Which version was deployed last week?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without copying or duplicating prompt content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchzaj8fgrdrtvfmo4e7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchzaj8fgrdrtvfmo4e7j.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  CLI workflow
&lt;/h2&gt;

&lt;p&gt;Core commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;init&lt;/code&gt; – create DB and &lt;code&gt;.gitignore&lt;/code&gt; entry&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;add&lt;/code&gt; – add or skip a version&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list&lt;/code&gt; – list versions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;show&lt;/code&gt; – show content + metadata&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;diff&lt;/code&gt; – unified diff between versions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search&lt;/code&gt; – content + metadata search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;export&lt;/code&gt; – deterministic JSONL / CSV&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;label&lt;/code&gt; – manage release pointers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ui&lt;/code&gt; – launch Streamlit viewer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A typical flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptledger init
promptledger add &lt;span class="nt"&gt;--id&lt;/span&gt; demo &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt;
promptledger add &lt;span class="nt"&gt;--id&lt;/span&gt; demo &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"Hello World"&lt;/span&gt;
promptledger diff &lt;span class="nt"&gt;--id&lt;/span&gt; demo &lt;span class="nt"&gt;--from&lt;/span&gt; 1 &lt;span class="nt"&gt;--to&lt;/span&gt; 2
promptledger label &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--id&lt;/span&gt; demo &lt;span class="nt"&gt;--version&lt;/span&gt; 2 &lt;span class="nt"&gt;--name&lt;/span&gt; prod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Streamlit UI (read-only)
&lt;/h2&gt;

&lt;p&gt;The UI is intentionally non-destructive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeline view of versions&lt;/li&gt;
&lt;li&gt;Filters by prompt id, tags, env&lt;/li&gt;
&lt;li&gt;Full content preview&lt;/li&gt;
&lt;li&gt;Unified diff and side-by-side comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All writes remain in the CLI/API path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1iyyle0e2tnja7s9the.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1iyyle0e2tnja7s9the.png" alt=" " width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Export and determinism
&lt;/h2&gt;

&lt;p&gt;Exports are designed for reproducibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSONL uses sorted keys&lt;/li&gt;
&lt;li&gt;CSV has a fixed column order&lt;/li&gt;
&lt;li&gt;Repeated exports of the same data are byte-for-byte identical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes PromptLedger suitable for audits, reviews, and downstream tooling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security notes
&lt;/h2&gt;

&lt;p&gt;PromptLedger never sends data anywhere.&lt;/p&gt;

&lt;p&gt;It includes a minimal warning for common secret patterns (&lt;code&gt;sk-&lt;/code&gt;, &lt;code&gt;AKIA&lt;/code&gt;, &lt;code&gt;-----BEGIN&lt;/code&gt;). This is advisory only and can be disabled. The responsibility remains with the user to avoid storing secrets in prompt text.&lt;/p&gt;




&lt;h2&gt;
  
  
  What PromptLedger is not
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Not an LLM framework&lt;/li&gt;
&lt;li&gt;Not an agent system&lt;/li&gt;
&lt;li&gt;Not a hosted service&lt;/li&gt;
&lt;li&gt;Not a playground&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a local ledger for prompt evolution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workflow for those interested
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wch130vhnaxh663do9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wch130vhnaxh663do9u.png" alt=" " width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;If your prompts matter enough to review, promote, and roll back, they matter enough to version properly.&lt;/p&gt;

&lt;p&gt;PromptLedger keeps that history local, inspectable, and boring — which is exactly the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contact and Links
&lt;/h2&gt;

&lt;p&gt;Pypi : &lt;a href="https://pypi.org/project/promptledger/" rel="noopener noreferrer"&gt;Pypi&lt;/a&gt;&lt;br&gt;
Github : &lt;a href="https://github.com/Ertugrulmutlu/promptledger" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;br&gt;
My Linkedin : &lt;a href="https://www.linkedin.com/in/ertu%C4%9Frul-mutlu/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;br&gt;
My Website : &lt;a href="https://ertugrulmutlu.github.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>developertools</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Intentionally Built a Bad Decision System (So You Don’t Have To)</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Fri, 19 Dec 2025 11:00:00 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/i-intentionally-built-a-bad-decision-system-so-you-dont-have-to-417j</link>
      <guid>https://forem.com/ertugrulmutlu/i-intentionally-built-a-bad-decision-system-so-you-dont-have-to-417j</guid>
      <description>&lt;h2&gt;
  
  
  A tiny benchmark that exposes silent failure modes in AI and ML pipelines
&lt;/h2&gt;

&lt;p&gt;Most AI blog posts show &lt;em&gt;best practices&lt;/em&gt;: clean architectures, neat abstractions, and impressive demos. I decided to do the opposite.&lt;/p&gt;

&lt;p&gt;I intentionally built a &lt;strong&gt;bad AI system&lt;/strong&gt; — one that &lt;em&gt;works&lt;/em&gt;, produces outputs, and even looks reasonable at first glance — and then compared it to a boring, well-designed version of the &lt;strong&gt;same pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The goal was not performance. The goal was to understand &lt;strong&gt;how systems fail silently&lt;/strong&gt; when design principles are ignored.&lt;/p&gt;




&lt;h2&gt;
  
  
  The task: same problem, two implementations
&lt;/h2&gt;

&lt;p&gt;Both systems solve the exact same problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Input text → extract keywords → compute a score → recommend an action&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The action space is deliberately small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;WAIT_AND_SEE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BUY_MORE_STOCK&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PANIC_REORDER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping the task simple allows us to focus entirely on &lt;strong&gt;system behavior&lt;/strong&gt;, not model quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  The benchmark idea
&lt;/h2&gt;

&lt;p&gt;The benchmark is intentionally minimal:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a &lt;strong&gt;single, fixed input text&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Run it &lt;strong&gt;multiple times&lt;/strong&gt; through the system&lt;/li&gt;
&lt;li&gt;Observe whether the outputs stay stable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A system that only works once is not a system — it’s a coincidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the same input produces different outputs, something is fundamentally wrong at the system level.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark results: BAD vs GOOD
&lt;/h2&gt;

&lt;p&gt;The following results were produced by running the &lt;strong&gt;same input&lt;/strong&gt; five times through both systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  BAD system output (excerpt)
&lt;/h3&gt;

&lt;p&gt;The BAD system gradually escalates its decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run 1 → score &lt;code&gt;14&lt;/code&gt;, action &lt;code&gt;WAIT_AND_SEE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run 3 → score &lt;code&gt;42&lt;/code&gt;, action &lt;code&gt;BUY_MORE_STOCK&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run 5 → score &lt;code&gt;74&lt;/code&gt;, action &lt;code&gt;PANIC_REORDER&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same input. Same keywords. Completely different decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Aggregated benchmark summary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;BAD system&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs: 5&lt;/li&gt;
&lt;li&gt;Unique scores: 5&lt;/li&gt;
&lt;li&gt;Scores: &lt;code&gt;[14, 28, 42, 58, 74]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Unique actions: 3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GOOD system&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs: 5&lt;/li&gt;
&lt;li&gt;Unique scores: 1&lt;/li&gt;
&lt;li&gt;Scores: &lt;code&gt;[14, 14, 14, 14, 14]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Unique actions: 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The GOOD system behaves like a function. The BAD system behaves like a memory leak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Failure Taxonomy: How the BAD System Breaks
&lt;/h2&gt;

&lt;p&gt;The bad system does not fail in a single obvious way. Instead, it exhibits &lt;strong&gt;multiple interacting failure modes&lt;/strong&gt; that are common in real-world AI and data systems. Naming these failure modes makes them easier to detect—and harder to accidentally ship.&lt;/p&gt;




&lt;h3&gt;
  
  
  1) Drift
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; The system’s output changes over time even when the input stays exactly the same.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global score accumulation across runs&lt;/li&gt;
&lt;li&gt;State that grows monotonically without reset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is dangerous:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business logic mutates without any explicit change&lt;/li&gt;
&lt;li&gt;Historical execution order influences current decisions&lt;/li&gt;
&lt;li&gt;Monitoring dashboards often miss the problem because values remain “reasonable”&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Drift is especially dangerous because it looks like learning—but it isn’t.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2) Non-determinism
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Identical inputs produce different outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random noise injected into scoring&lt;/li&gt;
&lt;li&gt;Implicit dependency on execution history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is dangerous:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bugs cannot be reliably reproduced&lt;/li&gt;
&lt;li&gt;Test failures become flaky and untrustworthy&lt;/li&gt;
&lt;li&gt;A/B experiments lose statistical meaning&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If you can’t reproduce a decision, you can’t debug it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  3) Hidden State
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; Functions rely on data that is not visible in their interface or inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global variables such as &lt;code&gt;CURRENT_SCORE&lt;/code&gt;, &lt;code&gt;LAST_TEXT&lt;/code&gt;, and &lt;code&gt;RUN_COUNT&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is dangerous:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code cannot be understood locally&lt;/li&gt;
&lt;li&gt;Refactoring changes behavior in non-obvious ways&lt;/li&gt;
&lt;li&gt;New contributors unknowingly introduce regressions&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Hidden state turns every function call into a guessing game.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  4) Silent Corruption
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; The system continues to run without errors while its decisions become increasingly wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No explicit failure signals&lt;/li&gt;
&lt;li&gt;No invariants or sanity checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is dangerous:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect outputs propagate downstream&lt;/li&gt;
&lt;li&gt;Problems surface only through business impact&lt;/li&gt;
&lt;li&gt;Rollbacks become difficult or impossible&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Loud failures get fixed. Silent failures get deployed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why This Taxonomy Matters
&lt;/h2&gt;

&lt;p&gt;These failure modes rarely appear in isolation. In the BAD system, they reinforce each other:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hidden state enables drift&lt;/li&gt;
&lt;li&gt;Drift amplifies non-determinism&lt;/li&gt;
&lt;li&gt;Non-determinism hides silent corruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these patterns is more valuable than fixing any single bug—because the same taxonomy applies to much larger and more complex AI systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  A single metric: Stability Score
&lt;/h2&gt;

&lt;p&gt;To summarize system behavior, I used a single metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stability_score = 1 - (unique_scores / runs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1.0&lt;/strong&gt; → perfectly stable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.0&lt;/strong&gt; → completely unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stability results
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;BAD system → &lt;code&gt;0.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GOOD system → &lt;code&gt;0.8&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This one number already tells you which system you can trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Minimal Fixes: Four Small Patches That Change Everything
&lt;/h2&gt;

&lt;p&gt;This is not a rewrite. These are &lt;strong&gt;surgical changes&lt;/strong&gt;. Each patch removes an entire &lt;em&gt;class&lt;/em&gt; of failure modes without introducing new abstractions or frameworks.&lt;/p&gt;




&lt;h3&gt;
  
  
  Patch 1 — Remove Global State
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before (BAD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# global mutation + history dependence
&lt;/span&gt;&lt;span class="n"&gt;GS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CURRENT_SCORE&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;GS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CURRENT_SCORE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (GOOD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_keywords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eliminates score drift&lt;/li&gt;
&lt;li&gt;Removes hidden history dependence&lt;/li&gt;
&lt;li&gt;Makes the function deterministic and testable&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A function that depends on global state is not a function — it’s a memory leak.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Patch 2 — Push Side-Effects to the Boundaries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before (BAD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_keywords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extracting keywords...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;log.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (GOOD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_keywords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# side-effects handled explicitly at the edge
&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extracting keywords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core logic becomes reusable&lt;/li&gt;
&lt;li&gt;Logging becomes configurable&lt;/li&gt;
&lt;li&gt;Unit testing becomes trivial&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Side-effects inside core logic silently infect everything upstream.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Patch 3 — Make Dependencies Explicit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before (BAD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;GS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LAST_TEXT&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LAST_TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (GOOD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_keywords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No hidden inputs&lt;/li&gt;
&lt;li&gt;Clear data flow&lt;/li&gt;
&lt;li&gt;Safe refactoring&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If a dependency isn’t in the function signature, it’s a liability.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Patch 4 — Name the Magic Numbers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before (BAD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PANIC_REORDER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (GOOD):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;panic_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;panic_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PANIC_REORDER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decisions become explainable&lt;/li&gt;
&lt;li&gt;Parameters become reviewable&lt;/li&gt;
&lt;li&gt;Behavior changes become intentional&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Magic numbers turn engineering decisions into superstition.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;These four patches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove hidden state&lt;/li&gt;
&lt;li&gt;Eliminate non-determinism&lt;/li&gt;
&lt;li&gt;Make behavior explainable&lt;/li&gt;
&lt;li&gt;Restore trust in the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No agents. No frameworks. Just engineering discipline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;The BAD system &lt;em&gt;works&lt;/em&gt;. That’s the problem.&lt;/p&gt;

&lt;p&gt;It fails in the most dangerous way possible: &lt;strong&gt;plausibly and quietly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The GOOD system is boring, predictable, and easy to reason about — which is exactly what you want in production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Working code is not the same as a working system.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Code &amp;amp; Reproducibility
&lt;/h3&gt;

&lt;p&gt;All code used in this article — including the intentionally broken system, the clean implementation, and the benchmark — is available on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/Ertugrulmutlu/I-Intentionally-Built-a-Bad-Decision-System-So-You-Don-t-Have-To" rel="noopener noreferrer"&gt;https://github.com/Ertugrulmutlu/I-Intentionally-Built-a-Bad-Decision-System-So-You-Don-t-Have-To&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to reproduce the results, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python compare.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benchmark will run the same input multiple times through both systems and show, in a few lines of output, why predictability matters more than flashy abstractions.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How Do You Actually Optimize Agents? It Depends on the Task</title>
      <dc:creator>Ertugrul</dc:creator>
      <pubDate>Thu, 18 Dec 2025 18:28:14 +0000</pubDate>
      <link>https://forem.com/ertugrulmutlu/how-do-you-actually-optimize-agents-it-depends-on-the-task-2h0o</link>
      <guid>https://forem.com/ertugrulmutlu/how-do-you-actually-optimize-agents-it-depends-on-the-task-2h0o</guid>
      <description>&lt;p&gt;After my recent talk on Agent-in-the-Loop systems, I was asked a seemingly simple question: &lt;strong&gt;How do you optimize agents?&lt;/strong&gt;&lt;br&gt;
Link for Talk: &lt;a href="https://www.youtube.com/watch?v=HwCR59VuYn4&amp;amp;t=1888s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=HwCR59VuYn4&amp;amp;t=1888s&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first glance, this sounds like a technical question. Many people expect a concrete answer involving prompt engineering, temperature tuning, or model selection. My response, however, was far less satisfying — but far more honest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;It depends on the task.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This answer often feels like a cop-out. In reality, it reflects a deeper truth about agentic systems: &lt;strong&gt;you don’t optimize agents in isolation — you optimize the system they operate in.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Common Misconception: Optimization Means Tuning the Model
&lt;/h2&gt;

&lt;p&gt;When people talk about optimizing agents, they usually mean optimizing the underlying model. Adjust the prompt, lower the temperature, swap the model, and expect better behavior.&lt;/p&gt;

&lt;p&gt;These adjustments can help at the margins, but they rarely address the root cause of failure. That’s because an agent is not just a language model.&lt;/p&gt;

&lt;p&gt;An agent is a system composed of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a task definition&lt;/li&gt;
&lt;li&gt;an action space (what the agent is allowed to do)&lt;/li&gt;
&lt;li&gt;constraints and boundaries&lt;/li&gt;
&lt;li&gt;feedback and evaluation mechanisms&lt;/li&gt;
&lt;li&gt;stop and escalation conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these components are poorly designed, no amount of prompt tuning will make the system reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Optimization Is a Task Design Problem
&lt;/h2&gt;

&lt;p&gt;In practice, most agent failures are task design failures.&lt;/p&gt;

&lt;p&gt;Agents struggle when objectives are too broad, success criteria are vague, or responsibilities are overloaded. Instructions like &lt;em&gt;“do your best”&lt;/em&gt; or &lt;em&gt;“solve this end-to-end”&lt;/em&gt; leave too much room for interpretation and lead to unpredictable behavior.&lt;/p&gt;

&lt;p&gt;Consider the difference between these two prompts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Poorly framed task:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Analyze this document and decide what to do."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This instruction hides multiple decisions inside a single step: analysis, prioritization, and action selection. The agent has no clear notion of success or failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Well-framed task:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Summarize the document, estimate uncertainty, and escalate to a human if confidence falls below a defined threshold."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here, the task is explicit, bounded, and testable. The agent’s role is clear, and human intervention is intentionally designed rather than left implicit.&lt;/p&gt;

&lt;p&gt;Optimizing an agent often means narrowing the task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;defining what &lt;em&gt;success&lt;/em&gt; actually means&lt;/li&gt;
&lt;li&gt;specifying what the agent should &lt;strong&gt;not&lt;/strong&gt; do&lt;/li&gt;
&lt;li&gt;breaking complex goals into smaller, verifiable steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A well-framed task reduces the need for aggressive model-level optimization.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feedback Loops Matter More Than Prompts
&lt;/h2&gt;

&lt;p&gt;Another common failure point is feedback design. Agents frequently evaluate their own outputs, but self-evaluation can be misleading or overly optimistic.&lt;/p&gt;

&lt;p&gt;Effective agent systems rely on feedback loops that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timely&lt;/li&gt;
&lt;li&gt;aligned with real objectives&lt;/li&gt;
&lt;li&gt;capable of triggering escalation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If feedback arrives too late or measures the wrong thing, the agent may appear functional while gradually drifting away from its intended behavior.&lt;/p&gt;

&lt;p&gt;Human involvement is most valuable here — not in validating every decision, but in designing how feedback is generated and when intervention is required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Constraints Are Not a Limitation — They Are a Guide
&lt;/h2&gt;

&lt;p&gt;One of the most overlooked aspects of agent optimization is constraint design.&lt;/p&gt;

&lt;p&gt;Constraints define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tools an agent can use&lt;/li&gt;
&lt;li&gt;how often it can retry&lt;/li&gt;
&lt;li&gt;how much context it can consume&lt;/li&gt;
&lt;li&gt;when it must stop or ask for help&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than limiting performance, constraints provide structure. They prevent runaway behavior and make agent actions easier to reason about.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Constraints don’t weaken agents — they guide them.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Role of Humans in Optimized Agent Systems
&lt;/h2&gt;

&lt;p&gt;In optimized Agent-in-the-Loop systems, humans are not prompt engineers or micro-managers. Their role is to design the system boundaries and supervision mechanisms.&lt;/p&gt;

&lt;p&gt;Humans are best positioned to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define goals and constraints&lt;/li&gt;
&lt;li&gt;decide which failures are acceptable&lt;/li&gt;
&lt;li&gt;interpret ambiguous situations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, humans optimize the &lt;strong&gt;decision space&lt;/strong&gt;, not individual decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Agent optimization starts with task design, not model tuning&lt;/li&gt;
&lt;li&gt;Prompts and temperatures are secondary levers&lt;/li&gt;
&lt;li&gt;Feedback loops determine long-term behavior&lt;/li&gt;
&lt;li&gt;Constraints increase reliability and predictability&lt;/li&gt;
&lt;li&gt;Humans belong above the loop, not inside every step&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Optimizing agents is not about making them smarter. It’s about making the system clearer.&lt;/p&gt;

&lt;p&gt;When tasks are well-defined, feedback is meaningful, and constraints are explicit, agents don’t need to be aggressively optimized — they simply work better.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>systems</category>
    </item>
  </channel>
</rss>
