<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: GauntletCI</title>
    <description>The latest articles on Forem by GauntletCI (@gauntletci).</description>
    <link>https://forem.com/gauntletci</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3918796%2F6df556b4-e474-4313-a65a-09116f121db2.png</url>
      <title>Forem: GauntletCI</title>
      <link>https://forem.com/gauntletci</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gauntletci"/>
    <language>en</language>
    <item>
      <title>The Asymmetry of Change: Why Your Tests Are Looking the Wrong Way</title>
      <dc:creator>GauntletCI</dc:creator>
      <pubDate>Sat, 09 May 2026 01:27:41 +0000</pubDate>
      <link>https://forem.com/gauntletci/the-asymmetry-of-change-why-your-tests-are-looking-the-wrong-way-2k2f</link>
      <guid>https://forem.com/gauntletci/the-asymmetry-of-change-why-your-tests-are-looking-the-wrong-way-2k2f</guid>
      <description>&lt;h2&gt;
  
  
  The Asymmetry of Change: Why Your Tests Are Looking the Wrong Way
&lt;/h2&gt;

&lt;p&gt;A passing build is often treated as a certificate of correctness. In reality, it's a narrow contract.&lt;/p&gt;

&lt;p&gt;It doesn't prove your code is right. It proves that the assertions you wrote in the past, against behaviors you anticipated back then, still hold true today.&lt;/p&gt;

&lt;p&gt;When you open a pull request, your unit tests ask: &lt;em&gt;"Does the system still behave the way it used to?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The question you actually need to answer is different: &lt;strong&gt;"Is the new behavior I just introduced safe?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Those aren't the same thing. And that gap is exactly where production incidents live.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wrong Question
&lt;/h2&gt;

&lt;p&gt;Here's the problem: tests are a snapshot of past understanding.&lt;/p&gt;

&lt;p&gt;Your code changed. Your tests didn't. And somehow the build is still green.&lt;/p&gt;

&lt;p&gt;A guard clause disappears. No test explicitly covered it because the guard &lt;em&gt;was&lt;/em&gt; the coverage. A condition gets narrowed. An exception handler gets swapped. A state transition loses a validation step.&lt;/p&gt;

&lt;p&gt;The test suite sees none of this, because it was never asked to care about these things. It was asked about something else. Something that still works fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Evidence
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical. Multiple independent studies have found the same pattern across the mainstream programming ecosystems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test Co-Evolution Studies:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 2025 study analyzed 526 repositories across JavaScript, TypeScript, Java, Python, PHP, and C#. Finding: asynchronous evolution of tests and code is pervasive. [1] Earlier work on 975 Java projects reached the same conclusion: production code frequently changes without test updates. [2] This has been documented since at least 2010. [3]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chromium CI Study:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Researchers analyzed 1.5 million test executions across 14,000 commits. Result: even with 99.2% precision, modern flakiness detection still caused 76.2% of real regression faults to be missed. [4] Not because tests were missing. Because the tests that existed were being silenced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real Example - Django 6.0:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A refactor in the &lt;code&gt;querystring&lt;/code&gt; template tag introduced a loop that worked fine for standard dictionaries but silently broke &lt;code&gt;QueryDict&lt;/code&gt; instances. Existing tests passed. The bug shipped. It was caught only by a targeted rendered-output test that nobody thought to run regularly. [5]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Numbers:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In an analysis of 598 pull requests across 57 open-source .NET repositories, 71% of PRs submitted without test file modifications contained at least one behavioral risk indicator. [6] That's not an outlier. That's the norm.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Time Machine Problem
&lt;/h2&gt;

&lt;p&gt;Every diff is a time machine moving in one direction.&lt;/p&gt;

&lt;p&gt;The assertions stay where they were written. The code underneath moves forward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: implicit contract&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// After: contract broken, tests don't notice&lt;/span&gt;
&lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guard was always there. Because it was always there, nobody wrote a test for the null case. It was implicit in the structure. The contract was protected by accident.&lt;/p&gt;

&lt;p&gt;Remove that guard, and the test suite stays green. It's not "broken." It just never knew the guard mattered.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;Implicit Contract&lt;/strong&gt; problem. And it's everywhere.&lt;/p&gt;
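&lt;p&gt;The way out is to promote the implicit contract to an explicit one. A minimal sketch, reusing the shape of the hypothetical &lt;code&gt;Process&lt;/code&gt; snippet above (none of these names come from a real codebase): a test that pins the null case, so deleting the guard turns the build red instead of staying green.&lt;/p&gt;

```csharp
using System;

// Mirrors the guard-clause snippet above; all names here are illustrative.
string lastProcessed = null;

void ProcessUser(User user)
{
    if (user == null) return;   // the implicit contract, now pinned below
    lastProcessed = user.Name;
}

// Pinning tests: delete the guard and these throw instead of staying green.
ProcessUser(null);
if (lastProcessed != null)
    throw new Exception("Null guard was removed: a null user was processed.");

ProcessUser(new User("Ada"));
if (lastProcessed != "Ada")
    throw new Exception("A valid user was not processed.");

Console.WriteLine("Guard contract holds.");

record User(string Name);
```

&lt;p&gt;The point isn't the three lines of test code. It's that the null behavior now lives in an assertion someone will see fail, instead of in a structure someone will quietly refactor away.&lt;/p&gt;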




&lt;h2&gt;
  
  
  Why Code Review Isn't Enough
&lt;/h2&gt;

&lt;p&gt;We rely on code review to catch these slips.&lt;/p&gt;

&lt;p&gt;But human reviewers have a context window too. On a Tuesday afternoon, looking at a 400-line diff, they might see a refactor and miss that a crucial exception handler got swapped or a validation step disappeared.&lt;/p&gt;

&lt;p&gt;We are asking humans to perform high-stakes pattern matching against a moving target. It's a process designed for fatigue.&lt;/p&gt;

&lt;p&gt;Plus: reviewers didn't write the original code. They don't carry the full behavioral contract in their head. The removed guard clause looks like cleanup. The narrowed condition looks like a legitimate business rule change.&lt;/p&gt;

&lt;p&gt;Code review is essential. But it's not a safety net. It's a second pair of eyes that also gets tired.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Deterministic Answer
&lt;/h2&gt;

&lt;p&gt;Here's what actually works: catch these patterns &lt;em&gt;before&lt;/em&gt; anyone else sees the code.&lt;/p&gt;

&lt;p&gt;Not with an LLM that sometimes forgets what you told it thirty messages ago. Not with probabilities. With deterministic rules that fire the same way every single time.&lt;/p&gt;

&lt;p&gt;A Roslyn-powered engine that scans your diff and flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removed guard clauses or defensive conditions&lt;/li&gt;
&lt;li&gt;Narrowed catch blocks (&lt;code&gt;catch(Exception)&lt;/code&gt; → &lt;code&gt;catch(ArgumentException)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Validation steps removed from state transitions&lt;/li&gt;
&lt;li&gt;Thread-blocking patterns introduced in async code (e.g., new &lt;code&gt;Thread.Sleep()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Behavioral changes that touch no test files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is a pattern that has caused real production incidents. Each can slip past a green test suite.&lt;/p&gt;
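&lt;p&gt;The narrowed-catch item is worth seeing in miniature. A hypothetical sketch (the &lt;code&gt;LoadSetting&lt;/code&gt; helper is invented for illustration, not a real rule fixture) of how narrowing &lt;code&gt;catch (Exception)&lt;/code&gt; to &lt;code&gt;catch (ArgumentException)&lt;/code&gt; silently changes the failure contract:&lt;/p&gt;

```csharp
using System;

// Illustrative only: why narrowing a catch is a behavioral change.
string LoadSetting(Exception failure)
{
    try
    {
        throw failure;  // stand-in for a call that can fail in more than one way
    }
    // Was catch (Exception): every failure fell back to the default.
    // Narrowed to ArgumentException: other failures now escape to the caller.
    catch (ArgumentException)
    {
        return "default";
    }
}

Console.WriteLine(LoadSetting(new ArgumentException("bad key")));  // "default", as before

try
{
    LoadSetting(new System.IO.IOException("disk error"));
}
catch (System.IO.IOException)
{
    Console.WriteLine("IOException escaped the narrowed catch.");
}
```

&lt;p&gt;Any test that only exercised the &lt;code&gt;ArgumentException&lt;/code&gt; path stays green through that change.&lt;/p&gt;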

&lt;p&gt;The output is a checklist, not a verdict. You still decide what's actually a risk and what isn't. But you decide with full information, at the moment of change, when the logic is still fresh in your head.&lt;/p&gt;




&lt;h2&gt;
  
  
  Moving the "Uh-Oh" Moment
&lt;/h2&gt;

&lt;p&gt;The most expensive place to have an &lt;strong&gt;"uh-oh"&lt;/strong&gt; moment is in a post-mortem.&lt;/p&gt;

&lt;p&gt;The second most expensive is a failed staging build.&lt;/p&gt;

&lt;p&gt;The goal is to move that realization to your local terminal. The millisecond you hit save. Before you even think about committing.&lt;/p&gt;

&lt;p&gt;When you catch unvalidated behavioral changes while the code is still in front of you, you don't just keep the build green. You ensure the build is actually correct.&lt;/p&gt;

&lt;p&gt;You stop the time machine before it leaves the station.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;If this problem feels familiar, you've already felt the cost of it.&lt;/p&gt;

&lt;p&gt;The question isn't whether these gaps exist. The evidence is clear: they're everywhere. The question is whether you want to keep finding them in production, or find them at the diff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn more:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/EricCogen/GauntletCI" rel="noopener noreferrer"&gt;GauntletCI on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gauntletci.com/the-asymmetry-of-change" rel="noopener noreferrer"&gt;The full article on GauntletCI.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gauntletci.com/behavioral-change-risk-formal-framework" rel="noopener noreferrer"&gt;Behavioral Change Risk: A Formal Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a id="ref1"&gt;&lt;/a&gt;[1] Miranda, J. et al. (2025). Test Co-Evolution in Software Projects: A Large-Scale Empirical Study. &lt;em&gt;Journal of Software: Evolution and Process.&lt;/em&gt; DOI: 10.1002/smr.70035&lt;/p&gt;

&lt;p&gt;&lt;a id="ref2"&gt;&lt;/a&gt;[2] Sun, W. et al. (2021). Understanding and Facilitating the Co-Evolution of Production and Test Code. &lt;em&gt;IEEE International Conference on Software Engineering (ICSE).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="ref3"&gt;&lt;/a&gt;[3] Gergely, T. et al. (2010). Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining. &lt;em&gt;Empirical Software Engineering.&lt;/em&gt; DOI: 10.1007/s10664-010-9143-7&lt;/p&gt;

&lt;p&gt;&lt;a id="ref4"&gt;&lt;/a&gt;[4] Haben, G., Habchi, S., Papadakis, M., Cordy, M., &amp;amp; Le Traon, Y. (2023). The Importance of Discerning Flaky from Fault-triggering Test Failures: A Case Study on the Chromium CI. &lt;em&gt;arXiv:2302.10594.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="ref5"&gt;&lt;/a&gt;[5] Moreau, M. (2026). How a Single Test Revealed a Bug in Django 6.0. &lt;em&gt;Lincoln Loop.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="ref6"&gt;&lt;/a&gt;[6] Cogen, E. (2025). GauntletCI Corpus Analysis. 598 pull requests across 57 open-source .NET repositories.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Eric I. Cogen builds software for production. Twenty years in .NET, twenty years of shipping bugs that tests never caught. GauntletCI is the pre-commit gate he wishes he'd had all along.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>dotnet</category>
      <category>csharp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Post-Mortem: How a "Performance" PR Introduced 28 New Regressions</title>
      <dc:creator>GauntletCI</dc:creator>
      <pubDate>Fri, 08 May 2026 01:53:14 +0000</pubDate>
      <link>https://forem.com/gauntletci/post-mortem-how-a-performance-pr-introduced-28-new-regressions-4pgf</link>
      <guid>https://forem.com/gauntletci/post-mortem-how-a-performance-pr-introduced-28-new-regressions-4pgf</guid>
      <description>&lt;h2&gt;
  
  
  Analyzing Jellyfin PR #16062 with GauntletCI
&lt;/h2&gt;

&lt;p&gt;Jellyfin PR #16062 is titled &lt;strong&gt;"Query Performance Improvements."&lt;/strong&gt; It was a massive architectural shift: 126 files, 27,810 lines of code. It was reviewed, approved, and merged on May 3, 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By May 7, the community was already reporting 90-second query hangs (Issue #16279).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We ran GauntletCI, a deterministic, rules-based Behavioral Change Risk (BCR) detector, against the merged diff. It took exactly &lt;strong&gt;660 ms&lt;/strong&gt; to surface why the "Performance Improvement" was causing performance degradation.&lt;/p&gt;




&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxr5r1a94okw5z6rt4v3.png" alt="The GauntletCI dashboard for Jellyfin PR #16062: 129 findings across 27,000+ lines, fully analyzed in 660 ms" width="800" height="721"&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Gap Between Intent and Reality
&lt;/h3&gt;

&lt;p&gt;In a 27,000-line diff, human review is a suggestion, not a safeguard. The maintainers intended to fix N+1 query patterns. They succeeded in some areas, but the sheer scale of the change made it impossible to see what was being introduced simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Performance Traps (GCI0044)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Findings:&lt;/strong&gt; 28 LINQ-in-loop patterns.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Reality:&lt;/strong&gt; While the PR closed older performance issues, it introduced 28 new ones. Specifically, 9 findings in &lt;code&gt;BaseItemRepository.TranslateQuery.cs&lt;/code&gt; map directly to the filtering logic that users are now reporting as "unbearably slow." &lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Verification:&lt;/strong&gt; &lt;a href="https://github.com/jellyfin/jellyfin/issues/16279" rel="noopener noreferrer"&gt;Issue #16279&lt;/a&gt; ("Filters query taking 90s each time") isn't a mystery. It's a structural regression that was visible in the diff 1546 ms after it was written.&lt;/p&gt;
&lt;/blockquote&gt;
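&lt;p&gt;For readers who don't live in C# query code, the flagged shape looks like this. A minimal sketch with invented data, not Jellyfin's actual filtering logic:&lt;/p&gt;

```csharp
using System;
using System.Linq;

// Invented data; illustrates the LINQ-in-loop shape GCI0044 flags.
var items = Enumerable.Range(0, 1000)
    .Select(i => new { Id = i, Name = "item" + i })
    .ToArray();
var wantedIds = Enumerable.Range(0, 500).ToArray();

// Flagged shape: FirstOrDefault rescans all 1,000 items on every iteration, O(n * m).
foreach (var id in wantedIds)
{
    var hit = items.FirstOrDefault(x => x.Id == id);
}

// Hoisted shape: build the lookup once, then each probe is O(1).
var byId = items.ToDictionary(x => x.Id);
foreach (var id in wantedIds)
{
    byId.TryGetValue(id, out var hit);
}

Console.WriteLine(byId.Count);   // prints 1000
```

&lt;p&gt;On a 1,000-item array the difference is invisible. At the scale of a real media library, it is exactly the kind of cost users feel as a hanging filter query.&lt;/p&gt;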

&lt;h3&gt;
  
  
  2. The Deadlock Time-Bombs (GCI0016)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Findings:&lt;/strong&gt; 5 Block-level async violations (&lt;code&gt;.Wait()&lt;/code&gt; and &lt;code&gt;.GetAwaiter().GetResult()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Risk:&lt;/strong&gt; These are "Heisenbugs." They often pass both locally and in CI because they require specific concurrency timing to hang a thread pool. &lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Status:&lt;/strong&gt; These are currently sitting in the master branch. They haven't "exploded" yet, but the pattern is a well-documented deadlock vector in ASP.NET Core.&lt;/li&gt;
&lt;/ul&gt;
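&lt;p&gt;The flagged pattern, in its simplest form (a hypothetical sketch, not Jellyfin's code):&lt;/p&gt;

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the sync-over-async shape GCI0016 flags.
async Task DoWorkAsync()
{
    await Task.Delay(10);
}

// Flagged: both lines park a thread until the task finishes. In a console app
// they merely waste a thread; under ASP.NET Core load the same shape can
// starve the thread pool or deadlock.
DoWorkAsync().GetAwaiter().GetResult();
DoWorkAsync().Wait();

// Safe: stay asynchronous end to end.
await DoWorkAsync();

Console.WriteLine("completed without blocking");
```

&lt;p&gt;The console version completes every time, which is precisely why the pattern survives review: it only misbehaves under the concurrency pressure it never sees in CI.&lt;/p&gt;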

&lt;h3&gt;
  
  
  3. The Structural Decay (GCI0038 &amp;amp; GCI0043)
&lt;/h3&gt;

&lt;p&gt;Beyond the crashes, the scan found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;45 Dependency Injection Violations:&lt;/strong&gt; Service locator anti-patterns that create hidden coupling.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;15 Type Safety Gaps:&lt;/strong&gt; &lt;code&gt;as&lt;/code&gt; casts without null checks that lead to context-less &lt;code&gt;NullReferenceException&lt;/code&gt;s.&lt;/li&gt;
&lt;/ul&gt;
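&lt;p&gt;The &lt;code&gt;as&lt;/code&gt;-cast gap in miniature (names invented for illustration):&lt;/p&gt;

```csharp
using System;

// Illustrative sketch of the type-safety shape flagged above.
object payload = 42;

var text = payload as string;        // silently null: payload is an int
try
{
    Console.WriteLine(text.Length);  // NullReferenceException, far from the cast
}
catch (NullReferenceException)
{
    Console.WriteLine("NRE with no context about the failed cast.");
}

// Safer shapes: branch explicitly, so the failure carries context.
if (payload is string s)
    Console.WriteLine(s.Length);
else
    Console.WriteLine("payload is not a string: " + payload.GetType().Name);
```

&lt;p&gt;The stack trace from the first shape points at whatever line happened to dereference the null, not at the cast that produced it.&lt;/p&gt;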




&lt;h3&gt;
  
  
  Execution Profile
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Findings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;129&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Block-Level (Merge Stoppers)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scan Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;660 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLMs Used&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why This Happened
&lt;/h3&gt;

&lt;p&gt;The Jellyfin team is talented. The problem isn't the people; it's the &lt;strong&gt;Scale of Change vs. the Speed of Human Cognition.&lt;/strong&gt; Reviewers check for intent; GauntletCI checks for structural risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;p&gt;You don't need an LLM to find these. You need a &lt;a href="https://github.com/EricCogen/GauntletCI" rel="noopener noreferrer"&gt;pessimistic verifier&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet tool &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; GauntletCI
gauntletci analyze &lt;span class="nt"&gt;--staged&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
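&lt;p&gt;To make the gate automatic rather than optional, the same command can sit in a Git pre-commit hook. A minimal sketch; it only uses the documented &lt;code&gt;gauntletci analyze --staged&lt;/code&gt; invocation above, everything else is plain Git plumbing:&lt;/p&gt;

```shell
# Run GauntletCI on the staged diff before every commit.
# Assumes the tool installed in the step above is on PATH.
mkdir -p .git/hooks
printf '#!/bin/sh\nexec gauntletci analyze --staged\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

&lt;p&gt;Because the hook runs &lt;code&gt;exec&lt;/code&gt;, the commit is blocked whenever the analyzer exits nonzero, which moves the "uh-oh" moment to the terminal, before the diff ever leaves the machine.&lt;/p&gt;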



</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>performance</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
