<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Amit Ben-Ari</title>
    <description>The latest articles on Forem by Amit Ben-Ari (@amitba).</description>
    <link>https://forem.com/amitba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827860%2F69b55ac3-d46d-4cf2-b76c-4e35fca060c1.jpg</url>
      <title>Forem: Amit Ben-Ari</title>
      <link>https://forem.com/amitba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/amitba"/>
    <language>en</language>
    <item>
      <title>Why git log --oneline is Killing Your AI-Generated PRs</title>
      <dc:creator>Amit Ben-Ari</dc:creator>
      <pubDate>Thu, 09 Apr 2026 06:30:00 +0000</pubDate>
      <link>https://forem.com/amitba/why-git-log-oneline-is-killing-your-ai-generated-prs-5gbm</link>
      <guid>https://forem.com/amitba/why-git-log-oneline-is-killing-your-ai-generated-prs-5gbm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://hivetrail.com/blog/why-git-log-oneline-kills-ai-prs" rel="noopener noreferrer"&gt;hivetrail.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I build &lt;a href="https://hivetrail.com/mesh" rel="noopener noreferrer"&gt;HiveTrail Mesh&lt;/a&gt;, a context assembly tool for LLMs. I use Claude Code, along with several other LLMs, daily. Recently, I asked it to write a pull request description for a feature I'd just finished - 27 commits across 32 files, several days of real work.&lt;/p&gt;

&lt;p&gt;The output was competent. It had a title, a summary, a list of changed files. But reading it back, something felt off. I knew what was in those commits. The architectural decisions, the edge cases I'd hunted down, the bug fixes that weren't obvious from the file names. None of it was there. The PR read like someone had skimmed the index of a book and written a summary without reading the chapters.&lt;/p&gt;

&lt;p&gt;So I did what any developer building a context tool probably would: I went looking for why.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude Code Actually Sends to the Model
&lt;/h2&gt;

&lt;p&gt;When you ask Claude Code to write a PR description, it runs two git commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git log main..HEAD &lt;span class="nt"&gt;--oneline&lt;/span&gt;
git diff main...HEAD &lt;span class="nt"&gt;--stat&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--oneline&lt;/code&gt; flag returns one line per commit: the abbreviated SHA hash and the subject line. That's it. No commit body, no co-author notes, no extended description you carefully wrote.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--stat&lt;/code&gt; flag returns a diffstat - a summary of which files changed and how many lines were added or removed. Again, no actual content. No diffs, no file contents, no context about &lt;em&gt;what&lt;/em&gt; changed or &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;
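
&lt;p&gt;To see concretely what these flags discard, here's a throwaway-repo sketch (the file name and commit message are illustrative):&lt;/p&gt;

```shell
# Throwaway repo demonstrating what --oneline discards.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo one > parser.py
git add parser.py
git commit -q -m "feat: add parser" \
  -m "Handles BOM-encoded input; falls back to UTF-8 on decode errors."

# Abbreviated SHA plus subject line only; the body is gone:
git log --oneline

# The full format keeps the body where the edge case is explained:
git log --format="%s%n%b"
```

&lt;p&gt;The commit body - the part that explains the decision - only survives in the second command.&lt;/p&gt;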

&lt;p&gt;So the model is working from something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7c22302 fix(git-tools): harden subprocess wrapper against local gitconfig pollution
6b5fd96 feat(git-tools): replace base branch input with auto-populated select
1246c27 fix(git-tools): resolve validation, state drift, and arch leaks
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus a file change summary. For a 27-commit feature branch, that's the equivalent of asking someone to explain a film by reading the chapter titles on a DVD menu.&lt;/p&gt;

&lt;p&gt;The model is good enough to produce something coherent from this - but coherent isn't the same as accurate, and it certainly isn't the same as useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Good PR Description Actually Needs
&lt;/h2&gt;

&lt;p&gt;Before getting to the fix, it's worth being specific about what's missing.&lt;/p&gt;

&lt;p&gt;A PR description serves two audiences with different needs. Developers reviewing the code want to know which layers of the codebase were touched, what edge cases were handled, and why certain decisions were made the way they were. Product managers and QA engineers want to understand user impact, workflow changes, and how to verify the feature works.&lt;/p&gt;

&lt;p&gt;When the model only has commit subject lines and a file list, it can infer &lt;em&gt;what&lt;/em&gt; changed from the file names. It cannot infer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The why&lt;/strong&gt; behind architectural decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases&lt;/strong&gt; that were discovered and handled mid-implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bug fixes&lt;/strong&gt; that are buried in commits whose subject lines don't make them obvious&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The distinction&lt;/strong&gt; between new features and hardening work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing specifics&lt;/strong&gt; - what's covered, how, and what a reviewer should manually verify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are exactly the things that separate a PR description that's useful from one that's just technically accurate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Prompt Fix: Better Claude Code Instructions
&lt;/h2&gt;

&lt;p&gt;Before reaching for a different tool, most developers will try the obvious thing first: write a better prompt. And it's a fair instinct - you can absolutely instruct Claude Code to run more thorough git commands, request full diff content, and follow a specific PR structure. Something like &lt;em&gt;"run git log with full commit bodies, fetch the complete diff for each changed file, then write a PR description organized by architectural layer with a key design decisions section"&lt;/em&gt; will produce meaningfully better output than the default.&lt;/p&gt;

&lt;p&gt;But there are real costs to this approach worth understanding. The most immediate is token burn - asking Claude Code to fetch full diffs and structured commit logs for a large branch will consume significantly more context than the default &lt;code&gt;--oneline&lt;/code&gt; summary approach, which adds up quickly if you're on a metered plan. The less obvious problem is consistency. Claude Code operates within a conversation context that degrades over a long session: early instructions get compressed, memory gets summarised, and the careful prompt you wrote at the start of a session may not be fully honoured three hours later when you finally hit merge. You're also now maintaining a prompt, not just a workflow.&lt;/p&gt;

&lt;p&gt;For this test, I deliberately used the simplest possible prompt - &lt;em&gt;"based on the staged changes / recent commits, write me a PR title and description"&lt;/em&gt; - across all three methods. The goal was to measure what each approach produces from its own capabilities, not what it produces when coached. Prompt engineering can close some of the gap, but it can't change what data the model is actually working from.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Manual Fix: Give the Model the Full Context
&lt;/h2&gt;

&lt;p&gt;The underlying problem is simple: the model is summarising a summary. The fix is to give it the actual data.&lt;/p&gt;

&lt;p&gt;Here's what that looks like manually. Before asking your LLM to write the PR, run this yourself and include the output in your prompt:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full commit log with bodies:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git log main..HEAD &lt;span class="nt"&gt;--pretty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;format:&lt;span class="s2"&gt;"%H%n%an%n%ad%n%s%n%b%n---"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Actual file diffs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git diff main...HEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Or per-commit diffs if the full diff is too large:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git log main..HEAD &lt;span class="nt"&gt;--patch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A word of warning on token volume: &lt;code&gt;git diff main...HEAD&lt;/code&gt; on my 27-commit, 32-file branch produced around 106,000 tokens - roughly 379KB of XML-structured content. That's well beyond what you'd paste into a chat window, and it approaches or exceeds the context limits of many models.&lt;/p&gt;
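
&lt;p&gt;A quick way to gauge size before pasting is the rough heuristic of about 4 characters per token for English text and code. This helper is a sketch built on that assumption - an approximation, not an exact tokenizer count:&lt;/p&gt;

```shell
# estimate_tokens: rough token count from stdin, using the common
# ~4 characters-per-token heuristic (an approximation, not an exact
# tokenizer count).
estimate_tokens() { wc -c | awk '{print int($1 / 4)}'; }

# Example: gauge a branch diff before pasting it into a chat window
#   git diff main...HEAD | estimate_tokens
```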

&lt;p&gt;This is where you need to be selective. For smaller branches - a few commits, a handful of files - pasting the full diff directly works fine. For larger branches, you have options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feed it to a model with a large context window (Gemini Pro handles this comfortably)&lt;/li&gt;
&lt;li&gt;Trim to the files and commits most relevant to the PR's purpose&lt;/li&gt;
&lt;li&gt;Use the structured approach described in the next section&lt;/li&gt;
&lt;/ul&gt;
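
&lt;p&gt;If you take the manual route regularly, a small helper can bundle the commit log and diff into one file to paste. A minimal sketch - the function name and section markers are my own, and it assumes you pass the base branch explicitly:&lt;/p&gt;

```shell
# assemble_pr_context BASE OUTFILE: writes the full commit log (with
# bodies) plus the complete branch diff into OUTFILE, ready to paste
# into an LLM chat.
assemble_pr_context() {
  base="$1"
  out="$2"
  {
    echo "=== COMMITS ==="
    git log "$base"..HEAD --pretty=format:"%H%n%s%n%b%n---"
    echo
    echo "=== DIFF ==="
    git diff "$base"...HEAD
  } > "$out"
}

# Example: assemble_pr_context main pr-context.txt
```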

&lt;p&gt;Either way, the quality difference is immediate. When I ran the same prompt - &lt;em&gt;"based on the staged changes / recent commits, write me a PR title and description"&lt;/em&gt; - with the full structured context versus Claude Code's summary approach, the outputs were not comparable. The full-context version knew about BOM-aware file encoding, NiceGUI deleted-slot errors, the decision to use &lt;code&gt;@computed_field&lt;/code&gt; to eliminate state drift. The summary version knew that &lt;code&gt;git_service.py&lt;/code&gt; was a new file.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Results Actually Showed
&lt;/h2&gt;

&lt;p&gt;I ran three versions of the same PR description for the same feature branch:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 1 - Claude Code (Sonnet 4.6)&lt;/strong&gt;&lt;br&gt;
Working from &lt;code&gt;--oneline&lt;/code&gt; and &lt;code&gt;--stat&lt;/code&gt; only. Produced a competent, file-oriented description with a good "Key design decisions" section. Flat formatting, no inline code styling, read like a wall of text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 2 - Claude web chat (Sonnet 4.6) + full structured context&lt;/strong&gt;&lt;br&gt;
The same model, but fed the complete PR Brief XML (106k tokens of structured diffs, commit metadata, and file content). Layered by architectural section, included product context, named specific edge cases and why they were handled the way they were, referenced exact test counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 3 - Claude web chat (Haiku 4.5) + full structured context&lt;/strong&gt;&lt;br&gt;
The cheapest Claude model, same full context. Produced a description nearly as strong as Version 2, with better structured sections for testing guidance and explicit "Key Design Decisions."&lt;/p&gt;

&lt;p&gt;I asked Gemini 3 Pro to evaluate all three as a neutral third party, framed as a senior developer and product manager. The ranking: Version 2 first, Version 3 second, Version 1 third.&lt;/p&gt;

&lt;p&gt;The conclusion that stood out: &lt;strong&gt;Haiku 4.5 with full context outperformed Sonnet 4.6 with shallow context.&lt;/strong&gt; The model tier mattered less than the context quality.&lt;/p&gt;

&lt;p&gt;Gemini's summary of the gap was pointed: Version 1 "forces the reviewer to do the heavy lifting." Versions 2 and 3 "treat the PR description as living documentation."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Google Gemini 3 Pro's comparison table:&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;PR Text&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Primary Strength&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Primary Weakness&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Best Audience&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PR 1 (Claude Code)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exceptional, step-by-step test plan.&lt;/td&gt;
&lt;td&gt;Flat formatting makes it hard to scan. Lacks inline code styling.&lt;/td&gt;
&lt;td&gt;QA and Product Managers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PR 2 (Sonnet + Mesh)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strongest structural hierarchy and includes "Key Design Decisions."&lt;/td&gt;
&lt;td&gt;Testing section is a status report, not an actionable guide.&lt;/td&gt;
&lt;td&gt;Senior Developers and Architects.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PR 3 (Haiku + Mesh)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent markdown formatting (inline backticks) and explicit bug-fix callouts.&lt;/td&gt;
&lt;td&gt;Lacks the deeper architectural "why" found in PR 2.&lt;/td&gt;
&lt;td&gt;General Code Reviewers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Version 1:&lt;/strong&gt; Claude Code with Sonnet 4.6:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Title:
feat(#282): add Git Tools source type (Commit Brief + PR Brief)
&lt;span class="p"&gt;
---
&lt;/span&gt;
Description:

Summary
Introduces Git Tools as a new source type in the Stack, enabling users to generate token-optimized LLM context from local git repositories.

Two item types are supported:
&lt;span class="p"&gt;
-&lt;/span&gt; Commit Brief - scans a single commit and produces a structured XML summary of changed files and diff content
&lt;span class="p"&gt;-&lt;/span&gt; PR Brief - scans a branch against a base branch, producing a structured XML summary with per-commit metadata and a selectable file/commit
checklist

What's included
&lt;span class="p"&gt;-&lt;/span&gt; services/git_service.py - new service with full subprocess wrapper...
&lt;span class="p"&gt;-&lt;/span&gt; models/stack/git_tools.py - Pydantic models for CommitBriefStackItem...
&lt;span class="p"&gt;-&lt;/span&gt; core/state/sources/git_tools.py + core/state/stack/handlers/git_tools.py...
&lt;span class="p"&gt;-&lt;/span&gt; components/common/forms/git_tools/ - CommitBriefForm, PRBriefForm...
&lt;span class="p"&gt;-&lt;/span&gt; components/sections/stack/dialogs/edit_commit_brief.py + edit_pr_brief.py - edit dialogs with...
&lt;span class="p"&gt;-&lt;/span&gt; components/sections/source/panels/git_tools/ - source panel with auto-populated base branch select ...
&lt;span class="p"&gt;-&lt;/span&gt; Full pytest coverage in tests/services/test_git_service.py

Key design decisions
&lt;span class="p"&gt;-&lt;/span&gt; Base branch is auto-populated from the repo rather than free-text input...
&lt;span class="p"&gt;-&lt;/span&gt; Partial generation failures surface as a warning state...
&lt;span class="p"&gt;-&lt;/span&gt; Commit log is stored as structured...
&lt;span class="p"&gt;-&lt;/span&gt; subprocess calls are wrapped to suppress local...
&lt;span class="p"&gt;-&lt;/span&gt; Follows the existing async stack lifecycle...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version 2:&lt;/strong&gt; Claude Sonnet 4.6 chat with Mesh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**feat(#282): Add Git Tools source type (Commit Brief + PR Brief)**&lt;/span&gt;
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## Summary&lt;/span&gt;

Introduces Git Tools as a fourth source type in HiveTrail Mesh, alongside Notion, Local Files, and Context Blocks. Provides two workflows for assembling LLM context from a git repository:
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Commit Brief**&lt;/span&gt; - scans uncommitted changes (staged/unstaged/untracked), assembles file content + unified diffs against HEAD for commit message generation
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**PR Brief**&lt;/span&gt; - scans branch-wide changes vs a base branch, includes file content, diffs, structured commit log, and uncommitted change warnings for PR description generation
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## What's Changed&lt;/span&gt;

&lt;span class="gs"&gt;**Models &amp;amp; Architecture**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; New &lt;span class="sb"&gt;`GitChangedFile`&lt;/span&gt;, &lt;span class="sb"&gt;`GitCommitEntry`&lt;/span&gt;, ... models with Pydantic...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`PRBriefStackItem.commit_count`&lt;/span&gt; as a &lt;span class="sb"&gt;`@computed_field`&lt;/span&gt; to eliminate...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`warning`&lt;/span&gt; status added to &lt;span class="sb"&gt;`StackItemBaseModel`&lt;/span&gt;...
&lt;span class="p"&gt;-&lt;/span&gt; Git item types registered in &lt;span class="sb"&gt;`StackItemUnion`&lt;/span&gt;...

&lt;span class="gs"&gt;**Service Layer**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; New &lt;span class="sb"&gt;`git_service.py`&lt;/span&gt; - stateless subprocess service covering...
&lt;span class="p"&gt;-&lt;/span&gt; Hardened &lt;span class="sb"&gt;`_run_git`&lt;/span&gt; with... to prevent ANSI corruption and pager hangs
&lt;span class="p"&gt;-&lt;/span&gt; BOM-aware file reading (UTF-16 LE/BE) in...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`GitResult`&lt;/span&gt; dataclass for structured content...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`generate_git_content`&lt;/span&gt; produces structured... XML nodes with ... CDATA string

&lt;span class="gs"&gt;**State &amp;amp; Stack**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`GitToolsSource`&lt;/span&gt; bridge in... wrapping all git service calls
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`GitToolsHandler`&lt;/span&gt; integrated into the standard async stack lifecycle...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`StackManager.update_git_item`&lt;/span&gt;, ... extended for git types
&lt;span class="p"&gt;-&lt;/span&gt; Stack persistence (&lt;span class="sb"&gt;`StackPersistence`&lt;/span&gt;) handles ... in save/load
&lt;span class="p"&gt;-&lt;/span&gt; Full facade coverage on...

&lt;span class="gs"&gt;**UI - Source Panel**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`GitToolsPanel`&lt;/span&gt; with ..., following ... conventions
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`CommitBriefForm`&lt;/span&gt; and ... forms with path selector, scan button...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`FileChecklist`&lt;/span&gt; and ... reusable components with select/deselect all, .... with local timezone formatting
&lt;span class="p"&gt;-&lt;/span&gt; Base Branch field auto-populated from ... disables immediately on path change to prevent stale state
&lt;span class="p"&gt;-&lt;/span&gt; Uncommitted changes warning banner in...
&lt;span class="p"&gt;-&lt;/span&gt; Files/Commits sections hidden until...

&lt;span class="gs"&gt;**UI - Stack**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`GitToolsStackCard`&lt;/span&gt; with type-specific... confirmed Reload action
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`EditCommitBriefDialog`&lt;/span&gt; and ... with Settings/Content tabs, ... banner for partial failures
&lt;span class="p"&gt;-&lt;/span&gt; Insert action enabled for ...
&lt;span class="p"&gt;-&lt;/span&gt; Edit routing wired in ...

&lt;span class="gs"&gt;**Bug Fixes &amp;amp; Hardening**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`notify`&lt;/span&gt; called before ... to prevent NiceGUI ... errors
&lt;span class="p"&gt;-&lt;/span&gt; Stale scan ... on path change
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`commit_log`&lt;/span&gt; fallback ... prevent Pydantic ...
&lt;span class="p"&gt;-&lt;/span&gt; Output manager processing logs ...

&lt;span class="gs"&gt;**Tests**&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; 41 new tests in ... covering parse, pre-checks, scan, ... and PR XML generation (199 total passing)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version 3:&lt;/strong&gt; Claude Haiku 4.5 chat with Mesh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**feat(#282): complete Git Tools integration-forms, dialogs, stack cards, and content generation**&lt;/span&gt;
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gh"&gt;# PR Description&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;

This PR completes the Git Tools feature (Commit Brief and PR Brief) end-to-end, integrating UI components, edit dialogs, stack management, and async content generation. The feature is now wired into the source panel, stack lifecycle, and preset persistence, with full test coverage for the git_service module.

&lt;span class="gu"&gt;## What's Included&lt;/span&gt;

&lt;span class="gu"&gt;### Forms &amp;amp; Components&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**CommitBriefForm**&lt;/span&gt; ... forms with path selection, branch auto-population, file/commit checklists, and rescan workflows
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**FileChecklist**&lt;/span&gt; ...Reusable list components with ... dialogs
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**GitToolsStackCard**&lt;/span&gt;: Stack ... confirmation
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**EditCommitBriefDialog**&lt;/span&gt; ... dialogs with Settings/Content tabs, ... display

&lt;span class="gu"&gt;### State &amp;amp; Architecture&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**GitToolsSource**&lt;/span&gt; (&lt;span class="sb"&gt;`core/.../git_tools.py`&lt;/span&gt;): Domain bridge wrapping ... asyncio threading
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**GitToolsHandler**&lt;/span&gt; (&lt;span class="sb"&gt;`core/.../git_tools.py`&lt;/span&gt;): Pre-generated ... (no JIT reads)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**git_service.py**&lt;/span&gt;: Stateless subprocess service with ... merge logic
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Stack persistence**&lt;/span&gt; updated to serialize/deserialize ...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**StateManager facade**&lt;/span&gt; ... git-specific methods ...

&lt;span class="gu"&gt;### UI Integration&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Git Tools added to source panel...
&lt;span class="p"&gt;-&lt;/span&gt; Stack card factory routes ...
&lt;span class="p"&gt;-&lt;/span&gt; Stack component instantiates both ...
&lt;span class="p"&gt;-&lt;/span&gt; BaseStackCard updated to handle ... enabled Insert button
&lt;span class="p"&gt;-&lt;/span&gt; Output manager log messages ...

&lt;span class="gu"&gt;### Content Generation&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**generate_git_content()**&lt;/span&gt;: Async XML generator with BOM-aware file reading...
&lt;span class="p"&gt;-&lt;/span&gt; Returns &lt;span class="gs"&gt;**GitResult**&lt;/span&gt; dataclass with ...
&lt;span class="p"&gt;-&lt;/span&gt; Partial success mode: items transition to &lt;span class="sb"&gt;`warning`&lt;/span&gt; state ...

&lt;span class="gu"&gt;### Data Models&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**GitChangedFile**&lt;/span&gt;, ...
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**ScanResult**&lt;/span&gt;, ... dataclasses
&lt;span class="p"&gt;-&lt;/span&gt; FileStatus ... enums
&lt;span class="p"&gt;-&lt;/span&gt; Pydantic &lt;span class="sb"&gt;`@computed_field`&lt;/span&gt; for ...

&lt;span class="gu"&gt;### Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; 41 new git_service tests covering:
&lt;span class="p"&gt;    -&lt;/span&gt; _parse_name_status with ...
&lt;span class="p"&gt;    -&lt;/span&gt; Pre-checks (is_repo, ...)
&lt;span class="p"&gt;    -&lt;/span&gt; scan_commit_changes with ... states
&lt;span class="p"&gt;    -&lt;/span&gt; scan_pr_changes with ... flag
&lt;span class="p"&gt;    -&lt;/span&gt; apply_default_checks (...)
&lt;span class="p"&gt;    -&lt;/span&gt; merge_scan_with_saved (preserve user ...)
&lt;span class="p"&gt;    -&lt;/span&gt; parse_commit_log and ...
&lt;span class="p"&gt;    -&lt;/span&gt; generate_git_content integration with ...

&lt;span class="gu"&gt;## Key Design Decisions&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Pre-generated content model**&lt;/span&gt;: Git items store content at ...
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Graceful partial success**&lt;/span&gt;: When some files fail ... insert partial context.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Clean Architecture enforcement**&lt;/span&gt;: GitToolsSource wraps ... No direct service imports.
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Concurrent diff fetching**&lt;/span&gt;: Diffs fetched in parallel with ... avoid resource exhaustion.
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Merge logic for rescan**&lt;/span&gt;: When users rescan, ... across repository state changes.
&lt;span class="p"&gt;6.&lt;/span&gt; &lt;span class="gs"&gt;**BOM-aware encoding detection**&lt;/span&gt;: UTF-16 files (with BOM) and UTF-8 with BOM are decoded correctly; Windows cp1252 default avoided.

&lt;span class="gu"&gt;## Testing Guidance&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; All 199 tests pass (including 41 new ... tests)
&lt;span class="p"&gt;-&lt;/span&gt; git_service tests use ...
&lt;span class="p"&gt;-&lt;/span&gt; Integration tested via forms/dialogs in the app
&lt;span class="p"&gt;-&lt;/span&gt; Warning state rendering tested in ...
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gs"&gt;**Closes #282**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Easier Path: Structured Context Assembly
&lt;/h2&gt;

&lt;p&gt;Constructing that full context manually - running the right git commands, handling encoding issues, structuring the output so the model can navigate the 106k-token PR Brief without losing the thread - is non-trivial. For a one-off experiment it's fine. As a repeatable workflow before every PR, it's friction most developers won't sustain.&lt;/p&gt;

&lt;p&gt;This is exactly the problem I built HiveTrail Mesh to solve. The PR Brief source type runs the git commands, structures the output as navigable XML with per-commit nodes, handles BOM-aware encoding, and lets you select which files and commits to include before the context gets assembled. The output goes to your clipboard, ready to paste into whichever LLM you want to use.&lt;/p&gt;

&lt;p&gt;If you want to try it, &lt;a href="https://hivetrail.com/mesh" rel="noopener noreferrer"&gt;Mesh is in limited beta and free during the beta period&lt;/a&gt;. But honestly, if you want to test the manual approach first on your next branch, the git commands above will get you there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztdtuwhcdq4i7n2ats9r.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztdtuwhcdq4i7n2ats9r.webp" alt="HiveTrail Mesh’s PR Brief interface automates the git context assembly process, structuring full diffs and commit logs into LLM-ready XML while letting you select exactly which files to include." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference: The Git Commands That Actually Feed the Model
&lt;/h2&gt;

&lt;p&gt;If you want to try the full-context approach on your next branch before merging, these are the three commands worth knowing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full commit log with bodies (not just subject lines):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git log main..HEAD &lt;span class="nt"&gt;--pretty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;format:&lt;span class="s2"&gt;"%H%n%s%n%b%n---"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complete diff across the branch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git diff main...HEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Per-commit diffs with context (useful for smaller branches):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git log main..HEAD &lt;span class="nt"&gt;--patch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A practical note on scope: for a large feature branch, the full diff will be large - potentially 100k+ tokens. Before feeding it to a model, skim the file list and drop binaries, generated files, and lockfile changes. What remains is usually 20-40% smaller and significantly more useful to the model.&lt;/p&gt;
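
&lt;p&gt;One way to do that trimming without hand-editing the diff is git's exclude pathspec. The sketch below builds a throwaway repo to demonstrate it; the file names are illustrative:&lt;/p&gt;

```shell
# Demonstrate trimming lockfile noise from a branch diff with the
# ':(exclude)' pathspec (file names are illustrative).
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name Demo
printf 'v1\n' > app.py
printf '{}\n' > package-lock.json
git add .
git commit -q -m "base"
git branch -M main
git checkout -q -b feature
printf 'v2\n' > app.py
printf '{"deps": 1}\n' > package-lock.json
git add .
git commit -q -m "feat: update"

# The full diff would include the lockfile; the exclude pathspec drops it:
git diff main...HEAD -- . ':(exclude)package-lock.json'
```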

&lt;p&gt;If you'd rather not run and filter these manually every time, this is the workflow &lt;a href="https://hivetrail.com/mesh" rel="noopener noreferrer"&gt;HiveTrail Mesh&lt;/a&gt; automates - structured XML output, file selection, token count before you export.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Broader Point
&lt;/h2&gt;

&lt;p&gt;Claude Code isn't doing something wrong. It's making a reasonable tradeoff - fast, low-cost, good enough for most cases. The &lt;code&gt;--oneline&lt;/code&gt; approach keeps the token cost down and the response time fast. For a quick commit message or a small fix, it's fine.&lt;/p&gt;

&lt;p&gt;But for complex feature branches where the PR description is going to be read by your team, reviewed by senior engineers, and live in your repository history for years - it's worth spending an extra 30 seconds to give the model the full picture.&lt;/p&gt;

&lt;p&gt;The quality of your AI output is constrained by the quality of the context you provide. For PR descriptions, the full diff is the context. Everything else is a summary of a summary.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://dev.to/amitba"&gt;Amit&lt;/a&gt; builds HiveTrail Mesh, a context assembly tool for developers working with LLMs. &lt;a href="https://hivetrail.com/mesh" rel="noopener noreferrer"&gt;If you found this useful, join our beta.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>git</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>We Ran the Same Experiment Twice. Different Feature, Different Models, Same Winner.</title>
      <dc:creator>Amit Ben-Ari</dc:creator>
      <pubDate>Tue, 07 Apr 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/amitba/we-ran-the-same-experiment-twice-different-feature-different-models-same-winner-93n</link>
      <guid>https://forem.com/amitba/we-ran-the-same-experiment-twice-different-feature-different-models-same-winner-93n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://hivetrail.com/blog/llm-context-assembly-pr-generation" rel="noopener noreferrer"&gt;hivetrail.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;How two independent PR generation benchmarks pointed to the same conclusion about context quality - and why your model choice matters less than you think.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Here's a finding that should change how you think about AI tooling: in two independent experiments using real production code, a "budget" model fed rich context consistently outperformed flagship models operating on shallow git summaries. The budget model didn't just win. It won by a landslide, unanimously, against models that cost significantly more per token.&lt;/p&gt;

&lt;p&gt;This isn't a post about which model is best. It's about why the question itself might be the wrong one to ask.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;HiveTrail Mesh is a context assembly tool. One of its core features is PR Brief - it scans a git branch against a base branch, reads every changed file in full, assembles all diffs and commit metadata into a structured XML document, and hands it to an LLM. The output is typically a 100K–380K token document containing everything an LLM needs to write a comprehensive PR description.&lt;/p&gt;

&lt;p&gt;We used this workflow as the basis for both experiments. The prompt in each case was deliberately simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Based on the staged changes / recent commits, write me a PR title and description.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No elaborate prompting. No chain-of-thought instructions. Just the raw context and a task.&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiment 1: The budget model vs. the flagship agent
&lt;/h2&gt;

&lt;p&gt;The first experiment ran on the Git Tools feature - a substantial new addition to HiveTrail Mesh covering 27 commits across 32 files, with async XML generation, state management, UI components, and 41 new tests.&lt;/p&gt;

&lt;p&gt;We ran three conditions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Condition A - Claude Code (Sonnet 4.6), native git context.&lt;/strong&gt; Claude Code ran &lt;code&gt;git log main..HEAD --oneline&lt;/code&gt; and &lt;code&gt;git diff main...HEAD --stat&lt;/code&gt; - the standard abbreviated approach. Generated in about 25 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Condition B - Haiku 4.5, Mesh context.&lt;/strong&gt; Mesh assembled a 380KB XML file (~106K tokens) covering every changed file, diff, and commit. Haiku 4.5 received this in full.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Condition C - Sonnet 4.6, Mesh context.&lt;/strong&gt; Same Mesh XML, same prompt, given to Sonnet 4.6.&lt;/p&gt;
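
&lt;p&gt;Concretely, the gap between Condition A and the Mesh conditions comes down to which of these commands the model effectively sees - a sketch, assuming the same main/feature branch layout as above:&lt;/p&gt;

```shell
# Condition A's view: one subject line per commit, plus per-file counts.
git log main..HEAD --oneline
git diff main...HEAD --stat

# The Mesh conditions were built from the full content instead:
git diff main...HEAD   # every hunk, in full
```

&lt;p&gt;The first two commands produce a tiny fraction of the tokens - and of the information - that the last one does.&lt;/p&gt;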

&lt;p&gt;Gemini 3 Pro evaluated all three outputs, acting as a senior software developer and product manager.&lt;/p&gt;

&lt;p&gt;The verdict was unambiguous. The Mesh-fed PRs were called "significantly stronger" across every dimension: product context, workflow clarity, architectural structure, technical depth, and testing visibility. The Claude Code version was characterized as reading like "a rough draft or a quick brain dump before hitting Create Pull Request."&lt;/p&gt;

&lt;p&gt;This wasn't a knock on Sonnet 4.6. It was a knock on what Sonnet 4.6 was given to work with.&lt;/p&gt;

&lt;p&gt;Claude Code - like most agentic coding tools - acts like a developer who skims the commit titles and says "looks good to me." It reads summaries: which files changed, roughly how many lines, what the commit subjects say. HiveTrail Mesh acts like the reviewer who actually pulls down the branch and reads every single file. The difference in output reflects that difference in reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Haiku 4.5 with full context outperformed Sonnet 4.6 with shallow context.&lt;/strong&gt; A cheaper, faster model given the complete picture wrote a better PR than a more capable model working from a summary.&lt;/p&gt;

&lt;p&gt;But here's the part that should really give you pause: Haiku 4.5 didn't just beat Sonnet 4.6's native shallow context - &lt;strong&gt;it beat Sonnet 4.6 when both were fed the exact same Mesh XML.&lt;/strong&gt; The budget model outperformed the flagship on a level playing field.&lt;/p&gt;

&lt;p&gt;Final ranking:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Haiku 4.5 + Mesh&lt;/strong&gt; - best overall structure, key design decisions, quantified test coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet 4.6 + Mesh&lt;/strong&gt; - excellent markdown, clear bug-fix callouts, strong architecture section&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet 4.6 native (Claude Code)&lt;/strong&gt; - good test plan, but flat structure and shallow context throughout&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Experiment 2: Can Gemini CLI beat its own model family?
&lt;/h2&gt;

&lt;p&gt;Several months later, we ran a second experiment on a completely different feature - the GitHub API integration for HiveTrail Mesh, covering 24 files and 22 commits.&lt;/p&gt;

&lt;p&gt;The framing this time was sharper. &lt;strong&gt;The question wasn't "which model is best" - it was "can an agentic tool using native git context compete with the same model family when context is properly assembled?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gemini CLI was the subject under test. It has its own git tooling, can run shell commands, and is built by the same team behind the models it would be competing against. If any tool could close the context gap through smart native tool use, Gemini CLI was the candidate.&lt;/p&gt;

&lt;p&gt;We set it against seven Gemini models - ranging from Gemini 3 Fast to Gemini 3.1 Pro with high thinking - all fed via HiveTrail Mesh. We also added Haiku 4.5 via Mesh as an external reference point, since it had won Experiment 1.&lt;/p&gt;

&lt;p&gt;Three independent judges evaluated all nine PR texts blind, without knowing which model produced which:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Gemini 3 Pro&lt;/li&gt;
&lt;li&gt;Anthropic Claude Opus 4.6&lt;/li&gt;
&lt;li&gt;OpenAI ChatGPT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scoring: each judge awarded 9 points for 1st place, descending to 1 point for 9th. Maximum possible total across three judges: 27.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Gemini Pro&lt;/th&gt;
&lt;th&gt;Opus 4.6&lt;/th&gt;
&lt;th&gt;ChatGPT&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Haiku 4.5 + Mesh&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;27&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Gemini Flash 3 preview (Thinking Low) + Mesh&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Gemini 3 Fast + Mesh&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro preview (Thinking High) + Mesh&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tied 5&lt;/td&gt;
&lt;td&gt;ChatGPT + Mesh&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tied 5&lt;/td&gt;
&lt;td&gt;Gemini Flash 3 preview (Thinking High) + Mesh&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Flash Light preview (Thinking High) + Mesh&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Gemini 3 Pro + Mesh&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini CLI (native context)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two results stand out.&lt;/p&gt;

&lt;p&gt;First, Haiku 4.5 received a perfect score - 9 from every judge, unanimously, with a 4-point gap over second place. All three judges independently placed it first for the same reasons: dedicated test coverage sections, specific method names and API behaviors called out by name, explicit reasoning behind architectural decisions, and reviewer notes that no other entry included. Opus 4.6 called it "the most complete and production-grade PR description" of the nine.&lt;/p&gt;

&lt;p&gt;Second, and more telling: &lt;strong&gt;Gemini CLI finished last.&lt;/strong&gt; Not second to last - last, with 4 points, behind every Mesh-fed entry including smaller, cheaper Gemini variants. Its own model family, given better context by a different tool, beat it at every position in the table.&lt;/p&gt;

&lt;p&gt;The reason is the same as Experiment 1. Gemini CLI ran &lt;code&gt;git log -n 10 --stat&lt;/code&gt; and a few shell commands. Fast, low-cost, reasonable for most tasks - but it produced the same shallow picture. The resulting PR covered the surface of the changes without the architectural reasoning, edge case handling, or quantified test results that the Mesh-fed models could draw on because they had actually read the code.&lt;/p&gt;

&lt;p&gt;It's worth noting that the Mesh PR Brief isn't just raw file content dumped into a prompt. It's structured XML - commits organized chronologically, files grouped by change type, diffs nested within their commit context. That structure helps LLMs navigate 100K+ token documents more efficiently than a flat wall of text would. So "full context" here means both &lt;em&gt;more&lt;/em&gt; information and &lt;em&gt;better-organized&lt;/em&gt; information. Both matter.&lt;/p&gt;

&lt;p&gt;After the main competition, we ran Claude Code on the same feature - not as a competitor, but as a consistency check. Same pattern as Experiment 1: a short, surface-level PR based on abbreviated git output. The shallow-context behavior isn't specific to any one tool or vendor. It's structural - it's what happens when speed is optimized over depth of reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context quality sets the ceiling. Model choice determines where within that ceiling you land.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Run both experiments side by side and the picture is hard to argue with.&lt;/p&gt;

&lt;p&gt;Experiment 1 tested context delivery method with the same model family. Mesh-assembled context won over native git context regardless of model tier - and the budget model beat the flagship even on a level playing field.&lt;/p&gt;

&lt;p&gt;Experiment 2 tested whether a sophisticated agentic tool could close that gap through smart native tool use. It couldn't - and it finished last against its own model family.&lt;/p&gt;

&lt;p&gt;Different features. Different PR Briefs. Different competitive sets. Different judges. The only constant was the relationship between context quality and output quality.&lt;/p&gt;

&lt;p&gt;When an AI tool reads a few lines of git log to write a PR, it isn't producing a poor result because it's a bad model. It's producing a poor result because it has been given a poor picture of what changed and why. Give any capable model the full picture - every file, every diff, every commit, structured and organized - and the output improves dramatically.&lt;/p&gt;

&lt;p&gt;The implication runs both ways. A "budget" model with rich context outperforms a flagship with shallow context. And a flagship with shallow context produces flagship-priced shallow output.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for your workflow
&lt;/h2&gt;

&lt;p&gt;If you're using AI tools for PR descriptions today, the most impactful change probably isn't switching models.&lt;/p&gt;

&lt;p&gt;Agentic coding tools are optimized for speed and low token cost - they read summaries, not full file content. That's the right tradeoff for interactive coding tasks, where you want fast feedback and low latency. For a PR covering 20+ files and weeks of work, summary-level context produces summary-level output.&lt;/p&gt;

&lt;p&gt;The alternative is deliberate context assembly before you prompt: read every changed file in full, preserve the diff structure, organize commits chronologically, package everything in a format the LLM can navigate. You could build a script to do this - pull every changed file, run the diffs, format it into structured XML. It's achievable engineering. It's also a few days of work to do properly, and more to maintain as your codebase evolves.&lt;/p&gt;
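
&lt;p&gt;A rough sketch of what such a script looks like - everything here (branch name, section markers, output file) is illustrative, and it emits delimited plain text rather than the structured XML described above, just to keep the sketch short:&lt;/p&gt;

```shell
#!/usr/bin/env sh
# DIY context assembly: commits in order, the changed-file list, and the
# full diff, packaged into one document for pasting into an LLM.
BASE="${1:-main}"
{
  echo "== commits (oldest first) =="
  git log --reverse "$BASE..HEAD" --pretty='%h %s'
  echo "== changed files =="
  git diff --name-status "$BASE...HEAD"
  echo "== full diff =="
  git diff "$BASE...HEAD"
} > pr-context.txt
```

&lt;p&gt;Even this toy version surfaces far more than a &lt;code&gt;--oneline&lt;/code&gt; summary; the maintenance cost shows up later, when you need token budgeting, file selection, and proper structure.&lt;/p&gt;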

&lt;p&gt;That's exactly why we built HiveTrail Mesh's PR Brief. Point it at a branch and within seconds it has scanned every changed file, assembled the diffs, and produced a structured 100,000+ token XML document - faster than most agentic tools complete their own context gathering. The remaining time in the workflow is just the LLM responding, which varies by model (a few seconds for smaller models, up to ~30 seconds for the larger ones). The total end-to-end time is competitive with agentic coding tools - with dramatically better output to show for it. Use any LLM you prefer: Claude, Gemini, ChatGPT, whatever fits your workflow. The model choice, as these experiments suggest, matters less than you might expect.&lt;/p&gt;

&lt;p&gt;For teams where PRs serve as living documentation, get reviewed by multiple people, or feed downstream into release notes, the tradeoff clearly favors the deeper context. For a solo developer pushing a two-file fix, it probably isn't worth the extra step.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we didn't test
&lt;/h2&gt;

&lt;p&gt;In the spirit of intellectual honesty:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering.&lt;/strong&gt; Both experiments used a minimal prompt. A carefully crafted prompt might narrow the gap somewhat - though we'd expect the ceiling to remain lower without full file content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other writing tasks.&lt;/strong&gt; Both experiments focused on PR descriptions. Commit messages, technical documentation, and code review summaries likely follow the same pattern, but we haven't tested them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Newer model releases.&lt;/strong&gt; These experiments used models current at the time of testing. Rankings will shift as new models release - though the underlying dynamic (context quality determines ceiling) should hold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost efficiency.&lt;/strong&gt; Haiku 4.5 is significantly cheaper per token than most of the models it beat. The cost-per-quality-point story is compelling, but token pricing changes frequently enough that any number we publish here would quickly go stale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The most useful takeaway from two experiments isn't a model recommendation. It's a workflow question worth asking before you prompt: &lt;em&gt;what does the model actually see?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the answer is "a handful of commit subject lines and a diffstat," you've already constrained the output - regardless of which model is on the other end.&lt;/p&gt;

&lt;p&gt;The models are good enough. The context is usually the bottleneck.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;HiveTrail Mesh is a context assembly tool for developers and product teams. PR Brief assembles a token-optimized, structured XML document from your git branch - ready to paste into any LLM. &lt;a href="https://hivetrail.com/mesh" rel="noopener noreferrer"&gt;Try the beta →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
