<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shinsuke Matsuda</title>
    <description>The latest articles on Forem by Shinsuke Matsuda (@xhack).</description>
    <link>https://forem.com/xhack</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3705433%2F79cb752a-d00d-4808-b773-6c443c8b3657.png</url>
      <title>Forem: Shinsuke Matsuda</title>
      <link>https://forem.com/xhack</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xhack"/>
    <language>en</language>
    <item>
      <title>How Do We Keep Evolving a 100k-Line Codebase in the Age of AI?</title>
      <dc:creator>Shinsuke Matsuda</dc:creator>
      <pubDate>Thu, 22 Jan 2026 13:24:33 +0000</pubDate>
      <link>https://forem.com/xhack/how-do-we-keep-evolving-a-100k-line-codebase-in-the-age-of-ai-2kni</link>
      <guid>https://forem.com/xhack/how-do-we-keep-evolving-a-100k-line-codebase-in-the-age-of-ai-2kni</guid>
      <description>&lt;h2&gt;
  
  
  Plan Stack as a Methodology
&lt;/h2&gt;




&lt;p&gt;Imagine a codebase with over 100,000 lines of code.&lt;/p&gt;

&lt;p&gt;Six months ago, an AI-generated pull request added several thousand more.&lt;br&gt;
The tests passed. The review was rushed. The change was merged.&lt;/p&gt;

&lt;p&gt;Today, you need to modify that area.&lt;/p&gt;

&lt;p&gt;You can read the code.&lt;br&gt;
You can see &lt;em&gt;what&lt;/em&gt; it does.&lt;/p&gt;

&lt;p&gt;But you have no idea &lt;strong&gt;why it is the way it is&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No one remembers what constraints existed.&lt;br&gt;
No one knows which alternatives were considered.&lt;br&gt;
And the person who “wrote” the code was an AI.&lt;/p&gt;

&lt;p&gt;This situation is no longer rare — and it’s not temporary.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem is not “too much code”
&lt;/h2&gt;

&lt;p&gt;Traditionally, even when code was messy, we could infer intent.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We knew who wrote it
&lt;/li&gt;
&lt;li&gt;We remembered the discussion
&lt;/li&gt;
&lt;li&gt;We could reconstruct the reasoning from context
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With AI-generated code, intent lives somewhere else.&lt;/p&gt;

&lt;p&gt;It depends entirely on &lt;strong&gt;what instructions were given to the AI&lt;/strong&gt; —&lt;br&gt;
instructions that are usually &lt;em&gt;not preserved&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So reviewers are left asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Was this design intentional or accidental?&lt;/li&gt;
&lt;li&gt;Were other options considered?&lt;/li&gt;
&lt;li&gt;What constraints existed at the time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code doesn’t answer these questions.&lt;/p&gt;

&lt;p&gt;As this repeats at scale, reviews slowly degrade into:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Does this look obviously broken?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Eventually, review itself stops scaling.&lt;/p&gt;




&lt;h2&gt;
  
  
  This is not a review problem — it’s a methodology problem
&lt;/h2&gt;

&lt;p&gt;The real issue is bigger:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we keep evolving a 100k- or 1M-line codebase&lt;br&gt;&lt;br&gt;
over years, when most of the code is written by AI?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is not about productivity.&lt;br&gt;&lt;br&gt;
It’s about &lt;strong&gt;control&lt;/strong&gt;, &lt;strong&gt;maintenance&lt;/strong&gt;, and &lt;strong&gt;long-term evolution&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  “Just write plans and review them” — that’s only the entry point
&lt;/h2&gt;

&lt;p&gt;Plan Stack is often summarized as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Write a plan first, commit it, and review the plan.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds like a review optimization trick.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;That’s just the entry point.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real value of Plan Stack
&lt;/h2&gt;

&lt;p&gt;Plan Stack provides a &lt;strong&gt;structure&lt;/strong&gt; that makes AI-driven development sustainable at scale.&lt;/p&gt;

&lt;p&gt;More precisely, it enables three things that large, long-lived codebases&lt;br&gt;
struggle with in the AI era.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scale: letting humans review intent, not output
&lt;/h2&gt;

&lt;p&gt;AI can generate unlimited code.&lt;br&gt;
Humans cannot review unlimited details.&lt;/p&gt;

&lt;p&gt;In large systems, the bottleneck is no longer implementation —&lt;br&gt;
it’s human judgment.&lt;/p&gt;

&lt;p&gt;By introducing &lt;em&gt;plans&lt;/em&gt; as first-class artifacts,&lt;br&gt;
humans review &lt;strong&gt;intent&lt;/strong&gt; instead of raw diffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is in scope for this change?&lt;/li&gt;
&lt;li&gt;What is explicitly out of scope?&lt;/li&gt;
&lt;li&gt;Which trade-offs are being made &lt;em&gt;this time&lt;/em&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows a small number of humans to stay in control,&lt;br&gt;
even as AI-generated output grows by orders of magnitude.&lt;/p&gt;




&lt;h2&gt;
  
  
  Maintainability: preserving the “why” over time
&lt;/h2&gt;

&lt;p&gt;Six months later, the hardest question is never:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What does this code do?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Why is it like this?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Plan Stack makes that answer explicit.&lt;/p&gt;

&lt;p&gt;Not as comments.&lt;br&gt;
Not as detached documentation.&lt;/p&gt;

&lt;p&gt;But as a plan that is reviewed, committed,&lt;br&gt;
and permanently associated with the change.&lt;/p&gt;

&lt;p&gt;This is what allows future maintainers&lt;br&gt;
— including your future self —&lt;br&gt;
to re-enter the decision context quickly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Continuous evolution: plans as accumulated knowledge
&lt;/h2&gt;

&lt;p&gt;A plan is not a disposable note.&lt;/p&gt;

&lt;p&gt;In a long-lived codebase,&lt;br&gt;
plans accumulate as a &lt;strong&gt;decision history&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why this abstraction exists&lt;/li&gt;
&lt;li&gt;Why a shortcut was acceptable &lt;em&gt;at the time&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Why a deeper refactor was deferred&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the next change comes,&lt;br&gt;
those past plans become shared context&lt;br&gt;
for both humans and AI.&lt;/p&gt;

&lt;p&gt;As the codebase grows to hundreds of thousands or millions of lines,&lt;br&gt;
this accumulated judgment becomes more valuable, not less.&lt;/p&gt;




&lt;h2&gt;
  
  
  Isn’t this just ADRs or design docs?
&lt;/h2&gt;

&lt;p&gt;This question always comes up.&lt;/p&gt;

&lt;p&gt;ADRs and design documents are excellent at capturing&lt;br&gt;
&lt;strong&gt;big, infrequent decisions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture choices&lt;/li&gt;
&lt;li&gt;Technology selection&lt;/li&gt;
&lt;li&gt;Long-term direction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But AI-driven development explodes the number of&lt;br&gt;
&lt;strong&gt;small, local decisions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much abstraction is enough &lt;em&gt;for this change&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Should we optimize now or accept some debt?&lt;/li&gt;
&lt;li&gt;Which edge cases are intentionally ignored?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These decisions shape the codebase,&lt;br&gt;
but rarely make it into ADRs.&lt;/p&gt;

&lt;p&gt;Plan Stack exists exactly at this missing layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The key difference: proximity to code
&lt;/h2&gt;

&lt;p&gt;ADRs and design docs are distant from code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rarely updated&lt;/li&gt;
&lt;li&gt;Easy to drift out of sync&lt;/li&gt;
&lt;li&gt;Not part of the PR review loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plans, in contrast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are written per PR&lt;/li&gt;
&lt;li&gt;Are reviewed together with code&lt;/li&gt;
&lt;li&gt;Live and die with the change itself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plans are not documentation.&lt;br&gt;&lt;br&gt;
They are &lt;strong&gt;part of the commit&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is the plan for humans or for AI?
&lt;/h2&gt;

&lt;p&gt;At first glance, plans look like instructions for AI.&lt;/p&gt;

&lt;p&gt;They do help AI produce more stable output.&lt;/p&gt;

&lt;p&gt;But that’s not the point.&lt;/p&gt;

&lt;p&gt;AI can write code without plans.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Humans can’t judge AI-written code without them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The real beneficiary of Plan Stack is the human.&lt;/p&gt;




&lt;h2&gt;
  
  
  Plans as externalized human reasoning
&lt;/h2&gt;

&lt;p&gt;AI-generated code removes the human “thinking trace” from the artifact.&lt;/p&gt;

&lt;p&gt;Plans restore it.&lt;/p&gt;

&lt;p&gt;They externalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was decided&lt;/li&gt;
&lt;li&gt;What was deferred&lt;/li&gt;
&lt;li&gt;Where compromises were made&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This preserved judgment is what makes large AI-written codebases&lt;br&gt;
&lt;em&gt;maintainable&lt;/em&gt; instead of merely &lt;em&gt;functional&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Plan as discipline: how humans retake control
&lt;/h2&gt;

&lt;p&gt;This process is not just a workflow improvement.&lt;/p&gt;

&lt;p&gt;It is a form of &lt;strong&gt;discipline&lt;/strong&gt; in AI-driven development.&lt;/p&gt;

&lt;p&gt;The simple rule —&lt;br&gt;
&lt;em&gt;“don’t let AI write code immediately”&lt;/em&gt; —&lt;br&gt;
creates a crucial buffer.&lt;/p&gt;

&lt;p&gt;That buffer pulls control back to the human side.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Shift-left review: intervene at the cheapest point
&lt;/h3&gt;

&lt;p&gt;In traditional development,&lt;br&gt;
post-implementation code review&lt;br&gt;
was the last line of defense for quality.&lt;/p&gt;

&lt;p&gt;In the AI era, reviewing thousands of generated lines &lt;em&gt;after the fact&lt;/em&gt;&lt;br&gt;
is not realistic.&lt;/p&gt;

&lt;p&gt;Having the AI write a &lt;strong&gt;plan first&lt;/strong&gt;, and letting humans review that plan,&lt;br&gt;
means intervening where mistakes are cheapest to fix.&lt;/p&gt;

&lt;p&gt;This prevents development from turning into&lt;br&gt;
a slot machine of “generate → patch → regenerate”.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Reducing cognitive load, focusing on decisions
&lt;/h3&gt;

&lt;p&gt;Writing a plan from scratch is cognitively expensive for humans.&lt;/p&gt;

&lt;p&gt;But having the AI produce a &lt;strong&gt;draft plan&lt;/strong&gt; changes the equation.&lt;/p&gt;

&lt;p&gt;Humans no longer start from a blank page —&lt;br&gt;
they review, adjust, and approve.&lt;/p&gt;

&lt;p&gt;This lets humans focus on the highest-value activity:&lt;br&gt;
&lt;strong&gt;decision making&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI: expands the space — options, dependencies, edge cases
&lt;/li&gt;
&lt;li&gt;Humans: narrow it — scope, trade-offs, priorities
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. The plan as a contract
&lt;/h3&gt;

&lt;p&gt;Once a plan is reviewed and agreed upon,&lt;br&gt;
code generation becomes contract execution.&lt;/p&gt;

&lt;p&gt;If the output looks wrong, the question becomes clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Was the plan ambiguous?&lt;/li&gt;
&lt;li&gt;Or did the AI fail to execute it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spending more time reviewing the plan&lt;br&gt;
often results in &lt;em&gt;less&lt;/em&gt; time spent coding and debugging overall.&lt;/p&gt;

&lt;p&gt;This is how Plan Stack achieves both &lt;strong&gt;scale&lt;/strong&gt; and &lt;strong&gt;control&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The future role of humans: the decision-maker
&lt;/h2&gt;

&lt;p&gt;In the AI era, human roles converge toward one thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Making decisions under constraints.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;What matters now?&lt;/li&gt;
&lt;li&gt;What can wait?&lt;/li&gt;
&lt;li&gt;What risks are acceptable?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI executes.&lt;br&gt;&lt;br&gt;
Humans decide.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ownership without authorship
&lt;/h2&gt;

&lt;p&gt;When AI writes most of the code, authorship becomes blurry.&lt;/p&gt;

&lt;p&gt;Ownership doesn’t.&lt;/p&gt;

&lt;p&gt;Ownership comes from recorded judgment.&lt;/p&gt;

&lt;p&gt;If decisions are preserved,&lt;br&gt;
the codebase remains &lt;em&gt;human-owned&lt;/em&gt; —&lt;br&gt;
even when AI-written.&lt;/p&gt;




&lt;h2&gt;
  
  
  Without plans, the future looks like this
&lt;/h2&gt;

&lt;p&gt;Not “unmaintainable” code — worse.&lt;/p&gt;

&lt;p&gt;Code that is &lt;strong&gt;untouchable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No one dares to modify existing logic,&lt;br&gt;
so new behavior gets added &lt;em&gt;next to it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The same concept ends up implemented three times,&lt;br&gt;
each slightly different,&lt;br&gt;
because no one knows which constraints still matter.&lt;/p&gt;

&lt;p&gt;The system grows, but it doesn’t evolve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI writes code.&lt;br&gt;&lt;br&gt;
Humans decide.&lt;/p&gt;

&lt;p&gt;And decisions must survive longer than any single implementation.&lt;/p&gt;

&lt;p&gt;Plan Stack is a minimal structure&lt;br&gt;
that allows humans to remain decision-makers —&lt;br&gt;
even as AI becomes the primary author.&lt;/p&gt;




&lt;h2&gt;
  
  
  A small invitation to try
&lt;/h2&gt;

&lt;p&gt;If this sounds heavy, start small.&lt;/p&gt;

&lt;p&gt;In your next PR, write a short plan first:&lt;br&gt;
what you decided, and what you intentionally didn’t.&lt;/p&gt;

&lt;p&gt;It doesn’t need to be perfect.&lt;/p&gt;

&lt;p&gt;Just make the judgment explicit — and commit it with the code.&lt;/p&gt;
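&lt;p&gt;It can be as small as a few lines committed next to the change. The path and headings below are just one hypothetical convention, not a required format:&lt;/p&gt;

```
docs/plans/0042-add-retry-to-sync.md

## Decided
Retry failed syncs up to 3 times with exponential backoff.

## Intentionally not done
No change to the queue schema; the deeper refactor is deferred.

## Trade-off accepted
Duplicate deliveries are possible; the consumer is already idempotent.
```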

&lt;p&gt;You’ll notice the shift immediately:&lt;/p&gt;

&lt;p&gt;AI-assisted development stops being &lt;em&gt;generation&lt;/em&gt;&lt;br&gt;&lt;br&gt;
and starts becoming &lt;em&gt;control&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Real-Time Staff Matching Without Heavy Optimization: An Insertion-Based Approach</title>
      <dc:creator>Shinsuke Matsuda</dc:creator>
      <pubDate>Tue, 20 Jan 2026 12:25:09 +0000</pubDate>
      <link>https://forem.com/xhack/real-time-staff-matching-without-heavy-optimization-an-insertion-based-approach-162l</link>
      <guid>https://forem.com/xhack/real-time-staff-matching-without-heavy-optimization-an-insertion-based-approach-162l</guid>
      <description>&lt;h2&gt;
  
  
  0. Introduction: The Hidden Complexity of Staff Matching
&lt;/h2&gt;

&lt;p&gt;I’m working on a service that matches staff members with customer requests.&lt;/p&gt;

&lt;p&gt;At first glance, the problem looks trivial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick the nearest staff member
&lt;/li&gt;
&lt;li&gt;Or assign someone who is available
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in reality, staff matching quickly becomes complicated once you consider real-world constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each job has a fixed start time and duration
&lt;/li&gt;
&lt;li&gt;Staff members already have scheduled jobs
&lt;/li&gt;
&lt;li&gt;Different locations imply travel time
&lt;/li&gt;
&lt;li&gt;Skill requirements must be satisfied
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you try to handle all of these at the same time,&lt;br&gt;&lt;br&gt;
simple nearest-neighbor searches or availability checks start to break down.&lt;/p&gt;

&lt;p&gt;At the same time, we often have strong practical requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;We need to select one staff member in real time&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;We don’t want heavy optimization (ILP / MIP solvers)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The logic must be explainable and operable in production&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article describes a &lt;strong&gt;practical matching approach that actually works&lt;/strong&gt; under those constraints.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Article Covers (and What It Doesn’t)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What this article covers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A practitioner-oriented way to frame the problem
&lt;/li&gt;
&lt;li&gt;A simple but powerful insertion-based algorithm
&lt;/li&gt;
&lt;li&gt;Why this approach works well in real systems
&lt;/li&gt;
&lt;li&gt;Engineering decisions that make it maintainable
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  What this article does not cover
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mathematical formulations
&lt;/li&gt;
&lt;li&gt;Exact optimization models
&lt;/li&gt;
&lt;li&gt;Detailed solver implementations
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intentionally &lt;strong&gt;not&lt;/strong&gt; an academic treatment.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Problem Setting (From a Practitioner’s Perspective)
&lt;/h2&gt;

&lt;p&gt;We consider the following matching problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  Job (Request)
&lt;/h3&gt;

&lt;p&gt;Each job has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start time
&lt;/li&gt;
&lt;li&gt;Duration (e.g. 30-minute units, up to several hours)
&lt;/li&gt;
&lt;li&gt;Location (latitude / longitude)
&lt;/li&gt;
&lt;li&gt;Required skills
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Staff
&lt;/h3&gt;

&lt;p&gt;Each staff member has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A set of skills
&lt;/li&gt;
&lt;li&gt;Existing scheduled jobs (each with time and location)
&lt;/li&gt;
&lt;li&gt;A base location (e.g. office or home)
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Constraints
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A staff member cannot work on overlapping jobs
&lt;/li&gt;
&lt;li&gt;Travel time between locations must be considered
&lt;/li&gt;
&lt;li&gt;A free time slot does not necessarily mean feasibility
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Goal
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Given a single job request, select the best staff member in real time.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is closer to a &lt;em&gt;recommendation&lt;/em&gt; problem than a global assignment problem.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Common but Broken Approaches
&lt;/h2&gt;

&lt;p&gt;Before describing the solution, it’s useful to look at approaches that often fail in practice.&lt;/p&gt;
&lt;h3&gt;
  
  
  Picking the Nearest Staff Member
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ignores the staff member’s previous job
&lt;/li&gt;
&lt;li&gt;Travel time may make the job infeasible
&lt;/li&gt;
&lt;li&gt;Schedules look valid on paper but fail in reality
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Picking Anyone Who Is Available
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Time gaps are misleading
&lt;/li&gt;
&lt;li&gt;Feasibility depends on adjacent jobs
&lt;/li&gt;
&lt;li&gt;Decisions are hard to explain afterward
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Global Optimization (ILP / MIP)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Computationally expensive
&lt;/li&gt;
&lt;li&gt;Complex to implement and operate
&lt;/li&gt;
&lt;li&gt;Poor fit for real-time, online decisions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These approaches are reasonable in theory, but often &lt;strong&gt;don’t survive production constraints&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. A Shift in Perspective: From Assignment to Insertion
&lt;/h2&gt;

&lt;p&gt;Here is the key mental shift.&lt;/p&gt;

&lt;p&gt;A staff member’s day can be represented as a &lt;strong&gt;sequence of (time, location)&lt;/strong&gt; pairs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
[ 09:00 @ A ] → [ 11:00 @ B ] → [ 15:00 @ C ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A new job is not something to &lt;em&gt;assign&lt;/em&gt; globally.&lt;/p&gt;

&lt;p&gt;It is a &lt;strong&gt;block that might be inserted&lt;/strong&gt; somewhere into this sequence.&lt;/p&gt;

&lt;p&gt;So the core question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can this job be inserted into the staff member’s schedule without breaking feasibility?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This framing turns a complex matching problem into a series of local feasibility checks.&lt;/p&gt;
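&lt;p&gt;A minimal sketch of that local check might look like this. The job shape, the minutes-since-midnight time unit, and the &lt;code&gt;travel_minutes&lt;/code&gt; stand-in are my own illustrative assumptions, not details from this article:&lt;/p&gt;

```python
from operator import le  # le(a, b) is true when a is not later than b

def travel_minutes(loc_a, loc_b):
    # Stand-in for a real travel-time estimate (a routing service in
    # practice). Straight-line distance scaled to minutes, illustrative only.
    dx = loc_a[0] - loc_b[0]
    dy = loc_a[1] - loc_b[1]
    return ((dx * dx + dy * dy) ** 0.5) * 10

def can_insert(prev_job, new_job, next_job):
    """Local feasibility: does new_job fit between its neighbours once
    travel time is counted? Jobs are dicts with 'start' and 'end' in
    minutes since midnight, and 'loc' as an (x, y) pair."""
    arrive = prev_job["end"] + travel_minutes(prev_job["loc"], new_job["loc"])
    if not le(arrive, new_job["start"]):
        return False  # cannot reach the new job before it starts
    depart = new_job["end"] + travel_minutes(new_job["loc"], next_job["loc"])
    return le(depart, next_job["start"])  # must also make the next job
```

&lt;p&gt;The check only ever looks at the two neighbouring jobs, which is exactly what keeps it cheap.&lt;/p&gt;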




&lt;h2&gt;
  
  
  4. Insertion Heuristics (In Plain Terms)
&lt;/h2&gt;

&lt;p&gt;Only after implementing this approach did I learn that it has a name:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Insertion heuristics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;They are commonly used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vehicle Routing Problems (VRP)
&lt;/li&gt;
&lt;li&gt;Technician Scheduling Problems
&lt;/li&gt;
&lt;li&gt;Dynamic routing and scheduling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The general idea is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take an existing route or schedule
&lt;/li&gt;
&lt;li&gt;Try inserting a new job at a feasible position
&lt;/li&gt;
&lt;li&gt;Choose the insertion with the lowest cost
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs arrive &lt;strong&gt;online&lt;/strong&gt;, one by one
&lt;/li&gt;
&lt;li&gt;We must decide &lt;strong&gt;immediately&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We only select &lt;strong&gt;one&lt;/strong&gt; staff member
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Algorithm Overview (Implementation-Oriented)
&lt;/h2&gt;

&lt;p&gt;The algorithm itself is surprisingly simple.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Filter by required skills&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter by coarse time availability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find the immediately previous and next jobs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check feasibility including travel time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute an insertion cost&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select the staff member with the lowest score&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two principles matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Check feasibility before scoring&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Only score candidates that actually work&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the algorithm fast and robust.&lt;/p&gt;
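&lt;p&gt;The six steps can be combined roughly as follows. The data shapes, the injected &lt;code&gt;travel&lt;/code&gt; function, and the pure-travel-time scoring are my own illustrative choices, not prescribed by this article:&lt;/p&gt;

```python
from operator import le  # le(a, b) is true when a is not later than b

def pick_staff(job, staff_list, travel):
    """Return the staff member with the cheapest feasible insertion, or
    None. 'travel(loc_a, loc_b)' returns travel time in minutes."""
    best, best_cost = None, None
    for staff in staff_list:
        # Step 1: hard filter on required skills.
        if not job["skills"].issubset(staff["skills"]):
            continue
        # Step 2: coarse availability; any overlapping job disqualifies.
        before = [j for j in staff["jobs"] if le(j["end"], job["start"])]
        after = [j for j in staff["jobs"] if le(job["end"], j["start"])]
        if len(before) + len(after) != len(staff["jobs"]):
            continue
        # Step 3: the immediately previous and next jobs around the slot.
        prev = max(before, key=lambda j: j["end"]) if before else None
        nxt = min(after, key=lambda j: j["start"]) if after else None
        prev_end = prev["end"] if prev else 0
        prev_loc = prev["loc"] if prev else staff["base"]
        # Step 4: feasibility including travel time, before any scoring.
        inbound = travel(prev_loc, job["loc"])
        if not le(prev_end + inbound, job["start"]):
            continue
        if nxt and not le(job["end"] + travel(job["loc"], nxt["loc"]),
                          nxt["start"]):
            continue
        # Step 5: insertion cost; here simply the inbound travel time.
        cost = inbound
        # Step 6: keep the lowest-cost feasible candidate seen so far.
        if best is None or not le(best_cost, cost):
            best, best_cost = staff, cost
    return best
```

&lt;p&gt;A real scoring function would fold in business rules, but the shape stays the same: infeasible candidates are discarded before any score is computed.&lt;/p&gt;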




&lt;h2&gt;
  
  
  6. Why This Works Well in Practice
&lt;/h2&gt;

&lt;p&gt;This approach works well in real systems for clear reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low computational cost
&lt;/li&gt;
&lt;li&gt;Suitable for real-time responses
&lt;/li&gt;
&lt;li&gt;No need for post-hoc constraint fixing
&lt;/li&gt;
&lt;li&gt;Business logic fits naturally into the scoring function
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like optimization,&lt;br&gt;&lt;br&gt;
&lt;strong&gt;without behaving like heavy optimization&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Engineering Design: Keep the Algorithm Pure
&lt;/h2&gt;

&lt;p&gt;One important design choice is keeping the matching logic as a &lt;strong&gt;pure function&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key principles
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No database access inside the algorithm
&lt;/li&gt;
&lt;li&gt;All required data is passed as a snapshot
&lt;/li&gt;
&lt;li&gt;Same input always produces the same output
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High reproducibility
&lt;/li&gt;
&lt;li&gt;Easy unit testing
&lt;/li&gt;
&lt;li&gt;Simple A/B testing
&lt;/li&gt;
&lt;li&gt;Easy replacement or extension in the future
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation pays off quickly in real projects.&lt;/p&gt;
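&lt;p&gt;One minimal way to express that boundary in code. The names and the precomputed score field are hypothetical; the point is the snapshot-in, decision-out shape:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchSnapshot:
    """Everything the matcher needs, loaded by the caller up front;
    the algorithm itself never touches the database."""
    job: dict
    candidates: tuple  # pre-filtered staff records, each with a 'score'

def match(snapshot):
    # Pure: no I/O, no clock, no randomness. The same snapshot always
    # yields the same answer, which makes unit tests, replays, and
    # A/B comparisons straightforward.
    ranked = sorted(snapshot.candidates, key=lambda s: s["score"])
    return ranked[0]["id"] if ranked else None
```

&lt;p&gt;Because the function is pure, comparing two algorithm versions is just calling both on the same snapshot.&lt;/p&gt;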




&lt;h2&gt;
  
  
  8. Why We Don’t Encode This in SQL
&lt;/h2&gt;

&lt;p&gt;It’s tempting to push matching logic into complex SQL queries.&lt;/p&gt;

&lt;p&gt;In practice, this often leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large JOINs
&lt;/li&gt;
&lt;li&gt;Window functions everywhere
&lt;/li&gt;
&lt;li&gt;Hard-to-debug logic
&lt;/li&gt;
&lt;li&gt;Unstable query performance
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Matching logic is procedural by nature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find adjacent jobs
&lt;/li&gt;
&lt;li&gt;Check travel feasibility
&lt;/li&gt;
&lt;li&gt;Evaluate insertion cost
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These steps are far more naturally expressed in application code than in SQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Practical Notes on Data Loading
&lt;/h2&gt;

&lt;p&gt;This does &lt;strong&gt;not&lt;/strong&gt; mean loading everything into memory.&lt;/p&gt;

&lt;p&gt;In production systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-filter candidate staff members
&lt;/li&gt;
&lt;li&gt;Load only nearby or relevant scheduled jobs
&lt;/li&gt;
&lt;li&gt;Keep snapshots minimal and fresh
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Final booking should still be handled by a separate, transactional process.&lt;/p&gt;

&lt;p&gt;Separating &lt;strong&gt;recommendation&lt;/strong&gt; from &lt;strong&gt;commitment&lt;/strong&gt; keeps the system stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Limitations and Trade-Offs
&lt;/h2&gt;

&lt;p&gt;This approach is not perfect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It does not produce a global optimum
&lt;/li&gt;
&lt;li&gt;It is not suitable for batch optimization of many jobs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, for the question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Who should handle this job right now?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…it delivers more than enough value.&lt;/p&gt;




&lt;h2&gt;
  
  
  11. Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Real-world constraints naturally lead to insertion-based thinking
&lt;/li&gt;
&lt;li&gt;Heavy optimization is not always the right tool
&lt;/li&gt;
&lt;li&gt;Choosing &lt;em&gt;not&lt;/em&gt; to optimize globally is also a design decision
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with time, travel, and skill constraints,&lt;br&gt;&lt;br&gt;
&lt;strong&gt;a practical and maintainable solution is possible&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>heuristics</category>
    </item>
    <item>
      <title>TL;DR: Code Reviews Break in the AI Era — Plans Fix Them</title>
      <dc:creator>Shinsuke Matsuda</dc:creator>
      <pubDate>Tue, 20 Jan 2026 09:43:26 +0000</pubDate>
      <link>https://forem.com/xhack/tldr-code-reviews-break-in-the-ai-era-plans-fix-them-3ge9</link>
      <guid>https://forem.com/xhack/tldr-code-reviews-break-in-the-ai-era-plans-fix-them-3ge9</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;AI makes writing code dramatically faster.&lt;br&gt;&lt;br&gt;
But that speed quietly breaks code reviews.&lt;/p&gt;

&lt;p&gt;The problem isn’t large diffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real problem is that reviewers no longer know where to start.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix is not “better reviews” — it’s better plans.
&lt;/h3&gt;

&lt;p&gt;In AI-assisted development, a &lt;strong&gt;plan&lt;/strong&gt; is not a TODO list.&lt;/p&gt;

&lt;p&gt;A plan is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;unit of review&lt;/strong&gt; (intent, scope, boundaries)&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;unit of generation&lt;/strong&gt; (what AI should and should not change)&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;unit of knowledge&lt;/strong&gt; (what gets promoted to docs later)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When plans are treated as first-class artifacts and committed to the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reviews start from &lt;em&gt;intent&lt;/em&gt;, not diff scanning&lt;/li&gt;
&lt;li&gt;PR sizes shrink structurally&lt;/li&gt;
&lt;li&gt;Human reviews focus on judgment, not syntax&lt;/li&gt;
&lt;li&gt;AI reviews become intent-aware&lt;/li&gt;
&lt;li&gt;Context window usage drops&lt;/li&gt;
&lt;li&gt;Future changes become cheaper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Plans are not documents. They are process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If AI writes more code, humans must decide &lt;em&gt;more clearly&lt;/em&gt; —&lt;br&gt;&lt;br&gt;
and plans are how we do that.&lt;/p&gt;




&lt;p&gt;👉 &lt;strong&gt;Read the full article:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/xhack/why-plans-should-be-first-class-artifacts-in-ai-assisted-development-4f36"&gt;Why Plans Should Be First-Class Artifacts in AI-Assisted Development&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why “Plans” Should Be First-Class Artifacts in AI-Assisted Development</title>
      <dc:creator>Shinsuke Matsuda</dc:creator>
      <pubDate>Tue, 20 Jan 2026 09:37:22 +0000</pubDate>
      <link>https://forem.com/xhack/why-plans-should-be-first-class-artifacts-in-ai-assisted-development-4f36</link>
      <guid>https://forem.com/xhack/why-plans-should-be-first-class-artifacts-in-ai-assisted-development-4f36</guid>
      <description>&lt;h2&gt;
  
  
  Why Plans Should Be First-Class Artifacts in AI-Assisted Development
&lt;/h2&gt;

&lt;p&gt;AI-assisted development is no longer experimental.&lt;br&gt;&lt;br&gt;
At this point, it’s fair to say it has become mainstream.&lt;/p&gt;

&lt;p&gt;There are, of course, organizations that cannot adopt generative AI yet due to security or regulatory constraints.&lt;br&gt;&lt;br&gt;
But for software teams that &lt;em&gt;can&lt;/em&gt; use AI and still choose not to, the issue is no longer technical—it’s a matter of mindset.&lt;/p&gt;

&lt;p&gt;Productivity gains from AI-assisted development are real.&lt;br&gt;&lt;br&gt;
Today, shipping and operating small to mid-sized services with minimal handwritten code is realistic.&lt;/p&gt;

&lt;p&gt;And yet, many teams report the same frustrations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“We can’t keep track of the massive amount of AI-generated code.”&lt;/li&gt;
&lt;li&gt;“AI-written code is hard to maintain.”&lt;/li&gt;
&lt;li&gt;“Pull requests are getting huge, and reviews have become the bottleneck.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I believe these problems are not caused by limitations of AI itself.&lt;br&gt;&lt;br&gt;
They stem from the fact that &lt;strong&gt;our development processes haven’t caught up with the AI era&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Reviews Can’t Even Start
&lt;/h2&gt;

&lt;p&gt;AI dramatically increases the speed of code generation.&lt;br&gt;&lt;br&gt;
But the real pain in AI-assisted development is not that diffs are large.&lt;/p&gt;

&lt;p&gt;The real problem is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reviewers don’t know where to start.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When reviewing an AI-generated pull request, reviewers implicitly need answers to questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the intent of this change?&lt;/li&gt;
&lt;li&gt;What is the scope of this PR? (Where is the review boundary?)&lt;/li&gt;
&lt;li&gt;What assumptions are being preserved?&lt;/li&gt;
&lt;li&gt;Which parts are risky and need extra attention?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without an explicit plan, reviewers are forced to reconstruct all of this information &lt;strong&gt;from the diff alone&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As AI writes more code faster, this reconstruction step becomes the true bottleneck.&lt;/p&gt;




&lt;h2&gt;
  
  
  What “Plan” Means in This Article
&lt;/h2&gt;

&lt;p&gt;Before going further, let’s define what “plan” means here.&lt;/p&gt;

&lt;p&gt;This is not about tools or features.&lt;br&gt;&lt;br&gt;
It’s about &lt;strong&gt;how we structure information to make reviews work in the AI era&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A plan is not a TODO list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A plan is a compressed decision artifact that makes implementation and review possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A good plan typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Findings from investigating the existing codebase
(what matters, what assumptions exist)&lt;/li&gt;
&lt;li&gt;Goals and non-goals&lt;/li&gt;
&lt;li&gt;References to relevant specs or architecture&lt;/li&gt;
&lt;li&gt;Expected impact and risks&lt;/li&gt;
&lt;li&gt;The review boundary (how far this PR should go)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key point is that a plan is &lt;strong&gt;reviewable in size&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
It is not a long design document.&lt;/p&gt;
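
&lt;p&gt;As one concrete shape (a hypothetical template; the section names are illustrative, not a prescribed format), a committed plan might look like this:&lt;/p&gt;

```markdown
# Plan: Per-token rate limiting for the public API

## Findings
- Throttling lives in middleware today and assumes per-IP keys.

## Goals / Non-goals
- Goal: per-token limits for authenticated clients.
- Non-goal: changing anonymous limits (behavior unchanged in this step).

## Impact and Risks
- Touches middleware and token lookup; risk: cache stampede on token miss.

## Review boundary
- This PR stops at the middleware; dashboard changes go to a follow-up PR.
```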




&lt;h2&gt;
  
  
  Why Plans Should Be Committed
&lt;/h2&gt;

&lt;p&gt;Here’s an important point:&lt;br&gt;&lt;br&gt;
plans of the size and quality described above are not merely theoretical.&lt;/p&gt;

&lt;p&gt;Modern AI tools (including Claude Code) can reliably generate plans at this granularity.&lt;br&gt;&lt;br&gt;
That makes plans not temporary notes, but real artifacts worth keeping.&lt;/p&gt;




&lt;h3&gt;
  
  
  1) A Plan Is a Deliverable
&lt;/h3&gt;

&lt;p&gt;In AI-assisted development, a plan is no longer a personal memo.&lt;/p&gt;

&lt;p&gt;A solid plan contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consolidated investigation results&lt;/li&gt;
&lt;li&gt;A clear list of affected files&lt;/li&gt;
&lt;li&gt;Explicit intent and boundaries&lt;/li&gt;
&lt;li&gt;Rejected alternatives and caveats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point, the plan has become &lt;strong&gt;shared team knowledge&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
It is agreed upon before implementation and often referenced longer than the code itself.&lt;/p&gt;

&lt;p&gt;If that’s the case, it should be treated like code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Anything that changes over time should live where change history is tracked best:&lt;br&gt;&lt;br&gt;
in git.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2) Reviews Can Start with Intent, Not Diff Reading
&lt;/h3&gt;

&lt;p&gt;Without a plan, reviews inevitably start like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the diff top to bottom&lt;/li&gt;
&lt;li&gt;Try to infer intent&lt;/li&gt;
&lt;li&gt;Guess the impact&lt;/li&gt;
&lt;li&gt;Discover late that “this wasn’t the intended direction”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a plan, reviews start somewhere else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this the right direction?&lt;/li&gt;
&lt;li&gt;Is this review boundary safe?&lt;/li&gt;
&lt;li&gt;Are the stated invariants preserved?&lt;/li&gt;
&lt;li&gt;Which parts deserve the most attention?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, reviews shift from&lt;br&gt;&lt;br&gt;
&lt;strong&gt;“code inspection” to “decision validation.”&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  3) PR Sizes Shrink Structurally
&lt;/h3&gt;

&lt;p&gt;When the review boundary is written explicitly in the plan, agreement on scope comes before implementation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This PR stops here&lt;/li&gt;
&lt;li&gt;Behavior is unchanged in this step&lt;/li&gt;
&lt;li&gt;Follow-up work goes into the next PR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Massive PRs become rare&lt;/li&gt;
&lt;li&gt;“While I’m here” changes are easier to reject&lt;/li&gt;
&lt;li&gt;Reviews stop collapsing under their own weight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, we convert&lt;br&gt;&lt;br&gt;
&lt;strong&gt;“how much AI can generate” into “how much humans can reasonably decide.”&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Plans Improve Review Quality—for Humans and AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Human Reviews Shift Focus
&lt;/h3&gt;

&lt;p&gt;Reviewing diffs alone leads to flat review criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the code style okay?&lt;/li&gt;
&lt;li&gt;Does it look bug-free?&lt;/li&gt;
&lt;li&gt;Are tests present?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are important, but increasingly they are areas where&lt;br&gt;&lt;br&gt;
automated checks and AI reviews excel.&lt;/p&gt;

&lt;p&gt;That means human review time is better spent on &lt;strong&gt;judgment-heavy areas&lt;/strong&gt;:&lt;br&gt;
design intent, boundaries, invariants, and risk.&lt;/p&gt;

&lt;p&gt;With a plan, reviews gain contrast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-impact areas get deeper scrutiny&lt;/li&gt;
&lt;li&gt;“Non-goals” are explicitly verified&lt;/li&gt;
&lt;li&gt;Risky paths are reproduced locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result isn’t less review effort—it’s &lt;strong&gt;better reviews&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  2) AI Reviews Become Intent-Aware
&lt;/h3&gt;

&lt;p&gt;When AI reviews only see diffs, feedback tends to be generic.&lt;/p&gt;

&lt;p&gt;When plans are included, AI reviews can instead check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are non-goals violated?&lt;/li&gt;
&lt;li&gt;Did changes cross the review boundary?&lt;/li&gt;
&lt;li&gt;Are promised invariants preserved?&lt;/li&gt;
&lt;li&gt;Are impact estimates missing something?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI reviews evolve from&lt;br&gt;&lt;br&gt;
&lt;strong&gt;diff interpretation to intent verification.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decisive Value of Plans in the AI Era
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Plans Save Context Window
&lt;/h3&gt;

&lt;p&gt;Plans change what we pass to AI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No need to paste full chat histories&lt;/li&gt;
&lt;li&gt;No need to dump raw logs&lt;/li&gt;
&lt;li&gt;Only relevant documents and summarized findings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A plan is a &lt;strong&gt;compressed state representation&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Readable by humans&lt;/li&gt;
&lt;li&gt;Easy for AI to understand&lt;/li&gt;
&lt;li&gt;Cheap in context window usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leaves more room for actual reasoning.&lt;/p&gt;




&lt;h3&gt;
  
  
  2) Plans Shortcut File Discovery
&lt;/h3&gt;

&lt;p&gt;For similar follow-up tasks, the most expensive step is often figuring out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which files matter&lt;/li&gt;
&lt;li&gt;What depends on what&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If plans are committed, AI doesn’t need to re-scan the entire codebase.&lt;br&gt;&lt;br&gt;
It can reuse explicit file lists, dependencies, and constraints—&lt;br&gt;&lt;br&gt;
avoiding unnecessary context pollution.&lt;/p&gt;
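
&lt;p&gt;As a sketch of what that reuse can look like (the plan format and the &lt;code&gt;## Affected files&lt;/code&gt; heading are assumptions, not a standard):&lt;/p&gt;

```python
# Sketch: pull the "Affected files" list out of a committed plan instead of
# re-scanning the codebase. The markdown section name is a team convention
# assumed here, not a standard.
def affected_files(plan_text, heading="## Affected files"):
    """Return the bullet items listed under the given heading."""
    files, in_section = [], False
    for line in plan_text.splitlines():
        if line.strip() == heading:
            in_section = True
        elif in_section and line.startswith("#"):
            break  # reached the next section: stop collecting
        elif in_section and line.strip().startswith("- "):
            files.append(line.strip()[2:].strip("`"))
    return files

plan = """# Plan: add OAuth2 login
## Affected files
- `app/models/user.rb`
- `config/routes.rb`
## Risks
- token expiry edge cases
"""
print(affected_files(plan))  # ['app/models/user.rb', 'config/routes.rb']
```

&lt;p&gt;An agent given this list can open exactly those files first, rather than crawling the repository.&lt;/p&gt;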




&lt;h3&gt;
  
  
  3) Plans Promote Knowledge
&lt;/h3&gt;

&lt;p&gt;Plans are short-lived by nature, but they often contain durable knowledge.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feature-level knowledge can graduate into documentation&lt;/li&gt;
&lt;li&gt;Constraints and boundaries can move into architecture docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When teams consistently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregate knowledge in plans&lt;/li&gt;
&lt;li&gt;Promote it after completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They stop relying on past chat logs as an external memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Plans Are Not Documents—They Are Process
&lt;/h2&gt;

&lt;p&gt;The goal is not to treat plans as optional artifacts.&lt;/p&gt;

&lt;p&gt;A healthy loop looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commit plans&lt;/li&gt;
&lt;li&gt;Review plans (intent, boundaries, non-goals)&lt;/li&gt;
&lt;li&gt;Update plans alongside implementation&lt;/li&gt;
&lt;li&gt;Promote lasting knowledge into canonical docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This loop becomes &lt;strong&gt;the development process in the AI era&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Faster AI-generated code makes reviews more fragile&lt;/li&gt;
&lt;li&gt;The root cause is not diff size, but the loss of a clear review starting point&lt;/li&gt;
&lt;li&gt;Treating plans as first-class artifacts enables:

&lt;ul&gt;
&lt;li&gt;Reviews to start from intent&lt;/li&gt;
&lt;li&gt;Structurally smaller PRs&lt;/li&gt;
&lt;li&gt;Higher-quality human and AI reviews&lt;/li&gt;
&lt;li&gt;Lower context window usage&lt;/li&gt;
&lt;li&gt;Cheaper future exploration&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Plans are the unit of review, generation, and knowledge in AI-assisted development.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s why plans are deliverables—and why they belong in git.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>codereview</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The 2M Token Trap: Why "Context Stuffing" Kills Reasoning</title>
      <dc:creator>Shinsuke Matsuda</dc:creator>
      <pubDate>Sun, 11 Jan 2026 14:56:20 +0000</pubDate>
      <link>https://forem.com/xhack/the-2m-token-trap-why-context-stuffing-kills-reasoning-1kck</link>
      <guid>https://forem.com/xhack/the-2m-token-trap-why-context-stuffing-kills-reasoning-1kck</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3ygmzam4w02cntim3dk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3ygmzam4w02cntim3dk.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why more context often makes LLMs worse—and what to do instead&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Context Window Arms Race
&lt;/h3&gt;

&lt;p&gt;The expansion of context windows has been staggering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Early 2023&lt;/strong&gt;: GPT-4 launches with 32K tokens
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;November 2023&lt;/strong&gt;: GPT-4 Turbo extends to 128K
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;February 2024&lt;/strong&gt;: Gemini 1.5 hits 1M—later expanding to 2M
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;March 2024&lt;/strong&gt;: Claude 3 reaches 200K
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In just two years, context capacity grew from 32K to 2M tokens—a &lt;strong&gt;62× increase&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The developer intuition was immediate and seemingly logical:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“If everything fits, just put everything in.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Paradox: More Context, Worse Results
&lt;/h3&gt;

&lt;p&gt;Practitioners are discovering a counterintuitive pattern:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;the more context you provide, the worse the model performs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Common symptoms include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Passing an entire codebase → misunderstood design intent
&lt;/li&gt;
&lt;li&gt;Including exhaustive logs → critical errors overlooked
&lt;/li&gt;
&lt;li&gt;Providing comprehensive documentation → unfocused responses
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This phenomenon has a name in the research literature:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;“Lost in the Middle”&lt;/strong&gt; (Liu et al., 2023).&lt;br&gt;&lt;br&gt;
Information placed in the middle of long contexts is systematically neglected.&lt;/p&gt;

&lt;p&gt;The uncomfortable truth is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A context window is not just storage capacity. It is cognitive load.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article explores why &lt;em&gt;Context Stuffing&lt;/em&gt; fails, what Anthropic’s Claude Code reveals about effective context management, and how to shift from &lt;strong&gt;Prompt Engineering&lt;/strong&gt; to &lt;strong&gt;Context Engineering&lt;/strong&gt;—the discipline of architectural curation for AI systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why “More Context” Doesn’t Mean “Better Understanding”
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Capacity vs. Capability
&lt;/h3&gt;

&lt;p&gt;We must distinguish between two fundamentally different concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capacity&lt;/strong&gt;: How much data fits in memory (e.g. 200K, 2M tokens)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability&lt;/strong&gt;: The ability to prioritize, connect, and reason over that data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just because a model can ingest 2 million tokens does not mean it can pay attention to them equally.&lt;/p&gt;

&lt;p&gt;Providing a 2M-token context to an LLM is like handing a new developer &lt;strong&gt;10,000 pages of documentation on day one&lt;/strong&gt; and expecting them to fix a bug in five minutes.&lt;br&gt;&lt;br&gt;
They won’t understand the system—they will immediately drown in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attention Dilution and “Lost in the Middle”
&lt;/h3&gt;

&lt;p&gt;This limitation is rooted in the &lt;strong&gt;self-attention mechanism&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
As token count increases, attention distributions flatten, signal-to-noise ratios drop, and relevant information gets buried.&lt;/p&gt;

&lt;p&gt;Liu et al. (2023) demonstrated that information placed in the middle of long contexts is &lt;strong&gt;systematically neglected—even when explicitly relevant&lt;/strong&gt;—while content at the beginning and end receives disproportionate attention.&lt;/p&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context expansion increases what can be accessed, not what can be understood.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Real-World Symptoms
&lt;/h3&gt;

&lt;p&gt;In practice, adding information often &lt;em&gt;degrades&lt;/em&gt; accuracy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entire codebases → architectural misinterpretation
&lt;/li&gt;
&lt;li&gt;Exhaustive logs → critical signals buried
&lt;/li&gt;
&lt;li&gt;Comprehensive docs → answers drift off-topic
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not failures of model intelligence.&lt;br&gt;&lt;br&gt;
They are failures of &lt;strong&gt;information structure and prioritization&lt;/strong&gt;—problems no amount of context capacity can solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The 75% Rule: Lessons from Claude Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Quality Degradation in Long Sessions
&lt;/h3&gt;

&lt;p&gt;The strongest evidence against Context Stuffing comes from &lt;strong&gt;Claude Code&lt;/strong&gt;, Anthropic’s terminal-based coding agent with a 200K context window.&lt;/p&gt;

&lt;p&gt;Soon after launch, users reported recurring issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code quality degraded over long sessions
&lt;/li&gt;
&lt;li&gt;Earlier design decisions were forgotten
&lt;/li&gt;
&lt;li&gt;Auto-compact sometimes failed, causing infinite loops
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the time, Claude Code routinely used &lt;strong&gt;over 90%&lt;/strong&gt; of its available context.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Auto-Compact at 75%
&lt;/h3&gt;

&lt;p&gt;In September 2024, Anthropic implemented a counterintuitive fix:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Trigger auto-compact when context usage reaches 75%.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~150K tokens used for storage
&lt;/li&gt;
&lt;li&gt;~50K tokens deliberately left empty
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What looked like waste turned out to be the key to &lt;strong&gt;dramatic quality improvements&lt;/strong&gt;.&lt;/p&gt;
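
&lt;p&gt;The mechanism can be sketched as a simple threshold policy (a hypothetical sketch: Claude Code’s internals are not public, and &lt;code&gt;summarize&lt;/code&gt; stands in for whatever compression step the agent applies):&lt;/p&gt;

```python
# Hypothetical sketch of a 75% auto-compact policy; the real Claude Code
# implementation is not public. When usage crosses the threshold, older
# history is compressed into a summary and the context is rebuilt from it.
CONTEXT_LIMIT = 200_000
COMPACT_AT = 0.75  # trigger compaction at 75% usage

def maybe_compact(history, count_tokens, summarize):
    used = sum(count_tokens(m) for m in history)
    if used >= COMPACT_AT * CONTEXT_LIMIT:
        # Compress everything except the most recent turns into one summary.
        summary = summarize(history[:-2])
        return [summary] + history[-2:]
    return history  # still enough headroom: leave the session alone

# Toy demo with a whitespace "tokenizer" and a stub summarizer:
toks = lambda s: len(s.split())
stub = lambda msgs: f"[summary of {len(msgs)} messages]"
print(maybe_compact(["hello world"] * 3, toks, stub))  # unchanged: far below 150K
```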

&lt;h3&gt;
  
  
  Why It Works: Inference Space
&lt;/h3&gt;

&lt;p&gt;Several hypotheses explain why this works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Compression&lt;/strong&gt; — Low-relevance information is removed
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information Restructuring&lt;/strong&gt; — Summaries reorganize scattered data
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserving Room for Reasoning&lt;/strong&gt; — Empty space enables generation
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As one developer put it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“That free context space isn’t wasted—it’s where reasoning happens.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This mirrors computer memory behavior:&lt;br&gt;&lt;br&gt;
running at 95% RAM usage doesn’t mean the remaining 5% is wasted; the system needs that slack to keep working. Push to 100%, and everything grinds to a halt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Takeaway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Filling context to capacity degrades output quality.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Effective context management requires &lt;strong&gt;headroom&lt;/strong&gt;—space reserved for reasoning, not just retrieval.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Three Principles of Context Engineering
&lt;/h2&gt;

&lt;p&gt;The era of prompt wording tweaks is ending.&lt;/p&gt;

&lt;p&gt;As Hamel Husain observed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“AI Engineering is Context Engineering.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The critical skill is no longer &lt;em&gt;what you say&lt;/em&gt; to the model, but &lt;em&gt;what you put in front of it&lt;/em&gt;—and what you deliberately leave out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Principle 1: Isolation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Do not dump the monolith.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Borrow &lt;strong&gt;Bounded Contexts&lt;/strong&gt; from Domain-Driven Design.&lt;br&gt;&lt;br&gt;
Provide the smallest effective context for the task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Add OAuth2 authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;User&lt;/code&gt; model
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionController&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;routes.rb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Relevant auth middleware
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Billing module
&lt;/li&gt;
&lt;li&gt;CSS styles
&lt;/li&gt;
&lt;li&gt;Unrelated APIs
&lt;/li&gt;
&lt;li&gt;Other test fixtures
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What is the minimum context required to solve this problem?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Principle 2: Chaining
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pass artifacts, not histories.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Break workflows into stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan → Execute → Reflect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each stage receives only the previous stage’s output—not the entire conversation history.&lt;br&gt;&lt;br&gt;
This keeps context fresh and signal-dense.&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can this be decomposed into stages that pass summaries instead of transcripts?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
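
&lt;p&gt;The staged flow can be sketched in a few lines (a minimal sketch: &lt;code&gt;ask&lt;/code&gt; stands in for any LLM call and is an assumption, not a real API):&lt;/p&gt;

```python
# Sketch of Plan -> Execute -> Reflect chaining (hypothetical; `ask` stands
# in for any LLM call). Each stage receives only the previous stage's
# artifact, never the accumulated transcript.
def run_chained(task, ask):
    plan = ask("Write a short plan for: " + task)      # stage 1: Plan
    result = ask("Execute this plan:\n" + plan)        # stage 2: Execute
    return ask("Reflect on this result:\n" + result)   # stage 3: Reflect

# Stub LLM that records its prompts, to show what each stage actually sees:
prompts = []
def stub_ask(prompt):
    prompts.append(prompt)
    return "[artifact from: " + prompt.splitlines()[0] + "]"

run_chained("add rate limiting", stub_ask)
print(len(prompts))  # 3 calls, each carrying one artifact, not a history
```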

&lt;h3&gt;
  
  
  Principle 3: Headroom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Never run a model at 100% capacity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adopt the &lt;strong&gt;75% Rule&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Token limits usually cover &lt;em&gt;input + output&lt;/em&gt;. Stuffing 195K tokens into a 200K window leaves almost no room for reasoning.&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Have I left enough space for the model to think—not just respond?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
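
&lt;p&gt;The rule is easy to make mechanical with a pre-flight check (a sketch; the helper name is hypothetical, and the token count would come from your model’s real tokenizer):&lt;/p&gt;

```python
# Sketch: enforce the 75% rule before sending a request. A hypothetical
# helper; plug in your model's real token counter.
def within_budget(prompt_tokens, limit=200_000, rule=0.75):
    """True if the prompt leaves at least (1 - rule) of the window free
    for the model's reasoning and output."""
    headroom = limit - prompt_tokens
    return headroom >= (1 - rule) * limit

print(within_budget(150_000))  # True: exactly at the 75% line
print(within_budget(195_000))  # False: 97.5% full, almost no room to think
```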




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Treat the context window as a scarce cognitive resource, not infinite storage.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. Why 200K Is the Sweet Spot
&lt;/h2&gt;

&lt;p&gt;Despite 2M-token models, &lt;strong&gt;200K is the practical sweet spot&lt;/strong&gt; for Context Engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cognitive Scale
&lt;/h3&gt;

&lt;p&gt;150K tokens (75% of 200K) is roughly one technical book—about the largest coherent “project state” both humans and LLMs can manage. Beyond that, you need chapters, summaries, and architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost and Latency
&lt;/h3&gt;

&lt;p&gt;Standard self-attention scales as &lt;strong&gt;O(n²)&lt;/strong&gt; in sequence length.&lt;br&gt;&lt;br&gt;
Doubling the context quadruples the attention cost.&lt;br&gt;&lt;br&gt;
200K balances performance, latency, and cost.&lt;/p&gt;
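
&lt;p&gt;A back-of-the-envelope check makes the scaling concrete (standard self-attention only; real systems add optimizations, so treat these ratios as an upper bound):&lt;/p&gt;

```python
# Relative cost of standard self-attention grows with the square of the
# sequence length (an upper bound; optimized kernels change the constants).
cost = lambda n: n ** 2
print(cost(400_000) / cost(200_000))    # 4.0  : doubling context quadruples cost
print(cost(2_000_000) / cost(200_000))  # 100.0: a 2M pass costs ~100x a 200K pass
```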

&lt;h3&gt;
  
  
  Methodological Discipline
&lt;/h3&gt;

&lt;p&gt;200K &lt;strong&gt;forces curation&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Exceeding it is a code smell: unclear boundaries, oversized tasks, or stuffing instead of structuring.&lt;/p&gt;

&lt;p&gt;Anthropic offers 1M tokens—but behind premium tiers.&lt;br&gt;&lt;br&gt;
The implicit message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;1M is for special cases. 200K is the default for a reason.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The constraint is not a limitation—it is &lt;em&gt;the&lt;/em&gt; design principle.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Conclusion: From Prompt Engineering to Context Engineering
&lt;/h2&gt;

&lt;p&gt;The context window arms race delivered a 62× increase in capacity.&lt;br&gt;&lt;br&gt;
But capacity was never the bottleneck.&lt;/p&gt;

&lt;p&gt;The bottleneck is—and always has been—&lt;strong&gt;curation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The shift is fundamental:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt Engineering&lt;/th&gt;
&lt;th&gt;Context Engineering&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“How do I phrase this?”&lt;/td&gt;
&lt;td&gt;“What should the model see?”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimizing words&lt;/td&gt;
&lt;td&gt;Architecting information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-shot prompts&lt;/td&gt;
&lt;td&gt;Multi-stage pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filling capacity&lt;/td&gt;
&lt;td&gt;Preserving headroom&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Three Questions to Ask Before Every Task
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Am I stuffing context just because I can?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Relevant beats exhaustive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Is this context isolated to the real problem?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you can’t state the boundary, you haven’t found it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Have I left room for the model to think?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Output quality requires input restraint.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The era of prompt engineering rewarded clever wording.&lt;br&gt;&lt;br&gt;
The era of context engineering rewards &lt;strong&gt;architectural judgment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The question is no longer:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What should I say to the model?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What world should the model see?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  7. References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Research Papers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Liu et al., &lt;em&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/em&gt; (2023)
&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2307.03172&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tools &amp;amp; Methodologies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;planstack.ai — &lt;a href="https://github.com/planstack-ai/planstack" rel="noopener noreferrer"&gt;https://github.com/planstack-ai/planstack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Empirical Studies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Greg Kamradt, &lt;em&gt;Needle in a Haystack&lt;/em&gt;
&lt;a href="https://github.com/gkamradt/LLMTest_NeedleInAHaystack" rel="noopener noreferrer"&gt;https://github.com/gkamradt/LLMTest_NeedleInAHaystack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Articles &amp;amp; Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;How Claude Code Got Better by Protecting More Context&lt;/em&gt; (2024)&lt;br&gt;&lt;br&gt;
&lt;a href="https://hyperdev.matsuoka.com/p/how-claude-code-got-better-by-protecting" rel="noopener noreferrer"&gt;https://hyperdev.matsuoka.com/p/how-claude-code-got-better-by-protecting&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hamel Husain, &lt;em&gt;Context Rot&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://hamel.dev/notes/llm/rag/p6-context_rot.html" rel="noopener noreferrer"&gt;https://hamel.dev/notes/llm/rag/p6-context_rot.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>promptengineering</category>
      <category>contextengineering</category>
    </item>
  </channel>
</rss>
