<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: shakti mishra</title>
    <description>The latest articles on Forem by shakti mishra (@shakti_mishra_308e9f36b5d).</description>
    <link>https://forem.com/shakti_mishra_308e9f36b5d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3895003%2Ff64e0882-0aa9-44ad-8c7c-a53d7a669188.jpg</url>
      <title>Forem: shakti mishra</title>
      <link>https://forem.com/shakti_mishra_308e9f36b5d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shakti_mishra_308e9f36b5d"/>
    <language>en</language>
    <item>
      <title>Mythos and Cyber Models: What does it mean for the future of software?</title>
      <dc:creator>shakti mishra</dc:creator>
      <pubDate>Sat, 25 Apr 2026 23:24:05 +0000</pubDate>
      <link>https://forem.com/shakti_mishra_308e9f36b5d/mythos-and-cyber-models-what-does-it-mean-for-the-future-of-software-edb</link>
      <guid>https://forem.com/shakti_mishra_308e9f36b5d/mythos-and-cyber-models-what-does-it-mean-for-the-future-of-software-edb</guid>
      <description>&lt;h2&gt;
  
  
  Anthropic Made Its Model Worse On Purpose. Here's What That Tells You About the State of AI Security.
&lt;/h2&gt;

&lt;p&gt;In the entire history of commercial AI model releases, no company has intentionally made a model &lt;em&gt;worse&lt;/em&gt; on a published benchmark before shipping it to the public.&lt;/p&gt;

&lt;p&gt;That changed this month.&lt;/p&gt;

&lt;p&gt;Anthropic released Opus 4.7. And if you look at the CyberBench scores, it performs below Opus 4.6 — the model it was supposed to supersede. That regression was not a bug. It was a deliberate product decision, and understanding why they made it is one of the most important things a software architect can do right now.&lt;/p&gt;

&lt;p&gt;The reason is a model called Claude Mythos. It is the most capable vulnerability-discovery system ever tested on real-world production software. It found a 27-year-old flaw in OpenBSD — one of the most security-hardened operating systems on the planet. It found a 16-year-old vulnerability in FFmpeg. It chained multiple Linux kernel weaknesses into a working privilege escalation exploit, going from ordinary user access to full machine control.&lt;/p&gt;

&lt;p&gt;And then Anthropic looked at those results, looked at the systems the rest of the world runs on, and decided the right thing to do was to restrict access before releasing anything more capable publicly.&lt;/p&gt;

&lt;p&gt;That decision is the signal. Everything else in this post explains what it means.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude Mythos Actually Did
&lt;/h2&gt;

&lt;p&gt;Mythos is not a research artifact or a red-team proof of concept. It is a production-grade capability that was released — under the codename &lt;strong&gt;Project Glasswing&lt;/strong&gt; — to a small set of approximately 40 vetted organizations that operate critical software, specifically so they could begin hardening their systems before the model's capabilities became more widely known.&lt;/p&gt;

&lt;p&gt;What it demonstrated in controlled environments:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active zero-day discovery at scale.&lt;/strong&gt; Mythos does not just match known CVE patterns. It analyzes real systems, identifies previously undocumented vulnerabilities, and produces working proof-of-concept exploit chains. The OpenBSD bug had existed since 1997. It was not obscure legacy code that nobody touched — OpenBSD is actively maintained and specifically designed to be resistant to exactly this kind of analysis. A 27-year-old bug surviving in that environment is not a failure of individual engineers. It is a signal about the limits of human-scale review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploit chaining.&lt;/strong&gt; Finding a single vulnerability is one thing. Combining multiple weaknesses into a viable attack path is the work that turns a theoretical risk into a real one. Mythos demonstrated the ability to do this across kernel-level Linux vulnerabilities, turning a sequence of individually low-severity issues into full privilege escalation. A chain like this typically takes a skilled attacker weeks to construct. The model produced it as part of its analysis pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale that no human team can match.&lt;/strong&gt; The significance is not any single finding — it is the rate. Human security researchers are bottlenecked by expertise, time, and context-switching. Mythos evaluates thousands of potential attack surfaces in parallel, continuously, without fatigue or prioritization constraints.&lt;/p&gt;




&lt;h2&gt;
  
  
  OpenAI Is Thinking the Same Thing
&lt;/h2&gt;

&lt;p&gt;Anthropic is not operating in isolation. Within days of Mythos going out to Project Glasswing partners, OpenAI released &lt;strong&gt;GPT-5.4-Cyber&lt;/strong&gt; — a variant of its flagship model fine-tuned specifically for defensive cybersecurity use cases. It is only available to vetted participants in their &lt;strong&gt;Trusted Access for Cyber (TAC)&lt;/strong&gt; program.&lt;/p&gt;

&lt;p&gt;The parallel is striking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anthropic                              OpenAI
─────────────────────────────────────────────────────
Claude Mythos                          GPT-5.4-Cyber
Project Glasswing (~40 partners)       TAC program (vetted participants)
Restricted pre-release access          Safety-guardrail modifications
                                       for authenticated defenders
Vulnerability discovery &amp;amp; chaining     Binary reverse engineering enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT-5.4-Cyber goes further in one specific way: it removes many standard safety guardrails for authenticated defenders, including support for binary reverse engineering — a capability that is normally off-limits. OpenAI's Codex Security tool has already contributed to fixing over 3,000 critical and high-severity vulnerabilities.&lt;/p&gt;

&lt;p&gt;What this pattern tells you is not that these models are risky in an abstract sense. It is that both of the leading frontier AI labs have independently reached the same conclusion: their models are now powerful enough that unrestricted public access would be a net liability. That is not a marketing stunt. That is not regulatory positioning. That is two organizations treating their own work the way defense contractors treat classified technology.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Shift That Actually Matters: Human Effort Is No Longer the Limit
&lt;/h2&gt;

&lt;p&gt;For as long as software security has existed as a discipline, there has been a natural rate-limiting factor: &lt;strong&gt;human effort&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Finding vulnerabilities required skilled people with time, focus, and domain expertise. Even the most sophisticated state-level adversaries were constrained by how fast their teams could move. The difficulty of exploitation was, itself, a form of defense.&lt;/p&gt;

&lt;p&gt;That constraint is gone.&lt;/p&gt;

&lt;p&gt;Here is what the new operating environment looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Old model (human-rate-limited):
─────────────────────────────────────────────────────
Attacker → manually analyze codebase
         → weeks/months per target
         → limited to known vulnerability patterns
         → exploitation requires specialists
         → limited parallelism

New model (AI-accelerated):
─────────────────────────────────────────────────────
AI system → continuous automated analysis
          → thousands of targets in parallel
          → identifies novel vulnerability classes
          → generates working exploit chains
          → operates 24/7 without fatigue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attack surface has not changed. The cost of probing it has dropped by orders of magnitude.&lt;/p&gt;

&lt;p&gt;Vulnerability discovery now happens continuously instead of periodically. Exploit development can be partially or fully automated. And as these models become accessible — either through legitimate programs or through underground markets where stripped-down variants already circulate — the population of actors capable of sophisticated attacks expands dramatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: The Remediation Gap
&lt;/h2&gt;

&lt;p&gt;Here is the uncomfortable truth that the Mythos story exposes.&lt;/p&gt;

&lt;p&gt;Most of the risk in software systems today does not come from vulnerabilities that haven't been found yet. It comes from vulnerabilities that have already been found, are already documented, and have not been patched.&lt;/p&gt;

&lt;p&gt;Security teams work against a perpetual backlog. Systems are too fragile to update quickly. Regressions break things when patches go in. Dependency chains make change expensive. This is the normal operational state of almost every engineering organization running at scale.&lt;/p&gt;

&lt;p&gt;What AI does is &lt;strong&gt;accelerate the discovery side without equally accelerating the remediation side.&lt;/strong&gt; That asymmetry is the actual risk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Discovery velocity         ████████████████████████████░░  (AI-accelerated)
Remediation velocity       ████████░░░░░░░░░░░░░░░░░░░░░░  (still human-rate-limited)
                                    ^^^
                            This gap is your attack surface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A system that finds 10,000 previously unknown vulnerabilities in a month is not obviously helpful if your team can patch 200. The remaining 9,800 are now known — potentially to adversaries — and unaddressed. The net effect can be a larger effective attack surface, even though the underlying systems have not changed at all.&lt;/p&gt;

&lt;p&gt;This is the design problem that the industry has not solved. Mythos forced the conversation into the open.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Monoculture Risk Nobody Is Talking About
&lt;/h2&gt;

&lt;p&gt;Individual vulnerabilities are dangerous. Vulnerabilities in software that runs everywhere are catastrophic.&lt;/p&gt;

&lt;p&gt;The hidden amplification factor in this story is &lt;strong&gt;software monoculture&lt;/strong&gt;: the same operating systems, the same libraries, the same frameworks are used across millions of production systems globally. A single vulnerability in glibc, OpenSSL, or the Linux kernel is not a bug in one application. It is a bug in the substrate that most of the world's software infrastructure runs on.&lt;/p&gt;

&lt;p&gt;When AI accelerates vulnerability discovery in monoculture environments, the impact does not scale linearly — it scales by the number of systems running that codebase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional single-target exploit:
  1 attacker → 1 target → 1 breach

AI-discovered monoculture exploit:
  1 AI system → 1 vulnerability → millions of targets
                                 (same code, different deployments)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how the Mythos findings — an OpenBSD bug, an FFmpeg flaw — become systemic risks rather than isolated incidents. OpenBSD runs in firewalls, embedded systems, and network appliances across critical infrastructure. FFmpeg processes video in applications that touch billions of users. These are not edge cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  An Unexpected Counterforce
&lt;/h2&gt;

&lt;p&gt;There is one interesting development beginning to emerge from the same forces that created this risk.&lt;/p&gt;

&lt;p&gt;As AI reduces the cost of building software, organizations may — over time — begin to build more customized, less standardized systems. When you can generate a bespoke authentication module in minutes instead of weeks, the calculus around using shared libraries changes.&lt;/p&gt;

&lt;p&gt;If that shift materializes at scale, it could reduce the blast radius of any single vulnerability. Attackers cannot reuse the same exploit across millions of targets if the targets are no longer running identical code.&lt;/p&gt;

&lt;p&gt;The catch is that this benefit only materializes if &lt;strong&gt;security practices evolve at the same pace as development&lt;/strong&gt;. Right now, AI is accelerating development velocity significantly faster than it is accelerating security rigor. The window between "built with AI" and "secured with AI" is where the risk lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Is Heading: AI vs. AI
&lt;/h2&gt;

&lt;p&gt;The end state of this trajectory is a security landscape that operates entirely differently from today's.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current state:
  Human attackers ──────────► Human defenders
  (slow, expertise-limited)    (slow, expertise-limited)

Near-term state:
  AI attackers ─────────────► Human defenders
  (fast, scalable)              (slow, expertise-limited)
                    ^^^
              Current danger zone

Future state:
  AI attackers ─────────────► AI defenders
  (fast, scalable)              (fast, scalable)
         └──────────────────────────┘
              Competing feedback loops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are currently in the second phase — the danger zone. AI-accelerated attack capability is outpacing human-scale defense. The third phase, where AI defense catches up, is coming, but it is not here yet.&lt;/p&gt;

&lt;p&gt;The organizations that close that gap fastest will not necessarily have the most capable models. They will have the tightest feedback loop between detection and remediation. Anthropic understood this when they degraded Opus 4.7 on CyberBench. They looked at Mythos's capabilities, understood that making something more capable publicly available was a liability before the defense side had caught up, and made a product decision that cost them a benchmark headline in exchange for reduced near-term risk.&lt;/p&gt;

&lt;p&gt;That is the playbook. Build for the loop, not the leaderboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Developers and Architects Should Actually Do Right Now
&lt;/h2&gt;

&lt;p&gt;The model release news cycle will pass. The structural shift it represents will not. Here is how to think about your exposure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit your patch lag.&lt;/strong&gt; The remediation gap is your real risk surface. How long does it take your organization to go from "CVE published" to "patch deployed in production"? That number tells you more about your actual risk than your perimeter security posture.&lt;/p&gt;
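
&lt;p&gt;That number is easy to start measuring. The sketch below is a minimal illustration in Python; the dates are hypothetical, and in practice you would feed it pairs pulled from your own CVE tracker and deployment log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date
from statistics import median

# Hypothetical records: (CVE published, patch deployed in production).
findings = [
    (date(2026, 1, 5),  date(2026, 1, 19)),
    (date(2026, 1, 12), date(2026, 3, 2)),
    (date(2026, 2, 1),  date(2026, 2, 10)),
]

# Lag in days per finding, then the two summary numbers that matter.
lag_days = [(deployed - published).days for published, deployed in findings]
print("median patch lag:", median(lag_days), "days")  # median patch lag: 14 days
print("worst case:", max(lag_days), "days")           # worst case: 49 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Track the median and the worst case separately: the median tells you how the pipeline performs, the worst case tells you where the exposure lives.&lt;/p&gt;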

&lt;p&gt;&lt;strong&gt;Treat your dependency graph as infrastructure.&lt;/strong&gt; Libraries and shared frameworks are not just technical debt decisions — they are blast radius decisions. Every shared dependency is a vector through which a single discovered vulnerability reaches you. That calculus now needs to include AI-accelerated discovery timelines.&lt;/p&gt;
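
&lt;p&gt;One way to make blast radius concrete is to treat the manifest as a graph and ask what can reach your application. The sketch below uses a tiny hypothetical graph; real input would come from your lockfile or SBOM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical dependency graph: each package maps to its direct dependencies.
deps = {
    "app":    ["web", "auth"],
    "web":    ["http", "tls"],
    "auth":   ["tls", "crypto"],
    "tls":    ["crypto"],
    "http":   [],
    "crypto": [],
}

def blast_radius(pkg):
    """All packages whose vulnerabilities can reach pkg at runtime."""
    seen, stack = set(), [pkg]
    while stack:
        for dep in deps.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(blast_radius("app")))  # ['auth', 'crypto', 'http', 'tls', 'web']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every package in that set is a path by which someone else's bug becomes your incident.&lt;/p&gt;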

&lt;p&gt;&lt;strong&gt;Start thinking about detection-to-remediation as a pipeline, not a process.&lt;/strong&gt; The organizations that will handle the next phase of AI-accelerated attacks are the ones that have automated the boring parts of remediation so that their human capacity can focus on the genuinely novel cases.&lt;/p&gt;
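
&lt;p&gt;A pipeline view begins with a routing step: findings that match a fix class you have already automated go straight to remediation, everything else goes to people. A minimal sketch, with hypothetical finding classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical fix classes the organization has already automated.
KNOWN_FIXES = {
    "outdated-dependency": "bump the version and run the test suite",
    "known-cve-pattern":   "apply the vendor patch",
}

def triage(findings):
    """Split findings into auto-remediable work and human review."""
    auto, manual = [], []
    for f in findings:
        if f["class"] in KNOWN_FIXES:
            auto.append((f["id"], KNOWN_FIXES[f["class"]]))
        else:
            manual.append(f["id"])
    return auto, manual

auto, manual = triage([
    {"id": "F-1", "class": "outdated-dependency"},
    {"id": "F-2", "class": "novel-logic-flaw"},
])
print(len(auto), "automated,", len(manual), "for human review")  # 1 automated, 1 for human review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The goal is to grow &lt;code&gt;KNOWN_FIXES&lt;/code&gt; over time so that human capacity is spent only on the genuinely novel cases.&lt;/p&gt;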

&lt;p&gt;&lt;strong&gt;Understand which of your systems run on monoculture infrastructure.&lt;/strong&gt; OpenBSD, Linux kernel, FFmpeg, OpenSSL, glibc — if your systems touch these, you are exposed to a different risk profile than systems running on more customized stacks. Know which category you are in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The intentional benchmark regression is the story.&lt;/strong&gt; Anthropic degraded Opus 4.7 on CyberBench specifically because Mythos demonstrated that unrestricted public access to more capable models is a net liability for critical infrastructure. That is an industry-first decision worth understanding deeply.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human effort is no longer the rate-limiting factor in vulnerability discovery.&lt;/strong&gt; AI systems can probe attack surfaces at scale, continuously, across thousands of targets — and produce working exploit chains, not just theoretical flags.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The remediation gap is now the primary risk.&lt;/strong&gt; AI accelerates discovery without equally accelerating patching. The asymmetry between those two velocities is your real attack surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software monoculture amplifies everything.&lt;/strong&gt; A single AI-discovered vulnerability in shared infrastructure (Linux, OpenSSL, FFmpeg) is not one bug in one system — it's one bug in the foundation of millions of systems simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both Anthropic and OpenAI are now treating their own models like classified defense technology.&lt;/strong&gt; This is not regulatory theater. It is a calibrated signal that capability has outpaced the defense ecosystem's readiness.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question That Should Keep Architects Up at Night
&lt;/h2&gt;

&lt;p&gt;Anthropic made their model worse on purpose because they understood something most of the industry has not caught up to yet: the capability is already here. The question that remains is who gets to use it first, and whether the defense side catches up before the attack side scales.&lt;/p&gt;

&lt;p&gt;We like to believe that modern software systems are mature and well understood. They are not. A 27-year-old bug in a deliberately hardened operating system is not an anomaly — it is evidence that complexity has always outpaced our ability to fully audit what we build. AI is not introducing that complexity. It is exposing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here is the question I want to leave you with:&lt;/strong&gt; If a system like Mythos ran against your production infrastructure today, how long would it take your team to close what it found — and do you have a plan for the gap?&lt;/p&gt;

&lt;p&gt;Drop your answer in the comments. I'm particularly curious how organizations with large legacy surface areas are thinking about this.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Credit: The technical analysis in this post is based on insights from &lt;a href="https://newsletter.karuparti.com" rel="noopener noreferrer"&gt;Diary of an AI Architect&lt;/a&gt; by Anurag Karuparti — a newsletter worth following if you build or operate software at scale.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>cybersecurity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>5 Markdown Files That Tame Non-Deterministic AI in Your Engineering Org</title>
      <dc:creator>shakti mishra</dc:creator>
      <pubDate>Fri, 24 Apr 2026 00:08:51 +0000</pubDate>
      <link>https://forem.com/shakti_mishra_308e9f36b5d/5-markdown-files-that-tame-non-deterministic-ai-in-your-engineering-org-31h3</link>
      <guid>https://forem.com/shakti_mishra_308e9f36b5d/5-markdown-files-that-tame-non-deterministic-ai-in-your-engineering-org-31h3</guid>
      <description>&lt;h1&gt;
  
  
  Your AI Coding Agent Has No Memory. These 5 Files Fix That.
&lt;/h1&gt;

&lt;p&gt;Picture this: two developers on the same team, same repo, same AI coding assistant. One gets perfectly typed TypeScript with tests. The other gets &lt;code&gt;any&lt;/code&gt; everywhere and zero test coverage. Same tool. Same codebase. Completely different output.&lt;/p&gt;

&lt;p&gt;This is not a bug. It is the default state of AI-assisted engineering when you leave standardization up to individual prompting habits.&lt;/p&gt;

&lt;p&gt;One developer's Copilot generates tests for every function. Another skips testing entirely. One team receives code that reuses the shared auth module. Another ends up with a custom, hand-rolled auth flow. One developer's output follows established naming conventions. Another produces code that looks like it came from a completely different codebase.&lt;/p&gt;

&lt;p&gt;As AI becomes embedded in software delivery, the real problem is not capability — it is consistency. The rules, workflows, and context that shape good engineering decisions need to live somewhere permanent. Somewhere the model will actually read.&lt;/p&gt;

&lt;p&gt;That somewhere is your repository. And the format is five markdown files.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Prompting Alone Doesn't Scale
&lt;/h2&gt;

&lt;p&gt;Every engineer prompts differently. That is fine for a solo project. It is a slow disaster for a team.&lt;/p&gt;

&lt;p&gt;When everyone relies on personal prompting habits, you get a system where quality varies by individual, standards drift across branches, good decisions made once never get inherited by the next PR, and AI agents context-switch between contributors with no shared memory.&lt;/p&gt;

&lt;p&gt;The models are not the bottleneck. Your team's ability to encode engineering judgment into the system around the model is.&lt;/p&gt;

&lt;p&gt;GitHub now supports a structured set of repository-level files that give AI coding agents a persistent, shared understanding of how your team works. These files load into context automatically, apply to specific code paths, define specialist roles, and package reusable workflows. They work across GitHub Copilot, Claude Code, Cursor, and Codex.&lt;/p&gt;
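
&lt;p&gt;Taken together, the five files occupy well-defined places in the repository. The layout below is illustrative; the individual file names under &lt;code&gt;instructions/&lt;/code&gt;, &lt;code&gt;agents/&lt;/code&gt;, and &lt;code&gt;skills/&lt;/code&gt; are examples, not requirements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-repo/
├── .github/
│   ├── copilot-instructions.md          1. always-on standards
│   ├── instructions/
│   │   └── frontend.instructions.md     2. path-scoped rules
│   ├── agents/
│   │   └── security-reviewer.md         4. custom agent profiles
│   └── skills/
│       └── debug-ci/
│           └── SKILL.md                 5. reusable skills
└── AGENTS.md                            3. repo operating manual
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;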

&lt;p&gt;Here is how each one works — and why it matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt; — The Always-On Standards Layer
&lt;/h2&gt;

&lt;p&gt;This is your baseline. It applies to every AI interaction in the repo, automatically, without anyone having to remember to include it.&lt;/p&gt;

&lt;p&gt;Put broad engineering expectations here: coding conventions, testing requirements, accessibility standards, architectural boundaries, documentation rules, error-handling patterns. If your team wants the AI to always write typed APIs, follow a specific folder structure, or update tests whenever production code changes — this is where that lives.&lt;/p&gt;

&lt;p&gt;It is one of the highest-leverage files you can create. Not because it does anything new, but because it makes implicit standards explicit and permanent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# .github/copilot-instructions.md&lt;/span&gt;

&lt;span class="gu"&gt;## Language and framework&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use TypeScript with strict mode enabled
&lt;span class="p"&gt;-&lt;/span&gt; Use Express.js for all API endpoints
&lt;span class="p"&gt;-&lt;/span&gt; Never use &lt;span class="sb"&gt;`any`&lt;/span&gt; type

&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Write unit tests for every new function using Jest
&lt;span class="p"&gt;-&lt;/span&gt; Maintain minimum 80% code coverage

&lt;span class="gu"&gt;## Error handling&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use custom error classes from &lt;span class="sb"&gt;`src/errors/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always return structured error responses with status code and message

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never import directly from &lt;span class="sb"&gt;`src/internal/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use the repository pattern for all database access
&lt;span class="p"&gt;-&lt;/span&gt; All new endpoints must go through the API gateway in &lt;span class="sb"&gt;`src/gateway/`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Think of it as onboarding documentation that never gets ignored, because the AI reads it every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. &lt;code&gt;.github/instructions/*.instructions.md&lt;/code&gt; — The Path-Scoped Layer
&lt;/h2&gt;

&lt;p&gt;Most real codebases are not uniform. Your frontend follows different rules than your infrastructure. Your data pipelines need different guardrails than your API layer.&lt;/p&gt;

&lt;p&gt;Path-specific instruction files let you apply the right constraints in the right place. Each file uses an &lt;code&gt;applyTo&lt;/code&gt; pattern to activate only for matching directories or file types. This is where standardization gets intelligent instead of blunt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;applyTo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/frontend/**"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="gh"&gt;# Frontend instructions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use React functional components with hooks
&lt;span class="p"&gt;-&lt;/span&gt; Use Tailwind CSS for styling, no inline styles
&lt;span class="p"&gt;-&lt;/span&gt; All components must be accessible (WCAG 2.1 AA)
&lt;span class="p"&gt;-&lt;/span&gt; Use React Testing Library for component tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;applyTo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infrastructure/**"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="gh"&gt;# Infrastructure instructions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use Bicep for all Azure resource definitions
&lt;span class="p"&gt;-&lt;/span&gt; Never hardcode secrets, always reference Key Vault
&lt;span class="p"&gt;-&lt;/span&gt; Tag every resource with &lt;span class="sb"&gt;`environment`&lt;/span&gt; and &lt;span class="sb"&gt;`team`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You stop treating the repo like a monolith and start giving the AI the right lens for each context. The frontend agent should not be applying infrastructure conventions. Now it will not.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. &lt;code&gt;AGENTS.md&lt;/code&gt; — The Repo's Operating Manual
&lt;/h2&gt;

&lt;p&gt;This is the file that tells an autonomous agent how work actually gets done here.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; is an open format for guiding coding agents that originated in the OpenAI ecosystem. GitHub's Copilot coding agent added support for it in 2025, and the industry has converged around it: GitHub also supports &lt;code&gt;CLAUDE.md&lt;/code&gt; and &lt;code&gt;GEMINI.md&lt;/code&gt; as equivalent alternatives, depending on your toolchain.&lt;/p&gt;

&lt;p&gt;Think of it as operational memory for the repo. What commands should the agent run? How should it test? What should it never touch? How should it title pull requests? What counts as "done"?&lt;/p&gt;

&lt;p&gt;Without this file, every autonomous agent starts from scratch. With it, engineering standards become portable across tools and contributors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md&lt;/span&gt;

&lt;span class="gu"&gt;## Build and test&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm run build`&lt;/span&gt; before committing
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm test`&lt;/span&gt; and ensure all tests pass
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm run lint`&lt;/span&gt; and fix all warnings

&lt;span class="gu"&gt;## Pull requests&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Title format: &lt;span class="sb"&gt;`[AREA] Short description`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always include a summary of what changed and why
&lt;span class="p"&gt;-&lt;/span&gt; Never push directly to &lt;span class="sb"&gt;`main`&lt;/span&gt;

&lt;span class="gu"&gt;## Off limits&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not modify files in &lt;span class="sb"&gt;`src/generated/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not update &lt;span class="sb"&gt;`package-lock.json`&lt;/span&gt; manually
&lt;span class="p"&gt;-&lt;/span&gt; Do not change CI/CD workflows without approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The distinction from &lt;code&gt;copilot-instructions.md&lt;/code&gt; is important. That file sets coding standards. This one sets operating procedure. One shapes what the AI produces. The other shapes how it behaves as an agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. &lt;code&gt;.github/agents/*.md&lt;/code&gt; — Custom Agent Profiles (The Specialist Layer)
&lt;/h2&gt;

&lt;p&gt;Not every task should go to a general-purpose coding assistant. Sometimes you need a security reviewer who will not touch production code. Sometimes you need an implementation planner. Sometimes you need a refactoring specialist with write access to exactly two directories.&lt;/p&gt;

&lt;p&gt;Custom agent files let you define specialist personas with their own instructions, tools, and restrictions. They live in &lt;code&gt;.github/agents/&lt;/code&gt; and can specify which tools the agent is allowed to use — including MCP servers if your setup supports them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"Reviews code for security vulnerabilities"&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; code_search
&lt;span class="p"&gt;  -&lt;/span&gt; read_file
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a security reviewer. Your job is to find vulnerabilities.

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Flag any use of &lt;span class="sb"&gt;`eval()`&lt;/span&gt;, &lt;span class="sb"&gt;`innerHTML`&lt;/span&gt;, or unsanitized user input
&lt;span class="p"&gt;-&lt;/span&gt; Check for SQL injection in all database queries
&lt;span class="p"&gt;-&lt;/span&gt; Verify that all API endpoints require authentication
&lt;span class="p"&gt;-&lt;/span&gt; You may read code but never modify it
&lt;span class="p"&gt;-&lt;/span&gt; Output a structured report with severity levels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is architecturally different from general instructions. General instructions tell every agent how your team works. Custom agents create intentional specialists for jobs that repeat. You define the role once, and any developer on the team can invoke it without reinventing the persona each time.&lt;/p&gt;

&lt;p&gt;The repo stops having one AI assistant with inconsistent behavior. It starts having a team of specialists with defined roles.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. &lt;code&gt;SKILL.md&lt;/code&gt; — The Reusable Capability Layer
&lt;/h2&gt;

&lt;p&gt;This is where things get genuinely powerful.&lt;/p&gt;

&lt;p&gt;A skill is a folder of instructions, scripts, and resources that an agent loads on demand for a specific task. It lives under &lt;code&gt;.github/skills/&lt;/code&gt; and must include a &lt;code&gt;SKILL.md&lt;/code&gt; file. GitHub has made the spec an open standard, and skills work across Copilot's coding agent, the CLI, and VS Code agent mode.&lt;/p&gt;

&lt;p&gt;The difference between a skill and a custom instruction is that a skill can package a repeatable workflow — not just guidance, but executable steps with associated scripts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/skills/
  debug-ci/
    SKILL.md
    scripts/
      analyze-logs.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# SKILL.md
---
&lt;/span&gt;name: "debug-ci"
&lt;span class="gh"&gt;description: "Debug failing GitHub Actions workflows"
---
&lt;/span&gt;
&lt;span class="gu"&gt;## Steps&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read the failing workflow YAML from &lt;span class="sb"&gt;`.github/workflows/`&lt;/span&gt;
&lt;span class="p"&gt;2.&lt;/span&gt; Run &lt;span class="sb"&gt;`scripts/analyze-logs.sh`&lt;/span&gt; to extract the error
&lt;span class="p"&gt;3.&lt;/span&gt; Check if the failure is a flaky test, dependency issue, or config error
&lt;span class="p"&gt;4.&lt;/span&gt; Suggest a fix with the exact file and line to change
&lt;span class="p"&gt;5.&lt;/span&gt; If the fix involves a dependency update, run &lt;span class="sb"&gt;`npm audit`&lt;/span&gt; first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can build skills for anything that happens more than twice: Playwright UI testing, infrastructure code review, proposal drafting, schema validation, changelog generation. The team stops starting from zero on recurring tasks. Good engineering behavior becomes a reusable asset.&lt;/p&gt;
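&lt;p&gt;The &lt;code&gt;scripts/analyze-logs.sh&lt;/code&gt; helper referenced in the skill above is never shown. Here is a minimal sketch of what such a script might look like; the grep patterns and output format are illustrative assumptions, not part of any GitHub spec:&lt;/p&gt;

```shell
#!/usr/bin/env sh
# Sketch of a CI log analyzer like the scripts/analyze-logs.sh referenced
# in SKILL.md above. The error patterns are illustrative assumptions.
set -eu

analyze_logs() {
  log_file="$1"
  echo "== Matched error lines =="
  # grep exits non-zero on no match; guard it so set -e does not abort.
  matches="$(grep -inE 'error|fail(ed|ure)?|exception|fatal' "$log_file" || true)"
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches" | head -n 40
  else
    echo "(no obvious error lines)"
  fi
  echo "== Last 20 lines =="
  tail -n 20 "$log_file"
}

# Demo on a synthetic log so the script is runnable as-is.
demo_log="$(mktemp)"
printf 'step 1 ok\nERROR: tests failed in auth.spec.ts\nstep 3 ok\n' > "$demo_log"
analyze_logs "$demo_log"
rm -f "$demo_log"
```

&lt;p&gt;The point is not this particular script: it is that the skill bundles the agent's instructions together with a deterministic tool, so the model spends its reasoning on the diagnosis rather than on re-deriving how to parse a log.&lt;/p&gt;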




&lt;h2&gt;
  
  
  How the Layers Stack Together
&lt;/h2&gt;

&lt;p&gt;Here is how the full system looks when all five files are in play:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                    Your Repository                   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │  copilot-instructions.md                     │   │
│  │  Always-on: coding standards, arch rules     │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │  .github/instructions/*.instructions.md      │   │
│  │  Path-scoped: frontend rules, infra rules    │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │  AGENTS.md                                   │   │
│  │  Operating manual: build, test, PR rules     │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │  .github/agents/*.md                         │   │
│  │  Specialist roles: security, planner, etc.   │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │  .github/skills/*/SKILL.md                   │   │
│  │  Reusable workflows: debug-ci, test-ui, etc. │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
         ┌────────────────────────────┐
         │   AI Coding Agent          │
         │   (Copilot / Claude Code / │
         │    Cursor / Codex)         │
         └────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer handles a different surface area. Together, they close the gap between what the model can do and what your team needs it to do consistently.&lt;/p&gt;
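&lt;p&gt;Because every layer is just a file in the repository, standing the whole system up takes one commit. A minimal bootstrap, assuming the conventional locations (the specific instruction, agent, and skill names here are placeholders):&lt;/p&gt;

```shell
#!/usr/bin/env sh
# Scaffold the five context layers. The directory layout follows the
# conventions above; the individual file names are placeholder examples.
set -eu

mkdir -p .github/instructions .github/agents .github/skills/debug-ci/scripts

# Layer 1: always-on standards.
touch .github/copilot-instructions.md

# Layer 2: path-scoped rules (file name is a hypothetical example).
touch .github/instructions/frontend.instructions.md

# Layer 3: the repo's operating manual.
touch AGENTS.md

# Layer 4: a specialist persona.
touch .github/agents/security-reviewer.md

# Layer 5: a reusable skill plus its helper script.
touch .github/skills/debug-ci/SKILL.md
touch .github/skills/debug-ci/scripts/analyze-logs.sh

echo "Context layers scaffolded. Fill each file in, then commit."
```

&lt;p&gt;Empty files do nothing on their own, of course; the value comes from filling each one in and reviewing them in pull requests like any other code.&lt;/p&gt;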




&lt;h2&gt;
  
  
  The Shift Most Teams Miss
&lt;/h2&gt;

&lt;p&gt;Most teams reach for more model power when they hit inconsistency problems. A better model will not fix a context problem.&lt;/p&gt;

&lt;p&gt;The real insight is this: your AI coding tools are only as consistent as the context they receive. When that context is scattered across Slack threads, tribal knowledge, and individual senior engineers, the AI inherits that chaos. When it lives in structured, version-controlled files, the AI inherits your engineering judgment.&lt;/p&gt;

&lt;p&gt;These five files are not markdown clutter. They are the beginning of a standardized interface between your engineering system and the AI agents working inside it.&lt;/p&gt;

&lt;p&gt;The best teams will not win because they have access to the smartest model. They will win because they know how to encode their engineering judgment into the system around the model.&lt;/p&gt;

&lt;p&gt;And increasingly, that system looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;copilot-instructions.md&lt;/code&gt; for the default rules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; for the repo's operating manual&lt;/li&gt;
&lt;li&gt;Path-specific files for context-aware standards&lt;/li&gt;
&lt;li&gt;Custom agents for specialist roles&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; for reusable workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future of software engineering will not just be written in code. More of it will be written in context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI coding tools are only as consistent as the context they receive.&lt;/strong&gt; Without structured repo files, every developer's output diverges based on personal prompting style.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 5-file system creates layered, version-controlled context&lt;/strong&gt; — always-on standards, path-scoped rules, operating procedures, specialist personas, and reusable workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; is cross-tool portable.&lt;/strong&gt; GitHub Copilot, Claude Code, and Gemini all support their own flavor; the concept is converging into an industry standard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills package repeatable workflows, not just instructions.&lt;/strong&gt; If a task happens more than twice, it should be a skill.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most teams need more structure before they need more model power.&lt;/strong&gt; Better context produces more consistent output than a smarter model with no guardrails.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Are You Doing About This?
&lt;/h2&gt;

&lt;p&gt;Most teams I talk to are one or two steps into this system — they have a rough &lt;code&gt;copilot-instructions.md&lt;/code&gt; or a stale &lt;code&gt;AGENTS.md&lt;/code&gt; that nobody updates. Very few have all five layers running together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which of these files does your team already have in place? And which one would make the biggest difference if you added it tomorrow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drop a comment — I'm curious where teams are actually getting value and where they're still fighting entropy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Credit: The technical insights in this post draw from &lt;a href="https://newsletter.karuparti.com" rel="noopener noreferrer"&gt;Diary of an AI Architect&lt;/a&gt; by Anurag Karuparti — one of the clearest voices on production agentic AI architecture.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
