<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Signadot</title>
    <description>The latest articles on Forem by Signadot (@signadot).</description>
    <link>https://forem.com/signadot</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F467077%2Fb24943f8-8c14-4c75-841c-7017b22507ee.png</url>
      <title>Forem: Signadot</title>
      <link>https://forem.com/signadot</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/signadot"/>
    <language>en</language>
    <item>
      <title>Agents Write Code. They Don't Do Software Engineering.</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:47:24 +0000</pubDate>
      <link>https://forem.com/signadot/agents-write-code-they-dont-do-software-engineering-43n0</link>
      <guid>https://forem.com/signadot/agents-write-code-they-dont-do-software-engineering-43n0</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this article on &lt;a href="https://www.signadot.com/blog/agents-write-code-they-dont-do-software-engineering/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Agents+Write+Code.+They+Don%27t+Do+Software+Engineering." rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Long-running and background coding agents have hit a new threshold. When an agent runs for hours, manages its own iteration loop, and submits a pull request without hand-holding, it stops being a tool you invoke and starts being more like a worker you assign tasks to. As with any worker, the question isn’t how closely you supervise them. It’s what work you assign them in the first place.&lt;/p&gt;

&lt;p&gt;We are all figuring this out in real time, and I see many teams making an understandable but critical error. They tune the autonomy dial, adding more review checkpoints or removing them, when the actual variable that matters is which categories of work agents should own versus which categories developers should own. That distinction isn’t about risk tolerance. It’s about capability boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code writing and software engineering are not the same job
&lt;/h3&gt;

&lt;p&gt;Writing code is pattern recognition. Take what’s been done before, apply it to a new context, and scaffold it out. Large language models are exceptional at this because that’s exactly what they do: recognize and reproduce patterns from massive corpora of prior work.&lt;/p&gt;

&lt;p&gt;Software engineering is something else. It’s trade-offs. Constraints. Decisions that require context no model has access to: your business domain, your product strategy, your customers, your technical debt, the conversation your team had last week about why you chose one approach over another.&lt;/p&gt;

&lt;p&gt;Most teams split work by importance or risk tolerance. The real divide is between work that can be reasoned from prior patterns and work that requires context, strategy, and judgment that lives outside the codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  What developers actually own
&lt;/h3&gt;

&lt;p&gt;The work that actually requires developers is more specific than “anything important,” but it doesn’t reduce to a tidy list of task types. It cuts across every part of the engineering process.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Agents can read code. They cannot read the room.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Developers own the work where the right answer depends on context that doesn’t exist in the codebase. Product strategy, business constraints, team dynamics, conversations in Slack threads, and architecture reviews are part of the history of why a system is built the way it is. Agents can read code. They cannot read the room.&lt;/p&gt;

&lt;p&gt;Developers own the work where the risk profile is ambiguous, or the failure modes are hard to predict. Some changes cascade in ways that depend on organizational boundaries, deployment timing, or data contracts baked into a system over years of iteration. Evaluating correctness in those cases requires judgment that no model can derive from code alone. The higher the uncertainty, the more you need someone who understands not just what the code does but why it was written that way.&lt;/p&gt;

&lt;p&gt;Developers own the work where the output is a decision, not an artifact: what to build, what to cut, and which technical bets to place six months from now. Agents can generate options. They cannot tell you which option is right for your situation because “right” depends on factors that reside in human heads and organizational contexts, not in training data.&lt;/p&gt;

&lt;p&gt;And all of this is still evolving. As teams invest in making context more explicit through better documentation, clearer contracts, and more structured decision records, the boundaries shift. Work that once required a developer’s institutional knowledge becomes accessible to an agent. But the frontier of unstructured, high-judgment work keeps moving, too, and that’s where developer time is most valuable.&lt;/p&gt;

&lt;p&gt;In distributed systems, this problem gets worse. The more services you spread across multiple teams and codebases, the more that critical context lives outside any single codebase. A change to one service’s event schema can break downstream consumers in ways that no test in that service’s own suite will catch.&lt;/p&gt;

&lt;p&gt;The agent doesn’t know what it doesn’t know. The developer on that team does, not because they wrote the code but because they were in the meeting where the schema was agreed upon. For cloud-native teams, this scales badly: the more services, the more implicit contracts, and the more context that only people carry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where agents deliver the most value
&lt;/h3&gt;

&lt;p&gt;There is a mountain of work in every codebase that is a waste of human brainpower. Boilerplate, scaffolding, repetitive refactors, unit test generation, configuration templating, and data formatting. This work is rote, mechanical, and can be reasoned entirely from prior patterns. Agents should own it.&lt;/p&gt;

&lt;p&gt;Once a developer specifies the interface, contract, and expected behavior, an agent can implement faster and more consistently than a developer could. The implementation is the repeatable part. The reasoning that precedes it is not.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The developers who thrive in this model won’t be the ones who write the most code. They’ll be the ones who make the best decisions.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Iteration speed matters here, too. Generating multiple implementations, running test suites, checking contract conformance: agents do this at a pace no developer can match. CircleCI’s State of Software Delivery Report found that throughput bottlenecks most commonly appear in the feedback and validation loop, not the code-writing phase. Agents compress that loop significantly when the acceptance criteria are clear and they have access to the runtime environment and tools they need to validate their work.&lt;/p&gt;

&lt;p&gt;The developers who thrive in this model won’t be the ones who write the most code. They’ll be the ones who make the best decisions about what to build and how to architect it, then hand off the execution to agents that can move faster than any human.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1u7u1s73ydxd4evr9wo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1u7u1s73ydxd4evr9wo.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A three-tier model for dividing the work
&lt;/h3&gt;

&lt;p&gt;On our team, we have found it useful to sort engineering work into three rough tiers that determine how it is divided between agents and developers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tier 1: Agent-led, developer-reviewed
&lt;/h4&gt;

&lt;p&gt;Tasks where agent execution is high-confidence and the output is self-verifiable. Boilerplate generation, configuration templating, adding endpoints within established patterns, running and reporting on test suites, and scaffolding new services or modules from existing conventions. The developer reviews the output, but the agent owns the work.&lt;/p&gt;

&lt;p&gt;Routing this to developers wastes your most expensive resource. This category should expand as teams get better at making their patterns explicit and testable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tier 2: Agent-assisted, developer-guided
&lt;/h4&gt;

&lt;p&gt;Tasks that require context beyond the codebase to validate. The agent implements, but the developer defines the scope, constraints, and success criteria. Feature work within a well-understood domain, refactoring within established boundaries, and test implementation for developer-defined strategies fall here.&lt;/p&gt;

&lt;p&gt;The developer provides the engineering judgment. The agent provides the implementation throughput. Most feature work, across any architecture, falls into this tier.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tier 3: Developer-led, agent-supported
&lt;/h4&gt;

&lt;p&gt;Tasks where the core work is judgment, not implementation. Architectural decisions, cross-boundary contract changes, debugging emergent failures, and defining what to build next. Agents can assist with subtasks: drafting proposals, analyzing logs, and generating candidate implementations for evaluation. But a developer must drive because the work itself is reasoning, not pattern execution.&lt;/p&gt;

&lt;p&gt;The distinction from Tier 2 is that the developer isn’t just validating output. They’re doing the intellectual work that no amount of training data can substitute for.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cost of getting the split wrong
&lt;/h3&gt;

&lt;p&gt;Most teams I speak to are either under-allocating or over-allocating work to agents. Both are expensive mistakes.&lt;/p&gt;

&lt;p&gt;Over-allocation is the more visible failure. Push agents into Tier 3 work, and they produce output that requires significant rework because the necessary context wasn’t available to them. The rework cost is real, but the opportunity cost is worse: developers who should be doing Tier 3 work are instead reviewing and correcting agent output that shouldn’t have been delegated in the first place.&lt;/p&gt;

&lt;p&gt;Under-allocation is quieter but equally damaging. Teams that default to developer-owned work because agent output seems uncertain are paying developer rates for Tier 1 tasks. Developer time is the highest-cost resource on the team. Burning it on pattern-execution work that agents could handle is a slow drag on velocity that compounds over months.&lt;/p&gt;

&lt;p&gt;This is why many teams adopting agentic workflows see limited gains or even slight decreases in merged code throughput. They haven’t solved the allocation problem. They’ve added a new tool without changing how work gets distributed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Audit the work, not just the agents
&lt;/h3&gt;

&lt;p&gt;The question isn’t whether agents replace developers. It’s what the right engineering model looks like when agents handle the mechanical work, and humans focus on the strategic work.&lt;/p&gt;

&lt;p&gt;The teams navigating this well don’t just audit their agents. They audit their work. They ask which tasks could be agent-led if the boundaries were made explicit, then invest in making those boundaries explicit. That investment returns developer time to the high-judgment, context-dependent work that agents won’t own well anytime soon.&lt;/p&gt;

&lt;p&gt;The answer won’t come from the AI labs. It’ll come from engineering teams actually building software this way every day, figuring out where the line is through practice, and learning what their specific codebase, team, and domain require on each side of the divide.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Agent PR Flood Is Here. If You Run Istio, You're Halfway to Solving It.</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:08:26 +0000</pubDate>
      <link>https://forem.com/signadot/the-agent-pr-flood-is-here-if-you-run-istio-youre-halfway-to-solving-it-hbg</link>
      <guid>https://forem.com/signadot/the-agent-pr-flood-is-here-if-you-run-istio-youre-halfway-to-solving-it-hbg</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this article on &lt;a href="https://www.signadot.com/blog/the-agent-pr-flood-is-here-if-you-run-istio-youre-halfway-to-solving-it/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=The+Agent+PR+Flood+Is+Here.+If+You+Run+Istio%2C+You%27re+Halfway+to+Solving%C2%A0It." rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agentic workflows are rapidly accelerating the volume of pull requests, and validation is quickly becoming the most critical bottleneck. Teams using service meshes like Istio are well-positioned to solve it in ephemeral environments.&lt;/p&gt;

&lt;p&gt;Engineering teams across the industry are waking up to a harsh new reality. The widespread adoption of agentic workflows has made code generation cheap and fast, but it has created a new infrastructure problem.&lt;/p&gt;

&lt;p&gt;In simpler application architectures, running unit tests and mocks in a continuous integration pipeline might be enough to validate agent-generated code. But in cloud-native, distributed systems, validating behavior in a live environment is critical.&lt;/p&gt;

&lt;p&gt;In just the past few months, I’ve seen the conversation with customers and colleagues shift from “agents are great for writing code, but we’re not seeing it impact pipeline” to “we’re drowning in PRs.” Validation has become the new bottleneck for distributed systems.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If you cannot validate that code as quickly as agents write it, your pipeline will collapse down to the same human-level throughput it was built to handle.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This flood isn’t impacting organizations equally. Companies like Stripe, Ramp, and other early adopters of advanced AI workflows are seeing exponential gains in code merged to main. They recognized early that generating code is only half the battle. If you cannot validate that code as quickly as agents write it, your pipeline will collapse down to the same human-level throughput it was built to handle.&lt;/p&gt;

&lt;p&gt;For teams that want to replicate the success of these organizations, the answer might already be running in their clusters. If your platform is currently running a service mesh like Istio, you are already halfway to eliminating the validation bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI velocity illusion and the integration bottleneck
&lt;/h3&gt;

&lt;p&gt;The recent &lt;a href="https://circleci.com/resources/2026-state-of-software-delivery/" rel="noopener noreferrer"&gt;CircleCI 2026 State of Software Delivery&lt;/a&gt; report confirms what on-call rotations are already feeling: The pipeline is choking on its own success. While average workflow throughput increased 59 percent year over year, those gains are heavily concentrated at the top. Elite teams are operating at an unprecedented scale. The top 5 percent of teams saw their throughput nearly double, up 97 percent.&lt;/p&gt;

&lt;p&gt;For the vast majority of organizations, the pipeline is clogging. The median team saw a 15.2 percent increase in throughput on feature branches where AI supports rapid prototyping, but their throughput on the main branch actually declined by 6.8 percent. Developers and their autonomous agents are generating significantly more code, but teams are struggling to review, validate, and promote it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The pipeline is choking on its own success. Developers and their autonomous agents are generating significantly more code, but teams are struggling to review, validate, and promote it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Traditional shared staging environments were never designed to handle this level of concurrency. They were sized for human output. For a team of 50 engineers each opening 2-3 pull requests a day, the infrastructure was built to handle 100-150 PRs a day. This quickly becomes a critical choke point when hit with a massive volume spike. The queue grows faster than it drains.&lt;/p&gt;

&lt;p&gt;Organizations that fail to upgrade their validation infrastructure are finding that the velocity promised by their AI investments is dissolving in the staging queue. The teams that are winning recognize that scalable validation infrastructure is the only way to unlock the true return on investment of agentic workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  The true bottleneck of agentic workflows
&lt;/h3&gt;

&lt;p&gt;To understand why this bottleneck is so destructive, you must examine what happens when machine output speed collides with infrastructure built for human throughput. Agents exponentially increase the volume of pull requests, and traditional staging queues and review processes simply cannot support that volume without creating impossibly long backlogs.&lt;/p&gt;

&lt;p&gt;Because the pipeline cannot handle the load, developers are forced to throttle their agents. They do not submit the full volume of agent-generated code. Instead, they have agents rely on unit tests and mocks to avoid the staging queue until the later stages of development. This imperfect pattern worked for human developers who had a mental model of the full system architecture and could intuit which changes would break downstream dependencies. Agents don’t work that way. They frequently generate novel code that passes localized unit tests but fails when introduced to the broader system architecture. For agents, a fast feedback loop with a realistic runtime to validate their code is not a nice-to-have. It’s a requirement.&lt;/p&gt;

&lt;p&gt;This means the potential throughput of agents is artificially capped by linear infrastructure designed for human velocity. It also means the code that does get through is much more likely to break. The CircleCI report highlights the cost of these integration failures. Success rates on the main branch for most teams fell to 70.8 percent.&lt;/p&gt;

&lt;h3&gt;
  
  
  The unsustainable math of environment duplication
&lt;/h3&gt;

&lt;p&gt;To convert the increased output of agentic workflows into actual throughput and eliminate this bottleneck, the validation infrastructure needs to give each agent or pull request an isolated, realistic runtime environment. Traditionally, platform teams would spin up a fresh Kubernetes namespace or an isolated cluster for every single pull request. While this provides the necessary fidelity, the math completely breaks down at the agentic scale. Duplicating every database, message queue, and microservice takes 15 minutes or more. When you multiply that overhead by 1,000 pull requests a day, infrastructure costs explode, and the 15-minute deployment lag severely caps an agent’s iteration cycles.&lt;/p&gt;
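&lt;p&gt;The arithmetic behind that breakdown can be sketched in a few lines. The figures are the ones used in this article (1,000 pull requests a day, 15 minutes per full environment, staging sized for roughly 150 PRs a day). The variable names are illustrative, and the sketch assumes every PR needs exactly one fresh environment.&lt;/p&gt;

```python
# Back-of-the-envelope numbers taken from the article; names are illustrative.
PRS_PER_DAY = 1000              # agent-scale pull request volume
SPIN_UP_MINUTES = 15            # time to duplicate a full environment
STAGING_CAPACITY_PER_DAY = 150  # upper end of the 100-150 PRs a human-sized setup handles

# Total time spent creating environments, assuming one environment per PR.
spin_up_hours_per_day = PRS_PER_DAY * SPIN_UP_MINUTES / 60

# An agent that needs a fresh environment per attempt cannot iterate faster than this.
max_iterations_per_hour = 60 // SPIN_UP_MINUTES

# PRs arriving each day beyond what the staging setup can drain.
daily_backlog_growth = PRS_PER_DAY - STAGING_CAPACITY_PER_DAY

print(spin_up_hours_per_day, max_iterations_per_hour, daily_backlog_growth)
```

&lt;p&gt;Under these assumptions, environment creation alone costs 250 environment-hours a day, each agent gets at most four attempts an hour, and the backlog grows by 850 PRs daily.&lt;/p&gt;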

&lt;p&gt;Another common approach to bypass full cluster duplication is shifting the burden to heavy virtual machines running localized container setups. I spoke recently with an engineering leader whose team handles integration testing by dynamically generating Docker Compose files for isolated cloud instances. Because tests rely on shared state, touching just a few core files in continuous integration triggers a fleet of 100 heavy cloud instances that spend an hour grinding through sequential testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbz64d11aioh7ecy1qldz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbz64d11aioh7ecy1qldz.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether you are spinning up 1,000 full Kubernetes namespaces or orchestrating fleets of heavy virtual machines to run localized containers, the result is the same. The deployment lag compounds quickly, and the velocity of your AI workflows stalls when it meets the bottleneck of linear infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ephemeral environments for agentic scale
&lt;/h3&gt;

&lt;p&gt;These compounding factors mean that the only viable solution is a new model of scalable ephemeral environments. To handle machine-speed concurrency, environments must spin up in seconds and provide a realistic runtime without the cost of duplicating the entire cluster. Instead of copying everything, a scalable, ephemeral environment model deploys only the microservices that have changed, as a lightweight sandbox. The rest of the architecture, including all heavy databases and stable downstream services, is shared from a baseline environment. The sandbox dynamically routes test traffic between the changed services and the baseline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cs8i5scvxm2r0seuulw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cs8i5scvxm2r0seuulw.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach delivers the exact same high-fidelity runtime as a full duplicate environment. The code is tested against real, live dependencies. The critical difference is the resource footprint. By only deploying the services under test, the environment spins up in seconds rather than minutes. It consumes a fraction of the compute resources.&lt;/p&gt;

&lt;p&gt;In this model, agents can validate their code, get instant feedback, and iterate with massive concurrency and zero contention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Routing your way out of the staging queue
&lt;/h3&gt;

&lt;p&gt;Implementing this shared baseline architecture requires advanced traffic control. Building the automation and lifecycle management for these environments from scratch is a massive engineering undertaking. However, teams running a service mesh such as Istio have a significant advantage.&lt;/p&gt;

&lt;p&gt;Because these tools already provide the exact routing capabilities needed, implementing scalable ephemeral environments like those described above becomes seamless. The underlying service mesh or ingress controller simply handles the dynamic routing of test traffic to a lightweight sandbox while ensuring all regular traffic flows uninterrupted to the stable baseline.&lt;/p&gt;

&lt;p&gt;Here is what the underlying routing logic looks like when configured in Istio:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualService&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;location&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hotrod-istio&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;location&lt;/span&gt;
  &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;baggage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;^.*\b(sd-routing-key|sd-sandbox)\s*=[^,]*\bqwblp48fpmb30\b.*$&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;tracestate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;^.*\b(sd-routing-key|sd-sandbox)\s*=[^,]*\bqwblp48fpmb30\b.*$&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-location-location-9f4477c8.hotrod-istio.svc.cluster.local&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8081&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;location&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



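&lt;p&gt;When a request carries the sandbox routing key in its baggage or tracestate header, Istio intercepts it and routes it directly to the sandbox version of the service deployed for that specific pull request. Every other request falls through to the default route and reaches the baseline location service.&lt;/p&gt;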
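&lt;p&gt;To see which header values that match rule accepts, the pattern can be exercised outside the mesh. This is a minimal sketch: Istio evaluates the regex itself, and Python’s re module is used here on the assumption that it behaves the same way for these simple inputs. The function name and the sample requests are hypothetical.&lt;/p&gt;

```python
import re

# Pattern and sandbox key copied from the VirtualService shown in this article.
ROUTING_PATTERN = re.compile(
    r"^.*\b(sd-routing-key|sd-sandbox)\s*=[^,]*\bqwblp48fpmb30\b.*$"
)

def routes_to_sandbox(headers):
    """Return True if the baggage or tracestate header carries the sandbox key."""
    for name in ("baggage", "tracestate"):
        value = headers.get(name, "")
        if ROUTING_PATTERN.match(value):
            return True
    return False

# Hypothetical requests for illustration only.
sandbox_request = {"baggage": "user=alice,sd-routing-key=qwblp48fpmb30"}
traced_request = {"tracestate": "sd-sandbox=qwblp48fpmb30"}
other_sandbox = {"baggage": "sd-routing-key=zzzzzzzzzzzzz"}
plain_request = {}

for request in (sandbox_request, traced_request, other_sandbox, plain_request):
    print(routes_to_sandbox(request))
```

&lt;p&gt;Only the first two requests match. A request tagged for a different sandbox, or one with no routing key at all, takes the default route to the baseline.&lt;/p&gt;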

&lt;p&gt;The critical enabling mechanism behind this is context propagation. In a deep microservice call chain, the sandbox routing header must travel automatically between every service. OpenTelemetry (OTel) baggage propagation handles this seamlessly. The routing value rides along the trace context, crossing boundaries without any individual service needing to explicitly forward it.&lt;/p&gt;
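&lt;p&gt;The effect of propagation can be illustrated without any library. The sketch below is not the OpenTelemetry API. It is a hand-rolled stand-in that shows what instrumentation does on a service’s behalf: copy the context headers from the inbound request onto each outbound call. The function names are hypothetical, and header names are assumed to be lowercased already.&lt;/p&gt;

```python
# Headers that carry trace context and baggage between services.
PROPAGATED_HEADERS = ("traceparent", "tracestate", "baggage")

def parse_baggage(header_value):
    """Parse a baggage header into a dict, ignoring optional member properties."""
    entries = {}
    for member in header_value.split(","):
        key_value = member.split(";")[0]  # drop any ;property suffix
        if "=" in key_value:
            key, value = key_value.split("=", 1)
            entries[key.strip()] = value.strip()
    return entries

def outbound_headers(inbound_headers, extra=None):
    """Build headers for a downstream call, carrying the context forward."""
    headers = dict(extra or {})
    for name in PROPAGATED_HEADERS:
        if name in inbound_headers:
            headers[name] = inbound_headers[name]
    return headers

# Hypothetical inbound request that was tagged for a sandbox.
inbound = {
    "baggage": "sd-routing-key=qwblp48fpmb30,tenant=demo",
    "accept": "application/json",
}
downstream = outbound_headers(inbound, extra={"content-type": "application/json"})
routing_key = parse_baggage(downstream["baggage"]).get("sd-routing-key")
print(routing_key)
```

&lt;p&gt;Because the routing key survives every hop, the mesh can apply the same match rule at each service in the chain.&lt;/p&gt;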

&lt;p&gt;By leveraging these foundational primitives, platform teams can easily adopt scalable ephemeral environment solutions to orchestrate the deployment of sandbox services and automatically configure the routing rules within their existing mesh. This gives agents the ability to validate their own work against live cluster dependencies with instant feedback, eliminating the integration bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next
&lt;/h3&gt;

&lt;p&gt;Agentic workflows are the new standard for software development, and they are already revealing the cracks in the traditional model of code validation and review. The gap between teams that have made scalable validation infrastructure a top priority and those that haven’t is evident, and it will only get bigger.&lt;/p&gt;

&lt;p&gt;Teams that are already running service meshes like Istio are significantly ahead of the curve here. They already have the traffic-routing primitives in place that make implementing scalable, ephemeral environment solutions like &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt; seamless. This puts them in a position to move quickly on tackling the agentic PR validation issue before it becomes a full-blown crisis.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aiagents</category>
      <category>automation</category>
    </item>
    <item>
      <title>Coding Agents Are Only as Good as the Signals You Feed Them</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Thu, 09 Apr 2026 21:02:09 +0000</pubDate>
      <link>https://forem.com/signadot/coding-agents-are-only-as-good-as-the-signals-you-feed-them-5kg</link>
      <guid>https://forem.com/signadot/coding-agents-are-only-as-good-as-the-signals-you-feed-them-5kg</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this article on &lt;a href="https://www.signadot.com/blog/coding-agents-are-only-as-good-as-the-signals-you-feed-them/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Coding+Agents+Are+Only+as+Good+as+the+Signals+You+Feed%C2%A0Them" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The industry has spent the last few years optimizing AI agents’ code-generation capabilities. The focus has been on expanding context windows, fine-tuning models on repository-specific data, and developing complex prompting strategies. This has undoubtedly produced more capable coding agents. However, for most teams, that code-generation capability has not translated into significant gains in productivity.&lt;/p&gt;

&lt;p&gt;Most engineering teams are stuck in a manual workflow. The agent generates the code, tests it locally, and submits a PR to the developer for review. Deploying the code, validating that it works, and feeding back any integration issues to the agent all happen at human pace. By turning developers into a validation bottleneck, this workflow puts a hard ceiling on the productivity gains that agents can deliver.&lt;/p&gt;

&lt;p&gt;But some companies are enabling real autonomy for their agents and seeing the productivity gains that AI promises. Organizations like Stripe, Ramp, and the internal teams at OpenAI and Anthropic have come to the same realization: the quality of an agent’s output is directly proportional to the quality of the feedback loop it receives.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The quality of an agent’s output is directly proportional to the quality of the feedback loop it receives.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To elevate engineers to architects and see the speed of agentic code generation translate into productivity, platform engineering teams need to reconsider their strategy. Instead of focusing on giving developers better coding agents, the more impactful lever may be giving agents better feedback infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The lesson from harness engineering
&lt;/h3&gt;

&lt;p&gt;OpenAI recently documented how they &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;built a complete software product using Codex&lt;/a&gt; with a guiding principle of “Humans steer. Agents execute.” Their success was not driven purely by model intelligence or prompt engineering. It was driven by a heavy investment in the environment within which the agent operated.&lt;/p&gt;

&lt;p&gt;A team of just three engineers generated a working product with internal users and millions of lines of code by designing environments, specifying intent, and building rigorous feedback loops. The primary job of the engineers shifted from writing implementation code to building the scaffolding that allowed agents to verify their own work. This approach is known as harness engineering.&lt;/p&gt;

&lt;p&gt;Harness engineering involves equipping agents with the tools and constraints required to act effectively. OpenAI engineers would write a docstring and a set of assertions. The agent would then generate the implementation. If the assertions failed, the environment would automatically capture the traceback, feed it back to the model, and request a retry. This loop allows for dozens of iterations without human intervention.&lt;/p&gt;
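
&lt;p&gt;The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI’s harness: run_harness and fake_model are hypothetical names, and the generator is a toy stand-in for a model call.&lt;/p&gt;

```python
import traceback
from typing import Callable, Optional

# Hypothetical stand-in for a model call: takes the spec plus the last
# traceback (if any) and returns Python source for the implementation.
Generator = Callable[[str, Optional[str]], str]


def run_harness(spec: str, assertions: str, generate: Generator, max_attempts: int = 5) -> Optional[str]:
    """Return the first candidate source that passes the assertions, or None."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = generate(spec, feedback)
        namespace: dict = {}
        try:
            exec(source, namespace)      # define the candidate implementation
            exec(assertions, namespace)  # run the engineer-written assertions
            return source
        except Exception:
            # Capture the traceback and hand it to the next attempt.
            feedback = traceback.format_exc()
    return None


def fake_model(spec: str, feedback: Optional[str]) -> str:
    """Toy generator: wrong on the first try, corrected once it sees a traceback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = run_harness(
    spec="add(a, b) returns the sum of a and b",
    assertions="assert add(2, 3) == 5",
    generate=fake_model,
)
```

&lt;p&gt;A real harness would run candidate code in a sandbox rather than calling exec in-process. The sketch only shows the control flow: generate, verify, capture the failure, retry.&lt;/p&gt;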

&lt;p&gt;The key lesson here is that for agents to behave like engineers, they need the same tools, environments, and constraints as engineers at the infrastructure level. By giving the agent a way to validate its own work, they transformed the model from a one-shot code generator into an engineer capable of iteration. The harness provided the signals the agent needed to debug its own code, verify its logic, and deliver fully functioning software.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stripe’s Minions and the feedback loop
&lt;/h3&gt;

&lt;p&gt;We can see a similar pattern at Stripe. The company recently published a blog post detailing its internal agent framework, &lt;a href="https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents" rel="noopener noreferrer"&gt;Minions&lt;/a&gt;. Reportedly, Minions produce over a thousand merged pull requests every week. Stripe did not achieve this volume simply by pointing a large language model at its monorepo.&lt;/p&gt;

&lt;p&gt;The company built an MCP server called Toolshed, which exposes over 400 tools to its agents. Crucially, it gave agents full access to the development environment and built deterministic verification steps into the agent’s loop.&lt;/p&gt;

&lt;p&gt;When a Stripe Minion writes code, the harness forces it through a gauntlet of verification steps. It begins with git operations, then moves to linting and formatting. If the agent generates code that violates the style guide, the linter rejects it immediately and returns the specific line number. The agent consumes this error message and corrects the syntax.&lt;/p&gt;

&lt;p&gt;The Minion then moves on to type checking and testing. If a test fails, the error output is fed back into the context window for a fix. This functions as a closed-loop control system, with the development environment itself providing the error signal. This design allows the organization to trust agent output because the system prevents incorrect code from leaving the agent’s local context.&lt;/p&gt;
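
&lt;p&gt;One way to picture such a gauntlet is an ordered list of checks where the first failure becomes the agent’s next input. The sketch below is an assumption-laden illustration, not Stripe’s implementation: CheckResult, lint, compiles, and first_failure are hypothetical names, and the lint rule is a toy stand-in for a real linter.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str = ""


# Each check inspects the candidate source and reports pass/fail plus output.
Check = Callable[[str], CheckResult]


def lint(source: str) -> CheckResult:
    # Toy rule standing in for a real linter: no tab characters.
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            return CheckResult("lint", False, f"line {number}: tab character found")
    return CheckResult("lint", True)


def compiles(source: str) -> CheckResult:
    try:
        compile(source, "candidate.py", "exec")
        return CheckResult("compile", True)
    except SyntaxError as error:
        return CheckResult("compile", False, f"line {error.lineno}: {error.msg}")


def first_failure(source: str, checks: list[Check]) -> Optional[CheckResult]:
    """Run checks in order and stop at the first failure, which becomes agent feedback."""
    for check in checks:
        result = check(source)
        if not result.passed:
            return result
    return None


failure = first_failure("def f():\n\treturn 1\n", [lint, compiles])
clean = first_failure("def f():\n    return 1\n", [lint, compiles])
```

&lt;p&gt;Ordering cheap checks first keeps the loop fast, and returning a line number gives the agent a specific location to fix.&lt;/p&gt;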

&lt;h3&gt;
  
  
  The verification gap
&lt;/h3&gt;

&lt;p&gt;Most engineering teams today do not operate at this level of sophistication. They often provide their agents with little more than a code editor and a terminal window.&lt;/p&gt;

&lt;p&gt;This creates a verification gap. It is like hiring a senior engineer, denying them access to staging environments, monitoring dashboards, test infrastructure, and code review, and still expecting them to contribute effectively.&lt;/p&gt;

&lt;p&gt;The feedback signals available to an agent define the ceiling of what it can accomplish autonomously. If an agent can only see the text in the editor, it is limited to fixing syntax errors. If it can see the compiler output, it can detect type errors. But to solve complex integration problems, it needs access to the same rich diversity of signals that human engineers rely on.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The feedback signals available to an agent define the ceiling of what it can accomplish autonomously… Without these signals, agents are prone to silent failures.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without these signals, agents are prone to silent failures. An agent might generate a SQL query that is syntactically correct and returns the correct data, but performs a full table scan, degrading production performance. Without access to an explain plan or execution metrics, the agent has no way of knowing that the code it wrote fails in production.&lt;/p&gt;
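
&lt;p&gt;To make the explain-plan signal concrete, here is a small sketch that uses SQLite from the Python standard library as a stand-in for a production database. The orders table, the index name, and the helper functions are assumptions made for illustration.&lt;/p&gt;

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the textual plan detail.
    return [row[3] for row in rows]


def has_full_scan(plan: list[str]) -> bool:
    """True if any step scans a whole table instead of searching an index."""
    return any(step.startswith("SCAN") for step in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
sql = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index the planner has to scan the whole table.
scan_before = has_full_scan(query_plan(conn, sql))

# With an index on the filter column the planner can search instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
scan_after = has_full_scan(query_plan(conn, sql))
```

&lt;p&gt;The query returns the same rows in both cases. Only the plan reveals the difference, which is why an agent without access to it cannot see the problem.&lt;/p&gt;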

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk0u7ndxftlpv6unbql8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk0u7ndxftlpv6unbql8.png" alt=" " width="800" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A hierarchy of feedback signals
&lt;/h3&gt;

&lt;p&gt;To understand the impact of feedback signals on agents, it is worth mapping the hierarchy. Each level provides the agent with more context and raises the ceiling on its autonomy and, in turn, its productivity.&lt;/p&gt;

&lt;h4&gt;
  
  
  Syntax and type checking
&lt;/h4&gt;

&lt;p&gt;This is the baseline. Any competent agent loop effectively eliminates syntax and type errors by iterating against compiler or linter output. However, these represent the shallowest class of bugs. A program can be syntactically perfect and type-safe while completely failing in production.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unit tests
&lt;/h4&gt;

&lt;p&gt;Agents that can run local unit tests can verify logic in isolation. This catches a significant volume of logical defects but misses the complexity of distributed systems. A unit test can confirm that a function correctly calculates a tax rate, but it cannot confirm that the tax service is reachable or that the authentication token is valid.&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration and API tests
&lt;/h4&gt;

&lt;p&gt;This is where the verification gap widens. To verify that a new or updated service correctly calls an upstream dependency, the agent needs access to a running environment where those services interact. Agents frequently hallucinate API payloads or invent endpoints without this context.&lt;/p&gt;

&lt;h4&gt;
  
  
  Observability data
&lt;/h4&gt;

&lt;p&gt;Agents are rarely given access to traces and logs, yet these are critical tools for developers to debug complex failures. Giving an agent the ability to query logs or analyze a trace ID allows it to diagnose runtime behavior issues that static analysis will never catch.&lt;/p&gt;

&lt;h4&gt;
  
  
  Visual and end-to-end verification
&lt;/h4&gt;

&lt;p&gt;Finally, visual and end-to-end verification is required to validate any change that reaches the frontend. A backend agent might deploy a schema change that passes all service-level tests but breaks the user interface because a component expects a different data format. Agents equipped with isolated previews and tools to drive a browser can confirm that their changes function end-to-end and close the loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next
&lt;/h3&gt;

&lt;p&gt;We have already crossed the threshold of model intelligence, enabling powerful engineering capabilities for agents. The limiting factor is now the richness of the feedback signals available to agents.&lt;/p&gt;

&lt;p&gt;There is a strong case for treating agent feedback infrastructure as a first-class platform capability, much like CI/CD pipelines are treated now. This involves considering investments in standardized tool interfaces like MCP, structured outputs that make logs and errors easily consumable by machine, and &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;ephemeral environment solutions&lt;/a&gt; that allow agents to spin up the isolated spaces they need to test and iterate in parallel against real dependencies.&lt;/p&gt;

&lt;p&gt;Teams that build infrastructure to enable these feedback loops will see velocity compound as models improve. Those that do not will always have a ceiling on the productivity they can generate from coding agents.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>kubernetes</category>
      <category>testing</category>
      <category>automation</category>
    </item>
    <item>
      <title>Why the MCP Server Is Now a Critical Microservice</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Thu, 19 Feb 2026 15:55:36 +0000</pubDate>
      <link>https://forem.com/signadot/why-the-mcp-server-is-now-a-critical-microservice-5cla</link>
      <guid>https://forem.com/signadot/why-the-mcp-server-is-now-a-critical-microservice-5cla</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this article on &lt;a href="https://www.signadot.com/blog/why-the-mcp-server-is-now-a-critical-microservice?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Why+the+MCP+Server+Is+Now+a+Critical+Microservice" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In my previous article on &lt;a href="https://www.signadot.com/blog/your-ci-cd-pipeline-is-not-ready-to-ship-ai-agents" rel="noopener noreferrer"&gt;preparing CI/CD pipelines to ship production-ready agents&lt;/a&gt;, I argued that we cannot ship agents to production that are driven primarily by non-deterministic models. Instead, they must be built as robust workflows where the large language model (LLM) is introduced strategically at specific steps within a deterministic control flow.&lt;/p&gt;

&lt;p&gt;Now we must examine the most critical node in that framework.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol (MCP) server facilitates interactions between the probabilistic LLM node and the deterministic microservices workflow. It acts as the translation layer connecting the reasoning engine to external data and tools.&lt;/p&gt;

&lt;p&gt;The model is one half of the agent architecture. The MCP server is the other half. While model evaluations validate the reasoning engine, they cannot verify the system as a whole. Validation strategies relying on &lt;a href="https://www.signadot.com/blog/why-mocks-fail-real-environment-testing-for-microservices" rel="noopener noreferrer"&gt;mocks fail to test&lt;/a&gt; the agent as a workflow.&lt;/p&gt;

&lt;p&gt;Reliability of the end-to-end workflow is paramount when shipping agents to production. The MCP server is the critical node in this topology, acting as both sensory organ and effector arm. If it transmits ambiguous signals, the agent acts erratically. It hallucinates. It degrades user trust. It causes critical business errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architectural Shift From Contracts to Semantics
&lt;/h3&gt;

&lt;p&gt;To understand the failure risks, we must examine how the MCP server alters service contracts.&lt;/p&gt;

&lt;p&gt;Service-to-service communication is deterministic in standard microservices environments. Service A calls Service B using a strict REST or gRPC contract. The interaction is rigid. It is predictable. It is easily validated.&lt;/p&gt;

&lt;p&gt;An agentic workflow inverts this.&lt;/p&gt;

&lt;p&gt;The agent is a nondeterministic actor operating on probabilistic logic. It decides when to call a tool based on semantic context provided by the MCP server. The server exposes a world model rather than just an API endpoint.&lt;/p&gt;

&lt;p&gt;This makes the MCP server a distinct type of microservice. It is a translation layer converting probabilistic intent into deterministic action. This responsibility manifests in three operations requiring rigorous engineering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49hvyp0pw9j5esb0xxw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49hvyp0pw9j5esb0xxw9.png" alt=" " width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Defining Capability Boundaries
&lt;/h4&gt;

&lt;p&gt;The MCP server defines agent capabilities through JSON-RPC tool definitions.&lt;/p&gt;

&lt;p&gt;If the server exposes a schema with vague descriptions, the agent cannot formulate a valid execution plan. A human developer might read documentation to clarify an API field, but the agent relies solely on metadata exposed by the list_tools capability.&lt;/p&gt;

&lt;p&gt;Consider a payment operations agent handling refunds. A fragile MCP implementation might expose a tool named refund_user to process a refund.&lt;/p&gt;

&lt;p&gt;This lacks semantic density. The model does not know whether this applies to a full or partial refund or if it handles tax calculation. It is a black box.&lt;/p&gt;

&lt;p&gt;A robust implementation defines the boundary with precision. It exposes process_prorated_subscription_refund. The description explicitly states that it calculates the remaining balance for the current billing cycle and issues a credit.&lt;/p&gt;

&lt;p&gt;The reasoning chain breaks without this specificity.&lt;/p&gt;
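
&lt;p&gt;The contrast can be sketched as two tool definitions plus a simple check a team might run before shipping. The definitions follow the general shape of MCP tool metadata (a name, a description, and a JSON Schema for input); the parameter names and the lint_tool helper are hypothetical.&lt;/p&gt;

```python
# Two hypothetical tool definitions: a name, a description, and a JSON
# Schema describing the input the tool accepts.
vague_tool = {
    "name": "refund_user",
    "description": "Process a refund.",
    "inputSchema": {"type": "object", "properties": {"user_id": {"type": "string"}}},
}

precise_tool = {
    "name": "process_prorated_subscription_refund",
    "description": (
        "Calculates the remaining balance for the current billing cycle "
        "of a subscription and issues a credit for that amount."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "subscription_id": {
                "type": "string",
                "description": "Identifier of the subscription to refund.",
            }
        },
        "required": ["subscription_id"],
    },
}


def lint_tool(tool: dict, min_description_words: int = 8) -> list[str]:
    """Return warnings for definitions that give the model too little to plan with."""
    warnings = []
    if min_description_words > len(tool["description"].split()):
        warnings.append("description is too short to convey scope")
    schema = tool["inputSchema"]
    if not schema.get("required"):
        warnings.append("no required parameters declared")
    for name, prop in schema.get("properties", {}).items():
        if "description" not in prop:
            warnings.append(f"parameter '{name}' has no description")
    return warnings
```

&lt;p&gt;Running lint_tool on the vague definition yields three warnings, while the precise one yields none. The word-count threshold is arbitrary; the point is that tool metadata can be checked like any other contract.&lt;/p&gt;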

&lt;h4&gt;
  
  
  2. Governing the Context Economy
&lt;/h4&gt;

&lt;p&gt;The MCP server governs the context window. It must retrieve backend data and format it for LLM consumption.&lt;/p&gt;

&lt;p&gt;This data engineering challenge requires differentiating between signal and noise.&lt;/p&gt;

&lt;p&gt;Providing a raw 5 MB JSON dump dilutes agent attention. It wastes tokens and increases latency. Conversely, providing too little data causes the agent to hallucinate missing details.&lt;/p&gt;

&lt;p&gt;The server must act as a transformation layer that optimizes raw data into context-ready snippets.&lt;/p&gt;
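
&lt;p&gt;A minimal sketch of such a transformation layer follows, assuming a hypothetical invoice record and a to_context_snippet helper. It keeps only the fields the agent needs, marks absent fields explicitly, and caps the size of the result.&lt;/p&gt;

```python
import json


def to_context_snippet(record: dict, fields: list[str], max_chars: int = 400) -> str:
    """Project a raw backend record onto the fields the agent needs, then cap its size."""
    # Keep only whitelisted fields; report missing ones explicitly so the
    # model is told a value is absent rather than left to guess it.
    projected = {name: record.get(name, "MISSING") for name in fields}
    text = json.dumps(projected, sort_keys=True)
    if len(text) > max_chars:
        # Note: a truncated snippet is no longer valid JSON.
        text = text[:max_chars] + " ...[truncated]"
    return text


raw = {
    "id": "inv_123",
    "status": "past_due",
    "amount_due": 4200,
    "audit_log": ["entry"] * 10_000,  # bulky field the agent does not need
}
snippet = to_context_snippet(raw, ["id", "status", "amount_due", "currency"])
```

&lt;p&gt;Marking currency as MISSING addresses the too-little-data failure mode, and dropping audit_log addresses the too-much-data one.&lt;/p&gt;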

&lt;h4&gt;
  
  
  3. Executing Side Effects
&lt;/h4&gt;

&lt;p&gt;The MCP server executes actions for the agent. When an agent triggers a deployment, the server is the execution mechanism.&lt;/p&gt;

&lt;p&gt;A confused agent can trigger destructive loops if the server lacks idempotency or error handling. The server must implement safeguards preventing the model from erroneously retrying state-changing operations.&lt;/p&gt;
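
&lt;p&gt;One common safeguard is an idempotency key. The sketch below assumes a hypothetical IdempotentExecutor with an in-memory store; a production server would persist keys and handle concurrent callers.&lt;/p&gt;

```python
from typing import Callable


class IdempotentExecutor:
    """Run each state-changing operation at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def execute(self, key: str, operation: Callable[[], str]) -> str:
        # A repeated key returns the stored result instead of re-running
        # the side effect, so a confused retry loop cannot deploy twice.
        if key in self._results:
            return self._results[key]
        result = operation()
        self._results[key] = result
        return result


deployments: list[str] = []


def deploy() -> str:
    deployments.append("checkout-service@v2")
    return "deployed checkout-service@v2"


executor = IdempotentExecutor()
first = executor.execute("deploy:checkout-service:v2", deploy)
second = executor.execute("deploy:checkout-service:v2", deploy)
```

&lt;p&gt;Both calls return the same result, but the deployment runs only once.&lt;/p&gt;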

&lt;h3&gt;
  
  
  The Engineering Rigor Required for Production
&lt;/h3&gt;

&lt;p&gt;Shipping agents to production requires due diligence exceeding standard microservice development. This is most visible in return state ambiguity.&lt;/p&gt;

&lt;p&gt;A traditional API might return a 404 error code, which a client handles with logic. An MCP server faces a more complex challenge. It must return a natural language description or structured tool result explaining why the action failed.&lt;/p&gt;

&lt;p&gt;If the server returns a generic stack trace, the agent may retry endlessly or invent a plausible but incorrect reason for failure. The error message becomes part of the prompt for the next conversation turn. It must be engineered as carefully as the system prompt.&lt;/p&gt;
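
&lt;p&gt;As an illustration, a server might build failure results with a helper like the one below. The result shape is loosely modeled on MCP tool results (a content list plus an isError flag); the tool_error helper, its parameters, and the identifier sub_123 are assumptions.&lt;/p&gt;

```python
def tool_error(what_failed: str, why: str, next_step: str, retryable: bool) -> dict:
    """Build a tool result that tells the model what happened and what to do next."""
    guidance = "Retrying may succeed." if retryable else "Do not retry this call."
    return {
        # isError marks the result as a failure; the text is what the model reads.
        "isError": True,
        "content": [
            {
                "type": "text",
                "text": f"{what_failed} failed: {why}. {guidance} {next_step}",
            }
        ],
    }


result = tool_error(
    what_failed="process_prorated_subscription_refund",
    why="subscription sub_123 was already refunded for this billing cycle",
    next_step="Tell the user the refund was issued previously.",
    retryable=False,
)
```

&lt;p&gt;Stating whether a retry is appropriate removes the ambiguity that leads to endless retries, and the suggested next step gives the model a grounded alternative to inventing a cause.&lt;/p&gt;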

&lt;p&gt;Latency is also critical. Agents operate in a sequential thought loop. They reason. They call a tool. They wait. They reason again.&lt;/p&gt;

&lt;p&gt;A slow server breaks the cognitive chain. High latency causes context timeouts, forcing the agent to abandon workflows. This leaves systems in inconsistent states.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling Testing via Multitenancy
&lt;/h3&gt;

&lt;p&gt;The nondeterministic nature of the client makes testing difficult. Traditional unit tests are insufficient.&lt;/p&gt;

&lt;p&gt;Unit testing a Python function to ensure valid JSON output does not prove that an agent will understand how to use it. Mocks are equally ineffective. They decouple the &lt;a href="https://thenewstack.io/why-your-microservice-integration-tests-miss-real-problems/" rel="noopener noreferrer"&gt;test from real system behavior&lt;/a&gt; and create false confidence.&lt;/p&gt;

&lt;p&gt;The only way to validate an MCP server is through rigorous end-to-end testing against real dependencies. However, spinning up full cluster replicas for every test is rarely feasible.&lt;/p&gt;

&lt;p&gt;To validate an MCP server without the overhead of full environment replication, we treat the test run as a logical slice within a shared cluster. This life cycle relies on header-based routing and session affinity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Handshake and routing: The test harness initializes the agent with specific context metadata (such as a baggage header or a custom routing parameter) during the WebSocket or transport handshake. This signals the ingress controller or service mesh to route the persistent JSON-RPC session specifically to the candidate MCP server (the version under test), bypassing the stable production traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session isolation: Once connected, the agent operates within a strictly isolated session. While the underlying compute resources may be shared, the logical control flow is pinned to the candidate artifact. This ensures that the nondeterministic reasoning of the agent is exercising only the new code paths.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shared downstream state: The candidate MCP server processes the agent’s intent but executes side effects against shared downstream dependencies such as staging databases or stable microservices. This eliminates the need for mocks, allowing the agent to interact with a realistic “world model” where API contracts and data schemas are genuine.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
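
&lt;p&gt;The routing decision in the first step can be sketched as a pure function. In practice an ingress controller or service mesh makes this decision; the routing-key entry name, the backend names, and the CANDIDATES mapping below are hypothetical.&lt;/p&gt;

```python
from typing import Optional

BASELINE = "mcp-server-stable"

# Hypothetical mapping from routing key to the candidate build under test.
CANDIDATES = {"test-run-1": "mcp-server-candidate"}


def routing_key(headers: dict[str, str]) -> Optional[str]:
    """Extract a routing key from a baggage-style header such as 'routing-key=test-run-1,team=pay'."""
    baggage = headers.get("baggage", "")
    for entry in baggage.split(","):
        name, _, value = entry.strip().partition("=")
        if name == "routing-key" and value:
            return value
    return None


def pick_backend(headers: dict[str, str]) -> str:
    """Send tagged sessions to the candidate; everything else stays on the baseline."""
    key = routing_key(headers)
    if key is None:
        return BASELINE
    # Unknown keys fall back to the baseline rather than failing the request.
    return CANDIDATES.get(key, BASELINE)
```

&lt;p&gt;A session whose handshake carries routing-key=test-run-1 is pinned to the candidate server, while untagged traffic and unknown keys continue to reach the stable baseline.&lt;/p&gt;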

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe487fjlg1oealeep18jc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe487fjlg1oealeep18jc.png" alt=" " width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This architecture enables safe end-to-end semantic testing. The harness prompts the agent to perform an operation and verifies the state change against downstream microservices.&lt;/p&gt;

&lt;p&gt;Isolation at the connection layer turns the test run into a private lane on a public highway. This enables full end-to-end validation of the MCP server without saturating &lt;a href="https://www.signadot.com/blog/smart-ephemeral-environments-share-more-copy-less" rel="noopener noreferrer"&gt;testing infrastructure or introducing resource contention in shared staging environments&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Treat It Like Critical Infrastructure
&lt;/h3&gt;

&lt;p&gt;Teams that are shipping advanced, customer-facing agents understand that robust MCP servers are critical infrastructure. We must recognize them as complex architectural nodes that directly affect agent reliability.&lt;/p&gt;

&lt;p&gt;Model evals are critical but insufficient for production standards. Rigorous integration testing of agents with MCP servers is necessary.&lt;/p&gt;

&lt;p&gt;An agent is only as effective as its tools. A fragile MCP server creates a fragile agent. Elevating the MCP server to a fully validated microservice is essential for advancing agent development from internal experiments to products that are ready for production.&lt;/p&gt;

&lt;p&gt;Learn more about how to implement this testing workflow for your agents at &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>platformengineering</category>
      <category>mcp</category>
      <category>backend</category>
    </item>
    <item>
      <title>Your CI/CD Pipeline Is Not Ready To Ship AI Agents</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Wed, 18 Feb 2026 15:43:01 +0000</pubDate>
      <link>https://forem.com/signadot/your-cicd-pipeline-is-not-ready-to-ship-ai-agents-219</link>
      <guid>https://forem.com/signadot/your-cicd-pipeline-is-not-ready-to-ship-ai-agents-219</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/your-ci-cd-pipeline-is-not-ready-to-ship-ai-agents?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Your+CI%2FCD+Pipeline+Is+Not+Ready+To+Ship+AI%C2%A0Agents" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s be honest with ourselves for a minute. If you look past the hype cycles, the viral Twitter demos and the astronomical valuation of foundation model companies, you will notice a distinct gap in the AI landscape.&lt;/p&gt;

&lt;p&gt;We are incredibly early, and our infrastructure is failing us.&lt;/p&gt;

&lt;p&gt;While every SaaS company has slapped a copilot sidebar onto its UI, actual autonomous agents are rare in the wild. I am referring to software that reliably executes complex and multistep tasks without human hand-holding. Most agents today are internal tools glued together by enthusiastic engineers to summarize Slack threads or query a SQL database. They live in the safe harbor of internal usage where a 20% failure rate is a quirky annoyance rather than a churn event.&lt;/p&gt;

&lt;p&gt;Why aren’t these agents facing customers yet? It is not because the models lack intelligence. It is because our delivery pipelines lack rigor. Taking an agent from cool demo to production-grade reliability is an engineering nightmare that few have solved because traditional CI/CD pipelines simply were not designed for non-deterministic software.&lt;/p&gt;

&lt;p&gt;We are learning the hard way that shipping agents is not an AI problem. It is a systems engineering problem. Specifically, it is a &lt;a href="https://thenewstack.io/why-your-microservice-integration-tests-miss-real-problems/" rel="noopener noreferrer"&gt;testing infrastructure problem&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Death of ‘Prompt and Pray’
&lt;/h3&gt;

&lt;p&gt;For the last year, the industry has been obsessed with frameworks that promised magic. You give the framework a goal and it figures out the rest. This was the &lt;a href="https://www.oreilly.com/radar/beyond-prompt-and-pray/" rel="noopener noreferrer"&gt;“prompt and pray”&lt;/a&gt; era.&lt;/p&gt;

&lt;p&gt;But as recent discussions in the engineering community highlight, specifically the insightful &lt;a href="https://github.com/humanlayer/12-factor-agents" rel="noopener noreferrer"&gt;conversation around 12-Factor Agents&lt;/a&gt;, production reality is boringly deterministic. The developers actually shipping reliable agents are abandoning the idea of total autonomy. Instead, they are building robust and deterministic workflows where large language models (LLMs) are treated as fuzzy function calls injected at specific leverage points.&lt;/p&gt;

&lt;p&gt;The 12-Factor philosophy correctly argues that you must own your control flow. You cannot outsource your logic loop to a probabilistic model. If you do, you end up with a system that works 80% of the time and hallucinates itself into a corner the other 20%.&lt;/p&gt;

&lt;p&gt;So we build the agent as a workflow. We treat the LLM as a component rather than the architect. But once we settle on this architecture, we run headfirst into a wall that traditional software engineering solved a decade ago but which AI has reopened. That wall is integration testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trap of Evals
&lt;/h3&gt;

&lt;p&gt;When teams start testing agents, they almost always start with evals.&lt;/p&gt;

&lt;p&gt;Evals are critical. You need frameworks to score your LLM outputs for relevance, toxicity and hallucinations. You need to know if your prompt changes caused a regression in reasoning.&lt;/p&gt;

&lt;p&gt;However, in the context of shipping a product, evals are essentially unit tests. They test the logic of the node, but they do not test the integrity of the graph.&lt;/p&gt;

&lt;p&gt;In a production environment, your agent is not chatting in a void. It is acting. It is calling tools. It is fetching data from a CRM, updating a ticket in Jira or triggering a deployment via an &lt;a href="https://thenewstack.io/mcp-the-missing-link-between-ai-agents-and-apis/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol) server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The reliability of your agent is not just defined by how well it writes text or code. It is defined by how consistently it handles the messy and structured data returned by these external dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Integration Nightmare
&lt;/h3&gt;

&lt;p&gt;This is where the platform engineering headache begins.&lt;/p&gt;

&lt;p&gt;Imagine you have an agent designed to troubleshoot Kubernetes pod failures. To test this agent, you cannot just feed it a text prompt. You need to put it in an environment where it can do several things. It must call the Kubernetes API or an MCP server wrapping it. It must receive a JSON payload describing a CrashLoopBackOff. It must parse that payload. It must decide to check the logs. Finally, it must call the log service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf8wg92a7q3ekw6fexi4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf8wg92a7q3ekw6fexi4.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the structure of that JSON payload changes, or if the latency of the log service spikes, or if the MCP server returns a slightly different error schema, your agent might break. It might hallucinate a solution because the input context did not match its training examples.&lt;/p&gt;

&lt;p&gt;To test this reliably, you &lt;a href="https://www.signadot.com/blog/we-need-a-new-approach-to-testing-microservices" rel="noopener noreferrer"&gt;need integration testing&lt;/a&gt;. But integration testing for agents is significantly harder than for standard web apps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Traditional Testing Fails
&lt;/h3&gt;

&lt;p&gt;In traditional software development, we mock dependencies. We stub out the database and the third-party APIs.&lt;/p&gt;

&lt;p&gt;But with LLM agents, the data is the control flow. If you mock the response from an MCP server, you are feeding the LLM a perfect and sanitized scenario. You are testing the happy path. But LLMs are most dangerous on the unhappy path.&lt;/p&gt;

&lt;p&gt;You need to know how the agent reacts when the MCP server returns a 500 error, an empty list or a schema with missing fields. If you mock these interactions, you are writing the test to pass rather than to find bugs. You are not testing the agent’s ability to reason. You are testing your own ability to write mocks.&lt;/p&gt;

&lt;p&gt;The alternative to mocking is usually a full staging environment where you spin up the agent, the MCP servers, the databases and the message queues.&lt;/p&gt;

&lt;p&gt;But in a modern microservices architecture, spinning up a duplicate stack for every pull request is prohibitively expensive and slow. You cannot wait 45 minutes for a full environment provision just to test if a tweak to the system prompt handles a database error correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Need for Ephemeral Sandboxes
&lt;/h3&gt;

&lt;p&gt;To ship production-grade agents, we need to rethink our CI/CD pipeline. We need infrastructure that allows us to perform high-fidelity integration testing early in the software development life cycle.&lt;/p&gt;

&lt;p&gt;We need ephemeral sandboxes.&lt;/p&gt;

&lt;p&gt;A platform engineer needs to provide a way for the AI developer to spin up a lightweight, isolated environment that contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The version of the agent being tested.&lt;/li&gt;
&lt;li&gt;The specific MCP servers and microservices it depends on.&lt;/li&gt;
&lt;li&gt;Access to real (or realistic) data stores.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, we do not need to duplicate the entire platform. We need a system that allows us to spin up the changed components while routing traffic intelligently to shared and stable baselines for the rest of the stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddujny88yp7dlc8koxmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddujny88yp7dlc8koxmu.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach solves the data fidelity problem. The agent interacts with real MCP servers running real logic. If the MCP server returns a complex JSON object, the agent has to ingest it. If the agent makes a state-changing call like restart pod, it actually hits the service or a sandboxed version of it. This ensures the loop is closed.&lt;/p&gt;

&lt;p&gt;This is the only way to verify that the workflow holds up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shifting Left on Agentic Reliability
&lt;/h3&gt;

&lt;p&gt;The future of AI agents is not just better models. It is better DevOps.&lt;/p&gt;

&lt;p&gt;If we accept that production agents are just software with fuzzy logic, we must accept that they require the same rigor in integration testing as a payment gateway or a flight control system.&lt;/p&gt;

&lt;p&gt;We are moving toward a world where the agent is just one microservice in a Kubernetes cluster. It communicates via MCP to other services. The challenge for platform engineers is to give developers the confidence to merge code.&lt;/p&gt;

&lt;p&gt;That confidence does not come from a green checkmark on a prompt eval. It comes from seeing the agent navigate a live environment, query a live MCP server and execute a workflow successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Building the agent is the easy part. Building the stack to reliably test the agent is where the battle is won or lost.&lt;/p&gt;

&lt;p&gt;As we move from internal toys and controlled demos to customer-facing products, the teams that win will be those that can iterate fast without breaking things. They will be the teams that abandon the idea of “prompt and pray” and instead bring production fidelity to their pull request (PR) review. This requires a specific type of infrastructure focused on request-level isolation and ephemeral &lt;a href="https://www.signadot.com/blog/kubernetes-isnt-your-ai-bottleneck----its-your-secret-weapon" rel="noopener noreferrer"&gt;testing environments that work natively within Kubernetes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Solving this infrastructure gap is our core mission at Signadot. We allow platform teams to create lightweight &lt;a href="https://www.signadot.com/blog/sandbox-testing-the-devex-game-changer-for-microservices" rel="noopener noreferrer"&gt;sandboxes to test agents&lt;/a&gt; against real dependencies without the complexity of full environments. If you are refining the architecture for your AI workflows, you can learn more about this testing pattern at signadot.com.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aiagents</category>
      <category>automation</category>
    </item>
    <item>
      <title>Ramp’s Inspect shows closed-loop AI agents are software’s future</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Thu, 12 Feb 2026 15:23:51 +0000</pubDate>
      <link>https://forem.com/signadot/ramps-inspect-shows-closed-loop-ai-agents-are-softwares-future-4ic1</link>
      <guid>https://forem.com/signadot/ramps-inspect-shows-closed-loop-ai-agents-are-softwares-future-4ic1</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/ramps-inspect-shows-closed-loop-ai-agents-are-softwares-future?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Ramp%E2%80%99s+Inspect+shows+closed-loop+AI+agents+are+software%E2%80%99s+future" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The recent release of the &lt;a href="https://engineering.ramp.com/post/why-we-built-our-background-agent" rel="noopener noreferrer"&gt;background coding agent Inspect&lt;/a&gt; by Ramp’s engineering team serves as a definitive proof point that closed-loop agentic systems are the future of software development. It has transformed coding agents into truly autonomous engineering partners, and it is fundamentally changing the way agents deliver software.&lt;/p&gt;

&lt;p&gt;Whether teams use a custom cloud development environment (CDE) like Ramp or another approach, the signal is clear: Teams need to solve for this kind of autonomy or risk getting left behind. Modern engineers need access to coding agents that do not just generate code but also run it, verify the output, and iterate on the solution until it works.&lt;/p&gt;

&lt;p&gt;This distinction represents a fundamental shift. The industry has been focused on optimizing the “brain” of agents, solving for context windows and reasoning. Ramp’s success validates that the “body” matters just as much.&lt;/p&gt;

&lt;p&gt;The ability to interact with a runtime environment is what transforms code from a hypothesis into a solution. This verification loop separates truly autonomous coding agents from those that rely on humans to validate their work.&lt;/p&gt;

&lt;h3&gt;
  
  
  The open-loop bottleneck
&lt;/h3&gt;

&lt;p&gt;Modern coding agents are impressive. They can plan complex refactors and generate thousands of lines of code. However, these &lt;a href="https://thenewstack.io/your-ci-cd-pipeline-is-not-ready-to-ship-ai-agents/" rel="noopener noreferrer"&gt;agents typically operate&lt;/a&gt; in an open loop. They rely on the developer to act as the runtime environment. The agent proposes a solution. The human must then compile it, run the tests, interpret the error messages, and feed them back to the agent. The cognitive load of verification remains with the user.&lt;/p&gt;

&lt;p&gt;This workflow caps developer velocity. The speed of the agent is irrelevant if the verification process is slow. We have optimized code generation to be near instantaneous, but verification remains bound by human bandwidth and linear CI pipelines.&lt;/p&gt;

&lt;p&gt;Inspect demonstrates that closing that loop unlocks a new category of velocity. By giving the agent access to a &lt;a href="https://www.signadot.com/blog/sandbox-testing-the-devex-game-changer-for-microservices" rel="noopener noreferrer"&gt;sandbox to run builds and tests&lt;/a&gt;, the agent transitions from text generator to task completer. It hands off a verified solution rather than a draft.&lt;/p&gt;

&lt;p&gt;The impact is measurable. Ramp reported that its internal adoption charts went vertical. Within months, approximately 30% of all pull requests merged to its frontend and backend repositories were written by Inspect. This penetration suggests closed-loop agents are a step-function change in productivity, not a marginal improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  The economics of curiosity
&lt;/h3&gt;

&lt;p&gt;The value proposition of closed-loop agents is not just delivering code faster. It is about the parallelization of solution discovery.&lt;/p&gt;

&lt;p&gt;In traditional workflows, exploring refactors or library upgrades is expensive. It requires context switching, stashing work and fighting dependency conflicts. Because experimentation costs are high, we experiment less. We stick to safe patterns to avoid the time sink of failure.&lt;/p&gt;

&lt;p&gt;Background agents change the economics of curiosity. If an engineer can spin up 10 concurrent agent sessions to explore 10 architectural approaches, the cost of failure drops significantly.&lt;/p&gt;

&lt;p&gt;Consider a team migrating a legacy component. Currently, this is a multiweek spike. In the new paradigm, a &lt;a href="https://www.signadot.com/blog/your-infrastructure-isnt-ready-for-agentic-development-at-scale" rel="noopener noreferrer"&gt;developer could instead task a fleet of agents&lt;/a&gt; to attempt the migration using different strategies. One agent might try a strangler fig pattern. Another might attempt a hard cutover. A third might focus on integration tests.&lt;/p&gt;

&lt;p&gt;The developer then reviews results rather than typing code. The agents run in isolated sandboxes. They build, catch syntax errors, and run test suites until they achieve a green state. The developer wakes up to three potential pull requests verified against the CI pipeline and chooses the best one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification beyond localhost
&lt;/h3&gt;

&lt;p&gt;Ramp’s Inspect platform validates within a custom-built CDE. To ensure these environments start quickly despite their complexity, a sophisticated snapshotting system keeps images warm and ready to launch. Ramp was able to extend this CDE infrastructure to also support integration testing, a brilliant engineering feat that works well for its specific context.&lt;/p&gt;

&lt;p&gt;However, for many organizations building complex, cloud native applications with high levels of dependencies, this approach faces significant hurdles. Often, the entire stack is too large to be spun up on a single virtual machine (VM) or devpod. In these scenarios, while CDEs remain excellent for replacing local development laptops, high-fidelity integration &lt;a href="https://www.signadot.com/blog/we-need-a-new-approach-to-testing-microservices" rel="noopener noreferrer"&gt;testing requires a different approach&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To enable true autonomy in these complex environments, we need a way to perform integration testing without replicating the entire world. We can connect agents directly to a &lt;a href="https://www.signadot.com/blog/smart-ephemeral-environments-share-more-copy-less" rel="noopener noreferrer"&gt;shared baseline environment&lt;/a&gt; using existing Kubernetes infrastructure.&lt;/p&gt;

&lt;p&gt;In this model, the agent deploys only the modified service to a lightweight sandbox. The infrastructure uses dynamic routing and context propagation to direct specific &lt;a href="https://www.signadot.com/blog/microservices-testing-4-use-cases-for-sandbox-environments" rel="noopener noreferrer"&gt;test traffic to that sandbox&lt;/a&gt; while fulfilling all other dependencies from a shared, stable baseline.&lt;/p&gt;

&lt;p&gt;This approach gives coding agents the power to execute autonomous end-to-end testing, regardless of the stack’s size or complexity. It leverages the existing cluster to provide high-fidelity context. An agent can then run integration tests against real upstream and downstream services. It sees how the change interacts with the actual message queue schema and the latency of the live database.&lt;/p&gt;

&lt;p&gt;This closes the loop with higher fidelity while lowering the infrastructure barrier. By testing against a shared cluster, the agent can catch integration regressions that might pass in a hermetic VM without requiring the platform team to build a custom orchestration engine to support it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The future of software delivery
&lt;/h3&gt;

&lt;p&gt;The release of Inspect is a clear signal of where software development is heading. The era of the human engineer as the sole verifier is ending. We are moving toward a world where agents operate as autonomous partners capable of exploring solutions and verifying their own work.&lt;/p&gt;

&lt;p&gt;Ramp has proven that this workflow is not science fiction. It is working in production today and is driving massive efficiency gains. The question for the rest of the industry is not whether to adopt this workflow, but how.&lt;/p&gt;

&lt;p&gt;Whether a team chooses to build a custom platform like Ramp or adopt an existing cloud native solution like &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt; to give their agents a runtime, the imperative is the same. We must provide our agents with a body. We must close the loop between generation and verification. Once we do, we unlock a level of velocity that will define the next generation of high-performing engineering teams.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your infrastructure isn’t ready for agentic development at scale</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Thu, 05 Feb 2026 15:53:22 +0000</pubDate>
      <link>https://forem.com/signadot/your-infrastructure-isnt-ready-for-agentic-development-at-scale-25jk</link>
      <guid>https://forem.com/signadot/your-infrastructure-isnt-ready-for-agentic-development-at-scale-25jk</guid>
      <description>&lt;p&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/your-infrastructure-isnt-ready-for-agentic-development-at-scale?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Your+infrastructure+isn%E2%80%99t+ready+for+agentic+development+at+scale" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I have spent the last year watching the AI conversation shift from smart autocomplete to autonomous contribution. When I test tools like Claude Code or GitHub Copilot Workspace, I am no longer just seeing code suggestions. I am watching them solve tickets and refactor entire modules.&lt;/p&gt;

&lt;p&gt;The promise is seductive. I imagine assigning a complex task and returning to merged work. But while these agents generate code in seconds, I have discovered that code verification is the new bottleneck.&lt;/p&gt;

&lt;p&gt;For agents to be force multipliers, they cannot rely on humans to &lt;a href="https://www.signadot.com/blog/traditional-code-review-is-dead-what-comes-next" rel="noopener noreferrer"&gt;validate every step&lt;/a&gt;. If I have to debug every intermediate state, my productivity gains evaporate. To achieve 10 times the impact, we must transition to an agent-driven loop where humans provide intent while agents handle implementation and integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  The code generation feedback loop crisis
&lt;/h3&gt;

&lt;p&gt;Consider a scenario where an agent is tasked with updating a deprecated API endpoint in a user service. The agent parses the codebase, identifies the relevant files, and generates syntactically correct code. It may even generate a unit test that passes within the limited context of that specific repository.&lt;/p&gt;

&lt;p&gt;However, problems emerge when code interacts with the broader system. A change might break a contract with a downstream payment gateway or an upstream authentication service. If the agent cannot see this failure, it assumes the task is complete and opens a pull request.&lt;/p&gt;

&lt;p&gt;The burden then falls on human developers. They have to pull down the agent’s branch, spin up a local environment, or wait for a slow staging build to finish, only to discover the integration error. The developer pastes the error log back into the chat window and asks the agent to try again. This ping-pong effect destroys velocity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/bcherny/" rel="noopener noreferrer"&gt;Boris Cherny&lt;/a&gt;, creator of Claude Code, has noted the necessity of &lt;a href="https://x.com/bcherny/status/2007179832300581177" rel="noopener noreferrer"&gt;closed-loop systems&lt;/a&gt; for agents to be effective. An agent is only as capable as its ability to observe the consequences of its actions. Without a feedback loop that includes real runtime data, an agent is building in the dark.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://thenewstack.io/cloud-native/" rel="noopener noreferrer"&gt;cloud native development&lt;/a&gt;, unit tests and mocks are insufficient for this feedback. In a microservices architecture, correctness is a function of the broader ecosystem.&lt;/p&gt;

&lt;p&gt;Code that passes a unit test is merely a suggestion that it might work. True verification requires the code to run against real dependencies, real network latency, and real data schemas. For an agent to iterate autonomously, it needs access to runtime reality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxhi3rozcn71v13ojttz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxhi3rozcn71v13ojttz.png" alt=" " width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The requirement: Realistic runtime environments at scale
&lt;/h3&gt;

&lt;p&gt;In a recent blog post, &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;“Effective harnesses for long-running agents,”&lt;/a&gt; Anthropic’s engineering team argued that an agent’s performance is strictly limited by the quality of its harness. If the harness provides slow or inaccurate feedback, the agent cannot learn or correct itself.&lt;/p&gt;

&lt;p&gt;This presents a massive infrastructure challenge for engineering leadership. In a large organization, you might deploy 100 autonomous agents to tackle backlog tasks simultaneously. To support this, you effectively need 100 distinct staging environments.&lt;/p&gt;

&lt;p&gt;The traditional approach to this problem fails at scale. Spinning up full Kubernetes namespaces or ephemeral clusters for every task is cost-prohibitive and slow. Provisioning a full cluster with 50 or more microservices, databases, and message queues can take 15 minutes or more. This latency is fatal for an AI workflow. Large language models (LLMs) operate on a timescale of seconds.&lt;/p&gt;

&lt;p&gt;We are left with a fundamental conflict. We need production-like fidelity to ensure reliability, but we cannot afford the production-level overhead for every &lt;a href="https://thenewstack.io/ai-agents-vs-agentic-ai-a-kubernetes-developers-guide/" rel="noopener noreferrer"&gt;agentic task&lt;/a&gt;. We need a way to verify code that is fast, cheap, and accurate.&lt;/p&gt;

&lt;h3&gt;
  
  
  The solution: Environment virtualization
&lt;/h3&gt;

&lt;p&gt;The answer lies in decoupling the environment from the underlying infrastructure. This concept is known as environment virtualization.&lt;/p&gt;

&lt;p&gt;Environment virtualization allows the creation of lightweight and ephemeral sandboxes within a shared Kubernetes cluster. In this model, a baseline environment runs the stable versions of all services. When an agent proposes a change to a specific service, such as the user service mentioned earlier, it does not clone the entire cluster. Instead, it spins up only the modified workload containing the agent’s new code as a shadow deployment.&lt;/p&gt;

&lt;p&gt;The environment then utilizes dynamic traffic routing to create the illusion of a dedicated environment. It employs context propagation headers to route specific requests to the agent’s sandbox. If a request carries a specific routing key associated with the agent’s task, the service mesh or ingress controller directs that request to the shadow deployment. All other downstream calls fall back to the stable baseline services.&lt;/p&gt;
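
&lt;p&gt;As a rough illustration of that routing decision, here is a minimal Python sketch. It is not how any particular service mesh or ingress controller implements it. The header name, the &lt;code&gt;Router&lt;/code&gt; class, and the service addresses are all hypothetical, chosen only to show the fallback behavior.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical header name; a real deployment chooses its own routing key header.
ROUTING_HEADER = "x-routing-key"


@dataclass
class Router:
    """Resolves a service name to a sandbox workload or to the shared baseline."""

    baseline: dict[str, str]
    overrides: dict[tuple[str, str], str] = field(default_factory=dict)

    def register_sandbox(self, routing_key: str, service: str, address: str) -> None:
        # Record that requests tagged with routing_key should reach this workload.
        self.overrides[(routing_key, service)] = address

    def resolve(self, service: str, headers: dict[str, str]) -> str:
        key = headers.get(ROUTING_HEADER)
        if key is not None and (key, service) in self.overrides:
            return self.overrides[(key, service)]
        # No matching sandbox for this request: fall back to the stable baseline.
        return self.baseline[service]


router = Router(baseline={"user": "user.baseline:8080", "payments": "payments.baseline:8080"})
router.register_sandbox("agent-task-42", "user", "user.sandbox-42:8080")

tagged = {ROUTING_HEADER: "agent-task-42"}
print(router.resolve("user", tagged))      # the agent's modified workload
print(router.resolve("payments", tagged))  # untouched dependency, served by baseline
print(router.resolve("user", {}))          # untagged traffic, served by baseline
```

&lt;p&gt;Only the tagged request for the modified service reaches the sandbox. Everything else is served by the baseline, which is what makes the sandbox feel like a dedicated environment.&lt;/p&gt;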

&lt;p&gt;This architecture solves the agent-environment fit in three specific ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speed:&lt;/strong&gt; Because a single container or pod is launching, rather than a full cluster, sandboxes spin up in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; The infrastructure footprint is minimal. You are not paying for idle databases or duplicate copies of stable services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fidelity:&lt;/strong&gt; Agents test against real dependencies and valid data rather than stubs. The modified service interacts with the actual payment gateways and databases in the baseline.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
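
&lt;p&gt;To make the cost difference concrete, the back-of-the-envelope Python sketch below compares workload counts for the 100 agents and 50 services used as examples earlier. It assumes one workload per service and one modified service per agent, which is a simplification for illustration rather than a measured result.&lt;/p&gt;

```python
def full_clone_workloads(agents: int, services: int) -> int:
    """Every agent gets its own copy of every service."""
    return agents * services


def shared_baseline_workloads(agents: int, services: int, changed_per_agent: int = 1) -> int:
    """One shared baseline plus only the workloads each agent actually modified."""
    return services + agents * changed_per_agent


agents, services = 100, 50
print(full_clone_workloads(agents, services))       # 5000
print(shared_baseline_workloads(agents, services))  # 150
```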

&lt;h3&gt;
  
  
  The seamless verification workflow for AI agents
&lt;/h3&gt;

&lt;p&gt;The mechanics of this verification loop rely on precise context propagation, typically handled through standard tracing headers like OpenTelemetry baggage.&lt;/p&gt;

&lt;p&gt;When an agent works on a task, its environment is virtually mapped to the remote Kubernetes cluster. This setup supports conflict-free parallelism. Multiple agents can simultaneously work on the same microservice in different sandboxes without collision because routing is determined by unique headers attached to test traffic.&lt;/p&gt;
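
&lt;p&gt;The sketch below shows the general idea of carrying a routing key through a service using the comma-separated &lt;code&gt;key=value&lt;/code&gt; layout of the W3C &lt;code&gt;baggage&lt;/code&gt; header. The &lt;code&gt;routing-key&lt;/code&gt; entry name and both helper functions are hypothetical. A real service would normally let its OpenTelemetry SDK handle propagation instead of parsing headers by hand.&lt;/p&gt;

```python
BAGGAGE_HEADER = "baggage"
ROUTING_KEY_NAME = "routing-key"  # hypothetical baggage entry name


def parse_baggage(value: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, ignoring optional ';property' suffixes."""
    entries = {}
    for member in value.split(","):
        pair = member.split(";", 1)[0].strip()
        if "=" not in pair:
            continue  # skip malformed members rather than failing the request
        name, _, val = pair.partition("=")
        entries[name.strip()] = val.strip()
    return entries


def outbound_headers(inbound: dict[str, str]) -> dict[str, str]:
    """Copy the baggage header onto a downstream call so the key survives each hop."""
    headers = {"content-type": "application/json"}
    if BAGGAGE_HEADER in inbound:
        headers[BAGGAGE_HEADER] = inbound[BAGGAGE_HEADER]
    return headers


# Two agents exercising the same microservice at the same time.
request_a = {"baggage": "routing-key=agent-a,tenant=demo"}
request_b = {"baggage": "routing-key=agent-b"}

for inbound in (request_a, request_b):
    downstream = outbound_headers(inbound)
    print(parse_baggage(downstream[BAGGAGE_HEADER])[ROUTING_KEY_NAME])
```

&lt;p&gt;Because each request carries its own key through every hop, the two agents never see each other’s traffic even though they share the cluster.&lt;/p&gt;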

&lt;p&gt;Here is the autonomous workflow for an agent refactoring a microservice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generation:&lt;/strong&gt; The agent analyzes a ticket and generates a code fix with local static analysis. At this stage, the code is theoretical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instantiation:&lt;/strong&gt; The agent triggers a sandbox via the &lt;a href="https://thenewstack.io/why-the-mcp-server-is-now-a-critical-microservice" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; server. This deploys only the modified workload alongside the running baseline in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verification:&lt;/strong&gt; The agent runs integration tests against the cluster using a specific routing header. Requests route to the modified service while dependencies fall back to the baseline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feedback:&lt;/strong&gt; If the change breaks a downstream contract, the baseline service returns a real runtime error (e.g., 400 Bad Request). The agent captures this actual exception rather than relying on a mock.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Iteration:&lt;/strong&gt; The agent analyzes the error, refines the code to fix the integration failure, and updates the sandbox instantly. It runs the test again to confirm the fix works in the real environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submission:&lt;/strong&gt; Once tests pass, the agent submits a verified pull request (PR). The human reviewer receives a sandbox link to interact with the running code immediately, bypassing local setup.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
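
&lt;p&gt;The steps above amount to a bounded retry loop. The Python sketch below shows one way to express it, under stated assumptions: &lt;code&gt;generate&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, and &lt;code&gt;run_tests&lt;/code&gt; are hypothetical callables standing in for the agent, the sandbox API, and the integration test runner. None of them corresponds to a real Signadot or MCP interface.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    passed: bool
    error: Optional[str] = None


def verification_loop(
    generate: Callable[[Optional[str]], str],
    deploy: Callable[[str], None],
    run_tests: Callable[[], RunResult],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first patch that passes integration tests, or None if none does."""
    feedback = None
    for _ in range(max_attempts):
        patch = generate(feedback)   # Generation (or Iteration, when feedback is set)
        deploy(patch)                # Instantiation: update only the modified workload
        result = run_tests()         # Verification against the shared baseline
        if result.passed:
            return patch             # Submission: ready to open a pull request
        feedback = result.error      # Feedback: the real runtime error
    return None


# Stubs that simulate one failed attempt followed by a fix.
deployed = []


def fake_generate(feedback):
    return "patch-v2" if feedback else "patch-v1"


def fake_run_tests():
    if deployed[-1] == "patch-v1":
        return RunResult(False, "400 Bad Request from baseline payments service")
    return RunResult(True)


print(verification_loop(fake_generate, deployed.append, fake_run_tests))  # patch-v2
```

&lt;p&gt;Capping the number of attempts keeps a stuck agent from looping forever. When the cap is reached, the sketch returns nothing and the task would be handed back to a human.&lt;/p&gt;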

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo1vbnxj6l6ptqdc6ac3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo1vbnxj6l6ptqdc6ac3.png" alt=" " width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why engineering’s future is autonomous
&lt;/h3&gt;

&lt;p&gt;As we scale the use of AI agents, the bottleneck moves from the keyboard to the infrastructure. If we treat agents as faster typists but force them to wait for slow &lt;a href="https://thenewstack.io/your-ci-cd-pipeline-is-not-ready-to-ship-ai-agents" rel="noopener noreferrer"&gt;legacy CI/CD pipelines&lt;/a&gt;, we gain nothing. We simply build a longer queue of unverified pull requests.&lt;/p&gt;

&lt;p&gt;To move toward a truly autonomous engineering workforce, we must give agents the ability to see. They need to see how their code performs in the real world rather than just in a text editor. They need to experience the friction of deployment and the reality of network calls. This is &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;’s approach.&lt;/p&gt;

&lt;p&gt;Environment virtualization is shifting from a tool for developer experience to foundational infrastructure. By closing the loop, agents can do the messy and iterative work of integration. This leaves architects and engineers free to focus on system design, high-level intent, and the creative aspects of building software.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Traditional Code Review Is Dead. What Comes Next?</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Tue, 27 Jan 2026 18:22:10 +0000</pubDate>
      <link>https://forem.com/signadot/traditional-code-review-is-dead-what-comes-next-41oi</link>
      <guid>https://forem.com/signadot/traditional-code-review-is-dead-what-comes-next-41oi</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/traditional-code-review-is-dead-what-comes-next?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Traditional+Code+Review+Is+Dead.+What+Comes+Next%3F" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I noticed a quiet shift in our engineering team recently that brought me to a broader realization about the future of software development: Code review has changed fundamentally.&lt;/p&gt;

&lt;p&gt;It started with a pull request (PR). An engineer had used an agent to generate the entire change, iterating with it to define business logic, but ultimately relying on the agent to write the code. It was a substantial chunk of work. The code was syntactically perfect. It followed our linting rules. It even included unit tests that passed green.&lt;/p&gt;

&lt;p&gt;The human reviewer, a senior engineer who is usually meticulous about architectural patterns and naming conventions, approved it almost immediately. The time between the PR opening and the approval was less than two minutes.&lt;/p&gt;

&lt;p&gt;When I asked about the speed of the approval, they said they checked if the output was correct and moved on. They did not feel the need to parse every line of syntax because it was written by an agent. They spun up the deploy preview, clicked the buttons, verified the state changes and merged it.&lt;/p&gt;

&lt;p&gt;This made sense, but it still took me by surprise. I realized that I was witnessing the silent death of traditional code review.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Silent Death of the Code Review
&lt;/h3&gt;

&lt;p&gt;For decades, the peer review process has been the primary quality gate in software engineering. Humans reading code written by other humans served two critical purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It caught logic bugs that automated tests missed.&lt;/li&gt;
&lt;li&gt;It maintained a shared mental model of the codebase across the team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The assumption behind this process was that code is a scarce resource produced slowly. A human developer might write 50 to 100 lines of meaningful code in a day. Another human can reasonably review that volume while maintaining high cognitive focus.&lt;/p&gt;

&lt;p&gt;But we are entering an era where code is becoming abundant and cheap. In fact, the precise goal of implementing coding agents is to generate code at a velocity and volume that by design makes it impossible for humans to keep up.&lt;/p&gt;

&lt;p&gt;When an engineer sees a massive block of AI-generated code, the instinct is to offload the syntax-checking to the machine. If the linter is happy and the tests pass, the human assumes the code is valid. The rigorous line-by-line inspection vanishes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: AI Trust and the Rubber Stamp
&lt;/h3&gt;

&lt;p&gt;This shift leads to what I call the rubber stamp effect. We see an “LGTM” (looks good to me) approval on code that nobody actually read.&lt;/p&gt;

&lt;p&gt;This significantly changes the risk profile. Human errors usually manifest as syntax errors or obvious logic gaps. AI errors are different. Large language models (LLMs) often hallucinate plausible but functionally incorrect code.&lt;/p&gt;

&lt;p&gt;Traditional diff-based review tools are ill-equipped for this. A diff shows you what changed in the text file. It does not show you the emergent behavior of that change. When a human writes code, the diff is a representation of their intent. When an AI writes code, the diff is just a large volume of tokens that may or may not align with the prompt.&lt;/p&gt;

&lt;p&gt;We are moving from a syntax-first culture to an outcome-first culture. The question is no longer “Did you write this correctly?” The question is “Does this do what we asked the agent to do?”&lt;/p&gt;

&lt;h3&gt;
  
  
  Previews as the New Source of Truth
&lt;/h3&gt;

&lt;p&gt;In this new world, where engineers are logic architects who offload the writing of code to agents, the most important artifact is not the code. It is the preview.&lt;/p&gt;

&lt;p&gt;If we cannot rely on humans to read the code, we must rely on humans to verify the behavior. But to verify behavior, we need more than a diff. We need a destination. The code must be deployed to a live environment where it can be exercised.&lt;/p&gt;

&lt;p&gt;While frontend previews have become standard, the critical gap — and the harder problem to solve — is the backend.&lt;/p&gt;

&lt;p&gt;Consider a change to a payment processing microservice generated by an agent. The code might look syntactically correct. The logic flow seems correct. But does it handle the race condition when two requests hit the API simultaneously? Does the new database migration lock a critical table for too long?&lt;/p&gt;

&lt;p&gt;You cannot see these problems in a text diff. You cannot even see them in a unit test mock. You can only see them when the code is running in a live, integrated environment.&lt;/p&gt;

&lt;p&gt;A backend preview environment allows for true end-to-end verification. It allows a reviewer to execute real API calls against a real database instance. It transforms the review process from a passive reading exercise into an active verification session. We are not just checking whether the code compiles. We are checking whether the system behaves.&lt;/p&gt;

&lt;p&gt;As AI agents write more code, the “review” phase of the software development life cycle must evolve into a “validation” phase. We are not reviewing the recipe. We are tasting the dish.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Infrastructure Challenge: The Concurrency Explosion
&lt;/h3&gt;

&lt;p&gt;However, this shift to outcome-based verification comes with a massive infrastructure challenge that most platform engineering teams are not ready for.&lt;/p&gt;

&lt;p&gt;A human developer typically works linearly. They open a branch, write code, open a pull request, wait for review and merge. They might context switch between two tasks, but rarely more.&lt;/p&gt;

&lt;p&gt;AI agents work in parallel. An agent tasked with fixing a bug might spin up 10 different strategies to solve it. It could open 10 parallel pull requests, each with a different implementation, and ask the human to select the best one.&lt;/p&gt;

&lt;p&gt;This creates an explosion of concurrency.&lt;/p&gt;

&lt;p&gt;Traditional &lt;a href="https://www.signadot.com/blog/your-ci-cd-pipeline-wasnt-built-for-microservices" rel="noopener noreferrer"&gt;CI/CD pipelines are built&lt;/a&gt; for linear human workflows. They assume a limited number of concurrent builds. If your AI agent opens 20 parallel sessions to test different hypotheses, you face two prohibitive problems: cost and contention.&lt;/p&gt;

&lt;p&gt;First, you cannot have 20 full-scale staging &lt;a href="https://www.signadot.com/blog/reimagining-environments-for-cloud-native-architectures" rel="noopener noreferrer"&gt;environments spinning up on expensive cloud instances&lt;/a&gt;. Imagine spinning up a dedicated Kubernetes cluster and database for 20 variations of a single bug fix. The cloud costs would be astronomical.&lt;/p&gt;

&lt;p&gt;Second, and perhaps worse, is the bottleneck of shared resources. Many pipelines rely on a single &lt;a href="https://www.signadot.com/blog/why-staging-doesnt-scale-for-microservice-testing" rel="noopener noreferrer"&gt;staging environment or limited testing slots&lt;/a&gt;. To avoid data collisions, these systems force PRs into a queue.&lt;/p&gt;

&lt;p&gt;With existing human engineering teams, these queues are already a frustrating bottleneck. With multiple agents dumping 20 PRs into the pipe simultaneously, the queue becomes a deadlock. The alternative of running them all at once on shared infrastructure results in race conditions and flaky tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj98b9ys9f5u0y7x9wy6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj98b9ys9f5u0y7x9wy6.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling Development With Environment Virtualization
&lt;/h3&gt;

&lt;p&gt;To scale agent-driven development, we cannot rely on infrastructure built for linear human pacing. We are talking about potentially hundreds of concurrent agents generating PRs in parallel, all of which need to be validated with previews. Cloning the entire stack for each one is not a viable option.&lt;/p&gt;

&lt;p&gt;The solution is to multiplex these &lt;a href="https://www.signadot.com/blog/smart-ephemeral-environments-share-more-copy-less" rel="noopener noreferrer"&gt;environments on shared infrastructure&lt;/a&gt;. Just as a single physical computer can host multiple virtual machines (VMs), a single Kubernetes cluster can multiplex thousands of lightweight, ephemeral environments.&lt;/p&gt;

&lt;p&gt;By applying smart isolation techniques at the application layer, we can provide strict separation for each agent’s work without duplicating the underlying infrastructure. This allows us to spin up a dedicated sandbox for every change, ensuring agents can work in parallel and validate code end-to-end without stepping on each other’s toes or exploding cloud costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;There is a clear shift happening in the way we review changes. As agents take over the writing of code, the review process naturally evolves from checking syntax to verifying behavior. The preview is no longer just a convenience. It is the only scalable way to validate the work that agents produce.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;, we are building for this future. We provide the orchestration layer that enables fleets of agents to work in parallel, generating and validating code end-to-end in a closed loop with instant, cost-effective previews.&lt;/p&gt;

&lt;p&gt;The winners of the next era won’t be the teams with the best style guides, but those who can handle the parallelism of AI agents without exploding their cloud budgets or bringing their CI/CD pipelines to a grinding halt.&lt;/p&gt;

&lt;p&gt;In an AI-first world, reading code is a luxury we can no longer afford. Verification is the new standard. If you cannot preview it, you cannot ship it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>testing</category>
    </item>
    <item>
      <title>Merging To Test Is Killing Your Microservices Velocity</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Mon, 19 Jan 2026 19:57:45 +0000</pubDate>
      <link>https://forem.com/signadot/merging-to-test-is-killing-your-microservices-velocity-4do0</link>
      <guid>https://forem.com/signadot/merging-to-test-is-killing-your-microservices-velocity-4do0</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/merging-to-test-is-killing-your-microservices-velocity?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Merging+To+Test+Is+Killing+Your+Microservices+Velocity" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend and data layers have evolved with branch-based previews and isolated environments. Why is the backend service layer stuck with shared staging?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are a platform engineer or an engineering leader, look at your current development pipeline. Is everything treated equally?&lt;/p&gt;

&lt;p&gt;To me, it seems that there is a glaring discrepancy in the way we treat different parts of the stack.&lt;/p&gt;

&lt;p&gt;When a frontend developer pushes code to a feature branch, tools like Vercel or Netlify immediately spin up a deploy preview. It is a unique URL, isolated from production, where they can click around and validate changes instantly.&lt;/p&gt;

&lt;p&gt;When a database engineer needs to test a schema migration, modern platforms like Neon or PlanetScale allow them to branch the database. They get an isolated, copy-on-write clone of the production data to wreck and repair without affecting a single real user.&lt;/p&gt;

&lt;p&gt;But what happens when a backend engineer pushes a change to one microservice in a mesh of 50?&lt;/p&gt;

&lt;p&gt;Nothing.&lt;/p&gt;

&lt;p&gt;There is a gaping hole in the middle of our cloud native stack. While frontend and data layers have evolved to embrace branch-based development, the backend service layer is stuck in the stone age of shared environments.&lt;/p&gt;

&lt;p&gt;This isn’t just an annoyance. It is the primary bottleneck preventing teams from truly shifting left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimiry5qfmqr8w6e3jkx1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimiry5qfmqr8w6e3jkx1.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Merge To Validate Anti-Pattern
&lt;/h3&gt;

&lt;p&gt;In most distributed architectures, a developer working on a backend service cannot realistically run the entire platform on their laptop. It is too heavy.&lt;/p&gt;

&lt;p&gt;So they rely on &lt;a href="https://www.signadot.com/blog/why-mocks-fail-real-environment-testing-for-microservices" rel="noopener noreferrer"&gt;unit tests and mocks&lt;/a&gt;. But we all know that mocks are liars. They do not catch the contract drift between services or the latency issues that only appear over the network.&lt;/p&gt;

&lt;p&gt;To get real validation, the &lt;a href="https://www.signadot.com/blog/the-struggle-to-test-microservices-before-merging" rel="noopener noreferrer"&gt;developer has to merge&lt;/a&gt; their branch to the main trunk so it can be deployed to a shared staging environment.&lt;/p&gt;

&lt;p&gt;This is where velocity goes to die.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The queue:&lt;/strong&gt; Developers wait in line to deploy to staging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The block:&lt;/strong&gt; If one developer breaks staging, everyone is blocked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The noise:&lt;/strong&gt; Testing fails, but is it your code, or did someone else deploy a bad config to the auth-service five minutes ago?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We have normalized this dysfunction. We treat staging as a fragile, sacred monolith. But in an era where we want to deploy multiple times a day, merging to trunk just to see if your code works is backward. It is like pouring concrete before you have checked the blueprints.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Service Branching
&lt;/h3&gt;

&lt;p&gt;We need to bring the Vercel and Neon experience to the Kubernetes backend. We need service branching.&lt;/p&gt;

&lt;p&gt;The goal is simple. Every git branch should result in a testable, isolated environment.&lt;/p&gt;

&lt;p&gt;However, the physics of microservices makes this hard. You cannot duplicate a cluster with 100+ services for every single pull request. The cost and spin-up time would be prohibitive.&lt;/p&gt;

&lt;p&gt;The solution is not duplication. It is isolation.&lt;/p&gt;

&lt;p&gt;Imagine a base environment, your existing staging cluster, that runs the stable version of all your services. When a developer pushes a change to a specific service, the platform shouldn’t clone the cluster. It should simply spin up a lightweight sandbox containing only the modified service.&lt;/p&gt;

&lt;p&gt;Smart routing does the rest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard traffic flows through the stable baseline.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.signadot.com/blog/sandbox-testing-the-devex-game-changer-for-microservices" rel="noopener noreferrer"&gt;Test traffic is intercepted and rerouted only to the sandboxed service.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If the sandboxed service needs to call other services, it routes back into the stable baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives the developer the experience of a full, dedicated environment with a fraction of the infrastructure footprint.&lt;/p&gt;

&lt;h3&gt;
  
  
  The New Mental Model: Git Equals Environment
&lt;/h3&gt;

&lt;p&gt;For this to work at scale, platform engineers need to provide a clean mental model that maps source code directly to infrastructure.&lt;/p&gt;

&lt;p&gt;This is what the new standard looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trunk (main) corresponds to the baseline environment (staging). This is the source of truth. It represents the stable state of the world where all services are interacting as expected.&lt;/li&gt;
&lt;li&gt;Feature branch (feat-xyz) corresponds to a sandbox environment. This is ephemeral. It lives only as long as the PR is open. It contains only the delta of the services that have changed in that specific branch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a developer opens a PR, they do not need to think about clusters or namespaces. They just get a dedicated playground that mirrors their branch perfectly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdazpgwch5no3nk9hk3rn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdazpgwch5no3nk9hk3rn.png" alt=" " width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Holy Grail: The Full Virtual Stack
&lt;/h3&gt;

&lt;p&gt;When you combine this service branching approach with the existing tools for frontend and database branching, you unlock something powerful: a full virtual stack per branch.&lt;/p&gt;

&lt;p&gt;Imagine a workflow where a developer creates a branch, and magically, a complete, isolated environment materializes. To the developer, it feels like they have their own private copy of the entire company’s infrastructure.&lt;/p&gt;

&lt;p&gt;This includes frontend, backend services and database schemas. They are all aligned to their specific code changes.&lt;/p&gt;

&lt;p&gt;They can run end-to-end integration tests on their branch before merging. They can hand a URL to a product manager to demo the feature. They can validate complex migrations safely. It is a dedicated reality for their feature, created instantly and destroyed just as quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9llrwcfgh7asl0wvsamu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9llrwcfgh7asl0wvsamu.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters: Speed and Quality at Scale
&lt;/h3&gt;

&lt;p&gt;This model shifts the paradigm from serial blocking to massive parallelism.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remove the bottleneck:&lt;/strong&gt; Large engineering teams no longer have to queue up for staging. You can have 10, 50 or 100 developers and agents testing simultaneously without stepping on each other’s toes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;True shift left:&lt;/strong&gt; Integration testing happens during development, not after the merge. You catch the bug when you write it, not three days later when the staging build fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More quality, faster:&lt;/strong&gt; When testing is easy and isolated, people do more of it. We stop fearing deployments and start treating them as routine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a software delivery pipeline that is both significantly faster and more stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Closing the Gap
&lt;/h3&gt;

&lt;p&gt;The technology to do this exists. The patterns are proven. It is time for platform teams to stop managing static environments and start managing dynamic, ephemeral workflows.&lt;/p&gt;

&lt;p&gt;If you are looking to implement this service branching layer to complete your testing strategy, this is exactly what Signadot was built for. &lt;a href="https://www.signadot.com/?utm_content=inline+mention" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt; provides the orchestration layer that brings request-based isolation to Kubernetes.&lt;/p&gt;

&lt;p&gt;Stop merging to test. Start branching to validate.&lt;/p&gt;

</description>
      <category>cloudnative</category>
      <category>productivity</category>
      <category>devops</category>
      <category>testing</category>
    </item>
    <item>
      <title>Agentic Coding Tools Are Accelerating Output, Not Velocity</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Mon, 12 Jan 2026 16:14:19 +0000</pubDate>
      <link>https://forem.com/signadot/agentic-coding-tools-are-accelerating-output-not-velocity-51fc</link>
      <guid>https://forem.com/signadot/agentic-coding-tools-are-accelerating-output-not-velocity-51fc</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this article on &lt;a href="https://www.signadot.com/blog/agentic-coding-tools-are-accelerating-output-not-velocity?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Agentic+Coding+Tools+Are+Accelerating+Output%2C+Not+Velocity" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We are at a pivotal moment in software development. In just a few years, LLMs have gone from impressive chatbots to full-blown coding agents built directly into your IDE.  &lt;/p&gt;

&lt;p&gt;This has the entire industry racing to accelerate developer velocity with AI. With &lt;a href="https://survey.stackoverflow.co/2025/" rel="noopener noreferrer"&gt;84% of developers now using or planning to use AI tools&lt;/a&gt;, and companies like Google stating that 25% or more of their code is now generated by AI, the mandate is clear.&lt;/p&gt;

&lt;p&gt;So far, this hasn’t translated into significantly more code being generated, but that is already changing. Models and tooling are evolving rapidly, with agents now able to work in parallel on longer coding tasks.&lt;/p&gt;

&lt;p&gt;Right now, this workflow is still maturing. We are in a transition phase where tools are helpful but often require heavy verification, limiting the actual output of code that makes it to PR. But as the tools continue to get better, the per-developer output of code is only going to increase.&lt;/p&gt;

&lt;p&gt;And that is the goal. Teams are adopting these tools now with the expectation that they will continue to improve developer velocity, ultimately enabling the same teams to write 5x or 10x more code. &lt;/p&gt;

&lt;p&gt;But the future state where that goal is achieved presents a critical challenge. The resulting explosion in code output will inevitably make existing CI/CD bottlenecks worse in direct proportion to the coding efficiencies gained.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Invisible Queue
&lt;/h3&gt;

&lt;p&gt;Picture the emerging workflow. A developer runs three Cursor sessions at once. One refactors the auth service. Another adds a feature to the notification system. A third optimizes database queries. By the end of the day she has three PRs ready to publish. Multiply that by a hundred developers and you get a sense of the scale of code that will be hitting legacy CI/CD pipelines. &lt;/p&gt;

&lt;p&gt;Without modernizing validation infrastructure in parallel with developer tooling, the acceleration in code generation will never translate to an equal acceleration in productivity. Instead, it will create a merge crisis. All those PRs will continue to pile up at the exact same chokepoint: validation.&lt;/p&gt;

&lt;p&gt;Code review becomes the new bottleneck. Even with AI-assisted reviews, humans still need to sign off. If teams achieve their goal of 5x developer output, that translates to hundreds of PRs a week needing eyeballs, testing, and approval. The queue grows. PRs age. Conflicts multiply.&lt;/p&gt;

&lt;p&gt;But the real chaos happens post-merge. All that code lands in staging and staging breaks, because nobody could actually test how their microservices changes interact until they're all deployed together. Now 100 developers are blocked, staring at a broken staging environment, waiting for someone to untangle the mess.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Only Way Forward
&lt;/h3&gt;

&lt;p&gt;There's only one solution, and it's counterintuitive: test more, but &lt;em&gt;earlier&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Not unit tests. Not mocks. Real integration tests. End-to-end tests. Running against actual services. But here's the key: run them at the granularity of every code change, during local development, before the PR even exists.&lt;/p&gt;

&lt;p&gt;Think about it differently. Right now, that Cursor session is generating code. It writes a new API endpoint, updates three service calls, modifies a database schema. The developer reviews the code, thinks it looks good, publishes the PR. Then the waiting starts. Code review queue. CI pipeline. Manual testing in staging. Maybe it works. Maybe it doesn't. Either way, you find out hours or days later.&lt;/p&gt;

&lt;p&gt;What if instead, while Cursor is writing that code, it's also spinning up a lightweight test environment, deploying those changes, running integration tests, and validating everything works, all before the developer even switches back to that tab?&lt;/p&gt;

&lt;p&gt;That's not science fiction. That's how development needs to work in the agentic era.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shifting Left, For Real This Time
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ijqjzusqhms0e1sefy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ijqjzusqhms0e1sefy5.png" alt=" " width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've been talking about "shift left" for years. This time it's not optional. It's critical for survival.&lt;/p&gt;

&lt;p&gt;The paradigm: every unit of code produced by an AI agent gets tested end-to-end during local development. Not after merge. Not during the PR review. &lt;em&gt;While being written.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When this works, something magical happens. The PR that lands for review isn't a leap of faith. It's already validated. The reviewer sees: "15 integration tests passed. 3 end-to-end flows validated. Contract tests clean." Code review becomes fast. Confidence is high. Merges happen quickly. Staging stays green.&lt;/p&gt;

&lt;p&gt;This is where Signadot comes in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making It Real
&lt;/h3&gt;

&lt;p&gt;The solution isn't better code review tools or faster CI pipelines. It's parallelizing validation itself.&lt;/p&gt;

&lt;h4&gt;
  
  
  Parallelize Validation
&lt;/h4&gt;

&lt;p&gt;To unblock the queue, we must enable full integration testing during local development.&lt;/p&gt;

&lt;p&gt;Signadot allows developers to use the existing staging environment as a shared baseline. By isolating requests rather than duplicating infrastructure, every developer gets a sandboxed virtual private staging environment.&lt;/p&gt;

&lt;p&gt;This allows 100+ developers to test in parallel on the same cluster without impacting each other. No queue. No contention. No waiting for staging to be free.&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent-Native Infrastructure via MCP
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jjrrbaywqxidxfpb81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jjrrbaywqxidxfpb81.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Signadot integrates directly into the AI coding workflow via the Signadot MCP (Model Context Protocol) Server.&lt;/p&gt;

&lt;p&gt;Tools like Cursor and Claude Code can now "speak" infrastructure. They use plain English commands to instantiate sandboxes, configure routing, and manage resources directly from the agent interface.&lt;/p&gt;

&lt;p&gt;The agent doesn't need you to switch contexts. It handles infrastructure the same way it handles code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bi-Directional Tunneling
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sd6qhgcss0dikkrcfhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3sd6qhgcss0dikkrcfhh.png" alt=" " width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the sandbox is requested, Signadot establishes a secure, bi-directional tunnel between the agent's local environment and the remote Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;Requests triggered from the staging UI in the cluster are routed to your local machine. Simultaneously, your locally running service directly hits remote dependencies—database, auth, other microservices—in the cluster. You're testing against the real stack, not mocks.&lt;/p&gt;

&lt;h4&gt;
  
  
  AI-Driven Test Generation via Traffic Capture
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww1fnefvudx4r213bqqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww1fnefvudx4r213bqqp.png" alt=" " width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you interact with your local service, Signadot captures the actual request/response traffic flows. The coding agent analyzes this captured traffic to automatically generate functional API tests, ensuring your changes don't break existing contracts.&lt;/p&gt;

&lt;p&gt;No manual test writing. The agent sees how your service actually behaves and writes tests that validate those behaviors.&lt;/p&gt;

&lt;h4&gt;
  
  
  Autonomous Debugging Loop
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4o5vlay00769huqjpy7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4o5vlay00769huqjpy7.png" alt=" " width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test failed? The agent reads the live logs streamed directly from the Signadot sandbox to pinpoint the error. It corrects the code and reruns the tests instantly in the same sandbox.&lt;/p&gt;

&lt;p&gt;This closes the feedback loop without human intervention. The agent iterates until tests pass, all during local development.&lt;/p&gt;

&lt;p&gt;When you publish the PR, you attach the sandbox link. Your reviewer sees exactly what was tested, how it performed, and what edge cases were validated. The PR goes from "needs investigation" to "LGTM" in minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Unlock
&lt;/h3&gt;

&lt;p&gt;Agentic coding tools promise 10x code generation and companies are racing to achieve that goal. But without granular pre-merge testing, we will remain stuck with 1x shipping velocity.&lt;/p&gt;

&lt;p&gt;The companies that will win are not the ones generating the most code. They are the ones who can validate and ship that code as fast as it is written. That means testing at the same granularity as coding. Every change. Every branch. Every experiment.&lt;/p&gt;

&lt;p&gt;The bottleneck is moving. The question is whether your testing infrastructure is moving with it.&lt;/p&gt;

&lt;p&gt;At Signadot, we're building the platform that makes agentic coding actually productive. Because code in a PR queue isn't value. Code in production is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>productivity</category>
    </item>
    <item>
      <title>It’s Time To Kill Staging: The Case for Testing in Production</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Thu, 04 Dec 2025 16:02:20 +0000</pubDate>
      <link>https://forem.com/signadot/its-time-to-kill-staging-the-case-for-testing-in-production-521a</link>
      <guid>https://forem.com/signadot/its-time-to-kill-staging-the-case-for-testing-in-production-521a</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/its-time-to-kill-staging-the-case-for-testing-in-production?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=It%E2%80%99s+Time+To+Kill+Staging%3A+The+Case+for+Testing+in+Production" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn how testing in production with the right guardrails can eliminate bottlenecks, reduce costs and help your team ship more reliable code faster.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Staging has always been a necessary evil. New approaches to isolation and on-demand sandboxes have finally made it just plain evil.&lt;/p&gt;

&lt;p&gt;For decades, the staging environment has been a fixture of software development. And for just as long, it has been hated by developers everywhere. It’s the proverbial traffic jam that every developer is forced to sit in just to get their work validated.&lt;/p&gt;

&lt;p&gt;Yet staging is no longer necessary; its time has come and gone. Newer &lt;a href="https://www.signadot.com/blog/smart-ephemeral-environments-share-more-copy-less" rel="noopener noreferrer"&gt;isolation methods&lt;/a&gt; enable developers to test safely in live environments, providing fast, high-fidelity feedback that is impossible to achieve in a staging environment.&lt;/p&gt;

&lt;p&gt;It is time to kill your staging environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem We All Know
&lt;/h3&gt;

&lt;p&gt;Staging environments make sense, in theory. We must test code in a production-like environment before we ship to production. Anything else would be madness.&lt;/p&gt;

&lt;p&gt;But the cure has become its own disease. In any organization with more than a handful of microservices, the staging environment inevitably becomes a wasteland of developer pain and burned cash.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It’s a bottleneck:&lt;/strong&gt; When 50 developers merge code, staging becomes a shared queue. Tests fail not because of bad code, but because another developer deployed a conflicting change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It’s not production:&lt;/strong&gt; We call it “production-like,” but it never has the same data scale, traffic patterns or identity and access management (IAM) policies. This fidelity gap is where dangerous bugs hide.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It kills velocity:&lt;/strong&gt; Commit code, wait for CI, wait for a deploy slot, run a 40-minute test suite. This multihour cycle destroys flow state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nobody maintains it:&lt;/strong&gt; Teams treat it as a dumping ground for unstable builds, further diverging from production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We have accepted this broken workflow for 20 years. We believed it was the only way.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Environment Isolation To Request Isolation
&lt;/h3&gt;

&lt;p&gt;Staging exists because of the assumption that testing must be isolated at the environment level. To test a new version of the payment service, you must deploy it to an environment that also contains a cart service, user service and auth service.&lt;/p&gt;

&lt;p&gt;This assumption is obsolete.&lt;/p&gt;

&lt;p&gt;The new model is request-level isolation. Instead of cloning an entire environment, you spin up only the service you’re changing. This model is enabled by Kubernetes-native platforms that provide on-demand sandboxes for every change and route individual test requests to them.&lt;/p&gt;

&lt;p&gt;Here is how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A new service version is spun up in an on-demand, isolated “sandbox.”&lt;/li&gt;
&lt;li&gt;When a test request is sent (tagged with a unique header), it is routed to the sandboxed service.&lt;/li&gt;
&lt;li&gt;As that service calls its dependencies, those calls are routed back to the stable, baseline services in production.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.signadot.com/blog/shifting-testing-left-the-request-isolation-solution" rel="noopener noreferrer"&gt;test request remains isolated&lt;/a&gt; as it travels through the stack, while all other traffic flows normally.&lt;/li&gt;
&lt;/ul&gt;
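
&lt;p&gt;The routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform implements it: the header name x-sandbox-id, the Router class and the service addresses are all hypothetical, and a real system would do this in a sidecar or service mesh rather than in application code.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical header name; a real platform defines its own routing key.
ROUTING_HEADER = "x-sandbox-id"


@dataclass
class Router:
    """Picks a backend per request: sandbox if tagged and overridden, else baseline."""

    # service name -> address of the stable baseline version
    baseline: dict[str, str]
    # (service name, sandbox id) -> address of the sandboxed version
    sandboxes: dict[tuple[str, str], str] = field(default_factory=dict)

    def resolve(self, service: str, headers: dict[str, str]) -> str:
        sandbox_id = headers.get(ROUTING_HEADER)
        if sandbox_id is None:
            # Untagged traffic always flows through the baseline.
            return self.baseline[service]
        # Tagged traffic reaches the sandbox only for services it overrides;
        # every other dependency falls back to the baseline.
        return self.sandboxes.get((service, sandbox_id), self.baseline[service])


# Example addresses are made up for illustration.
router = Router(
    baseline={"payment": "payment.prod:8080", "cart": "cart.prod:8080"},
    sandboxes={("payment", "pr-123"): "payment-pr-123.sandbox:8080"},
)
```

&lt;p&gt;With this setup, a request tagged pr-123 resolves the payment service to its sandboxed version and the cart service to the baseline, while untagged requests never see the sandbox. The sketch assumes every service forwards the header on its outgoing calls, which is what keeps the request isolated end to end.&lt;/p&gt;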

&lt;p&gt;With this approach, you get high-fidelity testing (real dependencies, real network policies) without the downsides of shared environments (no collisions, no queues, dramatically lower cost).&lt;/p&gt;

&lt;p&gt;Request-level isolation can be introduced into a traditional local &amp;gt; staging &amp;gt; production deployment flow to improve it, eliminating contention and long waits for a CI pipeline. But its real power lies in bypassing the need for staging altogether, enabling testing in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Safety Model
&lt;/h3&gt;

&lt;p&gt;Testing in production sounds dangerous. It’s not when you have the right guardrails.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strict data isolation is critical.&lt;/strong&gt; The biggest concern is that a sandboxed service could corrupt data in other services. The solution is simple. The same routing header that isolates test traffic also routes database operations to separate test databases. For example, when a test request flows through the system, each service recognizes the test context and directs all database writes to isolated test data stores, completely separate from production databases. Test users interact only with test data. Production data remains untouched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multitenancy provides the foundation.&lt;/strong&gt; Virtual private network (VPN) restrictions ensure test traffic originates only from authorized internal networks. Audit logs track every sandbox session for compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request routing provides blast radius control.&lt;/strong&gt; Your sandbox is isolated at the request level. Your colleagues’ work is unaffected. Production traffic flows normally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive rollout remains essential.&lt;/strong&gt; Sandboxes handle preproduction validation, but you still use canary deployments, feature flags and observability to safely roll out to real users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Answering the Hard Questions
&lt;/h3&gt;

&lt;p&gt;Testing in production is the logical evolution of the movement to shift testing left. But making such a foundational change to your CI/CD pipeline naturally brings up some critical questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;“How do you guarantee test traffic doesn’t corrupt production data?”&lt;/strong&gt; Test writes are isolated. The isolation header redirects all database writes to ephemeral, nonproduction data stores that are destroyed after the test. Production data is never touched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“What about the blast radius? How do you stop a bad test from DDoSing a downstream service?”&lt;/strong&gt; Sandboxes are “shadow” deployments built with guardrails like circuit breakers and network policies. A buggy test with runaway network requests is automatically throttled and contained, preventing it from overwhelming baseline services or affecting other users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“This sounds fine for simple APIs, but what about Kafka or gRPC?”&lt;/strong&gt; The isolation model is protocol-agnostic. The isolation header is propagated over gRPC or as a Kafka message header. A sandboxed consumer, for example, reads from the main topic but only processes messages with its unique sandbox ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“What about compliance and audit requirements?”&lt;/strong&gt; This model is more auditable. Every sandbox is tied to a specific user and a pull request/dev session. All test traffic is explicitly tagged with a sandbox ID and user identity, creating a granular audit log that is far superior to a shared staging environment.&lt;/li&gt;
&lt;/ul&gt;
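
&lt;p&gt;The Kafka case can be sketched the same way. The snippet below is an illustration only: it models messages as plain Python objects instead of using a Kafka client, the header name x-sandbox-id is hypothetical, and it assumes the baseline consumer skips any tagged message. A real deployment would also need the baseline to pick up tagged messages that no sandbox claims.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical header name carried on each message.
SANDBOX_HEADER = "x-sandbox-id"


@dataclass
class Message:
    """Stand-in for a Kafka record: a payload plus string headers."""

    value: str
    headers: dict[str, str] = field(default_factory=dict)


def should_process(message: Message, my_sandbox_id: Optional[str]) -> bool:
    """Decide whether this consumer handles the message.

    A sandboxed consumer passes its own sandbox id and handles only messages
    tagged with it. The baseline consumer passes None and, under the
    assumption stated above, handles only untagged messages.
    """
    return message.headers.get(SANDBOX_HEADER) == my_sandbox_id


# Every consumer reads the same topic; filtering happens per message.
topic = [
    Message("order-1"),
    Message("order-2", {SANDBOX_HEADER: "pr-123"}),
    Message("order-3", {SANDBOX_HEADER: "pr-456"}),
]
```

&lt;p&gt;Here the consumer for sandbox pr-123 processes only order-2, and the baseline consumer processes only order-1. In practice each consumer would typically run in its own consumer group so that all of them see every message on the topic.&lt;/p&gt;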

&lt;p&gt;Addressing these questions and making the shift to testing in production will inevitably involve some upfront engineering investment. However, the return on that investment is not just the savings from eliminating the direct infrastructure costs of staging. It’s also a better developer experience, faster product delivery and fewer opportunities lost to competitors that can iterate and ship faster than you.&lt;/p&gt;

&lt;h3&gt;
  
  
  It’s Time To Let Go
&lt;/h3&gt;

&lt;p&gt;Eliminating staging may sound like a pipe dream, but several prominent cloud native teams like &lt;a href="https://careersatdoordash.com/blog/moving-e2e-testing-into-production-with-multi-tenancy-for-increased-speed-and-reliability/" rel="noopener noreferrer"&gt;DoorDash&lt;/a&gt; and &lt;a href="https://www.uber.com/blog/shifting-e2e-testing-left/" rel="noopener noreferrer"&gt;Uber&lt;/a&gt; have already made the shift left to testing in production. Driven by their highly complex &lt;a href="https://www.signadot.com/blog/its-time-to-kill-staging-the-case-for-testing-in-production" rel="noopener noreferrer"&gt;microservice stacks and a need to get better testing&lt;/a&gt; fidelity, they are also realizing huge infrastructure cost savings.&lt;/p&gt;

&lt;p&gt;Teams like these deprecating their staging environments to test in production represent a broader trend: the rejection of approximation in favor of reality. Staging environments are &lt;a href="https://www.signadot.com/blog/why-duplicating-environments-for-microservices-backfires" rel="noopener noreferrer"&gt;artifacts of an era when duplicating infrastructure&lt;/a&gt; was a harder problem to solve than coordinating humans around shared resources.&lt;/p&gt;

&lt;p&gt;That era is ending.&lt;/p&gt;

&lt;p&gt;The future isn’t about building better approximations of production or optimizing your CI pipeline. It’s about adopting an entirely new paradigm. The teams taking this step aren’t just moving faster and cutting costs. They’re also shipping more reliable code.&lt;/p&gt;

&lt;p&gt;It’s time to kill your staging environment.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>devops</category>
      <category>discuss</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Kubernetes Isn't Your AI Bottleneck - It's Your Secret Weapon</title>
      <dc:creator>Signadot</dc:creator>
      <pubDate>Mon, 17 Nov 2025 15:24:20 +0000</pubDate>
      <link>https://forem.com/signadot/kubernetes-isnt-your-ai-bottleneck-its-your-secret-weapon-190j</link>
      <guid>https://forem.com/signadot/kubernetes-isnt-your-ai-bottleneck-its-your-secret-weapon-190j</guid>
      <description>&lt;p&gt;&lt;em&gt;Read this blog on &lt;a href="https://www.signadot.com/blog/kubernetes-isnt-your-ai-bottleneck----its-your-secret-weapon?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=Kubernetes+Isn%E2%80%99t+Your+AI+Bottleneck+%E2%80%94+It%E2%80%99s+Your+Secret+Weapon" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s get one thing straight: In the AI era, the only thing that matters is experimenting faster than your competition. Generative coding assistants are enabling rapid iteration, and users are flocking to whichever platform offers them the latest, most automated features.&lt;/p&gt;

&lt;p&gt;But if you’re at an established company, there’s a huge, complex anchor slowing you down: Kubernetes (K8s).&lt;/p&gt;

&lt;p&gt;You adopted it to scale, and now it feels like a velocity tax. It’s the stumbling block that gets in the way of rapid, AI-assisted coding converting into actual product iteration. You watch agile, AI-first teams shipping 10 times faster on simpler platforms, and you worry that your “mature” stack is leaving you behind.&lt;/p&gt;

&lt;p&gt;I’ll be blunt: If you’re a five-person team trying to find product-market fit, this criticism is 100% correct. You shouldn’t be near K8s. Your constraint is discovery, not delivery. Stop worrying about infrastructure.&lt;/p&gt;

&lt;p&gt;But this article isn’t for you.&lt;/p&gt;

&lt;p&gt;This is for the teams in growth mode and beyond who feel that exact pain. The teams that need to ship fast to compete but are trapped by their own scale.&lt;/p&gt;

&lt;p&gt;I’m here to tell you Kubernetes isn’t your stumbling block. If you use it right, it’s the superpower that will let you win the AI integration race.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottleneck Has Moved
&lt;/h3&gt;

&lt;p&gt;Let’s be real about what’s happening. Cursor, Claude and the whole fleet of AI coding tools are ridiculously good at generating code, to the extent that they are even &lt;a href="https://www.signadot.com/blog/your-next-pull-request-will-come-from-a-product-manager" rel="noopener noreferrer"&gt;enabling a new class of code contributors&lt;/a&gt;. The “blank page” problem is vanishing. I can ask an agent to “refactor this Python service to use a new gRPC endpoint and add retry logic,” and it will generate a 90% correct pull request (PR) in 30 seconds.&lt;/p&gt;

&lt;p&gt;The bottleneck is no longer &lt;em&gt;writing&lt;/em&gt; code. It’s &lt;em&gt;validating&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My engineers’ job is shifting from “typist” to “editor-in-chief.” Their day is less about writing code line by line and more about asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“The agent gave me three valid-looking approaches. Which one is actually better?”&lt;/li&gt;
&lt;li&gt;“This PR looks right, but did it silently break one of the 15 downstream services that depend on it?”&lt;/li&gt;
&lt;li&gt;“How can I test this end to end without spending two days mocking dependencies?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new currency for competitive advantage is speed of experimentation. The team that can validate and ship 10 AI-generated experiments while the other team is still setting up their environment is going to win. Period.&lt;/p&gt;

&lt;p&gt;And this is where K8s, the platform everyone loves to hate, becomes your secret weapon.&lt;/p&gt;

&lt;h3&gt;
  
  
  K8s Is the Ultimate Experimentation Platform
&lt;/h3&gt;

&lt;p&gt;The old complaint about K8s was, “It’s too complex! I don’t want to run 20+ microservices on my laptop just to test a one-line change.”&lt;/p&gt;

&lt;p&gt;That argument is dead. If you’re still doing that, you’re using it wrong.&lt;/p&gt;

&lt;p&gt;Tools (like my own startup, &lt;a href="https://www.signadot.com/" rel="noopener noreferrer"&gt;Signadot&lt;/a&gt;) can help. No one runs the whole stack locally anymore. You connect your local machine to the cluster, or better yet, you get an isolated “sandbox” inside the cluster for every single PR.&lt;/p&gt;

&lt;p&gt;This is the game-changer that unlocks rapid experimentation at enterprise scale.&lt;/p&gt;

&lt;p&gt;When an engineer gets a PR from an AI agent, they shouldn’t be testing it on their MacBook. They should have a workflow that, with one click, gives them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;An ephemeral sandbox:&lt;/strong&gt; K8s is brilliant at this. It spins up a lightweight, isolated environment containing just the new version of their service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request-level isolation:&lt;/strong&gt; This new &lt;a href="https://www.signadot.com/blog/microservices-testing-4-use-cases-for-sandbox-environments" rel="noopener noreferrer"&gt;“sandbox” environment&lt;/a&gt; intelligently routes only the developer’s test requests to their new code. All other traffic flows to the stable “baseline” services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faznv766cs6ugfnbnkepz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faznv766cs6ugfnbnkepz.png" alt=" " width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A preview URL:&lt;/strong&gt; The developer gets a unique URL to test their AI-generated feature against the entire production-like stack, without colliding with anyone else or breaking staging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp06g8uxa04y3ndlqb9ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp06g8uxa04y3ndlqb9ej.png" alt=" " width="800" height="730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, that engineer can review three different AI-generated PRs in an hour. They can run a full suite of end-to-end (E2E) tests against each one. They’re not validating code in a vacuum; they’re validating outcomes in a real-world environment.&lt;/p&gt;
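
&lt;p&gt;The request-level isolation described above boils down to a routing decision made per request. Here is a rough sketch of that decision, not any real routing layer: the &lt;code&gt;x-sandbox-id&lt;/code&gt; header, the service names and the addresses are all made up for illustration.&lt;/p&gt;

```python
from typing import Dict, Mapping

ROUTING_HEADER = "x-sandbox-id"  # hypothetical header name

# Hypothetical baseline: the stable version of every service.
BASELINE: Dict[str, str] = {
    "checkout": "checkout.baseline.svc:8080",
    "payments": "payments.baseline.svc:8080",
}

# Hypothetical sandboxes: each one overrides only the services its PR changed.
SANDBOX_ROUTES: Dict[str, Dict[str, str]] = {
    "pr-123": {"checkout": "checkout-pr-123.sandboxes.svc:8080"},
}


def resolve(service: str, headers: Mapping[str, str]) -> str:
    """Pick the destination for one request to one service.

    Requests carrying a known sandbox ID go to that sandbox's version of
    the service if it has one; everything else goes to baseline.
    """
    sandbox_id = headers.get(ROUTING_HEADER, "")
    overrides = SANDBOX_ROUTES.get(sandbox_id, {})
    return overrides.get(service, BASELINE[service])


test_request = {ROUTING_HEADER: "pr-123"}
sandboxed = resolve("checkout", test_request)   # the PR's new code
shared = resolve("payments", test_request)      # untouched dependency
normal = resolve("checkout", {})                # everyone else's traffic
```

&lt;p&gt;The point of the design is that a sandbox only contains what changed. The test request reaches the new &lt;code&gt;checkout&lt;/code&gt; code and still uses the shared &lt;code&gt;payments&lt;/code&gt; service, which is why the approach stays lightweight as the number of PRs grows.&lt;/p&gt;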

&lt;h3&gt;
  
  
  Stop Validating Code and Start Validating Hypotheses
&lt;/h3&gt;

&lt;p&gt;This brings me to my last point: testing.&lt;/p&gt;

&lt;p&gt;What about running all those tests? K8s gives you the ultimate building blocks. Sure, you could just use a managed CI provider, but that’s like building your factory inside someone else’s warehouse. You’re stuck with their rules, their limitations and their pricing. You’ll inevitably outgrow it. K8s is about owning the factory itself. It gives you the control to build a custom validation platform that is tailored precisely to your company’s workflow. You’re not renting a generic tool; you’re building a durable, competitive asset.&lt;/p&gt;

&lt;p&gt;Your whole validation pipeline runs on the same platform as your application.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agent generates a PR.&lt;/li&gt;
&lt;li&gt;CI kicks in, builds a container and deploys it to a K8s sandbox.&lt;/li&gt;
&lt;li&gt;A series of Kubernetes Jobs is triggered, hammering that &lt;a href="https://www.signadot.com/blog/sandbox-testing-the-devex-game-changer-for-microservices" rel="noopener noreferrer"&gt;sandbox’s preview URL with E2E tests&lt;/a&gt;, load tests and contract tests.&lt;/li&gt;
&lt;li&gt;The engineer gets a report: “Approach A passed all tests. Approach B failed the latency test under load.”&lt;/li&gt;
&lt;/ul&gt;
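
&lt;p&gt;The last step, turning a pile of Job results into a one-line verdict per approach, is the simplest to sketch. The example below assumes the results have already been collected from the test Jobs; it makes no Kubernetes API calls, and the &lt;code&gt;JobResult&lt;/code&gt; shape and suite names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobResult:
    """Hypothetical outcome of one test Job run against one sandbox."""
    approach: str     # which candidate PR the Job ran against
    suite: str        # e.g. "e2e", "load", "contract"
    passed: bool
    detail: str = ""  # short human-readable failure description


def summarize(results: List[JobResult]) -> Dict[str, str]:
    """Collapse per-Job results into one verdict per approach."""
    failures: Dict[str, List[str]] = {}
    for result in results:
        failed = failures.setdefault(result.approach, [])
        if not result.passed:
            failed.append(result.detail or result.suite)
    return {
        approach: "passed all tests" if not failed
        else "failed: " + ", ".join(failed)
        for approach, failed in failures.items()
    }


report = summarize([
    JobResult("A", "e2e", True),
    JobResult("A", "load", True),
    JobResult("A", "contract", True),
    JobResult("B", "e2e", True),
    JobResult("B", "load", False, "latency test under load"),
    JobResult("B", "contract", True),
])
```

&lt;p&gt;Here &lt;code&gt;report&lt;/code&gt; maps approach A to “passed all tests” and approach B to “failed: latency test under load,” which is the kind of summary the engineer acts on.&lt;/p&gt;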

&lt;p&gt;This is how you build an “experimentation engine,” not a “CI/CD pipeline.” You’re creating a factory for validating hypotheses at scale. The human’s job is to manage the factory, not turn the wrenches.&lt;/p&gt;

&lt;p&gt;So yes, K8s is complex. It’s a beast. But in an era where code is generated instantly, the battleground has shifted from creation to validation. And the only platform built to handle that level of high-concurrency, isolated, on-demand experimentation at scale is Kubernetes.&lt;/p&gt;

&lt;p&gt;If you’re in growth mode, speed of innovation is all that matters. Stop arguing about YAML and start building your experimentation engine.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>testing</category>
      <category>ai</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
