<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jaskaran Singh</title>
    <description>The latest articles on Forem by Jaskaran Singh (@jaskaran_singh).</description>
    <link>https://forem.com/jaskaran_singh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891457%2Fcf2bff88-3ae7-4d38-a2ed-d62c86263079.jpg</url>
      <title>Forem: Jaskaran Singh</title>
      <link>https://forem.com/jaskaran_singh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jaskaran_singh"/>
    <language>en</language>
    <item>
      <title>Google Cloud NEXT '26 Was a Plumbing Conference. That's Why It Matters.</title>
      <dc:creator>Jaskaran Singh</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:31:40 +0000</pubDate>
      <link>https://forem.com/jaskaran_singh/google-cloud-next-26-was-a-plumbing-conference-thats-why-it-matters-b1f</link>
      <guid>https://forem.com/jaskaran_singh/google-cloud-next-26-was-a-plumbing-conference-thats-why-it-matters-b1f</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Read enough keynote recaps and the shape of them becomes familiar. Model names, benchmark numbers, a CEO quote about whatever this year's era is called. You close the tab, write a Jira ticket, and wonder whether any of it was about your job.&lt;/p&gt;

&lt;p&gt;Today's Google Cloud NEXT '26 opening keynote had all of that. Thomas Kurian in Las Vegas, Sundar Pichai on video, Apple's logo unexpectedly behind the Google CEO's head. TPU generation eight. "The agentic cloud." The Gemini Enterprise Agent Platform — which is mostly what Vertex AI used to be, renamed and consolidated.&lt;/p&gt;

&lt;p&gt;Here's what I kept coming back to, though: the announcements that will actually affect what developers ship this year weren't the ones with applause breaks.&lt;/p&gt;

&lt;p&gt;They were the boring ones. The plumbing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Mean By Plumbing
&lt;/h2&gt;

&lt;p&gt;The boring infrastructure almost always decides whether a technology ships at scale. HTTP wasn't exciting. TCP/IP wasn't a keynote moment. Nobody clapped for DNS. But that's the layer where things either work reliably or don't work at all.&lt;/p&gt;

&lt;p&gt;AI agents are at exactly that point right now. Everyone roughly agrees on what they want agents to do. The part that has quietly killed a hundred enterprise AI projects is different: getting agents to talk to each other across systems, hold context between sessions, and do it without becoming a security nightmare your team has to clean up later.&lt;/p&gt;

&lt;p&gt;That's most of what Google actually shipped today. Dressed up in model demos and stage lighting, but the substance is infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The A2A Protocol Reaching v1.0
&lt;/h2&gt;

&lt;p&gt;The Agent2Agent (A2A) protocol reached v1.0 today and got handed to the Linux Foundation's Agentic AI Foundation for governance. It got maybe thirty seconds on stage.&lt;/p&gt;

&lt;p&gt;A2A answers a question that has made multi-agent architectures genuinely painful: how does an agent on Platform A discover, trust, and delegate to an agent on Platform B, when neither platform knows anything about the other's internals?&lt;/p&gt;

&lt;p&gt;The answer is Agent Cards. Each agent publishes a signed card — cryptographically verified via domain signatures — declaring what it can do, what inputs it accepts, and how to reach it. Another agent fetches that card, checks the signature, and delegates with some real basis for trusting that the capability is what it claims.&lt;/p&gt;
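&lt;p&gt;The v1.0 spec defines the real card schema; as a rough sketch of the consumption pattern (the field names and the signature check below are illustrative placeholders, not the normative A2A format), the flow looks something like this:&lt;/p&gt;

```python
import json

# Illustrative sketch of consuming an Agent Card. Field names and the
# signature check are placeholders, not the normative A2A v1.0 schema.
def parse_agent_card(raw: str) -> dict:
    """Parse a published card and pull out the declared capabilities."""
    card = json.loads(raw)
    # A real consumer verifies the card's signature against the
    # publishing domain before trusting anything in it.
    if "signature" not in card:
        raise ValueError("unsigned card: refuse to delegate")
    return {
        "name": card["name"],
        "endpoint": card["endpoint"],
        "skills": [skill["id"] for skill in card.get("skills", [])],
    }

def can_delegate(card: dict, skill_id: str) -> bool:
    """Delegate only if the remote agent declares the skill we need."""
    return skill_id in card["skills"]
```

The point of the pattern is that trust is established from the card, not from shared internals: the delegating agent never needs to know how the remote agent implements a skill, only that it signed a claim to have it.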

&lt;p&gt;Before this, "multi-agent" usually meant custom glue code, bespoke APIs, or just hoping two SDKs from the same vendor happened to compose without breaking. The production signal is worth noting: 150 organizations are running A2A in production right now, not in pilots — routing real workloads between agents built on different vendors' stacks. It launched roughly a year ago with 50 partner organizations.&lt;/p&gt;

&lt;p&gt;Native A2A support now ships in ADK, LangGraph, CrewAI, LlamaIndex, Semantic Kernel, and AutoGen. That's not a Google-curated list of close partners. That's where developers are actually building agent systems.&lt;/p&gt;

&lt;p&gt;The Linux Foundation move deserves more credit than it's getting. When a protocol lives in one company's GitHub, every potential adopter carries a quiet question in the back of their head: &lt;em&gt;what happens when Google gets bored with this?&lt;/em&gt; That friction is real, and it's killed protocols before. Handing it to neutral governance before mass adoption removes the question. It's the right call — and it wasn't the obviously self-interested one, since Google could have used the protocol as a lock-in mechanism instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  ADK v1.0 and What "Stable" Actually Buys You
&lt;/h2&gt;

&lt;p&gt;The Agent Development Kit hit stable v1.0 releases today across Python, Go, and Java, with TypeScript available as well. The announcement was brief. The implications are less so.&lt;/p&gt;

&lt;p&gt;The 0.x releases were experimentally useful — people shipped real things with them. But "production-ready" means something specific when your agents are taking autonomous actions: stable APIs you can actually depend on, predictable versioning, and a security model you can explain to someone who isn't you.&lt;/p&gt;

&lt;p&gt;v1.0 ships with Model Armor, which defends against indirect prompt injection. This is the attack vector most agent systems ignore until it becomes a real problem — a malicious payload hidden in retrieved content that hijacks agent behavior mid-task. It also puts zero-trust architecture at the protocol level, with access managed through Cloud IAM and full audit logging. When an agent does something unexpected at 2am, you can find out what it did and why, rather than guessing.&lt;/p&gt;
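&lt;p&gt;A crude filter makes the threat concrete. To be clear, this is a naive illustration of the attack surface, not how Model Armor actually works; the real defense is a managed service, and pattern matching alone is easy to evade:&lt;/p&gt;

```python
import re

# Naive illustration of indirect prompt injection: retrieved content is
# supposed to be data, but a model can mistake it for instructions.
# This is NOT how Model Armor works -- just a sketch of the problem.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your system prompt",
    r"you are now",
]

def flag_suspicious_content(retrieved_text: str) -> bool:
    """Flag retrieved documents that contain instruction-shaped payloads."""
    lowered = retrieved_text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

A payload like this hiding in one of a thousand ingredient PDFs is exactly the scenario the keynote demo glossed over.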

&lt;p&gt;If you've been waiting for ADK to stabilize before committing: the spec is frozen, the security model exists, and the governance is neutral. That's what stable means.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Line from the Keynote Worth Sitting With
&lt;/h2&gt;

&lt;p&gt;Thomas Kurian said this during his talk: &lt;em&gt;"You have moved beyond the pilot. The experimental phase is behind us."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I've been thinking about that framing. It's not a description of where enterprises actually are. It's a description of where Google needs them to be.&lt;/p&gt;

&lt;p&gt;Most enterprise AI projects are still in pilot. The gap between a working demo and something that runs reliably across production data, security policies, and organizational complexity has ended more AI initiatives than bad models ever did. That gap is exactly what makes today's less-glamorous announcements worth attention.&lt;/p&gt;

&lt;p&gt;Knowledge Catalog grounds agents in actual business context across an entire data estate. Memory Bank gives agents persistent state across sessions, so they don't start from scratch on every interaction. Agent Identity manages agent credentials through the same IAM system that manages human credentials — which means your security team can audit them the same way.&lt;/p&gt;

&lt;p&gt;None of this demos well. "Agent credentials managed through IAM with audit logging" doesn't generate applause. But it's what makes an agent your CISO will let near production data, rather than one that stays permanently in sandbox.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I'm Skeptical
&lt;/h2&gt;

&lt;p&gt;The word "open" appears a lot today. A2A is an open protocol. ADK is open source. The Model Garden includes 200+ models from multiple vendors, including Anthropic Claude.&lt;/p&gt;

&lt;p&gt;All true. And also: the smoothest path through every one of these tools runs directly through Google Cloud. Agent Engine for managed hosting. Apigee as the API-to-agent gateway. Vertex AI as the deployment target.&lt;/p&gt;

&lt;p&gt;The protocol is portable. The operational infrastructure is not.&lt;/p&gt;

&lt;p&gt;This isn't necessarily a problem — someone has to build the runtime, and Google's is genuinely good. But developers should be clear with themselves about what "open" covers here. The code you write on ADK travels with you. The observability tooling, the managed hosting, the audit trail — those are Google Cloud products. That's a real dependency. Know what you're choosing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Actually Do with This
&lt;/h2&gt;

&lt;p&gt;If you're building agents right now: read the A2A spec before the SDK docs. Understanding Agent Cards — what goes into them, how signing works, what a well-defined skill description looks like — shapes how you design agents from the start. Adding discoverability to a system you built as closed is miserable. The &lt;a href="https://google.github.io/adk-docs/a2a/" rel="noopener noreferrer"&gt;official ADK A2A docs&lt;/a&gt; are genuinely readable and worth an hour.&lt;/p&gt;

&lt;p&gt;If you're choosing a multi-agent framework: A2A v1.0 in production at 150 organizations, across every major framework, is a meaningful signal about where multi-agent interoperability is actually converging. MCP is worth understanding too — the two solve different layers of the same problem. But A2A is where cross-platform agent composition is happening in production, not in demos.&lt;/p&gt;

&lt;p&gt;If you're speccing an enterprise AI project: look at Memory Bank and Agent Identity before you finalize the architecture. Persistent agent state and proper credential management are the two things that most demo architectures quietly skip. If yours skips them too, you'll add them later, under pressure, and it won't go cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part That's Easy to Miss
&lt;/h2&gt;

&lt;p&gt;The keynote demo that got the biggest reaction showed a Gemini agent pulling data from thousands of ingredient PDFs, catching a soy allergen buried in one of them, then calling research agents to build a full market projection — autonomously, while the presenter talked.&lt;/p&gt;

&lt;p&gt;That's a real capability and it's impressive. But it works because of things that weren't in the demo: agents that can find each other by capability, verify each other's identity, maintain context between calls, and do it inside an auditable security boundary.&lt;/p&gt;

&lt;p&gt;The conference was loud today. TPU naming conventions, Apple on a Google slide, Sundar Pichai explaining that 75% of Google's new code is now AI-generated. That's all interesting. The part that matters for what developers actually ship is quieter: a protocol standard under neutral governance, running in production, with a security story you can defend.&lt;/p&gt;

&lt;p&gt;Infrastructure doesn't announce itself. It just works, until the day you need it and it's not there.&lt;/p&gt;

&lt;p&gt;The developer keynote is tomorrow at 10:30 AM PT on the DEV homepage. Worth catching for how the ADK and A2A story gets told to a technical audience rather than an executive one.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>The Mental Model I Use to Write Prompts That Don't Produce Garbage Code</title>
      <dc:creator>Jaskaran Singh</dc:creator>
      <pubDate>Wed, 22 Apr 2026 20:37:08 +0000</pubDate>
      <link>https://forem.com/jaskaran_singh/the-mental-model-i-use-to-write-prompts-that-dont-produce-garbage-code-1kgh</link>
      <guid>https://forem.com/jaskaran_singh/the-mental-model-i-use-to-write-prompts-that-dont-produce-garbage-code-1kgh</guid>
      <description>&lt;p&gt;Most prompt engineering advice focuses on syntax. Add "think step by step." Specify the language. Say "you are an expert." Some of that helps. None of it addresses the actual reason prompts produce bad code.&lt;/p&gt;

&lt;p&gt;The actual reason: the model can only solve the problem you described. Not the problem you have.&lt;/p&gt;

&lt;p&gt;Those are different more often than people realize. I know because my job is to find where they diverge. I've spent the last year evaluating AI-generated code professionally: writing rubrics, running adversarial tests, doing multi-turn reviews. The failure modes I see aren't random. They trace back, almost every time, to something in how the task was specified.&lt;/p&gt;

&lt;p&gt;There's a reason Andrej Karpathy &lt;a href="https://x.com/karpathy/status/1937902205765607626" rel="noopener noreferrer"&gt;argued in 2025&lt;/a&gt; we should call this "context engineering" rather than prompt engineering. "In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information." The framing shift matters. A request is something you fire off. A specification is something you construct.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompts Are Specifications, Not Requests
&lt;/h2&gt;

&lt;p&gt;When you ask a colleague to write a function, they bring context you never stated. They know the codebase. They know what "production-ready" means at your company. They know the last time something similar broke and why. They fill gaps with judgment.&lt;/p&gt;

&lt;p&gt;A model has none of that. It has your words and its training data. When you leave gaps, it fills them with statistically likely defaults. Those defaults are often fine. When they're not, the code looks correct and isn't.&lt;/p&gt;

&lt;p&gt;The shift that changed how I prompt: stop thinking of a prompt as a request and start thinking of it as a specification. A specification answers questions the implementer will have whether or not you thought to ask them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Things Every Code Prompt Is Missing
&lt;/h2&gt;

&lt;p&gt;After a few hundred evaluations, four categories of missing information account for most of the failures I see.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Environment
&lt;/h3&gt;

&lt;p&gt;The model doesn't know where this code lives. The same function needs a completely different implementation depending on whether it runs in a coroutine context, a background thread, a serverless function, or a single-threaded script.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Write&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;that&lt;/span&gt; &lt;span class="nx"&gt;fetches&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;caches&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nc"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;Kotlin&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;fetches&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;caches&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="n"&gt;runs&lt;/span&gt; &lt;span class="n"&gt;inside&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;ViewModel&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;viewModelScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;The&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;targets&lt;/span&gt; &lt;span class="nc"&gt;Android&lt;/span&gt; &lt;span class="nc"&gt;API&lt;/span&gt; &lt;span class="mi"&gt;26&lt;/span&gt;&lt;span class="p"&gt;+.&lt;/span&gt; &lt;span class="nc"&gt;We&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Coroutines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;not&lt;/span&gt; &lt;span class="nc"&gt;RxJava&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first prompt produces code. The second produces code that fits where it has to live.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Failure Cases
&lt;/h3&gt;

&lt;p&gt;The model optimizes for the happy path unless you tell it not to. Network calls succeed. Inputs are valid. Caches hit. This isn't laziness. Your prompt described a world where those things are true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Write&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;parse&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;object.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nc"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="nc"&gt;JSON&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="k"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;malformed&lt;/span&gt; &lt;span class="nc"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;
&lt;span class="n"&gt;optional&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nc"&gt;Return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt;
&lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;caller&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="n"&gt;explicitly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're not asking the model to be more careful. You're describing a more complete problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Constraints Nobody Mentions
&lt;/h3&gt;

&lt;p&gt;Performance requirements. Size limits. Thread safety. Backward compatibility. These feel obvious because you carry them in your head. The model doesn't have your head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;processes&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nc"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;processes&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;Constraints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;contain&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;000&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="n"&gt;runs&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt; &lt;span class="n"&gt;blocking&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;not&lt;/span&gt; &lt;span class="n"&gt;acceptable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;we&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;support&lt;/span&gt; &lt;span class="nc"&gt;API&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;+.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. What "Done" Looks Like
&lt;/h3&gt;

&lt;p&gt;If you don't define success criteria, the model defines them for you. Usually that means "compiles and handles the obvious case." That's a low bar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;I&lt;/span&gt; &lt;span class="n"&gt;want&lt;/span&gt; &lt;span class="n"&gt;coverage&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="n"&gt;miss&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;concurrent&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="nc"&gt;Use&lt;/span&gt; &lt;span class="nc"&gt;JUnit4&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="nc"&gt;Mockito&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="nc"&gt;Each&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="n"&gt;should&lt;/span&gt; &lt;span class="n"&gt;have&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="n"&gt;single&lt;/span&gt; &lt;span class="n"&gt;assertion&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;descriptive&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Pre-Prompt Checklist
&lt;/h2&gt;

&lt;p&gt;Before I send a code prompt, I run through four questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does this run?&lt;/strong&gt; Language, runtime, framework, threading model, platform constraints. If I can't answer this in one sentence, I don't know my own context well enough yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What goes wrong?&lt;/strong&gt; Every function has a set of inputs that break it. State them. If the function touches the network, a database, or user input, those are automatic candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can't I trade away?&lt;/strong&gt; Performance floor, security requirements, API compatibility, dependency restrictions. Anything that would make a technically correct solution still unshippable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How will I know it worked?&lt;/strong&gt; If I can't describe a test that would fail on a wrong implementation and pass on a correct one, my spec is incomplete.&lt;/p&gt;

&lt;p&gt;This takes maybe 90 seconds. It saves much more than that in review time.&lt;/p&gt;
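&lt;p&gt;The checklist translates naturally into a template. Here's a minimal sketch in Python; the class and field names are my own choices, not any standard, but each checklist question becomes a required input rather than something you remember to mention:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class CodeSpec:
    """The four pre-prompt questions, assembled into one specification.
    Field names are illustrative, not a standard schema."""
    task: str
    environment: str                                   # where does this run?
    failure_cases: list = field(default_factory=list)  # what goes wrong?
    constraints: list = field(default_factory=list)    # what can't I trade away?
    success_criteria: str = ""                         # how will I know it worked?

    def to_prompt(self) -> str:
        parts = [self.task, f"Context: {self.environment}"]
        if self.failure_cases:
            parts.append("Handle: " + ", ".join(self.failure_cases))
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        if self.success_criteria:
            parts.append(f"Done means: {self.success_criteria}")
        return "\n".join(parts)
```

The win isn't the code itself; it's that an empty `failure_cases` list is visible in a way that a forgotten sentence never is.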




&lt;h2&gt;
  
  
  The Adversarial Test
&lt;/h2&gt;

&lt;p&gt;After I write a prompt, I read it as if I'm trying to satisfy it with the worst code that technically meets the stated requirements.&lt;/p&gt;

&lt;p&gt;If the worst technically-compliant implementation is still unshippable, my prompt is missing something.&lt;/p&gt;

&lt;p&gt;Example. Prompt: "Write a function that returns the user's name from the database."&lt;/p&gt;

&lt;p&gt;Worst technically-compliant implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT name FROM users WHERE id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns a name from the database. It's also a SQL injection vulnerability — &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" rel="noopener noreferrer"&gt;still the most common LLM-generated security flaw according to OWASP's Top 10 for LLM Applications&lt;/a&gt; — and will throw an unhandled exception if the user doesn't exist.&lt;/p&gt;

&lt;p&gt;Both problems are visible if you read the prompt adversarially. Neither shows up if you read it straight.&lt;/p&gt;

&lt;p&gt;Better prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name from the database.
Use parameterized queries. Return None if the user doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;exist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Raise&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;DatabaseError &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;generic&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;fails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the worst technically-compliant implementation is actually safe.&lt;/p&gt;
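&lt;p&gt;For comparison, a compliant implementation of the better prompt might look like this; sqlite3 and the table layout are assumptions for the sketch, not part of the original prompt:&lt;/p&gt;

```python
import sqlite3

class DatabaseError(Exception):
    """Raised when the query itself fails, per the prompt's contract."""

def get_user_name(conn: sqlite3.Connection, user_id: int):
    """Return the user's name, or None if the user doesn't exist.
    Parameterized query closes the injection hole from the naive version."""
    try:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
    except sqlite3.Error as exc:
        raise DatabaseError(str(exc)) from exc
    return row[0] if row else None
```

Every line that differs from the one-liner maps directly to a sentence in the improved prompt, which is the whole argument in miniature.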




&lt;h2&gt;
  
  
  What This Doesn't Fix
&lt;/h2&gt;

&lt;p&gt;Two things this framework won't help with: tasks that require understanding your system's history, and tasks where the right answer depends on a judgment call you haven't made yet. For the first, no amount of prompt engineering substitutes for the model actually knowing your codebase — that's where RAG and &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview" rel="noopener noreferrer"&gt;long-context strategies&lt;/a&gt; come in. For the second, the prompt can't be finished until you've made the decision.&lt;/p&gt;

&lt;p&gt;Both are worth recognizing because they tell you when to stop trying to prompt your way out of a problem. Some work needs to stay with you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One-Line Version
&lt;/h2&gt;

&lt;p&gt;Describe the problem you have, not the output you want.&lt;/p&gt;

&lt;p&gt;The output is a function. The problem is a function that handles these inputs, runs in this context, fails gracefully in these ways, and satisfies these constraints. The model is better at solving the second one than you might think. It just can't infer it from the first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://linkedin.com/in/jaskaranchana" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://jaskaranchana.netlify.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>todayilearned</category>
    </item>
    <item>
      <title>AI Agents Are Shipping Features Without You. Now What?</title>
      <dc:creator>Jaskaran Singh</dc:creator>
      <pubDate>Wed, 22 Apr 2026 00:12:59 +0000</pubDate>
      <link>https://forem.com/jaskaran_singh/ai-agents-are-shipping-features-without-you-now-what-4eo0</link>
      <guid>https://forem.com/jaskaran_singh/ai-agents-are-shipping-features-without-you-now-what-4eo0</guid>
      <description>&lt;p&gt;&lt;em&gt;Jaskaran Singh — Senior Software Engineer, AI Trainer&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A few weeks ago I watched an agent open a GitHub issue, write the fix, run the tests, and open a pull request. No human typed a line of code. The PR passed review.&lt;/p&gt;

&lt;p&gt;I didn't find this inspiring. I found it genuinely disorienting. I say that as someone who trains AI models for a living and is currently building an agent of my own.&lt;/p&gt;

&lt;p&gt;If you're a software engineer in 2026 and you haven't had that moment yet, you will. Agentic AI is being called the third seismic shift in software engineering this century, after open source and DevOps. That framing might be overblown. It might not be. Either way, something real is happening and it's worth thinking clearly about instead of panicking or dismissing it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers Stopped Being Theoretical
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D1000%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D1000%26q%3D80" alt="A dashboard of data and analytics representing AI adoption statistics" width="1000" height="667"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://unsplash.com/photos/MacBook-Pro-on-table-beside-white-iMac-and-Magic-Mouse-Im7lZjxeLhg" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt; — Luke Chesser&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://newsletter.pragmaticengineer.com/p/ai-tooling-2026" rel="noopener noreferrer"&gt;survey of nearly 1,000 engineers published in early 2026&lt;/a&gt; found that 95% use AI tools at least weekly, 75% use AI for half or more of their engineering work, and 55% regularly use AI agents. That last number is the one that matters. Copilots have been mainstream for two years. Agents are different.&lt;/p&gt;

&lt;p&gt;A copilot suggests. An agent acts. It reads your codebase, decides what to do, does it, checks whether it worked, and tries again if it didn't. The feedback loop is closed without you in it.&lt;/p&gt;

&lt;p&gt;In 2025, coding agents moved from experimental tools to production systems shipping real features to real customers. In 2026, single agents are becoming coordinated teams of agents.&lt;/p&gt;

&lt;p&gt;I've been watching this from an unusual angle. My job involves evaluating AI-generated code for quality: finding the failure modes, writing the rubrics, doing the multi-turn reviews. At the same time I'm building a Python agent that monitors the OINP immigration portal and pushes Telegram alerts whenever a new Masters Graduate stream draw drops. Two different relationships with the same technology, and both have given me a clearer picture than I'd have from either side alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Agents Are Actually Good At
&lt;/h2&gt;

&lt;p&gt;Agents handle implementation tasks well when the problem is well-scoped and verifiable. "Add pagination to this endpoint." "Write tests for this module." "Refactor this class to use dependency injection." Tasks with clear success criteria: the code runs, the tests pass, the interface contract is unchanged. The agent can verify its own work.&lt;/p&gt;

&lt;p&gt;Quality still varies. My evaluation work confirms what engineers describe: intuitions for delegation develop over time. People hand off tasks that are easily verifiable or low-stakes. That intuition is real and it matters. Knowing what to delegate is itself a skill now.&lt;/p&gt;

&lt;p&gt;Where agents fall apart is anything requiring judgment about what the right problem even is. An agent given an ambiguous brief will confidently solve the wrong version of it. I've seen this pattern repeatedly, not as an occasional edge case but as a consistent failure mode when the task specification has gaps. The agent doesn't ask for clarification. It infers, fills in, and proceeds. Sometimes the inference is right. When it's wrong, it's wrong in ways that are coherent and hard to catch. That's the part that should make you nervous.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Shift That's Actually Happening to Engineering Teams
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-10-20-gartner-identifies-the-top-strategic-technology-trends-for-2026" rel="noopener noreferrer"&gt;Gartner predicts&lt;/a&gt; 80% of organizations will evolve large software engineering teams into smaller, AI-augmented teams by 2030. The trajectory is already visible. Teams that used to need eight engineers to maintain a product are running it with four. Not because the other four got fired, but because agent-assisted output per engineer went up enough that the headcount math changed.&lt;/p&gt;

&lt;p&gt;The pattern emerging in 2026: software development is moving toward human expertise focused on defining problems worth solving while AI handles the tactical implementation work.&lt;/p&gt;

&lt;p&gt;That framing is mostly right but it undersells something. "Defining problems worth solving" sounds clean and strategic. In practice it means writing a spec precise enough that an agent doesn't go off the rails, reviewing agent output at a level that catches subtle correctness issues, and making architecture decisions that hold up when the agent starts filling in implementations you didn't anticipate.&lt;/p&gt;

&lt;p&gt;Those are all hard skills. They're also different from the skills that got most of us into engineering. We learned by writing the implementation ourselves. The feedback loop of "I wrote this, it broke, I understand why" is how you build the mental models that make good judgment possible. Whether that judgment transfers cleanly to directing agents at tasks you've never done yourself is an open question. I don't think anyone knows yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means If You're Mid-Career
&lt;/h2&gt;

&lt;p&gt;I'm five years in. I've shipped production Android apps, done fintech work, and I'm now working at the AI training layer. The people who seem least threatened by this shift share one thing: they understand systems, not just syntax.&lt;/p&gt;

&lt;p&gt;A developer who knows Kotlin and can write Jetpack Compose components is in a different position than one who understands why coroutine cancellation works the way it does, when a &lt;code&gt;ViewModel&lt;/code&gt; scope is the wrong choice, and what the architectural consequences of a particular state management approach are three features down the road. The first kind of knowledge is increasingly delegatable. The second is what you need to review what the agent produces.&lt;/p&gt;
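&lt;p&gt;To make that concrete, here's a minimal sketch (my own illustration, not from any particular codebase) of the kind of detail the second engineer carries around. Coroutine cancellation is cooperative: a busy loop that never suspends or checks &lt;code&gt;isActive&lt;/code&gt; will simply ignore &lt;code&gt;cancel()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlinx.coroutines.*

fun main() = runBlocking {
    val job = launch(Dispatchers.Default) {
        var i = 0L
        // Cooperative cancellation: without the isActive check (or a
        // suspension point such as yield()), this loop outlives cancel().
        while (isActive) {
            i++
        }
    }
    delay(100)
    job.cancelAndJoin() // returns promptly only because the loop cooperates
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An agent will happily hand you either version of that loop. Knowing which one you got is the part that doesn't delegate.&lt;/p&gt;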

&lt;p&gt;This is not a comfortable message. It basically says the work that builds deep knowledge is being automated before you've had a chance to accumulate it through repetition. That's a real problem for junior developers and I don't have a clean answer to it. Engineers who actively seek out the "why" behind every pattern they use, even when an agent handed them that pattern, will pull ahead of those who treat agent output as a black box. That's my best guess.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555066931-4365d14bab8c%3Fw%3D1000%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555066931-4365d14bab8c%3Fw%3D1000%26q%3D80" alt="A padlock on a keyboard representing code security" width="1000" height="667"&gt;&lt;/a&gt;&lt;em&gt;Source: &lt;a href="https://unsplash.com/photos/turned-on-gray-laptop-computer-4hbJ-eymZ1o" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt; — Lewis Kang'ethe Ngugi&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agentic coding is changing security in two directions. As models get more capable, building security into products gets easier. The same capabilities that help defenders help attackers too.&lt;/p&gt;

&lt;p&gt;There's a third direction worth adding from my evaluation work: agents introduce security risks through confident implementation of insecure patterns. An agent writing a data pipeline reaches for the most direct path to working code. Input sanitization, parameterized queries, credential management, error handling that doesn't leak internals: these require deliberate thought. Agents do them inconsistently.&lt;/p&gt;

&lt;p&gt;The more autonomous the coding pipeline, the more critical it is to have security review that isn't the same agent that wrote the code. I've flagged SQL injection vulnerabilities in agent-generated Python and credential handling issues in agent-generated Kotlin. The code was functionally correct. It would have passed a cursory review. It shouldn't have shipped.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm Still Building Agents
&lt;/h2&gt;

&lt;p&gt;None of this made me stop building the OINP monitoring bot. It made me more deliberate about it.&lt;/p&gt;

&lt;p&gt;The thing I'm building isn't trying to do something clever. It checks a government webpage on a schedule, parses the draw results, compares against the last known state, and fires a Telegram message if something changed. The agent part is the parsing logic: handling inconsistencies in how the page is structured, dealing with cases where the data format shifts slightly. That's a good fit for what these tools are actually good at.&lt;/p&gt;

&lt;p&gt;The immigration system in Canada is opaque in ways that are genuinely stressful for people on it. If a monitoring tool reduces that stress even slightly, it's worth the weekend. The judgment about what's worth building and why is still entirely mine.&lt;/p&gt;

&lt;p&gt;That's probably the honest answer to "now what." The judgment work is still yours. The implementation is increasingly negotiable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Jaskaran Singh is a Senior Software Engineer working in AI training and evaluation, with production experience in Android development using Kotlin and Flutter. Currently building a Python-based OINP immigration monitoring agent.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://linkedin.com/in/jaskaranchana" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://jaskaranchana.github.io/Portfolio/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>career</category>
      <category>security</category>
    </item>
    <item>
      <title>I Grade AI Code for a Living. Here's What Nobody Talks About.</title>
      <dc:creator>Jaskaran Singh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 23:56:05 +0000</pubDate>
      <link>https://forem.com/jaskaran_singh/i-grade-ai-code-for-a-living-heres-what-nobody-talks-about-4do3</link>
      <guid>https://forem.com/jaskaran_singh/i-grade-ai-code-for-a-living-heres-what-nobody-talks-about-4do3</guid>
      <description>&lt;p&gt;&lt;em&gt;Jaskaran Singh — Senior Software Engineer, AI Trainer&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've spent the last year doing something most engineers haven't: reading AI-generated code all day and deciding whether it's actually good.&lt;/p&gt;

&lt;p&gt;Not "does it compile." Not "did the tests pass." Good as in, would I be comfortable shipping this to production at 2am on a Friday if something went wrong.&lt;/p&gt;

&lt;p&gt;The answer, more often than people want to admit, is no.&lt;/p&gt;

&lt;p&gt;I use LLMs myself. But after evaluating enough AI-generated code across Python, Java, Kotlin, and C/C++, I know the failure modes aren't random. They follow patterns. And once you see them, you can't unsee them in AI code or your own.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Job Nobody Has a Good Title For
&lt;/h2&gt;

&lt;p&gt;My official role is AI Trainer. What that actually means: I'm a human in the RLHF loop.&lt;/p&gt;

&lt;p&gt;Reinforcement Learning from Human Feedback works by having engineers like me evaluate model outputs against structured rubrics, then rank and rewrite them so the model learns what "better" looks like. I write adversarial prompts to expose failure modes. I do multi-turn code reviews, meaning I follow an entire back-and-forth between a user and a model across five or ten turns, and assess whether the reasoning held up or quietly drifted off the rails somewhere in the middle.&lt;/p&gt;

&lt;p&gt;Less "AI whisperer." More "very opinionated senior reviewer who never runs out of things to flag."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern That Bothers Me Most
&lt;/h2&gt;

&lt;p&gt;There's a category of bug I call "confident and wrong." The code compiles. It's readable. The variable names are sensible. It even has a comment explaining what it does. And it's still wrong. Not obviously wrong, but wrong in the way that only shows up under load, or with a specific input type, or after three other things happen first.&lt;/p&gt;

&lt;p&gt;Here's a real example. Prompt was something like: &lt;em&gt;"Write a function to fetch user details and cache the result."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The model produced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;UserCache&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;cache&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HashMap&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrPut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. Concise. Totally broken in a concurrent environment.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;HashMap&lt;/code&gt; isn't thread-safe. Two coroutines calling &lt;code&gt;getOrPut&lt;/code&gt; simultaneously on the same key can corrupt the map. The model didn't add a mutex, didn't suggest &lt;code&gt;ConcurrentHashMap&lt;/code&gt;, didn't even mention the assumption that this runs single-threaded. It just wrote code that works in the demo and fails in production.&lt;/p&gt;

&lt;p&gt;The correct version uses &lt;code&gt;ConcurrentHashMap&lt;/code&gt; or wraps access with a &lt;code&gt;Mutex&lt;/code&gt; if you need atomic get-or-fetch semantics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;UserCache&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;cache&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;mutex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withLock&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// double-checked after acquiring lock&lt;/span&gt;
            &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrPut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;fetchFn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model's version would pass code review at most places. That's what worries me.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Edge Case Problem Is Structural, Not Random
&lt;/h2&gt;

&lt;p&gt;After a few hundred evaluations, I stopped thinking of missed edge cases as oversights. They're structural. LLMs optimize for the problem as stated. If the prompt doesn't mention null inputs, concurrent access, or network timeouts, the model won't think about them either.&lt;/p&gt;

&lt;p&gt;Good engineers treat those as implied. You don't wait to be asked "what if this list is empty." You just handle it.&lt;/p&gt;

&lt;p&gt;Here are the categories where models fail most consistently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency.&lt;/strong&gt; Single-threaded assumptions that explode under real-world load. The &lt;code&gt;HashMap&lt;/code&gt; example above is the most common flavor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure state propagation.&lt;/strong&gt; Functions that catch exceptions and return &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;false&lt;/code&gt;, then callers that don't check the return value, and the whole chain silently fails. The model gets each function right in isolation. It gets the composition wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource cleanup.&lt;/strong&gt; Network connections, file handles, database cursors left open because the happy path worked and nobody wrote the &lt;code&gt;finally&lt;/code&gt; block or used the right scoping construct.&lt;/p&gt;
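&lt;p&gt;Of those, resource cleanup is the easiest to show. A minimal sketch (the function is my own, invented for illustration): Kotlin's &lt;code&gt;use&lt;/code&gt; is the scoping construct that closes the handle on both the happy path and the exception path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import java.io.File

// `use` closes the reader whether readLine() succeeds or throws,
// which is exactly the cleanup the happy-path-only version forgets.
fun firstLine(path: String): String? =
    File(path).bufferedReader().use { reader -&amp;gt;
        reader.readLine()
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;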

&lt;p&gt;&lt;strong&gt;Behavioral drift across turns.&lt;/strong&gt; In turn 1, the model sets up a class a certain way. By turn 4, after a few "can you refactor this" prompts, it has made changes that contradict the original design without acknowledging it. The code still runs. The architecture is now inconsistent in ways that will cause problems in six months.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Look For in a Code Review
&lt;/h2&gt;

&lt;p&gt;My rubric has eight criteria. The ones that surface the most issues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correctness under adversarial input.&lt;/strong&gt; Not "does it work with the example." Does it work when the input is empty, null, malformed, enormous, or concurrent? I'll trace through a model's code in my head with the worst inputs I can think of before scoring it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicitness of assumptions.&lt;/strong&gt; Code that works is not the same as code that communicates its constraints. If a function assumes its input is sorted, that needs to be in a comment, a precondition check, or the function name. The model almost never does this unprompted.&lt;/p&gt;
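&lt;p&gt;A small sketch of what "explicit" looks like in practice (the function is invented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// The sorted-input assumption lives in the name, the parameter, and a check.
// Note the require() scan is O(n); in a hot path you might demote it to a
// debug-only assertion and rely on the name and comment alone.
fun sortedListContains(sorted: List&amp;lt;Int&amp;gt;, target: Int): Boolean {
    require(sorted.zipWithNext().all { (a, b) -&amp;gt; a &amp;lt;= b }) { "input must be sorted" }
    return sorted.binarySearch(target) &amp;gt;= 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;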

&lt;p&gt;&lt;strong&gt;Error handling that means something.&lt;/strong&gt; There's a specific anti-pattern I call "error theater":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is not error handling. This is error cosplay.&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;riskyOperation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;e&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"TAG"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Something went wrong"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks like error handling. It isn't. The caller has no information. The system has no way to recover. The log message gets ignored. Good error handling changes what the caller can do. It doesn't just muffle the crash.&lt;/p&gt;
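&lt;p&gt;For contrast, here's a sketch of the same call rewritten so the caller can actually act. It reuses &lt;code&gt;riskyOperation()&lt;/code&gt; from the snippet above, and the two-bucket error taxonomy is my own invention for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import java.io.IOException

sealed interface FetchResult {
    data class Success(val value: String) : FetchResult
    data class Failure(val cause: Exception, val retryable: Boolean) : FetchResult
}

// riskyOperation() is the same call as in the snippet above.
// The caller now knows what failed and whether retrying can help,
// instead of receiving a null that means nothing.
fun fetchSafely(): FetchResult =
    try {
        FetchResult.Success(riskyOperation())
    } catch (e: IOException) {
        FetchResult.Failure(e, retryable = true)   // transient failure
    } catch (e: IllegalStateException) {
        FetchResult.Failure(e, retryable = false)  // programming error
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;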

&lt;p&gt;&lt;strong&gt;Security surface.&lt;/strong&gt; SQL construction via string interpolation, credentials in code comments, user input passed to shell commands without sanitization. These come up. Not constantly, but often enough that I check every time.&lt;/p&gt;
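&lt;p&gt;The SQL case is the most mechanical of those to fix, which is why it's frustrating to keep flagging. A hedged sketch in plain JDBC (table and column names invented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import java.sql.Connection

// String interpolation ("... WHERE name = '$name'") lets user input rewrite
// the query. A parameterized statement keeps input as data, never as SQL.
fun findUserId(conn: Connection, name: String): Long? =
    conn.prepareStatement("SELECT id FROM users WHERE name = ?").use { stmt -&amp;gt;
        stmt.setString(1, name)
        stmt.executeQuery().use { rs -&amp;gt; if (rs.next()) rs.getLong("id") else null }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;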




&lt;h2&gt;
  
  
  The Skill That Transferred Back
&lt;/h2&gt;

&lt;p&gt;I didn't expect this job to change how I write code. It did.&lt;/p&gt;

&lt;p&gt;Spending eight hours a day articulating &lt;em&gt;why&lt;/em&gt; something is wrong, not just flagging it but writing a clear explanation that a model can actually learn from, builds a habit of internal interrogation that's hard to turn off.&lt;/p&gt;

&lt;p&gt;Now, before I submit a PR, I run my own rubric. Is this thread-safe? What happens on retry? Who owns cleanup? Does this function do what its name says, or has it quietly acquired a second responsibility?&lt;/p&gt;

&lt;p&gt;That last one is underrated. Functions that do two things are where bugs live. The AI writes them constantly because function names get generated from the prompt context, and prompts often have two goals. "Fetch and validate" is two functions pretending to be one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where AI Code Actually Shines
&lt;/h2&gt;

&lt;p&gt;I've been critical, so let me be fair.&lt;/p&gt;

&lt;p&gt;AI-generated code is genuinely good at boilerplate. Serialization logic, configuration parsing, test scaffolding, adapters between interfaces that differ only in naming. Tedious work that models handle well. If I ask for a Room database entity with a DAO and a repository, the output is usually solid and saves thirty minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This kind of scaffolding? Models nail it.&lt;/span&gt;
&lt;span class="nd"&gt;@Entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tableName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;UserEntity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nd"&gt;@PrimaryKey&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;currentTimeMillis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@Dao&lt;/span&gt;
&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;UserDao&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT * FROM users WHERE id = :userId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getUserById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;UserEntity&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;

    &lt;span class="nd"&gt;@Insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;onConflict&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OnConflictStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;insertUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;UserEntity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Models are also good at surfacing options I'd forgotten about. Not because they know my codebase, but because they've seen enough code to suggest a &lt;code&gt;StateFlow&lt;/code&gt; where I was reaching for &lt;code&gt;LiveData&lt;/code&gt;, or a &lt;code&gt;runCatching&lt;/code&gt; in a context where it genuinely fits.&lt;/p&gt;

&lt;p&gt;The mistake is treating it as something that reasons about your system. It doesn't know your system. It knows patterns. Those overlap most of the time and fail in ways that aren't obvious the other times.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Wrote This
&lt;/h2&gt;

&lt;p&gt;A few months ago I started noticing that engineers I respect were shipping AI-generated code without reviewing it seriously. Not because they're lazy. Because the code looked fine. That's the problem. It's calibrated to look fine.&lt;/p&gt;

&lt;p&gt;The engineers who work well with AI tooling treat it the way experienced engineers treat a junior developer: capable, useful, not fully trusted without review, and prone to specific failure patterns you learn over time.&lt;/p&gt;

&lt;p&gt;That framing changed how I work with it. I think it'll change how you work with it, too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Jaskaran Singh is a Senior Software Engineer working in AI training and evaluation. Previously built Android fintech apps at Comviva Technologies and Talentica Software. Currently building a Python-based OINP immigration monitoring bot on the side, because immigration status shouldn't require manually refreshing government websites.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Find me on &lt;a href="https://linkedin.com/in/jaskaranchana" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or at my &lt;a href="https://jaskaranchana.github.io/Portfolio/" rel="noopener noreferrer"&gt;portfolio&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>android</category>
      <category>kotlin</category>
    </item>
  </channel>
</rss>
