<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Juan Torchia</title>
    <description>The latest articles on Forem by Juan Torchia (@jtorchia).</description>
    <link>https://forem.com/jtorchia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F885942%2F5b3b3860-d364-4de0-a335-cb7c251109d9.jpeg</url>
      <title>Forem: Juan Torchia</title>
      <link>https://forem.com/jtorchia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jtorchia"/>
    <language>en</language>
    <item>
      <title>Prisma Server Actions in Next.js 16: the patterns that work and the N+1 that sneaks up on you</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Mon, 18 May 2026 12:31:25 +0000</pubDate>
      <link>https://forem.com/jtorchia/prisma-server-actions-in-nextjs-16-the-patterns-that-work-and-the-n1-that-sneaks-up-on-you-19h2</link>
      <guid>https://forem.com/jtorchia/prisma-server-actions-in-nextjs-16-the-patterns-that-work-and-the-n1-that-sneaks-up-on-you-19h2</guid>
      <description>&lt;h1&gt;
  
  
  Prisma Server Actions in Next.js 16: the patterns that work and the N+1 that sneaks up on you
&lt;/h1&gt;

&lt;p&gt;Next.js 16 shipped recently with App Router improvements and Server Actions stabilized as a first-class primitive. The community is adopting Server Actions as the natural replacement for API routes on mutations. The migration looks obvious — less boilerplate, co-location with the component, shared types between client and server. I started moving in that direction too. And somewhere along the way I ran into an N+1 that didn't come from Prisma: it came from &lt;em&gt;how I was composing the Actions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My thesis is this: Prisma ORM 5 doesn't introduce N+1 in Server Actions. &lt;strong&gt;Action composition&lt;/strong&gt; does — the pattern of calling multiple independent Actions from the same component, or chaining them without collapsing the queries. It's an architecture problem, not an ORM problem. And it has a solution, but you have to know where to look.&lt;/p&gt;




&lt;h2&gt;
  
  
  Classic N+1 vs. composition N+1 in Server Actions
&lt;/h2&gt;

&lt;p&gt;The classic N+1 with Prisma is well-known: you iterate over a list and fire a separate query for each item because you forgot the &lt;code&gt;include&lt;/code&gt;. The &lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;official Prisma docs on query optimization&lt;/a&gt; cover it precisely — use &lt;code&gt;include&lt;/code&gt; or &lt;code&gt;select&lt;/code&gt; with nested relations, or for more complex cases, &lt;code&gt;findMany&lt;/code&gt; with relational filters instead of queries in a loop.&lt;/p&gt;

&lt;p&gt;The composition N+1 in Server Actions is different. It doesn't show up inside the body of a single Action. It shows up when the component calls &lt;em&gt;multiple&lt;/em&gt; Actions in sequence or in parallel, and each Action runs its own queries, each of which checks a connection out of the Prisma pool. Under SSR load, that becomes connection pool pressure that rarely shows up in local tests.&lt;/p&gt;

&lt;p&gt;Look at this problematic pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/dashboard/page.tsx&lt;/span&gt;
&lt;span class="c1"&gt;// ⚠️ Problematic pattern: three independent Actions&lt;/span&gt;
&lt;span class="c1"&gt;// each one checks out its own pool connection, one after another&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getUserProfile&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getRecentOrders&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/orders&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getNotifications&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/notifications&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;DashboardPage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Three sequential round-trips, three separate pool checkouts&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUserProfile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getRecentOrders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notifications&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getNotifications&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Dashboard&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;notifications&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;notifications&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of those Actions has its own &lt;code&gt;prisma.user.findUnique&lt;/code&gt;, its own &lt;code&gt;prisma.order.findMany&lt;/code&gt;, its own &lt;code&gt;prisma.notification.findMany&lt;/code&gt;. Three queries that could be resolved with a single well-designed call — or at minimum with &lt;code&gt;Promise.all&lt;/code&gt; to parallelize them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The connection pool under SSR load
&lt;/h2&gt;

&lt;p&gt;Prisma uses an internal connection pool. In Next.js App Router with SSR, each request can fire multiple Server Actions in the same render. If every component on the page calls its own Action, the pool receives a short but intense burst of connections per user visit.&lt;/p&gt;
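&lt;p&gt;To make that burst concrete, here is a toy model rather than real Prisma code. &lt;code&gt;fakeQuery&lt;/code&gt;, &lt;code&gt;measurePeak&lt;/code&gt; and the 10 ms delay are hypothetical stand-ins. The only thing the sketch measures is how many queries are in flight at the same moment, which is roughly how many pool connections a single render would hold:&lt;/p&gt;

```typescript
// Toy model, not Prisma: counts how many "connections" are in use at once.
// fakeQuery and the 10 ms delay are hypothetical stand-ins for real queries.
let inFlight = 0
let peak = 0

async function fakeQuery(label: string) {
  inFlight += 1
  peak = Math.max(peak, inFlight)
  // Simulate one database round-trip
  await new Promise((resolve) => setTimeout(resolve, 10))
  inFlight -= 1
  return label
}

// Runs a scenario and reports the highest number of simultaneous queries
async function measurePeak(run: () => unknown) {
  inFlight = 0
  peak = 0
  await run()
  return peak
}

// Sequential awaits: one connection at a time, three round-trips back to back
async function sequential() {
  await fakeQuery("profile")
  await fakeQuery("orders")
  await fakeQuery("notifications")
}

// Promise.all: one round-trip of wall time, three connections held at once
async function parallel() {
  await Promise.all([
    fakeQuery("profile"),
    fakeQuery("orders"),
    fakeQuery("notifications"),
  ])
}
```

&lt;p&gt;Under these assumptions &lt;code&gt;measurePeak(sequential)&lt;/code&gt; resolves to 1 and &lt;code&gt;measurePeak(parallel)&lt;/code&gt; resolves to 3. Parallelizing shortens the render but triples what each request holds at once, so the pool size (tunable with the &lt;code&gt;connection_limit&lt;/code&gt; parameter on the connection URL) has to account for concurrent renders multiplied by queries per render.&lt;/p&gt;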

&lt;p&gt;The most common cause of pool exhaustion is instantiating &lt;code&gt;PrismaClient&lt;/code&gt; in each module instead of sharing one global singleton, because every instance brings its own pool. Prisma's documentation explicitly recommends using a singleton instance in serverless and SSR environments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/prisma.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Singleton pattern recommended by Prisma for Next.js&lt;/span&gt;
&lt;span class="c1"&gt;// Source: https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PrismaClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@prisma/client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PrismaClient&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PrismaClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;warn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you skip this pattern, every hot reload in development — and potentially every cold start in production with some providers — can instantiate a fresh &lt;code&gt;PrismaClient&lt;/code&gt; with its own pool. The result: exhausted connections with no obvious warning in the logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The patterns that work: collapsing queries into a single Action
&lt;/h2&gt;

&lt;p&gt;The antidote to the composition N+1 is simple to state but requires discipline: &lt;strong&gt;one Action per use case, not one Action per entity&lt;/strong&gt;. Instead of three independent Actions for the dashboard, write a single Action that groups the three queries with &lt;code&gt;Promise.all&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// actions/dashboard.ts&lt;/span&gt;
&lt;span class="c1"&gt;// ✅ Correct pattern: one Action that collapses the queries&lt;/span&gt;
&lt;span class="c1"&gt;// Promise.all runs the three queries concurrently on the shared client's pool&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/auth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getDashboardData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Not authenticated&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="c1"&gt;// One Action call: three queries in parallel, up to three pool connections&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notifications&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;select&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avatarUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;gte&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;])&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notifications&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference isn't just about queries — it's about design. An Action that groups data for a specific use case is easier to cache, easier to test, and more honest about what problem it's actually solving.&lt;/p&gt;




&lt;h2&gt;
  
  
  The forgotten include and the query that multiplied
&lt;/h2&gt;

&lt;p&gt;The classic N+1 still lives inside Actions. If you iterate over results and fire a nested query per item, Prisma isn't going to save you — that's on you. The most frequent pattern I see in codebases just starting with Server Actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ⚠️ Classic N+1 inside an Action&lt;/span&gt;
&lt;span class="c1"&gt;// One query per order to fetch the product&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getOrdersWithProducts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// ❌ N+1: one query per order&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ordersWithProduct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;ordersWithProduct&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The correct fix is to collapse with &lt;code&gt;include&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Correct include: Prisma loads the relation in a fixed number of queries&lt;/span&gt;
&lt;span class="c1"&gt;// (batched by default, not one query per order)&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getOrdersWithProducts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;select&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;imageUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;select&lt;/code&gt; inside the &lt;code&gt;include&lt;/code&gt; matters: you're not pulling the full &lt;code&gt;product&lt;/code&gt; object, you're pulling exactly the fields the component needs. That reduces the serialized payload Next.js has to transfer between server and client.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real gotchas: what the 15-minute tutorial doesn't cover
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;"use server"&lt;/code&gt; doesn't guarantee automatic serialization of Prisma errors.&lt;/strong&gt; If an Action throws a &lt;code&gt;PrismaClientKnownRequestError&lt;/code&gt; (say, a constraint violation), the client does not receive it intact: in production builds Next.js replaces the message of a thrown error with a generic one, so the UI cannot tell a duplicate email from any other failure. You need to wrap with try/catch and return a serializable error explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// actions/user.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Explicit Prisma error handling in Server Actions&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@prisma/client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Unique constraint violation (P2002 in Prisma)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;Prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PrismaClientKnownRequestError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;P2002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;That email is already registered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Unexpected error: log it, don't expose it&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[createUser]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Internal error. Please try again.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Query logging in development is your best diagnostic tool.&lt;/strong&gt; The singleton above already includes &lt;code&gt;log: ["query"]&lt;/code&gt; in development — that lets you see exactly how many queries each render fires. If you see the same &lt;code&gt;SELECT&lt;/code&gt; repeated N times in the terminal, you have an N+1 and you can attack it before it hits production.&lt;/p&gt;
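&lt;p&gt;Counting repeated statements by eye gets old fast, so here is a minimal sketch of how I'd automate it. &lt;code&gt;findRepeatedQueries&lt;/code&gt; is a hypothetical helper, not a Prisma API: it takes SQL strings you collected from the dev log (how you collect them is up to you) and reports the ones that repeat. It assumes the log prints parameterized SQL, so the same statement shows up with the same text.&lt;/p&gt;

```typescript
// Hypothetical helper, not part of Prisma: groups logged SQL strings and
// reports the statements that repeat often enough to look like an N+1.
type RepeatedQuery = { sql: string; count: number }

function findRepeatedQueries(loggedSql: string[], threshold = 3): RepeatedQuery[] {
  const counts: { [sql: string]: number } = {}
  for (const raw of loggedSql) {
    // Collapse whitespace so the same statement always maps to the same key
    const sql = raw.replace(/\s+/g, " ").trim()
    counts[sql] = (counts[sql] ?? 0) + 1
  }
  return Object.entries(counts)
    .filter(([, count]) => count >= threshold)
    .map(([sql, count]) => ({ sql, count }))
    .sort((a, b) => b.count - a.count)
}

// Illustrative input: one list query followed by a per-item lookup
const sampleLog = [
  'SELECT * FROM "Order" WHERE "userId" = $1',
  'SELECT * FROM "Product" WHERE "id" = $1',
  'SELECT * FROM "Product" WHERE "id" = $1',
  'SELECT * FROM "Product" WHERE "id" = $1',
]

console.log(findRepeatedQueries(sampleLog))
```

&lt;p&gt;The threshold of 3 is arbitrary. Tune it to whatever counts as suspicious for a single render in your app.&lt;/p&gt;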

&lt;p&gt;&lt;strong&gt;Server Actions and React 19 &lt;code&gt;useOptimistic&lt;/code&gt; can mask the problem.&lt;/strong&gt; If you use &lt;code&gt;useOptimistic&lt;/code&gt; to update the UI before the Action resolves, perceived latency drops — but the queries are still there. Don't confuse improved UX with optimized queries.&lt;/p&gt;

&lt;p&gt;This connects to something I already documented when looking at &lt;a href="https://juanchi.dev/en/blog/opentelemetry-spring-boot-logs-vs-traces-diagnosis" rel="noopener noreferrer"&gt;how OpenTelemetry in Spring Boot reveals the real problem when the log says OK&lt;/a&gt;: observability surface matters. In Next.js 16, if you don't have query traces, the Action log can look healthy while queries multiply underneath.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Prisma, Server Actions, and N+1 in Next.js 16
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why does N+1 appear in Server Actions when it didn't in my API routes?&lt;/strong&gt;&lt;br&gt;
In API routes, the natural pattern was one route = one handler = one query. In Server Actions, co-location with the component invites you to create one Action per entity, and components end up calling several Actions in the same render. That composition generates multiple round-trips that never existed in an API route because the query was centralized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Prisma ORM 5 have any mechanism to automatically detect N+1?&lt;/strong&gt;&lt;br&gt;
Not automatically at runtime, but you can enable query logging (&lt;code&gt;log: ["query"]&lt;/code&gt;) to see every query Prisma runs in development. There are community proposals for a native N+1 detector, but as of this post it's not a stable feature. The &lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;official optimization docs&lt;/a&gt; document the patterns to avoid, but detection is still manual or via external tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many &lt;code&gt;PrismaClient&lt;/code&gt; instances should I have in a Next.js 16 project?&lt;/strong&gt;&lt;br&gt;
One, shared through the singleton pattern on &lt;code&gt;globalThis&lt;/code&gt;. More than one instance means more than one connection pool, which under SSR load can exhaust the available database connections. This is especially critical on serverless providers, where each function instance can run in its own process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is &lt;code&gt;Promise.all&lt;/code&gt; inside an Action enough to fix the pool problem?&lt;/strong&gt;&lt;br&gt;
For multiple independent queries inside a single Action, it helps: &lt;code&gt;Promise.all&lt;/code&gt; runs them concurrently within the same invocation, so you pay for one Action round-trip instead of several. Keep in mind that each in-flight query still borrows a connection from the pool, so three parallel queries can briefly hold up to three connections. What &lt;code&gt;Promise.all&lt;/code&gt; does &lt;em&gt;not&lt;/em&gt; fix is multiple independent Actions fired from different components in the same render. That needs consolidation at the architecture level.&lt;/p&gt;
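&lt;p&gt;To make "in parallel" concrete, here is a small sketch with no database involved. &lt;code&gt;fakeQuery&lt;/code&gt; is a hypothetical stand-in for a Prisma call that only tracks how many calls are in flight at once. The assumption I'm illustrating is that every in-flight query needs its own pooled connection, so the peak below approximates how many connections one invocation can hold.&lt;/p&gt;

```typescript
// Hypothetical stand-in for a Prisma call: it only tracks how many
// "queries" are in flight at the same time. No database is involved.
let inFlight = 0
let peakInFlight = 0

async function fakeQuery(label: string) {
  inFlight += 1
  peakInFlight = Math.max(peakInFlight, inFlight)
  // Yield to the event loop, as a real network round-trip would
  await new Promise((resolve) => setTimeout(resolve, 5))
  inFlight -= 1
  return label
}

function resetCounters() {
  inFlight = 0
  peakInFlight = 0
}

// Three awaits in a row: at most one query is in flight at any moment
async function sequentialPeak() {
  resetCounters()
  await fakeQuery("profile")
  await fakeQuery("orders")
  await fakeQuery("notifications")
  return peakInFlight
}

// Promise.all: all three queries start before any of them finishes
async function parallelPeak() {
  resetCounters()
  await Promise.all([fakeQuery("profile"), fakeQuery("orders"), fakeQuery("notifications")])
  return peakInFlight
}
```

&lt;p&gt;The trade-off: sequential awaits hold one connection at a time but add up the latencies, while &lt;code&gt;Promise.all&lt;/code&gt; finishes sooner and holds more connections for a shorter window.&lt;/p&gt;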

&lt;p&gt;&lt;strong&gt;How does this affect Next.js 16 caching?&lt;/strong&gt;&lt;br&gt;
Next.js 16 has Data Cache and Full Route Cache. If you use &lt;code&gt;fetch&lt;/code&gt; or &lt;code&gt;unstable_cache&lt;/code&gt;, you can cache the result of a Server Action. But the N+1 happens &lt;em&gt;before&lt;/em&gt; the cache — if the Action isn't cached (mutations, data with &lt;code&gt;no-store&lt;/code&gt;), every request executes the queries. The right pattern is to cache the entire Action with &lt;code&gt;unstable_cache&lt;/code&gt; when the data allows it, not to cache individual queries inside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this pattern also apply to Prisma with pure Server Components (no Actions)?&lt;/strong&gt;&lt;br&gt;
Yes, but with a difference: in Server Components without Actions, queries live directly in the component and Next.js can do component-level caching more easily. The composition problem is more acute with Server Actions because the mental model of "one Action = one button or form" leads to excessive granularity that multiplies round-trips.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm keeping and what I'm not buying
&lt;/h2&gt;

&lt;p&gt;I'm keeping this pattern: &lt;strong&gt;one Action per use case, not one Action per entity&lt;/strong&gt;. That's the most important mindset shift when migrating from API routes to Server Actions with Prisma.&lt;/p&gt;

&lt;p&gt;What I'm not buying is the narrative that Server Actions automatically simplify the data model. They simplify the boilerplate — shared types, no explicit endpoint — but the responsibility to not multiply queries is still yours. If you were coming from API routes where one route = one well-considered query, jumping to Actions can lead to query sprawl that's actually worse.&lt;/p&gt;

&lt;p&gt;The honest trade-off: Server Actions win on DX and co-location. They lose on visibility into which queries fire per render if you don't have logging active. Before deploying any page with multiple Actions, pull up the dev terminal with &lt;code&gt;log: ["query"]&lt;/code&gt; running and count how many &lt;code&gt;SELECT&lt;/code&gt;s appear per render. If the number surprises you, you have work to do.&lt;/p&gt;

&lt;p&gt;This connects directly to what I documented in &lt;a href="https://juanchi.dev/en/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM&lt;/a&gt; — the ORM is rarely the problem. Query shape is. And in Next.js 16 with Server Actions, shape is defined by the Action architecture, not by Prisma.&lt;/p&gt;

&lt;p&gt;For those coming from the Spring Boot world, there's an interesting parallel with &lt;a href="https://juanchi.dev/en/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;retry budget and amplification&lt;/a&gt;: every abstraction that looks like a simplification introduces its own amplification vector. In Server Actions, that vector is granular query composition.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;Prisma Docs — Query optimization &amp;amp; performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/prisma-server-actions-nextjs-16-n1-composition-patterns" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>typescript</category>
      <category>performance</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>Prisma Server Actions in Next.js 16: the patterns that work and the N+1 that shows up when you least expect it</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Mon, 18 May 2026 12:31:21 +0000</pubDate>
      <link>https://forem.com/jtorchia/prisma-server-actions-en-nextjs-16-los-patrones-que-funcionan-y-el-n1-que-aparece-cuando-no-lo-135e</link>
      <guid>https://forem.com/jtorchia/prisma-server-actions-en-nextjs-16-los-patrones-que-funcionan-y-el-n1-que-aparece-cuando-no-lo-135e</guid>
      <description>&lt;h1&gt;
  
  
  Prisma Server Actions in Next.js 16: the patterns that work and the N+1 that shows up when you least expect it
&lt;/h1&gt;

&lt;p&gt;Next.js 16 shipped recently with App Router improvements and Server Actions stabilized as a first-class primitive. The community is adopting Server Actions as the natural replacement for API routes on mutations. The migration looks obvious: less boilerplate, co-location with the component, shared types between client and server. I started moving in that direction too. And somewhere along the way I ran into an N+1 that didn't come from Prisma: it came from &lt;em&gt;how I was composing the Actions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My thesis is this: Prisma ORM 5 doesn't introduce N+1 in Server Actions. &lt;strong&gt;Server Action composition&lt;/strong&gt; does: the pattern of calling multiple independent Actions from the same component, or chaining them without collapsing the queries. It's an architecture problem, not an ORM problem. And it has a solution, but you have to know where to look.&lt;/p&gt;




&lt;h2&gt;
  
  
  Classic N+1 vs. composition N+1 in Server Actions
&lt;/h2&gt;

&lt;p&gt;In the classic N+1 with Prisma, the problem is well known: you iterate over a list and fire a separate query per item because you forgot the &lt;code&gt;include&lt;/code&gt;. The &lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;official Prisma optimization docs&lt;/a&gt; cover it precisely: the fix is to use &lt;code&gt;include&lt;/code&gt; or &lt;code&gt;select&lt;/code&gt; with nested relations, or in more complex cases, &lt;code&gt;findMany&lt;/code&gt; with relational filters instead of queries in a loop.&lt;/p&gt;

&lt;p&gt;Composition N+1 in Server Actions is different. It doesn't appear in the body of a single Action. It appears when the component calls &lt;em&gt;several&lt;/em&gt; Actions in sequence or in parallel, and each Action issues its own Prisma query and takes its own connection from the pool. Under SSR load, that turns into connection pool pressure that doesn't show up in local tests.&lt;/p&gt;

&lt;p&gt;Look at this problematic pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/dashboard/page.tsx&lt;/span&gt;
&lt;span class="c1"&gt;// ⚠️ Problematic pattern: three independent Actions&lt;/span&gt;
&lt;span class="c1"&gt;// each one takes its own connection from the pool&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getUserProfile&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/usuario&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getRecentOrders&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/pedidos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getNotifications&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/actions/notificaciones&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;DashboardPage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Three separate round-trips, three pool connections&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;perfil&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUserProfile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pedidos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getRecentOrders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notificaciones&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getNotifications&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Dashboard&lt;/span&gt; &lt;span class="nx"&gt;perfil&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;perfil&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;pedidos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;pedidos&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;notificaciones&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;notificaciones&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of those Actions has its own &lt;code&gt;prisma.user.findUnique&lt;/code&gt;, its own &lt;code&gt;prisma.order.findMany&lt;/code&gt;, its own &lt;code&gt;prisma.notification.findMany&lt;/code&gt;. Three queries that could be resolved with a single well-designed call, or at least with &lt;code&gt;Promise.all&lt;/code&gt; to parallelize them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The connection pool under SSR load
&lt;/h2&gt;

&lt;p&gt;Prisma uses an internal connection pool. In the Next.js App Router with SSR, each request can fire multiple Server Actions in the same render. If every component on the page calls its own Action, the pool gets a short but intense burst of connections on every user visit.&lt;/p&gt;

&lt;p&gt;The most common cause of this problem is instantiating &lt;code&gt;PrismaClient&lt;/code&gt; separately in each module instead of sharing a single global &lt;code&gt;prisma&lt;/code&gt; instance. The Prisma documentation explicitly recommends a singleton instance in serverless and SSR environments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/prisma.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Singleton pattern recommended by Prisma for Next.js&lt;/span&gt;
&lt;span class="c1"&gt;// Source: https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PrismaClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@prisma/client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PrismaClient&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PrismaClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;warn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;globalForPrisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't use this pattern, every hot reload in development, and potentially every cold start in production on some providers, can instantiate a new &lt;code&gt;PrismaClient&lt;/code&gt; with its own pool. The result: exhausted connections with no obvious warning in the logs.&lt;/p&gt;
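&lt;p&gt;A back-of-the-envelope sketch of why the burst matters. The function names and the numbers are mine, for illustration only. The one external assumption is Prisma's documented default pool size of &lt;code&gt;num_physical_cpus * 2 + 1&lt;/code&gt; when &lt;code&gt;connection_limit&lt;/code&gt; isn't set. Check the docs for your version and database before relying on it.&lt;/p&gt;

```typescript
// Default pool size when connection_limit is not set (assumption: Prisma's
// documented default of num_physical_cpus * 2 + 1).
function defaultPoolSize(physicalCpus: number): number {
  return physicalCpus * 2 + 1
}

// Rough upper bound on connections wanted at the same instant:
// every concurrent render times the queries it keeps in flight.
function peakConnectionDemand(concurrentRenders: number, queriesInFlightPerRender: number): number {
  return concurrentRenders * queriesInFlightPerRender
}

const pool = defaultPoolSize(2) // 5 connections on a 2-CPU instance
const fragmented = peakConnectionDemand(4, 3) // 12: three queries in flight per render
const collapsed = peakConnectionDemand(4, 1) // 4: one consolidated query per render

console.log({
  pool,
  fragmented,
  collapsed,
  fragmentedFits: pool >= fragmented, // false: the extra queries have to wait
  collapsedFits: pool >= collapsed, // true
})
```

&lt;p&gt;When demand exceeds the pool, queries queue for a free connection. That waiting is the kind of thing you never see in local tests with a single user.&lt;/p&gt;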




&lt;h2&gt;
  
  
  The patterns that work: collapsing queries into a single Action
&lt;/h2&gt;

&lt;p&gt;The antidote to composition N+1 is simple to state but takes discipline: &lt;strong&gt;one Action per use case, not one Action per entity&lt;/strong&gt;. Instead of three independent Actions for the dashboard, a single Action that groups the three queries with &lt;code&gt;Promise.all&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// actions/dashboard.ts&lt;/span&gt;
&lt;span class="c1"&gt;// ✅ Correct pattern: one Action that collapses the queries&lt;/span&gt;
&lt;span class="c1"&gt;// Promise.all runs the three queries concurrently within a single invocation&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/auth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getDashboardData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Not authenticated&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="c1"&gt;// A single Action invocation: three queries in parallel&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;perfil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pedidos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notificaciones&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;select&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;nombre&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;avatarUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;creadoEn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;gte&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;creadoEn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;leida&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;creadoEn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;])&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;perfil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pedidos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notificaciones&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference isn't just in the queries: it's in the design. An Action that groups the data for a specific use case is easier to cache, easier to test, and more honest about the problem it solves.&lt;/p&gt;




&lt;h2&gt;
  
  
  The include that got forgotten and the query that multiplied
&lt;/h2&gt;

&lt;p&gt;The classic N+1 still exists inside Actions. If you iterate over results and run a nested query per item, Prisma won't prevent it on its own: that's on you. The most frequent pattern I see in codebases that are starting out with Server Actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ⚠️ Classic N+1 inside an Action&lt;/span&gt;
&lt;span class="c1"&gt;// One query per order to fetch the product&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPedidosConProductos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pedidos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="c1"&gt;// ❌ N+1: one query per order&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pedidosConProducto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;pedidos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pedido&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;producto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pedido&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;productId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;pedido&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;producto&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;pedidosConProducto&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is to collapse the lookups into the original query with &lt;code&gt;include&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Correct include: a single Prisma call that loads the relation together with the orders&lt;/span&gt;
&lt;span class="c1"&gt;// Query count stays constant regardless of order count (a batched relation query by default, not necessarily a SQL JOIN)&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPedidosConProductos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;producto&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;select&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;nombre&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;precio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;imagenUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;creadoEn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;desc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;select&lt;/code&gt; inside the &lt;code&gt;include&lt;/code&gt; matters: you don't fetch the whole &lt;code&gt;producto&lt;/code&gt; object, only the fields the component needs. That shrinks the serialized payload Next.js has to transfer between server and client.&lt;/p&gt;
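
&lt;p&gt;To make the difference in query count concrete, here is a minimal sketch that counts round-trips against an in-memory stand-in for the database. It is not Prisma: &lt;code&gt;FakeDb&lt;/code&gt;, &lt;code&gt;findProductById&lt;/code&gt;, &lt;code&gt;findProductsByIds&lt;/code&gt;, and the sample data are all hypothetical names chosen for illustration.&lt;/p&gt;

```typescript
// In-memory stand-in for a database. NOT Prisma; every name here is hypothetical.
type Product = { id: string; nombre: string }
type Order = { id: string; productId: string }

class FakeDb {
  queryCount = 0
  private products: Product[] = [
    { id: "p1", nombre: "Teclado" },
    { id: "p2", nombre: "Mouse" },
  ]

  // One round-trip per call, like a findUnique inside a loop
  async findProductById(id: string) {
    this.queryCount += 1
    return this.products.find((p) => p.id === id) ?? null
  }

  // One round-trip for the whole batch, like a single query with IN (...)
  async findProductsByIds(ids: string[]) {
    this.queryCount += 1
    return this.products.filter((p) => ids.indexOf(p.id) !== -1)
  }
}

const orders: Order[] = [
  { id: "o1", productId: "p1" },
  { id: "o2", productId: "p2" },
  { id: "o3", productId: "p1" },
]

// N+1 shape: one product lookup per order
async function loadPerOrder(db: FakeDb) {
  return Promise.all(
    orders.map(async (order) => ({
      ...order,
      producto: await db.findProductById(order.productId),
    })),
  )
}

// Collapsed shape: one batched lookup, then matched to the orders in memory
async function loadBatched(db: FakeDb) {
  const ids = Array.from(new Set(orders.map((o) => o.productId)))
  const products = await db.findProductsByIds(ids)
  return orders.map((order) => ({
    ...order,
    producto: products.find((p) => p.id === order.productId) ?? null,
  }))
}
```

&lt;p&gt;With the three sample orders, &lt;code&gt;loadPerOrder&lt;/code&gt; leaves &lt;code&gt;queryCount&lt;/code&gt; at 3 and &lt;code&gt;loadBatched&lt;/code&gt; leaves it at 1, with identical results. The first number grows with the number of orders; the second doesn't.&lt;/p&gt;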




&lt;h2&gt;
  
  
  Real gotchas: what the 15-minute tutorial doesn't show
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;"use server"&lt;/code&gt; doesn't guarantee that Prisma errors are serialized for the client.&lt;/strong&gt; If an Action throws a &lt;code&gt;PrismaClientKnownRequestError&lt;/code&gt; (a constraint violation, for example), the error doesn't always reach the client in the shape you expect; production builds in particular tend to replace the message of an error thrown on the server with a generic one. Wrap the call in try/catch and serialize the error explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// actions/usuario.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Explicit handling of Prisma errors in Server Actions&lt;/span&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@/lib/prisma&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Prisma&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@prisma/client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;crearUsuario&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;nombre&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Unique constraint violation (P2002 in Prisma)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;Prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PrismaClientKnownRequestError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;P2002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;That email is already registered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Unexpected error: log it, don't expose it&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[crearUsuario]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Internal error. Please try again.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
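
&lt;p&gt;One way to make that contract explicit is a discriminated result type, so the caller can't read a field that isn't there. The sketch below is an assumption about how you might shape it, not a Next.js or Prisma API: &lt;code&gt;ActionResult&lt;/code&gt;, &lt;code&gt;errorCode&lt;/code&gt;, &lt;code&gt;toActionError&lt;/code&gt;, and &lt;code&gt;summarize&lt;/code&gt; are hypothetical names, and it checks a plain &lt;code&gt;code&lt;/code&gt; property only to stay self-contained (a real Action would keep the &lt;code&gt;instanceof&lt;/code&gt; check shown above).&lt;/p&gt;

```typescript
// Hypothetical result shape for an Action; not part of Next.js or Prisma.
type ActionResult =
  | { ok: true; id: string }
  | { ok: false; error: string }

// Reads a string `code` property from an unknown thrown value, if present.
function errorCode(error: unknown): string | null {
  if (typeof error !== "object" || error === null) return null
  const code = (error as { code?: unknown }).code
  return typeof code === "string" ? code : null
}

// Maps a thrown value to a plain, serializable failure.
function toActionError(error: unknown): ActionResult {
  if (errorCode(error) === "P2002") {
    return { ok: false, error: "That email is already registered" }
  }
  // Anything else: keep the details on the server, return a generic message
  return { ok: false, error: "Internal error. Please try again." }
}

// Caller side: the `ok` flag narrows the union, so `id` and `error`
// are only reachable on the branch where they exist.
function summarize(result: ActionResult): string {
  return result.ok ? `created ${result.id}` : `failed: ${result.error}`
}
```

&lt;p&gt;The design choice here is that both branches are plain objects, so whatever crosses the server/client boundary is something you chose to send rather than whatever the runtime did with a thrown error.&lt;/p&gt;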



&lt;p&gt;&lt;strong&gt;Query logging in development is your best diagnostic tool.&lt;/strong&gt; The singleton above already sets &lt;code&gt;log: ["query"]&lt;/code&gt; in development, which lets you see exactly how many queries each render fires. If the same &lt;code&gt;SELECT&lt;/code&gt; shows up N times in the terminal, you have an N+1 and can fix it before it reaches production.&lt;/p&gt;
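
&lt;p&gt;Counting repeats by eye gets tedious once a render fires more than a handful of queries. As a rough sketch, the helper below groups identical SQL strings and reports the ones that repeat. It assumes you have already collected the statements as plain strings, one per entry and without any log prefix, and that parameters appear as placeholders so repeated lookups produce identical text; &lt;code&gt;findRepeatedQueries&lt;/code&gt; and the sample statements are hypothetical.&lt;/p&gt;

```typescript
// Counts identical SQL strings and returns the ones seen at least `threshold` times.
// Assumption: one SQL statement per entry, already stripped of any log prefix.
function findRepeatedQueries(statements: string[], threshold = 3) {
  const counts: { [sql: string]: number } = {}
  for (const sql of statements) {
    const key = sql.trim()
    counts[key] = (counts[key] ?? 0) + 1
  }
  return Object.keys(counts)
    .filter((sql) => (counts[sql] ?? 0) >= threshold)
    .map((sql) => ({ sql, count: counts[sql] ?? 0 }))
}

// Hypothetical statements captured during one render
const sample = [
  'SELECT "id", "nombre" FROM "Product" WHERE "id" = $1',
  'SELECT "id" FROM "Order" WHERE "userId" = $1',
  'SELECT "id", "nombre" FROM "Product" WHERE "id" = $1',
  'SELECT "id", "nombre" FROM "Product" WHERE "id" = $1',
]

const suspects = findRepeatedQueries(sample)
```

&lt;p&gt;For the sample above, &lt;code&gt;suspects&lt;/code&gt; holds a single entry: the &lt;code&gt;Product&lt;/code&gt; lookup, with a count of 3. That is the N+1 signature the paragraph describes.&lt;/p&gt;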

&lt;p&gt;&lt;strong&gt;Server Actions and React 19's &lt;code&gt;useOptimistic&lt;/code&gt; can hide the problem.&lt;/strong&gt; If you use &lt;code&gt;useOptimistic&lt;/code&gt; to update the UI before the Action resolves, perceived latency drops, but the queries are still there. Don't confuse better UX with optimized queries.&lt;/p&gt;

&lt;p&gt;This connects to something I documented when analyzing &lt;a href="https://juanchi.dev/es/blog/opentelemetry-spring-boot-logs-vs-traces-diagnostico" rel="noopener noreferrer"&gt;how OpenTelemetry in Spring Boot shows the real problem when the log says OK&lt;/a&gt;: the observability surface matters. In Next.js 16, without query traces, the Action's log can look healthy while the queries multiply underneath.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Prisma Server Actions Next.js 16 N+1
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why does N+1 show up in Server Actions if it didn't in my API routes?&lt;/strong&gt;&lt;br&gt;
With API routes, the natural pattern was one route = one handler = one query. With Server Actions, co-location with the component invites one Action per entity, and components end up calling several Actions in the same render. That composition produces multiple round-trips that didn't exist in an API route, where the query was centralized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Prisma ORM 5 have a mechanism to detect N+1 automatically?&lt;/strong&gt;&lt;br&gt;
Not automatically at runtime, but you can enable query logging (&lt;code&gt;log: ["query"]&lt;/code&gt;) to see the queries in development. There are community proposals for a native N+1 detector, but as of this post it isn't a stable feature. The &lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;official optimization documentation&lt;/a&gt; covers the patterns to avoid, but detection is still manual or done through external tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many &lt;code&gt;PrismaClient&lt;/code&gt; instances should a Next.js 16 project have?&lt;/strong&gt;&lt;br&gt;
Just one, using the singleton pattern with &lt;code&gt;globalThis&lt;/code&gt;. More than one instance means more than one connection pool, which under SSR load can exhaust the connections available on the database. This is especially critical on serverless providers, where each function can run in its own process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is &lt;code&gt;Promise.all&lt;/code&gt; inside an Action enough to solve the pool problem?&lt;/strong&gt;&lt;br&gt;
For multiple independent queries inside one Action, it solves the latency side: &lt;code&gt;Promise.all&lt;/code&gt; runs them concurrently within the same invocation, so the Action waits for the slowest query rather than for the sum of all of them. It doesn't shrink pool usage, though: each in-flight query still holds a connection from the pool while it runs. What &lt;code&gt;Promise.all&lt;/code&gt; does &lt;em&gt;not&lt;/em&gt; solve at all is multiple independent Actions fired from different components in the same render; there you need to consolidate at the architecture level.&lt;/p&gt;
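
&lt;p&gt;As a sketch of what that consolidation could look like, the example below folds three independent reads into one function per use case and runs them concurrently with &lt;code&gt;Promise.all&lt;/code&gt;. The data-access functions are hypothetical stand-ins that only simulate a round-trip with a timer; in a real project each would be a Prisma call and &lt;code&gt;getDashboard&lt;/code&gt; would live in a &lt;code&gt;"use server"&lt;/code&gt; file.&lt;/p&gt;

```typescript
// Hypothetical data-access functions standing in for Prisma calls.
// Each one resolves after a short delay to simulate a database round-trip.
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function getUser(userId: string) {
  await delay(20)
  return { id: userId, nombre: "Ana" }
}

async function getOrderCount(userId: string) {
  await delay(20)
  // Placeholder value, not a real count
  return userId === "" ? 0 : 3
}

async function getUnreadNotifications(userId: string) {
  await delay(20)
  return [`welcome-${userId}`]
}

// One Action per use case: the dashboard asks once, and the three
// independent reads run concurrently inside that single invocation.
async function getDashboard(userId: string) {
  const [user, orderCount, notifications] = await Promise.all([
    getUser(userId),
    getOrderCount(userId),
    getUnreadNotifications(userId),
  ])
  return { user, orderCount, notifications }
}
```

&lt;p&gt;The component then calls &lt;code&gt;getDashboard&lt;/code&gt; once instead of calling three separate Actions, so the composition lives in one place where you can see, and later collapse, the queries it fires.&lt;/p&gt;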

&lt;p&gt;&lt;strong&gt;How does this affect Next.js 16 caching?&lt;/strong&gt;&lt;br&gt;
Next.js 16 has a Data Cache and a Full Route Cache. If you use &lt;code&gt;fetch&lt;/code&gt; or &lt;code&gt;unstable_cache&lt;/code&gt;, you can cache the result of a Server Action. But the N+1 happens &lt;em&gt;before&lt;/em&gt; the cache: if the Action isn't cached (in mutations, for example, or on data marked &lt;code&gt;no-store&lt;/code&gt;), every request runs the queries. The right pattern is to cache the whole Action with &lt;code&gt;unstable_cache&lt;/code&gt; when the data allows it, rather than caching individual queries inside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this pattern also apply to Prisma with pure Server Components (no Actions)?&lt;/strong&gt;&lt;br&gt;
Yes, with one difference: in Server Components without Actions, the queries live directly in the component and Next.js can cache at the component level more easily. The composition problem gets worse with Server Actions because the mental model of "one Action = one button or form" leads to excessive granularity that multiplies round-trips.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm keeping and what I'm not buying
&lt;/h2&gt;

&lt;p&gt;I'm keeping this pattern: &lt;strong&gt;one Action per use case, not one Action per entity&lt;/strong&gt;. It's the most important mindset shift when migrating from API routes to Server Actions with Prisma.&lt;/p&gt;

&lt;p&gt;What I'm not buying is the narrative that Server Actions automatically simplify the data model. They simplify the boilerplate (the shared type, the explicit endpoint), but the responsibility for not multiplying queries is still yours. If you came from API routes where one route = one well-thought-out query, the jump to Actions can lead to a scattering of queries that's worse.&lt;/p&gt;

&lt;p&gt;The honest trade-off: Server Actions win on DX and co-location. They lose on visibility into which queries fire per render unless logging is on. Before deploying any page with multiple Actions, check the development terminal with &lt;code&gt;log: ["query"]&lt;/code&gt; enabled and count how many &lt;code&gt;SELECT&lt;/code&gt; statements appear per render. If the number surprises you, you have work to do.&lt;/p&gt;

&lt;p&gt;This connects directly to what I documented in &lt;a href="https://juanchi.dev/es/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM&lt;/a&gt;: the ORM is rarely the problem. The shape of the queries is. And in Next.js 16 with Server Actions, that shape is defined by the architecture of the Actions, not by Prisma.&lt;/p&gt;

&lt;p&gt;For those coming from the Spring Boot world, there's an interesting parallel with &lt;a href="https://juanchi.dev/es/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;retry budgets and amplification&lt;/a&gt;: every abstraction that seems to simplify introduces its own amplification vector. In Server Actions, that vector is granular query composition.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.prisma.io/docs/orm/prisma-client/queries/query-optimization-performance" rel="noopener noreferrer"&gt;Prisma Docs — Query optimization &amp;amp; performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/prisma-server-actions-nextjs-16-n1-produccion" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>typescript</category>
      <category>performance</category>
    </item>
    <item>
      <title>Spring Boot 2026: Why Measuring Only Startup Time Is a Trap</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 21:22:44 +0000</pubDate>
      <link>https://forem.com/jtorchia/spring-boot-2026-why-measuring-only-startup-time-is-a-trap-2oa3</link>
      <guid>https://forem.com/jtorchia/spring-boot-2026-why-measuring-only-startup-time-is-a-trap-2oa3</guid>
      <description>&lt;p&gt;There's a question that surfaces every time someone mentions GraalVM or Spring AOT in a technical meeting: &lt;em&gt;how long does it take to start?&lt;/em&gt; It's the first metric that hits the screen, the number that closes the debate in five minutes. The problem is that question alone isn't enough to make any serious architecture decision, and in 2026 we have enough evidence to prove it with a reproducible lab.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/JuanTorchia/springboot-jvm-2026" rel="noopener noreferrer"&gt;&lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;&lt;/a&gt; (tag &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt;) around exactly that working hypothesis: if you only look at startup time, you're ignoring half the costs that actually matter in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lab backend is not a Hello World
&lt;/h2&gt;

&lt;p&gt;Choosing what to measure matters as much as measuring it. A &lt;code&gt;GET /ping&lt;/code&gt; endpoint that returns &lt;code&gt;{"status":"ok"}&lt;/code&gt; doesn't activate the same bean graph or the same JIT behavior as a real application. So the lab backend has concrete surface area:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/orders&lt;/code&gt; with Jakarta Validation on a record&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/orders/{id}&lt;/code&gt; with Spring Data JDBC on PostgreSQL 17&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/work&lt;/code&gt; with deterministic work (iterative CRC32, up to 5,000 iterations)&lt;/li&gt;
&lt;li&gt;Flyway for migrations, Actuator for readiness/liveness&lt;/li&gt;
&lt;li&gt;HikariCP with the pool explicitly configured in the &lt;code&gt;benchmark&lt;/code&gt; profile&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;WorkService&lt;/code&gt; deserves its own paragraph because it backs the only endpoint that mixes real CPU work with a database query (&lt;code&gt;countOrders()&lt;/code&gt;). That matters: without that endpoint, native and classic JVM look practically identical on warm latency because the JIT has nothing interesting to optimize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// WorkService.java — deterministic work to force real differences between modes&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;calculateScore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getBytes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="no"&gt;CRC32&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="no"&gt;CRC32&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;longToBytes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
        &lt;span class="c1"&gt;// rotate, then add the 64-bit golden-ratio constant (as in Fibonacci hashing) for dispersion&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;rotateLeft&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mh"&gt;0x9E3779B97F4A7C15&lt;/span&gt;&lt;span class="no"&gt;L&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MAX_VALUE&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;5_000&lt;/code&gt; iteration cap isn't arbitrary: I validated it with &lt;code&gt;WorkServiceTest&lt;/code&gt; to keep the cap predictable and prevent the benchmark from accidentally becoming a throughput test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four modes, four distinct operational surfaces
&lt;/h2&gt;

&lt;p&gt;The lab compares:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;jvm&lt;/code&gt;: &lt;code&gt;java -jar&lt;/code&gt; on Eclipse Temurin 21, the baseline for every team that hasn't touched anything&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cds&lt;/code&gt;: JVM with a dynamic AppCDS archive prepared in a separate phase&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aot-jvm&lt;/code&gt;: Spring Boot AOT on JVM, &lt;strong&gt;with &lt;code&gt;-Dspring.aot.enabled=true&lt;/code&gt; verified in the container&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;native&lt;/code&gt;: GraalVM Native Image compiled inside &lt;code&gt;ghcr.io/graalvm/native-image-community:21&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point about AOT has a story. In the editorial run on May 17, 2026 (17:31–17:44 Buenos Aires time), the &lt;code&gt;aot-jvm&lt;/code&gt; results made no sense until I confirmed the flag was actually reaching the container. Without &lt;code&gt;spring.aot.enabled=true&lt;/code&gt; verified in the runtime env, AOT mode is indistinguishable from classic JVM on startup. The &lt;code&gt;results/environment.json&lt;/code&gt; captures exactly that so anyone reproducing the lab knows what was actually running.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Dockerfile.native&lt;/code&gt; does the full build inside the builder container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile.native — the native build happens inside the builder, no local GraalVM required&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;ghcr.io/graalvm/native-image-community:21&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /workspace&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;microdnf &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; maven &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; microdnf clean all
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; .mvn/ .mvn/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; mvnw pom.xml ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src/ src/&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./mvnw &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./mvnw &lt;span class="nt"&gt;-Pnative&lt;/span&gt; &lt;span class="nt"&gt;-DskipTests&lt;/span&gt; native:compile

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:24.04&lt;/span&gt;
&lt;span class="c"&gt;# final image with no JRE: just the compiled binary&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /workspace/target/startup-lab /workspace/startup-lab&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/workspace/startup-lab"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the &lt;code&gt;startup-lab&lt;/code&gt; binary runs without a JRE in the final image. Smaller image, much faster startup, but the cost shifted entirely to build time. That's the central trade-off of native mode: you don't eliminate work, you move it from runtime to build time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the startup number doesn't capture
&lt;/h2&gt;

&lt;p&gt;In this local matrix, native reduced startup time and RSS compared to JVM modes. That's true and reproducible on the &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt; tag. But that number alone doesn't tell the full story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build time&lt;/strong&gt; for native is an order of magnitude higher than a classic &lt;code&gt;mvn package&lt;/code&gt;. If you're on a CI pipeline with frequent deploys, that cost shows up on every merge to main. It's not a startup cost: it's a development cycle cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First-request latency&lt;/strong&gt; can differ materially from warm latency. On classic JVM, the first request pays the cost of unloaded classes and a cold JIT. On native there's no JIT, so the first request and request number one thousand have a similar profile. That can be an advantage or a disadvantage depending on your actual load profile.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AppCDS preparation cost&lt;/strong&gt; is a third dimension that only appears in &lt;code&gt;cds&lt;/code&gt; mode: there's an archive dump phase that runs before the container is ready for traffic. Operationally that means an initialization step that doesn't exist in the other modes, and that you need to model in your deploy pipeline if CDS is the option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warm latency&lt;/strong&gt; under sustained load, GC behavior under high memory pressure, and scheduling on Kubernetes are dimensions this lab intentionally doesn't measure. Running three iterations on Docker Desktop over WSL2 on Windows is not production. What the lab does guarantee is local reproducibility: anyone can clone the repo and reproduce the matrix with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows — full editorial run with 3 runs per mode and native enabled&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;powershell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-NoProfile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Bypass&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Preset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The decision startup time can't make on its own
&lt;/h2&gt;

&lt;p&gt;My position after building this: startup time is useful as a tiebreaker when everything else is even. Using it as the primary metric to choose between classic JVM, AppCDS, AOT-JVM, and native is making an architecture decision on a single axis.&lt;/p&gt;

&lt;p&gt;What I can claim with evidence from this matrix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the requirement is startup around 1.4 seconds and controlled RSS in this matrix, native delivers that, but you pay for it with a longer build and no JIT optimization once the service is warm.&lt;/li&gt;
&lt;li&gt;If the team needs fast CI cycles and current startup is tolerable, AOT-JVM with &lt;code&gt;-Dspring.aot.enabled=true&lt;/code&gt; improves boot time without changing the deploy artifact.&lt;/li&gt;
&lt;li&gt;AppCDS has the lowest operational change cost of all four, but it has that preparation phase that needs to be explicitly modeled.&lt;/li&gt;
&lt;li&gt;Classic JVM is still the correct baseline for any comparison. Dropping it without measuring the other three axes is pure vibes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no universal winner. There are trade-offs that depend on how many times per hour the service scales, how heavy the CI pipeline is, and whether the team can take on the additional operational complexity of native.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/springboot-jvm-2026" rel="noopener noreferrer"&gt;&lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;&lt;/a&gt;, tag &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt;. Raw results are in &lt;code&gt;results/raw/*.json&lt;/code&gt; and the aggregated matrix in &lt;code&gt;results/comparison.md&lt;/code&gt;. If you're going to cite it, use the wording from the README: &lt;em&gt;"In the &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt; tag of &lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;, measured locally on Windows Docker Desktop/WSL2..."&lt;/em&gt; — that environment context isn't a decorative disclaimer, it's part of the data.&lt;/p&gt;

&lt;p&gt;What's the dimension that drives your decision most between these four modes? Build time, warm latency, or library compatibility on native?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/spring-boot-startup-time-2026-graalvm-native-aot-cds" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>performance</category>
      <category>arquitectura</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Spring Boot 2026: why measuring only startup time is a trap</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 21:22:37 +0000</pubDate>
      <link>https://forem.com/jtorchia/spring-boot-2026-por-que-medir-solo-startup-time-es-una-trampa-2o0h</link>
      <guid>https://forem.com/jtorchia/spring-boot-2026-por-que-medir-solo-startup-time-es-una-trampa-2o0h</guid>
      <description>&lt;p&gt;There's a question that comes up every time someone brings up GraalVM or Spring AOT in a technical meeting: &lt;em&gt;how long does it take to start?&lt;/em&gt; It's the first metric that goes up on the screen, the number that closes the debate in five minutes. The problem is that this question alone isn't enough to make any serious architecture decision, and in 2026 we have enough evidence to show it with a reproducible lab.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/JuanTorchia/springboot-jvm-2026" rel="noopener noreferrer"&gt;&lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;&lt;/a&gt; (tag &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt;) with exactly that working hypothesis: if you only look at startup time, you're ignoring half of the costs that matter in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lab backend is not a Hello World
&lt;/h2&gt;

&lt;p&gt;Choosing what to measure matters as much as measuring. A &lt;code&gt;GET /ping&lt;/code&gt; endpoint that returns &lt;code&gt;{"status":"ok"}&lt;/code&gt; doesn't exercise the same bean graph or the same JIT behavior as a real application. That's why the lab backend has a concrete surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/orders&lt;/code&gt; with Jakarta Validation on a record&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/orders/{id}&lt;/code&gt; with Spring Data JDBC on PostgreSQL 17&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/work&lt;/code&gt; with deterministic work (iterative CRC32, up to 5,000 iterations)&lt;/li&gt;
&lt;li&gt;Flyway for migrations, Actuator for readiness/liveness&lt;/li&gt;
&lt;li&gt;HikariCP with the pool configured explicitly in the &lt;code&gt;benchmark&lt;/code&gt; profile
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;WorkService&lt;/code&gt; deserves its own paragraph because it backs the only endpoint that mixes real CPU work with a database query (&lt;code&gt;countOrders()&lt;/code&gt;). That matters: without that endpoint, native and classic JVM look practically identical in warm latency because the JIT has nothing interesting to optimize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// WorkService.java — trabajo determinístico para forzar diferencias reales entre modos&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;calculateScore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getBytes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="no"&gt;CRC32&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="no"&gt;CRC32&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;longToBytes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
        &lt;span class="c1"&gt;// rotación + constante Fibonacci aurea para dispersión&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;rotateLeft&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mh"&gt;0x9E3779B97F4A7C15&lt;/span&gt;&lt;span class="no"&gt;L&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MAX_VALUE&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;5_000&lt;/code&gt;-iteration limit isn't arbitrary: I validated it with &lt;code&gt;WorkServiceTest&lt;/code&gt; so the cap is predictable and the benchmark doesn't turn into an accidental throughput test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four modes, four different operational surfaces
&lt;/h2&gt;

&lt;p&gt;The lab compares:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;jvm&lt;/code&gt;: &lt;code&gt;java -jar&lt;/code&gt; on Eclipse Temurin 21, the baseline for every company that hasn't changed anything&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cds&lt;/code&gt;: JVM with a dynamic AppCDS archive prepared in a separate phase&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aot-jvm&lt;/code&gt;: Spring Boot AOT on the JVM, &lt;strong&gt;with &lt;code&gt;-Dspring.aot.enabled=true&lt;/code&gt; verified in the container&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;native&lt;/code&gt;: GraalVM Native Image compiled inside &lt;code&gt;ghcr.io/graalvm/native-image-community:21&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That AOT item has a backstory. In the editorial run of May 17, 2026 (17:31–17:44 Buenos Aires time), the &lt;code&gt;aot-jvm&lt;/code&gt; results made no sense until I confirmed that the flag was reaching the container. Without &lt;code&gt;spring.aot.enabled=true&lt;/code&gt; verified in the runtime env, AOT mode is indistinguishable from classic JVM in startup. &lt;code&gt;results/environment.json&lt;/code&gt; captures exactly that, so anyone reproducing the lab knows what was running.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Dockerfile.native&lt;/code&gt; runs the complete build inside the builder container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile.native — el build de native ocurre dentro del builder, no requiere GraalVM local&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;ghcr.io/graalvm/native-image-community:21&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /workspace&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;microdnf &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; maven &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; microdnf clean all
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; .mvn/ .mvn/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; mvnw pom.xml ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src/ src/&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./mvnw &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./mvnw &lt;span class="nt"&gt;-Pnative&lt;/span&gt; &lt;span class="nt"&gt;-DskipTests&lt;/span&gt; native:compile

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:24.04&lt;/span&gt;
&lt;span class="c"&gt;# imagen final sin JRE: solo el binario compilado&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /workspace/target/startup-lab /workspace/startup-lab&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/workspace/startup-lab"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the &lt;code&gt;startup-lab&lt;/code&gt; binary runs without a JRE in the final image. Smaller image, much faster startup, but the cost moved entirely to the build. That's the central decision of native mode: you don't eliminate work, you move it from runtime to build time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the startup number doesn't capture
&lt;/h2&gt;

&lt;p&gt;In this local matrix, native reduced startup time and RSS compared to the JVM modes. That's true and reproducible at the &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt; tag. But that number alone doesn't tell the whole story.&lt;/p&gt;

&lt;p&gt;Native's &lt;strong&gt;build time&lt;/strong&gt; is an order of magnitude higher than a classic &lt;code&gt;mvn package&lt;/code&gt;. If you're on a CI pipeline with frequent deploys, that cost shows up on every merge to main. It's not a startup cost: it's a development-cycle cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First-request latency&lt;/strong&gt; can differ materially from warm latency. On classic JVM, the first request pays for unloaded classes and a cold JIT. On native there's no JIT, so the first request and the thousandth request have a similar profile. That can be an advantage or a disadvantage depending on the real load profile.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AppCDS preparation cost&lt;/strong&gt; is a third moment that appears only in &lt;code&gt;cds&lt;/code&gt; mode: there's an archive dump phase that runs before the container is ready for traffic. Operationally, that implies an initialization step that doesn't exist in the other modes, and it has to be modeled in the deploy pipeline if CDS is the choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warm latency&lt;/strong&gt; under sustained load, GC behavior at high memory, and Kubernetes scheduling are dimensions this lab intentionally doesn't measure. Running three iterations on Docker Desktop over WSL2 on Windows is not production. What the lab does guarantee is local reproducibility: anyone can clone the repo and reproduce the matrix with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows — corrida editorial completa con 3 runs por modo y native habilitado&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;powershell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-NoProfile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Bypass&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Preset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The decision the startup number can't make on its own
&lt;/h2&gt;

&lt;p&gt;My position after building this: startup time is useful as a tiebreaker when everything else is even. Using it as the primary metric to choose between classic JVM, AppCDS, AOT-JVM, and native is making an architecture decision on a single axis.&lt;/p&gt;

&lt;p&gt;What I can claim with evidence from this matrix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the requirement is startup around 1.4 seconds and controlled RSS in this matrix, native delivers that, but you pay with a longer build and no JIT optimization once warm.&lt;/li&gt;
&lt;li&gt;If the team needs fast CI cycles and current startup is tolerable, AOT-JVM with &lt;code&gt;-Dspring.aot.enabled=true&lt;/code&gt; improves boot time without changing the deploy artifact.&lt;/li&gt;
&lt;li&gt;AppCDS has the lowest operational change cost of all four, but it has that preparation phase that needs to be explicitly modeled.&lt;/li&gt;
&lt;li&gt;Classic JVM is still the correct baseline for any comparison. Dropping it without measuring the other three axes is pure vibes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no universal winner. There are trade-offs that depend on how many times per hour the service scales, how heavy the CI pipeline is, and whether the team can take on the additional operational complexity of native.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/springboot-jvm-2026" rel="noopener noreferrer"&gt;&lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;&lt;/a&gt;, tag &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt;. Raw results are in &lt;code&gt;results/raw/*.json&lt;/code&gt; and the aggregated matrix in &lt;code&gt;results/comparison.md&lt;/code&gt;. If you're going to cite it, use the wording from the README: &lt;em&gt;"In the &lt;code&gt;editorial-final-startup-matrix&lt;/code&gt; tag of &lt;code&gt;JuanTorchia/springboot-jvm-2026&lt;/code&gt;, measured locally on Windows Docker Desktop/WSL2..."&lt;/em&gt;. That environment context isn't a decorative disclaimer; it's part of the data.&lt;/p&gt;

&lt;p&gt;Which dimension drives your decision most between these four modes? Build time, warm latency, or library compatibility on native?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/spring-boot-startup-time-2026-graalvm-native-aot-cds" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>performance</category>
      <category>arquitectura</category>
    </item>
    <item>
      <title>Show HN: Needle distilled Gemini tool calling into 26M parameters — technical read, zero hype</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 12:30:43 +0000</pubDate>
      <link>https://forem.com/jtorchia/show-hn-needle-distilled-gemini-tool-calling-into-26m-parameters-technical-read-zero-hype-46jo</link>
      <guid>https://forem.com/jtorchia/show-hn-needle-distilled-gemini-tool-calling-into-26m-parameters-technical-read-zero-hype-46jo</guid>
      <description>&lt;h1&gt;
  
  
  Show HN: Needle distilled Gemini tool calling into 26M parameters — technical read, zero hype
&lt;/h1&gt;

&lt;p&gt;I was in the middle of reviewing my Ollama pipeline when the HN post appeared: &lt;em&gt;Needle&lt;/em&gt;, a 26M parameter model distilled from Gemini specifically for tool calling. My first reaction was skeptical. 26M sounds like a toy. Then I read more carefully and understood that the interesting point isn't the size — it's the problem they're actually attacking.&lt;/p&gt;

&lt;p&gt;Here's my technical read. No euphoria, no easy dismissal.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem behind Needle and Gemini tool calling distillation
&lt;/h2&gt;

&lt;p&gt;My thesis is this: &lt;strong&gt;the bottleneck in systems with external tools isn't the LLM's general reasoning — it's the parsability of the output&lt;/strong&gt;. If the model produces malformed JSON, calls functions with wrong arguments, or hallucinates tool names that don't exist, the whole system breaks — doesn't matter how "intelligent" the model is at other tasks.&lt;/p&gt;

&lt;p&gt;I ran into this directly while building agent loops with Claude Code. The most fragile part was never the reasoning; it was the reliability of the data contract. It reminded me of when I resisted TypeScript for years thinking types were bureaucracy. Then I understood that most avoidable failures start as poorly expressed data contracts. Tool calling is exactly the same: a model can be brilliant in prose and terrible at respecting a strict JSON schema under latency pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Needle attacks that specific point&lt;/strong&gt;: it takes Gemini's tool calling behavior — which is consistent and well-structured — and distills it into a small, specialized model. The hypothesis is that for &lt;em&gt;this specific task&lt;/em&gt;, 26M parameters trained on the right behavior can outperform giant generalist models that were never fine-tuned to respect function schemas with precision.&lt;/p&gt;

&lt;p&gt;Is it true? In their own benchmarks, according to the project repo, yes. In my own production workloads, I don't know yet, and that difference matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  What knowledge distillation is and why it matters here
&lt;/h2&gt;

&lt;p&gt;Knowledge distillation is a technique where a large model — the &lt;em&gt;teacher&lt;/em&gt; — generates outputs that are then used to train a smaller model — the &lt;em&gt;student&lt;/em&gt;. The student doesn't learn from raw data: it learns to imitate the teacher's behavior on the distributions that matter most.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Simplified concept of the distillation pipeline for tool calling:&lt;/span&gt;
&lt;span class="c"&gt;# 1. Teacher (Gemini) generates thousands of correct tool calling examples&lt;/span&gt;
&lt;span class="c"&gt;# 2. Student (Needle, 26M) trains on those examples&lt;/span&gt;
&lt;span class="c"&gt;# 3. The student learns the teacher's output distribution, not hand-written rules&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
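&lt;p&gt;As a rough illustration of steps 1 and 2, the sketch below collects teacher outputs into training records and drops anything that isn't a parseable tool call. Everything in it is hypothetical: &lt;code&gt;fakeTeacher&lt;/code&gt; stands in for a real call to the teacher model, &lt;code&gt;TrainingExample&lt;/code&gt; is a record shape I made up, and none of it reflects how Needle's actual pipeline was built.&lt;/p&gt;

```typescript
// Hypothetical record shape for a distillation dataset; not Needle's real format.
interface TrainingExample {
  prompt: string;
  target: string; // the teacher's tool call, serialized as JSON
}

// Call signature for any teacher; a real pipeline would wrap the teacher's API.
interface Teacher {
  (prompt: string): string;
}

// Stand-in teacher (assumption): answers stock questions with a tool call,
// and everything else with plain prose.
function fakeTeacher(prompt: string): string {
  if (prompt.includes("stock")) {
    return JSON.stringify({
      name: "search_product",
      arguments: { product_id: "SKU-4821", include_stock: true },
    });
  }
  return "Sure! I can help with that.";
}

// Keep only teacher outputs that parse as a JSON object carrying a tool name.
function buildDataset(prompts: string[], teacher: Teacher): TrainingExample[] {
  const dataset: TrainingExample[] = [];
  for (const prompt of prompts) {
    const output = teacher(prompt);
    try {
      const parsed = JSON.parse(output);
      if (typeof parsed === "object") {
        if (parsed !== null) {
          if (typeof parsed.name === "string") {
            dataset.push({ prompt: prompt, target: output });
          }
        }
      }
    } catch {
      // Not JSON at all: the student should never see this as a target.
    }
  }
  return dataset;
}

const dataset = buildDataset(
  ["Is SKU-4821 in stock?", "Tell me a joke"],
  fakeTeacher
);
console.log(dataset.length);
```

&lt;p&gt;The filter is the part worth noticing: the student only ever sees well-formed calls, which is where the "learn the teacher's output distribution" idea would come from in a setup like this.&lt;/p&gt;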



&lt;p&gt;For tool calling, this makes particular sense. You don't need the model to know world history. You need it to take a schema like this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tool definition — the model has to respect this 100%&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_product&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Searches for a product by ID in the catalog&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;include_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;boolean&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;product_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And produce exactly this call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SKU-4821"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"include_stock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not some creative variation with renamed keys, wrong types, or invented fields. Small generalist models fail at this constantly. If Needle solves it reliably, the use case exists.&lt;/p&gt;
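&lt;p&gt;To make those three failure modes concrete, here is a minimal sketch of a strict check for a tool call against the &lt;code&gt;search_product&lt;/code&gt; schema. The names &lt;code&gt;validateToolCall&lt;/code&gt;, &lt;code&gt;ToolSpec&lt;/code&gt; and &lt;code&gt;ToolCall&lt;/code&gt; are mine, not part of Needle, Gemini, or Ollama, and the check only handles flat primitive parameters. A real setup would probably lean on a full JSON Schema validator.&lt;/p&gt;

```typescript
// Hypothetical helper types for illustration; not from any library.
interface ParamSpec {
  type: string; // "string", "boolean" or "number"
}

interface ToolSpec {
  name: string;
  parameters: {
    properties: { [key: string]: ParamSpec };
    required: string[];
  };
}

interface ToolCall {
  name: string;
  arguments: { [key: string]: unknown };
}

// Returns a list of contract violations; an empty list means the call is valid.
function validateToolCall(tool: ToolSpec, call: ToolCall): string[] {
  const errors: string[] = [];
  if (call.name !== tool.name) {
    errors.push("unknown tool name: " + call.name);
  }
  // Every required parameter must be present.
  for (const key of tool.parameters.required) {
    if (!(key in call.arguments)) {
      errors.push("missing required argument: " + key);
    }
  }
  // Every argument must be declared and carry the declared primitive type.
  for (const key of Object.keys(call.arguments)) {
    const spec = tool.parameters.properties[key];
    if (spec === undefined) {
      errors.push("invented argument: " + key);
      continue;
    }
    const actual = typeof call.arguments[key];
    if (actual !== spec.type) {
      errors.push("wrong type for " + key + ": expected " + spec.type + ", got " + actual);
    }
  }
  return errors;
}

const searchProduct: ToolSpec = {
  name: "search_product",
  parameters: {
    properties: {
      product_id: { type: "string" },
      include_stock: { type: "boolean" },
    },
    required: ["product_id"],
  },
};

// The exact call from the article: no violations expected.
const good = validateToolCall(searchProduct, {
  name: "search_product",
  arguments: { product_id: "SKU-4821", include_stock: true },
});

// A "creative variation": renamed key, wrong type.
const bad = validateToolCall(searchProduct, {
  name: "search_product",
  arguments: { productId: "SKU-4821", include_stock: "yes" },
});

console.log(good.length, bad);
```

&lt;p&gt;Returning a list of violations instead of a boolean is a deliberate choice in this sketch: when you compare models, knowing &lt;em&gt;which&lt;/em&gt; rule a small model breaks most often is more useful than a pass rate.&lt;/p&gt;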




&lt;h2&gt;
  
  
  How to test it in Ollama: a reproducible checklist
&lt;/h2&gt;

&lt;p&gt;If you want to validate whether a model like Needle has a place in your stack, the criterion shouldn't be someone else's benchmark. It should be your own set of tools under your system's real conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Install Ollama if you haven't&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Step 2: When the model is available in the Ollama registry, pull directly&lt;/span&gt;
&lt;span class="c"&gt;# (check availability at https://ollama.com/search)&lt;/span&gt;
ollama pull needle  &lt;span class="c"&gt;# tentative name — verify the official registry&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Prepare your own tool calling test suite&lt;/span&gt;
&lt;span class="c"&gt;# Don't use the model README's examples; use YOUR real tools&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tool-calling-test.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Validation criteria I'd use to evaluate any small model&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TestResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;validJson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;schemaRespected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;evaluateToolCallingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TestResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Call the model via Ollama API&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="c1"&gt;// Pass tools as part of the request&lt;/span&gt;
        &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate if the JSON is parseable and if it respects the schema&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;validJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;schemaRespected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// The tool_call should be in message.tool_calls[0]&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;validJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="c1"&gt;// Basic schema validation: required keys must be present&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
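        &lt;span class="c1"&gt;// Required argument names live in the tool definition's parameters.required list&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;parameters&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;requiredKeys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;required&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;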
        &lt;span class="nx"&gt;schemaRespected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;parse error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;validJson&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;schemaRespected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My minimum acceptance criteria for any tool calling model in a real system:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Minimum acceptable&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Valid JSON&lt;/td&gt;
&lt;td&gt;99%+&lt;/td&gt;
&lt;td&gt;A parse error in production breaks the entire flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema respected&lt;/td&gt;
&lt;td&gt;95%+&lt;/td&gt;
&lt;td&gt;Wrong arguments are silently dangerous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;&amp;lt; 500ms local&lt;/td&gt;
&lt;td&gt;If it's slower than an external API, you've lost the point&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool name hallucination&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;An invented name is a non-recoverable error&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
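
&lt;p&gt;To make the table actionable, here is a minimal sketch of how the harness results could be checked against those thresholds. It assumes a result shape with the same three fields the harness records (&lt;code&gt;validJson&lt;/code&gt;, &lt;code&gt;schemaRespected&lt;/code&gt;, &lt;code&gt;latencyMs&lt;/code&gt;); the constant and function names are illustrative, and the nearest-rank percentile is just one reasonable way to compute p95. Tool name hallucination is left out because the harness does not record it.&lt;/p&gt;

```typescript
// Hypothetical result shape, mirroring the fields the harness pushes per test case.
interface TestResult {
  validJson: boolean;
  schemaRespected: boolean;
  latencyMs: number;
}

// Thresholds copied from the acceptance table; the constant names are illustrative.
const MIN_VALID_JSON_RATE = 0.99;
const MIN_SCHEMA_RATE = 0.95;
const MAX_P95_LATENCY_MS = 500;

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return Number.NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of results where the given boolean field is true.
function passRate(results: TestResult[], field: "validJson" | "schemaRespected"): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r[field]).length / results.length;
}

// Summarize a run and decide whether it clears every threshold.
function meetsAcceptanceCriteria(results: TestResult[]) {
  const validJsonRate = passRate(results, "validJson");
  const schemaRate = passRate(results, "schemaRespected");
  const p95LatencyMs = percentile(results.map((r) => r.latencyMs), 95);
  const checks = [
    validJsonRate >= MIN_VALID_JSON_RATE,
    schemaRate >= MIN_SCHEMA_RATE,
    MAX_P95_LATENCY_MS > p95LatencyMs, // an empty run yields NaN and is rejected
  ];
  return { validJsonRate, schemaRate, p95LatencyMs, accepted: checks.every(Boolean) };
}
```

&lt;p&gt;Calling &lt;code&gt;meetsAcceptanceCriteria(results)&lt;/code&gt; on the array returned by the harness gives the three measured values plus a single &lt;code&gt;accepted&lt;/code&gt; flag, so a failed run also tells you which metric missed.&lt;/p&gt;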




&lt;h2&gt;
  
  
  The limits that the hype doesn't mention
&lt;/h2&gt;

&lt;p&gt;There are three limitations that don't show up in the headlines and that I consider essential before betting on a distilled model in a real system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the teacher's distribution defines the ceiling.&lt;/strong&gt; If Gemini has biases in how it generates tool calls — certain argument patterns, certain naming conventions — the student inherits them unfiltered. This matters if your API has conventions that drift from Gemini's style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, generalization to unseen schemas is an open question.&lt;/strong&gt; A distilled model can be excellent on the patterns it learned and brittle against complex schemas with &lt;code&gt;anyOf&lt;/code&gt;, nested &lt;code&gt;$ref&lt;/code&gt;s, or conditional validations. You have to test it explicitly against your own schemas — don't assume the general benchmark applies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, 26M parameters implies limited context capacity.&lt;/strong&gt; In systems where the prompt includes many tools simultaneously — common in backends with dozens of endpoints exposed as tools — degradation can be significant. That's a hypothesis to validate, not assume.&lt;/p&gt;

&lt;p&gt;None of this invalidates the project. It locates it. The same discipline I applied when reviewing &lt;a href="https://juanchi.dev/en/blog/pnpm-workspaces-ci-cache-github-actions-40-minutes-fix" rel="noopener noreferrer"&gt;pnpm workspaces cache issues in CI&lt;/a&gt; applies here: understand the limit first, then decide if it fits.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Needle makes sense and where it doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it makes sense to try Needle:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local agent pipelines where network latency to external APIs is the bottleneck&lt;/li&gt;
&lt;li&gt;Edge devices or resource-constrained environments where a 26M model fits in memory comfortably&lt;/li&gt;
&lt;li&gt;Systems with a &lt;em&gt;bounded and stable&lt;/em&gt; set of tools — not dozens of shifting schemas&lt;/li&gt;
&lt;li&gt;As a local fallback when external APIs are unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it probably doesn't cut it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systems where reasoning between tool calling steps is complex — deciding &lt;em&gt;when&lt;/em&gt; to call which tool, not just &lt;em&gt;how&lt;/em&gt; to call it&lt;/li&gt;
&lt;li&gt;APIs with deeply nested or polymorphic schemas&lt;/li&gt;
&lt;li&gt;Flows where long conversational context matters: a 26M-parameter model's limited capacity to track long context is going to hurt&lt;/li&gt;
&lt;li&gt;Environments that need auditable safety guarantees — a privately distilled model is a considerably more opaque box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tension that surfaced in the &lt;a href="https://juanchi.dev/en/blog/spring-boot-actuator-production-endpoints-hardening-checklist" rel="noopener noreferrer"&gt;Spring Boot Actuator in production&lt;/a&gt; post applies differently here: the comfort of "it works in the demo" can hide surface risks that only show up under load or with unexpected inputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this signals for the small model ecosystem
&lt;/h2&gt;

&lt;p&gt;The uncomfortable thing about Needle isn't the model itself. It's what it confirms: &lt;strong&gt;functional specialization is going to pressure the hegemony of large general models on structured tasks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tool calling, intent classification, entity extraction with fixed schemas — these are tasks where a well-trained distilled model can beat GPT-4 or Claude on cost and latency without sacrificing reliability. That changes the architecture calculation.&lt;/p&gt;

&lt;p&gt;In my current stack with Claude Code for complex reasoning and Ollama for local tasks, there's a gap exactly where Needle would aim: the tool router that decides which function to call and with what arguments, without needing the overhead of a 70B model for that. I'm not saying I'll adopt it tomorrow. I'm saying the category makes sense and the experiment deserves follow-through.&lt;/p&gt;

&lt;p&gt;Same as when I evaluated &lt;a href="https://juanchi.dev/en/blog/spring-security-spring-boot-actuator-authorization-model-production" rel="noopener noreferrer"&gt;Jakarta EE vs Spring Boot tradeoffs&lt;/a&gt; or compared &lt;a href="https://juanchi.dev/en/blog/pnpm-vs-npm-vs-yarn-2026-monorepo-real-benchmark" rel="noopener noreferrer"&gt;package managers in real monorepos&lt;/a&gt;, the honest answer isn't "adopt it now" or "ignore it" — it's "test it against your own criteria before committing."&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Needle, distillation, and tool calling in small models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What exactly is model distillation in the LLM context?&lt;/strong&gt;&lt;br&gt;
It's a process where a large model (&lt;em&gt;teacher&lt;/em&gt;) generates a dataset of correct behavior — in this case, well-formed tool calling examples — which is used to train a small model (&lt;em&gt;student&lt;/em&gt;). The student learns to imitate the teacher's output distribution on the specific tasks it was distilled for, without needing the teacher's full architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is 26M parameters enough for reliable tool calling?&lt;/strong&gt;&lt;br&gt;
Depends on the scope. For a bounded set of tools with simple schemas, probably yes. For systems with dozens of complex tools, long contexts, or multi-step reasoning, it's an open hypothesis. The project's own benchmark is optimistic; validation against your own schemas is mandatory before betting on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test it locally without risking a production system?&lt;/strong&gt;&lt;br&gt;
With Ollama, if the model is available in the registry, it's as simple as &lt;code&gt;ollama pull [name]&lt;/code&gt; and then evaluating with your own script against the schemas you already use. The validation checklist in this post is a starting point. Always against your real tools — never against the README examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the practical difference between Needle and using function calling from OpenAI or Anthropic?&lt;/strong&gt;&lt;br&gt;
Latency, cost, and privacy. A local model has no network RTT, no per-token cost, and doesn't send your tool schemas to an external API. The tradeoff is that reliability depends entirely on the local model's training quality, without the backing of a provider with an SLA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it worth it for an individual stack or only for companies with infrastructure?&lt;/strong&gt;&lt;br&gt;
A 26M model runs on a MacBook with 8GB of RAM without drama. This isn't enterprise infrastructure. If you're already using Ollama for other tasks — like I am — adding a specialized model is operationally trivial. The real cost is evaluation time, not hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if the model hallucinates a tool name that doesn't exist in my system?&lt;/strong&gt;&lt;br&gt;
That's the worst case and you have to design for it as an expected failure. The routing layer that consumes the model's output has to validate that the tool call &lt;code&gt;name&lt;/code&gt; corresponds to a registered tool before executing anything. If it doesn't exist, the error has to be explicit and not silent. This is basic defensive design, independent of which model you use.&lt;/p&gt;
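
&lt;p&gt;A minimal sketch of that routing check, under stated assumptions: the tool call is reduced to a &lt;code&gt;name&lt;/code&gt; plus an &lt;code&gt;arguments&lt;/code&gt; object, and the registry, the &lt;code&gt;search_product&lt;/code&gt; handler, and the &lt;code&gt;UnknownToolError&lt;/code&gt; class are all hypothetical names introduced for illustration.&lt;/p&gt;

```typescript
// Arguments as a plain key/value object; the shape is an assumption for this sketch.
type ToolArgs = { [key: string]: unknown };
type ToolHandler = (args: ToolArgs) => unknown;

// Minimal view of a model tool call: only the fields the router needs.
interface ToolCall {
  name: string;
  arguments: ToolArgs;
}

// Raised when the model asks for a tool that was never registered.
class UnknownToolError extends Error {
  toolName: string;
  constructor(toolName: string, registered: string[]) {
    super(`Unknown tool "${toolName}". Registered tools: ${registered.join(", ")}`);
    this.name = "UnknownToolError";
    this.toolName = toolName;
  }
}

// Hypothetical handler and registry. A Map avoids accidental hits on
// inherited object keys such as "toString".
const searchProduct: ToolHandler = (args) => ({ productId: args.product_id, found: true });
const registry = new Map([["search_product", searchProduct]]);

// Validate the name against the registry before executing anything.
function dispatchToolCall(call: ToolCall): unknown {
  const handler = registry.get(call.name);
  if (handler === undefined) {
    throw new UnknownToolError(call.name, Array.from(registry.keys()));
  }
  return handler(call.arguments);
}
```

&lt;p&gt;The point of the design is that nothing runs until the lookup succeeds, and a miss surfaces as a named error the caller can log, count, or retry on, rather than as an &lt;code&gt;undefined&lt;/code&gt; that travels further down the pipeline.&lt;/p&gt;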




&lt;h2&gt;
  
  
  Conclusion: test it with your eyes open
&lt;/h2&gt;

&lt;p&gt;I'm not going to say Needle is the future or that it's noise. My position is more specific: &lt;strong&gt;functional distillation of large model behavior into small specialized models is a legitimate direction, and tool calling is a use case where it makes genuine technical sense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What I don't buy is enthusiasm without friction. A 26M model has real limits around context, generalization, and reliability on unseen schemas. Those limits don't appear in the HN post and they will appear in production.&lt;/p&gt;

&lt;p&gt;My concrete recommendation: if you have an agent pipeline with a stable set of tools and latency is a problem, build a test harness with your own schemas, run it against the acceptance criteria in this post, and measure. If it clears 99% valid JSON and 95% schema respected on your own cases, you have something useful. If not, you know exactly why.&lt;/p&gt;

&lt;p&gt;That's more useful than any benchmark someone else wrote.&lt;/p&gt;

&lt;p&gt;Are you using local models for tool calling? Tell me at &lt;a href="https://juanchi.dev" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt; what stack you built and where you hit the limits.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/needle-gemini-tool-calling-26m-parameters-technical-read" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>typescript</category>
      <category>llm</category>
      <category>ialocal</category>
    </item>
    <item>
      <title>Show HN: Needle distilled Gemini tool calling into 26M parameters, a technical read without hype</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 12:30:38 +0000</pubDate>
      <link>https://forem.com/jtorchia/show-hn-needle-distilled-gemini-tool-calling-en-26m-parametros-lectura-tecnica-sin-hype-1i7b</link>
      <guid>https://forem.com/jtorchia/show-hn-needle-distilled-gemini-tool-calling-en-26m-parametros-lectura-tecnica-sin-hype-1i7b</guid>
      <description>&lt;h1&gt;
  
  
  Show HN: Needle distilled Gemini tool calling into 26M parameters, a technical read without hype
&lt;/h1&gt;

&lt;p&gt;I was reviewing my Ollama pipeline when the post showed up on HN: &lt;em&gt;Needle&lt;/em&gt;, a 26M-parameter model distilled from Gemini specifically for tool calling. My first reaction was skeptical. 26M sounds like a toy. Then I read it more carefully and understood that the interesting point isn't the size: it's the problem they're attacking.&lt;/p&gt;

&lt;p&gt;Here's my technical read, without euphoria and without easy dismissal.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem behind Needle and distilling Gemini for tool calling
&lt;/h2&gt;

&lt;p&gt;My thesis is this: &lt;strong&gt;the bottleneck in systems with external tools isn't the LLM's general reasoning, it's the parseability of the output&lt;/strong&gt;. If the model produces malformed JSON, calls functions with incorrect arguments, or hallucinates tool names that don't exist, the entire system breaks, no matter how "smart" the model is at other tasks.&lt;/p&gt;

&lt;p&gt;I experienced this directly while building agent loops with Claude Code. The most fragile part was never the reasoning; it was the reliability of the data contract. It reminded me of the years I resisted TypeScript, thinking types were bureaucracy. Later I understood that many avoidable failures start as poorly expressed data contracts. Tool calling is exactly the same: a model can be brilliant at prose and terrible at respecting a strict JSON schema under latency pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Needle attacks that specific point&lt;/strong&gt;: it takes Gemini's tool calling behavior, which is consistent and well structured, and distills it into a small, specialized model. The hypothesis is that for &lt;em&gt;this specific task&lt;/em&gt;, 26M parameters trained on the right behavior can outperform giant generalist models that weren't tuned to respect function schemas precisely.&lt;/p&gt;

&lt;p&gt;Is it true? On the project's own benchmarks, according to its repository, yes. In my own real production use, I don't know yet, and that difference matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  What knowledge distillation is and why it matters here
&lt;/h2&gt;

&lt;p&gt;Knowledge distillation is a technique where a large model, the &lt;em&gt;teacher&lt;/em&gt;, generates outputs that are then used to train a small model, the &lt;em&gt;student&lt;/em&gt;. The student doesn't learn from raw data: it learns to imitate the teacher's behavior on the distributions that matter most.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Simplified concept of the distillation pipeline for tool calling:&lt;/span&gt;
&lt;span class="c"&gt;# 1. Teacher (Gemini) generates thousands of correct tool calling examples&lt;/span&gt;
&lt;span class="c"&gt;# 2. Student (Needle, 26M) trains on those examples&lt;/span&gt;
&lt;span class="c"&gt;# 3. The student learns the teacher's output distribution, not hand-written rules&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For tool calling, this makes particular sense. You don't need the model to know world history. You need that, when you pass it this schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tool definition: the model has to respect this 100%&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;buscar_producto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Looks up a product by ID in the catalog&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;producto_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;incluir_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;boolean&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;producto_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output comes back as exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"buscar_producto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"producto_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SKU-4821"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"incluir_stock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And not some creative variation with renamed keys, wrong types, or invented fields. Small generalist models fail at this quite often. If Needle solves it reliably, the use case exists.&lt;/p&gt;
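
&lt;p&gt;Those three failure modes (renamed keys, wrong types, invented fields) can be checked mechanically. The sketch below is a minimal validator under stated assumptions: it only handles flat schemas with primitive property types, like the &lt;code&gt;buscar_producto&lt;/code&gt; example, and the &lt;code&gt;FlatParameters&lt;/code&gt; type and &lt;code&gt;validateArguments&lt;/code&gt; function are hypothetical names, not part of any library. Schemas with &lt;code&gt;anyOf&lt;/code&gt; or nested &lt;code&gt;$ref&lt;/code&gt;s would need a full JSON Schema validator.&lt;/p&gt;

```typescript
// Subset of JSON Schema used by the example tool: a flat object with primitive properties.
interface FlatParameters {
  type: "object";
  properties: { [key: string]: { type: "string" | "boolean" | "number" } };
  required: string[];
}

// Own-property check, so inherited keys like "toString" never count as declared.
const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns human-readable violations; an empty list means the call respects the schema.
function validateArguments(params: FlatParameters, args: { [key: string]: unknown }): string[] {
  const problems: string[] = [];
  // A renamed key shows up here as a missing required argument...
  for (const key of params.required) {
    if (!has(args, key)) problems.push(`missing required argument: ${key}`);
  }
  for (const key of Object.keys(args)) {
    if (!has(params.properties, key)) {
      // ...and here as an argument the schema never declared.
      problems.push(`unexpected argument: ${key}`);
    } else if (typeof args[key] !== params.properties[key].type) {
      problems.push(`wrong type for ${key}: expected ${params.properties[key].type}`);
    }
  }
  return problems;
}

// The parameters block of the buscar_producto tool defined earlier.
const buscarProductoParams: FlatParameters = {
  type: "object",
  properties: { producto_id: { type: "string" }, incluir_stock: { type: "boolean" } },
  required: ["producto_id"],
};
```

&lt;p&gt;Passing the arguments from the JSON example yields an empty list, while a call that renames &lt;code&gt;producto_id&lt;/code&gt; to &lt;code&gt;id&lt;/code&gt; is reported twice: once as a missing required argument and once as an unexpected one.&lt;/p&gt;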




&lt;h2&gt;
  
  
  How to test it in Ollama: a reproducible checklist
&lt;/h2&gt;

&lt;p&gt;If you want to validate whether a model like Needle has a place in your stack, the criterion shouldn't be someone else's benchmark. It should be your own set of tools under the real conditions of your system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Install Ollama if you don't have it&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Step 2: Once the model is available in the Ollama registry, pull it directly&lt;/span&gt;
&lt;span class="c"&gt;# (check availability at https://ollama.com/search)&lt;/span&gt;
ollama pull needle  &lt;span class="c"&gt;# tentative name: check the official registry&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Prepare your own tool calling test set&lt;/span&gt;
&lt;span class="c"&gt;# Don't use the examples from the model's README; use YOUR real tools&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prueba-tool-calling.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Validation criteria I would use to evaluate any small model&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;esperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;obtenido&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;jsonValido&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;schemaRespetado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latenciaMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;evaluarModeloToolCalling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;modelo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;casos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;casos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inicio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Call the model via the Ollama API&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;respuesta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;modelo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="c1"&gt;// Pass tools as part of the request&lt;/span&gt;
        &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;respuesta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latencia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;inicio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Check whether the JSON is parseable and whether it respects the schema&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;jsonValido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// The tool call should be in message.tool_calls[0]&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;jsonValido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="c1"&gt;// Basic schema validation: the required keys must be present&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;parse error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;esperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;jsonValido&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latenciaMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latencia&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My minimum acceptance criteria for any tool-calling model in a real system:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Minimum acceptable&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Valid JSON&lt;/td&gt;
&lt;td&gt;99%+&lt;/td&gt;
&lt;td&gt;A parse error in production breaks the entire flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema respected&lt;/td&gt;
&lt;td&gt;95%+&lt;/td&gt;
&lt;td&gt;Incorrect arguments are silently dangerous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;&amp;lt; 500 ms local&lt;/td&gt;
&lt;td&gt;If it is slower than an external API, the point of running locally is lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucinated tool names&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;A made-up name is an unrecoverable error&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
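
&lt;p&gt;As a minimal sketch of how those thresholds could be checked against a harness run, assuming each result also records whether the returned tool name was registered (the &lt;code&gt;toolNameKnown&lt;/code&gt; field, the &lt;code&gt;Resultado&lt;/code&gt; type, and the helper names are hypothetical, not part of the script above):&lt;/p&gt;

```typescript
// Hypothetical result shape: mirrors the fields pushed into `resultados` above,
// plus `toolNameKnown`, which is an assumption added for this sketch.
type Resultado = {
  jsonValido: boolean;
  schemaRespetado: boolean;
  latenciaMs: number;
  toolNameKnown: boolean;
};

// Nearest-rank percentile over an unsorted list of numbers.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return Number.NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Share of results for which `pred` holds, as a value between 0 and 1.
function rate(results: Resultado[], pred: (r: Resultado) => boolean): number {
  return results.length === 0 ? 0 : results.filter(pred).length / results.length;
}

// Applies the four thresholds from the table; an empty run never passes.
function meetsCriteria(results: Resultado[]) {
  const validJson = rate(results, (r) => r.jsonValido);
  const schemaOk = rate(results, (r) => r.schemaRespetado);
  const p95 = percentile(results.map((r) => r.latenciaMs), 95);
  const hallucinated = results.filter((r) => !r.toolNameKnown).length;
  const checks = [validJson >= 0.99, schemaOk >= 0.95, 500 > p95, hallucinated === 0];
  return { validJson, schemaOk, p95, hallucinated, pass: checks.every(Boolean) };
}
```

&lt;p&gt;The p95 here uses the nearest-rank method. With few samples it is decided by the slowest handful of requests, so the latency threshold only means something on a reasonably large run.&lt;/p&gt;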




&lt;h2&gt;
  
  
  The limits the hype doesn't mention
&lt;/h2&gt;

&lt;p&gt;There are three limitations that don't make the headlines and that I consider central before betting on a distilled model in a real system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the teacher's distribution sets the ceiling.&lt;/strong&gt; If Gemini has biases in how it generates tool calls (certain argument patterns, certain naming conventions), the student inherits them unfiltered. This matters if your API has conventions that diverge from Gemini's style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, generalization to unseen schemas is an open question.&lt;/strong&gt; A distilled model can be excellent on the patterns it learned and fragile against complex schemas with &lt;code&gt;anyOf&lt;/code&gt;, nested &lt;code&gt;$ref&lt;/code&gt;, or conditional validations. You have to test it explicitly with your own schemas, not assume the general benchmark applies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, a size of 26M parameters implies limited context capacity.&lt;/strong&gt; In systems where the prompt includes many tools at once (common in backends with dozens of endpoints exposed as tools), the degradation can be significant. That is a hypothesis to validate, not to assume.&lt;/p&gt;

&lt;p&gt;This doesn't invalidate the project. It places it. The same discipline I applied when reviewing &lt;a href="https://juanchi.dev/es/blog/pnpm-workspaces-cache-github-actions-ci-problema" rel="noopener noreferrer"&gt;CI cache problems with pnpm workspaces&lt;/a&gt; applies here: first understand the limit, then decide whether it fits.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Needle does make sense and where it doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it makes sense to try Needle:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local agent pipelines where network latency to external APIs is the bottleneck&lt;/li&gt;
&lt;li&gt;Edge devices or resource-constrained environments where a 26M model fits comfortably in memory&lt;/li&gt;
&lt;li&gt;Systems with a &lt;em&gt;bounded and stable&lt;/em&gt; set of tools, not dozens of changing schemas&lt;/li&gt;
&lt;li&gt;As a local fallback when external APIs are unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it probably falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systems where the reasoning between tool-calling steps is complex: deciding &lt;em&gt;when&lt;/em&gt; to call which tool, not just &lt;em&gt;how&lt;/em&gt; to call it&lt;/li&gt;
&lt;li&gt;APIs with deeply nested or polymorphic schemas&lt;/li&gt;
&lt;li&gt;Flows where long conversational context matters: the context limit of a 26M model is going to hurt&lt;/li&gt;
&lt;li&gt;Environments that need auditable security guarantees: a private distilled model is a more opaque box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tension pointed out in the post on &lt;a href="https://juanchi.dev/es/blog/spring-boot-actuator-endpoints-seguridad-produccion" rel="noopener noreferrer"&gt;Spring Boot Actuator in production&lt;/a&gt; applies here in a different way: the comfort of "it works in the demo" can hide surface risks that only show up under load or with unexpected inputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this anticipates for the small-model ecosystem
&lt;/h2&gt;

&lt;p&gt;The uncomfortable thing about Needle isn't the model itself. It's what it confirms: &lt;strong&gt;functional specialization is going to put pressure on the hegemony of large general models in structured tasks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tool calling, intent classification, entity extraction with a fixed schema: these are tasks where a well-trained distilled model can beat GPT-4 or Claude on cost and latency without sacrificing reliability. That changes the architecture calculus.&lt;/p&gt;

&lt;p&gt;In my current stack, with Claude Code for complex reasoning and Ollama for local tasks, there's a gap exactly where Needle would aim: the tool router that decides which function to call and with what arguments, without needing the overhead of a 70B model for that. I'm not saying I'll adopt it tomorrow. I'm saying the category makes sense and the experiment deserves follow-up.&lt;/p&gt;

&lt;p&gt;Just like when I evaluated &lt;a href="https://juanchi.dev/es/blog/spring-boot-actuator-security-spring-security-produccion-modelo-autorizacion" rel="noopener noreferrer"&gt;tradeoffs of Jakarta EE vs Spring Boot&lt;/a&gt; or compared &lt;a href="https://juanchi.dev/es/blog/pnpm-vs-npm-2026-monorepo-benchmark-real" rel="noopener noreferrer"&gt;package managers in real monorepos&lt;/a&gt;, the honest answer is neither "adopt it now" nor "ignore it": it's "test it against your own criteria before committing".&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Needle, distillation, and tool calling in small models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What exactly is model distillation in the context of LLMs?&lt;/strong&gt;&lt;br&gt;
It's a process where a large model (&lt;em&gt;teacher&lt;/em&gt;) generates a dataset of correct behavior (in this case, well-formed tool-calling examples) that is used to train a small model (&lt;em&gt;student&lt;/em&gt;). The student learns to imitate the teacher's output distribution on the specific tasks it was distilled for, without needing the teacher's full architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are 26M parameters enough for reliable tool calling?&lt;/strong&gt;&lt;br&gt;
It depends on the scope. For a bounded set of tools with simple schemas, probably yes. For systems with dozens of complex tools, long contexts, or multi-step reasoning, it's an open hypothesis. The project's benchmark is optimistic; validating against your own schemas is mandatory before betting on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test it locally without compromising a production system?&lt;/strong&gt;&lt;br&gt;
With Ollama, if the model is available in the registry, it's as simple as &lt;code&gt;ollama pull [name]&lt;/code&gt; and then evaluating with your own script against the schemas you already use. The validation checklist in this post is a starting point. Always test against your real tools, never against the README examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the practical difference between Needle and using function calling from OpenAI or Anthropic?&lt;/strong&gt;&lt;br&gt;
Latency, cost, and privacy. A local model has no network RTT, no per-token cost, and doesn't send your tool schemas to an external API. The trade-off is that reliability depends entirely on the quality of the local model's training, without the backing of a provider with an SLA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it worth it for an individual stack, or only for companies with infrastructure?&lt;/strong&gt;&lt;br&gt;
A 26M model fits on a MacBook with 8GB of RAM without drama. It isn't enterprise infrastructure. If you already use Ollama for other tasks, as I do, adding a specialized model is operationally trivial. The real cost is evaluation time, not hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if the model hallucinates a tool name that doesn't exist in my system?&lt;/strong&gt;&lt;br&gt;
It's the worst case and it has to be designed for as an expected failure. The routing layer that consumes the model's output has to validate that the &lt;code&gt;name&lt;/code&gt; of the tool call matches a registered tool before executing. If it doesn't exist, the error has to be explicit, not silent. This is basic defensive design, independent of the model you use.&lt;/p&gt;
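
&lt;p&gt;One way that guard could look, as a sketch under assumptions: the &lt;code&gt;ToolCall&lt;/code&gt; shape, the &lt;code&gt;dispatch&lt;/code&gt; function, and &lt;code&gt;UnknownToolError&lt;/code&gt; are hypothetical names for illustration, not part of any SDK.&lt;/p&gt;

```typescript
// Hypothetical shapes for this sketch; not taken from a specific SDK.
type ToolArgs = { [key: string]: unknown };
type ToolCall = { function: { name: string; arguments: ToolArgs } };
type ToolHandler = (args: ToolArgs) => unknown;

// Explicit error type so callers can tell "model invented a tool"
// apart from failures inside a real tool.
class UnknownToolError extends Error {
  toolName: string;
  constructor(toolName: string) {
    super(`Model requested unregistered tool: ${toolName}`);
    this.name = "UnknownToolError";
    this.toolName = toolName;
  }
}

// Executes a tool call only if its name is an own key of the registry.
function dispatch(call: ToolCall, registry: { [name: string]: ToolHandler }): unknown {
  const name = call.function.name;
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new UnknownToolError(name);
  }
  return registry[name](call.function.arguments);
}
```

&lt;p&gt;The check runs before anything executes, so an invented name surfaces as a named error the caller can log or retry on, not as a silent no-op.&lt;/p&gt;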




&lt;h2&gt;
  
  
  Conclusion: try it with your eyes open
&lt;/h2&gt;

&lt;p&gt;I'm not going to say Needle is the future or that it's noise. My position is more specific: &lt;strong&gt;functional distillation of large-model behavior into small specialized models is a legitimate direction, and tool calling is a use case where it makes genuine technical sense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What I don't buy is the frictionless enthusiasm. A 26M model has real limits in context, generalization, and reliability under unseen schemas. Those limits don't appear in the HN post and they will appear in production.&lt;/p&gt;

&lt;p&gt;My concrete recommendation: if you have an agent pipeline with a stable set of tools and latency is a problem, build a test harness with your own schemas, run it against the acceptance criteria in this post, and measure. If it clears the threshold of 99% valid JSON and 95% schema respected on your own cases, you have something useful. If not, you know exactly why.&lt;/p&gt;

&lt;p&gt;That's more useful than anyone else's benchmark.&lt;/p&gt;

&lt;p&gt;Are you using local models for tool calling? Tell me at &lt;a href="https://juanchi.dev" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt; what stack you built and where you hit the limits.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/show-needle-distilled-gemini-tool-calling-modelo-pequeno-analisis" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>typescript</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenTelemetry on Spring Boot 3: when logs say OK and traces show the problem</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 19:06:15 +0000</pubDate>
      <link>https://forem.com/jtorchia/opentelemetry-on-spring-boot-3-when-logs-say-ok-and-traces-show-the-problem-193o</link>
      <guid>https://forem.com/jtorchia/opentelemetry-on-spring-boot-3-when-logs-say-ok-and-traces-show-the-problem-193o</guid>
      <description>&lt;p&gt;There's a question I've asked myself many times while debugging backend systems: did the request take long because the DB was slow, because the downstream kept us waiting, or because some internal loop fired 60 queries to fetch 60 records? The log says &lt;code&gt;duration_ms=340&lt;/code&gt; and &lt;code&gt;status=200&lt;/code&gt;. That's it. You start guessing.&lt;/p&gt;

&lt;p&gt;That moment of uncertainty is where this lab came from. Not to measure OpenTelemetry overhead, not to compare Jaeger against Tempo, but to answer something more concrete: what signals do you lose when you only have good logs, and what shows up when you add a trace?&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/opentelemetry-spring-boot-lab" rel="noopener noreferrer"&gt;github.com/JuanTorchia/opentelemetry-spring-boot-lab&lt;/a&gt;, commit &lt;code&gt;c12ea4e848dc431c8bbd324318399172302fe053&lt;/code&gt;, tag &lt;code&gt;editorial-final-diagnosis-comparison-v2&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: a lab that produces evidence, not benchmarks
&lt;/h2&gt;

&lt;p&gt;The stack is Spring Boot 3.5.7, Java 21, PostgreSQL 16, OpenTelemetry API 1.43.0, OpenTelemetry Java Agent 2.9.0, and Jaeger all-in-one. Everything starts with Docker Compose. To reproduce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick smoke test with small dataset (1k tasks)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;small&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Full editorial run (50k tasks, 200 requests, warmup 20, concurrency 8)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Runs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Requests&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Warmup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Concurrency&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner starts Compose, downloads the agent into &lt;code&gt;tools/&lt;/code&gt;, packages the jar, seeds Postgres with synthetic tables (&lt;code&gt;organizations&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;projects&lt;/code&gt;, &lt;code&gt;tasks&lt;/code&gt;, &lt;code&gt;comments&lt;/code&gt;), runs the scenarios, queries Jaeger by &lt;code&gt;traceId&lt;/code&gt;, and regenerates the reports in &lt;code&gt;results/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Jaeger was chosen for local simplicity: one image, web UI, REST API to query traces by &lt;code&gt;traceId&lt;/code&gt;. Tempo is also valid, but needs more moving parts for a local editorial demo. This is not a production stack recommendation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;editorial&lt;/code&gt; dataset has 50,000 tasks. The &lt;code&gt;small&lt;/code&gt; dataset has 1,000. That difference matters so the N+1 produces visible fan-out rather than a microsecond gap that disappears into noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The instrumentation decision I care about most
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; has &lt;code&gt;opentelemetry-api&lt;/code&gt; as a compile dependency, but the agent arrives at runtime. That means HTTP server, HTTP client, and JDBC are instrumented automatically without touching business code.&lt;/p&gt;

&lt;p&gt;Manual spans are used only for business stages that the agent can't infer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LabService.java — manual span to mark business intent&lt;/span&gt;
&lt;span class="nc"&gt;Span&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;spanBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"business.n_plus_one.load_tasks_then_comments"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;startSpan&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ignored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;makeCurrent&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// first fetches tasks, then runs one query per task&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"select t.id, t.title, u.display_name as assignee from tasks t "&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"join users u on u.id = t.assignee_id order by t.id limit ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;longValue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// this query repeats per task → fan-out&lt;/span&gt;
        &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"select count(*) from comments where task_id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setAttribute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lab.n_plus_one.expected_extra_queries"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;end&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That mix is more honest for the post: auto-instrumentation for infrastructure, manual spans to explain intent. If I had used only manual spans, the lab would require observability-specific code in every layer. If I had relied only on the agent, business spans would be invisible.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;logback-spring.xml&lt;/code&gt; injects &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt; into every log line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- logback-spring.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %-5level traceId=%X{trace_id:-none} spanId=%X{span_id:-none} %logger{36} - %msg%n&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's what connects both worlds. A log with &lt;code&gt;traceId&lt;/code&gt; lets you jump directly to the trace in Jaeger. Without it, logs and traces are islands.&lt;/p&gt;
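
&lt;p&gt;To make that jump concrete, here is a small sketch (in TypeScript, outside the lab's Java code) that pulls the &lt;code&gt;traceId&lt;/code&gt; out of a line formatted by the pattern above and builds a Jaeger UI link. The base URL assumes Jaeger's default local UI port, and both function names are hypothetical.&lt;/p&gt;

```typescript
// Assumption: Jaeger UI reachable at its default local port; adjust as needed.
const JAEGER_UI = "http://localhost:16686";

// Extracts the trace id from a line written with the logback pattern above.
// Returns null when the field is missing or holds the "none" default.
function traceIdFromLog(line: string): string | null {
  // Trace ids are 32 lowercase hex characters.
  const match = /\btraceId=([0-9a-f]{32})\b/.exec(line);
  return match ? match[1] : null;
}

// Builds the UI link for the trace referenced by a log line, if any.
function jaegerTraceUrl(line: string): string | null {
  const id = traceIdFromLog(line);
  return id ? `${JAEGER_UI}/trace/${id}` : null;
}
```

&lt;p&gt;Lines logged outside a traced request carry &lt;code&gt;traceId=none&lt;/code&gt; and yield no link, which is itself a useful signal that the line was not part of any span.&lt;/p&gt;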

&lt;h2&gt;
  
  
  The matrix that summarizes the diagnosis
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;p95&lt;/th&gt;
&lt;th&gt;Avg spans&lt;/th&gt;
&lt;th&gt;Avg DB spans&lt;/th&gt;
&lt;th&gt;Error spans/request&lt;/th&gt;
&lt;th&gt;Defensible diagnosis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Healthy request, no weird story.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;59 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Same functional shape, no DB fan-out.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n-plus-one&lt;/td&gt;
&lt;td&gt;209 ms&lt;/td&gt;
&lt;td&gt;63.38&lt;/td&gt;
&lt;td&gt;61.38&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB fan-out visible inside one request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;downstream-slow&lt;/td&gt;
&lt;td&gt;374 ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Time concentrates in downstream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;395 ms&lt;/td&gt;
&lt;td&gt;7.57&lt;/td&gt;
&lt;td&gt;1.57&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB, downstream, and transformation compete.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;partial-error&lt;/td&gt;
&lt;td&gt;184 ms&lt;/td&gt;
&lt;td&gt;6.27&lt;/td&gt;
&lt;td&gt;1.27&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Downstream error inside a partial response.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is not trying to crown a tool. It summarizes which signals are available for diagnosis. The strong point is not that one number is universal: it is that N+1 leaves a very different shape than the optimized case, and that shape does not appear in a flat log unless you enable SQL debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the six scenarios reveal
&lt;/h2&gt;

&lt;p&gt;The lab has six endpoints: &lt;code&gt;baseline&lt;/code&gt;, &lt;code&gt;n-plus-one&lt;/code&gt;, &lt;code&gt;optimized&lt;/code&gt;, &lt;code&gt;downstream-slow&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;, and &lt;code&gt;partial-error&lt;/code&gt;. Each produces different signals that the runner consolidates into &lt;code&gt;results/comparison.md&lt;/code&gt; and &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The finding I most want to defend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N+1 vs optimized&lt;/strong&gt;: both return the same response shape. The log for both says &lt;code&gt;status=200&lt;/code&gt;. The difference lives in the trace: &lt;code&gt;n-plus-one&lt;/code&gt; generates an average of &lt;strong&gt;63.38 spans&lt;/strong&gt; per request in the editorial run; &lt;code&gt;optimized&lt;/code&gt; generates &lt;strong&gt;3.04&lt;/strong&gt;. That's not a universal performance claim — it's a diagnostic signal. With only logs and no SQL debug enabled, the difference is ambiguous. With the trace, DB fan-out is visible without extra configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream-slow&lt;/strong&gt;: p95 sits at &lt;strong&gt;374 ms&lt;/strong&gt;, very close to the configured 300 ms delay. Logs show total duration and &lt;code&gt;traceId&lt;/code&gt;. What they don't show is where that time went: was it DB? was it the downstream? was it in-memory transformation? The trace separates it: the downstream HTTP client span dominates the hierarchy. The local DB appears as a secondary span with low duration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed&lt;/strong&gt;: this is where flat logs fail the most. Three stages compete (DB, downstream, transformation) and none is obviously dominant. p95 reaches &lt;strong&gt;395 ms&lt;/strong&gt;. The trace shows the temporal distribution per stage. The log just says it was slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial-error&lt;/strong&gt;: the endpoint responds with HTTP 206 (partial content). The log records &lt;code&gt;traceId&lt;/code&gt;, status, and error type. The trace goes further: the downstream span is marked with error, nested under a request that technically responded. Logs and trace don't replace each other here — they complement. The log alerts and lets you correlate. The trace places the error in the causal hierarchy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The screenshot that changed the diagnosis
&lt;/h2&gt;

&lt;p&gt;In Jaeger, &lt;code&gt;n-plus-one&lt;/code&gt; does not look like a request that is merely a bit slower. It looks like a request with DB fan-out: many repeated spans under the same business operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" alt="Jaeger trace showing DB fan-out in the N+1 scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The optimized case, on the other hand, keeps a compact shape. I do not need to read the code to suspect that the problem in the previous case was not "Postgres is slow" in the abstract, but the query shape.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" alt="Jaeger trace for the optimized scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The partial-error case matters for another reason: the request can respond, while the downstream span is marked as errored. That nuance is exactly where logs and traces complement each other: the log alerts, the trace locates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" alt="Jaeger trace with partial downstream error marked" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limits of the metrics
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;*_vs_root_pct&lt;/code&gt; fields in &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt; are cumulative percentages of span durations exported by Jaeger. They can exceed 100% when there are nested spans, client/server pairs, or overlap. The &lt;code&gt;duration_denominator_type&lt;/code&gt; field indicates what was used as the denominator: &lt;code&gt;root_span&lt;/code&gt;, &lt;code&gt;http_request_span&lt;/code&gt;, or &lt;code&gt;largest_observed_span&lt;/code&gt; if the trace was ambiguous.&lt;/p&gt;

&lt;p&gt;These are not overhead numbers. They are not an exact distribution of real request time. They are cumulative diagnostic signals. Treating them like CPU percentages would be a misread that this lab doesn't try to encourage.&lt;/p&gt;
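
&lt;p&gt;To see why a cumulative percentage can exceed 100%, here is a minimal sketch with made-up durations. It assumes a root span of 100 ms that contains a 90 ms business span, which in turn wraps an 80 ms JDBC span. The class and method names are hypothetical and are not the lab's reporting code.&lt;/p&gt;

```java
public class CumulativeSpanPct {
    // Sum of the span durations divided by the denominator span, as a percentage.
    static double cumulativePct(long[] spanDurationsMs, long denominatorMs) {
        long total = 0;
        for (long duration : spanDurationsMs) {
            total += duration;
        }
        return 100.0 * total / denominatorMs;
    }

    public static void main(String[] args) {
        long rootMs = 100;        // hypothetical root span duration
        long[] nested = {90, 80}; // business span (90 ms) wrapping a JDBC span (80 ms)
        System.out.println(cumulativePct(nested, rootMs)); // prints 170.0
    }
}
```

&lt;p&gt;The nested spans add up to 170 ms against a 100 ms denominator, so the result is 170% even though the request itself took 100 ms.&lt;/p&gt;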

&lt;p&gt;Similarly, &lt;code&gt;diagnosis_confidence_*&lt;/code&gt; is an editorial classification coded in &lt;code&gt;ScenarioDiagnosis.java&lt;/code&gt;, not an automatically measured metric. For N+1, &lt;code&gt;diagnosisConfidenceLogs&lt;/code&gt; is &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;diagnosisConfidenceTrace&lt;/code&gt; is &lt;code&gt;high&lt;/code&gt;. That reflects the fact that without SQL debug, the log is ambiguous. It's not a universal benchmark of which tool is better.&lt;/p&gt;

&lt;h2&gt;
  
  
  My position: what I accept and what I don't buy
&lt;/h2&gt;

&lt;p&gt;I accept that OpenTelemetry with the Java Agent is a reasonable way to add structural visibility to a Spring Boot 3 app without polluting business code. JDBC and HTTP client auto-instrumentation works well for common scenarios.&lt;/p&gt;

&lt;p&gt;I don't buy the narrative that traces replace logs. The lab's &lt;code&gt;RequestCompletionLoggingFilter&lt;/code&gt; is a Servlet filter that records every completed request with scenario, method, path, status, and duration. Those logs are operationally useful even when Jaeger is unavailable. The &lt;code&gt;traceId&lt;/code&gt; in the log is the bridge, not the replacement.&lt;/p&gt;

&lt;p&gt;I also don't buy that Jaeger is the only valid option. It was chosen because it starts with one image and has a ready web UI. Tempo, Zipkin, or any OTLP-compatible backend would solve the same problem in this context.&lt;/p&gt;

&lt;p&gt;The honest trade-off is this: auto-instrumentation reduces accidental work but adds an agent attached to the JVM that exports data in the background. In a local lab that's trivial. In production, agent overhead depends on load, exporter configuration, and sampling. This lab doesn't measure that, and claiming otherwise would be misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with this
&lt;/h2&gt;

&lt;p&gt;If you already have structured logs in production with &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt;, the next step isn't replacing anything. It's adding a trace backend and connecting both worlds. The lab shows that Spring Boot 3 auto-instrumentation with the Java Agent is enough for common scenarios, and that manual spans only make sense when you want to name business intent that the agent can't infer.&lt;/p&gt;

&lt;p&gt;If you're evaluating whether the effort is worth it: the case where it's most clearly justified isn't the healthy baseline. It's the mixed scenario or the N+1, where logs give you a number and the trace gives you a shape. The difference between guessing and diagnosing.&lt;/p&gt;

&lt;p&gt;After this lab, my rule is simple: logs tell you what happened; traces help you understand how it happened. If the flat log only gives you total duration, you do not have an explanation yet. You have a clue.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/opentelemetry-spring-boot-logs-vs-traces-diagnosis" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experimentos</category>
      <category>backend</category>
      <category>observabilidad</category>
    </item>
    <item>
      <title>OpenTelemetry in Spring Boot 3: when the log says OK and the trace shows the problem</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 19:06:09 +0000</pubDate>
      <link>https://forem.com/jtorchia/opentelemetry-en-spring-boot-3-cuando-el-log-dice-ok-y-el-trace-muestra-el-problema-4639</link>
      <guid>https://forem.com/jtorchia/opentelemetry-en-spring-boot-3-cuando-el-log-dice-ok-y-el-trace-muestra-el-problema-4639</guid>
      <description>&lt;p&gt;There's a question I've asked myself many times while debugging backend systems: did the request take long because the DB was slow, because the downstream stalled us, or because some internal loop fired 60 queries to fetch 60 records? The log says &lt;code&gt;duration_ms=340&lt;/code&gt; and &lt;code&gt;status=200&lt;/code&gt;. That's all. You start guessing.&lt;/p&gt;

&lt;p&gt;That moment of uncertainty was the origin of this lab. Not to measure OpenTelemetry overhead, not to compare Jaeger against Tempo, but to answer something more concrete: which signals do you lose when all you have is good logs, and what shows up when you add a trace?&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/opentelemetry-spring-boot-lab" rel="noopener noreferrer"&gt;github.com/JuanTorchia/opentelemetry-spring-boot-lab&lt;/a&gt;, commit &lt;code&gt;c12ea4e848dc431c8bbd324318399172302fe053&lt;/code&gt;, tag &lt;code&gt;editorial-final-diagnosis-comparison-v2&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: a lab that produces evidence, not benchmarks
&lt;/h2&gt;

&lt;p&gt;The stack is Spring Boot 3.5.7, Java 21, PostgreSQL 16, OpenTelemetry API 1.43.0, OpenTelemetry Java Agent 2.9.0, and Jaeger all-in-one. Everything starts with Docker Compose. To reproduce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick smoke run with the small dataset (1k tasks)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;small&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Full editorial run (50k tasks, 200 requests, warmup 20, concurrency 8)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Runs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Requests&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Warmup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Concurrency&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner starts Compose, downloads the agent into &lt;code&gt;tools/&lt;/code&gt;, packages the jar, seeds Postgres with synthetic tables (&lt;code&gt;organizations&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;projects&lt;/code&gt;, &lt;code&gt;tasks&lt;/code&gt;, &lt;code&gt;comments&lt;/code&gt;), runs the scenarios, queries Jaeger by &lt;code&gt;traceId&lt;/code&gt;, and regenerates the reports in &lt;code&gt;results/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Jaeger was chosen for local simplicity: one image, a web UI, and a REST API for querying traces by &lt;code&gt;traceId&lt;/code&gt;. Tempo is also valid, but it needs more pieces for a local editorial demo. This is not a production stack recommendation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;editorial&lt;/code&gt; dataset has 50,000 tasks. The &lt;code&gt;small&lt;/code&gt; one has 1,000. The difference matters so that the N+1 produces visible fan-out rather than a microsecond-level difference that disappears in the noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The instrumentation decision I care about most
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; declares &lt;code&gt;opentelemetry-api&lt;/code&gt; as a compile dependency, but the agent arrives at runtime. That means HTTP server, HTTP client, and JDBC are instrumented automatically without touching business code.&lt;/p&gt;

&lt;p&gt;Manual spans are used only for business stages the agent cannot infer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LabService.java: manual span that marks business intent&lt;/span&gt;
&lt;span class="nc"&gt;Span&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;spanBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"business.n_plus_one.load_tasks_then_comments"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;startSpan&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ignored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;makeCurrent&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// fetch the tasks first, then run one query per task&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"select t.id, t.title, u.display_name as assignee from tasks t "&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"join users u on u.id = t.assignee_id order by t.id limit ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;longValue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// this query repeats for every task, which produces the fan-out&lt;/span&gt;
        &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"select count(*) from comments where task_id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setAttribute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lab.n_plus_one.expected_extra_queries"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;end&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That mix is more honest for the post: auto-instrumentation for infrastructure, manual spans to explain intent. If I had used only manual spans, the lab would need observability-specific code in every layer. If I had relied only on the agent, the business spans would be invisible.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;logback-spring.xml&lt;/code&gt; injects &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt; into every log line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- logback-spring.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %-5level traceId=%X{trace_id:-none} spanId=%X{span_id:-none} %logger{36} - %msg%n&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is what connects both worlds. A log with &lt;code&gt;traceId&lt;/code&gt; lets you jump straight to the trace in Jaeger. Without it, logs and traces are islands.&lt;/p&gt;
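
&lt;p&gt;As an illustration of that jump, here is a small sketch that pulls the &lt;code&gt;traceId&lt;/code&gt; out of a log line written with the pattern above and builds a Jaeger UI link. It is not part of the lab: the &lt;code&gt;TraceLink&lt;/code&gt; class, the sample log line, and the base URL (Jaeger's default local UI port) are assumptions made for the example.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceLink {
    // Matches the traceId field written by the log pattern (32 lowercase hex characters).
    private static final Pattern TRACE_ID = Pattern.compile("traceId=([0-9a-f]{32})");

    // Returns a Jaeger UI link for the trace named in a log line, or null when the line has none.
    static String jaegerLink(String logLine, String jaegerBaseUrl) {
        Matcher matcher = TRACE_ID.matcher(logLine);
        return matcher.find() ? jaegerBaseUrl + "/trace/" + matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up log line that follows the pattern's field order.
        String line = "2026-05-16T19:06:09.123+00:00 INFO  traceId=4bf92f3577b34da6a3ce929d0e0e4736 "
                + "spanId=00f067aa0ba902b7 lab.RequestLog - status=200 duration_ms=340";
        System.out.println(jaegerLink(line, "http://localhost:16686"));
    }
}
```

&lt;p&gt;Lines logged outside a trace carry &lt;code&gt;traceId=none&lt;/code&gt; under that pattern, so the helper returns &lt;code&gt;null&lt;/code&gt; for them instead of a broken link.&lt;/p&gt;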

&lt;h2&gt;
  
  
  The matrix that summarizes the diagnosis
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;p95&lt;/th&gt;
&lt;th&gt;Average spans&lt;/th&gt;
&lt;th&gt;Average DB spans&lt;/th&gt;
&lt;th&gt;Error spans/request&lt;/th&gt;
&lt;th&gt;Defensible diagnosis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Healthy request, nothing unusual.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;59 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Same functional shape, no DB fan-out.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n-plus-one&lt;/td&gt;
&lt;td&gt;209 ms&lt;/td&gt;
&lt;td&gt;63.38&lt;/td&gt;
&lt;td&gt;61.38&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB fan-out visible within a single request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;downstream-slow&lt;/td&gt;
&lt;td&gt;374 ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Time concentrates in downstream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;395 ms&lt;/td&gt;
&lt;td&gt;7.57&lt;/td&gt;
&lt;td&gt;1.57&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB, downstream, and transformation compete.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;partial-error&lt;/td&gt;
&lt;td&gt;184 ms&lt;/td&gt;
&lt;td&gt;6.27&lt;/td&gt;
&lt;td&gt;1.27&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Downstream error inside a partial response.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is not trying to crown a tool. It summarizes which signals are available for diagnosis. The strong point is not that one number is universal: it is that N+1 leaves a very different shape than the optimized case, and that shape does not appear in a flat log unless you enable SQL debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the six scenarios reveal
&lt;/h2&gt;

&lt;p&gt;The lab has six endpoints: &lt;code&gt;baseline&lt;/code&gt;, &lt;code&gt;n-plus-one&lt;/code&gt;, &lt;code&gt;optimized&lt;/code&gt;, &lt;code&gt;downstream-slow&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;, and &lt;code&gt;partial-error&lt;/code&gt;. Each produces different signals that the runner consolidates into &lt;code&gt;results/comparison.md&lt;/code&gt; and &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The finding I most want to defend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N+1 vs optimized&lt;/strong&gt;: both return the same response shape. The log for both says &lt;code&gt;status=200&lt;/code&gt;. The difference lives in the trace: &lt;code&gt;n-plus-one&lt;/code&gt; generates an average of &lt;strong&gt;63.38 spans&lt;/strong&gt; per request in the editorial run; &lt;code&gt;optimized&lt;/code&gt; generates &lt;strong&gt;3.04&lt;/strong&gt;. That's not a universal performance claim; it's a diagnostic signal. With only logs and no SQL debug enabled, the difference is ambiguous. With the trace, DB fan-out is visible without extra configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream-slow&lt;/strong&gt;: p95 sits at &lt;strong&gt;374 ms&lt;/strong&gt;, very close to the configured 300 ms delay. Logs show total duration and &lt;code&gt;traceId&lt;/code&gt;. What they don't show is where that time went: was it DB? was it the downstream? was it in-memory transformation? The trace separates it: the downstream HTTP client span dominates the hierarchy. The local DB appears as a secondary span with low duration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed&lt;/strong&gt;: this is where flat logs fail the most. Three stages compete (DB, downstream, transformation) and none is obviously dominant. p95 reaches &lt;strong&gt;395 ms&lt;/strong&gt;. The trace shows the temporal distribution per stage. The log just says it was slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial-error&lt;/strong&gt;: the endpoint responds with HTTP 206 (partial content). The log records &lt;code&gt;traceId&lt;/code&gt;, status, and error type. The trace goes further: the downstream span is marked with error, nested under a request that technically responded. Logs and trace don't replace each other here; they complement each other. The log alerts and lets you correlate. The trace places the error in the causal hierarchy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The screenshot that changed the diagnosis
&lt;/h2&gt;

&lt;p&gt;In Jaeger, &lt;code&gt;n-plus-one&lt;/code&gt; does not look like a request that is merely a bit slower. It looks like a request with DB fan-out: many repeated spans under the same business operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" alt="Jaeger trace showing DB fan-out in the N+1 scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The optimized case, on the other hand, keeps a compact shape. I do not need to read the code to suspect that the problem in the previous case was not "Postgres is slow" in the abstract, but the query shape.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" alt="Jaeger trace for the optimized scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The partial-error case matters for another reason: the request can respond, while the downstream span is marked as errored. That nuance is exactly where logs and traces complement each other: the log alerts, the trace locates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" alt="Jaeger trace with partial downstream error marked" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limits of the metrics
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;*_vs_root_pct&lt;/code&gt; fields in &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt; are cumulative percentages of span durations exported by Jaeger. They can exceed 100% when there are nested spans, client/server pairs, or overlap. The &lt;code&gt;duration_denominator_type&lt;/code&gt; field indicates what was used as the denominator: &lt;code&gt;root_span&lt;/code&gt;, &lt;code&gt;http_request_span&lt;/code&gt;, or &lt;code&gt;largest_observed_span&lt;/code&gt; if the trace was ambiguous.&lt;/p&gt;

&lt;p&gt;These are not overhead numbers. They are not an exact distribution of real request time. They are cumulative diagnostic signals. Treating them like CPU percentages would be a misread that this lab doesn't try to encourage.&lt;/p&gt;

&lt;p&gt;Similarly, &lt;code&gt;diagnosis_confidence_*&lt;/code&gt; is an editorial classification coded in &lt;code&gt;ScenarioDiagnosis.java&lt;/code&gt;, not an automatically measured metric. For N+1, &lt;code&gt;diagnosisConfidenceLogs&lt;/code&gt; is &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;diagnosisConfidenceTrace&lt;/code&gt; is &lt;code&gt;high&lt;/code&gt;. That reflects the fact that without SQL debug, the log is ambiguous. It's not a universal benchmark of which tool is better.&lt;/p&gt;

&lt;h2&gt;
  
  
  My position: what I accept and what I don't buy
&lt;/h2&gt;

&lt;p&gt;I accept that OpenTelemetry with the Java Agent is a reasonable way to add structural visibility to a Spring Boot 3 app without polluting business code. JDBC and HTTP client auto-instrumentation works well for common scenarios.&lt;/p&gt;

&lt;p&gt;I don't buy the narrative that traces replace logs. The lab's &lt;code&gt;RequestCompletionLoggingFilter&lt;/code&gt; is a Servlet filter that records every completed request with scenario, method, path, status, and duration. Those logs are operationally useful even when Jaeger is unavailable. The &lt;code&gt;traceId&lt;/code&gt; in the log is the bridge, not the replacement.&lt;/p&gt;

&lt;p&gt;I also don't buy that Jaeger is the only valid option. It was chosen because it starts with one image and has a ready web UI. Tempo, Zipkin, or any OTLP-compatible backend would solve the same problem in this context.&lt;/p&gt;

&lt;p&gt;The honest trade-off is this: auto-instrumentation reduces accidental work but adds an agent attached to the JVM that exports data in the background. In a local lab that's trivial. In production, agent overhead depends on load, exporter configuration, and sampling. This lab doesn't measure that, and claiming otherwise would be misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with this
&lt;/h2&gt;

&lt;p&gt;If you already have structured logs in production with &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt;, the next step isn't replacing anything. It's adding a trace backend and connecting both worlds. The lab shows that Spring Boot 3 auto-instrumentation with the Java Agent is enough for common scenarios, and that manual spans only make sense when you want to name business intent that the agent can't infer.&lt;/p&gt;

&lt;p&gt;If you're evaluating whether the effort is worth it: the case where it's most clearly justified isn't the healthy baseline. It's the mixed scenario or the N+1, where logs give you a number and the trace gives you a shape. The difference between guessing and diagnosing.&lt;/p&gt;

&lt;p&gt;After this lab, my rule is simple: logs tell you what happened; traces help you understand how it happened. If the flat log only gives you total duration, you do not have an explanation yet. You have a clue.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/opentelemetry-spring-boot-logs-vs-traces-diagnostico" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>backend</category>
    </item>
    <item>
      <title>Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 01:36:53 +0000</pubDate>
      <link>https://forem.com/jtorchia/prisma-vs-jdbc-the-benchmark-that-almost-made-me-blame-the-wrong-orm-585m</link>
      <guid>https://forem.com/jtorchia/prisma-vs-jdbc-the-benchmark-that-almost-made-me-blame-the-wrong-orm-585m</guid>
      <description>&lt;p&gt;There's a discussion that surfaces every time someone posts an ORM benchmark: "of course JDBC is faster, you're measuring the abstraction". They're right, but only halfway. What nobody says is that the abstraction isn't the only culprit — sometimes the culprit is you, because you let an N+1 slip through without noticing.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;prismavsjdbc&lt;/a&gt; to test this in a controlled way. It's not a benchmark about who wins. It's a lab where the same PostgreSQL 16, the same 50k-task dataset, and the same business scenarios run against two stacks: Node.js 24 LTS + TypeScript + Prisma 5 on one side, and Spring Boot 3 + Java 21 LTS + &lt;code&gt;JdbcTemplate&lt;/code&gt; on the other. The analyzed commit is &lt;code&gt;2cd33e32bd29a1d4b46a26af0b56d6a912f5e4f5&lt;/code&gt;, tag &lt;code&gt;best-effort-editorial-final&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The thesis I'm defending is this: &lt;strong&gt;query shape, SQL/request, and N+1 explain more than the slogan "ORM vs raw SQL"&lt;/strong&gt;. When you optimize the shape, both stacks improve. When you don't, both stacks charge you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem that almost made me draw the wrong conclusion
&lt;/h2&gt;

&lt;p&gt;The first version of the lab had an obvious trap, even though I didn't see it at first. It compared the most comfortable Prisma implementation — using &lt;code&gt;include&lt;/code&gt; to fetch relations — against a manual join in JDBC. The result was predictable: JDBC measured 1 SQL/request, idiomatic Prisma measured 4 SQL/request on &lt;code&gt;read-by-id&lt;/code&gt;, and latency reflected that.&lt;/p&gt;

&lt;p&gt;Incorrect conclusion I almost published: "Prisma is slower because it emits more queries".&lt;/p&gt;

&lt;p&gt;Correct conclusion: I was comparing different shapes. By default, Prisma's &lt;code&gt;include&lt;/code&gt; fires separate queries per relation; that's not a bug, it's the documented behavior of the API. JDBC did a join because I wrote it that way. Comparing the two without acknowledging that isn't fair.&lt;/p&gt;

&lt;p&gt;That friction changed the entire lab design: I needed three levels within each stack.&lt;/p&gt;

&lt;h2&gt;Three levels: naive, idiomatic, best-effort&lt;/h2&gt;

&lt;p&gt;Adding the &lt;code&gt;level&lt;/code&gt; column to &lt;code&gt;results/comparison.csv&lt;/code&gt; was the most important decision in the project. Without it, any results table is a trap for the reader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;naive&lt;/strong&gt;: the most direct implementation possible, with no thought given to performance. In both stacks, this includes deliberate N+1 — per-task queries inside a loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;idiomatic&lt;/strong&gt;: the normal, maintainable way to write code in each stack. Prisma with &lt;code&gt;include&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt;, JDBC with the join any Java dev would write without obsessing over micro-optimizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;best-effort&lt;/strong&gt;: the tightest code the team would accept without it becoming a hack. For Prisma, this means dropping to &lt;code&gt;$queryRaw&lt;/code&gt; when the shape is aggregational.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;read-by-id&lt;/code&gt; scenario with idiomatic Prisma measured 4 SQL/request due to &lt;code&gt;include&lt;/code&gt;. The &lt;code&gt;read-by-id-best-effort&lt;/code&gt; variant with &lt;code&gt;$queryRaw&lt;/code&gt; dropped to 1 SQL/request, the same join JDBC uses. Here is that query; the trailing comment carries the execution time and buffer hits from its PostgreSQL plan:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- read-by-id-best-effort: same SQL in Prisma $queryRaw and in JdbcTemplate&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeName"&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;organizations&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;organization_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assignee_id&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'00000000-0000-4000-0100-000000000001'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.242 ms, Buffers: shared hit=9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Prisma and JDBC emit the same SQL, the PostgreSQL plan is identical. That closes the runtime debate: the bottleneck was the shape, not the client.&lt;/p&gt;

&lt;h2&gt;N+1 is the usual villain, but the lab shows it with numbers&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;n-plus-one-trap&lt;/code&gt; scenario exists to make explicit something every developer knows in theory but underestimates in practice. The naive level in both stacks fires one query per task, so SQL/request grows linearly with the number of tasks a request touches; on a 50k-task dataset at concurrency 16, that cost compounds quickly.&lt;/p&gt;

&lt;p&gt;The biggest jump in the lab wasn't between Prisma and JDBC. It was between naive and idiomatic within Prisma. When you go from N+1 to &lt;code&gt;include/_count&lt;/code&gt;, the reduction in SQL/request is immediate and visible in latency. After that, if you want to squeeze more, &lt;code&gt;$queryRaw&lt;/code&gt; gives you another jump — but smaller than the first.&lt;/p&gt;
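
&lt;p&gt;To make the naive-to-idiomatic jump concrete, here is a minimal sketch that counts statements for both shapes. It is not code from the lab: the in-memory arrays and the &lt;code&gt;selectTasks&lt;/code&gt;, &lt;code&gt;selectProjectById&lt;/code&gt;, and &lt;code&gt;selectProjectsByIds&lt;/code&gt; helpers are hypothetical stand-ins for a database, and each call is assumed to cost one SQL statement.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-in for a database: each select* call counts as one SQL statement.
type Project = { id: number; name: string };
type Task = { id: number; projectId: number };

const projects: Project[] = [
  { id: 1, name: "alpha" },
  { id: 2, name: "beta" },
];
// Ten tasks spread evenly over the two projects.
const tasks: Task[] = Array.from({ length: 10 }, (_, i) => ({
  id: i + 1,
  projectId: (i % 2) + 1,
}));

let statements = 0;

function selectTasks(): Task[] {
  statements += 1;
  return tasks;
}

function selectProjectById(id: number): Project | undefined {
  statements += 1;
  return projects.find((p) => p.id === id);
}

function selectProjectsByIds(ids: number[]): Project[] {
  statements += 1;
  return projects.filter((p) => ids.includes(p.id));
}

// Naive shape: one statement for the list, then one more per task (N+1).
function naiveStatementCount(): number {
  statements = 0;
  for (const task of selectTasks()) {
    selectProjectById(task.projectId);
  }
  return statements;
}

// Batched shape: one statement for the list, one for all related projects.
function batchedStatementCount(): number {
  statements = 0;
  const ids = Array.from(new Set(selectTasks().map((t) => t.projectId)));
  selectProjectsByIds(ids);
  return statements;
}

console.log({ naive: naiveStatementCount(), batched: batchedStatementCount() });
```

&lt;p&gt;With ten tasks the naive shape issues 11 statements and the batched shape 2. The naive count grows with the list; the batched count stays fixed.&lt;/p&gt;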

&lt;p&gt;The interesting part on the Java side is that &lt;code&gt;CountingJdbc&lt;/code&gt; — the wrapper over &lt;code&gt;JdbcTemplate&lt;/code&gt; in &lt;code&gt;apps/jdbc-service/src/main/java/com/example/jdbclab/CountingJdbc.java&lt;/code&gt; — uses an &lt;code&gt;AtomicLong&lt;/code&gt; to count queries. That allows an objective SQL/request comparison without relying on logs or &lt;code&gt;pg_stat_statements&lt;/code&gt; as the primary source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
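&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CountingJdbc.java: instrumentation with no magic, easy to audit&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.concurrent.atomic.AtomicLong&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.jdbc.core.JdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.jdbc.core.RowMapper&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.stereotype.Component&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CountingJdbc&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// constructor added so the excerpt compiles; Spring injects the JdbcTemplate&lt;/span&gt;
  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;CountingJdbc&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jdbc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
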
  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RowMapper&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// each call to the wrapper adds 1 to the counter&lt;/span&gt;
    &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementAndGet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the Prisma side, the equivalent lives in &lt;code&gt;apps/prisma-client/src/db.ts&lt;/code&gt;: it hooks into the client's &lt;code&gt;query&lt;/code&gt; event to count. That symmetry in instrumentation is what makes the SQL/request numbers comparable across stacks.&lt;/p&gt;
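
&lt;p&gt;The post doesn't show &lt;code&gt;db.ts&lt;/code&gt;, so what follows is only a sketch of the idea: subscribe to a query event, count every statement, and divide by the number of requests. &lt;code&gt;FakeClient&lt;/code&gt; and &lt;code&gt;findTaskWithRelations&lt;/code&gt; are hypothetical and built on Node's &lt;code&gt;EventEmitter&lt;/code&gt; rather than the real Prisma client; the four emitted statements are an assumption chosen to mirror the 4 SQL/request measured for idiomatic &lt;code&gt;read-by-id&lt;/code&gt;.&lt;/p&gt;

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a database client that emits one "query" event per statement.
class FakeClient extends EventEmitter {
  async findTaskWithRelations(id: number) {
    // Mimics an include-style read: one statement for the task, three for its relations.
    for (const table of ["tasks", "projects", "organizations", "users"]) {
      this.emit("query", { sql: `select * from ${table} where id = $1` });
    }
    return { id };
  }
}

// Counts every emitted statement; the counter is shared by all in-flight requests.
function attachCounter(client: EventEmitter) {
  let total = 0;
  client.on("query", () => {
    total += 1;
  });
  return { total: () => total };
}

// Fires `requests` concurrent reads and returns the average statements per request.
async function measure(requests: number) {
  const client = new FakeClient();
  const counter = attachCounter(client);
  await Promise.all(
    Array.from({ length: requests }, (_, i) => client.findTaskWithRelations(i + 1)),
  );
  return counter.total() / requests;
}

measure(300).then((sqlPerRequest) => console.log({ sqlPerRequest }));
```

&lt;p&gt;Dividing a run-wide total by the number of requests stays correct when requests overlap, which a per-request delta on a shared counter would not.&lt;/p&gt;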

&lt;h2&gt;When $queryRaw makes sense and when it's a surrender&lt;/h2&gt;

&lt;p&gt;This is the part where a lot of Prisma posts aren't honest. &lt;code&gt;$queryRaw&lt;/code&gt; exists and is valid, but using it for everything is admitting you don't want to use Prisma — you're using PostgreSQL with a fancy TypeScript client.&lt;/p&gt;

&lt;p&gt;The decision in the lab was clear: best-effort with &lt;code&gt;$queryRaw&lt;/code&gt; makes sense in &lt;code&gt;relation-summary&lt;/code&gt; and &lt;code&gt;report-aggregation&lt;/code&gt; because the shape is genuinely aggregational. Prisma &lt;code&gt;groupBy&lt;/code&gt; doesn't cleanly express &lt;code&gt;date_trunc&lt;/code&gt; + join by organization, and forcing it would be worse than writing SQL.&lt;/p&gt;

&lt;p&gt;By contrast, &lt;code&gt;paginated-list&lt;/code&gt; has no best-effort variant because idiomatic Prisma already emits 1 SQL/request with &lt;code&gt;findMany&lt;/code&gt; and filters. Adding &lt;code&gt;$queryRaw&lt;/code&gt; there wouldn't change anything meaningful — it would be complexity with no benefit.&lt;/p&gt;

&lt;p&gt;The table in &lt;code&gt;docs/brief-post.md&lt;/code&gt; models this well: the &lt;code&gt;level&lt;/code&gt; column isn't a scale of "how much effort you put in" but of "how much the SQL shape changes when you apply the variant".&lt;/p&gt;

&lt;h2&gt;What the lab can't guarantee&lt;/h2&gt;

&lt;p&gt;The HTTP runner is homegrown, not k6 or wrk. The hardware is local. Docker Desktop, GC, plan cache, and indexes can shift absolute latencies between runs. The editorial run used 3 runs, 300 requests per run, 30 warmup requests, concurrency 16, and a 50k-task dataset, but the same parameters on different hardware can produce different results.&lt;/p&gt;
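
&lt;p&gt;As a rough picture of what those parameters mean, here is a sketch of a runner with a discarded warmup phase and a fixed number of concurrent workers. It is not the lab's runner: &lt;code&gt;runScenario&lt;/code&gt;, &lt;code&gt;fire&lt;/code&gt;, &lt;code&gt;percentile&lt;/code&gt;, and &lt;code&gt;fakeEndpoint&lt;/code&gt; are hypothetical names, and nearest-rank percentiles are an assumption about how latencies might be summarized.&lt;/p&gt;

```typescript
// Hypothetical sketch of a warmup + fixed-concurrency runner; not the lab's actual runner.
type Options = { requests: number; warmup: number; concurrency: number };

// Starts exactly `total` calls, with at most `concurrency` in flight, and returns latencies in ms.
async function fire(call: () => unknown, total: number, concurrency: number) {
  const latencies: number[] = [];
  let started = 0;
  // Each worker keeps pulling the next request until `total` have been started.
  async function worker() {
    while (started !== total) {
      started += 1;
      const t0 = performance.now();
      await call();
      latencies.push(performance.now() - t0);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return latencies;
}

// Nearest-rank percentile on an ascending array.
function percentile(sorted: number[], p: number) {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

async function runScenario(call: () => unknown, opts: Options) {
  // Warmup latencies are measured but thrown away.
  await fire(call, opts.warmup, opts.concurrency);
  const latencies = await fire(call, opts.requests, opts.concurrency);
  latencies.sort((a, b) => a - b);
  return {
    count: latencies.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
  };
}

// Stand-in for an HTTP call; a real runner would issue a request here.
const fakeEndpoint = async () => {
  await new Promise((resolve) => setTimeout(resolve, 1));
};

runScenario(fakeEndpoint, { requests: 300, warmup: 30, concurrency: 16 }).then((summary) =>
  console.log(summary),
);
```

&lt;p&gt;Calling &lt;code&gt;runScenario&lt;/code&gt; three times and comparing the summaries would correspond to the 3 runs of the editorial setup.&lt;/p&gt;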

&lt;p&gt;The version matrix (&lt;code&gt;docs/java-version-matrix.md&lt;/code&gt;) shows Java 21 vs Java 25: there are differences, but the main argument — that N+1 and SQL/request dominate — holds on both JVMs. Java 25 improved &lt;code&gt;read-by-id&lt;/code&gt; by ~20% over Java 21 in the local run, but that doesn't change the fact that the problem in &lt;code&gt;relation-summary-naive&lt;/code&gt; was the shape, not the JVM.&lt;/p&gt;

&lt;p&gt;I wouldn't publish those absolute numbers as universal truth. I publish them as evidence of a pattern: when you change the shape, the delta is orders of magnitude larger than when you change the runtime.&lt;/p&gt;

&lt;h2&gt;The position I landed on&lt;/h2&gt;

&lt;p&gt;Prisma is not slow. Prisma with &lt;code&gt;include&lt;/code&gt; emitting 4 queries where you could emit 1 is an ergonomics trade-off with an observable cost — and that cost is worth it for most endpoints in an API that isn't under extreme pressure. When shape genuinely matters, &lt;code&gt;$queryRaw&lt;/code&gt; exists and works well.&lt;/p&gt;

&lt;p&gt;JDBC with &lt;code&gt;JdbcTemplate&lt;/code&gt; is not superior just because it's raw SQL. It's predictable because the developer controls the shape from the start. The risk runs the other way: with no ORM to blame, nobody checks whether those Java loops are also doing N+1.&lt;/p&gt;

&lt;p&gt;The lab is reproducible. If you have Docker, Node 24 LTS, and Java 21 or 25, you can run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# full editorial run — Bash&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; editorial &lt;span class="nt"&gt;--size&lt;/span&gt; editorial &lt;span class="nt"&gt;--runs&lt;/span&gt; 3 &lt;span class="nt"&gt;--requests&lt;/span&gt; 300 &lt;span class="nt"&gt;--warmup&lt;/span&gt; 30 &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you just want to verify the scenarios run without errors before committing time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# quick smoke test to validate the setup&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; smoke &lt;span class="nt"&gt;--size&lt;/span&gt; small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code is at &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;github.com/JuanTorchia/prismavsjdbc&lt;/a&gt;. Editorial results are in &lt;code&gt;results/comparison.csv&lt;/code&gt; and &lt;code&gt;results/comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I'd like to know: in the stack you're using right now, do you have real visibility into the SQL/request count for each endpoint? Or do you assume the ORM handles it on its own?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experiments</category>
      <category>typescript</category>
      <category>performance</category>
    </item>
    <item>
      <title>Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 01:36:44 +0000</pubDate>
      <link>https://forem.com/jtorchia/prisma-vs-jdbc-el-benchmark-que-casi-me-hace-culpar-al-orm-equivocado-12hc</link>
      <guid>https://forem.com/jtorchia/prisma-vs-jdbc-el-benchmark-que-casi-me-hace-culpar-al-orm-equivocado-12hc</guid>
      <description>&lt;p&gt;There's a discussion that surfaces every time someone posts an ORM benchmark: "of course JDBC is faster, you're measuring the abstraction". They're right, but only halfway. What nobody says is that the abstraction isn't the only culprit: sometimes the culprit is you, because you let an N+1 slip through without noticing.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;prismavsjdbc&lt;/a&gt; to test this in a controlled way. It's not a benchmark about who wins. It's a lab where the same PostgreSQL 16, the same 50k-task dataset, and the same business scenarios run against two stacks: Node.js 24 LTS + TypeScript + Prisma 5 on one side, and Spring Boot 3 + Java 21 LTS + &lt;code&gt;JdbcTemplate&lt;/code&gt; on the other. The analyzed commit is &lt;code&gt;2cd33e32bd29a1d4b46a26af0b56d6a912f5e4f5&lt;/code&gt;, tag &lt;code&gt;best-effort-editorial-final&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The thesis I'm defending is this: &lt;strong&gt;query shape, SQL/request, and N+1 explain more than the slogan "ORM vs raw SQL"&lt;/strong&gt;. When you optimize the shape, both stacks improve. When you don't, both stacks charge you.&lt;/p&gt;

&lt;h2&gt;The problem that almost made me draw the wrong conclusion&lt;/h2&gt;

&lt;p&gt;The first version of the lab had an obvious trap, even though I didn't see it at first. It compared the most comfortable Prisma implementation, using &lt;code&gt;include&lt;/code&gt; to fetch relations, against a manual join in JDBC. The result was predictable: JDBC measured 1 SQL/request, idiomatic Prisma measured 4 SQL/request on &lt;code&gt;read-by-id&lt;/code&gt;, and latency reflected that.&lt;/p&gt;

&lt;p&gt;Incorrect conclusion I almost published: "Prisma is slower because it emits more queries".&lt;/p&gt;

&lt;p&gt;Correct conclusion: I was comparing different shapes. Prisma's &lt;code&gt;include&lt;/code&gt; fires separate queries per relation; that's not a bug, it's the documented contract of the API. JDBC did a join because I wrote it that way. It's not fair to compare them without acknowledging that.&lt;/p&gt;

&lt;p&gt;That friction changed the entire lab design: I needed three levels within each stack.&lt;/p&gt;

&lt;h2&gt;Three levels: naive, idiomatic, best-effort&lt;/h2&gt;

&lt;p&gt;Adding the &lt;code&gt;level&lt;/code&gt; column to &lt;code&gt;results/comparison.csv&lt;/code&gt; was the most important decision in the project. Without it, any results table is a trap for the reader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;naive&lt;/strong&gt;: the most direct implementation possible, with no thought given to performance. In both stacks, this includes deliberate N+1: per-task queries inside a loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;idiomatic&lt;/strong&gt;: the normal, maintainable way to write code in each stack. Prisma with &lt;code&gt;include&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt;, JDBC with the join any Java dev would write without obsessing over micro-optimizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;best-effort&lt;/strong&gt;: the tightest code the team would accept without it becoming a hack. For Prisma, this means dropping to &lt;code&gt;$queryRaw&lt;/code&gt; when the shape is aggregational.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;read-by-id&lt;/code&gt; scenario with idiomatic Prisma measured 4 SQL/request due to &lt;code&gt;include&lt;/code&gt;. The &lt;code&gt;read-by-id-best-effort&lt;/code&gt; variant with &lt;code&gt;$queryRaw&lt;/code&gt; dropped to 1 SQL/request, the same join JDBC uses. The PostgreSQL plan for that query is clean:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- read-by-id-best-effort: same SQL in Prisma $queryRaw and in JdbcTemplate&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeName"&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;organizations&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;organization_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assignee_id&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'00000000-0000-4000-0100-000000000001'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.242 ms, Buffers: shared hit=9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Prisma and JDBC emit the same SQL, the PostgreSQL plan is identical. That closes the runtime debate: the bottleneck was the shape, not the client.&lt;/p&gt;

&lt;h2&gt;N+1 is the usual villain, but the lab shows it with numbers&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;n-plus-one-trap&lt;/code&gt; scenario exists to make explicit something every developer knows in theory but underestimates in practice. The naive level in both stacks fires individual queries per task; on a 50k-task dataset with concurrency 16, that scales brutally.&lt;/p&gt;

&lt;p&gt;The biggest jump in the lab wasn't between Prisma and JDBC. It was between naive and idiomatic within Prisma. When you go from N+1 to &lt;code&gt;include/_count&lt;/code&gt;, the reduction in SQL/request is immediate and visible in latency. After that, if you want to squeeze more, &lt;code&gt;$queryRaw&lt;/code&gt; gives you another jump, but a smaller one than the first.&lt;/p&gt;

&lt;p&gt;The interesting part on the Java side is that &lt;code&gt;CountingJdbc&lt;/code&gt;, the wrapper over &lt;code&gt;JdbcTemplate&lt;/code&gt; in &lt;code&gt;apps/jdbc-service/src/main/java/com/example/jdbclab/CountingJdbc.java&lt;/code&gt;, uses an &lt;code&gt;AtomicLong&lt;/code&gt; to count queries. That allows an objective SQL/request comparison without relying on logs or &lt;code&gt;pg_stat_statements&lt;/code&gt; as the primary source:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
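&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CountingJdbc.java: instrumentation with no magic, easy to audit&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.concurrent.atomic.AtomicLong&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.jdbc.core.JdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.jdbc.core.RowMapper&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.stereotype.Component&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CountingJdbc&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// constructor added so the excerpt compiles; Spring injects the JdbcTemplate&lt;/span&gt;
  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;CountingJdbc&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jdbc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RowMapper&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// each call to the wrapper adds 1 to the counter&lt;/span&gt;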
    &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementAndGet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the Prisma side, the equivalent lives in &lt;code&gt;apps/prisma-client/src/db.ts&lt;/code&gt;: it hooks into the client's &lt;code&gt;query&lt;/code&gt; event to count. That symmetry in instrumentation is what makes the SQL/request numbers comparable across stacks.&lt;/p&gt;

&lt;h2&gt;When $queryRaw makes sense and when it's a surrender&lt;/h2&gt;

&lt;p&gt;This is the part where a lot of Prisma posts aren't honest. &lt;code&gt;$queryRaw&lt;/code&gt; exists and is valid, but using it for everything is admitting you don't want to use Prisma: you're using PostgreSQL with a fancy TypeScript client.&lt;/p&gt;

&lt;p&gt;The decision in the lab was clear: best-effort with &lt;code&gt;$queryRaw&lt;/code&gt; makes sense in &lt;code&gt;relation-summary&lt;/code&gt; and &lt;code&gt;report-aggregation&lt;/code&gt; because the shape is genuinely aggregational. Prisma &lt;code&gt;groupBy&lt;/code&gt; doesn't cleanly express &lt;code&gt;date_trunc&lt;/code&gt; + join by organization, and forcing it would be worse than writing SQL.&lt;/p&gt;

&lt;p&gt;By contrast, &lt;code&gt;paginated-list&lt;/code&gt; has no best-effort variant because idiomatic Prisma already emits 1 SQL/request with &lt;code&gt;findMany&lt;/code&gt; and filters. Adding &lt;code&gt;$queryRaw&lt;/code&gt; there wouldn't change anything meaningful; it would be complexity with no benefit.&lt;/p&gt;

&lt;p&gt;The table in &lt;code&gt;docs/brief-post.md&lt;/code&gt; models this well: the &lt;code&gt;level&lt;/code&gt; column isn't a scale of "how much effort you put in" but of "how much the SQL shape changes when you apply the variant".&lt;/p&gt;

&lt;h2&gt;What the lab can't guarantee&lt;/h2&gt;

&lt;p&gt;The HTTP runner is homegrown, not k6 or wrk. The hardware is local. Docker Desktop, GC, plan cache, and indexes can shift absolute latencies between runs. The editorial run used 3 runs, 300 requests per run, 30 warmup requests, concurrency 16, and a 50k-task dataset, but those numbers on different hardware can produce different results.&lt;/p&gt;

&lt;p&gt;The version matrix (&lt;code&gt;docs/java-version-matrix.md&lt;/code&gt;) shows Java 21 vs Java 25: there are differences, but the main argument, that N+1 and SQL/request dominate, holds on both JVMs. Java 25 improved &lt;code&gt;read-by-id&lt;/code&gt; by ~20% over Java 21 in the local run, but that doesn't change the fact that the problem in &lt;code&gt;relation-summary-naive&lt;/code&gt; was the shape, not the JVM.&lt;/p&gt;

&lt;p&gt;I wouldn't publish those absolute numbers as universal truth. I publish them as evidence of a pattern: when you change the shape, the delta is orders of magnitude larger than when you change the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I landed
&lt;/h2&gt;

&lt;p&gt;Prisma isn't slow. Prisma with &lt;code&gt;include&lt;/code&gt; emitting 4 queries where you could emit 1 is an ergonomics decision with an observable cost, and that cost is worth paying in most endpoints of an API that isn't under extreme pressure. When the shape really matters, &lt;code&gt;$queryRaw&lt;/code&gt; exists and works well.&lt;/p&gt;

&lt;p&gt;JDBC with &lt;code&gt;JdbcTemplate&lt;/code&gt; isn't superior just because it's direct SQL. It's predictable because the developer controls the shape from the very beginning. The risk sits on the opposite side: nobody checks whether those loops in Java are also doing N+1, because there is no ORM around to take the blame.&lt;/p&gt;

&lt;p&gt;The lab is reproducible. If you have Docker, Node 24 LTS, and Java 21 or 25, you can run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# full editorial run (Bash)&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; editorial &lt;span class="nt"&gt;--size&lt;/span&gt; editorial &lt;span class="nt"&gt;--runs&lt;/span&gt; 3 &lt;span class="nt"&gt;--requests&lt;/span&gt; 300 &lt;span class="nt"&gt;--warmup&lt;/span&gt; 30 &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you just want to verify that the scenarios run without errors before committing time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# quick smoke run to validate the setup&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; smoke &lt;span class="nt"&gt;--size&lt;/span&gt; small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code is at &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;github.com/JuanTorchia/prismavsjdbc&lt;/a&gt;. The editorial results are in &lt;code&gt;results/comparison.csv&lt;/code&gt; and &lt;code&gt;results/comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I'd like to know: in the stack you're using right now, do you have real visibility into the SQL/request of each endpoint? Or do you assume the ORM takes care of it on its own?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Retry isn't free: budget, amplification, and the cost that never shows up in p95</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 15:55:35 +0000</pubDate>
      <link>https://forem.com/jtorchia/retry-isnt-free-budget-amplification-and-the-cost-that-never-shows-up-in-p95-ae9</link>
      <guid>https://forem.com/jtorchia/retry-isnt-free-budget-amplification-and-the-cost-that-never-shows-up-in-p95-ae9</guid>
      <description>&lt;p&gt;There's a decision I've gotten wrong more than once: adding retry as if it were a free improvement. Configure three attempts with exponential backoff, the system looks more stable on the dashboard, done. What I wasn't watching was how many extra calls I was sending to the downstream on every failure.&lt;/p&gt;

&lt;p&gt;This post comes from an experiment I built to measure exactly that: when retry buys real availability, when it multiplies pressure, and when it simply changes nothing because the problem isn't transient. The repo is &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;, commit &lt;code&gt;bdfc350&lt;/code&gt;, with Spring Boot 3.3.5, Java 21, Resilience4j 2.2.0, and k6 as the load generator.&lt;/p&gt;

&lt;p&gt;My thesis is simple: retry is budget. Each extra attempt consumes user wait time, hits the real downstream, and can accelerate a degradation that was already in progress. It's not a feature you flip on and call it done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with only looking at success rate
&lt;/h2&gt;

&lt;p&gt;When the downstream has simulated random failures at 35%, the difference between policies is visible. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, the success rate in that run was &lt;code&gt;0.6529&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt;, it climbed to &lt;code&gt;0.955&lt;/code&gt;. That looks like a clear win.&lt;/p&gt;

&lt;p&gt;But the number that matters is right next to it: &lt;code&gt;retry_amplification_factor&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt; on &lt;code&gt;random-failures&lt;/code&gt; it reached &lt;code&gt;1.465&lt;/code&gt;. That means for every user request, the system made 1.465 real calls to the downstream. In &lt;code&gt;jitter-random-failures&lt;/code&gt; it was &lt;code&gt;1.471&lt;/code&gt;. The downstream received almost 47% more traffic than k6 generated.&lt;/p&gt;

&lt;p&gt;For transient failures that might be acceptable. The downstream is failing for external reasons, retries land at different moments, and the outcome improves. But that 47% extra isn't abstract: downstream capacity has to exist to absorb it. If the service is already at its limit, that overhead is the nudge that tips it over.&lt;/p&gt;

&lt;p&gt;The metric the repo defines as a contract for not fooling yourself is exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MetricSnapshot.java — this line exists to prevent self-deception&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;retryAmplificationFactor&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// downstream_calls / total_requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only look at &lt;code&gt;successRate&lt;/code&gt; and &lt;code&gt;errorRate&lt;/code&gt;, you can believe you won when you actually pushed 47% more load onto a system that was already struggling.&lt;/p&gt;
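
&lt;p&gt;A back-of-the-envelope model makes that number less mysterious. The sketch below is not code from the repo: the class &lt;code&gt;RetryAmplificationModel&lt;/code&gt; and its methods are hypothetical, it assumes every attempt fails independently with probability 0.35, and it assumes a limit of 3 attempts per request, which this post does not state. Under those assumptions the expected amplification is 1 + 0.35 + 0.35^2, roughly 1.47, and the expected success rate is 1 - 0.35^3, roughly 0.957, which is in the same range as the &lt;code&gt;1.465&lt;/code&gt; and &lt;code&gt;0.955&lt;/code&gt; observed in the run.&lt;/p&gt;

```java
// Hypothetical helper, not part of the repo: a simple model of retry amplification.
final class RetryAmplificationModel {

    // Expected downstream calls per user request: 1 + p + p^2 + ... + p^(maxAttempts - 1),
    // assuming every attempt fails independently with probability p.
    static double expectedCallsPerRequest(double failureProbability, int maxAttempts) {
        double expectedCalls = 0.0;
        double reachProbability = 1.0; // probability that this attempt is reached at all
        for (int attempt = 0; attempt != maxAttempts; attempt++) {
            expectedCalls += reachProbability;
            reachProbability *= failureProbability;
        }
        return expectedCalls;
    }

    // Probability that at least one attempt succeeds: 1 - p^maxAttempts.
    static double expectedSuccessRate(double failureProbability, int maxAttempts) {
        return 1.0 - Math.pow(failureProbability, maxAttempts);
    }

    public static void main(String[] args) {
        double p = 0.35;     // failure rate of the random-failures scenario
        int maxAttempts = 3; // assumption: the post does not state the attempt limit
        System.out.printf("expected calls/request: %.4f%n", expectedCallsPerRequest(p, maxAttempts));
        System.out.printf("expected success rate:  %.4f%n", expectedSuccessRate(p, maxAttempts));
    }
}
```

&lt;p&gt;In this model, raising the attempt limit buys success rate with diminishing returns, while the amplification keeps creeping toward 1 / (1 - p), about 1.54 for p = 0.35. That is the budget framing in miniature: the last attempts cost almost as much downstream traffic as the first retry and recover far fewer requests.&lt;/p&gt;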

&lt;h2&gt;
  
  
  progressive-degradation: where retry can accelerate the collapse
&lt;/h2&gt;

&lt;p&gt;This scenario is the most interesting one methodologically, and also the one with the most important warning.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PROGRESSIVE_DEGRADATION&lt;/code&gt; downstream implements this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DownstreamScenario.java — delay grows with each real call received&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;PROGRESSIVE_DEGRADATION&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;callNumber&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The delay isn't external or fixed: it grows with &lt;code&gt;callNumber&lt;/code&gt;, which is the counter of real calls to the downstream. That means a policy with more retries generates more calls, and those calls accelerate the degradation. It's not the same failure for everyone: policies with retry degrade faster because they push harder.&lt;/p&gt;

&lt;p&gt;The numbers from the run show this clearly. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, &lt;code&gt;7720&lt;/code&gt; total requests were processed and &lt;code&gt;7720&lt;/code&gt; downstream calls were initiated. With &lt;code&gt;immediate-retry&lt;/code&gt;, total requests dropped to &lt;code&gt;2939&lt;/code&gt; but downstream calls went up to &lt;code&gt;8699&lt;/code&gt;, with an amplification factor of &lt;code&gt;2.96&lt;/code&gt;. The retry policy processed fewer user requests but made more downstream calls.&lt;/p&gt;

&lt;p&gt;To be clear: this isn't a design flaw, it's the point of the experiment. The lab documents it explicitly in &lt;code&gt;docs/brief-post.md&lt;/code&gt;: &lt;code&gt;progressive-degradation&lt;/code&gt; should be read as load-sensitive degradation, not as an identical external failure for all policies. If you treat it as a direct comparison between policies under the same conditions, the conclusion is framed wrong from the start.&lt;/p&gt;

&lt;p&gt;What you can conclude: in scenarios where the degradation rate depends on the volume of calls received, retries can be an accelerant. That has a name in production: retry storm. And the lab reproduces it in a controlled way.&lt;/p&gt;
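
&lt;p&gt;To get a feel for how quickly that formula bites, here is a small sketch. It reuses the &lt;code&gt;Math.min(900, 80 + callNumber * 3)&lt;/code&gt; expression quoted above and the 260 ms value of the repo's &lt;code&gt;STANDARD_TIMEOUT&lt;/code&gt;. Everything else is hypothetical: the class and method names are mine, it assumes &lt;code&gt;callNumber&lt;/code&gt; only ever increases during a run, and it ignores concurrency. Under those assumptions the simulated delay equals the caller timeout at call 60, exceeds it from call 61 on, and reaches the 900 ms cap at call 274.&lt;/p&gt;

```java
// Hypothetical sketch: only the delay formula and the 260 ms timeout come from the post.
final class ProgressiveDelaySketch {

    static final long CALLER_TIMEOUT_MS = 260; // STANDARD_TIMEOUT value quoted in the post
    static final long BASE_DELAY_MS = 80;
    static final long STEP_MS = 3;
    static final long MAX_DELAY_MS = 900;

    // Same expression as the PROGRESSIVE_DEGRADATION case: linear growth per call, capped.
    static long delayMillis(long callNumber) {
        return Math.min(MAX_DELAY_MS, BASE_DELAY_MS + callNumber * STEP_MS);
    }

    // Smallest call number whose delay is strictly above the caller timeout.
    static long firstCallOverTimeout() {
        return (CALLER_TIMEOUT_MS - BASE_DELAY_MS) / STEP_MS + 1;
    }

    // Smallest call number whose delay has reached the cap (integer ceiling division).
    static long firstCallAtCap() {
        return (MAX_DELAY_MS - BASE_DELAY_MS + STEP_MS - 1) / STEP_MS;
    }

    public static void main(String[] args) {
        long[] samples = {1, 30, 60, 61, 274};
        for (long callNumber : samples) {
            System.out.printf("call %d: %d ms%n", callNumber, delayMillis(callNumber));
        }
        System.out.printf("first call over the timeout: %d%n", firstCallOverTimeout());
        System.out.printf("first call at the cap: %d%n", firstCallAtCap());
    }
}
```

&lt;p&gt;Past the timeout threshold, every extra attempt, retried or not, adds 3 ms to the next call's simulated delay until the cap. That is why the number of downstream calls, and not the number of user requests, is the quantity to watch in this scenario.&lt;/p&gt;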

&lt;h2&gt;
  
  
  The percentiles that lie to you when there are timeouts
&lt;/h2&gt;

&lt;p&gt;There's a technical detail that changed how I read the results, and the README documents it honestly.&lt;/p&gt;

&lt;p&gt;The caller timeout is implemented with &lt;code&gt;future.cancel(true)&lt;/code&gt; in the &lt;code&gt;RetryExecutor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// RetryExecutor.java — cancel(true) interrupts the attempt from the caller side&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MILLISECONDS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TimeoutException&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;cancel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an attempt exceeds the timeout, the latency recorded for that attempt is capped by the caller timeout: &lt;code&gt;STANDARD_TIMEOUT = Duration.ofMillis(260)&lt;/code&gt;. That's why in &lt;code&gt;progressive-degradation&lt;/code&gt; almost all &lt;code&gt;all_attempt_p95_ms&lt;/code&gt; and &lt;code&gt;all_attempt_p99_ms&lt;/code&gt; values show exactly &lt;code&gt;260&lt;/code&gt;. It's not that the downstream responded in 260 ms: it's that the caller stopped waiting at 260 ms and recorded that as the attempt latency.&lt;/p&gt;

&lt;p&gt;What happens after the &lt;code&gt;cancel(true)&lt;/code&gt; in the simulated downstream isn't fully modeled. In a real system with HTTP, a database, or a queue, the downstream may keep executing work even after the client has given up. The lab counts initiated calls but can't guarantee there's no residual work post-cancellation.&lt;/p&gt;

&lt;p&gt;This also matters for reading &lt;code&gt;successful_requests_per_second&lt;/code&gt;. The value of &lt;code&gt;0.95&lt;/code&gt; that appears across several &lt;code&gt;progressive-degradation&lt;/code&gt; scenarios isn't the system's maximum capacity: it's the useful work observed under that closed k6 load. With a different VU configuration, a different duration, or a real network, the numbers would differ.&lt;/p&gt;
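
&lt;p&gt;The capping effect is easy to reproduce in isolation. The sketch below is illustrative only: the latency values are made up for the example and are not results from the lab, &lt;code&gt;CappedPercentileSketch&lt;/code&gt; and its methods are hypothetical, and the percentile uses the nearest-rank method, which may differ from how the repo computes &lt;code&gt;all_attempt_p95_ms&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical sketch: shows how a caller-side timeout caps the recorded percentiles.
final class CappedPercentileSketch {

    // Nearest-rank percentile: the value at rank ceil(percent / 100 * n) in sorted order.
    static long percentile(long[] values, int percent) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100; // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }

    // What the caller records: it stops waiting at the timeout, so nothing above it is seen.
    static long[] recordedByCaller(long[] trueLatencies, long timeoutMs) {
        long[] recorded = new long[trueLatencies.length];
        for (int i = 0; i != trueLatencies.length; i++) {
            recorded[i] = Math.min(trueLatencies[i], timeoutMs);
        }
        return recorded;
    }

    public static void main(String[] args) {
        // Made-up downstream latencies in ms, for illustration only.
        long[] trueLatencies = {90, 110, 130, 150, 170, 190, 210, 230, 250, 270,
                                300, 350, 400, 450, 500, 600, 700, 800, 900, 900};
        long[] recorded = recordedByCaller(trueLatencies, 260);
        System.out.printf("p95 of true latencies:     %d ms%n", percentile(trueLatencies, 95));
        System.out.printf("p95 of recorded latencies: %d ms%n", percentile(recorded, 95));
    }
}
```

&lt;p&gt;With this sample the downstream's own p95 is 900 ms, but the caller-side p95 is 260 ms. A flat percentile that sits exactly on the timeout value is a sign that you are looking at the caller's patience, not at the downstream's latency.&lt;/p&gt;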

&lt;h2&gt;
  
  
  circuit-breaker and bulkhead: visible rejections as a protection signal
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;progressive-degradation&lt;/code&gt;, the circuit breaker produces something that looks contradictory at first glance. The &lt;code&gt;13-circuit-breaker-progressive-degradation&lt;/code&gt; run has &lt;code&gt;total_requests = 44777&lt;/code&gt; and &lt;code&gt;circuit_breaker_rejected = 44718&lt;/code&gt;. The error rate is &lt;code&gt;0.9987&lt;/code&gt;. That looks catastrophic.&lt;/p&gt;

&lt;p&gt;But look at the downstream calls: &lt;code&gt;198&lt;/code&gt;. Amplification factor: &lt;code&gt;0.004&lt;/code&gt;. The circuit breaker almost completely stopped sending calls to the downstream. The rejections are visible to the client, but the downstream is protected.&lt;/p&gt;

&lt;p&gt;Compare that with &lt;code&gt;immediate-retry-progressive-degradation&lt;/code&gt;, which has &lt;code&gt;downstream_calls = 8699&lt;/code&gt; and keeps failing at the same rate, and the trade-off becomes obvious. The circuit breaker chooses to reject fast rather than multiply pressure on something that can no longer respond.&lt;/p&gt;

&lt;p&gt;The bulkhead run in the same scenario shows a different variant: &lt;code&gt;bulkhead_rejected = 22122&lt;/code&gt; with &lt;code&gt;downstream_calls = 3668&lt;/code&gt;. It limits concurrency instead of opening the circuit, but the effect is similar: it reduces downstream pressure at the cost of visible rejections.&lt;/p&gt;

&lt;p&gt;Those concurrency signals (&lt;code&gt;max_inflight_downstream = 16&lt;/code&gt; for bulkhead, &lt;code&gt;40&lt;/code&gt; for most other runs) are observations, not proof of saturation. The lab renamed the metric from &lt;code&gt;saturationObservation&lt;/code&gt; to &lt;code&gt;concurrencyObservation&lt;/code&gt; for exactly that reason: high &lt;code&gt;max_inflight&lt;/code&gt; doesn't prove CPU, network, or connection pool saturation. It's a signal that invites investigation, not a conclusion.&lt;/p&gt;
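
&lt;p&gt;To put the policies side by side, the amplification factor can be recomputed from the counts quoted in this post, using the &lt;code&gt;downstream_calls / total_requests&lt;/code&gt; definition from &lt;code&gt;MetricSnapshot.java&lt;/code&gt;. This is a minimal sketch: the class &lt;code&gt;AmplificationFromCounts&lt;/code&gt; is hypothetical, and returning 0 for an empty run is my own choice, not something taken from the repo.&lt;/p&gt;

```java
// Hypothetical sketch: recomputes the amplification factor from counts quoted in the post.
final class AmplificationFromCounts {

    // downstream_calls / total_requests, the definition quoted from MetricSnapshot.java.
    static double amplificationFactor(long downstreamCalls, long totalRequests) {
        if (totalRequests == 0) {
            return 0.0; // illustrative choice for an empty run, not taken from the repo
        }
        return (double) downstreamCalls / totalRequests;
    }

    public static void main(String[] args) {
        // progressive-degradation counts as published in the post.
        String[] policies = {"no-retry-standard-timeout", "immediate-retry", "circuit-breaker"};
        long[] downstreamCalls = {7720, 8699, 198};
        long[] totalRequests = {7720, 2939, 44777};
        for (int i = 0; i != policies.length; i++) {
            System.out.printf("%-28s %.3f%n", policies[i],
                    amplificationFactor(downstreamCalls[i], totalRequests[i]));
        }
    }
}
```

&lt;p&gt;Read this way, the three policies span almost three orders of magnitude on the same scenario: about 1 without retry, about 2.96 with immediate retry, and about 0.004 with the circuit breaker.&lt;/p&gt;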

&lt;h2&gt;
  
  
  What I conclude and what I don't
&lt;/h2&gt;

&lt;p&gt;This experiment is a local simulation, a single published run, against a simulated downstream with in-memory delays. The numbers don't represent production, don't represent any real provider, and don't support claiming "this policy scales to X RPS". If you want to publish exact values with strong claims, the README says it clearly: run at least three &lt;code&gt;editorial&lt;/code&gt; runs and look for consistency, not a single pass.&lt;/p&gt;

&lt;p&gt;What I think can be sustained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In transient failures, retry can improve success rate, but as soon as any retry fires the amplification factor is greater than 1. That overhead exists and has to fit within the system.&lt;/li&gt;
&lt;li&gt;In load-sensitive degradation, more retries can accelerate the degradation because they generate more calls. This isn't universal, but the scenario is real and the experiment reproduces it.&lt;/li&gt;
&lt;li&gt;p95 and p99 of attempts don't tell you the real downstream latency when there are timeouts: they tell you how long the caller waited before giving up.&lt;/li&gt;
&lt;li&gt;Circuit breaker and bulkhead produce visible rejections that can be exactly the right decision to protect the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I don't conclude: that one policy is better than another in the abstract, that these numbers apply to a different system, or that &lt;code&gt;max_inflight_downstream&lt;/code&gt; proves saturation.&lt;/p&gt;

&lt;p&gt;The question I'm leaving open for further exploration: how much real residual work actually remains in the downstream after a &lt;code&gt;future.cancel(true)&lt;/code&gt; in a system with an HTTP connection pool? The lab notes it as a known limitation. In production that's exactly where the difference lies between a timeout that protects and one that only hides the problem.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;github.com/JuanTorchia/retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;. If you run it and get different numbers, I want to know.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experiments</category>
      <category>backend</category>
      <category>arquitectura</category>
    </item>
    <item>
      <title>Retry isn't free: budget, amplification, and the cost that doesn't show up in p95</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 15:55:26 +0000</pubDate>
      <link>https://forem.com/jtorchia/retry-no-es-gratis-presupuesto-amplificacion-y-el-costo-que-no-aparece-en-el-p95-22no</link>
      <guid>https://forem.com/jtorchia/retry-no-es-gratis-presupuesto-amplificacion-y-el-costo-que-no-aparece-en-el-p95-22no</guid>
      <description>&lt;p&gt;There's a decision I've gotten wrong more than once: adding retry as if it were a free improvement. I configure three attempts with exponential backoff, the system looks more stable on the dashboard, and that's it. What I wasn't watching was how many extra calls I was sending to the downstream on every failure.&lt;/p&gt;

&lt;p&gt;This post comes from an experiment I built to measure exactly that: when retry buys real availability, when it multiplies pressure, and when it simply changes nothing because the problem isn't transient. The repo is &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;, commit &lt;code&gt;bdfc350&lt;/code&gt;, with Spring Boot 3.3.5, Java 21, Resilience4j 2.2.0, and k6 as the load generator.&lt;/p&gt;

&lt;p&gt;My thesis is simple: retry is budget. Each extra attempt consumes user wait time, hits the real downstream, and can accelerate a degradation that was already in progress. It's not a feature you switch on and forget.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with only looking at success rate
&lt;/h2&gt;

&lt;p&gt;When the downstream has simulated random failures at 35%, the difference between policies is visible. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, the success rate in that run was &lt;code&gt;0.6529&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt;, it rose to &lt;code&gt;0.955&lt;/code&gt;. That looks like a clear win.&lt;/p&gt;

&lt;p&gt;But the number that matters is right next to it: the &lt;code&gt;retry_amplification_factor&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt; on &lt;code&gt;random-failures&lt;/code&gt; it reached &lt;code&gt;1.465&lt;/code&gt;. That means that for every user request, the system made 1.465 real calls to the downstream. In &lt;code&gt;jitter-random-failures&lt;/code&gt; it was &lt;code&gt;1.471&lt;/code&gt;. The downstream received almost 47% more traffic than k6 generated.&lt;/p&gt;

&lt;p&gt;For transient failures that can be acceptable. The downstream is failing for external reasons, the retries land at different moments, and the outcome improves. But that extra 47% isn't abstract: downstream capacity has to exist to absorb it. If the service is already at its limit, that overhead is the push that knocks it over.&lt;/p&gt;

&lt;p&gt;The metric the repo defines as a contract for not fooling yourself is exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MetricSnapshot.java: this line exists to prevent self-deception&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;retryAmplificationFactor&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// downstream_calls / total_requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only look at &lt;code&gt;successRate&lt;/code&gt; and &lt;code&gt;errorRate&lt;/code&gt;, you can believe you won when you actually pushed 47% more load onto a system that was already struggling.&lt;/p&gt;

&lt;h2&gt;
  
  
  progressive-degradation: where retry can accelerate the collapse
&lt;/h2&gt;

&lt;p&gt;This scenario is the most interesting one methodologically, and also the one that carries the most important warning.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PROGRESSIVE_DEGRADATION&lt;/code&gt; downstream implements this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DownstreamScenario.java: the delay grows with each real call received&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;PROGRESSIVE_DEGRADATION&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;callNumber&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The delay isn't external or fixed: it grows with &lt;code&gt;callNumber&lt;/code&gt;, which is the counter of real calls to the downstream. That means a policy with more retries generates more calls, and those calls accelerate the degradation. It's not the same failure for everyone: policies with retry degrade faster because they push harder.&lt;/p&gt;

&lt;p&gt;The numbers from the run show this clearly. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, &lt;code&gt;7720&lt;/code&gt; total requests were processed and &lt;code&gt;7720&lt;/code&gt; downstream calls were initiated. With &lt;code&gt;immediate-retry&lt;/code&gt;, total requests dropped to &lt;code&gt;2939&lt;/code&gt; but downstream calls rose to &lt;code&gt;8699&lt;/code&gt;, with an amplification factor of &lt;code&gt;2.96&lt;/code&gt;. The retry policy processed fewer user requests but made more calls to the downstream.&lt;/p&gt;

&lt;p&gt;That said, this isn't a design flaw, it's the point of the experiment. The lab documents it explicitly in &lt;code&gt;docs/brief-post.md&lt;/code&gt;: &lt;code&gt;progressive-degradation&lt;/code&gt; should be read as load-sensitive degradation, not as an identical external failure for all policies. If you treat it as a direct comparison between policies under the same conditions, the conclusion is framed wrong from the start.&lt;/p&gt;

&lt;p&gt;What you can conclude: in scenarios where the speed of degradation depends on the volume of calls received, retries can accelerate the collapse. That has a name in production: retry storm. And the lab reproduces it in a controlled way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The percentiles that lie to you when there are timeouts
&lt;/h2&gt;

&lt;p&gt;There's a technical detail that changed how I read the results, and the README documents it honestly.&lt;/p&gt;

&lt;p&gt;The caller timeout is implemented with &lt;code&gt;future.cancel(true)&lt;/code&gt; in the &lt;code&gt;RetryExecutor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// RetryExecutor.java: cancel(true) interrupts the attempt from the caller side&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MILLISECONDS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TimeoutException&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;cancel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an attempt exceeds the timeout, the latency recorded for that attempt is capped by the caller timeout: &lt;code&gt;STANDARD_TIMEOUT = Duration.ofMillis(260)&lt;/code&gt;. That's why in &lt;code&gt;progressive-degradation&lt;/code&gt; almost all &lt;code&gt;all_attempt_p95_ms&lt;/code&gt; and &lt;code&gt;all_attempt_p99_ms&lt;/code&gt; values show exactly &lt;code&gt;260&lt;/code&gt;. It's not that the downstream responded in 260 ms: it's that the caller stopped waiting at 260 ms and recorded that as the attempt latency.&lt;/p&gt;

&lt;p&gt;What happens after the &lt;code&gt;cancel(true)&lt;/code&gt; in the simulated downstream isn't fully modeled. In a real system with HTTP, a database, or a queue, the downstream may keep executing work even though the client is no longer waiting. The lab counts initiated calls, but it can't guarantee there's no residual work after cancellation.&lt;/p&gt;

&lt;p&gt;This also matters for reading &lt;code&gt;successful_requests_per_second&lt;/code&gt;. The value of &lt;code&gt;0.95&lt;/code&gt; that appears in several &lt;code&gt;progressive-degradation&lt;/code&gt; scenarios isn't the system's maximum capacity: it's the useful work observed under that closed k6 load. With a different VU configuration, a different duration, or a real network, the numbers would differ.&lt;/p&gt;

&lt;h2&gt;
  
  
  circuit-breaker and bulkhead: visible rejections as a protection signal
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;progressive-degradation&lt;/code&gt;, the circuit breaker produces something that looks contradictory at first glance. The &lt;code&gt;13-circuit-breaker-progressive-degradation&lt;/code&gt; run has &lt;code&gt;total_requests = 44777&lt;/code&gt; and &lt;code&gt;circuit_breaker_rejected = 44718&lt;/code&gt;. The error rate is &lt;code&gt;0.9987&lt;/code&gt;. That looks catastrophic.&lt;/p&gt;

&lt;p&gt;But look at the downstream calls: &lt;code&gt;198&lt;/code&gt;. Amplification factor: &lt;code&gt;0.004&lt;/code&gt;. The circuit breaker almost completely stopped sending calls to the downstream. The rejections are visible to the client, but the downstream is protected.&lt;/p&gt;

&lt;p&gt;Compare that with &lt;code&gt;immediate-retry-progressive-degradation&lt;/code&gt;, which has &lt;code&gt;downstream_calls = 8699&lt;/code&gt; and keeps failing just the same, and the trade-off becomes obvious. The circuit breaker chooses to reject fast rather than multiply pressure on something that can no longer respond.&lt;/p&gt;

&lt;p&gt;The bulkhead run in the same scenario shows a different variant: &lt;code&gt;bulkhead_rejected = 22122&lt;/code&gt; with &lt;code&gt;downstream_calls = 3668&lt;/code&gt;. It limits concurrency instead of opening the circuit, but the effect is similar: it reduces downstream pressure at the cost of visible rejections.&lt;/p&gt;

&lt;p&gt;Those concurrency signals (&lt;code&gt;max_inflight_downstream = 16&lt;/code&gt; for bulkhead, &lt;code&gt;40&lt;/code&gt; for most other runs) are observations, not proof of saturation. The lab renamed the metric from &lt;code&gt;saturationObservation&lt;/code&gt; to &lt;code&gt;concurrencyObservation&lt;/code&gt; for exactly that reason: a high &lt;code&gt;max_inflight&lt;/code&gt; doesn't prove CPU, network, or connection pool saturation. It's a signal that invites investigation, not a conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I conclude and what I don't
&lt;/h2&gt;

&lt;p&gt;This experiment is a local simulation, a single published run, against a simulated downstream with in-memory delays. The numbers don't represent production, don't represent any real provider, and don't support claiming "this policy scales to X RPS". If you want to publish exact values with strong claims, the README says it clearly: do at least three &lt;code&gt;editorial&lt;/code&gt; runs and look for consistency, not a single pass.&lt;/p&gt;

&lt;p&gt;What I think can be sustained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In transient failures, retry can improve success rate, but as soon as any retry fires the amplification factor is greater than 1. That overhead exists and has to fit within the system.&lt;/li&gt;
&lt;li&gt;In load-sensitive degradation, more retries can accelerate the degradation because they generate more calls. This isn't universal, but the scenario is real and the experiment reproduces it.&lt;/li&gt;
&lt;li&gt;p95 and p99 of attempts don't tell you the real downstream latency when there are timeouts: they tell you how long the caller waited before giving up.&lt;/li&gt;
&lt;li&gt;Circuit breaker and bulkhead produce visible rejections that can be exactly the right decision to protect the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I don't conclude: that one policy is better than another in the abstract, that these numbers apply to a different system, or that &lt;code&gt;max_inflight_downstream&lt;/code&gt; proves saturation.&lt;/p&gt;

&lt;p&gt;The question I'm leaving open for further exploration: how much real residual work remains in the downstream after a &lt;code&gt;future.cancel(true)&lt;/code&gt; in a system with an HTTP connection pool? The lab notes it as a known limitation. In production that's exactly where the difference lies between a timeout that protects and one that only hides the problem.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;github.com/JuanTorchia/retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;. If you run it and get different numbers, I'd like to hear about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
