<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dominic Pi-Sunyer</title>
    <description>The latest articles on Forem by Dominic Pi-Sunyer (@dominic-pi-sunyer).</description>
    <link>https://forem.com/dominic-pi-sunyer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3928035%2Feb91c939-7f88-40d0-acc2-1a435d6e8454.jpg</url>
      <title>Forem: Dominic Pi-Sunyer</title>
      <link>https://forem.com/dominic-pi-sunyer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dominic-pi-sunyer"/>
    <language>en</language>
    <item>
      <title>Stop feeding raw HTML to your LLMs (Solving the Agentic Token Tax)</title>
      <dc:creator>Dominic Pi-Sunyer</dc:creator>
      <pubDate>Tue, 12 May 2026 23:35:47 +0000</pubDate>
      <link>https://forem.com/dominic-pi-sunyer/stop-feeding-raw-html-to-your-llms-solving-the-agentic-token-tax-547f</link>
      <guid>https://forem.com/dominic-pi-sunyer/stop-feeding-raw-html-to-your-llms-solving-the-agentic-token-tax-547f</guid>
      <description>&lt;p&gt;If you are building autonomous AI agents that interact with the web, you have almost certainly hit the same architectural wall we did: The Token Tax.&lt;/p&gt;

&lt;p&gt;The standard pipeline for web-enabled agents right now is incredibly inefficient. An agent needs context from a webpage, so the developer uses a standard HTTP scraper to pull the DOM, maybe converts it to markdown, and dumps the entire thing into the LLM's context window.&lt;/p&gt;

&lt;p&gt;The result? You are paying premium API costs to process 5,000 lines of div-soup, inline styles, and tracking scripts just so your agent can find a single price tag or button ID.&lt;/p&gt;
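
&lt;p&gt;To make the tax concrete, here is roughly what that naive pipeline looks like. This is a sketch, not our stack: requests, markdownify, and tiktoken stand in for whatever scraper and tokenizer you use, and the URL is illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# The naive pipeline: fetch raw HTML, flatten it, count the damage.
# requests, markdownify, and tiktoken are illustrative stand-ins.
import requests
import tiktoken
from markdownify import markdownify as md

html = requests.get("https://example.com/product/123").text  # hypothetical page
markdown = md(html)  # "cleaner", but still full of nav links and footer noise

enc = tiktoken.get_encoding("cl100k_base")
print("raw HTML tokens:", len(enc.encode(html)))
print("markdown tokens:", len(enc.encode(markdown)))
# Every one of these tokens is billed before the agent does anything useful.&lt;/code&gt;&lt;/pre&gt;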

&lt;p&gt;Beyond the financial cost, this probabilistic approach introduces massive latency and almost always breaks when the agent encounters a modern Single Page Application (SPA) with an empty initial DOM, or hits a strict anti-bot layer like DataDome.&lt;/p&gt;

&lt;p&gt;We realized the autonomous web needs a deterministic protocol, not a better scraper. So we built Web Speed: a deterministic adaptation layer that cuts agentic token costs by 70 to 90 percent. Here is a look at the architecture and how we handle the hardest edge cases in agentic web navigation.&lt;/p&gt;

&lt;p&gt;The "Empty DOM" and Client-Side Rendering&lt;/p&gt;

&lt;p&gt;Standard scrapers fail on React/Vue SPAs because the initial HTML is empty. Web Speed doesn't just scrape; it hydrates.&lt;/p&gt;

&lt;p&gt;Under the hood, the engine spins up a local Playwright-driven browser. When you use primitives like interpret_page(js=true) or evaluate(), the engine actually waits for the application to mount: these state-aware primitives pause execution until the client-side router has finished loading the specific view.&lt;/p&gt;
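
&lt;p&gt;interpret_page and evaluate are Web Speed primitives, but the underlying pattern is easy to show with plain Playwright. A minimal sketch, assuming an illustrative SPA URL and a data-loaded marker that the app sets once the view mounts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal hydration wait with plain Playwright (pip install playwright).
# The URL and the "#app [data-loaded]" selector are illustrative placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://spa.example.com")          # initial DOM is nearly empty
    page.wait_for_load_state("networkidle")       # let the JS bundle arrive
    page.wait_for_selector("#app [data-loaded]")  # block until the view mounts
    html = page.content()                         # the DOM is now hydrated
    browser.close()&lt;/code&gt;&lt;/pre&gt;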

&lt;h2&gt;Semantic Distillation (DOM-to-JSON)&lt;/h2&gt;

&lt;p&gt;Once the page is fully hydrated, we don't send the raw DOM to the model. Our Mapping Layer acts as a semantic filter.&lt;/p&gt;

&lt;p&gt;It automatically strips out script, style, and tracking tags that consume massive amounts of tokens but provide zero semantic value to an agent. Then, it distills the live DOM into high-signal JSON. If the engine detects a product page, it immediately maps the visual hierarchy and returns a clean {name, price, specs} schema.&lt;/p&gt;
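
&lt;p&gt;A stripped-down version of that filter is easy to sketch with BeautifulSoup. The CSS selectors here are illustrative placeholders; the real mapping layer detects page types rather than hardcoding them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of semantic distillation: drop zero-signal tags, emit compact JSON.
# Selectors are illustrative; Web Speed's mapping layer infers them per page.
import json
from bs4 import BeautifulSoup

def distill_product(html):
    soup = BeautifulSoup(html, "html.parser")
    # These tags cost tokens but carry no semantic value for an agent.
    for tag in soup(["script", "style", "noscript", "iframe", "svg"]):
        tag.decompose()
    product = {
        "name": soup.select_one("h1").get_text(strip=True),
        "price": soup.select_one(".price").get_text(strip=True),
        "specs": [li.get_text(strip=True) for li in soup.select(".specs li")],
    }
    return json.dumps(product)  # a few hundred tokens instead of thousands&lt;/code&gt;&lt;/pre&gt;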

&lt;p&gt;This deterministic extraction is what drives the 70 to 90 percent token reduction and the ~40% drop in execution latency.&lt;/p&gt;

&lt;h2&gt;Bypassing 403s with Zero-Trust Local Execution&lt;/h2&gt;

&lt;p&gt;The other massive bottleneck for agentic web access is bot detection. If you run your scraper in a clean cloud environment, Cloudflare will flag it instantly. Furthermore, if you need an agent to act on an authenticated page (like a user's dashboard), sending session cookies to a third-party cloud is a massive security risk.&lt;/p&gt;

&lt;p&gt;To solve this, Web Speed runs natively on the host machine and attaches to real browser sessions via CDP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real Fingerprints:&lt;/strong&gt; It inherits active local sessions and genuine hardware fingerprints. Credentials never leave the local machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-Like Interaction:&lt;/strong&gt; Instead of just altering DOM values programmatically, primitives like fill_field(use_keyboard=true) simulate actual hardware-level keystrokes, bypassing the "trusted input" checks used by modern security layers.&lt;/p&gt;
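
&lt;p&gt;Both ideas can be sketched with plain Playwright over CDP. This assumes Chrome was started with --remote-debugging-port=9222 and uses an illustrative dashboard URL and field selector; fill_field(use_keyboard=true) wraps this kind of mechanism:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: attach to the user's real browser over CDP and type like hardware.
# Assumes Chrome is running with --remote-debugging-port=9222;
# the URL and the "#search" selector are illustrative.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]   # inherit the live, logged-in session
    page = context.pages[0]
    page.goto("https://app.example.com/dashboard")
    # press_sequentially fires real key events instead of setting .value.
    page.locator("#search").press_sequentially("quarterly report", delay=50)
    page.keyboard.press("Enter")&lt;/code&gt;&lt;/pre&gt;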

&lt;h2&gt;Native MCP Integration&lt;/h2&gt;

&lt;p&gt;We wanted this to be a drop-in infrastructure upgrade for the current ecosystem, so we built it as a native Model Context Protocol (MCP) server. You can plug Web Speed directly into Claude Desktop, the Gemini CLI, or your custom orchestration frameworks to give your agents high-fidelity, deterministic web access immediately.&lt;/p&gt;
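
&lt;p&gt;For a sense of how little glue that takes, here is a skeletal MCP server using the official Python SDK. The tool body is a stub, not the actual Web Speed implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Skeleton of an MCP server exposing one tool (pip install mcp).
# The tool body is a stub; the real implementation drives the browser.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("web-speed")

@mcp.tool()
def interpret_page(url: str) -&gt; str:
    """Return a distilled, high-signal JSON view of the page at url."""
    return '{"name": "...", "price": "...", "specs": []}'  # stub

if __name__ == "__main__":
    mcp.run()  # stdio transport; point Claude Desktop at this script&lt;/code&gt;&lt;/pre&gt;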

&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;If we want agentic AI to scale economically, we have to stop treating LLMs like HTML parsers. The web needs to be distilled into a structured protocol that machines can consume directly.&lt;/p&gt;

&lt;p&gt;If you are running into token limits, SPA hydration issues, or 403 blocks with your agents, you can check out our benchmarks and the SDK over at &lt;a href="https://getwebspeed.io"&gt;getwebspeed.io&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’d love to hear how the DEV community is currently handling web access for local agents: are you still using raw Markdown dumps, or have you moved to structured extraction?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br&gt;
Dominic&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
