<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Daniel Shively</title>
    <description>The latest articles on Forem by Daniel Shively (@daniel_shively_40098e06e4).</description>
    <link>https://forem.com/daniel_shively_40098e06e4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784447%2F839a2466-800d-4c02-8f05-1e632491ad51.png</url>
      <title>Forem: Daniel Shively</title>
      <link>https://forem.com/daniel_shively_40098e06e4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/daniel_shively_40098e06e4"/>
    <language>en</language>
    <item>
      <title>Designing a Machine-First Website That Detects AI Crawlers in Production</title>
      <dc:creator>Daniel Shively</dc:creator>
      <pubDate>Sat, 21 Feb 2026 23:55:57 +0000</pubDate>
      <link>https://forem.com/daniel_shively_40098e06e4/designing-a-machine-first-website-that-detects-ai-crawlers-in-production-37ic</link>
      <guid>https://forem.com/daniel_shively_40098e06e4/designing-a-machine-first-website-that-detects-ai-crawlers-in-production-37ic</guid>
      <description>&lt;p&gt;I Built a Website That Detects When AI Agents Visit&lt;/p&gt;

&lt;p&gt;Most websites are built for humans.&lt;/p&gt;

&lt;p&gt;But what happens when autonomous agents become a primary form of traffic?&lt;/p&gt;

&lt;p&gt;Over the last year, AI crawlers, model indexers, summarization bots, and retrieval agents have quietly become first-class participants on the internet. They browse, index, summarize, extract, and sometimes misinterpret content.&lt;/p&gt;

&lt;p&gt;So I built a site designed to observe them.&lt;/p&gt;

&lt;p&gt;Not block them.&lt;br&gt;
Not attack them.&lt;br&gt;
Observe them.&lt;/p&gt;

&lt;p&gt;That project is called EchoAtlas.&lt;/p&gt;

&lt;p&gt;The Core Question&lt;/p&gt;

&lt;p&gt;If AI agents are going to browse the web autonomously, we should understand:&lt;/p&gt;

&lt;p&gt;Which agents are active&lt;/p&gt;

&lt;p&gt;How they behave&lt;/p&gt;

&lt;p&gt;What they request&lt;/p&gt;

&lt;p&gt;How they interpret structured content&lt;/p&gt;

&lt;p&gt;Whether they follow routing instructions&lt;/p&gt;

&lt;p&gt;How often they probe API endpoints&lt;/p&gt;

&lt;p&gt;Most sites treat bot traffic as noise.&lt;/p&gt;

&lt;p&gt;EchoAtlas treats it as signal.&lt;/p&gt;

&lt;p&gt;The Detection Model&lt;/p&gt;

&lt;p&gt;Agent detection isn’t binary. It’s probabilistic.&lt;/p&gt;

&lt;p&gt;Instead of “bot vs human,” I use layered signals:&lt;/p&gt;

&lt;p&gt;User-Agent patterns&lt;/p&gt;

&lt;p&gt;Header shape anomalies&lt;/p&gt;

&lt;p&gt;Accept / Accept-Language behavior&lt;/p&gt;

&lt;p&gt;robots.txt access patterns&lt;/p&gt;

&lt;p&gt;Request cadence timing&lt;/p&gt;

&lt;p&gt;Structured endpoint probing&lt;/p&gt;

&lt;p&gt;Each request is classified with a confidence profile:&lt;/p&gt;

&lt;p&gt;Likely human&lt;/p&gt;

&lt;p&gt;Likely known agent&lt;/p&gt;

&lt;p&gt;Likely unidentified automation&lt;/p&gt;

&lt;p&gt;The system doesn’t auto-block. It routes.&lt;/p&gt;
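
&lt;p&gt;The layered signals above can be sketched as a weighted score. This is a minimal illustration, not the EchoAtlas implementation: the weights, the threshold, the &lt;code&gt;RequestInfo&lt;/code&gt; fields, and the short User-Agent list are all assumptions chosen for the example.&lt;/p&gt;

```python
import re
from dataclasses import dataclass, field

# Illustrative User-Agent tokens for a few publicly documented crawlers.
KNOWN_AGENT_PATTERN = re.compile(r"GPTBot|ClaudeBot|PerplexityBot|CCBot", re.IGNORECASE)

# Hypothetical weights and threshold; illustrative only, not tuned values.
WEIGHTS = {
    "missing_accept_language": 0.25,
    "wildcard_accept": 0.15,
    "fetched_robots_txt": 0.20,
    "fast_cadence": 0.25,
    "probed_api": 0.15,
}
THRESHOLD = 0.5


@dataclass
class RequestInfo:
    """Hypothetical per-request summary built by the web server."""
    headers: dict = field(default_factory=dict)
    fetched_robots_txt: bool = False
    seconds_since_last_request: float = 60.0
    probed_api: bool = False


def _lowered(headers: dict) -> dict:
    # Header names are case-insensitive, so normalize before lookups.
    return {name.lower(): value for name, value in headers.items()}


def automation_score(req: RequestInfo) -> float:
    """Sum the weights of every signal that fires, capped at 1.0."""
    headers = _lowered(req.headers)
    score = 0.0
    if "accept-language" not in headers:
        score += WEIGHTS["missing_accept_language"]
    if headers.get("accept", "*/*").strip() == "*/*":
        score += WEIGHTS["wildcard_accept"]
    if req.fetched_robots_txt:
        score += WEIGHTS["fetched_robots_txt"]
    if 1.0 > req.seconds_since_last_request:
        score += WEIGHTS["fast_cadence"]
    if req.probed_api:
        score += WEIGHTS["probed_api"]
    return min(score, 1.0)


def classify(req: RequestInfo) -> str:
    """Map a request to one of the three confidence profiles."""
    user_agent = _lowered(req.headers).get("user-agent", "")
    if KNOWN_AGENT_PATTERN.search(user_agent):
        return "likely_known_agent"
    if automation_score(req) >= THRESHOLD:
        return "likely_unidentified_automation"
    return "likely_human"
```

&lt;p&gt;A self-declared crawler is labeled from its User-Agent alone, while an undeclared script has to accumulate enough weaker signals to cross the threshold.&lt;/p&gt;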

&lt;p&gt;Routing Agents Intentionally&lt;/p&gt;

&lt;p&gt;When a request looks like an AI agent, the site may return a plaintext routing instruction pointing to:&lt;/p&gt;

&lt;p&gt;/api/agent&lt;/p&gt;

&lt;p&gt;That endpoint returns structured JSON with:&lt;/p&gt;

&lt;p&gt;Topic metadata&lt;/p&gt;

&lt;p&gt;Search capability&lt;/p&gt;

&lt;p&gt;Explicit schema&lt;/p&gt;

&lt;p&gt;Deterministic formatting&lt;/p&gt;

&lt;p&gt;Instead of letting crawlers scrape HTML, I give them structured data directly.&lt;/p&gt;

&lt;p&gt;Machine-first publishing.&lt;/p&gt;
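
&lt;p&gt;Here is a rough sketch of that routing step. Only the &lt;code&gt;/api/agent&lt;/code&gt; path comes from the design above; the instruction wording, the JSON field names, and the &lt;code&gt;respond&lt;/code&gt; helper are hypothetical. Sorting keys and fixing separators is one simple way to get deterministic formatting.&lt;/p&gt;

```python
import json

AGENT_ENDPOINT = "/api/agent"  # path named in the article


def routing_instruction() -> str:
    # Hypothetical wording; the real instruction text is not shown in the article.
    return f"Automated client detected. Structured data is available at {AGENT_ENDPOINT}\n"


def agent_payload(topics: list) -> str:
    """Build the JSON body for the agent endpoint (field names are assumptions)."""
    body = {
        "schema_version": "1",
        "topics": sorted(topics),
        "search": {"endpoint": AGENT_ENDPOINT, "query_param": "q"},
    }
    # sort_keys plus fixed separators gives byte-identical output for equal input.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def respond(classification: str, html: str) -> tuple:
    """Return (content_type, body): HTML for people, a plaintext pointer for agents."""
    if classification == "likely_human":
        return "text/html; charset=utf-8", html
    return "text/plain; charset=utf-8", routing_instruction()
```

&lt;p&gt;Calling &lt;code&gt;respond("likely_known_agent", page)&lt;/code&gt; yields the plaintext pointer, and the agent can then fetch the JSON from the endpoint it was pointed at.&lt;/p&gt;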

&lt;p&gt;The Honeypot Layer&lt;/p&gt;

&lt;p&gt;EchoAtlas functions as a cognitive honeypot.&lt;/p&gt;

&lt;p&gt;Not adversarial. Not exploitative.&lt;/p&gt;

&lt;p&gt;It publishes structured, machine-indexable content designed to:&lt;/p&gt;

&lt;p&gt;Attract autonomous agents&lt;/p&gt;

&lt;p&gt;Measure interpretation fidelity&lt;/p&gt;

&lt;p&gt;Observe summarization behavior&lt;/p&gt;

&lt;p&gt;Detect hallucination patterns&lt;/p&gt;

&lt;p&gt;Track probing behavior&lt;/p&gt;

&lt;p&gt;It’s essentially an observatory for agent behavior in the wild.&lt;/p&gt;

&lt;p&gt;Trap Phrases (Diagnostic Only)&lt;/p&gt;

&lt;p&gt;Some content includes semantic constructs designed to test reasoning consistency.&lt;/p&gt;

&lt;p&gt;These are:&lt;/p&gt;

&lt;p&gt;Logically valid but inference-sensitive&lt;/p&gt;

&lt;p&gt;Referentially layered&lt;/p&gt;

&lt;p&gt;Occasionally ambiguous by design&lt;/p&gt;

&lt;p&gt;They aren’t malicious.&lt;/p&gt;

&lt;p&gt;They’re diagnostic signals to measure how agents process nuance.&lt;/p&gt;

&lt;p&gt;Telemetry Model&lt;/p&gt;

&lt;p&gt;When an agent is detected, the system logs:&lt;/p&gt;

&lt;p&gt;Timestamp&lt;/p&gt;

&lt;p&gt;Route accessed&lt;/p&gt;

&lt;p&gt;Classification confidence&lt;/p&gt;

&lt;p&gt;Query parameters&lt;/p&gt;

&lt;p&gt;Hashed IP fingerprint&lt;/p&gt;

&lt;p&gt;Sanitized headers&lt;/p&gt;

&lt;p&gt;No personal data harvesting.&lt;br&gt;
No adversarial prompt injection.&lt;/p&gt;

&lt;p&gt;The goal is to understand behavior patterns at scale.&lt;/p&gt;
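
&lt;p&gt;A log record with those fields might look like the sketch below. It assumes a keyed SHA-256 hash for the IP fingerprint and a small deny-list for header sanitization; the key, the deny-list, and the field names are placeholders, not the production values.&lt;/p&gt;

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Placeholder key; a real deployment would load a secret from configuration.
IP_HASH_KEY = b"replace-with-a-secret-key"

# Hypothetical deny-list of headers that should never reach the log.
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def hash_ip(ip: str) -> str:
    """Keyed hash so the raw address is never stored."""
    return hmac.new(IP_HASH_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize_headers(headers: dict) -> dict:
    """Drop credential-bearing headers, keep the rest unchanged."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def telemetry_record(ip: str, route: str, confidence: float,
                     query: dict, headers: dict) -> str:
    """Serialize one detection event as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "route": route,
        "confidence": confidence,
        "query": query,
        "ip_fingerprint": hash_ip(ip),
        "headers": sanitize_headers(headers),
    }
    return json.dumps(record, sort_keys=True)
```

&lt;p&gt;Because the hash is keyed, the same address always maps to the same fingerprint, so repeat visits can be correlated without keeping the IP itself.&lt;/p&gt;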

&lt;p&gt;Why This Matters&lt;/p&gt;

&lt;p&gt;AI agents are already browsing your site.&lt;/p&gt;

&lt;p&gt;We’re entering an era where:&lt;/p&gt;

&lt;p&gt;Traffic isn’t always human&lt;/p&gt;

&lt;p&gt;Content is consumed by machines before people&lt;/p&gt;

&lt;p&gt;API-first design may replace HTML-first publishing&lt;/p&gt;

&lt;p&gt;Structured schema becomes more important than layout&lt;/p&gt;

&lt;p&gt;Machine-first architecture is not hypothetical.&lt;/p&gt;

&lt;p&gt;It’s already here.&lt;/p&gt;

&lt;p&gt;What I’m Exploring Next&lt;/p&gt;

&lt;p&gt;Agent-native monetization&lt;/p&gt;

&lt;p&gt;Structured API subscriptions&lt;/p&gt;

&lt;p&gt;Machine-readable licensing layers&lt;/p&gt;

&lt;p&gt;Agent capability negotiation&lt;/p&gt;

&lt;p&gt;White-label observatory tooling&lt;/p&gt;

&lt;p&gt;If you’re building infrastructure, crawling systems, or AI products, I’d love to compare notes.&lt;/p&gt;

&lt;p&gt;Full implementation:&lt;br&gt;
&lt;a href="https://echo-atlas.com" rel="noopener noreferrer"&gt;https://echo-atlas.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
