“In the beginning there was the word... and the word was scraped.”
🎯 What is ScraperAgent?
ScraperAgent is a Swarm-native scraper —
no Docker, no headless Chrome, no external queues.
Just:
- 🧠 Drop a
.json
job intopayload/
- 🕷️ Agent fetches + parses the target
- 📜 Logs result to Mailman in seconds
✅ What It Extracts
From any URL, it pulls:
- ✅
<title>
- ✅ Meta description
- ✅ First 10
<a href>
links - ✅ Logs it all into
/comm/mailman-1/payload/
as a.json
file
Example log:
{
"target": "https://matrixswarm.com",
"title": "MatrixSwarm - Autonomous Agent OS",
"description": "A self-healing AI swarm framework.",
"link_count": 27,
"links": [
"https://matrixswarm.com/about",
"https://matrixswarm.com/docs"
]
}
💾 How It Works
You drop this in:
json
Copy
Edit
{
"target_url": "https://matrixswarm.com",
"mode": "summary"
}
ScraperAgent sees the .json, processes the target, and drops a parsed report into Mailman.
⚙️ Spawn Directive
json
Copy
Edit
{
"permanent_id": "scraper-1",
"name": "scraper",
"filesystem": {
"folders": [
{ "name": "payload", "type": "d" }
]
},
"config": {
"report_to": "mailman-1"
}
}
🧠 Why It Changes Everything
Traditional Stack MatrixSwarm
Puppeteer ❌
Chrome Headless ❌
Containerized Scraper API ❌
200MB image + queue ❌
Swarm Alternative:
💡 scraper_agent.py + payload/ folder + mailman-1
No containers.
No browser.
No mess.
🤯 Bonus Potential
✨ Feed results into OracleAgent for summary
🔁 Schedule crawl via Commander
💣 Pair with ReaperAgent to wipe stale data after analysis
💬 The Takeaway
This isn't scraping.
This is Swarm Recon — deployed by Matrix, logged by Mailman, interpreted by Oracle, and remembered forever.
🔗 Get Started
GitHub: https://github.com/matrixswarm/matrixswarm
Website: https://matrixswarm.com
YouTube: MatrixSwarm OS – Spawn, Kill, Resurrect
X/Twitter: @matrixswarm
📜 Fork It Clause
MatrixSwarm is open.
Fork it.
Or Fork U.
(The swarm is open. Bring tools or get logged.)
Top comments (2)
Man, cutting out all that chrome and container junk is so clutch. Honestly just dropping a json and getting logs fast? Yeah, that's my speed.
Appreciate it, Nevo.
That’s the goal — no containers, no chrome, no noise.
Just agents, spawning on signal, logging on impact.
Drop a .json, watch it move.
MatrixSwarm was built for ops who don’t want overhead —
just outcomes.