<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: gen</title>
    <description>The latest articles on Forem by gen (@prospectorlabs).</description>
    <link>https://forem.com/prospectorlabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938157%2F509120e2-78f1-4ec3-87bb-3a6fcdde6532.png</url>
      <title>Forem: gen</title>
      <link>https://forem.com/prospectorlabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/prospectorlabs"/>
    <language>en</language>
    <item>
      <title>luckrig: a concept for tasting LLM rigs, not just models</title>
      <dc:creator>gen</dc:creator>
      <pubDate>Fri, 22 May 2026 17:39:03 +0000</pubDate>
      <link>https://forem.com/prospectorlabs/luckrig-a-concept-for-tasting-llm-rigs-not-just-models-34l4</link>
      <guid>https://forem.com/prospectorlabs/luckrig-a-concept-for-tasting-llm-rigs-not-just-models-34l4</guid>
      <description>

&lt;p&gt;HuggingFace Spaces lets you try models.&lt;br&gt;
LMSys Arena lets you compare models.&lt;/p&gt;

&lt;p&gt;Neither lets you try a specific rig.&lt;/p&gt;

&lt;p&gt;Exact GPU. Exact quantization. Exact context length.&lt;br&gt;
Someone's actual tuning notes — with your own prompt, right now.&lt;/p&gt;

&lt;p&gt;That's the gap. luckrig is a concept to fill it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If Arena maps models, luckrig maps the rigs.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;What you taste&lt;/th&gt;
&lt;th&gt;Hardware visible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HF Spaces&lt;/td&gt;
&lt;td&gt;Author's model wrap&lt;/td&gt;
&lt;td&gt;Whatever they printed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LMSys Arena&lt;/td&gt;
&lt;td&gt;Blind A/B models&lt;/td&gt;
&lt;td&gt;Model name. Nothing else.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Horde&lt;/td&gt;
&lt;td&gt;Any worker that fits&lt;/td&gt;
&lt;td&gt;Abstracted away&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;luckrig&lt;/td&gt;
&lt;td&gt;A specific rig&lt;/td&gt;
&lt;td&gt;GPU · quant · ctx · tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AI Horde abstracts the worker away.&lt;br&gt;
luckrig makes the hardware the star.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Access earned by contribution, not money.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inspired by Hotline Connect, the late-1990s Mac file-sharing tool where&lt;br&gt;
contribution, not payment, determined access rights.&lt;/p&gt;

&lt;p&gt;Register a node → write tuning notes → upload timing measurements.&lt;br&gt;
That's how you earn access to other people's rigs.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Three seed nodes exist in the POC — not yet public.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first-5090-qwen3 — RTX 5090, Qwen3-35B-A3B, Q4_K_XL, 267 tok/s&lt;/li&gt;
&lt;li&gt;weekend-m3max — Apple M3 Max, Qwen2.5-14B, Q5_K_M&lt;/li&gt;
&lt;li&gt;shed-pi5 — Raspberry Pi 5, llama3.2-1B, 2.3 tok/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are local test nodes to demonstrate the concept.&lt;br&gt;
Looking for early contributors who want to register a real node.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Rarity-first, not leaderboard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Pi node ranks higher than the 5090 because it's rarer.&lt;br&gt;
Not a speed competition — a showcase of diversity.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Working POC. No external dependencies.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;git clone https://github.com/prospectorlabs/luckrig&lt;br&gt;
cd luckrig&lt;br&gt;
npm start&lt;br&gt;
→ &lt;a href="http://127.0.0.1:8787" rel="noopener noreferrer"&gt;http://127.0.0.1:8787&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Concept + full spec + working code, all open.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/prospectorlabs/luckrig" rel="noopener noreferrer"&gt;https://github.com/prospectorlabs/luckrig&lt;/a&gt;&lt;br&gt;
&lt;a href="https://prospectorlabs.dev/luckrig/" rel="noopener noreferrer"&gt;https://prospectorlabs.dev/luckrig/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I hid an entire webpage inside a cat face</title>
      <dc:creator>gen</dc:creator>
      <pubDate>Mon, 18 May 2026 13:46:38 +0000</pubDate>
      <link>https://forem.com/prospectorlabs/i-hid-an-entire-webpage-inside-a-cat-face-5gc9</link>
      <guid>https://forem.com/prospectorlabs/i-hid-an-entire-webpage-inside-a-cat-face-5gc9</guid>
      <description>&lt;p&gt;The source of this page is just a cat face:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nothing-to-see-here.surge.sh/" rel="noopener noreferrer"&gt;https://nothing-to-see-here.surge.sh/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;View source. You'll see (=･ω･=) and nothing else.&lt;br&gt;
But the page runs. Rainbow animation, layout, everything.&lt;/p&gt;

&lt;p&gt;All of the page's JavaScript is encoded as invisible Unicode &lt;br&gt;
Variation Selectors attached to the cat face.&lt;/p&gt;

&lt;p&gt;There are 256 Unicode variation selectors (16 in U+FE00–U+FE0F and 240 in U+E0100–U+E01EF), &lt;br&gt;
exactly one per byte value. Any byte sequence can &lt;br&gt;
ride inside normal text, invisible to readers, &lt;br&gt;
surviving copy-paste across Slack, X, LINE, iMessage.&lt;/p&gt;
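
&lt;p&gt;A minimal sketch of the byte-to-selector idea, not the actual subtext implementation. The mapping order (bytes 0-15 to U+FE00–U+FE0F, bytes 16-255 to U+E0100–U+E01EF) and the function names hide and reveal are assumptions made for illustration.&lt;/p&gt;

```javascript
// Assumed mapping (not confirmed by the post):
//   bytes 0-15   -> U+FE00..U+FE0F
//   bytes 16-255 -> U+E0100..U+E01EF
function byteToSelector(byte) {
  return byte >= 16
    ? String.fromCodePoint(0xe0100 + (byte - 16))
    : String.fromCodePoint(0xfe00 + byte);
}

// Returns the byte for a variation selector, or null for any other code point.
function selectorToByte(codePoint) {
  if (codePoint >= 0xfe00) {
    if (0xfe0f >= codePoint) return codePoint - 0xfe00;
  }
  if (codePoint >= 0xe0100) {
    if (0xe01ef >= codePoint) return codePoint - 0xe0100 + 16;
  }
  return null;
}

// Appends the UTF-8 bytes of payload to carrier as invisible selectors.
function hide(carrier, payload) {
  const bytes = new TextEncoder().encode(payload);
  return carrier + Array.from(bytes, byteToSelector).join("");
}

// Collects every variation selector in text and decodes the bytes as UTF-8.
function reveal(text) {
  const bytes = [];
  for (const ch of text) {
    const b = selectorToByte(ch.codePointAt(0));
    if (b !== null) bytes.push(b);
  }
  return new TextDecoder().decode(Uint8Array.from(bytes));
}
```

&lt;p&gt;With these helpers, reveal(hide("(=･ω･=)", "alert(1)")) returns the string alert(1), while the hidden text still displays as the bare cat face in most renderers.&lt;/p&gt;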

&lt;p&gt;That's subtext.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://prospectorlabs.dev/subtext" rel="noopener noreferrer"&gt;https://prospectorlabs.dev/subtext&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>unicode</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE</title>
      <dc:creator>gen</dc:creator>
      <pubDate>Mon, 18 May 2026 12:56:14 +0000</pubDate>
      <link>https://forem.com/prospectorlabs/267-toks-local-inference-on-rtx-5090-llamacpp-mtp-qwen3-35b-a3b-moe-2m6p</link>
      <guid>https://forem.com/prospectorlabs/267-toks-local-inference-on-rtx-5090-llamacpp-mtp-qwen3-35b-a3b-moe-2m6p</guid>
      <description>&lt;p&gt;I've been running Qwen3-35B-A3B (MoE) with llama.cpp's Multi-Token Prediction &lt;br&gt;
(MTP, a form of speculative decoding) on an RTX 5090 under WSL2. The results surprised me:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ollama stock (35B MoE)&lt;/td&gt;
&lt;td&gt;171 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;27B Dense + MTP&lt;/td&gt;
&lt;td&gt;104 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;35B MoE + MTP&lt;/td&gt;
&lt;td&gt;267 tok/s  ← this&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context: Claude Haiku runs ~150 tok/s via API and is billed per token.&lt;br&gt;
This setup runs on electricity only.&lt;/p&gt;

&lt;p&gt;The interesting finding is that MoE and speculative decoding have unusual &lt;br&gt;
synergy. With a dense model, MTP gave a modest speedup (or none). &lt;br&gt;
With MoE, throughput reached 267 tok/s, about 1.5x the 171 tok/s Ollama stock baseline.&lt;/p&gt;
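
&lt;p&gt;The ratios implied by the table can be checked with a few lines of arithmetic. The numbers are copied from the table. The 171 tok/s baseline comes from Ollama, not from llama.cpp with MTP disabled, so the first ratio mixes runtime differences with the MTP effect. The helper name speedup is only illustrative.&lt;/p&gt;

```javascript
// Throughput figures copied from the table in the post (tok/s).
const results = {
  ollamaStockMoE: 171,
  denseMtp: 104,
  moeMtp: 267,
};

// Ratio of two throughputs, rounded to two decimals.
function speedup(fast, slow) {
  return Math.round((fast / slow) * 100) / 100;
}

console.log(speedup(results.moeMtp, results.ollamaStockMoE)); // 1.56
console.log(speedup(results.moeMtp, results.denseMtp)); // 2.57
```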

&lt;p&gt;My hypothesis: MoE's sparse activation pattern leaves compute headroom that &lt;br&gt;
speculative decoding can exploit. The draft tokens are cheap to verify because &lt;br&gt;
most experts stay inactive during verification passes.&lt;/p&gt;

&lt;p&gt;Setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTX 5090, WSL2 (Ubuntu 24)&lt;/li&gt;
&lt;li&gt;llama.cpp with MTP draft, n-max 2&lt;/li&gt;
&lt;li&gt;Qwen3-35B-A3B-Instruct Q4_K_XL&lt;/li&gt;
&lt;li&gt;ctx 65536, OpenAI-compatible API on localhost&lt;/li&gt;
&lt;/ul&gt;
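
&lt;p&gt;For anyone reproducing the numbers, here is a rough sketch of timing one request against the OpenAI-compatible endpoint. It is not the script used for the figures above. The base URL and port, the max_tokens value, and the presence of usage.completion_tokens in the response are assumptions, and wall-clock timing includes prompt processing, so it will understate pure decode speed.&lt;/p&gt;

```javascript
// Pure helper: generated tokens divided by wall-clock seconds.
function tokensPerSecond(completionTokens, elapsedMs) {
  if (!(elapsedMs > 0)) throw new RangeError("elapsedMs must be positive");
  return completionTokens / (elapsedMs / 1000);
}

// Assumption: baseUrl points at a local OpenAI-compatible server and the
// response carries usage.completion_tokens, as the OpenAI chat API does.
async function measure(baseUrl, prompt) {
  const started = performance.now();
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: prompt }],
      max_tokens: 256, // arbitrary cap chosen for this sketch
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  const elapsedMs = performance.now() - started;
  return tokensPerSecond(data.usage.completion_tokens, elapsedMs);
}
```

&lt;p&gt;Usage would look like measure("http://127.0.0.1:8080", "Explain MoE routing.").then(console.log), with the port adjusted to wherever the server actually listens. Averaging several runs gives a steadier figure than a single request.&lt;/p&gt;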

&lt;p&gt;Happy to share the exact llama-server launch flags if anyone wants to reproduce.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>llama</category>
      <category>gpu</category>
    </item>
  </channel>
</rss>
