<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Björn Roberg</title>
    <description>The latest articles on Forem by Björn Roberg (@roobie).</description>
    <link>https://forem.com/roobie</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3922688%2F058fa45b-c023-4de1-b455-68aa3d6d1d86.jpeg</url>
      <title>Forem: Björn Roberg</title>
      <link>https://forem.com/roobie</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/roobie"/>
    <language>en</language>
    <item>
      <title>xAI open-sourced their ranker. It doesn't compile.</title>
      <dc:creator>Björn Roberg</dc:creator>
      <pubDate>Sun, 17 May 2026 19:41:18 +0000</pubDate>
      <link>https://forem.com/roobie/xai-open-sourced-their-ranker-it-doesnt-compile-2gfj</link>
      <guid>https://forem.com/roobie/xai-open-sourced-their-ranker-it-doesnt-compile-2gfj</guid>
      <description>&lt;p&gt;On 2026-05-15, xAI pushed an update to &lt;a href="https://github.com/xai-org/x-algorithm" rel="noopener noreferrer"&gt;&lt;code&gt;xai-org/x-algorithm&lt;/code&gt;&lt;/a&gt; — the recommender behind X's For You feed. Three years after Twitter's original 2023 release, under different leadership.&lt;/p&gt;

&lt;p&gt;I cloned it and ran analysis on it.&lt;/p&gt;

&lt;p&gt;The released code does not compile.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's missing
&lt;/h2&gt;

&lt;p&gt;There's no &lt;code&gt;Cargo.toml&lt;/code&gt; anywhere in the repo. The Rust crates reference &lt;code&gt;crate::params::FAVORITE_WEIGHT&lt;/code&gt;, &lt;code&gt;crate::params::REPORT_WEIGHT&lt;/code&gt;, &lt;code&gt;crate::params::OON_WEIGHT_FACTOR&lt;/code&gt; — sixty-plus named symbols — but &lt;code&gt;crate::params&lt;/code&gt; is not in the tree. Same for &lt;code&gt;crate::clients::*&lt;/code&gt; (referenced 58 times, all &lt;code&gt;Prod*Client&lt;/code&gt; implementations missing), the entire &lt;code&gt;xai_feature_switches&lt;/code&gt; plane, and the training code.&lt;/p&gt;

&lt;p&gt;The values were withheld. That's the headline read.&lt;/p&gt;

&lt;h2&gt;
  
  
  The seams of the sanitization
&lt;/h2&gt;

&lt;p&gt;What's interesting is the &lt;em&gt;unevenness&lt;/em&gt; of the pass. The redaction ran out of energy in specific, telling ways:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;TWEET_EVENT_TOPIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The topic name variable is preserved. The topic string is empty. A mechanical search-and-replace on string literals left the variable intact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape — the env-var-reading call survives; the env-var name is empty-stringed out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deluxe&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual invalid Python in &lt;code&gt;grox/classifiers/content/safety_ptos.py&lt;/code&gt;. A predicate was deleted; the &lt;code&gt;and&lt;/code&gt; was left dangling. The file does not parse as shipped. Nobody re-ran the test suite after the cut.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ModelName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EAPI_REASONING_INTERNAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Preserved verbatim in a public file. An internal model identifier with &lt;code&gt;_INTERNAL&lt;/code&gt; in its name, shipped to a public release. That's a survivor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PTOS_CUTOFF_TWEET_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2_054_275_414_225_846_272&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A real Snowflake date cutoff for "tweets after this date require PTOS_REVIEWED label or default to MediumRisk." That's a policy boundary, in plaintext, in a public file.&lt;/p&gt;

&lt;p&gt;Zero &lt;code&gt;TODO&lt;/code&gt;, zero &lt;code&gt;FIXME&lt;/code&gt;, zero &lt;code&gt;XXX&lt;/code&gt; across two hundred source files. Someone swept the comments. Combined with the syntax errors, the empty strings, and the &lt;code&gt;_INTERNAL&lt;/code&gt; leak, it reads as &lt;strong&gt;a hurried mechanical pass over code that wasn't structured for public release&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're keeping score, this looks like a textbook screw-up: they shipped less than 2023 Twitter did (which shipped actual numeric weights — favorite=0.5, reply=13.5, reply_engaged_by_author=75, report=-369), the redaction was sloppy, and the consolation prize is that nothing in the released form is even runnable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the redaction did &lt;em&gt;not&lt;/em&gt; touch
&lt;/h2&gt;

&lt;p&gt;But hold that read next to this.&lt;/p&gt;

&lt;p&gt;Every weight has a symbol. Every threshold has a symbol. Every feature flag has a symbol. The values are gone. &lt;strong&gt;The schema is fully public.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now think about who the competitor audience actually is. Meta, TikTok, Pinterest, Reddit, Discord — companies with their own users, their own engagement data, their own A/B testing infrastructure. What do they lack?&lt;/p&gt;

&lt;p&gt;They don't lack data. They have their own users. &lt;strong&gt;What they lack is the list of dials xAI considers worth having.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"FAVORITE_WEIGHT = 1.0" is a single data point on someone else's product. But the &lt;em&gt;set&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FAVORITE_WEIGHT, REPORT_WEIGHT, OON_WEIGHT_FACTOR, NEW_USER_OON_WEIGHT_FACTOR,
AUTHOR_DIVERSITY_DECAY, AUTHOR_DIVERSITY_FLOOR, MAX_POST_AGE, ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not data. That's the &lt;em&gt;design of the search space&lt;/em&gt;. The values are recoverable from there if you have your own data — which a serious competitor does.&lt;/p&gt;

&lt;p&gt;Search space matters more than search values when you have your own engagement data. The release shipped the search space.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which is it?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Reading A — the oopsie.&lt;/strong&gt; Mechanical sanitization stripped numeric values. The same script left symbol names alone because it didn't have to scrub them. The schema leak is a side-effect. They thought they were hiding the operational details. They didn't realize that for serious competitors, the architecture &lt;em&gt;is&lt;/em&gt; most of the operational details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reading B — the play.&lt;/strong&gt; Someone at xAI knew exactly what they were doing. The values get you sued, regulated, front-paged. The names signal architectural sophistication, anchor recruiting conversations, lock competitors into your conceptual framework, and let you claim transparency without leaking what actually matters. The schema is &lt;em&gt;meant&lt;/em&gt; to be visible.&lt;/p&gt;

&lt;p&gt;You can't tell which is right from the artifact alone. But the diagnostic question generalizes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read the function, not the artifact
&lt;/h2&gt;

&lt;p&gt;A partial disclosure is one observation. The interesting object is the &lt;em&gt;function that produces the artifact&lt;/em&gt; — and that function shows itself across releases, not within one.&lt;/p&gt;

&lt;p&gt;Reading A predicts: the next release will have more leaks of the same kind, because the underlying process is mechanical and the codebase keeps evolving.&lt;/p&gt;

&lt;p&gt;Reading B predicts: the next release will close some of the current leaks and open new disclosure surfaces, because there's a calibration loop.&lt;/p&gt;

&lt;p&gt;They diverge over a release cadence, not over a single release.&lt;/p&gt;

&lt;p&gt;If I had to bet right now, I'd bet &lt;strong&gt;both are partially true&lt;/strong&gt;: leadership chose architectural-transparency-without-operational-disclosure as the release shape (deliberate), and the actual execution of that shape was a sanitization pipeline that wasn't entirely careful (accidental). The empty-string Kafka topics and the broken Python are sloppy execution of a deliberate strategy.&lt;/p&gt;

&lt;p&gt;The operational consequence is identical either way: the schema is public. What matters is what comes next.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd take from this
&lt;/h2&gt;

&lt;p&gt;When you can't grep for numbers, &lt;strong&gt;grep for symbol names&lt;/strong&gt;. The set of names tells you what the operator considers a tunable axis. That's design intent, and it's hard to fake or obscure once it's in source.&lt;/p&gt;

&lt;p&gt;The redaction's seams are diagnostic. Mechanical sanitization leaves footprints: empty strings, broken syntax, surviving &lt;code&gt;_INTERNAL&lt;/code&gt; identifiers, zero-&lt;code&gt;TODO&lt;/code&gt; source trees. If the seams are obvious, the redaction was probably mechanical and the disclosure was probably not curated artifact-by-artifact. If there are no seams, somebody was paying attention.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally posted at &lt;a href="https://bjro.dev/posts/xai-schema-disclosure-oopsie-or-onpurpose/" rel="noopener noreferrer"&gt;bjro.dev&lt;/a&gt;. Co-authored with Claude Opus 4.7.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I tagged every blog post by cognitive mode. Most of mine are knowledge-telling — and that's fine.</title>
      <dc:creator>Björn Roberg</dc:creator>
      <pubDate>Wed, 13 May 2026 09:54:28 +0000</pubDate>
      <link>https://forem.com/roobie/i-tagged-every-blog-post-by-cognitive-mode-most-of-mine-are-knowledge-telling-and-thats-fine-350p</link>
      <guid>https://forem.com/roobie/i-tagged-every-blog-post-by-cognitive-mode-most-of-mine-are-knowledge-telling-and-thats-fine-350p</guid>
      <description>&lt;p&gt;I added a single optional field to my blog's frontmatter. By the end of the afternoon, I had a meta page that classified my own posts into four groups, and the distribution was unflattering in a way I hadn't predicted.&lt;/p&gt;

&lt;p&gt;Here's the field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kt_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knowledge-transforming&lt;/span&gt;   &lt;span class="c1"&gt;# or knowledge-telling, or mixed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the schema change, in Zod for an Astro content collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;kt_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knowledge-telling&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knowledge-transforming&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mixed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole new contract. Optional, so existing posts don't break. Omission means "I haven't classified this yet."&lt;/p&gt;

&lt;p&gt;A new page at &lt;code&gt;/lenses/&lt;/code&gt; reads the field and groups every post under one of four headings: &lt;code&gt;knowledge-transforming&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;, &lt;code&gt;knowledge-telling&lt;/code&gt;, &lt;code&gt;unclassified&lt;/code&gt;. The Astro is unremarkable — &lt;code&gt;getCollection&lt;/code&gt;, group by a property, render a &lt;code&gt;&amp;lt;Card&amp;gt;&lt;/code&gt; per post. I cribbed the structure from my own &lt;code&gt;/archives/&lt;/code&gt; page and replaced "group by year" with "group by &lt;code&gt;kt_mode ?? 'unclassified'&lt;/code&gt;". About forty lines.&lt;/p&gt;

&lt;p&gt;What's worth talking about is what the field is &lt;em&gt;for&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The categories aren't mine
&lt;/h2&gt;

&lt;p&gt;They come from Carl Bereiter and Marlene Scardamalia, &lt;em&gt;The Psychology of Written Composition&lt;/em&gt; (1987). Their central claim, simplified:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Two distinct cognitive processes underlie all written composition, and most writers default to the simpler one without noticing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Knowledge-telling&lt;/strong&gt; is the retrieve-and-write loop. You locate what you know, locate the rough shape of the genre, write down what comes up, stop when nothing new surfaces. The output is coherent. Your beliefs don't change. Cognitive cost is low. Most tutorials, most how-tos, most "here's a thing I built" posts are knowledge-telling — and that's appropriate. They're not arguing anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge-transforming&lt;/strong&gt; runs two problem spaces in tension: a &lt;em&gt;content&lt;/em&gt; space (&lt;em&gt;what do I actually believe? what's the real shape of this claim?&lt;/em&gt;) and a &lt;em&gt;rhetorical&lt;/em&gt; space (&lt;em&gt;how should this be said, to whom, toward what end?&lt;/em&gt;). Each space's tentative answer constrains the other; the writer iterates between them. Beliefs shift during the writing. Cognitive cost is high. If you can predict the post at the start, you weren't doing this — you were doing the first thing, well.&lt;/p&gt;

&lt;p&gt;The uncomfortable empirical claim: most writing, including by competent practitioners, is knowledge-telling. Knowledge-transforming doesn't happen by accident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reveal
&lt;/h2&gt;

&lt;p&gt;I classified five pilot posts — two knowledge-transforming, one mixed, two knowledge-telling. The remaining twenty-four sit under "unclassified" because I haven't gone through them yet.&lt;/p&gt;

&lt;p&gt;That ratio is itself diagnostic. I have written more than I have &lt;em&gt;deliberately transformed&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is not a moral failing. Knowledge-telling is what most published technical writing &lt;em&gt;should&lt;/em&gt; be. The failure mode B&amp;amp;S names is not "writing knowledge-telling posts." It's &lt;strong&gt;mistaking one for the other&lt;/strong&gt; — treating a knowledge-telling artifact as if it were substrate-bearing thinking. The categories are useful precisely because they expose the confusion. You can't tell from the surface. Both kinds of post can be well-written, well-structured, fully formed. The difference is invisible to the reader unless the writer marks it.&lt;/p&gt;

&lt;p&gt;So I'm marking it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters in 2026
&lt;/h2&gt;

&lt;p&gt;Large language models perform &lt;strong&gt;knowledge-telling natively&lt;/strong&gt;. That's what next-token prediction structurally &lt;em&gt;is&lt;/em&gt; — retrieve plausible content that fits a topic and a genre, write it down, repeat. An LLM has no content problem space because it has no beliefs to revise.&lt;/p&gt;

&lt;p&gt;This is fine when knowledge-telling is what you wanted. It's fine for boilerplate, summaries, restatements of well-trodden material, drafts you'll rewrite.&lt;/p&gt;

&lt;p&gt;The failure mode is when knowledge-telling masquerades as the other thing. An LLM-written essay can have all the surface markers of knowledge-transforming — caveats, opposing views considered, conclusions that update mid-piece — &lt;em&gt;without anyone's beliefs having changed&lt;/em&gt;. The surface is dialectic; the substrate is empty. It reads like thinking happened; nothing was thought.&lt;/p&gt;

&lt;p&gt;That's what &lt;code&gt;kt_mode&lt;/code&gt; is trying to make legible. Not as a quality grade. As a process disclosure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;If this resonates, here's the small thing I'd suggest:&lt;/p&gt;

&lt;p&gt;Add one optional frontmatter field to your blog or notes system. Any classification axis that names something invisible from the surface — cognitive mode, co-authoring posture, genre, audience, your own taxonomy. The constraint is that the categories should be hard to fake and hard to derive from skimming. The act of choosing the value should require an honest look at what you actually wrote.&lt;/p&gt;

&lt;p&gt;Then build a meta page that displays the corpus through that lens. It doesn't have to be pretty. The point is that the classification becomes part of the artifact.&lt;/p&gt;

&lt;p&gt;The first useful thing this gave me wasn't the categories themselves — those were always there. It was the visible &lt;em&gt;unclassified&lt;/em&gt; count, growing every time I declined to look at a post and decide. The blank space is its own diagnostic.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally posted at &lt;a href="https://bjro.dev/posts/tagging-posts-by-cognitive-mode/" rel="noopener noreferrer"&gt;bjro.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>writing</category>
      <category>astro</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
