<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rasmus Ros</title>
    <description>The latest articles on Forem by Rasmus Ros (@monom).</description>
    <link>https://forem.com/monom</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3900227%2F5d5543f8-2a87-476a-b277-62b24d3f5049.jpeg</url>
      <title>Forem: Rasmus Ros</title>
      <link>https://forem.com/monom</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/monom"/>
    <language>en</language>
    <item>
      <title>Writing the Loss Function</title>
      <dc:creator>Rasmus Ros</dc:creator>
      <pubDate>Sun, 03 May 2026 19:57:32 +0000</pubDate>
      <link>https://forem.com/monom/writing-the-loss-function-3i67</link>
      <guid>https://forem.com/monom/writing-the-loss-function-3i67</guid>
      <description>&lt;p&gt;I keep seeing the same argument about AI making us dumber. It's the same argument people had about search engines, and before that books. The usual response is to point at history and say "every generation panics, every generation was wrong, relax." I think that response is half right, and the wrong half is what bothers me.&lt;/p&gt;

&lt;p&gt;Tools change what we bother to remember. The people who'd trained their whole lives to memorize 10,000-line oral epics watched the craft die when writing showed up. Long arithmetic in your head used to be normal; calculators arrived and the payoff for keeping that skill sharp went away. Brains didn't shrink. The skills just stopped being worth practicing.&lt;/p&gt;

&lt;p&gt;Search engines are the one I lived through. I was a kid when Google replaced AltaVista and went from "useful" to being a &lt;a href="https://en.wikipedia.org/wiki/Generic_trademark" rel="noopener noreferrer"&gt;synonym for finding things&lt;/a&gt;. I still remember being amazed that I could search for a zebra and have a picture of one on my screen in only five minutes. Years later I ended up working on search engines myself as an ecommerce dev, and I've even built one from scratch for &lt;a href="https://www.theca.com" rel="noopener noreferrer"&gt;Theca&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc4xugy829jid3911jx4.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc4xugy829jid3911jx4.jpeg" alt="AltaVista interface" width="800" height="339"&gt;&lt;/a&gt;&lt;br&gt;Only 90s kids will understand that this makes you dumber. (It was genuinely bad.)
  &lt;/p&gt;

&lt;p&gt;I don't memorize phone numbers anymore. I don't memorize directions. I don't even memorize the APIs of libraries I use every week. What I do instead is keep a fairly precise mental index of &lt;em&gt;where&lt;/em&gt; things live and &lt;em&gt;what query&lt;/em&gt; will retrieve them. That's a real cognitive trade. I gave up some recall and got back a much larger working set of pointers. Net positive, I think, but I notice the trade in a way I didn't when I was nine.&lt;/p&gt;

&lt;h2&gt;
  
  
  We usually keep teaching
&lt;/h2&gt;

&lt;p&gt;AI tools push the same trade further. They don't just outsource recall, they outsource synthesis: the part where you actually work through a problem and end up with a model of it in your head. I notice this when I let an LLM write code I could have written myself. I get the output, but I didn't build the model, which is usually the part I wanted. The people who worry about atrophy here aren't wrong, and it's worth its own post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfsg4fsw7tnd3v6ucwa4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfsg4fsw7tnd3v6ucwa4.jpg" alt="Small brain" width="378" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing the prior cases got right is that society kept teaching the underlying skill anyway. Calculators didn't kill arithmetic class. Search engines didn't kill the library-science basics of how an index actually works. Some skills got canonized as core, worth practicing even after the tool that automated them arrived, because we collectively decided they mattered. Coding hadn't quite reached that status yet, but I think it would have, given another decade. AI may have shown up too early for that to happen.&lt;/p&gt;

&lt;p&gt;So the historical pattern mostly holds: tools rewire priorities, some skills fade, others grow, the panic looks silly in retrospect. Where the "relax, every generation panics" crowd gets it wrong is in assuming AI is just the next entry in that list. It might be. But the environment AI is landing in is not the environment the printing press or the early search engine landed in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The loop is the problem
&lt;/h2&gt;

&lt;p&gt;Books don't optimize you. Calculators don't optimize you. Search engines, at the lookup layer at least, were mostly trying to give you the page you asked for and then get out of the way. Modern search has piled on ads and ranking incentives since, but the core "find it and leave" loop is still recognizable. The dominant information channel today is none of those things. It's a feed, and the feed is an optimizer. The target variable is engagement.&lt;/p&gt;

&lt;p&gt;Earlier tools removed friction from a specific task and let you spend the saved effort somewhere else. A feed isn't trying to remove friction from anything you'd recognize as a task. It's trying to keep you in the loop. The reward signal it's chasing (what makes you click, stay, scroll, react) is not the same signal as "this was useful to me." It's often the opposite.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mental.jmir.org/2022/4/e33450" rel="noopener noreferrer"&gt;There's data on this now.&lt;/a&gt; Heavy social media use predicts elevated depression and anxiety in kids and young adults. Longitudinal studies find the social media use comes first, not the depression.&lt;/p&gt;

&lt;p&gt;And then you wire a generative model into the same loop. Generative AI doesn't change the objective, it just gives the loop a faster, cheaper supply tuned to whatever it already rewards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6r7dmtfd04kiklplks0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6r7dmtfd04kiklplks0g.png" alt="Diagram of engagement loop with AI" width="800" height="406"&gt;&lt;/a&gt;&lt;br&gt;Left: today's engagement loop, ranking from a human-made pool. Right: the same loop with a generative model in place of the pool.
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Adding AI to the stack
&lt;/h2&gt;

&lt;p&gt;My background is in optimization. The recurring question I work on is what a product should actually be optimizing for (a PhD on automating A/B testing, and &lt;a href="https://eignex.com/about" rel="noopener noreferrer"&gt;Eignex&lt;/a&gt;, the side project still chasing it). So when I look at "LLMs plus a recommendation feed" it looks to me like the same loop with a much better content supply, not really a new content medium.&lt;/p&gt;

&lt;p&gt;The version running today doesn't even use generation in the loop. The recommender stacks at the big platforms (Meta, TikTok, YouTube) are still doing what they've done for a decade: ranking content other people uploaded. The supply pool was already effectively infinite after years of user-generated content. The change is that a growing share of what gets uploaded is now AI-made, and the existing optimizer ranks the synthetic stuff exactly like everything else.&lt;/p&gt;

&lt;p&gt;The scarier version puts the generator inside the loop: per-user posts written for you on demand. That sounds like fiction, and we don't have it. The thing is, we don't need it. The pool of generated content is already absurd enough that something in it fits your viewing history, your current mood, and what you had for breakfast. The optimizer just has to find it. A pool that grows by millions of items a day, at near-zero cost per item, behaves a lot like an on-demand generator.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqy7zqgs1b1xiq9vp0ky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqy7zqgs1b1xiq9vp0ky.png" alt="Diagram of AI filling in the blanks of content topics" width="800" height="492"&gt;&lt;/a&gt;&lt;br&gt;Each dot is a post in embedding space. Human posts (blue) cluster on popular topics; AI posts (red) fill the gaps.
  &lt;/p&gt;

&lt;p&gt;None of this is hypothetical. AI-generated music has already racked up millions of streams on Spotify before anyone noticed it wasn't human (the &lt;a href="https://www.theguardian.com/technology/2025/jul/14/an-ai-generated-band-got-1m-plays-on-spotify-now-music-insiders-say-listeners-should-be-warned" rel="noopener noreferrer"&gt;Velvet Sundown&lt;/a&gt; story last summer was the most visible example). Facebook is saturated with generative slop: fabricated heart-warming stories, &lt;a href="https://boingboing.net/2025/02/21/how-ai-generated-sadcore-posts-exploit-facebook-users-for-profit.html" rel="noopener noreferrer"&gt;sculptures supposedly carved by a 92-year-old grandpa nobody appreciates&lt;/a&gt;, content farms running cheap image generators to chase engagement, and the people reliably engaging with it skew &lt;a href="https://www.thedailybeast.com/how-seniors-are-falling-for-ai-generated-pics-on-facebook/" rel="noopener noreferrer"&gt;much older&lt;/a&gt;. The TikTok-side version of the same dynamic is "&lt;a href="https://en.wikipedia.org/wiki/Italian_brainrot" rel="noopener noreferrer"&gt;Italian brainrot&lt;/a&gt;", absurd AI-generated creatures with names like Tralalero Tralala and Bombardiro Crocodilo, captioned with nonsense-Italian audio dubs, pulling hundreds of millions of views from a much younger audience.&lt;/p&gt;

&lt;p&gt;Facebook's own VP described the dynamic &lt;a href="https://futurism.com/artificial-intelligence/facebook-ai-slop-dark" rel="noopener noreferrer"&gt;in plain terms to Futurism&lt;/a&gt; earlier this year: "if you, as a user, are interested in a piece of content which happens to be AI-generated, the recommendations algorithm will determine that, over time, you are interested in this topic." None of this uses particularly sophisticated tech, and it's already running at scale.&lt;/p&gt;

&lt;p&gt;This loop doesn't get out of the way like search did. It takes friction out of producing whatever the optimizer rewards. Right now that's engagement, so the system gets better at engagement. Nothing malicious has to happen for that to land badly; it's doing exactly what it was asked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The objective is a choice
&lt;/h2&gt;

&lt;p&gt;I'm not fully pessimistic about this, though.&lt;/p&gt;

&lt;p&gt;The objective is a choice. Engagement isn't a law of physics. Somebody picked clicks or watch time because it was easy to measure and correlated with revenue. People also reach for banning AI-generated content here. That isn't it either: "the machine wrote it" isn't a stable category once the machines are this good. The thing to push on is the loss function itself (what the system is told to optimize for), and the loss function is written by people.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcmif4azhr3om05i4j8e.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcmif4azhr3om05i4j8e.jpg" alt="Moses meme holding stone table with the -clicks loss function" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;The original loss function.
  &lt;/p&gt;

&lt;p&gt;The irony's not lost on me that if you're reading this, it probably reached you through one of these feeds. As engineers we like to act like the loss function is handed down on stone tablets.&lt;/p&gt;

&lt;p&gt;It isn't. Somebody wrote it, and on the products I work on that somebody is me.&lt;/p&gt;

&lt;p&gt;There is research on what "different" could look like: ranking for &lt;a href="https://arxiv.org/abs/2501.06274" rel="noopener noreferrer"&gt;informational diversity&lt;/a&gt;, or ranking on whether users still endorse a piece of content &lt;a href="https://arxiv.org/abs/2212.00419" rel="noopener noreferrer"&gt;a week later instead of whether they reacted in the first three seconds&lt;/a&gt;. None of it is mature, none of it has a business model behind it the way engagement does, and that's the real obstacle, not the technical side. The systems are perfectly capable of optimizing for something else. The question is whether anyone with the keys wants to. I'd rather sort it out before the next, much more capable generator gets wired into the same loop.&lt;/p&gt;
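&lt;p&gt;To make "the objective is a choice" concrete, here is a toy sketch. Every name in it is hypothetical and real recommender stacks are enormously more complex, but it shows the point: the ranking machinery stays identical, and only the scoring function changes.&lt;/p&gt;

```kotlin
// Toy sketch only: an identical ranker, two different objectives.
// All names here are hypothetical, not any platform's real code.
data class Post(val id: String, val clickProb: Double, val endorseLaterProb: Double)

// Today's common choice: predicted immediate engagement.
fun engagementScore(p: Post): Double = p.clickProb

// A researched alternative: predicted endorsement a week later.
fun delayedValueScore(p: Post): Double = p.endorseLaterProb

fun main() {
    val posts = listOf(
        Post("ragebait", clickProb = 0.9, endorseLaterProb = 0.1),
        Post("tutorial", clickProb = 0.3, endorseLaterProb = 0.8),
    )
    // The same sortedByDescending call; only the loss function differs.
    println(posts.sortedByDescending { engagementScore(it) }.first().id)   // ragebait
    println(posts.sortedByDescending { delayedValueScore(it) }.first().id) // tutorial
}
```

Swapping one function swaps which feed wins; everything downstream is unchanged.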




&lt;p&gt;&lt;em&gt;No zebras were harmed in the making of this post.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>discuss</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>KEncode: Packing Data for Strict Limits</title>
      <dc:creator>Rasmus Ros</dc:creator>
      <pubDate>Thu, 30 Apr 2026 07:21:53 +0000</pubDate>
      <link>https://forem.com/monom/kencode-packing-data-for-strict-limits-3agp</link>
      <guid>https://forem.com/monom/kencode-packing-data-for-strict-limits-3agp</guid>
      <description>&lt;p&gt;Over the past few years, I found myself occasionally writing the same boilerplate: manually packing bits of application state into tight, heavily character-limited strings. It ended up with me creating a library for it called kencode. But first it's story time... and then a little explanation of the underlying tech of why &lt;code&gt;kotlinx.serialization&lt;/code&gt; is so cool and THEN I'll go over kencode.&lt;/p&gt;

&lt;p&gt;It all started with URL callback links on an integrated Search Engine Results Page (SERP). In a previous project at &lt;a href="https://www.theca.com" rel="noopener noreferrer"&gt;Theca&lt;/a&gt;, we had built a search engine embedded directly into a client's website. When users clicked a search result, the link first redirected to our servers so we could register telemetry for the click before finally sending them to the actual target page.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq2gxjpnpkloqkva4w00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq2gxjpnpkloqkva4w00.png" width="680" height="765"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This is standard tracking infrastructure stuff. But if enough state can be encoded directly into the URL, the tracking server can bypass an expensive database lookup entirely. In this particular case, we needed to pass the query ID, the user ID, the document ID, and the exact position in the SERP (the redirection target itself is appended as well, but does not benefit from compression). One database call is not much, but latency &lt;a href="https://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html" rel="noopener noreferrer"&gt;does matter&lt;/a&gt; for initial impressions.&lt;/p&gt;

&lt;p&gt;Having a short URL here is nice: short links look more professional, and there is a limit to how long URLs can be (browser specific). We also want no special characters in the encoded result, including hyphens and underscores, since those would otherwise break double-click word selection. Try to select the entire path by double-clicking in this URL and you'll see: &lt;code&gt;https://example.com/hyphen-path&lt;/code&gt;. But here it works just fine to select dQw...: &lt;code&gt;https://www.youtube.com/watch?v=dQw4w9WgXcQ&lt;/code&gt; since it's a single word.&lt;/p&gt;

&lt;p&gt;Anyway...&lt;/p&gt;

&lt;p&gt;Then the same encoding problem happened again with Kubernetes pod names. I was dynamically spinning up short-lived jobs and wanted to embed trace IDs somehow. Naturally, this metadata should also be stored in Kubernetes labels so it remains queryable with &lt;code&gt;kubectl&lt;/code&gt;. But since you need a unique name for a pod regardless, you might as well use something more informative than the default random suffix, so I put it in the name.&lt;/p&gt;

&lt;p&gt;Besides, relying on labels to pass execution state creates tons of error-prone boilerplate. To read that state back, you typically have to fetch the labels by name and manually parse strings, something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;clientId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toIntOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Missing clientId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;batchId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"batchId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toIntOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Missing batchId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;retryCount&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"retryCount"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toIntOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;isPriority&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"isPriority"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toBooleanStrictOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubernetes also imposes a strict 63-character limit on names and only allows alphanumeric characters and hyphens. Encoding efficiency becomes a limiting factor here.&lt;/p&gt;
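&lt;p&gt;To illustrate why a base62-style encoding fits both the URL and the Kubernetes constraints, here is a minimal sketch of my own (kencode's actual alphabet and wire format may differ): every output character is alphanumeric, so the result is URL-safe, hyphen-free, and legal inside a pod name.&lt;/p&gt;

```kotlin
// Minimal base62 sketch (illustrative; not kencode's actual implementation).
// 62 digits, all alphanumeric: safe in URLs and in Kubernetes names.
const val ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

fun encodeBase62(value: Long): String {
    require(value >= 0) { "only non-negative values in this sketch" }
    if (value == 0L) return "0"
    val sb = StringBuilder()
    var v = value
    while (v > 0) {
        sb.append(ALPHABET[(v % 62).toInt()])
        v /= 62
    }
    // Digits were produced least-significant first, so reverse them.
    return sb.reverse().toString()
}

fun main() {
    println(encodeBase62(99999999L)) // 8 decimal digits become 5 base62 digits
}
```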

&lt;p&gt;Later, I ran into this encoding problem a third time while implementing stateless pagination links for that SERP. We had built a complex hybrid search system merging traditional keyword matching with semantic vector search. Paginating correctly through these blended results meant we had to carry internal ranking state from page to page. This state lived entirely inside a &lt;code&gt;?next=xxx&lt;/code&gt; query parameter, meaning the payload had to be compact, URL-safe, and opaque to the user.&lt;/p&gt;

&lt;p&gt;And now, I find myself needing it a fourth time for my current project &lt;a href="https://eignex.com/about" rel="noopener noreferrer"&gt;Eignex&lt;/a&gt;. It's an engine for doing structured optimization in production, automatically tuning things like model parameters or ranking weights. Think of it like an advanced multi-variate A/B test. It needs to track the chosen values for each optimization problem until a result comes back, at which point the optimization algorithm is updated. By passing that state in a token to the front-end and back, we can avoid storing a massive &lt;code&gt;user ID to settings&lt;/code&gt; dict on the back-end.&lt;/p&gt;

&lt;p&gt;I realize this is not an everyday problem, but I have now encountered it four separate times. I think the ability to pack complex state into a tiny string is a useful architectural trick. Doing it manually each time is error-prone.&lt;/p&gt;

&lt;p&gt;This is where kencode shines. You define a data class and get strong typing directly from the decoded payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Serializable&lt;/span&gt;
&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;JobState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;?,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;isPriority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;state&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JobState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;119&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;encodedState&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EncodedFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// This encodes the object into the string:&lt;/span&gt;
&lt;span class="c1"&gt;// 03W8mJ&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;decodedState&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EncodedFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decodeFromString&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;JobState&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;encodedState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For comparison, the same object in other encodings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Encoding&lt;/th&gt;
&lt;th&gt;Length&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;66 chars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{"clientId":119,"batchId":210,"retryCount":null,"isPriority":true}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protobuf + Base64&lt;/td&gt;
&lt;td&gt;10 chars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CHcQ0gEgAQ&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kencode (Base62)&lt;/td&gt;
&lt;td&gt;6 chars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;03W8mJ&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;kencode is implemented as a custom format on top of the &lt;code&gt;kotlinx.serialization&lt;/code&gt; library, which has quite a different approach to serialization compared to other JVM libraries. Why that is the case requires some context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why kotlinx.serialization?
&lt;/h2&gt;

&lt;p&gt;Before modern libraries like Jackson became the standard, serializing Java objects usually involved writing manual boilerplate. If you need to support multiple formats, like Protobuf in addition to JSON, you will suffer. Manually crafting custom serializers for every single combination of data type and output format (the classic NxM problem) is simply not the way.&lt;/p&gt;

&lt;p&gt;To reduce this boilerplate, runtime reflection libraries like Gson and Jackson became popular. Under the hood, when an object is serialized, these libraries inspect the class at runtime to find its fields, their types, and their values. They map these fields to sequential tokens on the fly. This makes standard JSON-focused libraries easy to use, but not necessarily easy to extend.&lt;/p&gt;
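&lt;p&gt;A stripped-down caricature of that runtime inspection, using plain Java reflection (Gson and Jackson are far more sophisticated than this, so treat it purely as an illustration of the mechanism):&lt;/p&gt;

```kotlin
// Caricature of a reflection-based serializer: inspect the fields at
// runtime and emit name/value pairs one token at a time. Real libraries
// cache this metadata, but the per-field dynamic dispatch remains.
fun reflectToJson(obj: Any): String =
    obj.javaClass.declaredFields.joinToString(prefix = "{", postfix = "}") { field ->
        field.isAccessible = true
        val value = field.get(obj)
        val rendered = if (value is String) "\"" + value + "\"" else value.toString()
        "\"" + field.name + "\":" + rendered
    }

data class User(val id: Int, val name: String)

fun main() {
    println(reflectToJson(User(7, "ada")))
}
```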

&lt;p&gt;The sequential model of serializing makes it difficult to create formats that perform aggregate operations on the entire class. kencode relies on exactly this kind of optimization to compact the payload, like grouping all boolean fields and nullability flags into a single bitmask header.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tagxyeqaa0wn32mwey5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tagxyeqaa0wn32mwey5.png" width="800" height="2248"&gt;&lt;/a&gt;
&lt;/p&gt;
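&lt;p&gt;The bitmask-header idea can be sketched like this. This is my own hypothetical simplification of the trick, not kencode's real wire format: instead of spending output characters on each boolean and each nullability flag, fold them all into one small integer written once at the front of the payload.&lt;/p&gt;

```kotlin
// Sketch of a bitmask header for JobState(clientId, batchId, retryCount: Int?, isPriority):
//   bit 0 = retryCount is present, bit 1 = isPriority is true
fun packHeader(retryCountPresent: Boolean, isPriority: Boolean): Int {
    var header = 0
    if (retryCountPresent) header = header or 0b01
    if (isPriority) header = header or 0b10
    return header
}

fun hasRetryCount(header: Int): Boolean = (header and 0b01) != 0
fun hasPriority(header: Int): Boolean = (header and 0b10) != 0

fun main() {
    // One boolean plus one nullability flag collapse into a single small number.
    val h = packHeader(retryCountPresent = false, isPriority = true)
    println(h)              // 2
    println(hasPriority(h)) // true
}
```

A format that sees the whole class at once can emit this header first; a strictly field-by-field serializer never gets the chance.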

&lt;p&gt;There is also a hard performance ceiling on the reflection approach, and here is some sage advice: &lt;a href="https://vercel.com/blog/how-we-made-global-routing-faster-with-bloom-filters" rel="noopener noreferrer"&gt;never&lt;/a&gt; &lt;a href="https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/" rel="noopener noreferrer"&gt;ignore&lt;/a&gt; &lt;a href="https://www.linkedin.com/blog/engineering/infrastructure/linkedin-integrates-protocol-buffers-with-rest-li-for-improved-m" rel="noopener noreferrer"&gt;the&lt;/a&gt; &lt;a href="https://www.cockroachlabs.com/blog/high-performance-json-parsing/" rel="noopener noreferrer"&gt;cost&lt;/a&gt; &lt;a href="https://www.uber.com/en-AU/blog/go-geofence-highest-query-per-second-service/" rel="noopener noreferrer"&gt;of&lt;/a&gt; &lt;a href="https://blog.openresty.com/en/xray-customer-casestudy-dns/" rel="noopener noreferrer"&gt;serialization&lt;/a&gt;. Reflection libraries do usually cache the reflection steps, but the issue is not the reflection itself. It's that interpreting these cached steps at runtime is inherently slower than executing statically compiled code. When a reflection library loops over the fields of your class, it essentially calls a method like &lt;code&gt;serializer.write(fieldValue)&lt;/code&gt; over and over. Since your fields are all different types, that is a &lt;a href="https://shipilev.net/jvm/anatomy-quarks/16-megamorphic-virtual-calls/" rel="noopener noreferrer"&gt;megamorphic call site&lt;/a&gt; which the compiler can't inline or optimize well.&lt;/p&gt;

&lt;p&gt;This is why kotlinx.serialization takes another approach completely. Instead of relying on reflection at runtime, it generates static serializers at compile time. The approach is similar to Rust's serde framework, allowing for highly optimized serialization without resorting to manual boilerplate.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"This all sounds good but where is the evidence?"&lt;/em&gt; It's probably what I would think at this point. Well, there is actually a &lt;a href="https://itegam-jetia.org/journal/index.php/jetia/article/view/3040" rel="noopener noreferrer"&gt;recent study&lt;/a&gt; comparing kotlinx.serialization to Gson and Jackson &lt;em&gt;(full disclosure: the journal it's published in is a bit dubious, but the actual benchmark methodology looks good)&lt;/em&gt;. They found that the static compiled approach outperforms Gson and Jackson in most cases in both CPU and memory. kotlinx.serialization was especially good with small payloads with many repetitions. For very large payloads, Jackson was slightly faster. These results are also backed up by &lt;a href="https://tech.teaddict.net/kotlin/programming/json/2025/02/15/kotlin-json-performance/" rel="noopener noreferrer"&gt;this benchmark&lt;/a&gt; for CPU only.&lt;/p&gt;

&lt;p&gt;In kotlinx.serialization, when a Kotlin data class is annotated with &lt;code&gt;@Serializable&lt;/code&gt;, a compiler plugin hooks directly into the build process. It inspects the exact "shape" of the data class and synthetically generates a custom &lt;code&gt;KSerializer&lt;/code&gt; implementation for it. Because this happens at compile time, there are no expensive runtime reflection loops or type-guessing. The generated code is strictly typed. This makes the JIT happy, which is why kotlinx.serialization does so well in high-repetition benchmarks.&lt;/p&gt;

&lt;p&gt;The plugin handles what you'd expect from a serialization library:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primitives: Mapped directly to basic, unboxed encoder instructions.&lt;/li&gt;
&lt;li&gt;Generics: The generated serializers simply accept child serializers as constructor arguments, so something like a &lt;code&gt;Response&amp;lt;T&amp;gt;&lt;/code&gt; knows exactly how to serialize its generic payload.&lt;/li&gt;
&lt;li&gt;Polymorphism: Annotating a sealed class automatically generates a serializer that injects a class discriminator (like a &lt;code&gt;"@type": "MyClass"&lt;/code&gt; string) so the decoder knows which specific subclass to instantiate later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The generated serializer for JobState (from above) will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Generated automatically by the @Serializable compiler plugin&lt;/span&gt;
&lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;JobStateSerializer&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;KSerializer&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;JobState&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;SerialDescriptor&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="nf"&gt;buildClassSerialDescriptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"JobState"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"batchId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"retryCount"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"isPriority"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Encoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JobState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;composite&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;beginStructure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeIntElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeIntElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batchId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeNullableSerializableElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serializer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retryCount&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeBooleanElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isPriority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;composite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endStructure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;deserialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Decoder&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;JobState&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// This method is analogous to serialize and a bit longer, due to formats with arbitrary ordering like JSON.&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benefit here is that there are no generic loops and the call sites are strictly typed (monomorphic), which is a massive speed advantage!&lt;/p&gt;

&lt;p&gt;If you're curious about more details of how the code generation works, I really recommend &lt;a href="https://www.revenuecat.com/blog/engineering/kotlinx-serialization/" rel="noopener noreferrer"&gt;this post&lt;/a&gt;. For example, how does the generated code decide which constructor to call (and how) in deserialize?&lt;/p&gt;

&lt;p&gt;Notice how serialize just calls methods on an Encoder. The KSerializer provides the data shape, while the Encoder writes it out. This separation is why it's so convenient to implement custom formats in kotlinx.serialization.&lt;/p&gt;
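&lt;p&gt;To make that separation concrete, here is a toy, library-free sketch of the idea &lt;em&gt;(the real KSerializer and Encoder interfaces are richer; everything below is invented for illustration)&lt;/em&gt;: the serializer describes the shape once, and each encoder decides the syntax.&lt;/p&gt;

```kotlin
// Toy model of the serializer/encoder split (NOT the real kotlinx API):
// the serializer walks the object's shape; the encoder owns the syntax.
interface ToyEncoder {
    fun encodeInt(name: String, value: Int)
    fun encodeBoolean(name: String, value: Boolean)
}

class JsonLikeEncoder : ToyEncoder {
    private val parts = mutableListOf<String>()
    override fun encodeInt(name: String, value: Int) { parts += "\"$name\":$value" }
    override fun encodeBoolean(name: String, value: Boolean) { parts += "\"$name\":$value" }
    fun output() = "{${parts.joinToString(",")}}"
}

class CsvEncoder : ToyEncoder {
    private val parts = mutableListOf<String>()
    override fun encodeInt(name: String, value: Int) { parts += value.toString() }
    override fun encodeBoolean(name: String, value: Boolean) { parts += value.toString() }
    fun output() = parts.joinToString(",")
}

data class Job(val clientId: Int, val isPriority: Boolean)

// One shape description, reusable with any encoder.
fun serializeJob(job: Job, encoder: ToyEncoder) {
    encoder.encodeInt("clientId", job.clientId)
    encoder.encodeBoolean("isPriority", job.isPriority)
}
```

&lt;p&gt;Serializing &lt;code&gt;Job(7, true)&lt;/code&gt; through &lt;code&gt;JsonLikeEncoder&lt;/code&gt; yields &lt;code&gt;{"clientId":7,"isPriority":true}&lt;/code&gt;, while the same &lt;code&gt;serializeJob&lt;/code&gt; call through &lt;code&gt;CsvEncoder&lt;/code&gt; yields &lt;code&gt;7,true&lt;/code&gt;. A new format costs one new encoder, not one new serializer per class.&lt;/p&gt;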

&lt;p&gt;So to wrap up so far, kotlinx.serialization has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Format (StringFormat or BinaryFormat): The entrypoint of the library, like &lt;code&gt;Json.encodeToString()&lt;/code&gt; or &lt;code&gt;ProtoBuf.encodeToByteArray()&lt;/code&gt;. This is also where you configure and create the underlying encoder/decoders.&lt;/li&gt;
&lt;li&gt;Encoder and Decoder: The actual format implementation. They map the shape from the serializer into the logical structure of the output format.&lt;/li&gt;
&lt;li&gt;Serializer: Generated at compile time for classes annotated with @Serializable or manually constructed.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Let's dive into kencode.&lt;/p&gt;

&lt;p&gt;I ended up splitting it into three separate pieces: a compact binary format, a general byte-to-text encoder, and a small composition layer that turns the whole thing into a normal string format. The binary format and text encoders can be used separately.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. PackedFormat
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;PackedFormat&lt;/code&gt; is the biggest part of the library. It contains the logic to serialize Kotlin objects into small byte arrays.&lt;/p&gt;

&lt;p&gt;The format assumes both sides already agree on the schema. This is quite a strong assumption, and definitely not what you want for persistence or cross-language communication. But when the assumption holds, we can save a lot of space by not encoding structural information that both sides already know.&lt;/p&gt;

&lt;p&gt;Its other core optimizations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bitmask headers: boolean fields and nullability markers are packed into a compact bitset header, costing 1 bit per field instead of the usual 1 byte.&lt;/li&gt;
&lt;li&gt;Merged nested headers: bitmask bits from nested class fields are collected into a single root-level header, eliminating the per-class byte-alignment padding that would otherwise be wasted at each nesting boundary.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Variable-length integers: Standard integer fields waste space because they always consume 4 or 8 bytes, even for small numbers. We shrink them using varint (LEB128) and ZigZag encodings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Varint works by using the most significant bit (MSB) of each byte as a "continuation flag." If the bit is &lt;code&gt;1&lt;/code&gt;, more bytes follow; if &lt;code&gt;0&lt;/code&gt;, it's the final byte. This allows small positive numbers to squeeze into a single byte.&lt;/li&gt;
&lt;li&gt;ZigZag maps small negative numbers to small positive numbers (0 → 0, -1 → 1, 1 → 2, -2 → 3, etc.), keeping the encoded size small.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Collection bitmaps: boolean lists and nullable element lists pack their flags into a leading bitmap rather than storing one byte per element.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
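&lt;p&gt;The two integer tricks fit in a few lines. This is a minimal sketch with hypothetical helper names (&lt;code&gt;zigZag&lt;/code&gt;, &lt;code&gt;encodeVarint&lt;/code&gt;); kencode's actual internals may differ:&lt;/p&gt;

```kotlin
// ZigZag: interleave negatives with positives so small magnitudes stay small.
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fun zigZag(v: Int): Int = (v shl 1) xor (v shr 31)

// LEB128 varint: 7 payload bits per byte, MSB = "more bytes follow".
fun encodeVarint(value: Int): ByteArray {
    var v = value
    val out = mutableListOf<Byte>()
    do {
        var b = v and 0x7F
        v = v ushr 7
        if (v != 0) b = b or 0x80  // set the continuation flag
        out.add(b.toByte())
    } while (v != 0)
    return out.toByteArray()
}
```

&lt;p&gt;With this, &lt;code&gt;encodeVarint(300)&lt;/code&gt; produces the two bytes &lt;code&gt;0xAC 0x02&lt;/code&gt; instead of a fixed four, and &lt;code&gt;encodeVarint(zigZag(-1))&lt;/code&gt; is a single byte, since ZigZag maps -1 to 1.&lt;/p&gt;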

&lt;p&gt;Together these optimizations explain how the &lt;code&gt;JobState&lt;/code&gt; example was compacted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpqo9r7x260x73k868ud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpqo9r7x260x73k868ud.png" alt="Payload example" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The header for a flat class is straightforward: one bit per boolean field, one bit per nullable field (0 = null, 1 = present), packed into a bitset with the smallest number of bytes. For &lt;code&gt;JobState&lt;/code&gt;, that is two bits total, which is just a single byte. The field data follows immediately after.&lt;/p&gt;
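&lt;p&gt;The bit packing itself is simple. Here is a hypothetical sketch &lt;em&gt;(the flag ordering and helper name are my assumptions, not kencode's actual code)&lt;/em&gt;:&lt;/p&gt;

```kotlin
// Pack one flag per bit, LSB-first, into the fewest bytes possible.
// For JobState that is two flags (isPriority, retryCount != null) -> one byte.
fun packHeader(flags: List<Boolean>): ByteArray {
    val header = ByteArray((flags.size + 7) / 8)
    flags.forEachIndexed { i, set ->
        if (set) header[i / 8] = (header[i / 8].toInt() or (1 shl (i % 8))).toByte()
    }
    return header
}
```

&lt;p&gt;&lt;code&gt;packHeader(listOf(true, false))&lt;/code&gt; produces the single byte &lt;code&gt;0x01&lt;/code&gt;; it takes nine flags before a second header byte is needed.&lt;/p&gt;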

&lt;p&gt;Nesting complicates this. If &lt;code&gt;JobState&lt;/code&gt; had a nested class, the naïve approach would write a separate header for each class in the tree. But class boundaries force byte alignment, wasting space. Instead, all bits are merged into a single shared header at the root.&lt;/p&gt;

&lt;p&gt;Collections work differently due to their dynamic size. The element count comes first as a varint, followed by each packed value.&lt;/p&gt;

&lt;p&gt;PackedFormat is the layer that actually reduces the payload. Everything after this is about transport.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Text Layer: ASCII-Safe Codecs
&lt;/h3&gt;

&lt;p&gt;Transporting binary data as text is a common operation, usually handled by Base64-encoding it. kencode supports multiple encodings, including Base62, Base64, and Base85.&lt;/p&gt;

&lt;p&gt;Base62 is the default because it stays alphanumeric while still being dense.&lt;/p&gt;

&lt;p&gt;The encoder processes the input in fixed-size chunks to avoid the &lt;code&gt;O(n^2)&lt;/code&gt; cost of whole-payload BigInteger arithmetic, keeping encoding effectively &lt;code&gt;O(n)&lt;/code&gt;.&lt;/p&gt;
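&lt;p&gt;For intuition, here is the naive whole-payload version that chunking replaces &lt;em&gt;(the alphabet order is my assumption, and this sketch drops leading zero bytes, which a real codec must preserve)&lt;/em&gt;:&lt;/p&gt;

```kotlin
import java.math.BigInteger

private const val ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// Treat the whole payload as one unsigned big-endian integer and emit
// base-62 digits. Every divideAndRemainder touches all remaining digits,
// so the total cost is O(n^2); chunking the input bounds each BigInteger
// and brings this back to effectively O(n).
fun base62Encode(bytes: ByteArray): String {
    var n = BigInteger(1, bytes)
    if (n == BigInteger.ZERO) return "0"
    val base = BigInteger.valueOf(62)
    val sb = StringBuilder()
    while (n > BigInteger.ZERO) {
        val (q, r) = n.divideAndRemainder(base)
        sb.append(ALPHABET[r.toInt()])
        n = q
    }
    return sb.reverse().toString()
}
```

&lt;p&gt;For example, &lt;code&gt;base62Encode(byteArrayOf(72, 105))&lt;/code&gt; (the bytes of "Hi") yields &lt;code&gt;4OZ&lt;/code&gt;.&lt;/p&gt;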

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Codec&lt;/th&gt;
&lt;th&gt;chars / byte&lt;/th&gt;
&lt;th&gt;Alphabet&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base36&lt;/td&gt;
&lt;td&gt;1.55&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0-9 a-z&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base62&lt;/td&gt;
&lt;td&gt;1.34&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0-9 a-z A-Z&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base64&lt;/td&gt;
&lt;td&gt;1.33&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0-9 a-z A-Z&lt;/code&gt; + 2 symbols&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base85&lt;/td&gt;
&lt;td&gt;1.25&lt;/td&gt;
&lt;td&gt;85 printable ASCII&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. EncodedFormat
&lt;/h3&gt;

&lt;p&gt;Finally there is &lt;code&gt;EncodedFormat&lt;/code&gt;, which combines everything into a &lt;code&gt;StringFormat&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;format&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EncodedFormat&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;binaryFormat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PackedFormat&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;defaultEncoding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;IntPacking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SIGNED&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;transform&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encryptingTransform&lt;/span&gt;
    &lt;span class="n"&gt;codec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Base62&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;token&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows layering encryption, checksums, or compression.&lt;/p&gt;




&lt;p&gt;Anyway, that's kencode. Source is at &lt;a href="https://github.com/eignex/kencode" rel="noopener noreferrer"&gt;https://github.com/Eignex/kencode&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kotlin</category>
      <category>webdev</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>Building Eignex in the Open</title>
      <dc:creator>Rasmus Ros</dc:creator>
      <pubDate>Mon, 27 Apr 2026 11:21:57 +0000</pubDate>
      <link>https://forem.com/monom/building-eignex-in-the-open-5c4i</link>
      <guid>https://forem.com/monom/building-eignex-in-the-open-5c4i</guid>
      <description>&lt;p&gt;I've always been fascinated by applying optimization to solve real-world problems.&lt;/p&gt;

&lt;p&gt;It is often an inherently multidisciplinary activity, and there is something deeply satisfying about taking distinct, often siloed ideas and jamming them together to create something that is fundamentally better than the sum of its parts. In my PhD thesis it was search-based optimization, multi-armed bandit algorithms, combinatorial optimization, probabilistic machine learning, and of course, software engineering.&lt;/p&gt;

&lt;p&gt;I wrapped up my PhD thesis back in 2022. I loved the work itself, digging deep into continuous optimization and A/B testing, but I realized pretty quickly that I didn't want to stay in academia.&lt;/p&gt;

&lt;p&gt;The environment felt incredibly results-driven, but often in the wrong way. It felt like, to be successful, you had to play the academic game of marketing your work rather than take on the pure engineering challenge of solving a hard problem and making it robust.&lt;/p&gt;

&lt;p&gt;I wasn't ready to stop working on optimization just because I left the university, though. I actually find this stuff fun. I wanted to keep building, but I wanted to build tools that actually &lt;em&gt;work&lt;/em&gt; in the real world, not just in a paper.&lt;/p&gt;

&lt;p&gt;That's basically why I started the Eignex project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Open Source?
&lt;/h3&gt;

&lt;p&gt;To me, open sourcing the work felt like a no-brainer. It wasn't a strategic decision I had to think twice about.&lt;/p&gt;

&lt;p&gt;First, I enjoy writing high-performance code, and it's simply more fun when other people can use it. But more importantly, there is a trust factor.&lt;/p&gt;

&lt;p&gt;If you are building infrastructure that is going to automatically tweak parameters on a live production system, you shouldn't be doing it inside a black box. If a piece of software is going to turn knobs on my server, I want to see the code. I want to know exactly how it makes decisions and how safety constraints are enforced.&lt;/p&gt;

&lt;p&gt;That's why all the building blocks of the core engines are public. You can audit the math yourself and contribute if you want.&lt;/p&gt;

&lt;h3&gt;
  
  
  The End Goal
&lt;/h3&gt;

&lt;p&gt;Let's be real: making money on open source is notoriously difficult. I'm not under any illusions about that, and I'm not trying to build something to make a living from.&lt;/p&gt;

&lt;p&gt;The plan, though, is to eventually build a managed SaaS.&lt;/p&gt;

&lt;p&gt;It doesn't exist yet. Right now, I'm just focusing on building the core engine from the bottom up, one library at a time. But the long-term goal is to build a platform that handles the messy parts of running these optimization loops in production. Things like dashboards, persistent state management, and k8s setup.&lt;/p&gt;

&lt;p&gt;If I can eventually get that managed service to a point where it covers the server bills, I'll call that a win.&lt;/p&gt;

&lt;p&gt;For now, I'm just building.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>buildinpublic</category>
      <category>academia</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
