<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: yep</title>
    <description>The latest articles on Forem by yep (@yepchaos).</description>
    <link>https://forem.com/yepchaos</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2089712%2Ffa6eed7d-19b8-48b9-8c23-dd66c11a895e.jpg</url>
      <title>Forem: yep</title>
      <link>https://forem.com/yepchaos</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yepchaos"/>
    <language>en</language>
    <item>
      <title>Front-End &amp; Struggles</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Fri, 10 Apr 2026 13:43:31 +0000</pubDate>
      <link>https://forem.com/yepchaos/front-end-struggles-14f9</link>
      <guid>https://forem.com/yepchaos/front-end-struggles-14f9</guid>
      <description>&lt;p&gt;I didn’t have much frontend experience. This post covers the struggles I ran into.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Point: React + TypeScript + Plain CSS
&lt;/h2&gt;

&lt;p&gt;React with TypeScript felt like the obvious choice — popular, good ecosystem, type safety. I started writing plain CSS modules for styling. Full control, right?&lt;/p&gt;

&lt;p&gt;The problem wasn't the code. It was that I had no design vision. I'd write a component, look at it, and know it looked bad but not know how to fix it. What colors? How much padding? How should this align? I couldn't answer these questions. The feedback loop was: write code → look bad → feel stuck → repeat. This went on for weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tailwind CSS Didn't Solve the Real Problem
&lt;/h2&gt;

&lt;p&gt;I switched to Tailwind CSS thinking it would help. It sped things up — utility classes are fast to write, no context switching between files. But Tailwind is a tool for people who already know what they want to build. It doesn't give you design vision, it just makes it faster to execute one. I still didn't know what I wanted.&lt;/p&gt;

&lt;p&gt;I tried Figma. I couldn't make anything that looked good there either. The problem wasn't the tools.&lt;/p&gt;

&lt;p&gt;I restarted the project multiple times during this period — changing UI approach, structure, and direction. It felt like progress but mostly wasn't. This lasted around 3-4 weeks. Eventually I accepted that I wasn't going to figure out design from scratch and looked for a component library.&lt;/p&gt;

&lt;h2&gt;
  
  
  shadcn/ui
&lt;/h2&gt;

&lt;p&gt;I looked at Material UI and Ant Design. Both felt heavy and opinionated in ways that would fight me later. I wanted something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looked good out of the box&lt;/li&gt;
&lt;li&gt;Integrated with Tailwind&lt;/li&gt;
&lt;li&gt;I could actually own the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;shadcn/ui fit all of this. The key difference is that components are generated into your codebase rather than installed as a dependency. This gives full control over behavior and styling, without being constrained by a library’s abstraction. That turned out to matter a lot later when I needed to adapt components for React Native.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next.js: Tried It, Left It
&lt;/h2&gt;

&lt;p&gt;Around this time I migrated to Next.js. SSR, SSG, the whole thing. After a while I realized most of my components were client-side anyway. ASTRING is a chat app — it's dynamic content fetched after load, not static pages that benefit from SSR. I moved back to React with Vite. Faster dev server, simpler setup, no framework fighting back.&lt;/p&gt;

&lt;h2&gt;
  
  
  State Management: Jotai
&lt;/h2&gt;

&lt;p&gt;Chat state is genuinely complex — active rooms, message lists, unread counts, real-time updates coming in from WebSocket, user presence. Redux felt like too much ceremony for this. Context API caused re-render problems as state got more interconnected.&lt;/p&gt;

&lt;p&gt;Jotai worked well. Atoms are simple to create, updates are granular, and the mental model maps cleanly to "this piece of state, these components that care about it." Chat state in particular became much cleaner — each room's state is an atom, components subscribe only to what they need.&lt;/p&gt;

&lt;h2&gt;
  
  
  React Native: Why I Left Ionic
&lt;/h2&gt;

&lt;p&gt;I built the mobile version with Ionic React first. Code reuse from the web was easy. But Ionic started showing limits for a chat app specifically — animations felt off, native components were lacking, the "native feel" wasn't there. Chat apps have specific UX expectations: smooth scrolling through message history, keyboard handling, swipe gestures, native-feeling transitions. Ionic's web-based approach couldn't deliver these well enough.&lt;/p&gt;

&lt;p&gt;I moved to React Native with Expo. Expo makes the setup significantly easier — no manual native build configuration, good tooling, OTA updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Setup: Monorepo with pnpm Workspaces
&lt;/h2&gt;

&lt;p&gt;Moving to React Native meant I had two apps — web (React + Vite) and mobile (React Native + Expo). Rather than duplicate code, I set up a monorepo with pnpm workspaces. Nothing fancy.&lt;/p&gt;

&lt;p&gt;Shared packages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt; — business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API clients&lt;/strong&gt; — all backend communication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jotai atoms&lt;/strong&gt; — shared state: rooms, chats, user cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web and mobile each have their own UI layer, but everything underneath is shared. State is defined once — a room list atom, a message cache atom, a user presence atom — and both platforms consume the same atoms. This means a bug fix or API change happens once, and state behavior is consistent across platforms.&lt;/p&gt;

&lt;p&gt;For styling on React Native I use NativeWind — Tailwind for React Native. Same utility classes I use on web, works on mobile. Makes the styling consistent and fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;shadcn/ui on React Native&lt;/strong&gt; — since shadcn components live in my codebase, I adapted them for React Native manually. &lt;code&gt;div&lt;/code&gt; → &lt;code&gt;View&lt;/code&gt;, &lt;code&gt;button&lt;/code&gt; → &lt;code&gt;Pressable&lt;/code&gt;, CSS → NativeWind classes. It's tedious but straightforward, and the result is consistent components across both platforms without two completely separate design systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Stands
&lt;/h2&gt;

&lt;p&gt;The code is messy in places. The UI is functional but not something I'm proud of visually. The monorepo structure is working well. Mobile is better than Ionic was for the specific things chat needs.&lt;/p&gt;

&lt;p&gt;The main thing I learned: frontend is a different skill set. I kept thinking better tools would solve a design problem. They didn't. What helped was accepting I needed pre-built components, adapting them rather than fighting them, and not over-engineering the state management.&lt;/p&gt;

</description>
      <category>react</category>
      <category>reactnative</category>
      <category>ionic</category>
      <category>frontend</category>
    </item>
    <item>
      <title>From Fly.io to On-Premise Kubernetes</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:39:56 +0000</pubDate>
      <link>https://forem.com/yepchaos/from-flyio-to-on-premise-kubernetes-4bj9</link>
      <guid>https://forem.com/yepchaos/from-flyio-to-on-premise-kubernetes-4bj9</guid>
      <description>&lt;p&gt;Everything works in localhost. Exposing it to the internet is a different problem. I went through Fly.io, Linode managed Kubernetes, and eventually landed on an on-premise cluster. Each step had tradeoffs in both cost and operational complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Containers and Kubernetes
&lt;/h2&gt;

&lt;p&gt;Before getting into the details, here is the short explanation.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;container&lt;/strong&gt; is a lightweight, isolated unit that packages an application with its runtime and dependencies. Unlike virtual machines, containers share the host OS kernel, which makes them efficient in terms of startup time and resource usage. Docker is the standard tooling: define a &lt;code&gt;Dockerfile&lt;/code&gt;, build an image, and run it across environments with minimal variation.&lt;/p&gt;
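
&lt;p&gt;A minimal &lt;code&gt;Dockerfile&lt;/code&gt; sketch for a Node.js service (paths, port, and entrypoint are placeholders, not my actual build):&lt;/p&gt;

```dockerfile
# Hypothetical sketch: package a Node.js backend into an image
FROM node:20-alpine
WORKDIR /app
# Install dependencies first so this layer is cached between builds
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Copy the application source
COPY . .
EXPOSE 8080
CMD ["node", "dist/server.js"]
```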

&lt;p&gt;The problem: once you have multiple containers across multiple machines, managing them manually is chaos. Which machine runs what? What happens when a container crashes? How do you roll out updates without downtime?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; solves this. It's an orchestration platform — you describe what you want (3 replicas of this service, always keep them running, expose them on this port) and Kubernetes figures out how to make it happen. The key building blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod&lt;/strong&gt; — the smallest unit, one or more containers running together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment&lt;/strong&gt; — describes how many pods to run and how to update them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service&lt;/strong&gt; — a stable network endpoint that routes traffic to the right pods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingress&lt;/strong&gt; — routes external HTTP traffic into the cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The big win: if a pod crashes, Kubernetes restarts it. If a node goes down, it reschedules pods elsewhere. You stop thinking about individual machines and start thinking about desired state.&lt;/p&gt;
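
&lt;p&gt;The desired-state idea in manifest form (a sketch with placeholder names and image, not my actual config):&lt;/p&gt;

```yaml
# "Run 3 replicas of this service, keep them running, expose them on this port"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chat-backend
spec:
  replicas: 3                      # always keep 3 pods running
  selector:
    matchLabels: { app: chat-backend }
  template:
    metadata:
      labels: { app: chat-backend }
    spec:
      containers:
        - name: chat-backend
          image: registry.example.com/chat-backend:v1
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service                      # stable endpoint routing to matching pods
metadata:
  name: chat-backend
spec:
  selector: { app: chat-backend }
  ports:
    - port: 80
      targetPort: 8080
```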

&lt;h2&gt;
  
  
  Phase 1: Fly.io + Vercel
&lt;/h2&gt;

&lt;p&gt;For the backend I started with &lt;a href="http://Fly.io" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;. Easy to deploy, cheap during development — I stayed under their $5 threshold. For the frontend, Vercel. Push to GitLab, it deploys automatically. Vercel still handles the frontend today, no complaints. Fly.io managed the containers themselves just fine.&lt;/p&gt;

&lt;p&gt;Fly.io was fine for stateless services. The problem was stateful ones — ScyllaDB and NATS.&lt;/p&gt;

&lt;p&gt;Running stateful services on Kubernetes properly requires &lt;strong&gt;operators&lt;/strong&gt; — controllers that understand the specific lifecycle of a piece of software. ScyllaDB has its own operator that handles cluster bootstrapping, repairs, scaling, backup, topology changes. NATS has one too. On platforms like this, running Kubernetes operators isn’t possible because you don’t have access to a Kubernetes control plane, so lifecycle management for stateful systems has to be done by hand. I spent more time managing the platform than building the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Linode Managed Kubernetes
&lt;/h2&gt;

&lt;p&gt;I needed real Kubernetes. Linode offered $100 credit on signup, which was enough to experiment properly.&lt;/p&gt;

&lt;p&gt;The setup: 3 worker nodes (1 CPU, 2GB RAM, 50GB storage each) plus a load balancer. Linode's managed Kubernetes is free for the control plane — we only pay for nodes and networking.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 nodes × $12/month = $36&lt;/li&gt;
&lt;li&gt;Load balancer = $10/month&lt;/li&gt;
&lt;li&gt;Total: $46/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used Terraform with Linode's provider to provision the cluster — infrastructure as code, version controlled, easy to redeploy. Once the cluster was up, I could run operators properly. ScyllaDB and NATS behaved the way they were supposed to.&lt;/p&gt;

&lt;p&gt;When the $100 credit ran out, $46/month was hard to justify for a project still in testing. So I started thinking about on-premise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: On-Premise Kubernetes
&lt;/h2&gt;

&lt;p&gt;The same teacher from my IOI days gave me access to three VMs on his company's infrastructure, free of charge. Each machine had 8 cores, 8GB RAM, and 50GB storage.&lt;/p&gt;

&lt;p&gt;I set up my own Kubernetes cluster on these using &lt;strong&gt;k3s&lt;/strong&gt; — a lightweight Kubernetes distribution that works well for on-premise and resource-constrained environments. I'll write a dedicated post on the k3s setup, but the short version: it's full Kubernetes without the overhead, and it runs fine on these VMs.&lt;/p&gt;

&lt;p&gt;Full control over the environment. I can run any operator, configure networking however I need, no platform restrictions. The tradeoff is that there's no managed control plane — if something breaks at the infrastructure level, I fix it myself. That's acceptable: the cluster is free, and the only cost is some of my time.&lt;/p&gt;

&lt;p&gt;I deployed everything: ScyllaDB cluster, NATS, PostgreSQL, Redis, the backend services. All running on three VMs, costing nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Things Stand
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Vercel, still works great&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend + all services&lt;/strong&gt;: On-premise k3s cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On-premise infrastructure requires more operational effort but provides full control and effectively zero cost when hardware is available. Next I'll write about the actual k3s setup — how I configured the cluster, networking, storage, and got everything running.&lt;/p&gt;

</description>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Object Storage &amp; CDN Journey</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:18:38 +0000</pubDate>
      <link>https://forem.com/yepchaos/object-storage-cdn-journey-27ke</link>
      <guid>https://forem.com/yepchaos/object-storage-cdn-journey-27ke</guid>
      <description>&lt;p&gt;A chat application needs reliable object storage — media uploads, backups, logs. Sounds simple, but there’s lot of choices. I went through six different solutions before landing on something that actually made sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The S3 API
&lt;/h2&gt;

&lt;p&gt;Before getting into the journey, one thing worth explaining: almost every object storage provider today implements the &lt;strong&gt;S3 API&lt;/strong&gt; — the interface originally built by AWS for their Simple Storage Service.&lt;/p&gt;

&lt;p&gt;It's a RESTful interface: buckets as containers, objects accessed by unique keys, HTTP methods for everything. The key thing is it's a standard. Providers like Wasabi, MinIO, Backblaze, Cloudflare R2 — they all speak S3. That means I can swap providers without rewriting application logic, just change the endpoint and credentials. That portability matters a lot when you're still figuring out the right fit.&lt;/p&gt;
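
&lt;p&gt;In practice the swap is one config object. A sketch (the helper is hypothetical; the endpoint formats are the providers' documented ones):&lt;/p&gt;

```typescript
// Swapping S3-compatible providers means changing the endpoint and credentials,
// not the application logic. The helper below is an illustrative sketch.
type Provider = "aws" | "wasabi" | "r2";

function endpointFor(provider: Provider, opts: { region?: string; accountId?: string } = {}): string {
  if (provider === "aws") return "https://s3." + (opts.region ?? "us-east-1") + ".amazonaws.com";
  if (provider === "wasabi") return "https://s3." + (opts.region ?? "us-east-1") + ".wasabisys.com";
  // Cloudflare R2 endpoints are per-account
  return "https://" + (opts.accountId ?? "") + ".r2.cloudflarestorage.com";
}

// The S3 client config stays identical apart from these values:
const clientConfig = {
  endpoint: endpointFor("r2", { accountId: "abc123" }),
  region: "auto",        // R2 uses "auto"; other providers use a real region
  forcePathStyle: true,  // safest default for S3-compatible providers
};

console.log(clientConfig.endpoint); // "https://abc123.r2.cloudflarestorage.com"
```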

&lt;h2&gt;
  
  
  The Provider Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS S3
&lt;/h3&gt;

&lt;p&gt;The obvious starting point. Reliable, feature-rich, integrates with everything. I used it early on and it worked fine — but the pricing is higher than the alternatives. I stopped using it before things got expensive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backblaze B2
&lt;/h3&gt;

&lt;p&gt;Backblaze B2 has egress-free pricing, which sounds great. The problem: it only has American data centers. My servers and users aren't in America, so the latency was noticeable and unacceptable for a real-time chat app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tigris (via Fly.io)
&lt;/h3&gt;

&lt;p&gt;Tigris (&lt;a href="http://fly.io/" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;) provides globally distributed, S3-compatible storage with low latency, addressing the B2 latency limitations. However, its pricing model includes per-request charges in addition to storage. For an API-heavy workload like a chat system, this would scale poorly, so I decided not to go with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  MinIO
&lt;/h3&gt;

&lt;p&gt;I actually deployed MinIO in my cluster. It's open-source, S3-compatible, and simple to run. But running it yourself means managing infrastructure, handling high availability, paying for the compute. For a small project it's overkill — I was spending more time on storage ops than on the actual product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wasabi
&lt;/h3&gt;

&lt;p&gt;Wasabi has egress-free pricing and good performance. I settled here for a while. But there's a catch: &lt;strong&gt;Wasabi doesn't support public bucket permissions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For private files, that's fine — I generate pre-signed URLs from the backend, the user gets a temporary link, no credentials exposed. But for public files like profile pictures, I had to build a backend service to forward them to users. Extra latency, extra backend load, not ideal.&lt;/p&gt;
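
&lt;p&gt;For intuition, here is how pre-signing works conceptually. This is a simplified HMAC sketch, not AWS Signature V4 (which real S3 pre-signed URLs use); names and the URL shape are hypothetical:&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = "server-side-secret"; // never shipped to clients

function sign(key: string, expiresAt: number): string {
  return createHmac("sha256", SECRET).update(key + ":" + expiresAt).digest("hex");
}

// The backend hands out a time-limited link without exposing credentials.
function presign(key: string, ttlSeconds: number, now: number = Date.now()): string {
  const expiresAt = now + ttlSeconds * 1000;
  return "/files/" + key + "?token=" + expiresAt + "." + sign(key, expiresAt);
}

// The storage side (or a verifying proxy) checks signature and expiry.
function verify(key: string, token: string, now: number = Date.now()): boolean {
  const [expStr, sig] = token.split(".");
  const expiresAt = Number(expStr);
  if (Number.isNaN(expiresAt)) return false;
  if (now > expiresAt) return false; // link expired
  const expected = sign(key, expiresAt);
  if (sig.length !== expected.length) return false;
  return timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}
```

&lt;p&gt;Tampering with the key or the expiry invalidates the signature, which is why the temporary link is safe to hand to a browser.&lt;/p&gt;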

&lt;p&gt;I made it work, but then realized a bigger problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wasabi Pricing Problem
&lt;/h2&gt;

&lt;p&gt;Wasabi charges for a minimum of &lt;strong&gt;1TB&lt;/strong&gt; regardless of how much you actually store. My total data — user uploads, database backups, cluster backups — was under 10GB. I was paying &lt;strong&gt;$8/month&lt;/strong&gt; to store 10GB. That's bad.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing the Public File Problem First
&lt;/h2&gt;

&lt;p&gt;Before I figured out the pricing issue, I spent time solving the public file latency problem with Cloudflare caching. Worth documenting because it works well if you're stuck on Wasabi or a similar provider.&lt;/p&gt;

&lt;p&gt;The setup: every public file request goes through my backend at &lt;code&gt;/api/v1/media/file/*&lt;/code&gt;. I set Cloudflare cache rules on that path — mark responses eligible for cache, force an edge TTL of 1 year, bypass backend &lt;code&gt;Cache-Control&lt;/code&gt; headers. Once a file is cached at Cloudflare's edge, it never hits my backend or Wasabi again.&lt;/p&gt;

&lt;p&gt;Here's a real cached response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4xwpufxzwfm1fpcqr7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4xwpufxzwfm1fpcqr7n.png" alt=" " width="800" height="1112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CF-Cache-Status: HIT&lt;/code&gt; — served from Cloudflare's edge, not my backend&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Age: 774&lt;/code&gt; — seconds it's been cached at the edge&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Cache-Control: max-age=31536000&lt;/code&gt; — browser caches it for 1 year too&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zero extra cluster resources, no Wasabi bandwidth on repeat requests, low latency globally. If you're using Wasabi and hitting this problem, this approach works.&lt;/p&gt;

&lt;p&gt;Because of the fixed 1TB minimum fee, I decided to move anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Setup: Cloudflare R2
&lt;/h2&gt;

&lt;p&gt;Cloudflare R2 has a free tier of &lt;strong&gt;10GB&lt;/strong&gt;. My entire dataset fits in that. No egress fees, native CDN built in — so no need for the Cloudflare caching workaround above (though good to know it works). I moved everything to R2 and now pay nothing for storage.&lt;/p&gt;

&lt;p&gt;For backups, I'm keeping Backblaze B2 in mind for when data grows — egress-free and cheap for large volumes, as long as the latency to my users is acceptable for backup use cases (it is).&lt;/p&gt;

&lt;p&gt;Current state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare R2&lt;/strong&gt; — user uploads, all active data, everything under 10GB (free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backblaze B2&lt;/strong&gt; — future home for backups once R2 free tier isn't enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The egress-free advantage of Wasabi turned out to be irrelevant at my scale. Under 1TB, you're just paying the minimum anyway. R2's free tier made the decision easy.&lt;/p&gt;

</description>
      <category>objectstorage</category>
      <category>cdn</category>
    </item>
    <item>
      <title>Finding the Right Database</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Wed, 08 Apr 2026 14:04:20 +0000</pubDate>
      <link>https://forem.com/yepchaos/finding-rigth-database-15c1</link>
      <guid>https://forem.com/yepchaos/finding-rigth-database-15c1</guid>
      <description>&lt;p&gt;Most applications need to persist state. In a chat application, that state is massive, constantly growing, and high-frequency. The obvious starting point is a traditional RDBMS — but the specific access patterns of a real-time chat system eventually force a rethink.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with RDBMS for Chat
&lt;/h2&gt;

&lt;p&gt;I could use PostgreSQL for storing messages. It works, until it doesn't.&lt;/p&gt;

&lt;p&gt;Chat is different from most relational data. Messages don't join to other tables. What I actually need is simple: insert a message, fetch messages by room or user. That's it. So the requirements are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It grows fast — millions, then billions of rows&lt;/li&gt;
&lt;li&gt;No joins needed — just "give me all messages for room X"&lt;/li&gt;
&lt;li&gt;Reads and writes need to be fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional databases like PostgreSQL and MySQL weren't designed with this access pattern as the primary use case. Here's why that matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partitioning
&lt;/h3&gt;

&lt;p&gt;As the message table grows, we can partition it — split it into smaller physical chunks based on some key, like room ID or time range. The database then scans only the relevant partition instead of the whole table. Postgres supports this natively, but unlike in a distributed system, the partitions still live on a single machine, so partitioning organizes the data; it doesn't distribute the load.&lt;/p&gt;
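
&lt;p&gt;A sketch of what that looks like in Postgres (table and column names are hypothetical):&lt;/p&gt;

```sql
-- Postgres native partitioning: the partitions organize data on one machine,
-- they don't distribute the write load.
CREATE TABLE messages (
    room_id  uuid        NOT NULL,
    sent_at  timestamptz NOT NULL,
    body     text        NOT NULL
) PARTITION BY HASH (room_id);

CREATE TABLE messages_p0 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE messages_p1 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- p2, p3 likewise; a query filtered by room_id scans only one partition.
```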

&lt;h3&gt;
  
  
  The Write Scaling Problem
&lt;/h3&gt;

&lt;p&gt;The bigger issue is writes. PostgreSQL and MySQL use a single-master model — one node handles all writes, replicas handle reads. Every message sent goes through that one master. At high write volume, that becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;The common solution is sharding: split data across multiple independent database instances, each owning a slice. Hash the room ID to decide which shard it lives on. In theory, clean. In practice, painful — managing shard keys, handling rebalancing when nodes are added, cross-shard queries becoming a nightmare. I decided early on to avoid this entirely by choosing a database built for it natively.&lt;/p&gt;
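
&lt;p&gt;A toy example of why manual sharding bites: with hash-modulo placement, adding a single shard remaps most keys, and that remapping is the rebalancing pain. (Hash function and room IDs here are illustrative.)&lt;/p&gt;

```typescript
// FNV-1a: a simple, deterministic string hash for the demonstration.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (const ch of s) {
    h ^= ch.codePointAt(0)!;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const shardFor = (roomId: string, shards: number) => fnv1a(roomId) % shards;

// Count how many of 1000 rooms land on a different shard after going 4 -> 5.
const rooms = Array.from({ length: 1000 }, (_, i) => "room-" + i);
let moved = 0;
for (const r of rooms) {
  if (shardFor(r, 4) !== shardFor(r, 5)) moved++;
}
console.log(moved + " of 1000 rooms changed shards"); // the large majority move
```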

&lt;h2&gt;
  
  
  Cassandra and ScyllaDB
&lt;/h2&gt;

&lt;p&gt;This is where wide-column stores come in: Cassandra, and ScyllaDB, its reimplementation in C++. Same architecture, but rewritten for better performance and lower latency.&lt;/p&gt;

&lt;p&gt;The core idea: instead of one master handling writes, Cassandra/ScyllaDB uses a &lt;strong&gt;ring topology&lt;/strong&gt;. Every node in the cluster owns a range of a hash space. When a message is written, the room ID gets hashed and routed to the node that owns that hash range. No single master, no write bottleneck — every node can accept writes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication&lt;/strong&gt; works naturally on top of this. With a replication factor of 3, a write doesn't just go to the primary node — it also goes to the next 2 nodes on the ring. So there are 3 copies of the data across different nodes. If one goes down, the data is still there. No manual failover, it's built into how the ring works.&lt;/p&gt;
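
&lt;p&gt;A toy version of the ring with a replication factor of 3. Real clusters assign many virtual nodes per physical node; this sketch only shows the placement idea:&lt;/p&gt;

```typescript
// Each node owns a point on the ring; a key goes to the first node at or
// after its hash (wrapping around), and is replicated to the next 2 nodes.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (const ch of s) {
    h ^= ch.codePointAt(0)!;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const nodes = ["node-a", "node-b", "node-c", "node-d"];
const ring = nodes
  .map((name) => ({ name, point: hash32(name) }))
  .sort((a, b) => a.point - b.point);

function replicasFor(roomId: string, rf: number): string[] {
  const h = hash32(roomId);
  // first ring position clockwise from the key's hash
  let idx = ring.findIndex((n) => n.point >= h);
  if (idx === -1) idx = 0; // wrap around past the last node
  const out: string[] = [];
  while (out.length !== rf) {
    out.push(ring[(idx + out.length) % ring.length].name);
  }
  return out;
}
```

&lt;p&gt;Every key deterministically maps to rf distinct nodes, and any node can compute the mapping, so there is no master coordinating writes.&lt;/p&gt;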

&lt;p&gt;The other key advantage is the &lt;strong&gt;partition key&lt;/strong&gt;. By using room ID as the partition key, Cassandra/ScyllaDB guarantees all messages for that room are stored together on the same node. Pair that with a &lt;strong&gt;clustering key&lt;/strong&gt; on timestamp, and messages within a room are physically stored in time order — fetching history becomes one sequential read, already sorted. No ORDER BY, no extra cost.&lt;/p&gt;
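
&lt;p&gt;In CQL that schema shape looks like this (table and column names are hypothetical):&lt;/p&gt;

```sql
-- room_id is the partition key: all of a room's messages live together on one
-- replica set. sent_at is the clustering key: rows are stored in time order.
CREATE TABLE chat.messages (
    room_id    uuid,
    sent_at    timeuuid,
    author_id  uuid,
    body       text,
    PRIMARY KEY ((room_id), sent_at)
) WITH CLUSTERING ORDER BY (sent_at DESC);

-- Fetching history is one sequential read on one partition, already sorted:
-- SELECT body, sent_at FROM chat.messages WHERE room_id = ? LIMIT 50;
```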

&lt;p&gt;This turns random I/O into sequential I/O. Fetching chat history means finding the right node and reading one continuous stream. That's a hardware-level optimization that a single-master Postgres setup simply can't match at scale.&lt;/p&gt;

&lt;p&gt;The tradeoff: Cassandra/ScyllaDB is bad at full scans and joins, because those require hitting every node. But based on the requirements here, that doesn't matter — joins are never needed.&lt;/p&gt;

&lt;p&gt;This isn't just theory. Discord went through this exact problem — first scaling with Cassandra for billions of messages, then eventually migrating to ScyllaDB for better performance at trillions of messages. Worth reading if you want a production-scale perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://discord.com/blog/how-discord-stores-billions-of-messages" rel="noopener noreferrer"&gt;How Discord Stores Billions of Messages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.com/blog/how-discord-stores-trillions-of-messages" rel="noopener noreferrer"&gt;How Discord Stores Trillions of Messages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll write a dedicated post on Cassandra/ScyllaDB internals — replication strategies, consistency levels, and multi-DC support deserve their own space.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hybrid Architecture
&lt;/h2&gt;

&lt;p&gt;There's no perfect database. Different tools solve different problems, so I use both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; — relational, "small" data: users, friend lists, room metadata. Needs ACID compliance and complex queries, but doesn't grow at a massive rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra/ScyllaDB&lt;/strong&gt; — the heavy data: every message ever sent. High-write throughput, fast sequential reads by room, horizontally scalable without a single write master.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each database does what it's actually good at.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;There's more to cover here — consistency models, high availability, failover, and distributed systems fundamentals like Raft. I'll get into those in future posts. For now, this is the architectural reasoning behind the storage layer in ASTRING.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>cassandra</category>
      <category>scylladb</category>
    </item>
  </channel>
</rss>
