<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tiago Pereira</title>
    <description>The latest articles on Forem by Tiago Pereira (@wildlifechorus).</description>
    <link>https://forem.com/wildlifechorus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936140%2F8cc0552b-a16e-44c3-a34b-af234513f745.jpeg</url>
      <title>Forem: Tiago Pereira</title>
      <link>https://forem.com/wildlifechorus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/wildlifechorus"/>
    <language>en</language>
    <item>
      <title>I built a self-hosted news digest that re-ranks itself based on what you actually read</title>
      <dc:creator>Tiago Pereira</dc:creator>
      <pubDate>Sun, 17 May 2026 11:09:28 +0000</pubDate>
      <link>https://forem.com/wildlifechorus/i-built-a-self-hosted-news-digest-that-re-ranks-itself-based-on-what-you-actually-read-4l44</link>
      <guid>https://forem.com/wildlifechorus/i-built-a-self-hosted-news-digest-that-re-ranks-itself-based-on-what-you-actually-read-4l44</guid>
      <description>&lt;p&gt;I use RSS readers constantly. The problem I kept running into was that they are
stateless. Every day the same firehose, in the same order, with no memory of
what I cared about yesterday. I wanted something that actually got better over
time, ran on my own machine, and didn't require handing my reading habits to a
third party.&lt;/p&gt;

&lt;p&gt;So I built CondenseIt.&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;CondenseIt is a self-hosted daily news digest. You point it at your sources, it
runs on a schedule, summarizes everything with a local LLM (or a cloud model via
OpenRouter), and serves the results as a web app you can read in the browser.&lt;/p&gt;

&lt;p&gt;Sources it supports out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RSS and Atom feeds&lt;/li&gt;
&lt;li&gt;YouTube channels (via transcripts from the public channel RSS)&lt;/li&gt;
&lt;li&gt;Subreddits (transparently fetched via Lemmy.world RSS, so they work from
VPS IPs where Reddit is blocked; the "Reddit" badge still shows)&lt;/li&gt;
&lt;li&gt;Hacker News (top, best, new, ask, show)&lt;/li&gt;
&lt;li&gt;GitHub release feeds&lt;/li&gt;
&lt;li&gt;Google News searches (with operator support: &lt;code&gt;site:&lt;/code&gt;, &lt;code&gt;when:&lt;/code&gt;, &lt;code&gt;intitle:&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Website change detection (diffs the page content and treats meaningful
changes as new items)&lt;/li&gt;
&lt;li&gt;Podcast feeds (new episodes from any podcast RSS, with iTunes search built
in to find the feed URL)&lt;/li&gt;
&lt;/ul&gt;
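&lt;p&gt;The post doesn't say how "meaningful" is decided for website change detection. A minimal sketch of one way to do it with Python's standard &lt;code&gt;difflib&lt;/code&gt;, assuming the page has already been reduced to plain text; the function names and the three-changed-lines threshold are hypothetical, not CondenseIt's actual rule:&lt;/p&gt;

```python
import difflib


def normalize(text: str) -> list[str]:
    """Split page text into non-empty lines with internal whitespace collapsed."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def meaningful_change(old_text: str, new_text: str, min_changed_lines: int = 3) -> list[str]:
    """Return the added lines if enough lines changed, otherwise an empty list.

    min_changed_lines is a hypothetical threshold chosen for illustration.
    """
    added, removed = [], []
    for line in difflib.ndiff(normalize(old_text), normalize(new_text)):
        tag, content = line[:2], line[2:]
        if tag == "+ ":
            added.append(content)
        elif tag == "- ":
            removed.append(content)
        # "  " (unchanged) and "? " (intraline hint) lines are ignored.
    if len(added) + len(removed) >= min_changed_lines:
        return added
    return []


old = "Release notes\nVersion 1.2\nBug fixes"
new = "Release notes\nVersion 1.3\nNew plugin API\nBug fixes"
print(meaningful_change(old, new))  # ['Version 1.3', 'New plugin API']
```

&lt;p&gt;Collapsing whitespace before diffing keeps reflowed markup from registering as a change; the added lines are what would become the new item's content.&lt;/p&gt;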

&lt;p&gt;All of that is configured from an admin panel; no YAML editing is required after
initial setup. Keyword filters (allowlist, hide, and highlight rules) are also set
per source from the admin panel.&lt;/p&gt;
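&lt;p&gt;The exact semantics of the three rule types aren't spelled out above. Assuming case-insensitive substring matching, an allowlist that keeps only matching items when it is non-empty, a hide list that drops matches, and a highlight list that only flags them, a per-source filter pass might look like this (&lt;code&gt;SourceFilters&lt;/code&gt; and &lt;code&gt;apply_filters&lt;/code&gt; are hypothetical names):&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class SourceFilters:
    """Hypothetical per-source filter rules; terms match case-insensitively."""
    allowlist: list[str] = field(default_factory=list)
    hide: list[str] = field(default_factory=list)
    highlight: list[str] = field(default_factory=list)


def _matches(text: str, terms: list[str]) -> bool:
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def apply_filters(items: list[dict], rules: SourceFilters) -> list[dict]:
    """Enforce the allowlist, drop hidden items, and flag highlighted ones."""
    kept = []
    for item in items:
        text = f"{item['title']} {item.get('summary', '')}"
        # An empty allowlist means "allow everything".
        if rules.allowlist and not _matches(text, rules.allowlist):
            continue
        if _matches(text, rules.hide):
            continue
        kept.append({**item, "highlighted": _matches(text, rules.highlight)})
    return kept
```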

&lt;h2&gt;The LLM part&lt;/h2&gt;

&lt;p&gt;Summaries run through Ollama locally, Metal-accelerated on Apple Silicon, so no
discrete GPU is needed. I run it on a Mac Mini and it handles a daily digest of
50-80 articles without issues. If you prefer cloud inference, OpenRouter is
supported too, with budget limits so you don't accidentally run up a large bill.
Summarization is now parallelized, so the digest runs noticeably faster, and
OpenRouter calls retry automatically on rate-limit responses.&lt;/p&gt;
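&lt;p&gt;A sketch of parallel summarization with retries on rate limits, using a thread pool because each call mostly waits on HTTP. &lt;code&gt;RateLimitError&lt;/code&gt; is a hypothetical stand-in for however the client surfaces an HTTP 429, the attempt count and backoff delays are assumptions, and &lt;code&gt;call&lt;/code&gt; is whatever function actually sends one article to Ollama or OpenRouter:&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error raised when the provider answers HTTP 429."""


def with_retry(call: Callable[[str], str], text: str, max_attempts: int = 4,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `call(text)`, backing off exponentially on rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return call(text)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable when max_attempts is at least 1")


def summarize_all(texts: list[str], call: Callable[[str], str], workers: int = 4,
                  **retry_kwargs) -> list[str]:
    """Summarize articles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: with_retry(call, t, **retry_kwargs), texts))
```

&lt;p&gt;&lt;code&gt;pool.map&lt;/code&gt; returns results in input order, so summaries line up with their articles even when calls finish out of order. The &lt;code&gt;sleep&lt;/code&gt; parameter is injectable so tests can skip real waiting.&lt;/p&gt;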

&lt;p&gt;By default the LLM is only used for summarization. Ranking is a separate system built on
classical signals, with optional AI layers you can turn on incrementally.&lt;/p&gt;

&lt;h2&gt;How the re-ranking works&lt;/h2&gt;

&lt;p&gt;Every article gets a score composed of classical signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyword overlap with your liked/disliked term profile&lt;/li&gt;
&lt;li&gt;Bigram phrase matching&lt;/li&gt;
&lt;li&gt;TF-IDF cosine similarity against your content history&lt;/li&gt;
&lt;li&gt;Category and source averages from your past ratings&lt;/li&gt;
&lt;li&gt;Three implicit engagement signals: reading an article, saving it for later,
and dismissing it&lt;/li&gt;
&lt;li&gt;Synonym boost (configurable synonym groups propagate weight across related
terms)&lt;/li&gt;
&lt;/ul&gt;
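&lt;p&gt;Two of these signals are easy to sketch in plain Python. The code below is an illustration, not CondenseIt's implementation: it scores a candidate by its highest TF-IDF cosine similarity to previously liked articles (with a smoothed IDF) and gives unseen members of a synonym group a share of a known term's weight. The tokenizer, the max-similarity choice, and the 0.5 propagation factor are assumptions; bigram matching and the category/source averages are left out.&lt;/p&gt;

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight term counts by a smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def history_similarity(candidate: str, liked_history: list[str]) -> float:
    """Highest TF-IDF cosine between a candidate and any previously liked article."""
    vectors = tfidf_vectors([tokenize(doc) for doc in liked_history + [candidate]])
    target = vectors[-1]
    return max((cosine(target, v) for v in vectors[:-1]), default=0.0)


def propagate_synonyms(profile: dict[str, float], groups: list[set[str]],
                       factor: float = 0.5) -> dict[str, float]:
    """Give unseen members of a synonym group a share of the group's strongest weight."""
    boosted = dict(profile)
    for group in groups:
        known = [profile[term] for term in group if term in profile]
        if not known:
            continue
        strongest = max(known, key=abs)
        for term in group:
            if term not in profile:
                boosted[term] = factor * strongest
    return boosted


def keyword_overlap(tokens: list[str], profile: dict[str, float]) -> float:
    """Sum liked (positive) and disliked (negative) weights of the distinct tokens."""
    return sum(profile.get(token, 0.0) for token in set(tokens))
```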

&lt;p&gt;Explicit star ratings (1-5) drive the biggest updates to the profile. Implicit
signals are weighted at 0.5 by default so they contribute but don't dominate.
All rating contributions decay exponentially (default half-life: 30 days) so
stale preferences fade rather than accumulating forever.&lt;/p&gt;
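&lt;p&gt;With a half-life of 30 days, a contribution that is &lt;code&gt;age_days&lt;/code&gt; old is scaled by &lt;code&gt;0.5 ** (age_days / 30)&lt;/code&gt;, so a 30-day-old rating counts half as much and a 60-day-old one a quarter. The sketch below applies that decay together with the 0.5 implicit weight; mapping 1-5 stars onto [-1, 1] with 3 as neutral, and all function names, are assumptions for illustration:&lt;/p&gt;

```python
def rating_signal(stars: int) -> float:
    """Map a 1-5 star rating onto [-1, 1]; 3 stars is neutral (an assumption)."""
    if stars not in range(1, 6):
        raise ValueError("stars must be between 1 and 5")
    return (stars - 3) / 2


def decayed_contribution(signal: float, age_days: float, implicit: bool = False,
                         half_life_days: float = 30.0,
                         implicit_weight: float = 0.5) -> float:
    """Scale a signal by exponential decay and, for implicit events, a fixed weight."""
    decay = 0.5 ** (age_days / half_life_days)
    weight = implicit_weight if implicit else 1.0
    return signal * weight * decay


def term_profile(events: list[tuple[list[str], float, float, bool]]) -> dict[str, float]:
    """Accumulate decayed contributions per term from (terms, signal, age_days, implicit)."""
    profile: dict[str, float] = {}
    for terms, signal, age_days, implicit in events:
        contribution = decayed_contribution(signal, age_days, implicit)
        for term in terms:
            profile[term] = profile.get(term, 0.0) + contribution
    return profile
```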

&lt;p&gt;There are also three optional AI layers, each independently controlled and off
by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic embeddings&lt;/strong&gt; - article text and your liked/disliked articles are
encoded as vectors. Each candidate is scored by cosine similarity to the
centroid of your liked embeddings minus the centroid of your disliked embeddings.
Embeddings are generated once and cached in SQLite (keyed by URL + content hash),
so subsequent runs are fast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Topic/entity enrichment&lt;/strong&gt; - the same LLM call that summarizes an article
also extracts topics, entities, and a novelty score (1-5), so no extra call is needed.
These are used to build a topic preference profile from your ratings. Topics
and a "novel" badge appear on each card.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM reranker&lt;/strong&gt; - after classical scoring, a compact profile narrative is
built from your top liked/disliked terms, categories, and sources. The LLM
scores the top-K candidates by relevance and returns a brief reason for each. The
relevance score is blended with the classical score (default blend weight: 0.3),
and the reason appears in the "Why ranked here?" panel.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
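&lt;p&gt;A sketch of the numeric pieces described above: the preference vector (liked centroid minus disliked centroid), a SQLite cache keyed by URL plus a SHA-256 content hash, and the reranker's blend step. The table schema, JSON storage of vectors, and the &lt;code&gt;embed&lt;/code&gt; callable are hypothetical, and reading "blend: 0.3" as the linear interpolation &lt;code&gt;(1 - w) * classical + w * llm&lt;/code&gt; is an assumption about how the weight is applied:&lt;/p&gt;

```python
import hashlib
import json
import math
import sqlite3
from typing import Callable

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def preference_vector(liked: list[Vector], disliked: list[Vector]) -> Vector:
    """Centroid of liked embeddings minus centroid of disliked embeddings."""
    if not liked:
        raise ValueError("need at least one liked article")
    pos = centroid(liked)
    if not disliked:
        return pos
    return [p - n for p, n in zip(pos, centroid(disliked))]


class EmbeddingCache:
    """SQLite cache keyed by (URL, SHA-256 of content); the schema is hypothetical."""

    def __init__(self, path: str, embed: Callable[[str], Vector]) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "url TEXT, content_hash TEXT, vector TEXT, PRIMARY KEY (url, content_hash))"
        )
        self.embed = embed

    def get(self, url: str, text: str) -> Vector:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE url = ? AND content_hash = ?",
            (url, digest),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        vector = self.embed(text)  # only reached on a cache miss
        self.db.execute(
            "INSERT INTO embeddings (url, content_hash, vector) VALUES (?, ?, ?)",
            (url, digest, json.dumps(vector)),
        )
        self.db.commit()
        return vector


def blend(classical: float, llm_relevance: float, weight: float = 0.3) -> float:
    """Linear interpolation; assumes both scores are already on a 0-1 scale."""
    return (1 - weight) * classical + weight * llm_relevance
```

&lt;p&gt;Because the key includes the content hash, an edited article at the same URL gets a fresh embedding instead of a stale cached one.&lt;/p&gt;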

&lt;p&gt;There is also a cold-start bootstrap: if you have no ratings yet, you can visit
Admin &amp;gt; Preferences and describe your interests in plain text. The LLM derives
initial keywords, synonyms, and a profile summary that seed the engine before
any ratings exist.&lt;/p&gt;
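&lt;p&gt;One way the cold-start step could turn a plain-text description into seed data is to ask the model for JSON and validate it before it touches the profile. In this sketch the prompt wording, the &lt;code&gt;keywords&lt;/code&gt;/&lt;code&gt;synonyms&lt;/code&gt;/&lt;code&gt;summary&lt;/code&gt; keys, &lt;code&gt;SeedProfile&lt;/code&gt;, and &lt;code&gt;seed_weight&lt;/code&gt; are all hypothetical, and the model call itself is left out:&lt;/p&gt;

```python
import json
from dataclasses import dataclass


@dataclass
class SeedProfile:
    keywords: dict[str, float]
    synonyms: list[set[str]]
    summary: str


PROMPT = (
    "From the reader's description below, return only JSON with the keys "
    '"keywords" (a list of short lowercase terms), "synonyms" (a list of term groups) '
    'and "summary" (one sentence).\n\nDescription:\n{description}'
)


def build_prompt(description: str) -> str:
    return PROMPT.format(description=description.strip())


def parse_seed(raw: str, seed_weight: float = 1.0) -> SeedProfile:
    """Validate the model's JSON reply and turn it into starting profile weights."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    keywords = {
        str(term).strip().lower(): seed_weight
        for term in data.get("keywords", [])
        if str(term).strip()  # drop empty strings
    }
    synonyms = [
        {str(term).strip().lower() for term in group}
        for group in data.get("synonyms", [])
        if isinstance(group, list) and len(group) >= 2  # a group needs two members
    ]
    return SeedProfile(keywords, synonyms, str(data.get("summary", "")).strip())
```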

&lt;p&gt;The part I found most useful to add was a transparency panel. Every article card
has a collapsible "Why ranked here?" section that shows each signal (classical
and AI) as a proportional bar. It made tuning the weights much easier and
surfaced some surprising things about my own reading habits.&lt;/p&gt;
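&lt;p&gt;A proportional bar needs each signal's share of the total. A minimal text-mode sketch, assuming signals arrive as signed contributions keyed by a (hypothetical) signal name and that bar length tracks each signal's share of the summed absolute values:&lt;/p&gt;

```python
def explain(contributions: dict[str, float], width: int = 20) -> list[str]:
    """Render each signal as a text bar proportional to its share of total magnitude."""
    total = sum(abs(value) for value in contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    lines = []
    for name, value in ranked:
        share = abs(value) / total if total else 0.0
        sign = "+" if value >= 0 else "-"
        lines.append(f"{name:12} {sign}{share:6.1%} {'#' * round(share * width)}")
    return lines


for line in explain({"keywords": 0.5, "tfidf": 0.25, "source avg": -0.25}, width=8):
    print(line)
```

&lt;p&gt;Using absolute values keeps negative signals (a disliked source, say) visible as bars of their own instead of silently cancelling positive ones.&lt;/p&gt;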

&lt;p&gt;A newer addition is semantic deduplication: articles covering the same story are
collapsed before ranking, so you don't see five versions of the same news item.&lt;/p&gt;
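&lt;p&gt;The dedup method isn't described in detail. A common approach, sketched here as an assumption rather than CondenseIt's algorithm, is greedy clustering over embeddings: each article joins the first group whose representative is within a cosine-similarity threshold (0.9 here, chosen arbitrarily), and otherwise starts a new group:&lt;/p&gt;

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(items: list[tuple[str, Vector]], threshold: float = 0.9) -> list[list[str]]:
    """Greedily group items whose embedding is close to a group's first member."""
    groups: list[tuple[Vector, list[str]]] = []
    for item_id, vec in items:
        for representative, members in groups:
            if cosine(representative, vec) >= threshold:
                members.append(item_id)
                break
        else:
            # No close group found: this item starts (and represents) a new one.
            groups.append((vec, [item_id]))
    return [members for _, members in groups]
```

&lt;p&gt;Greedy grouping is order-dependent and compares only against each group's first member; that is cheap for a digest of a few dozen items but can miss chains of near-duplicates.&lt;/p&gt;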

&lt;h2&gt;Saves vs. star ratings&lt;/h2&gt;

&lt;p&gt;There are two kinds of saves now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Save for later&lt;/strong&gt; - implicit strong positive; contributes to your preference
profile like a high star rating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Starred (permanent save)&lt;/strong&gt; - has no ranking effect. Starred articles live
on a separate page and survive all future digest runs. Useful for
bookmarking things you want to return to.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11, FastAPI, SQLite&lt;/li&gt;
&lt;li&gt;React, TypeScript, Vite&lt;/li&gt;
&lt;li&gt;Ollama or OpenRouter for inference&lt;/li&gt;
&lt;li&gt;MIT license&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Local setup is &lt;code&gt;uv sync&lt;/code&gt; then &lt;code&gt;condenseit serve&lt;/code&gt;. There's also a deploy script
for running it on a VPS with nginx and a systemd service. The deploy script
now supports multiple instances, with an interactive picker if you have more
than one server.&lt;/p&gt;

&lt;p&gt;The web app is installable as a PWA on iOS and Android (there's a guide in the
docs).&lt;/p&gt;

&lt;h2&gt;What I'm looking for&lt;/h2&gt;

&lt;p&gt;The project is open source and I'm actively looking for contributors. Areas
where help would be most useful: new collector types, frontend and accessibility
improvements, test coverage, and Docker packaging. Good first issues are labeled
in the repo.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/wildlifechorus/condenseit" rel="noopener noreferrer"&gt;https://github.com/wildlifechorus/condenseit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions about any of the implementation decisions in the
comments.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>selfhosted</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
