<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Constantin</title>
    <description>The latest articles on Forem by Constantin (@engag1ng).</description>
    <link>https://forem.com/engag1ng</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3652609%2F1a0e84d4-e295-4cdc-847d-6c6731e109e9.png</url>
      <title>Forem: Constantin</title>
      <link>https://forem.com/engag1ng</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/engag1ng"/>
    <language>en</language>
    <item>
      <title>🚀 The Future of Information Retrieval (IR) — Looking for Testers &amp; Feedback</title>
      <dc:creator>Constantin</dc:creator>
      <pubDate>Mon, 08 Dec 2025 20:15:00 +0000</pubDate>
      <link>https://forem.com/engag1ng/the-future-of-information-retrieval-ir-looking-for-testers-feedback-2gf3</link>
      <guid>https://forem.com/engag1ng/the-future-of-information-retrieval-ir-looking-for-testers-feedback-2gf3</guid>
      <description>&lt;p&gt;Hi everyone!&lt;br&gt;
My name is Constantin, and for the past few months I’ve been building a custom Information Retrieval (IR) system from scratch: partly for fun, partly to learn, and partly because existing tools didn’t give me what I wanted, which is simplicity combined with raw speed.&lt;/p&gt;

&lt;p&gt;I’m finally at a point where it’s usable, and now I’m looking for feedback, testers, ideas, and brutally honest opinions from other developers who care about making information retrieval better for everyone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/engag1ng/hirmes" rel="noopener noreferrer"&gt;https://github.com/engag1ng/hirmes&lt;/a&gt;&lt;br&gt;
&lt;a href="https://hirmes.webflow.io/" rel="noopener noreferrer"&gt;https://hirmes.webflow.io/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;🔍 Why I Built It&lt;/h2&gt;

&lt;p&gt;Most commercial software already ships with some kind of built-in information retrieval (search), yet hardly anyone actually uses it. I wanted to fix that.&lt;/p&gt;

&lt;h2&gt;✨ Key Features (so far)&lt;/h2&gt;

&lt;h3&gt;🔸 1. Custom Token–Postings Indexing System&lt;/h3&gt;

&lt;p&gt;Instead of building on Lucene, Whoosh, or Elasticsearch, I wrote my own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokenizer with optional filters&lt;/li&gt;
&lt;li&gt;Postings lists stored efficiently&lt;/li&gt;
&lt;li&gt;Fast search over token sets&lt;/li&gt;
&lt;/ul&gt;
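
&lt;p&gt;To make the token-postings idea concrete, here is a minimal in-memory sketch in Python. It is not hirmes’s actual code or API: &lt;code&gt;tokenize&lt;/code&gt;, &lt;code&gt;drop_stopwords&lt;/code&gt; and &lt;code&gt;InvertedIndex&lt;/code&gt; are hypothetical names, and a plain dict of sets stands in for the real storage layer. Each token maps to its postings list (the set of document IDs that contain it), and a filter is simply a function applied to the token list.&lt;/p&gt;

```python
import re
from collections import defaultdict

# Hypothetical sketch, not hirmes's real API.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text, filters=()):
    """Lowercase, split into word tokens, then apply each optional filter in order."""
    tokens = TOKEN_RE.findall(text.lower())
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


def drop_stopwords(tokens, stopwords=frozenset({"the", "a", "of", "and"})):
    """Example filter: remove a few very common words."""
    return [t for t in tokens if t not in stopwords]


class InvertedIndex:
    """Maps each token to its postings list: the set of document IDs containing it."""

    def __init__(self, filters=()):
        self.filters = filters
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text, self.filters):
            self.postings[token].add(doc_id)

    def lookup(self, term):
        # Normalize the query term the same way documents were lowercased;
        # .get() avoids creating empty entries for unknown terms.
        return set(self.postings.get(term.lower(), ()))


index = InvertedIndex(filters=[drop_stopwords])
index.add("note-1", "The speed of SQLite")
index.add("note-2", "Speed and simplicity")
print(sorted(index.lookup("Speed")))  # ['note-1', 'note-2']
print(index.lookup("the"))            # set(), removed by the filter
```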

&lt;h3&gt;🔸 2. SQLite Backend for Storage&lt;/h3&gt;

&lt;p&gt;Initially I used dbm + pickle, but I migrated to sqlite3 for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better performance on large posting sets&lt;/li&gt;
&lt;li&gt;ACID guarantees&lt;/li&gt;
&lt;li&gt;Easier debugging&lt;/li&gt;
&lt;li&gt;More predictable persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The schema is simple and extensible, so you can add your own metadata or scoring fields.&lt;/p&gt;
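
&lt;p&gt;A postings table in &lt;code&gt;sqlite3&lt;/code&gt; could look roughly like the sketch below. The table and column names (&lt;code&gt;documents&lt;/code&gt;, &lt;code&gt;postings&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;) and the helper functions are assumptions for illustration, not the schema hirmes actually ships. The &lt;code&gt;with conn:&lt;/code&gt; block is where the ACID guarantee shows up: a document and its postings are committed together or rolled back together.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema sketch; hirmes's real tables and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    doc_id TEXT PRIMARY KEY,
    title  TEXT  -- example metadata column; add your own fields here
);
CREATE TABLE IF NOT EXISTS postings (
    token  TEXT NOT NULL,
    doc_id TEXT NOT NULL,
    PRIMARY KEY (token, doc_id)
) WITHOUT ROWID;
"""


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, title, tokens):
    """Store one document and its postings atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT OR REPLACE INTO documents (doc_id, title) VALUES (?, ?)",
            (doc_id, title),
        )
        # Remove stale postings so re-indexing a document replaces its old tokens
        conn.execute("DELETE FROM postings WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO postings (token, doc_id) VALUES (?, ?)",
            [(token, doc_id) for token in set(tokens)],
        )


def docs_for_token(conn, token):
    rows = conn.execute(
        "SELECT doc_id FROM postings WHERE token = ? ORDER BY doc_id", (token,)
    )
    return [row[0] for row in rows]


conn = open_index()
add_document(conn, "note-1", "SQLite notes", ["speed", "sqlite"])
add_document(conn, "note-2", "Design notes", ["speed", "simplicity"])
print(docs_for_token(conn, "speed"))  # ['note-1', 'note-2']
```

&lt;p&gt;With &lt;code&gt;WITHOUT ROWID&lt;/code&gt;, the postings rows are stored clustered by the composite &lt;code&gt;(token, doc_id)&lt;/code&gt; key, so a lookup by token is a range scan on that key with no separate index needed. Extra metadata or scoring fields would simply be additional columns.&lt;/p&gt;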

&lt;h3&gt;🔸 3. User-Assigned Document IDs&lt;/h3&gt;

&lt;p&gt;You can directly assign your own document IDs, making it ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;personal knowledge bases&lt;/li&gt;
&lt;li&gt;bookmarking apps&lt;/li&gt;
&lt;li&gt;search inside your own dataset&lt;/li&gt;
&lt;li&gt;programmatically indexed corpora&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No auto-generation is required unless you want it.&lt;/p&gt;
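
&lt;p&gt;In code, caller-supplied IDs boil down to something like the following sketch. &lt;code&gt;DocumentStore&lt;/code&gt; is a hypothetical class, not part of hirmes; it keeps whatever ID you pass (a note slug, a bookmark key, a database row ID) and only generates a random UUID when you omit it.&lt;/p&gt;

```python
import uuid


class DocumentStore:
    """Hypothetical store: caller-chosen IDs by default, generated IDs only on request."""

    def __init__(self):
        self.docs = {}

    def add(self, text, doc_id=None):
        # Keep the caller's ID (note slug, bookmark key, row ID and so on) when given;
        # fall back to a random UUID only when doc_id is omitted.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        self.docs[doc_id] = text
        return doc_id


store = DocumentStore()
print(store.add("SQLite internals reading list", doc_id="bookmarks/sqlite"))  # bookmarks/sqlite
generated = store.add("Untitled note")
print(len(generated))  # 32
```

&lt;p&gt;Because the ID is the dictionary key, adding a document again under the same ID replaces the old text instead of creating a duplicate.&lt;/p&gt;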

&lt;h3&gt;🔸 4. Search Engine Core Logic&lt;/h3&gt;

&lt;p&gt;The search API currently supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;term lookup&lt;/li&gt;
&lt;li&gt;multi-term queries&lt;/li&gt;
&lt;li&gt;boolean AND/OR&lt;/li&gt;
&lt;li&gt;scoring based on term intersections (more ranking choices planned)&lt;/li&gt;
&lt;/ul&gt;
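
&lt;p&gt;Boolean queries map directly onto set operations over postings lists: AND is an intersection, OR is a union. The sketch below shows that, plus one plausible reading of “scoring based on term intersections”: counting how many distinct query terms each document contains. The function names and the toy &lt;code&gt;POSTINGS&lt;/code&gt; data are hypothetical, and hirmes’s real scoring may work differently.&lt;/p&gt;

```python
# Hypothetical postings: each token maps to the set of document IDs containing it.
POSTINGS = {
    "python": {"d1", "d2", "d3"},
    "sqlite": {"d2", "d3"},
    "speed": {"d3", "d4"},
}


def search_and(postings, terms):
    """Boolean AND: documents containing every term (intersection of postings)."""
    sets = [postings.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.intersection(*sets)


def search_or(postings, terms):
    """Boolean OR: documents containing at least one term (union of postings)."""
    return set().union(*(postings.get(term, set()) for term in terms))


def rank_by_overlap(postings, terms):
    """Score each document by how many distinct query terms it contains."""
    scores = {}
    for term in set(terms):
        for doc_id in postings.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    # Highest score first; ties broken by document ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


query = ["python", "sqlite", "speed"]
print(sorted(search_and(POSTINGS, query)))  # ['d3']
print(sorted(search_or(POSTINGS, query)))   # ['d1', 'd2', 'd3', 'd4']
print(rank_by_overlap(POSTINGS, query))     # [('d3', 3), ('d2', 2), ('d1', 1), ('d4', 1)]
```

&lt;p&gt;Counting matched terms is the simplest overlap score; weighting schemes such as TF-IDF or BM25 would replace the constant &lt;code&gt;+ 1&lt;/code&gt; with a per-term weight.&lt;/p&gt;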

&lt;h3&gt;🔸 5. Performance-Focused Tokenization&lt;/h3&gt;

&lt;p&gt;I spent quite a bit of time optimizing tokenization for speed.&lt;/p&gt;
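
&lt;p&gt;The exact optimizations in hirmes aren’t described here, but a common pattern for fast tokenization in pure Python is to compile the regex once, lowercase the whole text in a single call, let the C-implemented &lt;code&gt;re.findall&lt;/code&gt; do the scanning, and filter stopwords with a &lt;code&gt;frozenset&lt;/code&gt;. The sketch below compares that approach against a character-by-character loop that produces the same tokens. The stopword list and the ASCII-only token pattern are simplifying assumptions; a Unicode-aware tokenizer would use something like &lt;code&gt;\w+&lt;/code&gt; instead.&lt;/p&gt;

```python
import re
import timeit

# Compiled once at import time, not on every call (ASCII-only tokens: a simplifying assumption).
WORD_RE = re.compile(r"[a-z0-9]+")
# Hypothetical stopword list; frozenset gives constant-time membership checks.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "or", "to"})


def tokenize_fast(text):
    """Lowercase once, let the C-level regex engine scan, then drop stopwords."""
    return [t for t in WORD_RE.findall(text.lower()) if t not in STOPWORDS]


def tokenize_naive(text):
    """Character-by-character loop; same tokens, far more Python-level work per character."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isascii() and ch.isalnum():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog, 42 times. " * 1000
    assert tokenize_fast(sample) == tokenize_naive(sample)
    for fn in (tokenize_naive, tokenize_fast):
        seconds = timeit.timeit(lambda: fn(sample), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

&lt;p&gt;Timings depend on the machine and Python version, so the script prints them rather than asserting a speedup; the &lt;code&gt;assert&lt;/code&gt; only checks that both tokenizers agree.&lt;/p&gt;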

&lt;h2&gt;🧪 What I’m Looking For&lt;/h2&gt;

&lt;p&gt;I’d love early testers who can help with:&lt;/p&gt;

&lt;p&gt;✔️ Trying it on your own small dataset&lt;br&gt;
✔️ Finding slow spots, bugs, or edge cases&lt;br&gt;
✔️ Suggesting features, scoring models, or indexing ideas&lt;br&gt;
✔️ Telling me if something is unclear or needs documentation&lt;br&gt;
✔️ Experimenting with tokenization and weighting strategies&lt;/p&gt;

&lt;p&gt;If you’ve ever built tools involving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;search&lt;/li&gt;
&lt;li&gt;indexing&lt;/li&gt;
&lt;li&gt;NLP&lt;/li&gt;
&lt;li&gt;document retrieval&lt;/li&gt;
&lt;li&gt;data engineering&lt;/li&gt;
&lt;li&gt;Python performance tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…your thoughts would mean the world to me.&lt;/p&gt;

&lt;h2&gt;🙏 How You Can Help&lt;/h2&gt;

&lt;p&gt;If you want to test it or give feedback, just drop a comment here or message me.&lt;br&gt;
I can provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installation instructions&lt;/li&gt;
&lt;li&gt;Example code&lt;/li&gt;
&lt;li&gt;A test dataset&lt;/li&gt;
&lt;li&gt;Architecture overview&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you prefer GitHub Issues or Discussions, I can enable those as well.&lt;/p&gt;

&lt;h2&gt;❤️ Thank You&lt;/h2&gt;

&lt;p&gt;I know there are plenty of IR libraries and search engines out there, so if you take the time to try a small personal project of mine, it would mean a lot.&lt;br&gt;
I’m doing this to learn and to build something useful, and I’d love to improve it with your help.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
