Adding an ML Layer to Search: Hybrid Search Optimizer

#federatedsearch #hybridsearch #neuralsearch #keywordsearch

If we look 10 years out, I think the ideal outcome is that we no longer do hybrid search at all, because we have a better approach: something beyond vector + keyword that still supports the case where 0 results is the right answer (sometimes). That would replace today's slightly band-aidy approach. But for now, hybrid search is exciting!

It is fascinating and funny how things develop, and also how they turn around. In 2022–23 everyone was buzzing about hybrid search. In 2024 the conversation shifted to RAG, RAG, RAG. Now it is 2025 and we are back to hybrid search, on a different level. Finally, there are strides and contributions towards learning hybrid search parameters with ML. How cool is that?

When I first looked at hybrid search, I instantly knew that fiddling with a and b in a*keyword + b*neural would be the crux of succeeding with this approach to search. I also knew that applying ML would be a better way than manual tweaking.
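To make the a*keyword + b*neural idea concrete, here is a minimal sketch in Python. It assumes each retriever returns a dict of document id to raw score, and it min-max normalizes both score sets before mixing them, since raw keyword scores (BM25-style) and vector similarities usually live on different scales. The function names, the normalization choice, and the toy scores are my assumptions for illustration, not something taken from the episode.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so keyword and neural scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as a full match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    keyword: Dict[str, float], neural: Dict[str, float], a: float, b: float
) -> Dict[str, float]:
    """Compute a*keyword + b*neural per document (assumed linear combination)."""
    kw = min_max_normalize(keyword)
    nn = min_max_normalize(neural)
    docs = set(kw) | set(nn)
    # A document missing from one result list contributes 0 for that side.
    return {doc: a * kw.get(doc, 0.0) + b * nn.get(doc, 0.0) for doc in docs}


def rank(scores: Dict[str, float]) -> List[str]:
    """Sort by descending score, breaking ties by document id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical toy scores for a single query.
keyword = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
neural = {"d2": 0.9, "d3": 0.8, "d4": 0.5}

keyword_heavy = rank(hybrid_scores(keyword, neural, a=0.7, b=0.3))
neural_heavy = rank(hybrid_scores(keyword, neural, a=0.3, b=0.7))
```

Even on this toy data the two weightings put a different document on top, which is exactly why picking a and b by hand gets painful quickly.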

I’m really happy someone clever did this. Daniel Wrigley and Eric Pugh, both from OpenSource Connections, decided to do exactly that: apply machine learning to the problem of computing these coefficients. In other words: what weight to give to the keyword match vs the neural search match. What’s fascinating is that they experimented with a multitude of methods, from global to dynamic (per query), with different permutations, feature groups, combination methods, and query sampling. It sounds like an overwhelming study.
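To give a feel for the difference between a global and a dynamic (per query) weight, here is a rough sketch. It is not the method from the study: it simply grid-searches one keyword weight w (with 1 - w going to the neural side) against judged queries, using reciprocal rank as the metric. It assumes scores are already normalized to [0, 1], and all names and data here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (keyword scores, neural scores, ids of relevant documents) for one query.
JudgedQuery = Tuple[Dict[str, float], Dict[str, float], Set[str]]

# Candidate keyword weights 0.0, 0.1, ..., 1.0; the neural weight is 1 - w.
GRID = [i / 10 for i in range(11)]


def combine(keyword: Dict[str, float], neural: Dict[str, float], w: float) -> List[str]:
    """Rank documents by w*keyword + (1 - w)*neural."""
    docs = set(keyword) | set(neural)
    scores = {d: w * keyword.get(d, 0.0) + (1 - w) * neural.get(d, 0.0) for d in docs}
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """Return 1/position of the first relevant document, or 0.0 if none is found."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def best_global_weight(queries: List[JudgedQuery]) -> float:
    """Global variant: one weight that maximizes mean reciprocal rank over all queries."""
    def mean_rr(w: float) -> float:
        return sum(reciprocal_rank(combine(k, n, w), rel) for k, n, rel in queries) / len(queries)
    return max(GRID, key=mean_rr)


def best_weight_per_query(queries: List[JudgedQuery]) -> List[float]:
    """Dynamic variant: the best weight for each query on its own.

    These per-query optima could serve as training targets for a model that
    predicts a weight from query features (an assumption, not shown here).
    """
    return [
        max(GRID, key=lambda w: reciprocal_rank(combine(k, n, w), rel))
        for k, n, rel in queries
    ]


# Hypothetical judged queries: two favor keyword matching, one favors neural.
queries: List[JudgedQuery] = [
    ({"a": 1.0, "b": 0.2}, {"a": 0.1, "b": 1.0}, {"a"}),
    ({"c": 1.0, "d": 0.3}, {"c": 0.2, "d": 1.0}, {"d"}),
    ({"e": 1.0, "f": 0.2}, {"e": 0.1, "f": 1.0}, {"e"}),
]

global_w = best_global_weight(queries)
per_query_w = best_weight_per_query(queries)
```

The global weight is a compromise across all queries, while the per-query weights show how much individual queries can disagree with it. That gap is where a learned, query-dependent weight has room to help.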

What’s even cooler is that all of this is open source.

Check out this episode and let me (us) know what you think. And remember to subscribe to stay tuned for new episodes.

Design: Saurabh Rai, https://www.linkedin.com/in/srbhr/

The design of this episode is inspired by a scene in Blade Runner 2049. There’s a clear path leading towards where people want to go, yet they’re searching for something.

As usual, you can find the episode in audio form on your favorite platform.


