Forem: Sergey Nikolaev

How to Make xt850 Match xt 850

Sergey Nikolaev — Fri, 08 May 2026 05:30:14 +0000

TL;DR

Since version 23.0.0, Manticore can make searches like xt850 match xt 850 using bigram_delimiter together with digit-aware bigram_index modes.

This solves a common tokenization mismatch in product search, where users remove spaces from model names but the source data stores them as separate tokens.

Assumptions and verification

This article assumes:

RT tables created with SQL examples exactly as shown
default tokenization unless the example explicitly changes a setting
ASCII digits in model names, because second_numeric and second_has_digit are digit-aware modes built around 0-9

All SQL examples and expected outputs in this article were verified against a real Manticore 23.0.0 instance before publishing, using fresh tables created from scratch for each scenario.

The broader search problem

Imagine a catalog containing:

xt 850 action camera
iphone 5se battery case
canon eos 80d body
thinkpad x1 carbon

Now imagine users searching for:

xt850
iphone5se
eos80d
thinkpadx1

From the user's point of view, these should obviously match. From the engine's point of view, they often do not, because the indexed text is tokenized as separate terms.

Search systems usually attack that mismatch in one of four ways:

index prefixes or infixes
add custom normalization rules
duplicate content into alternate normalized fields
index adjacent token pairs and optionally store glued variants too

Manticore's newer bigram functionality is a structured way to do the fourth option without awkward field duplication.

Baseline: why `xt850` fails by default

Here is the problem in its simplest form:

DROP TABLE IF EXISTS bi_default_demo;

CREATE TABLE bi_default_demo(title text);

INSERT INTO bi_default_demo VALUES
  (1,'xt 850 action camera');

SELECT id, title FROM bi_default_demo WHERE MATCH('xt850');

Expected result:

Empty set

Why does this fail?

Because the document is indexed as two separate tokens, xt and 850, while the query is a single token, xt850.

By default, Manticore does not assume that:

xt850 should be split into xt + 850
or xt + 850 should also be searchable as xt850

So this is not really a typo-tolerance problem or a phrase problem. It is a tokenization mismatch: the index sees two tokens, while the query provides one.

That is the gap the newer bigram settings are designed to close. They let Manticore index selected adjacent token pairs in a form that can also match glued queries.

Why bigrams help here

bigram_index can help with both phrase acceleration and model-name matching, and in this article we focus on the xt 850 vs xt850 problem.

The key idea is simple:

detect adjacent token pairs that look like model names
store those pairs in a glued form too
let queries such as xt850, iphone5se, or thinkpadx1 hit the spaced text

That is where bigram_delimiter matters.

A note about bigram_delimiter

bigram_index decides which adjacent pairs are eligible.

bigram_delimiter decides how eligible bigrams are stored:

true: internal delimited token only
none: glued token only, such as galaxy24
both: both forms

The practical difference is easiest to understand from the query side:

with true, Manticore keeps the internal bigram form used for phrase optimization, but it does not keep the glued user-facing form, so a query like xt850 will not match xt 850
with none, Manticore keeps only the glued form, so xt850 can match xt 850, but you are leaning entirely on the glued representation for those pairs
with both, Manticore keeps both the internal bigram representation and the glued form, so xt850 can match xt 850 without giving up ordinary phrase behavior

For this use case, both is usually the safer default because it covers the user-visible problem directly while keeping behavior less surprising for normal phrase queries and mixed workloads.

Mode 1: `second_numeric`

bigram_index = second_numeric
bigram_delimiter = both

This mode is aimed at model names where the second token is purely numeric.

That is common in product catalogs:

xt 850
galaxy 24
playstation 5
pixel 8

The idea is simple: users often search these as glued terms such as xt850, galaxy24, or playstation5, even though the source text stores them with a space.

second_numeric stores the pair only when the second token is ASCII digits only.

Use it when:

you have product generations and numbered models
users often remove spaces in search
the second token is usually just digits

Example

DROP TABLE IF EXISTS bi_second_numeric_demo;

CREATE TABLE bi_second_numeric_demo(title text)
  bigram_index='second_numeric'
  bigram_delimiter='both';

INSERT INTO bi_second_numeric_demo VALUES
  (1,'xt 850 action camera'),
  (2,'galaxy 24 ultra'),
  (3,'playstation 5 slim'),
  (4,'iphone 5se case'),
  (5,'canon eos 80d body'),
  (6,'thinkpad x1 carbon');

Then test the queries one by one:

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('xt850');

+------+----------------------+
| id   | title                |
+------+----------------------+
|    1 | xt 850 action camera |
+------+----------------------+

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('galaxy24');

+------+-----------------+
| id   | title           |
+------+-----------------+
|    2 | galaxy 24 ultra |
+------+-----------------+

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('playstation5');

+------+--------------------+
| id   | title              |
+------+--------------------+
|    3 | playstation 5 slim |
+------+--------------------+

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('iphone5se');

Empty set

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('eos80d');

Empty set

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('thinkpadx1');

Empty set

That boundary is the whole point of the mode:

24 and 5 qualify
5se, 80d, and x1 do not

Mode 2: `second_has_digit`

bigram_index = second_has_digit
bigram_delimiter = both

This mode is the more flexible sibling of second_numeric.

It stores the pair when the second token contains at least one ASCII digit. That makes it a much better fit for real product catalogs, where model identifiers are often mixed alphanumeric strings:

xt 850
iphone 5se
eos 80d
thinkpad x1

Use it when:

your model names mix letters and digits
users frequently remove spaces in their searches
you want catalog-friendly matching without indexing every pair in the table

Example

DROP TABLE IF EXISTS bi_second_has_digit_demo;

CREATE TABLE bi_second_has_digit_demo(title text)
  bigram_index='second_has_digit'
  bigram_delimiter='both';

INSERT INTO bi_second_has_digit_demo VALUES
  (1,'xt 850 action camera'),
  (2,'galaxy 24 ultra'),
  (3,'playstation 5 slim'),
  (4,'iphone 5se case'),
  (5,'canon eos 80d body'),
  (6,'thinkpad x1 carbon'),
  (7,'kindle paperwhite signature');

Then test the queries one by one:

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('xt850');

+------+----------------------+
| id   | title                |
+------+----------------------+
|    1 | xt 850 action camera |
+------+----------------------+

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('galaxy24');

+------+-----------------+
| id   | title           |
+------+-----------------+
|    2 | galaxy 24 ultra |
+------+-----------------+

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('iphone5se');

+------+---------------------+
| id   | title               |
+------+---------------------+
|    4 | iphone 5se case     |
+------+---------------------+

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('eos80d');

+------+---------------------+
| id   | title               |
+------+---------------------+
|    5 | canon eos 80d body  |
+------+---------------------+

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('thinkpadx1');

+------+---------------------+
| id   | title               |
+------+---------------------+
|    6 | thinkpad x1 carbon  |
+------+---------------------+

SELECT id, title FROM bi_second_has_digit_demo WHERE MATCH('kindlesignature');

Empty set

This is often the better fit for mixed model identifiers, because real catalog data frequently includes forms like 5se, 80d, or x1 rather than only clean numeric suffixes like 24.

How to choose between the two

If your search problem is specifically "How do I make xt850 find xt 850?", the practical rule is:

use second_numeric when the second token is digits-only
use second_has_digit when the second token may be mixed, like 5se, 80d, or x1

There is one practical caveat: this is compatible with other common text-processing settings in the straightforward case. xt 850 still matches xt850 with morphology='stem_en' enabled and with a wordforms rule enabled.

But that does not mean those settings rewrite the glued query for you. In tests, iphones 5 matched iphones5, but not iphone5, even with stemming or a wordforms rule mapping iphones to iphone. So the short version is: basic xt 850 vs xt850 matching stays compatible with morphology and wordforms, but if you rely on them, test the exact query shape you care about.

Final takeaway

The xt850 problem is not really about one product name. It is about a broader mismatch between how users type model names and how search engines tokenize them.

Since version 23.0.0, Manticore gives you a built-in way to handle that mismatch with bigram_delimiter plus the digit-aware bigram_index modes, which is much cleaner than duplicating fields or inventing custom preprocessing pipelines.

If your main problem is phrase-search performance rather than glued model-name matching, see How to Speed Up Phrase Search with bigram_index.

How to Speed Up Phrase Search with bigram_index

Sergey Nikolaev — Thu, 07 May 2026 08:50:15 +0000

TL;DR

bigram_index can be used for several purposes, and in this article we focus specifically on phrase-search performance: on the 1M-document benchmark below, bigram_index='all' improved QPS by about 2.9x and cut average phrase-query latency by about 3.2x.

If your main problem is matching xt850 against xt 850 rather than speeding up phrase search, see How to Make xt850 Match xt 850.

Phrase search can be expensive. Even when a query is short, the engine still has to verify ordering and adjacency, and that work gets more noticeable when:

the individual words are common
the dataset is large
phrase queries are frequent in your workload

That is exactly what bigram_index is for.

What bigram indexing actually does

Normally, a phrase like "noise cancelling headphones" is handled as separate tokens that also need to appear in the right order and next to each other. Bigram indexing lets Manticore pre-store adjacent token pairs such as:

noise cancelling
cancelling headphones

That gives the engine a faster way to narrow down candidate documents during phrase matching.

This article focuses specifically on phrase acceleration.

Important caveat: bigrams work at tokenization level

This is the part that is easy to miss when you only look at the happy-path speedup story.

bigram_index works at the tokenization level only. It does not account for later transformations such as morphology, wordforms, or stopwords, and that can materially change phrase-matching expectations.

The practical conclusion is simple: bigrams can be excellent for phrase speed, but if your index relies heavily on morphology, wordforms, or stopwords, test the actual phrase behavior you care about before rolling the setting out broadly.

Mode 1: Default behavior

This is the baseline. No explicit bigram indexing is enabled, so no bigram posting lists are stored.

Use it when:

phrase search is rare
documents are short
you want the leanest indexing path

Example

DROP TABLE IF EXISTS bi_none_demo;

CREATE TABLE bi_none_demo(title text);

INSERT INTO bi_none_demo VALUES
  (1,'wireless noise cancelling headphones'),
  (2,'noise cancelling microphone'),
  (3,'wireless gaming headset');

SELECT id, title FROM bi_none_demo WHERE MATCH('"noise cancelling"');

This is the baseline behavior. The query matches the expected rows, but Manticore has no precomputed bigram posting lists to help resolve the phrase more efficiently.

Mode 2: `all`

bigram_index = all

This is the most aggressive phrase-acceleration mode. Every adjacent token pair gets indexed as a bigram.

Use it when:

exact phrase search is a core feature
phrase queries often include common words and produce many candidates
you want the strongest phrase acceleration
you do not want to tune a frequent-word list

Example

DROP TABLE IF EXISTS bi_all_demo;

CREATE TABLE bi_all_demo(title text)
  bigram_index='all';

INSERT INTO bi_all_demo VALUES
  (1,'lord of the rings trilogy'),
  (2,'house of the dragon season 2'),
  (3,'made for iphone charger');

SELECT id, title FROM bi_all_demo WHERE MATCH('"house of the dragon"');
SELECT id, title FROM bi_all_demo WHERE MATCH('"made for iphone"');

The important point here is not different matches, but different indexing strategy: all stores every adjacent pair, so phrase queries have the maximum amount of bigram help available at search time.

The reason to choose all is when phrase search becomes more expensive because many documents match the individual words, and Manticore then has to do more positional verification to confirm the exact phrase. all helps by narrowing candidates earlier.

Mode 3: `first_freq`

bigram_index = first_freq
bigram_freq_words = for, of, the, with

This mode stores a pair only when the first token is in your frequent-word list.

Use it when:

phrase search matters
you want a lighter alternative to all
many phrases in your data contain words that are genuinely frequent in your own corpus

With the list above:

for iphone is eligible
of the is eligible
the dragon is eligible
made for is not eligible
lord of is not eligible

For production use, do not pick bigram_freq_words from memory. Derive it from your own data. A practical way is to dump dictionary stats with indextool using --dumpdict ... --stats, review the most frequent tokens, and then build a small bigram_freq_words list from those results.

Example

DROP TABLE IF EXISTS bi_first_freq_demo;

CREATE TABLE bi_first_freq_demo(title text)
  bigram_index='first_freq'
  bigram_freq_words='for,of,the,with';

INSERT INTO bi_first_freq_demo VALUES
  (1,'made for iphone charger'),
  (2,'lord of the rings trilogy'),
  (3,'house of the dragon season 2');

SELECT id, title FROM bi_first_freq_demo WHERE MATCH('"made for iphone"');
SELECT id, title FROM bi_first_freq_demo WHERE MATCH('"lord of the"');

The queries still return the expected rows. What changes is which pairs get indexed:

"made for iphone" benefits from for iphone
"lord of the" benefits from of the

This makes first_freq a lighter alternative to all when many useful phrases involve common bridge words.

Mode 4: `both_freq`

bigram_index = both_freq
bigram_freq_words = for, of, the, with

This is the narrowest frequency-based mode. A pair is stored only when both tokens are in the frequent-word list.

Use it when:

you want the most conservative bigram footprint
you mainly care about pairs built from words that are highly frequent in your corpus
you are tuning a large corpus and do not want to index every adjacent pair

With the same list:

of the is eligible
for iphone is not eligible
the dragon is not eligible

Example

DROP TABLE IF EXISTS bi_both_freq_demo;

CREATE TABLE bi_both_freq_demo(title text)
  bigram_index='both_freq'
  bigram_freq_words='for,of,the,with';

INSERT INTO bi_both_freq_demo VALUES
  (1,'lord of the rings trilogy'),
  (2,'house of the dragon season 2'),
  (3,'made for iphone charger');

SELECT id, title FROM bi_both_freq_demo WHERE MATCH('"lord of the"');
SELECT id, title FROM bi_both_freq_demo WHERE MATCH('"made for iphone"');

The queries still match, but the internal selectivity differs:

"lord of the" includes of the, which both_freq is willing to store
"made for iphone" includes for iphone, which first_freq would cover but both_freq would not

Which performance mode should you choose?

The benchmark in this article shows that all can deliver a strong speedup, but it is still just one benchmark on one workload.

Manticore's own documentation says that for most use cases, both_freq is the best mode. That is a sensible default because it aims for a more balanced trade-off between phrase acceleration and indexing cost.

Use the modes like this:

choose both_freq as the default starting point for general phrase-search workloads
choose all when phrase search is especially important and you want the strongest acceleration, accepting higher indexing cost
choose first_freq when many useful phrases in your data involve common bridge words and you want something broader than both_freq
choose the default behavior when phrase acceleration is not important

Benchmark: does bigram indexing really speed up phrase search?

Yes. In a simple local benchmark, the difference was easy to measure.

I used manticore-load to build two 1M-document tables against the same Manticore instance:

one with no explicit bigram_index setting
one with bigram_index='all'

The documents were random 60-80 word texts, and the benchmark repeatedly ran random 2-word phrase queries.

For clarity, both indexing and search were run with --threads=1. Multi-threaded numbers would of course be higher, but single-thread runs make it easier to see what the feature changes on one CPU core.

SELECT COUNT(*) FROM bench_bigram_* WHERE MATCH('"<text/2/2>"')

Benchmark setup

Data load without bigrams:

manticore-load \
  --drop \
  --wait \
  --threads=1 \
  --batch-size=1000 \
  --total=1000000 \
  --init="CREATE TABLE bench_bigram_none_rand(title text)" \
  --load="INSERT INTO bench_bigram_none_rand(id,title) VALUES(<increment>,'<text/60/80>')"

Data load with all bigrams:

manticore-load \
  --drop \
  --wait \
  --threads=1 \
  --batch-size=1000 \
  --total=1000000 \
  --init="CREATE TABLE bench_bigram_all_rand(title text) bigram_index='all'" \
  --load="INSERT INTO bench_bigram_all_rand(id,title) VALUES(<increment>,'<text/60/80>')"

Search benchmark without bigrams:

manticore-load \
  --threads=1 \
  --total=5000 \
  --load="SELECT COUNT(*) FROM bench_bigram_none_rand WHERE MATCH('\\\"<text/2/2>\\\"')"

Search benchmark with all bigrams:

manticore-load \
  --threads=1 \
  --total=5000 \
  --load="SELECT COUNT(*) FROM bench_bigram_all_rand WHERE MATCH('\\\"<text/2/2>\\\"')"

What I observed

On this local run:

Table	QPS	Avg latency
`bench_bigram_none_rand`	`755`	`1.3 ms`
`bench_bigram_all_rand`	`2175`	`0.4 ms`

That is roughly a 2.9x improvement in QPS and about a 3.2x improvement in average latency on the same 1M-document workload.

Indexing was slower with bigram_index='all', which is expected:

without bigrams: about 45k docs/sec
with all: about 17k docs/sec

That trade-off is exactly why multiple modes exist.

Final takeaway

If your main problem is phrase-search performance, treat bigram_index first and foremost as an acceleration feature.

For most real workloads, start with both_freq and measure. Move to all if you need a stronger effect and can afford the extra indexing cost. Consider first_freq when your phrase workload is heavily shaped by common bridge words.

Build a Searchable Catalog with Filters, Facets, and Semantic Search

Sergey Nikolaev — Wed, 06 May 2026 06:17:07 +0000

A search box is easy. A searchable catalog that keeps being useful after the first query is the harder part.

That is the problem this demo takes on. It uses a small board-game catalog, but the shape of the problem is familiar: users type something half-remembered, misspell it, narrow by constraints, keep browsing, open a result, then want "more like this" without starting over. If your product has that flow, most of the work is not the UI polish. It is getting the search behavior right without turning the stack into a science project.

In this article, we build a searchable catalog with autocomplete, typo tolerance, filters, facets, deep pagination, semantic search, and similar-item recommendations.

You can try the hosted version first:

https://catalog.manticoresearch.com

The app itself is implemented in PHP, but that is not really the story here. The interesting part is how little ceremony you need to get from a basic query box to something that already feels like a working catalog: search, filters, facets, and similar-item discovery all show up quickly.

Run it locally

To run the same demo locally, you only need PHP 8.1+, Composer, and Docker (or any other way to run Manticore).

In this setup, Manticore is the search engine behind the catalog: it handles indexing, filtering, faceting, and semantic retrieval. The repo already includes a Docker setup for it, so the quickest way to get the demo running is to clone the repo and start Manticore from the project root:

git clone https://github.com/manticoresoftware/php-catalog-demo
cd php-catalog-demo
docker compose up -d

docker compose ps should show the container as running.

Inside the cloned repo, create the app environment file:

cp app/.env.example app/.env

For a local run, the important part is just how the app reaches Manticore:

MANTICORE_HOST=127.0.0.1
MANTICORE_PORT=9308

Install dependencies:

cd app
composer install

The demo reads those settings and creates a Manticore client:

$settings = require $root . '/config/settings.php';

$client = new Client([
    'host' => $settings['manticore']['host'],
    'port' => $settings['manticore']['port'],
    'transport' => 'Http',
]);

Then load the demo dataset:

php bin/bootstrap-demo.php

That command recreates the demo table and imports the starter catalog, so you begin from a known state instead of debugging old data.

Start the app:

php -S localhost:8081 -t public

Open http://localhost:8081/ and you have a working catalog to search.

Not glamorous. Still worth it. A lot of search demos lose people before the first query because setup sprawls. This one does not need much.

What makes the app feel usable

The part I care about most is not that the demo returns results. Plenty of demos do that. It is that the search flow holds together as users get more specific.

Start with autocomplete

People usually begin with fragments. Sometimes they remember the exact game title. Often they do not.

So the first layer is autocomplete:

$payload = [
    'body' => [
        'query' => $term,
        'table' => $this->tableName,
        'options' => ['limit' => $limit, 'force_bigrams' => 1],
    ],
];
$suggestions = $this->client->autocomplete($payload);

Using force_bigrams here helps tighten typo-tolerant matching for short or slightly wrong input, which is exactly where autocomplete can otherwise get mushy.

This is a small feature, but it changes the feel of the app immediately. Users stop guessing what your catalog calls things.

Make the first results page forgiving

Once the query is submitted, the first page needs to be useful even when the spelling is off by a bit.

$search = (new Search($this->client))
    ->setTable($this->tableName)
    ->limit($limit);

if ($query !== '') {
    $search->search($query);
    if ($fuzzy) {
        $search->option('fuzzy', 1)->option('force_bigrams', 1);
    }
} else {
    $search->search('*');
}

Fuzzy mode is doing plain practical work here: recovering close matches when users do not type the title exactly right.

If you want the lower-level details, see Spell correction and fuzzy search.

Let users narrow without rewriting

This is where many search interfaces get annoying. The query is close enough, but the result set is still too broad, so now the user has to reformulate it from scratch.

Better to let them narrow in place.

Range filters handle constraints like price, player count, play time, and release year. Facets expose the shape of the current result set so users can click into categories or tags instead of thinking up a more precise sentence.

$attributeFilters = [
    'price_min' => $priceMin,
    'price_max' => $priceMax,
    'play_time_min' => $playTimeMin,
    'play_time_max' => $playTimeMax,
    'player_count_min' => $playerCountMin,
    'player_count_max' => $playerCountMax,
    'release_year_min' => $yearMin,
    'release_year_max' => $yearMax,
];

if ($categoryIds !== []) {
    $search->filter('category_id', 'in', $categoryIds);
}
if ($tagIds !== []) {
    $search->filter('tag_id', 'in', $tagIds);
}
$this->applyNumericFilters($search, $attributeFilters);
$search->facet('category_id')->facet('tag_id');

That combination matters more than it may look on paper. In practice, this is where the catalog starts feeling easy to use: a broad query can shrink fast once you click into a category or tag, without losing the original search intent.

Keep deep pagination stable

If people browse further, offset pagination starts showing its age. Data changes between requests, offsets get larger, and eventually "show more" becomes less trustworthy than it should be.

This demo uses scroll tokens instead:

// Page 1 starts a fresh scroll session; next pages continue with returned token.
$effectiveScrollToken = $page > 1 ? $scrollToken : null;
$search->option('scroll', $effectiveScrollToken ?? true);

$resultSet = new ResultSet($this->client->search(['body' => $body], true));
$nextScroll = $resultSet->getScroll();
$hasMore = $nextScroll !== null && (string) $nextScroll !== '';

That gives the app a much better foundation for deep pagination: each request continues from a returned token rather than recomputing larger and larger offsets.

Operationally, this is one of those choices users never notice when it works and definitely notice when it does not. More on the mechanism here: Scroll-Based Pagination.

Add semantic retrieval where keywords fail

Keyword search gets you far. It does not solve everything.

Sometimes users describe something in roughly the right language, but not in the same words your catalog uses. That is where hybrid search earns its keep.

Use hybrid search on the results page

In this demo, one request includes both a lexical query block and a semantic knn block, then combines them with reciprocal rank fusion via options.fusion_method = rrf:

$body = [
    'query' => ['bool' => ['must' => [['query_string' => ['query' => $query]]]]],
    'knn' => [
        'field' => 'description_vector',
        'query' => $query,
    ],
    'options' => ['fusion_method' => 'rrf'],
    'limit' => $limit,
];

The vector field uses auto-embeddings, so the app does not have to generate query vectors on its own:

'description_vector' => [
    'type' => 'float_vector',
    'options' => [
        'MODEL_NAME' => 'sentence-transformers/all-MiniLM-L6-v2',
        'FROM' => 'description',
    ],
],

Because the knn block names the vector field directly ('field' => 'description_vector'), Manticore can embed the query text automatically for KNN search.

That keeps the application logic simpler than many teams expect when they first hear "semantic search." It also lets the results page stay in one flow instead of bolting a separate semantic experience onto the side.

Use similar-item discovery on the detail page

The same vector field does a different job on the item page: "show me similar games" without forcing the user to invent another query. This part uses KNN directly against the current item.

$search = new Search($this->client);
$search->setTable($this->tableName)
    ->knn('description_vector', $source->getId(), self::SIMILAR_KNN_LIMIT)
    ->notFilter('id', 'in', [$source->getId()])
    ->limit(self::SIMILAR_RESULT_LIMIT);

$resultSet = $search->get();
$hits = $this->formatResultSet($resultSet)['hits'];
return array_slice($hits, 0, self::SIMILAR_RESULT_LIMIT);

That is where search stops being a utility and starts helping discovery. On a real detail page, this is the part that makes it easy to keep exploring instead of bouncing back to the search box.

For reference: Auto Embeddings and KNN Search.

Keeping writes and search results in sync

A demo app is easy to trust when the data never changes. Real apps do not get that luxury.

Here, the table stays in sync through the same application flow users and admins already touch: bootstrap for a clean baseline, batched imports from the admin UI, and update/delete actions for individual items.

Prepared imports use the client's batch write methods:

$table = $this->client->table($this->indexConfig['name']);

if ($appendAsNewIds) {
    $table->addDocuments($batch);
} else {
    $table->replaceDocuments($batch);
}

For individual item changes, the app uses the table API directly:

if ($id > 0) {
    $this->table->replaceDocument($document, $id);
} else {
    $this->table->addDocument($document);
}

$this->table->deleteDocument($id);

And if you want to reset the experiment, the admin UI can drop imported records and return to the baseline dataset:

$baseMaxId = $this->resolveBaseMaxId();

$this->table->deleteDocuments([
    'range' => [
        'id' => ['gt' => $baseMaxId],
    ],
]);

No extra background machinery in the demo, no detached sync story to explain away. Just writes going where they need to go.

Why this matters

What this demo shows is not just that Manticore can return results. It shows you can assemble a searchable catalog that feels complete: users can start loosely, narrow quickly with filters and facets, recover from imperfect queries, open an item, and keep discovering from there without the whole stack getting complicated.

That is already enough to make search feel like part of the product, not a bolt-on feature.

Why monitoring your search engine matters: Manticore ➡ Prometheus ➡ Grafana

Sergey Nikolaev — Thu, 09 Apr 2026 04:13:56 +0000

One of our users reached out recently with a familiar problem: search had suddenly become noticeably slower, even though nothing looked obviously broken.

The service was up, no errors in the logs, CPU usage looked normal — yet users were starting to complain that results felt sluggish.

This is how search problems usually show up in production. Not with a dramatic outage, but as a slow, creeping degradation. A little more traffic here, some extra indexing there, and before you know it, performance has slipped.

By the time users notice, the real issue has often been building for hours. Without good visibility you’re left guessing: Is the system overloaded? Is one table eating up resources? Or is something else quietly going wrong?

That’s why monitoring matters. It turns the vague “search feels slow” complaint into something you can actually diagnose and fix.

Introducing the Manticore Grafana dashboard

This is exactly what our new Manticore Grafana dashboard is built for.

Instead of raw metrics, it gives you a clean, practical view of what really matters when running search in production. At a glance you can see:

Is the node healthy?
How heavy is the current load?
Are queries slowing down?
Which tables are using the most resources?

It’s designed to help you move quickly from a user symptom to the actual root cause.

How the stack works

The setup is straightforward: Manticore → Prometheus → Grafana.

Manticore exposes rich internal metrics, Prometheus collects and stores them as time-series data, and Grafana visualizes everything with our pre-built dashboard — including 21 production-ready alerts.

You can launch the entire stack with a single Docker command:

docker run -e MANTICORE_TARGETS=localhost:9308 -p 3000:3000 manticoresearch/dashboard

(Just change the MANTICORE_TARGETS environment variable if your Manticore instance is running somewhere else.)

If you prefer to set things up manually, grab these files:

Minimal Prometheus scrape config:

scrape_configs:
  - job_name: "manticore"
    static_configs:
      - targets: ["localhost:9308"]

Exploring the dashboard

The dashboard is laid out so you can follow a natural troubleshooting flow.

1. Health summary (start here)

Open the dashboard and look at the top row first. It gives you an instant picture of the node’s overall health.

Key panels to watch:

Health / Up — Is Prometheus even able to scrape metrics?
Health / Crash indicator — Any recent crashes?
Workers Utilization % + Load / Queue pressure — These two together are gold. High utilization plus rising queue pressure is one of the clearest early signs the node is approaching saturation.

The System Score panel also gives you a quick overall health rating at a glance.

2. Query load and latency

Next, check what kind of workload the system is handling.

QPS Total shows overall traffic levels.
Search Latency (p95/p99) is one of the most important panels — averages can hide problems, but percentiles show what your users are really experiencing.
Slowest Thread helps spot expensive or stuck queries.
Work Queue Length and Worker Saturation together tell you whether the node is keeping up or starting to fall behind.

3. Memory and resources

This section is one of the most useful because memory pressure is a very common (and often hidden) cause of slowdowns in search engines. Instead of showing one vague number, the dashboard breaks it down so you can see exactly where the growth is happening.

Searchd RSS and Buddy RSS show the total resident memory — how much physical RAM the main search daemon (searchd) and the Buddy helper process are actually using right now.
The Anon RSS panels go one level deeper. “Anonymous” memory is the private, dynamic RAM allocated by Manticore itself (think heap, query caches, loaded data structures, temporary buffers — everything not backed by a file on disk). Unlike file-mapped memory (which the OS can page out or reclaim), anon memory is what usually puts real pressure on your system.

Why show both RSS and Anon RSS? Total RSS gives you the big picture, but Anon RSS tells you the story behind it. If total RSS is climbing but Anon RSS is stable, the growth might be harmless (e.g. more cached files). If Anon RSS is also rising fast, that’s usually a sign that Manticore’s own data structures or query activity are consuming more and more memory — exactly the kind of thing that leads to slower queries or even swapping.

At the bottom you’ll also see several quick counters:

Resources / FDs (searchd) — current number of open file descriptors used by the search daemon. Manticore opens a lot of files for indexes (especially large real-time tables with many disk chunks). If this number gets too high you can hit the OS limit and start seeing “Too many open files” errors. You can raise the soft limit with the max_open_files setting (see the Manticore docs on server settings).
Active workers, table counts, and non-served tables — all quick signals that something might need attention.

4. Table-level insights

Now zoom in on the data itself.

Document counts per table
Top 10 tables by RAM and disk usage
Tables / Health panel — this one is particularly valuable because it combines docs, RAM, disk, and state flags (locked/optimizing) in a single view.

5. Cluster state and history

For distributed setups you get node status and sync state. The history section is excellent for answering the most important question during any incident: what changed right before things slowed down?

Conclusion

Remember the user who reached out because search had suddenly become noticeably slower?

Once he enabled this dashboard, the problem became obvious almost immediately: workers were getting busier, queues were growing, and memory pressure was building — all before any obvious errors or crashes appeared. With clear visibility into what was actually happening inside the engine, he quickly pinpointed the root cause, made the right adjustments, and got performance back to the fast, reliable level his users expected.

The real value of monitoring isn’t just seeing pretty graphs. It’s catching those creeping issues early — before they cost you money or customers.

This dashboard removes that blind spot. It gives you the visibility you need to keep your search fast and reliable.

Monitor Manticore Search in Grafana with One Command

Sergey Nikolaev — Wed, 08 Apr 2026 02:27:24 +0000

The most annoying kind of incident is when database doesn’t go down completely - it just gets slower.

Users start noticing it right away. Complaints come in. Everything is technically still running, but clearly something is off.

And that is usually the hardest part: not noticing the problem, but figuring out what is actually happening.

When everything looks fine, but search is still slow

Let’s take a pretty normal scenario.

Search starts slowing down. It is not crashing. It is not returning obvious errors. The service is up. From the outside, nothing looks broken in a dramatic way.

But users can feel it.

So you open your monitoring:

CPU looks fine.
Average latency does not look too bad.
No obvious alerts.

At first glance, nothing really explains the slowdown.

So you keep digging...

You check the queue. Nothing jumps out immediately.
You look at worker usage. They are busy, but not in a way that tells you much on its own.
You check the logs. Still nothing obvious.

And after a while you get to that frustrating point where you realize you have already checked the usual things, and you still do not know where the problem is.

Each metric, by itself, looks more or less okay. But together, the system is clearly degrading.

So now you are no longer following a clear line of investigation. You are just checking everything you can think of and hoping the pattern shows up.

Meanwhile, time is passing.

What was actually going on

A couple of hours later, the picture finally starts to make sense.

It turns out:

the request queue has been slowly growing;
workers have been sitting near 100% utilization;
one heavy query keeps blocking execution from time to time;
p99 latency is much worse than the average suggests;
and one of the nodes restarted recently.

So the signals were there all along.

The problem was that they were scattered across different places, and it took too long to connect them into one clear story.

The solution: see the whole picture right away

Instead of spending hours piecing all of that together by hand, it is much better to have one place where the important signals are already visible.

That is why we put together a ready-to-use dashboard for Manticore Search that starts with a single Docker command. It comes with Grafana, Prometheus, a preconfigured data source, and built-in alerts.

docker run -p 3000:3000 manticoresearch/dashboard

Environment variables

The container supports two environment variables:

MANTICORE_TARGETS - comma-separated list of Manticore Search instances (default: localhost:9308)
GF_AUTH_ENABLED - set to true to enable Grafana login (by default, anonymous admin access is enabled)

Example:

docker run -p 3000:3000 \
  -e MANTICORE_TARGETS=your-host:9308 \
  manticoresearch/dashboard

If you monitor multiple nodes, pass them as a comma-separated list:

docker run -p 3000:3000 \
  -e MANTICORE_TARGETS=node1:9308,node2:9308,node3:9308 \
  manticoresearch/dashboard

If Manticore is running on a remote server

By default, the dashboard expects Manticore at localhost:9308. If your instance is running on a remote machine, the simplest option is SSH port forwarding:

ssh -L 9308:localhost:9308 user@your-server

After that, local connections to localhost:9308 will be forwarded to the remote server, so the dashboard can connect without additional changes.

A minute later, you have a usable overview of your system.

Not just a pile of graphs, but a dashboard that helps you quickly answer the questions you actually care about when something feels wrong.

You can see queue growth, worker saturation, latency, process state, and query behavior in one place, instead of bouncing between tools and trying to stitch the story together in your head.

What the dashboard shows

The value here is not that there are a lot of panels. The value is that the panels answer the right questions quickly.

The first place to look is the overall system view:

This gives you the basic picture right away:

is the service up;
has it restarted recently;
is there queue pressure;
are workers already under load.

If this row looks healthy, maybe the issue is narrow and local. If it does not, you know right away that the system is under real pressure.

Then you move to load and query behavior:

This is where you can quickly see:

whether work is starting to pile up;
whether workers are saturated;
whether latency is getting worse, especially p95 and p99;
whether one slow thread is causing a disproportionate amount of trouble.

And if you need more context, you can drill down into the rest of the dashboard:

cluster state:

tables and data:

At that point, you are no longer looking at disconnected metrics. You are looking at the system as a whole.

Why this matters

In the kind of situation that used to cost you a couple of hours just to understand, now you can usually spot the direction in a few minutes.

You can see that the queue is growing.
You can see that workers are pinned.
You can see that p99 is climbing.
You can see that one node restarted.
You can see that one query is probably doing most of the damage.

That does not mean the dashboard magically fixes the issue for you.

What it does do is remove the slowest part of the whole process: figuring out where to look.

And in practice, that is often the difference between spending two hours trying to understand the incident and spending five minutes getting to the real problem.

Parallel chunk merging in Manticore Search

Sergey Nikolaev — Tue, 07 Apr 2026 03:52:38 +0000

Starting from Manticore Search 24.4.0, RT table compaction has a more capable execution model. Instead of merging chunk pairs one-by-one in a serial flow, optimization now supports two important improvements:

disk chunk merges can run in parallel
each merge job can merge more than two chunks at once
parallel_chunk_merges: how many RT disk chunk merge jobs may run at the same time
merge_chunks_per_job: how many RT disk chunks a single job can merge in one pass

The compaction docs were also updated to describe optimization as an N-way merge handled by a background worker pool rather than a single serial merge thread.

Why this matters

For RT workloads, the interesting number is often not just how fast you can insert documents, but how long it takes until compaction catches up and the table returns to its target chunk count.

That is especially noticeable when:

you ingest data at a sustained rate
optimize_cutoff is low enough that merges kick in early
you wait for compaction to finish before considering the load fully complete

This matters most in two common cases:

you are doing an initial bulk upload into a real-time table and want the table not just searchable, but already compacted to its steady state before putting more pressure on it
you regularly ingest large batches and want each batch to finish cleanly before the next one arrives

The table is searchable before compaction finishes, but "fully searchable" and "fully optimized" are not the same thing. A higher chunk count can still matter if you care about keeping the table close to its target shape, limiting background merge work before the next ingest wave, or reducing the window where storage is busy with post-load compaction.

To show the difference, we loaded 10 million documents into an RT table. Each document contains:

id bigint
name text with generated text between 10 and 100 words
type int

The table was created with:

CREATE TABLE test(id bigint, name text, type int) optimize_cutoff='16'

So the target was to compact the table back down to roughly 16 disk chunks.

For the benchmark we used manticore-load, our load generation and benchmarking tool. It is useful for reproducing scenarios like this, stress-testing ingestion, and comparing configuration changes without building custom scripts every time.

The data was loaded with:

manticore-load \
  --cache-gen-workers=5 \
  --drop \
  --batch-size=1000 \
  --threads=5 \
  --total=10000000 \
  --init="CREATE TABLE test(id bigint, name text, type int) optimize_cutoff='16'" \
  --load="INSERT INTO test(id,name,type) VALUES(<increment>,'<text/10/100>',<int/1/100>)" \
  --wait

Before: one merge job, two chunks at a time

With the old behavior forced explicitly:

mysql -P9306 -h0 -e "set global parallel_chunk_merges=1; set global merge_chunks_per_job=2"

the run looked like this:

merging started at 14 seconds, when about 1.8M documents had been inserted
all 10M documents were loaded after 1 minute 18 seconds
at that point the data was already fully searchable
compaction kept running in the background until 3 minutes 23 seconds

At 01:18, the table still had more than 50 chunks. Near the end of loading the status looked like:

17:14:50  01:17     98%         133      128.4K   21%     5          53        1         4.22GB      9.9M
17:14:51  01:18     100%        131      310.9K   15%     1          53        1         4.27GB      10.0M
...
17:16:55  03:22     100%        0        49.4K    4%      1          17        1         4.27GB      10.0M
...
Total time:       03:23

This is the classic pattern of a healthy ingest pipeline followed by a long merge tail.

After: parallel merges plus larger merge jobs

With the new settings:

mysql -P9306 -h0 -e "set global parallel_chunk_merges=3; set global merge_chunks_per_job=5"

the same workload finished much faster:

merging again started at about 14 seconds
all 10M documents were again loaded after about 1 minute 18 seconds
full compaction finished after only 1 minute 31 seconds

The end of the run looked like this:

17:19:22  01:17     99%         127      127.9K   28%     6          26        1         4.22GB      9.9M
17:19:23  01:18     100%        132      1883.8K  17%     1          23        1         4.25GB      10.0M
...
17:19:36  01:31     100%        0        110.2K   3%      1          17        1         4.25GB      10.0M
...
Total time:       01:31

What changed in practice

The ingest phase itself stayed roughly the same:

old settings: 1:18 to load all data
new settings: 1:18 to load all data

The big gain came from post-ingest compaction:

old settings: about 2:05 of additional merge time after loading finished
new settings: about 0:13 of additional merge time after loading finished

That is roughly:

55% lower total time overall, from 3:23 down to 1:31
about 90% less merge tail after the last document was inserted

Chunk pressure during ingest was much lower too. Near the end of loading:

old settings: 53 chunks
new settings: 23 chunks

So the improvement is not just that compaction finishes sooner. It also keeps the chunk count under control much more aggressively while data is still being inserted.

What about the new defaults?

On this server, with the new default settings and no explicit tuning at all, the same workload finished in:

Total time:       01:57

That already cuts the old 03:23 result substantially, while still leaving room for additional tuning with:

parallel_chunk_merges
merge_chunks_per_job

In other words, the new defaults already improve the out-of-the-box experience, and systems with enough I/O headroom can push compaction even further by increasing both settings carefully.

Broader benchmark results: row-wise and columnar storage

The 10M-document example above shows the mechanics clearly, but the larger picture is even more interesting. In a wider test matrix we measured the combined load + optimize time for both row-wise and columnar storage across multiple values of:

parallel_chunk_merges
merge_chunks_per_job

The headline result is that, in some cases, tuning these settings can reduce total load + optimize time by:

up to 4.6x for row-wise storage
up to 6.8x for columnar storage

Here is the best-vs-worst picture from that test set:

Storage	Best settings	Best time	Slowest settings	Slowest time	Improvement
Row-wise	`parallel_chunk_merges=4`, `merge_chunks_per_job=5`	14:35	`parallel_chunk_merges=1`, `merge_chunks_per_job=2`	67:15	4.61x
Columnar	`parallel_chunk_merges=4`, `merge_chunks_per_job=5`	15:10	`parallel_chunk_merges=1`, `merge_chunks_per_job=2`	99:14	6.80x

There is also a useful tuning pattern in the full results:

the best runs for both storage modes clustered around parallel_chunk_merges=4..5
the best runs also clustered around merge_chunks_per_job=4..5
the slowest results were consistently at parallel_chunk_merges=1 with merge_chunks_per_job=2

In other words, the old serial two-chunk pattern is not just a little slower. On large workloads it can become dramatically slower, especially with columnar storage.

How to think about the two settings

The new docs describe two separate levers:

parallel_chunk_merges increases how many merge jobs can run at once
merge_chunks_per_job increases how many chunks each job can consume

Lower merge_chunks_per_job values make it easier to schedule more jobs in parallel because each job consumes fewer chunks from the available pool. If a table has many chunks waiting to be compacted, smaller jobs leave more independent chunks available for other workers, so the scheduler can keep several merges active at once. Higher values reduce the total number of merge steps, but each job becomes heavier and grabs a larger portion of the available chunks, which can leave less room for concurrent jobs.

The right balance depends on your storage and workload, but the benchmark above shows that combining both approaches can dramatically reduce the time spent waiting for RT chunk compaction to finish.

Takeaway

If your RT workloads spend too long waiting for chunk compaction after bulk inserts, the new parallel merge model changes that equation significantly.

On this 10M-document test with optimize_cutoff=16:

Mode	Searchable at	Fully optimized at
Old settings: `parallel_chunk_merges=1`, `merge_chunks_per_job=2`	1:18	3:23
New defaults	1:18	1:57
Tuned new settings: `parallel_chunk_merges=3`, `merge_chunks_per_job=5`	1:18	1:31

the time until all data became searchable stayed the same
the time until chunk compaction completed dropped from 3:23 to 1:31
even the new defaults reduced the total time to 1:57

This is exactly the kind of improvement that matters for operational RT indexing. The data is searchable as soon as it is loaded, and that point stayed about the same in both runs. The difference is what happens after that: how long the server keeps spending time compacting chunks in the background before the table returns to its target shape. If your workflow depends on the table becoming compact again before the next heavy ingest, before a maintenance window closes, or before you hand the system over to a search workload that should run with fewer chunks and less background merge pressure, the improvement is substantial.

S3 Streamable Backup: Direct-to-Cloud Backups for Manticore Search

Sergey Nikolaev — Mon, 06 Apr 2026 08:24:45 +0000

Since we introduced the backup tool in Manticore Search 6, backing up your data has become significantly easier. But we kept hearing the same question: "What about cloud storage?" Today, we're excited to announce that manticore-backup now supports S3-compatible storage with streaming uploads — no intermediate files, no local disk space headaches, just direct-to-cloud backups.

The Problem with Traditional Backups

When you're running Manticore Search in production, your datasets can grow quickly. Backing up to local storage has its limitations:

Disk space constraints: You need free space equal to your backup size on the same machine
Manual transfer steps: Backup locally, then upload to cloud storage
Time overhead: The copy-then-upload dance doubles your backup window
Complexity: Scripting reliable uploads with resume capability, encryption, and error handling

Streamable S3 Backup: How It Works

The new S3 storage support streams your backup data directly to S3-compatible storage. Here's what happens under the hood:

No intermediate files: Data streams from Manticore straight to S3
Automatic multipart uploads: Large files are automatically chunked and uploaded in parallel
Built-in encryption: SSE-S3 encryption is enabled by default for AWS S3 (configurable for other providers)
Compression support: Optional zstd compression reduces transfer time and storage costs
Manifest-based restore: No s3:ListBucket permission required for restores

Supported Storage Providers

We've tested with AWS S3, MinIO, and Cloudflare R2, but any S3-compatible storage should work. The implementation uses the standard AWS SDK for PHP, so if it speaks the S3 API, it should work.

Usage

Using S3 backup is as simple as changing your destination path:

CLI

# Set your credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1

# Backup to S3
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups

# With custom endpoint (MinIO, Wasabi, etc.)
export AWS_ENDPOINT_URL=https://minio.example.com
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/backups

Environment Variables

Variable	Description
`AWS_ACCESS_KEY_ID`	Your S3 access key
`AWS_SECRET_ACCESS_KEY`	Your S3 secret key
`AWS_REGION`	S3 region (e.g., `us-east-1`)
`AWS_ENDPOINT_URL`	Custom endpoint for S3-compatible storage
`AWS_S3_ENCRYPTION`	Set to `0` to disable SSE-S3 encryption (for MinIO/custom endpoints)

Performance Considerations

S3 streaming backup performance depends primarily on your network bandwidth and the S3 provider's upload speeds. Unlike local disk backups where you're limited by disk I/O, S3 backups are network-bound. The key advantage is eliminating the "write locally, then upload" overhead — data streams directly from Manticore to S3 without touching the local filesystem.

For optimal performance:

Ensure adequate upload bandwidth to your S3 endpoint
Consider using compression (--compress) to reduce data transfer
Multipart uploads are automatic for files over 5MB, improving reliability for large datasets

Restore from S3

Restoring works seamlessly too. The tool downloads files to a temporary directory first, then performs the restore:

# List available backups
manticore-backup --backup-dir=s3://my-bucket/manticore-backups --list

# Restore a specific backup
manticore-backup --config=/etc/manticore/manticore.conf --backup-dir=s3://my-bucket/manticore-backups --restore=backup-20250115120000

Required S3 Permissions

For backup:

s3:PutObject
s3:PutObjectAcl (if using ACLs)

For listing backups:

s3:ListBucket

For restore:

s3:GetObject

Note: While listing backups requires s3:ListBucket, restoring a specific backup does not. If you know the backup folder name (e.g., backup-20250115120000), you can restore directly using --restore with just s3:GetObject permission. The manifest file tracks all backup contents, so no directory listing is needed.

Use Cases

Cloud-Native Deployments

Running Manticore in Kubernetes or Docker? S3 backup fits naturally into cloud-native workflows:

# Kubernetes CronJob example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: manticore-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: manticoresearch/manticore:latest
            command:
            - manticore-backup
            - --config=/etc/manticore/manticore.conf
            - --backup-dir=s3://my-backup-bucket/manticore
            env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: s3-credentials
                  key: secret-key
          restartPolicy: OnFailure

Disaster Recovery

Store backups in a different region or even a different cloud provider:

# Primary backup to local S3-compatible storage
export AWS_ENDPOINT_URL=https://minio.internal.company.com
manticore-backup --backup-dir=s3://backups-primary/manticore

# Secondary backup to AWS S3 for DR
unset AWS_ENDPOINT_URL
export AWS_REGION=eu-west-1
manticore-backup --backup-dir=s3://company-dr-backups/manticore

Reducing Local Storage Requirements

For large datasets, local backup storage can be expensive. With S3 streaming:

No need to provision large backup volumes
Pay only for the S3 storage you use
Lifecycle policies can automatically move old backups to cheaper storage classes

Technical Details

Streaming Architecture

The S3 storage implementation uses a streaming approach:

File-by-file streaming: Each table file is read and uploaded as a stream
Automatic multipart: Files over 5MB automatically use multipart upload for reliability
Compression on-the-fly: If enabled, zstd compression happens during the stream
Checksum verification: Each file is checksummed to ensure integrity

Storage Interface

The S3 support is built on a new StorageInterface that abstracts storage operations. This means:

Local filesystem and S3 share the same code path
Future storage backends (GCS, Azure Blob) can be added easily
Consistent behavior regardless of storage type

Migration from Local Backups

Already using local backups? Migration is straightforward:

Set up your S3 credentials
Change --backup-dir from /local/path to s3://bucket/path
That's it! The same commands work exactly the same way

Your existing local backups remain accessible, and you can gradually transition to S3 or maintain both for redundancy.

Conclusion

S3 streamable backup brings Manticore Search backup capabilities to the cloud era. Whether you're running in a cloud-native environment, need cross-region disaster recovery, or simply want to reduce local storage overhead, direct-to-S3 streaming makes backups simpler and more efficient.

The feature is available now in manticore-backup. Check out the documentation for more details, and let us know what you think!

Ready to try it? Install Manticore Search and start backing up to S3 today. Questions or feedback? Join us on Slack or GitHub.

Prepared statements in Manticore Search

Sergey Nikolaev — Fri, 03 Apr 2026 04:38:10 +0000

Imagine you're building a powerful search application. Users type in keywords, and your backend needs to query the Manticore Search database to find matching results. A common (and tempting!) approach is to embed user input directly into your SQL queries. For example, you might filter by a numeric field such as a category or record ID. If the user passes a normal value like 5, the query is SELECT * FROM products WHERE id=5. But what if they pass 1 OR 1=1? The query becomes SELECT * FROM products WHERE id=1 OR 1=1 — the condition is always true, so the query returns every row instead of one. This is SQL injection.

Fortunately, there's a safer and more efficient way: prepared statements. Essentially, prepared statements separate your SQL code from the data you pass in. Instead of building the entire query string each time, you define the query structure once with placeholders and then supply the search terms separately. You can learn more about the concept on Wikipedia.

Manticore Search supports prepared statements over the standard MySQL protocol, giving you a powerful tool for building secure search applications. By using prepared statements, you'll not only dramatically reduce the risk of SQL injection, but you'll also improve the readability of your code.

Prepared statements aren't just a feature; they're sometimes a requirement. For example, the Rust sqlx library works with the MySQL endpoint solely using prepared statements. Also, some OLE DB connectors that enable MS SQL to work with a MySQL server use prepared statements internally.

Why Use Prepared Statements?

Security First (SQL Injection): SQL injection is a web security vulnerability that allows attackers to interfere with the queries an application makes to its database. It happens when user input is improperly incorporated into a SQL query, allowing malicious code to be executed. For example, consider a simple search query built by concatenating a user's search term directly into the SQL:

// Vulnerable code example (DO NOT USE!)
$productId = $_GET['search'];
$query = "SELECT * FROM products WHERE id= " . $productId;

If $productId contained something like 0 OR 1=1, the query would become SELECT * FROM products WHERE id= 0 OR 1=1, effectively bypassing the WHERE clause and returning all products.

Prepared statements prevent this by treating user input strictly as data, not as part of the SQL command itself. The database driver handles the escaping and quoting, ensuring that any potentially harmful characters are neutralized. Here's the same query using a prepared statement:

// Secure code example using a prepared statement
$productId = $_GET['search'];
$stmt = $mysqli->prepare("SELECT * FROM products WHERE id= ?");
$stmt->bind_param("i", $productId);
$stmt->execute();

In this case, even if $productId contains malicious code, it will be treated as a literal value, not executable SQL.

How They Work

Prepared statements operate using a simple three-step process:

Prepare: First, you send the SQL statement with placeholders (like ? or ?VEC?) to Manticore Search. Manticore parses this statement and creates a query plan. It then returns a unique identifier for this prepared statement.
Bind: Next, you send the actual data – the values for the placeholders – to Manticore separately. This is where the security comes in; the data is treated purely as data, not as SQL code.
Execute: Finally, you instruct Manticore to execute the prepared statement using the stored query plan and the bound parameters.

Think of it like creating a template. You build the structure once, then fill in the blanks with different information each time you need to use it.

Parameter Placeholders: `?` & `?VEC?`

Manticore Search uses specific placeholders to identify parameters within your prepared statements:

? represents a single parameter – this could be an integer, a floating-point number, or a string. When using this placeholder, Manticore automatically handles escaping and quoting for string values, protecting against SQL injection and ensuring proper data formatting.
?VEC? is designed for lists of numeric values. It expects a string containing numbers separated by commas and optional spaces (e.g., 1, 2.3, 4, 1e-10, INF). Crucially, no escaping or quoting is applied to the values within ?VEC?. Valid input consists solely of numbers, commas, and spaces; any other characters will likely result in an error. This makes it perfect for directly inserting numeric vectors into your data - both float vectors and integer MVAs (multi-value attributes).

Example: prepared statements in PHP

Let's see how prepared statements work in practice using PHP. We'll demonstrate both a simple insert with string values and a more complex insert involving a floating-point vector using the ?VEC? placeholder.

First, a basic insertion:

<?php
// Assuming you have a valid MySQLi connection established ($mysqli)

$stmt = $mysqli->prepare("INSERT INTO products (name, description) VALUES (?, ?)");
$productName = "Awesome Widget";
$productDescription = "A truly amazing widget for all your needs.";
$stmt->bind_param("ss", $productName, $productDescription); // "ss" indicates two strings
$stmt->execute();

echo "Product added successfully!";
?>

This code prepares the INSERT statement, binds the string values for the product name and description, and then executes the query. The resulting SQL executed by Manticore would be:

INSERT INTO products (name, description) VALUES ('Awesome Widget', 'A truly amazing widget for all your needs.');

Now, let's tackle an example using a float vector. What is ?VEC?? It is a placeholder (only used in prepared statements) for a vector — a list of numbers, e.g. for embeddings or similar data. In Manticore SQL, a vector literal is always written with parentheses: (0.1, 0.2, 0.3). So when you use a prepared statement and have a vector parameter, you write those parentheses in the SQL string and use ?VEC? where the numbers go. You bind only the comma-separated numbers (e.g. "0.1,0.2,0.3"); you do not bind the ( and ) — they stay in the query. Without prepared statements you would build the full literal (0.1, 0.2, 0.3) yourself in the query string.

In PHP mysqli, the usual way to bind ?VEC? values is as strings, so iss is the normal choice in this example. If you want to stream a larger vector payload, you can also bind the parameter as b and send the contents with send_long_data().

<?php
// Assuming you have a valid MySQLi connection established ($mysqli)

$stmt = $mysqli->prepare("INSERT INTO items (item_id, coords, features) VALUES (?, (?VEC?),(?VEC?))");
$itemId = 123;
$coordVector = "20.245,54.354,30.000"; // that is vector of floats
$featureSet = "1,4,20,456,112,3"; // that is set of integer values (MVA)
$stmt->bind_param("iss", $itemId, $coordVector, $featureSet); // "i" for integer (itemId), "s" for string
$stmt->execute();

echo "Item with feature vector added successfully!";

$itemId = 124;
$coordVector = "18.500,42.000,31.125"; // Another float vector
$featureSet = "0,6,34,665,22,3445,221,564,2232,5644,43"; // Example with more feature values

// For larger payloads you can bind the second ?VEC? as a blob and stream it.
$featurePlaceholder = "";
$stmt->bind_param("isb", $itemId, $coordVector, $featurePlaceholder); // "b" is for blob data
// bind_param() must be called before send_long_data().
$stmt->send_long_data(2, $featureSet); // zero-based index: 2 means the third bound parameter
$stmt->execute();

echo "Item with feature vector added successfully!";
?>

Notice that the parentheses are part of the SQL string in the prepare() call. We only bind the values within the parentheses using the ?VEC? placeholder. The resulting SQL executed by Manticore will be:

INSERT INTO items (item_id, coords, features) VALUES (123, (20.245,54.354,30.000), (1,4,20,456,112,3));
INSERT INTO items (item_id, coords, features) VALUES (124, (18.500,42.000,31.125), (0,6,34,665,22,3445,221,564,2232,5644,43));

Using ?VEC? in a prepared statement gives you the same benefits as with the ? placeholder: the vector values are sent as data, not as part of the SQL text, so they cannot be interpreted as SQL and cannot cause injection. You also avoid having to manually build or escape the vector literal in your application — Manticore receives the bound numbers and formats the vector correctly, which keeps the query safe and the data consistent.

Important Considerations & Limitations

While powerful, Manticore's prepared statements have a few limitations to keep in mind.

Multi-Queries: Only a single SQL statement is allowed per prepared statement. Attempts to use multi-queries (e.g., SELECT ...; SHOW META) will fail. If you need to execute multiple statements, prepare a separate statement for each one and execute them sequentially within the same session.

Numeric Types: Some database drivers (like mysql2 for Node.js) might send numeric parameters as DOUBLE by default. This could lead to unexpected behavior if you require strict integer behavior (like rejecting negative IDs). In such cases, consider sending integers as strings or utilize driver-specific integer types (e.g., BigInt) to ensure correct data handling.

Rust sqlx Users: If you're using the sqlx crate in Rust, be aware that when reading result set rows, you must use column indices rather than column names. While column names are present in the result set, sqlx doesn't utilize them for mapping. For example, use row.try_get(0)? instead of row.try_get("id")?.

Conclusion

Prepared statements offer a critical combination of security, readability, and potential performance gains when working with Manticore Search. By separating your SQL logic from your data, you dramatically reduce the risk of SQL injection attacks, improve code maintainability, and potentially speed up query execution. We strongly encourage you to adopt prepared statements in your Manticore Search applications.

For more in-depth information, be sure to consult these resources:

Manticore Search Documentation on Prepared Statements: https://manual.manticoresearch.com/Connecting_to_the_server/MySQL_protocol#Prepared-statements
Wikipedia - Prepared Statements: https://en.wikipedia.org/wiki/Prepared_statement

This guide provides a solid foundation for using prepared statements effectively in your Manticore Search projects, leading to more secure, efficient, and maintainable applications.

KNN prefiltering in Manticore Search

Sergey Nikolaev — Thu, 02 Apr 2026 05:50:56 +0000

Vector search rarely happens in isolation. You almost always have filters — a price range, a category, a date window, a geographic boundary. The question is: when do those filters get applied?

The answer makes a surprising difference in result quality.

KNN prefiltering is available in Manticore Search starting from version 19.0.1.

The problem with postfiltering

Consider a product catalog with 10 million items. A user asks for the 10 nearest neighbors to a query vector, restricted to category = 'electronics'. With postfiltering, the KNN search runs first over the entire dataset, then the filter is applied to the results. If electronics make up 5% of the catalog, the graph explores nodes that are mostly irrelevant. Worse, many of the k nearest neighbors may not be electronics at all, so the final result set can be much smaller than requested. Ask for 10 results, get 2.

This is the fundamental limitation of postfiltering: the HNSW graph doesn't know about your filters. It finds the closest vectors overall, not the closest vectors that match your criteria. The more selective the filter, the worse the problem gets.

What prefiltering does differently

Prefiltering passes the filter into the HNSW graph traversal itself. As the algorithm explores candidate nodes, each one is checked against the filter before being added to the result heap. Only matching documents contribute to the final k results. This means you reliably get the k results you asked for, assuming k matching documents exist in the dataset.

In Manticore Search, prefiltering is enabled by default when your query combines KNN search with attribute filters. No special syntax is needed:

SELECT id, title, knn_dist()
FROM products
WHERE knn(embedding, (0.12, 0.45, 0.78, 0.33))
  AND category = 'electronics'
  AND price < 500
LIMIT 10;

Both category = 'electronics' and price < 500 are evaluated during HNSW traversal, not after. The equivalent JSON query:

POST /search
{
    "table": "products",
    "knn": {
        "field": "embedding",
        "query": [0.12, 0.45, 0.78, 0.33]
    },
    "query": {
        "bool": {
            "must": [
                { "equals": { "category": "electronics" } },
                { "range": { "price": { "lt": 500 } } }
            ]
        }
    },
    "limit": 10
}

Naive prefiltering and where it falls short

The obvious first approach is straightforward: traverse the HNSW graph normally, compute distances for every neighbor, but only add filter-matching nodes to the result heap. Filtered-out nodes still participate in navigation — if a non-matching node has a competitive distance, it enters the candidate queue and its neighbors get explored. The filter only gates what goes into the results.

This actually works reasonably well. The graph stays connected because filtered-out nodes are still traversed. But it has a performance problem that gets worse as the filter becomes more selective: every unvisited neighbor gets a distance computation regardless of whether it passes the filter. Distance computation is the most expensive operation in the search. With a filter matching 5% of documents, 95% of that work produces results that are immediately discarded. The algorithm pays full cost for navigation but gets no results from most of the work.

How Manticore solves it: ACORN-1

Manticore uses an ACORN-1-based algorithm (from the ACORN paper, SIGMOD 2024) that improves on naive prefiltering in two ways:

No distance computation for filtered-out nodes. When visiting a node's neighbors, ACORN-1 checks the filter first and only computes distance for nodes that pass. Filtered-out neighbors are never scored. When 95% of nodes fail the filter, this saves roughly 95% of the distance work compared to the naive approach.

Adaptive expansion through filtered-out nodes. When a neighbor fails the filter, the algorithm looks through that node's own neighbors to find filter-passing nodes further away. If those neighbors also fail the filter and not enough matching candidates have been found yet, it keeps going — 3 hops, 4 hops, as far as needed. The more selective the filter, the more aggressively the algorithm expands. This targeted walk through non-matching neighborhoods reaches matching candidates without scoring the non-matching ones along the way.

Think of it as searching for Italian restaurants in a city. The naive approach checks the menu at every restaurant and only keeps the Italian ones. ACORN-1 glances at the sign first — "French, skip; Thai, skip" — without going inside. And when it sees a stretch of non-Italian restaurants, it walks past them, peeking around each corner until it finds an Italian place on the other side.

Manticore activates ACORN-1 when fewer than 60% of total documents pass the filter. Above that threshold, naive prefiltering works well enough on its own.

Automatic brute-force fallback

Prefiltering works well across a wide range of filter selectivities, but there's an extreme case: what if only 50 documents out of 10 million match the filter? Traversing the HNSW graph — even with ACORN-1 — visits far more nodes than just scanning those 50 documents directly.

Manticore detects this automatically. When prefiltering is enabled, the query planner estimates the cost of HNSW traversal versus a brute-force distance scan over the filtered subset. It uses histogram-based selectivity estimates to predict how many documents pass the filter, then compares that against the expected number of nodes HNSW would visit. If brute-force is cheaper, Manticore skips HNSW entirely and scans the filtered documents directly.

This means you don't need to think about edge cases. Prefiltering adapts: ACORN-1 for moderate selectivity, brute-force for extreme selectivity, and standard HNSW when no filter is present.

When to use postfiltering instead

Prefiltering isn't always the best choice. There are cases where postfiltering is preferable:

When you want the closest vectors regardless of filters. Postfiltering gives you the k nearest neighbors from the full dataset, then removes non-matching ones. If your application tolerates getting fewer than k results and you care most about vector distance quality, postfiltering is simpler and more predictable.
When the filter matches most documents. If 95% of documents pass the filter, prefiltering adds overhead for almost no benefit — nearly every candidate matches anyway.
When you're debugging or benchmarking. Postfiltering gives you a clean baseline: pure HNSW results with a filter on top. This makes it easier to isolate whether a quality issue comes from the graph or the filter.

To explicitly request postfiltering in SQL:

SELECT id, knn_dist()
FROM products
WHERE knn(embedding, (0.12, 0.45, 0.78, 0.33), { prefilter=0 })
  AND category = 'electronics'
LIMIT 10;

In JSON, set "prefilter": false inside the knn object:

POST /search
{
    "table": "products",
    "knn": {
        "field": "embedding",
        "query": [0.12, 0.45, 0.78, 0.33],
        "prefilter": false
    },
    "query": {
        "equals": { "category": "electronics" }
    },
    "limit": 10
}

Forcing brute-force

If you know your dataset is small enough or your filters selective enough that a linear scan is the right strategy, you can force brute-force mode directly:

SELECT id, knn_dist()
FROM products
WHERE knn(embedding, (0.12, 0.45, 0.78, 0.33), { fullscan=1 })
  AND category = 'electronics'
LIMIT 10;

This skips HNSW entirely and computes exact distances over all documents that pass the filter. It guarantees perfect recall at the cost of linear-time scanning.

Summary

Prefiltering is the default in Manticore and the right choice for most filtered KNN queries. It guarantees you get k results (if they exist). Manticore automatically picks the best strategy based on how selective the filter is: standard filtered HNSW when most documents match, ACORN-1 when fewer than 60% pass (saving distance computations on filtered-out nodes), and brute-force when the filtered subset is small enough to scan directly. The query planner estimates filter selectivity per-query, per-segment, so there's nothing to tune.

Use postfiltering (prefilter=0 in SQL, "prefilter": false in JSON) when you want the globally closest vectors and can tolerate getting fewer than k results. Use brute-force (fullscan=1 in SQL, "fullscan": true in JSON) when you know a linear scan is the right strategy for your data.

Hybrid search in Manticore Search

Sergey Nikolaev — Wed, 01 Apr 2026 10:46:41 +0000

Search is rarely a one-size-fits-all problem. A user typing "cheap running shoes" wants exact keyword matches, but a user asking "comfortable footwear for jogging" is expressing the same intent in different words. Traditional full-text search handles the first case well. Vector search handles the second. Hybrid search combines both in a single query so you don't have to choose.

In modern search systems, this is often described as combining lexical (sparse) retrieval with semantic (dense) retrieval. Different terms, same idea: exact matching plus meaning.

What is hybrid search?

Hybrid search runs a full-text (BM25) search and a vector (KNN) search side by side, then merges the two result lists into one. Documents that score well on either signal (or both) rise to the top.

Full-text search is great at exact keywords, rare terms, and identifiers. Vector search understands meaning — that "automobile" and "car" are the same concept — because their embeddings are nearby in vector space.

Each method has blind spots:

Full-text struggles with synonyms and natural language
Vector search struggles with exact tokens like SKUs, error codes, and IDs

Hybrid search covers both.

How hybrid search fits into modern search pipelines

Hybrid search is the retrieval stage — the part that finds relevant candidates from your dataset.

Instead of relying on a single method, hybrid search combines keyword matching and semantic similarity to produce a stronger result set from the start.

In practice, this means:

Better recall for natural language queries
Precise matching for identifiers like SKUs or error codes
More relevant results without needing complex query logic

The goal is simple: return the best possible candidates in a single pass, using both signals together.

When should you use it?

Hybrid search is a good fit when:

Your queries mix intent and specifics. A search like python error 403 forbidden benefits from keyword precision on the error code and semantic understanding of the problem description.
You're building a RAG pipeline. Retrieval-Augmented Generation needs the most relevant chunks fed to the LLM. Hybrid retrieval consistently finds more relevant documents than either method alone.
Your catalog has structured and unstructured data. E-commerce products have precise names and model numbers (keyword territory) but also descriptions where meaning matters more than exact wording.
You can't predict how users will search. Some will paste exact phrases, others will describe what they're looking for in natural language. Hybrid search handles both gracefully.

How it works

Manticore uses Reciprocal Rank Fusion (RRF) to merge results. The idea is simple: instead of trying to compare raw BM25 scores with KNN distances (which are on completely different scales), RRF looks at rank positions. A document that's ranked #1 in the text results and #3 in the KNN results gets a higher combined score than a document that only appears in one list.

Here's a quick example. Suppose a text search and a KNN search each return their own top 3:

Text search results:

Rank	Document
1	Doc A
2	Doc B
3	Doc C

KNN search results:

Rank	Document
1	Doc C
2	Doc A
3	Doc D

RRF scores each document using the formula 1 / (rank_constant + rank). With the default rank_constant=60:

Document	Text contribution	KNN contribution	RRF score
Doc A	1/(60+1) = 0.0164	1/(60+2) = 0.0161	0.0325
Doc C	1/(60+3) = 0.0159	1/(60+1) = 0.0164	0.0323
Doc B	1/(60+2) = 0.0161	—	0.0161
Doc D	—	1/(60+3) = 0.0159	0.0159

Doc A ranks highest because it appears near the top in both lists. Doc C is close behind for the same reason. Doc B and Doc D each appear in only one list, so they score lower.

Why RRF?

There are two common ways to combine results:

Rank-based fusion (RRF) — simple, robust, no need to normalize scores
Score-based fusion — normalize scores first, then combine

Manticore uses RRF because it works well out of the box and avoids score calibration problems.

Under the hood, a hybrid query is split into independent sub-queries — one for full-text, one (or more) for KNN — that run in parallel. Once all sub-queries finish, RRF fuses their ranked result lists into a single output.

Why not just use one or the other?

Consider a support knowledge base with articles for different error codes — connection failures, authentication problems, sync issues. A user sees error E-5020 on screen and reports: "I can't connect to the server."

Vector search understands the symptom but not the error code. A KNN search for "can not connect to the server" returns:

#	Title	KNN distance
1	Error E-5030: DNS Resolution Failed	0.572
2	Error E-2091: App Loading Timeout	0.583
3	Error E-5020: SSL Certificate Mismatch	0.605
4	Error E-5010: Service Unavailable	0.622
5	Error E-4001: Login Failed	0.665

The correct article (E-5020) is buried at #3. KNN ranks DNS and timeout errors higher because their descriptions are semantically closer to "can't connect." The actual problem — an SSL certificate mismatch — uses completely different vocabulary, so it scores lower.

You might think: just add the error code to the KNN query. But "E-5020" and "E-5010" are arbitrary identifiers with no semantic meaning — embeddings treat them as nearly identical tokens. KNN for "E-5020 can not connect to the server" does move E-5020 to #1, but only because the added text shifts the semantic context — the error code itself carries no weight.

Hybrid search solves this by sending each signal where it works best — the error code to full-text, the symptom to KNN:

SELECT title, hybrid_score()
FROM support_articles
WHERE knn(embedding, 'can not connect to the server')
  AND MATCH('E-5020')
LIMIT 5
OPTION fusion_method='rrf';

#	Title	Hybrid score
1	Error E-5020: SSL Certificate Mismatch	0.032
2	Error E-5030: DNS Resolution Failed	0.016
3	Error E-2091: App Loading Timeout	0.016
4	Error E-5010: Service Unavailable	0.016
5	Error E-4001: Login Failed	0.015

E-5020 jumps from #3 to #1 with twice the score of everything else. Full-text treats "E-5020" as an exact string — not similar to "E-5010", not close enough, just different. KNN ensures related connection errors still appear below for context.

This is the core value of hybrid search:

Identifiers → full-text
Meaning → vector search

Each method covers the other's blind spot.

Getting started

The simplest way to run a hybrid search is with hybrid_match(). If your table has auto-embeddings configured, one line does everything — text search, embedding generation, KNN search, and RRF fusion:

SELECT id, hybrid_score()
FROM products
WHERE hybrid_match('running shoes');

The JSON equivalent:

POST /search
{
  "table": "products",
  "hybrid": { "query": "running shoes" }
}

Manticore:

generates embeddings
runs both searches in parallel
fuses results

Full control: explicit MATCH + KNN

When you need to supply your own vectors or tune individual sub-queries, use the explicit form with MATCH() and KNN() in the WHERE clause:

SELECT id, hybrid_score()
FROM products
WHERE match('running shoes')
  AND knn(embedding, (0.12, 0.45, 0.78, ...))
OPTION fusion_method='rrf';

POST /search
{
  "table": "products",
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, 0.45, 0.78, "..."]
  },
  "query": { "match": { "title": "running shoes" } },
  "options": { "fusion_method": "rrf" }
}

Each result includes:

hybrid_score() — fused score (used for default sorting)
weight() — BM25 score
knn_dist() — vector distance

Attribute filters (AND category = 'footwear') apply to both sub-queries.

Tuning

Three options let you adjust fusion behavior:

rank_constant — controls how much top positions dominate the fused score. Lower values (e.g. 10) make rank #1 count significantly more than rank #5. Higher values flatten the curve. See rank_constant.
fusion_weights — lets you give different importance to each sub-query. If text relevance matters more than vector similarity, weight it higher. See fusion_weights.
window_size — how many results each sub-query retrieves before fusion. By default, Manticore computes this automatically from your KNN parameters and query LIMIT. See window_size.

Multi-vector fusion

Hybrid search isn't limited to one text search plus one KNN search. You can fuse multiple vector searches together — useful when your data has several distinct semantic dimensions. For example, an e-commerce product has a textual description and a photo. A user searching for "minimalist white sneakers" cares about both: the title should match the style, and the product image should look like what they have in mind. By encoding the title and the image into separate vector spaces, you can search both at once and let RRF surface products that match across all three signals — keywords, text meaning, and visual similarity:

SELECT id, hybrid_score()
FROM products
WHERE match('running shoes') AS text
  AND knn(title_vec, (0.12, 0.45, ...)) AS title_sim
  AND knn(image_vec, (0.88, 0.21, ...)) AS image_sim
OPTION fusion_method='rrf',
       fusion_weights=(text=0.5, title_sim=0.3, image_sim=0.2);

All sub-queries run in parallel and are fused together via RRF.

Conclusion

Hybrid search is not about replacing full-text or vector search — it’s about using both where they work best.

Keyword search gives you precision for exact terms and identifiers. Vector search gives you flexibility for natural language and meaning. On their own, each has gaps. Together, they produce consistently better results across a wide range of queries.

With hybrid search in Manticore, you don’t need to choose between the two or build complex query logic to handle different cases. You can run both signals in parallel and get a single, unified result set.

If your search needs to handle both exact matches and intent — which most real-world applications do — hybrid search is a straightforward way to improve relevance without adding complexity.

Manticore Search 25.0.0

Sergey Nikolaev — Tue, 31 Mar 2026 10:54:20 +0000

Manticore Search 25.0.0 has been released. This version brings a simpler packaging model together with major improvements in hybrid search, vector filtering, backups, RT table maintenance, and application integration.

Upgrade Notes

Please review these before upgrading:

MCL 13.0.0 is required. Manticore Search 25.0.0 updates the daemon/MCL interface and adds API_URL and API_TIMEOUT for auto-embedding models. If you manage MCL separately, upgrade the daemon and MCL together. (PR #123)
Replication clusters require coordinated upgrades. Mixed-version clusters are not compatible with the replication changes in 24.0.0. Upgrade clustered nodes together. (Issue #4343)
Newer bigram tokenization options affect downgrade paths. If you rebuild indexes with the bigram tokenization changes introduced in 23.0.0, those rewritten indexes are not compatible with older Manticore versions. (Issue #4364)
Filtered KNN results may change. Since KNN prefiltering was introduced in 19.0.0, filtered vector queries can now prioritize nearest neighbors that satisfy the filter during search, rather than filtering only after candidate selection. (Issue #4103)

Packaging Simplified

Starting with 25.0.0, manticore is the bundle package for deb and rpm. It includes the daemon, tools, converter, development headers, ICU data, bundled dependency packages, and built-in language packs for German, English, and Russian, along with Jieba support.

In most cases, upgrading is now simpler: install manticore and let the bundle pull in the components you need. If older split packages conflict with the new layout, remove them first with apt remove 'manticore*' or yum remove 'manticore*' and then install manticore. Your existing data remains intact. On yum-based systems, the package manager may replace the config file, but it automatically keeps a backup of the previous one.

This is an important operational change: it reduces packaging friction and makes installation simpler and more predictable.

Highlights

Hybrid search is now a first-class option

Manticore now supports hybrid search, allowing you to combine full-text and vector retrieval in a single query. This makes it much easier to build retrieval pipelines that balance lexical precision with semantic recall.

You can use hybrid search via both SQL and JSON interfaces. In SQL, you can combine MATCH() with one or more KNN() subqueries. For teams building modern search experiences, this is one of the biggest additions in the release line.

Better vector search with KNN prefiltering

With KNN prefiltering, attribute filters can be applied during vector search instead of only after candidate selection. That matters when you need "the nearest neighbors among documents that also match my filter", not just "the nearest neighbors overall, filtered afterward".

This improves both relevance and predictability for filtered vector search workloads such as category-constrained product search, tenant-aware search, and permission-filtered semantic retrieval.

Faster RT maintenance with parallel chunk merging

Manticore RT tables now handle heavy maintenance much better thanks to N-way merges and parallel OPTIMIZE jobs. We covered the details in Parallel chunk merging.

The result is simpler to explain than the implementation: when a table accumulates many disk chunks, cleanup and compaction take less time, so RT tables perform better under sustained write load.

Easier application integration with prepared statements

Manticore now supports MySQL-compatible prepared statements, which we covered in Prepared statements in Manticore Search. This improves compatibility with MySQL clients, connection pools, ORMs, and frameworks that expect binary protocol prepare/execute behavior.

For application developers, this removes one more integration edge case and makes Manticore easier to adopt in existing stacks.

S3-compatible backup and restore

Backup operations are more flexible now thanks to S3-compatible backup and restore. Manticore Backup supports AWS S3, MinIO, Wasabi, and Cloudflare R2, making it easier to ship backups to object storage and build cleaner disaster-recovery workflows.

This is especially useful for containerized and cloud-native deployments where local disk is temporary but object storage is the durable layer.

Auto-embeddings keep improving

25.0.0 also extends Manticore's recent auto-embeddings work. The new MCL version adds API_URL and API_TIMEOUT controls for auto-embedding models. Recent development also added support for GGUF quantized local embedding models, T5 encoders, gated Hugging Face downloads, and replication-safe embedding handling for RT tables.

Taken together, these changes make Manticore more practical both for local embedding pipelines and for deployments that rely on external model endpoints.

Other Notable Improvements

This release also includes 36 bug fixes across query execution, replication, macOS packaging, auto-embeddings, RT tables, and SQL compatibility.

False-positive full-text matches caused by max_query_time interruptions in complex queries were fixed, so timed-out searches no longer return rows that do not actually satisfy the query. (Issue #4375)
Replication was fixed for transactions containing duplicate document IDs, so replicas no longer lose rows while the donor removes duplicates correctly. (Issue #4388)
Several auto-embedding stability issues were fixed, including crashes during embedding generation, invalid UTF-8 handling, and missing RT locks during validation. (PR #4349, PR #4370, PR #4371)
LEFT JOIN now returns proper MySQL NULL values instead of the string NULL, improving compatibility with MySQL clients and drivers. (Issue #4229)
A race during RT disk chunk save that could lose killed documents and produce duplicate rows after merges or saves was fixed. (Issue #4207)
Fuzzy search now works across queries involving multiple tables. (PR #4372)

Why 25.0.0 Matters

Manticore Search 25.0.0 combines the packaging changes with several important capabilities that are now available together:

hybrid lexical + vector retrieval
filtered vector search that behaves the way users expect
simpler integration through prepared statements
object-storage-friendly backup workflows
faster RT table compaction and maintenance
more flexible auto-embedding deployments

For the complete technical details, see the changelog.

Need help or want to connect?

Join our Slack
Visit the Forum
Report issues or suggest features on GitHub
Email us at contact@manticoresearch.com

MCP-Manticore: Let Your AI Assistant Write Manticore Queries for You

Sergey Nikolaev — Wed, 25 Mar 2026 10:23:25 +0000

Introduction

You've heard Manticore Search is fast. You've heard it handles full-text, vector, and fuzzy search in one engine. But when you sit down to actually use it, you're staring at documentation, guessing at SQL syntax, and hoping your CREATE TABLE doesn't throw an obscure error.

MCP-Manticore changes the game. It's a Model Context Protocol (MCP) server that connects Cursor, Claude Code, Codex CLI, or any MCP-compatible AI assistant directly to your Manticore instance. The AI can read the docs, inspect your schema, and execute queries — all before it writes a single query for you.

MCP (Model Context Protocol) is an open standard that lets AI assistants connect to external tools and data sources. Instead of the AI hallucinating Manticore syntax based on training data from who-knows-when, it gets real-time access to your database and the official documentation.

Two Ways This Helps You

Depending on what you're doing, MCP-Manticore provides value in two different ways:

1. Developer Assistance (Primary): You're building an app that uses Manticore. The AI helps you create tables, write complex queries, and understand features — without you memorizing SQL syntax.

2. Data Exploration (Secondary): You have data in Manticore and want to ask questions in plain English. "Show me the cheapest accessories" or "Find products similar to this one." For complex operations like vector search or fuzzy matching, the AI needs MCP to know the correct Manticore-specific syntax.

Let's look at both.

The Problem: AI Without Context

When you ask an AI assistant to help with Manticore without MCP integration, here's what happens:

Example: Creating a Table with Auto-Embeddings

You: "How do I create a Manticore table with auto-embeddings for semantic search?"

AI (guessing):

CREATE TABLE products (
    title text,
    description text,
    description_vector float_vector 
        embedding='sentence-transformers/all-MiniLM-L6-v2' 
        source='description'
);

Looks reasonable, right? Wrong. The actual syntax uses MODEL_NAME and FROM, not embedding and source. You'd discover this only after the query fails, sending you back to the documentation.

This is the pattern:

AI generates query based on outdated or guessed syntax
You run it, get an error
You manually check documentation
You correct the AI (or fix it yourself)
Repeat until it works

For developers integrating Manticore into applications, this friction kills productivity. You're not exploring data — you're fighting syntax.

The Solution: MCP-Manticore

With MCP-Manticore enabled, the conversation changes:

You: "How do I create a Manticore table with auto-embeddings for semantic search?"

AI: "Let me check the documentation for the correct syntax..."

[AI calls list_documentation via MCP, searches for "embedding|semantic|knn|vector"]

AI: "Here's the correct syntax. Manticore uses MODEL_NAME and FROM for auto-embeddings:"

CREATE TABLE products (
    title TEXT,
    description TEXT,
    embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title,description'
);

The AI now has:

Real-time access to Manticore documentation
Schema introspection via list_tables() and describe_table()
Query execution to test and validate
Safety controls — read-only by default, write operations require opt-in

Real Examples: With and Without MCP

Example 1: Schema Creation

Without MCP:
The AI guessed the syntax, using embedding='...' and source='...'—which doesn't exist in Manticore. You'd hit an error and waste time debugging.

With MCP:
The AI retrieved the official documentation first and provided the correct MODEL_NAME and FROM syntax. It also explained the supported models (local HuggingFace models, OpenAI, Voyage, Jina) and the HNSW_SIMILARITY options (L2, IP, COSINE).

Example 2: Semantic Search with Auto-Embeddings

You: "Find products similar to 'noise-canceling headphones for travel'"

Without MCP:
The AI completely loses track. Without access to documentation, it:

Tries to SELECT all data and aggregate internally without any filter
Hallucinates embedding vectors with made-up syntax: ANY_KNN(embedding, (-0.07089090,0.04201586,-0.03262700...))
Attempts to write Python scripts to manually calculate similarity
Eventually gives up and just does string matching on descriptions

Result: It finds "Wireless Headphones" only because the description literally contains "noise-canceling headphones" — pure luck, not semantic search.

With MCP:
The AI checks documentation, discovers your table uses auto-embeddings, and learns that knn() accepts text directly when MODEL_NAME is configured:

SELECT id, name, description, knn_dist() 
FROM products 
WHERE knn(embedding, 5, 'noise-canceling headphones for travel');

Result: Returns Wireless Headphones as #1 (correct), but also surfaces semantically related items — actual vector similarity, not keyword matching.

Example 3: Fuzzy Search (Typo Tolerance)

You: "Find products even if I misspell the name, like 'headphons' instead of 'headphones'"

Without MCP:
The AI tries everything it was trained on, hoping something works:

MATCH('headphons~1') and MATCH('headphons~') — wrong operators
CALL SUGGEST('headphons', 'products') — wrong approach for this use case
MATCH('FUZZY(headphons') — hallucinated syntax that doesn't exist
ALTER TABLE products SET min_infix_len = 3 — unnecessary and wrong
OPTION expand_keywords = 1 — unrelated feature

It even tried to optimize the table and run suggestions again. Complete chaos.

Result: No working query. Just a pile of failed attempts based on outdated or confused training data.

With MCP:
The AI checks the documentation and finds the correct syntax immediately:

SELECT * FROM products WHERE MATCH('headphons') OPTION fuzzy=1;

Result: Returns "Wireless Headphones" despite the typo. The AI also explains that fuzzy=1 allows Levenshtein distance of 1 (one character difference), and you can adjust tolerance with OPTION fuzzy=1, distance=2 for more flexibility.

Key Features

Intelligent Documentation Lookup

MCP-Manticore includes a documentation fetcher that pulls directly from the Manticore Search manual on GitHub. When you ask about features like KNN vector search, fuzzy matching, or full-text operators, the AI retrieves the official documentation before responding.

Schema-Aware Query Building

The server provides tools that let the AI understand your data structure before writing queries:

list_tables() — See what tables exist
describe_table() — Understand column names and types
execute_query() — Run queries and see results

Safe Query Execution

By default, MCP-Manticore runs in read-only mode. Write operations (INSERT, UPDATE, DELETE, DROP) require explicit opt-in via environment variables:

export MANTICORE_ALLOW_WRITE_ACCESS=true  # Enable INSERT/UPDATE/DELETE
export MANTICORE_ALLOW_DROP=true            # Enable DROP/TRUNCATE

Multiple Transport Options

Connect via:

stdio (for CLI-based AI assistants like Claude Code)
HTTP (for web-based integrations)
SSE (Server-Sent Events for real-time updates)

With optional JWT authentication for secure deployments.

Tutorial: Setting Up MCP-Manticore

MCP-Manticore works with any MCP-compatible AI assistant, including Cursor, Claude Code, Codex CLI, Windsurf, and any other tool that supports the Model Context Protocol.

Step 1: Ensure UV is Installed

MCP-Manticore runs best with uv, a fast Python package manager:

curl -LsSf https://astral.sh/uv/install.sh | sh

With uv, you don't need to manually install MCP-Manticore—uvx downloads and runs it automatically.

Step 2: Configure Environment Variables (Optional)

# Required: Manticore connection (defaults shown)
export MANTICORE_HOST=localhost
export MANTICORE_PORT=9308

# Optional: Enable write access (default: read-only)
export MANTICORE_ALLOW_WRITE_ACCESS=true

# Optional: Allow destructive operations (DROP, TRUNCATE)
export MANTICORE_ALLOW_DROP=false

Step 3: Add to Your MCP Client

General Configuration:

Command: uvx mcp-manticore
Environment variables (if needed): MANTICORE_HOST, MANTICORE_PORT, etc.

Example configuration (mcp.json):

{
  "mcpServers": {
    "manticore": {
      "command": "uvx",
      "args": ["mcp-manticore"],
      "env": {
        "MANTICORE_HOST": "localhost",
        "MANTICORE_PORT": "9308"
      }
    }
  }
}

For client-specific setup instructions (Cursor, Claude Desktop, Windsurf, etc.), see the MCP-Manticore README.

Step 4: Verify Connection

Test by asking your AI assistant:

"Show me all tables in Manticore"

You should see the AI call the list_tables() tool and display your tables.

Configuration Reference

Environment Variable	Description	Default
`MANTICORE_HOST`	Manticore server hostname	`localhost`
`MANTICORE_PORT`	Manticore HTTP port	`9308`
`MANTICORE_ALLOW_WRITE_ACCESS`	Enable INSERT/UPDATE/DELETE	`false`
`MANTICORE_ALLOW_DROP`	Enable DROP/TRUNCATE	`false`
`MANTICORE_MCP_TRANSPORT`	Transport type (stdio/http/sse)	`stdio`
`MANTICORE_MCP_AUTH_TOKEN`	JWT token for HTTP/SSE	-

The Future: Agents That Install Themselves

There's a third use case on the horizon: autonomous agents that discover and install MCP servers themselves.

Imagine an AI agent that:

Finds your GitHub repo mentioning Manticore
Searches for "Manticore MCP server"
Finds MCP-Manticore, installs it automatically
Starts querying your database to complete its task

This isn't science fiction — OpenAI's Codex and similar agentic systems are moving in this direction. When that future arrives, having MCP-Manticore in the MCP registry means your AI tools will just work with Manticore, no manual setup required.

Conclusion

MCP-Manticore transforms AI assistants from passive text generators into active, knowledgeable development partners. Whether you're:

Building with Manticore — Let the AI handle syntax while you focus on your application
Learning Manticore — Ask questions in plain English, get accurate answers backed by docs
Exploring your data — Query without memorizing SQL syntax or table schemas

The old way: guess, error, debug, repeat.

The new way: ask, verify, execute, done.

Ready to try it? With uv installed, just add MCP-Manticore to your MCP client settings and start asking. Your future self — free from syntax rabbit holes — will thank you.

Resources:

Forem: Sergey Nikolaev

How to Make xt850 Match xt 850

TL;DR

Assumptions and verification

The broader search problem

Baseline: why xt850 fails by default

Why bigrams help here

A note about bigram_delimiter

Mode 1: second_numeric

Example

Mode 2: second_has_digit

Example

How to choose between the two

Final takeaway

How to Speed Up Phrase Search with bigram_index

TL;DR

What bigram indexing actually does

Important caveat: bigrams work at tokenization level

Mode 1: Default behavior

Example

Mode 2: all

Example

Mode 3: first_freq

Example

Mode 4: both_freq

Example

Which performance mode should you choose?

Benchmark: does bigram indexing really speed up phrase search?

Benchmark setup

What I observed

Final takeaway

Build a Searchable Catalog with Filters, Facets, and Semantic Search

Run it locally

What makes the app feel usable

Start with autocomplete

Make the first results page forgiving

Let users narrow without rewriting

Keep deep pagination stable

Add semantic retrieval where keywords fail

Use hybrid search on the results page

Use similar-item discovery on the detail page

Keeping writes and search results in sync

Why this matters

Why monitoring your search engine matters: Manticore ➡ Prometheus ➡ Grafana

Introducing the Manticore Grafana dashboard

How the stack works

Exploring the dashboard

1. Health summary (start here)

2. Query load and latency

3. Memory and resources

4. Table-level insights

5. Cluster state and history

Conclusion

Monitor Manticore Search in Grafana with One Command

When everything looks fine, but search is still slow

What was actually going on

The solution: see the whole picture right away

Environment variables

If Manticore is running on a remote server

What the dashboard shows

Why this matters

Parallel chunk merging in Manticore Search

Why this matters

Before: one merge job, two chunks at a time

After: parallel merges plus larger merge jobs

What changed in practice

What about the new defaults?

Broader benchmark results: row-wise and columnar storage

How to think about the two settings

Takeaway

S3 Streamable Backup: Direct-to-Cloud Backups for Manticore Search

The Problem with Traditional Backups

Streamable S3 Backup: How It Works

Supported Storage Providers

Usage

CLI

Environment Variables

Performance Considerations

Restore from S3

Required S3 Permissions

Baseline: why `xt850` fails by default

Mode 1: `second_numeric`

Mode 2: `second_has_digit`

Mode 2: `all`

Mode 3: `first_freq`

Mode 4: `both_freq`

Parameter Placeholders: `?` & `?VEC?`