<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ólafur Aron Jóhannsson</title>
    <description>The latest articles on Forem by Ólafur Aron Jóhannsson (@olafur_aron).</description>
    <link>https://forem.com/olafur_aron</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3430291%2F9ec8a7a7-8ee5-4261-89e6-4c2e34c72294.jpeg</url>
      <title>Forem: Ólafur Aron Jóhannsson</title>
      <link>https://forem.com/olafur_aron</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/olafur_aron"/>
    <language>en</language>
    <item>
      <title>Build a Document Search Engine in C# Without Python</title>
      <dc:creator>Ólafur Aron Jóhannsson</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:05:42 +0000</pubDate>
      <link>https://forem.com/olafur_aron/build-a-document-search-engine-in-c-without-python-4123</link>
      <guid>https://forem.com/olafur_aron/build-a-document-search-engine-in-c-without-python-4123</guid>
      <description>&lt;h1&gt;
  
  
  Build a Document Search Engine in C#
&lt;/h1&gt;

&lt;p&gt;Most search implementations fall into one of two camps: send everything to Elasticsearch, or call a search API. Both work. Both add infrastructure.&lt;/p&gt;

&lt;p&gt;Here's a third option. Index local files, search them by keyword, by meaning, or both, in about 10 lines of C#. No external services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package Kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://nuget.org/packages/kjarni" rel="noopener noreferrer"&gt;NuGet&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Kjarni&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;indexer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Indexer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quiet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;indexer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"docs/"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Searcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rerankerModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2-cross-encoder"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"how do returns work?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hybrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;F4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The indexer reads your files, splits them into chunks, encodes each chunk as a vector, and builds a BM25 keyword index. The searcher queries both indexes and combines the results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Create a few text files to search over:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docs/returns.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Our return policy allows customers to return any unused item within 30 days
of purchase for a full refund. Items must be in their original packaging.
Shipping costs are non-refundable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docs/shipping.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We ship to all 50 US states and internationally to over 40 countries.
Standard shipping takes 5-7 business days. Express shipping is available
for an additional fee.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docs/account.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;To reset your password, click "Forgot Password" on the login page.
You will receive an email with a reset link. The link expires after 24 hours.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three short documents. In practice these could be product manuals, support articles, internal wikis, or any text files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Indexing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;indexer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Indexer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quiet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;indexer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"docs/"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The indexer does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads all files in the given directories&lt;/li&gt;
&lt;li&gt;Chunks each file into passages (for long documents)&lt;/li&gt;
&lt;li&gt;Encodes each chunk into a 384-dimensional vector using the embedding model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It also builds a BM25 keyword index over the same chunks. The result is a local index on disk that you can query repeatedly without re-indexing.&lt;/p&gt;
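
&lt;p&gt;Because the index lives on disk, a later run can open it by name and query it immediately. A minimal sketch using the same &lt;code&gt;Searcher&lt;/code&gt; API shown above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// A later process: no Create() call needed, the on-disk index
// from the earlier run is opened by name and queried directly.
using var searcher = new Searcher(model: "minilm-l6-v2");
var results = searcher.Search("my_index", "password reset", mode: SearchMode.Semantic);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;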

&lt;h2&gt;
  
  
  Three Search Modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Keyword Search (BM25)
&lt;/h3&gt;

&lt;p&gt;Matches documents that contain the query words. The same algorithm that powers Elasticsearch and Solr.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"return policy refund"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Keyword&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  7.8795: Our return policy allows customers to return any unused item
          within 30 days of purchase for a full refund...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because the query words — "return", "policy", "refund" — appear in the document. If you searched for "send items back and get money" instead, keyword search would find nothing.&lt;/p&gt;

&lt;p&gt;For the theory behind BM25, see &lt;a href="https://olafuraron.is/blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF: Keyword Search Explained&lt;/a&gt;.&lt;/p&gt;
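
&lt;p&gt;To make the failure mode concrete, here's a toy overlap counter. This is not Kjarni's BM25, just the literal-match idea it builds on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Toy sketch: a document only scores on query terms that literally
// appear in it. Real BM25 adds IDF weighting, term frequency
// saturation, and document length normalization on top of this.
static int TermOverlap(string query, string doc)
{
    var docTerms = doc.ToLowerInvariant()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .ToHashSet();
    return query.ToLowerInvariant().Split(' ').Count(docTerms.Contains);
}

// "return policy refund" overlaps heavily with the returns document;
// "send items back and get money" barely overlaps at all.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;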

&lt;h3&gt;
  
  
  Semantic Search
&lt;/h3&gt;

&lt;p&gt;Matches documents by meaning, regardless of the exact words used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"can I send items back and get money?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Semantic&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This finds the returns document even though none of those exact words appear in it. The embedding model understands that "send items back" means "return" and "get money" means "refund."&lt;/p&gt;

&lt;p&gt;For how embeddings and similarity work, see &lt;a href="https://dev.to/olafuraron/semantic-search-in-csharp-without-python"&gt;Semantic Search in C#&lt;/a&gt;.&lt;/p&gt;
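
&lt;p&gt;You can observe the same effect with the package's &lt;code&gt;Embedder&lt;/code&gt; (covered in that article). A quick sketch; the exact scores will vary, but the paraphrase should land well above the unrelated sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using var embedder = new Embedder("minilm-l6-v2");

// The paraphrase scores much higher than the unrelated sentence.
Console.WriteLine(embedder.Similarity(
    "can I send items back and get money?",
    "return an item for a full refund"));
Console.WriteLine(embedder.Similarity(
    "can I send items back and get money?",
    "the weather today is sunny"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;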

&lt;h3&gt;
  
  
  Hybrid Search
&lt;/h3&gt;

&lt;p&gt;Combines keyword and semantic results. This is usually the best default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"how do returns work?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hybrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   1.3282: Our return policy allows customers to return any unused item
           within 30 days of purchase for a full refund. Items must be in
           their original packaging. Shipping costs are non-refundable.

 -10.5874: To reset your password, click "Forgot Password" on the login
           page. You will receive an email with a reset link. The link
           expires after 24 hours.

 -11.0939: We ship to all 50 US states and internationally to over 40
           countries. Standard shipping takes 5-7 business days. Express
           shipping is available for an additional fee.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hybrid search catches both exact keyword matches and semantically related content. The scores are from the reranker (more on that below), which is why the gap between relevant and irrelevant results is so large. The returns document scores 1.3, while the other two are deep in the negatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reranking
&lt;/h2&gt;

&lt;p&gt;The results above use a cross-encoder reranker. This is the difference between good search and great search.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Embeddings Alone
&lt;/h3&gt;

&lt;p&gt;Embedding models are fast because they encode the query and each document independently. But this means they can't model the interaction between query and document directly. They're comparing summaries, not reading both texts together.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Reranking Fixes This
&lt;/h3&gt;

&lt;p&gt;A cross-encoder takes the query and a document as a single input and outputs a relevance score. It reads both texts at the same time, so it can attend to specific words in the document that answer the specific question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bi-encoder (embedding):     Query -&amp;gt; Vector    Document -&amp;gt; Vector    Compare
Cross-encoder (reranker):   [Query + Document] -&amp;gt; Relevance Score
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cross-encoder is slower because it processes each query-document pair individually. That's why it's used as a second stage: the embedding model retrieves candidates quickly, then the cross-encoder reranks the top results precisely.&lt;/p&gt;
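
&lt;p&gt;Hybrid mode runs both stages for you, but they're easy to spell out with the package's own pieces. A sketch, assuming &lt;code&gt;docs&lt;/code&gt; is a string array of your chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using var embedder = new Embedder("minilm-l6-v2");
using var reranker = new Reranker();

// Stage 1 (cheap): encode once, retrieve candidates by cosine similarity.
var docVectors = embedder.EncodeBatch(docs);
var queryVec = embedder.Encode("how do returns work?");

var candidates = docs
    .Select((doc, i) =&amp;gt; (doc, score: Embedder.CosineSimilarity(queryVec, docVectors[i])))
    .OrderByDescending(x =&amp;gt; x.score)
    .Take(20)
    .Select(x =&amp;gt; x.doc)
    .ToArray();

// Stage 2 (precise): rerank only the survivors with the cross-encoder.
var reranked = reranker.Rerank("how do returns work?", candidates);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;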

&lt;h3&gt;
  
  
  Using the Reranker Directly
&lt;/h3&gt;

&lt;p&gt;You can also use the reranker on its own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;reranker&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Reranker&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"What is machine learning?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"Machine learning is a subset of artificial intelligence."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"Deep learning uses neural networks with many layers."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"The weather today is sunny."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;F4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  10.5139: Machine learning is a subset of artificial intelligence.
  -5.5301: Deep learning uses neural networks with many layers.
 -11.1001: The weather today is sunny.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scores are logits, not probabilities. What matters is the relative ordering and the gap between scores. A positive score means the cross-encoder thinks the document is relevant. A negative score means it's not.&lt;/p&gt;
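
&lt;p&gt;If you want a probability-style number anyway, squashing the logit through a sigmoid is the usual transform. This assumes the cross-encoder was trained with a binary relevance objective, which isn't documented here, so treat it as a heuristic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Sigmoid maps any logit into (0, 1). A heuristic here, not a
// calibrated probability from this particular model.
static double ToProbability(double logit) =&amp;gt; 1.0 / (1.0 + Math.Exp(-logit));

// ToProbability(10.5139)  is about 0.99997
// ToProbability(-11.1001) is about 0.000015
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;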

&lt;h2&gt;
  
  
  The Full Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's how the pieces fit together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query
  |
  +-- BM25 Keyword Index ----&amp;gt; Top N candidates by word match
  |
  +-- Vector Index ----------&amp;gt; Top N candidates by meaning
  |
  v
  Merge candidates (union or intersection)
  |
  v
  Cross-Encoder Reranker ----&amp;gt; Final ranked results
  |
  v
  Return to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage filters and refines. BM25 is cheap and catches exact matches. The vector index catches semantic matches that keywords miss. The reranker reads both query and document together to produce a precise ranking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;indexer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Indexer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quiet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;indexer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"docs/"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Searcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rerankerModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2-cross-encoder"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Hybrid = BM25 + Semantic + Reranker&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"how do returns work?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hybrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to Use Each Mode
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Misses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Keyword&lt;/td&gt;
&lt;td&gt;Exact terms, error codes, IDs&lt;/td&gt;
&lt;td&gt;Synonyms, rephrased queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Intent matching, fuzzy queries&lt;/td&gt;
&lt;td&gt;Exact phrases, rare terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;General purpose (recommended)&lt;/td&gt;
&lt;td&gt;Slightly slower&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with Hybrid. Switch to Keyword if your users search for exact identifiers. Switch to Semantic if your users describe what they want in natural language.&lt;/p&gt;
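
&lt;p&gt;If different users need different behavior, the mode is just a parameter. One possible heuristic (an illustration, not a library feature), assuming &lt;code&gt;searcher&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt; from earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Route identifier-looking queries (error codes, SKUs) to Keyword,
// everything else to Hybrid.
var mode = System.Text.RegularExpressions.Regex.IsMatch(query, @"^[A-Z0-9_\-]+$")
    ? SearchMode.Keyword
    : SearchMode.Hybrid;

var results = searcher.Search("my_index", query, mode: mode);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;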

&lt;h2&gt;
  
  
  Practical Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Filtering Results
&lt;/h3&gt;

&lt;p&gt;Apply a score threshold to filter out irrelevant results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hybrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With reranking, a score above 0 is a reasonable default threshold for "probably relevant."&lt;/p&gt;

&lt;h3&gt;
  
  
  Search + Classification
&lt;/h3&gt;

&lt;p&gt;Find relevant documents, then classify their sentiment, combining search and classification in a single pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Searcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;searcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"reviews_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"battery life"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SearchMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hybrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sentiment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;a href="https://dev.to/olafur_aron/sentiment-analysis-in-c-without-python-or-external-apis-3jpm"&gt;Sentiment Analysis in C# Without Python&lt;/a&gt; for more on classification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Re-indexing
&lt;/h3&gt;

&lt;p&gt;When documents change, re-create the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;indexer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"docs/"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rebuilds the full index. For large corpora where incremental updates matter, you'd manage the vector storage separately.&lt;/p&gt;
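
&lt;p&gt;A sketch of what managing the vectors yourself could look like, using &lt;code&gt;Embedder&lt;/code&gt; directly. The hashing and the dictionary here are placeholders, not Kjarni features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using var embedder = new Embedder("minilm-l6-v2");

// Your own store: path to (content hash, vector). Persist it however
// you like (JSON, SQLite, ...). Only changed files get re-encoded.
var store = new Dictionary&amp;lt;string, (string Hash, float[] Vector)&amp;gt;();

foreach (var path in Directory.EnumerateFiles("docs/"))
{
    var text = File.ReadAllText(path);
    var hash = Convert.ToHexString(
        System.Security.Cryptography.SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(text)));

    if (!store.TryGetValue(path, out var entry) || entry.Hash != hash)
        store[path] = (hash, embedder.Encode(text));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;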

&lt;h2&gt;
  
  
  How It Compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Offline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Cluster + config&lt;/td&gt;
&lt;td&gt;Server costs&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure AI Search&lt;/td&gt;
&lt;td&gt;Portal + API key&lt;/td&gt;
&lt;td&gt;Per-query pricing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Algolia&lt;/td&gt;
&lt;td&gt;Dashboard + API key&lt;/td&gt;
&lt;td&gt;Per-search pricing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kjarni&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dotnet add package&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tradeoff: Kjarni runs in-process on a single machine. If you need distributed search across billions of documents, use Elasticsearch. If you need search over thousands to millions of documents on a single server, a local engine works well and eliminates a dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;Kjarni builds two indexes per collection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25 index&lt;/strong&gt; — inverted index over tokenized text, with term frequency saturation and document length normalization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector index&lt;/strong&gt; — encoded embeddings for each chunk, queried by cosine similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At search time, both indexes return candidates. The results are merged and optionally reranked by a cross-encoder model that reads the query and each candidate together.&lt;/p&gt;

&lt;p&gt;The engine is written in Rust. The C# package wraps a single native library. There is no Python runtime, no JVM, and no external service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NuGet:  https://www.nuget.org/packages/Kjarni
GitHub: https://github.com/olafurjohannsson/kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Other Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/semanticsearch" rel="noopener noreferrer"&gt;Semantic Search in C#&lt;/a&gt; - Embeddings and similarity from scratch&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/documentsearchengine" rel="noopener noreferrer"&gt;Build a Document Search Engine in C#&lt;/a&gt; - Full hybrid search with indexing and reranking&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF: Keyword Search Explained&lt;/a&gt; - How keyword search works under the hood&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/vectorembeddings" rel="noopener noreferrer"&gt;What are Vector Embeddings?&lt;/a&gt; - How machines understand meaning through numbers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Semantic Search in C# Without Python</title>
      <dc:creator>Ólafur Aron Jóhannsson</dc:creator>
      <pubDate>Sat, 14 Feb 2026 10:51:36 +0000</pubDate>
      <link>https://forem.com/olafur_aron/semantic-search-in-c-without-python-348m</link>
      <guid>https://forem.com/olafur_aron/semantic-search-in-c-without-python-348m</guid>
      <description>&lt;h1&gt;
  
  
  Semantic Search in C# — Without a Vector Database
&lt;/h1&gt;

&lt;p&gt;Keyword search finds documents that contain the words you typed. Semantic search finds documents that mean what you meant.&lt;/p&gt;

&lt;p&gt;Search for "how to change my login credentials". Keyword search returns nothing because none of your documents contain those exact words. Semantic search returns "How do I reset my password?" because the meaning is the same.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package Kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://nuget.org/packages/kjarni" rel="noopener noreferrer"&gt;NuGet&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Kjarni&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doctor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"physician"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// 0.8598&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doctor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"banana"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;    &lt;span class="c1"&gt;// 0.3379&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key. No Python. No vector database. The model runs locally on CPU.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Semantic Search Works
&lt;/h2&gt;

&lt;p&gt;The core idea: convert text into numbers that capture meaning.&lt;/p&gt;

&lt;p&gt;A sentence embedding model reads text and outputs a vector — an array of floating-point numbers, typically 384 or 768 dimensions. Texts with similar meaning produce vectors that are close together. Texts with different meaning produce vectors that are far apart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"doctor"    -&amp;gt; [0.12, -0.34, 0.56, 0.78, ...]  (384 numbers)
"physician" -&amp;gt; [0.11, -0.33, 0.55, 0.79, ...]  (384 numbers)  &amp;lt;- close
"banana"    -&amp;gt; [-0.45, 0.23, -0.12, 0.01, ...]  (384 numbers)  &amp;lt;- far
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You measure the distance between vectors using cosine similarity. The score ranges from -1 (opposite) to 1 (identical).&lt;/p&gt;

&lt;p&gt;For a deeper explanation of how embeddings work, see &lt;a href="https://olafuraron.is/blog/vectorembeddings" rel="noopener noreferrer"&gt;What are Vector Embeddings?&lt;/a&gt;.&lt;/p&gt;
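
&lt;p&gt;For reference, the computation itself is small: the dot product divided by the product of the vector lengths. A from-scratch version of what the library's similarity helper computes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Cosine similarity from first principles: dot(a, b) / (|a| * |b|).
// For unit-length vectors it reduces to the plain dot product.
static float Cosine(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i &amp;lt; a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;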

&lt;h2&gt;
  
  
  Encoding Text
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello world"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                          &lt;span class="c1"&gt;// 384&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;", "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;[..&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
&lt;span class="c1"&gt;// -0.034477282, 0.03102318, 0.006734989, 0.02610899, -0.03936202&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model downloads on first use (~90MB) and caches locally. Every call to &lt;code&gt;Encode()&lt;/code&gt; runs the full transformer: tokenization, attention layers, mean pooling, normalization. The output is a unit-length vector ready for cosine similarity.&lt;/p&gt;
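
&lt;p&gt;The unit length is easy to verify, since the L2 norm of a normalized vector should come out at 1.0:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;float[] v = embedder.Encode("Hello world");
float norm = MathF.Sqrt(v.Sum(x =&amp;gt; x * x));
Console.WriteLine(norm); // ~1.0, up to floating-point error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;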

&lt;p&gt;These are the same vectors you'd get from Python's &lt;code&gt;sentence-transformers&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [-0.03447726  0.03102319  0.00673499  0.02610895 -0.03936201]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same model, same weights, same output, apart from floating-point rounding in the last couple of digits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring Similarity
&lt;/h2&gt;

&lt;p&gt;Cosine similarity tells you how close two vectors are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doctor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"physician"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doctor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"hospital"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doctor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"banana"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"dog"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"car"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"machine learning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"artificial intelligence"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"machine learning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"potato soup"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="n"&gt;F4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\" / \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  0.8598  "doctor" / "physician"
  0.5971  "doctor" / "hospital"
  0.3379  "doctor" / "banana"
  0.6606  "cat" / "dog"
  0.4633  "cat" / "car"
  0.7035  "machine learning" / "artificial intelligence"
  0.1848  "machine learning" / "potato soup"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scores match intuition. "Doctor" and "physician" are near-synonyms (0.86). "Cat" and "dog" are related but different (0.66). "Machine learning" and "potato soup" have almost nothing in common (0.18).&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Search
&lt;/h2&gt;

&lt;p&gt;Here's the pattern. Encode your documents once. Encode the query at search time. Rank by cosine similarity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"How do I reset my password?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is your refund policy?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Do you ship internationally?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I update my billing address?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Where can I track my order?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Encode all documents (do this once, store the vectors)&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EncodeBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Search&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I need to change my login credentials"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;F4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  0.5981: How do I reset my password?
  0.4067: How do I update my billing address?
  0.0767: Where can I track my order?
 -0.0027: What is your refund policy?
 -0.0451: Do you ship internationally?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Change my login credentials" matches "reset my password" at 0.60 despite sharing zero keywords. That's the entire value of semantic search.&lt;/p&gt;

&lt;p&gt;"Update my billing address" comes second. The model understands that changing account information is related even though the specific fields differ.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ Matching
&lt;/h2&gt;

&lt;p&gt;Route support tickets to the most relevant FAQ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;faqs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"How do I cancel my subscription?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I get a refund?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I change my email address?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What payment methods do you accept?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I contact support?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;faqVectors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EncodeBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faqs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nf"&gt;MatchFaq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;userQuestion&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryVec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userQuestion&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;faqs&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;faq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryVec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;faqVectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;First&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.4&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;faq&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"No matching FAQ found."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;MatchFaq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I want to stop paying"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="c1"&gt;// How do I cancel my subscription?&lt;/span&gt;

&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;MatchFaq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Can I pay with crypto?"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="c1"&gt;// What payment methods do you accept?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Encode your FAQs once at startup, store the vectors, and only encode the user's query at request time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deduplication
&lt;/h2&gt;

&lt;p&gt;Find near-duplicate content in a dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;GetAllDocuments&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EncodeBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;duplicates&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&amp;gt;();&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sim&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;duplicates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A threshold of 0.85 catches rephrased content while ignoring merely related documents.&lt;/p&gt;
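
&lt;p&gt;The pairwise loop is O(n²), fine for a few thousand documents but worth knowing about before pointing it at millions. Once you have the pairs, dropping the later copy of each one is short (continuing the snippet above; the tuple's second element is always the higher-indexed text):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Keep the first occurrence of each near-duplicate pair, drop the rest.
var toDrop = new HashSet&lt;string&gt;(duplicates.Select(d =&gt; d.Item2));
var unique = texts.Where(t =&gt; !toDrop.Contains(t)).ToArray();
Console.WriteLine($"Removed {texts.Length - unique.Length} near-duplicates");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
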

&lt;h2&gt;
  
  
  Combining with Sentiment
&lt;/h2&gt;

&lt;p&gt;Find relevant reviews about a topic, then check their sentiment. See &lt;a href="https://dev.to/olafuraron/sentiment-analysis-in-csharp-without-python"&gt;Sentiment Analysis in C#&lt;/a&gt; for more on the classification side.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"battery life"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Choosing a Model
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minilm-l6-v2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mpnet-base-v2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with &lt;code&gt;minilm-l6-v2&lt;/code&gt;. It's one of the most widely used embedding models in production and handles most use cases well. Switch to &lt;code&gt;mpnet-base-v2&lt;/code&gt; if you need higher quality and can afford the extra latency and memory.&lt;/p&gt;

&lt;p&gt;Both models have a 512-token input limit (~300-400 words). Longer text gets truncated. If your documents are long, split them into chunks first — which is exactly what the &lt;a href="https://dev.to/olafuraron/build-a-document-search-engine-in-csharp-without-python"&gt;document search engine&lt;/a&gt; does for you automatically.&lt;/p&gt;
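
&lt;p&gt;If you're embedding long text yourself, a rough word-based chunker is enough to stay under the limit. A sketch; the 350-word window and 50-word overlap are arbitrary starting points, not library defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Linq;

// Split text into ~350-word chunks with 50 words of overlap, so a
// sentence straddling a boundary appears in both neighboring chunks.
static IEnumerable&lt;string&gt; Chunk(string text, int size = 350, int overlap = 50)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int start = 0; start &lt; words.Length; start += size - overlap)
        yield return string.Join(' ', words.Skip(start).Take(size));
}

// then embed each chunk with the embedder from the earlier snippets:
// var chunkVectors = embedder.EncodeBatch(Chunk(longDocument).ToArray());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
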

&lt;h2&gt;
  
  
  When to Use Semantic Search vs Keyword Search
&lt;/h2&gt;

&lt;p&gt;Semantic search is not always better than keyword search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic search works best when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users don't know the exact terminology&lt;/li&gt;
&lt;li&gt;You're matching intent, not words ("change login" → "reset password")&lt;/li&gt;
&lt;li&gt;Documents are short (FAQs, product descriptions, support tickets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Keyword search works best when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users search for exact terms (error codes, product IDs, proper nouns)&lt;/li&gt;
&lt;li&gt;You need exact phrase matching&lt;/li&gt;
&lt;li&gt;Documents are long and keyword frequency matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best approach is usually both. See &lt;a href="https://dev.to/olafuraron/build-a-document-search-engine-in-csharp-without-python"&gt;Build a Document Search Engine in C#&lt;/a&gt; for a hybrid search implementation that combines BM25 keyword search with semantic vectors and reranking.&lt;/p&gt;

&lt;p&gt;For the theory behind keyword search, see &lt;a href="https://olafuraron.is/blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF: Keyword Search Explained&lt;/a&gt;.&lt;/p&gt;
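
&lt;p&gt;To make the idea concrete, the simplest hybrid is a weighted blend of the two scores. This is a sketch, not the linked implementation: BM25 scores are unbounded, so they're min-max normalized into [0, 1] before mixing with cosine similarity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Linq;

// Blend normalized BM25 with cosine similarity; alpha weights keyword
// vs. semantic evidence (0.5 = equal weight; tune on your data).
static double[] HybridScores(double[] bm25, double[] semantic, double alpha = 0.5)
{
    double min = bm25.Min(), max = bm25.Max();
    return bm25.Select((b, i) =&gt;
    {
        var norm = max &gt; min ? (b - min) / (max - min) : 0.0;
        return alpha * norm + (1 - alpha) * semantic[i];
    }).ToArray();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
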

&lt;h2&gt;
  
  
  How This Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/olafurjohannsson/kjarni" rel="noopener noreferrer"&gt;Kjarni&lt;/a&gt; loads HuggingFace sentence-transformer models directly from safetensors. The inference engine is written in Rust. The C# package wraps a single native library.&lt;/p&gt;

&lt;p&gt;The outputs match Python's &lt;code&gt;sentence-transformers&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NuGet:  https://www.nuget.org/packages/Kjarni
GitHub: https://github.com/olafurjohannsson/kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Other Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/semanticsearch" rel="noopener noreferrer"&gt;Semantic Search in C#&lt;/a&gt; - Embeddings and similarity from scratch&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/documentsearchengine" rel="noopener noreferrer"&gt;Build a Document Search Engine in C#&lt;/a&gt; - Full hybrid search with indexing and reranking&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF: Keyword Search Explained&lt;/a&gt; - How keyword search works under the hood&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/vectorembeddings" rel="noopener noreferrer"&gt;What are Vector Embeddings?&lt;/a&gt; - How machines understand meaning through numbers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>machinelearning</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Sentiment Analysis in C# - Without Python or External APIs</title>
      <dc:creator>Ólafur Aron Jóhannsson</dc:creator>
      <pubDate>Fri, 13 Feb 2026 09:13:20 +0000</pubDate>
      <link>https://forem.com/olafur_aron/sentiment-analysis-in-c-without-python-or-external-apis-3jpm</link>
      <guid>https://forem.com/olafur_aron/sentiment-analysis-in-c-without-python-or-external-apis-3jpm</guid>
      <description>&lt;h1&gt;
  
  
  Sentiment Analysis
&lt;/h1&gt;

&lt;p&gt;You have text. You want to know if it's positive, negative, or neutral.&lt;/p&gt;

&lt;p&gt;The usual options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure Cognitive Services&lt;/strong&gt; - API call per request, pay per character&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML.NET&lt;/strong&gt; - train your own model, bring your own dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python sidecar&lt;/strong&gt; - run Flask next to your .NET app, serialize everything as JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a simpler option. Load a pretrained transformer model and run it locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package Kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://nuget.org/packages/kjarni" rel="noopener noreferrer"&gt;NuGet&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Kjarni&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I love this product!"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="c1"&gt;// positive (98.5%)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model downloads on first use and is cached locally. It runs on CPU by default; GPU is also available.&lt;br&gt;
No API key. No Python. No container.&lt;/p&gt;
&lt;h2&gt;
  
  
  How Sentiment Models Work
&lt;/h2&gt;

&lt;p&gt;A sentiment classifier is a neural network trained on labeled text.&lt;br&gt;
The model has seen millions of examples like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Text&lt;/th&gt;
&lt;th&gt;Label&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"This movie was fantastic"&lt;/td&gt;
&lt;td&gt;positive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Terrible customer service"&lt;/td&gt;
&lt;td&gt;negative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"The package arrived on Tuesday"&lt;/td&gt;
&lt;td&gt;neutral&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At inference time, the model reads your input text, encodes it into a&lt;br&gt;
high-dimensional vector, and runs that vector through a classification&lt;br&gt;
head that outputs a probability for each label.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input text -&amp;gt; Tokenizer -&amp;gt; Transformer Encoder -&amp;gt; Classification Head -&amp;gt; Probabilities
                                                                         positive: 98.5%
                                                                         neutral:   1.2%
                                                                         negative:  0.3%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You pass in text, you get back a label and a score.&lt;/p&gt;
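
&lt;p&gt;The last step, turning the head's raw outputs (logits) into probabilities, is a plain softmax. A minimal sketch with made-up logits that reproduce the distribution above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Linq;

// Softmax: exponentiate each logit and divide by the sum, so the
// results are positive and sum to 1.0. Subtracting the max first is
// the standard numerical-stability trick.
static double[] Softmax(double[] logits)
{
    var max = logits.Max();
    var exps = logits.Select(l =&gt; Math.Exp(l - max)).ToArray();
    var sum = exps.Sum();
    return exps.Select(e =&gt; e / sum).ToArray();
}

// Hypothetical logits for (positive, neutral, negative):
var probs = Softmax(new[] { 4.1, -0.3, -1.7 });
// ~0.985, 0.012, 0.003, i.e. the 98.5% / 1.2% / 0.3% shown above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
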

&lt;h2&gt;
  
  
  Three-Class Sentiment
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;roberta-sentiment&lt;/code&gt; model classifies text as &lt;strong&gt;positive&lt;/strong&gt;, &lt;strong&gt;negative&lt;/strong&gt;, or &lt;strong&gt;neutral&lt;/strong&gt;.&lt;br&gt;
It was trained on ~124M tweets and handles informal text, slang, and emoji well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"I love this product!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Terrible quality, broke after one day."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"It's okay I guess."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"The packaging was nice but the product itself was mediocre."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Just received my order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;positive (98.5%)  "I love this product!"
negative (94.1%)  "Terrible quality, broke after one day."
positive (52.9%)  "It's okay I guess."
negative (81.4%)  "The packaging was nice but the product itself was mediocre."
positive (57.5%)  "Just received my order"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the nuance. "It's okay I guess" is technically positive but barely, at 52.9%.&lt;br&gt;
The model picks up on hedging. "The packaging was nice but the product itself was mediocre"&lt;br&gt;
is classified negative because the overall sentiment leans that way despite the&lt;br&gt;
positive clause.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting All Scores
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Classify()&lt;/code&gt; returns the top label. To see the full probability distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"The packaging was nice but the product itself was mediocre."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToJson&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.8138&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"predictions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.8138&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1615&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0247&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scores sum to 1.0. The model is 81.4% confident this is negative,&lt;br&gt;
16.2% neutral, 2.5% positive. In production, you might treat anything&lt;br&gt;
below 70% confidence as "uncertain" rather than taking the label at face value.&lt;/p&gt;
&lt;h2&gt;
  
  
  Five-Star Sentiment (Multilingual)
&lt;/h2&gt;

&lt;p&gt;If you need finer granularity, &lt;code&gt;bert-sentiment-multilingual&lt;/code&gt; maps text to a 1-5 star rating.&lt;br&gt;
It works across English, German, French, Spanish, Italian, and Portuguese.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bert-sentiment-multilingual"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Absolutely amazing!"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"es"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Esta es la peor compra que he hecho."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"de"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Das ist ganz okay."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"C'est terrible."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"it"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Non male, ma potrebbe essere meglio."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"[&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[en] 5 stars (96.7%)  "Absolutely amazing!"
[es] 1 star (94.1%)   "Esta es la peor compra que he hecho."
[de] 3 stars (77.7%)  "Das ist ganz okay."
[fr] 1 star (70.4%)   "C'est terrible."
[it] 3 stars (83.7%)  "Non male, ma potrebbe essere meglio."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same API. The model is multilingual, so there's no separate language-detection step.&lt;/p&gt;
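
&lt;p&gt;Because the labels are strings like "3 stars", computing an average rating takes one small parsing step. A sketch that assumes the label's first token is the digit, matching the output above (&lt;code&gt;reviews&lt;/code&gt; is any collection of review strings):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Average star rating over a batch of reviews. Label is the top label
// ("1 star" ... "5 stars"), so its leading token is the rating.
var average = reviews
    .Select(r =&gt; classifier.Classify(r).Label)
    .Select(label =&gt; int.Parse(label.Split(' ')[0]))
    .Average();
Console.WriteLine($"Average rating: {average:F1} stars");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
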

&lt;h2&gt;
  
  
  Emotion Detection
&lt;/h2&gt;

&lt;p&gt;Sentiment tells you positive or negative. Emotion tells you &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;distilroberta-emotion&lt;/code&gt; model classifies text into seven emotions:&lt;br&gt;
anger, disgust, fear, joy, neutral, sadness, and surprise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"distilroberta-emotion"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"I just got promoted!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"My dog passed away yesterday."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I can't believe they did that to me."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I'm so nervous about the interview tomorrow."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;surprise (50.7%)  "I just got promoted!"
sadness (98.4%)   "My dog passed away yesterday."
surprise (89.2%)  "I can't believe they did that to me."
fear (99.4%)      "I'm so nervous about the interview tomorrow."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"I just got promoted!" is interesting. The model sees it as surprise more than joy.&lt;br&gt;
If you need the full breakdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I just got promoted!"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToJson&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"surprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5066&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"predictions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"surprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5066&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2376&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"joy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0980&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0664&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"disgust"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0658&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sadness"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0221&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fear"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0035&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For finer-grained emotions, the &lt;code&gt;roberta-emotions&lt;/code&gt; model detects 28 labels&lt;br&gt;
including admiration, amusement, curiosity, gratitude, and others.&lt;/p&gt;
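
&lt;p&gt;It loads the same way, assuming the 28-label model goes through the same &lt;code&gt;Classifier&lt;/code&gt; API as the others; the label in the comment is what you'd expect, not captured output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using var classifier = new Classifier("roberta-emotions");
Console.WriteLine(classifier.Classify("Thanks so much, this made my day!"));
// expected: a label like gratitude, with a confidence score
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
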
&lt;h2&gt;
  
  
  Toxicity Detection
&lt;/h2&gt;

&lt;p&gt;Content moderation is a specific form of classification. The &lt;code&gt;toxic-bert&lt;/code&gt; model&lt;br&gt;
scores text across six categories simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"toxic-bert"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"You are an idiot"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToDetailedString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;             toxic   98.61%
            insult   96.00%
           obscene   75.64%
      severe_toxic    4.56%
     identity_hate    1.41%
            threat    0.13%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a &lt;strong&gt;multi-label&lt;/strong&gt; model, meaning multiple categories can be true at the same time.&lt;br&gt;
A comment can be both toxic and an insult. The scores are independent, not competing.&lt;/p&gt;
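
&lt;p&gt;That changes how you consume the result: check every category against its own threshold instead of taking only the top label. A sketch assuming a &lt;code&gt;Predictions&lt;/code&gt; collection that mirrors the JSON shown earlier, with a hypothetical &lt;code&gt;RejectComment&lt;/code&gt; handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Multi-label moderation: every category above the threshold counts
// as a violation, independently of the others.
var result = classifier.Classify(comment);
var violations = result.Predictions
    .Where(p =&gt; p.Score &gt; 0.80)
    .Select(p =&gt; p.Label)
    .ToList();

if (violations.Count &gt; 0)
    RejectComment(comment, violations); // hypothetical handler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
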

&lt;p&gt;Compare with something benign:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I respectfully disagree with your point"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ToDetailedString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;             toxic    0.07%
           obscene    0.02%
            insult    0.02%
     identity_hate    0.01%
            threat    0.01%
      severe_toxic    0.01%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All scores near zero. The model is confident this is not toxic.&lt;br&gt;
In production you'd set a threshold (say 80%) and only flag content above it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bulk Processing
&lt;/h2&gt;

&lt;p&gt;For analyzing many texts, loop through them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"Fast shipping, great product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Arrived damaged, no response from support"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Does what it says"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Best purchase I've made this year"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Meh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;positive (97.2%)  "Fast shipping, great product"
negative (85.3%)  "Arrived damaged, no response from support"
neutral (82.0%)   "Does what it says"
positive (98.1%)  "Best purchase I've made this year"
neutral (62.3%)   "Meh"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Confidence Thresholding
&lt;/h3&gt;

&lt;p&gt;Don't trust low-confidence predictions blindly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;// High confidence, act on it&lt;/span&gt;
    &lt;span class="nf"&gt;SaveSentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="c1"&gt;// Low confidence, flag for review or use a neutral default&lt;/span&gt;
    &lt;span class="nf"&gt;FlagForReview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aggregating Sentiment
&lt;/h3&gt;

&lt;p&gt;To get the overall sentiment of a product from many reviews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;ToArray&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GroupBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToDictionary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// { "positive": 847, "negative": 121, "neutral": 232 }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Combining with Search
&lt;/h3&gt;

&lt;p&gt;Sentiment pairs well with &lt;a href="https://olafuraron.is/blog/semanticsearch" rel="noopener noreferrer"&gt;semantic search&lt;/a&gt;.&lt;br&gt;
Find relevant documents first, then analyze their sentiment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"minilm-l6-v2"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"roberta-sentiment"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Find reviews about shipping&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"shipping and delivery experience"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Classify only the relevant ones&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;  \"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Which Model to Use
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Labels&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quick positive/negative&lt;/td&gt;
&lt;td&gt;&lt;code&gt;distilbert-sentiment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2 (pos/neg)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Positive/negative/neutral&lt;/td&gt;
&lt;td&gt;&lt;code&gt;roberta-sentiment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Star rating (multilingual)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bert-sentiment-multilingual&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5 (1-5 stars)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic emotion&lt;/td&gt;
&lt;td&gt;&lt;code&gt;distilroberta-emotion&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-grained emotion&lt;/td&gt;
&lt;td&gt;&lt;code&gt;roberta-emotions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content moderation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;toxic-bert&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6 (multi-label)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start with &lt;code&gt;roberta-sentiment&lt;/code&gt;. If you need more detail, move to the multilingual&lt;br&gt;
or emotion models. They all work the same way: same API, different model name.&lt;/p&gt;
&lt;h2&gt;
  
  
  How This Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/olafurjohannsson/kjarni" rel="noopener noreferrer"&gt;Kjarni&lt;/a&gt; loads HuggingFace models&lt;br&gt;
directly from safetensors using memory-mapped I/O. The inference engine is&lt;br&gt;
written in Rust with hand-tuned SIMD kernels. The C# package wraps a single&lt;br&gt;
native library.&lt;/p&gt;

&lt;p&gt;These are the same models used by Python's &lt;code&gt;transformers&lt;/code&gt; and &lt;code&gt;sentence-transformers&lt;/code&gt;&lt;br&gt;
libraries. The outputs match to four decimal places. The difference is you don't&lt;br&gt;
need a Python runtime or 2GB of pip dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NuGet:  https://www.nuget.org/packages/Kjarni
GitHub: https://github.com/olafurjohannsson/kjarni
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Other Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/semanticsearch" rel="noopener noreferrer"&gt;Semantic Search in C#&lt;/a&gt; - Embeddings and similarity from scratch&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/documentsearchengine" rel="noopener noreferrer"&gt;Build a Document Search Engine in C#&lt;/a&gt; - Full hybrid search with indexing and reranking&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF: Keyword Search Explained&lt;/a&gt; - How keyword search works under the hood&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://olafuraron.is/blog/vectorembeddings" rel="noopener noreferrer"&gt;What are Vector Embeddings?&lt;/a&gt; - How machines understand meaning through numbers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>nlp</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>What are Vector Embeddings?</title>
      <dc:creator>Ólafur Aron Jóhannsson</dc:creator>
      <pubDate>Sun, 19 Oct 2025 07:00:25 +0000</pubDate>
      <link>https://forem.com/olafur_aron/what-are-vector-embeddings-1bag</link>
      <guid>https://forem.com/olafur_aron/what-are-vector-embeddings-1bag</guid>
      <description>&lt;h2&gt;
  
  
  It's just matrices
&lt;/h2&gt;

&lt;p&gt;Vector embeddings are an important piece of modern technology because they capture the&lt;br&gt;
meaning of natural language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've used them before&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix and Spotify use them for their recommendation systems (because you watched X...)&lt;/li&gt;
&lt;li&gt;Duplicate detection&lt;/li&gt;
&lt;li&gt;Retrieval Augmented Generation (RAG) systems retrieve relevant text from a corpus&lt;/li&gt;
&lt;li&gt;Content moderation&lt;/li&gt;
&lt;li&gt;Question answering, matching intent rather than just keywords (How do I reset my password -&amp;gt; Forgot login credentials)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Turn text into numbers in high-dimensional space. More dimensions = more detail about meaning. MiniLM uses 384. Bigger models go to 1024+, but cost more compute and memory.&lt;/p&gt;

&lt;p&gt;This is how computers know "doctor" and "physician" mean the same thing despite sharing zero letters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://olafuraron.is/blog/vectorembeddings" rel="noopener noreferrer"&gt;→ Live Interactive Demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc13dsu8cp6qo2obefai.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc13dsu8cp6qo2obefai.gif" alt=" " width="720" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Neural networks learn to map words to vectors by training on billions of text examples. Words that appear in similar contexts end up with similar vectors.&lt;/p&gt;

&lt;p&gt;The network learns patterns like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The &lt;strong&gt;doctor&lt;/strong&gt; prescribed medication" &lt;/li&gt;
&lt;li&gt;"The &lt;strong&gt;physician&lt;/strong&gt; prescribed medication"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since "doctor" and "physician" appear in similar contexts, they get similar embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 384 Dimensions?
&lt;/h2&gt;

&lt;p&gt;Each embedding is 384 numbers (for MiniLM-L6-v2). Why so many?&lt;/p&gt;

&lt;p&gt;Each dimension captures a different aspect of meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dimension 1 might encode "is this a profession?"&lt;/li&gt;
&lt;li&gt;Dimension 47 might encode "medical-related?"&lt;/li&gt;
&lt;li&gt;Dimension 203 might encode "human-related?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model learns these automatically from data. We can't interpret individual dimensions, but the full vector captures nuanced meaning.&lt;/p&gt;
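
&lt;p&gt;You can print the raw numbers; they just aren't individually interpretable. Reusing the embedder from the sketch above, and assuming &lt;code&gt;Encode&lt;/code&gt; returns a plain float array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Linq;

var vector = embedder.Encode("doctor");
Console.WriteLine(vector.Length);                      // 384 for minilm-l6-v2
Console.WriteLine(string.Join(", ", vector.Take(5)));  // first few dimensions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
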

&lt;h2&gt;
  
  
  Measuring Similarity
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cosine similarity&lt;/strong&gt; measures how "aligned" two vectors are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1.0 = identical direction (same meaning)&lt;/li&gt;
&lt;li&gt;0.0 = perpendicular (unrelated)&lt;/li&gt;
&lt;li&gt;-1.0 = opposite direction (antonyms)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;f32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;f32&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.sum&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;norm_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;f32&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="py"&gt;.sum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.sqrt&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;f32&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="py"&gt;.sum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.sqrt&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
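
&lt;p&gt;As a quick sanity check, here's a toy usage of the function above. The 4-dimensional vectors are purely illustrative (real MiniLM embeddings have 384 dimensions); they're the same example values used in the EdgeBERT article below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;fn main() {
    // illustrative 4-dimensional "embeddings"; real vectors come from a model
    let doctor    = [0.2f32, 0.5, -0.1, 0.8];
    let physician = [0.2f32, 0.4, -0.1, 0.7];
    let banana    = [0.9f32, -0.3, 0.6, -0.2];

    // similar meaning, similar direction: score near 1.0 (~0.998 here)
    println!("doctor vs physician: {:.3}", cosine_similarity(&amp;amp;doctor, &amp;amp;physician));

    // unrelated meaning: score near zero or below (~-0.17 here)
    println!("doctor vs banana:    {:.3}", cosine_similarity(&amp;amp;doctor, &amp;amp;banana));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;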



&lt;h2&gt;
  
  
  What You Can Build With This
&lt;/h2&gt;

&lt;p&gt;Once you have embeddings, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; - Find documents by meaning, not keywords&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clustering&lt;/strong&gt; - Group similar items together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recommendations&lt;/strong&gt; - "Users who liked X also liked Y"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Duplicate Detection&lt;/strong&gt; - Find similar content with different wording&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt; - Categorize text by meaning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All powered by comparing vectors.&lt;/p&gt;
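
&lt;p&gt;To make that concrete, here's a minimal semantic-search sketch in Rust using EdgeBERT (the library behind this demo, introduced below) together with the &lt;code&gt;cosine_similarity&lt;/code&gt; function above. It assumes &lt;code&gt;encode&lt;/code&gt; takes a list of strings and returns one vector per input, as in the EdgeBERT examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use edgebert::{Model, ModelType};

fn main() -&amp;gt; Result&amp;lt;(), Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    let model = Model::from_pretrained(ModelType::MiniLML6V2)?;

    let docs = vec![
        "The physician prescribed medication",
        "Bananas are rich in potassium",
    ];

    // assumption: encode returns one embedding (Vec&amp;lt;f32&amp;gt;) per input string
    let doc_vecs = model.encode(docs.clone(), true)?;
    let query_vecs = model.encode(vec!["doctor"], true)?;
    let query_vec = &amp;amp;query_vecs[0];

    // rank documents by cosine similarity to the query
    let mut ranked: Vec&amp;lt;_&amp;gt; = docs
        .iter()
        .zip(doc_vecs.iter())
        .map(|(doc, vec)| (cosine_similarity(query_vec, vec), doc))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&amp;amp;a.0));

    // "doctor" should rank the medication sentence first,
    // despite sharing no keywords with it
    for (score, doc) in ranked {
        println!("{score:.3}  {doc}");
    }
    Ok(())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;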

&lt;h2&gt;
  
  
  The Model Behind This Demo
&lt;/h2&gt;

&lt;p&gt;This demo uses &lt;a href="https://olafuraron.is/blog/edgebert" rel="noopener noreferrer"&gt;EdgeBERT&lt;/a&gt;, my pure Rust BERT model inference implementation that runs in browsers via WebAssembly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (384 dimensions)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Other Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/olafurjohannsson/edgebert" rel="noopener noreferrer"&gt;EdgeBERT on GitHub&lt;/a&gt; - The WASM library powering this demo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also see &lt;a href="https://olafuraron.is//blog/bm25vstfidf" rel="noopener noreferrer"&gt;BM25 vs TF-IDF&lt;/a&gt; for a comparison of the two keyword-ranking functions.&lt;/p&gt;




</description>
      <category>machinelearning</category>
      <category>webassembly</category>
      <category>rust</category>
      <category>nlp</category>
    </item>
    <item>
      <title>EdgeBERT: I Built My Own Neural Network Inference Engine in Rust</title>
      <dc:creator>Ólafur Aron Jóhannsson</dc:creator>
      <pubDate>Fri, 12 Sep 2025 10:52:04 +0000</pubDate>
      <link>https://forem.com/olafur_aron/edgebert-i-built-my-own-neural-network-inference-engine-in-rust-3l29</link>
      <guid>https://forem.com/olafur_aron/edgebert-i-built-my-own-neural-network-inference-engine-in-rust-3l29</guid>
      <description>&lt;h1&gt;
  
  
  Lightweight BERT Embeddings in Rust (Sentence-Transformers Alternative Without Python)
&lt;/h1&gt;

&lt;p&gt;I needed semantic search in my Rust app, so users could search for &lt;em&gt;"doctor"&lt;/em&gt; and still find documents mentioning &lt;em&gt;"physician"&lt;/em&gt; or &lt;em&gt;"medical practitioner"&lt;/em&gt;. I wanted a lightweight BERT embeddings solution in Rust, small enough to run on edge devices, browsers, and servers without headaches.&lt;/p&gt;

&lt;p&gt;The trick is to turn text into vectors that capture meaning. Similar words → similar numbers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;doctor&lt;/code&gt;   → [0.2, 0.5, -0.1, 0.8, ...]
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;physician&lt;/code&gt;→ [0.2, 0.4, -0.1, 0.7, ...]  ✅ similar&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;banana&lt;/code&gt;   → [0.9, -0.3, 0.6, -0.2, ...] ❌ different&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  The Pain with Existing Solutions
&lt;/h1&gt;

&lt;p&gt;For comparison, the standard Python approach is... heavy:&lt;/p&gt;

&lt;p&gt;Just to generate embeddings, a fresh virtual environment ballooned to &lt;strong&gt;6.8 GB&lt;/strong&gt;, mostly PyTorch, tokenizers, and model weights.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ONNX Runtime
&lt;/h2&gt;

&lt;p&gt;But I was using Rust, and someone mentioned &lt;code&gt;ort&lt;/code&gt;, so I'd use ONNX Runtime from Rust. How hard could it be?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ort&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;.with_execution_providers&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nn"&gt;CUDAExecutionProvider&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;&lt;span class="nf"&gt;.commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// ... 150 lines total just to get encode to work&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Binary bloat
&lt;/h3&gt;

&lt;p&gt;In the end it worked, but &lt;code&gt;ort&lt;/code&gt; pulled in 80+ crates, expanded my release build to 350 MB, and relied on system libraries like &lt;code&gt;libstdc++&lt;/code&gt;, &lt;code&gt;libpthread&lt;/code&gt;, &lt;code&gt;libm&lt;/code&gt;, &lt;code&gt;libc&lt;/code&gt;. OpenSSL version mismatches added another headache.&lt;/p&gt;

&lt;h3&gt;
  
  
  System library conflicts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;error: OpenSSL 3.3 required
&lt;span class="nv"&gt;$ &lt;/span&gt;openssl version
OpenSSL 1.1.1k  &lt;span class="c"&gt;# Can't upgrade - would break RHEL dependencies&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The C++ dependencies wanted OpenSSL 3.3. My RHEL system had 1.1. Removing 1.1 would break half my system.&lt;/p&gt;

&lt;p&gt;I had originally set out to build a lightweight offline RAG solution, and this one dependency turned into a major challenge.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I actually wanted
&lt;/h3&gt;

&lt;p&gt;This was the API I was looking for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  The Solution
&lt;/h1&gt;

&lt;p&gt;So I built my own inference engine in Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;edgebert&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MiniLML6V2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="nf"&gt;.encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;5MB binary&lt;/li&gt;
&lt;li&gt;200MB RAM&lt;/li&gt;
&lt;li&gt;No dependency hell&lt;/li&gt;
&lt;li&gt;Same accuracy (0.9997 correlation)&lt;/li&gt;
&lt;li&gt;Optional BLAS feature flag&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Performance
&lt;/h1&gt;

&lt;p&gt;Initial benchmarks were promising, and after optimizing the matrix multiplication routines, here's how EdgeBERT stacks up against &lt;code&gt;sentence-transformers&lt;/code&gt; on a CPU:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;EdgeBERT&lt;/th&gt;
&lt;th&gt;sentence-transformers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-threaded&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1.76ms/sentence&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.02ms/sentence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-threaded (8)&lt;/td&gt;
&lt;td&gt;3.04ms/sentence&lt;/td&gt;
&lt;td&gt;3.90ms/sentence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default threads&lt;/td&gt;
&lt;td&gt;8.52ms/sentence&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2.88ms/sentence&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;👉 EdgeBERT is up to 3× faster in single-threaded scenarios. This is because MiniLM's matrices (384×384) are small, meaning the overhead from thread coordination can outweigh the benefits of parallelization. While sentence-transformers pulls ahead when using all cores on large batches, EdgeBERT's single-threaded efficiency is a key advantage for lightweight and edge applications.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x5p1xfgrxzr6wfpr97u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x5p1xfgrxzr6wfpr97u.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CPU performance is only half the story. Memory efficiency, especially RAM usage during encoding, is critical. We can see a significant difference here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcknhuffktpbvfy5nz4bg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcknhuffktpbvfy5nz4bg.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;EdgeBERT's memory footprint is not only smaller but also more stable, avoiding the large initial allocation spikes seen with the PyTorch-based solution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Accuracy
&lt;/h2&gt;

&lt;p&gt;Comparing with Python sentence-transformers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EdgeBERT:   [-0.0344, 0.0309, 0.0067, 0.0261, -0.0394, ...]
Python:     [-0.0345, 0.0310, 0.0067, 0.0261, -0.0394, ...]
Cosine similarity: 0.9997
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The differences come down to floating-point rounding; the outputs are 99.97% identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters (Even If You Don’t Do ML)
&lt;/h2&gt;

&lt;p&gt;This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smart search&lt;/strong&gt;: Users find what they mean, not just what they typed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better recommendations&lt;/strong&gt;: "If you liked X, you'll like Y" based on meaning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate detection&lt;/strong&gt;: Find similar issues/documents even with different wording
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content moderation&lt;/strong&gt;: Detect harmful content regardless of phrasing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG/AI features&lt;/strong&gt;: Give LLMs the right context without keyword matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in 5MB of Rust. No Python required.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use EdgeBERT when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need embeddings without Python&lt;/li&gt;
&lt;li&gt;Deployment size matters (5MB vs 6.8GB)&lt;/li&gt;
&lt;li&gt;Running on edge devices or browsers&lt;/li&gt;
&lt;li&gt;Memory is constrained&lt;/li&gt;
&lt;li&gt;Single-threaded performance matters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use sentence-transformers when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need GPU acceleration&lt;/li&gt;
&lt;li&gt;You use multiple model architectures&lt;/li&gt;
&lt;li&gt;You're already in the Python ecosystem&lt;/li&gt;
&lt;li&gt;You need the full Hugging Face stack&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  WebAssembly
&lt;/h2&gt;

&lt;p&gt;Since it's pure Rust with minimal dependencies, it compiles to WASM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;WasmModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;WasmModelType&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./pkg/edgebert.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;WasmModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;WasmModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MiniLML6V2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;429KB WASM binary + 30MB model weights. Runs in browsers.&lt;/p&gt;

&lt;p&gt;I had to implement the WordPiece tokenizer from scratch - the &lt;code&gt;tokenizers&lt;/code&gt; crate has C dependencies that don't compile to WASM.&lt;/p&gt;
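
&lt;p&gt;For reference, greedy longest-match-first WordPiece looks roughly like the sketch below. It's illustrative rather than EdgeBERT's exact code, and it assumes ASCII input and a vocabulary map already loaded from &lt;code&gt;vocab.txt&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::collections::HashMap;

// Greedy longest-match-first WordPiece for a single word.
// Continuation pieces carry a "##" prefix, as in BERT's vocab.
// Assumes ASCII input; byte slicing would panic on multi-byte chars.
fn wordpiece(word: &amp;amp;str, vocab: &amp;amp;HashMap&amp;lt;String, u32&amp;gt;) -&amp;gt; Option&amp;lt;Vec&amp;lt;u32&amp;gt;&amp;gt; {
    let mut ids = Vec::new();
    let mut start = 0;
    while start &amp;lt; word.len() {
        // shrink the candidate piece from the right until it's in the vocab
        let mut end = word.len();
        let mut found = None;
        while start &amp;lt; end {
            let piece = if start == 0 {
                word[start..end].to_string()
            } else {
                format!("##{}", &amp;amp;word[start..end])
            };
            if let Some(&amp;amp;id) = vocab.get(&amp;amp;piece) {
                found = Some((id, end));
                break;
            }
            end -= 1;
        }
        match found {
            Some((id, next)) =&amp;gt; {
                ids.push(id);
                start = next;
            }
            // no piece matched: real tokenizers emit [UNK] here
            None =&amp;gt; return None,
        }
    }
    Some(ids)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;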

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;BERT is matrix operations in a specific order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization&lt;/strong&gt; - WordPiece tokens to IDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt; - 384-dimensional vectors (word + position + segment)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-attention&lt;/strong&gt; - Q·K^T/√d, softmax, multiply by V&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feed-forward&lt;/strong&gt; - Linear, GELU, Linear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pooling&lt;/strong&gt; - Average tokens into sentence embedding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each transformer layer repeats attention and feed-forward, refining the representations. MiniLM has 6 layers.&lt;/p&gt;
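
&lt;p&gt;To ground steps 3 and 5, here's a rough single-head sketch of scaled dot-product attention and mean pooling over plain &lt;code&gt;Vec&amp;lt;Vec&amp;lt;f32&amp;gt;&amp;gt;&lt;/code&gt; matrices. It's illustrative only; the real implementation handles multiple attention heads and can use BLAS via the optional feature flag:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Naive dense matrix multiply over row-major Vec&amp;lt;Vec&amp;lt;f32&amp;gt;&amp;gt; matrices
fn matmul(a: &amp;amp;[Vec&amp;lt;f32&amp;gt;], b: &amp;amp;[Vec&amp;lt;f32&amp;gt;]) -&amp;gt; Vec&amp;lt;Vec&amp;lt;f32&amp;gt;&amp;gt; {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn transpose(m: &amp;amp;[Vec&amp;lt;f32&amp;gt;]) -&amp;gt; Vec&amp;lt;Vec&amp;lt;f32&amp;gt;&amp;gt; {
    (0..m[0].len())
        .map(|j| m.iter().map(|row| row[j]).collect())
        .collect()
}

// softmax over each row, so attention weights sum to 1 per query token
fn softmax_rows(m: &amp;amp;mut [Vec&amp;lt;f32&amp;gt;]) {
    for row in m.iter_mut() {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let sum: f32 = row.iter_mut().map(|x| { *x = (*x - max).exp(); *x }).sum();
        for x in row.iter_mut() { *x /= sum; }
    }
}

// step 3: attention(Q, K, V) = softmax(Q . K^T / sqrt(d)) . V
fn attention(q: &amp;amp;[Vec&amp;lt;f32&amp;gt;], k: &amp;amp;[Vec&amp;lt;f32&amp;gt;], v: &amp;amp;[Vec&amp;lt;f32&amp;gt;]) -&amp;gt; Vec&amp;lt;Vec&amp;lt;f32&amp;gt;&amp;gt; {
    let scale = (q[0].len() as f32).sqrt();
    let mut scores = matmul(q, &amp;amp;transpose(k));
    for row in scores.iter_mut() {
        for s in row.iter_mut() { *s /= scale; }
    }
    softmax_rows(&amp;amp;mut scores);
    matmul(&amp;amp;scores, v)
}

// step 5: average all token vectors into one sentence embedding
fn mean_pool(tokens: &amp;amp;[Vec&amp;lt;f32&amp;gt;]) -&amp;gt; Vec&amp;lt;f32&amp;gt; {
    let dim = tokens[0].len();
    let mut out = vec![0.0; dim];
    for t in tokens {
        for (o, x) in out.iter_mut().zip(t) { *o += x; }
    }
    for o in out.iter_mut() { *o /= tokens.len() as f32; }
    out
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;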

&lt;p&gt;The core implementation is ~500 lines in &lt;code&gt;src/lib.rs&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
No magic, just the transformer algorithm, written in Rust.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/olafurjohannsson/edgebert" rel="noopener noreferrer"&gt;https://github.com/olafurjohannsson/edgebert&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;For best performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# EdgeBERT - fastest single-threaded&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENBLAS_NUM_THREADS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; cargo run &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--features&lt;/span&gt; openblas

&lt;span class="c"&gt;# Python - let it auto-tune threads&lt;/span&gt;
python native.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;edgebert&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3.4"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Roadmap &amp;amp; Future Work
&lt;/h2&gt;

&lt;p&gt;EdgeBERT is focused and minimal by design, but there are exciting directions for the future:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU Support: Adding wgpu support for cross-platform GPU acceleration is a top priority.&lt;/li&gt;
&lt;li&gt;More Architectures: Expanding beyond all-MiniLM-L6-v2 to support other efficient models.&lt;/li&gt;
&lt;li&gt;Quantization: Implementing model quantization to further reduce model size and improve performance on CPU and microcontrollers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pull requests are always welcome!&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/olafurjohannsson/edgebert" rel="noopener noreferrer"&gt;https://github.com/olafurjohannsson/edgebert&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most of the implementation is in one file. Pull requests welcome.&lt;/p&gt;

&lt;p&gt;I built this because I needed it.&lt;br&gt;&lt;br&gt;
I’m sharing it because maybe you do too. 🚀&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Details
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;EdgeBERT: &lt;code&gt;cargo run --release --features openblas --bin native&lt;/code&gt; with &lt;code&gt;OPENBLAS_NUM_THREADS&lt;/code&gt; set&lt;/li&gt;
&lt;li&gt;Python: &lt;code&gt;python native.py&lt;/code&gt; with &lt;code&gt;OMP_NUM_THREADS&lt;/code&gt; set&lt;/li&gt;
&lt;li&gt;Installed sentence_transformers with pip in a venv and inspected its size with &lt;code&gt;du -sh venv/lib/python*/site-packages/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Inspected Rust dependencies with &lt;code&gt;cargo tree | wc -l&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Inspected venv dependencies with &lt;code&gt;pip list | wc -l&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;WASM file size: &lt;code&gt;ls -lh examples/pkg/edgebert_bg.wasm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Native file size: &lt;code&gt;ls -lh target/release/native&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Used &lt;code&gt;/usr/bin/time -v&lt;/code&gt; to measure peak memory (Maximum resident set size, in kbytes)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>rust</category>
      <category>webassembly</category>
      <category>python</category>
    </item>
  </channel>
</rss>
