<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Antonius Indra Dharma Prasetya</title>
    <description>The latest articles on Forem by Antonius Indra Dharma Prasetya (@indra_makassar).</description>
    <link>https://forem.com/indra_makassar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3523648%2F06f89bb1-711d-4028-9e01-218446a12cd5.jpg</url>
      <title>Forem: Antonius Indra Dharma Prasetya</title>
      <link>https://forem.com/indra_makassar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/indra_makassar"/>
    <language>en</language>
    <item>
      <title>Building a Text Similarity Checker API Using Sentence Transformers and Flask</title>
      <dc:creator>Antonius Indra Dharma Prasetya</dc:creator>
      <pubDate>Tue, 23 Sep 2025 07:06:26 +0000</pubDate>
      <link>https://forem.com/indra_makassar/building-a-text-similarity-checker-api-using-sentence-transformers-and-flask-1g6f</link>
      <guid>https://forem.com/indra_makassar/building-a-text-similarity-checker-api-using-sentence-transformers-and-flask-1g6f</guid>
      <description>&lt;h2&gt;
  
  
  What Is Sentence-Transformers (SBERT)?
&lt;/h2&gt;

&lt;p&gt;When I first explored text similarity, it was for an exam management system. The problem? Sometimes different teachers unknowingly entered the same question twice in the database, just phrased a little differently. Traditional keyword-based matching struggled with this, since two questions could look different on the surface but actually mean the same thing.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What is the capital city of France?”&lt;br&gt;
“Which city is the capital of France?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To a keyword search, those might look different. But to a student (and to us), they’re clearly duplicates.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;&lt;a href="https://www.sbert.net/" rel="noopener noreferrer"&gt;Sentence-Transformers (SBERT)&lt;/a&gt;&lt;/strong&gt; comes in. By converting questions into &lt;strong&gt;semantic embeddings&lt;/strong&gt;, SBERT allows us to measure how close two sentences are in meaning, not just in wording. That makes it perfect for detecting duplicate questions automatically.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through how to build a &lt;strong&gt;Text Similarity Checker API&lt;/strong&gt; using SBERT and Flask. While my first use case was exam management, this same idea applies to &lt;strong&gt;semantic search, chatbots, recommendation engines, and plagiarism detection.&lt;/strong&gt; By the end, you’ll have a lightweight REST API that can take two texts and tell you how similar they are.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;

&lt;p&gt;Before we start coding, let’s set up a clean environment for our project. I recommend using a virtual environment (&lt;code&gt;venv&lt;/code&gt;) to keep dependencies isolated and avoid conflicts with other Python projects on your system.&lt;/p&gt;
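&lt;p&gt;If you haven’t set one up before, creating and activating a venv looks like this (the folder name &lt;code&gt;.venv&lt;/code&gt; is just a common convention, any name works):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create the virtual environment in a folder named .venv
python -m venv .venv

# Activate it (on Windows use: .venv\Scripts\activate)
source .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;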

&lt;p&gt;Next, in your venv, install the required libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install flask sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will install:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flask&lt;/strong&gt; → to create our REST API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentence Transformers&lt;/strong&gt; → to load SBERT and compute sentence embeddings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building A Minimal Flask Application
&lt;/h2&gt;

&lt;p&gt;Before we bring in Sentence-BERT, let’s start with the simplest possible Flask app to make sure everything works.&lt;/p&gt;

&lt;p&gt;Create a file called &lt;code&gt;app.py&lt;/code&gt; inside your project folder and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask

# Create a Flask app instance
app = Flask(__name__)

# Define a basic route
@app.route('/')
def home():
    return "Hello, Flask is running!"

if __name__ == '__main__':
    app.run(debug=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you can start the server with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flask --app app run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything is set up correctly, you’ll see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, open a browser and go to &lt;a href="http://127.0.0.1:5000/" rel="noopener noreferrer"&gt;http://127.0.0.1:5000&lt;/a&gt; and you should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello, Flask is running!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This minimal example confirms Flask is installed and working. Next, we’ll extend this app to build our Text Similarity Checker API using SBERT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Helper for SBERT
&lt;/h2&gt;

&lt;p&gt;Our Flask app is running; now let’s bring in Sentence-BERT (SBERT).&lt;br&gt;
To keep our code clean, we’ll create a new file called &lt;code&gt;embedding_service.py&lt;/code&gt; where we handle three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loading the model once (lazy loading).&lt;/li&gt;
&lt;li&gt;Turning text into embeddings (vectors).&lt;/li&gt;
&lt;li&gt;Comparing two embeddings to get similarity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll build this step by step.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Load the Model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoTokenizer, AutoModel

# Lazy-loaded global variables
_tokenizer = None
_model = None

def _load_model():
    """Load SBERT tokenizer and model once (lazy loading)."""
    global _tokenizer, _model
    if _tokenizer is None or _model is None:
        _tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        _model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    return _tokenizer, _model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here’s what’s happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;_tokenizer&lt;/code&gt; and &lt;code&gt;_model&lt;/code&gt; are kept as global variables so the model is only loaded once.&lt;/li&gt;
&lt;li&gt;By default, we’re using &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;, a fast and lightweight SBERT model (384 dimensions).&lt;/li&gt;
&lt;li&gt;You can swap this out with any other Sentence Transformer model depending on your use case (better accuracy, domain-specific data, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A wide collection of SBERT models can be found on Hugging Face here:&lt;br&gt;
&lt;a href="https://huggingface.co/models?library=sentence-transformers" rel="noopener noreferrer"&gt;https://huggingface.co/models?library=sentence-transformers&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, you could replace the default with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sentence-transformers/all-mpnet-base-v2&lt;/code&gt; (higher accuracy, 768 dimensions)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sentence-transformers/paraphrase-MiniLM-L12-v2&lt;/code&gt; (optimized for paraphrase detection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility makes SBERT powerful where you can start small, then upgrade the model later without changing the rest of your code.&lt;/p&gt;
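<p>One way to keep that swap cheap is to read the model name from configuration instead of hard-coding it. A minimal sketch (the &lt;code&gt;SIMILARITY_MODEL&lt;/code&gt; variable name is my own example, not part of any library):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

# Fall back to the lightweight default used in this article; set
# SIMILARITY_MODEL in the environment to swap models without code changes.
MODEL_NAME = os.environ.get(
    "SIMILARITY_MODEL",
    "sentence-transformers/all-MiniLM-L6-v2",
)
print(MODEL_NAME)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;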
&lt;h3&gt;
  
  
  Step 2: Convert Sentences into Vectors
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch
import torch.nn.functional as F

def mean_pooling(model_output, attention_mask):
    """Average token embeddings with attention mask (standard SBERT pooling)."""
    token_embeddings = model_output[0]
    mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask_expanded, 1) / torch.clamp(mask_expanded.sum(1), min=1e-9)


def to_vector(text):
    """Convert a string (or list of strings) into SBERT embeddings."""
    if isinstance(text, str):
        text = [text]

    tokenizer, model = _load_model()
    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        model_output = model(**encoded_input)

    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
    return F.normalize(sentence_embeddings, p=2, dim=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mean_pooling&lt;/code&gt;: SBERT doesn’t just use the [CLS] token; instead, we average all token embeddings while respecting the attention mask.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;to_vector&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;Takes a string (or list of strings).&lt;/li&gt;
&lt;li&gt;Tokenizes it (padding/truncation for batch processing).&lt;/li&gt;
&lt;li&gt;Runs it through the model in inference mode (&lt;code&gt;torch.no_grad()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Pools it into a single vector per sentence.&lt;/li&gt;
&lt;li&gt;Normalizes the vector (important for cosine similarity).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now we can turn any sentence into a numerical vector.&lt;/p&gt;
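&lt;p&gt;To see what the masked averaging in &lt;code&gt;mean_pooling&lt;/code&gt; does, here is a toy version with plain Python lists and made-up numbers (no torch, just to illustrate the arithmetic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One sentence of 3 tokens with 2-dimensional embeddings; the last token is padding.
token_embeddings = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]  # 0 = padding, excluded from the average

dim = len(token_embeddings[0])
# Sum only the real (unmasked) tokens, dimension by dimension.
summed = [
    sum(tok[d] * m for tok, m in zip(token_embeddings, attention_mask))
    for d in range(dim)
]
count = max(sum(attention_mask), 1)  # avoid division by zero
pooled = [s / count for s in summed]
print(pooled)  # [2.0, 3.0] -- the padding token never influences the result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;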
&lt;h3&gt;
  
  
  Step 3: Compare Two Sentences
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def compare(text1, text2):
    """Compare two texts and return cosine similarity."""
    v1 = to_vector(text1)
    v2 = to_vector(text2)
    similarity = torch.nn.functional.cosine_similarity(v1, v2)
    return similarity.item()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;This function takes two texts.&lt;/li&gt;
&lt;li&gt;It converts them into vectors using &lt;code&gt;to_vector&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Then, it uses cosine similarity to calculate how close they are in meaning.&lt;/li&gt;
&lt;li&gt;Finally, it returns a single similarity score (0 → different, 1 → identical meaning).&lt;/li&gt;
&lt;/ul&gt;
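&lt;p&gt;Cosine similarity itself is easy to compute by hand, which helps demystify the score; a stdlib-only sketch on toy 2-D vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine_similarity(a, b):
    """Dot product of the vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated directions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;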
&lt;h3&gt;
  
  
  A Note on Using the SentenceTransformers Shortcut
&lt;/h3&gt;

&lt;p&gt;You might be wondering: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why not just use the SentenceTransformers library directly? Isn’t there already a function for this?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And you’d be right; here’s the quick way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode sentences directly
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

embeddings = model.encode(sentences)

# Calculate pairwise similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works perfectly fine, and for small projects or experiments, it’s the fastest way to go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Wrote It Manually
&lt;/h3&gt;

&lt;p&gt;In this tutorial, I chose to build the embedding helper “by hand” (with &lt;code&gt;AutoTokenizer&lt;/code&gt;, &lt;code&gt;AutoModel&lt;/code&gt;, and manual pooling) instead of relying only on the &lt;code&gt;SentenceTransformer.encode()&lt;/code&gt; shortcut. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transparency&lt;/strong&gt; → you see exactly how embeddings are generated, step by step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deeper Understanding&lt;/strong&gt; → by writing it out, you gain a better grasp of what’s happening under the hood (tokenization → model forward pass → pooling → normalization → similarity).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt; → you can change pooling strategies, thresholds, or normalization methods to fit your use case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurability&lt;/strong&gt; → we tie it into Flask’s app.config, so you can swap models or tweak parameters without rewriting code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; → manual control lets you optimize for GPUs, batching, or mixed precision when moving to production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility&lt;/strong&gt; → you’re not limited to models wrapped by SentenceTransformers. Any Hugging Face model can be plugged in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly: by building it this way, you’ll understand it well enough to modify it later. If your project needs domain-specific embeddings or a different similarity function, you’ll know exactly where and how to make those changes.&lt;/p&gt;
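&lt;p&gt;For example, swapping in a different similarity function is a one-function change. A hypothetical Euclidean-distance variant (operating on plain lists for illustration, not part of the article’s code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def euclidean_similarity(v1, v2):
    """Map Euclidean distance into a 0-1 score (1.0 = identical vectors)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    return 1.0 / (1.0 + dist)

print(euclidean_similarity([1.0, 2.0], [1.0, 2.0]))  # 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;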

&lt;h2&gt;
  
  
  Building the REST API
&lt;/h2&gt;

&lt;p&gt;Now we’ll extend our &lt;code&gt;app.py&lt;/code&gt; file so that it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Accepts two input texts via a POST request.&lt;/li&gt;
&lt;li&gt;Uses our &lt;code&gt;embedding_service&lt;/code&gt; to calculate similarity.&lt;/li&gt;
&lt;li&gt;Returns the similarity score as JSON.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Step 1: Import the Helper
&lt;/h3&gt;

&lt;p&gt;Inside &lt;code&gt;app.py&lt;/code&gt;, import the helper we created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, jsonify
from embedding_service import compare
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Add a New Route
&lt;/h3&gt;

&lt;p&gt;We’ll define a new endpoint &lt;code&gt;/similarity&lt;/code&gt; that accepts a POST request with two texts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.route('/similarity', methods=['POST'])
def similarity():
    try:
        data = request.get_json()
        text1 = data.get("text1")
        text2 = data.get("text2")

        if not text1 or not text2:
            return jsonify({"error": "Both text1 and text2 are required"}), 400

        score = compare(text1, text2)
        return jsonify({
            "text1": text1,
            "text2": text2,
            "similarity": score
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grab the JSON body with &lt;code&gt;request.get_json()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Extract &lt;code&gt;text1&lt;/code&gt; and &lt;code&gt;text2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Validate that both are provided.&lt;/li&gt;
&lt;li&gt;Call our &lt;code&gt;compare()&lt;/code&gt; function from the embedding service.&lt;/li&gt;
&lt;li&gt;Return a JSON response with the result.&lt;/li&gt;
&lt;/ul&gt;
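&lt;p&gt;The same validation logic can be exercised outside Flask. A stdlib-only sketch (the &lt;code&gt;validate_payload&lt;/code&gt; helper is hypothetical, written just to mirror the route’s checks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

def validate_payload(raw_body):
    """Mirror the route's checks: text1 and text2 must both be present and non-empty."""
    data = json.loads(raw_body)
    text1 = data.get("text1")
    text2 = data.get("text2")
    if not text1 or not text2:
        return {"error": "Both text1 and text2 are required"}, 400
    return {"text1": text1, "text2": text2}, 200

print(validate_payload('{"text1": "hi", "text2": ""}')[1])       # 400
print(validate_payload('{"text1": "hi", "text2": "there"}')[1])  # 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;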

&lt;h3&gt;
  
  
  Step 3: Test the API
&lt;/h3&gt;

&lt;p&gt;Run the Flask app again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, send a POST request with two sentences (using curl, Postman, or httpie):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://127.0.0.1:5000/similarity \
    -H "Content-Type: application/json" \
    -d '{"text1": "What is the capital of France?", "text2": "Which city is the capital of France?"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything is working, you should get a JSON response like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "similarity": 0.9416358470916748,
  "text1": "What is the capital of France?",
  "text2": "Which city is the capital of France?"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have a fully working Text Similarity REST API powered by SBERT and Flask.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;In this article, we built a &lt;strong&gt;hands-on text similarity API&lt;/strong&gt; step by step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we learned how embeddings work by writing comparison logic manually.&lt;/li&gt;
&lt;li&gt;Then, we saw how to automate embedding and similarity with the &lt;code&gt;SentenceTransformer&lt;/code&gt; API.&lt;/li&gt;
&lt;li&gt;Finally, we wrapped everything in a &lt;strong&gt;Flask REST service&lt;/strong&gt; so others can consume it easily.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a solid foundation, but you don’t have to stop here.&lt;/p&gt;

&lt;p&gt;You could improve this project by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Registering embeddings in a &lt;strong&gt;vector database&lt;/strong&gt; (such as Pinecone, Weaviate, Qdrant, FAISS, or pgvector). That way, instead of only comparing two sentences, you can run &lt;strong&gt;nearest-neighbor queries&lt;/strong&gt; across thousands or millions of stored texts, unlocking &lt;strong&gt;semantic search, recommendation systems&lt;/strong&gt;, and even &lt;strong&gt;chat with your documents&lt;/strong&gt; use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling the API&lt;/strong&gt; with FastAPI + Uvicorn for production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adding authentication &amp;amp; logging&lt;/strong&gt; if others will consume your service.&lt;/li&gt;
&lt;/ul&gt;
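&lt;p&gt;The nearest-neighbor idea can be prototyped before reaching for a vector database. A brute-force sketch over a tiny in-memory store (made-up 2-D vectors standing in for real embeddings):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy "database": keys are texts, values are pre-normalized embeddings.
store = {
    "capital of France": normalize([0.9, 0.1]),
    "weather today": normalize([0.1, 0.9]),
}

def nearest(query_vec, store):
    """Return the stored text whose vector has the highest cosine similarity.

    For unit vectors, cosine similarity reduces to a plain dot product.
    """
    q = normalize(query_vec)
    return max(store, key=lambda k: sum(a * b for a, b in zip(q, store[k])))

print(nearest([0.8, 0.2], store))  # capital of France
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;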

&lt;p&gt;With these improvements, your little embedding service can evolve into a &lt;strong&gt;real-world semantic search engine.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
