<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Unknownerror-404</title>
    <description>The latest articles on Forem by Unknownerror-404 (@aniket_kuyate_15acc4e6587).</description>
    <link>https://forem.com/aniket_kuyate_15acc4e6587</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3367313%2Faed567dd-b818-4149-b8b8-2dd00483a822.png</url>
      <title>Forem: Unknownerror-404</title>
      <link>https://forem.com/aniket_kuyate_15acc4e6587</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aniket_kuyate_15acc4e6587"/>
    <language>en</language>
    <item>
      <title>An update....</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Fri, 13 Mar 2026 17:21:18 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/an-update-3d37</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/an-update-3d37</guid>
      <description>&lt;p&gt;Hey there! Been a while. If you're new here, I'm your resident chatbot geek and classifier nerd. The reason for this blog: an update. Over the past couple of months, I've been working on a few projects (~3), which have consistently taken up most of my free time. So, this is just an update to let you know I have begun working on the next blog, and this time it isn't purely programmatic. It's slightly philosophical. With that, I'll leave you to it.&lt;br&gt;
Until Next Time!&lt;/p&gt;

</description>
      <category>devjournal</category>
    </item>
    <item>
      <title>From Understanding to Action: Teaching Your Assistant to Respond</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Fri, 20 Feb 2026 13:30:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/from-understanding-to-action-teaching-your-assistant-to-respond-4nfo</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/from-understanding-to-action-teaching-your-assistant-to-respond-4nfo</guid>
      <description>&lt;p&gt;NLU answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What did the user mean?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dialogue management answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What should I do about it?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Rasa, this logic is built using four core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain files&lt;/li&gt;
&lt;li&gt;Stories&lt;/li&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Slots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s break them down.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Domain File: Defining the Assistant’s World
&lt;/h4&gt;

&lt;p&gt;The domain file lives here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;domain.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file defines everything your assistant knows how to do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

intents:
  - greet
  - book_flight

entities:
  - location

slots:
  location:
    type: text
    mappings:
      - type: from_entity
        entity: location

responses:
  utter_greet:
    - text: "Hello! How can I help you?"

  utter_ask_location:
    - text: "Where would you like to travel?"

actions:
  - action_book_flight

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The domain defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What intents exist&lt;/li&gt;
&lt;li&gt;What entities exist&lt;/li&gt;
&lt;li&gt;What slots store&lt;/li&gt;
&lt;li&gt;What responses are available&lt;/li&gt;
&lt;li&gt;What custom actions can run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the assistant’s capability registry; if it’s not in the domain, it doesn’t exist.&lt;/p&gt;

&lt;h4&gt;
  
  
  Slots: Memory Between Turns
&lt;/h4&gt;

&lt;p&gt;NLU understands a single message. Slots allow your assistant to remember information across messages. Slots are your assistant’s working memory; without them, every message is isolated.&lt;/p&gt;
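To make this concrete, here is a minimal sketch of how a filled slot can steer the conversation. The slot_was_set step is standard Rasa story syntax (stories are covered just below); the intent, slot, and action names reuse the flight example from the domain above:

```yaml
stories:
  - story: location already provided
    steps:
      - intent: book_flight
      - slot_was_set:
          - location: Madrid
      - action: action_book_flight
```

Because the location slot is already filled, the assistant can skip asking for a destination and proceed straight to booking.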

&lt;h4&gt;
  
  
  Stories: Teaching Multi-Turn Behaviour
&lt;/h4&gt;

&lt;p&gt;Stories describe example conversations and live in stories.yml.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stories:
  - story: book flight happy path
    steps:
      - intent: book_flight
      - action: utter_ask_location
      - intent: inform
        entities:
          - location: Madrid
      - action: action_book_flight

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stories show the dialogue model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Which intent starts a flow&lt;/li&gt;
&lt;li&gt;What action follows&lt;/li&gt;
&lt;li&gt;How slot filling changes behaviour&lt;/li&gt;
&lt;li&gt;When to execute business logic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Under the hood, Rasa uses a transformer-based dialogue policy (the TED policy) to learn patterns across these conversation examples. Unlike rule-based systems, it generalises beyond exact story matches.&lt;/p&gt;

&lt;h4&gt;
  
  
  Rules: Deterministic Behaviour
&lt;/h4&gt;

&lt;p&gt;Sometimes you don’t want learning; you want certainty. Rules live in rules.yml.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examples:
rules:
  - rule: respond to greeting
    steps:
      - intent: greet
      - action: utter_greet

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic&lt;/li&gt;
&lt;li&gt;One-intent → one-action&lt;/li&gt;
&lt;li&gt;Ideal for FAQs, greetings, confirmations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use rules for fixed, predictable behaviour; use stories for multi-turn flows.&lt;/p&gt;

&lt;h4&gt;
  
  
  Common Dialogue Design Mistakes
&lt;/h4&gt;

&lt;p&gt;Just like NLU, dialogue design has pitfalls.&lt;br&gt;
Avoid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overusing rules for complex flows&lt;/li&gt;
&lt;li&gt;Writing too few stories&lt;/li&gt;
&lt;li&gt;Ignoring unhappy paths&lt;/li&gt;
&lt;li&gt;Forgetting slot resets&lt;/li&gt;
&lt;li&gt;Embedding business logic in responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And most importantly:&lt;br&gt;
Do not confuse intent prediction with behaviour control.&lt;/p&gt;

&lt;p&gt;Intent prediction tells you what the user wants, whereas dialogue management determines what happens next.&lt;/p&gt;

&lt;h4&gt;
  
  
  What You’ve Built So Far
&lt;/h4&gt;

&lt;p&gt;Hopefully by now, you understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pipelines&lt;/li&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Training&lt;/li&gt;
&lt;li&gt;Dialogue&lt;/li&gt;
&lt;li&gt;Model behaviour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…and the basics of Rasa.&lt;/p&gt;

&lt;p&gt;The next post will likely be the last in this series: a complete script that ties all of these concepts together, so you can understand them practically through examples.&lt;br&gt;
Until next time....&lt;/p&gt;

</description>
      <category>ai</category>
      <category>yaml</category>
      <category>chatbot</category>
      <category>rasa</category>
    </item>
    <item>
      <title>From DIET to Deployment: Training Your First Rasa NLU Model</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Mon, 16 Feb 2026 13:30:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/from-diet-to-deployment-training-your-first-rasa-nlu-model-3nod</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/from-diet-to-deployment-training-your-first-rasa-nlu-model-3nod</guid>
      <description>&lt;p&gt;CRF showed us structured entity extraction. DIET showed us joint intent–entity learning. Now it’s time to move from theory to practice.&lt;/p&gt;

&lt;p&gt;Understanding models is important. But models are useless without data, and this is where real NLU development begins.&lt;/p&gt;

&lt;p&gt;So far, we’ve discussed how DIET works internally.&lt;br&gt;
Now we answer the practical question: How do we actually train it?&lt;/p&gt;

&lt;p&gt;Rasa training consists of three core steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create structured training data&lt;/li&gt;
&lt;li&gt;Configure the NLU pipeline&lt;/li&gt;
&lt;li&gt;Train the model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s walk through each.&lt;/p&gt;
&lt;h3&gt;
  
  
  Generating Training Data
&lt;/h3&gt;

&lt;p&gt;Rasa models learn entirely from annotated examples. Unlike rule-based systems, you don’t write logic; you provide examples.&lt;/p&gt;
&lt;h4&gt;
  
  
  The NLU File
&lt;/h4&gt;

&lt;p&gt;Rasa training data lives inside a YAML file, typically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data/nlu.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
  - intent: book_flight
    examples: |
      - Book a flight to [Paris](location)
      - I want to fly to [Berlin](location)
      - Get me a ticket to [London](location)

  - intent: greet
    examples: |
      - Hello
      - Hi
      - Hey there
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intents are labels&lt;/li&gt;
&lt;li&gt;Entities are annotated inline&lt;/li&gt;
&lt;li&gt;No separate entity file&lt;/li&gt;
&lt;li&gt;No feature engineering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DIET learns both tasks from this single dataset.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Much Data Do You Need?
&lt;/h4&gt;

&lt;p&gt;There’s no magic number, but as general guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10–15 examples per intent → minimum prototype&lt;/li&gt;
&lt;li&gt;50–100 examples per intent → production baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Diverse phrasing is nearly always better than repetitive patterns.&lt;/p&gt;

&lt;p&gt;Bad example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Book a flight to Paris&lt;/li&gt;
&lt;li&gt;Book a flight to Berlin&lt;/li&gt;
&lt;li&gt;Book a flight to London&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I need to travel to Paris&lt;/li&gt;
&lt;li&gt;Can you find flights to Berlin?&lt;/li&gt;
&lt;li&gt;Get me a ticket heading to London&lt;/li&gt;
&lt;li&gt;Fly me to Rome tomorrow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Variation teaches generalisation.&lt;/p&gt;

&lt;p&gt;We've already covered pipeline configuration; curious readers can revisit the intermediate blogs in the playlist to see how it works.&lt;/p&gt;

&lt;h4&gt;
  
  
  Training the Model
&lt;/h4&gt;

&lt;p&gt;Once data and configuration are ready, training is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;rasa train&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Behind the scenes, Rasa reads the NLU data and builds a vocabulary, initialises the DIET model, and runs multiple training epochs, optimising a joint loss for intent and entity prediction.&lt;br&gt;
The learned parameters are then saved as a trained model file:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;models/20260215-123456.tar.gz&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This archive contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NLU model&lt;/li&gt;
&lt;li&gt;Dialogue model&lt;/li&gt;
&lt;li&gt;Metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now your assistant is runnable.&lt;/p&gt;
&lt;h4&gt;
  
  
  What Happens During Training?
&lt;/h4&gt;

&lt;p&gt;Internally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text is tokenised.&lt;/li&gt;
&lt;li&gt;Tokens are vectorised.&lt;/li&gt;
&lt;li&gt;Transformer layers process context.&lt;/li&gt;
&lt;li&gt;Intent and entity losses are computed jointly.&lt;/li&gt;
&lt;li&gt;Gradients update shared weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t manually tune features.&lt;/p&gt;

&lt;p&gt;What the developer can tune:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;epochs&lt;/li&gt;
&lt;li&gt;learning rate (advanced use)&lt;/li&gt;
&lt;li&gt;embedding dimensions&lt;/li&gt;
&lt;li&gt;batch size&lt;/li&gt;
&lt;/ul&gt;
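These knobs are set on the DIETClassifier entry in config.yml. A minimal sketch with illustrative values (check the Rasa docs for the current defaults):

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    learning_rate: 0.001
    embedding_dimension: 20
    batch_size: [64, 256]
```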
&lt;h4&gt;
  
  
  Testing the Model
&lt;/h4&gt;

&lt;p&gt;After training:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;rasa shell nlu&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This starts an interactive shell where you can experience the model yourself: send messages, probe its limitations, and note improvements as you go.&lt;/p&gt;

&lt;p&gt;Every input returns a structured result.&lt;br&gt;
Say you type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Book a flight to Madrid tomorrow&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You should see output similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "intent": {
    "name": "book_flight",
    "confidence": 0.94
  },
  "entities": [
    {
      "entity": "location",
      "value": "Madrid"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is DIET in action, trained on your data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Common Beginner Mistakes
&lt;/h4&gt;

&lt;p&gt;Now to address a few common yet harmful beginner errors.&lt;br&gt;
The overarching principle: improving data quality almost always improves performance more than tweaking the architecture.&lt;/p&gt;

&lt;p&gt;So you should focus on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Diverse phrasing&lt;/li&gt;
&lt;li&gt;Balanced intents&lt;/li&gt;
&lt;li&gt;Clear entity boundaries&lt;/li&gt;
&lt;li&gt;Avoiding overlapping intent meanings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Good data reduces ambiguity.&lt;/p&gt;

&lt;p&gt;And try avoiding:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Too few examples&lt;/li&gt;
&lt;li&gt;Overlapping intents&lt;/li&gt;
&lt;li&gt;Copy-paste variations&lt;/li&gt;
&lt;li&gt;Mixing business logic into NLU&lt;/li&gt;
&lt;li&gt;Ignoring real user phrasing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Always remember:&lt;/p&gt;

&lt;p&gt;NLU predicts meaning; it does not enforce workflow.&lt;br&gt;
And training is an iterative loop:&lt;br&gt;
Train → Test → Improve → Retrain.&lt;/p&gt;

&lt;h4&gt;
  
  
  Where We Go Next
&lt;/h4&gt;

&lt;p&gt;Now that we know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to generate training data&lt;/li&gt;
&lt;li&gt;How to configure DIET&lt;/li&gt;
&lt;li&gt;How to train a Rasa model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, we’ll connect NLU to dialogue training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain files&lt;/li&gt;
&lt;li&gt;Stories&lt;/li&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Slot filling&lt;/li&gt;
&lt;li&gt;End-to-end training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because predicting intent is only step one. Building behaviour is step two. Now we begin building real assistants.&lt;/p&gt;

&lt;p&gt;Until next time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatbot</category>
      <category>yaml</category>
      <category>rasa</category>
    </item>
    <item>
      <title>What is the DIETClassifier?</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Sun, 08 Feb 2026 01:29:37 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/what-is-the-dietclassifier-1n4j</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/what-is-the-dietclassifier-1n4j</guid>
      <description>&lt;p&gt;In the previous blog, we explored CRFEntityExtractor, a sequence-labeling model that learns how entities appear in context using statistical features.&lt;/p&gt;

&lt;p&gt;CRF represented a major step forward from pure rule-based extraction.&lt;br&gt;
But as conversational systems evolved, maintaining separate models for intent classification and entity extraction started to show its limits.&lt;/p&gt;

&lt;p&gt;Modern NLU pipelines favor shared representations, joint learning, and deep learning–based generalization.&lt;/p&gt;

&lt;p&gt;That’s where DIETClassifier comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contents of this blog&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is DIETClassifier&lt;/li&gt;
&lt;li&gt;Why DIET was introduced&lt;/li&gt;
&lt;li&gt;How DIET works at a high level&lt;/li&gt;
&lt;li&gt;Intent classification with DIET&lt;/li&gt;
&lt;li&gt;Entity extraction with DIET&lt;/li&gt;
&lt;li&gt;Training data format&lt;/li&gt;
&lt;li&gt;When to use DIETClassifier&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  What is the DIETClassifier?
&lt;/h4&gt;

&lt;p&gt;DIET stands for Dual Intent and Entity Transformer.&lt;/p&gt;

&lt;p&gt;It is a single neural network that performs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intent classification&lt;/li&gt;
&lt;li&gt;Entity extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…at the same time.&lt;/p&gt;

&lt;p&gt;Unlike CRFEntityExtractor, which focuses only on entities, DIET jointly learns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The meaning of the full sentence (intent)&lt;/li&gt;
&lt;li&gt;The role of each token (entity labels)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shared learning allows the model to use intent-level context to improve entity prediction, and vice versa.&lt;/p&gt;
&lt;h4&gt;
  
  
  Why was DIET introduced?
&lt;/h4&gt;

&lt;p&gt;Traditional pipelines looked like this:&lt;br&gt;
Intent classifier → predicts intent&lt;br&gt;
Entity extractor → predicts entities independently&lt;/p&gt;

&lt;p&gt;This separation has drawbacks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Duplicate feature computation&lt;/li&gt;
&lt;li&gt;No shared understanding between intent and entities&lt;/li&gt;
&lt;li&gt;More models to train, tune, and maintain&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;DIET solves this by using one model to learn shared embeddings and optimise both tasks together.&lt;/p&gt;

&lt;p&gt;This leads to better performance, especially when training data is limited.&lt;/p&gt;
&lt;h4&gt;
  
  
  How DIET works
&lt;/h4&gt;

&lt;p&gt;DIET is based on a Transformer architecture.&lt;br&gt;
At a high level, it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tokenizes the input text&lt;br&gt;
Converts tokens into embeddings&lt;br&gt;
Applies transformer layers to model context&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and predicts:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A sentence embedding → intent&lt;br&gt;
Token-level labels → entities&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of hand-engineered features (as in CRF), DIET learns features automatically.&lt;/p&gt;
&lt;h4&gt;
  
  
  Intent classification with DIET
&lt;/h4&gt;

&lt;p&gt;For intent classification, DIET:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embeds the entire sentence&lt;/li&gt;
&lt;li&gt;Compares it against learned intent embeddings&lt;/li&gt;
&lt;li&gt;Uses similarity scoring to choose the best intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Book a flight to Paris."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model learns that this sentence embedding is closest to the book_flight intent. This approach allows DIET to generalize well to paraphrases and unseen phrasing.&lt;/p&gt;
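A toy sketch of that similarity scoring, using made-up 3-dimensional vectors (real DIET embeddings are learned during training and have far more dimensions; the numbers and intent names here are assumptions for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy learned intent embeddings (illustrative values only).
intent_embeddings = {
    "book_flight": [0.9, 0.1, 0.2],
    "greet":       [0.1, 0.8, 0.1],
}

def classify(sentence_embedding):
    """Pick the intent whose embedding is most similar to the sentence."""
    scores = {name: cosine(sentence_embedding, emb)
              for name, emb in intent_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

print(classify([0.8, 0.2, 0.1]))  # ('book_flight', ...) — high similarity
```

A sentence embedding close to the book_flight vector wins the similarity comparison, which is exactly why paraphrases land on the same intent.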
&lt;h4&gt;
  
  
  Entity extraction with DIET
&lt;/h4&gt;

&lt;p&gt;For entities, DIET performs token-level classification, similar to CRF. Each token receives labels like B-entity, I-entity, O, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Book    O
a       O
flight  O
from    O
New     B-location
York    I-location
to      O
Paris   B-location
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is that DIET uses contextual embeddings produced by transformers instead of manually designed features.&lt;/p&gt;

&lt;h4&gt;
  
  
  Training data format
&lt;/h4&gt;

&lt;p&gt;DIET uses the same annotated NLU data as CRF.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
  - intent: book_flight
    examples: |
      - Book a flight from [New York](location) to [Paris](location)
      - Fly from [Berlin](location) to [London](location)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no separate configuration for intent vs entity training. DIET learns both from the same data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Internal working (simplified)
&lt;/h4&gt;

&lt;p&gt;At runtime, DIET:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tokenizes the message&lt;/li&gt;
&lt;li&gt;Generates embeddings&lt;/li&gt;
&lt;li&gt;Applies transformer layers&lt;/li&gt;
&lt;li&gt;Predicts:

&lt;ul&gt;
&lt;li&gt;Intent with confidence&lt;/li&gt;
&lt;li&gt;Entity labels per token&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Groups entity tokens&lt;/li&gt;
&lt;li&gt;Outputs structured NLU results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "intent": {
    "name": "book_flight",
    "confidence": 0.92
  },
  "entities": [
    {
      "entity": "location",
      "value": "Paris",
      "start": 23,
      "end": 28
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  When should you use DIETClassifier?
&lt;/h4&gt;

&lt;p&gt;DIETClassifier is the default choice when you want a single model for intents and entities, when the language is flexible and conversational,&lt;br&gt;
and when you care about long-term scalability or are building production-grade assistants.&lt;/p&gt;

&lt;p&gt;CRFEntityExtractor and RegexEntityExtractor still have value, especially for highly structured or deterministic entities, but DIET is the backbone of modern Rasa NLU pipelines.&lt;/p&gt;

&lt;p&gt;With this, we have completed most of the major entity and intent mappers. Following this, we shall begin to see how bots are developed using code.&lt;/p&gt;

&lt;p&gt;Until next time.&lt;/p&gt;

</description>
      <category>chatbot</category>
      <category>rasa</category>
      <category>yaml</category>
      <category>ai</category>
    </item>
    <item>
      <title>Understanding CRFEntityExtractor: Learning Entities from Context</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Fri, 23 Jan 2026 13:46:39 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-crfentityextractor-learning-entities-from-context-2jp4</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-crfentityextractor-learning-entities-from-context-2jp4</guid>
      <description>&lt;p&gt;In the previous blog, we explored &lt;a href="https://dev.tourl"&gt;RegexEntityExtractor&lt;/a&gt;, a rule-based approach where entities are extracted by explicitly matching patterns.&lt;/p&gt;

&lt;p&gt;That works extremely well when entity formats are predictable.&lt;/p&gt;

&lt;p&gt;But not all entities behave that way.&lt;/p&gt;

&lt;p&gt;Some entities depend heavily on context, word boundaries, and surrounding tokens.&lt;br&gt;
This is where statistical learning becomes necessary.&lt;/p&gt;

&lt;p&gt;Enter the CRFEntityExtractor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contents of this blog&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is CRFEntityExtractor&lt;/li&gt;
&lt;li&gt;Why do we need it&lt;/li&gt;
&lt;li&gt;How CRF works at a high level&lt;/li&gt;
&lt;li&gt;Training data format&lt;/li&gt;
&lt;li&gt;Pipeline configuration&lt;/li&gt;
&lt;li&gt;Internal working&lt;/li&gt;
&lt;li&gt;Strengths and limitations&lt;/li&gt;
&lt;li&gt;When and why to use it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is the CRFEntityExtractor?&lt;/strong&gt;&lt;br&gt;
The CRFEntityExtractor is a machine learning based entity extractor that uses a Conditional Random Field (CRF) model.&lt;/p&gt;

&lt;p&gt;Unlike regex-based extractors, it does not rely on fixed patterns.&lt;br&gt;
Instead, it learns how entities appear in context from labeled training data.&lt;/p&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Given a sequence of tokens, the model learns which tokens belong to which entity types.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This allows it to extract entities even when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Formats vary&lt;/li&gt;
&lt;li&gt;Words are ambiguous&lt;/li&gt;
&lt;li&gt;Structure is loose&lt;/li&gt;
&lt;li&gt;Context determines meaning&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;strong&gt;Why do we need it?&lt;/strong&gt;&lt;br&gt;
Many real-world entities are not strictly structured.&lt;/p&gt;

&lt;p&gt;Examples: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Person names&lt;/li&gt;
&lt;li&gt;Locations&lt;/li&gt;
&lt;li&gt;Job titles&lt;/li&gt;
&lt;li&gt;Product names&lt;/li&gt;
&lt;li&gt;Custom domain-specific terms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consider the word “Apple”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Buy Apple stock” → organization&lt;/li&gt;
&lt;li&gt;“Eat an apple” → food&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regex cannot solve this.&lt;br&gt;
CRF can, because it looks at neighboring tokens, not just the token itself.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;How CRF works (high level)&lt;/strong&gt;&lt;br&gt;
CRF is a sequence labeling model.&lt;br&gt;
Instead of classifying individual tokens independently, it predicts the most likely sequence of labels for an entire sentence.&lt;/p&gt;

&lt;p&gt;Each token is assigned a label such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;B-entity (beginning)&lt;/li&gt;
&lt;li&gt;I-entity (inside)&lt;/li&gt;
&lt;li&gt;O (outside)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Book a flight from New York to Paris&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Token labels might look like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Book     O&lt;br&gt;
a        O&lt;br&gt;
flight   O&lt;br&gt;
from     O&lt;br&gt;
New      B-location&lt;br&gt;
York     I-location&lt;br&gt;
to       O&lt;br&gt;
Paris    B-location&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The CRF learns which label sequences are valid and likely, not just which individual labels fit.&lt;/p&gt;
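To see what "most likely sequence" means, here is a toy Viterbi decoder over BIO labels. The emission and transition scores are invented for illustration (a trained CRF learns its own); note how the transitions make I-location essentially impossible after O:

```python
LABELS = ["O", "B-location", "I-location"]

# transition[prev][cur]: invented scores; I-location can't follow O.
transition = {
    "O":          {"O": 0.8, "B-location": 0.2, "I-location": 0.0},
    "B-location": {"O": 0.4, "B-location": 0.1, "I-location": 0.5},
    "I-location": {"O": 0.5, "B-location": 0.1, "I-location": 0.4},
}

def viterbi(emissions):
    """emissions: one dict per token mapping label -> score.
    Returns the highest-scoring label sequence."""
    path = {l: [l] for l in LABELS}
    score = {l: emissions[0][l] for l in LABELS}
    for em in emissions[1:]:
        new_score, new_path = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: score[p] * transition[p][cur])
            new_score[cur] = score[prev] * transition[prev][cur] * em[cur]
            new_path[cur] = path[prev] + [cur]
        score, path = new_score, new_path
    return path[max(LABELS, key=score.get)]

# "New York": per-token scores alone are ambiguous, but the
# transition scores make the B- then I- sequence win.
emissions = [
    {"O": 0.3, "B-location": 0.6, "I-location": 0.1},  # New
    {"O": 0.3, "B-location": 0.2, "I-location": 0.5},  # York
]
print(viterbi(emissions))  # ['B-location', 'I-location']
```

This is the key difference from per-token classification: the label for "York" is chosen jointly with the label for "New", not in isolation.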

&lt;p&gt;&lt;strong&gt;Training data format&lt;/strong&gt;&lt;br&gt;
CRFEntityExtractor requires annotated training data in your NLU YAML file.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
  - intent: book_flight
    examples: |
      - Book a flight from [New York](location) to [Paris](location)
      - Fly from [Berlin](location) to [London](location)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this data, the model learns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token patterns&lt;/li&gt;
&lt;li&gt;Contextual relationships&lt;/li&gt;
&lt;li&gt;Entity boundaries&lt;/li&gt;
&lt;li&gt;Transition probabilities between labels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More diverse examples generally lead to better generalization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline configuration&lt;/strong&gt;&lt;br&gt;
To enable CRF-based extraction, add it to your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CRFEntityExtractor

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key supporting components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokenizer → splits text into tokens&lt;/li&gt;
&lt;li&gt;Featurizer → generates features such as:

&lt;ul&gt;
&lt;li&gt;Lowercase form&lt;/li&gt;
&lt;li&gt;Word shape&lt;/li&gt;
&lt;li&gt;Prefixes / suffixes&lt;/li&gt;
&lt;li&gt;Token position&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CRF does not work directly on raw text; it works on features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal Working&lt;/strong&gt;&lt;br&gt;
At runtime, the CRFEntityExtractor operates roughly as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tokenizes the user message&lt;/li&gt;
&lt;li&gt;Generates features for each token&lt;/li&gt;
&lt;li&gt;Applies the trained CRF model&lt;/li&gt;
&lt;li&gt;Predicts a label for every token&lt;/li&gt;
&lt;li&gt;Groups consecutive B- / I- labels into entities&lt;/li&gt;
&lt;li&gt;Outputs entities with:

&lt;ul&gt;
&lt;li&gt;Entity name&lt;/li&gt;
&lt;li&gt;Extracted value&lt;/li&gt;
&lt;li&gt;Start and end character indices&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the input:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want to fly from San Francisco"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The extractor may output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "entity": "location",
  "value": "San Francisco",
  "start": 19,
  "end": 32
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The phrase is extracted not because it matches a pattern, but because the model learned that this sequence of tokens commonly forms a location.&lt;/p&gt;
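Steps 4–6 of the runtime sketch above (label every token, then group B-/I- runs into entities with character offsets) can be illustrated in plain Python. The whitespace tokenisation and the hand-written labels are assumptions for the example:

```python
import re

def bio_to_entities(text, labels):
    """Group consecutive B-/I- labels over whitespace tokens into
    entities with character offsets."""
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\S+", text)]
    assert len(tokens) == len(labels)
    entities, current = [], None
    for (tok, start, end), label in zip(tokens, labels):
        if label.startswith("B-"):          # a new entity begins
            if current:
                entities.append(current)
            current = {"entity": label[2:], "start": start, "end": end}
        elif label.startswith("I-") and current and current["entity"] == label[2:]:
            current["end"] = end            # extend the running entity
        else:                               # O label: close any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    for e in entities:                      # recover the surface value
        e["value"] = text[e["start"]:e["end"]]
    return entities

text = "I want to fly from San Francisco"
labels = ["O", "O", "O", "O", "O", "B-location", "I-location"]
print(bio_to_entities(text, labels))
# [{'entity': 'location', 'start': 19, 'end': 32, 'value': 'San Francisco'}]
```

The two location tokens collapse into a single entity spanning characters 19–32, matching the output shown above.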

&lt;p&gt;&lt;strong&gt;When should CRFEntityExtractor be used?&lt;/strong&gt;&lt;br&gt;
CRFEntityExtractor is a good fit when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entity boundaries depend on context&lt;/li&gt;
&lt;li&gt;Formats are inconsistent or unknown&lt;/li&gt;
&lt;li&gt;Natural language varies widely&lt;/li&gt;
&lt;li&gt;You want generalization rather than exact matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is often used alongside RegexEntityExtractor, not instead of it.&lt;br&gt;
Each extractor solves a different problem class.&lt;/p&gt;

&lt;p&gt;In the next blog, we’ll look at how DIETClassifier unifies intent classification and entity extraction, and why modern pipelines increasingly rely on it over standalone CRF models.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rasa</category>
      <category>chatbot</category>
      <category>yaml</category>
    </item>
    <item>
      <title>Understanding the RegexEntityExtractor in RASA</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Mon, 19 Jan 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-the-regexentityextractor-in-rasa-4903</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-the-regexentityextractor-in-rasa-4903</guid>
      <description>&lt;p&gt;Our &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-the-entity-synonym-mapper-in-rasa-56be"&gt;previous blog&lt;/a&gt; explored how the Entity Synonym Mapper helps normalize extracted entities into canonical values.&lt;/p&gt;

&lt;p&gt;Hereafter, we’ll move one step deeper into how entities are detected in the first place, specifically using pattern-based extraction.&lt;br&gt;
This is where the RegexEntityExtractor comes into play.&lt;/p&gt;
&lt;h2&gt;
  
  
  Contents of this blog
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is RegexEntityExtractor&lt;/li&gt;
&lt;li&gt;YAML configuration&lt;/li&gt;
&lt;li&gt;Internal working&lt;/li&gt;
&lt;li&gt;When and why to use it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is the RegexEntityExtractor?&lt;/strong&gt;&lt;br&gt;
The RegexEntityExtractor is a rule-based entity extractor that uses regular expressions to identify entities in user input.&lt;br&gt;
Unlike ML-based extractors, it does not learn from data.&lt;br&gt;
Instead, it works on a very simple principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the text matches a predefined pattern, extract it as an entity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This makes it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic&lt;/li&gt;
&lt;li&gt;Fast&lt;/li&gt;
&lt;li&gt;Extremely precise (when patterns are well-defined)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why do we need it?&lt;/strong&gt;&lt;br&gt;
Not all entities are ambiguous.&lt;br&gt;
Some entities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Follow fixed formats&lt;/li&gt;
&lt;li&gt;Are numerical or structured&lt;/li&gt;
&lt;li&gt;Do not benefit from ML generalization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phone numbers&lt;/li&gt;
&lt;li&gt;Email addresses&lt;/li&gt;
&lt;li&gt;Order IDs&lt;/li&gt;
&lt;li&gt;Dates&lt;/li&gt;
&lt;li&gt;ZIP codes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to train an ML model to extract these is often overkill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Configuration Example&lt;/strong&gt;&lt;br&gt;
Regex patterns are defined directly in your NLU YAML file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
  - regex: phone_number
    examples: |
      - \d{10}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pipeline Configuration&lt;/strong&gt;&lt;br&gt;
To enable it, the extractor must be added to your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Internal Working&lt;/strong&gt;&lt;br&gt;
At a low level, the RegexEntityExtractor works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Takes the raw user message&lt;/li&gt;
&lt;li&gt;Iterates over each regex pattern defined in YAML&lt;/li&gt;
&lt;li&gt;Applies the pattern to the text&lt;/li&gt;
&lt;li&gt;If a match is found:

&lt;ul&gt;
&lt;li&gt;Extracts the matched substring&lt;/li&gt;
&lt;li&gt;Assigns it as an entity&lt;/li&gt;
&lt;li&gt;Stores start and end character indices&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
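&lt;p&gt;The matching loop above can be sketched in plain Python (the helper name and pattern table are made up for illustration; the real logic lives inside the RASA component):&lt;/p&gt;

```python
import re

# Hypothetical pattern table, mirroring what RegexEntityExtractor reads from YAML.
PATTERNS = {"phone_number": r"\d{10}"}

def extract_entities(text):
    """Apply each regex to the message and emit entity dicts with offsets."""
    entities = []
    for name, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            entities.append({
                "entity": name,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    return entities

print(extract_entities("My phone number is 9876543210"))
```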

&lt;p&gt;Consider the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"My phone number is 9876543210"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the entity extracted is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "entity": "phone_number",
  "value": "9876543210",
  "start": 19,
  "end": 29
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Combining with Entity Synonym Mapper&lt;/strong&gt;&lt;br&gt;
A very common pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;RegexEntityExtractor extracts the entity&lt;/li&gt;
&lt;li&gt;Entity Synonym Mapper normalizes it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This combination gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precision&lt;/li&gt;
&lt;li&gt;Consistency&lt;/li&gt;
&lt;li&gt;Clean downstream data&lt;/li&gt;
&lt;/ul&gt;
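&lt;p&gt;As a toy illustration of this two-step flow (the city pattern, synonym table, and helper name are all invented for the example):&lt;/p&gt;

```python
import re

# Step 1: pattern-based extraction; step 2: synonym normalization.
SYNONYMS = {"NYC": "New York City", "Big Apple": "New York City"}

def extract_city(text):
    match = re.search(r"NYC|Big Apple|New York City", text)
    if match is None:
        return None
    # Map the matched surface form to its canonical value.
    return SYNONYMS.get(match.group(), match.group())

print(extract_city("Book a flight to NYC"))  # New York City
```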

&lt;p&gt;&lt;strong&gt;When should RegexEntityExtractor be used?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the entity format is predictable&lt;/li&gt;
&lt;li&gt;When precision matters more than recall&lt;/li&gt;
&lt;li&gt;When you want to reduce ML complexity&lt;/li&gt;
&lt;li&gt;When you want deterministic behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hereafter we’ll explore CRFEntityExtractor, where entities are learned statistically rather than matched explicitly.&lt;/p&gt;

</description>
      <category>yaml</category>
      <category>llm</category>
      <category>rasa</category>
      <category>chatbot</category>
    </item>
    <item>
      <title>Understanding the Entity Synonym Mapper in RASA</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Sat, 17 Jan 2026 13:25:14 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-the-entity-synonym-mapper-in-rasa-56be</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-the-entity-synonym-mapper-in-rasa-56be</guid>
      <description>&lt;p&gt;Our previous blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understating-the-whitespace-tokenizers-2ic7"&gt;Understanding RASA pipelines&lt;/a&gt; &lt;br&gt;
described how RASA NLU handles stories, rules, policies, and forms.&lt;/p&gt;

&lt;p&gt;Hereafter, we'll dive deeper into how entities are normalized in RASA and how the Entity Synonym Mapper works, with YAML examples and practical insights for pipeline development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contents of this blog&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the Entity Synonym Mapper?&lt;/li&gt;
&lt;li&gt;Why entity normalization is important&lt;/li&gt;
&lt;li&gt;YAML configuration example&lt;/li&gt;
&lt;li&gt;Internal working and considerations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is the Entity Synonym Mapper?&lt;/strong&gt;&lt;br&gt;
As we discussed before, a pipeline is made up of modular components, each performing a small but important operation.&lt;/p&gt;

&lt;p&gt;The Entity Synonym Mapper is one such component in RASA NLU pipelines. Its primary role is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;To map different textual representations of the same concept to a canonical form so your model can treat them equivalently.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Think of it as a translator for your entities. For example, your users might type:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"NYC"&lt;/li&gt;
&lt;li&gt;"New York City"&lt;/li&gt;
&lt;li&gt;"Big Apple"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of these mean the same place, but without normalization, your chatbot would treat them as different entities. The Entity Synonym Mapper ensures that all of these map to a single canonical value, e.g., "New York City".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is this important?&lt;/strong&gt;&lt;br&gt;
Machine learning models, and NLP pipelines in general, cannot reason about synonyms automatically.&lt;/p&gt;

&lt;p&gt;Without normalization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intent classification might succeed, but entity extraction will be inconsistent.&lt;/li&gt;
&lt;li&gt;Downstream processes, like database queries or API calls, may fail if the entity values are inconsistent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With normalization:&lt;br&gt;
"NYC" → "New York City"&lt;br&gt;
"Big Apple" → "New York City"&lt;/p&gt;

&lt;p&gt;This reduces variance, improves training efficiency, and ensures predictable behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Configuration Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Entity Synonym Mapper uses a YAML file to define the synonyms. Here’s a minimal example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
  - intent: inform_city
    examples: |
      - I want to travel to [NYC](city)
      - I'm going to [Big Apple](city)
      - Book a hotel in [New York City](city)

  - synonym: New York City
    examples: |
      - NYC
      - Big Apple
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How this works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The intent section shows how users might express a concept in multiple ways.&lt;/li&gt;
&lt;li&gt;The synonym section defines the canonical value (New York City) and the variations that should be mapped to it (NYC, Big Apple).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once defined, any entity recognized as one of the variations is automatically replaced by the canonical value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal Working&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At a low level, the Entity Synonym Mapper operates like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Entity extraction happens first (via your pipeline’s entity extractors).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mapper checks if the extracted entity matches any synonym entry in the YAML file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If a match is found, the entity value is replaced with the canonical value.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as a dictionary lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;synonyms = {
    "NYC": "New York City",
    "Big Apple": "New York City"
}

entity = "NYC"
canonical_value = synonyms.get(entity, entity)
print(canonical_value)
# Output: New York City
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical Example&lt;/strong&gt;&lt;br&gt;
Imagine a chatbot for booking flights:&lt;/p&gt;

&lt;p&gt;User inputs:&lt;br&gt;
&lt;code&gt;"I want to fly to Big Apple next week"&lt;/code&gt;&lt;br&gt;
Without the Entity Synonym Mapper:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;{&lt;br&gt;
  "intent": "inform_city",&lt;br&gt;
  "entities": [{&lt;br&gt;
      "entity": "city",&lt;br&gt;
      "value": "Big Apple"}]&lt;br&gt;
 }&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With the Mapper:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;{&lt;br&gt;
  "intent": "inform_city",&lt;br&gt;
  "entities": [{&lt;br&gt;
      "entity": "city",&lt;br&gt;
      "value": "New York City"}]&lt;br&gt;
}&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now your downstream logic, such as searching flight databases, always receives consistent entity values, eliminating errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to Use the Entity Synonym Mapper&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you have common abbreviations or nicknames in user input.&lt;/li&gt;
&lt;li&gt;When you want consistent entity values for downstream actions.&lt;/li&gt;
&lt;li&gt;When training on multiple intents that share the same entity concept but have different expressions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll explore RegexEntityExtractor, diving into pattern-based entity extraction and how it complements the Entity Synonym Mapper for robust NLU.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>chatbot</category>
      <category>yaml</category>
      <category>rasa</category>
    </item>
    <item>
      <title>Understanding the whitespace tokenizer!</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Thu, 08 Jan 2026 11:50:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understating-the-whitespace-tokenizers-2ic7</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understating-the-whitespace-tokenizers-2ic7</guid>
      <description>&lt;p&gt;Our previous blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-rasa-pipelines-gii"&gt;Understanding RASA pipelines&lt;/a&gt; describes how RASA NLU handles stories, rules, policies and forms. &lt;br&gt;
Hereafter, we'll dive deeper into how pipelines are developed and how each component fits in.&lt;/p&gt;
&lt;h1&gt;
  
  
  Contents of this blog:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Developing Pipelines&lt;/li&gt;
&lt;li&gt;WhitespaceTokenizer&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Developing Pipelines:
&lt;/h2&gt;

&lt;p&gt;As we discussed in the last blog, a pipeline is the basic architecture of any chatbot. These pipelines are built much like functional or object-oriented code: the programmer writes small functions for specific operations and then composes them into larger functionality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def add(x, y):
    return x + y

def add_two_num(a, b):
    print(add(a, b))

if __name__ == "__main__":
    num1 = int(input("Provide the 1st num: "))
    num2 = int(input("Provide the 2nd num: "))
    add_two_num(num1, num2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we develop a pipeline, the basic considerations are: what do we want to achieve, and is there a pre-existing package that already does it?&lt;/p&gt;

&lt;p&gt;If your answer is yes, it makes things very easy!&lt;/p&gt;

&lt;p&gt;The most basic resources for anyone working with RASA lie in its &lt;a href="https://rasa.com/docs" rel="noopener noreferrer"&gt;base documentation&lt;/a&gt;, the &lt;a href="https://github.com/RasaHQ/rasa" rel="noopener noreferrer"&gt;base repository&lt;/a&gt;, and their &lt;a href="https://rasa.com/docs/reference" rel="noopener noreferrer"&gt;API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once we identify a set of pipeline components that could be useful for us, we stack them one on top of the other, building up our functionality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example:
&lt;/h2&gt;

&lt;p&gt;Very recently I developed a bot for a clinic. As the answers required consistency and the queries could range from 'I need a vet' to 'Mind one for the animal doctor', RASA was the perfect fit.&lt;/p&gt;

&lt;p&gt;When I was working with RASA, I built the architecture bottom-up, beginning by defining what counts as a word.&lt;/p&gt;

&lt;p&gt;This is where we use the:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;WhitespaceTokenizer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, even though we've heard of or mentioned the 'WhitespaceTokenizer' before this blog, I want to dive deep into the workings of the module.&lt;/p&gt;

&lt;p&gt;It is the first step of a RASA NLU pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline:
- name: WhitespaceTokenizer
  intent_tokenization_flag: true
  intent_split_symbol: "_"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only purpose it serves is to break user sentences into 'tokens'; it is not used for syntactic analysis, intent analysis, or even sentence normalisation.&lt;/p&gt;

&lt;p&gt;It is what decides where one 'token' ends and the next begins; note that, as redundant as this may sound, we work with tokens, not words.&lt;/p&gt;

&lt;p&gt;As ML models are unable to work directly with large string data, or rather raw text, they use tokens, which are then converted to features and further into embeddings. The WhitespaceTokenizer is the simplest type: it only looks for whitespace within a sentence to define tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal working:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Tokenization
&lt;/h4&gt;

&lt;p&gt;Consider a sentence, as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;'Hey? Can you direct me to the purchase page?'&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now the tokenizer works by dividing the sentence on whitespace, forming a list:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;["Hey?", "Can", "you", "direct", "me", "to", "the", "purchase", "page?"]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The tokenizer does not remove any punctuation from sentences; this simple rule allows a range of emotions to be captured from each input.&lt;/p&gt;

&lt;p&gt;Linguistically, Hey!, Hey?, Hey? (hesitant), or even a plain Hey can carry a multitude of different meanings, which the model must capture to be precise. Whenever the module forms a single token, the information it stores consists of the token text itself, its starting character index within the string, and its ending character index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "text": "direct",
  "start": 13,
  "end": 19
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
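&lt;p&gt;This splitting and offset bookkeeping can be sketched in a few lines of Python (an illustrative toy, not RASA's actual implementation):&lt;/p&gt;

```python
# Toy whitespace tokenizer: split on whitespace, keep punctuation,
# and record each token's start/end character indices.
def tokenize(text):
    tokens = []
    cursor = 0
    for word in text.split():
        start = text.index(word, cursor)  # locate this occurrence
        end = start + len(word)
        tokens.append({"text": word, "start": start, "end": end})
        cursor = end
    return tokens

for token in tokenize("Hey? Can you direct me to the purchase page?"):
    print(token)
```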



&lt;p&gt;In terms of low-level code, one could compare RASA's string handling to how strings are terminated in C with '\0', or to null pointers marking the end of a linked list.&lt;/p&gt;

&lt;p&gt;Rather than being used on its own, the WhitespaceTokenizer is seen as a building block. Another similar tokenizer, built around periods, is the RegexTokenizer. It too is consistently used within projects, but rather than working with word-level tokens, it works with paragraphs and further divides them into sentences.&lt;/p&gt;

&lt;p&gt;Now that we have our building block in place, hereafter we'll move on to how sentences are considered syntactically.&lt;/p&gt;

&lt;p&gt;The next blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-the-entity-synonym-mapper-in-rasa-56be"&gt;To be released&lt;/a&gt;&lt;/p&gt;

</description>
      <category>yaml</category>
      <category>llm</category>
      <category>chatbot</category>
      <category>rasa</category>
    </item>
    <item>
      <title>Understanding RASA pipelines</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Tue, 06 Jan 2026 11:50:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-rasa-pipelines-gii</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-rasa-pipelines-gii</guid>
      <description>&lt;p&gt;Our previous blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-yaml-45ck"&gt;Understanding YAML&lt;/a&gt; describes how RASA NLU handles entities, intents and how slots are used within RASA.&lt;/p&gt;

&lt;p&gt;This blog will discuss the need and use of stories, rules, policies, and forms within a chatbot.&lt;/p&gt;

&lt;p&gt;Contents of this blog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stories&lt;/li&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Policies&lt;/li&gt;
&lt;li&gt;Forms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, What are stories?&lt;/p&gt;

&lt;h2&gt;
  
  
  Stories
&lt;/h2&gt;

&lt;p&gt;If intents describe what the user wants, entities describe the details, and slots describe what the assistant remembers, then stories describe how a conversation flows over time. In simple terms, stories teach RASA what should happen next. They are examples of conversations written from start to finish, showing how the assistant should respond given a sequence of user inputs, slot values, and actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why stories exist&lt;/strong&gt;&lt;br&gt;
Unlike rule-based chatbots that follow rigid decision trees, RASA learns dialogue behaviour from examples. Stories provide those examples. Instead of explicitly coding:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;If the user says X, then do Y&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The programmer shows RASA:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;When conversations look like this, the assistant usually responds like that.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a story contains?&lt;/strong&gt;&lt;br&gt;
A story is a sequence of: User intents, Optional entities and slot updates, and Assistant actions written chronologically.&lt;/p&gt;

&lt;p&gt;Essentially stories are training sets which train the bot on a set behaviour for some branch of the conversation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User says something
→ Bot responds
→ User provides more info
→ Bot reacts accordingly

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sequence is what RASA learns from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic story structure&lt;/strong&gt;&lt;br&gt;
Stories are defined in stories.yml.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

stories:
- story: report symptom with duration
  steps:
  - intent: report_symptom
    entities:
    - symptom: fever
  - action: action_ask_duration
  - intent: provide_duration
    entities:
    - duration: three days
  - action: action_give_advice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step represents one turn in the conversation. This provides the programmer the ability to be as nuanced or intentional as they want to be with their respective bot, and conversation direction.&lt;/p&gt;

&lt;p&gt;However, to ensure the bot doesn't respond to unintended queries, we implement rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rules
&lt;/h2&gt;

&lt;p&gt;If stories teach RASA how conversations usually flow, rules define what must always happen. Rules are used when there is no room for ambiguity. They ensure that certain behaviors are deterministic, predictable, and enforced, regardless of context, wording, or conversation history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why rules exist&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Machine learning is probabilistic by nature. That’s great for flexible conversations, but dangerous when a set condition is required to occur.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A goodbye should always end the conversation.&lt;/li&gt;
&lt;li&gt;A form must always ask missing information.&lt;/li&gt;
&lt;li&gt;An emergency symptom must always escalate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rules act as guardrails that override uncertainty within response selection, adding deterministic behaviour within the responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a rule contains&lt;/strong&gt;&lt;br&gt;
A rule describes two important properties of behaviour:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A condition (intent, slot, or active loop)&lt;/li&gt;
&lt;li&gt;A mandatory action that must follow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Unlike stories, which preserve probabilistic behaviour, rules do not branch, do not generalise, and are applied without variation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic rule structure&lt;/strong&gt;&lt;br&gt;
A basic rule structure consists of the name of the rule, and the steps which are to be carried out by that rule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

rules:
- rule: say goodbye
  steps:
  - intent: goodbye
  - action: utter_goodbye
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules can also depend on slots or conversation state, meaning certain conditions must be met before the rule is executed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- rule: emergency escalation
  condition:
  - slot_was_set:
    - emergency: true
  steps:
  - action: action_emergency_protocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When both rules and stories apply to a conversation, RASA follows a rule-first order: if a matching rule exists, RASA will follow it even if a story suggests a different response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policies&lt;/strong&gt;&lt;br&gt;
If intents and entities help RASA understand what the user said, and stories and rules describe how conversations should flow, then policies decide which action the assistant should take next.&lt;br&gt;
Policies are the decision-makers of RASA’s dialogue system. A policy is a strategy that RASA uses to predict the next action based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The current conversation state&lt;/li&gt;
&lt;li&gt;The intent detected&lt;/li&gt;
&lt;li&gt;Extracted entities&lt;/li&gt;
&lt;li&gt;Slot values&lt;/li&gt;
&lt;li&gt;Previous actions&lt;/li&gt;
&lt;li&gt;Active rules or forms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiple policies can exist at once, and RASA evaluates all of them before choosing the final action. Within the information-processing architecture, policies sit after the conversation state has been considered.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message
   |
   V
NLU (intent + entities)
   |
   V
Tracker (conversation state)
   |
   V
Policies evaluate state
   |
   V
Best next action chosen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Policies operate after NLU and before response execution.&lt;/p&gt;

&lt;p&gt;Each policy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Looks at the conversation tracker&lt;/li&gt;
&lt;li&gt;Predicts the next action&lt;/li&gt;
&lt;li&gt;Assigns a confidence score&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;RASA then selects the action with the highest confidence across all policies. A typical config.yml might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;policies:
  - name: RulePolicy
  - name: MemoizationPolicy
  - name: TEDPolicy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
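&lt;p&gt;The "highest confidence wins" selection described above can be sketched as follows (the prediction list is invented for illustration):&lt;/p&gt;

```python
# Each policy proposes a next action with a confidence score;
# RASA-style selection keeps the highest-confidence proposal.
predictions = [
    {"policy": "RulePolicy", "action": "utter_goodbye", "confidence": 1.0},
    {"policy": "MemoizationPolicy", "action": "utter_goodbye", "confidence": 0.8},
    {"policy": "TEDPolicy", "action": "action_give_advice", "confidence": 0.6},
]

best = max(predictions, key=lambda p: p["confidence"])
print(best["policy"], best["action"])  # RulePolicy utter_goodbye
```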



&lt;p&gt;&lt;strong&gt;RulePolicy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RulePolicy enforces rules. It checks whether any rule applies; if yes, it executes the rule-defined action and overrides all other policies. This guarantees deterministic behavior: if a rule matches, no ML prediction is needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MemoizationPolicy&lt;/strong&gt;&lt;br&gt;
Memoization is exact recall. If the current conversation state exactly matches a previously seen story, RASA repeats the same next action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TEDPolicy&lt;/strong&gt;&lt;br&gt;
The TEDPolicy is RASA’s main ML-based dialogue policy. It embeds conversation states, learns patterns across stories, and generalises to unseen paths.&lt;/p&gt;

&lt;p&gt;TED allows the assistant to handle paraphrases, adapt to partial information, and manage complex, branching conversations.&lt;/p&gt;

&lt;p&gt;When RASA processes policies, it follows this conceptual order. In our example: RulePolicy first (deterministic behaviour), then MemoizationPolicy (exact recall of trained/seen data), and finally TEDPolicy to hand things off to ML-based prediction.&lt;/p&gt;
&lt;h2&gt;
  
  
  Forms
&lt;/h2&gt;

&lt;p&gt;If intents tell RASA what the user wants, entities extract key information, and policies decide what to do next, then forms exist to systematically collect missing information.&lt;br&gt;
Forms are RASA’s way of saying:&lt;br&gt;
&lt;code&gt;I can’t proceed until I have everything I need.&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;A form is a controlled dialogue mechanism used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask the user for required information&lt;/li&gt;
&lt;li&gt;Validate inputs&lt;/li&gt;
&lt;li&gt;Store values in slots&lt;/li&gt;
&lt;li&gt;Maintain conversational context until completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They exist because free-flow conversation breaks down when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple values are required&lt;/li&gt;
&lt;li&gt;Order matters&lt;/li&gt;
&lt;li&gt;Missing data blocks progress&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In our previous pipeline, forms act right after intent and entity consideration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message
   |
   V
Intent + Entities
   |
   V
Form activated
   |
   V
Ask for required slots
   |
   V
Validate inputs
   |
   V
Form deactivates

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Defining a form&lt;/strong&gt;&lt;br&gt;
Forms are declared in domain.yml in the following manner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;forms:
  symptom_form:
    required_slots:
      - symptom
      - duration
      - severity

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
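&lt;p&gt;The slot-filling loop a form performs can be sketched like this (hypothetical helper; RASA handles this internally once the form is declared):&lt;/p&gt;

```python
# Ask for the first required slot that is still empty;
# when none are empty, the form can deactivate.
REQUIRED_SLOTS = ["symptom", "duration", "severity"]

def next_slot_to_ask(slots):
    for name in REQUIRED_SLOTS:
        if slots.get(name) is None:
            return name
    return None  # all slots filled

slots = {"symptom": "fever", "duration": None, "severity": None}
print(next_slot_to_ask(slots))  # duration
```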



&lt;p&gt;When a required slot is empty and the form has been activated in stories or rules, RASA automatically asks for it.&lt;br&gt;
This covers the basics of using RASA to build a chatbot; from here, we will begin diving deeper into how the chatbot files play off each other, how policies themselves work, and intentional actions.&lt;/p&gt;

&lt;p&gt;The next blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understating-the-whitespace-tokenizers-2ic7"&gt;To be released&lt;/a&gt;&lt;/p&gt;

</description>
      <category>yaml</category>
      <category>llm</category>
      <category>rasa</category>
      <category>chatbot</category>
    </item>
    <item>
      <title>Understanding YAML</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Sat, 03 Jan 2026 12:50:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-yaml-45ck</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-yaml-45ck</guid>
      <description>&lt;p&gt;Following up &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-rasa-1h1c"&gt;Understanding RASA&lt;/a&gt; which discussed Featurizers and Classifiers, and Pipelines, this one will dive into Stories, Rules and Policies.&lt;/p&gt;

&lt;p&gt;Contents of this blog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;YAML&lt;/li&gt;
&lt;li&gt;Intents&lt;/li&gt;
&lt;li&gt;Entities&lt;/li&gt;
&lt;li&gt;Slots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This blog will introduce readers to essential building blocks of yaml and RASA itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  YAML
&lt;/h2&gt;

&lt;p&gt;YAML (or YML), despite the name, stands for "YAML Ain't Markup Language": unlike HTML or XML, it is not used for website design; when using RASA, it acts as the structural foundation. Before we move on, let's begin with understanding what YAML is and how it works.&lt;/p&gt;

&lt;p&gt;YAML is effectively a data-description language that uses indentation to form typed blocks. In YAML, a type can be considered a superclass that contains all the subtypes of that particular class.&lt;/p&gt;

&lt;p&gt;The most common structure of a block is as provided below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Type:
    examples: |
      - e.g. 1
      - e.g. 2
      - e.g. 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure is most commonly used for defining intents, slots or entities when providing examples. &lt;/p&gt;

&lt;h2&gt;
  
  
  Intents
&lt;/h2&gt;

&lt;p&gt;Intents are essentially what the user intended to say with their message. Now, even though RASA uses ML, that ML is applied only through the policies you set and create. (If you don't know what those are, head on over to &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-rasa-1h1c"&gt;understanding RASA&lt;/a&gt;, where it is clearly established in the first blog.)&lt;/p&gt;

&lt;p&gt;It is an effective mapping tool which links all the similar meaning to a singular intent which conveys the broader response pattern.&lt;/p&gt;

&lt;p&gt;E.g:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I think I have a stomachache,
My stomach hurts,
I might be having abdominal pain.

Are all linked to the intent: intent_symptom_stomach_ache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So an intent answers:&lt;br&gt;
What is the user trying to do or express?&lt;/p&gt;

&lt;p&gt;Within RASA itself, intents are the core unit of NLU (Natural Language Understanding). When considering a pipeline, we operate on text as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                         [Some input query]
                                  |
                                  V
          [RASA model predictions from intents and entities]
                                  |
                                  V
           [Dialogue manager manages which O/P to provide]
                                  |
                                  V
                           [Bot response]



*Pipeline representation using ASCII art.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internally, these queries are represented in JSON like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "intent": {
    "name": "report_symptom",
    "confidence": 0.92
  },
  "entities": [
    {"entity": "symptom", "value": "cough"},
    {"entity": "duration", "value": "two days"}
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here the intent tag defines which type of query the user provided, and the confidence value shows how confident the module is in this prediction.&lt;/p&gt;

&lt;p&gt;In actuality, when building chatbots, we would define these as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.1"

nlu:
- intent: greet
  examples: |
    - hello
    - hi
    - good morning

- intent: report_symptom
  examples: |
    - I have a headache
    - My head hurts
    - I've been coughing for two days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the intent is the group label under which the examples belong. Intents are then registered in a file called domain.yml, which acts as the initialisation of each intent/group for user queries.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;intents:
  - greet
  - goodbye
  - affirm
  - deny
  - mood_great
  - mood_unhappy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Entity
&lt;/h2&gt;

&lt;p&gt;While the intent captures the intention behind a sentence, an entity captures the specific value the user provided. Without the specificity of entities, the information cannot be used for in-depth responses. Entities enable dynamic behaviour by allowing the conversation to branch on their values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Intent -&amp;gt; Symptom
Entities -&amp;gt; 
  symptomp = fever.
  duration = No. of days.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without these entities, the bot loses much of its apparent intelligence. In RASA, entities are extracted by the NLU and passed to the dialogue manager. JSON represents them in a similar manner to intents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"entities": [
    {"entity": "symptom", "value": "fever"},
    {"entity": "duration", "value": "three days"}
  ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Entity definition occurs within the same file where we define the examples of intents i.e. 'nlu.yml'. However, the actual initialisation occurs within domain.yml in a similar manner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- intent: report_symptom
  examples: |
    - I have a [fever](symptom)
    - I've been coughing for [two days](duration)
    - My [head](body_part) hurts


*within nlu.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
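&lt;p&gt;For completeness, the corresponding initialisation in domain.yml would look roughly like this (a minimal sketch; the entity names follow the examples above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;entities:
  - symptom
  - duration
  - body_part


*within domain.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;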



&lt;p&gt;Entities come in multiple types: plain word entities, categorical, numerical, and lookup-table or regex entities. They are often used together to capture as much information as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Slots
&lt;/h2&gt;

&lt;p&gt;If intents answer what the user wants, and entities answer which specific information they provided, then slots answer what the assistant remembers.&lt;/p&gt;

&lt;p&gt;In simple terms, slots act as RASA’s memory system.&lt;/p&gt;

&lt;p&gt;While entities are extracted from a single user message, slots persist across multiple turns of conversation. This allows the chatbot to reason contextually instead of treating every user message as an isolated input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why slots are needed&lt;/strong&gt;&lt;br&gt;
Consider the following interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: I have a fever.
Bot: How long have you had it?
User: Three days.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the bot must remember that:&lt;br&gt;
“it” refers to fever&lt;br&gt;
the symptom has already been mentioned&lt;/p&gt;

&lt;p&gt;This continuity is made possible by slots. Without slots, the dialogue manager would not retain previous information, and the conversation would feel repetitive or incoherent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slot representation:&lt;/strong&gt;&lt;br&gt;
Extending the earlier pipeline representation, slots would act accordingly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                         [User input]
                               |
                               V
                [Intent &amp;amp; Entity extraction]
                               |
                               V
                 [Slot filling / slot update]
                               |
                               V
               [Dialogue manager (policies)]
                               |
                               V
                        [Bot response]


*extension of the pipeline from intents.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slots sit between NLU and dialogue management, acting as state variables that influence which action or response is selected next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defining slots&lt;/strong&gt;&lt;br&gt;
Slots are initialised inside domain.yml, similar to intents and entities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;slots:
  symptom:
    type: text
    influence_conversation: true
  duration:
    type: text
    influence_conversation: true

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;br&gt;
The type defines how the data is stored, and influence_conversation determines whether the slot's value influences which action is predicted next.&lt;/p&gt;

&lt;p&gt;When an entity is extracted, RASA can automatically map it to a corresponding slot. &lt;/p&gt;
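&lt;p&gt;In recent RASA versions (3.x), this mapping is declared explicitly on the slot itself; a minimal sketch, reusing the symptom entity from earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;slots:
  symptom:
    type: text
    influence_conversation: true
    mappings:
      - type: from_entity
        entity: symptom


*within domain.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;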

&lt;p&gt;&lt;strong&gt;Slots in JSON representation&lt;/strong&gt;&lt;br&gt;
Once filled, slots are stored internally as part of the conversation state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"slots": {
  "symptom": "fever",
  "duration": "three days"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slots can store different kinds of information depending on the use case.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text slots: store raw strings (e.g., symptoms, names)&lt;/li&gt;
&lt;li&gt;Categorical slots: restrict values to a predefined set (e.g., mild / moderate / severe)&lt;/li&gt;
&lt;li&gt;Boolean slots: true/false flags (e.g., emergency_present)&lt;/li&gt;
&lt;li&gt;Float / integer slots: numerical values such as age or dosage&lt;/li&gt;
&lt;li&gt;List slots: store multiple values (e.g., multiple symptoms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the most common types.&lt;/p&gt;
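&lt;p&gt;As a sketch, these slot types are declared in domain.yml as follows (the slot names are illustrative, and RASA 3.x additionally requires a mappings key per slot, omitted here for brevity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;slots:
  severity:
    type: categorical
    values:
      - mild
      - moderate
      - severe
  emergency_present:
    type: bool
  symptoms:
    type: list


*within domain.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;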

&lt;p&gt;Now that the most basic definitions have been established, we'll look into how this behaviour is handled by the predefined pipeline.&lt;/p&gt;

&lt;p&gt;The next blog: &lt;a href="https://dev.tourl"&gt;To be released&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rasa</category>
      <category>yaml</category>
      <category>chatbot</category>
      <category>llm</category>
    </item>
    <item>
      <title>Understanding RASA</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Thu, 01 Jan 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/understanding-rasa-1h1c</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/understanding-rasa-1h1c</guid>
      <description>&lt;p&gt;Previously, we understood the basics of Natural Language Processing ranging from sentence segmentation to parsing. These essential fundamentals form the foundation for understanding how systems work with and manipulate sentences. If you haven't read the blog, you can read it &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/the-predecessors-of-llms-understanding-chatbots-365i"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Moving forward we'll dive into understanding chatbots and building them using RASA.&lt;/p&gt;

&lt;p&gt;Contents of this blog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chatbots and development&lt;/li&gt;
&lt;li&gt;What is RASA?&lt;/li&gt;
&lt;li&gt;RASA core.&lt;/li&gt;
&lt;li&gt;Featurizers and Classifiers.&lt;/li&gt;
&lt;li&gt;Pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Definitions&lt;/strong&gt;&lt;br&gt;
Intents: An intent is a specific grouping of messages which the module can anticipate being used to map responses to.&lt;/p&gt;

&lt;p&gt;Classifiers: Classifiers essentially take features produced by featurizers and make predictions.&lt;/p&gt;

&lt;p&gt;Entity: An entity is a piece of information that the chatbot extracts from the user to perform some action.&lt;/p&gt;

&lt;p&gt;Slots: Slots are temporary variables used to hold data from conversations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Chatbot and development
&lt;/h2&gt;

&lt;p&gt;Chatbots are basic response systems used for providing answers to queries. Although they are meant to be consistent, we can add dynamism by considering the methods by which they are developed.&lt;/p&gt;

&lt;p&gt;Effectively, chatbot development consisted of linking predefined answers to questions based on user needs. Most commonly these were used as assistants within web services, though their utility doesn't end there. With the introduction of modern AI, dedicated chatbot development has become scarcer; as people prefer building LLMs with neural networks, interest in rule-based chat models has dwindled.&lt;/p&gt;

&lt;p&gt;However, this doesn't make them ancient tech; rather, a better understanding of preset response systems can help newer developers grasp the foundations of LLM development in general.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So how does one develop chatbots?&lt;/strong&gt;&lt;br&gt;
The development of chatbots varies a lot: for some, a simple if-else structure leading to different website responses counts as a chatbot; for others, a chatbot should be dynamic enough to be syntactically intelligent while remaining consistent in its answers.&lt;/p&gt;

&lt;p&gt;Chatbot development can be divided into static type and dynamic type (very broadly) based on this user need.&lt;/p&gt;

&lt;p&gt;As previously stated, a simple if-else clause handling a 'y/n' response can be considered a rudimentary chatbot. In this case the answers are preloaded in the form of links or redirects to relevant pages.&lt;/p&gt;

&lt;p&gt;In recent years the web-dev scene has moved closer to adopting syntactical analysers, adding a sense of dynamism while keeping consistency through such logical iterators. These form the basis of website helpers or assistants: nearly intelligent, yet partly logical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: Hey, Mind redirecting me to the Home page? I can't seem to find the link.
Assistant (internal): Hey, Mind redirecting me to the Home page? (Question)
I can't seem to find the link. (sentence)

Assistant (syntactical handling): ['Redirect'(action), 'me'(user), 'Home page'(location), 'can't find link' (reason)]

System response: https://Link_for_loc.org (some worded response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way the developer has complete control over the prompt responses, which offers flexibility when developing large scalable websites. It reduces the 'black box' of neural network training and adds more transparency to the system itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RASA?
&lt;/h2&gt;

&lt;p&gt;Now that we've established the need for response control with syntactical analysis one might question what RASA even is.&lt;/p&gt;

&lt;p&gt;It's exactly that: RASA is a Python framework that provides syntactical intelligence to systems. It enables an ML-based input system capable of understanding synonyms as well as full sentence variation.&lt;/p&gt;

&lt;p&gt;For this, the RASA module utilizes two core sub-modules which are responsible for this ability.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;RASA core&lt;/li&gt;
&lt;li&gt;RASA NLU &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For this blog we'll set our focus on what RASA core is and does, specifically looking at featurizers, classifiers, pipelines, and policies.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RASA core?
&lt;/h2&gt;

&lt;p&gt;RASA core can be considered as the responder to the user queries. The queries are commonly handled by the NLU (Natural Language Understanding) engine, whereas the responses are managed by the core.&lt;/p&gt;

&lt;p&gt;RASA core is a state machine: it keeps track of the conversation, works out what the user intends from a sentence, and finds the appropriate response for the query. While we'll cover NLU in an upcoming blog, I'll briefly explain how it works here, as an understanding of intents is crucial for understanding response generation.&lt;/p&gt;

&lt;p&gt;Simply stated, RASA NLU utilizes a file known as 'domain.yml'. It is a YAML file used to declare all the types of intents: under the 'intents' header, all the relevant group titles are listed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;intents:
  - greet
  - goodbye
  - affirm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This essentially tells the model that there will be some queries which you may expect of the type 'greet' or 'goodbye'.&lt;/p&gt;

&lt;p&gt;The formatting here comes from YAML itself, which uses Python-like indentation followed by dashes to declare list items.&lt;/p&gt;

&lt;p&gt;These intents are then "initialised", similarly to how variable initialisation works: under the file named 'nlu.yml' we declare all the examples of what 'greet' may look like (if this is difficult to picture, imagine a super class called greet which holds all its methods: examples of greetings).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- intent: greet
  examples: |
    - hey
    - hello
    - hi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once all of these examples have been added, we map these groups to their respective responses within a rules file, which lists the steps to be performed once a matching query is asked.&lt;/p&gt;
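&lt;p&gt;A minimal sketch of such a rule, assuming a response named utter_greet has been defined under the responses section of domain.yml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rules:
- rule: Respond to a greeting
  steps:
  - intent: greet
  - action: utter_greet


*within rules.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;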

&lt;p&gt;As we understand how the model understands and handles data inputs let's move onto the actual reason behind why it is able to understand varying degrees of similar sentence intentions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Featurizers and Classifiers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Featurizers&lt;/strong&gt;: Featurizers convert user messages (text) into numerical representations (features) that machine-learning models can understand.&lt;/p&gt;

&lt;p&gt;Effectively, they take the parsed input and produce a vector representation internally. This allows the model to recognise patterns, capture meaning, and obtain a sense of 'What do you mean?'&lt;/p&gt;

&lt;p&gt;There are multiple featurizers; which ones you use depends on your requirements.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whitespace tokenizer: strictly a tokenizer rather than a featurizer, but it sits at the start of the pipeline. It forms tokens by treating every space between two words as the boundary where a new token begins.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;within: stories.yml
language: en
pipeline:
- name: WhitespaceTokenizer
  intent_tokenization_flag: true
  intent_split_symbol: "_"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Regex Featurizer: In Rasa, the RegexFeaturizer is a lightweight feature extractor that adds binary features based on whether parts of a user message match predefined regular expressions. It does not extract entities by itself, and it does not classify intents. Instead, it helps classifiers (like the DIETClassifier) by giving them strong additional signals.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;within: stories.yml
- name: RegexFeaturizer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Lexical Syntactic Featurizer: In Rasa, the Lexical Syntactic Featurizer (officially LexicalSyntacticFeaturizer) is a token-level featurizer that adds linguistic pattern features based on the form and position of each token in a sentence. It helps classifiers and entity extractors recognize structural patterns, not meaning.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;within: stories.yml
- name: LexicalSyntacticFeaturizer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And many more!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classifiers&lt;/strong&gt;: &lt;br&gt;
In Rasa, classifiers are machine-learning components that take the numeric features produced by featurizers and use them to predict labels from user messages.&lt;/p&gt;

&lt;p&gt;Those labels are mainly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intents (what the user wants)&lt;/li&gt;
&lt;li&gt;Entities (important structured values in the text)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What a classifier does&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receives features (sparse + dense) from featurizers&lt;/li&gt;
&lt;li&gt;Learns patterns from labeled training data&lt;/li&gt;
&lt;li&gt;Predicts labels for new user messages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In short:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text → Features → Classifier → Intent / Entities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some common components are the DIETClassifier and the EntitySynonymMapper. Together they classify intents and extract and normalise entities from user queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DIETClassifiers&lt;/strong&gt;&lt;br&gt;
DIET stands for Dual Intent and Entity Transformer. It performs both intent classification and entity extraction within a single model, in place of using two separate models.&lt;/p&gt;

&lt;p&gt;Featurizers such as the RegexFeaturizer (patterns), LexicalSyntacticFeaturizer (wording), and CountVectorsFeaturizer (vectorization) preprocess the initial input. DIET receives these features as input and, like transformer models such as BERT, uses self-attention, which allows it to learn contextual meaning.&lt;/p&gt;
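&lt;p&gt;Putting these together, a typical pipeline configuration with DIET would look roughly like this (a sketch; the epoch count is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper


*within config.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;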

&lt;p&gt;&lt;strong&gt;Entity Synonym Mapper&lt;/strong&gt;&lt;br&gt;
Used for normalising entity values: it maps synonymous phrasings from varying user inputs onto a single canonical value. This effectively translates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need a heart doctor -&amp;gt; I need a cardiologist.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These variations are spotted by the synonym mapper; the resulting canonical values are then held in slots, which are mapped within the 'domain.yml' file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;slots:
  patient_name:
    type: text
    mappings:
      - type: from_entity
        entity: patient_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
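&lt;p&gt;The synonyms themselves are typically declared alongside the training data; a minimal sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- synonym: cardiologist
  examples: |
    - heart doctor
    - heart specialist


*within nlu.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;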



&lt;p&gt;&lt;strong&gt;Pipelines&lt;/strong&gt;&lt;br&gt;
Pipelines are the architectural ordering used to process sentences. They are largely task-specific, so RASA itself provides prebuilt pipeline configurations. Since these are open source, developers are encouraged to compose their own sequencing for their task, which is essential when flexibility is preferred.&lt;/p&gt;

&lt;p&gt;For e.g.&lt;br&gt;
SpaCy Pipeline, Bert Pipeline or Bio-Bert Pipelines.&lt;/p&gt;

&lt;p&gt;SpaCy Pipeline:&lt;br&gt;
SpaCy is an NLP library with pre-trained embeddings for multiple languages. It provides word embeddings along with POS tagging, lemmatization, and some NER.&lt;/p&gt;
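&lt;p&gt;A SpaCy-based pipeline would be configured roughly as follows (a sketch; the model name is illustrative and must be installed separately):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;language: en
pipeline:
- name: SpacyNLP
  model: en_core_web_md
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: DIETClassifier
  epochs: 100


*within config.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;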


&lt;p&gt;Bert Pipeline: provides deep contextual embeddings, i.e. it understands meaning from context. For RASA, Hugging Face transformer components or DIET are integrated into the pipeline.&lt;/p&gt;


&lt;p&gt;Bio-BERT: a domain-specific version of BERT pretrained on biomedical text, making it better suited for medical terminology. It offers more accurate NER and is useful for symptom checkers based on disease and drug names, as well as appointment scheduling for a specified specialist.&lt;/p&gt;

&lt;p&gt;These modules basically form the backbone of response generation and information outputting. In the following blogs I'll dive deeper into the rules, policies and stories which will inform you how the rules for what to output are formed.&lt;/p&gt;

&lt;p&gt;Until next time!&lt;br&gt;
The next blog: &lt;a href="https://dev.tourl"&gt;To be decided&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rasa</category>
      <category>yaml</category>
      <category>chatbot</category>
      <category>llm</category>
    </item>
    <item>
      <title>The predecessors of LLM's: Understanding Chatbots</title>
      <dc:creator>Unknownerror-404</dc:creator>
      <pubDate>Tue, 30 Dec 2025 12:00:00 +0000</pubDate>
      <link>https://forem.com/aniket_kuyate_15acc4e6587/the-predecessors-of-llms-understanding-chatbots-365i</link>
      <guid>https://forem.com/aniket_kuyate_15acc4e6587/the-predecessors-of-llms-understanding-chatbots-365i</guid>
      <description>&lt;p&gt;Contents of this blog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sentence segmentation&lt;/li&gt;
&lt;li&gt;Tokenization&lt;/li&gt;
&lt;li&gt;POS tagging&lt;/li&gt;
&lt;li&gt;Parsing&lt;/li&gt;
&lt;li&gt;Named Entity Recognition&lt;/li&gt;
&lt;li&gt;Relation extraction&lt;/li&gt;
&lt;li&gt;Conversational Chatbots using RASA&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Natural Language Processing:
&lt;/h2&gt;

&lt;p&gt;For those unfamiliar with it, Natural Language Processing (NLP) can be described as the application of computational linguistics within computer science. While this definition captures the theory, its practical meaning is best understood through application.&lt;/p&gt;

&lt;p&gt;In practice, NLP involves building systems that can process and work with human language, ranging from analyzing sentence structure to generating appropriate responses based on that analysis, as seen in modern large language models (LLMs).&lt;/p&gt;

&lt;p&gt;However, generating meaningful responses requires a clear understanding of several foundational concepts, some theory, and a significant amount of practical experimentation.&lt;/p&gt;

&lt;p&gt;In this series, I aim to explore the process of building small-scale pretrained chatbots, beginning with rule and intent-based systems using RASA and YAML, and gradually progressing toward small-scale LLMs. So, let’s begin with the basics…&lt;/p&gt;

&lt;h2&gt;
  
  
  Sentence segmentation
&lt;/h2&gt;

&lt;p&gt;Sentence segmentation is the most essential and one of the earliest processing steps. It is used to track the start and end of each sentence within a given paragraph.&lt;br&gt;
For e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;It was nearly midnight. The Doctor was on his way out.
It was nearly midnight -&amp;gt; Sentence 1
the Doctor was on his way out. -&amp;gt; Sentence 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Accurate sentence segmentation is critical, as errors at this stage can propagate to downstream tasks such as parsing, named entity recognition, and information extraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tokenization
&lt;/h2&gt;

&lt;p&gt;Tokenization is the process of dividing each sentence found by segmentation into smaller units called "tokens". Essentially, the sentence is divided into meaningful tokens that hold its essential structural information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The doctor reviewed the patient’s chart.
Tokens: ["The", "doctor", "reviewed", "the", "patient", "’s", "chart", "."]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tokenization helps the model in considering what a word stands for in a given structure. Inaccurate tokenization can further propagate downwards leading to illogical wording patterns.&lt;/p&gt;

&lt;p&gt;Tokenization can be further sub-divided into three categories based on the requirement. Namely tokens can be formed on the basis of word extraction, Sub-word Tokenization, or Character Tokenization. Let's briefly understand them as some of these topics are currently used in modern NLP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word Extraction&lt;/strong&gt;: Word Extraction Tokenization works as explained above, essentially extracting tokens to form words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-word Tokenization&lt;/strong&gt;: Sub-word tokenization breaks words into smaller units to better handle rare, ambiguous, or previously unseen terms. Instead of relying on a fixed vocabulary of complete words, sub-word tokenizers decompose words into frequently occurring character sequences learned from training data.&lt;/p&gt;

&lt;p&gt;This approach allows lightweight or vocabulary-limited models to generalize effectively without treating unfamiliar words as entirely unknown.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;E.g.: 'Antibiotics' -&amp;gt; [Anti, Biotics] or [Anti, Bio, Tics]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These sub-word units are derived from statistical patterns rather than semantic meaning, and the exact split depends on the tokenization algorithm used (such as BPE or WordPiece).&lt;/p&gt;

&lt;p&gt;While sub-word tokenization is primarily an NLP technique, it can indirectly support text-to-speech (TTS) systems in integrated pipelines by enabling consistent handling of rare or complex words before phoneme or pronunciation modeling occurs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Character level tokenization&lt;/strong&gt;: Character-level tokenization is a more fine-grained approach in which text is decomposed into individual characters rather than words or sub-words.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;E.g.: 'Antibiotics' -&amp;gt; ['A', 'n', 't', 'i', 'b', 'i', 'o', 't', 'i', 'c', 's'].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method is useful for handling noisy input, spelling variations, and highly specialized terminology, though it often increases sequence length and computational cost. Character-level tokenization is typically used in niche applications or combined with higher-level tokenization strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  POS tagging
&lt;/h2&gt;

&lt;p&gt;POS tagging stands for Part-of-Speech tagging; during this process, each token is labeled with its own linguistic part of speech.&lt;br&gt;
Just like in high school grammar, POS tagging simply states:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Doctor -&amp;gt; Noun
screeched -&amp;gt; Verb
! -&amp;gt; Punctuation (Punct internally)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Parsing
&lt;/h2&gt;

&lt;p&gt;Although parsing is no longer a central architectural component in modern LLM-based chatbots, it remains a foundational concept that historically informed how linguistic structure is modeled in NLP systems. Parsing focuses on identifying how words within a sentence relate to one another through grammatical roles and dependencies.&lt;/p&gt;

&lt;p&gt;At its core, parsing assigns syntactic roles to words, allowing a sentence to be represented in a structured form. For e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Doctor -&amp;gt; Subject
treated -&amp;gt; Verb (POS determined)
the -&amp;gt; determiner
dog -&amp;gt; Object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The aim of parsing is to perform a form of syntactical analysis: it relates each word within the sentence to the others and assigns it a relation type, to be considered further during Entity Recognition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Named Entity Recognition
&lt;/h2&gt;

&lt;p&gt;The Entity Recognition Module (ERM), also known as Named Entity Recognition (NER), is the process of identifying named entities from the list provided after parsing. It is the most task-specific module in the application: it can be swapped out based on the task at hand. Depending on the NER module used, we obtain results such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dr. XYZ -&amp;gt; Doctor
amoxicillin -&amp;gt; Medicine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is necessary for contextualization of tokens, entity detection, and entity classification. Based on the requirements the module can be rule based, ML-based or Neural NERs. Each one is used effectively for simple, learned and complex applications respectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relation extraction
&lt;/h2&gt;

&lt;p&gt;Relation Extraction (RE) is an NLP process that identifies and classifies meaningful relationships between entities detected in text. While Named Entity Recognition (NER) answers “what entities are present?”, relation extraction answers “how are these entities connected?”&lt;br&gt;
Relation extraction operates on text where entities have already been identified and determines the semantic relationship between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dr. XYZ prescribed amoxicillin to patient.
(amoxicillin given_to patient)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way only the most important relationships are considered and mapped by the extractor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conversational Chatbots using RASA
&lt;/h2&gt;

&lt;p&gt;So, what does this information imply for chatbots? Effectively, nothing directly. It does, however, give you a clear understanding of how computers handle sentences when trying to understand them. When working with RASA, we will not be implementing any of this ourselves, but we still work with a few of these concepts, such as intents, entities, and relations.&lt;/p&gt;

&lt;p&gt;In the blogs following this one, we'll dive deeper into how RASA develops chatbots, beginning with similar basics working all the way up to hopefully a working chatbot which you can converse with.&lt;/p&gt;

&lt;p&gt;So, until next time!&lt;br&gt;
The next blog: &lt;a href="https://dev.to/aniket_kuyate_15acc4e6587/understanding-rasa-1h1c"&gt;Understanding RASA&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rasa</category>
      <category>llm</category>
      <category>ai</category>
      <category>chatbot</category>
    </item>
  </channel>
</rss>
