
Build a RAG agent with LangChain and Ollama

Fortune Ndlovu — Tue, 05 May 2026 04:12:50 +0000

I started where a lot of us do: a LangChain RAG walkthrough. You chunk some text, embed it, retrieve top‑k chunks, and wire an LLM to answer questions. It clicks quickly, which is exactly why it’s easy to walk away thinking you’ve “done RAG.”

What bothered me was that the demo corpus is usually tiny and artificial, which is by design. I write on DEV.to about things like NLP routing and CNN image classification. If I can’t point a system at my own posts and get answers I can verify, I’m not building something close to “real world.”

So I pivoted: pull my public articles through the DEV.to API, index them in a persistent vector store, and chat over that index.

Then I hit a second wall that matters in practice: embedding APIs have quotas. When Google’s embedding tier said “not today,” I didn’t want the project to die. I switched embeddings to Ollama (nomic-embed-text) so indexing stays local and repeatable.

At that point I had a working assistant, but I still had an open question that tutorials rarely answer honestly:

Which wiring pattern actually wins when everything else is held constant?

Not “which blog post says CRAG is fancy,” but: same corpus, same embeddings, same chat model, and three different retrieval/orchestration strategies compared with traces and numbers.

So I turned the repo into a small RAG strategy lab: a tool-calling agent, an always-retrieve chain, and a corrective RAG graph. The punchline (benchmarks) lives at the end; swap in your own runs when you reproduce this.

Concepts I’m leaning on (short, practical definitions)

Retrieval-Augmented Generation (RAG) means: before the model answers, you fetch relevant passages from your sources and condition the answer on them. The failure modes are familiar (bad chunks, wrong retrieval, a model that ignores context), which is why “one demo implementation” isn’t enough.
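To make "fetch, then condition" concrete, here is a minimal standard-library sketch. It is not what the repo does: the bag-of-words cosine score stands in for real embeddings, and tokenize, retrieve, build_prompt, and the three toy passages are all illustrative names I made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for an embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Condition the answer on retrieved text by placing it in the prompt."""
    context = "\n\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy corpus (hypothetical passages, loosely modeled on my posts).
PASSAGES = [
    "Create the conda environment with: conda create -n ticket-nlp python=3.10",
    "The CNN class stacks two convolutional layers and a max pooling layer.",
    "Minikube can run in an air-gapped environment with saved container images.",
]

QUESTION = "Which conda environment should I create?"
top = retrieve(QUESTION, PASSAGES, k=1)
prompt = build_prompt(QUESTION, top)
```

Every failure mode above maps to one of these functions: bad chunks are bad PASSAGES, wrong retrieval is a bad ranking, and a model that ignores context is a prompt problem.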

Three strategies I compare

Tool-calling agent: the model decides if and when to call retrieval tools. Flexible: it can retrieve multiple times, list articles, or pull full markdown for one URL when snippets aren’t enough.
Always-retrieve chain: every user turn runs similarity search first; retrieved passages are injected into the system prompt via middleware. Predictable: one retrieval per turn (unless you change k), no tool loop.
Corrective RAG (CRAG): retrieve, then grade whether the passages are sufficient; if not, rewrite the query and retrieve again (with a hard cap on retries), then generate an answer.

Markdown-aware chunking: technical posts have structure (headings, sections). Splitting on headers first keeps section boundaries and metadata like h1/h2/h3 useful for citations.

Content-hash dedup: store a hash of each article body in chunk metadata. On re-ingest, skip unchanged articles so you’re not re-embedding the internet every time.

LangSmith tracing: every ingest step, chat turn, and benchmark example shows up as a run you can inspect. LangSmith Evaluate runs your target function over a dataset and attaches scores.

Persistent threads (SqliteSaver): conversation state survives restarts. Useful when you’re dogfooding the chat UI like a product, not a one-off script.

Prerequisites

You’ll need:

Python 3.11+ (that’s what this repo was tested with; 3.10+ should work if dependencies install cleanly).
A DEV.to username with public posts (swap DEVTO_USERNAME to yours).
Ollama if you want local embeddings and/or local chat.
- For this project: ollama pull nomic-embed-text and a chat model such as ollama pull qwen2.5:7b.
LangSmith API key if you want traces and benchmarks (LANGSMITH_API_KEY).
Google Gemini API key if you use Gemini for chat, embeddings, or the benchmark judge (GOOGLE_API_KEY). If everything runs on Ollama, you can skip Google, but the default bench setup often keeps a stronger judge on Gemini while chat stays local.

Install and `.env` (from the project root)

python -m venv .venv
source .venv/bin/activate
# Windows PowerShell: .\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
cp .env.example .env

Edit .env with your keys, DEVTO_USERNAME, and model provider choices. The repo’s .env.example lists every variable with comments.

Repo layout (what each file is for)

.
├── rag_agent.py              # CLI entry: python rag_agent.py <command>
├── bench.py                  # Alternate entry for benchmarks
├── requirements.txt
├── .env.example
├── evals/
│   ├── qa.yaml               # Golden questions + expected substrings + source hints
│   └── results-*.md        # Written by bench (gitignored pattern in .gitignore)
└── rag_lab/
    ├── __init__.py
    ├── __main__.py           # python -m rag_lab
    ├── config.py             # Settings, tracing, model + store factories
    ├── ingest.py             # DEV.to fetch, chunk, dedup, Chroma upsert
    ├── tools.py              # Tools for the agent strategy only
    ├── cli.py                # ingest | chat | bench
    ├── bench.py              # LangSmith dataset + evaluate runner
    └── strategies/
        ├── __init__.py       # STRATEGIES registry
        ├── prompts.py
        ├── agent.py          # Tool-calling agent
        ├── chain.py          # Always-retrieve via @dynamic_prompt
        └── crag.py           # Corrective RAG StateGraph

Build walkthrough: what goes in each file

You are not missing steps: the full files live in the repo at https://github.com/Fortune-Ndlovu/Build-a-RAG-agent-with-LangChain. Under each heading I explain why the piece matters, show the smallest slice that carries the idea, then point you at the full file for copy-paste.

`requirements.txt`

Pinned dependency list: LangChain, LangGraph, Chroma, Ollama, LangSmith, etc.

Full source in repo: requirements.txt

`rag_agent.py`

Thin entry point so readers run python rag_agent.py from the repo root without installing a package in editable mode.

#!/usr/bin/env python3
"""Entry point: run from repo root, e.g. `python rag_agent.py chat`."""

from rag_lab.cli import main

if __name__ == "__main__":
    main()

Full source in repo: rag_agent.py

`bench.py`

Same pattern for python bench.py → rag_lab.bench.main.

Full source in repo: bench.py

`rag_lab/config.py`

Everything reads one Settings object so switching providers is an env change, not a rewrite.

Conditional Google key: you only need GOOGLE_API_KEY when chat, judge, or embeddings use Google; otherwise, local-only setups stay simple.

@model_validator(mode="after")
def _validate_google_and_export_key(self) -> Settings:
    cp = self.chat_provider.lower().strip()
    jp = self.judge_provider.lower().strip()
    ep = self.embedding_provider.lower().strip()

    needs_google = cp == "google" or jp == "google" or ep == "google"
    if needs_google and self.google_api_key is None:
        raise ValueError(
            "GOOGLE_API_KEY is required when CHAT_PROVIDER, JUDGE_PROVIDER, or "
            "EMBEDDING_PROVIDER is google (or chat_model is a google_genai:* id)."
        )
    if self.google_api_key is not None:
        os.environ["GOOGLE_API_KEY"] = self.google_api_key.get_secret_value()
    return self
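If you want to poke at the "key only when needed" rule without installing pydantic, the same idea fits in a plain dataclass. This is a sketch under my own assumptions: ProviderSettings and check are hypothetical names, the defaults are invented, and the real Settings class in the repo has more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderSettings:
    """Hypothetical, trimmed-down stand-in for the repo's Settings object."""

    chat_provider: str = "ollama"
    judge_provider: str = "ollama"
    embedding_provider: str = "ollama"
    google_api_key: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalize the same way the validator does: lowercase and strip.
        providers = {
            p.lower().strip()
            for p in (self.chat_provider, self.judge_provider, self.embedding_provider)
        }
        if "google" in providers and not self.google_api_key:
            raise ValueError("GOOGLE_API_KEY is required when any provider is google.")


def check(**kwargs: object) -> str:
    """Return 'ok' or the validation error, so the rule is easy to try out."""
    try:
        ProviderSettings(**kwargs)
        return "ok"
    except ValueError as exc:
        return f"error: {exc}"
```

check() with no arguments passes because everything defaults to Ollama; check(judge_provider="Google ") fails until you also pass google_api_key.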

Ollama chat models need an explicit provider, because bare names like qwen2.5:7b don’t always infer correctly:

def _init_chat_for_provider(*, model_id: str, provider: str, ollama_base_url: str) -> Any:
    p = provider.lower().strip()
    if p == "ollama":
        return init_chat_model(
            model_id,
            model_provider="ollama",
            base_url=ollama_base_url,
        )
    # ...

LangSmith env so traces and evals land in your project:

def setup_tracing(settings: Settings) -> None:
    if not settings.langsmith_api_key:
        _log.info("LangSmith disabled (no LANGSMITH_API_KEY).")
        return
    key = settings.langsmith_api_key.get_secret_value()
    os.environ["LANGSMITH_TRACING"] = str(settings.langsmith_tracing).lower()
    os.environ["LANGSMITH_API_KEY"] = key
    os.environ["LANGSMITH_PROJECT"] = settings.langsmith_project
    # ...

Full source in repo: rag_lab/config.py

`rag_lab/ingest.py`

DEV.to gives you a list endpoint and a per-article endpoint. I fetch summaries, then full body_markdown per id, chunk, and upsert.
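Here is roughly what that two-step fetch looks like with only the standard library. Treat it as a sketch, not the repo's ingest code: fetch_articles and http_get_json are names I made up, the per_page value is an assumption, and there is no pagination or retry handling. The get_json parameter exists so you can swap in a fake and try the flow offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

API_ROOT = "https://dev.to/api"


def http_get_json(url: str) -> Any:
    """GET a URL and decode the JSON body."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/json", "User-Agent": "rag-lab-sketch"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_articles(
    username: str,
    get_json: Callable[[str], Any] = http_get_json,
    per_page: int = 100,
) -> list[dict[str, Any]]:
    """List a user's articles, then load each one in full by id."""
    query = urllib.parse.urlencode({"username": username, "per_page": per_page})
    summaries = get_json(f"{API_ROOT}/articles?{query}")
    # The list endpoint returns summaries; the per-article endpoint
    # is where body_markdown comes from.
    return [get_json(f"{API_ROOT}/articles/{item['id']}") for item in summaries]
```

Calling fetch_articles("your-username") hits the network; passing a stub as get_json keeps experiments (and tests) local.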

Splitting on Markdown headers first, then by characters, preserves structure in metadata for citations:

md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")],
    strip_headers=False,
)
char_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1200,
    chunk_overlap=150,
    add_start_index=True,
)
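If you are curious what "headers first, then characters" does to a post, this standard-library sketch mimics the idea. It is not the LangChain splitters: split_by_headers and split_chars are hypothetical helpers, the character split is a plain sliding window rather than a recursive one, and the header regex does not skip fenced code blocks, so a "# comment" line inside code would be misread as a heading.

```python
import re

HEADER_RE = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headers(markdown: str) -> list[dict]:
    """Split markdown into sections, tracking the active h1/h2/h3 titles."""
    sections: list[dict] = []
    active: dict[str, str] = {}
    buffer: list[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            sections.append({"metadata": dict(active), "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header invalidates this level and everything deeper.
            for deeper in range(level, 4):
                active.pop(f"h{deeper}", None)
            active[f"h{level}"] = match.group(2).strip()
        buffer.append(line)  # keep the header line in the text
    flush()
    return sections


def split_chars(text: str, chunk_size: int = 1200, overlap: int = 150) -> list[str]:
    """Fixed-size sliding window with overlap between neighbouring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks


DOC = "# Post\nIntro.\n## Prerequisites\nInstall conda.\n## Conclusion\nDone."
sections = split_by_headers(DOC)
```

Each section carries the headings it sits under, which is what later lets a citation say which part of a post a chunk came from.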

Dedup: same body → same hash → skip re-embed for that article id.

def compute_changes(
    articles: list[dict[str, Any]], existing: dict[str, str]
) -> tuple[list[dict[str, Any]], int]:
    to_process: list[dict[str, Any]] = []
    skip = 0
    for art in articles:
        body = art.get("body_markdown") or ""
        h = _sha256(body)
        aid = str(art["id"])
        if existing.get(aid) == h:
            skip += 1
            continue
        to_process.append(art)
    return to_process, skip
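To see the skip logic on toy data, here is a compact standard-library version. The names sha256_hex and split_changed are mine, and I am assuming the hash is a SHA-256 hex digest of the UTF-8 body; the repo's _sha256 helper may differ in detail.

```python
import hashlib
from typing import Any


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoded text (assumed hashing scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_changed(
    articles: list[dict[str, Any]], existing: dict[str, str]
) -> tuple[list[dict[str, Any]], int]:
    """Return (articles whose body hash changed, count of unchanged ones)."""
    changed = [
        art
        for art in articles
        if existing.get(str(art["id"])) != sha256_hex(art.get("body_markdown") or "")
    ]
    return changed, len(articles) - len(changed)


# Hashes as they would have been stored by a previous ingest run.
existing = {"1": sha256_hex("first post"), "2": sha256_hex("second post")}
articles = [
    {"id": 1, "body_markdown": "first post"},           # unchanged
    {"id": 2, "body_markdown": "second post, edited"},  # body changed
    {"id": 3, "body_markdown": "brand new"},            # never indexed
]
changed, skipped = split_changed(articles, existing)
```

Only articles 2 and 3 would be re-chunked and re-embedded here; article 1 is skipped.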

@traceable on ingest steps gives you a span hierarchy in LangSmith for “what did ingest do?”

Full source in repo: rag_lab/ingest.py

`rag_lab/tools.py`

retrieve_context uses response_format="content_and_artifact" so the model sees serialized text, but the CLI can recover structured Document objects for a sources panel.

@tool(response_format="content_and_artifact")
def retrieve_context(query: str, k: int = 4) -> tuple[str, list]:
    """Retrieve the most similar passages from the indexed DEV.to articles."""
    docs = vector_store.similarity_search(query, k=int(k))
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in docs
    )
    return serialized, docs

Full source in repo: rag_lab/tools.py

`rag_lab/strategies/prompts.py`

Three strategies means three prompt contracts, especially for CRAG, where grading and rewriting are separate tasks with tight output formats (YES/NO, one-word rewrite).

Example excerpt: agent tool policy + citation behavior:

AGENT_SYSTEM = """You answer questions using tools backed by the author's indexed DEV.to articles.

Tools:
- retrieve_context: semantic search over passages (use for factual detail and quotes).
- list_articles: browse titles/URLs/tags without retrieval (use for "what posts exist?" questions).
- fetch_article: load full markdown for one URL when snippets are insufficient.
...
"""

Full source in repo: rag_lab/strategies/prompts.py

`rag_lab/strategies/agent.py`

This is the “let the model decide” baseline: tools + system prompt, wired together with LangChain’s create_agent.

def build(model, vector_store, settings: Settings, *, checkpointer: Any = None, name: str | None = "strategy_agent"):
    tools = make_rag_tools(settings, vector_store)
    return create_agent(
        model,
        tools,
        system_prompt=AGENT_SYSTEM,
        checkpointer=checkpointer,
        name=name,
    )

Full source in repo: rag_lab/strategies/agent.py

`rag_lab/strategies/chain.py`

No tools: middleware runs retrieval on every turn and stuffs <passage> blocks into the prompt. Great when you want “RAG always on” and minimal branching.

@dynamic_prompt
def prompt_with_context(request: ModelRequest) -> str:
    last = request.messages[-1]
    lc = getattr(last, "content", "")
    last_query = lc if isinstance(lc, str) else str(lc)
    docs = vector_store.similarity_search(last_query, k=4)
    docs_content = "\n\n".join(
        f"<passage source={doc.metadata!r}>\n{doc.page_content}\n</passage>"
        for doc in docs
    )
    return (
        CHAIN_SYSTEM
        + "\n<context>\n"
        + docs_content
        + "\n</context>"
    )

Full source in repo: rag_lab/strategies/chain.py

`rag_lab/strategies/crag.py`

Explicit graph: grade sufficiency, optionally rewrite the query, and cap retries.

def route_after_grade(state: CRAGState) -> str:
    if state.get("sufficient"):
        return "generate"
    if state.get("attempts", 0) >= 2:
        return "generate"
    return "rewrite"

graph = StateGraph(CRAGState)
graph.add_node("prepare", prepare)
graph.add_node("retrieve", retrieve)
graph.add_node("grade", grade)
graph.add_node("rewrite", rewrite)
graph.add_node("generate", generate)

graph.add_edge(START, "prepare")
graph.add_edge("prepare", "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges("grade", route_after_grade, {"generate": "generate", "rewrite": "rewrite"})
graph.add_edge("rewrite", "retrieve")
graph.add_edge("generate", END)

return graph.compile(checkpointer=checkpointer, name=name or "strategy_crag")
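Stripped of LangGraph, the control flow is a small loop. The sketch below is my simplification, not the repo's graph: run_crag and the four stub callables are hypothetical, and I am assuming attempts counts retrieval rounds, which is consistent with the cap of 2 in route_after_grade but is not spelled out in the excerpt.

```python
from typing import Callable

MAX_ATTEMPTS = 2  # mirrors the cap in route_after_grade


def run_crag(
    question: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, list[str]], bool],
    rewrite: Callable[[str, str], str],
    generate: Callable[[str, list[str]], str],
    max_attempts: int = MAX_ATTEMPTS,
) -> tuple[str, list[str]]:
    """Retrieve, grade, optionally rewrite and retry, then generate."""
    query, attempts, trace = question, 0, []
    while True:
        passages = retrieve(query)
        attempts += 1
        trace.append(f"retrieve:{query}")
        sufficient = grade(question, passages)
        trace.append(f"grade:{'yes' if sufficient else 'no'}")
        # Generate once the passages pass, or once the retry cap is hit.
        if sufficient or attempts >= max_attempts:
            break
        query = rewrite(question, query)
        trace.append("rewrite")
    trace.append("generate")
    return generate(question, passages), trace


# Stubs standing in for the vector store and the LLM calls.
INDEX = {"conda environment name ticket routing": ["conda create -n ticket-nlp python=3.10"]}


def stub_retrieve(query: str) -> list[str]:
    return INDEX.get(query, [])


def stub_grade(question: str, passages: list[str]) -> bool:
    return bool(passages)


def stub_rewrite(question: str, query: str) -> str:
    return "conda environment name ticket routing"


def stub_generate(question: str, passages: list[str]) -> str:
    return passages[0] if passages else "I do not know."


answer, trace = run_crag("conda env?", stub_retrieve, stub_grade, stub_rewrite, stub_generate)
```

The trace for this run is retrieve, grade (no), rewrite, retrieve, grade (yes), generate: two retrieval rounds and one rewrite before the answer.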

Full source in repo: rag_lab/strategies/crag.py

`rag_lab/strategies/__init__.py`

Strategy registry for the CLI and bench:

STRATEGIES: dict = {
    "agent": agent.build,
    "chain": chain.build,
    "crag": crag.build,
}
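A registry like this pairs well with a small lookup helper that normalizes user input and fails loudly on typos. The sketch below uses placeholder builders and a hypothetical get_builder function; I am not claiming the CLI resolves names this way.

```python
from typing import Callable


# Placeholder builders; in the repo these are agent.build, chain.build, crag.build.
def build_agent() -> str:
    return "agent-runnable"


def build_chain() -> str:
    return "chain-runnable"


def build_crag() -> str:
    return "crag-runnable"


STRATEGIES: dict[str, Callable[[], str]] = {
    "agent": build_agent,
    "chain": build_chain,
    "crag": build_crag,
}


def get_builder(name: str) -> Callable[[], str]:
    """Resolve a strategy name, tolerating case and stray whitespace."""
    key = name.strip().lower()
    try:
        return STRATEGIES[key]
    except KeyError:
        options = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown strategy {name!r}; choose from: {options}") from None
```

With this, get_builder("crag ") still resolves, while get_builder("cragg") raises a ValueError that lists the valid names.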

Full source in repo: rag_lab/strategies/__init__.py

`rag_lab/cli.py`

SqliteSaver gives you durable threads; slash commands let you switch strategies or resume a thread without restarting Python.
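To show why a SQLite file is enough for "resume a thread after restart", here is a toy store built on the standard sqlite3 module. It only illustrates the idea: the turns table, save_turn, and load_thread are my inventions, and SqliteSaver's real schema and API are different.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

SCHEMA = """
CREATE TABLE IF NOT EXISTS turns (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    thread_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL
)
"""


def save_turn(db_path: str, thread_id: str, role: str, content: str) -> None:
    """Append one message to a thread; a fresh connection each time mimics a restart."""
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT INTO turns (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def load_thread(db_path: str, thread_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for one thread in insertion order."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(SCHEMA)
        return conn.execute(
            "SELECT role, content FROM turns WHERE thread_id = ? ORDER BY seq",
            (thread_id,),
        ).fetchall()


DB_PATH = os.path.join(tempfile.mkdtemp(), "threads.sqlite")
save_turn(DB_PATH, "thread-a", "user", "What conda environment name?")
save_turn(DB_PATH, "thread-a", "assistant", "ticket-nlp")
save_turn(DB_PATH, "thread-b", "user", "Unrelated thread")
resumed = load_thread(DB_PATH, "thread-a")
```

Because every call opens and closes its own connection, resumed comes purely from what is on disk, which is the property /resume relies on.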

Sources panel from tool artifacts:

def _format_tool_sources(messages: list) -> str:
    lines: list[str] = []
    for m in messages:
        if not isinstance(m, ToolMessage):
            continue
        art = getattr(m, "artifact", None)
        # ... Document metadata → readable lines ...
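The elided part is just string formatting over document metadata. As a rough sketch with stand-in classes (FakeDocument and FakeToolMessage are hypothetical, and the title, h1, h2, and url metadata keys are my assumption about what the chunks carry), it could look like this:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class FakeToolMessage:
    """Stand-in for a ToolMessage that may carry an artifact."""

    content: str
    artifact: Optional[list] = None


def format_sources(messages: list) -> str:
    """Number every document found in tool artifacts and print title, section, URL."""
    lines: list[str] = []
    n = 0
    for msg in messages:
        for doc in getattr(msg, "artifact", None) or []:
            n += 1
            meta = doc.metadata
            title = meta.get("title", "(untitled)")
            section = meta.get("h2") or meta.get("h1") or ""
            label = f"{title} - {section}" if section else title
            lines.append(f"  [{n}] {label}")
            lines.append(f"      {meta.get('url', '')}")
    return "\n".join(lines) if lines else "  (no sources)"


docs = [
    FakeDocument("...", {"title": "Post A", "h2": "2. Prerequisites", "url": "https://example.com/a"}),
    FakeDocument("...", {"title": "Post B", "url": "https://example.com/b"}),
]
panel = format_sources([FakeToolMessage("serialized text", artifact=docs)])
```

Messages without an artifact contribute nothing, so a turn with no retrieval falls through to the "(no sources)" line.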

Important: run bench from the shell, not inside the chat REPL; otherwise your shell command becomes just another user message (ask me how I know).

Full source in repo: rag_lab/cli.py

`rag_lab/bench.py`

ensure_dataset uploads evals/qa.yaml once; client.evaluate runs each strategy against the same examples.

Retrieval score: did we surface a URL containing the expected substring?

def retrieval_recall(run: Any, example: Any) -> EvaluationResult:
    ref = (example.outputs or {}) if example else {}
    expected_url = ref.get("expected_source_url_contains") or ""
    if not expected_url:
        return EvaluationResult(key="retrieval_recall", score=1.0)
    outs = run.outputs or {}
    urls = outs.get("retrieved_urls") or []
    hit = any(expected_url in u for u in urls)
    return EvaluationResult(key="retrieval_recall", score=1.0 if hit else 0.0)
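One thing worth knowing before you average this metric: examples with no expected URL (the negative cases) score 1.0 automatically. The sketch below pulls the scoring rule out into a plain function so the effect is visible; url_hit and the three toy rows are illustrative, not part of the repo.

```python
def url_hit(expected_fragment: str, retrieved_urls: list[str]) -> float:
    """1.0 if any retrieved URL contains the fragment (or none is expected), else 0.0."""
    if not expected_fragment:
        return 1.0
    return 1.0 if any(expected_fragment in url for url in retrieved_urls) else 0.0


# (expected_source_url_contains, retrieved URLs) for three toy runs.
rows = [
    ("intelligent-support-ticket-routing",
     ["https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1"]),
    ("image-classification",
     ["https://dev.to/fortune-ndlovu/install-red-hat-developer-hub-rhdh-in-a-fully-air-gapped-minikube-environment-4ngk"]),
    ("", []),  # negative example: nothing to find, so it scores 1.0
]

scores = [url_hit(expected, urls) for expected, urls in rows]
mean_all = sum(scores) / len(scores)

# Mean over only the rows that actually expect a source.
with_source = [url_hit(e, u) for e, u in rows if e]
mean_with_source = sum(with_source) / len(with_source)
```

Here the overall mean is about 0.67, but on the rows that expect a source it is 0.5, so it can be worth reporting both.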

Correctness: LLM-as-judge against an expected substring (case-flexible intent):

def llm_judge_correctness(run: Any, example: Any) -> EvaluationResult:
    # ...
    verdict = judge.invoke([HumanMessage(content=prompt)])
    text = (verdict.content or "").strip().lower()
    ok = text.startswith("y")
    return EvaluationResult(key="correctness", score=1.0 if ok else 0.0)
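The startswith("y") check is deliberately forgiving, which also means any reply beginning with "y" counts as a pass. If you want to see how often the judge goes off-format, a stricter parser is one option. This is a sketch of an alternative, not what the repo does; parse_verdict is a hypothetical helper.

```python
import re
from typing import Optional

VERDICT_RE = re.compile(r"\W*(yes|no)\b", flags=re.IGNORECASE)


def parse_verdict(text: str) -> Optional[bool]:
    """Return True for yes, False for no, None when the reply is off-format."""
    match = VERDICT_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1).lower() == "yes"


examples = {
    "YES": parse_verdict("YES"),
    "No, the answer omits the name.": parse_verdict("No, the answer omits the name."),
    "**Yes**": parse_verdict("**Yes**"),
    "Yesterday's answer was better": parse_verdict("Yesterday's answer was better"),
}
```

Treating None as its own bucket (instead of folding it into 0.0) lets you tell a wrong answer apart from a judge that ignored the output format.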

Full source in repo: rag_lab/bench.py

`evals/qa.yaml`

Twelve labeled examples (factual, multi-hop, negative). Two excerpt rows:

- id: nlp-conda
  question: What conda environment name does the intelligent support ticket routing post recommend creating?
  expected_answer_substring: ticket
  expected_source_url_contains: intelligent-support-ticket-routing
  category: factual

- id: negative-out-of-scope
  question: What Kubernetes operator pattern does the author's DEV feed recommend for production clusters?
  expected_answer_substring: do not know
  expected_source_url_contains: ""
  category: negative

Full source in repo: evals/qa.yaml

`rag_lab/__main__.py` and `rag_lab/__init__.py`

Package entry and version string: small files, linked for completeness.

Exact log output (full transcript from one session)

Below is the complete terminal output from my machine: first pulling models and ingesting the corpus, then running the same three questions under each strategy (agent, then chain, then crag) in one REPL session. I did not trim the Assistant replies or the httpx lines so you can see exactly what showed up.

Setup: Ollama pulls + ingest

(.venv) PS C:\Users\ndlov\Documents\Build-a-RAG agent-with-LangChain> ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90: 100% ▕███████████████████████████████████████████████████████████████████████████▏ 274 MB
pulling c71d239df917: 100% ▕███████████████████████████████████████████████████████████████████████████▏  11 KB
pulling ce4a164fc046: 100% ▕███████████████████████████████████████████████████████████████████████████▏   17 B
pulling 31df23ea7daa: 100% ▕███████████████████████████████████████████████████████████████████████████▏  420 B
verifying sha256 digest
writing manifest
success
(.venv) PS C:\Users\ndlov\Documents\Build-a-RAG agent-with-LangChain> ollama pull qwen2.5:7b
pulling manifest
pulling 2bada8a74506: 100% ▕███████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 66b9ea09bd5b: 100% ▕███████████████████████████████████████████████████████████████████████████▏   68 B
pulling eb4402837c78: 100% ▕███████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 832dd9e00a68: 100% ▕███████████████████████████████████████████████████████████████████████████▏  11 KB
pulling 2f15b3218f05: 100% ▕███████████████████████████████████████████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
(.venv) PS C:\Users\ndlov\Documents\Build-a-RAG agent-with-LangChain> python rag_agent.py ingest
01:55:20 INFO    rag_lab.config | LangSmith tracing enabled → https://smith.langchain.com (project=rag-agent-tutorial)
01:55:29 INFO    rag_lab.cli | Ingest complete: {'articles_found': 10, 'articles_skipped_unchanged': 10, 'articles_updated': 0, 'chunks_written': 0}
{'articles_found': 10, 'articles_skipped_unchanged': 10, 'articles_updated': 0, 'chunks_written': 0}

First strategy: `agent` (tool-calling)

(.venv) PS C:\Users\ndlov\Documents\Build-a-RAG agent-with-LangChain> python rag_agent.py chat --strategy agent
01:56:06 INFO    rag_lab.config | LangSmith tracing enabled → https://smith.langchain.com (project=rag-agent-tutorial)
Thread: 6711ee83-c6ec-41d7-bba3-b188475f5575  |  strategy: agent
Commands: /help /sources /reset /rebuild /threads /resume <id> /strategy <name> /quit

You> What conda environment name does the ticket routing article recommend?
01:57:21 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
01:57:25 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
01:58:43 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The article recommends using a conda environment named `ticket-nlp` for setting up the development environment. You can create this environment by running:

conda create -n ticket-nlp python=3.10 -y
conda activate ticket-nlp

You can find more details in the "Prerequisites" section of the article titled [Intelligent Support Ticket Routing with Natural Language Processing (NLP)](https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1).

Sources:
  [1] Intelligent Support Ticket Routing with Natural Language Processing (NLP) — 1. Problem Statement and Goal
      https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1
  [2] Intelligent Support Ticket Routing with Natural Language Processing (NLP) — 9. Conclusion
      https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1
  [3] Intelligent Support Ticket Routing with Natural Language Processing (NLP) — 3. Load and Explore Dataset
      https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1
  [4] Intelligent Support Ticket Routing with Natural Language Processing (NLP) — 2. Prerequisites
      https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-with-natural-language-processing-nlp-57g1

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> What class name is used for the CNN in the image classification post?
01:59:41 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
01:59:44 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:01:19 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The class name used for the CNN in the image classification post is `CNN`. This can be seen in the following code snippet from the article:


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Layer definitions...

You can find this information in the "Define the CNN Model" section of the article titled [Image Classification with Convolutional Neural Networks (CNNs)](https://dev.to/fortune-ndlovu/image-classification-and-convolutional-neural-networks-cnns-4bdl).

Sources:
  [1] Image Classification with Convolutional Neural Networks (CNNs) — Define the CNN Model
      https://dev.to/fortune-ndlovu/image-classification-and-convolutional-neural-networks-cnns-4bdl
  [2] Image Classification with Convolutional Neural Networks (CNNs) — Conclusion
      https://dev.to/fortune-ndlovu/image-classification-and-convolutional-neural-networks-cnns-4bdl
  [3] Image Classification with Convolutional Neural Networks (CNNs) — Reshape for CNN Input
      https://dev.to/fortune-ndlovu/image-classification-and-convolutional-neural-networks-cnns-4bdl
  [4] Install Red Hat Developer hub (RHDH) in a fully air-gapped Minikube environment — 1, Download and Save Container Images (Locally)  
      https://dev.to/fortune-ndlovu/install-red-hat-developer-hub-rhdh-in-a-fully-air-gapped-minikube-environment-4ngk

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> Summarize how the MNIST example sets up the model.
02:01:56 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The MNIST example in the article [Image Classification with Convolutional Neural Networks (CNNs)] sets up the model using a class named `CNN`. Here's a summary of how the model is defined and structured:

1. **Class Definition**: The `CNN` class inherits from `nn.Module`.

2. **Convolutional Layers**:
   - `conv1`: A convolutional layer with 32 filters, each of size 3x3.
   - `conv2`: Another convolutional layer with 64 filters, each of size 3x3.

3. **Pooling Layer**:
   - `pool`: A max pooling layer that reduces the spatial dimensions by a factor of 2.

4. **Flattening**:
   - The output from the convolutional layers is flattened into a one-dimensional vector to be fed into fully connected layers.

5. **Fully Connected Layers**:
   - `fc1`: A fully connected layer with 64 neurons.
   - `fc2`: Another fully connected layer with 10 neurons, corresponding to the 10 possible digit classes (0 through 9).

6. **Dropout and Non-linearity**:
   - A dropout layer is used to prevent overfitting.
   - ReLU activations are applied after convolutional layers for non-linear transformations.

Here's a simplified version of the model definition:

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(in_features=64 * 7 * 7, out_features=64)
        self.dropout = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(in_features=64, out_features=10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

This model processes input images by first applying convolutional layers to extract features, then using pooling to reduce dimensionality, and finally passing the flattened feature maps through fully connected layers to produce class scores for the 10 digit classes.        

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

Second strategy: `chain` (always-retrieve)

You> /strategy chain
Strategy -> chain

You> What conda environment name does the ticket routing article recommend?
02:04:04 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:07:40 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The ticket routing article recommends using a conda environment named `ticket-nlp` for setting up the development environment. You can create this environment by running:

conda create -n ticket-nlp python=3.10 -y
conda activate ticket-nlp

You can find more details in the "Prerequisites" section of the article [Intelligent Support Ticket Routing with Natural Language Processing (NLP)](https://dev.to/fortune-ndlovu/intelligent-support-ticket-routing-using-natural-language-processing-4b7e).

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> What class name is used for the CNN in the image classification post?
02:08:13 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:11:54 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The class name used for the CNN in the image classification post is `CNN`. This can be seen in the following code snippet:

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Layer definitions...

This definition is found in the "Define the CNN Model" section of the article [Image Classification with Convolutional Neural Networks (CNNs)](https://dev.to/fortune-ndlovu/image-classification-and-convolutional-neural-networks-cnns-4bdl).

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> Summarize how the MNIST example sets up the model.
02:12:31 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:14:59 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The MNIST example in the article [Image Classification with Convolutional Neural Networks (CNNs)] sets up the model using a `CNN` class. Here’s a summary of the model setup:

1. **Class Definition**: The `CNN` class inherits from `nn.Module`.

2. **Convolutional Layers**:
   - `conv1`: A convolutional layer with 32 filters, each of size 3x3.
   - `conv2`: Another convolutional layer with 64 filters, each of size 3x3.

3. **Pooling Layer**:
   - `pool`: A max pooling layer that reduces the spatial dimensions by a factor of 2.

4. **Fully Connected Layers**:
   - `fc1`: A fully connected layer with 64 neurons.
   - `fc2`: Another fully connected layer with 10 neurons, corresponding to the 10 possible digit classes (0 through 9).

5. **Dropout and Non-linearity**:
   - A dropout layer is used to prevent overfitting.
   - ReLU activations are applied after convolutional layers for non-linear transformations.

Here’s a simplified version of the model definition:

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(in_features=64 * 7 * 7, out_features=64)
        self.dropout = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(in_features=64, out_features=10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

This model processes input images by first applying convolutional layers to extract features, then using pooling to reduce dimensionality, and finally passing the flattened feature maps through fully connected layers to produce class scores for the 10 digit classes.        

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

Third strategy: `crag` (corrective RAG)

You> /strategy crag 
Strategy -> crag

You> What conda environment name does the ticket routing article recommend?
02:17:40 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:18:50 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:18:54 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:18:58 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:20:05 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:21:12 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The ticket routing article recommends the conda environment name `ticket-nlp`. This can be seen in the passage where it states:

conda create -n ticket-nlp python=3.10 -y
conda activate ticket-nlp

This command creates and activates a new conda environment named `ticket-nlp`.

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> What class name is used for the CNN in the image classification post?
02:21:31 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:22:44 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:23:56 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The class name used for the CNN in the image classification post is `CNN`. This can be seen in the code snippet provided within the passage where the class definition starts with `class CNN(nn.Module):`.

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

You> Summarize how the MNIST example sets up the model.                    
02:24:14 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:25:43 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:25:46 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:25:51 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:27:13 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:28:35 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> The MNIST example sets up a CNN model with the following architecture:

1. It uses convolutional layers (`conv1` and `conv2`) to detect local features such as edges and curves in input images.
2. Max pooling layers are used to reduce spatial dimensions, making the model more efficient and helping to prevent overfitting.
3. The output from the convolutional layers is flattened.
4. Fully connected layers (`fc1` and `fc2`) process the flattened data to make predictions.
5. ReLU activations provide non-linearity.
6. A dropout layer is included to further reduce overfitting.

This model transforms input images into class scores representing digits 0 through 9, effectively classifying handwritten digit images from the MNIST dataset.

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

LangSmith project: https://smith.langchain.com (project=rag-agent-tutorial)

What you can already see in the logs (before any benchmark table)

The three strategies are not the same execution pattern.

agent: You get multiple POST .../api/chat calls per question because the model decides what to do next, plus a POST .../api/embed whenever it calls retrieve_context. The Sources panel lists real ToolMessage artifacts when retrieval ran, and you can see noise too (for example, one retrieved hit points at an unrelated article URL in the CNN question). That is normal top-k behavior; benchmarks quantify whether it hurts answers.
chain: Each question shows a simple rhythm: one embed (similarity search) and one chat completion. There are no tool-call artifacts, so the CLI prints the “no retrieve_context tool calls” line and points you to LangSmith for internal retrieval. Compare the link in the conda answer to the agent run: the URL slug can differ while the prose still sounds confident.
crag: Count the HTTP lines on the first question: extra /chat calls align with grade (and sometimes rewrite) before generate, and you may see two embed rounds when retrieval runs again after a rewrite. CRAG can spend more time per question than chain.

Don’t run benchmarks from inside the chat REPL

After the CRAG transcript above, I accidentally pasted python rag_agent.py bench --strategy all as chat input (the small “what not to do” block right under this paragraph). The model treated the command as a question and tried to answer it from my posts, which is not what you want for a LangSmith eval. Exit with /quit, then run bench from the shell; full benchmark logs go in the next section once your run finishes.

What not to do (mistyped `bench` inside the REPL)

You> python rag_agent.py bench --strategy all
02:29:07 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:30:20 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:30:23 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:30:27 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:32:11 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:32:14 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:32:17 INFO    httpx | HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 200 OK"
02:33:31 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
02:34:45 INFO    httpx | HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"

Assistant> I do not know. The provided passages do not contain any information related to the command `python rag_agent.py bench --strategy all`. There is no context or code snippet that describes what this command does.

Sources:
  (no retrieve_context tool calls in this turn — see LangSmith trace for internal retrieval.)

Benchmarks (results)

The transcript above is the qualitative half (same three questions, three strategies). This section is the quantitative half: what one automated evaluation run looked like when scored against a small golden set (evals/qa.yaml). The screenshot is enough for readers to see the shape of the results; you do not need my hosted tracing UI to follow the story.

What the numbers say (tool-calling `agent`)

Twelve questions, same dataset as the rest of the lab:

Metric	Value
Correctness (LLM-as-judge mean)	0.75
Retrieval (expected source URL present in retrieved URLs)	1.00
Latency (P50)	134.36 s
Total tokens	44,422 (input 41,661; output 2,761)
Cost	$0.00 with local chat/embeddings and a hosted judge configured as in `.env.example`

Finding: retrieval hit 1.00 but correctness only 0.75. In plain terms, the system usually pulled in the right article (URL checks passed), but the answer still failed the judge 3 times out of 12: wrong extraction, wrong paraphrase, or the model ignored what it had. So the bottleneck for this run was generation and grounding, not “find the post.” Latency was heavy (roughly 58 s–388 s per row in the UI; ~134 s median) because each example combined slow local inference with multi-step agent behavior plus a judge call.

`chain` and `crag` on the same harness

I ran chain and CRAG through the same benchmark script and YAML; wall-clock and judge quotas made those runs noisier to interpret than the agent screenshot, so I am not pasting a second dashboard here. Qualitatively they match what we already saw in the REPL logs: chain is the steady “one retrieval, one reply” pattern; CRAG fires extra model steps when it grades or rewrites before answering.

What I’d actually choose

If you want the simplest thing that behaves: chain. Always-retrieve keeps behavior predictable and avoids tool-loop variance. It is a good default when your corpus is stable and questions look like Q&A over passages.
If you want citations and optional browsing: agent. Tools plus artifacts are worth it when users jump between “list my posts,” “pull this URL,” and “quote this section.” The benchmark above shows retrieval can look perfect while answers still slip, so don’t assume tools fix sloppy prompts or a weak chat model.
If retrieval quality is your pain: crag. Grading and query rewrite cost more calls; use it when evals show bad first hits, not when the model already gets the right chunks but misreads them.

My takeaway for this project: with retrieval already at 1.00 on this slice, I would upgrade answer quality first (stronger local model, stricter “cite or abstain” prompts, maybe a reranker) before I chase a fancier graph. Chain remains my practical default for a small personal corpus; agent stays the demo-friendly mode with the sources panel; CRAG is the key I turn on when similarity search alone stops matching the question.

Twelve rows are only enough to compare wiring, not to crown a forever winner. Rerun when you change models or grow qa.yaml.

Thank you for reading!

Follow me on LinkedIn.

Build an AI Chatbot Backend in Rust: Step-by-Step Tutorial

Fortune Ndlovu — Tue, 27 Jan 2026 15:24:56 +0000

What We're Building and Why

In this tutorial, we'll build a complete AI chatbot backend from scratch using Rust. You'll learn both Rust programming concepts and AI API integration as we create a REST API that connects to Google's Gemini AI.

Reference project: https://github.com/Fortune-Ndlovu/rust-ai-chatbot/tree/main

What is Rust?
Rust is a high-performance systems programming language that Graydon Hoare started in 2006 as a personal project while working at Mozilla, which began sponsoring it in 2009. It was designed as a safer, modern alternative to C++. It achieves memory safety and thread safety without a garbage collector by using a strict compiler that eliminates many common bugs at compile time. You can explore its development history through the Rust Foundation.

What You'll Build:

A REST API server that accepts chat messages
Integration with Google Gemini AI (free tier available)
An interactive CLI client to chat with your bot
Proper error handling and input validation
A working project you can play with

AI Concepts You'll Learn:

How to integrate with Large Language Model (LLM) APIs
Understanding API request/response formats
Handling AI model responses and errors
Building a conversational interface

Prerequisites

Before starting this tutorial, you should have:

Rust installed - Install Rust (latest stable version recommended)
Basic terminal/command line knowledge - Comfortable running commands in terminal
Text editor or IDE - VS Code, IntelliJ IDEA, or any Rust-capable editor
Google account - For accessing Gemini API (free tier available)

Helpful but not required:

Basic programming experience (any language)
Familiarity with HTTP/APIs (we'll explain as we go)
Understanding of JSON format (we'll cover it)

Verifying Rust Installation:

rustc --version    # Should show Rust version
cargo --version    # Should show Cargo version

If you don't have Rust installed, follow the official installation guide. The installation includes both rustc (compiler) and cargo (package manager).

Let's get started!

Project Structure

Before we dive into the code, here's the complete project structure you'll be building:

rust-ai-chatbot/
├── Cargo.toml              # Project manifest: defines dependencies and metadata
├── Cargo.lock              # Lock file: pins exact dependency versions (auto-generated)
├── .env                    # Environment variables: stores your Gemini API key (not in git)
├── .env.example            # Template for .env file: shows required environment variables
├── .gitignore              # Git ignore rules: excludes build artifacts and secrets
├── README.md               # Project documentation: quick start guide and usage
├── BLOG.md                 # This tutorial: complete step-by-step guide
└── src/
    ├── main.rs             # Main server code: REST API, routes, and Gemini integration
    └── cli.rs              # CLI client module: interactive chat interface

Now that you understand the structure, let's build it step by step!

Step 1: Project Setup

Creating the Project

Open your terminal and run:

cargo new rust-ai-chatbot
cd rust-ai-chatbot

What cargo new does:

Creates a new Rust project directory
Initializes a Git repository
Creates Cargo.toml (project manifest)
Sets up src/main.rs with a basic template

Understanding Cargo:
Cargo is Rust's package manager and build tool (like npm for Node.js or pip for Python). It streamlines the development process by automating tasks such as downloading libraries, compiling code, and managing project dependencies.

Step 2: Configuring Dependencies (Cargo.toml)

Create or edit Cargo.toml with the following content:

[package]
name = "rust-ai-chatbot"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
reqwest = { version = "0.11", features = ["json"] }
dotenv = "0.15"
tower = "0.4"
tower-http = { version = "0.5", features = ["cors"] }

Let's understand each dependency:

axum = "0.7" - Web framework
- Rust concept: Modern async web framework built on Tokio
- What it does: Handles HTTP routing, request parsing, response formatting
- Why we need it: To build our REST API endpoints
tokio = { version = "1", features = ["full"] } - Async runtime
- Rust concept: Async runtime; async/await is built into the language, but futures need a runtime such as Tokio to execute them
- What it does: Provides runtime for concurrent operations
- Why we need it: To handle multiple requests without blocking threads
- Allows our server to handle multiple chat requests simultaneously
serde = { version = "1.0", features = ["derive"] } - Serialization
- Rust concept: Framework for converting Rust types to/from formats like JSON
- What it does: Auto-generates code to serialize/deserialize structs
- Why we need it: To convert between Rust structs and JSON (API format)
- LLM APIs use JSON, so we need to convert our Rust data to JSON
serde_json = "1.0" - JSON support
- Rust concept: JSON implementation for Serde
- What it does: Handles JSON parsing and generation
- Why we need it: Gemini API communicates in JSON format
reqwest = { version = "0.11", features = ["json"] } - HTTP client
- Rust concept: Async HTTP client library
- What it does: Makes HTTP requests to external APIs
- Why we need it: To call the Gemini API
- This is how we send prompts to the AI and receive responses
dotenv = "0.15" - Environment variables
- Rust concept: Loads .env files
- What it does: Reads environment variables from .env file
- Why we need it: To securely store API keys (never commit to git!)
tower and tower-http - Middleware
- Rust concept: Middleware framework for HTTP
- What it does: Adds cross-cutting concerns (CORS, logging, etc.)
- Why we need it: For production-ready features such as CORS (the server in Step 4 does not attach any middleware layers yet)

When working with AI APIs, you typically need an HTTP client (reqwest) to make API calls, JSON serialization (serde) to format requests and responses, and an async runtime (tokio) to handle concurrent requests efficiently.

Step 3: Environment Configuration

Create a .env file in your project root:

# Google Gemini API Key
# Get your free API key from: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_api_key_here

Note on API Keys

LLM APIs require authentication via API keys
Never commit API keys to version control
Store them in .env files (which should be in .gitignore)
Free tier APIs (like Gemini) have rate limits, but are great for learning

Getting Your Gemini API Key:

Visit https://makersuite.google.com/app/apikey
Sign in with your Google account
Click "Create API Key"
Copy the key and paste it in .env (replace your_api_key_here)

For more information on env vars, see working with environment variables in Rust.

Step 4: The Main Server Code (src/main.rs)

Now let's build the complete server. Here's the full src/main.rs file:

mod cli;

use axum::{
    extract::Json,
    http::StatusCode,
    response::Json as ResponseJson,
    routing::{get, post},
    Router,
};
use serde::{Deserialize, Serialize};
use std::env;
use std::time::Duration;

#[derive(Deserialize)]
struct ChatRequest {
    message: String,
}

#[derive(Serialize)]
struct ChatResponse {
    response: String,
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
}

// Gemini API request structures
#[derive(Serialize)]
struct GeminiRequest {
    contents: Vec<Content>,
}

#[derive(Serialize, Deserialize)]
struct Content {
    parts: Vec<Part>,
}

#[derive(Serialize, Deserialize)]
struct Part {
    text: String,
}

// Gemini API response structures
#[derive(Deserialize)]
struct GeminiResponse {
    candidates: Vec<Candidate>,
}

#[derive(Deserialize)]
struct Candidate {
    content: Content,
}

// Gemini API error response
#[derive(Deserialize)]
struct GeminiErrorResponse {
    error: GeminiError,
}

#[derive(Deserialize)]
struct GeminiError {
    code: u16,
    message: String,
    status: String,
}

async fn call_gemini_api(message: &str) -> Result<String, String> {
    let api_key = env::var("GEMINI_API_KEY")
        .map_err(|_| "GEMINI_API_KEY not found in environment variables".to_string())?;

    // Try different models - Use actual available models from API
    // Models that support generateContent: gemini-2.5-flash, gemini-flash-latest, gemini-pro-latest, etc.
    let models = ["gemini-2.5-flash", "gemini-flash-latest", "gemini-pro-latest", "gemini-2.0-flash"];
    let api_versions = ["v1beta", "v1"];

    // Create HTTP client with timeout
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(30))
        .build()
        .map_err(|e| format!("Failed to create HTTP client: {}", e))?;

    let request_body = GeminiRequest {
        contents: vec![Content {
            parts: vec![Part {
                text: message.to_string(),
            }],
        }],
    };

    // Try different API versions and models
    for api_version in &api_versions {
        for model in &models {
            let url = format!(
                "https://generativelanguage.googleapis.com/{}/models/{}:generateContent?key={}",
                api_version, model, api_key
            );

            let response = match client
                .post(&url)
                .json(&request_body)
                .send()
                .await
            {
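                Ok(resp) => {
                    eprintln!("Got a response from {} (model: {})", api_version, model);
                    resp
                }
                Err(e) => {
                    // The URL embeds the API key, so keep it out of the logs
                    eprintln!(
                        "Failed to send request to {} / {}: {}",
                        api_version, model, e.without_url()
                    );
                    continue;
                }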
            };

            let status = response.status();
            let status_code = status.as_u16();

            if status.is_success() {
                match response.json::<GeminiResponse>().await {
                    Ok(gemini_response) => {
                        if let Some(text) = gemini_response
                            .candidates
                            .first()
                            .and_then(|c| c.content.parts.first())
                            .map(|p| p.text.clone())
                        {
                            eprintln!("Success with {} / {}", api_version, model);
                            return Ok(text);
                        }
                    }
                    Err(e) => {
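                        // Log the model, not the URL, because the URL embeds the API key
                        eprintln!("Failed to parse response from {} / {}: {}", api_version, model, e.without_url());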
                        continue;
                    }
                }
            } else {
                // Read response text first (can only consume once)
                let error_text = response.text().await.unwrap_or_else(|_| "Unknown error".to_string());

                // Try to parse as structured error
                if let Ok(error_response) = serde_json::from_str::<GeminiErrorResponse>(&error_text) {
                    eprintln!(
                        "API error from {} / {}: {} ({}): {}",
                        api_version, model, error_response.error.status, error_response.error.code, error_response.error.message
                    );
                } else {
                    eprintln!("HTTP {} from {} / {}: {}", status_code, api_version, model, error_text);
                }
                // Continue to next model/version
                continue;
            }
        }
    }

    Err("Failed to get response from Gemini API. Please check your API key and model availability.".to_string())
}

async fn chat_handler(Json(payload): Json<ChatRequest>) -> Result<ResponseJson<ChatResponse>, (StatusCode, ResponseJson<ErrorResponse>)> {
    if payload.message.trim().is_empty() {
        return Err((
            StatusCode::BAD_REQUEST,
            ResponseJson(ErrorResponse {
                error: "Message cannot be empty".to_string(),
            }),
        ));
    }

    // Validate message length (len() counts UTF-8 bytes, not characters)
    if payload.message.len() > 10000 {
        return Err((
            StatusCode::BAD_REQUEST,
            ResponseJson(ErrorResponse {
                error: "Message is too long (max 10000 bytes)".to_string(),
            }),
        ));
    }

    match call_gemini_api(&payload.message).await {
        Ok(response) => Ok(ResponseJson(ChatResponse { response })),
        Err(e) => {
            eprintln!("Error calling Gemini API: {}", e);
            Err((
                StatusCode::INTERNAL_SERVER_ERROR,
                ResponseJson(ErrorResponse { error: e }),
            ))
        }
    }
}

#[tokio::main]
async fn main() {
    // Load environment variables
    dotenv::dotenv().ok();

    // Check for CLI mode
    let args: Vec<String> = std::env::args().collect();
    if args.len() > 1 && (args[1] == "chat" || args[1] == "cli" || args[1] == "--chat" || args[1] == "--cli") {
        cli::run_interactive_chat().await;
        return;
    }

    // Verify API key is set
    if env::var("GEMINI_API_KEY").is_err() {
        eprintln!("Warning: GEMINI_API_KEY not found in environment variables");
        eprintln!("   Please create a .env file with your API key");
    }

    // Health check endpoint
    async fn health() -> &'static str {
        "OK"
    }

    // Build the application router
    let app = Router::new()
        .route("/", get(health))
        .route("/health", get(health))
        .route("/chat", post(chat_handler));

    // Run the server
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .expect("Failed to bind to port 3000. Is another server running?");

    println!("🚀 Server running on http://localhost:3000");
    println!("📝 POST to http://localhost:3000/chat with {{ \"message\": \"your message\" }}");
    println!("💡 Health check: http://localhost:3000/health");

    axum::serve(listener, app)
        .await
        .expect("Server failed to start");
}

Now let's explore how this code works and what makes Rust so effective for building AI backends.

When you look at the imports at the top, you'll notice we're using modules to organize our code. The mod cli; declaration tells Rust to look for a file called cli.rs in the same directory. This keeps our code organized without needing complex directory structures. The use statements bring types into scope, so instead of writing axum::Router::new() everywhere, we can just write Router::new(). It's a small thing, but it makes the code much more readable.

The interesting part is how Axum's Json extractor works. When a request comes in with JSON data, Axum automatically converts it into our ChatRequest struct. If the body cannot be turned into a ChatRequest, Axum rejects the request before our handler even runs: 400 for syntactically invalid JSON, 422 when the JSON is valid but does not match the struct's shape, and 415 when the Content-Type header is not application/json. This is part of what people mean when they talk about Rust's type safety: the handler only ever sees a fully populated, correctly typed ChatRequest, so bad input is stopped at the boundary. You can learn more about Rust's type system in the official book.

Our struct definitions use derive macros to automatically generate serialization code. When you write #[derive(Deserialize)], Rust generates all the code needed to convert JSON into your struct at compile time. There's no runtime reflection or dynamic field lookup; the compiler knows exactly what fields your struct has and generates specialized code to parse them. This means when JSON like {"message": "hello"} arrives, it is parsed directly into a ChatRequest { message: "hello" } struct, with no reflection layer in between.

The Gemini API structures show how Rust handles nested data. We have GeminiRequest containing a vector of Content, which contains a vector of Part, which contains a String. This mirrors the JSON structure the API expects, where you have arrays of conversation turns that can contain multiple parts (text, images, etc.). The Vec type is Rust's growable array, similar to lists in other languages, but with compile-time guarantees about memory safety.

What's neat about the call_gemini_api function is how it handles errors. The return type Result<String, String> means the function can either succeed with a String response or fail with a String error message. This forces you to handle both cases; you can't accidentally ignore an error like you might in languages with exceptions. The ? operator is Rust's way of saying "if this fails, return the error immediately." It's syntactic sugar for a common pattern, but it makes error handling feel natural rather than tedious. Learn more about error handling in Rust.
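To make the ? operator concrete, here is a minimal, dependency-free sketch. The helper names require_key and build_url, the example.invalid host, and passing the key in as an Option (instead of reading the environment) are assumptions made for illustration; they are not part of the server code above.

```rust
// Hypothetical helper: takes the raw value as an Option so the example
// runs without touching real environment variables.
fn require_key(raw: Option<&str>) -> Result<String, String> {
    // ok_or_else turns None into an Err; `?` returns that Err to the caller.
    let key = raw.ok_or_else(|| "GEMINI_API_KEY not found".to_string())?;
    let key = key.trim();
    if key.is_empty() {
        return Err("GEMINI_API_KEY is empty".to_string());
    }
    Ok(key.to_string())
}

// Hypothetical caller: any failure in require_key short-circuits here too.
fn build_url(raw_key: Option<&str>, model: &str) -> Result<String, String> {
    let key = require_key(raw_key)?;
    Ok(format!("https://example.invalid/models/{}?key={}", model, key))
}

fn main() {
    for raw in [Some("abc123"), Some("   "), None] {
        match build_url(raw, "demo-model") {
            Ok(url) => println!("ok: {}", url),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

Neither function contains an explicit early-return branch for the missing key; the ? does that work, and the caller still has to match on the final Result.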

The model fallback logic demonstrates Rust's iteration capabilities. We try multiple API versions and models in nested loops, and if one fails, we just continue to the next one. The match expression handles the HTTP response, checking if it's successful or an error. What's interesting is that Rust's pattern matching is exhaustive; the compiler won't let you compile code that doesn't handle all possible cases. This means you can't accidentally forget to handle an error condition.
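The fallback pattern can be reduced to a small sketch without any HTTP. The Attempt enum, fake_call, first_success, and the model names below are hypothetical stand-ins; the real server inspects a reqwest response instead of an enum.

```rust
// Hypothetical outcome type; the real server inspects an HTTP response instead.
#[derive(Debug, PartialEq)]
enum Attempt {
    Success(String),
    ApiError(u16),
    NetworkError,
}

// Stand-in for the HTTP call: the first two model names always fail.
fn fake_call(model: &str) -> Attempt {
    match model {
        "model-a" => Attempt::ApiError(404),
        "model-b" => Attempt::NetworkError,
        other => Attempt::Success(format!("reply from {}", other)),
    }
}

fn first_success(models: &[&str]) -> Result<String, String> {
    for model in models {
        // Removing any arm below is a compile error: match must be exhaustive.
        match fake_call(model) {
            Attempt::Success(text) => return Ok(text),
            Attempt::ApiError(code) => {
                eprintln!("{} returned HTTP {}, trying next", model, code);
                continue;
            }
            Attempt::NetworkError => {
                eprintln!("{} unreachable, trying next", model);
                continue;
            }
        }
    }
    Err("all models failed".to_string())
}

fn main() {
    println!("{:?}", first_success(&["model-a", "model-b", "model-c"]));
}
```

If you later add a variant such as Attempt::RateLimited, the compiler points at this match until you decide what the loop should do with it.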

When we process the response, we use Option chaining to safely navigate through nested data. The .first() method returns an Option, which is Rust's way of saying "this might not exist." Then .and_then() says "if this exists, apply this function, otherwise return None." This pattern lets us safely extract the text from deep inside the response structure without worrying about null pointer exceptions or index out of bounds errors. The compiler ensures we handle the case where any part of the chain might be missing.
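Here is the same Option chain in isolation, as a sketch you can run without serde or a network call. The structs copy the shape used in the server; the first_text and candidate helpers are hypothetical names added for the example.

```rust
// Same shape as the tutorial's response structs, minus the serde derives.
struct Part {
    text: String,
}
struct Content {
    parts: Vec<Part>,
}
struct Candidate {
    content: Content,
}

fn first_text(candidates: &[Candidate]) -> Option<String> {
    candidates
        .first() // Option<&Candidate>: None if the list is empty
        .and_then(|c| c.content.parts.first()) // Option<&Part>: None if no parts
        .map(|p| p.text.clone()) // Option<String>
}

// Hypothetical constructor to keep the examples short.
fn candidate(texts: &[&str]) -> Candidate {
    Candidate {
        content: Content {
            parts: texts.iter().map(|t| Part { text: t.to_string() }).collect(),
        },
    }
}

fn main() {
    let empty: Vec<Candidate> = Vec::new();
    println!("{:?}", first_text(&empty));
    println!("{:?}", first_text(&[candidate(&[])]));
    println!("{:?}", first_text(&[candidate(&["Hello!", "ignored"])]));
}
```

Both "no candidates" and "a candidate with no parts" collapse into None, so the caller has a single missing case to handle.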

The chat_handler function shows how Axum's extractors work. The parameter Json(payload) tells Axum to automatically deserialize the request body into a ChatRequest. If the JSON is malformed or missing required fields, Axum handles it before our code runs. The return type is interesting too; we can return either a successful response or a tuple of status code and error response. This gives us fine-grained control over HTTP semantics while keeping everything type-safe.

Input validation happens before we even call the AI. We check if the message is empty or too long, returning appropriate HTTP status codes. This is defensive programming; we validate early and fail fast with clear error messages. The .trim() method returns the string without leading and trailing whitespace, and .len() gives us the length of the string in bytes, not characters, so text with non-ASCII characters reaches the 10000 limit sooner than its character count suggests. These are simple operations, but they prevent us from wasting API calls on invalid input.
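The difference between bytes and characters is easy to check with a short sketch. The validate function below is a hypothetical stand-in for the handler's checks: it returns a plain (status code, message) tuple so it runs without axum, and the 8-byte limit is chosen only to keep the example small.

```rust
// Hypothetical validator: same checks as the handler, but it returns a plain
// (status code, message) tuple so it runs without axum.
fn validate(message: &str, max_bytes: usize) -> Result<(), (u16, String)> {
    if message.trim().is_empty() {
        return Err((400, "Message cannot be empty".to_string()));
    }
    // len() is the UTF-8 byte length, not the number of characters.
    if message.len() > max_bytes {
        return Err((400, format!("Message is too long (max {} bytes)", max_bytes)));
    }
    Ok(())
}

fn main() {
    let text = "h\u{e9}llo \u{1F980}"; // "héllo" plus a space and the crab emoji
    println!("bytes: {}", text.len());
    println!("chars: {}", text.chars().count());
    println!("{:?}", validate(text, 8));
    println!("{:?}", validate("   ", 8));
}
```

The sample string has 7 characters but occupies 11 bytes, so it fails an 8-byte limit even though it looks short. If you want a limit expressed in characters, compare against message.chars().count() instead.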

The main function uses the #[tokio::main] attribute macro, which transforms our async main function into a regular synchronous main that sets up the Tokio runtime. The async/await syntax is part of the language itself, but a future does nothing until something polls it; the runtime provides that, handling all the scheduling and task management behind the scenes. When we call .await on something, the runtime can pause that task and work on other tasks while waiting for I/O to complete. This is how a server can handle thousands of concurrent requests without creating thousands of threads.

Building the router is straightforward; we just chain .route() calls to define our endpoints. Each route maps a path and HTTP method to a handler function. The compiler verifies that our handlers have the correct signatures, so we can't accidentally wire up a handler that expects different parameters. When we start the server with axum::serve(), it begins listening on port 3000 and handling incoming requests. The server runs until the process exits, and thanks to Tokio's async runtime, it can handle many requests concurrently without blocking.

Step 5: The Interactive CLI Client (src/cli.rs)

Create src/cli.rs with this complete code:

use std::io::{self, Write};

pub async fn run_interactive_chat() {
    println!("🤖 Rust AI Chatbot - Interactive Mode");
    println!("Type 'exit' or 'quit' to end the conversation\n");

    loop {
        print!("You: ");
        io::stdout().flush().unwrap();

        let mut input = String::new();
        match io::stdin().read_line(&mut input) {
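            Ok(0) => {
                // End of input (Ctrl-D or a closed pipe): exit instead of looping on empty reads
                println!("\n👋 Goodbye!");
                break;
            }
            Ok(_) => {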
                let message = input.trim();

                if message.is_empty() {
                    continue;
                }

                if message.eq_ignore_ascii_case("exit") || message.eq_ignore_ascii_case("quit") {
                    println!("👋 Goodbye!");
                    break;
                }

                // Send request to local server
                match send_chat_request(message).await {
                    Ok(response) => {
                        println!("Bot: {}\n", response);
                    }
                    Err(e) => {
                        eprintln!("Error: {}\n", e);
                    }
                }
            }
            Err(error) => {
                eprintln!("Error reading input: {}\n", error);
                break;
            }
        }
    }
}

async fn send_chat_request(message: &str) -> Result<String, String> {
    let client = reqwest::Client::new();
    let body = serde_json::json!({
        "message": message
    });

    let response = client
        .post("http://localhost:3000/chat")
        .json(&body)
        .send()
        .await
        .map_err(|e| format!("Failed to connect to server: {}. Make sure the server is running on port 3000.", e))?;

    if !response.status().is_success() {
        let error_text = response.text().await.unwrap_or_else(|_| "Unknown error".to_string());
        return Err(format!("Server error: {}", error_text));
    }

    let chat_response: serde_json::Value = response
        .json()
        .await
        .map_err(|e| format!("Failed to parse response: {}", e))?;

    chat_response["response"]
        .as_str()
        .map(|s| s.to_string())
        .ok_or_else(|| "Invalid response format".to_string())
}

The CLI code is simpler than the server, but it shows how Rust handles interactive I/O. The loop keyword creates an infinite loop, and we use print! instead of println! so the cursor stays on the same line. The .flush() call forces the output to appear immediately; without it, you might not see the prompt until after you type something. The .unwrap() here is fine for a CLI tool since we want it to crash if stdout fails, but you'd be more careful in a server.

Reading user input uses mutable references. The mut keyword makes the variable mutable, and &mut input passes a mutable reference to read_line(). This lets the function modify the string without taking ownership. When the user presses Enter, read_line() fills the string with their input, including the newline, so we call .trim() to remove it.
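The borrow can be seen on its own in a small sketch. The read_message helper is a hypothetical name; making it generic over BufRead is an assumption that lets the example feed input from an in-memory byte slice instead of a real terminal.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: generic over BufRead so it accepts stdin's lock or,
// as in main below, an in-memory byte slice.
fn read_message<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut input = String::new();
    // read_line appends to `input` through the mutable reference and
    // returns how many bytes it read; 0 means end of input.
    let bytes_read = reader.read_line(&mut input)?;
    if bytes_read == 0 {
        return Ok(None);
    }
    // trim() drops the trailing newline and any surrounding whitespace.
    Ok(Some(input.trim().to_string()))
}

fn main() -> io::Result<()> {
    let mut fake_stdin = "hello bot\n  quit  \n".as_bytes();
    while let Some(message) = read_message(&mut fake_stdin)? {
        println!("got: {:?}", message);
    }
    Ok(())
}
```

The String stays owned by read_message the whole time; read_line only borrows it mutably for the duration of the call. Note that a return value of 0 bytes signals end of input, which is different from an empty line (that still contains the newline).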

The match expression handles the result of reading input. If it succeeds, we process the message. If it fails (maybe the terminal closed), we break out of the loop. This is another example of Rust forcing you to handle errors explicitly; you can't accidentally ignore a failed read operation.

When we send the request to the server, we use reqwest again, this time as a client of our own server, together with the serde_json::json! macro. The json! macro is convenient for creating simple JSON objects inline; it's expanded at compile time, so malformed JSON syntax becomes a compiler error. The response handling uses the same patterns we saw in the server code: checking status codes, parsing JSON, and handling errors with Result types.

Step 6: Running Your Chatbot

1. Start the Server

In one terminal:

cargo run

You should see:

🚀 Server running on http://localhost:3000
📝 POST to http://localhost:3000/chat with { "message": "your message" }
💡 Health check: http://localhost:3000/health

2. Use Interactive Mode

In another terminal:

cargo run -- chat

You'll see:

🤖 Rust AI Chatbot - Interactive Mode
Type 'exit' or 'quit' to end the conversation

You:

Type a message and get AI responses!

3. Test with HTTP

In PowerShell:

$body = @{message="Hello! Tell me about Rust."} | ConvertTo-Json
Invoke-RestMethod -Uri http://localhost:3000/chat -Method Post -Body $body -ContentType "application/json"

Conclusion

Congratulations! You've built a complete AI chatbot backend in Rust. Keep learning and building more projects to reinforce the concepts.

Happy coding! 🦀

Install Red Hat Developer Hub with AI Software Templates on OpenShift

Fortune Ndlovu — Thu, 24 Apr 2025 10:02:40 +0000

As AI-based applications have become increasingly prevalent, organizations are seeking ways to standardize and accelerate the development of intelligent software. Red Hat Developer Hub (RHDH), built on Backstage, provides internal developer platforms that enable this through a curated experience. With the addition of AI Software Templates, RHDH delivers a structured approach to bootstrapping AI-driven applications with minimal manual effort.

This post explains what AI Software Templates are and how to install them on an OpenShift cluster using the redhat-ai-dev/ai-rhdh-installer, together with OpenShift GitOps and OpenShift Pipelines.

What are Software Templates?

Software Templates in RHDH are YAML-defined blueprints that scaffold entire projects, including:

Source code repositories
GitOps repositories
CI/CD pipelines (e.g., Tekton)
Deployment configurations

Templates use the Backstage Scaffolder engine and can be customized to support different languages, frameworks, or domains.

What are AI Software Templates?

AI Software Templates are specialized software templates that:

Scaffold AI-based applications (e.g., RAG chatbots, audio-to-text models, object detection apps)
Integrate LLM inference servers like llama.cpp and vLLM
Include GitOps support for deploying to OpenShift
Use Tekton pipelines for building and deploying containerized AI services
Connect with vector databases like ChromaDB

These templates are fully integrated into the RHDH UI via the "Create" menu, offering a guided experience for users to generate ready-to-deploy AI apps.

Architecture Overview

At a high level:

The user selects an AI Template via the RHDH UI.
Parameters are collected (e.g., GitHub repo, model server choice).
The template executes steps like: fetch:template, publish:github, catalog:register, and argocd:create-resources.
Two repositories are created: the app source and GitOps manifests.
Apps are deployed via Argo CD and managed through GitOps flows.

Prerequisites

Before getting started, ensure you have the following:

A running OpenShift 4.15+ cluster
Access to the OpenShift web console and oc CLI
A GitHub Organization where you can install GitHub Apps
A Quay.io account (optional, for image publishing)
Helm installed: brew install helm or from https://helm.sh
yq v4+ installed (see note below)

Note: The installer requires yq v4+, but Fedora/RHEL may ship yq v3. Use this to install v4+:

sudo curl -L https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o /usr/bin/yq
sudo chmod +x /usr/bin/yq

Clone the AI Installer Repo

git clone https://github.com/redhat-ai-dev/ai-rhdh-installer.git
cd ai-rhdh-installer

Create a GitHub App for RHDH

RHDH integrates with GitHub via a GitHub App. Follow APP-SETUP.md or these summarized steps:

Go to GitHub Developer Settings => GitHub Apps
Click "New GitHub App"
Use these values:
- Callback URL: https://<your-rhdh-route>/api/auth/github/handler/frame
- Webhook URL: same as above
- Permissions:
  - Contents: Read & write
  - Issues: Read & write
  - Metadata: Read-only
  - Webhooks: Read & write
- Subscribe to events: push, pull_request
Save and generate a private key (.pem file)

In your GitHub App settings, under "General", make sure "Request user authorization (OAuth)" is checked.

Create private.env with Secrets

Copy the template:

cp default-private.env private.env

Fill it with your secrets (escape PEM with \n):

export GITHUB__APP__ID='123456'
export GITHUB__APP__CLIENT__ID='Iv1...'
export GITHUB__APP__CLIENT__SECRET='...'
export GITHUB__APP__WEBHOOK__URL='https://<rhdh-route>/api/auth/github/handler/frame'
export GITHUB__APP__WEBHOOK__SECRET='secret'
export GITHUB__APP__PRIVATE_KEY='-----BEGIN RSA PRIVATE KEY-----\n...'

export GITOPS__GIT_TOKEN='ghp_...'

# Optional if pushing images
export QUAY__DOCKERCONFIGJSON=''
export QUAY__API_TOKEN=''

You can generate the escaped PEM with:

cat your-key.pem
awk '{printf "%s\\n", $0}' your-key.pem
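
If you prefer to do the escaping in Python, a minimal sketch of the same transformation could look like this. The function name escape_pem and the sample key text are placeholders, not part of the installer.

```python
def escape_pem(pem_text: str) -> str:
    """Join PEM lines into one line, ending each with a literal backslash-n."""
    return "".join(line + "\\n" for line in pem_text.splitlines())


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    sample = "-----BEGIN RSA PRIVATE KEY-----\nAAAA\n-----END RSA PRIVATE KEY-----\n"
    print(escape_pem(sample))
```

Like the awk one-liner, this appends the two characters backslash and n after every line, so the whole key fits on a single line in private.env.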

Then source:

source private.env
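
Before moving on, it can help to confirm that nothing in private.env was left empty. The sketch below is one way to check this from the same shell session; the variable names are the ones listed above, while missing_vars is a hypothetical helper, not something the installer ships.

```python
import os

# Names taken from the private.env listing above; the Quay variables are optional.
REQUIRED_VARS = [
    "GITHUB__APP__ID",
    "GITHUB__APP__CLIENT__ID",
    "GITHUB__APP__CLIENT__SECRET",
    "GITHUB__APP__WEBHOOK__URL",
    "GITHUB__APP__WEBHOOK__SECRET",
    "GITHUB__APP__PRIVATE_KEY",
    "GITOPS__GIT_TOKEN",
]


def missing_vars(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Because private.env uses export, a Python process started from the shell where you sourced it will see the values.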

Install RHDH and Dependencies

helm upgrade --install ai-rhdh ./chart --namespace ai-rhdh --create-namespace

Create a GitOps Repo

The installer doesn’t explicitly tell you this, but you must create a GitOps repo manually for Argo CD and RHDH integration to work.

Create it in your GitHub org:

https://github.com/<your-org>/rhdh-ai-gitops

Then, create the required secret in OpenShift:

oc create secret generic rhdh-argocd-secret \
  --from-literal=url=https://github.com/<your-org>/rhdh-ai-gitops \
  --from-literal=token=$GITOPS__GIT_TOKEN \
  -n ai-rhdh

Edit your argocd-config ConfigMap:

oc edit configmap argocd-config -n ai-rhdh

Replace the url value with your actual Argo CD route, and make sure the API token is set, for example:

argocd:
  username: ${ARGOCD_USER}
  password: ${ARGOCD_PASSWORD}
  waitCycles: 25
  appLocatorMethods:
    - type: 'config'
      instances:
        - name: default
          url: https://ai-rhdh-argocd-server-ai-rhdh.apps.ci-ln-4x2hc1b-72292.origin-ci-int-gce.dev.rhcloud.com
          token: ${ARGOCD_API_TOKEN}

Configure developer-hub-app-config

Ensure the following is added to your ConfigMap (e.g. developer-hub-app-config):

app-config.extra.yaml: |-
  app:
    baseUrl: https://<your-rhdh-route>

  backend:
    baseUrl: https://<your-rhdh-route>
    cors:
      origin: https://<your-rhdh-route>

  auth:
    environment: production
    providers:
      github:
        production:
          clientId: ${GITHUB__APP__CLIENT__ID}
          clientSecret: ${GITHUB__APP__CLIENT__SECRET}
          signIn:
            resolver:
              githubUserId:
                resolver: github

Remember to add your GitHub token as well, for example:

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

Create a GitHub App Secret and Inject it

source private.env

oc create secret generic backstage-env-ai-rh-developer-hub \
  --from-literal=GITHUB__APP__ID=$GITHUB__APP__ID \
  --from-literal=GITHUB__APP__CLIENT__ID=$GITHUB__APP__CLIENT__ID \
  --from-literal=GITHUB__APP__CLIENT__SECRET=$GITHUB__APP__CLIENT__SECRET \
  --from-literal=GITHUB__APP__WEBHOOK__SECRET=$GITHUB__APP__WEBHOOK__SECRET \
  --from-literal=GITHUB__APP__WEBHOOK__URL=$GITHUB__APP__WEBHOOK__URL \
  --from-literal=GITHUB__APP__PRIVATE_KEY="$GITHUB__APP__PRIVATE_KEY" \
  --from-literal=GITOPS__GIT_TOKEN="$GITOPS__GIT_TOKEN" \
  --dry-run=client -o yaml | oc apply -f - -n ai-rhdh

oc set env deployment/backstage-ai-rh-developer-hub \
  --from=secret/backstage-env-ai-rh-developer-hub -n ai-rhdh

# Restart Backstage deployment:
oc rollout restart deployment/backstage-ai-rh-developer-hub -n ai-rhdh

Run the Configuration Script

source private.env
bash ./configure.sh

This script configures:

Argo CD connection
Tekton Pipelines + Chains
GitHub integration for auth + GitOps
RHDH environment variables and plugins

Restart RHDH Deployment (if login fails)

If you see this error:

No auth provider registered for 'github'

Restart the RHDH pod to pick up the config:

oc rollout restart deployment/backstage-ai-rh-developer-hub -n ai-rhdh

Log in and Use the Portal

Open your RHDH route URL and log in via GitHub.

Let’s Register Existing AI Templates

Now that we are logged in with GitHub, navigate to ‘Create’ and register an existing component using this URL:

https://github.com/redhat-ai-dev/ai-lab-template/blob/release-v0.9.x/all.yaml

When a user selects "Register an existing component" in the Red Hat Developer Hub UI and enters the all.yaml catalog link (https://github.com/redhat-ai-dev/ai-lab-template/blob/release-v0.9.x/all.yaml), the following steps are executed behind the scenes:

1. Catalog Processor Kicks In

RHDH uses Backstage’s Software Catalog processors to analyze the contents of the provided YAML URL.

It scans all.yaml, which is a composite entity containing multiple Location entries pointing to individual template files:
- template:rag
- template:chatbot
- template:object-detection
- etc.

2. Each Location is Followed

For each Location entry, RHDH:

Resolves the GitHub raw URL
Fetches and parses the individual template.yaml files (e.g., templates/rag/template.yaml)
Validates that they conform to the Template kind spec (scaffolder.backstage.io/v1beta3)
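
As a rough illustration of the first and last bullets, the sketch below maps a github.com blob URL to its raw form and checks the two fields that mark an entity as a Template. Both helpers are hypothetical. Backstage's own URL reader may go through the GitHub API instead, so treat this as a picture of the idea, not of the implementation.

```python
from urllib.parse import urlparse

TEMPLATE_API_VERSION = "scaffolder.backstage.io/v1beta3"


def to_raw_url(blob_url: str) -> str:
    """Map a github.com 'blob' URL to its raw.githubusercontent.com form.

    Assumes the branch name contains no slashes.
    """
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected layout: <org>/<repo>/blob/<branch>/<path...>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    org, repo, _, branch, *path = parts
    return f"https://raw.githubusercontent.com/{org}/{repo}/{branch}/{'/'.join(path)}"


def is_template(entity: dict) -> bool:
    """Check the two fields that mark an entity as a scaffolder Template."""
    return (
        entity.get("apiVersion") == TEMPLATE_API_VERSION
        and entity.get("kind") == "Template"
    )


if __name__ == "__main__":
    url = "https://github.com/redhat-ai-dev/ai-lab-template/blob/release-v0.9.x/all.yaml"
    print(to_raw_url(url))
    print(is_template({"apiVersion": TEMPLATE_API_VERSION, "kind": "Template"}))
```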

3. Entities are Previewed

The UI shows a preview of the discovered entities before import:

All valid templates are listed
Users confirm and click Import

4. Entities Are Registered in the Catalog

Once imported:

Each template.yaml becomes a first-class Catalog entity of kind Template
They appear under the Create menu
Templates are tagged and categorized using their metadata.tags and spec.type

5. Templates Are Now Usable

Developers can now:

Click Create > Choose Template
Fill out the dynamic UI form (built from the parameters in the YAML)
Execute the full scaffolding + GitOps pipeline defined in steps

Once the AI templates are registered and visible in your RHDH UI, you can do the following:

Go to Create > Choose Template

Locate and select the RAG Chatbot Application template.

Fill in the Form Fields

You'll be prompted to enter several pieces of information:

Section 1: Application Information

Name: Unique name for your app (e.g., rag-apr22-25)
Owner: Owner of the component (e.g., user:guest)
Model Server: Choose from llama.cpp, vLLM, or Existing model server
Model Name: (if applicable) select or provide the name of the LLM model (e.g., instructlab/granite-7b-lab)

Section 2: Application Repository Information

Git host type (GitHub or GitLab)
Repository owner (GitHub org/user)
Name of the repo and the default branch

Section 3: Deployment Information

Image registry and organization
Image name
OpenShift namespace to deploy into
Optionally deploy to OpenShift AI via a checkbox

Click "Next", Review, and Submit

Red Hat Developer Hub orchestrates a powerful sequence of automation behind the scenes. The selected AI Software Template, defined as a Backstage scaffolder YAML, begins executing a series of templated actions: it fetches the base application and GitOps skeleton, publishes the source and GitOps repositories to GitHub (or GitLab), and registers both components into the RHDH Software Catalog. At this point, these entities are now discoverable and manageable via the RHDH UI. Simultaneously, Tekton pipelines are configured to handle application builds, and Argo CD resources are created to continuously deploy the application to OpenShift using GitOps principles. The entire lifecycle from scaffolding and repository creation to deployment and catalog registration is completed in minutes, providing a seamless developer experience with enterprise-grade automation.

The AI Software Template scaffolds the application and deploys it via GitOps, which may take a moment. Once that is done, navigate to the CI tab of your catalog component in Red Hat Developer Hub. Here, you’ll see PipelineRuns powered by Tekton. These runs:

Clone the source repo
Build and push the container image
Update your GitOps repo with the new image reference

You can also view Argo CD syncing the deployment in real time.

In OpenShift, go to the Topology view, find your app, and click its Route. This opens the deployed app in a browser. Try asking the chatbot a question. You may get a vague or inaccurate answer. That’s expected; it hasn’t been taught about the topic in question yet. Next, upload your custom knowledge and ask the same question again. The chatbot should now give a more accurate, context-aware response.


Backstage for the win!

Fortune Ndlovu — Fri, 18 Apr 2025 16:55:14 +0000

Backstage is an open platform for building developer portals, created by Spotify and now an incubating project under the Cloud Native Computing Foundation (CNCF). It aims to streamline the developer experience by centralizing tools, services, documentation, and infrastructure into a single, cohesive interface. Whether you are managing hundreds of microservices or just want a clean, developer-friendly interface for your internal tooling, Backstage provides the building blocks to make it happen.

This blog walks you through the practical process of standing up a working Backstage instance from scratch. We will cover:

Creating and configuring a Backstage app
Setting up a PostgreSQL database for Backstage to use
Setting up authentication with GitHub

By the end of this guide, you will have a local Backstage instance that you can share with your friends.

Prerequisites

Before we get started, make sure your development environment meets the following requirements:

Access to a Unix-based operating system, such as Linux, macOS, or Windows Subsystem for Linux
A GNU-like build environment available at the command line. For example, on Debian/Ubuntu you will want to have the make and build-essential packages installed. On macOS, you will want to run xcode-select --install to get the Xcode command line build tooling in place.
curl or Wget
Node.js Active LTS Release
- We recommend using nvm for this:
- Install nvm
- Install Node with nvm
Yarn for dependency management
- You will need to use Yarn Classic to create a new project, but it can then be migrated to Yarn 3
Git for source control

Great, now that we have some housekeeping done, let us dive in!

Creating a Backstage Application

To create a new Backstage standalone application, we’ll use npx, a command-line tool that comes bundled with Node.js. It allows you to execute packages directly from the npm registry without needing to install them globally. Before running the command, let us break down what it actually does. When you run the Backstage app creation wizard, it will:

Prompt you to enter a name for your application
Create a new directory using that name in your current working directory
Scaffold a complete Backstage project inside that directory with all the necessary configuration and code structure to get started

Project Structure Overview

Here is a simplified view of the folder structure that gets generated:

my-backstage-app/
├── app-config.yaml
├── catalog-info.yaml
├── package.json
└── packages/
    ├── app/
    └── backend/

Here is what each part of the structure represents:

app-config.yaml: The main configuration file for your app. This is where you'll manage everything from plugins to authentication settings.
catalog-info.yaml: Describes catalog entities (like systems, APIs, and components). See Catalog Descriptor Format to learn more.
package.json: Root-level package configuration. Avoid adding dependencies here—most should live in specific workspaces under packages/.
packages/: A monorepo-style workspace directory (managed via Lerna) containing separate packages:
- app/: The frontend React app, your developer portal UI.
- backend/: The backend service that powers features like authentication, catalog ingestion, software templates, and TechDocs.

Now that you know what’s being created, let us go ahead and generate your Backstage app:

npx @backstage/create-app@latest

Follow the prompts to name your app, and the CLI will take care of the rest.

Run the Backstage application

At this point your Backstage application should be installed and ready to be run. You can go to the application directory and start the application, for example:

cd my-backstage-app # your app name
yarn start

Navigate to http://localhost:3000 and you should see your Backstage application.

Setting Up PostgreSQL with Backstage

At this point, we want to ensure PostgreSQL is installed, initialized, and ready for use with our Backstage app on any operating system. Backstage offers SQLite, an in-memory database, for easy local setup with no external dependencies. SQLite is a great way to kick the tires, but it is recommended to set up PostgreSQL. Backstage uses the Knex database library under the covers, which supports many popular databases.

First, let us stop Backstage before we continue by pressing Control-C.

Configure Backstage (app-config.local.yaml)

backend:
  database:
    client: pg
    connection:
      host: localhost
      port: 5432
      user: postgres
      password: secret

Linux

Fedora / RHEL / CentOS / AlmaLinux / Rocky Linux

sudo dnf install postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl enable --now postgresql

Then set a password:

sudo -u postgres psql
ALTER USER postgres PASSWORD 'secret';
\q

Ubuntu / Debian

sudo apt update
sudo apt install postgresql postgresql-contrib

Then:

sudo -u postgres psql
ALTER USER postgres PASSWORD 'secret';
\q

PostgreSQL runs automatically on install. Confirm with:

sudo systemctl status postgresql

macOS

Using Homebrew (recommended)

brew update
brew install postgresql@16
brew services start postgresql@16

Set password:

psql postgres
ALTER USER postgres PASSWORD 'secret';
\q

You’re ready to go.

Windows

Using the PostgreSQL Installer

Download the installer from: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
Run the installer and select:

PostgreSQL version (e.g. 16)
Default user: postgres
Set your password (e.g. secret)

Finish setup and start the database service

You can use pgAdmin or psql.exe to connect:

psql -U postgres -h localhost -d postgres

Done.

Make sure authentication is password-based (not ident/peer)

Edit pg_hba.conf:

OS	Default location
Fedora / RHEL	`/var/lib/pgsql/data/pg_hba.conf`
Ubuntu / Debian	`/etc/postgresql/<ver>/main/pg_hba.conf`
macOS (Homebrew, Apple Silicon)	`/opt/homebrew/var/postgresql@16/pg_hba.conf`
Windows	`%PROGRAMFILES%\PostgreSQL\<ver>\data\pg_hba.conf`

Open the file in an editor. For example, on Fedora/RHEL:

sudo nano /var/lib/pgsql/data/pg_hba.conf

Replace all peer or ident entries with: md5
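
If you would rather preview the change before editing by hand, the sketch below applies the same replacement to the text of a pg_hba.conf file. It is a hypothetical helper that works on a string and never touches the real file. It assumes the common layout where the auth method is the fourth field on local lines and the fifth field on host lines.

```python
def use_md5(pg_hba_text: str) -> str:
    """Replace peer/ident auth methods with md5, leaving comments untouched.

    Assumes the method is the 4th field on 'local' lines and the 5th field
    on host lines (address written in CIDR form). Column alignment is not
    preserved on the lines that change.
    """
    out = []
    for line in pg_hba_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            out.append(line)
            continue
        idx = 3 if fields[0] == "local" else 4
        if len(fields) > idx and fields[idx] in ("peer", "ident"):
            fields[idx] = "md5"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        "# TYPE  DATABASE  USER  ADDRESS        METHOD",
        "local   all       all                  peer",
        "host    all       all   127.0.0.1/32   ident",
    ])
    print(use_md5(sample))
```

Keep a backup of the original file before writing any changes back.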

Then restart PostgreSQL:

# Linux (systemd)
sudo systemctl restart postgresql

# macOS (Homebrew)
brew services restart postgresql@16

Test it works

Try this from the terminal:

psql -U postgres -h 127.0.0.1 -d postgres

If it prompts for a password and connects, you’re ready.

Start the app again:

yarn dev

The backend should connect and initialize all plugins.

Open in browser: http://localhost:3000

Verify DB connectivity inside Backstage

You should not see any startup errors like:

Failed to instantiate service 'core.database'

Backstage will use PostgreSQL to store catalog data, auth sessions, and more.

Setting up Authentication

Authentication is a core part of any internal developer portal, and Backstage offers a flexible, plugin-based system to integrate with your identity provider of choice. Whether your organization uses GitHub, GitLab, Google, Okta, or a custom solution, Backstage provides a consistent way to manage sign-in, issue tokens, and associate users with catalog entities.

In this guide, we'll focus on integrating with GitHub as our authentication provider, using Backstage's built-in GitHub auth provider. This allows developers to sign in using their GitHub credentials and enables features like ownership mapping and permission-aware visibility across the portal.

Go to https://github.com/settings/applications/new to create your OAuth App.

Homepage URL should point to Backstage's frontend, in our tutorial it would be http://localhost:3000
Authorization callback URL should point to the auth backend, http://localhost:7007/api/auth/github/handler/frame
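
The callback URL is just the backend base URL plus a fixed path, so it is easy to derive for other environments. A tiny sketch, with github_callback_url as a made-up helper name:

```python
def github_callback_url(backend_base_url: str) -> str:
    """Build the GitHub OAuth callback URL for a Backstage auth backend."""
    return backend_base_url.rstrip("/") + "/api/auth/github/handler/frame"


if __name__ == "__main__":
    print(github_callback_url("http://localhost:7007"))
```

If you later deploy Backstage somewhere else, pass that backend's base URL instead and register the result in the OAuth App.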

Generate a new Client Secret and take a note of the Client ID and the Client Secret.

With your OAuth app registered in GitHub, it’s time to wire up authentication in your Backstage instance. This involves two key steps:

Adding your GitHub OAuth credentials to the app-config.yaml
Updating the sign-in page logic in your App.tsx to use the GitHub provider

1. Add GitHub OAuth Credentials to app-config.yaml

Open the app-config.yaml file at the root of your project (created automatically during app generation) and append the following. If you would rather keep real credentials out of version control, the same block can go in app-config.local.yaml, which overrides the defaults for local development:



auth:
  # see <https://backstage.io/docs/auth/> to learn about auth providers
  environment: development
  providers:
    # See <https://backstage.io/docs/auth/guest/provider>
    guest: {}
    github:
      development:
        clientId: YOUR CLIENT ID
        clientSecret: YOUR CLIENT SECRET
        signIn:
          resolvers:
            # Matches the GitHub username with the Backstage user entity name.
            # See <https://backstage.io/docs/auth/github/provider#resolvers> for more resolvers.
            - resolver: usernameMatchingUserEntityName

Replace YOUR CLIENT ID and YOUR CLIENT SECRET with the values from your GitHub OAuth App. Backstage will automatically reload configuration changes on save. If no errors appear in your terminal, you're good to go.

2. Customize the Sign-In Page in App.tsx

Next, we’ll tell Backstage to use GitHub as the sign-in method.

Open the packages/app/src/App.tsx file in your favorite editor.

Just below the final import line, add the following:

import { githubAuthApiRef } from '@backstage/core-plugin-api';

Now, locate the createApp function call and look for the components section. Add or modify the SignInPage definition like this:


components: {
  SignInPage: props => (
    <SignInPage
      {...props}
      auto
      provider={{
        id: 'github-auth-provider',
        title: 'GitHub',
        message: 'Sign in using GitHub',
        apiRef: githubAuthApiRef,
      }}
    />
  ),
},

This configures Backstage to present users with a sign-in screen offering GitHub login. The default Backstage setup includes a guest Sign-In Resolver. This resolver assigns all users a shared identity (guest) and is designed for quick local setup or sandbox exploration. In a production environment, you’ll want to replace or disable it in favor of real user identity mapping. To learn more, check out the Backstage Auth documentation and the section on Sign-In Resolvers.

With the frontend configured to support GitHub, we now need to ensure the backend knows how to handle those authentication flows. Backstage's createBackend() system makes this easy. You don't need to manually wire up auth routes. Instead, you just declare the auth modules you want to use.

To add the auth provider to the backend, we will first need to install the package by running this command:

yarn --cwd packages/backend add @backstage/plugin-auth-backend-module-github-provider

Open packages/backend/src/index.ts and locate the section where backend plugins are registered. To enable GitHub and Guest authentication, add the following lines:

// auth plugin
backend.add(import('@backstage/plugin-auth-backend'));
// See <https://backstage.io/docs/backend-system/building-backends/migrating#the-auth-plugin>
backend.add(import('@backstage/plugin-auth-backend-module-guest-provider'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));

Once those lines are in place, restart the backend with yarn start-backend to confirm that it starts without errors. You can then stop it again and move on.

Next, we need a User entity to sign in as. The recommended approach for adding Users and Groups to your Catalog is to use one of the existing Org Entity Providers, such as the one for GitHub. If none of those work, you may need to create one that fits your organization's needs.

For the sake of this guide we'll simply step you through adding a User to the org.yaml file that is included when you create a new Backstage instance. Let's do that:

First open the /examples/org.yaml file in your text editor of choice
At the bottom we'll add the following YAML:

---
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: YOUR GITHUB USERNAME
spec:
  memberOf: [guests]

Now make sure to replace the text "YOUR GITHUB USERNAME" with your actual GitHub username.
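
The reason the name matters is the usernameMatchingUserEntityName resolver we configured earlier: sign-in only succeeds when a User entity with a matching name exists in the Catalog. The sketch below is a simplified picture of that lookup. The resolve_user function is hypothetical, and the case-insensitive comparison is an assumption rather than a description of Backstage's exact rules.

```python
from typing import Optional


def resolve_user(github_username: str, user_entity_names: list) -> Optional[str]:
    """Return an entity ref like 'user:default/<name>' when a User entity matches.

    Assumption: names are compared case-insensitively; the real resolver's
    exact matching rules are not covered in this post.
    """
    wanted = github_username.lower()
    for name in user_entity_names:
        if name.lower() == wanted:
            return f"user:default/{name.lower()}"
    return None


if __name__ == "__main__":
    catalog_users = ["guest", "octocat"]  # placeholder entity names
    print(resolve_user("octocat", catalog_users))
    print(resolve_user("someone-else", catalog_users))
```

In other words, if the name in org.yaml does not match your GitHub username, the lookup comes back empty and the login is rejected.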

Just for Fun

Before we launch our newly wired-up Backstage app with GitHub authentication, let’s take a moment to make it feel like yours. Backstage is designed to be fully customizable, from layout and themes to branding, so let’s start with the basics: updating the app name, organization name, and logo.

Setting the Application and Organization Name

By default, the Backstage Catalog header reads “My Company Catalog.” Let’s swap that out for something more fitting for your team or project.

Open the app-config.yaml file at the root of your project.
Locate the app and organization blocks and edit them like this:

app:
  title: Winning Backstage

organization:
  name: Winning Backstage

app.title: Appears in the browser tab and in various places in the UI.
organization.name: Appears in the Catalog header, as in "Winning Backstage Catalog".

Note: app-config.local.yaml is already listed in your .gitignore, so it’s safe to include secrets and tokens there. But general branding like this should go in the main app-config.yaml.

Optional: Updating the Logo

Want to go a step further and give Backstage a visual identity?

Open packages/app/src/components/Root/Root.tsx
You’ll find these imports near the top:

import LogoFull from './LogoFull';
import LogoIcon from './LogoIcon';

These are the logo components used in the header and sidebar.

To update the branding:
- Replace the SVGs or React components used in LogoFull and LogoIcon.
- You can either modify the existing files or point to new components you've created.

This allows you to drop in your own company logo, project branding, or any other personalized touch. To apply your changes, restart the development server:

yarn dev

Navigate to your Backstage app in the browser and you should see the application name updated and your GitHub Authentication implemented.

Next, sign in with GitHub and follow the authorization process.

Well done, you are now a Backstage expert. Continue experting!

To learn more about Backstage, visit https://backstage.io/

Intelligent Support Ticket Routing with Natural Language Processing (NLP)

Fortune Ndlovu — Thu, 17 Apr 2025 05:57:01 +0000

Intro to Large Language Model Data

This project explores the process of preparing real-world language data for use with large language models (LLMs). While the broader field of natural language processing (NLP) has existed for decades, modern LLMs bring new complexity and opportunity to tasks like classification, generation, and automated decision-making. In this context, support ticket routing is an ideal real-world case to apply LLM-aligned data preparation techniques.

We'll focus on a specific use case: building an NLP pipeline that processes, cleans, embeds, and classifies enterprise support tickets. The result is a system that can accurately route tickets across a range of multilingual categories using advanced sentence embeddings and machine learning.

1. Problem Statement and Goal

Support teams often deal with thousands of incoming tickets that need to be routed to the correct department. Manually categorizing these tickets is slow and error-prone. Our goal is to automatically classify support tickets based on their textual content using NLP techniques, enabling faster triage and resolution.

2. Prerequisites

GitHub Reference material: https://github.com/Fortune-Ndlovu/Intelligent-Support-Ticket-Routing-with-NLP-and-XGBoost/tree/main

This notebook assumes you have Python 3.10 and the libraries below installed.
First, set up your environment with Anaconda. Open your terminal (or Anaconda Prompt) and create a new conda environment:

conda create -n ticket-nlp python=3.10 -y
conda activate ticket-nlp

Next, install the required packages, using conda and pip as needed:

# Core packages
conda install pandas scikit-learn -y
conda install -c conda-forge matplotlib seaborn

# Install pip packages (for embedding + transformers)
pip install sentence-transformers
pip install tqdm
pip install nltk
pip install deep-translator
pip install xgboost

At this point your environment is ready. Let us proceed to loading and exploring the dataset!

3. Load and Explore Dataset

You can download the dataset by navigating to Multilingual Customer Support Tickets – Kaggle and saving it as tickets.csv in your project folder.

At this point, you now have the raw data and can begin exploring by loading the dataset and checking available columns.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sentence_transformers import SentenceTransformer
nltk.download('stopwords')

df = pd.read_csv("tickets.csv")
print(df.columns)


Index(['subject', 'body', 'answer', 'type', 'queue', 'priority', 'language',
       'business_type', 'tag_1', 'tag_2', 'tag_3', 'tag_4', 'tag_5', 'tag_6',
       'tag_7', 'tag_8', 'tag_9'],
      dtype='object')


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ndlov\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

# Quick preview
df.head()

	subject	body	answer	type	queue	priority	language	business_type	tag_1	tag_2	tag_3	tag_4	tag_5	tag_6	tag_7	tag_8	tag_9
0	Problema crítico del servidor requiere atenció...	Es necesaria una investigación inmediata sobre...	Estamos investigando urgentemente el problema ...	Incident	Technical Support	high	es	IT Services	Urgent Issue	Service Disruption	Incident Report	Service Recovery	System Maintenance	NaN	NaN	NaN	NaN
1	Anfrage zur Verfügbarkeit des Dell XPS 13 9310	Sehr geehrter Kundenservice,\n\nich hoffe, die...	Sehr geehrter <name>,\n\nvielen Dank, dass Sie...	Request	Customer Service	low	de	Tech Online Store	Sales Inquiry	Product Support	Customer Service	Order Issue	Returns and Exchanges	NaN	NaN	NaN	NaN
2	Erro na Autocompletação de Código do IntelliJ ...	Prezado Suporte ao Cliente <name>,\n\nEstou es...	Prezado <name>,\n\nObrigado por entrar em cont...	Incident	Technical Support	high	pt	IT Services	Technical Support	Software Bug	Problem Resolution	Urgent Issue	IT Support	NaN	NaN	NaN	NaN
3	Urgent Assistance Required: AWS Service	Dear IT Services Support Team, \n\nI am reachi...	Dear <name>,\n\nThank you for reaching out reg...	Request	IT Support	high	en	IT Services	IT Support	Urgent Issue	Service Notification	Cloud Services	Problem Resolution	Technical Guidance	Performance Tuning	NaN	NaN
4	Problème d'affichage de MacBook Air	Cher équipe de support du magasin en ligne Tec...	Cher <name>,\n\nMerci de nous avoir contactés ...	Incident	Product Support	low	fr	Tech Online Store	Technical Support	Product Support	Hardware Failure	Service Recovery	Routine Request	NaN	NaN	NaN	NaN

Before diving into preprocessing, it's important to understand the structure and richness of our dataset. Each row represents a unique support ticket submitted by a user. These tickets span multiple languages and departments, simulating a real-world enterprise support system. The preview above, produced by loading the CSV file with pandas and calling df.head(), shows the first five tickets.

This gives us insight into the following key columns:

Column	Description
`subject`	Short summary or title of the ticket (usually user-written)
`body`	Full description of the issue or request
`answer`	Optional response or continuation in the thread
`type`	Ticket type such as `"Incident"` or `"Request"`
`queue`	Ground-truth label for which department handled the ticket
`priority`	Priority level (e.g., `"high"`, `"low"`)
`language`	Detected language of the ticket
`business_type`	Type of customer/business segment
`tag_1`–`tag_9`	Multi-label tags capturing relevant categories, issue types, or subtopics

This diverse set of features allows us to build a model that not only understands natural language but also considers context, issue categorization, and business structure, making it ideal for intelligent routing tasks.
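
Since queue is the label we want to predict, a sensible first check is how tickets are spread across queues. The sketch below does this with the standard library only, using the five queue and language values visible in the preview above so it runs without tickets.csv. The count_by helper is made up for illustration; on the full dataset you would point the same idea at the real file.

```python
import csv
import io
from collections import Counter

# Five (queue, language) pairs copied from the df.head() preview above,
# inlined so the sketch runs without tickets.csv.
SAMPLE_CSV = """queue,language
Technical Support,es
Customer Service,de
Technical Support,pt
IT Support,en
Product Support,fr
"""


def count_by(column: str, csv_text: str) -> Counter:
    """Count how often each value of `column` appears in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for queue, n in count_by("queue", SAMPLE_CSV).most_common():
        print(f"{queue}: {n}")
```

A heavily skewed distribution would be worth knowing about before training, because a classifier can look accurate while mostly predicting the largest queue.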

4. Text Cleaning

Text cleaning is the process of transforming raw, messy, human-written text into a structured, consistent format that machine learning models can understand. In the context of support tickets, this involves removing unnecessary characters (like punctuation), normalizing casing and accents, eliminating common filler words (like "the" or "please"), and combining fragmented text fields into a single input. This step is critical in natural language processing (NLP) because clean, standardized text helps models learn patterns more effectively, especially when dealing with multiple languages, noisy inputs, and user-generated content. Here we lowercase the text, strip accents and punctuation, drop stopwords, and collapse extra whitespace.

import re
import unicodedata
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# 1. Combine fields robustly
df['text'] = df[['subject', 'body', 'answer']].fillna('').agg(' '.join, axis=1)

# 2. Use sklearn's stopword list
stop_words = ENGLISH_STOP_WORDS

# 3. Compile regex once for performance
_whitespace_re = re.compile(r"\s+")
_non_alphanum_re = re.compile(r"[^a-z0-9\s]")

# 4. Define the cleaner, including accent normalization
def clean_text(text):
    text = str(text).lower().strip()
    # Decompose accented characters, then drop the non-ASCII accent marks
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8')
    text = _non_alphanum_re.sub("", text)  # delete punctuation (no space is inserted)
    text = _whitespace_re.sub(" ", text)  # collapse runs of whitespace
    # Drop English stopwords
    tokens = [word for word in text.split() if word not in stop_words]
    return " ".join(tokens)

# 5. Apply cleaning function
df['clean_text'] = df['text'].apply(clean_text)

# 6. Preview result
df[['subject', 'clean_text']].head()

	subject	clean_text
0	Problema crítico del servidor requiere atenció...	problema critico del servidor requiere atencio...
1	Anfrage zur Verfügbarkeit des Dell XPS 13 9310	anfrage zur verfugbarkeit des dell xps 13 9310...
2	Erro na Autocompletação de Código do IntelliJ ...	erro na autocompletacao codigo intellij idea p...
3	Urgent Assistance Required: AWS Service	urgent assistance required aws service dear se...
4	Problème d'affichage de MacBook Air	probleme daffichage macbook air cher equipe su...
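To see what each stage of the cleaner contributes, here is a minimal sketch that runs the same steps one at a time on a single subject line. It uses only the standard library. The names clean_steps and DEMO_STOP_WORDS are hypothetical, and the three-word stopword set is a stand-in for sklearn's ENGLISH_STOP_WORDS, so the real cleaner may drop more words.

```python
import re
import unicodedata

# Stand-in for sklearn's ENGLISH_STOP_WORDS (assumption: tiny subset for illustration)
DEMO_STOP_WORDS = {"the", "please", "de"}

def clean_steps(text):
    """Return the intermediate result after each cleaning stage."""
    lowered = text.lower().strip()
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the mark.
    no_accents = (
        unicodedata.normalize("NFKD", lowered)
        .encode("ascii", "ignore")
        .decode("utf-8")
    )
    # Punctuation is deleted rather than replaced by a space.
    no_punct = re.sub(r"[^a-z0-9\s]", "", no_accents)
    collapsed = re.sub(r"\s+", " ", no_punct)
    cleaned = " ".join(w for w in collapsed.split() if w not in DEMO_STOP_WORDS)
    return {
        "lowered": lowered,
        "no_accents": no_accents,
        "no_punct": no_punct,
        "cleaned": cleaned,
    }

for stage, value in clean_steps("Problème d'affichage de MacBook Air").items():
    print(f"{stage:>10}: {value}")
```

Because punctuation is deleted instead of being replaced by a space, "d'affichage" becomes "daffichage" and a version string such as "8.20" would become "820". If that matters for your data, substituting a space in the punctuation step is one possible alternative.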

Did you notice that our data is still not all English? That is because the original ticket dataset is intentionally multilingual. Lowercasing French, Spanish, or Portuguese text and filtering it with an English stopword list removes few of the filler words in those languages, so we are not yet doing the best we can.

That’s why in the next section, we will:

Detect ticket language
Automatically translate non-English tickets to English using Google Translate
Then apply this same cleaning function

import pandas as pd
import re
import unicodedata
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from deep_translator import GoogleTranslator
from functools import lru_cache
from tqdm import tqdm

# Enable tqdm for pandas apply
tqdm.pandas()

# --- 1. Combine subject + body + answer into single text column ---
df['text'] = df[['subject', 'body', 'answer']].fillna('').agg(' '.join, axis=1)

# --- 2. Caching Google Translate for performance ---
@lru_cache(maxsize=10000)
def cached_translate(text, lang):
    if lang != 'en':
        try:
            return GoogleTranslator(source=lang, target='en').translate(text)
        except Exception:
            return text  # fallback to original
    return text

# --- 3. Translate non-English text with progress ---
df['text_en'] = df.progress_apply(lambda row: cached_translate(row['text'], row['language']), axis=1)

# --- 4. Use sklearn's English stopwords ---
stop_words = ENGLISH_STOP_WORDS

# --- 5. Compile regex patterns ---
_whitespace_re = re.compile(r"\s+")
_non_alphanum_re = re.compile(r"[^a-z0-9\s]")

# --- 6. Define professional text cleaner ---
def clean_text(text):
    text = str(text).lower().strip()
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('utf-8')  # remove accents
    text = _non_alphanum_re.sub("", text)  # remove punctuation
    text = _whitespace_re.sub(" ", text)  # normalize whitespace
    tokens = [word for word in text.split() if word not in stop_words]
    return " ".join(tokens)

# --- 7. Apply the cleaning function with progress ---
df['clean_text'] = df['text_en'].progress_apply(clean_text)

# --- 8. Preview sample results ---
sample = df[['language', 'subject', 'text_en', 'clean_text']].sample(5, random_state=42)
for i, row in sample.iterrows():
    print(f"Language: {row['language']}")
    print(f"Subject: {row['subject']}")
    print(f"Translated: {row['text_en'][:200]}...")
    print(f"Cleaned: {row['clean_text'][:200]}...\n")
    print("-" * 80)

100%|██████████| 4000/4000 [11:37<00:00,  5.74it/s]
100%|██████████| 4000/4000 [00:00<00:00, 4544.00it/s]

Language: pt
Subject: Assistência Necessária para Problemas Persistentes de Atolamento de Papel com Impressora Canon
Translated: Assistance required for persistent paper jam problems with canon printer with customer support,

I am writing to report persistent paper jam problems with my Canon Pixma MG3620 printer. The problem oc...
Cleaned: assistance required persistent paper jam problems canon printer customer support writing report persistent paper jam problems canon pixma mg3620 printer problem occurs light checkout documentation ass...

--------------------------------------------------------------------------------
Language: es
Subject: nan
Translated: Dear customer support equipment, I am writing to get your attention on the continuous problems we are experiencing with our AWS cloud implementation, which is managed through its AWS administration se...
Cleaned: dear customer support equipment writing attention continuous problems experiencing aws cloud implementation managed aws administration service interruptions happening growing frequency led significant...

--------------------------------------------------------------------------------
Language: en
Subject: Urgent: Jira Software 8.20 Malfunction Issue
Translated: Urgent: Jira Software 8.20 Malfunction Issue Dear Support Team,

I am writing to report a serious issue that we have been facing with Jira Software 8.20, specifically during our Scrum sprint managemen...
Cleaned: urgent jira software 820 malfunction issue dear support team writing report issue facing jira software 820 specifically scrum sprint management tasks team encountered persistent malfunctions significa...

--------------------------------------------------------------------------------
Language: es
Subject: Problema de creación de tickets en Jira Software 8.20
Translated: Ticket creation problem in jira software 8.20 estimated customer support,

I am experiencing problems with the process of creating tickets in Jira Software 8.20. Every time I try to send a new ticket,...
Cleaned: ticket creation problem jira software 820 estimated customer support experiencing problems process creating tickets jira software 820 time try send new ticket error message appears prevents completing...

--------------------------------------------------------------------------------
Language: fr
Subject: nan
Translated: Dear customer service,

I hope you find you healthy. I am writing to request an upgrading of our Google Workspace licenses for the sales team in order to improve their productivity and their collabora...
Cleaned: dear customer service hope healthy writing request upgrading google workspace licenses sales team order improve productivity collaboration capacities currently use standard business edition transition...

--------------------------------------------------------------------------------

Instead of dropping the non-English tickets (which would waste data), we translated and cleaned them. The approach was:

Detect the ticket language
Translate non-English text into English automatically
Then apply the same cleaning logic as before

This ensures every ticket is processed in the same language, so the embedding model and the classifier see consistent input regardless of the language the ticket was written in.
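The caching step in the translation code can be illustrated in isolation. The sketch below swaps the real translator for a hypothetical fake_translate function that only counts how often it is called, so it runs without network access; everything except functools.lru_cache is an assumption made for the demo.

```python
from functools import lru_cache

calls = {"count": 0}

def fake_translate(text, lang):
    """Hypothetical stand-in for a network translation call."""
    calls["count"] += 1
    return f"[{lang}->en] {text}"

@lru_cache(maxsize=10000)
def cached_translate(text, lang):
    # English text is returned unchanged and never reaches the translator.
    if lang == "en":
        return text
    return fake_translate(text, lang)

tickets = [("hola", "es"), ("hola", "es"), ("bonjour", "fr"), ("hello", "en")]
results = [cached_translate(text, lang) for text, lang in tickets]

print(calls["count"])                      # translator calls actually made
print(cached_translate.cache_info().hits)  # lookups served from the cache
```

The cache only helps when the exact same (text, lang) pair appears again. Note also that a function which catches its own exceptions and returns a fallback, as the translation code above does, will have that fallback cached as well, so a ticket that failed once is not retried.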

5. Text Embedding and Classification Model Training

After cleaning the text, we still can’t feed it directly into a machine learning model, because computers don’t understand words the way humans do. This is where text embedding comes in. Embedding is the process of converting text into numerical vectors (lists of numbers) that capture the meaning and context of the words or sentences. Think of it as turning text into something the model can "see" and learn from.
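As a rough illustration of what "capturing meaning" looks like numerically, the sketch below compares vectors with cosine similarity. The three-dimensional vectors are hand-made for the example and are not output from any embedding model; real sentence embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption: purely illustrative values)
toy_embeddings = {
    "printer paper jam": [0.9, 0.1, 0.0],
    "printer will not feed paper": [0.8, 0.2, 0.1],
    "refund for duplicate invoice": [0.1, 0.1, 0.9],
}

jam = toy_embeddings["printer paper jam"]
feed = toy_embeddings["printer will not feed paper"]
refund = toy_embeddings["refund for duplicate invoice"]

sim_related = cosine_similarity(jam, feed)
sim_unrelated = cosine_similarity(jam, refund)
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Texts about similar problems end up with vectors pointing in similar directions, which is the structure a downstream classifier can exploit.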

Once the text is embedded, we use those vectors to train a classification model, a type of algorithm that learns to recognize patterns and assign labels. In our case, the model learns to predict the correct support queue (like “Technical Support” or “Product Support”) based on the ticket’s content. This combination of embedding + classification is the core of how we automate ticket routing using NLP.

In this step, we train a machine learning classifier on the embedded support tickets. To do this, we first encode our category labels (queue_grouped) into numbers using a label encoder, then train an XGBoost model, a high-performance gradient-boosted decision tree classifier. After training, we evaluate the model's accuracy and visualize how well it performs across all support categories using a classification report and a confusion matrix.

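import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier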

# Encode y labels (queue_grouped)
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Train XGBoost
print("Training XGBoost...")
clf = XGBClassifier(
    n_estimators=300,
    max_depth=8,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=1,       # reported as unused in the warning below
    use_label_encoder=False,  # reported as unused in the warning below
    eval_metric='mlogloss',
    n_jobs=-1,
    verbosity=1
)
clf.fit(X_train, y_train)

# Predict & decode
y_pred = clf.predict(X_test)
y_test_labels = le.inverse_transform(y_test)
y_pred_labels = le.inverse_transform(y_pred)

# Evaluate
print("\n Classification Report:")
print(classification_report(y_test_labels, y_pred_labels, zero_division=0))

# Confusion Matrix
cm = confusion_matrix(y_test_labels, y_pred_labels, labels=le.classes_)
plt.figure(figsize=(12, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=le.classes_, yticklabels=le.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("XGBoost Confusion Matrix (Grouped + Tags)")
plt.tight_layout()
plt.show()

 Training XGBoost...


C:\Users\ndlov\anaconda3\envs\ticket-nlp\lib\site-packages\xgboost\training.py:183: UserWarning: [04:31:23] WARNING: C:\actions-runner\_work\xgboost\xgboost\src\learner.cc:738: 
Parameters: { "scale_pos_weight", "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)



 Classification Report:
                                 precision    recall  f1-score   support

           Billing and Payments       0.96      0.93      0.95        75
               Customer Service       0.65      0.60      0.62       124
                     IT Support       0.92      0.46      0.61        98
                          Other       0.84      0.47      0.60        55
                Product Support       0.67      0.70      0.68       143
          Returns and Exchanges       0.88      0.80      0.83        44
Service Outages and Maintenance       0.80      0.53      0.64        30
              Technical Support       0.62      0.87      0.72       231

                       accuracy                           0.71       800
                      macro avg       0.79      0.67      0.71       800
                   weighted avg       0.74      0.71      0.70       800

6. Dataset Summary

The dataset contains support tickets from a global enterprise environment, spanning multiple departments and languages. Each ticket includes a subject, body, and answer, enriched with structured metadata such as language, business type, and hierarchical tags. To ensure linguistic consistency and inclusivity, all non-English tickets were translated to English before preprocessing.

The original label space (queue) exhibited significant class imbalance. To improve model performance and evaluation fairness, low-frequency categories such as "Human Resources", "Sales and Pre-Sales", and "General Inquiry" were grouped under an "Other" class. This consolidation helped stabilize predictions and boost performance across underrepresented groups.

Queue (Grouped)	Ticket Count
Technical Support	1317
Product Support	690
Customer Service	627
IT Support	445
Billing and Payments	338
Returns and Exchanges	197
Service Outages and Maintenance	141
Other	165 (approx)
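The grouping step described above can be sketched with a frequency threshold. This is an assumption for illustration: the article names the grouped categories explicitly and does not state a cutoff, and group_rare_labels and min_count are hypothetical names.

```python
from collections import Counter

def group_rare_labels(labels, min_count, other_label="Other"):
    """Replace every label seen fewer than min_count times with other_label."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other_label for lab in labels]

# Tiny made-up label list, not the real dataset
queues = (
    ["Technical Support"] * 5
    + ["Product Support"] * 3
    + ["Human Resources"]
    + ["General Inquiry"]
)

grouped = group_rare_labels(queues, min_count=3)
print(Counter(grouped))
```

With a pandas DataFrame the same idea is usually written with value_counts() and a mapping, but the logic is identical: count first, then relabel.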

7. Evaluation

To see how well our model performs, we trained an XGBClassifier using sentence embeddings generated by the all-mpnet-base-v2 transformer, a language model designed to capture the meaning of full sentences.

We grouped some of the less common ticket categories under a new "Other" label to reduce noise and help the model focus on learning the main categories. The data was split so that 80% was used for training and 20% for testing.

Here’s what the model achieved:

✅ 71% overall accuracy
✅ Macro F1-score of 0.71 (a balanced measure of performance across all classes)
✅ Strong performance in categories like "Billing and Payments" (F1 0.95) and "Returns and Exchanges" (F1 0.83), plus high recall (0.87) for "Technical Support"
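For readers wondering how the macro and weighted averages relate to the per-class rows, here is a small sketch that recomputes both from the F1 scores and support counts printed in the classification report. Because it starts from values already rounded to two decimals, the third decimal place may differ from what scikit-learn computes internally.

```python
# (f1-score, support) per class, copied from the classification report
report = {
    "Billing and Payments": (0.95, 75),
    "Customer Service": (0.62, 124),
    "IT Support": (0.61, 98),
    "Other": (0.60, 55),
    "Product Support": (0.68, 143),
    "Returns and Exchanges": (0.83, 44),
    "Service Outages and Maintenance": (0.64, 30),
    "Technical Support": (0.72, 231),
}

total_support = sum(n for _, n in report.values())

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in report.values()) / total_support

print(f"support={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```

The macro score treats the 30-ticket outage class the same as the 231-ticket Technical Support class, which is why it is the more informative number when classes are imbalanced.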

The confusion matrix below shows how well the model predicted each category. Values along the diagonal represent correct predictions:

Actual \ Predicted	Billing	Cust. Service	IT Support	Other	Product	Returns	Outages	Tech. Support
Billing and Payments	✅ 70							4
Customer Service		✅ 74			15	2		32
IT Support		10	✅ 45		7	1		34
Other		9	0	✅ 26	8			11
Product Support		10	2		✅ 100	1		30
Returns and Exchanges		3			4	✅ 35		2
Service Outages & Maintenance		1		1		1	✅ 16	11
Technical Support	2	6	2	2	15		4	✅ 200

✅ Diagonal values are correct predictions

⚠️ Off-diagonal values show where the model made mistakes (e.g. confusing similar categories like "IT Support" and "Technical Support")
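Precision and recall for a single class can be read straight off the matrix: recall divides the diagonal cell by its row total, precision divides it by its column total. The sketch below does this for "Technical Support" using the values from the table above, treating blank cells as zero (an assumption about how the table was rendered).

```python
# Row for actual "Technical Support": how those tickets were predicted.
# Column order: Billing, Cust. Service, IT Support, Other, Product,
# Returns, Outages, Tech. Support
tech_row = [2, 6, 2, 2, 15, 0, 4, 200]

# Column for predicted "Technical Support": what those tickets really were.
# Row order matches the column order above.
tech_col = [4, 32, 34, 11, 30, 2, 11, 200]

true_positives = tech_row[-1]

recall = true_positives / sum(tech_row)     # of real Tech Support tickets, share found
precision = true_positives / sum(tech_col)  # of predicted Tech Support, share correct

print(f"precision={precision:.2f} recall={recall:.2f}")
```

The gap between the two numbers reflects the pattern visible in the last column: many tickets from other queues get pulled into "Technical Support", which lowers precision while recall stays high.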

Overall, this model shows strong potential for automating ticket routing in a multilingual enterprise environment, especially for high-volume categories. With more labeled data and continued tuning, it can be made even more accurate and aligned to specific business needs.

8. Inference Example

Inference is the final, most exciting step: using the trained model to make predictions on new, unseen data. Training taught the model what each ticket should be labeled as; inference applies what it learned to real-world examples. In our case, that means feeding in a new support ticket (a subject line, a body, maybe some tags) from a form, email, or chat, and asking the model to predict which department or queue it should go to (e.g., “Product Support”, “Technical Support”). To make this simple and reusable, we wrap all the preprocessing, embedding, and prediction steps into a single function, predict_ticket(), which simulates how a support platform could route tickets without human input. This function:

Combines text fields like during training

Cleans and embeds the input

Uses the trained classifier to make a prediction

Returns a human-readable label (like "Product Support")

This is the same process your company could use in a real app or bot!

def predict_ticket(subject, body, answer="", tags=None):
    """
    Predicts the support queue for a new ticket using the trained model.

    Args:
        subject (str): Ticket subject line
        body (str): Main body of the ticket
        answer (str): Optional reply or continuation of conversation
        tags (list of str): Optional list of tag strings (e.g. issue type, priority)

    Returns:
        str: Predicted support queue label
    """
    # Combine fields like in training
    base_text = f"{subject} {body} {answer}"
    tags_text = " ".join(tags) if tags else ""
    full_input = f"{base_text} {tags_text}"

    # Clean input (same steps as training)
    clean = clean_text(full_input)

    # Embed with the same model
    embedding = model.encode([clean])

    # Predict with trained model
    encoded_pred = clf.predict(embedding)[0]
    return le.inverse_transform([encoded_pred])[0]

predict_ticket(
    subject="Cannot access Jira after upgrading to 8.20",
    body="The Jira service throws a 503 error after our recent upgrade. This is blocking several engineering teams.",
    tags=["Technical Issue", "Urgent", "Atlassian"]
)
# Output: 'Technical Support'

'Technical Support'

Our model predicted that the ticket belongs to the Technical Support queue, and it makes a lot of sense based on the input:

“Cannot access Jira”: Mentions a software access issue.
“503 error”: A server or application error, very common in infrastructure or backend support tickets.
“Blocking engineering teams”: High urgency, affecting internal teams.
Tags like “Technical Issue” and “Urgent” further reinforce that this is not just a general inquiry — it needs hands-on technical help.

Based on similar examples the model saw during training, it learned that Jira issues + technical errors + urgency often belong to the Technical Support department.

So this prediction isn't random: it's learned from patterns in your real-world data. That’s the magic of combining embeddings + ML!

9. Conclusion

In this project, we built an end-to-end NLP pipeline for automated support ticket routing, going from raw multilingual input to a trained classifier with a reusable prediction function. Here's what we accomplished:

Cleaned and translated multilingual support ticket content for uniform preprocessing
Combined unstructured text with structured tags to enrich the input signal
Generated dense semantic embeddings using the all-mpnet-base-v2 transformer
Trained an XGBClassifier with grouped labels for improved generalization
Evaluated model performance across 8 enterprise queues using both metrics and visual confusion matrices
Wrapped everything in a real-time predict_ticket() function ready for integration

With an accuracy of ~71% and a macro F1-score of 0.71, this pipeline provides a strong and scalable foundation for enterprise-grade ticket triage.

Room for Further Gains:

Adding more labeled training data and fine-tuning embeddings
Incorporating rich metadata (e.g., ticket priority, business type, submission time)
Integrating real-time user feedback to drive continuous learning


Image Classification with Convolutional Neural Networks (CNNs)

Fortune Ndlovu — Wed, 09 Apr 2025 11:27:23 +0000

Computer vision is a common application of neural networks and deep learning. We will use the MNIST dataset to train a model on handwritten digits 0-9 and then classify new handwritten digits. The first technique we will employ will be a simple multilayer perceptron, and then we will use the more powerful convolutional neural network.

Prerequisites

Basic proficiency with Python, including variables, loops, installing and importing packages, collections, list comprehensions.
Know how to declare NumPy arrays in Python (See NumPy documentation).

Setup

To follow along using your desktop IDE:

Install or update to the latest version of Anaconda
Launch your command line tool and configure your conda environment

For macOS and Linux users: Search and launch Terminal in your system
For Windows users: Locate and launch Anaconda Prompt in your system

You can find the .ipynb file I am working on here https://github.com/Fortune-Ndlovu/ML

The MNIST Dataset

The MNIST handwritten digit recognition problem is the "Hello World" of computer vision problems. When we talk about computer vision, we are classifying images algorithmically. But because it is difficult to explicitly code an algorithm to recognize images of dogs versus cats, or the digits 0,1,2,3... in handwriting, it is advantageous and more practical to use machine learning to find patterns in pre-labeled training images. This is a balancing act though, because learning patterns too well and too tightly can cause overfitting.

For now, let's focus on classifying handwritten digits. This is highly applicable. For example, I use my iPad with an Apple Pencil to take handwritten notes or write text inputs in apps. This is achieved through character recognition software that likely was trained using deep learning. Ironically, this rarely is branded as AI anymore as we take it for granted. We can practice this problem on a smaller scale using the MNIST dataset.

The MNIST dataset was developed by Yann LeCun and his colleagues to test machine learning models for handwritten digit recognition. It was derived from scanned documents provided by the National Institute of Standards and Technology (NIST), hence the name Modified NIST (MNIST).

The digits were scanned, rescaled so they all matched in size, and centered in each image. Many computer vision projects require this kind of cleaning, so it is worth appreciating that the work is already done and we can jump right in. The images are 28 by 28 pixels, making each image 784 pixels in total. There is no color: each pixel is a grayscale value from 0 through 255, which we can rescale to be between 0 and 1. There are 70,000 images in the dataset in total.

Let's bring in the dataset and sample 5 records.

import pandas as pd

df = pd.read_csv('https://github.com/Fortune-Ndlovu/ML/raw/refs/heads/main/mnist_784.zip')

df.sample(5)

	...	class
38715	...	2
14828	...	5
38771	...	3
54871	...	7
4321	...	0

5 rows × 785 columns

Note that each image is represented by one row, with 784 columns each representing the value of each pixel. This may seem counterintuitive at first, but each image is being represented by a 1-dimensional vector. The last column class is the label indicating what digit this image represents.
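The relationship between the flat 784-value row and the 28×28 grid can be sketched in plain Python. This assumes row-major ordering (the first 28 values are the top row of the image), which is also the order a default reshape(28, 28) uses; the helper names are hypothetical.

```python
WIDTH = 28

def flat_index(row, col, width=WIDTH):
    """Position of pixel (row, col) inside the flattened vector."""
    return row * width + col

def grid_position(index, width=WIDTH):
    """Inverse mapping: flattened index back to (row, col)."""
    return divmod(index, width)

def to_grid(flat, width=WIDTH):
    """Cut a flat list into consecutive rows of `width` values."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]

# Use the index itself as a fake pixel value so the mapping is easy to check
flat_image = list(range(WIDTH * WIDTH))
grid = to_grid(flat_image)

print(len(grid), len(grid[0]))  # rows, columns
print(flat_index(1, 0))         # first pixel of the second row
print(grid_position(783))       # last pixel of the image
```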

To make this data more intuitive, we randomly select 9 samples, reshape each one into a 28×28 matrix, and display them as images. Here's what that looks like:

import matplotlib.pyplot as plt
sample_imgs = df.sample(9)
fig, axes = plt.subplots(3, 3, figsize=(6, 6))

for i, ax in enumerate(axes.flat):
    img = sample_imgs.iloc[i, :-1].values.reshape(28, 28)
    label = sample_imgs.iloc[i, -1]
    ax.imshow(img, cmap='gray')
    ax.set_title(f"Label: {label}")
    ax.axis('off')

plt.tight_layout()
plt.show()

The code is also pretty simple:

df.sample(9) randomly picks 9 digits from the dataset.

Each image is reshaped from a flat vector back to a 28x28 grid using .reshape(28, 28).

matplotlib is used to plot the images in a 3×3 grid.

The labels are shown above each image so you know what digit it represents.

This simple visualization makes the data feel real as it's no longer just numbers in a table. We're now looking at actual handwriting that our model will soon learn to recognize.

Interestingly, if you look closely at the pixel matrix (without formatting or reshaping), you can actually make out the shape of a digit just from the raw numbers. This works because non-zero values represent the strokes of the handwritten digit.

Let’s bring that to life by visualizing one digit as a heatmap, where brighter colors indicate higher pixel intensity (i.e. more ink):

import matplotlib.pyplot as plt

# Grab the first sample image (excluding the label) and reshape it
img_matrix = sample_imgs.iloc[0, :-1].values.reshape(28, 28)

plt.imshow(img_matrix, cmap='hot')
plt.title(f"Heatmap of Digit: {sample_imgs.iloc[0, -1]}")
plt.axis('off')
plt.show()

We're using the 'hot' colormap to highlight areas of high intensity.

Bright (yellow/white) regions represent the strokes of the digit.

Darker (black/red) areas are the background or "no ink" zones.

This is exactly the kind of structure a neural network will learn to pick up on, where the ink is, how it's shaped, and what patterns define a 6 versus an 8.

Normalize the Pixel Values

Before feeding the data into a neural network, we need to normalize the pixel values.

Why? Because each pixel is a value between 0 and 255, and neural networks perform better when the input values are on a smaller, consistent scale, typically between 0 and 1.

from sklearn.model_selection import train_test_split

# Split features and labels
X = df.iloc[:, :-1].values / 255.0  # normalize
y = df.iloc[:, -1].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We divide every pixel by 255 to convert values from the range [0, 255] to [0, 1]. This small step improves training speed and stability.

Reshape for CNN Input

Convolutional Neural Networks (CNNs) expect input data with height, width, and channels. Right now, each image is a flat vector of 784 values.

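Let’s reshape it into (28, 28, 1) format, where the last dimension, 1, is the single grayscale channel. This channels-last layout is what frameworks such as Keras use by default; the PyTorch code later in this post uses the channels-first layout (1, 28, 28) instead.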

X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

The -1 tells NumPy to automatically figure out the batch size. We’re just reshaping each image from a 1D vector into a 2D matrix with a single channel.
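To make the -1 less mysterious, here is a small sketch of the arithmetic behind it: the unknown dimension is the total number of values divided by the product of the known dimensions. The function infer_shape is a hypothetical helper written for this example, not part of NumPy, and it assumes at most one -1 in the shape.

```python
def infer_shape(total_elements, shape):
    """Resolve a single -1 in `shape` the way a reshape call would."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_elements % known != 0:
        raise ValueError(f"cannot reshape {total_elements} values into {shape}")
    return tuple(total_elements // known if dim == -1 else dim for dim in shape)

# 80% of the 70,000 images end up in the training split
train_images = 56_000
train_values = train_images * 784

print(infer_shape(train_values, (-1, 28, 28, 1)))  # channels-last
print(infer_shape(train_values, (-1, 1, 28, 28)))  # channels-first
```

Both layouts hold exactly the same numbers; only the position of the channel axis differs.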

Now that our data is cleaned and reshaped properly, let’s define a Convolutional Neural Network using PyTorch.

CNNs are especially good at capturing spatial patterns in images. Instead of treating pixels as independent features (like in a basic neural network), CNNs use filters to scan across images and learn patterns like edges, curves, and ultimately digits.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

Prepare the Data

We prepare the data by first separating the pixel values (features) from the labels (digit classes), then normalize the features by dividing by 255.0 to scale pixel values to the [0, 1] range, which helps the neural network train more effectively. After splitting the data into training and test sets, we reshape each image into the (1, 28, 28) format expected by PyTorch CNNs, where 1 is the number of color channels (grayscale). We then convert the NumPy arrays into PyTorch tensors, and wrap them in TensorDataset objects. Finally, we use DataLoader to efficiently batch and shuffle the data for training and evaluation.

# Split features and labels
X = df.iloc[:, :-1].values / 255.0
y = df.iloc[:, -1].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reshape and convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train.reshape(-1, 1, 28, 28), dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test.reshape(-1, 1, 28, 28), dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Create datasets and dataloaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

Define the CNN Model

We define the CNN model to create a neural network architecture that is specifically designed to process image data by learning spatial patterns. Convolutional layers (conv1 and conv2) detect local features like edges and curves, while max pooling layers reduce the spatial dimensions to make the model more efficient and reduce overfitting. The output from the convolutional layers is flattened and passed through fully connected layers (fc1 and fc2) to make predictions. We also use ReLU activations for non-linearity and a dropout layer to help prevent overfitting during training. This architecture transforms input images into class scores representing the digits 0 through 9.

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 64)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 28x28 → 14x14
        x = self.pool(F.relu(self.conv2(x)))  # 14x14 → 7x7
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

model = CNN()
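The 64 * 7 * 7 in fc1 follows from how each layer changes the image size. The sketch below traces those sizes with the standard convolution and pooling output formulas and also counts the trainable parameters per layer. It is plain arithmetic in Python with hypothetical helper names and does not need PyTorch.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of max pooling along one dimension."""
    return (size - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features

size = 28
size = conv_out(size, kernel=3, padding=1)  # conv1 keeps 28x28
size = pool_out(size)                       # pool halves it to 14x14
size = conv_out(size, kernel=3, padding=1)  # conv2 keeps 14x14
size = pool_out(size)                       # pool halves it to 7x7
flat_features = 64 * size * size

layer_params = {
    "conv1": conv_params(1, 32, 3),
    "conv2": conv_params(32, 64, 3),
    "fc1": linear_params(flat_features, 64),
    "fc2": linear_params(64, 10),
}
total_params = sum(layer_params.values())

print(size, flat_features)
print(layer_params, total_params)
```

Most of the parameters sit in fc1, the first fully connected layer, which is typical for small CNNs that flatten a feature map into a dense layer.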

Train the Model

We train the model to adjust its internal parameters (weights and biases) so it can accurately classify handwritten digits. This process uses the training data to minimize prediction errors over multiple epochs. We define a loss function (CrossEntropyLoss) that measures how far the model's predictions are from the actual labels, and use the Adam optimizer to update the model’s parameters based on this loss. For each batch of data, we perform a forward pass to get predictions, compute the loss, backpropagate the error, and update the weights. Tracking the running loss and accuracy over each epoch helps us monitor the learning progress.
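To get a feel for what the loss number means, here is a minimal sketch of cross-entropy for a single example, computed from raw class scores (logits) in plain Python. It mirrors the per-example formula; PyTorch's CrossEntropyLoss additionally averages over the batch by default. The logit values are made up for the example.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# An untrained model that scores all 10 digits equally
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A model that is confident in class 3
confident_logits = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right_loss = cross_entropy(confident_logits, target=3)  # confident and correct
wrong_loss = cross_entropy(confident_logits, target=0)  # confident and wrong

print(f"uniform={uniform_loss:.4f} right={right_loss:.4f} wrong={wrong_loss:.4f}")
```

A model guessing uniformly over 10 digits has a loss of ln(10), about 2.30, so anything well below that means the network has learned something; confident wrong answers are penalized heavily.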

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f"Epoch {epoch+1}, Loss: {running_loss:.3f}, Accuracy: {100 * correct / total:.2f}%")

Epoch 1, Loss: 386.764, Accuracy: 85.88%
Epoch 2, Loss: 150.673, Accuracy: 94.69%
Epoch 3, Loss: 116.227, Accuracy: 95.82%
Epoch 4, Loss: 99.392, Accuracy: 96.49%
Epoch 5, Loss: 88.411, Accuracy: 96.71%
Epoch 6, Loss: 79.443, Accuracy: 97.05%
Epoch 7, Loss: 70.979, Accuracy: 97.34%
Epoch 8, Loss: 63.954, Accuracy: 97.54%
Epoch 9, Loss: 60.638, Accuracy: 97.70%
Epoch 10, Loss: 57.378, Accuracy: 97.73%

During training, the model gradually improves its ability to classify handwritten digits by minimizing the loss and increasing accuracy across epochs. In the first epoch, the model starts with a relatively high loss (386.764) and an initial accuracy of 85.88%. As training progresses, the loss consistently decreases while accuracy steadily increases, reaching 97.73% by epoch 10. This shows that the model is learning useful features from the data and becoming more confident in its predictions, with less error and better performance over time.
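The loss values in the log look large because the loop adds up loss.item() for every batch in the epoch. A quick sketch of the arithmetic, assuming the 80/20 split of 70,000 images and the batch size of 64 used above, converts those sums into an average loss per batch:

```python
import math

total_images = 70_000
test_images = total_images * 20 // 100       # 20% held out for testing
train_images = total_images - test_images    # 56,000 training images
batch_size = 64
batches_per_epoch = math.ceil(train_images / batch_size)

# Summed losses copied from the first and last lines of the training log
epoch_loss_sums = {1: 386.764, 10: 57.378}

average_loss = {
    epoch: loss_sum / batches_per_epoch
    for epoch, loss_sum in epoch_loss_sums.items()
}

print(batches_per_epoch)
for epoch, value in average_loss.items():
    print(f"epoch {epoch}: average loss per batch = {value:.4f}")
```

Dividing running_loss by len(train_loader) inside the loop would print this average directly and make runs with different batch sizes easier to compare.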

Evaluate the Model on Test Set

model.eval()  # set to evaluation mode

correct = 0
total = 0

with torch.no_grad():  # disable gradient tracking for inference
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

Test Accuracy: 99.04%

import matplotlib.pyplot as plt

# Get a small batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# Run the model on this batch
outputs = model(images)
_, preds = torch.max(outputs, 1)

# Plot the first 6 images with predictions
fig, axes = plt.subplots(2, 3, figsize=(9, 6))

for i, ax in enumerate(axes.flat):
    img = images[i].squeeze().numpy()  # remove channel dimension
    ax.imshow(img, cmap='gray')
    ax.set_title(f"Predicted: {preds[i].item()}\nActual: {labels[i].item()}")
    ax.axis('off')

plt.tight_layout()
plt.show()

Let's plot how the training loss and accuracy evolved over the 10 epochs:

import matplotlib.pyplot as plt

# Training metrics copied from the training log above
epochs = list(range(1, 11))
loss = [386.764, 150.673, 116.227, 99.392, 88.411, 79.443, 70.979, 63.954, 60.638, 57.378]
accuracy = [85.88, 94.69, 95.82, 96.49, 96.71, 97.05, 97.34, 97.54, 97.70, 97.73]

# Plotting
fig, ax1 = plt.subplots(figsize=(10, 5))

color = 'tab:red'
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss', color=color)
ax1.plot(epochs, loss, marker='o', color=color, label='Loss')
ax1.tick_params(axis='y', labelcolor=color)
ax1.set_title('Training Loss and Accuracy Over Epochs')

# Second y-axis for accuracy
ax2 = ax1.twinx()
color = 'tab:blue'
ax2.set_ylabel('Accuracy (%)', color=color)
ax2.plot(epochs, accuracy, marker='s', linestyle='--', color=color, label='Accuracy')
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout()
plt.grid(True)
plt.show()

As the loss sharply decreases, the accuracy steadily increases, indicating that the model is learning meaningful features from the MNIST dataset.

Conclusion

Through this project, we’ve seen how convolutional neural networks (CNNs) can effectively learn to classify handwritten digits using the MNIST dataset. From data preprocessing and normalization, to reshaping images for CNN input, to building and training a deep learning model using PyTorch, we’ve followed the complete image classification pipeline. With over 99% accuracy on the test set, the results clearly highlight the power of CNNs in computer vision tasks. As AI continues to evolve, foundational projects like this provide essential insights into how machines learn to see and understand visual data. Whether you're building digit recognizers or training models for more complex vision problems, these techniques remain core building blocks.

Install Red Hat Developer Hub (RHDH) in a fully air-gapped Minikube environment

Fortune Ndlovu — Thu, 03 Apr 2025 13:42:47 +0000

This guide ensures that every component is self-contained, using only the resources available on your air-gapped machine. That means no external access and no file transfers; everything is generated locally.

Prerequisites

Ensure the following tools are installed on the air-gapped machine:

docker --version       # Verify Docker is installed
minikube version       # Ensure Minikube is installed
kubectl version --client  # Check if kubectl is installed
helm version           # Helm is required for deployment

Start Minikube with enough resources:

minikube start --driver=docker --cpus=4 --memory=8192 --no-vtx-check

Important: The --no-vtx-check flag skips the hardware virtualization check. It only applies to the VirtualBox driver, so with --driver=docker it has no effect and can be omitted.

1. Download and Save Container Images (Locally)

Since your machine has no external access, you need to download the images directly using Podman or Docker.

Step 1.1: Pull Required Images (Locally)

Run the following on the air-gapped machine:

podman pull registry.redhat.io/rhdh/rhdh-hub-rhel9:1.4
podman pull registry.redhat.io/rhel9/postgresql-15:latest

Step 1.2: Save Images as .tar Files

Once the images are pulled locally, save them to .tar archives:

podman save -o rhdh-hub.tar registry.redhat.io/rhdh/rhdh-hub-rhel9:1.4
podman save -o postgresql.tar registry.redhat.io/rhel9/postgresql-15:latest

Step 1.3: Load Images Into Minikube

Minikube does not have direct access to the host’s images, so we must load them manually.

Create a directory inside Minikube for storing the images:

minikube ssh -- mkdir -p /home/docker

Copy the .tar files into Minikube’s internal storage:

minikube cp rhdh-hub.tar /home/docker/rhdh-hub.tar
minikube cp postgresql.tar /home/docker/postgresql.tar

Load the images inside Minikube:

minikube ssh -- docker load -i /home/docker/rhdh-hub.tar
minikube ssh -- docker load -i /home/docker/postgresql.tar

Verify the images are available inside Minikube:

minikube ssh -- docker images | grep rhdh-hub
minikube ssh -- docker images | grep postgresql

If necessary, manually tag the images:

minikube ssh -- docker tag registry.redhat.io/rhdh/rhdh-hub-rhel9:1.4 rhdh-hub-rhel9:1.4
minikube ssh -- docker tag registry.redhat.io/rhel9/postgresql-15:latest postgresql-15:latest

Create a Namespace

kubectl create namespace rhdh

2. Configure Persistent Storage for PostgreSQL

Since Minikube won’t automatically create storage, manually configure PersistentVolume (PV) and PersistentVolumeClaim (PVC).

Step 2.1: Create Persistent Volume

Create a file called pv.yaml:

cat <<EOF > pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data/postgres"
EOF

Apply it:

kubectl apply -f pv.yaml

Create the directory inside Minikube:

minikube ssh -- sudo mkdir -p /mnt/data/postgres
minikube ssh -- sudo chmod 777 /mnt/data/postgres

3. Deploy PostgreSQL

Create a file called postgres.yaml:

cat <<EOF > postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: rhdh
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: rhdh
spec:
  ports:
    - port: 5432
  selector:
    app: postgres
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: rhdh
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        fsGroup: 26  # Fix permission issues
      containers:
        - name: postgres
          image: postgresql-15:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: POSTGRESQL_DATABASE
              value: "rhdh"
            - name: POSTGRESQL_USER
              value: "rhdh"
            - name: POSTGRESQL_PASSWORD
              value: "rhdhpassword"
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/pgsql/data
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: postgres-pvc
EOF

Apply it:

kubectl apply -f postgres.yaml

4. Deploy RHDH

Create a file called rhdh.yaml:

cat <<EOF > rhdh.yaml
apiVersion: v1
kind: Service
metadata:
  name: rhdh
  namespace: rhdh
spec:
  ports:
    - port: 7007
      targetPort: 7007
      nodePort: 31207
  selector:
    app: rhdh
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rhdh
  namespace: rhdh
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rhdh
  template:
    metadata:
      labels:
        app: rhdh
    spec:
      initContainers:
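        - name: wait-for-db
          # Reuse the PostgreSQL image loaded in Step 1.3; alpine cannot be pulled on an air-gapped node
          image: postgresql-15:latest
          imagePullPolicy: IfNotPresent
          command: ['sh', '-c', 'until pg_isready -h postgres.rhdh.svc.cluster.local -p 5432; do echo waiting for database; sleep 2; done;']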
      containers:
        - name: rhdh
          image: rhdh-hub-rhel9:1.4
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args:
            - "mkdir -p /opt/app-root/src/dynamic-plugins-root && exec node packages/backend"
          env:
            - name: DATABASE_URL
              value: "postgresql://rhdh:rhdhpassword@postgres.rhdh.svc.cluster.local:5432/rhdh"
            - name: PGHOST
              value: "postgres.rhdh.svc.cluster.local"
            - name: PGPORT
              value: "5432"
            - name: PGPASSWORD
              value: "rhdhpassword"
            - name: APP_CONFIG_app_baseUrl
              value: "http://192.168.49.2:31207"
            - name: APP_CONFIG_backend_baseUrl
              value: "http://192.168.49.2:31207/api"
            - name: BACKEND_PORT
              value: "7007"
            - name: HOST
              value: "0.0.0.0"
            - name: NODE_ENV
              value: "production"
          ports:
            - containerPort: 7007
          volumeMounts:
            - name: plugins-volume
              mountPath: /opt/app-root/src/dynamic-plugins-root
      volumes:
        - name: plugins-volume
          emptyDir: {}
EOF

Apply it:

kubectl apply -f rhdh.yaml

5. Verify & Access RHDH

Check if the pods are running:

kubectl get pods -n rhdh

Check logs:

kubectl logs -n rhdh -l app=rhdh

Use port forwarding:

kubectl port-forward svc/rhdh 7007:7007 -n rhdh

Access RHDH at:

http://localhost:7007

Most Used Git Commands: A Step-by-Step Guide with Examples

Fortune Ndlovu — Thu, 03 Apr 2025 13:23:28 +0000

Whether you're new to Git or need a quick refresher, this post will walk you through the most commonly used Git commands with practical examples. Each command is paired with input/output and real-world context so you can learn by doing. Let’s dive in!

1. `git init` – Start a New Git Repository

Use Case:

You're creating a new project and want to start tracking it with Git.

Input:

mkdir my-app && cd my-app
git init

Output:

Initialized empty Git repository in /home/user/my-app/.git/

Explanation:

This creates a .git directory that tracks your project. It's now a local Git repo.

2. `git clone` – Copy an Existing Repository

Use Case:

You want to contribute to an existing project or start working locally on one from GitHub.

Input:

git clone https://github.com/octocat/Hello-World.git

Output:

Cloning into 'Hello-World'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.

Explanation:

Creates a new directory (Hello-World/) and downloads all files and history from the repo.

3. `git status` – Check Your Work

Use Case:

You want to see what’s changed, what’s staged, and what’s untracked.

Input:

git status

Output:

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    index.html

Explanation:

Shows which files are untracked (new to Git), modified, or staged. Helpful before committing.

4. `git add` – Stage Changes

Use Case:

You're ready to tell Git which files to include in the next commit.

Input:

git add index.html

Output:

(no output on success)

Explanation:

Stages the file so it will be included in the next commit. Use . to stage all files:

git add .

5. `git commit` – Save a Snapshot

Use Case:

You're done with a chunk of work and want to checkpoint your progress.

Input:

git commit -m "Add landing page"

Output:

[main abc1234] Add landing page
 1 file changed, 12 insertions(+)

Explanation:

Commits all staged files to your local repo with a message.

6. `git push` – Upload Changes to Remote

Use Case:

You want to push your local commits to GitHub (or another remote).

Input:

git push origin main

Output:

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
To https://github.com/user/my-app.git
   abc1234..def5678  main -> main

Explanation:

Sends your local commits to the remote repository. Often used after commit.

7. `git pull` – Get Latest Changes

Use Case:

Other team members made changes and you want to fetch + merge them into your local copy.

Input:

git pull origin main

Output:

Updating abc1234..def5678
Fast-forward
 style.css | 4 ++++

Explanation:

Downloads and merges new changes from the remote branch.

8. `git branch` – Manage Branches

View all branches:

git branch

Create a new branch:

git branch feature/login

Switch to a branch:

git checkout feature/login

Output:

Switched to branch 'feature/login'

Explanation:

Branches help isolate features or fixes. Always create a branch for new work!

9. `git merge` – Combine Branches

Use Case:

You're done working in a feature branch and want to merge it into main.

Input:

git checkout main
git merge feature/login

Output:

Updating abc1234..def5678
Fast-forward
 login.js | 20 ++++++++++++++++++++

Explanation:

Combines the changes from the feature branch into your main branch.

10. `git log` – View Commit History

Input:

git log --oneline

Output:

def5678 Add login functionality
abc1234 Add landing page
3e1f51a Initial commit

Explanation:

Shows a compact history of commits. Super helpful for understanding progress or debugging.

Bonus: Undoing Mistakes

Unstage a file:

git restore --staged file.txt

Discard changes:

git restore file.txt

Reset last commit (keep changes):

git reset --soft HEAD~1

Summary Table

Command	Description
`git init`	Start a new repo
`git clone`	Copy an existing repo
`git status`	Check file changes
`git add`	Stage files for commit
`git commit`	Record staged changes
`git push`	Upload commits to remote
`git pull`	Get latest changes from remote
`git branch`	Create/list branches
`git checkout`	Switch branches
`git merge`	Combine branches
`git log`	View commit history

Final Thoughts

These commands form the backbone of your daily Git workflow. Whether you're building a side project, collaborating on a team, or contributing to open source, mastering these basics will keep your code safe, trackable, and team-friendly.

Understanding Node.js Streams — The Unsung Hero of I/O

Fortune Ndlovu — Thu, 03 Apr 2025 13:14:19 +0000

When working with Node.js, you’ve probably dealt with reading files, sending HTTP responses, or handling data from APIs. But have you ever wondered how Node handles large data efficiently?

Welcome to the world of Streams.

What Are Streams in Node.js?

Streams are a powerful way to handle large amounts of data piece by piece instead of loading it all into memory at once.

They follow a simple idea: process data as it comes. Whether it's reading a file, sending a response, or piping video data — streams let you read, write, and transform data efficiently.

There are four types of streams in Node.js:

Readable – for reading data (e.g., fs.createReadStream)
Writable – for writing data (e.g., fs.createWriteStream)
Duplex – both readable and writable (e.g., a TCP socket)
Transform – a special type of Duplex that modifies data as it passes through (e.g., gzip compression)
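The Transform type is the least obvious of the four, so here is a minimal sketch of one. The createUppercase and shout helpers are made-up names for illustration, not part of Node's API; the sketch only assumes the built-in stream module. The final stage of the pipeline is an async function that collects whatever the transform emits.

```javascript
const { Readable, Transform } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical Transform for illustration: upper-cases each chunk as it passes through.
function createUppercase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      // First argument is an error (none here), second is the transformed output.
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Readable source -> Transform -> async function that gathers the output.
async function shout(words) {
  const parts = [];
  await pipeline(
    Readable.from(words),
    createUppercase(),
    async function collect(source) {
      for await (const chunk of source) parts.push(chunk.toString());
    }
  );
  return parts.join('');
}

shout(['streams ', 'are ', 'neat']).then(console.log);
```

The same shape works for real transforms such as zlib.createGzip(): swap it in where createUppercase() sits and replace the collector with a writable stream.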

Why Streams Matter

Imagine you’re reading a 1GB log file. With fs.readFile(), you’re loading the entire file into memory. That’s risky.

With streams:

const fs = require('fs');

const stream = fs.createReadStream('bigfile.log', { encoding: 'utf8' });

stream.on('data', (chunk) => {
  console.log('Received chunk:', chunk.length);
});

stream.on('end', () => {
  console.log('Done reading file.');
});

This reads the file in small chunks, using less memory and allowing other operations to continue — great for performance and scalability.

Stream Piping

One of the coolest things about streams is piping — connecting one stream to another:

const fs = require('fs');

const readStream = fs.createReadStream('input.txt');
const writeStream = fs.createWriteStream('output.txt');

readStream.pipe(writeStream);

Boom! You just copied a file using streams — memory efficient and elegant.

Real-World Use Cases

Reading/writing large files
Streaming video/audio content
Handling HTTP requests/responses
Processing data in real-time (e.g., CSV parsing, compression)

Gotchas and Tips

Always handle stream errors: stream.on('error', handler)
Backpressure can occur — use .pipe() to manage it automatically
Prefer async/await with pipeline for better readability in modern apps

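const fs = require('fs');
const { pipeline } = require('stream/promises');

async function copyFile() {
  // pipeline connects the streams, forwards errors, and cleans up on failure
  await pipeline(
    fs.createReadStream('input.txt'),
    fs.createWriteStream('output.txt')
  );
}

copyFile().catch(console.error);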
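When you call write() yourself instead of using .pipe() or pipeline, backpressure becomes your job. The sketch below shows the usual pattern: stop writing when write() returns false and resume on the 'drain' event. The writeAll and createSlowSink helpers and the tiny highWaterMark are assumptions chosen to make the effect visible, not recommended settings.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Write every line to `writable`, pausing whenever its internal buffer is full.
async function writeAll(writable, lines) {
  for (const line of lines) {
    // write() returns false once the buffered data reaches highWaterMark.
    if (!writable.write(line)) {
      await once(writable, 'drain'); // resume only after the buffer empties
    }
  }
  // Register the listener before end() so the 'finish' event cannot be missed.
  const done = once(writable, 'finish');
  writable.end();
  await done;
}

// Hypothetical slow sink for illustration: a tiny buffer and a delayed write.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 4, // bytes; deliberately small so backpressure kicks in
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // pretend the underlying resource is slow
    },
  });
}

const received = [];
writeAll(createSlowSink(received), ['one\n', 'two\n', 'three\n'])
  .then(() => console.log('wrote', received.length, 'chunks'));
```

Ignoring the return value of write() still works, but the data then piles up in memory, which is exactly what streams are meant to avoid.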

TL;DR

Streams are the secret sauce behind Node.js's performance with I/O. They let you process data efficiently, save memory, and keep your app fast and responsive.

Next time you’re reading files, dealing with APIs, or handling data flows — think streams.

Have you used streams in your projects? Any cool patterns or struggles? Drop them in the comments! 💬👇

Is jQuery Dead?

Fortune Ndlovu — Thu, 03 Apr 2025 13:10:02 +0000

Ah, jQuery — the library that powered a generation of web apps.

In the early 2010s, it was nearly impossible to build a frontend without it. DOM manipulation, AJAX calls, cross-browser compatibility — jQuery was the go-to solution. Fast forward to today, and modern JavaScript frameworks like React, Vue, and Svelte dominate the landscape.

So, is jQuery dead?

Not Exactly.

While it's no longer the star of modern frontend development, jQuery is far from dead. In fact, it still ships with WordPress, is used by countless legacy apps, and continues to have millions of downloads per week on npm.

Here's why jQuery is still alive:

Legacy Systems: Thousands of websites still run on jQuery. Rewriting them from scratch isn't always feasible.
Simplicity: For quick scripts and small projects, jQuery still offers a fast way to handle DOM manipulation.
Developer Familiarity: Many developers — especially those maintaining older projects — are comfortable with it.

But Here's the Catch:

Modern JavaScript has caught up. With querySelector, fetch, classList, and more, the main reasons for using jQuery are now natively supported.
Frameworks Rule: Component-based frameworks and reactive libraries make jQuery's imperative model feel outdated.
Performance & Bundle Size: In performance-sensitive apps, the extra weight of jQuery is hard to justify.
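To make the first point concrete, here is a rough side-by-side sketch with the jQuery call in a comment above its native equivalent. The #menu selector, the /api/posts URL, and the helper names are made up for illustration. The helpers take the document and fetch as arguments only so the snippet can run outside a browser; in a page you would use the globals directly.

```javascript
// jQuery:  $('#menu').toggleClass('open');
function toggleMenu(doc) {
  doc.querySelector('#menu').classList.toggle('open');
}

// jQuery:  $.getJSON('/api/posts').done(render);
async function loadPosts(fetchImpl = fetch) {
  const response = await fetchImpl('/api/posts');
  // Unlike $.getJSON, fetch does not reject on HTTP errors, so check explicitly.
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}
```

In a browser, toggleMenu(document) and loadPosts() are all you need, with no extra library on the page.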

TL;DR

jQuery isn't dead — it's just retired from the spotlight.

It’s a legacy tool that still works, but if you're starting a new project in 2025, it’s probably best to skip it.

What do you think? Are you still using jQuery in any of your projects? Let me know in the comments! 👇

What is Agile and What It's Definitely Not

Fortune Ndlovu — Thu, 03 Apr 2025 12:39:23 +0000

“Agile” gets thrown around a lot in tech. Some teams think daily standups and Jira boards mean they’re Agile. But Agile isn’t a checklist of ceremonies; it’s a mindset.

In this post, we’ll unpack what Agile really means and what it doesn’t, with examples of each.

What Agile Is

Agile is about delivering value iteratively, working closely with stakeholders, and being flexible in the face of change. It’s rooted in the Agile Manifesto, which values:

Individuals & interactions over processes & tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Let us look at a few real-world examples.

Iterative Delivery

You launch a minimal version of a developer portal, just enough to be useful, and ship updates weekly based on user feedback. That’s Agile. You’re delivering early, gathering insights, and improving fast.

Retrospectives That Drive Change

After each sprint, your team reflects and applies one concrete improvement. Agile isn't just about building products; it’s about building better teams over time.

What Agile Is Not

Agile is not a process to follow blindly or a way to rebrand Waterfall. It’s not micromanagement in disguise, and it’s definitely not ignoring feedback because “it’s in the sprint plan.”

Micromanagement with Jira

Jira tracks every task, but the team has zero say in scope or implementation. That’s command-and-control, not Agile. Agile thrives on trust, autonomy, and shared goals.

Ignoring Change

Users raise concerns mid-sprint, but the team says “Too late, it's already planned.” Agile teams embrace change, even mid-sprint, if it leads to better outcomes.

Agile Is a Mindset, Not a Method

Scrum, Kanban, and SAFe are frameworks, not guarantees. Agile is about adaptability, collaboration, and continuous improvement. Tools help, but mindset matters more.

Agile isn't about doing Agile. It’s about being Agile.

Final Thoughts

If your team delivers working software, listens to users, adapts to feedback, and improves each sprint, that’s Agile.

If you're just checking boxes, it might be time to rethink what agility really means.
