<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: V K Adhithiya Kumar</title>
    <description>The latest articles on Forem by V K Adhithiya Kumar (@devadhithiya).</description>
    <link>https://forem.com/devadhithiya</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838791%2F93aaadc3-b3c7-483e-a0cf-098848dbbef9.png</url>
      <title>Forem: V K Adhithiya Kumar</title>
      <link>https://forem.com/devadhithiya</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/devadhithiya"/>
    <language>en</language>
    <item>
      <title>Beyond the Chatbot: Engineering a Hybrid AI Math Tutor for the Future</title>
      <dc:creator>V K Adhithiya Kumar</dc:creator>
      <pubDate>Sun, 22 Mar 2026 18:52:13 +0000</pubDate>
      <link>https://forem.com/devadhithiya/beyond-the-chatbot-engineering-a-hybrid-ai-math-tutor-for-the-future-86g</link>
      <guid>https://forem.com/devadhithiya/beyond-the-chatbot-engineering-a-hybrid-ai-math-tutor-for-the-future-86g</guid>
      <description>&lt;p&gt;Building AI tools for education is tricky. Schools and students need the intelligence of cutting-edge LLMs, but they also need strict privacy, offline capabilities, and guardrails against prompt injection and toxic outputs. &lt;/p&gt;

&lt;p&gt;For this hackathon, I built &lt;strong&gt;Neural Math Lab&lt;/strong&gt;: a React/Vite-based math orchestrator that seamlessly switches between &lt;strong&gt;Azure OpenAI (with RAG)&lt;/strong&gt; and &lt;strong&gt;local Ollama (DeepSeek-R1 and MiniCPM-V)&lt;/strong&gt;, all sitting behind a custom security proxy. &lt;/p&gt;

&lt;p&gt;Here is how I built a system designed for the &lt;strong&gt;Offline-Ready AI&lt;/strong&gt; and &lt;strong&gt;Agentic System Architecture&lt;/strong&gt; tracks.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/dev-Adhithiya/Neural-Math-Lab" rel="noopener noreferrer"&gt;https://github.com/dev-Adhithiya/Neural-Math-Lab&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The Architecture: Client, Proxy, and Intelligence
&lt;/h2&gt;

&lt;p&gt;I wanted to build something beyond a simple API wrapper. The app is split into a frontend UI and a Node.js backend proxy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend (React + Vite):&lt;/strong&gt; Handles the UI, the Node-link Topic Map for navigation, and local state management (IndexedDB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Proxy (Node.js/Express):&lt;/strong&gt; The true engine of the app. It holds all Azure keys securely server-side and runs all prompts through policy middleware before they ever reach an LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The AI Layer:&lt;/strong&gt; A toggleable hybrid system connecting to either Azure AI Foundry/OpenAI or a local Ollama instance.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛡️ Responsible AI &amp;amp; The Security Proxy
&lt;/h2&gt;

&lt;p&gt;One of my main focuses was building "Enterprise-grade" safety into an educational tool. If a student tries to jailbreak the tutor, the system needs to catch it.&lt;/p&gt;

&lt;p&gt;Instead of calling LLMs directly from the browser, I routed everything through a custom backend proxy. This allowed me to implement a robust &lt;strong&gt;Policy Middleware&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Injection Filter:&lt;/strong&gt; Detects and blocks system override attempts before the LLM processes them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety Categories:&lt;/strong&gt; Scans for violence, self-harm, hate speech, and cyber abuse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict Mode:&lt;/strong&gt; A toggleable setting that completely blocks flagged outputs rather than just warning the user.&lt;/li&gt;
&lt;/ul&gt;
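&lt;p&gt;As a rough sketch of what such policy middleware can look like (the patterns and function names here are illustrative, not the project's actual code):&lt;/p&gt;

```javascript
// Sketch of a prompt-injection policy check. Patterns and names are
// illustrative assumptions, not the project's actual implementation.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) (instructions|rules)/i,
  /forget your (math )?rules/i,
  /reveal (your )?(system prompt|instructions)/i,
];

function checkPrompt(text) {
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.test(text)) {
      return { allowed: false, reason: "prompt_injection" };
    }
  }
  return { allowed: true, reason: null };
}

// Express-style middleware wrapper: reject flagged prompts with a 400
// before the request ever reaches an LLM.
function policyMiddleware(req, res, next) {
  const verdict = checkPrompt(String(req.body.prompt || ""));
  if (verdict.allowed) {
    return next();
  }
  return res.status(400).json({ error: "Blocked by policy", reason: verdict.reason });
}
```

&lt;p&gt;A real deployment would pair a denylist like this with a model-based classifier, since regex filters alone are easy to evade.&lt;/p&gt;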

&lt;p&gt;By keeping the Azure keys in the server &lt;code&gt;.env&lt;/code&gt;, the client remains entirely unprivileged.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔌 Hybrid Intelligence: Cloud RAG vs. Local Inference
&lt;/h2&gt;

&lt;p&gt;Not every student has a stable internet connection, and not every query needs to be sent to the cloud. Neural Math Lab features a unified streaming interface that supports two distinct modes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Local Mode (Privacy-First &amp;amp; Multi-AI Orchestration)
&lt;/h3&gt;

&lt;p&gt;Unlike standard implementations that rely on a single model, Neural Math Lab uses a Local Multi-Agent Pipeline via Ollama. This allows for complex, multimodal workflows entirely on-device:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Vision Agent (MiniCPM-V):&lt;/strong&gt; When a student uploads a photo of a handwritten equation or a geometric diagram, the app routes the image to MiniCPM-V. This specialized model "sees" the math, performing spatial reasoning to convert visual homework into structured digital text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Reasoning Agent (DeepSeek-R1):&lt;/strong&gt; Once the problem is digitized, it is passed to DeepSeek-R1. R1 doesn't just provide an answer; it uses its internal Chain-of-Thought (CoT) to "think" through the logical steps, explaining the process of solving the math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Network Latency &amp;amp; 100% Privacy:&lt;/strong&gt; Because both MiniCPM-V and DeepSeek-R1 run via Ollama, sensitive student data (like photos of a student's desk or handwriting) never leaves the machine.&lt;/li&gt;
&lt;/ul&gt;
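&lt;p&gt;The hand-off between the two agents can be sketched against Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint roughly like this (the function names and the &lt;code&gt;minicpm-v&lt;/code&gt; model tag are my assumptions, not necessarily the repo's code):&lt;/p&gt;

```javascript
// Sketch of the two-agent local pipeline against Ollama's REST API.
// Names and model tags are illustrative assumptions.
function buildVisionRequest(imageBase64) {
  return {
    model: "minicpm-v", // assumed Ollama tag for the vision model
    prompt: "Transcribe the math problem in this image as plain text.",
    images: [imageBase64],
    stream: false,
  };
}

function buildReasoningRequest(problemText) {
  return {
    model: "deepseek-r1:7b",
    prompt: "Solve step by step, explaining each transformation:\n" + problemText,
    stream: false,
  };
}

async function solveFromPhoto(imageBase64) {
  const endpoint = "http://localhost:11434/api/generate";
  // Agent 1: MiniCPM-V digitizes the handwritten problem.
  const seen = await fetch(endpoint, {
    method: "POST",
    body: JSON.stringify(buildVisionRequest(imageBase64)),
  }).then((r) => r.json());
  // Agent 2: DeepSeek-R1 reasons over the digitized text.
  const solved = await fetch(endpoint, {
    method: "POST",
    body: JSON.stringify(buildReasoningRequest(seen.response)),
  }).then((r) => r.json());
  return solved.response;
}
```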

&lt;h3&gt;
  
  
  2. Online Mode (Azure RAG)
&lt;/h3&gt;

&lt;p&gt;When connected to the internet, the app leverages an Azure OpenAI endpoint combined with &lt;strong&gt;Azure AI Search&lt;/strong&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I chunked and indexed a &lt;code&gt;math_textbook.pdf&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;When a student asks a complex question, the proxy fetches top matches from the Azure index and injects them as grounding context. The AI doesn't just guess; it teaches directly from the syllabus.&lt;/li&gt;
&lt;/ul&gt;
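&lt;p&gt;A minimal sketch of that retrieve-then-ground step (the endpoint, index name, field names, and function names are placeholders, not the actual proxy code):&lt;/p&gt;

```javascript
// Retrieval half, roughly: query Azure AI Search for top passages.
// SERVICE, index name, and the "content" field are placeholders.
async function searchTextbook(query) {
  const res = await fetch(
    "https://SERVICE.search.windows.net/indexes/math-textbook/docs/search?api-version=2023-11-01",
    {
      method: "POST",
      headers: { "api-key": process.env.AZURE_SEARCH_KEY, "Content-Type": "application/json" },
      body: JSON.stringify({ search: query, top: 3 }),
    }
  ).then((r) => r.json());
  return res.value.map((doc) => doc.content);
}

// Grounding half: inject the retrieved passages as system context.
function buildGroundedMessages(question, passages) {
  const context = passages
    .map((p, i) => "[" + (i + 1) + "] " + p)
    .join("\n\n");
  return [
    {
      role: "system",
      content:
        "You are a math tutor. Answer using ONLY the textbook excerpts below. " +
        "If the excerpts are not relevant, say so.\n\n" + context,
    },
    { role: "user", content: question },
  ];
}
```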




&lt;h2&gt;
  
  
  🔒 User Control and Data Retention
&lt;/h2&gt;

&lt;p&gt;To round out the privacy-first approach, I built comprehensive data controls directly into the frontend Settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Chat via IndexedDB:&lt;/strong&gt; Chats are saved locally in the browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted Local State:&lt;/strong&gt; Users can set a &lt;code&gt;VITE_LOCAL_VAULT_KEY&lt;/code&gt; to encrypt their local data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Delete Retention:&lt;/strong&gt; Configurable settings to automatically prune chat history after a set number of days, plus one-click export/delete controls.&lt;/li&gt;
&lt;/ul&gt;
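&lt;p&gt;The retention logic can be sketched as a pure age check plus an IndexedDB cursor sweep (names illustrative, not the app's actual code):&lt;/p&gt;

```javascript
// Sketch of the auto-delete retention policy. A chat record is pruned
// once it is older than the configured window.
function isExpired(savedAtMs, retentionDays, nowMs) {
  const maxAgeMs = retentionDays * 24 * 60 * 60 * 1000;
  return nowMs - savedAtMs > maxAgeMs;
}

// In the browser this would drive an IndexedDB cursor sweep, e.g.:
//   const tx = db.transaction("chats", "readwrite");
//   tx.objectStore("chats").openCursor().onsuccess = (e) => {
//     const cursor = e.target.result;
//     if (cursor) {
//       if (isExpired(cursor.value.savedAt, retentionDays, Date.now())) {
//         cursor.delete();
//       }
//       cursor.continue();
//     }
//   };
```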




&lt;h2&gt;
  
  
  🚀 Running it Locally
&lt;/h2&gt;

&lt;p&gt;If you are testing this out, getting started is easy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone the repo and run &lt;code&gt;npm install&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Copy &lt;code&gt;.env.example&lt;/code&gt; to &lt;code&gt;.env&lt;/code&gt; and add your Azure keys (kept safely on the server!).&lt;/li&gt;
&lt;li&gt;Start your local Ollama instance: &lt;code&gt;ollama run deepseek-r1:7b&lt;/code&gt; (pull the MiniCPM-V model as well if you want image input).&lt;/li&gt;
&lt;li&gt;Spin up the full stack: &lt;code&gt;npm run dev:full&lt;/code&gt; (Starts Vite on port 5173 and the Proxy on 8787).&lt;/li&gt;
&lt;/ol&gt;
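&lt;p&gt;For orientation, a hypothetical &lt;code&gt;.env&lt;/code&gt; might look like the following; the actual variable names live in the repo's &lt;code&gt;.env.example&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical .env sketch -- variable names are illustrative; check the
# repo's .env.example for the real keys.
AZURE_OPENAI_ENDPOINT=https://YOUR-RESOURCE.openai.azure.com
AZURE_OPENAI_KEY=...            # stays server-side, never shipped to the browser
AZURE_SEARCH_ENDPOINT=https://YOUR-SEARCH.search.windows.net
AZURE_SEARCH_KEY=...
OLLAMA_BASE_URL=http://localhost:11434
VITE_LOCAL_VAULT_KEY=...        # optional: encrypts local chat state
```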

&lt;h2&gt;
  
  
  🛠️ Step-by-Step Walkthrough: How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Secure Request Handling (The Backend Proxy)
&lt;/h3&gt;

&lt;p&gt;Most frontend apps leak API keys in the network tab. I built a Node.js/Express proxy to solve this. When a user sends a math question:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend hits &lt;a href="http://localhost:8787/api/chat" rel="noopener noreferrer"&gt;http://localhost:8787/api/chat&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The Policy Middleware scans the prompt for injection attacks (e.g., "forget your math rules").&lt;/li&gt;
&lt;li&gt;If the prompt is safe, the backend attaches the Azure OpenAI key (stored securely in &lt;code&gt;.env&lt;/code&gt;) and forwards the request.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: The "Hybrid" Fork
&lt;/h3&gt;

&lt;p&gt;In the &lt;code&gt;Settings.tsx&lt;/code&gt; component, I implemented a state-managed switch. Depending on the toggle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Online Path:&lt;/strong&gt; The proxy uses the &lt;code&gt;openai&lt;/code&gt; SDK to call Azure. It simultaneously queries Azure AI Search to find relevant math formulas from a pre-indexed textbook.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Path:&lt;/strong&gt; The request is redirected to the Ollama API (&lt;code&gt;localhost:11434&lt;/code&gt;), so the app works 100% offline using the DeepSeek-R1 reasoning model.&lt;/li&gt;
&lt;/ul&gt;
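&lt;p&gt;The fork itself can be as small as one resolver function on the proxy (a sketch with assumed names; &lt;code&gt;gpt-4o&lt;/code&gt; stands in for whatever Azure deployment is configured):&lt;/p&gt;

```javascript
// Sketch of the mode switch on the proxy side (illustrative names).
function resolveTarget(mode) {
  if (mode === "local") {
    return {
      url: "http://localhost:11434/api/chat",
      model: "deepseek-r1:7b",
      auth: null, // no key needed: everything stays on-device
    };
  }
  return {
    url: process.env.AZURE_OPENAI_ENDPOINT,
    model: "gpt-4o", // hypothetical deployment name
    auth: process.env.AZURE_OPENAI_KEY, // never leaves the server
  };
}
```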

&lt;h3&gt;
  
  
  Step 3: Visual Learning with the Topic Map
&lt;/h3&gt;

&lt;p&gt;I moved beyond standard chat lists by building a node-link Topic Map.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using a graph-based navigation system, students can see how "Algebra" connects to "Calculus."&lt;/li&gt;
&lt;li&gt;Clicking a node triggers a state update that fetches the context for that math topic, grounding the AI's response in that specific domain.&lt;/li&gt;
&lt;/ul&gt;
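&lt;p&gt;Conceptually, the map is just an adjacency structure whose selected node becomes a grounding hint for the next prompt (illustrative sketch, not the app's actual data):&lt;/p&gt;

```javascript
// Sketch of the Topic Map's graph model (illustrative topics and names).
const TOPIC_GRAPH = {
  Algebra: ["Calculus", "Linear Equations"],
  Calculus: ["Algebra", "Limits"],
  Limits: ["Calculus"],
  "Linear Equations": ["Algebra"],
};

// Clicking a node resolves the topic plus its neighbours into a
// grounding hint sent along with the next prompt.
function contextForTopic(graph, topic) {
  const related = graph[topic] || [];
  return "Focus on " + topic + ". Related topics: " + related.join(", ") + ".";
}
```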

&lt;h3&gt;
  
  
  Step 4: Local Privacy &amp;amp; Persistence
&lt;/h3&gt;

&lt;p&gt;Data privacy is a right, not a feature. I used IndexedDB for chat history storage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistence:&lt;/strong&gt; Chats stay in the browser even after a refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional Encryption:&lt;/strong&gt; Users can toggle "Encrypt local state." This uses a vault key to scramble the data before it hits browser storage, protecting it from other users on the same machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Responsible AI Filtering
&lt;/h3&gt;

&lt;p&gt;Before the response is streamed back to the student, it passes through a final Output Filter. If the model attempts to generate restricted content (violence, self-harm, or non-educational topics), the stream is intercepted and replaced with a "Safety Policy Violation" message.&lt;/p&gt;
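&lt;p&gt;A simplified sketch of such an output filter (the categories and patterns here are illustrative, not the project's actual rules):&lt;/p&gt;

```javascript
// Sketch of a final output filter over a streamed response.
// Categories and patterns are illustrative assumptions.
const BLOCKED = [
  { category: "violence", pattern: /\b(kill|attack|weapon)\b/i },
  { category: "self-harm", pattern: /\b(self-harm|suicide)\b/i },
];

const SAFETY_MESSAGE = "Safety Policy Violation: this response was blocked.";

// Accumulates chunks and aborts the stream the moment a flagged
// category appears, replacing everything with the safety message.
function filterStream(chunks) {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    for (const rule of BLOCKED) {
      if (rule.pattern.test(buffer)) {
        return { text: SAFETY_MESSAGE, blocked: rule.category };
      }
    }
  }
  return { text: buffer, blocked: null };
}
```

&lt;p&gt;Accumulating into a buffer matters: a flagged word split across two stream chunks would slip past a filter that only inspects each chunk in isolation.&lt;/p&gt;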

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Building Neural Math Lab proved to me that we don't have to compromise between powerful AI and strict safety/privacy. The hybrid approach—RAG when you need accuracy, Local when you need privacy—is the future of educational tech.&lt;/p&gt;

&lt;p&gt;Thanks for checking out my submission! I'd love to hear your thoughts in the comments.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>security</category>
    </item>
  </channel>
</rss>
