Forem: Kevin Djabaku Ocansey

Building JarvisOS.

Kevin Djabaku Ocansey — Mon, 06 Apr 2026 17:46:35 +0000

What is JarvisOS?

Local models have been growing fast. Frameworks like Ollama make it easy to load and run models on a desktop or server, but last year I came across Cactus, from Cactus Compute. It is an inference engine built specifically to run LLMs, vision models, and speech models on any smartphone, including low-end devices. Think of it as Ollama, but for mobile.

Cactus provides SDKs in Flutter, Kotlin, and React Native that let developers build workflows with agentic tool use, RAG, and more. But those SDKs are app-level. They sit inside your application. There was nothing treating the phone itself as the compute platform — no system-level orchestration layer, no persistent agentic runtime that other apps could plug into.

That's the gap our team plans to fill. We're building an agentic system on top of Cactus, running as privileged Android system services. Everything stays on the device: no cloud routing, no API calls home. The phone isn't a remote interface to some server somewhere. It is the server.
We did some research, defined our requirements, and started building. The next section covers what you need to get the system running if you want to contribute.

Setup and Requirements

To build JarvisOS, you need real control over the OS. That means a custom Android distribution. We chose LineageOS — a free, open-source Android distribution that extends the functionality and lifespan of mobile devices from more than 20 manufacturers, and gives us the ability to modify the system server itself.

We started on LineageOS 23, ran into issues, and dropped back to LineageOS 21 for active development. LineageOS 22.1 is based on Android 15 QPR1, which is what we're targeting for actual device deployment when the time comes.

LineageOS build guide for the Nothing Phone (Pong): https://wiki.lineageos.org/devices/Pong/build/

Before anything else, you need the right machine. Here's what we recommend for development:

OS: Ubuntu 22.04 or newer. We used WSL2 on Windows.
RAM: 16GB at minimum; 64GB is recommended for LineageOS 21 and up. The less RAM you have, the longer the build takes, and your machine will remind you of this constantly.
Storage: 400GB free for LineageOS 21 and up.
Device: a Nothing Phone 2 (codename Pong) or a Google Pixel 6. Both have NPUs for on-device inference.

Once your machine is ready, the first real step is installing the repo tool. This is Google's tool for managing Android's hundreds of individual git repositories as one coordinated source tree.

Step 1: Initialise the LineageOS source

mkdir -p ~/android/lineage && cd ~/android/lineage
repo init -u https://github.com/LineageOS/android.git -b lineage-21.0 --git-lfs --no-clone-bundle

Step 2: Add the JarvisOS manifest

mkdir -p ~/android/lineage/.repo/local_manifests
cat > ~/android/lineage/.repo/local_manifests/jarvos.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote name="JarvisOs"
          fetch="ssh://git@github.com/"
          revision="main" />

  <project name="ocansey11/android_frameworks_base"
           path="frameworks/base"
           remote="JarvisOs"
           revision="lineage-21.0" />

  <project name="ocansey11/vendor_jarvisos"
           path="vendor/jarvisos"
           remote="JarvisOs"
           revision="main" />

  <project name="ocansey11/cactus"
           path="vendor/cactus"
           remote="JarvisOs"
           revision="main" />
</manifest>
EOF

The manifest file is what tells repo where to find each piece of the project and where to put it in your build tree. Each entry has three key things: the GitHub repo to clone (name), where it lands locally (path), and which branch to use (revision).

The block at the top just defines GitHub as the source so you don't have to repeat the URL for every project.

That's all it is: a map. repo reads it, pulls everything down, and your tree is assembled.
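To make the name/path/revision mapping concrete, here is a small Python sketch that reads a trimmed copy of the manifest and prints where each project would land. It only illustrates the file's structure; it is not how repo itself is implemented, and `list_projects` is a hypothetical helper name. The way the clone URL is assembled (fetch prefix plus project name) is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the local manifest from Step 2 (two projects only).
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote name="JarvisOs" fetch="ssh://git@github.com/" revision="main" />
  <project name="ocansey11/android_frameworks_base" path="frameworks/base"
           remote="JarvisOs" revision="lineage-21.0" />
  <project name="ocansey11/vendor_jarvisos" path="vendor/jarvisos"
           remote="JarvisOs" revision="main" />
</manifest>
"""


def list_projects(manifest_xml):
    """Return (clone_url, local_path, revision) for each <project> entry."""
    root = ET.fromstring(manifest_xml.encode("utf-8"))
    remotes = {r.get("name"): r for r in root.findall("remote")}
    projects = []
    for project in root.findall("project"):
        remote = remotes[project.get("remote")]
        # Assumption: clone URL = the remote's fetch prefix + the project name.
        clone_url = remote.get("fetch") + project.get("name")
        # A project without its own revision falls back to the remote's default.
        revision = project.get("revision") or remote.get("revision")
        projects.append((clone_url, project.get("path"), revision))
    return projects


for url, path, revision in list_projects(MANIFEST):
    print(f"{path:<18} <- {url} @ {revision}")
```

Each printed row is one entry of the "map": a local path, the repository it comes from, and the branch that gets checked out.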

Step 3: Sync

repo sync

You should end up with something like this:

~/android/lineage/                          ← LineageOS build root
│
├── frameworks/
│   └── base/                              ← ocansey11/android_frameworks_base
│       └── services/core/java/com/android/server/rag/  ← Jarvis as a system service
│           ├── RagService.java
│           ├── core/
│           ├── indexing/
│           ├── inference/
│           ├── model/
│           ├── search/
│           └── tools/
│
└── vendor/
    ├── jarvisos/                          ← ocansey11/vendor_jarvisos
    │   ├── sepolicy/
    │   ├── prebuilts/objectbox/
    │   └── (documentation)
    │
    └── cactus/                            ← ocansey11/cactus

System Server and Architecture — How We Built It and Why

We have already explored using Cactus at the application level. In JarvisOS, Jarvis runs as a background system service instead. Think of it as an extension of system_server, the same process that runs ActivityManager, WindowManager, and every other privileged Android service. Our RagService boots alongside those on startup.

Communication follows the standard Android pattern: apps talk to RagManager (our public API), which crosses the process boundary via Binder IPC using an AIDL interface, landing inside RagService which orchestrates everything — RAG pipeline, model selection, tool dispatch.

Is this the most performant approach possible?

Honestly, no.

The fastest possible on-device AI pipeline would be purpose-built native code: custom C++ inference with hand-tuned ARM kernels talking directly to hardware accelerators, zero JVM overhead, no Binder serialisation costs on every call. That's what a team at Qualcomm or Google would ship.

What we have instead is a Java system service calling into Cactus through a JNI wrapper, with Binder IPC adding latency on every query boundary. Each tool dispatch adds a broadcast round-trip with a 10-second timeout window.

The reason is that I personally don't have much experience with Android systems programming yet. I lean on books like:

Android Systems Programming by Roger Ye
Effective Java by Joshua Bloch

along with Claude Code to help architect the specs and code the various pipelines.

Also, Cactus already solved the hard part — ARM SIMD kernels, KV cache quantisation.

Building on top of it meant we could focus on the thing that actually didn't exist: the agentic orchestration layer. The system server approach gave us process isolation, kernel-enforced permissions, and a persistent runtime that survives app restarts — things you can't get from an app-level SDK.

We want to start somewhere, and this is a solid somewhere. After enough experience we can figure out how to do this from scratch properly. Think of it as our experiment to find out whether it can actually work.

Jarvis system service architecture

Below is the new background system service we are adding. It extends system_server to handle RAG, persistent memory, and related state: the foundation for agentic behaviour.

frameworks/base/services/core/java/com/android/server/rag/
│
├── RagService.java          ← Main orchestrator, entry point
├── IRagService.aidl         ← Binder contract
├── Android.bp               ← Build file
│
├── core/
│   ├── RagManager.java      ← Public API wrapper
│   ├── RagException.java    ← Exception definitions
│   ├── JarvisStore.java     ← ObjectBox store init
│   ├── ModelRegistry.java   ← Manages model + index handle pairs
│   └── IndexQueue.java      ← Queues indexing tasks
│
├── indexing/
│   ├── RagIndexWorker.java  ← Processes the index queue
│   ├── TextExtractor.java   ← Handles multiple file types
│   ├── ChunkingStrategy.java← Splits documents into chunks
│   └── JarvisFileObserver.java ← Watches filesystem for changes
│
├── search/
│   └── MetadataSearch.java  ← Metadata-based search
│
├── model/
│   ├── SourceFile.java      ← ObjectBox entity
│   ├── DocumentChunk.java   ← ObjectBox entity
│   ├── Folder.java          ← ObjectBox entity
│   ├── Chunk.java           ← ObjectBox entity
│   ├── Conversation.java    ← ObjectBox entity
│   ├── Message.java         ← ObjectBox entity
│   ├── UserContext.java     ← ObjectBox entity
│   ├── AccessLog.java       ← ObjectBox entity
│   └── TaskMemory.java      ← ObjectBox entity
│
├── inference/
│   └── CactusWrapper.java   ← Single entry point to Cactus
│
└── tools/                   ← Managing Tools from Apps
    ├── AppRecord.java        ← One entry per installed app
    ├── ToolRecord.java       ← One entry per tool
    ├── ToolScannerService.java ← Scans APKs on install
    └── ToolDispatcher.java   ← Resolves + fires tools

Vendors — The supporting Layer

vendor/jarvisos holds everything that supports the system services but isn't part of them:

Sepolicy — Telling Android to Trust Us

Android doesn't trust new system services by default. SELinux is the security layer baked into every Android device. This enforces strict rules about what each process is allowed to do. Without the right policy, our service would be blocked from reading files, sending broadcasts, or talking to other services, regardless of what the code says.

The sepolicy rules in vendor/jarvisos/sepolicy/ are what grant JarvisOS the permissions it needs at the OS level.

ObjectBox — The Database Layer and Where We're Headed

The compiled ObjectBox libraries sit in vendor/jarvisos/prebuilts/objectbox/ and get picked up at build time. At runtime they power everything in the model/ folder, such as SourceFile, DocumentChunk, Conversation, TaskMemory, and AccessLog. ObjectBox is an embedded database that handles both structured queries and vector search in the same store.

Our indexer stores metadata and pointers, never content, never embeddings. SourceFile holds the file path, hash, and MIME type. DocumentChunk stores a short summary and a cactusIndexId, an integer pointer into Cactus's binary index. The actual embedding lives in Cactus.

When RagIndexWorker indexes a file, it calls CactusWrapper.embed() to generate the vector, then CactusWrapper.indexAdd() to store it.

ObjectBox only keeps the ID that points there.
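The split between "metadata here, vectors there" can be sketched in a few lines of Python. Everything in this block is a hypothetical stand-in: `FakeVectorIndex` plays the role of the Cactus index, `fake_embed` replaces a real embedding model, and the `DocumentChunk` fields are simplified from the description above. The real service is Java and talks to Cactus over JNI.

```python
from dataclasses import dataclass


class FakeVectorIndex:
    """Stand-in for the Cactus index: owns the vectors, hands back integer IDs."""

    def __init__(self):
        self._vectors = []

    def add(self, vector):
        self._vectors.append(list(vector))
        return len(self._vectors) - 1  # the ID is just the position in the index

    def get(self, index_id):
        return self._vectors[index_id]


@dataclass
class DocumentChunk:
    """Metadata record: a summary and a pointer, but no embedding."""
    summary: str
    cactus_index_id: int


def fake_embed(text):
    # Toy "embedding" for illustration only: character count and space count.
    return [float(len(text)), float(text.count(" "))]


def index_chunk(text, vector_index, metadata_store):
    vector = fake_embed(text)              # 1. generate the vector
    index_id = vector_index.add(vector)    # 2. store it in the vector index
    chunk = DocumentChunk(summary=text[:40], cactus_index_id=index_id)
    metadata_store.append(chunk)           # 3. keep only the pointer as metadata
    return chunk


index, store = FakeVectorIndex(), []
chunk = index_chunk("hello world", index, store)
print(chunk, index.get(chunk.cactus_index_id))
```

The metadata record never carries the vector itself, so the vector backend can change without touching the stored entities.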

This separation also sets up where we want to go. Traditional RAG retrieves chunks that look similar, which is useful, but it can't connect facts that live in different documents. On mobile, users may also refer to tasks in ways that semantic similarity alone won't capture.

GraphRAG adds a knowledge graph between the indexer (ObjectBox) and the retriever (Cactus), so instead of returning isolated chunks it returns entities and the relationships between them. ObjectBox is already positioned to store that graph layer. We haven't built it yet, but the architecture doesn't need to change to get there.

Cactus — The Engine

vendor/cactus is our fork of the Cactus inference engine. At runtime it powers everything that requires a model: embeddings, vector search, and LLM completions. The entire codebase talks to it through exactly one file: CactusWrapper.java (in the rag/inference/ directory under frameworks/base).

That wrapper exposes only the primitives we actually use. init loads a model and returns a handle. embed turns text into a float array. indexInit, indexAdd, and indexQuery manage the vector index. complete runs inference with optional RAG context and tool definitions injected as a system message.

Everything goes through JNI — Java calls into the Cactus C++ engine via native methods. It's blocking by design. RagIndexWorker and RagService already run on background threads so that's fine.

The wrapper is intentionally thin. Cactus handles the hard parts — ARM kernels, quantisation, attention. We handle the orchestration above it. If we ever need to swap Cactus out or contribute changes upstream, there's exactly one folder to touch.

The Changing Landscape of Small Models

The SLM space is moving fast. Genuinely fast. Gemma 4 dropped this week — four sizes, with the E2B and E4B built specifically for on-device use. It is up to 4x faster than the previous version and uses up to 60% less battery. The entire family moves beyond simple chat to handle complex logic and agentic workflows, with native function calling built in.

For JarvisOS that is significant. We currently manage multiple model and index handle pairs through ModelRegistry because different tasks need different models. A capable single model that handles inference, embeddings, and tool calling natively starts to change that equation. And on top of that Gemma is multimodal.

But this is exactly why moving slowly and deliberately matters. The landscape shifts every few months. If we had tightly coupled our architecture to a specific model six months ago we would be rewriting it now. Instead CactusWrapper is a clean boundary, ModelRegistry is flexible, and swapping in Gemma 4 or whatever comes next is a simple configuration change.

Important research and papers released recently

Running a model on a phone is a different problem to running one in the cloud. Every token generated costs compute. Every token the model has already seen costs memory. On a server you throw hardware at both problems. On a phone you can't. Two research directions we've been following attack this directly.

CALM

Continuous Autoregressive Language Models (CALM) are a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, which lets the authors model language as a sequence of continuous vectors instead of discrete tokens (Shao et al., 2025).

As the research space grows, techniques like this could further reduce the number of autoregressive steps a model needs to take, making on-device inference faster and more practical without requiring larger hardware.
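A quick back-of-the-envelope sketch shows why chunking matters. If each generation step emits one vector that stands for K tokens, the number of sequential steps drops by roughly a factor of K. The function name and the values of K below are illustrative assumptions, and the count ignores the cost of the autoencoder itself.

```python
import math


def autoregressive_steps(num_tokens, chunk_size):
    """Sequential generation steps if each step produces `chunk_size` tokens."""
    return math.ceil(num_tokens / chunk_size)


# chunk_size=1 is ordinary next-token prediction; larger values mimic chunking.
for k in (1, 2, 4, 8):
    print(f"K={k}: {autoregressive_steps(1024, k)} steps for 1024 tokens")
```

On a phone, fewer sequential steps means fewer passes through the model for the same amount of generated text.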

Read the CALM paper here: https://arxiv.org/abs/2510.27688

TurboQuant

Quantization compresses LLMs so they can run on devices with limited RAM, but there's a tradeoff. Compressing weights from 16 bits to 4 bits means rounding values, and those small errors compound across thousands of operations, quietly degrading accuracy. Google's TurboQuant tackles this differently by compressing the KV cache specifically down to 3 bits, with no training or fine-tuning required and no measurable accuracy loss.

TurboQuant: Redefining AI efficiency with extreme compression
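To see the rounding tradeoff in numbers, here is a minimal sketch of plain uniform (min-max) quantization. This is deliberately not TurboQuant's algorithm, just the naive baseline it improves on. The random "weights" are synthetic and `quantize_dequantize` is a name made up for this example.

```python
import numpy as np


def quantize_dequantize(values, bits):
    """Round values onto 2**bits evenly spaced levels, then map back to floats."""
    levels = 2 ** bits - 1
    lo, hi = values.min(), values.max()
    scale = (hi - lo) / levels
    codes = np.round((values - lo) / scale)  # integer codes in [0, levels]
    return codes * scale + lo


rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000)  # synthetic stand-in for model weights

for bits in (8, 4, 3):
    restored = quantize_dequantize(weights, bits)
    error = np.abs(weights - restored).mean()
    print(f"{bits}-bit: mean absolute rounding error = {error:.4f}")
```

The error grows as the bit width shrinks, which is the degradation that smarter schemes aim to avoid.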

Reinforcement Learning

Reinforcement learning is how you make a system improve through experience rather than explicit programming. For JarvisOS, the practical application is tool selection. Right now ToolDispatcher picks tools based on semantic similarity. Over time, it should learn which tools actually produce good outcomes for which queries. RL gives you that feedback loop without needing labelled training data, just signals from what worked and what didn't. One paper that helps here:

Reinforcement Learning for Strategic Tool Use in LLMs
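The simplest version of that feedback loop is an epsilon-greedy bandit: mostly pick the tool with the best average outcome so far, occasionally try another one. The sketch below is a toy under stated assumptions. It is not what ToolDispatcher does today and not the method from the paper above; `ToolBandit` and the tool names are hypothetical.

```python
import random


class ToolBandit:
    """Epsilon-greedy selection over a fixed set of tools."""

    def __init__(self, tools, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {tool: 0 for tool in tools}
        self.values = {tool: 0.0 for tool in tools}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, tool, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[tool] += 1
        self.values[tool] += (reward - self.values[tool]) / self.counts[tool]


# Simulated feedback: 1.0 means the tool call helped, 0.0 means it did not.
bandit = ToolBandit(["calendar.create_event", "notes.create_note"])
hidden_success_rate = {"calendar.create_event": 0.9, "notes.create_note": 0.3}
outcome_rng = random.Random(1)
for _ in range(500):
    tool = bandit.choose()
    reward = 1.0 if outcome_rng.random() < hidden_success_rate[tool] else 0.0
    bandit.update(tool, reward)
print(bandit.values)
```

A real system would condition on the query as well (a contextual bandit), but the update rule stays the same: outcomes, not labels, drive the estimates.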

Federated learning

Federated learning is a technique to improve models without centralising data. Each device trains locally on its own interactions and shares only the model updates, never the raw data. The common assumption is that this requires a central server to aggregate those updates — but recent research shows that's not actually necessary. Serverless approaches like Plexus demonstrate that devices can coordinate directly with each other peer-to-peer, with no central infrastructure at all. (Dhasade et al., 2025)

Practical Federated Learning without a Server

For JarvisOS that matters. A privacy-first OS that routes model improvements through a central server would be contradicting its own premise. Peer-to-peer federated learning means phones running JarvisOS could get smarter over time from each other, without anyone's data ever leaving their device.
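The serverless idea can be illustrated with gossip averaging: peers pair up at random and average their model parameters, and no node ever plays the role of a coordinator. This is a generic sketch and not the Plexus protocol; the "models" are plain NumPy vectors and `gossip_round` is a hypothetical name.

```python
import numpy as np


def gossip_round(models, rng):
    """One round: pair peers at random; each pair averages its parameters."""
    order = rng.permutation(len(models))
    for i, j in zip(order[0::2], order[1::2]):
        average = (models[i] + models[j]) / 2.0
        models[i], models[j] = average.copy(), average.copy()
    return models


rng = np.random.default_rng(0)
# Each phone starts with its own locally trained parameters (toy values).
models = [np.array([0.0]), np.array([2.0]), np.array([4.0]), np.array([6.0])]

for round_number in range(5):
    models = gossip_round(models, rng)
    print(round_number, [float(m[0]) for m in models])
```

Only parameters cross the wire. The overall mean is preserved in every round, while the spread between peers shrinks or stays the same.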

Final Remarks

Right now every app that wants local AI capabilities bundles its own model. One user might already have GPT, Claude, DeepSeek, Perplexity, and more. Imagine a notes app requiring the user to download a model. A calendar app downloads a model. A navigation app downloads another. The phone ends up with multiple copies of similar models eating storage and RAM, each isolated, none aware of the others. That's the wrong direction.

MCP, the Model Context Protocol introduced by Anthropic, tries to solve a similar problem on the desktop. It lets AI models connect to external tools and services through a standardised interface. It's a good idea, but it's designed for a world where the model lives on a server and tools are remote services. On a phone, running an HTTP server in the background is exactly the kind of thing Android will kill to save battery.

JarvisOS takes a different position. The model lives at the OS level. Apps don't download models — they register tools. A calendar app tells JarvisOS what it can do by declaring capabilities in its manifest, and JarvisOS handles the intelligence. Binder IPC replaces HTTP — it's kernel-enforced, microsecond latency, and Android won't kill it because it's a system service.

The shift this requires from app developers is actually small. Instead of building AI into your app, you describe what your app can do: well-defined actions with clear inputs and outputs, declared in your manifest. Think of it like a ContentProvider, but for intent. You're not exposing data, you're exposing capability. JarvisOS figures out when to call it. This way apps get smarter without carrying the weight of a model, and the phone gets a single intelligence layer that sees across all of them rather than being fragmented across dozens of isolated AI stacks.
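As a rough picture of "apps register tools, the OS dispatches them", here is a small Python sketch of a registry. The real implementation is Java inside system_server and discovers tools from app manifests; the field names, the `ToolRegistry` class, and the calendar example below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolRecord:
    app: str                       # package that owns the tool
    name: str                      # action name, unique within the app
    description: str               # what the model reads to decide when to call it
    params: Dict[str, str]         # parameter name -> human-readable type
    handler: Callable[..., object]  # stand-in for the real cross-process call


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[f"{tool.app}/{tool.name}"] = tool

    def describe(self):
        """Tool definitions in a form that could be injected into a prompt."""
        return [
            {"id": tool_id, "description": t.description, "params": t.params}
            for tool_id, t in self._tools.items()
        ]

    def dispatch(self, tool_id, **kwargs):
        tool = self._tools[tool_id]
        missing = set(tool.params) - set(kwargs)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        return tool.handler(**kwargs)


registry = ToolRegistry()
registry.register(ToolRecord(
    app="com.example.calendar",
    name="create_event",
    description="Create a calendar event",
    params={"title": "string", "start": "HH:MM"},
    handler=lambda title, start: f"created {title} at {start}",
))
print(registry.describe())
print(registry.dispatch("com.example.calendar/create_event",
                        title="Standup", start="09:00"))
```

The app only supplies a description and a handler. Deciding when to call it is left to whatever sits behind the registry.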

References

Cactus Compute. (2025). Cactus: AI Inference Engine for Phones & Wearables. https://github.com/cactus-compute/cactus

Cactus Compute. (2025). Cactus v1: Cross-Platform LLM Inference on Mobile. https://cactuscompute.com

LineageOS. (2024). Changelog 29: LineageOS 22.1 based on Android 15 QPR1. https://lineageos.org/Changelog-29/

LineageOS Wiki. (2025). Build for Nothing Phone (Pong). https://wiki.lineageos.org/devices/Pong/build/

Nothing Technology. (2025). Nothing Phone (3): Snapdragon 8s Gen 4. https://androidauthority.com/nothing-phone-3-snapdragon-chip-3568225/

Google DeepMind. (2026). Gemma 4: Byte for byte, the most capable open models. https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/

Google Android Developers. (2026). Gemma 4: The new standard for local agentic intelligence on Android. https://android-developers.googleblog.com/2026/04/gemma-4-new-standard-for-local-agentic-intelligence.html

Shao, Z., et al. (2025). Continuous Autoregressive Language Models (CALM). https://arxiv.org/abs/2510.27688

Zandieh, A., & Mirrokni, V., et al. (2026). TurboQuant: Redefining AI efficiency with extreme compression. https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

Chi, J., & Zhong, W. (2025). ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. https://arxiv.org/abs/2504.11536

Daly, et al. (2024). Federated Learning in Practice. https://arxiv.org/abs/2410.08892

Dhasade, A., et al. (2025). Practical Federated Learning without a Server. https://arxiv.org/abs/2503.05509

ObjectBox. (2024). ObjectBox: On-device vector database. https://objectbox.io

Acknowledgements
The following articles informed the research and visuals used in this post:
Chitalia, R. (2024). An Introduction to RAG and Simple Complex RAG. Medium. https://medium.com/enterprise-rag/an-introduction-to-rag-and-simple-complex-rag-9c3aa9bd017b

LearnByBuilding. (2024). RAG from Scratch. https://learnbybuilding.ai/tutorial/rag-from-scratch/

GradientFlow. (2024). Techniques, Challenges and Future of Augmented Language Models. https://gradientflow.com/techniques-challenges-and-future-of-augmented-language-models/

Siddique, Y. (2024). Tool Calling for LLMs: A Detailed Tutorial. Medium. https://medium.com/@yasir_siddique/tool-calling-for-llms-a-detailed-tutorial-a2b4d78633e2

LLaVA Team. (2024). LLaVA-NeXT Video. https://llava-vl.github.io/blog/2024-04-30-llava-next-video/

Shemet, R. (2025). Cactus On-Device Inference. HuggingFace. https://huggingface.co/blog/rshemet/cactus-on-device-inference

DigitalOcean. (2024). Model Quantization for Large Language Models. https://www.digitalocean.com/community/tutorials/model-quantization-large-language-models

CodeToDeploy. (2024). How On-Device LLMs Rewrite the Rules of App Development. Medium. https://medium.com/codetodeploy/how-on-device-llms-rewrite-the-rules-of-app-development-ad1fe44e64c4

From Hash Functions to Vector Databases: The Data Structures Powering AI

Kevin Djabaku Ocansey — Tue, 14 Oct 2025 18:18:03 +0000

Introduction: The Reluctant Start

Hey everyone

I wanted to share a bit about my Data Structures & Algorithms journey.

We have a small study group where we discuss and solve DSA questions together. Honestly, I was a bit reluctant to get involved at first. I felt like I already understood most of it.

But after facing a few questions I couldn't solve, I decided to take a step back and really dig deeper.

Part 1: Understanding Hashing — The Collision Problem

What is Hashing?

At its core, hashing is about mapping keys to fixed-size slots using a hash function. The goal? O(1) insertion, deletion, and lookup.

A hash function takes a key and returns an index, which determines where the key-value pair will be stored in a hash table. A simple example of a hash function uses the modulus operator to compute the index:

index = hash(key) % table_size

In this example:

hash(key) generates a hash code from the input key.
table_size is the total number of slots in the hash table.
The remainder from the division ensures the index falls within the bounds of the table.

This technique helps distribute keys uniformly across the table and is a common approach in implementing hash tables.
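Here is a tiny sketch of that formula with string keys. It uses a hand-rolled polynomial hash instead of Python's built-in hash(), because the built-in is salted per process for strings and would give different indexes on each run. The `simple_hash` function and the multiplier 31 are illustrative choices, not a recommendation.

```python
def simple_hash(key: str) -> int:
    """Deterministic polynomial hash: h = h * 31 + ord(char) for each character."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h


table_size = 7
for key in ("apple", "banana", "cherry"):
    index = simple_hash(key) % table_size  # always lands in 0..table_size-1
    print(f"{key!r} -> slot {index}")
```

Whatever the hash code is, the modulus folds it back into a valid slot number.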

The Problem: Collisions

What happens when two different keys hash to the same index? This is called a collision, and it's inevitable (by the pigeonhole principle).

Here's a simple hash table that demonstrates the problem:

class SimpleHashTable:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return key % self.size

    def insert(self, key):
        index = self.hash_function(key)
        print(f"Inserting {key} at index {index}")
        self.table[index] = key  # Overwrites if collision!

    def display(self):
        print("\nHash Table:")
        for i in range(self.size):
            print(f"Slot {i}: {self.table[i]}")


sht = SimpleHashTable(7)  # lol interesting acronym

keys = [50, 700, 76, 85, 92, 73, 101]

print("Inserting keys:", keys)
print()

for key in keys:
    sht.insert(key)

sht.display()

Results

Inserting keys: [50, 700, 76, 85, 92, 73, 101]

Inserting 50 at index 1
Inserting 700 at index 0
Inserting 76 at index 6
Inserting 85 at index 1
Inserting 92 at index 1
Inserting 73 at index 3
Inserting 101 at index 3

Hash Table:
Slot 0: 700
Slot 1: 92
Slot 2: None
Slot 3: 101
Slot 4: None
Slot 5: None
Slot 6: 76

When we insert keys [50, 85, 92], they all hash to slot 1. The last one wins, and we lose data. The same thing happens at slot 3, where 101 overwrites 73.

Part 2: The Solution — Chaining

The solution? Chaining. Instead of storing a single value per slot, each slot holds a list (chain) of all keys that hash to that index.

class HashTableChaining:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]  # Each slot is a list!

    def hash_function(self, key):
        return key % self.size

    def insert(self, key):
        index = self.hash_function(key)
        if key not in self.table[index]:
            self.table[index].append(key)
            print(f"Inserted {key} at slot {index}")

    def search(self, key):
        index = self.hash_function(key)
        return key in self.table[index]

    def delete(self, key):
        index = self.hash_function(key)
        if key in self.table[index]:
            self.table[index].remove(key)

    def display(self):
        print("\n" + "="*40)
        print("HASH TABLE (with chaining):")
        print("="*40)
        for i in range(self.size):
            print(f"Slot {i}: {self.table[i]}")
        print("="*40 + "\n")

ht = HashTableChaining(7)


keys = [50, 700, 76, 85, 92, 73, 101]

print("Inserting keys:", keys)
print()

for key in keys:
    ht.insert(key)

ht.display()

Results

Inserting keys: [50, 700, 76, 85, 92, 73, 101]

Inserted 50 at slot 1
Inserted 700 at slot 0
Inserted 76 at slot 6
Inserted 85 at slot 1
Inserted 92 at slot 1
Inserted 73 at slot 3
Inserted 101 at slot 3

========================================
HASH TABLE (with chaining):
========================================
Slot 0: [700]
Slot 1: [50, 85, 92]
Slot 2: []
Slot 3: [73, 101]
Slot 4: []
Slot 5: []
Slot 6: [76]
========================================

Part 3: Enter the Age of AI — From Keys to Meaning

The Plot Twist

As I tried to understand why data structures like hash tables are so important, I came back to concepts I had learnt a while ago: vector databases and semantic search. It turns out that in state-of-the-art systems, especially in AI, these foundational ideas play a crucial role in efficiently organizing, indexing, and retrieving high-dimensional data.

Traditional hashing maps keys to values. But in modern AI systems, we're mapping meaning.

What Are Vector Embeddings?

An embedding is a numerical representation of data (text, images, audio) as a vector in high-dimensional space.

Example:

"cat" → [0.2, 0.8, -0.3, 0.5, ...]
"dog" → [0.3, 0.7, -0.2, 0.6, ...]
"car" → [-0.8, 0.1, 0.9, -0.4, ...]

Similar concepts sit close together in this space. We measure closeness using distance metrics like:

Euclidean distance: √(Σ(a[i] - b[i])²)
Cosine similarity: (a · b) / (||a|| × ||b||)
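The two metrics answer slightly different questions, which a few lines of NumPy make visible. The vectors below are made up for the demonstration: `b` points the same way as `a` but is twice as long, while `c` points in a perpendicular direction.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance: sqrt of the sum of squared differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction as a, twice the length
c = np.array([0.0, 1.0])  # perpendicular to a

print("a vs b:", euclidean_distance(a, b), cosine_similarity(a, b))
print("a vs c:", euclidean_distance(a, c), cosine_similarity(a, c))
```

Cosine similarity treats `a` and `b` as identical in direction even though they are a full unit apart, which is why it is the usual choice for text embeddings, where direction carries most of the meaning.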

Simple Example: Word Embeddings

Let's start with a toy example using simple word vectors:

import numpy as np

# Simple 5-dimensional word vectors (in reality, these are 300-768 dimensions)
word_vectors = {
    "cat": np.array([0.8, 0.3, 0.1, 0.9, 0.2]),
    "dog": np.array([0.7, 0.4, 0.2, 0.85, 0.25]),
    "tiger": np.array([0.75, 0.35, 0.15, 0.88, 0.22]),
    "car": np.array([0.1, 0.9, 0.8, 0.2, 0.7]),
    "vehicle": np.array([0.15, 0.85, 0.75, 0.25, 0.65]),
    "automobile": np.array([0.12, 0.88, 0.78, 0.22, 0.68])
}

def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def find_similar(query_word, top_k=3):
    query_vec = word_vectors[query_word]
    similarities = {}

    for word, vec in word_vectors.items():
        if word != query_word:
            similarities[word] = cosine_similarity(query_vec, vec)

    # Sort by similarity
    sorted_words = sorted(similarities.items(), key=lambda x: x[1], reverse=True)
    return sorted_words[:top_k]

# Test
print("Most similar to 'cat':")
for word, score in find_similar("cat"):
    print(f"  {word}: {score:.4f}")

print("\nMost similar to 'car':")
for word, score in find_similar("car"):
    print(f"  {word}: {score:.4f}")

Output:

Most similar to 'cat':
  tiger: 0.9975
  dog: 0.9893
  vehicle: 0.4768

Most similar to 'car':
  automobile: 0.9997
  vehicle: 0.9981
  dog: 0.5457

See the pattern? Animals cluster together, vehicles cluster together.

Part 4: Real-World Embeddings — Sentence Transformers

Toy examples are fun, but let's use real embeddings from a pre-trained model.

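from sentence_transformers import SentenceTransformer
import numpy as np

# Same helper as in the toy example, repeated so this block runs on its own
def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))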

# Load a pre-trained model (384-dimensional embeddings)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample sentences
sentences = [
    "I love programming in Python",
    "Python is a great programming language",
    "The snake slithered through the grass",
    "Machine learning is fascinating",
    "Deep learning models are powerful",
    "I saw a snake in my backyard"
]

# Generate embeddings
embeddings = model.encode(sentences)
print(f"Embedding shape: {embeddings.shape}")  # (6, 384)

def find_similar_sentences(query, sentences, embeddings, top_k=3):
    query_embedding = model.encode([query])[0]

    # Calculate cosine similarities
    similarities = []
    for i, emb in enumerate(embeddings):
        sim = cosine_similarity(query_embedding, emb)
        similarities.append((sentences[i], sim))

    # Sort and return top k
    sorted_results = sorted(similarities, key=lambda x: x[1], reverse=True)
    return sorted_results[:top_k]

# Test semantic search
query = "coding with Python"
print(f"\nQuery: '{query}'\n")
print("Most similar sentences:")
for sentence, score in find_similar_sentences(query, sentences, embeddings):
    print(f"  [{score:.4f}] {sentence}")

Output (exact scores may vary with the model version):

Embedding shape: (6, 384)

Query: 'coding with Python'

Most similar sentences:
  [0.7856] I love programming in Python
  [0.7234] Python is a great programming language
  [0.4123] Machine learning is fascinating

Notice: "coding with Python" matches "programming in Python" even though the exact words differ. That's semantic search.

Part 5: The Scalability Problem

Why Traditional Hashing Fails

Imagine you have 10 million documents, each represented as a 768-dimensional vector.

Naive approach:

def naive_search(query_embedding, all_embeddings):
    similarities = []
    for emb in all_embeddings:  # 10M comparisons
        sim = cosine_similarity(query_embedding, emb)
        similarities.append(sim)
    return max(similarities)

Cost: O(N × D) where N = number of vectors, D = dimensions

For 10M vectors × 768 dimensions = 7.68 billion operations per query

This doesn't scale.
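Vectorising the loop with NumPy makes each query much faster in practice, but it does not change the amount of work. The sketch below is a brute-force top-k search as a single matrix-vector product; `brute_force_top_k` is a name made up for this example, and the data is random.

```python
import numpy as np


def brute_force_top_k(query, matrix, k=3):
    """Exact cosine top-k. Still O(N x D): every row is touched once."""
    # Normalise rows and query so that a dot product equals cosine similarity.
    matrix_n = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = matrix_n @ query_n          # N dot products of length D
    top = np.argsort(-scores)[:k]        # indexes of the k highest scores
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((10_000, 768))  # 10k vectors, not 10M
query = rng.standard_normal(768)
print(brute_force_top_k(query, embeddings))
```

Scaling N from ten thousand to ten million multiplies both the time and the memory by a thousand, which is the motivation for approximate methods.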

Can We Use Traditional Hashing?

What if we tried hashing embeddings like integers?

def bad_hash(embedding):
    return hash(tuple(embedding)) % 1000

Problem: Vectors [0.5, 0.3, 0.2] and [0.5, 0.3, 0.21] are semantically similar, but a conventional hash will almost certainly send them to unrelated buckets.

Traditional hashing assumes: Similar keys → different hashes (to avoid collisions)

What we need: Similar keys → SAME hash (intentional collision!)

This is where Locality Sensitive Hashing (LSH) comes in.

Part 6: Locality Sensitive Hashing (LSH) — Deep Dive

The Core Idea

LSH intentionally hashes similar items to the same bucket with high probability.

Instead of avoiding collisions, we embrace them — but only for similar items.

Mathematical Foundation

An LSH family is a family of hash functions h(·) such that for any two vectors u and v:

If sim(u, v) is high → P[h(u) = h(v)] is high
If sim(u, v) is low  → P[h(u) = h(v)] is low

Where sim(·) is a similarity measure (e.g., cosine similarity).

Part 7: Random Projection LSH (Implementation)

Theory: Random Hyperplanes

Key insight: If we pick a random hyperplane through the origin, similar vectors are likely to be on the same side of it.

How it works:

Generate random hyperplane (normal vector r)
Project vector v onto r
Hash based on sign: h(v) = sign(r · v)

If r · v > 0 → hash bit = 1
If r · v ≤ 0 → hash bit = 0

By using multiple random hyperplanes, we create a binary hash code.
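For a single random hyperplane, a standard result is that two vectors land on the same side with probability 1 - θ/π, where θ is the angle between them. The sketch below checks that empirically by sampling many hyperplanes; the function names and the sample size are choices made for this example.

```python
import numpy as np


def same_side_fraction(u, v, num_planes=20_000, seed=0):
    """Fraction of random hyperplanes that put u and v on the same side."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_planes, len(u)))  # one normal vector per row
    bits_u = planes @ u > 0
    bits_v = planes @ v > 0
    return float(np.mean(bits_u == bits_v))


def theoretical_collision(u, v):
    """1 - theta/pi, where theta is the angle between u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return float(1.0 - theta / np.pi)


u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])  # 45 degrees away from u
print("theory:   ", theoretical_collision(u, v))
print("empirical:", same_side_fraction(u, v))
```

With k independent hyperplanes, the chance that all k bits match is this probability raised to the power k, so longer codes make buckets more selective but also make near neighbours more likely to be split.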

Implementation: Basic LSH

import numpy as np

class RandomProjectionLSH:
    def __init__(self, input_dim, num_hash_functions):
        """
        Args:
            input_dim: Dimensionality of input vectors
            num_hash_functions: Number of random hyperplanes
        """
        self.input_dim = input_dim
        self.num_hashes = num_hash_functions

        # Generate random hyperplanes (each is a random vector)
        # Shape: (num_hashes, input_dim)
        self.random_vectors = np.random.randn(num_hash_functions, input_dim)

        # Storage: hash_code -> list of (vector, original_data)
        self.hash_tables = {}

    def _hash(self, vector):
        """
        Hash a vector to a binary code.
        Returns a tuple of 0s and 1s.
        """
        # Dot product with all random vectors: (num_hashes,)
        projections = np.dot(self.random_vectors, vector)

        # Convert to binary: positive → 1, negative → 0
        hash_code = tuple((projections > 0).astype(int))
        return hash_code

    def insert(self, vector, data):
        """
        Insert a vector into the LSH index.

        Args:
            vector: numpy array
            data: any associated data (e.g., document ID, original text)
        """
        hash_code = self._hash(vector)

        if hash_code not in self.hash_tables:
            self.hash_tables[hash_code] = []

        self.hash_tables[hash_code].append((vector, data))

    def query(self, query_vector, max_results=5):
        """
        Find similar vectors to the query.

        Returns: List of (similarity_score, data) tuples
        """
        query_hash = self._hash(query_vector)

        # Get candidate vectors from the same bucket
        candidates = self.hash_tables.get(query_hash, [])

        if not candidates:
            return []

        # Calculate actual similarities for candidates
        results = []
        for vector, data in candidates:
            similarity = np.dot(query_vector, vector) / (
                np.linalg.norm(query_vector) * np.linalg.norm(vector)
            )
            results.append((similarity, data))

        # Sort by similarity
        results.sort(reverse=True, key=lambda x: x[0])
        return results[:max_results]

    def get_stats(self):
        """Return statistics about the hash table."""
        num_buckets = len(self.hash_tables)
        bucket_sizes = [len(items) for items in self.hash_tables.values()]

        return {
            'num_buckets': num_buckets,
            'avg_bucket_size': np.mean(bucket_sizes) if bucket_sizes else 0,
            'max_bucket_size': max(bucket_sizes) if bucket_sizes else 0,
            'total_items': sum(bucket_sizes)
        }
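
One possible refinement, sketched here as an assumption rather than something the class above does: `_hash` returns a tuple of 0/1 integers, which is easy to read but is a fairly heavy dictionary key. The same bits can be packed into a single Python integer. `pack_hash_bits` is a hypothetical helper name.

```python
import numpy as np

def pack_hash_bits(projections):
    """Pack the sign bits of the projections into one Python int (bucket key)."""
    bits = (np.asarray(projections) > 0).astype(np.uint8)
    key = 0
    for bit in bits:
        key = (key << 1) | int(bit)  # shift left, then append the next bit
    return key

# Bits are 1, 0, 0, 1 (zero counts as "not positive"), i.e. binary 1001
print(pack_hash_bits([0.3, -1.2, 0.0, 2.5]))  # 9
```

An integer key identifies the same bucket as the tuple would, so the rest of the index logic would not need to change.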

Part 8: Testing LSH Performance

Experiment Setup

Let's compare naive search vs LSH on synthetic data:

# Generate synthetic data
np.random.seed(42)
dimension = 128
num_vectors = 10000

# Create 10 clusters of similar vectors
vectors = []
labels = []

for cluster_id in range(10):
    # Random cluster center
    center = np.random.randn(dimension)
    center = center / np.linalg.norm(center)  # Normalize

    # Generate 1000 vectors around this center
    for _ in range(1000):
        noise = np.random.randn(dimension) * 0.1  # Std 0.1 per dimension (noise norm ≈ 1.1 in 128 dims)
        vector = center + noise
        vector = vector / np.linalg.norm(vector)  # Normalize

        vectors.append(vector)
        labels.append(cluster_id)

vectors = np.array(vectors)
print(f"Generated {len(vectors)} vectors with dimension {dimension}")

Naive Search Baseline

import time

def naive_search(query, vectors, top_k=5):
    """Brute force search - check all vectors."""
    similarities = []
    for i, vec in enumerate(vectors):
        sim = np.dot(query, vec)
        similarities.append((sim, i))

    similarities.sort(reverse=True)
    return similarities[:top_k]

# Test query
query = vectors[0]  # Use first vector as query
true_label = labels[0]

# Measure time (perf_counter is a high-resolution clock suited to short intervals)
start = time.perf_counter()
naive_results = naive_search(query, vectors, top_k=10)
naive_time = time.perf_counter() - start

print(f"\nNaive Search Time: {naive_time*1000:.2f}ms")
print("Top 5 results:")
for sim, idx in naive_results[:5]:
    print(f"  Index {idx} (cluster {labels[idx]}): similarity = {sim:.4f}")

LSH Search

# Build LSH index
lsh = RandomProjectionLSH(input_dim=dimension, num_hash_functions=16)

print("\nBuilding LSH index...")
build_start = time.time()

for i, vec in enumerate(vectors):
    lsh.insert(vec, i)  # Store index as data

build_time = time.time() - build_start
print(f"Build time: {build_time:.2f}s")

# Show statistics
stats = lsh.get_stats()
print(f"\nLSH Statistics:")
print(f"  Number of buckets: {stats['num_buckets']}")
print(f"  Average bucket size: {stats['avg_bucket_size']:.2f}")
print(f"  Max bucket size: {stats['max_bucket_size']}")

# Query (perf_counter is a high-resolution clock suited to short intervals)
start = time.perf_counter()
lsh_results = lsh.query(query, max_results=10)
lsh_time = time.perf_counter() - start

print(f"\nLSH Search Time: {lsh_time*1000:.2f}ms")
print(f"Speedup: {naive_time/lsh_time:.1f}x")

print("\nTop 5 LSH results:")
for sim, idx in lsh_results[:5]:
    print(f"  Index {idx} (cluster {labels[idx]}): similarity = {sim:.4f}")

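Illustrative output (timings, bucket counts, and similarity scores depend on the machine and on the random hyperplanes, so your numbers will differ):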

Generated 10000 vectors with dimension 128

Naive Search Time: 45.23ms
Top 5 results:
  Index 0 (cluster 0): similarity = 1.0000
  Index 342 (cluster 0): similarity = 0.9856
  Index 891 (cluster 0): similarity = 0.9823
  Index 567 (cluster 0): similarity = 0.9801
  Index 123 (cluster 0): similarity = 0.9789

Building LSH index...
Build time: 0.15s

LSH Statistics:
  Number of buckets: 1247
  Average bucket size: 8.02
  Max bucket size: 45

LSH Search Time: 0.85ms
Speedup: 53.2x

Top 5 LSH results:
  Index 0 (cluster 0): similarity = 1.0000
  Index 342 (cluster 0): similarity = 0.9856
  Index 891 (cluster 0): similarity = 0.9823
  Index 567 (cluster 0): similarity = 0.9801
  Index 229 (cluster 0): similarity = 0.9756

Key Observations:

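In this run LSH answers the query roughly 50x faster, because it scores only the vectors in the query's bucket
LSH finds vectors from the same cluster
LSH can miss some true nearest neighbors (the trade-off for that speed)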
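
The "missed neighbors" trade-off is usually measured as recall: the share of the true top-k neighbors that the approximate search also returned. A minimal sketch, with `recall_at_k` as an assumed helper name and the example IDs taken from the two result lists above:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

# LSH top 5 vs naive top 5: four of the five true neighbours were found
print(recall_at_k([0, 342, 891, 567, 229], [0, 342, 891, 567, 123]))  # 0.8
```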

Part 9: Advanced LSH — Multiple Hash Tables

The Recall Problem

LSH with a single hash table can miss similar items if they hash to different buckets due to random chance.

Solution: Use multiple independent hash tables!

Build L different LSH hash tables with different random hyperplanes
Query all L tables and merge results
Higher recall (find more true neighbors) at cost of more memory
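
Why more tables help can be seen with a little arithmetic. If a pair of vectors agrees on a single hyperplane with probability p, it shares a full k-bit bucket with probability p^k, and it is found in at least one of L independent tables with probability 1 - (1 - p^k)^L. The sketch below just evaluates that formula; it assumes the bits and tables are independent, and `candidate_probability` is an illustrative name.

```python
def candidate_probability(p, k, num_tables):
    """P(a pair shares a bucket in at least one table), assuming independent bits.

    p: per-hyperplane collision probability of the pair
    k: hash bits per table
    num_tables: number of independent tables (L)
    """
    return 1.0 - (1.0 - p ** k) ** num_tables

# Same pair, same k, increasing number of tables
for num_tables in (1, 5, 10):
    print(num_tables, round(candidate_probability(0.9, 16, num_tables), 3))
```

Raising k shrinks p^k and so lowers the chance of a match, while raising L pushes it back up, which is the tension the tuning section returns to later.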

Implementation

class MultiTableLSH:
    def __init__(self, input_dim, num_hash_functions, num_tables):
        """
        Args:
            input_dim: Dimensionality of vectors
            num_hash_functions: Hash bits per table
            num_tables: Number of independent hash tables (L)
        """
        self.num_tables = num_tables

        # Create L independent LSH hash tables
        self.tables = [
            RandomProjectionLSH(input_dim, num_hash_functions)
            for _ in range(num_tables)
        ]

    def insert(self, vector, data):
        """Insert into all tables."""
        for table in self.tables:
            table.insert(vector, data)

    def query(self, query_vector, max_results=10):
        """
        Query all tables and merge results.
        Remove duplicates and sort by similarity.
        """
        all_candidates = {}  # data -> (similarity, data)

        for table in self.tables:
            results = table.query(query_vector, max_results=100)
            for sim, data in results:
                # Keep best similarity if duplicate
                if data not in all_candidates or sim > all_candidates[data][0]:
                    all_candidates[data] = (sim, data)

        # Convert to list and sort
        results = [(sim, data) for data, (sim, _) in all_candidates.items()]
        results.sort(reverse=True, key=lambda x: x[0])

        return results[:max_results]

    def get_stats(self):
        """Aggregate statistics across all tables."""
        all_stats = [table.get_stats() for table in self.tables]
        return {
            'num_tables': self.num_tables,
            'total_buckets': sum(s['num_buckets'] for s in all_stats),
            'avg_buckets_per_table': np.mean([s['num_buckets'] for s in all_stats]),
            'total_items': all_stats[0]['total_items'] * self.num_tables
        }

Comparison: Single vs Multi-Table

# Single table LSH
lsh_single = RandomProjectionLSH(dimension, num_hash_functions=16)
for i, vec in enumerate(vectors):
    lsh_single.insert(vec, i)

# Multi-table LSH (5 tables)
lsh_multi = MultiTableLSH(dimension, num_hash_functions=16, num_tables=5)
for i, vec in enumerate(vectors):
    lsh_multi.insert(vec, i)

# Test on 100 random queries
num_queries = 100
test_queries = vectors[np.random.choice(len(vectors), num_queries, replace=False)]

single_recalls = []
multi_recalls = []

for query in test_queries:
    # Get ground truth (top 10 neighbors via naive search)
    naive_results = naive_search(query, vectors, top_k=10)
    ground_truth = set(idx for _, idx in naive_results)

    # Single table
    single_results = lsh_single.query(query, max_results=10)
    single_found = set(idx for _, idx in single_results)
    single_recall = len(single_found & ground_truth) / len(ground_truth)
    single_recalls.append(single_recall)

    # Multi table
    multi_results = lsh_multi.query(query, max_results=10)
    multi_found = set(idx for _, idx in multi_results)
    multi_recall = len(multi_found & ground_truth) / len(ground_truth)
    multi_recalls.append(multi_recall)

print(f"\nRecall@10 Comparison:")
print(f"  Single Table LSH: {np.mean(single_recalls):.2%}")
print(f"  Multi-Table LSH (L=5): {np.mean(multi_recalls):.2%}")

Illustrative output (exact recall figures vary with the random hyperplanes and the sampled queries):

Recall@10 Comparison:
  Single Table LSH: 68.3%
  Multi-Table LSH (L=5): 91.7%

Multi-table LSH significantly improves recall!

Part 10: Building a Mini Semantic Search Engine

Now let's bring it all together: a semantic search engine using real sentence embeddings + LSH.

from sentence_transformers import SentenceTransformer

class SemanticSearchEngine:
    def __init__(self, num_hash_functions=16, num_tables=5):
        # Load sentence transformer model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.embedding_dim = 384  # Model output dimension

        # Create LSH index
        self.lsh = MultiTableLSH(
            input_dim=self.embedding_dim,
            num_hash_functions=num_hash_functions,
            num_tables=num_tables
        )

        # Store documents
        self.documents = []

    def add_documents(self, documents):
        """
        Add documents to the search engine.

        Args:
            documents: List of strings
        """
        print(f"Encoding {len(documents)} documents...")
        embeddings = self.model.encode(documents, show_progress_bar=True)

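        print("Building LSH index...")
        # Offset by the current document count so stored IDs stay valid across repeated calls
        start = len(self.documents)
        for offset, (doc, emb) in enumerate(zip(documents, embeddings)):
            self.documents.append(doc)
            self.lsh.insert(emb, start + offset)

        print(f"Indexed {len(documents)} documents")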

    def search(self, query, top_k=5):
        """
        Search for documents similar to the query.

        Args:
            query: String query
            top_k: Number of results to return

        Returns:
            List of (similarity_score, document) tuples
        """
        # Encode query
        query_embedding = self.model.encode([query])[0]

        # Search LSH
        results = self.lsh.query(query_embedding, max_results=top_k)

        # Return (score, document) pairs
        return [(score, self.documents[idx]) for score, idx in results]

# Example usage
sample_docs = [
    "Python is a versatile programming language used for web development, data science, and automation.",
    "Machine learning models can recognize patterns in large datasets.",
    "The Eiffel Tower is a famous landmark in Paris, France.",
    "Deep learning is a subset of machine learning that uses neural networks.",
    "Pandas is a powerful library for data manipulation in Python.",
    "The Great Wall of China is one of the Seven Wonders of the World.",
    "Natural language processing enables computers to understand human language.",
    "JavaScript is essential for front-end web development.",
    "Computer vision allows machines to interpret visual information from the world.",
    "The Colosseum is an ancient amphitheater in Rome, Italy."
]

# Build search engine
engine = SemanticSearchEngine(num_hash_functions=8, num_tables=3)
engine.add_documents(sample_docs)

# Test queries
queries = [
    "programming in Python",
    "AI and neural networks",
    "famous buildings in Europe"
]

for query in queries:
    print(f"\n{'='*60}")
    print(f"Query: '{query}'")
    print('='*60)

    results = engine.search(query, top_k=3)
    for i, (score, doc) in enumerate(results, 1):
        print(f"\n{i}. [Score: {score:.4f}]")
        print(f"   {doc}")

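Output (illustrative; a document is returned only if it shares a bucket with the query in at least one table, so with 10 documents and 8-bit hashes a real run may return fewer than three results per query):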

Encoding 10 documents...
Building LSH index...
Indexed 10 documents

============================================================
Query: 'programming in Python'
============================================================

1. [Score: 0.7234]
   Python is a versatile programming language used for web development, data science, and automation.

2. [Score: 0.6891]
   Pandas is a powerful library for data manipulation in Python.

3. [Score: 0.4512]
   JavaScript is essential for front-end web development.

============================================================
Query: 'AI and neural networks'
============================================================

1. [Score: 0.7823]
   Deep learning is a subset of machine learning that uses neural networks.

2. [Score: 0.7156]
   Machine learning models can recognize patterns in large datasets.

3. [Score: 0.6534]
   Natural language processing enables computers to understand human language.

============================================================
Query: 'famous buildings in Europe'
============================================================

1. [Score: 0.6745]
   The Eiffel Tower is a famous landmark in Paris, France.

2. [Score: 0.6523]
   The Colosseum is an ancient amphitheater in Rome, Italy.

3. [Score: 0.5234]
   The Great Wall of China is one of the Seven Wonders of the World.

It works! The search engine finds semantically similar documents even when exact keywords don't match.

Part 11: Real-World Vector Databases

Production Systems

Modern vector databases use LSH and related techniques at scale:

System	Technique	Use Case
FAISS (Meta)	Product Quantization + LSH	Billion-scale similarity search
Pinecone	HNSW + DiskANN	Managed vector search
Weaviate	HNSW	GraphQL-based vector DB
Milvus	IVF + HNSW	Cloud-native vector search
Qdrant	HNSW	Production-ready vector search

HNSW vs LSH

HNSW (Hierarchical Navigable Small World) is another popular technique:

Graph-based approach (not hash-based)
Often better recall than LSH
More memory intensive
Used by Spotify, Uber, Pinterest

LSH advantages:

Lower memory footprint
Simpler to implement
Better for very high dimensions (>1000)
Natural distributed scaling

Part 12: Performance Tuning LSH

Key Parameters

Number of hash functions (k)
- More → fewer false positives, but lower recall
- Fewer → higher recall, but more candidates to check
- Typical: 8-32
Number of tables (L)
- More → better recall
- Fewer → faster, less memory
- Typical: 3-10
Embedding dimension
- Higher → more expressive, but slower
- Lower → faster, but less accurate
- Common: 128, 384, 768

Trade-off Analysis

# Experiment: vary k and L
configs = [
    (8, 1, "Low k, Single table"),
    (16, 1, "Medium k, Single table"),
    (32, 1, "High k, Single table"),
    (16, 3, "Medium k, Few tables"),
    (16, 10, "Medium k, Many tables"),
]

for k, L, name in configs:
    lsh = MultiTableLSH(dimension, num_hash_functions=k, num_tables=L)

    # Index
    for i, vec in enumerate(vectors[:5000]):  # Use subset for speed
        lsh.insert(vec, i)

    # Test recall
    recalls = []
    query_times = []

    for _ in range(50):
        query = vectors[np.random.randint(5000)]

        # Ground truth
        naive_results = naive_search(query, vectors[:5000], top_k=10)
        ground_truth = set(idx for _, idx in naive_results)

        # LSH search (timed with a high-resolution clock)
        start = time.perf_counter()
        lsh_results = lsh.query(query, max_results=10)
        query_times.append(time.perf_counter() - start)

        lsh_found = set(idx for _, idx in lsh_results)
        recall = len(lsh_found & ground_truth) / len(ground_truth)
        recalls.append(recall)

    print(f"\n{name}:")
    print(f"  Avg Recall: {np.mean(recalls):.2%}")
    print(f"  Avg Query Time: {np.mean(query_times)*1000:.2f}ms")

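Sample Output (illustrative figures, not a reproducible run; with one table, recall in this implementation should on average only decrease as k grows, so the k=8 row would normally show the highest single-table recall):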

Low k, Single table:
  Avg Recall: 45.2%
  Avg Query Time: 0.23ms

Medium k, Single table:
  Avg Recall: 68.3%
  Avg Query Time: 0.31ms

High k, Single table:
  Avg Recall: 52.1%
  Avg Query Time: 0.18ms

Medium k, Few tables:
  Avg Recall: 87.6%
  Avg Query Time: 0.89ms

Medium k, Many tables:
  Avg Recall: 96.3%
  Avg Query Time: 2.34ms

Insights:

Too many hash functions (high k) → over-partitioning → lower recall
Multiple tables dramatically improve recall
Trade-off: recall vs query time

Part 13: Connection Back to Traditional Hashing

Full Circle

Let's visualize the parallel:

Traditional Hash Table	LSH Vector Database
Goal: Avoid collisions	Goal: Encourage collisions (for similar items)
Key: Integer/string	Key: High-dimensional vector
Hash Function: `key % size`	Hash Function: Random projections
Collision Resolution: Chaining (lists)	Collision: Feature, not bug!
Lookup: Exact match	Lookup: Approximate nearest neighbor
Complexity: O(1) expected	Complexity: Sublinear (better than O(N))
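
For comparison with the LSH classes earlier in the post, here is a minimal sketch of the left-hand column: a hash table that uses `key % size` and resolves collisions by chaining. It is a teaching toy under simple assumptions (integer keys, fixed size, no resizing), and `ChainedHashTable` is an illustrative name.

```python
class ChainedHashTable:
    """Minimal hash table with `key % size` hashing and chaining."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one list (chain) per slot

    def put(self, key, value):
        chain = self.slots[key % self.size]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite on exact key match
                return
        chain.append((key, value))

    def get(self, key):
        for existing_key, value in self.slots[key % self.size]:
            if existing_key == key:
                return value
        raise KeyError(key)

table = ChainedHashTable(size=8)
table.put(3, "a")
table.put(11, "b")            # 11 % 8 == 3, so this collides with key 3
print(table.get(11))          # b
print(len(table.slots[3]))    # 2
```

Here the two colliding keys are unrelated and the chain is a cost to be minimized; in LSH the chain is the list of candidates you actually want.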

The Evolution

Traditional Hashing (1950s)
  ↓
"How do we handle collisions?"
  ↓
Chaining (lists at each slot)
  ↓
"What if we WANT similar items to collide?"
  ↓
LSH (1990s)
  ↓
"How do we apply this to AI embeddings?"
  ↓
Modern Vector Databases (2020s)

Conclusion: The Journey Continues

Starting with basic hash tables and collision resolution, I never imagined this path would lead to understanding modern AI infrastructure.

The same principles we learn in DSA

partitioning data
managing collisions
trading off space for time

keep showing up, just evolved for new problems.

Key takeaways:

Traditional hashing avoids collisions; LSH embraces them
Chaining handles multiple items per bucket in both approaches
Vector embeddings represent meaning as numbers
LSH makes similarity search scalable
Real-world systems (FAISS, Pinecone) build on these foundations

I'm still learning, but now when I see concepts like "vector database" or "semantic search," I don't feel lost. I see the hash tables underneath.

Resources

Other Docs:

Wikipedia: https://en.wikipedia.org/wiki/Hash_function
FAISS: https://github.com/facebookresearch/faiss
Sentence Transformers: https://www.sbert.net/
Pinecone Learning Center: https://www.pinecone.io/learn/
ClickHouse Blog: https://clickhouse.com/blog/approximate-nearest-neighbour-ann-with-sql-powered-local-sensitive-hashing-lsh-random-projections

Building a School Management System (SMS) from Scratch: Vibe Coding Lessons, Pitfalls, and Clean Solutions

Kevin Djabaku Ocansey — Mon, 21 Jul 2025 10:26:59 +0000

Introduction

Recently, in an interview, I presented my vision for an AI-powered School Management System (SMS). My focus was on how Large Language Models (LLMs) could transform user experience and automate workflows. But after the interview, I realized something crucial: AI is only as good as the foundation it’s built on. If your data, user flows, and backend are messy, your AI will be too—garbage in, garbage out.

So for my portfolio, I set out to build the SMS the right way, from the ground up, and to make sure I capture enough metadata from users and their activities to feed my LLMs. That data could come from forms, user activity, or even prompts and RAG.

Building a Strong Foundation

My strong foundation includes:

Database design: Defines the schema and what data you hold, which leads to clear, normalized tables.
REST APIs: Well-structured endpoints for every entity.
CRUD Layer: Shared database logic that keeps routes clean and modular.
User flows: Clean login, dashboard, and management screens. This works best with a good UI/UX team. (To be honest, I didn't think much about this part.)

Database design: Why an ERD (Entity Relationship Diagram) Matters

A solid ERD is the backbone of any complex system. It defines how users, classes, students, guardians, and other entities relate to each other. This structure keeps your data consistent and easy to query, which is crucial for both traditional features and AI-powered insights.

Example: PostgreSQL Entities

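# filepath: app/db/models.py
import uuid

from sqlalchemy import Column, ForeignKey, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

# Assumption: Base is declared here for completeness; the project may import it from elsewhere
Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    # as_uuid=True makes the column accept and return uuid.UUID objects, matching the uuid.uuid4 default
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    first_name = Column(String)
    last_name = Column(String)
    email = Column(String, unique=True, index=True)
    role = Column(String)  # teacher, student, admin, or guardian
    password_hash = Column(String)
    # ...other fields...

class Class(Base):
    __tablename__ = "classes"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String)
    subject = Column(String)
    # Each class is taught by one teacher, stored as a user ID
    teacher_id = Column(UUID(as_uuid=True), ForeignKey("users.id"))
    # ...other fields...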

Although I’ve designed a complex version, I decided to build out this lean prototype ERD first. It gives me a solid foundation to test features quickly, iterate fast, and make smarter architectural decisions once the system is in motion.

Basic ERD Example:


User (id, first_name, last_name, email, role, password_hash)
   ├── A user can be a teacher, student, admin, or guardian (via the `role` field)
   |
   ├──< Class (id, name, subject, teacher_id → User.id)
   |       • Each class is taught by one teacher
   |
   ├──< StudentClass (student_id → User.id, class_id → Class.id)
   |       • Many-to-many between students and classes
   |
   └──< GuardianStudent (guardian_id → User.id, student_id → User.id)
           • Many-to-many between guardians and students
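
To check that this ERD answers a realistic question, here is a small sketch that builds the four tables and asks "which classes do this guardian's children attend?". It uses Python's built-in sqlite3 with an in-memory database purely so it runs anywhere; the project itself targets PostgreSQL, the table names `student_class` and `guardian_student` are assumed spellings of the link tables, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    first_name TEXT, last_name TEXT,
    email TEXT UNIQUE,
    role TEXT, password_hash TEXT
);
CREATE TABLE classes (
    id TEXT PRIMARY KEY,
    name TEXT, subject TEXT,
    teacher_id TEXT REFERENCES users(id)
);
-- Link tables: the composite primary key prevents duplicate pairings
CREATE TABLE student_class (
    student_id TEXT REFERENCES users(id),
    class_id TEXT REFERENCES classes(id),
    PRIMARY KEY (student_id, class_id)
);
CREATE TABLE guardian_student (
    guardian_id TEXT REFERENCES users(id),
    student_id TEXT REFERENCES users(id),
    PRIMARY KEY (guardian_id, student_id)
);
""")

# Invented sample rows: one teacher, one student, one guardian
conn.executemany(
    "INSERT INTO users (id, first_name, role) VALUES (?, ?, ?)",
    [("t1", "Ama", "teacher"), ("s1", "Kofi", "student"), ("g1", "Esi", "guardian")],
)
conn.execute("INSERT INTO classes VALUES ('c1', 'Form 1A', 'Maths', 't1')")
conn.execute("INSERT INTO student_class VALUES ('s1', 'c1')")
conn.execute("INSERT INTO guardian_student VALUES ('g1', 's1')")

# Walk guardian -> student -> class through the two link tables
rows = conn.execute("""
    SELECT c.name, c.subject
    FROM guardian_student gs
    JOIN student_class sc ON sc.student_id = gs.student_id
    JOIN classes c ON c.id = sc.class_id
    WHERE gs.guardian_id = ?
""", ("g1",)).fetchall()
print(rows)  # [('Form 1A', 'Maths')]
```

Because both link tables point back at the same users table, the `role` field is what distinguishes a guardian from a student; nothing in the schema itself stops a mismatched pairing, so that check would live in the application layer.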

This can get more complex with more entities such as quizzes, leaderboards, peer-buddy concepts, payment information, and profiles. These will become clearer to me after some research into what type of AI models I might use, so stay tuned for my next blog.

REST API Design

Every main entity gets its own set of RESTful endpoints. This makes the system modular and easy to extend.

Example: Admin REST APIs

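# filepath: app/api/admin.py
from fastapi import APIRouter

router = APIRouter()

@router.get("/admin/users")
async def get_all_users():
    """Return all users."""
    ...  # implementation elided

@router.post("/admin/users")
async def create_user(user: UserCreate):  # UserCreate is the Pydantic schema shown in the Pydantic section
    """Create a new user."""
    ...  # implementation elided

@router.get("/admin/classes")
async def get_all_classes():
    """Return all classes."""
    ...  # implementation elided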

CRUD Layer

Instead of putting raw SQL or ORM queries in every route, we call shared helpers such as crud.user.get_multi, crud.user.create, and crud.user.count.

class CRUDUser(CRUDBase):
    """CRUD operations for User model."""

    async def get_by_email(self, db: AsyncSession, *, email: str) -> Optional[User]:
        """Get user by email."""
        result = await db.execute(select(User).where(User.email == email))
        return result.scalar_one_or_none()

    async def create(self, db: AsyncSession, *, obj_in: UserCreate) -> User:
        """Create a new user."""
        # Check if user already exists
        existing_user = await self.get_by_email(db, email=obj_in.email)
        if existing_user:
            raise UserAlreadyExistsException(obj_in.email)

Then in your FastAPI route you can do this:

@router.post("/admin/users", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
async def create_user(
    user_in: UserCreate,
    db: AsyncSession = Depends(get_db),
    current_user = Depends(require_admin)
):
    """
    Create a new user (admin only).
    """
    try:
        user = await crud_user.create(db, obj_in=user_in)
        return UserResponse.model_validate(user)

    except UserAlreadyExistsException as e:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"User with email {e.email} already exists"
        )

Reproducible Dev Setup: Demo Users + Automation

When you're working on a team, or just revisiting your own code after a week, you don't want to waste time trying to remember what the test account's password was. I needed a way to spin up the app and instantly have working demo credentials for different roles (admin, teacher, student, guardian). Anyone looking to contribute to this project can also get it running easily with Docker from a single script.

Scripts for demo credentials: You can log in right away and test dashboards and other functionality, which makes it easier to contribute.

Automated scripts for user creation: For new contributors, the startup script automatically runs a demo user creation script and verifies that everything works. It checks whether the demo users already exist, creates any missing ones, and tests that all roles (admin, teacher, student, etc.) can log in successfully.

Example: Smart Demo User Creation Script

# filepath: scripts/create_demo_users.py
async def create_demo_users():
    # ...existing code...
    existing_users, missing_users = await check_existing_users()
    if not missing_users:
        print("✓ All demo users already exist!")
        return
    # ...create only missing users...

Example: Automated Testing in start-dev.bat

echo Step 3: Testing existing demo credentials...
docker exec sms-app-1 python /app/scripts/test_demo_credentials.py

echo Step 4: Creating missing demo users...
docker exec sms-app-1 python /app/scripts/create_demo_users.py

echo Step 5: Testing all demo credentials again...
docker exec sms-app-1 python /app/scripts/test_demo_credentials.py

Phase 1 Complete: Ready for the Admin Dashboard

With everything in place (Docker, database, REST APIs, and demo scripts), I’m ready to build out the admin dashboard and connect the UI to the backend.

Next Steps:

Build out pages and dashboards
Connect and test REST APIs from the frontend
Architect the AI layer once the traditional user flow is solid.

AI Helpers: The Frustration of Context-Less Coding

I started out using ChatGPT 4o and Claude 4.1 (Copilot) for pair programming. But I quickly hit a wall. These tools are great at generating code, but they often lose context: forgetting what’s already been built, duplicating files, or overcomplicating simple tasks. I found myself trusting the AI too much, not double-checking, and getting lost in circular logic.

Quick Fix:

Most of the friction I had with agent mode was due to poor naming of my files. For example, I used app for my backend and web for my frontend. This led to confusion in Docker, scripts, and even in my own mental model.

Lesson: Use clear, conventional names like backend and frontend to avoid headaches. A clean skeleton goes a long way, especially for your AI assistant. (Sadly, I haven't changed this yet.)

Cleaning Up: A Collaborative Approach

I decided to start fresh. I cleaned all my Docker containers, revised my flow, and worked collaboratively with my AI assistant—but with me in control. We reviewed every file, every script, and every config. Here’s what worked:

Switched to Ask Mode

Agent mode is great, but it can go off the rails and leave you with a lot of bugs. Switching to Ask Mode made me the human in the loop. I was able to review and test code before I kept it. It also made me review my ideas more. I also realized that I needed to be strict and start applying BDD and TDD principles, because that was roughly what I was doing already, just not the right way. (I know these are good principles, but I honestly won't have the time while juggling a dissertation and a job hunt. If you want to hire me, here's my email: kevinocansey11@gmail.com)

Providing REST API Docs for Quick Frontend Creation

One bad habit I had was sending Copilot images of how I wanted my interface to look. Now I try to focus on one page at a time and provide REST API docs for it to build the frontend.

A prompt might be:

Using the Admin REST-API docs, scaffold the entire front-end skeleton (directory tree, empty TS stubs, and React-Router routes) for all Admin endpoints; when the skeleton is ready, let me know so we can review each page one by one.

And a response may look like this:


web/src/components/admin/
├── dashboard/
│   ├── DashboardHome.tsx
│   ├── DashboardStats.tsx
│   ├── RecentActivity.tsx
│   └── SystemHealth.tsx
│
├── users/
│   ├── UsersListPage.tsx
│   ├── CreateUserModal.tsx
│   ├── EditUserModal.tsx
│   ├── UserCard.tsx
│   ├── UserFilters.tsx
│   └── BulkUserActions.tsx
│
├── classes/
│   ├── ClassesListPage.tsx
│   ├── CreateClassModal.tsx
│   ├── EditClassModal.tsx
│   └── ClassDetails.tsx
│
├── reports/
│   ├── ReportsPage.tsx
│   ├── UserReport.tsx
│   └── ClassReport.tsx
│
└── shared/
    ├── AdminLayout.tsx
    ├── DataTable.tsx
    ├── Modal.tsx
    └── Pagination.tsx

And then I can take it page by page and review the code accordingly.

Pydantic

Pydantic is the most widely used data validation library for Python. I'm fairly new to it as well, but I've realized why it's so powerful. It all starts with models like this.


from pydantic import BaseModel, EmailStr

class UserCreate(BaseModel):
    first_name: str
    last_name: str
    email: EmailStr
    password: str

Behind the scenes, this validates every field. It makes sure your data is the right type before it ever reaches your logic: the email is actually a valid email, strings are strings, and if anything’s missing or malformed, Pydantic raises a ValidationError (which FastAPI turns into a 422 response for request bodies).

Now imagine a route below responsible for retrieving all the users from your database.


@router.get("/users", response_model=PaginatedResponse)
async def get_users(
    pagination: PaginationParams = Depends(get_pagination_params),
    role: Optional[str] = None,
    db: AsyncSession = Depends(get_db),
    current_user = Depends(require_admin)
):
    """Get all users with pagination and optional role filter."""
    # Get users
    users = await crud.user.get_multi(
        db,
        skip=pagination.skip,
        limit=pagination.limit,
        role=role
    )

    # Get total count
    total = await crud.user.count(db, role=role)

    return PaginatedResponse(
        items=[UserResponse.model_validate(user) for user in users],
        total=total,
        page=pagination.page,
        size=pagination.size,
        # Ceiling division: number of pages needed to hold `total` items
        pages=(total + pagination.size - 1) // pagination.size
    )

The frontend expects a nice, predictable JSON list of users. But without validation, anything can slip through: missing fields, wrong types, or even legacy junk from the database, possibly from messy joins or odd column types.

UserResponse.model_validate() turns each one into a clean, fully validated Pydantic model, ready to be serialized into JSON.

So if something's off, it fails fast instead of sending corrupted or partial data to your frontend. And if it passes, you know you’re returning rock-solid responses.
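
Here is a small sketch of that fail-fast behaviour in isolation, assuming Pydantic v2. The `UserResponse` fields below are a guess at a minimal shape (the project's real model is not shown), and `FakeRow` is a hypothetical stand-in for an ORM object.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class UserResponse(BaseModel):
    # from_attributes lets model_validate read plain objects such as ORM rows
    model_config = ConfigDict(from_attributes=True)
    id: str
    email: str
    role: str

class FakeRow:
    """Stand-in for an ORM object; not part of the original project."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

# A complete row validates and serializes cleanly
good = UserResponse.model_validate(FakeRow(id="u1", email="a@school.edu", role="admin"))
print(good.model_dump())

# A row with a missing field is rejected before it can reach the response
try:
    UserResponse.model_validate(FakeRow(id="u2", email="b@school.edu"))  # role missing
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

In a real route you would probably catch the error, log which record failed, and decide whether to skip it or return a 500, rather than let one bad row break the whole page.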

The JWT Authentication Challenge: A Real-World Problem

Situation

After setting up our React Router DOM properly and fixing the navbar architecture, we hit a critical issue: clicking "Sign In" did nothing. Users would enter credentials, the form would submit, but they'd remain stuck on the login page. This was puzzling because our backend was working perfectly.

Diagnosis

Step 1: Backend Investigation

We tested login via PowerShell:

$body = '{"email": "admin@school.edu", "password": "admin123"}'
$headers = @{ "Content-Type" = "application/json" }
$response = Invoke-WebRequest -Uri "http://localhost:8000/api/auth/login" -Method POST -Headers $headers -Body $body
$response.Content | ConvertFrom-Json

The backend was working perfectly, returning:

{
  "access_token": "eyJhbG..."
}

Step 2: Frontend Flow Analysis

The problem was with my frontend:

// BROKEN: Looking for data.user.role that doesn't exist
localStorage.setItem('user_data', JSON.stringify(data.user));
setUser(data.user);
navigate('/');

Problem: Our backend only returns access_token, not a user object. The frontend was expecting data.user.role but getting undefined.

Step 3: The Fix

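Instead of expecting a user object directly, you can decode the JWT payload on the client:

// JWT segments are base64url-encoded, so map '-' and '_' back to '+' and '/' before calling atob
const payloadB64 = data.access_token.split('.')[1].replace(/-/g, '+').replace(/_/g, '/');
const tokenPayload = JSON.parse(atob(payloadB64));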

const user = {
  id: tokenPayload.sub,
  email: tokenPayload.email,
  role: tokenPayload.role,
  first_name: tokenPayload.first_name || '',
  last_name: tokenPayload.last_name || ''
};

This gives you access to the role, so you can redirect with navigate(`/${user.role}`);

Technical Learning: How JWT Works

Simple bug but it led me to a deeper understanding of how JWT tokens work:

JWT (JSON Web Token) is a three-part system:

Header: Specifies the algorithm (e.g., HS256)
Payload: Contains user data (claims) like sub, email, role, exp
Signature: Cryptographic proof of authenticity

Decoding Process:

// Split token: header.payload.signature
const parts = token.split('.');
const payload = JSON.parse(atob(parts[1])); // Base64 decode the payload

Security Benefits:

Stateless: No server-side session storage needed
Tamper-evident: Signature verification on the server detects any modification
Self-contained: All user info embedded in token (encoded, not encrypted, so never put secrets in the payload)
Expiration: Built-in time-based security

Development vs Production Security:

Dev: Same secret = predictable signatures (fine for testing)
Prod: Random secret + proper expiration + token rotation
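
To make the difference between decoding and verifying concrete, here is a minimal Python sketch that uses only the standard library. It assumes an HS256 token. The secret, the claim values, and the helper names (b64url_decode, decode_payload, sign_hs256, verify_hs256) are illustrative and not taken from the project. Decoding, which is all the frontend does, needs no secret. Only the holder of the secret can check the signature.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are written."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a JWT segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_payload(token: str) -> dict:
    """Read the claims WITHOUT verifying them (what the frontend does)."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Build a token of the form header.payload.signature."""
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(signature))


secret = b"dev-only-secret"  # hypothetical value, for illustration only
token = sign_hs256(
    {"alg": "HS256", "typ": "JWT"},
    {"sub": "1", "email": "admin@school.edu", "role": "admin"},
    secret,
)
claims = decode_payload(token)
```

Because anyone can decode (and forge) an unsigned payload, the role read on the frontend is only good for routing. The backend still has to verify the signature before trusting it.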

Final Architecture

Now our authentication flow works perfectly:

Login → Backend validates credentials
JWT Token → Contains all user information
Frontend Decode → Extract user role from token
Smart Routing → Redirect to role-specific dashboard (/admin, /teacher, etc.)
Persistent Sessions → Token stored locally for app reloads

Final Note

While this works, storing user data in localStorage isn't ideal for production apps. A more secure approach would be to store just the access_token (or better, use HttpOnly cookies).

Race Condition

At one point, my isAuthenticated logic was just a simple boolean. Either the user was logged in (true) or not (false).

This caused a race condition. On page reload, isAuthenticated would default to false before checking localStorage or decoding the JWT. So even though the user had a valid token, the app would instantly redirect them to /login.

The fix is to add a null state:

const [isAuthenticated, setIsAuthenticated] = useState<boolean | null>(null);

So now:

null = still checking (e.g. during initial load)
true = authenticated
false = definitely not authenticated

The SaaS Pivot: Decentralized, Multi-Tenant Architecture

As I built out the admin dashboard and core CRUD flows, I realized the real opportunity wasn’t just a school management system for one organization—it was a platform. What if any teacher, tutor, or school could sign up and run their own “mini-school” on my infrastructure?

This is the SaaS model powering giants like Google Classroom, Teachable, and Slack.

Vision: Decentralized, Multi-Tenant SaaS

Decentralized onboarding: Any teacher, tutor, or organization can sign up and manage their own classes, students, and guardians—no central admin required.
Tenant isolation: Every user, class, student, and payment is linked to a tenant_id (the teacher or organization), ensuring strict data separation.
Flexible roles: Teachers create and manage their own classes, add students, communicate with guardians, and set quizzes. Guardians enroll and pay for classes, track progress, and communicate with teachers. Organizations can have their own admins and branding.
Platform admin: I (the platform owner) can see and support all tenants, but don’t micromanage their data.

How This Changes the Architecture

Backend: Every resource (user, class, payment, etc.) now includes a tenant_id. All queries are filtered by tenant, and the platform admin can view across tenants for analytics and support.
Frontend: On signup, users choose “I’m a teacher/tutor” or “I’m an organization.” After login, they only see and manage their own data.
Payments: Integrated payment processing for subscriptions, class fees, etc.
Notifications: Decentralized messaging for teachers, students, and guardians.
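
As a rough illustration of the tenant filter described above, here is a small in-memory Python sketch. The ClassRecord and CurrentUser types and the visible_classes helper are hypothetical names. In the real backend the same rule would live in the database queries rather than in a list comprehension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassRecord:
    id: int
    tenant_id: str  # the teacher or organization that owns this row
    name: str


@dataclass(frozen=True)
class CurrentUser:
    tenant_id: Optional[str]  # None for the platform admin
    is_platform_admin: bool = False


def visible_classes(user: CurrentUser, rows: list[ClassRecord]) -> list[ClassRecord]:
    """Return only the rows the current user is allowed to see."""
    if user.is_platform_admin:
        # The platform admin can view across tenants for analytics and support.
        return list(rows)
    # Everyone else is restricted to their own tenant.
    return [row for row in rows if row.tenant_id == user.tenant_id]


rows = [
    ClassRecord(1, "tenant-a", "Algebra"),
    ClassRecord(2, "tenant-b", "Biology"),
    ClassRecord(3, "tenant-a", "Chemistry"),
]
teacher = CurrentUser(tenant_id="tenant-a")
platform_admin = CurrentUser(tenant_id=None, is_platform_admin=True)
```

Putting the filter in one function (or one query helper) means a route cannot forget it, which is usually where tenant data leaks come from.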

Next Steps in the Pivot

Refactor the database schema to include tenant_id in all tables and update relationships.
Update the CRUD operations to filter by tenant_id and ensure data isolation.
Revise the authentication flow to support multi-tenancy, including tenant-specific roles and permissions.
Redesign the frontend to allow users to choose their role during signup and manage their own classes/students.
Implement payment processing for class enrollments and subscriptions.
Enhance the admin dashboard to provide insights across tenants and support tenant management.

Final Thoughts

AI is only as good as your foundation.

By building a clean, well-structured SMS, I’ve set myself up for success—whether it’s traditional features or advanced AI integrations. The collaborative, step-by-step approach with my AI assistant kept things clean, focused, and definitely speedy.

After doing this, I realized just how important the fundamentals still are.

So in my next blog, I’ll be flexing my Data Structures and Algorithms skills, along with deeper Database techniques, to show how they still play a vital role before we move on to LLM integration and architecting agentic workflows.

Vibe Coding a Receipt Parser in an Hour (with VLMs + Google Sheets)

Kevin Djabaku Ocansey — Sun, 29 Jun 2025 19:12:08 +0000

Today I Tried Out Vibe Coding For Real.

Without too much manual review or prep, I wanted to stop resisting and depend on GenAI for 96% of the dev work. It wasn’t perfect, but it was fast, lean, and surprisingly powerful. I finished an MVP in about 60 minutes.

The Idea

I’ve been thinking a lot about personal efficiency — the kind that starts at home.

What if I could keep track of the meals I cook, the ingredients I buy, and the patterns that keep showing up?

Not with an app from the Play Store, but with something I designed, shaped by how I actually live.

Because I think that’s where GenAI really shines:
Not just answering questions, but helping you build mental systems for the repetitive tasks in your life. It's like a Siri version of Jarvis?

I came back from the store with a receipt in hand and thought:
"Why enter all this manually when I could just take a picture and let an agent do the work?"

So the raw idea was simple:

Upload an image of my receipt
Extract ingredients and prices
Connect the agent to a spreadsheet to update repetitive information
Manually add info later, like shelf life (how long items lasted) and what meal I made

(Note: this wasn’t the sheet; I changed the order of elements later on.)

ChatGPT: Chief Architect

I made ChatGPT the chief architect. It created a nice README of the plan, filled with mini objectives, and then I used that README as the prompt for Copilot in VS Code.
The README had the skeleton, written to guide Copilot. Because we know what happens when you're not specific about what you want: they go off the leash.

meal-prepping/
├── app/
│   ├── main.py              # FastAPI app: upload, process, submit routes
│   ├── agent/
│   │   ├── ocr.py           # OCR wrapper to extract text from images
│   │   ├── parser.py        # Parse OCR text into structured data
│   │   └── infer.py         # Clean up quantities and prices
│   ├── services/
│   │   └── sheets.py        # Google Sheets integration
│   ├── templates/
│   │   └── review.html      # Review page with editable table
│   └── static/
│       └── styles.css       # Basic styling
├── tests/
│   └── test_parser.py       # Unit tests
├── requirements.txt         # Dependencies
└── README.md                <--- This is where you are

App Flow (as defined by the agent)

[1] Upload receipt image
        ↓
[2] OCR extracts text
        ↓
[3] Parse into structured data
        ↓
[4] Review/edit in table
        ↓
[5] Send to Google Sheets

📦 Dependencies (before OpenAI and dotenv)

fastapi
uvicorn
pillow
easyocr
google-api-python-client
google-auth
python-multipart

Copilot: The Builder

We decided on the requirements and got to work. Copilot had no issues. Its first review comment was about the name of the project and how it conflicted with the information in the README file. That was nothing major. Aside from that, I had no bugs in my first prototype.

OCR Results (First Attempt)

This was the image I consistently used for testing.

After our first test, I noticed that easyocr wasn't giving us good results.

(It got the prices right but couldn't extract the ingredients.)

Switching to VLMs Changed Everything

So I decided to go fully in the direction of VLMs, and it did way better than expected.

It got all the ingredients right, but only about 70% of the prices. I couldn't blame it because, to be fair, the paper was crumpled. Which is actually great, since I have the ability to correct it before we submit it to our spreadsheet.

Submitting to the Spreadsheet

The only issue I had here was that the information being sent did not match the columns. So I cheated a bit: after sending the information, I rearranged the columns to match.
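
An alternative to rearranging the sheet by hand is to put each parsed item into the sheet's column order before sending it. The sketch below is one way to do that in plain Python. The column names in SHEET_COLUMNS and the shape of the parsed items are assumptions, not the project's actual schema.

```python
# Hypothetical column order; it should mirror the header row of the real sheet.
SHEET_COLUMNS = ["date", "item", "quantity", "price"]


def to_rows(items: list[dict], columns: list[str] = SHEET_COLUMNS) -> list[list]:
    """Turn parsed receipt items into rows that follow the sheet's column order.

    Keys missing from an item become empty cells, and extra keys are ignored.
    """
    return [[item.get(column, "") for column in columns] for item in items]


parsed_items = [
    {"price": 2.5, "item": "Tomatoes", "quantity": 6, "date": "2025-06-29"},
    {"item": "Rice", "price": 4.0},
]
rows = to_rows(parsed_items)
```

The resulting list of lists is the shape the Sheets API expects in the "values" field of an append request, so the rows can be passed along without further reshaping.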

PS:
To get the full system working end-to-end — from receipt image to structured spreadsheet — I had to wire up a few things behind the scenes.

Here’s exactly how I did it, so you don’t get stuck:

1. Google Sheets API Setup

You can’t just connect to Google Sheets without letting Google know your app exists.

Here’s the flow:

Go to the Google Cloud Console.
Create a new project (e.g. MealPrepAgent).
Go to APIs & Services → Library and search for:
👉 Google Sheets API
Click Enable to turn it on for your project.

Boom, your project is now allowed to talk to spreadsheets.

2. Get Your Spreadsheet ID

This part is super simple but easy to miss.

Open the Google Sheet you want to connect to. Look at the URL:

https://docs.google.com/spreadsheets/d/<some random figure and text >/edit#gid=0

Copy just the part between /d/ and /edit.

That’s your Spreadsheet ID. Save it in your .env like this:

SPREADSHEET_ID=<some random figure and text >
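
If you would rather paste the whole URL than copy the ID by hand, a small helper can pull it out. This is a minimal sketch. The function name extract_spreadsheet_id and the sample ID used in the example are made up for illustration.

```python
import re

# Matches the segment between /d/ and the next slash in a Google Sheets URL.
SHEET_URL_PATTERN = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> str:
    """Return the spreadsheet ID embedded in a Google Sheets URL."""
    match = SHEET_URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in: {url}")
    return match.group(1)


example_url = "https://docs.google.com/spreadsheets/d/1AbC-dEf_123/edit#gid=0"
spreadsheet_id = extract_spreadsheet_id(example_url)
```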

3. Set Up Your Service Account & Credentials

This is what lets your Python app act like a Google user (with permissions).

Here’s how:

Go back to the Google Cloud Console.
Go to IAM & Admin → Service Accounts.
Click “Create Service Account”
- Name: receipt-agent or whatever
- Permissions: just click through — you don’t need to set roles here
After creating, go to the Keys tab → Add Key → Create new key; Choose JSON
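Download it. This is your credentials.json.
Share your Google Sheet with the service account's email address (the client_email value in credentials.json) and give it Editor access; otherwise the API cannot write to the sheet.
Put credentials.json in your project root. Then, in your .env: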

GOOGLE_CREDENTIALS_PATH=credentials.json

Experimenting with Ai MultiAgents

Kevin Djabaku Ocansey — Wed, 28 May 2025 22:00:44 +0000

After completing the Azure AI Foundry agentic AI challenge, the goal was to experiment with multi-agents. There was a lot to absorb around building and orchestrating agents using Azure and Semantic Kernel. So I decided to experiment with simple model deployments using a chat completion model to gain a deeper understanding, especially of the Azure ecosystem.

I had a bunch of ideas floating around, but after a visit to Nando's, I decided on something to improve their ordering system. Currently, customers scan a QR code on their table, get redirected to a website, place their order, and minutes later their food arrives. It was smooth and efficient.

How could agents make this more flexible?

Hmmm? Something that uses speech?

The upside of using voice as the main interaction is that it feels like talking to a real person. But there's also untapped potential where users will be able to navigate the entire website through speech alone. Scrolling pages, clicking buttons, filling forms—all through voice commands. That’s where this could become something much more powerful.

So I set up a quick FastAPI backend to connect to my model deployment on Azure, scaffolded a simple front end with HTML and CSS, and reached into my creative toolbox.

The image below represents the current pipeline. It's functional, but extremely slow in practice. This is because Azure agent deployments do not yet support voice models directly. That’s a limitation worth noting. Similar to LangChain and Hugging Face, they require integration with external speech-to-text (STT) and text-to-speech (TTS) tools to build voice-enabled applications.

Playing with Voice using p5.js

I’ve had some experience with p5.js from a hackathon. It's super flexible and quick for visual prototyping. So I thought, why not use it to build a voice-driven interface? I also once saw someone build an agent that uses Bland AI to represent them in an interview, and I really liked the effects and UI they used to represent the voice waves.

So I whipped up something simple and similar, a nice responsive round blob that pulses with your voice input using p5.AudioIn() and p5.FFT().

Then I added speech recognition so you could talk to it. As you speak, the waveform animates and words pop up on screen one by one.

Record the speech

  speechRec = new p5.SpeechRec('en-GB', gotSpeech);
  speechRec.continuous = true;
  speechRec.interimResults = false;

  fft = new p5.FFT();
  mic = new p5.AudioIn();
  mic.start();
  fft.setInput(mic);

Next was to give the instructions to the AI agent. Different prompts bring out different outputs.

    query_embedding = embedder.encode(speech).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=5)
    top_chunks = results["documents"][0]
    rag_context = "\n---\n".join(top_chunks)

    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful Nando's assistant. Help the user place an order by asking for:\n"
                "- Side dish\n- Main dish\n- Drink + size\n- Sauce\n- Table number\n- Payment method\n"
                "Use the following menu context to answer accurately:\n" + rag_context +
                "\nReturn the final order in JSON format when ready."
            )
        },
        {
            "role": "user",
            "content": speech
        }
    ]

RAG

ChromaDB was set up locally. This is where the difference between a normal chat completion model and an agent became clear, and with it came a key learning curve.

Beginner Tip: Chat completion is like sending a single message to GPT and getting a reply. Agent deployments give you threads, memory, and advanced tools like function calling.

The distinction between standard model deployments and agents in Azure is that agents offer thread management, which is helpful for tracking state, managing conversation history, and linking file contexts. However, they can also incur additional costs.

To avoid diving in too deep too soon, the initial setup stuck with chat completions, manually managing context until the overall architecture became clearer.
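
Managing context manually with chat completions mostly means keeping the message list yourself and resending it on every call. Here is a minimal sketch of that idea. The helper names (record_turn, build_messages) and the max_turns cut-off are assumptions for illustration, not the project's code.

```python
def record_turn(history: list[dict], user_text: str, assistant_text: str) -> None:
    """Append one completed exchange to the running conversation history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


def build_messages(
    system_prompt: str,
    history: list[dict],
    user_text: str,
    max_turns: int = 4,
) -> list[dict]:
    """Assemble the messages for the next chat completion call.

    Only the last `max_turns` exchanges are kept so the prompt stays small.
    """
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    return [
        {"role": "system", "content": system_prompt},
        *recent,
        {"role": "user", "content": user_text},
    ]


history: list[dict] = []
record_turn(history, "Can I get a wrap?", "Sure, which sauce would you like?")
record_turn(history, "Medium please.", "Got it. Any sides?")
messages = build_messages("You are a helpful Nando's assistant.", history, "Chips.", max_turns=1)
```

This is roughly the bookkeeping that agent threads take off your hands, which is why starting with chat completions is a useful way to see what threads are for.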

Using a Nando’s menu PDF, the content was converted into JSON and flattened for embedding:

Setting up ChromaDB for RAG

# Flatten content into chunks
chunks = []
for item in menu.get("peri_peri_chicken", []):
    chunks.append(f"{item['name']}: {item['description']} - £{item['price']}")

# ... same for burgers, wraps, drinks, etc.

embeddings = embedder.encode(chunks).tolist()
collection.add(
    documents=chunks,
    embeddings=embeddings,
    ids=[f"item-{i}" for i in range(len(chunks))]
)
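
Rather than repeating the loop for each category, the flattening step can be written once for the whole menu. The sketch below assumes the menu JSON is a dictionary that maps each category name to a list of items with name, description, and price keys, as in the snippet above. The sample menu entries and the flatten_menu name are made up.

```python
def flatten_menu(menu: dict) -> list[str]:
    """Turn every item in every category into one text chunk for embedding."""
    chunks = []
    for items in menu.values():
        for item in items:
            chunks.append(f"{item['name']}: {item['description']} - £{item['price']}")
    return chunks


# Hypothetical sample data in the assumed shape.
sample_menu = {
    "peri_peri_chicken": [
        {"name": "Half Chicken", "description": "Flame-grilled", "price": "8.75"},
    ],
    "drinks": [
        {"name": "Lemonade", "description": "Regular size", "price": "2.50"},
    ],
}
chunks = flatten_menu(sample_menu)
ids = [f"item-{i}" for i in range(len(chunks))]
```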

Converting Model reply to audio

After Chroma was set up, the model was more context-aware. It didn't need to hallucinate.

Once input was processed and the AI responded with text, the output needed to be converted to speech to mimic a spoken conversation.

Several TTS tools were explored. While some failed to deliver usable output, gTTS stood out for its simplicity and effectiveness.

Azure Insight: Azure offers audio transcription through its Speech services—separate from chat models. And while GPT-4o preview can accept audio input, it’s not yet fully supported in agents.


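    try:
        import io
        import sys

        from gtts import gTTS

        tts = gTTS(text=text, lang='en', slow=False)
        audio_buffer = io.BytesIO()
        tts.write_to_fp(audio_buffer)
        audio_buffer.seek(0)

        # Write the MP3 bytes to stdout for streaming
        sys.stdout.buffer.write(audio_buffer.read())
    except Exception as exc:
        # Assumed handler (the original excerpt stops before it):
        # report on stderr so stdout stays a clean audio stream
        print(f"TTS failed: {exc}", file=sys.stderr)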

Future Ideas

As I've already mentioned, the next feature planned is keyword-based interaction, for example saying certain phrases to trigger buttons or frontend events. Initially, Hugging Face tools were on the table, but after all this, it's important that I do not mix too many frameworks. It’s clear that using Semantic Kernel is the smarter choice if I decide to stick to Azure's model deployments or agents.

Also, I'm thinking about downloading some open source models locally instead of relying on endpoints. I think that can increase performance as well.

In terms of architecture, the current thinking is this: it’s often better to commit to a single ecosystem unless there’s a clear reason to mix. Semantic Kernel pairs naturally with Azure endpoints, offering memory, RAG, and function calling. Hugging Face, on the other hand, is powerful when fine-tuning, customizing, or hosting models locally. It’s flexible but demands more manual setup.

Biggest learning point?

This can be considered version 1 of the project. So, note to self:

For voice agents, use audio model deployments, not Azure agent deployments (which only support chat-type models for now).
Some voice models like gpt-4o-mini-realtime-preview are available, but I haven’t integrated them yet. That’s probably where the next blog will go.
Voice agents aren’t fully supported yet, but previews show that more is coming.

This started off as a half-baked idea and it’s grown into a real project with legs. The next phase? More performance tests, more model comparisons, and building out the interaction layer.

Keep track of the project on my GitHub:
https://github.com/ocansey11/azure.git

I recently saw a TikTok that shows what I wanted to do:
https://www.tiktok.com/@kyutai_labs/video/7507687261098741015?_t=ZM-8wkuLndLoUH&_r=1