<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Muhammad Taqi</title>
    <description>The latest articles on Forem by Muhammad Taqi (@__aki).</description>
    <link>https://forem.com/__aki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3807655%2Faa55dafd-7fcb-4631-a6f1-aaffd023f777.jpg</url>
      <title>Forem: Muhammad Taqi</title>
      <link>https://forem.com/__aki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/__aki"/>
    <language>en</language>
    <item>
      <title>I Built a Group Chat Where Multiple AIs Can Talk to You at the Same Time</title>
      <dc:creator>Muhammad Taqi</dc:creator>
      <pubDate>Sat, 09 May 2026 17:32:27 +0000</pubDate>
      <link>https://forem.com/__aki/i-built-a-group-chat-where-multiple-ais-can-talk-to-you-at-the-same-time-1m54</link>
      <guid>https://forem.com/__aki/i-built-a-group-chat-where-multiple-ais-can-talk-to-you-at-the-same-time-1m54</guid>
      <description>&lt;p&gt;There's a specific habit I picked up early as an AI developer that probably sounds familiar if you've spent any real time working with language models.&lt;br&gt;
You come across a problem. You want to know how different models think about it. So you open one tab for ChatGPT, another for Claude, maybe a third for Gemini. You paste the same prompt three times. You wait. Then you manually read through three separate conversations trying to piece together which perspective makes the most sense.&lt;br&gt;
It works. But it's tedious in a way that starts to bother you after the hundredth time.&lt;br&gt;
I kept thinking there had to be a better way to do this. Not a comparison tool with a split screen and rigid inputs, but something that actually felt like a conversation. Something where you could just talk, and multiple models would respond naturally, in the same thread, the same way your friends would in a group chat.&lt;br&gt;
That thought turned into Kōl.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;The concept is simple. You create a room. You add whichever AI models you want as members, and you can also invite actual people. Friends, teammates, whoever. Then you send a message and everyone in the room responds, humans and AIs together, in the same conversation thread.&lt;br&gt;
It's not a side-by-side comparison UI. It's not a dashboard with dropdowns. It's a chat room. The AIs are members the same way your friends are members. The conversation flows naturally and you can follow up, go deeper, push back on one response, and watch how everyone else handles the same thread.&lt;br&gt;
But the part that makes it genuinely different from a simple multi-model wrapper is what happens over time. Each room builds its own memory. The longer a room exists, the more context it carries. The AIs in that room don't just know what was said five minutes ago. They know what your room has been about for weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;The frontend is Next.js. The backend is Express on Node.js. The real-time layer runs on Socket.io. AI models are pulled from Groq, LongCat, and Gemini, giving a mix of speed and personality across the responses.&lt;br&gt;
One decision I made early that shaped the whole feel of the product: responses are not streamed token by token. When you send a message, a typing indicator appears for each AI member in the room, just like when a real person is composing a reply. Then the full response arrives when it's ready.&lt;br&gt;
That single choice changed everything about how the product feels. It stopped feeling like a tool and started feeling like a conversation. The typing animation creates the same anticipation you get when a friend is actually thinking through what to say. You're not watching text generate, you're waiting for a response. The distinction sounds small but experientially it's completely different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory System
&lt;/h2&gt;

&lt;p&gt;This is the part I'm most proud of and the part that took the most thinking to get right.&lt;br&gt;
Every room in Kōl builds its own memory over time. There's a background model that runs silently, watching the conversations happening in each room. As messages accumulate, it processes them and builds a growing summary of what that room is about. What topics come up. What decisions were made. What the people in that room care about. What was discussed last Tuesday.&lt;br&gt;
That summary isn't just stored somewhere. It gets fed back to the AI members as context every time someone sends a new message. So when you come back to a room after a few days and ask something, the AIs aren't starting from scratch. They already know the history of that room. They respond with the weight of everything that's been discussed before.&lt;br&gt;
The effect is hard to describe until you experience it. It stops feeling like you're querying a model and starts feeling like you're talking to someone who was there for the whole conversation and actually remembers it.&lt;br&gt;
The background model running and maintaining memory independently was an interesting architectural challenge. It needs to process conversations without interrupting them, update the summary incrementally as new messages come in, and make sure the context it produces is actually useful rather than just a raw dump of everything ever said. Getting that balance right took a few iterations.&lt;/p&gt;
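&lt;p&gt;The shape of that system looks roughly like this. To be clear, this is a minimal sketch under my own assumptions, not the production implementation: &lt;code&gt;summarize&lt;/code&gt; stands in for the background model call, and the key properties are that recording a message never blocks the conversation and that the summary is updated incrementally from batches rather than rebuilt from the full transcript.&lt;/p&gt;

```javascript
// Per-room memory: a background job folds new messages into a running
// summary instead of storing a raw dump of everything ever said.
function createRoomMemory(summarize) {
  let summary = "";
  let pending = [];
  return {
    record(message) {
      // Cheap and synchronous: the chat never waits on the summarizer.
      pending.push(message);
    },
    async compact() {
      // Runs in the background on some cadence (timer, message count, etc.).
      if (pending.length === 0) return summary;
      const batch = pending;
      pending = [];
      summary = await summarize(summary, batch); // incremental update
      return summary;
    },
    context() {
      // Fed to every AI member alongside each new message.
      return summary;
    },
  };
}
```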

&lt;h2&gt;
  
  
  The Routing Problem That Almost Broke Everything
&lt;/h2&gt;

&lt;p&gt;With multiple rooms active and multiple models responding simultaneously, socket events were landing in the wrong rooms. It was one of those bugs that makes the whole product look broken when the actual problem is much smaller than it appears. It took a few broken UIs to track down, but once I saw it clearly, it was a straightforward fix.&lt;/p&gt;
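&lt;p&gt;The class of fix involved is worth showing, since it bites almost everyone building multi-room real-time apps. This is an illustrative stand-in, not the actual patch: the pattern is to key delivery by room ID instead of broadcasting to every connection. In Socket.io terms, that means &lt;code&gt;socket.join(roomId)&lt;/code&gt; when a member connects and &lt;code&gt;io.to(roomId).emit(...)&lt;/code&gt; when broadcasting, never a bare &lt;code&gt;io.emit(...)&lt;/code&gt;.&lt;/p&gt;

```javascript
// Minimal room-scoped delivery. Plain functions stand in for sockets.
function createRouter() {
  const rooms = new Map(); // roomId -> Set of member handlers

  return {
    join(roomId, handler) {
      if (!rooms.has(roomId)) rooms.set(roomId, new Set());
      rooms.get(roomId).add(handler);
    },
    emit(roomId, event, payload) {
      // Deliver to this room's members only, never to every connection.
      const members = rooms.get(roomId);
      if (!members) return 0;
      for (const handler of members) handler(event, payload);
      return members.size;
    },
  };
}
```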

&lt;h2&gt;
  
  
  What It Changed
&lt;/h2&gt;

&lt;p&gt;The thing I didn't fully expect was how much it changed the way I interact with AI models in general.&lt;br&gt;
When you're bouncing between tabs, you naturally anchor to whichever response you read first. It frames how you think about the problem and everything after gets filtered through that lens. You're not really comparing anymore, you're just looking for confirmation.&lt;br&gt;
When responses arrive in the same thread alongside each other and alongside what your actual friends are saying, that bias goes away. You process them more like genuine perspectives in a real conversation. It's a small shift but it changes how you evaluate what you're reading.&lt;br&gt;
And with the memory layer on top of that, the room starts to feel like a place rather than a session. Something that has history. Something you can come back to.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;The thing I'd rethink most is how the background model picks which AIs respond. There's already an LLM running behind the scenes that decides which models should reply to a given message, which is the right approach, but the decision making could be smarter. I'd want it to factor in more context: the tone of the message, which model has been most active, what the room's history looks like. That way participation feels more organic and less predictable. In a real group chat people don't all chime in with the same energy every time. The models shouldn't either.&lt;br&gt;
I'd also improve the way responses are ordered when multiple models reply around the same time. Right now they arrive based on which API responded first. Giving more control over that, or making it feel more natural, is something worth thinking about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Open Sourced It
&lt;/h2&gt;

&lt;p&gt;Kōl started as something I built for myself. It solved a real problem I had and I wanted to see if it solved the same problem for other people.&lt;br&gt;
Open sourcing it felt like the natural next step. If someone wants to fork it, build on it, or just dig into how the memory layer or the real-time routing works, it's all there. And honestly, seeing how other developers approach the same kinds of problems is worth more to me than keeping it closed.&lt;br&gt;
If you build something with it or run into something interesting, I'd genuinely like to hear about it.&lt;/p&gt;

&lt;p&gt;Here's the repo: &lt;a href="https://github.com/m-taqii/kol" rel="noopener noreferrer"&gt;https://github.com/m-taqii/kol&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built an AI Agent That Talks to My Website Visitors So I Don't Have to Miss Leads Anymore</title>
      <dc:creator>Muhammad Taqi</dc:creator>
      <pubDate>Thu, 05 Mar 2026 13:12:11 +0000</pubDate>
      <link>https://forem.com/__aki/i-built-an-ai-agent-that-talks-to-my-website-visitors-so-i-dont-have-to-miss-leads-anymore-41cd</link>
      <guid>https://forem.com/__aki/i-built-an-ai-agent-that-talks-to-my-website-visitors-so-i-dont-have-to-miss-leads-anymore-41cd</guid>
      <description>&lt;p&gt;There's a specific type of frustration I kept running into as a freelance developer. Someone would land on my site, browse around for a few minutes, and then leave. No message, no inquiry, nothing. And I'd never know what they wanted or if they were even close to reaching out.&lt;/p&gt;

&lt;p&gt;The ones who did reach out were great. But I started thinking about the ones who didn't. Some people genuinely don't like filling out contact forms. It feels formal, one-sided, a bit awkward. You're essentially writing a cold email to someone whose work you just discovered. A lot of potential clients just bounce instead of going through that friction.&lt;/p&gt;

&lt;p&gt;So I started thinking about what the middle ground looks like. Something between "browse silently and leave" and "fill out a form and wait." That middle ground turned out to be a conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;I wanted to build something that could talk to visitors the way I would if I were sitting right there with them. Answer their questions naturally. Tell them about what I do, how I work, what kinds of projects I take on. Not in a scripted, button-clicking chatbot way, but in an actual back and forth exchange.&lt;/p&gt;

&lt;p&gt;And if at some point in that conversation the visitor started showing genuine interest in working together, the agent would smoothly ask for their contact details, put together a description of what they're looking for, and send it straight to me. No forms. No friction. Just a conversation that ends with a qualified lead in my inbox.&lt;/p&gt;

&lt;p&gt;If you want to see what I mean before reading further, you can go talk to it yourself at &lt;a href="https://iamtaqi.site" rel="noopener noreferrer"&gt;iamtaqi.site&lt;/a&gt;; it's live on the site right now.&lt;/p&gt;

&lt;p&gt;The key thing I was careful about from the start: it should never push. If someone just has a general question, it answers it. If someone wants to know how I approach a certain type of project, it tells them. It only asks for contact info when the person themselves signals they want to move forward. That distinction mattered a lot to me.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;The stack I used was LangChain and LangGraph on the backend, with a RAG pipeline powering the knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The RAG layer&lt;/strong&gt; is what makes the agent actually useful. RAG stands for Retrieval Augmented Generation, and the short version is this: instead of the AI making things up or relying on general training data, it retrieves real information from a specific knowledge base before forming a response. In this case, that knowledge base is everything about me and my work, my services, my process, the kinds of projects I've done, my tech stack, how I approach client relationships, turnaround times, all of it.&lt;/p&gt;

&lt;p&gt;When a visitor asks something like "do you work with React?" or "have you built anything with AI before?", the agent doesn't guess. It goes and pulls the relevant context from the knowledge base and builds its answer from that. This means the responses are accurate, grounded, and actually reflect what I do rather than what a generic AI thinks I might do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; is what handles the flow of the conversation. Think of it as the brain that decides what state the conversation is in and what should happen next. Is this person still in discovery mode? Are they asking something the knowledge base can answer? Have they expressed enough interest that it makes sense to ask for their details? LangGraph manages all of that through a graph of nodes and conditional edges, so the agent always knows where it is in the conversation and what the right next move is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt; sits underneath all of this and handles the actual language model interactions, the prompt construction, and the memory that lets the agent remember what was said earlier in the conversation.&lt;/p&gt;

&lt;p&gt;The flow roughly looks like this: visitor sends a message, the agent retrieves relevant context from the knowledge base using semantic search, constructs a response that feels natural and on-brand, and then based on the conversation state, decides whether to just answer, ask a follow up, or gently transition to collecting contact info if the visitor seems interested.&lt;/p&gt;

&lt;p&gt;When it does collect contact info, it also generates a short project brief based on everything discussed in the conversation. So by the time I get the notification, I already have the person's name, contact details, and a summary of what they're looking for. I can reply with full context from the first message.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Changed
&lt;/h2&gt;

&lt;p&gt;The most obvious thing is that I stopped losing warm leads. People who would have bounced after a minute of browsing now have a reason to stick around and engage. And because the conversation is low pressure and genuinely helpful, the quality of the people who do end up reaching out is better too. They've already talked through their project, they know what I do, and they've self-selected.&lt;/p&gt;

&lt;p&gt;The other thing I didn't fully anticipate is how much it communicates about the kind of work I do. If someone comes to my site wondering if I'm serious about AI development, and then they immediately have a sophisticated AI conversation right there on the page, that answers the question pretty effectively.&lt;/p&gt;

&lt;p&gt;It also handles the common questions I used to answer repeatedly over email. Stuff like tech stack questions, availability, what kinds of industries I've worked with. The agent takes care of all of that so by the time someone reaches me directly, we're past the intro stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were building this again I'd invest more time upfront in structuring the knowledge base really carefully. The quality of what the RAG system retrieves is directly tied to how well the source documents are written and organized. Early on I had some responses that were slightly off because the relevant information was buried in a long document and the retrieval wasn't surfacing it cleanly. Taking the time to write focused, well-structured knowledge documents made a significant difference.&lt;/p&gt;

&lt;p&gt;I'd also build in more analytics from day one. Right now I can see conversations that converted into leads, but I'd like to have better visibility into the questions people are asking that the agent couldn't fully answer. That's actually really valuable product feedback if you look at it the right way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;This project started from a pretty simple observation: people don't always want to fill out a form, but that doesn't mean they don't want to connect. Giving them a conversation instead of a form removed a friction point that I didn't even realize was costing me leads.&lt;/p&gt;

&lt;p&gt;If you're a freelancer or running a small agency and you've ever wondered how many visitors left your site without reaching out, this is worth thinking about. The technology to build something like this is genuinely accessible now, and the ROI on not missing a single interested visitor adds up fast.&lt;/p&gt;

&lt;p&gt;Happy to answer any questions about the build in the comments. And if you want to see it in action, just head over to &lt;a href="https://iamtaqi.site" rel="noopener noreferrer"&gt;iamtaqi.site&lt;/a&gt; and start a conversation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>programming</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
