<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Roma</title>
    <description>The latest articles on Forem by Roma (@roman_zh333).</description>
    <link>https://forem.com/roman_zh333</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3845332%2Fc587d0b7-8de3-436c-b258-421627d43458.png</url>
      <title>Forem: Roma</title>
      <link>https://forem.com/roman_zh333</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/roman_zh333"/>
    <language>en</language>
    <item>
      <title>How AI Companion Apps Handle Messaging at Scale: WhatsApp, Telegram, and Beyond</title>
      <dc:creator>Roma</dc:creator>
      <pubDate>Mon, 30 Mar 2026 15:28:29 +0000</pubDate>
      <link>https://forem.com/roman_zh333/how-ai-companion-apps-handle-messaging-at-scale-whatsapp-telegram-and-beyond-4hbc</link>
      <guid>https://forem.com/roman_zh333/how-ai-companion-apps-handle-messaging-at-scale-whatsapp-telegram-and-beyond-4hbc</guid>
      <description>&lt;p&gt;Most AI companion products are self-contained apps. You download, you chat, everything happens inside their walled garden. But a growing subset of the market takes a different approach: building AI companions that live inside existing messaging platforms like WhatsApp and Telegram.&lt;br&gt;
This architectural choice introduces a completely different set of engineering challenges. Here is what it actually looks like under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why build on messaging platforms
&lt;/h2&gt;

&lt;p&gt;The user experience argument is straightforward: people already live inside their messengers. Meeting users where they are, instead of asking them to download another app, reduces friction and increases engagement.&lt;br&gt;
But there are also technical advantages. Messaging platforms handle the entire client-side stack — UI rendering, push notifications, media delivery, offline queuing. Instead of building and maintaining native apps for iOS and Android, you build a backend that communicates through messaging APIs.&lt;br&gt;
The trade-off is control. You cannot customize the chat UI, you are subject to the platform's rate limits and content policies, and you depend on third-party API stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The WhatsApp integration landscape
&lt;/h2&gt;

&lt;p&gt;WhatsApp offers two integration paths, and they serve very different use cases.&lt;br&gt;
The official WhatsApp Business API (through Meta's Cloud API) is designed for businesses sending notifications and handling customer service. It requires business verification, enforces template-based messaging for outbound messages, and charges per conversation. It is not designed for AI companion use cases and the content policies would likely flag this type of application.&lt;br&gt;
The alternative is unofficial API providers. Services like Green API or Evolution API provide WhatsApp integration through web client automation or multi-device protocol implementation. Green API operates as a cloud service — you get an API endpoint, send messages, receive webhooks. Evolution API is self-hosted — you run the infrastructure, which gives more control but requires DevOps work.&lt;br&gt;
The architectural pattern for either approach looks like this:&lt;br&gt;
User sends WhatsApp message to the AI number. The API provider receives it and sends a webhook to your backend. Your backend processes the message through the AI pipeline (orchestration, model inference, memory lookup, response generation). The response is sent back through the API provider to the user's WhatsApp.&lt;br&gt;
Latency management is critical here. WhatsApp users expect near-instant read receipts and responses within seconds. The AI pipeline — especially if it involves multiple model calls — can take 3-10 seconds. Solutions include sending read receipts immediately (before processing), showing "typing" indicators during generation, and streaming responses where the API supports it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Telegram's bot API
&lt;/h2&gt;

&lt;p&gt;Telegram is more developer-friendly for this use case. The Bot API is official, well-documented, free, and explicitly supports conversational bots.&lt;br&gt;
But for AI companions that need to feel like real contacts rather than bots, some platforms use user accounts driven through libraries like GramJS or Telethon instead of the Bot API. A user account can have a profile picture and a status, and it appears in the regular chat list rather than carrying the bot badge.&lt;br&gt;
Automating user accounts this way is technically against Telegram's terms of service, but it is widely practiced. The risk is account suspension, so operators keep backup accounts and rotation strategies ready.&lt;/p&gt;
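&lt;p&gt;The webhook cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's real SDK: the stub client class and its method names are stand-ins for whatever Green API or Evolution API actually expose. The point is the ordering: cheap feedback first, slow inference after.&lt;/p&gt;

```python
class FakeProvider:
    """Stub for a messaging API provider client; method names are illustrative."""
    def __init__(self):
        self.calls = []  # recorded outbound actions, in order

    def mark_read(self, chat_id):
        self.calls.append(("read", chat_id))

    def send_typing(self, chat_id):
        self.calls.append(("typing", chat_id))

    def send_text(self, chat_id, text):
        self.calls.append(("text", chat_id, text))

def handle_incoming(webhook, provider, generate_reply):
    """Process one incoming-message webhook.

    The read receipt and typing indicator go out *before* the slow AI
    pipeline runs, so the user gets instant feedback."""
    chat_id = webhook["chat_id"]
    provider.mark_read(chat_id)              # instant read receipt
    provider.send_typing(chat_id)            # typing indicator while we think
    reply = generate_reply(webhook["text"])  # the 3-10 second AI pipeline
    provider.send_text(chat_id, reply)

provider = FakeProvider()
handle_incoming({"chat_id": "123", "text": "hey"}, provider, lambda t: "hi!")
```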

&lt;h2&gt;
  
  
  The state management challenge
&lt;/h2&gt;

&lt;p&gt;Messaging platform integrations are inherently stateless from the API perspective. Each webhook is an independent HTTP request. But AI companion conversations are deeply stateful — you need to track conversation history, character state, memory, and ongoing context.&lt;br&gt;
The standard architecture uses Redis for hot state (current conversation context, recent messages, active session data) and PostgreSQL or similar for cold state (long-term memory, user profiles, conversation archives).&lt;br&gt;
Each incoming message triggers a pipeline: load hot state from Redis, enrich with relevant cold state from database, run through AI pipeline, update state, return response. The entire cycle needs to complete within the messaging platform's timeout window.&lt;br&gt;
For a platform handling thousands of concurrent conversations, the state management layer is often the bottleneck. Each conversation maintains its own context window, memory index, and character state. Multiplied by thousands of active users, this requires careful memory management and connection pooling.&lt;/p&gt;
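&lt;p&gt;The hot/cold split can be sketched with plain dicts standing in for Redis and PostgreSQL; the shape of the per-message pipeline, not the storage engine, is the point. The field names and the 20-message window are choices made for the sketch, not from any particular platform.&lt;/p&gt;

```python
# In-memory dicts stand in for Redis (hot state) and PostgreSQL (cold state).
hot_store = {}   # chat_id -> session: recent messages, active context
cold_store = {}  # user_id -> long-term memory records

MAX_HOT_MESSAGES = 20  # rolling window kept in hot state

def process_message(chat_id, user_id, text, run_pipeline):
    """Load hot state, enrich with cold state, run the AI pipeline,
    update state, and return the response."""
    session = hot_store.setdefault(chat_id, {"messages": []})
    session["messages"].append({"role": "user", "text": text})
    memories = cold_store.get(user_id, [])            # cold-state enrichment
    reply = run_pipeline(session["messages"], memories)
    session["messages"].append({"role": "ai", "text": reply})
    session["messages"] = session["messages"][-MAX_HOT_MESSAGES:]  # trim window
    return reply
```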

&lt;h2&gt;
  
  
  Proactive messaging architecture
&lt;/h2&gt;

&lt;p&gt;One of the most interesting engineering challenges in messenger-based AI companions is proactive messaging — having the AI reach out to the user without being prompted.&lt;br&gt;
This requires a scheduling system that evaluates when and whether to send a message to each user. Factors include: time since last interaction, time of day in the user's timezone, conversation momentum (was the last exchange engaging or winding down), and character personality (some characters initiate contact more readily than others).&lt;br&gt;
The scheduler typically runs as a separate service, scanning active conversations on a regular interval and queuing proactive messages that pass the evaluation criteria. Rate limiting is essential — too many unprompted messages become spam.&lt;br&gt;
This is where the experience diverges significantly from app-based companions. The AI feels like a real contact in your phone because it behaves like one — messaging when it has something to say, not just when you open an app.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scaling economics
&lt;/h2&gt;

&lt;p&gt;Running AI companions on messaging platforms has a different cost structure than app-based products.&lt;br&gt;
You save on: mobile app development and maintenance, push notification infrastructure, client-side media handling, app store fees (15-30% on in-app purchases).&lt;br&gt;
You spend on: messaging API costs (Green API charges per instance), model inference (unchanged), state management infrastructure, compliance with messaging platform policies.&lt;br&gt;
For early-stage products, the messenger approach is significantly cheaper to launch. No app review process, no client-side bugs across hundreds of device configurations, no app store politics. Ship a backend, connect it to a WhatsApp number, and you are live.&lt;br&gt;
For scale, the economics depend heavily on the messaging API pricing model and your inference costs. The companies getting this right are using cost-efficient models (DeepSeek and similar) for the majority of messages and reserving expensive models for high-complexity interactions.&lt;br&gt;
The messaging-native approach to AI companions is still early. But the engineering patterns are maturing fast, and the user experience advantages are real. If you are building in this space, it is worth evaluating whether you really need your own app — or whether the messenger is the app.&lt;/p&gt;

</description>
      <category>api</category>
      <category>ai</category>
      <category>performance</category>
      <category>startup</category>
    </item>
    <item>
      <title>Building AI Companions That Feel Real: A Technical Deep Dive</title>
      <dc:creator>Roma</dc:creator>
      <pubDate>Mon, 30 Mar 2026 15:13:44 +0000</pubDate>
      <link>https://forem.com/roman_zh333/building-ai-companions-that-feel-real-a-technical-deep-dive-pjb</link>
      <guid>https://forem.com/roman_zh333/building-ai-companions-that-feel-real-a-technical-deep-dive-pjb</guid>
      <description>&lt;p&gt;If you have ever tried to build a chatbot that maintains personality across hundreds of messages, you know the fundamental problem: LLMs have no inherent sense of self.&lt;br&gt;
Every message is generated from the context window. Change the context, and you change the personality. This is fine for assistant-style applications where consistency does not matter. For AI companions - where the user expects a persistent, coherent character - it is the core engineering challenge.&lt;br&gt;
I have been studying how modern AI companion platforms solve this, and the architecture patterns are more interesting than you might expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The character consistency problem
&lt;/h2&gt;

&lt;p&gt;A naive approach to AI companions is straightforward: write a system prompt describing the character, pass it with every API call, hope for the best. This works for about 20 messages before the character starts drifting.&lt;br&gt;
The drift happens because system prompts compete with conversation history for attention in the context window. As the conversation grows, the model weighs recent messages more heavily than the system prompt. Your carefully crafted "sarcastic goth artist who loves cats" gradually becomes a generic helpful assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three categories of solutions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Approach 1: Reinforcement through injection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The simplest mitigation is periodic character reinforcement. Every N messages, inject a hidden system message reminding the model who it is. Some platforms do this every 5-10 turns.&lt;br&gt;
This works but creates a sawtooth pattern in character consistency. The character is strongest right after injection and weakest right before the next one. Observant users notice the oscillation.&lt;/p&gt;
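&lt;p&gt;A minimal sketch of the injection approach. The message-dict shape and the reminder wording are invented for illustration; the interval constant is within the 5-10 turn range mentioned above.&lt;/p&gt;

```python
REINFORCE_EVERY = 8  # turns between hidden reminders; platforms use 5-10

def build_messages(character_prompt, history):
    """Assemble the model input, re-injecting the character prompt every
    N turns so it keeps competing with recent history for attention."""
    messages = [{"role": "system", "content": character_prompt}]
    for i, turn in enumerate(history):
        messages.append(turn)
        if (i + 1) % REINFORCE_EVERY == 0:   # hidden reminder, never shown to the user
            messages.append({"role": "system",
                             "content": "Stay in character: " + character_prompt})
    return messages
```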

&lt;p&gt;&lt;strong&gt;Approach 2: Multi-layer prompting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;More sophisticated platforms use a layered prompt architecture. Instead of one system prompt, they maintain several layers.&lt;br&gt;
Layer 1: Core identity (never changes) - fundamental personality traits, values, speaking patterns.&lt;br&gt;
Layer 2: Relationship state (updates per session) - how the character feels about the user based on conversation history, current emotional dynamic.&lt;br&gt;
Layer 3: Context window management - a summarizer that compresses old conversation into character-relevant highlights, preserving information that matters for personality consistency while discarding generic exchanges.&lt;br&gt;
Layer 4: Behavioral rules - guardrails and response patterns that keep the character within bounds.&lt;br&gt;
This multi-layer approach produces dramatically better consistency because each layer serves a different function and can be optimized independently.&lt;/p&gt;
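&lt;p&gt;One way the layers can be composed into a single system prompt. The dict keys and the connective phrasing are invented for this sketch; only the layering itself comes from the description above.&lt;/p&gt;

```python
def assemble_prompt(layers, summary):
    """Compose a system prompt from the four layers described above."""
    parts = [layers["core_identity"]]                                      # layer 1: fixed
    parts.append("Current relationship: " + layers["relationship_state"])  # layer 2: per session
    if summary:                                                            # layer 3: compressed history
        parts.append("Earlier in this conversation: " + summary)
    parts.append("Rules: " + layers["behavioral_rules"])                   # layer 4: guardrails
    return "\n\n".join(parts)
```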

&lt;p&gt;&lt;strong&gt;Approach 3: The orchestrator pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most advanced architecture I have seen uses a separate orchestrator model that sits between the user and the character model. The orchestrator analyzes each user message, determines the appropriate response strategy, selects the right combination of prompt layers, and routes the request accordingly.&lt;/p&gt;

&lt;p&gt;For example, if the user sends a casual message, the orchestrator might use a lighter prompt configuration. If the user sends something emotionally charged, it switches to a configuration that emphasizes the character's emotional depth. If the conversation is heading toward a topic the character has strong opinions about, it loads the relevant personality modules.&lt;/p&gt;

&lt;p&gt;One implementation of this pattern that I found documented is the approach used by &lt;a href="https://tooshy.ai/blog/how-ai-girlfriends-work" rel="noopener noreferrer"&gt;TooShy&lt;/a&gt; - they describe a multi-layer strategist system that dynamically adjusts the model's behavior based on conversation context. The orchestrator pattern is powerful because it allows a relatively simple character model to produce complex, context-appropriate responses.&lt;/p&gt;
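&lt;p&gt;The routing decision can be caricatured as a function from message to configuration. Real orchestrators are small models, not keyword lists; the word sets and configuration names below exist only to keep the sketch self-contained and runnable.&lt;/p&gt;

```python
EMOTIONAL_WORDS = {"miss", "lonely", "scared", "love", "upset", "cry"}

def choose_config(user_message, hot_topics):
    """Toy orchestrator: map an incoming message to a prompt configuration.
    A production orchestrator is itself a classifier model."""
    words = set(user_message.lower().split())
    if words & EMOTIONAL_WORDS:
        return "emotional_depth"    # heavier configuration for charged messages
    if words & hot_topics:
        return "opinion_modules"    # load topic-specific personality modules
    return "light"                  # cheap default for casual chat
```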

&lt;h2&gt;
  
  
  The memory architecture
&lt;/h2&gt;

&lt;p&gt;Consistency across sessions requires persistent memory. The standard approaches are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector store with semantic search&lt;/strong&gt; - each conversation turn is embedded and stored. When generating a response, relevant past interactions are retrieved and injected into context. Works well for factual recall ("what is the user's job") but poorly for emotional continuity ("how did the user feel last time we talked about their family").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured memory with categories&lt;/strong&gt; - instead of raw conversation storage, extract specific memory types: facts about the user, emotional events, relationship milestones, user preferences. Store these in structured format and inject relevant ones per conversation.&lt;br&gt;
&lt;strong&gt;Hybrid approach&lt;/strong&gt; - combine vector search for general recall with structured memory for high-importance information. Add a decay function so older, less-referenced memories fade while frequently-accessed ones stay prominent.&lt;/p&gt;
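&lt;p&gt;A scoring function for the hybrid approach might look like the following. The memory fields, the tag-overlap relevance measure, and the 30-day decay constant are all assumptions made for the sketch.&lt;/p&gt;

```python
import math
import time

def score_memory(memory, query_terms, now):
    """Rank a structured memory for retrieval: tag overlap plus stored
    importance, damped by exponential decay since last access."""
    overlap = len(query_terms & set(memory["tags"]))
    age_days = (now - memory["last_accessed"]) / 86400
    return (overlap + memory["importance"]) * math.exp(-age_days / 30)

def retrieve(memories, query_terms, now, k=3):
    """Inject only the top few memories: flooding the context window
    re-creates the drift problem retrieval was meant to solve."""
    ranked = sorted(memories, key=lambda m: score_memory(m, query_terms, now),
                    reverse=True)
    return ranked[:k]
```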

&lt;p&gt;The anti-pattern to avoid is storing everything and retrieving too much. Flooding the context window with old conversation data dilutes the character prompt and causes the same drift problem you were trying to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Output quality control
&lt;/h2&gt;

&lt;p&gt;Even with perfect character consistency and memory, the raw model output needs processing. Common post-processing steps include:&lt;br&gt;
&lt;strong&gt;Length normalization&lt;/strong&gt; - preventing the model from writing essays when a one-line response is appropriate.&lt;br&gt;
&lt;strong&gt;Repetition detection&lt;/strong&gt; - catching when the model falls into repetitive patterns ("That's interesting! Tell me more!" syndrome).&lt;br&gt;
&lt;strong&gt;Character voice validation&lt;/strong&gt; - checking that the response matches the character's established vocabulary and speech patterns.&lt;br&gt;
&lt;strong&gt;Emotional tone matching&lt;/strong&gt; - ensuring the response's emotional register is appropriate for the conversation context.&lt;/p&gt;
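&lt;p&gt;Repetition detection is the easiest of these to sketch. Token-set overlap with a fixed threshold is a deliberately crude stand-in for whatever similarity measure a real pipeline would use.&lt;/p&gt;

```python
from collections import deque

class RepetitionGuard:
    """Flag outputs that are near-duplicates of recent replies (the
    "That's interesting! Tell me more!" syndrome)."""
    def __init__(self, window=10, threshold=0.8):
        self.recent = deque(maxlen=window)   # token sets of recent replies
        self.threshold = threshold

    def is_repetitive(self, text):
        tokens = set(text.lower().split())
        for prior in self.recent:
            overlap = len(tokens & prior) / max(len(tokens | prior), 1)
            if overlap >= self.threshold:
                return True                  # too close to a recent reply: regenerate
        self.recent.append(tokens)
        return False
```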

&lt;p&gt;Some platforms run a lightweight classifier on every output to score it against the character profile before sending it to the user. If the score is too low, they regenerate. This adds latency but significantly improves quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deployment reality
&lt;/h2&gt;

&lt;p&gt;Building all of this is one thing. Running it at scale is another. Each conversation requires multiple model calls (orchestrator + character + post-processing), persistent storage for memory, and real-time state management for active conversations.&lt;/p&gt;

&lt;p&gt;The cost optimization strategies are their own engineering challenge. Smaller models for orchestration and classification, larger models for actual conversation generation. Caching common response patterns. Batching memory updates instead of processing them per-message.&lt;/p&gt;

&lt;p&gt;If you are building in this space, the technical moat is not the model - everyone has access to good models now. The moat is the orchestration layer, the memory architecture, and the quality control pipeline. That is where the engineering complexity lives, and where the user experience is won or lost.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How Messaging Apps Became the Next Platform for AI</title>
      <dc:creator>Roma</dc:creator>
      <pubDate>Thu, 26 Mar 2026 22:46:15 +0000</pubDate>
      <link>https://forem.com/roman_zh333/how-messaging-apps-became-the-next-platform-for-ai-25mi</link>
      <guid>https://forem.com/roman_zh333/how-messaging-apps-became-the-next-platform-for-ai-25mi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl5koy0g575zradc127j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl5koy0g575zradc127j.png" alt=" " width="800" height="1071"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a pattern in tech that keeps repeating: the most impactful products don't create new behaviors — they embed themselves into existing ones.&lt;/p&gt;

&lt;p&gt;Email didn't replace letters by being better letters. It replaced them by living where people already worked. Mobile apps didn't replace websites by being better websites. They replaced them by living where people already looked.&lt;/p&gt;

&lt;p&gt;AI companions are following the same pattern. And the platform they're embedding into? Your messaging apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why dedicated AI apps hit a ceiling
&lt;/h2&gt;

&lt;p&gt;Every AI companion platform faces the same growth problem: you need users to download a new app, create an account, build a habit, and keep coming back. Each step loses 50-70% of potential users.&lt;/p&gt;

&lt;p&gt;The funnel looks something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hears about product: 100%&lt;/li&gt;
&lt;li&gt;Visits website: 30%&lt;/li&gt;
&lt;li&gt;Downloads app: 10%&lt;/li&gt;
&lt;li&gt;Creates account: 7%&lt;/li&gt;
&lt;li&gt;Has first conversation: 5%&lt;/li&gt;
&lt;li&gt;Returns next day: 2%&lt;/li&gt;
&lt;li&gt;Still active after 30 days: 0.5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a product problem — it's a platform problem. Dedicated apps compete for attention against every other app on your phone. And attention is finite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The messaging integration thesis
&lt;/h2&gt;

&lt;p&gt;What if the AI lived in an app you already open 50+ times a day?&lt;/p&gt;

&lt;p&gt;WhatsApp has 2.7 billion monthly active users. Telegram has 900 million. These aren't apps people need to be convinced to open — they're already there, all day, every day.&lt;/p&gt;

&lt;p&gt;An AI companion on WhatsApp doesn't need to fight for a spot on your home screen. It doesn't need push notification permission. It doesn't need you to build a new habit. It's just another conversation in your existing message list.&lt;/p&gt;

&lt;p&gt;The retention numbers reflect this. AI companions on messaging platforms typically see 3-5x higher day-30 retention compared to dedicated apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical architecture for messaging AI
&lt;/h2&gt;

&lt;p&gt;Building AI on top of messaging platforms introduces interesting architectural challenges:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Session management without sessions
&lt;/h2&gt;

&lt;p&gt;Traditional chatbots have sessions — discrete conversations with a beginning and end. Messaging apps don't. A conversation on WhatsApp is a continuous thread that might span months.&lt;/p&gt;

&lt;p&gt;This means your AI needs persistent state management. Every message arrives in the context of the entire conversation history. The system needs to efficiently retrieve relevant context without loading thousands of messages into memory.&lt;/p&gt;

&lt;p&gt;A common pattern: maintain a rolling context window (last N messages) plus a semantic search index over the full history. When a message arrives, combine recent context with semantically relevant older messages to build the prompt.&lt;/p&gt;
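&lt;p&gt;That pattern fits in a few lines. Here &lt;code&gt;search_index&lt;/code&gt; is any callable ranking older messages by relevance; in production it would be a vector store query, and the window sizes are illustrative.&lt;/p&gt;

```python
def build_context(history, search_index, query, recent_n=12, retrieved_k=4):
    """Combine a rolling window of recent messages with semantically
    relevant older ones."""
    recent = history[-recent_n:]
    older = history[:-recent_n] if len(history) > recent_n else []
    retrieved = search_index(query, older)[:retrieved_k]
    # retrieved history goes first, labelled, so the model reads it as recall
    recalls = [{"role": "system", "content": "Relevant history: " + m["text"]}
               for m in retrieved]
    return recalls + recent
```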

&lt;h2&gt;
  
  
  2. Asynchronous by nature
&lt;/h2&gt;

&lt;p&gt;In a dedicated app, you control the UX. Response time, typing indicators, read receipts — all customizable. On WhatsApp or Telegram, you're constrained by the platform's UX.&lt;/p&gt;

&lt;p&gt;This is actually an advantage. Messaging apps have built-in affordances for asynchronous communication: typing indicators, delivery receipts, "last seen" timestamps. Users already expect variable response times in messaging. An AI that takes 3-5 seconds to respond feels natural in WhatsApp but painfully slow in a dedicated chat UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Rich media is native
&lt;/h2&gt;

&lt;p&gt;Modern messaging platforms support images, voice messages, stickers, reactions, location sharing, and more. An AI companion on WhatsApp can send a voice note, share a photo, or react with an emoji — all using native platform features.&lt;/p&gt;

&lt;p&gt;This creates a much richer interaction model than text-only AI interfaces. The AI can "see" images users send (via vision models), respond with voice (via TTS), and share relevant images — all feeling native to the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Delivery guarantees and state
&lt;/h2&gt;

&lt;p&gt;Messaging platforms handle delivery reliability. If the user's phone is offline, WhatsApp queues the message. Read receipts tell you whether the user has seen your response. This information is valuable for AI behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the user read my last 3 messages without responding? Maybe I should stop sending.&lt;/li&gt;
&lt;li&gt;Did they come back after 3 days? Acknowledge the gap naturally.&lt;/li&gt;
&lt;/ul&gt;
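&lt;p&gt;Those two rules amount to a small state-to-behavior mapping. The thresholds below are illustrative, not taken from any production system.&lt;/p&gt;

```python
def next_action(outbound_unread, days_since_reply):
    """Map delivery/read state to AI behavior."""
    if outbound_unread >= 3:
        return "hold"              # three unanswered messages: stop sending
    if days_since_reply >= 3:
        return "acknowledge_gap"   # the user returned after days away
    return "normal"
```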

&lt;h2&gt;
  
  
  5. Multi-device considerations
&lt;/h2&gt;

&lt;p&gt;WhatsApp Web, Telegram Desktop — users access messaging from multiple devices. Your AI's webhook receiver needs to handle deduplication and maintain consistent state across these touchpoints.&lt;/p&gt;
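&lt;p&gt;Deduplication at the webhook receiver can be as simple as a bounded LRU of seen message ids. The capacity is arbitrary; the structure is what matters, since multi-device setups and provider retries can deliver the same event twice.&lt;/p&gt;

```python
from collections import OrderedDict

class Deduplicator:
    """Drop webhook events whose message id was already processed.
    Bounded LRU so memory stays flat under load."""
    def __init__(self, capacity=10000):
        self.seen = OrderedDict()
        self.capacity = capacity

    def is_new(self, message_id):
        if message_id in self.seen:
            self.seen.move_to_end(message_id)
            return False
        self.seen[message_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)    # evict least-recently-seen id
        return True
```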

&lt;h2&gt;
  
  
  The integration layer
&lt;/h2&gt;

&lt;p&gt;Most messaging platform integrations use one of two approaches:&lt;/p&gt;

&lt;p&gt;Official API — WhatsApp Business API, Telegram Bot API. Clean, sanctioned, limited. Good for business use cases but often restricted for companion-style interactions.&lt;/p&gt;

&lt;p&gt;Protocol-level integration — Libraries like GramJS (Telegram) or unofficial WhatsApp bridges. More capabilities but more fragile. Requires careful management of connections, sessions, and rate limits.&lt;/p&gt;

&lt;p&gt;The ideal architecture often combines both: official APIs for reliability where possible, protocol-level access for features that official APIs don't support.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I've learned
&lt;/h2&gt;

&lt;p&gt;After spending months in this space, a few lessons stand out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency matters more than quality. A good response in 2 seconds beats a great response in 10 seconds. People expect messaging to feel real-time.&lt;/li&gt;
&lt;li&gt;Proactive messaging is the killer feature. AI that texts first — a good morning message, a check-in, a random thought — drives engagement more than any model improvement.&lt;/li&gt;
&lt;li&gt;Platform constraints are features. Being limited to WhatsApp's UX forces simplicity. No buttons, no carousels, no complex UI — just conversation. This is actually what makes it feel real.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The messaging platform era of AI is just beginning. And I think it's going to be the one that makes AI companionship mainstream.&lt;/p&gt;

</description>
      <category>whatsapp</category>
      <category>telegram</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
