<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yusuf Elnady</title>
    <description>The latest articles on Forem by Yusuf Elnady (@yelnady).</description>
    <link>https://forem.com/yelnady</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825271%2F7af686f0-17df-4a9a-b75a-71d85cd7a984.jpg</url>
      <title>Forem: Yusuf Elnady</title>
      <link>https://forem.com/yelnady</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yelnady"/>
    <language>en</language>
    <item>
      <title>We Built a Free Islamic School Management System — Here's Everything It Does</title>
      <dc:creator>Yusuf Elnady</dc:creator>
      <pubDate>Thu, 07 May 2026 10:30:28 +0000</pubDate>
      <link>https://forem.com/yelnady/we-built-a-free-islamic-school-management-system-heres-everything-it-does-4lg1</link>
      <guid>https://forem.com/yelnady/we-built-a-free-islamic-school-management-system-heres-everything-it-does-4lg1</guid>
      <description>&lt;p&gt;Most Islamic schools in the US run on spreadsheets, WhatsApp groups, and prayer. Teachers track Quran memorization in notebooks. Admins chase tuition payments over text message. Parents have no idea if their child showed up to class until they ask.&lt;/p&gt;

&lt;p&gt;We built &lt;strong&gt;Qaf School App&lt;/strong&gt; to fix that — and we made it completely free.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Qaf School App&lt;/strong&gt; is a full-stack Islamic school management system purpose-built for Quran schools, weekend Islamic programs, and madrasa-style education. It covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time attendance with instant parent notifications&lt;/li&gt;
&lt;li&gt;Homework tracking for all 114 Surahs&lt;/li&gt;
&lt;li&gt;Quran audio player and recording submissions&lt;/li&gt;
&lt;li&gt;Financial management (tuition, expenses, donations)&lt;/li&gt;
&lt;li&gt;A gamified stars &amp;amp; trophies system for students&lt;/li&gt;
&lt;li&gt;WhatsApp-style parent–teacher messaging&lt;/li&gt;
&lt;li&gt;Substitute teacher matching&lt;/li&gt;
&lt;li&gt;6 languages including full Arabic RTL support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Completely free. No subscription. No payment processing fees.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem: Islamic Schools Are Underserved by EdTech
&lt;/h2&gt;

&lt;p&gt;Generic school management tools (PowerSchool, ClassDojo, Google Classroom) weren't designed for Islamic education. They don't know what Surah Al-Baqarah is. They have no concept of Hifz classes or Tajweed levels. They can't track whether a student memorized verses 1–5 or 6–10 this week.&lt;/p&gt;

&lt;p&gt;Islamic schools have unique needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quran memorization (Hifz) with verse-level tracking&lt;/li&gt;
&lt;li&gt;Multiple class types: QRN, ARA, ISL, TJW, TFS, HFZ, NQ, KGN&lt;/li&gt;
&lt;li&gt;Mixed-language families (Arabic, Somali, Turkish, French)&lt;/li&gt;
&lt;li&gt;Volunteer teachers who rotate through substitute roles&lt;/li&gt;
&lt;li&gt;Community-run finances: Venmo, Zelle, cash, check&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we built something that actually fits.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Qaf App Does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Real-Time Attendance — Parents Know Instantly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupovuhdedgubj7on7s92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupovuhdedgubj7on7s92.png" alt="Attendance tracking" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Teachers mark attendance with a single tap: &lt;strong&gt;Present&lt;/strong&gt;, &lt;strong&gt;Late&lt;/strong&gt;, or &lt;strong&gt;Absent&lt;/strong&gt;. The moment a student is marked, parents receive a push notification on their phone.&lt;/p&gt;

&lt;p&gt;No more "Did my child go to class today?" texts to the teacher.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Three-status system (Present / Late / Absent) with timestamps&lt;/li&gt;
&lt;li&gt;Flexible editing — teachers can update attendance anytime&lt;/li&gt;
&lt;li&gt;Weekly attendance grid for admins&lt;/li&gt;
&lt;li&gt;Absence request system — parents submit advance notices through the app&lt;/li&gt;
&lt;li&gt;Daily 3 PM reminder to teachers for pending submissions&lt;/li&gt;
&lt;li&gt;Attendance analytics: absence patterns, low-attendance days, holiday identification&lt;/li&gt;
&lt;/ul&gt;
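&lt;p&gt;As a rough sketch of what an analytics layer like this computes: given per-day attendance records, flagging low-attendance days is a one-pass aggregation. The record shape and function below are illustrative assumptions, not Qaf's actual code.&lt;/p&gt;

```python
from collections import Counter

def low_attendance_days(records, threshold=0.5):
    # records: list of dicts like {"date": "2026-05-03", "status": "Present"}
    # where status is one of "Present", "Late", "Absent".
    present, total = Counter(), Counter()
    for r in records:
        total[r["date"]] += 1
        if r["status"] in ("Present", "Late"):
            present[r["date"]] += 1
    # Flag the days whose attendance rate falls below the threshold.
    return sorted(d for d in total if threshold > present[d] / total[d])
```

&lt;p&gt;The same aggregation, run per student instead of per day, yields the absence patterns mentioned above.&lt;/p&gt;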

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fattendance-weekly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fattendance-weekly.png" alt="Weekly attendance overview" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Homework Portal — All 114 Surahs Built In
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcohj358yowpq6cseg5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcohj358yowpq6cseg5z.png" alt="Teacher adding homework" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Quran classes, adding homework takes two clicks. All 114 Surahs are preloaded in order. Teachers select the Surah, mark it as new memorization or revision, add optional notes, and post — parents are notified instantly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in Surah list (descending order, ready to use)&lt;/li&gt;
&lt;li&gt;Homework types: New Memorization, Revision, Arabic, Islamic Studies, Other&lt;/li&gt;
&lt;li&gt;Full edit/delete flexibility after posting&lt;/li&gt;
&lt;li&gt;Admin dashboard showing school-wide homework completion&lt;/li&gt;
&lt;li&gt;Weekly cycle with daily reminders for pending submissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3k9nk8w0lifef3l74rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3k9nk8w0lifef3l74rb.png" alt="Parent homework view" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Parents see exactly what their child is supposed to memorize that week, with the full Surah and syllabus one tap away.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Quran Audio Player — Practice Anywhere
&lt;/h3&gt;

&lt;p&gt;Students can listen to any of the 114 Surahs directly in the app, with support for multiple reciters and verse-by-verse playback. This helps students practice their assigned memorization at home before class.&lt;/p&gt;

&lt;p&gt;Teachers can also link audio directly to homework assignments, so parents know exactly what the recitation should sound like.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Stars &amp;amp; Trophies — Gamified Motivation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ul9w01c1ja5s09bfmqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ul9w01c1ja5s09bfmqs.png" alt="Stars and trophies system" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Teachers award 0–5 stars per homework submission with written feedback. Students accumulate stars toward five trophy tiers: &lt;strong&gt;Bronze → Silver → Gold → Platinum → Diamond&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monthly and semester leaderboards&lt;/li&gt;
&lt;li&gt;Trophy milestone celebrations (with confetti)&lt;/li&gt;
&lt;li&gt;Parent visibility into their child's progress and next goal&lt;/li&gt;
&lt;li&gt;Admin overview of star distribution across the school&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works. Students who know they're on the leaderboard show up — and memorize more.&lt;/p&gt;
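&lt;p&gt;The tier logic is simple to sketch. The five tier names are real; the star thresholds below are invented for illustration, since the actual cutoffs aren't stated here.&lt;/p&gt;

```python
# Hypothetical thresholds, highest tier first.
TIERS = [("Diamond", 500), ("Platinum", 250), ("Gold", 100),
         ("Silver", 50), ("Bronze", 20)]

def trophy_tier(total_stars):
    # Walk from the highest tier down and return the first one reached.
    for name, needed in TIERS:
        if total_stars >= needed:
            return name
    return None

def next_goal(total_stars):
    # The smallest threshold not yet reached, shown to parents as the
    # child's next goal.
    remaining = [needed for _, needed in TIERS if needed > total_stars]
    return min(remaining) if remaining else None
```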




&lt;h3&gt;
  
  
  5. Financial Management — Without the Chaos
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fbudget-payments.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fbudget-payments.png" alt="Budget and payments" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qaf doesn't process payments — schools collect money their own way (cash, Venmo, Zelle, check). What Qaf does is &lt;em&gt;track&lt;/em&gt; it, cleanly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Record payments by semester or annually&lt;/li&gt;
&lt;li&gt;Payment status: Verified, Pending, Rejected&lt;/li&gt;
&lt;li&gt;One-click payment reminders sent as push notifications to unpaid families&lt;/li&gt;
&lt;li&gt;Expense reimbursements with receipt photo upload (teachers submit, admins approve)&lt;/li&gt;
&lt;li&gt;Donation tracking (separate from tuition)&lt;/li&gt;
&lt;li&gt;CSV export for accountants&lt;/li&gt;
&lt;li&gt;Full budget dashboard: income, expenses, remaining balance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fschool-reports.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fschool-reports.png" alt="School reports" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Substitute Teacher Portal
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4uhfqwwkso8mg7kvs86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4uhfqwwkso8mg7kvs86.png" alt="Substitute teacher system" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a teacher can't make it, they post a substitute request with lesson plan details, time slot, and student age group. Available substitutes in the system receive a notification, review the details, and accept with one tap ("Press to Teach").&lt;/p&gt;

&lt;p&gt;The substitute then takes attendance and posts homework just like the regular teacher. Everything is logged — no audit gaps.&lt;/p&gt;




&lt;h3&gt;
  
  
  7. WhatsApp-Style Messaging — But School-Scoped
&lt;/h3&gt;

&lt;p&gt;Every parent, teacher, and admin can message each other directly through the app in a familiar interface — read receipts, timestamps, unread badges, archive. The key difference: &lt;strong&gt;conversations are scoped to the school&lt;/strong&gt;. A parent from School A can't see or contact anyone from School B.&lt;/p&gt;

&lt;p&gt;No more managing multiple WhatsApp groups where parents end up in each other's threads.&lt;/p&gt;




&lt;h3&gt;
  
  
  8. Registration &amp;amp; Enrollment
&lt;/h3&gt;

&lt;p&gt;Parents register new students or re-enroll existing ones through dynamic forms customized per school. Admins receive email notifications and can process registrations from a dedicated dashboard.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Playground&lt;/strong&gt; — a drag-and-drop interface — lets admins bulk-assign students to classes at the start of each year.&lt;/p&gt;




&lt;h3&gt;
  
  
  9. Multilingual — 6 Languages with Full Arabic RTL
&lt;/h3&gt;

&lt;p&gt;Qaf supports &lt;strong&gt;English, Arabic, French, Turkish, German, and Somali&lt;/strong&gt; — the languages of the families actually attending Islamic schools in the US and beyond. We're proud to serve schools in six countries around the globe.&lt;/p&gt;

&lt;p&gt;Arabic support is full RTL with proper text reshaping, not a broken half-implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Admin Portal Has 30+ Tools
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fadmin-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fadmin-dashboard.png" alt="Admin dashboard" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Admins get a comprehensive set of management tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Students&lt;/td&gt;
&lt;td&gt;Full database, enrollment history, parent linking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Teachers&lt;/td&gt;
&lt;td&gt;Profiles, class assignments, substitute coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parents&lt;/td&gt;
&lt;td&gt;Contact info, payment tracking, engagement metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Classes&lt;/td&gt;
&lt;td&gt;Creation, type selection, teacher assignment, student roster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Budget dashboard, income, expenses, reimbursements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reports&lt;/td&gt;
&lt;td&gt;Attendance, grades, stars, financial — CSV exportable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Announcements&lt;/td&gt;
&lt;td&gt;School-wide or class-specific with push + email delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Academic Calendar&lt;/td&gt;
&lt;td&gt;Color-coded events, holidays, exams, meetings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Book Tracking&lt;/td&gt;
&lt;td&gt;Textbook inventory and student distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start New Year&lt;/td&gt;
&lt;td&gt;Year transition with grade advancement, star reset, re-enrollment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permissions&lt;/td&gt;
&lt;td&gt;Admin / Manager / Treasurer role-based access control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fstudents-table.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.qaf.app%2Fscreenshots%2Fstudents-table.png" alt="Students table" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Built With a Modern Stack
&lt;/h2&gt;

&lt;p&gt;Qaf School App is built on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 16&lt;/strong&gt; (App Router) + &lt;strong&gt;React 19&lt;/strong&gt; + &lt;strong&gt;TypeScript&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firebase&lt;/strong&gt; (Firestore real-time database, Auth, Storage)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS 4&lt;/strong&gt; + &lt;strong&gt;shadcn/ui&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React Query&lt;/strong&gt; with optimistic updates and offline support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framer Motion&lt;/strong&gt; for animations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel&lt;/strong&gt; for hosting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run&lt;/strong&gt; for backend notification services (attendance alerts, homework reminders)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real-time sync means that when a teacher taps "Present" in class, the update shows up on the parent's screen in under a second — no refresh needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Free?
&lt;/h2&gt;

&lt;p&gt;Islamic schools are community institutions. Most operate on tight budgets run by volunteers. Charging per-student or per-school fees would price out exactly the schools that need this most.&lt;/p&gt;

&lt;p&gt;Qaf School App is free. There are no subscription tiers. There are no per-student fees. Schools receive payments however they want — we don't take a cut.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who It's For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weekend Islamic schools&lt;/strong&gt; (the classic 2-day Quran program)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-time Islamic schools&lt;/strong&gt; with multiple academic subjects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quran memorization centers&lt;/strong&gt; (Hifz programs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community masjids&lt;/strong&gt; with educational programs&lt;/li&gt;
&lt;li&gt;Any school teaching Quran, Arabic, or Islamic Studies&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The app is live at &lt;strong&gt;&lt;a href="https://www.qaf.app" rel="noopener noreferrer"&gt;qaf.app&lt;/a&gt;&lt;/strong&gt;. You can explore the demo videos at &lt;strong&gt;&lt;a href="https://www.qaf.app/demos" rel="noopener noreferrer"&gt;qaf.app/demos&lt;/a&gt;&lt;/strong&gt; and see a full feature breakdown at &lt;strong&gt;&lt;a href="https://www.qaf.app/features/admin-features" rel="noopener noreferrer"&gt;qaf.app/features/admin-features&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you run an Islamic school or know someone who does — share this. The goal is simple: free every Islamic school from the spreadsheet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Qaf School App — Quran, Qiraa, Qudwa.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>education</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>islam</category>
    </item>
    <item>
      <title>I Built Rayan: A 3D Memory Palace Live Agent That Listens, Remembers, and Speaks Back</title>
      <dc:creator>Yusuf Elnady</dc:creator>
      <pubDate>Mon, 16 Mar 2026 03:15:09 +0000</pubDate>
      <link>https://forem.com/yelnady/i-built-a-3d-memory-palace-that-listens-remembers-and-speaks-back-2hip</link>
      <guid>https://forem.com/yelnady/i-built-a-3d-memory-palace-that-listens-remembers-and-speaks-back-2hip</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I created this blog post for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;We forget things. All the time. Not in some big philosophical way. In the most basic, embarrassing way. What did you eat yesterday? What was that person's name at the conference? What did your manager say two Fridays back?&lt;/p&gt;

&lt;p&gt;I kept seeing the same thing on Reddit, in productivity forums, in real conversations. &lt;em&gt;I take in so much information every day and I remember almost none of it.&lt;/em&gt; We take notes we never look at again. We bookmark articles we never reopen. We record meetings we never re-watch.&lt;/p&gt;

&lt;p&gt;The problem isn't capture. We have more capture tools than ever. The problem is &lt;strong&gt;retrieval&lt;/strong&gt;. Our memories are flat, unsearchable, disconnected from each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then I Found Gemini Live
&lt;/h2&gt;

&lt;p&gt;I started playing with the &lt;strong&gt;Gemini Live 2.5 Flash&lt;/strong&gt; API. And something clicked.&lt;/p&gt;

&lt;p&gt;A model that holds a persistent, real-time audio session. You can interrupt it mid-sentence and it recovers. It calls tools asynchronously without breaking the conversation. It matches your tone, your energy, your pace through affective dialogue. It accepts screen sharing and video alongside voice. And it's &lt;em&gt;fast&lt;/em&gt;. Really, genuinely fast.&lt;/p&gt;

&lt;p&gt;I realized I could build something that doesn't just capture memories. It could &lt;strong&gt;be&lt;/strong&gt; your memory. A system that listens alongside you, pulls out what matters on its own, organizes it in space, and lets you &lt;em&gt;walk through it&lt;/em&gt; and &lt;em&gt;talk to it&lt;/em&gt;. No keyboard. No typing prompts.&lt;/p&gt;

&lt;p&gt;That system is &lt;strong&gt;Rayan - Your 3D Memory Palace&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Rayan Actually Is
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8byyz7vxp75vnuqr1a3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8byyz7vxp75vnuqr1a3i.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rayan turns everything you hear, see, and say into a &lt;strong&gt;3D Memory Palace&lt;/strong&gt; you can explore. It's not a notes app. It's not a chatbot with a search bar. It's a fully rendered Three.js environment you navigate in first person. Rooms, walls, glowing objects, doors. Every object on every wall is a memory that Rayan extracted, categorized, and placed there for you, in real time, while you were just going about your day.&lt;/p&gt;

&lt;p&gt;Two persistent Gemini Live voice agents run the whole thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CaptureAgent&lt;/strong&gt; listens alongside you. Run it during a lecture, a meeting, a podcast, while browsing the web with screen share. It passively analyzes what it hears and sees. When it detects a concept worth keeping (confidence &amp;gt;= 0.7), it silently extracts it. It generates a title, summary, keywords, classifies the type, creates an embedding, and hands it to the Memory Architect for room placement. A new 3D artifact shows up on your palace wall in real time. You don't press anything. You don't type anything. You just live, and the palace builds itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RecallAgent&lt;/strong&gt; is your voice companion inside the palace. Walk up to any room, any artifact, and just ask. "What did I learn about transformers last week?" or "Quiz me on everything in my Biology room." It searches your memories semantically, grounds every answer in what you've actually captured (it literally cannot hallucinate things not in your palace), and speaks back to you. It navigates rooms, highlights artifacts, pulls up related memories as it talks. It can create new memories, edit existing ones, generate visual mind maps, do web searches. All by voice, mid-conversation.&lt;/p&gt;

&lt;p&gt;The palace isn't a metaphor. It's a real 3D space you walk through.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a Memory Palace
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/Method_of_loci" rel="noopener noreferrer"&gt;method of loci&lt;/a&gt; is one of the oldest memory techniques that exists. The idea is simple. Place the things you want to remember in specific locations within a familiar space. Then mentally walk through that space to retrieve them. The spatial context, like &lt;em&gt;this fact was on the north wall of the library, next to the crystal orb about neural networks&lt;/em&gt;, gives you retrieval cues that flat lists and folders never will.&lt;/p&gt;

&lt;p&gt;Rayan makes that literal. Your memories aren't rows in a database. They're glowing hologram panels, floating books, crystal orbs, speech bubbles, and framed screenshots spread across themed 3D rooms that you navigate through. The spatial encoding isn't a gimmick. It's the core retrieval mechanism, backed by voice and semantic search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flg1t1a7yw1j91w0ffje5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flg1t1a7yw1j91w0ffje5.png" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Capture. Your Always-On Memory Companion.
&lt;/h3&gt;

&lt;p&gt;When you start a Capture session, Rayan opens a persistent Gemini Live connection (&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;) and starts listening. You get three input approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice only.&lt;/strong&gt; Just talk. Rayan listens to your microphone, pulls out concepts from your speech, and builds your palace as you speak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screen share.&lt;/strong&gt; Share your screen and Rayan becomes your study partner. It sees your slides, your browser tabs, your documents. It extracts key information from both what it sees and what you say about it. When it spots a good diagram or slide, it autonomously calls &lt;code&gt;take_screenshot&lt;/code&gt;, uploads the image to Cloud Storage, and places it as a framed visual artifact on your palace wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camera.&lt;/strong&gt; Point your webcam at a whiteboard, a textbook, your environment. Rayan processes the video frames alongside your voice.&lt;/p&gt;

&lt;p&gt;You control how aggressively it captures. Want it taking notes every few seconds? Adjust the cadence. Want it to listen longer before synthesizing multiple memories on the same topic? You can do that too.&lt;/p&gt;

&lt;p&gt;As Capture runs, new artifacts appear in your 3D palace in real time. No page refresh, no manual save. The WebSocket connection pushes &lt;code&gt;palace_update&lt;/code&gt; events the instant an artifact is created, and the Three.js scene renders it live.&lt;/p&gt;

&lt;p&gt;Every extraction goes through &lt;strong&gt;smart deduplication&lt;/strong&gt;. New captures are cosine-compared against everything saved in the current session. Near-duplicates (similarity &amp;gt;= 0.90) are merged, not duplicated. The palace stays clean.&lt;/p&gt;
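&lt;p&gt;The dedup step is easy to sketch. Rayan's real vectors come from Vertex AI embeddings; the tiny hand-made vectors and helper below are stand-ins for illustration.&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def save_or_merge(new_item, session_items, cutoff=0.90):
    # Compare the new capture against everything saved this session and
    # merge into the closest near-duplicate instead of adding a second copy.
    for existing in session_items:
        if cosine(new_item["embedding"], existing["embedding"]) >= cutoff:
            existing["summary"] += " " + new_item["summary"]
            return existing
    session_items.append(new_item)
    return new_item
```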

&lt;h3&gt;
  
  
  Recall. Your Voice-Navigable Second Brain.
&lt;/h3&gt;

&lt;p&gt;In Recall mode, a second Gemini Live agent connects and becomes your conversational guide through the palace. You speak naturally. No wake words, no rigid commands. It understands context, follows up on previous statements, handles interruptions gracefully through Gemini Live's built-in VAD and the &lt;code&gt;interrupted&lt;/code&gt; server event, and executes tools mid-conversation.&lt;/p&gt;

&lt;p&gt;Here's what Recall can do, all by voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Navigate rooms.&lt;/strong&gt; "Take me to my Machine Learning room" triggers &lt;code&gt;navigate_to_room&lt;/code&gt;, and your camera flies there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highlight artifacts.&lt;/strong&gt; "Show me what I captured about attention mechanisms" triggers &lt;code&gt;highlight_artifact&lt;/code&gt;, and the relevant 3D object scales up and glows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer questions.&lt;/strong&gt; Every answer is grounded in your actual memories via semantic search. Rayan &lt;em&gt;cannot&lt;/em&gt; make up information that isn't in your palace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create new memories.&lt;/strong&gt; "Remember that the deadline is March 20th" creates a new artifact mid-conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit memories.&lt;/strong&gt; "Update my notes on the project, the scope changed to include mobile" modifies an existing artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesize rooms.&lt;/strong&gt; "Synthesize this room" generates a creative AI mind map image that visually summarizes every memory in the current room, rendered directly on the 3D wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web search.&lt;/strong&gt; "Look up the latest paper on mixture of experts" runs a grounded web search and can save findings as enrichment artifacts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bird's-eye view.&lt;/strong&gt; "Show me the map" toggles to an overview camera so you can see your whole palace layout.&lt;/p&gt;
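&lt;p&gt;Under the hood, each of these voice commands resolves to a tool call. A minimal dispatcher might look like this; the tool names come from the list above, but the dispatcher and handler bodies are illustrative assumptions.&lt;/p&gt;

```python
def navigate_to_room(room):
    # Placeholder: the real handler flies the Three.js camera to the room.
    return f"camera -> {room}"

def highlight_artifact(artifact_id):
    # Placeholder: the real handler scales up and glows the 3D object.
    return f"glow -> {artifact_id}"

TOOLS = {"navigate_to_room": navigate_to_room,
         "highlight_artifact": highlight_artifact}

def handle_tool_call(name, **args):
    # Gemini Live emits a tool call with a name and JSON args; look the
    # handler up and run it without breaking the audio session.
    handler = TOOLS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool: {name}")
    return handler(**args)
```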

&lt;p&gt;The critical piece here is &lt;strong&gt;semantic grounding&lt;/strong&gt;. Every time you enter a room, navigate to an artifact, or ask a question, the RecallAgent runs a real-time semantic search.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your query (or the current artifact's summary) is embedded via &lt;strong&gt;Vertex AI &lt;code&gt;text-embedding-005&lt;/code&gt;&lt;/strong&gt; into a 768-dimensional vector.&lt;/li&gt;
&lt;li&gt;That vector is cosine-compared against every stored artifact embedding in Firestore.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;top 8 most semantically relevant memories&lt;/strong&gt; are injected into the live system prompt under a &lt;code&gt;MEMORIES&lt;/code&gt; section.&lt;/li&gt;
&lt;li&gt;The system prompt enforces &lt;em&gt;"ONLY use information from the provided MEMORIES section. NEVER hallucinate or invent information. Cite which artifact/room the information comes from."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;On every room navigation and artifact highlight, &lt;code&gt;update_context()&lt;/code&gt; re-runs the search and injects fresh memories mid-conversation via &lt;code&gt;send_client_content&lt;/code&gt;. No reconnection needed.&lt;/li&gt;
&lt;/ol&gt;
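&lt;p&gt;Steps 1 to 3 can be sketched in a few lines. The real system embeds with Vertex AI &lt;code&gt;text-embedding-005&lt;/code&gt; and stores 768-dimensional vectors in Firestore; plain Python lists stand in here.&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity between the query vector and a stored embedding.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_memories(query_vec, artifacts, k=8):
    # Rank every stored artifact by similarity to the query embedding and
    # return the k best for injection into the MEMORIES prompt section.
    ranked = sorted(artifacts,
                    key=lambda a: cosine(query_vec, a["embedding"]),
                    reverse=True)
    return ranked[:k]
```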

&lt;p&gt;This isn't RAG tacked onto a chatbot. It's a voice-driven retrieval system where the grounding context updates &lt;em&gt;continuously&lt;/em&gt; as you move through your palace.&lt;/p&gt;




&lt;h2&gt;
  
  
  How People Actually Use This
&lt;/h2&gt;

&lt;p&gt;Rayan isn't a productivity app you try once and forget. It's a persistent second brain you build over weeks and months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning.&lt;/strong&gt; Run Capture during any lecture or online course. Your palace auto-fills with searchable concepts as you listen. Read aloud or screen-share a textbook and Rayan clusters key ideas into rooms by topic. Walk your palace before a test and ask Recall to quiz you. Capture browser tabs and articles across a research session and let Recall surface connections between them. Capture vocabulary in context for language learning. Revisit your palace daily for spaced repetition that's more durable than flashcards because of the spatial encoding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Work.&lt;/strong&gt; Run Capture during meetings. Action items, decisions, and names get auto-extracted. Capture everything discussed during client onboarding so Recall knows the context as well as you do. Capture architecture discussions and technical decisions so Recall can answer "why did we do it this way?" months later. Build a room per direct report and Recall surfaces what you discussed last time before every one-on-one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creative work.&lt;/strong&gt; Capture sources, quotes, and ideas, then Recall helps you cite and cross-reference while writing. Capture lore and character decisions for worldbuilding and Recall keeps your fictional world consistent. Capture every brainstorming idea and Recall finds the patterns across messy ideation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal life.&lt;/strong&gt; Capture travel recommendations and Recall answers "what was that restaurant someone mentioned?" Capture doctor conversations and health research. Capture birthdays, preferences, and conversations so you remember what matters to people.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Power features.&lt;/strong&gt; Recall can save new memories during a voice conversation without leaving the palace. Everything persists across sessions, so your palace from six months ago is fully searchable today. Generate an AI mind map of any room on demand. Watch new 3D artifacts appear live as Capture runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technology Stack
&lt;/h2&gt;

&lt;p&gt;Rayan is built entirely on Google's AI and cloud ecosystem. Let me walk through every layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Four Gemini Models, Four Roles
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Real-time two-way voice streaming for CaptureAgent and RecallAgent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini-2.5-flash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Text generation for the Memory Architect (categorization), Narrator Agent (narration scripts), general AI tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini-2.5-flash-image&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creative synthesis. Generates styled mind map images that visually summarize a room's memories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;text-embedding-005&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;768-dimensional semantic embeddings for every artifact, powering cosine similarity search and grounding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three things made Gemini Live the only real option for this project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent sessions.&lt;/strong&gt; Gemini Live holds a long-running WebSocket connection. The CaptureAgent session can run for an entire hour-long lecture without reconnecting. This isn't request-response. It's a stateful, living conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native tool calling.&lt;/strong&gt; Both agents register tools (navigate, highlight, create artifact, screenshot, web search, etc.) that Gemini calls autonomously mid-conversation. The tools execute asynchronously. Gemini doesn't freeze while waiting for a tool result. It keeps talking and works the result in when it arrives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affective dialogue.&lt;/strong&gt; &lt;code&gt;enable_affective_dialog=True&lt;/code&gt; means Gemini adjusts its tone, pacing, and empathy based on your emotional cues. When you sound excited, Rayan matches that energy. When you're quietly focused, it stays subdued. This is the difference between a tool and a companion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Firebase
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;What Rayan uses it for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Firebase Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Sign-In on the frontend. One click, you get a Firebase ID token, verified server-side via Firebase Admin SDK on every WebSocket connection and REST request. No passwords, no custom auth flows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Firebase Hosting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The React + Three.js frontend is built as a static SPA and deployed to Firebase Hosting. CDN distribution, SSL, SPA routing all handled.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Firebase Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frontend analytics tracking engagement and feature usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Firebase was the natural choice for auth because it integrates with every other Google Cloud service Rayan uses. The ID token from Firebase Auth is the same identity that Firestore, Cloud Storage, and Cloud Run understand natively. One identity system across the whole stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Cloud Infrastructure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;What Rayan uses it for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Firestore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Primary database. Every room, artifact, capture session, and user profile lives here. Firestore's document model maps perfectly to the palace hierarchy: &lt;code&gt;users/{userId}/rooms/{roomId}/artifacts/{artifactId}&lt;/code&gt;. Embeddings (768-float arrays) are stored inline on each artifact document. No separate vector database needed at current scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Two buckets. The media bucket stores screenshots captured by CaptureAgent and mind map images generated by the synthesis service. The frontend bucket hosts the static SPA build.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Run&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The FastAPI backend runs as a containerized service. &lt;strong&gt;Session affinity is enabled.&lt;/strong&gt; This is critical because both agents maintain long-lived WebSocket connections to Gemini Live. If Cloud Run routed requests to different instances, those sessions would break. Session affinity ensures all traffic from a connected user sticks to the same container. Configured at 2 vCPU, 2 GiB memory, min 1 / max 10 instances.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vertex AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Powers &lt;code&gt;text-embedding-005&lt;/code&gt; for generating 768-dimensional embeddings. Also provides the client library for all Gemini API calls via the &lt;code&gt;google-genai&lt;/code&gt; SDK with Vertex AI backend.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM and Service Accounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A dedicated &lt;code&gt;rayan-backend&lt;/code&gt; service account with exactly three roles. &lt;code&gt;roles/datastore.user&lt;/code&gt; for Firestore, &lt;code&gt;roles/storage.objectAdmin&lt;/code&gt; for Cloud Storage, &lt;code&gt;roles/aiplatform.user&lt;/code&gt; for Vertex AI. Least privilege.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Google Cloud Specifically
&lt;/h3&gt;

&lt;p&gt;I'll be direct about this. Rayan could not have been built this cleanly on another cloud provider. The reason is integration density. Look at what happens when a user speaks a sentence during a Capture session.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audio arrives at &lt;strong&gt;Cloud Run&lt;/strong&gt; via WebSocket.&lt;/li&gt;
&lt;li&gt;Cloud Run forwards it to the &lt;strong&gt;Gemini Live API&lt;/strong&gt; (same Google network, minimal latency).&lt;/li&gt;
&lt;li&gt;Gemini detects a concept and calls the &lt;code&gt;capture_concept&lt;/code&gt; tool.&lt;/li&gt;
&lt;li&gt;The backend generates an embedding via &lt;strong&gt;Vertex AI &lt;code&gt;text-embedding-005&lt;/code&gt;&lt;/strong&gt; (same network).&lt;/li&gt;
&lt;li&gt;The artifact is written to &lt;strong&gt;Firestore&lt;/strong&gt; (same network, same service account).&lt;/li&gt;
&lt;li&gt;If a screenshot is involved, it goes to &lt;strong&gt;Cloud Storage&lt;/strong&gt; (same network, same service account).&lt;/li&gt;
&lt;li&gt;The Memory Architect categorizes it via &lt;strong&gt;&lt;code&gt;gemini-2.5-flash&lt;/code&gt;&lt;/strong&gt; (same network).&lt;/li&gt;
&lt;li&gt;The frontend, hosted on &lt;strong&gt;Firebase Hosting&lt;/strong&gt;, receives the update over the same WebSocket.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every hop is Google-to-Google. No cross-cloud latency, no credential translation, no API gateway stitching. The service account that Cloud Run uses is the same identity that Firestore, Storage, Vertex AI, and the Gemini API all trust. For a real-time voice agent where latency kills the experience, that coherence matters more than anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CaptureAgent
&lt;/h3&gt;

&lt;p&gt;This is the most complex piece of Rayan. It holds a persistent &lt;code&gt;async with client.aio.live.connect()&lt;/code&gt; context that stays alive for the entire capture session, potentially 60+ minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initialization looks like this.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LiveConnectConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;enable_affective_dialog&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CAPTURE_LIVE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speech_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prebuilt_voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aoede&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Audio and video stream concurrently.&lt;/strong&gt; The frontend sends two data channels over WebSocket. Microphone audio at 16kHz mono PCM in ~100ms chunks, captured via AudioWorklet. And video frames as JPEG-encoded screen captures or webcam frames. Both are forwarded to the Gemini Live session simultaneously.&lt;/p&gt;
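&lt;p&gt;A quick back-of-envelope check on those chunk sizes: 100ms of 16kHz mono 16-bit PCM works out to 3,200 bytes per chunk.&lt;/p&gt;

```python
SAMPLE_RATE = 16_000      # Hz, mono
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 100            # chunk duration sent over the WebSocket

# samples/sec * bytes/sample * fraction of a second
chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 3200 bytes
```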

&lt;p&gt;&lt;strong&gt;Extraction is autonomous.&lt;/strong&gt; The CaptureAgent doesn't wait to be told what to remember. Its system prompt instructs it to continuously analyze both streams. When it detects something worth capturing (confidence &amp;gt;= 0.7), it calls the &lt;code&gt;capture_concept&lt;/code&gt; tool on its own with a title (max 8 words), summary (50-150 words), type classification (one of 20+ types like lecture, insight, moment, goal, emotion), keywords, and a confidence score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deduplication happens before saving.&lt;/strong&gt; The new concept's embedding is cosine-compared against every artifact already captured this session. If similarity &amp;gt;= 0.90, it merges into the existing artifact instead of creating a duplicate.&lt;/p&gt;
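&lt;p&gt;A minimal sketch of that dedup check, with illustrative names rather than the actual implementation: compare the new concept's embedding against each artifact captured this session, and merge rather than append when similarity crosses 0.90.&lt;/p&gt;

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe_or_add(session_artifacts, new_artifact, threshold=0.90):
    """Merge a near-duplicate into an existing session artifact, else append it."""
    for existing in session_artifacts:
        if _cosine(existing["embedding"], new_artifact["embedding"]) >= threshold:
            # Near-duplicate: fold the new keywords into the existing artifact.
            existing["keywords"] = sorted(
                set(existing["keywords"]) | set(new_artifact["keywords"])
            )
            return existing
    session_artifacts.append(new_artifact)
    return new_artifact
```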

&lt;p&gt;&lt;strong&gt;Then the Memory Architect takes over.&lt;/strong&gt; It decides where the artifact belongs based on cosine similarity against existing room embeddings. Similarity &amp;gt;= 0.75 means auto-assign to the best room, no confirmation needed. Between 0.50 and 0.75 means suggest a room match and show a confirmation prompt. Below 0.50 means suggest creating a new room entirely.&lt;/p&gt;

&lt;p&gt;Registered tools for CaptureAgent include &lt;code&gt;capture_concept&lt;/code&gt;, &lt;code&gt;create_artifact&lt;/code&gt;, &lt;code&gt;create_room&lt;/code&gt;, &lt;code&gt;take_screenshot&lt;/code&gt;, &lt;code&gt;edit_artifact&lt;/code&gt;, &lt;code&gt;delete_artifact&lt;/code&gt;, &lt;code&gt;web_search&lt;/code&gt;, &lt;code&gt;navigate_to_room&lt;/code&gt;, and &lt;code&gt;end_session&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  RecallAgent
&lt;/h3&gt;

&lt;p&gt;This is the conversational companion inside the 3D palace. It also holds a persistent Gemini Live session, but it's focused on retrieval and interaction rather than extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key innovation is the semantic context pipeline.&lt;/strong&gt; The RecallAgent maintains a continuously updated context.&lt;/p&gt;

&lt;p&gt;On session start, semantic search returns the top 8 most relevant memories across the entire palace. These get injected into the system prompt. On room navigation, when you or the agent navigate to a new room, &lt;code&gt;update_context()&lt;/code&gt; re-runs semantic search scoped to that room and injects fresh memories via &lt;code&gt;send_client_content()&lt;/code&gt;. Mid-conversation, no reconnection. On artifact highlight, when discussing a specific artifact, its summary becomes a search query to find related memories.&lt;/p&gt;

&lt;p&gt;So the RecallAgent's knowledge of your palace is always fresh and contextually relevant. It doesn't go stale after session start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interruption handling feels natural.&lt;/strong&gt; Gemini Live's built-in Voice Activity Detection catches when you start speaking while Rayan is still talking. The &lt;code&gt;interrupted&lt;/code&gt; server event fires and the RecallAgent gracefully stops. It feels like a real conversation, not turn-based Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;Registered tools include &lt;code&gt;navigate_to_room&lt;/code&gt;, &lt;code&gt;navigate_to_map_view&lt;/code&gt;, &lt;code&gt;navigate_horizontal&lt;/code&gt;, &lt;code&gt;highlight_artifact&lt;/code&gt;, &lt;code&gt;create_artifact&lt;/code&gt;, &lt;code&gt;edit_artifact&lt;/code&gt;, &lt;code&gt;delete_artifact&lt;/code&gt;, &lt;code&gt;create_room&lt;/code&gt;, &lt;code&gt;synthesize_room&lt;/code&gt;, &lt;code&gt;web_search&lt;/code&gt;, &lt;code&gt;end_session&lt;/code&gt;, and &lt;code&gt;close_artifact&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu0epop1qe95pcsmxwl1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu0epop1qe95pcsmxwl1.png" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Narrator Agent
&lt;/h3&gt;

&lt;p&gt;When you click an artifact in the 3D palace outside of a live voice session, the Narrator Agent activates. It loads the artifact from Firestore, finds the top 5 related artifacts via semantic search, then generates a narration script via &lt;code&gt;gemini-2.5-flash&lt;/code&gt;. The script has a specific structure. An opening of about 5 seconds that says something like &lt;em&gt;"This is from your machine learning study session..."&lt;/em&gt; Then 20-30 seconds of core content synthesized in conversational language. Then 5-10 seconds of connections to related memories. Then a 5-second invitation to explore further.&lt;/p&gt;

&lt;p&gt;If the narration contains a diagram trigger (&lt;code&gt;[DIAGRAM: type|title|description]&lt;/code&gt;), it generates visual diagrams too. Then it synthesizes the text into voice audio via Gemini Live and returns everything. Audio, text, diagrams, and related artifact links.&lt;/p&gt;
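&lt;p&gt;Parsing that trigger format is a one-regex job. A hypothetical sketch (the &lt;code&gt;[DIAGRAM: type|title|description]&lt;/code&gt; marker syntax is from the article; the function name is ours): extract each marker into a structured trigger and strip it from the script so it is never read aloud.&lt;/p&gt;

```python
import re

DIAGRAM_RE = re.compile(r"\[DIAGRAM:\s*([^|\]]+)\|([^|\]]+)\|([^\]]+)\]")

def extract_diagram_triggers(script):
    """Pull diagram markers out of a narration script.

    Returns the cleaned script (markers removed) plus a list of triggers.
    """
    triggers = []
    for match in DIAGRAM_RE.finditer(script):
        kind, title, description = (part.strip() for part in match.groups())
        triggers.append({"type": kind, "title": title, "description": description})
    clean = DIAGRAM_RE.sub("", script).strip()
    return clean, triggers
```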

&lt;h3&gt;
  
  
  Memory Architect
&lt;/h3&gt;

&lt;p&gt;This handles categorization. It uses &lt;code&gt;gemini-2.5-flash&lt;/code&gt; (text, not live) to decide where each captured concept belongs.&lt;/p&gt;

&lt;p&gt;The algorithm embeds the concept's title and summary via &lt;code&gt;text-embedding-005&lt;/code&gt;, computes cosine similarity against every existing room's &lt;code&gt;topicEmbedding&lt;/code&gt;, and applies thresholds. If creating a new room, it infers the name from concept keywords and assigns a random style from 10 options: library, lab, gallery, garden, workshop, museum, observatory, sanctuary, studio, dojo.&lt;/p&gt;
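&lt;p&gt;Those thresholds reduce to a tiny routing function. A sketch using the numbers stated above (function and key names are illustrative):&lt;/p&gt;

```python
def route_concept(similarity_to_best_room, best_room_id):
    """Apply the Memory Architect's placement thresholds to a similarity score."""
    if similarity_to_best_room >= 0.75:
        # Confident match: file it in the best room without asking.
        return {"action": "auto_assign", "room": best_room_id}
    if similarity_to_best_room >= 0.50:
        # Plausible match: suggest the room and ask for confirmation.
        return {"action": "suggest", "room": best_room_id}
    # No good home: propose a brand-new room.
    return {"action": "create_room", "room": None}
```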

&lt;p&gt;The result is that your palace structures itself. You never manually create rooms or drag things into folders. The architecture emerges from the semantic relationships in your knowledge.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 3D Palace
&lt;/h2&gt;

&lt;p&gt;The 3D palace isn't a visualization layer on top of a database. It's the primary interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scene Architecture
&lt;/h3&gt;

&lt;p&gt;Built with &lt;strong&gt;React Three Fiber&lt;/strong&gt; (React renderer for Three.js) and &lt;code&gt;@react-three/drei&lt;/code&gt; utilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Canvas (full-screen, dark background #060614)
├── Environment (HDRI ambient lighting)
├── PalaceGround (500x500 tiled plane)
├── Lobby (central hub at world origin, 12x12 units)
│   └── Doors (one per room, cycling through 4 walls)
├── Room (8x8x4 default dimensions)
│   ├── Walls (north, south, east, west)
│   ├── Floor and Ceiling
│   ├── Door portals (to connected rooms)
│   └── Artifacts (type-based 3D renderers)
└── Camera Controls
    ├── FirstPersonControls (WASD + mouse look)
    └── OverviewControls (OrbitControls, bird's-eye)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  16+ Distinct 3D Artifact Types
&lt;/h3&gt;

&lt;p&gt;Every artifact looks different based on its type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Floating books&lt;/strong&gt; for document, lecture, and lesson artifacts. &lt;strong&gt;Hologram panels&lt;/strong&gt; for concepts and insights. &lt;strong&gt;Crystal orbs&lt;/strong&gt; (icosahedrons with orbiting particles) for enrichment artifacts. &lt;strong&gt;Speech bubbles&lt;/strong&gt; for conversations. &lt;strong&gt;Framed images&lt;/strong&gt; for screenshots and synthesis images mounted on walls. And &lt;strong&gt;20+ unique GLB models&lt;/strong&gt; including a brain, question mark, coffee cup, milestone trophy, heart, dream cloud, tree, headphones, cash stack, exam paper, speaker, warning sign, and a hamburger.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instanced Rendering for Performance
&lt;/h3&gt;

&lt;p&gt;With potentially hundreds of artifacts in a room, rendering each one individually would kill performance. Rayan uses instanced rendering.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;BookInstancedRenderer&lt;/code&gt; clones the document GLB model per artifact, sharing geometry and textures. One draw call instead of N. &lt;code&gt;OrbInstancedRenderer&lt;/code&gt; uses &lt;code&gt;InstancedMesh&lt;/code&gt; for both orbs and their particles. Two draw calls instead of N times 6.&lt;/p&gt;

&lt;p&gt;This keeps the palace smooth even with dense rooms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Camera System
&lt;/h3&gt;

&lt;p&gt;Two modes with smooth transitions. First-person gives you WASD movement with mouse look, constrained to room bounds with wall collision detection. This is how you experience the palace. Overview gives you a bird's-eye OrbitControls view at 55 units height with 45-degree FOV. This is how you survey the layout.&lt;/p&gt;

&lt;p&gt;Room transitions use &lt;code&gt;flyTo()&lt;/code&gt; for smooth interpolation from current position to target.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-Time Communication
&lt;/h2&gt;

&lt;p&gt;Rayan's real-time behavior runs through a single WebSocket connection per user at &lt;code&gt;/ws/{userId}&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication
&lt;/h3&gt;

&lt;p&gt;Client connects and sends &lt;code&gt;{ type: "auth", token: "&amp;lt;Firebase ID token&amp;gt;" }&lt;/code&gt;. Backend verifies via Firebase Admin SDK. Connection established.&lt;/p&gt;

&lt;h3&gt;
  
  
  60+ Message Types
&lt;/h3&gt;

&lt;p&gt;The WebSocket carries a typed protocol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client sends&lt;/strong&gt; &lt;code&gt;capture_start&lt;/code&gt;, &lt;code&gt;video_frame&lt;/code&gt;, &lt;code&gt;capture_voice_chunk&lt;/code&gt;, &lt;code&gt;capture_end&lt;/code&gt; for the capture lifecycle. &lt;code&gt;live_session_start&lt;/code&gt;, &lt;code&gt;audio_chunk&lt;/code&gt;, &lt;code&gt;live_session_end&lt;/code&gt; for recall. &lt;code&gt;context_update&lt;/code&gt; to notify RecallAgent of room navigation. &lt;code&gt;ping&lt;/code&gt; every 30 seconds as a heartbeat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server sends&lt;/strong&gt; &lt;code&gt;palace_update&lt;/code&gt; for real-time palace mutations (rooms added, artifacts added/updated/removed, connections, lobby doors). &lt;code&gt;capture_ack&lt;/code&gt; when a concept is extracted. &lt;code&gt;capture_audio&lt;/code&gt; and &lt;code&gt;capture_text&lt;/code&gt; for Rayan's spoken acknowledgments during capture. &lt;code&gt;capture_user_text&lt;/code&gt; for transcription of user speech. &lt;code&gt;live_audio&lt;/code&gt; and &lt;code&gt;live_text&lt;/code&gt; for Rayan's voice responses during recall. &lt;code&gt;live_tool_call&lt;/code&gt; for tool invocations (navigate, highlight, synthesize). &lt;code&gt;live_interrupted&lt;/code&gt; when the user cuts in. &lt;code&gt;artifact_recall&lt;/code&gt; for narration and diagrams when you click an artifact. &lt;code&gt;room_suggestion&lt;/code&gt; for room placement suggestions. &lt;code&gt;enrichment_update&lt;/code&gt; for web search results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Audio Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Browser to server.&lt;/strong&gt; &lt;code&gt;getUserMedia()&lt;/code&gt; at 16kHz mono. An AudioWorklet (&lt;code&gt;pcm-processor.js&lt;/code&gt;) processes raw PCM. About 100ms chunks get base64-encoded and sent over the WebSocket. Echo cancellation and noise suppression are enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server to browser.&lt;/strong&gt; Gemini Live returns base64-encoded Linear16 PCM at 24kHz. The client wraps raw PCM in a WAV header (44 bytes). The browser's &lt;code&gt;decodeAudioData()&lt;/code&gt; parses it. Web Audio API schedules sequential chunks for gapless playback. Audio plays as it arrives. No waiting for the full response.&lt;/p&gt;
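&lt;p&gt;The 44-byte WAV header is simple enough to build by hand. The article describes this wrapping happening client-side in JavaScript; the Python sketch below just shows the byte layout of a canonical PCM header for 24kHz mono Linear16 audio.&lt;/p&gt;

```python
def wrap_pcm_in_wav(pcm, sample_rate=24000, channels=1, bits=16):
    """Prepend a 44-byte canonical WAV header to raw Linear16 PCM bytes."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8

    def u32(n):  # little-endian unsigned 32-bit
        return n.to_bytes(4, "little")

    def u16(n):  # little-endian unsigned 16-bit
        return n.to_bytes(2, "little")

    header = (
        b"RIFF" + u32(36 + len(pcm)) + b"WAVE"      # RIFF chunk
        + b"fmt " + u32(16)                          # fmt subchunk, 16 bytes
        + u16(1) + u16(channels)                     # audio format 1 = PCM
        + u32(sample_rate) + u32(byte_rate)
        + u16(block_align) + u16(bits)
        + b"data" + u32(len(pcm))                    # data subchunk
    )
    return header + pcm
```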




&lt;h2&gt;
  
  
  Infrastructure as Code with Terraform
&lt;/h2&gt;

&lt;p&gt;The entire Google Cloud infrastructure is defined in a single Terraform file (&lt;code&gt;infrastructure/terraform/main.tf&lt;/code&gt;) and provisioned with one command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-var&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"project_id=your-project-id"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-var&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"backend_image=gcr.io/your-project-id/rayan-backend:latest"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single command provisions everything below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Service Account and IAM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_service_account"&lt;/span&gt; &lt;span class="s2"&gt;"backend"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;account_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rayan-backend"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Rayan Backend Service Account"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Three roles, least privilege&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_project_iam_member"&lt;/span&gt; &lt;span class="s2"&gt;"firestore"&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"roles/datastore.user"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_project_iam_member"&lt;/span&gt; &lt;span class="s2"&gt;"storage"&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"roles/storage.objectAdmin"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_project_iam_member"&lt;/span&gt; &lt;span class="s2"&gt;"vertex_ai"&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"roles/aiplatform.user"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cloud Run Service
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_cloud_run_v2_service"&lt;/span&gt; &lt;span class="s2"&gt;"backend"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;session_affinity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Critical for long-lived Gemini Live sessions&lt;/span&gt;
    &lt;span class="nx"&gt;containers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_image&lt;/span&gt;
      &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;limits&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cpu&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2Gi"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;scaling&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;min_instance_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="c1"&gt;# Always warm&lt;/span&gt;
      &lt;span class="nx"&gt;max_instance_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;  &lt;span class="c1"&gt;# Scale under load&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Firestore Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_firestore_database"&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FIRESTORE_NATIVE"&lt;/span&gt;
  &lt;span class="nx"&gt;location_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;  &lt;span class="c1"&gt;# us-central1&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cloud Storage Buckets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Media bucket (screenshots, mind maps)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_storage_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"media"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rayan-media-${var.project_id}"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"US"&lt;/span&gt;
  &lt;span class="nx"&gt;cors&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Frontend hosting bucket&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_storage_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"frontend"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rayan-frontend-${var.project_id}"&lt;/span&gt;
  &lt;span class="nx"&gt;website&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;main_page_suffix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"index.html"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. Full infrastructure. Reproducible, version-controlled, reviewable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frontend and Backend Stacks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;React&lt;/td&gt;
&lt;td&gt;18.3.1&lt;/td&gt;
&lt;td&gt;UI framework&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three.js&lt;/td&gt;
&lt;td&gt;0.170.0&lt;/td&gt;
&lt;td&gt;3D rendering engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@react-three/fiber&lt;/td&gt;
&lt;td&gt;8.17.0&lt;/td&gt;
&lt;td&gt;React renderer for Three.js&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@react-three/drei&lt;/td&gt;
&lt;td&gt;9.122.0&lt;/td&gt;
&lt;td&gt;Utilities (useGLTF, OrbitControls, Environment)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zustand&lt;/td&gt;
&lt;td&gt;5.0.0&lt;/td&gt;
&lt;td&gt;State management, 6 stores (auth, palace, camera, capture, voice, transition)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firebase SDK&lt;/td&gt;
&lt;td&gt;11.0.0&lt;/td&gt;
&lt;td&gt;Authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framer Motion&lt;/td&gt;
&lt;td&gt;12.35.0&lt;/td&gt;
&lt;td&gt;UI animations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GSAP&lt;/td&gt;
&lt;td&gt;3.12.5&lt;/td&gt;
&lt;td&gt;Camera transition timelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tailwind CSS&lt;/td&gt;
&lt;td&gt;3.4.19&lt;/td&gt;
&lt;td&gt;Utility-first styling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lucide React&lt;/td&gt;
&lt;td&gt;0.577.0&lt;/td&gt;
&lt;td&gt;Icons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React Router DOM&lt;/td&gt;
&lt;td&gt;6.27.0&lt;/td&gt;
&lt;td&gt;SPA routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 6 Zustand stores cleanly separate concerns. &lt;code&gt;authStore&lt;/code&gt; for Firebase user state. &lt;code&gt;palaceStore&lt;/code&gt; for rooms, artifacts, current room, layout. &lt;code&gt;cameraStore&lt;/code&gt; for position, orientation, overview mode, flyTo transitions. &lt;code&gt;captureStore&lt;/code&gt; for capture session state, audio stream, transcript, extraction messages. &lt;code&gt;voiceStore&lt;/code&gt; for recall session state, audio playback queue, tool activity. &lt;code&gt;transitionStore&lt;/code&gt; for room transition animations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backend
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI&lt;/td&gt;
&lt;td&gt;Async web framework with WebSocket support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;google-genai&lt;/td&gt;
&lt;td&gt;Gemini Live SDK (persistent audio sessions, tool calling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;google-adk&lt;/td&gt;
&lt;td&gt;Google Agent Development Kit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;google-cloud-firestore&lt;/td&gt;
&lt;td&gt;Async Firestore client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;google-cloud-aiplatform&lt;/td&gt;
&lt;td&gt;Vertex AI embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;firebase-admin&lt;/td&gt;
&lt;td&gt;Server-side token verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;numpy&lt;/td&gt;
&lt;td&gt;Cosine similarity computation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;httpx&lt;/td&gt;
&lt;td&gt;Async HTTP client for web search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;beautifulsoup4&lt;/td&gt;
&lt;td&gt;HTML parsing for web search results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;websockets&lt;/td&gt;
&lt;td&gt;WebSocket protocol&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pydantic&lt;/td&gt;
&lt;td&gt;Data validation and serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Creative Synthesis
&lt;/h2&gt;

&lt;p&gt;One of my favorite features. When you ask Rayan to "synthesize this room," here's what happens.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;synthesis_service&lt;/code&gt; fetches all non-synthesis artifacts in the current room. It builds a prompt that includes the room name and style, a style-specific color palette (library gets warm amber and aged parchment, lab gets midnight blue and neon cyan, gallery gets lavender and rose gold), every artifact's title and keywords, mood hints derived from artifact types (emotion-heavy rooms get "warmth, feeling" hints), and color hints from individual artifact fields.&lt;/p&gt;

&lt;p&gt;The prompt goes to &lt;code&gt;gemini-2.5-flash-image&lt;/code&gt; with &lt;code&gt;response_modalities=["Text", "Image"]&lt;/code&gt;. The model generates a creative, styled mind map image. Not a diagram. Something that actually looks good. The PNG gets extracted, uploaded to Cloud Storage at &lt;code&gt;syntheses/{roomId}/{uuid}.png&lt;/code&gt;, and made publicly accessible. A synthesis artifact gets created and placed on the south wall, centered.&lt;/p&gt;

&lt;p&gt;Each synthesis is unique to the room's theme. Library rooms get warm parchment textures with scholarly connections. Lab rooms get holographic panels with neon data flows. Gallery rooms get painterly brushstrokes. The instruction to the model says &lt;em&gt;"Draw visible relationships. Make it beautiful enough to hang on a wall."&lt;/em&gt;&lt;/p&gt;
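To make that flow concrete, here is a minimal sketch of how such a prompt could be assembled before it goes to the image model. The function name, palette table, and call shape are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch of the synthesis prompt assembly (names assumed).
STYLE_PALETTES = {
    "library": "warm amber and aged parchment",
    "lab": "midnight blue and neon cyan",
    "gallery": "lavender and rose gold",
}

def build_synthesis_prompt(room_name: str, style: str, artifacts: list[dict]) -> str:
    """Assemble the image-generation prompt from room metadata and artifacts."""
    palette = STYLE_PALETTES.get(style, "neutral tones")
    lines = [
        f"Create a styled mind map for the room '{room_name}' ({style} style).",
        f"Color palette: {palette}.",
        "Artifacts to connect:",
    ]
    for art in artifacts:
        lines.append(f"- {art['title']} (keywords: {', '.join(art['keywords'])})")
    lines.append("Draw visible relationships. Make it beautiful enough to hang on a wall.")
    return "\n".join(lines)

# The prompt then goes to the image model, roughly:
#   from google import genai
#   from google.genai import types
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-2.5-flash-image",
#       contents=build_synthesis_prompt(room_name, style, artifacts),
#       config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
#   )
#   # PNG bytes arrive as inline-data parts on the response candidates.
```

The returned string is the only input the image model sees, which is why the palette and mood hints have to be baked into it up front.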




&lt;h2&gt;
  
  
  Data Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Artifact
&lt;/h3&gt;

&lt;p&gt;Every memory in Rayan is an Artifact.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                        &lt;span class="c1"&gt;# artifact_{uuid}
&lt;/span&gt;    &lt;span class="n"&gt;roomId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                    &lt;span class="c1"&gt;# Parent room
&lt;/span&gt;    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ArtifactType&lt;/span&gt;             &lt;span class="c1"&gt;# 20+ types (lecture, insight, moment, goal, emotion...)
&lt;/span&gt;    &lt;span class="n"&gt;visual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ArtifactVisual&lt;/span&gt;         &lt;span class="c1"&gt;# 3D rendering type (floating_book, crystal_orb, hologram_frame...)
&lt;/span&gt;    &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Position3D&lt;/span&gt;           &lt;span class="c1"&gt;# x, y, z within room
&lt;/span&gt;    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                   &lt;span class="c1"&gt;# 50-150 word description
&lt;/span&gt;    &lt;span class="n"&gt;fullContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="c1"&gt;# Extended content
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;         &lt;span class="c1"&gt;# 768-dim vector from text-embedding-005
&lt;/span&gt;    &lt;span class="n"&gt;sourceMediaUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Screenshot or mind map image URL
&lt;/span&gt;    &lt;span class="n"&gt;capturedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;captureSessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;enrichments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;         &lt;span class="c1"&gt;# IDs of enrichment artifacts
&lt;/span&gt;    &lt;span class="n"&gt;relatedArtifacts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="c1"&gt;# Cross-links to related memories
&lt;/span&gt;    &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;           &lt;span class="c1"&gt;# Hex color hint for rendering
&lt;/span&gt;    &lt;span class="n"&gt;wall&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;            &lt;span class="c1"&gt;# north, south, east, west, or center
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Room
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Room&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                        &lt;span class="c1"&gt;# room_{uuid}
&lt;/span&gt;    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                      &lt;span class="c1"&gt;# "Machine Learning", "Travel Plans"
&lt;/span&gt;    &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                     &lt;span class="c1"&gt;# library, lab, gallery, garden, workshop, etc.
&lt;/span&gt;    &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Position3D&lt;/span&gt;           &lt;span class="c1"&gt;# World coordinates
&lt;/span&gt;    &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dimensions3D&lt;/span&gt;       &lt;span class="c1"&gt;# Default 8x8x4
&lt;/span&gt;    &lt;span class="n"&gt;topicKeywords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;       &lt;span class="c1"&gt;# ["AI", "neural networks"]
&lt;/span&gt;    &lt;span class="n"&gt;topicEmbedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="c1"&gt;# 768-dim for room matching
&lt;/span&gt;    &lt;span class="n"&gt;artifactCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                   &lt;span class="c1"&gt;# Derived from artifact summaries
&lt;/span&gt;    &lt;span class="n"&gt;firstMemoryAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;        &lt;span class="c1"&gt;# Earliest artifact
&lt;/span&gt;    &lt;span class="n"&gt;lastMemoryAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;         &lt;span class="c1"&gt;# Most recent artifact
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  10 Room Styles
&lt;/h3&gt;

&lt;p&gt;Each style defines the visual aesthetic and influences synthesis art. Library (scholarly, warm amber). Lab (scientific, midnight blue with neon). Gallery (artistic, lavender and rose gold). Garden (organic, emerald and bioluminescent). Workshop (practical, charcoal and molten orange). Museum (historical, classical). Observatory (visionary, deep space). Sanctuary (emotional, soft and reflective). Studio (creative, vibrant). Dojo (disciplined, minimal).&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Backend to Cloud Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build and push container&lt;/span&gt;
gcloud builds submit &lt;span class="nt"&gt;--tag&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/rayan-backend &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Deploy with session affinity&lt;/span&gt;
gcloud run deploy rayan-backend &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/rayan-backend &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--session-affinity&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="nv"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;,MEDIA_BUCKET&lt;span class="o"&gt;=&lt;/span&gt;rayan-media-&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session affinity is the key flag. Without it, Cloud Run load-balances WebSocket connections across instances, which breaks the persistent Gemini Live sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Frontend to Firebase Hosting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run build
firebase deploy &lt;span class="nt"&gt;--only&lt;/span&gt; hosting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Firebase Hosting serves the static SPA with CDN distribution and SSL.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things I Learned Building with Gemini Live
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Session affinity is non-negotiable
&lt;/h3&gt;

&lt;p&gt;Gemini Live sessions are stateful WebSocket connections. If your infrastructure doesn't guarantee that subsequent messages from the same client hit the same server instance, your sessions break silently. Cloud Run's &lt;code&gt;--session-affinity&lt;/code&gt; flag fixed this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Affective dialog changes everything
&lt;/h3&gt;

&lt;p&gt;I built Rayan initially with &lt;code&gt;enable_affective_dialog=False&lt;/code&gt;. It worked, but it felt mechanical. Flipping that single boolean changed the whole experience. Rayan became something you &lt;em&gt;wanted&lt;/em&gt; to talk to: the pacing changed, the tone shifted, a subtle empathy crept in. It's the difference between a tool and a companion.&lt;/p&gt;
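For context, that flag lives in the Live session config. A minimal sketch assuming the google-genai SDK (the field name comes from the post; the exact config shape may differ by SDK version):

```python
from google import genai
from google.genai import types

# Sketch only: enable_affective_dialog is the single boolean discussed above.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,  # expressive pacing/tone in native-audio models
)

# client = genai.Client()
# async with client.aio.live.connect(
#     model="gemini-live-2.5-flash-native-audio", config=config
# ) as session:
#     ...  # stream mic audio in, play model audio out
```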

&lt;h3&gt;
  
  
  Tool calling is asynchronous, and that's powerful
&lt;/h3&gt;

&lt;p&gt;Unlike traditional function calling where the model waits for a response, Gemini Live's tool calls are non-blocking. The model keeps talking while the tool executes in the background. When the result arrives, it gets injected via &lt;code&gt;send_client_content()&lt;/code&gt; and the model works it in naturally. This means Rayan can say "Let me navigate you to your Biology room" and &lt;em&gt;start talking about Biology&lt;/em&gt; while the navigation animation is still playing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grounding must be continuous, not one-shot
&lt;/h3&gt;

&lt;p&gt;My first implementation loaded memories at session start and never refreshed them. After navigating three rooms, Rayan's context was stale. The fix was re-running semantic search on every room navigation and artifact interaction, injecting fresh memories mid-conversation. Gemini Live's &lt;code&gt;send_client_content()&lt;/code&gt; makes this possible without reconnection.&lt;/p&gt;
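The refresh itself can be as simple as formatting the newly retrieved artifacts into a text turn and pushing it into the open session. A hypothetical sketch (the helper name and message format are assumptions, not the project's actual code):

```python
# Hypothetical helper that packages freshly retrieved memories as a
# grounding message for an already-open Gemini Live session.
def format_grounding(room_name: str, memories: list[dict]) -> str:
    """Build a context-refresh turn from the current room's top memories."""
    lines = [
        f"[CONTEXT REFRESH] The user is now in the '{room_name}' room.",
        "Relevant memories:",
    ]
    for mem in memories:
        lines.append(f"- {mem['title']}: {mem['summary']}")
    return "\n".join(lines)

# Injected mid-conversation, without reconnecting, roughly:
#   await session.send_client_content(
#       turns=types.Content(
#           role="user",
#           parts=[types.Part(text=format_grounding(room_name, memories))],
#       ),
#       turn_complete=False,  # don't force a model turn; just add context
#   )
```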

&lt;h3&gt;
  
  
  Embeddings inline on documents scale surprisingly well
&lt;/h3&gt;

&lt;p&gt;I originally planned to use a separate vector database. But storing 768-float embeddings directly on Firestore documents and doing cosine similarity in Python works fine up to thousands of artifacts. The simplicity is worth it. The Vertex AI Vector Search Index is already provisioned in Terraform for when scale demands it.&lt;/p&gt;
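For reference, the whole retrieval path at this scale is a few lines of numpy. A sketch of inline-embedding search, assuming each Firestore document carries its embedding as a plain list of 768 floats:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, artifacts: list[dict], k: int = 5) -> list[dict]:
    """Rank artifacts by similarity of their inline embedding to the query."""
    scored = sorted(
        artifacts,
        key=lambda art: cosine_similarity(query, np.array(art["embedding"])),
        reverse=True,
    )
    return scored[:k]
```

A full scan like this is O(n) per query, which is exactly why it holds up into the thousands of artifacts and no further.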




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Rayan works today. But the vision goes beyond what a hackathon timeline allows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vertex AI Vector Search at Scale
&lt;/h3&gt;

&lt;p&gt;Right now, semantic search loads all artifacts from Firestore and computes cosine similarity in Python. This works at hundreds or low thousands of artifacts. Next step is activating the Vertex AI Vector Search Index already provisioned in Terraform, moving to approximate nearest neighbor search that handles millions of embeddings with sub-millisecond latency. This also opens the door to hybrid search, combining semantic similarity with keyword matching and temporal filters at the index level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mobile Companion App
&lt;/h3&gt;

&lt;p&gt;The 3D palace works great on desktop, but real life happens on your phone. A mobile companion app (React Native or Flutter) would let you run Capture sessions from your pocket during walks, commutes, or in-person conversations, syncing everything back to your palace. The mobile experience would focus on voice-first interaction with a simplified 2D room view. The full 3D palace stays on desktop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collaborative Palaces
&lt;/h3&gt;

&lt;p&gt;A shared palace for a study group, a project team, or a couple. Multiple users contributing memories to shared rooms, with RecallAgent understanding multi-user context. "What did Sarah capture about the API design?" The architecture already supports multi-user Firestore paths. The agent context and permission model need extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spaced Repetition Engine
&lt;/h3&gt;

&lt;p&gt;The palace structure is inherently spatial, which already helps memory. Adding a spaced repetition layer where Rayan proactively surfaces memories about to fade from your recall curve would turn the palace into an active learning system. "You haven't visited your Organic Chemistry room in 12 days. Want me to quiz you on the key reactions?"&lt;/p&gt;
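As an illustration of what such a layer might compute, here is a toy Ebbinghaus-style forgetting curve; the stability constant and review threshold are arbitrary placeholders, not a tuned scheduler:

```python
import math

def retention(days_since_review: float, stability: float = 7.0) -> float:
    """Ebbinghaus-style exponential forgetting curve: R = exp(-t / S)."""
    return math.exp(-days_since_review / stability)

def due_for_review(days_since_review: float, threshold: float = 0.3) -> bool:
    """A room is due once estimated retention drops below the threshold."""
    return retention(days_since_review) < threshold
```

Under these placeholder numbers, a room untouched for 12 days has an estimated retention around 0.18, which is what would trigger the "want me to quiz you?" prompt.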

&lt;h3&gt;
  
  
  Persistent Cross-Session Agent Memory
&lt;/h3&gt;

&lt;p&gt;Right now, each Capture and Recall session starts fresh (though the palace itself persists). Adding persistent agent memory where Rayan remembers &lt;em&gt;how&lt;/em&gt; you like to be spoken to, what topics you care about most, your learning style, your naming conventions would make the companion feel truly personal over months of use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;Rayan is live at &lt;a href="https://rayan-memory.web.app" rel="noopener noreferrer"&gt;rayan-memory.web.app&lt;/a&gt;. Sign in with Google, start a Capture session, and speak. Watch your 3D palace build itself in real time. Then switch to Recall and walk through your memories.&lt;/p&gt;

&lt;p&gt;The whole project is built on &lt;strong&gt;Gemini Live API&lt;/strong&gt; (&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;) for real-time voice agents. &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; for memory categorization and narration. &lt;strong&gt;Gemini 2.5 Flash Image&lt;/strong&gt; for creative mind map synthesis. &lt;strong&gt;Vertex AI &lt;code&gt;text-embedding-005&lt;/code&gt;&lt;/strong&gt; for semantic grounding. &lt;strong&gt;Cloud Run&lt;/strong&gt; for the backend with session affinity. &lt;strong&gt;Firestore&lt;/strong&gt; as the primary database. &lt;strong&gt;Cloud Storage&lt;/strong&gt; for media. &lt;strong&gt;Firebase Hosting&lt;/strong&gt; for the frontend. &lt;strong&gt;Firebase Authentication&lt;/strong&gt; for Google Sign-In. &lt;strong&gt;Terraform&lt;/strong&gt; for one-command infrastructure.&lt;/p&gt;

&lt;p&gt;A 3D memory palace that listens, remembers, and speaks back.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;I created this content for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt; &lt;a href="https://github.com/yelnady/rayan" rel="noopener noreferrer"&gt;github.com/yelnady/rayan&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Developer&lt;/strong&gt; &lt;a href="https://g.dev/yelnady" rel="noopener noreferrer"&gt;g.dev/yelnady&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
