Forem: Fran

I Built an AI That Actually Remembers You — Here's a 4-Minute Demo

Fran — Fri, 03 Apr 2026 09:38:02 +0000

Every AI conversation starts from zero. I built Alma to change that.

Alma is a full AI workspace with persistent memory — it learns from your conversations and uses that context across every interaction.

I just published a 4-minute product demo showing everything in action:

What you'll see

Persistent Memory — Alma extracts facts, decisions, and patterns from your conversations. Each memory gets a confidence score and category. In the demo, you can see it retrieve 15+ memories in real time to answer a complex planning prompt.

Soul Engine — Define your AI's personality with structured blocks: identity, rules, worldview, tensions, anti-patterns, communication modes. Not a flat text dump — a real identity system.

Video Studio — Generate professional videos with Runway Gen-4 and Gen-4.5. Choose style, camera movement, duration.

Image Studio — Create images with Flux Pro in 10+ styles.

Writing Tools — 7 AI transformations: summarize, humanize, grammar check, translate, expand, simplify, change tone.

Web Search — Three levels of depth with AI-powered summaries and cited sources.

Plus: Documents, Ideas, Trends & News, Dashboard, Command Palette, 6 specialized Skills.

Stack

Cloudflare Workers (Hono)
Cloudflare D1 + Vectorize + Durable Objects + Queues
React 19 + Vite 6 + Tailwind 4
Anthropic Claude (Haiku/Sonnet/Opus)
2,964 tests, 100+ API endpoints, 15 languages

Try it

Free tier available — no credit card required.

🌐 Web: alma.olivares.ai
📦 SDK: npm install @olivaresai/alma-sdk
🔌 MCP: npm install @olivaresai/alma-mcp
💻 VSCode: Search "Alma" in Extensions

Would love to hear your thoughts. What would you want your AI to remember?

I Tested 5 AI Memory Tools So You Don't Have To (2026 Comparison)

Fran — Tue, 31 Mar 2026 12:57:20 +0000

AI memory is the hottest infrastructure category of 2026. I tested the top 5 tools as both a developer and a daily user. Here's what I found.

The tools I tested

Mem0 — The market leader ($24M raised, 80K developers)
Zep — Temporal knowledge graphs
Letta — Agent runtime with self-editing memory
SuperMemory — All-in-one memory + RAG
Alma — End-user product with memory (full disclosure: I built this one)

Test methodology

I used each tool for 1 week with the same workflow:

Daily coding conversations (TypeScript, React, Cloudflare Workers)
Project planning sessions
Writing and brainstorming

I evaluated: setup time, memory accuracy, retrieval quality, daily usability, and pricing.

Results

Setup time

Tool	Setup	Notes
Mem0	15 min	pip install + API key. Clean SDK.
Zep	30 min	Docker compose or cloud signup. More config.
Letta	45 min	Full agent runtime. Steeper learning curve.
SuperMemory	5 min	Cloud-only. Fastest setup.
Alma	2 min	Web signup. MCP install for Claude: 1 command.

Memory accuracy after 1 week

Tool	Memories stored	Accuracy	False positives
Mem0	47	78%	10 (generic/obvious)
Zep	31	85%	4 (entity-focused)
Letta	23	82%	5 (agent-curated)
SuperMemory	52	71%	16 (over-extracts)
Alma	38	87%	5 (confidence scoring helps)

Key insight: More memories ≠ better. SuperMemory stored the most but had the lowest accuracy because it over-extracted. Alma's confidence scoring (1.0 = user stated, 0.7 = inferred, 0.5 = observed) let me quickly filter out noise. Zep's entity focus was precise but missed conversational context.

Retrieval quality

When I asked "What framework am I using?" after discussing Next.js in week 1:

Tool	Found it?	Response quality
Mem0	Yes	Returned raw memory: "Uses Next.js"
Zep	Yes	Rich: "Next.js e-commerce project, started March 2026"
Letta	Yes	Agent summarized: "Your main project uses Next.js with Stripe"
SuperMemory	Partial	Found Next.js but also returned 5 irrelevant memories
Alma	Yes	Context-aware: assembled Soul + relevant memories + recent episodes

Daily usability (as an end user, not a developer)

This is where the tools diverge completely:

Tool	Can I see my memories?	Can I edit/delete?	Can I search?	Has a UI?
Mem0	Via API/dashboard	Via API	Via API	Dashboard (basic)
Zep	Via API	Via API	Via API	No native UI
Letta	Agent decides	Agent decides	Via agent	Dev UI
SuperMemory	Dashboard	Dashboard	Dashboard	Yes
Alma	Full UI	Full UI	Keyword + semantic	Full app

If you're a developer integrating memory into your product, Mem0 and Zep are the best choices. Clean APIs, good docs, production-proven.

If you're a person who wants an AI that remembers you, only Alma and SuperMemory offer a real end-user experience. And Alma's 3-layer architecture (memories + episodes + procedures) + Soul Engine puts it in a different league for personalization.

Pricing comparison

Tool	Free tier	Paid	Best value
Mem0	10K memories	$19 → $249/mo	Standard ($19) if you don't need graphs
Zep	1K credits	$25/mo	Good for temporal use cases
Letta	Self-hosted	$20-200/mo	Free if you manage infra
SuperMemory	1M tokens	Usage-based	Cheap for light use
Alma	500 memories	$19-149/mo	Pro ($19) covers most users

My recommendation

Building an AI app? → Mem0 (most mature) or Zep (if you need temporal reasoning)
Want an AI that knows you? → Alma (full product with memory as core UX)
Researching agent memory? → Letta (most innovative architecture)
Need simple, cheap memory? → SuperMemory (easiest to start)

Links: Mem0 · Zep · Letta · SuperMemory · Alma

What's your experience with AI memory tools? Drop your setup in the comments.

Mem0 Is an API. I Built a Product. Here's Why That Distinction Matters.

Fran — Mon, 30 Mar 2026 11:38:39 +0000

There are now 8+ AI memory frameworks. Mem0, Zep, Letta, Hindsight, SuperMemory, LangMem — all solving the same problem: LLMs forget everything between conversations.

I spent 6 months building Alma, and I made a fundamentally different bet than all of them.

They built APIs. I built a product.

Let me explain why I think that matters.

The API approach

Mem0 is the market leader. $24M raised, 80K developers, AWS partnership. Their pitch: "Add memory to your AI app with a few lines of code."

from mem0 import Memory
m = Memory()
m.add("User prefers dark mode", user_id="alice")
results = m.search("what does alice prefer?", user_id="alice")

This is genuinely useful. If you're building a customer support bot or an AI agent, Mem0 gives you a memory layer you don't have to build yourself. The API is clean, the docs are good, and the managed service handles infrastructure.

But here's the thing: Mem0's customer is a developer building an app. Not the person using the app.

The end user never sees Mem0. They don't configure it. They don't decide what gets remembered. They don't search their own memories. Mem0 is infrastructure — invisible by design.

The product approach

Alma takes a different position. The user IS the customer.

When you open Alma, you chat with an AI that remembers you. Not because a developer wired up memory API calls — but because the product is designed around persistent context as the core experience.

You can:

See what Alma remembers about you (and edit/delete anything)
Configure the AI's personality via Soul Engine (13 blocks: identity, tone, boundaries, knowledge...)
Search across months of memories by keyword or meaning
Separate contexts into environments (work, personal, side project)
Export everything in 6 formats (MD, HTML, PDF, DOCX, XLSX, JSON)

The memory isn't hidden infrastructure. It's the product.

Why this distinction matters

1. Trust requires visibility

When Mem0 stores a memory, the end user has no idea it happened. They can't see it, correct it, or delete it. This is fine for backend systems — but if you're building a personal AI assistant, users need to trust what's being remembered.

Alma shows every memory with its confidence score (1.0 = user stated, 0.7 = AI inferred, 0.5 = observed), category, and last access date. You have full control.

2. Memory needs personality context

A raw fact — "user prefers TypeScript" — means nothing without context. How should the AI use this information? Should it suggest TypeScript for every project? Only when the user asks for recommendations? Never assume, just know?

Alma's Soul Engine solves this. You define not just what the AI knows, but how it behaves with that knowledge:

<soul>
  <identity>Senior developer who values clean architecture.</identity>
  <tone>Direct, concise. Code over explanations.</tone>
  <anti_patterns>Never suggest "any" type. Never use var.</anti_patterns>
  <knowledge>Working on a Next.js e-commerce app called ShopperPro.</knowledge>
</soul>

Memory APIs don't have this. They store facts without behavioral context.

3. Three layers beat one

Most memory APIs store flat key-value pairs or vector embeddings. Alma uses three distinct layers:

Layer	What it stores	Example
Memories	Discrete facts with confidence scoring	"Uses TypeScript" (confidence: 1.0)
Episodes	Conversation patterns detected automatically	"User debugs auth issues on Mondays"
Procedures	Learned workflows, reinforced by use	"PR review = security → performance → tests"

Episodes and procedures are generated automatically by a background processor. The user doesn't have to manually "teach" the AI — it learns from patterns in your conversations.

Mem0 recently added graph memory (entities + relationships), which is powerful for multi-entity tracking. But it's paywalled at $249/month and aimed at agent architectures, not personal use.

4. The "AI that knows me" is a consumer product

Today, 200M+ people use ChatGPT. Zero of them use Mem0 directly. The gap is obvious: people want AI that remembers them, but the existing products don't offer it.

ChatGPT added "Memory" in 2024 — but it's a flat list of facts with no search, no organization, no confidence scoring, no personality system, and no way to separate work from personal context.

Claude has no memory at all between conversations.

The market for "personal AI with real memory" is massive and underserved. Mem0 serves developers building toward this. Alma serves users directly.

Where Mem0 wins

Let me be fair. There are clear cases where Mem0 is the right choice:

You're building a product that needs memory as a feature (customer support bot, coding assistant, healthcare agent)
You need graph memory for tracking entity relationships across users
You want AWS integration (Mem0 is the exclusive memory provider for AWS Agent SDK)
You have a team of developers who will manage the integration

Mem0 is great infrastructure. I'd probably use it if I were building a multi-tenant SaaS with AI features.

Where Alma wins

You want an AI that remembers YOU — not an API to add memory to your app
You want control over what's remembered, with the ability to see, edit, search, and delete
You want personality — not just memory, but behavioral context that shapes responses
You want a complete workspace — chat, code, images, documents, voice, search — all with persistent context
You're a developer who wants MCP/SDK integration AND a product to use daily

The real comparison

	Mem0	Alma
What it is	Memory API for developers	AI product for users
Customer	Developer building an app	Person using AI daily
User sees memory?	No (infrastructure)	Yes (searchable, editable)
Personality system	No	Soul Engine (13 blocks)
Memory layers	1 (vectors) or 2 (+ graph at $249/mo)	3 (memories + episodes + procedures)
Pricing	Free → $19 → $249/mo	Free → $19 → $49 → $149/mo
Self-contained product	No (requires your app)	Yes (web app + extensions)
MCP Server	No	Yes (npm install)
Languages	English	15 languages

Try both

If you're a developer deciding between these, I'd genuinely suggest trying both:

Mem0 for adding memory to an app you're building: mem0.ai
Alma for an AI that remembers you personally: alma.olivares.ai

They solve different problems. The question is which problem you have.

I'm building Alma solo. If you have questions about the architecture, memory scoring, or anything else — drop a comment. I read every one.

I Built a Full AI Platform with Persistent Memory — Here's What I Learned

Fran — Wed, 25 Mar 2026 22:17:24 +0000

Alma by Olivares.AI is a persistent memory layer for AI. It remembers facts, decisions, preferences, and behavioral patterns across every conversation. Think of it as giving your AI a long-term brain.

What's new:

Code Workspace — Upload repos or clone from GitHub. 8 AI skills: explain, refactor, review, test, fix, document, search, commit. Resizable 3-panel layout with file explorer, editor, and AI chat. Choose between Claude Opus, Sonnet, or Haiku.
Video Studio — Generate videos with Runway Gen-4 Turbo and Gen-4.5. Plan scenes with AI, manage projects with resource workspaces, generate YouTube metadata (title, description, tags), and stitch multiple clips into one video.
Global Conversation Search — Search across all your conversations instantly. Find that decision you made weeks ago in seconds.
Conversation Branching — Fork any conversation from a specific message. Explore alternative approaches without losing the original thread.
10 Image Presets — Cinematic, Anime, Watercolor, Pixel Art, Photography, 3D Render, Sketch, Neon, Minimalist. Each preset automatically enhances your prompt.
OCR — Extract text from images using AI vision. Upload a photo of a document, whiteboard, or receipt.
Web Search with AI Summaries — Perplexity-style search powered by Brave + Tavily with Claude-generated summaries and source citations.
Trends & News — Stay current with trending topics across 10 categories (tech, science, business, health, sports, entertainment, politics, world, environment, AI).

The Stack:

Cloudflare Workers (Hono) + D1 + R2 + Vectorize + KV + Durable Objects
React 19 + Vite 6 + Tailwind CSS 4
35 API routes, 79 migrations, 2,600+ tests
MCP Server, VSCode Extension, JavaScript SDK (all on npm)
15 languages supported

Try it: alma.olivares.ai

How I Score, Rank, and Assemble AI Memory in Production

Fran — Sun, 22 Mar 2026 12:29:10 +0000

Every AI app eventually hits the same problem: the model needs context, but you can't dump everything into the system prompt. Token budgets are finite. Not all information is equally relevant. And the naive approach of just sending the last N messages falls apart the moment your user has 200 memories and a 4,000-token budget.

I've been running a memory scoring and context assembly system in production for months. This is how it works, with actual code.

The pipeline

The system has four stages:

Extract structured memories from conversation
Deduplicate against existing memories
Score and rank by relevance, importance, recency, and frequency
Assemble a token-budgeted system prompt

Each stage has specific engineering decisions that took a while to get right.

Stage 1: Extraction

After every 5 assistant messages, a background processor fires asynchronously via ctx.waitUntil(). It takes the last 20 messages and asks the cheapest available model to extract structured data:

interface ExtractedMemory {
  content: string;
  category: 'preference' | 'fact' | 'decision' | 'project';
  importance: number; // 0.0 to 1.0
}

interface ExtractedEpisode {
  summary: string;
  topics: string[];
  outcome: string | null;
}

The extraction prompt has specific rules:

- Importance: 0.9+ for critical info, 0.5-0.8 for useful, below 0.5 for minor
- Keep memories concise (one sentence each)
- Extract 0-10 memories (only what's genuinely worth remembering)

The "0-10 memories" range matters. Early versions didn't cap extraction and the system generated noise — trivial facts diluting important ones. Capping at 10 per extraction cycle and requiring importance thresholds cleaned this up.
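A minimal sketch of what the cap and threshold could look like as a post-processing step on the model's output. This is not Alma's actual code: the function name `selectMemories` and the 0.5 default threshold are assumptions for illustration (the post gives the cap of 10 but not the threshold value).

```typescript
interface ExtractedMemory {
  content: string;
  category: 'preference' | 'fact' | 'decision' | 'project';
  importance: number; // 0.0 to 1.0
}

const MAX_PER_CYCLE = 10;   // cap stated in the post
const MIN_IMPORTANCE = 0.5; // hypothetical threshold, not stated in the post

// Keep only well-formed candidates above the threshold, most important first,
// and never more than the per-cycle cap.
function selectMemories(
  candidates: ExtractedMemory[],
  minImportance: number = MIN_IMPORTANCE,
  maxPerCycle: number = MAX_PER_CYCLE,
): ExtractedMemory[] {
  return candidates
    .filter(m => m.content.trim().length > 0)
    .filter(m => m.importance >= minImportance && m.importance <= 1)
    .sort((a, b) => b.importance - a.importance)
    .slice(0, maxPerCycle);
}
```

Enforcing the cap in code as well as in the prompt means a model that ignores the "0-10" instruction still can't flood the store.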

The episode summary is also structured — not "you talked for 45 minutes" but { summary: "Debugged auth middleware", topics: ["authentication", "middleware"], outcome: "Root cause was missing await" }. This makes episodes searchable by topic without embedding the full transcript.

One critical detail: this runs fire-and-forget. The user never waits. On Cloudflare Workers, that means every background promise needs both ctx.waitUntil() AND .catch():

const backgroundWork = processor.process(conversationId, messages, llm)
  .catch(err => console.error('Background processing failed:', err));
ctx.waitUntil(backgroundWork);

Missing that .catch() on Workers with compatibility dates 2024-10+ causes unhandled rejections that silently kill the Worker. This single line prevented a crash on every chat request.

Stage 2: Deduplication

Without dedup, you get the same preference stored 30 times. "User prefers TypeScript" appearing in every extraction cycle.

The approach: Jaccard similarity on extracted keywords with a 60% threshold and a 3-keyword minimum.

Why 60%? Tested extensively:

40% merges distinct memories ("prefers TypeScript" conflates with "prefers functional patterns")
80% lets obvious duplicates through
60% with 3-keyword minimum catches real duplicates while preserving distinct-but-related memories

When a duplicate is detected, the existing memory's access_count increments. Frequently confirmed facts naturally rise in rankings without creating noise.
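Here's a minimal sketch of a Jaccard duplicate check, under stated assumptions. The keyword extractor (lowercasing, a tiny stopword list) is hypothetical, and I read the "3-keyword minimum" as "both keyword sets need at least 3 entries"; the post doesn't specify either detail.

```typescript
// Hypothetical stopword list, only for illustration.
const STOPWORDS = new Set(['the', 'and', 'for', 'with', 'over', 'user', 'that', 'this']);

// Lowercase, split on non-alphanumerics, drop short words and stopwords.
function keywords(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(w => w.length > 2 && !STOPWORDS.has(w));
  return new Set(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach(w => {
    if (b.has(w)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Treat two memories as duplicates only when both have enough keywords
// and their keyword overlap meets the threshold.
function isDuplicate(
  candidate: string,
  existing: string,
  threshold: number = 0.6,
  minKeywords: number = 3,
): boolean {
  const a = keywords(candidate);
  const b = keywords(existing);
  if (a.size < minKeywords || b.size < minKeywords) return false;
  return jaccard(a, b) >= threshold;
}
```

With these assumptions, "User prefers TypeScript for backend services" and "Prefers TypeScript for backend services" share all four keywords and count as duplicates, while "Prefers functional patterns in React components" shares only one keyword with either and stays a separate memory.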

Stage 3: Scoring

This is where it gets interesting. Every memory gets a composite score:

const DEFAULT_WEIGHTS = {
  relevance: 0.40,  // cosine similarity to current query
  importance: 0.30,  // extracted weight (0-1)
  recency:    0.20,  // exponential decay, 7-day half-life
  frequency:  0.10,  // log-scaled access count
};

The recency function uses exponential decay:

function recencyScore(accessedAt: string): number {
  const accessed = new Date(accessedAt).getTime();
  // Clamp at zero so a timestamp slightly in the future (clock skew) can't score above 1.0
  const hoursAgo = Math.max(0, (Date.now() - accessed) / (1000 * 60 * 60));
  const halfLifeHours = 7 * 24; // 7 days
  return Math.exp((-Math.LN2 * hoursAgo) / halfLifeHours);
}

A memory accessed just now scores 1.0. One week ago: 0.5. Two weeks: 0.25. Stale memories don't disappear; they just yield to fresher ones when the budget is tight.

Frequency uses logarithmic scaling so high-access memories don't dominate. With the formula below, the score reaches its cap of 1.0 at 99 accesses:

function frequencyScore(accessCount: number): number {
  if (accessCount <= 0) return 0;
  return Math.min(1, Math.log10(accessCount + 1) / 2);
}

Why these weights?

Relevance at 0.40 because a perfectly scored memory about cooking is useless when you're debugging auth. Semantic relevance is the primary filter.

Importance at 0.30 because not all memories are equal. "User is migrating to PostgreSQL this quarter" (0.9) should outrank "User mentioned coffee" (0.3), even if the coffee mention is more recent.

Recency at 0.20 because conversations have temporal context. What you discussed yesterday is more likely relevant than what you discussed a month ago — but not always.

Frequency at 0.10 as a tiebreaker. Memories that keep surfacing in different conversations are probably important, but this shouldn't override direct relevance.
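To make the trade-offs concrete, here's a sketch that combines the four components. It assumes the composite is a plain weighted sum, which the weights suggest but the post doesn't spell out. The `ScoredInput` shape and the explicit `now` parameter (added so the example is deterministic) are also assumptions.

```typescript
interface ScoredInput {
  relevance: number;   // cosine similarity to the query, assumed already in 0-1
  importance: number;  // 0-1
  accessedAt: string;  // ISO timestamp of last access
  accessCount: number;
}

const DEFAULT_WEIGHTS = { relevance: 0.40, importance: 0.30, recency: 0.20, frequency: 0.10 };

// Exponential decay with a 7-day half-life.
function recencyScore(accessedAt: string, now: number = Date.now()): number {
  const hoursAgo = Math.max(0, (now - new Date(accessedAt).getTime()) / (1000 * 60 * 60));
  return Math.exp((-Math.LN2 * hoursAgo) / (7 * 24));
}

// Log-scaled access count, capped at 1.
function frequencyScore(accessCount: number): number {
  if (accessCount <= 0) return 0;
  return Math.min(1, Math.log10(accessCount + 1) / 2);
}

// Assumed combination: linear weighted sum of the four components.
function compositeScore(
  m: ScoredInput,
  now: number = Date.now(),
  w: typeof DEFAULT_WEIGHTS = DEFAULT_WEIGHTS,
): number {
  return (
    w.relevance * m.relevance +
    w.importance * m.importance +
    w.recency * recencyScore(m.accessedAt, now) +
    w.frequency * frequencyScore(m.accessCount)
  );
}
```

Under these assumptions, a PostgreSQL migration memory (relevance 0.8, importance 0.9, last accessed a week ago, 9 accesses) scores 0.32 + 0.27 + 0.10 + 0.05 = 0.74. A coffee mention (relevance 0.1, importance 0.3, accessed just now, never retrieved) scores 0.04 + 0.09 + 0.20 + 0 = 0.33. The fresher memory still loses, which is the behavior the weights are meant to produce.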

The confidence dimension

Each memory also has a confidence score that's separate from importance:

1.0 — user explicitly stated this
0.7 — AI inferred this from conversation context

Confidence feeds into retrieval quality. A high-confidence preference (the user said "I always use TypeScript") should surface over a high-importance but low-confidence inference ("probably prefers dark mode based on theme discussion").

Stage 4: Context Assembly

The Context Assembler takes scored memories and builds a token-budgeted system prompt:

interface AssembledContext {
  systemPrompt: string;
  metadata: {
    soulTokens: number;
    memoriesIncluded: number;
    memoriesTokens: number;
    episodesIncluded: number;
    proceduresIncluded: number;
    totalTokens: number;
    topMemoryScores: Array<{ content: string; score: number }>;
  };
}

The assembly order is strict:

Soul blocks first (identity, style, context) — always included, non-negotiable
Scored memories — ranked, filling up to 50% of remaining token budget
Recent episodes — latest conversation summaries
Relevant procedures — behavioral patterns matching the current query

Everything is wrapped in XML sections for structured parsing:

<alma_soul>
  <identity>...</identity>
  <anti_patterns>...</anti_patterns>
</alma_soul>

<alma_memories>
  <memory importance="0.9" category="project">Migrating auth to PostgreSQL</memory>
  <memory importance="0.7" category="preference">Prefers concise code reviews</memory>
</alma_memories>

<alma_episodes>
  <episode topics="auth,middleware">Debugged auth middleware...</episode>
</alma_episodes>

XML-safe truncation is critical — you never cut mid-tag. If a memory doesn't fit within the remaining budget, skip it entirely rather than corrupting the XML structure.
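A sketch of the skip-rather-than-truncate rule for the memories section. The 4-characters-per-token estimate is a rough stand-in for a real tokenizer, and the names `renderMemories` and `RankedMemory` are hypothetical. None of this is taken from Alma's codebase.

```typescript
interface RankedMemory {
  content: string;
  category: string;
  importance: number;
  score: number;
}

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Escape characters that would otherwise break the XML structure.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderMemories(
  memories: RankedMemory[],
  tokenBudget: number,
): { xml: string; included: number; tokens: number } {
  const open = '<alma_memories>';
  const close = '</alma_memories>';
  let used = estimateTokens(open) + estimateTokens(close);
  // If even the wrapper doesn't fit, emit nothing at all.
  if (used > tokenBudget) return { xml: '', included: 0, tokens: 0 };

  const lines: string[] = [];
  const ranked = memories.slice().sort((a, b) => b.score - a.score);
  for (const m of ranked) {
    const line =
      `  <memory importance="${m.importance}" category="${escapeXml(m.category)}">` +
      `${escapeXml(m.content)}</memory>`;
    const cost = estimateTokens(line);
    if (used + cost > tokenBudget) continue; // skip the whole element, never cut mid-tag
    lines.push(line);
    used += cost;
  }
  return { xml: [open, ...lines, close].join('\n'), included: lines.length, tokens: used };
}
```

Using `continue` instead of `break` lets a shorter, lower-ranked memory fill space that a long one couldn't. Whether that's desirable depends on how strictly you want to respect rank order.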

Why XML over JSON?

Tested both. XML with labeled attributes gives the model clearer section boundaries. JSON works fine for structured data but the model is more likely to reference XML-tagged content naturally in responses. The importance and category attributes are visible to the model, which helps it prioritize.

What I got wrong

First version had no scoring. Just retrieved the N most recent memories. This breaks immediately — a critical project decision from last week gets buried under trivial facts from today.

Second version over-weighted recency. Everything decayed too fast. Important long-term preferences disappeared within two weeks.

Third version didn't deduplicate. After a month of use, the same preferences appeared 40+ times, eating token budget with redundant information.

The current scoring weights are version four. They've been stable in production for months, but they're still configurable per user — different use cases might need different balances.

Numbers

Extraction latency: 0ms user-facing (background processing)
Scoring: <5ms for 500 memories
Context assembly: <10ms including soul prompt rendering
D1 reads: 1-5ms, writes: 5-15ms
Total overhead per message: near-zero for the user, ~2-4 seconds background

The system is Alma — alma.olivares.ai. It wraps this pipeline in a web app, MCP server (21 tools for Claude Desktop/Cursor/Windsurf), VSCode extension, and REST API. Free tier available.

But the scoring architecture applies to any AI system that needs to manage context at scale. The core insight: memory without ranking is just a pile of text. Ranking without token budgeting overflows the context window. Both without extraction means the user maintains everything by hand. You need all four stages.

I built a memory system for AI — here's the architecture

Fran — Sat, 14 Mar 2026 09:44:31 +0000

If you use Claude Code or Claude Projects with a well-written CLAUDE.md, you already know the difference it makes. The AI knows your stack, your conventions, your project structure. It's genuinely great.

But CLAUDE.md is static. You write it once, you maintain it manually, and it lives in one project. What about your preferences across projects? What about decisions you made three weeks ago? What about the patterns the AI could learn from watching how you work — if it had somewhere to store them?

That's the gap I wanted to close. So I built Alma — a cognitive memory system that gives AI persistent, structured memory that grows over time.

Here's how it works under the hood.

The architecture: 3 layers of memory

Alma organizes memory in three distinct layers, each serving a different purpose.

Memories — what the AI knows about you

Structured facts. Each one is semantically indexed, categorized, scored by importance, and tagged with confidence:

"Prefers TypeScript over JavaScript"          [confidence: 1.0, category: preference]
"Project uses D1 with Drizzle ORM"            [confidence: 1.0, category: technical]
"Hates verbose explanations — get to the point" [confidence: 0.8, category: preference]
"Decided on event-driven architecture March 3" [confidence: 1.0, category: decision]

When you say "review my auth middleware", Alma's Context Assembler runs a hybrid search (keyword + semantic) across your memories. It pulls the ones relevant to auth, to your stack, to your coding preferences — and injects them into the system prompt before the LLM even sees your message.

The result: the AI already has the relevant context by the time it reads your message.
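How the keyword and semantic results get combined isn't described in this post, so treat the following as one plausible sketch rather than Alma's method. It assumes both searches return scores normalized to 0-1 and blends them with a hypothetical `keywordWeight`; a memory found by only one search gets 0 for the other side.

```typescript
interface Hit {
  id: string;
  score: number; // assumed normalized to 0-1
}

// Merge two ranked lists by memory id and blend their scores.
function mergeHybrid(keywordHits: Hit[], semanticHits: Hit[], keywordWeight: number = 0.5): Hit[] {
  const combined = new Map<string, { keyword: number; semantic: number }>();
  for (const h of keywordHits) {
    combined.set(h.id, { keyword: h.score, semantic: 0 });
  }
  for (const h of semanticHits) {
    const entry = combined.get(h.id);
    if (entry) {
      entry.semantic = h.score;
    } else {
      combined.set(h.id, { keyword: 0, semantic: h.score });
    }
  }
  const merged: Hit[] = [];
  combined.forEach((v, id) => {
    merged.push({ id, score: keywordWeight * v.keyword + (1 - keywordWeight) * v.semantic });
  });
  return merged.sort((a, b) => b.score - a.score);
}
```

A memory that shows up in both result sets naturally ranks above one that matched only on keywords or only on meaning.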

Episodes — what happened before

After each conversation, a background processor generates a structured summary:

Episode: "Auth Middleware Refactor"
  Summary: Rewrote JWT validation to use jose library.
           Added refresh token rotation. Decided against
           session cookies for API-first architecture.
  Topics: auth, security, middleware
  Outcome: PR merged, deployed to staging

When you say "remember that auth discussion?", the AI recalls the full episode — decisions, outcomes, context. Structured summaries, not raw transcript fragments.

Procedures — how you like to work

Procedures are behavioral patterns the AI learns from observing your interactions:

"When reviewing code → check error handling first, then types"
"When explaining → use bullet points, not paragraphs"
"When debugging → ask for the error message before suggesting fixes"

These aren't stored and forgotten. They're matched against context on every conversation and applied dynamically. After a few weeks, the AI starts anticipating how you want things done — without you ever explicitly configuring it.

The Soul Engine: identity, not a system prompt

The Soul Engine goes beyond a single system prompt. It's 12 structured blocks organized in three sections:

SOUL — who the AI is:

Identity: core character, name, role
Worldview: how it approaches problems
Rules: non-negotiable behaviors (never fabricate memories, acknowledge uncertainty)
Tensions: the paradoxes that make personality feel real ("technical but warm", "concise but thorough when it matters")

STYLE — how it communicates:

Style Guide: voice, vocabulary, structure
Anti-Patterns: things to never do ("never say 'As an AI language model'")
Communication Modes: different modes for different situations (teaching, debugging, creative)
Example Interactions: calibration by demonstration

CONTEXT — what it knows right now:

User Profile, Active Context, Learned Patterns, Scratchpad
Plus custom blocks you define yourself

Every conversation, the Context Assembler renders this into a structured XML system prompt:

<alma_soul>
  <identity>You are Alma. Direct, technical, warm...</identity>
  <worldview>Simplicity over cleverness. Working code over elegant abstractions.</worldview>
  <tensions>Technical but approachable. Opinionated but open to correction.</tensions>
  <rules>Always reference relevant memories. Never fabricate information.</rules>
</alma_soul>

<alma_context>
  <user_profile>Senior dev, TypeScript, Hono + D1 stack...</user_profile>
  <active_context>Working on auth middleware refactor...</active_context>
  <memories>
    [12 most relevant memories for this conversation, ranked by semantic score]
  </memories>
  <episodes>
    [3 recent relevant episodes with summaries and outcomes]
  </episodes>
  <procedures>
    [Matched behavioral patterns for code review context]
  </procedures>
</alma_context>

Priority order: soul blocks first (always), then memories, then episodes, then procedures — all fit within a token budget. The AI gets a complete picture of who you are and what's happening, every single time.
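As an illustration of that priority order, here's a small greedy allocator. It's a sketch under assumptions: the token estimate is a crude 4-characters-per-token heuristic, the `Section` shape and `fillByPriority` name are made up for the example, and the real assembler may split the budget between sections differently.

```typescript
interface Section {
  name: string;
  items: string[]; // already ranked, best first
}

// Rough heuristic (an assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Soul is always included; remaining sections are filled in the order given.
function fillByPriority(
  soul: string,
  sections: Section[],
  tokenBudget: number,
): { included: Record<string, number>; totalTokens: number } {
  // The soul counts against the budget but is never dropped,
  // so totalTokens can exceed the budget if the soul alone is too large.
  let total = estimateTokens(soul);
  const included: Record<string, number> = {};
  for (const section of sections) {
    included[section.name] = 0;
    for (const item of section.items) {
      const cost = estimateTokens(item);
      if (total + cost > tokenBudget) continue; // doesn't fit, try the next item
      total += cost;
      included[section.name] += 1;
    }
  }
  return { included, totalTokens: total };
}
```

Passing the sections as `[memories, episodes, procedures]` reproduces the order described above: later sections only get whatever budget the earlier ones left over.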

It learns while you chat

Memory extraction runs in the background. You never wait for it.

After a conversation, a background processor (cheapest LLM, fire-and-forget with ctx.waitUntil()) analyzes the exchange and:

Extracts new memories
Generates episode summaries
Updates your user profile and active context
Refines procedures from observed patterns

You just chat. The AI gets quietly better after every interaction.

Developer-first: everything is an API

REST API — 140+ endpoints

Full CRUD on everything. Memories, episodes, procedures, blocks, conversations, chat (SSE streaming), files, images, voice, teams.

# Assemble full context for any message
curl -X POST https://alma.olivares.ai/api/v1/context/assemble \
  -H "X-API-Key: alma_key_..." \
  -H "Content-Type: application/json" \
  -d '{"user_message": "Review the auth middleware"}'

# Returns: structured system prompt + metadata (token counts, memory scores, keywords)

MCP Server — 21 tools for Claude Desktop, Cursor, Windsurf

{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": { "ALMA_API_KEY": "alma_key_..." }
    }
  }
}

Your AI gets native tools: alma_search, alma_remember, alma_recall, alma_assemble, alma_focus, alma_update_block — it reads and writes to its own memory as part of reasoning.

JavaScript SDK

npm install @olivaresai/alma-sdk

import { Alma } from '@olivaresai/alma-sdk';

const alma = new Alma({ apiKey: 'alma_key_...' });
const context = await alma.context.assemble({ message: 'Review the auth middleware' });
// → Full system prompt with soul, memories, episodes, procedures

VSCode Extension

Memory search from the command palette. Context injection. Chat with persistent memory without leaving your editor.

Voice, images, documents — same memory

Voice Chat: Deepgram Nova-2 (transcription) + ElevenLabs (synthesis). Talk to your AI by voice — same persistent memory as text.
Image Studio: Flux Pro + Leonardo AI. The AI remembers your style preferences and past generations.
Document Generation: Export conversations to PDF, DOCX, XLSX, PPTX.

Every modality shares the same memory layer. A voice conversation references decisions from a text chat two weeks ago.

3 models, your choice

Powered exclusively by Anthropic Claude:

Tier	Model	Use case
Normal	Claude Haiku	Quick tasks, everyday
Advanced	Claude Sonnet	Professional work, complex analysis
Complex	Claude Opus	Deep reasoning, nuanced problems

Free plan gets Haiku. Paid plans get all three. Switch anytime — memory carries over.

BYOK: On Advanced+ plans, bring your own Anthropic, Replicate, or Leonardo API keys. Queries go direct to your accounts.

Privacy

Your memories, episodes, procedures, and identity blocks are the most personal data an AI can hold. Alma's position:

You own everything. Full .alma portable export. GDPR compliant (Articles 15-22).
Never used for training. Zero tracking. Zero analytics.
Account deletion permanently purges databases, R2 storage, and Stripe records. No retention.
Encrypted at rest and in transit. API keys hashed and never exposed after creation.

Pricing

Plan	Price	Highlights
Free	$0 forever	500 memories, 50 episodes, Claude Haiku
Pro	$19/mo	10K memories, 3 AI tiers, voice, images
Advanced	$49/mo	50K memories, API + MCP access, BYOK
Ultimate	$149/mo	Unlimited everything, dedicated support
Ultimate Max	$249/mo	2x weekly AI budget, maximum capacity

Weekly AI budget resets each Monday. Credit packs ($14.99 / $39.99 / $89.99) never expire.

The stack

If you're curious: the entire system runs on Cloudflare Workers (D1 for SQL, Vectorize for embeddings, R2 for files), Hono for the API framework, React for the frontend, and Anthropic Claude for all AI inference. 56 database migrations, ~1,600 tests passing. Solo developer.

Try it

Platform	Link
Web App	alma.olivares.ai — free, no credit card
MCP Server	@olivaresai/alma-mcp
VSCode	VS Code Marketplace
JS SDK	@olivaresai/alma-sdk
REST API	Developer Docs — 140+ endpoints
Docs	olivares.ai/docs

The free tier has 500 memories and no time limit. If you've ever been frustrated by an AI that forgets everything, give it a few conversations. The difference is immediate.

What would you want an AI that actually remembers you to do? I'd genuinely like to know.

OlivaresAI.

I replaced my CLAUDE.md with a 3-layer cognitive memory system. Here's the architecture.

Fran — Thu, 12 Mar 2026 07:51:43 +0000

I built a structured memory system for AI called Alma. This post explains the architecture, not the marketing.

The problem, technically

Current AI memory implementations (CLAUDE.md, .cursorrules, ChatGPT Memory) share these limitations:

No schema. All data is unstructured text. No types, no fields, no queryable metadata.
No weighting. Every piece of information has equal priority in the context window.
No automatic extraction. The user manually maintains the memory.
No deduplication. Similar information accumulates without merging.
No separation of concerns. Identity, style preferences, and session context are mixed.

The architecture

Alma has three memory layers (memories, episodes, procedures), the Soul Engine, and a context assembler that combines them:

┌─────────────────────────────────────────────┐
│              Context Assembler              │
│  (dynamic token budget, relevance scoring)  │
├──────────┬──────────┬──────────┬────────────┤
│ Soul     │ Memories │ Episodes │ Procedures │
│ Engine   │          │          │            │
│ 13 blocks│ Weighted │ Summaries│ Behavioral │
│ Identity │ facts    │ w/ topics│ patterns   │
│ Style    │ w/ score │ outcomes │ auto-      │
│ Context  │ category │ search   │ extracted  │
└──────────┴──────────┴──────────┴────────────┘
         ↑ Background Processor ↑
         (async, every N messages)

Layer 1: Memories

Schema:

interface Memory {
  id: string;
  content: string;
  category: 'preference' | 'fact' | 'decision' | 'project' | 'general';
  importance: number;     // 0-1, determines context priority
  source: 'manual' | 'extracted' | 'extension' | 'api' | 'consolidated';
  access_count: number;   // incremented on retrieval
  reinforcement_count: number; // incremented on dedup match
  embedding: Float32Array;     // for semantic search
  created_at: string;
  last_accessed_at: string;
}

Deduplication uses Jaccard similarity on keyword sets, with a 60% threshold and a 3-keyword minimum. When a new memory scores above the threshold against an existing one, the existing memory is reinforced (its reinforcement_count is incremented) instead of a new record being created.
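A minimal sketch of that dedup check, to make the threshold concrete. Only the 60% threshold and the 3-keyword minimum come from the description above. The keyword extraction, the stop-word list, the function names, and the reading of "3-keyword minimum" as "skip the comparison when either side has fewer than 3 keywords" are assumptions for illustration, not Alma's actual code.

```typescript
const SIMILARITY_THRESHOLD = 0.6; // 60% threshold from the post
const MIN_KEYWORDS = 3;           // 3-keyword minimum from the post

// Assumed stop-word list, for illustration only.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "to", "of", "in", "for", "is"]);

// Assumed keyword extraction: lowercase, split on non-alphanumerics, drop stop words.
function keywords(text: string): Set<string> {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1 && !STOP_WORDS.has(t));
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((k) => {
    if (b.has(k)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

interface StoredMemory {
  id: string;
  content: string;
  reinforcement_count: number;
}

// Returns the reinforced memory on a match, or null when a new record should be created.
// Treating a score exactly at the threshold as a match is an assumption.
function reinforceIfDuplicate(candidate: string, store: StoredMemory[]): StoredMemory | null {
  const candidateKeys = keywords(candidate);
  if (candidateKeys.size < MIN_KEYWORDS) return null;
  for (const memory of store) {
    const memoryKeys = keywords(memory.content);
    if (memoryKeys.size < MIN_KEYWORDS) continue;
    if (jaccard(candidateKeys, memoryKeys) >= SIMILARITY_THRESHOLD) {
      memory.reinforcement_count += 1;
      return memory;
    }
  }
  return null;
}
```

With this sketch, "prefers TypeScript for backend services" shares 4 of 5 keywords with a stored "User prefers TypeScript for backend services", so the stored memory is reinforced rather than duplicated.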

Search is hybrid: keyword (SQL FTS5) + semantic (cosine similarity on Cloudflare Vectorize embeddings). Results merged and re-ranked by a weighted score:

const WEIGHTS = {
  relevance: 0.40,   // Cosine similarity to current query
  importance: 0.30,  // 0.0-1.0, extracted or user-assigned
  recency: 0.20,     // Exponential decay, 7-day half-life
  frequency: 0.10,   // Logarithmic scale of access count
};
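One way the weights could combine into a single score, as a sketch. The weights and the 7-day half-life come from the block above. Measuring recency from last_accessed_at, normalising frequency so that it saturates at 100 accesses, and all function names are assumptions made for this example.

```typescript
const SCORE_WEIGHTS = { relevance: 0.4, importance: 0.3, recency: 0.2, frequency: 0.1 };
const HALF_LIFE_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;
// Assumed saturation point for the log-scaled frequency term (not from the post).
const FREQUENCY_SATURATION = 100;

interface ScoreInput {
  relevance: number;        // cosine similarity, assumed already in 0..1
  importance: number;       // 0..1
  access_count: number;
  last_accessed_at: string; // ISO timestamp
}

// Exponential decay: 1.0 right now, 0.5 after one half-life, 0.25 after two.
function recencyScore(lastAccessedAt: string, now: Date): number {
  const ageMs = now.getTime() - new Date(lastAccessedAt).getTime();
  const ageDays = Math.max(0, ageMs / MS_PER_DAY);
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

// Logarithmic scale of access count, clamped to 0..1.
function frequencyScore(accessCount: number): number {
  return Math.min(1, Math.log(1 + accessCount) / Math.log(1 + FREQUENCY_SATURATION));
}

// Weighted sum of the four components.
function scoreMemory(m: ScoreInput, now: Date): number {
  return (
    SCORE_WEIGHTS.relevance * m.relevance +
    SCORE_WEIGHTS.importance * m.importance +
    SCORE_WEIGHTS.recency * recencyScore(m.last_accessed_at, now) +
    SCORE_WEIGHTS.frequency * frequencyScore(m.access_count)
  );
}
```

The log scale means the jump from 1 to 10 accesses matters more than the jump from 50 to 60, which keeps heavily used memories from dominating the ranking.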

Layer 2: Episodes

interface Episode {
  id: string;
  conversation_id: string;
  summary: string;
  topics: string[];
  outcome: string;
  message_count: number;
  embedding: Float32Array;
}

Auto-generated at conversation end. Searchable by topic, outcome, or semantic similarity.

Layer 3: Procedures

interface Procedure {
  id: string;
  content: string;        // "Checks error handling first in code reviews"
  category: string;
  trigger: string;        // When this pattern activates
  source: 'extracted' | 'manual';
}

Extracted by the background processor analyzing conversation patterns. These represent behavioral habits, not explicit preferences.

Soul Engine: 13 blocks

type SoulSection = 'identity' | 'style' | 'context';
type BlockKey =
  | 'identity' | 'worldview' | 'tensions' | 'rules'
  | 'style_guide' | 'anti_patterns' | 'communication' | 'examples'
  | 'user_profile' | 'active_context' | 'learned_patterns'
  | 'scratchpad' | 'custom';

interface SoulBlock {
  key: BlockKey;
  section: SoulSection;
  content: string;
  char_limit: number;
  priority: number;
  truncation: 'head' | 'tail';  // head = keep newest, tail = keep oldest
}

Identity blocks use tail truncation (preserve oldest = core values stable). Context blocks use head truncation (trim oldest = keep fresh data). This simple mechanism creates different temporal behaviors without complex logic.
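The two truncation modes can be sketched in a few lines. This assumes block content is ordered oldest-first (new text is appended at the end) and that limits are counted in characters, as char_limit suggests. The function name is hypothetical.

```typescript
type Truncation = "head" | "tail";

// "head" drops characters from the start, so the newest text survives.
// "tail" drops characters from the end, so the oldest text survives.
function truncateBlock(content: string, charLimit: number, mode: Truncation): string {
  if (charLimit <= 0) return "";
  if (content.length <= charLimit) return content;
  return mode === "head"
    ? content.slice(content.length - charLimit)
    : content.slice(0, charLimit);
}
```

For example, truncating "old middle new" to 3 characters keeps "new" in head mode and "old" in tail mode.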

Context Assembler

async function assembleContext(userId: string, message: string): Promise<string> {
  // 1. Soul Engine — always included, highest priority
  const soul = await renderSoulBlocks(userId);

  // 2. Relevant memories — scored by semantic similarity to current message
  const memories = await searchMemories(userId, message, { mode: 'hybrid' });

  // 3. Recent episodes — for conversation continuity
  const episodes = await getRecentEpisodes(userId);

  // 4. Matching procedures — behavioral patterns
  const procedures = await matchProcedures(userId, message);

  // 5. Dynamic token budget — sections compete for space
  return buildPrompt({ soul, memories, episodes, procedures }, TOKEN_BUDGET);
}

Each section has a priority. If total tokens exceed the budget, lower-priority sections get truncated first. The Soul Engine is always preserved in full.
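A rough sketch of that budgeting rule, under stated assumptions: tokens are estimated at about 4 characters each, a pinned flag marks sections that are never cut (the Soul Engine here), and trimming removes text from the end of a section. None of these details are confirmed by the post, and fitToBudget is a hypothetical name.

```typescript
interface Section {
  name: string;
  text: string;
  priority: number;  // higher = more important
  pinned?: boolean;  // never truncated (assumed flag)
}

const CHARS_PER_TOKEN = 4; // crude estimate, assumption
const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

// Trims the lowest-priority sections first until the total fits the budget.
// If pinned sections alone exceed the budget, the result stays over budget.
function fitToBudget(sections: Section[], budget: number): Section[] {
  const result = sections.map((s) => ({ ...s }));
  let excess = result.reduce((sum, s) => sum + estimateTokens(s.text), 0) - budget;

  const trimOrder = result
    .filter((s) => !s.pinned)
    .sort((a, b) => a.priority - b.priority);

  for (const section of trimOrder) {
    if (excess <= 0) break;
    const tokens = estimateTokens(section.text);
    const cut = Math.min(tokens, excess);
    section.text = section.text.slice(0, (tokens - cut) * CHARS_PER_TOKEN);
    excess -= tokens - estimateTokens(section.text);
  }
  return result;
}
```

With a budget of 35 tokens and sections of 10 (soul), 20 (memories), 10 (episodes), and 10 (procedures) tokens, this sketch empties procedures, halves episodes, and leaves memories and soul untouched.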

Background Processor

Fires asynchronously via ctx.waitUntil() every N messages:

Sends recent conversation to Claude Haiku for analysis
Receives structured JSON with extracted memories, episodes, procedures
Deduplicates memories against existing store
Updates relevant soul blocks (active_context, learned_patterns, user_profile)
Stores episode summary

Because this work runs outside the request/response path, it adds no latency to the conversation.
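The trigger itself can be sketched like this. ctx.waitUntil() is the real Workers API for keeping a Worker alive until a promise settles. The interface below is a minimal local stand-in for the Workers ExecutionContext so the example is self-contained, and the value of N, the function name, and the processConversation callback are assumptions.

```typescript
// Minimal local stand-in for the part of ExecutionContext used here.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Assumed value for "N"; the post does not state it.
const PROCESS_EVERY_N_MESSAGES = 5;

// Schedules background analysis on every Nth message.
// Returns true when work was scheduled, false otherwise.
function scheduleBackgroundProcessing(
  ctx: ExecutionContextLike,
  messageCount: number,
  processConversation: () => Promise<void>,
): boolean {
  if (messageCount === 0 || messageCount % PROCESS_EVERY_N_MESSAGES !== 0) {
    return false;
  }
  // The handler does not await this promise, so the response is not delayed.
  // Errors are logged here because nothing else will observe them.
  ctx.waitUntil(
    processConversation().catch((err) => {
      console.error("background processing failed", err);
    }),
  );
  return true;
}
```

In a request handler, this would be called after the message is stored, with processConversation wrapping the extraction steps listed above.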

Infrastructure

Entirely Cloudflare:

Workers — API, SSE streaming, background processing
D1 — SQLite database (56 migrations)
Vectorize — Embedding storage and similarity search
R2 — File uploads (images, documents)
KV — Configuration cache
Durable Objects — Atomic budget tracking (single-threaded counters)

No AWS. No external databases. Cold start under 5ms.

Numbers

1,690 passing tests across 102 files
56 database migrations
180 REST API endpoints
15 fully localized languages
6 agent tools in chat + 21 MCP tools + 9 MCP resources

Try it

Web app: alma.olivares.ai

Free tier: 500 memories, Claude Haiku, automatic learning. No credit card.

Built by Francisco @ Olivares.AI

The Soul Engine: 13 blocks that replaced my 200-line system prompt

Fran — Wed, 11 Mar 2026 12:12:35 +0000

I used Claude with a 200+ line system prompt for months. Every convention, every preference, every project decision — crammed into a single text document. It worked. Barely.

Three problems kept growing:

1. No priority. "Be concise" and "never fabricate data" had equal weight. One is a style preference. The other is a critical rule.

2. Static by design. I corrected the same behavior ten times — "don't add comments to obvious code" — and it never stuck because the prompt didn't learn.

3. Mixed concerns. "You are thoughtful and direct" and "I'm working on the auth module this week" are fundamentally different types of information with different lifespans.

The Soul Engine

I built a replacement. 13 blocks organized into three sections, each with a different purpose and rate of change:

Section 1: alma_soul — WHO the AI is

<alma_soul>
  <identity>Core traits. Non-negotiable.</identity>
  <worldview>Beliefs, principles, decision framework.</worldview>
  <tensions>Creative paradoxes: "technical but warm",
    "concise but thorough when needed"</tensions>
  <rules>Behavioral rules. Always followed.</rules>
</alma_soul>

The tensions block is worth highlighting. Instead of flat rules, you define paradoxes: "opinionated about code quality, flexible about everything else." This produces more nuanced responses than a list of dos and don'ts. The AI gets permission to be complex.

These blocks are stable — you define them once and they rarely change.

Section 2: alma_style — HOW it communicates

<alma_style>
  <anti_patterns>Things to NEVER do.</anti_patterns>
  <style_guide>Voice, vocabulary, formatting.</style_guide>
  <communication_modes>
    "Debug mode: ask 2-3 questions first"
    "Code review: be direct, say 'change X to Y'"
  </communication_modes>
  <example_interactions>Calibration samples.</example_interactions>
</alma_style>

The anti_patterns block was the single most impactful change. Five lines:

Never start with "Great question!" or "That's interesting!"
Never hedge facts with "I think" or "I believe"
Never add comments to code unless logic is non-obvious
If response starts with an apology, rewrite without it
Never list more than 5 bullet points — synthesize instead

Why this works: Claude is trained on "helpful assistant" patterns. Suppressing specific unwanted patterns is a clearer signal than vague aspirational guidelines. The model knows exactly what not to do.

These blocks evolve slowly as you refine your preferences.

Section 3: alma_context — WHAT it knows about you

<alma_context>
  <user_profile>Facts about you. Auto-updated.</user_profile>
  <active_context>Current projects, focus areas.</active_context>
  <learned_patterns>Patterns discovered from your behavior.</learned_patterns>
  <scratchpad>Working memory for current conversation.</scratchpad>
</alma_context>

This section updates itself. A background processor fires every few messages, analyzes the conversation with a lightweight model, and updates your profile, context, and patterns. No manual maintenance.

Context Assembly

Having 13 blocks is meaningless if they blow the context window. The assembler manages a strict token budget:

Soul blocks always fit — highest priority, never truncated
Ranked memories fill up to 50% of remaining budget, scored by:
- Relevance to current topic: 40%
- Importance: 30%
- Recency (7-day half-life exponential decay): 20%
- Access frequency (log scale): 10%
Episode summaries and procedures fill the remainder
XML-safe truncation — never cuts mid-tag

If budget runs out, context sections drop first, then style. Soul blocks stay intact.
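As an illustration of the "never cuts mid-tag" rule, here is a minimal sketch that backs the cut point up when it would land inside a tag. It assumes any literal "<" in text content is already escaped, and it does not re-close elements left open by the cut. The function name is hypothetical, and this is not necessarily how Alma implements it.

```typescript
// Truncates to at most maxChars without ending inside a "<...>" tag.
function truncateOutsideTags(xml: string, maxChars: number): string {
  if (maxChars <= 0) return "";
  if (xml.length <= maxChars) return xml;
  const cut = xml.slice(0, maxChars);
  const lastOpen = cut.lastIndexOf("<");
  const lastClose = cut.lastIndexOf(">");
  // A "<" with no ">" after it means the cut landed inside a tag: drop the partial tag.
  return lastOpen > lastClose ? cut.slice(0, lastOpen) : cut;
}
```

Cutting "<user_profile>Likes Go</user_profile>" at 25 characters would normally leave a dangling "</u". This version returns "<user_profile>Likes Go" instead.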

What changes over time

Week 1: Feels normal. The system silently builds context.

Week 2: The AI stops asking "what language do you use?" Code matches your conventions. Past decisions get referenced naturally.

Month 1: Cross-conversation connections. "This is similar to the approach you decided against for the auth module." That's when it shifts from tool to collaborator.

Try it

This is the core of Alma (alma.olivares.ai). The Soul Engine is fully available on the free tier — all 13 blocks, fully editable. The value appears around day 14 when enough context has accumulated.

What would your 13 blocks look like?

usulnet v26.2.7 — open-source Docker infrastructure platform

Fran — Mon, 23 Feb 2026 20:32:27 +0000

usulnet is an open-source, self-hosted Docker infrastructure platform. One binary, one web UI — containers, security, backups, reverse proxy, DNS, VPN, monitoring, terminal, file browser, multi-node orchestration. No vendor lock-in, no telemetry, no cloud dependency.

GitHub: github.com/fr4nsys/usulnet
Website: usulnet.com

v26.2.7 is the biggest release yet: 11 new features, 17 bug fixes (several critical), and a complete proxy simplification.

What's New in v26.2.7

Embedded DNS Server

Full authoritative DNS server built into usulnet, powered by miekg/dns (the Go library behind CoreDNS). Runs in-process — no external DNS software to install or manage.

Zone management — Create primary, secondary, and forward zones with full SOA configuration. Serial auto-increments on every record change.
10 record types — A, AAAA, CNAME, MX, TXT, NS, SRV, PTR, CAA, SOA. Per-record TTL and enable/disable toggle.
TSIG keys — Transaction Signature keys for secure zone transfers. Secrets encrypted at rest with AES-256-GCM.
Upstream forwarding — Non-authoritative queries forwarded to configurable upstreams (default: Cloudflare 1.1.1.3 + 1.0.0.3 malware-blocking DNS).
Live statistics — Real-time query counters, zones loaded, server uptime, health check.
Audit logging — Every zone/record/key change logged with user, action, resource, and timestamp.
8 new UI pages — Zone list, create/edit, detail with inline record management, DNS settings, audit log.

DNS Service Discovery

Running Docker containers are automatically registered as DNS records — no manual configuration.

A records: redis.containers.local → container IP. Registered on container start, removed on container stop/die.
SRV records: Exposed ports get _8080._tcp.myapp.containers.local for service discovery by name and port.
Real-time: Docker event stream callbacks — instant registration/deregistration, no polling.
Reconciliation: Periodic full-state sync catches events missed during transient Docker API disconnects.

dns:
  enabled: true
  listen_addr: ":53"
  service_discovery:
    enabled: true
    domain: "containers.local"
    create_srv: true
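To make the naming scheme concrete, here is an illustrative sketch of how record names could be derived from a container. usulnet itself is written in Go; this TypeScript is only a model of the naming rules described above, not its code. The types, the function name, and the zero SRV priority and weight are assumptions.

```typescript
interface ContainerInfo {
  name: string;
  ip: string;
  ports: { port: number; protocol: "tcp" | "udp" }[];
}

interface DnsRecord {
  type: "A" | "SRV";
  name: string;
  value: string;
}

// Builds the A record and, optionally, one SRV record per exposed port.
function recordsFor(container: ContainerInfo, domain: string, createSrv: boolean): DnsRecord[] {
  const host = `${container.name}.${domain}`;
  const records: DnsRecord[] = [{ type: "A", name: host, value: container.ip }];
  if (createSrv) {
    for (const p of container.ports) {
      records.push({
        type: "SRV",
        name: `_${p.port}._${p.protocol}.${host}`,
        // SRV data is "priority weight port target"; 0 0 is an assumed default.
        value: `0 0 ${p.port} ${host}.`,
      });
    }
  }
  return records;
}
```

A container named myapp exposing TCP port 8080 yields myapp.containers.local and _8080._tcp.myapp.containers.local, matching the examples in the list above.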

WireGuard VPN Management

Native WireGuard VPN from the web UI. No CLI, no config file editing.

Create and manage multiple WireGuard interfaces per host
Add peers with auto-generated Curve25519 keys and preshared keys
Client config generation (copy-paste or QR code)
Transfer statistics (rx/tx) per interface and per peer
Post-up/post-down script support for routing rules

Firewall Manager

Visual iptables/nftables management — create, edit, apply, and sync firewall rules from the browser.

Chains: INPUT, OUTPUT, FORWARD, DOCKER-USER
Protocols: TCP, UDP, ICMP, ALL
Actions: ACCEPT, DROP, REJECT, LOG
Audit log: Every rule change recorded with user, action, timestamp, and rule details
Auto-detection: Detects whether the host uses iptables or nftables and applies through the correct backend
One-click sync: Apply individual rules or sync the entire ruleset to the host

SSL Observatory

SSL Labs-style TLS scanner for monitoring certificate health across your infrastructure.

Certificate scanning: Analyzes protocol versions (TLS 1.0–1.3), cipher suites, certificate chains, OCSP stapling, HSTS, and Certificate Transparency logs
Grading: A+ to F letter grades with 0–100 numeric scoring
Dashboard: Grade distribution chart and expiring certificate alerts
Detailed reports: Per-target breakdown with actionable remediation guidance

Backup Verification

Automated backup integrity verification — proving backups are actually restorable, not just present.

Three methods: Extract (unpack and validate), Container (mount and verify), Database (restore to temp instance and query)
Integrity checks: Checksums, file readability, container accessibility, data integrity
Schedulable: Cron expressions for recurring automated verification
History: Full run log with status, method, duration, and error details

Container Image Builder

Build Docker images from Dockerfiles directly in the web UI.

Multi-stage build support
Build arguments and platform targeting
Reusable Dockerfile templates

Automated Rollback

Automatic stack rollback when deployments fail or health checks break.

Configurable rollback policies
Retry limits and cooldown periods
Full execution history

Crontab Manager

Web-based cron job scheduling — create, edit, enable/disable, and execute jobs from the UI.

Three command types: Shell commands (with working directory), Docker exec (target container), HTTP webhooks (GET/POST/PUT/DELETE)
Cron scheduling: Standard 5-field expressions via robfig/cron/v3
Execution history: Every run recorded — status, stdout/stderr, exit code, duration
Run Now: Execute any job immediately, independent of schedule
Auto-cleanup: Records older than 30 days pruned automatically

Interactive Network Topology Graph

The /topology page upgraded from static cards to an interactive D3.js force-directed graph.

Force-directed layout: Networks as rectangles, containers as circles, physics-based positioning
Drag & drop: Rearrange nodes, pin in place
Zoom & pan: Mouse wheel and drag, reset button
Hover highlighting: Hovering a node highlights connections, dims everything else
Click details: Sidebar panel with driver, subnet, state, connections
Color-coded: Networks by driver (bridge=blue, overlay=green), containers by state (running=green, stopped=red)
Fullscreen mode: For large topologies

Container Marketplace (Business)

Curated app marketplace for one-click Docker Compose deployments.

Searchable catalog with category filtering
Featured and verified app badges
User ratings and reviews
Configurable deployment fields
Community app submission

Proxy Simplification: Nginx-Only

Caddy and Nginx Proxy Manager backends have been completely removed — ~6,000 lines of dead code eliminated. Nginx is now the sole reverse proxy backend, always enabled.

New capabilities:

DNS-01 wildcard certificates: *.example.com via Cloudflare DNS API
Docker exec mode: When nginx runs in a container, usulnet uses the Docker API to execute nginx -t and nginx -s reload inside it — no local nginx binary needed
Sidebar search: Compact filter input below the logo, filters navigation in real-time, Escape clears

Already in usulnet

If you're discovering usulnet for the first time, here's what the platform already includes:

Core Docker

Containers: Full lifecycle — create, start, stop, restart, pause, kill, remove. Bulk operations, real-time stats, settings editor, filesystem browser.
Images: Pull, inspect, remove, prune. Docker Hub + private registries. Layer history.
Volumes: CRUD + built-in file browser for volume contents.
Networks: Bridge, overlay, macvlan. Connect/disconnect containers.
Stacks: Docker Compose deployment from YAML, Git repos, or built-in catalog (20 apps).
Docker Swarm: Initialize clusters, manage nodes, scale services, promote/demote, live service logs, rollback.

Security

Trivy scanning: CVE detection with severity classification per container and image
Security scoring: 0-100 composite score per container and across infrastructure
SBOM generation: CycloneDX and SPDX formats
RBAC: 46 granular permissions, custom roles, team-based scoping
2FA/TOTP: Google Authenticator, backup codes, account lockout
LDAP/OIDC: Active Directory, OAuth2 (GitHub, Google, Microsoft)
Audit logging: Every action logged to PostgreSQL with IP, timestamp, details
AES-256-GCM encryption for all secrets at rest

Monitoring & Alerting

Real-time CPU, memory, network, disk metrics per container and per host
Threshold-based alert rules (OK → Pending → Firing → Resolved)
11 notification channels, including Email, Slack, Discord, Telegram, Gotify, ntfy, PagerDuty, Opsgenie, Teams, and Webhook
Docker event stream with filtering
Prometheus /metrics endpoint

Backup & Recovery

Back up containers, volumes, or stacks
Cron-based scheduling with retention policies
S3, MinIO, Azure Blob, GCS, Backblaze B2, SFTP, local
gzip/zstd compression
One-click restore

Multi-Node

Master/agent architecture with NATS + JetStream
Internal PKI with mTLS for agent-master communication
Auto-deploy agents via SSH from the web UI
Gateway routing — API requests auto-route to the correct node

Developer Tools

Terminal: Multi-tab browser terminal (xterm.js) — container exec + host SSH
Monaco Editor: VS Code editor in the browser for container/host files
Neovim: Neovim with lazy.nvim in the browser via WebSocket
File browsers: Container filesystem, host filesystem, SFTP browser
15 developer utilities: Base64, JSON formatter, UUID generator, regex tester, CIDR calculator, JWT decoder, and more
Snippets and command cheat sheet

Connections & Integrations

SSH (password/key auth, tunnels, port forwarding)
RDP/VNC via Guacamole (no client software needed)
Database browser (PostgreSQL, MySQL, MongoDB, Redis, SQLite)
LDAP browser
Git integration (Gitea, GitHub, GitLab — repos, PRs, issues, CI/CD)
Container registry browser (Docker Hub, GHCR, private OCI registries)

Automation

Outgoing webhooks with retry and delivery logs
Auto-deploy on Git push
Runbooks with approval gates
Scheduled jobs UI for all background tasks
Image update detection with batch apply + rollback

Reverse Proxy

Nginx with auto-HTTPS (Let's Encrypt)
HTTP-01 and DNS-01 (wildcard) certificate support
TCP/UDP stream proxying
Docker exec mode for containerized nginx

Operations

Docker daemon configuration (daemon.json) from the web UI — 50+ settings across 6 categories with risk badges
Drift detection (expected vs actual container state)
Change events feed (audit trail of infrastructure changes)
Resource cost optimization (rightsizing recommendations)
Session recording and replay
Operations calendar
Compliance PDF reports (CIS Docker Benchmark)

Tech Stack

Layer	Technology
Language	Go 1.25
Web	Chi v5 router
Templates	Templ (compiled, type-safe)
CSS	Tailwind CSS (standalone CLI, no Node.js)
Frontend	Alpine.js + HTMX
Terminal	xterm.js v5
Editor	Monaco v0.52 + Neovim
DNS	miekg/dns
Database	PostgreSQL 16 (54 migrations)
Cache	Redis 8 (TLS)
Messaging	NATS 2.12 (JetStream)
Auth	JWT + OAuth2/OIDC + LDAP + TOTP
Scanner	Trivy
Binary	~70 MB, no Node.js/Python runtime

Deploy in 60 Seconds

curl -fsSL https://raw.githubusercontent.com/fr4nsys/usulnet/main/deploy/install.sh | sudo bash

Auto-generates all secrets, starts PostgreSQL + Redis + NATS + Nginx + Guacamole. Access at https://your-server:7443 — default login: admin / usulnet.

I built a self-hosted Docker platform in Go

Fran — Mon, 09 Feb 2026 22:13:46 +0000

usulnet — Self-hosted Docker management platform

I've been building usulnet, a self-hosted platform for managing Docker
infrastructure. It's a single Go binary that handles containers, images,
volumes, networks, stacks, security scanning, backups, monitoring,
reverse proxy, SSH/RDP/database connections, and multi-node deployments
— all from one web UI.

Key highlights:
• Single binary (~50 MB), no Node.js or Python dependencies
• Trivy security scanning with CVE detection and scoring
• Multi-node master/agent architecture with NATS + mTLS
• Built-in terminal (xterm.js), code editor (Monaco), Neovim in browser
• 11 notification channels (Slack, Discord, Telegram, PagerDuty, etc.)
• RBAC with 44+ permissions, 2FA, LDAP/OIDC
• Backup & restore to S3/local with cron scheduling
• Reverse proxy management (Caddy + Nginx Proxy Manager)
• Full REST API with OpenAPI 3.0 docs

Tech stack: Go, Chi, Templ, Tailwind CSS, Alpine.js, HTMX, PostgreSQL,
Redis, NATS.

Fast deploy (60 seconds, auto-generated secrets):
curl -fsSL https://raw.githubusercontent.com/fr4nsys/usulnet/main/deploy/install.sh | bash

GitHub: https://github.com/fr4nsys/usulnet
License: AGPL-3.0

This is the first public beta (v26.2.0). It's functional and used in
production, but there may be rough edges. Bug reports and feedback are
very welcome — please open an issue on GitHub.