Forem: Oracle Developers

Oracle AI Agent Memory: A Governed, Unified Memory Core for Enterprise AI Agents

Wojtek Pluta — Mon, 04 May 2026 08:57:04 +0000

This article is syndicated from the original post on blogs.oracle.com. Read the canonical version there for the latest updates.

Recently, Oracle introduced the Oracle AI Agent Memory Python package, a model and framework-agnostic memory solution that gives enterprise AI teams a governed memory core on Oracle AI Database: short-term threads with summaries and context cards, long-term durable memories with vector search, automatic LLM-based memory extraction, and the governance and isolation production agents require.

Oracle Agent Memory: a unified agent memory layer built on Oracle AI Database.

Oracle AI Agent Memory is available now via PyPI as oracleagentmemory and documented in the Oracle Help Center. It is designed to replace the patchwork memory stack most production agents inherit with a single governed memory substrate built on Oracle AI Database.

This is the difference between an agent that is memory-augmented, given a vector store to consult, and one that is memory-aware, responsible for reading from and writing to its own governed, durable state with one enterprise-grade backend.

"Agent memory has shifted from a research curiosity to a production requirement in under two years. Teams shipping serious agentic systems need a backend that handles vectors, structured data, and transactional consistency in one place, not three stitched together. Oracle AI Database is one of the few platforms that delivers all of that natively, which is why we built Hindsight to run on it as a first-class backend."
— Chris Latimer, Co-Founder & CEO of Vectorize

Why Agent Memory, Why Now

The four types of agent memory: working, semantic, episodic, and procedural.

Most agent implementations treat memory as a bolt-on. A vector store consulted at retrieval time, a chat history table glued on beside it, and whatever hand-written extraction logic the team can maintain.

That stack holds together for a demo. It falls apart from the moment an enterprise AI team asks questions that actually matter. Who owns the memory? Where is it governed? How do we isolate tenants? How do we audit what the agent learned, and how do we forget it on request?

Context windows have grown over the years, but no context window is large enough to hold the full state of a long-running agent: weeks of user preferences, accumulated domain knowledge, prior tool outcomes, evolving task state, and the reasoning history that makes each decision defensible.

Agents need memory for the same reasons people do: to hold an active state while working on a problem, to retain facts learned over time, to recall specific past experiences, and to encode behavioral rules and procedures.

A practical taxonomy for agent memory commonly used in agent design covers four types:

Working memory is the active state the agent is reasoning over right now, the running conversation and the scratchpad the model sees at inference time.
Semantic memory is the durable facts and knowledge the agent accumulates about users, entities, and the world: preferences, canonical definitions, structured reference data.
Episodic memory is specific past experiences the agent can recall, what happened on a prior session, what the user asked three weeks ago, how a similar task resolved last time.
Procedural memory is the behavioral rules, guidelines, and learned procedures that shape how the agent acts, how to handle customers, which tools to prefer, what not to do.

These are not four different systems. They are four access patterns over the same underlying state, which is what makes a unified memory core the right architectural answer rather than four bolted-together services.

Oracle AI Database as the Memory Core

Reference architecture for Oracle AI Agent Memory on Oracle AI Database.

Oracle AI Database combines vector similarity search, relational querying, and graph-aware data access in one governed engine, enabling semantic recall alongside precise transactional and relationship-centric retrieval. Combined with Oracle's operational story, backups, replication, high availability, encryption, fine-grained access control, and audit, teams get a path from notebook to regulated production without swapping storage layers, rewriting compliance reviews, or stitching together bespoke isolation logic along the way.

Memory engineering, as a discipline, demands substrate choices that hold up under the access patterns a real enterprise agent actually has: concurrent writes, per-user and per-tenant scoping, full audit, and semantic retrieval at scale.

“Enterprise agents need an agent memory solution with robust security guarantees, strong governance controls, sophisticated workload isolation, as well as deep integration within the enterprise data platform. Oracle AI Agent Memory greatly simplifies building agent memory solutions by consolidating what are usually multiple separate and fragmented services, within the converged database architecture that customers already trust for their most critical data.”
— Tirthankar Lahiri, SVP, Mission-Critical Data and AI Engines, Oracle Database

Production agent memory carries two loads. Developers wire the stack together: vector store, chat log, extraction scripts, isolation logic, governance per piece. Agents reason over the fragments, deciding what to retrieve from where and fitting the relevant world into a finite context window each turn.

Oracle AI Agent Memory lifts both. One governed client replaces the four-service stack, with one set of credentials, one compliance review, and one backup story. Working, semantic, episodic, and procedural memory share one substrate and one retrieval surface, so the model reasons over a coherent view of its state. Summarization and scoped retrieval put the right subset into context at the right moment, freeing the model to spend its reasoning budget on the task rather than memory bookkeeping.

Automatic LLM-based extraction turns conversation into durable memories without hand-rolled prompt chains. Multi-tenant isolation is enforced at the store layer, so a single schema can host multiple deployments without cross-tenant leakage. And because the SDK is framework-agnostic, integrating with LangGraph, Claude Agent SDK, OpenAI Agent SDK, WayFlow, and custom harnesses, teams aren't locked into a single runtime to get the substrate.

Key Benefits For AI Workloads

Configuration: gpt-5.5, reasoning effort xhigh, nomic-embed-v1.5 embeddings, local HNSW index, top-K = 200. X-axis truncated; all categories scored above 88%.

Oracle AI Agent Memory is built for the operational realities of running AI agents in production.

Production-grade recall on long-horizon memory benchmarks. On LongMemEval, the standard academic benchmark for long-context agent memory, Oracle AI Agent Memory scores 93.8% (469 of 500), with the strongest results on the categories that matter most for production agents: 100% on single-session assistant recall, 96% on temporal reasoning, and 95% on knowledge-update tasks. Multi-session recall, the hardest category in the benchmark, lands at 88%. Configuration: OpenAI gpt-5.5 (reasoning effort xhigh), nomic-embed-text-v1.5 embeddings, local HNSW index, top-K = 200.

Bounded per-turn cost as sessions extend. Periodic thread summarization, durable memory extraction, and prompt-time message compaction keep the working context bounded as conversations grow. In an 80-turn scripted conversation, Oracle AI Agent Memory held per-request input around 1,300 tokens for the full run while a flat-history baseline grew linearly past 13,900 — roughly 9.5× more tokens per request by the final turn, and a much steeper bill across the full conversation. Teams shipping long-running agents trade a linear-in-history cost curve for a flat one.

80-turn ChromAtlas-ND scripted conversation · gpt-5.4 (raw OpenAI client, no framework). Token estimate: chars / 4 (notebook convention)

Better answers than a flat-history baseline. A flat-history agent has the entire verbatim conversation in its prompt — every fact ever mentioned, in order. By rights it should be hard to beat on recall. Across the same 80-turn conversation, evaluated by an impartial gpt-5.4 judge on accuracy, completeness, relevance, and coherence, Oracle AI Agent Memory won 48 turns to flat history's 13, with 19 ties: 3.7× more wins despite the baseline's information advantage. A retrieved context card focuses the model on what matters; a sprawling transcript dilutes attention across noise.

80-turn ChromAtlas-ND scripted conversation; judge: gpt-5.4; scored on accuracy, completeness, relevance, coherence

Cost is a tunable knob, not a fixed value. The summarization trigger controls how aggressively the package compacts thread context, and it moves the cost-fidelity trade-off directly. In an 8-query demo conversation (five runs per threshold), a 10,000-token trigger landed at a mean of 121,268 total tokens, about 60% under the 306,823-token flat-history baseline. As the trigger rises, the package compacts less often and preserves more raw context per turn; by a 50–70k trigger, mean total tokens approach or exceed the baseline, and run-to-run variance widens. Teams pick the threshold that matches their answer-quality requirements and lock in the cost envelope they want, rather than accepting whatever curve a fragmented stack produces.

Memory Agent Efficiency vs Summarization Threshold on Demo Conversation. Num queries = 8; num runs per threshold = 5.

One backend, every Python runtime. LangGraph, the Claude Agent SDK, the OpenAI Agents SDK, WayFlow, and custom Python harnesses all instantiate the same OracleAgentMemory client and read and write the same Oracle Database store. Teams running more than one framework no longer rebuild memory per runtime, and migrations between frameworks no longer mean migrating memory.

Primitives for audit and erasure on a single substrate. Every record carries user, agent, thread, and timestamp scoping fields, and the SDK exposes search, list, and per-record delete operations across memories, threads, and messages, so callers can locate records for a subject and remove them on request. Oracle Database's native auditing covers the storage layer underneath. Compliance reviews land on a single substrate (one database with audit, retention, and access controls already in the data plane) rather than four services with four reviews.

One vendor relationship for production agent memory. A single Oracle AI Database instance carries vector search, structured state, JSON document retrieval, transactional consistency, and database-native audit. No second vector database to license, no third service to monitor and scale, no fourth backup pipeline to maintain.

Who Oracle AI Agent Memory Is For

Oracle AI Agent Memory is designed for:

AI Developers and engineers building production agents who need durable short-term and long-term memory in one place, with enterprise security and isolation.
Teams already running Oracle AI Database who want their agents to write to the same governed backend as the rest of the business
Technical leaders evaluating Oracle AI Database for agent memory infrastructure at scale, with compliance and audit requirements

Getting Started

Install the Oracle AI Agent Memory package:

pip install oracleagentmemory

A minimal end-to-end loop in Python looks like this:

from oracleagentmemory import AgentMemory

memory = AgentMemory.from_connection(
    connection_string="...",
    user_id="user_123",
)

# Add conversation turns to a short-term thread
thread_id = memory.create_thread(user_id="user_123")
memory.add_messages(thread_id, messages=[
    {"role": "user", "content": "I prefer vegan meals."},
    {"role": "assistant", "content": "Noted."},
])

# Extract durable long-term memories from the thread
memory.extract_memories(thread_id)

# Scoped search over long-term memory, enforced per-user
results = memory.search(
    user_id="user_123",
    query="dietary preferences",
    limit=5,
)

Code samples are illustrative; the final API surface is documented in the Oracle Help Center.

The quickstart notebook and framework how-to guides are available in the Oracle AI Developer Hub, and the full API reference is available in the Oracle Help Center.

Oracle AI Agent Memory is the first release of a broader commitment to a governed memory substrate enterprise agents need. Memory engineering is still an emerging discipline. The infrastructure behind it should not be.

What Is Agent Memory? A Beginner’s Guide for AI Developers

Anya Summers — Wed, 29 Apr 2026 09:11:11 +0000

TL;DR: Agent memory is stored state an AI agent can retrieve across sessions to maintain continuity. A bigger context window does not fix the problem. Once memory has to persist, be scoped to the right user, and be retrieved reliably, it becomes a data problem, and is often best handled in a database such as Oracle AI Database.

Why This Matters

You can build a convincing AI agent surprisingly fast. Give it a model, wire up a few tools, and it can look sharp in the first session. Then the user comes back the next day. They ask a follow-up question. They refer to a failed attempt from yesterday. They expect the agent to remember that they prefer Python examples and concise answers. Instead, the agent starts from scratch.

That is usually the moment when a demo stops feeling clever and starts feeling flimsy. A lot of beginner guides blur this point. They talk as if a larger context window solves the whole problem. It does not. A larger context window gives the model more room to work during one session. It does not give the system a memory of what happened last week. When the session ends, the context goes with it. Memory is the layer that preserves what matters.

What You'll Learn

What agent memory actually is, and why a bigger context window is not a substitute
The four useful types of agent memory: working, procedural, semantic, and episodic
When memory stops being a prompt trick and starts being infrastructure
How to implement a persistent semantic memory store using LangChain and Oracle AI Database
Common mistakes to avoid and a checklist for building your first memory layer

What Is Agent Memory?

Agent memory is the information an AI agent can carry from one interaction to the next. That information might be a user preference, a summary of an earlier conversation, a previous task result, or facts the system has learned and may need later.

The key point is simple. It is not enough that the model saw the information once. The system needs to be able to bring it back when it matters.

Imagine a user tells an assistant three things today:

they prefer concise answers
they are working in Python
the last attempt failed because their API key had expired

If the assistant can use that information tomorrow without being told again, it has memory. If the user has to repeat all three points, it does not. That is the difference.

Context Window vs Memory: What Is the Difference?

This is the part that trips people up. A context window is the text the model can see right now. That includes the prompt, the recent messages, retrieved documents, tool outputs, and any system instructions passed into the current call. It is the model's live working space. Memory is different. Memory is stored state the system can recover later.

The simplest analogy is this:

the context window is the desk
memory is the filing cabinet

A bigger desk is useful. You can spread out more notes and hold more detail in front of the model. But the desk gets cleared. The filing cabinet is what lets you come back tomorrow, open the right folder, and pick up where you left off.

It also helps to separate memory from Retrieval Augmented Generation (RAG). RAG brings in external knowledge (e.g.,company PDFs) so the model can answer a question with better grounding. Memory, by contrast, preserves useful state from previous interactions. One helps the agent know more in the moment; the other helps it behave with continuity over time.

In practice, the strongest systems usually use all three layers:

context window for active reasoning
retrieval for outside knowledge
memory for continuity across sessions

A simple scenario makes the difference concrete.

Monday: The user says, "I am learning Python and I prefer short answers." The agent helps them debug a script and the session ends.

Tuesday: The user returns and asks, "Can you help me sort a list?"

Without memory, the agent gives a long answer in whichever language it guesses. With memory, the agent retrieves the user's preference, responds in Python, and keeps the answer concise. Same model.

Same prompt. Different system around it.

The Four Types of Agent Memory

A simple way to understand agent memory is to borrow a rough model from human memory. It's not perfect, but it's useful.

Memory type	Simple meaning	Example in an agent
Working memory	What the agent is handling right now	Current messages, tool outputs, temporary reasoning state
Procedural memory	How the agent does things	Instructions, workflows, and tool-use rules
Semantic memory	Facts the agent has learned	User preferences, saved facts, product knowledge
Episodic memory	Specific past events	Previous sessions, task history, and failed attempts

This framework matters because not every agent needs the same mix. A customer support agent may need semantic memory for customer preferences and episodic memory for past tickets. A coding agent may care more about procedural memory for workflows and semantic memory for project conventions. A one-shot Q&A bot may not need much memory at all.

That's worth keeping in mind: "add memory" is not a universal requirement. It only makes sense when continuity actually improves the experience or the outcome.

When Does Agent Memory Become a Data Problem?

The moment you want memory to persist, you're no longer just writing prompts. You are making storage and retrieval decisions.

You need to decide:

what is worth storing
what should be ignored
how memories are tied to the right user
how old or stale memories get updated or removed
how the system finds the right memory at the right time

That is where many first agent builds get messy. Saving text is easy. Bringing back the right memory for the right user, in the right context, without pulling in noise, is the hard part. Once memory has to persist and be searchable, structure matters. You typically need metadata such as user_id, memory_type, timestamps, and maybe expiry rules. You also need a retrieval strategy that avoids surfacing irrelevant or outdated information.

Persistence also introduces governance concerns. As soon as an agent is storing anything that can be traced to a person, you are dealing with personally identifiable information, and every mature system needs answers to a small set of questions. What personal data is being stored? How long is it kept? How does a user request deletion, and can the system actually honour that request?

Building those answers in from day one is much easier than retrofitting them after the first audit or data subject request. Governance lives best as code and schemas, not as a Confluence page somebody hopes to find later.

This is why databases appear so quickly in serious agent systems. Prompts are temporary. Memory needs storage, filtering, and lifecycle rules. If you are storing embeddings, scoping memories to a user, and reusing them later, you are designing a small data system whether you planned to or not.

That sounds heavier than it is. You do not need a giant memory platform on day one. But you do need to stop thinking of memory as "extra text for the prompt". It is a system component. Without structure and filtering, memory quickly turns into noisy context that reduces answer quality instead of improving it.

Architecture Overview

At a high level, a production-ready agent memory system has three layers working together: a context window for active reasoning, a retrieval layer for outside knowledge (RAG), and a persistent memory store for continuity across sessions. The memory store is where platforms like Oracle AI Database provide the most value.

Oracle AI Database is a strong fit for production memory systems for three reasons:

Durability: Memory is only valuable if it survives restarts, deployments, and the kind of quiet infrastructure changes that happen in any real engineering environment.
Metadata-driven filtering: Storing vectors next to structured columns like user_id, tenant_id, memory_type, and created_at means retrieval can be scoped cleanly without building a second database to hold the filters.
Lifecycle control: Expiry, archival, soft-delete, and audit trails are problems databases have been solving for decades, and memory needs all of them. Running vector search in the same database that already holds the relational and governance layer removes a whole category of synchronisation bugs that would otherwise appear on week three.

Prerequisites

Python 3.9 or later
Access to an Oracle AI Database 26ai instance (Autonomous Database, container, or local install)
An embedding model configured and callable from your environment
Basic familiarity with LangChain concepts (vector stores, retrievers)

Step-by-Step Guide: A Simple Memory Layer with LangChain and Oracle

The goal here is not to build a huge platform. It is to make the pattern concrete: save a useful memory, attach metadata, and retrieve it later when it becomes relevant.

Step 1: Install the Packages

Install the LangChain Oracle integration along with the Oracle Python driver and LangChain core.

pip install langchain-oracledb oracledb langchain-core

Step 2: Connect to a Persistent Store

Open a connection to Oracle and wrap it in a LangChain OracleVS vector store. This example assumes you already have an embedding model configured. The broad idea matters more than the exact class names — the agent now has somewhere durable to store semantic memory outside the prompt.

import oracledb
from langchain_oracledb.vectorstores import OracleVS

connection = oracledb.connect(
    user="agent_user",
    password="password",
    dsn="hostname:port/service"
)

memory_store = OracleVS(
    client=connection,
    embedding_function=embeddings,
    table_name="AGENT_MEMORY"
)

Step 3: Store a Memory and Retrieve It Later

Save something worth remembering, attach metadata so it stays scoped correctly, and retrieve it when the next interaction needs it. That is the core loop.

memory_store.add_texts(
    texts=["User prefers concise answers and Python examples."],
    metadatas=[{"user_id": "user_123", "memory_type": "preference"}]
)

results = memory_store.similarity_search(
    "How should I answer this user?",
    k=3,
    filter={"user_id": "user_123"}
)

If you only remember one design lesson from this section, make it this: memory quality depends less on storing more information and more on retrieving the right information cleanly.

When Do You Actually Need Agent Memory?

Not every agent needs memory. This is where it is easy to overbuild. You probably need memory when:

the same user comes back repeatedly
the agent needs to remember preferences or previous decisions
tasks span multiple sessions
the system improves when it learns from earlier outcomes

You may not need much memory when:

the task is one-off question answering
document retrieval is enough
users are unlikely to return
continuity adds more complexity than value

A lot of first agent projects do not need a big memory layer. They need a clear use case and a small amount of well-scoped memory. That's usually a better place to start.

Validation & Troubleshooting

Retrieval returns noise: If results look bad with ten stored memories, they will be worse at ten thousand. Validate retrieval quality early by running a handful of realistic queries and inspecting what comes back.
Wrong user's memories appear: Every similarity_search call should include a filter on user_id. If you see cross-user leakage, check that metadata is written on every add_texts call.
Stale memories resurface: Add a created_at timestamp and define lifecycle rules. Preferences change. Facts expire. If the system never updates or retires old memories, it will eventually return stale context.
Connection errors to Oracle: Verify your DSN string matches your service name, and that the oracledb driver can reach the host. Autonomous Database users should confirm their wallet configuration.
Everything gets saved as a "memory": That sounds safe, but it usually creates noise. Decide upfront what qualifies as a memory worth storing.

Common Mistakes to Avoid

A few mistakes show up again and again.

First, people treat a larger context window as if it solves memory. It helps within a single session. It does not create continuity on its own.
Second, they save everything. That sounds safe, but it usually creates noise. If every past detail becomes a "memory", retrieval quality drops fast.
Third, they skip metadata. Without fields like user_id, memory_type, or timestamps, the system has no reliable way to determine which memory belongs to whom or whether it is still relevant.
Fourth, they forget memory lifecycle. Preferences change. Facts expire. Previous failures become irrelevant. If the system never updates or retires old memories, it will eventually return stale context.

Finally, some teams add memory before they have proved they need it. That is backwards. Start with the user problem. Then decide whether continuity genuinely improves the product.

Key Takeaways

Agent memory is stored state an agent retrieves across sessions to maintain continuity. A context window helps with the current interaction; memory enables continuity over time.
Four useful memory types are working, procedural, semantic, and episodic. Not every agent needs the same mix.
As soon as memory must persist, be scoped to the right user, and be retrieved reliably, you are dealing with a data problem.
Start with one memory type to begin with. Semantic memory for user preferences is usually the highest-value entry point.
Scope every memory by user_id, and include memory_type and a timestamp from the first write.
Oracle AI Database 26ai fits well here by combining durable storage, metadata-driven filtering, and lifecycle control in the same system that already holds your relational and governance layer.
Building an agent is easy. Keeping one alive in production is where memory stops being a prompt trick and bedomes infrastructure.

Frequently Asked Questions

What is the difference between a context window and agent memory?
A context window is the information the model can see during the current interaction. Agent memory is information the system stores and can bring back in future interactions.

What are the main types of agent memory?
A simple framework uses four types: working, procedural, semantic, and episodic. In practice, most agent builds care most about semantic memory for facts and preferences, and episodic memory for past interactions.

Do all AI agents need memory?
No. Some agents only answer one-off questions and do fine with a prompt plus retrieval. Memory becomes useful when continuity across sessions actually improves the result.

Can a vector database be used for agent memory?
Yes. A vector database or vector-capable store can work well for semantic memory, especially when you need similarity search. It still needs metadata and retrieval rules, otherwise it turns into a pile of loosely relevant text.

When should I use Oracle AI Database 26ai for agent memory?
Use it when you need durable storage, vector similarity search, and metadata-driven filtering in the same system. It is especially valuable when your application already has a relational and governance layer, because running vector search alongside it removes a whole category of synchronisation bugs.

What are the limitations?
Memory architectures are still largely bespoke. There is no clean default answer for when to summarise versus store verbatim, or how to balance recall against retrieval noise as the store grows. Eviction, forgetting, and evaluation of memory systems are all genuinely open problems. Start small, instrument retrieval, and treat the design as something you will revise.

Next Steps

Unified Memory Core for AI Agents

Anya Summers — Mon, 27 Apr 2026 16:05:40 +0000

A practical guide to building episodic, lexical, vector, and graph memory workflows in Oracle AI Database

Companion notebook: Unified Agent Memory with Oracle AI Database

Key takeaways

A unified memory core combines episodic, lexical, semantic, and relationship-aware retrieval in one governed platform.
Hybrid retrieval (Oracle Text + vector + metadata filters) improves reliability in enterprise queries.
GRAPH_TABLE adds business relationship context beyond nearest-neighbor similarity.
DBMS_SCHEDULER and VPD patterns make memory lifecycle and security operational.
The companion notebook demonstrates all core patterns in a runnable workflow.

What agent memory is and why it matters

Agent memory is the stored context an AI system can access across steps, sessions, or workflows. In practice, it supports several critical functions:

State persistence – remembering what the agent is currently doing

Context continuity – carrying forward prior user goals and constraints

Knowledge retrieval – finding facts, documents, and learned abstractions

Workflow resilience – resuming long-running tasks after delays or failures

Personalization – adapting behavior based on historical interactions

This matters because modern agents are not just answering isolated questions. They are coordinating tools, operating over enterprise systems, and producing outputs that depend on both immediate context and historical knowledge.

At a high level, memory is what allows an agent to behave less like a stateless API and more like a system that can learn and adapt over time.

You'll learn how to

model episodic memory with JSON and query it using SQL/JSON.
run lexical retrieval with Oracle Text and semantic retrieval with vectors.
combine lexical, semantic, and metadata constraints into hybrid retrieval.
traverse user-ticket-document context with SQL Property Graph and GRAPH_TABLE.
apply lifecycle and tenant-aware security patterns for governed agent memory.

Architecture overview

The unified memory flow keeps ingestion, storage, retrieval, and governance inside Oracle AI Database. Episodic events are stored in JSON, reusable knowledge is retrieved with Oracle Text and vectors, relationship context is traversed with SQL Property Graph, and lifecycle/security controls are enforced with scheduler and VPD patterns.

Prerequisites

Python 3.10+
Oracle AI Database 26ai (or compatible environment)
Dependencies: oracledb, python-dotenv, pandas, optional langchain-core
Privileges for tables, indexes, SQL/JSON, and queries
Optional privileges for Oracle Text and SQL Property Graph features

Types of agent memory and their storage needs

Not all memory behaves the same way. Different memory types have different latency, durability, and retrieval requirements.

Short-term vs. long-term memory

Short-term memory is the agent's working context. It typically includes the current conversation window, recent tool outputs, temporary plans, and session variables. It requires very low latency but does not need to be persisted indefinitely.

Long-term memory persists beyond a single interaction. It may include user preferences, completed tasks, conversation summaries, business objects, knowledge artifacts, and execution history. It should be durable, searchable, and governed.

Episodic memory

Episodic memory stores events and experiences. For agents, that can mean:

previous conversations
tool calls and outputs
actions taken on behalf of a user
workflow checkpoints
timestamps, actors, and outcome metadata

This memory is usually time-oriented and benefits from structured metadata, durable storage, and filtering by user, task, tenant, or date range.

Example: episodic memory as JSON documents

A practical pattern is to store each conversation turn, tool invocation, or workflow checkpoint as a JSON document and query it with SQL/JSON functions:

CREATE TABLE agent_events (
    event_id    NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    session_id  VARCHAR2(100) NOT NULL,
    event_data  JSON NOT NULL,
    created_at  TIMESTAMP DEFAULT SYSTIMESTAMP
);

SELECT e.session_id,
       JSON_VALUE(e.event_data, '$.type') AS event_type,
       jt.tool_name,
       jt.latency_ms
FROM   agent_events e,
       JSON_TABLE(e.event_data, '$'
         COLUMNS (
           tool_name  VARCHAR2(100) PATH '$.tool.name',
           latency_ms NUMBER        PATH '$.tool.latencyMs'
         )
       ) jt
WHERE  JSON_VALUE(e.event_data, '$.type') = 'tool_call';

This pattern is especially useful when an agent needs durable session history without flattening every attribute into separate columns on day one.

Semantic memory

Semantic memory stores generalized knowledge rather than a raw event log. It includes facts, policies, product information, documentation, ontologies, embeddings, and derived knowledge the agent can reuse across tasks.

This memory often benefits from a combination of:

vector search for semantic similarity
keyword/text search for exact terminology and domain-specific phrases
relational filters for governance, freshness, and access constraints
graph relationships for connected business meaning

Example: semantic retrieval with Oracle Text

Why does lexical retrieval still matter in a vector-first architecture? An agent can use CONTAINS to rank policy, support, or product documents by relevance and combine that with vector search in the surrounding workflow:

SELECT article_id,
       title,
       SCORE(1) AS relevance
FROM   knowledge_articles
WHERE  CONTAINS(content, 'database performance', 1) > 0
ORDER  BY relevance DESC
FETCH FIRST 10 ROWS ONLY;

For short text catalogs, Oracle Text's CTXCAT model is also a strong fit when agents need keyword matching plus structured filters such as product family, severity, or tenant.

Matching memory types to storage technologies

The key design principle is simple: use multiple memory types, but avoid fragmenting them across too many disconnected systems unless you truly need to.

Storage technologies for agent memory

In-memory storage: pros, cons, and use cases

This type of storage is well suited for fast, transient state.

Pros

extremely low latency
simple fit for session state and active workflow context
useful for intermediate reasoning artifacts and recent tool results

Cons

limited durability
not suitable as the system of record
difficult to govern and audit if used alone

Best use cases

active conversation buffers
current plan state for an orchestrator
short-lived coordination across steps in a single run

File-based storage: when and why to use it

Files and object storage are useful for large, unstructured artifacts such as:

PDFs and reports
images and media
transcript archives
exported workflow bundles

They work well when the artifact itself is large, rarely updated, or naturally belongs in a document repository. However, file-based storage alone is a weak memory layer because it lacks rich query semantics. In practice, it works best when paired with database metadata, vector indexes, or a catalog layer.

Databases: SQL, NoSQL, and key-value stores

Databases are the backbone of durable agent memory.

SQL databases excel when agents need strong consistency, joins, transactions, governance, and structured filters.
NoSQL/document stores are useful when schemas evolve quickly and payloads are semi-structured.
Key-value stores are effective for simple lookups, caching, and session persistence at high speed.

For enterprise agents, the strongest pattern is often not choosing one memory store per memory type, but choosing a platform that can support multiple memory representations together.

Vector databases for semantic memory retrieval

Vector retrieval is essential when an agent must find content by meaning rather than exact wording. It is especially effective for:

semantic search over documents
similarity matching for prior cases
memory recall from summarized or embedded interactions
grounding RAG workflows with relevant context

But vector search should not be treated as the entire memory architecture. Enterprise retrieval often requires semantic matching plus exact filtering, freshness rules, tenant boundaries, business keys, and joins to live data.

That is where Oracle AI Database stands out as a unified memory core. It brings vector search next to relational data, JSON, graph, spatial, and enterprise governance features, allowing agents to retrieve semantically relevant context without losing operational control.

Why text search belongs in the unified memory core

A significant portion of production searches need both vector similarity and keyword matching. Users do not always ask only by meaning. They often include:

exact product names
policy clauses
ticket IDs
account numbers
legal terms
error messages and codes

That is why a mature memory core should include text search alongside vector, graph, spatial, JSON, and relational capabilities. Semantic retrieval helps with meaning; keyword retrieval helps with precision. Together they produce more reliable enterprise context.

Hybrid storage architectures and patterns

Most serious agent systems use a hybrid pattern, such as:

in-memory working context for active sessions
relational/JSON persistence for durable state and episodic history
vector indexes for semantic recall
text search for lexical precision
object storage for large source artifacts
graph structures where relationships are central to reasoning

The question is not whether hybrid memory exists—it almost always does. The real design decision is whether those layers are operationally fragmented or organized around a unified platform.

Example: relationship-aware memory with SQL Property Graph

Oracle SQL Property Graph lets you model and query graph data — vertices (nodes) and edges (relationships) — directly on top of existing relational tables, views, materialized views, or external tables. No data is copied; the graph definition stores only metadata, and queries operate against current table data. This is useful when an agent must follow connected context such as user → ticket → service → document:

SELECT *
FROM GRAPH_TABLE (memory_graph
  MATCH (u IS user) -[r IS opened]-> (t IS ticket) -[m IS mentions]-> (d IS document)
  COLUMNS (
    u.name  AS user_name,
    t.title AS ticket_title,
    d.title AS document_title
  )
);

That kind of retrieval complements vector similarity by surfacing the business relationships around a memory item, not just the nearest semantic neighbors.

Scaling agent memory for large applications

Externalizing memory from models

Large language models should not be expected to carry all relevant context in their parameters or prompt window. As applications grow, memory must be externalized into governed stores that can be queried on demand.

This improves:

freshness of retrieved context
controllability of business logic
auditability and compliance
reuse across workflows and teams
cost efficiency versus oversizing prompts

Hierarchical and tiered memory layers

At scale, memory is usually tiered:

Hot memory – immediate session context and recent tool outputs
Warm memory – summaries, recent episodes, and active task state
Cold memory – historical records, archived artifacts, and long-term facts

This layered approach keeps latency manageable while retaining historical depth.

Retrieval-augmented generation (RAG) techniques

RAG is the most common pattern for grounding an agent with external memory. Strong RAG systems typically combine:

semantic retrieval over embeddings
lexical retrieval for exact terms
metadata filtering for tenant, time, trust, or policy
reranking for relevance and precision
source attribution for traceability

In most real-world systems, no single retrieval method is enough on its own. In enterprise settings, hybrid retrieval matters. Many useful searches depend on both meaning and exact terminology, so text search is not a legacy feature to bolt on later. It is an important part of the unified memory core.

Retrieval evaluation metrics

To validate retrieval quality rigorously, hybrid memory systems should be measured with explicit ranking and coverage metrics, not only by subjective answer quality.

Precision@K: how many of the top-K retrieved results are relevant.
Recall@K: how much of the relevant context is recovered within top-K results.
MRR (Mean Reciprocal Rank): how early the first relevant result appears in the ranked list.
Latency (p50/p95): retrieval responsiveness under realistic concurrency and tenant load.

In practice, these metrics should be tracked per retrieval mode (lexical, semantic, hybrid, graph-augmented) and per query class (policy, troubleshooting, identity, compliance) to detect ranking drift early and keep retrieval behavior stable over time.

Memory management: summarization, pruning, and lifecycle policies

Memory is not just about storing more. It is also about deciding what to keep, compress, expire, and promote.

Important practices include:

summarization to condense long histories into reusable state
pruning to remove redundant or low-value context
retention policies based on legal, business, and product rules
promotion rules to move temporary knowledge into durable memory
staleness checks so outdated facts do not keep reappearing

Example: scheduled summarization and pruning

DBMS_SCHEDULER is Oracle's enterprise job scheduling framework. It provides a rich feature set for scheduling PL/SQL code, stored procedures, executables, and scripts — either on a time-based calendar expression, in response to external events, or as part of dependency chains. The DBMS_SCHEDULER maps naturally to memory lifecycle automation. Teams can schedule summarization, retention enforcement, and archive workflows directly in the database:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SUMMARIZE_AGENT_SESSIONS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN memory_pkg.summarize_old_sessions(30); END;',
    repeat_interval => 'FREQ=HOURLY;INTERVAL=6',
    enabled         => TRUE,
    auto_drop       => FALSE
  );
END;
/

The same scheduling pattern can also keep search infrastructure fresh, for example by running CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX jobs for Oracle Text indexes on a predictable cadence.

Infrastructure considerations for scalability

As memory volume grows, architects should consider:

partitioning by tenant, time, or workload
indexing strategies for vector, text, and relational access
concurrency and transaction isolation for multi-agent workflows
cost of re-embedding and re-ranking pipelines
observability for recall quality, latency, and drift

Oracle AI Database is compelling here because it supports enterprise-grade scalability while keeping multiple data modalities close together. That reduces the coordination overhead of moving context between independent systems.

Operations runbook for memory systems

To keep unified memory reliable in production, teams should operationalize a lightweight runbook that covers indexing, partitioning, scheduler cadence, and observability.

Indexing: monitor Oracle Text and vector index health, and schedule regular sync/optimize routines.
Partitioning: partition event and knowledge tables by tenant and/or time windows to control growth and query cost.
Scheduler cadence: run summarization, pruning, retention, and index-maintenance jobs on predictable intervals.
Observability: track retrieval latency (p50/p95), result quality metrics, and drift signals across retrieval modes.
Operational review: review failed jobs, low-confidence retrieval patterns, and tenant hot spots on a recurring schedule.

A compact runbook helps maintain retrieval quality, governance compliance, and performance consistency as memory volume and workload complexity increase.

Implementation walkthrough

This implementation builds a unified memory flow in Oracle AI Database, from event storage to multi-mode retrieval and governance.

Initialize the environment and establish one Oracle connection reused across the workflow.
Create episodic memory (agent_events) with realistic JSON events for user messages, tool calls, tool results, checkpoints, and summaries.
Query episodic memory with SQL/JSON (JSON_VALUE, JSON_TABLE) for filtering, extraction, and analytics.
Apply tenant-aware retrieval patterns so every recall path remains policy-aligned.
Create and populate the knowledge store (knowledge_articles) with tenant-scoped support and policy content.
Run lexical retrieval with Oracle Text (CONTAINS, SCORE) for exact-term precision.
Add semantic retrieval with vectors and rank results by similarity using VECTOR_DISTANCE.
Combine lexical, semantic, and metadata constraints into hybrid retrieval ranking.
Execute unified recall by combining latest episodic context with knowledge retrieval candidates.
Add relationship-aware retrieval with SQL Property Graph and GRAPH_TABLE over user-ticket-document paths.
Apply lifecycle automation and security patterns using DBMS_SCHEDULER and DBMS_RLS.
Validate outputs end to end and keep the workflow rerunnable with cleanup.

How this guide maps to the notebook

To make the guide easier to navigate, each concept in this article is implemented in a corresponding notebook section.

Episodic memory is introduced through the agent_events model and sample event ingestion, then expanded with SQL/JSON extraction using JSON_VALUE and JSON_TABLE.

Tenant-aware retrieval and knowledge storage follow, along with lexical retrieval using Oracle Text and semantic retrieval using vectors (VECTOR_DISTANCE).

Hybrid retrieval then combines lexical relevance, semantic distance, and metadata filtering in one ranking path.

The workflow continues with unified episodic-plus-knowledge recall, relationship-aware traversal using GRAPH_TABLE, and operational patterns for lifecycle automation (DBMS_SCHEDULER) and row-level tenant isolation (DBMS_RLS / VPD).

The notebook concludes with an optional LangChain interoperability layer that keeps retrieval Oracle-native.

Security and privacy considerations in agent memory storage

Memory makes agents more capable, but it also expands the attack surface.

Common security risks and attack vectors

data leakage across users, roles, or tenants
prompt injection through retrieved content
memory poisoning from incorrect or malicious inputs
over-retention of sensitive information
stale or conflicting memory causing unsafe decisions
weak authorization around recalled context and tool actions

Data protection and access control strategies

Best practices include:

role-based and attribute-based access control
encryption in transit and at rest
row-level or tenant-aware data isolation
retrieval filters tied to identity and policy
audit logs for memory access and mutation events
data classification for sensitive memory types

Operational security checklist

To translate security principles into day-to-day practice, teams should validate a compact operational checklist for every memory workflow.

Enforce tenant isolation in every retrieval path, not only at the application layer.
Log memory reads and writes for auditability, including tool-triggered retrieval actions.
Apply data classification and PII handling rules before memory is persisted or retrieved.
Use role-aware authorization checks for both retrieval and mutation operations.
Define retention and deletion controls so sensitive memory does not persist beyond policy windows.
Protect retrieved context against prompt-injection and memory-poisoning propagation.

This checklist helps ensure that memory quality, security, and compliance remain aligned as agent usage scales.

Example: tenant-aware memory access with VPD

Virtual Private Database (VPD), also called Fine-Grained Access Control (FGAC), is Oracle's mechanism for enforcing row-level security transparently at the database kernel level. Unlike application-layer filtering — which can be bypassed by ad-hoc queries, ETL tools, or reporting applications — VPD policies are enforced by the Oracle query engine itself, regardless of how a query reaches the database.. The Row-Level Security can enforce tenant isolation directly in the database with VPD (DBMS_RLS):

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'AGENT_MEMORY',
    policy_name     => 'TENANT_ISOLATION',
    function_schema => 'APP',
    policy_function => 'TENANT_ISOLATION_POLICY',
    statement_types => 'SELECT, INSERT, UPDATE, DELETE',
    update_check    => TRUE
  );
END;
/

With SYS_CONTEXT-driven predicates, the same memory tables can serve many tenants while ensuring an agent only recalls context it is authorized to access.

Mitigation best practices and compliance

Organizations should design agent memory with compliance in mind from the start. That means applying retention rules, provenance tracking, redaction strategies, and approval workflows where needed. It also means ensuring the retrieval layer does not bypass the same governance standards applied to transactional systems.

This is another reason a unified, governed database platform is attractive: it allows memory retrieval to inherit mature enterprise security controls instead of recreating them separately for every store.

Recap of memory types and storage options

Short-term memory supports immediate task execution and should be fast.
Long-term memory preserves durable context across sessions and workflows.
Episodic memory captures what happened, when, and under what conditions.
Semantic memory helps the agent retrieve meaning, facts, and abstractions.

No single storage mechanism solves every memory problem. In-memory caches, files, relational stores, key-value systems, text indexes, graph structures, and vector search all have a role.

Guidelines for choosing and managing agent memory

Use the following rules of thumb:

Match storage to memory behavior, not just data format.
Keep working memory fast, but make durable memory governed and auditable.
Combine semantic retrieval with keyword and metadata filtering.
Treat lifecycle management as part of memory design, not an afterthought.
Prefer unified platforms when governance, consistency, and scale matter.

Balancing performance, scalability, and security

The best agent memory architectures do not optimize only for retrieval quality. They balance:

performance for responsive agent interactions
scalability for growing users, tasks, and data volumes
security for enterprise trust and compliance

Oracle AI Database fits this balance especially well for enterprise agents. It provides one of the most sophisticated and scalable ways to unify vector, JSON, graph, spatial, relational, and analytic data access in a governed platform. That makes it a strong foundation for a true unified memory core rather than a collection of disconnected memory services.

When memory becomes a first-class architectural concern, agents become more reliable, more context-aware, and more useful in real business workflows.

Validation & troubleshooting: failure modes and fallback strategy

Production memory systems should not assume every retrieval mode is always available. A resilient workflow defines deterministic fallback behavior so the agent can continue safely and predictably.

In this architecture, retrieval degrades gracefully: if lexical retrieval is unavailable or returns weak matches, the workflow can fall back to semantic retrieval; if vector retrieval is unavailable, lexical retrieval and tenant-scoped filtering remain active; if graph traversal is unavailable, relationship context can be reconstructed with relational joins; and if all retrieval modes return low-confidence results, the system should return a safe tenant-scoped fallback response and request clarification.

This fallback strategy preserves continuity, improves user trust, and prevents silent retrieval failure in enterprise workflows.

Validate Oracle Text index creation and lexical ranking output.
Validate vector dimensions and input format used by TO_VECTOR / VECTOR_DISTANCE.
If GRAPH_TABLE parsing fails, avoid reserved labels and use names like user_v, ticket_v, document_v.
If results are empty, verify tenant/category filters and fallback query paths.
Run the notebook end-to-end and confirm outputs across episodic, lexical, vector, hybrid, and graph sections.

Frequently Asked Questions

Why not use vector search alone?
Enterprise queries often contain exact policy/product terms and IDs, so hybrid retrieval is usually more reliable.

What does unified memory mean in practice?
It means episodic, lexical, semantic, and relationship-aware retrieval are handled in one governed database workflow.

What happens if one retrieval mode is unavailable?
The workflow can use fallback paths (lexical, semantic, or relational fallback) to preserve continuity and safe defaults.

How does this relate to the companion notebook?
The notebook implements each pattern as executable steps so readers can validate outputs end-to-end.

Agent Memory with LangChain4j and Oracle AI Database

Anders Swanson — Wed, 22 Apr 2026 17:22:17 +0000

One of the quickest ways to make an impressive agent demo is to prepare a clever prompt. One of the quickest ways to make that same agent fall apart in production is to give it no durable memory.

In this article, we'll build a small, memory-backed assistant with LangChain4j and Oracle AI Database. The assistant can search prior incidents, runbooks, decisions, and shift handoffs to answer questions. It can write new memories back to the database so they become searchable in any session. Additionally, all user, agent, and tool messages are logged to database table for observability and auditing.

Database feature overview
Run the sample
Chat Memory vs Durable Memory
Hybrid retrieval: semantic + full-text search
Lightweight reranking
LangChain4j agent
Memory writeback
Recording user, agent, and tool messages
Why database memory is useful for agents
Code pointers
Where you can take this next

Database feature overview

The agent is built with modern Oracle AI Database features:

persistent JSON memory documents in Oracle AI Database
vector embeddings in a VECTOR column
Oracle Text search over the same JSON document
hybrid ranking that blends semantic and exact-match retrieval
append-only transcript logging by conversation ID

Using these features, the agent (a fictional operations assistant) can answer question about runbooks, incident reviews, change requests, and shift handoffs from its persistent memory. Because the memory is database backed, multiple agents from concurrent sessions may access the same data safely.

Run the sample

You will need Java 21+, Maven, Docker, and an OpenAI API Key.

From the module root, run the tests:

export OPENAI_API_KEY=<your key>
mvn test

To run the live terminal app using your database connection string and user:

export OPENAI_API_KEY=<your key>
mvn compile exec:java \
  -Dexec.args="jdbc:oracle:thin:@localhost:1521/freepdb1 testuser testpwd"

Once it starts, try prompts like:

What happened during the checkout incident after CHG2145?
Which runbook section should I use for the checkout rollback?
Draft a next-shift handoff and remember it.

Chat Memory vs Durable Memory

Chat memory and durable memory solve different problems. Operational memory has different requirements:

it should survive process restarts
it should be queryable across conversations from distributed, concurrent agents
it should support structured metadata like service, environment, incident ID, and change ticket
it should be searchable both semantically and exactly
it should allow writeback when the agent learns something worth preserving

That starts to look a lot more like a database problem than a prompt engineering problem.

Hybrid retrieval: semantic + full-text search

The MemoryRepository runs two queries, which are fused into one ranked list:

Vector search over the embedding column using cosine distance.
Oracle Text search over the JSON payload using json_textcontains.

Here is the vector query:

select id,
       memory_kind,
       title,
       memory_doc,
       (1 - vector_distance(embedding, ?, COSINE)) as vector_score
from agent_memories
order by vector_score desc, id
fetch first ? rows only

And here is the text query:

select id,
       memory_kind,
       title,
       memory_doc,
       score(1) as text_score
from agent_memories
where json_textcontains(memory_doc, '$', ?, 1)
order by score(1) desc, id
fetch first ? rows only

Pure vector search is often too fuzzy for ticket IDs. Pure text search is often too brittle for paraphrases. Hybrid retrieval handles both.

Lightweight reranking

Once both branches return hits, MemorySearchRanker merges the results with deterministic weights:

a bonus when the incident ID or change ticket matches directly
a bonus for keyword overlap in the indexed memory text
a combined matchedBy indicator of VECTOR, TEXT, or BOTH

The deterministic ranker could be implemented by an LLM judge or a more complex re-ranking system. For this sample, I kept it intentionally lightweight and low-latency.

LangChain4j agent

The LangChain4j agent implementation is quite small, using a single interface:

public interface OpsMemoryAssistant {
    @SystemMessage("""
            You are an operations handoff assistant backed by Oracle AI Database memory.
            Use searchMemories when prior incidents, runbooks, handoffs, decisions, or change history are relevant.
            When you rely on memory results, include the references in the form [M123].
            If the user asks you to remember or preserve a new handoff or decision, call storeMemory after drafting it.
            Keep answers concise and operational. Mention incident IDs and change tickets when they matter.
            """)
    @UserMessage("{{message}}")
    String chat(@V("message") String userMessage);
}

That is the right level of abstraction for this sample.

LangChain4j handles chat orchestration and tool wiring. Oracle AI Database handles durable memory, search, and transcript persistence. Each layer is doing the job it is actually good at.

Memory writeback

The sample keeps two memory stores:

a curated durable memory store for retrieval
an append-only transcript for observability and auditing

This one also stores new durable memory through the storeMemory tool when the user explicitly asks the assistant to preserve a handoff or decision.

That matters because an agent memory system should not just be a read-only archive. If a useful conclusion comes out of a conversation, the system should be able to keep it.

In this sample, writeback creates a new MemoryDocument, generates an embedding, and inserts both the JSON payload and vector into agent_memories. Because the JSON search index is configured with sync (on commit), newly stored handoffs are searchable immediately after commit.

That last detail is important. Delayed indexing is exactly the kind of thing that makes an agent feel unreliable.

Recording user, agent, and tool messages

With our database connection, it's easy to record chat sessions in the database. To do this with LangChain4j, we implement the ChatMemory interface in the LoggingChatMemory.java class.

Each session gets its own unique conversation ID, and user/agent/tool messages are written to the agent_conversation_log table.

That table captures:

conversation_id
message_seq
role and message type
message text
tool name and tool call ID when relevant
optional JSON context
creation timestamp

That distinction tends to get blurred in agent demos. It should not.

Why database memory is useful for agents

Chat windows and flat files can't scale the same way a database can. A database-backed memory layer gives you:

durable storage
structured metadata
many types of retrieval: semantic, text, relationship, graph, etc.
transactional writes and concurrency
better auditability

Databases can help you progress from agent demos to real applications that effectively utilize agent memory.

Code pointers

If you want to explore the implementation, start here:

README.md -> app overview
OpsMemoryAgentApplication.java -> Main class and agent loop
MemoryRepository.java -> Memory retrieval for text and vector search
MemoryTools.java -> LangChain4j tool bindings to search and store memories
LoggingChatMemory.java -> LangChain4j ChatMemory implementation to log chat interactions
MemoryRepositoryIntegrationTest.java -> test using Oracle AI Database Free and Testcontainers

The tests validate the behavior that matters

The integration tests are worth reading because they verify the actual retrieval patterns we care about:

exact text search finds the checkout incident for CHG2145 and INC4721
vector search finds the same incident from a paraphrased outage description
hybrid fusion marks the strongest result as matched by both channels
a stored handoff can be found on the next combined search

Where you can take this next

If you'd like to extend this sample, here's a few ideas to play with:

Add "forgetting" with recency ranking so newer memories are ranked as more relevant.
Parameterize scoring and filtering mechanisms to make the app more flexible.
Add another agent tool that uses an LLM to judge search results.
Add approval/rejection when storing memories. Maintain a log of failures so the agent knows what not to do.

Using Agent Skills to develop with Oracle AI Database

Anders Swanson — Wed, 22 Apr 2026 17:19:24 +0000

Skills are reusable, task-specific workflows for your agents: each skill is a directory centered on a SKILL.md file, with optional scripts, references, and assets packaged together.

Skills bring context efficient capabilities to agents in a reliable, repeatable manner.

In this article, we'll look at what skills are, how to install them, and how the open source oracle-db-skills repo can help your agents do real work with Oracle AI Database. You can even easily write or extend the existing skills on your own!

Install skills
Using Oracle AI Database skills
Have a suggestion or a new skill? Contribute it!

Install skills

First, install the skills into your agent. With Codex, you can use the skills installer. Make sure to restart Codex afterwards:

$skill-installer https://github.com/krisrice/oracle-db-skills

With Claude Code, either clone or copy the repo into your Claude-specific skills directory:

git clone git@github.com:krisrice/oracle-db-skills.git
cp -r oracle-db-skills ~/.claude/skills/

Using Oracle AI Database skills

Let's use our newly installed skills to create a local Oracle AI Database Free container. These containers are great for local dev, POCs, and testing.

Using Codex, we can explicitly invoke the skill using the $oracle-db-skills skill command. With Claude Code, use /oracle-db-skills to invoke the skill. Let's do so using the following prompt:

$oracle-db-skills start a new oracle ai database container for local development and give me the sqlcl connection command

Using this prompt, Codex inspected my local environment to ensure it could pull and run the container, and then ran the following command to start an Oracle AI Database Free container:

• Ran docker run -d --name oracle-ai-free -p 1521:1521 -e ORACLE_PWD=Oracle123! -v oracle-ai-free-data:/opt/oracle/oradata
  │ container-registry.oracle.com/database/free:latest
  └ b4daa0c4f9b5966087395005ddd251e68a7cdf1cae851b006cd7800da947fd04

After verifying the container started correctly, Codex then gave me the following SQLcl commands to connect:

• Started a new Oracle AI Database Free container from the official image as oracle-ai-free. It's healthy on localhost:1521, and the PDB
  FREEPDB1 is open read/write.

  Use SQLcl like this for normal development work:

  sql system/Oracle123!@//localhost:1521/FREEPDB1

  If you need SYSDBA, use:

  sql sys/Oracle123!@//localhost:1521/FREE as sysdba

Both of which connect to the container started by the agent, using the skill.

Have a suggestion or a new skill? Contribute it!

The oracle-db-skills GitHub repo is open source, so if you have an idea or suggestion, go ahead and contribute it. The repo is designed to be an evolving set of curated skills, meaning developer feedback is essential.

16 Ways to Make a Small Language Model Think Bigger

Wojtek Pluta — Tue, 21 Apr 2026 07:56:58 +0000

This article is syndicated from the original post on blogs.oracle.com. Read the canonical version there for the latest updates.

All of the code in this article is available in the Oracle AI Developer Hub. The repository is part of Oracle’s open-source AI collection and serves as the reference implementation for everything covered here.

You can install it with pip install agent-reasoning, browse the 16 agent classes, run the TUI, or integrate it directly into an existing Ollama pipeline as a zero-change replacement client. If you find it useful, a GitHub star goes a long way.

Key Takeaways

Small language models struggle with complex reasoning on their own, but agent-based architectures (like Tree of Thoughts or Self-Consistency) can significantly improve their performance.
The agent-reasoning framework adds 16 research-backed reasoning strategies to any Ollama model using a simple +strategy tag—no code changes required.
Different strategies suit different tasks: CoT works well overall, ReAct excels with external data, and branching methods improve accuracy at the cost of speed.
Much of modern AI progress comes from orchestration (prompting, search, control flow), not just larger models.

Generally, a 270M parameter LLM (as of today, April 2026) struggles with even basic multi-step reasoning. Ask a model like gemma3:270m to solve the classic water jug problem, and it will often return a confidently incorrect answer—much like other small language models (SLMs) of similar size and training.

However, take that same model and wrap it inside a Tree of Thoughts (ToT) agent, running a breadth-first search (BFS) with three levels and weighted branches, and it can reliably solve the puzzle. The improvement comes from the architecture: the agent distributes the reasoning process across structured exploration steps, compensating for the limitations of a single LLM call.

This is where things get interesting. Much of the progress in applied AI isn't coming from bigger models alone, but from engineers rethinking how to orchestrate them—layering search, memory, and control flow on top of a standard LLM call to unlock new capabilities.

This is the fundamental idea behind agent-reasoning: sixteen cognitive architectures—each backed by peer-reviewed research—can be applied to any Ollama-served model via a simple +Strategy tag appended to the model name. Call gemma3:270m+tot instead of gemma3:270m, and the interceptor handles everything else.

We’ll talk about the different ways to invoke these reasoning strategies through the project.

What You’ll Learn

How the ReasoningInterceptor intercepts model names, removes the +Strategy tag, and directs traffic to one of 16 agent classes
How 16 strategies divide into four families: sequential, branching, reflective, and meta —each representing a different reasoning approach and set of trade-offs
What each major strategy accomplishes in practice, focusing on implementation rather than theory
Which type of problem each strategy is best suited for, based on benchmark results from March 2026

The Interception Layer

Key insight: The ReasoningInterceptor is an interchangeable drop-in client for Ollama that analyzes the model name for a +Strategy tag and directs traffic to one of 16 cognitive agent classes while making no modifications to your pre-existing code.

Everything relies on a single template: add +Strategy to any Ollama model name.

Using ReasoningInterceptor as a drop-in replacement client

The image below illustrates the entire routing process from start to finish. The interceptor acts as a middleman between your code and Ollama, removes the +Strategy tag, and sends traffic to the correct agent class.

Illustrating how the interceptor separates the base model from the Strategy tag

agent_map contains over fifty-five aliases mapped to sixteen agent classes. For example, cot, chain_of_thought, and CoT all map to CotAgent, while mcts and monte_carlo map to MCTSAgent. Because the interceptor is a drop-in client for Ollama—supporting the same .generate() and .chat() APIs— existing LangChain pipelines, web UIs, and scripts can automatically gain reasoning capabilities by changing a single string in the model name.

Additionally, the interceptor can be used as a network proxy. Instead of pointing an Ollama compatible application at http://localhost:11434, direct it to http://localhost:8080 instead. Using a model name like gemma3:270m+CoT, the gateway will apply reasoning transparently.

Family 1: Sequential Strategies

Key insight: Sequential Strategies process problems in a linear chain, where each step feeds into the next. In benchmarks, CoT achieved 88.7% average accuracy, compared to 81.3% for standard generation on the same model and weights.

Each of the sixteen strategies fall into one of four families. The diagram below illustrates how they are grouped.

Categorization of the four strategy families

Sequential strategies are designed for high-speed processing with minimal latency. They are ideal for problems with discrete, sequential steps.

Chain of Thought (CoT)

Paper: Wei et al. (2022), “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”

Chain of Thought (CoT) is a prompting strategy in which the model generates intermediate reasoning steps before producing a final response. As noted in the original paper: prompting a model to produce these intermediate steps can significantly improve accuracy.

For example, standard prompting on GSM8K achieves 66.7% accuracy. With CoT prompting, this increases to 73.3%— a 10% relative improvement achieved through simple prompt design alone.

The following graphic illustrates how CoT chains appear in practice: a sequence of numbered steps, each building on the previous one.

CoT in operation

In terms of implementation within CotAgent, the query is wrapped in a structured prompt:

Structured prompting enforces step-by-step reasoning in CoTAgent

Benchmark result for qwen3.5:9b (9.7B): CoT achieves 88.7% average accuracy, across GSM8K (math), MMLU (logic), and ARC-Challenge (reasoning), compared to 81.3% for standard generation. This seven-point gain in performance is attributable solely to structural prompts. Identical weights and temperatures were applied to both models.

Recommended usage: Math word problems; logic puzzles; any multi-step reasoning task where the individual steps are sequential and do not have branches.

Decomposed Prompting

Paper: Khot et al. (2022), “Decomposed Prompting: A Modular Approach for Solving Complex Tasks”

Decomposed prompting is an architectural module that splits large problems into smaller sub-problems. Each sub-problem is handled independently while carrying forward accumulated context from earlier steps. Once all sub-problems are processed, their outputs are synthesized into a final result. DecomposedAgent follows a three-phase process—decomposition, execution and synthesis—and propagating context throughout so that each step can build on prior results.

Recommended usage: Planning problems; trip itinerary generation; any problem where the ultimate answer consists of multiple distinguishable parts that may be individually addressed.

Note: Decomposed prompting achieved only 38.5% average accuracy in benchmark testing. This result requires context. GSM8K primarily evaluates arithmetic reasoning, where decomposing a problem like “what is 47 × 13 + 9?” introduces overhead without improving the model's ability to compute the answer.

Decomposition is more effective for problems with genuinely separable components (trip planning, multi-section reports etc.), where each part benefits from focused attention. These strengths are not captured by the benchmark, and the results reflect that mismatch.

Least-to-Most Prompting

Paper: Zhou et al. (2022), “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”

Least-to-most prompting is a strategy that orders sub-questions from simplest to most complex, establishing prerequisite knowledge before tackling harder steps. Unlike decomposed prompting which generates arbitrary sub-problems, it enforces a deliberate progression where each step builds on the last. Knowledge is accumulated iteratively until the model reaches the final question.

Recommended usage: Questions with genuine prerequisites — e.g., “what is x?” before determining “how does x relate to y?”; educational style explanation sequences (“concept ladder”); tasks that require establishing foundational concepts before addressing more complex components.

Family 2: Branching Strategies

Key insight: Branching strategies explore multiple reasoning paths simultaneously and choose the best path. ToT scored 76.7% on GSM8K math, compared to 66.7% on GSM8K math with standard generation.

More LLM calls mean higher latency— but often better answers on hard problems. Take this into consideration when running all branching strategies.

Tree of Thoughts (ToT)

Paper: Yao et al. (2023), “Tree of Thoughts: Deliberate Problem Solving with Large Language Models”

ToT is a search-based methodology that evaluates numerous possible reasoning paths concurrently, selecting the best performing path as determined by evaluation metrics such as distance traveled or quality of intermediate solutions etc.

Similar to chess engines, ToT applies BFS through an expanding tree of possible solutions. The core idea is straightforward: generate multiple partial solutions, evaluate them, prune weaker candidates, and continue exploring the most promising branches.

Below is an illustration of how ToT generates and eliminates branches: green nodes represent surviving branches, while red nodes indicate those that have been eliminated. The final answer is derived from the highest scoring leaf node.

A key design decision is how branches are evaluated. Should the same model handle both generation and scoring, or should a stronger model be introduced as a judge? In these benchmarks, the same model was used for both roles, but this is an area worth experimenting with, depending on your accuracy and latency constraints.

Generating candidate branches at each level

ToTAgent implements this as configurable by depth (default=3) and width (default=2 branches). At every level, the agent generates a set of candidate next steps, evaluates them using a scoring function, prunes low-scoring options, and expands the remaining candidates into the next level.

Tot achieved 76.7% accuracy—a 10% percent improvement over standard generation on GSM8K math problems. This performance comes at a cost: additional LLM calls are required at each step to evaluate candidate paths and their intermediate result, making it roughly 5-8x slower than CoT equivalent queries.

Recommended usage: Logic puzzles with multiple solution paths; strategic decision problems; tasks where multiple approaches can be explored and compared.

Self-Consistency (Majority Voting)

Paper: Wang et al. (2022), “Self-Consistency Improves Chain of Thought Reasoning in Language Models”.

Self-Consistency is a sampling method that generates multiple independent reasoning traces and selects a final answer through majority voting. Unlike standard prompting, it relies on sampling k diverse traces at a higher temperature to encourage variation. Each trace produces a candidate answer, and the most frequently occurring answer is selected as the final output.

The image below illustrates how both Self-Consistency and Monte Carlo Tree Search (MCTS) sample multiple reasoning paths, but differ fundamentally in how those paths are evaluated—majority voting versus UCB1-based exploration-exploitation balancing.

Self-Consistency vs MCTS comparison

ConsistencyAgent uses k=5 samples at temperature of 0.7 by default. It extracts final answers using regex-based pattern matching and selects the most frequent result via counter.most_common().

Self-Consistency matches CoT on both MMLU (96.7%) and GSM8K (76.7%). Its advantage lies in reliability rather than raw accuracy: majority voting across independent reasoning traces reduces the risk of single-trace errors propagating to the final answer.

Recommended usage: Factual question answering; multiple-choice style questions; problems where arriving at the correct answer via diverse reasoning paths is more important than inspecting a single reasoning trace.

Family 3: Reflective Strategies

Self-Reflection

Paper: Shinn et al. (2023), “Reflexion: Language Agents with Verbal Reinforcement Learning” — arXiv:2303.11366

Self-Reflection is a draft-critique-refine loop in which the model generates an initial answer, critiques it for errors, and then revises it. The Reflexion paper showed that this iterative process can meaningfully improve output quality, even without any gradient updates.

The image below shows all 3 reflective strategies side by side: Self-Reflection, Debate, and Refinement Loop.

Reflective strategies comparison

SelfReflectionAgent runs a draft-critique-refine loop for up to 5 iterations, with early termination when the critique returns “CORRECT” in under 20 characters. If the critique is satisfied on an early pass, subsequent iterations are skipped. This approach helps keeps latency low for queries the model answers correctly on the initial pass.

Recommended usage: Creative writing, high-stakes technical explanations, anything where “good enough on the first try” is insufficient.

Adversarial Debate

Paper: Irving et al. (2018), “AI Safety via Debate”

Irving proposed debate as a mechanism for improving AI safety. Two agents present opposing arguments, and a judge (either a human or another LLM) evaluates their merits. The underlying premise is that that identifying flaw in weak arguments is often easier than constructing strong ones.

DebateAgent conducts multiple rounds of PRO and CON arguments, with a judge evaluating each exchange. Following all rounds, the strongest arguments from both sides are synthesized into a final answer that balances competing perspectives. Context is carried forward between rounds, enabling incremental refinement rather than redundant arguments.

Recommended usage: Controversial or ambiguous subjects; policy analysis; ethics and any subject matter requiring a balanced perspective.

Refinement Loop

Paper: Madaan et al. (2023), “Self-Refine: Iterative Refinement with Self-Feedback”

This paper describes a refinement loop similar to self-reflection, but instead of relying on a human-style critique to guide revisions, it uses a machine-based evaluation system with quantifiable quality metrics. These metrics determine whether further refinement is necessary. The loop terminates when a predefined quality metric is reached (> 0.9 by default) or when the maximum number of iterations is exceeded.

The five-stage complex refinement pipeline consists of sequential stages, each focused on a distinct type of critique: technical accuracy, structure, depth, examples, and polish.

Each stage targets a distinct aspect of quality, ensuring the model focuses exclusively on improving that dimension rather than attempting to optimize everything at once.

Recommended usage: Highly technical writing; documentation; blog posts, a scenario where production-quality output is required rather than simply a first draft.

Family 4: Cross-Domain and Meta Strategies

Key insight: Cross-domain strategies enable sharing knowledge among disciplines, while meta-strategies automatically route queries to the most appropriate reasoning technique without requiring manual selection.

Analogy-Based Reasoning

Paper: Gentner (1983), “Structure Mapping: A Theoretical Framework for Analogy”, Cognitive Science

Gentner's structure-mapping theory proposes that analogical reasoning operates by identifying structural correspondences across domains, rather than relying on surface-level similarity. The AnalogicalAgent builds on this idea through three phases: (1) identify the underlying structure independent of domain specifics, (2) generate analogous solutions from different domains that share that structure, (3) select the most effective analogy and apply its solution approach.

This process reduces reliance on memorized patterns. By focusing on underlying structure, the model learns why a solution works, rather than simply recalling what worked before.

Recommended usage: Solving problems that are structurally similar to prior ones, even if they differ superficially; transferring knowledge across domains; explaining complex concepts through analogy.

Socratic Questioning

Paper: Paul & Elder (2007), “The Art of Socratic Questioning”

The Socratic Method: Do not answer the question directly. Instead, ask follow-up questions that reduce ambiguity in the solution space.

SocraticAgent repeatedly asks questions and receives model responses, continuing until it reaches a limit of five question-response exchanges. It then synthesizes the collected information into a final answer. A deduplication or normalization step helps prevent repeated queries that differ only in wording.

Recommended for: Philosophy; ethics; deep technical knowledge; any field requiring the model to “know” something as opposed to merely answering it.

ReAct (Reason + Act)

Paper: Yao et al. (2022), “ReAct: Synergizing Reasoning and Acting in Language Models”

ReAct is a conceptual framework that interweaves reasoning steps with tool invocations, allowing the model to ground its thinking in external information. In practice, the model decides what action to take, calls a tool such as a web search engine, examines the result, updates its reasoning, and repeats the cycle until it reaches a satisfactory answer. Current tools include web scraping, accessing Wikipedia via an API call, and a calculator interface, with mock-ups available for off-line execution scenarios.

Using ReAct achieved 70.0% accuracy on ARC-Challenge (Science Reasoning). While not the highest on this particular benchmark, it enabled tool use for the LLM and allowed it to search for required information on the Internet.

Recommended usage: Fact-checking; current events queries; mathematical calculations; tasks where access to grounded, external information is important.

Auto Router: MetaReasoningAgent

Key insight: A single LLM invocation allows MetaReasoningAgent to classify each input into one of eleven categories and route it to the most appropriate strategy, without human intervention.

All sixteen strategies depend on selecting the appropriate strategy for a given task. By removing this requirement, MetaReasoningAgent eliminates the need for manual selection.

The diagram below shows how each category maps to its corresponding strategy.

MetaReasoningAgent classification diagram

MetaReasoningAgent instantiates the selected strategy class and passes control to it, along with all event objects for visualization.

To use this capability, specify a model such as gemma3:270m+meta or gemma3:270m+auto.

In practice, routing is generally intuitive: math problems are directed to CoT, logic puzzles to ToT, philosophical questions to Socratic Questioning, and controversial topics to Adversarial Debate.

The trade-off is reduced control over strategy-specific hyperparameters in exchange for automatic routing aligned with the problem type.

What Strategy Should You Pick? Benchmark Results (March 2026)

Key insight: CoT performs best on average (88.7%) across diverse tasks. ReAct excels when tool use is available (70.0% on ARC-Challenge). ToT and Self-Consistency tie on GSM8K math at 76.7%.

These results are based on 4,200 evaluations across 11 strategies using qwen3.5:9b, collected as of March 2026. All 16 strategies are implemented and production-ready. However, the benchmarks shown below focus on the 11 that produce a single extractable answer. The remaining five are generation-focused and not suited to multiple-choice evaluation.

The heat map and bar chart below provide a complete view of the results.

Benchmark results heatmap and bar chart

The short version: CoT wins on average across diverse tasks. Self-Consistency and ToT beat it on specific math benchmarks. ReAct dominates on factual/science tasks. Self-Reflection and Refinement Loop are not well captured by these benchmarks, as they primarily improve generation quality rather than multiple-choice accuracy.

For most queries, start with +cot. If you’re solving logic puzzles or planning problems, try +tot. If you need factually grounded responses, use +react. If you need polished, high-quality output rather than a quick answer, use +refinement. When in doubt, +meta will route the query automatically.

In my experience building agent-reasoning, the most surprising finding is how much prompt structure alone can improve performance. For example, qwen3.5:9b improves from 81.3% to 88.7% average accuracy simply by prompting it to produce numbered reasoning steps.

As of March 2026, all 16 strategies are production-ready and have been evaluated across 4,200 benchmark runs.

You can find the repository here. Install with pip install agent-reasoning or uv add agent-reasoning. The commands to get started:

Getting started commands

The TUI provides a 16-agent sidebar, live streaming, and a step-through debugger. Arena mode runs all 16 agents simultaneously on the same query in a 4×4 grid.

If this is useful, a GitHub star is always appreciated.

Frequently Asked Questions

Do I need to modify my existing code to use agent-reasoning?

No. The interceptor is a drop-in replacement for the Ollama client. Just change the model name string by appending +strategy (e.g., gemma3:270m+cot) and the interceptor handles everything else. Existing LangChain pipelines, web UIs, and scripts work without any other changes.

Which strategy should I start with?

Start with +cot (Chain of Thought). It scored the highest average accuracy (88.7%) across our benchmarks and adds minimal latency. If you are unsure, use +meta and let the auto-router pick the best strategy for you.

Why were only 11 of the 16 strategies benchmarked?

The benchmarks (GSM8K, MMLU, ARC-Challenge) measure multiple-choice accuracy, which works well for strategies that produce a single extractable answer. The remaining five strategies are generation-focused (e.g., Refinement Loop, MCTS) and their strengths in output quality are not captured by multiple-choice evaluations. All 16 strategies are fully implemented and production-ready.

Can I use this with models other than Ollama-served models?

Currently the interceptor targets the Ollama API. Since it exposes the same .generate() and .chat() endpoints, any Ollama-compatible client works out of the box. Support for additional inference backends is on the roadmap.

How much slower are branching strategies compared to CoT?

ToT is roughly 5-8x slower than CoT because it generates and evaluates multiple candidate branches at each level. Self-Consistency (k=5 samples) adds similar overhead. For latency-sensitive applications, stick with sequential strategies (CoT, Least-to-Most) and reserve branching strategies for problems where accuracy matters more than speed.

Created by Nacho Martinez, Data Scientist at Oracle. Find Nacho on GitHub and LinkedIn, or visit the Oracle AI Developer page for more resources.

Agent Memory: Why Your AI Has Amnesia and How to Fix It

Wojtek Pluta — Fri, 17 Apr 2026 09:08:10 +0000

Key Takeaways

Today's AI agents forget everything between conversations. Every interaction starts from zero, with no recall of who you are or what you've discussed before.
Agent memory isn't about bigger context windows. It's about a persistent, evolving state that works across sessions.
The field has converged on four memory types (working, procedural, semantic, episodic) that map directly to how human memory works.
Building agent memory at enterprise scale is fundamentally a database problem. You need vectors, graphs, relational data, and ACID transactions working together.

What Is Agent Memory and Why Does Your AI Agent Need It?

You've spent weeks building an AI customer service agent. It handles complaints, processes refunds, even cracks the occasional joke when the moment's right. A customer calls back the next day, and your agent has no idea who they are. The conversation from yesterday? Gone. The preference they mentioned twice last week? Never happened. Every single interaction starts from scratch.

This isn't a bug in your code. It's a fundamental design problem in how we build AI agents today.

LangChain put it well: 'Imagine if you had a coworker who never remembered what you told them, forcing you to keep repeating that information'. In the coworker scenario, that’s frustrating, and for AI applications, forgetfulness, that’s a dealbreaker.

At Oracle, we've been deep in this problem as we continue to provide support to customers building AI applications. And here's what we've found: the solution isn't bigger context windows or more verbose prompts. It's a proper memory infrastructure. The kind that databases have been providing for decades.

Agent memory is the composition of system components and infrastructure layer that gives AI agents a persistent, evolving state across conversations and sessions. It enables agents to store, retrieve, update, and forget information over time: learning user preferences, retaining context from past interactions, and adapting behavior based on accumulated experience. Without it, every interaction starts from zero.

This article breaks down what agent memory actually is, how it works under the hood, the frameworks shaping the field, and guidance on how to build it for production. Whether you're prototyping your first agent or scaling one to thousands of users, this is the foundation you need to get right.

Why Bigger Context Windows Aren't the Answer

The rapid expansion of context windows, now ranging from hundreds of thousands to millions of tokens, has created a convincing illusion across the industry: that with this much capacity available, the memory problem is effectively solved and retrieval-based mechanisms are behind us. That assumption is wrong.

The industry calls it 'the illusion of memory'. Stuffing more tokens into a prompt isn't memory. It's a bigger Post-it note: more space to scribble on, but it still goes in the bin when the conversation ends. Memory means the notes survive. Here's why that distinction matters:

Context windows degrade before they fill up. Most models break well before their advertised limits. A model claiming 200K tokens typically becomes unreliable around 130K, with sudden performance drops rather than gradual degradation.

There's no sense of importance. Context windows treat every token equally. Your name gets the same weight as a throwaway comment from three weeks ago. There's no prioritisation, no salience, no relevance filtering.

Nothing persists. Close the session and it's all gone. Every conversation starts from zero.

The cost scales linearly. Maintaining full context across a long agent lifetime gets expensive fast. You're paying per token, and most of those tokens are irrelevant noise.

Memory is not only about storing chat history or passing more tokens into the context window. It's about building a persistent state stored in an external system, that evolves and informs every interaction the agent has, even weeks or months apart.

Another misconception to address early on is that RAG (retrieval augmented generation) is agent memory. RAG brings external knowledge into the prompt at inference time. It's great for grounding responses with facts from documents. But RAG is fundamentally stateless. It has no awareness of previous interactions, user identity, or how the current query relates to past conversations. Memory brings continuity. Put simply: RAG helps an agent answer better. Memory helps it learn and adapt. You need both.

The Concept: A Mental Model for Agent Memory

Let me give you a framework that makes all of this click. It maps directly to how your own brain works.

In 2023, researchers at Princeton published the CoALA framework (Cognitive Architectures for Language Agents). It defines four types of memory, drawn from cognitive science and the SOAR architecture of the 1980s. Every major framework in the field builds on this taxonomy, and it answers a fundamental question: what options are available for adding memory to an AI agent?

Memory Type	Human Equivalent	What It Does in an Agent	Example Implementation
Working Memory	Your brain's scratch pad: holding what you're actively thinking about	Current conversation context, retrieved data, intermediate reasoning	Conversation buffers, sliding windows, rolling summaries, scratchpads
Procedural Memory	Muscle memory: knowing how to ride a bike without thinking	System prompts, agent code, decision logic	Prompt templates, tool definitions, agent configs
Semantic Memory	General knowledge: facts and concepts accumulated over your lifetime	User preferences, extracted facts, knowledge bases	Vector stores with similarity search
Episodic Memory	Autobiographical memory: recalling specific experiences from your past	Past action sequences, conversation logs, few-shot examples	Timestamped logs with metadata filtering

Think of it this way. When you're in a meeting, your working memory holds what's being discussed right now. Your procedural memory knows how to take notes and when to speak up. Your semantic memory reminds you that Sarah's team prefers Slack over email. Your episodic memory recalls that the last time you proposed this feature, the VP shut it down because of budget constraints.

An agent needs all four types working together. Most agents today only have working memory: whatever fits in the current context window. That's like trying to do your job using nothing but a whiteboard that gets wiped clean every evening.

Lilian Weng's influential formula captures the big picture simply:

Agent = LLM + Memory + Planning + Tool Use.

Her short-term memory maps to CoALA's working memory. Her long-term memory encompasses the other three types.

LangChain adds a practical layer with two approaches to memory updates:

Hot path memory: the agent explicitly decides to remember something before responding. This is what ChatGPT does. It adds latency but ensures immediate memory updates.
Background memory: a separate process extracts and stores memories during or after the conversation. No latency hit, but memories aren't available straight away.

The key insight: memory is application-specific. What a coding agent remembers about a user is very different from what a research agent might store.

Letta (formerly MemGPT) takes a different angle entirely, borrowing from operating systems. Treat the context window like RAM and external storage like a disk. The agent pages data between these tiers, creating a 'virtual context' that feels unlimited. The agent manages its own memory using tools: it decides what to remember, what to update, and what to archive.

The distinction between programmatic memory (developer decides what to store) and agentic memory (the agent itself decides) matters. The field is moving towards the latter. Agents that manage their own memory adapt to individual users without requiring developer intervention for each new use case. The decision as to which memory operations are programmatic and agent triggered isn’t always as clear cut, and we’ve seen various approaches work well in certain use cases and domains. In a future post, we will go into the common patterns and design principles of memory engineering.

Referring back to the customer service agent from the start of this article. Customer service is the most common use case for agents in production (26.5% of deployments, per LangChain's 2025 industry survey), and it demands all four memory types working together. Episodic memory recalls past tickets and interactions. Semantic memory stores customer preferences and account details. Working memory tracks the live conversation. Procedural memory encodes resolution workflows and escalation rules. All four memory types enable the chatbot to perform well on continuous tasks and adapt to new information.

The Landscape: Frameworks and Open-Source Libraries

What are the commonly used libraries and open-source projects for agent memory? The ecosystem has matured quickly. Here are the projects shaping how developers build agent memory today.

Project	What It Does	Open Source
LangChain / LangMem / LangGraph	Agent orchestration with built-in memory abstractions. Hot path and background memory. LangMem SDK handles extraction and consolidation.	Yes
Letta (MemGPT)	Stateful agent platform with OS-inspired memory hierarchy. Agents self-edit their own memory via tool calls.	Yes
Zep / Graphiti	Temporal knowledge graphs for relationship-aware memory. Bi-temporal modelling with sub-200ms retrieval.	Yes (Graphiti)
Mem0	Self-improving memory layer with vector and graph architecture. Automatic memory extraction and conflict resolution.	Yes
langchain-oracledb	Official LangChain integration for Oracle Database. Vector stores, hybrid search, and embeddings with enterprise-grade security.	Yes

The orchestration library matters, but at scale, the storage backend matters more. Most of these frameworks are database-agnostic by design. The question isn't which framework to use. It's what database sits underneath it.

The Deep Dive: How Agent Memory Actually Works

What are the common storage options for agent memory? Production systems today use three paradigms working together. You need to understand all three.

Vector stores for semantic memory

This is the most common approach. You take text, convert it to embeddings (typically 128 to 2,048 dimensions depending on embedding model utilised), and store them in a vector database. Retrieval works through vector search, against vectors that are indexed using HNSW (hierarchical navigable small world); typically we find the memories (embeddings in database) that are semantically closest to the current query.

It's fast and simple but limited. Vector search captures semantic similarity well, yet misses structural relationships.

Knowledge graphs for relationship memory

Vector search can tell you that a user mentioned coffee. But it can't tell you that they prefer a specific shop, ordered last Tuesday, and always get oat milk. That chain of connections (person, preference, place, time, detail) is a graph problem.

Knowledge graphs store facts as entities and relationships, with edges capturing how they connect. Add bi-temporal modelling (tracking both when events happened and when the system learned about them) and you can ask not just 'what do we know?' but 'what did we know at any point in time?'

Frameworks like Zep's Graphiti implement this pattern, achieving 94.8% accuracy on the Deep Memory Retrieval benchmark. Oracle Database supports property graphs natively through SQL/PGQ, so graph queries run inside the same engine as your vector search and relational data.

Structured databases for factual memory

Relational databases store the structured data: user profiles, access controls, session metadata, audit logs. As Cognee puts it: 'Vectors deliver high-recall semantic candidates (what feels similar), while graphs provide the structure to trace relationships across entities and time (how things relate).' Relational tables anchor both with the transactional guarantees that production systems demand.

Why does a converged database change the equation?

Most teams stitch this together with separate databases: Pinecone for vectors, Neo4j for graphs, Postgres for relational data. Three security models, three failure modes, no shared transaction boundaries. If one write fails, your agent's memory is in an inconsistent state.

Oracle's converged database runs all three paradigms natively inside a single engine:

AI Vector Search for embedding storage and similarity retrieval
SQL/PGQ for property graph queries across entity relationships
Relational tables for structured data, metadata, and audit trails
JSON Document Store for flexible, schema-free memory objects

All four share the same ACID transaction boundary and the same security model. Row-level security, encryption, and access controls apply uniformly across every data type. One engine, one transaction, one security policy: the three paradigms above become three views of the same underlying data.

The Four Memory Operations

Every memory system runs on four core operations:

ADD: Store a completely new fact
UPDATE: Modify an existing memory when new information complements or corrects it
DELETE: Remove a memory when new information contradicts it
SKIP: Do nothing when information is a repeat or irrelevant

Modern memory systems delegate these decisions to the LLM itself rather than using brittle if/else logic. The extraction phase ingests context sources (the latest exchange, a rolling summary, recent messages) and uses the LLM to extract candidate memories. The update phase compares each new fact against the most similar entries in the vector database, using conflict detection to determine whether to add, merge, update, or skip.

Retrieval: how agents recall

Due to the heterogenous nature of data that agents encounter, production systems combine multiple retrieval approaches:

Semantic search: vector similarity (cosine distance) for meaning-based matching
Temporal search: bi-temporal models enable point-in-time queries ('What did the user prefer last March?')
Graph traversal: multi-hop queries across knowledge graph edges for complex reasoning
Hybrid retrieval: combining keyword (full-text) and semantic (vector) search in a single query, which is critical for retrieving specific facts like names, dates, or project codes alongside conceptually related memories

Forgetting: the underrated operation

Effective forgetting can be implemented with decay functions applied to vector relevance scores: by analysing the results of vector search, old and unreferenced embeddings naturally fade from the agent's attention, imitating biological human memory decay patterns. In a database, this is straightforward. A recency-weighted scoring function multiplies semantic similarity by an exponential decay factor based on time since last access. The result: memories that haven't been recalled recently lose salience gradually, just like human recall.

Some systems take a different approach entirely. Old facts are invalidated but never discarded, preserving historical accuracy for audit trails. The right strategy depends on your use case, but both are fundamentally database operations.

The Enterprise Reality: What Changes at Scale

Here's where the gap between demo and production becomes a chasm.

KPMG's Pulse Survey of 130 C-suite leaders (all at companies with over $1B revenue) found that 65% cite agentic system complexity as the top barrier for two consecutive quarters. Agent deployment has more than doubled, from 11% in Q1 2025 to 26% in Q4 2025, but that still means three quarters of large enterprises haven't deployed. McKinsey puts it even more starkly: only 1% of leaders describe their companies as 'mature' in AI deployment.

The problems that surface at scale are database problems. They've been database problems all along.

Security and isolation. Memory must be scoped per user, per team, per organisation. Memory poisoning is a real attack vector: adversaries can inject malicious information into an agent's memory to corrupt future decision-making. You need row-level security, not just namespace-level isolation.

Multi-tenancy. Agents serving multiple organisations need complete data isolation. Most vector-only databases offer namespace-level separation. That's not the same as the row-level security that regulated industries require. Oracle's native PDB/CDB architecture provides inherent multi-tenant isolation.

Compliance is getting complex. GDPR's right to be forgotten applies to explicit agent memory stores. But the EU AI Act (fully applicable from August 2026) requires 10-year audit trails for high-risk AI systems. Think about that tension: you need to delete personal data on request while maintaining a decade of audit history. That requires architectural sophistication that most startups are only beginning to address.

ACID transactions matter. Agent memory operations often touch multiple data types simultaneously. Updating a vector embedding, modifying a graph relationship, and changing relational metadata must all succeed or all fail. Without atomicity, partial memory updates leave your agent in an inconsistent state.

These aren't theoretical concerns. They're the reasons three quarters of enterprises are still stuck at the pilot stage.

The Implementation: Building Agent Memory with LangChain and Oracle

Let's get practical. We'll use LangChain as our orchestration framework and Oracle Database as the memory backend, using the langchain-oracledb package. Here's how quickly you can go from zero to a working memory system.

pip install langchain-oracledb oracledb langchain-core

Connect and create a vector store

This is all it takes to set up a production-ready vector store backed by Oracle:

import oracledb
from langchain_oracledb.vectorstores import OracleVS

# Create a connection pool (production-ready)
pool = oracledb.create_pool(
 user="agent_user", password="password",
 dsn="hostname:port/service",
 min=2, max=10, increment=1
)

# Initialise vector store for semantic memory
semantic_memory = OracleVS(
 client=pool.acquire(),
 embedding_function=embeddings, # any LangChain-compatible embeddings
 table_name="AGENT_SEMANTIC_MEMORY",
 distance_strategy=DistanceStrategy.COSINE,
)

That's your semantic memory store. Oracle handles the vector indexing, ACID transactions, and security natively. No separate vector database needed.

Store and retrieve a memory

The core pattern is simple: write memories with metadata, retrieve them with similarity search.

# Store a memory
semantic_memory.add_texts(
 texts=["User prefers dark mode and concise responses."],
 metadatas=[{"user_id": "user_123", "memory_type": "preference"}]
)

# Retrieve relevant memories
results = semantic_memory.similarity_search(
 "What are this user's preferences?",
 k=5,
 filter={"user_id": "user_123"}
)
for doc in results:
 print(doc.page_content)

From here, you can create separate vector stores for each memory type (semantic, episodic, procedural) under the same Oracle instance, all sharing the same security policies and transaction guarantees.

Go deeper: the full memory engineering notebook

The snippets above show the building blocks, but a production agent memory system needs considerably more. We've published a complete, runnable notebook in the Oracle AI Developer Hub that implements the full architecture discussed in this post. This notebook builds a complete Memory Manager with six distinct memory types, each backed by Oracle:

Memory Type	Purpose	Storage
Conversational	Chat history per thread	SQL Table
Knowledge Base	Searchable documents and facts	SQL Table + Vector Enabled
Workflow	Learned action patterns	SQL Table + Vector Enabled
Toolbox	Dynamic tool definitions with semantic retrieval	SQL Table + Vector Enabled
Entity	People, places, systems extracted from context	SQL Table + Vector Enabled
Summary	Compressed context for long conversations	SQL Table + Vector Enabled

It also covers context engineering (monitoring context window usage, auto-summarisation at thresholds, just-in-time retrieval), semantic tool discovery (scaling to hundreds of tools while only passing the relevant ones to the LLM), and a complete agent loop that ties everything together.

Run the notebook: oracle-devrel/oracle-ai-developer-hub

The Perspective: Where This Is Heading

Here's what I think is coming, and where I'm still working things out.

Sleep-time computation will change the game. The idea is simple: agents that 'think' during idle time (reorganising, consolidating, refining their memories) perform better and cost less at query time. OpenAI's internal data agent already runs this pattern in production. Their engineering team describes a daily offline pipeline that aggregates table usage, human annotations, and code-derived enrichment into a single normalised representation, then converts it into embeddings for retrieval. At query time, the agent pulls only the most relevant context rather than scanning raw metadata.

Letta's research puts numbers to it: agents using this approach achieve 18% accuracy gains and 2.5x cost reduction per query. We're going to see a clear separation between 'thinking agents' that run in the background and 'serving agents' that handle real-time interactions. That's a pattern databases have supported forever: batch processing alongside real-time queries.

Memory will extend naive RAG implementations. The spectrum is already shifting: traditional RAG to agentic RAG to full memory systems. VentureBeat predicts that contextual memory will surpass RAG for agentic AI in 2026. I think that's right. RAG retrieves documents. Memory understands context. The agents that win will do both, but memory will be the differentiator.

The convergent database will become non-negotiable. Agent memory needs vectors, graphs, relational data, and temporal context working together. Stitching together separate databases for each type creates brittle systems with security gaps and consistency problems. I'm still figuring out exactly how fast this consolidation will happen, but the direction is clear.

One open question remains, and that is the pace at which enterprises will transition from pilot to production deployment. At the moment the technology is at a clear stage of maturity and architectural design patterns are proven and battle tested. On the other hand, organisational readiness, encompassing governance, infrastructure modernisation, and cross-functional alignment, is a fundamentally different challenge.

What is clear: agent memory is, at its foundation, a database problem. And building databases for mission-critical workloads is what Oracle has been doing for nearly five decades.

Frequently Asked Questions

What are the main types of agent memory used in AI systems?

The field has converged on four types, drawn from cognitive science: working memory (current conversation context), procedural memory (system prompts and decision logic), semantic memory (accumulated facts and user preferences), and episodic memory (past interaction logs and experiences). Every major framework builds on this taxonomy, first formalised in the CoALA framework from Princeton in 2023.

What options are available for adding memory to an AI agent?

Two broad approaches exist. Programmatic memory is where the developer defines what gets stored and retrieved. Agentic memory is where the agent itself decides what to remember, update, and forget using tool calls. Frameworks like Letta (formerly MemGPT) and LangChain's LangMem SDK support both patterns. The field is moving towards agentic memory, where agents manage their own state without developer intervention for each new use case.

What are common agent memory storage options?

Production systems typically combine three paradigms: vector stores for meaning-based retrieval (storing embeddings and querying by cosine similarity), knowledge graphs for relationship-aware retrieval (entities, edges, and bi-temporal modelling), and structured relational databases for transactional data like user profiles, access controls, and audit logs. Most teams stitch these together with separate databases, though converged databases like Oracle can run all three natively in a single engine.

What techniques allow AI agents to forget or selectively erase memory?

The most common approach uses decay functions applied to vector relevance scores: a recency-weighted scoring function multiplies semantic similarity by an exponential decay factor based on time since last access. Memories that haven't been recalled recently lose salience gradually, mimicking biological memory decay. An alternative approach invalidates old facts without discarding them, preserving historical accuracy for audit trails while removing them from active retrieval.

What are the differences between short-term and long-term agent memory?

Short-term memory (also called working memory) is the current context window: whatever the agent is actively reasoning about in this conversation. It's fast but volatile; close the session and it's gone. Long-term memory encompasses everything that persists across sessions: semantic memory (facts and preferences), episodic memory (past interactions), and procedural memory (learned behaviours and decision logic). Long-term memory requires external storage and retrieval infrastructure.

What are commonly used libraries for agent memory?

The ecosystem includes LangChain/LangMem (hot path and background memory with extraction and consolidation), Letta/MemGPT (OS-inspired memory hierarchy where agents self-edit memory via tool calls), Zep/Graphiti (temporal knowledge graphs with sub-200ms retrieval), Mem0 (self-improving memory with automatic conflict resolution), and langchain-oracledb (Oracle Database integration for vector stores, hybrid search, and embeddings with enterprise-grade security).

How do I store and query vector embeddings?

The core pattern is straightforward: convert text into embeddings (typically 128 to 2,048 dimensions), store them in a vector-capable database, and retrieve them using cosine similarity search. With langchain-oracledb and Oracle Database, you initialise a vector store, add texts with metadata (such as user ID and memory type), then query with similarity_search() filtered by metadata. Oracle handles vector indexing, ACID transactions, and security natively.

Which databases offer vector search capabilities for enterprises?

Several databases now support vector search, but enterprise requirements go beyond basic similarity queries. You need ACID transactions, row-level security, multi-tenancy, and compliance features alongside your vector operations. Oracle Database provides native AI Vector Search within its converged architecture, meaning vector queries run in the same engine as relational tables, property graphs (SQL/PGQ), and JSON document stores, all sharing a single transaction boundary and security model.

What Is the AI Agent Loop? The Core Architecture Behind Autonomous AI Systems

Wojtek Pluta — Fri, 17 Apr 2026 09:05:55 +0000

Key Takeaways

The architectural difference between a chatbot and an AI agent is one pattern: the agent loop. It’s an LLM invoking tools inside an iterative cycle, repeating until the task is complete or a stopping condition is reached.
A chatbot responds in a single pass. An agent persists, adapts, and acts across multiple steps: perceiving its environment, reasoning over available options, executing an action, and observing the result before deciding what comes next.
Every major AI company (OpenAI, Anthropic, Google, Microsoft, Meta) has converged on this same core pattern, despite building very different products around it.
Building agent loops for production requires engineering for two constraints: cost, where agents consume approximately 4x more tokens than standard chat interactions and up to 15x in multi-agent systems, and observability, the ability to trace every reasoning step, tool call, and decision across an iterative execution cycle.

What Is the AI Agent Loop and Why Should You Care?

The five-stage agent loop: Perceive, Reason, Plan, Act, Observe

You have built a chatbot. It works. Users ask a question, it generates a response, and the interaction is complete. Then someone asks it to do something that requires more than one step.

‘Find me the three cheapest flights to Tokyo next month, check if my loyalty points cover any of them, and book the best option’. The chatbot has no mechanism to proceed. It generates a response and stops. It can answer questions about flights. It can explain how loyalty points work. It cannot execute the workflow. The interaction is stateless. Each prompt is processed in isolation, with no persistent context, no access to intermediate results, and no ability to chain decisions across steps.

This is not a limitation of the model. Chat-GPT, Claude, and Gemini are all capable of reasoning through multi-step problems. The limitation is architectural. A chatbot is built to respond. An agent is built to act.

The difference is one while loop.

What is the Agent Loop?

The AI agent loop is the iterative execution cycle at the core of every agentic AI system. At each iteration, the agent assembles context from available inputs, invokes an LLM to reason and select an action, executes that action, observes the outcome, and feeds the observation back into the next iteration. This process repeats until the task is complete or a defined stopping condition is reached.

Across the engineering teams Oracle works with building AI applications, one architectural pattern consistently separates working prototypes from production-grade systems: the agent loop. It’s the architecture that transforms a language model from a text generation system into one that can take actions, adapt to results, and complete multi-step tasks autonomously.

This article examines the agent loop architecture: what it is, how it works, why every major AI company has converged on the same core pattern, and what is required to build one that holds up in production.

All code in this article is available as a runnable companion notebook in the Oracle AI Developer Hub on GitHub. Follow along step by step or execute the full implementation end to end.

Why Single-Pass Responses Hit a Wall

Single-pass chatbot vs. iterative agent loop: one response versus continuous execution until task completion

The standard chatbot interaction follows a simple pattern: user sends message, model generates response, done. One input, one output, no state between turns. It works brilliantly for question-answering, summarisation, and creative writing. It falls apart from the moment you need the model to do something in the real world.

A single-pass response has three fundamental constraints:

It cannot iterate on results. A single-pass system can execute a tool call within a turn, but it has no mechanism to evaluate whether that action succeeded, adapt based on the outcome, or chain a subsequent decision from the result. There is no feedback loop.
It cannot recover from failure. Without iterative execution, a failed tool call, an empty result set, or an ambiguous API response cannot trigger a revised strategy. The model has no visibility into downstream outcomes.
It cannot decompose dependent tasks. Real-world workflows require gathering information, making decisions based on that information, executing actions, and handling the consequences of those actions. Each step depends on the result of the previous one. That is a loop, not a straight line.

Russell and Norvig defined an agent back in 1995 as 'anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.' That definition is 30 years old and it still holds. The key word is acting. Not responding. Acting.

The ReAct framework from Princeton and Google Research (Yao et al., 2022) made this practical for LLMs by interleaving reasoning with action in a single prompt-driven loop. The results demonstrated that models perform significantly better when they can reason, act, observe, and reason again: a 34% improvement on ALFWorld and 10% on WebShop. Single-pass responses are not just architecturally limiting. They leave measurable performance on the table.

The Agent Loop: A Mental Model

The five-stage agent loop: Perceive, Reason, Plan, Act, Observe

The agent loop operates across five stages that repeat until the task is complete or a stopping condition is met:

Perceive: The agent receives input. This could be a user message, an API response, an error, or the result of its last action.
Reason: The LLM processes everything in context and decides what to do next.
Plan: For complex tasks, the agent decomposes the objective into discrete subtasks before execution. Simpler workflows proceed directly to the Act stage without a dedicated planning step.
Act: The agent executes something: a tool call, an API request, a database query, a code execution.
Observe: The agent examines the result. Did it work? Is the task complete? Does the plan need adjusting?

Then, it loops back to step 1.

In pseudocode, the complete pattern reduces to six lines:

while not done:
   response = call_llm(messages)
   if response has tool_calls:
      results = execute_tools(response.tool_calls)
      messages.append(results)
   else:
      done = True
      return response

This execution pattern underpins every autonomous AI system currently in production. It is the foundation on which every major AI organisation has built its agentic architecture. Anthropic's engineering guidance describes the pattern plainly: agents are often just LLMs using tools based on environmental feedback in a loop.

When the Agent Loop Is Not the Right Architecture

The agent loop is not the appropriate architecture for every use case. Before building an agentic system, validate that the workflow requires iterative execution.

Agent loops are well-suited to tasks where the number of required steps cannot be predicted in advance, where the agent must adapt based on intermediate results, and where the cost of latency is acceptable relative to the value of task completion.

Workflows that follow a fixed, predictable sequence of steps are better served by deterministic pipelines. Single-step tasks that require one LLM call and one tool invocation do not benefit from the overhead of an agent loop. Tasks where latency is the primary constraint should be evaluated carefully, as each loop iteration adds LLM call latency.

The principle from both OpenAI and Anthropic's published guidance is consistent: start with the simplest architecture that solves the problem. Introduce the agent loop only when iterative reasoning and adaptive tool use are required.

How Every Major AI Company Converged on the Same Architecture

Six major AI organisations, one underlying architecture: LLM plus tools in a loop

Despite differences in SDK design, nomenclature, and architectural philosophy, every major AI organisation has converged on the same underlying execution pattern. The table below maps each implementation against the five stages of the core loop:

Company	What they call it	Core pattern	Key contribution
OpenAI	Agent Loop	Tool-calling loop via Codex SDK	Code-first approach; anti-declarative-graph philosophy
Anthropic	Agent loop	Augmented LLM + tools in loop	Simplicity-first design; workflows vs. agents distinction
Google	Orchestration layer	ReAct (Thought-Action-Observation)	Invented Chain-of-Thought and co-created ReAct
Microsoft	Think-Act-Learn	Conversation-driven loop	Dual-loop ledger planning (Magentic-One)
Meta	Agent loop	ReAct via Llama Stack	Open-source building blocks; security-first ('Rule of Two')
LangChain	Agent executor / StateGraph	Tool-calling state machine	Graph-based orchestration; middleware hooks for control

The implementations differ in naming conventions, SDK design, and architectural philosophy. The execution pattern is identical.

Lilian Weng's formula captures it simply: Agent = LLM + Memory + Planning + Tool Use. The agent loop is the runtime that ties those four components together.

How the Agent Loop Actually Works

Three iterations, three tool calls, one complete response.

The canonical pattern is ReAct: reasoning interleaved with acting. The model does not simply select a tool. It reasons about why that tool is appropriate, executes the call, processes the result, and reasons again.

To illustrate how the loop executes in practice, consider the following task: identify the most cited paper on agent memory published in 2026 and summarise its key findings.

Iteration 1 (Reason → Act → Observe):

The agent reasons that it needs to search for papers on agent memory from 2026 and selects the search tool. It calls the search API with relevant keywords. The result returns 15 papers with citation counts.

Iteration 2:

The agent identifies the top result with 340 citations and calls a document retrieval tool to access the full abstract and key sections.

Iteration 3:

The agent determines that sufficient information has been gathered, generates the summary, and exits the loop.

Three iterations. Three tool calls. One complete answer that no single-pass chatbot could have produced.

Tool integration: the universal pattern

Across every provider, tool integration follows the same structure. Tools are defined with a name, description, and JSON Schema parameters. The model decides whether to call a tool and with what arguments. The system executes the function and returns results as a tool message. The model processes results and decides whether to continue looping or return a final response.

Tools in an agent loop can be classified into three categories. Data tools retrieve context, such as database queries, vector search, or document retrieval. Action tools perform operations with side effects, such as writing records, calling external APIs, or executing code. Orchestration tools invoke other agents as callable sub-modules, enabling multi-agent coordination within a single workflow. Clear classification of tools at design time reduces ambiguous model behaviour at runtime.

Anthropic's Model Context Protocol (MCP) has emerged as a leading open standard for how agents discover and connect to external tools, with adoption across OpenAI, Google, Microsoft, and the broader ecosystem.

Beyond the basic loop

The core ReAct loop handles most use cases, but the pattern extends in two important directions.

Plan-and-execute separates planning from execution. Instead of invoking the LLM at every step, a planner generates a full task breakdown upfront, an executor works through each subtask, and a re-planner adjusts when execution diverges from the plan. LangChain's LLMCompiler implementation streams a directed acyclic graph of tasks with explicit dependency tracking, enabling parallel execution. The original paper (Kim et al., ICML 2024) reports a 3.6x speedup over sequential ReAct-style execution. At production scale, where each LLM call carries a direct cost, the architectural decision to plan upfront rather than reason at every step has measurable financial implications.

Multi-agent orchestration distributes work across specialised agents. Anthropic's Claude Research system uses an orchestrator-worker pattern where a lead agent spawns sub-agents to explore different threads in parallel. Their multi-agent system outperformed a single-agent setup by 90.2% on internal research evaluations. Microsoft's Magentic-One takes it further with a dual-loop system: an outer loop for strategic planning and an inner loop for step-by-step execution, with the ability to reset the entire strategy when progress stalls.

These are powerful extensions, but the advice from every company is the same: start with the simplest loop that works. Only add complexity when you can measure the improvement.

The Enterprise Reality: Cost and Observability

Token cost scaling from standard chat (1x) to single agent (4x) to multi-agent (15x), with corresponding production requirements

Agent loops that perform well in controlled environments frequently expose new failure modes at production scale. The two constraints that dominate enterprise deployments are cost and observability.

Cost scales with iteration

Every loop iteration is an LLM call. Anthropic's internal data shows that agents consume roughly 4x more tokens than standard chat. Multi-agent systems push that to approximately 15x. At thousands of agent sessions per day, token costs compound with every loop iteration. Without cost controls embedded at the architecture level, this becomes a significant operational constraint.

The mitigation strategies are architectural. Plan-and-execute patterns reduce the number of LLM calls by planning upfront rather than reasoning at every step. Caching commonly retrieved tool results avoids redundant work. Setting token and cost budgets per agent run prevents runaway spending. These controls must be designed into the system from the start, not added retroactively.

Observability: knowing what your agent did and why

A standard chat interaction produces a single response from a single LLM call. An agent running 15 iterations, calling 8 different tools, and branching across multiple reasoning paths produces a complex execution trace. When a failure occurs, diagnosing it requires structured visibility into every stage of that trace: what the model reasoned, which tool it invoked, what arguments it passed, what the result was, and how the model interpreted that result before the next iteration.

Production agent systems need structured logging at every stage of the loop: what the model reasoned, which tool it called, what arguments it passed, what came back, and how it interpreted the result. Microsoft's AutoGen 0.4 builds on OpenTelemetry for this. LangChain's middleware hooks (before_model, after_model, modify_model_request) let you intercept and inspect every iteration.

Stopping conditions are the other critical piece. Without them, agents can loop indefinitely, burning tokens and producing increasingly incoherent results. Every production system needs maximum iteration limits, no-progress detection (exiting when repeated iterations produce no new information), and token/cost budgets as hard guardrails.

The following scenario illustrates the consequence of deploying an agent loop without hard stopping conditions:

An agent is deployed to scrape a website and summarise the data. The target website updates its structure, causing the scraping tool to return an empty result. The agent lacks a hard stopping condition, and its prompt instructs it to retry until data is retrieved. It enters a runaway loop, calling the broken tool 400 times in five minutes and consuming thousands of tokens before hitting a platform rate limit. A maximum iteration limit of three cycles would have prevented the failure entirely.

Building an Agent Loop with LangChain and Oracle

Before selecting a framework or writing code, address the following implementation requirements. These apply regardless of which orchestration library is used:

Identify tools and schema: What actions can the agent take, and what exact parameters do those tools need?
Choose state representation: How will you store the conversation history and intermediate tool results?
Define stopping criteria: What are the hard limits (iterations, tokens, budget) that will force the loop to terminate?
Establish logging and telemetry: How will you track each reasoning step, tool call, and result?
Select a memory layer: Where will you store persistent knowledge (like vector embeddings or user preferences) across sessions?

Here is one concrete way to implement that checklist using LangChain and Oracle.

import oracledb
from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain_core.messages import AIMessage, ToolMessage

# Connect to Oracle AI Database
pool = oracledb.create_pool(
    user="agent_user", password="password",
    dsn="hostname:port/service",
    min=2, max=10, increment=1,
)

# Define tools the agent can use
@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression and return the numeric result."""
    ...

@tool
def convert_units(value: float, from_unit: str, to_unit: str) -> str:
    """Convert a numeric value from one unit to another."""
    ...

@tool
def timezone_convert(time_str: str, from_city: str, to_city: str) -> str:
    """Convert a local time from one city's timezone to another."""
    ...

# Create the agent -- returns a compiled StateGraph that runs the loop
agent = create_agent(
    model=llm,
    tools=[calculate, convert_units, timezone_convert],
    system_prompt="You are a precise reasoning assistant. Use tools for all calculations.",
)

QUESTION = (
    "A flight from London to New York JFK covers 5,570 km. "
    "The aircraft cruises at 900 km/h. "
    "The flight departs London at 14:00 local time. "
    "How long is the flight in hours and minutes, "
    "and what local time does it arrive in New York?"
)

# Stream the loop live -- each chunk shows one stage of the agent's reasoning
for chunk in agent.stream(
    {"messages": [("human", QUESTION)]},
    stream_mode="values",
):
    last_msg = chunk["messages"][-1]
    if isinstance(last_msg, AIMessage) and last_msg.tool_calls:
        for call in last_msg.tool_calls:
            print(f"[ACT] \u2192 {call['name']}({call['args']})")
    elif isinstance(last_msg, ToolMessage):
        print(f"[OBSERVE] \u2190 {last_msg.content}")
    elif isinstance(last_msg, AIMessage) and last_msg.content:
        print(f"\nAnswer: {last_msg.content}")

The implementation above is a working agent loop. The compiled agent graph manages the while loop internally, invoking the LLM, evaluating tool calls, executing them, appending results to the message state, and repeating until the model returns a final response without further tool calls or the recursion limit is reached.

Oracle AI Database provides the storage backend for the tools the agent calls. Vector search for semantic retrieval, relational tables for structured data, and ACID transactions ensuring that every tool call either fully succeeds or fully rolls back. No partial state. No corrupted memory.

We've published a complete, runnable notebook that implements a full agent loop architecture with LangChain and Oracle AI Database in the Oracle AI Developer Hub.

Run the notebook → oracle-devrel/oracle-ai-developer-hub

Where This Is Heading

Three structural shifts are emerging in how production agent systems are designed and operated.

The core loop architecture is stable. The active area of development is the infrastructure built around it: context management, multi-loop coordination, and decision auditability. The while loop itself is not changing. What is evolving is how context is managed within it, how multiple loops are coordinated together, and how the loop's decisions are made auditable and controllable.

Agent middleware is emerging as the standard abstraction layer for production systems. LangChain's recent work on middleware hooks (intercepting the loop at before_model, after_model, and modify_model_request) suggests a future where developers don't modify the loop itself but layer behaviour on top of it: summarisation, PII redaction, human-in-the-loop approval, dynamic model switching. It's the same pattern that made web frameworks powerful: don't change the request-response cycle, add middleware to it.

Cost-per-task will replace cost-per-token as the primary efficiency metric. Token usage is an input measure. The metric that reflects actual business value is the total cost required to complete a task end to end, including LLM calls, tool executions, and any human escalations triggered by agent failures.

An agent that consumes 15x more tokens but resolves a customer issue without human escalation is cheaper than a chatbot that consumes fewer tokens but requires human intervention to complete the task.

The primary open question in production agent deployment is the pace at which observability tooling will mature. Debugging a 20-iteration agent run currently requires piecing together structured logs, tool call traces, and LLM reasoning outputs across multiple systems. The industry needs better tooling for tracing, replaying, and interpreting agent decisions. The building blocks exist in OpenTelemetry, structured logging, and middleware hooks. The developer experience remains the unsolved problem.

The agent loop is the foundational pattern for any AI system that needs to do more than generate a response. It is the architectural starting point for production-grade autonomous AI.

Frequently Asked Questions

What is an AI agent loop?

The AI agent loop is an iterative architecture where a large language model repeatedly reasons about a task, takes an action (typically a tool call), observes the result, and decides what to do next. The cycle continues until the task is complete or a stopping condition is met. In its simplest form, it's an LLM calling tools inside a while loop. This pattern, formalised in the ReAct framework (2022), is the core architecture behind every major autonomous AI system shipping today.

What is the architectural difference between an AI agent and a chatbot?

A chatbot generates a single response to a single input. It answers questions but cannot execute multi-step actions or adapt based on intermediate results. An AI agent uses the agent loop to iteratively reason, act, and observe, handling complex tasks that require multiple steps, tool interactions, and course corrections. The architectural difference is simple: a chatbot is one LLM call; an agent is an LLM calling tools in a loop until the job is done.

How does the ReAct framework work?

ReAct (Reasoning + Acting) interleaves reasoning traces with tool actions in a prompt-driven loop. At each step, the model generates a 'thought' explaining its reasoning, takes an 'action' by calling a tool, and receives an 'observation' with the result. This cycle repeats until the task is complete. The key innovation is that reasoning and acting reinforce each other: the model reasons about what to do (reason to act) and uses action results to inform further reasoning (act to reason).

What are common patterns for multi-agent orchestration?

Three patterns dominate. The manager pattern uses a central agent that delegates subtasks to specialised sub-agents via tool calls (used by OpenAI's Agents SDK). The orchestrator-worker pattern has a lead agent spawning workers for parallel exploration (used by Anthropic's Claude Research). The handoff pattern treats agents as peers that transfer control to one another based on specialisation. Most production systems start with a single agent loop and only move to multi-agent orchestration when task complexity genuinely demands it.

How do you prevent an AI agent from running forever?

Production agent loops use multiple stopping conditions layered together. Maximum iteration limits cap the number of loop cycles (for example, max_iterations=10). Token and cost budgets set hard spending limits per agent run. No-progress detection exits the loop when repeated iterations produce no new information. Goal-achievement checks evaluate whether the task objective has been met. Microsoft's Magentic-One adds a dual-loop approach where the outer loop can reset the entire strategy when the inner loop stalls, preventing the agent from spinning on a failed approach.

Building ONNX Embedding Workflows in Oracle AI Database with Python

Wojtek Pluta — Fri, 17 Apr 2026 08:44:35 +0000

A practical guide to importing an ONNX embedding model, generating embeddings, and running semantic search in Oracle AI Database

Companion notebook: ONNX In-Database Embeddings with Oracle AI Database 26ai

Key Takeaways

Oracle AI Database can load and register an augmented ONNX embedding model with DBMS_VECTOR.LOAD_ONNX_MODEL().
VECTOR_EMBEDDING() lets SQL generate embeddings directly inside Oracle AI Database.
Embeddings can be stored natively in VECTOR columns.
VECTOR_DISTANCE() enables semantic search directly in SQL.
LangChain can build on the same Oracle-native workflow without moving embeddings or retrieval outside the database (LangChain Oracle vector store integration).

Why This Matters

In many embedding pipelines, source data resides in a relational database, the model runs somewhere else as an external service, and the vectors are stored in a separate vector database. While this architecture can work well, it introduces additional data movement, infrastructure, and operational complexity.

Oracle AI Database supports a more consolidated approach. You can load an ONNX embedding model directly into the database, invoke it, store the generated embeddings in native VECTOR columns, and perform semantic search in the same database.

This article walks through that end-to-end workflow using an ONNX model: loading it into Oracle AI Database, validating that it is registered correctly, generating embeddings with SQL, storing them in a native vector column, and querying them using semantic similarity. It also demonstrates how the same architecture can be used with LangChain, without changing where embedding and retrieval occur.

What You'll Learn

How to load an augmented ONNX model with Oracle AI Database.
How to generate embeddings directly in SQL with VECTOR_EMBEDDING().
How to run semantic search with VECTOR_DISTANCE() in Oracle AI Database and through LangChain.

Architecture Overview

This workflow keeps model execution, vector storage, and semantic retrieval inside Oracle AI Database. An augmented ONNX model is exposed through an Oracle directory object, loaded with DBMS_VECTOR.LOAD_ONNX_MODEL(), invoked with VECTOR_EMBEDDING(), and queried with VECTOR_DISTANCE(). The model artifact can come either from a local or container-mounted path or directly from Oracle Cloud Object Storage using DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(). LangChain can build on the same Oracle-native execution path through OracleEmbeddings and OracleVS.

Prerequisites

Python 3.10+
Oracle AI Database 26ai running in a container
Dependencies such as oracledb, python-dotenv, pandas, numpy, langchain, langchain-community, and langchain-oracledb
For cloud loading: an Oracle Cloud Object Storage bucket and model URI, or a PAR URL
If not using a PAR URL, an Object Storage credential created with DBMS_CLOUD.CREATE_CREDENTIAL

In the notebook, those packages are installed up front:

import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q",
     "oracledb", "python-dotenv", "pandas", "numpy",
     "langchain", "langchain-core", "langchain-community", "langchain-oracledb"],
    capture_output=True, text=True
)
print("Packages installed." if result.returncode == 0 else f"Install failed: {result.stderr}")

The example also assumes Oracle AI Database 26ai is running in a container, with a mounted directory for ONNX model files. That mounted directory becomes important later, because Oracle accesses the model through a database directory object rather than through ad hoc file access.

Step-by-Step Guide

Step 1: Understand why Oracle requires an augmented ONNX model

One of the most important details in this workflow is that Oracle needs an augmented ONNX model, not just a standard transformer export.

For VECTOR_EMBEDDING() to accept raw text directly, tokenization and related preprocessing need to be included inside the ONNX graph itself. That is what allows Oracle to take a normal text string and produce an embedding without relying on external preprocessing in Python.

In the notebook, the model used is an augmented version of all-MiniLM-L12-v2:

MODEL_NAME = "all_MiniLM_L12_v2"
ONNX_FILE = "all_MiniLM_L12_v2.onnx"

Without that augmented packaging, the flow would no longer be fully Oracle-native, because preprocessing would have to happen outside the database first.

Step 2: Prepare an ONNX model for Oracle AI Database

Before the model can be used in SQL, Oracle needs controlled access to the ONNX file through a database directory object. This is a database-managed reference to a filesystem location, which means access to the model artifact is handled through Oracle privileges rather than through direct filesystem assumptions.

The notebook includes a one-time admin setup that creates the user, grants privileges, and registers the ONNX model directory. At runtime, the important pieces are:

a database user with the required privileges
permission to load mining models
a registered Oracle directory such as ONNX_DIR
access to the ONNX file from inside the container

A simplified version of the directory setup looks like this:

CREATE OR REPLACE DIRECTORY ONNX_DIR AS '/opt/oracle/onnx_models';
GRANT READ, WRITE ON DIRECTORY ONNX_DIR TO my_user;

This matters because the model import is not treated as an ad hoc file operation. The file is exposed to Oracle through a controlled database object, which is much more aligned with enterprise governance expectations.

Figure 1. An augmented ONNX model is exposed through an Oracle directory object, loaded with DBMS_VECTOR.LOAD_ONNX_MODEL(), registered in Oracle, and invoked from SQL.

Step 2b: Cloud option - load ONNX from Oracle Object Storage

Oracle also supports loading ONNX models from Oracle Cloud Object Storage with DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(). This is a documented alternative to the local directory workflow used in the companion notebook.

Per Oracle documentation, use a credential for standard Object Storage URIs, and pass credential => NULL for pre-authenticated request (PAR) URLs.

-- Option A: regular Object Storage URI (credential required)
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(
  model_name => 'ALL_MINILM_L12_V2',
  credential => 'OBJ_STORE_CRED',
  uri        => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/all_MiniLM_L12_v2.onnx',
  metadata   => JSON('{
    "function":"embedding",
    "embeddingOutput":"embedding",
    "input":{"input":["DATA"]}
  }')
);

-- Option B: PAR URL (credential must be NULL)
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(
  model_name => 'ALL_MINILM_L12_V2',
  credential => NULL,
  uri        => 'https://objectstorage.<region>.oraclecloud.com/p/<par-token>/n/<namespace>/b/<bucket>/o/all_MiniLM_L12_v2.onnx'
);

Note: According to Oracle documentation, metadata is optional for models prepared with Oracle's Python utility defaults, model names must follow Oracle naming rules, and the ONNX file size limit for cloud loading is 2 GB.

Step 2c: Multi-cloud note (AWS/GCP/Google Drive)

DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD() is documented for Oracle Cloud Object Storage. If your model artifact is hosted in AWS S3, Google Cloud Storage, or Google Drive, use a portable two-step pattern: download the ONNX file to a database-accessible local path, then load it with DBMS_VECTOR.LOAD_ONNX_MODEL().

This keeps embedding generation and semantic retrieval Oracle-native while allowing model artifact hosting outside OCI.

import os
import requests

model_url = os.environ["MODEL_SIGNED_URL"]  # S3 pre-signed URL / GCS signed URL / Drive direct URL
target_path = "/opt/oracle/onnx_models/all_MiniLM_L12_v2.onnx"

resp = requests.get(model_url, stream=True, timeout=180)
resp.raise_for_status()

with open(target_path, "wb") as f:
    for chunk in resp.iter_content(chunk_size=1024 * 1024):
        if chunk:
            f.write(chunk)

print(f"Model downloaded to {target_path}")

BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'ONNX_DIR',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2'
  );
END;
/

Step 3: Connect to Oracle AI Database from Python

The notebook connects to Oracle AI Database using python-oracledb in Thin mode, so no Oracle Client libraries are required:

import oracledb

conn = oracledb.connect(...)
print("Connected to Oracle AI Database")

That same connection is then reused across the SQL examples and the LangChain integration later in the notebook.

To keep the notebook readable, it defines a small helper function for executing SQL and optionally returning results as a pandas DataFrame:

import pandas as pd

def run_sql(sql, params=None, fetch=False, many=False, data=None):
    """Execute SQL against Oracle Database."""
    with conn.cursor() as cur:
        if many and data:
            cur.executemany(sql, data)
        elif params:
            cur.execute(sql, params)
        else:
            cur.execute(sql)

        if fetch:
            cols = [c[0] for c in cur.description]
            return pd.DataFrame(cur.fetchall(), columns=cols)

    conn.commit()

Step 4: Load an ONNX embedding model into Oracle AI Database

The notebook does not assume the ONNX model is already present. If the file is missing, it downloads the official pre-built augmented model and places it in the model directory used by Oracle.

Once the model file is available, either through an Oracle directory object or a cloud URI, it can be imported with DBMS_VECTOR.LOAD_ONNX_MODEL() or DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD().

A simplified version of the local directory call looks like this:

BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'ONNX_DIR',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2',
    metadata   => JSON('{
      "function":"embedding",
      "embeddingOutput":"embedding",
      "input":{"input":["DATA"]}
    }')
  );
END;
/

This is the point where the model becomes more than a file. Oracle registers it, stores the associated metadata, and exposes it as a named object that SQL can invoke directly.

The metadata is especially important. It defines how Oracle maps the SQL input text into the model graph and identifies which output node should be used as the embedding vector. In the notebook, the workflow also checks whether the model already exists before reloading it. This makes reruns safer and ensures the workflow remains idempotent.

model_check = run_sql(
    "SELECT COUNT(*) AS cnt FROM USER_MINING_MODELS WHERE MODEL_NAME = UPPER(:model_name)",
    params={"model_name": MODEL_NAME},
    fetch=True
)

Expected output: the model check confirms whether the ONNX model is already registered, so reruns stay idempotent.

Step 5: Verify that Oracle registered the model correctly

After the import, the next step is to validate that Oracle recognizes the model.

The notebook queries the model catalog to verify that the ONNX model has been loaded successfully:

SELECT model_name, mining_function, algorithm
FROM user_mining_models
WHERE model_name = 'ALL_MINILM_L12_V2';

This is a small but important part of the workflow. It confirms that the model is visible to Oracle as a registered object and is ready to be used by the vector functions that come next.

Expected output: the query returns the registered ONNX model from USER_MINING_MODELS.

Step 6: Generate embeddings in SQL with VECTOR_EMBEDDING()

Once the model is registered, Oracle can use it directly through VECTOR_EMBEDDING().

The notebook first tests this with a simple text input to confirm that the model works and that the returned vector has the expected size.

SELECT VECTOR_EMBEDDING(
         ALL_MINILM_L12_V2
         USING 'Oracle Database supports vector search.' AS DATA
       ) AS embedding
FROM dual;

This is one of the most important parts of the article. Embedding generation is no longer a separate service call. It becomes a SQL operation:

the application does not need to call an external embedding API
the database can generate embeddings internally
the semantic representation stays close to the data it describes

Expected output: Oracle returns a 384-dimensional embedding for the supplied text.

Step 7: Store embeddings in a native VECTOR column

After validating embedding generation, the notebook creates a table where the source text and its embedding live together.

CREATE TABLE onnx_docs (
  id        NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  category  VARCHAR2(100),
  doc_text  CLOB,
  embedding VECTOR(384, FLOAT32)
);

This is an important design choice. The vector is not stored as an opaque blob or external payload. It is stored in Oracle's native VECTOR type, which means it becomes part of the same database model as the relational data.

vectors stay linked to the exact rows they describe
access control applies consistently
backups and retention policies stay unified
the application does not need to coordinate data across multiple storage systems

The notebook inserts demo content and generates the embedding directly in the same SQL statement:

INSERT INTO onnx_docs (category, doc_text, embedding)
VALUES (
  'database',
  'Oracle AI Database supports in-database vector search and semantic retrieval.',
  VECTOR_EMBEDDING(
    ALL_MINILM_L12_V2
    USING 'Oracle AI Database supports in-database vector search and semantic retrieval.' AS DATA
  )
);

The semantic representation is created at the same time as the row is written, inside the same transactional boundary.

Figure 2. Embedding generation happens at insert time inside Oracle AI Database, where document text is embedded with VECTOR_EMBEDDING() and stored together with the row in a VECTOR column.

Before moving into retrieval, the notebook inspects the inserted rows:

SELECT id, category, DBMS_LOB.SUBSTR(doc_text, 120, 1) AS preview
FROM onnx_docs
ORDER BY id;

Step 8: Run semantic search in SQL and LangChain

Once embeddings are stored, semantic retrieval is handled entirely inside Oracle. The notebook uses VECTOR_DISTANCE() together with VECTOR_EMBEDDING() so that the query text is embedded on the fly and compared against the stored vectors:

SELECT
  id,
  category,
  DBMS_LOB.SUBSTR(doc_text, 200, 1) AS doc_preview,
  VECTOR_DISTANCE(
    embedding,
    VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'How does Oracle support semantic search?' AS DATA),
    COSINE
  ) AS distance
FROM onnx_docs
ORDER BY distance
FETCH FIRST 3 ROWS ONLY;

The user query is embedded directly within Oracle, where it is compared against stored document vectors. The results are then ranked by similarity, and the closest semantic matches are returned through SQL.

The notebook explicitly explains how to interpret the output: the smaller the cosine distance, the more semantically similar the document is to the query.

The notebook also runs several queries to validate that semantic ranking remains meaningful across different phrasings:

test_queries = [
    "Which Oracle feature helps semantic retrieval?",
    "Can I store embeddings in the database?",
    "How does LangChain work with Oracle vectors?",
    "Why are ONNX models useful here?"
]

Figure 3. At query time, Oracle embeds the input text, compares it with stored vectors using VECTOR_DISTANCE(), and returns the nearest semantic matches directly through SQL.

The notebook then adds an optional framework layer using LangChain:

from langchain_oracledb.embeddings import OracleEmbeddings
from langchain_oracledb.vectorstores.oraclevs import OracleVS

With OracleEmbeddings, the application can use Oracle's registered in-database embedding model:

oracle_embedder = OracleEmbeddings(
    conn=conn,
    params={"provider": "database", "model": MODEL_NAME}
)

The notebook also validates that the LangChain embedding call returns a vector of the expected size:

lc_embedding = oracle_embedder.embed_query(
    "Oracle AI Database performs semantic search using vectors."
)

print(f"Embedding dimension: {len(lc_embedding)}")
print(f"First 5 values: {lc_embedding[:5]}")

The notebook then uses OracleVS, a LangChain-compatible vector store backed by Oracle AI Vector Search.

from langchain_core.documents import Document
from langchain_oracledb.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy

langchain_docs = [
    Document(page_content="Oracle AI Database supports vector storage and semantic search."),
    Document(page_content="An ONNX embedding model can be loaded directly into Oracle."),
    Document(page_content="LangChain can use OracleVS to query Oracle AI Vector Search."),
    Document(page_content="Using in-database embeddings can reduce architectural complexity."),
]

vector_store = OracleVS.from_documents(
    documents=langchain_docs,
    embedding=oracle_embedder,
    client=conn,
    table_name="LC_ONNX_DEMO",
    distance_strategy=DistanceStrategy.COSINE
)

The notebook also runs a similarity query through the LangChain abstraction:

results = vector_store.similarity_search(
    "How can Oracle Database help with semantic retrieval?",
    k=3
)

for i, doc in enumerate(results, start=1):
    print(f"{i}. {doc.page_content}")

Validation & Troubleshooting

Validate that the model appears in USER_MINING_MODELS after DBMS_VECTOR.LOAD_ONNX_MODEL() or DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD().
Confirm that VECTOR_EMBEDDING() returns a 384-dimensional embedding for the loaded model.
If semantic ranking looks off, verify that the same model is used for both stored document embeddings and query embeddings.
If using cloud loading, verify URI or PAR validity, bucket path, region, and credential privileges.
When rerunning the notebook, check whether the model and demo tables already exist to avoid duplicate object errors.

Frequently Asked Questions

Why load the model into Oracle instead of calling an external API?

Because Oracle can generate embeddings directly in SQL, which reduces external dependencies and keeps data and inference inside the same system boundary.

Why does the model need to be augmented?

Because Oracle must be able to accept raw text input directly. That requires tokenization and preprocessing logic to already be included in the ONNX graph.

What does VECTOR_EMBEDDING() do?

It invokes the registered model inside Oracle and returns the embedding vector for the input text.

What does the VECTOR column store?

It stores the numeric embedding representation produced by the model. In this example, the vectors are 384-dimensional FLOAT32 values.

How is semantic similarity computed?

This workflow uses VECTOR_DISTANCE() with cosine distance to compare the stored document vectors with the embedded query.

Can the model be reused by multiple applications?

Yes. Once registered and granted appropriately, the model can be invoked by any application that has access to the Oracle environment.

Can I load the model from cloud storage instead of a local mounted directory?

Yes. Oracle AI Database supports DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD() for models in Oracle Cloud Object Storage, with either a credential or a PAR URL.

Does LangChain move embeddings outside Oracle?

No. LangChain provides a higher-level interface, but the model execution and vector search still run in Oracle.

Does this replace a separate vector database?

For many use cases, yes. Oracle provides native vector storage and vector search directly in the database.

Vector Embeddings: How They Work, Where to Store Them, and Best Practices

Wojtek Pluta — Thu, 16 Apr 2026 15:51:46 +0000

Key Takeaways

Vector embeddings convert unstructured data into numeric representations that power semantic search, recommendations, and multimodal analytics beyond keywords.
Embedding success isn’t just about the model—it also depends on a data platform that can meet requirements for scale, low latency, security, and governance, including vector indexing/ANN search, access controls, encryption, and monitoring.
Oracle AI Database unifies native vector types and similarity search, enterprise-grade security, and integrated vector, structured, and unstructured data—so teams can build RAG, search, and analytics without piecing together multiple systems.

Semantic similarity search over vector space - Oracle Help Center

Introduction to Vector Embeddings

Vector embeddings have changed the way we interact with unstructured data such as text, images, audio, and code. By transforming this data into high-dimensional numeric vectors, we can use embeddings to process the semantic meaning and relationships within the data.

We can look at embeddings as task or domain-specific representations of vectors. The geometric relationships among them represent meaningful similarities between concepts in semantic space. The efficient storage and querying of vector embeddings enables capabilities such as semantic search, recommendations, and advanced analytics; and bridges the gap between unstructured and structured information.

What are Vector Embeddings? A Definition and Their Role

Vector embeddings are mathematical representations of objects—such as words, sentences, images, or audio—encoded as dense, high-dimensional vectors. Each vector encapsulates features that capture semantic meaning, context, or structure of the data. For example, similar words or images will have embeddings positioned closely in the vector space, enabling similarity-based operations. This allows for similar “things” to be grouped together under a distance metric.

The adoption of vector embeddings underpins many cutting-edge technologies:

Retrieval-augmented generation (RAG): Enhances large language models by retrieving relevant context using embedding similarity.
Semantic search: Finds documents with similar context, not just matching keywords.
Recommendations: Suggests products or content by comparing user or item embeddings.
Deduplication and anomaly detection: Identifies near-duplicates or outliers based on embedding distances.
Multimodal analytics: Links information across text, image, audio, and other domains.

This ability to bridge structured and unstructured data makes embeddings indispensable across modern data architectures.

How to Create Embeddings? Some Tools That Can Help

A variety of tools can encode text, images, and code as vector embeddings, enabling similarity search, retrieval workflows (including RAG), and other ML tasks:

OpenAI – provides hosted embedding APIs backed by task-optimized models, accessible with REST interfaces.
Hugging Face – offers a large catalog of pre-trained multimodal embedding models and libraries (such as the Transformers library), plus community benchmarks.
Oracle AI Database – provides a native vector memory store in Oracle Database, enabling storage, indexing (e.g., IVF/flat/HNSW), and retrieval of vector embeddings alongside relational data with SQL and PL/SQL integration; supports hybrid search (vector + metadata filters), enterprise-grade security, and governance for RAG and semantic search workloads
TensorFlow – enables building and serving custom embedding models using Keras, enabling easy integration into training pipelines.
PyTorch – provides flexible primitives to fine-tune or implement embedding models, and deploy them via TorchScript.

Benefits of Working With Vector Embeddings

The following are just a few of the benefits vector embeddings have brought to today's AI tech stack:

Vector embeddings are currently the best way to transform complex data into numerical units that reflect meaning, similarity and enable clustering and retrieval beyond keyword matching.
The limitations of keyword methods were particularly visible in areas such as synonym handling, typos, and paraphrasing, and are now absent in modern-day LLMs relying on vector embeddings.
Embeddings support multilingual and cross-modal experiences by aligning meaning across languages and modalities.

Other approaches, such as sparse lexical retrieval and symbolic/ontology-based methods, can be effective, but dense vector embeddings are often a better fit when you need semantic similarity matching (for example, paraphrases and synonyms) rather than exact keyword overlap.

Challenges in Working With Vector Embeddings

The following are some of the potential challenges you may face in working with vector embeddings, and potential ways to mitigate them:

Storage Volume and High Dimensionality

Storage challenges include:

Large embedding volumes: Billions of vectors require scalable storage and efficient indexing.
High dimensionality: Embeddings of 128, 512, or 1024+ dimensions need specialized data structures and optimized storage formats.

Performance and Latency Bottlenecks

Performance factors include:

Indexing and search speed: ANN techniques improve latency, but very large datasets demand optimized infrastructure.
Batch insertion and streaming: Efficiently handling ongoing ingestion of new embeddings.

Distributed System Complexities and Operational Overhead

At scale, sharding, replication, and consistency management become complex. Automated scaling, monitoring, and failover are desirable for production systems.

Cost Factors

Vector embeddings may affect operational cost:

Compute and storage requirements: High-dimensional data and fast search consume substantial resources.
Operational overhead: Consider cost of infrastructure, team expertise, and maintenance.

Encryption at Rest and in Transit

Securing embeddings is crucial as they can encode sensitive information:

Encryption at rest: Protects stored vectors using strong industry-standard algorithms.
Encryption in transit: Ensures vectors remain confidential when transmitted between systems or users.

Oracle AI Database enforces encryption by default and integrates with enterprise key management solutions.

Access Control and Authentication

Control who can access, modify, or query embeddings:

Granular permissions: Define user roles and table-level permissions.
Integration with SSO and identity providers: Streamlines enterprise authentication.
Audit trails: Track access and changes for compliance.

Data Sanitization and Monitoring

Reduce risk by implementing:

Sanitization: Remove or obfuscate sensitive or personal information in embeddings before storage.
Monitoring and anomaly detection: Detect unusual access patterns or potential misuse.

Advanced Cryptographic Techniques

For highly sensitive embeddings:

Homomorphic encryption or secure multi-party computation: Enables computation and search on encrypted embeddings, minimizing exposure.

Common Vector Embedding Use Cases

Embeddings open up a wide array of practical use cases:

Enterprise search and information retrieval: Improved accuracy and relevance in document and knowledge base searches.
Personalization and recommendation engines: Enhanced user experiences by surfacing relevant content or products.
Fraud and anomaly detection: Early identification of unusual patterns using embedding distances.
Data deduplication and clustering: Streamlined datasets and improved analytics through intelligent grouping.
Multimodal retrieval and analytics: Unified analysis over diverse data types, fostering deeper insights.

Storing Vector Embeddings and the Oracle Advantage

The following are a few key points related to the storage of vector embeddings, and how Oracle AI Database's native vector store capabilities can streamline and strengthen your stack with its native vector store capabilities.

Specialized Vector Databases

Dedicated vector databases are built for storing, indexing, and searching embeddings efficiently. These databases excel at large-scale similarity search with features such as:

High-dimensional indexing: Specialized data structures to support billion-scale embeddings.
Approximate search capabilities: Fast, scalable similarity queries using Approximate Nearest Neighbor (ANN) techniques.
RESTful APIs and SDKs: Developer-friendly interfaces for ingestion and search.

Popular examples include Pinecone, Weaviate, Milvus, and Vespa. Specialized databases are ideal for workloads with large volumes of embeddings and demanding similarity search requirements.

SQL/NoSQL Databases with Vector Support

Traditional databases are evolving to meet AI's demands by adding native vector data types and search capabilities:

SQL databases: PostgreSQL (with pgvector), Oracle AI Database, and others support vector columns and similarity search via extensions or built-in features.
NoSQL databases: MongoDB and Redis now offer basic vector search features, often using plugins or modules.

This integration enables seamless blending of embeddings with structured business data, supporting hybrid query scenarios.

Oracle AI Database Approach

From Oracle's viewpoint, AI databases must natively support vector data types, efficient similarity queries, and enterprise security for integrating embeddings across applications. Oracle AI Database is designed to address these needs at scale.

The Oracle AI Database offers a unified approach allowing developers to:

Store embeddings alongside structured and unstructured data.
Run similarity queries directly using SQL and specialized vector search operators.
Integrate with Oracle's rich security, high availability, and scalability features.
Combine vector search, filtering, ranking, and analytical queries in a single stack.

Example Procedures - Using Vector Embeddings in Oracle AI Database

The following examples are intentionally minimal and illustrative. They highlight how Oracle AI Database supports native vector storage and SQL-based similarity search.

CREATE TABLE documents (

 id NUMBER,

 content CLOB,

 embedding VECTOR

);

This example shows a minimal table definition using Oracle AI Database’s native VECTOR data type. In practice, embeddings are stored alongside structured or unstructured application data in the same database.

SELECT id, content

FROM documents

ORDER BY VECTOR_DISTANCE(embedding, :query_vector, COSINE)

FETCH FIRST 5 ROWS ONLY;

This example illustrates SQL-based similarity search in Oracle AI Database. The :query_vector placeholder represents the embedding generated from user input by an embedding model (inside or outside the database) and is used to rank the nearest matches.

Hybrid query pattern (semantic + relational filtering)

SELECT id, content

FROM documents

WHERE content IS NOT NULL

ORDER BY VECTOR_DISTANCE(embedding, :query_vector, COSINE)

FETCH FIRST 5 ROWS ONLY;

This hybrid pattern combines standard SQL filtering with semantic ranking in a single query. It is useful when semantic search must also respect metadata constraints, access controls, or business rules. This streamlines workflows and facilitates embedding-driven applications without moving data across siloed systems.

Using Oracle Autonomous AI Database in conjunction with langchain-oracledb, for example, we can simply generate embeddings, store, and interact with vectors directly from within the database – requiring no additional investment in another separate vector database.

Querying and Searching for Stored Vector Embeddings

The following are a few of the things you should keep in mind if your work involves querying and searching for stored vector embeddings:

Approximate Nearest Neighbor (ANN) Algorithms and Data Structures

Searching for similar embeddings at scale requires efficient algorithms:

ANN Techniques: Rather than exact search, algorithms like HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), and PQ (Product Quantization) yield fast, near-accurate results.
Data Structures: Use trees (KD-Tree, Ball Tree), graphs (HNSW), or hash-based indices (LSH) to organize and retrieve vectors efficiently.

ANN can deliver millisecond-latency searches over millions or billions of embeddings, making it essential for operational AI applications.

High-level retrieval workflow (generalized)

At a high level, semantic retrieval follows a simple and reusable pattern that applies across vector databases, frameworks, and application stacks:

Convert user input into a query embedding.
Compare it against stored embeddings.
Rank results by similarity.
Apply filters and business rules as needed.

This high-level workflow is framework- and language-agnostic. While the underlying implementation differs across platforms and tools, the conceptual flow remains the same for the most vector search and RAG-style applications.

Popular Libraries

Several tools make it easier to store, and search embeddings:

Vector search libraries: FAISS (Facebook AI Similarity Search), Annoy (Spotify), NMSLIB, ScaNN.

These libraries power both stand-alone vector stores and integrations within general-purpose databases.

How to Choose the Right Similarity Metrics

Selecting the right similarity metric is critical for effective search:

Cosine similarity: Measures the angle between vectors; ideal for text and semantic similarity.
Euclidean distance: Useful for geometric or spatial data.
Dot product: Common in deep learning models; efficient for high-dimensional comparisons.

Your choice depends on the nature of your data and the specifics of your application.

Oracle AI Database Capabilities

Oracle’s AI Database combines native vector capabilities, enterprise security, and proven scalability, making it a robust choice for organizations seeking a unified solution for traditional data and AI-enabled workloads.

Native vector data types and indexing: Supports efficient storage and retrieval of high-dimensional vectors.
Integrated similarity search: Enables querying and filtering based on vector proximity.
Enterprise-grade security: Encryption at rest, robust access controls, and activity monitoring.
Hybrid queries: Seamless combination of structured, unstructured, and vector data in complex analytical tasks.
High scalability: Handles massive volumes of embeddings without performance degradation.

Best Practices for Working With Vector Embeddings

The following are a few of the best practices for using vector embeddings to power semantic search, personalized recommendations, multimodal analytics (including anomaly detection), and domain-specific insights across enterprise applications.

Semantic Search and Information Retrieval

Semantic search with embeddings offers better context and intent recognition than keyword search. Querying an embedding retrieves documents or objects with similar meanings—crucial for legal, healthcare, customer support, and research applications.

Recommendation Systems and Personalization

Compare user and item embeddings to power personalized recommendations. This increases engagement, retention, and value in e-commerce, media, and B2B applications.

Multimodal Search and Anomaly Detection

Combine embeddings across text, image, and audio for multimodal analytics or use distance-based thresholds to flag anomalies and outliers in fraud prevention or system monitoring.

Domain-Specific Analytics

Specialized embeddings can be trained for particular industries—finance, healthcare, retail—and stored/retrieved for advanced analytics, predictions, or compliance monitoring.

How to Select Appropriate Tools and Architectures

Match your use case to the data platform (dedicated vector database vs. extended relational/NoSQL).
If you want both, Oracle AI Database is a good option.
Factor in scale, integration needs, security requirements, and budget.
Leverage proven libraries and frameworks to speed up development.

Security and Scalability Considerations

Encrypt embeddings, control access, and monitor usage.
Choose solutions that scale with data growth and user demand.
Balance security, performance, and cost based on enterprise requirements.

Architectural Patterns

Hybrid architecture: Combine vector storage/search with structured data in a unified database like Oracle AI Database.
Microservices: Separate ingestion, search, and analytics as independently scaling components if needed.
Cloud-native solutions: Consider managed vector databases for elasticity and reduced operational burden.

Tooling Reminders

Use specialized libraries (FAISS, Annoy, HNSWLib) for local development, prototyping, or custom solutions.
For production or enterprise use, rely on databases with native vector support and robust security, such as Oracle AI Database.

Frequently Asked Questions (FAQ)

What are vector embeddings and why do they matter?

Vector embeddings are dense, high-dimensional numeric representations of objects like text, images, audio, or code. They place semantically similar items near each other in a continuous space, enabling tasks like semantic search, recommendations, RAG, deduplication, and anomaly detection. Compared with keyword or symbolic methods, embeddings better capture meaning, handle synonyms/paraphrases, and are robust across languages and modalities.

What are the main challenges in storing and querying embeddings at scale?

Volume and dimensionality: Billions of vectors, often 128–1024+ dimensions
Performance: Fast indexing and low-latency search, efficient batch/stream ingestion
Distributed ops: Sharding, replication, consistency, monitoring, and failover
Cost: Compute, storage, and operational overhead
Security: Encryption at rest/in transit, access control, auditing, data sanitization, and advanced cryptographic techniques for sensitive data

Where should I store embeddings: a dedicated vector database or a database with vector support?

Two common patterns:

Specialized vector databases (e.g., Pinecone, Weaviate, Milvus, Vespa) for high-scale, low-latency similarity search with ANN, SDKs, and REST APIs.
SQL/NoSQL databases with vector support (e.g., Oracle AI Database, PostgreSQL with pgvector, MongoDB, Redis) for blending vectors with structured data and enabling hybrid queries. Your choice should consider scale, integration with existing data, security, cost, and operational complexity.

What does Oracle AI Database provide for embeddings?

Oracle AI Database offers native vector types and indexing, integrated similarity search in SQL, enterprise-grade security (encryption, granular access control, auditing), and high scalability. It supports hybrid analytical queries across structured, unstructured, and vector data. With Oracle Autonomous AI Database and libraries like langchain-oracledb, teams can generate, store, and query embeddings within one platform—avoiding data silos and extra operational overhead. Encrypt data, enforce access controls, and monitor usage to meet enterprise requirements.

Conclusion

Storing and querying vector embeddings is a critical enabler for next-generation AI and data applications. By leveraging the right databases, libraries, and best practices, organizations and engineers can unlock new value from unstructured content, while maintaining performance, scalability, and security.

Resources

Agent Memory: A Free Short Course on Building Memory-Aware Agents

Wojtek Pluta — Thu, 16 Apr 2026 12:13:17 +0000

Oracle and DeepLearning.AI have launched Agent Memory: Building Memory-Aware Agents, a free short course on DeepLearning.AI that teaches developers how to architect memory systems that give agents persistence, continuity, and the ability to learn over time.

"Memory turns a stateless LLM into an agent that learns over time. How to architect agentic memory is one of the most debated topics in AI right now. This course gives AI developers and engineers a comprehensive view of the most common memory patterns."

Andrew Ng, Founder, DeepLearning.AI

Most agents forget. Each new session starts from zero, accumulated context from previous interactions is discarded, and the agent has no mechanism to learn from what it has already done. As a result, AI developers often rely on workarounds: cramming everything into the context window, reloading conversation logs, or bolting on ad-hoc retrieval.

These approaches can work, but they don't provide a clear mental model for how information should live inside an agentic system boundary. This course treats memory as a first-class citizen in AI agents, and is built around that memory-first perspective.

"For the past few years, we have focused on prompt and context engineering to get the best results from a single LLM call. But engineering the right context for agents that need to work over days or weeks needs an effective memory system. This course takes that memory-first approach to building agents."

Richmond Alake, AI Developer Experience Director, Oracle

Beyond Prompt Engineering

You’ve heard about prompt engineering. You've probably heard about context engineering. This course introduces the next layer: memory engineering, treating long-term memory as first-class infrastructure that is external to the model, persistent, and structured.

The course covers the full memory stack across five hands-on modules, built on LangChain, Tavily, and Oracle AI Database:

Why AI Agents Need Memory: Explore failure modes of stateless agents and the memory-first architecture used throughout the course.
Constructing the Memory Manager: Design persistent memory stores across memory types, model memory data for efficient retrieval, and implement a manager that orchestrates read, write, and retrieval operations during agent execution.
Scaling Agent Tool Use with Semantic Tool Memory: Treat tools as procedural memory, index them in a vector store, and retrieve only contextually relevant tools at inference time using semantic search.
Memory Operations: Extraction, Consolidation, and Self-Updating Memory: Build LLM-powered pipelines that extract structured facts from raw interactions, consolidate episodic memory into semantic memory, and implement write-back loops that let an agent autonomously update and resolve conflicts in its own knowledge base.
Memory-Aware Agent: Assemble a stateful agent that initializes from long-term memory at startup, checkpoints intermediate reasoning states during execution, and persists learned context across sessions.

"The patterns we cover here are not theoretical. AI developers and engineers will walk through real implementations: building memory stores, wiring up extraction pipelines, and handling contradictions in memory. You leave with working code you can adapt for your own production agents."

Nacho Martinez, AI Developer Advocate, Oracle

Oracle AI Database as the Agent Memory Core

Oracle AI Database serves as the unified agent memory core throughout the course. Instead of treating a database as a passive store, the course demonstrates how Oracle AI Database functions as the active retrieval and persistence layer that makes each memory pattern work in production.

Oracle AI Database brings key retrieval strategies into a single engine, including vector search for semantic similarity and unstructured knowledge retrieval, graph traversal for relationship-aware reasoning across connected entities, and relational queries for structured, transactional memory that demands precision and consistency. This helps reduce complexity by avoiding separate systems for different data types.

The memory patterns taught in this course, such as semantic tool memory, self-updating memory, and memory consolidation, are the same patterns used to build production-grade agentic systems on Oracle AI Database. This course puts that architecture directly in the hands of AI developers and engineers.

Who This Course Is For

Agent Memory: Building Memory-Aware Agents is designed for:

AI developers and engineers building or evaluating agentic systems who need production-grade memory architecture
ML engineers integrating LLMs into multi-turn or multi-session workflows
Developers working with LangChain, LangGraph, or Tavily who want durable, structured memory
Technical leaders assessing Oracle AI Database for agent infrastructure at scale

Availability

Agent Memory: Building Memory-Aware Agents is available now on DeepLearning.AI. The course is free to access and requires no prior Oracle experience. Developers can enroll in the course.

About Oracle AI Database

Oracle AI Database is a converged database platform built for AI workloads. It provides native vector search, graph traversal, relational retrieval, and the persistence infrastructure required for production agent memory systems in a single database engine. This removes the fragmented infrastructure that can become a bottleneck for AI innovation. Oracle AI Database is used by developers and enterprises as the unified memory core for AI agents to build and deploy intelligent, secure, memory-aware systems.

A Practical Guide to Choosing the Right Memory Substrate for Your AI Agents

Wojtek Pluta — Thu, 16 Apr 2026 12:11:25 +0000

Key Takeaways

Don't conflate interface with substrate. Filesystems win as an interface (LLMs already know how to use them); databases win as a substrate (concurrency, auditability, semantic search).
For prototypes, files are hard to beat. Simple, transparent, debuggable—a folder of markdown gets you surprisingly far when iteration speed matters most.
Shared state demands a database. Concurrent filesystem writes can silently corrupt data. If multiple agents or users touch the same memory, start with database guarantees.
Semantic retrieval beats keyword search at scale. Grep performance degrades on paraphrases and synonyms. Vector search finds content by meaning, this is critical once your knowledge base grows.
Avoid polyglot persistence. Running separate systems for vectors, documents, and transactions means four failure modes. Oracle AI Database simplifies your memory architecture.

AI developers are watching agent engineering evolve in real time, with leading teams openly sharing what works. One principle keeps showing up from the front lines: build within the LLM’s constraints.

In practice, two constraints dominate:

LLMs are stateless across sessions (no durable memory unless you bring it back in).
Context windows are bounded (and performance can degrade as you stuff more tokens in).

So “just add more context” isn’t a reliable strategy due to the quadratic cost of attention mechanisms and the degradation of reasoning capabilities as context fills up. The winning pattern is external memory + disciplined retrieval: store state outside the prompt (artifacts, decisions, tool outputs), then pull back only what matters for the current loop.

There’s also a useful upside: because models are trained on internet-era developer workflows, they’re unusually competent with developer-native interfaces: repos, folders, markdown, logs, and CLI-style interactions. That’s why filesystems keep showing up in modern agent stacks.

This is where the debate heats up: “files are all you need” for agent memory. Most arguments collapse because they treat interface, storage, and deployment as the same decision. They aren’t.

Filesystems are winning as an interface because models already know how to list directories, grep for patterns, read ranges, and write artifacts. Databases are winning as a substrate because once memory must be shared, audited, queried, and made reliable under concurrency, you either adopt database guarantees or painfully reinvent them.

In this piece, we give a systematic comparison of filesystems and databases for agent memory: where each approach shines, where it breaks down, and a decision framework for choosing the right foundation as you move from prototype to production.

Our aim is to educate AI developers on various approaches to agent memory, backed by performance guidance and working code.

All code presented in this article can be found here.

Understanding Agent Memory and Its Importance

Let’s take the common use case of building a Research Assistant with Agentic capabilities.

You build a Research Assistant agent that performs brilliantly in a demo; in the current execution, it can search arXiv, summarize papers, and draft a clean answer in a single run. Then you come back the next morning, start from a clean run, and then prompt the agent: “Continue from where we left off, and also compare Paper A to Paper B.” The agent responds as if it has never met you because LLMs are inherently stateless. Unless you send prior context back in, the model has no durable awareness of what happened in previous turns or previous sessions.

Once you move beyond single-turn Q&A into long-horizon tasks, deep research, multi-step workflows, and multi-agent coordination, you need a way to preserve continuity when the context window truncates, sessions restart, or multiple workers act on shared state. This takes us into the realm of leveraging systems of record for agents and introduces the concept of Agent Memory.

The Stateless LLM Problem

Why your Research Assistant forgets everything between sessions?

What is Agent Memory?

Agent memory is the set of system components and techniques that enable an AI agent to store, recall, and update information over time so it can adapt to new inputs and maintain continuity across long-horizon tasks. Core components typically include the language and embedding model, information retrieval mechanisms, and a persistent storage layer such as a database.

Types of Agent Memory

Types of Agent Memory

In practical systems, agent memory is usually classified into two distinct forms:

Short-term memory (working memory): whatever is currently inside the context window.
Long-term memory: a persistent state that survives beyond a single call or session (facts, artifacts, plans, prior decisions, tool outputs).

Concepts and techniques associated with agent memory all come together within the agent loop and the agent harness, as demonstrated in this notebook and explained later in this article.

Agent Loop and Agent Harness

The agent loop is the iterative execution cycle in which an LLM receives instructions from the environment and decides whether to generate a response or make a tool call based on its internal reasoning about the input provided in the current loop. This process repeats until the LLM produces a final output or an exit criterion is met. At a high level, the following operations are present within the agent loop:

Assemble context (user request + relevant memory + tool json schemas).
Call the model (plan, decide next action).
Take actions (tools, search, code execution, database queries).
Observe results (tool outputs, errors, intermediate artifacts).
Update memory (write transcripts, store artifacts, summarize, index).
Repeat until the task completes or hands control back to the user.

Anthropic’s guidance on long-running agents directly points to this: they describe harness practices that help agents quickly re-understand the state of work when starting with a fresh context window, including maintaining explicit progress artifacts.

The agent harness is the surrounding runtime and rules that make the loop reliable: how you wire tools, where you write artifacts, how you log/trace behavior, how you manage memory, and how you prevent the agent from drowning in context.

To complete the picture, the discipline of context engineering is heavily involved in the agent loop and aspect of the agent harness itself. Context engineering is the systematic design and curation of the content placed in an LLM’s context window so that the model receives high-signal tokens and produces the intended, reliable output within a fixed budget.

In this piece, we implement context engineering as a set of repeatable techniques inside the agent harness:

Context retrieval and selection: Pull only what is relevant (via grep for filesystem memory, via vector similarity and SQL filters for database memory).
Progressive disclosure: Start small (snippets, tails, line ranges) and expand only when needed.
Context offloading: Write large tool outputs and artifacts outside the prompt, then reload selectively.
Context reduction: Summarize or compact information when you approach a degradation threshold, then store the summary in durable memory so you can rehydrate later.

The concepts and explanations above set us up for the rest of the comparison we introduce in this piece. Now that we have the “why” and the moving parts (stateless models, the agent loop, the agent harness, and memory), we can evaluate the two dominant substrates teams are using today to make memory real: the filesystem and the database.

Filesystem-first Agentic Research Assistant

A filesystem-based memory architecture is not “the agent remembers everything forever”. It is the agent that can persist state and artifacts outside the context window and then pull them back selectively when needed. This aligns with two of the earlier-mentioned LLM constraints: a limited context window and statelessness.

In our Research Assistant, the filesystem becomes the memory substrate. Rather than injecting a large number of tools and extensive documentation into the LLM's context window (which would inflate the token count and trigger early summarization), we store them on disk and let the agent search and selectively read what it needs. This matches with what the Applied AI team at Cursor calls “Dynamic Context Discovery”: write large output to files, then let the agent tail and read ranges as required.

Our FSAgent and demo is using valid filesystem-OS related operations (such as tail and cat to read the contents of files; but that this is a very "simplified" approach, with a limited number of operations for demonstration purposes, and the capabilities offered in the file system can be optimized (with other commands and implementations).

On the other hand, it's a great start for people to get familiarized with tool access and how file system memory is achieved.

Semantic memory (durable knowledge): papers and reference docs saved as markdown.
Episodic memory (experience): conversation transcripts + tool outputs per session/run.
Procedural memory (how to work): “rules” / instructions files (e.g., CLAUDE.md / AGENTS.md) that shape behavior across sessions.

What does this look like in tooling?

Before we jump into the code, here’s the minimal tool surface we provide to the agent in the table below. Notice the pattern: instead of inventing specialized “memory APIs,” we expose a small set of filesystem primitives and let the agent compose them (very Unix).

Tool	What it does	Output
arxiv_search_candidates(query, k=5)	Searches arXiv and returns a JSON list of candidate papers with IDs, titles, authors, and abstracts.	JSON string of paper candidates
fetch_and_save_paper(arxiv_id)	Fetches full paper text (PDF → text) and saves to `semantic/knowledge_base/<id>.md`. Avoids routing full content through the LLM.	File path
read_file(path)	Reads a file from disk and returns its contents in full (use sparingly).	Full file contents
tail_file(path, n_lines=80)	Reads the last N lines of a file (first step for large files).	Last N lines
read_file_range(path, start_line, end_line)	Reads a line range to “zoom in” without loading everything.	Selected line range
grep_files(pattern, root_dir, file_glob)	Grep-like search across files to find relevant passages quickly.	Matches with file path + line number
list_papers()	Lists all locally saved papers in `semantic/knowledge_base/`.	List of filenames
conversation_to_file(run_id, messages)	Appends conversation entries to one transcript file per run in `episodic/conversations/`.	File path
summarise_conversation_to_file(run_id, messages)	Saves full transcript, then writes a compact summary to `episodic/summaries/`.	Dict with transcript + summary paths
monitor_context_window(messages)	Estimates current context usage (tokens used/remaining).	Dict with token stats

This design directly reflects what the AI ecosystem is converging on: a filesystem and a handful of core tools, rather than an explosion of bespoke tools.

Progressive reading (read, tail, range)

The first memory principle implementation is simple: don’t load large files unless you must. Filesystems are excellent at sequential read/write and work naturally with tools like grep and log-style access. This makes them a strong fit for append-only transcript and artifact storage.

That’s why we implement three reading tools:

Read everything (rare),
Read the end (common for logs/transcripts)
Read a slice (common for zooming into a match)

The tools below were implemented in Python and converted into objects callable by a langchain agent using the @tool decorator from the langchain agent module.

First is the read_file tool, the “load it all” option. This tool is useful when the file is small, or you truly need the full artifact, but it’s intentionally not the default because it can expand the context window.

@tool
def read_file(path: str) -> str:
 p = Path(path)
 if not p.exists():
 return f"File not found: {path}"
 return p.read_text(encoding="utf-8")

The tail_file function is the first step for large files. It grabs the end of a log/transcript to quickly see the latest or most relevant portion before deciding whether to read more.

@tool
def tail_file(path: str, n_lines: int = 80) -> str:
 p = Path(path)
 if not p.exists():
 return f"File not found: {path}"
 lines = p.read_text(encoding="utf-8").splitlines()
 return "\n".join(lines[-max(1, n_lines):])

The read_file_range function is seen as the surgical tool that is used once you’ve located the right region (often via grep or after a tail), pulls in just the exact line span you need, so the agent stays token-efficient and grounded.

@tool
def read_file_range(path: str, start_line: int, end_line: int) -> str:
 p = Path(path)
 if not p.exists():
 return f"File not found: {path}"
 lines = p.read_text(encoding="utf-8").splitlines()
 start = max(0, start_line)
 end = min(len(lines), end_line)
 if start >= end:
 return f"Empty range: {start_line}:{end_line} (file has {len(lines)} lines)"
 return "\n".join(lines[start:end])

Again, this is essentially dynamic context discovery in a microcosm: load a small view first, then expand only when needed.

Grep-style search (find first, read second)

A filesystem-based agent should quickly find relevant material and pull only the exact slices it needs. This is why grep is such a recurring theme in the agent tooling conversation: it gives the model a fast way to locate relevant regions before spending tokens to pull content.

Here’s a simple grep-like tool that returns line-numbered hits so the agent can immediately jump to read_file_range:

@tool
def grep_files(
pattern: str,
root_dir: str = "semantic",
file_glob: str = "**/*.md",
max_matches: int = 200,
ignore_case: bool = True,
) -> str:

root = Path(root_dir)
if not root.exists():
return f"Directory not found: {root_dir}"

flags = re.IGNORECASE if ignore_case else 0
try:
rx = re.compile(pattern, flags)
except re.error as e:
return f"Invalid regex pattern: {e}"

matches = []
for fp in root.glob(file_glob):
if not fp.is_file():
continue
try:
with open(fp, 'r', encoding='utf-8', errors='ignore') as f:
for i, line in enumerate(f, start=1):
if rx.search(line):
matches.append(f"{fp.as_posix()}:{i}: {line.strip()}")
if len(matches) >= max_matches:
return "\n".join(matches) + "\n\n[TRUNCATED: max_matches reached]"
except Exception:
continue

if not matches:
return "No matches found."
return "\n".join(matches)

One subtle but important detail in our grep_files implementation is how we read files. Rather than loading entire files into memory with read_text().splitlines(), we iterate lazily with for line in open(fp), which streams one line at a time and keeps memory usage constant regardless of file size.

This aligns with the "find first, read second" philosophy: locate what you need without loading everything upfront. For readers interested in maximum performance, the full notebook also includes a grep_files_os_based variant that shells out to ripgrep or grep, leveraging OS-level optimizations like memory-mapped I/O and SIMD instructions. In practice, this pattern (“search first, then read a range”) is one reason filesystem agents can feel surprisingly strong on focused corpora: the agent iteratively narrows the context instead of relying on a single-shot retrieval query.

Tool outputs as files: keeping big JSON out of the prompt

One of the fastest ways to blow up your context window is to return large JSON payloads from tools. Cursor’s approach is to write these results to files and let the agent inspect them on demand (often starting with tail).

That’s exactly why our folder structure includes a tool_outputs/<session_id>/ directory: it acts like an “evidence locker” for everything the agent did, without forcing those payloads into the current context.

{
 "ts_utc": "2026-01-27T12:41:12.135396+00:00",
 "tool": "arxiv_search_candidates",
 "input": "{'query': 'memgpt'}",
 "output": "content='[\\n {\\n \"arxiv_id\": \"2310.08560v2\",\\n \"entry_id\": \"http://arxiv.org/abs/2310.08560v2\",\\n \"title\": \"MemGPT: Towards LLMs as Operating Systems\",\\n \"authors\": \"Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez\",\\n \"published\": \"2024-02-12\",\\n \"abstract\": ...msPnaMxOl8Pa'"
}

Putting it together: the agent toolset

Before we create the agent, we bundle the tools into a small, composable toolbox. This matches the broader trend: agents often perform better with a smaller tool surface, less choice paralysis (aka context confusion), fewer weird and overlapping tool schemas, and more reliance on proven filesystem workflows.

FS_TOOLS = [
 arxiv_search_candidates, # search arXiv for relevant research papers
 fetch_and_save_paper, # fetch paper text (PDF->text) and save to semantic/knowledge_base/<id>.md
 read_file, # read a file in full (use sparingly)
 tail_file, # read end of file first
 read_file_range, # read a specific line range
 conversation_to_file, # append conversation entries to episodic memory
 summarise_conversation_to_file, # save transcript + compact summary
 monitor_context_window, # estimate token usage
 list_papers, # list saved papers
 grep_files # grep-like search over files
]

The “filesystem-first” system prompt: policy beats cleverness

Filesystem tools alone aren’t enough, you also need a reading policy that keeps the agent's token usage efficient and grounded. This is the same reason CLAUDE.md, AGENTS.md, and SKILLS.md matter: they’re procedural memory that is applied consistently across sessions.

Key policies we encode below:

Store big artifacts on disk (papers, tool outputs, transcripts).
Prefer grep + range reads over full reads.
Use tail first for large files and logs.
Be explicit about what you actually read (grounding).

Below is the implementation of an agent using the langchain framework.

fs_agent = create_agent(
 model=f"openai:{os.getenv('OPENAI_MODEL', 'gpt-4o-mini')}",
 tools=FS_TOOLS,
 system_prompt=(
 "You are a conversational research ingestion agent.\n\n"
 "Core behavior:\n"
 "- When asked to find a paper: use arxiv_search_candidates, pick the best arxiv_id, "
 "then call fetch_and_save_paper to store the full text in semantic/knowledge_base/.\n"
 "- Papers/knowledge base live in semantic/knowledge_base/.\n"
 "- Conversations (transcripts) live in episodic/conversations/ (one file per run).\n"
 "- Summaries live in episodic/summaries/.\n"
 "- Conversation may be summarised externally; respect summary + transcript references.\n"
 ),
)

What the memory footprint looks like on disk

After running the agent, you end up with a directory layout that makes the agent’s “memory” tangible and inspectable. In your example, the agent produces:

episodic/conversations/fsagent_session_0010.md — the session transcript (episodic memory)
episodic/tool_outputs/fsagent_session_0010/*.json — tool results saved as files (evidence + replay)
semantic/knowledge_base/*.md — saved papers (semantic memory)

That is exactly the point of filesystem-first memory: the model doesn’t “remember” by magically retaining state; it “remembers” because it can re-open, search, and selectively read its prior artifacts.

This is also why so many teams keep rediscovering the same pattern: files are a simple abstraction, and agents are surprisingly good at using them.

Advantages of File Systems In AI Agents

In the previous section, we showed what a filesystem‑first memory harness looks like in practice: the agent writes durable artifacts (papers, tool outputs, transcripts) to disk, then “remembers” by searching and selectively reading only the parts it needs.

This approach works because it directly addresses two core constraints of LLMs: limited context windows and inherent statelessness. Once those constraints are handled, it becomes clear why file systems so often become the default interface for early agent systems.

Pretraining‑native interface: LLMs have ingested massive amounts of repos, docs, logs, and README‑driven workflows, so folders and files are a familiar operating surface.
Simple primitives, strong composition: A small action set (list/read/write/search) composes into sophisticated behavior without needing schemas, migrations, or query planning.
Token efficiency via progressive disclosure: Retrieve via search, then load a small slice (snippets, line ranges) instead of dumping entire documents into the prompt.
Natural home for artifacts and evidence: Transcripts, intermediate results, cached documents, and tool outputs fit cleanly as files and remain human‑inspectable.
Debuggable by default: You can open the directory and see exactly what the agent saved, what tools returned, and what the agent could have referenced.
Portability: A folder is easy to copy, zip, diff, version, and replay elsewhere, great for demos, reproducibility, and handoffs.
Low operational overhead: For PoCs and MVPs, you get persistence and structure without provisioning extra infrastructure.

In practice, filesystem memory excels when the workload is artifact‑heavy (research notes, paper dumps, transcripts), when you want a clear audit trail, and when iteration speed matters more than sophisticated retrieval. It also encourages good agent hygiene: write outputs down, cite sources, and load only what you need.

Disadvantages of Filesystems In AI Agents

But, unfortunately, it doesn’t end there. The same strengths that make files attractive, simplicity, relatively low cost, and fast implementation, can quickly become bottlenecks once you promote these systems into production, where they are expected to behave like a shared, reliable memory platform.

As soon as an agent moves beyond single-user prototypes into real-world scenarios, where concurrent reads and writes are the norm and robustness under load is non-negotiable, filesystems start to show their limits.

Weak concurrency guarantees by default: Multiple processes can overwrite or interleave writes unless you implement locking correctly. Even then, locking semantics vary across platforms and network filesystems.
No ACID transactions: You don’t get atomic multi-step updates, isolation between writers, or durable commit semantics without building them. Partial writes and mid-operation failures can leave memory in inconsistent states.
Search quality is usually brittle: Keyword/grep-style retrieval misses meaning, synonyms, and paraphrases.
Scaling becomes “death by a thousand files”: Directory bloat, fragmented artifacts, and expensive scans make performance degrade as memory grows, especially if you rely on repeated full-folder searches.
Indexing is DIY: The moment you want fast retrieval, deduplication, ranking, or recency weighting, you end up maintaining your own indexes and metadata stores (which, being honest here…is basically a database).
Metadata and schema drift: Agents inevitably accumulate extra fields (source URLs, timestamps, embeddings, tags). Keeping those consistent across files is harder than enforcing constraints in tables.
Poor multi-user / multi-agent coordination: Shared memory across agents means shared state. Without a central coordinator, you’ll hit race conditions, inconsistent views, and an unclear “source of truth.”
Harder auditing at scale: Files are human-readable, but reconstructing “what happened” across many runs and threads becomes messy without structured logs, timestamps, and queryable history.
Security and access control are coarse: Permissions are filesystem-level, not row-level. It’s hard to enforce “agent A can read X but not Y” without duplicating data or adding an auth layer.

The core pattern is that filesystem memory stays attractive until you need correctness under concurrency, semantic retrieval, or structured guarantees. At that point, you either accept the limitations (and keep the agent single-user/single-process) or you adopt a database.

Database For Agent Memory

By this point, most AI developers can see why filesystem first agent implementations are having a moment. It is a familiar interface, easy to prototype with, and our agents can “remember” by writing artifacts to disk and reloading them later via search plus selective reads. For a single developer on a laptop, that is often enough. But once we move beyond “it works on my laptop” and start supporting developers who ship to thousands or millions of users, memory stops being a folder of helpful files and becomes a shared system that has to behave predictably under load.

Databases were created for the exact moment when “a pile of files” stops being good enough because too many people and processes are touching the same data. One of the most-cited origin stories of the database dates to the Apollo era. IBM, alongside partners, built what became IMS to manage complex operational data for the program, and early versions were installed in 1968 at the Rockwell Space Division, supporting NASA. The point was not simply storage. It was coordination, correctness, and the ability to trust shared data while many activities were happening simultaneously.

That same production reality is what pushes agent memory toward databases today.

When agent memory must handle concurrent reads and writes, preserve an auditable history of what happened, support fast retrieval across many sessions, and enforce consistent updates, we want database guarantees rather than best-effort file conventions.

Oracle has been solving these exact problems since 1979, when we shipped the first commercial SQL database. The goal then was the same as now: make shared state reliable, portable, and trustworthy under load.

On that note, allow us to show how this can work in practice.

Database-first Research Assistant

In the filesystem first section, our Research Assistant “remembered” by writing artifacts to disk and reloading them later using cheap search plus selective reads. That is a great starting point. But when we want memory that is shared, queryable, and reliable under concurrent use, we need a different foundation.

In this iteration of our agent, we keep the same user experience and the same high-level job. Search arXiv, ingest papers, answer follow-up questions, and maintain continuity across sessions. The difference is that memory now lives in the Oracle AI Database, where we can make it durable, indexed, filterable, and safe for concurrent reads and writes. We also achieve a clean separation between two memory surfaces: structured history in SQL tables and semantic recall via vector search.

The result is what we call a MemAgent, an agent whose memory is not a folder of artifacts, but a queryable system. It is designed to support multi-threaded sessions, store full conversational history, store tool logs for debugging and auditing, and store a semantic knowledge base that can be searched by meaning rather than keywords.

Available tools for MemAgent

Before we wire up the agent loop, we need to define the tool surface that MemAgent can use to reason, retrieve, and persist knowledge. The design goal here is similar to the filesystem-first approach: keep the toolset small and composable, but shift the memory substrate from files to the database. Instead of grepping folders and reading line ranges, MemAgent uses vector similarity search to retrieve semantically relevant context, and it persists what it learns in a way that is queryable and reliable across sessions.

In practice, that means two things.

First, ingestion tools do not just “fetch” content; they also chunk and embed it so it becomes searchable later.
Second, retrieval tools are meaning-based rather than keyword-based, so the agent can find relevant passages even when the user paraphrases, uses synonyms, or asks higher-level conceptual questions.

The table below summarizes the minimal set of tools we expose to MemAgent and where each tool stores its outputs.

Tool	What it does
arxiv_search_candidates(query, k)	Searches arXiv for candidate papers
fetch_and_save_paper_to_kb_db(arxiv_id)	Fetches paper, chunks text, stores embeddings
search_knowledge_base(query, k)	Semantic search over stored papers
store_to_knowledge_base(text, metadata)	Manually store text with metadata

FSAgent and MemAgent can look similar from the outside because both can ingest papers, answer questions, and maintain continuity. The difference is what powers that continuity and how retrieval works when the system grows.

FSAgent relies on the operating system as its memory surface, which is great for iteration speed and human inspectability, but it typically relies on keyword-style discovery and file traversal. MemAgent treats memory as a database concern, which adds setup overhead, but unlocks indexed retrieval, stronger guarantees under concurrency, and richer ways to query and filter what the agent has learned.

Aspect	FSAgent (Filesystem)	MemAgent (Database)
Search	Keyword and grep	Semantic similarity
Persistence	Markdown files	SQL tables + vector indexes
Scalability	Directory traversal	Indexed queries
Query Language	Paths and regex	SQL + vector similarity
Setup Complexity	Minimal	Requires database runtime

Creating data stores with LangChain and Oracle AI Database

Before we start defining tables and vector stores, it is worth being explicit about the stack we are using and why. In this implementation, we are not building a bespoke agent framework from scratch.

We use LangChain as the LLM framework to abstract the agent loop, tool calling, and message handling, then pair it with a model provider for reasoning and generation, and with Oracle AI Database as the unified memory core that stores both structured history and semantic embeddings.

This separation is important because it mirrors how production agent systems are typically built. The agent logic evolves quickly, the model can be swapped, and the memory layer must remain reliable and queryable.

Think of this as the agent stack. Each layer has a clear job, and together they create an agent that is both practical to build and robust enough to scale.

Model provider (OpenAI): generates reasoning, responses, and tool decisions.
LLM framework (LangChain): provides the agent abstraction, tool wiring, and runtime orchestration.
Unified memory core (Oracle AI Database): stores durable conversational memory in SQL and semantic memory in vector indexes.

With that stack in place, the first step is simply to connect to the Oracle Database and initialize an embedding model. The database connection serves as the foundation for all memory operations, and the embedding model enables us to store and retrieve knowledge semantically through the vector store layer.

def connect_oracle(user, password, dsn="127.0.0.1:1521/FREEPDB1", program="langchain_oracledb_demo"):
 return oracledb.connect(user=user, password=password, dsn=dsn, program=program)

database_connection = connect_oracle(
 user="VECTOR",
 password="VectorPwd_2025",
 dsn="127.0.0.1:1521/FREEPDB1",
 program="devrel.content.filesystem_vs_dbs",
)

print("Using user:", database_connection.username)

embedding_model = HuggingFaceEmbeddings(
 model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)

Next, we define the database schema to store our agent’s memory and prepare a clean slate for the demo. We separate memory into distinct tables so each type can be managed, indexed, and queried appropriately.

Installing the Oracle Database integration in the LangChain ecosystem is straightforward. You can add it to your environment with a single pip command:

pip install -U langchain-oracledb

Conversational history and logs are naturally tabular, while semantic and summary memory are stored in vector-backed tables through OracleVS. For reproducibility, we drop any existing tables from previous runs, making the notebook deterministic and avoiding confusing results when you re-run the walkthrough.

from langchain_oracledb.vectorstores import OracleVS
from langchain_oracledb.vectorstores.oraclevs import create_index
from langchain_community.vectorstores.utils import DistanceStrategy

CONVERSATIONAL_TABLE = "CONVERSATIONAL_MEMORY"
KNOWLEDGE_BASE_TABLE = "SEMANTIC_MEMORY"
LOGS_TABLE = "LOGS_MEMORY"
SUMMARY_TABLE = "SUMMARY_MEMORY"

ALL_TABLES = [
 CONVERSATIONAL_TABLE,
 KNOWLEDGE_BASE_TABLE,
 LOGS_TABLE,
 SUMMARY_TABLE
]

for table in ALL_TABLES:
 try:
 with database_connection.cursor() as cur:
 cur.execute(f"DROP TABLE {table} PURGE")
 except Exception as e:
 if "ORA-00942" in str(e):
 print(f" - {table} (not exists)")
 else:
 print(f" [FAIL] {table}: {e}")

database_connection.commit()

Create the vector stores and HNSW indexes

For this section, it is worth explaining what a “vector store” actually is in the context of agents. A vector store is a storage system that persists embeddings alongside metadata and supports similarity search, so the agent can retrieve items by meaning rather than keywords.

Instead of asking “which file contains this exact phrase”, the agent asks “which chunks are semantically closest to my question” and pulls back the best matches.

Under the hood, that usually means an approximate nearest neighbor index, because scanning every vector becomes prohibitively expensive as your knowledge base grows. HNSW is one of the most common indexing approaches for this style of retrieval.

In the code below, we create two vector stores using the langchain_oracledb module OracleVS, one for the knowledge base and one for summaries, both using cosine distance.

Second, it builds HNSW indexes so similarity search stays fast as memory grows, which is exactly what you want once your Research Assistant starts ingesting many papers and running over long-lived threads.

knowledge_base_vs = OracleVS(
 client=database_connection,
 embedding_function=embedding_model,
 table_name=KNOWLEDGE_BASE_TABLE,
 distance_strategy=DistanceStrategy.COSINE,
)

summary_vs = OracleVS(
 client=database_connection,
 embedding_function=embedding_model,
 table_name=SUMMARY_TABLE,
 distance_strategy=DistanceStrategy.COSINE,
)

def safe_create_index(conn, vs, idx_name):
 try:
 create_index(
 client=conn,
 vector_store=vs,
 params={"idx_name": idx_name, "idx_type": "HNSW"}
 )
 print(f" Created index: {idx_name}")
 except Exception as e:
 if "ORA-00955" in str(e):
 print(f" [SKIP] Index already exists: {idx_name}")
 else:
 raise

print("Creating vector indexes...")
safe_create_index(database_connection, knowledge_base_vs, "kb_hnsw_cosine_idx")
safe_create_index(database_connection, summary_vs, "summary_hnsw_cosine_idx")
print("All indexes created!")

Memory Manager

In the code below, we create a custom Memory manager. The Memory manager is the abstraction layer that turns raw database operations into “agent memory behaviours”. This is the part that makes the database-first agent easy to reason about.

SQL methods store and load conversational history by thread_id
Vector methods store and retrieve semantic memory by similarity search
Summary methods store compressed context and let us rotate the working set when we approach context limits

from langchain.tools import tool
from typing import List, Dict

class MemoryManager:
 """
 A simplified memory manager for AI agents using Oracle AI Database.
 """

 def __init__(self, conn, conversation_table: str, knowledge_base_vs, summary_vs, tool_log_table):
 self.conn = conn
 self.conversation_table = conversation_table
 self.knowledge_base_vs = knowledge_base_vs
 self.summary_vs = summary_vs
 self.tool_log_table = tool_log_table

 def write_conversational_memory(self, content: str, role: str, thread_id: str) -> str:
 thread_id = str(thread_id)
 with self.conn.cursor() as cur:
 id_var = cur.var(str)
 cur.execute(f"""
 INSERT INTO {self.conversation_table} (thread_id, role, content, metadata, timestamp)
 VALUES (:thread_id, :role, :content, :metadata, CURRENT_TIMESTAMP)
 RETURNING id INTO :id
 """, {"thread_id": thread_id, "role": role, "content": content, "metadata": "{}", "id": id_var})
 record_id = id_var.getvalue()[0] if id_var.getvalue() else None
 self.conn.commit()
 return record_id

 def load_conversational_history(self, thread_id: str, limit: int = 50) -> List[Dict[str, str]]:
 thread_id = str(thread_id)
 with self.conn.cursor() as cur:
 cur.execute(f"""
 SELECT role, content FROM {self.conversation_table}
 WHERE thread_id = :thread_id AND summary_id IS NULL
 ORDER BY timestamp ASC
 FETCH FIRST :limit ROWS ONLY
 """, {"thread_id": thread_id, "limit": limit})
 results = cur.fetchall()

 return [{"role": str(role), "content": content.read() if hasattr(content, 'read') else str(content)} for role, content in results]

 def mark_as_summarized(self, thread_id: str, summary_id: str):
 thread_id = str(thread_id)
 with self.conn.cursor() as cur:
 cur.execute(f"""
 UPDATE {self.conversation_table}
 SET summary_id = :summary_id
 WHERE thread_id = :thread_id AND summary_id IS NULL
 """, {"summary_id": summary_id, "thread_id": thread_id})
 self.conn.commit()
 print(f" Marked messages as summarized (summary_id: {summary_id})")

 def write_knowledge_base(self, text: str, metadata_json: str = "{}"):
 metadata = json.loads(metadata_json)
 self.knowledge_base_vs.add_texts([text], [metadata])

 def read_knowledge_base(self, query: str, k: int = 5) -> str:
 results = self.knowledge_base_vs.similarity_search(query, k=k)
 content = "\n".join([doc.page_content for doc in results])
 return f"""## Knowledge Base Memory: This are general information that is relevant to the question
### How to use: Use the knowledge base as background information that can help answer the question

{content}"""

 def write_summary(self, summary_id: str, full_content: str, summary: str, description: str):
 self.summary_vs.add_texts(
 [f"{summary_id}: {description}"],
 [{"id": summary_id, "full_content": full_content, "summary": summary, "description": description}]
 )
 return summary_id

 def read_summary_memory(self, summary_id: str) -> str:
 results = self.summary_vs.similarity_search(
 summary_id,
 k=5,
 filter={"id": summary_id}
 )
 if not results:
 return f"Summary {summary_id} not found."
 doc = results[0]
 return doc.metadata.get('summary', 'No summary content.')

 def read_summary_context(self, query: str = "", k: int = 5) -> str:
 results = self.summary_vs.similarity_search(query or "summary", k=k)
 if not results:
 return "## Summary Memory\nNo summaries available."

 lines = ["## Summary Memory", "Use expand_summary(id) to get full content:"]
 for doc in results:
 sid = doc.metadata.get('id', '?')
 desc = doc.metadata.get('description', 'No description')
 lines.append(f" - [ID: {sid}] {desc}")
 return "\n".join(lines)

Then we instantiate it:

memory_manager = MemoryManager(
 conn=database_connection,
 conversation_table=CONVERSATION_HISTORY_TABLE,
 knowledge_base_vs=knowledge_base_vs,
 tool_log_table=TOOL_LOG_TABLE,
 summary_vs=summary_vs
)

Creating the tools and agent

The database-first agent follows a simple, production-friendly pattern.

Persists every conversation turn as structured rows, including user and assistant messages with thread or run IDs and timestamps, so sessions are recoverable, traceable, and consistent across restarts.
Persists long-term knowledge in a vector-enabled store by chunking documents, generating embeddings, and storing them with metadata, so retrieval is semantic, ranked, and fast as the corpus grows.
Persists tool activity as first-class records that capture the tool name, inputs, outputs, status, errors, and key metadata, so agent behavior is inspectable, reproducible, and auditable.

On top of that, the agent actively manages context: it tracks token usage and periodically rolls older dialogue and intermediate state into durable summaries (and/or “memory” tables), so the working prompt stays small while the full history remains available on demand.

Ingest papers into the knowledge base vector store

This is the database-first equivalent of “fetch and save paper”. Instead of writing markdown files, we do three steps:

Load paper text from arXiv
Chunk it to respect the embedding model limits
Store chunks with metadata in the vector store, which gives us fast semantic search later

from datetime import datetime, timezone
from langchain_core.tools import tool
from langchain_community.document_loaders import ArxivLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

@tool
def fetch_and_save_paper_to_kb_db(
 arxiv_id: str,
 chunk_size: int = 1500,
 chunk_overlap: int = 200,
) -> str:
 loader = ArxivLoader(
 query=arxiv_id,
 load_max_docs=1,
 doc_content_chars_max=None,
 )
 docs = loader.load()
 if not docs:
 return f"No documents found for arXiv id: {arxiv_id}"

 doc = docs[0]

 title = (
 doc.metadata.get("Title")
 or doc.metadata.get("title")
 or f"arXiv {arxiv_id}"
 )

 entry_id = doc.metadata.get("Entry ID") or doc.metadata.get("entry_id") or ""
 published = doc.metadata.get("Published") or doc.metadata.get("published") or ""
 authors = doc.metadata.get("Authors") or doc.metadata.get("authors") or ""

 full_text = doc.page_content or ""
 if not full_text.strip():
 return f"Loaded arXiv {arxiv_id} but extracted empty text (PDF parsing issue)."

 splitter = RecursiveCharacterTextSplitter(
 chunk_size=chunk_size,
 chunk_overlap=chunk_overlap,
 )
 chunks = splitter.split_text(full_text)

 ts_utc = datetime.now(timezone.utc).isoformat()
 metadatas = []
 for i in range(len(chunks)):
 metadatas.append(
 {
 "source": "arxiv",
 "arxiv_id": arxiv_id,
 "title": title,
 "entry_id": entry_id,
 "published": str(published),
 "authors": str(authors),
 "chunk_id": i,
 "num_chunks": len(chunks),
 "ingested_ts_utc": ts_utc,
 }
 )

 knowledge_base_vs.add_texts(chunks, metadatas)

 return (
 f"Saved arXiv {arxiv_id} to {KNOWLEDGE_BASE_TABLE}: "
 f"{len(chunks)} chunks (title: {title})."
 )

We create two more tools below:

search_knowledge_base(query, k=5): Runs a semantic similarity search over the database-backed knowledge base and returns the top k most relevant chunks, so the agent can retrieve context by meaning rather than exact keywords.
store_to_knowledge_base(text, metadata_json="{}"): Stores a new piece of text into the knowledge base and attaches metadata (as JSON), which gets embedded and indexed so it becomes searchable in future queries.

import os
from langchain.tools import tool

@tool
def search_knowledge_base(query: str, k: int = 5) -> str:
 return memory_manager.read_knowledge_base(query, k)

@tool
def store_to_knowledge_base(text: str, metadata_json: str = "{}") -> str:
 memory_manager.write_knowledge_base(text, metadata_json)
 return "Successfully stored text to knowledge base."

Now we build the LangChain agent using the database-first tools.

from langchain.agents import create_agent

MEM_AGENT = create_agent(
 model=f"openai:{os.getenv('OPENAI_MODEL', 'gpt-4o-mini')}",
 tools=[search_knowledge_base, store_to_knowledge_base, arxiv_search_candidates, fetch_and_save_paper_to_kb_db],
)

Result Comparison: FSAgent vs MemAgent: End-to-End Benchmark (Latency + Quality)

At this point, the difference between a filesystem agent and a database-backed agent should feel less like a philosophical debate and more like an engineering trade-off. Both approaches can “remember” in the sense that they can persist state, retrieve context, and answer follow-up questions. The real test is what happens when you leave the tidy laptop demo and hit production realities: larger corpora, fuzzier queries, and concurrent workloads.

To make that concrete, we ran an end-to-end benchmark and measured the full agent loop per query—retrieval, context assembly, tool calls, model invocations, and the final answer—across three scenarios:

Small-corpus retrieval: a tight, keyword-friendly dataset to validate baseline retrieval and answer synthesis with minimal context.
Large-corpus retrieval: a larger dataset with more paraphrase variability to stress retrieval quality and context efficiency at scale.
Concurrent write integrity: a multi-worker stress test to evaluate correctness under simultaneous reads/writes (integrity, race conditions, throughput).

FSAgent vs MemAgent: End-to-End Benchmark (Latency + Quality)

From the result shown in the image above, two conclusions immediately stand out: latency and answer quality.

In our run, MemAgent generally finished faster end-to-end than FSAgent. That might sound counterintuitive if you assume “database equals overhead,” and sometimes it does.

But the agent loop is not dominated by raw storage primitives. It is dominated by how quickly you can find the right information and how little unnecessary context you force into the model, also known as context engineering. Semantic retrieval tends to return fewer, more relevant chunks (subject to tuning of the retrieval pipelines), which means less scanning, less paging through files, and fewer tokens burned on irrelevant text.

In this particular run, both agents produced similar-quality answers. That is not surprising. When the questions are retrieval-friendly and the corpus is small enough, both approaches can find the right passages. FSAgent gets there through keyword search and careful reading. MemAgent gets there through similarity search over embedded chunks. Different roads, similar destination.

And I think it’s worth zooming in on one nuance here. When the information to traverse is minimal in terms of character length and the query is keyword-friendly, the retrieval quality of both agents tends to converge. At that scale, “search” is barely a problem, so the dominant factor becomes the model’s ability to read and synthesise, not the retrieval substrate. The gap only starts to widen when the corpus grows, the wording becomes fuzzier, and the system must retrieve reliably under real-world constraints such as noise, paraphrases, and concurrency. Which it eventually does.

About the “LLM-as-a-Judge” metric

We also scored answers using an LLM-as-a-judge prompt. It is a pragmatic way to get directional feedback when you do not have labeled ground truth, but it is not a silver bullet. Judges can be sensitive to prompt phrasing, can over-reward fluency, and can miss subtle grounding failures.

If you are building this for production, treat LLM judging as a starting signal, not the finish line. The more reliable approach is a mix of:

Reference-based evaluation when you have ground truth, such as rubric grading, exact match, or F1-style scoring.
Retrieval-aware evaluation when context matters, such as context precision and recall, answer faithfulness, and groundedness. Tracing plus evaluation tooling so you can connect failures to the specific retrievals, tool calls, and context assembly decisions that caused them.

Even with a lightweight judge, the directional story remains consistent. As retrieval becomes more difficult and the system becomes busier, database-backed memory tends to perform better.

Large Corpus Benchmark: Why the gap widens as data grows

The large-corpus test is designed to stress the exact weakness of keyword-first memory. We intentionally made the search problem harder by growing the corpus and making the queries less “exact match.”

FSAgent with a concatenated corpus:
When you merge many papers into large markdown files, FSAgent becomes dependent on grep-style discovery followed by paging the right sections into the context window. It can work, but it gets brittle as the corpus grows:

If the user paraphrases or uses synonyms, exact keyword matches can fail.
If the keyword is too common, you get too many hits, and the agent has to sift through them manually.
When uncertain, the agent often loads larger slices “just in case,” which increases token count, latency, and the risk of context dilution.

MemAgent with chunked, embedded memory:
Chunking plus embeddings makes retrieval more forgiving and more stable:

The user does not need to match the source phrasing exactly.
The agent can fetch a small set of high-similarity chunks, keeping context tight.
Indexed retrieval remains predictable as memory grows, rather than requiring repeated scans of files.

The narrative takeaway is simple. Filesystems feel great when the corpus is small and the queries are keyword-friendly. As the corpus grows and the questions get fuzzier, semantic retrieval becomes the differentiator, and database-backed memory becomes the more dependable default.

The quality gap widens with scale. On a handful of documents, grep can brute-force its way to a reasonable answer: the agent finds a keyword match, pulls surrounding context, and responds.

But scatter the same information across hundreds of files, and keyword search starts missing the forest for the trees. It returns too many shallow hits or none when the user's phrasing doesn't match the source text verbatim. Semantic search, by contrast, surfaces conceptually relevant chunks even when the vocabulary differs. The result isn't just faster retrieval, it's more coherent answers with fewer hallucinated gaps. This is evident in our LLM judge evaluation on the large corpus benchmark, where FSAgent achieved a score of 29.7% while MemAgent reached 87.1%.

Concurrency Test: What production teaches you very quickly

We find that the real breaking point for filesystem memory is rarely retrieval. It is concurrency.

We ran three versions of the same workload under concurrent writes:

Filesystem without locking, where multiple workers append to the same file.
Filesystem with locking, where writes are guarded by file locks.
Oracle AI Database with transactions, where multiple workers write rows under ACID guarantees.

Then we measured two things:

Integrity, meaning, did we get the expected number of entries with no corruption?
Execution time, meaning how long the batch took end-to-end.

What we observed maps to what many teams discover the hard way.

Naive filesystem writes can be fast and still be wrong. Without locking, concurrent writes conflict with each other. You might get good throughput and still lose memory entries. If your agent’s “memory” is used for downstream reasoning, silent loss is not a performance issue. It is a correctness failure.

Locking fixes integrity, but now correctness is your job. With explicit locking, you can make filesystem writes safe. But you inherit the complexity. Lock scope, lock contention, platform differences, network filesystem behavior, and failure recovery all become part of your agent engineering work.

Databases make correctness the default. Transactions and isolation are exactly what databases were designed for. Yes, there is overhead. But the key difference is that you are not bolting correctness on after a production incident. You start with a system whose job is to protect the shared state.

And of course, you can take the file-locking approach, add atomic writes, build a write-ahead log, introduce retry and recovery logic, maintain indexes for fast lookups, and standardise metadata so you can query it reliably.

Eventually, though, you will realise you have not “avoided” a database at all.

You have just rebuilt one, only with fewer guarantees and more edge cases to own.

Conclusion: Is there a happy medium for AI Developers?

This isn’t a religious war between “files” and “databases.” It’s a question of what you’re optimizing for—and which failure modes you’re willing to own. If you’re building single-user or single-writer prototypes, filesystem memory is a great default. It’s simple, transparent, and fast to iterate on. You can open a folder and see exactly what the agent saved, diff it, version it, and replay it with nothing more than a text editor.

If you’re building multi-user agents, background workers, or anything you plan to ship at scale, a database-backed memory store is a safer foundation at that stage. At that stage, concurrency, integrity, governance, access control, and auditability matter more than raw simplicity. A practical compromise is a hybrid design: keep file-like ergonomics for artifacts and developer workflows, but store durable memory in a database that can enforce correctness.

And if you insist on filesystem-only memory in production, treat locking, atomic writes, recovery, indexing, and metadata discipline as first-class engineering work. Because the moment you do that seriously, you’re no longer “just using files”—you’re rebuilding a database.

One last trap worth calling out: polyglot persistence.

Many AI stacks drift into an anti-pattern: a vector DB for embeddings, a NoSQL DB for JSON, a graph DB for relationships, and a relational DB for transactions. Each product is “best at its one thing,” until you realize you’re operating four databases, four security models, four backup strategies, four scaling profiles, and four cascading failure points.

Coordination becomes the tax. You end up building glue code and sync pipelines just to make the system feel unified to the agent. This is why converged approaches matter in agent systems: production memory isn’t only about storing vectors—it’s about storing operational history, artifacts, metadata, and semantics under one consistent set of guarantees.

For AI Developers, your application acts as an integration layer for multiple storage engines, each with different access patterns and operational semantics. You end up building glue code, sync pipelines, and reconciliation logic just to make the system feel unified to the agent.

Of course, production data is inherently heterogeneous. You will inevitably deal with structured, semi-structured, unstructured text, embeddings, JSON documents, and relationship-heavy data.

The point is not that “one model wins”.

The point is that when you understand the fundamentals of data management, reliability, indexing, governance, and queryability, you want a platform that can store and retrieve these forms without turning your AI infrastructure into a collection of loosely coordinated subsystems.

This is the philosophy behind Oracle’s converged database approach, which is designed to support multiple data types and workloads natively within a single engine. In the world of agents, that becomes a practical advantage because we can use Oracle as the unified memory core for both operational memory (SQL tables for history and logs) and semantic memory (vector search for retrieval).

Frequently Asked Questions

What is AI Agent memory? AI agent memory is the set of system components and techniques that enable an AI agent to store, recall, and update information over time. Because LLMs are inherently stateless—they have no built-in ability to remember previous sessions—agent memory provides the persistence layer that allows agents to maintain continuity across conversations, learn from past interactions, and adapt to user preferences.
Should I use a filesystem or a database for an AI agent's memory? It depends on your use case. Filesystems excel at single-user prototypes, artifact-heavy workflows, and rapid iteration—they're simple, transparent, and align with how LLMs naturally operate. Databases become essential when you need concurrent access, ACID transactions, semantic retrieval, or shared state across multiple agents or users. Many production systems use a hybrid approach: file-like interfaces for agent interaction, with database guarantees underneath.
How do I build an AI agent with long-term memory? Start by separating memory types: working memory (current context), semantic memory (knowledge base), episodic memory (interaction history), and procedural memory (behavioral rules). Implement storage: a filesystem for prototypes and a database for production. Add retrieval tools that the agent can call. Build a summarization to compress the old context. Test with multi-session scenarios where the agent must recall information from previous conversations.
What are semantic, episodic, and procedural memory in AI agents? These terms, borrowed from cognitive science, describe different types of agent memory. Semantic memory stores durable knowledge and facts (like saved documents or reference materials). Episodic memory captures experiences and interaction history (conversation transcripts, tool outputs). Procedural memory encodes how the agent should behave—instructions, rules, files like CLAUDE.md, and learned workflows that shape behavior across sessions.
What is the best database for AI applications? The best database depends on your requirements. For AI agent memory specifically, you need: vector search capability for semantic retrieval, SQL or structured queries for history and metadata, ACID transactions if multiple agents share state, and scalability as your memory corpus grows. Converged databases that combine these capabilities—like Oracle AI Database—reduce operational complexity versus running separate specialized systems.

Forem: Oracle Developers

Oracle AI Agent Memory: A Governed, Unified Memory Core for Enterprise AI Agents

Why Agent Memory, Why Now

Oracle AI Database as the Memory Core

Key Benefits For AI Workloads

Who Oracle AI Agent Memory Is For

Getting Started

What Is Agent Memory? A Beginner’s Guide for AI Developers

Why This Matters

What You'll Learn

What Is Agent Memory?

Context Window vs Memory: What Is the Difference?

The Four Types of Agent Memory

When Does Agent Memory Become a Data Problem?

Architecture Overview

Prerequisites

Step-by-Step Guide: A Simple Memory Layer with LangChain and Oracle

Step 1: Install the Packages

Step 2: Connect to a Persistent Store

Step 3: Store a Memory and Retrieve It Later

When Do You Actually Need Agent Memory?

Validation & Troubleshooting

Common Mistakes to Avoid

Key Takeaways

Frequently Asked Questions

Next Steps

Unified Memory Core for AI Agents

Key takeaways

What agent memory is and why it matters

You'll learn how to

Architecture overview

Prerequisites

Types of agent memory and their storage needs

Short-term vs. long-term memory

Episodic memory

Semantic memory

Matching memory types to storage technologies

Storage technologies for agent memory

In-memory storage: pros, cons, and use cases

File-based storage: when and why to use it

Databases: SQL, NoSQL, and key-value stores

Vector databases for semantic memory retrieval

Why text search belongs in the unified memory core

Hybrid storage architectures and patterns

Example: relationship-aware memory with SQL Property Graph

Scaling agent memory for large applications

Externalizing memory from models

Hierarchical and tiered memory layers

Retrieval-augmented generation (RAG) techniques

Retrieval evaluation metrics

Memory management: summarization, pruning, and lifecycle policies

Infrastructure considerations for scalability

Operations runbook for memory systems

Implementation walkthrough

How this guide maps to the notebook

Security and privacy considerations in agent memory storage

Common security risks and attack vectors

Data protection and access control strategies

Operational security checklist

Mitigation best practices and compliance

Recap of memory types and storage options

Guidelines for choosing and managing agent memory

Balancing performance, scalability, and security

Validation & troubleshooting: failure modes and fallback strategy

Frequently Asked Questions

Related documentation and further reading

Agent Memory with LangChain4j and Oracle AI Database

Database feature overview

Run the sample

Chat Memory vs Durable Memory

Hybrid retrieval: semantic + full-text search

Lightweight reranking

LangChain4j agent

Memory writeback

Recording user, agent, and tool messages

Why database memory is useful for agents

Code pointers

The tests validate the behavior that matters

Where you can take this next

Using Agent Skills to develop with Oracle AI Database