Ömer Berat Sezer

Agent with Memory, Built-In Search, LiteLLM using Google Agent Development Kit 🤖 Gemini, FastAPI, Streamlit 🗃

In the past three months, two powerful AI agent development frameworks have been released:

  • Google Agent Development Kit (ADK)
  • AWS Strands Agents

Previously, we worked with the Google Agent Development Kit and learned how to create agent-based applications using local and remote MCP (Model Context Protocol) tools alongside Gemini, FastAPI, and a Streamlit interface.

We also introduced AWS Strands Agents and showed how to implement an agent that uses local and remote MCP tools in this post.

In this post, we'll dive into the Google Agent Development Kit (ADK) and show how to create agent-based applications with long-term memory, the built-in Google Search tool, and different LLMs (Bedrock, OpenAI, Ollama, etc.) via LiteLLM, using Gemini 2.5, FastAPI, and a Streamlit interface.


What is Google Agent Development Kit?

  • Agent Development Kit (ADK) is an open-source framework for developing AI agents that can run anywhere:
    • VS Code, terminal
    • Docker containers
    • Google Cloud Run
    • Kubernetes

Motivation: Why Use an Agent Framework?

  • Structured workflows: Automates decision-making, tool use, and response generation in a clear loop.
  • Session & memory support: Keeps context across interactions for smarter, personalized behavior.
  • Multi-agent orchestration: Lets specialized agents collaborate on complex tasks.
  • Tool integration: Easily plug in MCP tools, APIs, and functions that agents know when and how to use.
  • Multi-model flexibility: Use and switch between different LLMs (e.g. GPT, Claude, Nova).
  • Production-ready: Built-in logging, monitoring, and error handling for real-world use.

ADK Session, Memory Implementation

We've explored how a session keeps track of the history (events) and temporary state within a single ongoing conversation. But what happens when an agent needs to remember details from earlier conversations or tap into external knowledge sources? That’s where Long-Term Knowledge and the MemoryService become essential.

  • Session / State: Like your short-term memory during one specific chat.
  • Long-Term Knowledge (MemoryService): Like a searchable archive or knowledge library the agent can consult, potentially containing information from many past chats or other sources.

ADK provides services to manage these concepts:

  • SessionService: Manages the different conversation threads (Session objects).
    • Handles the lifecycle: creating, retrieving, updating (appending Events, modifying State), and deleting individual Sessions.
  • MemoryService: Manages the long-term knowledge store (Memory).
    • Handles ingesting information (often from completed Sessions) into the long-term store and provides methods to search this stored knowledge based on queries.
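
As a quick illustration, here is a minimal sketch of how the two services hand off to each other, using the in-memory implementations from this post (note: in newer ADK releases these service methods are async coroutines and would need to be awaited):

from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService

session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()

# 1. A session tracks one conversation thread (short-term context).
session = session_service.create_session(
    app_name="demo_app", user_id="user123", session_id="session1"
)

# 2. The Runner appends user/agent Events to the session during the chat.

# 3. After the conversation, ingest the session into long-term memory...
memory_service.add_session_to_memory(session)

# 4. ...so later sessions can search it (basic keyword matching here).
results = memory_service.search_memory(
    app_name="demo_app", user_id="user123", query="previous discussion"
)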

ADK provides different ways to implement this long-term knowledge store:

  • InMemoryMemoryService:
    • How it works: Stores session information in the application's memory and performs basic keyword matching for searches.
    • Persistence: None. All stored knowledge is lost if the application restarts.
    • Requires: Nothing extra.
    • Best for: Prototyping, simple testing, scenarios where only basic keyword recall is needed and persistence isn't required.
  • VertexAiRagMemoryService:
    • How it works: Leverages Google Cloud's Vertex AI RAG (Retrieval-Augmented Generation) service. It ingests session data into a specified RAG Corpus and uses powerful semantic search capabilities for retrieval.
    • Persistence: Yes. The knowledge is stored persistently within the configured Vertex AI RAG Corpus.
    • Requires: A Google Cloud project, appropriate permissions, necessary SDKs (pip install google-adk[vertexai]), and a pre-configured Vertex AI RAG Corpus resource name/ID.
    • Best for: Production applications needing scalable, persistent, and semantically relevant knowledge retrieval, especially when deployed on Google Cloud. See the configuration sketch below.
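
A hedged configuration sketch for the persistent option (the corpus resource name is a placeholder you must replace with your own, and similarity_top_k / vector_distance_threshold are the documented tuning parameters for this service):

from google.adk.memory import VertexAiRagMemoryService

# Placeholder: create the corpus in Vertex AI RAG first, then paste its resource name.
RAG_CORPUS = "projects/PROJECT_ID/locations/us-central1/ragCorpora/CORPUS_ID"

memory_service = VertexAiRagMemoryService(
    rag_corpus=RAG_CORPUS,
    similarity_top_k=5,               # number of chunks returned per search
    vector_distance_threshold=10.0,   # drop semantically weak matches
)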

Agent App with Memory, LiteLLM using Google ADK, Gemini, FastAPI and Streamlit

Two small sample projects on GitHub:

Agent with Memory

  • InMemoryMemoryService() stores session data in memory while the app is running.
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService

session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()

...
runner = Runner(
    agent=agent,
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service
)

Agent with Different LLM Models

  • LiteLLM lets the agent use different LLM backends (e.g. "bedrock/meta.llama3-1-405b-instruct-v1:0").
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

Agent(
    model=LiteLlm(model="bedrock/meta.llama3-1-405b-instruct-v1:0"),
    # model=LiteLlm(model="ollama/llama3.2:1b"),
    name="bedrock_agent",
    instruction=instruction
)
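For the Bedrock model above, LiteLLM reads standard AWS credentials from the environment, so no Google API key is involved on that path. A hedged sketch with placeholder values (AWS_REGION_NAME is the region variable LiteLLM uses for Bedrock):

# .env
AWS_ACCESS_KEY_ID=PASTE_YOUR_ACTUAL_KEY_HERE
AWS_SECRET_ACCESS_KEY=PASTE_YOUR_ACTUAL_SECRET_HERE
AWS_REGION_NAME=us-west-2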


Agent with Built-In Search Tool

  • The built-in Google Search tool (google_search) can be used instead of external search tools like MCP Serper or Tavily.
from google.adk.tools import load_memory, google_search
from google.adk.agents import Agent

Agent(
    model=MODEL,
    name="google_search_agent",
    instruction=instruction,
    tools=[google_search],
)
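The import above also pulls in load_memory, the ADK tool that lets an agent query the MemoryService at runtime. A minimal sketch of a recall-focused agent (shown on a separate agent because built-in tools like google_search generally cannot be mixed with other tools on the same ADK agent):

from google.adk.agents import Agent
from google.adk.tools import load_memory

recall_agent = Agent(
    model=MODEL,  # same Gemini model constant used elsewhere in this post
    name="recall_agent",
    instruction=(
        "Answer the user. If the question refers to earlier conversations, "
        "call the 'load_memory' tool to search long-term memory first."
    ),
    tools=[load_memory],
)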

Installing Dependencies & Reaching Gemini Model

# .env
SERPER_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE
  • Please install Node.js, npm, and npx on your system if you want to run npx-based MCP tools.
  • Please install requirements:
fastapi
uvicorn
google-adk
google-generativeai
# also used by the frontend and LiteLLM samples below:
streamlit
requests
python-dotenv
litellm
boto3
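With the requirements file in place, install everything into your environment:

pip install -r requirements.txt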

Frontend - Streamlit UI

# app.py
import streamlit as st
import requests

st.set_page_config(page_title="Agent Chat", layout="centered")

if "messages" not in st.session_state:
    st.session_state.messages = []

st.title("Agent Memory, LiteLLM")

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

user_query = st.chat_input("Ask to search for real-time data or anything...")

# send and display user + assistant messages
if user_query:
    st.chat_message("user").markdown(user_query)
    st.session_state.messages.append({"role": "user", "content": user_query})
    try:
        response = requests.post(
            "http://localhost:8000/ask",
            json={"query": user_query}
        )
        response.raise_for_status()
        agent_reply = response.json().get("response", "No response.")
    except Exception as e:
        agent_reply = f"Error: {str(e)}"

    st.chat_message("assistant").markdown(agent_reply)
    st.session_state.messages.append({"role": "assistant", "content": agent_reply})

Backend for Memory App, Built-in Google Search Tool

# agent.py
from fastapi import FastAPI
from pydantic import BaseModel
from dotenv import load_dotenv
from google.genai import types
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, Session
from google.adk.memory import InMemoryMemoryService
from google.adk.tools import load_memory, google_search
load_dotenv()

# --- App/Session Config ---
APP_NAME = "search_memory_app"
USER_ID = "user123"
SESSION_ID = "session123"

MODEL = "gemini-2.5-flash-preview-04-17"
#MODEL="gemini-2.0-flash"

def create_agent() -> Agent:
    instruction = (
        "You are a helpful assistant. Search with 'google_search'."
    )
    return Agent(
        model=MODEL,
        name="google_search_agent",
        instruction=instruction,
        tools=[google_search],
    )

# --- Init Services ---
session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()

# Create session once
session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

agent = create_agent()
runner = Runner(
    agent=agent,
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service
)

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/ask")
def ask_agent(req: QueryRequest):
    content = types.Content(role="user", parts=[types.Part(text=req.query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    final_response = "No response received."
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text
            break

    # Save session to memory after each turn
    session = session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    memory_service.add_session_to_memory(session)

    return {"response": final_response}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Agent with Memory, Run & Demo

Run frontend (app.py):

streamlit run app.py
# or
python -m streamlit run app.py

Run backend (agent.py):

uvicorn agent:app --host 0.0.0.0 --port 8000
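With the backend up, you can also smoke-test the /ask endpoint without the Streamlit UI (a quick sketch; the query text is just an example):

import requests

resp = requests.post("http://localhost:8000/ask", json={"query": "What is Google ADK?"})
print(resp.json()["response"])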

Agent With Memory Demo: GIF on GitHub

Agent without Memory Demo: GIF on GitHub

Backend for LiteLLM (for different LLM models)

AWS Bedrock and Ollama models can be reached using LiteLLM.

# agent.py
from fastapi import FastAPI
from pydantic import BaseModel
from dotenv import load_dotenv
from google.genai import types
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, Session
from google.adk.models.lite_llm import LiteLlm
import boto3  # used by LiteLLM under the hood for AWS Bedrock calls

load_dotenv()

APP_NAME = "search_memory_app"
USER_ID = "user123"
SESSION_ID = "session123"

## AWS Bedrock sample (enable one create_agent at a time)
def create_agent() -> Agent:
    instruction = (
        "You are a helpful assistant."
    )
    return Agent(
        model=LiteLlm(model="bedrock/meta.llama3-1-405b-instruct-v1:0"),
        name="bedrock_agent",
        instruction=instruction
    )

## Ollama sample (enable one create_agent at a time)
# def create_agent() -> Agent:
#     instruction = (
#         "You are a helpful assistant."
#     )
#     return Agent(
#         model=LiteLlm(model="ollama/llama3.2:1b"),
#         name="ollama_agent",
#         instruction=instruction
#     )

session_service = InMemorySessionService()

session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

agent = create_agent()
runner = Runner(
    agent=agent,
    app_name=APP_NAME,
    session_service=session_service
)

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/ask")
def ask_agent(req: QueryRequest):
    content = types.Content(role="user", parts=[types.Part(text=req.query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    final_response = "No response received."
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text
            break


    return {"response": final_response}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
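For the commented-out Ollama variant, the model has to be pulled locally first; LiteLLM then talks to the local Ollama server (by default at http://localhost:11434):

ollama pull llama3.2:1b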

Agent with LiteLLM, Run & Demo

Run frontend (app.py):

streamlit run app.py
# or
python -m streamlit run app.py

Run backend (agent.py):

uvicorn agent:app --host 0.0.0.0 --port 8000

AWS Bedrock Demo: GIF on GitHub

Ollama Demo On-Prem: GIF on GitHub

Conclusion

In this post, we covered:

  • how to access Google Gemini 2.5,
  • how to implement a Google ADK agent with memory,
  • how to bind the built-in Google Search tool to the agent,
  • how to implement a Google ADK agent that uses different LLMs (Bedrock, Ollama) via LiteLLM, with FastAPI and a Streamlit UI.

If you found the tutorial interesting, I’d love to hear your thoughts in the blog post comments. Feel free to share your reactions or leave a comment. I truly value your input and engagement 😉

For other posts 👉 https://dev.to/omerberatsezer 🧐

Follow for tips, tutorials, and hands-on labs.


Your comments 🤔

  • Which tools are you using to develop AI agents (e.g. Google ADK, AWS Strands, CrewAI, LangChain, etc.)?
  • What do you think about Google ADK?
  • Have you tried implementing memory before?

=> Any comments below related to AI and agents are welcome for brainstorming 🤯


Top comments (5)

Dotallio

Really appreciate how clearly you compared in-memory vs persistent memory with Vertex AI RAG - makes a big difference for scaling to production. Have you run into any challenges keeping memory consistent when switching between LLM backends?

Ömer Berat Sezer

Thanks 😊 I haven’t run into that scenario yet, but now I’m curious to test it. It’s a great idea to evaluate how memory behaves when swapping backends.

Ömer Berat Sezer

I've evaluated several AI agent frameworks including LangChain, CrewAI, AWS Strands, and Google ADK by testing them on a variety of use cases like agent development, multi-agent collaboration, MCP tool integration, support for different language models, and workflow orchestration. Still continue to evaluate Google ADK and AWS Strands by implementing different samples/projects (github.com/omerbsezer/Fast-LLM-Age...).

Nathan Tarbert

Extremely impressive, the practical code and all those memory details make me want to try this myself. Do you think memory will end up as a must-have for agent apps, or does it depend on the use case?

Ömer Berat Sezer

Thanks 😊 Good question, it depends on the use case. For simple tasks (quick tasks like summarizing), memory isn’t needed. But as agents become more autonomous, multi-turn, or personalized, memory becomes a must-have. It’s what enables continuity, context retention, and better decision-making over time.
