In the past three months, two powerful AI agent development frameworks have been released:
- Google Agent Development Kit (ADK)
- AWS Strands Agents
Previously, we implemented agents with the Google Agent Development Kit and learned how to create agent-based applications using local and remote MCP (Model Context Protocol) tools alongside Gemini, FastAPI, and Streamlit. We also introduced AWS Strands Agents and showed how to implement an agent that uses local and remote MCP tools in this post.
In the current post, we’ll dive into the Google Agent Development Kit (ADK) and show how to create agent-based applications with long-term Memory, the Built-In Google Search Tool, and different LLMs (Bedrock, OpenAI, Ollama, etc.) via LiteLLM, using Gemini 2.5, FastAPI, and a Streamlit interface.
Table of Contents
- What is Google Agent Development Kit?
- Motivation: Why Use an Agent Framework?
- ADK Session, Memory Implementation
- Agent App with Memory, LiteLLM using Google ADK, Gemini, FastAPI and Streamlit
- Agent with Memory
- Agent with Different LLM Models
- Agent with BuiltIn Search Tool
- Installing Dependencies & Reaching Gemini Model
- Frontend - Streamlit UI
- Backend for Memory App, Built-in Google Search Tool
- Agent with Memory, Run & Demo
- Backend for LiteLLM (for different LLM models)
- Agent with LiteLLM, Run & Demo
- Conclusion
- References
What is Google Agent Development Kit?
- Agent Development Kit (ADK) is an open-source framework to develop AI agents that run anywhere:
  - VSCode, Terminal
  - Docker Container
  - Google Cloud Run
  - Kubernetes
Motivation: Why Use an Agent Framework?
- Structured workflows: Automates decision-making, tool use, and response generation in a clear loop.
- Session & memory support: Keeps context across interactions for smarter, personalized behavior.
- Multi-agent orchestration: Lets specialized agents collaborate on complex tasks.
- Tool integration: Easily plug in MCP tools, APIs, and functions so that agents know when and how to use them.
- Multi-model flexibility: Use and switch between different LLMs (e.g. GPT, Claude, Nova).
- Production-ready: Built-in logging, monitoring, and error handling for real-world use.
ADK Session, Memory Implementation
We've explored how a session keeps track of the history (events) and temporary state within a single ongoing conversation. But what happens when an agent needs to remember details from earlier conversations or tap into external knowledge sources? That’s where Long-Term Knowledge and the MemoryService become essential.
- Session / State: Like your short-term memory during one specific chat.
- Long-Term Knowledge (MemoryService): Like a searchable archive or knowledge library the agent can consult, potentially containing information from many past chats or other sources.
ADK provides services to manage these concepts:
- SessionService: Manages the different conversation threads (Session objects). Handles the lifecycle: creating, retrieving, updating (appending Events, modifying State), and deleting individual Sessions.
- MemoryService: Manages the Long-Term Knowledge Store (Memory). Handles ingesting information (often from completed Sessions) into the long-term store and provides methods to search this stored knowledge based on queries.
ADK provides different ways to implement this long-term knowledge store:
- InMemoryMemoryService:
  - How it works: Stores session information in the application's memory and performs basic keyword matching for searches.
  - Persistence: None. All stored knowledge is lost if the application restarts.
  - Requires: Nothing extra.
  - Best for: Prototyping, simple testing, scenarios where only basic keyword recall is needed and persistence isn't required.
- VertexAiRagMemoryService:
  - How it works: Leverages Google Cloud's Vertex AI RAG (Retrieval-Augmented Generation) service. It ingests session data into a specified RAG Corpus and uses powerful semantic search capabilities for retrieval.
  - Persistence: Yes. The knowledge is stored persistently within the configured Vertex AI RAG Corpus.
  - Requires: A Google Cloud project, appropriate permissions, necessary SDKs (pip install google-adk[vertexai]), and a pre-configured Vertex AI RAG Corpus resource name/ID.
  - Best for: Production applications needing scalable, persistent, and semantically relevant knowledge retrieval, especially when deployed on Google Cloud.
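As a quick comparison in code, here is a minimal sketch of how the two backends could be wired up. The InMemoryMemoryService usage matches the snippets later in this post; the VertexAiRagMemoryService constructor arguments and the RAG corpus resource name are assumptions based on the ADK docs, so verify them against the current API before use.
from google.adk.memory import InMemoryMemoryService
# from google.adk.memory import VertexAiRagMemoryService  # needs: pip install google-adk[vertexai]
# Prototyping: keyword matching only, everything is lost on restart
memory_service = InMemoryMemoryService()
# Production sketch (assumed arguments, check the ADK docs):
# memory_service = VertexAiRagMemoryService(
#     rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/CORPUS_ID"
# )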
Agent App with Memory, LiteLLM using Google ADK, Gemini, FastAPI and Streamlit
Two small sample projects on GitHub:
- https://github.com/omerbsezer/Fast-LLM-Agent-MCP/tree/main/agents/google_adk/04-agent-memory-builtin-search-tool
- https://github.com/omerbsezer/Fast-LLM-Agent-MCP/tree/main/agents/google_adk/05-agent-litellm-bedrock-ollama
Agent with Memory
- To store data while the app is running, use InMemorySessionService() and InMemoryMemoryService(); a short sketch of letting the agent recall stored memories follows the snippet below.
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, Session
from google.adk.memory import InMemoryMemoryService
session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()
...
runner = Runner(
agent=agent,
app_name=APP_NAME,
session_service=session_service,
memory_service=memory_service
)
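Storing sessions in memory is only half of the picture: the agent also needs a way to query what was stored. ADK exposes a load_memory tool for that (it is imported in the backend below, alongside google_search). Here is a minimal sketch of attaching it to an agent; the agent name and instruction text are my own illustrative choices.
from google.adk.agents import Agent
from google.adk.tools import load_memory

recall_agent = Agent(
    model=MODEL,  # e.g. the Gemini model id defined in the backend
    name="memory_recall_agent",  # illustrative name
    instruction="Answer the user. Use the 'load_memory' tool if earlier conversations may contain the answer.",
    tools=[load_memory],
)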
Agent with Different LLM Models
- To use different LLM models through LiteLLM (e.g. "bedrock/meta.llama3-1-405b-instruct-v1:0").
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
Agent(
model=LiteLlm(model="bedrock/meta.llama3-1-405b-instruct-v1:0"),
# model=LiteLlm(model="ollama/llama3.2:1b"),
name="bedrock_agent",
instruction=instruction
)
Agent with BuiltIn Search Tool
- To use the built-in Google Search tool (google_search) instead of other tools like MCP Serper or Tavily.
from google.adk.tools import load_memory, google_search
from google.adk.agents import Agent
Agent(
model=MODEL,
name="google_search_agent",
instruction=instruction,
tools=[google_search],
)
Installing Dependencies & Reaching Gemini Model
- Go to: https://aistudio.google.com/
- Get an API key to access Gemini.
- Add a .env file with your Gemini and Serper API keys:
# .env
SERPER_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE
- Please install Node.js, npm, and npx on your system if you want to run npx-based MCP tools.
- Please install the requirements:
fastapi
uvicorn
google-adk
google-generativeai
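If you prefer a single command, something like the line below should cover both backends and the Streamlit frontend; note that streamlit, requests, litellm, and boto3 are my additions for the samples in this post and are not part of the original requirements list above.
pip install fastapi uvicorn google-adk google-generativeai streamlit requests litellm boto3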
Frontend - Streamlit UI
# app.py
import streamlit as st
import requests

st.set_page_config(page_title="Agent Chat", layout="centered")

if "messages" not in st.session_state:
    st.session_state.messages = []

st.title("Agent Memory, LiteLLM")

# replay chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

user_query = st.chat_input("Ask to search for real-time data or anything...")

# send and display user + assistant messages
if user_query:
    st.chat_message("user").markdown(user_query)
    st.session_state.messages.append({"role": "user", "content": user_query})
    try:
        response = requests.post(
            "http://localhost:8000/ask",
            json={"query": user_query}
        )
        response.raise_for_status()
        agent_reply = response.json().get("response", "No response.")
    except Exception as e:
        agent_reply = f"Error: {str(e)}"
    st.chat_message("assistant").markdown(agent_reply)
    st.session_state.messages.append({"role": "assistant", "content": agent_reply})
Backend for Memory App, Built-in Google Search Tool
# agent.py
from fastapi import FastAPI
from pydantic import BaseModel
from dotenv import load_dotenv
from google.genai import types
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, Session
from google.adk.memory import InMemoryMemoryService
from google.adk.tools import load_memory, google_search

load_dotenv()

# --- App/Session Config ---
APP_NAME = "search_memory_app"
USER_ID = "user123"
SESSION_ID = "session123"
MODEL = "gemini-2.5-flash-preview-04-17"
# MODEL = "gemini-2.0-flash"

def create_agent() -> Agent:
    instruction = (
        "You are a helpful assistant. Search with 'google_search'."
    )
    return Agent(
        model=MODEL,
        name="google_search_agent",
        instruction=instruction,
        tools=[google_search],
    )

# --- Init Services ---
session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()

# Create session once
session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

agent = create_agent()
runner = Runner(
    agent=agent,
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service
)

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/ask")
def ask_agent(req: QueryRequest):
    content = types.Content(role="user", parts=[types.Part(text=req.query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    final_response = "No response received."
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text
            break

    # Save session to memory after each turn
    session = session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    memory_service.add_session_to_memory(session)

    return {"response": final_response}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
Agent with Memory, Run & Demo
Run frontend (app.py):
streamlit run app.py
or
python -m streamlit run app.py
Run backend (agent.py):
uvicorn agent:app --host 0.0.0.0 --port 8000
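Once the backend is up, you can sanity-check the /ask endpoint directly before opening the Streamlit UI (the query text is just an example):
curl -X POST http://localhost:8000/ask -H "Content-Type: application/json" -d '{"query": "What is Google ADK?"}'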
Agent With Memory Demo: GIF on GitHub
Agent WithOUT Memory Demo: GIF on GitHub
Backend for LiteLLM (for different LLM models)
AWS Bedrock and Ollama models can be accessed through LiteLLM.
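For the Bedrock model below, LiteLLM picks up AWS credentials from the environment, so I add the standard AWS variables to .env as well (the region is an example; use the region where the model is enabled for your account):
# .env
AWS_ACCESS_KEY_ID=PASTE_YOUR_ACTUAL_KEY_HERE
AWS_SECRET_ACCESS_KEY=PASTE_YOUR_ACTUAL_SECRET_HERE
AWS_REGION_NAME=us-east-1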
# agent.py
from fastapi import FastAPI
from pydantic import BaseModel
from dotenv import load_dotenv
from google.genai import types
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, Session
from google.adk.models.lite_llm import LiteLlm
import boto3

load_dotenv()

APP_NAME = "search_memory_app"
USER_ID = "user123"
SESSION_ID = "session123"

## AWS Bedrock sample (enable one variant at a time)
def create_agent() -> Agent:
    instruction = (
        "You are a helpful assistant."
    )
    return Agent(
        model=LiteLlm(model="bedrock/meta.llama3-1-405b-instruct-v1:0"),
        name="bedrock_agent",
        instruction=instruction
    )

## Ollama sample (enable one variant at a time)
# def create_agent() -> Agent:
#     instruction = (
#         "You are a helpful assistant."
#     )
#     return Agent(
#         model=LiteLlm(model="ollama/llama3.2:1b"),
#         name="ollama_agent",
#         instruction=instruction
#     )

session_service = InMemorySessionService()
session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

agent = create_agent()
runner = Runner(
    agent=agent,
    app_name=APP_NAME,
    session_service=session_service
)

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/ask")
def ask_agent(req: QueryRequest):
    content = types.Content(role="user", parts=[types.Part(text=req.query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    final_response = "No response received."
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text
            break

    # Retrieve the session (no memory service is configured in this variant)
    session = session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)

    return {"response": final_response}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
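If you enable the Ollama variant instead of Bedrock, the model has to be available locally first. A minimal sketch (the model tag matches the commented-out agent above; skip ollama serve if the Ollama daemon is already running):
ollama pull llama3.2:1b
ollama serve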
Agent with LiteLLM, Run & Demo
Run frontend (app.py):
streamlit run app.py
or
python -m streamlit run app.py
Run backend (agent.py):
uvicorn agent:app --host 0.0.0.0 --port 8000
AWS Bedrock Demo: GIF on GitHub
Ollama Demo On-Prem: GIF on GitHub
Conclusion
In this post, we covered:
- how to access Google Gemini 2.5,
- how to implement a Google ADK agent with Memory,
- how to use/bind the Built-In Google Search Tool to the agent,
- how to implement a Google ADK agent that uses different LLMs with LiteLLM (Bedrock, Ollama),
all using Gemini, FastAPI, and a Streamlit UI.
If you found the tutorial interesting, I’d love to hear your thoughts in the blog post comments. Feel free to share your reactions or leave a comment. I truly value your input and engagement 😉
For other posts 👉 https://dev.to/omerberatsezer 🧐
Follow for tips, tutorials, and hands-on labs.
References
Your comments 🤔
- Which tools are you using to develop AI agents (e.g. Google ADK, AWS Strands, CrewAI, LangChain, etc.)?
- What do you think about Google ADK?
- Have you implemented Memory before?
=> Welcome to any comments below related to AI and agents for brainstorming 🤯
Top comments (5)
Really appreciate how clearly you compared in-memory vs persistent memory with Vertex AI RAG - makes a big difference for scaling to production. Have you run into any challenges keeping memory consistent when switching between LLM backends?
Thanks 😊 I haven’t run into that scenario yet, but now I’m curious to test it. It’s a great idea to evaluate how memory behaves when swapping backends.
I've evaluated several AI agent frameworks including LangChain, CrewAI, AWS Strands, and Google ADK by testing them on a variety of use cases like agent development, multi-agent collaboration, MCP tool integration, support for different language models, and workflow orchestration. Still continue to evaluate Google ADK and AWS Strands by implementing different samples/projects (github.com/omerbsezer/Fast-LLM-Age...).
extremely impressive, the practical code and all those memory details make me want to try this myself
you think memory will end up as must-have for agent apps or does it depend on the use case
Thanks 😊 Good question, it depends on the use case. For simple tasks (quick tasks like summarizing), memory isn’t needed. But as agents become more autonomous, multi-turn, or personalized, memory becomes a must-have. It’s what enables continuity, context retention, and better decision-making over time.