Forem: Richard Abishai

Building Context-Aware Agents with LangGraph

Richard Abishai — Mon, 22 Dec 2025 12:30:00 +0000

How to add memory, state, and long-term reasoning to LangGraph agents.

Most AI agents behave like goldfish — they respond only to the last message and forget everything else.

But real intelligence needs memory.

Context changes decisions.

History shapes reasoning.

Today we’ll build a LangGraph agent that:

remembers past interactions
stores its state
adapts reasoning based on context
reads/writes persistent memory
loops intelligently instead of starting fresh every time

Let’s get started.

⚙️ 1. Setup & Installation

Make sure you have LangGraph installed:

pip install langgraph langchain langchain-openai

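No GPU is needed: the model runs behind the OpenAI API, so a laptop or Colab works. Set the OPENAI_API_KEY environment variable before running the examples.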

🧩 2. Idea: Context-Aware Agent Loop

Unlike stateless chatbot calls, a context-aware agent has:

State — what it knows so far

Memory — persistent information across runs

Tools — actions it can take

LLM nodes — thinking steps

In LangGraph, this becomes a state graph:

User → Planner → MemoryCheck → Executor → MemoryUpdate → Planner (loop)

🧠 3. Define the Agent State

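LangGraph state schemas can be a TypedDict, a dataclass, or a Pydantic model. The nodes below read the state with attribute access (state.task), so we'll use a Pydantic model.

from typing import List, Optional
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph

class AgentState(BaseModel):
    history: List[str] = Field(default_factory=list)
    task: Optional[str] = None
    memory: dict = Field(default_factory=dict)
    memory_matches: list = Field(default_factory=list)  # written by the MemoryCheck node
    result: Optional[str] = None                        # written by the Executor node

This is the entire brain of your agent:

history → conversation log
task → current objective
memory → persistent knowledge
memory_matches → stored entries relevant to the current task
result → output of the last action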

🔧 4. Add a Memory Backend (Simple JSON File)

Let’s create a tiny persistent memory store:

import json
import os

MEMORY_FILE = "agent_memory.json"

def load_memory():
    if not os.path.exists(MEMORY_FILE):
        return {}
    # Context managers close the file handle even if parsing fails
    with open(MEMORY_FILE) as f:
        return json.load(f)

def save_memory(memory):
    with open(MEMORY_FILE, "w") as f:
        json.dump(memory, f, indent=2)

You can replace this later with:

Redis

MongoDB

Pinecone vector memory

LangChain storage

But for demo purposes, JSON works beautifully.
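
One weakness of a plain JSON file is that a crash in the middle of a write can leave it half-written. A minimal sketch of a safer variant, assuming memory stays a JSON-serializable dict: write to a temporary file and swap it into place. The names save_memory_atomic and load_memory_from are illustrative, not part of LangGraph.

```python
import json
import os
import tempfile

def save_memory_atomic(memory: dict, path: str) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(memory, f, indent=2)
        os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if anything went wrong
        raise

def load_memory_from(path: str) -> dict:
    """Return the stored dict, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

# Round trip in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    demo_path = os.path.join(d, "agent_memory.json")
    first_load = load_memory_from(demo_path)
    save_memory_atomic({"Find articles on LangGraph": "[Search results ...]"}, demo_path)
    round_trip = load_memory_from(demo_path)

print(first_load, round_trip)
```

Readers never see a partially written file with this pattern, which matters once several agent runs share the same store.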

🧠 5. LLM Nodes (Thinking + Planning)

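from langchain_openai import ChatOpenAI  # pip install langchain-openai

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

PLANNER_PROMPT = (
    "Given the memory and the current user task, "
    "decide: (1) what the user wants, (2) what steps to take next.\n"
    "Memory: {memory}\n"
    "Task: {task}\n"
    "History: {history}\n"
)

def planner(state: AgentState):
    # LangGraph nodes are plain callables: read the state, return a partial update
    prompt = PLANNER_PROMPT.format(
        memory=state.memory, task=state.task, history=state.history
    )
    plan = llm.invoke(prompt).content
    return {"history": state.history + [plan]}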

The planner uses all accumulated context — not just the latest message.

🔧 6. MemoryCheck Node

This step checks whether the agent already knows something relevant:

def memory_check_node(state: AgentState):
    task = state.task or ""
    memory = state.memory

    matches = []
    for key, value in memory.items():
        if key.lower() in task.lower():
            matches.append((key, value))

    return {"memory_matches": matches}

🧰 7. Executor Node (Actions)

A placeholder tool:

def search_tool(query):
    return f"[Search results for '{query}']"

def executor_node(state: AgentState):
    task = state.task
    result = search_tool(task)
    return {"result": result}

You can later replace with:

web scraping

API calls

database lookups

custom tools

📥 8. MemoryUpdate Node

Store new knowledge after each run:

def memory_update_node(state: AgentState):
    memory = load_memory()
    last_result = state.result

    memory[state.task] = last_result
    save_memory(memory)

    return {"memory": memory}

Now your agent gets smarter with every loop.

🔗 9. Build the LangGraph

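from langgraph.graph import END

graph = StateGraph(AgentState)

graph.add_node("planner", planner)
graph.add_node("memory_check", memory_check_node)
graph.add_node("executor", executor_node)
graph.add_node("memory_update", memory_update_node)

graph.add_edge("planner", "memory_check")
graph.add_edge("memory_check", "executor")
graph.add_edge("executor", "memory_update")
# An unconditional edge back to "planner" would cycle until LangGraph's
# recursion limit is hit, so each invoke() stops after one pass.
graph.add_edge("memory_update", END)

graph.set_entry_point("planner")

agent = graph.compile()

This is one pass of the context-aware loop. The persisted memory carries context into the next call, which is where the loop continues.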

🚀 10. Run the Full Demo

state = agent.invoke({
    "history": [],
    "task": "Find articles on LangGraph",
    "memory": load_memory()
})

print(state)

Run it again — and watch memory kick in:

state = agent.invoke({
    "history": ["Hi again!"],
    "task": "Find articles on LangGraph",
    "memory": load_memory()
})

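On the second run, memory_check finds the stored entry and records it in memory_matches. As written, the executor still runs afterwards: to actually skip the search, the graph needs a branch on memory_matches after the memory_check node.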
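
A minimal sketch of such a branch, assuming the state exposes memory_matches as a list of (key, value) pairs like the MemoryCheck node returns. The node name reuse_memory and both functions are hypothetical additions, and the states are plain stand-in objects so the logic can be tried without LangGraph or an API key.

```python
from types import SimpleNamespace

def route_after_memory_check(state) -> str:
    """Return the name of the next node: skip the search when memory has a hit."""
    return "reuse_memory" if state.memory_matches else "executor"

def reuse_memory_node(state) -> dict:
    """Hypothetical node: answer from the first stored match instead of searching."""
    _key, value = state.memory_matches[0]
    return {"result": value}

# Stand-in states for a first (cold) and a repeated (warm) run
cold = SimpleNamespace(task="Find articles on LangGraph", memory_matches=[])
warm = SimpleNamespace(
    task="Find articles on LangGraph",
    memory_matches=[("Find articles on LangGraph", "[Search results for 'Find articles on LangGraph']")],
)

cold_route = route_after_memory_check(cold)
warm_route = route_after_memory_check(warm)
reused = reuse_memory_node(warm)

print(cold_route)  # executor
print(warm_route)  # reuse_memory
print(reused)
```

In the graph, this would replace the fixed memory_check → executor edge: register the extra node with add_node, then call graph.add_conditional_edges("memory_check", route_after_memory_check).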

🧩 Why Context Makes Agents Powerful

Fewer hallucinations — the agent doesn't forget past results

Action optimization — avoids repeating tasks

Long-term workflows — multi-step reasoning over time

Personalization — your agent remembers preferences

Multi-agent cooperation — context is shared across nodes

Context is the difference between an LLM and an agentic system.

🧠 Final Reflection

Building agents is no longer about chaining prompts.
It’s about orchestrating stateful intelligence.

Give your agent:

memory → to recall

state → to reason

structure → to act

And suddenly you’re not just prompting a model —
you’re designing a mind.

CNN vs Transformer – A Visual Comparison

Richard Abishai — Mon, 15 Dec 2025 12:30:00 +0000

How machines learn to see — locally vs globally.

If you’ve ever wondered why Vision Transformers (ViTs) caught up with Convolutional Neural Networks (CNNs) so quickly in computer vision, you’re not alone.

Both models “see” — but they see differently.

Let’s visualize how these architectures process the same image step-by-step, and why attention has changed the way machines perceive the world.

🧩 1. How CNNs See: The Local Lens

A CNN processes an image piece by piece — a mosaic of local patterns.

Each convolution filter slides over pixels (a receptive field)
Early layers learn edges, textures, shapes
Deeper layers combine them into higher-level features (eyes, wheels, leaves)

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU()
)

print(sum(p.numel() for p in cnn.parameters()), "trainable parameters")

Visual metaphor:
→ CNNs are like looking through a microscope — powerful, but only one patch at a time.
Local precision, global blindness.
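
To make "one patch at a time" concrete, here is a small sketch that computes, in plain Python, the parameter count and the receptive field of the stack above. The layer table is transcribed by hand from the nn.Sequential definition; ReLU layers are left out because they change neither number.

```python
# (kind, in_channels, out_channels, kernel, stride) for each layer in the stack above
layers = [
    ("conv", 3, 32, 3, 1),
    ("pool", 32, 32, 2, 2),
    ("conv", 32, 64, 3, 1),
]

total_params = 0
receptive_field = 1  # start from a single input pixel
jump = 1             # distance in input pixels between neighbouring outputs

for kind, c_in, c_out, k, s in layers:
    if kind == "conv":
        # each output channel has c_in * k * k weights plus one bias
        total_params += c_out * (c_in * k * k + 1)
    receptive_field += (k - 1) * jump
    jump *= s

print(total_params)     # 19392
print(receptive_field)  # 8
```

After two convolutions and one pooling step, each output value still depends on only an 8 × 8 window of the input, which is why CNNs need depth to build global context.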

🌍 2. How Transformers See: The Global Canvas

Transformers treat an image as a sequence of patches, not pixels.
Each patch becomes a token, similar to a word in NLP.

Instead of convolutions, a self-attention layer learns which patches matter to each other —
so the model can connect “eye” to “face,” or “wheel” to “car,” even if they’re far apart.

from transformers import ViTModel, ViTImageProcessor
import torch
from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/image_classification.jpeg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# ViTImageProcessor supersedes the deprecated ViTFeatureExtractor
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
print("Hidden state shape:", outputs.last_hidden_state.shape)  # torch.Size([1, 197, 768])

Visual metaphor:
→ ViTs are like seeing from above — every part of the image talks to every other part.
Global awareness, context-rich understanding.
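
The token count follows directly from the patch size. A short worked example, assuming the 224 × 224 input and 16 × 16 patches implied by the model name vit-base-patch16-224:

```python
image_size, patch_size, channels = 224, 16, 3

patches_per_side = image_size // patch_size            # 14
num_patches = patches_per_side ** 2                    # 196
num_tokens = num_patches + 1                           # 197, including the [CLS] token
values_per_patch = patch_size * patch_size * channels  # 768 raw pixel values per patch
attention_entries = num_tokens ** 2                    # 38809 pairwise scores per head, per layer

print(num_patches, num_tokens, values_per_patch, attention_entries)
```

The quadratic attention_entries term is the price of global context: doubling the image side quadruples the token count and multiplies the attention matrix by roughly sixteen.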

🔬 3. Visualizing the Difference

Let’s see this difference side-by-side:

Concept	CNN	Transformer
Vision style	Local → Hierarchical	Global → Relational
Input type	Pixels	Patches
Core operation	Convolution	Self-Attention
Memory	Spatial (fixed window)	Contextual (dynamic)
Inductive bias	Strong (locality, translation equivariance)	Minimal (learns from data)
Pretraining need	Works from scratch	Needs large datasets
Best for	Small data, simple patterns	Complex, global reasoning

🧠 4. Why Transformers Surpass CNNs (Eventually)

Transformers outperform CNNs when:

You have lots of data

You need long-range dependencies

You want to unify vision and language

But CNNs are still valuable: fast, efficient, and great on edge devices.
The real magic is in hybrid architectures that mix convolution and attention (CoAtNet, CvT, etc.).
They combine the sharpness of convolution with the context of attention. Even ConvNeXt, a pure CNN, borrows much of its design from Vision Transformers.

🧪 5. Minimal Code Comparison

Here’s a quick benchmark-style code snippet using PyTorch:

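import torchvision.models as models
import torch

# The weights= argument replaces the deprecated pretrained=True (torchvision >= 0.13)
cnn_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
vit_model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT).eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():  # inference only, no gradients needed
    cnn_out = cnn_model(x)
    vit_out = vit_model(x)

print("CNN output:", cnn_out.shape)
print("ViT output:", vit_out.shape)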

Output:

CNN output: torch.Size([1, 1000])
ViT output: torch.Size([1, 1000])

Same input, same output shape — completely different thought process.

🪞 6. The Philosophy Behind It

CNNs extract meaning.
Transformers connect meaning.

One builds understanding layer by layer.
The other builds it all at once — like a conversation, not a hierarchy.

Deep learning started with perception.
Transformers added awareness.

That’s the real leap.

⚡ 7. The Takeaway

CNNs = strong inductive bias, fast training, efficient on small data

Transformers = flexible reasoning, global context, scalability

Hybrids = the best of both worlds

Both architectures are tools — what matters is when to use which.

Use CNNs when your world is small.
Use Transformers when your world is connected.

Next Up → “Fine-Tuning Failures and Fixes” — my notes from debugging unstable Transformer training runs.

Fine-Tuning Llama 3 with PEFT

Richard Abishai — Wed, 10 Dec 2025 12:30:00 +0000

Efficient parameter tuning for smarter, faster large language models.

Fine-tuning large models like Llama 3 no longer means retraining billions of parameters.

Thanks to PEFT (Parameter-Efficient Fine-Tuning), we can adapt models for new tasks with minimal compute — and keep the original weights frozen.

Let’s go through the setup, training, and evaluation for fine-tuning a Llama 3 model using Hugging Face’s PEFT library.

⚙️ 1. Environment Setup

First, make sure you’re using Python 3.10+ with GPU access.

pip install torch transformers datasets peft accelerate bitsandbytes

These are the key libraries:

transformers — base Llama 3 model + tokenizer

datasets — data loading utilities

peft — adapter training framework

bitsandbytes — quantization support for low-memory GPUs

🧩 2. Load Model & Tokenizer

Here’s how to load a base model (Llama 3 – 8B or smaller variant).

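from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Meta-Llama-3-8B"  # gated repo: accept the license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

We’re loading in 8-bit precision to save VRAM and allow fine-tuning on a single GPU, such as a 24 GB RTX 4090 or a data-center card like the A100.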
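
A rough back-of-the-envelope check shows why the precision matters. This sketch assumes about 8.03 billion parameters for the 8B model and counts weights only; activations, LoRA gradients, and optimizer state come on top.

```python
params = 8.03e9  # assumed parameter count for Llama 3 8B

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}

# Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)
weights_gb = {name: params * size / 1e9 for name, size in bytes_per_param.items()}

for name, gb in weights_gb.items():
    print(f"{name:>5}: {gb:.1f} GB")
```

At fp16 the weights alone take roughly 16 GB, which leaves little headroom on a 24 GB card; 8-bit loading halves that.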

🧠 3. Add a PEFT Adapter (LoRA)

The LoRA (Low-Rank Adaptation) method modifies only small matrices inside the model.

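from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Recommended before training on a quantized base model:
# freezes the base weights and prepares the model for stable adapter training
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()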

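This trains well under 1% of the model’s parameters (about 3.4 million of roughly 8 billion for this configuration), which keeps updates lightweight. How much accuracy you retain depends on the task, so evaluate the adapter before relying on it.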
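
The trainable fraction can be estimated by hand. The sketch below assumes the published Llama 3 8B shape (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) and about 8.03 billion total parameters; model.print_trainable_parameters() remains the authoritative number for your setup.

```python
hidden_size = 4096   # assumed model width
num_layers = 32      # assumed number of decoder layers
num_kv_heads = 8     # assumed grouped-query attention key/value heads
head_dim = 128       # assumed per-head dimension
r = 8                # LoRA rank from the config above

q_out = hidden_size              # q_proj maps 4096 -> 4096
v_out = num_kv_heads * head_dim  # v_proj maps 4096 -> 1024

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA adds A (rank x d_in) and B (d_out x rank) next to the frozen weight
    return rank * (d_in + d_out)

per_layer = lora_params(hidden_size, q_out, r) + lora_params(hidden_size, v_out, r)
trainable = per_layer * num_layers
fraction = trainable / 8.03e9
adapter_mb_fp32 = trainable * 4 / 1e6

print(trainable)                  # 3407872
print(f"{fraction:.4%}")          # about 0.04%
print(f"{adapter_mb_fp32:.1f} MB")
```

Raising r or adding more target_modules (for example k_proj and o_proj) scales the count linearly, so the same arithmetic gives a quick budget for other configurations.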

📚 4. Load a Sample Dataset

You can use any dataset from Hugging Face Datasets, or create your own.

from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes")  # simple example dataset

def format_data(example):
    # Tokenize once; for causal LM training the labels are the input ids themselves
    enc = tokenizer(example["quote"], truncation=True, padding="max_length", max_length=128)
    # Mask padded positions with -100 so the loss ignores them
    enc["labels"] = [t if m == 1 else -100 for t, m in zip(enc["input_ids"], enc["attention_mask"])]
    return enc

tokenized_dataset = dataset.map(format_data, remove_columns=dataset["train"].column_names)

🚀 5. Train the Model

We’ll use the Trainer API with LoRA adapters.

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./llama3-peft-demo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    learning_rate=2e-4,
    logging_steps=10,
    save_steps=100,
    fp16=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
)

trainer.train()

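Training modifies only the LoRA parameters. With this configuration that is a few million weights, so the saved adapter is measured in megabytes rather than gigabytes, which makes training fast and cheap.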

🔎 6. Evaluate and Generate

After training, test the result. The LoRA adapter is still attached to the in-memory model, so you can generate directly.

model.eval()
prompt = "In the next decade, AI will"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

To reuse the adapter in a fresh session, load the base model as in step 2 and attach the saved adapter directory with PeftModel.from_pretrained(base_model, adapter_dir).

💾 7. Save & Upload to Hugging Face Hub

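# On a PEFT model this saves only the LoRA adapter weights, not the base model
model.save_pretrained("./llama3-finetuned")
tokenizer.save_pretrained("./llama3-finetuned")

Optionally, share it with the community:

huggingface-cli login
# usage: huggingface-cli upload <repo_id> <local_path>; "your-username" is a placeholder
huggingface-cli upload your-username/llama3-finetuned ./llama3-finetuned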

🧩 Why PEFT Matters

Efficient: updates only a fraction of parameters

Modular: you can swap adapters for different domains

Scalable: fine-tune huge models on affordable hardware

In short: PEFT turns fine-tuning into plug-and-play intelligence.

🌟 Wrap-Up

You just fine-tuned a Llama 3 model in under an hour with minimal compute.
This workflow scales easily to domain-specific tasks — chatbots, summarizers, or research assistants.

Next Up → “Fine-Tuning Failures and Fixes” — how to debug instability, manage catastrophic forgetting, and evaluate your adapters.

Vector Databases 101: FAISS vs Pinecone

Richard Abishai — Wed, 03 Dec 2025 12:30:00 +0000

Because even intelligence needs a memory.

When you build an AI system — a chatbot, an agent, or a recommender — there’s one quiet hero behind the scenes: the vector database.

It’s where context lives.

Where similarity replaces keyword search.

And where your model stops guessing — and starts remembering.

In this guide, we’ll unpack what vector databases do, and how to use two of the most popular ones: FAISS and Pinecone.

🧠 What Is a Vector Database?

At its core, a vector database stores embeddings — high-dimensional numerical representations of data (like text, images, or audio).

Instead of matching exact words, it finds items that are semantically close in vector space.

Imagine plotting meaning as coordinates:

“AI” and “Machine Learning” would be near each other.
“Coffee” and “Quantum Physics” would probably not.

This spatial representation lets AI systems perform semantic search, contextual retrieval, and memory-based reasoning.

⚙️ How It Works (In 3 Steps)

Embed your data using a model like text-embedding-ada-002 or sentence-transformers.
Store those vectors in a database built for fast similarity search.
Query by meaning instead of by keyword — the database returns the closest matches.

That’s all a “vector database” really is: a search engine for ideas, not just words.
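
The three steps can be sketched without any database at all. This toy example uses hand-written 3-dimensional vectors in place of real embeddings (the numbers are invented for illustration) and ranks them by cosine similarity, which is the kind of comparison a vector database performs at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Steps 1 and 2: "embed" and store (toy vectors, not real model output)
store = {
    "AI": [0.9, 0.8, 0.1],
    "Machine Learning": [0.85, 0.9, 0.15],
    "Coffee": [0.1, 0.2, 0.95],
}

# Step 3: query by meaning and return the k closest items
def search(query_vec, k=2):
    ranked = sorted(store, key=lambda name: cosine(query_vec, store[name]), reverse=True)
    return ranked[:k]

top = search([0.88, 0.85, 0.1])
print(top)
```

A real system replaces the toy vectors with model embeddings and the linear scan with an index, but the contract stays the same: vectors in, nearest neighbours out.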

🧩 FAISS — Local and Lightning Fast

FAISS (by Meta AI) is an open-source library for efficient similarity search.

It’s perfect for local, offline, or prototype-scale projects.

🔧 Install FAISS

pip install faiss-cpu
# or if you have GPU support:
# pip install faiss-gpu

💾 Build a Simple FAISS Index

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Step 1: Create embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["AI is evolving fast", "I love space exploration", "Quantum computing is fascinating"]
embeddings = model.encode(texts)

# Step 2: Initialize FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))

# Step 3: Search
query = model.encode(["Machine learning is the future"])
distances, results = index.search(np.array(query), k=2)

print("Results:", [texts[i] for i in results[0]])
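
A note on the metric: IndexFlatL2 ranks by squared Euclidean distance, not cosine similarity. When the embeddings are unit length (for example, by passing normalize_embeddings=True to encode), the two give the same ranking, because the squared distance equals 2 − 2·cos. A small check of that identity in plain Python, using made-up 2-dimensional vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

lhs = squared_l2(a, b)   # what an L2 index compares
rhs = 2 - 2 * dot(a, b)  # the same quantity expressed through cosine similarity

print(lhs, rhs)
```

If your embeddings are not normalized, the rankings can differ, so pick the index type that matches how the embedding model was trained.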

✅ Why FAISS?

Open-source and free

Blazing fast for in-memory search

Great for experimentation and small-scale RAG (retrieval-augmented generation) pipelines

⚠️ Limitations:

No built-in persistence (you handle saving/loading manually)

Not ideal for distributed or large-scale production systems

☁️ Pinecone — Managed, Scalable, and Production-Ready

Pinecone is a fully managed vector database built for production-scale workloads.
Think of it as FAISS in the cloud — with persistence, scaling, and a beautiful dashboard.

🔧 Install Pinecone Client

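pip install pinecone openai

🧠 Setup and Store Embeddings

from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# Initialize Pinecone (current SDKs use a client object instead of pinecone.init)
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create index
index_name = "demo-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # output size of text-embedding-ada-002
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # example cloud and region
    )

index = pc.Index(index_name)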

# Create embeddings using OpenAI
client = OpenAI(api_key="YOUR_OPENAI_KEY")
texts = ["AI agents are transforming automation", "Space inspires technology", "Machine learning changes industries"]

embeddings = [client.embeddings.create(model="text-embedding-ada-002", input=t).data[0].embedding for t in texts]

# Upsert to Pinecone
ids = [f"vec{i}" for i in range(len(texts))]
index.upsert(vectors=list(zip(ids, embeddings)))

# Query
query_text = "Automation through intelligent agents"
query_emb = client.embeddings.create(model="text-embedding-ada-002", input=query_text).data[0].embedding

results = index.query(vector=query_emb, top_k=2, include_metadata=True)
print(results)

✅ Why Pinecone?

Persistent, reliable, and distributed

Built-in metrics and dashboards

Works beautifully with LangChain and OpenAI

⚠️ Limitations:

Paid for large workloads

Requires API setup and internet access

🔗 Integrating With LangChain

LangChain makes it easy to swap FAISS or Pinecone as your backend.

# pip install langchain-community langchain-openai langchain-pinecone
from langchain_community.vectorstores import FAISS
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# For FAISS (local)
faiss_store = FAISS.from_texts(["AI", "Machine Learning"], embeddings)

# For Pinecone (cloud); reads PINECONE_API_KEY from the environment
pinecone_store = PineconeVectorStore.from_texts(
    ["AI", "Machine Learning"],
    embeddings,
    index_name="demo-index"
)

Now your agent or RAG pipeline can retrieve context dynamically during conversations or workflows.

⚡ FAISS vs Pinecone — Quick Comparison

Feature	FAISS	Pinecone
Type	Local Library	Managed Cloud Service
Persistence	Manual save/load	Automatic
Scalability	Single-machine	Clustered
Cost	Free	Pay-as-you-scale
Best For	Prototyping & Research	Production-grade AI apps

🪐 When to Use Each

Use FAISS if you’re experimenting, running local notebooks, or building a personal project.

Use Pinecone if you’re deploying agents, chatbots, or any system that needs long-term, shared memory.

In most modern stacks, developers start local (FAISS), then scale to managed (Pinecone) once the system matures — it’s the same philosophy as model development itself.

🧩 Reflection

Every intelligent system needs memory — something to connect yesterday’s learning to today’s question.
Vector databases are that memory layer.

They don’t make your models smarter.
They make them aware.

💡 Try both setups.

Clone the examples above, plug them into your LangChain or LangGraph agents, and see which fits your workflow best.

Next Up → Fine-Tuning Llama 3 with PEFT
Your model can now remember.
Next, let’s teach it to specialize.

Build Your First LangGraph Agent

Richard Abishai — Mon, 24 Nov 2025 11:30:00 +0000

Make your models act, not just think.

We’ve built transformers. We’ve run inference.

Now it’s time to give our models something new — agency.

LangGraph is a framework that helps you build graph-based agentic systems.

Instead of chaining prompts in a line, LangGraph lets your agents reason, loop, and decide what to do next.

In this guide, we’ll build a minimal agent that plans a task, performs it, and summarizes the result.

🧩 1. Install LangGraph and Dependencies

LangGraph builds on top of LangChain, so install both.

pip install langgraph langchain langchain-openai

(If you’re using Hugging Face or Anthropic models, add their SDKs too.)

⚙️ 2. Create a Basic Workflow File

Create a file named simple_agent.py and start with imports:

from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

LangGraph organizes logic as a directed graph:
each node is a function that reads the shared state and returns an update. Inside it you can call a model, a tool, or any other code.

🧠 3. Define the Nodes (Agent Logic)

Let’s build two small nodes: one that plans, another that acts.

# Shared state passed between nodes
class AgentState(TypedDict):
    goal: str
    plan: str
    result: str
    steps: int

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

# LLM node for planning
def planner(state: AgentState):
    prompt = (
        "Plan how to complete the user's goal step by step.\n"
        f"Goal: {state['goal']}\n"
        f"Feedback so far: {state.get('result', 'none')}"
    )
    return {"plan": llm.invoke(prompt).content, "steps": state.get("steps", 0) + 1}

# Tool node for execution
def search_tool(query: str) -> str:
    # Placeholder for real API calls or DB lookups
    return f"Search results for: {query}"

def executor(state: AgentState):
    return {"result": search_tool(state["goal"])}

🔗 4. Connect the Nodes (The Graph)

Now, link how the information flows.

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("executor", executor)

graph.add_edge("planner", "executor")     # plan → execute
# feedback loop: return to the planner once, then stop
graph.add_conditional_edges("executor", lambda s: "planner" if s["steps"] < 2 else END)
graph.set_entry_point("planner")         # start at planner
agent = graph.compile()

This structure allows iterative reasoning: the planner can refine its plan using feedback from execution, and the step cap keeps the loop from running forever.

🚀 5. Run the Agent

if __name__ == "__main__":
    result = agent.invoke({"goal": "Find three recent AI papers on LangGraph."})
    print(result)

Run:

python simple_agent.py

You’ll see the final state printed: the goal, the model's plan, and the result returned by search_tool.

⚡ 6. Add Memory (Optional)

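Agents get smarter with memory.
LangGraph persists state between calls through a checkpointer passed at compile time.

from langgraph.checkpoint.memory import MemorySaver

agent = graph.compile(checkpointer=MemorySaver())
# Calls that share a thread_id resume from the same saved state
config = {"configurable": {"thread_id": "demo"}}

Pass config as the second argument to agent.invoke. Each call on the same thread can then pick up the previous steps, which is useful for long tasks or multi-turn interactions.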

🧠 7. Why Graphs Beat Chains

Traditional LangChain flows are linear — A → B → C.
LangGraph introduces feedback and branching, which enables:

Re-planning after failure
Parallel node execution
Conditional routing
Multi-agent collaboration

It’s intelligence with structure.
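
The idea behind conditional routing and re-planning fits in a few lines of plain Python. This is a toy runner rather than LangGraph itself: each node mutates a shared state and returns the name of the next node, and the tool is a hypothetical one that fails on its first attempt.

```python
def plan(state):
    """Planning node: record a new attempt, then hand over to the action node."""
    state["attempts"] += 1
    state["plan"] = f"attempt {state['attempts']}"
    return "act"

def act(state):
    """Action node: a hypothetical tool that only succeeds from the second attempt."""
    state["ok"] = state["attempts"] >= 2
    return "done" if state["ok"] else "plan"  # conditional routing: re-plan on failure

NODES = {"plan": plan, "act": act}

def run(entry, state, max_steps=10):
    """Follow the edges returned by each node until 'done' or the step cap."""
    node, trace = entry, []
    while node != "done" and len(trace) < max_steps:
        trace.append(node)
        node = NODES[node](state)
    return trace

state = {"attempts": 0}
trace = run("plan", state)
print(trace)  # ['plan', 'act', 'plan', 'act']
```

A linear chain would have stopped after the first failed action; the graph loops back, re-plans, and succeeds, while max_steps guards against an endless cycle.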

🧰 8. Full Example (Minimal Agent)

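from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class AgentState(TypedDict):
    goal: str
    plan: str
    result: str
    steps: int

llm = ChatOpenAI(model="gpt-4o-mini")

def search_tool(q):
    return f"Mock result for {q}"

def planner(state: AgentState):
    reply = llm.invoke(f"Plan steps to achieve: {state['goal']}")
    return {"plan": reply.content, "steps": state.get("steps", 0) + 1}

def executor(state: AgentState):
    return {"result": search_tool(state["goal"])}

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_edge("planner", "executor")
# Loop back to the planner once, then finish
graph.add_conditional_edges("executor", lambda s: "planner" if s["steps"] < 2 else END)
graph.set_entry_point("planner")

agent = graph.compile()
result = agent.invoke({"goal": "List three upcoming space missions to Mars."})
print(result)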

🪐 9. Ideas to Extend

Add a web-scraping tool for real data

Integrate a LangChain vector store for context

Wrap your agent in a FastAPI endpoint

Add voice I/O using Whisper or gTTS

You’re not limited to text — LangGraph agents can handle any modular pipeline.

🧩 10. Reflection

Building your first LangGraph agent teaches a deeper lesson:
Intelligence isn’t linear.
Real reasoning loops, adjusts, and re-tries — just like we do.

When you visualize your agent as a graph, you start designing systems that can truly think through problems, not just complete tasks.

Next Up → “The Day I Broke My Model” on Medium — a story about what happens when curiosity meets chaos.
Follow for more tutorials blending AI agents, physics, and automation — the pillars of my Quantum Codecast universe.

Getting Started With Hugging Face Transformers

Richard Abishai — Mon, 17 Nov 2025 11:30:00 +0000

Building your first language model pipeline — the right way.

When I first opened the Hugging Face documentation, it felt like stepping into a library that spoke every language of intelligence.

Thousands of models, endless tasks — but one philosophy: make state-of-the-art accessible.

If you’ve ever wanted to move beyond using GPTs and start building with them, this is your first step.

Let’s walk through the core building blocks — from installation to generating your own predictions.

⚙️ Step 1: Install the Essentials

pip install transformers torch sentencepiece

If you’re working in Colab, add --upgrade to avoid dependency issues.
transformers is the heart, torch runs the model, and sentencepiece handles tokenization for multilingual models.

🧩 Step 2: Load a Pre-trained Model and Tokenizer

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

These two calls pull a complete, fine-tuned sentiment analysis model and its tokenizer, ready to use out of the box.

💬 Step 3: Run Inference

from transformers import pipeline

nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

result = nlp("I love how machines can actually learn!")
print(result)

Output:

[{'label': 'POSITIVE', 'score': 0.9997}]

That’s it — you’ve just used a transformer.
No training, no dataset, just intelligence on tap.
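
Where does the score come from? The classifier emits one raw logit per label, and the pipeline turns them into probabilities with a softmax. A minimal sketch of that last step, using made-up logits rather than real model output:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.1, 4.0]  # hypothetical values for illustration

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = [{"label": labels[best], "score": round(probs[best], 4)}]

print(result)  # [{'label': 'POSITIVE', 'score': 0.9997}]
```

A score near 1.0 only means the two logits are far apart; it is not a calibrated measure of how certain the model should be.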

🧠 Step 4: Try Another Task

Transformers aren’t limited to sentiment.
You can change "sentiment-analysis" to:

"text-generation"

"question-answering"

"summarization"

"translation"

Example:

from transformers import pipeline

gen = pipeline("text-generation", model="gpt2")
print(gen("Artificial intelligence is", max_length=30, num_return_sequences=1))

🧬 Step 5: Go Deeper — Fine-Tune on Your Own Data

Once you’re comfortable, you can fine-tune models for your domain.

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01
)

With the right dataset and a GPU, you can train your own specialized model — whether that’s for medical text, code summarization, or research papers.

🧩 Reflection

Every transformer you load is more than a model — it’s a distillation of human language, reasoning, and bias into code.
Learning to use them isn’t just about syntax; it’s about understanding how intelligence scales.

If you want to explore how we think about thinking, check out my Medium essay:
👉 Why I Build With Intelligence

It’s the story behind why I started working with AI in the first place.

💡 If this post helped you, leave a ❤️ or comment below.
Follow me for more practical guides on agents, AI systems, and quantum-inspired learning.

Next up → Build Your First LangGraph Agent — where we’ll make these models act, not just think.