Forem: Joao Melo

AI That Actually Does Stuff: Autonomous Agents Explained

Joao Melo — Mon, 25 May 2026 06:14:29 +0000

Right now, most AI is basically a hyper-intelligent parrot. You type a prompt, it spits out text, and then it sits there waiting for you to tell it what to do next. It has no initiative. If you want it to plan a vacation, you have to ask for flights, then ask for hotels, then ask for activities, and copy-paste everything yourself. It’s a tool, like a hammer.

Autonomous Agents change that entirely. They don't just talk; they do.

What the Heck is an Autonomous Agent?

Imagine instead of a hammer, you hired a highly capable human assistant. You don't tell them exactly how to move their fingers to type an email. You just say: "Hey, find me a decent flight to Tokyo under $1,000 for next month, book it, and add it to my calendar."

Then you walk away and get a coffee.

An autonomous agent is AI software designed to act like that assistant. You give it a high-level goal, and it figures out the step-by-step plan, uses digital tools, fixes its own mistakes, and completes the task without you babysitting it.

How It Works (The 4-Part Brain)

To understand how an agent functions without losing your mind, think of it as a person working a regular office job. It relies on four main pillars:

The Brain (The LLM): This is the core AI model. It handles the thinking, reasoning, and decision-making.
The Planning: The agent breaks a massive goal into smaller, bite-sized tasks. If a step fails, it loops back, figures out why, and tries a different approach.
The Memory:
- Short-term memory: Keeping track of what it's doing right now in the middle of a task.
- Long-term memory: Remembering your preferences, past choices, and rules over weeks or months.
The Tools: This is the game-changer. An agent isn't locked in a chat box. It can be given "hands" to interact with the real world—like browsing the web, using a calculator, sending emails, or connecting to reservation systems.
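
If you like seeing things in code, the four parts above can be sketched in a few lines of Python. This is a toy, not a real agent: `fake_brain` is a hard-coded stand-in for the LLM, and the tool names (`search_flights`, `add_to_calendar`) and the price are made up for illustration.

```python
from typing import Callable

# The Tools: plain functions the agent is allowed to call (hypothetical examples).
def search_flights(destination: str) -> str:
    return f"cheapest flight to {destination}: $940"

def add_to_calendar(note: str) -> str:
    return f"calendar entry created: {note}"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "add_to_calendar": add_to_calendar,
}

def fake_brain(goal: str, memory: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: picks the next (tool, argument) from goal and memory."""
    if not memory:
        return ("search_flights", goal)
    if len(memory) == 1:
        return ("add_to_calendar", memory[0])
    return ("done", "")

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []        # The Memory: short-term notes for this task
    for _ in range(max_steps):    # The Planning: one decision per loop, with a hard cap
        tool, arg = fake_brain(goal, memory)
        if tool == "done":
            break
        memory.append(TOOLS[tool](arg))
    return memory

result = run_agent("Tokyo")
print(result)
```

A real agent swaps `fake_brain` for a model call, but the shape stays the same: decide, act, remember, repeat.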

The Difference in a Nutshell:

Standard AI: You ask for a recipe. It gives you a text list of ingredients.

Autonomous Agent: You ask for a meal. It checks your fridge, orders the missing groceries online, and sets a timer for dinner.

Wait, Isn't That Just AGI?

The short answer is: No, but it's the closest stepping stone we have.

People often mix up Autonomous Agents and AGI (Artificial General Intelligence). Here is the distinction:

AGI is the ultimate holy grail of computer science. It is an AI that possesses human-level intelligence across everything—it can write poetry, invent a new physics theory, learn to ride a bicycle, and understand human emotions just as well as (or better than) any human. True AGI doesn't exist yet.
Autonomous Agents are highly focused, independent systems that exist today. They use current AI brains to execute complex workflows.

Think of AGI as a fully conscious, living digital human. An autonomous agent is more like an incredibly dedicated, tireless smart-drone running a specific mission for you.

Real-World Examples: From Lazy Text to Real Action

To see how this actually changes your life, let’s look at two everyday scenarios.

Scenario A: Booking a Vacation

Regular AI: You ask for hotel recommendations. It gives you a list of five cool-looking places. You still have to click the links, check availability, compare prices against your budget, and manually type in your credit card info.
Autonomous Agent: You give it a budget of $1,500 and tell it you want a beachfront hotel with a gym for next weekend.
- The agent browses travel sites.
- It filters out places without gyms.
- It checks real-time availability.
- It realizes one hotel is $100 over budget, so it searches for a coupon code online.
- It securely fills out the booking form and texts you: "Found the perfect spot at 15% off. Click 'Confirm' to let me pay for it."

Scenario B: The Customer Service Nightmare

Regular AI: You paste a company’s return policy and ask how to get a refund. It summarizes the text into three bullet points. You still have to write the email and track down the receipt.
Autonomous Agent: You say, "Get me a refund for this broken blender."
- The agent searches your emails to find the digital receipt.
- It opens the company's website and logs into the support portal.
- It drafts a polite but firm complaint letter, attaches the receipt, and submits the ticket.
- It monitors your inbox for a reply. If the company asks for a photo of the damage, the agent pings your phone: "Hey, snap a photo of the blender so I can send it to them and finish this."

The "Uh-Oh" Factor: What Happens When They Fail?

Because these systems operate on their own, they can occasionally lose their minds in hilarious (and terrifying) ways if they aren't built correctly.

The Infinite Loop: You tell an agent to buy a specific shoe. The shoe is out of stock. The agent refreshes the page, sees it’s out of stock, waits a second, and refreshes again... forever. It gets stuck in a digital existential crisis until someone pulls the plug.
The Over-Achiever: You tell an agent to "find the cheapest flight to Paris." It spends three days searching thousands of sketchy, virus-laden forums, automatically signs you up for 42 travel newsletters, and completely fills your inbox with junk just to save you $4.
The Big Spender: If you give an agent unrestricted access to your credit card without a confirmation step, one misread message could result in 500 pounds of premium dog food showing up at your house.
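
The Infinite Loop in particular has a boring, well-known fix: give the agent a budget of attempts. Here is a minimal sketch, assuming a hypothetical `shoe_in_stock` check that never succeeds. The function names and the backoff numbers are illustrative, not taken from any specific agent framework.

```python
import time
from typing import Callable

def retry_with_budget(check: Callable[[], bool], max_attempts: int = 5,
                      base_delay: float = 1.0) -> bool:
    """Call check() until it returns True or the attempt budget runs out."""
    for attempt in range(max_attempts):
        if check():
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff: wait longer after each failed attempt.
            time.sleep(base_delay * 2 ** attempt)
    return False  # give up and report back instead of refreshing forever

attempts = []

def shoe_in_stock() -> bool:
    attempts.append(1)   # record each page refresh
    return False         # the shoe never comes back in stock

found = retry_with_budget(shoe_in_stock, max_attempts=4, base_delay=0.0)
print(found, len(attempts))  # False 4
```

When the budget runs out, the agent stops and tells you, which beats a digital existential crisis.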

The Golden Rule of Agents: Never give an AI agent your wallet without setting a maximum spending limit and forcing it to ask for your final approval before hitting "Buy."
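
In code, the Golden Rule is two checks in front of every purchase. The sketch below is an assumption about how such a guard could look. `PurchaseGuard` and its `approve` callback are hypothetical names, and a real setup would ask you through a phone notification rather than a lambda.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PurchaseGuard:
    max_spend: float                        # hard ceiling across all purchases
    approve: Callable[[str, float], bool]   # asks the human for final approval
    spent: float = 0.0

    def buy(self, item: str, price: float) -> str:
        # Check 1: the spending limit, enforced before the human is even asked.
        if self.spent + price > self.max_spend:
            return f"blocked: {item} would exceed the ${self.max_spend:.2f} limit"
        # Check 2: explicit approval before hitting "Buy".
        if not self.approve(item, price):
            return f"declined: {item}"
        self.spent += price
        return f"bought: {item} for ${price:.2f}"

guard = PurchaseGuard(max_spend=100.0, approve=lambda item, price: True)
print(guard.buy("dog food, 1 bag", 40.0))
print(guard.buy("dog food, 500 lb", 900.0))
```

The first call goes through and the second is blocked by the limit, no matter how confident the agent was.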

The Takeaway

We are rapidly leaving the era where you have to learn how to write the "perfect prompt" to get a computer to do what you want.

In the very near future, you won't use apps by clicking buttons and navigating menus. You will simply talk to your autonomous agents like they are your personal staff, and they will go out into the digital wilderness to wrestle the internet into submission for you.

Deep Dive into OpenCode Agent Orchestration

Joao Melo — Mon, 18 May 2026 15:12:49 +0000

The evolution of AI coding assistants has rapidly shifted from single-prompt chat interactions to autonomous, multi-agent systems. At the forefront of this movement is OpenCode, a terminal-native AI engine built to read, write, test, and debug code directly within your local environment.

While standard AI tools handle simple, isolated edits, solving complex software tickets (refactoring a multi-layered codebase, writing matching integration tests, or updating complex deployment pipelines) requires orchestration. By dividing responsibilities into Agents, Sub-agents, Tools, and Skills, the OpenCode ecosystem offers a blueprint for delegating that kind of work.

The Four Pillars of Agentic Autonomy

To build an automated workflow, you must understand how its core layers pass context and execute logic.

1. Primary Agents (The Project Leads)

Primary agents are the high-level controllers that you interface with during a terminal session. They maintain the overarching goal of the task and map out the step-by-step strategy.

Build Mode: The default primary agent. It operates with full tool privileges (file operations, system terminal access) and is optimized for heavy implementation.
Plan Mode: A restricted primary agent designed for architectural analysis, brainstorming, and code review. File edits and shell commands default to "ask," so it won't alter your filesystem without your approval while plotting a migration strategy.

2. Sub-agents (The Specialized Contractors)

A primary agent's main bottleneck is its context window; loading massive dependency files or heavy documentation can cause the model to lose track of the core objective. Sub-agents are temporary, isolated assistants spun up to execute narrowly focused micro-tasks, each in its own context, so that material never crowds the primary agent's window.

@explore: A lightning-fast, read-only sub-agent built solely to navigate large codebases and locate files or specific structural patterns.
@scout: A dedicated research sub-agent that safely clones external repositories or pulls down upstream documentation into a managed cache, cross-referencing logic without cluttering your local environment.

3. Custom Tools (The Hands)

Tools are the deterministic functions and shell hooks that bridge an agent's reasoning loop with your local system. When an AI generates a structured command, the underlying harness converts it into a concrete action, such as executing local scripts or custom database query checkers stored inside your project's .opencode/tools/ folder.
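
The step where a harness turns a structured command into a concrete action can be sketched generically. This is not OpenCode's implementation or tool API. The JSON shape (`tool`, `args`), the `REGISTRY` dictionary, and the `word_count` tool are all assumptions made for illustration.

```python
import json
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function so the harness can call it by name."""
    REGISTRY[fn.__name__] = fn
    return fn

@tool
def word_count(text: str) -> int:
    # Deterministic: the same input always gives the same output.
    return len(text.split())

def dispatch(raw_call: str) -> dict[str, Any]:
    """Turn a model-emitted JSON tool call into a real function call."""
    call = json.loads(raw_call)
    name, args = call.get("tool"), call.get("args", {})
    if name not in REGISTRY:
        # Unknown names are rejected rather than guessed at.
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": REGISTRY[name](**args)}
    except TypeError as exc:
        # Typically raised when the model sent wrong or missing arguments.
        return {"ok": False, "error": str(exc)}

print(dispatch('{"tool": "word_count", "args": {"text": "read write test debug"}}'))
print(dispatch('{"tool": "rm_rf", "args": {}}'))
```

Returning a structured error instead of raising lets the agent read the failure and try a different call on its next step.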

4. Custom Skills (The Blueprints)

While tools are functional mechanisms, Skills represent specialized domain knowledge. Defined via the cross-platform Agent Skills Open Standard, these are structured SKILL.md markdown files containing YAML frontmatter that teach agents how to execute a workflow according to specific rules.
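
To make the file layout concrete, the sketch below splits a skill file into its frontmatter and its markdown body. It assumes the frontmatter is delimited by `---` lines and holds simple `key: value` pairs; real YAML needs a proper YAML parser. The `release-notes` skill is hypothetical, and this is not how OpenCode itself loads skills.

```python
SKILL_MD = """---
name: release-notes
description: Draft release notes from merged pull requests.
---
# Release notes

1. Group changes by type.
2. Keep each entry to one line.
"""

def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text                  # no frontmatter at all
    end = lines.index("---", 1)          # closing delimiter; ValueError if missing
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        if key.strip():
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body

meta, body = split_skill(SKILL_MD)
print(meta["name"])
print(body.splitlines()[0])
```

The short frontmatter is what lets an agent decide whether a skill is relevant before spending context on the full body.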

Real-World Architecture: The Automated Garage Platform

To see how these four pillars interact outside of theoretical abstractions, consider a real-world project: an Automated Garage & Maintenance Platform. This custom local stack is designed to track vehicle telemetry, manage analytics dashboards on Google Cloud, and orchestrate heavy vehicle detailing logs.

Instead of relying on a generic LLM that might mix up infrastructure code with chemical equations, the environment is orchestrated using custom OpenCode components.

       [User Prompt]
             │
             ▼
     @garage-lead (Agent)
             │
      ┌──────┴────────────────────────┐
      ▼                               ▼
@gcp-provisioner (Sub-agent)    @detailing-planner (Sub-agent)
      │                               │
      ├─► [cloud-native-standards]    ├─► [surface-prep-guidelines]
      │   (Skill)                     │   (Skill)
      │                               │
      └─► terraform_apply()           └─► query_inventory()
          (Tool)                          (Tool)

The Custom Setup

The Custom Agent (@garage-lead): The master coordinator. It is injected with a high-level system prompt via AGENTS.md to understand vehicle diagnostics and infrastructure boundaries, routing incoming tasks to specialized sub-agents.
The Custom Sub-agents:
- @gcp-provisioner: A sub-agent restricted entirely to the /infrastructure directory, tasked with handling cloud deployments.
- @detailing-planner: A domain-specific sub-agent engineered to sequence vehicle restoration, paint protection steps, and chemical ratios.
The Custom Tools: Bespoke local scripts exposed to the AI, including read_obd2_telemetry() (extracting temperature and exhaust errors from a local diagnostic database), query_inventory() (checking active stocks of parts and detailing products), and a restricted terraform_apply() hook.
The Custom Skills: Bound via localized markdown files to inject strict operational guardrails:
- cloud-native-standards: Forces the infrastructure sub-agent to ensure any new cloud services are isolated to internal traffic and use proper service accounts.
- surface-prep-guidelines: Hardcodes strict domain-specific physical rules. It instructs the agent that V-Floc is a neutral pH shampoo (never to be used as an all-purpose cleaner/APC) and explicitly dictates that V-04 and Sinergy are entirely different products, preventing chemical layering hallucinations.
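
A directory restriction like the one placed on @gcp-provisioner comes down to a path check before every file operation. The sketch below shows one way to write that check. The `is_inside` helper and the `/repo` root are hypothetical, and the function only normalizes path strings, so it does not follow symlinks.

```python
import posixpath

def is_inside(root: str, requested: str) -> bool:
    """True if `requested`, resolved against `root`, stays under `root`."""
    root = posixpath.normpath(root)
    # join() discards `root` when `requested` is absolute; normpath collapses "..".
    target = posixpath.normpath(posixpath.join(root, requested))
    return target == root or target.startswith(root + "/")

ROOT = "/repo/infrastructure"
print(is_inside(ROOT, "modules/dashboard/main.tf"))   # True
print(is_inside(ROOT, "../app/secrets.env"))          # False
print(is_inside(ROOT, "/etc/passwd"))                 # False
```

Comparing against `root + "/"` rather than `root` alone keeps a sibling such as `/repo/infrastructure-old` from slipping through.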

The Execution Loop

When a user inputs: "Check the vehicle telemetry, provision a dashboard for the data, and build a detailing checklist for the weekend," the orchestrator runs an autonomous loop in which the two sub-agents work in parallel:

@garage-lead triggers read_obd2_telemetry() to assess the vehicle's cooling and exhaust metrics.
The agent dispatches @gcp-provisioner, which reads the cloud-native-standards skill and uses terraform_apply() to safely stand up an internal analytics dashboard on Google Cloud.
Simultaneously, @detailing-planner wakes up, calls query_inventory() to check available products, pulls the surface-prep-guidelines skill, and generates a precise step-by-step cleaning log—ensuring the interior APC teardown and the neutral pH exterior wash happen in the exact sequence required to avoid material damage.
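
The sequencing of this loop (one tool call first, then two sub-agents at the same time) can be sketched with `asyncio`. The coroutines below are stand-ins that only reuse the names from the example. Their bodies, return values, and the telemetry reading are invented, and nothing here reflects how OpenCode schedules sub-agents internally.

```python
import asyncio

async def read_obd2_telemetry() -> dict:
    return {"coolant_c": 92, "exhaust_errors": 0}   # made-up reading

async def gcp_provisioner(telemetry: dict) -> str:
    await asyncio.sleep(0.01)   # stands in for a slow cloud deployment
    return f"dashboard provisioned for {len(telemetry)} metrics"

async def detailing_planner() -> str:
    await asyncio.sleep(0.01)   # stands in for inventory lookup and planning
    return "checklist drafted from current inventory"

async def garage_lead() -> list[str]:
    # Step 1 runs first because the dashboard depends on the telemetry.
    telemetry = await read_obd2_telemetry()
    # Steps 2 and 3 are independent, so they run concurrently.
    # gather() returns results in argument order, not completion order.
    return await asyncio.gather(gcp_provisioner(telemetry), detailing_planner())

results = asyncio.run(garage_lead())
print(results)
```

The design point is the dependency split: only work that needs an earlier result waits for it, and everything else is dispatched together.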

OpenCode vs. Claude Code: Quick Comparison

For engineering teams evaluating terminal-native automation, OpenCode and Anthropic's Claude Code share similar design principles, but their execution philosophies differ:

| Architectural Component | OpenCode Framework | Claude Code CLI |
| --- | --- | --- |
| Model Integration | Fully model-agnostic; supports 75+ cloud and local engines (Ollama, OpenRouter). | Proprietary ecosystem; heavily optimized for Anthropic's native Claude models. |
| The Skills Standard | Natively implements the cross-platform Agent Skills Open Standard (`SKILL.md`). | Developed the initial Agent Skills Open Standard for progressive token context reduction. |
| Tool Execution | Executes localized scripts, binary hooks, and raw shell commands out of the box. | Integrates heavily with the Model Context Protocol (MCP) to talk to local or remote servers. |
| Task Isolation | Relies on community plugins or wrapper workspaces (e.g., Superset) for multi-branch tasks. | Natively splits complex tasks into parallel executions using automated Git worktrees. |

Summary

Modern software automation is shifting away from simple text completion toward structured agent networks. As demonstrated by the automated garage platform, separating high-level strategy (Primary Agents) from isolated research (Sub-agents), and pairing execution mechanics (Tools) with architectural rules (Skills) allows developers to step back from manual code writing and step into the role of a systems manager over a highly efficient AI factory.