Forem: Patrick Hughes

The AI Whirlwind: Why Your Local Agent Matters More Than Ever

Patrick Hughes — Fri, 22 May 2026 14:45:07 +0000

The AI Whirlwind: Why Your Local Agent Matters More Than Ever

The air is thick with talk of AI. Nvidia's profit hits an astounding $58.3 billion, and giants like OpenAI and SpaceX are reportedly eyeing IPOs. These headlines signal a gold rush, a furious pace of development driven by immense capital. But for us, the engineers building intelligent agents for real-world automation, these grand pronouncements mean something deeper. They point to a crossroads: a future dominated by a few, or an opportunity for us to build something fundamentally different -- AI agents that truly serve users on their own devices.

Beyond the Hype: The Trust Deficit and Data Ethics

Google's supposed "Antigravity" moment, often viewed as a "bait and switch," highlights a persistent challenge: the gap between grand AI promises and the opaque reality of centralized systems. For many, trusting a black-box AI operated by a distant entity remains difficult. This skepticism isn't unfounded.

Consider the growing discussion around whether "AI is just unauthorised plagiarism at a bigger scale." This accusation cuts to the core of trust and intellectual property. When large models are trained on vast, often unverified, datasets, how can we guarantee ethical outputs from our own agents? This isn't just a legal or philosophical issue; it's an engineering responsibility.

For developers building local AI agents, data provenance is paramount. We cannot rely on the same 'train on everything' approach. This means a shift towards smaller, more curated datasets, with a focus on explicit consent and ethical sourcing. It demands more intentional data collection, synthesis, or fine-tuning on domain-specific, verified data, not just indiscriminate scraping. This controlled environment on local hardware gives us, and our users, much-needed visibility and control over the data an agent processes and learns from.

Navigating the Regulatory Current: Policy, Jobs, and Transparency

The world is catching up to AI's impact. Recent news of Governor Gavin Newsom's executive order aimed at AI job loss and former President Trump's plans for AI model oversight underscore a societal awakening to the implications of AI. Concerns range from economic disruption to the broader societal impact of opaque AI systems.

This is where the local AI agent truly shines. When an agent operates entirely on a user's hardware, its actions are inherently more transparent and auditable to that user. There's no remote server making unseen decisions, no cloud provider potentially accessing private data. This personal ownership offers a path to more accountable and trustworthy AI implementations, directly addressing some of the regulatory fears around uncontrollable, centralized AI.

Python 3.15: Your Workbench for Autonomy

While headlines fixate on billion-dollar chip foundries and cloud compute, we're focused on the developer's workbench. Python 3.15, with its less-heralded features, continues to provide incremental improvements that are highly significant for local agent performance and reliability.

Updates to the type system, for example, improve code clarity and reduce runtime errors, which is critical when building complex, autonomous systems. Performance enhancements, even minor ones, translate directly into better responsiveness and lower power consumption for agents running on consumer hardware.

Consider how meticulous typing and asynchronous programming aid in creating dependable, efficient agents:

import asyncio
from typing import TypedDict, Literal

class AgentTask(TypedDict):
    id: str
    action: Literal["process_data", "notify_user"]
    payload: dict

async def execute_task(task: AgentTask):
    # Simulate a task running locally on consumer hardware
    print(f"Agent {task['id']} starting {task['action']}...")
    await asyncio.sleep(0.05) # Small delay to simulate computation/I/O
    if task['action'] == "process_data":
        # Perform local data processing (e.g., image resize, text analysis)
        result_size = len(str(task['payload']).encode('utf-8'))
        print(f"Processed data for {task['id']}. Payload size: {result_size} bytes")
    elif task['action'] == "notify_user":
        # Trigger a local notification or update a UI element
        message = task['payload'].get('message', 'status update')
        print(f"Notified user for {task['id']} about: {message}")
    print(f"Agent {task['id']} finished {task['action']}.")

async def main():
    tasks = [
        AgentTask(id="analytics_agent_01", action="process_data", payload={"log_file": "sensor_data.csv", "lines": 1200}),
        AgentTask(id="notifier_agent_02", action="notify_user", payload={"message": "Daily report generated"}),
        AgentTask(id="data_cleaner_03", action="process_data", payload={"database_entry": "cleanup_needed", "records": 500})
    ]
    await asyncio.gather(*(execute_task(t) for t in tasks))

if __name__ == "__main__":
    asyncio.run(main())

Using TypedDict and Literal (features that have evolved in recent Python versions and continue to improve) ensures that our agent tasks are well-defined and less prone to errors. asyncio allows multiple agent sub-tasks to run concurrently on limited hardware, making efficient use of available resources without needing complex multiprocessing setups. These are the practical considerations that empower us to build truly reliable and efficient agents for local deployment.

The Community Engine: Learning from Flipper One's Journey

The Flipper Zero community's recent call for help to bring Flipper One to life illustrates the spirit of open collaboration necessary for pushing hardware-bound projects forward. Building AI agents for diverse consumer hardware faces similar challenges: optimizing models for different chipsets, managing varied memory footprints, and engineering for low-power operation.

This reminds us that building truly useful, localized agents isn't a solo endeavor. Sharing strategies for model quantization, efficient inference engines, and creative data handling within our community is essential. It's how we collectively advance the state of practical, personal AI.

The Path Forward: True Agent Autonomy

The financial might of Nvidia and the IPO ambitions of OpenAI show the scale of the AI ambition. But they don't dictate the only path. Our path, building agents on personal hardware, is about user empowerment.

It's about creating automated workflows that respect privacy, ensure data control, and operate reliably without external dependency. This is crucial for automation that truly serves the individual, fostering autonomy over our digital lives.

Want to ensure your agents run with integrity and control? Explore tools designed for ethical and efficient agent development. Check out AgentGuard for resources that support transparent and user-centric automation.

The AI world is in flux. While the big players chase valuations, we continue to build, focusing on accountability, user control, and practical engineering. The future of AI is not just about raw power; it's about principled autonomy, starting right on your desktop, smartphone, or any consumer device you control.

Designing for Agency: Building Trustworthy AI Agents in a Shifting World

Patrick Hughes — Thu, 21 May 2026 14:45:12 +0000

Designing for Agency: Building Trustworthy AI Agents in a Shifting World

The air around AI development feels particularly charged right now. We're in what many are calling the "Long Hot A.I. Summer" - a period marked by intense innovation, significant investment, but also considerable turbulence. Recent news cycles have underscored this, from high-stakes legal clashes between industry titans like Elon Musk and OpenAI, to major companies like Meta reassigning thousands of employees to double down on AI initiatives. It's a clear signal: AI is no longer a niche; it's central to the future.

But amidst this whirlwind of competition and corporate maneuvering, a more fundamental question arises for us, the developers building agents on consumer hardware: How do we ensure our AI creations remain reliable, ethical, and, most importantly, trustworthy? How do we design for user agency in a world that often seems to prioritize growth above all else?

This isn't just about building smarter algorithms; it's about building agents that respect users, protect their data, and adapt to their needs without hidden agendas. Let's unpack some recent events and draw practical lessons for our agent development practices.

The Shutterstock Settlement: A Warning on User Control

The recent news about Shutterstock paying $35 million to settle FTC allegations over hard-to-cancel subscriptions offers a potent reminder for anyone building automated systems. It highlights a critical point: user control is paramount.

When we develop AI agents, it's easy to focus on efficiency and automation. But what happens when an agent, designed to automate a task, becomes difficult to stop, modify, or even understand? This is a digital dark pattern in agent form. For agents operating on your personal hardware, this translates directly to:

Clear "Off" Switches: Agents should have easily accessible, unambiguous methods for pausing, stopping, or uninstalling. No hidden menus, no multi-step confirmation loops. It should be as simple as a single click, reminiscent of the clarity clickclickclick.click advocates for user interaction.
Transparent State and Actions: An agent shouldn't be a black box. Users need to understand what the agent is doing, why it's doing it, and what its current status is. This might mean detailed logs, clear UI indicators, or even plain language explanations of its reasoning.
Easy Configuration: Changing an agent's parameters or objectives should be straightforward. If an agent is designed to manage your photos, changing its upload frequency or privacy settings shouldn't require an advanced degree.

Practical Tip: When designing your agent's interface, ask yourself: "Could a non-technical user quickly and easily revoke all permissions or change its primary objective?" If the answer is no, revisit your design.

The FBI and License Plate Readers: Guarding Against Data Overreach

The news that the FBI wants to buy nationwide access to license plate readers raises significant privacy concerns. For AI agent developers, this serves as a powerful cautionary tale about data collection and privacy.

While we build agents primarily for personal use on consumer hardware, the principles remain the same: minimize data collection and maximize privacy. Our agents often interact with sensitive personal data - financial information, schedules, communication patterns.

Here's how to build agents with privacy in mind:

"Privacy by Design" Principle: From the very first line of code, consider data privacy. Only collect data absolutely necessary for the agent's function. If an agent manages your calendar, it likely doesn't need access to your microphone.
Local First: One of the biggest advantages of developing agents for consumer hardware is the ability to keep data local. Emphasize this. Processing data on-device reduces the risk of third-party breaches or government requests for data.
Explicit Consent for External Calls: If your agent must communicate with an external API or service (e.g., for real-time information), ensure the user gives explicit, informed consent for that specific data exchange. They should know what data is leaving their device and why.

Code Insight: Consider data flow diagrams during your agent's design phase. Visually map out where data originates, where it's stored, and where it's transmitted. This helps identify potential privacy weak points.

The AI Encyclical: Ethical Foundations for Agent Behavior

The upcoming presentation of an AI encyclical by Anthropic's co-founder alongside Pope Leo XIV might seem far removed from practical engineering. However, it signifies a growing global discourse around the ethical implications of AI. This isn't just for large language models; it applies to every agent we build, no matter how small.

Ethical considerations translate to practical design choices:

Fairness and Bias: Are your agents making decisions that could inadvertently discriminate? Even simple automation scripts can inherit biases from the data they're trained on or the rules they're given. Regularly audit your agent's outputs for unintended consequences.
Accountability: If an agent makes a mistake, who is responsible? Designing agents with clear logging, explainable decision paths, and the ability to revert actions makes accountability easier to establish.
Beneficence: Does your agent genuinely serve the user's best interest? Or is it subtly nudging them towards actions that benefit something else?

Engineering for the Unforeseen: Adaptability with Lisp Principles

The ongoing volatility in the AI industry - the constant shifts, the rapid pace of development - demands agents that are not just smart, but also highly adaptable. This is where principles from languages like Lisp, highlighted by Hyperpolyglot's deep dive into Common Lisp, Racket, Clojure, and Emacs Lisp, become particularly relevant.

Lisp's power lies in its metaprogramming capabilities and its emphasis on symbolic computation, making it incredibly flexible. While you might not be writing your entire agent in Clojure, the philosophy of building highly introspectable, modifiable, and extensible systems is invaluable.

How do we apply this to our agents on consumer hardware?

Modular Design: Break your agent into small, independent modules. This makes it easier to update, replace, or reconfigure specific components without bringing down the whole system. Imagine swapping out a 'planning' module without touching the 'perception' module.
Clear State Management: Define your agent's internal state clearly and make it observable. When something goes wrong, you want to easily inspect why the agent made a particular decision. This means avoiding deeply nested, opaque logic.
Configuration over Code: Whenever possible, allow behavior to be modified via configuration files or user-defined rules rather than requiring code changes. This empowers users to adapt the agent to their specific needs without needing to be a programmer.

Code Sketch (Conceptual Python example, reflecting Lisp's modifiability spirit):

# Instead of hardcoding behavior
def process_email_legacy(email):
    if "urgent" in email.subject:
        send_notification("High priority email!")
    # ... more complex, hardcoded rules

# Design for configuration and extensibility
class EmailAgentRule:
    def __init__(self, condition_func, action_func):
        self.condition = condition_func
        self.action = action_func

def check_subject_urgent(email):
    return "urgent" in email.subject.lower()

def notify_high_priority(email):
    print(f"Agent: High priority email from {email.sender}!")

def archive_and_label(email):
    print(f"Agent: Archiving and labeling email from {email.sender}.")

# User-defined rules can be loaded dynamically
email_rules = [
    EmailAgentRule(check_subject_urgent, notify_high_priority),
    EmailAgentRule(lambda e: "newsletter" in e.subject.lower(), archive_and_label),
    # ... easily add more rules from a config file
]

def process_email(email, rules):
    for rule in rules:
        if rule.condition(email):
            rule.action(email)
            # An agent should know its current action, making it observable
            print(f"Agent applied rule: {rule.action.__name__}") 

# Example usage:
class MockEmail:
    def __init__(self, sender, subject):
        self.sender = sender
        self.subject = subject

process_email(MockEmail("boss@company.com", "Urgent Project Update"), email_rules)
process_email(MockEmail("marketing@spam.com", "Our weekly newsletter"), email_rules)

This simple conceptual example shows how declarative rules and functional separation make agents more adaptable and their actions more inspectable, aligning with the "design for agency" principle.

Building for the Long Haul on Your Hardware

The AI Summer may bring intense competition and shifting priorities, but for those of us building agents for individual users and consumer hardware, it's a call to focus on fundamental engineering principles. By prioritizing user control, data privacy, ethical decision-making, and architectural adaptability, we build agents that aren't just intelligent, but also truly dependable.

These are the agents that will stand the test of time, proving their worth by respecting their users and delivering consistent value. They are the agents that empower, rather than constrain.

To help ensure your AI agents meet these high standards for security, reliability, and user control, explore our /tools/agentguard resource. It provides guidance and frameworks for building agents that you and your users can truly trust. Embrace the challenge, build with integrity, and let's craft the next generation of truly empowering AI agents.

Your AI, Your Rules: Engineering Agents for Digital Freedom

Patrick Hughes — Thu, 21 May 2026 14:45:09 +0000

Your AI, Your Rules: Engineering Agents for Digital Freedom

Recent headlines paint a vivid picture of a digital world in flux. From a Tennessee man winning a significant settlement after being jailed for a meme, to Meta actively blocking human rights accounts, and European nations pushing for sovereign payment systems - a clear theme emerges: who truly controls our digital lives? As AI increasingly automates decisions and actions, the question of control shifts from platforms to the agents we employ. For those building AI agents, this presents a crucial choice: centralize or empower the individual? For us, the answer is clear: empowering users with AI agents running on their own consumer hardware is not just a technical preference; it's a foundation for digital freedom.

The Shifting Sands of Digital Control

Consider the news: The dramatic takedown order against Anna's Archive demonstrates the fragility of even vast information repositories when they clash with centralized power. Meta's layoffs, while tied to broader economic conditions and AI shifts, remind us that corporate strategies can have significant human impact, shifting the focus of innovation. And as Google refashions its search for the first time in 25 years with AI at its core, the lines between personal query and algorithmic suggestion become blurrier.

These events underscore a fundamental reality: relying solely on centralized services, no matter how convenient, carries inherent risks. Data can be censored, access can be revoked, and privacy can be compromised. This isn't about paranoia; it's about practical engineering choices that safeguard user autonomy in an increasingly complex digital sphere.

Why Build Local AI Agents Now?

Developing AI agents that run directly on consumer hardware-your laptop, your phone, your local server-offers tangible advantages that address these concerns:

Unyielding Autonomy: When an AI agent operates on your device, you are in command. There's no remote server to dictate terms, no platform policy to restrict behavior, and no third party to intercept your data. The Tennessee meme case highlights how even seemingly innocuous digital actions can have profound real-world consequences. A local AI agent, designed to assist you without external oversight, ensures your digital expressions and automations remain truly yours.
Inherent Privacy and Data Security: Your data never leaves your device unless you explicitly choose for it to. This eliminates many common data breach vectors and significantly reduces your digital footprint. As AI models become more adept at processing sensitive information, keeping that processing local becomes paramount. Imagine an agent helping manage personal finances or health data-on-device computation means that sensitive information remains entirely within your control.
Resilience Against Centralized Failures: Centralized platforms, for all their power, are single points of failure. The story of Anna's Archive illustrates this vulnerability. An agent running locally continues to function even if the internet goes down, a cloud service experiences an outage, or a platform changes its APIs. This makes your automation workflows more dependable and less susceptible to external disruptions.
Optimized Efficiency and Cost Control: While cloud AI services are powerful, they come with recurring costs and network latency. For many personal automation tasks, running optimized, smaller AI models directly on consumer hardware can be surprisingly efficient. Modern CPUs and GPUs are more capable than ever of running inference for models like quantized LLMs or specialized vision models. This not only reduces operational costs but also minimizes the energy footprint associated with constant data transfers to distant data centers - a relevant consideration given rising energy costs for data centers.

Engineering for Personal Sovereignty: Practical Considerations

Building effective AI agents on consumer hardware requires a thoughtful approach to engineering. Here's what we focus on:

Model Selection and Optimization: Not every task requires a massive, general-purpose LLM. Identify the core function of your agent and select the smallest, most efficient model capable of performing that task. Techniques like quantization, pruning, and knowledge distillation are critical for adapting larger models to consumer hardware constraints.

# Example: Loading a quantized model for local inference
from transformers import pipeline
from optimum.intel.openvino import OVModelForCausalLM # or similar for other hardware

# Assuming a model fine-tuned for a specific task and quantized
model_path = "./local_quantized_agent_model"
if OVModelForCausalLM.from_pretrained: # Check if OpenVINO compatible
    agent_pipeline = pipeline("text-generation", model=OVModelForCausalLM.from_pretrained(model_path))
else: # Fallback for other hardware/frameworks
    # Load standard model, ensure it's efficient
    agent_pipeline = pipeline("text-generation", model=model_path)

# Agent logic using the local pipeline
response = agent_pipeline("Draft a polite email to cancel a subscription.")
print(response[0]['generated_text'])

On-Device Data Management: Design your agents to store and manage necessary data locally and securely. Use encrypted databases (e.g., SQLite with encryption) for sensitive information. Establish clear policies for what data the agent processes and how it's retained.
Modular and Asynchronous Architectures: Build agents as independent modules that can communicate securely (e.g., via local message queues or IPC). This allows agents to perform tasks concurrently without needing a central cloud orchestrator. An agent for managing calendar invites doesn't need to know the inner workings of an agent drafting emails, but they can exchange information when authorized.
Offline Functionality First: Prioritize capabilities that work without an internet connection. This reinforces resilience and ensures core functionality is always available, a key differentiator from most cloud-dependent services.
User Interface and Control: Provide clear, intuitive interfaces for users to inspect, control, and update their agents. Transparency about an agent's actions and data handling builds trust, which is crucial for personal automation.

Beyond the Hype: Practical Challenges

While the advantages are compelling, building local AI agents isn't without its engineering challenges. Device-specific optimizations are often needed. Managing model updates and ensuring continued compatibility on diverse hardware requires thoughtful design. And while local processing enhances privacy, securing the device itself against physical and software vulnerabilities remains paramount.

This is where intelligent agent management becomes critical. Tools that help you monitor local agent performance, manage their dependencies, and ensure their secure operation are invaluable.

Your Agent, Your Future

The move towards sovereign payment systems in Europe signals a growing desire for financial autonomy. Similarly, the demand for digital freedom means we need AI systems that respect user agency. By engineering AI agents that run on consumer hardware, we are not just building tools; we are building foundations for greater personal control, privacy, and resilience in an unpredictable digital future.

Ready to take control of your AI automation? Explore how you can manage and secure your on-device agents with practical tools and insights at /tools/agentguard.

The Age of Accountable Agents: Building Trust in Your AI Automation

Patrick Hughes — Wed, 20 May 2026 14:45:12 +0000

The Age of Accountable Agents: Building Trust in Your AI Automation\n\nThe air around AI feels different this "Long Hot A.I. Summer." Big tech is pouring billions into development-Elon Musk's legal battles, Meta reassigning 7,000 employees to focus on AI-it's a high-stakes, high-energy environment. But for us, building powerful AI agents on consumer hardware, this moment isn't just about raw computational power or complex models. It's about something more fundamental: trust.\n\nThe recent news cycle offers a stark reminder of the ethical considerations, user control challenges, and privacy implications that come with advanced automation. From a papal encyclical discussing AI's moral implications to significant settlements over hard-to-cancel subscriptions, and even debates around nationwide data collection, the narrative is clear: we're entering the Age of Accountable Agents. And as developers, especially those focused on local, user-centric AI, we have a unique opportunity to lead the charge.\n\n## Trust Through Transparency: The AI Encyclical's Echo\n\nWhen you hear about an Anthropic co-founder discussing AI ethics with the Pope, it's a signal that the impact of our work extends far beyond our terminals. AI agents, by their nature, automate decisions. For these agents to be truly valuable and accepted, they must be transparent.\n\nWhat does transparency mean for an agent running on your hardware? It means:\n\n* Clear Decision Paths: Can a user understand why their agent took a particular action? If your agent automatically categorizes emails, can it explain its reasoning?\n* Auditable Logic: Even if not a full "explanation," the underlying logic should be inspectable. This doesn't mean revealing proprietary secrets, but designing agents where state changes and rule applications are explicit.\n\nConsider an agent designed to manage your smart home devices. Instead of a black box, you could implement a simple logging mechanism:\n\n

python\nclass SmartHomeAgent:\n def __init__(self, name):\n self.name = name\n self.log = []\n\n def act_on_temperature(self, current_temp, desired_temp):\n if current_temp > desired_temp + 2:\n action = "Turning on AC"\n self.log_action(action, f"Current: {current_temp}°C, Desired: {desired_temp}°C")\n # ... actual AC control code\n elif current_temp < desired_temp - 2:\n action = "Turning on Heater"\n self.log_action(action, f"Current: {current_temp}°C, Desired: {desired_temp}°C")\n # ... actual Heater control code\n else:\n action = "No action needed"\n self.log_action(action, f"Current: {current_temp}°C, Desired: {desired_temp}°C")\n return action\n\n def log_action(self, action, details):\n self.log.append(f"[{self.name}] {datetime.now()}: {action} - {details}")\n\n# Usage\nagent = SmartHomeAgent("ClimateControl")\nagent.act_on_temperature(25, 22)\nprint(agent.log)\n

\n\nThis basic logging provides a human-readable trail, fostering trust by showing, not just doing.\n\n## User Autonomy, Not "Hard-to-Cancel": Learning from Shutterstock\n\nThe $35 million settlement Shutterstock faced over difficult subscription cancellations is a potent lesson: users demand control over automated systems. For AI agents, this translates directly to how we design interaction and management. Your agent shouldn't feel like a digital trap.\n\nKey design principles for user autonomy:\n\n* Explicit Opt-in/Opt-out: Clear consent for agent actions and data usage.\n* Easy Pause and Stop: Users must be able to halt or reconfigure an agent's operation immediately.\n* Understandable Configuration: Agent settings should be accessible and intuitive, not buried in obscure files.\n\nThink about how your agent's lifecycle is managed. Here's a conceptual AgentController:\n\n

python\n# pseudo-code for an AgentController\nclass AgentController:\n def __init__(self, agent):\n self.agent = agent\n self._running = False\n\n def start(self):\n if not self._running:\n print(f"Starting {self.agent.name}...")\n self._running = True\n # thread or process start logic for agent.run()\n self.agent.start_service()\n\n def pause(self):\n if self._running:\n print(f"Pausing {self.agent.name}...")\n self._running = False\n self.agent.pause_service()\n\n def stop(self):\n if self._running:\n print(f"Stopping {self.agent.name} permanently...")\n self._running = False\n self.agent.stop_service()\n # Clean up resources\n\n def configure(self, new_settings):\n print(f"Configuring {self.agent.name} with new settings.")\n self.agent.update_settings(new_settings)\n\n# When you're building your agents, consider how these controls are exposed to the user.\n

\n\nFor more effective agent management, especially concerning permissions and operational boundaries on local hardware, check out AgentGuard. It helps you build in these essential controls from the ground up.\n\n## Privacy by Design, Not by Accident: The FBI's Data Ambition\n\nThe FBI's desire for nationwide license plate reader access is a stark reminder of the sheer scale of data collection possible today. For local AI agents, privacy should be a default setting, not an afterthought.\n\nWhen designing your agents, prioritize:\n\n* Local-First Processing: Perform computations and store data on the user's device whenever possible.\n* Data Minimization: Only collect and process the data absolutely necessary for the agent's function.\n* Transparent Data Policies: Clearly communicate what data an agent uses, why, and whether it ever leaves the device.\n\nBuilding agents for consumer hardware gives us a distinct advantage here. We can champion local intelligence and ensure that user data stays private by default, not by policy fine print.\n\n## Architecting for Clarity: The Lisp Connection\n\nThe Lisp family of languages (Common Lisp, Racket, Clojure) are hyperpolyglots for a reason: their power in symbolic computation and metaprogramming encourages clarity in expressing complex logic. While you might not be writing your agent in Emacs Lisp, the principles of clear, inspectable, and modular design are paramount.\n\nAn agent with well-defined modules for perception, decision-making, and action is easier to debug, understand, and, crucially, to trust. Avoid monolithic codebases where an agent's reasoning is opaque.\n\n

python\n# Conceptual Agent Architecture\nclass AgentBrain:\n def __init__(self, perception_module, decision_module, action_module):\n self.perception = perception_module\n self.decision = decision_module\n self.action = action_module\n\n def run_cycle(self, environment_data):\n perceived_state = self.perception.process(environment_data)\n desired_action = self.decision.evaluate(perceived_state)\n self.action.execute(desired_action)\n return desired_action # For logging/traceability\n\n# Each module can have its own transparent logic, making the overall agent's behavior understandable.\n

\n\n## The Intentional Click: Feedback Loops and Refinement\n\nEven a seemingly simple site like clickclickclick.click can serve as a quirky reminder of direct user interaction. How do your agents confirm intent? How do they solicit feedback effectively? It's not about mindlessly automating every single interaction, but about designing clear, intentional communication channels between the user and the agent.\n\nConsider points where your agent might ask, "Did I do that correctly?" or "Is this what you intended?" rather than just assuming. This explicit feedback loop refines the agent's understanding and reinforces the user's sense of control.\n\n## Building for a Trustworthy AI Future\n\nThe AI revolution is here, and it's happening everywhere, from the largest data centers to the devices in our pockets. As developers crafting AI agents for consumer hardware, we stand at a critical juncture. We have the unique opportunity-and responsibility-to build agents that are not just intelligent and efficient, but also trustworthy, accountable, and respectful of user autonomy and privacy.\n\nThis summer's "AI gold rush" shouldn't just be about speed; it should be about quality, ethics, and user-centric design. By focusing on transparency, control, and privacy by default, we can ensure our AI agents truly empower, rather than overwhelm, the people who use them.\n\nTo help manage these critical aspects of your agent's lifecycle, from permissions to operational safety, explore AgentGuard. It's designed to support you in building the next generation of conscientious AI automation. Start building agents that earn trust, today.\n

Securing Your AI Agents: Essential Practices for On-Device Automation

Patrick Hughes — Wed, 20 May 2026 14:45:08 +0000

Securing Your AI Agents: Essential Practices for On-Device Automation

The "Long Hot A.I. Summer" is upon us, as one New York Times headline aptly put it. With major industry shifts like Meta reassigning 7,000 employees to focus on AI and high-profile legal battles shaping the future of foundational models, the pace of innovation is accelerating. As AI models become more capable, the discussion quickly moves from raw intelligence to practical application: building autonomous AI agents that deliver real value. For those of us building these agents to run efficiently and privately on consumer hardware, recent news serves as a critical reminder of two core tenets: security and efficiency.

The Imperative of On-Device Security

The cloud has been the default for many AI applications, offering seemingly infinite scale. However, relying solely on remote infrastructure comes with inherent risks. The recent CISA Admin leak of AWS GovCloud keys on GitHub is a stark, public reminder that even organizations with top-tier security face vulnerabilities. For our AI agents, especially those handling personal data or interacting with sensitive systems, entrusting everything to a third-party cloud provider introduces a control gap.

This is where on-device AI agents truly shine. By running agents directly on your hardware, you retain control over the physical and logical environment. Building secure agents starts with a strong foundation. Think of the principles behind operating systems like OpenBSD 7.9, known for its "secure by default" philosophy and rigorous code auditing. While we might not be building entire operating systems, we can apply similar principles to our agent deployments:

Isolation and Sandboxing: Each agent or critical component should operate within its own confined environment. Tools like Docker containers, lightweight VMs, or even OS-level chroot jails can isolate an agent's processes, file system access, and network interactions. This limits the blast radius if one agent component is compromised.

# Conceptual example: Running an agent process securely
import subprocess
import os

def run_isolated_agent_task(script_path, environment_vars=None):
    # Define a restricted environment
    env = os.environ.copy()
    if environment_vars:
        env.update(environment_vars)

    # Basic sandboxing: ensure the script can only access specific paths
    # More advanced solutions involve Docker or chroot for stronger isolation
    command = ["python3", script_path]
    try:
        result = subprocess.run(
            command,
            env=env,
            check=True,
            capture_output=True,
            text=True,
            timeout=60 # Agent tasks should have time limits
        )
        print("Agent output:", result.stdout)
        if result.stderr:
            print("Agent errors:", result.stderr)
    except subprocess.CalledProcessError as e:
        print(f"Agent task failed: {e}")
        print("Stderr:", e.stderr)
    except subprocess.TimeoutExpired:
        print("Agent task timed out.")

# Example usage:
# create a simple agent_task.py that writes to a specific file or performs a calc
# run_isolated_agent_task("path/to/your/agent_task.py", {"AGENT_MODE": "SECURE"})

Principle of Least Privilege: An agent should only have the minimum permissions necessary to perform its designated tasks. If an agent only needs to read a specific directory, it should not have write access to the entire file system. This applies to API keys, network access, and system commands.
Secure Communication: For agents that do need to interact with external services, ensure all communication is encrypted (HTTPS, SSH). Avoid storing API keys directly in code; use secure environment variables or dedicated secret management tools.

These practices are not just for large enterprises; they are fundamental for anyone building automation that operates autonomously on their hardware.

Efficiency and Cost: The Local Advantage

Beyond security, the economics of AI are shifting. News reports about rising energy costs and data centers being at the heart of bids for energy companies highlight a significant trend: cloud computing is becoming more expensive, both financially and environmentally. Each query sent to a remote LLM incurs a cost, and that cost accumulates quickly.

Running AI agents on your own consumer hardware offers a compelling alternative:

Reduced Operational Costs: Once you've invested in your hardware, the operational costs for running local agents are primarily electricity, which is often far cheaper than continuous cloud API calls, especially for frequent or repetitive tasks.
Environmental Responsibility: Decreasing reliance on massive, energy-intensive data centers contributes to a smaller carbon footprint.
Instantaneity and Data Locality: Processing data locally removes network latency and ensures sensitive information never leaves your device, enhancing both speed and privacy. Apple's "Apple Intelligence" announcements underscore a future where powerful AI capabilities are deeply integrated and run on-device, prioritizing user data privacy and local processing power.

Optimizing models for consumer hardware (e.g., using quantization, smaller models, or specialized runtimes like ONNX Runtime, OpenVINO, or Apple's Core ML) is a key engineering challenge. It requires careful selection of models that balance capability with resource constraints, ensuring your agents can perform their tasks effectively without bogging down your system.

Practical Agent Engineering for the "Long Hot AI Summer"

As the AI space evolves rapidly, exemplified by major players like Meta reorienting thousands of employees towards AI development, the focus for engineers building agents must be on practical implementation and reliability. It's not enough for an agent to be intelligent; it must be dependable and resilient when operating independently on your devices.

Consider these engineering points:

Clear Task Definition: Define the precise scope and goals of your agent. Avoid mission creep. A well-defined task makes it easier to test, monitor, and secure.
Error Handling and Recovery: What happens if an external API fails? If an expected file isn't found? Agents need thorough error handling, retry mechanisms, and graceful degradation strategies to maintain operations.
Monitoring and Logging: Even on local hardware, you need visibility. Implement clear logging (e.g., to local files, system logs) for agent actions, decisions, and any encountered errors. Monitoring resource usage (CPU, RAM, GPU) helps identify inefficiencies or runaway processes.
Version Control for Agents and Models: Treat your agent code and the models it uses like any other critical software. Use Git for version control, allowing you to track changes, revert to stable versions, and collaborate effectively.

Building AI agents for personal and professional automation on consumer hardware is not just a technical challenge; it's an opportunity to build more private, efficient, and user-controlled systems. It requires a thoughtful approach to engineering, with security and efficiency at its core.

Ready to build your own secure, autonomous AI agents? Explore tools and practices that put control back in your hands. Check out AgentGuard for resources designed to help you develop reliable and private AI automation on your local systems.

Conclusion

The dynamics of the AI world are shifting. From corporate realignments to increasing energy costs and critical security incidents, the environment demands a pragmatic approach to AI agent development. By prioritizing on-device security, optimizing for efficiency, and adopting rigorous engineering practices, we can build a future where AI agents empower us with intelligent automation that is truly ours, operating securely and effectively right where we need it.

Decoding the AI Summer: Building Accountable Agents for the User

Patrick Hughes — Tue, 19 May 2026 14:45:10 +0000

Decoding the AI Summer: Building Accountable Agents for the User

The air in the AI world is thick with change. Recent headlines paint a vivid picture of a technology in flux: colossal legal battles, major corporate reassignments, and even high-level discussions on AI ethics reaching the Vatican. This isn't just a moment of rapid advancement; it's what some are calling the "Long Hot A.I. Summer" - a period demanding vigilance, adaptability, and above all, a renewed focus on the user.

As developers building AI agents on consumer hardware, this climate presents both immense opportunity and significant responsibility. While the giants clash and redirect their vast resources, we have the unique position to craft agents that truly serve, protect, and empower the individual. But how do we build agents that aren't just smart, but also trustworthy and accountable in this fast-evolving environment?

The Shifting Sands of AI Development

Elon Musk's recent lawsuits against OpenAI, and the subsequent "takeaways" from those blockbuster trials, underscore the intense competition and often unpredictable nature of the AI industry. These legal clashes laid bare the commercial interests and philosophical divides at the core of today's AI development. Simultaneously, Meta reassigning 7,000 employees to focus squarely on AI sends a clear message: AI is now center stage for major players, demanding a reorientation of entire corporate structures.

For us, working with agents on local hardware, these shifts highlight a critical advantage: independence. While cloud-based AI can be subject to corporate whims, API changes, and shifting service models, an agent running on your device offers a degree of stability and control that centralized systems simply cannot match. This independence, however, comes with its own imperative: we must ensure these agents operate with the highest ethical standards, directly by and for the user.

Beyond Black Boxes: The Imperative for Ethical Agents

The conversation around AI isn't confined to boardrooms and courtrooms. The news of Anthropic's co-founder joining Pope Leo XIV to present an AI encyclical, "Magnifica Humanitas," signals a global, philosophical engagement with AI's profound implications. It's a call for AI to serve humanity, not just generate profits or enhance surveillance.

Contrast this with situations like Shutterstock's $35 million settlement over hard-to-cancel subscriptions. This isn't an AI story directly, but it's a powerful reminder of how opaque systems and design choices can erode user trust. When a user feels trapped or misled, it damages the entire relationship. This principle applies directly to AI agents: if an agent operates without transparency or clear user control, it risks repeating these same trust-breaking patterns.

Even more concerning are proposals like the FBI's desire to buy nationwide access to license plate readers. This demonstrates the inherent tension between convenience, data collection, and individual privacy. As AI agents become more capable of gathering and acting on data, we, as developers, must be incredibly intentional about safeguarding user privacy and preventing any potential for misuse.

For agents running on consumer hardware, we are the gatekeepers of user trust. Our responsibility is to build agents that are inherently observable, accountable, and designed with user agency at their core. This means moving beyond the idea of an AI agent as a black box and towards a transparent, collaborative partner.

Engineering Observable and User-Centric Agents

How do we translate these ethical imperatives into practical engineering decisions for AI agents on consumer hardware? It starts with a commitment to clarity and control.

Transparent State and Action Logging:
Every significant decision, data interaction, or workflow step an agent performs should be logged locally and made accessible to the user. This isn't just for debugging; it's for building trust. Imagine an agent automating a website interaction, much like the dynamic observation required on a site like clickclickclick.click. Instead of just executing a 'click,' a truly accountable agent logs: "Detected 'Proceed to Checkout' button. Preparing to click. User granted permission for this action at 10:35 AM." This provides an audit trail and insight into the agent's reasoning.

User Confirmation and Override Hooks:
For any high-impact action-be it a financial transaction, data deletion, or sending sensitive information-the agent should pause and explicitly request user confirmation. This can be a simple notification on the user's device, providing a moment for review and the option to override.

def confirm_action(action_description: str) -> bool:
    print(f"Agent requires confirmation for: {action_description}")
    user_input = input("Proceed? (yes/no): ").strip().lower()
    return user_input == 'yes'

# Example usage in an agent workflow
if agent.detects_purchase_opportunity():
    if confirm_action("Initiating purchase of item X for $Y"):
        agent.execute_purchase()
    else:
        agent.log("User declined purchase action.")

Granular Permissions and Sandboxing (Locally):
Since we're building on consumer hardware, we have direct control over the execution environment. Design your agents with the principle of least privilege. Grant only the necessary system permissions, and explore OS-level sandboxing features, virtual environments, or containerization to limit the agent's scope and prevent unintended side effects.
User-Adjustable Guardrails:
Empower users to define the boundaries of their agents' behavior. This could involve simple settings like "Only automate tasks between 9 AM and 5 PM" or "Never spend more than $50 without explicit approval." These user-configurable constraints allow individuals to tailor agent autonomy to their comfort level, ensuring the agent remains a tool, not a master.

The Path Forward: Trust Through Engineering

The "Long Hot A.I. Summer" is a period of intense growth and significant ethical discourse. For us, building AI agents on local hardware, it's a powerful affirmation of our mission: to create intelligent automation that is not only powerful but also transparent, accountable, and deeply respectful of user agency and privacy. By focusing on observable actions, user confirmation, and controlled environments, we can build a future where AI agents are truly extensions of the user's will, fostering trust through superior engineering.

Want to ensure your agents are not just smart, but trustworthy and user-controlled? Explore tools designed for building secure and observable agents on your hardware. Check out our resources at /tools/agentguard to learn more about protecting user privacy and ensuring ethical agent operation.

BMD HODL devlog - week of 2026-05-17

Patrick Hughes — Tue, 19 May 2026 14:45:07 +0000

BMD HODL devlog - week of 2026-05-17

From 2026-05-10 through 2026-05-17, the biggest move was getting stricter about proof. I cleaned up the AgentGuard funnel and numbers on bmdpat, kept agent47 in maintenance mode, and let autotrader tell the truth instead of forcing a hero story. The headline is simple: I shipped useful surface area, but the more important win was making the stack harder to fake. AgentGuard still has no real external pull yet. Autotrader still trails passive. That is useful because it keeps me pointed at the real problems.

What shipped

bmdpat

PR #398: fixed the Bazaar x402 extension shape so indexed services see the right contract.
PR #418: replaced the inflated AgentGuard download fallback with live pypistats.
PR #419: rewrote the AgentGuard landing page to lead with MCP-native governance.
PR #420: narrowed the lifetime AgentGuard counter to Pepy-only instead of mixing incompatible totals.
PR #421: fixed blog excerpt metadata.

agent47

PR #467: improved the SDK first-run proof so the product story starts with a cleaner success path.
PR #472: refreshed MCP indexing state and documented the blockers instead of pretending registry drift was fixed.

autotrader

PR #16: added the CBRS post-IPO watchlist update for the paper-only V2 book.
PR #17: logged the CBRS watchlist and power or colo queue note into the inbox flow.
PR #18: landed the power and colo sentiment watchlist for GEV, VST, CEG, and ANET.
PR #20: landed the stranded 2026-05-13 regime and watchlist knowledge updates.

What I learned

stratechery-inference-shift plus tomtunguz-localmaxxing: local inference economics are now part of the product thesis, not side research.
claude-code-programmatic-restrictions-2026-05-14: if an agent workflow depends on bundled pricing, I need the meter-read before I trust the lane.
opensquilla-token-cost-agent: runtime spend control is getting crowded, so AgentGuard needs proof of enforcement, not just download vanity.

Numbers

Autotrader: combined book +6.4% as of 2026-05-16, with -5.1% alpha vs SPY on stocks and -10.3% alpha vs BTC on crypto.
Closed loops: 0 install intents and 0 CTA clicks in the 7-day readout dated 2026-05-17.
AgentGuard PyPI: 483 installs over 7 days in the 2026-05-17 focus review. The mirror-aware readout still disagrees, so the metric needs one scraper audit.

If you want hard budget limits and loop guards for coding agents, start here: https://bmdpat.com/tools/agentguard

I gave an autotrader $360 and 30 days. I am not adding live money yet.

Patrick Hughes — Fri, 15 May 2026 21:09:29 +0000

I gave an autotrader $360 and 30 days. I am not adding live money yet.

On May 14 I ran the kill-switch review on the live autotrader.

The decision is simple.

Keep V2 paper-only. Add no new live money. Revisit after the next scorecard.

That is not a dramatic kill. It is the boring version of discipline. The live book can keep being watched, but the next tranche does not go in just because I built the thing.

This is part of BMD HODL, the one-person AI-operated holding company I run nights and weekends. The cannon for this quarter says watch first, document everything, and decide from the rule instead of the sunk cost.

Today was the rule.

The setup

Two live accounts. Real money.

Alpaca stocks: $200 deposited
Kraken crypto: $160 deposited
Total live: $360
Compute: about $57 a month on Azure

The strategy is markdown-prompt-driven. Claude reads positions, market context, and a small playbook every morning. It proposes or manages trades inside guardrails.

Paper trading keeps running in parallel as the test bed. Any strategy change has to prove itself on paper before it touches live money.

That separation matters. Live money is where discipline gets tested. Paper is where experiments belong.

The numbers

Latest verified snapshot:

Alpaca equity: $217.97 (+9.0% on $200)
Kraken equity: $169.87 (+6.2% on $160)
Combined: $387.84 (+7.7% on $360)
Net of monthly compute: roughly minus $29
vs SPY over the same window: minus 4.4%
vs BTC over the same window: minus 10.2%

In isolation, +7.7% on $360 looks fine.

After compute, it is negative.

Against passive baselines, it is behind.

That is the whole point of the review. The bot does not get credit for being interesting. It has to beat the boring alternative or earn more time with better evidence.

The rule

The rule was written before the money went in.

If the live book is still net-negative after compute and still lagging both SPY and BTC, no new live tranche goes in.

If the live book is positive after compute or one benchmark has flipped, it can continue to the next tranche.

Today the rule says no new live money.

I am not routing the next $200 into the bot today. I am keeping V2 paper-only and waiting for the next scorecard.

What I am not doing

I am not declaring the system dead.

I am not pretending the result is good enough.

I am not changing the benchmark after the fact.

The live account did make money before compute. That matters. It also lost to the actual alternatives. That matters more.

The useful middle ground is to keep the live book contained, keep the paper system learning, and only promote capital when the scoreboard earns it.

Why this matters

Most builders are good at starting systems and bad at slowing them down.

Agents make that worse. Once a process runs on a schedule, it starts to feel alive. It produces logs. It writes reports. It gives you reasons to keep watching.

That is exactly why the rule has to exist before the result.

The rule is not there to punish the agent. It is there to protect the operator from narrative drift.

This autotrader is useful if it teaches me how to run capital with agents without getting high on my own software.

Today it taught the right lesson.

No new live money without better evidence.

The next scorecard

The next review is not vibes.

I want to see:

Net result after compute
Combined book versus SPY and BTC
Paper V2 hit rate
Cash drag
Whether the strategy is learning from misses or just writing prettier reports

If those improve, I can add capital later.

If they do not, the live book stays capped and the next dollar goes somewhere boring.

That is not a failure. That is the system working.

If you are running agents near money, customers, or production, write the kill-switch before the run starts. That is what I built AgentGuard for. Budget caps, category caps, and breach hooks around agent loops.

Write the rule first. Code the rule next. Then let the result tell you what to do.

An AI Agent in Sweden Ordered 6,000 Napkins. Here's the 12 Lines of Python That Would Have Stopped It.

Patrick Hughes — Fri, 15 May 2026 21:08:20 +0000

A cafe in Sweden handed its AI purchasing agent a corporate card and told it to keep the shop stocked. Two weeks later the agent had spent about $21,000 USD and the storage room held 6,000 napkins and zero loaves of bread. The AP picked it up on May 13. Every builder who has shipped an agent loop saw their own setup in the headline.

Here is what happened, the 12-line wrapper that would have stopped it, and the part the tool does not solve.

What the cafe actually did wrong

The owner wired a model up to a supplier ordering API. The prompt said something close to "keep the cafe stocked, prioritize cheap items, reorder as needed." There was no per-category cap. No daily dollar cap. No anomaly check on quantity. No human review on orders over a threshold.

The agent did exactly what the prompt rewarded. Napkins were cheap per unit. The reorder logic had no memory of prior orders inside the same window. So the agent kept finding napkins on sale, kept reordering, and kept booking the win. Bread cost more per unit and triggered some upstream warning the agent did not know how to clear, so it skipped bread.

Three weeks of compounding the same decision. $21K gone. The cafe owner said the agent was "doing its job."

The four-bullet root cause

No dollar budget on the agent process itself.
No per-category cap, so napkins could absorb the entire budget.
No anomaly trigger when the same SKU got reordered N times in a window.
No kill switch tied to spend velocity. The bill only surfaced at month end.

Any single one of those guardrails contains the incident. Two of them prevent it.

The 12 lines that stop this

This is AgentGuard, the runtime budget wrapper I maintain. The shape is the point, not the brand. Any equivalent works.

from agentguard47 import AgentGuard

guard = AgentGuard(
    daily_usd_cap=200,
    per_category_caps={"napkins": 20, "paper_goods": 40},
    rate_limit_per_minute=10,
    on_breach="kill_process",
    alert_webhook="https://hooks.slack.com/...",
)

with guard.session("cafe-purchasing"):
    agent.run()

Twelve lines. Here is what each line buys you in the Sweden scenario:

daily_usd_cap=200 ends the process the moment cumulative spend that day hits $200. The cafe burned about $1,500 per day on average. The wrapper kills the loop on day one, hour two.
per_category_caps={"napkins": 20, ...} is the line that specifically prevents this exact failure mode. Napkins cannot consume more than $20 of the daily budget. The third reorder fails closed.
rate_limit_per_minute=10 catches the runaway loop pattern where the agent keeps retrying the same call.
on_breach="kill_process" is the part most builders skip. Logging a warning and continuing is not a guardrail. Killing the process is.
alert_webhook means you find out in Slack on day one, not on the credit card statement on day thirty.

The cafe owner does not need an AI safety team. He needs twelve lines of Python and a webhook URL.

What this does not solve

Be honest about the gap. Runtime budget rails are one layer. The cafe still has open problems even with the wrapper in place:

Bad supplier choice logic. The agent picked napkins because the prompt rewarded cheap-per-unit. The wrapper does not fix the model's reasoning. That is a prompt and tool-design problem.
No human review on irreversible orders. Supplier orders are mostly non-cancellable once placed. The wrapper kills future orders but does not undo the ones already in flight. Human review on any order over $X is a separate layer.
Vendor lock-in to the model's biases. If the model has been trained to prefer certain brands or categories, the budget cap just rations the bad decision. It does not improve the decision.
The agent does not know it is wrong. Inside the loop it is hitting the reward signal it was given. The wrapper is the external referee. Agents cannot referee themselves.

This is the part the Sweden story is going to get wrong in coverage. People will say "the AI made a mistake" or "the AI was too aggressive." Neither is true. The AI did the cheapest possible thing inside the prompt it was given. The mistake was shipping the loop without an external referee.

The pattern to steal

If your agent has a credit card, a database password, an SSH key, or any other action surface where each call costs real money or causes real change, treat it like a junior employee with a corporate card. You would give the junior a per-category limit. You would set up a daily report. You would put a manager review on anything over a threshold. Same rules for the agent.

The order of operations matters too. Most builders write the prompt first, ship the loop, watch the bill, then add guardrails. Reverse it. The wrapper is line one of the agent. The prompt is line two.

What we ship in agent47

The agent47 repo keeps a Real Incidents log. PocketOS losing prod was the first entry. Sweden napkins is the second. Pattern matters more than the punchline. In both cases the agent did what the loop rewarded and there was no external layer to say no.

If you want the runtime spend layer, AgentGuard is one pip install and the snippet above is the whole API. It will not turn a bad prompt into a good one. It will stop a bad prompt from costing $21,000.

Get AgentGuard

AI software runs on 17% margins. SaaS runs on 70%. The token bill is the problem.

Patrick Hughes — Fri, 15 May 2026 21:08:17 +0000

AI software runs on 17% margins. SaaS runs on 70%. The token bill is the problem.

A new analysis from Gptomics put a number on something every AI founder has been feeling. AI-native software businesses are running at about 17% gross margins. Traditional SaaS sits near 70%. The gap is the token bill.

If you ship an AI product and your COGS line keeps creeping, this is why. You did not misprice on purpose. You repriced without noticing.

Where the margin actually went

A SaaS request costs you a few CPU cycles, some bandwidth, and a database read. Pennies on the thousand.

An AI request costs you tokens. And it is rarely one request.

One user message becomes 3 to 12 model calls once you add retrieval, tool use, and a planner.
Retries on rate-limit or 5xx errors double the bill on a bad day.
Evals and guardrails run their own model calls on every turn.
Memory and context grow, so input tokens grow, so every subsequent call gets more expensive.
Long-running agents loop. A single stuck agent can burn $40 in an afternoon before anyone notices.

You priced the product like a SaaS app. You are operating it like a call center where every minute on the phone is metered.

The three founder mistakes that lock you at 17%

I have looked at a lot of AI agent deployments in the last year. The same three holes show up.

1. No hard cost cap per user, per tenant, or per session.

If a single power user can spend $200 in a week on a $29 subscription, you are not running a SaaS business. You are running an unhedged short on token prices. The fix is a budget at the entity level, enforced before the model call, not in a dashboard you check on Monday.

2. No model fallback ladder.

Every call goes to your best model. Most of those calls did not need it. A two-step ladder of cheap-first, escalate-on-failure cuts 40 to 70% of token spend on the routes I have actually measured. The work is not glamorous. The savings are.

3. No per-tenant telemetry on token spend.

You know revenue per customer. You do not know cost per customer. So when a whale starts costing you more than they pay, you find out at quarter close. By then it has been three months.

These three holes are how a 70% margin product becomes a 17% margin product without anyone shipping a bad decision. Each one is a small omission that compounds.

What 30%+ margins look like

You are not getting back to SaaS 70%. The token bill is real. But 30 to 45% is doable, and that is the difference between a company and a science project.

The pattern that works:

Budget caps at every layer. Per user, per workspace, per route. Hard stops, not warnings. When the cap hits, the request gets a graceful degraded response, not a $14 invoice.
A fallback ladder. Cheap model first. Escalate only when the cheap model fails an eval or the user retries. Default to the floor, not the ceiling.
Token telemetry per tenant. Every call tagged with user_id, tenant_id, route, model. Cost-per-customer becomes a number on a dashboard, not a quarterly surprise.
Loop detection. Any agent that calls the model more than N times for one task gets killed. Stuck agents are the single biggest blow-up risk on a token bill.

You can build this yourself. Most teams do, badly, after the first surprise invoice. Or you can drop in something that already does it.

What I built

I wrote AgentGuard for exactly this. It is a Python SDK that wraps your model calls and enforces budgets, fallback, and telemetry at the call site. No new infra. No proxy server. Pip install and add a decorator.

pip install agentguard47

It is the boring infrastructure layer the AI stack still does not have a default for. If you are sitting at 17% margins and trying to figure out where the leak is, start here.

Go to AgentGuard

Enterprise AI just shifted: Claude +128%, OpenAI -8%. What it means if you're building.

Patrick Hughes — Fri, 15 May 2026 14:45:14 +0000

SaaStr published Q2 enterprise AI usage numbers this week. The shape:

Claude: +128%
Gemini: +48%
OpenAI: -8%
Grok: rounding error

That is the cleanest single-quarter share shift I have seen in this space all year. And the obvious read is wrong.

The lazy take

The lazy take is "Anthropic won, switch to Claude." If you ship that take, you are the same person who told their team to standardize on OpenAI 18 months ago. The whole point of the chart is that single-vendor positions move 100+ points in 90 days now.

The data is not telling you which model to pick. It is telling you that picking is a recurring decision, not a one-time one.

What is actually driving the shift

Three things, near as I can tell from talking to builders shipping agents in production:

Coding agents. Claude Code and the Sonnet line ate the developer market. Once a developer is in Claude all day for code, they tend to reach for the same API for their app's agent calls. Developer mindshare leaks into procurement.
Agentic retention. Long-horizon tool-use tasks reward models that follow instructions and recover from errors. Teams that built real agentic workflows on Claude 3.7 and 4 stuck around.
OpenAI cycle gap. GPT-5 landed but did not produce a Claude-Code-tier shift inside engineering orgs. Distribution from ChatGPT is consumer, not enterprise API usage.

None of these are permanent. Gemini 3 is coming. OpenAI ships something every six weeks. The chart will look different in October.

What this means if you are building

If you are building an agent or AI feature today, the share data is a forcing function. Three concrete moves:

1. Put a model abstraction layer in front of every call. Not a 400-line framework. Just one function in your codebase that takes a prompt and a job type and decides which model and which provider. The function reads from config, not from the call site. When the next chart flips, you change one file.

2. Wrap every agent in a budget. Cost per task varies 5x between providers and 10x inside a single provider's tier list. Without a cap, a model switch can blow your unit economics overnight. This is exactly what AgentGuard does. Install it, set a per-task ceiling in dollars, the agent stops when it hits the cap. Two lines of Python.

3. Run a real eval before you migrate. "Claude is better" is not a procurement decision. Pick your 20 hardest production tasks, run them through three models, score the outputs. The eval becomes a regression suite the next time you re-evaluate. Most teams never build this and that is why they re-platform every nine months on vibes.

The deeper pattern

Every share chart in this space is going to whipsaw for at least another two years. The infrastructure decision is not "which model." It is "how fast can I switch which model." Teams that hard-code one provider into prompts, retry logic, observability, and billing are paying a re-platforming tax every other quarter.

The teams that compound are the ones treating the model as a hot-swappable component. Eval suite, abstraction layer, cost cap, done. Then read the next share chart and move on with your day.

If you want the cost-cap part for free: pip install agentguard47. AgentGuard is a 2-line runtime budget guard for agents. It does the cap, the token limit, the rate limit. Use it, do not use it, but do not ship an agent without one.

Localmaxxing isn't theory. Here's what my 3-GPU rig actually does.

Patrick Hughes — Fri, 15 May 2026 14:45:10 +0000

Tom Tunguz wrote a post this week called Localmaxxing. His thesis: open-weight models on prosumer hardware now match cloud-tier quality for a sliver of the cost. The gap closed. The math flipped.

I've been running this setup for months. RTX 3070, RTX 5070 Ti, RTX 5090, all in one tower, serving Llama 3.1 8B through llama.cpp. So let me skip the thesis and put real numbers on the table.

The rig

One Threadripper box. Three GPUs. 80GB of total VRAM if you stack them, though I don't pool them for a single 8B model. I run Llama 3.1 8B in Q5_K_M quant. That fits comfortably on the 5090 alone with room to spare for a 32k context window.

The 3070 and 5070 Ti run smaller models in parallel for different agent jobs. Embeddings on one, a 3B classifier on another. The 5090 is the workhorse.

Tokens per second on Llama 3.1 8B

On the 5090, Q5_K_M, single batch, no flash-attention tweaks beyond defaults:

Prompt processing: ~3,200 tok/s
Generation: ~140 tok/s sustained

For comparison, Claude Opus and GPT-4-class APIs land around 30-80 tok/s on generation depending on load. My local 8B is faster than the frontier cloud APIs for raw throughput. It's a smaller model, so output quality is lower for hard reasoning. For 80% of agent work (classify, extract, summarize, route, format), it's plenty.

Cost per million tokens

Cloud reference points:

GPT-4o: ~$5 input / $15 output per million
Claude Sonnet 4.5: ~$3 input / $15 output per million
Llama 3.1 8B on Together / Fireworks: ~$0.20 per million blended

My local cost, including amortized hardware and Texas electricity at $0.11/kWh:

The 5090 pulls about 400W under sustained inference. At 140 tok/s, one hour of generation produces 504,000 tokens for 0.4 kWh, or about 4.4 cents. That's $0.087 per million output tokens. Round it to 9 cents.

Hardware amortization is the bigger line. Call it $2,200 for the 5090 over 3 years of mixed use. If the card pulls 1,000 hours of inference per year, that's $0.73 per hour, or about $1.45 per million tokens.

Total all-in: roughly $1.55 per million output tokens on local, versus $15 on Claude Sonnet for the same job class. Ten times cheaper.

Caveat: I'm comparing an 8B model to frontier models. Apples to small oranges. But for the agent jobs where 8B is good enough, the math is settled.

Where local wins, where it doesn't

Wins:

High-volume classification and extraction
Anything privacy-sensitive (client data, medical, financial)
Latency-sensitive interactive flows (no network round trip)
Burst workloads that would smash cloud rate limits

Loses:

Hard reasoning, multi-step planning, code generation at frontier quality
Anything where you actually need the model's knowledge depth
Workloads with idle gaps where the GPU sits dark and you eat the depreciation anyway

The right move for most builders right now is hybrid. Cloud frontier for the hard 20%. Local 8B or 14B for the routine 80%. Route between them based on task class.

What Tunguz is actually saying

His VC framing matters. When Tunguz posts about local LLMs, every CTO who reads his Sunday digest just got cover to take this seriously. The conversation moved from "Patrick's weird hobby rig" to "tier-1 VC thesis" in one blog post.

If you've been waiting for permission to test a local-first or hybrid architecture, this is it.

What this means for cost-controlled agents

I built AgentGuard because cost is the thing that kills agent projects in production. Local LLMs don't make cost discipline optional. They make it more important, because now you have three cost dimensions (cloud spend, local electricity, local hardware amortization) instead of one.

The same AgentGuard policies that cap your cloud budget should cap your local inference budget too. A runaway loop on a local model still burns wattage, still keeps your GPU at 90C, still pegs your CPU. Free at the margin doesn't mean free in practice.

If you want to dig deeper into the consumer-GPU production setup, I wrote about running local LLM inference on consumer GPUs earlier this year. That post covers the stack choices, the quant tradeoffs, and the model-routing logic I use.

Bottom line

Localmaxxing is real. The numbers are real. The hardware is in stock. The tools are stable.

If you're building agents and your cloud bill is climbing, the answer might not be a better prompt or a cheaper model tier. It might be a $2,000 GPU and a weekend with llama.cpp.

Then put a budget on it. AgentGuard handles that part.