<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hooman</title>
    <description>The latest articles on Forem by Hooman (@hoomanaskari).</description>
    <link>https://forem.com/hoomanaskari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896292%2F9242379f-1389-4a16-8a70-dbf3c87bd6fd.jpeg</url>
      <title>Forem: Hooman</title>
      <link>https://forem.com/hoomanaskari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hoomanaskari"/>
    <language>en</language>
    <item>
      <title>🧠 Context Engineering: The Skill That Actually Makes AI Work</title>
      <dc:creator>Hooman</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:59:00 +0000</pubDate>
      <link>https://forem.com/hoomanaskari/context-engineering-the-skill-that-actually-makes-ai-work-1j66</link>
      <guid>https://forem.com/hoomanaskari/context-engineering-the-skill-that-actually-makes-ai-work-1j66</guid>
      <description>&lt;p&gt;Every time you send a message to ChatGPT, Claude, or any LLM, it forgets you exist. No memory of your last conversation. No context from before. Nothing. The only way it “knows” who you are is if you re-introduce yourself. Every. Single. Time. That is not a bug. That is the problem context engineering solves.&lt;/p&gt;

&lt;p&gt;Most people blame the model when AI gives garbage outputs. Wrong diagnosis. The model is not the problem. The context is. Everything an LLM needs to answer your question must be sent WITH your question. Understanding this one thing will change how you work with AI.&lt;/p&gt;
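
&lt;p&gt;To make that concrete, here is a minimal sketch of what every chat interface does under the hood. The model call is a stub, not a real API; the point is the shape of the payload: the full transcript travels with every request.&lt;/p&gt;

```python
# A chat "conversation" is stateless on the model side: every request
# re-sends the entire transcript. The model call below is a stub, not a
# real API; the point is the shape of the payload.
def fake_llm(messages):
    # A real model would see ONLY this list, nothing else.
    return f"(reply based on {len(messages)} messages of context)"

history = [{"role": "system", "content": "You are a concise assistant."}]

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_llm(history)  # the full transcript travels every time
    history.append({"role": "assistant", "content": reply})
    return reply

send("My name is Sam.")
print(send("What is my name?"))
```

&lt;p&gt;The model only “remembers” the name because the earlier message is re-sent inside the list. Delete it from the list and the memory is gone.&lt;/p&gt;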

&lt;p&gt;Context has 4 building blocks. Think of each as a dial you can tune.&lt;/p&gt;

&lt;p&gt;1/ Memory&lt;/p&gt;

&lt;p&gt;Your conversation history and preferences. The catch? LLMs have limits. Long conversations get summarized, and that is when “drift” happens. You start repeating yourself. The AI forgets your rules. That is not the model getting dumber. That is imperfect memory management.&lt;/p&gt;
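
&lt;p&gt;A hypothetical sketch of why drift happens. Assume a naive memory manager that keeps only the most recent messages once the window fills up (the budget of 4 messages is invented; real systems count tokens and summarize rather than drop):&lt;/p&gt;

```python
# Invented budget; real systems measure tokens, not message counts.
MAX_MESSAGES = 4

def trim(history):
    # Naive strategy: drop the oldest messages beyond the budget.
    if len(history) > MAX_MESSAGES:
        return history[-MAX_MESSAGES:]
    return history

history = [
    {"role": "user", "content": "Rule: always answer in French."},
    {"role": "user", "content": "Question 1"},
    {"role": "user", "content": "Question 2"},
    {"role": "user", "content": "Question 3"},
    {"role": "user", "content": "Question 4"},
]
window = trim(history)
rule_survived = any("French" in m["content"] for m in window)
print(rule_survived)  # the rule scrolled out of the window
```

&lt;p&gt;Your rule did not stop mattering. It scrolled out of the window.&lt;/p&gt;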

&lt;p&gt;2/ Files&lt;/p&gt;

&lt;p&gt;Documents, screenshots, PDFs fed directly into the context. Hot tip: drop a screenshot into ChatGPT and ask it to build a prompt from it. Works shockingly well.&lt;/p&gt;

&lt;p&gt;3/ RAG (Retrieval-Augmented Generation)&lt;/p&gt;

&lt;p&gt;Before your question hits the LLM, smart systems quietly search a knowledge base and stuff relevant results into the context alongside it. That is why internal AI chatbots can answer questions about your specific products. Not magic. Retrieval.&lt;/p&gt;
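
&lt;p&gt;The retrieval step can be sketched in a few lines. Everything here is illustrative: the documents are invented, and real systems rank by embedding similarity rather than word overlap. What matters is where the retrieved text ends up: inside the context, next to the question.&lt;/p&gt;

```python
# Invented knowledge base for illustration; a real system would use
# embeddings and a vector store, not word overlap.
KNOWLEDGE_BASE = [
    "The Acme X200 router supports WPA3 and has four LAN ports.",
    "Acme support hours are 9am to 5pm on weekdays.",
    "The X200 firmware can be updated from the admin panel.",
]

def retrieve(question, k=2):
    # Score each document by how many question words it shares.
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words.intersection(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

question = "how do i update the x200 firmware"
context = "\n".join(retrieve(question))
# The retrieved snippets travel WITH the question, in the same prompt.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```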

&lt;p&gt;4/ Tools&lt;/p&gt;

&lt;p&gt;The LLM does not execute tools. It tells your system WHAT to run. Your system runs it, stuffs the result back into the context, and THEN the model answers. Tool calling is the AI directing traffic, not doing the work.&lt;/p&gt;
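
&lt;p&gt;Here is a hypothetical sketch of that loop, with the model stubbed out. Nothing below is a real API; the point is the division of labor: the model emits a request, your system executes it, and the result goes back into the context before the final answer.&lt;/p&gt;

```python
# The system (your code) owns the tool; the model never runs it.
def get_weather(city):
    return f"18C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(messages):
    # First pass: the model only DESCRIBES the call it wants made.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Lisbon"}}
    # Second pass: with the tool result in context, it answers.
    return {"answer": "It is 18C and sunny in Lisbon."}

messages = [{"role": "user", "content": "Weather in Lisbon?"}]
reply = fake_llm(messages)
while "tool" in reply:
    result = TOOLS[reply["tool"]](**reply["args"])         # system executes
    messages.append({"role": "tool", "content": result})   # back into context
    reply = fake_llm(messages)
print(reply["answer"])
```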

&lt;p&gt;Most people treat prompts like a black box. Throw stuff in, hope for the best.&lt;/p&gt;

&lt;p&gt;But when you see memory, files, RAG, tools, and prompt structure as separate levers, you stop guessing and start engineering.&lt;/p&gt;

&lt;p&gt;The 6 prompt components that actually matter:&lt;/p&gt;

&lt;p&gt;✓ Role: “You are an expert in...”&lt;/p&gt;

&lt;p&gt;✓ Personality: tone and style&lt;/p&gt;

&lt;p&gt;✓ Request: the actual task&lt;/p&gt;

&lt;p&gt;✓ Format: be explicit (bullets, JSON, table)&lt;/p&gt;

&lt;p&gt;✓ Examples: two good and two bad work wonders&lt;/p&gt;

&lt;p&gt;✓ Constraints: “never do X”&lt;/p&gt;

&lt;p&gt;Most people only use the Request and skip the other five.&lt;/p&gt;
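
&lt;p&gt;As a sketch, here is what a prompt using all six components looks like when assembled. The wording of each slot is illustrative, not a template you have to copy:&lt;/p&gt;

```python
# One illustrative prompt with all six components filled in.
prompt = "\n\n".join([
    "Role: You are an expert technical editor.",                   # 1 Role
    "Personality: Direct, plain language, no filler.",             # 2 Personality
    "Request: Rewrite the paragraph below for clarity.",           # 3 Request
    "Format: Return a single paragraph, no bullet points.",        # 4 Format
    ("Examples:\n"                                                 # 5 Examples
     "Good: 'The cache cut latency by 40%.'\n"
     "Good: 'Retries are capped at three.'\n"
     "Bad: 'Performance was sort of improved somehow.'\n"
     "Bad: 'It does retries and stuff.'"),
    "Constraints: Never invent numbers. Never change the claim.",  # 6 Constraints
])
print(prompt)
```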

&lt;p&gt;Here is the bigger picture.&lt;/p&gt;

&lt;p&gt;Context engineering is not just for developers building agentic systems. It is a thinking skill.&lt;/p&gt;

&lt;p&gt;When you learn to structure information so a machine can reason over it accurately, you get better at structuring information for humans too. Clearer writing. Clearer thinking. Clearer communication.&lt;/p&gt;

&lt;p&gt;Context is not a setting you configure once. Context is everything you send.&lt;/p&gt;

&lt;p&gt;Get that right and the model almost does not matter.&lt;/p&gt;

&lt;p&gt;Are you still treating AI like a magic black box, or are you already thinking about context?&lt;/p&gt;

&lt;p&gt;Drop a comment. I am curious where people are at with this.&lt;/p&gt;

&lt;p&gt;Originally posted on my &lt;a href="https://substack.com/home/post/p-194699076" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Google Just Upgraded Its AI Research Agent, and It's a Big Deal</title>
      <dc:creator>Hooman</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:42:17 +0000</pubDate>
      <link>https://forem.com/hoomanaskari/google-just-upgraded-its-ai-research-agent-and-its-a-big-deal-5e3l</link>
      <guid>https://forem.com/hoomanaskari/google-just-upgraded-its-ai-research-agent-and-its-a-big-deal-5e3l</guid>
      <description>&lt;p&gt;Google DeepMind announced two significant evolutions of its autonomous research agent: Deep Research and Deep Research Max. Both are built on Gemini 3.1 Pro, and together they represent a meaningful shift in what AI-assisted research can actually do at a professional level.&lt;/p&gt;

&lt;p&gt;If you’ve been following the AI space, you’ll know that “research agents” have been a hot topic. But most of them still feel like glorified summarizers. Google is making a credible case that it’s moving past that.&lt;/p&gt;

&lt;p&gt;From Summarizer to Research Engine&lt;/p&gt;

&lt;p&gt;When Google first released the Gemini Deep Research agent to developers back in December 2025 via the Interactions API, it was already impressive. But the team describes the new version as a transformation from “a sophisticated summarization engine into a foundation for enterprise workflows.” That’s not just marketing language. The new agents are designed to support real-world use cases in finance, life sciences, and market research, producing fully cited, professional-grade analyses from a single API call. The key upgrade? The ability to blend the open web with proprietary data streams, which changes the game significantly for enterprise users.&lt;/p&gt;

&lt;p&gt;Two Agents, Two Use Cases&lt;/p&gt;

&lt;p&gt;Google is now offering two distinct configurations. Deep Research is the speed-optimized option, built for interactive user surfaces where low latency matters. Think of it as the version you’d embed directly into a product that a user is actively engaging with. Deep Research Max is the heavy-duty option, using extended test-time compute to iteratively reason, search, and refine its output. Google specifically positions it for asynchronous, background workflows, like a nightly cron job that generates exhaustive due diligence reports for an analyst team by morning.&lt;/p&gt;

&lt;p&gt;MCP Support, Native Charts, and Multimodal Inputs&lt;/p&gt;

&lt;p&gt;Three new capabilities stand out. First, Model Context Protocol (MCP) support is arguably the most significant addition. Deep Research can now connect to custom data sources and specialized professional data providers (think financial data, market intelligence platforms) securely via MCP. This turns it from a web searcher into an autonomous agent capable of navigating any specialized data repository you point it at. Google is already collaborating with FactSet, S&amp;amp;P Global, and PitchBook on their MCP server designs.&lt;/p&gt;

&lt;p&gt;Second, native charts and infographics are new for the Gemini API. The agent now generates high-quality charts and infographics inline with HTML, dynamically visualizing complex data sets within the report itself. That’s a meaningful step toward presentation-ready outputs.&lt;/p&gt;

&lt;p&gt;Third, multimodal research grounding means you can feed the agent a mix of PDFs, CSVs, images, audio, and video as input context. Combined with Google Search, URL context, code execution, and file search running simultaneously, the tool is starting to look like a genuine research analyst rather than a chatbot with a search button.&lt;/p&gt;

&lt;p&gt;Collaborative Planning and Real-Time Streaming&lt;/p&gt;

&lt;p&gt;Two other additions are worth noting for developers and product builders. Collaborative planning lets you review and refine the agent’s research plan before it begins execution, giving you granular control over scope. This is critical in regulated industries where you need to know exactly what the agent is and isn’t looking at. Real-time streaming, meanwhile, lets you track the agent’s intermediate reasoning steps live, with thought summaries and outputs arriving as they’re generated, which is a much better experience than waiting for a finished report to land.&lt;/p&gt;

&lt;p&gt;The Enterprise Bet&lt;/p&gt;

&lt;p&gt;What’s clear from this announcement is that Google is making a serious enterprise play. The focus on finance and life sciences, the partnerships with FactSet, S&amp;amp;P Global, and PitchBook, and the emphasis on factuality and source diversity all point to a product team working in regulated, high-stakes environments. Google also notes that Deep Research already powers research capabilities inside the Gemini App, NotebookLM, Google Search, and Google Finance, which is a meaningful signal about the maturity of the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;Deep Research and Deep Research Max are available today in public preview via paid tiers in the Gemini API, with Google Cloud availability for startups and enterprises coming soon. If you’re building research-heavy products or workflows, this is worth a close look.&lt;/p&gt;

&lt;p&gt;Originally posted on my &lt;a href="https://substack.com/home/post/p-194971600" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>google</category>
      <category>gemini</category>
    </item>
    <item>
      <title>How I Learn in the Age of AI Coding</title>
      <dc:creator>Hooman</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:39:23 +0000</pubDate>
      <link>https://forem.com/hoomanaskari/how-i-learn-in-the-age-of-ai-coding-53p9</link>
      <guid>https://forem.com/hoomanaskari/how-i-learn-in-the-age-of-ai-coding-53p9</guid>
      <description>&lt;p&gt;There is a strange thing that happens when AI starts writing your code for you. You might assume learning slows down. That you become passive. That you are just a prompt monkey, copy-pasting outputs and shipping features without really understanding what is going on under the hood.&lt;/p&gt;

&lt;p&gt;That has not been my experience at all. At least not anymore.&lt;/p&gt;

&lt;p&gt;If anything, I am learning differently now, and in some ways more richly than before. The topics I am picking up are not always the ones I expected to care about. But they are sticking, and they are making me a better builder. Let me explain what I mean.&lt;/p&gt;

&lt;p&gt;The AI Codes, But I Still Have to Understand the Problem&lt;/p&gt;

&lt;p&gt;When I work with an AI coding assistant, the code gets written fast. But bugs still surface. Edge cases still bite. And when something breaks, I still have to diagnose it. That diagnosis process is where a lot of my learning now lives.&lt;/p&gt;

&lt;p&gt;A recent example: I was building a text-to-speech (TTS) feature. The AI scaffolded the whole thing quickly. But then things started going wrong in ways I did not immediately understand. Fixing those issues sent me down some genuinely interesting rabbit holes.&lt;/p&gt;

&lt;p&gt;Learning About TTS Input Length Limits and Chunking&lt;/p&gt;

&lt;p&gt;The first thing I ran into was that TTS APIs have input length limitations. Most of them cap how much text you can send in a single request. When I fed a long block of content into the API, it either failed silently or threw an error I did not immediately recognize.&lt;/p&gt;

&lt;p&gt;The AI could generate a chunking solution for me, and it did. But to actually steer it toward the right solution, I had to understand the problem first. What counts as a “chunk”? Do you split on character count, word count, or sentence boundaries? What happens if you split mid-sentence? Does the audio sound jarring at the seam?&lt;/p&gt;

&lt;p&gt;I learned that splitting on sentence boundaries produces much cleaner audio output. I learned about the tradeoffs between chunk size and API latency. I learned how to think about reassembling audio segments in the right order. None of this was in the original feature spec. All of it came from debugging a problem the AI helped create and then helped solve.&lt;/p&gt;
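
&lt;p&gt;Here is roughly what sentence-aware chunking looks like. The character limit is invented (check your provider’s actual cap), and a production version would handle abbreviations and decimal points, but the shape of the solution is this:&lt;/p&gt;

```python
import re

# Assumed per-request limit; the real number depends on the TTS provider.
MAX_CHARS = 200

def chunk_text(text, limit=MAX_CHARS):
    # Naive sentence split: runs of text ending in ., !, or ?
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if len(candidate) > limit and current:
            chunks.append(current)  # flush before the chunk overflows
            current = sentence      # start the next chunk on a sentence boundary
        else:
            current = candidate     # a single oversized sentence stays whole
    if current:
        chunks.append(current)
    return chunks  # synthesize each chunk, then stitch the audio in order
```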

&lt;p&gt;Learning About Markdown and Special Characters in TTS&lt;/p&gt;

&lt;p&gt;The second rabbit hole was even more interesting. When you pipe markdown content directly into a TTS engine, it reads everything. And I mean everything. The asterisks. The pound signs. The underscores. The hyphens used as bullet points.&lt;/p&gt;

&lt;p&gt;Suddenly your clean article gets narrated as “asterisk asterisk important asterisk asterisk” and the whole thing sounds broken. This is not a bug exactly. It is just a mismatch between how markdown is structured for visual rendering versus how raw text is processed by a speech engine.&lt;/p&gt;

&lt;p&gt;To fix it, I had to learn about stripping markdown before passing text to TTS. There are libraries that help with this, and the AI pointed me toward them. But understanding why the problem existed, and what categories of characters cause issues, meant I could write better prompts the next time. I could tell the model exactly what I needed: strip headers, remove emphasis markers, preserve sentence structure, handle code blocks gracefully.&lt;/p&gt;
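
&lt;p&gt;For illustration, here is a rough version of that stripping pass. The regex patterns are a sketch; in practice a real markdown parser or a dedicated strip-markdown library handles edge cases like nested emphasis far better:&lt;/p&gt;

```python
import re

FENCE = "`" * 3  # matches triple-backtick code fences

def strip_markdown(md):
    # Drop fenced code blocks entirely; reading code aloud rarely helps.
    text = re.sub(FENCE + r".*?" + FENCE, "", md, flags=re.DOTALL)
    text = re.sub(r"`([^`]*)`", r"\1", text)                      # inline code
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)        # headers
    text = re.sub(r"[*_]{1,3}([^*_]+)[*_]{1,3}", r"\1", text)     # emphasis
    text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)  # bullets
    return re.sub(r"\n{2,}", "\n", text).strip()  # collapse blank lines
```

&lt;p&gt;Now the engine reads “important” instead of “asterisk asterisk important asterisk asterisk”.&lt;/p&gt;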

&lt;p&gt;That kind of specific instruction only comes from having gone through the problem once.&lt;/p&gt;

&lt;p&gt;Learning About Porosity&lt;/p&gt;

&lt;p&gt;This one came from a completely different project but the same pattern. I was working on something that involved understanding how materials or data structures “breathe,” for lack of a better word. How things pass through layers. The concept of porosity came up, and because the AI was doing the implementation, I had the mental space to actually sit with the concept rather than rushing to write code.&lt;/p&gt;

&lt;p&gt;I looked it up. I read about it. I let myself get curious about it in a way that I might not have if I had been heads-down in syntax.&lt;/p&gt;

&lt;p&gt;That is a real benefit of AI-assisted development that does not get talked about enough. When the mechanical parts of coding are handled, you get cognitive headroom to actually learn the domain you are building in.&lt;/p&gt;

&lt;p&gt;How This Changes the Way I Work Next Time&lt;/p&gt;

&lt;p&gt;Here is the part that matters most to me. None of these learning moments, TTS chunking, markdown stripping, porosity, is limited to the current project. Each one changes how I approach the next project.&lt;/p&gt;

&lt;p&gt;When I hit a TTS feature again, I will prompt the AI differently from the start. I will say: handle input length limits with sentence-aware chunking, strip markdown before synthesis, and return audio segments in order. That is a much better starting prompt than “build me a TTS feature.” The AI can only be as precise as the person steering it.&lt;/p&gt;

&lt;p&gt;This is the core of how I think about learning in this era. The AI handles execution. My job is to build a richer and richer mental model of the problem space so that I can direct the execution better each time. The learning loop is still very much alive. It just runs through different channels now.&lt;/p&gt;

&lt;p&gt;The Takeaway&lt;/p&gt;

&lt;p&gt;AI coding has not made learning irrelevant. It has shifted what you need to learn and when. You spend less time memorizing syntax and more time understanding systems, constraints, and domain concepts. You learn through debugging, through curiosity, and through the iterative process of steering a model toward better outputs.&lt;/p&gt;

&lt;p&gt;Every weird edge case is a lesson. Every broken feature is a map of something you did not know yet. And the next time you sit down to build something similar, you bring all of that with you.&lt;/p&gt;

&lt;p&gt;That feels like learning to me.&lt;/p&gt;

&lt;p&gt;Originally posted on my &lt;a href="https://substack.com/home/post/p-194240680" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
