<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Debbie O'Brien</title>
    <description>The latest articles on Forem by Debbie O'Brien (@debs_obrien).</description>
    <link>https://forem.com/debs_obrien</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F212929%2F947ba7e0-41fe-464a-a4f3-abb66a3170c6.jpg</url>
      <title>Forem: Debbie O'Brien</title>
      <link>https://forem.com/debs_obrien</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/debs_obrien"/>
    <language>en</language>
    <item>
      <title>How I Documented an Entire Product in 4 Days with an AI Agent</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Wed, 13 May 2026 20:18:51 +0000</pubDate>
      <link>https://forem.com/debs_obrien/how-i-documented-an-entire-product-in-4-days-with-an-ai-agent-3338</link>
      <guid>https://forem.com/debs_obrien/how-i-documented-an-entire-product-in-4-days-with-an-ai-agent-3338</guid>
      <description>&lt;p&gt;I had 55 pages of documentation to write, 59 screenshots to capture, and a product that was still shipping features and being rebranded weeks before release. I did it in four days with &lt;a href="https://github.com/aaif-goose/goose" rel="noopener noreferrer"&gt;Goose&lt;/a&gt;, an open-source AI agent by Block, part of the Linux Foundation, and I want to walk you through exactly how. Not the polished version. The real one: how I built it, how it works, everything that broke along the way, and what I learned from it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://theaiplatform.app" rel="noopener noreferrer"&gt;The AI Platform&lt;/a&gt; by &lt;a href="https://zephyr-cloud.io" rel="noopener noreferrer"&gt;Zephyr Cloud&lt;/a&gt; is a desktop app where teams collaborate with AI specialists in channels. Think Slack meets AI agents. The product had been moving fast for months. Features were shipping, the UI was evolving, and the documentation was... not keeping up. What existed was a handful of developer-focused reference pages. Markdown files describing CRDT schemas and workflow adapter formats. Useful if you were building the product. Useless if you were trying to use it.&lt;/p&gt;

&lt;p&gt;We needed end-user documentation. The kind where someone installs the app, opens the docs, and understands how to create a channel, mention a specialist, and get work done. And we needed it before the official release, which was a few weeks away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an AI Agent
&lt;/h2&gt;

&lt;p&gt;I have written plenty of documentation by hand. It is one of the most time-consuming parts of shipping a product. Not because the writing itself is hard, but because of everything around it. You need to understand the feature by reading source code. You need to take screenshots. You need to crop and optimize them. You need to keep the screenshots updated when the UI changes. You need to maintain consistent voice and structure across dozens of pages. And you need to do all of this while the product is still changing underneath you.&lt;/p&gt;

&lt;p&gt;I had been using the agent for other tasks in the codebase and thought: what if I could write all the documentation from source code, capture screenshots that could be recaptured any time the app changed, and improve the documentation based on those screenshots?&lt;/p&gt;

&lt;p&gt;For those unfamiliar, Goose is an open-source AI agent that runs on your machine. It can read and write files, run shell commands, interact with APIs, and use extensions and &lt;strong&gt;skills&lt;/strong&gt; to specialize in different tasks. Skills are markdown files that encode instructions, conventions, and tooling for a specific task. When you load a skill, the agent follows those instructions. When you improve the skill, every future session benefits. It is the difference between telling an agent what to do every time and teaching it once.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Plan
&lt;/h2&gt;

&lt;p&gt;Before writing a single page, I sat down and created a phased plan. This turned out to be the most important decision of the whole project. You have an idea in your head but no real structure, and you need to think it through before throwing an agent at it. We created a tracer bullet format with sub-tasks so the agent could work phase by phase and tick off what it had done. One night I even went to bed and left it working on a task. The next morning I reviewed everything it had done and iterated over the parts that needed adjusting. I deliberately avoided using a loop where the agent just runs through everything unattended. I wanted to stay in charge and monitor how things were going, because I was also refining the skills as I went along.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phase 0: Restructure.&lt;/strong&gt; Delete developer-focused content from the user guide. Move reference docs to a separate section. Set up the directory structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1: Getting Started.&lt;/strong&gt; Installation, account creation, platform tour, first channel. The first five minutes of the product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 2: Daily Use.&lt;/strong&gt; Chat, messaging, threads, specialists. The features people use every day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3: Power Features.&lt;/strong&gt; Projects, tasks, workflows, knowledge garden. Features that experienced users reach for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 4: Settings.&lt;/strong&gt; Connections, sandbox, MCP servers, billing, permissions, browser extensions. Every settings page documented.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 5: Polish.&lt;/strong&gt; Screenshots for all pages. Cross-linking. Consistent voice. Image optimization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 6: Undocumented Features.&lt;/strong&gt; Go through the app screen by screen and find anything I missed. This phase caught the embedded browser, the code editor panel, and several settings pages that had no documentation at all.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The phased approach mattered because it gave me clear stopping points. After each phase, I could commit, review, and course-correct.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe06yzpcmpuljc4l5bg8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe06yzpcmpuljc4l5bg8y.png" alt="4-Day Sprint Timeline showing commit activity: Day 1 kickoff with 4 commits, Day 2 evening sprint with 12 commits, Day 3 with 43 commits including sidebar redesign disruption, Day 4 with 22 commits to finish and ship" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Skills I Built
&lt;/h2&gt;

&lt;p&gt;Here is where it gets interesting. I did not just use the agent to write documentation. I built three skills that taught it &lt;em&gt;how the documentation works&lt;/em&gt;, and those skills evolved throughout the project as I hit problems and found better approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. write-docs: The Style Guide in Code
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3z0qbmo7s40ifp0x5vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3z0qbmo7s40ifp0x5vm.png" alt="write-docs skill card: 513 lines covering voice and tone rules, page structure template, formatting conventions, and verification checklist" width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This skill is 513 lines of instructions that define how every documentation page should be written. It covers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice and tone.&lt;/strong&gt; Casual and friendly. Direct. Confident. "Click Settings" not "You may want to consider clicking Settings."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formatting rules.&lt;/strong&gt; Bold for UI elements the user needs to find. Italics for text the user will see but not interact with. Code backticks for anything the user types. No emojis. No em dashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page structure.&lt;/strong&gt; Start with what the user sees, not how it works internally. One idea per paragraph. Lead with the action. A full page template with frontmatter, headings, screenshots, callouts, and cross-links.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What not to document.&lt;/strong&gt; Internal implementation details, developer workflows, API references, features behind feature flags. This is user documentation, not a code tour.&lt;/p&gt;

&lt;p&gt;The skill also includes a verification checklist that the agent walks through before committing. Content checks (no emojis, no em dashes, UI elements bolded), screenshot checks (optimized, cropped, registered in the manifest), and a build check (&lt;code&gt;pnpm build&lt;/code&gt; must pass with no dead links). It is not an automated gate. It is instructions baked into the skill that the agent follows every time.&lt;/p&gt;
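&lt;p&gt;For illustration, here is what the content checks could look like if you automated them. This is a sketch of my own, not the skill's actual tooling: the function name, the rule list, and the emoji range are assumptions.&lt;/p&gt;

```python
import re

# Hypothetical content checks mirroring the write-docs checklist:
# no emojis, no em dashes, UI element names bolded.
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def lint_page(markdown: str, ui_terms: list[str]) -> list[str]:
    """Return a list of checklist violations for one docs page."""
    problems = []
    if "\u2014" in markdown:  # em dash
        problems.append("em dash found")
    if EMOJI.search(markdown):
        problems.append("emoji found")
    for term in ui_terms:
        # UI elements the user needs to find should be bolded.
        if term in markdown and f"**{term}**" not in markdown:
            problems.append(f"UI element not bolded: {term}")
    return problems
```

&lt;p&gt;In this sketch the agent would run the lint over every changed page before committing, alongside the screenshot and build checks.&lt;/p&gt;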

&lt;p&gt;Why does this matter? Because without it, every documentation session would start with me re-explaining the same conventions. With the skill loaded, the agent writes in the right voice from the first sentence. And when I noticed a pattern I did not like (too many callouts per page, screenshots that were too large), I updated the skill once and every future page followed the new rule.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. doc-screenshots: Automated Screenshot Capture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht3mwcraro7vuclm2i3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht3mwcraro7vuclm2i3u.png" alt="doc-screenshots skill card: 478 lines of instructions plus 1,722 lines of tooling code, covering Peekaboo integration, Vision OCR, YAML manifest runner, and batch capture modes" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the most technically interesting skill and the one that saved the most time. It is 478 lines of instructions backed by 1,722 lines of tooling code across four scripts: a bash CLI, a Python manifest runner, a Swift OCR text finder, and a Python highlight overlay renderer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Not Playwright?
&lt;/h4&gt;

&lt;p&gt;The first question people ask: why not use Playwright? I use Playwright every day. I love it. But it would not have worked here.&lt;/p&gt;

&lt;p&gt;The AI Platform is a Tauri desktop app. The UI runs in a native webview, not a browser tab. Playwright automates browsers. It cannot connect to a Tauri webview. Even if you could somehow attach to the webview's DevTools protocol, you would be fighting against the native window chrome, the system title bar, and the fact that the app's routing and state management are wired through Tauri's IPC bridge, not standard browser navigation.&lt;/p&gt;

&lt;p&gt;I needed something that works at the OS level: find the window, click things on screen, capture what the user actually sees. That led me to &lt;a href="https://github.com/openclaw/Peekaboo" rel="noopener noreferrer"&gt;Peekaboo&lt;/a&gt;, a macOS automation tool that interacts with apps through accessibility APIs and screen coordinates.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Pipeline
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4699jvgefweup3s6kkd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4699jvgefweup3s6kkd.png" alt="Screenshot pipeline flow: Peekaboo navigates and focuses, Peekaboo --retina captures at 2x, Swift Vision OCR finds text, Pillow adds highlights, pngquant and optipng compress" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pipeline works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Peekaboo&lt;/strong&gt; finds the app window and focuses it. If you need to navigate somewhere first, it clicks UI elements by their visible text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peekaboo &lt;code&gt;--retina&lt;/code&gt;&lt;/strong&gt; captures the window at 2x retina resolution without the drop shadow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Swift script using the Vision framework&lt;/strong&gt; runs OCR on the captured image. It finds every piece of text and returns pixel-accurate bounding boxes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Python script using Pillow&lt;/strong&gt; draws highlight overlays, borders, and spotlight effects on the image based on the OCR results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pngquant and optipng&lt;/strong&gt; compress the final image. This typically reduces file size by 50 to 60 percent with no visible quality loss.&lt;/li&gt;
&lt;/ol&gt;
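&lt;p&gt;The five steps above can be sketched as a command list assembled in Python. This is illustrative, not the skill's actual tooling: &lt;code&gt;--retina&lt;/code&gt; is the real Peekaboo flag, but the other flags and the script names (&lt;code&gt;ocr_boxes.swift&lt;/code&gt;, &lt;code&gt;highlight.py&lt;/code&gt;) are placeholders.&lt;/p&gt;

```python
from pathlib import Path

def pipeline_commands(out: Path) -> list[list[str]]:
    """Build the capture pipeline for one screenshot.

    ocr_boxes.swift and highlight.py are placeholder names for the
    Swift Vision OCR script and the Pillow overlay renderer.
    """
    raw = out.with_suffix(".raw.png")
    boxes = out.with_suffix(".boxes.json")
    return [
        # 1-2. Focus the window and capture it at 2x retina resolution.
        ["peekaboo", "image", "--retina", "--path", str(raw)],
        # 3. OCR: find every piece of on-screen text, with bounding boxes.
        ["swift", "ocr_boxes.swift", str(raw), str(boxes)],
        # 4. Draw highlight overlays based on the OCR results.
        ["python3", "highlight.py", str(raw), str(boxes), str(out)],
        # 5. Compress; typically 50-60% smaller with no visible loss.
        ["pngquant", "--force", "--output", str(out), str(out)],
        ["optipng", str(out)],
    ]
```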

&lt;p&gt;No hardcoded coordinates for content elements. No browser automation. No authentication tokens. The agent looks at the actual app window, reads the text on screen, and figures out where things are.&lt;/p&gt;

&lt;p&gt;The pipeline originally used three separate native macOS tools stitched together. I filed an issue on the &lt;a href="https://github.com/openclaw/Peekaboo" rel="noopener noreferrer"&gt;Peekaboo repo&lt;/a&gt; requesting retina capture support, and the maintainer shipped it within days. That simplified the pipeline to a single &lt;code&gt;peekaboo image --retina&lt;/code&gt; call plus the Swift OCR script.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Screen Takeover Problem
&lt;/h4&gt;

&lt;p&gt;There is a real trade-off with this approach. Peekaboo needs the app window visible and in focus. While the audit or batch capture is running, it is clicking through your app, opening dialogs, navigating between pages, pressing Escape to close things. Your screen is not yours for the duration.&lt;/p&gt;

&lt;p&gt;A full audit takes about 10 minutes. A full recapture takes 15 to 20. During that time, you cannot touch the mouse or keyboard without breaking the run. In practice, you kick off the batch, go make coffee, and come back to 59 freshly captured, cropped, and optimized screenshots. Captures can technically run in the background, but navigation clicks need the window focused and control of the mouse, so even a second monitor does not help: the agent needs your machine for the duration. Treat it as a coffee break. It is also not ready for CI yet, since macOS CI runners do not have a logged-in GUI session with the Accessibility and Screen Recording permissions that Peekaboo needs.&lt;/p&gt;

&lt;p&gt;The key insight was the &lt;strong&gt;screenshot manifest&lt;/strong&gt;. Instead of capturing screenshots one at a time, I defined all 59 of them in a YAML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;screenshots&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getting-started/app-overview&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs/public/images/getting-started/app-overview.png&lt;/span&gt;
    &lt;span class="na"&gt;crop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;window&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;Full app window showing the icon rail, channel list,&lt;/span&gt;
      &lt;span class="s"&gt;and a chat conversation.&lt;/span&gt;
    &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Channels&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;getting-started/create-channel-dialog&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs/public/images/getting-started/create-channel-dialog.png&lt;/span&gt;
    &lt;span class="na"&gt;crop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;click&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;+'&lt;/span&gt;
        &lt;span class="na"&gt;near&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Channels'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;wait&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.5&lt;/span&gt;
    &lt;span class="na"&gt;cleanup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;press&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Escape'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each entry declares what to capture, how to navigate there, what to crop, and what text should appear in the final image (the &lt;code&gt;validate&lt;/code&gt; field). The manifest runner executes them in sequence, resetting the app state between each one.&lt;/p&gt;

&lt;p&gt;The manifest means that when the UI changes, you do not retake screenshots by hand. You run the manifest and get all 59 back in one batch. An &lt;code&gt;--audit&lt;/code&gt; mode walks every navigation step and reports which targets are broken. A &lt;code&gt;--compare&lt;/code&gt; mode recaptures everything and saves new versions alongside the originals for side-by-side review.&lt;/p&gt;
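&lt;p&gt;A minimal sketch of what the &lt;code&gt;--audit&lt;/code&gt; pass amounts to, assuming the YAML manifest has already been parsed into Python dicts. The function name and the &lt;code&gt;find_text&lt;/code&gt; callback are my own stand-ins for the real runner and the OCR lookup:&lt;/p&gt;

```python
def audit_manifest(screenshots, find_text):
    """Walk every navigation step and report which click targets are broken.

    screenshots is the parsed manifest; find_text(label) stands in for
    the OCR lookup and returns True when the label is visible on screen.
    """
    failures = []
    for shot in screenshots:
        for step in shot.get("steps", []):
            target = step.get("click")
            if target is not None and not find_text(target):
                failures.append((shot["id"], target))
    return failures
```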

&lt;p&gt;I ran the audit while writing this blog post. 50 of 59 passed. Every failure was about test data that had changed (renamed channels, deleted workflows), not broken navigation. The core paths all still worked. The lesson: treat screenshot test data like E2E fixtures. Navigation screenshots are stable. Content-dependent ones need a dedicated docs workspace with controlled data.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. docs-preview: Deploy and Verify
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y9z3lp42qn5d5ykv2fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y9z3lp42qn5d5ykv2fm.png" alt="docs-preview skill card: 155 lines covering Zephyr Cloud edge deploy, 3-second build cycle, URL management, and stale URL prevention" width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The simplest skill, at 155 lines, but it solved two problems at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not localhost?&lt;/strong&gt; The documentation site builds with Rspress. You can run &lt;code&gt;pnpm dev&lt;/code&gt; and preview on &lt;code&gt;localhost:3000&lt;/code&gt;, but that only works for you. You cannot share a localhost URL in a PR review, paste it into a Slack thread, or hand it to a teammate to check your work. I needed shareable URLs.&lt;/p&gt;

&lt;p&gt;The docs build uses the &lt;code&gt;withZephyr()&lt;/code&gt; Rspress plugin, which uploads the built site to &lt;a href="https://zephyr-cloud.io" rel="noopener noreferrer"&gt;Zephyr Cloud's&lt;/a&gt; edge network on every &lt;code&gt;pnpm build&lt;/code&gt;. The whole cycle takes under 2 seconds. Build, upload, deploy, live URL. I timed it while writing this post: 1.8 seconds for 55 pages and 59 images to go from source files to a production-ready URL on a global CDN.&lt;/p&gt;

&lt;p&gt;That means every time the agent finishes writing or updating a page, it can build and hand me a live URL to check in the browser. No local server to start, no port conflicts, no "works on my machine." Just a URL that anyone on the team can open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The URL problem.&lt;/strong&gt; Every build produces a unique URL with a hash suffix that changes each time. AI agents are bad at this. The URL has a fixed project number (like &lt;code&gt;213&lt;/code&gt;) and a per-build hash (like &lt;code&gt;4a62f09db&lt;/code&gt;). Before the skill existed, the agent would sometimes "increment" the project number thinking it was a build counter, or type a URL from memory with a fabricated hash. Both produce links that have never existed and always 404.&lt;/p&gt;

&lt;p&gt;The skill stamps that out. It pipes the build output to a log file and re-greps the log whenever the URL is needed. It includes explicit warnings about not reusing stale URLs and not typing URLs from memory. Simple, but it eliminated a genuinely annoying class of failures.&lt;/p&gt;
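&lt;p&gt;The re-grep step is tiny. This sketch assumes the deploy URL is the last URL printed in the build log; the URL in the test is illustrative, not a real deployment:&lt;/p&gt;

```python
import re

DEPLOY_URL = re.compile(r"https://\S+")

def latest_deploy_url(build_log):
    """Return the last deploy URL printed in the build log, never a guess.

    Returns None when no URL was logged, which forces a rebuild rather
    than letting the agent type a URL from memory.
    """
    matches = DEPLOY_URL.findall(build_log)
    return matches[-1] if matches else None
```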

&lt;h3&gt;
  
  
  Verifying the Docs With Playwright CLI
&lt;/h3&gt;

&lt;p&gt;There is an important distinction in this workflow. Peekaboo automates the desktop app to capture screenshots. But who verifies that the documentation pages themselves render correctly?&lt;/p&gt;

&lt;p&gt;That is where &lt;a href="https://github.com/nichochar/playwright-cli" rel="noopener noreferrer"&gt;Playwright CLI&lt;/a&gt; comes in. It is a command-line tool that wraps Playwright's browser automation into simple terminal commands. The agent uses it to open the built documentation site in a real browser, take a DOM snapshot, and verify that headings and images rendered correctly.&lt;/p&gt;

&lt;p&gt;The verification flow looks like this. After the agent writes a page, it runs &lt;code&gt;playwright-cli snapshot&lt;/code&gt; to get the full DOM tree and checks that the H1 matches, all images loaded, the sidebar navigation includes the new page, and the table of contents lists the right H2 headings. If something is missing or broken, it fixes the page and rebuilds.&lt;/p&gt;
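&lt;p&gt;Two of those checks, sketched against raw HTML with the standard library. This is not playwright-cli's actual snapshot format; it just shows what verifying the H1 and the images amounts to:&lt;/p&gt;

```python
from html.parser import HTMLParser

class PageCheck(HTMLParser):
    """Collect the H1 text and whether every img tag has a src."""

    def __init__(self):
        super().__init__()
        self.h1 = ""
        self._in_h1 = False
        self.images_ok = True

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "img" and not dict(attrs).get("src"):
            self.images_ok = False

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1 += data

def verify_page(page_html, expected_h1):
    """True when the H1 matches and no image is missing its source."""
    checker = PageCheck()
    checker.feed(page_html)
    return checker.h1.strip() == expected_h1 and checker.images_ok
```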

&lt;p&gt;This matters because a build passing does not mean the page looks right. Rspress generates static HTML that hydrates with React, so a page can exist but render incorrectly if something is off in the markdown or frontmatter. Playwright actually loads the page in a browser engine and lets the agent inspect what a user would see. It catches dead images, broken navigation links, callouts that rendered as raw markdown instead of styled containers, and layout issues that only show up in the browser.&lt;/p&gt;

&lt;p&gt;Two tools, two targets. Peekaboo verifies the app. Playwright CLI verifies the docs about the app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Working With the Agent, Not Watching It
&lt;/h2&gt;

&lt;p&gt;I want to be clear about something: this was not me kicking off an agent and walking away. It was a constant back-and-forth, like working with a colleague sitting right next to you.&lt;/p&gt;

&lt;p&gt;Every page went through iteration. I would review what the agent wrote, point out what was wrong, ask for restructuring, and push back on phrasing. The getting started guide in particular went through several rounds of reworking. What is the right order to introduce features? Should installation come before the platform tour or after? How do you title a page so someone scanning the sidebar instantly knows what it covers? These are editorial decisions that an agent cannot make alone.&lt;/p&gt;

&lt;p&gt;One technique that worked well was passing screenshots directly to the agent and saying "check all the clickable items on this and document anything I missed." This shifted the process from documenting based on source code to documenting based on what a user actually sees. The agent could look at a screenshot, identify buttons, tabs, and menu items through OCR, cross-reference them with the existing docs, and flag the gaps. That is how I caught undocumented features like the embedded browser and the code editor panel in Phase 6.&lt;/p&gt;

&lt;p&gt;The quality of what the agent produced was good first-draft material that needed editorial direction, not a rewrite. The voice was right because the skill defined it. The structure was right because the template enforced it. What I spent my time on was the higher-level decisions: how to organize the getting started flow, what to emphasize, what to cut, and making sure the documentation told a coherent story rather than just listing features.&lt;/p&gt;

&lt;p&gt;You can see the output at &lt;a href="https://docs.theaiplatform.app/" rel="noopener noreferrer"&gt;docs.theaiplatform.app&lt;/a&gt;. The &lt;a href="https://docs.theaiplatform.app/guide/getting-started/" rel="noopener noreferrer"&gt;Platform Tour&lt;/a&gt; shows the structure I landed on for the getting started flow. The &lt;a href="https://docs.theaiplatform.app/guide/chat/" rel="noopener noreferrer"&gt;Chat section&lt;/a&gt; shows how a feature area breaks down into overview, channels, and messaging pages. The &lt;a href="https://docs.theaiplatform.app/guide/settings/" rel="noopener noreferrer"&gt;Settings section&lt;/a&gt; shows the most straightforward pages where the structure was consistent enough that the agent could produce near-final drafts with minimal editing.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Day-by-Day Walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Day 1: The Kickoff
&lt;/h3&gt;

&lt;p&gt;Day 1 was about the plan. I sat down and mapped out the phased approach: what to tackle in what order, how to break 55 pages into manageable batches, and what the agent would need to know before writing the first page. This was the most important work of the entire sprint. The product was also being rebranded, so I ran a rename pass across the existing documentation. Four commits. No new content yet, but the groundwork was laid.&lt;/p&gt;

&lt;h3&gt;
  
  
  Day 2: The Evening Sprint
&lt;/h3&gt;

&lt;p&gt;Phases 0 through 4 in a single evening. This sounds aggressive, and it was. But the phased plan made it possible. Each phase had a clear scope, and the agent could read the source code to understand each feature before writing about it.&lt;/p&gt;

&lt;p&gt;The first commit kicked off Phase 0, which restructured everything, moving 6,769 lines of developer-focused content out of the user-facing docs. Then Phases 1 through 4 each produced a batch of pages with screenshots.&lt;/p&gt;

&lt;p&gt;Twelve commits in about ninety minutes. All the scaffolding, all the content, all the initial screenshots. The quality was rough in places (I would fix that in later phases), but the coverage was there. Every major section of the product had at least a first-draft page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Day 3: The Real Work
&lt;/h3&gt;

&lt;p&gt;Day 3 had 43 commits. This is where the polish happened and where most of the problems surfaced.&lt;/p&gt;

&lt;p&gt;Phase 5 started with adding missing screenshots and cross-links. Then the big disruption: the app's sidebar got redesigned mid-sprint. Text labels were replaced with an icon rail. Every screenshot showing the sidebar was wrong. Every navigation step clicking a text label was broken. The manifest paid for itself here. I updated the navigation steps, re-ran the batch, and had all 59 screenshots regenerated in minutes instead of retaking them by hand.&lt;/p&gt;

&lt;p&gt;I also added &lt;code&gt;reset&lt;/code&gt; steps to the manifest on day 3. Before each screenshot, the runner presses Escape twice and clicks the Chat icon to return to a known state. Without this, a failed screenshot left the app in a broken state that cascaded into every subsequent capture.&lt;/p&gt;
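&lt;p&gt;The shape of that fix, sketched as the runner's batch loop. The &lt;code&gt;reset&lt;/code&gt; and capture callbacks here are stand-ins for the real Peekaboo calls (press Escape twice, click the Chat icon):&lt;/p&gt;

```python
def run_batch(shots, capture, reset):
    """Capture each shot from a known state so one failure cannot
    cascade into every subsequent capture."""
    results = {}
    for shot in shots:
        reset()  # e.g. press Escape twice, click the Chat icon
        try:
            capture(shot)
            results[shot] = "ok"
        except RuntimeError as err:
            results[shot] = f"failed: {err}"
    return results
```

&lt;p&gt;Without the &lt;code&gt;reset()&lt;/code&gt; call at the top of the loop, a capture that dies with a dialog open poisons every capture after it.&lt;/p&gt;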

&lt;h3&gt;
  
  
  Day 4: Finish and Ship
&lt;/h3&gt;

&lt;p&gt;Day 4 was Phase 6 (undocumented features) plus a thorough review pass. The embedded browser and code editor panels had no documentation at all. The agent read the source components, I opened the app to verify what the UI actually looked like, and we wrote the pages together.&lt;/p&gt;

&lt;p&gt;The review pass caught real issues: contradictory text on the account creation page, screenshots that were cropped too loosely, duplicate content between the workflows overview and the build-and-run page.&lt;/p&gt;

&lt;p&gt;The final commit merged the PR: 55 documentation pages, 59 screenshots, and the three skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Broke Along the Way
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Rebrand
&lt;/h3&gt;

&lt;p&gt;The product was rebranded from Zephyr Agency to The AI Platform during the documentation sprint. The rename itself is mechanically simple (find and replace), but the follow-on work is not. Alt text on 59 screenshots. Config files. Every page referencing the product name. Sentences that started with the product name suddenly reading awkwardly with the article "The" prepended. This is not an agent problem. It is just the reality of documenting a product that is still evolving. But it added real friction to a sprint that was already moving fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  OCR Is Not Perfect
&lt;/h3&gt;

&lt;p&gt;The Vision framework's OCR is very good, but not flawless. It occasionally misreads text. "Get update" becomes "Get undate." The letter "I" gets confused with "l" in certain fonts. When the agent tries to click "Get update" and OCR returns "Get undate," the navigation step fails.&lt;/p&gt;

&lt;p&gt;The workaround I built into the skill: search for a substring instead of the full text, use nearby anchor text to disambiguate, or fall back to coordinate-based clicking. The &lt;code&gt;continue_on_failure&lt;/code&gt; flag on manifest steps lets non-critical navigation steps fail without aborting the entire screenshot.&lt;/p&gt;
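&lt;p&gt;The fallback order, sketched in a few lines. The function name, the five-character prefix heuristic, and the box format are my own illustration, not the skill's actual code:&lt;/p&gt;

```python
def find_click_target(label, ocr_boxes, fallback_xy=None):
    """Resolve a click target despite OCR misreads.

    ocr_boxes maps recognized text to (x, y) centers. Tries an exact
    match, then a prefix match in either direction, then falls back
    to a known coordinate.
    """
    if label in ocr_boxes:
        return ocr_boxes[label]
    for text, xy in ocr_boxes.items():
        # "Get update" misread as "Get undate" still shares "Get u".
        if label[:5] in text or text[:5] in label:
            return xy
    return fallback_xy
```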

&lt;h3&gt;
  
  
  Tooltips and Hover States
&lt;/h3&gt;

&lt;p&gt;Moving the mouse to click an element sometimes triggers a tooltip that appears in the screenshot. The fix was straightforward once I understood it: move the cursor away from interactive elements before capturing. The script now does this automatically, but it cost me a round of retakes before I figured out what was happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Worked Surprisingly Well
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Skills as Accumulated Knowledge
&lt;/h3&gt;

&lt;p&gt;The three skills started small and grew with every problem I hit. The &lt;code&gt;doc-screenshots&lt;/code&gt; skill began as a wrapper around &lt;code&gt;screencapture&lt;/code&gt; and Pillow. By the end, it had manifest batch processing, audit mode, validation, reset steps, coordinate-based fallbacks, card-level pixel scanning, and anti-tooltip cursor management.&lt;/p&gt;

&lt;p&gt;Each improvement was triggered by a real problem. And because skills persist across sessions, the fix was permanent. The next time anyone on the team works on documentation, all of those fixes are already loaded.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Manifest as a Screenshot Database
&lt;/h3&gt;

&lt;p&gt;Defining all 59 screenshots declaratively in YAML turned out to be the single most valuable technical decision. Not because batch capture is faster than individual capture (it is), but because it made screenshots a reproducible artifact. The sidebar redesign on day 3 proved it: update a width constant and a few navigation steps, run one command, and all 59 screenshots are regenerated. No manual retakes.&lt;/p&gt;
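&lt;p&gt;For a concrete picture, here is roughly what one manifest entry could look like. The field names are illustrative rather than the skill's actual schema; only &lt;code&gt;continue_on_failure&lt;/code&gt; comes from the real manifest.&lt;/p&gt;

```yaml
# One screenshot defined declaratively (field names are illustrative)
- id: channels-sidebar
  output: docs/public/img/channels-sidebar.png
  steps:
    - click_text: "Channels"
    - click_text: "general"
      continue_on_failure: true  # non-critical step: don't abort the batch
  crop:
    width: 280  # sidebar width constant; one edit regenerates every crop
```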

&lt;h3&gt;
  
  
  Reading Source Code for Accuracy
&lt;/h3&gt;

&lt;p&gt;The agent reads the actual source code before writing documentation. When the docs said "click the + button next to Channels," it was because the agent had found that button in the component tree, not because it was guessing. That said, source code is not always the final truth. The running app sometimes differs from what the code suggests. The skill instructs the agent to verify text against screenshots using OCR and update the docs when they do not match.&lt;/p&gt;

&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtsnliu4dp8fl7agwv9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtsnliu4dp8fl7agwv9t.png" alt="By the Numbers: 55 pages, 59 screenshots, 81 commits, 4-day sprint, 24K words, 3 skills built, 1 rebrand survived, 6.2 MB of images" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start the skills earlier.&lt;/strong&gt; The skills were created during the documentation sprint itself. If I had written even a rough version of the &lt;code&gt;write-docs&lt;/code&gt; and &lt;code&gt;doc-screenshots&lt;/code&gt; skills before starting, the first day would have gone smoother. The early pages needed more revision because the conventions were not yet codified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find a way to run screenshot audits in CI.&lt;/strong&gt; As mentioned above, the navigation clicks need a real display, so CI is not an option yet. But even running &lt;code&gt;--audit&lt;/code&gt; locally before merging a PR that touches the UI would catch most stale screenshots early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write the manifest first, content second.&lt;/strong&gt; I wrote pages and captured screenshots as I went. It would have been faster to define the full manifest up front (just the navigation steps, no content), run it once to see what the app actually looks like everywhere, and then write the pages based on real screenshots instead of source code alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Take Away
&lt;/h2&gt;

&lt;p&gt;If you are thinking about using an AI agent for documentation, here is what I think matters most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teach the agent, do not just instruct it.&lt;/strong&gt; A prompt that says "write documentation for this feature" produces generic content. A skill that defines your voice, your formatting rules, your page structure, and your verification checklist produces documentation that sounds like your team wrote it. The upfront investment in the skill pays off on every subsequent page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make screenshots reproducible.&lt;/strong&gt; Manual screenshots are the first thing that goes stale. A declarative manifest that can regenerate every screenshot in one command is worth the engineering effort. It changes screenshots from a one-time cost to a maintained artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase your work.&lt;/strong&gt; Even if you are using an agent, "write all the docs" is not a plan. Break it into phases with clear scope and clear deliverables. This gives you stopping points, review points, and the ability to course-correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expect things to break.&lt;/strong&gt; OCR will misread text. The UI will change mid-sprint. Preview URLs will go stale. The difference between a frustrating experience and a productive one is whether you encode the fix into a skill so it never happens again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review everything.&lt;/strong&gt; The agent does not replace your judgment. It replaces the mechanical work. You still need to read every page, check every screenshot, and verify that the documentation matches what the user actually sees. The agent writes the first draft. You make it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Docs Agent-Ready
&lt;/h2&gt;

&lt;p&gt;Writing 55 pages for humans was only half the problem. Agents need to read documentation too.&lt;/p&gt;

&lt;p&gt;I added &lt;a href="https://docs.theaiplatform.app/llms.txt" rel="noopener noreferrer"&gt;llms.txt&lt;/a&gt; and &lt;a href="https://docs.theaiplatform.app/llms-full.txt" rel="noopener noreferrer"&gt;llms-full.txt&lt;/a&gt; to the documentation site using the Rspress &lt;code&gt;@rspress/plugin-llms&lt;/code&gt; plugin. The &lt;code&gt;llms.txt&lt;/code&gt; file is a structured index of every page with one-line descriptions. The &lt;code&gt;llms-full.txt&lt;/code&gt; file is the entire documentation site as a single 3,000-line markdown file that an agent can ingest in one request. Every page also has "Copy as Markdown" and "Open in Claude" buttons so users can feed specific pages to an LLM directly.&lt;/p&gt;
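&lt;p&gt;The wiring is a small config change. Roughly, in &lt;code&gt;rspress.config.ts&lt;/code&gt; (treat the exact options as an assumption; the plugin's defaults are enough here):&lt;/p&gt;

```typescript
// rspress.config.ts: generate llms.txt and llms-full.txt at build time
import { defineConfig } from 'rspress/config';
import { pluginLlms } from '@rspress/plugin-llms';

export default defineConfig({
  plugins: [pluginLlms()],
});
```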

&lt;p&gt;This is live now. Any agent that can fetch a URL can read the entire documentation in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automated Video Walkthroughs (Work in Progress)
&lt;/h2&gt;

&lt;p&gt;Screenshots document a single state. But some features are easier to understand when you see them in motion. Creating a channel, mentioning a specialist, watching the response stream in. These are flows, not static screens.&lt;/p&gt;

&lt;p&gt;I have a proof of concept for automated video walkthroughs using Peekaboo. The same manifest that defines screenshot navigation steps can drive a screen recording session: navigate to the starting point, start recording, walk through the steps, stop recording. The tooling exists in early form and produces usable results, but it is not production-ready yet. I am still working on consistent timing, smooth scrolling, and keeping the recordings tight enough to be useful without being rushed.&lt;/p&gt;

&lt;p&gt;The goal is to embed these videos directly in the documentation pages so that when the UI changes, both screenshots and videos can be regenerated from the same manifest. That is not done yet, but the foundation is there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: Documentation in an Agent-First World
&lt;/h2&gt;

&lt;p&gt;Here is what I keep thinking about. I just spent four days writing 55 pages of documentation. It is good documentation. People will use it. But the way people use software is changing.&lt;/p&gt;

&lt;p&gt;If you have a product with AI specialists built in, the product itself can guide you. Instead of leaving the app to read a documentation page about how to create a workflow, you ask the specialist in the app and it walks you through it. Instead of searching the docs for how to configure a setting, you describe what you want and the agent does it for you.&lt;/p&gt;

&lt;p&gt;That does not mean documentation is dead. It means its role is shifting. Documentation becomes the knowledge layer that agents draw from. The &lt;code&gt;llms.txt&lt;/code&gt; work is a step in that direction. But the bigger shift is making the product itself so intuitive, with specialists that genuinely help, that fewer people need to leave the app to figure things out.&lt;/p&gt;

&lt;p&gt;We are not there yet. Right now, the documentation is essential. But the future we are building toward is one where the product teaches you how to use it, and documentation exists as a reference layer for agents and for the edge cases that in-app guidance does not cover.&lt;/p&gt;




&lt;p&gt;The documentation is live at &lt;a href="https://docs.theaiplatform.app/" rel="noopener noreferrer"&gt;docs.theaiplatform.app&lt;/a&gt;. If you want to try &lt;a href="https://theaiplatform.app" rel="noopener noreferrer"&gt;The AI Platform&lt;/a&gt;, it is available for macOS, Windows, and Linux.&lt;/p&gt;

&lt;p&gt;And yes, this blog post was also created using Goose. It took about five hours of back-and-forth: pulling git history, running the audit and compare, timing preview builds, drafting sections, and then iterating step by step, redrafting, re-checking, and fixing everything until it was right. Agent-driven, not agent-written. Same process as the docs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>documentation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Used AI to Fix Our E2E Test Architecture</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:28:37 +0000</pubDate>
      <link>https://forem.com/debs_obrien/how-i-used-ai-to-fix-our-e2e-test-architecture-444a</link>
      <guid>https://forem.com/debs_obrien/how-i-used-ai-to-fix-our-e2e-test-architecture-444a</guid>
      <description>&lt;p&gt;I joined a project with an existing Playwright E2E test suite, 38 spec files, ~165 tests, around 14,000 lines of test infrastructure. My first step was simple: run the tests locally.&lt;/p&gt;

&lt;p&gt;8 out of 130 non-skipped tests passed. A 6% pass rate.&lt;/p&gt;

&lt;p&gt;The confusing part? CI was green. It turned out CI ran everything with &lt;code&gt;workers: 1&lt;/code&gt;; locally, multiple workers hitting the shared dev environment meant the tests just couldn't pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Analysis — asking questions I didn't know the answers to
&lt;/h2&gt;

&lt;p&gt;I had zero domain knowledge of this codebase. No context on why tests were written a certain way, what the custom wrappers did, or where the real problems were. So I started asking AI to analyze everything: the Playwright configs, the page objects, the spec files, the CI workflows. I asked questions to help me understand the codebase and to figure out what we could do to get tests running locally.&lt;/p&gt;

&lt;p&gt;Over a few days, this produced 18 analysis documents covering &lt;strong&gt;Architecture&lt;/strong&gt;, &lt;strong&gt;Root causes&lt;/strong&gt;, &lt;strong&gt;Anti-patterns&lt;/strong&gt;, &lt;strong&gt;Silent bugs&lt;/strong&gt;, and &lt;strong&gt;Test isolation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The analysis phase was about building a map of a codebase I didn't understand. Every document was a question answered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: The tracer bullet plan
&lt;/h2&gt;

&lt;p&gt;With the analysis done, I had a clear picture of what needed to change. But the question was: in what order, and how do you avoid a big refactor that breaks everything?&lt;/p&gt;

&lt;p&gt;The answer was tracer bullets, a concept from &lt;em&gt;The Pragmatic Programmer&lt;/em&gt;. The idea is to build a thin end-to-end slice through all the layers to prove the architecture works, then expand from there.&lt;/p&gt;

&lt;p&gt;I created 8 tracer bullets, each targeting a specific slice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;UI fixture chain&lt;/strong&gt; — Use worker-scoped and test-scoped fixtures. Prove: fixtures work, teardown works, tests pass in CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API fixture chain&lt;/strong&gt; — Same pattern for API tests. Prove: composable fixtures work for API scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand UI migrations&lt;/strong&gt; — Apply the proven UI pattern to more files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MFE-scoped projects&lt;/strong&gt; — Split one Playwright project into 7 projects by MFE folder (Applications, Organizations, Projects, etc.), each with &lt;code&gt;dependencies: ['Setup']&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teardown project&lt;/strong&gt; — Add a cleanup project using Playwright's project dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API fixture expansion&lt;/strong&gt; — Composable API fixtures (&lt;code&gt;ownerOrg&lt;/code&gt; → &lt;code&gt;ownerProject&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI migration at scale&lt;/strong&gt; — Remaining UI spec files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API setup project&lt;/strong&gt; — Replace the no-op &lt;code&gt;globalSetup&lt;/code&gt; with a proper setup project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: the dependency graph told me which bullets could run in parallel. Bullets 1 and 2 were independent. Bullet 4 was independent. Bullet 3 depended on 1. This became important later when running multiple AI sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What a tracer bullet looked like in practice
&lt;/h3&gt;

&lt;p&gt;Bullet 1 targeted a single file with 5 tests. The steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the fixture infrastructure (&lt;code&gt;currentUser&lt;/code&gt; → &lt;code&gt;sharedOrg&lt;/code&gt; → &lt;code&gt;project&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Migrate &lt;code&gt;projects-settings-general.spec.ts&lt;/code&gt; to use the fixtures&lt;/li&gt;
&lt;li&gt;Run locally, verify tests pass&lt;/li&gt;
&lt;li&gt;Push, verify CI is green&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 3: I created a skill to do the work
&lt;/h2&gt;

&lt;p&gt;Once I had a plan with all 33 tasks organized into phases, I needed something to work through them consistently — same process every time, same quality bar, same benchmarking. So I built a skill: &lt;code&gt;pw-test-improvement&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the skill does
&lt;/h3&gt;

&lt;p&gt;A strict 7-step process for every change:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify&lt;/strong&gt; — Pick one item from the implementation tracker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt; — Run the affected tests 3× before changes, record pass rate and timing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; — Apply the change following embedded Playwright best practices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; — Run 3× after changes, all must pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare&lt;/strong&gt; — Document before/after benchmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update&lt;/strong&gt; — Mark the tracker item done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit&lt;/strong&gt; — Only when asked, with a structured PR description&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The skill had built-in knowledge: Playwright's locator priority (&lt;code&gt;getByRole&lt;/code&gt; &amp;gt; &lt;code&gt;getByLabel&lt;/code&gt; &amp;gt; &lt;code&gt;getByText&lt;/code&gt; &amp;gt; ...), a list of anti-patterns to avoid (&lt;code&gt;waitForTimeout&lt;/code&gt;, no-op assertions, CSS class selectors, forced clicks without justification), and migration patterns for replacing the &lt;code&gt;Actions&lt;/code&gt; wrapper with direct Playwright calls.&lt;/p&gt;
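&lt;p&gt;To make the migration pattern concrete, here is an illustrative before/after. The &lt;code&gt;Actions&lt;/code&gt; wrapper call and the selectors are hypothetical; the pattern is the point.&lt;/p&gt;

```typescript
import { test, expect } from '@playwright/test';

test('saves general settings', async ({ page }) => {
  // Before (anti-patterns the skill flags):
  //   await actions.clickButton('.btn-save');  // wrapper + CSS class selector
  //   await page.waitForTimeout(2000);         // hard wait
  //   expect(true).toBe(true);                 // no-op assertion

  // After: role-based locator plus a web-first assertion that auto-waits.
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Settings saved')).toBeVisible();
});
```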

&lt;p&gt;It used the Playwright CLI to run tests directly and capture results.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture changes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fixtures replaced boilerplate
&lt;/h3&gt;

&lt;p&gt;The biggest change was moving from repeated &lt;code&gt;beforeAll&lt;/code&gt;/&lt;code&gt;afterAll&lt;/code&gt; blocks to Playwright fixtures. Before: each of 5 test files independently called &lt;code&gt;getUser()&lt;/code&gt;, &lt;code&gt;createOrg()&lt;/code&gt;, &lt;code&gt;createProject()&lt;/code&gt; — 15 API calls total. After: worker-scoped fixtures shared across files — 7 calls total (53% reduction).&lt;/p&gt;

&lt;p&gt;The key distinction was &lt;strong&gt;worker-scoped vs test-scoped&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worker-scoped&lt;/strong&gt; (&lt;code&gt;{ scope: 'worker' }&lt;/code&gt;) — created once, shared across all tests in that worker. Good for expensive setup like orgs and projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test-scoped&lt;/strong&gt; (default) — created fresh for each test. Good for data that tests mutate.&lt;/li&gt;
&lt;/ul&gt;
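&lt;p&gt;As a fixture file, that split looks roughly like this. The &lt;code&gt;Org&lt;/code&gt; and &lt;code&gt;Project&lt;/code&gt; types and the &lt;code&gt;createOrg&lt;/code&gt;/&lt;code&gt;createProject&lt;/code&gt;/&lt;code&gt;deleteOrg&lt;/code&gt; helpers are hypothetical stand-ins for the project's own API client.&lt;/p&gt;

```typescript
// fixtures.ts: a minimal sketch of the worker/test split
import { test as base } from '@playwright/test';

type Org = { id: string };
type Project = { id: string; orgId: string };

declare function createOrg(): Promise<Org>;
declare function deleteOrg(id: string): Promise<void>;
declare function createProject(orgId: string): Promise<Project>;

export const test = base.extend<
  { project: Project },  // test-scoped fixtures
  { sharedOrg: Org }     // worker-scoped fixtures
>({
  // Worker-scoped: created once per worker, shared by every test it runs.
  sharedOrg: [
    async ({}, use) => {
      const org = await createOrg(); // expensive setup, done once
      await use(org);
      await deleteOrg(org.id);       // teardown runs after the worker finishes
    },
    { scope: 'worker' },
  ],

  // Test-scoped (the default): a fresh project per test, safe to mutate.
  project: async ({ sharedOrg }, use) => {
    const project = await createProject(sharedOrg.id);
    await use(project);
    // no manual try/finally: fixture teardown runs even if the test fails
  },
});
```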

&lt;h3&gt;
  
  
  Project structure
&lt;/h3&gt;

&lt;p&gt;The Playwright config went from one project running all 38 spec files to 7 projects, each pointing to its MFE folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Applications&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="nx"&gt;testDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;apps/ui/applications/e2e&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Setup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Organizations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;testDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;apps/ui/organizations/e2e&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Setup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Projects&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="na"&gt;testDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;apps/ui/projects/e2e&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Setup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="c1"&gt;// ... Subscriptions, Host, User Profile&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This meant you could run &lt;code&gt;--project=Applications&lt;/code&gt; to test just what you need, HTML reports were grouped by area, and heavy specs got their own parallelism settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  The serial cascade fix
&lt;/h3&gt;

&lt;p&gt;Four actual test failures looked like 57. Application tests used &lt;code&gt;serial&lt;/code&gt; mode, so when the first test failed, all subsequent tests in that describe block were marked "did not run." The fix: split heavy specs into a dedicated project, increase timeouts (30s → 60s for &lt;code&gt;beforeAll&lt;/code&gt;), cap workers to prevent API overload, and use worker-scoped fixtures to share expensive setup.&lt;/p&gt;
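&lt;p&gt;In config terms the fix looked roughly like this; the project name, path, and numbers are illustrative, not the real config:&lt;/p&gt;

```typescript
// playwright.config.ts (excerpt): heavy specs get their own project
{
  name: 'Applications-heavy',
  testDir: 'apps/ui/applications/e2e/heavy',
  workers: 2,       // cap workers so parallel setup doesn't overload the API
  timeout: 60_000,  // raised alongside the 30s → 60s beforeAll bump
  dependencies: ['Setup'],
},
```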

&lt;h2&gt;
  
  
  What went wrong
&lt;/h2&gt;

&lt;p&gt;Not everything worked first time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cleanup project broke CI.&lt;/strong&gt; We added a teardown project with Playwright's project dependencies to clean up test data after runs. It worked locally. In CI, it caused failures — the cleanup ran against a shared environment and interfered with other pipelines. Had to revert it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not everything should be a fixture.&lt;/strong&gt; We tried converting everything to fixtures. After reviewing the Playwright docs, we rejected one conversion before making it: worker-scoped fixtures are shared across files, which would have polluted serial tests that need per-file isolation with different options.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I worked with AI
&lt;/h2&gt;

&lt;p&gt;This wasn't "tell AI to fix it." It was a collaboration process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ask questions relentlessly&lt;/strong&gt; — "What does this method do?" "Why is this test flaky?" "According to the Playwright docs we can do X; can you verify your suggestion against the docs?" I asked hundreds of questions during the analysis phase, which lasted a few days.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge every suggestion&lt;/strong&gt; — "Are you sure? What about edge case X?" If the AI suggested a pattern, I'd ask it to explain why and whether it was sure that was a good approach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use docs as ground truth&lt;/strong&gt; — I'd link to the Playwright docs and ask "does this align with what's in the docs?" The AI's training data can be outdated; the docs are current.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Validate with multiple tools&lt;/strong&gt; — I used Goose, Claude Code, and GitHub Copilot. Different tools catch different blind spots and have different opinions, just like different teammates do.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check confidence explicitly&lt;/strong&gt; — "What's your confidence level on this? Why only a 7? How do we get to a 10?" This surfaces uncertainty the AI might not volunteer, and it digs into what we haven't thought about and how we can improve.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Running it in practice
&lt;/h3&gt;

&lt;p&gt;I ran up to 4 AI sessions in parallel — based on which tracer bullets were independent of each other. The dependency graph from the implementation plan told me what could safely run at the same time.&lt;/p&gt;

&lt;p&gt;I'd switch between sessions to check progress, read through what was being changed, and step in when something needed verifying. The AI did the mechanical work, applying patterns, running tests, capturing benchmarks. I did the oversight, deciding what to fix next, catching when a suggestion didn't look right, and verifying against the actual Playwright docs.&lt;/p&gt;

&lt;p&gt;Never more than 4 at a time. I wanted to read and understand everything that was happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we measured
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API calls per file&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;53% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI test setup lines&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;62% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API setup/cleanup lines&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;80% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files with manual try/finally&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Fixtures handle it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boilerplate removed&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~1,000 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What we created along the way
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;18 analysis documents&lt;/li&gt;
&lt;li&gt;5 implementation guides&lt;/li&gt;
&lt;li&gt;33 tasks with verification commands&lt;/li&gt;
&lt;li&gt;1 skill (test improvement)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;About testing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Green CI doesn't mean tests work locally&lt;/li&gt;
&lt;li&gt;One real failure can cascade into dozens of phantom failures in serial mode&lt;/li&gt;
&lt;li&gt;Web-first assertions (&lt;code&gt;expect(locator)&lt;/code&gt;) catch timing issues that manual checks miss&lt;/li&gt;
&lt;li&gt;Fixtures aren't always the answer; some setup belongs in &lt;code&gt;beforeAll&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;About working with AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI is better at applying known patterns than inventing new ones; give it a clear process&lt;/li&gt;
&lt;li&gt;The analysis phase was the highest-leverage use of AI; it found things I'd have missed for weeks&lt;/li&gt;
&lt;li&gt;Multiple tools &amp;gt; one tool; cross-checking catches hallucinations and builds confidence in the approach&lt;/li&gt;
&lt;li&gt;The skill made it scalable; without it, every fix would need the same instructions repeated&lt;/li&gt;
&lt;li&gt;Keep the human in the loop: 4 parallel sessions, never unattended&lt;/li&gt;
&lt;li&gt;Make time for these kinds of tasks; they cost time up front but pay for themselves many times over&lt;/li&gt;
&lt;li&gt;Treat AI like a new colleague you don't know well yet: one who never turns on their camera, so they're hard to get to know and you can't fully trust them. They have good opinions and are good at their job, but you need to be sure they've thought things through and aren't being lazy or making bad decisions.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>testing</category>
      <category>e2e</category>
      <category>playwright</category>
      <category>ai</category>
    </item>
    <item>
      <title>Getting Started with Claude Code: A Guide to Slash Commands and Tips</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Tue, 31 Mar 2026 21:06:39 +0000</pubDate>
      <link>https://forem.com/debs_obrien/getting-started-with-claude-code-a-guide-to-slash-commands-and-tips-10n1</link>
      <guid>https://forem.com/debs_obrien/getting-started-with-claude-code-a-guide-to-slash-commands-and-tips-10n1</guid>
      <description>&lt;p&gt;When you first open Claude Code, it's not immediately obvious what commands are available to you. I spent some time today exploring the slash commands and keyboard shortcuts thanks to Matt Pocock's &lt;a href="https://www.aihero.dev/cohorts/claude-code-for-real-engineers-2026-04" rel="noopener noreferrer"&gt;&lt;em&gt;Claude Code for Real Engineers&lt;/em&gt;&lt;/a&gt; course, and found them genuinely useful for day-to-day work. Here's a quick rundown of what each one does and when you might reach for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Slash Commands
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/init&lt;/code&gt; - Setting Up Your Project Instructions
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/init&lt;/code&gt; creates a &lt;code&gt;CLAUDE.md&lt;/code&gt; file where you can define instructions for how Claude should behave in your project. If you're working in a team or want consistent responses across sessions, this is a good place to start.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/terminal-setup&lt;/code&gt; - Fixing Multi-Line Input
&lt;/h3&gt;

&lt;p&gt;By default, hitting Enter sends your message immediately, which can be frustrating when you're trying to write something longer. &lt;code&gt;/terminal-setup&lt;/code&gt; configures your terminal so that &lt;strong&gt;Option + Enter&lt;/strong&gt; (or Alt + Enter on Windows) gives you a new line instead.&lt;/p&gt;

&lt;p&gt;One thing to note: you'll need to restart your terminal app after running this for the changes to take effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/model&lt;/code&gt; - Changing the Default Model
&lt;/h3&gt;

&lt;p&gt;If you want to switch which model Claude Code uses, &lt;code&gt;/model&lt;/code&gt; lets you do that. Straightforward, but easy to miss if you don't know it's there.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/usage&lt;/code&gt; - Checking Your Subscription
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/usage&lt;/code&gt; shows your current usage for your subscription plan. Handy for keeping track of where you are without having to leave the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/context&lt;/code&gt; - Understanding What's in Your Context Window
&lt;/h3&gt;

&lt;p&gt;This one I found particularly useful. &lt;code&gt;/context&lt;/code&gt; gives you a breakdown of what's currently loaded in your conversation, with estimated usage by category:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;System prompts&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Messages&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free space&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│  /context                                                   │
│  └─ Context Usage                                           │
│                                                             │
│  claude-opus-4-6 · 15k/1000k tokens (1%)                   │
│                                                             │
│  Estimated usage by category                                │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│  ● System prompt:      5.6k tokens  (0.6%)                  │
│  ● System tools:       8.3k tokens  (0.8%)                  │
│  ○ Skills:              715 tokens  (0.1%)                   │
│  ○ Messages:             58 tokens  (0.0%)                   │
│  □ Free space:         952k tokens  (95.2%)                  │
│  ■ Autocompact buffer:  33k tokens  (3.3%)                   │
│                                                             │
│  [████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 1%  │
│   ^^^^                                                      │
│   used                              free space              │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also tells you when autocompaction will happen: that's when Claude automatically compacts older context because the token limit is running low. If you've ever wondered why Claude seems to "forget" something from earlier in a long session, this command helps explain what's going on.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/clear&lt;/code&gt; - Starting Fresh
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/clear&lt;/code&gt; wipes your chat history and context window. It's essentially the same as closing and starting a new Claude session. Useful when you're switching to a completely different task and don't need the previous context hanging around.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/ide&lt;/code&gt; - Connect to Your IDE
&lt;/h3&gt;

&lt;p&gt;There's a Claude Code extension for VS Code, and you can connect to it by running &lt;code&gt;/ide&lt;/code&gt;. Once connected, things like git diffs will open in VS Code instead of displaying in the terminal. If you're reviewing changes regularly this is a much better experience: you get proper syntax highlighting and the familiar side-by-side diff view rather than trying to read through diffs in the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/resume&lt;/code&gt; - Browse Previous Sessions
&lt;/h3&gt;

&lt;p&gt;Type &lt;code&gt;/resume&lt;/code&gt; and use the &lt;strong&gt;up and down arrow keys&lt;/strong&gt; to browse through your previous sessions. There's also a search box, so you can quickly find a specific session among everything stored for the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Interrupting Claude with Escape
&lt;/h3&gt;

&lt;p&gt;Press &lt;strong&gt;Escape&lt;/strong&gt; at any time to interrupt Claude while it's generating a response. If you want it to continue from where it left off, just type "go." Press Escape again if you want to stop it for good.&lt;/p&gt;

&lt;p&gt;This is helpful when you realise partway through that you need to rephrase your question or Claude is heading in the wrong direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rewind with Escape + Escape
&lt;/h3&gt;

&lt;p&gt;Press &lt;strong&gt;Escape&lt;/strong&gt; twice to enter rewind mode. This lets you scroll back through your conversation using the &lt;strong&gt;up arrow key&lt;/strong&gt;. When you land on the point you want to go back to, press &lt;strong&gt;Enter&lt;/strong&gt; and you'll get a few options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Restore code and conversation&lt;/strong&gt; - rolls back both your files and the chat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restore conversation&lt;/strong&gt; - rewinds the chat but keeps your code as-is&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restore code&lt;/strong&gt; - reverts your files but keeps the conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize from here&lt;/strong&gt; - condenses everything from that point forward&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never mind&lt;/strong&gt; - cancels and takes you back to where you were&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is really useful when Claude has gone down the wrong path and you want to undo a series of changes without manually reverting files yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stash Your Prompt with Ctrl + S
&lt;/h3&gt;

&lt;p&gt;This one would have saved me a lot of time if I'd known about it sooner. If you're mid-way through typing a prompt and realise you need to ask something else first, press &lt;strong&gt;Ctrl + S&lt;/strong&gt; to stash it. Your current prompt gets set aside, you can type and submit something else, and then the stashed prompt automatically restores in the input field, ready for you to send or stash again.&lt;/p&gt;

&lt;p&gt;If you decide you no longer need the stashed prompt, just press &lt;strong&gt;Ctrl + C&lt;/strong&gt; to get rid of it.&lt;/p&gt;

&lt;p&gt;Before I knew this existed, I was copying my prompt to the clipboard, typing the other thing, and then pasting it back in. Not the end of the world, but once you've done that a few times in a session it gets old fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Paste Images Directly into Claude Code
&lt;/h3&gt;

&lt;p&gt;Something I didn't expect from a terminal-based tool: you can copy and paste images right into Claude Code. Just copy an image and paste it into the input field, then ask questions about it. Useful for things like sharing a screenshot of an error and asking what's wrong, pasting a design mockup and asking Claude to build it, or getting help interpreting a diagram or chart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bash Mode with &lt;code&gt;!&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Prefix any input with &lt;code&gt;!&lt;/code&gt; to run it as a bash command directly from Claude Code. For example, &lt;code&gt;!npm run typecheck&lt;/code&gt; will run your typecheck and show the output. The useful part here is that any error messages from those commands are now in Claude's context, so you can immediately ask it to help fix whatever went wrong.&lt;/p&gt;

&lt;p&gt;You can also run long-running processes like &lt;code&gt;!npm run dev&lt;/code&gt; and then press &lt;strong&gt;Ctrl + B&lt;/strong&gt; to send it to the background. You'll see a message like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Command was manually backgrounded by user with ID: be96u9i91. Output is being written...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A background task indicator will appear, and you can use the &lt;strong&gt;arrow keys&lt;/strong&gt; to navigate to it and press &lt;strong&gt;Enter&lt;/strong&gt; to view the shell details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Shell details

Status:  running
Runtime: 2m 15s
Command: npm run dev

Output:
&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;dev
&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;react-router dev
&lt;span class="go"&gt;  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the shell details view, you can press &lt;strong&gt;X&lt;/strong&gt; to stop the background process, or press the &lt;strong&gt;left arrow key&lt;/strong&gt; to go back to your conversation.&lt;/p&gt;

&lt;p&gt;This means you can keep your dev server running in the background while continuing to work with Claude in the foreground. Because the output is being captured, Claude can see what's happening with the process, so if something crashes or throws an error, it already has that context and can help you debug it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suspend Claude with Ctrl + Z
&lt;/h3&gt;

&lt;p&gt;If you need to run a bash command outside of Claude, something you don't want in its context, press &lt;strong&gt;Ctrl + Z&lt;/strong&gt; to suspend the process. Run whatever you need to in your terminal, then type &lt;code&gt;fg&lt;/code&gt; to bring Claude back. Handy for things like checking credentials, running unrelated scripts, or anything you'd rather keep out of the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ending and Resuming Sessions
&lt;/h3&gt;

&lt;p&gt;Press &lt;strong&gt;Ctrl + C&lt;/strong&gt; twice to end your current session. Claude persists sessions locally, so when you exit it gives you a command to resume that session, something like &lt;code&gt;claude --resume &amp;lt;session-id&amp;gt;&lt;/code&gt;. Just copy and paste it to pick up where you left off.&lt;/p&gt;

&lt;p&gt;If you've already closed the session and didn't save the command, no problem. Open Claude Code and use the &lt;code&gt;/resume&lt;/code&gt; slash command to browse your history.&lt;/p&gt;

&lt;p&gt;If you just want to jump straight back into your most recent session, &lt;code&gt;claude --continue&lt;/code&gt; does exactly that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Permissions
&lt;/h3&gt;

&lt;p&gt;When Claude needs to run something, it will ask for permission with a few options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Yes&lt;/strong&gt; - allow it this once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Yes, and don't ask again for...&lt;/strong&gt; - allow it going forward&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No&lt;/strong&gt; - block it, with the option to give a reason or suggest a different command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These choices are saved to a file called &lt;code&gt;settings.local.json&lt;/code&gt; inside the &lt;code&gt;.claude&lt;/code&gt; folder in your project. Inside that file you'll find a &lt;code&gt;permissions&lt;/code&gt; property with an &lt;code&gt;allow&lt;/code&gt; array listing everything you've approved. You can edit this manually to add commands, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pnpm typecheck)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pnpm *)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git push *)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use wildcards to allow a range of commands: &lt;code&gt;Bash(pnpm *)&lt;/code&gt; will permit any pnpm command. Use &lt;code&gt;deny&lt;/code&gt; to explicitly block things you never want Claude to run, like &lt;code&gt;Bash(git push *)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Permissions aren't limited to bash commands either; they also cover things like web search and other tools.&lt;/p&gt;

&lt;p&gt;By default, &lt;code&gt;settings.local.json&lt;/code&gt; is ignored via &lt;code&gt;.gitignore&lt;/code&gt; so your permissions stay local to your machine. If you want to share them with your team, rename the file to &lt;code&gt;settings.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Hope this helps you move faster with Claude. Have fun.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I built a practical agent skill that turns rough READMEs into polished project docs</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Tue, 24 Mar 2026 21:44:06 +0000</pubDate>
      <link>https://forem.com/debs_obrien/how-i-built-a-practical-agent-skill-that-turns-rough-readmes-into-polished-project-docs-2mef</link>
      <guid>https://forem.com/debs_obrien/how-i-built-a-practical-agent-skill-that-turns-rough-readmes-into-polished-project-docs-2mef</guid>
      <description>&lt;p&gt;If you're new to agent skills, start with my beginner guide first:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/debs_obrien/what-are-agent-skills-beginners-guide-e2n"&gt;What Are Agent Skills? Beginners Guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That post covers what skills are, how they get loaded, and how to build a tiny one from scratch.&lt;/p&gt;

&lt;p&gt;This post picks up where that one stops.&lt;/p&gt;

&lt;p&gt;Instead of another tiny example, I want to show you what a practical skill looks like when it solves a real problem.&lt;/p&gt;

&lt;p&gt;We are going to take the idea of a skill and use it to turn rough project READMEs into polished docs that are consistent, accurate, and reusable across repos. I picked README generation because the output is easy to judge, it comes up again and again, and once you get it right for one project you want the same quality bar everywhere.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vdwqr7y2i1m5t5vki8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vdwqr7y2i1m5t5vki8r.png" alt="Before vs After" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with one-off README prompts
&lt;/h2&gt;

&lt;p&gt;You can absolutely ask an agent to improve your README and get something decent back.&lt;/p&gt;

&lt;p&gt;Sometimes it will even be very good.&lt;/p&gt;

&lt;p&gt;But if you do that across multiple projects, the cracks show up quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;badge styles are inconsistent&lt;/li&gt;
&lt;li&gt;section order changes from repo to repo&lt;/li&gt;
&lt;li&gt;install commands drift away from the actual package manager&lt;/li&gt;
&lt;li&gt;social links get guessed&lt;/li&gt;
&lt;li&gt;simple projects end up with bloated READMEs&lt;/li&gt;
&lt;li&gt;the agent repeats the same repo-scanning work every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of problem skills are good at solving.&lt;/p&gt;

&lt;p&gt;Not because they magically make the model smarter, but because they turn a vague prompt into a reusable workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first version was just one file
&lt;/h2&gt;

&lt;p&gt;I did not start with a big architecture.&lt;/p&gt;

&lt;p&gt;The first version of &lt;code&gt;readme-wizard&lt;/code&gt; was just a single &lt;code&gt;SKILL.md&lt;/code&gt; with instructions telling the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detect the project name, description, license, git remote, package manager, and CI setup&lt;/li&gt;
&lt;li&gt;add a better structure to the README&lt;/li&gt;
&lt;li&gt;use shields.io badges&lt;/li&gt;
&lt;li&gt;include a Quick Start section with real commands&lt;/li&gt;
&lt;li&gt;show a project structure tree&lt;/li&gt;
&lt;li&gt;add contributor avatars, documentation links, and optional social badges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That first version worked.&lt;/p&gt;

&lt;p&gt;And that matters.&lt;/p&gt;

&lt;p&gt;One of the easiest mistakes to make with agent workflows is over-engineering too early. A single file is often enough to prove whether the workflow is useful before you invest more time into it.&lt;/p&gt;

&lt;p&gt;Here is the important part: start with the smallest thing that can produce a useful result on a real project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke in practice
&lt;/h2&gt;

&lt;p&gt;Once I started testing the skill on real repos, the limitations showed up quickly.&lt;/p&gt;

&lt;p&gt;The main issue was not that the agent could not write a README. It could.&lt;/p&gt;

&lt;p&gt;The issue was consistency.&lt;/p&gt;

&lt;p&gt;The single-file version was asking the &lt;code&gt;SKILL.md&lt;/code&gt; to do too many jobs at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;writing guidance&lt;/li&gt;
&lt;li&gt;badge formats&lt;/li&gt;
&lt;li&gt;project-type adaptation rules&lt;/li&gt;
&lt;li&gt;README structure templates&lt;/li&gt;
&lt;li&gt;Mermaid diagram templates&lt;/li&gt;
&lt;li&gt;instructions for how to detect project metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That creates a few problems.&lt;/p&gt;

&lt;p&gt;First, the file gets bloated fast. By the time I had all those rules and templates inline, it was over 150 lines and hard to maintain.&lt;/p&gt;

&lt;p&gt;Second, the agent had to figure out how to inspect the repo on every single run. There was no scanning script yet — just instructions saying "detect the package manager, find the license, parse the git remote." The agent would improvise that detection work each time. Sometimes it got it right. Sometimes it missed a CI workflow file, guessed at the wrong package manager, or invented social links that did not exist.&lt;/p&gt;

&lt;p&gt;Third, all of that detection reasoning burned tokens and produced inconsistent results. The kind of work that should be boring and repeatable was instead fuzzy and error-prone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The turning point: treat the skill like a workflow, not a prompt
&lt;/h2&gt;

&lt;p&gt;That was the point where the skill stopped being just a better prompt and started becoming a real workflow.&lt;/p&gt;

&lt;p&gt;The structure ended up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.agents/skills/readme-wizard/
├── SKILL.md
├── scripts/
│   └── scan_project.sh
├── references/
│   └── readme-best-practices.md
├── assets/
│   ├── badges.json
│   ├── diagrams.md
│   └── readme-template.md
└── evals/
    └── evals.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every part has a different job. And that is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;SKILL.md&lt;/code&gt; became the orchestrator
&lt;/h2&gt;

&lt;p&gt;Instead of being one giant wall of instructions, &lt;code&gt;SKILL.md&lt;/code&gt; became the thin coordinator.&lt;/p&gt;

&lt;p&gt;Its job is to define the workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;run the scan script&lt;/li&gt;
&lt;li&gt;read the README best-practices guide&lt;/li&gt;
&lt;li&gt;build from the template&lt;/li&gt;
&lt;li&gt;pull badge formats from the badge catalog&lt;/li&gt;
&lt;li&gt;validate against the eval assertions&lt;/li&gt;
&lt;li&gt;only load diagram templates if the project actually needs them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is a much better use of the main skill file.&lt;/p&gt;

&lt;p&gt;It keeps the top-level instructions focused on sequence and judgment instead of burying everything in one place.&lt;/p&gt;

&lt;p&gt;Here is what the workflow section of the final &lt;code&gt;SKILL.md&lt;/code&gt; looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Workflow&lt;/span&gt;

&lt;span class="gu"&gt;### 1. Scan the project&lt;/span&gt;
Run &lt;span class="sb"&gt;`scripts/scan_project.sh &amp;lt;project-directory&amp;gt;`&lt;/span&gt; to collect structured JSON metadata.

&lt;span class="gu"&gt;### 2. Read the best practices guide&lt;/span&gt;
Read &lt;span class="sb"&gt;`references/readme-best-practices.md`&lt;/span&gt; before writing.

&lt;span class="gu"&gt;### 3. Build the README&lt;/span&gt;
Use &lt;span class="sb"&gt;`assets/readme-template.md`&lt;/span&gt; as the base structure.
Replace {{PLACEHOLDER}} markers with actual project data from the scan.

&lt;span class="gu"&gt;### 4. Add badges&lt;/span&gt;
Read &lt;span class="sb"&gt;`assets/badges.json`&lt;/span&gt; for the full badge catalog.
Only include badges for things that actually exist.

&lt;span class="gu"&gt;### 5. Validate the output&lt;/span&gt;
Review the generated README against the assertions in &lt;span class="sb"&gt;`evals/evals.json`&lt;/span&gt;.

&lt;span class="gu"&gt;### 6. Optionally add a diagram&lt;/span&gt;
Only read &lt;span class="sb"&gt;`assets/diagrams.md`&lt;/span&gt; if the project has multiple components.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Short, focused, and easy to follow. Each step points to another file instead of trying to carry everything inline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The script handled the mechanical work
&lt;/h2&gt;

&lt;p&gt;The biggest improvement was moving repo scanning into a script.&lt;/p&gt;

&lt;p&gt;The skill now runs &lt;code&gt;scripts/scan_project.sh &amp;lt;project-directory&amp;gt;&lt;/code&gt; and gets structured JSON back with things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project name&lt;/li&gt;
&lt;li&gt;description&lt;/li&gt;
&lt;li&gt;license&lt;/li&gt;
&lt;li&gt;owner and repo&lt;/li&gt;
&lt;li&gt;package manager&lt;/li&gt;
&lt;li&gt;CI provider and workflows&lt;/li&gt;
&lt;li&gt;social links&lt;/li&gt;
&lt;li&gt;directory structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of the agent improvising that detection work every time, it runs one script and gets clean, structured data back. Boring and repeatable. Exactly what you want for metadata gathering.&lt;/p&gt;
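&lt;p&gt;As a rough sketch, the detection step might look something like this. This is hypothetical: the real &lt;code&gt;scan_project.sh&lt;/code&gt; collects many more fields, including CI workflows, social links, and the directory structure.&lt;/p&gt;

```shell
# Hypothetical sketch of the detection work scan_project.sh performs.
# The point: lockfile checks are deterministic, so the agent never
# has to guess which package manager a repo uses.
scan_project() {
  local dir="${1:-.}"
  local pm="unknown"
  # Detect the package manager from lockfiles rather than improvising
  if [ -f "$dir/pnpm-lock.yaml" ]; then pm="pnpm"
  elif [ -f "$dir/yarn.lock" ]; then pm="yarn"
  elif [ -f "$dir/package-lock.json" ]; then pm="npm"
  fi
  local license="false"
  if [ -f "$dir/LICENSE" ] || [ -f "$dir/LICENSE.md" ]; then license="true"; fi
  # Emit structured JSON the agent can consume directly
  printf '{"name":"%s","package_manager":"%s","has_license":%s}\n' \
    "$(basename "$dir")" "$pm" "$license"
}
```

&lt;p&gt;Run against a pnpm repo with a LICENSE file, this prints something like &lt;code&gt;{"name":"my-app","package_manager":"pnpm","has_license":true}&lt;/code&gt;, which the agent can drop straight into its template instead of reasoning about it.&lt;/p&gt;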

&lt;p&gt;The current reference version also goes a bit further. It checks local files first, then uses the GitHub API to look up the repo homepage and crawls it for additional social links. That is a good example of how a skill can evolve — start with the reliable local-file path, then add enrichment once the core workflow is stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  References and assets gave everything a home
&lt;/h2&gt;

&lt;p&gt;The remaining pieces fell into two folders.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;references/readme-best-practices.md&lt;/code&gt; holds the writing guidance: section order, tone, project-type adaptation, badge rules, and common pitfalls. The agent only reads it when it is about to write, not every time the skill loads.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;assets/&lt;/code&gt; holds reusable inputs: &lt;code&gt;badges.json&lt;/code&gt; for badge formats, &lt;code&gt;readme-template.md&lt;/code&gt; for the base README structure, and &lt;code&gt;diagrams.md&lt;/code&gt; for Mermaid templates when a project is complex enough to justify one.&lt;/p&gt;

&lt;p&gt;This is where the skill becomes easy to customize. Want to change badge styles? Edit the badge catalog. Want a different README structure? Edit the template. Want to skip diagrams for simpler repos? The skill just avoids loading that asset entirely.&lt;/p&gt;
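&lt;p&gt;For illustration, the badge catalog could be as simple as a map of badge names to shields.io URL patterns. The field names here are made up, not the exact schema of the real &lt;code&gt;badges.json&lt;/code&gt;:&lt;/p&gt;

```json
{
  "license": "https://img.shields.io/github/license/{{OWNER}}/{{REPO}}",
  "ci": "https://img.shields.io/github/actions/workflow/status/{{OWNER}}/{{REPO}}/{{WORKFLOW}}",
  "npm-version": "https://img.shields.io/npm/v/{{PACKAGE}}"
}
```

&lt;p&gt;Changing badge styles across every repo then means editing one file instead of re-prompting the agent.&lt;/p&gt;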

&lt;p&gt;Keeping domain knowledge and data out of the main instructions makes the whole thing much easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evals made the quality bar explicit
&lt;/h2&gt;

&lt;p&gt;Once the skill was doing real work, I wanted a way to define what good actually meant.&lt;/p&gt;

&lt;p&gt;That is what the evals are for.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;evals/evals.json&lt;/code&gt; file includes prompts for different cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a straightforward README improvement request&lt;/li&gt;
&lt;li&gt;a casual "make this look professional" request&lt;/li&gt;
&lt;li&gt;a minimal project that should not get bloated&lt;/li&gt;
&lt;li&gt;a badge-focused request that should only generate real badges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I like this part because it forces the standards out into the open.&lt;/p&gt;

&lt;p&gt;Instead of vaguely feeling that the README is better, you can check for specific things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no placeholder text&lt;/li&gt;
&lt;li&gt;badges only for real metadata&lt;/li&gt;
&lt;li&gt;Quick Start commands that match the detected package manager&lt;/li&gt;
&lt;li&gt;section depth proportional to the project&lt;/li&gt;
&lt;li&gt;no fabricated social links&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes the skill easier to improve without drifting.&lt;/p&gt;
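&lt;p&gt;A single entry in &lt;code&gt;evals/evals.json&lt;/code&gt; might look roughly like this, with illustrative field names rather than the exact schema:&lt;/p&gt;

```json
{
  "prompt": "Improve the README for this project",
  "assertions": [
    "Quick Start commands match the detected package manager",
    "No placeholder text remains in the output",
    "Badges reference only metadata found by the scan",
    "Section depth is proportional to the size of the project"
  ]
}
```

&lt;p&gt;Because the assertions are written down, every future change to the skill can be judged against the same bar.&lt;/p&gt;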

&lt;h2&gt;
  
  
  The larger lesson
&lt;/h2&gt;

&lt;p&gt;The interesting thing about this project is not really README generation.&lt;/p&gt;

&lt;p&gt;The larger lesson is that a useful skill usually stops looking like a prompt pretty quickly.&lt;/p&gt;

&lt;p&gt;It becomes a small system.&lt;/p&gt;

&lt;p&gt;Some parts should stay flexible and language-driven.&lt;/p&gt;

&lt;p&gt;Some parts should be deterministic.&lt;/p&gt;

&lt;p&gt;Some parts should be reusable data.&lt;/p&gt;

&lt;p&gt;Some parts should act as tests.&lt;/p&gt;

&lt;p&gt;Once you see that pattern, it applies to a lot more than READMEs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commit message workflows&lt;/li&gt;
&lt;li&gt;code review checklists&lt;/li&gt;
&lt;li&gt;release note generation&lt;/li&gt;
&lt;li&gt;internal documentation standards&lt;/li&gt;
&lt;li&gt;repo audits&lt;/li&gt;
&lt;li&gt;team-specific engineering conventions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the shift I find most useful when working with agents.&lt;/p&gt;

&lt;p&gt;You stop asking the model to improvise the whole workflow every time.&lt;/p&gt;

&lt;p&gt;Instead, you give it a structure that makes good behavior easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want to build your own skill
&lt;/h2&gt;

&lt;p&gt;If you want to build your own skill, this is the path I would recommend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with one &lt;code&gt;SKILL.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Test it on a real project as early as possible.&lt;/li&gt;
&lt;li&gt;Watch for repeated logic and consistency failures.&lt;/li&gt;
&lt;li&gt;Move mechanical work into scripts.&lt;/li&gt;
&lt;li&gt;Move domain knowledge into references.&lt;/li&gt;
&lt;li&gt;Move templates and data into assets.&lt;/li&gt;
&lt;li&gt;Add evals once the skill matters enough to maintain.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sequence keeps the architecture earned.&lt;/p&gt;

&lt;p&gt;You are not building a folder structure for its own sake. You are extracting parts only when they prove they deserve to exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to explore the full tutorial series or inspect the finished reference implementation, the repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/debs-obrien/learn-agent-skills" rel="noopener noreferrer"&gt;debs-obrien/learn-agent-skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And if you just want to try the skill without building it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add debs-obrien/learn-agent-skills

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open any project and tell your agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Improve the README for this project using the readme-wizard skill.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not just that a skill can write a better README.&lt;/p&gt;

&lt;p&gt;The point is how you get from a useful first draft to something reusable.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>documentation</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Used Skill Creator v2 to Improve One of My Agent Skills in VS Code</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Sat, 21 Mar 2026 08:32:36 +0000</pubDate>
      <link>https://forem.com/debs_obrien/i-used-skill-creator-v2-to-improve-one-of-my-agent-skills-in-vs-code-fhd</link>
      <guid>https://forem.com/debs_obrien/i-used-skill-creator-v2-to-improve-one-of-my-agent-skills-in-vs-code-fhd</guid>
      <description>&lt;p&gt;I just published a video showing how I used Skill Creator v2 to improve an existing AI skill inside VS Code, and honestly, I was seriously surprised at how much this thing does.&lt;/p&gt;

&lt;p&gt;What impressed me most is that it does much more than just rewrite instructions.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;review an existing skill&lt;/li&gt;
&lt;li&gt;suggest targeted improvements&lt;/li&gt;
&lt;li&gt;run evals against a baseline&lt;/li&gt;
&lt;li&gt;compare outputs side by side&lt;/li&gt;
&lt;li&gt;generate benchmark summaries&lt;/li&gt;
&lt;li&gt;help optimize descriptions for better triggering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the video, I ran it against a skill I had already created to see whether the updated version actually performed better, or if there was anything I was missing.&lt;/p&gt;

&lt;p&gt;And it was a ton of fun.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Skill I Tested
&lt;/h2&gt;

&lt;p&gt;The skill I used for the demo was one I had already created called README Wizard.&lt;/p&gt;

&lt;p&gt;It basically generates a polished, professional README for any project and is meant to kick in whenever someone mentions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;improving a README&lt;/li&gt;
&lt;li&gt;project documentation&lt;/li&gt;
&lt;li&gt;badges&lt;/li&gt;
&lt;li&gt;first impressions for a repo&lt;/li&gt;
&lt;li&gt;making a GitHub repo look more professional&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also checks project metadata, reads best practices, uses badges and Mermaid diagrams, and works from a README template. (I need to create a video for this too... on it.)&lt;/p&gt;

&lt;p&gt;So rather than creating a skill from scratch, I wanted to see if Skill Creator v2 could improve something real that I had already built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding Skill Creator v2
&lt;/h2&gt;

&lt;p&gt;The first thing I did was go to &lt;code&gt;skills.sh&lt;/code&gt; and search for Anthropic.&lt;/p&gt;

&lt;p&gt;From there, I found Skill Creator, and it now shows a summary of the skill, which is nice.&lt;/p&gt;

&lt;p&gt;The skill covers test case creation and evaluation, and also runs parallel test cases with and without the skill to measure impact, capturing timing and token usage for comparison.&lt;/p&gt;

&lt;p&gt;And on top of that, it generates an interactive browser-based reviewer showing outputs, qualitative feedback, and benchmark metrics.&lt;/p&gt;

&lt;p&gt;It also includes description optimization, which is really important for improving skill triggering accuracy by testing realistic trigger and non-trigger queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the Skill
&lt;/h2&gt;

&lt;p&gt;Installing it was pretty straightforward.&lt;/p&gt;

&lt;p&gt;I copied the install command from the page, pasted it into the terminal, and then selected where I wanted the skill installed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Inside the Skill
&lt;/h2&gt;

&lt;p&gt;Before running it, I wanted to see what was actually inside the Skill Creator skill.&lt;/p&gt;

&lt;p&gt;Skills are written for agents, not really for you to sit there and read through line by line, but I always like to have a look.&lt;/p&gt;

&lt;p&gt;And this one is pretty complex.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the SKILL.md file&lt;/li&gt;
&lt;li&gt;its own agents&lt;/li&gt;
&lt;li&gt;references and schemas&lt;/li&gt;
&lt;li&gt;Python scripts&lt;/li&gt;
&lt;li&gt;an eval viewer&lt;/li&gt;
&lt;li&gt;review tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I found really cool is that it comes with its own agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an analyzer agent&lt;/li&gt;
&lt;li&gt;a comparator&lt;/li&gt;
&lt;li&gt;a grader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The analyzer looks at comparison results and tries to understand why the winner won and generate better suggestions.&lt;/p&gt;

&lt;p&gt;The comparator compares two outputs without knowing which skill produced them.&lt;/p&gt;

&lt;p&gt;The grader evaluates expectations against the execution transcript and outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running It Against a Real Skill
&lt;/h2&gt;

&lt;p&gt;I then used Skill Creator against my README Wizard skill.&lt;/p&gt;

&lt;p&gt;I’ve done this a couple of times now, and I found that in VS Code I sometimes need to be a little more explicit if I want the full benefit of the sub-agents.&lt;/p&gt;

&lt;p&gt;Claude seems to pick that up more naturally because the skill was built for it, but in VS Code I wanted to make sure it really used everything available.&lt;/p&gt;

&lt;p&gt;So I’d definitely encourage being explicit there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Found
&lt;/h2&gt;

&lt;p&gt;Very quickly, it started identifying issues with my skill.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the workflow being under-specified&lt;/li&gt;
&lt;li&gt;missing guidance for handling existing or missing READMEs&lt;/li&gt;
&lt;li&gt;README best practices being too thin&lt;/li&gt;
&lt;li&gt;sections that should only appear if relevant links exist&lt;/li&gt;
&lt;li&gt;eval coverage being too small&lt;/li&gt;
&lt;li&gt;missing edge cases&lt;/li&gt;
&lt;li&gt;limited project detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, it pointed out that a personal learning repo probably doesn’t need the same sections as every other project.&lt;/p&gt;

&lt;p&gt;It also spotted that I only had two evals and suggested adding more realistic test cases, including edge cases like minimal projects and badge-focused README requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying Improvements
&lt;/h2&gt;

&lt;p&gt;Once it had reviewed everything, it started applying targeted improvements across the skill files. This part was honestly kind of exciting to watch because it moved fast.&lt;/p&gt;

&lt;p&gt;It updated the skill instructions and made the guidance more explicit, improved the best practices, tightened up the logic around when certain sections should or shouldn’t be included, and expanded the eval coverage.&lt;/p&gt;

&lt;p&gt;I could go through the changed files while it was working and see that it wasn’t just randomly changing things. It was making focused improvements that actually made sense.&lt;/p&gt;

&lt;p&gt;That part gave me a lot of confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding More Evals
&lt;/h2&gt;

&lt;p&gt;One thing I really liked was how it expanded the eval set. I had two evals; it added more.&lt;/p&gt;

&lt;p&gt;For example, it created cases around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minimal project README generation&lt;/li&gt;
&lt;li&gt;badge-focused requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And these evals work like tests.&lt;/p&gt;

&lt;p&gt;They include assertions such as whether the output has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a project description&lt;/li&gt;
&lt;li&gt;a quick start or usage section&lt;/li&gt;
&lt;li&gt;appropriate badges&lt;/li&gt;
&lt;li&gt;the right structure for the kind of project being documented&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was super useful because it meant I wasn’t just guessing whether the skill was better. I could actually measure it.&lt;/p&gt;
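
&lt;p&gt;To give a feel for it, here’s a rough sketch of what one of those eval cases looks like conceptually (the field names here are illustrative, not Skill Creator’s exact format):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# hypothetical eval case, for illustration only
name: minimal-project-readme
prompt: "Generate a README for a small personal learning repo with no CI"
assertions:
  - output contains a short project description
  - output has a quick start or usage section
  - output omits badges (no CI is configured)
  - structure fits a minimal project rather than the full template
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;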

&lt;h2&gt;
  
  
  Running Sub-Agents in Parallel
&lt;/h2&gt;

&lt;p&gt;Then came one of the coolest parts: it launched sub-agent runs in parallel.&lt;/p&gt;

&lt;p&gt;It ran the improved skill and the old skill baseline side by side across multiple test cases.&lt;/p&gt;

&lt;p&gt;That meant it could directly compare the version with the new changes against the original version.&lt;/p&gt;

&lt;p&gt;This is where the workflow really stood out to me. It wasn’t just making edits and calling it a day. It was actually testing whether the changes improved results.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;After the runs completed, it graded all the outputs against their assertions and generated benchmark results.&lt;/p&gt;

&lt;p&gt;The improved skill outperformed the baseline on two out of four evals and tied on the other two.&lt;/p&gt;

&lt;p&gt;The overall result improved from 81 to 97.5.&lt;/p&gt;

&lt;p&gt;That’s a 15.7% improvement.&lt;/p&gt;

&lt;p&gt;Some of the biggest wins came from improving the skill’s ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generate good content even when metadata is sparse&lt;/li&gt;
&lt;li&gt;adapt README length and sections to different project types instead of always forcing the full template&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Workspace It Creates
&lt;/h2&gt;

&lt;p&gt;Another thing I wanted to show in the video was the workspace it creates while doing all this, a folder where it stores things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;skill snapshots&lt;/li&gt;
&lt;li&gt;old skill outputs&lt;/li&gt;
&lt;li&gt;grading results&lt;/li&gt;
&lt;li&gt;benchmark data&lt;/li&gt;
&lt;li&gt;iteration files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t necessarily need to go through all of that manually, but it’s very cool that you can.&lt;/p&gt;

&lt;p&gt;If you want to inspect exactly what happened at each stage, it’s all there.&lt;/p&gt;

&lt;p&gt;That level of visibility is really nice.&lt;/p&gt;
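
&lt;p&gt;As a rough sketch, the layout looks something like this (the folder names here are illustrative; yours may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workspace/
├── snapshots/     # skill versions before and after each change
├── outputs/       # old vs improved skill runs
├── grading/       # assertion results per eval
├── benchmarks/    # aggregate scores
└── iterations/    # notes from each improvement pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;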

&lt;h2&gt;
  
  
  The HTML Eval Viewer
&lt;/h2&gt;

&lt;p&gt;Then I asked whether there was a way to see the benchmarks in HTML, and yes, there is: Skill Creator has an eval viewer for that. This was another really nice surprise.&lt;/p&gt;

&lt;p&gt;It launched an HTML review page showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;old skill vs improved skill&lt;/li&gt;
&lt;li&gt;formal grades&lt;/li&gt;
&lt;li&gt;pass/fail results&lt;/li&gt;
&lt;li&gt;benchmark comparisons&lt;/li&gt;
&lt;li&gt;review flows for feedback submission&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s made for a human to read.&lt;/p&gt;

&lt;p&gt;You can actually review what happened and decide whether you agree with the results.&lt;/p&gt;

&lt;p&gt;I really liked that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Description Optimization
&lt;/h2&gt;

&lt;p&gt;And then, because apparently this skill wasn’t done showing off yet, I ran the description optimization flow as well.&lt;/p&gt;

&lt;p&gt;This generates trigger and non-trigger queries to see whether your skill description is actually good enough to fire when it should, and stay out of the way when it shouldn’t.&lt;/p&gt;

&lt;p&gt;That workflow lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;review trigger queries&lt;/li&gt;
&lt;li&gt;review non-trigger queries&lt;/li&gt;
&lt;li&gt;edit them&lt;/li&gt;
&lt;li&gt;export the eval set&lt;/li&gt;
&lt;li&gt;run the optimization loop&lt;/li&gt;
&lt;/ul&gt;
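
&lt;p&gt;For a README skill, the two sets of queries might look something like this (made-up examples to show the idea):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# should trigger
"Create a README for this project"
"Improve my README and add badges"

# should NOT trigger
"Write a blog post about this project"
"Update the CHANGELOG"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;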

&lt;p&gt;That is super valuable.&lt;/p&gt;

&lt;p&gt;A lot of the time, the problem with a skill is not the logic inside it. It’s that the description is not specific enough, or not clear enough, for the agent to trigger it properly.&lt;/p&gt;

&lt;p&gt;So I really liked that this was built in too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re already building custom skills for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Copilot&lt;/li&gt;
&lt;li&gt;Claude Code&lt;/li&gt;
&lt;li&gt;or other coding agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;this is absolutely worth checking out.&lt;/p&gt;

&lt;p&gt;And yes, the video is long. But the skill does a lot, and I found it hard to cut it down because I kept finding more things it could do, so I pretty much left it as is.&lt;/p&gt;

&lt;p&gt;Have fun.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch the Video
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/WplS5lycPHM"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you’re creating your own skills already, or even just experimenting with prompts and instructions, I’d be really curious to know how you’re approaching it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Build Websites, Games, and Teaching Resources With Google Gemini for Free (No Coding Required)</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Sun, 15 Mar 2026 17:14:32 +0000</pubDate>
      <link>https://forem.com/debs_obrien/build-websites-games-and-teaching-resources-with-google-gemini-for-free-no-coding-required-3gld</link>
      <guid>https://forem.com/debs_obrien/build-websites-games-and-teaching-resources-with-google-gemini-for-free-no-coding-required-3gld</guid>
      <description>&lt;p&gt;I literally came home from a podcast interview and my husband said, "Debbie, I've built three websites." I said, "What?" And he said, "Yes, I've built three websites." And I said, "I heard you, but what?"&lt;/p&gt;

&lt;p&gt;My husband is not in tech. He has never cared about anything I do in tech. He does not know how to build anything. He works in the public sector and is rarely on the computer. And yet, he was able to build three websites in 10 minutes. He watched a video for 10 minutes and just went for it. I knew I had to tell the world about this.&lt;/p&gt;

&lt;h2&gt;
  
  
  How he did it
&lt;/h2&gt;

&lt;p&gt;He went to &lt;a href="https://gemini.google.com" rel="noopener noreferrer"&gt;gemini.google.com&lt;/a&gt;. That's it. This is free. He paid absolutely zero money. I do have a pro account, but he has a free account and can just build websites. Seriously, you've got to check it out.&lt;/p&gt;

&lt;p&gt;When you open Gemini, you'll see a bunch of options like create image, create music, create video, write anything, help me learn. There's a lot you can play around with. But there are a couple of other things that I find are a little bit hidden. If you click into the options, you'll see &lt;strong&gt;Canvas&lt;/strong&gt;, deep research, guided learning, and more.&lt;/p&gt;

&lt;p&gt;Canvas is the one we want. Open a canvas, pick the fast model (totally fine for this), and you're ready to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Batman game for toddlers
&lt;/h2&gt;

&lt;p&gt;After my husband built his three websites, the next day he started building games for our kids. I wanted to show you what that looks like so I typed in "build a Batman game for toddlers." That's it. A simple prompt.&lt;/p&gt;

&lt;p&gt;The first thing that pops up is all this code. Don't be scared by that. As my husband said, "I watched it spit out all the code that you normally write by hand." I normally write this code by hand. That's the insane thing.&lt;/p&gt;

&lt;p&gt;Once the code finishes generating, it jumps into preview mode. And there it was. A Batman city helper game where you move Batman to collect stars. I was just moving with the trackpad and it worked. It was actually really simple for toddlers to play.&lt;/p&gt;

&lt;h2&gt;
  
  
  Just use your imagination
&lt;/h2&gt;

&lt;p&gt;The cool thing is you just have to have an imagination and think about how to make it better. I typed "can you add sound?" and Gemini added a cheerful ping whenever Batman catches a star or a balloon. I could hear the little sound effects and honestly it was amazing.&lt;/p&gt;

&lt;p&gt;You can even select a specific area of the preview, drag a box over it, and ask Gemini to make changes right there. I dragged over the top area and said "can you add the person's name here?" You don't even have to specify "the right hand corner" or anything. Just drag and ask. It figured it out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building multiple things at the same time
&lt;/h2&gt;

&lt;p&gt;Here's something people don't even know about. While one thing is generating, you can open a new canvas and start building something else. I had the Batman game going and at the same time asked it to create a website for a circus act.&lt;/p&gt;

&lt;p&gt;One line. That's all I typed. And out came a full circus website. "A night of pure magic." It looked incredible. And I could iterate over it, add the actual address, YouTube links, whatever I wanted.&lt;/p&gt;

&lt;p&gt;Then I created some games for learning numbers for toddlers. I used to be a school teacher. I used to have to prep and create all this stuff in my free time. And now I can just create resources on the fly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The number learning game blew my mind
&lt;/h2&gt;

&lt;p&gt;The toddler number game it created was called "Number Fun" with a bubble pop game. It actually spoke out loud and said "pop the bubbles in order, start with one." You go one, two, three. I tried doing it wrong on purpose and it said "find number four." It had a home button where you could click on count items and count fruits. One, two, three, four. It had sound you could turn on and off.&lt;/p&gt;

&lt;p&gt;This is what kids want. This is the kind of interactive learning material that used to take ages to build. And I made it with one prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharing your creations
&lt;/h2&gt;

&lt;p&gt;You've got access to the code if you know how to code and want to tweak things yourself. But the really cool thing is the share button. You can copy the content, share it with someone, or just copy a link. My husband was sharing things with me saying "here's a website I've started, can you help me improve it?"&lt;/p&gt;

&lt;p&gt;You can also go to previous versions and see changes saved. And there's an option to add Gemini features, which adds AI stuff to your creation. Great for writing stories. We actually created a book that reads the story to you. I can't even remember all the things we created in just a couple of minutes.&lt;/p&gt;

&lt;p&gt;The one thing that is missing is there's no quick and easy deploy button. You can share a link and people can see it, which is kind of like it's deployed but not really deployed. It's a bit weird. But for getting a design together and having someone help you with the deployment part later? It works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The only limit is your imagination
&lt;/h2&gt;

&lt;p&gt;Whether you're creating games, building your own book, making material for teaching, or putting together a website and having someone help you with the deploy part, you're in control.&lt;/p&gt;

&lt;p&gt;This is free. You might hit some limits if you keep going nonstop, and then you can pay for more. But to try it out, it costs nothing. Just go to &lt;a href="https://gemini.google.com" rel="noopener noreferrer"&gt;gemini.google.com&lt;/a&gt; and start building.&lt;/p&gt;

&lt;p&gt;I encourage you all to play around with this. Think about the possibilities. The technical barrier is gone. The only problem now is your imagination. So start being creative and just start imagining things.&lt;/p&gt;

&lt;p&gt;Have fun building.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/BUDAS6E4xyY"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What Are Agent Skills? Beginners Guide</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Wed, 04 Mar 2026 18:27:36 +0000</pubDate>
      <link>https://forem.com/debs_obrien/what-are-agent-skills-beginners-guide-e2n</link>
      <guid>https://forem.com/debs_obrien/what-are-agent-skills-beginners-guide-e2n</guid>
      <description>&lt;p&gt;AI agents are smart. But they're generic. Your agent is trained on a ton of general knowledge, but it doesn't have your specific domain knowledge. It doesn't know your preferences, your team's conventions, or how you personally want things done.&lt;/p&gt;

&lt;p&gt;When we learn a new skill — playing basketball, riding a bike — we're adding knowledge we didn't have before. Skills work the same way for your agent. You give it the domain knowledge it's missing, personalized to how you want things done.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a skill?
&lt;/h3&gt;

&lt;p&gt;A skill is a reusable set of instructions that teaches an AI agent how to do a specific task well. Think of it like a recipe card you hand to a talented chef. The chef knows how to cook, but they don't know your family's secret sauce. The recipe card tells them exactly what to do.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without a skill&lt;/strong&gt; → the agent produces generic output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With a skill&lt;/strong&gt; → the agent follows your instructions and produces exactly what you want, every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At its simplest, a skill is just &lt;strong&gt;one file&lt;/strong&gt;: a &lt;code&gt;SKILL.md&lt;/code&gt; with a name, description, and instructions. That's it. You can add extras like scripts, references, assets, and evals — but you don't have to. All you need right now is the &lt;code&gt;SKILL.md&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys1th2hfjfp7d29oam26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys1th2hfjfp7d29oam26.png" alt="anatonomy of a skill"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's build one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build your first skill
&lt;/h3&gt;

&lt;p&gt;Open VS Code in your project directory. We're going to create a &lt;code&gt;good-morning&lt;/code&gt; skill step by step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the folder structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a new folder in your project root. You can use &lt;code&gt;.agents/&lt;/code&gt;, &lt;code&gt;.github/&lt;/code&gt;, or &lt;code&gt;.claude/&lt;/code&gt; — they all work. The &lt;code&gt;.agents/skills/&lt;/code&gt; path is the cross-agent convention that works with Copilot. Inside that, create a &lt;code&gt;skills&lt;/code&gt; folder, and inside that, create a folder called &lt;code&gt;good-morning&lt;/code&gt;. This folder name is your skill's name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
└── .agents/
    └── skills/
        └── good-morning/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Create the SKILL.md file&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inside the &lt;code&gt;good-morning&lt;/code&gt; folder, create a file called &lt;code&gt;SKILL.md&lt;/code&gt;. It must be in capital letters — that's how agents find it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
└── .agents/
    └── skills/
        └── good-morning/
            └── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Add the frontmatter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;SKILL.md&lt;/code&gt; and add the YAML frontmatter at the top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;good-morning&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;responds&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;good&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;morning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cheerful&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;greeting"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two important things here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The name must match the folder name.&lt;/strong&gt; If the folder is called &lt;code&gt;good-morning&lt;/code&gt;, the name must be &lt;code&gt;good-morning&lt;/code&gt;. If they don't match, your editor will flag it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The name and description are always in context.&lt;/strong&gt; Every time you're working in this project, the agent sees the name and description so it knows what skills are available. Keep the description short and specific; this is how the agent knows when to use the skill.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Write the instructions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything below the frontmatter is the skill body. This only gets added to context &lt;strong&gt;when the skill is called&lt;/strong&gt;, not all the time. The agent only loads these instructions when it decides to use the skill.&lt;/p&gt;

&lt;p&gt;Add the body below the frontmatter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;good-morning&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A skill that responds to good morning with a cheerful greeting&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Good Morning Skill&lt;/span&gt;

When the user says good morning, respond with:
&lt;span class="p"&gt;
-&lt;/span&gt; "Hi Debbie, hope you have a great day!"
&lt;span class="p"&gt;-&lt;/span&gt; Ask if they have done any sport today
&lt;span class="p"&gt;-&lt;/span&gt; Include a funny joke about sports

&lt;span class="gu"&gt;## Example&lt;/span&gt;

&lt;span class="gs"&gt;**User:**&lt;/span&gt; Good morning

&lt;span class="gs"&gt;**Agent:**&lt;/span&gt; Hi Debbie, have you done any sport today? Here's a funny joke about sports: Why did the soccer player bring string to the game? Because he wanted to tie the score!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the complete skill. One file. A few lines of instructions. Make it as personal as you like, put your own name in there, change the topic from sports to whatever you want.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test it
&lt;/h3&gt;

&lt;p&gt;Start a &lt;strong&gt;new session&lt;/strong&gt; from the same directory (skills are discovered at session start) and type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good morning&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent finds the skill, reads the &lt;code&gt;SKILL.md&lt;/code&gt; file, and responds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In GitHub Copilot&lt;/strong&gt;: &lt;em&gt;"Hi Debbie, have you done any sport today? Here's a funny joke about sports: Why did the bicycle fall over? Because it was too tired from all that cycling!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Claude Code&lt;/strong&gt;: Open Claude Code from the same project directory, say "good morning", and you get the same thing: &lt;em&gt;"Hi Debbie, have you done any sport today? Here's a funny joke for you: Why do basketball players love donuts? Because they can always dunk them!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Skills work across agents. The same &lt;code&gt;SKILL.md&lt;/code&gt; file works in Copilot, Claude Code, and others. Each agent discovers the skill, reads the instructions, and follows them.&lt;/p&gt;

&lt;p&gt;That's a skill in action. Now imagine instead of "good morning", the instructions told the agent how to generate a polished README, write commit messages in your team's format, or review code against your standards. Same idea, bigger impact.&lt;/p&gt;
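
&lt;p&gt;For example, a commit-message skill would have exactly the same shape, just different instructions. This one is hypothetical, so adapt the conventions to your own team:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: commit-messages
description: "Writes commit messages in our team's conventional format"
---

# Commit Messages Skill

When asked to write a commit message:

- Use the format type(scope): summary, e.g. feat(auth): add login retry
- Keep the summary under 72 characters
- List any breaking changes in the body
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;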

&lt;h3&gt;
  
  
  How skills get loaded
&lt;/h3&gt;

&lt;p&gt;Skills are designed to be efficient with context windows. They use a three-level loading system. The agent only loads what it needs, when it needs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1kun34lhdwapm8s56py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1kun34lhdwapm8s56py.png" alt="how skills get loaded"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1&lt;/strong&gt; is always in the agent's context. It's just the name and description (~100 words). This is how the agent decides whether to use the skill. If someone says "improve my README", the agent scans its available skills and picks the one whose description matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2&lt;/strong&gt; loads when the skill triggers. The full SKILL.md body with all the instructions, steps, and examples. This is ideally under 500 lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 3&lt;/strong&gt; loads on demand. Scripts, references, and assets that the agent pulls in only when it needs them. Scripts can even run without being loaded into context at all, saving tokens. And some resources might not load at all for certain projects. For example, a diagram template file only needs to be read if the project is complex enough to need an architecture diagram. Simple projects skip it entirely.&lt;/p&gt;

&lt;p&gt;This matters because context windows are limited. A well-designed skill is lean at the top and detailed at the bottom.&lt;/p&gt;
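
&lt;p&gt;Mapped onto the &lt;code&gt;good-morning&lt;/code&gt; skill folder from earlier, the three levels line up roughly like this (the &lt;code&gt;scripts&lt;/code&gt; and &lt;code&gt;references&lt;/code&gt; folders are optional extras):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;good-morning/
├── SKILL.md
│   ├── frontmatter (name, description)  # Level 1: always in context
│   └── body (instructions, examples)    # Level 2: loads when triggered
├── scripts/                             # Level 3: on demand
└── references/                          # Level 3: on demand
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;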

&lt;h3&gt;
  
  
  Where skills live
&lt;/h3&gt;

&lt;p&gt;Skills can be installed at two levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project-level&lt;/strong&gt;: in your project directory, available only when you're in that directory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global&lt;/strong&gt;: in your home directory, available from anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent checks slightly different locations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot (VS Code)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Project-level (any of these work)
your-project/.github/skills/
your-project/.claude/skills/
your-project/.agents/skills/

# Personal (works from any directory)
~/.copilot/skills/
~/.claude/skills/
~/.agents/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Project-level
your-project/.claude/skills/

# Personal (works from any directory)
~/.claude/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.agents/skills/&lt;/code&gt; path is part of the &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;Agent Skills open standard&lt;/a&gt;, a cross-tool convention, but Claude Code uses its own &lt;code&gt;.claude/&lt;/code&gt; directory structure, not &lt;code&gt;.agents/&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The skills ecosystem
&lt;/h3&gt;

&lt;p&gt;There's a whole directory of skills at &lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt; where you can browse and discover skills built by the community.&lt;/p&gt;

&lt;p&gt;To install a skill, use the skills CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add anthropics/skills &lt;span class="nt"&gt;--skill&lt;/span&gt; skill-creator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs the &lt;code&gt;skill-creator&lt;/code&gt; skill from Anthropic. A skill that helps you create other skills. One command and it's ready to use.&lt;/p&gt;

&lt;p&gt;You can see what you have installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And search for skills:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills find
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills work across multiple AI agents — Copilot, Claude Code, Cursor, Goose, and many more. The skills CLI handles installing to the right location for each agent.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/2REiUlciObk"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Needed an APP to Track My Learning Journey, AI built it in less than half an hour with a single prompt</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Sat, 17 Jan 2026 20:31:26 +0000</pubDate>
      <link>https://forem.com/debs_obrien/i-needed-an-app-to-track-my-learning-journey-ai-built-it-in-less-than-half-an-hour-with-a-single-31c4</link>
      <guid>https://forem.com/debs_obrien/i-needed-an-app-to-track-my-learning-journey-ai-built-it-in-less-than-half-an-hour-with-a-single-31c4</guid>
      <description>&lt;p&gt;I have been trying to build a Learning Hub App for a good few months using various tools and have had many iterations back and forth on trying to get it to work and ended up going round in circles fixing one thing and breaking another. &lt;/p&gt;

&lt;p&gt;Today I gave it a shot using &lt;a href="https://block.github.io/goose/" rel="noopener noreferrer"&gt;goose coding agent&lt;/a&gt; and Claude Opus 4.5 as my model. I used a simple prompt. No .prd file. Just clear instructions of what I wanted, and I asked it to come up with a plan for it.&lt;/p&gt;

&lt;p&gt;And it did. The plan was great. It started off with: &lt;em&gt;What a great idea! Let me put together a comprehensive plan for this app. I'll call it "AI Learning Hub" - your personal knowledge management system for AI learning resources.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I got a Core Features Overview, Tech Stack Recommendation, Data Structure, UI Layout Concept, build phases, and some extra nice-to-have stuff for future improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hj38yxohffpm2zn40sw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hj38yxohffpm2zn40sw.png" alt="layout of a site design in goose" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I asked it to change two things and then off it went. In less than half an hour it was not just built but tested before my eyes using the Playwright MCP. I watched as it added links, tested out the filters, and used the site like a user would, ensuring everything worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foodhfu2u1ai94wo8cb30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foodhfu2u1ai94wo8cb30.png" alt="goose using Playwright MCP to verify its work" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Learning Hub - Complete!
&lt;/h2&gt;

&lt;p&gt;My agent told me: I've built your AI Learning Hub app! Here's a summary of what's been created:&lt;/p&gt;

&lt;p&gt;Features implemented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add URLs with metadata fetching&lt;/strong&gt;: ✅ Auto-fetches title, description, detects category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Categories&lt;/strong&gt;: ✅ Blog, Video, Podcast, Course, Paper, Other&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning workflow&lt;/strong&gt;: ✅ Queue → Learning → Completed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search&lt;/strong&gt;: ✅ Full-text search across resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Category filter&lt;/strong&gt;: ✅ Filter dropdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notes&lt;/strong&gt;: ✅ Rich notes modal per resource&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drag &amp;amp; Drop&lt;/strong&gt;: ✅ dnd-kit installed, drag handles ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Ideas&lt;/strong&gt;: ✅ Sidebar with types (Blog, Video, Tutorial, Thread)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Password protection&lt;/strong&gt;: ✅ Environment variable based&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netlify ready&lt;/strong&gt;: ✅ Config file created&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aa8jn48sfj1c3jxwbc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aa8jn48sfj1c3jxwbc3.png" alt="ai learning hub site" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And how do I add the URLs of the sites I want? Easy. I just paste them into goose and it uses the Playwright MCP, which I already configured, to populate them all for me so I don't have to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And that's it: finished and working in less than half an hour, while I drank a beer on a Saturday night and watched in amazement as something that should have taken me weeks to build came together before my very eyes in minutes.&lt;/p&gt;

&lt;p&gt;Now here is the thing. I didn't open an editor once. I haven't looked at the code. It is working as it should and that is really all I care about for this particular project. I studied coding. I care about the quality of code but right now I am ok with not caring. I am ok with trusting the agent and LLM to ensure the code is good and meets the standards it should.&lt;/p&gt;

&lt;p&gt;I will add tests and check performance, and out of curiosity I might just look at the code when reviewing the PR. But I am seriously blown away by how easy it is to do this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;Want to give it a try yourself? Here is the prompt I used:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I would like you to build me an app so that I can easily manage urls for blog posts, podcasts, videos and other things that I would like to learn when it comes to AI. It would be great to be able to easily add the URL and then have a title and description field which can be populated when adding it. search by category would be great. I would be cool to have some sort of system like a todo list so when it is done it goes to a different place but is still findable should i want to share it with someone. maybe even notes so i could add some notes on it for later findings or note taking. should be able to prioritize things so that i learn things based on a particular order maybe drag or drop so i can change it. it should be a fun app that i can easily deploy, nice and easy on the eye. it would also great to have a section where i can put ideas on content creation based off of the stuff I have learnt. these could be create blog posts, videos etc. just an idea and not sure if this will look great but we could try it out. can you come up with some sort of a plan for this.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>mcp</category>
      <category>goose</category>
      <category>ai</category>
      <category>playwright</category>
    </item>
    <item>
      <title>How I Use AI Agents + MCP to Fully Automate My Website’s Content</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Wed, 14 Jan 2026 19:29:47 +0000</pubDate>
      <link>https://forem.com/debs_obrien/how-i-use-ai-agents-mcp-to-fully-automate-my-websites-content-3ekj</link>
      <guid>https://forem.com/debs_obrien/how-i-use-ai-agents-mcp-to-fully-automate-my-websites-content-3ekj</guid>
<description>&lt;p&gt;Recently I have been playing with a lot of tools to help automate simple tasks, just so I can keep my website up to date. I create a lot of content, from videos to blog posts, and appear as a guest on many podcasts, and I want to have this reflected on my site: it's good to have all this info in one place to easily share with others, and it's also great to look back on. But it is tedious and it takes time, time that I have very little of. So this is a perfect use case for AI to take over. So where do I start?&lt;/p&gt;

&lt;h2&gt;
  
  
  Before AI
&lt;/h2&gt;

&lt;p&gt;First of all, let me tell you what it was like to add a new podcast episode to my site. I use Nuxt Content, so each podcast is basically just a markdown file with some YAML. This YAML contains things like the date, the name of the podcast, the image URL, and the host. Last year I was simply taking an old podcast episode, clicking duplicate in VS Code, and then renaming everything with the new podcast's information: manually clicking the link to the new episode and copying and pasting the information from the site where it was hosted into my markdown file. Then I had to download the image, upload it to Cloudinary, get the image name from there, and paste that into my file. Cloudinary is great for managing images and keeping my site performant, but the extra work of downloading and uploading the images was tedious, and it meant that sometimes it took me ages to add new episodes to my site because I simply couldn't be bothered. &lt;/p&gt;
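&lt;p&gt;To give you an idea, the frontmatter for one of these episode files looks roughly like this. The field names here are invented for the example rather than being my exact schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
title: 'Episode title goes here'
podcast: 'Name of the podcast'
host: 'Host name'
date: 2026-01-10
image: 'podcasts/episode-cover'
url: 'https://example.com/episode'
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;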

&lt;h2&gt;
  
  
  Automating with Prompts and the Playwright MCP
&lt;/h2&gt;

&lt;p&gt;I started to automate some of the process by using reusable prompts in VS Code. I created instructions for Copilot describing what it needed to do, and then all I had to do was press the play button to run the prompt in a new chat and give it the URL for the podcast. I had the Playwright MCP installed, so Copilot would use it to navigate to the URL where the podcast was hosted and find the relevant information it needed to complete the metadata. It was pretty good, saved a lot of time, and I could even bulk-add new episodes by giving it more than one URL. &lt;/p&gt;

&lt;p&gt;However, the images were still an issue, and I was tempted to just stop using Cloudinary because it was quicker and easier to use images stored in the public folder of my site. But then I would lose out on the benefits of Cloudinary and its image optimization. &lt;/p&gt;

&lt;h2&gt;
  
  
  Automating with Goose, Playwright MCP, Cloudinary MCP &amp;amp; GitHub MCP
&lt;/h2&gt;

&lt;p&gt;I then started playing around with Goose, a coding agent from Block. Goose is a desktop app, although there is also a CLI available. I decided to give it a go and see if I could improve the way I automated this process. I probably could have continued playing around in VS Code and achieved something similar, but lately I have been trying to code, or should I say, get tasks done, without using an editor: just review the pull request later on CI and let the agent do its thing. I really believe this is the way we are heading, so I want to keep experimenting with how it feels to code this way. &lt;/p&gt;

&lt;p&gt;So I copied my prompt into Goose and saved it as a recipe. Recipes seem pretty similar to prompts, but you can use parameters, so I could add the podcast URL as a parameter and it will automatically be detected. There are also a lot of other options which I haven't got round to properly checking out, but this seemed enough for my use case. Now I have the Playwright MCP navigating to the site and getting all the info I need for the podcast page. I then just asked Goose to download the image for me and add it locally, and it did. This was great, but did I really want to stop using Cloudinary just because I was lazy?&lt;/p&gt;
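&lt;p&gt;For a rough picture, a recipe is a YAML file, and mine looks something like the sketch below. I'm writing this from memory, so treat the exact field names as an approximation and check the Goose recipe docs for the real schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: 1.0.0
title: Add podcast episode
description: Fetch episode metadata and add it to my site
parameters:
  - key: podcast_url
    input_type: string
    requirement: required
    description: URL of the page where the episode is hosted
prompt: |
  Visit {{ podcast_url }} using the Playwright MCP, extract the episode
  title, description, date, host and cover image, and create a new
  markdown file for the episode in the content folder.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;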

&lt;p&gt;So I thought: what if Cloudinary had an MCP? Then Goose could just use that MCP server to upload the image and update the image metadata with the correct image ID. If it could do that, all my problems would be solved. So I looked in Goose's extensions, searched for Cloudinary, and would you believe it, there was an MCP server for Cloudinary. Not only that, but it actually worked. Goose used the Playwright MCP to navigate to the site and get all the content it needed, including the image, and then the Cloudinary MCP was used to upload the image to my Cloudinary account using my API key stored in the extension's settings. It even figured out which folder to save it to without me asking. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57bviw879thizuqsy35r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57bviw879thizuqsy35r.png" alt="cloudinary mcp in goose" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that was it. It all just worked. I checked my Cloudinary account and the images were there. I then asked Goose to run the dev server and verify its work using the Playwright MCP by navigating to the podcasts page to ensure everything looked as it should. Not only could I see the browser being opened and the new podcast episodes with images, but I could also ask for a screenshot of the page. &lt;/p&gt;

&lt;p&gt;Then one more thing, of course. We had come so far, so we may as well finish it all off. I asked Goose to create a pull request, which it did using the GitHub MCP I had previously configured. I then reviewed the code just in case anything looked wrong, especially with regards to the Cloudinary URL, even though I had already visually reviewed it, and as you can imagine, it was good to go. I merged it, and the new podcast episodes were added to my site.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So here's the thing. It took me time to figure all this out, set up the process, and ensure it was all working. Yes, it would have been quicker to copy and paste myself. But now it's done, and the next time I want to add a podcast episode I just have to run my recipe in Goose and pass in the podcast URL. I am a guest on one tonight, so when that is out I will be able to add it to the site easily. In fact, if I had a team of people working on my site, I could even share the recipe with them and they could simply run it. &lt;/p&gt;

&lt;p&gt;I am using Nuxt Content for my site, which means I have no CMS. My content lives in markdown files, which makes it very easy for a developer to add content, but perhaps not so easy for non-developers. But now, now even my mother could add a new podcast episode to my site. That is just amazing. This is just my personal site, but think about the possibilities of this use case for many other businesses. &lt;/p&gt;

&lt;p&gt;I am very impressed with what Goose can do. The more I use it, the more it blows my mind. I am now going to go ahead and add other recipes for the rest of the content I add, or perhaps just modify this recipe with parameters so I can have one recipe. I shall keep playing around. This is fun.&lt;/p&gt;

&lt;p&gt;Let me know if you found this interesting, are doing something similar, or have used any of the MCPs mentioned above. We are living in exciting times, so if you haven't started to experiment yet, what are you waiting for? Just play around and have fun. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>playwright</category>
      <category>goose</category>
    </item>
    <item>
      <title>Debugging My Zsh Config With Goose (and Why Agentic AI Actually Helped)</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Mon, 15 Dec 2025 15:37:30 +0000</pubDate>
      <link>https://forem.com/debs_obrien/debugging-my-zsh-config-with-goose-and-why-agentic-ai-actually-helped-1noh</link>
      <guid>https://forem.com/debs_obrien/debugging-my-zsh-config-with-goose-and-why-agentic-ai-actually-helped-1noh</guid>
      <description>&lt;p&gt;I’ve been playing around with a few things recently and wanted to share a real experience that genuinely surprised me.&lt;/p&gt;

&lt;p&gt;You might have seen the news from the Linux Foundation announcing the formation of the Agentic AI Foundation. As part of that, a few projects were donated into the foundation, including the Model Context Protocol (MCP), Agents.md, and Goose.&lt;/p&gt;

&lt;p&gt;I’m guessing a lot of people haven’t heard of Goose yet. I hadn’t either until recently, so I figured I’d dig into it and see what it’s actually about.&lt;/p&gt;

&lt;p&gt;If you want the full announcement, you can read it on the Linux Foundation website. This post isn’t about the announcement though — it’s about what happened when I tried to use Goose for real.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Goose?
&lt;/h2&gt;

&lt;p&gt;Goose was released in early 2025, so it’s still very new. It’s open source (which I love), and it’s a local-first AI agent framework. It combines language models with extensible tools like MCP so it can actually do things, not just talk about them.&lt;/p&gt;

&lt;p&gt;There’s both a desktop app and a CLI. I’m much more of a desktop app person than a CLI person, but everything seems to be going CLI these days, so I decided to give the Goose CLI a try.&lt;/p&gt;

&lt;p&gt;The docs live at block.github.io, and I did start reading them. Like most people, I got through the first bit and then thought, “I’ll figure it out as I go.”&lt;/p&gt;

&lt;p&gt;That decision led me directly into a very familiar kind of developer pain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Goose Wouldn’t Run
&lt;/h2&gt;

&lt;p&gt;I installed the desktop app first and then followed the docs to install the CLI. The instructions said that after updating my &lt;code&gt;.zshrc&lt;/code&gt;, I should be able to run the &lt;code&gt;goose&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;I couldn’t. No matter what I did, I kept getting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zsh: command not found: goose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I sourced my &lt;code&gt;.zshrc&lt;/code&gt;. I ran &lt;code&gt;goose --help&lt;/code&gt;. Nothing worked. The agent kept telling me everything was done correctly, and I kept replying with some version of, “No, it’s not working.”&lt;/p&gt;

&lt;p&gt;This is where it got interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goose Didn’t Argue — It Investigated
&lt;/h2&gt;

&lt;p&gt;Instead of looping on generic advice, Goose pointed out something important: terminal config changes are only picked up when a new session starts. That’s something I &lt;em&gt;always&lt;/em&gt; forget.&lt;/p&gt;

&lt;p&gt;So I closed the terminal, opened a new one, and tried again.&lt;/p&gt;

&lt;p&gt;Still broken.&lt;/p&gt;

&lt;p&gt;At this point, Goose acknowledged that something wasn’t right. It explained that if sourcing the file and restarting the terminal didn’t work, then one of two things was probably happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The change didn’t save correctly&lt;/li&gt;
&lt;li&gt;Another startup config was overriding the PATH&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Honestly, that explanation alone would normally make me sigh and prepare to lose an hour.&lt;/p&gt;

&lt;p&gt;Instead, Goose suggested checking the file directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Letting an AI Read My &lt;code&gt;.zshrc&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Goose used a tool to open my &lt;code&gt;.zshrc&lt;/code&gt; and read it. I didn't need to install this tool. Reading &lt;code&gt;.zshrc&lt;/code&gt; files is not something I do often and I definitely don’t enjoy debugging them.&lt;/p&gt;

&lt;p&gt;Goose scanned through all the usual stuff (pnpm, bun, nvm, PATH exports) and then immediately spotted the problem.&lt;/p&gt;

&lt;p&gt;When it had added the Goose PATH export earlier, it didn’t include a newline. That meant the new export was stuck onto the end of the previous line, creating a syntax error.&lt;/p&gt;

&lt;p&gt;I wouldn’t have noticed that quickly. I was looking at the file and just seeing noise.&lt;/p&gt;

&lt;p&gt;Goose explained exactly what went wrong, showed me the broken line, and explained that it should really be two separate lines.&lt;/p&gt;

&lt;p&gt;Then it fixed it. It replaced the bad line with two clean, correctly formatted lines and even added a comment to make it clearer for the future.&lt;/p&gt;
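&lt;p&gt;To give you a picture of what that looks like (the actual paths here are invented for the example), the broken version had the new export glued onto the end of an existing line, and the fix was simply to split it in two:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Broken: two exports fused onto one line, so zsh misreads them
export PNPM_HOME="$HOME/Library/pnpm"export PATH="$HOME/.local/bin:$PATH"

# Fixed: each export on its own line
export PNPM_HOME="$HOME/Library/pnpm"
# Added for the Goose CLI
export PATH="$HOME/.local/bin:$PATH"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;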

&lt;p&gt;After that, Goose asked me to restart my terminal one more time. This time, when I typed &lt;code&gt;goose&lt;/code&gt;, the command worked. I could see all the available commands: sessions, MCP servers, bundled tools, everything was there.&lt;/p&gt;

&lt;p&gt;At that point, I just sat back for a second. Not because this was some massive, complex bug, but because this is exactly the kind of small, annoying issue that can completely derail your flow.&lt;/p&gt;

&lt;h2&gt;
  
  
   Why This Actually Matters
&lt;/h2&gt;

&lt;p&gt;If I had debugged this myself, I would have figured it out eventually. But it would have taken time, frustration, and a lot of trial and error. Instead, the agent noticed something was off, inspected a real config file, identified a subtle syntax error, fixed it safely, and then explained what happened.&lt;/p&gt;

&lt;p&gt;That’s the difference between AI that answers questions and AI that actually helps you get unstuck.&lt;/p&gt;

&lt;p&gt;I also didn’t know Goose could edit files like this. Seeing it work through the problem step by step, without pretending everything was fine when it wasn’t, made a big difference.&lt;/p&gt;

&lt;p&gt;We don’t have the answers to everything as developers. That’s normal. What is changing is how quickly we can get unblocked.&lt;/p&gt;

&lt;p&gt;If something doesn’t feel right, push back. Say it’s not working. Let the agent iterate. Let it re-check assumptions. That’s where this starts to become genuinely useful.&lt;/p&gt;

&lt;p&gt;If you’re curious about Goose, it’s worth a look. And even if you’re not, this kind of experience is a good reminder that using AI well isn’t about shortcuts, it’s about reducing unnecessary friction.&lt;/p&gt;

&lt;p&gt;That’s it. Have fun experimenting with AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Useful Links:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://block.github.io/goose/docs/getting-started/installation/" rel="noopener noreferrer"&gt;Goose&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation" rel="noopener noreferrer"&gt;Linux Foudation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/fk6S4yLxzYU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I vibe code: Improving my site design with Goose and Gemini 3</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Thu, 20 Nov 2025 12:32:20 +0000</pubDate>
      <link>https://forem.com/debs_obrien/how-i-vibe-code-improving-my-site-design-with-goose-and-gemini-3-2a3k</link>
      <guid>https://forem.com/debs_obrien/how-i-vibe-code-improving-my-site-design-with-goose-and-gemini-3-2a3k</guid>
<description>&lt;p&gt;OK, this was so much fun: Google's Gemini 3 is amazing. I just got it to redesign my home page. I was having fun with this one, no real idea of what I wanted, just vibing along. It gave me a matrix-style hero component which blew me away. This is so cool, and the fact that I can spend less than an hour improving my personal site is insane. &lt;/p&gt;

&lt;p&gt;I used the Goose coding agent for this one, which is open source and free, and I just put in my Gemini API key, for which I am still using a free trial, so my total cost for having fun was zero.&lt;/p&gt;

&lt;p&gt;I was quite impressed that when I gave Goose the link to an image, it just downloaded it for me and added it to my site's public folder. One less tedious task for me to do.&lt;/p&gt;

&lt;p&gt;Towards the end I had the crazy idea of creating 7 hero component designs that change when you refresh the page. Why? Because it's cool. This is maybe not how you build production apps, but it sure is great for prototyping, learning how new tools work, and improving your communication with AI agents and LLMs. &lt;/p&gt;

&lt;p&gt;I encourage you all to take time out of your day and play around. Build a personal site even if you never deploy it. Improve your personal site and modify the design just for fun. Have fun, because Gemini 3 is pretty amazing and the tools we have available to us right now are insane. &lt;/p&gt;

&lt;p&gt;And of course, don't forget to run the Playwright Healer agent after you have changed your design so your tests are updated. All it takes is a prompt. I didn't show it in this video, but check out my other videos on Playwright Agents.&lt;/p&gt;

&lt;p&gt;Have fun and happy vibe coding&lt;/p&gt;

&lt;p&gt;Links:&lt;br&gt;
Goose: AI Coding Agent: &lt;a href="https://block.github.io/goose/" rel="noopener noreferrer"&gt;https://block.github.io/goose/&lt;/a&gt;&lt;br&gt;
Gemini 3: &lt;a href="https://blog.google/products/gemini/gemini-3/" rel="noopener noreferrer"&gt;https://blog.google/products/gemini/gemini-3/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/nSsBYokJefw"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>ai</category>
      <category>webdev</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Playwright MCP Servers Explained: Automation and Testing</title>
      <dc:creator>Debbie O'Brien</dc:creator>
      <pubDate>Mon, 17 Nov 2025 13:10:33 +0000</pubDate>
      <link>https://forem.com/debs_obrien/playwright-mcp-servers-explained-automation-and-testing-4mo0</link>
      <guid>https://forem.com/debs_obrien/playwright-mcp-servers-explained-automation-and-testing-4mo0</guid>
<description>&lt;p&gt;Did you know Playwright has two MCP servers? Yes, it's kind of confusing, so let me explain. The Playwright MCP server is great for browser automation, filling out forms for example, or letting LLMs verify their work by opening the browser and taking a page snapshot to see that they actually implemented what they said they did. It is built into the GitHub Copilot coding agent, so if you assign a PR to Copilot it will use Playwright, which you can see in the session logs. It is very cool indeed.&lt;/p&gt;

&lt;p&gt;Then we have another Playwright MCP server, called the Playwright Test MCP, which is built into Playwright Test and is for, yes, you guessed it, testing. It has some of the same tools as the Playwright MCP server, but it also has others that you only need if you are testing. It starts running when you use the Playwright agents: Planner, Generator, and Healer. However, this MCP server only supports TypeScript/JavaScript for now. &lt;/p&gt;

&lt;p&gt;So depending on your needs, you can use one MCP server or the other. The Playwright MCP server you need to install yourself, while the Playwright Test MCP server is set up for you when you run an npx command with the latest version of Playwright:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx playwright init-agents &lt;span class="nt"&gt;--loop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vscode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server installation is done for you, and it doesn't matter what other MCP servers you have, as each agent will only use the tools assigned to it. &lt;/p&gt;
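&lt;p&gt;For the standalone Playwright MCP server, installing it is mostly a matter of adding an entry to your MCP client's config. The standard config from the playwright-mcp README looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The exact top-level key varies by client, so check your client's MCP docs for where this file lives and what it should be called.&lt;/p&gt;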

&lt;p&gt;Check out the docs for more info on how to get started. Have fun, and happy testing with Playwright MCPs!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://playwright.dev/docs/test-agents" rel="noopener noreferrer"&gt;https://playwright.dev/docs/test-agents&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/microsoft/playwright-mcp" rel="noopener noreferrer"&gt;https://github.com/microsoft/playwright-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/U5Hsa6s2EqE"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
