<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: QAYS KADHIM</title>
    <description>The latest articles on Forem by QAYS KADHIM (@qays_kadhim_c3fea1c94957f).</description>
    <link>https://forem.com/qays_kadhim_c3fea1c94957f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3800166%2Fad696a4b-647f-49b3-8be8-097a1c50aa14.jpg</url>
      <title>Forem: QAYS KADHIM</title>
      <link>https://forem.com/qays_kadhim_c3fea1c94957f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/qays_kadhim_c3fea1c94957f"/>
    <language>en</language>
    <item>
      <title>I Tested My AI Ad Generator on 3 Completely Different Ad Formats — Here's What Actually Happened</title>
      <dc:creator>QAYS KADHIM</dc:creator>
      <pubDate>Tue, 03 Mar 2026 02:39:15 +0000</pubDate>
      <link>https://forem.com/qays_kadhim_c3fea1c94957f/i-tested-my-ai-ad-generator-on-3-completely-different-ad-formats-heres-what-actually-happened-1e0j</link>
      <guid>https://forem.com/qays_kadhim_c3fea1c94957f/i-tested-my-ai-ad-generator-on-3-completely-different-ad-formats-heres-what-actually-happened-1e0j</guid>
      <description>&lt;p&gt;I recently open-sourced &lt;a href="https://github.com/UrNas/advideo-creator" rel="noopener noreferrer"&gt;AdVideo Creator&lt;/a&gt;, a CLI tool that lets Claude generate complete video ads — script, images, voiceover, music, and final video — through a single prompt. In my &lt;a href="https://dev.to"&gt;first post&lt;/a&gt;, I walked through the architecture: 45 MCP tools, 5 quality gates, and a 15-step pipeline.&lt;/p&gt;

&lt;p&gt;The response was great. But one comment stuck with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Would love to see a follow-up post benchmarking output quality across different ad formats."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fair point. Architecture posts are nice, but what actually comes out the other end? So I picked 3 very different ad scenarios, ran them through the full pipeline, and recorded everything — scores, retries, failures, and the final videos.&lt;/p&gt;

&lt;p&gt;Here's what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 3 Tests
&lt;/h2&gt;

&lt;p&gt;I deliberately chose formats that stress different parts of the pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Product Demo&lt;/th&gt;
&lt;th&gt;Storytelling&lt;/th&gt;
&lt;th&gt;CTA / Urgency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Product&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HydroSync (smart water bottle)&lt;/td&gt;
&lt;td&gt;Ember &amp;amp; Oak (coffee roastery)&lt;/td&gt;
&lt;td&gt;SkillSprint (online courses)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Template&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Product Demo (5 scenes)&lt;/td&gt;
&lt;td&gt;Storytelling (5 scenes)&lt;/td&gt;
&lt;td&gt;Countdown/Urgency (4 scenes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TikTok 1080×1920&lt;/td&gt;
&lt;td&gt;Instagram Reel 1080×1920&lt;/td&gt;
&lt;td&gt;Instagram Feed 1080×1080&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15 seconds&lt;/td&gt;
&lt;td&gt;30 seconds&lt;/td&gt;
&lt;td&gt;15 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image Style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Photorealistic&lt;/td&gt;
&lt;td&gt;Watercolor&lt;/td&gt;
&lt;td&gt;Flat-design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Arabic (RTL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voice&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ElevenLabs Elli&lt;/td&gt;
&lt;td&gt;ElevenLabs Rachel&lt;/td&gt;
&lt;td&gt;ElevenLabs Adam&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each test uses a different template, platform, aspect ratio, image style, and voice. The Arabic test also throws RTL text rendering into the mix.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test 1: Product Demo — HydroSync Smart Water Bottle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a 15 second TikTok product demo ad for HydroSync — a smart water bottle that tracks your daily hydration and syncs with your phone app. Target audience is fitness-conscious millennials. Tone: energetic and modern.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the smoothest run. The script passed on the first attempt at 8.05/10. Claude wrote a tight 5-scene structure: bold product reveal, two feature highlights (hydration tracking, phone sync), a lifestyle benefit shot, and a CTA.&lt;/p&gt;

&lt;p&gt;Image generation was fast — all 5 scenes generated via Replicate Flux Schnell in about 2 seconds each. The photorealistic style produced clean, product-shot-style images that scored 9.88/10 average. Voiceover landed at 9.67/10 on the first try.&lt;/p&gt;

&lt;p&gt;The final video exported at 14.4 seconds, 1080×1920, 12.9 MB. Hardware acceleration kicked in via Apple VideoToolbox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt; The pipeline hit the 20-tool round limit before it could add subtitles or run the final composition scoring. The video still exported fine — it just skipped those last two steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Product demos are the tool's sweet spot. Clear features, simple structure, photorealistic images — everything lines up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test 2: Storytelling — Ember &amp;amp; Oak Coffee Roastery
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a 30 second Instagram Reel storytelling ad for Ember &amp;amp; Oak, a small-batch coffee roastery that partners directly with farmers in Colombia. The story should follow a farmer's journey from harvest to your cup. Tone: warm, authentic, emotional.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where the self-grading system proved its value.&lt;/p&gt;

&lt;p&gt;The first script scored 7.7/10. The grading system flagged two specific problems: the hook was generic (7/10) and the CTA was weak (6/10). Claude rewrote the script. The new hook — a pattern interrupt about coffee traveling 3,000 miles — scored 9/10. The CTA got specific. Version 2 passed at 8.4/10.&lt;/p&gt;

&lt;p&gt;The watercolor image style was interesting. Four of the five scenes looked cohesive and atmospheric. Scene 2 (the discovery scene) scored lowest at 7.91 — its watercolor treatment was slightly less consistent with the other scenes. The average still held strong at 9.36/10.&lt;/p&gt;

&lt;p&gt;Voiceover had a hiccup. The first attempt ran 32.79 seconds — almost 3 seconds over the 30-second target. The quality gate caught it, auto-shortened the text, and the retry came in at 29.95 seconds with a 9.0/10 score.&lt;/p&gt;
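&lt;p&gt;The duration half of that gate is simple to picture. A minimal sketch of the pass/fail logic: the 5% tolerance is my assumption for illustration, not the tool's actual threshold.&lt;/p&gt;

```python
# Sketch of the duration check inside the voiceover gate: fail the
# attempt when it misses the target length, and report how much
# shorter the retry text should be. The 5% tolerance is an assumed
# value, not the tool's real setting.

def check_duration(actual_s, target_s, tolerance=0.05):
    """Return (passed, trim_ratio); trim_ratio is how much shorter
    the retry script should be, or 0.0 when within tolerance."""
    overshoot = actual_s / target_s - 1.0
    if abs(overshoot) > tolerance:
        return False, max(overshoot, 0.0)
    return True, 0.0

print(check_duration(32.79, 30.0))   # fails: about 9% over target
print(check_duration(29.95, 30.0))   # passes
```

&lt;p&gt;For the 32.79-second attempt against a 30-second target, this flags a roughly 9% overshoot, which is about how much the retry text was trimmed.&lt;/p&gt;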

&lt;p&gt;This was the only test where the full pipeline completed — including subtitles and composition scoring (8.35/10). The final video landed at exactly 30.0 seconds, 27.5 MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Storytelling ads need more iteration, but the quality gates handle it. The self-grading loop catching the weak hook is exactly what you want from an automated system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test 3: CTA / Urgency — SkillSprint Flash Sale (Arabic)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a 15 second Instagram Feed ad in Arabic for SkillSprint — an online learning platform running a 48-hour flash sale with 60% off all courses. Target audience: Arabic-speaking young professionals. Tone: urgent and exciting.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the hardest test by design — Arabic RTL, urgency template, square format, flat-design style. I wanted to push the tool.&lt;/p&gt;

&lt;p&gt;The script passed first try at 8.4/10. Urgency ads have a clear structure (limited offer → value → scarcity → CTA), and Claude wrote strong Arabic copy with the right energy.&lt;/p&gt;

&lt;p&gt;Then the voiceover became a challenge. Attempt 1 came back at 21.69 seconds — over 6 seconds too long for a 15-second ad. The quality gate caught it and auto-shortened. Attempt 2 scored 7.24/10 — below the 7.5 threshold due to pacing issues. Attempt 3 finally passed at 7.55/10 with 14.35 seconds duration.&lt;/p&gt;

&lt;p&gt;Three attempts for voiceover. That's the most retries across all tests.&lt;/p&gt;

&lt;p&gt;The cross-asset consistency check scored 6.45/10 — just below the 6.5 threshold. It flagged color palette variations between the flat-design scenes. The pipeline marked the result as needing review but continued with the export.&lt;/p&gt;

&lt;p&gt;The final video: 14.4 seconds, 1080×1080, 6.4 MB. RTL text overlays rendered correctly with &lt;code&gt;lang: ar&lt;/code&gt;. Arabic metadata and hashtags were generated automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Arabic ads work, but they're the hardest path. Voice generation needs more attempts, and flat-design consistency across scenes is trickier than photorealistic or watercolor. The pipeline handles it — it just works harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers Side by Side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Product Demo&lt;/th&gt;
&lt;th&gt;Storytelling&lt;/th&gt;
&lt;th&gt;CTA/Urgency (Arabic)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Script grade&lt;/td&gt;
&lt;td&gt;8.05/10&lt;/td&gt;
&lt;td&gt;8.4/10 (v2)&lt;/td&gt;
&lt;td&gt;8.4/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Script iterations&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image quality (avg)&lt;/td&gt;
&lt;td&gt;9.88/10&lt;/td&gt;
&lt;td&gt;9.36/10&lt;/td&gt;
&lt;td&gt;8.99/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice quality&lt;/td&gt;
&lt;td&gt;9.67/10&lt;/td&gt;
&lt;td&gt;9.0/10&lt;/td&gt;
&lt;td&gt;7.55/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice retries&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Music quality&lt;/td&gt;
&lt;td&gt;8.1/10&lt;/td&gt;
&lt;td&gt;8.1/10&lt;/td&gt;
&lt;td&gt;7.98/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistency score&lt;/td&gt;
&lt;td&gt;9.25/10&lt;/td&gt;
&lt;td&gt;7.65/10&lt;/td&gt;
&lt;td&gt;6.45/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline time&lt;/td&gt;
&lt;td&gt;~5 min&lt;/td&gt;
&lt;td&gt;~6 min&lt;/td&gt;
&lt;td&gt;~3 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File size&lt;/td&gt;
&lt;td&gt;12.9 MB&lt;/td&gt;
&lt;td&gt;27.5 MB&lt;/td&gt;
&lt;td&gt;6.4 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A clear pattern: simpler formats score higher, but the quality gates keep complex formats in check.&lt;/p&gt;




&lt;h2&gt;
  
  
  5 Things I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Self-grading is the most valuable feature.&lt;/strong&gt;&lt;br&gt;
The storytelling test proved it. A 7.7 script became an 8.4 script because the system knew the hook was weak. Without that feedback loop, the first draft would have gone straight to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Voice generation is the bottleneck for non-English.&lt;/strong&gt;&lt;br&gt;
English voiceovers passed on the first try in both tests. Arabic needed 3 attempts. The issue is duration estimation — Arabic speech pacing differs from English, and the first-pass text is often too long. This is a clear area for improvement.&lt;/p&gt;
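&lt;p&gt;A pre-flight estimate would catch this before the first TTS call. Here's a sketch using the language-specific WPM ranges the grader already applies (English 130-170, Arabic 100-140, per the architecture post), taking each range's midpoint as the expected pace:&lt;/p&gt;

```python
# Pre-flight duration estimate from language-specific speaking rates.
# Ranges match the grader's WPM config (English 130-170, Arabic
# 100-140); the midpoint of each range stands in for the actual pace.

WPM_RANGES = {"en": (130, 170), "ar": (100, 140)}

def estimate_seconds(word_count, lang):
    low, high = WPM_RANGES[lang]
    return word_count / ((low + high) / 2) * 60.0

def max_words(target_seconds, lang):
    low, high = WPM_RANGES[lang]
    return int(target_seconds * ((low + high) / 2) / 60.0)

print(max_words(15, "ar"))   # 30 words max for a 15s Arabic ad
print(max_words(15, "en"))   # 37 for English at the same length
```

&lt;p&gt;Trimming the script to that budget before generating would likely land attempt 1 much closer to target.&lt;/p&gt;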

&lt;p&gt;&lt;strong&gt;3. Photorealistic is the easiest style for consistency.&lt;/strong&gt;&lt;br&gt;
The product demo scored 9.25 on consistency. Watercolor dropped to 7.65. Flat-design hit 6.45. Stylized images have more variance between scenes, which makes cross-scene consistency harder. A style-locking mechanism could help here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The tool limit is a real constraint.&lt;/strong&gt;&lt;br&gt;
Two of three tests hit the 20-tool round limit before completing subtitles and composition scoring. The videos still exported fine, but the pipeline should be optimized to fit within fewer tool calls — or the limit needs to increase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Every ad format exported a real video.&lt;/strong&gt;&lt;br&gt;
Despite all the retries and edge cases, every test produced a platform-ready video with correct specs. That's the baseline promise, and it held.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I'd Improve Next
&lt;/h2&gt;

&lt;p&gt;Based on these tests, here's what's on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Arabic voice calibration&lt;/strong&gt; — Pre-calculate duration estimates using Arabic-specific WPM ranges to reduce retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style consistency locking&lt;/strong&gt; — Extract color palette and visual parameters from Scene 1 and enforce them across all subsequent scenes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline optimization&lt;/strong&gt; — Reduce tool calls by batching operations (generate all images in one call, grade them in one call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subtitle fallback&lt;/strong&gt; — Prioritize subtitle generation over composition scoring when approaching the tool limit&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The tool is open source. Pick one of these three prompts, run it, and see what comes out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/UrNas/advideo-creator.git
&lt;span class="nb"&gt;cd &lt;/span&gt;advideo-creator
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env  &lt;span class="c"&gt;# Add your API keys&lt;/span&gt;
uv run main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Star the repo if you find it useful. Open an issue if something breaks. PRs are welcome.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/UrNas/advideo-creator" rel="noopener noreferrer"&gt;GitHub: UrNas/advideo-creator&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of a series on building AI-powered ad generation. Part 1 covered the architecture. Part 3 will go deeper on the quality gate system and how self-grading actually works under the hood.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>python</category>
    </item>
    <item>
      <title>I Built an AI Video Ad Generator with Claude + MCP — Here's the Architecture</title>
      <dc:creator>QAYS KADHIM</dc:creator>
      <pubDate>Sun, 01 Mar 2026 16:38:50 +0000</pubDate>
      <link>https://forem.com/qays_kadhim_c3fea1c94957f/i-built-an-ai-video-ad-generator-with-claude-mcp-heres-the-architecture-1kei</link>
      <guid>https://forem.com/qays_kadhim_c3fea1c94957f/i-built-an-ai-video-ad-generator-with-claude-mcp-heres-the-architecture-1kei</guid>
      <description>&lt;p&gt;I wanted to see what happens when you give Claude real tools — not a weather API, not a todo app — but image generation, voice synthesis, video composition, and quality grading. Could it orchestrate a full creative pipeline from a single prompt?&lt;/p&gt;

&lt;p&gt;The result is &lt;strong&gt;AdVideo Creator&lt;/strong&gt;: an open-source CLI where you type "create a 15-second TikTok ad for artisan coffee" and get back a finished &lt;code&gt;.mp4&lt;/code&gt; file. Script, images, voiceover, music, transitions, subtitles — all generated and composed automatically.&lt;/p&gt;

&lt;p&gt;Here's how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Claude Has No Hands
&lt;/h2&gt;

&lt;p&gt;Claude can write an excellent marketing script. Give it a product, a target audience, and a tone — it'll produce a hook, emotional beats, and a call to action that actually works.&lt;/p&gt;

&lt;p&gt;Then what?&lt;/p&gt;

&lt;p&gt;You still need images. A voiceover. Background music. Video editing. Platform-specific export. And if the script doesn't fit the timing after you lay it over the visuals, you go back to Claude, ask for a rewrite, and start the cycle again.&lt;/p&gt;

&lt;p&gt;This is the gap between "AI chatbot" and "AI application." Claude can &lt;em&gt;think&lt;/em&gt; about your ad, but it can't &lt;em&gt;make&lt;/em&gt; it. It has no hands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Giving Claude Hands with MCP
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; solves this. MCP is an open protocol that defines how AI models discover and use external tools. Think of it like HTTP but for AI capabilities — a standardized way for a client (the AI) and a server (the tools) to talk to each other.&lt;/p&gt;

&lt;p&gt;The architecture is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┐
│           CLIENT (Python)           │
│  User ←→ Claude API ←→ Tool Router  │
└──────────────┬──────────────────────┘
               │ stdio (JSON-RPC)
┌──────────────┴──────────────────────┐
│           MCP SERVER (Python)       │
│  Image │ Voice │ Video │ Grading    │
│  Stock │ Brand │ Cache │ System     │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;client&lt;/strong&gt; handles conversation with Claude. The &lt;strong&gt;server&lt;/strong&gt; handles doing things — generating images, producing voiceover, composing video. They communicate through stdio using the MCP protocol.&lt;/p&gt;

&lt;p&gt;Claude never talks to the server directly. The client is always the intermediary: Claude decides what tools to call, the client routes those calls to the server, and the server executes them.&lt;/p&gt;
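&lt;p&gt;Stripped of the MCP SDK, the client's routing role reduces to a dispatch table. This sketch uses a simplified message shape and a made-up handler, not the project's actual code:&lt;/p&gt;

```python
# Stripped-down version of the client's routing step: Claude emits a
# tool call, the client looks up the matching handler and forwards
# the arguments. The real project speaks full MCP (JSON-RPC over
# stdio); handler and field names here are simplified placeholders.

import json

def handle_create_project(name, platform, duration):
    return {"project_id": f"{platform}-{name}", "duration": duration}

TOOL_HANDLERS = {"create_project": handle_create_project}

def route_tool_call(raw_request):
    """Dispatch one serialized tool call to its registered handler."""
    req = json.loads(raw_request)
    handler = TOOL_HANDLERS[req["name"]]
    return handler(**req["arguments"])

msg = json.dumps({"name": "create_project",
                  "arguments": {"name": "coffee-ad",
                                "platform": "tiktok",
                                "duration": 15}})
print(route_tool_call(msg))
```

&lt;p&gt;The real client does the same lookup-and-forward over stdio with the full MCP handshake around it; the dispatch step is the core of the intermediary role.&lt;/p&gt;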

&lt;h2&gt;
  
  
  The 15-Step Pipeline
&lt;/h2&gt;

&lt;p&gt;When you ask for an ad, Claude doesn't just run one tool. It orchestrates a 15-step pipeline, calling different tools at each stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Brief → Template → Project → Script → Grade → Iterate → Save
  → Images (Gate 1) → Voiceover (Gate 2) → Music (Gate 3)
  → Consistency (Gate 4) → Compose (Gate 5)
  → Subtitles → Export → Deliver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical thing: &lt;strong&gt;Claude decides the order, not the code.&lt;/strong&gt; There's no hardcoded workflow. Claude sees all 45 tools and their descriptions, and it figures out which ones to call and when. The system prompt gives it a recommended pipeline, but Claude adapts — if the user imports their own images, it skips image generation. If they want stock footage instead of AI images, it searches Pexels.&lt;/p&gt;

&lt;p&gt;Here's what a real session looks like behind the scenes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Create a 15s TikTok ad for artisan coffee beans"

Claude → create_project("coffee-ad", "tiktok", 15)
Claude → get_ad_template("problem-agitate-solve")
Claude → save_script(project_id, script_json)
Claude → search_stock_video("tired person morning")
Claude → use_stock_video(project_id, scene_0, video_id)
Claude → generate_scene_image(project_id, 1, "coffee bag close-up...")
Claude → evaluate_scene_image(project_id, 1)        ← Quality Gate
Claude → generate_scene_image(project_id, 2, "person smiling...")
Claude → evaluate_scene_image(project_id, 2)        ← Quality Gate
Claude → generate_voiceover(project_id, script_text)
Claude → evaluate_voiceover(project_id)              ← Quality Gate
Claude → generate_background_music(project_id, "energetic")
Claude → evaluate_background_music(project_id)       ← Quality Gate
Claude → evaluate_asset_consistency(project_id)       ← Quality Gate
Claude → compose_video(project_id, timeline)
Claude → evaluate_composition(project_id)             ← Quality Gate
Claude → add_subtitles(project_id, "word_highlight")
Claude → export_video(project_id, "tiktok")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's ~18 tool calls from a single user message. Each one goes through the MCP protocol: Claude emits a tool call → client routes to server → server executes → result goes back to Claude → Claude decides what's next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part I'm Most Proud Of: Quality Gates
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. Most AI pipelines generate output and hope for the best. AdVideo Creator has &lt;strong&gt;5 quality gates&lt;/strong&gt; that grade every generated asset automatically:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gate&lt;/th&gt;
&lt;th&gt;What It Checks&lt;/th&gt;
&lt;th&gt;Pass Threshold&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scene Image&lt;/td&gt;
&lt;td&gt;CLIP similarity to prompt, safe-zone compliance, framing&lt;/td&gt;
&lt;td&gt;7.0/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voiceover&lt;/td&gt;
&lt;td&gt;Whisper transcription vs script, WPM pacing, duration fit&lt;/td&gt;
&lt;td&gt;7.5/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Background Music&lt;/td&gt;
&lt;td&gt;BPM, duration match, loop quality, mix compatibility&lt;/td&gt;
&lt;td&gt;7.0/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-Asset Consistency&lt;/td&gt;
&lt;td&gt;Color palette coherence, pacing alignment, energy match&lt;/td&gt;
&lt;td&gt;6.5/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final Composition&lt;/td&gt;
&lt;td&gt;Duration accuracy, audio balance, platform spec compliance&lt;/td&gt;
&lt;td&gt;7.5/10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When an asset fails a gate, Claude retries — but not randomly. The system follows a &lt;strong&gt;drift prevention&lt;/strong&gt; rule: always retry from the &lt;em&gt;original&lt;/em&gt; parameters with a targeted fix, never modify the previous retry's parameters. This prevents the common problem where each retry drifts further from the creative direction.&lt;/p&gt;

&lt;p&gt;For images, the fix is additive — append a composition hint like "leave center space for text." For voiceover, it's subtractive — shorten the text if pacing is too fast. For music, it's a swap — try a different mood keyword. For consistency, it's surgical — only regenerate the outlier assets.&lt;/p&gt;
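&lt;p&gt;The rule itself fits in a few lines. A sketch of the additive image case, with hypothetical hint strings:&lt;/p&gt;

```python
# Drift-prevention retry: every attempt is built from the ORIGINAL
# prompt plus one targeted fix, never from the previous retry's
# result. Hint strings are illustrative, not the tool's actual text.

def build_retry_prompt(original_prompt, attempt, fix_hints):
    """Attempt 0 is the first try; each retry appends exactly one
    hint to the untouched original, so failures never compound."""
    if attempt == 0:
        return original_prompt
    hint = fix_hints[min(attempt, len(fix_hints)) - 1]
    return f"{original_prompt}. {hint}"

hints = ["Leave center space for text overlay",
         "Use a single consistent light source"]
p0 = build_retry_prompt("coffee bag close-up", 0, hints)
p1 = build_retry_prompt("coffee bag close-up", 1, hints)
p2 = build_retry_prompt("coffee bag close-up", 2, hints)
print(p2)   # built from the original prompt, not from p1
```

&lt;p&gt;Note that attempt 2 is derived from the original prompt, not from attempt 1, so a bad retry can never become the new baseline.&lt;/p&gt;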

&lt;p&gt;The graders themselves use real signal processing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image grading&lt;/strong&gt;: CLIP similarity score between the prompt and generated image, plus safe-zone compliance checking that important content isn't cut off at platform edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voiceover grading&lt;/strong&gt;: Whisper transcription compared against the original script text, words-per-minute checking against language-specific ranges (English: 130-170 WPM, Arabic: 100-140 WPM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Music grading&lt;/strong&gt;: librosa for BPM extraction, pydub for loudness analysis and loop-point detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency grading&lt;/strong&gt;: K-means clustering on color palettes across all scene images, BPM-to-pacing correlation&lt;/li&gt;
&lt;/ul&gt;
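&lt;p&gt;To make the consistency idea concrete, here's a deliberately simplified stand-in for the palette check: mean scene color instead of K-means palettes, with a made-up distance threshold.&lt;/p&gt;

```python
# Simplified stand-in for the cross-scene palette check: instead of
# K-means palettes, compare each scene's mean RGB against the
# cross-scene average and penalize outliers. The max_dist threshold
# and 0-10 scaling are assumptions, not the grader's real values.

def mean_color(pixels):
    n = len(pixels)
    return tuple(sum(px[i] for px in pixels) / n for i in range(3))

def palette_consistency(scene_pixel_lists, max_dist=120.0):
    """Score 0-10: 10 means every scene's mean color matches the
    cross-scene average; 0 means an outlier is max_dist or more away."""
    means = [mean_color(p) for p in scene_pixel_lists]
    overall = mean_color(means)
    worst = 0.0
    for m in means:
        d = sum((a - b) ** 2 for a, b in zip(m, overall)) ** 0.5
        worst = max(worst, d)
    return round(10.0 * max(0.0, 1.0 - worst / max_dist), 2)

warm = [[(200, 150, 100), (210, 140, 90)]] * 4        # cohesive scenes
mixed = warm[:3] + [[(40, 60, 200), (50, 70, 210)]]   # one outlier
print(palette_consistency(warm), palette_consistency(mixed))
```

&lt;p&gt;Four cohesive warm scenes score a perfect 10.0; swap in one cool-blue outlier and the score collapses to 0.0.&lt;/p&gt;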

&lt;h2&gt;
  
  
  Script Self-Grading
&lt;/h2&gt;

&lt;p&gt;Before any assets are generated, Claude grades its own script on 6 marketing criteria:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hook Strength&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emotional Appeal&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CTA Clarity&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audience Targeting&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pacing &amp;amp; Flow&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memorability&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Scripts must score &lt;strong&gt;8.0/10&lt;/strong&gt; or higher. If they don't, Claude identifies the weakest criterion and rewrites targeting that specific weakness — up to 3 iterations. This means the script is already strong before the expensive image and voice generation starts.&lt;/p&gt;
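&lt;p&gt;The overall grade is a weighted sum. Applying the table's weights to an invented set of per-criterion scores:&lt;/p&gt;

```python
# Weighted overall grade from the six rubric criteria. Weights are
# the ones in the table above; the per-criterion scores are invented
# to illustrate the 8.0 pass threshold.

WEIGHTS = {"hook": 0.25, "emotion": 0.20, "cta": 0.20,
           "audience": 0.15, "pacing": 0.10, "memorability": 0.10}

def overall_score(scores):
    return round(sum(scores[k] * w for k, w in WEIGHTS.items()), 2)

def weakest_criterion(scores):
    """The rewrite targets whichever criterion scores lowest."""
    return min(scores, key=lambda k: scores[k])

draft = {"hook": 7, "emotion": 9, "cta": 6, "audience": 8,
         "pacing": 8, "memorability": 8}
print(overall_score(draft), weakest_criterion(draft))   # 7.55 cta
```

&lt;p&gt;A 7.55 fails the 8.0 gate, and &lt;code&gt;weakest_criterion&lt;/code&gt; tells the rewrite where to aim: here, the CTA.&lt;/p&gt;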

&lt;p&gt;The grading rubric lives in the MCP server as a resource (&lt;code&gt;config://grading-rubric&lt;/code&gt;), not hardcoded in the prompt. Claude reads it at runtime. This means you can modify the rubric without touching any code.&lt;/p&gt;

&lt;h2&gt;
  
  
  8 Ad Templates
&lt;/h2&gt;

&lt;p&gt;Claude doesn't write scripts from scratch — it uses proven frameworks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem-Agitate-Solve&lt;/strong&gt; — Hook with pain point, amplify the problem, reveal the solution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before/After&lt;/strong&gt; — Show the transformation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testimonial&lt;/strong&gt; — Social proof format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product Demo&lt;/strong&gt; — Feature showcase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trend Hijack&lt;/strong&gt; — Ride a current trend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Countdown/Urgency&lt;/strong&gt; — Limited time offers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling&lt;/strong&gt; — Mini narrative arc&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UGC Style&lt;/strong&gt; — Raw, authentic feel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each template defines a scene structure — how many scenes, what each scene should contain, where the hook goes, where the CTA lands. Claude selects the best template for the product type and follows its structure while adapting the content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Backend Architecture
&lt;/h2&gt;

&lt;p&gt;The tool has tiered fallbacks for each capability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image generation:&lt;/strong&gt; Replicate (Flux Schnell, ~1-2s, ~$0.003/image) → HuggingFace (free, ~3-5s) → Local SDXL (free, requires GPU)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice synthesis:&lt;/strong&gt; ElevenLabs (ultra-natural, ~$0.06/ad) → OpenAI TTS (~$0.003/ad)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stock video:&lt;/strong&gt; Pexels API (free, 200 req/hour)&lt;/p&gt;

&lt;p&gt;The factory pattern makes this transparent — &lt;code&gt;create_image_engine()&lt;/code&gt; checks which API keys are available and returns the best backend. Add a new key to &lt;code&gt;.env&lt;/code&gt; and the entire pipeline upgrades automatically. Remove it and it gracefully falls back.&lt;/p&gt;
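&lt;p&gt;A sketch of that factory (class and environment-variable names here are placeholders, not the repo's actual identifiers):&lt;/p&gt;

```python
# Engine factory with tiered fallback: return the best backend the
# available API keys allow. Class and environment-variable names are
# placeholders, not the repo's actual identifiers.

import os

class ReplicateEngine:   name = "replicate"    # paid, fastest
class HuggingFaceEngine: name = "huggingface"  # free, slower
class LocalSDXLEngine:   name = "local-sdxl"   # free, needs a GPU

def create_image_engine(env=None):
    env = os.environ if env is None else env
    if env.get("REPLICATE_API_TOKEN"):
        return ReplicateEngine()
    if env.get("HF_TOKEN"):
        return HuggingFaceEngine()
    return LocalSDXLEngine()

print(create_image_engine({"HF_TOKEN": "hf_xxx"}).name)   # huggingface
print(create_image_engine({}).name)                       # local-sdxl
```

&lt;p&gt;Callers never know which backend they got; they just call the engine interface, which is what makes the upgrade-by-adding-a-key behavior work.&lt;/p&gt;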

&lt;p&gt;The minimum setup is just an Anthropic API key. Everything else is optional. You can generate a complete ad for as little as &lt;strong&gt;$0.01&lt;/strong&gt; (Anthropic only, no images/voice) or &lt;strong&gt;$0.10-$0.15&lt;/strong&gt; with all premium backends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multilingual: Arabic RTL Support
&lt;/h2&gt;

&lt;p&gt;This was the hardest engineering challenge. The pipeline supports full Arabic ads with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RTL text rendering&lt;/strong&gt; — Pillow's HarfBuzz backend with Noto Sans Arabic font, automatic text reshaping&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-language voice defaults&lt;/strong&gt; — Arabic uses ElevenLabs &lt;code&gt;eleven_multilingual_v2&lt;/code&gt; with stability tuned to 0.50 (vs 0.35 default) for more consistent Arabic pronunciation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language-aware grading&lt;/strong&gt; — Arabic has different WPM ranges (100-140 vs English's 130-170), and the voiceover grader normalizes Arabic text (strips tashkeel, normalizes hamza) before comparing against Whisper transcription&lt;/li&gt;
&lt;/ul&gt;
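&lt;p&gt;The normalization step in the last bullet is plain standard-library Python. This sketch covers only a minimal subset of the character ranges a full normalizer would handle:&lt;/p&gt;

```python
# Sketch of Arabic text normalization before comparing the script to
# the Whisper transcript: strip tashkeel (diacritics) and collapse
# alef/hamza variants so cosmetic spelling differences don't count
# as transcription errors. Character sets here are a minimal subset.

import re

TASHKEEL = re.compile("[\u064B-\u0652]")      # fathatan through sukun
ALEF_VARIANTS = str.maketrans("أإآ", "ااا")   # hamza forms to bare alef

def normalize_arabic(text):
    return TASHKEEL.sub("", text).translate(ALEF_VARIANTS)

print(normalize_arabic("الْقَهْوَة"))   # القهوة (diacritics removed)
```

&lt;p&gt;Without this step, a perfectly read voiceover can "fail" transcription comparison purely on diacritic and hamza spelling differences.&lt;/p&gt;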

&lt;p&gt;Not many AI tools handle RTL correctly. Getting Arabic subtitles to render properly over video, with the right font and correct text direction, required diving deep into Pillow's text rendering internals.&lt;/p&gt;

&lt;h2&gt;
  
  
  The MCP Server: 45 Tools, 12 Resources
&lt;/h2&gt;

&lt;p&gt;The server exposes everything through MCP's three primitives:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools (45)&lt;/strong&gt; — actions Claude can take. Project management, image generation, voice synthesis, video composition, quality grading, brand profiles, stock video search, asset import, cache management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources (12)&lt;/strong&gt; — read-only data Claude can access. Platform specs, style presets, grading rubrics, pricing info, voice catalogs, ad templates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts (8)&lt;/strong&gt; — reusable instruction templates. The main system prompt with the 15-step workflow, the script grader, the asset grader with drift prevention rules.&lt;/p&gt;

&lt;p&gt;The key design decision: everything is discoverable at runtime. When the client connects, it calls &lt;code&gt;tools/list&lt;/code&gt; and gets back all 45 tools with their schemas. It calls &lt;code&gt;resources/list&lt;/code&gt; and gets all 12 resources. Claude sees everything and decides what to use. Add a new tool to the server? Claude picks it up on the next connection.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building this taught me patterns that apply to any AI application:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude is better at orchestration than you'd expect.&lt;/strong&gt; Given clear tool descriptions and a recommended workflow, Claude makes remarkably good decisions about which tools to call and in what order. The key is writing descriptive tool descriptions — Claude reads them carefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quality gates change everything.&lt;/strong&gt; Without them, you get "generate and pray." With them, you get consistent, predictable output. The cost overhead is small (~5-10% of total pipeline cost for grading) and the quality improvement is significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drift prevention matters.&lt;/strong&gt; When retrying failed generations, always go back to the original parameters and apply a targeted fix. Never modify the previous retry's parameters. This single rule eliminated most of my "the 3rd retry looks nothing like what was requested" problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP's separation of concerns pays off.&lt;/strong&gt; Building the server independently from the client made development much faster. I could test every tool with MCP Inspector (a web UI) without making a single Claude API call. And the same server works with Claude Desktop, no modifications needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;AdVideo Creator is MIT licensed and open source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/UrNas/advideo-creator" rel="noopener noreferrer"&gt;github.com/UrNas/advideo-creator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Minimum setup: Python 3.12+, FFmpeg, and an Anthropic API key. That's it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/UrNas/advideo-creator.git
&lt;span class="nb"&gt;cd &lt;/span&gt;advideo-creator
uv &lt;span class="nb"&gt;sync
cp&lt;/span&gt; .env.example .env    &lt;span class="c"&gt;# add your ANTHROPIC_API_KEY&lt;/span&gt;
uv run python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add more API keys (Replicate, ElevenLabs, OpenAI, Pexels) to unlock premium features.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're interested in learning how to build this kind of AI application from scratch — tool design, agentic loops, quality gates, engine abstractions — I'm working on a full course covering every module in detail. Star the repo and follow for updates.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>python</category>
    </item>
  </channel>
</rss>
