<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Aakash Gour</title>
    <description>The latest articles on Forem by Aakash Gour (@aakash_gour).</description>
    <link>https://forem.com/aakash_gour</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861923%2Ffa03288d-9c97-4d68-ab57-2485fc056a66.jpg</url>
      <title>Forem: Aakash Gour</title>
      <link>https://forem.com/aakash_gour</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aakash_gour"/>
    <language>en</language>
    <item>
      <title>I Tested OpenAI, Anthropic, and Cohere for Bulk Content Generation. Here's What the Data Actually Shows.</title>
      <dc:creator>Aakash Gour</dc:creator>
      <pubDate>Thu, 16 Apr 2026 06:30:05 +0000</pubDate>
      <link>https://forem.com/aakash_gour/i-tested-openai-anthropic-and-cohere-for-bulk-content-generation-heres-what-the-data-actually-17j2</link>
      <guid>https://forem.com/aakash_gour/i-tested-openai-anthropic-and-cohere-for-bulk-content-generation-heres-what-the-data-actually-17j2</guid>
      <description>&lt;p&gt;My content pipeline needed to process 10,000 articles a month.&lt;br&gt;
I had three serious API options: OpenAI, Anthropic, and Cohere.&lt;/p&gt;

&lt;p&gt;Every comparison article I found online was either two years old, benchmarked on toy examples, or written by someone with a vendor relationship. So I ran my own.&lt;/p&gt;

&lt;p&gt;Three weeks, 4,200 test requests, one specific use case: bulk content generation at production scale. Here's what happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Evaluation Criteria (And Why These Specific Metrics)&lt;/strong&gt;&lt;br&gt;
Before I get to the numbers, let me be clear about what I was optimizing for. "Best LLM API" is a meaningless question; "best for my use case" is what I cared about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Output quality on structured content&lt;/strong&gt; — I needed articles with consistent heading structure, tone, and word count. Not just fluent text.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per 1,000 words&lt;/strong&gt; — At 10K articles/month, a $0.002 difference per article is $20/month. A $0.02 difference is $200/month.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency (p50 and p95)&lt;/strong&gt; — The p95 matters more than the p50 for bulk work. One slow request holds up a queue.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction adherence&lt;/strong&gt; — If I say "use h2 headers, not h3," does it actually do that across 1,000 requests? Or does it drift?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error rate over volume&lt;/strong&gt; — Rate-limit errors, context errors, malformed responses. What breaks at scale?&lt;/li&gt;
&lt;/ul&gt;
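&lt;p&gt;To sanity-check numbers like these against your own workload, a back-of-envelope estimator is enough. The token-per-word ratio and the per-million prices in this sketch are illustrative assumptions, not figures from my test; plug in your provider's current pricing:&lt;/p&gt;

```javascript
const TOKENS_PER_WORD = 1.33; // rough English average; an assumption, not measured

function costPerArticle(words, promptTokens, inPricePerM, outPricePerM) {
  const completionTokens = Math.round(words * TOKENS_PER_WORD);
  return (promptTokens * inPricePerM + completionTokens * outPricePerM) / 1e6;
}

// e.g. a 600-word article with a 300-token prompt, at hypothetical
// $2.50 input / $10.00 output per 1M tokens
const perArticle = costPerArticle(600, 300, 2.5, 10.0); // ~$0.0087
const perMonth = perArticle * 10_000;                   // ~$87 at 10K articles/month
```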

&lt;p&gt;I didn't test: coding tasks, reasoning, math, or anything multimodal. Those benchmarks exist everywhere. This one doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Test Setup&lt;/strong&gt;&lt;br&gt;
Same prompt, same word count target, same structural requirements, across all three providers. I wrote a simple Node.js harness to run the tests and log results to a SQLite database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CohereClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cohere-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Database&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;better-sqlite3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;benchmark.db&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    provider TEXT,
    model TEXT,
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    latency_ms INTEGER,
    cost_usd REAL,
    heading_count INTEGER,
    word_count INTEGER,
    error TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
  )
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runBenchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;generateFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Count h2 headings to measure instruction adherence&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^## /gm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO results (provider, model, prompt_tokens, completion_tokens, 
                           latency_ms, cost_usd, heading_count, word_count)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;promptTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completionTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;headings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;words&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO results (provider, model, error) VALUES (?, ?, ?)
    `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran 1,400 requests against each provider — enough to get stable percentiles and surface intermittent errors. The prompt asked for a 600-word article with exactly 3 ## sections and a specific tone. Straightforward structural requirements. The kind of thing you'd run 10,000 times.&lt;/p&gt;
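&lt;p&gt;The p50 and p95 figures that follow come from a nearest-rank percentile over the logged latency samples. A minimal sketch of that calculation (not the harness's exact code):&lt;/p&gt;

```javascript
// Nearest-rank percentile: sort, then pick the value at the p-th rank.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

const latencies = [850, 920, 1100, 4200, 980]; // ms, illustrative values only
const p50 = percentile(latencies, 50); // 980
const p95 = percentile(latencies, 95); // 4200 -- one outlier dominates the tail
```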

&lt;p&gt;&lt;strong&gt;The Numbers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4v2e1a3tk2ok7m9gcfm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4v2e1a3tk2ok7m9gcfm4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At 10,000 articles a month (averaging 600 words each), the cost difference between gpt-4o and gpt-4o-mini is roughly $167/month. That's not nothing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4jo73yvust0v8lt3gyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4jo73yvust0v8lt3gyc.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The p95 numbers are where things get interesting. Cohere's command-r-plus had the highest p95 latency by a significant margin — nearly 2x OpenAI's gpt-4o and almost 4x Anthropic's claude-haiku-4-5. For synchronous use cases this would be painful. For queued bulk generation it's manageable, but you need to account for it in your timeout settings.&lt;/p&gt;

&lt;p&gt;Claude Haiku had the best p95 of any capable model. If latency matters more than cost in your use case, that's worth noting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instruction Adherence&lt;/strong&gt;&lt;br&gt;
This is the metric nobody else was measuring. I asked for exactly 3 ## sections in every request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn54ef8xmbbkd48tvsixn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn54ef8xmbbkd48tvsixn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Sonnet followed structural instructions the most consistently. This surprised me — I expected the output quality gap between Sonnet and Haiku to be larger than the instruction adherence gap. It wasn't. Haiku drifted noticeably more on structure.&lt;/p&gt;

&lt;p&gt;Cohere's models had the most drift. command-r would frequently add extra sections or collapse two sections into one. For casual content this is fine. For template-driven content pipelines where downstream parsing depends on consistent structure, it's a problem.&lt;/p&gt;
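&lt;p&gt;Adherence here is simply the fraction of outputs whose heading count matches the request. Against the logged outputs, that's a few lines (a sketch reusing the same heading regex as the harness):&lt;/p&gt;

```javascript
// Fraction of outputs containing exactly the requested number of "## " sections.
function adherenceRate(outputs, expectedSections) {
  const compliant = outputs.filter(
    (text) => (text.match(/^## /gm) || []).length === expectedSections
  ).length;
  return compliant / outputs.length;
}

const sample = [
  "## One\nbody\n## Two\nbody\n## Three\nbody", // 3 sections: compliant
  "## One\nbody\n## Two\nbody",                 // drifted: only 2 sections
];
adherenceRate(sample, 3); // 0.5 for this toy sample
```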

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhuw52lip6zh1g5bc7lub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhuw52lip6zh1g5bc7lub.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At 10,000 requests/month, a 2.8% error rate means 280 failed generations that need retries. That's not catastrophic, but it's a cost: retry logic, queue overhead, and the occasional job that fails three times and needs manual intervention.&lt;/p&gt;
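&lt;p&gt;If failures were independent (they aren't quite; rate-limit errors cluster), the retry math would look like this. Treat the second number as a floor, not a promise:&lt;/p&gt;

```javascript
// Expected number of jobs still failing after `attempts` tries, assuming
// each attempt fails independently with probability `errorRate`. Real
// failures correlate, so this understates the worst case.
function expectedStillFailing(jobs, errorRate, attempts) {
  return jobs * Math.pow(errorRate, attempts);
}

expectedStillFailing(10_000, 0.028, 1); // 280 first-pass failures
expectedStillFailing(10_000, 0.028, 3); // ~0.22 jobs dead after three attempts
```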

&lt;p&gt;OpenAI and Anthropic both had error rates under 1% in my test. Cohere's error rate was high enough that I'd budget for retry infrastructure before relying on it at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output Quality: The Part That's Hard to Put in a Table&lt;/strong&gt;&lt;br&gt;
I spot-checked 150 outputs across providers — 50 per provider, sampled across model tiers. I evaluated them on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tone consistency with the prompt&lt;/li&gt;
&lt;li&gt;Logical flow between sections&lt;/li&gt;
&lt;li&gt;Avoidance of filler phrases ("In conclusion...", "It's important to note...")&lt;/li&gt;
&lt;li&gt;Whether the content was actually useful or just plausible-sounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest assessment:&lt;/strong&gt;&lt;br&gt;
Claude Sonnet produced the most editable drafts. The structure was clean, the tone held throughout, and it was less prone to the kind of filler-heavy conclusions that make AI content feel generic. If I were generating content that humans would lightly edit before publishing, Sonnet gave editors the least work.&lt;/p&gt;

&lt;p&gt;GPT-4o was close behind. Slightly more verbose, occasionally padded, but strong structural instincts and good default tone. If you're already in the OpenAI ecosystem and using the Assistants API, there's no compelling reason to switch just for content generation.&lt;/p&gt;

&lt;p&gt;Claude Haiku surprised me on quality given its cost. The outputs weren't Sonnet-level, but they were significantly better than I expected from a model at that price point. For high-volume, lower-stakes content (product tags, meta descriptions, brief blurbs), Haiku is underrated.&lt;/p&gt;

&lt;p&gt;Cohere command-r-plus had the most inconsistent quality. Some outputs were excellent. Others had structural problems or tonal drift mid-article. For human-reviewed content pipelines this is manageable. For automated pipelines where content goes straight to a CMS, the variance is a real issue.&lt;/p&gt;

&lt;p&gt;GPT-4o-mini was fine. Not inspiring. Solid enough for use cases where you're generating high volumes of content that gets human review anyway. At its price point, the quality-per-dollar ratio is hard to beat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Retry Handler I Ended Up Writing&lt;/strong&gt;&lt;br&gt;
Every provider needs retry logic. Here's the one I landed on after testing various approaches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateWithRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;generateFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;baseDelay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;maxDelay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isRateLimit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
        &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rate limit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
        &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;too many requests&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isRetryable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isRateLimit&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isRetryable&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;// Exponential backoff with jitter&lt;/span&gt;
      &lt;span class="c1"&gt;// Without jitter, retrying clients hit the API in waves and cause more rate limits&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;exponential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;baseDelay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;exponential&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;exponential&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxDelay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`Attempt &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;). Retrying in &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;ms...`&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The jitter is important. I learned this the hard way: without it, rate-limited requests all retry at the same interval, which creates another burst that triggers another rate limit. Jitter spreads the retry load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Can Go Wrong (And Did)&lt;/strong&gt;&lt;br&gt;
Rate limits hit differently at scale than in testing. My test harness ran requests at a controlled rate. In production, queue drains aren't that clean — you get bursts when a lot of jobs land at once. I hit OpenAI rate limits in production that I never hit in testing because of this. Solution: implement a token bucket limiter, not just a fixed delay between requests.&lt;/p&gt;

&lt;p&gt;Instruction adherence degrades with longer prompts. The 3-section test used a clean, short prompt. When I added more context (brand guidelines, examples, negative constraints), adherence dropped across all models. Claude Sonnet held up best under prompt complexity. GPT-4o-mini degraded the most.&lt;/p&gt;

&lt;p&gt;Cohere's failure modes don't always surface as errors. A few of my requests hit a content-filtering response that wasn't a standard API error — it returned a 200 with a specific response body structure. My generic error handler missed it and logged a successful request with garbled output. Read Cohere's error documentation more carefully than I did.&lt;/p&gt;
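&lt;p&gt;The defensive pattern that would have caught it: validate the response body even when the status is 200. The field names below (text, finishReason) and the thresholds are illustrative, not Cohere's exact schema; check your provider's actual response shape:&lt;/p&gt;

```javascript
// Treat a 200 with a suspicious body as a failure. Field names are
// hypothetical placeholders -- adapt to the real response schema.
function validateGeneration(response, minWords = 50) {
  if (!response || typeof response.text !== "string") {
    return { ok: false, reason: "missing text field" };
  }
  // Tolerate providers that omit a finish reason entirely
  const finish = response.finishReason || "COMPLETE";
  if (finish !== "COMPLETE") {
    return { ok: false, reason: "non-complete finish reason: " + finish };
  }
  const wordCount = response.text.trim().split(/\s+/).length;
  if (wordCount >= minWords) {
    return { ok: true };
  }
  return { ok: false, reason: "suspiciously short output" };
}
```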

&lt;p&gt;Cold starts are real with Anthropic's API. A small percentage of Haiku requests (roughly 2-3% in my data) had latencies 3-5x higher than normal. Not errors — just slow. I don't know if this is model loading, infrastructure, or something else, but it showed up consistently enough to affect the p95.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Recommendation (Conditional, As It Should Be)&lt;/strong&gt;&lt;br&gt;
For high-volume, cost-sensitive generation where content gets human review: &lt;em&gt;gpt-4o-mini or command-r.&lt;/em&gt; The cost savings are significant. The quality gap is real but acceptable if humans are in the loop.&lt;/p&gt;

&lt;p&gt;For high-volume, automated pipelines where structure consistency matters: &lt;em&gt;claude-haiku-4-5.&lt;/em&gt; Best p95 latency, solid instruction adherence, reasonable cost. The quality is better than the price suggests.&lt;/p&gt;

&lt;p&gt;For lower-volume, higher-quality generation that feeds into editorial workflows: &lt;em&gt;claude-sonnet-4-5.&lt;/em&gt; The instruction adherence and output editability are worth the cost premium when you're generating content that humans will touch.&lt;/p&gt;

&lt;p&gt;For Cohere: If you have a specific reason to use it (enterprise contract, data residency requirements, a use case where Command-R performs unusually well for your specific domain), fine. For general content generation benchmarked against my criteria, it didn't compete with the OpenAI and Anthropic options.&lt;/p&gt;

&lt;p&gt;One thing I'd do differently: I didn't test Anthropic's prompt caching for repeated system prompts. For bulk content generation where you're sending the same lengthy system prompt with each request, caching can significantly reduce input token costs. That's the next benchmark I'm running.&lt;/p&gt;
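&lt;p&gt;The potential savings are easy to estimate. The 90% cached-read discount below is an assumption for illustration (check Anthropic's current pricing), and the sketch ignores the one-time cache-write premium on the first request:&lt;/p&gt;

```javascript
// Rough input-token savings from caching a repeated system prompt.
// readDiscount = 0.9 is an assumed discount, not a quoted price.
function cachingSavings(requests, systemTokens, inPricePerM, readDiscount = 0.9) {
  const withoutCache = (requests * systemTokens * inPricePerM) / 1e6;
  const withCache = withoutCache * (1 - readDiscount);
  return { withoutCache, withCache, saved: withoutCache - withCache };
}

// e.g. a 2,000-token system prompt sent 10,000 times at an assumed $3.00/1M input rate
cachingSavings(10_000, 2_000, 3.0);
```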

&lt;p&gt;What's your use case? I'm curious whether the instruction adherence numbers match what others are seeing — especially if you're doing high-volume structured generation. Different domains might surface different failure modes than content generation did.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>api</category>
      <category>benchmark</category>
    </item>
    <item>
      <title>How I Built a Keyword-to-Blog-Post Pipeline in Python (Under 50 Lines)</title>
      <dc:creator>Aakash Gour</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:24:34 +0000</pubDate>
      <link>https://forem.com/aakash_gour/how-i-built-a-keyword-to-blog-post-pipeline-in-python-under-50-lines-264c</link>
      <guid>https://forem.com/aakash_gour/how-i-built-a-keyword-to-blog-post-pipeline-in-python-under-50-lines-264c</guid>
      <description>&lt;p&gt;I had a list of 40 keywords and needed a blog post for each one.&lt;br&gt;
Writing them manually would take two weeks.&lt;br&gt;
Writing a script to generate them took one afternoon — and 47 lines of Python.&lt;br&gt;
Here's exactly how I built it.&lt;/p&gt;

&lt;p&gt;This isn't a tutorial about AI being magic. It's a tutorial about the specific, unsexy plumbing you need to turn a keyword into a structured, usable blog post — with retry logic, output formatting, and a folder of &lt;em&gt;.md&lt;/em&gt; files you can actually work with.&lt;/p&gt;

&lt;p&gt;By the end of this, you'll have a working script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes a list of keywords from a &lt;em&gt;.txt&lt;/em&gt; file&lt;/li&gt;
&lt;li&gt;Sends a structured prompt to the OpenAI API&lt;/li&gt;
&lt;li&gt;Parses and saves each response as a Markdown file&lt;/li&gt;
&lt;li&gt;Handles rate limit errors without crashing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is harder than it looks&lt;/strong&gt;&lt;br&gt;
The naive version is 5 lines:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmzz154k8p4rftwurucv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmzz154k8p4rftwurucv.png" alt=" " width="800" height="172"&gt;&lt;/a&gt;&lt;br&gt;
That works exactly once, in a Jupyter notebook, for a demo.&lt;br&gt;
In practice, you hit three problems immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Rate limits. OpenAI's default tier for &lt;em&gt;gpt-4o&lt;/em&gt; is 3 requests per minute. Try to fire 40 at once and you'll get a &lt;em&gt;RateLimitError&lt;/em&gt; on request 4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unstructured output. "Write a blog post" gets you anything from a 200-word paragraph to a 3,000-word essay with inconsistent headers. If you're using this content anywhere, you need predictable structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No persistence. If the script crashes on keyword 22, you've lost the first 21. You need to write each output to disk as it completes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 47-line version handles all three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;An OpenAI API key&lt;/li&gt;
&lt;li&gt;openai library: &lt;em&gt;pip install openai&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No frameworks, no databases, no Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The setup&lt;/strong&gt;&lt;br&gt;
Create this file structure:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrq5s6x3kiksn7cw3s0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrq5s6x3kiksn7cw3s0l.png" alt=" " width="800" height="126"&gt;&lt;/a&gt;&lt;br&gt;
Your &lt;em&gt;keywords.txt&lt;/em&gt; should look like this — one keyword per line:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yd9pn8vqhnxe0wjwpdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yd9pn8vqhnxe0wjwpdw.png" alt=" " width="753" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The script&lt;/strong&gt;&lt;br&gt;
Here's &lt;em&gt;generate.py&lt;/em&gt; in full:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import time
from pathlib import Path
from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(exist_ok=True)

PROMPT_TEMPLATE = """Write a blog post about: "{keyword}"

Use this exact structure:

# [Title]

## Introduction
[2-3 sentences introducing the topic]

## [Section 1 heading]
[3-4 sentences]

## [Section 2 heading]
[3-4 sentences]

## [Section 3 heading]
[3-4 sentences]

## Conclusion
[2-3 sentences wrapping up with a practical takeaway]

Tone: conversational and practical. Avoid fluff. Total length: ~400 words."""

def slugify(keyword: str) -&amp;gt; str:
    return keyword.lower().replace(" ", "-").replace("/", "-")

def generate_post(keyword: str, retries: int = 3) -&amp;gt; str:
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(keyword=keyword)}],
                max_tokens=700,
                temperature=0.7,
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait = 20 * (attempt + 1)  # back off: 20s, 40s, 60s
            print(f"  Rate limit hit. Waiting {wait}s before retry {attempt + 1}/{retries}...")
            time.sleep(wait)
    raise RuntimeError(f"Failed to generate post for '{keyword}' after {retries} retries.")

def run(keywords_file: str = "keywords.txt"):
    keywords = Path(keywords_file).read_text().strip().splitlines()
    print(f"Processing {len(keywords)} keywords...\n")

    for i, keyword in enumerate(keywords, 1):
        output_path = OUTPUT_DIR / f"{slugify(keyword)}.md"

        if output_path.exists():
            print(f"[{i}/{len(keywords)}] Skipping '{keyword}' — already generated.")
            continue

        print(f"[{i}/{len(keywords)}] Generating: '{keyword}'")
        content = generate_post(keyword)
        output_path.write_text(content, encoding="utf-8")

        # OpenAI free tier: ~3 requests/minute. This keeps us just under.
        time.sleep(22)

    print("\nDone. Check the output/ folder.")

if __name__ == "__main__":
    run()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What each part is actually doing&lt;/strong&gt;&lt;br&gt;
The prompt template is doing the real work here. "Write a blog post" is too open-ended — the model will vary wildly in length and structure. The template locks in a specific H2 structure, a word count target, and a tone instruction. Your output becomes predictable enough to actually use.&lt;/p&gt;

&lt;p&gt;The retry loop in &lt;em&gt;generate_post&lt;/em&gt; backs off linearly (20s, 40s, 60s) because OpenAI's rate limit errors are temporary. Most of the time, waiting 20 seconds is enough. The loop means the script keeps running instead of crashing and forcing you to restart.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;if output_path.exists(): continue&lt;/em&gt; check is the most important line for long runs. If you're processing 100 keywords and the script dies at #73, you don't want to regenerate the first 72. This check skips already-completed files and resumes from where you left off.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;time.sleep(22)&lt;/em&gt; at the bottom of the loop is tuned for the free tier rate limit. If you're on a paid OpenAI tier with higher limits, you can drop this to &lt;em&gt;time.sleep(2)&lt;/em&gt; or remove it entirely and let the retry logic handle any occasional errors.&lt;/p&gt;
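&lt;p&gt;If you'd rather derive the delay from whatever requests-per-minute limit your tier actually gives you, a small helper does it (a sketch; &lt;em&gt;pacing_delay&lt;/em&gt; is not part of the original 47-line script):&lt;/p&gt;

```python
def pacing_delay(requests_per_minute: int, safety_margin: float = 0.1) -> float:
    # Minimum delay between requests to stay under a per-minute limit,
    # padded by a small safety margin so clock jitter doesn't trip the limit.
    return (60.0 / requests_per_minute) * (1 + safety_margin)
```

&lt;p&gt;At 3 requests/minute this gives 22 seconds, which is where the hard-coded &lt;em&gt;time.sleep(22)&lt;/em&gt; comes from.&lt;/p&gt;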

&lt;p&gt;&lt;strong&gt;Running it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx8945fg35t6psxsdw01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx8945fg35t6psxsdw01.png" alt=" " width="800" height="111"&gt;&lt;/a&gt;&lt;br&gt;
Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zhz0j6hj79pxujxifgd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zhz0j6hj79pxujxifgd.png" alt=" " width="777" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your &lt;em&gt;output/&lt;/em&gt; folder now has three &lt;em&gt;.md&lt;/em&gt; files, each with consistent structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fechj874fmowdybg6qwnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fechj874fmowdybg6qwnu.png" alt=" " width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can go wrong&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;RateLimitError&lt;/em&gt; even with the sleep. This happens if you've been running the script multiple times in the same minute. The rate limit is per-minute across all your requests, not just this script. Fix: increase &lt;em&gt;time.sleep(22)&lt;/em&gt; to &lt;em&gt;time.sleep(30)&lt;/em&gt; if you're hitting it consistently.&lt;/p&gt;

&lt;p&gt;The model ignores your structure prompt. This happens more with &lt;em&gt;gpt-3.5-turbo&lt;/em&gt; than &lt;em&gt;gpt-4o.&lt;/em&gt; If you switch models to reduce cost, test with 5 keywords first and inspect the output structure before running it on your full list.&lt;/p&gt;
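&lt;p&gt;One cheap way to run that 5-keyword test is a structural sanity check instead of eyeballing every file (a sketch; the section names assume the prompt template used in this script):&lt;/p&gt;

```python
REQUIRED_SECTIONS = ("## Introduction", "## Conclusion")

def has_required_structure(markdown: str) -> bool:
    # True only if every required H2 heading appears in the generated post.
    return all(section in markdown for section in REQUIRED_SECTIONS)
```

&lt;p&gt;Run it over the 5 sample outputs; if any come back &lt;em&gt;False&lt;/em&gt;, the cheaper model isn't holding the structure.&lt;/p&gt;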

&lt;p&gt;You get back an empty string. Rare, but it happens. The &lt;em&gt;generate_post&lt;/em&gt; function returns the raw content string — add a check after the call: &lt;em&gt;if not content.strip(): raise ValueError(...)&lt;/em&gt; to catch and flag empty responses before they get written to disk as empty files.&lt;/p&gt;
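&lt;p&gt;That check can live in a tiny guard function (a hypothetical helper, not in the 47-line script):&lt;/p&gt;

```python
def require_content(content: str, keyword: str) -> str:
    # Fail loudly instead of silently writing an empty .md file to disk.
    if not content or not content.strip():
        raise ValueError(f"Empty response for keyword: {keyword!r}")
    return content
```

&lt;p&gt;Wrap the call site as &lt;em&gt;content = require_content(generate_post(keyword), keyword)&lt;/em&gt; and empty responses stop the run instead of polluting your output folder.&lt;/p&gt;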

&lt;p&gt;Special characters in keywords break the filename. The &lt;em&gt;slugify&lt;/em&gt; function handles spaces and forward slashes, but if your keywords have apostrophes, colons, or question marks, you'll get OS-level errors. Add &lt;em&gt;.replace("'", "").replace(":", "").replace("?", "")&lt;/em&gt; to the slugify function if your keyword list is user-generated.&lt;/p&gt;
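&lt;p&gt;If you'd rather not keep chaining &lt;em&gt;.replace()&lt;/em&gt; calls, a regex-based version handles all punctuation at once (a drop-in sketch for the original &lt;em&gt;slugify&lt;/em&gt;):&lt;/p&gt;

```python
import re

def safe_slugify(keyword: str) -> str:
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then trim stray hyphens from the ends.
    return re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")
```

&lt;p&gt;This turns "What's Python?" into "what-s-python" with no OS-level surprises.&lt;/p&gt;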

&lt;p&gt;&lt;strong&gt;What I'd add next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This script is intentionally minimal — 47 lines, no external dependencies beyond &lt;em&gt;openai.&lt;/em&gt; But if you're running this in production for more than a few hundred keywords, here's what breaks next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost tracking. Add a token counter so you know what each run costs before you've spent $40 without noticing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quality validation. A second API call that checks whether the output meets a minimum quality bar (does it have all the required sections? Is it close to the target word count?). This sounds expensive, but catching bad outputs early is cheaper than rewriting them manually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Concurrent requests. The serial approach (one keyword at a time, sleep 22 seconds) is slow. With &lt;em&gt;asyncio&lt;/em&gt; and a proper rate limiter, you can process 3 keywords simultaneously and cut wall-clock time by ~60%.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
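&lt;p&gt;The cost-tracking idea is mostly arithmetic on the &lt;em&gt;usage&lt;/em&gt; field the API returns with each response (a sketch; the per-million-token rates below are placeholders, so check current pricing before trusting the number):&lt;/p&gt;

```python
def estimate_cost(usage_records, input_rate=2.50, output_rate=10.00):
    # usage_records: (prompt_tokens, completion_tokens) pairs, e.g. collected
    # from response.usage after each call. Rates are USD per 1M tokens
    # (placeholder values, not official pricing).
    total = 0.0
    for prompt_tokens, completion_tokens in usage_records:
        total += prompt_tokens / 1_000_000 * input_rate
        total += completion_tokens / 1_000_000 * output_rate
    return total
```

&lt;p&gt;Log the pairs during the run and print the estimate at the end, so a surprise bill becomes a surprise log line instead.&lt;/p&gt;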

&lt;p&gt;I'll cover the async version in the next post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The full script on GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The complete source, with a &lt;em&gt;requirements.txt&lt;/em&gt; and a sample &lt;em&gt;keywords.txt&lt;/em&gt; to test with, is at:&lt;br&gt;
&lt;a href="//github.com/your-username/keyword-pipeline"&gt;&lt;/a&gt;&lt;br&gt;
Drop a star if it saved you time.&lt;/p&gt;

&lt;p&gt;Have you hit rate limits running keyword pipelines at scale? What's your retry strategy — and are you running requests serially or concurrently? Curious what others have landed on.&lt;/p&gt;

</description>
      <category>python</category>
      <category>automation</category>
      <category>ai</category>
      <category>contentwriting</category>
    </item>
  </channel>
</rss>
