<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Talal Bazerbachi</title>
    <description>The latest articles on Forem by Talal Bazerbachi (@talalbazerbachi).</description>
    <link>https://forem.com/talalbazerbachi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867433%2F13d4627a-4e65-4902-a241-6200dc628a67.jpg</url>
      <title>Forem: Talal Bazerbachi</title>
      <link>https://forem.com/talalbazerbachi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/talalbazerbachi"/>
    <language>en</language>
    <item>
      <title>From Timeouts to Savings: How we optimized 24-page PDF parsing with Gemini &amp; OpenRouter</title>
      <dc:creator>Talal Bazerbachi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 09:43:27 +0000</pubDate>
      <link>https://forem.com/talalbazerbachi/from-timeouts-to-savings-how-we-optimized-24-page-pdf-parsing-with-gemini-openrouter-271e</link>
      <guid>https://forem.com/talalbazerbachi/from-timeouts-to-savings-how-we-optimized-24-page-pdf-parsing-with-gemini-openrouter-271e</guid>
      <description>&lt;p&gt;I'm building &lt;a href="https://parsli.co" rel="noopener noreferrer"&gt;Parsli&lt;/a&gt;, a document parser SaaS that's powered by Google Gemini for intelligent document processing. &lt;br&gt;
Recently, a user hit a wall trying to process large, scanned PDFs. Here is the play-by-play of how we moved from a 4-minute timeout to a cost-effective, reliable pipeline.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem: The Single-Pass Failure
&lt;/h2&gt;

&lt;p&gt;Initially, we tried the "one big request" approach.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; Sent a 24-page scanned PDF as a single base64 blob to Gemini 2.5 Pro.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; 4+ minute hangs and serverless timeouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; Moved to a background worker (300s timeout). Still failed.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="crayons-card c-embed"&gt;

  &lt;br&gt;
&lt;strong&gt;Key Lesson:&lt;/strong&gt; If you want reliability, don't send a large multi-page document to the model as a single request.&lt;br&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  The Pivot: Per-Page Chunking
&lt;/h2&gt;

&lt;p&gt;We decided to split the PDF into 24 individual pages and process them in parallel.&lt;/p&gt;
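
&lt;p&gt;A minimal sketch of the fan-out step, assuming the pages have already been split into separate byte strings. The names &lt;code&gt;process_pages&lt;/code&gt; and &lt;code&gt;process_page&lt;/code&gt; are hypothetical; &lt;code&gt;process_page&lt;/code&gt; stands in for whatever per-page model call you make, and this is not Parsli's actual code.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def process_pages(
    pages: Sequence[bytes],
    process_page: Callable[[bytes], str],
    max_workers: int = 8,
) -> list[str]:
    """Run process_page on every page concurrently and keep page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order,
        # even when the underlying calls finish out of order.
        return list(pool.map(process_page, pages))
```

&lt;p&gt;Threads suit this step because each call mostly waits on the network. The &lt;code&gt;max_workers&lt;/code&gt; default of 8 is an arbitrary choice here, so tune it to your provider's rate limits.&lt;/p&gt;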

&lt;h3&gt;
  
  
  1. The Cost Trap (Structured JSON)
&lt;/h3&gt;

&lt;p&gt;Asking for structured JSON per page worked, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$3.12 per document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token bloat:&lt;/strong&gt; 19,000 output tokens for simple JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Solving the "502 Bad Gateway"
&lt;/h3&gt;

&lt;p&gt;We noticed 502 errors on requests that OpenRouter routed through Vertex.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Added provider routing to prefer &lt;strong&gt;Google AI Studio&lt;/strong&gt; over Vertex. The errors vanished.&lt;/li&gt;
&lt;/ul&gt;
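
&lt;p&gt;A sketch of what that routing preference could look like in a request payload. It assumes OpenRouter's &lt;code&gt;provider&lt;/code&gt; routing object with its &lt;code&gt;order&lt;/code&gt; and &lt;code&gt;allow_fallbacks&lt;/code&gt; fields, and it assumes &lt;code&gt;google-ai-studio&lt;/code&gt; is the right provider slug, so check both against the current OpenRouter docs. The helper name &lt;code&gt;prefer_ai_studio&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def prefer_ai_studio(payload: dict) -> dict:
    """Return a copy of a chat payload that tries Google AI Studio first."""
    routed = dict(payload)  # shallow copy so the caller's dict is untouched
    routed["provider"] = {
        # Assumed slug for Google AI Studio; verify against the provider list.
        "order": ["google-ai-studio"],
        # "Prefer", not "require": other providers may still be used as fallback.
        "allow_fallbacks": True,
    }
    return routed
```

&lt;p&gt;Leaving fallbacks enabled matches "prefer" in the fix above. Setting the flag to false would pin requests to one provider, at the cost of failing outright when that provider is unavailable.&lt;/p&gt;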

&lt;h3&gt;
  
  
  3. The "Markdown" Breakthrough
&lt;/h3&gt;

&lt;p&gt;We changed the prompt from "Extract JSON" to "Convert to Markdown."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Output tokens dropped from 12,000 to &lt;strong&gt;300&lt;/strong&gt; per page.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy:&lt;/strong&gt; Verification showed the OCR quality remained high.&lt;/li&gt;
&lt;/ul&gt;
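
&lt;p&gt;To put those numbers in per-document terms, here is the arithmetic for a 24-page file. It assumes both figures (12,000 and 300) are per-page averages. No prices are included, since they vary by model and provider.&lt;/p&gt;

```python
PAGES = 24
JSON_TOKENS_PER_PAGE = 12_000      # output tokens with the "Extract JSON" prompt
MARKDOWN_TOKENS_PER_PAGE = 300     # output tokens with the "Convert to Markdown" prompt

json_total = PAGES * JSON_TOKENS_PER_PAGE          # 288,000 output tokens
markdown_total = PAGES * MARKDOWN_TOKENS_PER_PAGE  # 7,200 output tokens

# Fraction of output tokens saved by switching prompts (about 0.975).
reduction = 1 - markdown_total / json_total

print(json_total, markdown_total, round(reduction * 100, 1))
```

&lt;p&gt;Under that assumption the Markdown prompt produces 40 times fewer output tokens per document.&lt;/p&gt;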

&lt;h2&gt;
  
  
  Consolidating the Results
&lt;/h2&gt;

&lt;p&gt;We tried using Gemini Flash to merge the 24 pages back into a single JSON object. It could not handle the volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current Production Solution:&lt;/strong&gt; &lt;br&gt;
For "extract everything" requests, we now skip consolidation. The concatenated per-page Markdown &lt;em&gt;is&lt;/em&gt; the output. It preserves layout and tables perfectly without the LLM overhead.&lt;/p&gt;
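
&lt;p&gt;A sketch of that concatenation step. The function name &lt;code&gt;concat_pages&lt;/code&gt; and the horizontal-rule separator between pages are assumptions for illustration, not necessarily what we ship.&lt;/p&gt;

```python
PAGE_SEPARATOR = "\n\n---\n\n"  # assumed: a Markdown horizontal rule between pages


def concat_pages(markdown_pages: list[str]) -> str:
    """Join per-page Markdown in page order with a visible page break."""
    # Trim only surrounding newlines so indentation inside a page is preserved.
    trimmed = [page.strip("\n") for page in markdown_pages]
    return PAGE_SEPARATOR.join(trimmed)
```

&lt;p&gt;Because this is a plain string join, it adds no tokens, no latency, and no second model call.&lt;/p&gt;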
&lt;h2&gt;
  
  
  Research &amp;amp; What's Next
&lt;/h2&gt;

&lt;p&gt;While our current stack uses Gemini, our research suggests other models might be faster, cheaper, or more accurate. These are the benchmarks we are watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HunyuanOCR (0.9B):&lt;/strong&gt; Reportedly beats Gemini Pro at OCR fidelity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PaddleOCR-VL:&lt;/strong&gt; Claims 253% faster throughput than competitors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral OCR 3:&lt;/strong&gt; Competitive pricing on Vertex AI ($1-2/1k pages).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Future Routing Strategy
&lt;/h3&gt;

&lt;p&gt;We are testing a routing logic based on input size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&amp;lt;10K tokens:&lt;/strong&gt; GPT-4o Nano&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;gt;10K tokens:&lt;/strong&gt; Claude Haiku &lt;/li&gt;
&lt;/ul&gt;
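
&lt;p&gt;A sketch of that size-based rule. The strings are the labels used in the list above, not API model identifiers. Sending an input of exactly 10K tokens to the smaller bucket is an assumption, since the rule above leaves that boundary unspecified.&lt;/p&gt;

```python
THRESHOLD_TOKENS = 10_000


def choose_model(input_tokens: int) -> str:
    """Pick a model label from the estimated input size in tokens."""
    if input_tokens > THRESHOLD_TOKENS:
        return "Claude Haiku"
    # Assumption: exactly 10K tokens falls into the smaller bucket.
    return "GPT-4o Nano"
```

&lt;p&gt;For example, &lt;code&gt;choose_model(2_500)&lt;/code&gt; returns the smaller-input label and &lt;code&gt;choose_model(40_000)&lt;/code&gt; returns the larger-input one.&lt;/p&gt;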



&lt;p&gt;Are you building document parsers? I'd love to hear how you handle large file timeouts in the comments!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://parsli.co" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Check out Parsli.co&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>llm</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
