<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Steven Tsao</title>
    <description>The latest articles on Forem by Steven Tsao (@steventsao).</description>
    <link>https://forem.com/steventsao</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F441072%2Fa72a0047-8c1f-4382-8c75-731bbaefe1d6.png</url>
      <title>Forem: Steven Tsao</title>
      <link>https://forem.com/steventsao</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/steventsao"/>
    <language>en</language>
    <item>
      <title>Your AI Agent Shouldn't Parse PDFs. Delegate to a Subagent.</title>
      <dc:creator>Steven Tsao</dc:creator>
      <pubDate>Wed, 04 Mar 2026 23:46:26 +0000</pubDate>
      <link>https://forem.com/steventsao/pdf-sdk-for-ai-agents-b13</link>
      <guid>https://forem.com/steventsao/pdf-sdk-for-ai-agents-b13</guid>
      <description>&lt;p&gt;&lt;em&gt;Written with Claude&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I started by building an AI sandbox for financial reports.&lt;/p&gt;

&lt;p&gt;Upload a PDF, extract text and tables, run analysis, let an agent answer questions. Simple enough — until I looked at where the data was going.&lt;/p&gt;

&lt;p&gt;A typical pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload PDF to blob storage&lt;/li&gt;
&lt;li&gt;Send to a parser service&lt;/li&gt;
&lt;li&gt;Store extracted text somewhere else&lt;/li&gt;
&lt;li&gt;Render page images in another service&lt;/li&gt;
&lt;li&gt;Run inference in yet another&lt;/li&gt;
&lt;li&gt;Save chat results in a sixth place&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Five iterations later, I had 5x more Markdown and image artifacts scattered across subprocessors, with no reliable way to trace them back to a single document. For financial PDFs, that's a compliance problem.&lt;/p&gt;

&lt;p&gt;So I colocated everything.&lt;/p&gt;




&lt;h2&gt;One document, one URL&lt;/h2&gt;

&lt;p&gt;When you upload a PDF to OkraPDF, you get a single base URL. Everything lives under it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.okrapdf.com/document/{docId}/
  /chat/completions   ← OpenAI-compatible query endpoint
  /status              ← Processing state
  /pages               ← Page images
  /nodes               ← Extracted entities
  /export              ← Markdown, Excel, DOCX
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
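
&lt;p&gt;As a rough sketch, the layout above can be captured in a small TypeScript helper. The base URL and sub-resource paths come from the listing; the names &lt;code&gt;documentUrl&lt;/code&gt; and &lt;code&gt;allDocumentUrls&lt;/code&gt; are hypothetical and not part of any OkraPDF SDK.&lt;/p&gt;

```typescript
// Base URL taken from the layout above; helper names are hypothetical.
const OKRA_BASE = "https://api.okrapdf.com";

// Sub-resources listed in the URL layout.
const RESOURCES = ["chat/completions", "status", "pages", "nodes", "export"] as const;
type Resource = (typeof RESOURCES)[number];

// Build the URL for one sub-resource of one document.
function documentUrl(docId: string, resource: Resource): string {
  // encodeURIComponent guards against IDs containing "/" or "?".
  return `${OKRA_BASE}/document/${encodeURIComponent(docId)}/${resource}`;
}

// Every artifact of a document shares the same prefix.
function allDocumentUrls(docId: string): string[] {
  return RESOURCES.map((resource) => documentUrl(docId, resource));
}

console.log(allDocumentUrls("doc-abc123").join("\n"));
```

&lt;p&gt;Since every URL for a document starts with the same prefix, anything derived from that document can be located from its ID alone.&lt;/p&gt;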



&lt;p&gt;Any client that speaks the OpenAI protocol can use this directly — the OpenAI SDK, Vercel AI SDK, LangChain, or curl. Each document &lt;em&gt;is&lt;/em&gt; a model endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload&lt;/span&gt;
&lt;span class="nv"&gt;DOC_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.okrapdf.com/v1/documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OKRA_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@report.pdf"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.documentId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Query&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.okrapdf.com/document/&lt;span class="nv"&gt;$DOC_ID&lt;/span&gt;/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OKRA_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"messages":[{"role":"user","content":"What was net income?"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two HTTP calls. One subprocessor. Your agent doesn't need to know anything about PDFs; it just calls an endpoint.&lt;/p&gt;
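
&lt;p&gt;For an agent written in TypeScript, the query call above might be wrapped roughly like this. The request body mirrors the curl example, and the response parsing assumes the standard OpenAI chat-completions shape (&lt;code&gt;choices[0].message.content&lt;/code&gt;). &lt;code&gt;buildChatRequest&lt;/code&gt; and &lt;code&gt;extractAnswer&lt;/code&gt; are hypothetical helper names, not OkraPDF APIs.&lt;/p&gt;

```typescript
// Only the response fields this sketch reads (OpenAI chat-completions shape, assumed).
type ChatCompletion = { choices: { message: { content: string } }[] };

// Build the POST request for one question against one document.
function buildChatRequest(docId: string, question: string, apiKey: string) {
  return {
    url: `https://api.okrapdf.com/document/${encodeURIComponent(docId)}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: question }] }),
  };
}

// Pull the answer text out of a parsed response, failing loudly on an unexpected shape.
function extractAnswer(payload: unknown): string {
  const first = (payload as ChatCompletion)?.choices?.[0];
  if (first === undefined || typeof first.message?.content !== "string") {
    throw new Error("Unexpected response shape: no choices[0].message.content");
  }
  return first.message.content;
}

// Usage inside an agent tool (network call left commented out):
// const request = buildChatRequest("doc-abc123", "What was net income?", process.env.OKRA_API_KEY ?? "");
// const response = await fetch(request.url, request);
// const answer = extractAnswer(await response.json());
```

&lt;p&gt;Keeping request building and response parsing as pure functions lets the agent's tool be tested without touching the network.&lt;/p&gt;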




&lt;h2&gt;The surprise: colocation made it faster&lt;/h2&gt;

&lt;p&gt;I moved the pipeline into an edge runtime and colocated storage, parsing, rendering, and inference coordination into one place. The goal was privacy — fewer services touching sensitive documents.&lt;/p&gt;

&lt;p&gt;The surprise was performance. Because core logic runs through bindings instead of network hops between services, everything got noticeably faster. Page image rendering happens in the same runtime as the source PDF. Chat completions read from a colocated database, not a remote Postgres.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Colocate more = better isolation + better performance.&lt;/strong&gt; That's the architectural bet.&lt;/p&gt;




&lt;h2&gt;Per-document control&lt;/h2&gt;

&lt;p&gt;Real-world PDFs are messy. A single global config doesn't work when one file is a clean digital report and the next is a scanned 1990s filing.&lt;/p&gt;

&lt;p&gt;With OkraPDF, config is per-document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parsing strategy&lt;/strong&gt; — choose based on PDF complexity, change it later without re-uploading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor selection&lt;/strong&gt; — pick the AI vendor per document. Need a BAA vendor for medical records? Use it for those docs only. That's the only vendor that sees those bytes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-document chat&lt;/strong&gt; — query any document directly without app-level routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge previews&lt;/strong&gt; — page images rendered in the same binding as the source PDF&lt;/li&gt;
&lt;/ul&gt;
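
&lt;p&gt;To make the idea of per-document config concrete, here is a purely illustrative sketch. OkraPDF's actual config schema is not shown in this post, so every field name, value, and document ID below is an assumption; the point is only the pattern of defaults plus per-document overrides.&lt;/p&gt;

```typescript
// Hypothetical per-document settings. Field names and values are illustrative only.
type DocumentConfig = {
  parsingStrategy: "digital" | "ocr";
  vendor: string;
};

// Same fields, all optional, for documents that deviate from the default.
type DocumentOverride = {
  parsingStrategy?: "digital" | "ocr";
  vendor?: string;
};

const DEFAULT_CONFIG: DocumentConfig = { parsingStrategy: "digital", vendor: "default-vendor" };

// Overrides keyed by document ID (the IDs here are made up).
const overrides: { [docId: string]: DocumentOverride } = {
  "doc-scanned-filing": { parsingStrategy: "ocr" },
  "doc-medical-record": { vendor: "baa-vendor" },
};

// Resolve the effective config for one document: defaults first, then its overrides.
function configFor(docId: string): DocumentConfig {
  return { ...DEFAULT_CONFIG, ...(overrides[docId] ?? {}) };
}

console.log(configFor("doc-medical-record"));
```

&lt;p&gt;In this sketch only the medical record is routed to the BAA vendor; every other document keeps the default.&lt;/p&gt;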

&lt;p&gt;And because everything is colocated under one document ID, deletion is atomic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE /document/{docId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PDF, derived Markdown, preview images, chat history: all gone in one call. No orphaned artifacts across six services.&lt;/p&gt;




&lt;h2&gt;Real numbers&lt;/h2&gt;

&lt;p&gt;We ran the full &lt;a href="https://huggingface.co/datasets/PatronusAI/financebench" rel="noopener noreferrer"&gt;FinanceBench&lt;/a&gt; evaluation — 129 questions across 10 SEC filings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pass rate&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;86.8%&lt;/strong&gt; (112/129)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per question&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.009&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total eval cost&lt;/td&gt;
&lt;td&gt;$1.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Questions range from metric extraction (&lt;em&gt;"What is Amazon's FY2019 net income?"&lt;/em&gt;) to multi-step reasoning (&lt;em&gt;"What is AMD's quick ratio and what does it imply about their liquidity?"&lt;/em&gt;). Sub-cent per question across dense 200-page filings.&lt;/p&gt;
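
&lt;p&gt;The headline figures follow directly from the raw counts in the table. A quick check, using only the numbers reported above:&lt;/p&gt;

```typescript
// Figures reported in the table above.
const passed = 112;
const total = 129;
const totalCostUsd = 1.15;

// Pass rate as a percentage, rounded to one decimal place.
const passRatePct = Math.round((passed / total) * 1000) / 10;

// Average cost per question in USD, rounded to three decimal places.
const costPerQuestionUsd = Math.round((totalCostUsd / total) * 1000) / 1000;

console.log(`${passRatePct}% pass rate, $${costPerQuestionUsd} per question`);
// prints "86.8% pass rate, $0.009 per question"
```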

&lt;p&gt;Live demo: &lt;a href="https://okrapdf.com/demo/financebench" rel="noopener noreferrer"&gt;okrapdf.com/demo/financebench&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Use your existing SDK&lt;/h2&gt;

&lt;p&gt;You don't need a new client. Each document exposes a standard OpenAI-compatible endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vercel AI SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createOkra&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@okrapdf/ai-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;okra&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createOkra&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OKRA_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;okra&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;doc-abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What is the revenue?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OpenAI SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OKRA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.okrapdf.com/document/doc-abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What was net income?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with LangChain, CrewAI, or any agent framework that speaks the OpenAI protocol. Your agent calls it like any other model — because it is one.&lt;/p&gt;




&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;okrapdf
npx okra upload report.pdf
npx okra chat &lt;span class="s2"&gt;"What was net income in FY2023?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/okrapdf/okrapdf-sdk" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.okrapdf.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://okrapdf.com/demo/financebench" rel="noopener noreferrer"&gt;FinanceBench demo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>pdf</category>
      <category>webdev</category>
      <category>ai</category>
      <category>subagent</category>
    </item>
  </channel>
</rss>
