<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sarah Guthals, PhD</title>
    <description>The latest articles on Forem by Sarah Guthals, PhD (@drguthals).</description>
    <link>https://forem.com/drguthals</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg</url>
      <title>Forem: Sarah Guthals, PhD</title>
      <link>https://forem.com/drguthals</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/drguthals"/>
    <language>en</language>
    <item>
      <title>Underdocumented issues are the worst, especially when it's more about configuration because it's often under-error-messaged too *tear*</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 04 Dec 2025 00:25:57 +0000</pubDate>
      <link>https://forem.com/drguthals/underdocumented-issues-are-the-worst-especially-when-its-more-about-configuration-because-its-1ink</link>
      <guid>https://forem.com/drguthals/underdocumented-issues-are-the-worst-especially-when-its-more-about-configuration-because-its-1ink</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/blackgirlbytes/how-to-query-a-railway-sqlite-database-from-github-actions-376j"&gt;How to Query a Railway SQLite Database from GitHub Actions&lt;/a&gt; by Rizèl Scarlett ・ Dec 3 '25 ・ 4 min read&lt;/p&gt;</description>
      <category>devops</category>
      <category>githubactions</category>
      <category>automation</category>
    </item>
    <item>
      <title>It's time to start measuring accuracy of data extraction with downstream systems and usability in mind, not just vanity metrics for a marketing slide</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 05 Nov 2025 17:59:55 +0000</pubDate>
      <link>https://forem.com/drguthals/its-time-to-start-measuring-accuracy-of-data-extraction-with-downstream-systems-and-usability-in-1a2m</link>
      <guid>https://forem.com/drguthals/its-time-to-start-measuring-accuracy-of-data-extraction-with-downstream-systems-and-usability-in-1a2m</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/tensorlake" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" alt="Tensorlake" width="128" height="128"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="" width="460" height="460"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Benchmarking the Most Reliable Document Parsing API&lt;/h2&gt;
      &lt;h3&gt;Sarah Guthals, PhD for Tensorlake ・ Nov 5&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Benchmarking the Most Reliable Document Parsing API</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 05 Nov 2025 17:37:05 +0000</pubDate>
      <link>https://forem.com/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln</link>
      <guid>https://forem.com/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln</guid>
      <description>&lt;p&gt;Document parsing is the foundation of enterprise AI applications. Whether you're building RAG pipelines, automating insurance claims, or extracting data from financial reports, everything starts with one question: &lt;strong&gt;Can you consistently transform messy, real-world documents into structured, machine-readable data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our customers need the best document ingestion API for their use cases. They're comparing Azure, AWS Textract, and popular open-source models like Docling and Marker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We built a benchmark that measures what matters: Can downstream systems actually use this output?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Tensorlake both reads documents and extracts structured data, so when deciding how to measure accuracy, we wanted to cover both document parsing (with structural preservation) and structured extraction (for downstream usability).&lt;/p&gt;

&lt;p&gt;The aspects of Document Parsing that we wanted to measure were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tables:&lt;/strong&gt; Parsing complex tables with merged cells and multi-row headers, and measuring accuracy on them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading Order:&lt;/strong&gt; In multi-column documents and documents with complex layouts, we measure whether reading order is preserved during parsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Extraction Accuracy:&lt;/strong&gt; Measuring direct downstream usability of extracted data. A small OCR error in a single table cell can cause the downstream task to fail even when overall OCR accuracy on the document is high.&lt;/li&gt;
&lt;li&gt;Extraction of footnotes, formulas, figures, and other non-textual content.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Our Evaluation Methodology
&lt;/h2&gt;

&lt;p&gt;We employ two metrics that better capture these aspects and reflect real-world reliability:&lt;/p&gt;

&lt;h3&gt;
  
  
  TEDS (Tree Edit Distance Similarity)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compares predicted and ground-truth Markdown/HTML tree structures&lt;/li&gt;
&lt;li&gt;Captures structural fidelity in tables and complex layouts&lt;/li&gt;
&lt;li&gt;Widely adopted in OCRBench v2 and OmniDocBench evaluations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Measures whether the document's logical structure and textual alignment remains intact&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TEDS answers: "Is this table still a table?" Not just "Is the text similar?"&lt;/p&gt;
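
&lt;p&gt;To make the intuition concrete, here is a minimal, hypothetical sketch of a TEDS-style score on toy table trees. It assumes the third-party &lt;code&gt;zss&lt;/code&gt; package (a Zhang-Shasha tree edit distance implementation); the metric used in OCRBench v2 and OmniDocBench also weighs cell-text similarity, so treat this as an illustration of the idea, not the benchmark's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy TEDS-style similarity: 1 - edit_distance / max(tree sizes).
# Assumes `pip install zss`; the real TEDS also compares cell text.
from zss import Node, simple_distance

def tree_size(node):
    return 1 + sum(tree_size(c) for c in node.children)

def teds(pred, gold):
    dist = simple_distance(pred, gold)  # unit-cost tree edit distance
    return 1.0 - dist / max(tree_size(pred), tree_size(gold))

# Ground truth: a 2x2 table. Prediction: the same table missing one cell.
gold = Node("table").addkid(
    Node("tr").addkid(Node("td:A")).addkid(Node("td:B"))).addkid(
    Node("tr").addkid(Node("td:C")).addkid(Node("td:D")))
pred = Node("table").addkid(
    Node("tr").addkid(Node("td:A")).addkid(Node("td:B"))).addkid(
    Node("tr").addkid(Node("td:C")))

print(f"TEDS ~= {teds(pred, gold):.2f}")  # below 1.0: structure partially lost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;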

&lt;h3&gt;
  
  
  JSON F1 (Field-Level Precision and Recall)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compares extracted JSON against schema-based ground truth&lt;/li&gt;
&lt;li&gt;Precision measures correctness of extracted fields&lt;/li&gt;
&lt;li&gt;Recall measures completeness of required field capture&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;F1 score balances both for overall reliability assessment&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;JSON F1 answers: "Can downstream automation actually use this data?" Not just "Is some text present?"&lt;/p&gt;
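
&lt;p&gt;As a minimal sketch of the idea (the field names and values below are illustrative, not our evaluation harness), field-level F1 over a flat JSON record can be computed like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal field-level JSON F1 sketch: a predicted (field, value) pair counts
# as correct only if it exactly matches the ground truth. Illustrative only.
def json_f1(pred: dict, gold: dict) -&gt; float:
    correct = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {"invoice_number": "INV-1042", "total": "1,250.00", "currency": "USD"}
pred = {"invoice_number": "INV-1042", "total": "1,256.00"}  # one OCR digit off

print(f"F1 = {json_f1(pred, gold):.2f}")  # 0.40: one bad cell hurts a lot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;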

&lt;blockquote&gt;
&lt;p&gt;Together, these metrics answer the essential question: &lt;strong&gt;"Can downstream systems use this output?"&lt;/strong&gt; rather than simply "Is the text similar?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Document Reading Ability (OCR and Structural Preservation)&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Each parsing model generates Markdown/HTML output. We evaluate using TEDS to measure how well structure is preserved: reading order, table integrity, and layout coherence. You can find our &lt;a href="https://tlake.link/benchmark-dataset" rel="noopener noreferrer"&gt;updated dataset published here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We use the public OCRBench v2 and OmniDocBench datasets. However, upon review, we identified inconsistencies in the published ground truth of OCRBench v2. We conducted a comprehensive audit and correction to ensure evaluation accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Structured Extraction Accuracy (Downstream Usability)&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;We pass the Markdown through a standardized LLM (GPT-4o) with predefined JSON schemas, measuring JSON F1. This isolates how OCR quality impacts real extraction workflows, where an LLM interprets the parsed text.&lt;/p&gt;

&lt;p&gt;Initial JSON schemas and reference answers are generated using Gemini Pro 2.5, then human reviewers audit and correct them to ensure high-quality gold standards.&lt;/p&gt;

&lt;p&gt;This methodology ensures fair, reproducible comparisons by varying only the OCR models (Stage 1) while keeping the extraction model constant (Stage 2).&lt;/p&gt;
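
&lt;p&gt;Here is a hedged sketch of what Stage 2 looks like in practice. Our actual harness, prompts, and schemas differ; the schema below is a made-up example, and the call uses OpenAI's structured-output support for Chat Completions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical Stage-2 sketch: feed parsed Markdown to a fixed LLM (GPT-4o)
# with a predefined JSON schema. The schema and prompt here are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

schema = {
    "name": "invoice_fields",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "string"},
        },
        "required": ["invoice_number", "total"],
        "additionalProperties": False,
    },
}

def extract(markdown: str) -&gt; dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract the requested fields from the document."},
            {"role": "user", "content": markdown},
        ],
        response_format={"type": "json_schema", "json_schema": schema},
    )
    return json.loads(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;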

&lt;h2&gt;
  
  
  The Results: Public Dataset Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Document Parsing Performance
&lt;/h3&gt;

&lt;p&gt;We evaluated leading open-source and proprietary models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2iklmdxawivl0t9hebc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2iklmdxawivl0t9hebc.png" alt="Table showing table parsing accuracy on OmniDocBench dataset. Five models compared with TEDS scores and TEDS-Structure only scores: Docling (63.84%, 77.68%), Marker (57.88%, 71.17%), Azure (78.14%, 83.61%), Textract (80.75%, 88.78%), and Tensorlake highlighted in green (86.79%, 90.62%). Tensorlake achieves the highest scores in both categories." width="800" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensorlake achieves the highest TEDS score, indicating superior structural preservation&lt;/li&gt;
&lt;li&gt;The gap between Docling and production-grade systems is substantial&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Table Parsing Performance
&lt;/h3&gt;

&lt;p&gt;We evaluated Tensorlake’s table parsing accuracy using the OmniDocBench dataset — a CVPR-accepted benchmark for comprehensive document understanding tasks (&lt;a href="https://github.com/opendatalab/OmniDocBench" rel="noopener noreferrer"&gt;GitHub link&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Table accuracy in OmniDocBench is quantified using a combination of tree-based and string-based metrics. In particular, we measured TEDS (Tree Edit Distance Similarity), which assesses both the structural and textual alignment between predicted and ground-truth HTML tables.&lt;/p&gt;

&lt;p&gt;To reproduce our results, generate Markdown outputs using the models listed below, then run the evaluation method provided in the OmniDocBench repository. We used 512 document images containing tables and v1.5 of the evaluation code. Evaluation outputs are released on Hugging Face (&lt;a href="https://huggingface.co/datasets/tensorlake/OmniDocBench-eval-outputs" rel="noopener noreferrer"&gt;link&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6byj9xutcea6ndczncau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6byj9xutcea6ndczncau.png" alt="Bar chart showing Table Parsing Task performance on OmniDocBench dataset measured by TEDS (Tree Edit Distance Similarity) score, where higher is better. Five models compared from left to right: Marker (57.88%), Docling (63.84%), Azure (78.14%), Textract (80.75%), and Tensorlake highlighted in green (86.79%). Tensorlake achieves the highest TEDS score, outperforming the next best competitor (Textract) by approximately 6 percentage points and leading open-source alternatives by over 20 percentage points." width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;¹ &lt;em&gt;Marker's number is taken from the officially published OmniDocBench repository.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On OmniDocBench's challenging tables, Tensorlake leads with 86.79% TEDS&lt;/li&gt;
&lt;li&gt;Open-source solutions struggle with table extraction (sub-70% TEDS)&lt;/li&gt;
&lt;li&gt;Tensorlake maintains table structure even on complex, multi-page tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance on Real World Enterprise Documents
&lt;/h2&gt;

&lt;p&gt;OCR models are rarely trained on enterprise documents, because such documents are not publicly available. We wanted to test how well our model and others perform on these documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Document Performance (100 pages)
&lt;/h3&gt;

&lt;p&gt;We curated 100 document pages spanning banking, retail, and insurance sectors. This represents real production workloads: invoices with water damage, scanned contracts with skewed text, bank statements with multi-level tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoytm5zaibro9wadqpd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoytm5zaibro9wadqpd8.png" alt="Bar chart showing Enterprise Document JSON Accuracy F1 scores. Six models compared: Docling (68.90%), Marker (83.30%), Azure (88.10%), Textract (88.40%), Gemini (89.00%), and Tensorlake highlighted in green (91.70%). Tensorlake achieves the highest accuracy, with approximately 5 more correctly extracted fields per 20 documents compared to the next best competitor." width="800" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensorlake achieves 91.7% F1 with standard extraction, beating all competitors&lt;/li&gt;
&lt;li&gt;The difference between 91.7% and 68.9% F1 is massive: it’s &lt;strong&gt;5 extra&lt;/strong&gt; fields correctly extracted out of every 20&lt;/li&gt;
&lt;li&gt;In production workflows processing thousands of documents daily, this accuracy gap compounds into significant error reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But even among the higher F1 scores, when parsing a standard form Azure and Textract jumble the reading order and skip data entirely, whereas Tensorlake preserves the complex reading order and groups data correctly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyusjypydmgusuwc2ivr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyusjypydmgusuwc2ivr.png" alt="Comparison showing how different document parsing APIs handle a contract notice section. Original document at top shows buyer and seller information with addresses, phone numbers, and email addresses. Three parsed outputs below demonstrate failures: Textract (labeled with coral background) shows jumbled addresses and missing buyer information; Azure (labeled with blue background) shows jumbled addresses and missing parenthesis; Tensorlake (labeled with green background) preserves complex reading order with no missing data and accurate information. Key differences highlighted: competitors lose structure and omit critical fields, while Tensorlake maintains logical reading order and captures all information correctly." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Delivering the Best Performance/Price Ratio
&lt;/h2&gt;

&lt;p&gt;Accuracy without affordability isn't practical. Here's how Tensorlake compares to other Document Ingestion APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.tensorlake.ai/pricing" rel="noopener noreferrer"&gt;Tensorlake&lt;/a&gt;: $10 per 1k pages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: &lt;strong&gt;86.79&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;F1 Score: &lt;strong&gt;91.7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/" rel="noopener noreferrer"&gt;Azure&lt;/a&gt;: $10 per 1k pages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: 78.14&lt;/li&gt;
&lt;li&gt;F1 Score: 88.1&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/textract/pricing/" rel="noopener noreferrer"&gt;AWS Textract&lt;/a&gt;: $15 per 1k pages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: 80.75&lt;/li&gt;
&lt;li&gt;F1 Score: 88.4&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tensorlake delivers higher accuracy than both Azure and AWS Textract, matching Azure's cost, while AWS Textract is 50% more expensive.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Take the Next Step
&lt;/h2&gt;

&lt;p&gt;When your business depends on accurate document processing, you can't afford to use anything less.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tlake.link/cloud" rel="noopener noreferrer"&gt;Try Tensorlake free&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Want to discuss your specific use case?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://tlake.link/chat" rel="noopener noreferrer"&gt;Schedule a technical demo&lt;/a&gt; with our team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions about the benchmark?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tlake.link/slack" rel="noopener noreferrer"&gt;Join our Slack community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>api</category>
      <category>rag</category>
      <category>ai</category>
      <category>performance</category>
    </item>
    <item>
      <title>Process documents with hundreds of pages with no issues. In this example, I extracted crypto holdings from 200+ page SEC filings by first classifying pages using VLM support and then extracting relevant information only from those pages.</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 16 Oct 2025 20:54:21 +0000</pubDate>
      <link>https://forem.com/drguthals/process-documents-with-hundreds-of-pages-with-no-issues-in-this-example-i-extracted-crypto-35le</link>
      <guid>https://forem.com/drguthals/process-documents-with-hundreds-of-pages-with-no-issues-in-this-example-i-extracted-crypto-35le</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/tensorlake" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" alt="Tensorlake" width="128" height="128"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="" width="460" height="460"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/tensorlake/new-vision-language-models-for-document-processing-3fdm" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;New: Vision Language Models for Document Processing&lt;/h2&gt;
      &lt;h3&gt;Sarah Guthals, PhD for Tensorlake ・ Oct 16&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>New: Vision Language Models for Document Processing</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 16 Oct 2025 20:52:08 +0000</pubDate>
      <link>https://forem.com/tensorlake/new-vision-language-models-for-document-processing-3fdm</link>
      <guid>https://forem.com/tensorlake/new-vision-language-models-for-document-processing-3fdm</guid>
      <description>&lt;p&gt;We've expanded our use of Vision Language Models (VLMs) across multiple DocumentAI features for faster and more accurate document processing on documents with hundreds of pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Page Classification&lt;/strong&gt;: Identify relevant pages in large documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Figure and Table Summarization&lt;/strong&gt;: Extract insights from visual elements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Extraction (with &lt;code&gt;skip_ocr&lt;/code&gt;)&lt;/strong&gt;: Direct visual understanding for more accurate extraction on harder-to-parse documents (e.g., scanned documents, engineering diagrams, or documents with complex reading order)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a demonstration, this post focuses on our enhanced page classification capabilities. With VLM support, you can quickly process large documents by identifying and extracting from only the relevant pages.&lt;/p&gt;

&lt;p&gt;Try it in this &lt;a href="https://tlake.link/notebooks/vlm-parsing" rel="noopener noreferrer"&gt;Colab Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Improvements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scale &amp;amp; Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handle Large Documents&lt;/strong&gt;: Classify documents with hundreds of pages without performance degradation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VLM-Powered Classification&lt;/strong&gt;: Replaced OCR with Vision Language Models for faster, more accurate classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selective Processing&lt;/strong&gt;: Only parse pages that matter, reducing processing time and costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classify First&lt;/strong&gt;: Use the &lt;code&gt;classify&lt;/code&gt; endpoint to identify relevant pages based on your criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parse Selectively&lt;/strong&gt;: Set &lt;code&gt;page_range&lt;/code&gt; to only process the classified relevant pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Efficiently&lt;/strong&gt;: Apply structured extraction only to pages containing the information you need&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Use Case Example: SEC Filings Analysis
&lt;/h2&gt;

&lt;p&gt;This approach is particularly powerful for extracting specific information from lengthy documents like SEC filings. For example, when analyzing cryptocurrency holdings across multiple companies' 10-K and 10-Q reports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Each filing can be 100-200+ pages, but crypto-related information might only appear on 10-20 pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: First classify pages containing "digital assets holdings", then extract structured data only from those pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt;: 80-90% reduction in processing time and more focused, accurate extractions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.documentai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PageClassConfig&lt;/span&gt;

&lt;span class="n"&gt;doc_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Classify pages
&lt;/span&gt;&lt;span class="n"&gt;page_classifications&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;PageClassConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;digital_assets_holdings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pages showing cryptocurrency holdings on balance sheet...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;parse_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filing_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;page_classifications&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;page_classifications&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Parse only relevant pages
&lt;/span&gt;&lt;span class="n"&gt;relevant_pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_classes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_numbers&lt;/span&gt;
&lt;span class="n"&gt;page_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_and_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filing_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;page_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;page_range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt;: Process only what you need&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: Reduce processing time by focusing on relevant content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: VLM classification provides better understanding of page content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Handle large document sets without compromising performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;Check out our &lt;a href="https://tlake.link/notebooks/vlm-parsing" rel="noopener noreferrer"&gt;example notebook&lt;/a&gt; demonstrating how to extract cryptocurrency metrics from SEC filings using the new classification approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Update to the latest version of Tensorlake:&lt;br&gt;
&lt;code&gt;pip install --upgrade tensorlake&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then start classifying, summarizing, and extracting with improved efficiency!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>Precise Data Extraction: Pattern-Based Partitioning for Structured Extraction</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 15 Oct 2025 23:12:33 +0000</pubDate>
      <link>https://forem.com/tensorlake/precise-data-extraction-pattern-based-partitioning-for-structured-extraction-2fi5</link>
      <guid>https://forem.com/tensorlake/precise-data-extraction-pattern-based-partitioning-for-structured-extraction-2fi5</guid>
      <description>&lt;p&gt;Your document extraction pipeline is brittle. Hard-coded page ranges break when layouts shift, full-document parsing burns through tokens on irrelevant content, and template-based extraction fails when target data moves between document versions.&lt;/p&gt;

&lt;p&gt;Tensorlake's pattern-based extraction solves this within your &lt;code&gt;StructuredExtractionOption&lt;/code&gt; workflows. Define start and end patterns to extract only the data sections you need. &lt;/p&gt;

&lt;p&gt;No more parsing noise, no more layout dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Traditional Extraction
&lt;/h2&gt;

&lt;p&gt;Document layout variability breaks positional extraction logic. A financial report might place the "Total Assets" section across pages 3 and 4 in one document and entirely on page 7 in another. Parsing entire documents wastes compute cycles and introduces noise into structured extraction workflows. Fixed page or section boundaries miss target data that spans inconsistent locations across document sets.&lt;/p&gt;

&lt;p&gt;Pattern-based partitioning solves this by decoupling extraction logic from document layout through regex-driven zone targeting.&lt;/p&gt;
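
&lt;p&gt;Conceptually, zone targeting is just slicing parsed text between regex anchors. Here is a minimal, library-free sketch of the idea (illustrative only, not Tensorlake's implementation, which runs server-side during extraction):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Conceptual sketch of pattern-based zone targeting on parsed text.
import re

def extract_zone(text: str, start_pattern: str, end_pattern: str) -&gt; str:
    start = re.search(start_pattern, text)
    if not start:
        return ""  # the zone is absent from this document
    end = re.search(end_pattern, text[start.end():])
    stop = start.end() + end.start() if end else len(text)
    return text[start.start():stop]

statement = """ACCOUNT SUMMARY
Beginning balance  $4,210.00
Ending balance     $3,975.12
DAILY ACCOUNT ACTIVITY
10/01 Coffee shop  -$4.88"""

zone = extract_zone(statement,
                    r"\bACCOUNT\s+SUMMARY\b",
                    r"\bDAILY\s+ACCOUNT\s+ACTIVITY\b")
print(zone)  # only the Account Summary block, wherever it appears
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;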

&lt;h2&gt;
  
  
  Pattern-Based Partitioning: Content-Aware Extraction
&lt;/h2&gt;

&lt;p&gt;Pattern-based partitioning delivers precision targeting through regex pattern recognition within your &lt;code&gt;StructuredExtractionOption&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Precise&lt;/strong&gt;: Use regex patterns like &lt;code&gt;\\bAccount\\s+Summary\\b&lt;/code&gt; to identify exactly where your target data begins or ends, skipping irrelevant content that clutters extraction results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible&lt;/strong&gt;: Define start patterns to begin extraction at specific markers, end patterns to stop at precise terminators, or both to capture data between known markers. Extract only the contract section you need, not the entire 200-page document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Page&lt;/strong&gt;: Extract data that spans pages by focusing on content patterns rather than arbitrary page boundaries or specific section headers. Perfect for financial summaries, property listings, or contract clauses that don't respect document structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API-Native&lt;/strong&gt;: Integrate into existing workflows with simple JSON configuration in your parse endpoint calls, no architectural changes required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation: Four Steps to Pattern-Based Partitioning for Extraction
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify Your Patterns&lt;/strong&gt;&lt;br&gt;
Analyze your documents to find consistent text markers around target data. Look for headers like "Account Summary", "Grand Total:", or "Section 4.2" that reliably indicate extraction zones.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn42335bbnscj1dp68jk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn42335bbnscj1dp68jk.png" alt="TD Bank statement with highlighted sections showing 'ACCOUNT SUMMARY' and 'DAILY ACCOUNT ACTIVITY' headers. Left side shows 'Identify Partitioning Patterns' text with Tensorlake URL, demonstrating pattern-based extraction targeting." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure Extraction&lt;/strong&gt;&lt;br&gt;
Add pattern configuration to your &lt;code&gt;StructuredExtractionOption&lt;/code&gt; (an end-to-end sketch follows this list):&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"partition_strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"start_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;bACCOUNT&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+SUMMARY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;b"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"end_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;bDAILY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+ACCOUNT&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+ACTIVITY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;b"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test &amp;amp; Refine&lt;/strong&gt;&lt;br&gt;
Process sample documents and adjust patterns to capture exactly the data sections you need. Start broad, then narrow your regex patterns for precision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale Deployment&lt;/strong&gt;&lt;br&gt;
Apply consistent extraction rules across thousands of similar documents with confidence. Pattern-based targeting scales linearly: define once, extract consistently.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
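
&lt;p&gt;Putting the steps together, here is a hedged end-to-end sketch. The &lt;code&gt;DocumentAI&lt;/code&gt; client and &lt;code&gt;parse_and_wait&lt;/code&gt; call appear in our other posts; the exact shape of the &lt;code&gt;StructuredExtractionOption&lt;/code&gt; payload below (the schema and &lt;code&gt;partition_strategy&lt;/code&gt; keys) is an assumption that mirrors the JSON above, so check the docs for the authoritative signature:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged end-to-end sketch; the schema and partition_strategy keyword shapes
# are assumptions mirroring the JSON config above. Consult the Tensorlake docs.
from tensorlake.documentai import DocumentAI

doc_ai = DocumentAI()

account_summary_option = {  # hypothetical StructuredExtractionOption payload
    "schema": {
        "type": "object",
        "properties": {
            "beginning_balance": {"type": "string"},
            "ending_balance": {"type": "string"},
        },
    },
    "partition_strategy": {
        "patterns": {
            "start_patterns": [r"\bACCOUNT\s+SUMMARY\b"],
            "end_patterns": [r"\bDAILY\s+ACCOUNT\s+ACTIVITY\b"],
        }
    },
}

result = doc_ai.parse_and_wait(
    file="statement.pdf",
    structured_extraction_options=[account_summary_option],
)
print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;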

&lt;h2&gt;
  
  
  Results: Deterministic Extraction with Content-Aware Targeting
&lt;/h2&gt;

&lt;p&gt;Pattern-based partitioning delivers deterministic extraction performance across variable document layouts. Your extraction logic becomes resilient to structural changes. Financial summaries can migrate from page 3 to page 7, contract clauses can span different sections, and your pipeline continues extracting the correct data zones.&lt;/p&gt;

&lt;p&gt;The architectural benefit: your extraction logic becomes layout-agnostic. Instead of brittle positional dependencies, you define semantic boundaries that scale across document variations. The result: consistent structured outputs from inconsistent document inputs, with extraction accuracy that doesn't degrade as document templates evolve.&lt;/p&gt;

&lt;p&gt;Ready to stop guessing and start targeting your extractions? Try Tensorlake's pattern-based partitioning today:&lt;/p&gt;

&lt;p&gt;-&amp;gt; Documentation: &lt;a href="https://tlake.link/pattern-based-partioning" rel="noopener noreferrer"&gt;Learn more about pattern-based partitioning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;-&amp;gt; Colab Notebook: &lt;a href="https://tlake.link/notebooks/pattern-based-partitioning" rel="noopener noreferrer"&gt;Try the notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;-&amp;gt; Schedule a call and we’ll help you get going: &lt;a href="https://tlake.link/chat" rel="noopener noreferrer"&gt;Book a call now&lt;/a&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>dataengineering</category>
      <category>llm</category>
    </item>
    <item>
      <title>Document engineers/DevRel - how are you using Claude (or other tools) in content creation?</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Fri, 19 Sep 2025 18:59:37 +0000</pubDate>
      <link>https://forem.com/drguthals/document-engineersdevrel-how-are-you-using-claude-or-other-tools-in-content-creation-2i7m</link>
      <guid>https://forem.com/drguthals/document-engineersdevrel-how-are-you-using-claude-or-other-tools-in-content-creation-2i7m</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-story__hidden-navigation-link"&gt;Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2856702" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Sep 19 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" id="article-link-2856702"&gt;
          Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Fri, 19 Sep 2025 18:58:28 +0000</pubDate>
      <link>https://forem.com/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl</link>
      <guid>https://forem.com/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl</guid>
      <description>&lt;p&gt;&lt;em&gt;How to leverage AI as a collaborative tool for creating educational content that actually works&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In my recent post about &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i"&gt;The Mythical Vibe-Month&lt;/a&gt;, I wrote about how "vibe coding" (aka throwing prompts at LLMs and hoping for magic) creates fragile, context-free outputs. But here's what I didn't mention: the same principle applies to content creation.&lt;/p&gt;

&lt;p&gt;Over the past several months, I've been working with Claude to create technical documentation, blog posts, tutorials, and educational materials. Not through vibe prompting, but through what I call &lt;strong&gt;collaborative context engineering&lt;/strong&gt;: treating Claude as a learning partner rather than a magic answer machine.&lt;/p&gt;

&lt;p&gt;The difference? Instead of expecting Claude to be the "sage on the stage" delivering perfect content from thin air, I position &lt;em&gt;myself&lt;/em&gt; as the "guide on the side," directing our collaboration toward better outcomes. This mirrors how effective learning actually works: through dialogue, iteration, and building understanding together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Vibe Prompting for Content
&lt;/h2&gt;

&lt;p&gt;Most people approach AI content creation like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write a blog post about document processing for developers"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then they're surprised when the output feels generic, misses crucial context, or doesn't match their voice. It's the content equivalent of landing in Knockturn Alley instead of Diagon Alley; close enough to feel right, but missing the mark entirely.&lt;/p&gt;

&lt;p&gt;The problem isn't the AI's capability. The problem is that &lt;strong&gt;good technical content requires lived context&lt;/strong&gt; that can't be vibed into existence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The specific pain points your audience faces&lt;/li&gt;
&lt;li&gt;The mental models that help concepts click&lt;/li&gt;
&lt;li&gt;The edge cases and gotchas from real implementation&lt;/li&gt;
&lt;li&gt;Your unique perspective and voice&lt;/li&gt;
&lt;li&gt;The broader strategic context of why this content matters&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Context Engineering for Content Creation
&lt;/h2&gt;

&lt;p&gt;Instead of hoping Claude vibes its way to good content, I treat our collaboration as a context engineering exercise. Here's my actual process:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with Research, Not Writing
&lt;/h3&gt;

&lt;p&gt;When I want Claude to help with content, I never start with "write this for me." I start with "learn about this with me."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My typical opening prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want you to do research on [topic] and learn about the challenges facing my ICP by asking me questions, if needed. You can also find information by search competitors such as X, Y, or Z, referencing previously published work [links] and reading through docs [links]. You should also do research on me to better understand how I would position and explain this topic. You can find information about me on my GitHub, LinkedIn, and resume. You should also read these blog posts to understand my tone [links]."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't delegation; it's collaboration setup. I'm giving Claude the tools to understand both the subject matter and my perspective. Claude then researches, asks follow-up questions, and builds context before we ever touch content creation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Claude becomes a learning partner who understands my voice, my audience, and my goals, rather than a content generator working from assumptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Guide Through Questions, Don't Dictate Answers
&lt;/h3&gt;

&lt;p&gt;After Claude does initial research, I don't immediately jump to "now write the thing." Instead, I let Claude ask me questions...lots of them.&lt;/p&gt;

&lt;p&gt;In our collaborations, Claude typically asks things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What are the biggest barriers you've observed that prevent engineers from achieving what they want with code?"&lt;/li&gt;
&lt;li&gt;"What does success look like for you in this role?"&lt;/li&gt;
&lt;li&gt;"How do you see the thread between [your previous work] and [current work]?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just information-gathering questions. They're the kinds of questions that help me clarify my own thinking. Often, Claude's questions surface insights I hadn't explicitly articulated, even to myself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Good content comes from clear thinking. The Socratic method Claude uses here forces me to crystallize ideas that might have been fuzzy, giving us both better material to work with.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Iterate Through Feedback, Not Replacement
&lt;/h3&gt;

&lt;p&gt;When Claude produces content, I never treat the first draft as the final product. Instead, I provide specific feedback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This section needs more technical depth"&lt;/li&gt;
&lt;li&gt;"The tone here doesn't match my voice from [previous article]"&lt;/li&gt;
&lt;li&gt;"Add a concrete example of [specific scenario]"&lt;/li&gt;
&lt;li&gt;"This analogy doesn't quite work for our audience"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, I don't just say "make it better." I give Claude the context it needs to improve in the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Each iteration teaches Claude more about my standards, voice, and audience. The content gets progressively better, and Claude gets progressively better at anticipating what I need.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Supplement with Additional Research
&lt;/h3&gt;

&lt;p&gt;Mid-conversation, I often ask Claude to research additional context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Look up the latest developments in [technology area]"&lt;/li&gt;
&lt;li&gt;"Find examples of how [specific company] approaches this problem"&lt;/li&gt;
&lt;li&gt;"Research the best practices around [implementation detail]"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't because Claude's initial research was insufficient. It's because good content creation is an exploratory process. As we develop ideas together, new research needs emerge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Real-time research keeps our content current and comprehensive. It also models how I actually write: constantly fact-checking, finding examples, and building on new information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Publish vs. What Claude Produces
&lt;/h2&gt;

&lt;p&gt;Here's something crucial: &lt;strong&gt;I never publish Claude's output directly.&lt;/strong&gt; What Claude produces is sophisticated first-draft material that captures my voice and ideas, but it's not my final work.&lt;/p&gt;

&lt;p&gt;My published content goes through several more layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural editing&lt;/strong&gt; where I reorganize for better flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice refinement&lt;/strong&gt; where I adjust tone and style to exactly match my perspective&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical validation&lt;/strong&gt; where I verify every code example and technical claim&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience optimization&lt;/strong&gt; where I add specific details that resonate with my community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of Claude as an extremely capable research assistant and thought partner, not a ghostwriter. The ideas, insights, and expertise are mine. Claude helps me organize and articulate them more effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Engineers Building with AI
&lt;/h2&gt;

&lt;p&gt;If you're working on agentic applications, LLM integrations, or AI-powered developer tools, this collaborative approach offers several lessons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context is everything.&lt;/strong&gt; The most sophisticated AI in the world can't substitute for domain knowledge, user empathy, and situational awareness. Your job isn't to automate human expertise away, it's to amplify it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration beats generation.&lt;/strong&gt; Instead of building systems that try to produce perfect outputs in one shot, build systems that support rapid iteration and refinement. The magic happens in the feedback loop, not the initial output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI should teach, not just execute.&lt;/strong&gt; The best AI tools I've used don't just perform tasks, they help me understand problems better, ask better questions, and develop better solutions. Claude's questioning approach has genuinely improved my thinking about content strategy and audience needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Takeaways
&lt;/h2&gt;

&lt;p&gt;Whether you're creating documentation, building developer education, or working on any AI-powered content creation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up context before requesting output.&lt;/strong&gt; Give your AI tool the background information it needs to understand your goals, audience, and constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use AI to help you think, not just to produce.&lt;/strong&gt; The best prompts are often questions that help clarify your own thinking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan for iteration.&lt;/strong&gt; Your first output won't be your final output. Build feedback loops into your workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintain human judgment.&lt;/strong&gt; AI can help you articulate ideas and organize thoughts, but the expertise, creativity, and final decisions should remain yours.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Meta-Point About Tools
&lt;/h2&gt;

&lt;p&gt;This brings me back to the core insight about working with any powerful tool: &lt;strong&gt;when you have a hammer, not everything is a nail.&lt;/strong&gt; Claude is incredibly capable, but knowing when and how to use it effectively makes all the difference.&lt;/p&gt;

&lt;p&gt;The goal isn't to automate content creation. The goal is to amplify human expertise, accelerate the iteration cycle, and create better outcomes through collaboration.&lt;/p&gt;

&lt;p&gt;In my work, this approach has helped me create documentation that developers actually use, tutorials that successfully onboard new users, and content that genuinely advances our mission of making document AI accessible to engineers at any level.&lt;/p&gt;

&lt;p&gt;That's the difference between vibe prompting and context engineering. One hopes for magic. The other creates it, systematically and sustainably.&lt;/p&gt;

&lt;p&gt;Because at the end of the day, the best tools don't replace human expertise, they make that expertise more powerful.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Ever asked a model “where did you get that?” Citations for document parsing solve that exact trust gap. Fewer hallucinations. More trust. Reliable workflows. Read the article, watch the video, try the notebook 👇</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 03 Sep 2025 19:50:17 +0000</pubDate>
      <link>https://forem.com/drguthals/ever-asked-a-model-where-did-you-get-that-citations-for-document-parsing-solve-that-exact-27i5</link>
      <guid>https://forem.com/drguthals/ever-asked-a-model-where-did-you-get-that-citations-for-document-parsing-solve-that-exact-27i5</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-story__hidden-navigation-link"&gt;Verify Structured Output with Field-Level Citations&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/tensorlake"&gt;
            &lt;img alt="Tensorlake logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" class="crayons-logo__image"&gt;
          &lt;/a&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2815571" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/tensorlake" class="crayons-story__secondary fw-medium"&gt;Tensorlake&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Sep 3 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" id="article-link-2815571"&gt;
          Verify Structured Output with Field-Level Citations
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Verify Structured Output with Field-Level Citations</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 03 Sep 2025 16:45:00 +0000</pubDate>
      <link>https://forem.com/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc</link>
      <guid>https://forem.com/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc</guid>
      <description>&lt;p&gt;Missing evidence is one of the biggest blockers in production AI workflows.  &lt;/p&gt;

&lt;p&gt;It’s not enough to say &lt;em&gt;what&lt;/em&gt; a document claims; you need to show &lt;em&gt;where&lt;/em&gt; in the source that claim came from. Whether you’re auditing bank statements, verifying medical referral forms, or investigating fraud, traceability is a hard requirement.&lt;/p&gt;

&lt;p&gt;That’s why we’ve introduced a new parameter in Tensorlake’s &lt;code&gt;StructuredExtractionOptions&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;StructuredExtractionOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;schema_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ExampleSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ExampleSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;provide_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;provide_citations=True&lt;/code&gt;, every extracted field includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page number&lt;/li&gt;
&lt;li&gt;Bounding box (bbox) coordinates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means structured outputs are no longer just machine-readable; they’re auditable, verifiable, and traceable back to the source document.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/dfTp4xEceoQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Traceable Context Means Trustworthy RAG
&lt;/h2&gt;

&lt;p&gt;In many workflows, “close enough” isn’t good enough. Teams need confidence that extracted values align with the document’s ground truth. Let’s look at where this matters most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Banking &amp;amp; Finance&lt;/strong&gt;: Auditors need to understand exactly which account, statement, or transaction produced a reported number. If an account balance doesn’t reconcile, citations let you trace back to the precise page and bounding box where the discrepancy originates. No more guesswork in backtracking totals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt;: When anomalies appear in reported values, bounding-box citations provide the evidence trail. Investigators can quickly verify whether a suspicious number came from an altered document, a duplicated entry, or a genuine filing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare &amp;amp; Forms Processing&lt;/strong&gt;: At UCLA, teams processing medical referral forms wanted faster verification of ground truth. With citations, a structured field (like “referral date” or “doctor’s signature”) can point directly to the page span and bounding box where it was found, cutting human review time dramatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Citations turn structured extraction into a compliance-grade tool.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Implement Citations with One Line of Code
&lt;/h2&gt;

&lt;p&gt;Let’s take a simple example: extracting transaction summaries from a bank statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.documentai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StructuredExtractionOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BankStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Transaction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;doc_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;structured_extraction_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;StructuredExtractionOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;schema_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BankStatement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BankStatement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;provide_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;   &lt;span class="c1"&gt;# &amp;lt;-- new parameter
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_and_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://tlake.link/documents/bank-statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned JSON now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"transactions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"08/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Date_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;59&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;135&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;448&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"50.00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;515&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;585&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;447&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"descriptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ATM CASH DEPOSIT, ***** 30073995581 AUT 082220 ATM CASH DEPOSIT 550 LONG BEACH BLVD LONG BEACH * NY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"descriptions_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;135&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;515&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;447&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each field is now annotated with a citation: the page number and bounding-box coordinates.&lt;/p&gt;
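
&lt;p&gt;To consume those citations programmatically, you can walk the returned data and pair every field with its source location. A minimal sketch, assuming the &lt;code&gt;field&lt;/code&gt;/&lt;code&gt;field_citation&lt;/code&gt; key convention shown above and the &lt;code&gt;result&lt;/code&gt; object from the earlier example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: pair each extracted field with its citation.
# Assumes `result` from the extraction example above and that `data`
# matches the field / field_citation JSON shape shown.
data = result.structured_data[0].data

for transaction in data["transactions"]:
    for key, value in transaction.items():
        if key.endswith("_citation"):
            continue
        for cite in transaction.get(f"{key}_citation", []):
            print(
                f"{key} = {value!r} on page {cite['page_number']}, "
                f"bbox ({cite['x1']}, {cite['y1']}, {cite['x2']}, {cite['y2']})"
            )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;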

&lt;p&gt;If you use our Tensorlake Cloud Playground, you can even get the visual bounding boxes labeled for each extracted bit of information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu93p6w9jv0matr8kelxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu93p6w9jv0matr8kelxg.png" alt="Example of Tensorlake structured data extraction with citations, showing JSON output linked to highlighted fields on a TD Bank statement. A $50.00 transaction is mapped from the document to the JSON citation with bounding box coordinates."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Data to Evidence
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“In insurance, structured outputs power our workflows, but people still verify. With field-level citations, reviewers can jump from a data row straight to the exact COI or endorsement language. That’s the difference between ‘parsed’ and provable.”&lt;/p&gt;

&lt;p&gt;— &lt;em&gt;Jesse McClure, CTO and Co-Founder, &lt;a href="https://www.sublynk.com/" rel="noopener noreferrer"&gt;Sublynk&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Citations aren’t just a nice-to-have; our customers across industries know that they unlock new workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit-ready outputs&lt;/strong&gt;: Every number is backed by ground-truth evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated review&lt;/strong&gt;: Flag discrepancies automatically and point reviewers directly to the source (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainability in RAG/Agents&lt;/strong&gt;: Don’t just return answers—return the highlighted document snippets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Enhancements&lt;/strong&gt;: Build document viewers that highlight the exact fields extracted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benefit is twofold: engineers can build more reliable systems and stakeholders (auditors, compliance teams, regulators) get confidence and transparency.&lt;/p&gt;
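
&lt;p&gt;As one concrete example of the automated-review pattern above, here's a hedged sketch that reconciles extracted amounts against an expected total and, when they disagree, points reviewers at the exact source regions. &lt;code&gt;expected_total&lt;/code&gt; is a hypothetical value from your own ledger, not part of the Tensorlake API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch: flag a discrepancy and surface the evidence trail.
# `expected_total` is a hypothetical ledger value; `data` is the
# citation-annotated dict from the sketch above.
expected_total = 1325.00

extracted_total = sum(float(t["amount"]) for t in data["transactions"])

if extracted_total != expected_total:  # use a tolerance in practice
    print(f"Discrepancy: extracted {extracted_total}, expected {expected_total}")
    for t in data["transactions"]:
        for cite in t.get("amount_citation", []):
            print(
                f"  review amount {t['amount']} on page {cite['page_number']}, "
                f"bbox ({cite['x1']}, {cite['y1']}, {cite['x2']}, {cite['y2']})"
            )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;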

&lt;h2&gt;
  
  
  Try Structured Extraction Citations Now
&lt;/h2&gt;

&lt;p&gt;You can try &lt;code&gt;provide_citations=True&lt;/code&gt; today in both the Tensorlake Playground and the API/Python SDK.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docs: &lt;a href="https://tlake.link/citations" rel="noopener noreferrer"&gt;Structured Extraction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Example Notebook: &lt;a href="https://tlake.link/notebooks/citations" rel="noopener noreferrer"&gt;Parse Bank Statements&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions or feedback, we'd love to hear from you! Join our &lt;a href="https://tlake.link/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt; and let us know how you're using citations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traceability Built In
&lt;/h2&gt;

&lt;p&gt;With the new &lt;code&gt;provide_citations&lt;/code&gt; parameter, structured extraction becomes not only machine-readable but also evidence-backed.&lt;/p&gt;

&lt;p&gt;Every field can now point back to its exact source location in the document, making Tensorlake the foundation for audit-ready, compliance-grade, and fraud-resistant AI workflows.&lt;/p&gt;

&lt;p&gt;Start using it today. In production AI, traceability isn’t optional.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I've been dancing around this idea for a long time now. Regardless of my title, how the industry is shaping, how tech evolves, the core of why I love this industry is because it's really all about learning...and I'm a learning nerd 🤓</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 21 Aug 2025 02:22:29 +0000</pubDate>
      <link>https://forem.com/drguthals/ive-been-dancing-around-this-idea-for-a-long-time-now-regardless-of-my-title-how-the-industry-is-36h6</link>
      <guid>https://forem.com/drguthals/ive-been-dancing-around-this-idea-for-a-long-time-now-regardless-of-my-title-how-the-industry-is-36h6</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-story__hidden-navigation-link"&gt;The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2786463" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Aug 21 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" id="article-link-2786463"&gt;
          The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/contextengineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;contextengineering&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vibecoding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vibecoding&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>programming</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 21 Aug 2025 02:20:19 +0000</pubDate>
      <link>https://forem.com/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i</link>
      <guid>https://forem.com/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i</guid>
      <description>&lt;p&gt;In &lt;em&gt;The Mythical Man-Month&lt;/em&gt;, Fred Brooks famously wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 1975, that was programming itself; type the right sequence of symbols and something new appeared.&lt;/p&gt;

&lt;p&gt;Today, we’re living through a new version of that magic. With large language models, you don’t even need the exact incantation. A vague prompt - &lt;em&gt;a vibe&lt;/em&gt; - can conjure up working code. It feels like we’ve entered what I like to call &lt;strong&gt;The Mythical Vibe-Month&lt;/strong&gt;: a world where AI gives us the illusion of infinite acceleration.&lt;/p&gt;

&lt;p&gt;But here’s the rub: &lt;em&gt;&lt;strong&gt;magic without context is messy&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coding’s Surface-Level Problem
&lt;/h2&gt;

&lt;p&gt;“Vibe coding” works beautifully in demos, toy projects, and small scripts. But in production systems, it misses the most important ingredient: &lt;strong&gt;context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And I don’t just mean the context of the code snippet. I mean the living context of a software project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Slack conversations that explain why a shortcut was chosen.&lt;/li&gt;
&lt;li&gt;The GitHub issues where trade-offs were debated.&lt;/li&gt;
&lt;li&gt;The PR comments that capture edge cases and gotchas.&lt;/li&gt;
&lt;li&gt;The searches, docs, and past LLM queries engineers already ran.&lt;/li&gt;
&lt;li&gt;The experiments, bugs, fixes, and fast follows that shaped today’s code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs can’t see any of that. So they often generate code that’s plausible, but out of tune with the broader system. This adds hidden complexity, reintroduces old bugs, or sometimes over-engineers where a simpler fix would have worked.&lt;/p&gt;

&lt;p&gt;This mirrors what I studied in my grad-school days, where I focused on designing learning experiences that &lt;em&gt;enculturate novices&lt;/em&gt;. Teaching novices syntax alone doesn’t make them programmers. They need the culture of programming: exposure to how experts debug, comment, negotiate trade-offs, and work together.&lt;/p&gt;

&lt;p&gt;Without that context, their “magic” fizzles.&lt;/p&gt;

&lt;p&gt;It’s a bit like (and forgive the reference...we’ll leave the author out of it 🙃) when a certain boy wizard mispronounces &lt;em&gt;Diagon Alley&lt;/em&gt; and ends up somewhere entirely unintended. The spell was close enough to feel right, but without precision and context, he landed in &lt;em&gt;Knockturn Alley&lt;/em&gt; instead.&lt;/p&gt;

&lt;p&gt;AI today is in that same position. It can chant the incantations, but without the lived context of the codebase and its history, it often lands us in the wrong alley.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Engineering: The Antidote to Vibes Alone
&lt;/h2&gt;

&lt;p&gt;This is where context engineering comes in: the discipline of giving AI systems the right information, in the right form, at the right time.&lt;/p&gt;

&lt;p&gt;Instead of hoping an LLM vibes its way into correctness, context engineering means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capturing rationale, history, and constraints alongside code.&lt;/li&gt;
&lt;li&gt;Distilling unstructured knowledge (docs, PDFs, logs, contracts) into structured signals.&lt;/li&gt;
&lt;li&gt;Connecting artifacts across the software lifecycle so AI can see the bigger picture.&lt;/li&gt;
&lt;li&gt;Making the invisible visible so the AI doesn’t just guess, it reasons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With context, AI shifts from being a clumsy novice to a genuine collaborator.&lt;/p&gt;
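
&lt;p&gt;To make that shape concrete, here's a minimal, hedged sketch: gather the rationale living around a piece of code (issues, PR comments, docs) and hand it to the model alongside the snippet itself. The file names and helper are illustrative, not a specific tool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of context engineering: assemble the living context
# around a change (issues, PR comments, docs) into the prompt itself.
# File names are placeholders; in practice these would come from your
# issue tracker, VCS, and docs pipeline.
from pathlib import Path

def build_context(snippet, artifact_paths):
    sections = [f"Code under discussion:\n{snippet}"]
    for path in artifact_paths:
        sections.append(f"--- {path} ---\n{Path(path).read_text()}")
    return "\n\n".join(sections)

prompt = build_context(
    snippet=Path("billing/retry.py").read_text(),
    artifact_paths=["issues/1423.md", "pr_comments/88.md", "docs/retries.md"],
)
# `prompt` now carries the rationale, trade-offs, and history the model
# would otherwise have to guess at.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;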

&lt;p&gt;At Tensorlake, this is exactly what we’re focused on, but from the document perspective. It's why I joined this company. This has been a problem since "before AI" because it's a &lt;em&gt;&lt;strong&gt;learning&lt;/strong&gt;&lt;/em&gt; problem. We need to start addressing AI dev tools for vibe coding the way we address AI data tools: unlock data that’s trapped in unstructured formats so that both humans and AI can use it as context. Not bigger models. Not longer prompts. Smarter inputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Engineers
&lt;/h2&gt;

&lt;p&gt;For engineers experimenting with AI, this is the difference between a parlor trick and a production tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With just vibes&lt;/strong&gt;: AI accelerates you today but introduces subtle complexity for tomorrow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With context&lt;/strong&gt;: AI can understand systems, not just snippets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it’s not just about the AI. Engineers themselves should be using these AI-driven dev tools to learn.&lt;/p&gt;

&lt;p&gt;Learning a new framework? Use AI to surface not just the docs, but the design decisions and trade-offs baked into them.&lt;/p&gt;

&lt;p&gt;Trying to understand a legacy codebase? Use AI tools that highlight the history of changes, PR debates, and bugs fixed, not just the latest code snapshot.&lt;/p&gt;

&lt;p&gt;Building awareness in a fast-moving team? Let AI summarize Slack threads, issues, and commits so you don’t miss evolving context.&lt;/p&gt;

&lt;p&gt;In other words: don’t just let AI code for you. Let it teach you, by surfacing the cultural and contextual knowledge that makes the code what it is. That’s how engineers can stay enculturated in their own systems, even as those systems evolve.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Please do all this responsibly. This is not the post to dive into ethics, but I hope you understand what "responsibly" means.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Fred Brooks showed us that programming itself once felt like magic; typing the right incantation to summon something new. Today, AI has made that magic even more accessible. But without context, it’s the wrong kind of magic: flashy, fragile, and ultimately unsustainable.&lt;/p&gt;

&lt;p&gt;When I think about my research on how people learn to program, the lesson for AI is the same: magic isn’t learned in isolation. It’s learned in community, through practice, feedback, and, most importantly, context.&lt;/p&gt;

&lt;p&gt;If Brooks were writing today, I think he’d smile at the idea of The Mythical Vibe-Month. But he’d also remind us that engineering discipline is what makes software scale.&lt;/p&gt;

&lt;p&gt;Vibes are the incantation.&lt;br&gt;
Context is the curriculum.&lt;br&gt;
And that’s what turns messy magic into real mastery.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>vibecoding</category>
    </item>
  </channel>
</rss>
