<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sarah Guthals, PhD</title>
    <description>The latest articles on Forem by Sarah Guthals, PhD (@drguthals).</description>
    <link>https://forem.com/drguthals</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg</url>
      <title>Forem: Sarah Guthals, PhD</title>
      <link>https://forem.com/drguthals</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/drguthals"/>
    <language>en</language>
    <item>
      <title>Underdocumented issues are the worst, especially when it's more about configuration because it's often under-error-messaged too *tear*</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 04 Dec 2025 00:25:57 +0000</pubDate>
      <link>https://forem.com/drguthals/underdocumented-issues-are-the-worst-especially-when-its-more-about-configuration-because-its-1ink</link>
      <guid>https://forem.com/drguthals/underdocumented-issues-are-the-worst-especially-when-its-more-about-configuration-because-its-1ink</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/blackgirlbytes/how-to-query-a-railway-sqlite-database-from-github-actions-376j"&gt;How to Query a Railway SQLite Database from GitHub Actions&lt;/a&gt; by Rizèl Scarlett ・ Dec 3 '25 ・ 4 min read&lt;/p&gt;</description>
      <category>devops</category>
      <category>githubactions</category>
      <category>automation</category>
    </item>
    <item>
      <title>It's time to start measuring accuracy of data extraction with downstream systems and usability in mind, not just vanity metrics for a marketing slide</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 05 Nov 2025 17:59:55 +0000</pubDate>
      <link>https://forem.com/drguthals/its-time-to-start-measuring-accuracy-of-data-extraction-with-downstream-systems-and-usability-in-1a2m</link>
      <guid>https://forem.com/drguthals/its-time-to-start-measuring-accuracy-of-data-extraction-with-downstream-systems-and-usability-in-1a2m</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/tensorlake" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" alt="Tensorlake" width="128" height="128"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="" width="460" height="460"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Benchmarking the Most Reliable Document Parsing API&lt;/h2&gt;
      &lt;h3&gt;Sarah Guthals, PhD for Tensorlake ・ Nov 5&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Benchmarking the Most Reliable Document Parsing API</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 05 Nov 2025 17:37:05 +0000</pubDate>
      <link>https://forem.com/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln</link>
      <guid>https://forem.com/tensorlake/benchmarking-the-most-reliable-document-parsing-api-1mln</guid>
      <description>&lt;p&gt;Document parsing is the foundation of enterprise AI applications. Whether you're building RAG pipelines, automating insurance claims, or extracting data from financial reports, everything starts with one question: &lt;strong&gt;Can you consistently transform messy, real-world documents into structured, machine-readable data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our customers need the best document ingestion API for their use cases. They're comparing Azure, AWS Textract, and popular open-source models like Docling and Marker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We built a benchmark that measures what matters: Can downstream systems actually use this output?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Tensorlake both reads documents and extracts structured data, so when deciding how to measure accuracy, we wanted to cover both document parsing (with structural preservation) and structured extraction (for downstream usability).&lt;/p&gt;

&lt;p&gt;The aspects of Document Parsing that we wanted to measure were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tables:&lt;/strong&gt; Parsing complex tables with merged cells and multi-row headers, and measuring accuracy on them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading Order:&lt;/strong&gt; In multi-column documents and documents with complex layouts, we measure whether reading order is preserved during parsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Extraction Accuracy:&lt;/strong&gt; Measuring direct downstream usability of extracted data. A small OCR error in a single table cell can cause the downstream task to fail even when overall OCR accuracy on the document is high.&lt;/li&gt;
&lt;li&gt;Extraction of footnotes, formulas, figures, and other non-textual content.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Our Evaluation Methodology
&lt;/h2&gt;

&lt;p&gt;We employ two metrics that better capture these aspects and reflect real-world reliability:&lt;/p&gt;

&lt;h3&gt;
  
  
  TEDS (Tree Edit Distance Similarity)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compares predicted and ground-truth Markdown/HTML tree structures&lt;/li&gt;
&lt;li&gt;Captures structural fidelity in tables and complex layouts&lt;/li&gt;
&lt;li&gt;Widely adopted in OCRBench v2 and OmniDocBench evaluations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Measures whether the document's logical structure and textual alignment remains intact&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TEDS answers: "Is this table still a table?" Not just "Is the text similar?"&lt;/p&gt;
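
&lt;p&gt;To make the intuition concrete, here is a minimal, hypothetical sketch of a TEDS-style score on toy table trees. It assumes the third-party &lt;code&gt;zss&lt;/code&gt; package (a Zhang-Shasha tree edit distance implementation); the metric used in OCRBench v2 and OmniDocBench also weighs cell-text similarity, so treat this as an illustration of the idea, not the benchmark's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy TEDS-style similarity: 1 - edit_distance / max(tree sizes).
# Assumes `pip install zss`; the real TEDS also compares cell text.
from zss import Node, simple_distance

def tree_size(node):
    return 1 + sum(tree_size(c) for c in node.children)

def teds(pred, gold):
    dist = simple_distance(pred, gold)  # unit-cost tree edit distance
    return 1.0 - dist / max(tree_size(pred), tree_size(gold))

# Ground truth: a 2x2 table. Prediction: the same table missing one cell.
gold = Node("table").addkid(
    Node("tr").addkid(Node("td:A")).addkid(Node("td:B"))).addkid(
    Node("tr").addkid(Node("td:C")).addkid(Node("td:D")))
pred = Node("table").addkid(
    Node("tr").addkid(Node("td:A")).addkid(Node("td:B"))).addkid(
    Node("tr").addkid(Node("td:C")))

print(f"TEDS ~= {teds(pred, gold):.2f}")  # below 1.0: structure partially lost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;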

&lt;h3&gt;
  
  
  JSON F1 (Field-Level Precision and Recall)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compares extracted JSON against schema-based ground truth&lt;/li&gt;
&lt;li&gt;Precision measures correctness of extracted fields&lt;/li&gt;
&lt;li&gt;Recall measures completeness of required field capture&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;F1 score balances both for overall reliability assessment&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;JSON F1 answers: "Can downstream automation actually use this data?" Not just "Is some text present?"&lt;/p&gt;
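
&lt;p&gt;As a minimal sketch of the idea (the field names and values below are illustrative, not our evaluation harness), field-level F1 over a flat JSON record can be computed like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal field-level JSON F1 sketch: a predicted (field, value) pair counts
# as correct only if it exactly matches the ground truth. Illustrative only.
def json_f1(pred: dict, gold: dict) -&gt; float:
    correct = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {"invoice_number": "INV-1042", "total": "1,250.00", "currency": "USD"}
pred = {"invoice_number": "INV-1042", "total": "1,256.00"}  # one OCR digit off

print(f"F1 = {json_f1(pred, gold):.2f}")  # 0.40: one bad cell hurts a lot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;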

&lt;blockquote&gt;
&lt;p&gt;Together, these metrics answer the essential question: &lt;strong&gt;"Can downstream systems use this output?"&lt;/strong&gt; rather than simply "Is the text similar?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Document Reading Ability (OCR and Structural Preservation)&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Each parsing model generates Markdown/HTML output. We evaluate using TEDS to measure how well structure is preserved: reading order, table integrity, and layout coherence. You can find our &lt;a href="https://tlake.link/benchmark-dataset" rel="noopener noreferrer"&gt;updated dataset published here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We use the public OCRBench v2 and OmniDocBench datasets. However, upon review, we identified inconsistencies in the published ground truth of OCRBench v2. We conducted a comprehensive audit and correction to ensure evaluation accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Structured Extraction Accuracy (Downstream Usability)&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;We pass the Markdown through a standardized LLM (GPT-4o) with predefined JSON schemas, measuring JSON F1. This isolates how OCR quality impacts real extraction workflows, where an LLM interprets the parsed text.&lt;/p&gt;

&lt;p&gt;Initial JSON schemas and reference answers are generated using Gemini Pro 2.5, then human reviewers audit and correct them to ensure high-quality gold standards.&lt;/p&gt;

&lt;p&gt;This methodology ensures fair, reproducible comparisons by varying only the OCR models (Stage 1) while keeping the extraction model constant (Stage 2).&lt;/p&gt;
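
&lt;p&gt;Here is a hedged sketch of what Stage 2 looks like in practice. Our actual harness, prompts, and schemas differ; the schema below is a made-up example, and the call uses OpenAI's structured-output support for Chat Completions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical Stage-2 sketch: feed parsed Markdown to a fixed LLM (GPT-4o)
# with a predefined JSON schema. The schema and prompt here are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

schema = {
    "name": "invoice_fields",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "string"},
        },
        "required": ["invoice_number", "total"],
        "additionalProperties": False,
    },
}

def extract(markdown: str) -&gt; dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract the requested fields from the document."},
            {"role": "user", "content": markdown},
        ],
        response_format={"type": "json_schema", "json_schema": schema},
    )
    return json.loads(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;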

&lt;h2&gt;
  
  
  The Results: Public Dataset Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Document Parsing Performance
&lt;/h3&gt;

&lt;p&gt;We evaluated leading open-source and proprietary models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2iklmdxawivl0t9hebc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2iklmdxawivl0t9hebc.png" alt="Table showing table parsing accuracy on OmniDocBench dataset. Five models compared with TEDS scores and TEDS-Structure only scores: Docling (63.84%, 77.68%), Marker (57.88%, 71.17%), Azure (78.14%, 83.61%), Textract (80.75%, 88.78%), and Tensorlake highlighted in green (86.79%, 90.62%). Tensorlake achieves the highest scores in both categories." width="800" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensorlake achieves the highest TEDS score, indicating superior structural preservation&lt;/li&gt;
&lt;li&gt;The gap between Docling and production-grade systems is substantial&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Table Parsing Performance
&lt;/h3&gt;

&lt;p&gt;We evaluated Tensorlake’s table parsing accuracy using the OmniDocBench dataset — a CVPR-accepted benchmark for comprehensive document understanding tasks (&lt;a href="https://github.com/opendatalab/OmniDocBench" rel="noopener noreferrer"&gt;GitHub link&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Table accuracy in OmniDocBench is quantified using a combination of tree-based and string-based metrics. In particular, we measured TEDS (Tree Edit Distance Similarity), which assesses both the structural and textual alignment between predicted and ground-truth HTML tables.&lt;/p&gt;

&lt;p&gt;To reproduce our results, generate Markdown outputs using the models listed below, then run the evaluation method provided in the OmniDocBench repository. We used 512 document images containing tables and v1.5 of the evaluation code. Evaluation outputs are released on Hugging Face (&lt;a href="https://huggingface.co/datasets/tensorlake/OmniDocBench-eval-outputs" rel="noopener noreferrer"&gt;link&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6byj9xutcea6ndczncau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6byj9xutcea6ndczncau.png" alt="Bar chart showing Table Parsing Task performance on OmniDocBench dataset measured by TEDS (Tree Edit Distance Similarity) score, where higher is better. Five models compared from left to right: Marker (57.88%), Docling (63.84%), Azure (78.14%), Textract (80.75%), and Tensorlake highlighted in green (86.79%). Tensorlake achieves the highest TEDS score, outperforming the next best competitor (Textract) by approximately 6 percentage points and leading open-source alternatives by over 20 percentage points." width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;¹ &lt;em&gt;Marker's number is taken from the officially published OmniDocBench repository.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On OmniDocBench's challenging tables, Tensorlake leads with 86.79% TEDS&lt;/li&gt;
&lt;li&gt;Open-source solutions struggle with table extraction (sub-70% TEDS)&lt;/li&gt;
&lt;li&gt;Tensorlake maintains table structure even on complex, multi-page tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance on Real World Enterprise Documents
&lt;/h2&gt;

&lt;p&gt;OCR models are rarely trained on enterprise documents, because such documents are not publicly available. We wanted to test how well our model and others perform on these documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Document Performance (100 pages)
&lt;/h3&gt;

&lt;p&gt;We curated 100 document pages spanning banking, retail, and insurance sectors. This represents real production workloads: invoices with water damage, scanned contracts with skewed text, bank statements with multi-level tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoytm5zaibro9wadqpd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoytm5zaibro9wadqpd8.png" alt="Bar chart showing Enterprise Document JSON Accuracy F1 scores. Six models compared: Docling (68.90%), Marker (83.30%), Azure (88.10%), Textract (88.40%), Gemini (89.00%), and Tensorlake highlighted in green (91.70%). Tensorlake achieves the highest accuracy, with approximately 5 more correctly extracted fields per 20 documents compared to the next best competitor." width="800" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensorlake achieves 91.7% F1 with standard extraction, beating all competitors&lt;/li&gt;
&lt;li&gt;The difference between 91.7% and 68.9% F1 is massive: it’s &lt;strong&gt;5 extra&lt;/strong&gt; fields correctly extracted out of every 20&lt;/li&gt;
&lt;li&gt;In production workflows processing thousands of documents daily, this accuracy gap compounds into significant error reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But even among the higher F1 scores, when parsing a standard form Azure and Textract jumble the reading order and skip data entirely, whereas Tensorlake preserves the complex reading order and groups data correctly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyusjypydmgusuwc2ivr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyusjypydmgusuwc2ivr.png" alt="Comparison showing how different document parsing APIs handle a contract notice section. Original document at top shows buyer and seller information with addresses, phone numbers, and email addresses. Three parsed outputs below demonstrate failures: Textract (labeled with coral background) shows jumbled addresses and missing buyer information; Azure (labeled with blue background) shows jumbled addresses and missing parenthesis; Tensorlake (labeled with green background) preserves complex reading order with no missing data and accurate information. Key differences highlighted: competitors lose structure and omit critical fields, while Tensorlake maintains logical reading order and captures all information correctly." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Delivering the Best Performance/Price Ratio
&lt;/h2&gt;

&lt;p&gt;Accuracy without affordability isn't practical. Here's how Tensorlake compares to other Document Ingestion APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.tensorlake.ai/pricing" rel="noopener noreferrer"&gt;Tensorlake&lt;/a&gt;: $10 per 1k pages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: &lt;strong&gt;86.79&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;F1 Score: &lt;strong&gt;91.7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/" rel="noopener noreferrer"&gt;Azure&lt;/a&gt;: $10 per 1k pages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: 78.14&lt;/li&gt;
&lt;li&gt;F1 Score: 88.1&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/textract/pricing/" rel="noopener noreferrer"&gt;AWS Textract&lt;/a&gt;: $15 per 1k pages&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TEDS Score: 80.75&lt;/li&gt;
&lt;li&gt;F1 Score: 88.4&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tensorlake delivers higher accuracy than both Azure and AWS Textract, matching Azure's cost, while AWS Textract is 50% more expensive.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Take the Next Step
&lt;/h2&gt;

&lt;p&gt;When your business depends on accurate document processing, you can't afford to use anything less.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tlake.link/cloud" rel="noopener noreferrer"&gt;Try Tensorlake free&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Want to discuss your specific use case?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://tlake.link/chat" rel="noopener noreferrer"&gt;Schedule a technical demo&lt;/a&gt; with our team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions about the benchmark?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tlake.link/slack" rel="noopener noreferrer"&gt;Join our Slack community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>api</category>
      <category>rag</category>
      <category>ai</category>
      <category>performance</category>
    </item>
    <item>
      <title>Process documents with hundreds of pages with no issues. In this example, I extracted crypto holdings from 200+ page SEC filings by first classifying pages using VLM support and then extracting relevant information only from those pages.</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 16 Oct 2025 20:54:21 +0000</pubDate>
      <link>https://forem.com/drguthals/process-documents-with-hundreds-of-pages-with-no-issues-in-this-example-i-extracted-crypto-35le</link>
      <guid>https://forem.com/drguthals/process-documents-with-hundreds-of-pages-with-no-issues-in-this-example-i-extracted-crypto-35le</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/tensorlake" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" alt="Tensorlake" width="128" height="128"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="" width="460" height="460"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/tensorlake/new-vision-language-models-for-document-processing-3fdm" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;New: Vision Language Models for Document Processing&lt;/h2&gt;
      &lt;h3&gt;Sarah Guthals, PhD for Tensorlake ・ Oct 16&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>New: Vision Language Models for Document Processing</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 16 Oct 2025 20:52:08 +0000</pubDate>
      <link>https://forem.com/tensorlake/new-vision-language-models-for-document-processing-3fdm</link>
      <guid>https://forem.com/tensorlake/new-vision-language-models-for-document-processing-3fdm</guid>
      <description>&lt;p&gt;We've expanded our use of Vision Language Models (VLMs) across multiple DocumentAI features for faster and more accurate document processing on documents with hundreds of pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Page Classification&lt;/strong&gt;: Identify relevant pages in large documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Figure and Table Summarization&lt;/strong&gt;: Extract insights from visual elements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Extraction (with &lt;code&gt;skip_ocr&lt;/code&gt;)&lt;/strong&gt;: Direct visual understanding for more accurate extraction on harder-to-parse documents (e.g., scanned documents, engineering diagrams, or documents with complex reading order)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a demonstration, this post focuses on our enhanced page classification capabilities. With VLM support, you can quickly process large documents by identifying and extracting from only the relevant pages.&lt;/p&gt;

&lt;p&gt;Try it in this &lt;a href="https://tlake.link/notebooks/vlm-parsing" rel="noopener noreferrer"&gt;Colab Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Improvements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scale &amp;amp; Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handle Large Documents&lt;/strong&gt;: Classify documents with hundreds of pages without performance degradation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VLM-Powered Classification&lt;/strong&gt;: Replaced OCR with Vision Language Models for faster, more accurate classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selective Processing&lt;/strong&gt;: Only parse pages that matter, reducing processing time and costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classify First&lt;/strong&gt;: Use the &lt;code&gt;classify&lt;/code&gt; endpoint to identify relevant pages based on your criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parse Selectively&lt;/strong&gt;: Set &lt;code&gt;page_range&lt;/code&gt; to only process the classified relevant pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Efficiently&lt;/strong&gt;: Apply structured extraction only to pages containing the information you need&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Use Case Example: SEC Filings Analysis
&lt;/h2&gt;

&lt;p&gt;This approach is particularly powerful for extracting specific information from lengthy documents like SEC filings. For example, when analyzing cryptocurrency holdings across multiple companies' 10-K and 10-Q reports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Each filing can be 100-200+ pages, but crypto-related information might only appear on 10-20 pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: First classify pages containing "digital assets holdings", then extract structured data only from those pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt;: 80-90% reduction in processing time and more focused, accurate extractions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.documentai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PageClassConfig&lt;/span&gt;

&lt;span class="n"&gt;doc_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Classify pages
&lt;/span&gt;&lt;span class="n"&gt;page_classifications&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;PageClassConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;digital_assets_holdings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pages showing cryptocurrency holdings on balance sheet...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;parse_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filing_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;page_classifications&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;page_classifications&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Parse only relevant pages
&lt;/span&gt;&lt;span class="n"&gt;relevant_pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_classes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_numbers&lt;/span&gt;
&lt;span class="n"&gt;page_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_and_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filing_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;page_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;page_range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt;: Process only what you need&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: Reduce processing time by focusing on relevant content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: VLM classification provides better understanding of page content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Handle large document sets without compromising performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;Check out our &lt;a href="https://tlake.link/notebooks/vlm-parsing" rel="noopener noreferrer"&gt;example notebook&lt;/a&gt; demonstrating how to extract cryptocurrency metrics from SEC filings using the new classification approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Update to the latest version of Tensorlake:&lt;br&gt;
&lt;code&gt;pip install --upgrade tensorlake&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then start classifying, summarizing, and extracting with improved efficiency!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>Precise Data Extraction: Pattern-Based Partitioning for Structured Extraction</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 15 Oct 2025 23:12:33 +0000</pubDate>
      <link>https://forem.com/tensorlake/precise-data-extraction-pattern-based-partitioning-for-structured-extraction-2fi5</link>
      <guid>https://forem.com/tensorlake/precise-data-extraction-pattern-based-partitioning-for-structured-extraction-2fi5</guid>
      <description>&lt;p&gt;Your document extraction pipeline is brittle. Hard-coded page ranges break when layouts shift, full-document parsing burns through tokens on irrelevant content, and template-based extraction fails when target data moves between document versions.&lt;/p&gt;

&lt;p&gt;Tensorlake's pattern-based extraction solves this within your &lt;code&gt;StructuredExtractionOption&lt;/code&gt; workflows. Define start and end patterns to extract only the data sections you need. &lt;/p&gt;

&lt;p&gt;No more parsing noise, no more layout dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Traditional Extraction
&lt;/h2&gt;

&lt;p&gt;Document layout variability breaks positional extraction logic. A financial report might place the "Total Assets" section across pages 3 and 4 in one document and entirely on page 7 in another. Parsing entire documents wastes compute cycles and introduces noise into structured extraction workflows. Fixed page or section boundaries miss target data that spans inconsistent locations across document sets.&lt;/p&gt;

&lt;p&gt;Pattern-based partitioning solves this by decoupling extraction logic from document layout through regex-driven zone targeting.&lt;/p&gt;
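
&lt;p&gt;Conceptually, zone targeting is just slicing parsed text between regex anchors. Here is a minimal, library-free sketch of the idea (illustrative only, not Tensorlake's implementation, which runs server-side during extraction):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Conceptual sketch of pattern-based zone targeting on parsed text.
import re

def extract_zone(text: str, start_pattern: str, end_pattern: str) -&gt; str:
    start = re.search(start_pattern, text)
    if not start:
        return ""  # the zone is absent from this document
    end = re.search(end_pattern, text[start.end():])
    stop = start.end() + end.start() if end else len(text)
    return text[start.start():stop]

statement = """ACCOUNT SUMMARY
Beginning balance  $4,210.00
Ending balance     $3,975.12
DAILY ACCOUNT ACTIVITY
10/01 Coffee shop  -$4.88"""

zone = extract_zone(statement,
                    r"\bACCOUNT\s+SUMMARY\b",
                    r"\bDAILY\s+ACCOUNT\s+ACTIVITY\b")
print(zone)  # only the Account Summary block, wherever it appears
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;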

&lt;h2&gt;
  
  
  Pattern-Based Partitioning: Content-Aware Extraction
&lt;/h2&gt;

&lt;p&gt;Pattern-based partitioning delivers precision targeting through regex pattern recognition within your &lt;code&gt;StructuredExtractionOption&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Precise&lt;/strong&gt;: Use regex patterns like &lt;code&gt;\\bAccount\\s+Summary\\b&lt;/code&gt; to identify exactly where your target data begins or ends, skipping irrelevant content that clutters extraction results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible&lt;/strong&gt;: Define start patterns to begin extraction at specific markers, end patterns to stop at precise terminators, or both to capture data between known markers. Extract only the contract section you need, not the entire 200-page document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Page&lt;/strong&gt;: Extract data that spans pages by focusing on content patterns rather than arbitrary page boundaries or specific section headers. Perfect for financial summaries, property listings, or contract clauses that don't respect document structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API-Native&lt;/strong&gt;: Integrate into existing workflows with simple JSON configuration in your parse endpoint calls, no architectural changes required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation: Four Steps to Pattern-Based Partitioning for Extraction
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify Your Patterns&lt;/strong&gt;&lt;br&gt;
Analyze your documents to find consistent text markers around target data. Look for headers like "Account Summary", "Grand Total:", or "Section 4.2" that reliably indicate extraction zones.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn42335bbnscj1dp68jk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn42335bbnscj1dp68jk.png" alt="TD Bank statement with highlighted sections showing 'ACCOUNT SUMMARY' and 'DAILY ACCOUNT ACTIVITY' headers. Left side shows 'Identify Partitioning Patterns' text with Tensorlake URL, demonstrating pattern-based extraction targeting." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure Extraction&lt;/strong&gt;&lt;br&gt;
Add pattern configuration to your &lt;code&gt;StructuredExtractionOption&lt;/code&gt; (an end-to-end sketch follows this list):&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"partition_strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"start_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;bACCOUNT&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+SUMMARY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;b"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"end_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;bDAILY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+ACCOUNT&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s+ACTIVITY&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;b"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test &amp;amp; Refine&lt;/strong&gt;&lt;br&gt;
Process sample documents and adjust patterns to capture exactly the data sections you need. Start broad, then narrow your regex patterns for precision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale Deployment&lt;/strong&gt;&lt;br&gt;
Apply consistent extraction rules across thousands of similar documents with confidence. Pattern-based targeting scales linearly: define once, extract consistently.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
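
&lt;p&gt;Putting the steps together, here is a hedged end-to-end sketch. The &lt;code&gt;DocumentAI&lt;/code&gt; client and &lt;code&gt;parse_and_wait&lt;/code&gt; call appear in our other posts; the exact shape of the &lt;code&gt;StructuredExtractionOption&lt;/code&gt; payload below (the schema and &lt;code&gt;partition_strategy&lt;/code&gt; keys) is an assumption that mirrors the JSON above, so check the docs for the authoritative signature:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged end-to-end sketch; the schema and partition_strategy keyword shapes
# are assumptions mirroring the JSON config above. Consult the Tensorlake docs.
from tensorlake.documentai import DocumentAI

doc_ai = DocumentAI()

account_summary_option = {  # hypothetical StructuredExtractionOption payload
    "schema": {
        "type": "object",
        "properties": {
            "beginning_balance": {"type": "string"},
            "ending_balance": {"type": "string"},
        },
    },
    "partition_strategy": {
        "patterns": {
            "start_patterns": [r"\bACCOUNT\s+SUMMARY\b"],
            "end_patterns": [r"\bDAILY\s+ACCOUNT\s+ACTIVITY\b"],
        }
    },
}

result = doc_ai.parse_and_wait(
    file="statement.pdf",
    structured_extraction_options=[account_summary_option],
)
print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;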

&lt;h2&gt;
  
  
  Results: Deterministic Extraction with Content-Aware Targeting
&lt;/h2&gt;

&lt;p&gt;Pattern-based partitioning delivers deterministic extraction performance across variable document layouts. Your extraction logic becomes resilient to structural changes. Financial summaries can migrate from page 3 to page 7, contract clauses can span different sections, and your pipeline continues extracting the correct data zones.&lt;/p&gt;

&lt;p&gt;The architectural benefit: your extraction logic becomes layout-agnostic. Instead of brittle positional dependencies, you define semantic boundaries that scale across document variations. The result: consistent structured outputs from inconsistent document inputs, with extraction accuracy that doesn't degrade as document templates evolve.&lt;/p&gt;

&lt;p&gt;Ready to stop guessing and start targeting your extractions? Try Tensorlake's pattern-based partitioning today:&lt;/p&gt;

&lt;p&gt;-&amp;gt; Documentation: &lt;a href="https://tlake.link/pattern-based-partioning" rel="noopener noreferrer"&gt;Learn more about pattern-based partitioning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;-&amp;gt; Colab Notebook: &lt;a href="https://tlake.link/notebooks/pattern-based-partitioning" rel="noopener noreferrer"&gt;Try the notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;-&amp;gt; Schedule a call and we’ll help you get going: &lt;a href="https://tlake.link/chat" rel="noopener noreferrer"&gt;Book a call now&lt;/a&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>dataengineering</category>
      <category>llm</category>
    </item>
    <item>
      <title>Document engineers/DevRel - how are you using Claude (or other tools) in content creation?</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Fri, 19 Sep 2025 18:59:37 +0000</pubDate>
      <link>https://forem.com/drguthals/document-engineersdevrel-how-are-you-using-claude-or-other-tools-in-content-creation-2i7m</link>
      <guid>https://forem.com/drguthals/document-engineersdevrel-how-are-you-using-claude-or-other-tools-in-content-creation-2i7m</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-story__hidden-navigation-link"&gt;Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2856702" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Sep 19 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" id="article-link-2856702"&gt;
          Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Working Effectively with Claude: From Vibe Prompting to Context Engineering for Technical Content</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Fri, 19 Sep 2025 18:58:28 +0000</pubDate>
      <link>https://forem.com/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl</link>
      <guid>https://forem.com/drguthals/working-effectively-with-claude-from-vibe-prompting-to-context-engineering-for-technical-content-46gl</guid>
      <description>&lt;p&gt;&lt;em&gt;How to leverage AI as a collaborative tool for creating educational content that actually works&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In my recent post about &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i"&gt;The Mythical Vibe-Month&lt;/a&gt;, I wrote about how "vibe coding" (aka throwing prompts at LLMs and hoping for magic) creates fragile, context-free outputs. But here's what I didn't mention: the same principle applies to content creation.&lt;/p&gt;

&lt;p&gt;Over the past several months, I've been working with Claude to create technical documentation, blog posts, tutorials, and educational materials. Not through vibe prompting, but through what I call &lt;strong&gt;collaborative context engineering&lt;/strong&gt;: treating Claude as a learning partner rather than a magic answer machine.&lt;/p&gt;

&lt;p&gt;The difference? Instead of expecting Claude to be the "sage on the stage" delivering perfect content from thin air, I position &lt;em&gt;myself&lt;/em&gt; as the "guide on the side," directing our collaboration toward better outcomes. This mirrors how effective learning actually works: through dialogue, iteration, and building understanding together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Vibe Prompting for Content
&lt;/h2&gt;

&lt;p&gt;Most people approach AI content creation like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write a blog post about document processing for developers"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then they're surprised when the output feels generic, misses crucial context, or doesn't match their voice. It's the content equivalent of landing in Knockturn Alley instead of Diagon Alley; close enough to feel right, but missing the mark entirely.&lt;/p&gt;

&lt;p&gt;The problem isn't the AI's capability. The problem is that &lt;strong&gt;good technical content requires lived context&lt;/strong&gt; that can't be vibed into existence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The specific pain points your audience faces&lt;/li&gt;
&lt;li&gt;The mental models that help concepts click&lt;/li&gt;
&lt;li&gt;The edge cases and gotchas from real implementation&lt;/li&gt;
&lt;li&gt;Your unique perspective and voice&lt;/li&gt;
&lt;li&gt;The broader strategic context of why this content matters&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Context Engineering for Content Creation
&lt;/h2&gt;

&lt;p&gt;Instead of hoping Claude vibes its way to good content, I treat our collaboration as a context engineering exercise. Here's my actual process:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with Research, Not Writing
&lt;/h3&gt;

&lt;p&gt;When I want Claude to help with content, I never start with "write this for me." I start with "learn about this with me."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My typical opening prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want you to do research on [topic] and learn about the challenges facing my ICP by asking me questions, if needed. You can also find information by search competitors such as X, Y, or Z, referencing previously published work [links] and reading through docs [links]. You should also do research on me to better understand how I would position and explain this topic. You can find information about me on my GitHub, LinkedIn, and resume. You should also read these blog posts to understand my tone [links]."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't delegation; it's collaboration setup. I'm giving Claude the tools to understand both the subject matter and my perspective. Claude then researches, asks follow-up questions, and builds context before we ever touch content creation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Claude becomes a learning partner who understands my voice, my audience, and my goals, rather than a content generator working from assumptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Guide Through Questions, Don't Dictate Answers
&lt;/h3&gt;

&lt;p&gt;After Claude does initial research, I don't immediately jump to "now write the thing." Instead, I let Claude ask me questions...lots of them.&lt;/p&gt;

&lt;p&gt;In our collaborations, Claude typically asks things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What are the biggest barriers you've observed that prevent engineers from achieving what they want with code?"&lt;/li&gt;
&lt;li&gt;"What does success look like for you in this role?"&lt;/li&gt;
&lt;li&gt;"How do you see the thread between [your previous work] and [current work]?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just information-gathering questions. They're the kinds of questions that help me clarify my own thinking. Often, Claude's questions surface insights I hadn't explicitly articulated, even to myself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Good content comes from clear thinking. The Socratic method Claude uses here forces me to crystallize ideas that might have been fuzzy, giving us both better material to work with.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Iterate Through Feedback, Not Replacement
&lt;/h3&gt;

&lt;p&gt;When Claude produces content, I never treat the first draft as the final product. Instead, I provide specific feedback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This section needs more technical depth"&lt;/li&gt;
&lt;li&gt;"The tone here doesn't match my voice from [previous article]"&lt;/li&gt;
&lt;li&gt;"Add a concrete example of [specific scenario]"&lt;/li&gt;
&lt;li&gt;"This analogy doesn't quite work for our audience"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, I don't just say "make it better." I give Claude the context it needs to improve in the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Each iteration teaches Claude more about my standards, voice, and audience. The content gets progressively better, and Claude gets progressively better at anticipating what I need.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Supplement with Additional Research
&lt;/h3&gt;

&lt;p&gt;Mid-conversation, I often ask Claude to research additional context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Look up the latest developments in [technology area]"&lt;/li&gt;
&lt;li&gt;"Find examples of how [specific company] approaches this problem"&lt;/li&gt;
&lt;li&gt;"Research the best practices around [implementation detail]"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't because Claude's initial research was insufficient. It's because good content creation is an exploratory process. As we develop ideas together, new research needs emerge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Real-time research keeps our content current and comprehensive. It also models how I actually write: constantly fact-checking, finding examples, and building on new information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Publish vs. What Claude Produces
&lt;/h2&gt;

&lt;p&gt;Here's something crucial: &lt;strong&gt;I never publish Claude's output directly.&lt;/strong&gt; What Claude produces is sophisticated first-draft material that captures my voice and ideas, but it's not my final work.&lt;/p&gt;

&lt;p&gt;My published content goes through several more layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural editing&lt;/strong&gt; where I reorganize for better flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice refinement&lt;/strong&gt; where I adjust tone and style to exactly match my perspective&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical validation&lt;/strong&gt; where I verify every code example and technical claim&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience optimization&lt;/strong&gt; where I add specific details that resonate with my community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of Claude as an extremely capable research assistant and thought partner, not a ghostwriter. The ideas, insights, and expertise are mine. Claude helps me organize and articulate them more effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Engineers Building with AI
&lt;/h2&gt;

&lt;p&gt;If you're working on agentic applications, LLM integrations, or AI-powered developer tools, this collaborative approach offers several lessons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context is everything.&lt;/strong&gt; The most sophisticated AI in the world can't substitute for domain knowledge, user empathy, and situational awareness. Your job isn't to automate human expertise away, it's to amplify it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration beats generation.&lt;/strong&gt; Instead of building systems that try to produce perfect outputs in one shot, build systems that support rapid iteration and refinement. The magic happens in the feedback loop, not the initial output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI should teach, not just execute.&lt;/strong&gt; The best AI tools I've used don't just perform tasks, they help me understand problems better, ask better questions, and develop better solutions. Claude's questioning approach has genuinely improved my thinking about content strategy and audience needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Takeaways
&lt;/h2&gt;

&lt;p&gt;Whether you're creating documentation, building developer education, or working on any AI-powered content creation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up context before requesting output.&lt;/strong&gt; Give your AI tool the background information it needs to understand your goals, audience, and constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use AI to help you think, not just to produce.&lt;/strong&gt; The best prompts are often questions that help clarify your own thinking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan for iteration.&lt;/strong&gt; Your first output won't be your final output. Build feedback loops into your workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintain human judgment.&lt;/strong&gt; AI can help you articulate ideas and organize thoughts, but the expertise, creativity, and final decisions should remain yours.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Meta-Point About Tools
&lt;/h2&gt;

&lt;p&gt;This brings me back to the core insight about working with any powerful tool: &lt;strong&gt;when you have a hammer, not everything is a nail.&lt;/strong&gt; Claude is incredibly capable, but knowing when and how to use it effectively makes all the difference.&lt;/p&gt;

&lt;p&gt;The goal isn't to automate content creation. The goal is to amplify human expertise, accelerate the iteration cycle, and create better outcomes through collaboration.&lt;/p&gt;

&lt;p&gt;In my work, this approach has helped me create documentation that developers actually use, tutorials that successfully onboard new users, and content that genuinely advances our mission of making document AI accessible to engineers at any level.&lt;/p&gt;

&lt;p&gt;That's the difference between vibe prompting and context engineering. One hopes for magic. The other creates it, systematically and sustainably.&lt;/p&gt;

&lt;p&gt;Because at the end of the day, the best tools don't replace human expertise, they make that expertise more powerful.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Ever asked a model “where did you get that?” Citations for document parsing solve that exact trust gap. Fewer hallucinations. More trust. Reliable workflows. Read the article, watch the video, try the notebook 👇</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 03 Sep 2025 19:50:17 +0000</pubDate>
      <link>https://forem.com/drguthals/ever-asked-a-model-where-did-you-get-that-citations-for-document-parsing-solve-that-exact-27i5</link>
      <guid>https://forem.com/drguthals/ever-asked-a-model-where-did-you-get-that-citations-for-document-parsing-solve-that-exact-27i5</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-story__hidden-navigation-link"&gt;Verify Structured Output with Field-Level Citations&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/tensorlake"&gt;
            &lt;img alt="Tensorlake logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10835%2Fac1cc915-0d29-4bf0-a2a3-68fee95acfee.png" class="crayons-logo__image"&gt;
          &lt;/a&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2815571" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/tensorlake" class="crayons-story__secondary fw-medium"&gt;Tensorlake&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Sep 3 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" id="article-link-2815571"&gt;
          Verify Structured Output with Field-Level Citations
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Verify Structured Output with Field-Level Citations</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Wed, 03 Sep 2025 16:45:00 +0000</pubDate>
      <link>https://forem.com/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc</link>
      <guid>https://forem.com/tensorlake/make-rag-provable-page-bbox-citations-for-all-extracted-data-4ipc</guid>
      <description>&lt;p&gt;Missing evidence is one of the biggest blockers in production AI workflows.  &lt;/p&gt;

&lt;p&gt;It’s not enough to say &lt;em&gt;what&lt;/em&gt; a document claims; you need to show &lt;em&gt;where&lt;/em&gt; in the source that claim came from. Whether you’re auditing bank statements, verifying medical referral forms, or investigating fraud, traceability is a hard requirement.&lt;/p&gt;

&lt;p&gt;That’s why we’ve introduced a new parameter in Tensorlake’s &lt;code&gt;StructuredExtractionOptions&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;StructuredExtractionOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;schema_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ExampleSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ExampleSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;provide_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;provide_citations=True&lt;/code&gt;, every extracted field includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page number&lt;/li&gt;
&lt;li&gt;Bounding box (bbox) coordinates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means structured outputs are no longer just machine-readable; they’re auditable, verifiable, and traceable back to the source document.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/dfTp4xEceoQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Traceable Context Means Trustworthy RAG
&lt;/h2&gt;

&lt;p&gt;In many workflows, “close enough” isn’t good enough. Teams need confidence that extracted values align with the document’s ground truth. Let’s look at where this matters most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Banking &amp;amp; Finance&lt;/strong&gt;: Auditors need to understand exactly which account, statement, or transaction produced a reported number. If an account balance doesn’t reconcile, citations let you trace back to the precise page and bounding box where the discrepancy originates. No more guesswork in backtracking totals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt;: When anomalies appear in reported values, bounding-box citations provide the evidence trail. Investigators can quickly verify whether a suspicious number came from an altered document, a duplicated entry, or a genuine filing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare &amp;amp; Forms Processing&lt;/strong&gt;: At UCLA, teams processing medical referral forms wanted faster verification of ground truth. With citations, a structured field (like “referral date” or “doctor’s signature”) can point directly to the page span and bounding box where it was found, cutting human review time dramatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Citations turn structured extraction into a compliance-grade tool.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Implement Citations with One Line of Code
&lt;/h2&gt;

&lt;p&gt;Let’s take a simple example: extracting transaction summaries from a bank statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.documentai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StructuredExtractionOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transaction amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BankStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Transaction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;doc_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;structured_extraction_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;StructuredExtractionOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;schema_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BankStatement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BankStatement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;provide_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;   &lt;span class="c1"&gt;# &amp;lt;-- new parameter
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_and_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://tlake.link/documents/bank-statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;structured_extraction_options&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned JSON now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"transactions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"08/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Date_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;59&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;135&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;448&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"50.00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;515&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;585&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;447&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"descriptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ATM CASH DEPOSIT, ***** 30073995581 AUT 082220 ATM CASH DEPOSIT 550 LONG BEACH BLVD LONG BEACH * NY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"descriptions_citation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"page_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;135&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;515&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;447&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each field is now annotated with a citation: the page number and bounding-box coordinates.&lt;/p&gt;
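
&lt;p&gt;To consume those citations programmatically, you can walk the returned data and pair every field with its source location. A minimal sketch, assuming the &lt;code&gt;field&lt;/code&gt;/&lt;code&gt;field_citation&lt;/code&gt; key convention shown above and the &lt;code&gt;result&lt;/code&gt; object from the earlier example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: pair each extracted field with its citation.
# Assumes `result` from the extraction example above and that `data`
# matches the field / field_citation JSON shape shown.
data = result.structured_data[0].data

for transaction in data["transactions"]:
    for key, value in transaction.items():
        if key.endswith("_citation"):
            continue
        for cite in transaction.get(f"{key}_citation", []):
            print(
                f"{key} = {value!r} on page {cite['page_number']}, "
                f"bbox ({cite['x1']}, {cite['y1']}, {cite['x2']}, {cite['y2']})"
            )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;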

&lt;p&gt;If you use our Tensorlake Cloud Playground, you can even get the visual bounding boxes labeled for each extracted bit of information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu93p6w9jv0matr8kelxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu93p6w9jv0matr8kelxg.png" alt="Example of Tensorlake structured data extraction with citations, showing JSON output linked to highlighted fields on a TD Bank statement. A $50.00 transaction is mapped from the document to the JSON citation with bounding box coordinates."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Data to Evidence
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“In insurance, structured outputs power our workflows, but people still verify. With field-level citations, reviewers can jump from a data row straight to the exact COI or endorsement language. That’s the difference between ‘parsed’ and provable.”&lt;/p&gt;

&lt;p&gt;— &lt;em&gt;Jesse McClure, CTO and Co-Founder, &lt;a href="https://www.sublynk.com/" rel="noopener noreferrer"&gt;Sublynk&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Citations aren’t just a nice-to-have; our customers across industries know that they unlock new workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit-ready outputs&lt;/strong&gt;: Every number is backed by ground-truth evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated review&lt;/strong&gt;: Flag discrepancies automatically and point reviewers directly to the source (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainability in RAG/Agents&lt;/strong&gt;: Don’t just return answers—return the highlighted document snippets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Enhancements&lt;/strong&gt;: Build document viewers that highlight the exact fields extracted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benefit is twofold: engineers can build more reliable systems and stakeholders (auditors, compliance teams, regulators) get confidence and transparency.&lt;/p&gt;
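
&lt;p&gt;As one concrete example of the automated-review pattern above, here's a hedged sketch that reconciles extracted amounts against an expected total and, when they disagree, points reviewers at the exact source regions. &lt;code&gt;expected_total&lt;/code&gt; is a hypothetical value from your own ledger, not part of the Tensorlake API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch: flag a discrepancy and surface the evidence trail.
# `expected_total` is a hypothetical ledger value; `data` is the
# citation-annotated dict from the sketch above.
expected_total = 1325.00

extracted_total = sum(float(t["amount"]) for t in data["transactions"])

if extracted_total != expected_total:  # use a tolerance in practice
    print(f"Discrepancy: extracted {extracted_total}, expected {expected_total}")
    for t in data["transactions"]:
        for cite in t.get("amount_citation", []):
            print(
                f"  review amount {t['amount']} on page {cite['page_number']}, "
                f"bbox ({cite['x1']}, {cite['y1']}, {cite['x2']}, {cite['y2']})"
            )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;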

&lt;h2&gt;
  
  
  Try Structured Extraction Citations Now
&lt;/h2&gt;

&lt;p&gt;You can try &lt;code&gt;provide_citations=True&lt;/code&gt; today in both the Tensorlake Playground and the API/Python SDK.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docs: &lt;a href="https://tlake.link/citations" rel="noopener noreferrer"&gt;Structured Extraction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Example Notebook: &lt;a href="https://tlake.link/notebooks/citations" rel="noopener noreferrer"&gt;Parse Bank Statements&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions or feedback, we'd love to hear from you! Join our &lt;a href="https://tlake.link/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt; and let us know how you're using citations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traceability Built In
&lt;/h2&gt;

&lt;p&gt;With the new &lt;code&gt;provide_citations&lt;/code&gt; parameter, structured extraction becomes not only machine-readable but also evidence-backed.&lt;/p&gt;

&lt;p&gt;Every field can now point back to its exact source location in the document, making Tensorlake the foundation for audit-ready, compliance-grade, and fraud-resistant AI workflows.&lt;/p&gt;

&lt;p&gt;Start using it today. In production AI, traceability isn’t optional.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I've been dancing around this idea for a long time now. Regardless of my title, how the industry is shaping, how tech evolves, the core of why I love this industry is because it's really all about learning...and I'm a learning nerd 🤓</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 21 Aug 2025 02:22:29 +0000</pubDate>
      <link>https://forem.com/drguthals/ive-been-dancing-around-this-idea-for-a-long-time-now-regardless-of-my-title-how-the-industry-is-36h6</link>
      <guid>https://forem.com/drguthals/ive-been-dancing-around-this-idea-for-a-long-time-now-regardless-of-my-title-how-the-industry-is-36h6</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-story__hidden-navigation-link"&gt;The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/drguthals" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" alt="drguthals profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/drguthals" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sarah Guthals, PhD
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sarah Guthals, PhD
                
              
              &lt;div id="story-author-preview-content-2786463" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/drguthals" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438497%2F31762a59-12b3-4ff9-9e95-3e73d3ac9dd2.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sarah Guthals, PhD&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Aug 21 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" id="article-link-2786463"&gt;
          The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/contextengineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;contextengineering&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vibecoding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vibecoding&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>programming</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools</title>
      <dc:creator>Sarah Guthals, PhD</dc:creator>
      <pubDate>Thu, 21 Aug 2025 02:20:19 +0000</pubDate>
      <link>https://forem.com/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i</link>
      <guid>https://forem.com/drguthals/the-mythical-vibe-month-vibe-coding-context-engineering-and-the-future-of-ai-dev-tools-m5i</guid>
      <description>&lt;p&gt;In &lt;em&gt;The Mythical Man-Month&lt;/em&gt;, Fred Brooks famously wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 1975, that was programming itself; type the right sequence of symbols and something new appeared.&lt;/p&gt;

&lt;p&gt;Today, we’re living through a new version of that magic. With large language models, you don’t even need the exact incantation. A vague prompt - &lt;em&gt;a vibe&lt;/em&gt; - can conjure up working code. It feels like we’ve entered what I like to call &lt;strong&gt;The Mythical Vibe-Month&lt;/strong&gt;: a world where AI gives us the illusion of infinite acceleration.&lt;/p&gt;

&lt;p&gt;But here’s the rub: &lt;em&gt;&lt;strong&gt;magic without context is messy&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coding’s Surface-Level Problem
&lt;/h2&gt;

&lt;p&gt;“Vibe coding” works beautifully in demos, toy projects, and small scripts. But in production systems, it misses the most important ingredient: &lt;strong&gt;context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And I don’t just mean the context of the code snippet. I mean the living context of a software project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Slack conversations that explain why a shortcut was chosen.&lt;/li&gt;
&lt;li&gt;The GitHub issues where trade-offs were debated.&lt;/li&gt;
&lt;li&gt;The PR comments that capture edge cases and gotchas.&lt;/li&gt;
&lt;li&gt;The searches, docs, and past LLM queries engineers already ran.&lt;/li&gt;
&lt;li&gt;The experiments, bugs, fixes, and fast follows that shaped today’s code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs can’t see any of that. So they often generate code that’s plausible, but out of tune with the broader system. This adds hidden complexity, reintroduces old bugs, or sometimes over-engineers where a simpler fix would have worked.&lt;/p&gt;

&lt;p&gt;This mirrors what I studied in my grad-school days, where I focused on designing learning experiences that &lt;em&gt;enculturate novices&lt;/em&gt;. Teaching novices syntax alone doesn’t make them programmers. They need the culture of programming: exposure to how experts debug, comment, negotiate trade-offs, and work together.&lt;/p&gt;

&lt;p&gt;Without that context, their “magic” fizzles.&lt;/p&gt;

&lt;p&gt;It’s a bit like (and forgive the reference...we’ll leave the author out of it 🙃) when a certain boy wizard mispronounces &lt;em&gt;Diagon Alley&lt;/em&gt; and ends up somewhere entirely unintended. The spell was close enough to feel right, but without precision and context, he landed in &lt;em&gt;Knockturn Alley&lt;/em&gt; instead.&lt;/p&gt;

&lt;p&gt;AI today is in that same position. It can chant the incantations, but without the lived context of the codebase and its history, it often lands us in the wrong alley.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Engineering: The Antidote to Vibes Alone
&lt;/h2&gt;

&lt;p&gt;This is where context engineering comes in: the discipline of giving AI systems the right information, in the right form, at the right time.&lt;/p&gt;

&lt;p&gt;Instead of hoping an LLM vibes its way into correctness, context engineering means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capturing rationale, history, and constraints alongside code.&lt;/li&gt;
&lt;li&gt;Distilling unstructured knowledge (docs, PDFs, logs, contracts) into structured signals.&lt;/li&gt;
&lt;li&gt;Connecting artifacts across the software lifecycle so AI can see the bigger picture.&lt;/li&gt;
&lt;li&gt;Making the invisible visible so the AI doesn’t just guess, it reasons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With context, AI shifts from being a clumsy novice to a genuine collaborator.&lt;/p&gt;
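
&lt;p&gt;To make that shape concrete, here's a minimal, hedged sketch: gather the rationale living around a piece of code (issues, PR comments, docs) and hand it to the model alongside the snippet itself. The file names and helper are illustrative, not a specific tool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of context engineering: assemble the living context
# around a change (issues, PR comments, docs) into the prompt itself.
# File names are placeholders; in practice these would come from your
# issue tracker, VCS, and docs pipeline.
from pathlib import Path

def build_context(snippet, artifact_paths):
    sections = [f"Code under discussion:\n{snippet}"]
    for path in artifact_paths:
        sections.append(f"--- {path} ---\n{Path(path).read_text()}")
    return "\n\n".join(sections)

prompt = build_context(
    snippet=Path("billing/retry.py").read_text(),
    artifact_paths=["issues/1423.md", "pr_comments/88.md", "docs/retries.md"],
)
# `prompt` now carries the rationale, trade-offs, and history the model
# would otherwise have to guess at.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;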

&lt;p&gt;At Tensorlake, this is exactly what we’re focused on, but from the document perspective. It's why I joined this company. This has been a problem since "before AI" because it's a &lt;em&gt;&lt;strong&gt;learning&lt;/strong&gt;&lt;/em&gt; problem. We need to start addressing AI dev tools for vibe coding the way we address AI data tools: unlock data that’s trapped in unstructured formats so that both humans and AI can use it as context. Not bigger models. Not longer prompts. Smarter inputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Engineers
&lt;/h2&gt;

&lt;p&gt;For engineers experimenting with AI, this is the difference between a parlor trick and a production tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With just vibes&lt;/strong&gt;: AI accelerates you today but introduces subtle complexity for tomorrow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With context&lt;/strong&gt;: AI can understand systems, not just snippets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it’s not just about the AI. Engineers themselves should be using these AI-driven dev tools to learn.&lt;/p&gt;

&lt;p&gt;Learning a new framework? Use AI to surface not just the docs, but the design decisions and trade-offs baked into them.&lt;/p&gt;

&lt;p&gt;Trying to understand a legacy codebase? Use AI tools that highlight the history of changes, PR debates, and bugs fixed, not just the latest code snapshot.&lt;/p&gt;

&lt;p&gt;Building awareness in a fast-moving team? Let AI summarize Slack threads, issues, and commits so you don’t miss evolving context.&lt;/p&gt;

&lt;p&gt;In other words: don’t just let AI code for you. Let it teach you, by surfacing the cultural and contextual knowledge that makes the code what it is. That’s how engineers can stay enculturated in their own systems, even as those systems evolve.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Please do all this responsibly. This is not the post to dive into ethics, but I hope you understand what "responsibly" means.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Fred Brooks showed us that programming itself once felt like magic; typing the right incantation to summon something new. Today, AI has made that magic even more accessible. But without context, it’s the wrong kind of magic: flashy, fragile, and ultimately unsustainable.&lt;/p&gt;

&lt;p&gt;When I think about my research on how people learn to program, the lesson for AI is the same: magic isn’t learned in isolation. It’s learned in community, through practice, feedback, and, most importantly, context.&lt;/p&gt;

&lt;p&gt;If Brooks were writing today, I think he’d smile at the idea of The Mythical Vibe-Month. But he’d also remind us that engineering discipline is what makes software scale.&lt;/p&gt;

&lt;p&gt;Vibes are the incantation.&lt;br&gt;
Context is the curriculum.&lt;br&gt;
And that’s what turns messy magic into real mastery.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>vibecoding</category>
    </item>
  </channel>
</rss>
