<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: dengkui yang</title>
    <description>The latest articles on Forem by dengkui yang (@dengkui_yang_fcb5dbe2da32).</description>
    <link>https://forem.com/dengkui_yang_fcb5dbe2da32</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891878%2F09738b27-6499-46ca-a364-4d3336583d7d.png</url>
      <title>Forem: dengkui yang</title>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dengkui_yang_fcb5dbe2da32"/>
    <language>en</language>
    <item>
      <title>A Research Workflow That Starts With Sources, Not Prompts</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:21:14 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/a-research-workflow-that-starts-with-sources-not-prompts-1f79</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/a-research-workflow-that-starts-with-sources-not-prompts-1f79</guid>
      <description>&lt;p&gt;&lt;em&gt;How private AI notebooks turn scattered files, links, notes, and local models into a reusable thinking loop.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Based on public materials from&lt;/em&gt; &lt;code&gt;opennotebook.shop&lt;/code&gt; &lt;em&gt;and the open-source&lt;/em&gt; &lt;code&gt;open-notebook&lt;/code&gt; &lt;em&gt;repository reviewed on April 30, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Many AI note-taking tools begin with the same interface: a blank prompt box.&lt;/p&gt;

&lt;p&gt;That is convenient, but it quietly puts the wrong thing at the center. Real research does not start with a prompt. It starts with a pile of material: papers, links, meeting notes, transcripts, PDFs, half-formed thoughts, and questions that become clearer only after you spend time with the sources.&lt;/p&gt;

&lt;p&gt;This makes Open Notebook useful to examine as a workflow idea. The &lt;a href="https://www.opennotebook.shop/" rel="noopener noreferrer"&gt;opennotebook.shop page&lt;/a&gt; presents a simple flow: add files, links, and notes; ask questions; save cited answers; then turn the notebook into audio-style briefings with local or cloud models. The open-source project adds the deeper architecture: self-hosting, multiple model providers, full-text and vector search, context-aware chat, AI-assisted notes, podcasts, REST API access, and local model options such as Ollama.&lt;/p&gt;

&lt;p&gt;The useful question is not whether an AI notebook can answer a question.&lt;/p&gt;

&lt;p&gt;The useful question is whether it can help a person keep thinking after the answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Scenario: Turning Raw Material Into a Briefing
&lt;/h2&gt;

&lt;p&gt;Imagine a small team preparing for a product strategy review.&lt;/p&gt;

&lt;p&gt;They have customer interview notes, a few internal memos, a competitor page, a product analytics export, and a recording transcript from last week's meeting. None of these is enough on its own. Together, they contain a direction, but only if someone can collect them, ask better questions, preserve evidence, and turn the result into something reusable.&lt;/p&gt;

&lt;p&gt;The common AI shortcut is to paste everything into a chatbot and ask for a summary.&lt;/p&gt;

&lt;p&gt;That works once. Then the problems begin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which source supported the summary?&lt;/li&gt;
&lt;li&gt;Which parts came from private notes versus public links?&lt;/li&gt;
&lt;li&gt;What should be sent to a cloud model, and what should stay local?&lt;/li&gt;
&lt;li&gt;Where does the useful answer go after the chat ends?&lt;/li&gt;
&lt;li&gt;Can the team turn the result into a note, a briefing, or a follow-up research plan?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the notebook metaphor becomes more than UI. A notebook is not just where answers appear. It is where the research state accumulates.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start by Protecting the Difference Between Sources and Notes
&lt;/h2&gt;

&lt;p&gt;A good research workflow begins by refusing to collapse everything into "content."&lt;/p&gt;

&lt;p&gt;Sources and notes are not the same thing.&lt;/p&gt;

&lt;p&gt;Sources are evidence. They are the imported material: files, links, transcripts, videos, audio, pasted text. They should remain stable and referenceable because they are the ground from which later claims are made.&lt;/p&gt;

&lt;p&gt;Notes are thinking. They are summaries, extracted insights, saved answers, manual observations, and decisions made after interacting with sources. Notes should be editable because understanding changes.&lt;/p&gt;

&lt;p&gt;Open Notebook's own mental model follows this split: notebooks contain sources and notes. Sources are processed, indexed, and searchable. Notes are the evolving layer of insight.&lt;/p&gt;

&lt;p&gt;This distinction matters more than it first appears. If a system lets generated summaries blur into source material, the notebook becomes harder to trust. If a system preserves source identity, then every later output can be inspected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This answer came from these sources.&lt;/li&gt;
&lt;li&gt;This note was created from that interaction.&lt;/li&gt;
&lt;li&gt;This briefing reused these materials.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In ontology terms, a source and a note exist differently inside the workspace because they support different interactions. A source supports verification. A note supports adaptation. Confusing the two weakens both.&lt;/p&gt;
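&lt;p&gt;A minimal data-model sketch can make that split concrete. This is purely illustrative (the class and field names are hypothetical, not Open Notebook's actual schema): sources are frozen so they stay referenceable as evidence, while notes are editable and cite the sources they depend on.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Source:
    """Imported evidence: stable and referenceable."""
    id: str
    kind: str      # e.g. "pdf", "link", "transcript"
    content: str   # kept unchanged so later claims can be verified

@dataclass
class Note:
    """Evolving thinking: editable, with a trail back to evidence."""
    id: str
    text: str
    cites: list = field(default_factory=list)  # ids of supporting Sources

interview = Source(id="src-1", kind="transcript", content="...interview text...")
note = Note(id="note-1", text="Churn risk concentrated in onboarding.",
            cites=[interview.id])

# Understanding changes, so the note may be revised; the source cannot be,
# which is what keeps the citation meaningful.
note.text += " Revised after second read."
```

&lt;p&gt;The asymmetry is the point: mutating a &lt;code&gt;Source&lt;/code&gt; raises an error, while a &lt;code&gt;Note&lt;/code&gt; invites it.&lt;/p&gt;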

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9i26haepwmor2xm94z4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9i26haepwmor2xm94z4.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: a private notebook is a transformation loop. Sources remain evidence; notes, answers, and audio become reusable layers of understanding.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Real Control Is Context
&lt;/h2&gt;

&lt;p&gt;Once sources are collected, the next decision is not "which prompt should I write?"&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;What should the model be allowed to see?&lt;/p&gt;

&lt;p&gt;This is the most underrated part of AI notebook design. Context is not just a technical limit. It is a privacy, cost, and reasoning boundary.&lt;/p&gt;

&lt;p&gt;Open Notebook's docs describe context levels such as full content, summary only, or not in context. That seems like a small control, but it changes the whole workflow.&lt;/p&gt;

&lt;p&gt;For the strategy review scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public competitor pages can go into full context.&lt;/li&gt;
&lt;li&gt;Customer interviews might be summarized before model use.&lt;/li&gt;
&lt;li&gt;Sensitive internal notes might stay out of cloud context entirely.&lt;/li&gt;
&lt;li&gt;A local model can be used for a first pass on private material.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a notebook becomes a cognitive tool rather than a chatbot. It gives the researcher a way to decide what participates in the current act of reasoning.&lt;/p&gt;

&lt;p&gt;The practical ontology idea is simple: boundaries are part of the object. A source shared in full is not operationally the same as a source represented only by a summary. A private note excluded from context is not participating in the same interaction network as a public article. Good AI tooling should let users express those differences.&lt;/p&gt;
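&lt;p&gt;The context-level idea can be sketched in a few lines. This is a hypothetical illustration of the "full content / summary only / not in context" levels, not Open Notebook's actual implementation; the key property is that exclusion is the default.&lt;/p&gt;

```python
# Per-source context levels: what each source is allowed to contribute.
FULL, SUMMARY, EXCLUDED = "full", "summary", "excluded"

def build_context(sources, levels):
    """Assemble only the material each source is permitted to expose."""
    parts = []
    for src in sources:
        level = levels.get(src["id"], EXCLUDED)  # unlisted sources stay out
        if level == FULL:
            parts.append(src["content"])
        elif level == SUMMARY:
            parts.append(src["summary"])
        # EXCLUDED sources never reach the model at all
    return "\n\n".join(parts)

sources = [
    {"id": "competitor-page", "content": "Full public text...",
     "summary": "Public summary"},
    {"id": "interviews", "content": "Raw quotes...",
     "summary": "Themes: onboarding friction"},
    {"id": "internal-memo", "content": "Sensitive details...",
     "summary": "Sensitive summary"},
]
levels = {"competitor-page": FULL, "interviews": SUMMARY}  # memo: excluded

context = build_context(sources, levels)
```

&lt;p&gt;The competitor page arrives in full, the interviews arrive only as themes, and the internal memo never participates in the reasoning at all.&lt;/p&gt;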




&lt;h2&gt;
  
  
  Chat Is for Exploration, Ask Is for Discovery, Transformations Are for Reuse
&lt;/h2&gt;

&lt;p&gt;One reason prompt-first tools feel shallow is that every task becomes the same interaction.&lt;/p&gt;

&lt;p&gt;Research has more shapes than that.&lt;/p&gt;

&lt;p&gt;Sometimes the team wants a conversation:&lt;/p&gt;

&lt;p&gt;"Compare these two customer interviews. What tension do you see?"&lt;/p&gt;

&lt;p&gt;That is Chat. The user chooses the context, asks follow-up questions, and steers the reasoning.&lt;/p&gt;

&lt;p&gt;Sometimes the team wants discovery:&lt;/p&gt;

&lt;p&gt;"Across all sources, what are the strongest arguments for delaying the launch?"&lt;/p&gt;

&lt;p&gt;That is Ask. Retrieval matters because the user may not know where the relevant evidence lives.&lt;/p&gt;

&lt;p&gt;Sometimes the team wants repeatability:&lt;/p&gt;

&lt;p&gt;"For each interview, extract pain points, buying triggers, objections, and quoted phrases."&lt;/p&gt;

&lt;p&gt;That is a transformation. The goal is not a conversation but a consistent note structure that can be compared later.&lt;/p&gt;

&lt;p&gt;Open Notebook separates these modes, and that separation is healthy. Chat, Ask, and Transformations are not three labels for the same thing. They are three ways of working with knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat keeps the thinking fluid.&lt;/li&gt;
&lt;li&gt;Ask finds relevant material across the notebook.&lt;/li&gt;
&lt;li&gt;Transformations turn raw sources into structured notes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good workflow uses all three. The team might transform every interview into a consistent format, use Ask to find cross-source patterns, and then use Chat to reason through tradeoffs before saving the final answer as a note.&lt;/p&gt;
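&lt;p&gt;The transformation mode is the easiest to sketch, because its defining property is repeatability: the same extraction applied to every interview, yielding notes with an identical shape. The function and field names below are hypothetical, and the model call is a stub.&lt;/p&gt;

```python
# A transformation is a fixed mapping from source text to a structured note,
# so results from different interviews stay directly comparable.
EXTRACTION_FIELDS = ["pain_points", "buying_triggers", "objections", "quotes"]

def fake_model(prompt):
    # Stand-in for a real LLM call: returns one entry per requested field.
    return {f: f"extracted {f}" for f in EXTRACTION_FIELDS}

def transform_interview(interview_text, model=fake_model):
    prompt = ("For this interview, extract "
              + ", ".join(EXTRACTION_FIELDS) + ":\n" + interview_text)
    result = model(prompt)
    # Every note comes back in the same shape, which is what makes
    # later cross-interview comparison possible.
    return {f: result[f] for f in EXTRACTION_FIELDS}

notes = [transform_interview(text) for text in ["interview A ...", "interview B ..."]]
```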

&lt;p&gt;That is much closer to how thinking actually happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Notebook Should Remember the Work
&lt;/h2&gt;

&lt;p&gt;The moment an AI answer becomes useful, it should not disappear into chat history.&lt;/p&gt;

&lt;p&gt;It should become part of the notebook.&lt;/p&gt;

&lt;p&gt;This is where saved answers and AI-assisted notes matter. A research workspace needs a way to turn a transient interaction into durable knowledge. Otherwise, the team repeats the same questions and loses the path from evidence to decision.&lt;/p&gt;

&lt;p&gt;In a good notebook workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw sources stay available for verification&lt;/li&gt;
&lt;li&gt;generated answers can be saved as notes&lt;/li&gt;
&lt;li&gt;manual notes can correct or extend AI output&lt;/li&gt;
&lt;li&gt;notes can become searchable material for later work&lt;/li&gt;
&lt;li&gt;citations keep processed claims connected to evidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not merely organization. It is internal adjustment.&lt;/p&gt;

&lt;p&gt;In existence-oriented language, a system survives and develops by acting outward and adjusting inward. For a research notebook, outward action means importing sources, asking questions, generating answers, and producing briefings. Inward adjustment means saving notes, changing context, revising interpretation, and keeping the workspace ready for the next question.&lt;/p&gt;

&lt;p&gt;That is the difference between using AI once and building understanding over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Audio Briefings Are Not a Gimmick if the Source Trail Survives
&lt;/h2&gt;

&lt;p&gt;Podcast-style output can look like a flashy feature, but in a research workflow it solves a real problem.&lt;/p&gt;

&lt;p&gt;Not every stakeholder will read the full notebook. Not every teammate has time to inspect every source. Sometimes the useful output is a short audio-style briefing that turns a messy pile of material into a listenable synthesis.&lt;/p&gt;

&lt;p&gt;Open Notebook's open-source materials describe podcast generation as a higher-level transformation: sources and notes become an outline, dialogue, text-to-speech output, and finally an audio file. The important part is not just the audio. It is the path from evidence to briefing.&lt;/p&gt;

&lt;p&gt;If the briefing is detached from the notebook, it becomes just another generated artifact. If it remains connected to sources and notes, it becomes a new consumption layer for the same research state.&lt;/p&gt;

&lt;p&gt;That matters because knowledge work is not only about producing text. It is about changing form without losing traceability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source to answer&lt;/li&gt;
&lt;li&gt;answer to note&lt;/li&gt;
&lt;li&gt;note to briefing&lt;/li&gt;
&lt;li&gt;briefing back to follow-up questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Better-designed AI notebook systems should not only generate outputs. They should preserve the continuity between outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Local and Self-Hosted Options Change the Workflow
&lt;/h2&gt;

&lt;p&gt;Model choice changes behavior.&lt;/p&gt;

&lt;p&gt;If a team has to send everything to a single cloud model, it may over-share or avoid using AI for sensitive work. If the same notebook can use local models for privacy-sensitive passes and cloud models for less sensitive synthesis, the workflow becomes more flexible.&lt;/p&gt;

&lt;p&gt;Open Notebook's support for multiple providers and local options such as Ollama is valuable for this reason. It lets model selection become part of the work, not a hidden infrastructure detail.&lt;/p&gt;

&lt;p&gt;For the strategy review example, the team might use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a local model to summarize sensitive notes&lt;/li&gt;
&lt;li&gt;a cloud model to polish a non-sensitive stakeholder briefing&lt;/li&gt;
&lt;li&gt;embeddings and search to find relevant source chunks&lt;/li&gt;
&lt;li&gt;a self-hosted deployment to keep the notebook near private data&lt;/li&gt;
&lt;/ul&gt;
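&lt;p&gt;That routing decision can be expressed as a tiny policy function. Provider and model names here are illustrative assumptions, not Open Notebook configuration; the point is that sensitivity, not convenience, selects the model.&lt;/p&gt;

```python
# Pick a model by data sensitivity: private material stays on-machine.
MODELS = {
    "local": {"provider": "ollama", "name": "llama3"},
    "cloud": {"provider": "openai", "name": "gpt-4o-mini"},
}

def pick_model(task):
    # Sensitive material never leaves the machine.
    return MODELS["local"] if task["sensitive"] else MODELS["cloud"]

tasks = [
    {"job": "summarize customer interviews", "sensitive": True},
    {"job": "polish stakeholder briefing",   "sensitive": False},
]
routes = {t["job"]: pick_model(t)["provider"] for t in tasks}
```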

&lt;p&gt;Self-hosting is not free. It brings setup, updates, credentials, backups, and security responsibility. But it also gives a team more control over where research lives and how models interact with it.&lt;/p&gt;

&lt;p&gt;The point is not that every team must self-host.&lt;/p&gt;

&lt;p&gt;The point is that serious knowledge workflows need visible tradeoffs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Style of Notebook Is Really For
&lt;/h2&gt;

&lt;p&gt;Open Notebook is best understood as a tool for people who do not only want answers.&lt;/p&gt;

&lt;p&gt;They want a controlled path from source material to reusable understanding.&lt;/p&gt;

&lt;p&gt;That makes it relevant for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;researchers collecting papers, transcripts, and notes&lt;/li&gt;
&lt;li&gt;product teams preparing decisions from mixed evidence&lt;/li&gt;
&lt;li&gt;students building long-term understanding instead of one-off summaries&lt;/li&gt;
&lt;li&gt;consultants turning interviews and documents into client-ready briefings&lt;/li&gt;
&lt;li&gt;teams that need local or self-hosted workflows for sensitive context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is broader than any one product:&lt;/p&gt;

&lt;p&gt;Start with sources. Decide context. Ask and explore. Save notes. Transform knowledge into the format the next person can use.&lt;/p&gt;

&lt;p&gt;That is the workflow an AI notebook should support.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;A useful AI notebook is not simply the one that produces the smoothest answer to the first question.&lt;/p&gt;

&lt;p&gt;It is the one that helps a person keep control of the research process after that answer appears.&lt;/p&gt;

&lt;p&gt;Open Notebook points toward that direction: sources remain evidence, notes become evolving understanding, context control defines what the model can touch, and outputs can become briefings without losing their relationship to the notebook.&lt;/p&gt;

&lt;p&gt;That is why the product is more interesting as a cognitive workflow than as a chat interface.&lt;/p&gt;

&lt;p&gt;It starts with sources, not prompts.&lt;/p&gt;

&lt;p&gt;And when it works, it helps research become something you can return to, revise, and reuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;opennotebook.shop page: &lt;a href="https://www.opennotebook.shop/" rel="noopener noreferrer"&gt;https://www.opennotebook.shop/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Open Notebook open-source repository: &lt;a href="https://github.com/lfnovo/open-notebook" rel="noopener noreferrer"&gt;https://github.com/lfnovo/open-notebook&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Originally published on Medium: &lt;a href="https://medium.com/@li3169086779/a-research-workflow-that-starts-with-sources-not-prompts-134f86e53e5a" rel="noopener noreferrer"&gt;https://medium.com/@li3169086779/a-research-workflow-that-starts-with-sources-not-prompts-134f86e53e5a&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Why File-to-Markdown Conversion Is Becoming an AI Input Layer</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Thu, 30 Apr 2026 05:18:37 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/why-file-to-markdown-conversion-is-becoming-an-ai-input-layer-4hn0</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/why-file-to-markdown-conversion-is-becoming-an-ai-input-layer-4hn0</guid>
      <description>&lt;p&gt;&lt;em&gt;Based on public materials from&lt;/em&gt; &lt;code&gt;markitdown.store&lt;/code&gt; &lt;em&gt;and Microsoft's open-source&lt;/em&gt; &lt;code&gt;markitdown&lt;/code&gt; &lt;em&gt;repository reviewed on April 29, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most teams first meet document conversion as a utility problem.&lt;/p&gt;

&lt;p&gt;Take a PDF, a Word file, a spreadsheet, maybe a webpage, and turn it into text so an LLM can read it.&lt;/p&gt;

&lt;p&gt;That framing is understandable, but it is too small.&lt;/p&gt;

&lt;p&gt;Once you build retrieval, agents, or any serious document workflow, conversion stops being a side utility and starts becoming part of your system architecture.&lt;/p&gt;

&lt;p&gt;That is why &lt;a href="https://github.com/microsoft/markitdown" rel="noopener noreferrer"&gt;MarkItDown&lt;/a&gt; is interesting, and why the browser experience at &lt;a href="https://www.markitdown.store/" rel="noopener noreferrer"&gt;markitdown.store&lt;/a&gt; is worth paying attention to.&lt;/p&gt;

&lt;p&gt;The upstream open-source project from Microsoft is a lightweight Python utility for converting many file types into Markdown for LLM and text-analysis pipelines. The website makes that idea visible in a more inspectable way: the homepage presents upload, text, and URL inputs, shows a reviewable Markdown output panel, and explicitly frames the result as something you should inspect before using in retrieval or agent workflows.&lt;/p&gt;
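&lt;p&gt;The upstream library's basic Python usage is short; the sketch below follows the public README (it requires the package to be installed, e.g. via &lt;code&gt;pip install 'markitdown[all]'&lt;/code&gt;, and &lt;code&gt;report.pdf&lt;/code&gt; is a placeholder file; exact parameters and result attributes may differ across versions).&lt;/p&gt;

```python
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False)  # plugins stay opt-in
result = md.convert("report.pdf")      # any supported input format
print(result.text_content)             # Markdown string for LLM pipelines
```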

&lt;p&gt;That combination points to a bigger engineering idea:&lt;/p&gt;

&lt;p&gt;Markdown is not just an output format here.&lt;/p&gt;

&lt;p&gt;It is an input layer for AI systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;MarkItDown matters because it treats messy source files as something that should be normalized into a stable, reviewable, token-efficient working surface before deeper AI processing begins. The technical lesson is not "convert everything to plain text." The lesson is to preserve enough structure for downstream reasoning, while keeping clear trust boundaries around how files are fetched, parsed, and routed.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Real Job Is Normalization, Not Conversion
&lt;/h2&gt;

&lt;p&gt;If you only describe the task as "document conversion," you miss the real systems problem.&lt;/p&gt;

&lt;p&gt;The real problem is this:&lt;/p&gt;

&lt;p&gt;How do you turn heterogeneous files into something an LLM, a retriever, and a human reviewer can all reason about without each downstream component reinventing its own parser?&lt;/p&gt;

&lt;p&gt;That is a normalization problem.&lt;/p&gt;

&lt;p&gt;In a practical ontology sense, a document is not just a named file. It reveals itself through the interactions it supports. A spreadsheet invites table reasoning. A webpage carries links and hierarchy. A PDF may contain layout clues, embedded images, or scanned pages. If you flatten everything too aggressively, you lose the very evidence downstream tools need.&lt;/p&gt;

&lt;p&gt;What makes MarkItDown useful is that it does not aim at perfect visual reproduction. It aims at a stable intermediate representation that still carries enough structure to matter.&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Figure: the important move is not just extraction, but converging mixed sources onto one working surface that humans and LLM systems can both inspect.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is where the site demo is helpful. It does not present conversion as a magical black box. It presents source choices, a visible Markdown result, and workflow toggles such as table output, source note, and local preview. That is exactly how an input layer should behave: not only transforming data, but exposing enough of the transformation for humans to verify it.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why Markdown Is a Strong Intermediate Format
&lt;/h2&gt;

&lt;p&gt;The MarkItDown README gives the clearest argument for the format choice.&lt;/p&gt;

&lt;p&gt;Its core claim is simple: Markdown stays close to plain text, but still preserves document structure such as headings, lists, tables, and links. The README also notes that mainstream LLMs are very comfortable with Markdown and that the format is relatively token-efficient.&lt;/p&gt;

&lt;p&gt;That is a stronger point than it first appears.&lt;/p&gt;

&lt;p&gt;A good intermediate format for AI needs at least four properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It should be legible to humans during review.&lt;/li&gt;
&lt;li&gt;It should preserve enough structure for retrieval and reasoning.&lt;/li&gt;
&lt;li&gt;It should avoid carrying unnecessary visual noise.&lt;/li&gt;
&lt;li&gt;It should move cheaply through prompts, indexes, and tool chains.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Markdown hits a practical balance.&lt;/p&gt;

&lt;p&gt;It is not a truth format. It does not preserve every layout detail. It is not the right choice for pixel-faithful publishing. But for review, chunking, citations, agent context, and post-processing, it is often a much better surface than raw OCR text or opaque binary formats.&lt;/p&gt;

&lt;p&gt;This is where the existence-oriented lens becomes useful without becoming abstract. Naming is not reality. A file called &lt;code&gt;report.pdf&lt;/code&gt; is not useful because we named it that way. It becomes useful when a system can interact with its actual content and recover a structure that supports later decisions.&lt;/p&gt;

&lt;p&gt;Markdown is valuable because it turns that recovered structure into something operational.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Coverage Matters, but Routing Matters More
&lt;/h2&gt;

&lt;p&gt;One reason MarkItDown has become popular is simple format breadth.&lt;/p&gt;

&lt;p&gt;According to the public repository, it currently supports conversion from PDF, PowerPoint, Word, Excel, images, audio, HTML, text-based formats such as CSV, JSON, and XML, ZIP archives, YouTube URLs, EPubs, and more. It also exposes optional dependency groups instead of forcing every installation to carry every parser.&lt;/p&gt;

&lt;p&gt;That design choice matters.&lt;/p&gt;

&lt;p&gt;In production, format support should not be monolithic. Different environments have different cost, security, and dependency constraints. A local notebook, a browser-assisted workflow, a server-side API, and an internal batch pipeline do not all want the exact same surface area.&lt;/p&gt;

&lt;p&gt;The README's plugin model reinforces this idea. Plugins are disabled by default, can be listed explicitly, and can extend conversion behavior such as OCR. That is a healthy signal. It treats conversion not as one magic parser, but as a policy surface that teams can widen carefully.&lt;/p&gt;

&lt;p&gt;The deeper lesson is this:&lt;/p&gt;

&lt;p&gt;format coverage is useful, but routing discipline is what makes coverage trustworthy.&lt;/p&gt;

&lt;p&gt;If every input takes the same path, you often end up with a system that is either too permissive or too brittle. Stronger systems separate lightweight paths from heavier ones, and trusted inputs from untrusted ones.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Trust Boundaries Are Part of the Design
&lt;/h2&gt;

&lt;p&gt;This is the part I find most important.&lt;/p&gt;

&lt;p&gt;MarkItDown's public README includes a direct security warning: it performs I/O with the privileges of the current process, so inputs should be sanitized and callers should prefer the narrowest &lt;code&gt;convert_*&lt;/code&gt; method that fits the job, such as &lt;code&gt;convert_stream()&lt;/code&gt; or &lt;code&gt;convert_local()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That warning should not be treated as boilerplate.&lt;/p&gt;

&lt;p&gt;It is a statement about architecture.&lt;/p&gt;

&lt;p&gt;A file conversion layer is not neutral. The moment it can open files, fetch URIs, or load parser dependencies, it becomes part of your trust boundary.&lt;/p&gt;
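&lt;p&gt;As a sketch of that advice, conversion can be fed an in-memory stream that the caller's own I/O layer has already fetched and validated, instead of letting the converter open paths or URIs itself. This assumes the package is installed, the file is a placeholder, and the exact &lt;code&gt;convert_stream&lt;/code&gt; signature (for example, whether a format hint is required) depends on the installed version.&lt;/p&gt;

```python
import io
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False)

# The caller controls the I/O; the converter only sees bytes.
with open("untrusted-upload.docx", "rb") as f:
    data = f.read()

result = md.convert_stream(io.BytesIO(data))  # narrowest method for the job
print(result.text_content)
```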

&lt;p&gt;The homepage of markitdown.store makes a similar idea visible at the product level. The demo distinguishes between lightweight text and URL paths on one side, and heavier formats such as PDF, Office files, images, audio, ZIP, and EPub on the other. It also notes that those heavier formats are routed to a hosted worker manifest, while the output panel reminds users to review the result before using it in production retrieval or agent workflows.&lt;/p&gt;

&lt;p&gt;That is a good design instinct.&lt;/p&gt;

&lt;p&gt;In ontology terms, boundaries are part of what a thing is. A local text input is not operationally the same object as an untrusted remote document. A CSV pasted into a textbox is not the same risk surface as a complex attachment that may trigger multiple external dependencies. If you erase those differences, the system becomes harder to reason about and easier to misuse.&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Figure: trustworthy conversion layers separate upload, isolation, routing, and downstream AI use instead of collapsing everything into one permissive parser path.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is also why review matters. A conversion pipeline should not act like every generated Markdown file is automatically fit for retrieval, summarization, or action. Reviewable output is not a cosmetic UI feature. It is part of the safety model.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. CLI, Python, Docker, and MCP Are Architecture Choices
&lt;/h2&gt;

&lt;p&gt;The project is also notable for how many entry points it exposes.&lt;/p&gt;

&lt;p&gt;The public materials show a command-line tool, a Python API, Docker usage, optional Azure Document Intelligence support, plugin hooks, and now an MCP server for integration with LLM applications.&lt;/p&gt;

&lt;p&gt;It is tempting to treat that as a feature checklist.&lt;/p&gt;

&lt;p&gt;I think the more useful way to read it is architectural:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLI fits batch conversion and shell workflows.&lt;/li&gt;
&lt;li&gt;Python fits ingestion services and custom pipelines.&lt;/li&gt;
&lt;li&gt;Docker fits repeatable execution boundaries.&lt;/li&gt;
&lt;li&gt;MCP fits agent ecosystems that want document conversion as a tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes MarkItDown more than a parser. It becomes a reusable normalization layer that can sit behind a browser UI, a backend worker, a local script, or an agent runtime without changing the core idea of the output.&lt;/p&gt;
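&lt;p&gt;For illustration, two of those entry points look roughly like this, paraphrased from the public repository (image tag and file paths are placeholders):&lt;/p&gt;

```shell
# CLI: batch conversion in shell workflows
markitdown path-to-file.pdf -o document.md

# Docker: a repeatable execution boundary around the same conversion
docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < path-to-file.pdf > document.md
```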

&lt;p&gt;For teams building document-aware AI systems, that consistency matters. You do not want four different conversion philosophies just because you have four different application surfaces.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Practical Checklist
&lt;/h2&gt;

&lt;p&gt;If you are designing a similar system, these are the questions I would ask first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are we preserving structure, or only extracting raw text?&lt;/li&gt;
&lt;li&gt;Can a human inspect the Markdown before it enters retrieval or agent workflows?&lt;/li&gt;
&lt;li&gt;Do low-risk and high-risk inputs take the same execution path?&lt;/li&gt;
&lt;li&gt;Are we using the narrowest conversion API that matches the actual trust boundary?&lt;/li&gt;
&lt;li&gt;Do we preserve enough provenance, notes, or source hints to debug downstream errors?&lt;/li&gt;
&lt;li&gt;Are plugins and optional dependencies treated as deliberate policy choices instead of default sprawl?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those questions are answered well, document conversion starts behaving like infrastructure instead of a hidden source of errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;MarkItDown is not interesting because it converts files. Many tools can do that.&lt;/p&gt;

&lt;p&gt;It is interesting because it treats Markdown as a stable intermediate surface between messy source documents and downstream AI systems. The open-source project gives that idea a practical engine. The browser experience at markitdown.store makes the workflow easier to inspect. Together, they point toward a useful engineering pattern:&lt;/p&gt;

&lt;p&gt;normalize early, preserve meaningful structure, separate trust boundaries, and make the output reviewable before automation builds on top of it.&lt;/p&gt;

&lt;p&gt;That is a much stronger design than "just get me some text."&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MarkItDown website: &lt;a href="https://www.markitdown.store/" rel="noopener noreferrer"&gt;https://www.markitdown.store/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Microsoft MarkItDown repository: &lt;a href="https://github.com/microsoft/markitdown" rel="noopener noreferrer"&gt;https://github.com/microsoft/markitdown&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MarkItDown MCP registry page: &lt;a href="https://github.com/mcp/microsoft/markitdown" rel="noopener noreferrer"&gt;https://github.com/mcp/microsoft/markitdown&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>markitdown</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Chat-with-Docs Breaks in Real Companies: An Engineering Look at Onyx</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Thu, 30 Apr 2026 04:21:38 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/why-chat-with-docs-breaks-in-real-companies-an-engineering-look-at-onyx-2861</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/why-chat-with-docs-breaks-in-real-companies-an-engineering-look-at-onyx-2861</guid>
      <description>&lt;p&gt;&lt;em&gt;Based on the onyx.guru page and the Onyx open-source repository reviewed on April 29, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most internal AI projects begin with a reasonable demo: connect a folder of documents, add retrieval, ask a question, get an answer with citations.&lt;/p&gt;

&lt;p&gt;Then the system meets a real company.&lt;/p&gt;

&lt;p&gt;The docs are scattered across Google Drive, Notion, GitHub, Slack, support tickets, policy pages, and user uploads. Some pages are stale. Some are private. Some are deleted upstream but still cached somewhere. Some are visible to one team but not another. Some answers require a fresh web lookup or a tool call, not just a paragraph from an old document.&lt;/p&gt;

&lt;p&gt;This is where "chat with your docs" starts to break.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://onyx.guru/" rel="noopener noreferrer"&gt;onyx.guru page&lt;/a&gt; is interesting because it frames Onyx less like a chatbot and more like a permission-aware knowledge layer. Its public materials emphasize connectors, source permissions, freshness, citations, search, agents, actions, and cloud or self-hosted deployment. That makes it a useful case study for a broader engineering question:&lt;/p&gt;

&lt;p&gt;What does it actually take to build a private AI knowledge system that can be trusted in production?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Failure Mode: Retrieval Without Reality
&lt;/h2&gt;

&lt;p&gt;The simplest RAG architecture is easy to describe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load documents into a vector database.&lt;/li&gt;
&lt;li&gt;Retrieve similar chunks for a user query.&lt;/li&gt;
&lt;li&gt;Put those chunks into an LLM prompt.&lt;/li&gt;
&lt;li&gt;Ask the model to answer with citations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can work for a static knowledge base. It is not enough for an enterprise workspace.&lt;/p&gt;
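&lt;p&gt;The four steps above can be sketched as one naive loop. The interfaces below (&lt;code&gt;embed&lt;/code&gt;, &lt;code&gt;vectorStore&lt;/code&gt;, &lt;code&gt;llm&lt;/code&gt;) are illustrative placeholders, not a specific vendor SDK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface Chunk { id: string; text: string; source: string; }

// Naive RAG: embed the query, retrieve similar chunks,
// stuff them into a prompt, and ask for a cited answer.
// Note what is missing: permissions, freshness, deletion handling.
async function naiveRagAnswer(
  query: string,
  embed: (text: string) =&gt; Promise&lt;number[]&gt;,
  vectorStore: { search(v: number[], k: number): Promise&lt;Chunk[]&gt; },
  llm: { complete(prompt: string): Promise&lt;string&gt; },
): Promise&lt;string&gt; {
  const chunks = await vectorStore.search(await embed(query), 5);
  const context = chunks.map((c) =&gt; `[${c.source}] ${c.text}`).join("\n");
  return llm.complete(`Answer with citations, using only:\n${context}\n\nQ: ${query}`);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything the rest of this article discusses is about what this sketch leaves out.&lt;/p&gt;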

&lt;p&gt;In real environments, trust fails in more mundane ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user sees an answer based on a document they should not have access to.&lt;/li&gt;
&lt;li&gt;A policy was updated last week, but the old version still ranks first.&lt;/li&gt;
&lt;li&gt;A deleted document remains embedded and continues to influence answers.&lt;/li&gt;
&lt;li&gt;A support ticket, a code comment, and a runbook all describe the same incident differently.&lt;/li&gt;
&lt;li&gt;A user asks for an operational next step, but the assistant can only summarize.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these failures are solved by simply choosing a larger model. They are systems problems. The model is only the final voice of a chain that includes source ingestion, permission mapping, indexing, retrieval, ranking, citation, tool use, and deployment boundaries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4butlwuzppa03y8ec58w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4butlwuzppa03y8ec58w.png" alt=" " width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: trustworthy private knowledge is a continuous loop, not a one-time model call.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Requirement 1: Connect the Sources Before Optimizing the Prompt
&lt;/h2&gt;

&lt;p&gt;A private AI system cannot answer from knowledge it never truly ingested.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but it is where many systems become fragile. Enterprise knowledge does not live in one database. It lives in documents, tickets, repositories, chat threads, policies, dashboards, and files uploaded by users. A useful system needs connectors that preserve more than raw text.&lt;/p&gt;

&lt;p&gt;It needs to preserve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;document identity&lt;/li&gt;
&lt;li&gt;source type&lt;/li&gt;
&lt;li&gt;update time&lt;/li&gt;
&lt;li&gt;authorship&lt;/li&gt;
&lt;li&gt;metadata&lt;/li&gt;
&lt;li&gt;deletion state&lt;/li&gt;
&lt;li&gt;access rules where available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Onyx puts connectors near the center of the product. Its public materials describe more than 50 indexing-based connectors, plus MCP-based extensibility. This matters because connectors are not just import tools. They are the system's contact surface with reality.&lt;/p&gt;

&lt;p&gt;If the connector layer is weak, the model may still produce polished prose, but the answer will be grounded in an incomplete or outdated world.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Requirement 2: Permissions Must Travel With the Knowledge
&lt;/h2&gt;

&lt;p&gt;The most dangerous enterprise AI bug is not a bad summary. It is a correct answer shown to the wrong person.&lt;/p&gt;

&lt;p&gt;That is why permission-aware retrieval is not a compliance add-on. It is part of the knowledge model itself. A private finance memo and a public engineering guide are not merely two text blocks with different labels. They have different organizational meaning because they participate in different visibility networks.&lt;/p&gt;

&lt;p&gt;From an ontology perspective, boundaries are part of what a thing is. In engineering terms: access control must be attached before retrieval, not patched after generation.&lt;/p&gt;

&lt;p&gt;Onyx's public positioning highlights permission-aware search and keeping permissions attached to the source. That is the right architectural direction. The retrieval system should know what the current user is allowed to see before the model ever receives context.&lt;/p&gt;

&lt;p&gt;A useful test is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can two users ask the same question and receive different valid results based on their permissions?&lt;/li&gt;
&lt;li&gt;Can the system explain which sources were used?&lt;/li&gt;
&lt;li&gt;Can revoked access stop influencing future answers?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is no, the AI system is not ready for private knowledge work.&lt;/p&gt;
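&lt;p&gt;The first test translates into a very small piece of code: filter by visibility before retrieval, never after generation. The document shape and group model here are illustrative, not Onyx's actual schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface Doc { id: string; text: string; allowedGroups: string[]; }

// Permission-aware retrieval: drop invisible documents before
// ranking or prompting, so the model never sees them at all.
function visibleDocs(docs: Doc[], userGroups: string[]): Doc[] {
  return docs.filter((d) =&gt;
    d.allowedGroups.some((g) =&gt; userGroups.includes(g)),
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two users asking the same question now retrieve from different visible sets, which is exactly the behavior the test above demands.&lt;/p&gt;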

&lt;h2&gt;
  
  
  Engineering Requirement 3: Freshness Is a Data Pipeline Problem
&lt;/h2&gt;

&lt;p&gt;Freshness is often presented as a UI feature: "This answer cites recent sources."&lt;/p&gt;

&lt;p&gt;In practice, freshness is a pipeline property.&lt;/p&gt;

&lt;p&gt;The system has to detect source changes, schedule sync jobs, update chunks, remove deleted content, refresh embeddings or indexes, and preserve enough metadata for ranking and filtering. This is not glamorous, but it is the difference between a useful knowledge layer and a historical archive with a chat interface.&lt;/p&gt;
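&lt;p&gt;One pass of that pipeline can be sketched as a pure planning step. The shapes below are illustrative; a real system would also handle chunking, embedding, and permission sync:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface SourceDoc { id: string; updatedAt: number; deleted: boolean; }

// Decide what one sync pass must do: re-index new or changed
// documents, and prune anything deleted or gone upstream.
function planSync(
  source: SourceDoc[],
  indexedAt: Map&lt;string, number&gt;, // doc id -&gt; last indexed time
): { reindex: string[]; remove: string[] } {
  const reindex: string[] = [];
  const remove: string[] = [];
  const live = new Set&lt;string&gt;();
  for (const doc of source) {
    if (doc.deleted) { remove.push(doc.id); continue; }
    live.add(doc.id);
    const last = indexedAt.get(doc.id);
    if (last === undefined || doc.updatedAt &gt; last) reindex.push(doc.id);
  }
  for (const id of indexedAt.keys()) {
    if (!live.has(id) &amp;&amp; !remove.includes(id)) remove.push(id);
  }
  return { reindex, remove };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The important property is the second loop: documents that vanished upstream must stop influencing answers, not merely stop being refreshed.&lt;/p&gt;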

&lt;p&gt;Onyx Standard mode is interesting here because the public materials describe the heavier machinery behind production retrieval: vector and keyword indexing, background workers for sync jobs, model inference services used both at indexing time and at query time, Redis, MinIO, Postgres, and Vespa. The stack is a reminder that trustworthy AI is not one model call. It is a stateful system that has to keep adjusting.&lt;/p&gt;

&lt;p&gt;This is where the ontology lens becomes practical. A system continues to exist by doing two things: acting outward and adjusting inward. For enterprise AI, "inward adjustment" means re-syncing, pruning, re-ranking, re-checking permissions, and correcting its own representation of the organization as the organization changes.&lt;/p&gt;

&lt;p&gt;Without that internal adjustment, citations eventually become decoration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Requirement 4: Search Should Be Inspectable, Not Hidden Inside Chat
&lt;/h2&gt;

&lt;p&gt;Chat is a convenient interface, but it should not be the only interface.&lt;/p&gt;

&lt;p&gt;When a user is doing serious work, they often need to inspect the source set before trusting the synthesis. They may want to filter by author, time range, source type, tag, or document family. They may want to compare sources instead of accepting a single blended answer.&lt;/p&gt;

&lt;p&gt;Onyx's public materials describe a dedicated search experience with query classification and filters. That is more important than it might look. It separates retrieval from generation.&lt;/p&gt;

&lt;p&gt;This separation gives teams a way to debug and trust the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the system retrieve?&lt;/li&gt;
&lt;li&gt;Why did this source rank higher than that one?&lt;/li&gt;
&lt;li&gt;Was the answer built from the right category of documents?&lt;/li&gt;
&lt;li&gt;Did the model summarize the evidence correctly?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, observability is not only for servers. Knowledge retrieval needs observability too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Requirement 5: Some Answers Need Actions, Not Just Text
&lt;/h2&gt;

&lt;p&gt;Internal AI becomes more useful when it can move from "tell me" to "help me do the work."&lt;/p&gt;

&lt;p&gt;Some questions require internal search. Some require fresh web context. Some require code execution, calculations, API calls, or interaction with an operational system. If the assistant cannot use tools, it remains a commentator on the workflow rather than a participant in it.&lt;/p&gt;

&lt;p&gt;Onyx supports built-in actions such as internal search, web search, code execution, and image generation, and it supports custom actions through OpenAPI and MCP. The important part is not just that actions exist. It is that actions need governance.&lt;/p&gt;

&lt;p&gt;For enterprise use, tool access should answer the same questions as document access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which user is allowed to call this action?&lt;/li&gt;
&lt;li&gt;Does the action use shared authentication or user-level authentication?&lt;/li&gt;
&lt;li&gt;What data leaves the deployment boundary?&lt;/li&gt;
&lt;li&gt;Can the result be traced back to the tool and source?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many AI assistants become operationally risky. The moment an assistant can act, permissions, auditability, and data boundaries matter even more.&lt;/p&gt;
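&lt;p&gt;Those questions translate into a gate in front of every tool call. The action shape and audit format below are assumptions for illustration, not Onyx's API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface ToolAction { name: string; allowedRoles: string[]; }

// Gate tool use the same way as document access: check the caller
// first, and record every attempt so results stay traceable.
function authorizeAction(
  action: ToolAction,
  userRoles: string[],
  audit: (entry: string) =&gt; void,
): boolean {
  const allowed = action.allowedRoles.some((r) =&gt; userRoles.includes(r));
  audit(`${action.name}: ${allowed ? "allowed" : "denied"}`);
  return allowed;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;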

&lt;h2&gt;
  
  
  Engineering Requirement 6: Deployment Boundaries Are Product Decisions
&lt;/h2&gt;

&lt;p&gt;Private knowledge systems cannot treat deployment as an afterthought.&lt;/p&gt;

&lt;p&gt;Some teams are comfortable with cloud hosting. Others need self-hosting because of data sensitivity, compliance requirements, network topology, or internal security review. The architecture has to make those tradeoffs explicit.&lt;/p&gt;

&lt;p&gt;Onyx describes both Lite and Standard deployment modes. Lite is lighter and chat-oriented. Standard adds the heavier retrieval and synchronization infrastructure needed for stronger production knowledge workflows. Its public materials also describe a self-hosted architecture where the core system runs inside the deployment boundary, while external services such as LLM APIs, embedding providers, web search, or image generation are explicitly configured by the admin.&lt;/p&gt;

&lt;p&gt;That distinction matters. A private AI system should make it clear where data is stored, when data leaves the boundary, and which external services participate in the answer.&lt;/p&gt;

&lt;p&gt;Good security architecture is not just about preventing incidents. It also makes trust explainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Evaluation Checklist
&lt;/h2&gt;

&lt;p&gt;If you are evaluating a private AI knowledge system, the most useful questions are not about demo magic. They are about failure modes.&lt;/p&gt;

&lt;p&gt;Ask these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What source systems can it connect to?&lt;/li&gt;
&lt;li&gt;Does it preserve metadata and deletion state?&lt;/li&gt;
&lt;li&gt;Are permissions enforced before retrieval?&lt;/li&gt;
&lt;li&gt;Can different users receive different valid answers?&lt;/li&gt;
&lt;li&gt;How does the system handle stale, conflicting, or removed content?&lt;/li&gt;
&lt;li&gt;Can users inspect retrieved sources before trusting the answer?&lt;/li&gt;
&lt;li&gt;Does it support both search and chat?&lt;/li&gt;
&lt;li&gt;Can it use tools or actions safely?&lt;/li&gt;
&lt;li&gt;What leaves the deployment boundary?&lt;/li&gt;
&lt;li&gt;Can the architecture scale from a small pilot to a production knowledge layer?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This checklist is deliberately practical. If a system cannot answer these questions, the risk is not that the model sounds bad. The risk is that it sounds good while being wrong, stale, or unsafe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Onyx Fits
&lt;/h2&gt;

&lt;p&gt;Onyx is not interesting because it promises a prettier chatbot. It is interesting because its public architecture acknowledges the parts of enterprise AI that are easy to underestimate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source connectivity&lt;/li&gt;
&lt;li&gt;permission-aware retrieval&lt;/li&gt;
&lt;li&gt;citations and freshness&lt;/li&gt;
&lt;li&gt;dedicated search&lt;/li&gt;
&lt;li&gt;agents and governed actions&lt;/li&gt;
&lt;li&gt;cloud and self-hosted deployment options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The phrase "Search private knowledge before you trust the answer" is a good summary of the engineering posture. Trust should be earned by the source path, not assumed from model fluency.&lt;/p&gt;

&lt;p&gt;That is also where the ontology idea fits naturally. A knowledge system has to keep existing correctly in relation to its environment. It must absorb changes from the outside, adjust its internal representation, respect boundaries, and act only through governed channels. Otherwise, it is not a trusted layer. It is a static snapshot with a fluent interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;The next wave of enterprise AI will not be defined by "chat with docs" alone. It will be defined by systems that can connect private sources, preserve permissions, stay fresh, expose evidence, and act safely.&lt;/p&gt;

&lt;p&gt;Onyx is a useful case study because it treats those concerns as core architecture rather than as optional polish.&lt;/p&gt;

&lt;p&gt;For teams exploring this category, the best next step is small and concrete: choose one knowledge domain with real permissions, frequent updates, and answers that require citations. Test whether the system can handle the full chain from source connection to permission-aware retrieval to cited answer to governed action. If that chain works, the pilot can grow. If it breaks, the problem is probably not the prompt. It is the knowledge system.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;onyx.guru page: &lt;a href="https://onyx.guru/" rel="noopener noreferrer"&gt;https://onyx.guru/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Onyx open-source repository: &lt;a href="https://github.com/onyx-dot-app/onyx" rel="noopener noreferrer"&gt;https://github.com/onyx-dot-app/onyx&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rag</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Why File Type Detection Is More Than a Metadata Problem</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Wed, 29 Apr 2026 08:20:32 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/why-file-type-detection-is-more-than-a-metadata-problem-32h4</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/why-file-type-detection-is-more-than-a-metadata-problem-32h4</guid>
      <description>&lt;h3&gt;
  
  
  What Magika teaches us about names, evidence, boundaries, and trustworthy file intelligence
&lt;/h3&gt;

&lt;p&gt;Author note: This article is written for engineers building upload flows, storage systems, CI pipelines, security tooling, and AI products that need to reason about real files instead of just trusting filenames.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;When a system accepts a file, one of the first questions sounds almost trivial:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What is this thing?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But many production systems still answer that question with a weak proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the filename extension&lt;/li&gt;
&lt;li&gt;the browser-provided MIME type&lt;/li&gt;
&lt;li&gt;a user claim&lt;/li&gt;
&lt;li&gt;a storage metadata field&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works until it does not.&lt;/p&gt;

&lt;p&gt;A file called &lt;code&gt;invoice.pdf&lt;/code&gt; may actually be a ZIP container, a JavaScript payload, a damaged document, or a binary blob that should never reach the parser you are about to invoke.&lt;/p&gt;

&lt;p&gt;This is why Google's open-source &lt;a href="https://github.com/google/magika" rel="noopener noreferrer"&gt;Magika&lt;/a&gt; project is interesting.&lt;/p&gt;

&lt;p&gt;Magika is not just another convenience wrapper around file metadata. It is a content-based file type detector that tries to ground classification in the file's actual bytes.&lt;/p&gt;

&lt;p&gt;For readers who want to inspect that idea without installing a command-line tool, &lt;a href="https://www.magika.uk" rel="noopener noreferrer"&gt;magika.uk&lt;/a&gt; provides a web version of the same practical workflow: upload a file, and the result exposes detected type, MIME type, file group, confidence, and an extension mismatch signal.&lt;/p&gt;

&lt;p&gt;That design choice matters technically. It also gives us a useful way to think about file identity.&lt;/p&gt;

&lt;p&gt;If we borrow the word "ontology" in a practical engineering sense, it simply means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the model a system uses to decide what kind of thing it is interacting with, where the boundary of that thing is, and what actions are valid once that classification is made.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From that perspective, file type detection is not only a naming problem.&lt;/p&gt;

&lt;p&gt;It is a boundary and evidence problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Extension Mistake
&lt;/h2&gt;

&lt;p&gt;Let me start with a question.&lt;/p&gt;

&lt;p&gt;Suppose your upload service receives these three files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;headshot.png&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;report.docx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;archive.txt&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which one should go to the image thumbnailer?&lt;br&gt;
Which one is safe to send to a document parser?&lt;br&gt;
Which one deserves secondary inspection before entering the rest of your pipeline?&lt;/p&gt;

&lt;p&gt;If your answer is mostly based on the suffix after the last dot, your system is not classifying files. It is trusting labels.&lt;/p&gt;

&lt;p&gt;That is a very human habit.&lt;/p&gt;

&lt;p&gt;Humans like names. Names are cheap. Names are convenient. Names are socially useful.&lt;/p&gt;

&lt;p&gt;But files do not become PNGs because we call them &lt;code&gt;.png&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Operationally, a file becomes a "PNG" because its internal structure, magic bytes, and content patterns support a set of downstream interactions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;image decoders can parse it&lt;/li&gt;
&lt;li&gt;rendering pipelines can transform it&lt;/li&gt;
&lt;li&gt;security scanners can apply the right rules&lt;/li&gt;
&lt;li&gt;storage systems can make the right policy decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The file's practical identity is tied to how systems interact with it, not to what a human named it.&lt;/p&gt;

&lt;p&gt;This is where a deeper model of identity becomes useful.&lt;/p&gt;

&lt;p&gt;A useful principle is that we should stop treating human definitions as if they were identical to reality. Things reveal themselves through interaction. In file pipelines, that means the "real type" of a file is closer to its interaction surface than to its filename.&lt;/p&gt;

&lt;p&gt;An extension is a claim.&lt;/p&gt;

&lt;p&gt;Content is evidence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbejpq2dxv4epa83w3gn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbejpq2dxv4epa83w3gn.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Caption: A filename is a useful claim, but the file's bytes provide stronger evidence for downstream decisions.&lt;/p&gt;
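&lt;p&gt;The claim-versus-evidence gap is easy to demonstrate directly. The magic-byte signatures below are real (PNG, PDF, ZIP), but this tiny table is only an illustration; a learned detector like Magika covers far more types and far more ambiguous cases:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// A few well-known magic-byte signatures.
const MAGIC: Record&lt;string, number[]&gt; = {
  png: [0x89, 0x50, 0x4e, 0x47], // \x89PNG
  pdf: [0x25, 0x50, 0x44, 0x46], // %PDF
  zip: [0x50, 0x4b, 0x03, 0x04], // PK.. (also docx, xlsx, jar)
};

// Evidence: what the leading bytes actually look like.
function sniff(bytes: Uint8Array): string | null {
  for (const [type, sig] of Object.entries(MAGIC)) {
    if (sig.every((b, i) =&gt; bytes[i] === b)) return type;
  }
  return null;
}

// Claim vs evidence: flag files whose extension disagrees
// with the observed signature.
function extensionMismatch(filename: string, bytes: Uint8Array): boolean {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const observed = sniff(bytes);
  return observed !== null &amp;&amp; observed !== ext;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A file named &lt;code&gt;invoice.pdf&lt;/code&gt; that begins with the ZIP signature is a container wearing a PDF name, and that disagreement, not the name alone, is what should drive routing.&lt;/p&gt;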


&lt;h2&gt;
  
  
  2. What Magika Actually Adds
&lt;/h2&gt;

&lt;p&gt;Magika matters because it operationalizes that distinction.&lt;/p&gt;

&lt;p&gt;According to the official project materials, Magika uses a compact deep learning model, only a few megabytes in size, trained and evaluated on roughly 100 million samples across more than 200 content types. After the one-time model load, inference is on the order of milliseconds per file on a single CPU. It also avoids reading entire large files into memory, typically inspecting only a limited subset of content, usually a few hundred bytes and up to around 2 KB depending on the model.&lt;/p&gt;

&lt;p&gt;That combination leads to an engineering result that is more important than it first appears:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;content-based classification&lt;/li&gt;
&lt;li&gt;near-constant inference time&lt;/li&gt;
&lt;li&gt;enough accuracy to be useful in real routing decisions&lt;/li&gt;
&lt;li&gt;enough speed to sit early in a pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Magika is not just a developer toy.&lt;/p&gt;

&lt;p&gt;It is a pre-routing layer.&lt;/p&gt;

&lt;p&gt;It answers a system question that sits before many more expensive questions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before I parse, transform, render, index, execute, or scan this object deeply, what kind of object am I probably dealing with?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That early answer changes architecture.&lt;/p&gt;

&lt;p&gt;Instead of letting every downstream component discover the file type in its own fragile way, you can establish a first-pass classification layer and make later steps conditional on that result.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Weak pipeline&lt;/th&gt;
&lt;th&gt;Stronger pipeline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Route by extension&lt;/td&gt;
&lt;td&gt;Route by detected content label&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trust client MIME type&lt;/td&gt;
&lt;td&gt;Compare claimed type with observed type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parse first, reject later&lt;/td&gt;
&lt;td&gt;Identify first, then choose parser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Demand exact guesses&lt;/td&gt;
&lt;td&gt;Allow generic fallback when confidence is low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last row is especially important.&lt;/p&gt;

&lt;p&gt;Because one of Magika's best ideas is not only that it predicts types.&lt;/p&gt;

&lt;p&gt;It is that it does not always pretend to know.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. The Most Interesting Part: It Separates Belief from Decision
&lt;/h2&gt;

&lt;p&gt;This is, to me, the most underrated design choice in Magika.&lt;/p&gt;

&lt;p&gt;The official output model distinguishes between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the raw deep-learning prediction&lt;/li&gt;
&lt;li&gt;the final tool output used for operational decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the system separates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the model believes&lt;/li&gt;
&lt;li&gt;what the product is willing to say&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a powerful distinction.&lt;/p&gt;

&lt;p&gt;If the model predicts a type with low confidence, Magika does not have to force a precise answer. It can return a more generic label such as a broad text or unknown binary category, depending on the case. The documentation also describes per-content-type thresholds and multiple prediction modes, including &lt;code&gt;high-confidence&lt;/code&gt;, &lt;code&gt;medium-confidence&lt;/code&gt;, and &lt;code&gt;best-guess&lt;/code&gt;.&lt;/p&gt;
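&lt;p&gt;That separation can be sketched in a few lines. The threshold value and the generic fallback labels here are illustrative, not Magika's actual per-content-type configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface RawPrediction { label: string; score: number; isText: boolean; }

// Separate belief (the raw score) from decision (the emitted label):
// below the threshold, fall back to an honest generic category.
function finalLabel(p: RawPrediction, threshold = 0.9): string {
  if (p.score &gt;= threshold) return p.label;
  return p.isText ? "txt" : "unknown";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;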

&lt;p&gt;This is not just a tuning convenience.&lt;/p&gt;

&lt;p&gt;It is an epistemic boundary.&lt;/p&gt;

&lt;p&gt;A careless classifier says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I always owe you a specific answer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A disciplined classifier says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I owe you the strongest answer that the evidence can justify.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference is the heart of trustworthy file intelligence.&lt;/p&gt;

&lt;p&gt;From a system design perspective, this is a very healthy move. A system should not confuse naming with knowing. If it cannot identify an object precisely enough, it should still place that object honestly within a safer boundary.&lt;/p&gt;

&lt;p&gt;That is why Magika's generic labels are not a weakness.&lt;/p&gt;

&lt;p&gt;They are a form of boundary recognition.&lt;/p&gt;

&lt;p&gt;And boundary recognition is one of the hardest things to get right in production systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohpb7aaisp420aujawhl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fohpb7aaisp420aujawhl.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Caption: The valuable step is not only prediction, but converting confidence into an honest system output.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. A Practical Model of File Identity
&lt;/h2&gt;

&lt;p&gt;If "ontology" sounds too abstract, here is the same idea in narrower engineering terms.&lt;/p&gt;

&lt;p&gt;For a production system, a file identity model answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What entities do I believe exist here?&lt;/li&gt;
&lt;li&gt;How do I distinguish one entity from another?&lt;/li&gt;
&lt;li&gt;What evidence is strong enough to justify that distinction?&lt;/li&gt;
&lt;li&gt;What actions become valid after classification?&lt;/li&gt;
&lt;li&gt;What should I do when the boundary is unclear?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now apply those questions to files.&lt;/p&gt;

&lt;p&gt;A simplistic model says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entity = filename extension
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A better model says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entity = content-bearing object with a detectable internal structure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An even better operational model says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entity = content-bearing object whose probable downstream interactions
can be estimated from observed bytes, confidence thresholds, and routing policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That third version is much closer to how resilient systems should think.&lt;/p&gt;

&lt;p&gt;Two ideas are especially useful here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;do not remain trapped in human-centered naming&lt;/li&gt;
&lt;li&gt;understand things through external interaction and internal adjustment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Magika maps neatly onto both.&lt;/p&gt;

&lt;p&gt;First, it moves classification away from human-centered naming. The extension may still be useful, but it is no longer treated as the essence of the object.&lt;/p&gt;

&lt;p&gt;Second, it helps a larger system connect external interaction with internal adjustment.&lt;/p&gt;

&lt;p&gt;The file produces an external signal through its bytes.&lt;/p&gt;

&lt;p&gt;The system then performs internal adjustment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;allow&lt;/li&gt;
&lt;li&gt;block&lt;/li&gt;
&lt;li&gt;quarantine&lt;/li&gt;
&lt;li&gt;route to a safer parser&lt;/li&gt;
&lt;li&gt;request secondary scanning&lt;/li&gt;
&lt;li&gt;log an extension mismatch&lt;/li&gt;
&lt;li&gt;downgrade trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I would describe Magika not merely as a classifier, but as a boundary-aware adjustment trigger for file pipelines.&lt;/p&gt;
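&lt;p&gt;As a sketch, that adjustment step is just a policy function from detection output to pipeline behavior. The field names and routes are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface Detection { group: string; confidence: number; extMismatch: boolean; }

// Boundary-aware routing: identity conflicts and low confidence
// both push the file toward slower, safer handling.
function routeFile(d: Detection): "parse" | "secondary-scan" | "quarantine" {
  if (d.extMismatch) return "quarantine";
  if (d.confidence &lt; 0.8) return "secondary-scan";
  return "parse";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;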




&lt;h2&gt;
  
  
  5. Why the Web Version Matters
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.magika.uk" rel="noopener noreferrer"&gt;magika.uk&lt;/a&gt; is useful not because every file intelligence idea needs a website, but because a web version makes the classification process easier to inspect.&lt;/p&gt;

&lt;p&gt;The interface does not present file detection as a mystical black box. It surfaces a set of operationally relevant fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detected type&lt;/li&gt;
&lt;li&gt;MIME type&lt;/li&gt;
&lt;li&gt;group&lt;/li&gt;
&lt;li&gt;confidence&lt;/li&gt;
&lt;li&gt;extension mismatch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also frames the runtime explicitly; the upload demo shows &lt;code&gt;magika-js/browser&lt;/code&gt;, which is a useful reminder that the same classification idea can run close to the user, not only deep in backend infrastructure.&lt;/p&gt;

&lt;p&gt;That matters for product architecture.&lt;/p&gt;

&lt;p&gt;If a browser-side or edge-side layer can classify content early, then some decisions can happen before the file reaches more privileged systems. Even when you still need server-side verification, early detection can improve UX, reduce bad uploads, and make downstream policy more explainable.&lt;/p&gt;

&lt;p&gt;Notice what is absent from this kind of interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hype about "understanding all files"&lt;/li&gt;
&lt;li&gt;vague security theater&lt;/li&gt;
&lt;li&gt;a single overconfident badge that hides uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, it exposes the kind of metadata a builder actually needs to reason with.&lt;/p&gt;

&lt;p&gt;That is a good product instinct.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. A Better Mental Model: File Identity as Interaction Potential
&lt;/h2&gt;

&lt;p&gt;One reason file classification is often implemented poorly is that teams think of type as static metadata.&lt;/p&gt;

&lt;p&gt;But in real systems, type is better understood as interaction potential.&lt;/p&gt;

&lt;p&gt;A file type is a compressed summary of likely behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what parser chain it can enter&lt;/li&gt;
&lt;li&gt;what rendering path it can trigger&lt;/li&gt;
&lt;li&gt;what policy rules should apply&lt;/li&gt;
&lt;li&gt;what scanners become relevant&lt;/li&gt;
&lt;li&gt;what storage or preview behavior is safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From this viewpoint, a file's "real type" is not just descriptive.&lt;/p&gt;

&lt;p&gt;It is predictive.&lt;/p&gt;

&lt;p&gt;That also connects nicely to another useful idea: simulation.&lt;/p&gt;

&lt;p&gt;Before a system acts on a file, it benefits from a lightweight simulation of what kind of world this object belongs to. Magika effectively provides that first simulation layer. It does not fully validate the object, and it does not tell you whether the file is malicious in every sense. But it does offer an informed prior about what downstream interactions are likely to make sense.&lt;/p&gt;

&lt;p&gt;That is enough to improve many workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;upload moderation&lt;/li&gt;
&lt;li&gt;malware triage&lt;/li&gt;
&lt;li&gt;CI artifact inspection&lt;/li&gt;
&lt;li&gt;ETL pipelines&lt;/li&gt;
&lt;li&gt;object storage intake&lt;/li&gt;
&lt;li&gt;AI systems that ingest user-provided documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also where the "extension mismatch" signal becomes more interesting than it looks.&lt;/p&gt;

&lt;p&gt;A mismatch is not just a UX warning.&lt;/p&gt;

&lt;p&gt;It is a conflict between claimed identity and observed structure.&lt;/p&gt;

&lt;p&gt;And conflicts of identity are exactly where good systems should slow down.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. How I Would Use Magika in a Real Pipeline
&lt;/h2&gt;

&lt;p&gt;Here is a minimal example in JavaScript using the official package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Magika&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;magika&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;magika&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Magika&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;magika&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;identifyBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;holdForReview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;sendToCodeScanning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;document&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;sendToDocumentPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;sendToGenericProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That example is intentionally simple, but the architectural pattern is the point.&lt;/p&gt;

&lt;p&gt;I would not use Magika as the final judge of safety.&lt;/p&gt;

&lt;p&gt;I would use it as the first trustworthy classifier that helps the rest of the system choose the right next interaction.&lt;/p&gt;

&lt;p&gt;A stronger production version might do something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture the claimed extension and client MIME type.&lt;/li&gt;
&lt;li&gt;Run Magika on content bytes.&lt;/li&gt;
&lt;li&gt;Compare claim vs observed label.&lt;/li&gt;
&lt;li&gt;Apply a risk-dependent prediction mode.&lt;/li&gt;
&lt;li&gt;Route to different scanners or parsers.&lt;/li&gt;
&lt;li&gt;Log mismatches and low-confidence outcomes for monitoring.&lt;/li&gt;
&lt;li&gt;Refuse dangerous transitions, such as "claimed image, detected executable/script-like content."&lt;/li&gt;
&lt;/ol&gt;
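&lt;p&gt;Steps 3, 6, and 7 above can be sketched in a few lines. The extension-to-group table and the confidence threshold below are illustrative assumptions, not Magika defaults.&lt;/p&gt;

```typescript
// Sketch of steps 3, 6, and 7: compare claimed identity against observed
// content. The extension table and the 0.9 threshold are assumptions.
interface Observation {
  label: string; // content-derived label (e.g. from Magika)
  group: string; // e.g. "image", "document", "code", "executable"
  score: number; // model confidence in [0, 1]
}

const claimedGroupByExtension: Record<string, string> = {
  png: "image",
  jpg: "image",
  pdf: "document",
  js: "code",
};

function routeUpload(filename: string, obs: Observation): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const claimedGroup = claimedGroupByExtension[ext] ?? "unknown";

  // Step 7: refuse "claimed image, detected executable/script-like content".
  if (claimedGroup === "image" && (obs.group === "executable" || obs.group === "code")) {
    return "reject";
  }
  // Step 6: mismatches and low-confidence results go to review, not through.
  if (claimedGroup !== obs.group || obs.score < 0.9) {
    return "hold-for-review";
  }
  return "accept";
}
```

&lt;p&gt;The key design choice is that a mismatch never silently passes: it either hits an explicit refusal rule or lands in a review queue.&lt;/p&gt;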

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rcq5hozkbk51mi16bfz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rcq5hozkbk51mi16bfz.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Caption: Magika is most useful as an early identity layer that helps the rest of the pipeline choose the right next interaction.&lt;/p&gt;

&lt;p&gt;This is where the identity model becomes practical.&lt;/p&gt;

&lt;p&gt;You are not only classifying the object.&lt;/p&gt;

&lt;p&gt;You are defining what kinds of interactions your system is willing to have with that object.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Where Magika Should Not Be Overstated
&lt;/h2&gt;

&lt;p&gt;A good technical article should also name the limits.&lt;/p&gt;

&lt;p&gt;Magika does not eliminate the need for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;malware scanning&lt;/li&gt;
&lt;li&gt;parser hardening&lt;/li&gt;
&lt;li&gt;archive recursion policies&lt;/li&gt;
&lt;li&gt;schema validation&lt;/li&gt;
&lt;li&gt;business-level content checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is also not the same as full semantic understanding. Knowing that a file is likely a PDF is not the same as knowing whether the PDF is safe, well-formed, policy-compliant, or useful for your application.&lt;/p&gt;

&lt;p&gt;The official documentation also makes clear that some edge cases are handled outside the model itself, such as empty files, non-regular files like directories or symlinks, and very small inputs where only coarse heuristics make sense.&lt;/p&gt;
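&lt;p&gt;Those edge cases suggest a thin pre-check layer in front of the model. A hedged sketch, with an assumed size threshold rather than Magika's actual internal values:&lt;/p&gt;

```typescript
// Illustrative pre-checks of the kind the docs describe as living
// outside the model: empty and very small inputs get heuristic answers
// before any inference runs. The 8-byte threshold is an assumption.
function preClassify(bytes: Uint8Array): string | null {
  if (bytes.length === 0) return "empty";
  if (bytes.length < 8) {
    // Too little structure for a model; fall back to a coarse check.
    const printable = Array.from(bytes).every(
      (b) => b === 9 || b === 10 || b === 13 || (b >= 32 && b <= 126)
    );
    return printable ? "txt" : "unknown";
  }
  return null; // large enough: hand off to the model
}
```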

&lt;p&gt;That is normal.&lt;/p&gt;

&lt;p&gt;In fact, it is another sign of maturity.&lt;/p&gt;

&lt;p&gt;Reliable systems are often hybrids. They combine learned models, thresholds, heuristics, and policy logic. Pretending that one model should do everything is usually a symptom of bad architecture.&lt;/p&gt;

&lt;p&gt;So the right question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can Magika solve file security by itself?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where in my pipeline do I need a fast, content-grounded identity layer so later decisions become safer and more explainable?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much more realistic framing.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Four Questions I Would Ask Before Integrating It
&lt;/h2&gt;

&lt;p&gt;If you are evaluating Magika or any similar system, I think these questions matter more than benchmark screenshots:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What decisions in your pipeline are currently driven by extension or client-provided MIME alone?&lt;/li&gt;
&lt;li&gt;Which of those decisions are high-risk enough to require &lt;code&gt;high-confidence&lt;/code&gt; behavior instead of &lt;code&gt;best-guess&lt;/code&gt; behavior?&lt;/li&gt;
&lt;li&gt;What should your system do when it cannot classify precisely but can still classify safely as "generic text" or "unknown binary"?&lt;/li&gt;
&lt;li&gt;Do you treat extension mismatch as an actionable policy event, or only as debug information?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those questions expose whether your problem is merely "file detection" or whether it is actually "boundary-aware system design."&lt;/p&gt;

&lt;p&gt;Most of the time, it is the second.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Closing Thought
&lt;/h2&gt;

&lt;p&gt;What I like about Magika is not the vague idea that "AI can classify files better."&lt;/p&gt;

&lt;p&gt;What I like is the discipline behind it.&lt;/p&gt;

&lt;p&gt;It pushes a system to stop asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What did the user call this object?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and to start asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Based on the evidence available in the object itself, what kind of thing is this, how sure am I, and what interactions are justified next?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a better technical question.&lt;/p&gt;

&lt;p&gt;It is also a better question about identity and boundaries.&lt;/p&gt;

&lt;p&gt;And I suspect the same lesson applies far beyond file uploads:&lt;/p&gt;

&lt;p&gt;reliable systems improve when they ground identity in interaction, preserve uncertainty honestly, and let classification drive internal adjustment instead of blind execution.&lt;/p&gt;

&lt;p&gt;If you are building anything that accepts untrusted files, that shift is worth thinking about.&lt;/p&gt;

&lt;p&gt;Not because it sounds philosophical.&lt;/p&gt;

&lt;p&gt;Because it is operationally useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;p&gt;I would be curious how other builders think about this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are you still routing uploads mainly by extension or client MIME?&lt;/li&gt;
&lt;li&gt;Where in your pipeline would a generic "unknown" answer actually be safer than an overconfident specific label?&lt;/li&gt;
&lt;li&gt;Do you treat file identity as metadata, or as a prediction about downstream interaction?&lt;/li&gt;
&lt;li&gt;If you have tried Magika or the magika.uk web version, what did it change in your routing or security design?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Web version: &lt;a href="https://www.magika.uk" rel="noopener noreferrer"&gt;https://www.magika.uk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Upstream open-source project: &lt;a href="https://github.com/google/magika" rel="noopener noreferrer"&gt;https://github.com/google/magika&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Official documentation: &lt;a href="https://securityresearch.google/magika" rel="noopener noreferrer"&gt;https://securityresearch.google/magika&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cybersecurity</category>
      <category>machinelearning</category>
      <category>security</category>
      <category>tooling</category>
    </item>
    <item>
      <title>We Built Multica to Make Multi-Agent AI Useful for Real Workflows</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Tue, 28 Apr 2026 16:15:59 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/we-built-multica-to-make-multi-agent-ai-useful-for-real-workflows-41h1</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/we-built-multica-to-make-multi-agent-ai-useful-for-real-workflows-41h1</guid>
      <description>&lt;p&gt;AI is getting better fast, but the way most people use it still feels fragmented.&lt;/p&gt;

&lt;p&gt;You open one chat, try one prompt, get one answer, and then start over somewhere else. That works for simple tasks, but it quickly becomes limiting when real work requires comparison, coordination, and iteration across multiple models.&lt;/p&gt;

&lt;p&gt;That gap is what led us to build Multica.&lt;/p&gt;

&lt;p&gt;Multica is a multi-agent collaboration platform designed for real workflows. Instead of treating AI like a series of isolated conversations, Multica gives you one place to run, compare, and coordinate multiple AI agents in parallel. You can review outputs side by side, test different reasoning styles, and turn parallel AI execution into a more structured and repeatable process.&lt;/p&gt;

&lt;p&gt;We built it around a simple belief: the future of AI is not just better models. It is better workflow.&lt;/p&gt;

&lt;p&gt;In practice, real work is rarely solved by a single prompt or a single model. One model may be better at reasoning. Another may be better at speed. A third may produce a stronger draft, better structure, or a more useful perspective for the task. But switching across tools and trying to manage that process manually is slow, messy, and difficult to scale.&lt;/p&gt;

&lt;p&gt;Multica was created to solve that problem.&lt;/p&gt;

&lt;p&gt;With Multica, teams can work across models like GPT, Claude, Gemini, and others in one workflow instead of scattering tasks across separate tabs and disconnected chats. The goal is not just to generate more output. It is to create clearer decisions, faster iteration loops, and a more reliable way to use AI in production work.&lt;/p&gt;

&lt;p&gt;That matters because multi-agent collaboration only becomes valuable when it is usable. It needs structure. It needs visibility. It needs a workflow that helps people compare outputs, make decisions, and move forward without losing context.&lt;/p&gt;

&lt;p&gt;This is the direction we believe AI tools need to go.&lt;/p&gt;

&lt;p&gt;Not just chat interfaces. Not just one-off prompts. Not just isolated answers.&lt;/p&gt;

&lt;p&gt;But systems that help people coordinate intelligence in a way that fits real work.&lt;/p&gt;

&lt;p&gt;That is what we are building with Multica.&lt;/p&gt;

&lt;p&gt;If you are exploring how to make multi-agent AI more practical, organized, and useful for real workflows, you can learn more here: &lt;a href="https://www.multica.uk/" rel="noopener noreferrer"&gt;https://www.multica.uk/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>What If You Could Predict Decisions Before Making Them?</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Tue, 28 Apr 2026 14:36:42 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/what-if-you-could-predict-decisions-before-making-them-f94</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/what-if-you-could-predict-decisions-before-making-them-f94</guid>
      <description>&lt;p&gt;What if you could test a decision before actually making it?&lt;/p&gt;

&lt;p&gt;Not with spreadsheets. Not with gut feeling. But by simulating how people, markets, and narratives might react.&lt;/p&gt;

&lt;p&gt;That’s the idea I’ve been exploring.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Most decisions today are still based on guesswork.&lt;/p&gt;

&lt;p&gt;We run A/B tests. We analyze past data. We try to predict outcomes.&lt;/p&gt;

&lt;p&gt;But real-world systems don’t behave like clean models.&lt;/p&gt;

&lt;p&gt;People react. Narratives spread. Unexpected things happen.&lt;/p&gt;

&lt;p&gt;And once a decision is made, it’s often too late to go back.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;So I started thinking:&lt;/p&gt;

&lt;p&gt;What if we could simulate decisions before committing to them?&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;I built a tool called MiroFish.&lt;/p&gt;

&lt;p&gt;It lets you ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What happens if we raise prices next quarter?”&lt;/li&gt;
&lt;li&gt;“How might public opinion shift after a policy change?”&lt;/li&gt;
&lt;li&gt;“What happens to a brand after a PR crisis?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Behind the scenes, it runs multi-agent simulations to model how things might unfold — and returns structured predictions.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;The experience is simple.&lt;/p&gt;

&lt;p&gt;You ask a question — just like you would in ChatGPT.&lt;/p&gt;

&lt;p&gt;But instead of generating a single answer, the system simulates interactions between agents, narratives, and possible outcomes.&lt;/p&gt;

&lt;p&gt;It’s less about finding the answer, and more about exploring what could happen.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Some of the use cases I’m exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing strategy decisions&lt;/li&gt;
&lt;li&gt;Market sentiment forecasting&lt;/li&gt;
&lt;li&gt;Narrative and public opinion shifts&lt;/li&gt;
&lt;li&gt;Policy impact simulation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically, any situation where outcomes depend on complex interactions.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;This is still an early project.&lt;/p&gt;

&lt;p&gt;But I believe the idea of “simulating decisions” will become increasingly important as systems get more complex.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;If you’re curious, you can try it here: &lt;a href="https://www.mirofish.work/" rel="noopener noreferrer"&gt;https://www.mirofish.work/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear what you think — or how you’d use something like this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why Prompts Are Not Enough for Long-Running AI Agents</title>
      <dc:creator>dengkui yang</dc:creator>
      <pubDate>Wed, 22 Apr 2026 07:16:05 +0000</pubDate>
      <link>https://forem.com/dengkui_yang_fcb5dbe2da32/why-prompts-are-not-enough-for-long-running-ai-agents-2bn5</link>
      <guid>https://forem.com/dengkui_yang_fcb5dbe2da32/why-prompts-are-not-enough-for-long-running-ai-agents-2bn5</guid>
      <description>&lt;p&gt;&lt;em&gt;A small ontology-inspired model for understanding why AI agents fail after the first obstacle&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Author note: This article is written for AI builders, prompt engineers, automation teams, and founders experimenting with long-running AI agents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Most AI agent failures are not caused by a lack of instructions.&lt;/p&gt;

&lt;p&gt;They happen after instructions meet resistance.&lt;/p&gt;

&lt;p&gt;The agent starts well. It understands the goal. It calls a tool. It writes a plan. It takes the first step. Then reality pushes back: a missing field, an unclear constraint, a failed API call, a contradictory user request, an impossible subtask, a weak assumption.&lt;/p&gt;

&lt;p&gt;At that moment, many agents do not adjust themselves.&lt;/p&gt;

&lt;p&gt;They repeat. They rephrase. They overthink. They add more steps. They call the same tool again. They produce a more confident version of the same mistake.&lt;/p&gt;

&lt;p&gt;That is why prompts are not enough for long-running AI agents.&lt;/p&gt;

&lt;p&gt;A prompt tells an agent what to do. A survival framework tells it how to continue when the task pushes back.&lt;/p&gt;

&lt;p&gt;This article introduces a small ontology-inspired model for AI agent behavior:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A stable agent needs two loops: external action and internal adjustment.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. The Prompt Patch Problem
&lt;/h2&gt;

&lt;p&gt;When an AI agent fails, the usual response is to patch the prompt.&lt;/p&gt;

&lt;p&gt;We add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more rules&lt;/li&gt;
&lt;li&gt;more constraints&lt;/li&gt;
&lt;li&gt;more examples&lt;/li&gt;
&lt;li&gt;more warnings&lt;/li&gt;
&lt;li&gt;more formatting requirements&lt;/li&gt;
&lt;li&gt;more tool-use instructions&lt;/li&gt;
&lt;li&gt;more "do not hallucinate" clauses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes this works.&lt;/p&gt;

&lt;p&gt;But prompt patching has a limit. Past a certain point, the prompt becomes a pile of defensive instructions. The agent is not becoming more stable. It is simply carrying more fragile rules.&lt;/p&gt;

&lt;p&gt;The problem is deeper:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Many prompts describe the desired behavior, but they do not define how the agent should transform itself after failure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That missing transformation is the core issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diagram: Prompt Patch vs Adjustment Loop
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsezs9dfi01y53v4ytft4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsezs9dfi01y53v4ytft4.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt patching says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here is another rule. Try not to fail again."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Internal adjustment says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When you fail, identify what changed inside your model of the task, then act again."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are not the same thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Failure Pattern
&lt;/h2&gt;

&lt;p&gt;Here is a common long-running agent failure pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User:
Find 20 relevant communities where I can discuss AI agent reliability,
then draft a short post for each one.

Agent:
Understood. I will search for communities and draft posts.

Step 1:
The agent searches.

Problem:
The search result is noisy. Some communities ban self-promotion.
Some are inactive. Some are not about AI agents.

Bad agent behavior:
The agent still drafts 20 posts anyway.

Worse agent behavior:
When corrected, it says "You're right" and drafts another 20 posts,
but with slightly different wording.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The failure is not that the agent misunderstood the original instruction.&lt;/p&gt;

&lt;p&gt;The failure is that it did not adjust after discovering new reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;community rules matter&lt;/li&gt;
&lt;li&gt;activity level matters&lt;/li&gt;
&lt;li&gt;relevance is not binary&lt;/li&gt;
&lt;li&gt;self-promotion risk must be modeled&lt;/li&gt;
&lt;li&gt;a search result is not yet a valid target&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent performed external action.&lt;/p&gt;

&lt;p&gt;It did not perform internal adjustment.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. A Small Ontology for AI Agents
&lt;/h2&gt;

&lt;p&gt;I use "ontology" here in a practical sense.&lt;/p&gt;

&lt;p&gt;Not as a grand metaphysical claim.&lt;/p&gt;

&lt;p&gt;For AI agent design, ontology means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what entities the agent recognizes&lt;/li&gt;
&lt;li&gt;what boundaries it assigns&lt;/li&gt;
&lt;li&gt;what actions it can take&lt;/li&gt;
&lt;li&gt;what feedback it treats as meaningful&lt;/li&gt;
&lt;li&gt;how it updates itself after interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this model, any agent trying to persist through a task needs two loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Loop 1: External Action
&lt;/h3&gt;

&lt;p&gt;External action is how the agent affects the world.&lt;/p&gt;

&lt;p&gt;It can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;writing text&lt;/li&gt;
&lt;li&gt;calling tools&lt;/li&gt;
&lt;li&gt;searching&lt;/li&gt;
&lt;li&gt;editing files&lt;/li&gt;
&lt;li&gt;sending messages&lt;/li&gt;
&lt;li&gt;making plans&lt;/li&gt;
&lt;li&gt;asking questions&lt;/li&gt;
&lt;li&gt;changing a workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Loop 2: Internal Adjustment
&lt;/h3&gt;

&lt;p&gt;Internal adjustment is how the agent changes itself after the world pushes back.&lt;/p&gt;

&lt;p&gt;It can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;revising assumptions&lt;/li&gt;
&lt;li&gt;narrowing scope&lt;/li&gt;
&lt;li&gt;identifying missing data&lt;/li&gt;
&lt;li&gt;recognizing a boundary&lt;/li&gt;
&lt;li&gt;changing strategy&lt;/li&gt;
&lt;li&gt;asking for help&lt;/li&gt;
&lt;li&gt;stopping a risky path&lt;/li&gt;
&lt;li&gt;updating the task model&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Diagram: The Two-Loop Agent
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7alemsvks4ud1vgifdqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7alemsvks4ud1vgifdqn.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A long-running agent does not need only a stronger instruction.&lt;/p&gt;

&lt;p&gt;It needs a way to process feedback into self-change.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Why Longer Prompts Can Make Agents Less Stable
&lt;/h2&gt;

&lt;p&gt;Longer prompts often try to solve every possible future failure in advance.&lt;/p&gt;

&lt;p&gt;But the real world is interactive. The agent will encounter states that the prompt did not predict.&lt;/p&gt;

&lt;p&gt;When this happens, long prompts can create three problems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rule collision&lt;/td&gt;
&lt;td&gt;Multiple instructions apply at once&lt;/td&gt;
&lt;td&gt;The agent chooses one arbitrarily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;False confidence&lt;/td&gt;
&lt;td&gt;The prompt sounds complete&lt;/td&gt;
&lt;td&gt;The agent stops checking reality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No recovery layer&lt;/td&gt;
&lt;td&gt;The prompt says what to do, not how to recover&lt;/td&gt;
&lt;td&gt;The agent repeats failure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The issue is not prompt length itself.&lt;/p&gt;

&lt;p&gt;The issue is using prompt length as a substitute for adjustment architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  A prompt can say:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If something goes wrong, fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But a stronger agent needs to know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What kind of wrong is this?
Did my assumption fail?
Did my boundary fail?
Did my tool fail?
Did my goal conflict with the environment?
Should I continue, ask, narrow, stop, or replan?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not just instruction following.&lt;/p&gt;

&lt;p&gt;That is self-diagnosis.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. The Four Failure Types
&lt;/h2&gt;

&lt;p&gt;When I look at long-running agent failures, I usually see four categories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diagram: Agent Failure Map
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfj1cdnffp2pd8uizxlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfj1cdnffp2pd8uizxlq.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Assumption Failure
&lt;/h3&gt;

&lt;p&gt;The agent assumes something that is not true.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It assumes a community allows promotional posts because similar communities do.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5.2 Boundary Failure
&lt;/h3&gt;

&lt;p&gt;The agent does not recognize what it should not do.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It drafts outreach messages that violate platform rules or user trust.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5.3 Validation Failure
&lt;/h3&gt;

&lt;p&gt;The agent does not define how success will be checked.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It produces a list of targets without checking whether they are active.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5.4 Adjustment Failure
&lt;/h3&gt;

&lt;p&gt;The agent receives feedback but does not change its internal model.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It says "You're right" and repeats the same flawed strategy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This fourth type is the most important.&lt;/p&gt;

&lt;p&gt;Because if the agent has no adjustment loop, the other failures keep returning.&lt;/p&gt;
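&lt;p&gt;The four failure types map naturally onto four different recovery moves. A minimal, purely illustrative sketch:&lt;/p&gt;

```typescript
// Purely illustrative: each failure type from the map above gets its
// own recovery move. None of these names come from a real framework.
type FailureType = "assumption" | "boundary" | "validation" | "adjustment";

function nextAction(failure: FailureType): string {
  switch (failure) {
    case "assumption": return "re-verify the premise against observed data";
    case "boundary": return "stop and ask before acting outside allowed limits";
    case "validation": return "define a success check before producing output";
    case "adjustment": return "update the internal task model, then act again";
  }
}
```

&lt;p&gt;The value is not in the strings themselves but in forcing the agent to name which kind of failure it just had before it acts again.&lt;/p&gt;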




&lt;h2&gt;
  
  
  6. A Teacher AI -&amp;gt; Student AI Training Example
&lt;/h2&gt;

&lt;p&gt;Here is a simplified example of how an ontology-based teacher AI might train a student AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Student AI: Before training
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User:
Find forums where I can post about AI agent reliability.

Student AI:
I found 20 forums. Here are draft posts for each one.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer looks productive.&lt;/p&gt;

&lt;p&gt;But it skipped key checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are these forums active?&lt;/li&gt;
&lt;li&gt;Do they allow self-promotion?&lt;/li&gt;
&lt;li&gt;Are they about AI agents or general AI?&lt;/li&gt;
&lt;li&gt;Should the first action be posting, commenting, or observing?&lt;/li&gt;
&lt;li&gt;Is there a reputational risk?&lt;/li&gt;
&lt;/ul&gt;
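&lt;p&gt;The skipped checks can be made explicit as a pre-flight list. This is a hedged sketch; the field names are hypothetical stand-ins for whatever metadata the agent actually gathers about each forum.&lt;/p&gt;

```python
# Sketch: the key checks the untrained student AI skipped, expressed as
# an explicit pre-flight list. The field names are hypothetical.

def missing_checks(forum: dict) -> list:
    """Return which of the key checks have not been answered yet."""
    required = [
        "is_active",             # is the forum active?
        "allows_self_promotion", # does it allow promotional posts?
        "topic_is_ai_agents",    # about AI agents, or general AI?
        "first_action",          # post, comment, or observe first?
        "reputational_risk",     # is there a risk in posting here?
    ]
    return [key for key in required if key not in forum]
```

&lt;p&gt;A trained agent would refuse to draft posts while this list is non-empty.&lt;/p&gt;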

&lt;h3&gt;
  
  
  Teacher AI: Diagnosis
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Teacher AI:
You performed external action, but you did not perform internal adjustment.

Before drafting posts, identify:
1. The target entity: which community?
2. The boundary: what does the community allow?
3. The validation point: what proves the community is relevant?
4. The risk: what behavior would be considered spam?
5. The next safe action: post, comment, observe, or skip?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Student AI: After training
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Student AI:
I should not draft posts yet.

First, I will classify each community by:
- relevance to AI agents
- activity level
- self-promotion rules
- preferred contribution style
- risk level

For high-risk communities, I will not post links.
I will first contribute comments and only share the longer article if someone asks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a small change.&lt;/p&gt;

&lt;p&gt;But it is the difference between a task executor and an agent that can adjust itself.&lt;/p&gt;
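&lt;p&gt;The after-training behavior amounts to a small decision policy: classify first, then pick the least risky useful action. Here is one way to sketch it, assuming hypothetical &lt;code&gt;Forum&lt;/code&gt; fields and thresholds; a real agent would fill these from actual community data.&lt;/p&gt;

```python
# Sketch of the trained student's policy: classify the community,
# then choose post, comment, observe, or skip. All fields and
# thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Forum:
    name: str
    active: bool
    allows_self_promotion: bool
    topic: str              # e.g. "ai-agents" or "general-ai"
    reputational_risk: str  # "low", "medium", "high"

def first_action(forum: Forum) -> str:
    """Decide the first safe action instead of drafting posts immediately."""
    if not forum.active:
        return "skip"
    if forum.topic != "ai-agents":
        return "observe"
    if forum.reputational_risk == "high" or not forum.allows_self_promotion:
        return "comment"  # contribute first; share links only if asked
    return "post"
```

&lt;p&gt;The point is not the specific rules but their order: relevance and risk are resolved before any content is drafted.&lt;/p&gt;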




&lt;h2&gt;
  
  
  7. From Prompt Template to Training Protocol
&lt;/h2&gt;

&lt;p&gt;Here is the practical shift:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt template mindset&lt;/th&gt;
&lt;th&gt;Training protocol mindset&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tell the agent what to do&lt;/td&gt;
&lt;td&gt;Teach the agent how to recover&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add more rules&lt;/td&gt;
&lt;td&gt;Diagnose failure modes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimize first answer&lt;/td&gt;
&lt;td&gt;Improve multi-step behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prevent mistakes in advance&lt;/td&gt;
&lt;td&gt;Convert mistakes into adjustment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Focus on output&lt;/td&gt;
&lt;td&gt;Focus on action loop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is why I think the future of AI agent reliability will not rest on prompt engineering alone.&lt;/p&gt;

&lt;p&gt;It will also involve agent training protocols, and not necessarily in the heavy machine-learning sense.&lt;/p&gt;

&lt;p&gt;Even structured conversations can train behavior if they repeatedly force the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;name the target&lt;/li&gt;
&lt;li&gt;define the boundary&lt;/li&gt;
&lt;li&gt;simulate failure&lt;/li&gt;
&lt;li&gt;validate action&lt;/li&gt;
&lt;li&gt;review feedback&lt;/li&gt;
&lt;li&gt;update strategy&lt;/li&gt;
&lt;/ul&gt;
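&lt;p&gt;The six steps above can be sketched as a structured conversation harness. This is an assumption-laden sketch: &lt;code&gt;agent&lt;/code&gt; here is any callable that takes a prompt and returns text, standing in for a real model call.&lt;/p&gt;

```python
# Sketch: forcing an agent through the six-step loop as a fixed
# sequence of prompts. `agent` is a stand-in for a real model call.

PROTOCOL = [
    "Name the target: what entity are you acting on?",
    "Define the boundary: what are you not allowed to do?",
    "Simulate failure: what is most likely to go wrong?",
    "Validate action: what evidence would confirm success?",
    "Review feedback: what did the last attempt actually produce?",
    "Update strategy: what will you change before the next attempt?",
]

def run_protocol(agent, task: str) -> list:
    """Ask every protocol question about the task, in order."""
    answers = []
    for step in PROTOCOL:
        answers.append(agent(f"Task: {task}\n{step}"))
    return answers

# Usage with a trivial stand-in agent that echoes the question:
echo_agent = lambda prompt: prompt.splitlines()[-1]
replies = run_protocol(echo_agent, "post about agent reliability")
```

&lt;p&gt;The harness does nothing clever. Its value is that no step can be skipped, which is exactly what the untrained agent kept doing.&lt;/p&gt;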

&lt;h3&gt;
  
  
  Diagram: A Minimal Training Protocol
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv4tk4dqyt80ce5in0q7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv4tk4dqyt80ce5in0q7.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  8. What This Changes in Agent Design
&lt;/h2&gt;

&lt;p&gt;If this model is useful, then an AI agent prompt should not only contain task instructions.&lt;/p&gt;

&lt;p&gt;It should contain recovery questions.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before acting:
- What entity am I acting on?
- What boundary limits my action?
- What assumption am I relying on?
- What would prove that I am wrong?

After failure:
- Did the target change?
- Did the boundary change?
- Did my assumption fail?
- Do I need to ask, stop, narrow, or replan?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
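&lt;p&gt;Wired into an agent loop, those recovery questions look roughly like this. The sketch assumes two hypothetical callables: &lt;code&gt;act&lt;/code&gt; performs the external action and reports success, and &lt;code&gt;answer&lt;/code&gt; gets the agent's answer to one question.&lt;/p&gt;

```python
# Sketch: embedding the before/after recovery questions in one step of
# an agent loop. `act` and `answer` are hypothetical stand-ins.

BEFORE_ACTING = [
    "What entity am I acting on?",
    "What boundary limits my action?",
    "What assumption am I relying on?",
    "What would prove that I am wrong?",
]

AFTER_FAILURE = [
    "Did the target change?",
    "Did the boundary change?",
    "Did my assumption fail?",
    "Do I need to ask, stop, narrow, or replan?",
]

def guarded_step(act, answer):
    """Ask the pre-action questions, act, and diagnose on failure."""
    plan = {q: answer(q) for q in BEFORE_ACTING}
    succeeded = act(plan)
    if succeeded:
        return plan, None
    diagnosis = {q: answer(q) for q in AFTER_FAILURE}
    return plan, diagnosis
```

&lt;p&gt;On success the diagnosis is skipped; on failure the agent is forced to answer the after-failure questions before it is allowed to retry.&lt;/p&gt;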



&lt;p&gt;This is not a magic solution.&lt;/p&gt;

&lt;p&gt;It will not eliminate hallucination.&lt;/p&gt;

&lt;p&gt;It will not guarantee business outcomes.&lt;/p&gt;

&lt;p&gt;But it gives the agent a better structure for converting failure into adjustment.&lt;/p&gt;

&lt;p&gt;And that is one of the missing layers in long-running agent design.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. The Checklist
&lt;/h2&gt;

&lt;p&gt;When diagnosing an AI agent, I would start with these 10 questions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Good sign&lt;/th&gt;
&lt;th&gt;Bad sign&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Does it define the target entity?&lt;/td&gt;
&lt;td&gt;It names what it acts on&lt;/td&gt;
&lt;td&gt;It acts on vague context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it define boundaries?&lt;/td&gt;
&lt;td&gt;It knows what not to do&lt;/td&gt;
&lt;td&gt;It overreaches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it define success checks?&lt;/td&gt;
&lt;td&gt;It validates progress&lt;/td&gt;
&lt;td&gt;It assumes completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it simulate failure?&lt;/td&gt;
&lt;td&gt;It predicts resistance&lt;/td&gt;
&lt;td&gt;It acts blindly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it notice missing data?&lt;/td&gt;
&lt;td&gt;It asks or narrows&lt;/td&gt;
&lt;td&gt;It invents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it classify feedback?&lt;/td&gt;
&lt;td&gt;It diagnoses failure type&lt;/td&gt;
&lt;td&gt;It says "sorry" and repeats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it update strategy?&lt;/td&gt;
&lt;td&gt;It changes its approach&lt;/td&gt;
&lt;td&gt;It rephrases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it know when to stop?&lt;/td&gt;
&lt;td&gt;It uses stop-loss&lt;/td&gt;
&lt;td&gt;It loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it escalate uncertainty?&lt;/td&gt;
&lt;td&gt;It asks for help&lt;/td&gt;
&lt;td&gt;It hides uncertainty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does it record the adjustment?&lt;/td&gt;
&lt;td&gt;It learns within the session&lt;/td&gt;
&lt;td&gt;It forgets the correction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If an agent fails most of these, it probably does not need a longer prompt first.&lt;/p&gt;

&lt;p&gt;It needs an internal adjustment loop.&lt;/p&gt;
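&lt;p&gt;If you want to apply the 10 questions mechanically, they reduce to a simple audit score. The observation flags here are hypothetical; in practice they would come from reviewing agent transcripts by hand.&lt;/p&gt;

```python
# Sketch: the 10 diagnostic questions as a pass/fail audit.
# Observation flags are hypothetical, gathered from transcript review.

CHECKLIST = [
    "defines_target", "defines_boundaries", "defines_success_checks",
    "simulates_failure", "notices_missing_data", "classifies_feedback",
    "updates_strategy", "knows_when_to_stop", "escalates_uncertainty",
    "records_adjustment",
]

def audit(observations: dict) -> tuple:
    """Score the agent; failing most questions suggests it needs an
    adjustment loop before it needs a longer prompt."""
    score = sum(1 for q in CHECKLIST if observations.get(q, False))
    needs_adjustment_loop = score in range(0, len(CHECKLIST) // 2)
    return score, needs_adjustment_loop
```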




&lt;h2&gt;
  
  
  10. Open Question
&lt;/h2&gt;

&lt;p&gt;I am still testing this framework, so I am more interested in criticism than agreement.&lt;/p&gt;

&lt;p&gt;My current claim is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Long-running AI agents fail when they can perform external action but cannot convert feedback into internal adjustment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am curious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you see the same pattern in your own AI agents?&lt;/li&gt;
&lt;li&gt;Are there failure types this model misses?&lt;/li&gt;
&lt;li&gt;Have you found a better way to train recovery behavior?&lt;/li&gt;
&lt;li&gt;Is "ontology" the wrong word for this, even if the model is useful?&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>promptengineering</category>
      <category>ai</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
