<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Luhui Dev</title>
    <description>The latest articles on Forem by Luhui Dev (@luhuidev).</description>
    <link>https://forem.com/luhuidev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812586%2Fae3fdf39-429c-4db4-8f7a-71835412451a.jpeg</url>
      <title>Forem: Luhui Dev</title>
      <link>https://forem.com/luhuidev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/luhuidev"/>
    <language>en</language>
    <item>
      <title>Dino-GSP Major Update: Algeo SDK 2.0 embedded editing mode is now available</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Sun, 10 May 2026 15:16:35 +0000</pubDate>
      <link>https://forem.com/luhuidev/dino-gsp-major-update-algeo-sdk-20-embedded-editing-mode-is-now-available-5ea</link>
      <guid>https://forem.com/luhuidev/dino-gsp-major-update-algeo-sdk-20-embedded-editing-mode-is-now-available-5ea</guid>
      <description>&lt;p&gt;Videos can be embedded. Documents can be embedded. Spreadsheets can be embedded.&lt;/p&gt;

&lt;p&gt;But what about &lt;strong&gt;geometry&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;For the past decade, whenever a product needed users to draw a geometry problem, edit a dynamic figure, or save an interactive geometry asset, the workflow usually broke in the same place: leave the product, use a separate tool, take a screenshot, and paste it back. That fractured workflow has sat in the middle of education platforms, teaching research systems, and AI math products for years.&lt;/p&gt;

&lt;p&gt;Today, &lt;strong&gt;&lt;a href="https://open.dajiaoai.com/?utm_source=luhuidev" rel="noopener noreferrer"&gt;Algeo SDK 2.0 embedded editing mode&lt;/a&gt;&lt;/strong&gt; is officially available. Geometry is no longer the missing embeddable format. It can now live inside your product like a standard component, with data flowing back into your business system, UI matching your product design, and permissions staying under your own control.&lt;/p&gt;

&lt;p&gt;Here are five common scenarios we see. If any of them sounds like your product, this release is worth a closer look.&lt;/p&gt;



&lt;h2&gt;
  
  
  Scenario 1: online education platforms can let teachers create geometry problems in place
&lt;/h2&gt;

&lt;p&gt;A high school math teacher is preparing tomorrow's geometry lesson on your platform. She needs an example problem about angle proofs in a circle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;: she opened a separate geometry tool, finished the diagram, took a screenshot, and pasted it back into your question bank. The text lived in one place and the image in another. Students saw a static picture that could not be dragged, edited, or reused after the test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now&lt;/strong&gt;: she clicks "insert geometry board" in your question bank admin, and the Algeo editor opens in place. Circles, points, and auxiliary lines are created in the same workflow. When she saves, the board data enters your question bank and is bound to her account, school, and textbook chapter.&lt;/p&gt;

&lt;p&gt;When students open the problem, they can drag a point on the circle and see the angle change directly. Throughout the whole process, &lt;strong&gt;your product stays in control&lt;/strong&gt;: the data is yours, the permissions are yours, the content rights are yours, and the user behavior logs are yours.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp1wgfvlw7awtf2fh0w7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp1wgfvlw7awtf2fh0w7.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p8ejnf8rbpuzqmv26zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p8ejnf8rbpuzqmv26zk.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Scenario 2: AI math products can let AI and students work on the same board
&lt;/h2&gt;

&lt;p&gt;This is one of the fastest-growing customer categories we have seen over the past year.&lt;/p&gt;

&lt;p&gt;A student uploads a photo of a geometry problem. Your AI parses the problem and generates a solution path. But text alone is not enough. The student needs to &lt;strong&gt;see&lt;/strong&gt; why an auxiliary line is drawn that way, and needs to &lt;strong&gt;test by hand&lt;/strong&gt; whether an equality still holds when a point starts moving.&lt;/p&gt;

&lt;p&gt;Algeo embedded editing closes that loop for the first time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After AI parsing, code can generate board content and load it into the editor automatically&lt;/li&gt;
&lt;li&gt;Students interact directly inside your product by dragging, modifying, and trying alternatives&lt;/li&gt;
&lt;li&gt;Every student edit can be sent back to your system as an event and used in the next AI analysis round&lt;/li&gt;
&lt;li&gt;AI can respond to the student's specific change instead of giving a generic explanation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Education is a &lt;strong&gt;feedback loop&lt;/strong&gt;. Text plus static diagrams can no longer carry that loop for geometry. The missing piece is a board that can be driven by code while still giving students hands-on control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6n0vfy5i3xm1pkzdo45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6n0vfy5i3xm1pkzdo45.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9n4as04z8m2r7oainni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9n4as04z8m2r7oainni.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Scenario 3: educational publishing can turn geometry assets into a managed production workflow
&lt;/h2&gt;

&lt;p&gt;In many publishing workflows, geometry illustrations used to operate like a separate workshop: an author drew the figure, a designer remade it as vector art, an editor reviewed it, and a layout designer processed it again. One geometry asset for one problem could pass through four tools and five people.&lt;/p&gt;

&lt;p&gt;After embedding Algeo into a content management system, that pipeline becomes much flatter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authors write problems and draw figures directly in the CMS, with assets stored as structured geometry data rather than images&lt;/li&gt;
&lt;li&gt;Editors can open the original board and revise it directly instead of asking the author to recreate it&lt;/li&gt;
&lt;li&gt;The same geometry data can export to PDF, web, print, and interactive courseware: &lt;strong&gt;draw once, reuse everywhere&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Version control stays inside the CMS, so geometry boards stop being external unmanaged files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For content organizations, this is not just about eliminating one tool. It is about turning geometry into a managed asset.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7rolxl07hg2z4c4qr9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7rolxl07hg2z4c4qr9d.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2lwbdb4kx9f6c1u1r0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2lwbdb4kx9f6c1u1r0u.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Scenario 4: schools and institutions can finally build a shared geometry asset library
&lt;/h2&gt;

&lt;p&gt;Teaching research has an old pain point: Chinese language groups have material libraries, English groups have corpora, math teams have question banks, but &lt;strong&gt;geometry&lt;/strong&gt; often remains scattered. Every teacher has dozens of local geometry source files. They leave with the teacher, disappear with an old computer, and are hard for new teachers to inherit.&lt;/p&gt;

&lt;p&gt;When an institution embeds Algeo into its collaborative teaching research platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Geometry assets enter the institutional asset library and can be organized by subject, grade, and knowledge point&lt;/li&gt;
&lt;li&gt;Teachers can remix the same board while keeping a complete revision history&lt;/li&gt;
&lt;li&gt;New teachers can receive accumulated geometry resources on day one&lt;/li&gt;
&lt;li&gt;Permissions and approvals follow the institution's own rules, including what can be shared broadly and what stays inside a subject group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcaswwaxh7qygkxiadtu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcaswwaxh7qygkxiadtu.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Scenario 5: question banks and homework systems can make geometry a first-class format
&lt;/h2&gt;

&lt;p&gt;Many question bank systems have structured templates for multiple choice, fill-in-the-blank, and written-response questions. &lt;strong&gt;Geometry is often still just an image&lt;/strong&gt;. That creates three limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similar-question recommendation is weak because the system cannot tell whether two geometry problems share the same mathematical structure&lt;/li&gt;
&lt;li&gt;Fine-grained grading is hard because the student's answer often comes back as another image&lt;/li&gt;
&lt;li&gt;Learning analytics are shallow because the system cannot see which construction step caused the student to get stuck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once Algeo turns geometry problems into structured data, these workflows become possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both the problem and the solving process are structured, so the question bank can handle geometry more like algebra&lt;/li&gt;
&lt;li&gt;Every student operation can be reported back, allowing the grading system to locate which point was moved at which step&lt;/li&gt;
&lt;li&gt;Learning analytics can tell a teacher that 70% of a class did not think to draw a specific auxiliary line&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau5z1jvpun8dkvr9b99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau5z1jvpun8dkvr9b99.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  What is ready at the technical level
&lt;/h2&gt;

&lt;p&gt;The scenarios are compelling, but production adoption is always an engineering problem. Algeo SDK 2.0 is designed to be production-ready in several core areas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bidirectional communication with clear data ownership
&lt;/h3&gt;

&lt;p&gt;Every edit, board switch, and save request can be sent back to the host application through &lt;code&gt;postMessage&lt;/code&gt;. &lt;strong&gt;You control the save button&lt;/strong&gt;. The iframe does not bypass your business system to persist anything directly. When to save, where to save, and which permissions are required are all decided by your backend. The SDK only maintains the UI state for saved and unsaved changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully configurable UI that fits into your product
&lt;/h3&gt;

&lt;p&gt;The navigation bar, board list, toolbox, algebra panel, and document panel can each be toggled independently at runtime. In an AI-assisted scenario, the editor can be reduced to a clean canvas. In a professional authoring scenario, the full toolchain can be shown. In advanced integrations, you can even &lt;strong&gt;replace our board list with your own UI&lt;/strong&gt; and drive it through the SDK capability APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Engineered capability layers
&lt;/h3&gt;

&lt;p&gt;The SDK separates editor capabilities into four clear units: board file document, multi-board slides, history, and display mode. Each unit can be called independently, which also gives us room to improve each one over time without breaking the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  Versioned protocol for long-term evolution
&lt;/h3&gt;

&lt;p&gt;Every handshake between the SDK and iframe carries a protocol version. That means an integration you build today can continue to work after future upgrades, while still allowing us to deliver new capabilities without asking you to rewrite the integration every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production-oriented robustness
&lt;/h3&gt;

&lt;p&gt;The SDK includes a 30-second initialization timeout, standardized error codes, a clean destroy lifecycle, and self-hosted base URL support through &lt;code&gt;baseUrl&lt;/code&gt;. These details matter when a real product faces network jitter, CSP rules, and complex route changes in single-page applications. We have already validated the approach in multiple production customer environments.&lt;/p&gt;



&lt;h2&gt;
  
  
  Why choose Dino-GSP and Algeo
&lt;/h2&gt;

&lt;p&gt;There are very few teams in China that can build a &lt;strong&gt;dynamic geometry&lt;/strong&gt; editor at this level. We spent a year making it production-ready, then another release cycle turning it from a product into a component. Geometry as a category really opens up only when it can be installed inside any product.&lt;/p&gt;

&lt;p&gt;If your product contains the word "geometry", whether in K12, higher education, AI math, educational publishing, or teaching research, we would be glad to talk.&lt;/p&gt;

&lt;p&gt;Docs: &lt;a href="https://open.dajiaoai.com/?utm_source=luhuidev" rel="noopener noreferrer"&gt;open.dajiaoai.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/dajiaoai/algeo-sdk" rel="noopener noreferrer"&gt;github.com/dajiaoai/algeo-sdk&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Put a geometry board inside your product, starting today.&lt;/p&gt;

</description>
      <category>luhuidev</category>
      <category>aitools</category>
    </item>
    <item>
      <title>AHE Deep Dive: How Coding Agent Harnesses Automatically Evolve</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Mon, 04 May 2026 15:02:16 +0000</pubDate>
      <link>https://forem.com/luhuidev/ahe-deep-dive-how-coding-agent-harnesses-automatically-evolve-2him</link>
      <guid>https://forem.com/luhuidev/ahe-deep-dive-how-coding-agent-harnesses-automatically-evolve-2him</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When building a coding agent, the capability of your base model is only part of the equation. In real production scenarios, what matters just as much is the &lt;strong&gt;harness&lt;/strong&gt; wrapped around that model — the prompt, tools, middleware, memory, execution environment, trace, and evaluation pipeline.&lt;/p&gt;

&lt;p&gt;This is exactly what the AHE paper addresses: &lt;strong&gt;how to make a coding agent's harness continuously observable, modifiable, testable, rollback-able, and even self-iterating — just like software engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The full paper title is &lt;strong&gt;"Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses"&lt;/strong&gt;, authored by researchers from Fudan University, Peking University, and Shanghai Qiji Zhifeng Co., Ltd. The academic teams bring methodological design, while the industry team contributes experience from Agent/LLM infrastructure and Nex AGI systems.&lt;/p&gt;

&lt;p&gt;Even better, AHE is open source: &lt;code&gt;china-qijizhifeng/agentic-harness-engineering&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This makes it more than just a paper concept — you can directly examine the seed coding agent, evolve agent, experiment configs, traces, manifests, and rollback structures. For anyone building coding agents, agent infrastructure, or broader agent products, this repository is worth dissecting.&lt;/p&gt;

&lt;p&gt;This article explores three questions: why AHE works, how it evolves harnesses, and how to start your own small experiment with the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: A Quick Intro to Harness Engineering
&lt;/h2&gt;

&lt;p&gt;A harness is the external engineering shell that makes a model actually work. In a coding agent, it typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System prompt&lt;/strong&gt;: defines the agent's basic working mode&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt;: file I/O, shell, search, test execution, code modification, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool descriptions&lt;/strong&gt;: what the model sees about tool usage and parameter schemas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Middleware&lt;/strong&gt;: interception, validation, correction, and logging before/after tool calls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;: short-term, long-term, and experience accumulation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context management&lt;/strong&gt;: compression, pruning, and retrieval&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Execution environment&lt;/strong&gt;: sandbox, permissions, runtime isolation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evaluation/observability&lt;/strong&gt;: testing, trace, logs, rewards, failure reports, regression tracking&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structure determines how the model approaches tasks, invokes tools, handles failures, and judges completion.&lt;/p&gt;

&lt;p&gt;For example, when a shell command hangs in production, the solution isn't to keep adding "don't use interactive commands" to the prompt. A more robust approach: add timeout to the shell tool, use middleware to detect high-risk commands, truncate long outputs at the response layer, and enforce state checks before task completion.&lt;/p&gt;
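&lt;p&gt;A minimal sketch of the tool-level part of that fix, assuming a Unix-like shell. The names &lt;code&gt;run_shell&lt;/code&gt;, &lt;code&gt;truncate&lt;/code&gt;, and &lt;code&gt;MAX_OUTPUT_CHARS&lt;/code&gt; are illustrative and are not taken from the AHE repository.&lt;/p&gt;

```python
import subprocess

MAX_OUTPUT_CHARS = 2000  # assumed cap for illustration, not a value from the paper


def truncate(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the head and tail of long output so it does not flood the context."""
    if len(text) > limit:
        half = limit // 2
        omitted = len(text) - 2 * half
        return f"{text[:half]}\n...[{omitted} chars omitted]...\n{text[-half:]}"
    return text


def run_shell(command: str, timeout_s: float = 30.0) -> dict:
    """Run a command non-interactively with a hard timeout."""
    try:
        proc = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            stdin=subprocess.DEVNULL,  # nothing to read, so prompts fail fast
        )
    except subprocess.TimeoutExpired:
        # Report the reason explicitly instead of hanging the agent.
        return {"ok": False, "error": f"timed out after {timeout_s}s", "output": ""}
    return {
        "ok": proc.returncode == 0,
        "error": "" if proc.returncode == 0 else f"exit code {proc.returncode}",
        "output": truncate(proc.stdout + proc.stderr),
    }
```

&lt;p&gt;The point of the design is that the timeout and the truncation hold no matter what the prompt says, so the model cannot forget them.&lt;/p&gt;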

&lt;p&gt;This is the essence of Harness Engineering: putting agent capabilities into a maintainable runtime system.&lt;/p&gt;

&lt;p&gt;I won't dive deeper into the Harness concept here. If you want to learn more, search for keywords like: Harness Engineering, Agent Harness, Agent Runtime, Tool-use Agent, Agent Observability, Agent Evaluation, Coding Agent Infrastructure.&lt;/p&gt;

&lt;p&gt;Let's move to the main focus of this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: AHE's Core Positioning — Self-Iterating Coding Agent Harnesses
&lt;/h2&gt;

&lt;p&gt;AHE stands for &lt;strong&gt;Agentic Harness Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The paper's subtitle contains the key phrase: &lt;strong&gt;Observability-Driven Automatic Evolution of Coding-Agent Harnesses&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This breaks down into three layers:&lt;/p&gt;

&lt;p&gt;First, AHE targets &lt;strong&gt;coding agent harnesses&lt;/strong&gt;. It doesn't train new models or modify base model parameters.&lt;/p&gt;

&lt;p&gt;Second, it performs &lt;strong&gt;automatic evolution&lt;/strong&gt;. The goal isn't a one-time manual prompt tweak, but continuous harness evolution across multiple runs.&lt;/p&gt;

&lt;p&gt;Third, it relies on &lt;strong&gt;observability&lt;/strong&gt;. Changes come from traces, logs, rewards, failure analysis, change manifests — not from vague "self-reflection" in a prompt.&lt;/p&gt;

&lt;p&gt;So AHE's precise positioning is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An automatic evolution framework for coding agent harnesses. Through observable runtime evidence, it continuously improves the agent's surrounding prompt, tools, middleware, memory, skills, and sub-agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the key difference from ordinary prompt optimization. AHE does modify prompts, but its &lt;strong&gt;action space is much larger — it includes tools, middleware, and memory as evolvable structures&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: AHE's Experimental Results
&lt;/h2&gt;

&lt;p&gt;AHE's main experiments ran on Terminal-Bench 2. The paper reports that after 10 iterations, AHE improved the seed harness's pass@1 from &lt;strong&gt;69.7% to 77.0%&lt;/strong&gt;. This shows that on the target benchmark, AHE found effective harness modifications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F0094abe827f1daff598a8586b8943a64be147b4e8c28a826975e7d62ad5546ef" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F0094abe827f1daff598a8586b8943a64be147b4e8c28a826975e7d62ad5546ef" alt="Results Chart" width="1502" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ablation study is even more revealing. The paper replaced different components in full AHE back to the seed harness individually, with roughly these results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fdbc78ed72f2d6e8a0f5afbcb56b474840216fad514f863392b1360362d1ace1d" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fdbc78ed72f2d6e8a0f5afbcb56b474840216fad514f863392b1360362d1ace1d" alt="Ablation Study" width="908" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This result is highly informative.&lt;/p&gt;

&lt;p&gt;If gains mainly came from better system prompts, prompt-only should improve. But in the experiment, prompt-only actually decreased, while memory, tools, and middleware showed more significant improvements.&lt;/p&gt;

&lt;p&gt;This means AHE's key benefits come from structural harness modifications. It also suggests that in complex tasks, many agent failures require harder (more engineering-focused) mechanisms: tool behavior, runtime interception, state recording, long-term experience, regression testing.&lt;/p&gt;

&lt;p&gt;The paper also conducted transfer experiments. When the evolved harness was transferred to SWE-bench Verified, success-rate gains were small, but token usage dropped more noticeably. This suggests AHE's evolved structures may be better at reducing ineffective exploration and context waste.&lt;/p&gt;

&lt;p&gt;Cross-model transfer is also noteworthy. When AHE-generated harnesses were applied to multiple base models, the paper reports positive gains across the board. This indicates the learned components contain some transferable engineering structures.&lt;/p&gt;

&lt;p&gt;My assessment: AHE's prediction of "which changes will fix problems" is significantly better than random, but its prediction of "which changes will cause regressions" is still relatively weak. It does prove that harnesses can be continuously evolved in a file-based, evidence-based, version-controlled manner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4: AHE's Key Workflow — Evaluate, Diagnose, Modify, Verify, Rollback
&lt;/h2&gt;

&lt;p&gt;AHE's main loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[Current Harness] --&amp;gt; B[Run Code Agent on benchmark]
    B --&amp;gt; C[Collect trace, log, reward]
    C --&amp;gt; D[Analyze failure patterns]
    D --&amp;gt; E[Evolve Agent modifies Harness files]
    E --&amp;gt; F[Write change_manifest]
    F --&amp;gt; G[Re-evaluate next round]
    G --&amp;gt; H[Verify if changes work, rollback if needed]
    H -.-&amp;gt; A
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This closed loop has three main actors.&lt;/p&gt;

&lt;p&gt;First is the &lt;strong&gt;Code Agent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the actual agent completing coding tasks, and the object being optimized. In the AHE repository, the seed agent is quite simple — basically a bash-only coding agent.&lt;/p&gt;

&lt;p&gt;Second is the &lt;strong&gt;Agent Debugger&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It reads the Code Agent's execution traces and compresses massive traces into readable failure reports. After a benchmark run, raw traces can be extremely long, making direct model reading too costly. Agent Debugger converts these traces into overviews and per-task analyses, providing evidence for subsequent modifications.&lt;/p&gt;

&lt;p&gt;Third is the &lt;strong&gt;Evolve Agent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It reads the previous round's results, failure analysis, and historical modification records, then modifies harness files in the workspace. Its modification targets include prompts, tools, middleware, memory, skills, sub-agent configs, etc.&lt;/p&gt;

&lt;p&gt;AHE adds strong engineering constraints to this process:&lt;/p&gt;

&lt;p&gt;Every modification must land in files. Every modification requires a manifest. The next round must verify predictions in the manifest. Poor results must be rollback-able. The entire process should leave an auditable evidence chain.&lt;/p&gt;

&lt;p&gt;Rather than reflecting in general terms, the Evolve Agent must answer specific questions: which file was changed, why, which tasks are expected to be fixed, which tasks might be harmed, and whether the next round's results validate this judgment.&lt;/p&gt;
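&lt;p&gt;The loop can be sketched in a few lines of Python. This is a simplified stand-in, not the logic of &lt;code&gt;evolve.py&lt;/code&gt;: the harness is modeled as a dictionary of file contents, &lt;code&gt;evaluate&lt;/code&gt; and &lt;code&gt;propose&lt;/code&gt; are hypothetical callbacks standing in for the benchmark run and the Evolve Agent, and "keep a change only if the score improves" is an assumed rollback rule chosen for brevity.&lt;/p&gt;

```python
import copy
from typing import Callable

Harness = dict[str, str]  # file path -> file content (simplified stand-in)


def evolve(
    seed: Harness,
    evaluate: Callable[[Harness], float],
    propose: Callable[[Harness, float], Harness],
    rounds: int = 3,
) -> tuple[Harness, list[dict]]:
    """Evaluate, modify, re-evaluate, and roll back changes that do not help."""
    best = copy.deepcopy(seed)
    best_score = evaluate(best)
    history = []
    for i in range(1, rounds + 1):
        # The proposer works on a copy, so a rejected change leaves `best` intact.
        candidate = propose(copy.deepcopy(best), best_score)
        score = evaluate(candidate)
        kept = score > best_score
        history.append({"round": i, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
        # Otherwise the candidate is discarded, which is the rollback.
    return best, history
```

&lt;p&gt;The &lt;code&gt;history&lt;/code&gt; list plays the role of the audit trail: every round leaves a record of what was tried and whether it survived.&lt;/p&gt;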

&lt;h2&gt;
  
  
  Part 5: What Evolvable Components Does AHE Break the Harness Into?
&lt;/h2&gt;

&lt;p&gt;AHE's first step is breaking the harness into explicit components.&lt;/p&gt;

&lt;p&gt;The paper emphasizes several evolvable object types:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System Prompt&lt;/strong&gt;: Defines the Code Agent's basic behavior, like executing shell non-interactively, checking state before task completion, not exiting prematurely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Descriptions&lt;/strong&gt;: What the model sees about tools. The tool itself might not change, but if the description changes, so does how the model calls it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Implementations&lt;/strong&gt;: The actual tool implementation. For example, how the shell tool executes commands, handles timeouts, truncates output, returns error messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Middleware&lt;/strong&gt;: Runtime interception layer. It can check before/after tool calls, like detecting dangerous commands, reminding about unverified tasks, blocking premature endings, recording risk states.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills&lt;/strong&gt;: Reusable experience. Think of these as operation manuals for certain task patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-agents&lt;/strong&gt;: Sub-agent configurations. Complex tasks can be split to different roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-term Memory&lt;/strong&gt;: For accumulating experience across tasks and rounds.&lt;/p&gt;

&lt;p&gt;This decomposition gives the Evolve Agent a richer action space. It can choose the right place to intervene based on failure evidence.&lt;/p&gt;

&lt;p&gt;Example: the Code Agent keeps hanging in the shell. The least efficient approach is adding more prompt reminders. AHE's path is more engineering-focused: add a timeout to the shell tool, have middleware check for obviously interactive commands, make return messages state the failure reason explicitly, and add behavioral constraints to the system prompt.&lt;/p&gt;

&lt;p&gt;These structural modifications are more stable and easier to reuse and roll back.&lt;/p&gt;
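&lt;p&gt;As an illustration of the middleware step, here is a small pre-call check that rejects obviously interactive commands before the shell tool runs them. The program list and the names &lt;code&gt;check_command&lt;/code&gt; and &lt;code&gt;with_middleware&lt;/code&gt; are assumptions made for this sketch, not AHE's actual middleware.&lt;/p&gt;

```python
import shlex

# Assumed, illustrative list of programs that usually wait for keyboard input.
INTERACTIVE_PROGRAMS = {"vim", "vi", "nano", "less", "more", "top", "htop"}


def check_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) before the shell tool runs the command."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"could not parse command: {exc}"
    if not tokens:
        return False, "empty command"
    program = tokens[0].rsplit("/", 1)[-1]  # strip any leading directory
    if program in INTERACTIVE_PROGRAMS:
        return False, f"'{program}' is interactive; use a non-interactive alternative"
    return True, "ok"


def with_middleware(tool):
    """Wrap a tool so every call passes through check_command first."""
    def guarded(command: str) -> dict:
        allowed, reason = check_command(command)
        if not allowed:
            # The failure reason goes back to the model as the tool result.
            return {"ok": False, "error": reason, "output": ""}
        return tool(command)
    return guarded
```

&lt;p&gt;Because the check wraps the tool rather than living in the prompt, it can be added, tuned, or reverted as a single file change.&lt;/p&gt;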

&lt;p&gt;The key is understanding the positioning: &lt;strong&gt;prompts are behavioral suggestions; tools, middleware, and memory are execution mechanisms.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AHE's value lies in bringing these execution mechanisms into the evolution scope.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 6: Three Layers of Observability — How AHE Avoids Blind Search
&lt;/h2&gt;

&lt;p&gt;Just having an agent randomly modify files and rerun benchmarks has limited value. AHE's core design is three layers of observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Component Observability
&lt;/h3&gt;

&lt;p&gt;Component observability means the system knows what parts the harness has, where each part is, how to modify it, and how to register it.&lt;/p&gt;

&lt;p&gt;In the AHE repository, prompts, tool descriptions, tool implementations, middleware, memory, etc., all appear as files. New tools need YAML descriptions and Python implementations, plus config registration; new middleware needs explicit integration; new skills or sub-agents also need config exposure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Experience Observability
&lt;/h3&gt;

&lt;p&gt;Experience observability means after an agent runs, the system records how it succeeded or failed.&lt;/p&gt;

&lt;p&gt;AHE collects each task's trace, runtime log, reward, etc. Then Agent Debugger compresses these raw traces into analysis reports.&lt;/p&gt;

&lt;p&gt;When a coding agent fails, simply knowing "it failed" isn't very useful. What you really need to pin down is the kind of failure: command execution failure, dependency installation failure, test not run, file path error, output too long causing context pollution, agent prematurely judging the task complete, or losing previous state in long tasks.&lt;/p&gt;

&lt;p&gt;Through traces and analysis, AHE turns failures into readable, summarizable, actionable evidence.&lt;/p&gt;
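&lt;p&gt;A toy version of that summarization step, assuming each per-task report has already been labeled with a failure category. The report shape and the category names are hypothetical and do not reflect the Agent Debugger's real output format.&lt;/p&gt;

```python
from collections import Counter


def summarize_failures(reports: list[dict]) -> list[tuple[str, int]]:
    """Count failure categories across failed tasks, most frequent first."""
    counts = Counter(r["category"] for r in reports if not r["passed"])
    return counts.most_common()


# Hypothetical per-task reports for one benchmark round.
reports = [
    {"task": "task-a", "passed": False, "category": "command_timeout"},
    {"task": "task-b", "passed": False, "category": "premature_completion"},
    {"task": "task-c", "passed": True, "category": ""},
    {"task": "task-d", "passed": False, "category": "command_timeout"},
]
print(summarize_failures(reports))
# prints [('command_timeout', 2), ('premature_completion', 1)]
```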

&lt;h3&gt;
  
  
  3. Decision Observability
&lt;/h3&gt;

&lt;p&gt;After each modification, the Evolve Agent must write a &lt;code&gt;change_manifest.json&lt;/code&gt;. This manifest records which files were changed, what failure pattern they address, why this component was chosen, which tasks are expected to be fixed, which might regress, and the modification's constraint strength.&lt;/p&gt;

&lt;p&gt;After the next evaluation round, the system checks this manifest to see if predictions came true.&lt;/p&gt;
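
&lt;p&gt;A minimal sketch of that check in plain Python. The field names (&lt;code&gt;expected_fixes&lt;/code&gt;, &lt;code&gt;regression_risks&lt;/code&gt;, and so on), the task IDs, and the results are illustrative assumptions, not AHE's actual schema; the repository is the authority on the real manifest format.&lt;/p&gt;

```python
import json

# Hypothetical manifest; field names are assumptions for illustration only.
manifest = {
    "changed_files": ["tools/run_shell.py"],
    "failure_pattern": "long command output pollutes the context",
    "rationale": "truncate tool output before it reaches the model",
    "expected_fixes": ["task-a", "task-b"],
    "regression_risks": ["task-c"],
    "constraint_strength": "soft",
}


def evaluate_manifest(manifest, before, after):
    """Compare predicted fixes and risks with pass/fail results of two rounds.

    `before` and `after` map task IDs to True (passed) or False (failed).
    """
    fixed = {t for t in after if after[t] and not before.get(t, False)}
    regressed = {t for t in after if before.get(t, False) and not after[t]}
    predicted = set(manifest["expected_fixes"])
    return {
        "confirmed_fixes": sorted(predicted.intersection(fixed)),
        "missed_fixes": sorted(predicted.difference(fixed)),
        "unpredicted_fixes": sorted(fixed.difference(predicted)),
        "regressions": sorted(regressed),
        "foreseen_regressions": sorted(
            regressed.intersection(manifest["regression_risks"])
        ),
    }


# Made-up results for two consecutive evaluation rounds.
before = {"task-a": False, "task-b": False, "task-c": True, "task-d": True}
after = {"task-a": True, "task-b": False, "task-c": False, "task-d": True}

report = evaluate_manifest(manifest, before, after)
print(json.dumps(report, indent=2))
```

&lt;p&gt;Storing a report like this next to the manifest keeps the prediction and its outcome in one place, so a later reader can see which hypotheses held up.&lt;/p&gt;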

&lt;p&gt;This step turns every modification into a verifiable hypothesis. Even if you never use AHE's full automatic evolution pipeline, adopting the change manifest habit in your own agent team can make engineering decisions noticeably more transparent.&lt;/p&gt;

&lt;p&gt;Many agent projects struggle with long-term maintenance precisely because of this: lots of prompt changes, lots of tool adjustments, but nobody knows what each change actually solved, and nobody knows if it introduced new problems. AHE's manifest mechanism at least makes this process auditable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 7: How the AHE Repository Is Organized
&lt;/h2&gt;

&lt;p&gt;The main entry point for the AHE repository is &lt;code&gt;evolve.py&lt;/code&gt;. It orchestrates the entire evolution workflow: initializing the workspace, running evaluations, managing iteration directories, and handling attribution, recovery, and rollback.&lt;/p&gt;

&lt;p&gt;The seed agent being evolved is &lt;code&gt;agents/code_agent_simple/&lt;/code&gt;, which includes:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;code_agent.yaml&lt;/code&gt; describes how this agent loads prompts, which tools it uses, what tracer to use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;systemprompt.md&lt;/code&gt; is the initial system prompt.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LongTermMEMORY.md&lt;/code&gt; and &lt;code&gt;ShortTermMEMORY.md&lt;/code&gt; correspond to long-term and short-term memory interfaces. &lt;code&gt;tool_descriptions/&lt;/code&gt; holds tool descriptions, &lt;code&gt;tools/&lt;/code&gt; holds tool implementations.&lt;/p&gt;

&lt;p&gt;The Evolve Agent is in &lt;code&gt;agents/evolve_agent/&lt;/code&gt;. Key files worth examining:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;evolve_agent.yaml&lt;/code&gt; defines what tools, middleware, and skills the Evolve Agent itself can use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;evolve_prompt.md&lt;/code&gt; is an evolution contract: it specifies that the Evolve Agent can only modify the workspace, must make evidence-based changes, must write summaries and manifests, and must follow the registration rules.&lt;/p&gt;

&lt;p&gt;Config files are in &lt;code&gt;configs/&lt;/code&gt; and &lt;code&gt;configs/experiments/&lt;/code&gt;. &lt;code&gt;configs/base.yaml&lt;/code&gt; is the base config, &lt;code&gt;configs/experiments/exp-simple-code-gpt54.yaml&lt;/code&gt; is a config overlay close to the paper experiments.&lt;/p&gt;

&lt;p&gt;Launch scripts are in &lt;code&gt;scripts/&lt;/code&gt;, like &lt;code&gt;scripts/evolve.sh&lt;/code&gt; for starting long experiments, &lt;code&gt;scripts/build_templates.py&lt;/code&gt; for building task templates for E2B.&lt;/p&gt;

&lt;p&gt;If you just want to understand the project, you don't need to read all files at once. I recommend this reading order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;README
  ↓
agents/code_agent_simple/code_agent.yaml
  ↓
agents/code_agent_simple/systemprompt.md
  ↓
agents/evolve_agent/evolve_prompt.md
  ↓
configs/base.yaml
  ↓
configs/experiments/exp-simple-code-gpt54.yaml
  ↓
evolve.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sequence helps you build concepts first, then see execution details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 8: Getting Started with the Repository — Run a Small Experiment First
&lt;/h2&gt;

&lt;p&gt;AHE is not a lightweight SDK. You can't expect to &lt;code&gt;pip install&lt;/code&gt; and immediately embed it in production systems.&lt;/p&gt;

&lt;p&gt;It's more like a research experiment framework. Running full paper-level experiments requires an LLM API, an E2B sandbox, the SERPER API, benchmark data, concurrent scheduling, and a considerable token budget.&lt;/p&gt;

&lt;p&gt;So a more realistic onboarding approach is to run a minimal closed loop first.&lt;/p&gt;

&lt;p&gt;Set the goal as: get AHE's core pipeline running.&lt;/p&gt;

&lt;p&gt;That is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph LR
    A[Task execution] --&amp;gt; B[Trace generation]
    B --&amp;gt; C[Analysis generation]
    C --&amp;gt; D[change_manifest written]
    D --&amp;gt; E[Next round re-evaluation]
    E --&amp;gt; F[change_evaluation&amp;lt;br&amp;gt;judges modification effect]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this pipeline works, you understand AHE's practical value.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clone the Repository
&lt;/h3&gt;

&lt;p&gt;Official repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/china-qijizhifeng/agentic-harness-engineering.git
cd agentic-harness-engineering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Install Dependencies
&lt;/h3&gt;

&lt;p&gt;The project uses &lt;code&gt;uv&lt;/code&gt; to manage Python dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure Environment Variables
&lt;/h3&gt;

&lt;p&gt;Copy the environment variable template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At minimum, pay attention to these variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_API_KEY
LLM_BASE_URL
E2B_API_KEY
SERPER_API_KEY
GITHUB_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent Debugger can also configure model endpoints separately. Refer to &lt;code&gt;.env.example&lt;/code&gt; for specifics.&lt;/p&gt;

&lt;p&gt;One important note: AHE's task execution depends on the E2B sandbox, and much of the code execution happens in isolated remote environments. This helps with security and reproducibility, but it also means you need an E2B account and credits.&lt;/p&gt;
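
&lt;p&gt;Before launching anything, a quick pre-flight check can save a failed run. The sketch below only inspects the current process environment, so it assumes the variables from &lt;code&gt;.env&lt;/code&gt; have already been exported into the shell; how AHE itself loads the file is not something this sketch relies on. The helper name &lt;code&gt;missing_vars&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
import os

# Variable names as listed above; .env.example remains the authoritative list.
REQUIRED_VARS = [
    "LLM_API_KEY",
    "LLM_BASE_URL",
    "E2B_API_KEY",
    "SERPER_API_KEY",
    "GITHUB_TOKEN",
]


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or blank in the given mapping."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_vars(os.environ)
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All listed variables are set.")
```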

&lt;h3&gt;
  
  
  4. Prepare Benchmark Task Templates
&lt;/h3&gt;

&lt;p&gt;The official workflow requires building task templates first. Example command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv run python scripts/build_templates.py --dataset-dir /path/to/dataset -j 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/path/to/dataset&lt;/code&gt; with your actual task data path.&lt;/p&gt;

&lt;p&gt;If you're just doing a small experiment, I don't recommend preparing full Terminal-Bench 2 at the start. Select a few tasks and get the pipeline working first — that's more important.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Start with a Small Config
&lt;/h3&gt;

&lt;p&gt;For paper experiment config, refer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;configs/experiments/exp-simple-code-gpt54.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the full config is quite costly. Copy a small config, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp configs/experiments/exp-simple-code-gpt54.yaml configs/experiments/exp-mini.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reduce the parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max_iterations: 2
harbor:
  k: 2
  n_concurrent: 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the config supports specifying task subsets, use only 3 to 5 tasks. The point of a small experiment is validating the workflow, not chasing scores.&lt;/p&gt;
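
&lt;p&gt;Conceptually, the mini config is the full experiment config with a few keys overridden. The sketch below shows that overlay idea as a recursive dictionary merge. The "full" values and the &lt;code&gt;dataset&lt;/code&gt; key are placeholders, and whether AHE merges its YAML files in exactly this way is an assumption; only &lt;code&gt;max_iterations&lt;/code&gt;, &lt;code&gt;harbor.k&lt;/code&gt;, and &lt;code&gt;harbor.n_concurrent&lt;/code&gt; come from the snippet above.&lt;/p&gt;

```python
import copy


def apply_overlay(base, overlay):
    """Return a copy of `base` with `overlay` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder "full" values; the real ones live in the experiment YAML.
full_config = {
    "max_iterations": 10,
    "harbor": {"k": 5, "n_concurrent": 32, "dataset": "terminal-bench-2"},
}

# The reduced parameters from the snippet above.
mini_overlay = {"max_iterations": 2, "harbor": {"k": 2, "n_concurrent": 4}}

mini_config = apply_overlay(full_config, mini_overlay)
print(mini_config)
```

&lt;p&gt;Keys that the overlay does not mention are carried over unchanged, which is why a mini config only needs to list what differs.&lt;/p&gt;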

&lt;h3&gt;
  
  
  6. Launch the Evolution Experiment
&lt;/h3&gt;

&lt;p&gt;You can use the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/evolve.sh configs/experiments/exp-mini.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or look inside the script to see how it calls &lt;code&gt;evolve.py&lt;/code&gt;, then manually launch as needed.&lt;/p&gt;

&lt;p&gt;Full experiments can run for a long time. Even small experiments require attention to API costs, E2B concurrency limits, and network stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Look at Experiment Artifacts, Not Just Scores
&lt;/h3&gt;

&lt;p&gt;After running, don't just look at pass rate.&lt;/p&gt;

&lt;p&gt;The artifacts below are more worth examining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runs/iteration_*/
analysis/overview.md
analysis/detail/*.md
change_manifest.json
change_evaluation.json
agent/nexau_in_memory_tracer.cleaned.json
verifier/reward.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running, focus on observing and answering these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What patterns were this round's failures attributed to?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Which files did Evolve Agent change?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why did it choose to change these files?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Which tasks does the manifest predict will be fixed?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Did the next round verify this prediction?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Were there cases where fixing one task broke another?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can find answers to all these questions in the artifacts, it means AHE's core closed loop is working.&lt;/p&gt;
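
&lt;p&gt;A small helper can tell you at a glance whether a run produced everything needed to answer those questions. This is a sketch: the file names come from the artifact list above, but their exact locations inside a run directory are an assumption, so the search is recursive. The demo builds a throwaway directory rather than reading a real run.&lt;/p&gt;

```python
import tempfile
from pathlib import Path

# File names taken from the artifact list above; where each one sits inside
# a run directory is an assumption, hence the recursive search.
ARTIFACT_NAMES = [
    "overview.md",
    "change_manifest.json",
    "change_evaluation.json",
    "nexau_in_memory_tracer.cleaned.json",
    "reward.txt",
]


def find_artifacts(run_dir):
    """Map each artifact file name to the sorted paths found under run_dir."""
    root = Path(run_dir)
    return {
        name: sorted(str(p.relative_to(root)) for p in root.rglob(name))
        for name in ARTIFACT_NAMES
    }


def missing_artifacts(run_dir):
    """Return artifact names with no match, a hint that the loop is incomplete."""
    return [name for name, paths in find_artifacts(run_dir).items() if not paths]


# Demo on a temporary directory that mimics a partially finished iteration.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "iteration_1"
    (run / "analysis").mkdir(parents=True)
    (run / "analysis" / "overview.md").write_text("# overview\n")
    (run / "change_manifest.json").write_text("{}")
    still_missing = missing_artifacts(run)
    print(still_missing)
```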

&lt;h2&gt;
  
  
  Part 9: What AHE Hasn't Solved Yet
&lt;/h2&gt;

&lt;p&gt;AHE is valuable, but its boundaries should be clear too.&lt;/p&gt;

&lt;p&gt;First, it's still a research framework. Full runs aren't cheap, requiring benchmarks, sandboxes, LLM APIs, and fairly complex experiment configs.&lt;/p&gt;

&lt;p&gt;Second, the effectiveness evidence in the paper needs more replication experiments. The improvement on Terminal-Bench 2 is clear, but strong statistical conclusions would need more seeds, more campaigns, and reported confidence intervals.&lt;/p&gt;

&lt;p&gt;Third, its prediction of regression risk isn't strong enough. The system is better at explaining what a modification might fix, but not as good at judging what it might harm. This is a hard problem for automatic evolution systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 10: AHE's Inspiration for Agent Product Teams
&lt;/h2&gt;

&lt;p&gt;AHE's biggest lesson for product-focused agent teams is that it pulls agent improvement out of "mystical prompt tuning" and back into the engineering world.&lt;/p&gt;

&lt;p&gt;A real agent product will eventually face these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;After a user reports an error, how do you reproduce it?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How do you aggregate failure causes?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Did a certain prompt modification actually help?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Did a tool change regress other scenarios?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Is there regression testing before release?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Can you roll back if production performance degrades?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How do you distill effective experience into memory or skills?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No single model can solve these problems for you.&lt;/p&gt;

&lt;p&gt;They fall within the scope of harness engineering.&lt;/p&gt;

&lt;p&gt;If you're also building your own agent, this repository is worth dissecting thoroughly. Even without running it end to end, you can learn a lot about how to organize a harness, design traces, attribute modifications, and verify regressions.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses&lt;br&gt;
arXiv: &lt;a href="https://arxiv.org/abs/2604.25850" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2604.25850&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AHE Official Code Repository&lt;br&gt;
GitHub: &lt;a href="https://github.com/china-qijizhifeng/agentic-harness-engineering" rel="noopener noreferrer"&gt;https://github.com/china-qijizhifeng/agentic-harness-engineering&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Harness engineering: leveraging Codex in an agent-first world&lt;br&gt;
OpenAI Engineering Blog: &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;https://openai.com/index/harness-engineering/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🙋&lt;br&gt;
&lt;em&gt;I’m &lt;a href="http://luhuidev.com/" rel="noopener noreferrer"&gt;Luhui Dev&lt;/a&gt;, a developer who has been breaking down Agent engineering and exploring how AI can be applied in education.&lt;br&gt;
I focus on Agent Harness, LLM application engineering, AI for Math, and the productization of education SaaS.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>luhuidev</category>
    </item>
    <item>
      <title>Why Signatures Make Automatic Optimization Easier Than Writing Prompts Directly</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Wed, 22 Apr 2026 13:40:26 +0000</pubDate>
      <link>https://forem.com/luhuidev/why-signatures-make-automatic-optimization-easier-than-writing-prompts-directly-35k1</link>
      <guid>https://forem.com/luhuidev/why-signatures-make-automatic-optimization-easier-than-writing-prompts-directly-35k1</guid>
      <description>&lt;p&gt;A great discovery from my recent project work: &lt;strong&gt;DSPy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While building the content generation pipeline for Canviz, I encountered a recurring engineering problem—it was extremely difficult to maintain stable "problem explanation quality + canvas script usability" through prompts alone. Whenever I switched models or added new grade levels, I had to re-tune the entire string of prompts. DSPy offered me a systematic solution that's worth sharing separately.&lt;/p&gt;



&lt;h2&gt;
  
  
  The Fundamental Contradiction of Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;Before diving into DSPy, I need to clarify one thing: &lt;strong&gt;Why is writing prompts an engineering problem, not just a matter of technique?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional prompts have a fatal design flaw: &lt;strong&gt;they mix "what I want to do" with "how to tell the model to do it."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That natural language prompt you write simultaneously handles two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Describing the task's &lt;strong&gt;logic&lt;/strong&gt; (what inputs to accept, what outputs to produce);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The "incantation" tuned for &lt;strong&gt;this specific model&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Take a math teaching scenario as an example—the logic of "explaining a chicken-and-rabbit problem" is eternal, but the incantation to make GPT explain it well versus making Claude Sonnet explain it well can be quite different. Once you switch models, or change from third grade to fifth grade, that incantation might fail. Worse yet, there's no systematic way to fix it—you can only rely on intuition and trial-and-error.&lt;/p&gt;

&lt;p&gt;This is what software engineering calls the hard-coding problem. For ordinary logic, we've long learned not to hard-code; but for AI pipelines, we willingly lock the most core logic into a fragile string.&lt;/p&gt;

&lt;p&gt;DSPy's author, Stanford's Omar Khattab, describes this problem roughly like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Existing LM pipelines are typically implemented using hard-coded prompt templates, discovered through trial and error, and extremely brittle."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;



&lt;h2&gt;
  
  
  What is DSPy? What's Its Core Insight?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DSPy (Declarative Self-improving Python)&lt;/strong&gt; is a framework open-sourced by Stanford NLP Lab in 2023, published at ICLR 2024. Its core proposition is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Programming language models, not prompting them.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It offers an elegant solution: &lt;strong&gt;completely separate the task's interface description from the specific prompt implementation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You only need to tell DSPy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What this step inputs and outputs;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What the logical structure of the entire pipeline is;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What your evaluation criteria are.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then DSPy's &lt;strong&gt;Compiler&lt;/strong&gt; and &lt;strong&gt;Optimizer&lt;/strong&gt; search for a well-performing prompt for you, tailored to your chosen model, your data, and your metrics.&lt;/p&gt;

&lt;p&gt;To borrow the official analogy: &lt;strong&gt;This is like jumping from assembly language to high-level languages, or from writing raw SQL to using an ORM.&lt;/strong&gt;&lt;/p&gt;
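
&lt;p&gt;The separation can be illustrated without DSPy at all. The toy sketch below is not how DSPy is implemented, and every name in it (&lt;code&gt;TaskSpec&lt;/code&gt;, &lt;code&gt;render_plain&lt;/code&gt;, &lt;code&gt;render_tagged&lt;/code&gt;) is made up for illustration. It only shows the idea that one fixed task description can be rendered into different prompt layouts, which is the part an optimizer is then free to vary per model.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """What a step does: its instruction, inputs, and outputs. No prompt text."""
    instruction: str
    inputs: tuple
    outputs: tuple


def render_plain(spec, values):
    """One possible prompt layout: instruction first, then labelled fields."""
    lines = [spec.instruction]
    lines += [f"{name}: {values[name]}" for name in spec.inputs]
    lines += [f"{name}:" for name in spec.outputs]
    return "\n".join(lines)


def render_tagged(spec, values):
    """A second layout for the same spec, using bracketed section markers."""
    parts = [f"[task] {spec.instruction}"]
    parts += [f"[{name}] {values[name]}" for name in spec.inputs]
    parts.append("[respond with] " + ", ".join(spec.outputs))
    return "\n".join(parts)


explain = TaskSpec(
    instruction="Explain the math problem for the given grade.",
    inputs=("problem", "grade"),
    outputs=("explanation", "key_concept"),
)
values = {
    "problem": "35 heads and 94 legs: how many chickens and rabbits?",
    "grade": 3,
}

# The task definition stays fixed; only the rendering strategy changes.
for render in (render_plain, render_tagged):
    print(render(explain, values))
    print("---")
```

&lt;p&gt;Switching models or grade levels then means choosing or searching over renderers, while &lt;code&gt;TaskSpec&lt;/code&gt; itself stays untouched.&lt;/p&gt;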



&lt;h2&gt;
  
  
  Three Core Concepts to Understand DSPy's Full Picture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Signature: Type Signature of Tasks
&lt;/h3&gt;

&lt;p&gt;Signature is DSPy's interface description. It tells the framework what this step does, not how to do it, using a type-declaration-like approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExplainMathProblem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Explain a math problem to students of a specified grade, using language appropriate to their cognitive level.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original text of the math problem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Student grade level, e.g., 3 for third grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Step-by-step explanation suitable for the grade, friendly and easy to understand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key_concept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Core concept tested by this problem, explained in one sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice: you haven't written any prompt at all. This only contains &lt;strong&gt;the semantics of the interface&lt;/strong&gt;, without any "you are a gentle and patient math teacher..." type of prompting.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Module: Composable Functional Units
&lt;/h3&gt;

&lt;p&gt;Module is DSPy's execution unit, inspired by PyTorch's &lt;code&gt;nn.Module&lt;/code&gt;. You can compose them like building blocks to construct a complete teaching content generation pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MathLessonPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 1: Explain the problem
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ExplainMathProblem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 2: Generate corresponding Dinogsp geometry visualization script based on explanation
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_diagram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem, explanation -&amp;gt; dinogsp_script: str&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 3: Create a practice problem of the same type
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_exercise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem, key_concept, grade -&amp;gt; exercise: str, answer: str&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Explain
&lt;/span&gt;        &lt;span class="n"&gt;step1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;explain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Generate diagram
&lt;/span&gt;        &lt;span class="n"&gt;step2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_diagram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explanation&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Create practice problem
&lt;/span&gt;        &lt;span class="n"&gt;step3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_exercise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;key_concept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key_concept&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;grade&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;grade&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dinogsp_script&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dinogsp_script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;exercise&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exercise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;step3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This entire three-step pipeline doesn't contain a single word of prompt—everything written is &lt;strong&gt;logic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;DSPy includes several classic reasoning strategy modules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Corresponding Reasoning Method&lt;/th&gt;
&lt;th&gt;Application in Teaching&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dspy.Predict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Direct prediction&lt;/td&gt;
&lt;td&gt;Problem difficulty grading, concept tagging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dspy.ChainOfThought&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chain of Thought (CoT)&lt;/td&gt;
&lt;td&gt;Step-by-step problem-solving explanation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dspy.ReAct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reasoning and acting loop (ReAct)&lt;/td&gt;
&lt;td&gt;Calling external tools to validate scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dspy.ProgramOfThought&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Program of Thought (PoT)&lt;/td&gt;
&lt;td&gt;Generating executable math calculation code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Optimizer: Automatic Tuning Engine
&lt;/h3&gt;

&lt;p&gt;This is where DSPy's most distinctive value lies.&lt;/p&gt;

&lt;p&gt;You need to provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An evaluation dataset (e.g., 100 problems, each with manually annotated good explanation samples);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An evaluation metric function (to judge whether the generated explanation is good).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then call the optimizer, which will automatically search for the optimal combination of prompt instructions and few-shot examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define evaluation metric: whether explanation is age-appropriate, whether diagram script is parseable
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lesson_quality_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;explanation_ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Basic length
&lt;/span&gt;    &lt;span class="n"&gt;script_parseable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;validate_dinogsp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dinogsp_script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Script usability
&lt;/span&gt;    &lt;span class="n"&gt;grade_appropriate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check_vocabulary_level&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grade&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Age-appropriate vocabulary
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;explanation_ok&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;script_parseable&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;grade_appropriate&lt;/span&gt;

&lt;span class="c1"&gt;# Optimize using MIPROv2
&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MIPROv2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lesson_quality_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;optimized_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;MathLessonPipeline&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;annotated_lessons&lt;/span&gt;  &lt;span class="c1"&gt;# Your annotated data
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save results, load directly in production without re-optimization
&lt;/span&gt;&lt;span class="n"&gt;optimized_pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./optimized_math_lesson.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A medium-level optimization run typically costs a few dollars (around $2 on 100 samples) and takes about 20 minutes. The result is a teaching content generation system tuned automatically for your chosen model and your grade-level data.&lt;/p&gt;



&lt;h2&gt;
  
  
  Looking at the Data
&lt;/h2&gt;

&lt;p&gt;DSPy's official documentation reports an impressive result:&lt;/p&gt;

&lt;p&gt;On the HotPotQA multi-hop reasoning task (which requires combining information across documents, much like the logical structure of math word problems), running &lt;code&gt;dspy.ReAct&lt;/code&gt; with &lt;code&gt;gpt-4o-mini&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Before optimization: 24% accuracy&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;After MIPROv2 optimization with 500 samples: 51% accuracy&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Accuracy more than doubled, not by switching to a more expensive model but by teaching the smaller model to handle this type of task better.&lt;/p&gt;



&lt;h2&gt;
  
  
  The Essential Difference from LangChain/LlamaIndex
&lt;/h2&gt;

&lt;p&gt;You might wonder how DSPy differs from LangChain—for instance, if you're already using LangChain, do you need to switch?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain / LlamaIndex&lt;/strong&gt; are tool chain orchestration frameworks. They connect components like LLMs, vector databases, and tool calls, but the prompts themselves are still strings written by humans. If you switch models, you still have to manually modify the prompts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DSPy&lt;/strong&gt; is an AI program compilation framework. It doesn't just connect components—it takes over the generation and optimization of prompts. Humans are responsible for writing the logic, while it translates that into the most effective natural language instructions for a specific model.&lt;/p&gt;

&lt;p&gt;Specifically for math teaching scenarios: if you built a "generate third-grade explanations" pipeline with LangChain, and tomorrow the product requires fifth-grade support, you need to manually go back and modify all related prompt strings—because the vocabulary and logical depth requirements for fifth grade have changed. With DSPy, you only change the input parameter &lt;code&gt;grade=5&lt;/code&gt;, then rerun compilation, and the framework will automatically adjust the internal prompting strategy.&lt;/p&gt;

&lt;p&gt;If I were to make an analogy: LangChain is an automated assembly line, while DSPy is a high-level language with a JIT compiler.&lt;/p&gt;



&lt;h2&gt;
  
  
  My Developer Perspective: What It Solves, What's Still Missing
&lt;/h2&gt;

&lt;p&gt;After all this praise, I should also mention what I think it still lacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What DSPy truly solves:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pain of model migration&lt;/strong&gt;: When switching from GPT-5.4 to the cheaper Kimi 2.5, you recompile once instead of manually rewriting prompts;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-step joint optimization&lt;/strong&gt;: Explanation quality and diagram script usability were previously hard to optimize at the same time, but DSPy's optimizer tunes the whole pipeline against a single metric that covers both;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reproducible experiments&lt;/strong&gt;: Optimization results are saved as JSON, so they can be shared with the team and version-controlled. Goodbye to "which document has that best-performing prompt we used before?"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evaluation metrics are the hard part&lt;/strong&gt;: Functions like &lt;code&gt;validate_dinogsp()&lt;/code&gt; need to be written by you, and writing them well isn't easy. DSPy's optimization effectiveness highly depends on metric quality—vague metrics lead the optimizer to game the system;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimization isn't free&lt;/strong&gt;: Medium-level optimization on 100 samples costs about $2; if you have multiple grade levels and problem types, costs will rise significantly as data volume increases;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Debugging experience is still maturing&lt;/strong&gt;: When an optimized pipeline still underperforms, it's sometimes hard to determine whether it's insufficient data, flawed metrics, or the model's inherent capability boundary.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
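
&lt;p&gt;One way to make a metric harder to game is to return a graded score instead of a single pass/fail. The sketch below is only an illustration: &lt;code&gt;graded_lesson_metric&lt;/code&gt; is a name introduced here, and the two helpers are placeholder stubs standing in for the real &lt;code&gt;validate_dinogsp()&lt;/code&gt; and &lt;code&gt;check_vocabulary_level()&lt;/code&gt; you would write yourself. It assumes the DSPy convention that a metric may return a number during evaluation and should be strict when &lt;code&gt;trace&lt;/code&gt; is set during bootstrapping.&lt;/p&gt;

```python
def validate_dinogsp(script):
    """Hypothetical stub: treat any non-empty script as parseable."""
    return bool(script.strip())


def check_vocabulary_level(text, grade):
    """Hypothetical stub: cap word length by grade as a crude proxy."""
    max_len = 6 + 2 * grade
    return all(max_len >= len(word) for word in text.split())


def graded_lesson_metric(example, prediction, trace=None):
    """Score a lesson as the fraction of quality checks it passes."""
    checks = {
        "explanation_ok": len(prediction.explanation) > 50,
        "script_parseable": validate_dinogsp(prediction.dinogsp_script),
        "grade_appropriate": check_vocabulary_level(
            prediction.explanation, example.grade
        ),
    }
    score = sum(checks.values()) / len(checks)
    if trace is not None:
        # While bootstrapping demos, accept only fully passing outputs.
        return score == 1.0
    return score
```

&lt;p&gt;A partial score shows how close a prediction came to passing, which also helps when you are trying to tell flawed metrics apart from weak data or model limits.&lt;/p&gt;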



&lt;h2&gt;
  
  
  When Should You Use DSPy?
&lt;/h2&gt;

&lt;p&gt;If you're encountering any of the following situations, it's worth seriously considering DSPy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ Very suitable:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You're building multi-step LLM pipelines (explanation + diagram + practice problems is exactly this structure)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need to switch between different models (cost control, or selecting different capability models by age group)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You have an evaluation dataset and want quantifiable improvement in effectiveness&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You're tired of modifying prompts by feel and want a systematic optimization method&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your application needs long-term maintenance in production&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;⚠️ Not quite suitable:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You're just quickly validating an idea and don't need long-term maintenance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The task has no clear evaluation metrics, leaving the optimizer with nothing to work with&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I think DSPy's approach is valuable because it offers &lt;strong&gt;a more reliable engineering mindset&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Prompts in AI pipelines are essentially &lt;strong&gt;parameters&lt;/strong&gt; of the program, not the program's &lt;strong&gt;source code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just as I wouldn't hard-code neural network weights into source code, I shouldn't treat prompts tuned for a specific model as the program logic itself. Like weights, prompts should be systematically learnable, optimizable, savable, and transferable.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;logic&lt;/strong&gt; of teaching content is stable—step-by-step, illustrated, age-appropriate expression; but &lt;strong&gt;how to guide the model to achieve all this&lt;/strong&gt; will constantly change with model updates, grade expansions, and problem type additions. Using DSPy to separate the two enables a truly maintainable AI teaching system.&lt;/p&gt;



&lt;p&gt;🙋‍♀️ &lt;em&gt;If you're also working on AI education, feel free to &lt;a href="https://luhuidev.com/en" rel="noopener noreferrer"&gt;connect&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DSPy Official Documentation: &lt;a href="https://dspy.ai" rel="noopener noreferrer"&gt;dspy.ai&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paper: &lt;a href="https://arxiv.org/abs/2310.03714" rel="noopener noreferrer"&gt;DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, ICLR 2024&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub: &lt;a href="https://github.com/stanfordnlp/dspy" rel="noopener noreferrer"&gt;stanfordnlp/dspy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optimizer Details: &lt;a href="https://dspy.ai/learn/optimization/optimizers/" rel="noopener noreferrer"&gt;dspy.ai/learn/optimization/optimizers&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Struggling with Research Figures? Here's How Multi-Agent Collaboration Gets It Right</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Sat, 11 Apr 2026 08:51:32 +0000</pubDate>
      <link>https://forem.com/luhuidev/struggling-with-research-figures-heres-how-multi-agent-collaboration-gets-it-right-2pka</link>
      <guid>https://forem.com/luhuidev/struggling-with-research-figures-heres-how-multi-agent-collaboration-gets-it-right-2pka</guid>
      <description>&lt;h1&gt;
  
  
  Struggling with Research Figures? Here's How Multi-Agent Collaboration Gets It Right
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Problem Every Researcher Knows Too Well
&lt;/h2&gt;

&lt;p&gt;Anyone who's done research knows this pain: creating a single figure from concept to completion can be more exhausting than writing the actual paper. You need logical structure, data precision, and style compliance—miss any one of these, and you're back to the drawing board.&lt;/p&gt;

&lt;p&gt;Single-model AI generation tools often produce beautiful images with broken logic, or logically sound diagrams that look terrible, or worst of all—figures where all the proportions are completely off.&lt;/p&gt;

&lt;p&gt;PaperBanana solved this problem, and it works remarkably well. The key insight? &lt;strong&gt;Break the task into multiple roles and let an AI team collaborate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F218018b2e853dd507f5f7584dfdb9fb246edd46bc7ffc814445b9f62d34b1f09" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F218018b2e853dd507f5f7584dfdb9fb246edd46bc7ffc814445b9f62d34b1f09" alt="image.png" width="760" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Traditional AI Falls Short
&lt;/h2&gt;

&lt;p&gt;Many assume that throwing a large language model at the problem should work. But research figures aren't ordinary illustrations—they need to &lt;strong&gt;accurately express logic&lt;/strong&gt;, &lt;strong&gt;ensure data precision&lt;/strong&gt;, and ultimately meet academic journal aesthetics.&lt;/p&gt;

&lt;p&gt;A single model can't nail all three at once. The result? Either gorgeous images with completely wrong logic, or logically correct diagrams that look like they're from the '90s, and almost always with numerical proportions that make no sense.&lt;/p&gt;

&lt;p&gt;This is the core pain point of research figure generation, and exactly why solutions like PaperBanana emerged.&lt;/p&gt;

&lt;h2&gt;
  
  
  PaperBanana's Five-Role Collaboration
&lt;/h2&gt;

&lt;p&gt;PaperBanana's design philosophy is simple: &lt;strong&gt;Split the generation task into five specialized roles, let each focus on what they do best, then collaborate iteratively.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Visual Workflow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Ffe549e3b91d7fc3b05cd5bd00de0736024847c57f36ddf8340eb6e34ff1e4c36" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Ffe549e3b91d7fc3b05cd5bd00de0736024847c57f36ddf8340eb6e34ff1e4c36" alt="image.png" width="600" height="1075"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Retriever — The Inspiration Board&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Retriever searches through a curated reference database to find the most relevant examples.&lt;/p&gt;

&lt;p&gt;It focuses on &lt;strong&gt;visual structure matching&lt;/strong&gt;, ensuring that subsequent generation has reliable layout references to work from.&lt;/p&gt;

&lt;p&gt;Think of it like a designer browsing templates before starting to sketch—that's what the Retriever does.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Planner — The Skeleton Designer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Planner is the core brain. It transforms paper descriptions and figure objectives into detailed figure plans, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Figure components (nodes/modules)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logical relationships and arrow directions between components&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Spatial layout suggestions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Labels, annotations, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Planner's core job is to provide the skeleton, preventing the generation from going off the rails.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Stylist — The Aesthetic Director&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With the skeleton in place, the Stylist handles the aesthetics.&lt;/p&gt;

&lt;p&gt;It extracts colors, fonts, line weights, and shapes from reference examples, optimizing the Planner's output to meet journal standards.&lt;/p&gt;

&lt;p&gt;NeurIPS and Nature have different figure styles—the Stylist ensures generated figures comply with academic norms.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Visualizer — The Executor&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Visualizer generates figures based on the standardized plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Method figures&lt;/strong&gt; → Rendered using high-quality image generation models&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data charts&lt;/strong&gt; → Outputs &lt;strong&gt;reproducible Matplotlib code&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means generated figures aren't just pretty—they're directly usable as research materials, reproducible and modifiable.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Critic — The QA/Feedback Loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Critic is key to closing the loop. It checks whether the figure faithfully reflects the text, whether it's clear, and whether it meets style specifications.&lt;/p&gt;

&lt;p&gt;If unsatisfied, it provides revision suggestions, prompting the Planner/Visualizer to iterate. Usually 2–3 rounds produce high-quality figures.&lt;/p&gt;
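
&lt;p&gt;The control flow of the five roles can be sketched in a few lines. This is an illustration of the loop described above, not PaperBanana's actual code: every role is passed in as a plain callable, and the names &lt;code&gt;run_figure_pipeline&lt;/code&gt;, &lt;code&gt;Review&lt;/code&gt;, and &lt;code&gt;max_rounds&lt;/code&gt; are assumptions made for this example.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """Critic verdict: approval flag plus revision suggestions."""
    approved: bool
    suggestions: list = field(default_factory=list)


def run_figure_pipeline(description, retriever, planner, stylist,
                        visualizer, critic, max_rounds=3):
    """Retrieve once, then plan, style, render, and review in a loop."""
    references = retriever(description)
    feedback = []
    figure = None
    for round_number in range(1, max_rounds + 1):
        # The planner sees the critic's suggestions from the last round.
        plan = planner(description, references, feedback)
        styled_plan = stylist(plan, references)
        figure = visualizer(styled_plan)
        review = critic(description, figure)
        if review.approved:
            return figure, round_number
        feedback = review.suggestions
    # Round budget exhausted: return the latest attempt.
    return figure, max_rounds
```

&lt;p&gt;Capping the loop with &lt;code&gt;max_rounds&lt;/code&gt; keeps cost bounded when the Critic never approves, and the returned round count makes it easy to check how many iterations a figure needed.&lt;/p&gt;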

&lt;h2&gt;
  
  
  Why Multi-Role Collaboration Works
&lt;/h2&gt;

&lt;p&gt;Compared to single-model end-to-end generation, PaperBanana has three major advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reference-driven&lt;/strong&gt;: The Retriever provides structural and stylistic examples, making generation more reliable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clear division of labor&lt;/strong&gt;: Logic, style, and rendering are separated, avoiding the chaos of black-box generation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Closed-loop self-checking&lt;/strong&gt;: Critic + iteration makes figure quality controllable&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In other words, this is a &lt;strong&gt;process innovation&lt;/strong&gt; for AI-assisted research figure creation. In experiments, PaperBanana significantly outperformed baselines in fidelity, readability, and aesthetics.&lt;/p&gt;

&lt;p&gt;If you're interested in the design of this scenario, I've compiled &lt;a href="https://luhuidev.com/zh-cn/essays/paperbanana-ai-academic-method-figure-collaboration" rel="noopener noreferrer"&gt;the complete prompt set&lt;/a&gt; for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Academic Figures
&lt;/h2&gt;

&lt;p&gt;This multi-role collaboration pattern isn't limited to academic illustrations.&lt;/p&gt;

&lt;p&gt;The same pattern applies to flowcharts, experimental design diagrams, teaching demonstrations, and automated data visualization, and even to complex tasks like code generation and decision planning, where splitting the work across roles tends to be more reliable than a single end-to-end model.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/2601.23265" rel="noopener noreferrer"&gt;PaperBanana: Automating Academic Illustration for AI Scientists (arXiv)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://paper-banana.ai/" rel="noopener noreferrer"&gt;PaperBanana Official Site&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://hyper.ai/en/papers/2601.23265" rel="noopener noreferrer"&gt;PaperBananaBench Dataset and Evaluation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Dino-GSP Major Update: dynamic geometry demos, geometry embeds, and AI drawing upgrades</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Tue, 07 Apr 2026 12:52:04 +0000</pubDate>
      <link>https://forem.com/luhuidev/dino-gsp-major-update-dynamic-geometry-demos-geometry-embeds-and-ai-drawing-upgrades-33ep</link>
      <guid>https://forem.com/luhuidev/dino-gsp-major-update-dynamic-geometry-demos-geometry-embeds-and-ai-drawing-upgrades-33ep</guid>
      <description>&lt;p&gt;&lt;strong&gt;Dino-GSP 2.4.0 was released on March 23, 2026.&lt;/strong&gt; This update is not just a list of extra features. It connects &lt;strong&gt;dynamic geometry demos, online geometry embeds, region area calculation, and AI geometry drawing&lt;/strong&gt; into a more complete workflow.&lt;/p&gt;

&lt;p&gt;If you are comparing &lt;strong&gt;dynamic geometry software, online geometry tools, math teaching tools, or interactive geometry platforms&lt;/strong&gt; for lessons, content, or websites, this release deserves attention.&lt;/p&gt;



&lt;h2&gt;
  
  
  Dino-GSP 2.4.0 at a glance
&lt;/h2&gt;

&lt;p&gt;This release focuses on four high-frequency needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slider-based dynamic demos&lt;/strong&gt; that make geometry figures actually move&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geometry embed mode&lt;/strong&gt; for blogs, course pages, and product sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boolean region operations and area calculation&lt;/strong&gt; for more complex analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broader AI geometry assistance&lt;/strong&gt; that fits real creation workflows&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  1. Dynamic geometry demos upgraded: sliders are now a first-class feature
&lt;/h2&gt;

&lt;p&gt;The point of dynamic geometry is not just drawing figures. It is showing parameter changes, geometric relationships, and reasoning processes in motion. The latest Dino-GSP release fully rounds out slider support and makes it much closer to a real &lt;strong&gt;dynamic geometry software&lt;/strong&gt; workflow for classrooms and content creation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fa58a39769f251141c6d3bcc585fbe820c405b114b6dcf46b09dd74b187e26622" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fa58a39769f251141c6d3bcc585fbe820c405b114b6dcf46b09dd74b187e26622" alt="10.gif" width="760" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This upgrade includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create and edit dynamic parameters&lt;/strong&gt;: sliders can directly control lengths, angles, and point positions, with figures updating in real time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text-linked values&lt;/strong&gt;: slider values can be inserted into explanatory text so teaching copy updates together with the figure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autoplay support&lt;/strong&gt;: presentation and sharing modes support autoplay, speed adjustment, and looping for lessons and recorded demos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More complete exports&lt;/strong&gt;: sliders can be exported to SVG and TikZ while preserving labels and control styles for papers, handouts, and blogs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pushes Dino-GSP beyond a static geometry board and makes it more suitable for &lt;strong&gt;interactive geometry demos&lt;/strong&gt;, classroom walkthroughs, and parameter-driven explanations.&lt;/p&gt;



&lt;h2&gt;
  
  
  2. Geometry embed mode arrives: the online geometry tool can now live inside web pages
&lt;/h2&gt;

&lt;p&gt;For course builders, bloggers, and documentation teams, the ability to embed geometry into a page is a practical requirement. The latest Dino-GSP release adds a full &lt;strong&gt;geometry embed mode&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Where this helps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Embedding interactive geometry into teaching blogs&lt;/li&gt;
&lt;li&gt;Showing manipulable math demos inside online courses&lt;/li&gt;
&lt;li&gt;Adding interactive diagrams to product sites or knowledge bases&lt;/li&gt;
&lt;li&gt;Preserving parameter control and geometry state in shared pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F838d76b316c3d441659c1ccc341f35117ebd7dda3619aee0fbb80b01e6267b2f" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F838d76b316c3d441659c1ccc341f35117ebd7dda3619aee0fbb80b01e6267b2f" alt="image.png" width="2648" height="1540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 What is included
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A complete embed architecture&lt;/strong&gt;: dedicated routing, state synchronization, and communication bridging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iframe export&lt;/strong&gt;: exportable iframe links with configurable aspect ratios for different layouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REPL integration&lt;/strong&gt;: embedded surfaces can load and edit geometry content, so the experience goes beyond passive viewing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fed3e488fd355a98c966cece2aed70ffe343370f5474174f95e26d44740a859c7" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fed3e488fd355a98c966cece2aed70ffe343370f5474174f95e26d44740a859c7" alt="image.png" width="2574" height="1449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F1247a1fc8ef4f49ef2c48f31b8be4aea7e074c1068dbabe260e3bf7067c6d8d2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F1247a1fc8ef4f49ef2c48f31b8be4aea7e074c1068dbabe260e3bf7067c6d8d2" alt="image.png" width="2495" height="1543"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  3. Region area calculation and boolean operations improved: analysis is more complete
&lt;/h2&gt;

&lt;p&gt;If you need to work with overlapping shapes, composite figures, or region logic, this release strengthens the analytical layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F534a98eb2178cfe915e6e3724f00f8dfac755cb0dd98c4ac29d24e7b2e5986be" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F534a98eb2178cfe915e6e3724f00f8dfac755cb0dd98c4ac29d24e7b2e5986be" alt="image.png" width="1280" height="751"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The update includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Boolean path operations&lt;/strong&gt;: intersection, union, and difference for more complex region construction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region area calculation&lt;/strong&gt;: direct area calculation plus &lt;code&gt;contains&lt;/code&gt; checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision fixes&lt;/strong&gt;: better handling of boundary precision issues, negative radii, and undefined dependencies.&lt;/li&gt;
&lt;/ol&gt;
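
&lt;p&gt;To make the area and &lt;code&gt;contains&lt;/code&gt; ideas concrete, here is a minimal sketch for simple polygons. It is not Dino-GSP's implementation, which also handles arcs and boolean results. The function names and the list-of-tuples vertex format are assumptions for this example. Area uses the shoelace formula, and the containment check uses ray casting.&lt;/p&gt;

```python
def polygon_area(vertices):
    """Shoelace formula; abs() makes the result independent of winding."""
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(vertices, point):
    """Ray casting: an odd number of edge crossings means inside."""
    px, py = point
    inside = False
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        # Only edges that straddle the horizontal line through the point.
        if (y1 > py) != (y2 > py):
            crossing_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if crossing_x > px:
                inside = not inside
    return inside
```

&lt;p&gt;Without the absolute value, the shoelace sum is negative for clockwise vertex order, so a region tool has to normalize orientation somewhere. Points exactly on an edge are a boundary case this sketch does not try to settle.&lt;/p&gt;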

&lt;p&gt;This matters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solving geometry problems involving overlapping areas&lt;/li&gt;
&lt;li&gt;Verifying region relationships in teaching contexts&lt;/li&gt;
&lt;li&gt;Building composite paths for cleaner exports&lt;/li&gt;
&lt;li&gt;Running more stable geometry computation workflows&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  4. Master management is now available: keep diagram styles consistent at scale
&lt;/h2&gt;

&lt;p&gt;If you produce many teaching diagrams or worksheet visuals, repeated style setup quickly becomes inefficient. The latest release adds &lt;strong&gt;master management&lt;/strong&gt; to improve content production efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F107f62f13c64ff4c1d3f379ac9ca8002ef6a553bfd65b96479ee1f35231eb9ba" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F107f62f13c64ff4c1d3f379ac9ca8002ef6a553bfd65b96479ee1f35231eb9ba" alt="image.png" width="1280" height="639"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the master panel directly from the editor tabs&lt;/li&gt;
&lt;li&gt;Create, update, apply, and delete masters&lt;/li&gt;
&lt;li&gt;Set default styles and preview them in real time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For teachers, geometry creators, and worksheet teams, this improves batch production more than one-off drawing speed.&lt;/p&gt;



&lt;h2&gt;
  
  
  5. AI geometry drawing keeps improving: a smarter geometry assistant
&lt;/h2&gt;

&lt;p&gt;Dino-GSP has been pushing AI toward an executable geometry assistant, not just a chat box. This AI update is part of that broader workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fc7ba50c8650058ab2e99fee1b28fd9d8026cfad1d0f35eda4d51b3e20b017e07" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fc7ba50c8650058ab2e99fee1b28fd9d8026cfad1d0f35eda4d51b3e20b017e07" alt="image.png" width="2077" height="1220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main AI improvements include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Usage and credit records&lt;/strong&gt;: clearer tracking for AI costs and consumption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image upload entry points&lt;/strong&gt;: users can upload sketches or images and be routed to image-capable models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better conversation tools&lt;/strong&gt;: copy, reaction, and feedback support for a more stable interaction loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clearer instruction display&lt;/strong&gt;: formatting, truncation, and expansion improve readability for complex prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Animation support&lt;/strong&gt;: AI can help create geometry animations and assist with keyframes and motion paths.&lt;/li&gt;
&lt;/ol&gt;



&lt;h2&gt;
  
  
  6. Axes, grids, and algebra definitions continue to improve
&lt;/h2&gt;

&lt;p&gt;Beyond the larger features, this release also includes lower-level upgrades that affect daily use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fac9ae40a0c1226863e8592fa029a3429f7024777df4148f8ee21303f20ab24a1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fac9ae40a0c1226863e8592fa029a3429f7024777df4148f8ee21303f20ab24a1" alt="image.png" width="2077" height="1220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Coordinate system and grid
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Custom grid ranges are supported&lt;/li&gt;
&lt;li&gt;Axis point selection can lock intelligently&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pi&lt;/code&gt; and &lt;code&gt;pi/2&lt;/code&gt; spacing are supported&lt;/li&gt;
&lt;li&gt;X and Y ranges, labels, and intervals are more configurable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6.2 Automatic algebra definition reordering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Object order is adjusted automatically when algebra definitions change&lt;/li&gt;
&lt;li&gt;Circular dependency detection and error prompts are supported&lt;/li&gt;
&lt;/ul&gt;
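
&lt;p&gt;Reordering with cycle detection is essentially a topological sort over the dependency graph. The sketch below shows the general technique with Python's standard &lt;code&gt;graphlib&lt;/code&gt; module. It is not Dino-GSP's code, and &lt;code&gt;order_definitions&lt;/code&gt; and the dictionary format are assumptions made for illustration.&lt;/p&gt;

```python
from graphlib import CycleError, TopologicalSorter


def order_definitions(dependencies):
    """Order names so each definition comes after what it depends on.

    `dependencies` maps a name to the set of names its expression uses.
    Raises ValueError describing the cycle if definitions are circular.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        # args[1] holds the nodes of the detected cycle, in order.
        cycle = " -> ".join(error.args[1])
        raise ValueError(f"circular dependency: {cycle}") from error
```

&lt;p&gt;For example, if &lt;code&gt;c&lt;/code&gt; uses &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, and &lt;code&gt;b&lt;/code&gt; uses &lt;code&gt;a&lt;/code&gt;, the only valid order is &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;. Two definitions that reference each other raise an error that an editor can surface as a prompt.&lt;/p&gt;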



&lt;h2&gt;
  
  
  7. More upgrades across drawing and sharing workflows
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7.1 Geometry and drawing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New orthogonal drawing mode&lt;/li&gt;
&lt;li&gt;Better ellipse arc editing&lt;/li&gt;
&lt;li&gt;Added arrow styles&lt;/li&gt;
&lt;li&gt;Dynamic anchor support for labels&lt;/li&gt;
&lt;li&gt;Formula editor symbols better aligned with classroom math notation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fd9942534705e4ecccee109eaa89209a0284f04917b4c031b2f923ce029534f1c" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fd9942534705e4ecccee109eaa89209a0284f04917b4c031b2f923ce029534f1c" alt="image.png" width="2574" height="1347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  7.2 Interaction and interface
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Floating toolbar for union selection, color settings, and hover hints&lt;/li&gt;
&lt;li&gt;More line and point styling options&lt;/li&gt;
&lt;li&gt;Clearer property panel structure&lt;/li&gt;
&lt;li&gt;Input width adjusts dynamically with expression count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7.3 Sharing and SEO
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Community sharing can control whether AI chat records are public&lt;/li&gt;
&lt;li&gt;Shared works can restrict saving and remixing&lt;/li&gt;
&lt;li&gt;Shared pages support dynamic titles and descriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes Dino-GSP better not just for drawing, but also for &lt;strong&gt;distribution, discoverability, and search visibility&lt;/strong&gt;.&lt;/p&gt;



&lt;h2&gt;
  
  
  8. Which day-to-day issues were fixed
&lt;/h2&gt;

&lt;p&gt;This release also fixes a large number of practical issues, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Region computation&lt;/strong&gt;: negative area, path restoration, arc judgment, and precision flicker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sliders&lt;/strong&gt;: style copying, step and speed defaults, snapping, previews, and history behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selection&lt;/strong&gt;: deselect with Shift, incorrect select-all behavior, and function graph box selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exports&lt;/strong&gt;: inconsistencies across SVG, LaTeX, and Canvas, plus font embedding and clipping offsets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool compatibility&lt;/strong&gt;: grid snapping, compass and transform tool errors, file jumps, and copy/paste&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  Try Dino-GSP
&lt;/h2&gt;

&lt;p&gt;If you are comparing geometry software, math teaching tools, or embeddable dynamic geometry options, this version is now a much stronger reference point.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://dajiaoai.com/?utm_source=luhuidev" rel="noopener noreferrer"&gt;Try Dino-GSP now&lt;/a&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  About Dino-GSP
&lt;/h3&gt;

&lt;p&gt;Dino-GSP is a tool for math teaching, geometry creation, and online sharing. It combines a geometry engine, AI assistance, and professional export capabilities into a more modern geometry workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>news</category>
      <category>science</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Embed a Geometry Canvas in Your Webpage with One Line of Code</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Tue, 17 Mar 2026 13:46:02 +0000</pubDate>
      <link>https://forem.com/luhuidev/embed-a-geometry-canvas-in-your-webpage-with-one-line-of-code-2clg</link>
      <guid>https://forem.com/luhuidev/embed-a-geometry-canvas-in-your-webpage-with-one-line-of-code-2clg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Many products actually need &lt;strong&gt;geometry capabilities&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Online education platforms need to display geometric shapes in courses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Question bank systems need to create diagrams for math problems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI tutors need to draw diagrams dynamically when explaining problems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lesson plan and courseware tools need to generate mathematical graphics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A geometry canvas is actually a very complex software system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you build it yourself, you quickly run into a pile of problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Geometric object management (points, lines, circles, angles, curves)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Intersection calculation and constraint computation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Graphics rendering and drag-and-drop interaction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-canvas management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;File format and sharing system&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these capabilities amount to a complete product in their own right.&lt;/p&gt;

&lt;p&gt;Many teams end up choosing between &lt;strong&gt;static images&lt;/strong&gt; and integrating an &lt;strong&gt;existing geometry system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Recently, we did something interesting: &lt;strong&gt;We turned a geometry canvas into a component that can be directly embedded in webpages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers only need one line of code to put a complete geometry canvas into their own products.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Geometry Canvas That Can Be Embedded in Webpages
&lt;/h2&gt;

&lt;p&gt;The Dino-GSP (大角几何) Open Platform provides an &lt;strong&gt;embeddable geometry canvas SDK&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Developers can embed the geometry canvas into their own web applications just like using a frontend component.&lt;/p&gt;

&lt;p&gt;The core concept is actually quite simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your webpage
   ↓
Embed geometry canvas
   ↓
Gain complete geometry capabilities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No need to develop your own geometry engine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No need to implement geometry calculations yourself&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No need to write complex interaction logic yourself&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just embed it and use it.&lt;/p&gt;

&lt;p&gt;Dino-GSP (大角几何) is officially designed to serve as “geometry capability infrastructure”: its SDK, API, REPL, and other interfaces make geometry capabilities embeddable in more products and systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Simplest Way: Direct Embedding
&lt;/h2&gt;

&lt;p&gt;If you just want to display a geometric figure, the simplest method is &lt;strong&gt;iframe embedding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;iframe&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://dajiaoai.com/e/33TA3484"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"800"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"600"&lt;/span&gt; &lt;span class="na"&gt;allow=&lt;/span&gt;&lt;span class="s"&gt;"fullscreen"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/iframe&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This embeds a geometry canvas directly in a webpage.&lt;/p&gt;

&lt;p&gt;Suitable scenarios include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Displaying geometric figures on teaching pages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedding mathematical graphics in blog articles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Showing dynamic figures in online textbooks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No additional development work required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Approach: Using the SDK
&lt;/h2&gt;

&lt;p&gt;If you want deeper control over the canvas, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dynamically loading graphics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switching canvases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Importing files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Calling geometry operations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can use the &lt;strong&gt;SDK integration approach&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;First, install the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @dajiaoai/algeo-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create a canvas on the page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AlgeoSdk&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@dajiaoai/algeo-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;algeo-container&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;AlgeoSdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;initialId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;33TA3484&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a geometry canvas instance.&lt;/p&gt;

&lt;p&gt;You can then operate it through the API, for example:&lt;/p&gt;

&lt;p&gt;Load shared content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadShareById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;33TA3484&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get canvas count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSlideCount&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Switch canvas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;switchSlide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Developers can use the geometry canvas as a &lt;strong&gt;programmable component&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Very Interesting Capability: REPL
&lt;/h2&gt;

&lt;p&gt;In addition to regular APIs, Dino-GSP (大角几何) also provides a &lt;strong&gt;REPL interface&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Simply put, it lets you control the geometry system directly with commands.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Define geometric objects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query graphic states&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Execute geometry operations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The REPL outputs structured text, which makes it convenient for AI or agent systems to call.&lt;/p&gt;

&lt;p&gt;This means that in the future the canvas will not be operated only by humans: &lt;strong&gt;AI can also call geometry capabilities directly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is why we call it an &lt;strong&gt;AI-native geometry capability interface.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Products Is This Suitable For?
&lt;/h2&gt;

&lt;p&gt;The embeddable geometry canvas is actually suitable for many products.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Online Education Platforms
&lt;/h3&gt;

&lt;p&gt;Directly embed geometric figures in course pages, supporting drag-and-drop and dynamic demonstrations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F1a919fe1a3c596dce7d3270701c17000157b9403b4cc42d37298b18be476731d" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F1a919fe1a3c596dce7d3270701c17000157b9403b4cc42d37298b18be476731d" alt="image.png" width="600" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Question Bank Systems
&lt;/h3&gt;

&lt;p&gt;Automatically generate or load geometric figures for math problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F7f28e602b8519daf692a7dbaf4979155c3f05acd536ba8adcf9845e846e1f7d1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F7f28e602b8519daf692a7dbaf4979155c3f05acd536ba8adcf9845e846e1f7d1" alt="image.png" width="600" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. AI Tutors
&lt;/h3&gt;

&lt;p&gt;Draw diagrams dynamically when explaining geometry problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F9a6c0311e26881878ec3f8bd499a2ad3cb8d1fb5bb64ff8acc9dd60524b20cfc" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F9a6c0311e26881878ec3f8bd499a2ad3cb8d1fb5bb64ff8acc9dd60524b20cfc" alt="image.png" width="600" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Math Content Platforms
&lt;/h3&gt;

&lt;p&gt;Directly embed geometric figures in articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F40c1b6716b7618e647785e0fc2421e998f95cab6c8d10201048afa5773e13d68" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F40c1b6716b7618e647785e0fc2421e998f95cab6c8d10201048afa5773e13d68" alt="image.png" width="760" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Independent Developer Tools
&lt;/h3&gt;

&lt;p&gt;Quickly build a math tool without developing your own geometry engine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6bd2cecffbc991f7c6bcb53e806e60606e1e9306830a6b4cf183b14a2a1c143a" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6bd2cecffbc991f7c6bcb53e806e60606e1e9306830a6b4cf183b14a2a1c143a" alt="image.png" width="600" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Built This Open Platform
&lt;/h2&gt;

&lt;p&gt;Over the past year, while working on the geometry system, I've had a deep realization: &lt;strong&gt;geometry capability is actually a fundamental capability for many products.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But the market currently offers few options: either complete software (like GeoGebra) or simple graphics libraries.&lt;/p&gt;

&lt;p&gt;What is missing is a way &lt;strong&gt;to call geometry capabilities like an API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So the Dino-GSP (大角几何) Open Platform aims to let more products use geometry capabilities directly, without reinventing the wheel.&lt;/p&gt;

&lt;p&gt;👉 Dino-GSP (大角几何) Open Platform: &lt;a href="https://open.dajiaoai.com/en/" rel="noopener noreferrer"&gt;open.dajiaoai.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>ai4math</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AlphaGeometry DSL Guide: Google Geometry DSL, defs.txt Actions, and Predicates</title>
      <dc:creator>Luhui Dev</dc:creator>
      <pubDate>Sun, 08 Mar 2026 07:35:44 +0000</pubDate>
      <link>https://forem.com/luhuidev/alphageometry-dsl-guide-google-geometry-dsl-defstxt-actions-and-predicates-479j</link>
      <guid>https://forem.com/luhuidev/alphageometry-dsl-guide-google-geometry-dsl-defstxt-actions-and-predicates-479j</guid>
      <description>&lt;p&gt;This article focuses only on AlphaGeometry DSL itself. It does not cover model training, search strategy, or paper results.&lt;/p&gt;

&lt;p&gt;The goal is to treat the DSL as an engineering-facing protocol document and answer four questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How problem input is encoded&lt;/li&gt;
&lt;li&gt;How actions are defined in &lt;code&gt;defs.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;How geometric relations are mapped into predicates&lt;/li&gt;
&lt;li&gt;How numerical construction and symbolic reasoning are connected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to reproduce AlphaGeometry, build a geometry data generator, or design a compatibility layer for a custom solver, understanding the DSL protocol is a prerequisite.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Role of the DSL
&lt;/h2&gt;

&lt;p&gt;AlphaGeometry DSL is a domain-specific language for geometric construction and relation expression. It mainly serves 3 purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Express initial geometric premises&lt;/li&gt;
&lt;li&gt;Express executable construction actions&lt;/li&gt;
&lt;li&gt;Express target geometric goals to be verified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its output is not a final proof text, but a set of geometric relations consumable by a reasoning engine.&lt;/p&gt;

&lt;p&gt;From an implementation perspective, the DSL is closer to an intermediate representation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upstream, it connects to problem descriptions or data generators&lt;/li&gt;
&lt;li&gt;In the middle, it connects to action definitions and relation expansion&lt;/li&gt;
&lt;li&gt;Downstream, it connects to &lt;code&gt;rules.txt&lt;/code&gt; and DDAR reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The protocol is centered on relation generation rather than diagram drawing.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Problem File Structure
&lt;/h2&gt;

&lt;p&gt;A complete problem is usually written as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;problem_name
premises ? goal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can be decomposed into 3 sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;a b c = triangle;&lt;/code&gt;
Initial premises and free objects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;h = on_tline b a c, on_tline c a b&lt;/code&gt;
Construction based on known objects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;? perp a h b c&lt;/code&gt;
Target predicate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This DSL fragment is not a natural-language solution. It is a geometric program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Premises
  -&amp;gt; constructions
  -&amp;gt; predicate graph
  -&amp;gt; goal checking
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After parsing, the system usually needs to produce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An initial object table&lt;/li&gt;
&lt;li&gt;An action invocation sequence&lt;/li&gt;
&lt;li&gt;An initial predicate set&lt;/li&gt;
&lt;li&gt;A target to be checked&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  3. Basic Syntax Conventions
&lt;/h2&gt;

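&lt;p&gt;Schematically, the core expression form of the DSL is the following (the concrete syntax separates arguments with spaces rather than parentheses):&lt;br&gt;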
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output = action(parameters)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;h = on_tline b a c, on_tline c a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that point &lt;code&gt;h&lt;/code&gt; satisfies two construction constraints at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;h&lt;/code&gt; lies on the line through &lt;code&gt;b&lt;/code&gt; perpendicular to &lt;code&gt;ac&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;h&lt;/code&gt; lies on the line through &lt;code&gt;c&lt;/code&gt; perpendicular to &lt;code&gt;ab&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, &lt;code&gt;h&lt;/code&gt; is the intersection of the two perpendicular lines.&lt;/p&gt;

&lt;p&gt;This style has two main properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output variables are uniformly represented as point variables&lt;/li&gt;
&lt;li&gt;Geometric objects are represented implicitly through point sets rather than independent object types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;line a b
circle o a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;line a b&lt;/code&gt; denotes the line defined by points &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;circle o a&lt;/code&gt; denotes the circle centered at &lt;code&gt;o&lt;/code&gt; passing through &lt;code&gt;a&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design simplifies the parser and relation graph structure, but it requires the predicate system to be expressive enough to cover higher-level semantics of lines, circles, and angles.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Action Definition Structure in &lt;code&gt;defs.txt&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;defs.txt&lt;/code&gt; is the action registry. Each action usually contains 5 parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;action_name outputs inputs

variable dependency

input conditions

geometric constraints

numerical constructions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The role of each part is as follows.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Action signature
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;midpoint x a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The action name is &lt;code&gt;midpoint&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The output point is &lt;code&gt;x&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The input points are &lt;code&gt;a b&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Variable dependency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x : a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means output variable &lt;code&gt;x&lt;/code&gt; depends on inputs &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This part is typically used for dependency graphs or variable scope management.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Input validity conditions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a b = diff a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; must be distinct points.&lt;/p&gt;

&lt;p&gt;This layer is used to prevent degenerate constructions and does not directly generate proof relations.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. Geometric constraints
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x : coll x a b, cong x a x b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This means the following predicates must hold after the action is completed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;coll x a b&lt;/code&gt;, meaning &lt;code&gt;x&lt;/code&gt; is collinear with &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cong x a x b&lt;/code&gt;, meaning &lt;code&gt;XA = XB&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This part defines the symbolic semantics of the action and is the main input consumed by the downstream reasoning system.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. Numerical construction interface
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;midp a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This means the numerical engine should invoke a midpoint construction.&lt;/p&gt;

&lt;p&gt;Typical uses of the numerical layer include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating concrete coordinate instances&lt;/li&gt;
&lt;li&gt;Checking whether a construction degenerates&lt;/li&gt;
&lt;li&gt;Providing numerical truth checks for predicates&lt;/li&gt;
&lt;/ul&gt;
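&lt;p&gt;The five-part layout above can be read mechanically. The following Python sketch splits one definition block into labelled fields. It assumes every part occupies exactly one non-empty line, as in the &lt;code&gt;midpoint&lt;/code&gt; example, and the names &lt;code&gt;ActionDef&lt;/code&gt; and &lt;code&gt;parse_action&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ActionDef:
    name: str
    args: list         # point names from the signature, outputs first
    dependency: str    # raw variable dependency line, e.g. "x : a b"
    conditions: str    # raw input validity line, e.g. "a b = diff a b"
    constraints: list  # predicates as token lists
    numerical: list    # numerical construction call as tokens


def parse_action(block):
    """Split one five-line definition block into its labelled parts."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    if len(lines) != 5:
        raise ValueError("expected 5 non-empty lines, got %d" % len(lines))
    signature, dependency, conditions, constraints, numerical = lines
    name, *args = signature.split()
    # "x : coll x a b, cong x a x b": drop the "x :" prefix, then split on commas.
    predicate_text = constraints.split(":", 1)[1]
    predicates = [p.split() for p in predicate_text.split(",")]
    return ActionDef(name, args, dependency, conditions, predicates, numerical.split())


midpoint = parse_action("""
midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b
""")
print(midpoint.name, midpoint.args)
print(midpoint.constraints)
print(midpoint.numerical)
```

&lt;p&gt;Keeping the symbolic constraints and the numerical call in separate fields mirrors the split described above: one side feeds the reasoning engine, the other feeds coordinate generation.&lt;/p&gt;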


&lt;h2&gt;
  
  
  5. Predicate System
&lt;/h2&gt;

&lt;p&gt;Predicates are the core input format of the AlphaGeometry reasoning system.&lt;/p&gt;

&lt;p&gt;Common core predicates include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Predicate&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;coll A B C&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Three points are collinear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cong A B C D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AB = CD&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;perp A B C D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AB ⟂ CD&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;para A B C D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AB ∥ CD&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eqangle ...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Angles are equal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cyclic A B C D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Four points are concyclic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These predicates enter the rule system defined in &lt;code&gt;rules.txt&lt;/code&gt; and trigger further inference.&lt;/p&gt;

&lt;p&gt;A typical flow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;construction
  -&amp;gt; initial predicates
  -&amp;gt; rule firing
  -&amp;gt; new predicates
  -&amp;gt; goal reached / not reached
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the core value of the DSL is not the action catalog itself, but its predicate generation capability.&lt;/p&gt;

&lt;p&gt;Whether an action is useful mainly depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which predicates it introduces&lt;/li&gt;
&lt;li&gt;Whether those predicates are likely to trigger rule chains&lt;/li&gt;
&lt;li&gt;Whether they significantly shorten the proof path to the goal&lt;/li&gt;
&lt;/ul&gt;
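&lt;p&gt;To make the flow from initial predicates to goal checking concrete, here is a toy forward-chaining loop in Python. The two rules are illustrative geometric facts chosen for this sketch, not entries copied from &lt;code&gt;rules.txt&lt;/code&gt;, and the function names are hypothetical. It is far simpler than DDAR.&lt;/p&gt;

```python
def rule_perp_symmetry(facts):
    """perp A B C D  implies  perp C D A B."""
    new = set()
    for name, *pts in facts:
        if name == "perp":
            a, b, c, d = pts
            new.add(("perp", c, d, a, b))
    return new


def rule_para_perp(facts):
    """para A B C D and perp C D E F  imply  perp A B E F."""
    new = set()
    for f in facts:
        for g in facts:
            if f[0] == "para" and g[0] == "perp" and f[3:5] == g[1:3]:
                new.add(("perp", f[1], f[2], g[3], g[4]))
    return new


def saturate(facts, rules, goal, max_rounds=10):
    """Fire every rule until the goal appears or nothing new is derived."""
    known = set(facts)
    for _ in range(max_rounds):
        if goal in known:
            return True
        derived = set()
        for rule in rules:
            derived |= rule(known)
        if derived.issubset(known):
            return False  # fixed point reached without the goal
        known |= derived
    return goal in known


initial = {("para", "a", "b", "c", "d"), ("perp", "e", "f", "c", "d")}
rules = [rule_perp_symmetry, rule_para_perp]
print(saturate(initial, rules, ("perp", "a", "b", "e", "f")))
print(saturate(initial, rules, ("perp", "a", "b", "x", "y")))
```

&lt;p&gt;In this toy, the goal is reached only because the second fact is first flipped by the symmetry rule. That is a small instance of the point above: an action is useful to the extent that its predicates let rule chains fire.&lt;/p&gt;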


&lt;h2&gt;
  
  
  6. Common Action Types
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Basic objects
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;free a
triangle a b c
quadrangle a b c d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;free&lt;/code&gt; generates a free point&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;triangle&lt;/code&gt; generates the 3 base points of a triangle&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;quadrangle&lt;/code&gt; generates the 4 base points of a quadrilateral&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Points on a line or circle
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;on_line x a b
on_circle x o a
on_pline x a b c
on_tline x a b c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;on_line&lt;/code&gt; corresponds to a collinearity constraint&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_circle&lt;/code&gt; corresponds to an equal-radius constraint&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_pline&lt;/code&gt; corresponds to a parallel constraint&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_tline&lt;/code&gt; corresponds to a perpendicular constraint&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. Intersection constructions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;intersection_ll x a b c d
intersection_lc x a o b
intersection_cc x o w a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These represent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;line-line intersection&lt;/li&gt;
&lt;li&gt;line-circle intersection&lt;/li&gt;
&lt;li&gt;circle-circle intersection&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  4. Basic geometric constructions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;midpoint x a b
foot x a b c
mirror x a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;midpoint&lt;/code&gt; generates a midpoint&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;foot&lt;/code&gt; generates a foot of the perpendicular&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mirror&lt;/code&gt; generates a symmetric point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key property of these actions is that a single invocation introduces several relations at once.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. Triangle centers
&lt;/h3&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;circumcenter x a b c
incenter x a b c
excenter x a b c
centroid x y z i a b c
ninepoints x y z i a b c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These actions typically introduce multiple relation groups at once, such as equidistance, angle bisection, and perpendicular bisectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Special polygons
&lt;/h3&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;square a b x y
rectangle a b c d
parallelogram a b c x
trapezoid a b c d
eq_trapezoid a b c d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These primitives carry stronger built-in relations from the start, which makes them better suited to generating structured problems or high-constraint training samples.&lt;/p&gt;


&lt;h2&gt;
  
  
  7. Worked Example
&lt;/h2&gt;

&lt;p&gt;Consider the orthocenter problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution process is as follows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Parse the premises
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;triangle a b c&lt;/code&gt; generates the base point set and non-degeneracy conditions.&lt;/p&gt;
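
&lt;p&gt;The parsing step can be sketched as follows. This is a simplified reader for the layout shown above (a name line, clauses separated by &lt;code&gt;;&lt;/code&gt;, and a goal after &lt;code&gt;?&lt;/code&gt;); the function name &lt;code&gt;parse_problem&lt;/code&gt; and the dictionary layout are hypothetical and do not mirror the actual parser.&lt;/p&gt;

```python
def parse_problem(text):
    """Split a problem into its name, construction clauses, and goal."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name, body = lines[0], " ".join(lines[1:])
    premises, goal = body.split("?")
    clauses = []
    for clause in premises.split(";"):
        if not clause.strip():
            continue
        # Left of "=": the new points. Right of "=": one or more actions.
        points, constructions = clause.split("=")
        clauses.append({
            "points": points.split(),
            "actions": [c.split() for c in constructions.split(",")],
        })
    return {"name": name, "clauses": clauses, "goal": goal.split()}

problem = parse_problem("""
orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
""")
print(problem["clauses"][1]["actions"])
# [['on_tline', 'b', 'a', 'c'], ['on_tline', 'c', 'a', 'b']]
print(problem["goal"])
# ['perp', 'a', 'h', 'b', 'c']
```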

&lt;h3&gt;
  
  
  Step 2: Execute the construction
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;h&lt;/code&gt; is defined as the intersection of the following two constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A line through &lt;code&gt;b&lt;/code&gt; perpendicular to &lt;code&gt;ac&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A line through &lt;code&gt;c&lt;/code&gt; perpendicular to &lt;code&gt;ab&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Materialize predicates
&lt;/h3&gt;

&lt;p&gt;The construction is converted into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perp b h a c
perp c h a b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
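
&lt;p&gt;One way to picture this conversion is a lookup table from action names to predicate templates. The table below is a hypothetical stand-in for the real definition file: the template contents, the argument naming, and the &lt;code&gt;materialize&lt;/code&gt; helper are assumptions chosen to reproduce the two predicates shown above.&lt;/p&gt;

```python
# Hypothetical expansion table: action name to predicate templates.
# "x" is the newly constructed point; "a", "b", "c" are the action arguments.
# The point order inside each pair is a convention chosen to match the text.
EXPANSIONS = {
    "on_tline": [("perp", "a", "x", "b", "c")],
    "on_pline": [("para", "a", "x", "b", "c")],
    "midpoint": [("coll", "x", "a", "b"), ("cong", "x", "a", "x", "b")],
}
ARG_NAMES = {"on_tline": "abc", "on_pline": "abc", "midpoint": "ab"}

def materialize(point, action):
    """Instantiate the templates of one action for a concrete new point."""
    name, args = action[0], action[1:]
    binding = dict(zip(ARG_NAMES[name], args))
    binding["x"] = point
    return [
        (template[0],) + tuple(binding[symbol] for symbol in template[1:])
        for template in EXPANSIONS[name]
    ]

print(materialize("h", ["on_tline", "b", "a", "c"]))
# [('perp', 'b', 'h', 'a', 'c')]
print(materialize("h", ["on_tline", "c", "a", "b"]))
# [('perp', 'c', 'h', 'a', 'b')]
```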



&lt;h3&gt;
  
  
  Step 4: Verify the goal
&lt;/h3&gt;

&lt;p&gt;The system checks whether it can derive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perp a h b c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If yes, the goal is established. Otherwise, the problem is not proved under the current construction and rule set.&lt;/p&gt;
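
&lt;p&gt;Before attempting a symbolic derivation, the goal can be sanity-checked numerically on one concrete triangle. The sketch below does this for the orthocenter example; the coordinates and helper names are arbitrary choices for illustration, and a passing check is only evidence that the goal holds on that instance, not a proof.&lt;/p&gt;

```python
import math

def perp_dir(p, q):
    """A direction vector perpendicular to segment pq."""
    return (-(q[1] - p[1]), q[0] - p[0])

def intersect(p, r, q, s):
    """Intersection of the lines p + t*r and q + u*s (assumes non-parallel)."""
    cross = r[0] * s[1] - r[1] * s[0]
    t = ((q[0] - p[0]) * s[1] - (q[1] - p[1]) * s[0]) / cross
    return (p[0] + t * r[0], p[1] + t * r[1])

def is_perp(p1, p2, p3, p4, tol=1e-9):
    """True when line p1p2 is perpendicular to line p3p4 (dot product near 0)."""
    dot = (p2[0] - p1[0]) * (p4[0] - p3[0]) + (p2[1] - p1[1]) * (p4[1] - p3[1])
    return math.isclose(dot, 0.0, abs_tol=tol)

# One arbitrary non-degenerate triangle.
a, b, c = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
# h lies on the line through b perpendicular to ac
# and on the line through c perpendicular to ab.
h = intersect(b, perp_dir(a, c), c, perp_dir(a, b))

print(is_perp(b, h, a, c), is_perp(c, h, a, b))  # premises: True True
print(is_perp(a, h, b, c))                       # goal: True
```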


&lt;h2&gt;
  
  
  8. Execution Pipeline
&lt;/h2&gt;

&lt;p&gt;From problem text to goal verification, the typical pipeline is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Problem DSL
  -&amp;gt; parse.py
  -&amp;gt; action expansion via defs.txt
  -&amp;gt; geometry graph
  -&amp;gt; predicate inference via rules.txt
  -&amp;gt; DDAR solver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The responsibility of each stage is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse the problem text and identify premises, constructions, and goals&lt;/li&gt;
&lt;li&gt;Look up the corresponding action definition in &lt;code&gt;defs.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Expand variable dependencies, input conditions, and geometric constraints&lt;/li&gt;
&lt;li&gt;Write the constraints into the geometry relation graph&lt;/li&gt;
&lt;li&gt;Trigger new predicates according to &lt;code&gt;rules.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run reachability checks against the target predicate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Numerical construction and symbolic reasoning usually run side by side in this pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The numerical layer handles instantiation and truth checking&lt;/li&gt;
&lt;li&gt;The symbolic layer handles strict inference and proof tracing&lt;/li&gt;
&lt;/ul&gt;
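
&lt;p&gt;The last two stages, rule triggering and the reachability check, amount to computing a closure over the known facts. The toy sketch below shows the shape of that loop with a single hand-written rule (transitivity of parallelism). It is a deliberate simplification: the fact encoding and function names are assumptions, symmetry is ignored, and the real solver is far more elaborate.&lt;/p&gt;

```python
def para_transitive(facts):
    """If l1 is parallel to l2 and l2 to l3, derive l1 parallel to l3."""
    new = set()
    for (p1, l1, l2) in facts:
        for (p2, m1, m2) in facts:
            if p1 == p2 == "para" and l2 == m1 and l1 != m2:
                new.add(("para", l1, m2))
    return new

def closure(facts, rules):
    """Apply every rule repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        if derived.issubset(facts):
            return facts
        facts |= derived

# Facts are (predicate, line, line) triples; lines are named by two points.
facts = {("para", "ab", "cd"), ("para", "cd", "ef")}
goal = ("para", "ab", "ef")
print(goal in closure(facts, [para_transitive]))  # True
```

&lt;p&gt;The goal counts as reachable when it appears in the closure. Recording which rule and premises produced each new fact is what makes proof tracing possible.&lt;/p&gt;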


&lt;h2&gt;
  
  
  9. Implementation Notes
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Action design should optimize for relation output
&lt;/h3&gt;

&lt;p&gt;Whether an action is worth keeping should be judged by the quality of the predicates it introduces, not by whether the geometric meaning feels intuitive.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Degeneracy must be handled explicitly
&lt;/h3&gt;

&lt;p&gt;Cases such as coincident points, parallel lines without an intersection, and zero-radius circles should be intercepted either in input conditions or in the numerical layer.&lt;/p&gt;
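
&lt;p&gt;In the numerical layer, such guards can be small tolerance-based checks that run before an action executes. The sketch below is one possible shape; the names &lt;code&gt;diff&lt;/code&gt;, &lt;code&gt;ncoll&lt;/code&gt;, and &lt;code&gt;npara&lt;/code&gt; and the tolerance value are assumptions made for this example.&lt;/p&gt;

```python
import math

def diff(a, b, tol=1e-9):
    """True when a and b are distinct points."""
    return not math.isclose(math.dist(a, b), 0.0, abs_tol=tol)

def ncoll(a, b, c, tol=1e-9):
    """True when a, b, c are not collinear (twice the signed area is nonzero)."""
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return not math.isclose(area2, 0.0, abs_tol=tol)

def npara(a, b, c, d, tol=1e-9):
    """True when line ab is not parallel to line cd."""
    cross = (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])
    return not math.isclose(cross, 0.0, abs_tol=tol)

print(diff((0, 0), (0, 0)))                   # False: coincident points
print(ncoll((0, 0), (1, 1), (2, 2)))          # False: degenerate triangle
print(npara((0, 0), (1, 0), (0, 1), (1, 1)))  # False: no intersection exists
```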
&lt;h3&gt;
  
  
  3. Predicate coverage determines the expressive ceiling
&lt;/h3&gt;

&lt;p&gt;If the system can only express collinearity, parallelism, and perpendicularity, its representational power runs out very quickly on harder geometry problems.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. The numerical interface should not be omitted
&lt;/h3&gt;

&lt;p&gt;If symbolic definitions exist without numerical construction interfaces, the cost of data generation, debugging, and truth checking rises substantially.&lt;/p&gt;


&lt;h2&gt;
  
  
  10. Recommended Minimal Implementable Subset
&lt;/h2&gt;

&lt;p&gt;If you want a minimal version compatible with the AlphaGeometry approach, prioritize support for the following actions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;triangle&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;on_line&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;on_tline&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;on_pline&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intersection_ll&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;midpoint&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;foot&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And support at least the following predicates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;coll&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;perp&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;para&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cong&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cyclic&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;eqangle&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a small but workable protocol core.&lt;/p&gt;
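
&lt;p&gt;A minimal core like this can start from a single validated predicate type. The sketch below assumes the common point-based arities for the six predicates (for example, eight points for &lt;code&gt;eqangle&lt;/code&gt;); the &lt;code&gt;Predicate&lt;/code&gt; class and &lt;code&gt;parse_predicate&lt;/code&gt; helper are hypothetical names.&lt;/p&gt;

```python
from dataclasses import dataclass

# Assumed point-based arities for the six predicates listed above.
ARITY = {"coll": 3, "perp": 4, "para": 4, "cong": 4, "cyclic": 4, "eqangle": 8}

@dataclass(frozen=True)
class Predicate:
    """An immutable, hashable predicate such as perp a h b c."""
    name: str
    points: tuple

    def __post_init__(self):
        # Reject unknown predicate names and wrong argument counts early.
        if self.name not in ARITY:
            raise ValueError("unsupported predicate: " + self.name)
        if len(self.points) != ARITY[self.name]:
            raise ValueError(
                self.name + " expects " + str(ARITY[self.name]) + " points"
            )

def parse_predicate(text):
    """Build a Predicate from text such as 'perp a h b c'."""
    name, *points = text.split()
    return Predicate(name, tuple(points))

goal = parse_predicate("perp a h b c")
print(goal.name, goal.points)  # perp ('a', 'h', 'b', 'c')
```

&lt;p&gt;Because the class is frozen, predicates can be stored in sets, which is convenient for the fact store that the rule system reads from.&lt;/p&gt;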


&lt;h2&gt;
  
  
  11. Protocol Essence
&lt;/h2&gt;

&lt;p&gt;The AlphaGeometry DSL can be summarized as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Geometry Construction DSL
+ Predicate Interface
+ Rule-System Input Layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its main value is not in describing diagrams, but in compressing geometry problems into an executable, verifiable, and inferable protocol layer.&lt;/p&gt;



&lt;h2&gt;
  
  
  12. Recommendation: Dino-GSP
&lt;/h2&gt;

&lt;p&gt;If you need a geometry representation environment that is more open than AlphaGeometry DSL and better suited for product and ecosystem integration, take a look at &lt;a href="https://dev.to/en/products/dino-gsp"&gt;Dino-GSP&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It also represents geometric objects as executable structures and defines its own DSL and constraint representation layer to support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More open geometry construction and editing workflows&lt;/li&gt;
&lt;li&gt;Ecosystem integration for teaching, content production, and AI geometry applications&lt;/li&gt;
&lt;li&gt;Programmable figure generation, constraint validation, and auxiliary structure construction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If AlphaGeometry DSL is closer to an internal protocol for solvers and research systems, Dino-GSP is closer to an extensible product layer and an open ecosystem interface.&lt;/p&gt;



</description>
      <category>ai</category>
      <category>math</category>
    </item>
  </channel>
</rss>
