Forem: Luhui Dev

Dino-GSP Major Update: Algeo SDK 2.0 embedded editing mode is now available

Luhui Dev — Sun, 10 May 2026 15:16:35 +0000

Videos can be embedded. Documents can be embedded. Spreadsheets can be embedded.

But what about geometry?

For the past decade, whenever a product needed users to draw a geometry problem, edit a dynamic figure, or save an interactive geometry asset, the workflow usually broke in the same place: leave the product, use a separate tool, take a screenshot, and paste it back. That fractured workflow has sat in the middle of education platforms, teaching research systems, and AI math products for years.

Today, Algeo SDK 2.0 embedded editing mode is officially available. Geometry is no longer the missing embeddable format. It can now live inside your product like a standard component, with data flowing back into your business system, UI matching your product design, and permissions staying under your own control.

Here are five common scenarios we see. If any of them sounds like your product, this release is worth a closer look.

Scenario 1: online education platforms can let teachers create geometry problems in place

A high school math teacher is preparing tomorrow's geometry lesson on your platform. She needs an example problem about angle proofs in a circle.

Before: she opened a separate geometry tool, finished the diagram, took a screenshot, and pasted it back into your question bank. The text lived in one place and the image in another. Students saw a static picture that could not be dragged, edited, or reused after the test.

Now: she clicks "insert geometry board" in your question bank admin, and the Algeo editor opens in place. Circles, points, and auxiliary lines are created in the same workflow. When she saves, the board data enters your question bank and is bound to her account, school, and textbook chapter.

When students open the problem, they can drag a point on the circle and see the angle change directly. Throughout the whole process, your product stays in control: the data is yours, the permissions are yours, the content rights are yours, and the user behavior logs are yours.

Scenario 2: AI math products can let AI and students work on the same board

This is one of the fastest-growing customer categories we have seen over the past year.

A student uploads a photo of a geometry problem. Your AI parses the problem and generates a solution path. But text alone is not enough. The student needs to see why an auxiliary line is drawn that way, and needs to test by hand whether an equality still holds when a point starts moving.

Algeo embedded editing closes that loop for the first time:

After AI parsing, code can generate board content and load it into the editor automatically
Students interact directly inside your product by dragging, modifying, and trying alternatives
Every student edit can be sent back to your system as an event and used in the next AI analysis round
AI can respond to the student's specific change instead of giving generic explanation

Education is a feedback loop. Text plus static diagrams can no longer carry that loop for geometry. The missing piece is a board that can be driven by code while still giving students hands-on control.

Scenario 3: educational publishing can turn geometry assets into a managed production workflow

In many publishing workflows, geometry illustrations used to operate like a separate workshop: an author drew the figure, a designer remade it as vector art, an editor reviewed it, and a layout designer processed it again. One geometry asset for one problem could pass through four tools and five people.

After embedding Algeo into a content management system, that pipeline becomes much flatter:

Authors write problems and draw figures directly in the CMS, with assets stored as structured geometry data rather than images
Editors can open the original board and revise it directly instead of asking the author to recreate it
The same geometry data can export to PDF, web, print, and interactive courseware: draw once, reuse everywhere
Version control stays inside the CMS, so geometry boards stop being external unmanaged files

For content organizations, this is not just about saving one tool. It is about turning geometry into a managed asset.

Scenario 4: schools and institutions can finally build a shared geometry asset library

Teaching research has an old pain point: Chinese language groups have material libraries, English groups have corpora, math teams have question banks, but geometry often remains scattered. Every teacher has dozens of local geometry source files. They leave with the teacher, disappear with an old computer, and are hard for new teachers to inherit.

When an institution embeds Algeo into its collaborative teaching research platform:

Geometry assets enter the institutional asset library and can be organized by subject, grade, and knowledge point
Teachers can remix the same board while keeping a complete revision history
New teachers can receive accumulated geometry resources on day one
Permissions and approvals follow the institution's own rules, including what can be shared broadly and what stays inside a subject group

Scenario 5: question banks and homework systems can make geometry a first-class format

Many question bank systems have structured templates for multiple choice, fill-in-the-blank, and written-response questions. Geometry is often still just an image. That creates three limits:

Similar-question recommendation is weak because the system cannot tell whether two geometry problems share the same mathematical structure
Fine-grained grading is hard because the student's answer often comes back as another image
Learning analytics are shallow because the system cannot see which construction step caused the student to get stuck

Once Algeo turns geometry problems into structured data, these workflows become possible:

Both the problem and the solving process are structured, so the question bank can handle geometry more like algebra
Every student operation can be reported back, allowing the grading system to locate which point was moved at which step
Learning analytics can tell a teacher that 70% of a class did not think to draw a specific auxiliary line

What is ready at the technical level

The scenarios are compelling, but production adoption is always an engineering problem. Algeo SDK 2.0 is designed to be production-ready in several core areas.

Bidirectional communication with clear data ownership

Every edit, board switch, and save request can be sent back to the host application through postMessage. You control the save button. The iframe does not bypass your business system to persist anything directly. When to save, where to save, and which permissions are required are all decided by your backend. The SDK only maintains the UI state for saved and unsaved changes.
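To make "your backend decides" concrete, here is a minimal Python sketch of a server-side handler for a save request that the host page has forwarded after receiving it from the editor. The payload fields (`type`, `boardId`, `data`), the `save-request` type name, and the `board:write` permission are hypothetical names chosen for illustration; they are not taken from the Algeo SDK documentation.

```python
from dataclasses import dataclass, field


@dataclass
class BoardStore:
    """In-memory stand-in for the host product's own database."""
    boards: dict = field(default_factory=dict)


def handle_save_request(payload: dict, user_permissions: set, store: BoardStore) -> dict:
    """Decide on the host side whether a forwarded save request is persisted."""
    if payload.get("type") != "save-request":  # hypothetical message type
        return {"ok": False, "error": "unsupported_message"}
    if "board:write" not in user_permissions:  # hypothetical permission name
        return {"ok": False, "error": "forbidden"}
    board_id = payload.get("boardId")
    if not board_id:
        return {"ok": False, "error": "missing_board_id"}
    # Persistence happens here, inside the host system, not inside the iframe.
    store.boards[board_id] = payload.get("data")
    return {"ok": True, "boardId": board_id}


store = BoardStore()
request = {"type": "save-request", "boardId": "circle-angles", "data": {"points": 3}}
print(handle_save_request(request, {"board:write"}, store))
print(handle_save_request(request, set(), store))
```

The point of the sketch is the ordering: permission checks and storage live in the host's code path, so the editor only ever asks for a save.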

Fully configurable UI that fits into your product

The navigation bar, board list, toolbox, algebra panel, and document panel can each be toggled independently at runtime. In an AI-assisted scenario, the editor can be reduced to a clean canvas. In a professional authoring scenario, the full toolchain can be shown. In advanced integrations, you can even replace our board list with your own UI and drive it through the SDK capability APIs.

Engineered capability layers

The SDK separates editor capabilities into four clear units: board file document, multi-board slides, history, and display mode. Each unit can be called independently, which also gives us room to improve each one over time without breaking the others.

Versioned protocol for long-term evolution

Every handshake between the SDK and iframe carries a protocol version. That means an integration you build today can continue to work after future upgrades, while still allowing us to deliver new capabilities without asking you to rewrite the integration every time.
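A versioned handshake usually comes down to a small compatibility check. The sketch below assumes a `major.minor` version string and a "same major, editor minor at least as new as the host expects" rule; both the format and the rule are assumptions for illustration, not the SDK's documented behavior.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a 'major.minor' string into integers (assumed format)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(host_expects: str, editor_reports: str) -> bool:
    """Accept the handshake when majors match and the editor is not older."""
    host_major, host_minor = parse_version(host_expects)
    editor_major, editor_minor = parse_version(editor_reports)
    return host_major == editor_major and editor_minor >= host_minor


print(is_compatible("2.0", "2.3"))
print(is_compatible("2.1", "2.0"))
print(is_compatible("2.0", "3.0"))
```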

Production-oriented robustness

The SDK includes a 30-second initialization timeout, standardized error codes, a clean destroy lifecycle, and self-hosted base URL support through baseUrl. These details matter when a real product faces network jitter, CSP rules, and complex route changes in single-page applications. We have already validated the approach in multiple production customer environments.

Why choose Dino-GSP and Algeo

There are very few teams in China that can build a dynamic geometry editor at this level. We spent a year making it production-ready, then another release cycle turning it from a product into a component. Geometry as a category really opens up only when it can be installed inside any product.

If your product contains the word "geometry", whether in K12, higher education, AI math, educational publishing, or teaching research, we would be glad to talk.

Docs: open.dajiaoai.com

Repository: github.com/dajiaoai/algeo-sdk

Put a geometry board inside your product, starting today.

AHE Deep Dive: How Coding Agent Harnesses Automatically Evolve

Luhui Dev — Mon, 04 May 2026 15:02:16 +0000

Introduction

When building a coding agent, the capability of your base model is only part of the equation. In real production scenarios, what matters just as much is the harness wrapped around that model — the prompt, tools, middleware, memory, execution environment, trace, and evaluation pipeline.

This is exactly what the AHE paper addresses: how to make a coding agent's harness continuously observable, modifiable, testable, reversible, and even self-iterating, the way ordinary software is.

The full paper title is "Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses", authored by researchers from Fudan University, Peking University, and Shanghai Qiji Zhifeng Co., Ltd. The academic teams bring methodological design, while the industry team contributes experience from Agent/LLM infrastructure and Nex AGI systems.

Even better, AHE is open source: china-qijizhifeng/agentic-harness-engineering.

This makes it more than just a paper concept — you can directly examine the seed coding agent, evolve agent, experiment configs, traces, manifests, and rollback structures. For anyone building coding agents, agent infrastructure, or broader agent products, this repository is worth dissecting.

This article explores three questions: why AHE works, how it evolves harnesses, and how to start your own small experiment with the repository.

Part 1: A Quick Intro to Harness Engineering

A harness is the external engineering shell that makes a model actually work. In a coding agent, it typically includes:

System prompt: defines the agent's basic working mode
Tools: file I/O, shell, search, test execution, code modification, etc.
Tool descriptions: what the model sees about tool usage and parameter schemas
Middleware: interception, validation, correction, and logging before/after tool calls
Memory: short-term, long-term, and experience accumulation
Context management: compression, pruning, and retrieval
Execution environment: sandbox, permissions, runtime isolation
Evaluation/observability: testing, trace, logs, rewards, failure reports, regression tracking

This structure determines how the model approaches tasks, invokes tools, handles failures, and judges completion.

For example, when a shell command hangs in production, the solution isn't to keep adding "don't use interactive commands" to the prompt. A more robust approach: add timeout to the shell tool, use middleware to detect high-risk commands, truncate long outputs at the response layer, and enforce state checks before task completion.
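As a rough illustration of the tool-level fix, here is a minimal sketch of a command runner with a hard timeout, closed stdin, and output truncation. The function name, the return shape, and the default limits are hypothetical; this is not AHE's shell tool.

```python
import subprocess
import sys


def run_command(argv: list[str], timeout_s: float = 10.0, max_chars: int = 2000) -> dict:
    """Run a command with a hard timeout and truncate long output."""
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # a prompt for input fails fast instead of hanging
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": f"command exceeded {timeout_s}s and was killed"}
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        omitted = len(output) - max_chars
        output = output[:max_chars] + f"\n[truncated {omitted} characters]"
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "exit_code": proc.returncode, "output": output}


result = run_command([sys.executable, "-c", "print('x' * 5000)"], max_chars=100)
print(result["status"], len(result["output"]))
```

Because the limits live in the tool, they apply no matter what the prompt says or which model is calling it.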

This is the essence of Harness Engineering: putting agent capabilities into a maintainable runtime system.

I won't dive deeper into the Harness concept here. If you want to learn more, search for keywords like: Harness Engineering, Agent Harness, Agent Runtime, Tool-use Agent, Agent Observability, Agent Evaluation, Coding Agent Infrastructure.

Let's move to the main focus of this article.

Part 2: AHE's Core Positioning — Self-Iterating Coding Agent Harnesses

AHE stands for Agentic Harness Engineering.

The paper's subtitle contains the key phrase: Observability-Driven Automatic Evolution of Coding-Agent Harnesses.

This breaks down into three layers:

First, AHE targets coding agent harnesses. It doesn't train new models or modify base model parameters.

Second, it performs automatic evolution. The goal isn't a one-time manual prompt tweak, but continuous harness evolution across multiple runs.

Third, it relies on observability. Changes come from traces, logs, rewards, failure analysis, change manifests — not from vague "self-reflection" in a prompt.

So AHE's precise positioning is:

An automatic evolution framework for coding agent harnesses. Through observable runtime evidence, it continuously improves the agent's surrounding prompt, tools, middleware, memory, skills, and sub-agents.

This is the key difference from ordinary prompt optimization. AHE does modify prompts, but its action space is much larger — it includes tools, middleware, and memory as evolvable structures.

Part 3: AHE's Experimental Results

AHE's main experiments ran on Terminal-Bench 2. The paper reports that after 10 iterations, AHE improved the seed harness's pass@1 from 69.7% to 77.0%. This shows that on the target benchmark, AHE found effective harness modifications.

The ablation study is even more revealing. The paper replaced individual components of the full AHE harness with their seed-harness versions, one at a time, and compared the outcomes.

The pattern is highly informative.

If gains mainly came from better system prompts, prompt-only should improve. But in the experiment, prompt-only actually decreased, while memory, tools, and middleware showed more significant improvements.

This means AHE's key benefits come from structural harness modifications. It also suggests that in complex tasks, many agent failures require harder (more engineering-focused) mechanisms: tool behavior, runtime interception, state recording, long-term experience, regression testing.

The paper also conducted transfer experiments. When the evolved harness was transferred to SWE-bench Verified, success rate gains were small, but token usage dropped more noticeably. This suggests AHE's evolved structures may be better at reducing ineffective exploration and context waste.

Cross-model transfer is also noteworthy. When AHE-generated harnesses were applied to multiple base models, the paper reports positive gains across the board. This indicates the learned components contain some transferable engineering structures.

My assessment: AHE's prediction of "which changes will fix problems" is significantly better than random, but its prediction of "which changes will cause regressions" is still relatively weak. It does prove that harnesses can be continuously evolved in a file-based, evidence-based, version-controlled manner.

Part 4: AHE's Key Workflow — Evaluate, Diagnose, Modify, Verify, Rollback

AHE's main loop:

graph TD
    A[Current Harness] --> B[Run Code Agent on benchmark]
    B --> C[Collect trace, log, reward]
    C --> D[Analyze failure patterns]
    D --> E[Evolve Agent modifies Harness files]
    E --> F[Write change_manifest]
    F --> G[Re-evaluate next round]
    G --> H[Verify if changes work, rollback if needed]
    H -.-> A

This closed loop has three main actors.

First is the Code Agent.

This is the actual agent completing coding tasks, and the object being optimized. In the AHE repository, the seed agent is quite simple — basically a bash-only coding agent.

Second is the Agent Debugger.

It reads the Code Agent's execution traces and compresses massive traces into readable failure reports. After a benchmark run, raw traces can be extremely long, making direct model reading too costly. Agent Debugger converts these traces into overviews and per-task analyses, providing evidence for subsequent modifications.

Third is the Evolve Agent.

It reads the previous round's results, failure analysis, and historical modification records, then modifies harness files in the workspace. Its modification targets include prompts, tools, middleware, memory, skills, sub-agent configs, etc.

AHE adds strong engineering constraints to this process:

Every modification must land in files. Every modification requires a manifest. The next round must verify the predictions in that manifest. Changes that perform poorly must be easy to roll back. The entire process should leave an auditable evidence chain.

Instead of vague self-reflection, the Evolve Agent has to answer specific questions: which file was changed, why, which tasks are expected to be fixed, which tasks might be harmed, and whether the next round's results validate this judgment.
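The evaluate, modify, verify, and rollback cycle can be reduced to a few lines. The sketch below is a deliberately simplified toy: the harness is a plain dict, `evaluate` and `propose_change` are hypothetical callables standing in for the benchmark run and the Evolve Agent, and "keep only if the score improves" is an assumed acceptance rule rather than AHE's actual policy.

```python
import copy
from typing import Callable


def evolve(harness: dict,
           evaluate: Callable[[dict], float],
           propose_change: Callable[[dict, float], dict],
           iterations: int = 3) -> tuple[dict, list[dict]]:
    """Keep a proposed change only if the next evaluation improves."""
    best_score = evaluate(harness)
    history = []
    for step in range(iterations):
        # The proposal works on a copy, so the current harness stays intact.
        candidate = propose_change(copy.deepcopy(harness), best_score)
        score = evaluate(candidate)
        kept = score > best_score
        history.append({"iteration": step, "score": score, "kept": kept})
        if kept:
            harness, best_score = candidate, score
        # Otherwise the candidate is dropped, which is the rollback.
    return harness, history


def toy_evaluate(h: dict) -> float:
    # Pretend a 30-second shell timeout is ideal for this benchmark.
    return -abs(h["shell_timeout_s"] - 30)


def toy_propose(h: dict, score: float) -> dict:
    h["shell_timeout_s"] += 15
    return h


final, history = evolve({"shell_timeout_s": 0}, toy_evaluate, toy_propose)
print(final)
for entry in history:
    print(entry)
```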

Part 5: What Evolvable Components Does AHE Break the Harness Into?

AHE's first step is breaking the harness into explicit components.

The paper emphasizes several evolvable object types:

System Prompt: Defines the Code Agent's basic behavior, like executing shell non-interactively, checking state before task completion, not exiting prematurely.

Tool Descriptions: What the model sees about tools. The tool itself might not change, but if the description changes, so does how the model calls it.

Tool Implementations: The actual tool implementation. For example, how the shell tool executes commands, handles timeouts, truncates output, returns error messages.

Middleware: Runtime interception layer. It can check before/after tool calls, like detecting dangerous commands, reminding about unverified tasks, blocking premature endings, recording risk states.

Skills: Reusable experience. Think of these as operation manuals for certain task patterns.

Sub-agents: Sub-agent configurations. Complex tasks can be split to different roles.

Long-term Memory: For accumulating experience across tasks and rounds.

This decomposition gives the Evolve Agent a richer action space. It can choose the right place to intervene based on failure evidence.

Example: Code Agent keeps hanging in shell. The least efficient approach is adding more prompt reminders. AHE's path is more engineering-focused: add timeout to shell tool; middleware checks for obviously interactive commands; return messages explicitly state failure reasons; system prompt adds behavioral constraints.
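The middleware part of that example could look roughly like this. It is a minimal sketch of a pre-call check, assuming a hypothetical `check_command` hook and an illustrative list of interactive programs; AHE's real middleware interface may look quite different.

```python
import shlex

# Illustrative list only; a real deployment would maintain its own.
INTERACTIVE_PROGRAMS = {"vim", "vi", "nano", "less", "more", "top", "htop"}


def check_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) before a shell tool call is executed."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"could not parse command: {exc}"
    if not tokens:
        return False, "empty command"
    program = tokens[0].rsplit("/", 1)[-1]  # strip any leading path
    if program in INTERACTIVE_PROGRAMS:
        return False, f"'{program}' is interactive and would hang; use a non-interactive alternative"
    return True, "ok"


print(check_command("vim notes.txt"))
print(check_command("ls -la"))
```

Returning an explicit reason matters as much as the block itself, because the reason is what the model sees and can act on.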

These structural modifications are more stable and easier to reuse and roll back.

The key is understanding the positioning: prompts are behavioral suggestions; tools, middleware, and memory are execution mechanisms.

AHE's value lies in bringing these execution mechanisms into the evolution scope.

Part 6: Three Layers of Observability — How AHE Avoids Blind Search

Just having an agent randomly modify files and rerun benchmarks has limited value. AHE's core design is three layers of observability.

1. Component Observability

Component observability means the system knows what parts the harness has, where each part is, how to modify it, and how to register it.

In the AHE repository, prompts, tool descriptions, tool implementations, middleware, memory, etc., all appear as files. New tools need YAML descriptions and Python implementations, plus config registration; new middleware needs explicit integration; new skills or sub-agents also need config exposure.
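One way to picture component observability is a consistency check across the three places a tool has to appear. The sketch below assumes one `.yaml` file per tool under `tool_descriptions/` and one `.py` file per tool under `tools/`, matched by file name, with registration passed in as a set of names. That layout is a guess for illustration; check the repository for the real conventions.

```python
import tempfile
from pathlib import Path


def find_unregistered_tools(agent_dir: Path, registered: set[str]) -> dict[str, list[str]]:
    """Compare tool descriptions, implementations, and config registration."""
    described = {p.stem for p in (agent_dir / "tool_descriptions").glob("*.yaml")}
    implemented = {
        p.stem for p in (agent_dir / "tools").glob("*.py") if not p.stem.startswith("_")
    }
    return {
        "missing_description": sorted(implemented - described),
        "missing_implementation": sorted(described - implemented),
        "not_registered": sorted((described & implemented) - registered),
    }


# Build a throwaway agent directory to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    agent_dir = Path(tmp)
    (agent_dir / "tool_descriptions").mkdir()
    (agent_dir / "tools").mkdir()
    (agent_dir / "tool_descriptions" / "shell.yaml").write_text("name: shell\n")
    (agent_dir / "tools" / "shell.py").write_text("# shell tool\n")
    (agent_dir / "tools" / "search.py").write_text("# search tool\n")
    report = find_unregistered_tools(agent_dir, registered=set())

print(report)
```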

2. Experience Observability

Experience observability means after an agent runs, the system records how it succeeded or failed.

AHE collects each task's trace, runtime log, reward, etc. Then Agent Debugger compresses these raw traces into analysis reports.

When a coding agent fails, simply knowing "it failed" isn't very useful. What you really need to locate is the failure level: command execution failure, dependency installation failure, test not run, file path error, output too long causing context pollution, agent prematurely judging task complete, losing previous state in long tasks.
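To show what "locating the failure level" means in practice, here is a small keyword-based classifier over log text. The categories follow the list above, but the rules and label names are hypothetical; AHE's Agent Debugger uses a model to analyze traces, which this sketch does not attempt to reproduce.

```python
from collections import Counter

# Illustrative keyword rules, checked in order; first match wins.
FAILURE_RULES = [
    ("dependency_install_failure", ("could not find a version", "no matching distribution")),
    ("file_path_error", ("no such file or directory",)),
    ("command_timeout", ("timed out", "timeout")),
    ("tests_not_run", ("collected 0 items",)),
]


def classify_failure(log_text: str) -> str:
    """Map raw log text to a coarse failure category."""
    lowered = log_text.lower()
    for label, needles in FAILURE_RULES:
        if any(needle in lowered for needle in needles):
            return label
    return "unclassified"


def summarize(logs: dict[str, str]) -> Counter:
    """Count failure categories across tasks, keyed by task id."""
    return Counter(classify_failure(text) for text in logs.values())


logs = {
    "task-1": "bash: data/input.csv: No such file or directory",
    "task-2": "ERROR: No matching distribution found for somepkg",
    "task-3": "command timed out after 30s",
}
print(summarize(logs))
```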

Through traces and analysis, AHE turns failures into readable, summarizable, actionable evidence.

3. Decision Observability

After each modification, the Evolve Agent must write a change_manifest.json. This manifest records which files were changed, what failure pattern they address, why this component was chosen, which tasks are expected to be fixed, which might regress, and the modification's constraint strength.
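A manifest with those fields could be modeled like this. The field names, file path, and task ids below are hypothetical and chosen to mirror the description; the authoritative schema is whatever the repository's change_manifest.json actually contains.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChangeManifest:
    """One record per harness modification (hypothetical field names)."""
    changed_files: list[str]
    failure_pattern: str
    rationale: str
    expected_fixes: list[str] = field(default_factory=list)
    possible_regressions: list[str] = field(default_factory=list)
    constraint_strength: str = "soft"


manifest = ChangeManifest(
    changed_files=["tools/shell.py"],  # illustrative path
    failure_pattern="shell command hangs on interactive prompt",
    rationale="a tool-level timeout is more reliable than a prompt reminder",
    expected_fixes=["task-017", "task-042"],  # illustrative task ids
    possible_regressions=["task-008"],
    constraint_strength="hard",
)
manifest_json = json.dumps(asdict(manifest), indent=2)
print(manifest_json)
```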

After the next evaluation round, the system checks this manifest to see if predictions came true.
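Checking a manifest against the next round is mostly set arithmetic. This sketch assumes per-task pass/fail results as dicts of task id to bool; the function name and output keys are hypothetical and not the format of AHE's change_evaluation.json.

```python
def evaluate_manifest(expected_fixes: list[str],
                      possible_regressions: list[str],
                      before: dict[str, bool],
                      after: dict[str, bool]) -> dict[str, list[str]]:
    """Compare predicted fixes and regressions with observed task outcomes."""
    fixed = {t for t, passed in after.items() if passed and not before.get(t, False)}
    regressed = {t for t, passed in after.items() if not passed and before.get(t, False)}
    expected = set(expected_fixes)
    flagged = set(possible_regressions)
    return {
        "confirmed_fixes": sorted(fixed & expected),
        "missed_fixes": sorted(expected - fixed),
        "unpredicted_fixes": sorted(fixed - expected),
        "predicted_regressions": sorted(regressed & flagged),
        "unpredicted_regressions": sorted(regressed - flagged),
    }


before = {"a": False, "b": False, "c": True}
after = {"a": True, "b": False, "c": False}
print(evaluate_manifest(["a", "b"], [], before, after))
```

The `unpredicted_regressions` bucket is the one worth watching, since it captures exactly the case where fixing one task quietly broke another.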

This step turns every modification into a verifiable hypothesis. Even without using AHE's full automatic evolution pipeline, just introducing the change manifest habit into your own agent team will immediately improve engineering transparency.

Many agent projects struggle with long-term maintenance precisely because of this: lots of prompt changes, lots of tool adjustments, but nobody knows what each change actually solved, and nobody knows if it introduced new problems. AHE's manifest mechanism at least makes this process auditable.

Part 7: AHE's Engineering Organization from the Repository

The main entry point for the AHE repository is evolve.py. It orchestrates the entire evolution workflow, including initializing workspace, running evaluations, handling iteration directories, doing attribution, recovery, and rollback.

The seed agent being evolved is agents/code_agent_simple/, which includes:

code_agent.yaml describes how this agent loads prompts, which tools it uses, what tracer to use.

systemprompt.md is the initial system prompt.

LongTermMEMORY.md and ShortTermMEMORY.md correspond to long-term and short-term memory interfaces. tool_descriptions/ holds tool descriptions, tools/ holds tool implementations.

The Evolve Agent is in agents/evolve_agent/. Key files worth examining:

evolve_agent.yaml defines what tools, middleware, and skills the Evolve Agent itself can use.

evolve_prompt.md is an evolution contract: it specifies that Evolve Agent can only modify workspace, must make evidence-based changes, must write summaries and manifests, must follow registration rules.

Config files are in configs/ and configs/experiments/. configs/base.yaml is the base config, configs/experiments/exp-simple-code-gpt54.yaml is a config overlay close to the paper experiments.

Launch scripts are in scripts/, like scripts/evolve.sh for starting long experiments, scripts/build_templates.py for building task templates for E2B.

If you just want to understand the project, you don't need to read all files at once. I recommend this reading order:

README
  ↓
agents/code_agent_simple/code_agent.yaml
  ↓
agents/code_agent_simple/systemprompt.md
  ↓
agents/evolve_agent/evolve_prompt.md
  ↓
configs/base.yaml
  ↓
configs/experiments/exp-simple-code-gpt54.yaml
  ↓
evolve.py

This sequence helps you build concepts first, then see execution details.

Part 8: Getting Started with the Repository — Run a Small Experiment First

AHE is not a lightweight SDK. You can't expect to pip install and immediately embed it in production systems.

It's more like a research experiment framework. Running full paper-level experiments requires an LLM API, an E2B sandbox, a SERPER API key, benchmark data, concurrent scheduling, and a considerable token budget.

So a more realistic onboarding approach is to run a minimal closed loop first.

Set the goal as: get AHE's core pipeline running.

That is:

graph LR
    A[Task execution] --> B[Trace generation]
    B --> C[Analysis generation]
    C --> D[change_manifest written]
    D --> E[Next round re-evaluation]
    E --> F[change_evaluation<br>judges modification effect]

Once this pipeline works, you understand AHE's practical value.

1. Clone the Repository

Official repository:

git clone https://github.com/china-qijizhifeng/agentic-harness-engineering.git
cd agentic-harness-engineering

2. Install Dependencies

The project uses uv to manage Python dependencies.

uv sync

3. Configure Environment Variables

Copy the environment variable template:

cp .env.example .env

At minimum, pay attention to these variables:

LLM_API_KEY
LLM_BASE_URL
E2B_API_KEY
SERPER_API_KEY
GITHUB_TOKEN

Agent Debugger can also configure model endpoints separately. Refer to .env.example for specifics.
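Before launching anything expensive, a quick preflight check can save a failed run. The sketch below only verifies that the five variables listed above are present and non-empty in the process environment; it assumes they have already been exported or loaded from .env by whatever tooling you use, and the helper name is hypothetical.

```python
import os

REQUIRED_VARS = ["LLM_API_KEY", "LLM_BASE_URL", "E2B_API_KEY", "SERPER_API_KEY", "GITHUB_TOKEN"]


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```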

One important note: AHE's task execution depends on the E2B sandbox, and most code execution happens in isolated remote environments. This helps with security and reproducibility, but it also means you need an E2B account and credits.

4. Prepare Benchmark Task Templates

The official workflow requires building task templates first. Example command:

uv run python scripts/build_templates.py --dataset-dir /path/to/dataset -j 16

Replace /path/to/dataset with your actual task data path.

If you're just doing a small experiment, I don't recommend preparing full Terminal-Bench 2 at the start. Select a few tasks and get the pipeline working first — that's more important.

5. Start with a Small Config

For paper experiment config, refer to:

configs/experiments/exp-simple-code-gpt54.yaml

Running the full config is quite costly. Copy a small config, for example:

cp configs/experiments/exp-simple-code-gpt54.yaml configs/experiments/exp-mini.yaml

Then reduce the parameters:

max_iterations: 2
harbor:
  k: 2
  n_concurrent: 4

If the config supports specifying task subsets, use only 3 to 5 tasks. The point of a small experiment is validating the workflow, not chasing scores.
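If you would rather keep the reductions as a small set of overrides than hand-edit a copy, the idea of an overlay can be sketched as a recursive dict merge. The `max_iterations`, `harbor.k`, and `harbor.n_concurrent` keys come from the snippet above; the "full" values and the `dataset` key are placeholders, and whether AHE's config loader merges files this way is an assumption.

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict where overlay values win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder values standing in for the full experiment config.
full = {"max_iterations": 10, "harbor": {"k": 5, "n_concurrent": 32, "dataset": "terminal-bench-2"}}
mini_overrides = {"max_iterations": 2, "harbor": {"k": 2, "n_concurrent": 4}}

mini = deep_merge(full, mini_overrides)
print(mini)
```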

6. Launch the Evolution Experiment

You can use the script:

./scripts/evolve.sh configs/experiments/exp-mini.yaml

Or look inside the script to see how it calls evolve.py, then manually launch as needed.

Full experiments can run for a long time. Even small experiments require attention to API costs, E2B concurrency limits, and network stability.

7. Look at Experiment Artifacts, Not Just Scores

After running, don't just look at pass rate.

What's more worth examining are these artifacts:

runs/iteration_*/
analysis/overview.md
analysis/detail/*.md
change_manifest.json
change_evaluation.json
agent/nexau_in_memory_tracer.cleaned.json
verifier/reward.txt

After running, focus on observing and answering these questions:

What patterns were this round's failures attributed to?
Which files did Evolve Agent change?
Why did it choose to change these files?
Which tasks does the manifest predict will be fixed?
Did the next round verify this prediction?
Were there cases where fixing one task broke another?

If you can find answers to all these questions in the artifacts, it means AHE's core closed loop is working.
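A short script makes the artifact review repeatable. The sketch below assumes each `runs/iteration_*` directory holds a `change_manifest.json` at its top level and that every `reward.txt` beneath it contains a single number; the exact nesting in a real run may differ, so treat the paths as placeholders.

```python
import json
import tempfile
from pathlib import Path


def summarize_iterations(runs_dir: Path) -> list[dict]:
    """Collect manifest presence and mean reward for each iteration directory."""
    summaries = []
    # Note: sorting is lexical, so iteration_10 sorts before iteration_2.
    for iteration_dir in sorted(runs_dir.glob("iteration_*")):
        manifest_path = iteration_dir / "change_manifest.json"
        manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else None
        rewards = [float(p.read_text().strip()) for p in iteration_dir.rglob("reward.txt")]
        summaries.append({
            "iteration": iteration_dir.name,
            "has_manifest": manifest is not None,
            "tasks": len(rewards),
            "mean_reward": sum(rewards) / len(rewards) if rewards else None,
        })
    return summaries


# Build a tiny fake run directory to demonstrate the summary.
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp)
    for task, reward in [("task_a", "1.0"), ("task_b", "0")]:
        verifier = runs / "iteration_1" / task / "verifier"
        verifier.mkdir(parents=True)
        (verifier / "reward.txt").write_text(reward + "\n")
    (runs / "iteration_1" / "change_manifest.json").write_text("{}")
    summary = summarize_iterations(runs)

print(summary)
```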

Part 9: What AHE Hasn't Solved Yet

AHE is valuable, but its boundaries should be clear too.

First, it's still a research framework. Full runs aren't cheap, requiring benchmarks, sandboxes, LLM APIs, and fairly complex experiment configs.

Second, the effectiveness evidence in the paper needs more replication experiments. The improvement on Terminal-Bench 2 is clear, but for strong statistical conclusions, more seeds, more campaigns, and more confidence intervals are needed.

Third, its prediction of regression risk isn't strong enough. The system is better at explaining what a modification might fix, but not as good at judging what it might harm. This is a hard problem for automatic evolution systems.

Part 10: AHE's Inspiration for Agent Product Teams

AHE's biggest lesson for product-focused agent teams is that agent improvement can be pulled out of "mystical prompt tuning" and back into the engineering world.

A real agent product will eventually face these questions:

After a user reports an error, how do you reproduce it?
How do you aggregate failure causes?
Did a certain prompt modification actually help?
Did a tool change regress other scenarios?
Is there regression testing before release?
Can you roll back if production performance degrades?
How do you distill effective experience into memory or skills?

No single model can solve these problems for you.

They belong to the scope of harness engineering work.

If you're also building your own agent, this repository is worth dissecting thoroughly. Even without running it end to end, you can learn a lot about harness organization, trace design, modification attribution, and regression verification.

References

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
arXiv: https://arxiv.org/abs/2604.25850
AHE Official Code Repository
GitHub: https://github.com/china-qijizhifeng/agentic-harness-engineering
Harness engineering: leveraging Codex in an agent-first world
OpenAI Engineering Blog: https://openai.com/index/harness-engineering/

🙋
I’m Luhui Dev, a developer who has been breaking down Agent engineering and exploring how AI can be applied in education.
I focus on Agent Harness, LLM application engineering, AI for Math, and the productization of education SaaS.

Why Signatures Make Automatic Optimization Easier Than Writing Prompts Directly

Luhui Dev — Wed, 22 Apr 2026 13:40:26 +0000

A great discovery from my recent project work: DSPy.

While building the content generation pipeline for Canviz, I encountered a recurring engineering problem—it was extremely difficult to maintain stable "problem explanation quality + canvas script usability" through prompts alone. Whenever I switched models or added new grade levels, I had to re-tune the entire string of prompts. DSPy offered me a systematic solution that's worth sharing separately.

The Fundamental Contradiction of Prompt Engineering

Before diving into DSPy, I need to clarify one thing: Why is writing prompts an engineering problem, not just a matter of technique?

Traditional prompts have a fatal design flaw: they mix "what I want to do" with "how to tell the model to do it."

That natural language prompt you write simultaneously handles two things:

Describing the task's logic (what inputs to accept, what outputs to produce);
The "incantation" tuned for this specific model.

Take a math teaching scenario as an example—the logic of "explaining a chicken-and-rabbit problem" is eternal, but the incantation to make GPT explain it well versus making Claude Sonnet explain it well can be quite different. Once you switch models, or change from third grade to fifth grade, that incantation might fail. Worse yet, there's no systematic way to fix it—you can only rely on intuition and trial-and-error.

This is what software engineering calls the hard-coding problem. For ordinary logic, we've long learned not to hard-code; but for AI pipelines, we willingly lock the most core logic into a fragile string.

DSPy's author, Stanford's Omar Khattab, describes this problem as:

"Existing LM pipelines are typically implemented using hard-coded prompt templates, discovered through trial and error, and extremely brittle."

What is DSPy? What's Its Core Insight?

DSPy (Declarative Self-improving Python) is a framework open-sourced by Stanford NLP Lab in 2023, published at ICLR 2024. Its core proposition is:

Programming language models, not prompting them.

It offers an elegant solution: completely separate the task's interface description from the specific prompt implementation.

You only need to tell DSPy:

What this step inputs and outputs;
What the logical structure of the entire pipeline is;
What your evaluation criteria are.

Then DSPy's Compiler and Optimizer will automatically find the best prompt for you—tailored to your chosen model, your data, and your metrics.
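The "evaluation criteria" part is ordinary Python. DSPy metrics are conventionally written as a function of `(example, pred, trace=None)` that returns a bool or a score. The sketch below follows that convention, but the specific checks, the 20-word threshold, and the use of `explanation` and `key_concept` (the output fields used later in this article) are illustrative choices. `SimpleNamespace` stands in for real DSPy objects so the snippet runs without a model.

```python
from types import SimpleNamespace


def lesson_metric(example, pred, trace=None) -> bool:
    """Pass when the explanation has some substance and names the labeled concept."""
    has_explanation = len(pred.explanation.split()) >= 20  # arbitrary threshold
    has_concept = example.key_concept.lower() in pred.key_concept.lower()
    return has_explanation and has_concept


# Stand-ins for a labeled example and a model prediction.
example = SimpleNamespace(key_concept="simultaneous equations")
good = SimpleNamespace(
    explanation=" ".join(["step"] * 25),
    key_concept="Solving simultaneous equations by substitution",
)
weak = SimpleNamespace(explanation="Just count the legs.", key_concept="counting")

print(lesson_metric(example, good))
print(lesson_metric(example, weak))
```

A metric like this is what an optimizer scores candidate prompts against, so it is worth keeping it strict enough to separate good outputs from plausible-looking ones.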

To borrow the official analogy: This is like jumping from assembly language to high-level languages, or from writing raw SQL to using an ORM.

Three Core Concepts to Understand DSPy's Full Picture

1. Signature: Type Signature of Tasks

Signature is DSPy's interface description. It tells the framework what this step does, not how to do it, using a type-declaration-like approach:

import dspy

class ExplainMathProblem(dspy.Signature):
    """Explain a math problem to students of a specified grade, using language appropriate to their cognitive level."""
    problem: str = dspy.InputField(desc="Original text of the math problem")
    grade: int = dspy.InputField(desc="Student grade level, e.g., 3 for third grade")
    explanation: str = dspy.OutputField(desc="Step-by-step explanation suitable for the grade, friendly and easy to understand")
    key_concept: str = dspy.OutputField(desc="Core concept tested by this problem, explained in one sentence")

Notice: you haven't written any prompt at all. This only contains the semantics of the interface, without any "you are a gentle and patient math teacher..." type of prompting.
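
To make the separation concrete, here is a minimal pure-Python sketch. It is not DSPy's implementation: ToySignature, ToyField, and both render functions are hypothetical names invented for illustration. It only shows that one interface description can be turned into several different prompt strings without touching the interface itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyField:
    name: str
    desc: str


@dataclass(frozen=True)
class ToySignature:
    """Hypothetical stand-in for a signature: inputs, outputs, and intent only."""
    instructions: str
    inputs: tuple
    outputs: tuple


def render_terse(sig, values):
    """One possible rendering: bare key/value lines."""
    lines = [sig.instructions]
    lines += [f"{f.name}: {values[f.name]}" for f in sig.inputs]
    lines.append("Return: " + ", ".join(f.name for f in sig.outputs))
    return "\n".join(lines)


def render_guided(sig, values):
    """Another rendering of the same interface, spelling out field descriptions."""
    lines = [f"Task: {sig.instructions}", "Inputs:"]
    lines += [f"- {f.name} ({f.desc}): {values[f.name]}" for f in sig.inputs]
    lines.append("Produce these fields:")
    lines += [f"- {f.name}: {f.desc}" for f in sig.outputs]
    return "\n".join(lines)


explain = ToySignature(
    instructions="Explain a math problem to students of a specified grade.",
    inputs=(ToyField("problem", "Original text of the math problem"),
            ToyField("grade", "Student grade level")),
    outputs=(ToyField("explanation", "Step-by-step explanation"),
             ToyField("key_concept", "Core concept in one sentence")),
)
values = {"problem": "35 heads and 94 legs: how many chickens and rabbits?", "grade": 3}
terse_prompt = render_terse(explain, values)
guided_prompt = render_guided(explain, values)
```

Swapping render_terse for render_guided changes the wording the model sees, while the signature and the calling code stay the same. That is the property an optimizer can exploit.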

2. Module: Composable Functional Units

Module is DSPy's execution unit, inspired by PyTorch's nn.Module. You can compose them like building blocks to construct a complete teaching content generation pipeline:

class MathLessonPipeline(dspy.Module):
    def __init__(self):
        super().__init__()  # Initialize the dspy.Module base class first
        # Step 1: Explain the problem
        self.explain = dspy.ChainOfThought(ExplainMathProblem)
        # Step 2: Generate corresponding Dinogsp geometry visualization script based on explanation
        self.generate_diagram = dspy.Predict(
            "problem, explanation -> dinogsp_script: str"
        )
        # Step 3: Create a practice problem of the same type
        self.make_exercise = dspy.Predict(
            "problem, key_concept, grade -> exercise: str, answer: str"
        )

    def forward(self, problem, grade):
        # Explain
        step1 = self.explain(problem=problem, grade=grade)
        # Generate diagram
        step2 = self.generate_diagram(
            problem=problem,
            explanation=step1.explanation
        )
        # Create practice problem
        step3 = self.make_exercise(
            problem=problem,
            key_concept=step1.key_concept,
            grade=grade
        )
        return dspy.Prediction(
            explanation=step1.explanation,
            dinogsp_script=step2.dinogsp_script,
            exercise=step3.exercise,
            answer=step3.answer
        )

This entire three-step pipeline doesn't contain a single word of prompt—everything written is logic.

DSPy includes several classic reasoning strategy modules:

Module	Corresponding Reasoning Method	Application in Teaching
`dspy.Predict`	Direct prediction	Problem difficulty grading, concept tagging
`dspy.ChainOfThought`	Chain of Thought (CoT)	Step-by-step problem-solving explanation
`dspy.ReAct`	Reasoning-Action loop	Calling external tools to validate scripts
`dspy.ProgramOfThought`	Program-based thinking	Generating executable math calculation code

3. Optimizer: Automatic Tuning Engine

This is the most magical part of DSPy, where its truly unique value lies.

You need to provide:

An evaluation dataset (e.g., 100 problems, each with manually annotated good explanation samples);
An evaluation metric function (to judge whether the generated explanation is good).

Then call the optimizer, which will automatically search for the optimal combination of prompt instructions and few-shot examples:

# Define evaluation metric: whether explanation is age-appropriate, whether diagram script is parseable
def lesson_quality_metric(example, prediction, trace=None):
    explanation_ok = len(prediction.explanation) > 50  # Basic length
    script_parseable = validate_dinogsp(prediction.dinogsp_script)  # Script usability
    grade_appropriate = check_vocabulary_level(
        prediction.explanation, example.grade
    )  # Age-appropriate vocabulary
    return explanation_ok and script_parseable and grade_appropriate

# Optimize using MIPROv2
optimizer = dspy.MIPROv2(metric=lesson_quality_metric, auto="medium")
optimized_pipeline = optimizer.compile(
    MathLessonPipeline(),
    trainset=annotated_lessons  # Your annotated data
)

# Save results, load directly in production without re-optimization
optimized_pipeline.save("./optimized_math_lesson.json")

A medium-level optimization run costs about $2 and takes roughly 20 minutes. The result is a teaching content generation system tuned automatically for your chosen model and grade-level data.
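
The metric above calls check_vocabulary_level, which you have to write yourself. The following is one rough way such a helper might look, assuming English text. The thresholds are invented for illustration and would need calibrating against real graded material. A validate_dinogsp helper depends on the script format and is not sketched here.

```python
import re


def check_vocabulary_level(text: str, grade: int) -> bool:
    """Crude readability gate: short words and short sentences for low grades.

    The thresholds below are hypothetical, not taken from any standard.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return False
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    max_word_len = 4.0 + 0.25 * grade      # allow longer words in higher grades
    max_sentence_len = 8 + 2 * grade       # allow longer sentences too
    return avg_word_len <= max_word_len and avg_sentence_len <= max_sentence_len


simple = "The hen has two legs. The rabbit has four legs."
dense = ("Simultaneously establishing heterogeneous simultaneous equations "
         "necessitates considerable algebraic sophistication.")
simple_ok = check_vocabulary_level(simple, grade=3)
dense_ok = check_vocabulary_level(dense, grade=3)
```

A heuristic this blunt is easy for an optimizer to satisfy without actually producing good explanations, which is why metric design deserves more care than the pipeline code.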

Looking at the Data

DSPy's official documentation provides a set of impressive data:

On the HotPotQA multi-hop reasoning task (which requires combining information across documents, very similar to the logical structure of math word problems), running dspy.ReAct with the gpt mini series:

Before optimization: 24% accuracy
After MIPROv2 optimization with 500 samples: 51% accuracy

More than doubled, not by switching to a more expensive model, but by teaching this smaller model to better complete this type of task.

The Essential Difference from LangChain/LlamaIndex

You might wonder how DSPy differs from LangChain—for instance, if you're already using LangChain, do you need to switch?

LangChain / LlamaIndex are tool chain orchestration frameworks. They connect components like LLMs, vector databases, and tool calls, but the prompts themselves are still strings written by humans. If you switch models, you still have to manually modify the prompts.

DSPy is an AI program compilation framework. It doesn't just connect components—it takes over the generation and optimization of prompts. Humans are responsible for writing the logic, while it translates that into the most effective natural language instructions for a specific model.

Specifically for math teaching scenarios: if you built a "generate third-grade explanations" pipeline with LangChain, and tomorrow the product requires fifth-grade support, you need to manually go back and modify all related prompt strings—because the vocabulary and logical depth requirements for fifth grade have changed. With DSPy, you only change the input parameter grade=5, then rerun compilation, and the framework will automatically adjust the internal prompting strategy.

If I were to make an analogy: LangChain is an automated assembly line, DSPy is a high-level language with a JIT compiler.

My Developer Perspective: What It Solves, What's Still Missing

After all this praise, I should also mention what I think it still lacks.

What DSPy truly solves:

Pain of model migration: Switching from GPT-5.4 to the cheaper Kimi 2.5, just recompile once—no need to manually modify prompts;
Multi-step joint optimization: Explanation quality and diagram script usability were previously hard to tune together, but DSPy's optimizer searches over the whole pipeline against a single metric;
Reproducible experiments: Optimization results saved as JSON, shareable with the team, version-controlled, goodbye to "which document has that best-performing prompt we used before?"

Current limitations:

Evaluation metrics are the hard part: Functions like validate_dinogsp() need to be written by you, and writing them well isn't easy. DSPy's optimization effectiveness highly depends on metric quality—vague metrics lead the optimizer to game the system;
Optimization isn't free: Medium-level optimization on 100 samples costs about $2; if you have multiple grade levels and problem types, costs will rise significantly as data volume increases;
Debugging experience is still maturing: When an optimized pipeline still underperforms, it's sometimes hard to determine whether it's insufficient data, flawed metrics, or the model's inherent capability boundary.

When Should You Use DSPy?

If you're encountering any of the following situations, it's worth seriously considering DSPy:

✅ Very suitable:

You're building multi-step LLM pipelines (explanation + diagram + practice problems is exactly this structure)
You need to switch between different models (cost control, or selecting different capability models by age group)
You have an evaluation dataset and want quantifiable improvement in effectiveness
You're tired of modifying prompts by feel and want a systematic optimization method
Your application needs long-term maintenance in production

⚠️ Not quite suitable:

Just quickly validating an idea, no need for long-term maintenance
The task has no clear evaluation metrics, leaving the optimizer with nothing to work with

Final Thoughts

I think DSPy's approach is good because it proposes a more engineering-reliable way of thinking:

Prompts in AI pipelines are essentially parameters of the program, not the program's source code.

Just as I wouldn't hard-code neural network weights into source code, I shouldn't treat prompts tuned for a specific model as the program logic itself. Like weights, prompts should be systematically learnable, optimizable, savable, and transferable.

The logic of teaching content is stable—step-by-step, illustrated, age-appropriate expression; but how to guide the model to achieve all this will constantly change with model updates, grade expansions, and problem type additions. Using DSPy to separate the two enables a truly maintainable AI teaching system.

🙋‍♀️ If you're also working on AI education, feel free to connect.

References

DSPy Official Documentation: dspy.ai
Paper: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, ICLR 2024
GitHub: stanfordnlp/dspy
Optimizer Details: dspy.ai/learn/optimization/optimizers

Struggling with Research Figures? Here's How Multi-Agent Collaboration Gets It Right

Luhui Dev — Sat, 11 Apr 2026 08:51:32 +0000

The Problem Every Researcher Knows Too Well

Anyone who's done research knows this pain: creating a single figure from concept to completion can be more exhausting than writing the actual paper. You need logical structure, data precision, and style compliance—miss any one of these, and you're back to the drawing board.

Single-model AI generation tools often produce beautiful images with broken logic, or logically sound diagrams that look terrible, or worst of all—figures where all the proportions are completely off.

PaperBanana solved this problem, and it works remarkably well. The key insight? Break the task into multiple roles and let an AI team collaborate.

Why Traditional AI Falls Short

Many assume that throwing a large language model at the problem should work. But research figures aren't ordinary illustrations—they need to accurately express logic, ensure data precision, and ultimately meet academic journal aesthetics.

A single model can't nail all three at once. The result? Either gorgeous images with completely wrong logic, or logically correct diagrams that look like they're from the '90s, and almost always with numerical proportions that make no sense.

This is the core pain point of research figure generation, and exactly why solutions like PaperBanana emerged.

PaperBanana's Five-Role Collaboration

PaperBanana's design philosophy is simple: Split the generation task into five specialized roles, let each focus on what they do best, then collaborate iteratively.

The Visual Workflow

1. Retriever — The Inspiration Board

The Retriever searches through a curated reference database to find the most relevant examples.

It focuses on visual structure matching, ensuring that subsequent generation has reliable layout references to work from.

Think of it like a designer browsing templates before starting to sketch—that's what the Retriever does.

2. Planner — The Skeleton Designer

The Planner is the core brain. It transforms paper descriptions and figure objectives into detailed figure plans, including:

Figure components (nodes/modules)
Logical relationships and arrow directions between components
Spatial layout suggestions
Labels, annotations, etc.

The Planner's core job is to provide the skeleton, preventing the generation from going off the rails.

3. Stylist — The Aesthetic Director

With the skeleton in place, the Stylist handles the aesthetics.

It extracts colors, fonts, line weights, and shapes from reference examples, optimizing the Planner's output to meet journal standards.

NeurIPS and Nature have different figure styles—the Stylist ensures generated figures comply with academic norms.

4. Visualizer — The Executor

The Visualizer generates figures based on the standardized plan:

Method figures → Rendered using high-quality image generation models
Data charts → Outputs reproducible Matplotlib code

This means generated figures aren't just pretty—they're directly usable as research materials, reproducible and modifiable.

5. Critic — The QA/Feedback Loop

The Critic is key to closing the loop. It checks whether the figure faithfully reflects the text, whether it's clear, and whether it meets style specifications.

If unsatisfied, it provides revision suggestions, prompting the Planner/Visualizer to iterate. Usually 2–3 rounds produce high-quality figures.
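
The five roles and the feedback loop can be sketched as plain functions. This is only an outline of the control flow described above, not PaperBanana's code: every function here is a hypothetical stub, and a real system would put a model call behind each one.

```python
from dataclasses import dataclass


@dataclass
class FigureDraft:
    plan: str
    style: str
    rendering: str
    revision: int


def retrieve(description):
    # Stub Retriever: a real one would search a reference database.
    return ["reference layout A", "reference layout B"]


def make_plan(description, references, feedback):
    # Stub Planner: folds the Critic's feedback into the next plan.
    notes = "; ".join(feedback) if feedback else "initial plan"
    return f"plan for '{description}' using {len(references)} references ({notes})"


def stylize(plan_text, references):
    # Stub Stylist: derives a style from the first reference.
    return f"style derived from {references[0]}"


def visualize(plan_text, style):
    # Stub Visualizer: stands in for image generation or plotting code.
    return f"figure[{plan_text} | {style}]"


def critique(draft):
    # Stub Critic: asks for arrow labels once, then accepts.
    if "add arrow labels" not in draft.plan:
        return ["add arrow labels"]
    return []


def generate_figure(description, max_rounds=3):
    references = retrieve(description)
    feedback = []
    draft = None
    for revision in range(max_rounds):
        plan_text = make_plan(description, references, feedback)
        style = stylize(plan_text, references)
        draft = FigureDraft(plan_text, style, visualize(plan_text, style), revision)
        feedback = critique(draft)
        if not feedback:       # Critic is satisfied, stop iterating
            break
    return draft


final_draft = generate_figure("encoder-decoder overview")
```

The max_rounds cap matters: without it, a Critic that is never satisfied would loop forever.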

Why Multi-Role Collaboration Works

Compared to single-model end-to-end generation, PaperBanana has three major advantages:

Reference-driven: The Retriever provides structural and stylistic examples, making generation more reliable
Clear division of labor: Logic, style, and rendering are separated, avoiding the chaos of black-box generation
Closed-loop self-checking: Critic + iteration makes figure quality controllable

In other words, this is a process innovation for AI-assisted research figure creation. In experiments, PaperBanana significantly outperformed baselines in fidelity, readability, and aesthetics.

If you're interested in the design of this scenario, I've compiled the complete Prompt set—grab it below 👇

Beyond Academic Figures

This multi-role collaboration pattern isn't limited to academic illustrations.

For flowcharts, experimental design diagrams, teaching demonstrations, automated data visualization, and even complex tasks like code generation and decision planning, multi-agent collaboration proves more reliable.

Dino-GSP Major Update: dynamic geometry demos, geometry embeds, and AI drawing upgrades

Luhui Dev — Tue, 07 Apr 2026 12:52:04 +0000

Dino-GSP 2.4.0 was released on March 23, 2026. This update is not just a list of extra features. It connects dynamic geometry demos, online geometry embeds, region area calculation, and AI geometry drawing into a more complete workflow.

If you are comparing dynamic geometry software, online geometry tools, math teaching tools, or interactive geometry platforms for lessons, content, or websites, this release deserves attention.

Dino-GSP 2.4.0 at a glance

This release focuses on four high-frequency needs:

Slider-based dynamic demos that make geometry figures actually move
Geometry embed mode for blogs, course pages, and product sites
Boolean region operations and area calculation for more complex analysis
Broader AI geometry assistance that fits real creation workflows

1. Dynamic geometry demos upgraded: sliders are now a first-class feature

The point of dynamic geometry is not just drawing figures. It is showing parameter changes, geometric relationships, and reasoning processes in motion. The latest Dino-GSP release fully rounds out slider support and makes it much closer to a real dynamic geometry software workflow for classrooms and content creation.

This upgrade includes:

Create and edit dynamic parameters: sliders can directly control lengths, angles, and point positions, with figures updating in real time.
Text-linked values: slider values can be inserted into explanatory text so teaching copy updates together with the figure.
Autoplay support: presentation and sharing modes support autoplay, speed adjustment, and looping for lessons and recorded demos.
More complete exports: sliders can be exported to SVG and TikZ while preserving labels and control styles for papers, handouts, and blogs.

This pushes Dino-GSP beyond a static geometry board and makes it more suitable for interactive geometry demos, classroom walkthroughs, and parameter-driven explanations.

2. Geometry embed mode arrives: the online geometry tool can now live inside web pages

For course builders, bloggers, and documentation teams, the ability to embed geometry into a page is a practical requirement. The latest Dino-GSP release adds a full geometry embed mode.

2.1 Where this helps

Embedding interactive geometry into teaching blogs
Showing manipulable math demos inside online courses
Adding interactive diagrams to product sites or knowledge bases
Preserving parameter control and geometry state in shared pages

2.2 What is included

A complete embed architecture: dedicated routing, state synchronization, and communication bridging.
iframe export: exportable iframe links with configurable aspect ratios for different layouts.
REPL integration: embedded surfaces can load and edit geometry content, so the experience goes beyond passive viewing.

3. Region area calculation and boolean operations improved: analysis is more complete

If you need to work with overlapping shapes, composite figures, or region logic, this release strengthens the analytical layer.

The update includes:

Boolean path operations: intersection, union, and difference for more complex region construction.
Region area calculation: direct area calculation plus contains checks.
Precision fixes: better handling of boundary precision issues, negative radii, and undefined dependencies.

This matters for:

Solving geometry problems involving overlapping areas
Verifying region relationships in teaching contexts
Building composite paths for cleaner exports
Running more stable geometry computation workflows
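
For intuition about how boolean operations and area relate, here is a small sketch that is independent of Dino-GSP. It assumes axis-aligned rectangles so the intersection is easy to compute; a real geometry engine has to handle arbitrary paths and arcs. It uses the shoelace formula for area, and inclusion-exclusion for the union.

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as (x, y) vertices."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rect_polygon(x0, y0, x1, y1):
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def rect_intersection(a, b):
    """Overlap of two rectangles (x0, y0, x1, y1), or None if they are disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def region_areas(a, b):
    area_a = polygon_area(rect_polygon(*a))
    area_b = polygon_area(rect_polygon(*b))
    overlap = rect_intersection(a, b)
    area_i = polygon_area(rect_polygon(*overlap)) if overlap else 0.0
    return {
        "intersection": area_i,
        "union": area_a + area_b - area_i,   # inclusion-exclusion
        "difference": area_a - area_i,       # a minus b
    }


areas = region_areas((0, 0, 4, 3), (2, 1, 6, 5))
```

Here the two rectangles have areas 12 and 16 and overlap in a 2 by 2 square, so the union is 24 and the difference is 8.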

4. Master management is now available: keep diagram styles consistent at scale

If you produce many teaching diagrams or worksheet visuals, repeated style setup quickly becomes inefficient. The latest release adds master management to improve content production efficiency.

You can now:

Open the master panel directly from the editor tabs
Create, update, apply, and delete masters
Set default styles and preview them in real time

For teachers, geometry creators, and worksheet teams, this improves batch production more than one-off drawing speed.

5. AI geometry drawing keeps improving: a smarter geometry assistant

Dino-GSP has been pushing AI toward an executable geometry assistant, not just a chat box. This AI update is part of that broader workflow.

The main AI improvements include:

Usage and credit records: clearer tracking for AI costs and consumption.
Image upload entry points: users can upload sketches or images and be routed to image-capable models.
Better conversation tools: copy, reaction, and feedback support for a more stable interaction loop.
Clearer instruction display: formatting, truncation, and expansion improve readability for complex prompts.
Animation support: AI can help create geometry animations and assist with keyframes and motion paths.

6. Axes, grids, and algebra definitions continue to improve

Beyond the larger features, this release also includes lower-level upgrades that affect daily use.

6.1 Coordinate system and grid

Custom grid ranges are supported
Axis point selection can lock intelligently
pi and pi/2 spacing are supported
X and Y ranges, labels, and intervals are more configurable

6.2 Automatic algebra definition reordering

Object order is adjusted automatically when algebra definitions change
Circular dependency detection and error prompts are supported
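
Reordering definitions by dependency is a topological sort, and a cycle is exactly the case where no valid order exists. The sketch below shows the idea with Python's standard graphlib module. It is a generic illustration, not Dino-GSP's implementation, and the object names are made up.

```python
from graphlib import CycleError, TopologicalSorter


def order_definitions(dependencies):
    """Return object names so that each appears after everything it depends on.

    `dependencies` maps an object name to the names used in its definition.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        cycle = error.args[1]  # graphlib reports the cycle as a list of nodes
        raise ValueError("circular definition: " + " -> ".join(cycle)) from error


# M is the midpoint of A and B; circle c is centered at M and passes through A.
order = order_definitions({"A": [], "B": [], "M": ["A", "B"], "c": ["M", "A"]})
```

If a user later redefines A in terms of c, the same call raises an error naming the loop instead of silently producing an undefined figure.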

7. More upgrades across drawing and sharing workflows

7.1 Geometry and drawing

New orthogonal drawing mode
Better ellipse arc editing
Added arrow styles
Dynamic anchor support for labels
Formula editor symbols better aligned with classroom math notation

7.2 Interaction and interface

Floating toolbar for union selection, color settings, and hover hints
More line and point styling options
Clearer property panel structure
Input width adjusts dynamically with expression count

7.3 Sharing and SEO

Community sharing can control whether AI chat records are public
Shared works can restrict saving and remixing
Shared pages support dynamic titles and descriptions

This makes Dino-GSP better not just for drawing, but also for distribution, discoverability, and search visibility.

8. Which day-to-day issues were fixed

This release also fixes a large number of practical issues, including:

Region computation: negative area, path restoration, arc judgment, and precision flicker
Sliders: style copying, step and speed defaults, snapping, previews, and history behavior
Selection: deselect with Shift, incorrect select-all behavior, and function graph box selection
Exports: inconsistencies across SVG, LaTeX, and Canvas, plus font embedding and clipping offsets
Tool compatibility: grid snapping, compass and transform tool errors, file jumps, and copy/paste

Try Dino-GSP

If you are comparing geometry software, math teaching tools, or embeddable dynamic geometry options, this version is now a much stronger reference point.

👉 Try Dino-GSP now

About Dino-GSP

Dino-GSP is a tool for math teaching, geometry creation, and online sharing. It combines a geometry engine, AI assistance, and professional export capabilities into a more modern geometry workflow.

Embed a Geometry Canvas in Your Webpage with One Line of Code

Luhui Dev — Tue, 17 Mar 2026 13:46:02 +0000

Introduction

Many products actually need geometry capabilities.

For example:

Online education platforms need to display geometric shapes in courses
Question bank systems need to create diagrams for math problems
AI Tutors need to draw diagrams dynamically when explaining problems
Lesson plan and courseware tools need to generate mathematical graphics

But here's the problem:

A geometry canvas is actually a very complex software system.

If you develop it yourself, you'll quickly find yourself dealing with a pile of problems:

Geometric object management (points, lines, circles, angles, curves)
Intersection calculation and constraint computation
Graphics rendering and drag-and-drop interaction
Multi-canvas management
File format and sharing system

All these capabilities combined basically constitute a complete product.

The final choice for many teams is either to use static images or integrate an existing geometry system.

Recently, we did something interesting: We turned a geometry canvas into a component that can be directly embedded in webpages.

Developers only need one line of code to put a complete geometry canvas into their own products.

A Geometry Canvas That Can Be Embedded in Webpages

The Dino-GSP（大角几何）Open Platform provides an embeddable geometry canvas SDK.

Developers can embed the geometry canvas into their own web applications just like using a frontend component.

The core concept is actually quite simple:

Your webpage
   ↓
Embed geometry canvas
   ↓
Gain complete geometry capabilities

This means:

No need to develop your own geometry engine
No need to implement geometry calculations yourself
No need to write complex interaction logic yourself

Just embed it and use it.

In the official capability design, Dino-GSP（大角几何）aims to become “geometry capability infrastructure”: through SDK, API, REPL, and other methods, making geometry capabilities embeddable in more products and systems.

The Simplest Way: Direct Embedding

If you just want to display a geometric figure, the simplest method is iframe embedding.

For example:

<iframe src="https://dajiaoai.com/e/33TA3484" width="800" height="600" allow="fullscreen"></iframe>

This way you can directly embed a geometry canvas into a webpage.

Suitable scenarios include:

Displaying geometric figures on teaching pages
Embedding mathematical graphics in blog articles
Showing dynamic figures in online textbooks

No additional development work required.
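
If your pages are generated on the server, the tag can be built from a share ID and a target aspect ratio. This is a small sketch, assuming share pages follow the https://dajiaoai.com/e/<id> pattern seen in the example above. The iframe_embed function is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import quote


def iframe_embed(share_id, width=800, aspect_ratio=(4, 3)):
    """Build an iframe tag whose height follows from width and aspect ratio."""
    ratio_w, ratio_h = aspect_ratio
    height = round(width * ratio_h / ratio_w)
    # Assumption: share pages live at https://dajiaoai.com/e/<id>.
    src = "https://dajiaoai.com/e/" + quote(share_id, safe="")
    return (
        f'<iframe src="{src}" width="{width}" height="{height}" '
        'allow="fullscreen"></iframe>'
    )


default_tag = iframe_embed("33TA3484")
wide_tag = iframe_embed("33TA3484", width=800, aspect_ratio=(16, 9))
```

With the defaults this reproduces the 800 by 600 tag shown earlier, and a 16:9 ratio yields a height of 450.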

Developer Approach: Using the SDK

If you want deeper control over the canvas, such as:

Dynamically loading graphics
Switching canvases
Importing files
Calling geometry operations

You can use the SDK integration approach.

First, install the SDK:

npm install @dajiaoai/algeo-sdk

Then create a canvas on the page:

import { AlgeoSdk } from '@dajiaoai/algeo-sdk'

const container = document.getElementById('algeo-container')

const sdk = await AlgeoSdk.create(container, {
  initialId: '33TA3484'
})

This creates a geometry canvas instance.

You can then operate it through the API, for example:

Load shared content:

await sdk.loadShareById('33TA3484')

Get canvas count:

const { count } = await sdk.getSlideCount()

Switch canvas:

await sdk.switchSlide(2)

Developers can use the geometry canvas as a programmable component.

A Very Interesting Capability: REPL

In addition to regular APIs, Dino-GSP（大角几何）also provides a REPL interface.

Simply put, it means using commands to directly control the geometry system.

For example:

Define geometric objects
Query graphic states
Execute geometry operations

The REPL output is in structured text format, making it convenient for AI or Agent systems to call.

This means that in the future, not only humans can operate the canvas, but AI can also directly call geometry capabilities.

This is why we call it: AI-native geometry capability interface.

Which Products Is This Suitable For?

The embeddable geometry canvas is actually suitable for many products.

1. Online Education Platforms

Directly embed geometric figures in course pages, supporting drag-and-drop and dynamic demonstrations.

2. Question Bank Systems

Automatically generate or load geometric figures for math problems.

3. AI Tutors

Draw diagrams dynamically when explaining geometry problems.

4. Math Content Platforms

Directly embed geometric figures in articles.

5. Independent Developer Tools

Quickly build a math tool without developing your own geometry engine.

Why We Built This Open Platform

Over the past year, while working on the geometry system, I've had a deep realization: geometry capability is actually a fundamental capability for many products.

But there aren't many solutions available on the market currently—either complete software (like GeoGebra) or simple graphics libraries.

What is missing is a way to call geometry capabilities like an API.

So what the Dino-GSP（大角几何）Open Platform hopes to do is enable more products to directly use geometry capabilities without having to reinvent the wheel.

👉 Dino-GSP（大角几何）Open Platform: open.dajiaoai.com

AlphaGeometry DSL Guide: Google Geometry DSL, defs.txt Actions, and Predicates

Luhui Dev — Sun, 08 Mar 2026 07:35:44 +0000

This article focuses only on AlphaGeometry DSL itself. It does not cover model training, search strategy, or paper results.

The goal is to treat the DSL as an engineering-facing protocol document and answer four questions:

How problem input is encoded
How actions are defined in defs.txt
How geometric relations are mapped into predicates
How numerical construction and symbolic reasoning are connected

If you want to reproduce AlphaGeometry, build a geometry data generator, or design a compatibility layer for a custom solver, understanding the DSL protocol is a prerequisite.

1. Role of the DSL

AlphaGeometry DSL is a domain-specific language for geometric construction and relation expression. It mainly serves 3 purposes:

Express initial geometric premises
Express executable construction actions
Express target geometric goals to be verified

Its output is not a final proof text, but a set of geometric relations consumable by a reasoning engine.

From an implementation perspective, the DSL is closer to an intermediate representation:

Upstream, it connects to problem descriptions or data generators
In the middle, it connects to action definitions and relation expansion
Downstream, it connects to rules.txt and DDAR reasoning

The protocol is centered on relation generation rather than diagram drawing.

2. Problem File Structure

A complete problem is usually written as:

problem_name
premises ? goal

Example:

orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c

It can be decomposed into 3 sections:

`a b c = triangle;` declares the initial premises and free objects
`h = on_tline b a c, on_tline c a b` is a construction based on known objects
`? perp a h b c` is the target predicate

This DSL fragment is not a natural-language solution. It is a geometric program:

Premises
  -> constructions
  -> predicate graph
  -> goal checking

After parsing, the system usually needs to produce:

An initial object table
An action invocation sequence
An initial predicate set
A target to be checked
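
As a rough illustration of the parsing step, the sketch below reads the problem format exactly as written in this article and produces the object table, the action sequence, and the goal. It is not AlphaGeometry's parser, parse_problem is a hypothetical name, and building the initial predicate set is left out because it needs the definitions from defs.txt.

```python
PROBLEM = """\
orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
"""


def parse_problem(text):
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    name, body = lines[0], " ".join(lines[1:])
    premises_text, goal_text = body.split("?", 1)

    objects, actions = [], []
    for clause in premises_text.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        outputs_text, calls_text = clause.split("=", 1)
        outputs = outputs_text.split()
        for point in outputs:
            if point not in objects:
                objects.append(point)
        # One clause may stack several constraints on the same output point.
        for call in calls_text.split(","):
            action, *args = call.split()
            actions.append({"outputs": outputs, "action": action, "args": args})

    goal_name, *goal_args = goal_text.split()
    return {
        "name": name,
        "objects": objects,
        "actions": actions,
        "goal": (goal_name, goal_args),
    }


parsed = parse_problem(PROBLEM)
```

For the orthocenter example this yields four points, three action invocations (one triangle and two on_tline constraints on h), and the goal perp a h b c.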

3. Basic Syntax Conventions

The core expression form of the DSL is:

output = action(parameters)

Example:

h = on_tline b a c, on_tline c a b

This means that point h satisfies two construction constraints at the same time:

Draw a line through b perpendicular to ac
Draw a line through c perpendicular to ab

Therefore, h is the intersection of the two perpendicular lines.

This style has two main properties:

Output variables are uniformly represented as point variables
Geometric objects are represented implicitly through point sets rather than independent object types

For example:

line a b
circle o a

line a b denotes the line defined by points a and b
circle o a denotes the circle centered at o passing through a

This design simplifies the parser and relation graph structure, but it requires the predicate system to be expressive enough to cover higher-level semantics of lines, circles, and angles.

4. Action Definition Structure in `defs.txt`

defs.txt is the action registry. Each action usually contains 5 parts:

action_name outputs inputs

variable dependency

input conditions

geometric constraints

numerical constructions

Example:

midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b

The role of each part is as follows.

1. Action signature

midpoint x a b

This means:

The action name is midpoint
The output point is x
The input points are a b

2. Variable dependency

x : a b

This means output variable x depends on inputs a and b.

This part is typically used for dependency graphs or variable scope management.

3. Input validity conditions

a b = diff a b

This means a and b must be distinct points.

This layer is used to prevent degenerate constructions and does not directly generate proof relations.

4. Geometric constraints

x : coll x a b, cong x a x b

This means the following predicates must hold after the action is completed:

coll x a b, meaning x is collinear with a and b
cong x a x b, meaning XA = XB

This part defines the symbolic semantics of the action and is the main input consumed by the downstream reasoning system.

5. Numerical construction interface

midp a b

This means the numerical engine should invoke a midpoint construction.

Typical uses of the numerical layer include:

Generating concrete coordinate instances
Checking whether a construction degenerates
Providing numerical truth checks for predicates
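
The five-part layout maps naturally onto a small record type. The sketch below parses only the simple five-line shape shown for midpoint. ActionDef and parse_action are hypothetical names, and entries in the real defs.txt may be more varied than this handles.

```python
from dataclasses import dataclass

MIDPOINT_DEF = """\
midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b
"""


@dataclass
class ActionDef:
    name: str
    params: list            # all points named in the signature
    outputs: list           # points the action creates
    depends_on: list        # points the outputs depend on
    input_conditions: str   # kept as raw text in this sketch
    predicates: list        # each predicate as [name, point, ...]
    numeric: list           # numerical construction as [name, point, ...]


def parse_action(block):
    lines = [line.strip() for line in block.strip().splitlines()]
    if len(lines) != 5:
        raise ValueError("this sketch expects exactly five lines per action")
    signature, dependency, conditions, constraints, numeric = lines
    name, *params = signature.split()
    outputs_text, depends_text = dependency.split(":", 1)
    _, predicates_text = constraints.split(":", 1)
    return ActionDef(
        name=name,
        params=params,
        outputs=outputs_text.split(),
        depends_on=depends_text.split(),
        input_conditions=conditions,
        predicates=[p.split() for p in predicates_text.split(",")],
        numeric=numeric.split(),
    )


midpoint = parse_action(MIDPOINT_DEF)
```

Keeping predicates and the numerical construction as separate fields mirrors the split between symbolic reasoning and coordinate-level checking.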

5. Predicate System

Predicates are the core input format of the AlphaGeometry reasoning system.

Common core predicates include:

predicate	Meaning
`coll A B C`	Three points are collinear
`cong A B C D`	`AB = CD`
`perp A B C D`	`AB ⟂ CD`
`para A B C D`	`AB ∥ CD`
`eqangle ...`	Angles are equal
`cyclic A B C D`	Four points are concyclic

These predicates enter the rule system defined in rules.txt and trigger further inference.

A typical flow is:

construction
  -> initial predicates
  -> rule firing
  -> new predicates
  -> goal reached / not reached

So the core value of the DSL is not the action catalog itself, but its predicate generation capability.

Whether an action is useful mainly depends on:

Which predicates it introduces
Whether those predicates are likely to trigger rule chains
Whether they significantly shorten the proof path to the goal
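
The rule-firing loop can be shown with a toy forward-chaining sketch. The two rules here (symmetry and transitivity of parallelism) are chosen for illustration and are not copied from rules.txt. Lines are abbreviated to strings like "ab" rather than point tuples.

```python
def saturate(facts):
    """Apply the toy rules until no new predicate appears (a fixpoint)."""
    facts = set(facts)
    while True:
        derived = set()
        for name, x, y in facts:
            if name != "para":
                continue
            derived.add(("para", y, x))                 # symmetry
            for other, y2, z in facts:
                if other == "para" and y2 == y and z != x:
                    derived.add(("para", x, z))         # transitivity
        if derived <= facts:                            # nothing new: stop
            return facts
        facts |= derived


initial = {("para", "ab", "cd"), ("para", "cd", "ef")}
closure = saturate(initial)
goal_reached = ("para", "ab", "ef") in closure
```

An action that adds even one well-placed predicate can unlock a chain like this, which is why predicate yield matters more than the size of the action catalog.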

6. Common Action Types

1. Basic objects

free a
triangle a b c
quadrangle a b c d

free generates a free point
triangle generates the 3 base points of a triangle
quadrangle generates the 4 base points of a quadrilateral

2. Points on a line or circle

on_line x a b
on_circle x o a
on_pline x a b c
on_tline x a b c

on_line corresponds to a collinearity constraint
on_circle corresponds to an equal-radius constraint
on_pline corresponds to a parallel constraint
on_tline corresponds to a perpendicular constraint

3. Intersection constructions

intersection_ll x a b c d
intersection_lc x a o b
intersection_cc x o w a

These represent:

line-line intersection
line-circle intersection
circle-circle intersection
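On the numerical side, the line-line case might look like the sketch below. The function reuses the action name `intersection_ll` for readability, but the signature, the tolerance `eps`, and the choice to return `None` for parallel lines are assumptions of this example.

```python
def intersection_ll(a, b, c, d, eps=1e-12):
    """Intersection of line ab with line cd, or None if they are (near-)parallel."""
    r = (b[0] - a[0], b[1] - a[1])        # direction of line ab
    s = (d[0] - c[0], d[1] - c[1])        # direction of line cd
    denom = r[0] * s[1] - r[1] * s[0]     # cross product r x s
    if abs(denom) < eps:
        return None                       # degenerate: no unique intersection
    t = ((c[0] - a[0]) * s[1] - (c[1] - a[1]) * s[0]) / denom
    return (a[0] + t * r[0], a[1] + t * r[1])


x = intersection_ll((0, 0), (2, 2), (0, 2), (2, 0))
parallel = intersection_ll((0, 0), (1, 0), (0, 1), (1, 1))
print(x, parallel)
```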

4. Basic geometric constructions

midpoint x a b
foot x a b c
mirror x a b

midpoint generates a midpoint
foot generates a foot of the perpendicular
mirror generates a symmetric point

The key property of these actions is that a single invocation introduces several relations at once: midpoint, for example, yields both a collinearity and an equal-length predicate.
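A numerical sketch of `foot` and `mirror` shows how one construction carries more than one relation. Two assumptions are made here: `foot x a b c` is read as the foot of the perpendicular from a onto line bc, and `mirror x a b` is read as the reflection of a through the point b.

```python
def foot(a, b, c):
    """Foot of the perpendicular from a onto line bc (assumes b != c)."""
    dx, dy = c[0] - b[0], c[1] - b[1]
    t = ((a[0] - b[0]) * dx + (a[1] - b[1]) * dy) / (dx * dx + dy * dy)
    return (b[0] + t * dx, b[1] + t * dy)


def mirror(a, b):
    """Reflection of a through b, so that b is the midpoint of a and the result."""
    return (2 * b[0] - a[0], 2 * b[1] - a[1])


a, b, c = (1, 3), (0, 0), (4, 0)
x = foot(a, b, c)

# The single foot construction yields two relations: x lies on bc, and ax is perpendicular to bc.
on_bc = (c[0] - b[0]) * (x[1] - b[1]) - (c[1] - b[1]) * (x[0] - b[0]) == 0
ax_perp_bc = (x[0] - a[0]) * (c[0] - b[0]) + (x[1] - a[1]) * (c[1] - b[1]) == 0
print(x, on_bc, ax_perp_bc, mirror(a, b))
```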

5. Triangle centers

For example:

circumcenter x a b c
incenter x a b c
excenter x a b c
centroid x y z i a b c
ninepoints x y z i a b c

These actions typically introduce multiple relation groups at once, such as equidistance, angle bisection, and perpendicular bisectors.
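For instance, a circumcenter brings three equal-distance relations with it. The sketch below uses the standard coordinate formula for the circumcenter; the function name and error handling are assumptions for this example and not the engine's own code.

```python
import math


def circumcenter(a, b, c):
    """Circumcenter of triangle abc using the standard coordinate formula."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if d == 0:
        raise ValueError("collinear points have no circumcenter")
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy)


a, b, c = (0, 0), (4, 0), (0, 2)
o = circumcenter(a, b, c)
radii = [math.dist(o, p) for p in (a, b, c)]   # OA, OB, OC should all agree
print(o, radii)
```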

6. Special polygons

For example:

square a b x y
rectangle a b c d
parallelogram a b c x
trapezoid a b c d
eq_trapezoid a b c d

These primitives have stronger initial relations and are better suited for generating structured problems or high-constraint training samples.

7. Worked Example

Example:

orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c

The execution process is as follows.

Step 1: Parse the premises

triangle a b c generates the base point set and non-degeneracy conditions.

Step 2: Execute the construction

h is defined as the intersection of the following two constraints:

A line through b perpendicular to ac
A line through c perpendicular to ab

Step 3: Materialize predicates

The construction is converted into:

perp b h a c
perp c h a b

Step 4: Verify the goal

The system checks whether it can derive:

perp a h b c

If yes, the goal is established. Otherwise, the problem is not proved under the current construction and rule set.
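Before any symbolic proof is attempted, the same example can be sanity-checked numerically on one concrete triangle. This is a sketch under stated assumptions: the helper names and the chosen coordinates are illustrative, and a numerical check on a single instance is evidence that the goal is plausible, not a proof.

```python
def perp_dir(p, q):
    """A direction vector perpendicular to segment pq."""
    return (-(q[1] - p[1]), q[0] - p[0])


def intersect(p, r, q, s):
    """Intersection of p + t*r with q + u*s (assumes the lines are not parallel)."""
    denom = r[0] * s[1] - r[1] * s[0]
    t = ((q[0] - p[0]) * s[1] - (q[1] - p[1]) * s[0]) / denom
    return (p[0] + t * r[0], p[1] + t * r[1])


def is_perp(p, q, r, s, eps=1e-9):
    """Numerical check for 'perp p q r s' via the dot product."""
    return abs((q[0] - p[0]) * (s[0] - r[0]) + (q[1] - p[1]) * (s[1] - r[1])) < eps


a, b, c = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
# h = on_tline b a c, on_tline c a b
h = intersect(b, perp_dir(a, c), c, perp_dir(a, b))
print(h, is_perp(a, h, b, c))   # goal: perp a h b c
```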

8. Execution Pipeline

From problem text to goal verification, the typical pipeline is:

Problem DSL
  -> parse.py
  -> action expansion via defs.txt
  -> geometry graph
  -> predicate inference via rules.txt
  -> DDAR solver

The responsibility of each stage is:

Parse the problem text and identify premises, constructions, and goals
Look up the corresponding action definition in defs.txt
Expand variable dependencies, input conditions, and geometric constraints
Write the constraints into the geometry relation graph
Trigger new predicates according to rules.txt
Run reachability checks against the target predicate

Numerical construction and symbolic reasoning usually run side by side in this pipeline:

The numerical layer handles instantiation and truth checking
The symbolic layer handles strict inference and proof tracing
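The first stage of the pipeline, separating premises, constructions, and the goal, could be sketched as below. The function `parse_problem` is a hypothetical stand-in written for this article, not the parse.py from the AlphaGeometry codebase, and it only handles the simple shape used in the orthocenter example.

```python
def parse_problem(text: str) -> dict:
    """Split a problem into its name, construction clauses, and goal predicate."""
    name, *rest = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    body, goal = " ".join(rest).split("?")
    clauses = []
    for clause in body.split(";"):
        if not clause.strip():
            continue
        points, actions = clause.split("=")
        clauses.append({
            "points": points.split(),
            "actions": [a.split() for a in actions.split(",")],
        })
    return {"name": name, "clauses": clauses, "goal": goal.split()}


problem = parse_problem("""
orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
""")
print(problem["goal"])
```

Each action list produced here would then be looked up in the action definitions and expanded into constraints, as the later stages describe.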

9. Implementation Notes

1. Action design should optimize for relation output

Whether an action is worth keeping should be judged by the quality of the predicates it introduces, not by whether the geometric meaning feels intuitive.

2. Degeneracy must be handled explicitly

Cases such as coincident points, parallel lines without an intersection, and zero-radius circles should be intercepted either in input conditions or in the numerical layer.

3. Predicate coverage determines the expressive ceiling

If the system can only express collinearity, parallelism, and perpendicularity, it quickly runs out of representational power on harder geometry problems, which typically hinge on equal lengths, equal angles, or concyclic points.

4. The numerical interface should not be omitted

If symbolic definitions exist without numerical construction interfaces, the cost of data generation, debugging, and truth checking rises substantially.

10. Recommended Minimal Implementable Subset

If you want a minimal version compatible with the AlphaGeometry approach, prioritize support for the following actions:

triangle
on_line
on_tline
on_pline
intersection_ll
midpoint
foot

And support at least the following predicates:

coll
perp
para
cong
cyclic
eqangle

This is a small but workable protocol core.
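A small guard like the following can tell you early whether a problem stays inside that subset. The set contents come from the two lists above; the function name `unsupported` and its signature are assumptions for this sketch.

```python
SUPPORTED_ACTIONS = {
    "triangle", "on_line", "on_tline", "on_pline",
    "intersection_ll", "midpoint", "foot",
}
SUPPORTED_PREDICATES = {"coll", "perp", "para", "cong", "cyclic", "eqangle"}


def unsupported(actions, goal):
    """Return the names a problem uses that fall outside the minimal subset."""
    bad = [a for a in actions if a not in SUPPORTED_ACTIONS]
    if goal not in SUPPORTED_PREDICATES:
        bad.append(goal)
    return bad


print(unsupported(["triangle", "on_tline", "on_tline"], "perp"))
print(unsupported(["triangle", "incenter"], "perp"))
```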

11. Protocol Essence

AlphaGeometry DSL can be summarized as:

Geometry Construction DSL
+ Predicate Interface
+ Rule-System Input Layer

Its main value is not in describing diagrams, but in compressing geometry problems into an executable, verifiable, and inferable protocol layer.

12. Recommended Alternative: Dino-GSP

If you need a geometry representation environment that is more open than AlphaGeometry DSL and better suited for product and ecosystem integration, take a look at Dino-GSP.

It also represents geometric objects as executable structures and defines its own DSL and constraint representation layer to support:

More open geometry construction and editing workflows
Ecosystem integration for teaching, content production, and AI geometry applications
Programmable figure generation, constraint validation, and auxiliary structure construction

If AlphaGeometry DSL is closer to an internal protocol for solvers and research systems, Dino-GSP is closer to an extensible product layer and an open ecosystem interface.