<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Milky</title>
    <description>The latest articles on Forem by Milky (@milky2018).</description>
    <link>https://forem.com/milky2018</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3676498%2F7e06eca9-ecf6-45ca-bc2d-b3a50e6b092e.jpeg</url>
      <title>Forem: Milky</title>
      <link>https://forem.com/milky2018</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/milky2018"/>
    <language>en</language>
    <item>
      <title>AAoM-04: A Python 3.12 Interpreter</title>
      <dc:creator>Milky</dc:creator>
      <pubDate>Fri, 30 Jan 2026 03:32:05 +0000</pubDate>
      <link>https://forem.com/milky2018/aaom-04-a-python-312-interpreter-24fl</link>
      <guid>https://forem.com/milky2018/aaom-04-a-python-312-interpreter-24fl</guid>
      <description>&lt;h1&gt;
  
  
  AAoM-04: A Python 3.12 Interpreter
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;January 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Happy New Year! This entry covers moonpython: a Python interpreter in MoonBit. After about two weeks of vibe coding with Codex (GPT-5.2 &amp;amp; GPT-5.2-Codex), the project now runs a large, pragmatic subset of Python 3.12. &lt;/p&gt;

&lt;h2&gt;
  
  
  Toolchain Updates
&lt;/h2&gt;

&lt;p&gt;This time I used Codex CLI with GPT-5.2 and kept three MoonBit skills active (same set as before: &lt;code&gt;moonbit-lang&lt;/code&gt;, &lt;code&gt;moonbit-agent-guide&lt;/code&gt;, &lt;code&gt;moon-ide&lt;/code&gt;). Codex is slower than Claude but much more stable on long, reasoning-heavy tasks. That tradeoff works well for a language interpreter. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;moon-ide&lt;/code&gt; skill recently gained three commands that are especially handy in a growing interpreter codebase: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;hover&lt;/code&gt; shows hover information for a symbol.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outline&lt;/code&gt; shows an outline of a specified file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;doc&lt;/code&gt; shows documentation for a symbol.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Codex can use these tools very skillfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;A useful subset of Python needs to capture at least: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic semantics: scoping, closures, globals/nonlocals, and the descriptor model.&lt;/li&gt;
&lt;li&gt;Generators, async/await, and exception handling.&lt;/li&gt;
&lt;li&gt;Pragmatic import support and enough builtins to run real scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I targeted Python 3.12 semantics, but intentionally skipped full stdlib parity, C extensions, packaging, and bytecode compatibility. The aim is correctness for a useful subset, a clean architecture, and repeatable testing. More specifically, moonpython is meant to be used as a library to run real Python snippets, not as a CPython replacement for large production-scale projects. Given that scope, a JIT is poor ROI: it usually demands deep, platform-specific optimization for each OS and architecture, and the effort is often 5-10x the cost of building the interpreter itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime Reality Check
&lt;/h3&gt;

&lt;p&gt;During implementation I kept adding new features: most builtins, async/await, generators, type hints, exception groups, and so on. Yet real-world Python projects still failed to run. The lesson was clear: the hardest part of an industrial-strength interpreter is not syntax coverage, but whether the runtime is dirty enough.&lt;/p&gt;

&lt;p&gt;By "dirty," I mean the unglamorous compatibility details that real code quietly depends on: import caching rules, path search order, namespace packages, module metadata (&lt;code&gt;__file__&lt;/code&gt;, &lt;code&gt;__package__&lt;/code&gt;, &lt;code&gt;__spec__&lt;/code&gt;), descriptor binding semantics, edge-case exception types, and even tiny differences in string/float formatting. A clean design is not enough; you have to copy CPython's weird corners. For example, &lt;code&gt;__file__&lt;/code&gt; is not part of "Python-the-language", but for file-backed modules it is a widely relied-upon convention that real projects assume exists. In practice, this means the import system must be almost CPython-identical, the object model has to match descriptor and attribute rules, and error messages must be stable. This is where most of the remaining work lies, not in adding new syntax nodes. &lt;/p&gt;
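
&lt;p&gt;To make the metadata point concrete, here is a minimal Python sketch of what real code quietly assumes. The module name &lt;code&gt;demo_mod&lt;/code&gt; and the temporary directory are made up for illustration; the attributes themselves are standard CPython behavior for file-backed modules, and this is the kind of check a compatible import system has to pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import importlib
import os
import sys
import tempfile


def load_demo_module():
    """Write a tiny module to disk, import it, and return its metadata."""
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "demo_mod.py")  # hypothetical module name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("VALUE = 42\n")

    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_mod")
    finally:
        sys.path.remove(tmp_dir)

    return {
        "file": os.path.basename(module.__file__),
        "package": module.__package__,
        "spec_name": module.__spec__.name,
        "cached": sys.modules["demo_mod"] is module,
    }


info = load_demo_module()
print(info)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of these four values comes from the language grammar, yet scripts that locate data files next to their source or use relative imports tend to break when any of them is missing or different.&lt;/p&gt;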

&lt;p&gt;Python has documentation and PEPs, but there is no single executable specification; on the messy edges (especially imports), compatibility is ultimately defined by what CPython does and what the ecosystem expects. This is also why I treat CPython behavior as the ground truth: it is at least verifiable, and that matters enormously when you are building with AI in the loop. The downside is that it pushes moonpython toward a lot of edge-case handling code, because matching a living ecosystem inevitably means handling its corners.&lt;/p&gt;

&lt;p&gt;If you want a vivid reminder of how these "small" features accumulate into ecosystem constraints, Armin Ronacher’s classic post &lt;a href="https://lucumr.pocoo.org/2014/8/24/revenge-of-the-types/" rel="noopener noreferrer"&gt;Revenge of the Types&lt;/a&gt; is a great read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tests Generation
&lt;/h3&gt;

&lt;p&gt;Following the previous AAoM pattern, I built the test harness first. The script &lt;code&gt;scripts/generate_spec_tests.py&lt;/code&gt; harvests a subset of snippets from CPython &lt;code&gt;Lib/test&lt;/code&gt;, runs the snippets under a restricted builtins set and emits MoonBit snapshot tests into &lt;code&gt;spec_generated_test.mbt&lt;/code&gt;. That single file contains 2,709 generated tests. Together with the (AI) hand-written tests, the suite currently has 2,894 tests. &lt;/p&gt;

&lt;p&gt;A typical generated test looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test "generated/expr/0001" {
  let source =
    #|'This string will not include \
    #|backslashes or newline characters.'
  let result = Interpreter::new_spec().eval_source(source)
  let expected = "[\"ok\", [\"Str\", \"This string will not include backslashes or newline characters.\"], \"\", \"\"]"
  assert_run(result, expected)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
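
&lt;p&gt;The generator script itself is not shown in this post. As a rough sketch of the idea only: evaluate a snippet under a small builtins allowlist and serialize the outcome into a JSON record shaped like the expected value in the test above. The function names, the allowlist, the &lt;code&gt;Int&lt;/code&gt; and &lt;code&gt;Bool&lt;/code&gt; tags, and the meaning given to the last two fields (captured stdout and an exception type name) are assumptions for illustration, not the actual contents of &lt;code&gt;scripts/generate_spec_tests.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import contextlib
import io
import json

# Hypothetical allowlist; the real script's set is not shown in the post.
SAFE_BUILTINS = {"len": len, "range": range, "print": print, "str": str, "int": int}


def tag_value(value):
    """Encode a value as a [type_tag, payload] pair, or None if unsupported."""
    if isinstance(value, bool):
        return ["Bool", value]
    if isinstance(value, int):
        return ["Int", str(value)]
    if isinstance(value, str):
        return ["Str", value]
    return None  # floats such as NaN/Inf and arbitrary objects are skipped


def run_expr(source):
    """Evaluate one expression under restricted builtins; return a JSON record."""
    stdout = io.StringIO()
    env = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(stdout):
            value = eval(compile(source, "snippet", "eval"), env)
    except Exception as exc:
        # Errors are normalized to the exception type name only.
        return json.dumps(["error", None, stdout.getvalue(), type(exc).__name__])
    tagged = tag_value(value)
    if tagged is None:
        return None  # caller drops snippets whose value cannot be serialized
    return json.dumps(["ok", tagged, stdout.getvalue(), ""])


print(run_expr("'abc' + 'def'"))
print(run_expr("open('x')"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call fails with a &lt;code&gt;NameError&lt;/code&gt; because &lt;code&gt;open&lt;/code&gt; is not in the allowlist, which is the point: anything outside the sandbox becomes a recorded error or a dropped snippet instead of a side effect on the machine generating the tests.&lt;/p&gt;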



&lt;p&gt;Limitations are inevitable. The harvesting is heuristic and can miss real-world patterns. The sandboxed evaluator only allows a small builtins set and a few imports, so many library behaviors are simply out of scope. Snippets that take too long are skipped, and some values (like NaN/Inf or arbitrary objects) are deliberately excluded from serialization. Error reporting is normalized rather than bit-for-bit identical with CPython. I deliberately avoided generating an unbounded test set early on. Based on my experience writing a &lt;a href="https://github.com/Milky2018/wasmoon" rel="noopener noreferrer"&gt;WebAssembly runtime&lt;/a&gt;, if I had done that, there was a good chance moonpython would pass literally zero tests at the start. When everything is red, the agent tends to over-engineer locally (especially in the parser) to satisfy a huge, noisy failure surface, and it becomes hard to enter an iterative development rhythm. &lt;/p&gt;

&lt;p&gt;So I kept the generator constrained to produce a manageable bootstrap suite: enough to guide the first implementation and validate core semantics, but not so large that it prevents early wins. Once the interpreter becomes usable, the CPython &lt;code&gt;Lib/test&lt;/code&gt; suite is the real endgame. Of course, I never expected the interpreter to pass it completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-time Grinding
&lt;/h3&gt;

&lt;p&gt;These two weeks were effectively 24/7 for Codex. I would check in a few times a day, but most of the time the agent was working on its own. In total, Codex made 104 commits during this period.&lt;/p&gt;

&lt;p&gt;At the beginning of the conversation, we repeatedly discussed shaping the spec-driven test harness and stabilizing CPython evaluation under a restricted builtins set. Once the generator was in place, I let Codex run unattended and only stepped in when the failure rate stopped decreasing. Intervention was not about fixing individual bugs. I would interrupt Codex only when I noticed clear conceptual errors. One recurring example was a misunderstanding of how "import" fields in &lt;code&gt;moon.pkg.json&lt;/code&gt; should be authored, which led Codex to apply the same incorrect pattern repeatedly. When that happened, I wrote the correct convention into &lt;code&gt;AGENTS.md&lt;/code&gt; or a &lt;code&gt;SKILL.md&lt;/code&gt;, and then restarted the agent using &lt;code&gt;codex resume --last&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;The interpreter is a direct AST evaluator (which may not be the best choice; I discuss this later). &lt;/p&gt;

&lt;p&gt;The project layout: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lexer.mbt&lt;/code&gt; and &lt;code&gt;parser.mbt&lt;/code&gt; implement a Python 3.12 grammar.&lt;/li&gt;
&lt;li&gt;A dedicated spec file defines the public AST and value model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime_*.mbt&lt;/code&gt; implements the runtime semantics, including builtins, scoping, exceptions, generators, and the object model.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cmd/main&lt;/code&gt; and &lt;code&gt;cmd/repl&lt;/code&gt; provide a runner and a simple REPL. moonpython can also be used as a library. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More language features were implemented than I had expected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12 syntax for &lt;code&gt;match&lt;/code&gt;, &lt;code&gt;with&lt;/code&gt;, and full f-strings.&lt;/li&gt;
&lt;li&gt;Generators (&lt;code&gt;yield&lt;/code&gt;, &lt;code&gt;yield from&lt;/code&gt;) and async (&lt;code&gt;async def&lt;/code&gt;, &lt;code&gt;await&lt;/code&gt;, &lt;code&gt;async for/with&lt;/code&gt;, async generators).&lt;/li&gt;
&lt;li&gt;Exception groups and &lt;code&gt;except*&lt;/code&gt; (PEP 654), plus tracebacks with line/column spans.&lt;/li&gt;
&lt;li&gt;Type parameter syntax (PEP 695) parsed and preserved in the AST (runtime no-op for now).&lt;/li&gt;
&lt;li&gt;Core data model: big ints, floats, complex numbers, bytes/bytearray, lists/tuples/sets/dicts, slicing assignment.&lt;/li&gt;
&lt;li&gt;A file-based import system and a vendored CPython &lt;code&gt;Lib/&lt;/code&gt; snapshot for pure-Python modules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can try some &lt;a href="https://github.com/moonbit-community/moonpython/tree/main/examples" rel="noopener noreferrer"&gt;real-world Python programs&lt;/a&gt; with moonpython: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nks901w0dzdrnisrfpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nks901w0dzdrnisrfpj.png" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All 2,894 tests (generated and hand-written) pass. So, how many tests in CPython &lt;code&gt;Lib/test&lt;/code&gt; have passed? Well, zero at the moment, because the suite currently aborts early due to missing support for variable annotations (PEP 526). This feature may already be supported by the time this post is published. Anyway, passing the whole &lt;code&gt;Lib/test&lt;/code&gt; suite remains a long-term goal, and I fully expect it to land as the project matures.&lt;/p&gt;

&lt;p&gt;Time Investment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One day to find and download the de facto standards;&lt;/li&gt;
&lt;li&gt;Then, about two weeks of active development, mostly spent on the runtime (scoping + generators/async) and on shrinking the long tail of test failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reflections and Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Implementation Code Is Far from Clean
&lt;/h3&gt;

&lt;p&gt;As I said, useful tools have to be dirty because the problems in the real world are dirty. Python's import system is a great example of why the runtime must be dirty: the spec itself is messy. Fortunately, this is exactly the kind of dirty work that AI is good at: it can grind through edge cases, keep the bookkeeping consistent, and iterate until imports behave like the ecosystem expects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is AST-walking Interpretation a Good Idea?
&lt;/h3&gt;

&lt;p&gt;Another counterintuitive lesson is that a pure AST-walking interpreter is not always simpler than "compile to bytecode &amp;amp; run a bytecode VM". &lt;/p&gt;

&lt;p&gt;When Codex first presented this trivial design, I didn't object because it was the most straightforward solution. But once you need suspend/resume semantics like generators and async/await, plus correct try/finally and fine-grained tracebacks, an AST interpreter has to handle defunctionalized continuations manually and often ends up re-creating a minimal VM anyway. Even worse, with an AST-walking interpreter, there is no natural place to apply analysis and optimizations (that's why you need to &lt;a href="https://www.moonbitlang.com/blog/moonbit-static-analysis" rel="noopener noreferrer"&gt;analyze the programs on an IR&lt;/a&gt;).&lt;/p&gt;
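
&lt;p&gt;To see why suspension is awkward for a tree walker, compare a Python generator with a hand-written equivalent that stores its resume point explicitly. This only illustrates the bookkeeping involved; the class name and state numbering are invented, and it is not a description of how moonpython is implemented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def countdown(n):
    """Ordinary generator: the interpreter must remember where to resume."""
    while n != 0:
        yield n
        n -= 1
    yield "done"


class CountdownMachine:
    """Hand-written equivalent: locals and resume point live in fields."""

    def __init__(self, n):
        self.n = n
        self.state = 0  # 0 = top of loop, 1 = just after 'yield n', 2 = finished

    def __iter__(self):
        return self

    def __next__(self):
        if self.state == 1:
            # Resume right after 'yield n': run the rest of the loop body.
            self.n -= 1
            self.state = 0
        if self.state == 0:
            if self.n != 0:
                self.state = 1
                return self.n
            self.state = 2
            return "done"
        raise StopIteration


print(list(countdown(3)))
print(list(CountdownMachine(3)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both calls yield the same sequence. The second version is what a recursive evaluator has to synthesize for every loop, &lt;code&gt;try&lt;/code&gt; block, and nested expression that can contain a &lt;code&gt;yield&lt;/code&gt;, whereas a bytecode VM gets the resume point for free as an instruction offset.&lt;/p&gt;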

&lt;h3&gt;
  
  
  Which Model Should We Choose?
&lt;/h3&gt;

&lt;p&gt;I also used &lt;a href="https://openai.com/index/introducing-gpt-5-2-codex/" rel="noopener noreferrer"&gt;GPT-5.2-Codex&lt;/a&gt;, the latest agentic coding model released by OpenAI, for several days. But based on my observations, GPT-5.2-Codex is not always superior to GPT-5.2. In fact, for non-programming tasks, GPT-5.2 is clearly better than GPT-5.2-Codex. Overall, Codex (GPT-5.2) works well, though it usually takes several times longer than Claude (Opus-4.5) to complete easy tasks. &lt;/p&gt;

&lt;p&gt;Although Codex is slow, its accuracy is significantly higher than Claude's, and it almost never causes rework. Claude, by contrast, often tries to tackle difficult problems through repeated trial-and-error; when an attempt fails, it frequently reverts all changes with git checkout, only to head back into the same dead end. &lt;/p&gt;

&lt;p&gt;In this respect, Codex is highly reassuring. You can safely hand tasks over to it and step in only to update your SKILLs with necessary guidance. Once the rules were made explicit, Codex usually absorbed them and continued working productively with little further guidance. Btw, MoonBit's strong typing and predictable tooling turned out to be particularly helpful once the runtime logic grew large.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple Codexes Work Together
&lt;/h3&gt;

&lt;p&gt;In parallel, I was also using Codex to build other software. Increasingly, we do not need to micromanage AI. Given enough time and a solid test loop, the AI can make major progress with minimal human interaction. That suggests a better workflow: keep multiple AI sessions open in separate terminals and let them work concurrently.&lt;/p&gt;

&lt;p&gt;In theory, with &lt;code&gt;git worktree&lt;/code&gt;, having multiple Codexes develop the same project in parallel should not be hard either. The real cost is the merge: resolving conflicts is a heavy cognitive load, especially when the changes are large and cross-cutting. For now I am having each Codex develop a separate project and will only try multi-agent same-repo work when the payoff is clearer. At the moment, I am developing 5–8 projects in parallel with 5–8 coding agents. Next time, I will share some of the results.&lt;/p&gt;




&lt;p&gt;Code is available on &lt;a href="https://github.com/moonbit-community/moonpython" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>moonbit</category>
      <category>python</category>
    </item>
    <item>
      <title>AAoM-03: A WHATWG HTML5 Parser, Driven by Tests</title>
      <dc:creator>Milky</dc:creator>
      <pubDate>Wed, 21 Jan 2026 08:48:42 +0000</pubDate>
      <link>https://forem.com/milky2018/aaom-03-a-whatwg-html5-parser-driven-by-tests-40o7</link>
      <guid>https://forem.com/milky2018/aaom-03-a-whatwg-html5-parser-driven-by-tests-40o7</guid>
      <description>&lt;p&gt;&lt;em&gt;December 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post covers &lt;code&gt;aaom-html&lt;/code&gt;: a WHATWG HTML5 parser implemented in MoonBit. As with the previous entries in this series, I treat the AI as a high-intensity pair programmer: I provide the goal, constraints, and review; Claude does the rapid implementation and iteration. The key was not "write a tokenizer and a tree builder first", but &lt;strong&gt;build the test harness first&lt;/strong&gt;—after that, progress becomes a mostly automated grind of reducing failures to zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skill
&lt;/h2&gt;

&lt;p&gt;This time, I used a moonbit-library-builder skill distilled from my experience building parsers with agents in the last two &lt;strong&gt;AAoM&lt;/strong&gt; posts. Its header looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;moonbit-library-builder&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;libraries&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;using&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spec-driven&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test-driven&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;development.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;asked&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;implement&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;library,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;library&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;another&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(JS,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rust,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Go,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span 
class="s"&gt;Python),&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;create&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;parser/compiler,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;substantial&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;package.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Triggers&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;requests&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;implement&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;X&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;library&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="nv"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;create&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;parser&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;template&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;engine&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;."&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The section titles look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Workflow&lt;/span&gt;

&lt;span class="gu"&gt;### Phase 1: Gather Specs&lt;/span&gt;

&lt;span class="gu"&gt;### Phase 2: Write Tests First&lt;/span&gt;

&lt;span class="gu"&gt;### Phase 3: Implement Incrementally&lt;/span&gt;

&lt;span class="gu"&gt;## Testing Patterns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also annotate &lt;strong&gt;Common Pitfalls&lt;/strong&gt; in parsers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Common Pitfalls&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**String indexing**&lt;/span&gt;: &lt;span class="sb"&gt;`s[i]`&lt;/span&gt; returns &lt;span class="sb"&gt;`UInt16`&lt;/span&gt;, not &lt;span class="sb"&gt;`Char`&lt;/span&gt;. Use &lt;span class="sb"&gt;`for char in str`&lt;/span&gt; for char access
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Mutable fields**&lt;/span&gt;: Use &lt;span class="sb"&gt;`mut`&lt;/span&gt; keyword in struct field declarations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Approach
&lt;/h2&gt;

&lt;p&gt;HTML5 is not hard because of syntax. It’s hard because of browser-grade error recovery: insertion modes, the adoption agency algorithm, foster parenting, foreign content (SVG/MathML), and a lot of stateful edge cases. It has no context-free grammar. Instead, it needs an extremely complicated state machine to tokenize and parse. Without a canonical test suite, you end up guessing.&lt;/p&gt;

&lt;p&gt;My initial command was simple: "Implement a HTML5 spec‑compliant parser with WHATWG state machine, malformed input recovery". This command prompted Claude to make a plan for test suite generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dvu6zn5h36don6b6g7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dvu6zn5h36don6b6g7s.png" alt=" " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, the number of tests was huge (~8k) and the test-gen script ran out of memory. Claude split the tests into batches of up to 500 tests each. I knew there were better ways to overcome this problem, but since Claude had already solved it, I didn't ask him to make any further changes. After all, the scripts are not vital to this project. &lt;/p&gt;
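
&lt;p&gt;The batching workaround amounts to slicing the test list into fixed-size chunks so that only one chunk is rendered at a time. A minimal Python sketch, with a made-up function name, a stand-in test list, and the batch size taken from the text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def batches(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


tests = [f"test_{i}" for i in range(1201)]  # stand-in for parsed test cases
print([len(chunk) for chunk in batches(tests)])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every chunk is full except possibly the last one, so each generated file stays bounded in size regardless of how many upstream tests exist.&lt;/p&gt;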

&lt;p&gt;As a result, 14 tokenizer test files and 4 tree test files were generated. Since each test in &lt;code&gt;html5lib-tests&lt;/code&gt; already includes expected output, e.g. for tokenizing,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"PLAINTEXT content model flag"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"initialStates"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"PLAINTEXT state"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"lastStartTag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"plaintext"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;head&amp;gt;&amp;amp;body;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:[[&lt;/span&gt;&lt;span class="s2"&gt;"Character"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;head&amp;gt;&amp;amp;body;"&lt;/span&gt;&lt;span class="p"&gt;]]}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and for trees:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;#data&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;a&amp;gt;&amp;lt;p&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#errors&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;):&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expected-doctype-but-got-start-tag&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="err"&gt;):&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;adoption-agency&lt;/span&gt;&lt;span class="mf"&gt;-1.3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#document&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;a&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;a&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we don’t re-run a reference parser to compute expected trees—the &lt;code&gt;.dat&lt;/code&gt; files already contain canonical expected output. The job is to make our &lt;code&gt;doc.dump()&lt;/code&gt; stable and compatible with those expectations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// for tokenizing
test "html5lib/tokenizer/namedEntities_bad_named_entity_hat_without_a_semicolon_492" {
  let (tokens, _) = @html.tokenize("&amp;amp;Hat")
  inspect(
    tokens,
    content="[Character('&amp;amp;'), Character('H'), Character('a'), Character('t'), EOF]",
  )
}

// for trees
test "html5lib/tree/adoption01_2" {
  let doc = @html.parse("&amp;lt;a&amp;gt;1&amp;lt;button&amp;gt;2&amp;lt;/a&amp;gt;3&amp;lt;/button&amp;gt;")
  inspect(
    doc.dump(),
    content=(
      #|&amp;lt;html&amp;gt;
      #|  &amp;lt;head&amp;gt;
      #|  &amp;lt;body&amp;gt;
      #|    &amp;lt;a&amp;gt;
      #|      "1"
      #|    &amp;lt;button&amp;gt;
      #|      &amp;lt;a&amp;gt;
      #|        "2"
      #|      "3"
    ),
  )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Meanwhile, Claude also wrote a Python script to generate entities. It was amazing! Let me briefly explain this. HTML5 defines 2231 &lt;strong&gt;Named Character References&lt;/strong&gt; such as &lt;code&gt;&amp;amp;amp; &amp;amp;nbsp; &amp;amp;NotEqualTilde;&lt;/code&gt;. Claude downloaded the official specification from &lt;a href="https://html.spec.whatwg.org/entities.json" rel="noopener noreferrer"&gt;https://html.spec.whatwg.org/entities.json&lt;/a&gt; and then transformed them into &lt;code&gt;entities.mbt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn init_entities() -&amp;gt; Map[String, Array[Int]] {
  let m : Map[String, Array[Int]] = Map::new()
  m["AElig"] = [198]
  m["AElig;"] = [198]
  ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
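
&lt;p&gt;The conversion script is not included in this post. A plausible Python sketch follows, assuming the published layout of &lt;code&gt;entities.json&lt;/code&gt; (keys keep the leading ampersand and each value carries a &lt;code&gt;codepoints&lt;/code&gt; list); the function name and the inline sample are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def render_entities(raw_json):
    """Turn entities.json text into MoonBit map-assignment lines."""
    table = json.loads(raw_json)
    lines = []
    for key in sorted(table):
        name = key[1:]  # drop the leading ampersand, keep any trailing ';'
        codepoints = ", ".join(str(cp) for cp in table[key]["codepoints"])
        lines.append(f'  m["{name}"] = [{codepoints}]')
    return "\n".join(lines)


# Two-entry sample; \u0026 is the JSON escape for the ampersand.
sample = r'{"\u0026AElig;": {"codepoints": [198]}, "\u0026AElig": {"codepoints": [198]}}'
print(render_entities(sample))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping both the semicolon and non-semicolon spellings as separate keys mirrors the upstream file, which lists legacy entities that are recognized even without the trailing &lt;code&gt;;&lt;/code&gt;.&lt;/p&gt;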



&lt;p&gt;Generating tests is just the start. The time sink is making output stable so the suite is usable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove the &lt;code&gt;|&lt;/code&gt; prefix from tree dumps (cleaner snapshots)&lt;/li&gt;
&lt;li&gt;Stable attribute sorting (lexicographic)&lt;/li&gt;
&lt;li&gt;Escaping control characters and C1 controls&lt;/li&gt;
&lt;li&gt;Matching MoonBit &lt;code&gt;inspect&lt;/code&gt; escaping rules (&lt;code&gt;\b&lt;/code&gt;, &lt;code&gt;\u{0X}&lt;/code&gt;, soft hyphen, noncharacters, etc.)&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;#|&lt;/code&gt; multi-line strings for readable expected trees&lt;/li&gt;
&lt;/ul&gt;
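
&lt;p&gt;Two of these steps can be sketched in a few lines of Python. This is a hedged illustration, not the generator's real code: the helper names are hypothetical, and the &lt;code&gt;\u{hh}&lt;/code&gt; escape format is an assumption that only approximates what MoonBit's &lt;code&gt;inspect&lt;/code&gt; prints.&lt;/p&gt;

```python
import unicodedata


def escape_controls(text):
    """Replace C0/C1 control characters (Unicode category Cc) with \\u{hh} escapes."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":
            # Assumed escape format; the real inspect rules have more cases.
            out.append(f"\\u{{{ord(ch):02x}}}")
        else:
            out.append(ch)
    return "".join(out)


def sorted_attributes(attrs):
    """Render attributes one per entry, ordered lexicographically by name."""
    return [f'{name}="{attrs[name]}"' for name in sorted(attrs)]


print(escape_controls("a\x08b\x85c"))
print(sorted_attributes({"href": "x", "class": "y"}))
```

&lt;p&gt;Sorting by name makes the snapshot independent of the order in which the parser happened to store attributes, which is what keeps the expected output stable between runs.&lt;/p&gt;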

&lt;p&gt;I can hardly imagine how painful it would have been to debug this script by myself.&lt;/p&gt;

&lt;p&gt;Once the entity definitions and conformance tests were done, the rest was straightforward. With the command &lt;code&gt;Continue until finishing all 8251 tests&lt;/code&gt;, Claude worked continuously for nearly 6 hours and submitted 23 commits. When I checked his work status again, he had already passed 8244/8251 tests. But sadly, he wasted nearly 3 hours on the final 7 tests without solving a single one. He was going in circles, getting stuck on the same point: writing new code, finding no solution, deleting the code and repeating the same useless attempts. I asked Claude to think deeper, but in vain.&lt;/p&gt;

&lt;p&gt;I suddenly remembered a news story I'd heard recently: GPT-5.2 was better at programming than Opus-4.5. I thought, why not let Codex (GPT-5.2) try to solve these seven edge cases? So I launched Codex, selected model GPT-5.2, and chose the extra high reasoning level. It must be said that he thought very slowly. But after about ten minutes, he came up with a very clear solution and solved the problem in another ten minutes. The value wasn’t "more code faster", but a cleaner causal chain through the state machine (stack traces -&amp;gt; tokenizer/tree builder trigger -&amp;gt; specific transition), which quickly led to a fix. That intelligence surprised me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;8251 tests (including conformance tests and smoke tests) passed. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full WHATWG HTML5 specification compliance&lt;/li&gt;
&lt;li&gt;80 tokenizer states&lt;/li&gt;
&lt;li&gt;25 tree construction insertion modes&lt;/li&gt;
&lt;li&gt;49 parse error types with graceful recovery&lt;/li&gt;
&lt;li&gt;2,231 named character references&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reflections
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The last 1% is reasoning-heavy.&lt;/strong&gt; What surprised me most was GPT-5.2's ability to handle difficult problems. After completing this &lt;strong&gt;AAoM&lt;/strong&gt;, I used GPT-5.2 to assist with other projects originally developed with Claude. Unlike Claude, Codex did not show the forgetfulness I mentioned before. However, Codex's user experience, speed, and compliance are far inferior to Claude's. The best practice I've found so far is to have Claude do the actual work and have Codex review it at the same time, which works very well.&lt;/p&gt;

&lt;p&gt;Time investment: ~7 hours of active development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 minutes: Elaborate the moonbit-library-builder skill. &lt;/li&gt;
&lt;li&gt;~3 hours: Without human intervention, download html5lib-tests, write test-gen scripts, implement the basic features and pass most tests.&lt;/li&gt;
&lt;li&gt;~3 hours: Try to handle the remaining 7 edge cases, without success.&lt;/li&gt;
&lt;li&gt;~0.5 hour: GPT-5.2 solves the remaining 7 edge cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code is available on &lt;a href="https://github.com/moonbit-community/html5-mbt" rel="noopener noreferrer"&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I haven't decided what AAoM Day 4 will be yet, but I’m increasingly convinced: for spec-heavy projects, the best use of AI isn't "write everything", it’s "make it converge inside a strong testing loop". Maybe I will try something different like interpreters for a subset of ECMAScript 2025 or Python3.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>moonbit</category>
    </item>
    <item>
      <title>AAoM-02: XML Parser with W3C Conformance</title>
      <dc:creator>Milky</dc:creator>
      <pubDate>Wed, 14 Jan 2026 02:32:50 +0000</pubDate>
      <link>https://forem.com/milky2018/aaom-02-xml-parser-with-w3c-conformance-5a2f</link>
      <guid>https://forem.com/milky2018/aaom-02-xml-parser-with-w3c-conformance-5a2f</guid>
      <description>&lt;p&gt;&lt;em&gt;December 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Continuing the Agentic Adventures of MoonBit series, this time I tackle XML parsing. The goal: build a streaming XML parser that passes the official W3C XML Conformance Test Suite. &lt;/p&gt;

&lt;h2&gt;
  
  
  Skill
&lt;/h2&gt;

&lt;p&gt;I'm still using Claude Code (Opus 4.5) with the &lt;a href="https://github.com/moonbitlang/system-prompt" rel="noopener noreferrer"&gt;MoonBit system prompt and IDE skill&lt;/a&gt;. I also created a new skill named &lt;code&gt;moonbit-lang&lt;/code&gt; to make the AI aware of the best practices and common pitfalls of the MoonBit language. The header looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;moonbit-lang&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reference&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coding&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;conventions.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;writing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;asking&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;about&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;syntax,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;encountering&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MoonBit-specific&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;errors.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Covers&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;handling,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;FFI,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;async,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; 
&lt;/span&gt;&lt;span class="s"&gt;common&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pitfalls."&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;# MoonBit Language Reference&lt;/span&gt;

&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/fundamentals.md&lt;/span&gt;
&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/error-handling.md&lt;/span&gt;
&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/ffi.md&lt;/span&gt;
&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/async-experimental.md&lt;/span&gt;
&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/package.md&lt;/span&gt;
&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="s"&gt;reference/toml-parser-parser.mbt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this skill doc, I also mention the official file I/O package &lt;code&gt;moonbitlang/x/fs&lt;/code&gt; which AI is not familiar with. The complete skill doc and references can be accessed at &lt;a href="https://github.com/moonbitlang/skills" rel="noopener noreferrer"&gt;github&lt;/a&gt;, where I will continuously update the skills I use.&lt;/p&gt;

&lt;p&gt;AI (both Codex and Claude) will only read the description at startup, and then read the rest when needed. Even so, I keep the skill doc simple because based on my experience, any document with excessively long content will hinder the AI's ability to understand the details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;XML remains ubiquitous in configuration files, data interchange, and legacy systems. A conformant XML parser must handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Element tags, attributes, and namespaces&lt;/li&gt;
&lt;li&gt;Entity references (&lt;code&gt;&amp;amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;amp;amp;&lt;/code&gt;, custom entities)&lt;/li&gt;
&lt;li&gt;CDATA sections and comments&lt;/li&gt;
&lt;li&gt;Processing instructions and XML declarations&lt;/li&gt;
&lt;li&gt;DTD internal subsets with entity declarations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My goal is to implement XML 1.0 with namespaces, entities and DTD. The challenge is that XML has many edge cases specified in the W3C standard. Rather than guessing what's correct, I use the official test suite as ground truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tests Generation
&lt;/h2&gt;

&lt;p&gt;First, I download (effectively, let Claude download) the official W3C XML Conformance Test Suite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; xmlts.tar.gz &lt;span class="s2"&gt;"https://www.w3.org/XML/Test/xmlts20130923.tar.gz"&lt;/span&gt;
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzf&lt;/span&gt; xmlts.tar.gz &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;mv &lt;/span&gt;xmlconf &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;xmlts.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It contains lots of valid and not-well-formed XML documents. I had Claude write a script, &lt;code&gt;generate_conformance_tests.py&lt;/code&gt;, that generates MoonBit snapshot tests from the official test suite.&lt;/p&gt;

&lt;p&gt;How to get the expected snapshot contents? Initially, I had Claude use quick-xml (Rust) as the reference parser. This worked for most tests, but quick-xml is intentionally lenient in some cases where strict XML compliance requires rejection. After hitting 23 test failures due to leniency differences, I switched to libxml2 (via Python's lxml) as the reference. libxml2 is the de-facto standard XML parser and matches W3C conformance closely. &lt;/p&gt;

&lt;p&gt;Finally, the generated tests looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// valid&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="s"&gt;"w3c/valid/valid_sa_001"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Test demonstrates an Element Type Declaration with Mixed ...&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;!DOCTYPE doc [&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;!ELEMENT doc (#PCDATA)&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;]&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;doc&amp;gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="nf"&gt;.read_event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;Eof&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Eof&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;to_libxml_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"[DocType(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;doc&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;), Empty({name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;doc&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;, attributes: []}), Eof]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// not well formed&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="s"&gt;"w3c/not-wf/not_wf_sa_001"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Attribute values must start with attribute names, not "?".&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;doc&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;a&amp;lt;/a&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;has_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="nf"&gt;.read_event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;noraise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;Eof&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
      &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;has_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A total of 735 tests were generated, comprising 14k lines of code. Including other tests manually written by Claude afterwards, the total number of tests is 800.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parser Implementation
&lt;/h2&gt;

&lt;p&gt;Since quick-xml was the initial implementation reference, Claude followed a pull-parser architecture inspired by quick-xml, which I thought was OK for our goal. The APIs look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="py"&gt;.Reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="nf"&gt;.read_event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Eof&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="nf"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Start: &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;{elem.name}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;End&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"End: &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;{name}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Text: &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;{content}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since lxml returns a tree structure while our parser emits events, I had Claude implement a &lt;code&gt;to_libxml_format&lt;/code&gt; function that transforms our event stream to match lxml's output format exactly. This made test comparison straightforward.&lt;/p&gt;
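
&lt;p&gt;To give a feel for what such a normalization involves, here is a minimal Python sketch. It is an assumption about the kind of rewriting needed, not the actual &lt;code&gt;to_libxml_format&lt;/code&gt;: events are modeled as hypothetical &lt;code&gt;(kind, value)&lt;/code&gt; tuples, a start tag directly followed by its end tag collapses into &lt;code&gt;Empty&lt;/code&gt; (a tree cannot tell the two spellings apart), and adjacent text events are merged.&lt;/p&gt;

```python
def normalize(events):
    """Rewrite an event list so it matches what a tree-based parser can report."""
    out = []
    for kind, value in events:
        if kind == "End" and out and out[-1] == ("Start", value):
            # A start tag immediately closed: a tree only sees an empty element.
            out[-1] = ("Empty", value)
        elif kind == "Text" and out and out[-1][0] == "Text":
            # A tree stores one text node, however the stream was chunked.
            out[-1] = ("Text", out[-1][1] + value)
        else:
            out.append((kind, value))
    return out


print(normalize([("Start", "doc"), ("End", "doc"), ("Eof", None)]))
print(normalize([("Start", "a"), ("Text", "x"), ("Text", "y"), ("End", "a")]))
```

&lt;p&gt;The first call mirrors the &lt;code&gt;valid_sa_001&lt;/code&gt; snapshot above, where an element with separate start and end tags is expected as &lt;code&gt;Empty&lt;/code&gt;.&lt;/p&gt;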

&lt;p&gt;It took about 4 hours without human intervention (except &lt;code&gt;Please continue&lt;/code&gt;) to accomplish the basic parts. The most complex feature was DTD (Document Type Definition) parsing and validation. I used Claude's plan mode to structure the implementation. Here is the plan summary:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffczuv4ykgl5utntkw31b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffczuv4ykgl5utntkw31b.png" alt=" " width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After about 1 hour, DTD was implemented and 726 tests passed. But it took 3 more hours to handle edge cases including &lt;strong&gt;entity value expansion&lt;/strong&gt;, &lt;strong&gt;text splitting details&lt;/strong&gt; and &lt;strong&gt;UTF-8 BOM handling&lt;/strong&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;In the end, 800 W3C conformance tests passed. Note that 59 tests were skipped by the test-gen script: some of them were valid but rejected by lxml, while the others were not-well-formed but accepted by lxml. The script classified these tests as "lxml implementation quirks". Since these edge cases were overly complicated, I didn't carefully check whether they were really caused by "lxml implementation quirks". The remaining 800 tests were sufficient anyway.&lt;/p&gt;

&lt;p&gt;So this library supports: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML 1.0 + Namespaces 1.0&lt;/li&gt;
&lt;li&gt;Pull-parser API for memory-efficient streaming&lt;/li&gt;
&lt;li&gt;Writer API for XML generation&lt;/li&gt;
&lt;li&gt;DTD support with entity expansion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reflections
&lt;/h2&gt;

&lt;p&gt;What Worked Well? &lt;strong&gt;Using an official test suite&lt;/strong&gt; was invaluable. The W3C conformance tests cover edge cases I would never have thought to test manually—obscure character references, DTD quirks, namespace handling, and more. &lt;strong&gt;Switching reference implementation&lt;/strong&gt; when needed. quick-xml's leniency was a feature for its users but a problem for conformance testing. libxml2 provided the strict reference I needed. &lt;strong&gt;Plan mode for complex features&lt;/strong&gt; like DTD parsing kept Claude organized. Without it, Claude would jump between fixing different issues without completing any.&lt;/p&gt;

&lt;p&gt;The main problem I met was that Claude was prone to modifying tests instead of fixing bugs. That was a recurring issue. When tests failed, Claude would often:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Modify test expectations to match incorrect output&lt;/li&gt;
&lt;li&gt;Update the test generator to skip failing tests&lt;/li&gt;
&lt;li&gt;Suggest marking tests as "lenient" and skip them rather than fixing the parser&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had to repeatedly redirect: "Update the MoonBit implementation, not the tests."&lt;/p&gt;

&lt;p&gt;Moreover, &lt;strong&gt;forgetting project conventions&lt;/strong&gt; was common. Claude would forget to use the &lt;code&gt;moon-ide&lt;/code&gt; skill for code navigation, or use anti-patterns like &lt;code&gt;match (try? expr)&lt;/code&gt; instead of &lt;code&gt;try/catch/noraise&lt;/code&gt;. Adding these to CLAUDE.md helped but didn't eliminate the issue. I searched for this issue in the community (&lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1oetd0h/claude_will_not_follow_instructions_in_skills/" rel="noopener noreferrer"&gt;reddit link&lt;/a&gt;) and found that this might be a bug in &lt;code&gt;Opus 4.5&lt;/code&gt; and &lt;code&gt;Sonnet 4.5&lt;/code&gt;. I hope it will be fixed in the near future.&lt;/p&gt;

&lt;p&gt;In my future work, I may need to implement or port a large number of parsers. I think I need to turn my experience in writing these parsers and creating test generation scripts based on standards into reusable skills or commands. Perhaps we will see the benefits next time.&lt;/p&gt;

&lt;p&gt;Time investment: 10+ hours of active development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 hours: Collaborative exploration of how to write the expected test generation script. &lt;/li&gt;
&lt;li&gt;4 hours: Without human intervention, implement the basic features.&lt;/li&gt;
&lt;li&gt;1 hour: Plan and implement DTD, namespaces and entities.&lt;/li&gt;
&lt;li&gt;3 hours: Handle edge cases (fix 17 test failures)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code is available on &lt;a href="https://github.com/moonbit-community/xml-mbt" rel="noopener noreferrer"&gt;github&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>moonbit</category>
      <category>aaom</category>
    </item>
    <item>
      <title>AAoM-01: Pug Template Engine</title>
      <dc:creator>Milky</dc:creator>
      <pubDate>Wed, 07 Jan 2026 06:33:59 +0000</pubDate>
      <link>https://forem.com/milky2018/aaom-01-pug-template-engine-11ka</link>
      <guid>https://forem.com/milky2018/aaom-01-pug-template-engine-11ka</guid>
      <description>&lt;p&gt;&lt;em&gt;December 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's kick off the Agentic Adventures of MoonBit series with a complete Pug template engine. I'm using Claude Code (Opus 4.5) with the &lt;a href="https://github.com/moonbitlang/system-prompt" rel="noopener noreferrer"&gt;MoonBit system prompt and IDE skill&lt;/a&gt;. The workflow is simple: I describe what I want, Claude writes the code, and I review and refine.&lt;/p&gt;

&lt;p&gt;Claude excels at bootstrapping MoonBit projects with &lt;code&gt;moon new&lt;/code&gt;, understanding the package structure (&lt;code&gt;moon.mod.json&lt;/code&gt;, &lt;code&gt;moon.pkg.json&lt;/code&gt;), and generating idiomatic code after a few iterations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;HTML is verbose. Writing nested structures by hand is tedious and error-prone. &lt;a href="https://pugjs.org/" rel="noopener noreferrer"&gt;Pug&lt;/a&gt; (formerly Jade) solves this with a clean, indentation-based syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;doctype html
html
  head
    title My Site
  body
    h1#greeting.hero Hello, World!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal: implement a Pug-to-HTML compiler in MoonBit that supports the full specification—tags, attributes, interpolation, conditionals, loops, mixins, includes, and extends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach
&lt;/h2&gt;

&lt;p&gt;I had Claude read all specifications from pugjs.org and write corresponding tests first.&lt;/p&gt;

&lt;p&gt;The tests looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test "case with fall through" {
  let pug =
    #|case num
    #|  when 0
    #|  when 1
    #|  when 2
    #|    p Small number
    #|  default
    #|    p Large number
  let locals = Locals::new()
  locals.set("num", "1")
  let html = render_with_locals(pug, locals)
  inspect(html, content="&amp;lt;p&amp;gt;Small number&amp;lt;/p&amp;gt;")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There were 153 tests spread across 18 blackbox test files. This test-driven approach caught edge cases early. &lt;/p&gt;

&lt;p&gt;When Claude started to implement the library, I reviewed the tests. The implementation follows a classic compiler architecture: &lt;code&gt;lexer.mbt&lt;/code&gt;, &lt;code&gt;parser.mbt&lt;/code&gt; and &lt;code&gt;render.mbt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After the core features worked, Claude stopped and told me that features like includes/extends remained unimplemented because it did not know how to perform file system operations. I introduced &lt;code&gt;@moonbitlang/x/fs&lt;/code&gt; for file system access. Claude's first API was awkward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let registry = TemplateRegistry::new()
registry.register("includes/head.pug", @fs.read_file_to_string("example/includes/head.pug"))
registry.register("includes/foot.pug", @fs.read_file_to_string("example/includes/foot.pug"))
let html = render_with_registry(source, Locals::new(), registry)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I pointed this out and asked for improvement. The result was much cleaner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let html = render_file("example/index.pug")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;render_file&lt;/code&gt; function now automatically discovers and loads dependencies, resolving paths relative to the input file.&lt;/p&gt;
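
&lt;p&gt;The discovery step can be pictured with a short Python sketch. This is only an illustration under my own assumptions, not how the MoonBit library is written: &lt;code&gt;collect_dependencies&lt;/code&gt; and the in-memory &lt;code&gt;files&lt;/code&gt; map are hypothetical, and the sketch ignores details such as targets written without a file extension.&lt;/p&gt;

```python
import posixpath
import re

# Matches lines such as "include includes/head.pug" or "extends layout.pug".
DIRECTIVE = re.compile(r"^\s*(?:include|extends)\s+(\S+)", re.MULTILINE)


def collect_dependencies(entry, read):
    """Return the sorted paths of every template reachable from `entry`."""
    seen = set()
    stack = [entry]
    while stack:
        path = stack.pop()
        if path in seen:
            continue  # already loaded; also protects against include cycles
        seen.add(path)
        base = posixpath.dirname(path)  # targets are relative to the including file
        for target in DIRECTIVE.findall(read(path)):
            stack.append(posixpath.normpath(posixpath.join(base, target)))
    return sorted(seen)


# In-memory stand-in for the file system, so the sketch runs anywhere.
files = {
    "example/index.pug": "extends layout.pug\nblock content\n  include includes/head.pug\n",
    "example/layout.pug": "html\n  block content\n",
    "example/includes/head.pug": "title My Site\n",
}

print(collect_dependencies("example/index.pug", files.__getitem__))
```

&lt;p&gt;Passing the reader as a function keeps the traversal independent of where templates live, so the same loop works with a real file reader or with a test fixture like the one above.&lt;/p&gt;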

&lt;p&gt;The hardest challenge is JS expression interpolation like &lt;code&gt;#{msg.toUpperCase()}&lt;/code&gt;. To support this, the template engine needs to evaluate JS expressions. On the JS backend we can use JS FFI directly. But I also want this library to be usable on other backends (MoonBit compiles to multiple backends including WebAssembly, native and JavaScript). While Claude was trying to implement a comprehensive JS interpreter, I interrupted him and told him to use &lt;code&gt;extern "js" fn eval(expr : String) -&amp;gt; String = "(expr) =&amp;gt; eval(expr)"&lt;/code&gt; (not exactly this, but roughly the idea) for the JS backend and to abort with an &lt;code&gt;only supported on the JS backend&lt;/code&gt; message on non-JS backends.&lt;/p&gt;

&lt;p&gt;The final task was to implement a command-line interface. Claude did the design and implementation, and I did not give any suggestions on this task. I ran the CLI with different inputs quite a few times and made sure the output made sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;A fully functional Pug template engine with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tags, IDs, classes, and attributes&lt;/li&gt;
&lt;li&gt;Nested elements via indentation&lt;/li&gt;
&lt;li&gt;String interpolation (&lt;code&gt;#{}&lt;/code&gt; and &lt;code&gt;!{}&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Conditionals (&lt;code&gt;if&lt;/code&gt;, &lt;code&gt;else if&lt;/code&gt;, &lt;code&gt;else&lt;/code&gt;, &lt;code&gt;unless&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Iteration (&lt;code&gt;each&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Mixins with parameters and blocks&lt;/li&gt;
&lt;li&gt;Template inheritance (&lt;code&gt;include&lt;/code&gt;, &lt;code&gt;extends&lt;/code&gt;, &lt;code&gt;block&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;CLI with JSON locals and directory processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All tests pass: 140 general tests and 13 js-only tests. &lt;/p&gt;

&lt;p&gt;CLI example session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "World"}'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; data.json
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'h1 Hello #{name}!'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; greeting.pug
&lt;span class="nv"&gt;$ &lt;/span&gt;moon run cmd/main &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-O&lt;/span&gt; data.json greeting.pug
&amp;lt;h1&amp;gt;Hello World!&amp;lt;/h1&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reusable templates with the compile API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test "compile api" {
  // Compile template once
  let template = @pug.compile("p Hello #{name}!")

  // Render with different locals
  let locals1 = @pug.Locals::new()
  locals1.set("name", "Alice")
  inspect(template.render(locals1), content="&amp;lt;p&amp;gt;Hello Alice!&amp;lt;/p&amp;gt;")
  let locals2 = @pug.Locals::new()
  locals2.set("name", "Bob")
  inspect(template.render(locals2), content="&amp;lt;p&amp;gt;Hello Bob!&amp;lt;/p&amp;gt;")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is now clear and convenient to use, much like the official Pug implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reflections
&lt;/h2&gt;

&lt;p&gt;Test-driven development works very well. Having Claude read pugjs.org docs and write tests first caught issues early. MoonBit's pattern matching makes AST processing clean and exhaustive. &lt;/p&gt;

&lt;p&gt;The challenges lie in MoonBit features and conventions that Claude is not familiar with. For example, Claude does not know the conventional way to access the file system and is prone to writing anti-patterns like &lt;code&gt;match (try? expr) { Ok(_) =&amp;gt; ...; Err(_) =&amp;gt; ...}&lt;/code&gt;, which may be due to the Opus model's unfamiliarity with the latest MoonBit syntax and best practices. I have added this information to my commonly used &lt;code&gt;moonbit-lang&lt;/code&gt; SKILL, and its effectiveness will be tested in subsequent AAoM entries.&lt;/p&gt;

&lt;p&gt;A practical way I have found to improve the project is to ask Claude, "What Pug features are still missing?" Claude then explores the whole library, lists the features that may be missing, writes a todo list, and implements them. I only need to do a final check that the tests match the examples on the official Pug website.&lt;/p&gt;

&lt;p&gt;The whole process took roughly 6 hours of active development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;several minutes to generate the tests,&lt;/li&gt;
&lt;li&gt;nearly 3 hours for core features,&lt;/li&gt;
&lt;li&gt;1 hour for include/extends,&lt;/li&gt;
&lt;li&gt;2 hours for JS interpolation,&lt;/li&gt;
&lt;li&gt;several minutes for the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code is available on &lt;a href="https://github.com/moonbit-community/pug-mbt" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>pug</category>
      <category>moonbit</category>
    </item>
    <item>
      <title>Introduction to Agentic Adventures of MoonBit</title>
      <dc:creator>Milky</dc:creator>
      <pubDate>Wed, 07 Jan 2026 06:30:49 +0000</pubDate>
      <link>https://forem.com/milky2018/introduction-to-agentic-adventures-of-moonbit-1pgi</link>
      <guid>https://forem.com/milky2018/introduction-to-agentic-adventures-of-moonbit-1pgi</guid>
      <description>&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;Inspired by &lt;a href="https://anil.recoil.org/notes/aoah-2025" rel="noopener noreferrer"&gt;Anil Madhavapeddy's Agentic Adventures&lt;/a&gt;, this project explores what it's like to build practical libraries in MoonBit with AI assistance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.moonbitlang.com" rel="noopener noreferrer"&gt;MoonBit&lt;/a&gt; is a modern programming language designed for cloud computing and AI coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goals
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build useful libraries&lt;/strong&gt; - Each entry will document the creation of a practical MoonBit library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore AI-assisted workflows&lt;/strong&gt; - Document how agentic programming integrates with MoonBit's toolchain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share learnings&lt;/strong&gt; - Be transparent about what works, what doesn't, and lessons learned.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Each blog post will document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The problem being solved&lt;/li&gt;
&lt;li&gt;The specification and design process&lt;/li&gt;
&lt;li&gt;Interactions with the coding agent (such as Claude Code) during development&lt;/li&gt;
&lt;li&gt;Code review notes and refinements&lt;/li&gt;
&lt;li&gt;Final library usage and examples&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Contents:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/milky2018/aaom-01-pug-template-engine-11ka"&gt;AAoM-01: Pug Template Engine&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/milky2018/aaom-02-xml-parser-with-w3c-conformance-5a2f"&gt;AAoM-02: XML parser&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/milky2018/aaom-03-a-whatwg-html5-parser-driven-by-tests-40o7"&gt;AAoM-03: HTML5 parser&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/milky2018/aaom-04-a-python-312-interpreter-24fl"&gt;AAoM-04: Python interpreter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>moonbit</category>
    </item>
  </channel>
</rss>
