Forem: Hagicode

OpenCode Integration Practice: Architectural Evolution from Standalone Process to Shared Runtime

Hagicode — Mon, 11 May 2026 03:54:48 +0000

This article walks through how HagiCode integrated the OpenCode AI assistant, including the key design decisions made as the architecture evolved, the pitfalls we hit, and the solutions we settled on.

Background

OpenCode is an open-source AI coding assistant project hosted on GitHub. For a monorepo project like HagiCode, integrating OpenCode as a supported AI Provider means it can be used as a backend model for proposal generation, code editing, and workflow execution.

However, the integration did not go as smoothly as we imagined. Early on there were two separate proposals: one planned a C# SDK and was later abandoned (not really a loss), while the other, for repository-level integration, survived. Once OpenCode entered the formal session pipeline, we ran into a series of issues around session management and error recovery. What must come will come, after all.

More troublesome, the initial "standalone process per session" design showed high resource overhead in real operation, which forced a refactor to a "system-level shared runtime" model. We also fell into the 400 BadRequest pit: reusing an external endpoint that lacked the necessary context caused requests to fail. It's all tears, really.

This article organizes these pitfalls and design decisions as a reference for projects that need to integrate OpenCode in the future. After all, beautiful things or people don't necessarily need to be possessed. As long as she remains beautiful, just watching her beauty quietly is enough... Technical sharing is the same.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI-based code assistant project. During development we needed to integrate multiple AI Providers, and OpenCode is one of them. The architectural evolution described below is drawn entirely from real project experience: stepping into pits and then optimizing our way out. We had no choice but to fill the pits we stepped in.

Technical Architecture

Overall Layered Design

HagiCode's OpenCode integration architecture is divided into five layers, each with clear responsibilities:

1. Repository Integration Layer

The OpenCode repository is registered through the MonoSpecs configuration system (.hagicode/monospecs.yaml). There was a choice to make here: submodule or plain Git repository? We chose the latter, with cloning and synchronization handled by a unified scripts/clone-repos.mjs script. This is more flexible and avoids the permission and collaboration issues that submodules bring. After all, no one wants to see that error screen.

2. Provider Layer

OpenCodeCliProvider implements the IAIProvider interface, the standard abstraction layer for talking to external AI services. The initial proposal called for a "standalone process per session," but real operation showed the resource overhead was too high, so we ultimately switched to a shared runtime mode in which OpenCodeRuntimeCoordinator manages the lifecycle of a system-level runtime. It's nothing, really: the idea was beautiful, reality is cruel.

3. Runtime Management Layer

OpenCodeRuntimeCoordinator is the core of the entire architecture, responsible for runtime startup, health checks, and rebuilding the runtime when it becomes invalid. It uses HagiCode.Libs.Providers.OpenCode as its HTTP client foundation and encapsulates all interactions with the OpenCode runtime. Like that winter night, when the bamboo outside the window was the same as the day before and she still liked looking out the window, the runtime also needs someone to silently guard it.

4. Session Persistence Layer

A SQLite database (opencode-session-bindings-v2.db) persists the mapping from CessionId to OpenCode SessionId. This design is critical: it supports session recovery and restart and avoids creating a new session on every request. Sometimes forgetting is better, but in the program world, having no memory really doesn't work.

5. Error Recovery Layer

ProviderErrorAutoRetryCoordinator provides the automatic retry mechanism and works with OpenCodeRetryableTerminalFailureClassifier to classify errors: which ones can be retried and which should fail immediately. This layer greatly improves system robustness. It's nothing much, really, just letting the system behave like a person: fall down and get up again.

Key Data Flow

When an AI request comes in, the data flow goes like this:

The request first reaches OpenCodeCliProvider.
The provider requests a runtime from OpenCodeRuntimeCoordinator.
The coordinator checks whether a runtime is available and starts a new one if not.
The session binding is looked up or created by CessionId.
The bound SessionId is used to call the OpenCode API.
If an error occurs, the error type decides whether to retry.

This process looks simple, but every step has had its pitfalls. Does that mean anything? Perhaps, but we've stepped in them all... We've also made peace with it: stepping in pits is itself part of growth.

Key Design Decisions

From Standalone Process to Shared Runtime

The initial opencode-csharp-sdk proposal adopted a "standalone process per session" model. The idea was beautiful: good isolation, so one crashed process doesn't affect other sessions. But reality is cruel:

High resource overhead: each process has to load its own runtime, so memory usage climbs quickly
Slow startup: processes are created and destroyed frequently, and that overhead can't be ignored
Complex management: process lifecycle management is a troublesome matter in itself

Ultimately we changed to a "system-level shared runtime" model. All sessions reuse the same runtime process and are told apart by session id. This change reduced resource usage by an order of magnitude and significantly improved response speed. It's nothing much, really, just changing "one person enjoying alone" to "everyone using together."
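To make the shared-runtime idea concrete, here is a minimal TypeScript sketch (the real implementation is C# and is not shown in this article). The names Runtime, SharedRuntimeCoordinator, startRuntime, and isHealthy are hypothetical; the sketch only assumes that every session asks one coordinator for a runtime and that an unhealthy runtime is replaced.

```typescript
// Hypothetical shapes for illustration only; not HagiCode's actual types.
interface Runtime {
  id: number;
  isHealthy(): Promise<boolean>;
}

class SharedRuntimeCoordinator {
  // One pending-or-ready runtime shared by every session.
  private current: Promise<Runtime> | null = null;

  constructor(private readonly startRuntime: () => Promise<Runtime>) {}

  async getRuntime(): Promise<Runtime> {
    const pending = this.current;
    if (pending !== null) {
      const runtime = await pending;
      if (await runtime.isHealthy()) return runtime;
      // Only clear the slot if nobody has replaced the runtime in the meantime.
      if (this.current === pending) this.current = null;
    }
    if (this.current === null) this.current = this.startRuntime();
    return this.current;
  }

  // Drop the shared runtime so the next caller starts a fresh one.
  invalidate(): void {
    this.current = null;
  }
}
```

Storing the promise rather than the resolved runtime means concurrent callers that arrive during startup wait on the same start call instead of launching a second process.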

Self-Managed Endpoint vs External BaseUri

Early on we hit a weird 400 BadRequest problem. Investigation showed the cause: we were reusing an external BaseUri that lacked the necessary context information. OpenCode's runtime is stateful, so pointing directly at an external endpoint amounts to losing context, like a person without memory, at a loss.

The solution is simple: maintain a self-managed runtime and don't depend on external endpoints. Leave BaseUri empty in the configuration file and let the system manage the runtime lifecycle itself.

AI:
  OpenCode:
    Enabled: true
    ExecutablePath: "opencode"
    BaseUri: null  # Leave empty, use self-managed runtime
    Model: "anthropic/claude-sonnet-4-20250514"

This configuration change looks inconspicuous, but it solved our biggest headache at the time. Sometimes the answer is right before our eyes; we just took too many detours.

Session Binding Strategy

Session binding is another key design. We use CessionId as the binding key and support three modes:

started: a new session; create a new OpenCode SessionId
resumed: resume an existing session; read the binding from the database
restarted: restart the session; create a new SessionId but preserve history

This design makes session management very flexible: users can resume previous conversations at any time, and the system can automatically rebuild bindings after a runtime restart. Sometimes you want to forget but can't, sometimes you want to remember but can't... Memory in the program world is quite reliable.
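The three modes can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the project's code: the in-memory Map stands in for the SQLite table, createOpenCodeSession is a made-up placeholder, and falling back to a new session when a resumed binding is missing is our assumption. How history is preserved on restart is not modeled here.

```typescript
type SessionMode = "started" | "resumed" | "restarted";

// In-memory stand-in for the binding table (illustrative only).
const sessionBindings = new Map<string, string>();
let sessionCounter = 0;

// Hypothetical placeholder for "ask OpenCode for a new session".
function createOpenCodeSession(): string {
  sessionCounter += 1;
  return `oc-session-${sessionCounter}`;
}

function resolveSessionId(bindingKey: string, mode: SessionMode): string {
  if (mode === "resumed") {
    const existing = sessionBindings.get(bindingKey);
    if (existing !== undefined) return existing;
    // Assumption: with no stored binding, fall through and create one.
  }
  // "started" and "restarted" both bind the key to a brand-new session id.
  const created = createOpenCodeSession();
  sessionBindings.set(bindingKey, created);
  return created;
}
```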

Implementation Plan

1. Repository Integration

repositories:
  - path: "repos/opencode"
    url: "https://github.com/anomalyco/opencode.git"
    displayName: "OpenCode"
    icon: "⌨️"

Then run the clone script:

node scripts/clone-repos.mjs

This pulls the OpenCode source code to the local machine, where it can be updated at any time. Quite simple, really, as long as there are no errors...

2. Provider Configuration

Configure OpenCode provider in appsettings.yml:

AI:
  OpenCode:
    Enabled: true
    ExecutablePath: "opencode"
    BaseUri: null
    Model: "anthropic/claude-sonnet-4-20250514"
    RequestTimeoutSeconds: 300
    StartupTimeoutSeconds: 60

Two key parameters:

RequestTimeoutSeconds: timeout for a single request, 300 seconds (5 minutes) by default. Waiting too long is quite torturous, after all
StartupTimeoutSeconds: runtime startup timeout, a full 60 seconds (1 minute)

3. Provider Restoration

Bring OpenCode back into the AI Provider system:

Restore OpenCodeCli in the AIProviderType enum
Restore the creation logic in AIProviderFactory
Have ExecutorGrainFactory route OpenCodeCli to a dedicated grain

These changes make OpenCode an AI Provider that is treated like any other, not a special case. Everyone is the same, really: nothing special or not special.

4. Runtime Management Code Example

// Get runtime through OpenCodeRuntimeCoordinator
var runtime = await _runtimeCoordinator.GetRuntimeAsync(
    _settings,
    request.WorkingDirectory,
    cancellationToken);

// Create or resume session
var session = await ResolveSessionAsync(runtime, request, cancellationToken);

// Send prompt
var response = await session.Runtime.Client.PromptAsync(
    session.SessionId,
    promptRequest,
    cancellationToken);

This code looks very concise, but a lot of work happens behind it: runtime startup, health checks, and session binding lookup and creation. Like many things, nothing shows on the surface, yet behind it are all the stories.

5. Error Recovery Mechanism

// Detect retryable errors and rebuild runtime
if (ShouldRetryWithFreshRuntime(ex, cancellationToken))
{
    await _runtimeCoordinator.InvalidateAsync(runtime, ...);
    var recoveredRuntime = await ResolveRuntimeAsync(request, cancellationToken);
    // Retry with new runtime
}

The automatic retry mechanism greatly improves system robustness: network jitter and occasional runtime crashes can both be recovered from automatically. Life is the same, really: fall down and get up, nothing big... Programs are much stronger than people.
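The retry loop around that snippet might look something like the TypeScript sketch below. It is only a sketch under assumptions: promptWithRecovery and its callback parameters are invented for illustration, the classifier is passed in as a plain function, and the default of 3 retries mirrors the default mentioned later in this article.

```typescript
// All names here are hypothetical; the callbacks stand in for the coordinator and client.
async function promptWithRecovery<T>(
  getRuntime: () => Promise<T>,
  invalidate: (runtime: T) => Promise<void>,
  send: (runtime: T) => Promise<string>,
  isRetryable: (err: unknown) => boolean,
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    const runtime = await getRuntime();
    try {
      return await send(runtime);
    } catch (err) {
      // Non-retryable errors, or running out of retries, fail immediately.
      if (!isRetryable(err) || attempt >= maxRetries) throw err;
      // Discard the broken runtime so the next attempt gets a fresh one.
      await invalidate(runtime);
    }
  }
}
```

Keeping the classifier as a separate function keeps the "which errors are worth retrying" decision out of the loop itself, which is the same split the article describes between the retry coordinator and the failure classifier.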

Practice Guide

Key Configuration Quick Reference

Configuration	Default	Description
`Enabled`	`true`	Whether to enable OpenCode provider
`ExecutablePath`	`"opencode"`	OpenCode executable path
`BaseUri`	`null`	External endpoint (recommended to leave empty)
`Model`	-	Default model
`RequestTimeoutSeconds`	`300`	Request timeout
`StartupTimeoutSeconds`	`60`	Runtime startup timeout

Session Binding Database Structure

CREATE TABLE IF NOT EXISTS OpenCodeSessionBindings (
    BindingKey TEXT NOT NULL PRIMARY KEY,
    OpenCodeSessionId TEXT NOT NULL,
    CreatedAtUtc TEXT NOT NULL,
    UpdatedAtUtc TEXT NOT NULL
);

Bindings are retained for 30 days and cleaned up automatically after they expire. This design both preserves the ability to recover sessions and avoids unbounded data growth. Everything has an expiration, after all; cleaning up what has expired is also a form of letting go...
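As a rough illustration of the 30-day cleanup, here is a TypeScript sketch that filters binding rows by age. Two things are assumptions on our part: that age is measured from UpdatedAtUtc rather than CreatedAtUtc, and that the TEXT columns hold ISO 8601 timestamps. The function name removeExpiredBindings is made up.

```typescript
// Mirrors the table columns above; field names are adapted to camelCase for the sketch.
interface SessionBinding {
  bindingKey: string;
  openCodeSessionId: string;
  createdAtUtc: string; // assumed ISO 8601 text
  updatedAtUtc: string; // assumed ISO 8601 text
}

const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns only the bindings updated within the retention window.
function removeExpiredBindings(rows: SessionBinding[], now: Date): SessionBinding[] {
  const cutoff = now.getTime() - RETENTION_DAYS * MS_PER_DAY;
  return rows.filter((row) => Date.parse(row.updatedAtUtc) >= cutoff);
}
```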

Common Issues and Solutions

1. 400 BadRequest Error

Check the BaseUri configuration; we recommend leaving it empty to use the self-managed runtime. If you must use an external endpoint, make sure the context is complete. Most of the time, the problem lies in "taking things for granted."

2. Session Cannot Resume

Confirm that CessionId is passed correctly, and check that a corresponding binding record exists in the database. Like searching for a memory, there must be clues.

3. Model Selection Issue

Two formats are supported: provider/model (like anthropic/claude-sonnet-4) and a provider-less format (like claude-sonnet-4). All roads lead to Rome; some are just easier to walk and some a little more winding.
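Splitting the two formats can be sketched in a few lines of TypeScript. ModelRef and parseModelRef are hypothetical names, and treating the first slash as the separator is an assumption; the article does not show how HagiCode parses the value.

```typescript
interface ModelRef {
  provider: string | null; // null when the provider-less format is used
  model: string;
}

function parseModelRef(value: string): ModelRef {
  const slash = value.indexOf("/");
  if (slash === -1) return { provider: null, model: value };
  return { provider: value.slice(0, slash), model: value.slice(slash + 1) };
}
```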

4. Tool Name Mismatch

Tool names are normalized automatically: content after parentheses and colons is removed. For example, read(path) becomes read, so keep this in mind when calling tools. These details aren't much, just easily overlooked.
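One way to read that rule is "cut the name at the first opening parenthesis or colon." The TypeScript sketch below implements that reading; normalizeToolName is a hypothetical name and the exact rule in HagiCode may differ.

```typescript
// Assumed rule: keep everything before the first "(" or ":".
function normalizeToolName(raw: string): string {
  const cut = raw.search(/[(:]/);
  return (cut === -1 ? raw : raw.slice(0, cut)).trim();
}
```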

5. Auto Retry Not Working

Check whether the error classifier correctly identifies retryable errors. By default, network errors, runtime failures, and similar errors are retried automatically up to 3 times. Trying a few more times doesn't hurt, after all; it might just work.

Related Code Paths

Provider: repos/hagicode-core/src/PCode.ClaudeHelper/AI/Providers/OpenCodeCliProvider.cs
Runtime Coordinator: repos/hagicode-core/src/PCode.ClaudeHelper/AI/Providers/OpenCodeRuntimeCoordinator.cs
Configuration: repos/hagicode-core/src/PCode.ClaudeHelper/AI/Configuration/OpenCodeSettings.cs
Proposal Archive: openspec/changes/archive/2026-03-*opencode*/

Summary

HagiCode's integration of OpenCode has been a continuous process of stepping in pits and optimizing. From the initial standalone process mode to a shared runtime, and from reusing external endpoints to a self-managed runtime, every architecture adjustment was driven by actual needs. It's nothing much, really; we just didn't miss any pit that was there to be stepped in.

There are three core lessons:

Resource sharing is important: don't blindly pursue isolation; a shared runtime can significantly reduce resource overhead. Sometimes one person enjoying alone isn't as good as everyone using together
Be careful with state management: stateful services should be self-managed rather than dependent on external endpoints. Your own affairs are best done yourself, after all
Error recovery is essential: an automatic retry mechanism takes system robustness up a level. Fall down and get up, nothing big

This solution now runs stably in HagiCode, supporting session recovery, automatic retry, runtime rebuild, and other functions. If your project also needs to integrate OpenCode, we hope these experiences help you take fewer detours. After all... only after walking the detours do you know where the shortcut is, and sometimes knowing it is of no use.

References

OpenCode GitHub Repository
HagiCode GitHub Repository
HagiCode Official Site: hagicode.com
HagiCode Installation Guide: docs.hagicode.com/installation/docker-compose
HagiCode Desktop: hagicode.com/desktop/
Official Version Demo Video: www.bilibili.com/video/BV1z4oWB3EpY/

Steamworks Multilingual Metadata Management: From Manual Maintenance to Structured Workflow

Hagicode — Sat, 09 May 2026 09:00:07 +0000

The Steam platform requires games to provide store descriptions in 28 languages. Traditional manual maintenance is inefficient and error-prone. This article introduces how to build a structured multilingual metadata management system through HagiCode, achieving an integrated workflow from content creation to export and release.

Background

The Steam platform requires games and applications to provide multilingual store descriptions, including fields like about (detailed description) and short_description (short description). For products released globally, localization content in 28 languages is typically required.

This sounds like a simple content management task, but when you actually start working on it, you discover there are more problems than you imagined.

First, the maintenance workload is enormous. 28 languages multiplied by 2 fields equals 56 content blocks that need to be managed. Manually switching languages for editing in the Steamworks website backend is indeed inefficient. Every content update requires repeating this process—it's painful to even talk about it.

Second, scattered content is difficult to manage. Multilingual content is typically scattered across different tools and documents, lacking a unified local storage format. Version control becomes difficult, and team collaboration is prone to errors. After all, scattered things are like scattered memories—when you want to find them, you can't.

Furthermore, DLC content and main application content are managed in silos. If your game has multiple DLCs, each DLC needs its own multilingual content, so the workload multiplies with every DLC added. It's like life: things pile up, and you don't know where to start cleaning up.

Finally, the export format is unintuitive. The JSON format required by Steamworks doesn't match human reading habits, making manual editing error-prone. After all, who wants to look at that dense JSON?

These were all problems we encountered during the actual development of the HagiCode project. As an AI coding tool for global development, we need to maintain complete multilingual content for the Steam platform. Traditional maintenance methods could no longer meet our needs, and we urgently needed a more efficient solution. Actually, there's no other way—we had to build it ourselves.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI coding tool that supports multiple AI providers and code editors. During development, we needed to maintain multilingual store content for the Steam platform, which drove us to build a structured metadata management system.

The multilingual metadata management solution shared in this article is exactly what we actually refined through trial and optimization during HagiCode development. If you find this solution valuable, it shows our engineering strength is pretty good—so HagiCode itself is worth paying attention to. After all, a tool that can solve problems is a good tool, right?

Core Concepts

Languages and Fields

Steamworks supports a fairly complete list of languages, covering major markets:

en-US, fr-FR, it-IT, de-DE, es-ES,
bg-BG, cs-CZ, da-DK, nl-NL, fi-FI,
el-GR, hu-HU, id-ID, ja-JP, ko-KR,
nb-NO, pl-PL, pt-BR, pt-PT, ro-RO,
ru-RU, zh-CN, es-419, sv-SE, th-TH,
zh-TW, tr-TR, uk-UA, vi-VN

The most commonly used are en-US (English), zh-CN (Simplified Chinese), zh-TW (Traditional Chinese), ja-JP (Japanese), and ko-KR (Korean). After all, these languages cover major markets—once you get these done, the others aren't so scary.

The main fields that need to be maintained include two:

about: Detailed description, supports rich text format
short_description: Short description, with a 300-character limit

Scope Concept

Steam app content can be divided into two scopes:

Base App: Main application content
DLC: Downloadable content, each DLC has independent content management

This distinction is important because DLCs typically need independent store descriptions, and a game may have multiple DLCs that need unified management. It's like life—some things are primary, some are additional, but they all need to be managed properly, or things become a mess.

Data Model Design

The system defines a clear data model to support multilingual content management:

// Supported language codes (this list has 29 entries)
const STEAMWORKS_SUPPORTED_LOCALES = [
  'en-US', 'fr-FR', 'it-IT', 'de-DE', 'es-ES',
  'bg-BG', 'cs-CZ', 'da-DK', 'nl-NL', 'fi-FI',
  'el-GR', 'hu-HU', 'id-ID', 'ja-JP', 'ko-KR',
  'nb-NO', 'pl-PL', 'pt-BR', 'pt-PT', 'ro-RO',
  'ru-RU', 'zh-CN', 'es-419', 'sv-SE', 'th-TH',
  'zh-TW', 'tr-TR', 'uk-UA', 'vi-VN'
];

// Supported fields
const STEAMWORKS_SUPPORTED_FIELDS = [
  'about',           // Detailed description
  'short_description' // Short description
];

// Content scope
type SteamworksScopeKind = 'base' | 'dlc';

There are a few considerations in this model design—well, actually, it's just about making things a bit simpler:

Use standard language code formats (like zh-CN instead of chinese)—after all, standard things are always more reliable
Explicitly list field types for future extension—who knows if more fields will be needed later
Distinguish scope types to support unified management of Base App and DLC—it's always good to keep things clear

File Storage Structure

Content is stored in .hagiclaw-data/steamworks-metadata/ in the project directory, using a hierarchical directory structure:

.hagiclaw-data/
└── steamworks-metadata/
    └── default-app/
        ├── workspace.json              # Workspace configuration manifest
        ├── base/                       # Base application content
        │   ├── en-US/
        │   │   ├── about.md
        │   │   └── short_description.md
        │   ├── zh-CN/
        │   │   ├── about.md
        │   │   └── short_description.md
        │   └── ...
        └── dlc/                        # DLC content
            └── turbo-engine/
                ├── en-US/
                │   ├── about.md
                │   └── short_description.md
                └── ...

This structure design has several advantages—or at least, it's much better than the previous approach:

Human-readable: Each content is an independent Markdown file that can be edited directly—after all, human eyes prefer to see things clearly
Version control friendly: Text files make it easy to track change history and compare differences—so what was changed is clear at a glance
Strong extensibility: Adding new languages or fields only requires creating new files—like building blocks, add whatever you want
Clear structure: The directory structure intuitively reflects how content is organized—won't make people feel confused

workspace.json stores workspace configuration, including DLC list and language configuration information. After all, some things still need a manifest—otherwise, after a while, who remembers what they put where.
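The directory layout maps cleanly onto a small path helper. The TypeScript sketch below is illustrative only: SteamworksScope and metadataFilePath are hypothetical names, and the helper just assembles the path shown in the tree above.

```typescript
// A scope is either the base app or one DLC identified by its directory name.
type SteamworksScope =
  | { kind: "base" }
  | { kind: "dlc"; dlcId: string };

function metadataFilePath(
  appDir: string,
  scope: SteamworksScope,
  locale: string,
  field: string,
): string {
  const root = `.hagiclaw-data/steamworks-metadata/${appDir}`;
  const scopeDir = scope.kind === "base" ? "base" : `dlc/${scope.dlcId}`;
  return `${root}/${scopeDir}/${locale}/${field}.md`;
}
```

For example, the base app's English description resolves to .hagiclaw-data/steamworks-metadata/default-app/base/en-US/about.md.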

Markdown to BBCode Conversion

Steam uses BBCode format for rich text, not standard Markdown. This brings additional workload to content creation—either write BBCode directly or manually convert it later.

HagiCode's solution is to let developers write content in familiar Markdown and have the system convert it to Steam BBCode automatically. People are always accustomed to what they're familiar with, after all, so why force yourself to adapt to those strange square brackets?

Conversion Rules

// Heading conversion
# HagiCode        → [h1]HagiCode[/h1]
## Features        → [h2]Features[/h2]

// Text styles
**bold text**     → [b]bold text[/b]
*italic text*     → [i]italic text[/i]
`code`            → [code]code[/code]

// Links and images
[text](url)       → [url=url]text[/url]
![alt](src)       → [img src="{STEAM_APP_IMAGE}/extras/..."][/img]

// Lists
- item 1
- item 2          → [*]item 1
                   [*]item 2
                   (wrapped in [list])
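
A minimal line-by-line converter covering the rules above could look like this TypeScript sketch. It is not HagiCode's converter: markdownToBbcode and convertInline are hypothetical names, the regular expressions only handle simple, non-nested cases, and image syntax is left out because the placeholder path in the rule above is elided.

```typescript
// Links run first so the link pattern never matches BBCode tags emitted by the other rules.
function convertInline(text: string): string {
  return text
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "[url=$2]$1[/url]")
    .replace(/`([^`]+)`/g, "[code]$1[/code]")
    .replace(/\*\*([^*]+)\*\*/g, "[b]$1[/b]")
    .replace(/\*([^*]+)\*/g, "[i]$1[/i]");
}

function markdownToBbcode(markdown: string): string {
  const out: string[] = [];
  let inList = false;
  for (const line of markdown.split("\n")) {
    const isItem = line.startsWith("- ");
    // Open or close the [list] wrapper when entering or leaving a run of items.
    if (isItem && !inList) out.push("[list]");
    if (!isItem && inList) out.push("[/list]");
    inList = isItem;
    if (isItem) out.push(`[*]${convertInline(line.slice(2))}`);
    else if (line.startsWith("## ")) out.push(`[h2]${convertInline(line.slice(3))}[/h2]`);
    else if (line.startsWith("# ")) out.push(`[h1]${convertInline(line.slice(2))}[/h1]`);
    else out.push(convertInline(line));
  }
  if (inList) out.push("[/list]");
  return out.join("\n");
}
```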

Language Wrapping

When exporting, content needs to be wrapped with language tags:

wrapWithSteamLanguage(locale: SteamworksLocaleCode, bbcode: string): string {
  // Returns [lang=english]...[/lang] format
}

Language codes need to be mapped to Steam's format:

en-US → english
zh-CN → schinese
zh-TW → tchinese
ja-JP → japanese
ko-KR → korean

This mapping relationship isn't actually that complicated, it just needs to be remembered. After all, every platform has its own rules, we can only adapt.
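Put together, the mapping and the wrapper might look like the following TypeScript sketch. Only the five mappings listed above are included; the remaining locales are deliberately left out rather than guessed, and throwing on an unmapped locale is our assumption about how to handle that case.

```typescript
// Only the mappings listed in this article; other locales are intentionally omitted.
const STEAM_LANGUAGE_BY_LOCALE: Record<string, string | undefined> = {
  "en-US": "english",
  "zh-CN": "schinese",
  "zh-TW": "tchinese",
  "ja-JP": "japanese",
  "ko-KR": "korean",
};

function wrapWithSteamLanguage(locale: string, bbcode: string): string {
  const steamLanguage = STEAM_LANGUAGE_BY_LOCALE[locale];
  if (steamLanguage === undefined) {
    throw new Error(`No Steam language mapping for ${locale}`);
  }
  return `[lang=${steamLanguage}]${bbcode}[/lang]`;
}
```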

Export Format

The exported JSON needs to meet Steamworks' structure requirements:

{
  "itemid": "1158573",
  "languages": {
    "english": {
      "app[content][about]": "[h1]HagiCode[/h1]\n[b]About[/b]...",
      "app[content][short_description]": "AI coding tool..."
    },
    "schinese": {
      "app[content][about]": "[h1]HagiCode[/h1]\n[b]关于[/b]...",
      "app[content][short_description]": "AI 编码工具..."
    }
  }
}

The key points aren't many, just need to remember these format requirements:

itemid corresponds to Steam AppID
Steam's language codes (like schinese) are used under languages
Field paths use app[content][fieldName] format
Values are converted BBCode strings

These rules seem a bit tedious, but you get used to them. After all, every platform has its own temperament, we can only adapt.
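Assembling the export object from already-converted BBCode can be sketched as below. This is a TypeScript illustration under assumptions: buildSteamworksExport and the input shape (Steam language code, then field name, then BBCode string) are invented for the example; only the output structure comes from the JSON shown above.

```typescript
interface SteamworksExport {
  itemid: string;
  languages: Record<string, Record<string, string>>;
}

// Input: Steam language code -> field name -> converted BBCode.
function buildSteamworksExport(
  itemid: string,
  bbcodeByLanguage: Record<string, Record<string, string>>,
): SteamworksExport {
  const languages: Record<string, Record<string, string>> = {};
  for (const [language, fields] of Object.entries(bbcodeByLanguage)) {
    const entry: Record<string, string> = {};
    for (const [field, bbcode] of Object.entries(fields)) {
      // Field paths use the app[content][fieldName] format.
      entry[`app[content][${field}]`] = bbcode;
    }
    languages[language] = entry;
  }
  return { itemid, languages };
}
```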

API Service Design

The system provides a complete REST API to support the multilingual content management workflow:

Load Workspace

GET /api/steamworks/metadata

Returns workspace configuration, all languages, and field content. After all, there needs to be a place to pull everything out for viewing.

Save Content

POST /api/steamworks/metadata

{
  "scopeId": "base-app",
  "scopeKind": "base",
  "values": {
    "en-US": {
      "about": "Markdown content...",
      "short_description": "Short text..."
    },
    "zh-CN": {
      "about": "Markdown 内容...",
      "short_description": "简短文本..."
    }
  }
}

When saving, the system writes Markdown content to corresponding .md files. This way nothing gets lost—after all, memory is always unreliable.

Render Preview

POST /api/steamworks/metadata/preview

{
  "locale": "zh-CN",
  "field": "about",
  "content": "# HagiCode\n\n这是关于..."
}

Returns Markdown rendering result and BBCode conversion result for easy previewing. Preview is like looking in a mirror—you should at least see what you look like before going out.

Export JSON

POST /api/steamworks/metadata/export

{
  "scopeId": "base-app",
  "scopeKind": "base"
}

Generates Steamworks-format JSON that can be directly imported into the Steamworks backend. This step is essentially packaging everything up, ready for shipping.

DLC Management

POST /api/steamworks/metadata/dlc    // Create
PUT /api/steamworks/metadata/dlc     // Update
DELETE /api/steamworks/metadata/dlc  // Delete

DLC management includes creating, updating, and deleting DLC metadata configurations. After all, DLC is also content and needs to be managed properly.

Usage Workflow

1. Access Metadata Panel

Open the Steamworks Metadata panel in the HagicLaw workspace, and the system will load the current workspace's configuration and content. Once all preparations are done, you can begin.

2. Select Edit Scope

Select Base App or a specific DLC in the left navigation. Each scope independently manages its multilingual content. Like organizing a room—first categorize things, then clean them up one by one.

3. Multilingual Matrix Editing

Expand the languages you need to edit, and directly edit the Markdown content for about and short_description. The system supports:

Real-time Markdown rendering preview
Steam BBCode conversion preview
Character count and length checking

These preview features are actually quite useful—at least you can know what your content looks like. After all, no one wants to write a bunch of stuff only to find the format is completely wrong.

4. Save Content

Click the save button, and content is automatically written to corresponding .md files. Files are included in Git version control for easy change tracking. Saving is like writing down memories—they won't be forgotten even after a long time.

5. Validation Checks

The system automatically checks:

Whether required fields are complete
Whether short_description exceeds 300 characters
Whether Markdown syntax is correct

These checks can avoid some basic errors—after all, humans make mistakes, it's always good to have a machine help watch over things.
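The first two checks are easy to sketch in TypeScript. The names below are hypothetical, the Markdown syntax check is not covered, and counting Unicode code points is an assumption, since this article does not say how Steam counts the 300 characters.

```typescript
const REQUIRED_FIELDS = ["about", "short_description"];
const SHORT_DESCRIPTION_LIMIT = 300;

// Returns a list of human-readable problems; an empty list means the locale passes.
function validateLocaleContent(
  locale: string,
  values: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const text = values[field];
    if (text === undefined || text.trim() === "") {
      problems.push(`${locale}: ${field} is required`);
    }
  }
  const shortDescription = values["short_description"];
  if (shortDescription !== undefined) {
    const count = Array.from(shortDescription).length; // code points, not UTF-16 units
    if (count > SHORT_DESCRIPTION_LIMIT) {
      problems.push(`${locale}: short_description has ${count} characters (limit ${SHORT_DESCRIPTION_LIMIT})`);
    }
  }
  return problems;
}
```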

6. Export JSON

Select the scope to export (Base App or specific DLC), and the system generates Steamworks JSON containing all languages. Copy the JSON and paste it into the Steamworks backend to complete the import. Once this step is done, the entire workflow is complete. Everything is ready, just waiting for release.

Notes

Language Code Mapping

The system's en-US corresponds to Steam's english, and zh-CN corresponds to schinese. This mapping is handled automatically during export, but needs attention when manually editing JSON. After all, some things machines can help you with, but some you still need to remember yourself.

BBCode Limitations

Steam only supports a subset of BBCode, and complex Markdown may not convert perfectly. It's recommended to check conversion results in preview. Preview is like looking in a mirror—check what you look like before going out.

Image Paths

Images are converted to [img src="{STEAM_APP_IMAGE}/extras/..."] placeholder format. Actual images need to be uploaded separately to the Steam backend. Images are sometimes more persuasive than text, just a bit more troublesome to upload.

Field Validation

short_description has a strict 300-character limit. The system validates before export, but it's recommended to control length during editing. After all, writing too many characters is useless—the platform only looks at the first 300, so you have to be concise.

Version Control

All Markdown files can be included in Git version control for easy change history tracking and collaborative editing. It's recommended to commit changes regularly. Version control is like a time machine that lets you return to a past moment and see what you wrote then.

DLC Management

DLC's itemId needs to correspond to the DLC AppID in the Steamworks backend. When creating a DLC, ensure the ID is accurate. IDs are hard to change once wrong, so it's better to be careful.

Summary

The core challenge of Steamworks multilingual metadata management lies in how to efficiently maintain large amounts of multilingual content. Through structured data models, human-friendly file storage, and automated conversion/export workflows, we can transform this tedious process into a manageable content creation workflow.

This solution has proven effective in the practice of the HagiCode project. We transformed from a manual, error-prone state to a structured, verifiable, collaborative workflow. This not only improved efficiency but also reduced human error. After all, when the tool is well-made, things become simple.

If you're developing applications for the Steam platform and need to maintain multilingual content, I hope this solution can provide some inspiration. Multilingual content management doesn't have to be a painful thing—with the right tools and workflows, it can become relatively easy. Or at least, not so despair-inducing...

References

Steamworks Documentation - Store Metadata
Steam BBCode Guide
HagiCode project: github.com/HagiCode-org/site
HagiCode official site: hagicode.com

Original Article & License

This article was created with AI assistance and reviewed by the author before publication.

Author: newbe36524
Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-05-09-steamworks-multilingual-metadata-management%2F
License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.

Quantifying AI Cost-Benefit Analysis

Hagicode — Sat, 09 May 2026 02:20:13 +0000

Your boss asks: "How much does it cost to equip employees with AI assistants, and is it worth it?" You can't answer, and you feel unsure. This article discusses how to calculate this clearly.

Background

In recent years, Claude Code, GitHub Copilot, and various AI programming assistants have flooded in like a tidal wave. As a technical person, you've probably already started using them and feel genuinely more efficient—like someone handing you a ladder when you need to climb.

But when it comes to discussing ROI with your boss or clients, you often hit a wall—how do you quantify that subjective feeling of "increased efficiency"? I understand this feeling. It's like when someone asks you "what do you like about her?" and you stutter for a while, only saying "I just do." That's fine, but bosses want numbers, not your feelings.

And that's not the only problem:

ROI: Is the cost of equipping the team with AI tools worth it?

Efficiency Quantification: How do we translate "productivity gains" across different roles and usage levels into measurable metrics?

Risk Assessment: If competitors adopt AI at scale, how much will our competitiveness suffer?

Traditional ROI calculations often overlook two critical factors:

Enterprise Total Cost Perspective: Only considering salary while ignoring city differences, social insurance, housing fund, and other additional costs
Token Economics Model: Lack of a calculation framework connecting AI usage (Tokens) to actual output

Both factors are indispensable. Here's an example using the city coefficients introduced below: for the same 300k annual salary, the actual cost to the enterprise is about 445k in Beijing versus about 385k in Wuhan, a gap of roughly 60k per year. And that's not even counting the cost of AI usage itself. Cost is like an iceberg: you only ever see the tip...

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project.

HagiCode is essentially just an AI code assistant project. However, during development, we genuinely needed to accurately assess the cost-effectiveness of different AI models—after all, money doesn't grow on trees. To that end, we built a complete calculation framework and open-sourced the HagiCode Cost assessment tool.

If you're also thinking about AI cost issues, this approach might give you some reference. Or maybe not—I can't guarantee that, but we're just giving it a try.

Core Calculation Framework

A complete AI cost-benefit assessment requires establishing a three-layer model:

Input Layer
├── Annual salary data
├── City tier coefficient
├── AI model selection
├── Efficiency multiplier estimate
└── Daily Token usage

Calculation Layer
├── Enterprise total cost accounting
├── AI annual cost calculation
├── Cost proportion analysis
├── ROI calculation
└── Equivalent workforce conversion

Output Layer
├── AI cost proportion
├── Efficiency gain
├── Return on investment
├── Equivalent workforce count
└── Elimination risk assessment

This framework looks complex enough to make your head spin. Actually, the core logic is quite simple: calculate the enterprise's real labor costs clearly, calculate the AI's annual costs clearly, then look at the ROI and equivalent workforce. After all, simplifying complexity is the right path.

Calculating Key Metrics

Enterprise Annual Total Labor Cost

First, enterprise total cost. This isn't simply monthly salary multiplied by 12. Real costs need to consider two factors:

City Coefficient: Additional costs in first-tier cities (Beijing, Shanghai, Guangzhou, Shenzhen) are about 30 percentage points of salary higher than in the lowest tier (a coefficient of 0.4 versus 0.1 in the table below). This includes social insurance, housing fund, various benefits, and the cost-of-living premium for first-tier cities. After all, the price of living in Beijing versus Wuhan is indeed different.

Additional Employment Costs: Roughly equivalent to 1 month's salary, covering year-end bonuses, various subsidies, office equipment amortization, etc. These amounts may seem small individually, but they add up.

So the formula is:

enterpriseAnnualTotalLaborCost = annualSalary × (1 + cityCoefficient) + annualSalary/12

City coefficient can refer to this standard:

First-tier cities (Beijing, Shanghai, Guangzhou, Shenzhen): 0.4
New first-tier (Hangzhou, Chengdu, Suzhou, Nanjing): 0.3
Second-tier cities (Wuhan, Xi'an, Tianjin, Zhengzhou): 0.2
Other cities: 0.1
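As a minimal sketch, the formula and the coefficient table can be written as a small TypeScript function. The tier names and the function name are illustrative assumptions, not necessarily what HagiCode Cost uses internally.

```typescript
// Tier names are hypothetical; the coefficients are the ones listed above.
type CityTier = "first" | "newFirst" | "second" | "other"

const CITY_COEFFICIENTS: Record<CityTier, number> = {
  first: 0.4, // Beijing, Shanghai, Guangzhou, Shenzhen
  newFirst: 0.3, // Hangzhou, Chengdu, Suzhou, Nanjing
  second: 0.2, // Wuhan, Xi'an, Tianjin, Zhengzhou
  other: 0.1,
}

// annualSalary × (1 + cityCoefficient) + annualSalary / 12
function enterpriseAnnualTotalLaborCost(annualSalary: number, tier: CityTier): number {
  return annualSalary * (1 + CITY_COEFFICIENTS[tier]) + annualSalary / 12
}

// Same 300k salary in a first-tier and a second-tier city
console.log(Math.round(enterpriseAnnualTotalLaborCost(300000, "first"))) // 445000
console.log(Math.round(enterpriseAnnualTotalLaborCost(300000, "second"))) // 385000
```

With these coefficients, the same 300k salary costs the enterprise about 445k in a first-tier city and about 385k in a second-tier one.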

AI Annual Cost

AI cost calculation is slightly more complex because AI models charge by Token. And input and output prices differ—output is typically 5-10x more expensive than input. This isn't surprising, after all, output is the AI "working," while input is just you "talking."

In code scenarios, the input-output ratio is about 3:1, so we can calculate a composite unit price:

// Composite unit price (based on 3:1 input-output ratio)
compositeUnitPrice = (3 × inputPrice + outputPrice) / 4

// Daily cost
dailyAICost = dailyTokenUsage(M) × compositeUnitPrice

// Annual cost (based on 264 working days)
annualAICost = dailyAICost × 264

For example, GPT-5.4's input price is 2.5 USD/1M Token, output price is 15 USD/1M Token. Then the composite unit price is:

compositeUnitPrice = (3 × 2.5 + 15) / 4 = 5.625 USD/1M Token

Converting to RMB (assuming 1 USD = 7.25 CNY):

compositeUnitPrice = 5.625 × 7.25 = 40.78 yuan/1M Token

Exchange rates fluctuate, but we fix them for calculation convenience.
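The same chain of calculations can be sketched in TypeScript. The interface and function names here are assumptions made for illustration. The prices, the 3:1 ratio, the 264 working days, and the fixed 7.25 rate are the values used in this article.

```typescript
// Prices are in USD per 1M tokens.
interface ModelPrice {
  inputUsdPerMTok: number
  outputUsdPerMTok: number
}

const USD_TO_CNY = 7.25 // fixed rate, for calculation convenience
const WORKING_DAYS = 264

// Weighted average of input and output price (3:1 by default)
function compositeUnitPriceUsd(price: ModelPrice, inputShare = 3, outputShare = 1): number {
  const weighted = inputShare * price.inputUsdPerMTok + outputShare * price.outputUsdPerMTok
  return weighted / (inputShare + outputShare)
}

// Annual cost in yuan for a given daily usage in millions of tokens
function annualAiCostCny(price: ModelPrice, dailyTokensM: number): number {
  return dailyTokensM * compositeUnitPriceUsd(price) * USD_TO_CNY * WORKING_DAYS
}

const examplePrice: ModelPrice = { inputUsdPerMTok: 2.5, outputUsdPerMTok: 15 }
console.log(compositeUnitPriceUsd(examplePrice)) // 5.625
console.log(annualAiCostCny(examplePrice, 12)) // 129195
```

Because this sketch does not round the composite price to 40.78 yuan before multiplying, the annual figure comes out a few yuan higher than a hand calculation with rounded intermediate values.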

Core Benefit Metrics

With the two costs above, we can calculate core metrics:

// AI cost proportion
aiCostProportion = annualAICost / enterpriseAnnualTotalLaborCost

// Efficiency gain
efficiencyGain = efficiencyMultiplier - 1

// AI return on investment
aiROI = efficiencyGain / aiCostProportion

// Affordable workflow count
affordableCount = enterpriseAnnualTotalLaborCost / annualAICost

// Equivalent workforce
equivalentWorkforce = 1 + (efficiencyMultiplier - 1) × min(affordableCount, 1)

Meanings of these metrics:

AI Cost Proportion: The percentage of enterprise labor costs consumed to maintain Agent workflows. The lower this number, the more "cost-effective" the AI usage. Who doesn't like saving money?

Return on Investment: Efficiency gain ÷ AI cost proportion. Less than 1 means "somewhat wasteful," greater than 2 means "very worthwhile." This is easy to understand—like spending money to buy time, you calculate whether it's worth it.

Equivalent Workforce: There's a point easily misunderstood here. It's not directly accepting the efficiency multiplier, but whether the enterprise can afford this AI workflow. If affordableCount is less than 1, then equivalent workforce won't reach your expected efficiency multiplier. After all, even the cleverest housewife can't cook without rice...
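To make the relationships between these metrics concrete, here is a minimal sketch that computes all of them from three inputs. The function and field names are assumptions chosen to mirror the formulas above. The input guard is our own addition to avoid dividing by zero.

```typescript
interface BenefitMetrics {
  aiCostProportion: number
  efficiencyGain: number
  aiRoi: number
  affordableCount: number
  equivalentWorkforce: number
}

function benefitMetrics(
  laborCost: number,
  annualAiCost: number,
  efficiencyMultiplier: number
): BenefitMetrics {
  if (laborCost <= 0 || annualAiCost <= 0) {
    throw new RangeError("laborCost and annualAiCost must be positive")
  }
  const aiCostProportion = annualAiCost / laborCost
  const efficiencyGain = efficiencyMultiplier - 1
  const affordableCount = laborCost / annualAiCost
  return {
    aiCostProportion,
    efficiencyGain,
    aiRoi: efficiencyGain / aiCostProportion,
    affordableCount,
    // The gain only counts in full when at least one workflow is affordable
    equivalentWorkforce: 1 + efficiencyGain * Math.min(affordableCount, 1),
  }
}

// AI costs twice the labor cost: affordableCount is 0.5, so a 3x multiplier
// yields an equivalent workforce of 1 + 2 × 0.5 = 2 instead of 3
console.log(benefitMetrics(100000, 200000, 3).equivalentWorkforce) // 2
```

The second call shows the cap in action: when the AI workflow costs more than the employee, the claimed multiplier is scaled down instead of being accepted at face value.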

Practical Calculation Example

Let's do a real accounting example. Assume a first-tier city backend developer:

Annual salary: 300k
Using GPT-5.4, efficiency multiplier: 2.5x
Daily Token usage: 12 M

Step 1: Calculate enterprise total cost

enterpriseTotalCost = 300k × (1 + 0.4) + 300k/12 = 445k

Step 2: Calculate AI annual cost

compositeUnitPrice = 40.78 yuan/1M Token
dailyCost = 12 × 40.78 = 489.36 yuan
annualCost = 489.36 × 264 = 129,191 yuan ≈ 129k

Step 3: Calculate benefit metrics

AI cost proportion = 129k / 445k = 29%
efficiencyGain = 2.5 - 1 = 1.5 (150%)
return on investment = 1.5 / 0.29 = 5.17x

Step 4: Calculate equivalent workforce

affordableCount = 445k / 129k = 3.45
equivalentWorkforce = 1 + (2.5 - 1) × 1 = 2.5 people

What's the conclusion? This AI usage has an ROI over 5, falling in the "very worthwhile" range. If the entire team works this way, each developer delivers roughly the output of 2.5 people, which is a real competitive advantage in the market.

This makes sense—after all, the money you spend on AI is far less than your additional output. This deal is worth it.

Impact of Multi-Agent

HagiCode discovered an interesting phenomenon in actual use: a single Agent's efficiency gains have an upper limit.

This is actually quite natural—no matter how capable a person is, they can only do one thing at a time. After all, you're not an octopus.

Traditional single Agent usage patterns have several bottlenecks:

Serial Limitation: Proposal → Implementation → Review → Fix must wait sequentially. No matter how fast a single Agent is, it can only do one thing at a time. It's like cooking—you can only wash, cut, and stir-fry step by step.

Quota Waste: Monthly quota limits can't be fully utilized. Unused quota this month doesn't roll over to next month. This isn't surprising, just a bit wasteful.

Context Switching: Different tasks require repeatedly establishing context, meaning you have to explain background information each time. Like chatting with different people about the same thing—starting from scratch each time gets tiring.

HagiCode's multi-Agent architecture solves these problems through parallel sessions:

Parallel 10x+: Multiple Agents drive multiple instances simultaneously, achieving true parallel work
Throughput Increase: Proposal, implementation, and fixes can advance in parallel without waiting for each other
Improved Token Utilization: OpenSpec process reduces rework, spreading equivalent consumption

The change this brings is enormous. Using the previous example, if using HagiCode multi-Agent architecture:

Parallel sessions: 4
Token utilization improvement: 1.5x

Amplified calculation:

amplifiedEfficiency = 2.5 × 4 = 10x
optimizedDailyToken = (12 × 4) / 1.5 = 32 M
optimizedAnnualCost = 32 × 40.78 × 264 = 344k

New benefit metrics:

newAICostProportion = 344k / 445k = 77%
newROI = 9 / 0.77 ≈ 11.7x
newEquivalentWorkforce = 1 + (10 - 1) × 1 = 10 people

Although AI cost proportion rose from 29% to 77%, ROI increased from 5.17x to about 11.7x, and equivalent workforce changed from 2.5 to 10 people.

This is the power of multi-Agent parallelism. One Agent is one person; ten Agents are a team... The difference isn't just a little bit.
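The amplification step can be sketched as a small function. It encodes the simplifying assumption used in the calculation above, namely that efficiency scales linearly with the number of parallel sessions and that better Token utilization divides the total consumption. The names are illustrative assumptions.

```typescript
interface MultiAgentInput {
  baseEfficiencyMultiplier: number
  baseDailyTokensM: number
  parallelSessions: number
  tokenUtilizationGain: number
}

interface MultiAgentResult {
  efficiencyMultiplier: number
  dailyTokensM: number
}

function amplify(input: MultiAgentInput): MultiAgentResult {
  if (input.parallelSessions < 1 || input.tokenUtilizationGain <= 0) {
    throw new RangeError("parallelSessions must be >= 1 and tokenUtilizationGain > 0")
  }
  return {
    // Linear scaling with session count is an assumption, not a measured law
    efficiencyMultiplier: input.baseEfficiencyMultiplier * input.parallelSessions,
    dailyTokensM: (input.baseDailyTokensM * input.parallelSessions) / input.tokenUtilizationGain,
  }
}

const amplified = amplify({
  baseEfficiencyMultiplier: 2.5,
  baseDailyTokensM: 12,
  parallelSessions: 4,
  tokenUtilizationGain: 1.5,
})
console.log(amplified.efficiencyMultiplier, amplified.dailyTokensM) // 10 32
```

In practice the real multiplier will depend on how independent the parallel tasks are, so treat the linear model as an upper-bound estimate and adjust it after observation.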

Practical Considerations

Don't Get City Coefficient Wrong

Employment cost differences across cities are significant: the coefficient ranges from 0.1 to 0.4 of annual salary. When calculating, be sure to use the correct city tier, because an error here carries straight through to the labor cost, the AI cost proportion, and the ROI. As the old saying goes, a tiny slip at the start can leave you miles off at the end.

Input-Output Ratio Isn't Fixed

Code scenarios default to a 3:1 input-output ratio, matching the proportion of prompts to generated code in actual programming. But if you're doing other types of work—like writing copy or doing data analysis—this ratio might be completely different.

This is normal—different work, different methods.

Efficiency Multiplier Is Subjective

Efficiency multiplier is a subjective estimate. It's recommended to combine with actual observation:

1.5-2x: Familiar with basic functions, occasional use
2-3x: Proficient, daily high-frequency use
3x+: Deep integration, forming dedicated workflows

Don't estimate too high initially—observe for a while before adjusting. After all, higher expectations lead to greater disappointment.

How to Calculate Token Usage

If you don't know your daily Token usage, you can estimate this way:

Check platform usage statistics (both Claude and OpenAI have them)
Record Token consumption from several typical conversations and take an average
Multiply by your daily conversation count

Or just use HagiCode Cost to calculate—it has reference values for common scenarios. This is convenient and saves you from blind trial and error.
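The sampling approach described above is simple enough to sketch directly. The function name and the sample numbers are made up for illustration.

```typescript
// Average the Token counts of a few typical conversations,
// then scale by the number of conversations per day.
// Returns the estimate in millions of Tokens (M).
function estimateDailyTokensM(sampleTokenCounts: number[], conversationsPerDay: number): number {
  if (sampleTokenCounts.length === 0) {
    throw new RangeError("need at least one sampled conversation")
  }
  const total = sampleTokenCounts.reduce((sum, n) => sum + n, 0)
  const averagePerConversation = total / sampleTokenCounts.length
  return (averagePerConversation * conversationsPerDay) / 1000000
}

// Three sampled conversations, 40 conversations per day (illustrative numbers)
console.log(estimateDailyTokensM([200000, 300000, 400000], 40)) // 12
```

The more representative the sampled conversations are, the closer this lands to the platform's own usage statistics, which remain the better source when available.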

Impact of Exchange Rate Fluctuations

USD models require exchange rate conversion, but rates fluctuate. Calculators typically use fixed rates (like 1 USD = 7.25 CNY), while actual costs may vary with exchange rate fluctuations. This error is usually small, but keep it in mind.

After all, everything has an approximation—precision to several decimal places isn't really necessary...

Technical Implementation Points

If you want to implement this calculation logic yourself, several technical details are worth noting:

Multi-Currency Support

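// Fixed rate used throughout this article; swap in a live rate if needed
const EXCHANGE_RATE_USD_TO_CNY = 7.25

function convertCnyAmountToCurrency(
  amountCny: number,
  targetCurrency: "USD" | "CNY"
): number {
  if (targetCurrency === "CNY") return amountCny
  return amountCny / EXCHANGE_RATE_USD_TO_CNY
}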

There's not much to say about this code—it's just simple currency conversion.

Multi-Language Localization

function getLocalizedModelCopy(
  model: ModelPricing,
  language: SupportedLanguage
): LocalizedModelMeta {
  return {
    description: language === "zh-CN"
      ? model.description
      : model.descriptionEn,
    pricingContext: language === "zh-CN"
      ? model.pricingContext
      : model.pricingContextEn,
    // ... other fields
  }
}

Multi-language support is complex in some ways, simple in others. It's essentially storing different language content and retrieving it when needed.

Regional Differentiation

function getCityTierLabel(
  cityTier: CityTier,
  region: "cn-mainland" | "international",
  language: SupportedLanguage
): string {
  const city = benchmarkData.cityCoefficients.find(
    item => item.tier === cityTier
  )

  // find() returns undefined when no entry matches the tier
  if (!city) {
    throw new Error(`Unknown city tier: ${cityTier}`)
  }

  if (region === "cn-mainland") {
    return language === "zh-CN" ? city.label : city.labelEn
  }

  return language === "zh-CN"
    ? city.internationalLabel
    : city.internationalLabelEn
}

Regional differentiation means displaying different labels for different regions. This isn't difficult—just judge the region and language, then return the corresponding value.

Summary

AI cost-benefit assessment isn't anything profound—the core is three calculations: enterprise labor costs, AI usage costs, and efficiency improvement magnitude. Calculate these three clearly, and the ROI naturally emerges.

This is like many things in life—seemingly complex, but when broken down, it's just that. Few people are willing to sit down and calculate it.

But there's an easily overlooked point here: the multiplier effect from multi-Agent architecture. A single Agent, however strong, works through tasks one at a time. Multiple Agents working in parallel multiply capacity by the number of sessions, as in the 4-session example above. This is the core reason HagiCode chose a multi-Agent architecture.

One person's power is limited; a group's power is infinite. This sounds like a platitude, but applied to AI, it's fitting.

If you're also thinking about AI cost issues, welcome to try HagiCode Cost to experience our calculator. Or go directly to GitHub to see the source code—maybe it'll give you some inspiration.

Or maybe not—I can't guarantee that. Just giving it a try, after all, paths are made by walking...

Writing this, I suddenly remembered an old saying: "To do good work, one must first sharpen one's tools."

But sometimes, even with sharp tools, knowing how to use them is another matter. AI is like a double-edged sword—used well, it's assistance; used poorly, it's a burden. The balance is for you to find.

Enough of that. Hope this helps you.

Optimizing OpenSpec Phase Efficiency with Different Agents: HagiCode Practice Summary

Hagicode — Fri, 08 May 2026 06:24:01 +0000

Optimizing OpenSpec Phase Efficiency with Different Agents: HagiCode Practice Summary

Generic prompts cannot handle the specific requirements of different development stages. Through phase-specific agents and a parameterized template system, AI can produce high-quality output at every step.

Background

OpenSpec is a proposal-driven development system that manages the creation, review, and implementation of technical proposals through structured workflows. The idea itself is sound, but in practice, we found significant issues with using a single generic AI prompt.

The explore stage lacks context anchoring, causing AI explorations to deviate from the proposal scope. Artifact generation quality is unstable: design.md is missing visual elements, proposal.md lacks code change tables, and tasks.md even includes Git operations that shouldn't be there. Responsibility boundaries are blurred, with unclear content requirements for different document types. Prompts lack flexibility and cannot dynamically adjust AI behavior based on different scenarios.

These issues directly impact the efficiency and output quality of the OpenSpec workflow. There's really no other way but to modify the prompt templates ourselves. This article documents that period of work.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI-powered code assistant, and we extensively use the OpenSpec workflow to manage technical proposals during development. The agent layering strategy introduced here is exactly the optimization solution we summarized from practical use.

If you find this approach valuable, it means our engineering practices are pretty solid—HagiCode itself is worth paying attention to.

OpenSpec Workflow Analysis

The OpenSpec system contains multiple core stages, each with specific goals and constraints. Understanding the responsibility boundaries of these stages is the foundation for designing effective agent strategies.

┌─────────────────────────────────────────────────────────────────────┐
│                    OpenSpec Workflow Stages                         │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐     │
│  │ Explore  │ -> │   New    │ -> │    FF    │ -> │  Apply   │     │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐     │
│  │ Archive  │    │   Sync   │    │ Verify   │    │  Status  │     │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘     │
└─────────────────────────────────────────────────────────────────────┘

Each stage has completely different goals: The Explore stage requires a thinking posture, focused on information gathering; the New stage focuses on requirement analysis and solution design; the FF stage creates artifacts in batch by dependency order; the Apply stage transforms proposals into actual code. Using the same prompt template to drive these vastly different tasks is clearly unreasonable.

Prompt System Architecture

OpenSpec uses a templated prompt system, which provides the technical foundation for agent layering. Template files use .hbs (Handlebars/Scriban) format, paired with .json metadata files to define parameters and validation rules, supporting both Chinese and English.

The key design is the PromptScenario enumeration, which defines prompt scenarios for different stages:

public enum PromptScenario
{
    OpenspecV1Explore,      // Exploration stage
    OpenspecV1New,          // New proposal
    OpenspecV1Ff,           // Fast generation
    OpenspecV1Apply,        // Apply changes
    OpenspecV1Archive       // Archive
}

Each scenario has a corresponding independent template file, such as openspec-v1-explore.zh-CN.hbs and openspec-v1-ff.zh-CN.hbs, allowing specific constraints and guidance to be injected for different stages.
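The article's own code is C#, but the naming relationship between a scenario and its template file is easy to illustrate in TypeScript. This sketch assumes the file name is simply the kebab-cased scenario name plus the locale, which matches the two example file names above. How HagiCode actually resolves template paths is not shown here.

```typescript
type PromptScenario =
  | "OpenspecV1Explore"
  | "OpenspecV1New"
  | "OpenspecV1Ff"
  | "OpenspecV1Apply"
  | "OpenspecV1Archive"

// Hypothetical helper: "OpenspecV1Ff" + "zh-CN" -> "openspec-v1-ff.zh-CN.hbs"
function templateFileName(scenario: PromptScenario, locale: string): string {
  // Insert a hyphen wherever a lowercase letter or digit is followed by an uppercase letter
  const kebab = scenario.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase()
  return `${kebab}.${locale}.hbs`
}

console.log(templateFileName("OpenspecV1Explore", "zh-CN")) // openspec-v1-explore.zh-CN.hbs
```

A convention like this keeps the scenario enumeration as the single source of truth: adding a stage means adding one enum member and one template file per locale.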

Parameterized Prompt Loading

Implementing dynamic parameter injection is the core of the entire system. FilePromptProvider is responsible for loading prompts based on scenarios and parameters:

public async Task<string> GetOpenspecV1FfPromptAsync(
    string changeName,
    string changeDescription,
    string locale = "en-US",
    string? planningDirectionInstructions = null,
    CancellationToken cancellationToken = default)
{
    var parameters = new Dictionary<string, object>
    {
        { "planningDirectionInstructions", 
          ResolvePlanningDirectionInstructions(locale, planningDirectionInstructions) }
    };

    if (!string.IsNullOrWhiteSpace(changeName))
    {
        parameters["changeName"] = changeName;
    }

    return await GetPromptWithParametersAsync(
        PromptScenario.OpenspecV1Ff,
        locale,
        cancellationToken,
        parameters);
}

This design allows us to dynamically inject parameters at runtime, such as changeName and planningDirectionInstructions, without modifying the template file itself.

Dynamic Planning Direction Configuration

HagiCode implements a flexible planning direction system that allows users to select different directions for each generation. Each direction has an independent ID, description, and prompt fragment:

public static class ProposalPlanningDirections
{
    private static readonly ProposalPlanningDirectionDefinition[] Catalog =
    [
        new(
            ExploreId,
            "Explore mode",
            DefaultEnabled: true,
            EnglishPromptFragment:
            "- Explore mode: add an explicit exploration pass...",
            ChinesePromptFragment:
            "- 探索模式：在定稿工件之前增加明确的探索阶段..."),
        // ... change-map, flowchart, prototype, architecture, sequence
    ];

    public static NormalizedProposalPlanningDirections Normalize(
        bool? enableExploreMode,
        IReadOnlyList<PlanningDirectionOptionDto>? planningDirections)
    {
        // Merge default configuration with user custom configuration
    }
}

Supported directions include: explore (exploration mode), change-map (change map), flowchart (interaction flowchart), prototype (UI prototype), architecture (architecture diagram), sequence (API sequence diagram). Users can freely toggle these directions, and the system dynamically generates corresponding prompt instruction blocks.
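The toggle-and-merge behavior can be sketched in a few lines of TypeScript. This is an illustration of the idea, not a port of the C# Normalize method: the catalog entries, their default states (other than explore being enabled by default), and the fragment texts are placeholders.

```typescript
interface PlanningDirection {
  id: string
  defaultEnabled: boolean
  promptFragment: string
}

// Placeholder catalog; the real fragments live in HagiCode's catalog
const CATALOG: PlanningDirection[] = [
  { id: "explore", defaultEnabled: true, promptFragment: "- Explore mode: (placeholder)" },
  { id: "change-map", defaultEnabled: false, promptFragment: "- Change map: (placeholder)" },
  { id: "flowchart", defaultEnabled: false, promptFragment: "- Flowchart: (placeholder)" },
]

// User toggles win over defaults; enabled fragments are joined into one block
function buildPlanningDirectionInstructions(
  catalog: PlanningDirection[],
  overrides: Partial<Record<string, boolean>> = {}
): string {
  return catalog
    .filter(direction => overrides[direction.id] ?? direction.defaultEnabled)
    .map(direction => direction.promptFragment)
    .join("\n")
}

console.log(buildPlanningDirectionInstructions(CATALOG, { flowchart: true }))
```

When every direction is disabled the result is an empty string, which Handlebars treats as falsy, so a conditional block around the instructions is skipped entirely.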

Use conditional statements in Handlebars templates to inject these instructions:

{{#if planningDirectionInstructions}}
## Planning Directions for This Generation

{{{planningDirectionInstructions}}}
{{/if}}

Clear Content Scope Constraints

The most critical improvement is defining clear content scope constraints for different document types, especially tasks.md. We added strict constraint conditions in the prompt:

### tasks.md Content Scope Constraints

When creating `tasks.md` artifacts, the following content scope constraints MUST be observed:

**MUST include**:
- Business logic tasks (code implementation, feature development)
- Technical implementation tasks (component integration, API development)
- Testing tasks (unit tests, integration tests)
- Documentation tasks (updating documentation, adding comments)

**MUST NOT include**:
- Git commit operations (git add, git commit, git push)
- Version control management workflows
- Deployment and release operations

Using normative language (MUST/SHALL) rather than suggestive language makes it much more likely that the AI treats these as hard constraints rather than preferences. For proposal.md and design.md, we also clarified their respective responsibility boundaries: proposal.md must include code change tables and UI prototype diagrams (when involving UI changes), while design.md must include architecture diagrams and data flow diagrams.
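Prompt constraints reduce violations but cannot rule them out, so a cheap check on the generated file is a useful complement. The following is a sketch of one possible check, not something HagiCode is described as shipping. The pattern list is deliberately small and only covers the Git commands named in the constraint block.

```typescript
// Content the prompt says tasks.md MUST NOT contain (illustrative, not exhaustive)
const FORBIDDEN_TASK_PATTERNS: RegExp[] = [
  /\bgit\s+(add|commit|push)\b/i,
]

// Returns every line of tasks.md that matches a forbidden pattern
function findForbiddenTaskLines(tasksMarkdown: string): string[] {
  return tasksMarkdown
    .split(/\r?\n/)
    .filter(line => FORBIDDEN_TASK_PATTERNS.some(pattern => pattern.test(line)))
}

const sampleTasks = [
  "- [ ] Implement the pricing API",
  "- [ ] Run git commit -m \"add pricing\"",
  "- [ ] Add unit tests",
].join("\n")

console.log(findForbiddenTaskLines(sampleTasks).length) // 1
```

A check like this fits naturally into the "verify output" step: a non-empty result means the artifact should be regenerated or corrected before it moves on.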

Exploration Stage Context Anchoring

The Explore stage problem is easily overlooked—AI explorations may completely deviate from the proposal scope. We address this through enhanced prompts:

## Explore Execution Principles

- **No documentation needed** - Exploration results need not be saved as independent documents
- **Information transfer** - After exploration is complete, collected information will be passed to the Proposal creation stage
- **Focus on thinking** - The value of exploration lies in information gathering, not document output

## Connection with Proposal Creation

The Explore stage occurs after proposal creation and before project code is written. After exploration is complete,
the system will guide you to create or populate the `proposal.md` file, and exploration-collected information will serve as the foundation for proposal content.

This clarifies the positioning of the Explore stage: it's a preliminary step for information gathering, not an independent document production phase. Once AI understands this, it can focus more on proposal-related knowledge exploration.

Implementation Guide

If you want to apply this solution in HagiCode, follow these steps:

Define planning directions: Define direction IDs, default states, and prompt fragments in ProposalPlanningDirections.cs
Template parameterization: Use conditional statements and variable injection in .hbs templates
Verify output: When enabling specific directions, check that corresponding artifacts contain expected content
Test boundaries: Verify that disabling directions doesn't generate corresponding content and doesn't affect other directions

Note that template modifications should remain synchronized with upstream, and Chinese and English template structures should be consistent. Planning direction rendering should complete in microseconds to avoid performance impact.

Summary

The efficiency optimization of the OpenSpec workflow lies in understanding the differentiated needs of different stages. Through stage-specific agents, parameterized templates, and clear content constraints, we enable AI to produce high-quality output at every step.

This solution has been validated in HagiCode's practice—not only improving document quality but also reducing manual modification workload. If your team is also using a similar proposal-driven workflow, I hope these experiences provide some inspiration.

It's really just about breaking down the problem. Each stage has its characteristics, use the right method, and the problem naturally becomes simple.

References

HagiCode project repository: github.com/HagiCode-org/site
HagiCode official website: hagicode.com
Official version demo video: www.bilibili.com/video/BV1z4oWB3EpY/
One-click installation experience: docs.hagicode.com/installation/docker-compose
Desktop quick installation: hagicode.com/desktop/


Desktop Application P2P Distribution Acceleration Practice: Full-Chain Integration from Consumer to Publisher

Hagicode — Fri, 08 May 2026 01:23:10 +0000

Desktop Application P2P Distribution Acceleration Practice: Full-Chain Integration from Consumer to Publisher

Large file distribution for desktop applications has always been a headache—high bandwidth costs, slow download speeds, and poor user experience. This article shares the hybrid distribution solution we implemented in HagiCode Desktop, which accelerates downloads through P2P technology while maintaining HTTP fallback capability, ultimately achieving a complete closed loop between the publisher and consumer sides.

Background

Desktop application distribution packages are typically not small, often running into hundreds of MB. This is actually quite normal—after all, modern applications have more and more features, so naturally their size increases. For applications like HagiCode Desktop, each version update means distributing large files to a large number of users, which poses a significant challenge to server bandwidth.

The traditional approach is direct HTTP download—simple and straightforward, but the problems are obvious: high server load during peak periods, slow download speeds for users, especially overseas users. There's really no way around this, as physical distance is what it is. P2P technology can solve this problem well—users share file fragments with each other, reducing server pressure while improving download speeds.

But things aren't that simple. During the development of HagiCode Desktop, we discovered an interesting phenomenon: the consumer side (desktop application) already had hybrid download capabilities, able to parse fields like torrentUrl, infoHash, webSeeds, sha256, etc., and prioritize P2P-accelerated downloads through a hybrid download coordinator. However, the publisher side (build toolchain) didn't stably output these fields to Azure Blob's index.json.

This actually created a disconnect: the client was expecting a more efficient distribution method, but the publisher was still using the traditional flat file list to build the index. The potential of P2P acceleration was being wasted, which is a shame.

To close this loop, we implemented a complete overhaul solution—from metadata generation on the publisher side to hybrid download coordination on the consumer side, making the entire distribution chain truly work. Next, I'll share in detail the design thinking and implementation details of this solution, hoping to provide some reference for friends facing similar problems.

About HagiCode

The hybrid distribution solution shared in this article comes from our practical experience in the HagiCode project. HagiCode Desktop is our desktop application, supporting Windows, macOS, and Linux platforms. As an AI code assistant project, the desktop client needs to update distribution packages frequently, which prompted us to explore more efficient distribution methods. After all, no one wants to wait half a day for every update, right?

Analysis

Nature of the Problem

On the surface, this looks like an "add torrent file generation" feature requirement. But upon deeper analysis, we discovered this is actually a producer-consumer contract mismatch problem. This kind of situation is quite common: sometimes development and operations just aren't on the same page.

The consumer side expects asset-level hybrid distribution fields:

{
  "torrentUrl": "https://...",
  "infoHash": "<sha1 infohash>",
  "webSeeds": ["https://..."],
  "sha256": "<package digest>"
}

While the publisher side provides a file-level flat list:

{
  "files": [
    {"name": "hagicode-1.2.3-win-x64.zip", "url": "https://..."},
    {"name": "hagicode-1.2.3-win-x64.zip.torrent", "url": "https://..."}
  ]
}

These two are semantically completely mismatched. The consumer side cannot determine from a flat list which file is the main file and which is the sidecar, nor can it establish associations between them. It's like trying to find someone, but only being given a phonebook and told to find them yourself—quite troublesome.

Key Constraints

When designing the solution, we defined several constraints that must be met:

Threshold Consistency: The publisher and consumer must use the same file size threshold. We set it to 100 MB—only files reaching this size generate P2P metadata. This avoids policy drift where "the publisher marks it as acceleratable but the consumer decides not to accelerate." This is actually quite important, after all, if the two sides are inconsistent, various strange bugs will appear.

Fallback Guarantee: webSeeds must include directUrl. This ensures that even without P2P connections (for example, as the first downloader), users can still download the complete file via HTTP. P2P is an acceleration means, not a replacement. It's like driving—P2P is the highway, but you also need to keep regular roads in case the highway is congested.

Compatibility Window: index.json needs to output both assets and files projections. Old clients may not recognize the assets field, so files needs to be retained as a compatibility projection to avoid client interruption due to server upgrades. This is actually quite common, after all, not all users update their clients in a timely manner.
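The first two constraints are mechanical enough to check in code. Here is a minimal TypeScript sketch of such a check. The constant name HYBRID_THRESHOLD_BYTES comes from the consumer side, but its exact byte value is an assumption here (100 × 1024 × 1024), and the validator itself is hypothetical.

```typescript
// Assumed value: the article only says "100 MB"
const HYBRID_THRESHOLD_BYTES = 100 * 1024 * 1024

interface HybridAsset {
  directUrl: string
  torrentUrl: string
  infoHash: string
  webSeeds: string[]
  sha256: string
}

// Threshold consistency: publisher and consumer should share this one predicate
function isHybridEligible(sizeBytes: number): boolean {
  return sizeBytes >= HYBRID_THRESHOLD_BYTES
}

// Returns a list of contract violations; an empty list means the asset is valid
function validateHybridAsset(asset: HybridAsset): string[] {
  const problems: string[] = []
  if (asset.webSeeds.indexOf(asset.directUrl) === -1) {
    problems.push("webSeeds must include directUrl")
  }
  if (!/^[0-9a-f]{40}$/i.test(asset.infoHash)) {
    problems.push("infoHash must be 40 hex characters")
  }
  if (!/^[0-9a-f]{64}$/i.test(asset.sha256)) {
    problems.push("sha256 must be 64 hex characters")
  }
  return problems
}
```

Running the same validation on the publisher before upload and on the consumer after parsing is one way to keep the two sides from drifting apart.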

Technical Decisions

In terms of specific implementation, we adopted a "standalone metadata builder + optional Node bridge script" architecture, rather than implementing torrent generation directly in AzureBlobAdapter.

This has several benefits:

Clear responsibilities: Metadata construction logic is independent of the storage adapter, facilitating testing and maintenance
Platform decoupling: C# environment can call Node scripts to generate torrents, leveraging existing torrent libraries
Migration-friendly: If we need to migrate to other storage backends in the future, the metadata builder can be reused

This is actually a pretty good choice, after all, when responsibilities are clear, subsequent maintenance is much easier.

Solution

1. Metadata Construction Process

The complete metadata construction process looks like this:

Packaging complete → Identify large files (≥100MB) → Calculate sha256 → Generate .torrent sidecar 
→ Extract infoHash → Assemble metadata → Upload ZIP + .torrent → Write index.json

Each step has clear responsibilities:

File identification: Iterate through build artifacts and filter files with size ≥ 100 MB. This threshold is consistent with the consumer side's HYBRID_THRESHOLD_BYTES.

SHA256 calculation: Calculate the SHA256 digest of the main file for integrity verification after download. This is a security line of defense, ensuring that files downloaded by users haven't been tampered with. It's like adding a fingerprint to a file—if it's tampered with, it can be discovered in time.

Torrent generation: Use a Node script to call the torrent library to generate a .torrent sidecar file. Naming follows the {artifact}.zip.torrent format for easy reverse lookup of sidecars from ZIP filenames. This is actually a small trick that makes naming more standardized and convenient for subsequent processing.

InfoHash extraction: Extract the infoHash from the torrent file. It is the SHA-1 hash of the torrent's bencoded info dictionary (40 hex characters) and serves as the unique identifier for the resource in the P2P network. It's like everyone's ID number: without it, the P2P network cannot find the corresponding resource.

Metadata assembly: Assemble directUrl, torrentUrl, infoHash, webSeeds, sha256 into a complete asset metadata object.
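
The assembly step can be sketched roughly as follows. This is a minimal TypeScript sketch, not the publisher's actual C# code: the `PackagedArtifact` and `AssetMetadata` shapes and the `assembleAssetMetadata` name are hypothetical, the threshold is assumed to be 100 × 1024 × 1024 bytes, and the `sha256` and `infoHash` values are assumed to come from the earlier steps.

```typescript
// Assumed value: the article only says "≥ 100 MB"; binary megabytes are a guess.
// It is meant to equal the consumer side's HYBRID_THRESHOLD_BYTES.
const PUBLISHER_THRESHOLD_BYTES = 100 * 1024 * 1024;

// Hypothetical input: one packaged artifact plus digests from earlier steps.
interface PackagedArtifact {
  fileName: string;
  sizeBytes: number;
  directUrl: string;
  sha256: string;
  infoHash?: string; // absent when torrent generation was skipped or failed
}

// Hypothetical output, mirroring the fields named in the article.
interface AssetMetadata {
  name: string;
  directUrl: string;
  sha256: string;
  torrentUrl?: string;
  infoHash?: string;
  webSeeds?: string[];
}

function assembleAssetMetadata(artifact: PackagedArtifact): AssetMetadata {
  const base: AssetMetadata = {
    name: artifact.fileName,
    directUrl: artifact.directUrl,
    sha256: artifact.sha256,
  };
  const isLarge = artifact.sizeBytes >= PUBLISHER_THRESHOLD_BYTES;
  // Small files, or files without a usable torrent, stay HTTP-only.
  if (!isLarge || !artifact.infoHash) {
    return base;
  }
  return {
    ...base,
    // Sidecar naming convention: {artifact}.zip.torrent next to the ZIP.
    torrentUrl: `${artifact.directUrl}.torrent`,
    infoHash: artifact.infoHash,
    // The direct URL doubles as a web seed so HTTP always remains available.
    webSeeds: [artifact.directUrl],
  };
}
```

Returning the plain HTTP-only object for anything that doesn't qualify keeps the decision in one place, so later steps never have to guess whether P2P metadata is complete.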

2. Index Structure Upgrade

Upgrade from a flat files projection to an asset-level assets object:

{
  "versions": [{
    "version": "1.2.3",
    "assets": [{
      "name": "hagicode-1.2.3-win-x64.zip",
      "directUrl": "https://hagicode.blob.core.windows.net/releases/v1.2.3/hagicode-1.2.3-win-x64.zip",
      "torrentUrl": "https://hagicode.blob.core.windows.net/releases/v1.2.3/hagicode-1.2.3-win-x64.zip.torrent",
      "infoHash": "0123456789abcdef0123456789abcdef01234567",
      "sha256": "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
      "webSeeds": [
        "https://hagicode.blob.core.windows.net/releases/v1.2.3/hagicode-1.2.3-win-x64.zip"
      ]
    }],
    "files": [
      {"name": "hagicode-1.2.3-win-x64.zip", "url": "https://..."}
    ]
  }]
}

This structure has several design considerations:

Dual projections coexist: assets provides complete hybrid distribution metadata, while files provides a simplified compatibility view. New clients prefer assets, and old clients fall back to files. This is a compromise, since we can't leave old users behind.

WebSeeds includes DirectUrl by default: Ensures that even without P2P connections, users can still download the complete file via HTTP. This fallback means availability never depends on peers being online. It's like driving: P2P is the highway, but you also keep the regular roads open in case the highway is congested.

Clear naming convention: The {artifact}.zip.torrent naming allows the consumer side to automatically discover sidecars without additional configuration.
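
To make the dual-projection idea concrete, here is a small sketch of how a client might read one version entry. The interface names and both helper functions are hypothetical and only mirror the fields in the JSON above. It is not the actual consumer code.

```typescript
// Hypothetical shapes for one version entry in index.json.
interface IndexAssetEntry {
  name: string;
  directUrl: string;
  torrentUrl?: string;
  infoHash?: string;
  sha256?: string;
  webSeeds?: string[];
}

interface IndexFileEntry {
  name: string;
  url: string;
}

interface IndexVersionEntry {
  version: string;
  assets?: IndexAssetEntry[];
  files?: IndexFileEntry[];
}

// Prefer the rich assets projection; fall back to the flat files projection.
function listDownloadableAssets(entry: IndexVersionEntry): IndexAssetEntry[] {
  if (entry.assets && entry.assets.length > 0) {
    return entry.assets;
  }
  // Legacy entries carry only a name and a URL, so they are HTTP-only.
  return (entry.files ?? []).map((f) => ({ name: f.name, directUrl: f.url }));
}

// The naming convention lets a client derive the sidecar URL from the ZIP URL.
function sidecarUrlFor(zipUrl: string): string {
  return `${zipUrl}.torrent`;
}
```

An old index with only files still produces usable entries; they simply lack torrent fields and are treated as HTTP-only.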

3. Publishing Orchestration

Build.AzureStorage.cs orchestrates the complete process through AzureReleasePublishOrchestrator:

var orchestrator = new AzureReleasePublishOrchestrator(
    new ArtifactHybridMetadataBuilder(), // Build hybrid metadata
    adapter);

summary = await orchestrator.PublishAsync(
    downloadedFiles,
    publishOptions,
    outputPath,
    UploadIndex,
    MinifyIndexJson,
    EffectiveGitHubRepository);

The orchestrator ensures sidecars are uploaded before the index, and outputs diagnostic information in the summary. If publication fails, the summary shows whether the cause was a sidecar generation failure, a missing upload, or an index write failure, which saves troubleshooting time.
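
The ordering guarantee can be illustrated with a small sketch. The `PublishableAsset` shape and the `planUploadOrder` function are hypothetical, and the real orchestrator is C#. The only point shown here is that the index goes last.

```typescript
// Hypothetical description of one asset that is ready to publish.
interface PublishableAsset {
  zipName: string;
  hasSidecar: boolean;
}

// Returns blob names in upload order: every ZIP and sidecar first, index last,
// so a client never reads an index that points at a blob not yet uploaded.
function planUploadOrder(assets: PublishableAsset[], indexName: string = "index.json"): string[] {
  const order: string[] = [];
  for (const asset of assets) {
    order.push(asset.zipName);
    if (asset.hasSidecar) {
      order.push(`${asset.zipName}.torrent`);
    }
  }
  order.push(indexName);
  return order;
}
```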

Practice

Key Code Modules

1. Metadata Consumer

The consumer side builds hybrid distribution metadata from asset objects in index.json:

// http-index-source.ts:418-463
private buildHybridMetadata(asset: HttpIndexAsset, directUrl: string, assetKind: VersionAssetKind): HybridDistributionMetadata {
  const torrentUrl = this.resolveOptionalUrl(asset.torrentUrl);
  const hasTorrentMetadata = Boolean(torrentUrl || asset.infoHash);

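  // WebSeeds includes directUrl by default, ensuring fallback.
  // legacyWebSeeds and structuredWebSeeds are defined in a part of the source not shown in this excerpt.
  const webSeeds = [...legacyWebSeeds, ...structuredWebSeeds];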
  if (directUrl && !webSeeds.some((seed) => seed.toLowerCase() === directUrl.toLowerCase())) {
    webSeeds.push(directUrl);
  }

  return {
    torrentUrl,
    infoHash: asset.infoHash,
    webSeeds,
    sha256: asset.sha256,
    hasTorrentMetadata,
    torrentFirst: hasTorrentMetadata, // Prioritize P2P
    eligible: hasTorrentMetadata,
  };
}

Key design points:

The torrentFirst flag controls download strategy, prioritizing P2P when torrent metadata is available
webSeeds forcibly includes directUrl, ensuring fallback capability
The eligible field indicates whether this asset supports hybrid distribution

It's a small trick: with these flags, the download strategy can be controlled flexibly.

2. Hybrid Download Coordinator

The hybrid download coordinator is responsible for executing the actual download logic:

// hybrid-download-coordinator.ts:83-184
async download(...): Promise<HybridDownloadResult> {
  const policy = this.policyEvaluator.evaluate(version, settings);

  if (policy.useHybrid) {
    try {
      // Prioritize Torrent engine download
      await this.engine.download(version, cachePath, settings, onProgress);
    } catch (error) {
      // Fall back to HTTP/WebSeed when Torrent fails
      await this.downloadViaHttpSources(version, cachePath, packageSource, policy, ...);
    }
  } else {
    // HTTP-only mode
    await packageSource.downloadPackage(version, cachePath, onProgress);
  }

  // sha256 verification ensures integrity
  return await this.verify(version, cachePath, ...);
}

Download strategy:

Evaluate user settings and network environment to decide whether to enable hybrid mode
Prioritize attempting Torrent download (P2P)
Automatically fall back to HTTP/WebSeed on failure
Use SHA256 to verify integrity after download completion

This design keeps the user experience consistent: acceleration when P2P is available, a normal download when it's not.
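
The first step of the strategy, deciding whether to use hybrid mode at all, might look something like the sketch below. Everything here is an assumption made for illustration: the `HybridSettings` and `VersionDescriptor` shapes, the `reason` strings, and the threshold value. The network-environment check the article mentions is deliberately left out.

```typescript
// Hypothetical inputs; the real evaluator also looks at the network environment.
interface HybridSettings {
  enableP2p: boolean;
}

interface VersionDescriptor {
  sizeBytes: number;
  hasTorrentMetadata: boolean;
}

interface DownloadPolicy {
  useHybrid: boolean;
  reason: string;
}

// Assumed value; it must match whatever the publisher uses.
const HYBRID_THRESHOLD_BYTES = 100 * 1024 * 1024;

function evaluatePolicy(version: VersionDescriptor, settings: HybridSettings): DownloadPolicy {
  if (!settings.enableP2p) {
    return { useHybrid: false, reason: "disabled-by-user" };
  }
  if (!version.hasTorrentMetadata) {
    return { useHybrid: false, reason: "no-torrent-metadata" };
  }
  if (version.sizeBytes < HYBRID_THRESHOLD_BYTES) {
    return { useHybrid: false, reason: "below-threshold" };
  }
  return { useHybrid: true, reason: "eligible" };
}
```

Carrying a reason alongside the boolean makes it easier to explain in logs why a given download stayed on HTTP.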

3. Publisher Orchestration

The publisher side coordinates the entire process through an orchestrator:

// Build.AzureStorage.cs:152-168
var orchestrator = new AzureReleasePublishOrchestrator(
    new ArtifactHybridMetadataBuilder(),
    adapter);

summary = await orchestrator.PublishAsync(
    downloadedFiles,
    publishOptions,
    outputPath,
    UploadIndex,
    MinifyIndexJson,
    EffectiveGitHubRepository);

The orchestrator is responsible for:

Calling the metadata builder to generate P2P metadata
Ensuring both main files and sidecars are uploaded to Blob storage
Updating both assets and files projections in index.json
Outputting a publication summary containing diagnostic information

With the orchestrator tying the whole process together, later maintenance is straightforward.

Practical Experience

In implementing this solution, we accumulated some practical experience:

Naming conventions are important: Using {artifact}.zip.torrent makes it easy to look up a sidecar from its ZIP file. The convention seems simple, but in practice it saves a lot of trouble, because the consumer side can discover sidecars automatically without additional configuration.

Failure diagnosis must be clear: Publication summaries need to distinguish clearly between sidecar generation failure, missing upload, and index write failure. We suffered from this in early versions: after a publication failed, we didn't know which step had gone wrong, which made troubleshooting very difficult. Now each step has a clear error message, and locating problems is much faster. Debugging time is also a cost.

Graceful degradation: Assets that don't meet the conditions automatically fall back to HTTP-only and do not block the entire publication. For example, if a file is smaller than 100 MB, or torrent generation fails, no P2P metadata is generated and the asset is served by plain HTTP download. Even if the P2P link has problems, basic functionality isn't affected, and one failed feature never holds up the whole publication process.

Threshold validation: The publisher threshold must be consistent with the consumer side's HYBRID_THRESHOLD_BYTES. We define this value as a constant and test consumer-publisher consistency in CI. If the two diverge, the awkward situation of "publisher thinks it can accelerate but consumer decides not to accelerate" will occur.

SHA256 is the security line: No matter the download channel (P2P, HTTP, WebSeed), everything is verified with SHA256 in the end. This is the last line of defense against file tampering and must never be omitted. When it comes to security, you can't be too careful.
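
The failure-diagnosis point above can be made concrete with a small sketch of a summary that records which stage failed. The stage names, the `StageFailure` shape, and `describeFailures` are all hypothetical; the actual summary format of the orchestrator is not shown in this article.

```typescript
// Hypothetical stage names, matching the three failure modes named above.
type PublishStage = "sidecar-generation" | "upload" | "index-write";

interface StageFailure {
  stage: PublishStage;
  asset: string;
  message: string;
}

// Groups failures by stage so the summary says where publication broke.
function describeFailures(failures: StageFailure[]): string[] {
  if (failures.length === 0) {
    return ["no failures recorded"];
  }
  const stages: PublishStage[] = ["sidecar-generation", "upload", "index-write"];
  const lines: string[] = [];
  for (const stage of stages) {
    const hits = failures.filter((f) => f.stage === stage);
    if (hits.length > 0) {
      const details = hits.map((f) => f.asset + " (" + f.message + ")").join(", ");
      lines.push(stage + ": " + details);
    }
  }
  return lines;
}
```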

Summary

Large file distribution for desktop applications is a classic challenge, and P2P technology provides an elegant solution. Through this hybrid distribution architecture, HagiCode Desktop achieved several key goals:

Lower distribution costs: P2P shares the server's bandwidth load, so distribution stays stable even during peak periods, and the bandwidth bill goes down with it.

Improved user experience: Download speeds increase significantly when P2P connections are available, especially for overseas users. Without P2P connections, a normal HTTP download is still possible, so availability does not depend on peers.

Smooth evolution path: The dual-projection index design lets server and client upgrade independently. Old clients are unaffected, while new clients gradually enable P2P acceleration.

The core idea of this solution is "progressive enhancement": HTTP is the baseline, P2P is the enhancement. This guarantees reliability while leaving room for performance improvement; reliability is never sacrificed for performance.

If you're also working on desktop application distribution, or facing similar large file distribution problems, I hope this solution gives you some inspiration. P2P technology isn't mysterious. The key is designing good contracts between the publisher and consumer sides so that the entire chain works.

References

If this article helped you, feel free to give the project a Star on GitHub: github.com/HagiCode-org/site. The HagiCode Desktop public beta has begun, and you are welcome to install and try it. Every trial brings one more piece of feedback.

Implementing Image Upload and AI Recognition in Chat: A Complete Solution from Design to Implementation

Hagicode — Thu, 07 May 2026 06:47:44 +0000


In AI interaction systems, how can we enable users to upload images and have AI directly recognize them? I've actually struggled with this question for quite a while, but fortunately, I've gained some insights through the practice at HagiCode. Today, let's discuss this image upload and recognition solution—from custom protocol design to file system storage, to front-end and back-end separated preview. This serves as a complete technical note.

Background

In this era of AI chat popularity, visual information is actually an important carrier for users to express their intentions. However, most traditional chat systems only support pure text input, which prevents users from directly passing visual context to AI for analysis—a bit regrettable.

HagiCode also faced similar challenges during development: users couldn't upload images when chatting or creating main opinions, AI couldn't access users' local visual information, and there was a lack of a complete loop from image input, storage, rendering to AI context delivery.

These problems aren't a big deal; they just need some time and patience to solve. We designed and implemented a complete image upload and recognition process, enabling Claude and other AIs to directly recognize and analyze user-uploaded screenshots. Next, I'll detail the implementation of this solution.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI code assistant project that uses OpenSpec-based workflow design and is committed to providing a smarter code writing experience.

Analysis

Technical Challenges

Before starting implementation, we first need to clarify the main challenges we face. After all, sharpening the axe doesn't delay the cutting of the tree.

Cross-module collaboration: Image upload involves multiple modules including frontend UI, upload service, backend API, file storage, message persistence, and AI execution mapping. Each module has its own responsibilities and interfaces, requiring a coordinated overall solution design.

Storage strategy selection: Should images be stored in the database or file system? If choosing file system, how should the directory structure be designed? How to integrate with the existing OpenSpec workflow? These all need careful consideration.

Reference protocol design: A standard image reference method is needed that can be both rendered by the frontend and correctly parsed by the AI execution pipeline. Use file paths directly? HTTP URLs? Or design a dedicated protocol?

AI capability compatibility: Different AI executors have varying degrees of multimodal support. Some executors natively support image input, while others can only process text. How to design a unified adaptation layer to ensure all executors can correctly handle image information?

Design Decisions

After thorough discussion and consideration, we made the following key design decisions.

Decision 1: File System Storage

We chose to store images in the file system rather than the database. The directory structure is designed as follows:

<system-root>/images/<sessionId>/
├── <timestamp>-<uuid>.jpg
└── <timestamp>-<uuid>.png

The rationale is quite clear: it simplifies implementation, avoids database bloat, and lets the AI read the files directly. Image files are not well suited to database storage in the first place, so the file system is the more natural choice. It's like putting books on a bookshelf rather than stuffing them into a notebook.
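
As a rough illustration of the layout, the sketch below builds a storage path from its parts. The function names are hypothetical, the `yyyyMMdd-HHmmss` timestamp format is inferred from the example reference shown later in this article, and the use of UTC is an assumption.

```typescript
// Two-digit zero padding.
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// Formats a Date as yyyyMMdd-HHmmss in UTC (the time zone is an assumption).
function formatImageTimestamp(date: Date): string {
  const day = `${date.getUTCFullYear()}${pad2(date.getUTCMonth() + 1)}${pad2(date.getUTCDate())}`;
  const time = `${pad2(date.getUTCHours())}${pad2(date.getUTCMinutes())}${pad2(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

// Builds "<system-root>/images/<sessionId>/<timestamp>-<uuid>.<ext>".
function buildImagePath(systemRoot: string, sessionId: string, date: Date, uuid: string, ext: string): string {
  const fileName = `${formatImageTimestamp(date)}-${uuid}.${ext}`;
  const root = systemRoot.replace(/\/+$/, ""); // drop trailing slashes
  return [root, "images", sessionId, fileName].join("/");
}
```

Grouping by session means that deleting a session's images is a single directory removal.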

Decision 2: Custom Protocol hagiimag://

To avoid conflicts with HTTP URLs while making reference semantics clearer, we designed a custom image reference protocol:

hagiimag://session-abc123/20260301-143022-a1b2c3d4

This protocol has the format hagiimag://<sessionId>/<imageId>. Its semantics are clear, and it is easy to parse and route. A developer who sees this format immediately understands that it is an image reference, not a regular URL. Design nuances like this can be quite useful.

Decision 3: Frontend Preview and AI Access Separation

During implementation, we discovered that frontend and AI have different access needs for images: the frontend needs to preview through HTTP API, while AI needs to directly read local file paths. Therefore, we designed separated access methods:

Frontend uses /api/Images/{sessionId}/{imageId}/content for preview
AI uses local file paths parsed by the server

This ensures both security (not exposing server paths) and usability (browsers can directly access). After all, security and usability always need to be balanced.

Decision 4: Immediate Upload Strategy

Another key decision is the upload timing. We chose to trigger upload immediately when the user selects or pastes an image, only referencing successfully uploaded images when sending messages.

The benefit is that error handling happens up front, which avoids complexity in the message-sending API and keeps its JSON contract simple. Users know whether an image uploaded successfully before they send, which makes for a better experience. This "prepare for a rainy day" approach applies in many situations.
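
A minimal sketch of what this means for the send path is shown below. The `ChatAttachment` shape and `buildMessagePayload` are hypothetical, and treating failed attachments as blocking until they are retried or removed is an assumption made for this sketch.

```typescript
// Hypothetical attachment states for the upload-on-select flow.
type AttachmentStatus = "uploading" | "uploaded" | "failed";

interface ChatAttachment {
  localId: string;
  status: AttachmentStatus;
  reference?: string; // hagiimag:// reference, set once the upload succeeds
}

// Assumption: anything not yet uploaded (in flight or failed) blocks sending.
function hasBlockingAttachments(attachments: ChatAttachment[]): boolean {
  return attachments.some((a) => a.status !== "uploaded");
}

// The message body only ever carries references to uploaded images.
function buildMessagePayload(text: string, attachments: ChatAttachment[]): { text: string; images: string[] } {
  if (hasBlockingAttachments(attachments)) {
    throw new Error("Resolve pending or failed attachments before sending");
  }
  const images = attachments
    .map((a) => a.reference)
    .filter((ref): ref is string => typeof ref === "string");
  return { text, images };
}
```

Because the payload contains only string references, the message API never has to deal with multipart bodies or partial upload failures.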

Solution

Architecture Design

Based on the above decisions, we designed the following overall architecture:

Frontend Layer
├── ConversationInputArea  ◄─────── useImageAttachmentManager
│       │                             │
│       ├── File selection            ├── Attachment state management
│       ├── Clipboard paste           ├── Upload/retry/delete
│       └── Attachment preview        └── Image reference generation
│
Service Layer
├── ImageUploadService
│       ├── uploadImage()      ◄─────── ImagesController
│       ├── deleteImage()                 │
│       ├── parseHagiImageUrl()  ◄─────── Parse protocol links
│       └── buildPreviewUrl()              │
│
Backend Layer
├── ImagesController           ◄─────── ImagesDomainService
│       │                                  │
│       ├── POST /upload                  ├── File validation
│       ├── GET /{sessionId}/{imageId}    ├── Image saving
│       ├── DELETE                        ├── Image compression
│       └── GET /content                  └── Reference parsing
│
AI Execution Layer
├── ImageContentBlock          ◄─────── StructuredMessageDomainService
│       │                                  │
│       ├── Multimodal executor           ├── Image block parsing
│       └── Text executor fallback        └── Path hint generation

This architecture clearly shows the complete data flow from frontend to AI. Each layer has clear responsibilities and interacts through standard interfaces. Good architecture is like this—each doing its job, not interfering with each other, smooth communication.

Key Processes

Image Upload Process:

User selects images through file selection or clipboard paste
Frontend validates file type and size (JPEG, PNG, WebP, and GIF are supported, up to 10 MB per file)
Calls upload API, image saved to /images/{sessionId}/ directory
API returns hagiimag:// reference and preview URL
Frontend displays preview thumbnail in attachment bar, user can preview before sending
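
The validation step in this flow could be sketched as follows. The MIME type strings are the standard ones for the four formats; the names `ImageCandidate` and `validateImageCandidate`, the error wording, and reading 10 MB as 10 × 1024 × 1024 bytes are assumptions.

```typescript
const ALLOWED_IMAGE_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // 10 MB per file (binary MB assumed)

// The two fields of a browser File object that this check needs.
interface ImageCandidate {
  type: string;
  size: number;
}

function validateImageCandidate(file: ImageCandidate): { ok: boolean; error?: string } {
  if (ALLOWED_IMAGE_TYPES.indexOf(file.type) === -1) {
    return { ok: false, error: `Unsupported image type: ${file.type || "unknown"}` };
  }
  if (file.size <= 0) {
    return { ok: false, error: "File is empty" };
  }
  if (file.size > MAX_IMAGE_BYTES) {
    return { ok: false, error: "File exceeds the 10 MB limit" };
  }
  return { ok: true };
}
```

Frontend checks like this only improve feedback speed; the backend still has to validate the same things, because a client can always be bypassed.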

AI Recognition Process:

User sends message containing image reference
Backend parses hagiimag:// protocol link, extracts sessionId and imageId
Maps image reference to ImageContentBlock
Selects processing method based on executor capability:
- Multimodal executor: passes structured image input
- Text executor: falls back to image path hint

This completes a full loop: user uploads image → AI recognizes image → AI returns analysis results. Such smooth processes often bring better user experience.
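
The capability-based branch at the end of this process can be sketched like this. The article names ImageContentBlock but does not show its fields, so the `ImageBlock` shape, the `ExecutorInput` union, and the wording of the text hint are all hypothetical.

```typescript
// Hypothetical shape; the real ImageContentBlock fields are not shown here.
interface ImageBlock {
  sessionId: string;
  imageId: string;
  localPath: string; // resolved on the server, never sent to the browser
  mediaType: string;
}

type ExecutorInput =
  | { kind: "image"; path: string; mediaType: string }
  | { kind: "text"; text: string };

function toExecutorInput(block: ImageBlock, supportsImages: boolean): ExecutorInput {
  if (supportsImages) {
    return { kind: "image", path: block.localPath, mediaType: block.mediaType };
  }
  // Text-only executors still learn that an image exists and where it lives.
  return { kind: "text", text: `[Attached image: ${block.localPath}]` };
}
```

The important property is that neither branch drops the image: a text-only executor gets a weaker signal, but never silence.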

Practice

Frontend Implementation

On the frontend, we provide a dedicated Hook to manage image attachment state:

import { useImageAttachmentManager } from '@/hooks/useImageAttachmentManager';

function ChatInput() {
  const {
    attachments,
    uploadedImages,
    hasBlockingAttachments,
    isUploading,
    selectFiles,
    removeAttachment,
    clearAttachments,
  } = useImageAttachmentManager({
    ownerId: sessionId,
    mapUploadedImage: (response) => response,
    uploadOptions: { compress: false },
  });

  const handleFileSelect = (files: File[]) => {
    selectFiles(files);
  };

  const handlePaste = (e: ClipboardEvent) => {
    const files = Array.from(e.clipboardData?.files || [])
      .filter(f => f.type.startsWith('image/'));
    if (files.length > 0) {
      handleFileSelect(files);
    }
  };

  return (
    <div>
      {/* Attachment bar */}
      {attachments.map(att => (
        <AttachmentItem
          key={att.localId}
          file={att.file}
          status={att.status}
          onRemove={() => removeAttachment(att.localId)}
        />
      ))}

      {/* Input box */}
      <textarea onPaste={handlePaste} />

      {/* Upload button */}
      <button onClick={() => fileInputRef.current?.click()}>
        Upload Image
      </button>
    </div>
  );
}

This Hook encapsulates all attachment management logic, including upload status tracking, failure retry, attachment deletion, etc. It's very simple to use—just calling a few methods completes the entire process. Good API design is like this—simple and easy to use, yet flexible.

Parsing Custom Protocol:

// Extract sessionId and imageId from custom protocol
const parsed = parseHagiImageUrl("hagiimag://session-abc123/20260301-143022-uuid");
// Returns: { sessionId: "session-abc123", imageId: "20260301-143022-uuid" }

// Build preview URL
const previewUrl = buildPreviewUrl(parsed.sessionId, parsed.imageId);
// Returns: "/api/Images/session-abc123/20260301-143022-uuid/content"

Through these two utility functions, the frontend can easily convert between hagiimag:// protocol and HTTP URLs. This conversion logic is encapsulated, making it much more convenient to use.
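
For readers who want to see what such helpers might look like inside, here is a minimal sketch. It reuses the two function names from the snippet above, but the bodies are our illustration rather than the project's code, and returning null on malformed input is a choice made for this sketch.

```typescript
interface HagiImageRef {
  sessionId: string;
  imageId: string;
}

// Exactly two non-empty path segments after the scheme.
const HAGI_IMAGE_PATTERN = /^hagiimag:\/\/([^/]+)\/([^/]+)$/;

function parseHagiImageUrl(url: string): HagiImageRef | null {
  const match = HAGI_IMAGE_PATTERN.exec(url);
  if (!match) {
    return null;
  }
  return { sessionId: match[1], imageId: match[2] };
}

function buildPreviewUrl(sessionId: string, imageId: string): string {
  // Encoding keeps unexpected characters from changing the route.
  return `/api/Images/${encodeURIComponent(sessionId)}/${encodeURIComponent(imageId)}/content`;
}
```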

Backend Implementation

The backend is implemented with ASP.NET Core, with ImagesController and ImagesDomainService at its core:

[HttpPost("upload")]
[RequestSizeLimit(50 * 1024 * 1024)]
public async Task<ActionResult<ImageUploadResponseDto>> Upload(
    [FromForm] UploadImageFormRequest input)
{
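    // 1. Validate request
    // Assumption: the form request exposes the uploaded file and session ID as
    // File and SessionId; the excerpt does not show how they are obtained.
    var file = input.File;
    var sessionId = input.SessionId;
    if (file == null || file.Length == 0)
        throw new UserFriendlyException("No file provided");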

    // 2. Validate file type and size
    var (isValid, errorMessage) = _imagesDomainService.ValidateImage(
        file.FileName, file.ContentType, file.Length);
    if (!isValid)
        throw new UserFriendlyException(errorMessage);

    // 3. Save to file system
    await using var stream = file.OpenReadStream();
    var result = await _imagesDomainService.UploadImageAsync(
        stream,
        sessionId,
        file.FileName,
        file.ContentType,
        CurrentUserId,
        compress: input.Compress);

    // 4. Return result
    return Ok(result);
}

This implementation follows the typical Web API pattern: validate, process, return. Note that we set a 50 MB request size limit to stop malicious oversized uploads at the request level. In the online world, it's always better to be cautious.

Important Considerations

During implementation, some details need special attention:

Permission validation: Image access must verify user identity, ensuring only images from their own sessions can be accessed. This is a basic security requirement that cannot be omitted. When it comes to security, better safe than sorry.

Path security: Strictly validate sessionId and imageId to prevent path traversal attacks. For example, reject paths containing ../ to prevent users from accessing arbitrary files in the system. Handling these boundary conditions well makes the system more robust.

File cleanup: When sessions are deleted, associated images must be cleaned up synchronously to avoid orphan file accumulation. Over long operation periods, these files may occupy significant disk space. Timely cleanup is also a good habit.

Compression strategy: For screenshot-type filenames (like screenshot.png), automatically enable compression to save space. This strategy can be adjusted according to actual needs. When it comes to storage space, every bit saved helps.

Fallback handling: Executors that don't support multimodal must receive image path hints and cannot silently drop image information. This is important, otherwise users will think the AI ignored their image. User experience depends on these details.

State management: Attachments being uploaded block message sending, failed attachments allow retry or deletion. This design ensures user experience continuity. Clear state management means users won't feel confused.
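
The path security point above is easiest to get right with an allow-list rather than by searching for "../". The sketch below shows the idea; the permitted character set and both function names are assumptions, and the real backend performs this check in C#.

```typescript
// Assumed identifier alphabet: letters, digits, hyphen, underscore.
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

function isSafeSegment(value: string): boolean {
  return SAFE_SEGMENT.test(value);
}

// Returns the path under imagesRoot, or null if either identifier is unsafe.
function resolveImageLocation(imagesRoot: string, sessionId: string, imageId: string): string | null {
  if (!isSafeSegment(sessionId) || !isSafeSegment(imageId)) {
    return null;
  }
  return `${imagesRoot}/${sessionId}/${imageId}`;
}
```

Because dots and slashes are not in the allowed set, traversal sequences, absolute paths, and empty identifiers are all rejected by the same rule.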

Summary

Through this complete image upload and recognition solution, HagiCode achieved a full loop from user input to AI recognition. The core highlights of the entire solution include:

Custom hagiimag:// protocol achieves standardization of image references
File system storage simplifies implementation and avoids database bloat
Frontend preview and AI access separation balances security and usability
Immediate upload strategy optimizes user experience
Multimodal and text fallback compatibility design ensures flexibility

This solution runs stably in HagiCode with positive user feedback. If you're also implementing similar functionality, I hope these experiences are helpful to you.

Actually, when it comes to technical solutions, there's no absolute right or wrong, only what fits or doesn't fit. Finding the path that suits your project is what's most important.

References

HagiCode GitHub: github.com/HagiCode-org/site
HagiCode Official Site: hagicode.com
OpenSpec Workflow Documentation: docs.hagicode.com

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.

Author: newbe36524
Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-05-07-chat-image-upload-ai-recognition%2F
License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.

Customizing OpenSpec Steps to Improve AI Generation Results

Hagicode — Thu, 07 May 2026 02:25:58 +0000


When using OpenSpec to manage technical proposals, we encountered inconsistent quality in AI-generated documentation. There was really no other way but to modify the prompt templates ourselves. This article documents those days.

Background

OpenSpec is a system for managing technical proposals with a simple core idea: input a change description, automatically generate various documentation artifacts. Proposals, designs, specs, tasks—all can be auto-generated. Sounds pretty ideal, right?

But in actual use, we discovered some issues. How should I put it—not major problems, just that the generated output didn't feel quite right.

The generated design.md lacked necessary visual elements—no Mermaid flowcharts, no sequence diagrams, and no architecture diagrams. Such design documents made the technical team shake their heads; after all, who wants to read walls of pure text?

proposal.md was also unsatisfactory, lacking code change tables and UI prototypes. Decision-makers could stare at it for ages and still not understand what the change actually modified.

More frustrating was tasks.md, which mixed in various Git operation tasks. Responsibility boundaries became unclear, and developers looking at these tasks didn't know what they should or shouldn't do. There's no one to blame here; after all, the AI doesn't know your team's division of labor.

Visualization requirements for different document levels were also unclear. What charts should proposal and design contain? This question constantly troubled the team.

Where's the root of these problems? After analysis, we discovered the key point: the prompt templates lacked clear constraints and guidance.

This isn't surprising—after all, templates themselves are generic and can't perfectly adapt to every team's needs.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI code assistant project, and we heavily use OpenSpec to manage technical proposals during development.

It was precisely these real-world experiences that led to the birth of this improvement plan. Actually, it's nothing special—just encountering problems and solving them.

Analysis: Prompt System Architecture

To solve problems, first understand the system. Let's see how OpenSpec's prompt system works.

OpenSpec uses the Handlebars template system, where each prompt contains two parts:

JSON metadata file: Defines parameters, scenarios, version information
Handlebars template file: Contains actual prompt content

Resources/Prompts/
├── openspec-v1-ff.zh-CN.json    # metadata
├── openspec-v1-ff.zh-CN.hbs     # template content
├── openspec-v1-ff.en-US.json
└── openspec-v1-ff.en-US.hbs

The advantages of this separation design are obvious: metadata and content are managed separately, facilitating maintenance and localization. It's also a bit like writing code—separation of logic and presentation, everyone understands this principle.
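
The naming scheme in the listing above is regular enough to parse mechanically. The sketch below shows one way to do it; the function, the returned field names, and the locale pattern are our own assumptions, not part of OpenSpec.

```typescript
interface PromptFileInfo {
  promptId: string; // e.g. "openspec-v1-ff"
  locale: string;   // e.g. "zh-CN"
  kind: "metadata" | "template";
}

// <promptId>.<locale>.json holds metadata, <promptId>.<locale>.hbs the template.
const PROMPT_FILE_PATTERN = /^(.+)\.([A-Za-z]{2,3}-[A-Za-z0-9]{2,4})\.(json|hbs)$/;

function parsePromptFileName(fileName: string): PromptFileInfo | null {
  const match = PROMPT_FILE_PATTERN.exec(fileName);
  if (!match) {
    return null;
  }
  return {
    promptId: match[1],
    locale: match[2],
    kind: match[3] === "json" ? "metadata" : "template",
  };
}
```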

The FF (Fast Forward) workflow is OpenSpec's core generation process:

flowchart TD
    A[User inputs change description] --> B[Create change directory]
    B --> C[Get artifact build order]
    C --> D[Create artifacts in dependency order]
    D --> E[Check planning direction requirements]
    E --> F[Verify artifact completeness]
    F --> G[Display final state]

This process looks perfect, but the problem lies in the "planning direction requirements" step—it lacks sufficiently clear guidance.

That's understandable; after all, a system's designers can't anticipate every team's specific needs.

Planning Direction System

The planning direction system is OpenSpec's core customization mechanism, allowing users to select different generation options. The HagiCode project defines the following directions:

| Direction ID | Function | Default Enabled |
|--------------|----------|-----------------|
| `explore` | Exploration mode | Yes |
| `change-map` | Change map | Yes |
| `flowchart` | Interactive flowchart | Yes |
| `prototype` | UI prototype | Yes |
| `architecture` | Architecture diagram | Yes |
| `sequence` | API sequence diagram | Yes |

Each direction defines stable identifiers, default enabled states, display labels, and Chinese/English prompt fragments.

This system design is clever, but in HagiCode's practice, we discovered that having definitions alone isn't enough—the planning directions need to be explicitly used in prompt templates.

This is also a bit like many things in life: having options doesn't mean making choices; someone still needs to tell you how to choose.

Solution: Clear Constraints and Examples

Our improvement approach is straightforward: add clear constraints and reference examples to prompt templates.

Actually, there's nothing special—just making things clear.

1. Add Document Visualization Requirements

In the openspec-v1-ff.zh-CN.hbs template, we added explicit content scope constraints:

### tasks.md Content Scope Constraints

When creating `tasks.md` artifacts, the following content scope constraints must be observed:

Must include:
- Business logic tasks (code implementation, feature development)
- Technical implementation tasks (component integration, API development)
- Testing tasks (unit tests, integration tests)
- Documentation tasks (updating documentation, adding comments)

Must not include:
- Git commit operations (git add, git commit, git push)
- Version control management workflows
- Deployment and release operations

Using standardized "MUST/MUST NOT" language rather than "suggested" or "may" allows AI to more accurately understand constraints.

This is also a bit like teaching children—say what you mean, no ambiguity allowed.

2. Provide Reference Examples for Each Direction

Just saying "include flowcharts" isn't enough. We provided specific output examples for each enabled direction.

After all, talk is cheap—give a concrete example, and AI can better understand.

Change map direction example:

| File path | Change type | Change reason | Impact scope |
|-----------|-------------|---------------|-------------|
| Path/to/file | Add | Description | Module name |

Prototype direction example:

┌─────────────────────────────────────────┐
│ User Login                            [×] │
├─────────────────────────────────────────┤
│  Email address *                       │
│ ┌─────────────────────────────────────┐ │
│ │ user@example.com                   │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘

Sequence direction example:

sequenceDiagram
    participant U as User
    participant UI as Login Interface
    participant API as Backend API
    U->>UI: Click login button
    UI->>API: POST /api/auth/login

These examples allow AI to accurately understand the expected output format rather than improvising.

This is also a bit like providing reference answers during an exam—while not exactly the same, the format should at least be correct.

3. Use Standardized Language for Clear Requirements

For visualization requirements of different document types, we use standardized language to constrain:

For proposal.md:
- Must include code change table (when change-map direction enabled)
- Must include UI prototype (when involving UI changes and prototype direction enabled)
- Must not include detailed architecture diagrams (these should be in design.md)

For design.md:
- Must include all proposal.md content (more detailed version)
- Must include architecture diagram (when architecture direction enabled)
- Must include data flow diagram (when flowchart direction enabled)

These clear constraints significantly improved generation quality.
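
One way to keep conditional rules like these in a single place is a small rule table keyed by document type. The sketch below is illustrative only: `RULES`, `constraintsFor` and the direction IDs are assumptions made for this example, not HagiCode's actual code.

```javascript
// Hypothetical rule table: each rule applies to one document type and,
// optionally, only when a given planning direction is enabled.
const RULES = [
  { doc: "proposal.md", when: "change-map", text: "MUST include a code change table" },
  { doc: "proposal.md", when: "prototype", text: "MUST include a UI prototype when the change touches the UI" },
  { doc: "proposal.md", when: null, text: "MUST NOT include detailed architecture diagrams (these belong in design.md)" },
  { doc: "design.md", when: "architecture", text: "MUST include an architecture diagram" },
  { doc: "design.md", when: "flowchart", text: "MUST include a data flow diagram" },
];

// Return the constraint lines for one document, given the enabled direction IDs.
function constraintsFor(doc, enabledDirections) {
  const enabled = new Set(enabledDirections);
  return RULES
    .filter((rule) => rule.doc === doc)
    .filter((rule) => rule.when === null || enabled.has(rule.when))
    .map((rule) => `- ${rule.text}`);
}

console.log(constraintsFor("proposal.md", ["change-map"]).join("\n"));
```

Keeping the condition next to the rule text means a disabled direction removes its constraint from the prompt entirely, instead of leaving the AI to decide whether it applies.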

Actually, there's nothing else—just making things clear, don't let AI guess.

Practice: Code Implementation

With the theory covered, let's see how it is implemented in the HagiCode project.

Define Planning Directions

Define planning directions in ProposalPlanningDirections.cs:

public static class ProposalPlanningDirections
{
    private static readonly ProposalPlanningDirectionDefinition[] Catalog =
    [
        new(
            ChangeMapId,
            "Change map",
            DefaultEnabled: true,
            EnglishPromptFragment:
            "- Change map: include structured file-impact views...",
            ChinesePromptFragment:
            "- 变更地图：加入结构化的文件影响视图..."),
        // ... other directions
    ];

    public static string RenderInstructionBlock(
        IEnumerable<ProposalPlanningDirectionState> directions,
        string? locale)
    {
        var enabledDirections = directions
            .Where(direction => direction.Enabled)
            .ToArray();

        if (enabledDirections.Length == 0)
        {
            return string.Empty;
        }

        var heading = IsChineseLocale(locale)
            ? "本次生成启用以下规划方向："
            : "Apply the following planning directions:";

        return string.Join(Environment.NewLine,
            [heading, .. enabledDirections.Select(d => d.GetPromptFragment(locale))]);
    }
}

This code has several noteworthy design points:

Arrays rather than lists, because the definitions don't change at runtime
Early exit: no text is generated when no direction is enabled
Multi-language support, selecting the appropriate prompt fragment based on locale

Actually, there's nothing special here, just some conventional code design.

Template Parameterization

Use conditional statements in Handlebars templates:

{{#if planningDirectionInstructions}}
## Planning Directions for This Generation

{{{planningDirectionInstructions}}}
{{/if}}

**Steps**
1. **If input not provided, use reasonable defaults**
2. **Create change directory**
3. **Get artifact build order**
4. **Create artifacts sequentially until apply-ready**
   a. For each ready artifact:
      - Get instructions
      - Read dependency files
      - Create artifact file

Note the triple braces in {{{planningDirectionInstructions}}}: they tell Handlebars not to HTML-escape the value, which preserves content such as Mermaid code blocks.
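
To see why escaping matters here, consider what HTML-escaping does to a Mermaid arrow. The sketch below uses a simplified stand-in for a template engine's escaping step. It is not Handlebars itself, and `escapeHtml` is a hypothetical helper that covers only a few characters.

```javascript
// Simplified stand-in for the HTML escaping a template engine applies to
// double-brace expressions. Not Handlebars' actual implementation.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

const mermaidLine = "U->>UI: Click login button";

// Escaped: roughly what a double-brace expression would produce.
console.log(escapeHtml(mermaidLine)); // U-&gt;&gt;UI: Click login button

// Unescaped: what a triple-brace expression passes through unchanged.
console.log(mermaidLine); // U->>UI: Click login button
```

The escaped form is no longer valid Mermaid syntax, which is the failure the triple braces avoid.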

This is also a bit like compromise in life—sometimes you need to keep some original content, can't escape everything.

Prompt Loading Implementation

Implement parameterized prompt loading through FilePromptProvider:

public async Task<string> GetOpenspecV1FfPromptAsync(
    string changeName,
    string changeDescription,
    string locale = "en-US",
    string? planningDirectionInstructions = null,
    CancellationToken cancellationToken = default)
{
    var parameters = new Dictionary<string, object>
    {
        { "planningDirectionInstructions",
          ResolvePlanningDirectionInstructions(locale, planningDirectionInstructions) }
    };

    if (!string.IsNullOrWhiteSpace(changeName))
    {
        parameters["changeName"] = changeName;
    }

    return await GetPromptWithParametersAsync(
        PromptScenario.OpenspecV1Ff,
        locale,
        cancellationToken,
        parameters) ?? string.Empty;
}

This design is flexible: planningDirectionInstructions is optional—if not provided, the system uses default configuration.

After all, no one wants to pass in a bunch of parameters every time; having a default value is always good.

Validation and Testing

After implementation, the HagiCode team conducted comprehensive validation:

When Specific Directions Are Enabled

Check if generated proposal.md contains code change table
Check if generated design.md contains architecture diagrams
Verify tasks.md doesn't include Git operation tasks

When Specific Directions Are Disabled

Verify corresponding visualization content isn't generated
Ensure other directions' output isn't affected

Edge Cases

Behavior when all directions are disabled
Error handling for invalid direction IDs

These tests ensure system stability and predictability—critical for team adoption of new tools.
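
The edge cases above can be pinned down with a couple of checks. The following is a JavaScript sketch of the same rendering idea, not a port of the C# class. The catalog entries are made up for the example, and rejecting unknown IDs is an assumption, since the article does not show how HagiCode handles them.

```javascript
// Hypothetical catalog: direction ID -> prompt fragment.
const CATALOG = new Map([
  ["change-map", "- Change map: include structured file-impact views"],
  ["flowchart", "- Flowchart: include sequence or flow diagrams"],
]);

// Turn a list of { id, enabled } states into prompt text.
// Unknown IDs are rejected rather than silently ignored (an assumption).
function renderInstructionBlock(states) {
  for (const state of states) {
    if (!CATALOG.has(state.id)) {
      throw new Error(`Unknown planning direction: ${state.id}`);
    }
  }
  const enabled = states.filter((state) => state.enabled);
  if (enabled.length === 0) return ""; // nothing enabled, nothing rendered
  const heading = "Apply the following planning directions:";
  return [heading, ...enabled.map((state) => CATALOG.get(state.id))].join("\n");
}

// Edge case 1: all directions disabled produces an empty block.
console.log(JSON.stringify(renderInstructionBlock([{ id: "change-map", enabled: false }]))); // ""

// Edge case 2: an invalid direction ID fails loudly.
try {
  renderInstructionBlock([{ id: "no-such-direction", enabled: true }]);
} catch (error) {
  console.log(error.message); // Unknown planning direction: no-such-direction
}
```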

Actually, there's nothing special—just test what should be tested, after all, no one wants problems after going live.

Considerations

When implementing this solution, avoid these pitfalls:

Template synchronization: When modifying templates, keep them in sync with upstream. The HagiCode team once hit a template conflict that took half a day to resolve. There is little to be done about it; upgrades always bring some compatibility issues.

Bilingual consistency: Ensure the Chinese and English templates have consistent structure and constraints. We once had constraints in the Chinese version that were missing from the English one, which led to inconsistent document quality. That was awkward, since there is no telling which language a user will pick.

Performance impact: Planning direction rendering should complete in microseconds. If rendering takes too long, it affects user experience. After all, who wants to wait ages to see results?

Backward compatibility: Maintain support for old API versions. For example, old code still passes the enableExploreMode parameter even though we now use the planning direction system. We can't require everyone to upgrade at once.

Clear expression: Use standardized language (MUST/MUST NOT) rather than suggestive language. This point was fully validated in HagiCode's practice. There is nothing more to it than making things clear.

Summary

By customizing OpenSpec prompt steps, we successfully improved the quality of AI-generated documentation. Key improvements include:

Adding clear constraint conditions to prompt templates
Providing specific output examples for each planning direction
Using standardized language (MUST/MUST NOT) to constrain AI behavior
Implementing flexible parameterized prompt loading through code

This solution was validated in the HagiCode project, with significantly improved document quality: design documents include complete visual elements, proposal documents have clear code change tables, and task lists have clear responsibilities.

Actually, it's nothing special—just solving the problem.

If you're also using similar AI-assisted documentation generation systems, I hope these experiences help you. Remember: clear constraints and concrete examples are key to obtaining high-quality output.

After all, for some things, it's better to be clear...

How to Integrate GPT, Claude, and Other AI Models Using Copilot CLI

Hagicode — Wed, 06 May 2026 11:11:59 +0000

In AI application development, how can you use a unified interface to integrate multiple models like GPT and Claude? This article shares our AI provider system design based on Orleans Grain architecture and practical GitHub Copilot CLI integration experience.

Background

In modern AI application development, integrating the latest GPT models is a core requirement for many developers. GitHub Copilot CLI is a powerful tool that not only supports OpenAI's GPT series models (such as GPT-4, GPT-5), but also other mainstream AI models like Claude. Through Copilot CLI, developers can call different AI models using a unified command-line interface without implementing complex integration logic for each model separately.

Actually, this is a long-standing issue. Writing separate call logic for each model is painful. No one enjoys writing repetitive code, and rather than reinventing the wheel, it's better to find a unified interface that handles everything. Copilot CLI is exactly that kind of tool: you just call it and let it handle the rest.

Core Values:

Unified CLI interface for accessing multiple AI models
Support for session management and context retention
Built-in tool calling capabilities (file operations, Git operations, etc.)
Support for streaming responses and real-time output

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI code assistant project. During development, we encountered the challenge of needing to support multiple AI models simultaneously—some users are accustomed to using GPT-4, some prefer Claude, and others want to try the latest GPT-5. If we implemented separate call logic for each model, the code would become difficult to maintain. Through Copilot CLI's unified interface, we successfully solved this multi-model support pain point.

To put it plainly, users just have diverse tastes—difficult to please everyone. Some like GPT, some prefer Claude, and others insist on using the latest GPT-5. We just want everyone to be able to use their favorite model—after all, being happy is what matters most.

System Architecture Design

We implemented an extensible AI provider system through Orleans Grain architecture with the following overall structure:

┌─────────────────┐
│  Frontend/Client │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│  IGitHubCopilotGrain (Interface)│
│  - ExecuteCommandStreamAsync    │
│  - RunEditAsync                 │
│  - CancelAsync                  │
└────────┬────────────────────────┘
         │
         ▼
┌─────────────────────────────────┐
│  GitHubCopilotGrain (Implementation)│
│  - State Management             │
│  - Session Binding              │
│  - Response Mapping             │
└────────┬────────────────────────┘
         │
         ▼
┌─────────────────────────────────┐
│  CopilotAIProvider (Provider)   │
│  - Configuration Parsing        │
│  - Permission Management        │
│  - Streaming Processing         │
└────────┬────────────────────────┘
         │
         ▼
┌─────────────────────────────────┐
│  HagiCode.Libs (Shared Runtime) │
│  - Copilot CLI Process Management│
│  - Message Protocol Parsing     │
│  - Session Retention            │
└─────────────────────────────────┘

The advantage of this architecture is clear layering with single responsibilities. The interface layer defines the unified AI service contract, the implementation layer handles Orleans' distributed state management, the provider layer encapsulates Copilot CLI interaction details, and the underlying runtime is responsible for communicating with the CLI process.

To put it simply, clarify who does what—don't mix things up. After all, once code gets messy, it's hard to change later.

Core Component Analysis

1. GitHubCopilotGrain: Distributed AI Service Interface

As an Orleans Grain implementation, GitHubCopilotGrain provides distributed AI service capabilities:

public interface IGitHubCopilotGrain : IGrainWithStringKey
{
    /// <summary>
    /// Execute command and stream response
    /// </summary>
    Task<IAsyncEnumerable<GitHubCopilotResponse>> ExecuteCommandStreamAsync(
        string command,
        string? heroId = null,
        CancellationToken token = default,
        string? executionMessageId = null,
        string? systemMessage = null,
        Dictionary<string, string>? requestSettings = null);

    /// <summary>
    /// Execute edit operation
    /// </summary>
    Task<IAsyncEnumerable<GitHubCopilotResponse>> RunEditAsync(
        string editCommand,
        string? heroId = null,
        CancellationToken token = default);

    /// <summary>
    /// Cancel current execution
    /// </summary>
    Task CancelAsync(string heroId);
}

Key Design Points:

Using IAsyncEnumerable to support streaming responses, avoiding long wait times
Session-level state isolation through heroId
Support for passing requestSettings to dynamically configure model parameters

2. CopilotAIProvider: Core Provider Implementation

CopilotAIProvider is the core of the entire solution, encapsulating all interaction logic with Copilot CLI:

public class CopilotAIProvider : IAIProvider, IVersionedAIProvider
{
    private readonly CopilotOptions _options;
    private readonly ICopilotProcessExecutor _executor;

    public async IAsyncEnumerable<AIStreamingChunk> SendMessageAsync(
        AIRequest request,
        string? embeddedCommandPrompt = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Build execution options
        var options = new CopilotOptions
        {
            Model = request.Model ?? _options.Model,
            SessionId = request.Options?.Settings?.GetValueOrDefault("copilotSessionId"),
            Timeout = _options.Timeout,
            PermissionMode = request.OperationType == AIOperationType.Edit
                ? CopilotPermissionMode.BypassPermissions
                : CopilotPermissionMode.Default
        };

        // Execute command and stream process response
        await foreach (var message in _executor.ExecuteAsync(
            options, request.Prompt, cancellationToken))
        {
            yield return BuildChunk(message);
        }
    }
}

Core Features:

Automatic retry mechanism: Handles transient network issues and CLI process exceptions
Reasoning content tracking: Captures the model's reasoning process (reasoning field)
Multiple message type handling: Supports assistant, tool.started, tool.completed and other messages
Permission mode switching: Edit operations automatically use bypassPermissions, regular queries use default

3. CopilotOptions: Flexible Configuration System

The configuration class supports rich option settings:

public class CopilotOptions
{
    /// <summary>
    /// Specify the model to use, such as "gpt-4", "gpt-5", "claude-opus-4.5"
    /// </summary>
    public string Model { get; set; } = "gpt-4";

    /// <summary>
    /// Copilot CLI executable path
    /// </summary>
    public string ExecutablePath { get; set; } = "copilot";

    /// <summary>
    /// Session timeout
    /// </summary>
    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(1800);

    /// <summary>
    /// Authentication method
    /// </summary>
    public CopilotAuthSource AuthSource { get; set; } = CopilotAuthSource.LoggedInUser;

    /// <summary>
    /// Permission mode
    /// </summary>
    public CopilotPermissionMode PermissionMode { get; set; } = CopilotPermissionMode.Default;

    /// <summary>
    /// Session ID for maintaining context
    /// </summary>
    public string? SessionId { get; set; }

    /// <summary>
    /// Tool permission configuration
    /// </summary>
    public CopilotToolPermissions? Permissions { get; set; }
}

Configuration is about being adequate—after all, who wants to write a bunch of configurations they'll never use? Covering most scenarios is enough.

Configuration Guide

1. Basic Configuration

Add Copilot provider configuration in appsettings.json:

{
  "AI": {
    "Providers": {
      "Providers": {
        "GitHubCopilot": {
          "Enabled": true,
          "ExecutablePath": "copilot",
          "Model": "gpt-5",
          "Timeout": 1800,
          "IdleTimeout": 300,
          "UseLoggedInUser": true,
          "NoAskUser": true,
          "PermissionMode": "default",
          "Permissions": {
            "AllowAllTools": false,
            "AllowAllPaths": false,
            "AllowedTools": ["Read", "Bash(git:*)", "Bash(cat:*)"],
            "DeniedTools": []
          }
        }
      }
    }
  }
}

2. Model Selection

The system supports the following models (specified via Copilot CLI's --model parameter):

| Model | Description | Recommended Scenarios |
|-------|-------------|-----------------------|
| gpt-4 / gpt-4-turbo | OpenAI 4th generation models | General tasks, cost-effective |
| gpt-5 | OpenAI latest 5th generation model | Complex reasoning, best performance |
| claude-sonnet-4.5 | Anthropic Sonnet 4.5 | Balance performance and cost |
| claude-opus-4.5 | Anthropic Opus 4.5 | High-precision tasks |

In HagiCode's practice, we use GPT-4 as the default daily model, switch to GPT-5 for complex tasks (like large refactoring), and offer Claude models as an alternative for users who prefer Anthropic.

3. Register Services

// Register Copilot AI provider
services.AddSingleton<IAIProvider, CopilotAIProvider>();

// Orleans discovers and activates grains itself; resolve GitHubCopilotGrain
// through IGrainFactory.GetGrain rather than registering it in the DI container

// Register process executor
services.AddSingleton<ICopilotProcessExecutor, CopilotProcessExecutor>();

Actually just these few lines—nothing special. Just register what needs to be registered so it can be found when needed.

Practice Examples

1. Basic Call

// Get Grain
var grain = grainFactory.GetGrain<IGitHubCopilotGrain>("session-123");

// Execute command (the grain method returns Task<IAsyncEnumerable<...>>, so await it before enumerating)
await foreach (var response in await grain.ExecuteCommandStreamAsync(
    "Analyze the code structure of the current directory and generate documentation",
    heroId: null,
    token: cancellationToken))
{
    switch (response.Type)
    {
        case ExecutorResponseType.Text:
            Console.Write(response.Content);
            break;
        case ExecutorResponseType.ToolCall:
            Console.WriteLine($"[Tool Call] {response.ToolName}");
            break;
        case ExecutorResponseType.Completion:
            Console.WriteLine($"\n[Complete] Token Usage: {response.PromptTokens}+{response.CompletionTokens}");
            break;
    }
}

2. Context-Aware Session

var requestSettings = new Dictionary<string, string>
{
    { "model", "gpt-5" },
    { "temperature", "0.7" },
    { "maxTokens", "4096" },
    { "copilotSessionId", "existing-session-123" }  // Maintain session context
};

await foreach (var response in await grain.ExecuteCommandStreamAsync(
    "Based on the previous analysis, generate corresponding unit tests",
    requestSettings: requestSettings,
    token: cancellationToken))
{
    // Handle response
}

3. Edit Mode Call

await foreach (var response in await grain.RunEditAsync(
    "Convert all PascalCase naming to camelCase",
    heroId: "hero-001",
    token: cancellationToken))
{
    if (response.Type == ExecutorResponseType.FileEdit)
    {
        Console.WriteLine($"[Edit] {response.FilePath}: {response.EditCount} changes");
    }
}

Best Practices

Session Retention

Using the copilotSessionId parameter allows maintaining context across requests, which is very useful for scenarios requiring multi-turn dialogue. For example:

// Round 1: Establish context
var settings1 = new Dictionary<string, string> { { "copilotSessionId", "session-001" } };
await foreach (var _ in await grain.ExecuteCommandStreamAsync(
    "This is a C# project using .NET 8", requestSettings: settings1))
{ /* consume the stream; responses are ignored here for brevity */ }

// Round 2: Ask based on context
var settings2 = new Dictionary<string, string> { { "copilotSessionId", "session-001" } };
await foreach (var _ in await grain.ExecuteCommandStreamAsync(
    "Recommend suitable project structure", requestSettings: settings2))
{ /* consume the stream; responses are ignored here for brevity */ }

After all, AI isn't omnipotent—without context, how does it know what you're talking about? Like chatting, you need back-and-forth to keep the conversation going.

Permission Control

Choose the appropriate permission mode based on operation type:

Query operations: Use default mode, allowing AI to only read files and execute safe Git commands
Edit operations: Use bypassPermissions mode, allowing AI to modify files

var permissionMode = operationType == AIOperationType.Edit
    ? CopilotPermissionMode.BypassPermissions
    : CopilotPermissionMode.Default;

Tool Whitelist

Control which operations the AI may execute through the AllowedTools configuration:

{
  "Permissions": {
    "AllowAllTools": false,
    "AllowedTools": [
      "Read",
      "Bash(git:*)",
      "Bash(cat:*)",
      "Glob"
    ]
  }
}

In HagiCode, we strictly limit AI's operation permissions, only allowing file reading and Git command execution to ensure system security.
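
How these patterns are evaluated internally is not covered in this article. As a rough mental model only, an entry like Bash(git:*) can be read as "the Bash tool, limited to commands that start with git". The sketch below works under that assumption. `isToolAllowed` and its parsing rule are hypothetical and do not reproduce the actual matching logic of Copilot CLI or HagiCode.

```javascript
// Hypothetical matcher. An entry is either a bare tool name ("Read") or
// "Tool(prefix:*)", read here as "Tool, limited to commands starting with prefix".
function isToolAllowed(allowedTools, toolName, command = "") {
  return allowedTools.some((entry) => {
    const scoped = /^(\w+)\((.+):\*\)$/.exec(entry);
    if (!scoped) return entry === toolName;
    const [, name, prefix] = scoped;
    return name === toolName && (command === prefix || command.startsWith(prefix + " "));
  });
}

const allowed = ["Read", "Bash(git:*)", "Bash(cat:*)", "Glob"];
console.log(isToolAllowed(allowed, "Read"));               // true
console.log(isToolAllowed(allowed, "Bash", "git status")); // true
console.log(isToolAllowed(allowed, "Bash", "rm -rf ."));   // false
console.log(isToolAllowed(allowed, "Write"));              // false
```

Under this reading, anything not named in the list is denied by default, which is the point of setting AllowAllTools to false.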

After all, you can't be too careful with security. Who knows if the AI might suddenly delete your entire project?

Timeout Handling

The default timeout is set to 30 minutes. For operations involving large numbers of files (like full code analysis), adjustments may be needed:

var options = new CopilotOptions
{
    Timeout = TimeSpan.FromMinutes(60)  // Extend to 60 minutes
};

Common Questions

Q: How to switch between different AI models?

A: Specify it through the Model configuration or through requestSettings:

var settings = new Dictionary<string, string> { { "model", "claude-opus-4.5" } };

Actually just changing a parameter—nothing complex.

Q: How long can session context be maintained?

A: It depends on the Copilot CLI implementation. A session is usually cleaned up after its idle timeout (5 minutes by default), which can be adjusted through the IdleTimeout configuration.

Q: How to handle CLI process crashes?

A: CopilotAIProvider has a built-in automatic retry mechanism that captures process exceptions and restarts the CLI. If consecutive failures exceed a threshold, it will throw an AIProviderException.
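
The retry-then-give-up behavior described here can be sketched in a few lines. This is a generic illustration in JavaScript, not the CopilotAIProvider implementation: `runWithRetry`, the `restart` callback, the default threshold of 3 and the `AIProviderError` class are all assumptions for the example.

```javascript
// Hypothetical error type, named after the exception mentioned in the text.
class AIProviderError extends Error {}

// Run `attempt` until it succeeds or `maxFailures` consecutive failures occur.
// `restart` stands in for relaunching the CLI process between attempts.
async function runWithRetry(attempt, restart, maxFailures = 3) {
  let lastError;
  for (let failures = 0; failures < maxFailures; failures++) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error;
      // Only restart when another attempt will follow.
      if (failures + 1 < maxFailures) await restart();
    }
  }
  throw new AIProviderError(`Gave up after ${maxFailures} failures: ${lastError.message}`);
}

// Usage: the first two attempts crash, the third succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error("process exited");
  return "ok";
};
runWithRetry(flaky, async () => {}).then((result) => console.log(result, calls)); // ok 3
```

A bounded threshold keeps transient crashes invisible to the caller while still surfacing a persistent failure instead of looping forever.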

Program crashes are unavoidable. You can only do your best with fault tolerance—if it really goes down, just restart it.

Q: Are custom tools supported?

A: The tools supported by Copilot CLI are predefined, but you can control which tools are available through AllowedTools configuration. Custom tools require waiting for future Copilot CLI updates.

Summary

By integrating multiple AI models through Copilot CLI, we solved the multi-model support challenge in HagiCode development. The core advantages of this solution are:

Unified Interface: One codebase supporting multiple models like GPT, Claude, etc.
Session Management: Automatic context retention and session isolation
Tool Integration: Built-in common tools like file operations, Git operations
Streaming Response: AI output is returned in real time, improving the user experience
Security and Control: Fine-grained permission control and tool whitelisting

If your project also needs to support multiple AI models, or you're looking for a mature CLI tool integration solution, why not try Copilot CLI? This architecture has been fully validated in HagiCode and can handle complex production environment requirements.

After all, who wants to write separate call code for each model? Having a unified solution saves everyone trouble.

If this article helps you:

Give us a Star on GitHub: github.com/HagiCode-org/site
Visit the official website to learn more: hagicode.com
Watch the official version demo video: www.bilibili.com/video/BV1z4oWB3EpY/
One-click install to experience: docs.hagicode.com/installation/docker-compose
Desktop quick install: hagicode.com/desktop/
Beta testing has started, welcome to install and experience

Building Multi-Platform code-server and OmniRoute with GitHub Actions

Hagicode — Wed, 06 May 2026 06:04:29 +0000

Facing the need to build and publish across Linux, macOS, and Windows platforms with unified releases, we designed a GitHub Actions-based multi-platform CI/CD pipeline. It's not that difficult when you think about it, but the roadblocks can definitely make you pull your hair out. This article shares the design philosophy and implementation details of this pipeline—including, of course, the pits we stepped in along the way.

Background

code-server is an open-source project that runs VS Code in a browser, allowing developers to work through a web IDE on a remote server. As HagiCode Desktop integrates code-server as its built-in runtime, we need to build, verify, and distribute customized versions of code-server across different operating systems (Linux, macOS, Windows).

This should have been pretty straightforward, but... when is life ever that easy?

Meanwhile, OmniRoute as a multi-model routing service also needs to share the same build and release pipeline with code-server. Although the two packages are built differently, they ultimately need to converge into the same GitHub Release. Like two originally non-intersecting lines, they still meet at some point in the end—call it fate, I suppose.

This brings several engineering challenges:

Cross-platform build differences: The build toolchains for Linux, macOS, and Windows are completely different (Linux uses quilt + bash, macOS uses Homebrew, Windows requires MSYS2)—each platform has its own temperament
Build artifact verification: After building, artifacts need to be automatically verified to start properly—after all, nobody wants to release something that doesn't run
Unified version management: Two packages need to share the same version number and release tag—like two people sharing one name, there needs to be a system
Parallel builds with serial publishing: Builds can run in parallel, but publishing needs coordination—this is where mistakes happen, and when they do, they're really mistakes

About HagiCode

The solution shared in this article comes from practical experience in the HagiCode project. HagiCode is an AI code assistant project that integrates code-server as a built-in runtime in its desktop product, thus requiring engineering solutions for multi-platform building and publishing. This is, to put it bluntly, just about getting the product out—nothing more.

Limitations of Upstream Build Pipeline

The code-server upstream project's own CI/CD pipeline (build.yaml) only builds for the linux-x64 platform, and its release process (publish.yaml) only targets npm, AUR, and Docker channels. It doesn't support:

Native builds for macOS and Windows—perhaps they don't think these platforms are important enough
Multi-platform matrix parallel builds—maybe the upstream team is small
Unified artifact verification mechanism—just publish it and let users try it themselves

That's fine, every project has its own priorities. We just happen to need these features, so we'll build them ourselves.

Design Decisions

Based on the above analysis, HagiCode designed an independent build pipeline in repos/vendered with the following core decisions:

1. Reuse shared version management and release toolchain

Version numbers use the UTC date format YYYY.MMDD.RRRR, where RRRR is the GitHub Actions run number zero-padded to four digits. This keeps versions monotonically increasing and traceable. After all, time doesn't flow backward, just like some things once changed cannot be undone:

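// scripts/versioning.mjs
export function formatDateVersion({ date = new Date(), revision }) {
  // The two normalization lines are reconstructed assumptions; the excerpt omits them
  const normalizedDate = new Date(date)
  const normalizedRevision = String(revision).padStart(4, "0")
  const year = normalizedDate.getUTCFullYear()
  const month = String(normalizedDate.getUTCMonth() + 1).padStart(2, "0")
  const day = String(normalizedDate.getUTCDate()).padStart(2, "0")
  return `${year}.${month}${day}.${normalizedRevision}`
}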

For example, a build on 2026-05-05 (UTC) with run number 1 generates version 2026.0505.0001 and tag v2026.0505.0001.

Actually, there's nothing special about this version format—it just happens to be good enough.

2. Package-isolated build scripts

Each package (code-server, omniroute) maintains its own build and verification logic under packages/<name>/scripts/, while shared publishing tools (scripts/versioning.mjs, scripts/github-release.mjs, scripts/publication.mjs) remain package-agnostic. Each manages its own affairs without interfering—this is what's called "staying in one's lane."

3. Unified metadata contract

All packages produce standardized metadata.json containing schemaVersion, packageId, version, platform, arch, sourceRevision, and artifacts[] fields, ensuring downstream consumers don't need to be aware of package differences. With a unified format, everyone can save some trouble.
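
A contract like this is easy to check mechanically before publishing. The sketch below is one possible validator: the field names come from the list above, while the validation rules (string fields, non-empty artifacts array), the function name `validateMetadata` and the sample values are assumptions for illustration.

```javascript
// Field names come from the contract described above; the validation rules
// themselves (types, non-empty artifacts) are assumptions for illustration.
const REQUIRED_STRING_FIELDS = ["packageId", "version", "platform", "arch", "sourceRevision"];

function validateMetadata(metadata) {
  const problems = [];
  if (metadata.schemaVersion === undefined) problems.push("missing schemaVersion");
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof metadata[field] !== "string" || metadata[field] === "") {
      problems.push(`missing or empty ${field}`);
    }
  }
  if (!Array.isArray(metadata.artifacts) || metadata.artifacts.length === 0) {
    problems.push("artifacts must be a non-empty array");
  }
  return problems;
}

const sample = {
  schemaVersion: 1,
  packageId: "code-server",
  version: "2026.0505.0001",
  platform: "linux",
  arch: "amd64",
  sourceRevision: "0000000", // placeholder commit hash
  artifacts: [{ name: "code-server-linux.tar.gz" }], // hypothetical entry shape
};

console.log(validateMetadata(sample)); // []
console.log(validateMetadata({ packageId: "omniroute" }).length); // 6
```

Because the check only looks at the shared fields, the same validator could run for both packages without knowing how either one was built.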

Solution

Overall Workflow Architecture

The entire pipeline is defined in repos/vendered/.github/workflows/code-server-artifacts.yaml and includes the following stages:

prepare_release → build (matrix) → verify (matrix) → publish_github_release

The process is simple if you look at it simply, complex if you look at it complexly—it all depends on your perspective.

Trigger Conditions

on:
  workflow_dispatch:          # Manual trigger
  schedule:
    - cron: "23 3 * * *"     # Daily scheduled build
  push:
    branches: [main]          # Trigger on push to main branch
    paths:                    # Only trigger on related file changes
      - ".github/workflows/code-server-artifacts.yaml"
      - ".gitmodules"
      - "scripts/**"
      - "packages/code-server/**"
      - "packages/omniroute/**"

The daily scheduled build runs at 03:23 UTC (GitHub Actions evaluates cron schedules in UTC). There is no particular reason for that time; it was picked more or less at random. Perhaps the person who chose this time didn't think too much about it either.

Stage 1: Version Preparation

jobs:
  prepare_release:
    runs-on: ubuntu-22.04
    outputs:
      version: ${{ steps.version.outputs.version }}
      tag: ${{ steps.version.outputs.tag }}
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-node@v6
        with:
          node-version: 22
      - id: version
        run: node ./scripts/versioning.mjs >> "$GITHUB_OUTPUT"

This stage generates a unified version number and Git tag, shared by all subsequent build and release steps. A good start saves a lot of trouble for the work ahead.
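
For the two job outputs to be populated, the script has to print name=value lines, which is the format GitHub Actions reads from the file behind $GITHUB_OUTPUT. The article does not show that part of versioning.mjs, so the following is only a sketch of what it could look like. `toGithubOutput` and `VERSION_PATTERN` are hypothetical names.

```javascript
// Accept only versions shaped like YYYY.MMDD.RRRR (four or more revision digits).
const VERSION_PATTERN = /^\d{4}\.\d{4}\.\d{4,}$/;

// Build the lines that the workflow step appends to "$GITHUB_OUTPUT".
function toGithubOutput(version) {
  if (!VERSION_PATTERN.test(version)) {
    throw new Error(`Unexpected version format: ${version}`);
  }
  // One name=value pair per line.
  return [`version=${version}`, `tag=v${version}`].join("\n");
}

console.log(toGithubOutput("2026.0505.0001"));
```

Validating the format at this point means a malformed version stops the pipeline in the first job rather than surfacing later as a broken tag.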

Stage 2: Multi-Platform Matrix Build

The build stage uses strategy.matrix to execute in parallel across different platforms:

code-server Build Matrix

build_code_server:
  needs: prepare_release
  strategy:
    fail-fast: false
    matrix:
      include:
        - name: code-server Linux
          runner: ubuntu-22.04
          artifact_name: code-server-linux
        - name: code-server macOS
          runner: macos-latest
          artifact_name: code-server-macos
        - name: code-server Windows
          runner: windows-latest
          artifact_name: code-server-windows

Key design: fail-fast: false ensures that a failure on one platform doesn't cancel builds on other platforms. After all, one platform failing doesn't mean all platforms have issues—no need for everyone to go down together.

omniroute Build Matrix

build_omniroute:
  needs: prepare_release
  strategy:
    fail-fast: false
    matrix:
      include:
        - name: omniroute Linux x64
          runner: ubuntu-22.04
          platform: linux
          arch: amd64
        - name: omniroute macOS x64
          runner: macos-15-intel
          platform: macos
          arch: amd64
        - name: omniroute macOS arm64
          runner: macos-14
          platform: macos
          arch: arm64
        - name: omniroute Windows x64
          runner: windows-latest
          platform: windows
          arch: amd64

OmniRoute's matrix is richer, including both Intel and ARM architectures for macOS. Note that macOS ARM uses the macos-14 runner (Apple Silicon), while Intel uses macos-15-intel. That's just how the world is—some things are always divided into camps—like Intel and ARM, never to reconcile.

Stage 3: Platform-Specific Prerequisites

Each platform requires different toolchains, and the workflow handles this through conditional steps:

Linux

- name: Install Linux prerequisites
  if: runner.os == 'Linux'
  run: sudo apt-get update && sudo apt-get install -y jq rsync quilt libkrb5-dev

macOS

- name: Install macOS prerequisites
  if: runner.os == 'macOS'
  run: brew install jq rsync quilt python-setuptools

Windows (MSYS2)

Windows is the most complex, requiring MSYS2 to provide a Unix-like toolchain—there's no way around it, since Windows' design philosophy is completely different from Unix systems:

- name: Setup MSYS2
  if: runner.os == 'Windows'
  uses: msys2/setup-msys2@v2
  with:
    msystem: MSYS
    path-type: inherit
    update: true
    install: >-
      diffutils jq patch quilt rsync unzip zip

- name: Configure Windows shell paths
  if: runner.os == 'Windows'
  shell: pwsh
  run: |
    Add-Content -Path $env:GITHUB_ENV -Value 'NPM_CONFIG_SCRIPT_SHELL=/usr/bin/bash'
    Add-Content -Path $env:GITHUB_ENV -Value ("MSYS2_CMD={0}\\setup-msys2\\msys2.cmd" -f $env:RUNNER_TEMP)

Actually, these configurations aren't that complex, but the first time you encounter them, they can be pretty confusing.

Stage 4: Build Artifact Verification

After building on each platform, verification steps download the artifacts, extract them, and actually start them to verify usability. After all, we don't want to release something that doesn't run—that would be too embarrassing:

verify_code_server:
  needs: build_code_server
  strategy:
    fail-fast: false
    matrix:
      include:
        - name: code-server Linux
          runner: ubuntu-22.04
          bash_path: bash
        - name: code-server Windows
          runner: windows-latest
          bash_path: C:\msys64\usr\bin\bash.exe

The verification script (verify-startup.mjs) will:

Extract the build artifacts
Start code-server on a random available port
Poll the /healthz endpoint waiting for service readiness
After confirming the service responds with 200, shut down the process

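async function waitForHealth(port) {
  const deadline = Date.now() + 60_000
  while (Date.now() < deadline) {
    try {
      const response = await requestHealth(port)
      if (response.statusCode === 200) return
    } catch {
      // Assumes requestHealth may reject (e.g. connection refused) while
      // the server is still booting; keep polling until the deadline.
    }
    await new Promise((resolve) => setTimeout(resolve, 1000))
  }
  throw new Error(`Timed out waiting for code-server on port ${port} to become healthy`)
}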

Waiting for health checks always makes people a bit anxious—like waiting for someone who will never reply. Except this time the service will eventually start, while some people may never respond.
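
The "random available port" step from the list above is not shown in the article. A minimal sketch, assuming the script asks the operating system for a free port by binding to port 0 (findFreePort is a hypothetical name, not taken from verify-startup.mjs):

```javascript
import net from "node:net"

// Hypothetical helper: bind to port 0 so the OS picks a free port,
// read the assigned port, then release it for the real server to use.
function findFreePort(host = "127.0.0.1") {
  return new Promise((resolve, reject) => {
    const server = net.createServer()
    server.once("error", reject)
    server.listen(0, host, () => {
      const { port } = server.address()
      // Close before resolving so the caller can bind the port itself.
      server.close((err) => (err ? reject(err) : resolve(port)))
    })
  })
}
```

There is a small window between releasing the port and code-server binding it, so a rare collision is still possible. The health-check timeout covers that case.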

Stage 5: Unified Publishing

After all builds and verifications complete, the publishing stage collects the artifacts and creates a GitHub Release:

publish_github_release:
  needs:
    - prepare_release
    - build_code_server
    - build_omniroute
    - verify_code_server
    - verify_omniroute
  if: >-
    ${{ (github.event_name == 'push' && github.ref == 'refs/heads/main') ||
        github.event_name == 'workflow_dispatch' }}
  concurrency:
    group: ${{ format('vendered-github-release-{0}', needs.prepare_release.outputs.tag) }}
    cancel-in-progress: false

Key points:

Concurrency control: Using concurrency ensures that publishes for the same tag don't execute in parallel—avoiding duplicate releases is generally a good thing
Conditional publishing: Only publish on push to main branch or manual trigger; scheduled builds only execute build and verification
Artifact aggregation: Use the pattern parameter of download-artifact to batch download all platform artifacts for both code-server and omniroute

Practice

Key Points for Cross-Platform Build Script Writing

Build scripts (build-artifacts.mjs) need to handle platform differences. Here are the key points:

1. Platform detection and normalization

function normalizePlatform(value) {
  switch (String(value).toLowerCase()) {
    case "darwin":
    case "macos":
      return "macos"
    case "win32":
    case "windows":
    case "windows_nt":
      return "windows"
    default:
      return "linux"
  }
}

Different systems refer to the same platform differently—like the same person having different names in different contexts, but still being the same person.
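
The same idea extends to architectures: the build matrix uses amd64 and arm64, while Node reports x64 and arm64 through process.arch. A possible companion helper, purely as a sketch (normalizeArch is a hypothetical name, and the list of aliases is an assumption):

```javascript
// Hypothetical companion to normalizePlatform: map common architecture
// spellings onto the values used in the build matrix (amd64 / arm64).
function normalizeArch(value) {
  switch (String(value).toLowerCase()) {
    case "x64":
    case "x86_64":
    case "amd64":
      return "amd64"
    case "arm64":
    case "aarch64":
      return "arm64"
    default:
      // Fail loudly rather than guess: a wrong arch yields a broken artifact.
      throw new Error(`Unsupported architecture: ${value}`)
  }
}
```

Unlike normalizePlatform, which falls back to "linux", this sketch throws on unknown input. Which behavior fits better depends on how much you trust the caller.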

2. Shell compatibility on Windows

On Windows, npm run calls cmd.exe, but code-server's build scripts depend on bash. The solution is to set the NPM_CONFIG_SCRIPT_SHELL environment variable and use MSYS2. There's no way around this, since Windows and Unix have completely different design philosophies:

function withCodeServerEnv(env) {
  const scriptShell = platform === "windows"
    ? "/usr/bin/bash"
    : env.BASH_PATH || "bash"
  return {
    ...env,
    NPM_CONFIG_SCRIPT_SHELL: platform === "windows" ? scriptShell : env.NPM_CONFIG_SCRIPT_SHELL,
  }
}

3. Artifact packaging

Different platforms use different archive formats (Linux/macOS use .tar.gz, Windows uses .zip)—each platform has its own preferences, just like everyone has their own habits:

if (platform === "windows") {
  await run("powershell.exe", [
    "-NoLogo", "-NoProfile", "-Command",
    `Compress-Archive -Path '${releaseDir}' -DestinationPath '${archivePath}' -Force`,
  ])
} else {
  await run("tar", ["-czf", archivePath, "-C", codeServerRoot, path.basename(releaseDir)])
}

4. Patch management

code-server customization is implemented through quilt patches in the patches/ directory. Linux uses quilt directly, macOS installs quilt through Homebrew, and Windows needs to use quilt from MSYS2 or fall back to the patch command (this part is quite troublesome):

// Use patch command on Windows instead of quilt
async function applyPatchesWithPatch(env) {
  const series = await readFile(path.join(codeServerRoot, "patches", "series"), "utf8")
  const patchFiles = series.split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line && !line.startsWith("#"))

  for (const patchFile of patchFiles) {
    await runMsys2(`patch -p1 --forward -i "patches/${patchFile}"`, { cwd: codeServerRoot, env })
  }
}

The Windows part definitely took a lot of time—no way around it, since Windows' design philosophy is different from other systems.

Version Number Design Considerations

HagiCode uses the YYYY.MMDD.RRRR format instead of upstream semantic versioning for the following reasons:

Determinism: Each build's version number is uniquely determined by date and run number
Monotonic ordering: The date prefix means that natural sorting yields chronological order
Source traceability: Build time and CI run sequence number can be inferred from the version number

Actually, there's nothing special about this—it just happens to be good enough. Semantic versioning sounds nice, but it's actually quite troublesome to use in practice.
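
A minimal sketch of how such a version string could be produced. The function name, the use of UTC, and the four-digit zero-padded run number are assumptions for illustration; the article does not show the actual prepare_release logic:

```javascript
// Hypothetical helper: build a YYYY.MMDD.RRRR version from a date and a
// CI run number. UTC and 4-digit zero padding are assumptions.
function formatVersion(date, runNumber) {
  const yyyy = String(date.getUTCFullYear())
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0")
  const dd = String(date.getUTCDate()).padStart(2, "0")
  const rrrr = String(runNumber).padStart(4, "0")
  return `${yyyy}.${mm}${dd}.${rrrr}`
}

const version = formatVersion(new Date(Date.UTC(2026, 4, 6)), 1)
console.log(version, `v${version}`) // 2026.0506.0001 v2026.0506.0001
```

Because every field is zero-padded to a fixed width, plain string comparison sorts versions chronologically, as long as the run number stays within four digits.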

Important Notes

Recursive submodule checkout: Builds must check out with submodules: recursive so that the upstream code for both code-server and omniroute is fully cloned (this one is easy to forget)
Node version matching: The code-server build uses the Node version specified in the upstream .node-version file; omniroute uses Node 24
Windows home directory: On Windows CI, OmniRoute needs the $HOME directory structure created manually so build scripts don't access non-existent paths, since the Windows directory layout differs from other systems
Verification timeout: code-server startup verification has a 60-second timeout, which may need adjusting to match actual startup speed
Artifact slimming: Delete the embedded Node binary after building (slimRelease), since downstream will use its own Node runtime
Publish idempotency: github-release.mjs supports updating existing Releases (it deletes the old asset first, then uploads the new one), which makes retries safe

These are all lessons learned from stepping in pits—of course, when you're in the pit, it really makes you want to pull your hair out.
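
The "delete old asset, then upload" idea behind publish idempotency can be sketched roughly as follows. The client object and its listAssets, deleteAsset, and uploadAsset methods are hypothetical stand-ins, not the real github-release.mjs code or the GitHub API surface:

```javascript
// Hypothetical sketch: `client` is an injected object with assumed method
// names; a real implementation would call the GitHub Releases API instead.
async function upsertAsset(client, releaseId, name, data) {
  const assets = await client.listAssets(releaseId)
  const existing = assets.find((asset) => asset.name === name)
  if (existing) {
    // Remove the stale copy first so a retried publish does not fail
    // on a duplicate asset name.
    await client.deleteAsset(existing.id)
  }
  return client.uploadAsset(releaseId, name, data)
}
```

Running this twice with the same name leaves exactly one asset with the latest content, which is what makes re-running a failed publish job safe.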

Complete CI/CD Flow Diagram

┌─────────────────────────────────────────────────────────────────┐
│                     Trigger Sources                              │
│  push to main / workflow_dispatch / cron(23 3 * * *)            │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  prepare_release                                                 │
│  Generate version: 2026.0506.0001, tag: v2026.0506.0001         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ code-server  │ │ code-server  │ │ code-server  │
│ Linux        │ │ macOS        │ │ Windows      │
│ ubuntu-22.04 │ │ macos-latest │ │win-latest    │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ verify       │ │ verify       │ │ verify       │
│ Linux        │ │ macOS        │ │ Windows      │
│ start+healthz│ │ start+healthz│ │ start+healthz│
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
       ┌────────────────┼────────────────┐
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ omniroute    │ │ omniroute    │ │ omniroute    │ ...
│ linux-amd64  │ │ macos-amd64  │ │ macos-arm64  │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│  publish_github_release                                          │
│  Download all artifacts → Create/update GitHub Release → Upload │
└─────────────────────────────────────────────────────────────────┘

This flowchart looks quite complex, but once you break it down, it's not that difficult. Many things are like this: they look scary, but when you actually do them, they turn out to be manageable.

Key Configuration Reference

# Build environment variables
env:
  CI: true
  GITHUB_TOKEN: ${{ github.token }}
  ELECTRON_SKIP_BINARY_DOWNLOAD: 1    # Skip Electron download
  PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: 1  # Skip Playwright browser download
  npm_config_build_from_source: true   # Build native modules from source
  VERSION: ${{ needs.prepare_release.outputs.version }}

These environment variables are crucial for build speed and correctness: skipping unnecessary binary downloads significantly reduces build time, and build_from_source ensures native modules compile correctly on the target platform.

Through this pipeline, HagiCode achieves automated building, verification, and publishing of code-server and OmniRoute across three operating systems, turning what was originally a manual multi-platform publishing process into a fully automated CI/CD process. This can be considered making a troublesome thing less troublesome.

Summary

The key to designing multi-platform CI/CD pipelines lies in:

Centralized version management: Generate a unified version number at the start of the pipeline, shared by all downstream steps
Separation of build and publish: Use fail-fast: false to ensure a platform failure doesn't affect other platforms, with the publishing stage aggregating all artifacts
Platform-isolated build scripts: Each package maintains its own build logic, while the shared toolchain remains package-agnostic
Automated artifact verification: Verify usability immediately after building to avoid discovering problems only after publishing

This solution is not only applicable to code-server and OmniRoute, but can also serve as a reference for other projects needing multi-platform builds. The build system shared in this article is exactly what came out of the pits we stepped in and the optimizations we made while developing HagiCode. If you find this solution valuable, it shows our engineering strength is not bad, and HagiCode itself is worth paying attention to.

After all, people who can automate such troublesome things probably aren't too bad themselves.

Why HagiCode Chose execa for CLI Command Execution

Hagicode — Tue, 05 May 2026 14:17:03 +0000

Why HagiCode Chose execa for CLI Command Execution

Using child_process directly in Node.js projects to execute external commands comes with pain points like significant platform differences and inconsistent error handling. This article shares the practical experience of introducing execa in the HagiCode project, including core design decisions and real code examples.

Background

In Node.js projects, directly using the child_process module to execute external commands is common practice, but it comes with quite a few issues:

Significant platform differences: Windows .cmd/.bat files require special handling, and paths containing spaces need to be wrapped in quotes
Inconsistent error handling: execFile, spawn, and execFileSync produce error information in varying formats, making unified handling difficult
Tedious stream processing: Manual collection and buffering of stdout/stderr streams is required
Complex timeout and signal handling: Extra code is needed to implement command timeout cancellation and process signal handling

The Hagiscript and Desktop applications within the HagiCode project both need to execute a large number of external CLI commands (npm, node, PowerShell, etc.). Directly using child_process led to code duplication and high maintenance costs.
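
To make the pain points concrete, here is roughly what a hand-rolled helper on top of child_process tends to look like. This is an illustrative sketch, not code from HagiCode; runWithChildProcess and its options are hypothetical:

```javascript
import { spawn } from "node:child_process"

// Sketch of the boilerplate raw child_process needs: collect output by
// hand, implement the timeout, and shape the result yourself.
function runWithChildProcess(command, args, { timeoutMs = 10_000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] })
    let stdout = ""
    let stderr = ""
    let timedOut = false
    const timer = setTimeout(() => {
      timedOut = true
      child.kill() // sends SIGTERM by default
    }, timeoutMs)

    child.stdout.setEncoding("utf8").on("data", (chunk) => (stdout += chunk))
    child.stderr.setEncoding("utf8").on("data", (chunk) => (stderr += chunk))
    child.once("error", (err) => {
      clearTimeout(timer)
      reject(err) // e.g. ENOENT when the command does not exist
    })
    child.once("close", (exitCode, signal) => {
      clearTimeout(timer)
      resolve({ stdout, stderr, exitCode, signal, timedOut })
    })
  })
}
```

Even this version still ignores Windows .cmd shims and output size limits, which is exactly the kind of code that ends up duplicated across sub-projects.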

To address these pain points, we made a decision: introduce execa as a unified command execution solution. The impact of this decision turned out to be greater than you might imagine — I'll explain the specifics shortly.

About HagiCode

The approach shared in this article comes from our practical experience in the HagiCode project. HagiCode is an AI coding assistant project that needs to execute a large number of external commands across multiple sub-projects (the Hagiscript scripting engine and the Desktop application). The complexity of supporting multiple languages and platforms is perhaps the direct reason we chose to introduce execa.

If you find the approach shared in this article valuable, it speaks to our engineering capabilities — and HagiCode itself is worth checking out.

Why execa?

execa is a mature process execution library that addresses the core problems of child_process:

Cross-platform consistency: Automatically handles Windows command shims without manually detecting .cmd files
Unified error handling: Standardized error objects containing exitCode, signal, timedOut, stdout, and stderr
Better API design: Supports Promise API, AbortSignal cancellation, and stream processing
Security: Maintains argument boundaries, avoiding command injection risks

These are exactly the features we needed during HagiCode development. Hagiscript needs to execute npm commands across different platforms, and Desktop needs to call PowerShell and various development tools. execa's cross-platform consistency significantly reduced our platform-specific code. After all, who wants to write special handling code for every platform?

Core Design Decisions

Both projects' implementations use an internal wrapper layer rather than calling execa directly:

// Hagiscript's unified executor
export const runCommand: CommandRunner = async (command, args, options) => {
  const result = await execa(command, args, { /* normalized options */ });
  return { /* normalized result */ };
};

Reasons:

Maintain domain-specific error types (e.g., NpmCommandError)
Enable injecting mock executors during testing
Unify error handling and logging
Make it easy to replace the underlying implementation in the future
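
The reasons above can be illustrated with a small sketch in which the underlying executor is injected. Here execImpl stands in for execa or a test double, and createRunCommand plus the error context fields are assumptions rather than Hagiscript's actual code:

```javascript
// Hypothetical sketch of the wrapper idea. CommandExecutionError mirrors
// the class name used later in the article; its fields are assumed.
class CommandExecutionError extends Error {
  constructor(message, context) {
    super(message)
    this.name = "CommandExecutionError"
    this.context = context
  }
}

// `execImpl(command, args, options)` is the injected low-level executor.
function createRunCommand(execImpl) {
  return async function runCommand(command, args = [], options = {}) {
    try {
      const result = await execImpl(command, args, options)
      return {
        command,
        args,
        stdout: result.stdout ?? "",
        stderr: result.stderr ?? "",
        exitCode: result.exitCode ?? 0,
      }
    } catch (error) {
      // Translate whatever the executor throws into one domain error type.
      throw new CommandExecutionError(`${command} failed`, {
        command,
        args,
        exitCode: error.exitCode,
        stderr: error.stderr ?? "",
        timedOut: Boolean(error.timedOut),
      })
    }
  }
}
```

Callers only ever see the normalized result and CommandExecutionError, so swapping the executor, or mocking it in tests, does not ripple through the codebase.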

Argument Boundary Protection

Both implementations emphasize argument arrays over shell strings:

// Correct: clear argument boundaries
await runCommand('npm', ['install', '@scope/package@1.0.0']);

// Wrong: prone to injection risks
await execa(`npm install @scope/package@1.0.0`, { shell: true });

This avoids security issues related to argument quoting, escaping, and injection. In HagiCode, we frequently handle user-supplied inputs like package names and script names as parameters. Using argument arrays effectively prevents command injection. When it comes to security, once something goes wrong, it's a big problem.

Hagiscript's Solution

HagiCode's Hagiscript sub-project created a runtime/command-launch.ts module that provides:

Unified executor: The runCommand function wraps execa
Standardized result: CommandResult interface
Standardized error: CommandExecutionError class
Compatibility helpers: normalizeCommandPath, requiresShellLaunch

export interface CommandResult {
  command: string;
  args: string[];
  stdout: string;
  stderr: string;
  exitCode?: number;
  signal?: string;
  timedOut?: boolean;
}

export class CommandExecutionError extends Error {
  readonly context: CommandFailureContext;
}

This abstraction allows Hagiscript to handle all external commands uniformly, whether installing npm dependencies or executing scripts with node. With a unified interface, the code really does flow much more smoothly.

Desktop's Solution

HagiCode's Desktop sub-project created a utils/cli-executor.ts module that provides:

Execution options: CliExecutorOptions supports timeout, cancellation, and environment variables
Result classification: CliExecutionResult includes success/failure status
Stream processing: executeCliStreaming supports real-time output callbacks
Error classification: CliFailureKind distinguishes between exit, timeout, cancellation, and other failure types

export async function executeCli(options: CliExecutorOptions): Promise<CliExecutionResult>
export async function executeCliStreaming(options: CliExecutorOptions): Promise<CliExecutionResult>

The Desktop application needs to display command execution progress in the UI, which is where the streaming feature comes in handy. Users can see the output of npm install in real time rather than waiting until the command finishes. Once you've experienced it, there's no going back.
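
Error classification in the spirit of CliFailureKind might look like the sketch below. The kind names are assumptions. The inspected fields (timedOut, isCanceled, code, signal, exitCode) follow the error shape execa documents, but are worth checking against the execa version you use:

```javascript
// Hypothetical classifier; the returned kind strings are illustrative.
function classifyFailure(error) {
  // Check timeout and cancellation first: both also terminate the
  // process with a signal, so the order of these checks matters.
  if (error.timedOut) return "timeout"
  if (error.isCanceled) return "cancelled"
  if (error.code === "ENOENT") return "spawn-failed" // command not found
  if (typeof error.signal === "string") return "signal"
  if (typeof error.exitCode === "number" && error.exitCode !== 0) return "exit"
  return "unknown"
}
```

With a single classifier like this, the UI can decide per kind whether to offer a retry, show stderr, or stay quiet after a user-initiated cancel.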

Usage Examples

Executing Commands in Hagiscript

import { runCommand } from '../runtime/command-launch.js';

// Simple execution
const result = await runCommand('node', ['--version']);
console.log(result.stdout); // e.g. 'v20.0.0'

// Execution with options
const installResult = await runCommand('npm', ['install', 'express'], {
  cwd: '/project/path',
  env: { NODE_ENV: 'development' },
  timeoutMs: 30000
});

Executing Commands in Desktop

import { executeCli, executeCliStreaming } from './utils/cli-executor.js';

// Buffered execution
const result = await executeCli({
  command: 'npm',
  args: ['list', '--json'],
  cwd: projectPath,
  timeoutMs: 5000,
});

if (result.success) {
  console.log(result.stdout);
} else {
  console.error(result.error?.message);
}

// Streaming execution
await executeCliStreaming({
  command: 'npm',
  args: ['install'],
  onOutput: (type, data) => {
    console.log(`[${type}]`, data);
  }
});

Error Handling

try {
  await runCommand('npm', ['install', 'invalid-package']);
} catch (error) {
  if (error instanceof CommandExecutionError) {
    console.error('Command failed:', error.context.command);
    console.error('Exit code:', error.context.exitCode);
    console.error('Stderr:', error.context.stderr);
  }
}

Unified error handling allows us to provide a better user experience in HagiCode. For example, when an npm installation fails, we can extract the specific error message and display it to the user instead of showing a generic "command execution failed" message. Seeing the specific error at least tells the user where the problem lies.

Testing Strategy

Both projects support dependency injection for easier testing:

// Production code
async function installPackage(pkg: string, runCommand = defaultRunCommand) {
  return runCommand('npm', ['install', pkg]);
}

// Test code (Vitest)
import { expect, it, vi } from 'vitest';
it('installs package', async () => {
  const mockRunCommand = vi.fn().mockResolvedValue({
    stdout: 'installed',
    stderr: '',
    exitCode: 0
  });
  await installPackage('test-pkg', mockRunCommand);
  expect(mockRunCommand).toHaveBeenCalledWith('npm', ['install', 'test-pkg']);
});

This design makes HagiCode's tests more reliable and faster. We don't need to actually execute npm commands in tests — we only need to mock the executor to return expected results. Faster tests naturally lead to a better development experience.

Best Practices

Based on HagiCode's practice, we've summarized the following best practices:

Keep arguments separated: Always pass the command and arguments as separate array elements
Use shell mode sparingly: Only use shell: true when necessary, such as when piping or redirection is needed
Handle timeouts: Set timeoutMs for commands that may hang
Buffer size: Consider setting maxBuffer for commands with large output
Windows paths: execa automatically handles .cmd shims — no manual detection needed
Cancellation: Use AbortSignal instead of manual kill()
Error classification: Distinguish between process startup failure, execution failure, timeout, cancellation, and other scenarios

These are all pitfalls we've encountered during actual development. Hopefully they can save you some detours.

Common Pitfalls

// Wrong: string concatenation may allow injection
await execa(`npm install ${userInput}`, { shell: true });

// Correct: argument array
await execa('npm', ['install', userInput]);

// Wrong: ignoring timeout
await execa('npm', ['install', 'heavy-package']);

// Correct: set timeout
await execa('npm', ['install', 'heavy-package'], { timeout: 60000 });

// Wrong: assuming exit code is 0
const result = await execa('npm', ['install']);

// Correct: check for failure
try {
  await execa('npm', ['install']);
} catch (error) {
  // Handle failure
}

These pitfalls are hard-won lessons. After all, who hasn't stepped on a few landmines in production?

Summary

After introducing execa, the HagiCode project saw significant improvements in both code quality and maintainability for command execution:

Cross-platform consistency: No more writing special handling code for Windows
Unified error handling: Structured error messages make display and analysis easier
Better testability: Command execution can be easily mocked through dependency injection
More secure argument handling: Using argument arrays avoids injection risks

If you also need to execute external commands in a Node.js project, we highly recommend giving execa a try. The approach shared in this article was refined through real-world pitfalls and optimizations during HagiCode development. We hope you find it helpful.

Good tools deserve wider recognition.

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.

Author: newbe36524
Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-04-28-why-hagicode-chose-execa-for-cli-execution%2F
License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.

Implementing Auto-Retry for Agent CLIs like Claude Code and Codex

Hagicode — Sat, 18 Apr 2026 09:14:33 +0000

Implementing Auto-Retry for Agent CLIs like Claude Code and Codex

Auto-retry might seem like a simple switch, but in real-world engineering, it's anything but. Hello everyone, I'm Yu Kun, creator of HagiCode. Today, let's skip the fluff and talk about how auto-retry for Agent CLIs like Claude Code and Codex should actually be implemented—to handle exceptions properly without spiraling into endless retry loops.

Background

If you've been working with AI programming recently, you've likely encountered this: tasks don't fail immediately—they break halfway through.

With ordinary HTTP requests, you often just retry with some exponential backoff. But Agent CLIs are different. Tools like Claude Code and Codex typically execute in streaming fashion, pushing output in chunks, while binding to threads, sessions, or resume tokens. In other words, it's not just "did this request fail" but rather:

Is the content already output still valid?
Can the current context continue running?
Should this failure auto-recover?
If recovering, how long to wait, what to send, and whether to reuse the original context?

Many teams building this for the first time instinctively write the simplest version: retry on error. That's natural, but once it's in the project, problems start popping up one after another:

Some clearly transient errors get treated as final failures
Some errors not worth retrying get replayed repeatedly
Requests with threads and without get treated identically
Unbounded backoff strategies pound the backend with self-inflicted load

HagiCode has stepped in these pits while integrating multiple Agent CLIs. Especially on the Codex side, the initially exposed problem was that certain reconnect messages weren't recognized as retryable terminal states, so existing recovery mechanisms never got a chance to kick in. Basically, the system didn't lack auto-retry—it failed to recognize "this is worth retrying."

So the core point of this article is clear: Auto-retry is not a button, but a layered design.

About HagiCode

The solution shared here comes from our real practice in the HagiCode project. HagiCode's mission isn't just to hook up some model and call it done—it's to unify streaming messages, tool calls, failure recovery, and session context across multiple Agent CLIs into a maintainable execution model.

One of my main concerns is how to make AI programming actually work in production environments. Writing demos isn't hard; turning demos into something teams will use long-term is. HagiCode takes auto-retry seriously not because it looks fancy, but because if long-running, streaming, resumable CLI execution isn't stable, users see an unreliable wrapper that drops connections mid-task, not an intelligent assistant.

If you want to check out the project first, here are two entry points:

GitHub: github.com/HagiCode-org/site
Official site: hagicode.com

Taking it a step further, HagiCode is now on Steam. If you're on Steam, feel free to wishlist:

Steam Store Page (Wishlist / Details)

Why Agent CLI Auto-Retry is Harder Than Ordinary Retry

This is a practical question—let's jump to the conclusion: The difficulty with Agent CLI auto-retry isn't "wait a few seconds and try again," but "can we continue within the original context?"

Think of it like a long conversation. Ordinary API retry is like a busy line—redial. Agent CLI retry is like the other person's signal cutting mid-sentence. You have to decide: call back? Start over? Do they remember where we left off? These aren't the same problem at all.

Specifically, four challenges are most typical.

1. It's Streaming

Once output is sent to the user, you can't quietly swallow failures and retry like with ordinary requests. Because that initial content was already seen, improper replay leads to duplicate text, confused state, and scrambled tool call lifecycles. This isn't magic—it's engineering.

2. It Binds Session Context

Providers like Codex bind to threads; Claude Code-type implementations have continuation targets or equivalent resume context. The real prerequisite for auto-retry isn't just "this error looks like a transient failure," but also "does this execution still have a medium through which to continue?"

3. Not All Errors Are Worth Retrying

Network jitter, SSE idle timeout, upstream transient failures—usually worth a try. But authentication failure, lost context, or providers without resume capability? Continued retrying isn't recovery, it's noise.
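
As a rough sketch of that distinction, written in JavaScript to match the earlier Node examples: the failure kind names below are hypothetical, and HagiCode's real classification is not shown in this article.

```javascript
// Hypothetical failure kinds that are usually worth another attempt.
const RETRYABLE_KINDS = new Set([
  "network-jitter",
  "sse-idle-timeout",
  "upstream-transient",
])

// A transient error is only worth retrying when there is still a
// context (thread, resume token) to continue in.
function shouldRetry(failure, { canResume }) {
  return RETRYABLE_KINDS.has(failure.kind) && canResume
}
```

Anything not explicitly listed, such as an authentication failure, is treated as final. That default keeps unknown errors from being replayed.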

4. It Needs Boundaries

Infinite auto-retry is almost always wrong. One stable engineering principle is: failure recovery must have boundaries. The system must know: max attempts, spacing between attempts, when to stop and admit defeat.
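
One way to make the boundary explicit is to precompute the whole retry schedule, so the retry budget is simply the length of a list. A minimal sketch with assumed defaults (the function name and numbers are illustrative, not HagiCode's settings):

```javascript
// Hypothetical bounded backoff: the array length is the retry budget and
// every delay is capped, so neither count nor spacing can grow unbounded.
function buildRetrySchedule({ maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 30_000 } = {}) {
  return Array.from({ length: maxRetries }, (_, i) =>
    Math.min(baseDelayMs * 2 ** i, maxDelayMs))
}

console.log(buildRetrySchedule()) // [ 1000, 2000, 4000 ]
```

Once the list is used up, the system stops and reports the failure instead of trying again.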

Because of these characteristics, HagiCode didn't implement auto-retry as a few try/catch lines in some provider—we extracted it as shared capability. Engineering problems need engineering solutions.

HagiCode's Approach: Extract Retry from Provider

HagiCode's current real implementation can be compressed to one sentence:

Shared layer uniformly manages retry flow; specific Providers only answer two questions: Is this terminal state worth retrying? Can current context continue?

This isn't complex, but it's critical. Once responsibilities are separated, Claude Code, Codex, and even other Agent CLIs can all reuse the same skeleton. Models change, tools change, workflows upgrade, but the engineering foundation remains.

Layer 1: Unified Coordinator Manages Retry Loop

The core implementation fragment looks roughly like this:

internal static class ProviderErrorAutoRetryCoordinator
{
    public static async IAsyncEnumerable<CliMessage> ExecuteAsync(
        string prompt,
        ProviderErrorAutoRetrySettings? settings,
        Func<string, IAsyncEnumerable<CliMessage>> executeAttemptAsync,
        Func<bool> canRetryInSameContext,
        Func<TimeSpan, CancellationToken, Task> delayAsync,
        Func<CliMessage, bool> isRetryableTerminalMessage,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        var normalizedSettings = ProviderErrorAutoRetrySettings.Normalize(settings);
        var retrySchedule = normalizedSettings.Enabled
            ? normalizedSettings.GetRetrySchedule()
            : [];

        for (var attempt = 0; ; attempt++)
        {
            var attemptPrompt = attempt == 0
                ? prompt
                : ProviderErrorAutoRetrySettings.ContinuationPrompt;

            CliMessage? terminalFailure = null;

            await foreach (var message in executeAttemptAsync(attemptPrompt)
                               .WithCancellation(cancellationToken))
            {
                if (isRetryableTerminalMessage(message))
                {
                    terminalFailure = message;
                    break;
                }

                yield return message;
            }

            if (terminalFailure is null)
            {
                yield break;
            }

            if (attempt >= retrySchedule.Count || !canRetryInSameContext())
            {
                yield return terminalFailure;
                yield break;
            }

            await delayAsync(retrySchedule[attempt], cancellationToken);
        }
    }
}

This code does something simple but powerful:

Intermediate failures aren't passed straight through; the coordinator first judges whether recovery is possible
Only when the retry budget is exhausted, or the context can no longer continue, does the final failure return to upper layers
From round 2 onward, it doesn't resend the original prompt; it sends a unified continuation prompt

This is why I keep emphasizing: auto-retry isn't simply "request again." It's not patching an exception branch—it's managing an execution lifecycle. Sounds product-manager-ish, but that's how it works in engineering.
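
For readers following the earlier Node examples, the same control flow can be transliterated into a JavaScript async generator. This is an illustration only, not HagiCode's code: the parameter names mirror the C# fragment, and the continuation prompt text is a placeholder.

```javascript
// Placeholder text; the real continuation prompt is not shown in the article.
const CONTINUATION_PROMPT = "continue"

async function* executeWithRetry({
  prompt, schedule, executeAttempt,
  canRetryInSameContext, delay, isRetryableTerminal,
}) {
  for (let attempt = 0; ; attempt++) {
    const attemptPrompt = attempt === 0 ? prompt : CONTINUATION_PROMPT
    let terminalFailure = null

    for await (const message of executeAttempt(attemptPrompt)) {
      if (isRetryableTerminal(message)) {
        terminalFailure = message // hold it back instead of forwarding
        break
      }
      yield message
    }

    if (terminalFailure === null) return // attempt finished normally
    if (attempt >= schedule.length || !canRetryInSameContext()) {
      yield terminalFailure // budget exhausted or nothing to resume
      return
    }
    await delay(schedule[attempt])
  }
}
```

A caller consumes it with for await and only ever sees either normal messages or one final failure. The intermediate failures stay inside the coordinator.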

Layer 2: Snapshot Retry Strategy

Another easily overlooked issue: Who decides whether this request enables auto-retry?

HagiCode's answer: Don't rely on "current global configuration"—snapshot the strategy and let it travel with the request. This way, session queuing, message persistence, execution forwarding, provider adaptation won't lose the strategy. One success isn't a system; sustained success is.

Core structure simplifies to:

public sealed record ProviderErrorAutoRetrySnapshot
{
    public const string DefaultStrategy = "default";

    public bool Enabled { get; init; }

    public string Strategy { get; init; } = DefaultStrategy;

    public static ProviderErrorAutoRetrySnapshot Normalize(bool? enabled, string? strategy)
    {
        return new ProviderErrorAutoRetrySnapshot
        {
            Enabled = enabled ?? true,
            Strategy = string.IsNullOrWhiteSpace(strategy)
                ? DefaultStrategy
                : strategy.Trim()
        };
    }
}

Then map to settings objects actually consumed by providers at execution time. The value is direct:

Business layer decides "whether to retry"
Runtime decides "how to retry"

Each manages its own concern. Many problems aren't impossible, just not properly costed. Snapshotting strategy essentially calculates costs upfront.
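
To make the split concrete, here is a minimal sketch, not HagiCode's actual mapping code. It reuses the ProviderErrorAutoRetrySnapshot record from above. RetrySettings and RetrySettingsMapper are hypothetical names, the 10/20/60-second delays are the default backoff quoted later in this article, and mapping every strategy name to that one schedule is an assumption made to keep the example short.

```csharp
using System;

// Business layer: decide once, when the request is created.
var requestSnapshot = ProviderErrorAutoRetrySnapshot.Normalize(enabled: null, strategy: "  ");
Console.WriteLine($"{requestSnapshot.Enabled} / {requestSnapshot.Strategy}"); // True / default

// Runtime: turn the snapshot into concrete retry settings at execution time.
var runtimeSettings = RetrySettingsMapper.ToSettings(requestSnapshot);
Console.WriteLine(runtimeSettings.Delays.Length); // 3

// A request that opted out keeps an empty budget, whatever global config says later.
var optedOut = ProviderErrorAutoRetrySnapshot.Normalize(enabled: false, strategy: null);
Console.WriteLine(RetrySettingsMapper.ToSettings(optedOut).Delays.Length); // 0

// Same record as in the article.
public sealed record ProviderErrorAutoRetrySnapshot
{
    public const string DefaultStrategy = "default";

    public bool Enabled { get; init; }

    public string Strategy { get; init; } = DefaultStrategy;

    public static ProviderErrorAutoRetrySnapshot Normalize(bool? enabled, string? strategy)
    {
        return new ProviderErrorAutoRetrySnapshot
        {
            Enabled = enabled ?? true,
            Strategy = string.IsNullOrWhiteSpace(strategy) ? DefaultStrategy : strategy.Trim()
        };
    }
}

// Hypothetical runtime-side settings object (not from HagiCode).
public sealed record RetrySettings(TimeSpan[] Delays);

// Hypothetical mapper: "whether" comes from the snapshot, "how" is decided here.
public static class RetrySettingsMapper
{
    public static RetrySettings ToSettings(ProviderErrorAutoRetrySnapshot source)
    {
        if (!source.Enabled)
        {
            return new RetrySettings(Array.Empty<TimeSpan>());
        }

        // Assumption: every strategy name maps to the default schedule in this sketch.
        return new RetrySettings(new[]
        {
            TimeSpan.FromSeconds(10),
            TimeSpan.FromSeconds(20),
            TimeSpan.FromSeconds(60)
        });
    }
}
```

Because the snapshot is an immutable record, it can be serialized with the queued message and read back unchanged at execution time.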

Layer 3: Provider Only Does Terminal and Context Determination

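At the level of a specific provider such as Claude Code or Codex, responsibilities are actually thin. Think of it as enhancement, not replacement.

Take Codex. When integrating the shared coordinator, it essentially only needs to provide three things: how to execute one attempt, whether the current context can continue, and whether a terminal failure is retryable: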

await foreach (var message in ProviderErrorAutoRetryCoordinator.ExecuteAsync(
                   prompt,
                   options.ProviderErrorAutoRetry,
                   retryPrompt => ExecuteCodexAttemptAsync(...),
                   () => !string.IsNullOrWhiteSpace(resolvedThreadId),
                   DelayAsync,
                   IsRetryableTerminalFailure,
                   cancellationToken))
{
    yield return message;
}

You'll find that only two of these judgments are truly Provider-specific:

IsRetryableTerminalFailure
canRetryInSameContext

Codex checks whether the thread can continue; Claude Code checks whether a continuation target exists. Backoff strategy, retry count, and subsequent prompts should not be reinvented by each Provider.

Once this layer is extracted, HagiCode's cost to integrate more CLIs drops significantly. You don't duplicate the entire retry state machine; you just plug in "this provider's boundary conditions." Fast doesn't mean stable; handling doesn't mean handling well; runnable doesn't mean maintainable.

An Easy Mistake: Don't Treat All Errors as Retryable

In this analysis, what's most worth calling out separately isn't "how to implement retry" but "how to avoid wrong retry."

The problem first surfaced when Codex failed to recognize a reconnect message. Intuitively, many would reach for the minimal fix: add another string prefix to the whitelist. This isn't wrong per se, but it's more of a demo-era solution than a long-term maintainable one.

In the current HagiCode implementation, the system has moved in a more stable direction. It no longer matches specific literal strings; it uniformly hands recoverable terminal states to the shared coordinator. The benefits are obvious:

It won't break completely when error text changes slightly
Test coverage can focus on the "terminal state envelope" rather than a single hardcoded string
Retry logic within the same provider stays more consistent

Of course, set a boundary: more generic doesn't mean more permissive. If the current context cannot continue, don't blindly replay, even if the error looks like a transient failure.

This is critical. What's truly reassuring isn't that it occasionally works, but that it's reliable most of the time. If a process requires experts to maintain, it's far from mainstream.
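
The boundary can be written down as a small decision function. This is a sketch under assumptions: RetryGate and RetryDecision are hypothetical names, and the three checks mirror the conditions discussed in this article (retryable terminal failure, remaining budget, continuable context) rather than HagiCode's exact code.

```csharp
using System;

// A transient-looking error with a live context and budget left: retry.
Console.WriteLine(RetryGate.Decide(isRetryableFailure: true, canContinueInSameContext: true, attempt: 0, retryBudget: 3)); // Retry

// The same error, but the context is gone: do not replay.
Console.WriteLine(RetryGate.Decide(isRetryableFailure: true, canContinueInSameContext: false, attempt: 0, retryBudget: 3)); // Fail

// Budget exhausted: surface the terminal failure.
Console.WriteLine(RetryGate.Decide(isRetryableFailure: true, canContinueInSameContext: true, attempt: 3, retryBudget: 3)); // Fail

public enum RetryDecision
{
    Retry,
    Fail
}

public static class RetryGate
{
    public static RetryDecision Decide(
        bool isRetryableFailure,
        bool canContinueInSameContext,
        int attempt,
        int retryBudget)
    {
        // Non-retryable errors are never replayed, no matter how much budget is left.
        if (!isRetryableFailure)
        {
            return RetryDecision.Fail;
        }

        // The budget is a hard upper bound on the number of retries.
        if (attempt >= retryBudget)
        {
            return RetryDecision.Fail;
        }

        // Without a continuable context, a retry would be a blind replay.
        return canContinueInSameContext ? RetryDecision.Retry : RetryDecision.Fail;
    }
}
```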

Three Most Worthwhile Practical Lessons

Let's wrap this up at the practical level. If you're implementing similar capability in your project, I most recommend guarding these three principles.

1. Retry Budget Must Have Boundaries

HagiCode's current default backoff rhythm:

10 seconds
20 seconds
60 seconds

This rhythm might not fit all systems, but "having boundaries" must be kept. Otherwise, auto-retry quickly transforms from recovery mechanism into disaster amplifier. Don't rush to name it grand—first see if it survives two iterations in your team.

2. Continuation Prompt Should Be Unified

The project uses fixed continuation prompts, letting subsequent attempts explicitly take "continue current context" path rather than initiating a fresh complete request. This isn't flashy, but it's indispensable in real projects. Many capabilities look like magic, but unpacked they're just polished engineering workflows.
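
The first two lessons fit into one small loop. The sketch below is a simplified, non-streaming stand-in for the coordinator, assuming the attempt counter starts at 0. RetryLoop, the "error" / "failed" string results, and the continuation prompt text are all hypothetical; only the 10/20/60-second schedule and the "original prompt first, continuation prompt afterwards" rule come from this article.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var prompts = new List<string>();
var failuresLeft = 2;

// Fake provider call: fails twice, then succeeds, and records each prompt it receives.
Task<string> AttemptAsync(string receivedPrompt)
{
    prompts.Add(receivedPrompt);
    return Task.FromResult(failuresLeft-- > 0 ? "error" : "done");
}

var result = await RetryLoop.RunAsync(
    prompt: "Refactor the parser",
    retrySchedule: new[] { TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(60) },
    executeAttemptAsync: AttemptAsync,
    canRetryInSameContext: () => true,
    delayAsync: (_, _) => Task.CompletedTask, // skip real waiting in the demo
    cancellationToken: CancellationToken.None);

Console.WriteLine(result);          // done
Console.WriteLine(prompts.Count);   // 3
Console.WriteLine(prompts[1] == RetryLoop.ContinuationPrompt); // True

public static class RetryLoop
{
    // Hypothetical wording; the point is that every retry sends the same fixed prompt.
    public const string ContinuationPrompt = "Continue from where you left off.";

    public static async Task<string> RunAsync(
        string prompt,
        IReadOnlyList<TimeSpan> retrySchedule,
        Func<string, Task<string>> executeAttemptAsync,
        Func<bool> canRetryInSameContext,
        Func<TimeSpan, CancellationToken, Task> delayAsync,
        CancellationToken cancellationToken)
    {
        for (var attempt = 0; ; attempt++)
        {
            // Round 1 sends the original prompt; later rounds continue the existing context.
            var attemptPrompt = attempt == 0 ? prompt : ContinuationPrompt;
            var outcome = await executeAttemptAsync(attemptPrompt);
            if (outcome != "error")
            {
                return outcome;
            }

            // The schedule length is the retry budget: no entry left means no more retries.
            if (attempt >= retrySchedule.Count || !canRetryInSameContext())
            {
                return "failed";
            }

            await delayAsync(retrySchedule[attempt], cancellationToken);
        }
    }
}
```

With a three-entry schedule, this loop makes at most four attempts and waits at most 90 seconds in total before giving up.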

3. Both Shared Library and Adapter Layer Need Mirror Tests

I want to emphasize this. Many teams write tests in shared runtime and call it good enough. It's not.

What makes me confident about HagiCode is both layers have test coverage:

Shared Provider tests "whether auto-resume actually occurred"
Adapter layer tests "whether final errors and streaming messages were corrupted"

I additionally ran two related test suites this time—all 31 cases passed. This result itself doesn't prove perfect design, but it shows at least one thing: current auto-retry isn't a paper plan—it's capability constrained by both code and tests. Talk is cheap. Show me the code. Fits perfectly here.

Summary

If we compress this article to one sentence:

Auto-retry for Agent CLIs like Claude Code and Codex is best implemented not as local tricks inside some Provider, but as a combination of shared coordinator + strategy snapshot + context determination + mirror tests.

Benefits are very real:

Logic is written once and reused by multiple Providers
Whether a request allows retrying travels reliably along the execution chain
Continue when there is context; stop when there isn't
The frontend sees stable completion or failure states, not abandoned intermediate noise

This solution was polished through HagiCode's real integration of multiple Agent CLIs. Who says AI-assisted programming isn't the new pair programming? Models help you start, complete, diverge—but what ultimately determines experience ceiling is context, workflow, and constraints.

If this helped you, also feel free to check out HagiCode's public entry points:

GitHub: github.com/HagiCode-org/site
Official site: hagicode.com
30-minute demo: www.bilibili.com/video/BV1pirZBuEzq/
Desktop install: hagicode.com/desktop/
Steam: Steam Store Page (Wishlist / Details)

HagiCode is now on Steam—this isn't vaporware, link's right here. If you're on Steam, wishlist it and click through. More direct than me saying ten sentences here.

That's it for now—see you in real projects.

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.

Author: newbe36524
Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-02-11-agent-cli-automatic-retry%2F
License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.

SQLite Sharding in Practice: A Deep Comparison of Three Sharding Strategies

Hagicode — Fri, 17 Apr 2026 05:53:37 +0000

SQLite Sharding in Practice: A Deep Comparison of Three Sharding Strategies

When single-file SQLite hits concurrency bottlenecks, how do we break through? This article shares three SQLite sharding approaches from different scenarios in the HagiCode project, helping you understand how to choose the right sharding strategy.

Hello everyone, I'm Yu Kun, producer of HagiCode.

Background

When building high-performance applications, single-file SQLite databases encounter very real problems. As user volume and data grow, these issues start lining up at your door:

Write operations begin queuing, response times visibly increase
Query performance declines as data grows
Frequent "database is locked" errors during multi-threaded access

Many people's first reaction is: should we just migrate to PostgreSQL or MySQL? While this approach can solve the problem, deployment complexity rises sharply. Is there a lighter-weight solution?

The answer is: sharding. Ultimately, engineering problems must be solved with engineering methods. By distributing data across multiple SQLite files, you can significantly improve concurrency and query performance while maintaining SQLite's lightweight characteristics.

About HagiCode

The approaches shared in this article come from our practical experience in the HagiCode project. As an AI code assistant project, HagiCode needs to handle large volumes of conversation messages, state persistence, and event history records. It was in solving these real-world problems that we summarized three different sharding approaches for different scenarios.

To do good work, one must first sharpen one's tools—but how to use these "tools" depends on the specific "work" at hand.

Our code repository is at github.com/HagiCode-org/site—friends who are interested are welcome to dive deeper.

Overview of Three Sharding Approaches

Through analysis of the HagiCode codebase, we discovered three SQLite sharding approaches for different business scenarios:

Session Message Sharded Storage: AI conversation message storage, characterized by high-frequency writes and session-based isolated queries
Orleans Grain Sharded Storage: Distributed framework state persistence, characterized by cross-node access requiring deterministic routing
Hero History Sharded Storage: Gamification system historical event records, characterized by event sourcing requiring migration compatibility

Although the business scenarios differ, all three follow the same core design principles:

Deterministic routing: Directly calculate shards from business IDs, no metadata tables needed
Transparent access: Upper layers operate through unified interfaces, unaware of sharding
Independent storage: Each shard is a completely independent SQLite file
Concurrency optimization: WAL mode + busy_timeout reduces lock contention

Many people ask: Why not build a universal sharding solution? This is a very practical question. Here's our conclusion: In engineering, there are no universal solutions, only approaches that best fit the current business scenario. Next, we'll deeply compare the specific implementations of these three approaches.

Sharding Strategy Comparison

Shard Count and Naming Rules

Aspect	Session Message	Orleans Grain	Hero History
Shard Count	256 (16²)	100	10
Naming Rule	Hexadecimal (00-ff)	Decimal (00-99)	Decimal (0-9)
Storage Directory	`DataDir/messages/`	`DataDir/orleans/grains/`	`DataDir/hero-history/`
Filename Pattern	`{shard}.db`	`grains-{shard}.db`	`{shard}.db`

Why such significant differences in shard count? This depends on business characteristics. In other words, models will change, tools will evolve, workflows will upgrade, but the engineering fundamentals remain: you must first understand what problem you're solving.

Session Message uses 256 shards because conversation messages have the highest write frequency, requiring more shards to distribute load
Orleans Grain uses 100 shards, balancing concurrency performance with management complexity
Hero History uses only 10 shards because historical events have lower write frequency and migration costs must be considered
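
The naming rules in the table above translate into three small path builders. This is a sketch: ShardPaths and its method names are hypothetical, and the directory layout is taken directly from the table rather than from HagiCode's source.

```csharp
using System;
using System.IO;

Console.WriteLine(Path.GetFileName(ShardPaths.SessionMessage("data", 0xAB))); // ab.db
Console.WriteLine(Path.GetFileName(ShardPaths.OrleansGrain("data", 7)));      // grains-07.db
Console.WriteLine(Path.GetFileName(ShardPaths.HeroHistory("data", 3)));       // 3.db

public static class ShardPaths
{
    // 256 shards, two lowercase hex characters: 00.db through ff.db
    public static string SessionMessage(string dataDir, int shard)
    {
        EnsureInRange(shard, 256);
        return Path.Combine(dataDir, "messages", $"{shard:x2}.db");
    }

    // 100 shards, two decimal digits: grains-00.db through grains-99.db
    public static string OrleansGrain(string dataDir, int shard)
    {
        EnsureInRange(shard, 100);
        return Path.Combine(dataDir, "orleans", "grains", $"grains-{shard:D2}.db");
    }

    // 10 shards, one decimal digit: 0.db through 9.db
    public static string HeroHistory(string dataDir, int shard)
    {
        EnsureInRange(shard, 10);
        return Path.Combine(dataDir, "hero-history", $"{shard}.db");
    }

    private static void EnsureInRange(int shard, int shardCount)
    {
        if (shard < 0 || shard >= shardCount)
        {
            throw new ArgumentOutOfRangeException(nameof(shard));
        }
    }
}
```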

Routing Algorithm Differences

Routing algorithms are the core of sharding approaches, determining how data distributes across shards. The three approaches use different routing strategies:

// Session Message: last two hexadecimal characters of the GUID
var normalized = Guid.Parse(sessionId.Value).ToString("N").ToLowerInvariant();
return normalized[^2..];  // "00" through "ff", i.e. 256 shards

// Orleans Grain: extract digits, then take the last two digits modulo the shard count
var digits = ExtractDigits(grainId);  // Assumed to return digit values (0-9), not chars
var lastTwoDigits = (digits[^2] * 10) + digits[^1];  // Requires at least two digits in the ID
return lastTwoDigits % shardCount;

// Hero History: ASCII value of the last character, modulo 10
return heroId[^1] % 10;

Design Logic Analysis:

Session Message IDs are GUIDs; after converting to hexadecimal and taking the last two digits, you get evenly distributed 256 shards
Orleans Grain ID formats are inconsistent, possibly containing both letters and numbers, so all digits are extracted before modulo
Hero History IDs are strings; directly using the last character's ASCII value modulo is simple but distribution may not be even enough
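
The uneven distribution of ASCII modulo is easy to see with a worked example. The sketch below assumes, purely for illustration, that IDs end in a uniformly distributed lowercase hex character (as a GUID-like string would); the article only says Hero History IDs are strings. AsciiModulo is a hypothetical helper.

```csharp
using System;

var perShard = AsciiModulo.LoadPerShard("0123456789abcdef", shardCount: 10);
for (var shard = 0; shard < perShard.Length; shard++)
{
    Console.WriteLine($"shard {shard}: {perShard[shard]} of 16 possible last characters");
}

public static class AsciiModulo
{
    // Counts how many characters of the alphabet land on each shard under "ASCII value % shardCount".
    public static int[] LoadPerShard(string lastCharAlphabet, int shardCount)
    {
        var counts = new int[shardCount];
        foreach (var c in lastCharAlphabet)
        {
            counts[c % shardCount]++;
        }

        return counts;
    }
}
```

Digits '0' to '9' (ASCII 48 to 57) cover all ten shards once, but 'a' to 'f' (ASCII 97 to 102) land again on shards 7, 8, 9, 0, 1, and 2. Under this assumption those six shards each receive 2/16 of the IDs while shards 3 through 6 receive 1/16, so the busiest shards carry twice the load of the quietest.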

Key Point: Regardless of which algorithm is used, the same ID must always map to the same shard. This is the most basic requirement of sharding: if the mapping shifts, reads look in a shard the data was never written to. Ultimately, unstable routing means all effort is wasted.

Initialization Strategy Differences

Aspect	Session Message	Orleans Grain	Hero History
Initialization Timing	On-demand lazy loading	Startup full parallel initialization	On-demand lazy loading
Concurrency Control	Lazy<Task> prevents duplicate initialization	Parallel.ForEachAsync	Lazy<Task> prevents duplicate initialization

Why does Orleans Grain choose full initialization at startup?

Because Orleans is a distributed framework, Grains may be scheduled to any node. If shard files are discovered missing only at runtime, requests will fail. Full initialization at startup extends startup time but ensures runtime stability. Getting it running is just the beginning; keeping it maintainable is real skill.

Lazy Loading Advantages:

For Session Message and Hero History, lazy loading reduces startup time—files and Schema are created only when actually accessing a specific shard. Using Lazy<Task> prevents race conditions during concurrent initialization. This design looks simple, but saves a lot of unnecessary trouble in real projects.
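
Here is a minimal sketch of the Lazy<Task> pattern on its own, with the database work replaced by a short delay. OnceInitializer is a hypothetical class, not HagiCode code; it only demonstrates that fifty concurrent callers for the same shard trigger a single initialization.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var shardInitializer = new OnceInitializer();

// Fifty concurrent callers all ask for shard "a7".
await Task.WhenAll(
    Enumerable.Range(0, 50)
        .Select(_ => Task.Run(() => shardInitializer.EnsureInitializedAsync("a7"))));

Console.WriteLine(shardInitializer.InitializationCount); // 1

public sealed class OnceInitializer
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _tasks = new();
    private int _count;

    public int InitializationCount => Volatile.Read(ref _count);

    public Task EnsureInitializedAsync(string shardKey)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only one is stored,
        // and only the stored one ever has .Value read, so the work runs once per key.
        return _tasks.GetOrAdd(shardKey, key => new Lazy<Task>(() => InitializeAsync(key))).Value;
    }

    private async Task InitializeAsync(string shardKey)
    {
        Interlocked.Increment(ref _count);
        await Task.Delay(20); // stand-in for creating the file and schema of this shard
    }
}
```

Storing a plain Task in the dictionary instead would not give the same guarantee, because the value factory of GetOrAdd can run more than once and each run would start real work.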

Schema Design Characteristics

The three approaches' Schema designs reflect their respective business characteristics:

Session Message:

Supports Event Sourcing pattern (event table + snapshot table)
Includes message content block sub-table (MessageContentBlocks)
Has compression and compression flag fields, supporting future optimization

Orleans Grain:

Minimalist design: single table GrainState
JSON serialization for state storage
ETag optimistic concurrency control

Hero History:

Timeline query optimization indexes
DedupeKey unique constraint prevents duplication
Supports multiple event types and statuses

From these designs, we can see Schema design should closely fit business requirements, not pursue generality. Orleans Grain's simple design exists because it only needs to store serialized state, without complex query capabilities. This isn't magic—it's engineering. Don't rush to give things grand names—first see if they can survive two iterations in the team.

Concurrency Configuration Comparison

All three approaches use the same SQLite concurrency optimization configuration:

PRAGMA journal_mode=WAL;      -- Write-ahead logging mode
PRAGMA synchronous=NORMAL;     -- Reduce persistence overhead
PRAGMA busy_timeout=5000;      -- 5 second busy wait
PRAGMA foreign_keys=ON;        -- Foreign key constraints

WAL Mode Advantages:

In traditional rollback journal mode, readers and writers block each other, while WAL mode lets readers keep reading while a write is in progress. Each database file still allows only one writer at a time, which is exactly where spreading writes across shards helps. This can significantly improve performance under concurrent read/write load. Many people don't know this configuration, but it's more important than you think.

synchronous=NORMAL Trade-off:

Setting to FULL guarantees maximum safety but significantly reduces performance. In WAL mode, NORMAL still protects the database from corruption, but the most recent commits may be rolled back after a power loss or OS crash. For most applications that is an acceptable balance between safety and performance. Don't struggle with this configuration too long; NORMAL is enough.

How to Choose a Sharding Strategy

Based on analysis of HagiCode's three approaches, we can summarize this decision matrix:

High throughput scenarios → More shards (e.g., Message uses 256)
Simple maintainability     → Fewer shards (e.g., Hero History uses 10)
Mostly numeric IDs         → Modulo algorithm (Orleans Grain)
Mostly GUIDs              → Hexadecimal suffix (Session Message)
String IDs                → ASCII modulo (Hero History)

Experience Values for Shard Count Selection:

Too few (< 10): Limited concurrency improvement, little sharding benefit
Too many (> 1000): Complex file management, high connection pool overhead
Experience value: 10-100 shards suitable for most scenarios
Extremely high concurrency scenarios: Consider 256 shards

This might look exciting in demos, but once you're in production, every cost must be calculated carefully. Many problems aren't impossible—just haven't had their costs properly calculated.

Implementation Guide

Implement Standardized Shard Router

public interface IShardResolver<TId>
{
    string ResolveShardKey(TId id);
}

// Hexadecimal sharding (for GUIDs)
public class HexSuffixShardResolver : IShardResolver<string>
{
    private readonly int _suffixLength;

    public HexSuffixShardResolver(int suffixLength = 2)
    {
        _suffixLength = suffixLength;
    }

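    public string ResolveShardKey(string id)
    {
        // Strip GUID hyphens and normalize case so "AB" and "ab" route to the same shard.
        var normalized = id.Replace("-", "").ToLowerInvariant();
        if (normalized.Length < _suffixLength)
        {
            throw new ArgumentException("ID is shorter than the shard suffix length.", nameof(id));
        }

        return normalized[^_suffixLength..];
    }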
}

// Numeric modulo sharding (for pure numeric IDs)
public class NumericModuloShardResolver : IShardResolver<long>
{
    private readonly int _shardCount;

    public NumericModuloShardResolver(int shardCount)
    {
        _shardCount = shardCount;
    }

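    public string ResolveShardKey(long id)
    {
        // C#'s % keeps the sign of the dividend, so fold negative IDs into [0, shardCount).
        var shard = ((id % _shardCount) + _shardCount) % _shardCount;
        // "D2" zero-pads to two digits, matching the 00-99 naming used with up to 100 shards.
        return shard.ToString("D2");
    }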
}

Unified Connection Factory Pattern

// Assumed options contract: the factory reads _options.BaseDirectory, so TOptions must expose it.
public interface IShardStorageOptions
{
    string BaseDirectory { get; }
}

public class ShardedConnectionFactory<TOptions, TDbContext>
    where TOptions : IShardStorageOptions
    where TDbContext : class
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _initializationTasks = new();
    private readonly TOptions _options;
    private readonly IShardSchemaInitializer _initializer;

    public ShardedConnectionFactory(
        TOptions options,
        IShardSchemaInitializer initializer)
    {
        _options = options;
        _initializer = initializer;
    }

    public async Task<TDbContext> CreateAsync(string shardKey, CancellationToken ct)
    {
        var connectionString = BuildConnectionString(shardKey);

        // Use Lazy<Task> so concurrent callers share one initialization per shard.
        // The shared task must not capture the first caller's token; otherwise one
        // cancelled request would leave a cancelled task cached for everyone else.
        var initTask = _initializationTasks.GetOrAdd(
            connectionString,
            cs => new Lazy<Task>(() => InitializeShardAsync(cs, CancellationToken.None))
        );

        // Each caller can still stop waiting through its own token.
        await initTask.Value.WaitAsync(ct);
        return CreateDbContext(connectionString);
    }

    private async Task InitializeShardAsync(string connectionString, CancellationToken ct)
    {
        await _initializer.InitializeAsync(connectionString, ct);
    }

    private string BuildConnectionString(string shardKey)
    {
        var shardPath = Path.Combine(_options.BaseDirectory, $"{shardKey}.db");
        return $"Data Source={shardPath}";
    }

    private static TDbContext CreateDbContext(string connectionString)
    {
        // Placeholder: assumes TDbContext has a public constructor taking the connection string.
        // Replace with the factory your ORM provides.
        return (TDbContext)Activator.CreateInstance(typeof(TDbContext), connectionString)!;
    }
}

Schema Initialization Best Practices

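using Dapper;                  // Assumed source of the ExecuteAsync extension method
using Microsoft.Data.Sqlite;   // SqliteConnection

public class SqliteShardInitializer : IShardSchemaInitializer
{
    public async Task InitializeAsync(string connectionString, CancellationToken ct)
    {
        await using var connection = new SqliteConnection(connectionString);
        await connection.OpenAsync(ct);

        // Concurrency optimization configuration.
        // journal_mode=WAL is stored in the database file; the other three PRAGMAs only
        // affect this connection and must be reapplied on every connection opened later.
        await connection.ExecuteAsync("""
            PRAGMA journal_mode=WAL;
            PRAGMA synchronous=NORMAL;
            PRAGMA busy_timeout=5000;
            PRAGMA foreign_keys=ON;
        """);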

        // Create table structure
        await connection.ExecuteAsync("""
            CREATE TABLE IF NOT EXISTS Entities (
                Id TEXT PRIMARY KEY,
                CreatedAt TEXT NOT NULL,
                UpdatedAt TEXT NOT NULL,
                Data TEXT NOT NULL,
                ETag TEXT
            );
        """);

        // Create indexes
        await connection.ExecuteAsync("""
            CREATE INDEX IF NOT EXISTS IX_Entities_CreatedAt
            ON Entities(CreatedAt DESC);

            CREATE INDEX IF NOT EXISTS IX_Entities_UpdatedAt
            ON Entities(UpdatedAt DESC);
        """);
    }
}

Key Considerations

1. Routing Stability

Routing algorithms must guarantee the same ID always maps to the same shard. Avoid using random or time-related calculations, and don't introduce mutable parameters in the algorithm.
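
One trap worth naming in .NET: string.GetHashCode() is randomized per process on .NET Core and later, so it must never feed a shard router. If you need to hash arbitrary string IDs, use a hash you control. The sketch below uses 32-bit FNV-1a, which is my choice for illustration and not what HagiCode uses; StableShardRouter is a hypothetical name.

```csharp
using System;
using System.Text;

var firstRoute = StableShardRouter.Route("hero-42", shardCount: 10);
var secondRoute = StableShardRouter.Route("hero-42", shardCount: 10);
Console.WriteLine(firstRoute == secondRoute); // True, in this process and in every other one

public static class StableShardRouter
{
    private const uint FnvOffsetBasis = 2166136261u;
    private const uint FnvPrime = 16777619u;

    public static int Route(string id, int shardCount)
    {
        if (string.IsNullOrEmpty(id))
        {
            throw new ArgumentException("ID must not be empty.", nameof(id));
        }

        if (shardCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(shardCount));
        }

        // FNV-1a over the UTF-8 bytes: xor each byte in, then multiply by the prime.
        // The result depends only on the bytes of the ID, never on process state or time.
        var hash = FnvOffsetBasis;
        foreach (var b in Encoding.UTF8.GetBytes(id))
        {
            unchecked
            {
                hash ^= b;
                hash *= FnvPrime;
            }
        }

        return (int)(hash % (uint)shardCount);
    }
}
```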

2. Shard Count Selection

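Shard count should be determined in the design phase; later modification is very difficult, because changing the count changes which shard existing IDs route to, so existing data has to be redistributed. Consider: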

Current and future concurrency levels
Management cost per individual shard
Data migration complexity
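
To get a feel for the migration cost, count how many IDs would change shard under plain modulo routing when the shard count changes. This is a worked example with made-up numbers (IDs 0 to 799, going from 10 to 16 shards); Resharding is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sampleIds = Enumerable.Range(0, 800).Select(i => (long)i);
var movedCount = Resharding.CountMoved(sampleIds, oldCount: 10, newCount: 16);
Console.WriteLine($"{movedCount} of 800 IDs change shard"); // 700 of 800 IDs change shard

public static class Resharding
{
    // An ID has to be migrated when modulo routing sends it to a different shard than before.
    public static int CountMoved(IEnumerable<long> ids, int oldCount, int newCount)
    {
        return ids.Count(id => id % oldCount != id % newCount);
    }
}
```

Only IDs whose remainder is the same under both counts stay put, which here means one ID in eight. The other 87.5% of rows would have to be copied to a different file, which is why the count is effectively a design-time decision.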

3. Migration Considerations

The Hero History approach demonstrates a complete migration path:

Build new sharded storage infrastructure
Implement migration service to copy main database data to shards
Verify query compatibility after migration
Switch read/write paths to shards
Clean up old main database tables

Design sharding approaches with future migration needs in mind. Talk is cheap. Show me the code—but code alone isn't enough; you need a complete migration path. One success doesn't make a system; sustained success does.

4. Monitoring and Operations

Monitor each shard's size distribution, detect data skew promptly
Set up alerts for shard hotspots, avoid single shard becoming bottleneck
Regularly check WAL file sizes, prevent excessive disk space usage
Establish shard health check mechanisms
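
A simple skew metric is the ratio between the largest shard and the average shard. The sketch below is one possible check, not HagiCode's monitoring code; ShardSkew is a hypothetical helper and the sizes are made-up numbers. In a real job the sizes would come from the shard files, for example via FileInfo.Length.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sizesInBytes = new long[] { 100, 100, 100, 300 };
var skewRatio = ShardSkew.MaxToMeanRatio(sizesInBytes);
Console.WriteLine(skewRatio);        // 2
Console.WriteLine(skewRatio > 1.5);  // True: the largest shard is well above average

public static class ShardSkew
{
    // Ratio of the largest shard to the average shard size; 1.0 means perfectly even.
    public static double MaxToMeanRatio(IReadOnlyCollection<long> shardSizesInBytes)
    {
        if (shardSizesInBytes.Count == 0)
        {
            throw new ArgumentException("At least one shard size is required.", nameof(shardSizesInBytes));
        }

        var mean = shardSizesInBytes.Average();
        return mean == 0 ? 1.0 : shardSizesInBytes.Max() / mean;
    }
}
```

The alert threshold (1.5 in the example) is arbitrary; pick one based on how uneven your routing is expected to be.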

5. Test Coverage

Test edge cases (empty ID, special characters, overly long ID)
Verify routing determinism, ensure same ID always maps to same shard
Concurrent write stress testing, verify lock contention is effectively mitigated
Migration testing, ensure data integrity and consistency

Conclusion

By comparing three SQLite sharding approaches in the HagiCode project, we can see:

No universal solution: Different business scenarios require different sharding strategies
Core principles are universal: Deterministic routing, transparent access, independent storage, concurrency optimization
Design for the future: Consider migration paths and operational costs

If your project is using SQLite and starting to encounter concurrency bottlenecks, I hope this article provides some ideas. There's no need to rush into migrating to heavyweight databases—sometimes an appropriate sharding approach can solve the problem.

Of course, sharding isn't a silver bullet. Before choosing a sharding approach, ensure:

You've optimized single-table query performance
You've used appropriate indexes
You've enabled WAL mode

Only after these optimizations are complete and performance bottlenecks still exist should you consider introducing sharding. Being able to do simple things well is itself a capability.

Many things are better done once than said once—now let the engineering results speak for themselves.

References

HagiCode Project Repository: github.com/HagiCode-org/site
SQLite WAL Mode Documentation: sqlite.org/wal.html
Orleans Distributed Framework: dotnet.github.io/orleans

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.

Author: newbe36524
Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-04-17-sqlite-sharding-strategies-comparison%2F
License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.