Forem: tumf

Conflux Release: A Spec-Driven Orchestrator for Parallel AI Development

tumf — Sat, 11 Apr 2026 04:20:14 +0000

Originally published on 2026-04-11
Original article (Japanese): Confluxをリリース: 仕様駆動でAI開発を並列に進めるオーケストレータ

Conflux is now released. It is a tool designed to move an entire AI coding workflow forward — implementation, acceptance, archiving, and beyond — with spec-driven development as the foundation.

Tools like Claude Code, Codex, and OpenCode have made “writing code” itself much easier. But in real development, the harder problems are different: how to keep the specification in front, how to run multiple changes safely in parallel, and where to place acceptance judgment.

Conflux was built to fill that gap. It is not about one-off code generation. It is an orchestration layer for steadily growing a substantial finished product by stacking changes over time.

This is what Conflux looks like. As a TUI (Text User Interface), it lets you inspect the progress of each change and the overall flow from the terminal.

What problem was I trying to solve?

When you introduce AI agents into a development workflow, things look very fast at first. But once the task becomes even slightly larger, the same issues keep appearing.

The spec is vague while implementation moves ahead
Changes collide with each other
It becomes unclear what is actually finished
The implementation role and the acceptance role get mixed together
The flow stops unless a human keeps watching it

In other words, what I needed was not just “a smarter single agent,” but an operating model that keeps multiple changes moving.

Conflux organizes that around a few principles:

Put the spec first
Split work into independent change units
Use git worktree for safe parallel progress
Separate the implementation role from the acceptance role
Keep the flow moving even when no human is actively watching

Conflux in one sentence

The README calls it a “spec-driven parallel coding orchestrator for AI agents.” In Japanese, I would describe it as an orchestrator for spec-driven AI development with parallel execution and role separation.

One important point is that Conflux itself is not tied to a single “best” model. It is designed around swappability. Different models are good at different things: some are fast but rough, some are slower but better at review, some work better through CLI tools, and some are better at long-form evaluation. In practice, separating roles is often more stable than asking one model to do everything.

How does the workflow look?

The basic idea is simple.

Define the spec and the intent of the change
Split work into change units
Let Conflux assign each change to an independent worktree
Move implementation forward
Run acceptance judgment
Archive successful changes and carry them to final merge

What matters here is that step 1 is not just note-taking. Conflux is not designed around “implement first, explain later.” It assumes that the spec and change intent come first, and implementation flows under that unit.

Conflux keeps looping through the latter half of this flow — implementation → acceptance → archive/merge — in a repeatable way. The multi-Ralph-loop diagram makes this easier to picture.

What this diagram shows is not a straight line where implementation happens once and ends. It is a continuing development loop: implementation is judged, failed work goes into another iteration, and accepted work is archived so the product keeps moving forward.
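The loop described above can be sketched in a few lines of Python. This is only an illustration of the implement → accept → archive cycle, not Conflux's actual code: the `Change` class, `run_loop`, and the `implement` / `accept` callbacks are hypothetical names introduced here, and the retry limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One independent change unit (hypothetical model, not Conflux's own type)."""
    name: str
    attempts: int = 0
    status: str = "pending"  # becomes "archived" or "failed"


def run_loop(changes, implement, accept, max_attempts=3):
    """Repeat implement -> accept for each change until it is accepted or gives up.

    `implement(change)` returns some artifact; `accept(change, artifact)` returns
    True or False. Keeping them separate mirrors the split between the
    implementation role and the acceptance role.
    """
    archived = []
    for change in changes:
        while change.attempts < max_attempts:
            change.attempts += 1
            artifact = implement(change)
            if accept(change, artifact):
                change.status = "archived"
                archived.append(change.name)
                break
        else:
            # Attempts ran out without a single acceptance.
            change.status = "failed"
    return archived
```

In this sketch a rejected change simply goes through another iteration, while an accepted one is recorded as archived so that later work can build on it.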

The advantage of this flow is that you do not have to stuff everything into one huge prompt. Each change can stay small in context, which also makes it easier to control what you pass into an LLM (Large Language Model).

The smallest way to try it

Here is the minimum setup for trying Conflux locally. I assume you have Rust and cargo, plus at least one AI coding agent CLI installed.

There are only three steps:

Install Conflux
Initialize the config file
Start running it

# Install Conflux
cargo install cflx

# Initialize the config file
cflx init

# Launch the TUI
cflx

That is enough for a first check. If cflx launches, the basic setup is done.

If you want to try headless execution, use the following commands.

# Headless execution
cflx run

# Run only a specific change
cflx run --change add-feature-x

The simplest checks are:

Does cflx launch the TUI?
Does cflx init create a config template?
Does cflx run start the workflow?

What matters most in this first release

At this stage, I cared most about three things.

1. Treating the whole flow, not just one-off generation

There are already many options if all you want is “something that can write code.” But in practice, what matters is everything around that.

Define the spec
Split work into change units
Run work in parallel
Judge acceptance
Carry it through to merge

If a system does not cover that entire sequence, manual operation quickly creeps back in. Conflux is aimed at that whole flow from the start.

2. Building around parallel execution

Even running one change at a time with AI agents is useful. But as the number of changes increases, waiting time becomes obvious.

That is why Conflux uses git worktree to give each change its own independent work area. This makes it possible to move multiple changes forward in parallel with more safety.
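To make the "one worktree per change" idea concrete, here is a minimal sketch that only builds the `git worktree add` command for a change, without running it. The `.worktrees/` directory, the `change/<id>` branch naming, and the `worktree_command` helper are assumptions for illustration; Conflux's real layout may differ.

```python
from pathlib import PurePosixPath


def worktree_command(repo_root, change_id, base="main"):
    """Build a `git worktree add` command that gives one change its own work area.

    The directory layout and branch naming are hypothetical; only the
    `git -C <repo> worktree add -b <branch> <path> <commit-ish>` form is standard git.
    """
    path = PurePosixPath(repo_root) / ".worktrees" / change_id
    branch = f"change/{change_id}"
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, str(path), base]


# Example: the argument list you could hand to subprocess.run(...)
print(worktree_command("/repo", "add-feature-x"))
```

Because each change gets its own directory and branch, two agents can edit files at the same time without touching each other's checkout.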

But the point is not merely that it can run in parallel. Parallel execution itself is no longer unusual; there are already many systems that run multiple agents or multiple tasks at once.

What is still rare is the next part: treating acceptance, archiving, and final merge for those parallel changes as one continuous development flow.

Of course, parallelization does not automatically make everything faster. Strongly dependent changes still need ordering, and acceptance quality still matters. But at the very least, this is much easier to reason about than mixing everything into one worktree — and Conflux tries to cover the downstream flow as well.

3. Not locking into a single vendor

This was very intentional. AI tools are moving so fast that tightly coupling your workflow to one vendor or product tends to shorten its useful life.

Conflux treats agents as swappable components. For example, you can use Claude Code for implementation, and another model for review or acceptance.

Who is this for?

Right now, Conflux is an especially good fit for people who:

want to work in a spec-driven way
want AI agents to handle implementation while humans stay focused on spec and final judgment
want to run multiple changes at once
want to grow a real product over time instead of doing one-off generation

On the other hand, if you only want to generate one file quickly or just want lightweight code completion, Conflux is probably too much. In that case, a standalone agent CLI is likely the better fit.

Where to start

If you want to explore it, I recommend this order:

Read README.ja.md for the overall picture
Follow QUICKSTART.ja.md for initial setup

The best first test is simply to run cflx init and cflx, and see whether the feeling of “put the spec first, split changes, run them in parallel, and accumulate them with acceptance” matches your own development style.

Closing thoughts

This is the first release of Conflux.

What I wanted to build was not just another wrapper around AI code generation. It is an operational foundation that starts from the spec, runs multiple changes in parallel, judges them, and keeps moving toward a substantial finished product.

It is still early, but I think it is already becoming an interesting base for anyone who wants to bring spec-driven development and AI coding agents into a practical workflow. OpenSpec is only one implementation means for that today, and it may be replaced in the future by another representation or another spec layer. Even so, the core idea of putting the spec first should remain.

If that sounds interesting, try it locally.


Web Adapter Tool Agent: Turn Self-Learning Skills into "98% Average Token Reduction on Revisits," Measured

tumf — Mon, 09 Mar 2026 02:25:02 +0000

Originally published on 2026-03-09
Original article (Japanese): Web→Adapter→Tool→Agent: 自己学習型スキルで『再訪を実測で平均98%トークン削減』する

If you build web data extraction by having an LLM read raw HTML every time and "just figure it out," it usually ends up expensive, slow, and brittle.

It gets worse for use cases that revisit the same site repeatedly - news monitoring, documentation tracking, price change detection, and so on. You end up repeating the same failure modes over and over.

Problems like this are often better solved not with ever more heroic scraping tricks, but by accepting a simpler approach: once an extraction method works, freeze it as a reusable tool and keep using it from then on.

This article summarizes a design that turns scraping into a learned tool through a Web→Adapter→Tool→Agent transformation pipeline.

The original inspiration was web2cli (GitHub repository), which I introduced in an earlier article. If you take the idea of "Every website is a Unix command" and push it toward agent operations - revisits, token usage, and drift - it tends to converge in this direction.

More recently, along that line of thought, I added a self-learning skill called self-learning-web-adapter (skill: a package of procedures and tools given to an agent). The skill itself lives in skills/self-learning-web-adapter.

Why this hurts: passing raw HTML directly to an LLM increases cost

First, let’s align on the premise. By "LLM," I mean a Large Language Model used not only for text generation but also as an agent: a system in which the model calls external tools to get work done.

When you hand raw HTML to an LLM and ask it to extract information, the following costs pile up:

token cost is high
latency (processing wait time) increases
it breaks when the DOM (Document Object Model: the idea of treating HTML as a tree structure) changes even slightly
retries increase when extraction fails, which makes it even more expensive

Personally, my real feeling is: "Fine for the first time, maybe, but I do not want to repeat the same exploration on the second run and beyond."

Direction of the solution: confine exploration to one pass, make execution lightweight

The idea is simple. Stop re-scraping the Web from scratch every time, and transform it like this:

website
  ↓ (exploration: one pass)
adapter
  ↓ (freeze it)
tool / CLI
  ↓ (reuse)
agent

In this model, what the LLM does each time is no longer "interpret raw HTML," but "call a tool."

When it works well, the LLM input can be compressed down to a few hundred tokens of JSON (JavaScript Object Notation: a structured data format).

As a reference point, if you compare "raw HTML" with "adapter output" for a specific site such as a blog or marketing page, you can sometimes see input token reductions in the 95% to 99% range.

It is better not to oversell this. The first learning pass has its own cost, and results vary by site. But the overall direction is very stable: if the workload revisits the same site often, the payoff is usually easy to recover.

Measurement: how many tokens do revisits actually save?

Since the obvious question is "Does it really shrink that much?", here is a simple measurement.

For token counting, I used tiktoken (tokenizer: a mechanism that splits strings into tokens), counted with o200k_base.

The comparison uses three patterns:

pass raw HTML directly to the LLM
pass JSON output from a trained adapter to the LLM
pass JSON output from a web2cli-style wrapper to the LLM

Training used three articles per site, and evaluation used one different article as a holdout set.

Site	HTML tokens	Adapter tokens	web2cli tokens	Reduction (adapter vs HTML)	Reduction (web2cli vs HTML)
`blog.python.org`	15,057	265	351	98.24%	97.67%
`blog.rust-lang.org`	7,656	263	361	96.56%	95.28%
`vercel.com`	224,735	255	335	99.89%	99.85%

On average, the input token reduction looked like this:

average reduction rate (direct adapter output): 98.23%
average reduction rate (web2cli-style command output): 97.60%
average reduction amount (direct adapter output): 82,221.7 tokens / page
median reduction amount (direct adapter output): 14,792 tokens / page
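For anyone who wants to re-derive these averages, the following sketch recomputes them from the token counts in the table. The numbers come straight from the table; the variable and function names are just illustrative.

```python
from statistics import mean, median

# (HTML tokens, adapter tokens, web2cli tokens) per site, copied from the table
rows = {
    "blog.python.org": (15_057, 265, 351),
    "blog.rust-lang.org": (7_656, 263, 361),
    "vercel.com": (224_735, 255, 335),
}


def reduction_pct(html_tokens, output_tokens):
    """Share of input tokens removed, as a percentage rounded to two decimals."""
    return round((1 - output_tokens / html_tokens) * 100, 2)


adapter_pct = [reduction_pct(h, a) for h, a, _ in rows.values()]
web2cli_pct = [reduction_pct(h, w) for h, _, w in rows.values()]
saved_tokens = [h - a for h, a, _ in rows.values()]

print(round(mean(adapter_pct), 2))   # 98.23
print(round(mean(web2cli_pct), 2))   # 97.6
print(round(mean(saved_tokens), 1))  # 82221.7
print(median(saved_tokens))          # 14792
```

The gap between the mean and the median comes almost entirely from the single very heavy page.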

There are two key takeaways:

Raw HTML alone can be tens of thousands to hundreds of thousands of tokens, depending on the site
Once learned, the system can compress only the needed information into a few hundred tokens of JSON

For especially heavy pages like vercel.com, the adapter cut more than 220k tokens per page.

A few caveats are worth noting too:

this is still a small-scale measurement over only three sites
the extracted fields are mainly limited to title, author, and published
the first access includes learning cost, so the real benefit appears on revisits

A rough estimate can be made with this formula:

saved_cost = saved_tokens_per_page * pages_per_month / 1_000_000 * model_input_price

If your workload is mostly lightweight articles, it is safer to reason from the median value (14,792 tokens / page). If you deal with many SPAs (Single Page Application: a web app that navigates within a single page) or marketing pages, it may skew closer to the average value (82,221.7 tokens / page).
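The formula translates directly into a small helper. In the usage example, the page volume and the price per million input tokens are placeholder assumptions, not quoted prices; substitute your own model's rate.

```python
def saved_cost(saved_tokens_per_page, pages_per_month, model_input_price):
    """Monthly savings, where model_input_price is the price per 1M input tokens."""
    return saved_tokens_per_page * pages_per_month / 1_000_000 * model_input_price


# Median case from the measurement, with hypothetical volume and pricing:
# 10,000 pages per month at 2.50 (currency units) per 1M input tokens.
monthly = saved_cost(14_792, 10_000, 2.50)
print(f"{monthly:.2f}")  # 369.80
```

Running the same calculation with the average value instead of the median gives a feel for how much a few heavy pages can shift the estimate.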

What is an Adapter? A contract that encapsulates site-specific differences

An Adapter is a configuration plus a set of rules that captures "for this host (a domain like example.com), extract data this way."

The important point is that the adapter remains not as LLM reasoning, but as an extraction contract.

For an article page such as a blog post, these are the typical fields you want:

title (title)
author (author)
published (publication datetime)

You also make the extraction strategy explicit - which information source should be prioritized:

JSON-LD (JSON for Linking Data: structured metadata embeddable in HTML)
Open Graph protocol (OG: a meta tag specification for social sharing) and ordinary meta tags
CSS selectors (CSS selector: a notation for targeting HTML elements)
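As a rough sketch of that priority order, the code below tries JSON-LD first, then Open Graph, and only then a structural fallback. It uses only the standard library, so the last step reads the first `<h1>` as a stand-in for a real CSS selector. The `extract_title` function and the `headline` / `og:title` choices are assumptions for illustration; the skill itself may be implemented differently.

```python
import json
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collect JSON-LD blocks, meta tags, and the first <h1> in one pass."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # property/name -> content
        self.json_ld = []   # parsed JSON-LD payloads
        self.h1 = None      # text of the first <h1>
        self._capture = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta.setdefault(key, a["content"])
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._capture, self._buf = "jsonld", []
        elif tag == "h1" and self.h1 is None and self._capture is None:
            self._capture, self._buf = "h1", []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._capture == "jsonld" and tag == "script":
            try:
                self.json_ld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD: fall through to the next source
            self._capture = None
        elif self._capture == "h1" and tag == "h1":
            self.h1 = "".join(self._buf).strip()
            self._capture = None


def extract_title(html):
    """Return (title, source) using the order JSON-LD -> Open Graph -> selector."""
    c = _Collector()
    c.feed(html)
    c.close()
    for obj in c.json_ld:
        if isinstance(obj, dict) and obj.get("headline"):
            return obj["headline"], "json-ld"
    if c.meta.get("og:title"):
        return c.meta["og:title"], "open-graph"
    if c.h1:
        return c.h1, "selector"
    return None, "none"
```

Returning the source alongside the value makes it easy to see later which strategy an adapter actually depends on for a given host.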

Another practical point is the ability to determine mechanically whether something has broken.

This is where a DOM fingerprint comes in (DOM fingerprint: a signature of DOM structure). During training, you save structural features of the DOM. At runtime, if the current page deviates from that signature, you treat it as drift (structural change) and send it to retraining.
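One simple way to build such a signature, shown here only as a sketch, is to hash the set of tag paths in the document and ignore all text. The `dom_fingerprint` function and the exact hashing scheme are assumptions of mine; the skill's real fingerprint may use different features and a more tolerant comparison.

```python
import hashlib
from html.parser import HTMLParser


class _TagPaths(HTMLParser):
    """Record every distinct tag path such as html/body/article/h1."""

    VOID = {"meta", "link", "br", "img", "input", "hr"}  # tags with no closing tag

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed children.
            while self.stack and self.stack.pop() != tag:
                pass


def dom_fingerprint(html):
    """Hash the structural shape of a page; text content does not affect it."""
    parser = _TagPaths()
    parser.feed(html)
    parser.close()
    joined = "\n".join(sorted(parser.paths))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


trained = dom_fingerprint("<html><body><article><h1>One</h1><p>x</p></article></body></html>")
revisit = dom_fingerprint("<html><body><article><h1>Two</h1><p>y</p></article></body></html>")
redesign = dom_fingerprint("<html><body><div><h2>Two</h2></div></body></html>")
print(trained == revisit, trained == redesign)  # True False
```

An exact hash match is the strictest possible check. A practical drift detector would more likely compare feature sets with some tolerance, since small layout tweaks should not always trigger retraining.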

What is a Tool/CLI? A one-line "web interface"

You can leave an adapter as-is, but if you want agents to use it, it is easier to lower it all the way down into a CLI (Command Line Interface: a tool callable from the terminal).

The ideal is simply: "pass a URL, get back JSON."

# Example: extract a rust-lang blog article as structured data (conceptual)
site-article https://blog.rust-lang.org/2026/03/05/some-post.html

Once you have this form, prompt design on the agent side becomes much simpler.

You can write instructions like: "Call this command, then use only title and published from the returned JSON."

web2cli takes a similar approach by turning the Web into commands. This design applies that same "Every website is a Unix command" idea to agent operations: revisits, token usage, and drift.

Example: running the self-learning skill `self-learning-web-adapter`

From here on, this section gets more "skill-oriented."

This skill is designed for repeatedly reading the same host. It encapsulates site-specific differences into an adapter and reuses them.

Its goals are as follows:

Why: from the second run onward, I do not want to repeat scraping exploration
What: return URL -> structured JSON (title/author/published + health diagnostics)
Prereq: Python 3.10+, network reachability
Verify: python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py run <url> outputs JSON

1) Setup

If you only want to use the skill, you do not need to clone the repository.

Adding the skill can be done with npx (npm package runner: a mechanism for running a CLI temporarily), which comes with Node.js.

npx skills add tumf/self-learning-web-adapter

The dependencies are minimal. For HTML parsing, install Beautiful Soup (an HTML parser).

(Python dependencies are not resolved by npx, so this part must be installed separately.)

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pip
python3 -m pip install beautifulsoup4

Note: the commands below assume that skills/self-learning-web-adapter/ has been added directly under the current directory. If it was installed elsewhere, just adjust the path.

2) Prepare training samples (3 or more from the same host)

The rule for this skill is simple: training (learn) requires "3 or more URLs from the same host."

For a blog, it is usually safer to choose around three articles from the same author or category.

3) Learn -> run to freeze the behavior

If you pass representative pages from the same host, the adapter is saved to adapter_registry/<host>.json.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py learn <url1> <url2> <url3>

Once training succeeds, run it against a different URL (a holdout page not used in training).

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py run <holdout-url>

The output contains not only extraction results, but also diagnostic fields such as signature_known and extraction_health.score.

That is one reason this leans toward a "skill": it turns not only extraction, but also failure handling, into a reusable tool.

4) Drift checks and retraining

check returns JSON just like run, but it is intended to answer: "Does this look like it needs retraining?"

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py check <url>

If needs_retrain: true is set, send it through retraining.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py retrain <host>
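If you want to automate this step, a small wrapper can read the check output and build the retrain command only when it is needed. This is a sketch under one assumption: that `needs_retrain` appears as a top-level boolean key in the JSON. The `retrain_command` helper is a hypothetical name, and the exact output shape should be confirmed against the skill's real output.

```python
import json
from urllib.parse import urlparse

CLI = "skills/self-learning-web-adapter/scripts/web_adapter_cli.py"


def retrain_command(url, check_output):
    """Return the retrain command for the URL's host, or None if no retrain is needed.

    Assumes `check_output` is the JSON text printed by `check`, with a
    top-level `needs_retrain` flag (an assumption about the output shape).
    """
    report = json.loads(check_output)
    if report.get("needs_retrain") is not True:
        return None
    host = urlparse(url).hostname
    return ["python3", CLI, "retrain", host]


# Example with a hand-written payload standing in for real `check` output
print(retrain_command("https://blog.rust-lang.org/some-post.html", '{"needs_retrain": true}'))
```

The returned list can be passed to `subprocess.run` from a scheduled job, so drift on a monitored host leads to retraining without a human in the loop.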

5) Export to a web2cli-style command

This is where it starts to feel like a real skill.

You take a trained adapter and lower it into a single web2cli-style command.

python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py export-command <host>
python3 skills/self-learning-web-adapter/scripts/web_adapter_cli.py commands

The exported commands are placed in web2cli_commands/, and web2cli_commands/index.json becomes the registry (the command index).

From the agent’s point of view, this is the moment when "a site has become a tool."

Design intuition: how to bias toward skills that work well

The following patterns tend to work well in practice:

Suspect JSON-LD first
Then fall back to Open Graph and ordinary meta
Treat CSS selectors as the last escape hatch
Treat failures not as "exceptions," but as "health checks" (check)
Do not aim for perfection immediately; narrow the fields you want first (start with something like title and date)

And the anti-patterns look like this:

trusting a CSS selector that happened to work on one page, without evidence
embedding extraction logic in the agent prompt and executing it every time
fixing breakage as a one-off patch and never preserving it as a learned artifact

Conclusion: turn "reading the Web" into a tooling problem

With the Web→Adapter→Tool→Agent model, scraping changes from "try hard to read the page" into "build a reusable tool."

This transformation is especially effective for workloads that revisit the same site repeatedly.

Here are a few concrete next steps:

Pick one domain you read often, and narrow the required fields to three (title / published / url is a good start)
Build a working extractor with the priority order JSON-LD -> meta -> CSS
Add a DOM fingerprint and check, then move toward a design that automatically retrains when it breaks


Summary of the Web3 Industry in 2025: Technologies Implemented as Products

tumf — Fri, 06 Feb 2026 00:43:51 +0000

Originally published on 2026-01-02
Original article (Japanese): Web3業界2025年総括: プロダクトとして実装された技術たち

Looking back at the blockchain industry in 2025, the most symbolic development was that "technologies became tangible as products."

After long discussions, Account Abstraction was implemented on the mainnet as EIP-7702, fragmented Layer 2 solutions connected as Superchain, and DeFi integrated application layers through Hooks.

In this article, we will focus on the technical implementations that operated on the mainnet and transformed user experiences, rather than on "specification formulation" or "testnets," as we reflect on 2025.

January: Bitcoin Evolves into a "Payment + Asset" Layer

2025 began with Bitcoin evolving from a mere "digital gold" into an infrastructure capable of practical asset payments on the Lightning Network.

January: Full Operation of Taproot Assets on Mainnet

Taproot Assets, developed by Lightning Labs, began to be supported by major wallets (such as Strike and Phoenix).

The technical highlight is that asset metadata can now be embedded within the Taproot script tree while keeping Bitcoin's UTXO model, so that assets can be handled as state transitions on Lightning channels. This gave users the experience of "paying fees in BTC while instantly settling stablecoins" with Bitcoin-native security, rather than relying on an L2 or sidechain.

February: The "Wall" Between Layer 2s Technically Disappears

February was the month when the fragmentation between Layer 2s began to be resolved at the protocol level, rather than through bridges or external services.

February: Implementation of Optimism Superchain Interoperability

The Optimism ecosystem (including Base, Zora, Mode, etc.) activated native interoperability features.

Unlike traditional "Lock & Mint" bridges, this was achieved through a design where all OP Stack chains share a single bridge contract on L1. As a result, users could complete cross-chain transactions with just one click, such as purchasing NFTs on Zora using USDC on Base, without even being aware of "switching chains" on their wallets.

March: DeFi Incorporates "Apps"

March was the month when DeFi protocols evolved from mere "exchanges" to "execution environments for financial logic."

March: Emergence of Uniswap v4 "Hooks" Ecosystem

A few months after the release of Uniswap v4, pools that take real advantage of Hooks began to go live one after another.

Technically, Hooks are a mechanism by which a pool calls an external contract at lifecycle points such as beforeSwap, afterSwap, and beforeAddLiquidity; the hook contract is bound to the pool when the pool is created. By March 2025, the following Hooks were in practical use:

TWAMM Hook: Automatically time-distributes large orders to minimize price impact
Limit Order Hook: On-chain limit orders managed by the pool itself
Dynamic Fee Hook: Automatically adjusts swap fees based on volatility

As a result, DEXs evolved from "automated vending machines without order books" to "programmable liquidity layers" with functionalities comparable to CEXs (centralized exchanges).

April: Revolution in Wallet Experience (Pectra Upgrade)

April saw the large Ethereum upgrade "Pectra" (Prague-Electra) applied to the mainnet, fundamentally changing the nature of wallets.

April: Smart Account Transformation of EOA via EIP-7702

The highlight of Pectra was the introduction of EIP-7702.

This feature lets an existing EOA (a standard address, such as one managed in MetaMask) point to smart contract code by signing a delegation, which it can later replace or clear. As a result, users could immediately use the following functionality without moving assets to a new smart contract wallet (SCW):

Gas fee sponsorship: The application side bears the Gas
Batch processing: Execute approval (Approve) and swap in one signature
Session keys: Issuance of temporary keys that permit only specific operations

This moment marked the technical resolution of the biggest hurdle of "having to recreate wallets."

May: Marketization of Shared Security

May was the month when security itself began to circulate as a "product."

May: EigenLayer AVS Goes Live

Multiple AVS (Actively Validated Services) began mainnet operations on EigenLayer.

Ethereum validators restaked their ETH, reusing the same stake to secure other services as well. In May, not only EigenDA (a data availability layer) but also decentralized sequencers, oracles, and bridge monitoring networks began operating as AVS, establishing a pattern of building "middleware with Ethereum-level security without assembling a dedicated validator set."

June: "Programming" of RWA

June was the month when real-world assets (RWA) were not only tokenized but also incorporated as building blocks in DeFi.

June: BlackRock BUIDL's DeFi Integration

BlackRock's tokenized fund "BUIDL" became available for atomic swaps with stablecoins like USDC and as collateral in lending protocols.

Technically, it is a permissioned token that allows interaction with whitelisted smart contracts (DEX pools and lending pools). This enabled a workflow for institutional investors to "earn yields from U.S. Treasury bonds while being able to instantly liquidate and redirect to crypto investments as needed," all on-chain.

July: "In-App Apps" in Decentralized Social Networks

July was the month when social media transformed from "a place to view posts" to "a place to use apps."

July: Adoption of Farcaster Frames v2

The Frames v2 extension of the decentralized SNS protocol Farcaster became widespread.

This standard (an extension of OpenGraph tags) allows interactive mini-apps to be embedded within posts on feeds. By July, users could complete actions like "minting NFTs," "playing games," "voting in polls," and "making small payments" within the feed, enabling Web3 actions without switching apps. The wallet signing process was also integrated within the Frame, significantly reducing UX friction.

August: Establishing Reliability in Off-Chain Computation

August marked the practical stage of technologies that verify computations done outside the blockchain.

August: Expansion of ZK Coprocessor Adoption

ZK Coprocessors (zero-knowledge coprocessors) such as Axiom and Brevis were adopted by major DeFi protocols.

These coprocessors perform aggregations of all past transaction histories and complex calculations off-chain, submitting only the ZK proof that the results are correct on-chain. This made it possible to implement logic that was previously gas-prohibitive for traditional smart contracts, such as "applying VIP rates based on transaction volume over the past year" and "calculating complex derivative prices."

September: Proof of Parallel Processing EVM's Capabilities

September saw the emergence of implementations that broke through the performance limits of the EVM (Ethereum Virtual Machine).

September: Monad's Mainnet Launch

The L1 chain Monad, characterized by parallel execution EVM, was launched.

Monad achieved 10,000 TPS while maintaining compatibility with existing Ethereum tools through optimistic parallel execution and asynchronous I/O access. This proved that use cases such as high-frequency trading (HFT) and on-chain games, which were previously only possible on non-EVM chains like Solana, could now be realized within the EVM ecosystem.

October: Establishment of Intent-Centric Architecture

October was the month when users were liberated from "creating transactions."

October: Standardization of UniswapX / CowSwap

Formats represented by UniswapX and CowSwap emerged, where users only sign their intent (e.g., "I want to swap token A for B") and delegate the actual route exploration and gas payment to a third party known as a solver.

Technically, the adoption of standard specifications like ERC-7683 (Cross Chain Intents) progressed, creating an environment where solvers could find and execute optimal routes even across different chains. Users no longer needed to worry about "which chain has gas."

November: Starknet's Quantum Resistance and Throughput

November was the month when Starknet, a leader in ZK-Rollups, implemented significant performance improvements.

November: Starknet v0.14 "Quantum Leap"

Starknet conducted a major upgrade, implementing parallel processing for sequencers and optimizing proof generation. As a result, transaction costs dropped even further from when EIP-4844 was introduced, making micro-payments and on-chain games (Autonomous Worlds) feasible at realistic costs.

December: Towards the Next Phase of Ethereum (Glamsterdam)

December was the month when specifications for the upcoming "Glamsterdam" upgrade, scheduled for 2026, were solidified.

December: Agreement on Implementation of ePBS (EIP-7732)

The implementation details of ePBS (Enshrined Proposer-Builder Separation / EIP-7732), which will be central to the next upgrade, were agreed upon.

This will bring the separation of block construction, which currently relies on external software like MEV-Boost, into the protocol itself (it becomes "enshrined"). This will enhance censorship resistance and simplify the role of validators. By the end of 2025, development of client implementations based on this specification began in earnest.

Conclusion: UX is "Concealed," and Infrastructure is "Integrated"

Reflecting on Web3 technologies in 2025, it was a year where technical implementations aimed at "concealing complexity from users" came together. Notably, the establishment of L2s and the proliferation of Intents significantly reduced the opportunities for users to be concerned about gas fees.

EIP-7702: Abstraction of private key management and gas fee payments (easing the burden on applications)
Superchain / AggLayer: Concealing boundaries between chains
Intents: Concealing transaction failure risks and gas management
Hooks: Encapsulating the complexity of financial products behind the scenes

In 2026, consumer-facing applications (Consumer Crypto) offering experiences comparable to Web2 apps will flourish on top of these "invisible infrastructures."


Bold Predictions for 2026 from the Intersection of AI and Web3: The Era of Agents with Wallets

tumf — Fri, 06 Feb 2026 00:42:45 +0000

Originally published on 2026-01-03
Original article (Japanese): AI×Web3から見える2026年の大胆予測: エージェントがウォレットを持つ時代

Looking back at 2025, the AI and Web3 industries each evolved in their own way. What will become truly interesting in 2026, however, is the point where these two domains begin to interact.

AI agents will operate smart contracts, on-chain AI will run on decentralized infrastructure, and the token economy will accelerate AI development. With the technological foundations established by 2025, 2026 will be the year when this fusion transitions from "experimentation" to "practical application."

In this article, we will boldly predict the "future where AI and Web3 intersect," taking into account the recap of the AI industry in 2025 (A tumultuous year that began with the DeepSeek shock) and the recap of the Web3 industry (Technologies implemented as products).

Prediction 1: AI Agents Will Have Wallets and Conduct Economic Activities Autonomously ★★★★☆

The Technological Foundation is Ready

By 2025, the necessary elements for the fusion of AI agents and blockchain have been established.

Preparations on the AI Side:

The ability for agents to utilize multiple tools (e.g., MCP, A2A Protocol)
Long-duration autonomous execution (Copilot agent mode, Kiro autonomous agent)
Reasoning capabilities as standard (o3, Claude 4, Gemini 2.5, GPT-5)

Preparations on the Web3 Side:

Revolutionizing the wallet experience (Smart account for EOAs via EIP-7702)
Abstraction of gas fees (Sponsoring, session keys)
Simplification of cross-chain transactions (Superchain Interoperability, Intents)

What Will Happen in 2026

Q1-Q2: Introduction of Agent-Specific Wallet Protocols

Based on existing smart accounts (EIP-7702, ERC-4337), agent-specific wallet specifications will emerge. Key features include:

Budget Limits: Ability to set daily spending caps
Allowlist: Interaction only with specific smart contracts
Audit Logs: All transactions can be reviewed by humans
Emergency Stop: Ability to halt operations immediately in case of issues
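
The four features above can be sketched as a small off-chain policy check. This is a minimal illustration, not a wallet standard: the class name `AgentWalletPolicy`, its fields, and the convention that allowlist entries are lowercase addresses are all assumptions made for the example. A real agent wallet would presumably enforce such rules inside the smart account itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AgentWalletPolicy:
    """Hypothetical guard mirroring budget limit, allowlist, audit log, emergency stop."""
    daily_cap_wei: int                       # budget limit per calendar day
    allowlist: set                           # lowercase contract addresses the agent may call
    paused: bool = False                     # emergency stop flag
    spent_today: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def authorize(self, to: str, value_wei: int, today: date) -> bool:
        """Return True if the transfer is allowed; record every decision."""
        spent = self.spent_today.get(today, 0)
        if self.paused:
            result = "paused"
        elif to.lower() not in self.allowlist:
            result = "not_allowlisted"
        elif spent + value_wei > self.daily_cap_wei:
            result = "over_budget"
        else:
            result = "ok"
            self.spent_today[today] = spent + value_wei
        # Rejected requests are logged too, so a human can review them later.
        self.audit_log.append(
            {"day": today.isoformat(), "to": to, "value_wei": value_wei, "result": result}
        )
        return result == "ok"
```

Keeping the decision and the log entry in one method means no transaction can be approved without leaving a reviewable trace.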

Q3-Q4: Expansion of Use Cases

Examples of economic activities conducted autonomously by agents:

DeFi Automated Operations: Moving funds across multiple protocols for yield optimization
NFT Proxy Purchases: Bidding and purchasing on OpenSea or Blur based on user instructions
Data Sales: Selling reports or models generated by agents on-chain
Automated Reward Distribution: Automatically distributing rewards for tasks completed by multiple agents

Factors Accelerating Realization

Agent Collaboration via Model Context Protocol (MCP)

In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation. With MCP under neutral governance, wallet operations could be standardized as an MCP server, giving multiple AI agents unified access.

Flexibility of EIP-7702

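EIP-7702 lets an existing EOA delegate its execution to smart contract code by signing an authorization, without migrating to a new address. The delegation stays in place until the owner replaces or clears it, so agents can be given scoped permissions when needed and have them revoked afterward, which limits security risk.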

Prediction 2: Practical Implementation of On-Chain AI Inference Market ★★★☆☆

Maturity of Decentralized AI Infrastructure

In 2025, the proliferation of open-weight models (e.g., DeepSeek, Qwen, Kimi K2, gpt-oss) and inference engines (vLLM, SGLang) made the option to run AI in-house realistic.

Simultaneously, the Web3 side saw the marketization of shared security (EigenLayer AVS), enabling a system where "Ethereum validators ensure the security of middleware as a side job."

What Will Happen in 2026

Q2-Q3: Providing AI Inference as AVS

Using the Restaking mechanism of EigenLayer, AI inference services will emerge as Actively Validated Services (AVS).

Mechanism:

Node operators prepare GPU clusters and host specific models (Qwen3, Llama, etc.) with inference engines (vLLM/SGLang)
Staked ETH in EigenLayer serves as collateral to guarantee the accuracy of inference results
Users send inference requests on-chain and receive the results
Nodes providing fraudulent inferences will be slashed (stake confiscation)
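
One way to picture the staking and slashing steps above is a stake-weighted vote over the results that operators report. The sketch below is an assumption-heavy simulation: the function name, the majority rule, and the 50% slash fraction are hypothetical and do not describe EigenLayer's actual slashing conditions.

```python
from collections import Counter


def settle_inference(stakes, results, slash_fraction=0.5):
    """Accept the stake-weighted majority result and slash operators that disagree.

    stakes:  operator -> staked amount (arbitrary integer units)
    results: operator -> hash of the inference output that operator reported
    Returns (accepted_result, new_stakes).
    """
    weight = Counter()
    for operator, result in results.items():
        weight[result] += stakes[operator]
    accepted = max(weight.items(), key=lambda kv: kv[1])[0]

    new_stakes = {}
    for operator, amount in stakes.items():
        if operator in results and results[operator] != accepted:
            amount -= int(amount * slash_fraction)  # confiscate part of the stake
        new_stakes[operator] = amount
    return accepted, new_stakes
```

Comparing output hashes only works when inference is deterministic across nodes, which is one reason the ZK-proof approach discussed later is attractive.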

Practical Examples:

DeFi protocols delegating price predictions and risk assessments to on-chain AI
NFT projects executing image generation in a verifiable on-chain manner
AI analyzing proposal content in DAO voting and providing summaries to voters

Realizing Verifiability

Combination with ZK Proofs

In August 2025, ZK Coprocessors like Axiom and Brevis were put into practical use. In 2026, this will also apply to AI inference.

Executing AI inference off-chain
Submitting on-chain only the ZK proof that the inference was executed correctly
Ensuring verifiability while reducing gas costs

This will address the challenge of "Can we trust AI outputs?" through the verification mechanisms of blockchain.

Prediction 3: Token Economy Accelerating AI Development ★★☆☆☆

Clarification of Data and Model Ownership

Until 2025, the rights concerning training data and trained AI models were ambiguous. In 2026, clarification of ownership through blockchain will progress.

Q2-Q3: Data NFTs and Model NFTs

Data NFTs: Tokenizing training datasets and licensing usage rights
Model NFTs: Tokenizing trained models and selling access rights to inference APIs
Contributor Tokens: Automatically distributing rewards to data providers and annotators based on usage

Example Scenario:

A hospital providing a medical image dataset registers as the owner of the data NFT
An AI startup trains a model using that data
Each time the model is commercialized, a smart contract automatically distributes royalties to the hospital
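
The royalty step in this scenario comes down to a proportional split. Here is a minimal sketch of that arithmetic, assuming shares are expressed in basis points and that rounding dust goes to the first listed payee; the function name and both conventions are hypothetical, and an actual smart contract would define its own rules.

```python
def split_royalty(payment: int, shares_bps: dict) -> dict:
    """Split `payment` (in the token's smallest unit) according to basis-point shares.

    Shares must sum to 10_000 bps (100%). Integer division can leave a few
    units undistributed; this sketch hands that dust to the first payee.
    """
    if sum(shares_bps.values()) != 10_000:
        raise ValueError("shares must sum to 10000 basis points")
    payouts = {who: payment * bps // 10_000 for who, bps in shares_bps.items()}
    dust = payment - sum(payouts.values())
    first_payee = next(iter(shares_bps))
    payouts[first_payee] += dust
    return payouts
```

Working in integers and assigning the dust explicitly ensures that the payouts always add up to the original payment.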

Sustainability of Open Source AI

In 2025, DeepSeek-R1 and Qwen3 were released as open-source, but recovering development costs remained a challenge.

Q3-Q4: "Open but Monetized" Model

The model is open-weight (available to anyone)
For commercial use, a mechanism to pay licensing fees on-chain is established
Payments are automated (via smart contracts) and highly transparent

This allows for a balance between "being open" and "sustainable development."

Prediction 4: AI Accelerating UX Improvements in Web3 Products ★★★★★

Challenges in 2025

The biggest challenge for Web3 was the complexity of UX. While technical solutions are being developed through EIP-7702 and Intents, psychological barriers remain, such as "not knowing what can be done" and "fear of failure."

What Will Happen in 2026

Q2-Q4: AI Assistants Becoming Web3 "Translators"

Use Case 1: Natural Language DeFi Operations

User: "Buy $1000 worth of this token and put it to work in a safe pool."

AI Agent (internal processing):

Selecting the optimal DEX (e.g., an intent-based venue such as UniswapX or CoW Swap)
Calculating slippage and gas fees
Presenting to the user for approval
Executing the transaction (automatically retrying in case of failure)
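
The slippage and gas step can be made concrete with a little arithmetic. This is a rough sketch of what an agent might compute before asking for approval; the function names, the basis-point convention, and the sample figures in the usage note are assumptions for illustration only.

```python
from decimal import Decimal

BPS = Decimal(10_000)  # 1 basis point = 0.01%


def min_amount_out(quoted_out: Decimal, slippage_bps: int) -> Decimal:
    """Smallest output the agent should accept for a given slippage tolerance."""
    return quoted_out * (BPS - slippage_bps) / BPS


def gas_cost_usd(gas_units: int, gas_price_gwei: Decimal, eth_price_usd: Decimal) -> Decimal:
    """Convert a gas estimate into USD (1 gwei = 1e-9 ETH)."""
    return Decimal(gas_units) * gas_price_gwei / Decimal(10**9) * eth_price_usd
```

For example, with a quote of 500 tokens and a 0.5% tolerance, `min_amount_out(Decimal("500"), 50)` gives the floor the agent would show the user alongside the estimated gas cost.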

Use Case 2: Automating Risk Explanations

Before executing a smart contract, AI:

Performs static analysis of the contract code
References past transaction history (via ZK Coprocessors)
Explains risks in natural language (e.g., "This pool is audited but has a low collateral ratio.")

Use Case 3: Simplifying Cross-Chain Operations

User: "I want to buy an NFT on Optimism with USDC from Arbitrum."

AI Agent:

Constructs a cross-chain transaction using Superchain Interoperability
Optimizes gas fees (L2 bridge vs direct swap)
Completes with a single signature

Supporting Technologies

Application of Vibe Coding

In February 2025, Andrej Karpathy proposed Vibe Coding, emphasizing "conveying intent" over "writing code."

In 2026, this concept will be applied to Web3: users will simply express what they want to do, and agents will construct the optimal transaction.

Prediction 5: Implementation of Collective Intelligence through "Decentralized AI + DAO" ★★★☆☆

AI Supporting DAO Decision-Making

As of 2025, DAO voting participation rates were low (often below 10%), and the quality of decision-making was a concern.

Q3-Q4: AI-Powered Governance

Proposal Summarization: AI automatically summarizes lengthy proposals, reducing the burden on voters
Impact Analysis: Running simulations of the financial impact and changes in tokenomics if a proposal is passed
Individual Recommendations: Recommending proposals that align with each voter's past voting history

Implementation Example:

An AI like NotebookLM analyzes DAO proposals
A research AI like Deep Research collects related past discussions and external information
Results are stored on-chain for voter review

Training Decentralized AI

Q4: Federated Learning × Blockchain

Federated Learning, where multiple nodes collaborate to train AI models, will be managed on the blockchain.

Mechanism:

Each node learns with local data (data is not shared)
Only the learning results (gradients) are submitted on-chain
Smart contracts aggregate gradients and update the global model
Token rewards are distributed based on contribution
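
The aggregation and reward steps above resemble federated averaging. The following is a minimal sketch in plain Python, assuming that a node's contribution is measured by its number of training samples; that metric, the function names, and the learning rate are assumptions, not something the mechanism above specifies.

```python
def aggregate(updates):
    """Sample-weighted average of gradients.

    updates: list of (node_id, gradient_vector, num_samples)
    """
    total = sum(n for _, _, n in updates)
    size = len(updates[0][1])
    averaged = [0.0] * size
    for _, gradient, n in updates:
        for i, g in enumerate(gradient):
            averaged[i] += g * n / total
    return averaged


def apply_update(global_weights, averaged_gradient, lr=0.1):
    """One gradient-descent step on the global model."""
    return [w - lr * g for w, g in zip(global_weights, averaged_gradient)]


def rewards(updates, pool):
    """Distribute an integer reward pool in proportion to samples contributed."""
    total = sum(n for _, _, n in updates)
    return {node: pool * n // total for node, _, n in updates}
```

In practice the raw data never leaves each node; only the gradient vectors and sample counts reach the aggregator.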

Benefits:

Enables training large models while maintaining data privacy
Contributions are transparently recorded, making incentive design easier

Conclusion: 2026 Will Be the Year of "Fusion"

By the end of 2025, AI and Web3 had each built a practical foundation in their respective domains.

AI has evolved from a "smart answerer" to an "agent that integrates research, operation, and generation."
Web3 has progressed from "specification formulation" to "products operating on the mainnet."

In 2026, these two will interact and create practical value.

Summary of Predictions:

AI agents will have wallets and conduct economic activities autonomously ★★★★☆ (Q1-Q2)
Practical implementation of the on-chain AI inference market ★★★☆☆ (Q2-Q3)
Token economy accelerating AI development ★★☆☆☆ (Q2-Q3)
AI accelerating UX improvements in Web3 products ★★★★★ (Q2-Q4)
Implementation of collective intelligence through "decentralized AI + DAO" ★★★☆☆ (Q3-Q4)

Of course, these are all "bold predictions." Many technical challenges (security, scalability, regulation) remain.

However, the lesson from 2025 is that the next wave of change is driven not by announcements that a model was released or a specification was decided, but by implementations that actually run on mainnet and change the user experience.

Let’s witness together the transition of the fusion of AI and Web3 from "experimentation" to "practical application" in 2026.

docker-android: A Docker Environment for Controlling Android Emulators from a Web Browser

tumf — Fri, 06 Feb 2026 00:41:35 +0000

Originally published on 2026-01-04
Original article (Japanese): docker-android: WebブラウザからAndroidエミュレータを操作するDocker環境

docker-android is an open-source project that allows you to run Android emulators inside a Docker container and control them remotely via a web browser. This enables the automation of testing in CI/CD pipelines and the creation of a scalable Android testing infrastructure in cloud environments without the need to install Android Studio.

In this article, we will cover an overview of docker-android, how to set it up, basic usage, and practical use cases.

What is docker-android?

docker-android is a Docker-based Android emulator environment developed by budtmo. Its main features are as follows:

Key Features

Web Browser Access: Directly control the Android screen from the browser using noVNC.
Multiple Version Support: Supports Android versions from 5.0 to the latest.
CI/CD Integration: Easily integrates with Jenkins, GitLab CI, GitHub Actions, and more.
Appium Support: Can be integrated with test automation frameworks.
Scalability: Supports scale-out with Kubernetes or Docker Swarm.

Architecture

docker-android consists of the following components.

graph TB
    Browser[Web Browser]
    noVNC[noVNC Server<br/>Port: 6080]
    VNC[VNC Server]
    Emulator[Android Emulator<br/>AVD]
    Appium[Appium Server<br/>Port: 4723<br/>Optional]
    ADB[ADB Server<br/>Port: 5554/5555]

    Browser -->|HTTP| noVNC
    noVNC -->|VNC Protocol| VNC
    VNC --> Emulator
    Appium --> ADB
    ADB --> Emulator

    style Browser fill:#e1f5ff
    style Emulator fill:#a5d6a7
    style noVNC fill:#fff59d
    style Appium fill:#ffcc80

Main components:

Android Emulator: Android SDK's AVD (Android Virtual Device)
noVNC: A web interface that allows VNC to be accessed from a browser.
Appium Server: A WebDriver server for test automation (optional).
ADB (Android Debug Bridge): For debugging and command execution.

Setup

Prerequisites

Docker: Version 20.10 or higher.
Hardware Virtualization: Intel VT-x or AMD-V must be enabled.
Memory: Minimum of 4GB (8GB or more recommended).

Basic Startup Method

The simplest way to start an Android 11 emulator is with the following command.

docker run -d -p 6080:6080 -p 5554:5554 -p 5555:5555 \
  --device /dev/kvm \
  --name android-container \
  budtmo/docker-android:emulator_11.0

After starting, accessing http://localhost:6080 in your browser will display the Android screen.

Port Descriptions

Port	Purpose
6080	noVNC (Web Interface)
5554	Emulator console port
5555	ADB connection port
4723	Appium Server (when using Appium image)

Basic Usage

Operating from a Web Browser

Access http://localhost:6080 in your browser.
Click, drag, and swipe on the screen to control Android.
Keyboard input is also possible.

Operating via ADB

You can access the Android emulator inside the container from the Docker host using ADB.

# ADB connection
adb connect localhost:5555

# Check device list
adb devices

# Install an app
adb install my-app.apk

# Shell access
adb shell

Configuration with Docker Compose

If you want to start multiple emulators, using Docker Compose is convenient.

version: '3'
services:
  android-11:
    image: budtmo/docker-android:emulator_11.0
    ports:
      - "6080:6080"
      - "5554:5554"
      - "5555:5555"
    environment:
      - DEVICE=Samsung Galaxy S10
      - DATAPARTITION=4g
    privileged: true

  android-13:
    image: budtmo/docker-android:emulator_13.0
    ports:
      - "6081:6080"
      - "5556:5554"
      - "5557:5555"
    environment:
      - DEVICE=Pixel 6
      - DATAPARTITION=4g
    privileged: true

Startup command:

docker-compose up -d

This will start two emulators, Android 11 and Android 13, accessible on different ports.

Test Automation with Appium

docker-android also provides an image with an integrated Appium server. Appium is an open-source framework for automating mobile application testing.

Starting the Appium Image

docker run -d -p 6080:6080 -p 4723:4723 -p 5555:5555 \
  --device /dev/kvm \
  --name appium-android \
  budtmo/docker-android:emulator_11.0_appium

Example Test Code in Python + Appium

Here is an example of Python code to test an Android app using Appium.

from appium import webdriver
from appium.options.android import UiAutomator2Options

# Appium server configuration
options = UiAutomator2Options()
options.platform_name = 'Android'
options.platform_version = '11'
options.device_name = 'emulator-5554'
options.app = '/path/to/your/app.apk'

# Connect to Appium server
driver = webdriver.Remote('http://localhost:4723', options=options)

# Example test execution
try:
    # Get element and click
    element = driver.find_element('id', 'com.example:id/button')
    element.click()

    # Text input
    input_field = driver.find_element('id', 'com.example:id/input')
    input_field.send_keys('Hello, docker-android!')

    print("Test successful")
finally:
    driver.quit()

Example: Running in a CI/CD Pipeline (GitHub Actions)

Here’s an example of running tests using docker-android in GitHub Actions.

name: Android UI Tests

on:
  push:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Start Android emulator
        run: |
          docker run -d -p 4723:4723 -p 5555:5555 \
            --name android-emulator \
            budtmo/docker-android:emulator_11.0_appium

          # Wait for the emulator to start
          sleep 30

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install appium-python-client pytest

      - name: Run tests
        run: |
          pytest tests/test_android.py

      - name: Stop emulator
        if: always()
        run: docker stop android-emulator
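
A fixed `sleep 30` can be too short on a slow runner and wasteful on a fast one. One alternative is to poll an HTTP endpoint until the server answers. The sketch below uses only the standard library; the function name `wait_for_http` is hypothetical, and the assumption that Appium answers on `/status` at the mapped port should be checked against the Appium version in the image.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting requests yet
        time.sleep(interval)
    return False
```

A test setup step could call `wait_for_http("http://localhost:4723/status")` and fail the job when it returns `False`. Polling over HTTP rather than checking for an open TCP port matters here, because a published Docker port can accept connections before the service behind it is ready.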

Customization via Environment Variables

docker-android allows you to customize the emulator's behavior using the following environment variables.

Environment Variable	Description	Default Value
`DEVICE`	Device model name	`Samsung Galaxy S10`
`DATAPARTITION`	Data partition size	`2g`
`EMULATOR_TIMEOUT`	Startup timeout (seconds)	`300`
`RELAXED_SECURITY`	Relax security for Appium	`false`

Usage example:

docker run -d -p 6080:6080 -p 5555:5555 \
  --device /dev/kvm \
  -e DEVICE="Pixel 6" \
  -e DATAPARTITION="4g" \
  budtmo/docker-android:emulator_13.0

Utilizing in Cloud Environments

Example Configuration on AWS ECS

docker-android can also run in container orchestration environments such as AWS ECS, provided the hosts expose KVM to the container. Serverless runtimes such as AWS Fargate and GCP Cloud Run do not expose /dev/kvm, so the example below targets the EC2 launch type on instances that support KVM (for example, bare-metal instance types).

Here is an example of an AWS ECS task definition (excerpt).

{
  "family": "android-emulator-task",
  "containerDefinitions": [
    {
      "name": "android-emulator",
      "image": "budtmo/docker-android:emulator_11.0_appium",
      "memory": 4096,
      "cpu": 2048,
      "essential": true,
      "privileged": true,
      "portMappings": [
        {
          "containerPort": 6080,
          "hostPort": 6080,
          "protocol": "tcp"
        },
        {
          "containerPort": 4723,
          "hostPort": 4723,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DEVICE",
          "value": "Samsung Galaxy S10"
        },
        {
          "name": "DATAPARTITION",
          "value": "4g"
        }
      ]
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "networkMode": "bridge"
}

Example Configuration on Kubernetes

Example of a Deployment in Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: android-emulator
spec:
  replicas: 3
  selector:
    matchLabels:
      app: android-emulator
  template:
    metadata:
      labels:
        app: android-emulator
    spec:
      containers:
      - name: android
        image: budtmo/docker-android:emulator_11.0_appium
        ports:
        - containerPort: 6080
        - containerPort: 4723
        resources:
          limits:
            memory: "4Gi"
            cpu: "2"
          requests:
            memory: "2Gi"
            cpu: "1"
        env:
        - name: DEVICE
          value: "Pixel 6"
        - name: DATAPARTITION
          value: "4g"
        # Lets the emulator reach /dev/kvm on the node
        securityContext:
          privileged: true
---
apiVersion: v1
kind: Service
metadata:
  name: android-emulator-service
spec:
  selector:
    app: android-emulator
  ports:
  - name: novnc
    port: 6080
    targetPort: 6080
  - name: appium
    port: 4723
    targetPort: 4723
  type: LoadBalancer

Performance and Resource Management

Resource Requirements

Recommended resources per emulator:

CPU: 2 cores or more
Memory: 4GB or more (8GB recommended for Android 11 and later)
Disk: 10GB or more

Considerations for Parallel Execution

When starting multiple emulators simultaneously, keep the following points in mind:

Avoid port number conflicts.
Properly allocate resources on the host machine.
Set resource limits using Docker's --cpus and --memory options.

Example of parallel execution:

# Emulator 1
docker run -d -p 6080:6080 -p 5555:5555 \
  --device /dev/kvm \
  --cpus="2" --memory="4g" \
  --name android-1 \
  budtmo/docker-android:emulator_11.0

# Emulator 2
docker run -d -p 6081:6080 -p 5556:5555 \
  --device /dev/kvm \
  --cpus="2" --memory="4g" \
  --name android-2 \
  budtmo/docker-android:emulator_11.0
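
When the number of emulators grows, assigning host ports by hand becomes error-prone. Here is a small sketch that generates one `docker run` argument list per emulator with non-overlapping host ports, following the same port pattern as the two commands above. The function name, its parameters, and the `extra_args` hook are hypothetical.

```python
import shlex


def emulator_commands(count, image="budtmo/docker-android:emulator_11.0",
                      cpus="2", memory="4g", extra_args=()):
    """Build one `docker run` argument list per emulator with unique host ports."""
    commands = []
    for i in range(count):
        commands.append([
            "docker", "run", "-d",
            "-p", f"{6080 + i}:6080",   # noVNC: host port shifts, container port stays fixed
            "-p", f"{5555 + i}:5555",   # ADB
            f"--cpus={cpus}", f"--memory={memory}",
            "--name", f"android-{i + 1}",
            *extra_args,                # any additional flags the environment needs
            image,
        ])
    return commands


for command in emulator_commands(2):
    print(shlex.join(command))
```

Each list can be passed directly to `subprocess.run`, which avoids shell quoting issues.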

Troubleshooting

If the Emulator Fails to Start

Cause: Hardware virtualization is disabled.

Solution: Enable Intel VT-x or AMD-V in the BIOS. For Linux, check if KVM is available.

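# Count CPU threads that advertise VT-x (vmx) or AMD-V (svm); 0 means virtualization is unavailable or disabled
grep -Ec '(vmx|svm)' /proc/cpuinfo
# Confirm the KVM device node exists on the host
ls -l /dev/kvm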

If noVNC Cannot Connect

Cause: Port mapping is incorrect.

Solution: Verify that the ports are correctly mapped with the -p option.

# Check port usage
docker port android-container

Insufficient Memory Error

Cause: Insufficient memory allocated to the container.

Solution: Increase memory with the --memory option or reduce the DATAPARTITION environment variable.

docker run -d -p 6080:6080 -p 5555:5555 \
  --device /dev/kvm \
  --memory="8g" \
  -e DATAPARTITION="2g" \
  budtmo/docker-android:emulator_11.0

Comparison with Android Studio Emulator

Item	docker-android	Android Studio Emulator
Ease of Setup	◎ (Only Docker)	△ (Requires Android Studio)
CI/CD Integration	◎ (Easy)	△ (Complex setup)
Remote Access	◎ (Via browser)	× (Requires VNC, etc.)
Performance	○ (Slightly slower)	◎ (Native-like performance)
Scalability	◎ (Containerized)	△ (Manual management)
GPU Support	△ (Limited support*)	◎ (Full support)

* GPU passthrough configuration is required in container environments.

While docker-android is suitable for automated testing in CI/CD and cloud environments, the Android Studio Emulator performs better for local development.

Conclusion

Using docker-android significantly simplifies the setup of Android emulator environments, making it easy to automate testing in CI/CD pipelines and build scalable testing infrastructures in cloud environments.

Key benefits include:

No need for Android Studio, allowing control of Android from a browser.
High reproducibility of environments as it can be managed as a Docker container.
Powerful test automation can be achieved in combination with Appium.
Scalable with Kubernetes or ECS.

If you are considering automated testing in CI/CD or building an Android testing infrastructure in cloud environments, be sure to give it a try.

Using GitHub Actions as a cron job - How to set up periodic execution in CI/CD.
Bucketeer: CyberAgent's Feature Flag Platform - An example of building a development environment using Docker Compose.
Year-End Cleanup for Engineers (Part 2) - About removing unnecessary resources in Docker.

Vibium: A Browser Automation Tool Optimized for AI Agents Over Playwright

tumf — Fri, 06 Feb 2026 00:40:08 +0000

Originally published on 2026-01-05
Original article (Japanese): Vibium: PlaywrightよりAIエージェントに最適化されたブラウザ自動化ツール

Jason Huggins, the creator of Selenium, has announced a new browser automation tool, Vibium, which comes approximately 20 years after Selenium. In this article, we will discuss Vibium's design philosophy, its differences from Playwright and Puppeteer, and why a new tool was necessary in the era of AI agents.

What is Vibium?

Vibium is a browser automation infrastructure designed for AI agents. All of the following features are integrated into a single binary of about 10MB:

Browser Lifecycle Management: Detection and launching of Chrome
WebDriver BiDi Proxy: Communication with the browser
MCP Server: Integration with LLM agents (like Claude Code)
Automatic Waiting: Polling until elements appear
Screenshots: PNG captures of the viewport

The standout feature is that integration with Claude Code can be completed with a single command:

claude mcp add vibium -- npx -y vibium

With this single line, Claude Code can directly manipulate the browser. Chrome is automatically downloaded, eliminating the need for manual setup.

From Selenium to Vibium: 20 Years of Evolution

Jason Huggins created Selenium in 2004, paving the way for browser automation. The field has since evolved through Selenium WebDriver, Puppeteer, and Playwright, so what prompted Huggins to build yet another tool?

Challenges with Existing Tools

Selenium WebDriver (2011 onwards) is mature but has the following issues:

Complex setup (driver management, browser version compatibility)
Boilerplate code required for element waiting
Lack of consideration for integration with AI agents

Playwright (2020 onwards) and Puppeteer (2018 onwards) addressed these issues using the Chrome DevTools Protocol (CDP). However:

CDP is a Chrome-specific protocol (not standardized)
Additional abstraction layers are needed to support multiple browsers
MCP server functionality is not built in and has to be added separately

The Choice of WebDriver BiDi

Vibium adopts the WebDriver BiDi protocol. This is a next-generation protocol being developed as a W3C standard, combining the best aspects of Selenium WebDriver and CDP:

Bidirectional Communication: Real-time reception of events from the browser
Standardization: Works across Chrome, Firefox, and Safari (by specification)
Low-Level Access: Direct access to network, console, and DOM

Huggins stated in an interview on the TestGuild Podcast:

WebDriver BiDi is a protocol that has learned all the lessons from CDP that made Puppeteer and Playwright great.

Why Create Vibium Instead of Using Playwright?

So, why create a new tool instead of using Playwright? The main reason is a difference in design philosophy.

1. AI Agent-First Design

Vibium is optimized for AI agents:

Built-in MCP Server: Instant integration with Claude Code, Gemini, and local LLMs
stdio Communication: Conforms to the standard communication protocol for LLM agents
Simple API: Minimal methods that are easy for AI to understand

In contrast, Playwright is designed for human test engineers, and MCP integration has to be added as a separate component.

2. Zero Setup Philosophy

The design goal of Vibium is to be an "invisible binary":

// Just running npm install vibium makes this work
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')
vibe.find('a').click()
vibe.quit()

Downloading the browser, placing drivers, and setting paths are all automated. This emphasizes a developer experience that prioritizes "getting it running first" in the AI era.

3. Simplicity of a Single Binary

Vibium achieves everything with a single Go binary of about 10MB:

┌─────────────────────────────────────────────────────────────┐
│                         LLM / Agent                         │
│          (Claude Code, Codex, Gemini, Local Models)         │
└─────────────────────────────────────────────────────────────┘
                      ▲
                      │ MCP Protocol (stdio)
                      ▼
           ┌─────────────────────┐
           │   Vibium Clicker    │
           │                     │
           │  ┌───────────────┐  │
           │  │  MCP Server   │  │
           │  └───────▲───────┘  │         ┌──────────────────┐
           │          │          │         │                  │
           │  ┌───────▼───────┐  │WebSocket│                  │
           │  │  BiDi Proxy   │  │◄───────►│  Chrome Browser  │
           │  └───────────────┘  │  BiDi   │                  │
           │                     │         │                  │
           └─────────────────────┘         └──────────────────┘

Playwright consists of multiple npm packages and browser binaries, leading to complex dependencies. Vibium chose simplicity.

Practical Use Cases for Vibium

Using as an AI Agent

After integration with Claude Code, you can issue commands in natural language:

"Go to example.com and click the first link"

Claude Code will automatically invoke the following MCP tools:

Tool	Description
`browser_launch`	Launch the browser (visible by default)
`browser_navigate`	Navigate to a URL
`browser_find`	Find elements using CSS selectors
`browser_click`	Click an element
`browser_type`	Input text
`browser_screenshot`	Take a screenshot
`browser_quit`	Close the browser

Using as a JavaScript Library

You can also use it directly as an npm package:

Synchronous API (REPL Friendly)

const fs = require('fs')
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')

const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)

const link = vibe.find('a')
link.click()
vibe.quit()

Asynchronous API

const fs = await import('fs/promises')
const { browser } = await import('vibium')

const vibe = await browser.launch()
await vibe.go('https://example.com')

const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)

const link = await vibe.find('a')
await link.click()
await vibe.quit()

Automatic Waiting Mechanism

Vibium automatically waits until elements are displayed:

// This will automatically poll until the element appears
const button = vibe.find('button.submit')
button.click()

Selenium typically requires explicit wait code, and Playwright auto-waits before performing actions. Vibium likewise waits by default when finding elements, which keeps the code simple.
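
The polling behavior described above can be sketched generically. This is not Vibium's implementation, which the article does not show; the helper name `wait_for`, the default timeout, and the polling interval are assumptions chosen for illustration.

```python
import time


def wait_for(find, timeout=30.0, interval=0.1):
    """Call `find()` repeatedly until it returns a non-None value.

    Raises TimeoutError if nothing is found before `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout}s")
        time.sleep(interval)
```

Building the wait into the lookup itself is what lets calling code read as a plain find-then-click sequence.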

Platform Support

Vibium supports the following platforms:

Platform	Architecture	Status
Linux	x64	✅ Supported
macOS	x64 (Intel)	✅ Supported
macOS	arm64 (Apple Silicon)	✅ Supported
Windows	x64	✅ Supported

During installation, the appropriate binary for the platform is automatically selected, and Chrome's cache is stored in the following locations:

Linux: ~/.cache/vibium/
macOS: ~/Library/Caches/vibium/
Windows: %LOCALAPPDATA%\vibium\
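
For scripts that need to inspect or clear the cache, the locations above can be resolved per platform. This helper is purely illustrative: the function name and its parameters are hypothetical, and only the three directory patterns come from the list above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def vibium_cache_dir(platform: str, home: str, local_app_data: str = "") -> str:
    """Return the cache directory for a `sys.platform`-style platform name.

    `home` is the user's home directory; `local_app_data` is the value of
    %LOCALAPPDATA% and is only used on Windows.
    """
    if platform == "win32":
        return str(PureWindowsPath(local_app_data) / "vibium")
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library" / "Caches" / "vibium")
    return str(PurePosixPath(home) / ".cache" / "vibium")
```

Passing the platform and directories in as arguments, rather than reading them from the environment, keeps the function easy to test on any operating system.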

Roadmap: Plans for v2 and Beyond

The current v1 focuses on "integration of AI and browsers," but the v2 roadmap outlines the following features that are released or planned:

Python Client: Released in December 2025 (pip install vibium)
Java Client: Planned for enterprise use
Cortex: Memory and navigation layer
Retina: Recording extension
Video Recording: Capturing test execution
AI Locator: Smarter element searching

All of these are extensions of the vision of "AI agents handling browsers more naturally."

Why Vibium Now?

Jason Huggins created a new tool about 20 years after Selenium because of the paradigm shift brought about by the emergence of AI agents.

From Testing Tools to AI Tools

Selenium was created for test automation. The same goes for Playwright and Puppeteer. However, AI agents like Claude Code, Gemini, and ChatGPT use different approaches:

Instead of executing human-written test scripts, AI makes dynamic judgments
Instead of fixed selectors, elements are identified based on visual information and context
Browser operations are part of task achievement, not the ultimate goal

A tool optimized for this new usage was needed. That is Vibium.

The Rise of the MCP Ecosystem

The Model Context Protocol (MCP) is an integration standard for AI agents and tools proposed by Anthropic. Vibium was designed from the ground up as an MCP server, allowing for instant integration with Claude Code, Cursor, and other MCP-compliant AI editors.

This represents a shift in thinking from "creating a tool and then figuring out how to integrate it" to "designing a tool with integration in mind."

A Return to Simplicity

Over 20 years, browser automation tools have become feature-rich but also complex. Vibium regains simplicity by focusing on "only the truly necessary features":

Single binary
Zero setup
Minimal API
Automatic waiting

This philosophy aligns with the recent trend of "reducing complexity" seen in tools like exo and dotenvx.

Token Efficiency Comparison Experiment Between Playwright and Vibium

For AI agents, what matters is how many tokens a task consumes. We ran the same task with both tools and measured token consumption.

Experimental Conditions

Task: "Access example.com and take a screenshot of the page"

We used OpenCode's token counter to measure actual token consumption.

Measured Results: Token Consumption

In the Case of Vibium

# Executed tool calls
1. browser_launch         # Launch the browser
2. browser_navigate       # Navigate to https://example.com
3. browser_screenshot     # Take a screenshot
4. browser_quit          # Close the browser

Consumed Tokens: 240 tokens

Examples of responses from each tool:

browser_launch: "Browser launched (headless: false)" (4 words)
browser_navigate: "Navigated to https://example.com/" (3 words)
browser_screenshot: "Screenshot saved to /path/to/file.png" (4 words)
browser_quit: "Browser session closed" (3 words)

In the Case of Playwright

# Executed tool calls
1. browser_navigate       # Navigate to https://example.com
2. browser_take_screenshot # Take a screenshot

Consumed Tokens: 2,061 tokens

Examples of responses from each tool:

browser_navigate:
- Executed code snippet: await page.goto('https://example.com');
- Page information (URL, title)
- Entire accessibility tree (in YAML format, hundreds to thousands of words)
browser_take_screenshot:
- Executed code snippet
- Screenshot image data (consumed tokens via Vision API)

Surprising Result: Vibium is 8.6 Times More Efficient

Playwright: 2,061 tokens (2 tool calls)
Vibium:       240 tokens (4 tool calls)

Efficiency Ratio: Vibium achieves the same task at about 1/8.6 the tokens of Playwright

[Figure: Token efficiency comparison, Playwright vs. Vibium race]

Why Such a Difference?

Reasons Playwright Consumes More Tokens:

Automatic Sending of Accessibility Tree: browser_navigate returns the page's entire accessibility tree in YAML format on every call.

   - generic [ref=e2]:
     - heading "Example Domain" [level=1] [ref=e3]
     - paragraph [ref=e4]: This domain is for use...
     - paragraph [ref=e5]:
       - link "Learn more" [ref=e6]:
         - /url: https://iana.org/domains/example

This data alone consumes hundreds of tokens.

Displaying Executed Code: Displays actual Playwright code for debugging purposes.

   await page.goto('https://example.com');
   await page.screenshot({...});

Image Data: Screenshots are returned as images and processed by the Vision API.

Reasons Vibium is Efficient:

Minimal Responses: Only success/failure messages (averaging fewer than 5 words).
Images Stored Locally: Screenshots return only the file path (no token consumption).
No Code Display: Only simple status messages are returned.

Impact on Complex Tasks

Even for a simple page like example.com, an 8.6x difference emerges. In actual web applications (e.g., dashboards, admin panels), the gap is likely to widen further; the Medium and Complex rows below are estimates, not measurements:

Page Complexity	Playwright Consumption	Vibium Consumption	Ratio
Simple (example.com)	2,061 tokens	240 tokens	8.6x
Medium (blog post)	Estimated 5,000–10,000	240–300	16–33x
Complex (admin panel)	Estimated 10,000–50,000	240–400	25–125x
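
As a quick check of the arithmetic, the ratios in this table can be recomputed from the token counts. This is only a sketch: the function name is made up for illustration, and for the estimated rows it assumes the ratio is taken against the upper Vibium estimate and rounded down, which is the reading that reproduces the ranges shown.

```python
def token_ratio(playwright_tokens: int, vibium_tokens: int) -> float:
    """Return how many times more tokens Playwright used than Vibium."""
    if vibium_tokens <= 0:
        raise ValueError("vibium_tokens must be positive")
    return playwright_tokens / vibium_tokens


if __name__ == "__main__":
    # Measured on example.com (figures from the experiment above)
    print(f"Simple: {token_ratio(2061, 240):.1f}x")

    # Estimated rows: (low, high) Playwright estimate vs. upper Vibium estimate
    estimates = {
        "Medium": ((5_000, 10_000), 300),
        "Complex": ((10_000, 50_000), 400),
    }
    for label, ((low, high), vibium_upper) in estimates.items():
        low_ratio = int(token_ratio(low, vibium_upper))    # rounded down
        high_ratio = int(token_ratio(high, vibium_upper))  # rounded down
        print(f"{label}: {low_ratio}-{high_ratio}x")
```

Dividing by the lower Vibium estimate instead would give larger ratios, so the table's ranges are on the conservative side.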

Conclusion:

Vibium is designed to return "only the necessary information for AI."
Playwright returns "information for humans to debug."
Vibium's approach is overwhelmingly advantageous for reducing operational costs for AI agents.

These experimental results clearly illustrate why Vibium was created separately from Playwright. In the era of AI agents, "how efficient" is more important than "how feature-rich."

Choosing Between Playwright and Vibium

Both tools are excellent, but the optimal choice varies depending on the use case.

Cases Where Playwright is More Suitable

Playwright is better suited for scenarios such as:

1. Human-Written E2E Tests

Fixed test scripts executed in CI/CD pipelines
Existing Playwright test suites
Need for debugging information (accessibility tree, executed code)

2. Cross-Browser Testing

Running the same code across Chrome, Firefox, and Safari
Verifying differences in behavior between browsers
Emulating mobile browsers

3. Advanced DOM Manipulation

Complex operations with Shadow DOM or iframes
Intercepting and mocking network requests
Fine control over browser contexts

4. Integration with Existing Ecosystems

Official tools like Playwright Test Runner, Playwright Inspector
Benefits of TypeScript type definitions (auto-completion, type checking)
Official support and community from Microsoft

Example: Complex E2E Test Scenario

// Playwright's strength: Detailed control
import { test, expect } from '@playwright/test';

test('Complex payment flow', async ({ page, context }) => {
  // Mock the payment API so the test never hits the real backend
  await page.route('**/api/payment', async route => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: '{"success": true}',
    });
  });

  // Open a link in a new tab and wait for the new page to load
  const [newPage] = await Promise.all([
    context.waitForEvent('page'),
    page.click('a[target="_blank"]'),
  ]);
  await newPage.waitForLoadState();

  // Locators pierce open shadow roots, so no manual shadowRoot lookup is needed
  const shadowButton = page.locator('custom-element button');
  await expect(shadowButton).toBeVisible();
});

Cases Where Vibium is More Suitable

Conversely, Vibium is optimal for scenarios such as:

1. Automation by AI Agents

LLMs (Claude, Gemini, etc.) operating the browser
Executing tasks based on natural language instructions
Emphasizing token cost efficiency in operations

2. Dynamic Browser Operations

Tasks where steps are not predetermined
Situations requiring actions to change based on user input
Prioritizing "reaching the goal" over fixed procedures

3. Simple Scripts and REPL

Interactively operating the browser in Node.js REPL
Direct invocation from Python scripts
Writing simply with synchronous API

4. Zero Setup is Essential

Minimizing dependencies in CI environments
Keeping Docker container sizes small
Quick prototyping

Example: AI Agent Task

// Vibium's strength: Simplicity and AI integration
const { browserSync } = require('vibium');

// REPL-friendly synchronous API
const vibe = browserSync.launch();
vibe.go('https://example.com');

// AI determines the next step
// "Find the login button and click it"
// → Automatically waits if the element is not found
// → If still not found, requests AI to reassess

Selection Criteria

Criteria	Playwright	Vibium
Executor	Human-written scripts	AI agents
Nature of Tests	Deterministic (fixed steps)	Dynamic (context-dependent judgments)
Need for Debugging	High (detailed information needed)	Low (results-focused)
Token Cost	Not a concern	Important
Cross-Browser	Essential	Chrome-centric is fine
Existing Assets	Playwright code available	Zero start

Practical Suggestion: Use Both

In many projects, the ideal approach is to split the work as follows:

Fixed tests in CI/CD → Playwright (stability-focused)
Exploratory testing and demos → Vibium (flexibility-focused)
AI assistant integration → Vibium (token efficiency-focused)

Both are excellent tools, and the decision should be based on "which is more suitable" rather than "which is better."

Figure: Tool Selection Crossroads: Differentiating Playwright and Vibium

Conclusion

The reasons Vibium was created anew rather than using Playwright can be summarized in the following three points:

AI Agent-First Design: Built-in MCP server, stdio communication, simple API
Adoption of WebDriver BiDi: A standardized next-generation protocol
Zero Setup Philosophy: Single binary, automatic browser downloads, immediate functionality

The evolution from Selenium to Playwright aimed at creating "better testing tools." In contrast, Vibium pioneers a new category as "infrastructure for AI to operate browsers."

As demonstrated by the token efficiency experiment, Vibium adopts a new approach where "AI agents visually understand the web." This contrasts with the traditional DOM manipulation-centric tools.

If you're interested in browser automation in the AI agent era, be sure to try out Vibium.

Agentic CLI Design: 7 Principles for Designing CLI as a Protocol for AI Agents

tumf — Fri, 06 Feb 2026 00:38:38 +0000

Originally published on 2026-02-06
Original article (Japanese): Agentic CLI Design: CLIをAIエージェント向けプロトコルとして設計する7つの原則

CLI tools have long been designed as interfaces for humans working in a terminal. However, with the rise of LLMs (Large Language Models) and AI agents (programs that autonomously invoke tools to make progress on a task), CLIs are being asked to take on a new role: serving as a "protocol/API that agents can safely, reliably, and repeatedly invoke."

Recently, I have had more opportunities to have agents run CLI commands. Issues that rarely bother humans, such as "stopping at confirmation prompts," "logs mixing into stdout and making parsing impossible," and "accidentally repeating the same operation," come up routinely when agents are the ones running the commands.

In this article, I summarize a design concept I call "Agentic CLI Design." It redefines the CLI from "a UI operated by humans" to "a protocol invoked by agents," and lays out seven design principles that keep a CLI working under the assumptions of failure, re-execution, and non-interactivity.

What is Agentic CLI Design?

Agentic CLI Design consists of design principles for CLIs that allow LLMs/agents to execute commands safely and reliably in a non-interactive, iterative, and failure-prone environment.

Rather than optimizing for "feel" or "ease of use" for humans, it focuses on ensuring that machines can read, judge, re-execute, and recover.

Success Conditions

Agentic CLI Design succeeds when agents can use the CLI under the following conditions:

No confusion: Options are clearly presented, allowing for judgment on the next action.
No destruction: Default to safety, requiring explicit confirmation for destructive operations.
No blockage: Able to complete non-interactively, with clear timeout/retry policies.
Repeatable: Idempotent, ensuring safety when re-executed.
Self-repairing: Observable, allowing for judgment of recovery procedures from errors.

7 Principles

Agentic CLI Design is composed of the following seven principles (Principle 1 to Principle 7).

Principle 1: Machine-readable

Principle: Output is structured and provided in a format that machines can reliably parse.

Design Checks:

Options for --json / --output json|yaml|text are available.
Strict adherence to standard output (stdout) = results / standard error output (stderr) = logs/progress (do not mix).
Errors are also structured (preferably in JSON).
Schema is stable (breaking changes managed via schemaVersion, etc.).

Example:

kubectl, the Kubernetes CLI, supports JSON output, and the AWS CLI offers --output json.

# Structured output on success
kubectl get pods -o json

# JSON on stdout, with diagnostics kept out of it in a separate file
aws ec2 describe-instances --output json 2>aws-errors.log

Human-friendly "readable tables" are secondary. Agents need to be able to reliably parse JSON or YAML.

Minimum Recommended Response (JSON):

On success:

{
  "ok": true,
  "type": "items.list",
  "schemaVersion": 1,
  "data": {
    "items": [
      {"id": "...", "createdAt": "2026-02-05T08:00:00Z"}
    ],
    "nextCursor": "..."
  }
}

On failure:

{
  "ok": false,
  "type": "items.list",
  "schemaVersion": 1,
  "error": {
    "code": "rate_limited",
    "message": "...",
    "retryAfterMs": 1200
  }
}
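
The two envelopes above could be produced by a few small helpers. The following is a minimal sketch rather than a reference implementation: the helper names (success, failure, emit, log) are hypothetical, and the only thing it is meant to show is that results go to stdout as a single JSON document while progress goes to stderr.

```python
import json
import sys

SCHEMA_VERSION = 1


def success(type_: str, data: dict) -> dict:
    """Build the success envelope with the fixed top-level fields."""
    return {"ok": True, "type": type_, "schemaVersion": SCHEMA_VERSION, "data": data}


def failure(type_: str, code: str, message: str, **extra) -> dict:
    """Build the failure envelope; extra carries hints such as retryAfterMs."""
    error = {"code": code, "message": message, **extra}
    return {"ok": False, "type": type_, "schemaVersion": SCHEMA_VERSION, "error": error}


def emit(envelope: dict) -> None:
    """Write exactly one JSON document to stdout; nothing else goes there."""
    json.dump(envelope, sys.stdout)
    sys.stdout.write("\n")


def log(message: str) -> None:
    """Progress and diagnostics go to stderr so they never corrupt the result."""
    print(message, file=sys.stderr)


if __name__ == "__main__":
    log("Processing...")
    emit(success("items.list", {"items": [], "nextCursor": None}))
```

Because both envelopes share ok, type, and schemaVersion, an agent can branch on ok first and only then look at data or error.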

Principle 2: Non-interactive by default

Principle: Do not assume interactive prompts, allowing for completion in headless environments (running without screens or interactive operations, such as CI or job runners).

Design Checks:

Options for --yes / --force / --no-confirm / --non-interactive are available.
Must be able to complete in environments without TTY.
If interaction is necessary, it must be explicitly opted in.

Example:

Options that answer prompts in advance, like Terraform's -auto-approve, make agent operations easier.

# Execute without interaction
terraform apply -auto-approve

# Answer "yes" to every prompt
apt-get install -y package-name

Agents cannot respond to "Y/N?" prompts. All choices must be specified in advance via options. It is also crucial that the process does not stop in environments without TTY.

Authentication (OAuth/headless) Key Points:

If possible, prioritize Device Authorization Grant (RFC 8628).
Provide auth status --json for agents to confirm prerequisites.
Support migration to headless environments with auth export / auth import.
With --non-interactive, return "error + next steps" instead of prompting for confirmation.
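
One way to apply these checks is to decide up front whether prompting is allowed at all, and to return a structured error when it is not. The sketch below assumes hypothetical names (is_interactive, confirm, the confirmation_required code, the nextSteps field); none of them come from an existing tool.

```python
import sys


def is_interactive(non_interactive_flag: bool, stdin=None) -> bool:
    """Interactive only if the caller did not opt out and stdin is a TTY."""
    stdin = sys.stdin if stdin is None else stdin
    return not non_interactive_flag and stdin.isatty()


def confirm(action: str, assume_yes: bool, interactive: bool) -> dict:
    """Decide whether to proceed without ever blocking a headless caller."""
    if assume_yes:
        return {"ok": True, "proceed": True}
    if not interactive:
        # No prompt: report the problem and the flag that would resolve it.
        return {
            "ok": False,
            "error": {
                "code": "confirmation_required",
                "message": f"'{action}' needs confirmation",
                "nextSteps": ["re-run with --yes"],
            },
        }
    # Only reached when a human is attached to a terminal.
    answer = input(f"{action}? (y/n): ")
    return {"ok": True, "proceed": answer.strip().lower() == "y"}
```

An agent that receives confirmation_required can read nextSteps and retry with --yes, instead of hanging on a prompt it cannot see.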

Principle 3: Idempotent & Replayable

Principle: It is safe to execute the same command multiple times, and the results are predictable.

Idempotence means that repeating the same operation does not change the result. Agents may "hit the same command again" due to timeouts or network interruptions. Therefore, a design that avoids accidents during re-execution is necessary.

Design Checks:

Accept --dedupe-key / --client-request-id for sending/creating.
Allow choosing behaviors for "already created": --if-exists skip|update|error.
Clearly indicate paging for retrieval: --limit --cursor --all.

Example:

# Idempotent apply (creates the resource, or updates it if it already exists)
kubectl apply -f deployment.yaml

# For a CLI hitting an HTTP API, explicitly provide a request ID (deduplication key)
curl -sS -X POST https://api.example.com/v1/items \
  -H 'Content-Type: application/json' \
  -H 'Idempotency-Key: 01JHXXXX...' \
  -d '{"name":"example"}'

Agents may re-execute the same command due to network errors or timeouts. A design that ensures safety during re-execution is essential.
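
To make the replay behavior concrete, here is a minimal sketch of how a request ID and an --if-exists policy might interact. ItemStore is an in-memory stand-in invented for this example; a real CLI would delegate to its backend API, and the action values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ItemStore:
    """In-memory stand-in for a backend (hypothetical, for illustration)."""
    items: dict = field(default_factory=dict)          # name -> item
    seen_requests: dict = field(default_factory=dict)  # request id -> result

    def create(self, name: str, payload: dict, request_id: str,
               if_exists: str = "error") -> dict:
        # Replay: the same request id always returns the first result.
        if request_id in self.seen_requests:
            return self.seen_requests[request_id]

        if name in self.items:
            if if_exists == "skip":
                result = {"ok": True, "action": "skipped", "item": self.items[name]}
            elif if_exists == "update":
                self.items[name] = {"name": name, **payload}
                result = {"ok": True, "action": "updated", "item": self.items[name]}
            else:
                result = {"ok": False, "error": {"code": "already_exists"}}
        else:
            self.items[name] = {"name": name, **payload}
            result = {"ok": True, "action": "created", "item": self.items[name]}

        self.seen_requests[request_id] = result
        return result
```

Re-sending the same request ID after a timeout returns the original result instead of creating a duplicate, while a new request ID for an existing name follows the chosen --if-exists policy.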

Principle 4: Safe-by-default

Principle: Destructive operations are not executed by default and require explicit confirmation.

Design Checks:

Destructive operations can enforce --dry-run / --confirm <id>.
Deletion requires --force, ensuring no accidents by default.
Minimize permissions/scope, returning "next steps" when insufficient.

Example:

# Dry-run for pre-confirmation
terraform plan

# Execution requires explicit approval
terraform apply

# Prepare a preview before destructive operations
kubectl diff -f deployment.yaml

Agents can accidentally perform deletions. Multiple layers of confirmation are necessary for destructive operations.
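
The --dry-run → --confirm <id> flow can be sketched as follows. This is one possible design, not an established one: the function names are hypothetical, and deriving the confirmation ID from a hash of the plan is an assumption made here so that an approval only applies to the exact plan that was previewed.

```python
import hashlib
import json
from typing import Optional


def plan_id(plan: dict) -> str:
    """Derive a short, stable id from the plan contents."""
    canonical = json.dumps(plan, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def delete_items(targets: list, dry_run: bool = True,
                 confirm: Optional[str] = None) -> dict:
    plan = {"action": "delete", "targets": sorted(targets)}
    expected = plan_id(plan)

    if dry_run:
        # Safe default: describe what would happen and how to approve it.
        return {"ok": True, "dryRun": True, "plan": plan, "confirmId": expected}

    if confirm != expected:
        return {"ok": False, "error": {
            "code": "confirmation_mismatch",
            "message": "run with --dry-run and pass the returned id via --confirm",
        }}

    # A real implementation would perform the deletion here.
    return {"ok": True, "dryRun": False, "deleted": plan["targets"]}
```

If the set of targets changes between the preview and the real run, the ID no longer matches and the command refuses to proceed.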

Principle 5: Observable & Debuggable

Principle: The execution status can be observed, and recovery procedures can be determined in case of errors.

Design Checks:

Options for --verbose / --debug / --log-format json are available.
Ability to pass correlation IDs with --trace-id.
Classify exit codes to facilitate automatic recovery:
- Example: 0=success / 2=argument error / 3=authentication error / 4=retry recommended.

Example:

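# Output detailed logs
kubectl apply -f deployment.yaml --v=9

# Branch on the exit code of a CLI that follows the classification below
# ("tool" is a placeholder, not a real command)
tool items list --output json
status=$?
if [ "$status" -eq 4 ]; then
  echo "Retryable error, waiting..." >&2
  sleep 5
  tool items list --output json
fi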

Agents will determine the "next step" from error messages. Structured exit codes and errors are crucial.

Recommended Exit Code Classification:

0: Success
2: Argument error / usage error
3: Authentication / permission error
4: Retry recommended (rate limit / transient)
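
On the calling side, this classification lets an agent harness decide mechanically what to do next. The sketch below is an illustration under assumptions: the action names and the run_with_retry helper are made up, and the run and sleep parameters exist only so the logic can be exercised without a real CLI.

```python
import subprocess
import time

EXIT_OK, EXIT_USAGE, EXIT_AUTH, EXIT_RETRY = 0, 2, 3, 4


def next_action(exit_code: int) -> str:
    """Map an exit code to what the caller should do next."""
    return {
        EXIT_OK: "done",
        EXIT_USAGE: "fix_arguments",
        EXIT_AUTH: "reauthenticate",
        EXIT_RETRY: "retry",
    }.get(exit_code, "abort")


def run_with_retry(argv, max_attempts=3, delay_seconds=5.0,
                   run=subprocess.run, sleep=time.sleep) -> int:
    """Re-run argv only while the CLI signals a retryable failure."""
    code = EXIT_RETRY
    for attempt in range(max_attempts):
        code = run(argv).returncode
        if next_action(code) != "retry":
            break
        if attempt < max_attempts - 1:
            sleep(delay_seconds)  # wait before the next attempt
    return code
```

Only exit code 4 triggers another attempt; argument and authentication errors are returned immediately, since retrying them unchanged would not help.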

Principle 6: Context-efficient

Principle: Do not waste the context window of LLMs.

Design Checks:

Use --fields/--select (projection) to retrieve only necessary fields.
Handle large data with --output ndjson (streaming format with one JSON per line).
Default to summaries, with details provided via get/--include-*.
Implement server-side filtering (since/until/query/type…).

Example:

# Retrieve only necessary fields
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

# Handle large data with paging
aws s3api list-objects-v2 --bucket my-bucket --max-items 100

Agents can hit token limits if they cram too much data into the context window (the maximum input LLMs can reference at once). A design that retrieves only the minimum necessary data is required.
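
Inside a CLI, --fields and --output ndjson could be implemented with a projection step and a line-by-line generator. This is a rough sketch with hypothetical function names and sample data; it handles top-level fields only.

```python
import json


def project(record: dict, fields: list) -> dict:
    """Keep only the requested top-level fields, in the requested order."""
    return {name: record[name] for name in fields if name in record}


def iter_ndjson(records, fields):
    """Yield one compact JSON line per record so output can be streamed."""
    for record in records:
        yield json.dumps(project(record, fields), separators=(",", ":"))


if __name__ == "__main__":
    pods = [
        {"name": "web-1", "status": "Running", "spec": {"containers": ["..."]}},
        {"name": "web-2", "status": "Pending", "spec": {"containers": ["..."]}},
    ]
    for line in iter_ndjson(pods, ["name", "status"]):
        print(line)
```

The heavy spec field never reaches stdout, and because each line is a complete JSON object, an agent can stop reading as soon as it has what it needs.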

Principle 7: Introspectable

Principle: The CLI itself should output specifications in a machine-readable format, allowing agents to self-discover.

Design Checks:

Provide commands --json (command list and argument list).
Provide schema --command ... --output json-schema (command-level JSON Schema, defining the structure of JSON).
Provide --help --json (examples, exit codes, error vocabulary).
Example of top-level fixed fields for --output json:
- schemaVersion, type, ok.

Example:

# This is an example of desirable self-describing design
tool commands --output json
tool schema --command items.list --output json-schema
tool help items.list --output json

With the Model Context Protocol (MCP), schemas can be derived from tool definitions, but CLIs often remain black boxes. When the CLI outputs its own specification in a machine-readable format, agents can discover its capabilities by themselves.

Recommended Set of Introspection Commands:

tool commands --json
tool schema --command <subcommand...> --output json-schema
tool help --json (or --help --json for each command)
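
One way to keep such introspection output from drifting away from the real parser is to generate both from a single registry. The following sketch assumes a hypothetical COMMANDS registry and a tool program name; the output shape simply reuses the ok/type/schemaVersion envelope from Principle 1.

```python
import argparse
import json

# Single source of truth: a hypothetical command registry.
COMMANDS = {
    "items.list": {
        "summary": "List items",
        "args": [
            {"name": "--limit", "type": "integer", "default": 50},
            {"name": "--cursor", "type": "string", "default": None},
        ],
    },
}

_TYPES = {"integer": int, "string": str}


def describe_commands() -> dict:
    """Machine-readable command list, as `tool commands --json` might print."""
    commands = [{"name": name, **spec} for name, spec in COMMANDS.items()]
    return {"ok": True, "type": "commands", "schemaVersion": 1,
            "data": {"commands": commands}}


def build_parser() -> argparse.ArgumentParser:
    """Build the real parser from the same registry so the two cannot drift."""
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, spec in COMMANDS.items():
        cmd = sub.add_parser(name, help=spec["summary"])
        for arg in spec["args"]:
            cmd.add_argument(arg["name"], type=_TYPES[arg["type"]],
                             default=arg["default"])
    sub.add_parser("commands", help="Print the command list as JSON")
    return parser
```

A main function would call build_parser().parse_args() and, for the commands subcommand, print json.dumps(describe_commands()) to stdout.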

Anti-patterns

The following are patterns that commonly break agents, seen from the perspective of Agentic CLI Design.

Logs/Progress Mixed with stdout

# ❌ Bad Example
echo "Processing..."
echo '{"result": "success"}'

Agents will fail when trying to parse JSON. Please output logs to stderr.

JSON Structure Changes Based on Conditions

# ❌ Bad Example
# On success: {"data": {...}}
# On failure: {"error": "..."}

Please clarify success/failure with an ok field and unify the structure.

Default Interaction Causes Blockage in Headless Environments

# ❌ Bad Example
read -p "Continue? (y/n): " answer

This will cause blockage in CI environments or job runners. Please provide a --yes option.

Destructive Commands Can Be Executed by Default

# ❌ Bad Example
rm -rf /data/*

Please provide two-step confirmations with --dry-run and --confirm.

`--all` Results in Huge JSON Output

# ❌ Bad Example
curl 'https://api.example.com/items?all=true'

This will explode the context window. Please implement paging (--limit / --cursor).

Authentication Requires a Browser, Failing in Remote/Container Environments

# ❌ Bad Example
open https://auth.example.com/login

Please prioritize Device Authorization Grant (RFC 8628).

Scorecard (Review Checklist)

The following checklist scores each of the seven principles of Agentic CLI Design on a 0/1/2 scale. The items are concrete so that the checklist is easy to adapt to other projects.

Principle 1: Machine-readable

[ ] --output json is available.
[ ] stdout = results / stderr = logs is adhered to.
[ ] Errors are structured (JSON recommended).
[ ] schemaVersion is present / compatibility policy is documented.

Principle 2: Non-interactive

[ ] --non-interactive is available (can be auto ON without TTY).
[ ] All operations requiring interaction are opt-in (i.e., default is non-interactive).
[ ] Vocabulary for --yes/--force/--no-confirm is unified.

Principle 3: Idempotent & Replayable

[ ] Writing commands have --client-request-id / --dedupe-key equivalents.
[ ] There is a policy for --if-exists.
[ ] --cursor/--limit/--all are available (with --all implementing internal paging).

Principle 4: Safe-by-default

[ ] Destructive operations can use --dry-run.
[ ] Actual execution requires additional guards like --confirm <id> / --force.

Principle 5: Observable & Debuggable

[ ] --debug is available (logs to stderr).
[ ] --log-format json is available.
[ ] Accepts --trace-id.
[ ] Exit code classification is present (2/3/4, etc.).

Principle 6: Context-efficient

[ ] --fields/--select is available.
[ ] --output ndjson is available.
[ ] Heavy fields are opt-in via --include-*.

Principle 7: Introspectable

[ ] commands --json is available.
[ ] schema --command ... --output json-schema is available.

This Scorecard can be used for reviewing CLIs or as acceptance criteria.
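
If you want to tally the Scorecard automatically, something like the following could work. The article does not define how checked items map to 0/1/2, so this sketch assumes 0 = no items met, 1 = some met, 2 = all met; the function names and sample answers are also made up.

```python
def score_principle(checks: list) -> int:
    """0 = no items met, 1 = some met, 2 = all met (an assumed mapping)."""
    if not checks or not any(checks):
        return 0
    return 2 if all(checks) else 1


def score_cli(scorecard: dict) -> dict:
    """Score every principle and report the total against the maximum."""
    per_principle = {name: score_principle(checks)
                     for name, checks in scorecard.items()}
    return {
        "principles": per_principle,
        "total": sum(per_principle.values()),
        "max": 2 * len(per_principle),
    }


if __name__ == "__main__":
    # One boolean per checklist item (hypothetical review of some CLI)
    review = {
        "machine_readable": [True, True, False, False],
        "non_interactive": [True, True, True],
        "idempotent": [False, False, False],
        "safe_by_default": [True, False],
        "observable": [True, False, False, True],
        "context_efficient": [False, True, False],
        "introspectable": [False, False],
    }
    print(score_cli(review))
```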

Released AgentSkill

This article describes "principles," and principles alone can leave you unsure how to implement them. So I have released an AgentSkill (a manual for agents) that supports Agentic CLI Design.

tumf/skills: agentic-cli-design

What it includes (having these in place generally stabilizes agent operations):

Recipes by task (shortest command sequences).
Guardrails (flow like --dry-run → confirmation → --confirm).
Recommended defaults (--output json, --non-interactive, paging, etc.).
Typical success/failure output examples (JSON).
Recovery procedures for errors (retries, authentication, insufficient permissions, etc.).

I believe that starting from this AgentSkill is the fastest way to solidify the "procedures and vocabulary for safe usage" of your CLI.

CLI vs MCP: When to Use Which

The Model Context Protocol (MCP) is a standard protocol for connecting AI models with external tools. MCP and CLI do not compete; each fits different cases:

Cases Suitable for CLI

Existing CLI tools are available: GitHub CLI (gh), kubectl, aws cli, etc.
Stateless operations: Operations that can be completed in a single command.
Integration with Unix pipes: Integration with existing shell scripts.

Cases Suitable for MCP

No existing CLI tools: Custom services or APIs.
Stateful operations: Operations that require maintaining state across multiple calls.
Real-time streaming: MCP supports streaming responses.
Custom business logic: Applying unique rules to tool access.

Agentic CLI Design is a set of design principles for optimizing existing CLI tools for agents. When creating new tools, consider both MCP and CLI.

Conclusion

Agentic CLI Design is a design concept that redefines CLI from "a UI operated by humans" to "a protocol invoked by agents."

By being mindful of the seven principles (Principle 1 to Principle 7), you can design CLIs that allow agents to operate "without confusion, without destruction, without blockage, repeatedly, and while self-repairing."

Using this Scorecard when reviewing existing CLI tools (gh, kubectl, aws cli, etc.) can help identify areas for improvement for agents.

If you are interested, please take the time to score your CLI tool using the Scorecard.

jj workspace: Parallel Development with Vibe Coding Without Getting Stopped by Conflicts

tumf — Thu, 05 Feb 2026 12:43:12 +0000

Originally published on 2026-01-06
Original article (Japanese): jj workspace: コンフリクトで止まらないvibe coding並列開発

Recently, while developing with four Claude Code instances running in parallel, I ran into a problem.

I created four directories using git worktree and assigned the AI to implement features in each one—up to this point, everything was smooth. However, when it came time to merge, I was hit with a storm of "CONFLICT" messages. If one worktree gets blocked by a conflict, all the work derived from it gets blocked as well, ultimately resulting in the AI just sitting idle.

"I don’t want to stop working just because a conflict occurred..." I thought, and while researching, I discovered a version control system called Jujutsu (jj), developed primarily by Google engineers. This tool turned out to be quite effective and fit my vibe coding style perfectly, so I wanted to share it.

What is Jujutsu (jj)?

Jujutsu is an open-source VCS that is compatible with Git, initiated by engineers at Google. Although it is not an official Google product, it is still actively developed by Googlers. The key point is its "Git compatibility," allowing it to be integrated directly into existing Git repositories.

So, what are the benefits?

Work does not stop even if conflicts occur: This is the biggest advantage. Conflicts are recorded in commits, allowing other work to continue normally.
All operations are logged: You can view the entire history with jj op log, and you can revert to any point. This is subtly helpful when the AI's output is not quite right.
Automatic commits: The working copy is itself a commit, and jj snapshots your edits into it whenever you run a jj command. No need for git add or similar commands.
Built with Rust and fast: This is just a bonus.

Installation

If you're on macOS, you can install it with brew install jj.

# macOS (Homebrew)
brew install jj

# Linux (cargo)
cargo install --git https://github.com/jj-vcs/jj jj-cli

# Check version
jj --version

Conceptual Differences from Git

Initially, I was a bit confused by the subtle conceptual differences from Git. Once you get used to it, it becomes easier, but at first, I was like, "Wait, there are no branches?"

Concept	Git	Jujutsu (jj)
Working Copy	Managed in the staging area (index)	Always one commit (`@`)
Branches	Required (detached HEAD is a special state)	Not needed (can work anonymously)
Conflicts	Treated as errors (blocks work)	Recorded in commits (work can continue)
History Manipulation	rebase (risky)	Automatic rebase (safe)

The "no need for branches" aspect in particular felt odd at first, but in vibe coding, it allows you to "start working and decide on a name later," which turned out to be quite convenient.

Basic Operations in jj

Trying out jj in an existing Git repository is straightforward. It can be used alongside Git, so if you don’t like it, you can revert back.

Initializing a Repository

# Initialize jj in an existing Git repository
cd your-git-repo
jj git init --git-repo .

# Check the current state
jj log

Example output of jj log:

@ qpvuntsm tumf@example.com 2026-01-05 14:30:00
│ (empty) (no description set)
◉ rlvkpntz tumf@example.com 2026-01-05 14:25:00 main
│ Add README
~

@ represents the current working copy commit. It is similar to Git's HEAD, but it always treats uncommitted changes as part of a commit.

Basic Change Flow

If you're used to Git, you might initially think, "Wait, what?" since you don’t need git add or git commit.

# 1. Start a new change
jj new main -m "Add feature A"

# 2. Edit files
vim src/feature.js

# 3. Check the state (already committed!)
jj log

# 4. Add/modify description
jj describe -m "Add authentication feature"

# 5. Start the next change
jj new -m "Add tests for feature A"

Your edits are snapshotted into the working-copy commit the next time you run a jj command, eliminating the need for the git add → git commit flow. Honestly, this alone makes it much more comfortable.

Editing Commits

Editing past commits is also easy. In Git, I would nervously use git rebase -i, but with jj, it’s much more relaxed.

# Edit a past commit
jj edit <change-id>
vim src/feature.js
jj describe -m "Updated feature"

# Return to the change you were working on before
jj edit <original-change-id>

Moreover, descendant commits are automatically rebased, so you don’t have to worry about dependencies.

git worktree vs jj workspace

Now, onto the main topic. To engage in vibe coding with parallel development, you need "multiple working directories."

Git has git worktree, while jj has jj workspace. While they may seem similar at first glance, there are several critical differences.

Basic Mechanism is Similar

The directory structure looks similar:

repo/               # Main directory
├── .git/          # Git data (shared)
├── .jj/           # jj data (shared)
└── src/

workspace-1/        # Parallel working directory 1
└── src/

workspace-2/        # Parallel working directory 2
└── src/

You will still need to run npm install in each directory for both, so there's no escaping that.

Three Key Differences

Now, here are the important differences that made me realize, "Oh, these are completely different."

Difference 1: Behavior During Conflicts (This is the Biggest One)

With git worktree, it goes like this:

# Merge feature-A in worktree-1
cd ../worktree-1
git merge feature-A  # ✅ Success

# Merge feature-B in worktree-2
cd ../worktree-2
git merge feature-B  # ❌ CONFLICT!
# At this point, worktree-2 is blocked
# Even if you create a new branch in another worktree, it branches from the main before the conflict

When worktree-2 gets blocked, all work derived from it also stops. This was painful.

On the other hand, with jj workspace:

# Rebase feature-A in workspace-1
cd ../workspace-1
jj rebase -s feature-A -d main  # ✅ Success

# Rebase feature-B in workspace-2
cd ../workspace-2
jj rebase -s feature-B -d feature-A  # ⚠️ Conflict occurs

# But work can continue!
jj log
# @  feature-B (conflict) ← Conflict state is recorded
# ◉  feature-A
# ◉  main

# Start a different task in workspace-3
cd ../workspace-3
jj new main -m "feature-C"  # ✅ Work starts without issues

Conflicts are "recorded" only, and other work is not blocked. This is incredibly helpful for vibe coding.

Figure: Comparison diagram of conflict handling between git worktree and jj workspace. git worktree blocks on conflict, while jj records it and allows continuation.

Difference 2: State Sharing and Visibility

In git worktree, uncommitted changes in each worktree are not visible from others:

# Changes in worktree-1 (uncommitted)
cd ../worktree-1
echo "test" > newfile.txt

# Not visible from worktree-2
cd ../worktree-2
ls newfile.txt  # ❌ Does not exist

In jj, each workspace's working copy is a commit that jj snapshots automatically, so it is visible from all workspaces:

# Changes in workspace-1 (snapshotted by the next jj command)
cd ../workspace-1
echo "test" > newfile.txt
jj status

# Visible from workspace-2
cd ../workspace-2
jj log  # The workspace-1 working copy is displayed with the workspace-1@ mark

This reduces the instances of "Wait, where was I working and what was I doing?"

Difference 3: Automatic Rebase for Dependent Branches

This is also subtly convenient. Consider a dependency chain like feature-A → feature-B → feature-C.

In git worktree, if you fix feature-A, you need to manually rebase B and C:

cd ../worktree-A
git commit --amend -m "Fix feature A"

# B and C need manual rebase
cd ../worktree-B
git rebase feature-A  # Manual

cd ../worktree-C
git rebase feature-B  # Manual

With jj, it does it automatically for you:

cd ../workspace-A
jj describe -m "Fix feature A"

# B and C are automatically rebased
jj log  # All updated

Since I often fine-tune code generated by AI later, this is a nice feature.

Practical Workflow for Vibe Coding

Here’s how I actually run my workflow with Claude Code in four parallel instances.

Figure: Architecture diagram of parallel development with jj workspace. Four workspaces operate in parallel around a shared repository hub.

Setup

The initial setup looks like this. It’s a bit tedious to run npm install four times, but it’s just the first time, so it’s manageable.

# Initialize in the project directory
cd my-project
jj git init --git-repo .

# Create four workspaces
jj workspace add ../workspace-1
jj workspace add ../workspace-2
jj workspace add ../workspace-3
jj workspace add ../workspace-4

# Install dependencies in each workspace (this is unavoidable)
# The subshell keeps the current directory unchanged after each install
for i in {1..4}; do
  (cd "../workspace-$i" && npm install)
done

Starting Parallel Tasks

I throw different tasks at the AI in each workspace. I open four terminal windows and run Claude Code in each.

# workspace-1: Authentication feature
cd ../workspace-1
jj new main -m "Add authentication"
# Ask Claude Code to "Implement JWT authentication"

# workspace-2: Database optimization
cd ../workspace-2
jj new main -m "Optimize database queries"
# Ask Claude Code to "Fix N+1 queries"

# workspace-3: Adding UI components
cd ../workspace-3
jj new main -m "Add user profile component"
# Ask Claude Code to "Create a profile screen"

# workspace-4: Adding tests
cd ../workspace-4
jj new main -m "Add integration tests"
# Ask Claude Code to "Write E2E tests"

While the AI is working, I switch to another workspace to review or think about the next task.

Handling Conflicts

When a conflict actually occurs, for example, if workspace-1 and workspace-2 edit the same file:

# Integrate changes from workspace-1 into main
cd ../workspace-1
jj rebase -s @ -d main  # ✅ Success

# When trying to integrate changes from workspace-2 on top of workspace-1's change...
cd ../workspace-2
jj rebase -s @ -d workspace-1@  # ⚠️ Conflict occurs

# But work can continue
jj log
# @  workspace-2-change (conflict)
# ◉  workspace-1-change
# ◉  main

# Work in workspaces 3 and 4 is unaffected!
cd ../workspace-3
jj new main -m "Continue other work"  # ✅ No issues

The key point is that "conflict resolution can be done later in bulk." This prevents the AI from sitting idle.

# After all tasks are complete, resolve conflicts in bulk
cd ../workspace-2
jj status  # Check which files are conflicted in the working-copy commit
jj resolve  # Resolve interactively
jj describe -m "Optimize queries (resolved conflict)"

Final Merge

Once everything is done, I push everything at once. Pushing to GitHub is done in the usual way.

# Check changes in all workspaces
cd my-project
jj log -r 'all()'

# Organize with squash if necessary
jj squash --from <change-id> --into <target-change-id>

# Push as a Git repository
jj git push

Tracking Prompt History with Operation Log

This is a side benefit I discovered, but the operation log is handy for tracking "what was generated with which prompt."

Figure: Timeline diagram of the operation log. All operations are recorded, allowing restoration to any point.

Record of All Operations

# Display all operations
jj op log

# Example output
@  qpvuntsm agent-1@example.com 2026-01-05 15:30:00 op_abc123
│  describe "Implement authentication"
◉  sqpuoqvx agent-2@example.com 2026-01-05 15:29:45 op_def456
│  edit src/auth.js
◉  rqxostpw agent-1@example.com 2026-01-05 15:29:30 op_ghi789
   new --after main

Record Prompts in Commit Messages

I tend to include the prompts I give to the AI directly in the commit messages:

jj describe -m "Prompt: Add user authentication with JWT.
Implementation includes:
- Login endpoint with email/password
- Token generation and validation
- Middleware for protected routes"

This way, if I later wonder, "Why is this code like this?" I can refer back to the prompt, which is very convenient.
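
If you launch agents from a script, the same message format can be generated programmatically. This is a small sketch with hypothetical helper names; it only builds the text and the argument list for jj describe -m, and leaves execution to the caller.

```python
def build_description(prompt: str, details: list) -> str:
    """Format a commit description that records the prompt and a summary."""
    lines = [f"Prompt: {prompt}"]
    if details:
        lines.append("Implementation includes:")
        lines.extend(f"- {detail}" for detail in details)
    return "\n".join(lines)


def describe_command(prompt: str, details: list) -> list:
    """Build the argv for `jj describe -m <message>` without running it."""
    return ["jj", "describe", "-m", build_description(prompt, details)]
```

Passing the result to subprocess.run(..., check=True) from inside a workspace would set the description on that workspace's working-copy commit. Using an argument list avoids shell quoting problems with multi-line prompts.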

Restoring to Specific Operations

If I want to roll back because "the AI's output was not quite right," I can easily revert:

# Return to a past operational state
jj op restore op_abc123

# Undo the restore and return to the previous state
jj undo

In Git, you would struggle with git reflog, but jj is more intuitive.

Naming Branches Later with Bookmarks

This feature also suits vibe coding well. You can "start working and decide on a name later."

# Start working anonymously (no branch name needed)
jj new main
# Implement with Claude Code...

# Name it after completion
jj bookmark create user-auth-feature

# Push for GitHub PR
jj git push --bookmark user-auth-feature

In Git, you have to decide on a branch name upfront with git checkout -b feature-xxx, but when having the AI implement, you often don’t know what you’ll end up with until the end. jj eliminates that issue.

Should You Continue Using jj?

Honestly, it’s a bit tricky to recommend it to everyone.

Who jj is Suitable For

Developers using AI agents in parallel
Environments where conflicts frequently occur (often editing the same files)
Exploratory development involving trial and error
Frequent history shaping and editing

Who Should Stick with git worktree

Teams where everyone is familiar with Git (to avoid learning costs)
Those who prioritize IDE Git integration (jj's IDE support is still developing)
Environments where conflicts rarely occur
Those who only need 1-2 parallel instances

Conclusion

I was stuck with the "everything stops due to conflicts" issue while doing parallel development with git worktree, and switching to jj resolved that.

To summarize the advantages of jj:

Conflicts do not block work: They are recorded, allowing work to continue.
Visibility of state: All changes are visible from all workspaces.
Automatic rebase: Changes in dependencies are automatically propagated.

There is a learning curve, but for those who want to avoid letting the AI sit idle during vibe coding, it’s worth trying. Since it’s Git-compatible, you can always revert if you don’t like it.

Start by trying jj git init --git-repo . in an existing project.


Quantifying the "Vague Anxiety" of Tailscale: tailsnitch Exposes 50 Configuration Mistakes

tumf — Thu, 05 Feb 2026 12:41:22 +0000

Originally published on 2026-01-07
Original article (Japanese): Tailscaleの『なんとなく不安』を数値化する：tailsnitchが暴く50の設定ミス

In September 2025, Asahi Group Holdings suffered a ransomware attack. The entry point was a vulnerability in their VPN device. In October, Askul was breached through a VPN account of a contracted service provider, leading to the shutdown of their e-commerce site.

Both incidents were the result of the misconception that "as long as we have a VPN, we are safe."

If you have implemented Tailscale, you might be thinking, "It should be safer than traditional VPNs, so I'm fine."

Indeed, Tailscale has advantages over traditional on-premises VPN gateways. With lightweight and audited encryption via WireGuard, device-level zero-trust authentication, and a SaaS architecture that eliminates the risk of exploiting VPN device vulnerabilities—these designs address the weaknesses of traditional VPNs.

However, Tailscale can also be dangerous if misconfigured. Leaving default ACLs unchecked can allow unrestricted access to all devices, and if reusable authentication keys are leaked, attackers can add unauthorized devices. tailsnitch is a security auditing tool that quantifies such "vague anxiety" with over 50 checks, evaluating them on a scale of Critical/High/Medium/Low/Info.

In this article, we will explain how to use tailsnitch and the dangerous configuration mistakes it can detect.

What is tailsnitch?

tailsnitch is an open-source tool that automatically audits the configuration of a Tailscale network (tailnet). Released on December 24, 2025, it garnered over 430 GitHub Stars in just two weeks.

Key features:

Over 50 security checks: 7 categories including ACLs, authentication keys, devices, network exposure, SSH, logs, and DNS
Severity rating in 5 levels: Critical → High → Medium → Low → Info
SOC 2 audit trail output: CSV/JSON output mapped to controls like CC6.1, CC6.2
CI/CD integration: Automatic checks during PRs with GitHub Actions
Interactive remediation mode: Fix configuration mistakes with the --fix flag

tailsnitch is developed by the security company Adversis, which also publishes a Tailscale Hardening Guide.

Installation

You can install tailsnitch using one of the following methods.

Prebuilt Binary (Recommended)

Download the latest version from GitHub Releases:

# Remove quarantine attribute for macOS
sudo xattr -rd com.apple.quarantine tailsnitch
chmod +x tailsnitch
sudo mv tailsnitch /usr/local/bin/

Install with Go

go install github.com/Adversis/tailsnitch@latest

Build from Source

git clone https://github.com/Adversis/tailsnitch.git
cd tailsnitch
go build -o tailsnitch .

Authentication Setup

tailsnitch uses the Tailscale API, so you need authentication credentials. You can use either an OAuth Client (recommended) or an API Key.

OAuth Client (Recommended)

The OAuth Client lets you restrict permissions with scopes, and its activity is recorded in the audit logs. Because it is not tied to an individual user, it is not invalidated when an employee leaves, unlike an API key.

Create an OAuth Client at https://login.tailscale.com/admin/settings/oauth
Grant the following scopes for read-only auditing:
- all:read (the easiest)
- Or individually: policy_file:read, devices:core:read, dns:read, auth_keys:read
Set the environment variables:

export TS_OAUTH_CLIENT_ID="..."
export TS_OAUTH_CLIENT_SECRET="tskey-client-..."

API Key

The API Key inherits the permissions of the user who created it.

Create an API Key at https://login.tailscale.com/admin/settings/keys
Set the environment variable:

export TSKEY="tskey-api-..."

Basic Usage

Audit All Items

tailsnitch

Example of the first run:

+=====================================================================+
|                    TAILSNITCH SECURITY AUDIT                        |
|            Tailnet: example.com                                     |
|            Version: 1.4.0 (build: d717661)                          |
+=====================================================================+

=== ACCESS CONTROLS ===================================================

[CRITICAL] ACL-001: Default 'allow all' policy active
  Your ACL policy omits the 'acls' field. Tailscale applies a
  default 'allow all' policy, granting all devices full access.

  Remediation:
  Define explicit ACL rules following least privilege principle.

  Source: https://tailscale.com/kb/1192/acl-samples
----------------------------------------------------------------------

[HIGH] AUTH-001: Reusable auth keys exist
  Found 2 reusable auth key(s). These can be reused to add
  multiple devices if compromised.

  Details:
    - Key tskey-auth-xxx (expires in 45 days)
    - Key tskey-auth-yyy (expires in 89 days)

  Remediation:
  Store reusable keys in a secrets manager. Prefer one-off keys.
----------------------------------------------------------------------

SUMMARY
======================================================================
  Critical: 1  High: 3  Medium: 5  Low: 2  Info: 8
  Total findings: 19  |  Passed: 33

Filter by Severity

# Show only Critical/High
tailsnitch --severity high

# Specific categories only
tailsnitch --category access   # ACL issues
tailsnitch --category auth     # Authentication key issues
tailsnitch --category device   # Device security

JSON Output and Aggregation with jq

# Output all results in JSON
tailsnitch --json > audit.json

# Extract only failed checks
tailsnitch --json | jq -r '
  .suggestions
  | map(select(.pass == false))
  | .[]
  | [.id, .title, .severity, .remediation]
  | @tsv
' > findings.tsv

# Aggregate by severity
tailsnitch --json | jq '
  .suggestions
  | map(select(.pass == false))
  | group_by(.severity)
  | map({severity: .[0].severity, count: length})
'

Example output:

[
  {"severity": "CRITICAL", "count": 1},
  {"severity": "HIGH", "count": 3},
  {"severity": "MEDIUM", "count": 5}
]
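The same aggregation can be done without jq. The following is a minimal Python sketch, assuming the JSON report has the shape the jq filters above rely on: a top-level `suggestions` list whose items carry `pass` and `severity`. The function name and the sample data are hypothetical.

```python
from collections import Counter


def count_failed_by_severity(report: dict) -> dict:
    """Count failed checks per severity, mirroring the jq pipeline."""
    failed = [c for c in report.get("suggestions", []) if c.get("pass") is False]
    return dict(Counter(c["severity"] for c in failed))


if __name__ == "__main__":
    # Hypothetical sample; a real report would come from `tailsnitch --json`.
    sample = {
        "suggestions": [
            {"id": "ACL-001", "severity": "CRITICAL", "pass": False},
            {"id": "AUTH-001", "severity": "HIGH", "pass": False},
            {"id": "DEV-001", "severity": "HIGH", "pass": True},
        ]
    }
    print(count_failed_by_severity(sample))
```

To use it on a saved report, load audit.json with `json.load` and pass the resulting dictionary to the function.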

Dangerous Configuration Mistakes Detected

Here are some representative issues that tailsnitch can detect.

Critical: Leaving Default ACLs Unchecked

Issue: If the ACL policy lacks the acls field, Tailscale applies a default 'allow all' policy, granting all devices full access.

[CRITICAL] ACL-001: Default 'allow all' policy active

Impact:

Unlimited access from a developer's laptop to the production database
A single compromised device puts the entire tailnet at risk

Remediation:

Define minimal ACLs:

{
  "groups": {
    "group:engineering": ["alice@company.com"],
    "group:devops": ["charlie@company.com"]
  },
  "tagOwners": {
    "tag:dev": ["autogroup:admin"],
    "tag:prod": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["group:engineering"],
      "dst": ["tag:dev:443", "tag:dev:8080"]
    },
    {
      "action": "accept",
      "src": ["group:devops"],
      "dst": ["tag:prod:22", "tag:prod:443"]
    }
  ]
}

High: Reusable Authentication Keys Exist

Issue: If reusable authentication keys are leaked, attackers can add devices without restriction.

[HIGH] AUTH-001: Reusable auth keys exist
  Found 2 reusable auth key(s):
    - Key tskey-auth-xxx (expires in 45 days)

Impact:

Breach through an authentication key committed to a GitHub repository
Unauthorized device addition with keys stolen from a CI/CD pipeline

Remediation:

Delete existing reusable keys
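Switch to one-off keys, and use ephemeral keys for short-lived nodes such as CI runners or containers:

# "Reusable" and "ephemeral" are properties chosen when the key is generated
# (admin console or API); they are not flags of `tailscale up`.
# Nodes that join with an ephemeral key are removed automatically after they go offline.
tailscale up --auth-key=tskey-auth-xxx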

Use OAuth Client in CI/CD

High: Tailnet Lock Disabled

Issue: If Tailnet Lock is disabled, an attacker can add unauthorized devices if the Tailscale coordination server is compromised.

[HIGH] DEV-010: Tailnet Lock disabled

Impact:

Trust in the control plane is required
Risk of man-in-the-middle attacks by advanced attackers

Remediation:

Enable Tailnet Lock (requires a signing node):

# Initialize lock on a trusted node
tailscale lock init tlpub:<SIGNING_NODE_KEY>

# New devices will require signing
tailscale lock sign nodekey:<NEW_NODE_KEY>

Note: Tailnet Lock adds operational overhead because every new device must be signed, so it is best suited to defense industries or companies with strict compliance requirements.

Medium: Outdated Clients Detected

Issue: Older Tailscale clients may have known vulnerabilities.

[MEDIUM] DEV-003: Outdated clients detected
  Found 3 devices running Tailscale < 1.50.0

Remediation:

Enforce version checks with Device Posture:

{
  "postures": {
    "posture:baseline": [
      "node:tsVersion >= '1.50.0'"
    ]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["group:devops"],
      "srcPosture": ["posture:baseline"],
      "dst": ["tag:prod:22"]
    }
  ]
}
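Version thresholds like this have to be compared numerically, component by component. A plain string comparison would rank "1.9.0" above "1.50.0". The following Python sketch only illustrates that idea. It does not show how Tailscale evaluates posture rules, and the helper names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.58.2' (optionally with a '-suffix') into (1, 58, 2)."""
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(client: str, minimum: str = "1.50.0") -> bool:
    """Return True if the client version is at least the minimum."""
    return parse_version(client) >= parse_version(minimum)


if __name__ == "__main__":
    for v in ("1.9.0", "1.50.0", "1.100.1"):
        print(v, meets_minimum(v))
```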

Medium: Stale Devices Detected

Issue: Devices that have not connected for over 60 days may belong to former employees or simply be forgotten, and they remain a way into the tailnet if they are compromised.

[MEDIUM] DEV-004: Stale devices detected
  Found 5 devices not seen in 60+ days
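The 60-day rule itself is simple to express. The following is a minimal Python sketch, assuming each device is a dictionary with a `name` and an ISO 8601 `lastSeen` timestamp. Both field names and the sample data are assumptions for illustration, and this is not tailsnitch's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def stale_devices(devices, now, max_age_days=60):
    """Return the names of devices last seen more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for device in devices:
        # Accept a trailing 'Z' so timestamps parse on older Python versions too
        last_seen = datetime.fromisoformat(device["lastSeen"].replace("Z", "+00:00"))
        if last_seen < cutoff:
            stale.append(device["name"])
    return stale


if __name__ == "__main__":
    devices = [
        {"name": "old-laptop", "lastSeen": "2025-10-01T00:00:00Z"},
        {"name": "prod-server", "lastSeen": "2026-01-06T12:00:00Z"},
    ]
    print(stale_devices(devices, now=datetime(2026, 1, 7, tzinfo=timezone.utc)))
```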

Remediation:

Interactively delete with --fix mode:

tailsnitch --fix

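Or delete manually from the Machines page of the admin console, or through the Tailscale API:

# Delete one device by ID (authenticates with the API key from $TSKEY)
curl -X DELETE -u "$TSKEY:" "https://api.tailscale.com/api/v2/device/<device-id>"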

Interactive Remediation Mode (`--fix`)

Using the --fix flag allows you to interactively correct issues that can be fixed via the API.

tailsnitch --fix

Fixable items include:

Check	Remediation
AUTH-001, AUTH-002, AUTH-003	Delete authentication keys
AUTH-004	Replace with ephemeral keys
DEV-002	Remove tags from user devices
DEV-004	Delete stale devices
DEV-005	Approve unauthorized devices

For items requiring manual intervention, links to the management console will be displayed.

Dry Run (preview changes):

tailsnitch --fix --dry-run

SOC 2 Audit Trail Output

tailsnitch can output the necessary audit trails for SOC 2 in CSV/JSON format.

# CSV format
tailsnitch --soc2 csv > soc2-evidence.csv

# JSON format
tailsnitch --soc2 json > soc2-evidence.json

Example output (CSV):

resource_type,resource_id,resource_name,check_id,check_title,cc_codes,status,details,tested_at
device,node123,prod-server,DEV-001,Tagged devices with key expiry disabled,CC6.1;CC6.3,PASS,Tags: [tag:server] key expiry enabled,2025-01-05T10:30:00Z
key,tskey-auth-xxx,tskey-auth-xxx,AUTH-001,Reusable auth keys exist,CC6.1;CC6.2;CC6.3,FAIL,Reusable key expires in 45 days,2025-01-05T10:30:00Z

Each check is mapped to the following SOC 2 controls (CC):

CC6.1: Logical Access Controls
CC6.2: Granting Access Rights
CC6.3: Removing Access Rights
CC6.6: Network Segmentation
CC7.1: Detection of Security Events
CC7.2: Monitoring Security Incidents
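Because `cc_codes` packs several controls into one semicolon-separated column, a short script can summarize failures per control. The following is a minimal Python sketch that relies only on the column names shown in the CSV example above. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def failures_per_control(csv_text: str) -> dict:
    """Count FAIL rows for each SOC 2 control listed in the cc_codes column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] == "FAIL":
            for code in row["cc_codes"].split(";"):
                counts[code] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = (
        "resource_type,resource_id,resource_name,check_id,check_title,"
        "cc_codes,status,details,tested_at\n"
        "key,tskey-auth-xxx,tskey-auth-xxx,AUTH-001,Reusable auth keys exist,"
        "CC6.1;CC6.2;CC6.3,FAIL,Reusable key expires in 45 days,2025-01-05T10:30:00Z\n"
    )
    print(failures_per_control(sample))
```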

Automated Auditing in CI/CD Pipeline

Example of running automatic checks during ACL changes with GitHub Actions:

# .github/workflows/tailscale-acl.yml
name: Tailscale ACL CI
on:
  pull_request:
    paths: ['policy.hujson']
  push:
    branches: [main]
    paths: ['policy.hujson']

jobs:
  test-acl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tailsnitch
        env:
          TS_OAUTH_CLIENT_ID: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          TS_OAUTH_CLIENT_SECRET: ${{ secrets.TS_OAUTH_CLIENT_SECRET }}
        run: |
          curl -L https://github.com/Adversis/tailsnitch/releases/latest/download/tailsnitch-linux-amd64 -o tailsnitch
          chmod +x tailsnitch
          ./tailsnitch --severity high --json > audit.json

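      - name: Fail on critical issues
        # env is scoped per step, so the credentials must be repeated here
        env:
          TS_OAUTH_CLIENT_ID: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          TS_OAUTH_CLIENT_SECRET: ${{ secrets.TS_OAUTH_CLIENT_SECRET }}
        run: |
          # Reuse the report saved by the previous step instead of auditing again
          if jq -e '.summary.critical + .summary.high > 0' audit.json > /dev/null; then
            echo "Critical or high severity issues found!"
            ./tailsnitch --severity high
            exit 1
          fi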

With this configuration, security checks will automatically run on PRs for ACL changes, blocking merges if Critical/High issues are found.

Preventing Configuration Mistakes with ACL Tests

While tailsnitch is a detection tool, adding a tests field within your ACLs can help prevent configuration mistakes from occurring in the first place.

{
  "acls": [
    {
      "action": "accept",
      "src": ["group:engineering"],
      "dst": ["tag:dev:443"]
    }
  ],
  "tests": [
    {
      "src": "group:engineering",
      // Tests expect a single concrete port per destination rather than a wildcard
      "deny": ["tag:prod:22", "tag:prod-db:5432"]
    },
    {
      "src": "group:devops",
      "accept": ["tag:bastion:22"]
    }
  ]
}

If tests fail, ACL changes will be rejected. This helps prevent configuration mistakes before they are detected by tailsnitch.
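Conceptually, each test asserts that a source can or cannot reach a destination under the current rules. The following Python sketch models that with exact string matching only. It is a deliberate simplification: the real evaluation by Tailscale also handles group membership, wildcards, and more. The `Rule` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: frozenset  # sources allowed by this accept rule
    dst: frozenset  # "host:port" destinations allowed by this rule


def allowed(rules, src, dst):
    """True if any accept rule lists both this source and this destination."""
    return any(src in rule.src and dst in rule.dst for rule in rules)


def run_tests(rules, tests):
    """Return a list of human-readable failures; empty means all tests pass."""
    failures = []
    for test in tests:
        for dst in test.get("accept", []):
            if not allowed(rules, test["src"], dst):
                failures.append(f"{test['src']} should reach {dst}")
        for dst in test.get("deny", []):
            if allowed(rules, test["src"], dst):
                failures.append(f"{test['src']} must not reach {dst}")
    return failures


if __name__ == "__main__":
    rules = [Rule(frozenset({"group:engineering"}), frozenset({"tag:dev:443"}))]
    tests = [
        {"src": "group:engineering", "deny": ["tag:prod-db:5432"]},
        {"src": "group:devops", "accept": ["tag:bastion:22"]},
    ]
    for failure in run_tests(rules, tests):
        print("FAIL:", failure)
```

In this model the second test fails because no rule grants group:devops access to tag:bastion:22. A failure like that is the kind of mismatch that causes a policy change to be rejected.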

Common Pitfalls

The Danger of `autogroup:member`

autogroup:member includes every user who is a direct member of the tailnet. Since that also covers externally invited users, it can unintentionally grant access rights.

Bad Example:

{
  "acls": [
    {
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["tag:staging:*"]
    }
  ]
}

Good Example:

{
  "acls": [
    {
      "action": "accept",
      "src": ["group:engineering"],
      "dst": ["tag:staging:443"]
    }
  ]
}

Overreliance on Subnet Routers

While Subnet Routers are convenient, if compromised, they can provide access to a wide range of networks.

Mitigation:

Enable Stateful Filtering:

tailscale up --advertise-routes=10.0.0.0/24 --stateful-filtering

Protect the Subnet Router itself with security groups or NACLs.

The Trap of SSH `autogroup:nonroot`

autogroup:nonroot allows SSH access for all users except root, but it also includes users with sudo privileges.

Bad Example:

{
  "ssh": [
    {
      "action": "accept",
      "src": ["group:engineering"],
      "dst": ["tag:prod"],
      "users": ["autogroup:nonroot"]
    }
  ]
}

Good Example:

{
  "ssh": [
    {
      "action": "accept",
      "src": ["group:devops"],
      "dst": ["tag:prod"],
      "users": ["deploy"]
    }
  ]
}

Regular Audit Operations

Here are operational guidelines for continuously utilizing tailsnitch.

Weekly

[ ] Run tailsnitch to check for Critical/High issues
[ ] Review the device approval queue
[ ] Remove unused authentication keys

Monthly

[ ] Cross-check group memberships with the employee roster
[ ] Ensure devices of former employees have been removed
[ ] Review ACL change history

Quarterly

[ ] Audit all access rights (who can access what)
[ ] Review third-party access
[ ] Reassess Subnet Router configurations

Upon Employee Departure (Immediately)

[ ] Remove from Tailscale groups
[ ] Delete user from tailnet
[ ] Remove created authentication keys
[ ] Delete devices
[ ] Audit recent ACL changes

Conclusion

While Tailscale addresses many weaknesses of traditional VPNs at the design level, risks from configuration mistakes still exist. By using tailsnitch, you can visualize "vague anxiety" as concrete issues and prioritize addressing them.

Setup takes just five minutes, and integrating it into your CI/CD pipeline allows for automatic checks with every ACL change. You can also output audit trails for SOC 2 compliance.

Personally, I believe that every organization using Tailscale should run tailsnitch at least once a month. Keep working through Critical/High issues until none remain, and add ACL tests to prevent recurrence. Applying these two practices consistently can significantly reduce the risk of a large-scale incident caused by a VPN breach.

If you're interested, try running tailsnitch on your own tailnet. You might discover unexpected configuration mistakes.


jj-desc: Release of the Rust-based jj Commit Message Generation Tool

tumf — Thu, 05 Feb 2026 12:39:19 +0000

Originally published on 2026-01-08
Original article (Japanese): jj-desc: Rust製のjjコミットメッセージ自動生成ツールをリリース

We have released a CLI tool called jj-desc that automatically generates commit messages for Jujutsu (jj) using LLMs.

jj is a Git-compatible version control tool developed largely at Google, known for its powerful undo functionality and flexible commit selection through revsets (revision sets). jj-desc generates commit messages in the Conventional Commits format. A significant feature of jj-desc is its ability to generate multiple commit messages in bulk, leveraging the unique characteristics of jj.

Main Features of jj-desc

Support for Multiple LLM Providers

jj-desc supports the following LLM providers:

OpenRouter (default)
OpenAI
Anthropic
Google Gemini
Custom endpoints (Azure OpenAI, Ollama, etc.)

You can easily switch between them using environment variables or CLI options:

export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
jj-desc

Backfill Functionality Utilizing revset

The greatest strength of jj-desc is its backfill functionality, which allows for bulk generation of commit messages by utilizing revset.

Revset is jj's unique query language that allows for flexible specification of a set of commits (revisions). It provides powerful features not found in Git, enabling you to freely select targets by combining set operations (&, |, ~) and condition filters.

# Current commit
jj-desc -r @

# Process all of my commits in bulk
jj-desc -r "mine()"

# Commits between the main branch and the working copy (ancestors of @ that are not on main)
jj-desc -r "main..@"

# Mutable commits without descriptions (default)
jj-desc -r "::@ & mutable()"

# Process 5 commits without descriptions in bulk
jj-desc -r "::@ & mutable()" -n 5
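In a revset, `x..y` means "ancestors of y that are not ancestors of x", so the order of the operands matters. The following Python sketch models that on a toy history to show why `main..@` selects the work on top of main while the reversed form selects nothing. The graph and function names are hypothetical, and this is only a model of the semantics, not jj's implementation.

```python
def ancestors(parents, rev):
    """All ancestors of rev, including rev itself (like ::rev)."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def dotdot(parents, x, y):
    """x..y: ancestors of y that are not ancestors of x."""
    return ancestors(parents, y) - ancestors(parents, x)


if __name__ == "__main__":
    # Toy history: root <- main <- a <- working_copy
    parents = {"root": [], "main": ["root"], "a": ["main"], "working_copy": ["a"]}
    print(sorted(dotdot(parents, "main", "working_copy")))
    print(sorted(dotdot(parents, "working_copy", "main")))
```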

Why is backfill important?

During development, it's common to "commit for now" and want to organize messages later. In Git, rewriting commit messages with git rebase -i can be cumbersome. Tools like aicommits primarily generate messages for "the content being committed now."

jj-desc is different. By combining jj's editable commit history with revset, it allows for bulk message generation for past commits. You can focus on your work, accumulate commits, and then simply run jj-desc when you’re ready. This is the true value of jj-desc.

The following demo shows jj-desc being executed on multiple commits without descriptions, generating messages in bulk:

Immediate Application Without Confirmation Prompts

jj-desc applies the generated commit messages immediately without confirmation. While this may seem bold, it is designed with jj's powerful undo functionality (jj undo, jj op log) in mind.

In Git tools, it's common to ask, "Do you really want to apply this?" However, in the jj ecosystem, the natural workflow is "try it, and if it doesn't work, undo." All operations are recorded in history and can be easily reverted, making confirmation prompts an unnecessary friction.

Of course, if you prefer to confirm carefully, options like --dry-run (preview) and -i (interactive mode) are also available:

# Preview only (do not apply)
jj-desc --dry-run

# Apply while confirming one by one
jj-desc -i -r "mine()"

Diff Optimization and Token Savings

Before sending diffs to the LLM, the following optimizations are automatically performed:

Automatic exclusion of lock files (Cargo.lock, package-lock.json, etc.)
Simplification of binary files (Binary file {path} changed)
User-specified exclusion patterns (using the --exclude option)
Warning for diffs exceeding 50KB

This reduces the cost of LLM API calls and helps keep the request within the model's context limit.

# Exclude specific files
jj-desc --exclude "*.json" --exclude "*.yaml"

# Shortened form
jj-desc -x "docs/*" -x "*.lock"
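The filtering steps listed above can be sketched in a few lines. This is a hedged Python illustration of the idea, not jj-desc's actual code (jj-desc is written in Rust). The input shape of (path, diff text, is_binary) tuples, the function name, and the use of `fnmatch` for patterns are all assumptions.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

LOCK_FILES = {"Cargo.lock", "package-lock.json"}
SIZE_WARN_BYTES = 50 * 1024  # warn above 50KB


def optimize_diff(files, exclude=()):
    """Drop lock files and excluded paths, shorten binaries, flag large diffs.

    files: iterable of (path, diff_text, is_binary) tuples.
    Returns (combined_diff, too_large).
    """
    parts = []
    for path, text, is_binary in files:
        name = PurePosixPath(path).name
        if name in LOCK_FILES:
            continue
        if any(fnmatch(path, pat) or fnmatch(name, pat) for pat in exclude):
            continue
        parts.append(f"Binary file {path} changed" if is_binary else text)
    diff = "\n".join(parts)
    return diff, len(diff.encode("utf-8")) > SIZE_WARN_BYTES


if __name__ == "__main__":
    files = [
        ("Cargo.lock", "+lots of lines", False),
        ("src/main.rs", "+fn main() {}", False),
        ("logo.png", "", True),
        ("docs/a.md", "+doc", False),
    ]
    diff, too_large = optimize_diff(files, exclude=["docs/*"])
    print(diff)
    print("too large:", too_large)
```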

Automatic Detection of Merge Commits

In jj, many merge commits are treated as "empty" (see jj FAQ). jj-desc automatically detects merge commits and sets appropriate descriptions ("Merge commit") without needing an LLM API call.

Installation

Homebrew (Recommended)

brew install tumf/tap/jj-desc

Prebuilt Binaries

You can obtain binaries for each platform from the release page:

# macOS (Apple Silicon)
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/tumf/jj-desc/releases/latest/download/jj-desc-installer.sh | sh

# Linux (x86_64)
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/tumf/jj-desc/releases/latest/download/jj-desc-installer.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -c "irm https://github.com/tumf/jj-desc/releases/latest/download/jj-desc-installer.ps1 | iex"

Install with Cargo

cargo install --git https://github.com/tumf/jj-desc

Basic Usage

Setting Up LLM Provider

First, set the API key for the LLM provider you wish to use:

# OpenRouter (default)
export OPENROUTER_API_KEY="your-api-key"

# Or OpenAI
export LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."

# Or Anthropic
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Ollama (local LLM)
export LLM_PROVIDER=openai
export OPENAI_API_KEY="dummy"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export LLM_MODEL="llama2"

Generating Commit Messages

# Default: Process all mutable commits without descriptions
jj-desc

# Only the current commit
jj-desc -r @

# All of my commits
jj-desc -r "mine()"

# Preview
jj-desc --dry-run

# Interactive mode
jj-desc -i

Differences from aicommit2

As an AI commit message generation tool for jj, aicommit2 has been around longer. aicommit2 is a TypeScript-based general tool that supports Git, YADM, and jj.

The main difference between the two is the presence of the backfill functionality:

Feature	jj-desc	aicommit2
Backfill (bulk processing of past commits)	✅ Freely specified with revset	❌ Current commits only
Supported VCS	jj only	Git, YADM, jj
Implementation Language	Rust	TypeScript
Confirmation Prompt	None (based on undo)	Yes
Diff Optimization	✅ Automatic filtering	-

aicommit2 is geared towards a "committing now" workflow. jj-desc is designed for a workflow where you "commit in bulk and organize messages later."

If you want to leverage jj's features to the fullest, use jj-desc; if you want to integrate with existing Git workflows, aicommit2 is recommended.

Conclusion

jj-desc's standout feature is its backfill functionality utilizing jj's revset. Focus on your work, accumulate commits, and generate messages later—this workflow is unique to jj and cannot be achieved with Git tools.

If you're a jj user, be sure to give it a try. We welcome feedback and feature suggestions on our GitHub repository.

Reference Links

jj-desc GitHub Repository
Jujutsu (jj) Official Documentation
aicommit2 - AI commit tool supporting Git/YADM/jj
OpenRouter
Anthropic
Google AI Studio

cron, anacron, and systemd timer: Guidelines for Choosing Between Three Linux Task Schedulers

tumf — Thu, 05 Feb 2026 12:38:09 +0000

Originally published on 2026-02-04
Original article (Japanese): cron・anacron・systemd timer: Linuxタスクスケジューラ3種の使い分け基準

"I want to run backups regularly" or "I want to rotate logs at midnight"—there are three methods to automate tasks in Linux: cron, anacron, and systemd timer. However, many people may wonder, "Which one should I use?"

In this article, we will clarify the characteristics of these three schedulers and provide criteria for deciding which one to choose.

Basic Characteristics of the Three Schedulers

First, let's outline the basic features of each.

cron: A Scheduler with Strict Time Specifications

cron is one of the oldest task schedulers used in UNIX-like operating systems.

Main Features:

Strict Time Specification: Executes jobs at specific times, such as "every day at 2 AM" or "every Monday at 9 AM."
Minute-Level Precision: The smallest unit is 1 minute.
Designed for Continuous Operation: Assumes the system is running 24/7.
Simple Configuration: Can be intuitively set up in crontab format.

Configuration Example:

# Backup every day at 2 AM
0 2 * * * /usr/local/bin/backup.sh

# Generate report every Monday at 9 AM
0 9 * * 1 /usr/local/bin/generate-report.sh
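To make the five-field format concrete, here is a simplified Python matcher that checks whether a schedule fires at a given minute. It is a sketch under stated assumptions: it supports only `*`, `*/n`, single numbers, and comma lists. Ranges, month and weekday names, `7` for Sunday, and cron's special handling when both day-of-month and day-of-week are restricted are left out. The function names are hypothetical.

```python
from datetime import datetime


def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Match one cron field against a value (supports *, */n, n, and a,b,c)."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five schedule fields match the given minute."""
    minute, hour, dom, month, dow = expr.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, cron_dow, 0)
    )


if __name__ == "__main__":
    monday_9am = datetime(2026, 2, 2, 9, 0)  # 2026-02-02 is a Monday
    print(cron_matches("0 9 * * 1", monday_9am))
    print(cron_matches("0 2 * * *", monday_9am))
```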

Suitable Use Cases:

Machines that are always running, such as servers.
Jobs that require strict time specifications (e.g., maintenance outside of business hours).
Detailed schedules at minute intervals (e.g., monitoring every 5 minutes).

Constraints:

Jobs scheduled while the system is down are skipped and are not run later.
The next job can start even if the previous one has not finished (risk of overlapping execution).
Checking logs can be cumbersome (need to look in /var/log/syslog or /var/log/cron).

anacron: A Daily Scheduler That Doesn't Miss Jobs When Powered Off

anacron is designed for machines that are not always running.

Main Features:

Frequency-Based Execution: Specifies execution based on frequency, such as "once a day" or "once every 7 days."
Compensation for Missed Jobs: When the system starts, jobs that were missed while it was off are executed automatically.
Minimum Unit is a Day: Minute-level specifications are not possible.
Not a Daemon: Typically operates by being called periodically from cron or systemd.

Configuration Example (/etc/anacrontab):

# period  delay  job-id  command
1         5      cron.daily    run-parts /etc/cron.daily
7         10     cron.weekly   run-parts /etc/cron.weekly
@monthly  15     cron.monthly  run-parts /etc/cron.monthly

How It Works:

anacron records the date each job was last executed in /var/spool/anacron/.
At system startup (or periodically from cron), it checks for missed jobs.
Executes the jobs after the specified delay time.
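The check in the steps above boils down to comparing the recorded date with the job's period. The following is a minimal Python sketch of that decision, assuming a numeric period in days (named periods such as @monthly are not handled). The function name is hypothetical, and this is an illustration rather than anacron's actual code.

```python
from datetime import date
from typing import Optional


def is_due(last_run: Optional[date], period_days: int, today: date) -> bool:
    """Return True if the job has not run within the last period_days days."""
    if last_run is None:
        # No timestamp recorded yet, so the job has never run
        return True
    return (today - last_run).days >= period_days


if __name__ == "__main__":
    today = date(2026, 2, 4)
    print(is_due(date(2026, 2, 1), 1, today))  # daily job, laptop was off for days
    print(is_due(date(2026, 2, 1), 7, today))  # weekly job, ran three days ago
```

When a job is due, anacron waits for the delay given in the second column of /etc/anacrontab before starting it.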

Suitable Use Cases:

Laptops or desktops that have periods of being powered off.
Daily, weekly, or monthly maintenance tasks.
Jobs that do not require strict time specifications (e.g., backups, package updates).

Constraints:

Minute-level detailed scheduling is not possible.
Cannot specify strict times (only delay time after startup can be specified).
Standard settings often require root operation (when using /etc/anacrontab).

systemd timer: A Modern Integrated Scheduler

systemd timer is a task scheduler provided as part of systemd.

Main Features:

Flexible Schedule Specification: Allows for various specifications, including strict times, relative times, and elapsed time after startup.
Compensation for Missed Jobs: Can operate similarly to anacron with Persistent=true.
Integration with systemd: Unified handling of service management, logging, and dependencies.
Random Delay: Can randomize execution times with RandomizedDelaySec.

Configuration Example:

Timer file (backup.timer):

[Unit]
Description=Daily backup timer

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=1h

[Install]
WantedBy=timers.target

Service file (backup.service):

[Unit]
Description=Backup service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

Suitable Use Cases:

Systems that adopt systemd.
Complex scheduling requirements (e.g., executing 24 hours after the last run, or 15 minutes after startup).
When there are dependencies between services.
When centralized logging is necessary.

Constraints:

Not usable on systems that do not use systemd.
Requires two configuration files (.timer and .service), resulting in more verbosity compared to cron.
Slightly higher learning curve.

Comparison Table of the Three Schedulers

Item	cron	anacron	systemd timer
Minimum Unit	1 minute	1 day	Less than a second (default AccuracySec is 1 minute)
Time Specification	✅ Strict	❌ Not possible	✅ Strict
Compensation for Missed Jobs	❌ None	✅ Yes	✅ Yes (Persistent=true)
Prevention of Overlapping Execution	❌ None	✅ Yes	✅ Yes
Log Checking	syslog	syslog	`journalctl -u <unit>`
General User	✅ Possible	△ Possible (-S spooldir)	✅ Possible
Simplicity of Configuration	✅ 1 file	✅ 1 file	❌ 2 files
Dependency Management	❌ None	❌ None	✅ Yes
Random Delay	❌ None	△ delay (not random)	✅ Yes (RandomizedDelaySec)

Guidelines for Choosing Between Them

Case 1: Strict Time Specification Needed on a Server

Conclusion: Use cron

Maintenance outside of business hours (e.g., every day at 2 AM).
Regular report generation (e.g., every Monday at 9 AM).
High-frequency monitoring (e.g., every 5 minutes).

Reason:

Simple and low learning cost.
Allows for strict time specifications.
Supports detailed schedules at minute intervals.

Case 2: Daily Backup on a Laptop

Conclusion: Use anacron or systemd timer

Daily tasks on machines that have powered-off periods.
Maintenance that does not require strict time specifications.

Choose anacron if:

You want to keep the configuration simple.
You want to leverage existing /etc/cron.daily setups.

Choose systemd timer if:

You want centralized logging with journalctl.
There are dependencies between services.
You want to distribute load with random delays.

Case 3: Complex Scheduling Requirements

Conclusion: Use systemd timer

Execute 24 hours after the last run.
Execute 15 minutes after system startup.
Execute after a specific service has started.

Reason:

Supports various triggers like OnBootSec, OnUnitActiveSec.
Manages service dependencies with After=, Requires=.

For details on the format of OnCalendar and timer precision (AccuracySec), refer to systemd.time(7) and systemd.timer(5).

Case 4: Improving Existing cron Jobs

Conclusion: Migrate to systemd timer

Benefits of Migration:

Logs can be easily checked with journalctl.
Prevents overlapping execution if the previous job has not finished.
Allows for load distribution with random delays.
Manual execution is straightforward (systemctl start service.service).

Example Migration Steps:

Check crontab entries.

crontab -l
# 0 2 * * * /usr/local/bin/backup.sh

Create systemd timer and service.

# ~/.config/systemd/user/backup.timer
[Unit]
Description=Daily backup timer

[Timer]
OnCalendar=02:00
Persistent=true

[Install]
WantedBy=timers.target

# ~/.config/systemd/user/backup.service
[Unit]
Description=Backup service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

Enable the timer.

systemctl --user daemon-reload
systemctl --user enable --now backup.timer

Confirm operation, then remove the original entry with crontab -e so the job does not run twice.

# Check the list of timers
systemctl --user list-timers

# Check logs
journalctl --user -u backup.service

Practical Example: Automating Backups on a Laptop

Here, we will introduce an example of executing daily backups on a laptop that has powered-off periods.

Using anacron

Create a backup script.

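#!/bin/bash
# /usr/local/bin/backup.sh
# Copy Documents into a dated directory on the backup volume.
set -euo pipefail

BACKUP_DIR="/mnt/backup"
DATE=$(date +%Y%m%d)

# Create the dated target directory if it does not exist yet
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av --delete /home/user/Documents/ "$BACKUP_DIR/$DATE/"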

Place it in /etc/cron.daily/.

sudo cp /usr/local/bin/backup.sh /etc/cron.daily/backup
sudo chmod +x /etc/cron.daily/backup

anacron will execute it automatically.

On Ubuntu/Debian, anacron is set up by default to run /etc/cron.daily/ daily. Even if the machine is powered off, missed jobs will be compensated during the next startup.

Using systemd timer

Create the backup script (same as above).
Create the service file.

# ~/.config/systemd/user/backup.service
[Unit]
Description=Daily backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

Create the timer file.

# ~/.config/systemd/user/backup.timer
[Unit]
Description=Daily backup timer

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=30min

[Install]
WantedBy=timers.target

Enable the timer.

systemctl --user daemon-reload
systemctl --user enable --now backup.timer

Confirm operation.

# Check the next execution time
systemctl --user list-timers backup.timer

# Manually execute for testing
systemctl --user start backup.service

# Check logs
journalctl --user -u backup.service

Advantages of systemd timer:

Logs can be easily checked with journalctl.
Execution times can be randomized with RandomizedDelaySec for load distribution.
Persistent=true compensates for missed jobs.
Manual execution is straightforward (systemctl --user start backup.service).

Frequently Asked Questions

Q1: Is anacron still in use?

A: Yes. It is installed by default on desktop distributions such as Ubuntu and Debian, where it runs /etc/cron.daily and similar directories. That said, the trend is toward systemd timer, and new projects increasingly choose it.

Q2: Should I migrate from cron to systemd timer?

A: It is worth considering migration if:

You find checking logs cumbersome.
Runs overlap because the next job starts before the previous one has finished.
You want to manage dependencies between services.
You frequently perform manual executions.

On the other hand, if your jobs are simple and sufficient with cron, there is no need to force a migration.
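
If you stay on cron and overlapping runs are the only concern, the job can guard itself. The following is a minimal sketch, assuming a lock directory is an acceptable guard; run_exclusive is a hypothetical helper name, and the exit code 75 is an arbitrary choice. It relies on mkdir being atomic, and it does not clean up a stale lock if the job is killed partway through.

```bash
# Hypothetical helper: run a command only if no other run holds the lock.
# Usage: run_exclusive LOCKDIR COMMAND [ARGS...]
run_exclusive() {
  lockdir=$1
  shift
  # mkdir is atomic: it succeeds for exactly one caller at a time.
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "skipped: another run still holds $lockdir" >&2
    return 75
  fi
  # Run the job and remember its exit status.
  status=0
  "$@" || status=$?
  # Release the lock whether the job succeeded or failed.
  rmdir "$lockdir"
  return "$status"
}
```

A crontab entry could then call a small wrapper script that runs run_exclusive /tmp/backup.lock /usr/local/bin/backup.sh, where the lock path is only an example.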

Q3: Is the learning curve for systemd timer steep?

A: Compared to cron, it needs two configuration files per job, so the setup is more verbose. In return, the systemctl command lists timers and journalctl shows their logs, which often makes day-to-day operation easier.

Q4: Can regular (non-root) users use systemd timer?

A: Yes. Place the unit files in ~/.config/systemd/user/ and manage them with systemctl --user. To keep the timer running after you log out, also run loginctl enable-linger username (see loginctl(1)).

Conclusion

It is important to choose the appropriate Linux task scheduler based on your needs.

Guidelines for Selection:

Strict time specification on servers: cron
Daily tasks on laptops: anacron or systemd timer
Complex schedules: systemd timer
Improving existing cron jobs: migrate to systemd timer

Personally, I believe that on systems using systemd, new jobs should be created with systemd timer, and existing cron jobs should gradually be migrated. The centralized management of logs and prevention of overlapping execution can significantly reduce operational overhead.

If you're interested, start by trying out systemd timer with a simple job.

Git Hooks Completed with a Single Binary: Migration Notes from pre-commit to prek

tumf — Thu, 05 Feb 2026 12:36:34 +0000

Originally published on 2026-02-05
Original article (Japanese): シングルバイナリで完結するGit hooks: pre-commitからprekへの移行メモ

Have you ever thought, "Why do I need Python when I'm not working on a Python project?" while using pre-commit?

prek is a Git hooks management tool that reimplements pre-commit in Rust. It allows you to use your existing .pre-commit-config.yaml without needing a Python runtime, and it operates as a single binary. In this article, I will discuss the features of prek, the migration process, and the importance of "portability" over "speed."

Fundamental Issues with pre-commit

While pre-commit is an excellent tool, it has several challenges:

Python Required - Even for Go/Rust/TypeScript projects, a Python environment is necessary.
Virtualenv Management - There is a potential conflict with the project's Python version.
Complexity in CI/CD - Steps like setup-python + pip install pre-commit are required.
Onboarding Barriers - New members must start by setting up a Python environment.

This situation often arises in non-Python projects, where you end up needing Python just for Git hooks.

Features of prek

prek (MIT License) addresses these challenges:

1. Single Binary, No Dependencies

# macOS/Linux
curl -LsSf https://github.com/j178/prek/releases/latest/download/prek-installer.sh | sh

# Windows
powershell -c "irm https://github.com/j178/prek/releases/latest/download/prek-installer.ps1 | iex"

No Python runtime is required. It operates with just one binary.

2. Fully Compatible with pre-commit

You can use your existing .pre-commit-config.yaml as is:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

Migration is as simple as pre-commit uninstall && prek install.

3. Speed Improvements

Official benchmarks (Apache Airflow):

Operation	pre-commit	prek	Ratio
Initial Installation	187 seconds	18 seconds	10.2x
Hook Execution	352 ms	77 ms	4.6x
Disk Usage	1.6 GB	810 MB	Half

However, this is a "bonus." The essential value will be explained in the next section.

The Value of Portability

The true value of Rust tools lies not in "speed" but in "portability."

Simplified CI/CD

For pre-commit:

# .github/workflows/lint.yml
- uses: actions/setup-python@v5
  with:
    python-version: '3.11'
- run: pip install pre-commit
- run: pre-commit run --all-files

For prek:

# .github/workflows/lint.yml
- uses: j178/prek-action@v1

The setup steps become unnecessary.

Reduced Dependencies in Development Environments

Dependencies for pre-commit:

Project
├── Python 3.11 (for the app)
└── Python 3.9 (for pre-commit, virtualenv)
    └── pre-commit
        └── Dependencies for each hook

Dependencies for prek:

Project
└── prek (single binary)
    └── Dependencies for each hook

The risk of Python version conflicts is eliminated.

Cross-Platform Compatibility

prek is available for Windows, macOS, and Linux, and the commands are the same on each:

# Same command on any OS
prek install
prek run --all-files

This smooths over differences between development environments.

Migration Steps

1. Install prek

# Homebrew
brew install prek

# uv
uv tool install prek

# cargo
cargo install prek

2. Uninstall the Existing pre-commit Hooks

pre-commit uninstall

3. Install the prek Git Hooks

prek install

You can use your .pre-commit-config.yaml as is.

4. Verify Functionality

# Display list of hooks
prek list

# Run on all files
prek run --all-files

# Run specific hooks only
prek run trailing-whitespace end-of-file-fixer
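
During a gradual migration, some machines may have only pre-commit installed while others already have prek. Because both tools read the same .pre-commit-config.yaml and both accept run --all-files, a shared script can pick whichever is available. The following is a minimal sketch; pick_hook_runner is a hypothetical helper name, not part of either tool.

```bash
# Hypothetical helper: print the name of the hook runner found on PATH,
# preferring prek and falling back to pre-commit.
pick_hook_runner() {
  if command -v prek >/dev/null 2>&1; then
    echo prek
  elif command -v pre-commit >/dev/null 2>&1; then
    echo pre-commit
  else
    echo "neither prek nor pre-commit found on PATH" >&2
    return 1
  fi
}
```

A project script could then run: runner=$(pick_hook_runner) && "$runner" run --all-files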

5. Update GitHub Actions

# Before
- uses: pre-commit/action@v3.0.1

# After
- uses: j178/prek-action@v1

Additional Features of prek

Convenient features not available in pre-commit:

List Hooks

$ prek list
.:trailing-whitespace
.:end-of-file-fixer
.:check-yaml
.:check-toml
.:ruff
.:ruff-format
.:mypy

Run on Specific Directories

# Only in the src directory
prek run --directory src

# Multiple directories
prek run --directory src --directory tests

Run Only on Files Changed in the Last Commit

prek run --last-commit

Monorepo Support

You can place a .pre-commit-config.yaml for each subproject:

monorepo/
├── .pre-commit-config.yaml  # Root
├── frontend/
│   └── .pre-commit-config.yaml
└── backend/
    └── .pre-commit-config.yaml

# Run for all projects
prek run

# Run for specific projects only
prek run frontend backend
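
Before relying on the monorepo layout, it can help to see which directories actually carry their own configuration file. The following is a minimal sketch using find; list_hook_projects is a hypothetical helper name. It only lists candidate directories and makes no claim about how prek itself discovers projects.

```bash
# Hypothetical helper: print every directory under $1 (default: current
# directory) that contains its own .pre-commit-config.yaml.
list_hook_projects() {
  # Skip .git, then strip the file name so only the directory remains.
  find "${1:-.}" -name .git -prune -o \
    -type f -name .pre-commit-config.yaml -print |
    sed 's|/\.pre-commit-config\.yaml$||' |
    LC_ALL=C sort
}
```

Run from the root of the layout shown above, this should list ., ./backend, and ./frontend.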

Points to Note

1. Tool in Development

Currently at v0.3.1. Some languages and subcommands are not yet implemented.

It is recommended to verify the functionality of the hooks you plan to use before production deployment.

2. Unique Features of pre-commit

Some unique features of pre-commit may not be supported. Please check the contents of your .pre-commit-config.yaml before migrating.
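
A quick inventory of the hooks in use makes that check easier. The following is a minimal sketch, assuming each hook is declared on a line of the form "- id: name" as in the configuration shown earlier. list_hook_ids is a hypothetical helper name, and the line-based match is deliberately naive: it is not a YAML parser and will miss hooks written in other layouts.

```bash
# Hypothetical helper: print the hook ids declared in a pre-commit config.
# Matches only lines of the form "- id: <name>"; not a full YAML parser.
list_hook_ids() {
  awk '$1 == "-" && $2 == "id:" { print $3 }' "${1:-.pre-commit-config.yaml}"
}
```

Each printed id can then be tried individually with prek run followed by the id, which shows quickly whether a particular hook is affected.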

Comparison with Other Tools

Tool	Language	pre-commit Compatible	Dependencies
prek	Rust	✅ Fully	None
lefthook	Go	❌ Custom Settings	None
hk	Rust	❌ Custom Settings	None
Husky	Node.js	❌ Custom Settings	Node.js
pre-commit	Python	-	Python

Importance of Configuration Compatibility:

While lefthook and hk are also fast, they require rewriting the existing .pre-commit-config.yaml. Since prek can use the configuration file as is, the migration cost is zero.

Adoption Records

Notable OSS projects that have adopted it:

Conclusion

The intrinsic value of prek lies not in "speed" but in "portability":

Reduction of Dependencies - No Python runtime required.
Simplification of CI/CD - Fewer setup steps.
Unification of Development Environments - Cross-platform support.
Zero Migration Cost - Existing configurations can be used as is.

It is particularly recommended for non-Python projects or for teams looking to simplify their development environments.

If you're interested, please give it a try.
