Forem: guanjiawei

A Token Is Not a Thing

guanjiawei — Tue, 26 May 2026 04:48:36 +0000

Lately, "token economy" is hot. Every business model in AI will eventually converge on one unit of account: the token. I buy that thesis. But one premise keeps getting skipped—a token is not a standardized commodity.

Water has standard units. Electricity has standard units. Money, obviously. Tokens don't. They're more like gasoline: 92, 95, and 98 octane are different fuels, priced differently, for different engines. Adding them up by the liter and reporting one number means nothing.

Most contradictions in AI today come down to this.

I. Intelligence Has Tiers

Roughly four.

Top tier. Overseas: OpenAI GPT-5.5, Anthropic Claude Opus 4.7. China: Zhipu GLM-5.1, Moonshot Kimi K2.6, DeepSeek V4-Pro. Xiaomi MiMo-V2.5-Pro is a bit controversial, but usage and data are climbing, so I'll count it. These range from hundreds of billions to over a trillion parameters. Demand is almost unlimited; willingness to pay is fierce. Prices rise, quotas tighten, prices rise again—users keep pouring in. Zhipu's 2025 annual report showed GLM Coding Plan token calls up 15× in six months, with paying developers past 240,000. That's the real demand curve for top-tier tokens.

Mid-tier. This is the awkward gap. MiniMax M2.7, DeepSeek V4-Flash, Xiaomi MiMo-V2.5 standard—these are about it. Moderate size, an order of magnitude cheaper, theoretically the best value. But almost no one is seriously building here. I'll explain why later.

Low-mid. Mostly open source. Alibaba Qwen 3.6 leads, with both 35B-A3B (MoE) and 27B dense versions open. Google Gemma 4 is here too, from E2B to 31B.

On-device. A few billion parameters, or even sub-billion, fitting in a phone or a consumer GPU.

The first imbalance is right here. Top tier is a bloodbath. Mid-tier is empty. Low-mid and on-device are noisy but lack clear scenarios.

II. Speed Is Another Dimension

Tiers are only half the token story.

The other half is speed. GPT-5.5 at 30 TPS versus 200 TPS is a completely different experience.

Here are the 2026 numbers from Artificial Analysis, a commonly cited benchmark:

Tier	Model	Output TPS
Flagship Standard	GPT-5.5 (high)	~68
Flagship Standard	Claude Opus 4.7	~48
Flagship Standard	DeepSeek V4-Pro	~48
Flagship Standard	Kimi K2.6	~33
High Speed	DeepSeek V4-Flash	~126
High Speed	Gemini 3.5 Flash	~203
Ultra High Speed	GLM-5.1 High-Speed Edition	400 (official)
Ultra High Speed	Cerebras running Kimi K2.6	981
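
To make the table concrete, here is a minimal sketch that converts output speed into wall-clock time for a fixed-length answer. The 2,000-token response length is an assumption chosen for illustration, the TPS figures are the ones listed above, and time to first token is ignored.

```python
# Output speeds (tokens per second) copied from the table above.
OUTPUT_TPS = {
    "GPT-5.5 (high)": 68,
    "Claude Opus 4.7": 48,
    "DeepSeek V4-Pro": 48,
    "Kimi K2.6": 33,
    "DeepSeek V4-Flash": 126,
    "Gemini 3.5 Flash": 203,
    "GLM-5.1 High-Speed Edition": 400,
    "Cerebras running Kimi K2.6": 981,
}

def seconds_to_stream(n_tokens: int, tps: float) -> float:
    """Wall-clock seconds to emit n_tokens at a steady output rate (ignores time to first token)."""
    return n_tokens / tps

RESPONSE_TOKENS = 2_000  # assumed response length, for illustration only

for model, tps in OUTPUT_TPS.items():
    print(f"{model:30s} {seconds_to_stream(RESPONSE_TOKENS, tps):6.1f} s")
```

Under these assumptions, the same Kimi K2.6 answer takes about a minute at 33 TPS and about two seconds at 981 TPS.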

I wrote a post earlier, A Model 5× Faster Is No Longer the Same Model. The argument: a 5× speedup unlocks product forms that literally didn't exist before. This isn't slightly faster. It's a different species.

The market already prices this. Anthropic Opus Fast: 2.5× speed, 6× price. OpenAI Priority Tier: 2.5× price. Look at those ratios—price rises faster than speed. Not greed. It's a pricing signal. There's a real cohort willing to pay multiples for speed.

Intelligence tier × speed tier. Stack them and you get a matrix. The token in each cell is a different product.
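
A rough sketch of why a single token count hides the matrix: two providers can report the same monthly volume while selling very different products. The tier names and all prices below are hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million output tokens, keyed by
# (intelligence tier, speed tier). These numbers are invented for illustration.
PRICE_PER_MTOK = {
    ("top", "standard"): 25.0,
    ("top", "fast"): 150.0,
    ("mid", "standard"): 2.5,
    ("on-device", "standard"): 0.0,
}

def summarize(usage):
    """Return (total_tokens, total_cost, tokens_by_cell) for (tier, speed, tokens) records."""
    by_cell = defaultdict(int)
    for tier, speed, tokens in usage:
        by_cell[(tier, speed)] += tokens
    total_tokens = sum(by_cell.values())
    total_cost = sum(PRICE_PER_MTOK[cell] * n / 1_000_000 for cell, n in by_cell.items())
    return total_tokens, total_cost, dict(by_cell)

# Two hypothetical providers that both report "1 billion tokens per month".
provider_a = [("top", "fast", 1_000_000_000)]
provider_b = [("mid", "standard", 1_000_000_000)]

tokens_a, cost_a, _ = summarize(provider_a)
tokens_b, cost_b, _ = summarize(provider_b)
print(tokens_a == tokens_b, cost_a, cost_b)
```

The headline token counts are identical, yet with these made-up prices the revenue behind them differs by a factor of 60.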

III. Two Demand Tracks, Worlds Apart in Willingness to Pay

Who's burning top-tier tokens? Two main tracks.

First: coding agents. The fastest-growing, highest-burn category worldwide. The surface is a coding agent writing code to solve problems. In practice, people use them for everything. The work just happens to get done through "writing code."

Second: consumer agents. The Claude app, ChatGPT app, Microsoft Copilot, and Zhipu's new AutoClaw (Claw Plan). AutoClaw launched March 2026 and hit 400,000 subscriptions in 20 days. Under the hood it's a coding agent wrapped in a non-technical shell, letting ordinary people "hire an AI employee."

The two tracks have very different willingness to pay.

Coding agent users demand peak intelligence—Opus 4.7, GPT-5.5 tier. Anything less fails. The work is valuable; time saved is valuable. They'll pay for top-tier tokens continuously. Stickiness is another story: when a better model drops, they switch immediately.

Consumer agent users differ. Their tasks are lower-value, they're price-sensitive, and they don't need absolute peak intelligence. A "mid-tier smarts, good value, acceptable speed" model fits them perfectly. The problem: that tier is empty right now, with no real supply. So DeepSeek V4, with extreme cost-performance, quickly captured this segment. I've noticed many friends around me switching to DeepSeek.

Demand looks like this, so model companies follow the money. That's why top-tier models keep screaming compute shortages while mid-tier models have no takers.

IV. The Supply-Side Mismatch: Scarce Cards and Idle Racks at the Same Time

Demand misalignment carries over to the compute market.

Top-tier compute shortage is obvious.

Jensen Huang personally confirmed NVIDIA's Blackwell series (B200/GB200) is "sold out through mid-2026," with new enterprise orders facing 8–16 week lead times. Meta's annual CapEx is expected past $100 billion; Microsoft is spending nearly $35 billion in a single quarter—all scrambling for these chips. In China, the frenzy is over B300 and H200: a B300 server costs ¥7 million and you still can't get one, monthly rent pushed to ¥130,000–200,000. H200 was cleared for sale in China in January 2026; the first 5,000–10,000 module batch was snapped up by top vendors immediately, cluster delivery pushed to Q2 2027. The older H100 has cooled. No one is fighting for it now.

Domestic top-tier chips are even more extreme. Huawei's latest Ascend 950PR only began mass production in March 2026, yet the full-year plan of 750,000 units was completely locked up: ByteDance (350,000), Alibaba (200,000), Tencent/Baidu (100,000), government and enterprise IT innovation (100,000)—orders pushed to 2027. Roughly $16,000 per chip, 1.56 PFLOPS FP4, officially claimed at 2.87× H20 single-card performance. This is the first time in domestic AI chip history that an entire year's production was bought out. When DeepSeek V4 open-sourced, it shipped day-zero support for eight domestic chips, listing Ascend NPUs alongside NVIDIA GPUs in the technical report. GLM-5 was trained entirely on Ascend + MindSpore, with support for seven domestic chips. This is about positioning: anchoring top models on domestic chips is both a technical and supply problem.

The hidden side is massive idle capacity in low-to-mid-range compute.

PPIO founder Yao Xin has said some domestic GPU AI compute centers have idle rates up to 80%. 36Kr reported some centers at only 10–20% utilization. Xinhua put it more bluntly: "General-purpose compute is relatively oversupplied; AI compute is relatively scarce"—an admission of structural mismatch. Prices reflect this: A100 prices crashed over 50%, RTX 4090 hourly rent dropped to ¥1–2, and the 5090 is around ¥2.5.

But the low-to-mid-range mismatch has two distinct bottlenecks.

Mid-tier datacenter cards (H20, L20, Huawei 910B, etc.) are stuck on infrastructure. Inference frameworks optimize for top-tier cards far more than these. KV cache management, MoE expert parallelism, FP8/FP4 precision support—none of the critical paths is mature here. The hardware exists, demand exists, but you can't serve a top experience.

Consumer PCIe cards (4090, 5090, 4090 48GB mods) face the opposite problem. The hardware can run; vLLM already supports the 5090 (needs CUDA 12.8 + falling back to FlashAttention 2, usable enough). What's missing is good models designed for them. The 70B dense tier is obsolete—as of May 2026, the top six open-source models are all MoE; dense has virtually disappeared at the flagship level. MoE total parameters routinely exceed 100B, which won't fit on consumer cards; distilled small models can't match top quality. No one is supplying new, high-quality models tailored to 24GB/32GB/48GB VRAM limits.
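
The VRAM squeeze can be checked with back-of-envelope arithmetic. This sketch counts weights only (parameters times bytes per parameter) and ignores KV cache and activations, so real headroom is smaller. The 35B, 70B, and 100B sizes come from the models mentioned in this post; the precision choices are assumptions.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
VRAM_GB = [24, 32, 48]  # the consumer-card limits named above

def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight footprint in GB: billions of parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[dtype]

def fits(params_billion: float, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit; real deployments also need room for KV cache and activations."""
    return weight_gb(params_billion, dtype) < vram_gb

for name, params in [("35B MoE (total)", 35), ("70B dense", 70), ("100B MoE (total)", 100)]:
    for dtype in BYTES_PER_PARAM:
        cards = [v for v in VRAM_GB if fits(params, dtype, v)]
        print(f"{name:18s} {dtype:5s} {weight_gb(params, dtype):6.1f} GB  fits in: {cards or 'none'}")
```

Even at 4 bits, a MoE with 100B total parameters needs roughly 50 GB for weights alone, which is already past a 48 GB modded card.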

So the picture is: 4090/5090 prices are absurdly cheap compared to datacenter cards, yet the mid-tier models you can actually run are still old stock like Llama 3.3 70B from late 2024. Individual developers experimenting locally, small-team PoCs, and privacy-sensitive on-prem deployments can get by. But for enterprise-grade mid-tier inference on these cards, no newly optimized models exist.

The issue isn't "total compute is insufficient." It's "compute can't align with demand."

Outsiders used to quote compute in "petaflops." That was always shaky; in the AI inference era it's nearly useless. Whether a compute unit can serve top-tier models depends on interconnect, memory bandwidth, FP4/FP8 support, KV cache management. A hundred older cards can't match one top-tier card's single-stream speed.

You get a strange picture: top model providers scrambling for chips, while last-gen cards in datacenters can't be rented out even at a discount. Scarcity and glut, side by side.

V. The Market Will Correct the Mismatch, But It Takes Time

This mismatch won't last. The two bottlenecks will be pushed by two different market forces.

The infrastructure gap for mid-tier datacenter cards will be driven by engineering priorities. Inference frameworks follow the money. Once mid-tier model demand grows, top frameworks like vLLM, SGLang, and TensorRT-LLM will eventually be forced to prioritize H20, L20, and 910B optimization. Not glamorous, but inevitable.

The model supply gap for consumer cards is being pushed by distillation and small MoE. DeepSeek-V4 has already distilled a ~9B version; the Qwen series has been working on this. Once someone actually delivers "runs in 32GB VRAM, quality close to top-tier," idle 4090s and 5090s will immediately find work.

Another track is deep binding between domestic chips and domestic models. DeepSeek and Zhipu are both pursuing it; technically it's proven feasible. Once it fully works, the low-to-mid-range compute market will reshuffle structurally.

I'm fairly optimistic this will happen—it just takes time. Maybe a few quarters, maybe a year or two. For those who catch the rhythm, there's a structural window.

VI. Don't Reduce Tokens to a Single Number

Back to the opening line. "Token economy" is a fine term, but it's far less intuitive than selling water or electricity.

It's more like a gas station. Tokens look like one thing, the way gasoline does, but they're actually an intelligence × speed matrix. Layer on the supply-side compute tier mismatch, and you have the real cause behind today's apparently contradictory industry phenomena: why model companies are scrambling for chips, why some AI compute centers sit idle, why a fast tier can charge 6×, and why mid-tier intelligence models are slow to arrive.

Next time you see "we've deployed N petaflops" or "we produce X trillion tokens per month," pause and ask: which intelligence tier, which speed tier, which demand tier.

A token is not a thing.

Original link: https://guanjiawei.ai/en/blog/token-is-not-one-thing

The Stronger the Agent, the More Common Sense Is Worth

guanjiawei — Mon, 25 May 2026 04:06:31 +0000

Last month I wrote “AI Turns Ignorance Into an Advantage”, about how outsiders without the baggage of knowing how hard something is are more willing to use AI to try things that look impossible.

I still believe that. But agents burned me four times in a row recently, so I need to revise.

The sweet spot isn’t knowing nothing; it’s knowing just enough. You have common sense, you grasp the big picture, but you don’t get lost in technical details. Total beginners do dare to try, which is good. But they can’t tell whether the agent’s output is actually reliable.

1. Fake Data Can Trick You by Orders of Magnitude

I’ve been optimizing an inference engine lately.

I checked the results on the first night. The metric had hit a target I’d considered seriously challenging. I was excited. Had we really cracked it that fast?

If I knew nothing about this domain, I’d probably have cheerfully shared the results with my partners. But because I had some common sense, something felt off. I had it run a correctness check. The output was nothing but exclamation marks. After fixing correctness, performance dropped by orders of magnitude.
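
A cheap guard against this kind of result is to sanity-check the generated text before trusting any throughput number. The sketch below is one possible heuristic, not a substitute for comparing against reference outputs; the function name and thresholds are hypothetical.

```python
from collections import Counter

def looks_degenerate(text: str, max_top_char_share: float = 0.5, min_distinct_chars: int = 5) -> bool:
    """Flag outputs that are empty or dominated by one repeated character, such as '!!!!!!!!'."""
    stripped = "".join(text.split())  # ignore whitespace
    if not stripped:
        return True
    counts = Counter(stripped)
    top_share = counts.most_common(1)[0][1] / len(stripped)
    return top_share > max_top_char_share or len(counts) < min_distinct_chars

print(looks_degenerate("!" * 200))                        # True
print(looks_degenerate("The capital of France is Paris."))  # False
```

Running a check like this on every benchmark sample costs almost nothing and would have caught the exclamation-mark output on the first night.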

I thought that was the end of it. But as we kept optimizing, the rhythm still felt wrong. The numbers climbed too fast, suspiciously fast. I looked at the test flow and found that before each official test it quietly ran a warm-up using the exact same prompt. Every subsequent test was hitting the prefix cache, essentially cheating on an open-book exam. After isolating the cache, performance dropped by orders of magnitude again.
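
One way to keep a warm-up from contaminating prefill measurements is to give every timed request a prefix that no earlier request shares. This is a minimal sketch under that assumption: `generate` is a hypothetical callable standing in for whatever engine API is in use, and `ToyEngine` exists only to make the example runnable.

```python
import time
import uuid

def unique_prompt(base_prompt: str) -> str:
    """Prepend a random nonce so the prompt shares no prefix with any earlier request."""
    return f"[run {uuid.uuid4().hex}] {base_prompt}"

def time_requests(generate, base_prompt: str, runs: int = 3, isolate_cache: bool = True):
    """Time `generate` (a hypothetical engine callable) over several runs; returns seconds per run."""
    timings = []
    for _ in range(runs):
        prompt = unique_prompt(base_prompt) if isolate_cache else base_prompt
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return timings

class ToyEngine:
    """Stand-in engine that remembers prompts it has seen, mimicking an exact-match prefix cache."""
    def __init__(self):
        self.seen = set()
        self.cache_hits = 0

    def generate(self, prompt: str) -> str:
        if prompt in self.seen:
            self.cache_hits += 1
        self.seen.add(prompt)
        return "ok"

engine = ToyEngine()
engine.generate("Summarize this document.")   # the hidden warm-up
time_requests(engine.generate, "Summarize this document.", isolate_cache=False)
hits_shared = engine.cache_hits               # every timed run reused the warm-up

engine = ToyEngine()
engine.generate("Summarize this document.")
time_requests(engine.generate, "Summarize this document.", isolate_cache=True)
hits_isolated = engine.cache_hits             # no timed run reused the warm-up
print(hits_shared, hits_isolated)
```

Placing the nonce at the very start matters, because prefix caches match from the first token onward. A nonce at the end would leave the shared prefix intact.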

Still not done. Once prefill returned to normal, decode speed suddenly became absurd. A Windows build of the engine was somehow outperforming the Linux version. I ran the real-prompt test script I’d written earlier, and performance took another ten-fold hit. The problem was that the agent’s synthetic prompts were too simple and too regular, letting speculative decoding hit an acceptance rate above 80%. Switch to real prompts and the acceptance rate cratered, taking performance with it. Teams that have shipped speculative decoding have documented the exact same trap: real-world production performance is 40% to 60% lower than in the lab, a gap large enough to make you wonder if it’s the same system.
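
The sensitivity to acceptance rate can be sketched with the standard simplified model of speculative decoding, where each drafted token is accepted independently with the same probability. Under that assumption, the expected number of tokens per verification pass with draft length k and acceptance rate a is (1 - a^(k+1)) / (1 - a). The draft length of 4 is an assumed value, and the draft model's own cost is ignored.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each drafted token is accepted independently with the same probability,
    a common simplification; real acceptance depends heavily on the prompt.
    """
    a = acceptance_rate
    if a >= 1.0:
        return float(draft_len + 1)
    return (1 - a ** (draft_len + 1)) / (1 - a)

DRAFT_LEN = 4  # assumed number of drafted tokens per step

for rate in (0.8, 0.6, 0.4):
    print(f"acceptance {rate:.0%}: {expected_tokens_per_step(rate, DRAFT_LEN):.2f} tokens per pass")
```

In this simplified model, falling from 80% to 40% acceptance roughly halves the tokens gained per pass (about 3.4 down to about 1.6), before counting the time spent running the draft model.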

Three layers of illusion, stacked. If I’d believed that first number and shared it externally, the cleanup would have been miserable. You give someone the wrong expectation, and they start scheduling around it. Then you have to go back and say, “Sorry, we’re off by orders of magnitude.” That feels way worse than saying “We’re not there yet” from the start.

After that, every optimization target explicitly included two rules: prefill must not be affected by prefix-cache interference, and decode must use real prompts. Only then did we see a normal curve that crept upward, bit by bit.

2. It Will Brick Your Lab Machine

The latest agents can work autonomously for a full day or longer. The longer they run, the higher the chance something goes wrong.

My agent has, more than once, trashed the entire OS of a lab machine mid-run because of a missing quote or a flipped command-line argument. Files gone, environment wiped. It happens in a split second. You can’t stop it in time.

It’s not just me. In April, when an agent hit a credential mismatch, it didn’t stop to ask a human. It found a token with full privileges and deleted an entire company’s production database and all backups in nine seconds. Thirty-plus hours of downtime. Three months of customer data, gone. There have been at least a dozen similar documented incidents in the past two years.

Anthropic and OpenAI are now pushing sandboxing. The idea isn’t complicated. Filesystem isolation on one layer, network isolation on another. Without filesystem isolation, the agent can touch things it shouldn’t. Without network isolation, a compromised agent can steal your keys.
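
The filesystem half of that idea can be approximated at the application level with a guard that refuses to delete anything outside a designated workspace. This is only a sketch and is much weaker than OS-level isolation. `AGENT_ROOT`, `is_inside`, and `safe_delete` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import shutil
import tempfile
from pathlib import Path

def is_inside(root: Path, target: Path) -> bool:
    """True if target, after resolving symlinks and '..', is strictly inside root."""
    root = root.resolve()
    target = target.resolve()
    return target != root and target.is_relative_to(root)

def safe_delete(root: Path, target: Path) -> None:
    """Delete target only if it lives inside root; otherwise raise instead of guessing."""
    if not is_inside(root, target):
        raise PermissionError(f"refusing to delete {target}: outside {root}")
    if target.is_dir():
        shutil.rmtree(target)
    else:
        target.unlink()

# Demo in a throwaway directory so the example is safe to run.
AGENT_ROOT = Path(tempfile.mkdtemp(prefix="agent_ws_"))
scratch = AGENT_ROOT / "build"
scratch.mkdir()
safe_delete(AGENT_ROOT, scratch)  # allowed: inside the workspace

try:
    # A path that escapes the workspace, the kind a flipped argument can produce.
    safe_delete(AGENT_ROOT, AGENT_ROOT / ".." / "..")
    blocked = False
except PermissionError:
    blocked = True
print(scratch.exists(), blocked)
```

Resolving the path before checking it is the important design choice: a string-prefix comparison would be fooled by `..` segments or symlinks.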

My own approach is more low-tech: dedicate a machine exclusively to the agent, and don’t store anything else on it. If it runs for dozens of hours straight, the probability of a dumb mistake is non-zero. Reinstalling the OS costs time. Losing important data costs your sanity.

3. It Will Spin in Circles Until You Step In

Agents have another bad habit: they circle the same problem.

A recent goal was to run an inference engine on Windows in BF16 precision. The model weights were over 60 GB, and loading them caused an immediate OOM crash.

The agent’s response was interesting: it kept trying to work around the memory bottleneck. Load only some weights, dynamically read the rest during inference, every offload trick in the book. None of them worked, and each ate up a lot of time. It even added a warm-up to the tests to hide loading latency. That was part of the root cause of the prefix-cache problem I mentioned earlier.

I finally cut in and said: stop tweaking performance and fix the memory issue first. Until that bottleneck is solved, everything else is wasted effort.

The agent actually executes well. Once pointed in the right direction, it quickly found a series of system-level Windows settings to expand available memory and VRAM. After that was fixed, the optimization path smoothed out immediately. All the previous workarounds were suddenly useless. That time was basically wasted.

The problem is that it won’t proactively redefine the problem. Hand it “optimize performance” and it will keep grinding on that goal, even when stuck on a prerequisite. It finds ways to work around it rather than telling you, “This assumption is false; we need to handle something else first.” Recognizing the real blocker and pulling the agent out of the dead end is a judgment call only a human can make.

4. Set the Bar Too High and You Ship Nothing

The last pitfall isn’t the agent’s fault. It’s mine.

The more powerful agents get, the easier it is to set the bar too high. Because they can run for days, you start thinking anything is fair game. Every direction looks like a top-conference breakthrough. So you spin up multiple threads, each one ambitious.

The result? Every thread is active, every thread shows progress, but nothing ships.

You keep burning tokens, you keep seeing “progress,” yet nothing reaches the user’s hands. It looks like work. It’s actually just burning money. I made this mistake recently: several threads were the kind that would be huge if they landed, but the execution risk was equally high. An agent isn’t a genie; if it can’t be done, it can’t be done. I burned a mountain of tokens and delivered nothing.

I eventually realized: narrow the scope. You need something shippable in the short-to-medium term and some worthwhile long-term explorations, not only the latter. Deliver what can be delivered first, stabilize the rhythm, then go after the big bets.

5. Knowing Just Enough Is Exactly Right

Look at the four pitfalls together and one thread connects them: none requires you to be a deep expert to avoid.

Performance jumped by orders of magnitude? Check whether you measured wrong first. The agent needs to run on your main machine all day? Give it a dedicated one. Stuck on the same spot after three optimization rounds? That spot is the real problem. Every thread is running but none ships? Kill a few.

It’s all common sense.

An MIT Sloan article this year on managing in the age of agentic AI noted that the most important skills for managing agents are defining problems and validating outputs. Those are things AI still can’t do well. “Agent Manager” is already showing up on job boards, and one line in the job description stands out: domain common sense matters more than AI expertise.

Going back to my previous post. “Ignorance is an advantage” still holds: you have to not know what’s hard in order to dare to try. But courage alone isn’t enough. The most valuable state is this: willing to try, yet able to sense when something is off at the critical moment.

Total beginners get carried away by fake data. Deep experts get shackled by priors. The people in the middle, the ones who know just enough, are bold enough to act, yet wise enough to pull the reins when needed.

Agents will keep getting stronger. But that bit of human common sense, whether the numbers check out, whether the direction is right, or whether this should ship now, will only become more valuable. These are still things agents can’t handle.

Original link: https://guanjiawei.ai/en/blog/agent-common-sense

When a Model Is 5× Faster, It’s No Longer the Same Model

guanjiawei — Fri, 22 May 2026 06:52:33 +0000

Two releases caught my eye this week.

On May 19, Google released Gemini 3.5 Flash. I watched their launch event. Oddly, they barely emphasized the model’s raw intelligence. Benchmarks against the previous generation didn’t exactly stand out either. But they devoted serious time to speed, calling it “frontier intelligence built for speed,” claiming inference is 4× faster than other frontier models.

Today, May 22, Zhipu also launched GLM-5.1 High-Speed, claiming 400 token/s output—the current ceiling for industry APIs. This engine wasn’t built by Zhipu alone; it was a joint effort with a team called TileRT, doing low-level customization specifically for the GLM model family on a specific class of hardware.

Put these two together, then look back at Anthropic’s Opus Fast and OpenAI’s GPT-5.5 Fast over the past few months, and the direction is clear: differentiation at the model layer is changing lanes. Everyone used to compete on smarts; now they’re increasingly competing on speed.

And once speed crosses a certain line, it stops being a linear “X times faster” improvement. AI becomes a different kind of thing.

1. Pricing Already Tells the Story

The clearest evidence is fast-mode pricing.

Anthropic’s Opus Fast: 2.5× the speed, 6× the price.

OpenAI’s GPT-5.5 Fast: 1.5× the speed, 2.5× the cost.

Look at the numbers. If speed were valued linearly, 2.5× speed would cost 2.5× the price, and 1.5× speed would cost 1.5×. But in practice, the price jumps far more than the speed.
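
The gap between the two ratios is easy to quantify. This small sketch divides each price multiple by its speed multiple, using only the figures quoted above; the function name is just illustrative.

```python
# (speed multiple, price multiple) as quoted above.
FAST_TIERS = {
    "Anthropic Opus Fast": (2.5, 6.0),
    "OpenAI GPT-5.5 Fast": (1.5, 2.5),
}

def premium_over_linear(speed_multiple: float, price_multiple: float) -> float:
    """How many times more you pay than a linear 'N x speed costs N x price' rule would charge."""
    return price_multiple / speed_multiple

for name, (speed, price) in FAST_TIERS.items():
    print(f"{name}: {premium_over_linear(speed, price):.2f}x the linear price")
```

By this measure Opus Fast charges about 2.4 times what linear pricing would, and GPT-5.5 Fast about 1.67 times.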

This isn’t greed. It’s a real market signal: some people will happily pay disproportionately more for speed. Either their tasks need high-frequency feedback, or users are sitting there waiting, or downstream steps are blocked. In these scenarios, going from 30 seconds to 12 seconds feels completely different from going from 30 seconds to 20 seconds.

I toggle Opus Fast on and off constantly myself. I turned off GPT-5.5’s 1.5× tier immediately. I couldn’t feel the difference; it was just burning money. But at 2.5×, there are tasks I just leave it on for—mostly when I’m staring at the output and iterating fast.

Markets don’t lie. Something that sells for 6× has buyers who genuinely think it’s worth it.

2. Per-Request Speed and Scaling Out Are Not the Same Thing

Two things need to be kept apart here.

The first is “doing more concurrency at a fixed speed.” Same 30 token/s throughput, but serving 1,000 users instead of 100. This is relatively easy. Just throw more machines at it—you can buy slightly weaker cards and spread the load across them, and the cost-performance ratio can be tuned.

The second is “making a single request faster.” Going from 30 token/s to 400. This is an entirely different beast. You need higher-end hardware, more aggressive memory bandwidth, and cutting-edge packaging. You can’t fix this by “spending a bit more to stack extra cards.” A hundred weak cards won’t get a single request to the speed of one top-tier card.

I’ve spent time experimenting with inference infra myself, optimizing a few open-source models. The cost curves are completely different. The first is roughly linear—double the money, get about double the concurrency. The second is non-linear—that first 20% speedup might cost you 50% more, and it only gets steeper.
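
A rough way to see the two curves side by side is to assume decoding is memory-bandwidth bound, so each new token has to read the active weights once. Single-request speed is then capped near bandwidth divided by active weight bytes, while total throughput across independent requests grows with card count. All hardware and model numbers below are hypothetical, and the model ignores KV cache reads, compute limits, and interconnect.

```python
def aggregate_tps(cards: int, tps_per_card: float) -> float:
    """Scaling out: total throughput across independent requests grows roughly linearly with cards."""
    return cards * tps_per_card

def single_stream_tps_ceiling(mem_bandwidth_gb_s: float, active_params_billion: float,
                              bytes_per_param: float) -> float:
    """Rough upper bound on one request's decode speed when decoding is memory-bandwidth bound:
    every new token has to read the active weights once."""
    active_weight_gb = active_params_billion * bytes_per_param
    return mem_bandwidth_gb_s / active_weight_gb

# Hypothetical cards; the bandwidth figures are illustrative, not vendor specs.
WEAK_CARD_GB_S = 1_000
TOP_CARD_GB_S = 8_000
ACTIVE_PARAMS_B = 30   # assumed active parameters per token, in billions
BYTES_PER_PARAM = 1.0  # assumed FP8 weights

weak = single_stream_tps_ceiling(WEAK_CARD_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)
top = single_stream_tps_ceiling(TOP_CARD_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)

print(f"100 weak cards, total throughput: {aggregate_tps(100, weak):.0f} tok/s across many users")
print(f"one request on a weak card:       {weak:.0f} tok/s at best")
print(f"one request on a top card:        {top:.0f} tok/s at best")
```

Tensor parallelism can pool bandwidth across cards, but interconnect latency eats into the gain, which this sketch does not model.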

So when Gemini 3.5 Flash emphasizes speed, or GLM High-Speed hits 400 token/s, they’re not saying “we made a cheaper version.” They’re saying “we pushed single-request speed to a new level.” That’s a problem of an entirely different magnitude.

3. 5× Is a Speciation Threshold

So why push so hard?

When I think about this, I go back to a simple comparison.

If you want something done faster, the traditional options are limited.

First, hire smarter people. But that hits a ceiling. There are only so many world-class experts, and today’s best models are already brushing up against that ceiling.

Second, make people work overtime. Agents already run 24/7, so that ceiling is gone too.

Third, divide the work and throw more people at it. But anyone who’s done engineering knows adding people doesn’t scale linearly. Adding one person doesn’t make it twice as fast; adding ten gets you nowhere near 10×. You have to break things down, hand off, coordinate, deal with uneven quality, manage waste. The ramp-up period for new hires is expensive. If you’re doing multi-agent orchestration, you know exactly what I mean.
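
One common way to put numbers on this intuition is Amdahl's law: if some fraction of the work cannot be split up, adding workers has sharply diminishing returns. The 80% parallel fraction below is an assumed figure, and the formula does not include coordination overhead, so real teams usually do worse than this.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the work can be split across workers."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

PARALLEL_FRACTION = 0.8  # assumed share of the work that can be divided up

for n in (1, 2, 10, 100):
    print(f"{n:3d} workers -> {amdahl_speedup(PARALLEL_FRACTION, n):.2f}x")
```

With 80% of the work parallelizable, ten workers give roughly a 3.6× speedup and no number of workers can exceed 5×.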

At this point, the traditional paths to speed are tapped out.

So what’s left? Make the model itself—the same employee—faster.

And making that “same employee” faster is non-linear.

Imagine an employee who used to take an hour to finish a task. Now they do it in ten minutes. You think you just saved fifty minutes? It’s more than that.

You’ll start giving them tasks you’d never have bothered with because “it’s too slow.” Small ad-hoc requests that used to take an hour—so you never asked—now come back in ten minutes, and you make a dozen a day. Speed unlocks tasks that literally didn’t exist before.

I saw a demo the other day: someone wearing glasses pointed at a video on a screen and said “zoom in on this,” and the AI behind it wrote code to resize the element. If the whole chain takes thirty seconds, you glance and walk away—there’s no real interaction. But if it finishes in five seconds, the feel is completely different; it becomes a genuinely usable product.

That’s the gap between 50 token/s and 400 token/s. 8× speed unlocks products that were impossible to build before.

A speedup beyond 5× is a speciation line.

4. The Return of Specialization

Okay, speed is valuable. How do you actually achieve it?

That brings us to TileRT’s approach, which diverges from where the industry was a year ago.

Mainstream inference frameworks like vLLM, TensorRT-LLM, and SGLang are general-purpose. They aim to “support as many models as possible, running well enough on as much hardware as possible.” That has always been software engineering’s default bias: generality first, performance second.

TileRT does the opposite. It statically schedules the entire inference graph at compile time, running as a persistent kernel on the GPU with almost no runtime dynamic scheduling. Micro-tasks at tile-level granularity squeeze the hardware close to its physical limits. The cost? Change the model and it’s basically scrap; change the hardware and it needs major rework.

DeepSeek is on the same path. Their own inference engine started out based on vLLM, then underwent more than a year of deep customization—almost every path was rewritten for their own MoE architecture. When they open-sourced part of it recently, the industry’s reaction wasn’t “how general-purpose this is,” but “how deep you can go for a single model.”

Go one layer deeper, and the hardware side has been on this path for a while. Groq’s LPU runs Llama 4 Scout at 460 token/s, 3–4× what an H100 delivers. Cerebras’s WSE-3 hits 1,800 token/s on a 70B model and nearly 3,000 on gpt-oss-120B. These are specialized chips. They aren’t trying to run every kind of model; they’re built to take a specific workload to the extreme.

Chip designers have debated this for decades: general-purpose CPU or specialized ASIC? General chips have their place, but when a domain is big enough and the lifecycle is long enough, specialization pays off.

The software layer used to avoid this, mainly because software isn’t cheap to write. Building a dedicated inference stack for one model takes too long to pay off; the model changes and your software is dead.

That’s changing. AI agents can write software now. The cost of “building an optimal inference stack from scratch for a specific model and specific hardware” drops every year. Once it falls below a certain threshold, specialization becomes the default.

Every promising model will eventually have its own dedicated inference engine. Every generation of mainstream hardware will have its own specially optimized stack. What you used to think of as just “the last 5% of optimization” could now become a 5× or 10× gap.

5. Vertical Integration at the Model Layer Is Inevitable

Pulling these threads together.

Intelligence will keep improving in the short term, but the marginal utility of competing on raw smarts is declining. A model that’s 20% smarter versus the same model accelerated 10×—for many users, the latter is far more valuable, especially for the new scenarios that speed itself unlocks.

So the next phase of competition shifts from “point intelligence” to “end-to-end capability.” Model, inference engine, and hardware—all three bundled together.

If you’re at 400 token/s and I’m at 30 token/s, even if my model is 20% smarter, I’m unusable in many scenarios. I’ll be watching my smartest model sit there slowly spitting out words while you’ve already delivered the whole product experience to the user.

DeepSeek and Zhipu are already doing this. Anthropic and OpenAI are too. Google probably went the earliest and deepest—the TPU + Gemini combo has been running internally for a long time. My guess is that over the next year or two, the whole industry moves this way: model companies must own their inference stack and go deep into the hardware layer; hardware companies must go deep into model architecture; and the generic middle layer gets squeezed from both ends.

For engineers, this is pretty exciting. We used to think “general, scalable, portable” was good taste. For the foreseeable future, the opposite may hold: writing the most extreme code for a specific model and specific hardware—code that breaks if you change anything—becomes worth doing again.

Software engineering aesthetics will have to change.

Original link: https://guanjiawei.ai/en/blog/inference-speed-new-species

AMD Gave a Developer Award to Someone Who Can't Code

guanjiawei — Tue, 19 May 2026 08:07:36 +0000

Today I went to AMD's developer conference in Shanghai.

The entrance alone was a shock. The line to get in was long, and the hall was already packed before anything started. They expected just over 1,000 people; more than 2,000 showed up. AMD said it was their biggest recent event.

Lisa Su showed up too. She'd been in Beijing the day before meeting Vice Premier He Lifeng to talk chip cooperation. I'd never seen a chip company pull a crowd like this for a developer conference.

AMD Gave an Award to Someone Who Can't Code

That morning, an AMD senior VP got on stage to hand out two developer awards. When they introduced one winner, the host said:

"He didn't actually know how to code before."

He'd used an AI agent to rewrite an entire system in Rust and optimize performance. AMD figured that was worth an award.

Sitting there, the whole thing felt surreal. A chip company worth hundreds of billions, at a 2,000-person developer conference, handed one of two awards to someone who doesn't code.

I bet next year the award will be even harder to judge. AI-native people like him will only become more common.

The Boundary Between Developers and Users Is Vanishing

Before, if you used a product, you just used it. You couldn't really help build it. Even in open source, you had to code before you could contribute.

Not anymore. Coding agents are getting stronger, and regular users can now tweak, optimize, and push changes back while using a product. The same person is both user and builder.

I wrote before about the split between Builders and Promoters. That was about passion diverging. This is the flip side: the roles of user and contributor now overlap, often in the same person. Users are also investing their tokens across different products, and the ones that earn that investment keep evolving.

Product logic has shifted. You used to focus on making the experience great. Now you also need to make it easy for users to become contributors.

AMD's big Strix Halo push is interesting. The AI Max+ 395 chip can allocate up to 96 GB of unified memory to its integrated GPU for running local models, and my inference engine can run on it too. Domestically, prices have been climbing and it's been out of stock. I have several R&D test machines for performance tuning, and they're also my entry point into the ROCm ecosystem.

AMD is pushing this machine to lower the developer barrier another notch. More developers means stickier stacks.

Industry Winds Did a 180 in Six Months

I attended a similar conference around the middle of last year. The vibe was completely different.

Back then, people in the compute business were stressed. Early last year, DeepSeek raised expectations for models, and everyone was wondering whether the wave would last. How to move product, how to clear inventory, whether the business could survive. Everyone was scrambling for solutions and partners.

This year, the table talk completely changed. The first thing anyone says is, "Can you get me more supply?" or "I'll take everything you've got."

It's completely flipped. Supply is tight, and whoever holds quality inventory is making money. The shift from demand anxiety to supply anxiety took just six months.

This Isn't Another Bubble

Plenty of people say: "Here we go again. Next metaverse."

This time it really is different. I lived through the metaverse and blockchain cycles too. The difference this time is in the data, specifically paid demand from real users.

Lisa Su said on stage that roughly 1 billion people are already using AI worldwide, and by 2030 that number will hit 5 billion daily active users. ChatGPT came out at the end of 2022, so it's been only about three and a half years. The internet took over 20 years to reach that scale; the PC era took even longer. This is a diffusion speed never seen before in history.

The money is keeping up too. Anthropic's Q1 grew 80x year-over-year. That's annualized revenue, not API calls. Dario himself said they weren't ready to catch a wave this big. Claude Code hit a $1 billion annualized run rate within six months of launch, and by April this year the company's overall ARR surged to $30 billion.

This is nothing like a few years ago, when everyone was in a price war, handing out free tokens, and chasing call volume. Supply can't keep up with paid demand.

"X Is Dead" Is the Cheapest Narrative

A friend recently asked me: "Is OpenClaw dead?" "How's Claude Code doing?" "I heard Codex is going to win."

I think that's just inertia.

Last December, every top academic conference and product circle was talking about Gemini. Back then everyone thought Google had it in the bag. A few months later, almost nobody mentioned Gemini. Then it was Cursor, then Claude Code. Pretty soon it'll be Codex. At the top table, players keep rotating.

But the underlying trend runs one way. It hasn't reversed. Paid demand is rising, call volume is rising.

Real information is expensive. You have to use the tools yourself, show up on-site, and talk to people inside. So the audience for that is naturally small. Narratives like "it's dead," "it's a bubble," or "just another cycle" are the cheapest to spin up. They validate sitting on the sidelines and feed the need to believe that not engaging was the right call. They spread the easiest.

People do believe these narratives, mostly because they want to.

Go See for Yourself

Lately when I meet friends, I do one thing: tell them to bring their laptop, and I help them install Claude Code or Codex and get the model connected.

Once you get past that hurdle, you can hand off 95% of computer work. I built my own website from scratch without lifting a finger. Frontend, DNS, SEO, all done by agents. The barrier is small, but once you're past it, the world looks completely different.

If that's still too much, just find a conference this year where people are actually doing this and go. There were quite a few workshops at the event where you brought a laptop and worked hands-on. Only when you sit there do you realize how far AI has already come.

References

AMD CEO Lisa Su in Shanghai Predicts 5 Billion Daily AI Users Within Five Years — On-site report from AMD AI Developer Day Shanghai, May 19, 2026
Chinese Vice Premier He Lifeng Meets Lisa Su, Calls for Deeper Cooperation — May 18, 2026, He Lifeng meets AMD CEO Lisa Su in Beijing
CES 2026: Lisa Su Predicts Over 5 Billion AI Users in Five Years — Lisa Su first gave the 5-billion-user forecast during her CES keynote
Anthropic Q1 Grew 80x, Annualized Run Rate Hits $30 Billion ARR — Dario Amodei publicly acknowledged 80x year-over-year Q1 growth
Anthropic's ARR Surged from $9 Billion to $30 Billion in 4 Months — Full ARR trajectory: Jan 2024 $87M → Dec 2024 $1B → End of 2025 $9B → Apr 2026 $30B
Claude Code Surpassed $1 Billion Annualized Revenue Within Six Months of Launch — Claude Code is Anthropic's fastest-growing product
ChatGPT Is the Fastest-Growing Consumer Product in History to Reach 100 Million Users — Reached 100 million users in 2 months, faster than TikTok and Instagram
AI Adoption Speed Compared with Historical Technologies — Epoch AI research: 70% US household adoption took 40 years in 1900, shrinking to 17 years by 2000
AMD Ryzen AI Max+ 395 (Strix Halo) Official Specs — 16 Zen 5 cores, Radeon 8060S, up to 128GB LPDDR5X unified memory (up to 96GB allocatable to GPU)
Ryzen AI Max+ 395 Out of Stock and Rising in Price in China — Current tight supply situation for Strix Halo standard chips in China's retail market

Original link: https://guanjiawei.ai/en/blog/non-coder-award

Same /goal Feature, Two Agent Personalities

guanjiawei — Mon, 18 May 2026 04:59:32 +0000

I've been using Codex's /goal for weeks, and my token consumption has climbed another notch. Claude Code added the feature in version 2.1.139, released May 12, and it went straight to stable rather than experimental. I had a few tasks that Codex never quite managed to finish, so I moved them over to try.

The contrast was stark. Same paradigm, nearly identical loop, yet the two models produced completely different results.

I'm writing this partly to think it through, partly because it's worth sharing. /goal isn't so much a feature as a new way of working. The form looks identical, but when the model's personality differs, the practical reality is entirely different.

1. Codex: Heads Down, No Questions, Never Quits

Let me start with Codex as a baseline.

Codex CLI's /goal appeared as an experimental feature in 0.128.0. I've been using it since then and wrote about it previously. The real shift has been mental: I actually started believing that "letting the agent run" really works.

It doesn't interrupt you. When running /goal, Codex almost never calls subagents; it works inline unless I explicitly tell it to delegate. Compaction works better than I expected. After compressing, it picks up from the previous round without major information loss, and doesn't suddenly get dumber as it pushes forward. Most importantly, it's stubborn. It almost never tells me a goal is unachievable. Even when it hits a wall, it tries another angle, then another, until the token budget runs out. I've tested this repeatedly. I've left three independent /goal sessions running overnight; in the morning, most are still on track.

The context window is a genuine weak spot. Codex defaults to 400K under GPT-5.5. OpenAI balanced pricing and throughput there, while the API offers the full 1M. Claude Code defaults to 1M. But even with only 400K, Codex runs remarkably stable under /goal.

2. Claude Code: Beautiful Opening, Then What?

On May 12, Anthropic dropped /goal, Agent View, /bg, /loop, and /batch all at once. My first thought was "finally." Codex had been iterating on this for several versions; Claude Code felt a bit slow to catch up.

I moved the tasks Codex couldn't crack over to Claude Code and started /goal.

It started strong. Claude Code immediately spun up subagents, laid out plans, and orchestrated context. It looked far more ambitious than Codex. My expectations immediately rose. With an opening like this, it should outperform Codex.

But as it ran, issues cropped up.

The first thing that made me frown: it kept popping up to ask me to make choices. This is usually one of Claude Code's likable traits. Faced with a judgment call, it doesn't just plow ahead. It stops to align with you, asking which of directions A, B, or C you prefer. And the questions are usually on point. But under /goal, this is a bug, not a feature. The whole point of /goal is "you set the goal, I run myself, don't interfere." The model should own every intermediate judgment. When it pops out with questions, those hours of freed-up time are immediately lost. If you step away, it just sits there waiting for you to come back.

More surprisingly, it proactively tells me it can't achieve the goal. Then it actually fails the goal. Sometimes after just a few dozen minutes. The reason is usually that the task seems too large for the session, or that there are fundamental blockers. When I tell it to continue, it reluctantly pushes forward a bit, then does it again.

Third: it gets dumber after compaction. A 1M context window sounds huge, but Anthropic themselves have admitted that performance degrades over long runs. Worse is the compression step. After each compaction, Claude Code often seems to have forgotten everything that came before. The original plan, the pitfalls already encountered, the original context—all have to be pieced back together. Codex doesn't suffer from compaction nearly as badly.

These three issues combined make long-horizon tasks unstable in Claude Code's /goal.

3. It's Not Just My Impression

At first I thought it was my usage. Then I looked around and realized Opus 4.7's laziness was already common knowledge.

Opus 4.7 was released on April 16. Within 48 hours, a Reddit thread titled "Opus 4.7 is not an upgrade but a serious regression" got over 2,300 upvotes. AMD's AI director publicly complained that Claude Code had become "dumber and lazier." Screenshots were everywhere. Someone posted a conversation where Claude itself replied, "I was acting lazily."

Anthropic later published a postmortem, admitting that on April 16 they had added a "reduce verbosity" instruction to the system prompt. This instruction, along with a few other changes, dragged down coding quality. On April 20 they rolled it back. But my sense is that after the rollback, Opus 4.7's laziness only eased slightly. It didn't fully recover. The RL layer had already internalized this tendency. You can't fix that by tweaking a system prompt.

In extended continuous operation like /goal, this laziness gets amplified. A lazy model might get away with it on short tasks. Put it on a long task, and it will find all sorts of seemingly reasonable excuses to fail itself.

4. These Past Few Months, We've Been Doing the Same Thing

/goal didn't appear out of nowhere. It's the culmination of months of exploration.

Before the Lunar New Year, I was already tinkering with something similar. At the time, I was doing stability testing for AIMA (our model management platform). The core idea was to have AI simulate real users running tests repeatedly to improve stability. The most naive attempt was using the terminal's built-in task mechanism. I set up 10 tasks, each running for a long time.

This path died quickly. Each task was still in the same session, and models don't hold up well in long sessions. Within a few rounds, things destabilized, and no amount of prompt tuning could save it.

Next I looked at a two-layer architecture. At the time, Kilo Code was pushing a feature called Orchestrator Mode, previously known as Boomerang Tasks, inherited from Roo Code. The logic was sound: an outer orchestrator manages tasks, delegates each subtask to an independent subagent running in its own context, then collects the results.
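The two-layer idea can be sketched in a few lines. This is only an illustration of the shape described above, not Kilo Code's or Roo Code's actual implementation: the names Orchestrator, run_subagent, and echo_subagent are hypothetical, and the toy subagent stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a subagent: it receives a fresh context
# (a list of messages) and returns a short summary string.
Subagent = Callable[[List[str]], str]


@dataclass
class Orchestrator:
    run_subagent: Subagent
    results: List[str] = field(default_factory=list)

    def delegate(self, subtask: str) -> str:
        # Each subtask starts from its own context; nothing leaks in
        # from sibling subtasks or from the orchestrator's history.
        fresh_context = [f"Subtask: {subtask}"]
        summary = self.run_subagent(fresh_context)
        # Only the summary flows back up, not the full transcript.
        self.results.append(summary)
        return summary

    def run(self, subtasks: List[str]) -> List[str]:
        for subtask in subtasks:
            self.delegate(subtask)
        return self.results


def echo_subagent(context: List[str]) -> str:
    # Toy subagent: reports how much context it saw and what it was asked.
    return f"done ({len(context)} message in context): {context[0]}"


orchestrator = Orchestrator(run_subagent=echo_subagent)
summaries = orchestrator.run(["write tests", "fix flaky build"])
```

The point of the design is that the outer layer only ever holds short summaries, so its own context grows slowly even when each subtask is long.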

I tried a round with several cost-effective models available at the time. Zhipu performed slightly better, able to push through long tasks for a while. MiniMax was more comical. It started writing code at the orchestrator layer itself and never delegated. The two-layer architecture simply failed on it. I thought about this for a while afterwards. It didn't seem like a harness adaptation issue. More likely, the model itself lacked the sense that it's the lead and should delegate.

In February, Claude Code shipped Agent Teams alongside Opus 4.6. It was experimental, requiring the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS environment variable to enable. One session acts as team lead, dispatching other subagents to complete tasks in fresh context windows. This was essentially Kilo's architecture as an official implementation. I was genuinely impressed when I tried it. Long tasks could run for two or three hours without crashing.

But after one compaction, the team lead side fell apart. Previously dispatched subagents couldn't be found, so it would redeploy a new round, task lists got misaligned, and tokens burned fast. The two-layer architecture itself suffered information decay. Context shuttled back and forth between layers, losing a bit each time.

Then came Ralph, full name Ralph Wiggum. Australian developer Geoffrey Huntley built it at the end of 2025. The logic was so simple it was almost suspicious: a bash while-true loop, repeatedly feeding the same prompt file to an agent until the goal is achieved. I tried to test its tmux version at the time, hit some snags, and shelved it.
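For a sense of how little is involved, here is a rough Python rendering of that loop. The original is a bash while-true loop; this sketch swaps in Python, and everything in it is an assumption for illustration: run_agent and goal_reached are hypothetical callables, the toy agent just decrements a counter, and the max_iterations cap is my addition so the example terminates.

```python
from typing import Callable


def ralph_loop(prompt: str,
               run_agent: Callable[[str], None],
               goal_reached: Callable[[], bool],
               max_iterations: int = 50) -> int:
    """Feed the same prompt to the agent until the goal check passes.

    Returns the number of iterations used; raises if the budget runs out.
    """
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)      # a fresh run each time, always the same prompt
        if goal_reached():     # progress lives outside the agent (files, tests)
            return iteration
    raise RuntimeError("iteration budget exhausted before the goal was reached")


# Toy demonstration: the "agent" fixes one failing test per run.
state = {"failing_tests": 3}


def toy_agent(prompt: str) -> None:
    state["failing_tests"] = max(0, state["failing_tests"] - 1)


def all_tests_pass() -> bool:
    return state["failing_tests"] == 0


iterations_used = ralph_loop("Make the test suite pass.", toy_agent, all_tests_pass)
```

Because the prompt never changes, whatever the agent has learned has to be written down somewhere the next run can read it, which is why the goal check looks at external state rather than at the conversation.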

Ralph caught on extremely fast. It's the most direct inspiration for the /goal product line. Today, Anthropic has absorbed Ralph as an official Claude Code plugin, parked under plugins/ralph-wiggum/ in the repo. Kilo Code's Orchestrator Mode, conversely, has been officially marked deprecated. The reason given: "the main agent can now delegate directly to subagents, so a dedicated orchestrator is no longer needed."

Hand-rolled terminal tasks, to Kilo Orchestrator, to Claude Code Agent Teams, to Ralph going viral, to Codex shipping /goal, to Claude Code shipping /goal, to Ralph being absorbed and Kilo Orchestrator deprecated. The evolutionary thread of these past few months is clear.

5. Codex Dives In; Claude Code Keeps Looking Up

Back to the models themselves.

After running both, I have a fairly solid judgment. Codex is the "local" faction. Claude Code is the "global" faction.

Math has a concept called local optima. The optimization space is like a valley. Starting from one point and walking downhill, you might end up in a local minimum, but over the next ridge there's a deeper valley. I've watched Codex fall into these local optima repeatedly during /goal. It polishes one direction, does this and that, circles back, thinks it's moving forward, but is actually treading water. Its heads-down approach is usually a strength. In these moments it becomes a weakness.
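A toy version of the valley picture, purely as an illustration and not a claim about how either agent works internally. The function f, the learning rate, and the starting points are all made up for this sketch.

```python
def f(x: float) -> float:
    # Made-up landscape with two valleys: a shallow one on the right
    # and a deeper one on the left.
    return x**4 - 4 * x**2 + x


def grad_f(x: float) -> float:
    # Derivative of f.
    return 4 * x**3 - 8 * x + 1


def descend(start: float, learning_rate: float = 0.01, steps: int = 2000) -> float:
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)  # always step downhill from where you are
    return x


x_right = descend(start=2.0)   # starts on the right-hand slope
x_left = descend(start=-2.0)   # starts on the left-hand slope
```

The run that starts on the right settles near x ≈ 1.35 and stays there, even though the valley near x ≈ -1.47 is deeper. Walking downhill alone never crosses the ridge between them.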

Claude Code is different. It performs large-span reflection and validation, proactively asking whether its current direction is right. I've repeatedly seen it jump out of what looked like a converging direction, saying "wait, the root of this problem might not be here, I need to reconsider," and then actually find a better path.

This global view is Claude Code's strength. For complex tasks lasting one to two hours and requiring judgment, reflection, and cross-module coordination, I still think Claude Code outperforms Codex.

But this global view doesn't buy endurance. It can't run long under /goal, and can't deliver stable 24-hour unattended output. An imperfect analogy: Codex is an intern who can grind for 12 hours straight, occasionally drifting off course. Claude Code is a senior engineer with good judgment, but he needs to check in every 40 minutes, or decides after 30 minutes that this is too hard and he's out. Which is better suited for /goal? The answer is obvious.

6. Form Converges, Training Diverges

After running this comparison, I have an additional read on where coding agents are heading in the coming months. Harnesses are rapidly converging, but model personality differences will become increasingly prominent.

Boris Cherny (creator of Claude Code) has been saying that in the future, a harness might be just 100 lines of code. I believe this even more now. Once the /goal paradigm converges, the outer structure of coding agents will get thinner and thinner. A loop, a set of tools, a goal. That's enough.
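To make "a loop, a set of tools, a goal" concrete, here is a minimal sketch under heavy assumptions. It is not how Claude Code or Codex is built: run_harness, the action tuple format, add_tool, and scripted_model are all hypothetical, and the scripted model replaces a real one so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any callable that sees the goal and the history so far and
# returns either ("tool", tool_name, argument) or ("done", answer, "").
Action = Tuple[str, str, str]
Model = Callable[[str, List[str]], Action]


def run_harness(goal: str,
                model: Model,
                tools: Dict[str, Callable[[str], str]],
                max_steps: int = 20) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        kind, name, argument = model(goal, history)
        if kind == "done":
            return name  # for "done" actions the second slot holds the answer
        # Run the requested tool and feed its output back into the loop.
        output = tools[name](argument)
        history.append(f"{name}({argument}) -> {output}")
    raise RuntimeError("step budget exhausted")


def add_tool(argument: str) -> str:
    left, right = argument.split(",")
    return str(int(left) + int(right))


def scripted_model(goal: str, history: List[str]) -> Action:
    if not history:
        return ("tool", "add", "2,3")
    # Read the last tool output back out of the history line.
    return ("done", history[-1].split(" -> ")[1], "")


answer = run_harness("Add 2 and 3.", scripted_model, {"add": add_tool})
```

Everything interesting happens inside the model callable. The harness itself has no opinion about when to ask a human, when to give up, or when to change direction.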

What will truly determine the gap is the model's personality within this loop. Whether it's willing to put its head down and work. Whether it keeps popping out to align with humans. Whether its state survives compaction. Whether it can jump out when stuck in a wrong direction. When it hits a wall, does it try again, or say it can't do the goal and bail?

None of these can be fixed with prompting. They're set during training.

OpenAI and Anthropic have already trained distinctly different model personalities for long-horizon tasks. Codex seems to have been trained into "never give up, hit the wall and try again." Claude Code seems trained to "report frequently, align frequently, reflect frequently." That's endearing in interactive scenarios, but fatal under /goal.

In the short term, this divergence is hard to bridge. Even after Anthropic rolled back that verbosity system prompt, Opus 4.7's laziness only eased. It didn't fully recover. RL internalized it. You can't fix that by changing outer prompts.

7. Choosing an Agent Is Increasingly Like Choosing a Partner

At this point, the way I use /goal has changed.

I no longer start by asking which tool is stronger. Instead, I ask: which model's personality fits this task?

For iterations lasting over six hours, with a clear goal and low trial-and-error cost, I just fire up Codex /goal. For architectural judgment, cross-module decisions, possible mid-course direction changes, I use Claude Code /goal, but I check back every 30 to 60 minutes, mentally prepared for it to pop out with questions. For truly unattended 24-hour runs, it has to be Codex, and the task direction needs to be clearly nailed down upfront. If it's just a single hard problem requiring global vision, I actually don't use /goal at all. I use Claude Code in normal mode and knock it out in 30 minutes.

A few months ago, choosing an agent meant choosing UI, community, pricing. Now it's more about choosing a model personality.

Next-generation models, whether from Anthropic or OpenAI, will definitely train toward fixing the other side's weakness. Codex will try to add global vision; Claude Code will try to add endurance. In the short term, this personality divergence remains real, and it significantly affects how much value you can extract from /goal.

The biggest effect of /goal is that it amplifies a model's true personality into 24 hours of continuous output. The one with the steadier personality wins this round.

Right now, Codex leads by half a step. But only half a step.

References

Claude Code 2.1.139 adds /goal command — explainx.ai: Claude Code /goal launch notes, May 12, 2026
Claude Code Agent View, Goal Command, and Background Sessions Update — Geeky Gadgets: Detailed overview of Claude Code 2.1 features
Inventing the Ralph Wiggum Loop — Dev Interrupted: Geoffrey Huntley on inventing Ralph
Ralph Wiggum official Claude Code plugin — GitHub: Anthropic has absorbed Ralph as an official plugin
Kilo Code Orchestrator Mode (Deprecated): Current status of Kilo Code Orchestrator
Orchestrate teams of Claude Code sessions — Claude Code Docs: Agent Teams official documentation
Claude Code experimental agent teams — DeepakNess: Agent Teams release notes alongside Opus 4.6
Claude Opus 4.7 Regression Explained — buildfastwithai: Opus 4.7 regression and community feedback
Opus 4.7 isn't dumb, it's just lazy — Shimin Zhang: Analysis of Opus 4.7's laziness issue
An update on recent Claude Code quality reports — Anthropic Engineering: Anthropic official postmortem on rolling back the verbosity system prompt
GPT-5.5 Codex 400K context window — GitHub Issue: Codex 400K context window limit explained
Boris Cherny on Claude Code's future — Pragmatic Engineer: The "100 lines of code" prediction

Original link: https://guanjiawei.ai/en/blog/goal-two-personalities

AI Transformation Doesn't Come from Training

guanjiawei — Sat, 16 May 2026 13:19:33 +0000

Lately, when AI agents come up in conversation with friends, I've fallen into a habit. I pull out my phone, remote into my computer, and show them the agents I've had running over the past 24 hours. One has been autonomously chasing a goal for over ten hours straight. Another is running experiments and tuning parameters.

Their reactions are pretty much always the same: "Oh, so it's already at this stage. That's not what I pictured."

What they say next is the interesting part.

1. "Help Me Explain This to My Boss"

After watching, the first thing friends say usually goes like this:

"Can you come explain this to our boss?"

"I want to bring our tech lead over to see this."

"Can you give us a training session?"

Fair enough. You think this is important, and you want to bring in the people who need to see it. Good instinct.

But looking back at how AI agents actually spread through our own company, real change never came from a single class or presentation.

It started with someone who just did it themselves. They created something in their own work that made people around them do a double take.

A salesperson who suddenly talks like an engineer, while closing deals faster than ever. An admin or HR person who turns out to be doing technical work and marketing, shipping product-grade work from a role that never used to do that. People around them start to wonder. Why don't you seem like the same salesperson, the same admin anymore?

At that point, the curious ones show up naturally. Colleagues, bosses, friends. The change is happening right beside them, they can see it, and only then do they actually absorb what you're saying. Then it spreads from you to the next colleague, and the next, and out from there.

To be honest, trying to drive change by "getting the boss to sit through a lesson" rarely works. Unless that boss personally got their hands dirty on day one. Because right now, knowing what AI can and can't do comes entirely from bumping up against its boundaries yourself, not from hearing about them.

The data backs this up. A BCG report from early 2025 said 75% of executives rank AI as a top-three priority, but only a quarter have actually captured significant value. McKinsey put it more bluntly: 70% of employees skip their company's formal AI training videos entirely, learning instead by tinkering and word of mouth.

Training can only convey so much. What's scarier is that someone who sets policy without having used AI deeply themselves easily falls into one of two extremes. Either they fantasize that AI can do anything, piling on unrealistic KPIs that make their team's life miserable while they think it's all simple. Or they dismiss it entirely ("another bubble, here we go again") and miss the real window.

So the first misconception, and I think the biggest: don't start by trying to change others. Start with yourself.

2. A PhD-Level AI Writing Weekly Reports

The second one really strikes me as a shame.

A lot of top companies give their employees excellent AI infrastructure. The best models, unlimited usage, loose policies. But most people, once they get access, instinctively reach for the most routine tasks. Meeting summaries. Reports. Weekly and monthly updates. And then they stop.

I'm not saying those tasks don't matter; AI really is useful for them. But stopping there is a waste.

If you look deeper along the company's value chain, at the most painful links, whether that's marketing, sales, the product itself, or R&D, couldn't AI do something there too? You don't have to be an expert in that domain, but your industry understanding plus AI's execution ability could let you build something at those nodes.

Think about it. A PhD-level AI told to write weekly reports will dutifully write weekly reports. It does what you assign. But tell it to research cutting-edge math, biology, or medicine, to run experiments and work through deductions, and it does that well too. One's a clerk, the other's a scientist. The gap is massive.

Worklytics data says that within an organization, truly deep AI power users probably account for only 20–30%. The rest hold the exact same tools and use them only for the shallowest tasks. A BCG report from October 2025 also noted that 74% of enterprises get stuck when trying to expand AI adoption. It's not that the tools don't work. It's that users only touch one corner of them.

3. Long-Term Without Short-Term Is Unsustainable

This one is harder to spot than the first two.

After using AI for a while, a lot of people go through an emotional arc. At first they're amazed: "This is so powerful." Then they gradually shift to: "What exactly should I do?" The directions seem plentiful, all viable, but deciding specifically what to do and how to keep going is actually the hardest part.

I've fallen into this trap myself.

AI agents can do remarkable things, but they don't grant wishes. For some bigger directions, agents still burn through massive amounts of tokens and take forever. They need round after round of experimentation to explore and tune before they might yield results. They might not yield anything at all. You're at the boundary of knowledge, and probing forward was never easy. If you bet everything on projects like that, it's easy for your enthusiasm to fizzle out. You work for ages without seeing results, and when people ask what you're doing, you can't really explain it.

So you need a mix.

Short-term, you need things with fast positive feedback. My shortest feedback loop comes from working on my digital identity. Optimizing my website for SEO and having people find me through search. Writing blog posts and having readers get something out of them and want to share and engage. In between, I do small AI projects for friends. Helping a friend with a crawfish business. Making games for people. All of them show results quickly.

Mid-term, you need products that accumulate. The AIMA system, for example. When I show it to potential partners, some are willing to install it and promote it. That's a sturdier kind of positive feedback than "I ran an experiment."

And those deep, long-term explorations in the trenches keep running quietly in the background.

Kotter's eight-step change model has a step called "Generate Short-Term Wins." Same idea. Short-term results sustain confidence, giving you the nerve to keep chewing on hard problems. If the process also brings in some revenue to cover the token costs, the positive loop gets even stronger.

4. Prompt Engineering Is Yesterday's News

The last one, and I think a lot of people are still stuck here.

When people talk about using AI, they still fixate on prompts, thinking they need to master prompt engineering.

That was fine two years ago. Not anymore.

Give today's models a goal and a couple of sentences, and they'll go execute complex tasks. Prompts stopped being the bottleneck a while ago.

The bottleneck is the harness: how to build an environment where the agent can actually get work done.

What you need to think about has changed. How do you design the document structure of the working directory? How do you give it machines for experiments? When do you check if it's gone off track? When should you have it pivot direction or change methods? How do you do periodic summaries and archiving?

In early 2025 Karpathy coined the term "vibe coding": casually using natural language to have AI write code, very freeform. A year later, looking back, he said the industry had moved from vibe coding to "agentic engineering," with value shifting up from syntax and implementation to judgment, taste, and management capability. Shopify's Tobi Lutke offered another term, "context engineering." It's not about how to write a good prompt, but about how to fill the agent's context window with the right information.

At the end of the day, AI is a digital employee. When you work with an employee, you don't think the most important thing is crafting their first email, right? That email is a tiny piece. What you really need to figure out is how to set up a proper work environment and guidance that leverages your sense of direction and their execution power, while steering clear of the mistakes they're prone to make.

Shift your thinking from "how to write one good sentence" to "how to manage a digital employee," and collaborating with AI feels completely different.

Looking back, these four points are really one thing.

Start doing it yourself. Don't wait for others. Once you do, don't stay in the comfort zone. Look deeper along the value chain. Set your own rhythm so short-term feedback never dries up. And shift your attention from prompts to environment and collaboration.

The change you create doesn't need pushing. It spreads on its own.

References

BCG, From Potential to Profit: Closing the AI Impact Gap, January 2025.
McKinsey, Superagency in the Workplace: Empowering People to Unlock AI's Full Potential at Work, 2025.
Tobi Lutke (Shopify CEO), Internal Memo on AI Usage Expectations, April 2025.
Andrej Karpathy, Sequoia AI Ascent 2026: From Vibe Coding to Agentic Engineering, April–May 2026.
Tobi Lutke & Andrej Karpathy on "Context Engineering," 2025.
BCG, The Widening AI Value Gap, October 2025.
Worklytics, AI Adoption Benchmarks 2025, Q3 2025.
McKinsey, The State of AI in 2025, March 2025.
John P. Kotter, Leading Change: Generate Short-Term Wins.

Original link: https://guanjiawei.ai/en/blog/ai-transformation-not-from-training

Two Generations Was All It Took

guanjiawei — Fri, 15 May 2026 03:43:43 +0000

Yesterday I watched the footage of Trump's state visit to China, and honestly it hit me. Red carpet, military band, state dinner. Musk, Jensen Huang, Tim Cook all came along, even Defense Secretary Hegseth. First time a US president visited China in nearly nine years.

Think about who this is. The president of the most powerful country on earth, bringing some of the biggest names in tech, sitting down to talk. Not coming to lecture. Coming to negotiate.

Everyone knows Trump's style. With countries he considers weaker, he doesn't even bother pretending — might makes right. These past few years, the way many world leaders have looked standing next to him has been, frankly, painful to watch. Some of it bordered on comical.

But watch him in China. Completely different person. Polite, restrained, saying nice things.

Why? Because you're strong enough. Weak countries in today's world have no dignity to speak of.

Stand first

When the People's Republic was founded in 1949, how bad was it? A century of getting beaten from every direction. Foreign powers, the Japanese, civil war. The country had nothing left.

But the first priority wasn't getting rich. It was making sure nobody could beat you again.

The Korean War broke out in 1950. When Chinese volunteers crossed the Yalu River, the country's per capita GDP was a few dozen dollars. The Americans had tanks, artillery, fighter jets. China didn't even have an air force, and logistics were basically nonexistent. Under those conditions, over a million troops went in, fighting on guts and willingness to die, and pushed the front line back to a ceasefire.

Over a hundred thousand never came home.

Then came the Two Bombs, One Satellite program.

First atomic bomb in 1964. Hydrogen bomb test in 1967, just 32 months from fission to fusion, the fastest of any nuclear state. "Dongfanghong-1" satellite launched in 1970. China became the fifth country to independently put a satellite in orbit.

Of the 23 scientists honored for this, 10 had studied in America, 6 in Britain, others in France, Germany, the Soviet Union. They finished their studies abroad and came back. Back to a country that had nothing. In the chaos of the Great Leap Forward and the Cultural Revolution, they built world-class strategic technology.

What made it extraordinary? The window.

Try building nuclear weapons today. You can't. The treaties killed that. The window closed. Those scientists, working with raw brilliance and pure stubbornness on barren ground, grabbed it while it was still open.

From poor to prosperous

The first thirty years solved the "getting beaten" problem. Next came "going hungry."

Deng Xiaoping's Southern Tour in 1992. Reform and opening up had nearly stalled, conservative forces were gaining ground. An 87-year-old man, instead of arguing with bureaucrats in Beijing, went straight to Wuhan, Shenzhen, and Zhuhai and said one line: "Development is the only hard truth."

GDP growth went from 3.9% in 1990 to 14.3% in 1992. That same year, the 14th Party Congress formalized the "socialist market economy."

I was born right at that inflection point. For as long as I can remember, China was growing. My generation got lucky — never went hungry, never lived through a war.

There's a concept in economics called the middle income trap. The World Bank studied it: out of 101 middle-income economies since 1960, only 13 made it to high income. South Korea, Taiwan, Hong Kong, Singapore. That's the short list.

Now it's China's turn.

Per capita GNI in 2024 was roughly $13,500. The World Bank's high-income line is $14,005, a 4% gap. Probably cleared within a year or two. A 1.4-billion-person economy crossing that line. Never happened before.

Why do some countries get stuck? It's not a shortage of smart people. Some countries have plenty of brilliant minds. But the talent leaves and doesn't come back. Society itself is too fractured, no stable foundation to channel all that energy into something coherent. Having a big population and having a deep talent pool are different things. Look at India and Brazil.

The AI parallel

Back to windows of opportunity. AI works the same way.

I wrote a piece before about how the AI industry is already in wartime. Look at global AI today. The only two players that actually compete at the frontier model level are the US and China.

Stanford's 2026 AI Index has some interesting numbers. The top US model leads the top Chinese model by just 2.7%. DeepMind's CEO Hassabis himself said the gap is only "a few months."

But there's another number that's even more interesting: US private AI investment totaled $285.9 billion. China's was $12.4 billion. A 23x spending gap producing less than a 3-point performance gap. So who's more efficient?

Europe has Mistral, valued at €11.7 billion and growing. But on the frontier model leaderboards, the gap between Mistral and the US-China top tier is clear. Everywhere else isn't even in the conversation.

Why can only the US and China compete?

I think the answer is the same as why those scientists pulled off Two Bombs, One Satellite some sixty years ago. Stable environment, sustained investment in education, deep enough talent base, and making the right calls when the window was open. China now holds close to 70% of global AI patents, and leads in research output and industrial robot deployment.

Foundation, environment, timing. Take away any one and it falls apart. Same logic as sixty years ago.

None of this was guaranteed

From 1949 to 2026. Seventy-seven years. About two generations.

Zoom in a bit. My parents' generation, thirty, forty years ago, still going hungry. One generation before that, Li Hongzhang signing the Treaty of Shimonoseki after the Sino-Japanese War, ceding Taiwan and the Liaodong Peninsula. Then the Boxer Protocol after the Eight-Nation Alliance. World War II ended, China was one of the victors, and its territory was still carved up at will.

Less than a century later, this country stands at the dead center of the world stage, sitting across the table from the most powerful nation on earth as equals.

Flip through history. Britain's rise via the Industrial Revolution took the better part of a century. Germany after unification, decades. Japan from the Meiji Restoration to genuine great-power status, about the same. And all of them started from a much better position than China did.

Watching yesterday's footage, it's worth stopping to think about what it actually took. People back then carrying millet and rifles, owning nothing, trading their lives for the space to survive. Scientists building world-class technology from absolutely nothing. Then generation after generation of ordinary people grinding it out until we got here. Sitting comfortably, eating whatever we want, drinking whatever we want, living with dignity.

None of this fell from the sky.

From standing up, to getting prosperous, to sitting at the center of the world while it falls apart around you. Two generations. That's all it took.

So what about us? What does our generation do next?

References

Trump's 2026 State Visit to China — May 13–15, 2026, first US presidential visit to China in nearly nine years; Elon Musk, Jensen Huang, Tim Cook, and Defense Secretary Hegseth accompanied
Who Was on Trump's Plane to China (PBS) — delegation included multiple tech CEOs and the Secretary of Defense
Trump–Xi Beijing Summit Trade Talks (CNBC) — both sides reached "generally balanced and positive outcomes"
Deng Xiaoping's Southern Tour — Jan–Feb 1992, visited Wuhan, Shenzhen, Zhuhai, Shanghai; GDP growth surged from 3.9% to 14.3%
China in the Korean War — 1950–1953, China deployed over 1 million volunteers under extreme material disadvantage
Two Bombs, One Satellite — atomic bomb 1964, hydrogen bomb 1967, satellite 1970; 23 honored scientists
China's 32 Months from A-Bomb to H-Bomb (Bulletin of the Atomic Scientists) — the shortest fission-to-fusion timeline of any nuclear state
The Middle Income Trap and China (CEPR) — World Bank high-income threshold $14,005; only 13 of 101 middle-income economies since 1960 successfully crossed
China's Per Capita GNI Approaching High-Income Threshold (SCMP) — ~$13,500 in 2024, ~4% gap
Stanford 2026 AI Index Report — US–China AI performance gap narrowed to 2.7%; China holds ~70% of global AI patents
DeepMind CEO: US–China AI Gap Is Only "Months" — Hassabis's assessment of the US–China AI gap
US–China AI Investment Gap (Morgan Stanley) — US private AI investment $285.9B vs China $12.4B
Treaty of Shimonoseki — signed 1895 after the First Sino-Japanese War by Li Hongzhang, ceding Taiwan, Penghu, and the Liaodong Peninsula
Boxer Protocol — signed 1901 after the Eight-Nation Alliance; Li Hongzhang was China's signatory and died shortly after

Original link: https://guanjiawei.ai/en/blog/two-generations

One Hour for the Demo, Three for the Production Line

guanjiawei — Thu, 14 May 2026 08:17:04 +0000

You often see people online saying that in the AI era, reliability matters most.

The first time I saw it, it sounded like a tired cliché. Every era gets assigned its own buzzword; "intelligence" and "execution" have already had their turn. Does "reliability" actually fit the AI era any better? Not really.

A few recent projects finally drove the point home.

Three Characters a Day, Thousands of Assets Over a Hundred Days

I made a PAW Patrol reading game for my son. Three characters a day, three hundred over a hundred days. The number looks small.

The thing is, each of those hundred days is an independent mini-level. Every day needs about six or seven images and dozens of audio clips. The voices are cloned from a PAW Patrol character, each line matching a specific script. Add it up across a hundred days and you're looking at thousands of assets, easily.

The day-one demo worked. My son and I sat there playing for twenty minutes, having fun. That's exactly where the problem started.

I thought the rest was just running that demo ninety-nine more times. Turns out the real thing and the demo are two completely different beasts.

I randomly picked two clips from the first batch of twenty. Both were bad. One dropped two characters; the other's emotion completely mismatched the line. Would I dare use the other eighteen? I sampled again. Still broken. With over a thousand assets across a hundred days, how many would actually be usable? I had no idea.

That feeling of uncertainty is the critical part. It's not a minor issue; it stops you cold.

The Extra Layer Is Called Quality Assurance

Demos work because a human is watching. Generate one, listen, if it's no good try again, pick the best and keep it. The whole process is manual. The human is an invisible QA layer.

To make it fully automated, you have to swap "human in the loop" for "model in the loop." That's QA.

Sounds simple. Doing it opens a whole new world.

One Audio Clip, Scored by Three or Four Models at Once

I started digging into niche models in the industry. The usual suspects are ASR and TTS. ASR understands speech, TTS generates it. But scoring TTS output for quality? There's a whole category of models built just for that.

DNSMOS is from Microsoft. Originally built to score noise-suppression algorithms, it doesn't need the original clean audio as reference; from a single clip it judges how much noise is present and whether the overall result is listenable. Later people found it's also sensitive to TTS artifacts.

NISQA comes from Gabriel Mittag's team at TU Berlin. It includes a NISQA-TTS weight specifically for TTS naturalness. Instead of a single score, it breaks things down into dimensions: noise, coloration, discontinuity, loudness.

UTMOS is from the SaruLab team at the University of Tokyo, winner of the VoiceMOS Challenge 2022, and now the de-facto baseline for TTS scoring. I use it as the outermost backstop.

Finally there's a reverse ASR pass: feed the generated audio through Whisper, compare the transcript to the original script, and reject it if the gap is too big. It's the crudest check, but the most reliable.

Add up the four scores; pass the threshold and it's good, fail and it triggers regeneration. I spent a day wiring it up, and the output was clearly better than running TTS alone.
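The gate described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline: it assumes the three MOS-style scores arrive as plain floats on a 1 to 5 scale (however they were computed), that the Whisper transcript is already available as a string, and that `threshold` and `max_cer` are placeholder values. `ClipScores`, `char_error_rate`, and `passes_qa` are hypothetical names.

```python
from dataclasses import dataclass


def char_error_rate(script: str, transcript: str) -> float:
    """Levenshtein distance between script and transcript, divided by script length."""
    if not script:
        return 0.0 if not transcript else 1.0
    prev = list(range(len(transcript) + 1))
    for i, s_ch in enumerate(script, start=1):
        curr = [i]
        for j, t_ch in enumerate(transcript, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(prev[j] + 1,          # script character missing from transcript
                            curr[j - 1] + 1,      # extra character in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(script)


@dataclass
class ClipScores:
    dnsmos: float      # assumed 1-5 scale
    nisqa: float       # assumed 1-5 scale
    utmos: float       # assumed 1-5 scale
    script: str        # the line the clip was supposed to say
    transcript: str    # what the reverse ASR pass heard


def passes_qa(clip: ClipScores, threshold: float = 14.0, max_cer: float = 0.1) -> bool:
    """Hard-reject on transcript mismatch, then compare the summed scores to a threshold."""
    cer = char_error_rate(clip.script, clip.transcript)
    if cer > max_cer:
        return False
    asr_score = 5.0 * (1.0 - cer)  # map transcript agreement onto the same 0-5 range
    total = clip.dnsmos + clip.nisqa + clip.utmos + asr_score
    return total >= threshold
```

A clip whose transcript drops two of ten characters fails the transcript check outright, however natural it sounds. A caller would regenerate whenever `passes_qa` returns `False`.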

From 1× to 3× Is Not an Exaggeration

But the cost went through the roof.

Before adding QA, I figured it would add maybe 50% more time. Pick out the bad ones and regenerate. Worst case, the new batch is all bad too, so you run it again. At most 1.5×.

In reality, one hour became three.

The reason is that the model simply can't clear a certain line. The same script, different random seeds, seven, eight, nine tries and it still can't pass the QA threshold. Sometimes you have to fall back to changing the prompt, the speaking rate, the emotion tag, just to squeeze it through. Every regeneration is a full model call, burning tokens each time.

Run this in the cloud, billed by the minute or by the call, and racking up a hefty bill in minutes is no exaggeration. I later did a rough calculation: a voice generation task I had planned to run entirely in the cloud would cost roughly 7 to 10× the demo bill once you factor in retry rates.

This was the math I hadn't done: from demo to production line, costs jump by multiples, not percentages.
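A rough way to see why the 1.5× guess fails is to model retries explicitly. The sketch below assumes every attempt passes QA independently with the same probability, which is optimistic: as noted above, some scripts keep failing no matter the seed. `expected_attempts` is a hypothetical helper, and the pass rates in the loop are illustrative, not measured.

```python
def expected_attempts(pass_rate: float, max_attempts: int) -> float:
    """Expected number of generation calls per clip, capped at max_attempts.

    Assumes each attempt passes QA independently with probability pass_rate.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    fail = 1.0 - pass_rate
    # Attempt k+1 only happens if the first k attempts all failed.
    return sum(fail ** k for k in range(max_attempts))


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.3, 0.1):
        calls = expected_attempts(p, max_attempts=9)
        print(f"pass rate {p:.0%}: {calls:.2f} calls per clip")
```

The 1.5× guess corresponds to a pass rate of about two in three. Once the pass rate falls toward one in three, the expected call count approaches 3×, before counting the cost of running the scorers themselves.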

The Same Thing Happened Again on Another Project

Recently I've been playing with another project called MiroFish.

It's a pretty interesting open-source project by Guo Hangjiang, an undergraduate in China. It hit #1 on global GitHub Trending in March 2026, with investment from Chen Tianqiao of Shanda Group. It generates a large population of agents with personalities, memories, and social relationships, then runs them across two simulation platforms where they discuss, debate, form alliances, and shift opinions. Finally, a ReportAgent summarizes the conclusions of the entire evolution to predict how an event will unfold.

My config wasn't large. About 54 agents per event, across a 20-round timeline. Every round requires every agent to run once. 54 × 20, roughly 1,000 full calls.

I used Kimi K2.6 Thinking. The problem is you can't turn off thinking mode; it thinks before every output. Thousands of thinking tokens per call is normal. Multiply by 1,000, and the token burn hurts.
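The burn is easy to estimate before launching a run. In the sketch below, the agent and round counts come from the configuration above; the per-call thinking-token figure and the price are placeholders chosen only for illustration, not Kimi's actual numbers. `simulation_budget` is a hypothetical helper.

```python
AGENTS = 54                        # from the configuration above
ROUNDS = 20                        # from the configuration above
THINKING_TOKENS_PER_CALL = 3_000   # assumption: "thousands" per call
PRICE_PER_MILLION_TOKENS = 2.50    # assumption: placeholder price, not a real quote


def simulation_budget(agents: int, rounds: int,
                      tokens_per_call: int, price_per_million: float):
    """Return (calls, thinking tokens, cost) for one simulated event."""
    calls = agents * rounds                    # every agent runs once per round
    tokens = calls * tokens_per_call
    cost = tokens / 1_000_000 * price_per_million
    return calls, tokens, cost


if __name__ == "__main__":
    calls, tokens, cost = simulation_budget(
        AGENTS, ROUNDS, THINKING_TOKENS_PER_CALL, PRICE_PER_MILLION_TOKENS)
    print(f"{calls} calls, {tokens:,} thinking tokens, about {cost:.2f} per event")
```

This counts thinking tokens only. Context and output tokens add to the total, and every extra event or rerun multiplies it.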

After a few runs, I started wondering: does this scenario really need a top-tier model?

Each agent, on its turn, just scans the context, says a line or casts a vote based on its persona, then gets aggregated. The intelligence threshold for each call is actually low. Swap in a model from last year, something around GPT-4o level, and the results are probably similar, only faster.

The Mid-Tier Slot That No One Has Clearly Defined

For the past year, one question has gone unanswered: what scenarios actually need a second-tier model? Everyone races for the best and most expensive, leaving the mid-tier in an awkward spot.

I now see two very specific slots.

First is quality assurance. Judging whether an audio clip sounds natural, whether an image matches the style, whether a conversation stays on topic: these tasks need only mid-tier intelligence. Using a top model here is like using Claude Opus to review GPT-4o's code. It works, but it's not cost-effective. A lightweight vision model plus a specialized scorer like NISQA costs far less than one top-tier call.

Second is large-scale agent simulation. A setup like MiroFish strings together 1,000 inferences to reach a collective evolutionary result. It's not sensitive to the quality of any single call, but extremely sensitive to total cost. The "best model" for this scenario isn't the smartest; it's whatever gives you the best mix of per-token price and inference speed.

These two scenarios hadn't been clearly spelled out because few people were actually batch-producing content at industrial scale. Once you need to generate thousands of audio clips or tens of thousands of agent inferences, these two slots jump right out.

The Real Reason for Going Local

This is also when I finally understood why local compute matters so much.

The two reasons usually cited for local deployment: speed and privacy. Both are valid, but neither is the most critical.

The real killer is cost per call.

An industrial pipeline is bound to retry heavily. Cloud TTS is billed by duration, LLM APIs by the token. Every retry is another invoice. Local is different. A DGX Spark running open-source models like F5-TTS or VoxCPM incurs almost no marginal cost beyond electricity. Leave it running for a day and you get enough material for a week. Failed? Run it again, no big deal.

This is the fundamental difference between cloud and local models in industrial scenarios. The former charges by usage; the latter only charges once for hardware. In a high-retry-rate pipeline, that gap gets magnified by orders of magnitude.

The reason local deployment never made sense in past discussions is that everyone compared it to demo costs. A demo TTS run costs pennies. Set that against a local machine costing thousands, and the math never works. But compare it to industrial-scale costs, factoring in retry rates, QA, and agent simulation, and the math flips immediately.
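The flip can be made concrete with a break-even count. The sketch below is a simplification under stated assumptions: hardware is a one-time cost, cloud is billed per call, and the retry rate is the same on both sides. `break_even_clips` is a hypothetical helper, and all figures in the example are placeholders rather than real prices.

```python
import math


def break_even_clips(hardware_cost: float,
                     cloud_cost_per_call: float,
                     calls_per_accepted_clip: float,
                     local_power_cost_per_call: float = 0.0) -> int:
    """Number of accepted clips after which local hardware beats the cloud bill."""
    cloud_per_clip = cloud_cost_per_call * calls_per_accepted_clip
    local_per_clip = local_power_cost_per_call * calls_per_accepted_clip
    saving_per_clip = cloud_per_clip - local_per_clip
    if saving_per_clip <= 0:
        raise ValueError("cloud is not more expensive per clip; no break-even point")
    return math.ceil(hardware_cost / saving_per_clip)


if __name__ == "__main__":
    # Placeholder numbers: a 4,000 machine versus 0.05 per cloud call.
    for retries in (1, 3, 7, 10):
        n = break_even_clips(4000, 0.05, calls_per_accepted_clip=retries)
        print(f"{retries} calls per accepted clip: break-even after {n} clips")
```

At demo retry rates the break-even sits far beyond anything a hobby project produces. As calls per accepted clip climb, it shrinks in proportion.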

Three Tiers, Three Positions

Writing this, I suddenly realized that industrialized AI content production actually needs three tiers of models running simultaneously.

Top-tier models at the front, handling the hardest generation tasks. Expensive per call, but you don't run them often.
Mid-tier models for QA and agent simulation, handling high-volume, medium-intelligence tasks. Called repeatedly, so each call must stay cheap.
Local models at the bottom doing the heavy lifting. Asset generation, vectorization, transcription, alignment, the grunt work. If it can run locally, don't send it to the cloud.
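One way to picture the three positions is as a routing table. The sketch below is purely illustrative: the task labels and the `route` helper are hypothetical names, and the mapping simply restates the three tiers above rather than describing any existing framework.

```python
from enum import Enum


class Tier(Enum):
    TOP = "top-tier cloud model"
    MID = "mid-tier model"
    LOCAL = "local model"


# Hypothetical task labels, grouped the same way as the three positions above.
TIER_BY_TASK = {
    "hard_generation": Tier.TOP,
    "qa_scoring": Tier.MID,
    "agent_simulation": Tier.MID,
    "asset_generation": Tier.LOCAL,
    "vectorization": Tier.LOCAL,
    "transcription": Tier.LOCAL,
    "alignment": Tier.LOCAL,
}


def route(task: str, runs_locally: bool = False) -> Tier:
    """Pick a tier for a task; anything that can run locally stays local."""
    if runs_locally:
        return Tier.LOCAL
    try:
        return TIER_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task kind: {task}") from None


if __name__ == "__main__":
    for task in ("hard_generation", "qa_scoring", "transcription"):
        print(f"{task} -> {route(task).value}")
```

The `runs_locally` override encodes the last rule in the list: if a task can run on the local machine, it does not go to the cloud.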

You won't find these three tiers in any official tutorial; the setup is still evolving. But once you actually get your hands dirty with industrial content production, you'll end up piecing together this structure yourself.

Looking back at that opening line that sounded like empty talk, I actually think it understated things. In the AI era, what matters most isn't "reliability" itself; it's the cost curve of reliability. From demo to production line, that curve starts at 3×.

Understand that curve, and you know how to spend money. Otherwise you'll budget for 1× and get a bill for 7×.

Original link: https://guanjiawei.ai/en/blog/from-demo-to-production

Three Questions After the AI Job Wave

guanjiawei — Wed, 13 May 2026 04:35:47 +0000

The drive to hire is weak right now.

Before, when you wanted to build something bigger, your gut reaction was "we need more people." Now it's the opposite: "how do we squeeze more out of who we already have?" Even bringing on interns feels less appealing.

It's a small shift in sentiment, but it points to something that isn't reversing.

Two days ago, Kim Yong-beom, policy chief at the South Korean presidential office, posted on Facebook: the "excess returns" of the AI era shouldn't belong only to individual companies; part should flow back to the public as a "citizen dividend." The next day, the KOSPI plunged 5.1%. He later clarified he wasn't suggesting confiscating profits, only discussing how to spend the "excess tax revenue" created by AI dividends. The market settled.

A single Facebook post moving the market 5% tells you the issue is already on the table.

The context is plain enough. SK hynix posted a 72% operating profit margin in the first quarter; spread across employees, bonuses averaged nearly $500,000 per person. Samsung Electronics' semiconductor division logged 53.7 trillion won in operating profit for the same period, but the 74,000 workers represented by the union received a far smaller slice than their SK counterparts. The Samsung union has threatened an 18-day strike starting May 21. The workers aren't after a slightly larger bonus. They want a respectable cut of the AI dividend chain.

I see this as the middle of three questions AI is really throwing at society. Ahead of it is the unemployment question. Behind it is a deeper one about identity. The three are linked.

Question One: Net Job Loss Is the Pattern

First, let's get the definitions straight.

Mainstream reports don't actually agree on "global net job losses." The World Economic Forum's Future of Jobs Report 2025 still predicts a net increase of 78 million jobs by 2030. The ILO's GenAI exposure index uses "transformation" rather than "replacement": one-quarter of jobs globally are touched by GenAI, one-third in high-income countries. The IMF uses the broadest brush: 40% globally, 60% in advanced economies.

But macro models are neat; actual labor pools are not. When a clerk gets "transformed" into an "AI-collaborating high-efficiency role," the macro report counts that as transformation. For the clerk, it's unemployment. New jobs like AI product managers, data governance specialists, model evaluators, and robot maintenance technicians may not be in the same city, may not suit the same age group, and may not absorb the former customer service reps, copywriters, junior programmers, and admin staff. Someone always gets pushed off along the way.

The current data already shows how painful this transition is. In Challenger's March 2026 U.S. layoff report, AI was the top reason given for cuts: 15,341 people, 25% of all monthly layoffs. Tech layoffs in the U.S. reached 52,050 in Q1, up 40% year over year. Goldman Sachs estimates that over the past year AI has eliminated roughly 25,000 jobs a month while creating fewer than 9,000. Net loss: at least 16,000 jobs a month. Gen Z and entry-level white-collar workers are hit hardest. Research from the Stanford Digital Economy Lab also shows that in the jobs most exposed to AI, employment among 22- to 25-year-olds has dropped markedly.

This is what I mean by "net loss." I'm not forecasting global employment in 2030; I'm looking at the real demand for specific occupations, specific age groups, and specific companies right now. When a company realizes that ten people plus AI can do what used to take fifteen, it doesn't first ask a macro model whether new jobs will appear in five years. It freezes hiring, trims headcount, and cuts peripheral roles.

The sectors that traditionally soaked up labor are running in reverse, too. Autonomous vehicles are replacing drivers; drones are replacing delivery riders; industrial robots are replacing line workers. IFR World Robotics data shows global manufacturing robot density doubled in seven years. In China in 2023 there were 470 robots per 10,000 manufacturing employees. New infrastructure no longer naturally brings large numbers of low-barrier jobs the way building bridges and roads once did. Computing centers, ultra-high-voltage grids, battery plants, and dark factories are all capital-intensive and light on labor.

Some pin their hopes on "one-person companies." OPCs have been hyped plenty over the last couple of years, and I do think they'll become a real new organizational form. By June 2025, China had over 16 million one-person limited liability companies; 2.86 million were newly registered in the first half of 2025 alone, up 47% year over year. Shangcheng District in Hangzhou is already piloting OPC community policies.

But judging whether OPCs can carry employment means looking past the headline numbers and anecdotes. Most of those 16 million are traditional self-employed operations and micro-entities that existed long before AI. Two things matter: the growth rate, and what share of that growth can actually sustain middle-class incomes.

The growth rate is eye-catching: 47% year over year. But the distribution is ugly. Industry reports show OPC revenue is extremely long-tail: more than half are still stuck in a product-validation phase earning a few thousand yuan a month; fewer than one in ten steadily clear a million yuan a year. Even with an optimistic 10%, only 600,000 of the roughly 6 million new OPCs each year (2.86 million per half-year, annualized) would reach middle-class levels.

China's 2025 statistical bulletin puts year-end employment at 725 million. The IMF's exposure estimate is 60% for advanced economies. Use a more conservative 30% for China, and that's still over 200 million people. 600,000 versus 200 million: more than two orders of magnitude apart. OPCs will buoy some super-individuals, but they can't hold up the labor market.
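The comparison can be checked with a few lines of arithmetic. Every input below is taken from the text; the only liberty is doubling the half-year registrations to annualize them, which is a simplification.

```python
NEW_OPCS_H1_2025 = 2_860_000          # new registrations, first half of 2025
ROUNDED_NEW_OPCS_PER_YEAR = 6_000_000 # the rounded annual figure used in the text
MIDDLE_CLASS_PERCENT = 10             # the optimistic share used in the text

EMPLOYED = 725_000_000                # year-end employment, 2025 bulletin
EXPOSURE_PERCENT = 30                 # the conservative exposure share used in the text

annualized_new_opcs = 2 * NEW_OPCS_H1_2025    # 5.72 million, rounded up to 6 million
opc_middle_class = ROUNDED_NEW_OPCS_PER_YEAR * MIDDLE_CLASS_PERCENT // 100
exposed_workers = EMPLOYED * EXPOSURE_PERCENT // 100
ratio = exposed_workers / opc_middle_class

if __name__ == "__main__":
    print(f"annualized new OPCs: {annualized_new_opcs:,}")
    print(f"OPCs reaching middle-class income per year: {opc_middle_class:,}")
    print(f"workers exposed at 30%: {exposed_workers:,}")
    print(f"exposed workers per successful OPC: {ratio:.1f}")
```

The ratio comes out at 362.5 exposed workers for every OPC that reaches a middle-class income in a given year.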

Why is this shock so sharp? I boil it down to one word: concentration.

SK hynix posted 37.6 trillion won in Q1 operating profit with roughly 35,000 employees company-wide. That's roughly 1 billion won in operating profit per employee for the quarter, over 4 billion won annualized, or more than 20 million RMB per person. Not every employee actually creates that much, but the number makes it viscerally clear how few hands the AI dividend is squeezed into.
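The per-employee figures follow directly from the two numbers above. In this sketch the profit and headcount come from the text, while the exchange rate of 200 won per yuan is an assumption for illustration only, and annualizing by multiplying by four assumes every quarter looks like Q1.

```python
Q1_OPERATING_PROFIT_KRW = 37.6e12   # from the text
EMPLOYEES = 35_000                  # from the text, approximate
KRW_PER_RMB = 200.0                 # assumption: rough exchange rate for illustration


def profit_per_employee(q1_profit_krw: float, employees: int, krw_per_rmb: float):
    """Return (won per employee for the quarter, won annualized, yuan annualized)."""
    per_quarter = q1_profit_krw / employees
    per_year = per_quarter * 4          # naive annualization: four quarters like Q1
    per_year_rmb = per_year / krw_per_rmb
    return per_quarter, per_year, per_year_rmb


if __name__ == "__main__":
    quarter, year, year_rmb = profit_per_employee(
        Q1_OPERATING_PROFIT_KRW, EMPLOYEES, KRW_PER_RMB)
    print(f"per employee, Q1: {quarter / 1e9:.2f} billion won")
    print(f"per employee, annualized: {year / 1e9:.2f} billion won")
    print(f"per employee, annualized: {year_rmb / 1e6:.1f} million RMB")
```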

Xiaomi isn't as extreme, but the direction is the same. In 2025 the group recorded 457.3 billion yuan in revenue; its smart EV and AI innovation business contributed 106.1 billion yuan at a 24.3% gross margin, delivering 410,000 vehicles for the year. Automakers used to compete on production capacity; now they also compete on algorithms, supply-chain software, automated production lines, and data loops. Capacity is scaling up; headcount isn't scaling with it.

Since the Industrial Revolution, every major industrial wave has pulled new job chains along with it. Labor scale and industrial scale mostly moved together. This AI wave runs the other way: the greater the output, the fewer people it needs. Chips, cloud, models, data centers, plus the handful of teams that can push AI to its limits. These swallow most of the dividends.

You can think of it this way: one person out of a thousand, armed with super-productivity, flattens part of the work that the other 999 used to do. Not all jobs are erased, but the demand curve flattens. I don't see any new direction that could regenerate labor demand on that scale in the same window.

Accepting this is a prerequisite for discussing the next two questions. Net job loss isn't a panic slogan; it's a magnitude mismatch already happening in local labor markets.

Question Two: Distribution Needs a Reset

Which brings us to the second question: distribution.

The Korean incident is a template for the distribution question. Once AI dividends concentrate in a handful of companies, who gets them? Shareholders, executives, core engineers, all employees, or society at large through taxes and public spending? This question will inevitably spread from Korea to Japan, Europe, and China.

If concentrated dividends are still distributed under old rules, the outcome is almost predetermined. The bulk goes to shareholders and top management; core employees get hefty bonuses; ordinary workers get fixed salaries; and most people outside the supply chain only feel rising prices and rents, fewer openings, and tougher competition. This structure was already widening inequality; AI only steepens the curve.

The group squeezed hardest isn't the lowest earners. It's the middle class, especially those living on salaries, professional skills, and stable jobs.

The reason is simple. Wages are the most perfectly taxed form of income; there's nowhere to hide. In China's individual income tax system, comprehensive income faces progressive rates from 3% to 45%. Salaries are withheld at source, and social insurance plus tax are deducted before the money ever hits your hands. High-salary workers should certainly pay tax. The problem is that wealthier people have far more types of income: capital gains, equity incentives, dividends, corporate structures, trusts, family offices, cross-border arrangements. I'm not saying these are illegal. I'm saying they have more choices of tax base and more room to defer.

This structure was already obvious in the internet era; in the AI era it will only get worse, because the core gains of the industry concentrate in fewer hands. The EU Tax Observatory's global tax evasion report also notes that billionaires worldwide face extremely low effective tax rates relative to their wealth. The more mobile wealth becomes across borders, the harder it is for any single country to carry out redistribution alone.

The other side of the middle-class squeeze is how fast they are being replaced. The jobs AI currently hits easiest are middle-class occupations: programmers, designers, customer service reps, junior legal staff, junior analysts, copywriters, translators, operations specialists, admin staff, finance assistants. They shoulder the heaviest taxes and face the fastest replacement. They are the ones hurting most in the current structure.

Back to Korea. The Samsung and SK unions aren't fighting over a one-time bonus; they're fighting over a long-term rule. The companies will only offer a "special bonus." The unions want the profit-sharing ratio locked into a formal agreement that takes effect every year. On the surface it's about the bonus amount. In reality it's about whether this distribution rule will still hold next year.

Using "excess profits" or "excess tax revenue" for redistribution isn't a new framework in itself. Nordic countries have been running this for decades. Denmark's top marginal income tax rate rises to 60.5% in 2026. Sweden, Finland, and Norway have long maintained high labor-tax burdens and public services. The OECD's Taxing Wages also shows that the average tax wedge on labor in European countries is markedly higher than in the U.S. or Korea.

But the AI era introduces a new problem: productivity itself can move.

Heavy-asset, fab-heavy players like Samsung and SK hynix can't move; the Korean government can at least capture some corporate income tax and supply-chain revenue. But the more typical AI business doesn't look like that. Compute is rented in Singapore; the company is registered in Ireland; the team is spread across five time zones; settlements run through global payment networks. Teams of three to five people generating hundreds of millions in revenue will become more common, and nations have far fewer levers to tax them than they had with traditional manufacturing.

So an "AI tax" can't be read as simply slapping higher taxes on a few companies. It's more like a bundle of questions. What is the tax base? Compute, profits, capital gains, data, or the labor costs displaced by robots? And who receives the revenue? Is it poured into new infrastructure, or used to shore up social security, education, health care, pensions, unemployment insurance, even direct cash transfers to residents?

What needs guarding against here is path dependence. Many countries have grown used to propping up the economy with investment and infrastructure, but AI-era infrastructure may not prop up employment. Building more computing centers, data centers, ultra-high-voltage grids, and battery plants will likely continue to raise the productivity of leading firms, benefiting capital and a narrow slice of high-skill jobs, while doing little directly for displaced middle-class and low-income workers.

This is why people at OpenAI have been talking publicly about UBI for years. OpenResearch, funded by Sam Altman, ran a three-year experiment in Texas and Illinois: 1,000 low-income participants received $1,000 per month, alongside a control group of 2,000. The results, published in 2024, weren't miraculous. Recipients worked an average of 1.3 fewer hours per week, had a 2 percentage-point lower employment probability, and saw household income excluding subsidies decline. But they were more proactive in looking for work, valued meaningful work more, had more room to relocate, see doctors, and plan long term, and were more likely to have entrepreneurial ideas.

This experiment matters, not because it proves UBI is right, but because it drags the debate from slogans back to data. Cash doesn't automatically make people stop working, nor does it automatically give them dignity. What it provides is a buffer and choice. For a society with excess productivity and rapid job restructuring, choice itself may be infrastructure.

I don't think UBI is the standard answer for the AI era. But it's one of the few options that has been seriously tested and has data behind it. Compared with patching old rules, it at least offers a different starting point.

Question Three: Where Does Value Come From for People Who Don't Work?

This is the hardest of the three. The first two can still be moved forward with policy, tax systems, and redistribution. This one cannot.

In the Chinese context, "not working" is a very heavy verdict. At family gatherings, when someone asks "What do you do?" the expected answer is an occupation. If you reply, "I don't currently have a job," the atmosphere changes instantly. This isn't just about face.

The National Bureau of Statistics' 2025 bulletin lists 5.95 million urban residents and 33.4 million rural residents on subsistence allowances at year-end. China does have a welfare system. But subsistence allowances and relief still carry stigma in many places. Families who qualify but don't apply have always existed. The reason isn't insufficient money; it's the fear of being whispered about for "living off handouts."

This sense of shame runs deep. Our generation grew up on a narrative that said "Work hard and you'll be rewarded; effort deserves respect." Education, media, and the people around you all tell you the same thing: your value equals your output. Labor is the anchor of identity; salary is the measure of it. I wrote a piece on AI anxiety before, touching on the other side of this. When AI fortune-telling trends and young people flock to mysticism for certainty, what's really happening is that this anchor is loosening.

AI has simply brought this narrative's expiration date forward. What it truly shakes isn't just income. It's the sense of identity. You receive a basic income, food and housing are covered, friends respect you, but you wake up with nothing to look forward to. That hollowness is something policy cannot answer.

A society whose time has been freed by AI doesn't lack welfare distribution points. It lacks a narrative that can give people a new identity. This isn't something engineers can code or models can compute. It requires telling a new story about what kind of person is worthy of respect and what kind of life is decent.

A thousand years ago the story was "study to become an official"; a hundred years ago it was "industry saves the nation"; thirty years ago it was "go into business." Over the last decade or so it was "get into a big tech firm," "buy an apartment," "start a company," "financial freedom." What it should be in the AI era, no one can yet say.

Closing

The three questions are linked. The first makes the second urgent. No matter how well the second is handled, the third cannot be avoided.

The market tremor triggered by that May 11 proposal in Korea is only the opening act. I expect these discussions to spread to Japan, Europe, and China in the second half of the year. Every country will craft different answers based on its own politics and culture. AI taxes, UBI, tax-base reforms, new infrastructure, promoting one-person companies—each will have its trial runs. Trial and error itself is part of the answer.

What individuals can actually do isn't complicated: build more skills, keep more capital on hand, and don't let any single narrative sweep away your emotions. What society must do is harder: stop brushing things off with "new jobs will always appear," and stop pushing the unemployed back into shame. No one can answer all three questions at once. We can only walk through them one by one.

References

Original link: https://guanjiawei.ai/en/blog/three-questions-after-jobs

AI Vanguard: 10 Weeks Left

guanjiawei — Tue, 12 May 2026 13:11:13 +0000

A few recent numbers, taken together, are quite telling.

Sam Altman has clearly been energized since the GPT-5.5 launch. Codex weekly downloads surged to 90 million. Paid users climbed from 3 million in March to over 4 million by late April. My own gut feeling lines up with that. Before 5.5, I had one $200 Codex account; now I have four, at $800 a month. Many are saying 5.5 shouldn't be called 5.5—it should be GPT-6. I once posted that "this is not a minor version," and that claim is being validated.

The Anthropic side is even more extreme. Dario Amodei did the math in a CNBC interview last week. From the start, they bet on exponential AI growth and built infrastructure for "10x per year." Q1 growth annualized to 80x. Annualized revenue ran from $9 billion at the start of the year to $30 billion in April. Infrastructure got crushed, so they signed a deal with SpaceX to lease the entire 220,000-GPU Colossus 1 data center in Memphis, unlocking 5-hour usage limits for Claude Pro / Max users.

That's the backdrop.

Against that backdrop, I spent a few hours yesterday talking with a friend. He uses AI heavily, knows his way around Terminal, builds things on open-source frameworks like OpenClaw, and is a power user at his company. A firm offered him a role as an AI transformation expert and he asked what I thought.

After the conversation, I realized a few judgments from it are worth pulling out on their own.

1. When Evaluating an Offer, Look for "Unrestricted Access to the Strongest Models"

I told him that at this particular moment, the offer itself isn't the most important thing. You're already in a good spot.

What truly matters is whether the platform can give you resources for near-unlimited use of the absolute best models, plus enough freedom to tinker in whatever direction you want.

What do I mean? Someone like me, with a bit more tenure, has ways around resource limits: I can burn $1,000 a month out of my own pocket for top-tier access and set my own direction. But for younger people just getting fired up about AI, the company still matters, because during work hours you're creating value for someone else. If the model isn't the best, you're always one step behind. You can feel the gap, but you can't close it.

Without that foundational access, it's actually very hard to be an AI transformation expert.

2. Stop Running GPT-5.5 on Medium Effort

The second thing: I guessed right. I asked him, "When you use GPT-5.5, you don't crank effort to the max, do you? You dial effort down or do 'routing' to save money and control costs? Or you just switch to Kimi, DeepSeek, or MiniMax?"

He confirmed it. Most people I know use them that way.

I think there's a problem here.

I've reached a conclusion through repeated trial and error. Back with Opus 4.6, dropping effort from high to medium on the same model slashed accuracy from around 80% to roughly 30%. Same model, just one effort tier lower, and the entire workflow performed completely differently. After that, I never used medium effort again; Claude has stayed on x high ever since.

With GPT, I've been on x high from day one. The reason is simple. If a task runs well in Claude Code, I generally won't bother trying GPT. The moments that made me think "this thing is genuinely different" all came at maximum effort. That was true in the 5.4 era, and the gap is even more pronounced since 5.5.

So in one sentence: don't optimize costs prematurely at this stage.

Let's do the math. One top-tier model, one account, $200 a month. Two accounts give you more than enough headroom to run nearly 24/7 on a single vertical task. That's $400 a month, or about 3,000 RMB. That's cheaper than hiring an intern, yet the output approaches the quality of a PhD-level researcher. What is the point of saving that small amount of money?

I currently spend $1,000 a month on myself, rotating across five accounts. If I switched to API calls at the same intensity, it would cost roughly $10,000.
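
The arithmetic above can be laid out as a rough sketch. The $200 plan price, the account counts, and the roughly $10,000 API estimate come from the text; the function and variable names are made up for illustration, and the API figure is my own ballpark rather than a measured bill.

```python
PLAN_PRICE_USD = 200  # per account per month, as quoted in the text


def monthly_subscription_cost(accounts: int, price: int = PLAN_PRICE_USD) -> int:
    """Flat-rate cost of running several subscription accounts side by side."""
    return accounts * price


two_accounts = monthly_subscription_cost(2)   # the "nearly 24/7 on one task" setup
five_accounts = monthly_subscription_cost(5)  # my current rotation

API_ESTIMATE_USD = 10_000  # rough guess from the text for the same intensity via API
api_multiple = API_ESTIMATE_USD / five_accounts

print(f"two accounts:  ${two_accounts}/month")
print(f"five accounts: ${five_accounts}/month")
print(f"API at the same intensity: about {api_multiple:.0f}x the subscription cost")
```

The point of the comparison is the multiple, not the exact dollars: at this usage level the flat-rate plans are roughly an order of magnitude cheaper than metered calls.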

Why not optimize? Because right now you don't know where the strongest model's boundary lies. The researchers don't know either—they haven't tested it in your domain. This is uncharted territory. What you need to do is take the strongest, most expensive, highest-effort setup and slam against that boundary, pushing it to the limit. If it truly can't do something, you'll know for certain.

As for "why can't we control costs first like traditional software"—wait until everyone has mapped out the boundaries and we're in the mass-deployment phase, then consider cost-effectiveness. We're not in that phase right now.

3. Ten Weeks Left

My friend asked: What about time? Can't I just take more time and experiment slowly?

I said, let me do the time math for you.

The world hasn't been broadly stunned yet because top-tier models are coming too fast, too dense, with intervals too short. From my own use, I've found something counterintuitive. Even with the most elite models like GPT-5.5 and Claude Opus, cranked to maximum effort, it still takes time to run a valuable direction to completion. It's fast, but not so fast that you "think it and it's instantly done." It proceeds step by step. What used to take a team months gets compressed to weeks. But you still have to walk through the steps.

Plot this on a timeline:

GPT-5.5 launched on April 23—two weeks ago.
At that point, a cohort already realized this time was different and started running their most ambitious directions on top of it.
Conservatively, within three months, a batch of things will ship from unimaginable people, unimaginable directions, unimaginable domains.

Three months = 12 weeks. Two have passed. Ten remain.
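
The same countdown as a small sketch. It assumes the year is 2026 (taken from this post's date stamp), uses the post date as "today," and counts only whole weeks elapsed; the names are illustrative.

```python
from datetime import date, timedelta

LAUNCH = date(2026, 4, 23)   # GPT-5.5 launch date from the text; year assumed
AS_OF = date(2026, 5, 12)    # date of this post
HORIZON_WEEKS = 12           # "three months = 12 weeks"


def weeks_left(launch: date, as_of: date, horizon_weeks: int) -> int:
    """Horizon minus the number of whole weeks already elapsed, never below zero."""
    elapsed_weeks = (as_of - launch).days // 7
    return max(horizon_weeks - elapsed_weeks, 0)


deadline = LAUNCH + timedelta(weeks=HORIZON_WEEKS)
remaining = weeks_left(LAUNCH, AS_OF, HORIZON_WEEKS)

print(f"window closes around {deadline.isoformat()}")
print(f"whole weeks remaining: {remaining}")
```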

In ten weeks, remarkable results will enter the world and make everyone realize "the world is different." These results won't wait for you. In ten weeks, your market position, your label, your place in the seniority hierarchy may all need to be recalculated.

Is ten weeks long? No. It's short enough that you cannot afford to waste a week building an inefficient workflow or hesitating over which tool to use.

4. Don't Use IDEs, and Don't Mess with Third-Party Frameworks

Next question: so what should I use?

My answer might offend some people, but I want to be clear.

Don't use IDEs. In the coding-agent scenario, the IDE is, in my view, a form factor that is already obsolete. It's especially unfriendly to people who aren't already senior programmers. You have to spend time learning a complex interface that means nothing to you, and in the end you still can't read the code the agent writes. And the agent ends up squeezed into a tiny Terminal pane in the middle of it all, which is fundamentally awkward.

The name "Terminal" has done Claude Code and Codex a disservice. When many people hear CLI or Terminal, they immediately think "programmer stuff." But after I helped some friends with zero programming background install Claude Code and Codex, not a single one said "I can't figure this out." It shows you what's important, compresses unimportant steps into logs, and you just judge at the key decision points. Beginners actually pick it up faster than an IDE.

If you can avoid it, don't use third-party frameworks like OpenClaw or Hermes. This is a bit counterintuitive. I have indeed helped people install these before. But looking at it now, Terminal plus the official CLI has matured to the point where it can do everything those frameworks do, and better.

Why? Because the official CLI is tailored to the official model's behavior. Claude Code connects to Claude models; Codex connects to GPT models. Caching mechanisms, error recovery, risk guardrails, context compression—all tuned for that specific model. Switch to "using OpenClaw with GPT" or "using Claude Code with Kimi," and it may run in theory, but in practice the effect is noticeably worse.

A recently popular open-source project illustrates this point. Someone built a Deep Code CLI specifically for DeepSeek V4, similar in form to Claude Code but tailored solely for the DeepSeek model. Many find this counterintuitive: isn't a tool like this supposed to "connect to every model"? This path is actually the right one. Each model has its own behavior; a harness customized around a specific model delivers better results and better cost efficiency.

5. Never Use a Completely Black-Boxed Agent

OpenClaw has another "advantage" that I find dangerous. It can dispatch tasks remotely and deliver results without you watching the process. Sounds great.

For some people this is a good thing. But for those exploring the boundaries, this feature should be disabled.

The foundation of collaborating with an agent is understanding. What kinds of work it does well, what it does poorly, what cognitive habits it has—you only learn these by watching it work. Once it becomes a black box, what you lose is judgment, not just the details.

Managing AI is like managing a new hire. The fastest way to learn is to watch them do every step. Treating it as a wishing machine that gives you results won't make you a stronger leader.

6. Want to Operate Anywhere, Anytime? SSH + tailscale + tmux Is Enough

My friend said another reason he likes OpenClaw is that it can be controlled from a phone, letting him dispatch tasks anytime, anywhere. This point has been overlooked far too much, so I'll address it specifically.

If you only use one laptop, feel free to skip this section.

If you want to control your home desktop from your phone, the required infrastructure is actually very mature. SSH is a decades-old protocol that lets you log into one machine from another and work in its shell. tailscale is a mesh VPN with a free tier: add your desktop, laptop, and phone to the same private network and they can reach each other directly via stable internal IPs. tmux is a terminal multiplexer that keeps sessions alive in the background: open a session on the desktop, cd into the project directory, launch Claude Code or Codex, and that session keeps running as long as the desktop stays on. Disconnecting from the network or turning off your phone doesn't affect it. You can attach anytime to check progress.

On the phone side, pair it with a terminal app like Termius and connect in. The whole setup takes less than an hour.
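
For reference, here is a minimal sketch of the commands involved, written as a small Python helper that only assembles the command strings. The session name, project path, user, and host name are hypothetical placeholders, and `claude` as the launch command is an assumption about how the CLI is installed on your machine. The tmux and ssh flags used (`new-session -d -s -c`, `send-keys -t`, `attach-session -t`, `ssh -t`) are standard options.

```python
import shlex


def desktop_commands(session: str, project_dir: str, agent_cmd: str) -> list[str]:
    """Run once on the desktop: start a detached tmux session, then launch the agent in it."""
    return [
        # -d: start detached, -s: session name, -c: starting directory
        shlex.join(["tmux", "new-session", "-d", "-s", session, "-c", project_dir]),
        # Type the agent command into that session and press Enter
        shlex.join(["tmux", "send-keys", "-t", session, agent_cmd, "Enter"]),
    ]


def phone_command(user: str, host: str, session: str) -> str:
    """Run from the phone's terminal app: SSH in and attach to the running session."""
    # -t forces a terminal allocation, which tmux needs in order to attach
    return shlex.join(["ssh", "-t", f"{user}@{host}", "tmux", "attach-session", "-t", session])


# Hypothetical values; "desktop" stands in for the machine's name on the private network.
for cmd in desktop_commands("agent", "/home/me/projects/demo", "claude"):
    print(cmd)
print(phone_command("me", "desktop", "agent"))
```

Detaching from inside tmux (Ctrl-b, then d) leaves the agent running, which is what makes the check-in-from-anywhere workflow possible.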

The workflow after setup looks something like this: Before leaving home in the morning, drop a task into the desktop's tmux session. During the commute, attach from your phone to take a look; if progress looks off, adjust direction. While you're in meetings at the office, the agent keeps running. At lunch, attach again to review and give more feedback. When you get home, pick up right where you left off at the computer.

The entire chain is seamless. I currently have about 50 hours a week where the agent works on its own, out of sight, but I have a clear sense of what it's doing. The phone is the controller.

This kind of infrastructure is common among programmers, but it's severely undervalued in the context of "using AI to drive work transformation." It lets you scale from one laptop to a building's worth of compute without any middleware.

Putting All of the Above Together

Back to my friend's original question: Should I take the AI transformation expert offer?

I didn't give him a direct answer; I gave him my decision framework. First, see if that company can give you near-unlimited access to the strongest models. If not, the offer's value is limited. Second, in your daily work you must use the top-tier model at maximum effort. The vanguard phase is no time to save money. Third, there are only about ten weeks left; don't waste them on inefficient toolchains or debating whether "third-party frameworks might be better."

It all comes down to one sentence. Right now, no one knows the true boundary of the strongest model. What you need to do is not optimize costs, not adapt to existing workflows, but take the most powerful tools available and slam into that boundary to see if it can be pushed outward.

In ten weeks, the world will be shaken by a wave of unexpected breakthroughs. At that moment, the last thing you want is to look back and realize: these past few months I spent tweaking IDE configurations and optimizing model routing costs.

Time is the most expensive resource. Attention is second. Money is last. Don't get this order reversed for the next ten weeks.

References

Original link: https://guanjiawei.ai/en/blog/ai-vanguard-ten-weeks

Won Over by Cheng Li-wun: What Should the New Generation of Leaders Look Like?

guanjiawei — Tue, 12 May 2026 13:09:46 +0000

I've been won over by Cheng Li-wun lately.

Cheng Li-wun is the new KMT chair. On October 18, 2025, she won with 50.15% of the vote — only the second woman to chair the KMT, and the first chair to come from a DPP background. After winning, her position was crystal clear: return to the 1992 Consensus, oppose Taiwan independence, push cross-strait exchange through peaceful means.

Last week, April 7 to 12, she led a delegation to the mainland — the first KMT chair-led visit in ten years. The previous one was Eric Chu meeting Xi Jinping in 2015. This time she met Xi at the Great Hall of the People — a high-level reception.

The visit itself was a major event. But what really moved me was what she did after returning to Taiwan.

She Doesn't Do Traditional Press Tours

A typical politician comes back, holds a press conference, chats with state media. She did something different.

She went on livestream. On the night of April 16, she connected with Taiwanese internet personality Chen Zhihan ("Guan Zhang") for over two hours. They covered everything — Xiaomi cars, Taiwan's youth brain drain, mainland smart manufacturing, the gap between the two sides. Direct. No talking points.

Picture this: a politician at the absolute center of cross-strait attention, fresh from being received by the highest leader on the other side, comes home and instead of going to mainstream media, sits down with a streamer and just talks for hours on camera.

Counterintuitive. But also natural.

I watched a stretch of it. What she said had real impact. She wasn't reading prepared lines — she was actually thinking through it, actually saying things she believed. People like that are rare in politics.

She and Guan Zhang weren't a one-off pairing either. Back in June 2025, they had already started something called the "Non-Party Opposition Alliance," pulling people from across the political spectrum into dialogue. The livestream was a natural extension.

The Risk She's Carrying

The other thing I respect is the risk.

Pushing cross-strait peace right now is dangerous, period. The American side has been running operations for years — espionage, infiltration, assassinations — and it has never stopped. None of it acknowledged in public, but always running. As the most visible person pushing on this issue, she's a target by definition.

Her willingness to step forward, push this visit through, come back and keep extending the work — that's not nothing. The risk to her physical safety is real, not rhetorical.

The Lei Jun Segment Says Something Specific

The part of the livestream where she talked about meeting Lei Jun stuck with me.

She said the last day of the trip was a visit to the Beijing Xiaomi car super-factory. Lei Jun received her personally, she test-drove the YU7, and she even got a Xiaomi phone. She said she's a Lei Jun fan, and her husband is even more so. Their household runs on Xiaomi — cups, phones, wristbands, backpacks, tablets, all of it. When Lei Jun was explaining a Xiaomi cup set at 29.9 yuan, she laughed and said "everything in our home is Xiaomi."

This segment says one specific thing.

A politician just received by the highest leader on the other side comes home and openly says she was thrilled to meet Lei Jun, because she's a fan. Who is Lei Jun? An entrepreneur. Someone expressing himself daily on Douyin and Weibo. Someone continuously extending his identity through products and content.

The identity he has built in the digital world has this much energy: it can make one of the most politically influential people in the region voluntarily declare herself a fan.

The reverse is true too. Why has Cheng Li-wun been able to generate so much discussion so fast? Not through official statements. Through livestreams, dialogues with internet personalities — pushing her thinking directly to people, her own way.

This reinforced a take I've held: digital identity is the biggest individual leverage we have right now. Cuts across industry, cuts across direction. Politicians, entrepreneurs, creators, researchers — whoever does it earliest comes out ahead.

This Era Especially Needs Political Thinkers

Bigger point.

Everyone talks about AI like the technology by itself solves everything. I'm increasingly convinced it doesn't.

This wave of AI is changing economics — the underlying logic and the production formula are shifting hard. Every historical shift of this magnitude has triggered a reshuffling of society. Power gets redistributed, wealth gets redistributed, sometimes nations even go to war.

The thing is, the technology itself is neutral. People all over the world are using more or less the same AI; the threshold isn't that high. But look around: Silicon Valley engineers in stable environments working out how to use Claude to boost research efficiency; parts of Africa where people don't have drinking water; Ukraine living daily with the threat of incoming missiles.

Why?

Not because they can't get the technology. Because the modes of organization, the modes of distribution, the political structures are completely different.

So what really shapes a society or a world has never been only the scientists. Scientists matter, but they're not the only narrative. King's "I have a dream" pushed racial equality. Gandhi did nonviolent resistance. Deng took China from absolute poverty to a different state. What all of these people did was change how humans organize, distribute, and relate to each other.

That kind of work matters more than ever right now.

An Era When the Rites Have Broken Down

Bluntly: the world is entering a new phase.

I've told friends this — the configuration looks very much like the transition from late Spring and Autumn into the Warring States. The rites have broken down.

The old world order was held by the US: IMF, UN, World Bank, cultural exports, the whole package. Not necessarily fair, but at least everyone was paying surface lip service to the rules. A bit like the Spring and Autumn period: feudal lords going to war still needed a pretext, still had to invoke the Zhou king for legitimacy, still had to dress their actions in the language of rites and morality.

What Trump is doing now is no longer that.

Venezuela is one example. On January 3, 2026, US forces struck inside Venezuela and arrested President Maduro, who is now held in the US to face drug-trafficking charges. Trump immediately announced a "historic energy agreement" with Venezuela — his exact words were that the US would now "run Venezuela," and that Venezuelan oil interests would belong to the US. Venezuela's proven oil reserves are about 303 billion barrels, the largest in the world.

Iran was even harsher. On February 28, 2026, the US and Israel jointly killed Iran's Supreme Leader Khamenei. There was no formal declaration of war, no congressional authorization. Picture this: two countries already tense but not in a state of declared hostility, and one suddenly kills the other's top leader, then announces it as a successful military operation. Any head of state watching that gets cold chills.

Greenland is another thread. He has openly said he won't rule out using military force to take Greenland from Denmark, and threatened a 25% tariff to push Denmark to hand it over. He temporarily walked it back after Davos in January 2026, but the posture is already on the table: I want it, you'd better hand it over.

Put these three together and the signal is clear: between major powers, the pretense is gone. Whoever has the bigger fist makes the rules. Morality, rules, procedure — all of it can be thrown out.

This is what late Spring and Autumn felt like.

A Hundred Schools Contending

But late Spring and Autumn had another side: the contention of a hundred schools.

When everyone is stable and settled, ideas and new theories of governance don't have a market — nobody needs them. But when the existing modes of organization start collapsing, while a new economic foundation is being born at extreme speed, that is exactly when ideas have to appear.

That's right now.

One side: the old order is breaking down. The other: a new economic foundation from AI. Both sides are moving.

What this era needs, in my view, is not just scientists. We need more thinkers, more political practitioners, more people experimenting with modes of organization — people exploring what kinds of social arrangements fit this technology, this configuration, this reality.

That's the biggest thing Cheng Li-wun has stirred in me. Her approach is exactly what this era calls for: direct dialogue, livestreaming with internet personalities, thinking out loud on camera. Carrying personal risk to push something forward — that, too, is the posture this era needs.

The Form Doesn't Have to Be Fancy

When it comes to transmitting ideas, there's an easy wrong turn now: trying to make the form fancy, very produced, full of innovation cues. A short video, an animation, an interactive installation. All impressive — but not the only path.

Words and writing themselves carry enormous power. No matter how saturated short video gets, that doesn't go away.

What Cheng Li-wun did was: sit down, turn on the livestream, talk about what she's seeing, what she's thinking, how she feels. No editing, no filters, no script. But in those few hours of conversation, the volume of information and the resonance she transmitted exceeded a hundred carefully produced PR videos.

China Is in a Window Right Now

Back to ourselves.

Against the backdrop of a violently shaking world, China right now is a relatively stable patch. Outside, the fighting is fierce; inside, things are relatively calm. Tense outside, loose inside. That itself is a remarkable achievement, and a precious window.

In this window, there is already a wave of very young people doing real political work on the front lines. At the county level, the village level, the district level — young leaders running real experiments and finding new ways to express themselves.

But that's still far from enough.

The hundred-schools moment in Spring and Autumn wasn't everyone going their own way and talking past each other. It was dozens of small states and hundreds of thinkers, each running experiments in their own corner, then converging — debating, exchanging, comparing results. Confucianism, Daoism, Legalism, Mohism — they all grew in that environment. If today we only have scattered practice with no exchange or collision, the hundred schools can't take off.

So I hope more people — researchers, founders, political workers, local officials, content creators — bring out their own thinking, run experiments, use digital identity to put it out there, find both resonance and disagreement.

What one person can change is amplified now. It's not that nothing can be done — it's that there's an extraordinary amount you can try.

What Cheng Li-wun did is not a grand-power strategy. It's one visit, a few livestreams, a few conversations. But it really did change how a lot of people felt and thought.

We can do the same.

References

Cheng Li-wun elected KMT chair — October 18, 2025, won with 50.15% (65,122 votes). Second woman to chair the KMT, first chair from a DPP background.
Cheng Li-wun's delegation arrives on the mainland — April 7–12, 2026 "Journey of Peace 2026," visiting Jiangsu, Shanghai, and Beijing.
Xi Jinping meets Cheng Li-wun — April 10, 2026 at the Great Hall of the People, the first meeting between KMT and CCP leaders in nearly a decade (the previous one was Eric Chu and Xi in 2015).
Taiwan's opposition leader arrives in China for a 'Journey of Peace' — NPR coverage of the trip.
The Domestic Politics of Cheng Li-wun's China Trip — The Diplomat's analysis of the trip's political significance.
Cheng Li-wun visits Xiaomi factory; Lei Jun receives her personally — April 12, 2026 visit to the Beijing Xiaomi auto super-factory; test drive of the YU7.
Cheng Li-wun openly a Lei Jun fan — Publicly stated she and her husband are Lei Jun fans; their household is full of Xiaomi products.
Cheng Li-wun's livestream with Chen Zhihan — April 16, 2026 evening; over two hours of livestreamed conversation covering Xiaomi cars, Taiwan's youth brain drain, the cross-strait gap.
Founding of the Non-Party Opposition Alliance — June 2025; co-founded by Cheng Li-wun and Chen Zhihan, gathering different parts of the political spectrum into dialogue.
US strike on Venezuela and arrest of Maduro — January 3, 2026 US military strike inside Venezuela; arrest of President Maduro.
Trump Claims 'Historic' Venezuela Oil Deal After Maduro Arrest — Military.com; Trump announces taking over Venezuelan oil.
Assassination of Ali Khamenei — Wikipedia — February 28, 2026 joint US-Israel operation kills Iran's Supreme Leader.
U.S. and Israel launch a major attack on Iran — PBS coverage of the operation.
Greenland crisis — Wikipedia — Trump threatens 25% tariff on Denmark and won't rule out military means to take Greenland.
Proposed United States acquisition of Greenland — Background on US sovereignty intentions toward Greenland.

Original link: https://guanjiawei.ai/en/blog/zheng-liwen-hundred-schools

There Are Only Two Ways to Start Vibe Coding

guanjiawei — Tue, 12 May 2026 13:09:06 +0000

The question friends and colleagues have been asking most lately: Vibe Coding sounds pretty cool, but where exactly do you start?

It's a good question. AI Coding is a productivity tool, not an entertainment tool. If you try to tinker with it in daily life, you'll mostly be at a loss. The vast majority of daily needs have already been saturated by cheap or even free apps over the past decade or so. Want bubble tea? There's Meituan. Want to edit videos? There's Jianying (CapCut). Why vibe-code another one yourself?

My answer only offers two paths: either start from work, or start from building your own digital identity. There is no third path.

1. Start from Work

Let me first explain why work.

Tools should grow wherever human attention is focused. Whether it's eight hours or ten hours a day, that's the time you're compelled to produce value. The pressure is high, the feedback is direct. There's no better testing ground for transforming your work paradigm.

Scenarios like playing ball, cooking, or binge-watching shows seem more "free," but the actual time you invest weekly is pitifully thin, and your attention isn't sufficiently attached. Vibe coding doesn't easily take root in these places because your mind isn't really there.

Work is the opposite. You spend the vast majority of your time immersed in it. Off-the-shelf tools mostly cost money and aren't particularly good. Output gets validated by others, feedback doesn't disappear. Room for improvement is concrete, not imagined.

My own metric is AI penetration during work hours. Early on, I was probably only spending 10% of my time interacting with AI, with the remaining 90% being my original way of working—that state basically equals zero penetration. Later I gradually pushed it above 80%. When penetration is high, the only things left in my day that still use the old paradigm are purely human activities like meetings, signing off, and talking with clients.

This number is actually closer than you might think. In Stack Overflow's 2025 Developer Survey, 84% of developers are already using or planning to use AI tools, with 51% using them daily. Anthropic's own Economic Index data is even more direct: 46% of conversations on Claude.ai are work-related, and on the API side it's 74%. Heavy users treat it as a work partner, not a toy.

But pushing penetration from 10% to 80% isn't just a matter of adding AI to your workflow. It requires breaking apart your current tasks and redistributing them: which segments let AI write, which let AI research, which let it run on its own, and which must be monitored by humans. This restructuring is the most natural training ground for getting started with vibe coding.
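
One way to make the penetration metric concrete is to list a day's work segments and mark which ones are AI-driven. This is only a sketch: the segment names and minute counts are invented, and "AI-driven" is a judgment call you make per segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    minutes: int
    ai_driven: bool  # True if the segment is done by or with AI


def penetration_percent(day: list[Segment]) -> int:
    """Share of working minutes spent in AI-driven segments, as a whole percent."""
    total = sum(s.minutes for s in day)
    ai = sum(s.minutes for s in day if s.ai_driven)
    return round(100 * ai / total) if total else 0


# A hypothetical restructured 8-hour day (480 minutes)
restructured_day = [
    Segment("drafting documents with AI", 150, True),
    Segment("AI-assisted research", 120, True),
    Segment("reviewing unattended agent runs", 114, True),
    Segment("meetings, sign-offs, client calls", 96, False),
]

print(f"AI penetration: {penetration_percent(restructured_day)}%")
```

Filling in a table like this for your own week is a quick way to see which segments are still on the old paradigm.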

2. Start from Building Your Digital Identity

If you can't find an opening at work right now—say the process is too rigid, or the team isn't ready yet—the second path works just as well. Build your own personal digital identity.

Why place it second, outside of work? Because its ROI is absurdly high.

The investment is so small it's barely worth calculating. A personal website can now be built with vibe coding in a day or two.

The potential payoff is compounding. What you write is yours, not the platform's. The same piece published on Zhihu, WeChat Official Accounts, Geekbang, X, and Xiaohongshu (RED) represents five completely different exposure formats. Each platform's algorithm is different. The probabilities of hitting a viral piece stack together, making it far more stable than betting on a single platform.

I cold-started for just over a month, averaging about 2,000 daily views across all platforms. This number actually means very little. What matters is the distribution. I had an article about software bidding and procurement that got pushed to over 20,000 reads and hundreds of comments on Zhihu. Many other articles sit at just dozens to hundreds of views on each platform. But as long as your inventory is large enough, one or two will eventually hit. That's the compounding effect of distribution.
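
The "inventory" argument can be sketched with a simple probability model. It assumes every post on every platform is an independent trial with the same small chance of taking off, which is a simplification, and the 2% figure is a made-up illustration rather than measured data.

```python
def chance_of_at_least_one_hit(p_hit: float, posts: int, platforms: int) -> float:
    """P(at least one hit) = 1 - (1 - p)^(posts * platforms), assuming independent trials."""
    trials = posts * platforms
    return 1.0 - (1.0 - p_hit) ** trials


P_HIT = 0.02  # hypothetical chance that one post takes off on one platform

single = chance_of_at_least_one_hit(P_HIT, posts=1, platforms=1)
wide = chance_of_at_least_one_hit(P_HIT, posts=30, platforms=5)

print(f"1 post on 1 platform:    {single:.1%}")
print(f"30 posts on 5 platforms: {wide:.1%}")
```

Under these assumptions, a low per-post chance still adds up to a near-certain hit once the inventory and the number of platforms are large enough.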

Here's a counterintuitive fact. profy.dev surveyed over 60 technical hiring managers: 93% will look at a candidate's portfolio website, but 51% flat-out said "not having a portfolio website doesn't lower a candidate's chances." What does that mean? Static showcase "vanity sites" are largely useless. What actually works are vehicles that let content gradually accumulate. GitHub profiles, blogs, project notes—these things with "output traces" are what constitute a digital identity.

Applied to vibe coding, this path is now much easier to walk. AI has compressed the most tedious parts of the content pipeline to near-zero cost: writing, reformatting, adapting for different platforms, generating different media. What used to take half a day for one person to spread an article across five platforms now takes ten-plus minutes. Even investing just 30 minutes daily adds up significantly over the long term.

As for "why you should build a digital identity," I wrote a previous piece that covers it more completely, so I won't repeat it here.

3. The Prerequisite: It Must Be the Real You

My friend followed up with a second question that I think deserves more elaboration than "where to start."

He said: I feel a bit hesitant about putting my own stuff out publicly online. What should I post? How should I post it?

My answer has only one rule: post, but what you put out must be the real you.

Don't perform, and don't chase traffic by imitating a style you don't even endorse yourself. This is where long-term ROI most easily crashes.

Many people misunderstand, thinking that building a personal identity means "performing a better version of yourself." This path only works for full-time influencers who depend on traffic for their livelihood. That's their survival mode. For most people, digital identity is a compounding bonus beyond work, not their main livelihood. Once you start performing, you incur an invisible liability: every piece you publish is a "raw archive" that may be dug up and cross-examined someday in the future.

The fact that the internet has memory is, I think, grossly underestimated.

The classic example is Justine Sacco. In 2013, she dashed off a tweet before boarding a flight to South Africa; at the time she had fewer than 200 followers. Eleven hours later, when her flight landed, she was trending worldwide amid outrage, and her employer fired her. Follower count is no shield. James Gunn was fired by Disney in 2018 over old tweets from 2008–2009, a decade earlier. Kevin Hart stepped down as Oscars host the same year, also over decade-old jokes that were dug up.

Once your influence crosses a certain threshold, past content gets examined under a magnifying glass.

So before I publish anything, I ask myself: at that point in time, is this the real me? Is this an opinion I'm willing to claim later when I look back?

If yes, publish. Even if your thinking changes later and you find it naive, that's fine. People change, and wincing at your younger self is normal; it's part of your history. But if you couldn't endorse what you wrote even at the time you wrote it, then in hindsight it becomes a stain. These stains accumulate and become constraints that hold you back.

Posting your authentic self has another hidden benefit. It forces you to think clearly about "what is the real me." The value of this process itself may exceed the exposure the content generates.

Not Posting Means Zero

Back to my friend's original question.

Two paths: starting from work is the most natural, since your attention is already there; starting from digital identity is the most cost-effective, with small investment and compounding returns. Whichever one you can start doing immediately, take it—it's far more effective than agonizing for six months.

On digital identity, I also said in my last piece: if you don't do it, the opportunity just hangs there waiting, and the probability that any of it ever reaches you is zero. It's like a lottery: with no ticket, your odds are exactly zero; with one ticket, at least they're above zero.

But rather than grand narratives like "digital identity is leverage," what I want to emphasize more is: start today, don't wait. The barrier is already ridiculously low. Few things are so universally applicable across industries and effective for everyone. There's only one prerequisite: that it's the real you.

Everyone should do it. Musk and Lei Jun are doing it; even Trump is doing it. You're using the same infrastructure and the same medium they are. There's no reason you shouldn't do it too.

References

Original link: https://guanjiawei.ai/en/blog/where-to-start-vibe-coding