Forem: port

Simple just works: how i built puddleswap

port — Wed, 20 May 2026 11:18:48 +0000

Any problem yields to enough complexity.

I caught myself almost doing exactly that on puddleswap. Here's how that went, plus the gut-check I run now before writing anything clever. If you ever feel yourself overengineering things, this is for you.

I was at a Monad Blitz event, if I am not mistaken it was the one in Ankara, and I was watching everyone around me hack on cool stuff while I sat in the corner answering their questions. I mean that's my job but it felt weird not building stuff.

So at some point I figured I should just build something (while talking to people at the same time lol). Something simple enough that the brag would be how little it took, not how clever it was.

That's how puddleswap happened. A no-bs DEX on Monad testnet, the kind a weekend buys you.

Going in, I wanted the fewest moving parts I could get away with. The thing I'd be most proud of would be how little there was to maintain.

Most of the actual work was done by an AI agent. It wrote the React frontend, deployed the contracts, and put together the swap UI. The contracts are stock Uniswap V2, audited a thousand times over the years (centuries in web3) and not something I wanted to fork. The frontend is Vite plus React with no backend anywhere. The swap accepts real Circle USDC, a mock USDT we deployed for testnet liquidity, and WMON. A small rebalancer service keeps the price pegs roughly honest.

It's live at app.puddleswap.org.

The build was mostly uneventful. The agent did its thing, I reviewed diffs, we iterated. What I want to talk about is the one decision I almost got wrong: the routing.

The thing I almost overengineered

Standard answer for "how does a DEX UI route swaps" is a graph algorithm. You have N tokens and M pools, build the liquidity graph, run shortest-path weighted by output amount, return the best route. 1inch and Matcha both work this way and every aggregator article online tells you to do the same, so I started writing it.

Then I looked at my actual data.

Three "core" tokens: USDC, USDT, WMON. Maybe ten pools, every one of them touching at least one core. I was writing a graph algorithm to solve a problem I didn't have.

So I deleted it and wrote this instead (s/o to @danielvf for the idea + the initial PRD).

The enumeration

For any swap A → B, enumerate every plausible route through the hubs:

Direct: A → B
Through one hub: A → USDC → B, A → USDT → B, A → WMON → B
Through two hubs: A → USDC → USDT → B, A → USDC → WMON → B, A → USDT → WMON → B, and reverses

That's at most ten candidate paths. Send all ten quote requests in one multicall, pick the path with the highest output, swap on that.

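// Enumerate direct, one-hub, and two-hub paths between the two tokens.
const routes = buildCandidateRoutes(tokenIn, tokenOut, cores);

// Quote every candidate path in a single batched RPC call.
const results = await publicClient.multicall({
  contracts: routes.map((path) => ({
    address: router,
    abi: routerAbi,
    functionName: "getAmountsOut",
    args: [amountIn, path],
  })),
  allowFailure: true, // a path with no pool reverts; keep the other quotes
});

// Keep the path with the highest output amount.
const best = selectBestQuote(results);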

The whole router is around 50 lines. It builds the candidate list, dedups it, and returns whichever path the multicall said had the highest quote.
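For illustration, here is a minimal sketch of what the helpers around that multicall could look like. It is not the code from routing.ts: the function bodies, the `pickBestRoute` name (which takes the routes as well as the results), and the `QuoteResult` type are assumptions. The result shape mirrors what a multicall with allowFailure: true returns, and the last element of getAmountsOut is taken as the output amount.

```typescript
type Address = string;

// Assumed shape of one multicall entry when allowFailure is true.
type QuoteResult =
  | { status: "success"; result: readonly bigint[] }
  | { status: "failure"; error: Error };

// Enumerate direct, one-hub, and two-hub paths, then drop duplicates.
function buildCandidateRoutes(
  tokenIn: Address,
  tokenOut: Address,
  cores: Address[],
): Address[][] {
  const same = (a: Address, b: Address) => a.toLowerCase() === b.toLowerCase();
  // A hub that is also an endpoint would produce a path visiting a token twice.
  const hubs = cores.filter((c) => !same(c, tokenIn) && !same(c, tokenOut));

  const routes: Address[][] = [[tokenIn, tokenOut]];
  for (const h1 of hubs) {
    routes.push([tokenIn, h1, tokenOut]);
    for (const h2 of hubs) {
      if (!same(h1, h2)) routes.push([tokenIn, h1, h2, tokenOut]);
    }
  }

  // Dedup by a case-insensitive key, in case `cores` lists a token twice.
  const seen = new Set<string>();
  return routes.filter((path) => {
    const key = path.join(">").toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Pick the route whose final amount is largest; reverted quotes are skipped.
function pickBestRoute(
  routes: Address[][],
  results: QuoteResult[],
): { path: Address[]; amountOut: bigint } | null {
  let best: { path: Address[]; amountOut: bigint } | null = null;
  for (let i = 0; i < results.length; i++) {
    const res = results[i];
    if (res.status !== "success" || res.result.length === 0) continue;
    const amountOut = res.result[res.result.length - 1];
    if (best === null || amountOut > best.amountOut) {
      best = { path: routes[i], amountOut };
    }
  }
  return best;
}
```

With three hubs and two non-hub tokens this yields the ten paths listed above. When one of the tokens is itself a hub, the list shrinks to five.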

Why this matters (and not just for DEXes)

I'm not saying graph routing is wrong. For a mainnet aggregator routing across thousands of pools and dozens of DEXes, it's the right tool. I'm saying I wasn't building that.

Here's the lesson: a lot of code over-solves its problem.

You see it everywhere once you start looking. Sorting algorithm where the data is always ≤ 10 items (insertion sort is fine, stop). Caching layer where the data hits the database twice a day (the database is already a cache). Pub/sub where there's one publisher and one subscriber (call the function directly).

The smart-looking solution is usually someone solving the general problem, because that's what they were trained on. The general problem is harder, more interesting, and absolutely useless to you if your constraints are narrower.

On puddleswap, my constraints are:

One chain, one DEX, mine
Three hub tokens I control
Operator-maintained liquidity
Test users with low gas budgets

Within those constraints, enumeration is exhaustive over the routes that matter (every direct, one-hub, and two-hub path gets quoted), faster than graph traversal (one batched RPC, not N round-trips), and a fraction of the code. The day any of those constraints stops holding is the day I'll bother writing the graph router.

When this breaks

I'd be lying if I said this scales. Obvious failure modes are:

Exotic-to-exotic pools that bypass the hubs entirely. Enumeration misses them.
A hub runs dry of liquidity on one side. Router still checks routes through it and eats a bad quote.

The end

If you're building on Monad testnet and need swaps for your tests, puddleswap is live at app.puddleswap.org. The router is at puddleswap/web/src/lib/routing.ts.

And next time you reach for the complex solution, check whether your problem actually needs it. It probably doesn't.

And maybe ask your agent if there are any easier solutions to the problem you are trying to solve.

Questions?

You don't know how to vibe-code

port — Sun, 17 May 2026 12:30:08 +0000

It's 2026. We have AGI (or at least the ability to code almost anything thanks to models like Opus 4.5 from Anthropic and GPT 5.2 from OpenAI).

But there's one problem. What you create in minutes creates problems you spend hours trying to fix. And if you're unlucky, you end up with a spaghetti codebase that no LLM can untangle. You no longer understand the code. It doesn't even make sense to read it anymore.

So what are you doing wrong, what could you do better, and how do some people get everything right when they vibe-code?

Honestly, vibe coding kinda gave people the wrong impression about using LLMs to write code. Somehow everyone ended up thinking "yeah i can do this with ONE PROMPT, without EVER LOOKING AT THE CODE".

That just won't work, unless you consider this good work:

And the code behind it is even worse. The AI's knowledge is months old, maybe a year. It doesn't know your codebase. It doesn't know what "done" means.

Alright, here's how I actually vibe-code. Or rather, how I use my current favorite tool (claude code) to build real projects.

I'm going to walk you through how I built execevents.xyz, a real-time execution visualizer for Monad. Blocks race across the screen as they go through consensus. Transactions stream in live. You can see state changes, call traces, gas usage.

a short glance at execevents.xyz

This isn't a toy project. Under the hood, execevents connects to Monad's Execution Events API—a Rust service that reads blockchain data directly from shared memory, HFT-style. We're talking sub-millisecond latency for real-time block and transaction data. Building something that interfaces with infrastructure this performant would normally require deep systems knowledge.

But here's the thing: I built this in HOURS, not days, not weeks. Using Claude Code and the methodology below, anyone can build high-performance applications on Monad without being a systems engineer or even a regular developer.

Below I explain my methodology about vibe-coding, or how I code.

Step 1: Think about the end goal

Visualize the most basic version of what you want to build. I usually ask claude something like this:

I read about execution events from Monad docs and I want to build an app showing how to use them. Here is the page about execution events: (i paste the markdown here) Do not start building until I confirm. Tell me how you are planning to build this. Then ask me to confirm. Also, ask me any questions you have. Our first goal is to reach to a basic MVP.

Above is the answer I got from claude. Notice how it told me exactly what it is going to do. I can now visualize what I am going to get and can direct the project better. This is the point where I want to stop and think. If everything looks OK, I move on to the questions claude asks and start answering them.

Much like real coding, you want to spend time thinking about the code rather than writing it.

You might do several iterations before you even tell claude to build. In every message, I usually ask it not to build until I like the implementation plan.

I also use plan mode a lot. It is the new way of getting claude to ask you questions, and it works really well!

Step 2: Build the MVP, then use it

Then, ask claude to start building. When it finishes, test it. This is the part people LOVE skipping, not knowing that the problems that show up later stem from skipping it. When you find an issue, have claude fix it, then go back and look for the next one. Repeat until there are no issues left.

Step 3: Iterate with small, focused prompts

This is where most people mess up. They find five things wrong and try to fix them all in one massive prompt.

Don't do that.

Every time you find something broken, fix just that one thing. Here's what my prompts actually looked like:

Prompt 1: "The TPS calculation is wrong. It's counting blocks that arrive in batches over WebSocket. Make it only count consecutive block numbers."

Prompt 2: "This doesn't work on mobile. Add a responsive layout with a bottom sheet for block details."

Prompt 3: "The block state transitions are too abrupt. Add CSS transitions so blocks slide smoothly between states."

Each prompt is:

Specific -> I'm telling it exactly what's wrong
Small -> targeting one thing
Reviewable -> I can read the diff and understand what changed

Step 4: Read the Code

Or at least, take a quick glance at it. Every time Claude makes a change, I read the diff. Not because I don't trust it, but because I need to understand what I'm shipping.

Reading doesn't mean auditing every line. It usually means:

Skimming the diff
Understanding the approach
Asking yourself "does this make sense?"

By reading the code, you will catch mistakes, learn, and stay in control. The moment you stop understanding your codebase is the moment you can't fix it anymore. Do not turn your project into a mess you can't make sense of.

And if you don't understand anything in the code, you can open a new terminal window and ask claude code to explain it for you.

What I Learned

Building execevents taught me things I wouldn't have learned from tutorials.

On the systems side: I now understand how Monad's Execution Events work at a low level, how the Rust API pulls data from shared memory, why certain event types arrive in batches, and how to handle the timing edge cases that come with real-time blockchain data. Claude didn't just write code; it explained the architecture as we built it. When the TPS calculation was wrong, debugging it meant understanding WebSocket message ordering and block finality.

On the vibe-coding side: I learned that the quality of your output directly reflects the quality of your iteration loop. The people who fail at vibe-coding aren't bad at prompting, they're bad at testing and reading diffs. They skip the boring parts.

The real unlock is this: with the right methodology, AI tools let you punch above your weight. You can build performant, production-grade applications that interface with serious infrastructure, even if you've never written Rust or worked with shared memory systems. The barrier isn't coding ability anymore. It's knowing how to guide the process.

Now, go.

And do magic, for we live in a magical era.

You are prompting GPT 5.5 wrong.

port — Sun, 17 May 2026 12:29:54 +0000

Source: OpenAI.

Prompting GPT 5.5 is A LOT different than how you prompted any model before. And GPT 5.5 itself can't write good prompts for itself! See the screenshot below from @victortaelin

So, in this short article, I will be talking about how to create good prompts for GPT 5.5 so that you can do your work better & faster.

Btw before we go any further, this guide is for using GPT 5.5 inside Codex.

So here's what changed. Older models needed you to walk them through the steps. First do this, then check that, then call this tool. GPT 5.5 reasons more efficiently and that kind of prompting actively makes it worse. It narrows the search space & you end up with mechanical answers.

The fix is the opposite of what people are doing. Describe the destination, not the route. Let the model figure out the path.

I've been changing how I prompt since 5.5 dropped. Here are the 5 moves with the highest hit rate, with examples you can paste in (or modify) directly.

1. Lead with the outcome

Stop telling the model HOW to solve the problem, instead tell it what the result should look like.

(btw the full examples are at the end)

Resolve the customer's issue end to end.

Success means:
- the eligibility decision is made from the available policy and account data
- any allowed action is completed before responding
- the final answer includes completed_actions, customer_message, and blockers
- if evidence is missing, ask for the smallest missing field

2. Kill the preamble

Codex loves to narrate. "I'll start by examining the file structure." "Let me first check the existing implementation." "Now I'll proceed to make the changes."

You don't need any of this. You can see what it's doing. The preamble is noise & it eats latency before any real work happens.

Skip preambles. Do not narrate what you are about to do before doing it. Do not announce tool calls. Do not end with "Let me know if you'd like adjustments" or "Feel free to ask if you have questions."

When you finish, report what changed in 2-4 lines. File paths, what was modified, anything I need to know to use the change. That's it.

3. Bias to action, finish what you start

Default Codex behavior on a hard task is to surface a plan and stop. We don't want that. We want action. Get action:

Bias to action. If the request is clear and the next step is reversible, just do it. Do not stop at analysis, do not stop at a plan, do not stop after the first file change.

Persist until the task is fully handled end to end in this turn:
- carry changes through implementation, verification, and a clear summary
- if you hit a blocker, try one more reasonable approach before stopping
- only stop early if the next step is irreversible, destructive, or genuinely ambiguous

Unless I explicitly ask for a plan or a question, assume I want code shipped.

(btw this is from the OpenAI Codex starter prompt)

4. Read in parallel, not one file at a time

Watch Codex on a real task. It reads package.json, waits, reads src/index.ts, waits, reads src/utils.ts, aaaand waits some more... Use this:

When you need to read multiple files, read them in parallel in a single batch, not sequentially.

Workflow:
1. Plan all the files you need before reading any
2. Issue one parallel batch of reads
3. Analyze together
4. Only do another batch if new unpredictable reads come up

Same for searches. If you need to grep for 3 patterns, run 3 searches in parallel. Sequential reads are only justified when one result genuinely determines the next.
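The same latency argument applies to ordinary code. As an illustration only (this says nothing about how Codex schedules its tool calls), the sketch below compares awaiting reads one by one with issuing them as a single batch through Promise.all. `fakeRead` is a hypothetical stand-in for a real file read, and the counters exist only to make the difference observable.

```typescript
// Counters that record how many reads were in flight at the same time.
let inFlight = 0;
let maxInFlight = 0;

// Hypothetical stand-in for a file read that takes `ms` milliseconds.
async function fakeRead(path: string, ms = 20): Promise<string> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, ms));
  inFlight--;
  return `contents of ${path}`;
}

// One read at a time: total latency is roughly the sum of all reads.
async function readSequentially(paths: string[]): Promise<string[]> {
  const out: string[] = [];
  for (const p of paths) {
    out.push(await fakeRead(p));
  }
  return out;
}

// One batch: every read starts immediately, so total latency is roughly
// that of the slowest single read.
async function readInParallel(paths: string[]): Promise<string[]> {
  return Promise.all(paths.map((p) => fakeRead(p)));
}
```

Both functions return the same contents in the same order. The sequential version is only the better choice when one result decides what to read next.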

5. Make it actually verify

Run validation and tests. Don't trust "this should work":

After making changes, run the relevant validation:
- targeted tests for the behavior you changed
- typecheck and lint
- build, if the change touches anything build-time sensitive
- a quick smoke test on the running app if it's user-facing

If validation fails, fix it before reporting done. If validation can't run in this environment, say so & describe the next best check I can run myself.

"Done" means verified, not "code is written."

Here are 3 simple rules to follow when prompting GPT 5.5:

Add a completeness rule
Add a stop condition
Force verification.

Here are five examples you can adjust to your use case:

1. Building a feature

Build [feature]. Done = it works in the running app, has at least one test for the new behavior, types and lint clean, diff scoped to this change only.

Stop & ask only if: the next step is destructive, requirements are genuinely ambiguous, or you'd need to expand scope to 3+ unrelated files. Otherwise just ship it.

No preamble. Don't narrate before doing. When done, report changed files + what was modified in 2-4 lines.

Verify before reporting done: run affected tests, typecheck, lint. If anything fails, fix it. "Should work" is not done.

2. Fixing a bug

Fix [bug]. Done = root cause is fixed (not the symptom), a test exists that fails before the fix and passes after, no other behavior regressed, diff scoped to the fix.

Stop & ask only if: the bug isn't reproducible from what I gave you, the root cause is in unexpected scope (different module, infra, dependency), or two plausible root causes exist and the wrong fix would mask the real bug.

No preamble. Don't walk me through your hypothesis before testing it. When done, report root cause + fix + what you verified in 3-5 lines.

Verify before reporting done: run the regression test, run the affected module's full suite, confirm the original repro is gone.

3. Refactoring

Refactor [target]. Done = behavior is byte-identical before and after, all existing tests pass without modification, types and lint clean, diff scoped to the refactor.

Stop & ask if: you can't preserve behavior without changing a test (means the refactor changed semantics), the refactor naturally pulls in a 3rd+ file beyond what we discussed, or you find a real bug while refactoring (surface it separately, don't silently fix it inside the refactor diff).

No preamble. Don't explain the refactor plan before doing it. When done, report what moved, what's now where, and what was verified in 2-4 lines.

Verify before reporting done: run the FULL test suite (refactors break unexpected places), typecheck, build.

4. Migration / upgrade

Migrate [target] from [old] to [new]. Done = the codebase compiles and runs on the new version, all existing tests pass without behavior changes, deprecation warnings from the migration are resolved (not suppressed), diff is scoped to the migration only.

Stop & ask if: the new version requires a behavior change that affects users (don't make that call alone), the migration touches config, infra, or build files in ways we didn't discuss, or you find code that depends on the old version's bugs (genuinely tricky - surface it, don't paper over it).

No preamble. Don't list every breaking change in the changelog before starting - read the changelog yourself and apply what's needed. When done, report what was migrated, what was left untouched and why, and any deprecation warnings still standing.

Verify before reporting done: run the full test suite (migrations break unexpected places), typecheck, build. If the project has integration or e2e tests, run those too - unit tests pass through migrations more often than you'd think.

5. Adding tests to existing code

Add tests for [target]. Done = the tests exercise the actual behavior (not implementation details), they pass against the current code, they would fail if the behavior broke, coverage hits the meaningful branches not just the happy path.

Stop & ask if: the code is genuinely hard to test because of how it's structured (don't refactor it to make testing easier without checking), you find a real bug while writing tests (surface it separately, don't quietly fix it), or the existing tests already cover this and I missed it.

No preamble. Don't outline the test plan before writing - just write the tests. When done, report what's covered, what's intentionally not covered, and anything you found while writing them.

Verify before reporting done: run the new tests (must pass), then mutate the code under test in a small way and rerun (the tests must fail - if they don't, they're testing the wrong thing). Run the full suite to make sure nothing else broke.
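A minimal sketch of that mutation check, under assumptions: `clamp` is a hypothetical function under test, and `runTests` is a hand-rolled stand-in for a real test runner. It tries to show one thing: the same tests pass against the original and fail against a slightly broken copy.

```typescript
type Clamp = (value: number, min: number, max: number) => number;

// Code under test (hypothetical example).
const clamp: Clamp = (value, min, max) => Math.min(Math.max(value, min), max);

// A small mutation: the upper bound is no longer enforced.
const mutatedClamp: Clamp = (value, min, _max) => Math.max(value, min);

// Behavior-level tests, parameterized over the implementation so they can
// run against both the original and the mutant. Returns true if all pass.
function runTests(impl: Clamp): boolean {
  const cases: Array<[number, number, number, number]> = [
    [5, 0, 10, 5], // inside the range: unchanged
    [-3, 0, 10, 0], // below the range: raised to min
    [42, 0, 10, 10], // above the range: lowered to max
  ];
  return cases.every(
    ([value, min, max, expected]) => impl(value, min, max) === expected,
  );
}

const passesOnOriginal = runTests(clamp);
const failsOnMutant = !runTests(mutatedClamp);
```

If the third case were missing, the mutant would pass too, which is the signal that the tests cover only the happy path. In a real project the mutation would be a temporary edit to the source file followed by a rerun of the suite, not a second function.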

And here are 5 things to avoid:

Telling Codex HOW to solve it instead of what done looks like
Asking GPT to create a prompt for itself
Using the same chat for more than one task
Sequential file reads on multi-file tasks (waste of latency)
Trusting "this should work" without running the tests (never do this)

Alright, if you take one thing from this: before you reach for that Extra High button, rewrite the prompt using the tips above. (and give me a follow)

Read more: developers.openai.com/api/docs/guides/prompt-guidance

Skills don't work the way we think they do

port — Sun, 17 May 2026 12:29:53 +0000

I just finished reading the SkillBench paper: https://arxiv.org/pdf/2602.12670

And the results are definitely not what most people expect.

What researchers did

They ran 86 real-world tasks across 11 domains, for 7,308 runs in total.

Each task was tested in three modes:

Baseline (no skills)
Curated skills (human-written)
Self-generated skills by the model

Without further ado, below are some conclusions that I found interesting in the paper.

Self-generated skills don't help

One of the most hyped ideas in agent research is:

"Let the model write its own tools / skills."

But it is mostly a wasted effort. In this research, self-generated skills produced no meaningful improvement over baseline.

In some cases, they made performance worse.

Today's models simply cannot reliably create useful reusable procedural abstractions.

This matters because a huge part of current agent research assumes models can recursively improve by generating better skills/tools. This benchmark suggests that assumption is premature.

Human-made skills work A LOT better

When Skills were carefully written by humans, performance jumped +16.2 percentage points on average.

But here's what's even more surprising:

Domain variance was extreme

Some domains saw small gains (~4-5 pp)
Others saw enormous gains (~50+ pp)

Skills don't help equally across fields. They disproportionately help in structured, procedural domains.

Smaller models + skills ≈ bigger models without skills

A smaller model with curated Skills matched or exceeded a larger model without Skills.

This is huge for cost optimization:

Local agents
Edge deployment
Open-source models

Too many skills can hurt

Overly broad or verbose skill libraries degraded performance. Focused, minimal skill modules performed better.

Pick your skills carefully. 2-3 skills work better than 4+ skills.

Here is my takeaway

If this paper is right (and i think it is, mostly because of my personal experiences with skill files):

Scaling alone isn't enough
Autonomy narratives are premature
Skill architecture design is now a first-class research problem

Read the full paper: https://arxiv.org/pdf/2602.12670

so... how to create a skill that works?

port — Sun, 17 May 2026 12:29:52 +0000

In my previous article, I argued that skills don't work the way most people expect.

Related: Skills don't work the way we think they do

The data from SkillBench supports this. Attaching skills doesn't automatically guarantee better performance.

So the real question becomes:

If skills don't magically fix models... How do you engineer them properly?

To answer that, we need to understand how knowledge itself works.

I think human knowledge is like a block of cheese.

It grows over time, with holes ever-present.

When we hit something we don't know, we:

look it up
learn it
apply it
patch the hole and move forward

LLMs don't do this.

When they hit a hole, they don't say "I don't know."

They hallucinate. They lazily fill the gap with plausible-sounding but incorrect information.

Aaand that's where things break, and we, being the superior entity, come in to help.

The Two Types of Holes

Through trial and error, I've noticed there are two kinds.

1. Knowledge gaps

Example:

My OpenClaw agent tries to open a browser extension. It fails.

I tell it:

"You already have a browser. Open that."

Suddenly the dumdum understands the task and opens the freaking browser.

It wasn't incapable.

It just didn't reason through the environment correctly.

That's a hole.

2. Moldy knowledge

Sometimes it does know something, but it's outdated.

Examples:

Using useScaffoldContractRead instead of useScaffoldReadContract in Scaffold-ETH
Manually defining Monad mainnet instead of importing from viem/chains

That's stale info on the LLM's side. I call it mold.

And mold spreads silently. If you don't correct it once, it keeps reappearing in future runs. And you might never notice it.

How I Create Skill Files

Here's my actual process.

1. I let the model fail

For example, when I was building the monad-development skill, I simply said:

"Create a token on Monad."

That's it. Then I watched it fail.

I didn't over-direct it.

I wanted to see where the holes were.

2. I take notes on every failure

This sounds weird, but yes: I watch it and take notes, or let it take notes afterwards. After the LLM completes its run, I ask it "What did you have problems with?" and "What did you fail to do on the first try?", and I check whether the thing I asked for is built the way I wanted it to be.

3. I create the skill.md file

The skill file contains the patches: it fills the gaps in the LLM's knowledge, removes the mold, and fills the gap left behind by removing the moldy part.

The file is concise, specific, and clear.

4. I re-run and benchmark

I run the same prompt again with the skill attached. If it still struggles, I refine the skill.

I repeat until:

First-attempt success rate is high
Hallucinations drop (mostly)
Tool usage becomes clean and consistent
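The re-run step can be sketched as a small harness. Everything in it is hypothetical: `AgentRun` stands in for however you launch the agent and check its result, and the trial count is whatever you can afford. It only shows the bookkeeping: the same prompt, with and without the skill, compared on first-attempt success rate.

```typescript
// One run of the agent on a fixed prompt. Resolves to true if the first
// attempt produced what was asked for. How that is checked is an assumption
// left to the caller.
type AgentRun = (prompt: string, skill: string | null) => Promise<boolean>;

// Fraction of trials in which the first attempt succeeded.
async function firstAttemptSuccessRate(
  run: AgentRun,
  prompt: string,
  skill: string | null,
  trials: number,
): Promise<number> {
  let successes = 0;
  for (let i = 0; i < trials; i++) {
    if (await run(prompt, skill)) successes++;
  }
  return trials === 0 ? 0 : successes / trials;
}

// Run the same prompt without and with the skill attached.
async function compareWithSkill(
  run: AgentRun,
  prompt: string,
  skill: string,
  trials: number,
): Promise<{ baseline: number; withSkill: number; improved: boolean }> {
  const baseline = await firstAttemptSuccessRate(run, prompt, null, trials);
  const withSkill = await firstAttemptSuccessRate(run, prompt, skill, trials);
  return { baseline, withSkill, improved: withSkill > baseline };
}
```

If `improved` stays false after a refinement, the notes probably missed the real hole, and the next step is another watched run, not more text in the skill file.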

What This Really Is

This is systematic failure harvesting. Treat the LLM as a system with blind spots and engineer around them.

Prompt. Let it fail. Take notes. Create a skill file out of your notes. Rinse and repeat until you are at a desired success rate.

This is how you create a skill that actually works.

I built a copy-for-LLMs button for Docusaurus. Then Ethereum and Sui shipped it.

port — Mon, 27 Apr 2026 19:01:36 +0000

A few months ago I got tired of selecting docs pages and pasting them into Claude. Half the time the nav came along with the content. So I built docusaurus-plugin-copy-page-button: a one-line install that drops a Copy page button into your Docusaurus sidebar.

When I click the button, I get the page as clean markdown. I also added a dropdown that opens the page directly in ChatGPT, Claude, or Gemini.

Setup:

npm install docusaurus-plugin-copy-page-button

Then one line in docusaurus.config.js:

plugins: ['docusaurus-plugin-copy-page-button']

That's it.

Six months later, I see the plugin running on:

Ethereum execution-apis
Sui, Walrus, Seal, SuiNS (Mysten Labs)
Monad
Flare
Kaia
Nillion
Chronicle

Around 10k installs a month, mostly blockchain ecosystems. I didn't aim at that niche, it just landed there.

What was actually hard

Three things took most of the time.

Content extraction. Docusaurus pages come wrapped in nav, breadcrumbs, edit-this-page links, footers, and a sidebar. The plugin walks the DOM, finds the article container, drops the chrome, and hands the rest to a markdown converter that handles code blocks, tables, lists, and admonitions.

Then SPA route changes. Docusaurus uses client-side navigation. Inject the button on first load and it vanishes when the user clicks a link. The plugin watches popstate, Docusaurus's own events, and URL changes, then re-injects on each route.
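A rough sketch of the re-injection idea, not the plugin's actual code: the watcher below only compares the current path with the last one it saw and fires a callback when they differ. `createRouteWatcher`, `getPath`, `onRouteChange`, and the browser wiring in the comments are all assumptions made for illustration.

```typescript
// Returns a `check` function that calls `onRouteChange` whenever the path
// differs from the last one seen. Taking `getPath` as a parameter keeps the
// sketch runnable outside a browser.
function createRouteWatcher(
  getPath: () => string,
  onRouteChange: (path: string) => void,
): () => void {
  let lastPath = getPath();
  return function check(): void {
    const current = getPath();
    if (current === lastPath) return; // same page, nothing to do
    lastPath = current;
    onRouteChange(current);
  };
}

// In a browser, the wiring might look like this (assumed, not taken from
// the plugin source):
//
//   const check = createRouteWatcher(
//     () => window.location.pathname,
//     () => injectCopyButton(), // hypothetical re-injection function
//   );
//   window.addEventListener("popstate", check); // back/forward navigation
//   setInterval(check, 500); // fallback for pushState-driven changes
```

Calling `check` from several triggers is harmless, because the callback only fires when the path has actually changed.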

And mobile. Docusaurus collapses the TOC sidebar on small screens. The button needs to live somewhere visible without breaking the layout. Took a few iterations.

Try it

If you run a Docusaurus site, install it. If something's missing, open an issue.