<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Andrei Mashukov</title>
    <description>The latest articles on Forem by Andrei Mashukov (@amashukov).</description>
    <link>https://forem.com/amashukov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951696%2Fc16fe1c6-0338-49b2-a65c-e43b06e44a0f.jpeg</url>
      <title>Forem: Andrei Mashukov</title>
      <link>https://forem.com/amashukov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/amashukov"/>
    <language>en</language>
    <item>
      <title>The Missing POP: How I Ported a Yul Contract to Huff by Reading Every Opcode</title>
      <dc:creator>Andrei Mashukov</dc:creator>
      <pubDate>Tue, 26 May 2026 04:08:24 +0000</pubDate>
      <link>https://forem.com/amashukov/the-missing-pop-how-i-ported-a-yul-contract-to-huff-by-reading-every-opcode-16l0</link>
      <guid>https://forem.com/amashukov/the-missing-pop-how-i-ported-a-yul-contract-to-huff-by-reading-every-opcode-16l0</guid>
      <description>&lt;p&gt;&lt;em&gt;A war story about hand-managing the EVM stack, two words of litter left behind a CALL, and the debug trace that finally made the drift visible.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I had a contract that worked. Then I rewrote it in a second language so it would behave &lt;em&gt;the same way, faster&lt;/em&gt; — and it didn't. The bug wasn't a crash. It was worse: the contract ran to completion, returned status &lt;code&gt;1&lt;/code&gt;, and the value it produced was quietly wrong.&lt;/p&gt;

&lt;p&gt;This is the story of that bug, the tool that caught it, and what porting an opcode dispatcher from Yul to Huff taught me about the one thing Yul had been doing for me the whole time without ever saying so.&lt;/p&gt;

&lt;h2&gt;Two implementations of the same thing&lt;/h2&gt;

&lt;p&gt;The contract is an opcode dispatcher — it reads a packed byte-stream of commands and routes funds through multi-hop swap paths across seven DEX families. The whole thing is open source (&lt;a href="https://github.com/AndreyMashukov/adaptive-mev-router" rel="noopener noreferrer"&gt;&lt;code&gt;adaptive-mev-router&lt;/code&gt;&lt;/a&gt; on GitHub), so every snippet below can be read in full context. The detail that matters for this story is that it ships &lt;em&gt;twice&lt;/em&gt;: once in &lt;strong&gt;Yul&lt;/strong&gt; as the reference implementation (&lt;code&gt;MEV_V2.yul&lt;/code&gt;), and once in &lt;strong&gt;Huff&lt;/strong&gt; as a hand-optimised port (&lt;code&gt;MEV_V2.huff&lt;/code&gt;) with an O(1) jump-table dispatch.&lt;/p&gt;

&lt;p&gt;Why two? Yul is readable and gives me a reference I can trust. Huff lets me shave the contract down opcode by opcode and control the dispatch path exactly. The deal is that both must behave identically, and the test suite enforces it. The harness loads both builds as two variants and runs every scenario against each:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadVariants&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MEVJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../artifacts/contracts/MEV_V2.yul/MEV_V2.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;huffRuntime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;huffBinPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Yul&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="na"&gt;abi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MEVJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;abi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MEVJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bytecode&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Huff&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;abi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MEVJson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;abi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;wrapRuntimeBytecode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;huffRuntime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;loadVariants&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`MEV_V2 [&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]: ...`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// every scenario runs once for Yul, once for Huff&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Yul and Huff ever disagree, CI goes red. That harness is the hero of this story. It turned a vague "something feels off" into a precise, reproducible failure. But before it could do that, I had to write the Huff version. And the Huff version is where I met the stack.&lt;/p&gt;

&lt;h2&gt;What Yul never told me&lt;/h2&gt;

&lt;p&gt;Here is the thing I didn't fully appreciate until I left it behind: &lt;strong&gt;Yul manages the stack for you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at one swap handler in the Yul reference. This is the entire V2 adaptive swap, zero-for-one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function swap_v2_adaptive_zfo(cursor) {
    let sig      := shr(224, calldataload(cursor))
    let feeBps   := and(shr(240, calldataload(add(cursor, 4))), 0xFFFF)
    let pair     := shr(96,  calldataload(add(cursor, 6)))
    let tokenIn  := shr(96,  calldataload(add(cursor, 26)))
    let amountIn := shr(144, calldataload(add(cursor, 46)))

    amountIn := resolve_amount(amountIn, tokenIn)            // 0 -&amp;gt; balanceOf(this, tokenIn)
    let amountOut := v2_compute_amount_out(pair, amountIn, feeBps, 1)

    transfer_token(tokenIn, pair, amountIn)
    swap_v2(sig, pair, 0, amountOut, address())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five named values. When I write &lt;code&gt;swap_v2(sig, pair, 0, amountOut, address())&lt;/code&gt;, the compiler decides where &lt;code&gt;sig&lt;/code&gt;, &lt;code&gt;pair&lt;/code&gt;, &lt;code&gt;amountOut&lt;/code&gt; live, in what order they get pushed, which &lt;code&gt;DUP&lt;/code&gt; retrieves each one, and when they get cleaned up. I think in &lt;em&gt;named values&lt;/em&gt;. The compiler thinks in &lt;em&gt;stack slots&lt;/em&gt;. I never have to know the translation.&lt;/p&gt;
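
&lt;p&gt;The shift amounts in that handler imply a packed record layout: &lt;code&gt;shr(224, ...)&lt;/code&gt; keeps 4 bytes, &lt;code&gt;shr(240, ...)&lt;/code&gt; keeps 2, &lt;code&gt;shr(96, ...)&lt;/code&gt; keeps 20, and &lt;code&gt;shr(144, ...)&lt;/code&gt; keeps 14. As a rough illustration only, here is the same decoding sketched in JavaScript over a Node.js &lt;code&gt;Buffer&lt;/code&gt;. The helper name &lt;code&gt;decodeV2Swap&lt;/code&gt; and the sample bytes are made up for this example and are not taken from the repository.&lt;/p&gt;

```javascript
// Field layout implied by the shifts in swap_v2_adaptive_zfo:
//   offset 0,  4 bytes  : sig      (shr 224)
//   offset 4,  2 bytes  : feeBps   (shr 240)
//   offset 6,  20 bytes : pair     (shr 96)
//   offset 26, 20 bytes : tokenIn  (shr 96)
//   offset 46, 14 bytes : amountIn (shr 144)
// Assumes the whole 60-byte record is present; unlike calldataload,
// a Buffer does not zero-pad reads past its end.
function decodeV2Swap(calldata, cursor) {
  function field(offset, length) {
    const start = cursor + offset;
    return calldata.subarray(start, start + length).toString("hex");
  }
  return {
    sig: "0x" + field(0, 4),
    feeBps: parseInt(field(4, 2), 16),
    pair: "0x" + field(6, 20),
    tokenIn: "0x" + field(26, 20),
    amountIn: BigInt("0x" + field(46, 14)),
  };
}

// Hypothetical sample record: placeholder addresses, fee of 30 bps, amount 42.
const sample = Buffer.from(
  "022c0d9f" + "001e" + "11".repeat(20) + "22".repeat(20) + "00".repeat(13) + "2a",
  "hex"
);
const decoded = decodeV2Swap(sample, 0);
console.log(decoded);
```

&lt;p&gt;Every field here has a name and an offset. That is the view Yul lets you keep; the Huff port has to express the same reads as positions on the stack.&lt;/p&gt;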

&lt;p&gt;Huff removes that layer. In Huff you are the compiler's stack allocator. There are no names, only a column of 32-byte words, and you address them by how far down they sit &lt;em&gt;right now&lt;/em&gt;. Here is the tail of the &lt;em&gt;same&lt;/em&gt; swap in the Huff port, the part that sets up the final &lt;code&gt;SWAP_V2&lt;/code&gt; call. Notice the comments running down the right side: a stack diagram after every single instruction.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// SWAP_V2 expects [sig, pair, amount0, amount1, to]
// zfo: amount0=0, amount1=amountOut
address                                  // [to, cursor, limit]
0x620 mload                              // [amountOut, to, cursor, limit]
0x00                                     // [0, amountOut, to, cursor, limit]
dup4 0x06 add calldataload 0x60 shr      // [pair, 0, amountOut, to, cursor, limit]
dup5 calldataload 0xe0 shr               // [sig, pair, 0, amountOut, to, cursor, limit]
SWAP_V2()                                // [cursor, limit]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those comments are not decoration. In Huff the stack layout &lt;em&gt;is&lt;/em&gt; the program state: &lt;code&gt;dup4&lt;/code&gt; only fetches the right value if &lt;code&gt;cursor&lt;/code&gt; is genuinely the fourth word from the top &lt;em&gt;at that exact instruction&lt;/em&gt;, because &lt;code&gt;pair&lt;/code&gt; is decoded from calldata at &lt;code&gt;cursor + 6&lt;/code&gt;. There is one more tell in that snippet, and it's the heart of this whole story: &lt;code&gt;0x620 mload&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;When the stack isn't enough, and why that's the warning sign&lt;/h2&gt;

&lt;p&gt;In Yul, &lt;code&gt;amountIn&lt;/code&gt; and &lt;code&gt;amountOut&lt;/code&gt; are just locals; the compiler keeps them alive across the whole function for free. In the Huff port I couldn't do that. The V2 swap has to call &lt;code&gt;getReserves()&lt;/code&gt; on the pair halfway through to compute &lt;code&gt;amountOut&lt;/code&gt; — and that staticcall writes its result into scratch memory, and the reserve math needs a deep working stack of its own. Trying to also balance &lt;code&gt;amountIn&lt;/code&gt; and &lt;code&gt;amountOut&lt;/code&gt; on top of all that, reachable by &lt;code&gt;dup&lt;/code&gt;, across dozens of intervening opcodes, is exactly the kind of bookkeeping that breaks.&lt;/p&gt;

&lt;p&gt;So in Huff I spilled them to memory on purpose:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dup1 0x600 mstore     // save amountIn  — getReserves() is about to clobber scratch memory
...
dup1 0x620 mstore     // save amountOut — survive until the transfer + swap at the end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That decision — &lt;em&gt;this value lives too long and travels too deep, put it in memory&lt;/em&gt; — is one Yul made silently for me every time. In Huff it's a conscious call, and getting it wrong is a real bug. Which brings me to the bug.&lt;/p&gt;

&lt;h2&gt;The macro that leaves litter on the stack&lt;/h2&gt;

&lt;p&gt;Most of the swap macros in the contract are clean: they take their inputs, push them into the right memory slots, and consume everything. &lt;code&gt;SWAP_CURVE_EXEC&lt;/code&gt; is the textbook case — five inputs in, every one of them spent, nothing left behind.&lt;/p&gt;

&lt;p&gt;Then there's the native-ETH variant. A Curve swap that sends ETH has to pass the amount twice: once written into memory as a call argument, and once as the actual &lt;code&gt;msg.value&lt;/code&gt; of the &lt;code&gt;call&lt;/code&gt;. Which means, unlike every other swap macro, it cannot just consume &lt;code&gt;pool&lt;/code&gt; and &lt;code&gt;amount&lt;/code&gt;; it has to keep them &lt;em&gt;alive&lt;/em&gt; on the stack until the &lt;code&gt;call&lt;/code&gt; itself. Here is the real macro:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#define macro SWAP_CURVE_ETH_EXEC() = takes(5) returns(0) {
    0xe0 shl 0x00 mstore   // sig&amp;lt;&amp;lt;224 at mem[0]. stack: [pool, sellId, buyId, amount]
    swap1 0x04 mstore      // sellId at mem[4]. stack: [pool, buyId, amount]
    swap1 0x24 mstore      // buyId at mem[36]. stack: [pool, amount]
    dup2 0x44 mstore       // amount at mem[68], keep both on stack. stack: [pool, amount]
    0x00 0x64 mstore       // minOut = 0. stack: [pool, amount]
    // call(gas, pool, amount, 0, 132, 0, 32) — amount as msg.value
    0x20 0x00 0x84 0x00    // stack: [0x00, 0x84, 0x00, 0x20, pool, amount]
    dup6 dup6              // stack: [pool, amount, 0x00, 0x84, 0x00, 0x20, pool, amount]
    gas call
    iszero err jumpi
    pop pop                // clean up leftover pool and amount from dup6 dup6
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the last four lines slowly, because they are the whole problem in miniature.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dup6 dup6&lt;/code&gt; reaches deep down the stack and copies &lt;code&gt;pool&lt;/code&gt; and &lt;code&gt;amount&lt;/code&gt; to the top, because &lt;code&gt;call&lt;/code&gt; needs them there as arguments. &lt;code&gt;call&lt;/code&gt; consumes its seven inputs and pushes one result. But the &lt;em&gt;originals&lt;/em&gt; — the &lt;code&gt;pool&lt;/code&gt; and &lt;code&gt;amount&lt;/code&gt; that &lt;code&gt;dup6 dup6&lt;/code&gt; copied &lt;em&gt;from&lt;/em&gt; — are still sitting down there. The &lt;code&gt;call&lt;/code&gt; didn't touch them. They are litter. And the macro is declared &lt;code&gt;returns(0)&lt;/code&gt;: it promises to leave the stack exactly as deep as it found it. So the macro has to end with &lt;code&gt;pop pop&lt;/code&gt; to sweep that litter away by hand.&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;pop pop&lt;/code&gt; is not optional and it is not obvious. It exists only because of the &lt;code&gt;dup6 dup6&lt;/code&gt; a handful of instructions earlier. Forget it, and the macro returns two words heavier than it claims to. Nothing reverts. The next opcode in the dispatcher just finds a stack two slots deeper than its comments assume, and every &lt;code&gt;dup&lt;/code&gt; and &lt;code&gt;swap&lt;/code&gt; it does from that point on reaches for the wrong neighbour.&lt;/p&gt;
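
&lt;p&gt;To make the leftover words concrete, here is a toy stack model of the macro's tail, written in JavaScript. It is a sketch, not an EVM: it tracks labels rather than values, follows only the success path, and the helper names &lt;code&gt;dup&lt;/code&gt;, &lt;code&gt;call&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt; are invented for this example.&lt;/p&gt;

```javascript
// Toy stack model: index 0 is the top of the stack, as in the Huff comments.

function dup(stack, n) {
  // DUPn copies the n-th word from the top; the original stays where it was.
  return [stack[n - 1]].concat(stack);
}

function call(stack) {
  // CALL pops seven words (gas, to, value, argsOffset, argsSize, retOffset,
  // retSize) and pushes a single success flag.
  return ["success"].concat(stack.slice(7));
}

function pop(stack) {
  return stack.slice(1);
}

// State just before `dup6 dup6` in SWAP_CURVE_ETH_EXEC.
let stack = ["0x00", "0x84", "0x00", "0x20", "pool", "amount"];
stack = dup(stack, 6);               // copies amount to the top
stack = dup(stack, 6);               // copies pool to the top
stack = ["gas"].concat(stack);       // GAS pushes the remaining gas
stack = call(stack);
const afterCall = stack.slice();     // success flag plus the two originals

stack = pop(stack);                  // net effect of `iszero err jumpi` when the call succeeded
const withoutCleanup = stack.slice(); // what a macro missing `pop pop` returns
const withCleanup = pop(pop(stack));  // what the real macro returns

console.log(afterCall, withoutCleanup, withCleanup);
```

&lt;p&gt;In this model &lt;code&gt;withoutCleanup&lt;/code&gt; still holds &lt;code&gt;pool&lt;/code&gt; and &lt;code&gt;amount&lt;/code&gt;, while &lt;code&gt;withCleanup&lt;/code&gt; is empty, which is what &lt;code&gt;returns(0)&lt;/code&gt; promises.&lt;/p&gt;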

&lt;h2&gt;The bug that didn't crash&lt;/h2&gt;

&lt;p&gt;That is exactly the bug I shipped.&lt;/p&gt;

&lt;p&gt;Not in &lt;code&gt;SWAP_CURVE_ETH_EXEC&lt;/code&gt; itself — that one I'd already gotten right, and the &lt;code&gt;pop pop&lt;/code&gt; comment is me having learned the lesson once. The bug was in a &lt;em&gt;different&lt;/em&gt; macro, one I wrote later, where I did the same &lt;code&gt;dup&lt;/code&gt;-deep-then-&lt;code&gt;call&lt;/code&gt; pattern and simply did not realise it had left two originals stranded. I'd internalised "call consumes its arguments" and stopped there. But &lt;code&gt;call&lt;/code&gt; consumes the &lt;em&gt;copies&lt;/em&gt; &lt;code&gt;dup&lt;/code&gt; puts on top. It has nothing to say about the originals &lt;code&gt;dup&lt;/code&gt; copied from. Those are mine to clean up, and that time I didn't.&lt;/p&gt;

&lt;p&gt;Here is what made it vicious: &lt;strong&gt;nothing reverted.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The EVM doesn't know a leftover &lt;code&gt;pool&lt;/code&gt; from a meaningful value from any other 32-byte word. It's all just words. The macro returned two words heavier than its &lt;code&gt;returns(0)&lt;/code&gt; signature claimed. The dispatcher continued, every stack comment from that point on now describing a stack two slots shallower than reality, and the next &lt;code&gt;dup&lt;/code&gt; fetched a word two places off from the one I wanted — a different address entirely. The swap was issued with a wrong argument, the transaction ran to the end, and it returned &lt;code&gt;status 1&lt;/code&gt;. Success.&lt;/p&gt;

&lt;p&gt;The Yul variant of the same scenario returned the correct result. The Huff variant returned a different one. The &lt;code&gt;forEach&lt;/code&gt; harness caught the divergence and turned CI red — but all it could tell me was &lt;em&gt;that&lt;/em&gt; the two disagreed, not &lt;em&gt;where&lt;/em&gt;. I had a contract producing a wrong answer with no revert, no error, no line number.&lt;/p&gt;

&lt;h2&gt;Reading every opcode&lt;/h2&gt;

&lt;p&gt;You cannot reason your way out of this from the source. The whole problem is that your reasoning about the stack is what's broken — re-reading the macro just reproduces the same wrong mental model. You need ground truth.&lt;/p&gt;

&lt;p&gt;Ground truth is the execution trace. I ran the failing Huff scenario under a debug tracer and dumped the step-by-step opcode log: every instruction executed, and crucially, &lt;strong&gt;the stack contents after each one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then I did something tedious and completely worth it. I walked the trace one opcode at a time, and beside each line I wrote what I &lt;em&gt;expected&lt;/em&gt; the stack to be. Two columns: what the trace said, what I thought.&lt;/p&gt;

&lt;p&gt;For a while the columns matched — &lt;code&gt;PUSH&lt;/code&gt;, &lt;code&gt;PUSH&lt;/code&gt;, &lt;code&gt;CALLDATALOAD&lt;/code&gt;, fine. Then I reached the &lt;code&gt;CALL&lt;/code&gt; inside the offending macro. On the line &lt;em&gt;after&lt;/em&gt; it, the trace still carried two words my map had already discarded. The columns diverged by exactly two slots, and they never re-converged — every subsequent line was off by the same two.&lt;/p&gt;

&lt;p&gt;That was the whole bug, sitting in the diff between two columns: a missing &lt;code&gt;pop pop&lt;/code&gt;. Two one-byte opcodes. The fix took seconds. &lt;em&gt;Finding&lt;/em&gt; it took the trace.&lt;/p&gt;
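
&lt;p&gt;The two-column comparison can be mechanised. The sketch below assumes a made-up, simplified trace format (an opcode name plus the stack depth observed after it ran) and a list of depths taken from the stack comments. Real tracers emit their own formats, so treat &lt;code&gt;traced&lt;/code&gt;, &lt;code&gt;expected&lt;/code&gt; and &lt;code&gt;firstDivergence&lt;/code&gt; as hypothetical names rather than any tool's API.&lt;/p&gt;

```javascript
// Hypothetical trace rows for the tail of a dup-then-call macro.
// Depths are relative to the macro's own working area.
const traced = [
  { op: "DUP6", depth: 7 },
  { op: "DUP6", depth: 8 },
  { op: "GAS", depth: 9 },
  { op: "CALL", depth: 3 },   // success flag plus two stranded originals
  { op: "ISZERO", depth: 3 },
  { op: "PUSH2", depth: 4 },
  { op: "JUMPI", depth: 2 },
];

// Depths predicted by the faulty mental model, in which CALL leaves
// nothing behind except its success flag.
const expected = [7, 8, 9, 1, 1, 2, 0];

// Returns the first row where the observed depth differs from the expected
// one, or null if the two columns agree all the way down.
function firstDivergence(tracedRows, expectedDepths) {
  const index = tracedRows.findIndex(function (row, i) {
    return row.depth !== expectedDepths[i];
  });
  if (index === -1) {
    return null;
  }
  return {
    index: index,
    op: tracedRows[index].op,
    drift: tracedRows[index].depth - expectedDepths[index],
  };
}

const result = firstDivergence(traced, expected);
console.log(result);
```

&lt;p&gt;Because this format records the depth after each opcode, the drift of two words shows up on the &lt;code&gt;CALL&lt;/code&gt; row itself. A tracer that records the stack before each opcode would show it one row later.&lt;/p&gt;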

&lt;h2&gt;The habit that was already half there&lt;/h2&gt;

&lt;p&gt;Here's the part I'm slightly embarrassed by: the fix wasn't a new technique. It was &lt;em&gt;doing the thing I was already half-doing, properly.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My Huff already had stack comments. Some of them were even notes-to-self mid-calculation. At one point in the V2 amountOut math I'd literally written, inline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0x2710 sub   // stack was [feeBps, ...], push 0x2710 -&amp;gt; [0x2710, feeBps, ...],
             // sub -&amp;gt; 0x2710-feeBps. Correct!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That comment is me reverse-engineering my own stack in real time and &lt;em&gt;verifying it&lt;/em&gt;. I had the discipline in places. What I lacked was the discipline &lt;em&gt;everywhere&lt;/em&gt; — and a missing &lt;code&gt;pop pop&lt;/code&gt; is precisely what a lapse looks like. In the macro that bit me, I'd written the stack diagram down the right margin, then changed the instructions during a later edit and didn't re-derive the diagram beneath the &lt;code&gt;call&lt;/code&gt;. The comment said one thing; the opcodes did another.&lt;/p&gt;

&lt;p&gt;So the lesson wasn't "start commenting the stack." It was: &lt;strong&gt;the stack comment is code. It is the source of truth your &lt;code&gt;dup&lt;/code&gt;/&lt;code&gt;swap&lt;/code&gt; indices are read from, and it has to be updated with the same discipline as the instructions themselves.&lt;/strong&gt; An out-of-date stack comment is exactly as dangerous as an out-of-date mental model — because it &lt;em&gt;is&lt;/em&gt; one, just written down.&lt;/p&gt;

&lt;p&gt;Concretely, the rules I now hold myself to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every line that touches the stack updates the diagram on that line.&lt;/strong&gt; Not the top of the macro — every line. Top-of-stack on the left.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you need a value, read its depth off the current comment and count. Never count from memory.&lt;/strong&gt; The comment is authoritative; the &lt;code&gt;dup&lt;/code&gt; index just obeys it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dup&lt;/code&gt; copies; it does not move.&lt;/strong&gt; Every &lt;code&gt;dup&lt;/code&gt;-deep-then-&lt;code&gt;call&lt;/code&gt; pattern leaves the originals stranded below — &lt;code&gt;call&lt;/code&gt; only consumes the copies on top. If you &lt;code&gt;dup&lt;/code&gt; to reach call arguments, you almost certainly owe a matching &lt;code&gt;pop&lt;/code&gt; afterwards. &lt;code&gt;SWAP_CURVE_ETH_EXEC&lt;/code&gt;'s trailing &lt;code&gt;pop pop&lt;/code&gt; is that debt, paid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A macro's &lt;code&gt;takes&lt;/code&gt;/&lt;code&gt;returns&lt;/code&gt; signature is a contract — verify it.&lt;/strong&gt; &lt;code&gt;returns(0)&lt;/code&gt; means the stack must be exactly as deep on exit as on entry. Walk the macro and prove it. A macro that secretly returns two words heavy corrupts every caller downstream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When a value lives long or travels deep, spill it to memory&lt;/strong&gt; — like &lt;code&gt;0x600&lt;/code&gt;/&lt;code&gt;0x620&lt;/code&gt; for &lt;code&gt;amountIn&lt;/code&gt;/&lt;code&gt;amountOut&lt;/code&gt;. If keeping it on the stack feels fragile, that feeling is correct; that's the Yul compiler's job knocking, and in Huff the job is yours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A wrong answer with clean execution? Suspect the stack first.&lt;/strong&gt; A revert usually means a bad jump or a failed call. A &lt;em&gt;wrong result&lt;/em&gt; with &lt;code&gt;status 1&lt;/code&gt; is the signature of a stack that drifted — leftover litter, or a &lt;code&gt;dup&lt;/code&gt;/&lt;code&gt;swap&lt;/code&gt; that grabbed the wrong neighbour.&lt;/li&gt;
&lt;/ul&gt;
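
&lt;p&gt;The &lt;code&gt;takes&lt;/code&gt;/&lt;code&gt;returns&lt;/code&gt; rule in particular lends itself to a mechanical check. As a minimal sketch, assuming a hand-transcribed opcode list and a small table of net stack effects, the JavaScript below sums the effects along the success path and compares the total with the declared signature. The names &lt;code&gt;NET_EFFECT&lt;/code&gt;, &lt;code&gt;netStackDelta&lt;/code&gt; and &lt;code&gt;signatureDrift&lt;/code&gt; are invented for illustration and are not part of any Huff tooling.&lt;/p&gt;

```javascript
// Net stack change per opcode (pushes minus pops). "push" covers Huff
// literals and jump labels; "dup" and "swap" cover every DUPn and SWAPn.
const NET_EFFECT = {
  push: 1, dup: 1, swap: 0, gas: 1, iszero: 0,
  shl: -1, pop: -1, mstore: -2, jumpi: -2, call: -6,
};

function netStackDelta(ops) {
  return ops.reduce(function (delta, op) {
    if (!(op in NET_EFFECT)) {
      throw new Error("unknown op: " + op);
    }
    return delta + NET_EFFECT[op];
  }, 0);
}

// Positive drift means the macro exits heavier than its signature claims.
function signatureDrift(ops, takes, returns) {
  return netStackDelta(ops) - (returns - takes);
}

// SWAP_CURVE_ETH_EXEC without its last line, one entry per instruction.
// Only the fall-through (success) path is modelled.
const body = [
  "push", "shl", "push", "mstore",
  "swap", "push", "mstore",
  "swap", "push", "mstore",
  "dup", "push", "mstore",
  "push", "push", "mstore",
  "push", "push", "push", "push",
  "dup", "dup", "gas", "call",
  "iszero", "push", "jumpi",
];
const withCleanup = body.concat(["pop", "pop"]);

const driftWithCleanup = signatureDrift(withCleanup, 5, 0);
const driftWithoutCleanup = signatureDrift(body, 5, 0);
console.log(driftWithCleanup, driftWithoutCleanup);
```

&lt;p&gt;With the trailing &lt;code&gt;pop pop&lt;/code&gt; the drift is zero; without it the macro comes out two words heavy. A count like this only checks depth, not which value sits in which slot, so it complements the per-line stack diagram rather than replacing it.&lt;/p&gt;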

&lt;h2&gt;Why the trace beat everything else&lt;/h2&gt;

&lt;p&gt;The bug was an invisible disagreement between my model of the stack and the EVM's actual stack. Source review can't fix that — the review is done by the same broken model that wrote the bug. A debugger that shows only &lt;em&gt;values&lt;/em&gt; doesn't help much either, because every value is a 32-byte word and they all look alike; a wrong address is indistinguishable from a right one until you know which slot it &lt;em&gt;should&lt;/em&gt; have come from.&lt;/p&gt;

&lt;p&gt;What the opcode-level trace gives you is the &lt;strong&gt;shape&lt;/strong&gt; of the stack at every step, independent of your assumptions. It's the one artifact in the toolchain that doesn't share your mental model. Line it up against your expectations and the divergence point is the bug — not "near the bug," &lt;em&gt;the&lt;/em&gt; bug, the exact instruction where reality and intention split.&lt;/p&gt;

&lt;h2&gt;What I'd tell anyone starting with Huff&lt;/h2&gt;

&lt;p&gt;Huff is wonderful for what it is: total control, no compiler between you and the bytecode, every opcode chosen by you. But "no compiler between you and the bytecode" means the compiler's stack allocator is now a job on &lt;em&gt;your&lt;/em&gt; desk, and it is a real job with a real failure mode.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Respect what the high-level language was doing for you.&lt;/strong&gt; Yul's stack management isn't a convenience — it's an entire correctness layer. Take it over deliberately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain the stack diagram as code.&lt;/strong&gt; Inline, every line, updated as rigorously as the instructions. Your &lt;code&gt;dup&lt;/code&gt;/&lt;code&gt;swap&lt;/code&gt; indices are &lt;em&gt;reads&lt;/em&gt; from that diagram.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When behaviour diverges and nothing crashes, go straight to the opcode trace.&lt;/strong&gt; Don't re-read the source. Walk the trace beside your expectations and find the slot where they part.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a reference implementation and test against it relentlessly.&lt;/strong&gt; The Yul-vs-Huff &lt;code&gt;forEach&lt;/code&gt; harness didn't find the bug for me, but it's the reason I knew there &lt;em&gt;was&lt;/em&gt; one. An executable specification you can't argue with beats any amount of careful reading.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The payoff: Yul vs Huff, measured&lt;/h2&gt;

&lt;p&gt;Once the two implementations produced identical results on every scenario, the harness handed me something else for free. Every scenario runs against both variants and logs &lt;code&gt;receipt.gasUsed&lt;/code&gt;, so I got a direct, apples-to-apples gas comparison: same test, same calldata, two compilers.&lt;/p&gt;

&lt;p&gt;Huff's hand-built O(1) jump table wins consistently on dispatcher-heavy opcodes; on I/O-dominated swaps the gap shrinks to as little as 51 gas. A selection of measured numbers (Solidity 0.8.28, EVM Cancun, &lt;code&gt;viaIR&lt;/code&gt;, optimizer &lt;code&gt;runs=200&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Yul&lt;/th&gt;
&lt;th&gt;Huff&lt;/th&gt;
&lt;th&gt;Δ (Huff − Yul)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x02&lt;/code&gt; V3 swap zfo (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;56 851&lt;/td&gt;
&lt;td&gt;56 800&lt;/td&gt;
&lt;td&gt;−51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x04&lt;/code&gt; Balancer V2 zfo (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;81 794&lt;/td&gt;
&lt;td&gt;81 644&lt;/td&gt;
&lt;td&gt;−150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x08&lt;/code&gt; &lt;code&gt;wrap_weth&lt;/code&gt; (adaptive)&lt;/td&gt;
&lt;td&gt;37 907&lt;/td&gt;
&lt;td&gt;37 782&lt;/td&gt;
&lt;td&gt;−125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x0A&lt;/code&gt; &lt;code&gt;unwrap_weth&lt;/code&gt; (adaptive)&lt;/td&gt;
&lt;td&gt;34 097&lt;/td&gt;
&lt;td&gt;33 921&lt;/td&gt;
&lt;td&gt;−176&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x0B&lt;/code&gt; &lt;code&gt;transfer_eth&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;31 454&lt;/td&gt;
&lt;td&gt;31 290&lt;/td&gt;
&lt;td&gt;−164&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x0C&lt;/code&gt; &lt;code&gt;transfer_erc20&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;50 249&lt;/td&gt;
&lt;td&gt;50 009&lt;/td&gt;
&lt;td&gt;−240&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x0D&lt;/code&gt; &lt;code&gt;balance_check&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;27 756&lt;/td&gt;
&lt;td&gt;27 504&lt;/td&gt;
&lt;td&gt;−252&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x0E&lt;/code&gt; &lt;code&gt;sweep&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;33 775&lt;/td&gt;
&lt;td&gt;33 458&lt;/td&gt;
&lt;td&gt;−317&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x19&lt;/code&gt; Balancer V1 zfo (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;82 730&lt;/td&gt;
&lt;td&gt;82 257&lt;/td&gt;
&lt;td&gt;−473&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x1A&lt;/code&gt; Balancer V1 ofz&lt;/td&gt;
&lt;td&gt;82 880&lt;/td&gt;
&lt;td&gt;82 313&lt;/td&gt;
&lt;td&gt;−567&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x1B&lt;/code&gt; Fluid zfo (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;72 234&lt;/td&gt;
&lt;td&gt;71 624&lt;/td&gt;
&lt;td&gt;−610&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x1C&lt;/code&gt; Fluid ofz (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;55 171&lt;/td&gt;
&lt;td&gt;54 535&lt;/td&gt;
&lt;td&gt;−636&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x1D&lt;/code&gt; DODO zfo (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;86 338&lt;/td&gt;
&lt;td&gt;85 696&lt;/td&gt;
&lt;td&gt;−642&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;0x1E&lt;/code&gt; DODO ofz (&lt;code&gt;amount=0&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;69 298&lt;/td&gt;
&lt;td&gt;68 640&lt;/td&gt;
&lt;td&gt;−658&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;V2 flash 3-hop chain&lt;/td&gt;
&lt;td&gt;146 486&lt;/td&gt;
&lt;td&gt;145 781&lt;/td&gt;
&lt;td&gt;−705&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandwich backrun (resolve + check + sweep)&lt;/td&gt;
&lt;td&gt;89 950&lt;/td&gt;
&lt;td&gt;89 152&lt;/td&gt;
&lt;td&gt;−798&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Negative Δ means Huff is cheaper. The gap widens exactly where you'd expect: the more dispatching a scenario does relative to actual I/O, the more the hand-built jump table pulls ahead. The 3-hop flash chain and the sandwich backrun — the dispatcher-heaviest scenarios — show the biggest savings, around 700–800 gas.&lt;/p&gt;
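
&lt;p&gt;For a sense of scale, the Δ column can be recomputed and expressed relative to the Yul baseline. The snippet below is a small illustrative calculation over three rows copied from the table. The &lt;code&gt;gasDiff&lt;/code&gt; helper is hypothetical and is not the repository's gas-diff CI.&lt;/p&gt;

```javascript
// Three rows copied from the table above: [label, Yul gas, Huff gas].
const rows = [
  ["0x02 V3 swap zfo", 56851, 56800],
  ["V2 flash 3-hop chain", 146486, 145781],
  ["Sandwich backrun", 89950, 89152],
];

function gasDiff(row) {
  const delta = row[2] - row[1];            // negative means Huff is cheaper
  const percent = (delta / row[1]) * 100;   // relative to the Yul baseline
  return { label: row[0], delta: delta, percent: percent.toFixed(2) + "%" };
}

const report = rows.map(gasDiff);
console.table(report);
```

&lt;p&gt;In relative terms each of these three savings is under one percent of the Yul figure, which is worth keeping in mind when weighing the gas win against the cost of maintaining the stack by hand.&lt;/p&gt;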

&lt;p&gt;But notice what every row in that table depends on. The numbers are only meaningful because the &lt;em&gt;behaviour&lt;/em&gt; column is identical first. A faster implementation that returns a different answer isn't an optimisation — it's the bug I spent this whole article describing. The gas win is real, but it's a footnote to the actual achievement: two independent implementations, in two languages, that a test suite cannot tell apart.&lt;/p&gt;

&lt;p&gt;The fix to my bug was two opcodes: a &lt;code&gt;pop pop&lt;/code&gt; that should have been there and wasn't. I think about that a lot. The cost of the mistake and the cost of the fix were wildly mismatched, and the only thing that closed the gap between them was being willing to read every single opcode until the stack told me the truth.&lt;/p&gt;

&lt;p&gt;In Huff, the stack always tells you the truth. You just have to be looking at it instead of at your idea of it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The full contract — both the Yul reference and the Huff port, the &lt;code&gt;forEach&lt;/code&gt; parity harness, the fork tests, and the gas-diff CI — is on GitHub: &lt;a href="https://github.com/AndreyMashukov/adaptive-mev-router" rel="noopener noreferrer"&gt;&lt;strong&gt;github.com/AndreyMashukov/adaptive-mev-router&lt;/strong&gt;&lt;/a&gt;. The &lt;code&gt;SWAP_CURVE_ETH_EXEC&lt;/code&gt; macro in this article lives in &lt;a href="https://github.com/AndreyMashukov/adaptive-mev-router/blob/main/contracts/MEV_V2.huff" rel="noopener noreferrer"&gt;&lt;code&gt;contracts/MEV_V2.huff&lt;/code&gt;&lt;/a&gt;; its Yul counterpart is in &lt;a href="https://github.com/AndreyMashukov/adaptive-mev-router/blob/main/contracts/MEV_V2.yul" rel="noopener noreferrer"&gt;&lt;code&gt;contracts/MEV_V2.yul&lt;/code&gt;&lt;/a&gt;. Stars and issues welcome — and if you spot a stack comment that's drifted, you now know exactly what to look for.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>solidity</category>
      <category>ethereum</category>
      <category>evm</category>
      <category>mev</category>
    </item>
  </channel>
</rss>
