<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lei Tang</title>
    <description>The latest articles on Forem by Lei Tang (@lei_tang_a062427a571e681f).</description>
    <link>https://forem.com/lei_tang_a062427a571e681f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3913065%2F92d4963c-e7d4-4bd7-a70b-cf6936afc0a0.jpg</url>
      <title>Forem: Lei Tang</title>
      <link>https://forem.com/lei_tang_a062427a571e681f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lei_tang_a062427a571e681f"/>
    <language>en</language>
    <item>
      <title>AWS Account Security Restricted? Here's My 56-Hour Unblock Journey</title>
      <dc:creator>Lei Tang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:04:53 +0000</pubDate>
      <link>https://forem.com/lei_tang_a062427a571e681f/aws-account-security-restricted-heres-my-56-hour-unblock-journey-4pb</link>
      <guid>https://forem.com/lei_tang_a062427a571e681f/aws-account-security-restricted-heres-my-56-hour-unblock-journey-4pb</guid>
      <description>&lt;h1&gt;
  
  
  AWS Account Security Restricted? Here's My 56-Hour Unblock Journey
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;A production incident triggered by leaked Access Keys — the complete process from receiving the AWS security notice to full account recovery&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How It Started: A Chilling Email
&lt;/h2&gt;

&lt;p&gt;It was Saturday afternoon when I received an email from AWS — &lt;strong&gt;and it was the second day of China's Labor Day holiday.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We are reaching out to you because your AWS Account may have been inappropriately accessed by a third-party."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation: &lt;strong&gt;Your AWS account might have been compromised.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Along with the notice came account restrictions — I couldn't start EC2 instances, couldn't use certain AWS services. And to make things worse, we had a production EC2 instance running in Singapore serving overseas users.&lt;/p&gt;

&lt;p&gt;The feeling at that moment? Only those who've been through it can truly understand.&lt;/p&gt;

&lt;p&gt;My carefully planned holiday vanished the moment I opened that email. For the next three days, I basically lived inside the AWS Console and Support Center.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwi78aufpshboe1esdaf.png" alt=" "&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Phase 1: Initial Response (Hours 0-6)
&lt;/h2&gt;

&lt;p&gt;AWS provided clear recovery steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Change the Root account password&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enable MFA for the Root user&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review CloudTrail logs for suspicious activity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clean up all unauthorized IAM users, roles, and policies&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reply to the support case confirming completion&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I immediately logged into the console and followed each step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Changed the Root password&lt;/li&gt;
&lt;li&gt;Enabled MFA&lt;/li&gt;
&lt;li&gt;Checked CloudTrail logs — there were indeed unusual API calls&lt;/li&gt;
&lt;li&gt;Removed all suspicious IAM resources&lt;/li&gt;
&lt;/ul&gt;
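
&lt;p&gt;For reference, the same checks can also be done from the AWS CLI. A minimal sketch, assuming a configured admin profile; the event name, user name, and key ID below are just examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Look for suspicious IAM activity in CloudTrail (example: CreateUser events)
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateUser \
  --max-results 50

# Audit IAM users and their access keys
aws iam list-users
aws iam list-access-keys --user-name suspicious-user

# Deactivate a leaked key before deleting it
aws iam update-access-key --user-name suspicious-user \
  --access-key-id AKIAEXAMPLEKEYID --status Inactive
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;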

&lt;p&gt;After completing everything, I replied to the case: "I've completed all required steps. Please restore my account."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I thought this would be resolved quickly. Little did I know, it was just the beginning.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 2: Root Cause Found, Escalation Requested (Hours 6-18)
&lt;/h2&gt;

&lt;p&gt;During the investigation, our dev team finally identified the root cause of the key leakage — &lt;strong&gt;a code vulnerability in our project.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We confirmed that all related code had been fixed. I urgently asked AWS to start the Singapore EC2 instance, which had been down for over 4 hours — overseas business was completely paralyzed and users were complaining non-stop.&lt;/p&gt;

&lt;p&gt;The error message when trying to start the instance was devastating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This account is currently blocked and not recognized as a valid account.
Please contact AWS Support if you have questions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point I realized: &lt;strong&gt;Security restrictions aren't automatically lifted once you complete the steps. There's a review process behind it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 3: Security Review, Answering Questions (Hours 18-24)
&lt;/h2&gt;

&lt;p&gt;On Sunday morning, the AWS security team responded with four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Which location (country/city) are you accessing the account from?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Does anyone else have access to your account (e.g., IT staff)?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If yes, where are they accessing from?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Are all active IAM users and roles authorized by you?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I answered truthfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I use a VPN to access my account — common nodes include the US, Singapore, and Hong Kong&lt;/li&gt;
&lt;li&gt;No one else has access to my account&lt;/li&gt;
&lt;li&gt;All IAM users and roles are authorized by me&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security team said they'd forwarded my case for review and asked me to wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I kept refreshing the case page. Every minute felt like an hour.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 4: Cleaning Up Suspicious Resources (Hours 24-30)
&lt;/h2&gt;

&lt;p&gt;Monday morning, the security team found a suspicious S3 bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;Bucket&lt;/span&gt; &lt;span class="py"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;logs-94591786-wmn5rp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They asked me to confirm whether this bucket was needed for my business — if not, delete it.&lt;/p&gt;

&lt;p&gt;I checked and confirmed it wasn't ours (likely created by the attacker), deleted it immediately, and replied to the case.&lt;/p&gt;
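
&lt;p&gt;For anyone facing the same request, the cleanup itself is two commands. A rough sketch; it's destructive, so double-check the bucket really isn't yours first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Inspect the bucket contents, then remove the bucket and everything in it
aws s3 ls s3://logs-94591786-wmn5rp --recursive
aws s3 rb s3://logs-94591786-wmn5rp --force
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;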

&lt;p&gt;&lt;strong&gt;I thought this would be the last step. I was wrong again.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 5: 48 Hours Passed, at My Wit's End (Hours 30-48)
&lt;/h2&gt;

&lt;p&gt;It had been over 48 hours since the initial notification. The account was still restricted. Production was down. Overseas business was paralyzed.&lt;/p&gt;

&lt;p&gt;I sent another strong message through the case:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We've completed all security measures, but our account remains restricted. If this can't be resolved soon, we'll have no choice but to consider migrating our production services to Alibaba Cloud or other providers."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This wasn't an empty threat — &lt;strong&gt;every minute of downtime was causing real damage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS replied that they were working with their service team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;While others were posting vacation photos on social media, I was writing essays in AWS support tickets.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 6: Missed Details, Chat to the Rescue (Hours 48-52)
&lt;/h2&gt;

&lt;p&gt;Monday evening, I received an unexpected email from AWS:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The team was unable to perform the required remediation steps. There are some actions needed from your side."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They needed me to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Delete unauthorized IAM resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM user "Server": Delete the role or add MFA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Reset Root password and update MFA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But I'd already done all of this! Looking closer, I found that the MFA for IAM user "Server" hadn't been properly recognized.&lt;/p&gt;

&lt;p&gt;Instead of replying through the case, I chose &lt;strong&gt;Chat support&lt;/strong&gt; this time.&lt;/p&gt;

&lt;p&gt;The agent, Melissa, was very professional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirmed my MFA setup for IAM user "Server"&lt;/li&gt;
&lt;li&gt;Asked me to reset the Root password and update MFA again&lt;/li&gt;
&lt;li&gt;Confirmed everything was in order and updated the service team immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key takeaway: In urgent situations, Chat is far more efficient than case replies. Use it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Chapter: Unblocked! (Hours 52-56)
&lt;/h2&gt;

&lt;p&gt;Monday at 21:48, I finally received the email I'd been waiting for:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;*"We've verified that you've taken the required steps, and we've reinstated your AWS account."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The account was finally unblocked! All restrictions were lifted.&lt;/p&gt;

&lt;p&gt;From Saturday 14:06 to Monday 21:48 — &lt;strong&gt;56 hours in total.&lt;/strong&gt; My entire Labor Day holiday was consumed by AWS support tickets. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1js3dc4cz9ipoos0no.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1js3dc4cz9ipoos0no.png" alt=" "&gt;&lt;/a&gt;And the production environment was down for roughly the same duration — an entire weekend plus Monday.&lt;/p&gt;


&lt;h2&gt;
  
  
  Complete Timeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0zj872w11p8uy19m4er.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0zj872w11p8uy19m4er.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: All times below occurred during China's Labor Day holiday.&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sat 14:06   — Received AWS security notice, account restricted
Sun 01:08   — Completed initial security steps, requested restoration
Sun 04:14   — Found root cause (code vulnerability), requested escalation
Sun 04:22   — EC2 down for 4h+, urgent request to start instance
Sun 10:38   — Security team asked questions (location, personnel, etc.)
Sun 10:45   — Answered, case forwarded for review
Mon 09:54   — Asked to delete suspicious S3 bucket
Mon 11:06   — Bucket deleted, requested restoration + escalation
Mon 12:11   — 48h+ still restricted, filed strong complaint
Mon 15:07   — AWS confirmed, asked service team to remove restrictions
Mon 18:02   — New requirements: handle IAM Server + reset password
Mon 18:07   — Chat support, completed operations with guidance
Mon 21:48   — Account restored!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Lessons Learned &amp;amp; Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Security First, Prevention Over Cure
&lt;/h3&gt;

&lt;p&gt;The root cause was &lt;strong&gt;a code vulnerability that leaked Access Keys.&lt;/strong&gt; Improvements we made afterward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removed all hardcoded credentials from the codebase&lt;/li&gt;
&lt;li&gt;Started using AWS Secrets Manager or similar services for key management&lt;/li&gt;
&lt;li&gt;Configured &lt;code&gt;git-secrets&lt;/code&gt; in git repos to prevent key commits&lt;/li&gt;
&lt;li&gt;Set up regular Access Key rotation&lt;/li&gt;
&lt;/ul&gt;
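
&lt;p&gt;The &lt;code&gt;git-secrets&lt;/code&gt; setup is quick. A minimal sketch, run once per repo; the Secrets Manager secret name is just a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install the pre-commit hook and register the AWS credential patterns
git secrets --install
git secrets --register-aws

# Fetch credentials at runtime instead of hardcoding them
aws secretsmanager get-secret-value --secret-id prod/app/db-credentials \
  --query SecretString --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;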

&lt;h3&gt;
  
  
  2. Understand the AWS Unblock Process
&lt;/h3&gt;

&lt;p&gt;The full process looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Notice → Remediation Steps → Security Review → Follow-up Actions → Final Review → Unblocked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The entire process can take over 48 hours.&lt;/strong&gt; If production is involved, have a contingency plan ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Choose the Right Support Channel
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Response Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Case Reply&lt;/td&gt;
&lt;td&gt;Routine communication, providing info&lt;/td&gt;
&lt;td&gt;Slow (hours)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat&lt;/td&gt;
&lt;td&gt;Urgent issues, need guidance&lt;/td&gt;
&lt;td&gt;Fast (instant)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone&lt;/td&gt;
&lt;td&gt;Critical emergencies&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For production outages, &lt;strong&gt;go directly to Chat or Phone.&lt;/strong&gt; Don't wait for case replies.&lt;/p&gt;

&lt;p&gt;Also, always communicate in English — most AWS security team members don't read Chinese, so using Chinese means it'll go through a translation layer first, adding unnecessary delay.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Configure Security Contacts in Advance
&lt;/h3&gt;

&lt;p&gt;AWS strongly recommends setting up &lt;strong&gt;alternate security contacts.&lt;/strong&gt; This way, if something goes wrong with your account, AWS can reach you through multiple channels.&lt;/p&gt;
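
&lt;p&gt;This takes a minute in the console (Account settings), or via the CLI. A sketch with placeholder contact details:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws account put-alternate-contact \
  --alternate-contact-type SECURITY \
  --name "Security Team" \
  --title "Security Lead" \
  --email-address security@example.com \
  --phone-number "+1-555-0100"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;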

&lt;h3&gt;
  
  
  5. Set Up Budget Alerts and Cost Anomaly Detection
&lt;/h3&gt;

&lt;p&gt;One thing I discovered through this incident — AWS Cost Anomaly Detection and budget alerts can notify you of unusual spending as soon as it happens. If we'd set these up, we might have detected the anomaly before AWS did.&lt;/p&gt;
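
&lt;p&gt;Both take only a few minutes to configure. A minimal budget-alert sketch via the CLI; the account ID, amount, and email are placeholders, and Cost Anomaly Detection can be enabled similarly with &lt;code&gt;aws ce create-anomaly-monitor&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Alert by email once actual spend passes 80% of a 100 USD monthly budget
aws budgets create-budget --account-id 123456789012 \
  --budget '{"BudgetName":"monthly-cap","BudgetLimit":{"Amount":"100","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}' \
  --notifications-with-subscribers '[{"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":80},"Subscribers":[{"SubscriptionType":"EMAIL","Address":"ops@example.com"}]}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;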




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Technically:&lt;/strong&gt; Key leakage sounds like something that happens to "other people," but a single code vulnerability can make it your reality. Security isn't a switch — it's a continuous process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedurally:&lt;/strong&gt; AWS's security review process can be frustrating, but on the flip side, it's protecting your account. If unblocking were trivial, the whole line of defense would be meaningless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mentally:&lt;/strong&gt; 56 hours of downtime is painful, but instead of complaining about the process, focus on how to accelerate communication and prepare thorough materials.&lt;/p&gt;

&lt;p&gt;The bottom line — &lt;strong&gt;configure your security contacts, budget alerts, and key management before production goes down.&lt;/strong&gt; These things always feel like "not urgent" when things are calm. By the time the storm hits, it's already too late.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you ever experienced a cloud provider security restriction? Share your story in the comments — I'd love to hear about your experience and what you learned.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
    </item>
    <item>
      <title>Your Go Binary Won't Run on macOS? syspolicyd Is the Culprit</title>
      <dc:creator>Lei Tang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:00:57 +0000</pubDate>
      <link>https://forem.com/lei_tang_a062427a571e681f/your-go-binary-wont-run-on-macos-syspolicyd-is-the-culprit-5h01</link>
      <guid>https://forem.com/lei_tang_a062427a571e681f/your-go-binary-wont-run-on-macos-syspolicyd-is-the-culprit-5h01</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A bizarre Go project debugging saga — full investigation assisted by Claude Code&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Strange Bug
&lt;/h2&gt;

&lt;p&gt;It was an ordinary afternoon. I was developing a Go project on my Mac as usual. Code written, &lt;code&gt;go build&lt;/code&gt; succeeded, everything looked normal — then &lt;code&gt;./server&lt;/code&gt; started and &lt;strong&gt;nothing happened.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No log output. No port listening. No error messages. The process was created, but it seemed frozen in place — RSS stuck at 32 KB.&lt;/p&gt;

&lt;p&gt;Even stranger, &lt;code&gt;air&lt;/code&gt; (my hot-reload tool) ran perfectly fine — and it's also a Go-compiled binary. Only &lt;strong&gt;newly compiled&lt;/strong&gt; binaries refused to run.&lt;/p&gt;

&lt;p&gt;If you've ever run into something similar, you know the drill: first you suspect your code, then the Go version, then you start questioning your life choices.&lt;/p&gt;

&lt;p&gt;Fortunately, I used &lt;strong&gt;Claude Code&lt;/strong&gt; (Anthropic's AI coding assistant) throughout this investigation. It was like having an experienced SRE sitting next to me, helping narrow things down step by step. Here's the complete AI-assisted debugging process.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Investigation: One Elimination at a Time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Verify the Build
&lt;/h3&gt;

&lt;p&gt;First, I confirmed it wasn't a code issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build ./...    &lt;span class="c"&gt;# Passed, zero errors&lt;/span&gt;
go run ./cmd/server/  &lt;span class="c"&gt;# Process created, but… no output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking the process state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;PID   STAT  RSS  COMMAND
92672 SN      32  server
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An RSS of just 32 KB (&lt;code&gt;ps&lt;/code&gt; reports RSS in 1024-byte units) basically means the process was created but never actually executed any code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Check Logs (Claude Code's Auto-Discovery)
&lt;/h3&gt;

&lt;p&gt;I asked Claude Code to check the project startup. It automatically read &lt;code&gt;logs/debug.log&lt;/code&gt; and noticed the log file timestamps hadn't been updated — meaning &lt;code&gt;InitLogger()&lt;/code&gt; was never reached.&lt;/p&gt;

&lt;p&gt;The code was stuck somewhere, very early in initialization. &lt;strong&gt;Claude Code quickly narrowed the problem to the process initialization phase by analyzing log timestamps and process memory.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Confirm Binary Integrity (Claude Code's Automatic Check)
&lt;/h3&gt;

&lt;p&gt;Claude Code ran &lt;code&gt;file&lt;/code&gt; and &lt;code&gt;otool -l&lt;/code&gt; to analyze the binary:&lt;/p&gt;
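
&lt;p&gt;Roughly, that boils down to two commands you can run yourself (a sketch, with &lt;code&gt;./server&lt;/code&gt; standing in for whatever you built):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Quick integrity checks on the freshly built binary
file ./server        # should report a Mach-O 64-bit executable
otool -l ./server    # load commands should parse cleanly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;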

&lt;p&gt;The binary was compiled correctly, the Mach-O structure was intact — not a corrupted file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Rule Out CGO
&lt;/h3&gt;

&lt;p&gt;I suspected a CGO dynamic linking issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 go build &lt;span class="nt"&gt;-o&lt;/span&gt; /tmp/testhello ./cmd/server/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: &lt;strong&gt;Same failure.&lt;/strong&gt; CGO ruled out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Rule Out Code Signing
&lt;/h3&gt;

&lt;p&gt;I tried signing the binary manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codesign &lt;span class="nt"&gt;-s&lt;/span&gt; - &lt;span class="nt"&gt;--force&lt;/span&gt; /tmp/testhello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: &lt;strong&gt;Same failure.&lt;/strong&gt; Signing ruled out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Rule Out Quarantine Attributes
&lt;/h3&gt;

&lt;p&gt;macOS tags files downloaded from the internet with &lt;code&gt;com.apple.quarantine&lt;/code&gt; or &lt;code&gt;com.apple.provenance&lt;/code&gt; attributes. I checked and the newly compiled binary did have &lt;code&gt;com.apple.provenance&lt;/code&gt;, but the working &lt;code&gt;air&lt;/code&gt; binary had the same attribute — so that wasn't the cause.&lt;/p&gt;
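
&lt;p&gt;The attribute comparison is easy to repeat. A quick sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List extended attributes on both binaries and compare
xattr -l ./server
xattr -l "$(which air)"

# If quarantine had been the problem, removing the attribute would be enough
xattr -d com.apple.quarantine ./server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;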

&lt;h3&gt;
  
  
  Step 7: The Breakthrough — syspolicyd (Claude Code's Key Finding)
&lt;/h3&gt;

&lt;p&gt;After six steps ruled out all the usual suspects, I started suspecting a system-level issue. &lt;strong&gt;Claude Code suggested checking system processes&lt;/strong&gt; and ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps aux | &lt;span class="nb"&gt;grep &lt;/span&gt;syspolicyd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result was shocking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;root 478  98.7%  /usr/libexec/syspolicyd
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CPU usage at 98.7%.&lt;/strong&gt; This process had been running for 6 days and 16 hours, essentially stuck.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Connecting the Dots (Claude Code's Root Cause Analysis)
&lt;/h3&gt;

&lt;p&gt;After discovering the &lt;code&gt;syspolicyd&lt;/code&gt; anomaly, &lt;strong&gt;Claude Code immediately laid out the complete causal chain:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;syspolicyd&lt;/code&gt; is the System Policy Daemon. One of its responsibilities is verifying code signatures for new binaries. When the system launches a new process, the kernel calls &lt;code&gt;amfid&lt;/code&gt; (Apple Mobile File Integrity), which in turn calls &lt;code&gt;syspolicyd&lt;/code&gt; for security validation. If &lt;code&gt;syspolicyd&lt;/code&gt; is stuck, the verification request never completes, and the process remains in a loading state, never executing any code.&lt;/p&gt;

&lt;p&gt;This also explained why RSS was only 32 KB — the binary was mmap'd into memory (accounting for those 32 KB), but its code was never executed.&lt;/p&gt;

&lt;p&gt;It also explained why &lt;code&gt;air&lt;/code&gt; worked fine — it was already loaded and didn't need re-verification.&lt;/p&gt;
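
&lt;p&gt;If you want corroborating evidence on your own machine, the unified log and &lt;code&gt;ps&lt;/code&gt; are the places to look. A rough sketch; output varies between macOS versions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Recent syspolicyd activity; a healthy daemon is mostly quiet
log show --last 10m --predicate 'process == "syspolicyd"' | tail -50

# CPU and elapsed time of the daemon itself
ps -axo pid,pcpu,etime,comm | grep syspolicyd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;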




&lt;h2&gt;
  
  
  Root Cause Analysis
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frequent go build of new binaries
    ↓
amfid frequently calls syspolicyd for verification
    ↓
Verification requests pile up, queue gets blocked
    ↓
syspolicyd CPU at 100%, enters a deadlock
    ↓
New binaries get stuck at the verification stage
    ↓
Process RSS=32, code never executes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Trigger conditions:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Long uptime without reboot&lt;/strong&gt; — my Mac had been running for 6 days 16 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent compilation of new binaries&lt;/strong&gt; — repeated &lt;code&gt;go build&lt;/code&gt; during development generates many new Mach-O files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Possible compounding factor&lt;/strong&gt; — macOS notarization checks timing out and retrying, further exacerbating the issue&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Method 1: Reboot (Simple and Effective)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;shutdown &lt;span class="nt"&gt;-r&lt;/span&gt; now
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After rebooting, &lt;code&gt;syspolicyd&lt;/code&gt; resets naturally and everything returns to normal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 2: Just Restart syspolicyd (No Reboot Needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;killall syspolicyd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system automatically restarts it, and CPU usage drops back to normal levels. This is the fastest fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 3: Verify the Issue Is Resolved
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check syspolicyd CPU usage&lt;/span&gt;
ps aux | &lt;span class="nb"&gt;grep &lt;/span&gt;syspolicyd

&lt;span class="c"&gt;# Normal operation should be 0-1% CPU&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How to Prevent It
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reboot your Mac weekly&lt;/strong&gt; — macOS &lt;code&gt;syspolicyd&lt;/code&gt; tends to misbehave after long uptime. Regular reboots are the simplest prevention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch your CPU during development&lt;/strong&gt; — glance at Activity Monitor occasionally. If &lt;code&gt;syspolicyd&lt;/code&gt; spikes, deal with it immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep macOS updated&lt;/strong&gt; — Apple has fixed similar issues in later releases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't hesitate to kill it&lt;/strong&gt; — if you notice abnormal CPU usage, run &lt;code&gt;sudo killall syspolicyd&lt;/code&gt; right away. Don't wait for it to lock up.&lt;/li&gt;
&lt;/ol&gt;
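
&lt;p&gt;For point 2, a tiny shell helper is enough. A sketch you could drop into your shell profile; the 50% threshold is arbitrary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Warn if syspolicyd looks stuck and suggest the quick fix
cpu=$(ps -axo pcpu,comm | awk '/\/usr\/libexec\/syspolicyd$/ {print int($1)}')
if [ "${cpu:-0}" -gt 50 ]; then
  echo "syspolicyd at ${cpu}% CPU - consider: sudo killall syspolicyd"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;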




&lt;h2&gt;
  
  
  Reflections
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AI-Assisted Debugging Experience
&lt;/h3&gt;

&lt;p&gt;This investigation was conducted entirely with &lt;strong&gt;Claude Code.&lt;/strong&gt; The experience was like pair-programming with a seasoned SRE:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I just said "the project won't start," and it automatically ran &lt;code&gt;go build&lt;/code&gt;, &lt;code&gt;go run&lt;/code&gt;, and analyzed logs&lt;/li&gt;
&lt;li&gt;When it found the binary process with an RSS of 32 KB, I didn't even need to do the math — it told me directly "the process was created but never executed any code"&lt;/li&gt;
&lt;li&gt;After systematically ruling out CGO, code signing, and quarantine attributes, it proactively suggested checking system process status and eventually pinpointed &lt;code&gt;syspolicyd&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;From problem discovery to solution (&lt;code&gt;sudo killall syspolicyd&lt;/code&gt;), it took less than 20 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This really drove home the point: &lt;strong&gt;AI-assisted debugging doesn't replace human experience — it amplifies it.&lt;/strong&gt; An experienced engineer might figure this out faster, but having AI handle all the repetitive checks, cross-references, and log analysis dramatically accelerates the process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Back to the Tech
&lt;/h3&gt;

&lt;p&gt;What this investigation taught me: &lt;strong&gt;sometimes the problem isn't in your code — it's in your operating system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When developing Go projects, if compilation succeeds but execution fails, most people's first instinct is to check the code, configuration, and dependencies. But if none of those pan out, consider the system level — a stuck system daemon is all it takes to keep your binary frozen at the starting line.&lt;/p&gt;

&lt;p&gt;macOS is generally stable, but &lt;code&gt;syspolicyd&lt;/code&gt; seems to be a weak link. Especially for Go developers, frequent compilation triggers its verification mechanism more often than you'd think, making it more prone to breaking down.&lt;/p&gt;

&lt;p&gt;If you ever encounter a Go binary that compiles but won't run, with RSS stuck at 32 KB — don't rush to reinstall your system. Check &lt;code&gt;syspolicyd&lt;/code&gt; first. Chances are, that's your culprit.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you encountered similar macOS development environment issues? Share your debugging stories in the comments. And if you've used AI-assisted debugging tools, I'd love to hear about your experience too.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>claude</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
