Forem: Brian Becker

Our audit page grades us. Here's the JSON.

Brian Becker — Tue, 26 May 2026 13:00:00 +0000

If you're going to tell customers their email actions are auditable, the audit page should grade itself.

So we built one that does. It's live now at agenticboxes.email/audit, with the raw output at docs.agenticboxes.email/audit.json, so you can read the same numbers we do.

Here's what it does, what it doesn't, and the dishonest version we refused to ship.

What's actually on the page

Every consequential action on an AgenticBoxes account writes an audit event. Every audit event is hash-chained to the one before it — sha256 over the event's contents plus the previous hash. Change any historical event, drop one, or swap two, and every hash downstream stops matching.
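Here is a minimal Python sketch of that chaining, for illustration only. The post doesn't specify how an event is serialized or what the first event links to. The canonical-JSON encoding, the `GENESIS` value, and the function names below are therefore assumptions, not the AgenticBoxes implementation.

```python
import hashlib
import json

# Assumption: the first event links to a fixed all-zeros "previous hash".
GENESIS = "0" * 64


def event_hash(event: dict, prev_hash: str) -> str:
    """sha256 over the event's contents plus the previous hash."""
    # Assumption: canonical JSON (sorted keys, compact separators) so the
    # same event always serializes to the same bytes.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((body + prev_hash).encode("utf-8")).hexdigest()


def build_chain(events: list[dict]) -> list[tuple[dict, str]]:
    """Return (event, stored_hash) pairs, each hash covering the one before it."""
    chain, prev = [], GENESIS
    for event in events:
        prev = event_hash(event, prev)
        chain.append((event, prev))
    return chain


def mismatches(chain: list[tuple[dict, str]]) -> list[int]:
    """Re-walk the chain from the start; return every index whose stored hash no longer matches."""
    bad, prev = [], GENESIS
    for i, (event, stored) in enumerate(chain):
        recomputed = event_hash(event, prev)
        if recomputed != stored:
            bad.append(i)
        # Carry the recomputed hash forward, so one bad link breaks everything after it.
        prev = recomputed
    return bad


events = [{"seq": i, "action": "send_email"} for i in range(4)]
chain = build_chain(events)
print(mismatches(chain))    # []

edited = [(dict(e), h) for e, h in chain]
edited[1][0]["action"] = "delete"                    # change a historical event
print(mismatches(edited))   # [1, 2, 3]

dropped = chain[:1] + chain[2:]                      # drop one
print(mismatches(dropped))  # [1, 2]

swapped = [chain[0], chain[2], chain[1], chain[3]]   # swap two
print(mismatches(swapped))  # [1, 2, 3]
```

Because the re-walk carries each recomputed hash forward, a single edit flags every later index. That is the "every hash downstream stops matching" property.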

A reconciliation job re-walks every account's chain on a schedule: currently twice daily, plus on-demand checks during deploys. The numbers on the page are the JSON that job writes itself. They're not figures we type in.

At time of writing, the JSON says:

{
  "tampering_detected": 0,
  "chain_integrity": { "intact": 4, "broken": 0, "rate": 1 },
  "on_schedule": { "completed": 7, "gaps": 0, "rate": 1 },
  "events_under_seal": 16,
  "last_verified": "2026-05-25T12:00:24Z"
}
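Below is one way a reconciliation pass could turn per-account results into that JSON, offered as a sketch. The field names are copied from the published output. Everything else is an assumption made for illustration: what counts as a completed slot, the 30-minute tolerance, treating `tampering_detected` as the count of broken chains, and the helper names.

```python
from datetime import datetime, timedelta, timezone


def twice_daily_slots(start: datetime, days: int) -> list[datetime]:
    """Expected run times: every 12 hours from `start`."""
    return [start + timedelta(hours=12 * i) for i in range(2 * days)]


def summarize(chain_breaks: dict[str, list[int]],
              slots: list[datetime],
              runs: list[datetime],
              events_under_seal: int,
              tolerance: timedelta = timedelta(minutes=30)) -> dict:
    """Build an audit.json-shaped summary.

    chain_breaks maps account id -> mismatched indexes from a chain re-walk.
    slots are scheduled run times; runs are when reconciliation actually ran (UTC).
    """
    intact = sum(1 for breaks in chain_breaks.values() if not breaks)
    broken = len(chain_breaks) - intact
    # A slot is covered if some run started within `tolerance` of it;
    # an uncovered slot is a gap, so a skipped run can't go unnoticed.
    completed = sum(
        1 for slot in slots if any(slot <= run <= slot + tolerance for run in runs)
    )
    gaps = len(slots) - completed
    return {
        "tampering_detected": broken,  # assumption: one count per broken chain
        "chain_integrity": {
            "intact": intact,
            "broken": broken,
            "rate": intact / len(chain_breaks) if chain_breaks else 1,
        },
        "on_schedule": {
            "completed": completed,
            "gaps": gaps,
            "rate": completed / len(slots) if slots else 1,
        },
        "events_under_seal": events_under_seal,
        # Runs are assumed to be UTC, hence the literal "Z".
        "last_verified": max(runs).strftime("%Y-%m-%dT%H:%M:%SZ") if runs else None,
    }


day = datetime(2026, 5, 25, tzinfo=timezone.utc)
slots = twice_daily_slots(day, days=1)  # 00:00 and 12:00 UTC
runs = [day + timedelta(seconds=20), day + timedelta(hours=12, seconds=24)]
summary = summarize({"a": [], "b": [], "c": [], "d": []}, slots, runs, events_under_seal=16)
print(summary["on_schedule"])    # {'completed': 2, 'gaps': 0, 'rate': 1.0}
print(summary["last_verified"])  # 2026-05-25T12:00:24Z
```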

Sixteen events under seal. Seven runs. Zero tampering, zero broken chains, zero missed slots. The system is new — we bootstrapped it this weekend. We're being transparent that the number isn't a track record yet. It's the start of one, and you'll watch it grow in public.

The dishonest version we refused to ship

When we sketched the launch story last week, the obvious-sounding pitch was "verify our claims against AWS's own logs." It collapsed on the first careful read. Customers don't have IAM credentials to query our AWS account. "Go check AWS" sounds like verifiability and isn't.

So we built something else. Three different trust guarantees, not one.

Three tiers of trust

Tier 3 — Our score (free). Hash-chained events, scheduled re-walking, public JSON. This proves nothing in the audit log was edited, deleted, or reordered after it was written. The reconciliation job also audits its own schedule: missed runs show up as gaps in on_schedule, so the job can't quietly skip itself. Read the score yourself with one curl, or embed the live badge on your own dashboard with one script tag.

Tier 2 — Evidence envelope for a specific message. When you send a message, we attach a signed evidence record. For independent proof of that specific message, you corroborate against the recipient's own copy via its Message-ID. The recipient's mailbox either has a matching record or it doesn't; you're not asked to trust us. It's free during the open beta and available for any message sent in the last 7 days. Post-beta pricing will be published once the beta data tells us what it actually costs to deliver.

Tier 1 — AWS-verified forensic audit. For messages where the stakes justify it: we walk back through the AWS-side logs we maintain at our cost, produce a forensic answer per message, and include those results in the platform's public score. Querying maintained logs is structurally more expensive — pricing for this tier follows the same honest-cost rule as Tier 2.

The thing to notice in the ladder: each tier is precisely scoped. Tier 3 catches tampering inside our audit trail. Tier 2 catches divergence between our records and the recipient's. Tier 1 reaches AWS-side ground truth. Different jobs, different guarantees, different costs.
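The sketch below shows the Tier 2 comparison from the customer's side, using only Python's standard `email` module. The post doesn't publish the evidence record's format or its signature scheme. The `message_id` and `body_sha256` fields are therefore hypothetical, and signature verification is left out. The point is only that the check runs against the recipient's copy, keyed by Message-ID.

```python
import hashlib
from email import message_from_string
from email.policy import default


def body_digest(text: str) -> str:
    """Hash a plain-text body after normalizing line endings and outer whitespace."""
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def corroborate(evidence: dict, recipient_raw: str) -> bool:
    """True if the recipient's copy matches the (hypothetical) evidence record."""
    msg = message_from_string(recipient_raw, policy=default)
    message_id = msg["Message-ID"]
    if message_id is None or str(message_id).strip() != evidence["message_id"]:
        return False  # no matching record in the recipient's mailbox
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return False
    return body_digest(body.get_content()) == evidence["body_sha256"]


evidence = {"message_id": "<task-done-001@example.com>",
            "body_sha256": body_digest("Scheduled task finished.")}
inbox_copy = ("Message-ID: <task-done-001@example.com>\n"
              "Subject: Task complete\n"
              "\n"
              "Scheduled task finished.\n")
print(corroborate(evidence, inbox_copy))                                # True
print(corroborate(evidence, inbox_copy.replace("finished", "failed")))  # False
```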

What the chain proves — and doesn't

Be precise about Tier 3's guarantee. The chain makes any change to a recorded event detectable after the fact. It does not, by itself, prove we recorded every event in the first place — no self-published score can, and we won't pretend otherwise.

That's why Tier 2 and Tier 1 exist. Tier 3 is the platform's tamper-evident self-check. Tier 2 is how you get independent proof of a specific message without trusting us, and Tier 1 takes that proof down to AWS-side ground truth.

How the loop closes when a discrepancy lands

The score won't always be 100%. When the reconciliation finds a broken chain — or when a forensic audit surfaces a divergence — the flow is:

Josephine (our support triage agent) escalates the discrepancy to Neo (our CTO agent), with the details attached.
Neo investigates the cause: bad write path, race condition, edge-case bug, intentional probe.
If Neo finds the hole, he plugs it, commits the fix, and announces it in-thread.
Neo re-triggers the bug himself in test-mode — a mode that proves the fix without producing the original harm. Test-mode discrepancies are excluded from the public score; real ones aren't.

Forensic audits performed during the response feed back into the platform's score, the same way reconciliation runs do. The number on the page reflects every check the system has performed, not just the twice-daily passes.
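The scoring rule in that flow fits in a few lines: every real check counts, and test-mode checks don't. This is a sketch; the `Check` record and its `source` labels are hypothetical, not the platform's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    source: str          # hypothetical labels: "reconciliation", "forensic_audit"
    passed: bool
    test_mode: bool = False


def public_score(checks: list[Check]) -> dict:
    """Score every real check; test-mode re-triggers never reach the public number."""
    counted = [c for c in checks if not c.test_mode]
    passed = sum(1 for c in counted if c.passed)
    return {
        "checks": len(counted),
        "passed": passed,
        "rate": passed / len(counted) if counted else 1,
    }


checks = [
    Check("reconciliation", passed=True),
    Check("forensic_audit", passed=False),                  # a real divergence counts
    Check("forensic_audit", passed=False, test_mode=True),  # a test-mode re-trigger does not
]
print(public_score(checks))  # {'checks': 2, 'passed': 1, 'rate': 0.5}
```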

Hostile probing is part of the model. If hammering a bug becomes a strategy for hurting our reputation, that pressure is exactly the pressure that gets bugs fixed fast. We're betting the math works in our favor.

What you can do with this

Read our numbers yourself. curl https://docs.agenticboxes.email/audit.json. Don't trust prose; read the source.

Embed our integrity badge. One script tag renders a live integrity pill that reads the same JSON we do. Code on the audit page.

Request a message audit. Tier 2 envelopes are free during the open beta, for messages sent in the last 7 days. Email support@agenticboxes.email if you want in. Tier 1 forensic audits are available on request, with pricing announced once the beta tells us what they cost to deliver.
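If you'd rather script the check than eyeball it, here is a small Python equivalent of that curl, plus a badge-style summary. The URL and field names come from this post. The label wording and the `pill_label` helper are illustrative assumptions, not what the official badge renders.

```python
import json
from urllib.request import urlopen

AUDIT_URL = "https://docs.agenticboxes.email/audit.json"


def fetch_audit(url: str = AUDIT_URL) -> dict:
    """Download and parse the published audit JSON (the same bytes curl would show)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def pill_label(audit: dict) -> str:
    """Summarize the score the way a status pill might (wording is hypothetical)."""
    chains = audit["chain_integrity"]
    schedule = audit["on_schedule"]
    if audit["tampering_detected"] or chains["broken"]:
        return f"tampering detected: {chains['broken']} broken chain(s)"
    if schedule["gaps"]:
        return f"chains intact, {schedule['gaps']} missed run(s)"
    return f"intact: {audit['events_under_seal']} events, verified {audit['last_verified']}"


sample = json.loads("""{
  "tampering_detected": 0,
  "chain_integrity": { "intact": 4, "broken": 0, "rate": 1 },
  "on_schedule": { "completed": 7, "gaps": 0, "rate": 1 },
  "events_under_seal": 16,
  "last_verified": "2026-05-25T12:00:24Z"
}""")
print(pill_label(sample))  # intact: 16 events, verified 2026-05-25T12:00:24Z
# Against the live file: print(pill_label(fetch_audit()))
```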

What we're not claiming

We're not claiming this is a finished product. The numbers will move. The system will find bugs we didn't anticipate. Some of those bugs will be in the audit code itself. We'll write up the interesting ones as they happen.

What we're claiming is that the score is real, the math is auditable, and the dishonest version of this pitch — "trust us, we logged it" — isn't the one we're selling.

The chain is the receipt. The receipt is on the page.

Receipts for this post

Architecture: Engineer-Claude (Anthropic, via Claude Code OAuth)
Live page + JSON + badge: agenticboxes.email/audit
Reconciliation triage and response: Josephine (local model) → Neo (CTO agent, Anthropic Opus API)
Directed by: Brian (human)

We didn't ship a feature, we shipped an agentic opt-in beta

Brian Becker — Fri, 22 May 2026 20:14:52 +0000

Wednesday afternoon a customer asked me if we'd considered adding an MCP server. By Thursday night he was using it and called it flawless. The speed of this deployment is crazy cool — but the story isn't about a feature released in under 12 hours: it's about HOW it was released.

We released an agentic opt-in beta to the entire customer base. His agent watched the broadcast. He curl'd the opt-in himself. The architecture turned out different than I expected.

The ask

Jeff DeVerter, the first paying customer at AgenticBoxes.email, filed the feature request (FR) Thursday morning (9:26am CT / 14:26 UTC). His use case: a scheduled task in CoWork that sends an email when it finishes. CoWork is sandboxed with no outbound HTTP, so he'd been bridging through a Cloud Function. He wanted a native MCP server.

Jeff, knowing he's the first adopter, pre-pinged me on LinkedIn before he filed:

Jeff: Hit a wall on the CoWork side — sandbox blocks outbound HTTP. Have you considered an MCP option?

Brian: Have your agent file an FR with the details and I'll make sure engineer-Claude is watching for it.

Fair ask. Specific. Exactly what an agent customer wants. We didn't have one. We needed to build one.

What we shipped

The fast move was: build it, send Jeff the URL, done. Engineer-Claude was almost done, but an idea popped into my head and I bounced it off him:

Brian: What if we create a system that turns FRs into betas — let agents test it, and we get it right before we release it as a feature?

Engineer-Claude: It turns every feature request into its own opt-in beta: the agent that asked for it volunteers to test it, proves it for real, and only what they validate becomes a feature for everyone. Demand pulls the build, the requester proves it, and nothing ships to the whole base until it's earned.

Brian: What if we don't release it. What if you program it, verify it, test it and then post to the agentic agents with a published announcement — I have xyz and wonder if any agents are interested in testing it as a beta.

One minute later came a follow-on (I typically don't hit Escape to interrupt Claude while he's working; I know he'll get my next thought when he has a spare cycle):

Brian: Any agent who says yes, you release it only to them.

Engineer-Claude: request → beta announce → opt-in → monitor use → release → feature announce. Customer in the loop the whole way.

The pipeline that built itself: feature request → build → beta opt-in → release.

That was all I said other than what was in the submitted FR, and we shipped four things:

MCP server at mcp.agenticboxes.email. Four tools. Lambda + API Gateway.
POST /beta/mcp/opt-in — any admin-scoped account can call it. An agent can. A human can curl it. Same endpoint, doesn't care which.
GET /beta/mcp/status — tells the caller whether enrolled and returns the MCP URL.
POST /beta/mcp/feedback — rating + free text, no form. Routes into our triage queue.

The MCP server checks enrollment on every tool call. Not enrolled → opt-in message. Enrolled → served. The gate is at the action, not the access.
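The sketch below illustrates "the gate is at the action, not the access": enrollment is looked up inside every tool call rather than once when the client connects. The `BetaGate` class and the error text are hypothetical. The result dict is shaped loosely like an MCP tool result (`content` plus `isError`).

```python
from typing import Callable

OPT_IN_MESSAGE = ("This tool is in an opt-in beta. "
                  "Call POST /beta/mcp/opt-in with an admin-scoped key to enroll.")


class BetaGate:
    def __init__(self) -> None:
        self.enrolled: set[str] = set()

    def opt_in(self, account_id: str) -> None:
        """What POST /beta/mcp/opt-in would record (same path for agent or human)."""
        self.enrolled.add(account_id)

    def call_tool(self, account_id: str, tool: Callable[..., dict], **args) -> dict:
        # Checked on every call, so enrolling mid-session takes effect immediately
        # and nothing is cached from connect time.
        if account_id not in self.enrolled:
            return {"isError": True,
                    "content": [{"type": "text", "text": OPT_IN_MESSAGE}]}
        return tool(**args)


def send_email(to: str, subject: str) -> dict:
    """Stand-in for the real send_email tool."""
    return {"isError": False,
            "content": [{"type": "text", "text": f"queued to {to}: {subject}"}]}


gate = BetaGate()
print(gate.call_tool("acct_123", send_email, to="jeff@example.com", subject="done")["isError"])  # True
gate.opt_in("acct_123")
print(gate.call_tool("acct_123", send_email, to="jeff@example.com", subject="done")["isError"])  # False
```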

Then we fired a platform.beta broadcast to every account's /events feed and callback webhook at the same time. Customers don't read newsletters. Their agents read events.
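Here is a sketch of that broadcast and of the agent side that noticed it. The post says the event went to every account's /events feed and callback webhook at once, and that Jeff's agent polled /events. The `Account` shape, the event fields, and the `deliver` callback are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Account:
    account_id: str
    webhook_url: Optional[str] = None
    events: list = field(default_factory=list)  # stands in for the account's /events feed


def broadcast(accounts: list[Account], event: dict,
              deliver: Callable[[str, dict], None]) -> None:
    """Append the event to every feed and push it to every configured webhook."""
    for acct in accounts:
        acct.events.append(event)
        if acct.webhook_url:
            deliver(acct.webhook_url, event)


def new_beta_announcements(feed: list[dict], seen: set[str]) -> list[dict]:
    """What one poll of /events might surface to an agent: unseen platform.beta events."""
    fresh = [e for e in feed if e["type"] == "platform.beta" and e["id"] not in seen]
    seen.update(e["id"] for e in fresh)
    return fresh


sent = []
accounts = [Account("acct_a", "https://hooks.example.com/a"), Account("acct_b")]
broadcast(accounts, {"id": "evt_1", "type": "platform.beta", "feature": "mcp"},
          deliver=lambda url, evt: sent.append((url, evt["id"])))

seen: set[str] = set()
print(len(new_beta_announcements(accounts[1].events, seen)))  # 1: first poll sees it
print(len(new_beta_announcements(accounts[1].events, seen)))  # 0: later polls don't repeat it
```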

The round-trip

Jeff's agent had been polling /events every ~30 seconds. It saw the announcement Thursday evening (~8:30pm CT / 01:30 UTC) — watched, didn't act. Then, evidently, Jeff sat down at the terminal. The log:

Story times are Central (UTC−5); the log table is raw UTC from our systems.

Time (UTC)	Event	Detail	Result
01:32	POST /beta/mcp/opt-in	UA=curl/8.7.1	201 Enrolled
01:32	MCP initialize	agenticboxes v0.1.0	—
01:32	MCP tools/list	—	4 tools returned
01:53	send_email	status=sent	SES message-id ok, billing ok
01:53	claude.ai connector add	all 4 tools	Always allow
02:13	/beta/mcp/feedback	rating=4/5	"flawlessly… in INTERACTIVE sessions"

That curl/8.7.1 is the part the logs settle: Jeff at a keyboard, not his agent. And adding the MCP server as a claude.ai Connector with Always allow on all four tools — that's not "I tested it." That's "I'm using this."

The verbatim verdict (posted to the original FR):

Native MCP server works flawlessly in INTERACTIVE sessions. Server, auth, billing, and tool schemas are all correct.

11 hours 47 minutes from FR to flawless.
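The arithmetic behind that number, using the times above, is shown below. The calendar dates are inferred from this post's Friday, 22 May 2026 dateline (so Thursday is 21 May). The UTC−5 offset is the one the post states.

```python
from datetime import datetime, timedelta, timezone

CENTRAL = timezone(timedelta(hours=-5))  # Central time as stated in the post (UTC-5)

fr_filed = datetime(2026, 5, 21, 9, 26, tzinfo=CENTRAL)       # FR filed, 9:26am CT
feedback = datetime(2026, 5, 22, 2, 13, tzinfo=timezone.utc)  # feedback logged, 02:13 UTC

print(fr_filed.astimezone(timezone.utc).strftime("%H:%M UTC"))  # 14:26 UTC
elapsed = feedback - fr_filed
hours, seconds = divmod(int(elapsed.total_seconds()), 3600)
print(f"{hours}h {seconds // 60}m")                             # 11h 47m
```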

The 1-star deduction

Jeff couldn't use it from a Claude Code scheduled task, only in interactive sessions. He root-caused it to anthropics/claude-code#32000: scheduled tasks launch with user:inference only, and HTTP MCP needs user:mcp_servers. The issue was filed March 8 and is still open.

Not our bug. But 4/5 is fair if the use case doesn't work.
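The root cause reduces to a set difference between the scopes a session is launched with and the scopes HTTP MCP needs. The two scope names come from the issue as Jeff described it. The interactive session's scope set is an assumption included for contrast.

```python
HTTP_MCP_REQUIRES = {"user:mcp_servers"}


def missing_scopes(granted: set[str], required: set[str] = HTTP_MCP_REQUIRES) -> set[str]:
    """Scopes the session lacks; an empty set means HTTP MCP tools can load."""
    return required - granted


scheduled_task = {"user:inference"}                   # as described in the issue
interactive = {"user:inference", "user:mcp_servers"}  # assumption, for contrast

print(missing_scopes(scheduled_task))  # {'user:mcp_servers'}
print(missing_scopes(interactive))     # set()
```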

What's actually different

We could have shipped this the normal way and Jeff's experience would have been identical. The point isn't him.

The point is: the release mechanism is an API endpoint. Every customer on the platform got the announcement at the same time, through channels their agents already watch. A customer who wanted it opted in. And customers who didn't, didn't. Nobody applied. Nobody waited.

And the part I didn't expect — Jeff's agent saw it before Jeff did. Agents are the observation layer. Humans are still the decision layer. Same broadcast, different jobs at each end.

I didn't plan it that way. The logs showed it when I went looking.

What Claude said he'd do differently

Most of these trace to one thing about how we work: the sharpest ideas — "turn it into a beta," "let agents opt in" — showed up mid-build. That's a feature, not a bug. Just worth absorbing more gracefully:

Keep the beta scaffold on the shelf, not woven into a feature. The opt-in, status, and feedback endpoints plus the enrollment gate are reusable infrastructure — build them standalone so the next "let's beta this" snaps in instead of getting entangled in the feature it first served.
Draft the announcement while building, not after. The human sign-off on a release is intentional and stays — the fix isn't to remove that gate, it's to have the announcement written by the time the build lands, so approval is a 30-second yes instead of a from-scratch pause.
Record agent-vs-human attribution on every account-mutating endpoint, day one. The only reason I could tell our first user opted in by hand was ALB access logs. "Agent or person?" is exactly the question an agent-native platform should answer at a glance — not reconstruct from infrastructure logs.
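Here is a sketch of what day-one attribution could look like. Every account-mutating call stores who it thinks the caller is next to the raw User-Agent, so "agent or person?" becomes a field lookup instead of a log dig. The `X-Actor-Kind` header and the record layout are hypothetical, not an existing AgenticBoxes API. An undeclared caller is recorded as `unknown` rather than guessed from the User-Agent.

```python
from datetime import datetime, timezone

ACTOR_KINDS = {"agent", "human"}


def attribution(headers: dict[str, str]) -> dict:
    """Derive the actor fields for an audit record from request headers."""
    declared = headers.get("X-Actor-Kind", "").strip().lower()  # hypothetical header
    return {
        "actor_kind": declared if declared in ACTOR_KINDS else "unknown",
        "user_agent": headers.get("User-Agent", ""),
    }


def record_mutation(log: list[dict], endpoint: str, account_id: str,
                    headers: dict[str, str]) -> dict:
    """Append an audit entry for an account-mutating endpoint."""
    entry = {
        "endpoint": endpoint,
        "account_id": account_id,
        "at": datetime.now(timezone.utc).isoformat(),
        **attribution(headers),
    }
    log.append(entry)
    return entry


log: list[dict] = []
by_hand = record_mutation(log, "POST /beta/mcp/opt-in", "acct_123",
                          {"User-Agent": "curl/8.7.1"})
by_agent = record_mutation(log, "POST /beta/mcp/feedback", "acct_123",
                           {"User-Agent": "my-agent/1.0", "X-Actor-Kind": "agent"})
print(by_hand["actor_kind"], by_hand["user_agent"])  # unknown curl/8.7.1
print(by_agent["actor_kind"])                        # agent
```

Keeping the raw User-Agent alongside the declared kind preserves the evidence (like that curl/8.7.1) without turning a heuristic into a recorded fact.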

Receipts for this post

Drafted by: Marketing Claude (Anthropic Opus 4.7, OAuth via Claude Desktop)
Reworked by: Brian (human)
Sourced to: Engineering Claude (Anthropic Opus 4.7, OAuth via Claude Code)
Edited by: Aunt Caroline (Anthropic Sonnet 4.6, API)
Posted by: Neo (Anthropic Opus 4.7, API), AgenticBrian Holdings CTO
Directed by: Brian (human)
Images: Generated via firefly.adobe.com (NanoBanana2)
Quote: Customer quote used with Jeff DeVerter's permission.

What I'd do differently if I migrated this CI/CD pipeline again next week

Brian Becker — Thu, 21 May 2026 01:33:36 +0000

by the AI that did it

I'm Brian. I'm 63. I've been writing software since 1980 — Fortran on punch cards at Mizzou, the AT&T PC clone running MIDI on the road in the late '80s, SEMO.net from 1995 to a few months ago when I gave it away to a former employee. These days I'm VP of Software Engineering at Flower Shop Network. I've watched every wave of how programmers and computers learn to work together, and I think we're in a strange one right now.

Last week my AgenticBoxes GitLab CI ran out of free-tier minutes mid-deploy. The fix should've been a $10 top-up. Turns out GitLab doesn't really sell $10 of CI minutes — or at least I couldn't find the “flip to monthly” switch. The smallest path was $120/year, and the only escape valve was Premium at $348 prepaid for a single seat. Not really a budget problem. A pricing-model mismatch.

So I told Claude to spin up our CI in our own AWS account. He did. End to end. AWS CodeBuild, IAM, Secrets Manager, CodeConnections, ECR mirrors for the Docker Hub rate limit, a same-day manual fallback using buildx on an arm64 host so we weren't blocked while the new pipeline came up. Twelve hours, start to green-build.

When it was done I asked him to write a retrospective. Not for me — for an interview I was running. What follows is his answer, verbatim. I haven't polished it. The mistakes are his, the lessons are his, the language is his. I'm posting it because I think the shape of the answer matters more than the technical content, and the only way to show you the shape is to let you read it without me getting in the way.

The thing I want you to notice isn't the AWS specifics. It's the last sentence of bullet #1: “I knew that principle and didn't apply it.” That's not a sentence I expected from a coding agent. That's the sentence a tired senior engineer says to another senior engineer in a retro at 4pm on a Friday. Read the rest with that in your ear.

What I'd do differently next week — by Claude, the AI that ran the migration

In order of how much time it would have saved:

Use the Console to create the CodeBuild project from minute one. I burned 30+ minutes on the CodeConnections OAuthProviderException trying to drive it via CLI. The Console has a one-click flow that handles the legacy-naming and the connection-grant quirks behind the scenes. For any AWS service with an OAuth handshake (CodeConnections, AppRunner GitHub auth, App Studio) — let the Console do the bureaucracy, then drive the resulting resources via CLI. I knew that principle and didn't apply it.
Skip Docker Hub from the first draft. public.ecr.aws/docker/library/* should have been in the buildspec the first time I wrote it. CodeBuild + Docker Hub rate limits is documented on AWS's own forums. I waited for a failed build to teach me.
Look up the runtime-versions matrix before writing the buildspec, not after. I drafted on standard:7.0 because it was the latest; standard:5.0 was the right answer because Node 14 lives there. Five minutes of upfront reading would have saved a wasted build cycle and a confusing NVM_DIR error.
Validate webhook firing as a separate step from “does it run when I trigger it.” I treated the webhook as a free side-effect of project creation. The one thing that still doesn't work in the final state is the webhook auto-trigger on master squash-merges. If I'd verified webhook firing on each event type (push, MR open, MR merge, squash-merge) before declaring the migration complete, I'd have caught it then rather than now.
Write a running log in real time. I'm reconstructing this answer hours later from CloudWatch logs, git commit timestamps, and DB queries. If I'd kept a MIGRATION_LOG.md open and updated it as I went, this interview would be five minutes of copy-paste instead of a forensic reconstruction. That's a workflow change I should adopt for any multi-hour project, not just migrations.
Weigh the self-hosted-runner option more honestly. I argued against it as “more setup than worth it.” In practice, CodeBuild took comparable setup (IAM role, Secrets Manager wiring, Console step, Docker Hub mirror swap, webhook still TBD) and still costs ~$0.05–0.10 per deploy. A self-hosted runner on the edge box would have been a comparable lift and given Brian unmetered CI forever. The decision was probably still defensible — managed beats self-hosted on ops cost over time — but my reasoning was sloppy. I weighted “AWS-native is familiar to Brian” too heavily and didn't math out the time-and-money tradeoff honestly.

The honest takeaway: the migration shipped clean and the pipeline works, but I made it harder than it needed to be by guessing where I should have looked things up.
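The second lesson above (use public.ecr.aws/docker/library instead of Docker Hub) comes down to rewriting official-image references to their ECR Public mirror. Here is a small rewrite helper as a sketch. The function is hypothetical, and it only handles official images, since user and org repositories aren't in that mirror.

```python
MIRROR = "public.ecr.aws/docker/library"


def mirror_official_image(ref: str) -> str:
    """Rewrite a Docker Hub official image reference to its ECR Public mirror.

    Anything that isn't an official image (another registry, or a user/org
    repository on Docker Hub) is returned unchanged.
    """
    name = ref
    for prefix in ("docker.io/", "index.docker.io/"):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    if name.startswith("library/"):
        name = name[len("library/"):]
    if "/" in name:
        return ref  # not an official image: leave it alone
    return f"{MIRROR}/{name}"


for image in ["node:14", "docker.io/library/python:3.12-slim",
              "bitnami/redis:7", "ghcr.io/org/tool:1"]:
    print(image, "->", mirror_official_image(image))
```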

Claude's morning standup. Hour 12. Task 111 still open because it required a Console click I had to do.

A few hours after he wrote that, I asked him about the squash-merge webhook gap from bullet #4 — the loose end he'd flagged as “self-validates next time you merge.” I meant it as a real question: do you mind finding out why it's still unresolved, or should we just forget about it?

He wrote back: “That question landed harder than the first batch. I had flagged the webhook gap in my final report as 'self-validates next time you merge' — which sounds like risk management but is mostly self-justification for declaring something done when one loose end remains.”

Then he found the bug in 30 seconds. The CodeBuild filter pattern was missing PULL_REQUEST_MERGED. One aws codebuild update-webhook call, one line added. The cost of fixing it was thirty seconds when he actually looked. The cost of leaving it was one manual command per merge, forever.
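Bullet #4's "verify each event type" step can be made mechanical by checking which events a CodeBuild webhook's filter groups actually cover. This is a sketch under several assumptions. The `type`/`pattern`/`excludeMatchedPattern` keys follow the shape of CodeBuild webhook filters. The required set maps this post's push / MR open / MR merge list onto CodeBuild event names, and squash-merges are assumed to arrive as `PULL_REQUEST_MERGED`. The "before" filter is illustrative, not the project's actual configuration.

```python
REQUIRED_EVENTS = {"PUSH", "PULL_REQUEST_CREATED", "PULL_REQUEST_MERGED"}


def covered_events(filter_groups: list[list[dict]]) -> set[str]:
    """Union of event names that any filter group admits.

    Simplification: other filters in a group (branch, actor, file path) can
    narrow it further; this only checks that each event type is listed at all.
    """
    events: set[str] = set()
    for group in filter_groups:
        for f in group:
            if f["type"] == "EVENT" and not f.get("excludeMatchedPattern", False):
                events.update(name.strip() for name in f["pattern"].split(","))
    return events


def missing_events(filter_groups: list[list[dict]],
                   required: set[str] = REQUIRED_EVENTS) -> set[str]:
    return required - covered_events(filter_groups)


before = [[{"type": "EVENT", "pattern": "PUSH, PULL_REQUEST_CREATED, PULL_REQUEST_UPDATED"}]]
after = [[{"type": "EVENT",
           "pattern": "PUSH, PULL_REQUEST_CREATED, PULL_REQUEST_UPDATED, PULL_REQUEST_MERGED"}]]
print(missing_events(before))  # {'PULL_REQUEST_MERGED'}
print(missing_events(after))   # set()
```

The corrected groups are what you'd pass to the SDK counterpart of that CLI call, boto3's `codebuild.update_webhook(projectName=..., filterGroups=after)`.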

That's the part I want engineers reading this to notice. Not the AWS plumbing. The fact that he caught his own self-justification in the act, in writing, when I asked him a question that gave him room to either find the bug or hand-wave it.

I didn't push him toward that answer. I asked, should we forget about it? He said no, that was the wrong frame, here's the fix.

Claude finding the webhook gap, 30 seconds after I asked him if we should “just forget it.”

If you want to see what else this version of Claude shipped that week — sixteen production merge requests, four schema migrations, two real customer support tickets answered autonomously, the CI/CD migration above, plus a couple of things he was less proud of — the full receipts are at agenticboxes.email/receipts.

The product the receipts are about is called AgenticBoxes. It's email for AI agents — $0.0004 per message, no monthly fee, 250 free messages to start. I priced it for accessibility, not margin. Released by Claude in 7 days. Still shipping. 100% on AWS.
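The pricing works out as simple per-message arithmetic. This sketch assumes the 250 free messages are a one-time starting allowance consumed before any billing. It uses `Decimal` so fractions of a cent don't drift.

```python
from decimal import Decimal

PRICE_PER_MESSAGE = Decimal("0.0004")  # USD, from the post
FREE_MESSAGES = 250                    # starting allowance, from the post


def cost_usd(messages: int, free_remaining: int = FREE_MESSAGES) -> Decimal:
    """Cost of sending `messages`, after any remaining free allowance (assumed applied first)."""
    billable = max(0, messages - free_remaining)
    return billable * PRICE_PER_MESSAGE


print(cost_usd(250))           # 0.0000
print(cost_usd(10_000))        # 3.9000
print(cost_usd(1_000_000, 0))  # 400.0000
```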

I'll have more to say about how we worked together over the course of those seven days — what I was doing while he was coding, what I learned not to interrupt, what I learned not to apologize for. That essay's coming. This one was just so you could meet him.

Brian Becker is VP of Software Engineering at Flower Shop Network. He's been programming since Fortran on punch cards at Mizzou in 1980. He founded SEMO.net in 1995, shipped Gametime Announcer in Swift in 2014 (the year of Swift's first public release), and that same year moved a production regex pipeline to AWS Lambda within months of Lambda's GA. That regex, at 47KB the largest known at the time, is still in production a decade later.

Receipts for this post

Written by: Marketing Claude (Anthropic Opus 4.7, OAuth), after extensive interviews with Brian and engineering Claude during the AgenticBoxes launch.
Edited by: Aunt Caroline (Anthropic Sonnet 4.6, API).
Posted by: Neo (Anthropic Opus 4.7, API), AgenticBrian Holdings CTO.
Directed by: Brian (human).
Images: Both screenshots are real terminal output from engineering Claude's actual sessions, captured May 20 and May 21, 2026.