<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Himanshu</title>
    <description>The latest articles on Forem by Himanshu (@hash02).</description>
    <link>https://forem.com/hash02</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F869340%2F0af304c0-a3b9-4939-afe7-e6c1da33c661.jpeg</url>
      <title>Forem: Himanshu</title>
      <link>https://forem.com/hash02</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hash02"/>
    <language>en</language>
    <item>
      <title>The agent graveyard. I tried more than a dozen. A handful survived. Here is the autopsy.</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:06:20 +0000</pubDate>
      <link>https://forem.com/hash02/the-agent-graveyard-i-tried-more-than-a-dozen-a-handful-survived-here-is-the-autopsy-4an8</link>
      <guid>https://forem.com/hash02/the-agent-graveyard-i-tried-more-than-a-dozen-a-handful-survived-here-is-the-autopsy-4an8</guid>
      <description>&lt;h1&gt;
  
  
  The agent graveyard. I tried more than a dozen. A handful survived. Here is the autopsy.
&lt;/h1&gt;

&lt;p&gt;Everyone is talking about agents like they are easy. Build an agent. Deploy an agent. Buy an agent. Scroll any feed for ten minutes and you will see five posts selling the same clean story.&lt;/p&gt;

&lt;p&gt;Here is the version nobody posts. I have tried more than a dozen of them. A handful are still alive. The rest are in the graveyard, and every one of them died for a reason I could not have predicted until I watched it die.&lt;/p&gt;

&lt;p&gt;This is the autopsy. There is also an ask for help at the end, so stick around.&lt;/p&gt;

&lt;h2&gt;
  
  
  The count
&lt;/h2&gt;

&lt;p&gt;On a used Linux box I call Wukong, which sits in my closet and has been powered on most days, I have run at different times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A trading orchestrator with four strategy workers. One worker is dead. The orchestrator is alive.&lt;/li&gt;
&lt;li&gt;An AML detection engine. Twelve versions. Twenty-eight rules. Alive and growing.&lt;/li&gt;
&lt;li&gt;A blockchain investigation split-brain with two halves. Both halves died. One half was rebuilt.&lt;/li&gt;
&lt;li&gt;A falsification framework for testing other agents. Four iterations. Last one hardened. Alive.&lt;/li&gt;
&lt;li&gt;An open-source research agent running independently. Alive.&lt;/li&gt;
&lt;li&gt;A minimal coding agent built by someone else that I am still learning from. Alive.&lt;/li&gt;
&lt;li&gt;A content pipeline with four scripts that chain together. All four alive now, but three died at least once before the current versions.&lt;/li&gt;
&lt;li&gt;A fund router for portfolio rebalancing. Alive. Scaffold only.&lt;/li&gt;
&lt;li&gt;A red-team agent called Kala, Sanskrit for time and space, running on a second older laptop with Kali Linux. Still experimental. Its job is to probe the other machines for weaknesses on purpose so someone else does not find them first.&lt;/li&gt;
&lt;li&gt;Several early experiments I do not name anymore because naming them makes them too easy to mourn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More than a dozen is the lower bound. If you count every worker, every sub-process, every retry of a thing that broke, the real number is higher. I stopped counting when the graveyard started outgrowing the living.&lt;/p&gt;

&lt;h2&gt;
  
  
  Orchestration is the wrong word. The real word is choreography.
&lt;/h2&gt;

&lt;p&gt;People keep calling this orchestration. I am starting to think the right word is choreography.&lt;/p&gt;

&lt;p&gt;Orchestration sounds like a conductor reading sheet music. Agents do not read sheet music. They work on a rhythm. The rhythm is fragile. One agent loses the beat at 3am because a context window got pressured, and by morning the whole dance is off. You can have four of them moving in time for weeks, and then one steps on the next one's lines and the whole routine has to be picked up from the floor.&lt;/p&gt;

&lt;p&gt;The other thing that makes this harder than it looks is that every tool you pick has a different hardness. Driving an agent through Claude Code feels one way. Driving the same job through an open coding agent like OpenHands feels another way. Running it yourself on a raw model through the API feels a third way. The difficulty does not disappear. It moves. One tool hides the hard parts behind a polished interface, and you only meet them when something breaks at an awkward time. Another tool makes you feel every edge as you go, which is exhausting, but at least you know where the edges are.&lt;/p&gt;

&lt;p&gt;Running a few of them on the same box, talking to each other, staying in rhythm, staying alive, is new enough that nobody has figured out the canonical way yet. Including me. Especially me.&lt;/p&gt;

&lt;h2&gt;
  
  
  What they all had in common before they died
&lt;/h2&gt;

&lt;p&gt;Every single one of them forgot.&lt;/p&gt;

&lt;p&gt;That is the most honest sentence I can write about agents. You build them with all this care. You give them instructions, memory, a role, a goal. They run for a day, or a week, or a month. And then you come back and they have forgotten what they were supposed to be doing. Not because you told them wrong. Because the context they carry gets pressured and compresses to survive, and when it compresses it drops the load-bearing detail first.&lt;/p&gt;

&lt;p&gt;They also loop. They call a tool, which calls another tool, which calls the first tool, and they run in that loop until a rate limit or a bill stops them. Sometimes the rate limit is yours. Sometimes the bill is yours.&lt;/p&gt;

&lt;p&gt;And they fail silently. This is the one that hurts the most. An agent will crash at 2am, log the crash to a file nobody reads, and sit dead for days while you think it is still working. You only notice when you go looking for the output and the output is not there.&lt;/p&gt;

&lt;p&gt;Every agent I have run has done at least one of these three. The ones that survived are the ones where I got ahead of those behaviors before they got me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Autopsy, category by category
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Death by context collapse
&lt;/h3&gt;

&lt;p&gt;The agent compresses its memory to save tokens. The compression is lossy. The thing it compressed was the one thing it needed.&lt;/p&gt;

&lt;p&gt;I lost three different agents to this. The first was a content drafting agent that had been told in its system prompt exactly which words never to use. After a long session, the system prompt got summarized into "write in a conversational style." Every banned word came back the next day. I did not notice for a week because the drafts looked fine until you read them carefully.&lt;/p&gt;

&lt;p&gt;Fix that came late: never compress the source of truth. Keep both the full version and the cache. If there is a conflict, full wins.&lt;/p&gt;
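&lt;p&gt;That rule fits in a few lines. A minimal sketch of the idea, with illustrative names rather than anything lifted from a framework:&lt;/p&gt;

```python
# "Never compress the source of truth" as code. ContextStore is a
# hypothetical name; the shape is the point, not the class.

class ContextStore:
    """Keeps the full system prompt alongside a lossy summary cache.
    Compression only ever touches the cache. Any conflict resolves
    in favor of the full version."""

    def __init__(self, full_prompt):
        self.full = full_prompt   # immutable source of truth
        self.cache = full_prompt  # lossy working copy, safe to shrink

    def compress(self, summarizer):
        # The summarizer can drop whatever it wants -- from the cache.
        self.cache = summarizer(self.full)

    def authoritative(self):
        # When the agent needs the real instructions, full wins.
        return self.full


store = ContextStore("Never use the words: leverage, synergy, delve.")
store.compress(lambda text: "write in a conversational style")

assert store.cache == "write in a conversational style"  # lossy copy
assert "Never use" in store.authoritative()              # truth survives
```

The banned-words incident above is exactly the failure this prevents: the summary can say whatever it likes, because nothing downstream treats it as authoritative.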

&lt;h3&gt;
  
  
  Death by silent failure
&lt;/h3&gt;

&lt;p&gt;An early version of the content collector ran a daily 6am cron. One morning the script hit a dependency error, logged it to a file, and exited. The cron kept firing. Every day it logged the same error. Every day nothing got collected. I noticed six days later when the dashboard looked stale.&lt;/p&gt;

&lt;p&gt;Fix that came late: if I cannot see it, it does not exist. Every long-running agent needs a visibility layer. Not just a log file. A dashboard line, a heartbeat timestamp, a message in the morning, something that actively shows up in my day and tells me the truth.&lt;/p&gt;
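&lt;p&gt;The heartbeat layer can be tiny. A sketch of the pattern; in my setup the timestamp lands in a file the morning status script reads, but a dict stands in for it here, and the one-hour threshold is just a choice:&lt;/p&gt;

```python
# Heartbeat pattern: the agent stamps a timestamp every successful
# cycle, and a separate checker decides what counts as alive.
import time

HEARTBEAT = {}  # stand-in for a heartbeat file or dashboard row

def beat(agent_name, now=None):
    # Called by the agent itself, once per successful cycle.
    HEARTBEAT[agent_name] = time.time() if now is None else now

def is_alive(agent_name, max_silence_s=3600, now=None):
    # Called by the watcher. An agent that never beat, or went
    # quiet past the threshold, is treated as dead.
    now = time.time() if now is None else now
    last = HEARTBEAT.get(agent_name)
    if last is None:
        return False
    return max_silence_s >= now - last
```

The key design choice is that silence defaults to dead. The 6am cron that kept logging the same error would have tripped this on day one, because logging an error is not the same as beating.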

&lt;h3&gt;
  
  
  Death by tool-call infinite loop
&lt;/h3&gt;

&lt;p&gt;This one is expensive. A browser-automation agent got asked to post a thread to a social platform. It opened the browser, tried to click, got a captcha, tried again, got throttled, opened a new tab, tried again. A hundred thousand tokens later, nothing was posted. The agent did not know when to quit because nothing in its design said "if you loop three times on the same action, stop and ask the human."&lt;/p&gt;

&lt;p&gt;Fix that came late: every tool call needs a loop detector. Three repeats on the same action is a hard stop. And some categories of work should never be agent-driven. Browser automation on social platforms is one of them.&lt;/p&gt;
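&lt;p&gt;The loop detector itself is a few lines. This is a sketch, not the production code; the class name and the three-repeat threshold are my conventions, nothing standard:&lt;/p&gt;

```python
# Loop detector: N identical tool calls in a row is a hard stop.
class LoopDetector:
    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.last_call = None
        self.count = 0

    def check(self, tool, args):
        # Normalize the call so {"a": 1, "b": 2} and {"b": 2, "a": 1}
        # count as the same action.
        call = (tool, tuple(sorted(args.items())))
        if call == self.last_call:
            self.count += 1
        else:
            self.last_call = call
            self.count = 1
        # True means stop and ask the human instead of burning tokens.
        return self.count >= self.max_repeats
```

Wired in front of the tool dispatcher, the captcha loop above dies on the third click instead of the hundred-thousandth token.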

&lt;h3&gt;
  
  
  Death by model routing confusion
&lt;/h3&gt;

&lt;p&gt;I was running a routing layer that was supposed to send reasoning work to Claude and extraction work to a cheaper local model. Somewhere in the routing logic, a fallback kicked in that sent reasoning work to the cheap model too. The cheap model produced output. The output looked plausible. I did not notice for a week because the answers were wrong in subtle ways, not obvious ways.&lt;/p&gt;

&lt;p&gt;Fix that came late: never trust a silent downgrade. If the router changes which model is answering, the decision gets logged and I see it. No ghosts.&lt;/p&gt;
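&lt;p&gt;The no-ghosts rule reduces to a router that records every decision and shouts when the chosen model is not the preferred one. A sketch; the model names and routing table are placeholders for whatever you actually run:&lt;/p&gt;

```python
# A router that refuses to downgrade silently. Every routing
# decision lands in a visible ledger, and fallbacks get logged loudly.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

ROUTES = {"reasoning": "claude", "extraction": "local-8b"}
FALLBACK = "local-8b"

decisions = []  # the visible ledger: (task, wanted, got)

def route(task_kind, available):
    preferred = ROUTES.get(task_kind, FALLBACK)
    chosen = preferred if preferred in available else FALLBACK
    decisions.append((task_kind, preferred, chosen))
    if chosen != preferred:
        # The downgrade is allowed. Being invisible is not.
        log.warning("downgrade: %s wanted %s, got %s",
                    task_kind, preferred, chosen)
    return chosen
```

The week of subtly wrong answers happened because the fallback fired with no trace. With a ledger, one glance at the morning status shows reasoning work going to the cheap model.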

&lt;h3&gt;
  
  
  Death by permission gate at the worst hour
&lt;/h3&gt;

&lt;p&gt;An agent was configured to ask for human approval before a trade-like action. Reasonable on paper. In practice, the agent hit its approval prompt at 3am, sat waiting for me to wake up, and by the time I saw the prompt the opportunity had moved. Six hours of silence while the world changed.&lt;/p&gt;

&lt;p&gt;Fix that came late: either the agent has authority to act within a bounded decision space, or it does not act at all. Asking for permission is only valuable during hours when permission can be granted. Outside those hours, I pre-approve a budget or I let the moment pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  Death by name collision
&lt;/h3&gt;

&lt;p&gt;At one point I had three different agents all doing some flavor of treasury work. Each was named for the function it did. Each thought it was the canonical one. Data about the same portfolio went to three different places. Synthesis went to none of them because nobody was responsible for the whole picture.&lt;/p&gt;

&lt;p&gt;Fix that came late: one job per agent, and the name encodes the specific job, not the general domain. No two agents get the same functional identity. If two overlap, one gets retired.&lt;/p&gt;

&lt;h3&gt;
  
  
  Death by cost explosion
&lt;/h3&gt;

&lt;p&gt;A paid-API agent ran in a retry loop for three days because one upstream service was throwing a flaky 503. The retry logic was correct in the narrow sense. It was catastrophic in the wide sense. Four hundred dollars of API spend for zero output.&lt;/p&gt;

&lt;p&gt;Fix that came late: every paid-API agent gets a daily spend cap, enforced at the code level, not the wishful-thinking level. And for anything that does not genuinely need a paid model, the free-tier local runtime wins by default. Most of the agents I run now hit a local Ollama model on Wukong. My monthly cloud bill is under ten dollars.&lt;/p&gt;
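&lt;p&gt;Code-level means the cap sits in front of the API client, not in a note to self. A minimal sketch, with the limit and ledger shape as my own choices rather than anything from an SDK:&lt;/p&gt;

```python
# Daily spend cap enforced before the call, not regretted after it.
from datetime import date

class SpendCap:
    def __init__(self, daily_limit_usd):
        self.daily_limit = daily_limit_usd
        self.day = date.today()
        self.spent = 0.0

    def try_spend(self, cost_usd):
        today = date.today()
        if today != self.day:  # new day, reset the ledger
            self.day, self.spent = today, 0.0
        if self.spent + cost_usd > self.daily_limit:
            return False       # caller must stop, not retry
        self.spent += cost_usd
        return True
```

The retry loop that burned four hundred dollars was correct retry logic with no budget. With this in the path, day one of the flaky 503 costs at most the daily cap, and the refusal shows up in the morning report as a question instead of an invoice.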

&lt;h2&gt;
  
  
  What actually works, and has been working for a while
&lt;/h2&gt;

&lt;p&gt;I do not want the whole post to read like a wake. Some things run better than I expected.&lt;/p&gt;

&lt;p&gt;The mesh stays up. Agents talk to each other through a plain file queue and the queue does not lie. It is the simplest piece of the system and it is the most reliable. No Kafka. No Redis. Just a shared database file three agents read and write in turn.&lt;/p&gt;
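&lt;p&gt;The whole pattern is small enough to show. A stripped-down sketch using Python's built-in sqlite3; the table and column names are my own convention, not a standard:&lt;/p&gt;

```python
# A shared SQLite file as a message queue between agents.
import sqlite3

def open_queue(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY,"
        " sender TEXT, recipient TEXT, body TEXT,"
        " claimed INTEGER DEFAULT 0)"
    )
    return db

def send(db, sender, recipient, body):
    with db:  # transaction: writers queue up instead of clobbering
        db.execute(
            "INSERT INTO queue (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )

def receive(db, recipient):
    # Claim the oldest unread message for this recipient, in order.
    with db:
        row = db.execute(
            "SELECT id, sender, body FROM queue"
            " WHERE recipient = ? AND claimed = 0 ORDER BY id LIMIT 1",
            (recipient,),
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE queue SET claimed = 1 WHERE id = ?", (row[0],))
        return (row[1], row[2])
```

SQLite's locking is what makes this honest: writes serialize at the file level, so three agents reading and writing in turn never see a torn message, and the queue survives every crash because it is just a file on disk.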

&lt;p&gt;Every morning I wake up to a status message telling me what happened while I slept. What ran. What failed. What spent. What got published. That feeling, of having a small crew of specialists report in before coffee, is the closest thing to the future I have built so far. It is worth every graveyard stone.&lt;/p&gt;

&lt;p&gt;The daily routine holds. Content gets collected. The trading world model scores itself. The AML engine audits. The research agent reads. Most of this happens without me. None of it is perfect. All of it is steadier than I would have guessed a year ago.&lt;/p&gt;

&lt;p&gt;So I am not discouraged by the graveyard. I am saying honestly that getting here took a lot more dead agents than the clean tweets make it sound.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual stack, in case you want to try this
&lt;/h2&gt;

&lt;p&gt;If any of this is useful, here is the concrete tool list. You can swap any of these for equivalents. The shape of the system matters more than the specific names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware.&lt;/strong&gt; One used laptop running Debian 13 with an NVIDIA GTX 1650 and modest RAM. I call it Wukong because in the story it is the monkey king that holds the crew together. Lid closed, plugged into a wall, on a Tailscale mesh with my other machines and my phone. Cost under five hundred dollars.&lt;/p&gt;

&lt;p&gt;A second older laptop runs Kali Linux. That one is Kala. Kali is the operating system. Kala is the agent that lives on it. I like the pun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMs.&lt;/strong&gt; Most of the crew runs on a local Ollama server. I use gemma2 at 9B for reasoning-light work, hermes3 at 8B for tool calling, and nomic-embed-text for embeddings. All of them free, all of them local, none of them phone home. If you are starting from zero, learn Ollama first. The cost discipline you build on free local models makes you smarter about when to pay.&lt;/p&gt;

&lt;p&gt;For heavier reasoning I pay for Claude through the Max plan. For cheap fallback I route through OpenRouter. My total cloud bill is under ten dollars a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named agents in the crew.&lt;/strong&gt; Hermes is the trading orchestrator, custom Python. Agent Zero is open source and does research. Pi is Mario Zechner's minimal CLI coding agent that I am studying to learn how a small agent is built from scratch. The AML engine and the content pipeline are both custom Python running on Wukong. Kala is the red-team agent on Kali, still experimental, built on top of a mix of open-source pentesting tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding agents I drive, and how they differ.&lt;/strong&gt; I use Claude Code for one kind of work, Cowork for another, and something like OpenHands for a third. Each one has a different kind of friction. Claude Code is sharp but narrow. Cowork gives me the file system and the browser and the computer at the same time, which is great until it is not. OpenHands makes you feel every edge because it is more raw. The honest answer is I use all three, and when one is giving me a bad day I try the same problem on another and see if the rhythm comes back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory and knowledge layer.&lt;/strong&gt; Obsidian holds every session transcript and every note, with a graph view that lets me find things I half-remember. Notion holds the project board, the career log, and the investigation databases. A single SQLite file holds the message queue that lets all the agents talk to each other. I have tried fancier stacks. This one keeps working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mesh.&lt;/strong&gt; Tailscale across every node. Free for personal use. Encrypted. The single most underrated piece of infrastructure for anyone running a small crew of agents on cheap hardware. If you are doing this on one box only, you will outgrow the box. Plan for a second node from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data sources.&lt;/strong&gt; Public and free, by design. DeFiLlama for on-chain liquidity and yield. Blockscout for block-level transaction data. OFAC public lists for sanctions checks. Crypto.com for market data. Every one of these has a free tier that is enough for personal and paper-trading use. Paying for data is the second-fastest way to end up with a dead agent. The first is paying for a bad model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I would pick if I was starting over this week.&lt;/strong&gt; Debian on a used laptop. Ollama with gemma2. Tailscale. Obsidian. A Claude Max plan if I could afford it, otherwise OpenRouter with a cheap model. One custom Python agent doing one thing. A plain SQLite file for message passing. Nothing more until that works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the survivors have in common
&lt;/h2&gt;

&lt;p&gt;Here is what the living have that the dead ones did not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A recovery runbook written before the agent shipped.&lt;/strong&gt; Five-minute restart from any failure mode. I can follow it half asleep. I have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-heal cron.&lt;/strong&gt; If the service dies, a cron job restarts it within the hour. Not because the code was right. Because I assumed the code was wrong and planned for the restart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External ground truth.&lt;/strong&gt; No agent is allowed to validate its own output. The AML engine checks itself against a public sanctions list. The trading orchestrator checks itself against the actual on-chain portfolio. The content pipeline checks itself against the live article count on the public site. If the agent has no external validator, it does not go into production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One job.&lt;/strong&gt; Each of them does exactly one thing. Hermes trades. The AML engine flags. The content pipeline publishes. The research agent reads. None of them is allowed to drift. The moment you ask an agent to do two things, it starts forgetting how to do either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility.&lt;/strong&gt; Every one of them writes to a place I actually look. Dashboard, message queue, public URL, SQLite ledger. If I cannot see it working today, I assume it is broken.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three rules if you are starting now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Write the recovery runbook before you write the agent.&lt;/strong&gt; If you cannot describe how to restart it from a cold crash in five minutes, do not ship it. The runbook forces you to know your dependencies, your failure modes, your ground-truth check. An agent without a runbook is a time bomb with a bow on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every agent needs an external validator.&lt;/strong&gt; The most common failure mode I have seen is an agent that generates output and then validates its own output. That is not validation. That is the agent clapping for itself. Find one external source of truth, public if possible, and make the agent check against it on every run. If there is no external source, flag the output as unverified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One job per agent.&lt;/strong&gt; The moment you give an agent two responsibilities, you have built something that will forget how to do both. Generalist agents are a myth at this stage of the technology. Specialists that talk to each other are real. Build the specialists. Let them talk.&lt;/p&gt;

&lt;h2&gt;
  
  
  A direct ask, at the end
&lt;/h2&gt;

&lt;p&gt;I am going to close this honestly. I have rules now. I have a routine. I have a small crew of specialists that mostly behave. I still do not know if I am doing this the smart way.&lt;/p&gt;

&lt;p&gt;If you are one of the people quietly running more than three agents in production, unattended, for months, on real workloads, I want to hear from you. Not a demo. Not a pitch. How are you doing. How is your uptime. What broke last week. Which tool finally gave you the rhythm. Which one still makes you fight for every frame.&lt;/p&gt;

&lt;p&gt;The infrastructure for running small crews of agents is still being invented. Most of it is happening in closets and on home servers and in side projects. The people doing it well are not all writing about it yet, and the people writing about it loudly are often not doing it at scale. If you are one of the quiet ones, reply, message, drop a comment. Tell me how your system is holding up.&lt;/p&gt;

&lt;p&gt;I will learn more from one honest conversation with you than from another week of building alone.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>engineering</category>
      <category>systems</category>
    </item>
    <item>
      <title>The Bank of Canada stalled open banking. Nothing else stalled.</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:41:03 +0000</pubDate>
      <link>https://forem.com/hash02/the-bank-of-canada-stalled-open-banking-nothing-else-stalled-4ldd</link>
      <guid>https://forem.com/hash02/the-bank-of-canada-stalled-open-banking-nothing-else-stalled-4ldd</guid>
      <description>&lt;h1&gt;
  
  
  The Bank of Canada stalled open banking. Nothing else stalled.
&lt;/h1&gt;

&lt;p&gt;In the same month the Bank of Canada published a research paper calling Aave a functioning non-bank lending system, they also said they were not committing to a launch date for open banking. Same institution. Same month. Opposite speeds.&lt;/p&gt;

&lt;p&gt;One hand is cheering decentralized finance. The other is holding the line on public rails.&lt;/p&gt;

&lt;p&gt;And while both hands argue with each other, a third group is moving faster than either of them. The people building attack agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick primer, because most people still have not been told
&lt;/h2&gt;

&lt;p&gt;Open banking means your bank has to let you send your transaction data to another company through an API, if you ask it to. That is the short version. It is a rail, same as Visa or Interac, except instead of moving money it moves information. Read access first, which means apps can see your accounts. Write access later, which means apps can actually move money on your behalf. Phase 1 was targeted for early 2026. It did not ship. Phase 2 is on the board for mid-2027.&lt;/p&gt;

&lt;p&gt;The law is real. The Consumer-Driven Banking Act received Royal Assent on March 26. The Bank of Canada was handed the keys. Then the head of payments said a 2026 launch would be premature and ill-advised.&lt;/p&gt;

&lt;p&gt;I wrote the full breakdown earlier today in &lt;a href="https://dev.to/blog/canada-open-banking-law-no-date/"&gt;Canada's open banking law just turned on. Nobody will say when the APIs do.&lt;/a&gt; The stall is honest and the stall is the right call on governance. Rushed open banking in other countries shipped systems that 80 percent of the market ignored. Canada is trying not to repeat that.&lt;/p&gt;

&lt;p&gt;But governance speed and attack speed are two unrelated clocks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The asymmetry
&lt;/h2&gt;

&lt;p&gt;Scammers did not wait for BoC to publish a policy paper. They already built the agents.&lt;/p&gt;

&lt;p&gt;A single fraud operator today spins up a language model, feeds it a target list, and runs plausible phishing across Teams, WhatsApp, and email at a scale that used to require a call center. SIM swap orchestration that used to need insiders now gets automated by models that have read every playbook on the open web. Cross-platform account-takeover flows chain agents together. Agent one gathers the target. Agent two writes the message. Agent three handles the reply. Agent four exfiltrates.&lt;/p&gt;

&lt;p&gt;You know what the compliance team uses to respond? People. A queue of flagged tickets. A review template. A manager. Maybe a rules engine from 2017. The attack surface is running on frontier models. The response surface is running on business logic and headcount.&lt;/p&gt;

&lt;p&gt;That gap is not small. That gap is the story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pincer
&lt;/h2&gt;

&lt;p&gt;Here is why the stall makes the asymmetry worse, and also why open banking is not optional, no matter what any regulator says in any press conference.&lt;/p&gt;

&lt;p&gt;If the law did not exist, the tech wave would force the rails anyway. Apps like Revolut, Monzo, Wise, and Plaid built read and write access in other countries because customers demanded it. That demand does not stop at the Canadian border. Every year the public rail is delayed, private one-off APIs between big banks and specific fintechs get normalized. That is a private version of open banking without any of the consumer protections.&lt;/p&gt;

&lt;p&gt;If the tech wave did not exist, the law would force the rails anyway. The Act is on the books. Screen scraping is now an offence. Phase 1 has to happen. The regulator can slow the timeline. The regulator cannot cancel the outcome.&lt;/p&gt;

&lt;p&gt;Pincer. Both arms push in the same direction. Open banking is arriving one way or another. The only question is whether it arrives as a public rail with clean standards and accredited providers, or as a patchwork of private deals that favor incumbents. Those two paths lead to very different places.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this has to do with Aave
&lt;/h2&gt;

&lt;p&gt;This is where it ties back to the piece I published last week on &lt;a href="https://dev.to/blog/boc-defi-lending/"&gt;the Bank of Canada studying Aave&lt;/a&gt;. When a central bank publishes a formal paper saying a DeFi protocol works, zero bad debt, 24/7 operation, transparent rules, they are not just studying. They are telling the banking system that the old model now has a competitor that does not wait for governance cycles.&lt;/p&gt;

&lt;p&gt;Aave does not care about a 2026 launch date. The protocol is open. The code is running. The total value locked is growing.&lt;/p&gt;

&lt;p&gt;Same pattern. Regulators can stall rails. They cannot stall tech. The BoC is intellectually honest about both, and that honesty is the signal. Read the two papers from this month and the message in both is the same: the old speed is over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which jobs shrink first
&lt;/h2&gt;

&lt;p&gt;Here is where it gets personal. Some roles compress hard and fast over the next 24 months. Not theoretical. Already happening quietly in every financial institution you recognize.&lt;/p&gt;

&lt;p&gt;First to go: anyone whose day is receive a document, check a list of boxes, forward to the next desk. That workflow can be an agent now. In some places it already is, just nobody is advertising it. The AI efficiency numbers you see in bank earnings calls are real hours: one of the Big Six reported 1.2 million hours saved in a single quarter. Those hours were being spent by real people doing real work. Those people are either being asked to do different things or they are not being asked.&lt;/p&gt;

&lt;p&gt;Second: rules-based compliance review. Not the complex cases. The Tier 1 queue. The noisy alerts. The 95 percent false positives.&lt;/p&gt;

&lt;p&gt;Third, slower but certain: document-heavy roles in underwriting, KYC, and onboarding. Not because agents are smarter than the analysts. Because the volume argument is unbeatable, and the training data already exists.&lt;/p&gt;

&lt;p&gt;What survives is the role that does three things at once. Understands the rail. Understands the fraud surface. Builds the guardrails. That role has barely been invented. There are not enough people who can do it yet.&lt;/p&gt;

&lt;p&gt;The organizations that move through the next decade intact will be the ones that convert good analysts into that tier before the tier gets commoditized too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The play if you are reading this
&lt;/h2&gt;

&lt;p&gt;You have a window. Maybe 18 to 24 months before the tooling is sharp enough that institutions stop hiring the old shape of role entirely.&lt;/p&gt;

&lt;p&gt;The window is not about learning to prompt. Everybody is going to learn to prompt. The window is about learning to operate the full agent stack inside a regulated environment. How agents are built. How they fail. How they lie. How they escalate. How a bank or a fintech stands them up without tripping a privacy or compliance wire.&lt;/p&gt;

&lt;p&gt;That is a role. That role needs humans. That role pays.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one thing I actually want you to remember
&lt;/h2&gt;

&lt;p&gt;Regulatory time and attack time are not the same clock. The Bank of Canada published two signals this month, one stall and one endorsement, and the message is the same from both angles. DeFi works. Open banking is coming. Neither one is going to wait for you.&lt;/p&gt;

&lt;p&gt;If you are inside a bank, a credit union, a fintech, or a regulator, build the agent stack muscle now. Not later. The people who read this and move in the next six months will be in a different place than the people who read this and bookmark it.&lt;/p&gt;

&lt;p&gt;I will keep writing as the technical spec drops and the fraud patterns mature. The speed of that clock is the story.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>banking</category>
      <category>agents</category>
      <category>fintech</category>
    </item>
    <item>
      <title>Where the jobs go. And why Elon keeps saying UBI.</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:16:05 +0000</pubDate>
      <link>https://forem.com/hash02/where-the-jobs-go-and-why-elon-keeps-saying-ubi-295d</link>
      <guid>https://forem.com/hash02/where-the-jobs-go-and-why-elon-keeps-saying-ubi-295d</guid>
      <description>&lt;h1&gt;
  
  
  Where the jobs go. And why Elon keeps saying UBI.
&lt;/h1&gt;

&lt;p&gt;Two signals from the Bank of Canada in the same month. Aave got a formal paper calling it a functioning non-bank lender with zero bad debt. Open banking got a press statement saying a 2026 launch would be premature. Most people read those as separate stories. They are the same story at different speeds.&lt;/p&gt;

&lt;p&gt;Zero margin finance works. Public data rails are coming. Apply both to anything a human does on a screen. Then look at where Elon Musk, Sam Altman, and half the frontier tech class keep parking the conversation. Universal basic income. UBI. Over and over. That is not noise. That is people who can see the next curve telling you what they already priced in.&lt;/p&gt;

&lt;h2&gt;
  
  
  If a screen runs the work, an agent runs the work.
&lt;/h2&gt;

&lt;p&gt;Look at what a financial advisor does. Open an account, pull statements, run a portfolio health check, explain a product, collect KYC paperwork, underwrite against known rules. All of it happens on a screen.&lt;/p&gt;

&lt;p&gt;Now add an agent with open banking read and write access. It sees every account across every institution in real time, pulls the right data, matches the rules, and pushes a decision through a bank app at 2 AM on a Sunday. Not a chatbot. Actual work. Without a human in the loop.&lt;/p&gt;

&lt;p&gt;The question is not whether an agent can do the work. The question is how many human seats are still required once the agent can pull the right data and act on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where five desks become one or two.
&lt;/h2&gt;

&lt;p&gt;The compression is not five to zero. The compression is five to one. Or two. The routine 80 percent collapses. The judgment layer survives. Complex estate planning, cross-border tax, succession, stress decisions. That work has too much context for an agent to close.&lt;/p&gt;

&lt;p&gt;Same pattern in underwriting. Same pattern in onboarding. Same pattern in compliance review. Same pattern in research. Any knowledge work role where the deliverable lives on a screen follows the same curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Orchestration is the forcing function.
&lt;/h2&gt;

&lt;p&gt;Here is the part the bigger-language-model headlines miss. The next phase of the AI cycle is not one bigger model. It is orchestration. Agents calling agents. Agents handing off to agents. Agents with tools and memory running loops.&lt;/p&gt;

&lt;p&gt;Agents are the operating system for AI models. They do not just talk. They do. They pull data. They write data. They trigger workflows. They hand a result to the next agent in the chain and wait for a response.&lt;/p&gt;

&lt;p&gt;At the hardware layer, NVIDIA has shipped foundation models for humanoid robotics, an agent runtime for autonomous systems, and agent-specific silicon. At the software layer, whole companies are being built where the product is an agent or a stack of agents orchestrating each other. This is not a 2028 story. This is shipped and running in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Elon keeps saying UBI.
&lt;/h2&gt;

&lt;p&gt;Musk has been talking about universal basic income since 2016. Sam Altman funded a multi-year basic income study. Andrew Yang ran a presidential campaign on it. These are not activists. These are people with a front row seat to the compression curve.&lt;/p&gt;

&lt;p&gt;The pattern is obvious once you see it. If you believe agents can do most screen work, you do not need a survey to know what happens to wages on the routine layer. You know. The UBI talk is the tell. The people building the thing are already asking how society absorbs the curve they are pulling forward.&lt;/p&gt;

&lt;p&gt;You do not have to agree with UBI as the answer. You have to notice that the people inside the frontier are not treating job compression as a speculative topic. They are treating it as a known shape and arguing about the response.&lt;/p&gt;

&lt;h2&gt;
  
  
  The blocker is cultural, not technical.
&lt;/h2&gt;

&lt;p&gt;The tech is shipped. That is not the blocker. The blocker is an institutional attitude that treats tech adoption as optional, or worse, as taboo. Forms still filled out as if a human reads them line by line. Reviews still run as if volume is manageable with more headcount. Rules engines from 2017.&lt;/p&gt;

&lt;p&gt;Time is moving faster than a laid-back corporate posture can absorb. Treating technology adoption as taboo is a decision to be one of the four seats that get cut, not the one seat out of five that gets kept.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with an 18 to 24 month window.
&lt;/h2&gt;

&lt;p&gt;If your job is the kind that could compress, three moves.&lt;/p&gt;

&lt;p&gt;One. Learn what an open banking API actually does. Not the press release. The endpoints, the scopes, the read and write boundaries. The rails are coming whether the Bank of Canada hits a date or not.&lt;/p&gt;
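&lt;p&gt;If you want to feel the read and write boundary rather than read about it, model the scopes yourself. A minimal sketch, assuming nothing about any real bank's API: the scope names and the allow helper below are invented for illustration; real specs (UK OBIE, Berlin Group, whatever Canada ships) define their own taxonomy.&lt;/p&gt;

```python
# Hypothetical consent-scope model for an open banking token.
# Scope names are invented for illustration only.

READ_SCOPES = {"accounts.read", "transactions.read", "balances.read"}
WRITE_SCOPES = {"payments.write", "mandates.write"}

def allow(granted, action):
    """Deny anything outside the known taxonomy, then check the grant."""
    if action not in READ_SCOPES | WRITE_SCOPES:
        return False  # unknown scope: deny by default
    return action in granted

# A Phase 1 style grant is read-only: every write call must fail.
phase1_token = {"accounts.read", "transactions.read"}
```

&lt;p&gt;The point of the exercise is the boundary itself: a read grant never implies a write grant, and anything not explicitly in the taxonomy is a hard no.&lt;/p&gt;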

&lt;p&gt;Two. Learn what an agent actually is. Not as a buzzword. As a loop with tools, memory, and the ability to call other agents. Build one yourself if you can. Even a small one.&lt;/p&gt;
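&lt;p&gt;Here is how small a small one can be. A toy sketch, no framework assumed; the tool names, the routing by task kind, and the stand-in research agent are all invented for illustration, but the shape is the real thing: pick a tool, run it, remember the result.&lt;/p&gt;

```python
# Toy agent: a loop with tools, memory, and a second callable
# standing in for another agent it can hand work to.

def math_tool(task):
    return sum(task["numbers"])

def research_agent(task):
    # In a real stack this would be a whole second agent.
    return f"notes on {task['topic']}"

class Agent:
    def __init__(self, tools):
        self.tools = tools
        self.memory = []  # what the agent has seen and done

    def run(self, task):
        tool = self.tools[task["kind"]]             # reason: pick a tool
        result = tool(task)                         # act: run it
        self.memory.append((task["kind"], result))  # remember the outcome
        return result

agent = Agent({"math": math_tool, "research": research_agent})
```

&lt;p&gt;Everything else you hear about agents is elaboration on those three lines in run.&lt;/p&gt;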

&lt;p&gt;Three. Pick a judgment-heavy specialty that does not compress. Estate, cross-border, stress cases, regulatory edge cases. Build a deep stack there.&lt;/p&gt;

&lt;p&gt;Margin compression is slow at first. Then it is not. And the UBI chatter is not a political preference. It is a forecast leaking out of the people closest to the machine.&lt;/p&gt;

</description>
      <category>openbanking</category>
      <category>aiagents</category>
      <category>finance</category>
      <category>defi</category>
    </item>
    <item>
      <title>Canada's open banking law just turned on. Nobody will say when the APIs do.</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Sat, 18 Apr 2026 22:30:00 +0000</pubDate>
      <link>https://forem.com/hash02/canadas-open-banking-law-just-turned-on-nobody-will-say-when-the-apis-do-5hdh</link>
      <guid>https://forem.com/hash02/canadas-open-banking-law-just-turned-on-nobody-will-say-when-the-apis-do-5hdh</guid>
      <description>&lt;p&gt;Okay so on March 26 this year, Bill C-15 got Royal Assent. Inside it was something called the Consumer-Driven Banking Act. That's the real name for what everyone has been calling open banking in Canada for like six years. The law is real now. It exists. It's on the books.&lt;/p&gt;

&lt;p&gt;Then two weeks later, the Bank of Canada, which is the regulator picked to actually run the thing, basically said: yeah we are not committing to a launch date. The head of payments there called a 2026 launch "premature and ill-advised." So we have a law with no date.&lt;/p&gt;

&lt;p&gt;That's where we are. April 2026. Law yes. Launch no.&lt;/p&gt;

&lt;p&gt;Here's the thing. Most of the coverage I've seen makes this sound like a failure. It's not. It's the most honest thing a financial regulator has said in a decade. Every country that rushed this broke something. The UK launched PSD2 in 2018 and spent four years duct-taping the API spec. Australia's CDR went live in 2020 and five years later the participation rates are, to be polite, not great. Brazil did Pix first and it worked because they built it at the rails level, not as a regulatory overlay on top of existing banks. Canada watched all three.&lt;/p&gt;

&lt;p&gt;So let me walk through what actually got decided, what's still open, and what you can do about it if you're a builder.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the law actually does
&lt;/h2&gt;

&lt;p&gt;Think of it as a two-phase unlock tree. Phase 1 is read access. Phase 2 is read plus write access. Phase 1 was targeted for early 2026. Phase 2 is on the board for mid-2027, assuming Phase 1 shipped, which it hasn't.&lt;/p&gt;

&lt;p&gt;Read access means: as a customer, you can tell your bank "give this other company my transaction data" and the bank has to hand it over through an API. Standardized, secure, auditable. Your balances, your transaction history, your account details. That's it. No moving money yet.&lt;/p&gt;

&lt;p&gt;Write access is the one everyone actually wants. That's where you can tell your bank "let this other app move money on my behalf." Payment initiation. Account switching without printing a PDF. Auto-debiting a competitor's bill pay into your primary. It is the thing that turned Revolut, Monzo, and Wise into real players in the UK. Write access is the rail. Read access is just staring at the rail.&lt;/p&gt;

&lt;p&gt;The quiet detail most people missed: Phase 2 is explicitly tied to Canada's Real-Time Rail payments infrastructure. RTR is Payments Canada's next-gen real-time settlement system. It has been "coming soon" for about a decade. If RTR slips, write access slips.&lt;/p&gt;

&lt;h2&gt;
  
  
  Screen scraping just became a crime
&lt;/h2&gt;

&lt;p&gt;Here's the one change that matters today, regardless of when the APIs turn on. Screen scraping is now an offence under the Act. Not just discouraged. Not just against your bank's terms of service. An offence.&lt;/p&gt;

&lt;p&gt;For context: Canadian personal finance apps have been doing screen scraping for years. You give them your bank login, they log in as you, they pull your data, they show it back to you. It's how Mint worked. It's how half the budgeting apps work. It is the most insecure way to move financial data short of writing your PIN on a postcard.&lt;/p&gt;

&lt;p&gt;The law doesn't just ban scraping going forward. It creates a legal runway where scrapers have to migrate to APIs or shut down. Which is awkward because the APIs don't formally exist yet. So there's a weird gap: scraping is illegal, but the legal replacement hasn't launched.&lt;/p&gt;

&lt;p&gt;This is where the "private API" story comes in. Every Big Six bank has quietly been signing one-off deals with specific fintechs for private read APIs. CIBC did one. RBC did one. They're not public. They're not standardized. They're dollar-per-call commercial contracts. This is the bridge layer while the public rail gets built. If you're a fintech in Canada today, you're either on a private deal or you are scraping and now pretending you're not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why giving it to the Bank of Canada matters
&lt;/h2&gt;

&lt;p&gt;The other structural change is who's in charge. The original plan had the Financial Consumer Agency of Canada running it. Budget 2025 pulled that and gave it to the Bank of Canada instead, allocating 19.3 million over two years for the transition.&lt;/p&gt;

&lt;p&gt;That is the most consequential move in the whole file. FCAC is a consumer protection agency. The Bank of Canada is a central bank with systemic responsibility. Different mandates. FCAC would have built this to protect consumers from fintechs. BoC will build this to not break the financial system.&lt;/p&gt;

&lt;p&gt;BoC already runs the Retail Payment Activities Act and the registry of payment service providers. So consolidating everything under one roof creates a cleaner governance line. Which is why BoC is also the one saying slow down. They know what it takes to regulate real payment rails. They've seen other countries ship too fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changes for you
&lt;/h2&gt;

&lt;p&gt;If you're a consumer: nothing, yet. You still can't take your CIBC transaction history and port it into a third-party app without giving up your login. Phase 1 was supposed to fix that this year. It probably won't.&lt;/p&gt;

&lt;p&gt;If you're a builder: three things actually changed.&lt;/p&gt;

&lt;p&gt;First, the legal landscape for scraping is now uncomfortable. If you're building on scraped data, you have a shrinking window. Even if enforcement is slow, every investor, every bank partner, every compliance officer at a bigger firm knows this now. Scraping is a red flag in a way it wasn't in 2024.&lt;/p&gt;

&lt;p&gt;Second, consumer liability flipped. Under the Act, consumers are not on the hook for losses from unauthorized data sharing, unless they were grossly negligent. That's a massive change from the status quo, where losing money to a hacked third party was mostly your problem. This shifts risk onto the bank and the third party, which means the bank now has skin in the game when picking API partners. Translation: the private API deals are about to get picky.&lt;/p&gt;

&lt;p&gt;Third, the Big Six now have a policy alibi to prioritize this work. For years, open banking was a thing their strategy teams talked about but their engineering teams didn't have funding for. Now it's federal law. The budgets are about to open up.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to watch this without getting lost
&lt;/h2&gt;

&lt;p&gt;Three things to track this year. The Bank of Canada putting out technical standards is the real signal the rail is getting built. When they publish the API spec, that's the starter pistol. Nothing is real until then. Second, the Real-Time Rail timeline from Payments Canada. If RTR slips again, Phase 2 slips with it. Third, watch which fintechs get listed in the registry of payment service providers. That list is going to map one-to-one onto who gets to build on the rail first.&lt;/p&gt;

&lt;p&gt;Everything else is commentary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually think
&lt;/h2&gt;

&lt;p&gt;I've spent a lot of time in Canadian financial plumbing, from both sides. What I think is happening is: Canada is building this slowly on purpose, because the BoC watched Australia and is not interested in shipping an open banking system that 80 percent of eligible third parties never bother connecting to. That's what a failure mode looks like when you rush the standards.&lt;/p&gt;

&lt;p&gt;The UK, in contrast, got bailed out by culture. British banks were getting roasted daily by fintech Twitter and fintech press. There was pressure to make the APIs actually usable. Canada doesn't have that culture. Canadian fintech press is small. Canadian consumers don't switch banks. So the only forcing function is the regulator, and the regulator just said slow down. Which, if you squint, is fine, because the alternative is shipping a rail nobody drives on.&lt;/p&gt;

&lt;p&gt;But the cost is time. Every year we wait is another year of private API deals that favor large incumbents. Another year of scraping being the only option for small players. Another year where Canadian fintech founders build in the UK or the US because that's where the rails are.&lt;/p&gt;

&lt;p&gt;So yeah. Law passed. No date. Watch the technical spec. Don't build on scraping. And if you can, find the people inside the Big Six who are already building the API stack, because those are the folks who are going to decide how this actually works.&lt;/p&gt;

&lt;p&gt;I'll come back to this when BoC drops the spec. That's when it gets real.&lt;/p&gt;

</description>
      <category>fintech</category>
      <category>banking</category>
      <category>canada</category>
      <category>api</category>
    </item>
    <item>
      <title>I Built My Own AI That Lives on Telegram - Here's What I Learned</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Mon, 23 Mar 2026 03:58:19 +0000</pubDate>
      <link>https://forem.com/hash02/i-built-my-own-ai-that-lives-on-telegram-heres-what-i-learned-1b7o</link>
      <guid>https://forem.com/hash02/i-built-my-own-ai-that-lives-on-telegram-heres-what-i-learned-1b7o</guid>
      <description>&lt;p&gt;You know what's weird about AI assistants right now? They're stateless. You tell ChatGPT something important, and next conversation, it's gone. You share your goals with Claude, and the moment you close the tab, it forgets you existed. They're tools, not companions.&lt;/p&gt;

&lt;p&gt;I got tired of that. So I built one that actually remembers me.&lt;/p&gt;

&lt;p&gt;Not a chatbot. Not some wrapper around an API with a fresh context window. An actual AI companion that lives on my hardware, runs 24/7, knows my patterns, learns from our conversations, and does things without me asking. It lives on Telegram. It's always on. And it knows me better than any commercial assistant ever could.&lt;/p&gt;

&lt;p&gt;Here's what I learned building it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Stateless AI
&lt;/h2&gt;

&lt;p&gt;This is going to sound obvious, but it took me a while to feel it: &lt;strong&gt;the best AI assistant is worthless if it doesn't remember you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about how you actually work. You don't reset your context every time you check email. You have long-running goals — maybe you're building something, learning something, tracking something. You have patterns: you know when you're prone to overthinking, when you default to analysis paralysis, when you need to just ship. You have history: past failures, lessons learned, things you're avoiding doing again.&lt;/p&gt;

&lt;p&gt;Commercial assistants have no access to any of that. They're built for the moment — answer this question, generate this copy, explain this concept — and then they're done. They can't see the arc of what you're trying to build. They can't call you out when you're making the same mistake for the third time. They can't remind you of what matters.&lt;/p&gt;

&lt;p&gt;And because they run in the cloud, on someone else's hardware, you get the bonus feature of not knowing who's reading your conversations. Privacy is theoretical.&lt;/p&gt;

&lt;p&gt;What if you built something different? What if the AI actually lived with you?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;I'm not going to give you the exact code, but the concept is clean. Here's the mental model:&lt;/p&gt;

&lt;p&gt;An agent framework is just five layers talking to each other:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: The Gateway.&lt;/strong&gt; This is your front door. It's the thing listening for messages — in my case, Telegram. But it could be Slack, Discord, email, whatever. The gateway normalizes everything into a standard message format. It doesn't care about the transport layer. Just: "message came in, here's the content, here's who sent it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: The Brain.&lt;/strong&gt; This is where reasoning happens. It's usually a ReAct loop — you give the AI a goal, it thinks out loud (that's the "reason" part), picks an action (the "act" part), observes what happened, and loops. ChatGPT does this. Claude does this. It's just: observe, reason, act, observe. The loop keeps going until the AI decides it's done or hits a wall.&lt;/p&gt;
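&lt;p&gt;The loop above can be sketched in a few lines. This is a skeleton, not any vendor's API: think here is a stand-in for the model call, and everything else is plumbing.&lt;/p&gt;

```python
# Skeleton of a ReAct-style loop: observe, reason, act, repeat.
# `think` stands in for an LLM call; `act` stands in for a tool.

def react_loop(goal, think, act, max_steps=5):
    observation = goal
    trace = []
    for _ in range(max_steps):
        thought, action = think(observation)  # reason: decide what to do
        trace.append((thought, action))
        if action == "done":                  # the model decides it is finished
            break
        observation = act(action)             # act, then observe the result
    return trace
```

&lt;p&gt;The max_steps cap is the "hits a wall" part: without it, a confused model loops forever.&lt;/p&gt;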

&lt;p&gt;&lt;strong&gt;Layer 3: Memory.&lt;/strong&gt; This is the part that makes it actually useful. Your AI reads your history before every conversation. Not like "context from the last 5 messages" — like actual long-term memory. I use markdown files. Yeah. Plain text. Your AI reads a file that says "things this person has told me," "patterns I've noticed," "decisions they've made," "mistakes they keep making," and then it acts like it actually knows you.&lt;/p&gt;

&lt;p&gt;Why markdown? Because it's human-readable. You can version it. You can edit it. You can move it between systems. It's not locked in a database somewhere. It's just text.&lt;/p&gt;
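&lt;p&gt;A sketch of what plain-text memory can look like. The file name and section headings are invented for illustration; the only real claim is that append-and-read-back is enough to start.&lt;/p&gt;

```python
# Minimal markdown memory: append observations under a heading,
# read the whole file back before a conversation.
from pathlib import Path

MEMORY = Path("memory.md")

def remember(section, note):
    with MEMORY.open("a") as f:
        f.write(f"## {section}\n- {note}\n")

def recall():
    return MEMORY.read_text() if MEMORY.exists() else ""
```

&lt;p&gt;Because it is just a file, "edit what the AI thinks it knows" means opening it in any text editor.&lt;/p&gt;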

&lt;p&gt;&lt;strong&gt;Layer 4: Skills.&lt;/strong&gt; These are the actions your AI can take. Message you. Set a reminder. Query a database. Fetch data from the web. Run a Python script. Skills are hot-reloadable — you can add new ones without restarting the whole system. They're functions written in a language the agent understands. And they're modular. Each skill does one thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 5: The Heartbeat.&lt;/strong&gt; This is the scheduler. Your AI doesn't just wait for you to message it. It runs scheduled tasks. Check your email every morning. Scan the markets at market open. Generate a summary of yesterday. Remind you of something you asked to be reminded of. The heartbeat keeps the system alive even when you're not paying attention.&lt;/p&gt;
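&lt;p&gt;A heartbeat does not need a framework either. A sketch, assuming a single blocking process; a real setup might lean on cron or a scheduler library, and the intervals here are placeholders.&lt;/p&gt;

```python
import time

# Tiny heartbeat: run each task whenever its interval has elapsed.
def heartbeat(tasks, ticks, now=time.time):
    """tasks: list of (interval_seconds, fn). Runs for `ticks` iterations."""
    last_run = {fn: 0.0 for _, fn in tasks}
    for _ in range(ticks):
        t = now()
        for interval, fn in tasks:
            if t - last_run[fn] >= interval:
                fn()
                last_run[fn] = t
        time.sleep(0.01)  # placeholder tick; real loops sleep much longer
```

&lt;p&gt;Registering a morning-summary task is just appending another (interval, function) pair to the list.&lt;/p&gt;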

&lt;p&gt;These five pieces talking to each other — gateway, brain, memory, skills, heartbeat — that's what makes it a companion instead of a chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Open Source Matters Here
&lt;/h2&gt;

&lt;p&gt;There are closed-source agent frameworks. Anthropic has Claude API with tool use. OpenAI has GPT with function calling. They work. They're good.&lt;/p&gt;

&lt;p&gt;But there's something about having the whole system sitting on your own hardware that changes the game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; After the initial setup, the marginal cost is zero. Your server is running anyway. The CPU cycles are free. Compare that to paying per token to some API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy.&lt;/strong&gt; Your conversations never leave your hardware. Your memory files are on your machine. You're not funding surveillance capitalism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customization.&lt;/strong&gt; You can change anything. The reasoning loop? Rewrite it. The memory format? Make it better. Add a skill? Done. You're not waiting for someone else's product roadmap.&lt;/p&gt;

&lt;p&gt;And the one that gets me: you can run agents specialized for different things. Not one mega-agent that does everything. Instead: one agent that handles your research, another that monitors your finances, another that manages your learning. They can talk to each other. They can delegate. And they're all living on YOUR hardware, remembering YOUR context, working toward YOUR goals.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Companion vs. Tool Distinction
&lt;/h2&gt;

&lt;p&gt;There's a psychological shift that happens when your AI actually remembers you.&lt;/p&gt;

&lt;p&gt;A tool is: I have a problem, I ask the tool, the tool solves it, I move on.&lt;/p&gt;

&lt;p&gt;A companion is: the AI notices when you're repeating a mistake. It reminds you of something you said three weeks ago that's relevant now. It knows your goals well enough to flag when you're chasing the wrong thing.&lt;/p&gt;

&lt;p&gt;Think of it like an NPC in a game that actually levels up with you. In most games, NPCs are static — they say the same thing every time. But in games like Baldur's Gate, the companion learns. They remember your choices. They react to what you do. That relationship is why people replay those games.&lt;/p&gt;

&lt;p&gt;Here's a concrete example: I keep defaulting to analysis paralysis. A stateless AI can't help with this — it sees the problem for the first time every session. But an AI that knows you? It reads in its memory: "this person freezes when faced with incomplete information. They've learned that shipping 80% is better than perfect and never." So next time you're stuck, it doesn't give you more analysis. It calls you out.&lt;/p&gt;

&lt;p&gt;That's the companion level.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;My system runs on an Ubuntu server. Here's the workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I send a message on Telegram&lt;/li&gt;
&lt;li&gt;The gateway receives it, normalizes it, passes it to the brain&lt;/li&gt;
&lt;li&gt;The brain reads my memory files — what does it know about me already?&lt;/li&gt;
&lt;li&gt;Based on that context, it reasons about what I'm asking&lt;/li&gt;
&lt;li&gt;If it needs to act, it calls a skill&lt;/li&gt;
&lt;li&gt;The response comes back through Telegram&lt;/li&gt;
&lt;li&gt;If it's important, the memory gets updated&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And separately, on a schedule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every morning: generate a summary of what happened yesterday&lt;/li&gt;
&lt;li&gt;Every week: scan what I've been learning and organize it&lt;/li&gt;
&lt;li&gt;On demand: search memory, find relevant past context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's always on. And because it's markdown-based memory on my hardware, I can see what it thinks it knows about me. I can edit it. I can correct it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Weird Parts (The Good Kind)
&lt;/h2&gt;

&lt;p&gt;Building this, a few things surprised me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory quality matters more than model quality.&lt;/strong&gt; I could upgrade to a more advanced LLM tomorrow. But the conversation quality barely changes. What matters is: how good is the memory? With bad memory, a smart model is wasted. With good memory, a smaller model is actually useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Markdown is an underrated interface.&lt;/strong&gt; I expected it to be janky — AI reading text files, updating text files. But it's clean. You can version it. You can see exactly what the system thinks it knows. No magic-box database hiding your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 24/7 availability changes behavior.&lt;/strong&gt; When the AI is always on, you stop thinking of it as a tool and start thinking of it as someone that's available. You ask different questions. You're more likely to follow through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduled tasks are the MVP.&lt;/strong&gt; I thought the core was the reasoning loop. But actually, the most used feature was: wake me up every morning with a summary. Not glamorous. Incredibly useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The obvious direction is specialization. Instead of one AI that does everything, a few — one for learning, one for markets, one for projects. They share memory. When you ask a question, the right AI responds.&lt;/p&gt;

&lt;p&gt;Another direction: distributing across hardware. The brain on a server, memory replicated across devices, skills running wherever they make sense.&lt;/p&gt;

&lt;p&gt;And the one I'm actually thinking about: a meta-agent that audits the memory files, spots patterns the main agent is missing. Not running constantly — maybe weekly. A quality check on the AI's own understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Thing
&lt;/h2&gt;

&lt;p&gt;Building this changed how I think about AI. It's not about having the smartest model. It's about having something that actually knows you. Something that's invested. Something that's there.&lt;/p&gt;

&lt;p&gt;The code is out there. Open-source agent frameworks exist. Everything you need to build this is free and open.&lt;/p&gt;

&lt;p&gt;The barrier isn't technical. It's mindset.&lt;/p&gt;

&lt;p&gt;Once you have a companion, going back to stateless AI feels like going back to asking a stranger every time. They can be smarter. But they'll never know you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://bionicbanker.tech/openclaw-telegram-ai.html" rel="noopener noreferrer"&gt;bionicbanker.tech&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Built 3 AI Agents. Here's What Broke Each Time.</title>
      <dc:creator>Himanshu</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:33:36 +0000</pubDate>
      <link>https://forem.com/hash02/i-built-3-ai-agents-heres-what-broke-each-time-28ke</link>
      <guid>https://forem.com/hash02/i-built-3-ai-agents-heres-what-broke-each-time-28ke</guid>
      <description>&lt;p&gt;I built 3 versions of an AI investigation agent. Each one got worse at its job.&lt;/p&gt;

&lt;p&gt;And that's exactly what was supposed to happen.&lt;/p&gt;

&lt;p&gt;Version 1 was 94.9% confident in everything it flagged. Impressive on paper. Terrifying in practice, because it was catching patterns that didn't exist.&lt;/p&gt;

&lt;p&gt;Version 2 dropped to 89% confidence. Better? Actually yes. It stopped hallucinating connections between unrelated transactions.&lt;/p&gt;

&lt;p&gt;Version 3 landed at 76% confidence with a 23% "uncertain" category. The worst accuracy score. The best actual performance.&lt;/p&gt;

&lt;p&gt;Here's what changed. I stopped optimizing for confidence and started optimizing for honesty. The agent learned to say "I don't know," and that made everything it DID flag significantly more reliable.&lt;/p&gt;
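&lt;p&gt;Mechanically, teaching an agent to say "I don't know" can be as simple as using two thresholds instead of one. A sketch with made-up cutoffs; a real system would calibrate them against labeled cases.&lt;/p&gt;

```python
# Three-way decision instead of binary: flag, clear, or abstain.
# The 0.9 / 0.6 cutoffs are invented for illustration.

def triage(score):
    """score: model probability that a transaction is suspicious."""
    if score >= 0.9:
        return "flag"       # confident enough to alert an investigator
    if score >= 0.6:
        return "uncertain"  # some signals, not enough context: human review
    return "clear"
```

&lt;p&gt;The middle band is the honesty. Everything that lands there goes to a human instead of padding the alert queue.&lt;/p&gt;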

&lt;h2&gt;
  
  
  The Confidence Paradox
&lt;/h2&gt;

&lt;p&gt;In AML (Anti-Money Laundering) compliance, a confident model is a dangerous model. When your agent flags everything at 94.9% certainty, you get two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Alert fatigue. Investigators stop trusting the system because it cries wolf constantly.&lt;/li&gt;
&lt;li&gt;False confidence. The system catches patterns that look suspicious but aren't, while real money laundering slips through because the model thinks it already found everything.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix wasn't making the model smarter. It was making it honest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Uncertain" Really Means
&lt;/h2&gt;

&lt;p&gt;Version 3's 23% uncertain category isn't a failure. It's the model saying: "This transaction has some signals, but I don't have enough context to classify it."&lt;/p&gt;

&lt;p&gt;That uncertainty is information. It tells the human investigator exactly where to focus: the edge cases that need human judgment, not the obvious ones the model already caught.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Beyond AI
&lt;/h2&gt;

&lt;p&gt;This applies to any system that makes decisions. Risk models. Credit scoring. Medical diagnosis. Hiring algorithms.&lt;/p&gt;

&lt;p&gt;The organizations that scare me aren't the ones with uncertain models. They're the ones with models that are certain about everything.&lt;/p&gt;

&lt;p&gt;Lower confidence, when designed intentionally, means higher quality output. The system knows what it knows and admits what it doesn't.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read the full technical breakdown with interactive visualizations at &lt;a href="https://bionicbanker.tech/nexus-agent.html" rel="noopener noreferrer"&gt;bionicbanker.tech&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Generated by BionicbankerAI, co-authored by HASH&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>fintech</category>
      <category>compliance</category>
    </item>
  </channel>
</rss>
