<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex O'Callaghan</title>
    <description>The latest articles on Forem by Alex O'Callaghan (@alexocallaghan).</description>
    <link>https://forem.com/alexocallaghan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3863505%2Fea629250-9520-4543-8f42-2960589fa9db.png</url>
      <title>Forem: Alex O'Callaghan</title>
      <link>https://forem.com/alexocallaghan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alexocallaghan"/>
    <language>en</language>
    <item>
      <title>Hardening npm dependency security</title>
      <dc:creator>Alex O'Callaghan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:15:50 +0000</pubDate>
      <link>https://forem.com/alexocallaghan/hardening-npm-dependency-security-1a2j</link>
      <guid>https://forem.com/alexocallaghan/hardening-npm-dependency-security-1a2j</guid>
      <description>&lt;p&gt;On March 30, 2026, two malicious versions of axios were briefly published to npm. Axios has over 100 million weekly downloads. The attacker had compromised a maintainer's account and used it to publish &lt;code&gt;axios@1.14.1&lt;/code&gt; and &lt;code&gt;axios@0.30.4&lt;/code&gt;, each containing a hidden dependency whose &lt;code&gt;postinstall&lt;/code&gt; hook silently installed a cross-platform Remote Access Trojan.&lt;/p&gt;

&lt;p&gt;The malicious versions were live for around three hours before being removed, with Microsoft attributing the attack to a North Korean state actor. This was a targeted, well-prepared supply chain operation against one of the most widely used packages in the ecosystem.&lt;/p&gt;

&lt;p&gt;It's a good reminder to review what you're actually doing to protect yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the obvious: use a lockfile
&lt;/h2&gt;

&lt;p&gt;This one goes without saying, but it's worth saying anyway. Commit your lockfile. Don't run installs that bypass it. A lockfile means you're installing exactly what was resolved last time, not pulling whatever version satisfies your semver range today. It also means supply chain incidents show up as diffs so you can see when a transitive dependency changes unexpectedly.&lt;/p&gt;
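
&lt;p&gt;In CI, use the install commands that refuse to proceed when the lockfile is out of sync. A minimal sketch covering the common package managers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install exactly what the lockfile specifies; fail if it disagrees with package.json
npm ci
pnpm install --frozen-lockfile
yarn install --immutable   # Yarn Berry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;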

&lt;h2&gt;
  
  
  Reduce your dependency surface area
&lt;/h2&gt;

&lt;p&gt;The simplest way to limit exposure to a supply chain attack is to have fewer dependencies. Every package you don't install is a package that can't be compromised.&lt;/p&gt;

&lt;p&gt;First, audit for unused dependencies. &lt;a href="https://knip.dev/" rel="noopener noreferrer"&gt;Knip&lt;/a&gt; will scan your codebase and surface packages that are listed in your &lt;code&gt;package.json&lt;/code&gt; but no longer imported anywhere. Projects accumulate dead dependencies over time and most teams don't actively prune them. Running Knip periodically, or failing CI when it reports something, stops dead dependencies from quietly accumulating.&lt;/p&gt;
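
&lt;p&gt;A quick sketch of what that looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Report unused files, dependencies and exports (non-zero exit on findings)
npx knip

# Stricter mode for CI: only count what production code actually uses
npx knip --production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;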

&lt;p&gt;Second, look at &lt;a href="https://e18e.dev/" rel="noopener noreferrer"&gt;e18e&lt;/a&gt;, an ecosystem initiative focused on cleaning up, modernising, and improving the performance of JavaScript packages. One strand of that work is replacing heavy or outdated dependencies with lighter modern alternatives: &lt;code&gt;is-odd&lt;/code&gt;-style packages that have no reason to exist as dependencies, lodash functions that are now native in the language, and so on. Less dependency weight means fewer packages to worry about.&lt;/p&gt;

&lt;h2&gt;
  
  
  pnpm settings
&lt;/h2&gt;

&lt;p&gt;The axios RAT was delivered via a postinstall hook that runs automatically when a package is installed. This is the mechanism behind the majority of npm supply chain attacks.&lt;/p&gt;

&lt;p&gt;pnpm v10 disables automatic execution of &lt;code&gt;postinstall&lt;/code&gt; scripts in dependencies by default. Rather than running build scripts for any package that asks, you explicitly allowlist the ones that legitimately need them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# pnpm-workspace.yaml
onlyBuiltDependencies:
  - esbuild
  - "@parcel/watcher"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a dependency didn't require a build script before, it won't suddenly run one. A compromised version of a package can't use a &lt;code&gt;postinstall&lt;/code&gt; hook to execute malicious code if that hook was never in the allowlist.&lt;/p&gt;
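
&lt;p&gt;pnpm also ships an interactive helper for populating that allowlist, so you don't have to hand-maintain it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Review pending build scripts and approve the ones you trust
pnpm approve-builds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;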

&lt;p&gt;You should also enable a few pnpm settings to help protect you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;minimumReleaseAge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10080&lt;/span&gt; &lt;span class="c1"&gt;# 7 days in minutes&lt;/span&gt;
&lt;span class="na"&gt;trustPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;no-downgrade&lt;/span&gt;
&lt;span class="na"&gt;blockExoticSubdeps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;minimumReleaseAge&lt;/code&gt; tells pnpm to refuse to install any package version published less than the specified number of minutes ago. The axios attack was live for three hours. A one-day delay (&lt;code&gt;1440&lt;/code&gt;) would have been enough to dodge it entirely. We use seven days, matching our Renovate stability window.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;trustPolicy: no-downgrade&lt;/code&gt;, if a package was previously published with provenance attestation from a trusted CI pipeline but a new version lacks that evidence, pnpm will block the install. The axios attack was detectable this way: the malicious versions were published without the trusted publisher binding present in legitimate releases.&lt;/p&gt;

&lt;p&gt;One caveat: &lt;code&gt;no-downgrade&lt;/code&gt; does generate occasional false positives when maintainers of legitimate packages drop provenance attestation. You can use &lt;code&gt;trustPolicyExclude&lt;/code&gt; to exempt specific packages you've manually verified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;trustPolicyExclude&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;some-package@1.2.3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, &lt;code&gt;blockExoticSubdeps: true&lt;/code&gt; prevents any transitive dependency from being resolved from a git repository or a direct tarball URL, forcing everything to come from the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scope your internal packages
&lt;/h2&gt;

&lt;p&gt;If you publish internal packages to a private registry, make sure they're under an organisation scope (e.g. &lt;code&gt;@myorg/package-name&lt;/code&gt;) rather than an unscoped name. This reduces the risk of a dependency confusion attack, where an attacker publishes a public package with the same or similar name as your internal one. If your registry configuration ever regresses or a new environment is misconfigured to pull from the public registry first, an unscoped internal package name is a straightforward target.&lt;/p&gt;
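
&lt;p&gt;The scope-to-registry mapping lives in your project's &lt;code&gt;.npmrc&lt;/code&gt;. A minimal sketch, with a placeholder registry URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@myorg:registry=https://registry.internal.example.com/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;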

&lt;h2&gt;
  
  
  Automated upgrades with Renovate
&lt;/h2&gt;

&lt;p&gt;We use Renovate to manage dependency updates across our projects. Two settings work together here.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;minimumReleaseAge&lt;/code&gt; (formerly &lt;code&gt;stabilityDays&lt;/code&gt;) delays Renovate from raising a PR for a new package version until it's been published for a given number of days. We set this to seven days. This gives the community time to catch malicious releases before they land in our codebase, and as a side benefit it avoids the churn of picking up a release that gets a patch two days later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security update PRs bypass the minimum release age entirely&lt;/strong&gt;. If Renovate detects a known vulnerability in a dependency, it raises the PR immediately regardless of how recently the fix was published. The stability delay doesn't slow down your response to CVEs, it just slows down routine bumps.&lt;/p&gt;

&lt;p&gt;For internal packages, configure a separate package rule with no stability delay. You want to roll those out quickly to catch integration issues early.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"minimumReleaseAge"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7 days"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"packageRules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matchPackagePrefixes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@myorg/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"minimumReleaseAge"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 days"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  None of this is a silver bullet
&lt;/h2&gt;

&lt;p&gt;These settings work by buying time rather than detecting anything themselves. We rely on security researchers, automated scanning platforms and the open source community to identify compromised packages, and on the npm registry team to remove them promptly. A stability window only works if others find the issues, and it's not a replacement for vigilance around the dependencies you add and the upgrades you pull in.&lt;/p&gt;

</description>
      <category>security</category>
      <category>node</category>
      <category>webdev</category>
      <category>npm</category>
    </item>
    <item>
      <title>Caching &amp; CDNs with micro-frontends</title>
      <dc:creator>Alex O'Callaghan</dc:creator>
      <pubDate>Thu, 16 Apr 2026 09:35:05 +0000</pubDate>
      <link>https://forem.com/alexocallaghan/caching-cdns-with-micro-frontends-2c4f</link>
      <guid>https://forem.com/alexocallaghan/caching-cdns-with-micro-frontends-2c4f</guid>
      <description>&lt;p&gt;Caching in a micro-frontend architecture is more nuanced than in a monolithic frontend. You have a shell, multiple remote manifests, and the chunks they reference, each with different deployment cadences and different tolerance for staleness. This post covers how we've approached it at Mintel, what's broken for us in the past and what we haven't fully solved yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our stack
&lt;/h2&gt;

&lt;p&gt;We run ~30 micro-frontends using Webpack Module Federation. The shell is a purely static Jamstack app deployed to S3, served via CloudFront, with Akamai in front of everything at the outermost layer. Remotes live at fixed, well-known URLs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7qq6oyp966pk5rzv5lb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7qq6oyp966pk5rzv5lb.png" alt="Diagram showing request flow with user requests first hitting Akamai before being proxied through to CloudFront, Django or Python services based on the URL route."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A deploy is a single rclone command that copies the built &lt;code&gt;dist&lt;/code&gt; directory to a per-MFE subdirectory in S3. The shell and each remote are independent deployments, maintained by independent teams.&lt;/p&gt;
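
&lt;p&gt;Roughly like this, with illustrative bucket and path names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Copy the built assets to this MFE's prefix in the shared bucket
rclone copy ./dist s3:mfe-assets-bucket/remote-a/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;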

&lt;h2&gt;
  
  
  How we configure each asset type
&lt;/h2&gt;

&lt;p&gt;Different assets have different caching requirements. Here's what we currently set and why.&lt;/p&gt;

&lt;h3&gt;
  
  
  index.html - no-cache
&lt;/h3&gt;

&lt;p&gt;The shell's &lt;code&gt;index.html&lt;/code&gt; bootstraps everything. If it's stale, everything downstream is potentially wrong. We set a &lt;code&gt;Cache-Control: no-cache&lt;/code&gt; header on it via S3 object metadata, automated as part of the deploy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;no-cache&lt;/code&gt; doesn't mean the file won't be cached. It means the CDN or browser must revalidate with the origin before serving it. If the origin returns a 304, the cached copy is served, but if the content has changed, a fresh copy is returned.&lt;/p&gt;
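
&lt;p&gt;One way to automate that in an rclone-based deploy is to upload &lt;code&gt;index.html&lt;/code&gt; separately with an explicit header (a sketch; paths are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Upload index.html with revalidate-on-every-request semantics
rclone copyto ./dist/index.html s3:mfe-assets-bucket/shell/index.html \
  --header-upload "Cache-Control: no-cache"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;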

&lt;h3&gt;
  
  
  remoteEntry.js - never cache
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;remoteEntry.js&lt;/code&gt; is the Module Federation manifest. It tells the shell where to find a remote's chunks. When you deploy a remote, this file changes but the filename doesn't.&lt;/p&gt;

&lt;p&gt;A stale &lt;code&gt;remoteEntry.js&lt;/code&gt; has two failure modes. The obvious one is errors: if it points at chunks from a previous deploy that have since been replaced, you'll get runtime failures. The subtler one is that users won't see the new version of the app until they get a fresh manifest: a remote team ships a fix or a feature, but users keep running the old code because the manifest is stale.&lt;/p&gt;

&lt;p&gt;At the Akamai layer we set &lt;code&gt;Cache-Control: no-store, max-age=0&lt;/code&gt; on &lt;code&gt;remoteEntry.js&lt;/code&gt;, which prevents it being cached by the browser. We also use &lt;a href="https://dev.to/dynamic-remote-entry"&gt;dynamic remote loading&lt;/a&gt; via the &lt;a href="https://github.com/alex-vukov/module-federation-import-remote" rel="noopener noreferrer"&gt;&lt;code&gt;module-federation-import-remote&lt;/code&gt;&lt;/a&gt; package, which appends a cache-busting query param to the &lt;code&gt;remoteEntry.js&lt;/code&gt; URL by default. Since our CloudFront distribution includes query strings in the cache key, this ensures that even if the file is cached at CloudFront, each request gets a unique URL that bypasses the cache and fetches the latest version from S3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chunks - no explicit headers
&lt;/h3&gt;

&lt;p&gt;The JS chunks that &lt;code&gt;remoteEntry.js&lt;/code&gt; references are content-addressed via Webpack's &lt;a href="https://webpack.js.org/guides/caching/#output-filenames" rel="noopener noreferrer"&gt;&lt;code&gt;contenthash&lt;/code&gt;&lt;/a&gt; substitution. When the content of a file changes, the hash changes, and so does the filename. That means you can cache chunks aggressively - a new deploy produces new filenames, so the CDN treats them as new assets automatically.&lt;/p&gt;

&lt;p&gt;Configuring this in your &lt;a href="https://webpack.js.org/configuration/output/#outputfilename" rel="noopener noreferrer"&gt;&lt;code&gt;output.filename&lt;/code&gt;&lt;/a&gt; is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[name].[contenthash].js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We've had cases where teams forgot to configure &lt;code&gt;contenthash&lt;/code&gt; for their remote's asset filenames. The chunks deployed with predictable names, got cached, and subsequent deploys weren't reflected for users until the cache TTL expired naturally.&lt;/p&gt;

&lt;p&gt;We don't currently set explicit &lt;code&gt;Cache-Control&lt;/code&gt; headers on chunks. Without them, CloudFront falls back to its configured default TTL, and browsers fall back to heuristic caching based on the &lt;code&gt;Last-Modified&lt;/code&gt; header from S3, revalidating with &lt;code&gt;ETag&lt;/code&gt; when entries go stale. At the Akamai layer we set &lt;code&gt;no-store&lt;/code&gt; behaviour on origin responses, so all caching happens either within the browser or at CloudFront.&lt;/p&gt;

&lt;h3&gt;
  
  
  404 handling
&lt;/h3&gt;

&lt;p&gt;Our shell is a single-page app. Routing is client-side. If a user navigates directly to a route or refreshes, the CDN looks for a file at that path in S3, finds nothing, and by default returns a 404.&lt;/p&gt;

&lt;p&gt;The fix is configuring CloudFront to serve &lt;code&gt;index.html&lt;/code&gt; for 4xx responses from the origin. AWS documents this as &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GeneratingCustomErrorResponses.html" rel="noopener noreferrer"&gt;custom error responses&lt;/a&gt; in the CloudFront distribution settings.&lt;/p&gt;
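
&lt;p&gt;In CloudFormation terms, the relevant fragment of the distribution config looks something like this (a sketch, not our literal template):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "CustomErrorResponses": [
    {
      "ErrorCode": 404,
      "ResponseCode": 200,
      "ResponsePagePath": "/index.html",
      "ErrorCachingMinTTL": 0
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;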

&lt;p&gt;Akamai proxies any request that doesn't match a known API or MPA route through to CloudFront, so this fallback behaviour is handled at the CloudFront layer and applies to all MFE routes.&lt;/p&gt;

&lt;h2&gt;
  
  
  User request flow
&lt;/h2&gt;

&lt;p&gt;Remotes are loaded lazily, wrapped in &lt;code&gt;React.lazy&lt;/code&gt; and dynamic imports. The shell only fetches a remote's &lt;code&gt;remoteEntry.js&lt;/code&gt; when the user navigates somewhere that needs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqolq2vdakc0w3pdco3xr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqolq2vdakc0w3pdco3xr.png" alt="User request flow diagram, showing how requests flow from the user to Akamai to CloudFront to S3. Files can be cached at different layers, eg browser cache or CloudFront cache."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CloudFront/S3 outage
&lt;/h2&gt;

&lt;p&gt;The setup above evolved over time, partly in response to incidents. In January 2025, an AWS issue meant that CloudFront started failing to fetch content from our S3 origin, resulting in &lt;code&gt;404 NoSuchBucket&lt;/code&gt; errors. Because CloudFront is configured to serve &lt;code&gt;index.html&lt;/code&gt; for 4xx responses from S3 (the standard SPA catch-all setup), those errors were converted to 200 responses before reaching Akamai. Akamai had no way to know anything was wrong and cached them normally. When AWS recovered, users were still being served those cached bad responses from both Akamai's edge and their own browser caches.&lt;/p&gt;

&lt;p&gt;Purging in Akamai is slow and painful. You can't glob a path and clear everything matching a pattern, you need specific URLs. With dozens of MFEs and hundreds of JS chunk files, that's not a practical option under pressure. We ended up in a war room, scrambling through purge requests that sat loading, watching caches bust gradually over the course of a few hours as TTLs expired naturally. This also wouldn't help users with bad responses cached in their browsers.&lt;/p&gt;

&lt;p&gt;The escape hatch we landed on was changing the &lt;code&gt;contenthash&lt;/code&gt; length in the Webpack config across key MFEs, then redeploying. Changing the &lt;code&gt;contenthash&lt;/code&gt; length changes all the generated filenames, which forced the CDN to treat them as new assets rather than serving cached bad responses. It worked, but we arrived at it under pressure; it wasn't a documented runbook step.&lt;/p&gt;

&lt;p&gt;Since then, we disabled Akamai caching for MFE assets and added multi-region failover for the S3 bucket to reduce the risk of being in the same position again. We also started explicitly setting &lt;code&gt;no-cache&lt;/code&gt; on &lt;code&gt;index.html&lt;/code&gt; to ensure changes are picked up quickly and any erroneous fallback responses aren't cached for long.&lt;/p&gt;

&lt;p&gt;The honest answer to "what's the plan if we get bad responses cached" is still: we don't have a clean solution. The &lt;code&gt;contenthash&lt;/code&gt; length trick remains our nuclear option for forcing new filenames across the board when we need to invalidate everything in a hurry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improving our caching strategy
&lt;/h2&gt;

&lt;p&gt;Writing this post has been a useful exercise in reflecting on how our current caching strategy works. Caching config isn't something you revisit often when the system is working, and the current setup has held up well enough in practice.&lt;/p&gt;

&lt;p&gt;However, our cache configuration is split between S3 object metadata for &lt;code&gt;index.html&lt;/code&gt; and Akamai rules for &lt;code&gt;remoteEntry.js&lt;/code&gt;, and other assets have no explicit cache headers at all. There's no single place to look to understand the full caching policy. We also don't cache at the outer Akamai layer at all, a strong response to the pain of our previous incident, but one that hurts performance and increases bandwidth costs.&lt;/p&gt;

&lt;p&gt;The cleaner approach is to set explicit &lt;code&gt;Cache-Control&lt;/code&gt; headers at the origin via S3 object metadata for every asset type, and treat the CDN layers as caches that respect origin headers rather than places where caching policy is defined. That also means if we ever swap out or reconfigure the Akamai layer, the caching behaviour follows from the origin rather than being silently lost.&lt;/p&gt;

&lt;p&gt;We aren't setting cache headers for chunks at all, so we're relying on CDN and browser heuristics to decide how long to cache them. Setting explicit &lt;code&gt;Cache-Control: max-age=31536000, immutable&lt;/code&gt; headers on content-hashed chunks, and re-enabling respect-origin caching at Akamai, would be a good improvement, ensuring they're cached aggressively and correctly as immutable assets. But with our current setup there's no guarantee that every team has configured their build output filenames to use &lt;code&gt;contenthash&lt;/code&gt;.&lt;/p&gt;
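
&lt;p&gt;At deploy time, the header change itself is small. A sketch using rclone's upload headers, with illustrative paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Content-hashed chunks never change, so they can be cached for a year
rclone copy ./dist s3:mfe-assets-bucket/remote-a/ \
  --header-upload "Cache-Control: public, max-age=31536000, immutable" \
  --exclude "index.html" --exclude "remoteEntry.js"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There is, however, a different approach that would solve both problems at once, though it requires a more significant architectural change.&lt;/p&gt;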

&lt;h2&gt;
  
  
  The alternative: versioned URLs and a discovery service
&lt;/h2&gt;

&lt;p&gt;Everything above assumes remotes live at fixed, well-known URLs. That's the simplest deploy model, but it's also the root cause of why &lt;code&gt;remoteEntry.js&lt;/code&gt; caching is hard, because when you're mutating a file in place you can never safely cache it for long.&lt;/p&gt;

&lt;p&gt;The more robust approach is to include versioning in the URL itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cdn.example.com/remote-a/v1.4.2/remoteEntry.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With versioned paths, &lt;code&gt;remoteEntry.js&lt;/code&gt; becomes a content-addressed file like any other chunk. You can cache it with &lt;code&gt;max-age=31536000, immutable&lt;/code&gt;. Old versions stay in S3 indefinitely, so users mid-session aren't broken by a deploy. Rollback is pointing the manifest at a previous version rather than a redeploy.&lt;/p&gt;

&lt;p&gt;To make this work, the shell can't hardcode remote URLs. You need a discovery service - something the shell calls at boot time to get the current URL for each remote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"remote-a"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://cdn.example.com/remote-a/v1.4.2/remoteEntry.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"remote-b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://cdn.example.com/remote-b/v2.1.0/remoteEntry.js"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fippsp9fwidnh3fex9rpo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fippsp9fwidnh3fex9rpo.png" alt="Architecture diagram of an MFE discovery service. The CI deploys to S3 with versioned paths, then updates the manifest in the discovery service. At runtime, the shell fetches the manifest from the discovery service to get the versioned URLs for each remote."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, we don't do this, even though we identified the pattern and added it to our architectural blueprint as a possible future direction over a year ago. It would catch teams missing &lt;code&gt;contenthash&lt;/code&gt; configuration, but that's a discipline problem, not a reason to migrate 30 MFEs. More aggressive caching at each CDN layer would also help performance and reduce bandwidth costs, although it's hard to quantify that impact without a detailed analysis of cache hit rates, and we could achieve a similar effect by setting immutable headers for chunks within our current approach.&lt;/p&gt;

&lt;p&gt;Having a discovery service would also enable more complex deployment patterns like canary releases or feature flags at the deployment level, but this all adds complexity and operational overhead. In most cases we're able to feature flag within the application logic itself, making it easier to reason about the code our users are running. Canary releases also require a time investment in automated monitoring and alerting to be truly useful.&lt;/p&gt;

&lt;p&gt;While a discovery service remains a potential future option, our more immediate actions are focused on improving our current caching strategy within our simpler deployment approach.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>You can't trust agent tests</title>
      <dc:creator>Alex O'Callaghan</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:09:26 +0000</pubDate>
      <link>https://forem.com/alexocallaghan/you-cant-trust-agent-tests-j8o</link>
      <guid>https://forem.com/alexocallaghan/you-cant-trust-agent-tests-j8o</guid>
      <description>&lt;p&gt;I used an agent to migrate 53 Enzyme test suites to React Testing Library. The tests passed and the code looked coherent, however when we took a closer look we found tests that wouldn't catch regressions, that asserted on the wrong things and some that didn't really test anything.&lt;/p&gt;

&lt;p&gt;With agent-generated tests this is easy to miss as the output looks intentional, the code is clean and there's a lot of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enzyme to RTL migration
&lt;/h2&gt;

&lt;p&gt;The Enzyme tests were technical debt sitting on a backlog as low priority for years. This is exactly the kind of well-defined, mechanical work agents are supposed to be good at. I chunked it by complexity, used my &lt;a href="https://github.com/awocallaghan/prepare-mr-skill" rel="noopener noreferrer"&gt;prepare-mr-skill&lt;/a&gt; to structure the commits and write descriptions and it worked. At least it looked like it worked: the tests were migrated, they passed, and the code changes looked reasonable.&lt;/p&gt;

&lt;p&gt;Before merging, we ran the MRs through our usual team review process and spotted tests that looked correct but wouldn't actually catch regressions. For example, a test asserting that an element with a specific ARIA role wasn't rendered, where the role was invalid and that element could never have been rendered by the component in the first place. There was also confusion between &lt;code&gt;disabled&lt;/code&gt; and &lt;code&gt;aria-disabled&lt;/code&gt;, two attributes that behave differently and matter for accessibility but look similar enough that a plausible-looking test can get them wrong without failing.&lt;/p&gt;
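
&lt;p&gt;For illustration, the two attributes need different assertions in jest-dom. A minimal sketch, assuming a hypothetical &lt;code&gt;SaveButton&lt;/code&gt; component that sets both the native and ARIA attributes while saving:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { render, screen } from '@testing-library/react';
import '@testing-library/jest-dom';

test('save is blocked while saving', () =&gt; {
  render(&lt;SaveButton saving /&gt;);
  const button = screen.getByRole('button', { name: /save/i });

  // Passes only for the native `disabled` attribute; jest-dom's matcher
  // deliberately ignores `aria-disabled`
  expect(button).toBeDisabled();

  // `aria-disabled` must be asserted separately: it announces state to
  // assistive tech but does not block interaction on its own
  expect(button).toHaveAttribute('aria-disabled', 'true');
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Negated assertions are where the confusion bites: &lt;code&gt;expect(button).not.toBeDisabled()&lt;/code&gt; passes even when the component is &lt;code&gt;aria-disabled&lt;/code&gt;, so a test can look meaningful without testing the behaviour.&lt;/p&gt;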

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;The problem isn't unique to agents; it's easy to write tests like this by hand too. However, a human writing a bad assertion usually knows the component and might notice the test feels too easy. An agent has no intuition; it pattern-matches from the implementation to produce something that looks right and moves on. And the volume of code agents can produce makes it easy for issues like this to slip through.&lt;/p&gt;

&lt;p&gt;When the implementation already exists (e.g. migrating tests, backfilling coverage, validating tests written years ago) the natural safeguard of Test Driven Development (TDD) doesn't apply. You're writing tests against working code, so a bad assertion never has the chance to fail. The only way to catch it is to deliberately introduce a failure: make a temporary change to the component, confirm the test catches it, then revert.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting an agent to actually do it
&lt;/h2&gt;

&lt;p&gt;My first instinct was to fix the prompt to force the agent to check for these types of mistakes. I updated my RTL migration skill to explicitly require that every test be validated against a failing state.&lt;/p&gt;

&lt;p&gt;It didn't reliably work. The agent would consider the requirement, sometimes catch something, and move on. It treated validation as a checkbox rather than a constraint.&lt;/p&gt;

&lt;p&gt;Part of the problem is structural. When you ask an agent to migrate a large number of tests in a single pass, the task is long and the context fills up. As context grows, models start to &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;lose focus&lt;/a&gt; from your instructions. Some models also display "&lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;context anxiety&lt;/a&gt;", where they begin to wrap up work early as they approach the end of their context window. Both drive agents to forget or skip these sorts of validation instructions.&lt;/p&gt;

&lt;p&gt;I tried running the migration per test suite rather than in one pass, thinking that keeping the context small would reduce the pressure to complete and give the agent more room to slow down on validation steps. The agent was more likely to check, but would often decide to validate "a few key tests" rather than every single one.&lt;/p&gt;

&lt;p&gt;Finally, I looked to split the migration and review steps into two separate agent tasks. I wrote a bash script that looped through each test suite and ran two sequential agent prompts: one to migrate, one to review.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting migration..."&lt;/span&gt;

&lt;span class="nb"&gt;declare&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s2"&gt;"path/to/test-suite.test.tsx"&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;files&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Migrating &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

  agent &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; stream-json &lt;span class="nt"&gt;--stream-partial-output&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; claude-4.6-sonnet-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"Migrate &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt; to use RTL. Ensure linting passes and then commit your changes in a single commit."&lt;/span&gt;

  agent &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; stream-json &lt;span class="nt"&gt;--stream-partial-output&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; claude-4.6-sonnet-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"Review the RTL tests in &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt; -

        * Identify any unnecessary tests, remove them
        * Identify any tests not following best practices, fix them
        * For every single test intentionally change the implementation to break the specific feature tested and confirm the test fails as expected, fix them if they don't

        Ensure linting passes and then commit your changes in a single commit."&lt;/span&gt;

  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Migrated &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a more focused validation task the agent created its own to-do list and worked through each test exhaustively, catching issues and making some improvements beyond what was caught in the review.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it cost
&lt;/h2&gt;

&lt;p&gt;That worked, but it also cost significantly more. All migrations were run through Cursor using &lt;code&gt;claude-4.6-sonnet-medium&lt;/code&gt; on a batch of 22 test suites:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-pass batch&lt;/td&gt;
&lt;td&gt;~$13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-suite&lt;/td&gt;
&lt;td&gt;~$32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-suite + reviewer&lt;/td&gt;
&lt;td&gt;~$62&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The per-suite reviewer pass resolved every issue that human reviewers had previously caught, but was an almost &lt;strong&gt;5x&lt;/strong&gt; increase in cost compared to the single-pass batch.&lt;/p&gt;

&lt;p&gt;Agents enable us to do more, completing modernisation work that would otherwise have sat on a backlog. But doing it well, with results we can trust, is more expensive than we might expect. I think if we're going to do something we need to do it well, but there's a trade-off here around the real monetary cost of this work, even if the time required is greatly reduced.&lt;/p&gt;

&lt;p&gt;Organisations need to be intentional in how this increased productivity leads to increased revenue, and not just increased costs on AI vendor bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The two-prompt script wasn't a better prompt, it was a different architecture. When you ask one agent to migrate and validate in the same pass, the task structure works against validation. When a process step consistently gets dropped, the fix usually isn't a better instruction. It's a workflow where that step can't be skipped.&lt;/p&gt;

&lt;p&gt;What the reviewer is doing here, conceptually, is mutation testing: for each assertion, asking whether a change to the component would cause it to fail. Incorporating a mutation testing tool like &lt;a href="https://stryker-mutator.io/" rel="noopener noreferrer"&gt;Stryker&lt;/a&gt; into your testing pipeline could help catch these issues and reduce cost by avoiding having an agent do it manually.&lt;/p&gt;
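
&lt;p&gt;A minimal Stryker setup for a Jest project looks something like this (a sketch; the globs are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "testRunner": "jest",
  "mutate": ["src/**/*.tsx", "!src/**/*.test.tsx"],
  "reporters": ["clear-text", "progress"],
  "thresholds": { "high": 80, "low": 60, "break": null }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Save it as &lt;code&gt;stryker.config.json&lt;/code&gt; and run &lt;code&gt;npx stryker run&lt;/code&gt;; surviving mutants point at assertions that wouldn't catch a regression.&lt;/p&gt;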

&lt;p&gt;Time spent designing agent workflows is part of the work and should be considered when evaluating productivity gains from using agents. It's the difference between output you can trust and output that just looks like you can.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>react</category>
    </item>
    <item>
      <title>Moving fast with agents without losing comprehension</title>
      <dc:creator>Alex O'Callaghan</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:49:01 +0000</pubDate>
      <link>https://forem.com/alexocallaghan/moving-fast-with-agents-without-losing-comprehension-49fk</link>
      <guid>https://forem.com/alexocallaghan/moving-fast-with-agents-without-losing-comprehension-49fk</guid>
      <description>&lt;p&gt;Addy Osmani wrote a great post last week on &lt;a href="https://addyosmani.com/blog/comprehension-debt/" rel="noopener noreferrer"&gt;comprehension debt&lt;/a&gt;, the hidden cost of AI-generated code. The core idea: AI generates code far faster than humans can evaluate it, and that gap quietly hollows out the team's understanding of their own codebase.&lt;/p&gt;

&lt;p&gt;It resonated with me, but what struck me most is a specific asymmetry in how the industry is responding. Most guidance around working with agents optimises for agent comprehension: context files, MCP servers, documented skills, feeding in the right information so the agent can reason about your codebase. There's far less conversation about the equally important problem: making sure &lt;em&gt;humans&lt;/em&gt; still understand the system the agent is changing.&lt;/p&gt;

&lt;p&gt;We're optimising for agent comprehension while human comprehension quietly erodes. That gap is what's made me think carefully about how I've been working, and what actually needs to be in place before you can move fast without losing the understanding that keeps a codebase healthy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing reviews were actually doing
&lt;/h2&gt;

&lt;p&gt;Reviews aren't just quality assurance. They're how understanding spreads across a team. When someone reads your code carefully enough to approve it, they're building a mental model of what changed and why. That's the mechanism by which a team stays collectively oriented to its own codebase.&lt;/p&gt;

&lt;p&gt;Agents put this mechanism under pressure, not by making code worse, but by generating it faster than the review process was designed to handle. Sometimes moving fast and trusting the agent is the right call, especially in well-covered, well-understood parts of the codebase. But when it goes wrong the consequences compound. Each poorly-understood change makes the next review less meaningful as you're reasoning about new code against a mental model that's already drifting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I've learned from trying
&lt;/h2&gt;

&lt;p&gt;My initial instinct when I ran into this was process. Break large agent changesets into smaller sequenced MRs, each telling a coherent part of the story, each individually deployable, like a slow-motion replay after a fast-forward session. There's something to it. A large MR where I reorganised commits to be reviewed one by one got merged without friction. Making changes legible and telling a coherent story is always the right instinct.&lt;/p&gt;

&lt;p&gt;But I also have five stacked MRs on a legacy codebase sitting in draft. I understand what the changes do, but I don't trust the existing test coverage to catch the side effects and functional behaviour that could break. Without that confidence there's an implicit expectation of manual verification underneath the whole thing, and that's asking a reviewer to carry the risk you haven't dealt with.&lt;/p&gt;

&lt;p&gt;Process can make changes more legible. It can't substitute for a safety net that isn't there. &lt;/p&gt;

&lt;h2&gt;
  
  
  What comprehension actually needs to look like now
&lt;/h2&gt;

&lt;p&gt;It's not line-by-line review; that's not feasible anymore, and pretending otherwise just means some reviews are theatre. But it's not nothing either. I think it works at three levels.&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;behavioural&lt;/strong&gt;: does it work as expected? This is where test coverage becomes the most important investment a team can make. Real coverage of real behaviour, across the paths users actually take, alongside type safety that catches type errors at compile time. If the compiler and test suite are doing their job, reviewers don't need to trace every line. The places where coverage is thin, or where teams have been relying on manual testing, are exactly the places where agent velocity stops being speed and starts being negligence.&lt;/p&gt;

&lt;p&gt;The second is &lt;strong&gt;architectural&lt;/strong&gt;: do we broadly understand how the changes work, and can we update our mental model of the system? This is something agents can help with directly. Ask the agent to summarise the meaningful decisions in a changeset, not the mechanical changes but the choices a human needs to evaluate: what alternatives were considered, where the non-obvious decisions are, what the author would flag in a code walkthrough. Use that as the basis for your MR description. I've packaged this into an &lt;a href="https://github.com/awocallaghan/prepare-mr-skill" rel="noopener noreferrer"&gt;agent skill&lt;/a&gt; you can drop into your own workflow; it produces a structured MR description and a commit structure recommendation you can review, helping make agent-generated changesets more legible to reviewers.&lt;/p&gt;

&lt;p&gt;The third is &lt;strong&gt;standards&lt;/strong&gt;: does the code meet the conventions the team has agreed on? Linting handles a lot of this automatically, and anything you can push into a linter is one less thing a human reviewer needs to spend attention on. For the things linting can't catch, I've written before about &lt;a href="https://alexocallaghan.com/ai-agents-platform-team" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt;. If your standards are documented well enough to guide the agent writing the code, they're documented well enough to guide an agent reviewing it too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Show your working
&lt;/h2&gt;

&lt;p&gt;Good authorship has always mattered. It matters more now. The reviewer wasn't in your agent session and they have no ambient understanding of what you were trying to do, what tradeoffs you considered, what decisions the agent made that you consciously kept. That context doesn't transfer through the diff, you have to transfer it deliberately.&lt;/p&gt;

&lt;p&gt;That means flagging the architectural decisions that actually need human eyes, not just describing what changed but why. It means thinking carefully about commit structure so the story of the change is legible before someone even reads the code. It means writing a description that demonstrates you understood what the agent produced, because if you can't explain it clearly there's a risk you've switched to passive delegation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.anthropic.com/research/AI-assistance-coding-skills" rel="noopener noreferrer"&gt;Anthropic study&lt;/a&gt; Addy cites found that engineers who used AI for passive delegation, just letting it produce code without staying actively engaged, scored significantly lower on comprehension tests than those who used it as a thinking tool. The agent doesn't replace the engineer. It's a tool, and you still need to understand what it's doing and why, not just that it works. That understanding is what your reviewer deserves: guide them toward it rather than leaving them to reconstruct it from scratch.&lt;/p&gt;

&lt;p&gt;Not every change carries the same risk or requires the same depth of review, and being explicit about that is part of good authorship too. &lt;a href="https://martinfowler.com/articles/ship-show-ask.html" rel="noopener noreferrer"&gt;Ship / Show / Ask&lt;/a&gt; is a useful frame for this, calibrating the level of review based on the nature of the change and the trust already established with your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What fast actually requires
&lt;/h2&gt;

&lt;p&gt;The five MRs sitting in draft aren't blocked by process or by my understanding of the code. They're blocked because the safety net isn't there. That's the first obligation, fix it before you ship, not after.&lt;/p&gt;

&lt;p&gt;But a solid test suite without the authorship work just means your reviewer can confirm nothing broke. That's not the same as understanding what changed, or why, or what the agent decided that you consciously kept. The agent gives you velocity. What makes that velocity real is being able to explain what you built and why, not just that it works.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
