Forem: Lukasz Ostrowski

Cleaning up large frontend codebase

Lukasz Ostrowski — Wed, 15 Oct 2025 14:21:30 +0000

Recently I started working on new extensions functionality in Saleor Dashboard. The repo is quite large (450k LOC), so I decided to do some cleanup before introducing the feature.

So the plan was:

Make a feature...
... but first, refactor this and that...
... but first, remove some dead code.

TL;DR: This post is about removing approximately 30k LOC (~6.6% of the codebase) and reducing the bundle size (before gzip) by 350 KB.

Diagnosis

First of all, I diagnosed where the main gains were. I found three main areas:

Invalid module refactor
Stale feature flags
Unused GraphQL queries

I wanted to focus mainly on these, because I already had some understanding of what was going on. I also picked them specifically to reduce code complexity before my change.

Invalid module refactor

Some time ago, Saleor consolidated its extensions model: we merged Apps, Plugins and Webhooks into a single domain, extensions.

The refactor was done behind a feature flag, but it wasn't "branching" on specific differences (like showing route A or B). Instead, the entire module was copy-pasted 1:1 into a new directory. For a few months we maintained both copies, and they slowly drifted apart.

The goal was to drop the old code without breaking anything.

Stale feature flags

We also had several feature flags, all of them quite stale. All of them were enabled, but the code branches for the disabled state were still in the source. The goal here was to drop the flags and remove the dead code.

Unused queries

When I removed the flags, I realised that many unused GraphQL queries still existed in the codebase (and were still processed by codegen into types and executable operations). The goal here was to remove as many of them as possible.

Techniques

I won't write a deep dive into every move I made, but rather share some learnings from the process.

Static analysis

During the cleanup I used our existing static analysis tools and improved their configs. It's probably the best way to diagnose dead code and keep it under control.

ESLint - detects unused declarations; plugins such as the GraphQL plugin can help detect issues with queries, etc.
Knip - scans the code for unused exports. It doesn't always work, but it can surface non-obvious dependencies. For example, when code is imported only by a test, it is technically not "unused", but Knip can detect that.
Dependency Cruiser - enforces architecture decisions, such as forbidding circular dependencies.

I ran these tools on every PR to validate my changes and find new issues.
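For the test-only case, Knip offers a production mode. Below is a minimal sketch of a knip.ts config. It assumes a single src/index.tsx entry, and both globs are hypothetical rather than Saleor's real setup. Files whose pattern ends with ! count as production code, so running knip --production ignores tests and reports exports that only tests still import.

```typescript
// knip.ts sketch; the entry and project globs are hypothetical.
import type { KnipConfig } from "knip";

const config: KnipConfig = {
  // A trailing "!" marks production files. `knip --production` then
  // ignores tests, so exports imported only by tests get reported.
  entry: ["src/index.tsx!"],
  project: ["src/**/*.{ts,tsx}!"],
};

export default config;
```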

Tests

Obviously, before the refactor we need good test coverage to verify that our changes don't break anything.

With AI this is easier than ever. Code that is hard to test is often left untested. Here, Claude wrote the tests for me, and I only reviewed whether the assertions were valid.

Dropping `default export` and barrel files

Both export default and barrel files (index.js/ts) are known anti-patterns, and their presence makes static analysis a nightmare. I did several passes to remove them before moving on to other changes.
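A first pass at removing default exports can be a naive text transform, sketched below. It assumes that every default-exported function is imported under the same name it is declared with. Regexes miss many cases (classes, anonymous defaults, re-exports), so an AST-based tool such as jscodeshift or ts-morph is the safer choice for the real run.

```typescript
// Naive codemod sketch: regexes, not an AST, so treat the output as a draft.
// Assumption: each default-exported function is imported under its own name.

function namedizeDefaultExport(source: string): string {
  // export default function Button() {}  ->  export function Button() {}
  return source.replace(/export\s+default\s+function\s+(\w+)/g, "export function $1");
}

function namedizeDefaultImports(source: string, exportedNames: Set<string>): string {
  // import Button from "./Button"  ->  import { Button } from "./Button"
  return source.replace(
    /import\s+(\w+)\s+from\s+(["'][^"']+["'])/g,
    (match: string, local: string, path: string) =>
      exportedNames.has(local) ? `import { ${local} } from ${path}` : match,
  );
}

const component = namedizeDefaultExport("export default function Button() {}");
const page = namedizeDefaultImports(
  'import Button from "./Button";\nimport React from "react";',
  new Set(["Button"]),
);
// component: export function Button() {}
// page: the Button import becomes named; the React import is left alone
```

Splitting the export rewrite from the import rewrite mirrors the small-PR approach described below: each pass can be reviewed on its own.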

Small PRs

It's tempting to pile more and more changes into the same PR, but it never works:

A large PR is hard (or impossible) for a human to review properly
LLM context windows are too small to review it as well
It's not fair to ask someone to review such code

Instead, I tried (not always successfully, but it's something) to introduce small, cohesive changes:

PR 1: Remove export default statements and update imports → the reviewer has only one "context" to verify, and usually if it builds, it works
PR 2: Remove barrel files and update imports → ditto
PR 3: Remove a bunch of files, nothing else

etc.

I broke this rule once, and a bug was found during review. I didn't have time to fix it right away, and the change was quite large. By the time I got back to it, I had so many conflicts that I had to start from scratch.

AI codemods instead of direct changes

While using AI, I realized that in a codebase this large it often fails to keep the context. I can use it to write a test suite, but I can't ask it to refactor the whole codebase.

Instead, I started using AI to write codemods and other scripts.

For example, I couldn't find a tool that detects unused GraphQL queries (maybe because we colocate them in .ts files?). Dead code is hard to spot here, because codegen transforms the queries into documents, hooks, etc. So the only way to find it is to:

Remove a document, rerun codegen, build the app/types and check whether it fails
Automate this somehow

To automate it, I gave Claude the rules for how codegen generates code from queries (e.g. query ProductsPage generates a useProductsPage hook). It took about 30 minutes, but it vibe-coded a script that pretty much worked. The script itself can be throwaway code I won't maintain: I use it to find what to remove, and delete it afterwards.
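A throwaway script of that kind might look roughly like the sketch below. It is not the script Claude wrote. It assumes that operations live in gql literals in hand-written files. Pass in only those files, not the codegen output, which defines every hook. The naming rule from the example above (query ProductsPage → useProductsPage) is a swappable parameter, because many codegen setups also append the operation type.

```typescript
// Throwaway sketch: list GraphQL operations whose generated hook is never used.
// Assumptions: operations live in gql`...` literals in hand-written files, and the
// hook naming rule matches the example (query ProductsPage -> useProductsPage).

type SourceFile = { path: string; content: string };
type HookNameRule = (operationName: string) => string;

const defaultHookName: HookNameRule = (name) => `use${name}`;

function findOperationNames(content: string): string[] {
  // Fragments are skipped on purpose: they are spread into other operations.
  const pattern = /\b(?:query|mutation|subscription)\s+([A-Za-z_]\w*)/g;
  return Array.from(content.matchAll(pattern), (match) => match[1]);
}

function findUnusedOperations(
  files: SourceFile[],
  toHookName: HookNameRule = defaultHookName,
): string[] {
  const unused: string[] = [];
  for (const file of files) {
    for (const name of findOperationNames(file.content)) {
      const hookUsage = new RegExp(`\\b${toHookName(name)}\\b`);
      // The hook counts as used if any *other* file mentions it.
      const isUsed = files.some(
        (other) => other.path !== file.path && hookUsage.test(other.content),
      );
      if (!isUsed) unused.push(name);
    }
  }
  return unused;
}
```

The result is only a candidate list. The remove, rerun codegen, build loop is still what confirms that an operation is really dead.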

Keep flags clean

Our flags are managed in a custom way, so we don't have any fancy tools to track their staleness. But we should: either use a service or introduce a process that lets us drop them periodically. Flags usually add major complexity and should be short-lived.
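One possible process is sketched below under loud assumptions. Flags live in a typed registry, and every flag declares an owner and a removal date. The FeatureFlag shape, its fields and the flag names are all hypothetical. A check like this could run in CI and fail once a date has passed.

```typescript
// Hypothetical flag registry: every flag declares an owner and a removal date.
type FeatureFlag = {
  name: string;
  owner: string;
  removeBy: string; // ISO date, e.g. "2025-12-31"
};

function findExpiredFlags(flags: FeatureFlag[], today: Date): FeatureFlag[] {
  return flags.filter((flag) => new Date(flag.removeBy).getTime() < today.getTime());
}

// Hypothetical entries, for illustration only.
const flags: FeatureFlag[] = [
  { name: "old_experiment", owner: "dashboard-team", removeBy: "2025-09-30" },
  { name: "new_experiment", owner: "dashboard-team", removeBy: "2026-03-31" },
];

// In CI, a test could assert that this list is empty and fail the build otherwise.
const expired = findExpiredFlags(flags, new Date("2025-10-15"));
```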

Upgrading React 17 to 18 in a large codebase using AI

Lukasz Ostrowski — Mon, 11 Aug 2025 06:24:11 +0000

Recently, at Saleor, we invested some time into CMD+K (command bar) enhancements. While working on it, I realised our custom solution had become too limited and hard to maintain, so it was time to switch to an existing library.

Our Dashboard - a large and mature codebase - is still running on React 17, which turned out to be a blocker because the library requires v18. You might wonder why we’re still on a five-year-old version. The answer is simple: if we don’t need a change, we prioritise other work. Until now, there was no real need to upgrade.

So, the question was: is the new CMD+K enough of a reason to invest the time in upgrading?

Scope of Changes

Fortunately, the changes from 17 to 18 are quite simple:

Change how ReactDOM creates a root node (manual change, usually once per project)
Upgrade types: get rid of React.FC and fix types in general.

I couldn’t find a codemod that safely migrated React.FC to the shape:

const Component = (props: Props) => JSX

so I started doing this manually.

Trying an Agent

Fixing these manually would be a monkey job, so it was time to get AI to help.

Spoiler: it worked, but it took a few rounds of incrementally refined prompts:

Precise Prompt

Explaining to the model what to do worked quite well, but the model was too literal.

For example, some components used destructuring ({a, b, c}: Props), others used props: Props. This mattered - in some places we passed props to legacy Material UI styling via useStyles(props), so destructuring broke the flow.

I had to do a few rounds to explain exactly what to do.

Focus on Type-Checked Files

Instead of scanning all the files, I told the agent to run tsc first and aggregate files with errors. This kept the AI focused on relevant files.
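A sketch of that pre-filtering step follows. It assumes tsc is run with --noEmit --pretty false, so each diagnostic starts with path(line,col): error TSxxxx: message. The helper name is hypothetical. Grouping by file gives a natural work list, and its keys can then be split into batches.

```typescript
// Assumes output from `tsc --noEmit --pretty false`, one diagnostic per line:
//   path(line,col): error TSxxxx: message
function groupTscErrorsByFile(output: string): Map<string, string[]> {
  const byFile = new Map<string, string[]>();
  const diagnostic = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

  for (const rawLine of output.split("\n")) {
    const match = diagnostic.exec(rawLine.trim());
    if (!match) continue; // summary lines and message continuations are skipped
    const [, file, line, column, code, message] = match;
    const errors = byFile.get(file) ?? [];
    errors.push(`${line}:${column} ${code} ${message}`);
    byFile.set(file, errors);
  }
  return byFile;
}
```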

Working in Batches

The model's context was clearly not enough to scan the entire codebase, so I explicitly asked it to work in batches, clearing the context in between.

Saving State

At the end of each prompt, I asked it to dump the "enhanced" prompt into a .md file, including:

The original prompt
Its refined version
A list of files already fixed
Any performance optimisations it used

I committed these files to git to make it easy to pause, resume later, or hand off to another AI. The results were surprisingly good.

Picking up by another agent

To avoid hitting usage limits, I switched between Claude Code, Atlassian's Rovo, and JetBrains Junie. Each could pick up exactly where the other left off using the saved prompt.

Live Editing vs Codegen

While researching this topic, I noticed an interesting approach, heavily used by Junie. Instead of fixing each file directly, it created codemods, following this flow:

Write a transformation script
Test it on one file
Iterate
Apply it to the whole codebase
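Applied to the React.FC migration, a first iteration of such a transformation script might look like the sketch below. It is a naive regex sketch, not the agent's actual output or a published codemod. It assumes single-line declarations and keeps each parameter list as written, so props stays props and destructuring stays destructuring (which mattered for useStyles(props)).

```typescript
// Naive first iteration of a React.FC codemod (regex-based, single-line only).
// Handles: const Name: React.FC<Props> = (params) =>  and  const Name: FC<Props> = ...
const fcDeclaration =
  /const\s+(\w+)\s*:\s*(?:React\.)?FC<(\w+)>\s*=\s*\(([^)]*)\)\s*=>/g;

function migrateReactFC(source: string): string {
  return source.replace(
    fcDeclaration,
    (_match: string, name: string, propsType: string, params: string) => {
      const trimmed = params.trim();
      // No parameters: the props type isn't needed at all.
      if (trimmed === "") return `const ${name} = () =>`;
      // Keep the parameter as written: `props` stays `props`, destructuring stays.
      return `const ${name} = (${trimmed}: ${propsType}) =>`;
    },
  );
}

const migrated = migrateReactFC(
  [
    "const Card: React.FC<CardProps> = (props) => {",
    "const Button: FC<ButtonProps> = ({ label, onClick }) => (",
    "const Empty: React.FC<EmptyProps> = () => null;",
  ].join("\n"),
);
```

Anything the pattern misses (multi-line parameters, generic props, default values containing parentheses) stays untouched and shows up in the next tsc run. The sketch also ignores that the React 18 types no longer add children to FC implicitly, so components relying on it still need children added to their props type.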

In future upgrades, I might skip manual batching and go straight to having AI generate a codemod.

TypeScript type inference - the dark side

Lukasz Ostrowski — Thu, 05 Jun 2025 06:47:17 +0000

Type inference lets TypeScript figure out a type without it being explicitly declared.

let mutableVersion = 1 // Type is number
const immutableVersion = 1 // Type is 1


TypeScript is smart enough not only to figure out that 1 is a number (duh), but also to understand that a const holding a primitive can't be reassigned (it holds a value, not a reference). Hence it will stay 1 forever, unlike let, which can change.

TypeScript tries to do its best, and usually it works quite well. Together with the satisfies operator and as const assertions, we can write type-safe code while barely declaring anything.
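Here is a small illustration of the two working together. It requires TypeScript 4.9 or later, and the Route type and routes object are made up for the example. satisfies validates the object against a shape without widening it, and as const keeps the literal values.

```typescript
// Requires TypeScript 4.9+ (`satisfies`). Route and routes are made-up examples.
type Route = { path: string; title: string };

const routes = {
  home: { path: "/", title: "Home" },
  products: { path: "/products", title: "Products" },
} as const satisfies Record<string, Route>;

// `satisfies` checked every entry against Route, yet nothing was widened:
// the type is still the literal "/products", so this annotation compiles.
const productsPath: "/products" = routes.products.path;

// Keys are preserved too: `routes.orders` would be a compile-time error.
```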

However, there are code-architecture downsides to relying too heavily on type inference, which can eventually make maintenance difficult.

Code first or design first

In my experience, most developers tend to write code first and let the design emerge as the outcome. "Something" eventually works, a few tests are added on top, and we are done.

This approach may be more satisfying, but it is more "artistic" than "engineering". The implementation of the logic itself is not too important, as long as:

It works, as proven by tests
It is performant enough to meet our metrics
It is encapsulated, so it doesn't leak where it doesn't belong

If all three are met, we can always easily replace the implementation without affecting the rest of the program.

When it comes to design, encapsulation is critical. It draws the boundaries of abstractions (functions, classes, modules) and allows us to design how they coexist and communicate.

At this point, you may be thinking: what does this have to do with inference?

Interface vs inference

By letting the language infer the type, we accept a "code first" approach.

const wrapCollection = (collection: Array<{id: string; name: string}>) => collection.reduce((acc, next) => {
  acc[next.id] = next.name
  return acc
}, {} as Record<string, string>)

What does it return? Apart from the cognitive load (you force someone to read and understand the implementation to know what is returned), you just let TypeScript figure it out. And this is a simple function. You have surely seen multi-layered map/filter/reduce monsters, ideally placed inside a React component to make testing even harder 🥲

You can also be a good colleague and do this instead:

type CollectionItem = {id: string; name: string}

type WrapCollection = (collection: Array<CollectionItem>) => Record<string, string>

const wrapCollection: WrapCollection = ...

What has changed? You started by declaring the data shape and data flow (in and out), and only then implemented it. Even empty, not-yet-implemented functions can be imported and tested at an early stage, when you can still validate the API.

Relying on inference is like writing only half of the interface.

Impact on the maintenance

Relying on inference not only makes it harder to design good code, but also makes the code harder to maintain.

In a few months, our wrapCollection may be used by many engineers in many places. They will rely on the inferred type... Then you need to refactor.

Say you want to switch from reduce to a for loop, because you operate on a large amount of data and need to improve performance.

You change the inner implementation, but there is no TypeScript boundary preventing you from changing the outer shape as well. Yes, hopefully the rest of the code will turn "red" and your tests will fail. But many codebases have incomplete TypeScript coverage and missing tests.

But whenever your function declares its outer shape (it doesn't have to be an interface; a static declaration of the return type is enough), you are locally protected from breaking that contract.
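Back to the refactoring scenario: with the WrapCollection type declared, the for-loop rewrite is checked against the same contract. Suppose it accidentally returned a Map instead of a plain object, a plausible slip in a performance refactor. The error would then appear on this declaration rather than at every call site.

```typescript
type CollectionItem = { id: string; name: string };
type WrapCollection = (collection: Array<CollectionItem>) => Record<string, string>;

// New implementation, same declared contract. Returning e.g. a Map here would be
// a compile error on this declaration instead of a surprise at every call site.
const wrapCollection: WrapCollection = (collection) => {
  const result: Record<string, string> = {};
  for (const item of collection) {
    result[item.id] = item.name;
  }
  return result;
};

const wrapped = wrapCollection([
  { id: "1", name: "Shirt" },
  { id: "2", name: "Shoes" },
]);
// wrapped: { "1": "Shirt", "2": "Shoes" }
```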

Summary

I'm not declaring types literally everywhere, but I believe strong type coverage makes code more maintainable.

The more complex the function, the higher the ROI. Simple functions, especially private ones, matter less if we know there is only one caller. But public functions, widely used across the codebase, benefit from being typed.

You can also use an ESLint rule to require explicit function return types.
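For example, the config below assumes the typescript-eslint package, and its file glob is hypothetical. explicit-module-boundary-types targets exactly the exported, widely used functions. explicit-function-return-type is the stricter variant that applies everywhere.

```typescript
// eslint.config.ts sketch (an equivalent eslint.config.js works the same way).
// Assumes the typescript-eslint package; the file glob is hypothetical.
import tseslint from "typescript-eslint";

export default [
  ...tseslint.configs.recommended,
  {
    files: ["src/**/*.{ts,tsx}"],
    rules: {
      // Exported functions must declare parameter and return types.
      "@typescript-eslint/explicit-module-boundary-types": "error",
      // Stricter alternative: require return types on every function.
      // "@typescript-eslint/explicit-function-return-type": "error",
    },
  },
];
```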

Error modelling

Lukasz Ostrowski — Thu, 29 May 2025 06:52:24 +0000

Intro

Let’s have a conceptual look at error modeling. I will use the Node.js ecosystem and TypeScript in these examples (and some pseudo-code). I find built-in error management in the JS ecosystem rather poor compared to some other languages, which makes it even more important to treat this topic seriously in this tech stack.

I focus on error modeling here, not on the data flow. In another article, I will write more about managing errors the Rust way (Rust has a built-in distinction between recoverable and non-recoverable errors).

Role of errors

Let’s think about all of these:

SyntaxError: JSON.parse: unexpected character
TypeError: Cannot read property 'value' of null
ValidationError: Email already exists
Internal Server Error

Each of them is an error with a different origin and reason.

The first, a SyntaxError from JSON.parse, happens when we try to parse a string that is not valid JSON. It can occur when we parse an external API response body without checking the response type, e.g. when the body is an HTML error page.

The second, a TypeError, can be a purely static code issue: we try to access a property of a nullish value, for example an object property before the object has been created.

What these two have in common is that they are mainly useful for the developer. Ideally they are caught during compilation or static analysis; where that's not possible, we protect ourselves by writing proper tests. Once we reach runtime, we must make sure we can recognize them when the application crashes, in logs or in error-tracking platforms like Sentry. These errors are also often non-recoverable: the app probably can’t find another way to work if the code can’t execute anymore.

With ValidationError we change the abstraction level. First of all, it’s not the language that throws such errors, but either our database or our internal data layer, e.g. when it tries to insert a user into the database. Validation is a graceful way to recover from the issue, giving clear feedback without crashing the app. Such an error is also different from the previous two: it’s usually not interesting to track it in the error tracker (it’s not something we can fix), and it should be returned in the response, so the user/frontend can handle it.

Internal Server Error, on the other hand, is an error on the HTTP layer. It’s represented by a semantic status code (500), which also tells us that something crashed on the server side. This is something we definitely should catch and fix. We need as many details as possible in our internal tracking systems, but we should not expose any of them to the frontend, to avoid leaking implementation details.

You can see now that not all errors are equal: they differ depending on the context. That is why we need to model them carefully.

Abstraction is the problem

Conceptually, errors propagate up through the call stack:

FunctionA()
  FunctionB()
    FunctionC()
        throw new Error()

The stack trace follows the function execution. Catching works the opposite way, from the innermost block outwards:

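try {
  // FunctionA()
  try {
    // FunctionB()
    try {
      // FunctionC()
      throw new Error("Thrown in FunctionC")
    } catch (error) {
      throw error // the innermost catch sees it first; rethrow to pass it up
    }
  } catch (error) {
    throw error // ...then the next level up
  }
} catch (error) {
  // ...until some level handles it instead of rethrowing
}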

The inner error travels up through the execution until some catch block intercepts it (or the program terminates with an unhandled exception).

Now think about it at scale: a program runs hundreds of functions to process a request, so an error travels through several abstraction layers.

Errors that arise in controllers are often caused by validation logic, which partially represents business rules (minimum password length) and partially the expected data format (JSON), etc.

Errors that are caught in the model are likely related strictly to the domain. For example, we can’t add a product that no longer exists to a cart.

Sometimes errors happen in external services: our database may be down, or an external API may not respond.

And anywhere else, we can face dozens of programming errors caused by incorrect implementation at the language level.

Depending on the abstraction level, we need different handling.

Errors chaining

In Python, there is the raise ErrorA from ErrorB construct. Its purpose is to indicate that one error was caused by another, which is especially useful when we transform errors.

In the JS stack, we can use the cause property of Error for that, passed as an option: new Error(message, { cause }).

// not a real Stripe API, illustration only
try {
  stripe.pay()
} catch (stripeError) {
  if (stripeError instanceof Stripe.InvalidCardDetails) {
    throw new PaymentFailedError("Payment failed due to invalid payment details", {
      cause: stripeError,
    })
  }

  throw stripeError // don't swallow errors we didn't expect
}

This is a powerful pattern.

First, we intercept errors from the 3rd-party API locally. This gives us control: at this point, we can log, track, record metrics, emit events, or anything else.
Second, we can match the error type. External APIs usually provide a unified error layer (typically HTTP-serialized errors with a list of enum codes); some of these errors are recoverable, some are not.

An error like “invalid card details” is expected and recoverable: the user must try again.

An error like “invalid Stripe secret key” is not actionable for the customer, but it definitely should reach the payment operator to fix it. Otherwise, the business-critical path may be down.

Third, we still chain the reasons. This lets us track not only the stack trace (representing function calls) but also the human-readable messages we wrote in our code.
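To make that third point concrete, a small helper can walk the cause chain and collect every message, e.g. to attach them to a log entry. The helper name is hypothetical, not part of any library.

```typescript
// Hypothetical helper: collect the human-readable messages along a cause chain.
function collectErrorMessages(error: unknown): string[] {
  const messages: string[] = [];
  const seen = new Set<unknown>(); // guards against accidental cycles
  let current: unknown = error;

  while (current != null && !seen.has(current)) {
    seen.add(current);
    if (!(current instanceof Error)) {
      messages.push(String(current)); // something other than an Error was thrown
      break;
    }
    messages.push(`${current.name}: ${current.message}`);
    current = (current as { cause?: unknown }).cause;
  }
  return messages;
}
```

For the payment example, this yields both "Payment failed due to invalid payment details" and the underlying Stripe message, instead of only the outermost one.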

If we move one level up from the payment example above, a simplified example shows how powerful error matching is:

// controller / app service / use case

...

try {
  // await, so a rejected promise is caught by this block
  await paymentProvider.pay(request.body)
} catch (error) {
  switch (true) {
    case error instanceof PaymentFailedError: {
      tracker.trackEvent("invalid_payment")

      // In real life respond with a JSON-like structure with a payment refusal reason
      return new Response("Invalid payment details", { status: 400 })
    }
    case error instanceof GatewayAuthError: {
      captureException(new Error("Payment Gateway auth rejected", { cause: error }), { level: "fatal" })

      return new Response("Error processing payment, please try a different payment method", {
        status: 500,
      })
    }
    default:
      // Unexpected errors are not handled here; let them propagate
      throw error
  }
}

In real life, it will be much broader, due to all the handled error cases. Conceptually, though, we achieve the same thing: a strong and clear distinction between what class of error we are dealing with, who should receive it, what data we provide, and how we monitor it.

Passing cause gives us additional context we can log (e.g. into Sentry), but we don’t have to return it to the storefront. Or maybe we want to, but only in test environments (we do that in our Stripe App in Saleor).

Error matching (implemented either by extending Error or with enum-like reasons) allows us to route error handling and react properly depending on the issue.
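The enum-like variant could be sketched like this in TypeScript: a single error class carrying a reason from a string union, matched with an exhaustive switch. The reason names and status codes are illustrative, mirroring the payment example.

```typescript
// Sketch: one error class with an enum-like `reason` instead of a subclass per case.
type PaymentErrorReason = "INVALID_CARD_DETAILS" | "GATEWAY_AUTH_FAILED";

class PaymentError extends Error {
  readonly reason: PaymentErrorReason;

  constructor(reason: PaymentErrorReason, message: string) {
    super(message);
    this.name = "PaymentError";
    this.reason = reason;
  }
}

function toHttpStatus(error: PaymentError): number {
  switch (error.reason) {
    case "INVALID_CARD_DETAILS":
      return 400; // recoverable: the customer can try again
    case "GATEWAY_AUTH_FAILED":
      return 500; // the operator must fix the gateway configuration
    default: {
      // Adding a new reason without handling it becomes a compile error here.
      const unhandled: never = error.reason;
      throw new Error(`Unhandled payment error reason: ${String(unhandled)}`);
    }
  }
}
```

Compared with subclasses, the union keeps every reason in one place and lets the compiler enforce that each one is handled.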

Summary

Try to think about what types of errors your application can produce
Model what data you need to attach to your errors (both internal and external systems)
Leverage the Error cause field to chain errors
Transform errors when they travel through the abstraction layers

In the upcoming article, I will write more about error implementation and error flow in the application.
