<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lukáš</title>
    <description>The latest articles on Forem by Lukáš (@bladerik).</description>
    <link>https://forem.com/bladerik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1325181%2F2485ec8e-c7c3-48a5-87e2-3103828de22c.jpeg</url>
      <title>Forem: Lukáš</title>
      <link>https://forem.com/bladerik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bladerik"/>
    <language>en</language>
    <item>
      <title>90% of SaaS is Doomed. Here is How Developers Can Survive the Agent Era.</title>
      <dc:creator>Lukáš</dc:creator>
      <pubDate>Fri, 27 Feb 2026 14:51:53 +0000</pubDate>
      <link>https://forem.com/bladerik/90-of-saas-is-doomed-here-is-how-developers-can-survive-the-agent-era-1c9i</link>
      <guid>https://forem.com/bladerik/90-of-saas-is-doomed-here-is-how-developers-can-survive-the-agent-era-1c9i</guid>
      <description>&lt;p&gt;Software is changing forever. In the near future, people will use AI agents to do almost everything. I believe &lt;strong&gt;90% of traditional SaaS apps are doomed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this post, I will show you why the User Interface is dying. I will also share what we as developers need to build today to survive the AI revolution. Let's get started.&lt;/p&gt;

&lt;h2&gt;The Death of UI&lt;/h2&gt;

&lt;p&gt;One of the main reasons why agents will take over is very simple. &lt;strong&gt;It comes down to convenience.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For years, we built user interfaces. We tried our best to make them user-friendly. But what is a UI really? It is just a bridge. It displays data and allows the user to do stuff.&lt;/p&gt;

&lt;p&gt;Learning dozens of different tools is exhausting. Speaking to an AI to do the work for you is much easier. Today, that AI is becoming an autonomous agent.&lt;/p&gt;

&lt;p&gt;In 36 months, everyone will have their own main AI agent. That agent will orchestrate a swarm of other specialized agents. People will no longer have to learn complicated dashboards. UI will become a thing of the past.&lt;/p&gt;

&lt;p&gt;It is not perfect yet. But we already have all the lego pieces required to replace massive amounts of software. Agents can learn skills. Agents can use memory. Agents can coordinate other tools.&lt;/p&gt;

&lt;h2&gt;Let's Look at a CRM Example&lt;/h2&gt;

&lt;p&gt;Data lives in a database. Let's assume your current CRM has an API. You want to stop paying 100 bucks a month for it. You ask your AI agent to migrate the CRM to something cheaper.&lt;/p&gt;

&lt;p&gt;The agent will analyze the transition. If it is simple, it will just migrate all the data to a new database. It will create thin wrappers as skills to access your data. It will create cron jobs to automate reports. It will build integrations via webhooks.&lt;/p&gt;

&lt;p&gt;You might be thinking about safety. What if the agent accidentally deletes the entire database? Who is responsible?&lt;/p&gt;

&lt;p&gt;Remember that AI agents will not perform raw SQL queries. Instead, they will use standard SDKs. They will build consistent and safe skills to perform predictable CRUD operations.&lt;/p&gt;
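&lt;p&gt;As a minimal sketch of what such a skill could look like (the names and the storage are purely illustrative; a real skill would wrap your database SDK), the key idea is that destructive bulk operations are simply absent from the surface the agent can call:&lt;/p&gt;

```javascript
// Hypothetical sketch: a CRUD "skill" with a whitelisted surface.
// A plain Map stands in for the real database client here.
function createContactSkill(store = new Map()) {
  let nextId = 1;
  return {
    create(data) {
      const id = String(nextId);
      nextId += 1;
      store.set(id, { id, ...data });
      return id;
    },
    get(id) {
      return store.get(id) ?? null;
    },
    update(id, patch) {
      const row = store.get(id);
      if (row) store.set(id, { ...row, ...patch });
    },
    remove(id) {
      store.delete(id);
    },
    list() {
      return [...store.values()];
    },
    // Deliberately no dropAll() and no raw-query escape hatch:
    // the agent cannot call what is not exposed.
  };
}
```

&lt;p&gt;The agent gets predictable CRUD operations and nothing else, which is the whole point.&lt;/p&gt;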

&lt;p&gt;But what if simple wrappers are not enough? What if you need complex roles, permissions, and advanced automations?&lt;/p&gt;

&lt;p&gt;In that case, the agent will simply spin up an instance of an open-source solution. Very soon, all popular open-source repos will have agent skills built directly into their codebase. Your agent will install a tool like Directus on your favorite hosting provider. It will set it up automatically. You will never even log into the admin panel.&lt;/p&gt;

&lt;h2&gt;The 6 Ways to Survive&lt;/h2&gt;

&lt;p&gt;If UI is dying, how do developers stay relevant? I see 6 safe zones for the upcoming years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Brand and Trust&lt;/strong&gt; &lt;br&gt;
AI will not replace personal connection. If you build a trustworthy brand, you can do masterminds, coaching, or live seminars. Human connection is safe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fun and Leisure&lt;/strong&gt;&lt;br&gt;
People will have more free time. Cinema, sports, and immersive games will become super competitive. If you create something unique and engaging in this area, you are safe long-term.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Heavy Compute and Hosting&lt;/strong&gt;&lt;br&gt;
AI agents will have to store data reliably somewhere. Hosting providers must be agent-friendly. They need to provide automatic backups, disaster recovery, and strong security. GPU clusters for rendering complex scenes are also safe. Agents cannot run heavy compute locally on a laptop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Bits to Atoms&lt;/strong&gt; &lt;br&gt;
Anything where the end product is physical will be safe. Custom t-shirt printing, 3D printing, and logistics are great examples. AI lives in the digital world. It needs you to touch the physical world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The Legal Wall&lt;/strong&gt;&lt;br&gt;
Payment processors like Stripe are safe. Anything that has massive compliance and legal requirements is a great moat. An AI agent cannot go to jail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything else is basically doomed.&lt;/strong&gt; Marketing automation, project management, and basic CRMs will take massive hits unless you build agent infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Agent Infrastructure&lt;/strong&gt; &lt;br&gt;
This is the most important point for us developers. If you are building a SaaS, you must &lt;strong&gt;focus on the Agent Interface&lt;/strong&gt; instead of the User Interface.&lt;/p&gt;

&lt;p&gt;Create skills for agents so they can easily connect to your service. &lt;em&gt;No more user team slots or weird tiered memberships.&lt;/em&gt; Pay-as-you-go will become the new norm. &lt;/p&gt;

&lt;p&gt;Your API must be highly competitive. If it is too expensive, the AI will just replicate your core functionality using open-source code.&lt;/p&gt;

&lt;h2&gt;A Real World Example&lt;/h2&gt;

&lt;p&gt;Let me give you a quick example of Agent Infrastructure.&lt;/p&gt;

&lt;p&gt;Today, you might use an AI model like Nano Banana Pro to generate an amazing image. You are super happy with it. But later you want to move the text or change the font. You are screwed because the font is hardcoded into the pixels.&lt;/p&gt;

&lt;p&gt;But imagine if a platform like Canva exposed a headless Agent Skill. Your agent could instruct Canva to compose the image in layers behind the scenes. Then, your agent could simply relocate the text and change the font. You would get the perfect result without ever visiting the Canva website.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This already exists for videos.&lt;/strong&gt; Remotion released an AI skill that you can install right now. You can use it in Claude Code or Cursor. It composes a video for you. If you want changes, you just prompt it again.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Think of the future like a big company structure. You are the CEO. You have one main AI agent acting as your general manager. That manager controls specialized sub-agents. One takes care of your database. Another does internet research. A third one creates and schedules content.&lt;/p&gt;

&lt;p&gt;Digital transformation is already happening. The puzzle pieces are here. Ignoring this shift is like trying to make your horse faster during the Industrial Revolution while everyone else is already building a car.&lt;/p&gt;

&lt;p&gt;What are your thoughts on how the future will look? Are you building a UI wrapper or actual infrastructure? Let me know in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
    </item>
    <item>
      <title>Building a TypeScript Video Editor as a Solo Dev</title>
      <dc:creator>Lukáš</dc:creator>
      <pubDate>Mon, 04 Nov 2024 23:23:07 +0000</pubDate>
      <link>https://forem.com/bladerik/building-a-typescript-video-editor-as-a-solo-dev-2oo8</link>
      <guid>https://forem.com/bladerik/building-a-typescript-video-editor-as-a-solo-dev-2oo8</guid>
      <description>&lt;p&gt;4 years after embarking on an exciting SaaS building journey, it's the right time to rebuild one of the key components of our app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple video editor for social media videos written in JavaScript.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the stack I decided to use for this rewrite, which is now a work in progress.&lt;/p&gt;

&lt;h2&gt;Svelte 5&lt;/h2&gt;

&lt;p&gt;Since our frontend is written in &lt;code&gt;SvelteKit&lt;/code&gt;, this is the best option for our use case.&lt;/p&gt;

&lt;p&gt;The video editor is a separate private npm library I can simply add to our frontend. It's a headless library, so the video editor UI is completely isolated.&lt;/p&gt;

&lt;p&gt;The video editor library is responsible for syncing the video and audio elements with the timeline, rendering animations and transitions, rendering HTML texts into canvas, and much more.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SceneBuilderFactory&lt;/code&gt; takes in a scene JSON object as an argument to create a scene. &lt;code&gt;StateManager.svelte.ts&lt;/code&gt; then keeps the current state of the video editor in real time.&lt;/p&gt;

&lt;p&gt;This is super useful for drawing and updating the playhead position in the timeline, and much more.&lt;/p&gt;
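&lt;p&gt;To make the scene JSON concrete, here is a purely illustrative shape for such an object (the real schema is private, so every field name below is an assumption):&lt;/p&gt;

```javascript
// Illustrative scene JSON; field names are assumptions, not the real schema.
// Each layer holds components of one type, matching the rendering model
// described later in the article.
const scene = {
  width: 1080,
  height: 1920,
  duration: 12,
  layers: [
    {
      id: 'layer-videos',
      components: [
        { id: 'video-1', type: 'VIDEO', startAt: 0, duration: 8, src: 'intro.mp4' },
      ],
    },
    {
      id: 'layer-texts',
      components: [
        { id: 'text-1', type: 'TEXT', startAt: 2, duration: 4, html: 'Hello!' },
      ],
    },
  ],
};
```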

&lt;h2&gt;Pixi.js&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Pixi.js&lt;/code&gt; is an outstanding JavaScript canvas library.&lt;/p&gt;

&lt;p&gt;Initially, I started to build this project with &lt;code&gt;Pixi&lt;/code&gt; v8, but for reasons I'll mention later in this article, I decided to go with &lt;code&gt;Pixi&lt;/code&gt; v7.&lt;/p&gt;

&lt;p&gt;However, the video editor library is not tightly coupled to any dependencies, so it's easy to replace them if needed or to test different tools.&lt;/p&gt;

&lt;h2&gt;GSAP&lt;/h2&gt;

&lt;p&gt;For timeline management and complex animations, I decided to use &lt;code&gt;GSAP&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There's no other tool in the JavaScript ecosystem I'm aware of that allows building nested timelines, combined animations, or complex text animations in such a simple way.&lt;/p&gt;

&lt;p&gt;I have a GSAP business license, so I can also leverage additional tools to simplify things further.&lt;/p&gt;

&lt;h2&gt;Key Challenges&lt;/h2&gt;

&lt;p&gt;Before we dive into the backend stack, let's look at some challenges you need to solve while building a video editor in JavaScript.&lt;/p&gt;

&lt;h3&gt;Synchronize video/audio with the timeline&lt;/h3&gt;

&lt;p&gt;This question is often asked in the GSAP forum.&lt;/p&gt;

&lt;p&gt;Whether you use GSAP for timeline management or not, you need to do a couple of things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On each render tick:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Get video relative time to the timeline. Let's say your video starts playing from the beginning at the 10-second mark of the timeline.&lt;/p&gt;

&lt;p&gt;Well, before 10 seconds you actually don't care about the video element, but as soon as it enters the timeline, you need to keep it in sync.&lt;/p&gt;

&lt;p&gt;You can do this by computing the video's relative time from the video element's &lt;code&gt;currentTime&lt;/code&gt;, comparing it against the current scene time, and allowing an acceptable "lag" window.&lt;/p&gt;

&lt;p&gt;If the lag is larger than, let's say, 0.3 seconds, you need to auto-seek the video element to fix its sync with the main timeline. The same applies to audio elements.&lt;/p&gt;
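&lt;p&gt;The drift check itself is tiny. Here's a sketch of the math (the 0.3-second tolerance matches the text above; the function and parameter names are illustrative):&lt;/p&gt;

```javascript
// Acceptable drift before we force a seek, in seconds.
const MAX_LAG = 0.3;

// Where the media element should be, given the timeline position and the
// second at which the clip enters the timeline. Returns null while the
// clip has not entered the timeline yet.
function expectedMediaTime(timelineTime, clipStartAt) {
  if (clipStartAt > timelineTime) return null;
  return timelineTime - clipStartAt;
}

// True when the element has drifted far enough that we should
// auto-seek it back in sync (applies to video and audio alike).
function needsSeek(elementCurrentTime, timelineTime, clipStartAt) {
  const expected = expectedMediaTime(timelineTime, clipStartAt);
  if (expected === null) return false;
  return Math.abs(elementCurrentTime - expected) > MAX_LAG;
}
```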

&lt;p&gt;&lt;strong&gt;Other things you need to consider:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;handle play / pause / ended states&lt;/li&gt;
&lt;li&gt;handle seeking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Play and pause are simple to implement. For seeking, I add the id of the seeking video component into our Svelte StateManager, which will automatically change the state to "loading".&lt;/p&gt;

&lt;p&gt;StateManager has an EventManager dependency, and on each state change it automatically triggers a "changestate" event, so we can listen for these events without using &lt;code&gt;$effect&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The same thing happens after seeking is finished and the video is ready to play.&lt;/p&gt;

&lt;p&gt;This way we can show a loading indicator instead of play / pause button in our UI when some of the components are loading.&lt;/p&gt;
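&lt;p&gt;A hand-rolled sketch of that StateManager/EventManager pairing (the class shapes and state names are assumptions; the real StateManager is a Svelte 5 &lt;code&gt;.svelte.ts&lt;/code&gt; module):&lt;/p&gt;

```javascript
// Minimal event emitter standing in for the EventManager dependency.
class EventManager {
  constructor() {
    this.listeners = {};
  }
  on(event, fn) {
    (this.listeners[event] ??= []).push(fn);
  }
  emit(event, data) {
    for (const fn of this.listeners[event] ?? []) fn(data);
  }
}

// Every state change fires "changestate", so UI code can subscribe
// instead of reaching for $effect.
class StateManager {
  constructor(events) {
    this.events = events;
    this.state = 'ready';
    this.seeking = new Set(); // ids of components currently seeking
  }
  setState(state) {
    this.state = state;
    this.events.emit('changestate', state);
  }
  startSeek(componentId) {
    this.seeking.add(componentId);
    this.setState('loading');
  }
  endSeek(componentId) {
    this.seeking.delete(componentId);
    if (this.seeking.size === 0) this.setState('ready');
  }
}
```

&lt;p&gt;While the seeking set is non-empty the UI shows the loading indicator; once it drains, the state flips back and the play/pause button returns.&lt;/p&gt;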

&lt;h3&gt;Text rendering is not as simple as you think&lt;/h3&gt;

&lt;p&gt;CSS, GSAP, and GSAP's TextSplitter allow me to do really amazing stuff with text elements.&lt;/p&gt;

&lt;p&gt;Native canvas text elements are limited, and since the primary use case of our app is to create short-form videos for social media, they are not a good fit.&lt;/p&gt;

&lt;p&gt;Luckily, I found a way to render almost any HTML text into canvas, which is crucial for rendering the video output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pixi HTMLText&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This would have been the simplest solution; unfortunately, it did not work for me.&lt;/p&gt;

&lt;p&gt;When I was animating HTML text with GSAP, it was lagging significantly, and it also did not support many Google fonts I tried with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Satori&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Satori is amazing, and I can imagine it being used in some simpler use cases. Unfortunately, some GSAP animations change styles that are not compatible with Satori, which results in an error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SVG with foreign object&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, I made a custom solution to solve this.&lt;/p&gt;

&lt;p&gt;The tricky part was supporting emojis and custom fonts, but I managed to solve that.&lt;/p&gt;

&lt;p&gt;I created an SVGGenerator class that has a generateSVG method, which produces an SVG like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;svg&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2000/svg"&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"${width}"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"${height}"&lt;/span&gt; &lt;span class="na"&gt;viewBox=&lt;/span&gt;&lt;span class="s"&gt;"0 0 ${width} ${height}"&lt;/span&gt; &lt;span class="na"&gt;version=&lt;/span&gt;&lt;span class="s"&gt;"1.1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;${styleTag}&lt;span class="nt"&gt;&amp;lt;foreignObject&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"100%"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"100%"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/1999/xhtml"&lt;/span&gt; &lt;span class="na"&gt;style=&lt;/span&gt;&lt;span class="s"&gt;"transform-origin: 0 0;"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;${html}&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&amp;lt;/foreignObject&amp;gt;&amp;lt;/svg&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The styleTag then looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;style&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;@font-face&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;font-family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;${&lt;/span&gt;&lt;span class="n"&gt;fontFamilyName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="nt"&gt;src&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nt"&gt;url&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;'${fontData}'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;style&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this to work, the HTML we pass in needs the correct font-family set in an inline style. The font data needs to be a base64-encoded data string, something like &lt;code&gt;data:font/ttf;base64,longboringstring&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Component Lifecycle&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Composition over inheritance, they say.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As an exercise to get my hands dirty, I refactored from an inheritance-based approach to a hook-based system.&lt;/p&gt;

&lt;p&gt;In my video editor, I call elements like &lt;code&gt;VIDEO&lt;/code&gt;, &lt;code&gt;AUDIO&lt;/code&gt;, &lt;code&gt;TEXT&lt;/code&gt;, &lt;code&gt;SUBTITLES&lt;/code&gt;, &lt;code&gt;IMAGE&lt;/code&gt;, &lt;code&gt;SHAPE&lt;/code&gt;, etc. &lt;code&gt;components&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before rewriting this, there was an abstract class &lt;code&gt;BaseComponent&lt;/code&gt;, and each component class was extending it, so &lt;code&gt;VideoComponent&lt;/code&gt; had logic for videos, etc.&lt;/p&gt;

&lt;p&gt;The problem was that it became a mess pretty quickly.&lt;/p&gt;

&lt;p&gt;Components were responsible for how they are rendered, how they manage their Pixi texture, how they are animated, and more.&lt;/p&gt;

&lt;p&gt;Now, there is only one component class, which is very simple.&lt;/p&gt;

&lt;p&gt;This now has four lifecycle events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;setup&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;update&lt;/span&gt;    &lt;span class="c1"&gt;// called on each render tick, video rewind, frame export...&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;refresh&lt;/span&gt;   &lt;span class="c1"&gt;// called when user changes component data in UI&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;destroy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5atagg6xhthrmnmfk5zd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5atagg6xhthrmnmfk5zd.png" alt="Component Lifecycle Code Sample" width="599" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This component class has a method called &lt;code&gt;addHook&lt;/code&gt; that changes its behavior.&lt;/p&gt;

&lt;p&gt;Hooks can hook into component lifecycle events and perform actions.&lt;/p&gt;

&lt;p&gt;For example, there is a &lt;code&gt;MediaHook&lt;/code&gt; that I use for video and audio components.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MediaHook&lt;/code&gt; creates the underlying audio or video element and automatically keeps it in sync with the main timeline.&lt;/p&gt;

&lt;p&gt;For building components, I used the builder pattern along with the director pattern (&lt;a href="https://refactoring.guru/design-patterns/builder" rel="noopener noreferrer"&gt;see reference&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This way, when building an audio component, I add &lt;code&gt;MediaHook&lt;/code&gt; to it, which I also add to video components. However, videos also need additional hooks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating the texture&lt;/li&gt;
&lt;li&gt;Setting up the sprite&lt;/li&gt;
&lt;li&gt;Setting the right location in the scene&lt;/li&gt;
&lt;li&gt;Handling rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach makes it very easy to change, extend, or modify the rendering logic or how the components behave in the scene.&lt;/p&gt;
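&lt;p&gt;Stripped to its core, the mechanism looks something like this (illustrative names, not the editor's actual API):&lt;/p&gt;

```javascript
// One generic component class; behavior comes entirely from hooks.
class Component {
  constructor() {
    // The four lifecycle events described above.
    this.hooks = { setup: [], update: [], refresh: [], destroy: [] };
  }
  // A hook is a plain object with callbacks for the events it cares about.
  addHook(hook) {
    for (const event of Object.keys(this.hooks)) {
      if (typeof hook[event] === 'function') this.hooks[event].push(hook[event]);
    }
    return this; // chainable, so a builder can compose hooks fluently
  }
  // The render loop calls e.g. run('update', currentTime).
  run(event, ...args) {
    for (const fn of this.hooks[event]) fn(...args);
  }
}
```

&lt;p&gt;A video component is then just a &lt;code&gt;Component&lt;/code&gt; with a media hook, a texture hook, a sprite hook, and so on added by the director.&lt;/p&gt;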

&lt;h2&gt;Backend and Rendering&lt;/h2&gt;

&lt;p&gt;I tried multiple approaches to render videos in the fastest and most cost-efficient way.&lt;/p&gt;

&lt;p&gt;In 2020, I started with the simplest approach - rendering one frame after another, which is something that many tools do.&lt;/p&gt;

&lt;p&gt;After some trial-and-error, I switched to a rendering layers approach.&lt;/p&gt;

&lt;p&gt;That means our &lt;code&gt;SceneData&lt;/code&gt; document contains layers which contain components.&lt;/p&gt;

&lt;p&gt;Each of these layers is rendered separately and then combined with &lt;code&gt;ffmpeg&lt;/code&gt; to create the final output.&lt;/p&gt;

&lt;p&gt;The limitation was that a layer can only contain components of the same type.&lt;/p&gt;

&lt;p&gt;For example, a layer with video cannot contain text elements; it can only contain other videos.&lt;/p&gt;

&lt;p&gt;This obviously has some pros and cons.&lt;/p&gt;

&lt;p&gt;It was quite simple to render HTML texts with animations on Lambda independently and turn them into transparent videos, which were then combined with other chunks for the final output.&lt;/p&gt;

&lt;p&gt;On the other hand, layers with video components were simply processed with &lt;code&gt;ffmpeg&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, this approach had a huge drawback.&lt;/p&gt;

&lt;p&gt;If I wanted to implement a keyframes system to scale, fade, or rotate the video, I would need to port these features to &lt;code&gt;fluent-ffmpeg&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That is definitely possible, but with all my other responsibilities, I simply never got to it.&lt;/p&gt;

&lt;p&gt;So I decided to go back to the first approach - rendering one frame after another.&lt;/p&gt;

&lt;h2&gt;Express and BullMQ&lt;/h2&gt;

&lt;p&gt;Rendering requests are sent to the backend server with &lt;code&gt;Express&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This route checks whether the video is already being rendered; if not, the job is added to the &lt;code&gt;BullMQ&lt;/code&gt; queue.&lt;/p&gt;
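&lt;p&gt;The route's logic boils down to a dedupe check plus an enqueue. Here's a sketch with the queue and the "already rendering" check reduced to in-memory stand-ins (the real app uses Express and BullMQ, and would track active renders in Redis or the database):&lt;/p&gt;

```javascript
const active = new Set(); // video ids currently queued or rendering
const queue = [];         // stands in for the BullMQ queue

function requestRender(videoId) {
  if (active.has(videoId)) {
    return { status: 409, message: 'already rendering' };
  }
  active.add(videoId);
  // With BullMQ this would be queue.add('render', { videoId }).
  queue.push({ name: 'render', data: { videoId } });
  return { status: 202, message: 'queued' };
}
```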

&lt;h2&gt;Playwright / Puppeteer&lt;/h2&gt;

&lt;p&gt;After the queue starts processing the render, it spawns multiple instances of headless Chrome.&lt;/p&gt;

&lt;p&gt;Note: this processing happens on a dedicated Hetzner server with an AMD EPYC 7502P 32-core processor and 128 GB of RAM, so it's quite a performant machine.&lt;/p&gt;

&lt;p&gt;Keep in mind that Chromium doesn't ship with proprietary media codecs, so I use &lt;code&gt;Playwright&lt;/code&gt;, which makes it trivial to install full Chrome.&lt;/p&gt;

&lt;p&gt;But still, the video frames came out black for some reason.&lt;/p&gt;

&lt;p&gt;I'm sure I was just missing something; however, I decided to split the video components into individual image frames and use those in the headless browser instead of using videos.&lt;/p&gt;

&lt;p&gt;Still, the most important part was to avoid using the screenshot method.&lt;/p&gt;

&lt;p&gt;Since we have everything in one canvas, we can get it into an image with &lt;code&gt;.toDataURL()&lt;/code&gt; on the canvas, which is much faster.&lt;/p&gt;

&lt;p&gt;To make this simpler, I made a static page that bundles the video editor and adds some functions into &lt;code&gt;window&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is then loaded with &lt;code&gt;Playwright&lt;/code&gt;/&lt;code&gt;Puppeteer&lt;/code&gt;, and on each frame, I simply call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;frameData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`window.setFrame(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;frameNumber&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives me the frame data that I can either save as an image or add into a buffer to render the video chunk.&lt;/p&gt;

&lt;p&gt;This whole process is split across 5-10 workers, depending on the video length, and their outputs are merged into the final video.&lt;/p&gt;
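&lt;p&gt;The split itself can be as simple as handing each worker a contiguous frame range (a sketch; the real worker-count heuristic based on video length is not shown):&lt;/p&gt;

```javascript
// Divide totalFrames into contiguous, inclusive frame ranges,
// one per worker; the resulting chunks are concatenated in order.
function planChunks(totalFrames, workerCount) {
  const size = Math.ceil(totalFrames / workerCount);
  const chunks = [];
  let start = 0;
  while (totalFrames > start) {
    chunks.push({ start, end: Math.min(start + size, totalFrames) - 1 });
    start += size;
  }
  return chunks;
}
```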

&lt;p&gt;Instead of this, it can be offloaded to something like &lt;code&gt;Lambda&lt;/code&gt; as well, but I'm leaning towards using &lt;code&gt;RunPod&lt;/code&gt;. The only drawback of their serverless architecture is that it uses Python, which I'm not that familiar with.&lt;/p&gt;

&lt;p&gt;This way, the rendering might be split into multiple chunks that are processed on the cloud, and even rendering of a 60-minute video can be done in a minute or two. Nice to have, but that's not our primary goal or use case.&lt;/p&gt;

&lt;h2&gt;What I Did NOT Solve (Yet)&lt;/h2&gt;

&lt;p&gt;The reason I downgraded from &lt;code&gt;Pixi&lt;/code&gt; 8 to &lt;code&gt;Pixi&lt;/code&gt; 7 is that &lt;code&gt;Pixi&lt;/code&gt; 7 also has a "legacy" build that supports the 2D canvas renderer, which is MUCH faster for this kind of rendering. A 60-second video takes around 80 seconds to render on the server, but when the canvas had a WebGL or WebGPU context, I was able to render only 1-2 frames per second.&lt;/p&gt;

&lt;p&gt;Interestingly enough, headless Chrome was much slower than headful Firefox when rendering WebGL canvases, according to my testing.&lt;/p&gt;

&lt;p&gt;Even using a dedicated GPU didn't speed up the rendering by any significant margin. Either I was doing something wrong, or headless Chrome simply isn't very performant with WebGL.&lt;/p&gt;

&lt;p&gt;WebGL in our use case is great for transitions, which are usually quite short.&lt;/p&gt;

&lt;p&gt;One approach I plan to test is rendering the WebGL and non-WebGL chunks separately.&lt;/p&gt;

&lt;h2&gt;Other Components&lt;/h2&gt;

&lt;p&gt;There are many parts involved in the project.&lt;/p&gt;

&lt;p&gt;Scene data is stored in &lt;code&gt;MongoDB&lt;/code&gt;, since the structure of the documents makes the most sense in a schemaless database.&lt;/p&gt;

&lt;p&gt;The frontend, written in &lt;code&gt;SvelteKit&lt;/code&gt;, uses &lt;code&gt;urql&lt;/code&gt; as a GraphQL client.&lt;/p&gt;

&lt;p&gt;The GraphQL server uses PHP &lt;code&gt;Laravel&lt;/code&gt; with &lt;code&gt;MongoDB&lt;/code&gt; and the amazing &lt;code&gt;Lighthouse GraphQL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But that's perhaps a topic for next time.&lt;/p&gt;

&lt;h2&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;So that's it for now! There's a lot of work that needs to be done before putting this into production and replacing the current video editor, which is quite buggy and reminds me a bit of Frankenstein's monster.&lt;/p&gt;

&lt;p&gt;Let me know what you think and keep on rockin'!&lt;/p&gt;

</description>
      <category>svelte</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
