<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nikhil Jathar</title>
    <description>The latest articles on Forem by Nikhil Jathar (@nikhilj).</description>
    <link>https://forem.com/nikhilj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3691993%2F9a9350ec-0f91-4186-9916-ea3fdcc44b5b.png</url>
      <title>Forem: Nikhil Jathar</title>
      <link>https://forem.com/nikhilj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nikhilj"/>
    <language>en</language>
    <item>
      <title>I Built Four Tools with Claude Code. None of Them Had Tests. So I Fixed That</title>
      <dc:creator>Nikhil Jathar</dc:creator>
      <pubDate>Wed, 15 Apr 2026 22:12:01 +0000</pubDate>
      <link>https://forem.com/nikhilj/i-built-four-tools-with-claude-code-none-of-them-had-tests-so-i-fixed-that-25gk</link>
      <guid>https://forem.com/nikhilj/i-built-four-tools-with-claude-code-none-of-them-had-tests-so-i-fixed-that-25gk</guid>
      <description>&lt;p&gt;I had just finished a two-hour Claude Code session on ERPClaw. The invoicing workflow was coming together. Journal entries were generating, invoices were producing the right totals, and the OpenClaw integration was responding correctly. I closed the terminal and thought: I should check if any of this broke anything in the accounting side.&lt;/p&gt;

&lt;p&gt;Then I realized I had no way to check. There were no tests. There had never been any tests.&lt;/p&gt;

&lt;p&gt;I opened the GL reconciliation logic and stared at it for a while. Claude had touched five files in that session. Any one of them could have introduced something subtle. I had no coverage to fall back on. The only way to verify was to mentally trace through the whole accounting flow by hand.&lt;/p&gt;

&lt;p&gt;I did that. It took 45 minutes.&lt;/p&gt;

&lt;p&gt;Everything was fine that time.&lt;/p&gt;




&lt;h2&gt;This Was Not the First Time&lt;/h2&gt;

&lt;p&gt;I had shipped the PHP Reddit API the same way. Claude wrote the core structure fast, the PSR compliance fell into place, the Laravel bridge worked on the first try. The PHP community picked it up. Real people were using it. No tests.&lt;/p&gt;

&lt;p&gt;Same with SiteKit. Same pattern every project: Claude Code writes the code quickly, it works, you ship it, and somewhere in the back of your mind you know that "works now" is not the same as "will keep working."&lt;/p&gt;

&lt;p&gt;ERPClaw is the most serious of these. It is an AI-native ERP system for the OpenClaw platform. Accounting, invoicing, inventory, payroll, tax, financial reporting -- 413 actions across 14 domains. When I say financial software, I mean software where a small bug in a journal entry can compound silently across hundreds of transactions and not announce itself until someone runs a report that should balance and does not.&lt;/p&gt;

&lt;p&gt;I built all of that without tests.&lt;/p&gt;




&lt;h2&gt;The Incident That Changed My Approach&lt;/h2&gt;

&lt;p&gt;During an &lt;a href="https://www.erpclaw.ai" rel="noopener noreferrer"&gt;ERPClaw&lt;/a&gt; session, Claude refactored the journal entry creation logic for the invoicing workflow. The refactor was reasonable. The code ran cleanly. Invoices generated. Totals looked correct. I moved on to the next feature.&lt;/p&gt;

&lt;p&gt;Two sessions later I was reviewing a financial report. The ledger was off. On multi-line invoices with compound tax rates, debits and credits were not balancing. Not by much -- but double-entry accounting does not have a "close enough." The ledger either balances or it does not.&lt;/p&gt;

&lt;p&gt;Nothing had crashed. There was no error. The numbers were quietly wrong.&lt;/p&gt;

&lt;p&gt;I traced it manually. The journal entry creation was distributing the compound tax calculation incorrectly across line items. It was a single logic error in one function. A unit test with a multi-line invoice and a compound tax rate would have caught it in three seconds.&lt;/p&gt;

&lt;p&gt;Instead it took 45 minutes of manual tracing to find it.&lt;/p&gt;
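&lt;p&gt;To make that concrete, here is the shape of the test that was missing. The helper and account names are illustrative, not ERPClaw's real API -- the point is the exact balancing assertion over a multi-line invoice with compounding tax rates:&lt;/p&gt;

```python
# Illustrative sketch -- build_invoice_journal is a stand-in for the real
# journal entry creation logic, not ERPClaw's actual function.
from decimal import Decimal

def build_invoice_journal(line_amounts, tax_rates):
    """Build (side, account, amount) rows for a multi-line invoice.
    Compound taxes apply each rate to the running total per line."""
    entries = []
    for amount in line_amounts:
        taxed = amount
        for rate in tax_rates:  # compound: each rate on the running total
            taxed = taxed * (Decimal(1) + rate / Decimal(100))
        entries.append(("debit", "Accounts Receivable", taxed))
        entries.append(("credit", "Revenue", amount))
        entries.append(("credit", "Tax Payable", taxed - amount))
    return entries

def test_multiline_compound_tax_balances():
    entries = build_invoice_journal(
        [Decimal("100.00"), Decimal("250.00"), Decimal("19.99")],
        [Decimal("9"), Decimal("5")],  # two compounding tax rates
    )
    debits = sum(a for side, _, a in entries if side == "debit")
    credits = sum(a for side, _, a in entries if side == "credit")
    assert debits == credits  # double-entry has no "close enough"

test_multiline_compound_tax_balances()
```

&lt;p&gt;Exact Decimal equality is the whole test. No tolerance, no rounding.&lt;/p&gt;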

&lt;p&gt;The math became obvious: ERPClaw has 413 actions. That is 413 potential windows where a silent regression could sit undetected between when Claude writes the code and when I manually notice something is off. At some point, discipline stops being a viable strategy.&lt;/p&gt;




&lt;h2&gt;Why CLAUDE.md Instructions Do Not Solve This&lt;/h2&gt;

&lt;p&gt;I had tried the obvious fix. I put test instructions in CLAUDE.md. "Write tests after every file edit." Claude followed it sometimes, ignored it in long sessions, and there was nothing I could do to enforce it.&lt;/p&gt;

&lt;p&gt;CLAUDE.md instructions are advisory. Claude reads them at the start of a session and applies them to the best of its ability. But in a complex multi-file session where Claude is focused on architecture, test generation falls off. It is not a bug in Claude -- it is how any attention-based system works under load.&lt;/p&gt;

&lt;p&gt;The problem was structural. Tests required a separate act of will. Someone had to decide to write them. AI coding moves fast enough that "I'll do it after this task" means "I'll do it never."&lt;/p&gt;




&lt;h2&gt;So I Built &lt;a href="https://www.tailtest.com" rel="noopener noreferrer"&gt;tailtest&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;The fix was straightforward once I saw it clearly: hook into the file write event itself. Do not rely on Claude deciding to write tests. Make tests happen automatically as a consequence of Claude writing any file.&lt;/p&gt;

&lt;p&gt;Claude Code has a PostToolUse hook. It fires after every tool call -- including every file write. tailtest uses that hook.&lt;/p&gt;

&lt;p&gt;When Claude writes a file:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;tailtest fires&lt;/li&gt;
&lt;li&gt;It runs an intelligence filter (more on this below)&lt;/li&gt;
&lt;li&gt;If the file is worth testing, it generates test scenarios for the code that was just written&lt;/li&gt;
&lt;li&gt;It runs those tests immediately&lt;/li&gt;
&lt;li&gt;If everything passes: nothing. Silent. You keep working.&lt;/li&gt;
&lt;li&gt;If something fails: specific output, in the same session, while you still know what changed&lt;/li&gt;
&lt;/ol&gt;
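&lt;p&gt;A stripped-down sketch of what a hook script like this looks like. The payload fields (tool_name, tool_input) follow Claude Code's hook JSON, but the filter and test command here are illustrative stand-ins, not tailtest's internals:&lt;/p&gt;

```python
#!/usr/bin/env python3
# Sketch of a PostToolUse hook script: reads the tool-call payload from
# stdin, filters, runs tests, and stays silent unless something fails.
import json
import subprocess
import sys

TESTABLE_SUFFIXES = (".py", ".ts", ".js", ".go", ".rs", ".rb", ".java", ".php")

def handle(payload):
    if payload.get("tool_name") not in ("Write", "Edit"):
        return 0                          # only react to file writes
    path = payload.get("tool_input", {}).get("file_path", "")
    if not path.endswith(TESTABLE_SUFFIXES):
        return 0                          # the intelligence filter sits here
    result = subprocess.run(["pytest", "--quiet"],
                            capture_output=True, text=True)
    if result.returncode == 0:
        return 0                          # silent on pass
    sys.stderr.write(result.stdout)       # surface the failure in-session
    return 2                              # nonzero exit feeds stderr back

if __name__ == "__main__":
    sys.exit(handle(json.load(sys.stdin)))
```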

&lt;p&gt;The silence-on-pass decision was deliberate. If tailtest talks every time a test passes, you start ignoring it within a week. The only time it surfaces output is when something actually needs your attention. That is the only design that survives long-term use.&lt;/p&gt;




&lt;h2&gt;The Intelligence Filter&lt;/h2&gt;

&lt;p&gt;Not every file Claude writes is worth testing. Config files are not. Schema migrations are not. Boilerplate index files are not. If tailtest ran on all of them, it would be noisy, slow, and would generate useless test output.&lt;/p&gt;

&lt;p&gt;tailtest runs an intelligence filter before generating anything. It looks at the file extension, the path, and the content patterns to decide whether this is a file containing logic worth testing. Services, utilities, domain models, controllers, business logic -- these get tested. Configuration, migrations, generated files -- these get skipped.&lt;/p&gt;
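&lt;p&gt;In spirit, the filter is a cheap decision function over path and content. This sketch is illustrative -- tailtest's real heuristics are more involved:&lt;/p&gt;

```python
# Illustrative filter: skip config, migration, and boilerplate paths, then
# require some sign of actual logic in the content before generating tests.
import re

SKIP_PATH = re.compile(
    r"(^|/)migrations/|(^|/)index\.(js|ts)$|\.(json|ya?ml|toml|lock|md)$")
HAS_LOGIC = re.compile(r"def |class |function |=>")

def worth_testing(path, content):
    if SKIP_PATH.search(path):
        return False  # config, migrations, generated boilerplate
    return bool(HAS_LOGIC.search(content))
```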

&lt;p&gt;This is not optional. Without filtering, the tool generates noise. Noise causes developers to turn it off. A turned-off testing tool does nothing.&lt;/p&gt;




&lt;h2&gt;The Ramp-Up Scan&lt;/h2&gt;

&lt;p&gt;If you install tailtest on an existing project -- like ERPClaw, which had zero tests when I started -- you do not get hit with a thousand test generation runs on the first session. tailtest scans the codebase on first run, identifies files with no coverage, and queues them for gradual background testing. New edits get coverage immediately. Existing files get covered over time.&lt;/p&gt;

&lt;p&gt;This matters for real projects. A cold-start that tries to test everything at once is not useful. Gradual coverage is.&lt;/p&gt;




&lt;h2&gt;The Recursive Part&lt;/h2&gt;

&lt;p&gt;tailtest now has 332 tests in its own test suite. A significant number of those were generated by tailtest itself while I was building it.&lt;/p&gt;

&lt;p&gt;At one point, tailtest caught a bug in its own intelligence filter logic. I had not noticed anything wrong. It fired on a file, ran the tests, and returned a failure on an edge case I had introduced while refactoring the filter's file-type detection.&lt;/p&gt;

&lt;p&gt;I considered that the correct moment to ship a testing tool. When your testing tool tests itself and catches its own regressions before you ship it, the concept is proven.&lt;/p&gt;




&lt;h2&gt;Who This Is For&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you write code with Claude Code and you know you should have tests:&lt;/strong&gt; tailtest removes the decision. Tests happen. You do not have to remember, you do not have to prompt, you do not have to discipline yourself. Every edit gets a test run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are building with Claude Code and testing feels complicated:&lt;/strong&gt; tailtest generates the tests. You do not need to know how to write pytest or vitest. You install it, and from that session forward, new code gets tested. You see failures when they happen. You do not need to understand the test framework to benefit from the coverage.&lt;/p&gt;




&lt;h2&gt;Honest Limits&lt;/h2&gt;

&lt;p&gt;tailtest generates tests. It does not guarantee they are the right tests. For complex business logic -- the kind of multi-state, multi-entity logic that took domain expertise to design -- a human should review the generated test scenarios. tailtest gives you coverage. It does not replace test strategy.&lt;/p&gt;

&lt;p&gt;There is also a token cost. Every PostToolUse invocation uses tokens. A typical session adds roughly $5 or less to your Claude usage. For production software with real users -- ERPClaw, the PHP Reddit API -- that is a rounding error on the cost of shipping a silent regression. For a hobby project you are not maintaining seriously, it is a real tradeoff. I am not going to pretend otherwise.&lt;/p&gt;




&lt;h2&gt;Languages and Install&lt;/h2&gt;

&lt;p&gt;Supported: Python (pytest), TypeScript and JavaScript (vitest, jest), Go, Rust, Ruby, Java, PHP.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude plugin marketplace add avansaber/tailtest
claude plugin install tailtest@avansaber-tailtest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No config file. No setup. It detects your language and test runner automatically.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/avansaber/tailtest" rel="noopener noreferrer"&gt;https://github.com/avansaber/tailtest&lt;/a&gt;&lt;br&gt;
Website: tailtest.com&lt;br&gt;
Open source, MIT licence, free.&lt;/p&gt;

&lt;p&gt;Questions and issues go in the GitHub repo. I read them.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>testing</category>
      <category>developers</category>
      <category>ai</category>
    </item>
    <item>
      <title>The New SaaS Playbook: What Building a 29-Module ERP Taught Me About AI-Native Software</title>
      <dc:creator>Nikhil Jathar</dc:creator>
      <pubDate>Sun, 01 Mar 2026 04:32:14 +0000</pubDate>
      <link>https://forem.com/nikhilj/the-new-saas-playbook-what-building-a-29-module-erp-taught-me-about-ai-native-software-27mc</link>
      <guid>https://forem.com/nikhilj/the-new-saas-playbook-what-building-a-29-module-erp-taught-me-about-ai-native-software-27mc</guid>
      <description>&lt;p&gt;2008 had the mobile revolution. Within three years, every company needed a mobile strategy. Entire industries (taxis, hotels, banking) were rebuilt mobile-first. Not "mobile-added." Mobile-FIRST.&lt;/p&gt;

&lt;p&gt;2026 has the AI-native revolution. But most companies are doing the equivalent of 2009 "mobile strategy," bolting a chatbot sidebar onto an existing product. A summary button. An autocomplete. That is not AI-native. That is AI-decorated.&lt;/p&gt;

&lt;p&gt;I built a 29-module ERP system to test a thesis: when AI becomes the primary implementation layer, the entire SaaS model changes. Not just how code gets written, but how software gets architected, priced, delivered, and maintained. The system covers general ledger, inventory, manufacturing, payroll, CRM, AI analytics, and four regional compliance overlays.&lt;/p&gt;

&lt;p&gt;612 actions. 191 database tables. 1,839 automated tests. Running on a $20/month server.&lt;/p&gt;

&lt;p&gt;This article is the playbook for what I learned. ERPClaw is the proof, not the point.&lt;/p&gt;




&lt;h2&gt;1. Why the Current SaaS Model Is Fragile&lt;/h2&gt;

&lt;p&gt;The economics of traditional SaaS are straightforward. Hire engineers at $150K-$250K per year, build for 18 months, charge per-seat to recover costs. A typical ERP vendor charges $10K-$50K annually because they need to: 200-person engineering teams, multi-year roadmaps, legacy code maintenance, sales teams, implementation consultants.&lt;/p&gt;

&lt;p&gt;The pricing is a function of build cost, not value delivered. A 40-person manufacturing shop pays $50K per year for SAP but uses perhaps 15% of the features. They need inventory, purchasing, invoicing, and payroll. They are paying for multi-currency consolidation across 40 subsidiaries, a feature they will never touch.&lt;/p&gt;

&lt;p&gt;Now ask the question that should keep SaaS executives awake at night: what happens when the build cost drops 10x?&lt;/p&gt;

&lt;p&gt;The pricing model does not decline gracefully. It collapses. Not because AI products are better, but because the cost of building equivalent software approaches zero. This is the Kodak parallel. Kodak did not fail because digital cameras took better photos. They failed because the cost of taking a photo went to zero, which destroyed the business model of selling film. SaaS incumbents face the same dynamic. Open-source alternatives built at AI speed will make per-seat pricing indefensible for commodity software.&lt;/p&gt;

&lt;p&gt;ERP, CRM, project management, HR, invoicing: these are commodity problems with well-understood business rules. The code is not the moat. It never was. The moat was the cost of writing it.&lt;/p&gt;

&lt;p&gt;That moat is gone.&lt;/p&gt;




&lt;h2&gt;2. The AI-Native Playbook&lt;/h2&gt;

&lt;p&gt;These seven principles are not specific to ERP. They are the architectural patterns that work when AI is your primary implementation tool. I learned them by building ERPClaw, a 29-module ERP with 612 actions, but they apply equally to healthcare scheduling, logistics management, property management, or any domain-specific SaaS.&lt;/p&gt;

&lt;h3&gt;Principle 1: Spec-First, Not Code-First&lt;/h3&gt;

&lt;p&gt;AI is an implementation tool, not an architect. It will happily build you a GL posting function that uses floating point arithmetic. Your trial balance will be off by $0.01 after a thousand transactions, and nobody will notice until the auditor does.&lt;/p&gt;
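&lt;p&gt;The floating point trap is easy to demonstrate. Binary floats cannot represent most decimal amounts exactly, which is why a general ledger needs exact decimal arithmetic:&lt;/p&gt;

```python
# Two exact-looking amounts, one inexact float result -- the seed of the
# one-cent drift a trial balance eventually exposes.
from decimal import Decimal

assert 0.10 + 0.20 != 0.30  # floats: already wrong after two postings

# The same cents in Decimal stay exact across any number of postings.
total = sum(Decimal("0.01") for _ in range(1000))
assert total == Decimal("10.00")
```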

&lt;p&gt;I spent the first full day writing a master specification. No Python. No SQL. Just a document: 33 sections, 9,766 lines. Every table (191 of them), every action (612), every naming convention, every validation rule, every test scenario defined before a single line of code existed.&lt;/p&gt;

&lt;p&gt;I studied ERPNext's source code to understand real-world edge cases: double-entry reversals, FIFO stock valuation, partial payment allocation, multi-currency revaluation. That domain research went into the spec, not into prompts.&lt;/p&gt;

&lt;p&gt;The plan quality directly determines the output quality. A well-specified action produces working code on the first generation 90% of the time. An underspecified action produces plausible code that fails on edge cases 90% of the time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Spend 20% of your time on the specification. It saves 80% of debugging.&lt;/p&gt;

&lt;h3&gt;Principle 2: Metadata-Driven Everything&lt;/h3&gt;

&lt;p&gt;In ERPClaw, every skill has a SKILL.md file. That one file serves four audiences simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI instruction:&lt;/strong&gt; tells the AI what actions exist and how to execute them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API documentation:&lt;/strong&gt; defines parameters, types, and return values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web form specification:&lt;/strong&gt; auto-generates the web UI from the same metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User manual:&lt;/strong&gt; progressive disclosure (basic, intermediate, advanced)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One file. Four surfaces. When I built the web dashboard (Webclaw), I wrote a UI.yaml auto-generator that scanned 24 skills and produced form specifications directly from SKILL.md metadata. All 612 actions became accessible through a web UI with zero per-action custom code. A 4,651-check validation suite verified every auto-generated form.&lt;/p&gt;

&lt;p&gt;A traditional ERP frontend needs roughly 150 lines of form code per action. Across 612 actions, that is approximately 92,000 lines. The metadata-driven approach replaced all of it with one generic form renderer and 24 YAML files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; If your metadata serves only one purpose (API docs OR UI OR AI instructions), you are maintaining the same information in multiple places. Design one source of truth that drives all surfaces.&lt;/p&gt;
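&lt;p&gt;A toy version of the idea -- one metadata dict, two generated surfaces. The field names below are illustrative, not the real SKILL.md schema:&lt;/p&gt;

```python
# One hypothetical action definition drives both the form spec and the docs.
ACTION = {
    "name": "create_invoice",
    "params": [
        {"name": "customer", "type": "string", "required": True},
        {"name": "amount", "type": "decimal", "required": True},
        {"name": "due_date", "type": "date", "required": False},
    ],
}

def to_form_spec(action):
    """Surface 1: a generic renderer consumes this to draw the web form."""
    widgets = {"string": "text", "decimal": "number", "date": "datepicker"}
    return [{"field": p["name"], "widget": widgets[p["type"]],
             "required": p["required"]} for p in action["params"]]

def to_docs(action):
    """Surface 2: the same metadata becomes the parameter reference."""
    lines = [f"## {action['name']}"]
    for p in action["params"]:
        req = "required" if p["required"] else "optional"
        lines.append(f"- {p['name']} ({p['type']}, {req})")
    return "\n".join(lines)
```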

&lt;h3&gt;Principle 3: Modular by Default&lt;/h3&gt;

&lt;p&gt;Not microservices. The operational overhead of service mesh, distributed transactions, and container orchestration is unjustifiable at AI development speed. Not a monolith either; it is too coupled for independent evolution. The middle ground is independent modules with clear ownership boundaries sharing a single database.&lt;/p&gt;

&lt;p&gt;ERPClaw's 29 skills each have their own repo, their own tests, their own SKILL.md. They share one SQLite database (191 tables, 535 indexes). The ownership rule is simple: only the owning skill can WRITE to its tables, while any skill can READ any table. Cross-skill writes happen via subprocess calls. A shared library of 16 modules provides common plumbing: GL posting, stock posting, tax calculation, naming, and encryption.&lt;/p&gt;

&lt;p&gt;One database file. One server. Zero DBA. Zero network hops between modules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Choose the simplest architecture that supports independent module evolution. For most SaaS products, that is a shared database with clear ownership boundaries, not microservices.&lt;/p&gt;
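&lt;p&gt;The ownership rule is enforceable with a few lines at the write path. Table and skill names in this sketch are illustrative:&lt;/p&gt;

```python
# Hypothetical write-ownership guard: only the owning skill may write its
# tables; everything else must go through the owning skill's actions.
OWNERS = {
    "gl_entry": "general-ledger",
    "stock_ledger": "inventory",
    "invoice": "selling",
}

def check_write(skill, table):
    owner = OWNERS.get(table)
    if owner is None:
        raise KeyError(f"unknown table {table}")
    if owner != skill:
        raise PermissionError(
            f"{skill} may not write {table}; owned by {owner}")
    return True
```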

&lt;h3&gt;Principle 4: Test Pyramid with Systemic Invariants&lt;/h3&gt;

&lt;p&gt;AI-generated code passes spot checks but fails systemic properties. Traditional unit tests are necessary but not sufficient. You need invariant checks: properties that must hold across the entire system after every change.&lt;/p&gt;

&lt;p&gt;ERPClaw has 18 accounting invariant checks that run automatically after every test touching the general ledger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total debits equal total credits across all GL entries&lt;/li&gt;
&lt;li&gt;Balance sheet equation holds (Assets = Liabilities + Equity)&lt;/li&gt;
&lt;li&gt;GL chain hash integrity (SHA-256 sequential hashing)&lt;/li&gt;
&lt;li&gt;Every submitted voucher has at least 2 GL entries (double-entry enforcement)&lt;/li&gt;
&lt;li&gt;No NaN or Infinity values in any financial column&lt;/li&gt;
&lt;li&gt;Cancelled vouchers have matching reversals with swapped debit/credit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any single invariant fails, every GL-touching test in that run fails. You cannot accidentally break double-entry bookkeeping and have green tests.&lt;/p&gt;
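&lt;p&gt;Two of those invariants, sketched as plain functions over GL rows (the column names are illustrative, not the real schema):&lt;/p&gt;

```python
# Invariant checks run after every GL-touching test: global properties,
# not unit assertions. Rows are dicts with Decimal debit/credit columns.
from decimal import Decimal

def assert_debits_equal_credits(rows):
    debits = sum(r["debit"] for r in rows)
    credits = sum(r["credit"] for r in rows)
    assert debits == credits, f"GL out of balance: {debits} vs {credits}"

def assert_double_entry(rows_by_voucher):
    for voucher, rows in rows_by_voucher.items():
        assert len(rows) >= 2, f"voucher {voucher} has fewer than 2 GL rows"
```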

&lt;p&gt;On top of the invariants: 1,530 pytest tests, 168 Playwright browser E2E tests, 60 Telegram E2E tests on the production server. Total: 1,839 automated tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Define the systemic properties your software must maintain. Test those properties after every change, not just the individual functions.&lt;/p&gt;

&lt;h3&gt;Principle 5: Clean-Install as a Gate&lt;/h3&gt;

&lt;p&gt;"Works on my machine" is the number one failure mode of AI-assisted development. The AI optimises for the current environment. It does not think about first-time setup on a blank server.&lt;/p&gt;

&lt;p&gt;I did a full server wipe: deleted all 30 skills, both databases, the shared library. Then reinstalled from published packages. It broke immediately.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stale user sessions persisting across database wipes because the web UI had its own session database that was never cleaned&lt;/li&gt;
&lt;li&gt;10 missing tables in the publish schema, out of sync with the development schema&lt;/li&gt;
&lt;li&gt;Seed data creating UNIQUE constraint collisions across regional skills&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$(whoami)&lt;/code&gt; returning "root" under sudo, causing services to launch as the wrong user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these were caught by 1,530 unit tests. All of them would have hit every single new user.&lt;/p&gt;

&lt;p&gt;I ran three full clean-install rounds before it was stable. 49 E2E tests across five phases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; If it does not work on a blank server, it does not work. Gate every release on a clean-install test.&lt;/p&gt;
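&lt;p&gt;The gate itself can be a small script: run each install step against a blank environment and fail the release on the first error. The step commands below are hypothetical placeholders, not ERPClaw's real scripts:&lt;/p&gt;

```python
# Sketch of a clean-install gate runner. The runner is injectable so the
# gate logic itself can be tested without touching a real server.
import subprocess

STEPS = [
    ["./scripts/wipe.sh"],          # hypothetical: reset server state
    ["./scripts/install.sh"],       # hypothetical: install from packages
    ["pytest", "e2e/", "--quiet"],  # smoke-test the installed system
]

def run_gate(steps, runner=subprocess.run):
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return False, cmd       # gate fails on the first broken step
    return True, None
```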

&lt;h3&gt;Principle 6: Security Audit the Output, Not the Process&lt;/h3&gt;

&lt;p&gt;AI does not think about what should NOT be in production. It generates functionally correct code that leaks development context everywhere.&lt;/p&gt;

&lt;p&gt;I ran a security audit across 30 published packages, roughly 220 files. 21 findings: 3 HIGH, 2 MEDIUM, 7 LOW, 9 Open Source Readiness issues.&lt;/p&gt;

&lt;p&gt;The HIGH findings were embarrassing in their simplicity: my local dev repository path hardcoded in a meta-package, &lt;code&gt;.DS_Store&lt;/code&gt; files included in published packages, systemd configs with real server paths. The MEDIUM findings were worse. A real Indian taxpayer ID (GSTIN) was embedded in test seed data. Development paths appeared in user-facing error messages.&lt;/p&gt;

&lt;p&gt;The AI-generated code passed every test. It also shipped my home directory path and a real person's tax ID.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Treat AI output like code from a brilliant but careless junior developer. Review for what is present that should not be, not just whether the logic is correct.&lt;/p&gt;
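&lt;p&gt;Even a crude scan over the publish artifacts catches the embarrassing class of leak. The patterns here are illustrative, not my actual audit tooling:&lt;/p&gt;

```python
# Pre-publish scan for development context that should never ship.
import re

LEAK_PATTERNS = {
    "home path": re.compile(r"/(Users|home)/[a-z0-9_.-]+/", re.IGNORECASE),
    "macOS artifact": re.compile(r"\.DS_Store"),
    "Indian GSTIN": re.compile(
        r"\b[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][0-9A-Z]{3}\b"),
}

def scan(text):
    """Return the names of every leak pattern found in the text."""
    return [name for name, pat in LEAK_PATTERNS.items() if pat.search(text)]
```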

&lt;h3&gt;Principle 7: Ship Scope, Not Features&lt;/h3&gt;

&lt;p&gt;The old SaaS playbook says ship one feature, get feedback, iterate for 18 months. The AI-native playbook says ship the entire scope and let users install what they need.&lt;/p&gt;

&lt;p&gt;ERPClaw shipped 29 modules and 612 actions. Not because shipping fast is impressive, but because modular architecture makes it possible to ship broad scope without shipping complexity. Each skill is independent. Install erpclaw-selling without erpclaw-manufacturing. Install erpclaw-payroll without erpclaw-crm. The system degrades gracefully when optional skills are absent.&lt;/p&gt;

&lt;p&gt;Traditional per-feature development made this impossible. You could not justify building 29 modules with a 200-person team in 18 months. With AI handling implementation, the constraint becomes specification quality, not engineering hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; When build cost drops 10x, the strategy shifts from "build fewer things better" to "build everything and let users choose." Modular architecture is the prerequisite.&lt;/p&gt;




&lt;h2&gt;3. The Proof&lt;/h2&gt;

&lt;p&gt;ERPClaw is not the point of this article. It is the evidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope:&lt;/strong&gt; 29 skills covering general ledger, journals, payments, tax, reports, inventory, selling, buying, manufacturing, HR, payroll, CRM, projects, assets, quality, support, billing, AI engine, analytics, and four regional compliance overlays (India GST/TDS, Canada GST/HST/CPP/EI, UK VAT/PAYE/NI, EU 27-state VAT/OSS/SAF-T).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; Built on the OpenClaw platform. Each skill is a self-contained folder: SKILL.md metadata, scripts/db_query.py for logic, tests/ for pytest. Single SQLite database, 191 tables, 535 indexes. Shared library with 16 modules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web UI:&lt;/strong&gt; Webclaw, built with FastAPI gateway plus Next.js 16 plus shadcn/ui. Auto-generated from SKILL.md metadata. 168 Playwright E2E tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Numbers:&lt;/strong&gt; 612 actions. 1,839 automated tests. 18 accounting invariant checks. 33 GitHub repos, all MIT licensed. $0 software cost, $20/month server.&lt;/p&gt;




&lt;h2&gt;4. What Two Weeks Looks Like&lt;/h2&gt;

&lt;p&gt;This timeline is not the flex. It is the point. If one developer can do this, the question every SaaS company must answer is: what does your 200-person engineering team do for 18 months?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1:&lt;/strong&gt; Master plan. 9,766 lines, zero code. The highest-leverage day of the entire project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 2-3:&lt;/strong&gt; Foundation. Setup, General Ledger, Journals, Payments, Tax, Reports. Six skills. The 12-step GL validation was the hardest part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 4-5:&lt;/strong&gt; Supply chain. Inventory with FIFO valuation, Selling with the full quote-to-cash pipeline, Buying with three-way matching (PO, receipt, invoice).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 6-7:&lt;/strong&gt; Operations. Manufacturing (BOMs, work orders, WIP accounting), HR, Projects, Assets, Quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 8-9:&lt;/strong&gt; People and growth. Payroll (FICA, federal progressive tax, state tax, 401k, garnishments, W-2), CRM, Support, Billing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 10-11:&lt;/strong&gt; Intelligence and compliance. AI anomaly detection, analytics dashboards, four regional tax overlays.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Days 12-14:&lt;/strong&gt; Testing overhaul, v2 features, clean-install testing, security audit.&lt;/p&gt;

&lt;p&gt;Roughly two skills per day, each tested before moving on.&lt;/p&gt;

&lt;p&gt;The answer to "what does a 200-person team do for 18 months" is, for the most part, coordination. Meetings about meetings. Sprint planning for the sprint planning. Cross-team dependency resolution. Code review chains five people deep. AI eliminates the coding bottleneck. Small teams eliminate the coordination bottleneck. Together, that is the 10x.&lt;/p&gt;




&lt;h2&gt;5. What Broke&lt;/h2&gt;

&lt;p&gt;Every failure I encountered was a systemic property, not a local bug. Traditional testing catches local bugs. AI-native development needs systemic gates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Clean-Install Disaster.&lt;/strong&gt; Full server wipe, reinstall from packages. Five immediate failures: stale sessions, missing tables, seed collisions, path leakage, sudo detection. Unit tests caught zero of these. A clean-install gate would have caught all five.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Security Audit.&lt;/strong&gt; 21 findings across 220 files. Real taxpayer IDs in test data. Build artifacts in packages. Development paths in error messages. The code was functionally correct and contextually careless. A security review stage would have caught all 21.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Schema Drift.&lt;/strong&gt; Four regional skills built in separate AI sessions. 39 cross-skill inconsistencies: &lt;code&gt;total_tax&lt;/code&gt; vs &lt;code&gt;tax_amount&lt;/code&gt;, &lt;code&gt;company_setting&lt;/code&gt; vs &lt;code&gt;regional_settings&lt;/code&gt;, &lt;code&gt;employee_name&lt;/code&gt; vs &lt;code&gt;full_name&lt;/code&gt;. Each skill passed its own tests. Cross-skill queries returned wrong data. A schema alignment check would have caught all 39.&lt;/p&gt;
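&lt;p&gt;A schema alignment check can be as blunt as a canonical-name map run against every skill's schema. This sketch is illustrative; a real check would also cover types and constraints:&lt;/p&gt;

```python
# Flag the same concept spelled differently across skill schemas.
from collections import defaultdict

# Canonical names for known concepts (illustrative subset).
CANON = {
    "total_tax": "tax_amount", "tax_amount": "tax_amount",
    "employee_name": "full_name", "full_name": "full_name",
}

def find_drift(schemas):
    """schemas: {skill: {table: [columns]}}. Returns (concept, variants)
    pairs where more than one spelling of the same concept is in use."""
    seen = defaultdict(set)
    for skill, tables in schemas.items():
        for cols in tables.values():
            for col in cols:
                concept = CANON.get(col)
                if concept:
                    seen[concept].add(col)
    return [(c, sorted(v)) for c, v in seen.items() if len(v) != 1]
```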

&lt;p&gt;The pattern is clear. AI-generated code fails at system boundaries, not within modules. Your testing strategy must match: invariant checks, clean-install gates, security audits, and schema alignment tests. These are the new mandatory layers.&lt;/p&gt;




&lt;h2&gt;6. The Honest AI Assessment&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Where AI excels:&lt;/strong&gt; CRUD implementation, SQL schema generation, test scaffolding, maintaining consistent patterns across 29 skills (it does not get bored on skill 27), and translating well-specified business rules into working code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where AI fails:&lt;/strong&gt; Cross-module dependencies (intercompany invoicing needed heavy manual correction), edge cases not covered in the spec (GL reversals with partial payments, garnishment priority ordering), security awareness (zero instinct for what should not ship), and cross-session consistency (a table rename in one session is forgotten three sessions later).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern that works:&lt;/strong&gt; Spec-first. One module at a time. Test immediately. Never move on with failing tests. Give the AI a narrow, well-defined task and it executes with remarkable speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern that fails:&lt;/strong&gt; "Build everything at once." Context overflow, compounding bugs, inconsistent assumptions across modules. The AI is a sprinter, not a marathoner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The role split:&lt;/strong&gt; The human provides architecture, domain expertise, validation logic, and the questions the AI does not know to ask. The AI provides implementation speed, consistency, tirelessness, and test generation. Neither role is optional. The CTO who thinks "AI replaces my engineers" will ship broken software. The CTO who thinks "AI is just autocomplete" will ship too slowly.&lt;/p&gt;




&lt;h2&gt;7. What This Means for CTOs&lt;/h2&gt;

&lt;p&gt;These are uncomfortable questions. They are also unavoidable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Headcount.&lt;/strong&gt; If a solo developer ships 612 actions in two weeks, what is the right team size for a SaaS product? Not zero; architecture, domain expertise, security, and infrastructure still require humans. But not 200 either. The AI-native team is 3-5 people: one architect, one domain expert, one infrastructure engineer, one QA/security person. The "20 backend engineers" model belongs to 2010-2025.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing.&lt;/strong&gt; If build cost drops 10x, per-seat pricing becomes indefensible for commodity software. ERP, CRM, project management, HR: these are well-understood domains with public specifications. The value lies in domain expertise and data, not in the code. Open-source alternatives will eat commodity SaaS the way Linux ate proprietary Unix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build vs Buy.&lt;/strong&gt; The calculus changes fundamentally. "Build" used to mean 18 months and $2M. Now it means 2 weeks and $20/month hosting. For domain-specific software where your industry has unique rules that off-the-shelf products handle poorly, building is now cheaper than buying and customising.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moats.&lt;/strong&gt; The code moat is gone. Anyone can build equivalent software with AI. The remaining moats are proprietary data, regulatory expertise, distribution, and trust. If your SaaS company's primary asset is "we wrote a lot of code over 10 years," that asset is depreciating fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture.&lt;/strong&gt; Every new SaaS product should be metadata-driven. If each surface carries its own hand-written definition, you are doing the same work several times over and maintaining copies that drift apart. Single-source-of-truth metadata that drives all surfaces (AI, API, UI, docs) is the new baseline.&lt;/p&gt;
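&lt;p&gt;To make the single-source-of-truth idea concrete, here is a minimal sketch: one module definition from which both an AI tool schema and human-readable docs are derived. Every name and shape below is invented for illustration; it is not ERPClaw's actual metadata format.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    type: str
    required: bool = False

@dataclass
class Action:
    name: str
    description: str
    fields: list

# The single definition. In a metadata-driven system this is the only
# place a new field is ever added.
CREATE_INVOICE = Action(
    name="create_invoice",
    description="Create a sales invoice with line items.",
    fields=[
        Field("customer_id", "string", required=True),
        Field("lines", "array", required=True),
        Field("currency", "string"),
    ],
)

def to_json_schema(action):
    """Derive an AI tool / API parameter schema from the definition."""
    return {
        "name": action.name,
        "description": action.description,
        "parameters": {
            "type": "object",
            "properties": {f.name: {"type": f.type} for f in action.fields},
            "required": [f.name for f in action.fields if f.required],
        },
    }

def to_docs(action):
    """Derive human-readable docs from the same definition."""
    lines = [f"## {action.name}", action.description, ""]
    for f in action.fields:
        req = " (required)" if f.required else ""
        lines.append(f"- `{f.name}`: {f.type}{req}")
    return "\n".join(lines)
```

&lt;p&gt;Add one renderer per surface and every new field propagates everywhere from a single edit.&lt;/p&gt;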




&lt;h2&gt;
  
  
  8. The Playbook
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you are starting a new SaaS product:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the full specification first. Tables, actions, validations, edge cases. Spend 20% of your time here.&lt;/li&gt;
&lt;li&gt;Design metadata-driven architecture. One definition file per module that drives AI, API, UI, and docs.&lt;/li&gt;
&lt;li&gt;Choose boring infrastructure. SQLite or Postgres. Monorepo or simple multi-repo. Subprocess communication. Microservices are for Google-scale problems.&lt;/li&gt;
&lt;li&gt;Implement systemic testing. Invariant checks that verify global properties, not just unit assertions.&lt;/li&gt;
&lt;li&gt;Gate on clean-install. Every release must work on a blank server.&lt;/li&gt;
&lt;li&gt;Security audit the output. Treat AI code like junior developer code; review for context leakage.&lt;/li&gt;
&lt;li&gt;Ship scope, then polish. 29 modules at 90% is more useful than 3 modules at 99%.&lt;/li&gt;
&lt;/ol&gt;
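&lt;p&gt;A minimal sketch of what step 4 means in practice, assuming a hypothetical flat journal-line schema (field names are illustrative, not ERPClaw's). The point is that each check asserts a property of the whole system, so a regression introduced by any module trips it, not just the module under test.&lt;/p&gt;

```python
from decimal import Decimal

def check_gl_balanced(journal_lines):
    """Global invariant: total debits equal total credits across the
    entire ledger, not just within one entry. A unit test on a single
    posting routine cannot catch cross-module drift; this can."""
    debits = sum(Decimal(str(l["debit"])) for l in journal_lines)
    credits = sum(Decimal(str(l["credit"])) for l in journal_lines)
    assert debits == credits, f"GL out of balance: {debits} vs {credits}"

def check_posted_invoices_booked(invoices, journal_lines):
    """Global invariant: every posted invoice produced a journal line."""
    posted = {inv["id"] for inv in invoices if inv["status"] == "posted"}
    booked = {l["source_id"] for l in journal_lines
              if l.get("source") == "invoice"}
    missing = posted - booked
    assert not missing, f"posted invoices with no GL entry: {missing}"
```

&lt;p&gt;Run checks like these after every session, against the full database, and "did this break the accounting side?" stops being a question you answer by hand-tracing.&lt;/p&gt;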

&lt;p&gt;&lt;strong&gt;If you are running an existing SaaS company:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask yourself: what is our moat if someone rebuilds our product scope in two weeks with AI?&lt;/li&gt;
&lt;li&gt;If the answer is "our code," that is not a moat anymore. Invest in data, domain expertise, and distribution.&lt;/li&gt;
&lt;li&gt;Evaluate your per-seat pricing model. The pressure is coming from open-source alternatives built at AI speed.&lt;/li&gt;
&lt;li&gt;Remember: your customers do not want software. They want their business operations to work. The delivery mechanism is irrelevant to them.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  9. Honest Tradeoffs
&lt;/h2&gt;

&lt;p&gt;AI-native development is not magic. Here is what it cannot do yet.&lt;/p&gt;

&lt;p&gt;SQLite is single-server, single-writer. Sufficient for 95% of SMBs. Not for Fortune 500 with 40 countries and 100,000 transactions per day.&lt;/p&gt;

&lt;p&gt;Bus factor of one. Open source mitigates this (MIT licence, anyone can fork) but it is not the same as having a team.&lt;/p&gt;

&lt;p&gt;No SOX or ISO certification. 1,839 automated tests is not a formal audit. If your auditor requires compliance documentation, you will need to produce it yourself.&lt;/p&gt;

&lt;p&gt;The web UI is auto-generated from metadata. It is functional, not beautiful. There is no design team behind it.&lt;/p&gt;

&lt;p&gt;Cross-module complexity still requires human judgment. Intercompany invoicing, multi-currency revaluation, payroll garnishment priority ordering: these need a domain expert, not a prompt.&lt;/p&gt;

&lt;p&gt;If you have 10,000 employees and operate in 40 countries, use SAP. Genuinely. This playbook is for the other 95%.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. The Endgame
&lt;/h2&gt;

&lt;p&gt;The $50K/year ERP licence becomes indefensible for most SMBs within two to three years. Not because ERPClaw is better than SAP at Fortune 500 scale (it is not), but because the cost of building equivalent scope for SMB needs approaches zero.&lt;/p&gt;

&lt;p&gt;Every vertical will get the same treatment. Healthcare scheduling. Property management. Logistics. Legal practice management. Education administration. The formula is the same: domain expert plus AI plus metadata-driven architecture equals full-scope software at near-zero cost.&lt;/p&gt;

&lt;p&gt;The surviving SaaS companies will sell domain expertise, regulatory compliance, and data network effects. Not code. The code becomes a commodity. The knowledge of what the code should do remains scarce.&lt;/p&gt;

&lt;p&gt;Open-source AI-native software becomes the Linux of business applications. Not glamorous. Not venture-scale. But everywhere.&lt;/p&gt;

&lt;p&gt;The CTO's job shifts from "managing engineering teams that write code" to "defining specifications that AI implements and humans validate." The best CTOs in 2030 will not be the ones who managed the largest teams. They will be the ones who wrote the best specs.&lt;/p&gt;

&lt;p&gt;This is not a prediction. ERPClaw exists. 29 modules, 612 actions, 1,839 tests, four countries, running on a $20 server. The future is not coming. It shipped.&lt;/p&gt;




&lt;p&gt;ERPClaw is free, open-source, and MIT licensed. But the point of this article is not ERPClaw; it is the playbook. These principles apply whether you are building ERP, CRM, healthcare scheduling, or logistics management.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://erpclaw.ai" rel="noopener noreferrer"&gt;erpclaw.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://webclaw.org" rel="noopener noreferrer"&gt;webclaw.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/avansaber/erpclaw" rel="noopener noreferrer"&gt;github.com/avansaber/erpclaw&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are a CTO evaluating what AI changes about your business, I would rather hear your skepticism than your applause. Comments are open.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>We Built an Open Source Alternative to Laravel Forge - Here's Why</title>
      <dc:creator>Nikhil Jathar</dc:creator>
      <pubDate>Sun, 04 Jan 2026 01:19:41 +0000</pubDate>
      <link>https://forem.com/nikhilj/we-built-an-open-source-alternative-to-laravel-forge-heres-why-3gn2</link>
      <guid>https://forem.com/nikhilj/we-built-an-open-source-alternative-to-laravel-forge-heres-why-3gn2</guid>
      <description>&lt;p&gt;If you have ever deployed a Laravel application to a VPS, you know the pain. SSH into the server, install Nginx, configure PHP-FPM, set up MySQL, generate SSL certificates, create deploy scripts... and then repeat the whole thing for the next project.&lt;/p&gt;

&lt;p&gt;For the longest time, I relied on Laravel Forge. It's brilliant software, honestly. But at ~$12/month (and more for teams), the costs add up quickly when you're managing multiple servers for clients. RunCloud is a similar story.&lt;/p&gt;

&lt;p&gt;So we decided to build our own. And then open source it.&lt;/p&gt;

&lt;h2&gt;Introducing SiteKit&lt;/h2&gt;

&lt;p&gt;SiteKit is a self-hosted server management platform that does pretty much everything Forge does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server Provisioning - Point it to any Ubuntu VPS, run one command, and it installs the full stack (Nginx, PHP 8.x, MySQL/MariaDB, Redis, Node.js, Supervisor)&lt;/li&gt;
&lt;li&gt;Git Deployments - Connect GitHub/GitLab, push to deploy with zero downtime&lt;/li&gt;
&lt;li&gt;SSL Certificates - Free Let's Encrypt with auto-renewal&lt;/li&gt;
&lt;li&gt;Database Management - Create databases and users without touching the terminal&lt;/li&gt;
&lt;li&gt;Team Access - Multi-tenant setup with proper role management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference? You host it yourself. No monthly fees. No vendor lock-in.&lt;/p&gt;

&lt;h2&gt;The Tech Stack&lt;/h2&gt;

&lt;p&gt;We went all-in on Laravel (obviously):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Laravel 12&lt;/li&gt;
&lt;li&gt;Filament 3 (for the entire admin panel)&lt;/li&gt;
&lt;li&gt;Livewire 3&lt;/li&gt;
&lt;li&gt;Jetstream (auth + teams)&lt;/li&gt;
&lt;li&gt;Tailwind CSS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The server agent is a small Go binary that runs on managed servers and executes jobs dispatched from Laravel. It's around 5MB and quite efficient.&lt;/p&gt;
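&lt;p&gt;For intuition, here is the dispatch pattern such an agent implements, sketched in Python rather than Go (the real agent is the Go binary; the field names are invented, and the HTTP calls to the Laravel app are abstracted behind two callables):&lt;/p&gt;

```python
import subprocess

def run_job(job):
    """Execute one dispatched job and capture its outcome."""
    result = subprocess.run(
        job["command"], shell=True, capture_output=True, text=True
    )
    return {
        "job_id": job["id"],
        "exit_code": result.returncode,
        "output": result.stdout + result.stderr,
    }

def agent_loop(fetch_jobs, report_result, once=False):
    """Poll the control plane for pending jobs, run them, report back.
    fetch_jobs / report_result stand in for the agent's HTTP calls."""
    while True:
        for job in fetch_jobs():
            report_result(run_job(job))
        if once:  # single pass for testing; a real agent sleeps and repeats
            break
```

&lt;p&gt;Keeping the agent this dumb means all scheduling and state lives in the Laravel app, and the binary on each server stays small.&lt;/p&gt;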

&lt;h2&gt;What About Node.js?&lt;/h2&gt;

&lt;p&gt;This was actually a recent addition. We realised many developers (ourselves included) are deploying Next.js and Express apps alongside Laravel. So we added proper Node.js support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic Supervisor process management&lt;/li&gt;
&lt;li&gt;Health check monitoring&lt;/li&gt;
&lt;li&gt;Reverse proxy configuration&lt;/li&gt;
&lt;li&gt;Support for npm, yarn, and pnpm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can deploy a Next.js app just as easily as a Laravel one now.&lt;/p&gt;

&lt;h2&gt;Screenshots&lt;/h2&gt;

&lt;p&gt;Here's what the dashboard looks like: a server overview with real-time metrics. Screenshots are at &lt;a href="https://sitekit.dev/" rel="noopener noreferrer"&gt;sitekit.dev&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Is It Production Ready?&lt;/h2&gt;

&lt;p&gt;We have been using it internally for a few days now, managing around 15 servers. It's stable, but I won't claim it's as battle-tested as Forge, which has years of development behind it.&lt;/p&gt;

&lt;p&gt;The codebase is clean, though. If you're comfortable with Laravel and Filament, you can easily extend it or fix issues yourself. That's the beauty of open source.&lt;/p&gt;

&lt;h2&gt;Try It Out&lt;/h2&gt;

&lt;p&gt;The project is on GitHub: &lt;a href="https://github.com/avansaber/sitekit" rel="noopener noreferrer"&gt;https://github.com/avansaber/sitekit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Documentation is in the README. Basic setup is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone https://github.com/avansaber/sitekit.git
cd sitekit
composer install
cp .env.example .env
php artisan key:generate
php artisan migrate
php artisan make:filament-user
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you do try it, I would genuinely appreciate feedback. Open an issue, start a discussion, or drop a comment here. We're actively developing this and community input helps prioritise features.&lt;/p&gt;

&lt;h2&gt;What's Next?&lt;/h2&gt;

&lt;p&gt;Currently working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker support for easier self-hosting&lt;/li&gt;
&lt;li&gt;More cloud provider integrations&lt;/li&gt;
&lt;li&gt;Better backup management&lt;/li&gt;
&lt;li&gt;Redis and queue monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this solves a problem for you or your team, do give it a star on GitHub. And if you find bugs (you probably will), please report them!&lt;/p&gt;




&lt;p&gt;Built by the team at &lt;a href="https://www.avansaber.com" rel="noopener noreferrer"&gt;https://www.avansaber.com&lt;/a&gt;. We build developer tools and work on Laravel projects.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>webdev</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
