Forem: Muhammed Rasin O M

The Best Python Library for Generating Quick Synthetic Data in 2026

Muhammed Rasin O M — Sat, 11 Apr 2026 14:22:09 +0000

Misata: Generate Realistic Synthetic Datasets From Plain English Descriptions

Generating synthetic data in Python used to mean one of three things: write random.uniform() loops by hand, use Faker for fake names and emails, or spend a week configuring SDV on top of real data you might not even have. LLMs can now draft rows for you, but keeping the business logic and the referential integrity consistent across tables is still a nightmare.

Misata is none of those things.

One sentence in. Multiple related tables out. Distributions calibrated to real-world statistics. Foreign key integrity guaranteed. Monthly revenue targets hit to the cent.

pip install misata

import misata

tables = misata.generate(
    "A SaaS company with 2000 users. "
    "MRR rises from 80k in January to 320k in June, "
    "drops to 180k in August due to churn, "
    "then recovers to 400k in December.",
    seed=42,
)

That generates two linked tables with 21,000+ rows. Here is what the monthly MRR looks like when you sum the rows:

Jan    $80,000   ✓
Feb   $128,000   ✓
Mar   $176,000   ✓
Apr   $224,000   ✓
May   $272,000   ✓
Jun   $320,000   ✓
Jul   $250,000   ✓
Aug   $180,000   ✓   <- churn dip, as described
Sep   $235,000   ✓
Oct   $290,000   ✓
Nov   $345,000   ✓
Dec   $400,000   ✓

Every target exact. Not approximate. The individual rows still follow a log-normal distribution (median MRR $126, mean $150, p90 $291) because that is what real SaaS revenue looks like. But the monthly totals are pinned to whatever story you gave it.
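To make the idea concrete, here is a minimal sketch of one way to pin a monthly total while keeping log-normal rows. It is not taken from Misata's source: the function name, the mu and sigma values, and the trick of absorbing the rounding residual into the largest row are all assumptions for illustration.

```python
import random

def pinned_lognormal(target_cents, n_rows, mu=4.8, sigma=0.6, seed=42):
    """Draw n_rows log-normal amounts (in cents) that sum exactly to target_cents."""
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_rows)]
    # Rescale so the shape stays log-normal but the total lands near the target.
    scale = target_cents / sum(raw)
    cents = [max(1, round(x * scale)) for x in raw]
    # Rounding leaves a small residual; absorb it into the largest row.
    residual = target_cents - sum(cents)
    cents[cents.index(max(cents))] += residual
    return cents

june = pinned_lognormal(target_cents=32_000_000, n_rows=2000)  # $320,000.00
print(sum(june) / 100)  # 320000.0
```

Working in integer cents is what makes "to the cent" checkable: floating-point sums of thousands of rows would drift by tiny amounts.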

The core problem: why most synthetic data is useless

There's a gap between what synthetic data generators produce and what you actually need to build, test, or demo a data system.

Uniform distributions lie. Real revenue data is log-normal. Real fraud rates hover around 2%, not 50%. Real product category distributions follow Zipf's law - one category dominates, the others trail off. When your fake data looks nothing like the real thing, your model trains on lies, your dashboards tell wrong stories, and your tests pass cases that would fail in production.

Referential integrity breaks things. If you're testing a JOIN across customers and transactions, orphan foreign keys will silently ruin your results. Most data generators either skip relational structure entirely or produce it inconsistently.

Business targets get ignored. You don't just want data that looks roughly right. You want a dataset where Q3 revenue dips 22% due to a simulated product recall, or where churn spikes in August because your description says so. No general-purpose generator can do this.

Misata was built specifically to close this gap.

Why distributions matter more than people think

Most fake data generators produce values that are uniformly distributed. When you plot them, everything looks flat. Real business data is never flat.

Misata ships calibrated distribution priors for seven domains. Here is what that means in practice.

Fintech: fraud rates, credit scores, and account balances

tables = misata.generate(
    "A fintech company with 2000 customers and banking transactions.",
    seed=42,
)

transactions = tables["transactions"]
print(f"Fraud rate: {transactions['is_fraud'].mean() * 100:.2f}%")

Fraud rate: 2.00%

400 fraudulent transactions out of 20,000. The calibrated real-world baseline for card fraud is around 2%. That is what you get. Not a random number. A calibrated one.
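A rate that comes out at exactly 2.00% suggests the flag count is fixed rather than sampled row by row. A minimal sketch of that idea, assuming a hypothetical helper (this is not Misata's internal code):

```python
import random

def exact_rate_flags(n_rows, rate, seed=42):
    """Return n_rows booleans with exactly round(n_rows * rate) set to True."""
    rng = random.Random(seed)
    n_true = round(n_rows * rate)
    flags = [True] * n_true + [False] * (n_rows - n_true)
    rng.shuffle(flags)  # positions are random, the count is not
    return flags

is_fraud = exact_rate_flags(20_000, 0.02)
print(sum(is_fraud))  # 400
```

Flipping an independent 2% coin per row would instead give a count that wobbles around 400 from run to run.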

Credit scores follow the actual US distribution:

mean:   679   (real US average: 680-720)
std:     80   (real range: 70-90)
min:    328
max:    850

Account balances follow log-normal because real bank balances do:

median     $1,976
mean       $6,128
p90       $14,260
p99       $62,565

Most customers have under two thousand dollars. A few have tens of thousands. The tail is real. This matters enormously if you're building fraud detection models, credit scoring pipelines, or stress-testing payment infrastructure — a flat distribution would make every one of those models overfit to a distribution that doesn't exist in production.
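As a sanity check on the numbers above, a log-normal is fully determined by its median and mean: median = exp(mu) and mean = exp(mu + sigma²/2). The sketch below backs out mu and sigma from the quoted median and mean, then computes the implied p90 and p99. These are fitted values for illustration, not parameters read from Misata.

```python
import math
from statistics import NormalDist

median, mean = 1976.0, 6128.0  # figures quoted above

mu = math.log(median)                           # median = exp(mu)
sigma = math.sqrt(2 * math.log(mean / median))  # mean = exp(mu + sigma**2 / 2)

def lognormal_quantile(p):
    """Quantile of the fitted log-normal: exp(mu + sigma * z_p)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

print(round(sigma, 2))
print(round(lognormal_quantile(0.90)), round(lognormal_quantile(0.99)))
```

The implied tail quantiles land close to the quoted p90 and p99, which is what you would expect if the balances really are drawn from a single log-normal.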

Healthcare: blood type frequencies, age distributions, and appointment patterns

tables = misata.generate("A hospital with 500 patients and doctors.", seed=42)
patients = tables["patients"]

Blood type    Generated    Real-world
O+               37.9%        38.0%   ✓
A+               33.9%        34.0%   ✓
B+                9.6%         9.0%   ✓
AB+               3.0%         3.0%   ✓
O-                6.5%         7.0%   ✓
A-                6.1%         6.0%   ✓
B-                2.0%         2.0%   ✓
AB-               0.9%         1.0%   ✓

All eight blood types land within 0.6 percentage points of the actual ABO/Rh frequency distribution. Patient ages center on 45 with a standard deviation of 18, matching a chronic-care hospital population. Nobody configured any of this. It is what the healthcare domain prior knows.

This level of epidemiological accuracy is essential when you're training triage models, testing EHR systems, or building health analytics pipelines that will eventually run on real patient data.

Ecommerce: Zipf categories, seasonal peaks, and return rates

schema = misata.parse(
    "An ecommerce store with 5000 customers and orders. "
    "Revenue grows from 100k in January to 300k in November "
    "then 350k in December.",
    rows=5000,
)
tables = misata.generate_from_schema(schema)

Product categories follow Zipf's law because that is how real shopping behavior works:

electronics      47.1%
clothing         20.0%
home & garden    12.3%
sports            8.7%
books             6.5%
beauty            5.5%

One category dominates. The rest trail off. Uniform would give you ~17% each. Real shopping does not look like that.
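For reference, a textbook Zipf distribution gives rank k a weight proportional to 1/k^s. A small sketch, assuming an exponent of 1 (the exponent Misata uses is not stated here, and the shares above are steeper at the head than s = 1 would give):

```python
def zipf_shares(n_categories, exponent=1.0):
    """Normalized Zipf weights: share of rank k is proportional to 1 / k**exponent."""
    weights = [1 / rank ** exponent for rank in range(1, n_categories + 1)]
    total = sum(weights)
    return [w / total for w in weights]

categories = ["electronics", "clothing", "home & garden", "sports", "books", "beauty"]
for name, share in zip(categories, zipf_shares(len(categories))):
    print(f"{name:<15}{share:6.1%}")
```

Raising the exponent above 1 concentrates more of the mass in the top category.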

Order statuses come with realistic rates:

completed    71.5%
shipped      12.4%
pending       8.2%
returned      5.0%
cancelled     3.0%

Real e-commerce return rates are often quoted at 8–10%. This sample comes in lower, at 5.0% returned, but that is still enough volume that a returns processing pipeline built on it will actually stress the right code paths.

Referential integrity across all tables

Every child table samples foreign key values from the actual parent pool. This means zero orphan rows by construction, not by luck.

tables = misata.generate(
    "A fintech company with 2000 customers and banking transactions.",
    seed=42,
)

customers    = tables["customers"]     # 2,000 rows
accounts     = tables["accounts"]      # 2,600 rows
transactions = tables["transactions"]  # 20,000 rows

# Both FK edges hold
orphan_accounts = (~accounts["customer_id"].isin(customers["customer_id"])).sum()
orphan_txns     = (~transactions["account_id"].isin(accounts["account_id"])).sum()

print(orphan_accounts)  # 0
print(orphan_txns)      # 0

Tables are generated in topological dependency order. Parents first. Children sample from the completed parent pool. It cannot produce orphans.
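The "parents first, children sample from the pool" idea fits in a few lines of plain Python. This is a sketch of the principle only, with made-up row shapes, not Misata's generator:

```python
import random

rng = random.Random(42)

# Parents first: the customer pool is complete before any child row exists.
customer_ids = list(range(1, 2001))

# Each child row draws its foreign key from the finished parent pool.
accounts = [
    {"account_id": i, "customer_id": rng.choice(customer_ids)}
    for i in range(1, 2601)
]
account_ids = [a["account_id"] for a in accounts]
transactions = [
    {"txn_id": i, "account_id": rng.choice(account_ids)}
    for i in range(1, 20001)
]

# Orphan checks, same spirit as the pandas version above.
customer_pool, account_pool = set(customer_ids), set(account_ids)
orphan_accounts = sum(a["customer_id"] not in customer_pool for a in accounts)
orphan_txns = sum(t["account_id"] not in account_pool for t in transactions)
print(orphan_accounts, orphan_txns)  # 0 0
```

Because a foreign key can only ever be picked from keys that already exist, the orphan checks are a formality.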

This matters for any workflow that involves JOINs. Referential integrity errors in test data produce false negatives — your pipeline looks like it works until it meets real data.

The two-step flow for more control

When you want to inspect or modify the schema before committing to generation:

schema = misata.parse("A hospital with 500 patients and doctors.")
print(schema.summary())

Schema: Healthcare Dataset
Domain: healthcare
Tables (3)
  doctors         25 rows    [doctor_id, first_name, last_name, specialty, years_experience]
  patients       500 rows    [patient_id, name, age, gender, blood_type, registered_at]
  appointments  1500 rows    [appointment_id, patient_id, doctor_id, type, duration_minutes]

Relationships (2)
  patients.patient_id  -> appointments.patient_id
  doctors.doctor_id    -> appointments.doctor_id

Adjust the seed, add columns, change row counts. Then generate. The two-step flow is useful for teams where a data engineer defines the schema and a developer generates data against it — the schema becomes a shared artifact you can version control.

Real-world use cases

Use case 1: Training ML models without access to production data

Privacy regulations — GDPR, HIPAA, CCPA — make it difficult or impossible to use real user data for model training in many industries. The usual workaround is anonymization, but anonymized data often loses the statistical properties that make it useful for training.

Misata generates statistically calibrated data with no PII at all. A fraud detection team can produce 500,000 transactions with a realistic 2% fraud rate, a plausible credit score distribution, and calibrated account balance tails — without touching a single real customer record.

tables = misata.generate(
    "A fintech company with 50000 customers and banking transactions. "
    "Fraud rate is 2%. High-value accounts above 50k balance are 3x more likely to be targeted.",
    seed=42,
)

The model trains on data that behaves like production data. The privacy risk is zero.

Use case 2: Seeding development and staging databases

Every new developer joining a product team hits the same wall: the development database is empty or has three test rows from 2019. You can't build features that depend on realistic data patterns without realistic data.

Misata can seed a full development database in seconds:

import misata
from misata import seed_database

tables = misata.generate("A SaaS company with 1000 users.", seed=42)
report = seed_database(tables, "postgresql://user:pass@localhost/mydb", create=True)
print(report.total_rows)  # 12,400

Or from the CLI, which makes it easy to add to a Makefile or docker-compose setup:

misata generate \
  --story "A SaaS company with 1000 users" \
  --db-url postgresql://user:pass@localhost/mydb \
  --db-create --db-truncate

SQLite works too for local-only development:

misata generate \
  --story "A SaaS company with 1000 users" \
  --db-url sqlite:///./dev.db \
  --db-create --db-truncate

A new developer can run make seed-db and have a working dataset in their environment in under 10 seconds.
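Once a database is seeded, a quick orphan check in SQL is a cheap smoke test. The sketch below uses Python's built-in sqlite3 against an in-memory database with hypothetical users and subscriptions tables; the table and column names Misata actually creates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # point this at ./dev.db to inspect a seeded file
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE subscriptions (
        subscription_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id)
    );
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO subscriptions VALUES (10, 1), (11, 1), (12, 3);
""")

# Child rows whose parent is missing show up with a NULL on the parent side.
orphan_count = conn.execute("""
    SELECT COUNT(*)
    FROM subscriptions s
    LEFT JOIN users u ON u.user_id = s.user_id
    WHERE u.user_id IS NULL
""").fetchone()[0]
print(orphan_count)  # 0
```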

Use case 3: Building product demos without real customer data

Sales engineering teams routinely need to demo analytics dashboards, CRM systems, and data products to prospects. Using real customer data for demos is a legal and ethical non-starter. Using hand-crafted fake data means someone spends two days building a CSV in Excel.

Misata lets you generate a compelling, internally consistent demo dataset for any domain:

tables = misata.generate(
    "A B2B SaaS company with 800 enterprise customers. "
    "ARR grows from 2M in Q1 to 5M in Q4. "
    "Average contract value is 6000. Churn rate is 8%.",
    seed=42,
)

The result is a dataset where every KPI in the demo dashboard reflects a plausible business trajectory — not a random scatter of numbers.

Use case 4: Testing data pipelines and ETL systems

Data pipeline tests are only as good as the data they run on. Edge cases like NULL foreign keys, skewed distributions, and outlier values are exactly what break pipelines in production — and exactly what hand-crafted test data tends to miss.

Misata's calibrated distributions naturally produce the tail values that stress-test pipelines:

tables = misata.generate(
    "A logistics company with 10000 shipments. "
    "Include delayed deliveries at a 12% rate. "
    "International shipments are 30% of total volume.",
    seed=42,
)

The p99 values in account balances, the occasional NULL in optional fields, the rare blood type AB- at 1% frequency — these are the values that reveal pipeline brittleness.

Use case 5: Generating benchmark datasets for academic and research use

Researchers publishing papers on data systems, query optimizers, or ML benchmarks need datasets that are reproducible, realistic, and free of privacy concerns. Misata's seed parameter makes generation fully deterministic:

tables = misata.generate(
    "A marketplace with 5000 buyers and sellers, orders, and product listings.",
    seed=42,  # Anyone running this gets the exact same dataset
)

Share the seed and description in your paper. Readers can reproduce your exact dataset with a single Python call.

Use case 6: Prototyping data products and BI dashboards

Before you connect a BI tool to production data, you need something to build against. Misata gives you a structurally correct, statistically plausible dataset to prototype on — so you can validate your data model, build your first dashboard, and demo your schema to stakeholders before a single production row exists.

LLM-powered generation for custom domains

The rule-based parser covers SaaS, ecommerce, fintech, healthcare, marketplace, logistics, and pharma. For anything outside those domains, the LLM backend handles arbitrary schema generation:

from misata import LLMSchemaGenerator

gen    = LLMSchemaGenerator(provider="groq")   # or openai, ollama
schema = gen.generate_from_story(
    "A B2B marketplace with vendor tiers, SLA contracts, and quarterly invoices"
)
tables = misata.generate_from_schema(schema)

This works with any LLM provider that supports the OpenAI-compatible API format. Requires GROQ_API_KEY or OPENAI_API_KEY. Retries automatically on rate limits.

The LLM path is useful for:

Industry-specific schemas with unusual entities (clinical trials, commodity trading, fleet management)
Multi-tenant SaaS with complex permission hierarchies
Any domain where the rule-based parser doesn't have calibrated priors

The LLM infers a reasonable schema, column types, and row count ratios from your description. You get back the same DataFrames as the rule-based path — just with the schema derived from a language model instead of hard-coded priors.

How it compares to Faker and SDV

Faker generates individual fake values. One row at a time. It has no concept of tables that reference each other and no domain-specific distributions. Wiring foreign keys and getting log-normal amounts is your job.

SDV (Synthetic Data Vault) learns patterns from real data and generates synthetic copies. It requires actual training data, pulls in heavy ML dependencies, and cannot pin specific business targets like "fraud rate must be 2%." If you don't have real data to train on, SDV is a dead end.

Misata generates from a description. No real data required. No ML training. Distributions are calibrated to domain knowledge. Business targets are exact.

Feature	Faker	SDV	Misata
Multi-table FK integrity	No	Partial	Yes
No real data needed	Yes	No	Yes
Calibrated domain distributions	No	Learned	Yes
Exact monthly aggregate targets	No	No	Yes
Plain-English story input	No	No	Yes
Database seeding	Manual	No	Yes
LLM-powered custom domains	No	No	Yes
Reproducible with seed	Yes (Faker.seed)	No	Yes

The key distinction: SDV is a synthetic data replication tool. Misata is a synthetic data generation tool. They solve different problems. SDV needs real data to learn from. Misata generates from scratch.

Installation and quick start

pip install misata pandas numpy

All of these produce full verified output in under 3 seconds:

python examples/saas_revenue_curve.py
python examples/fintech_fraud_detection.py
python examples/healthcare_multi_table.py
python examples/ecommerce_seasonal.py

Or open the Colab notebook and run it without installing anything. No signup, no API key, no configuration.

Design principles

A few constraints Misata holds to that are worth understanding:

Determinism over randomness. Given the same description and seed, you always get the same dataset. This is non-negotiable for reproducible research and CI pipelines where test data needs to be stable across runs.
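One detail worth knowing if you build something similar: Python's built-in hash() on strings is salted per process, so it cannot be used to turn a description into a stable seed. A minimal sketch of a stable alternative, assuming a hypothetical rng_for helper (not Misata's API):

```python
import hashlib
import random

def rng_for(description: str, seed: int) -> random.Random:
    """Derive a process-independent RNG from the description and seed."""
    digest = hashlib.sha256(f"{seed}:{description}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

a = rng_for("A SaaS company with 1000 users.", 42)
b = rng_for("A SaaS company with 1000 users.", 42)
sample_a = [a.random() for _ in range(3)]
sample_b = [b.random() for _ in range(3)]
print(sample_a == sample_b)  # True
```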

Statistical realism over convenience. It would be simpler to generate uniformly distributed values. Misata does not do this because uniform distributions produce data that behaves nothing like real data. The extra calibration work is the point.

Aggregate targets are constraints, not approximations. When you say MRR should be $320k in June, the generated data will sum to exactly $320k in June. Not $318k. Not $322k. The individual rows remain statistically realistic while the aggregates are treated as hard constraints.

Referential integrity is structural, not checked. Misata does not generate data and then validate foreign keys. It generates in dependency order so invalid keys cannot occur. This is a stronger guarantee than post-hoc validation.

Frequently asked questions

Can I add custom columns to a generated schema?

Yes. The two-step parse → generate_from_schema flow lets you inspect and modify the schema object before generating. You can add columns, change data types, adjust row counts, and modify relationship cardinality.

How large can generated datasets be?

Misata is DataFrame-based, so the practical limit is your available RAM. For datasets larger than a few million rows, you can generate in chunks and write directly to a database using seed_database. Benchmarks on a standard laptop show ~500k rows/second for most schemas.

Does it support databases other than PostgreSQL and SQLite?

seed_database accepts any SQLAlchemy connection string, which covers PostgreSQL, MySQL, SQLite, MS SQL Server, Oracle, and others. If SQLAlchemy can connect to it, Misata can seed it.

Is there a way to generate time-series data?

Temporal columns are supported. The registered_at, transaction_date, and similar timestamp fields follow realistic distributions relative to one another — a customer's first transaction always comes after their account creation date, for example. You can specify date ranges in your description: "transactions between January 2023 and December 2024."

What if I need data that follows my company's specific distribution?

The LLM path lets you describe distribution constraints in natural language: "30% of accounts are enterprise tier with balances above $50k." For highly specific requirements, the schema object exposes column-level distribution parameters you can override directly.

Misata is open source, MIT licensed, and available now.

GitHub: github.com/rasinmuhammed/misata
PyPI: pypi.org/project/misata
Docs: QUICKSTART.md
Colab: Run the quickstart notebook

How I Built a "Story-to-Data" Engine in Python (Because Faker Wasn't Enough)

Muhammed Rasin O M — Tue, 16 Dec 2025 16:01:49 +0000

The "2 Months of Pain" Origin Story

A year ago, I was working as a Data Science Engineer at a consultancy firm. We needed to build a Tableau dashboard to demonstrate a new business model. The consultants didn't want "random" data; they wanted a specific story:

"Show a _____ failing in Phase 2, causing a 40% revenue dip in Q3, followed by a recovery in Q4 due to a new ____ launch."

I first tried standard libraries like Faker and Mimesis. They are fantastic for generating random names and emails, but they failed hard on Business Logic. Then I fell back to plain Python scripts, with for loops nested inside more loops.

I ended up with:

Time Travel Bugs: Timesheets dated before an employee's hire date.
Orphaned Rows: Orders linked to non-existent Users.
Flat Curves: Revenue that looked like static noise, not a "Q3 Dip."

I spent 2 months manually hacking Python scripts, hard-coding probabilities, and stitching CSVs together to make the demo look real. It was a nightmare.

I realized: We don't need more random data generators. We need Narrative Data Engines.

So, I built Misata.

What is Misata?
Misata is an open-source Python engine that turns a natural language story into a multi-table, referentially intact dataset.

Instead of writing 500 lines of schema config, you just type:

misata generate --story "A SaaS platform with 50k users, 20% churn in Q3, and usage-based billing" --use-llm

And it generates SQL-ready CSVs where the math actually works.

Under the Hood: The Architecture

Misata isn't just a wrapper around Faker. It uses a Neuro-Symbolic approach to solve the consistency problem.

The Brain (LLM Parser)

First, it uses an LLM (I optimized it for Llama 3.3 via Groq) to parse your story into a strict JSON schema. It extracts:

Entities: Users, Subscriptions, Invoices.

Distributions: "20% churn" becomes a probability weight.

Relationships: "Invoices belong to Subscriptions."

The Logic (Topological Sort)

To prevent "Orphaned Rows," Misata builds a Directed Acyclic Graph (DAG) of your tables. It uses Topological Sorting to ensure parent tables (e.g., Users) are generated before child tables (e.g., Orders).

# Simplified logic from misata/simulator.py
from collections import defaultdict

def topological_sort(self):
    # Adjacency list: parent table -> tables that reference it
    graph = defaultdict(list)
    # Number of parent tables each table is still waiting on
    in_degree = {table.name: 0 for table in self.config.tables}

    for rel in self.config.relationships:
        graph[rel.parent_table].append(rel.child_table)
        in_degree[rel.child_table] += 1

    # Standard Kahn's Algorithm...
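For readers who want the elided part spelled out, here is a self-contained version of Kahn's algorithm over plain table names. It is a generic sketch: the function name and the (parent, child) tuple format are my assumptions, not the code in misata/simulator.py.

```python
from collections import defaultdict, deque

def topological_order(tables, relationships):
    """tables: list of names; relationships: list of (parent, child) pairs."""
    graph = defaultdict(list)
    in_degree = {name: 0 for name in tables}
    for parent, child in relationships:
        graph[parent].append(child)
        in_degree[child] += 1

    # Start with tables that depend on nothing.
    queue = deque(name for name in tables if in_degree[name] == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in graph[current]:
            in_degree[child] -= 1
            if in_degree[child] == 0:  # all parents generated, child is ready
                queue.append(child)

    if len(order) != len(tables):
        raise ValueError("cycle detected in table relationships")
    return order

print(topological_order(
    ["orders", "users", "order_items"],
    [("users", "orders"), ("orders", "order_items")],
))  # ['users', 'orders', 'order_items']
```

The length check at the end matters: a circular foreign key leaves some tables with a non-zero in-degree forever, and it is better to fail loudly than to silently skip them.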

The Muscle (Vectorized NumPy)

The biggest bottleneck with Python data generation is looping. Generating 10 million rows in a loop is too slow.

Misata uses Vectorized Operations (via NumPy and Pandas) to generate data in blocks. This allows it to hit speeds of ~250,000 rows/second on a standard laptop.
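To show what "in blocks" means, here is a tiny NumPy sketch that fills two columns of a million rows with one call each. The column names and distribution parameters are illustrative assumptions, not Misata's.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 1_000_000

# One call per column instead of one Python-level iteration per row.
amounts = rng.lognormal(mean=4.8, sigma=0.6, size=n_rows)
user_ids = rng.integers(1, 50_001, size=n_rows)  # upper bound is exclusive

print(amounts.shape, user_ids.min(), user_ids.max())
```

The per-row work happens in compiled code inside NumPy, which is where the speedup over a Python for loop comes from.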

Features for Data Engineers

I built this to solve the specific pains I faced in consulting:

Relational Integrity: It automatically maps Primary Keys to Foreign Keys. No more broken joins in SQL/Tableau.

No "Time Travel": Child tables (like Timesheets) automatically look up their parent's Start Date to ensure events happen chronologically.

Business Constraints: You can define rules like "Employees cannot log > 8 hours/day."
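Here is a rough sketch of how the last two features can be enforced at generation time rather than cleaned up afterwards. The row shapes, the 90-day window, and the hours distribution are assumptions for illustration, not Misata's implementation.

```python
import random
from datetime import date, timedelta

rng = random.Random(42)
MAX_HOURS_PER_DAY = 8.0

employees = [
    {"employee_id": 1, "hire_date": date(2024, 3, 1)},
    {"employee_id": 2, "hire_date": date(2024, 9, 15)},
]

timesheets = []
for emp in employees:
    # Distinct day offsets: one entry per employee per day, so the daily cap is per row.
    for offset in rng.sample(range(0, 91), 50):
        # Offset forward from the parent's hire date, so no entry can predate it.
        work_date = emp["hire_date"] + timedelta(days=offset)
        # Clamp the drawn value so the business rule holds for every row.
        hours = min(MAX_HOURS_PER_DAY, max(0.5, round(rng.gauss(7.0, 1.5), 1)))
        timesheets.append(
            {"employee_id": emp["employee_id"], "work_date": work_date, "hours": hours}
        )

print(len(timesheets), max(t["hours"] for t in timesheets))
```

Note that clamping piles some probability mass at exactly 8 hours. If that spike matters for your use case, resampling out-of-range draws is an alternative.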

Try it out
It's open source and available on PyPI.

pip install misata

Generate a dataset:

# Needs GROQ_API_KEY (free tier works great)
misata generate --story "E-commerce store with seasonal spikes in December" --use-llm

Why I Open Sourced It
I know there are enterprise tools out there that cost $10k+/year. But for individual consultants, students, and indie hackers, there was no good "middle ground" between Faker and Enterprise Privacy tools.

I want Misata to be that middle ground.

I'm currently working on adding Curve Fitting (so you can draw a chart and get data that matches it). If you're into Data Engineering or Python optimization, I'd love your feedback on the architecture!

Repo: github.com/rasinmuhammed/misata

P.S. If you are a consultant stuck in "Demo Data Hell" right now and need a specific scenario generated, drop a comment or DM me. I'm looking for complex edge cases to stress-test the engine.

I Built a TUI to Visualize RAG Chunking because chunk_size=1000 is a Lie 📉

Muhammed Rasin O M — Wed, 10 Dec 2025 15:38:26 +0000

Let’s be honest for a second. When you are building a RAG (Retrieval-Augmented Generation) pipeline, how do you pick your chunk_size and overlap?

If you are like 90% of us, you copy-paste 1000 and 200 from a tutorial, run it, and hope the LLM doesn't hallucinate.

I realized I was doing "vibes-based engineering". I had no idea if my chunks were cutting sentences in half, if my overlap was actually preserving context, or if my retrieval was failing because of the embedding model or the chunking strategy.

So, I spent my nights and weekends building a tool to fix it.

Meet RAG-TUI.

It is an open-source, terminal-based visual debugger for RAG pipelines. It helps you see what your splitters are doing before you index millions of documents.

The Problem: "The Black Box"

We treat text splitters like black boxes. You feed in a PDF, and out comes a list of strings. But what do those strings look like?

Did you just cut a critical definition in half?
Is your 10% overlap actually capturing the previous sentence?
Are you feeding your embedding model garbage?
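To see why those questions matter, here is about the simplest splitter there is: fixed-size character windows with overlap. It is a sketch for illustration, not the splitter RAG-TUI or any particular library uses.

```python
def chunk_text(text, chunk_size, overlap):
    """Fixed-size character windows; each window repeats the last `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final window is not just a repeat of the overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

text = "Refunds are issued within 14 days. Contact support to start a claim."
for chunk in chunk_text(text, chunk_size=30, overlap=5):
    print(repr(chunk))
```

Run it and you will see windows that start and stop mid-word. Nothing in the splitter knows what a sentence is, which is exactly the thing you cannot see from a list of strings in a notebook.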

I got tired of printing chunks to the console to debug this. I wanted a UI, but I didn't want to leave my terminal.

The Solution: RAG-TUI

RAG-TUI is a lightweight CLI tool built in Python. You point it at a file, and it gives you an interactive dashboard to tune your indexing strategy in real-time.

Key Features (Why you might want this)

1. Real-time Visualization 🎨
Drag a slider to change the chunk_size. Watch the text re-chunk instantly.
The UI uses color-coded cards to show you exactly where one chunk ends and the next begins.

2. Quality Indicators 🚦
I added visual "linters" for your chunks:

🟢 Green: Clean break (ends with ., !, ?).
🟡 Yellow: Mid-phrase break (ends with ,, :).
🔴 Red: Hard cut (ends on a letter or digit, with no punctuation).
⚠️ Warning: Chunk is too small (<50 tokens) or too large.
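The rules above are simple enough to sketch in a few lines. This is my own approximation for illustration: the function name, the whitespace-based token count, and the max_tokens default are assumptions, not RAG-TUI's actual code.

```python
def classify_chunk(chunk, min_tokens=50, max_tokens=512):
    """Return (break_quality, size_warning) for one chunk."""
    stripped = chunk.rstrip()
    if stripped.endswith((".", "!", "?")):
        quality = "green"    # clean sentence break
    elif stripped.endswith((",", ":")):
        quality = "yellow"   # mid-phrase break
    else:
        quality = "red"      # hard cut
    # Whitespace-separated words as a rough stand-in for real tokens.
    n_tokens = len(stripped.split())
    size_warning = not (min_tokens <= n_tokens <= max_tokens)
    return quality, size_warning

print(classify_chunk("Refunds are issued within 14 days."))  # ('green', True)
print(classify_chunk("Refunds are issued within 14 da"))     # ('red', True)
```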

3. "Scientific" Batch Testing
Stop guessing. Enter 20 test queries ("What is the refund policy?", "How do I reset my password?"). RAG-TUI runs them against your current settings using local vector search and calculates a Hit Rate.

Hit Rate > 80%? Ship it.
Hit Rate < 60%? Your chunks are wrong. Change the strategy.
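Hit Rate here is just the fraction of test queries whose expected chunk shows up in the top-k results. A minimal sketch, with a keyword-overlap retriever standing in for the real vector search (the chunk ids, queries, and retriever are all made up for the example):

```python
def hit_rate(queries, retrieve, k=3):
    """queries: (question, expected_chunk_id) pairs; retrieve(question, k) -> chunk ids."""
    hits = sum(1 for question, expected in queries if expected in retrieve(question, k))
    return hits / len(queries)

chunks = {
    "c1": "the refund policy allows returns within 14 days",
    "c2": "reset your password from the account settings page",
    "c3": "shipping takes three to five business days",
}

def keyword_retrieve(question, k):
    # Toy retriever: rank chunks by how many words they share with the question.
    words = set(question.lower().strip("?").split())
    ranked = sorted(chunks, key=lambda cid: len(words & set(chunks[cid].split())), reverse=True)
    return ranked[:k]

queries = [
    ("What is the refund policy?", "c1"),
    ("How do I reset my password?", "c2"),
    ("How long does delivery take?", "c3"),
]
print(round(hit_rate(queries, keyword_retrieve, k=1), 2))  # 0.67
```

The third query misses because "delivery" never appears in the shipping chunk. Embeddings are supposed to close that kind of vocabulary gap, and a batch test tells you whether they actually do with your chunking settings.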

4. Privacy First (Ollama Support) 🔒
You don't need to send your private docs to OpenAI just to debug a splitter. RAG-TUI has native support for Ollama. You can run the entire debugging loop offline on your laptop.

💻 Under the Hood

For the Python nerds (like me), here is the stack that makes this possible:

UI: Textual (The best TUI framework, period).
Chunking: Chonkie (Blazing fast token splitting).
Vector DB: Usearch (Lightweight, in-memory vector search).
LLM: Async wrapper for Ollama/OpenAI.

🚀 How to Try It

I tried to make the onboarding as painless as possible.

pip install rag-tui

Then, just run:

rag-tui

(Make sure you have ollama serve running if you want to test embeddings!)

🤝 I Need Your Feedback!

This is currently in v0.0.2 Beta. It works, but I know there are edge cases I haven't found yet.

I am building this in public because I believe RAG tooling needs to get better. If you are learning RAG or building a production pipeline, please give this a spin.

Does it support your weird PDF format?
Do you need a specific splitter I haven't added?
Is the TUI crashing on Windows? (It shouldn't, but you know... Windows).

Star the repo if you think this is useful. It motivates me to keep shipping updates! ⭐️

🔗 GitHub Repo: https://github.com/rasinmuhammed/rag-tui

Happy Chunking! ✂️

I built a FastAPI admin panel that doesn't suck (and here's why it's different)

Muhammed Rasin O M — Sat, 06 Dec 2025 13:56:15 +0000

TL;DR: FastAPI Matrix Admin combines one-line auto-discovery, async-first architecture, and production-grade security in a package that requires zero Node.js.

🟢 Live Demo (Read-only)
💻 GitHub Repo

The admin panel problem nobody talks about

Every FastAPI project follows the same arc:

You build a great API.
Product wants to "just update a few records manually."
You reluctantly install Django admin (now you have two frameworks).
Or you build a custom React dashboard (6 weeks later...).
Or you use a generic admin and spend days configuring it.

I've done all three. They all sucked for different reasons.

What's wrong with existing FastAPI admin solutions?

I evaluated every major option before building this. Here's what I found:

Library	Issue
FastAPI-Admin	Requires Tortoise ORM (can't use SQLAlchemy).
SQLAdmin	Good, but sync-only. No async support in 2024?
Starlette-Admin	Heavy Starlette dependency, limited FastAPI integration.
Admin-One	Requires Vue.js build step, defeats FastAPI's simplicity.

The pattern: They either force you into specific ORMs, ignore async, or add frontend build complexity.

What I Built Instead

FastAPI Matrix Admin focuses on three non-negotiables:

1. One-Line Auto-Discovery (Because Your Time Matters)

from fastapi_matrix_admin import MatrixAdmin

admin = MatrixAdmin(app, engine=engine, secret_key="...")
admin.auto_discover(Base)  # Done. All models registered.

Under the hood:

Introspects SQLAlchemy models using inspect().
Analyzes column types to generate appropriate form fields.
Detects text columns for search.
Finds timestamp columns for default ordering.
Creates a sensible list display based on column types.

Customization when you need it:

admin.register(
    User,
    list_display=["id", "email", "created_at"],
    searchable_fields=["email", "name"],
    ordering=["-created_at"],
    exclude=["password_hash"]  # Obviously
)

2. Zero Node.js (Seriously)

No npm. No webpack. No package.json.

Stack: Tailwind CSS via CDN, HTMX for dynamic updates, Alpine.js (3KB), and Jinja2 templates.

Why this matters:
- pip install fastapi-matrix-admin → you're done.
- No build step in CI/CD.
- No frontend/backend version mismatches.
- Deploys anywhere Python runs.

3. Production-Grade Security (Not an Afterthought)

Most admin libraries treat security as optional. Here is what is built-in:

Content Security Policy (CSP): Prevents XSS by strictly controlling script sources.
CSRF Protection: Every form gets a signed token automatically.
URL Signing: All admin URLs are cryptographically signed to prevent ID enumeration/tampering.
Type Safety with Pydantic v2: Input validation happens automatically.
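If you have not seen URL signing before, the general technique is an HMAC over the path, checked with a constant-time comparison. The sketch below uses only the standard library. The function names and the ?sig= format are assumptions for illustration, not the library's actual scheme.

```python
import hashlib
import hmac

SECRET_KEY = b"your-secret-key-min-32-chars-long"  # placeholder, load from config in practice

def sign_path(path: str) -> str:
    """Append an HMAC-SHA256 signature so the path cannot be altered unnoticed."""
    sig = hmac.new(SECRET_KEY, path.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{path}?sig={sig}"

def verify_signed_path(signed: str) -> bool:
    path, _, sig = signed.rpartition("?sig=")
    expected = hmac.new(SECRET_KEY, path.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sig, expected)

signed = sign_path("/admin/users/42/edit")
print(verify_signed_path(signed))                          # True
print(verify_signed_path(signed.replace("/42/", "/43/")))  # False
```

Changing the id in the URL invalidates the signature, which is what blocks simple ID enumeration.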

Security Comparison:

Feature	FastAPI Matrix Admin	SQLAdmin	FastAPI-Admin
CSP Headers	✅ Built-in	❌ Manual	❌ Manual
CSRF Protection	✅ Automatic	⚠️ Optional	❌ None
URL Signing	✅ Yes	❌ No	❌ No
Pydantic v2	✅ Yes	⚠️ v1	❌ No validation

Performance: Async All The Way Down

Full async support isn't optional in 2024.

# Async SQLAlchemy 2.0
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

engine = create_async_engine("postgresql+asyncpg://...")
admin = MatrixAdmin(app, engine=engine)

Simple benchmark (100 concurrent list view requests):

FastAPI Matrix Admin (async): ~50ms avg
SQLAdmin (sync): ~180ms avg (blocks other requests)
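To illustrate the "blocks other requests" part in isolation, here is a toy comparison with sleeps standing in for database calls. It is not a benchmark of either library, just a demonstration of what happens when a blocking call runs inside an event loop.

```python
import asyncio
import time

async def async_query():
    await asyncio.sleep(0.01)  # stands in for awaiting an async DB driver

async def blocking_query():
    time.sleep(0.01)           # stands in for a sync driver call inside the event loop

async def timed(handler, n=20):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

async_elapsed = asyncio.run(timed(async_query))
blocking_elapsed = asyncio.run(timed(blocking_query))
print(f"async: {async_elapsed:.3f}s  blocking: {blocking_elapsed:.3f}s")
```

The awaited version overlaps all 20 waits. The blocking version runs them back to back, because nothing else can run while time.sleep holds the loop.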

The Matrix UI (Yeah, It's Different)

I'm not going to pretend the cyberpunk aesthetic is for everyone. But here's why it exists:

Problem: Every admin panel looks the same. Generic Bootstrap tables. Boring gray sidebars. No personality.
Solution: Terminal-style green/black theme with neon accents.

It makes internal tools feel less corporate, and stakeholders actually remember seeing "that Matrix admin thing." (If you hate it, the CSS variables are customizable).

Quick Start

pip install fastapi-matrix-admin

from fastapi import FastAPI
from sqlalchemy import Column, Integer, String, Boolean
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.orm import declarative_base
from fastapi_matrix_admin import MatrixAdmin

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True)
    is_active = Column(Boolean, default=True)

app = FastAPI()
engine = create_async_engine("sqlite+aiosqlite:///./database.db")

# That's it
admin = MatrixAdmin(app, engine=engine, secret_key="your-secret-key-min-32-chars")
admin.auto_discover(Base)

Run it:

uvicorn app:app
# Visit http://localhost:8000/admin

What's Next

Current roadmap:

[ ] File/image upload support
[ ] Advanced filters (date ranges, multi-select)
[ ] Export to CSV/Excel
[ ] Custom dashboard widgets
[ ] Light theme for the Matrix-haters

Final Thoughts

I built this because I kept rebuilding the same admin panel over and over. Auto-discovery saves me hours per project. Zero Node.js means simple deploys. The security features mean I can actually use this in production.

If you try it, let me know what breaks. Or what you wish it did differently.

Questions I'd love feedback on:

Is auto-discovery too magical, or genuinely useful?
What security features am I missing?
Would you actually use this in production?

Drop a comment or open an issue. First-time contributors welcome!

GitHub: fastapi-matrix-admin