<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Andrew Jewell Sr</title>
    <description>The latest articles on Forem by Andrew Jewell Sr (@automatanexus).</description>
    <link>https://forem.com/automatanexus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3361531%2Ff373e494-3f75-4d14-85af-5208964c270e.png</url>
      <title>Forem: Andrew Jewell Sr</title>
      <link>https://forem.com/automatanexus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/automatanexus"/>
    <language>en</language>
    <item>
      <title>AxonML -- A PyTorch-equivalent ML framework written in Rust</title>
      <dc:creator>Andrew Jewell Sr</dc:creator>
      <pubDate>Sat, 28 Feb 2026 20:23:15 +0000</pubDate>
      <link>https://forem.com/automatanexus/axonml-a-pytorch-equivalent-ml-framework-written-in-rust-328a</link>
      <guid>https://forem.com/automatanexus/axonml-a-pytorch-equivalent-ml-framework-written-in-rust-328a</guid>
      <description>&lt;p&gt;&lt;em&gt;Andrew Jewell Sr / AutomataNexus LLC&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Over the past year and a half, I've been building AxonML -- a machine learning framework in Rust that aims for feature parity with PyTorch. It's now at v0.3.2: 22 crates, 336 Rust source files, 1,095 passing tests, and it's running production inference on Raspberry Pi edge hardware in commercial buildings. This post covers why I built it, how it's architected, the hard technical problems I ran into, and where it's actually being used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AutomataNexus/AxonML" rel="noopener noreferrer"&gt;github.com/AutomataNexus/AxonML&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT / Apache-2.0&lt;/p&gt;
&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;I built an entire building automation ecosystem from scratch. NexusBMS is the central building management platform -- won an InfluxDB hackathon with it, runs InfluxDB 3.0 OSS alongside my own database (Aegis-DB, also open source). The edge controllers are 50+ Raspberry Pi 4/5s running my custom NexusEdge software: Rust hardware daemons for I2C, BACnet, and Modbus communications, direct HVAC equipment control via analog outputs, 24V triacs, 0-10V inputs, 10K/1K thermistor inputs, and dry contact inputs. Custom control logic per equipment type. 16+ facilities including Taylor University, Element Labs, Byrna Ammunition, St. Jude Catholic School, Heritage Point Retirement Facilities in two different cities. Over 120 pieces of equipment -- air handlers, boilers, cooling towers, pumps, DOAS units, natatorium pool units, exhaust fans, greenhouses.&lt;/p&gt;

&lt;p&gt;The monitoring uses machine learning -- LSTM autoencoders for anomaly detection, GRU networks for failure prediction -- running on those Pi edge controllers mounted in mechanical rooms. Pi 5s have Hailo NPU chips running larger models; Pi 4s run smaller AxonML Rust inference models.&lt;/p&gt;

&lt;p&gt;The original plan was to train models in PyTorch and deploy inference in Python on the Pis. This didn't work well. Python's memory footprint on a 1 GB RAM Pi was too high. Dependency management was fragile. PyTorch's ARM support was incomplete. And I was spending more time fighting the deployment pipeline than building models.&lt;/p&gt;

&lt;p&gt;I wanted a framework where I could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define and train models with PyTorch-like ergonomics&lt;/li&gt;
&lt;li&gt;Compile to a single static binary&lt;/li&gt;
&lt;li&gt;Cross-compile to ARM&lt;/li&gt;
&lt;li&gt;Run inference at 2-3 MB RSS with no runtime dependencies&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Rust was the obvious choice. The question was whether one person could build enough of a framework to actually be useful.&lt;/p&gt;

&lt;p&gt;The answer, it turns out, is yes -- with caveats I'll get into.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture: 22 Crates
&lt;/h2&gt;

&lt;p&gt;AxonML is structured as a Cargo workspace with 22 crates, organized in layers. Each crate is independently testable and can be pulled in via feature flags.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 1: Compute Foundation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-core&lt;/code&gt;&lt;/strong&gt; provides device abstraction across CPU, CUDA, Vulkan, Metal, and WebGPU. The &lt;code&gt;Device&lt;/code&gt; enum dispatches operations to the appropriate backend. &lt;code&gt;Storage&amp;lt;T&amp;gt;&lt;/code&gt; is the reference-counted raw memory backing for tensors. The CUDA backend implements GPU memory allocation, cuBLAS GEMM for matrix multiply, and 20+ element-wise CUDA kernels compiled from PTX source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-tensor&lt;/code&gt;&lt;/strong&gt; implements N-dimensional tensors generic over scalar type: &lt;code&gt;Tensor&amp;lt;T: Scalar&amp;gt;&lt;/code&gt;. Broadcasting follows NumPy rules. Views and slicing are zero-copy where possible (backed by &lt;code&gt;Arc&amp;lt;Storage&amp;gt;&lt;/code&gt;). 60+ operations including arithmetic, reductions (sum, mean, max, min, prod), sorting (sort, argsort, topk), indexing (gather, scatter, nonzero, unique), shape manipulation (flip, roll, squeeze, unsqueeze, permute), and activations (ReLU, Sigmoid, Tanh, Softmax, GELU, SiLU, ELU, LeakyReLU). Sparse tensor support in COO format.&lt;/p&gt;
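&lt;p&gt;The NumPy broadcasting rule is worth spelling out: align shapes from the trailing dimension, and two dimensions are compatible if they are equal or either is 1. A minimal sketch of that shape resolution (illustrative only, not AxonML's internal code):&lt;/p&gt;

```rust
// Sketch of NumPy-style broadcast shape resolution, as described above.
// Illustrative, not AxonML's actual implementation.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0; n];
    for i in 0..n {
        // shapes are right-aligned; missing leading dims count as 1
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        if da == db || da == 1 || db == 1 {
            out[i] = da.max(db);
        } else {
            return None; // incompatible shapes
        }
    }
    Some(out)
}

fn main() {
    assert_eq!(broadcast_shape(&[3, 1, 5], &[4, 5]), Some(vec![3, 4, 5]));
    assert_eq!(broadcast_shape(&[2, 3], &[3, 2]), None);
    println!("ok");
}
```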

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-autograd&lt;/code&gt;&lt;/strong&gt; is the reverse-mode automatic differentiation engine. &lt;code&gt;Variable&lt;/code&gt; wraps a tensor and connects it to the computational graph via gradient functions. The graph is tape-based with &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; for shared ownership. Backward pass performs topological sort over the graph and applies the chain rule. Includes Automatic Mixed Precision (autocast context for F16 training) and gradient checkpointing (trade compute for memory). Backward functions cover activations (ReLU, Sigmoid, Tanh, Softmax, LeakyReLU, GELU), arithmetic (Add, Sub, Mul, Div, Neg, Pow, Sum, Mean), and linalg (MatMul, Transpose, Reshape, Cat, Select, Expand, SumDim).&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 2: ML Primitives
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-nn&lt;/code&gt;&lt;/strong&gt; provides the &lt;code&gt;Module&lt;/code&gt; trait (with &lt;code&gt;forward()&lt;/code&gt;, &lt;code&gt;parameters()&lt;/code&gt;, &lt;code&gt;train()&lt;/code&gt;/&lt;code&gt;eval()&lt;/code&gt;) and 37+ layer types across 12 source files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Linear, Conv1d, Conv2d, Embedding, ResidualBlock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pooling:&lt;/strong&gt; MaxPool1d/2d, AvgPool1d/2d, AdaptiveAvgPool2d&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalization:&lt;/strong&gt; BatchNorm1d/2d, LayerNorm, GroupNorm, InstanceNorm2d&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regularization:&lt;/strong&gt; Dropout, Dropout2d, AlphaDropout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recurrent:&lt;/strong&gt; RNN, LSTM, GRU (each with cell variants: RNNCell, LSTMCell, GRUCell)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention:&lt;/strong&gt; MultiHeadAttention, CrossAttention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformer:&lt;/strong&gt; TransformerEncoderLayer, TransformerDecoderLayer, TransformerEncoder, TransformerDecoder, Seq2SeqTransformer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph neural networks:&lt;/strong&gt; GCNConv (Graph Convolutional Network), GATConv (Graph Attention Network)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal processing:&lt;/strong&gt; FFT1d, STFT (Short-Time Fourier Transform)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Loss functions: MSELoss, CrossEntropyLoss, BCELoss, BCEWithLogitsLoss, L1Loss, SmoothL1Loss, NLLLoss.&lt;/p&gt;

&lt;p&gt;Initialization: Xavier/Glorot (uniform and normal), Kaiming/He (uniform and normal), Orthogonal, Sparse, plus uniform, normal, constant, zeros, ones, eye, diag.&lt;/p&gt;

&lt;p&gt;Activations: ReLU, Sigmoid, Tanh, GELU, SiLU, ELU, LeakyReLU, Softmax, LogSoftmax, Identity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-optim&lt;/code&gt;&lt;/strong&gt; implements five optimizers: SGD (with momentum, Nesterov, weight decay, dampening), Adam (with AMSGrad), AdamW (decoupled weight decay), RMSprop (centered, with momentum), and LAMB (layer-wise adaptive moments for large-batch training). GradScaler for mixed-precision gradient scaling. Seven learning rate schedulers: StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, OneCycleLR, WarmupLR, ReduceLROnPlateau.&lt;/p&gt;
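&lt;p&gt;To make the optimizer layer concrete, here is a sketch of the standard SGD-with-momentum update rule (PyTorch-style: &lt;code&gt;v = mu*v + g; p -= lr*v&lt;/code&gt;) -- the textbook formula, not AxonML's actual optimizer code:&lt;/p&gt;

```rust
// Sketch of SGD with momentum (PyTorch-style update), illustrative only.
struct SgdMomentum {
    lr: f32,
    momentum: f32,
    velocity: Vec<f32>, // one velocity entry per parameter
}

impl SgdMomentum {
    fn new(lr: f32, momentum: f32, n: usize) -> Self {
        SgdMomentum { lr, momentum, velocity: vec![0.0; n] }
    }
    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        for i in 0..params.len() {
            // v = mu * v + g;  p -= lr * v
            self.velocity[i] = self.momentum * self.velocity[i] + grads[i];
            params[i] -= self.lr * self.velocity[i];
        }
    }
}

fn main() {
    // minimize f(p) = p^2, whose gradient is 2p
    let mut p = vec![1.0f32];
    let mut opt = SgdMomentum::new(0.1, 0.9, 1);
    for _ in 0..100 {
        let g = vec![2.0 * p[0]];
        opt.step(&mut p, &g);
    }
    assert!(p[0].abs() < 0.05); // converged near the minimum
    println!("p = {}", p[0]);
}
```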

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-data&lt;/code&gt;&lt;/strong&gt; provides the &lt;code&gt;Dataset&lt;/code&gt; trait, &lt;code&gt;DataLoader&lt;/code&gt; with batching and shuffling, samplers (Sequential, Random, SubsetRandom, WeightedRandom, Batch), transforms (Normalize, RandomNoise, RandomCrop, RandomFlip, Scale, Clamp), and collate functions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 3: Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-serialize&lt;/code&gt;&lt;/strong&gt; handles model save/load in the &lt;code&gt;.axonml&lt;/code&gt; binary format, JSON, and SafeTensors. StateDict extraction (PyTorch-compatible concept), checkpoint management with builder pattern, format auto-detection from file extensions and magic bytes, PyTorch key conversion utilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-onnx&lt;/code&gt;&lt;/strong&gt; imports and exports ONNX models with 40+ operator implementations at opset version 17.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-quant&lt;/code&gt;&lt;/strong&gt; provides block-based quantization (32-element blocks) in INT8 (Q8_0), INT4 (Q4_0, Q4_1), INT5 (Q5_0, Q5_1), and F16 with four calibration methods: MinMax, Percentile, Entropy, MeanStd. Q4 quantization achieves roughly 8x model size reduction. Parallel processing via Rayon. Compression stats with RMSE, max error, mean error.&lt;/p&gt;
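&lt;p&gt;The block-quantization idea is simple enough to sketch end to end: split the weights into 32-element blocks, store one f32 scale per block, and round each value to a small integer. A Q8_0-style illustration (one scale + 32 bytes per block, so roughly 3.6x smaller than f32; the Q4 formats halve the payload again) -- illustrative, not AxonML's actual code:&lt;/p&gt;

```rust
// Sketch of Q8_0-style block quantization: 32-element blocks, one f32 scale
// per block, values stored as i8. Illustrative only.
const BLOCK: usize = 32;

fn quantize_q8_0(data: &[f32]) -> Vec<(f32, [i8; BLOCK])> {
    data.chunks(BLOCK)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let mut q = [0i8; BLOCK];
            for (i, v) in chunk.iter().enumerate() {
                q[i] = (v / scale).round() as i8;
            }
            (scale, q)
        })
        .collect()
}

fn dequantize(blocks: &[(f32, [i8; BLOCK])]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, q)| q.iter().map(move |&v| v as f32 * scale))
        .collect()
}

fn main() {
    let data: Vec<f32> = (0..64).map(|i| (i as f32 * 0.37).sin()).collect();
    let blocks = quantize_q8_0(&data);
    let restored = dequantize(&blocks);
    let max_err = data
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // rounding error is bounded by half the block scale
    assert!(max_err < 0.01);
    println!("max quantization error: {max_err}");
}
```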

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-fusion&lt;/code&gt;&lt;/strong&gt; detects and applies kernel fusion patterns automatically: MatMul+Bias, MatMul+Bias+ReLU/GELU, Conv+BatchNorm, Conv+BatchNorm+ReLU, elementwise chains, Add+ReLU, Mul+Add FMA. FusedLinear and FusedElementwise (14 elementwise ops). Configurable optimizer with conservative/aggressive modes. Up to 2x speedup for memory-bound operations.&lt;/p&gt;
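&lt;p&gt;Why fusion helps is easy to show: a Mul, Add, ReLU chain run separately makes three passes over memory and allocates two intermediate buffers, while the fused version makes one pass and allocates nothing extra. A toy illustration (not AxonML's fusion machinery):&lt;/p&gt;

```rust
// Toy illustration of elementwise fusion: same result, one memory traversal
// instead of three. Hypothetical code, not AxonML's API.
fn unfused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    let t1: Vec<f32> = x.iter().map(|v| v * a).collect(); // Mul (buffer 1)
    let t2: Vec<f32> = t1.iter().map(|v| v + b).collect(); // Add (buffer 2)
    t2.iter().map(|v| v.max(0.0)).collect() // ReLU
}

fn fused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    // Mul+Add+ReLU in a single traversal (the FMA and Add+ReLU patterns above)
    x.iter().map(|v| (v * a + b).max(0.0)).collect()
}

fn main() {
    let x = [-2.0, -0.5, 0.0, 1.5];
    assert_eq!(unfused(&x, 2.0, 1.0), fused(&x, 2.0, 1.0));
    println!("{:?}", fused(&x, 2.0, 1.0));
}
```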

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-jit&lt;/code&gt;&lt;/strong&gt; provides an intermediate representation for computation graphs (40+ ops), operation tracing, graph optimization passes (constant folding, dead code elimination, common subexpression elimination, algebraic simplification, elementwise fusion, strength reduction), LRU function caching, and shape inference with broadcasting. Built on a Cranelift foundation for native codegen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-profile&lt;/code&gt;&lt;/strong&gt; includes memory profiling (allocation tracking, peak usage, leak detection), compute profiling (operation timing, FLOPS, throughput, bandwidth), timeline profiling with Chrome trace export (&lt;code&gt;chrome://tracing&lt;/code&gt;), automatic bottleneck analysis (5 categories: SlowOperation, HighCallCount, MemoryHotspot, MemoryLeak, LowThroughput), and reports in Text/JSON/Markdown/HTML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-distributed&lt;/code&gt;&lt;/strong&gt; implements four parallelism strategies: DistributedDataParallel (DDP) with gradient bucketing, Fully Sharded Data Parallel (FSDP) with ZeRO-2/ZeRO-3 and CPU offload, Pipeline Parallelism with microbatching and configurable schedules, and Tensor Parallelism via ColumnParallelLinear/RowParallelLinear. Collective operations: all-reduce (sum, mean, min, max, product), broadcast, all-gather, reduce-scatter, barrier, ring all-reduce, scatter, gather.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 4: Domain Libraries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-vision&lt;/code&gt;&lt;/strong&gt; provides image transforms (Resize, CenterCrop, RandomHorizontalFlip, RandomVerticalFlip, RandomRotation, ColorJitter, Grayscale, Normalize, Pad), synthetic datasets (MNIST, CIFAR, Fashion-MNIST, CIFAR-100), and architecture implementations: LeNet, SimpleCNN, MLP, ResNet (18/34), VGG (11/13/16/19), Vision Transformer (ViT). Pretrained weight hub with local caching and ImageNet/MNIST/CIFAR normalization presets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-audio&lt;/code&gt;&lt;/strong&gt; provides MelSpectrogram, MFCC, resampling, time stretching, pitch shifting, noise augmentation (SNR-based), audio normalization, and silence trimming. Synthetic datasets for command recognition, music genre classification, and speaker identification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-text&lt;/code&gt;&lt;/strong&gt; provides six tokenizer types: Whitespace, Character, WordPunct, NGram (word and character n-grams), BPE (with training), and Unigram. Vocabulary management with special tokens (PAD, UNK, BOS, EOS, MASK) and frequency-based filtering. Synthetic datasets for sentiment analysis, seq2seq, and language modeling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-llm&lt;/code&gt;&lt;/strong&gt; implements five LLM architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BERT&lt;/strong&gt; -- encoder with BertForSequenceClassification, BertForMaskedLM, BertPooler, configs for base/large/tiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-2&lt;/strong&gt; -- decoder with GPT2LMHead, configs for small/medium/large/xl/tiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLaMA&lt;/strong&gt; -- LLaMAForCausalLM with RMSNorm, RotaryEmbedding, GroupedQueryAttention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral&lt;/strong&gt; -- MistralForCausalLM with sliding window attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phi&lt;/strong&gt; -- PhiForCausalLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus: FlashAttention with KVCache, MultiHeadSelfAttention, CausalSelfAttention. Text generation with greedy, top-k, top-p/nucleus, temperature, and beam search with repetition penalty. Hugging Face model loader with weight mappers for LLaMA, Mistral, and Phi. Pretrained model hub with configs for LLaMA, Mistral, Phi, and Qwen.&lt;/p&gt;
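&lt;p&gt;Two of those decoding strategies -- temperature scaling and top-k filtering -- reduce to a few lines over the logits. A standalone sketch of the standard formulas (not AxonML's generation code):&lt;/p&gt;

```rust
// Sketch of temperature scaling + top-k filtering over logits, two of the
// decoding strategies listed above. Illustrative only.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().fold(f32::NEG_INFINITY, |m, &v| m.max(v));
    let exps: Vec<f32> = logits.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Keep only the k highest logits; everything else gets -inf (probability 0).
fn top_k_filter(logits: &[f32], k: usize) -> Vec<f32> {
    let mut sorted = logits.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let cutoff = sorted[k - 1];
    logits
        .iter()
        .map(|&v| if v >= cutoff { v } else { f32::NEG_INFINITY })
        .collect()
}

fn main() {
    let logits = [2.0f32, 1.0, 0.5, -1.0];
    let temperature = 0.7; // < 1.0 sharpens the distribution
    let scaled: Vec<f32> = logits.iter().map(|v| v / temperature).collect();
    let probs = softmax(&top_k_filter(&scaled, 2));
    assert_eq!(probs[2], 0.0); // filtered out by top-k
    assert_eq!(probs[3], 0.0);
    assert!(probs[0] > probs[1]); // token 0 remains the most likely
    println!("{probs:?}");
}
```

&lt;p&gt;Top-p/nucleus sampling is the same idea with a cumulative-probability cutoff instead of a fixed count.&lt;/p&gt;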
&lt;h3&gt;
  
  
  Layer 5: Application Stack
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-cli&lt;/code&gt;&lt;/strong&gt; is a comprehensive CLI with 50+ commands covering project scaffolding, training, evaluation, inference, model conversion, ONNX export/import, quantization, workspace management, analysis/reports, data management, bundling/deployment, benchmarking, GPU management, pretrained model hub, Kaggle integration, dataset management (NexusConnectBridge API), Weights &amp;amp; Biases experiment tracking, and dashboard/server lifecycle management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-tui&lt;/code&gt;&lt;/strong&gt; is a ratatui-based terminal dashboard with model architecture visualization, dataset explorer, real-time training monitor with sparkline trends and ETA, interactive loss/accuracy/learning rate graphs, file browser with preview, and vi-style keyboard navigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-dashboard&lt;/code&gt;&lt;/strong&gt; is a Leptos/WASM web frontend (client-side rendered) with 20+ page routes: full authentication flow (login/register/session), MFA setup (TOTP + WebAuthn), training run management with real-time WebSocket metrics, model registry (browse/upload/version), inference endpoint deployment, dark mode toggle, toast notifications, slide-out terminal with WebSocket PTY, responsive design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-server&lt;/code&gt;&lt;/strong&gt; is an Axum-based REST + WebSocket API backend with 50+ endpoints: JWT authentication with refresh tokens, multi-factor authentication (TOTP/RFC 6238 + WebAuthn/FIDO2 hardware keys + recovery codes), Argon2 password hashing, training run management with WebSocket streaming, model registry with file upload/download, inference endpoint management, terminal WebSocket PTY, CORS, structured tracing, and Prometheus metrics export. Uses Aegis-DB as its backing document store.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Hard Problems
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Autograd Without a Garbage Collector
&lt;/h3&gt;

&lt;p&gt;This was the central technical challenge. PyTorch's computational graph is managed by Python's reference counting and garbage collector. When a tensor goes out of scope, its graph node is freed. When you call &lt;code&gt;loss.backward()&lt;/code&gt;, Python's GC ensures the graph stays alive exactly as long as needed.&lt;/p&gt;

&lt;p&gt;Rust doesn't have a GC. The ownership model is fundamentally different.&lt;/p&gt;

&lt;p&gt;My approach: each &lt;code&gt;Variable&lt;/code&gt; (the autograd-aware tensor wrapper) holds an &lt;code&gt;Arc&lt;/code&gt; reference to its gradient function. Gradient functions hold &lt;code&gt;Arc&lt;/code&gt; references to their input variables. This creates a reference-counted graph that stays alive as long as any variable referencing it is alive.&lt;/p&gt;

&lt;p&gt;Backward traversal collects all reachable nodes, topologically sorts them, and applies gradient functions in reverse order. After backward, the graph can be dropped.&lt;/p&gt;
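&lt;p&gt;The whole scheme fits in a small sketch: a reference-counted node per value, each op recording its parents and local derivatives, and a backward pass that topologically sorts the graph and applies the chain rule in reverse. This toy scalar version uses &lt;code&gt;Rc&amp;lt;RefCell&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; for brevity where AxonML uses &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt;; it is illustrative, not AxonML's code:&lt;/p&gt;

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Minimal scalar reverse-mode autograd illustrating the reference-counted
// graph + topological backward described above. Illustrative only.
#[derive(Clone)]
struct Var(Rc<RefCell<Node>>);

struct Node {
    value: f64,
    grad: f64,
    // (parent, local partial derivative d(out)/d(parent))
    parents: Vec<(Var, f64)>,
}

impl Var {
    fn new(value: f64) -> Var {
        Var(Rc::new(RefCell::new(Node { value, grad: 0.0, parents: vec![] })))
    }
    fn value(&self) -> f64 { self.0.borrow().value }
    fn grad(&self) -> f64 { self.0.borrow().grad }

    fn add(&self, other: &Var) -> Var {
        let out = Var::new(self.value() + other.value());
        out.0.borrow_mut().parents = vec![(self.clone(), 1.0), (other.clone(), 1.0)];
        out
    }
    fn mul(&self, other: &Var) -> Var {
        let out = Var::new(self.value() * other.value());
        out.0.borrow_mut().parents =
            vec![(self.clone(), other.value()), (other.clone(), self.value())];
        out
    }

    // DFS post-order gives a topological sort; then apply the chain rule
    // in reverse over that order.
    fn backward(&self) {
        fn topo(v: &Var, seen: &mut Vec<*const Node>, order: &mut Vec<Var>) {
            let ptr = v.0.as_ptr() as *const Node;
            if seen.contains(&ptr) { return; }
            seen.push(ptr);
            for (p, _) in v.0.borrow().parents.iter() { topo(p, seen, order); }
            order.push(v.clone());
        }
        let (mut seen, mut order) = (vec![], vec![]);
        topo(self, &mut seen, &mut order);
        self.0.borrow_mut().grad = 1.0;
        for v in order.iter().rev() {
            let g = v.0.borrow().grad;
            let parents = v.0.borrow().parents.clone();
            for (p, local) in parents {
                p.0.borrow_mut().grad += g * local;
            }
        }
    }
}

fn main() {
    // y = x*x + x  =>  dy/dx = 2x + 1 = 7 at x = 3
    let x = Var::new(3.0);
    let y = x.mul(&x).add(&x);
    y.backward();
    assert_eq!(y.value(), 12.0);
    assert_eq!(x.grad(), 7.0);
    println!("ok");
}
```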

&lt;p&gt;The &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; pattern adds allocation and locking overhead on every forward operation. Each tensor op creates a new &lt;code&gt;Variable&lt;/code&gt; with a new &lt;code&gt;Arc&lt;/code&gt;-wrapped gradient function. For small models and short sequences this is negligible. For very large models with long computational graphs, it's measurable.&lt;/p&gt;

&lt;p&gt;If I were starting over, I'd explore arena-based allocation -- allocate all graph nodes from a bump allocator that gets reset after each backward pass. This would trade some API complexity for better performance characteristics.&lt;/p&gt;

&lt;p&gt;A critical bug I found and fixed: early versions used &lt;code&gt;Variable::new()&lt;/code&gt; for intermediate results, which creates leaf variables that sever the gradient graph. The fix was &lt;code&gt;Variable::from_operation()&lt;/code&gt;, which creates non-leaf variables that properly participate in backpropagation. This is the kind of bug that's obvious in hindsight but took significant debugging to identify (loss would decrease for a few epochs then plateau because gradients weren't flowing through certain layers).&lt;/p&gt;
&lt;h3&gt;
  
  
  CUDA Integration
&lt;/h3&gt;

&lt;p&gt;The CUDA backend was built incrementally. The core abstraction is &lt;code&gt;CudaStorage&lt;/code&gt; -- GPU-resident memory managed via CUDA's &lt;code&gt;cudaMalloc&lt;/code&gt;/&lt;code&gt;cudaFree&lt;/code&gt;, with deallocation in &lt;code&gt;Drop&lt;/code&gt; so GPU memory is freed deterministically when the last reference goes away -- leaking it would require deliberately opting out of RAII.&lt;/p&gt;

&lt;p&gt;Matrix multiplication dispatches to cuBLAS GEMM. Element-wise operations (add, subtract, multiply, divide, relu, sigmoid, tanh, exp, log, sqrt, abs, neg, clamp, pow, and more) use custom CUDA kernels compiled from PTX source.&lt;/p&gt;

&lt;p&gt;The main challenge is keeping tensors on-device. In PyTorch, &lt;code&gt;.to('cuda')&lt;/code&gt; moves a tensor to GPU and subsequent operations stay on GPU. In AxonML, the &lt;code&gt;Device&lt;/code&gt; enum propagates through operations -- if both inputs are on CUDA, the output stays on CUDA. If the devices mismatch, you get an explicit error rather than a silent host-device transfer.&lt;/p&gt;
&lt;h3&gt;
  
  
  Generic Tensors vs. Dynamic Types
&lt;/h3&gt;

&lt;p&gt;I chose &lt;code&gt;Tensor&amp;lt;T: Scalar&amp;gt;&lt;/code&gt; -- tensors are generic over their scalar type. This means &lt;code&gt;Tensor&amp;lt;f32&amp;gt;&lt;/code&gt; and &lt;code&gt;Tensor&amp;lt;f64&amp;gt;&lt;/code&gt; are distinct types. You can't accidentally add a float tensor to an integer tensor: scalar-type mismatches are caught at compile time.&lt;/p&gt;

&lt;p&gt;The tradeoff: you can't dynamically switch dtypes without enum dispatch. PyTorch lets you call &lt;code&gt;.float()&lt;/code&gt; or &lt;code&gt;.half()&lt;/code&gt; and it returns the same type with different internal representation. In AxonML, changing dtype requires converting to a different concrete type. This adds some API friction but eliminates an entire class of runtime bugs.&lt;/p&gt;
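&lt;p&gt;A stripped-down illustration of why the generic design rules out dtype-mixing bugs -- a toy &lt;code&gt;Tensor&amp;lt;T&amp;gt;&lt;/code&gt;, not AxonML's actual type:&lt;/p&gt;

```rust
// Toy illustration of the Tensor<T: Scalar> design: f32 and f64 tensors are
// distinct types, so mixing them is a compile error. Not AxonML's actual API.
struct Tensor<T> {
    data: Vec<T>,
}

impl<T: Copy + std::ops::Add<Output = T>> Tensor<T> {
    fn add(&self, other: &Tensor<T>) -> Tensor<T> {
        Tensor {
            data: self.data.iter().zip(&other.data).map(|(a, b)| *a + *b).collect(),
        }
    }
}

fn main() {
    let a = Tensor { data: vec![1.0f32, 2.0] };
    let b = Tensor { data: vec![3.0f32, 4.0] };
    let c = a.add(&b);
    assert_eq!(c.data, vec![4.0, 6.0]);

    // let d = Tensor { data: vec![1.0f64] };
    // a.add(&d); // compile error: expected Tensor<f32>, found Tensor<f64>
    println!("ok");
}
```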

&lt;p&gt;For the LLM architectures, where mixed-precision training switches between f32 and f16, I implemented the AMP autocast context that handles the conversion explicitly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cross-Compilation for ARM
&lt;/h3&gt;

&lt;p&gt;The deployment target for my HVAC models is &lt;code&gt;armv7-unknown-linux-musleabihf&lt;/code&gt; -- 32-bit ARM with hardware floating point, statically linked against musl libc. The build command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt; armv7-unknown-linux-musleabihf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces a single binary with no dynamic library dependencies. Copy it to the Raspberry Pi, set it executable, and it runs. No cross-compilation toolchain complexity beyond having the right Rust target installed.&lt;/p&gt;

&lt;p&gt;The inference binaries use pure tensor operations -- no autograd tape, no gradient tracking, no optimizer state. This keeps the binary small and the runtime footprint minimal. Each inference daemon runs at ~2-3 MB RSS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production: HVAC Predictive Maintenance
&lt;/h2&gt;

&lt;p&gt;This is where AxonML proves itself beyond benchmarks and test suites.&lt;/p&gt;

&lt;p&gt;I have 69 trained &lt;code&gt;.axonml&lt;/code&gt; model files across 7 commercial building facilities: FCOG, Warren, Huntington, Akron, Hopebridge, NE Realty, and a unified NexusBMS system. The models cover a wide range of HVAC equipment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air handlers&lt;/strong&gt; -- supply air temperature prediction, mixed air anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boilers&lt;/strong&gt; -- steam/comfort/domestic hot water anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chillers&lt;/strong&gt; -- condenser/evaporator anomaly patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VAV boxes&lt;/strong&gt; -- zone temperature and airflow prediction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan coils&lt;/strong&gt; -- heating/cooling valve anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make-up air units&lt;/strong&gt; -- outside air conditioning monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOAS&lt;/strong&gt; (Dedicated Outdoor Air Systems) -- ventilation anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pumps&lt;/strong&gt; -- flow and pressure anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steam systems&lt;/strong&gt; -- bundle condition and trap monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model architectures are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detectors:&lt;/strong&gt; LSTM autoencoders that learn normal operating patterns and flag deviations. An input sequence of sensor readings goes through an LSTM encoder, gets compressed to a latent representation, then reconstructed by an LSTM decoder. Reconstruction error above a threshold signals anomalous behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure predictors:&lt;/strong&gt; GRU networks that take recent sensor history and predict probability of equipment failure in the near future&lt;/li&gt;
&lt;/ul&gt;
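&lt;p&gt;The decision step at the end of the autoencoder pipeline is just reconstruction error against a threshold. A sketch of that scoring step -- the autoencoder itself is elided, with &lt;code&gt;recon&lt;/code&gt; standing in for the decoder's output; illustrative, not the production code:&lt;/p&gt;

```rust
// Sketch of the anomaly-scoring step described above: mean squared
// reconstruction error compared against a threshold. Illustrative only.
fn reconstruction_error(input: &[f32], recon: &[f32]) -> f32 {
    input
        .iter()
        .zip(recon)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        / input.len() as f32
}

fn is_anomalous(input: &[f32], recon: &[f32], threshold: f32) -> bool {
    reconstruction_error(input, recon) > threshold
}

fn main() {
    let readings = [20.1f32, 20.3, 20.2, 20.4]; // e.g. supply air temps
    let good_recon = [20.0f32, 20.3, 20.3, 20.4]; // model reproduces them well
    let bad_recon = [18.0f32, 19.0, 22.0, 23.0]; // unseen pattern: poor fit
    assert!(!is_anomalous(&readings, &good_recon, 0.05));
    assert!(is_anomalous(&readings, &bad_recon, 0.05));
    println!("ok");
}
```

&lt;p&gt;The threshold is calibrated per equipment type from the training data's error distribution.&lt;/p&gt;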

&lt;p&gt;12 of these models are running live inference on Raspberry Pi edge controllers. The deployment pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Train on server (CPU) using AxonML&lt;/li&gt;
&lt;li&gt;Save model weights as &lt;code&gt;.axonml&lt;/code&gt; files (or quantize to INT4/INT8 for smaller footprint)&lt;/li&gt;
&lt;li&gt;Cross-compile inference daemon to ARM static binary&lt;/li&gt;
&lt;li&gt;Deploy to Pi via the building management system's OTA update pipeline&lt;/li&gt;
&lt;li&gt;PM2 manages the process (auto-restart, log management)&lt;/li&gt;
&lt;li&gt;Daemon polls local NexusEdge controller for sensor data at 1 Hz&lt;/li&gt;
&lt;li&gt;Runs inference, maintains rolling time-series buffers&lt;/li&gt;
&lt;li&gt;Exposes anomaly scores and failure predictions via REST API (&lt;code&gt;/api/inference/latest&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;NexusBMS building management dashboard consumes the API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The NexusBMS system alone has 22 trained models covering every major equipment type in a commercial building. Each model trains in minutes on CPU, serializes to a few hundred KB (or less with quantization), and runs inference in microseconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kaggle: Akkadian-to-English Machine Translation
&lt;/h2&gt;

&lt;p&gt;To exercise the seq2seq and NLP capabilities, I entered the Deep Past Initiative Kaggle competition. The task: translate Akkadian cuneiform text to English. The dataset has ~1,561 parallel sentence pairs with 5,571 unique source tokens.&lt;/p&gt;

&lt;p&gt;The AxonML model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BPE tokenizer for both source and target languages&lt;/li&gt;
&lt;li&gt;Sinusoidal positional encoding&lt;/li&gt;
&lt;li&gt;Transformer encoder-decoder with multi-head attention&lt;/li&gt;
&lt;li&gt;Trained end-to-end through AxonML's training pipeline&lt;/li&gt;
&lt;li&gt;Evaluated on BLEU + chrF++&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire pipeline -- data loading, tokenization, vocabulary building, model definition, training loop, checkpoint management, generation with beam search -- runs through AxonML. No Python anywhere in the pipeline.&lt;/p&gt;

&lt;p&gt;This was a good stress test for the framework's NLP capabilities. seq2seq translation exercises: embedding layers, positional encoding, encoder with self-attention, decoder with masked self-attention and cross-attention, output projection, autoregressive generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time model serving with batched inference.&lt;/strong&gt; The inference server works but doesn't batch across concurrent requests yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expanded CUDA kernel coverage.&lt;/strong&gt; More operations need GPU implementations to reduce CPU fallbacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted pretrained weight hosting.&lt;/strong&gt; Currently using a hub config system; want to host weights directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More pretrained weights and ONNX import improvements.&lt;/strong&gt; Making it easier to convert models from Hugging Face&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Should You Use AxonML?
&lt;/h2&gt;

&lt;p&gt;If you're doing standard ML research with Jupyter notebooks, Hugging Face, and cloud GPUs: probably not. PyTorch's ecosystem is vast and mature, and fighting a smaller ecosystem isn't worth it for research iteration speed.&lt;/p&gt;

&lt;p&gt;If you're in one of these situations, it might be worth evaluating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge deployment.&lt;/strong&gt; You need ML inference on constrained hardware without Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rust applications.&lt;/strong&gt; You're building a Rust application that needs embedded ML inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-binary deployment.&lt;/strong&gt; You want a model that compiles to one file with no dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph neural networks in Rust.&lt;/strong&gt; GCNConv and GATConv layers are implemented and ready to use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning ML internals.&lt;/strong&gt; The codebase is MIT/Apache-2.0 and every layer is implemented from scratch in readable Rust. If you want to understand how autograd, attention, LSTM gates, or graph convolutions actually work, the source is there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HVAC/IoT/industrial.&lt;/strong&gt; You're in a similar domain where models need to run on real hardware in real buildings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AxonML is one developer's work. It's not going to outpace PyTorch's development velocity. But it solves a real problem -- production ML on constrained hardware with compile-time safety -- and it's been doing that in production for real buildings with real equipment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AutomataNexus/AxonML" rel="noopener noreferrer"&gt;github.com/AutomataNexus/AxonML&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Andrew Jewell Sr is the founder of AutomataNexus LLC. AxonML is open source under MIT/Apache-2.0 dual license.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>iot</category>
    </item>
    <item>
      <title>Why I Built a Multi-Paradigm Database in Rust — And Deployed It on 50+ Raspberry Pis</title>
      <dc:creator>Andrew Jewell Sr</dc:creator>
      <pubDate>Sat, 28 Feb 2026 19:12:15 +0000</pubDate>
      <link>https://forem.com/automatanexus/why-i-built-a-multi-paradigm-database-in-rust-and-deployed-it-on-50-raspberry-pis-1o7e</link>
      <guid>https://forem.com/automatanexus/why-i-built-a-multi-paradigm-database-in-rust-and-deployed-it-on-50-raspberry-pis-1o7e</guid>
      <description>&lt;p&gt;&lt;em&gt;By Andrew Jewell Sr&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I built an entire building automation ecosystem from scratch, and Aegis-DB is one piece of it.&lt;/p&gt;

&lt;p&gt;NexusBMS is the central building management platform — won an InfluxDB hackathon with it, runs InfluxDB 3.0 OSS alongside Aegis-DB on the central server. The edge controllers are 50+ Raspberry Pi 4/5s running my custom NexusEdge software: Rust hardware daemons for I2C, BACnet, and Modbus communications, direct HVAC equipment control via analog outputs, 24V triacs, 0-10V inputs, 10K/1K thermistor inputs, and dry contact inputs. Custom control logic per equipment type. Pi 5s have Hailo NPU chips running larger ML models for predictive maintenance; Pi 4s run smaller AxonML Rust inference models (AxonML is my own ML framework — also open source).&lt;/p&gt;

&lt;p&gt;16+ facilities. Taylor University, Element Labs, Byrna Ammunition, St. Jude Catholic School, Heritage Point Retirement Facilities in two different cities, and more. Over 120 pieces of equipment — air handlers, boilers, cooling towers, pumps, DOAS units, natatorium pool units, exhaust fans, greenhouses.&lt;/p&gt;

&lt;p&gt;Each Pi needs to store sensor readings, equipment metadata, session state, and stream alerts in real time. And each Pi has 1-4 GB of RAM shared with the hardware daemons, control logic, and ML inference.&lt;/p&gt;

&lt;p&gt;The original plan was the standard stack: Postgres for relational data, InfluxDB for sensor time series, Redis for caching, some message broker for alerts. On a server, that's fine. On a Raspberry Pi in a mechanical room that's already running BACnet/Modbus daemons and ML models with intermittent network to the central server, it's not viable.&lt;/p&gt;

&lt;p&gt;So I built Aegis-DB — a single Rust binary that handles SQL, key-value, document, time series, graph, and event streaming through one REST API on one port. It runs on every Pi at ~50 MB RSS and replicates to the central NexusBMS server using CRDTs.&lt;/p&gt;

&lt;p&gt;But Aegis-DB isn't just for edge devices. It's also the primary database for my PWAs, mobile apps, and the central NexusBMS server itself. The edge deployment is what forced the design to be efficient — but that efficiency benefits every deployment. It scales from a mechanical room Pi to a full server without changing a line of configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Six Data Models, One Raspberry Pi, Plus Everything Else
&lt;/h2&gt;

&lt;p&gt;Each edge controller in a commercial building is already busy. NexusEdge runs Rust hardware daemons handling I2C, BACnet, and Modbus communications. It drives analog outputs, 24V triacs, 0-10V signals, reads 10K/1K thermistors and dry contacts. Custom HVAC control logic runs per equipment type. Pi 5s run ML models on Hailo NPU chips; Pi 4s run AxonML Rust inference. That's the baseline workload before you add a database.&lt;/p&gt;

&lt;p&gt;On top of that, each controller needs to handle multiple kinds of data simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time series:&lt;/strong&gt; Temperature sensors, pressure readings, flow rates — sampled at 1-10 Hz, stored for trending and anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documents:&lt;/strong&gt; Equipment configurations, maintenance records, BACnet point mappings — semi-structured JSON that changes per equipment type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key-value:&lt;/strong&gt; Session tokens, cached control setpoints, runtime state that needs sub-millisecond reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL:&lt;/strong&gt; Relational metadata — which sensors belong to which equipment, which equipment belongs to which building, scheduling tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; Real-time alerts when a sensor reading crosses a threshold, change data capture for audit trails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph:&lt;/strong&gt; Equipment relationships — this AHU feeds these VAV boxes, this chiller serves these air handlers, this boiler serves these heating coils&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a server you'd run six different databases. On a Pi that's already running hardware daemons and ML inference with 1 GB of RAM, you run Aegis-DB. And on the server, you also run Aegis-DB — same binary, same API, just with more headroom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: 13 Crates, ~60,000 Lines of Rust
&lt;/h2&gt;

&lt;p&gt;Aegis-DB is a Cargo workspace with 13 crates in a layered dependency structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aegis-dashboard    → Leptos/WASM web UI (cluster monitoring, data browsers, query builder)
aegis-cli          → CLI with interactive SQL shell, node registry, multi-format output
aegis-server       → REST API (Axum, port 9090) + middleware stack + all endpoint handlers
    ↓
aegis-client       → Rust SDK with connection pooling
aegis-query        → SQL parser (sqlparser) → analyzer → cost-based planner → volcano executor
aegis-timeseries   → Gorilla compression, delta-of-delta timestamps, retention policies
aegis-document     → JSON document store, MongoDB-style queries, collection indexes
aegis-streaming    → Pub/sub channels, consumer groups, CDC with before/after images
aegis-replication  → Raft consensus, CRDTs, 2PC, consistent hashing, vector clocks
aegis-monitoring   → System metrics (CPU/RAM/disk/network), alerts, query statistics
aegis-updates      → OTA rolling updates with SHA-256 verification and auto-rollback
    ↓
aegis-storage      → Pluggable backends (Memory, LocalFS), WAL, MVCC, block compression
aegis-memory       → Arena allocators, buffer pool with LRU eviction
    ↓
aegis-common       → Shared types, unified AegisError, configuration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each paradigm is a real implementation, not a shim over a KV store.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL Engine
&lt;/h3&gt;

&lt;p&gt;The SQL engine uses the &lt;code&gt;sqlparser&lt;/code&gt; crate for parsing into an AST, then runs through a semantic analyzer, a cost-based query planner, and a volcano-model executor with vectorized batch processing (configurable batch size, default 1024 rows per batch).&lt;/p&gt;
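&lt;p&gt;To make the volcano model concrete, here's a minimal Python sketch (purely illustrative; the real executor is Rust and vectorized): each operator is an iterator that pulls fixed-size batches from its child instead of materializing whole result sets.&lt;/p&gt;

```python
# Illustrative sketch of a volcano-model executor with batched pulls.
# Operator names and structure are hypothetical, not Aegis-DB's API.

BATCH_SIZE = 4  # Aegis-DB defaults to 1024 rows per batch

def scan(rows, batch_size=BATCH_SIZE):
    """Leaf operator: yields rows in fixed-size batches."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def filter_op(child, predicate):
    """Pulls batches from its child and keeps matching rows."""
    for batch in child:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept

def project(child, columns):
    """Keeps only the requested columns of each row."""
    for batch in child:
        yield [{c: row[c] for c in columns} for row in batch]

rows = [{"id": i, "temp": 60 + i} for i in range(10)]
plan = project(filter_op(scan(rows), lambda r: r["temp"] > 65), ["id"])
result = [row for batch in plan for row in batch]
# result == [{"id": 6}, {"id": 7}, {"id": 8}, {"id": 9}]
```

&lt;p&gt;The pull-based shape is what lets limits like &lt;code&gt;max_result_rows&lt;/code&gt; stop a query early: the root simply stops pulling.&lt;/p&gt;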

&lt;p&gt;It supports the full DDL/DML surface you'd expect: CREATE/DROP/ALTER TABLE (with ADD/DROP/RENAME COLUMN, ALTER COLUMN type changes, ADD/DROP CONSTRAINT for PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK), CREATE/DROP INDEX, INSERT, UPDATE, DELETE, SELECT with JOINs, aggregations, subqueries, WHERE/GROUP BY/HAVING/ORDER BY/LIMIT. ALTER TABLE supports IF EXISTS/IF NOT EXISTS clauses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexes:&lt;/strong&gt; B-tree indexes for range queries and hash indexes for point lookups. Indexes accelerate SELECT, UPDATE, and DELETE — not just reads. When an UPDATE's WHERE clause matches an indexed column, Aegis-DB uses the index to find the rows in O(log N) instead of scanning the full table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct execution API:&lt;/strong&gt; For hot paths where SQL parsing overhead matters, Aegis-DB has a closure-based direct execution API (&lt;code&gt;execute_update_indexed_fn&lt;/code&gt;) that bypasses parsing, planning, and expression evaluation entirely. Pre-resolved column indices, combined find+lookup in a single B-tree lock acquisition. This is how the fund transfer benchmark hits 758K TPS — the same operations through the SQL path run at ~40K TPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan cache:&lt;/strong&gt; An LRU cache (1024 entries) stores parsed and planned queries. Repeated SQL statements skip re-parsing and re-planning entirely.&lt;/p&gt;
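&lt;p&gt;The cache behaves like any LRU keyed by SQL text. A hedged Python sketch of the idea (the stored "plan" here is just a placeholder):&lt;/p&gt;

```python
# Sketch of an LRU plan cache keyed by SQL text. Capacity and eviction
# mirror the description (1024 entries, LRU); not Aegis-DB's actual code.
from collections import OrderedDict

class PlanCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, sql):
        plan = self.entries.get(sql)
        if plan is not None:
            self.entries.move_to_end(sql)  # mark as most recently used
        return plan

    def put(self, sql, plan):
        self.entries[sql] = plan
        self.entries.move_to_end(sql)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = PlanCache(capacity=2)
cache.put("SELECT 1", "plan-1")
cache.put("SELECT 2", "plan-2")
cache.get("SELECT 1")            # touch: SELECT 1 is now most recent
cache.put("SELECT 3", "plan-3")  # evicts SELECT 2
assert cache.get("SELECT 2") is None
assert cache.get("SELECT 1") == "plan-1"
```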

&lt;p&gt;&lt;strong&gt;Query safety:&lt;/strong&gt; Configurable &lt;code&gt;max_result_rows&lt;/code&gt; (default 100,000) and &lt;code&gt;query_timeout_secs&lt;/code&gt; (default 30) enforced at the executor level. Queries that exceed either limit return a clear error (HTTP 413 or 408) instead of consuming unbounded memory or CPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  KV Store
&lt;/h3&gt;

&lt;p&gt;DashMap-based concurrent hashmap. 12.3 million reads per second at the engine level, 203K reads/sec over HTTP. Supports optional TTL per key, prefix-based listing, and JSON values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document Store
&lt;/h3&gt;

&lt;p&gt;JSON document collections with full CRUD. MongoDB-style query operators: &lt;code&gt;$eq&lt;/code&gt;, &lt;code&gt;$ne&lt;/code&gt;, &lt;code&gt;$gt&lt;/code&gt;, &lt;code&gt;$gte&lt;/code&gt;, &lt;code&gt;$lt&lt;/code&gt;, &lt;code&gt;$lte&lt;/code&gt;, &lt;code&gt;$in&lt;/code&gt;, &lt;code&gt;$nin&lt;/code&gt;, &lt;code&gt;$exists&lt;/code&gt;, &lt;code&gt;$regex&lt;/code&gt;, &lt;code&gt;$and&lt;/code&gt;, &lt;code&gt;$or&lt;/code&gt;, &lt;code&gt;$contains&lt;/code&gt;, &lt;code&gt;$startsWith&lt;/code&gt;, &lt;code&gt;$endsWith&lt;/code&gt;. Collection-level indexing (hash and B-tree), sort/skip/limit/projection, and schema-optional storage. Index-accelerated queries route equality filters to hash/B-tree indexes for O(1) lookup instead of full-collection scan.&lt;/p&gt;
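&lt;p&gt;A minimal sketch of how a matcher for these operators works, covering a handful of them in Python (illustrative only; the real engine also routes equality filters to indexes instead of scanning):&lt;/p&gt;

```python
# Sketch of a MongoDB-style filter matcher for a few of the listed
# operators. Semantics are simplified (e.g. $exists treats null as missing).
import re

OPS = {
    "$eq":  lambda v, arg: v == arg,
    "$ne":  lambda v, arg: v != arg,
    "$gt":  lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt":  lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$in":  lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$exists": lambda v, arg: (v is not None) == arg,
    "$regex":  lambda v, arg: v is not None and re.search(arg, v) is not None,
}

def matches(doc, query):
    for field, cond in query.items():
        if field == "$and":
            if not all(matches(doc, q) for q in cond):
                return False
        elif field == "$or":
            if not any(matches(doc, q) for q in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(field)
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif doc.get(field) != cond:  # bare value means $eq
            return False
    return True

docs = [{"unit": "AHU-1", "temp": 74}, {"unit": "AHU-2", "temp": 68}]
hot = [d for d in docs if matches(d, {"temp": {"$gt": 70}})]
# hot == [{"unit": "AHU-1", "temp": 74}]
```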

&lt;h3&gt;
  
  
  Time Series Engine
&lt;/h3&gt;

&lt;p&gt;Gorilla compression (delta-of-delta timestamps + XOR floats) for high-density storage. Registered metrics with type information (Counter, Gauge, Histogram, Summary). Tag-based grouping, time-range queries, configurable retention policies with automatic cleanup, and automatic downsampling. Persistence layer with atomic writes and crash recovery. Lazy decompression — only decompresses data points within the requested time bounds.&lt;/p&gt;
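&lt;p&gt;The delta-of-delta trick is worth seeing in miniature: regular sampling makes second-order deltas mostly zero, and near-constant small values are what compress well. A sketch of just the value computation (the real format packs these into variable-width bit fields):&lt;/p&gt;

```python
# Sketch of delta-of-delta timestamp encoding, the idea behind
# Gorilla-style timestamp compression. Computes values only; no bit packing.

def delta_of_delta(timestamps):
    out = [timestamps[0]]            # first timestamp stored raw
    prev_delta = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev_delta = delta
    return out

def reconstruct(encoded):
    ts = [encoded[0]]
    delta = None
    for v in encoded[1:]:
        delta = v if delta is None else delta + v
        ts.append(ts[-1] + delta)
    return ts

# 1 Hz samples with one late reading
stamps = [1000, 1001, 1002, 1004, 1005]
enc = delta_of_delta(stamps)
# enc == [1000, 1, 0, 1, -1]  -- mostly zeros on a steady clock
assert reconstruct(enc) == stamps
```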

&lt;h3&gt;
  
  
  Graph Engine
&lt;/h3&gt;

&lt;p&gt;Nodes and edges with property bags. Adjacency lists for O(degree) traversal instead of O(E) full-edge scan. Label index and relationship index for fast filtered lookups. Batch operations for bulk graph construction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming Engine
&lt;/h3&gt;

&lt;p&gt;Pub/sub channels with persistent subscriptions, consumer groups, and change data capture with before/after images. Event types: Create, Update, Delete, Custom. Channel history retrieval. Atomic stats tracking using lock-free atomics instead of RwLock contention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Database Isolation
&lt;/h3&gt;

&lt;p&gt;Each application gets its own isolated database namespace. Pass &lt;code&gt;{"database": "my_app", "sql": "..."}&lt;/code&gt; on any query — databases are auto-provisioned on first use. Separate persistence files per database, separate table namespaces, no cross-contamination. One Aegis-DB instance serving multiple apps with full isolation.&lt;/p&gt;

&lt;p&gt;634 tests across the workspace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Performance without GC pauses.&lt;/strong&gt; A database is the worst place for garbage collection. Rust's ownership model gives predictable latency. The KV store does 12.3 million reads per second — that doesn't happen if you're stopping the world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fearless concurrency.&lt;/strong&gt; Thousands of connections hit the same data structures, and Rust's type system catches data races at compile time. DashMap for the KV store, RwLock per database for SQL, atomics for stats counters. Concurrency bugs in Aegis-DB have been vanishingly rare as a result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Single binary deployment.&lt;/strong&gt; &lt;code&gt;cargo install aegis-server&lt;/code&gt; gives you a running database. No JVM, no Python, no node_modules. One 27 MB binary you can scp to a Raspberry Pi and run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Cross-compilation.&lt;/strong&gt; &lt;code&gt;cargo build --release --target aarch64-unknown-linux-musl&lt;/code&gt; produces a static binary for ARM64. Copy to Pi, chmod +x, done. This is critical when you're deploying to 50+ devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Port, Every Data Model
&lt;/h2&gt;

&lt;p&gt;One REST API on port 9090. SQL, KV, documents, time series, graph, streaming, admin, auth, compliance, bulk import, backups, OTA updates — all through one endpoint. Over 90 routes, all implemented and tested. The full API reference is in the &lt;a href="https://github.com/AutomataNexus/Aegis-DB/blob/main/docs/USER_GUIDE.md" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Deployment: CRDTs and OTA Updates
&lt;/h2&gt;

&lt;p&gt;This is the part that matters most for the NexusBMS use case.&lt;/p&gt;

&lt;p&gt;Each Raspberry Pi runs Aegis-DB locally. It stores sensor data, serves the local control software, and doesn't depend on network connectivity to the central server. If the network goes down, the Pi keeps collecting data and running inference models. When connectivity returns, it syncs.&lt;/p&gt;

&lt;p&gt;The sync mechanism uses &lt;strong&gt;CRDTs&lt;/strong&gt; — Conflict-free Replicated Data Types. Aegis-DB implements eight CRDT types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GCounter&lt;/strong&gt; — Grow-only counter (event counts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PNCounter&lt;/strong&gt; — Positive-negative counter (connection counts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GSet&lt;/strong&gt; — Grow-only set (discovered devices)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TwoPSet&lt;/strong&gt; — Two-phase set (add and remove, but removed items can't be re-added)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORSet&lt;/strong&gt; — Observed-remove set (active alerts — can add, remove, re-add)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LWWRegister&lt;/strong&gt; — Last-writer-wins register (latest sensor reading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MVRegister&lt;/strong&gt; — Multi-value register (concurrent writes preserved until application resolves)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LWWMap&lt;/strong&gt; — Last-writer-wins map (equipment configuration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CRDTs are mathematically guaranteed to converge: each type defines a deterministic merge, so there is no hand-written conflict resolution logic and no merge conflicts. Two Aegis-DB instances that have been disconnected for hours will converge to the same state when they reconnect, regardless of the order in which operations happened.&lt;/p&gt;
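&lt;p&gt;A GCounter is the simplest way to see why merge order can't matter. Each node increments only its own slot, and merge takes the per-slot max, which is commutative, associative, and idempotent. A Python sketch:&lt;/p&gt;

```python
# Sketch of a grow-only counter (GCounter) CRDT: one slot per node,
# merge is element-wise max, so replicas converge in any merge order.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}

    def increment(self, n=1):
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + n

    def merge(self, other):
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

    def value(self):
        return sum(self.slots.values())

# Two Pis count events while disconnected...
a, b = GCounter("pi-a"), GCounter("pi-b")
for _ in range(3):
    a.increment()
for _ in range(5):
    b.increment()

# ...then sync in either order and land on the same state.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 8
```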

&lt;p&gt;The cluster layer also provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raft consensus&lt;/strong&gt; with leader election, log replication, and automatic failover&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector clocks&lt;/strong&gt; and &lt;strong&gt;hybrid clocks&lt;/strong&gt; (Lamport + physical time) for causality tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent hashing&lt;/strong&gt; — HashRing, JumpHash, and Rendezvous hashing for shard routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-phase commit&lt;/strong&gt; for distributed transactions across nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot replication&lt;/strong&gt; with CRC32 integrity verification&lt;/li&gt;
&lt;/ul&gt;
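&lt;p&gt;Of the three shard-routing schemes, rendezvous (highest-random-weight) hashing is the easiest to sketch: every key goes to the node with the highest hash(key, node) score, so removing a node only remaps the keys that lived on it. An illustrative Python version (not Aegis-DB's actual hash function):&lt;/p&gt;

```python
# Sketch of rendezvous hashing. The hash choice (SHA-256 here) is an
# assumption for illustration; any well-distributed hash works.
import hashlib

def rendezvous(key, nodes):
    def score(node):
        digest = hashlib.sha256(f"{key}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

nodes = ["pi-01", "pi-02", "pi-03"]
owner = rendezvous("sensor:ahu1:temp", nodes)

# Removing any *other* node leaves this key where it was.
for gone in nodes:
    if gone != owner:
        remaining = [n for n in nodes if n != gone]
        assert rendezvous("sensor:ahu1:temp", remaining) == owner
```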

&lt;p&gt;For &lt;strong&gt;OTA updates&lt;/strong&gt;, Aegis-DB supports rolling updates across the cluster. The central server creates an update plan, nodes update one at a time (followers first, leader last), each update stages the binary with SHA-256 checksum verification, and if a node fails health checks after update, it automatically rolls back to the previous version. This is how I push new Aegis-DB releases across 50+ Pis without visiting each building.&lt;/p&gt;
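&lt;p&gt;The per-node update flow can be sketched in a few lines: verify the staged binary's checksum before activating it, keep the previous binary, and roll back if the post-update health check fails. This is an illustrative sketch; the file layout and health check are hypothetical, not NexusBMS's actual update agent.&lt;/p&gt;

```python
# Sketch of staged OTA update with SHA-256 verification and auto-rollback.
import hashlib
import os
import tempfile
from pathlib import Path

def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def apply_update(staged, current, expected_sha, health_check):
    if sha256_of(staged) != expected_sha:
        raise ValueError("checksum mismatch: refusing to install")
    backup = Path(str(current) + ".prev")
    backup.write_bytes(Path(current).read_bytes())      # keep rollback copy
    Path(current).write_bytes(Path(staged).read_bytes())
    if not health_check():
        Path(current).write_bytes(backup.read_bytes())  # auto-rollback
        return "rolled-back"
    return "updated"

d = tempfile.mkdtemp()
cur, new = os.path.join(d, "aegis"), os.path.join(d, "aegis.staged")
Path(cur).write_bytes(b"v1")
Path(new).write_bytes(b"v2")

# A failing health check restores the previous binary.
status = apply_update(new, cur, sha256_of(new), health_check=lambda: False)
assert status == "rolled-back" and Path(cur).read_bytes() == b"v1"
```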

&lt;h2&gt;
  
  
  Clustering: Three Nodes in Three Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Node 1 — Leader&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 9090 &lt;span class="nt"&gt;--node-name&lt;/span&gt; Dashboard &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9091,127.0.0.1:7001

&lt;span class="c"&gt;# Node 2 — Follower&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 9091 &lt;span class="nt"&gt;--node-name&lt;/span&gt; NexusScribe &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9090,127.0.0.1:7001 &lt;span class="nt"&gt;--data-dir&lt;/span&gt; /data/nexus

&lt;span class="c"&gt;# Node 3 — Follower&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 7001 &lt;span class="nt"&gt;--node-name&lt;/span&gt; AxonML &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9090,127.0.0.1:9091 &lt;span class="nt"&gt;--data-dir&lt;/span&gt; /data/axon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI discovers nodes automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aegis-client nodes &lt;span class="nb"&gt;sync&lt;/span&gt;                &lt;span class="c"&gt;# Auto-discover all nodes from running cluster&lt;/span&gt;
aegis-client nodes list                &lt;span class="c"&gt;# See name, URL, role, status, node ID&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe query &lt;span class="s2"&gt;"SELECT * FROM sensors"&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml status
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; dashboard shell        &lt;span class="c"&gt;# Interactive SQL shell&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, we run the cluster under PM2 with auto-restart, max restart limits, and restart delays. Nodes track heartbeats, uptime, and version. Admin endpoints let you restart, drain, or remove nodes through the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dashboard: Not an Afterthought
&lt;/h2&gt;

&lt;p&gt;Aegis-DB ships with a full web dashboard built in Leptos/WASM, served on port 8000. It's not a third-party add-on — it's a workspace crate that compiles to WebAssembly and talks to the REST API.&lt;/p&gt;

&lt;p&gt;What's in it: cluster health overview, node monitoring with metrics, KV browser, document collection manager, graph visualization, visual query builder for SQL, data visualization, real-time activity feed, user and role management, settings management, and alerts dashboard.&lt;/p&gt;

&lt;p&gt;It supports MFA login (TOTP with QR code generation and backup codes), and the same RBAC system that protects the API protects the dashboard.&lt;/p&gt;
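&lt;p&gt;TOTP itself is small enough to sketch. RFC 6238 is just HMAC-SHA-1 over a time-step counter with dynamic truncation (a sketch of the algorithm, not the dashboard's implementation, which also handles QR provisioning and backup codes):&lt;/p&gt;

```python
# Sketch of RFC 6238 TOTP (built on RFC 4226 HOTP, SHA-1 variant).
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    return hotp(secret, unix_time // step, digits)

# RFC 6238 test vector: SHA-1, ASCII secret, T=59 -> "94287082"
assert totp(b"12345678901234567890", 59, digits=8) == "94287082"
```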

&lt;h2&gt;
  
  
  Compliance: The Feature Nobody Builds
&lt;/h2&gt;

&lt;p&gt;Most databases treat compliance as documentation — "here's how you &lt;em&gt;could&lt;/em&gt; implement GDPR with our database." Aegis-DB has actual compliance endpoints built into the server. Not middleware you configure. Not plugins you install. REST endpoints under &lt;code&gt;/api/v1/compliance/*&lt;/code&gt; that handle real regulatory workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  GDPR Article 17 — Right to Erasure
&lt;/h3&gt;

&lt;p&gt;When a data subject requests deletion, Aegis-DB doesn't just delete the rows. It searches across all data stores — SQL tables, KV entries, document collections, graph nodes — and removes everything matching the subject identifier. Then it issues a &lt;strong&gt;cryptographic deletion certificate&lt;/strong&gt; — a SHA-256 signed attestation that the data was found and removed, with a timestamp, the requestor identity, and what was deleted. The certificate is independently verifiable through a separate endpoint.&lt;/p&gt;

&lt;p&gt;There's also a full deletion audit trail with integrity verification. Regulators don't want "we deleted it, trust us." They want cryptographic proof.&lt;/p&gt;

&lt;p&gt;The deletion scope is configurable: delete everything, delete from specific collections only, or exclude audit logs (for legal hold requirements).&lt;/p&gt;
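&lt;p&gt;The certificate idea is simple to sketch: hash a canonical JSON attestation so anyone holding the certificate can re-derive and check the digest. Field names below are illustrative, not Aegis-DB's actual schema, and a real deployment would additionally sign the digest with a private key:&lt;/p&gt;

```python
# Sketch of issuing and verifying a deletion certificate over a
# canonical JSON attestation. Hypothetical field names.
import hashlib
import json

def issue_certificate(subject_id, deleted, requestor, timestamp):
    attestation = {
        "subject": subject_id,
        "deleted": sorted(deleted),   # e.g. "sql:users:42", "kv:session:42"
        "requestor": requestor,
        "timestamp": timestamp,
    }
    canonical = json.dumps(attestation, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return {**attestation, "sha256": digest}

def verify_certificate(cert):
    body = {k: v for k, v in cert.items() if k != "sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest() == cert["sha256"]

cert = issue_certificate("user-42", ["sql:users:42", "kv:session:42"],
                         "dpo@example.com", "2026-02-28T20:23:15Z")
assert verify_certificate(cert)
cert["deleted"].append("tampered")   # any edit breaks verification
assert not verify_certificate(cert)
```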

&lt;h3&gt;
  
  
  GDPR Article 20 — Data Portability
&lt;/h3&gt;

&lt;p&gt;Structured export of all data associated with a data subject, in machine-readable JSON and flat CSV formats. Configurable export scope and date range filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  HIPAA PHI Classification
&lt;/h3&gt;

&lt;p&gt;Column-level data classification with six levels: Public, Internal, Confidential, &lt;strong&gt;PHI&lt;/strong&gt;, &lt;strong&gt;PII&lt;/strong&gt;, and Restricted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tag a column as containing Protected Health Information&lt;/span&gt;
POST /api/v1/compliance/classify
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;: &lt;span class="s2"&gt;"patients"&lt;/span&gt;, &lt;span class="s2"&gt;"column"&lt;/span&gt;: &lt;span class="s2"&gt;"diagnosis"&lt;/span&gt;, &lt;span class="s2"&gt;"classification"&lt;/span&gt;: &lt;span class="s2"&gt;"phi"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Query all classifications&lt;/span&gt;
GET /api/v1/compliance/classifications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Classification is enforced at the query engine level. PHI-tagged columns trigger audit logging on every access, and query safety limits prevent bulk extraction of classified data.&lt;/p&gt;

&lt;h3&gt;
  
  
  CCPA Do Not Sell
&lt;/h3&gt;

&lt;p&gt;Tracking and enforcement of consumer opt-out requests via the Do Not Sell endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consent Management
&lt;/h3&gt;

&lt;p&gt;Full consent lifecycle management with 12 purpose types: Marketing, Analytics, ThirdPartySharing, DataProcessing, Personalization, LocationTracking, Profiling, CrossDeviceTracking, Advertising, Research, DoNotSell, and Custom.&lt;/p&gt;

&lt;p&gt;Each consent record tracks: who consented, to what purpose, when, through which channel (WebForm, MobileApp, API, PaperForm, Email, etc.), which privacy policy version they saw, and metadata like IP address and user agent. You can grant, deny, withdraw, renew, and export consent. Full audit trail with actor identification and reason tracking. Consent expiration with automatic validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breach Detection and Notification
&lt;/h3&gt;

&lt;p&gt;Anomaly detection on access patterns with configurable thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failed login detection (default: 5 attempts in 5 minutes)&lt;/li&gt;
&lt;li&gt;Unusual access patterns (default: 100 accesses in 1 minute)&lt;/li&gt;
&lt;li&gt;Mass data operations (default: 1000+ rows affected)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detected breaches are tracked with severity levels (Critical, High, Medium, Low), can be acknowledged and resolved through the API, and generate incident reports. Webhook notifications for real-time alerting to your incident response system.&lt;/p&gt;

&lt;p&gt;These aren't checkboxes. In regulated industries — healthcare, education, financial services, and yes, commercial building management with tenant data — this matters. Every facility I manage has different compliance requirements, and I got tired of bolting compliance onto applications after the fact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Security was built in from the first commit, not bolted on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TLS 1.2/1.3&lt;/strong&gt; via rustls — no OpenSSL dependency, modern cipher suites only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Argon2id&lt;/strong&gt; password hashing — 19 MB memory cost, 2 iterations, unique random salts. Memory-hard, resistant to GPU cracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RBAC&lt;/strong&gt; with three tiers (Admin, Operator, Viewer) and 25+ granular permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2/OIDC&lt;/strong&gt; and &lt;strong&gt;LDAP/Active Directory&lt;/strong&gt; integration — pluggable AuthProvider trait&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MFA&lt;/strong&gt; with TOTP (RFC 6238) — QR code generation, backup codes, enforcement per user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HashiCorp Vault&lt;/strong&gt; integration — Token auth, AppRole, and Kubernetes auth methods. Secrets loaded through a SecretsManager that chains Vault → environment variables → defaults&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; — Token bucket algorithm. 1000 requests/min for API, 30/min for login endpoints, per-client IP (with X-Forwarded-For and X-Real-IP proxy support)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security headers&lt;/strong&gt; on every response — Content-Security-Policy, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, X-XSS-Protection, HSTS when TLS is active&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request IDs&lt;/strong&gt; — UUID per request in x-request-id header for distributed tracing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted backups&lt;/strong&gt; — AES-256-GCM authenticated encryption with random nonces, SHA-256 checksums per file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cryptographic audit log verification&lt;/strong&gt; — tamper-evident audit trail&lt;/li&gt;
&lt;/ul&gt;
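&lt;p&gt;The token bucket behind the rate limiter is worth a quick sketch: tokens refill at a fixed rate up to the bucket capacity, and each request spends one. The numbers below mirror the login-endpoint limit (30/min); the code is illustrative, not the server's middleware:&lt;/p&gt;

```python
# Sketch of a token bucket rate limiter. Time is passed in explicitly
# to keep the example deterministic.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 30 requests/min for login: capacity 30, refill 0.5 tokens/sec.
bucket = TokenBucket(capacity=30, refill_per_sec=0.5)
burst = [bucket.allow(now=0.0) for _ in range(31)]
assert burst.count(True) == 30 and burst[-1] is False
assert bucket.allow(now=2.0)  # 2 s later, one token has refilled
```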

&lt;h2&gt;
  
  
  Storage Engine
&lt;/h2&gt;

&lt;p&gt;Pluggable backend architecture via the &lt;code&gt;StorageBackend&lt;/code&gt; trait. Two implementations: in-memory (for development and testing) and local filesystem (for production with persistence).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-ahead logging&lt;/strong&gt; for durability — every mutation hits the WAL before it's applied. Crash recovery replays the WAL on startup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MVCC with snapshot isolation&lt;/strong&gt; — concurrent reads never block writes, no dirty reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block compression&lt;/strong&gt; — LZ4 (fast), Zstd (balanced), Snappy (Google's fast compressor). Configurable per backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer pool&lt;/strong&gt; with LRU eviction for page caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arena allocators&lt;/strong&gt; for batch memory allocation patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint creation&lt;/strong&gt; for point-in-time recovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VACUUM/Compaction&lt;/strong&gt; — removes dead rows from MVCC, shrinks storage vectors, and fully rebuilds all indexes. Available through &lt;code&gt;POST /api/v1/admin/vacuum&lt;/code&gt; with optional table targeting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a &lt;code&gt;--data-dir&lt;/code&gt; is set, Aegis-DB persists everything to disk: SQL tables, KV data, document collections, time series, graph data, audit logs, and activity records. Daily rotating log files are written to &lt;code&gt;{data_dir}/logs/&lt;/code&gt; via tracing-appender.&lt;/p&gt;
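&lt;p&gt;The WAL discipline is easy to show in miniature: append the mutation to the log first, apply it second, and on startup rebuild state purely by replaying the log. A Python sketch (the real engine logs binary records with checksums; this uses JSON lines):&lt;/p&gt;

```python
# Sketch of write-ahead logging with crash-recovery replay.
import json

class WalStore:
    def __init__(self, wal_lines=None):
        self.wal = []
        self.data = {}
        for line in (wal_lines or []):       # crash recovery: replay the WAL
            self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def execute(self, record):
        self.wal.append(json.dumps(record))  # durability first...
        self._apply(record)                  # ...then apply

store = WalStore()
store.execute({"op": "set", "key": "setpoint:ahu1", "value": 72.0})
store.execute({"op": "set", "key": "setpoint:ahu2", "value": 68.0})
store.execute({"op": "delete", "key": "setpoint:ahu2"})

# Simulate a crash: rebuild purely from the log.
recovered = WalStore(wal_lines=store.wal)
assert recovered.data == {"setpoint:ahu1": 72.0}
```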

&lt;h2&gt;
  
  
  SDKs and Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Python SDK
&lt;/h3&gt;

&lt;p&gt;Async-first client built on aiohttp. Connection pooling, automatic retry with configurable attempts, timeout configuration, streaming query support with batching, and full type definitions. Authentication with username/password or API key, MFA support, database switching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aegisdb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AegisClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AegisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM sensors WHERE location = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AHU-1&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kv_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;setpoint:ahu1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;72.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JavaScript/TypeScript SDK
&lt;/h3&gt;

&lt;p&gt;Fetch API based — works in both browser and Node.js. Promise-based async API, abort signal support for cancellation, full TypeScript type definitions (Row, QueryResult, TableInfo, ColumnInfo, KeyValueEntry, GraphData), token-based session management.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AegisClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aegis-db/client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AegisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM sensors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;Interactive SQL shell, multi-format output (table/JSON/CSV), node registry for cluster shorthand, and KV operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe shell              &lt;span class="c"&gt;# Interactive SQL shell&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe query &lt;span class="s2"&gt;"SELECT 1"&lt;/span&gt;   &lt;span class="c"&gt;# Single query&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml kv &lt;span class="nb"&gt;set &lt;/span&gt;mykey &lt;span class="s2"&gt;"myvalue"&lt;/span&gt;  &lt;span class="c"&gt;# KV operations&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml kv list &lt;span class="nt"&gt;--prefix&lt;/span&gt; &lt;span class="s2"&gt;"sensor:"&lt;/span&gt;
aegis-client nodes &lt;span class="nb"&gt;sync&lt;/span&gt;                        &lt;span class="c"&gt;# Auto-discover cluster&lt;/span&gt;
aegis-client nodes list                        &lt;span class="c"&gt;# Show all nodes with roles&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grafana Data Source Plugin
&lt;/h3&gt;

&lt;p&gt;Native Grafana data source plugin for SQL queries, time series visualization, annotations, and template variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tested on Intel Core Ultra 9 275HX, 55 GB RAM, Rust 1.92.0.&lt;/p&gt;

&lt;h3&gt;
  
  
  Engine-Level (direct calls, no network)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL single-row insert&lt;/td&gt;
&lt;td&gt;223,000 rows/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL batch insert (1000 rows)&lt;/td&gt;
&lt;td&gt;195,000 rows/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV read (64B values)&lt;/td&gt;
&lt;td&gt;12,350,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV write (64B values)&lt;/td&gt;
&lt;td&gt;3,970,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV delete&lt;/td&gt;
&lt;td&gt;2,657,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, 0% contention&lt;/td&gt;
&lt;td&gt;758,000 TPS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, high contention (Zipf)&lt;/td&gt;
&lt;td&gt;2,496,000 TPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vs SpacetimeDB
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Aegis-DB&lt;/th&gt;
&lt;th&gt;SpacetimeDB&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, 0% contention&lt;/td&gt;
&lt;td&gt;758,000 TPS&lt;/td&gt;
&lt;td&gt;107,850 TPS&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, high contention&lt;/td&gt;
&lt;td&gt;2,496,000 TPS&lt;/td&gt;
&lt;td&gt;103,590 TPS&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fund transfer benchmark is a standard database comparison workload: debit one account, credit another, enforce non-negative balances. SpacetimeDB runs this as compiled WASM modules inside the database. Aegis-DB's direct execution API — closure-based indexed updates with pre-resolved column indices, and a combined find+update in a single B-tree lock acquisition — achieves 7-24x the throughput.&lt;/p&gt;
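&lt;p&gt;The post doesn't show the execution API itself, so here is the pattern in miniature: a hedged sketch (Python for brevity; the names and data are invented) of a closure-based update and a combined find+update done under a single lock acquisition, instead of a read-check-write cycle that takes the lock more than once:&lt;/p&gt;

```python
import threading

# Toy single-lock "table": account id to balance. A stand-in for the
# single B-tree lock acquisition described above; names are invented.
lock = threading.Lock()
rows = {1: 100, 2: 50}

def transfer(src, dst, amount):
    """Find both rows and apply the debit and credit under one lock
    acquisition, enforcing non-negative balances."""
    with lock:
        if rows.get(src, 0) >= amount:
            rows[src] -= amount
            rows[dst] = rows.get(dst, 0) + amount
            return True
        return False  # insufficient funds; nothing changed

def update(key, mutate):
    """Closure-based update: apply `mutate` to the row in place,
    inside the same single critical section as the lookup."""
    with lock:
        if key in rows:
            rows[key] = mutate(rows[key])
            return True
        return False

print(transfer(1, 2, 70))              # True: account 1 holds 100
print(transfer(1, 2, 40))              # False: only 30 left
print(update(2, lambda bal: bal + 5))  # True
print(rows)                            # {1: 30, 2: 125}
```

&lt;p&gt;The real engine works per B-tree lock with pre-resolved column indices rather than one global mutex; the sketch only shows the shape of check-and-mutate in one critical section.&lt;/p&gt;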

&lt;h3&gt;
  
  
  HTTP API (50 concurrent connections, 10s duration)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL insert&lt;/td&gt;
&lt;td&gt;80,450 ops/sec&lt;/td&gt;
&lt;td&gt;620 μs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL read&lt;/td&gt;
&lt;td&gt;40,496 ops/sec&lt;/td&gt;
&lt;td&gt;1.2 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV get&lt;/td&gt;
&lt;td&gt;203,117 ops/sec&lt;/td&gt;
&lt;td&gt;245 μs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 80/20 read/write&lt;/td&gt;
&lt;td&gt;23,868 ops/sec&lt;/td&gt;
&lt;td&gt;2.1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are single-node numbers on a fast dev machine. The Pi controllers obviously don't hit these numbers, but they don't need to — 1 Hz sensor polling doesn't require millions of ops per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Trade-offs
&lt;/h2&gt;

&lt;p&gt;Aegis-DB is a unified platform, not six separate databases crammed into one binary. Each paradigm is purpose-built for the workloads it handles — but the design philosophy prioritizes breadth and operational simplicity over matching every niche feature of single-paradigm databases.&lt;/p&gt;

&lt;p&gt;The SQL engine covers the DDL/DML surface that production applications actually use: JOINs, aggregations, subqueries, indexes, ALTER TABLE, constraints. The document store supports the MongoDB-style query operators that handle real filtering workloads. The graph engine provides adjacency-list traversal with property lookups — the operations that equipment relationship mapping and dependency tracking actually need.&lt;/p&gt;
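&lt;p&gt;As a rough illustration of that surface, this is the kind of statement the SQL engine is described as handling (the table and column names are invented for the example, not taken from the post):&lt;/p&gt;

```sql
-- Hypothetical schema: readings(unit_id, temp), units(id, name).
CREATE INDEX idx_readings_unit ON readings (unit_id);

-- An index-backed JOIN with aggregation, from the surface listed above.
SELECT u.name, AVG(r.temp) AS avg_temp, COUNT(*) AS samples
FROM readings r
JOIN units u ON u.id = r.unit_id
WHERE r.temp &gt; 80.0
GROUP BY u.name
ORDER BY avg_temp DESC;
```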

&lt;p&gt;What you get in return is one binary, one port, one backup, one set of credentials, one monitoring dashboard, and one replication layer across all six data models. For teams running on constrained hardware, in multi-tenant environments, or anywhere operational complexity matters more than squeezing out the last 5% of a single paradigm — that's the right trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Licensing
&lt;/h2&gt;

&lt;p&gt;Aegis-DB uses the Business Source License 1.1. You can use it for anything — development, testing, production, internal tools, SaaS applications — except reselling it as a managed database service. In 2030 it converts to Apache 2.0.&lt;/p&gt;

&lt;p&gt;I chose BSL because I wanted the project to be sustainable. Every company can use it for free. The only restriction is that cloud providers can't take it and sell "Managed Aegis-DB" without a commercial license.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;aegis-server
aegis-server

&lt;span class="c"&gt;# SQL&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"sql": "CREATE TABLE sensors (id INT, name TEXT, location TEXT)"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"sql": "INSERT INTO sensors VALUES (1, '&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'supply_air_temp'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;', '&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'AHU-1'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;')"}'&lt;/span&gt;

&lt;span class="c"&gt;# KV&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/kv/keys &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"key": "setpoint:ahu1", "value": {"temp": 72.0}}'&lt;/span&gt;

&lt;span class="c"&gt;# Time Series&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/timeseries/write &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"metric": "supply_air_temp", "value": 54.2, "tags": {"unit": "AHU-1"}}'&lt;/span&gt;

&lt;span class="c"&gt;# Documents&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/documents/collections &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "equipment"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/documents/collections/equipment/documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"type": "AHU", "name": "AHU-1", "capacity_tons": 25, "serves": ["VAV-1", "VAV-2"]}'&lt;/span&gt;

&lt;span class="c"&gt;# Graph&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/graph/nodes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"label": "Equipment", "properties": {"name": "AHU-1", "type": "air_handler"}}'&lt;/span&gt;

&lt;span class="c"&gt;# Streaming&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/streaming/channels &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "alerts"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/streaming/publish &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"channel": "alerts", "event": {"type": "high_temp", "unit": "AHU-1", "value": 85.2}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All six data models, one port, one binary.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/AutomataNexus/Aegis-DB" rel="noopener noreferrer"&gt;github.com/AutomataNexus/Aegis-DB&lt;/a&gt;&lt;br&gt;
Crates.io: &lt;a href="https://crates.io/crates/aegis-server" rel="noopener noreferrer"&gt;crates.io/crates/aegis-server&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Andrew Jewell Sr is the founder of AutomataNexus LLC. Aegis-DB powers the NexusBMS building management platform across 16+ commercial facilities on 50+ edge controllers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>opensource</category>
      <category>iot</category>
    </item>
  </channel>
</rss>
