<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Andrew Jewell Sr</title>
    <description>The latest articles on Forem by Andrew Jewell Sr (@automatanexus).</description>
    <link>https://forem.com/automatanexus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3361531%2Ff373e494-3f75-4d14-85af-5208964c270e.png</url>
      <title>Forem: Andrew Jewell Sr</title>
      <link>https://forem.com/automatanexus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/automatanexus"/>
    <language>en</language>
    <item>
      <title>AxonML -- A PyTorch-equivalent ML framework written in Rust</title>
      <dc:creator>Andrew Jewell Sr</dc:creator>
      <pubDate>Sat, 28 Feb 2026 20:23:15 +0000</pubDate>
      <link>https://forem.com/automatanexus/axonml-a-pytorch-equivalent-ml-framework-written-in-rust-328a</link>
      <guid>https://forem.com/automatanexus/axonml-a-pytorch-equivalent-ml-framework-written-in-rust-328a</guid>
      <description>&lt;p&gt;&lt;em&gt;Andrew Jewell Sr / AutomataNexus LLC&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Over the past year and a half, I've been building AxonML -- a machine learning framework in Rust that aims for feature parity with PyTorch. It's now at v0.3.2: 22 crates, 336 Rust source files, 1,095 passing tests, and it's running production inference on Raspberry Pi edge hardware in commercial buildings. This post covers why I built it, how it's architected, the hard technical problems I ran into, and where it's actually being used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AutomataNexus/AxonML" rel="noopener noreferrer"&gt;github.com/AutomataNexus/AxonML&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT / Apache-2.0&lt;/p&gt;
&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;I built an entire building automation ecosystem from scratch. NexusBMS is the central building management platform -- won an InfluxDB hackathon with it, runs InfluxDB 3.0 OSS alongside my own database (Aegis-DB, also open source). The edge controllers are 50+ Raspberry Pi 4/5s running my custom NexusEdge software: Rust hardware daemons for I2C, BACnet, and Modbus communications, direct HVAC equipment control via analog outputs, 24V triacs, 0-10V inputs, 10K/1K thermistor inputs, and dry contact inputs. Custom control logic per equipment type. 16+ facilities including Taylor University, Element Labs, Byrna Ammunition, St. Jude Catholic School, Heritage Point Retirement Facilities in two different cities. Over 120 pieces of equipment -- air handlers, boilers, cooling towers, pumps, DOAS units, natatorium pool units, exhaust fans, greenhouses.&lt;/p&gt;

&lt;p&gt;The monitoring uses machine learning -- LSTM autoencoders for anomaly detection, GRU networks for failure prediction -- running on those Pi edge controllers mounted in mechanical rooms. Pi 5s have Hailo NPU chips running larger models; Pi 4s run smaller AxonML Rust inference models.&lt;/p&gt;

&lt;p&gt;The original plan was to train models in PyTorch and deploy inference in Python on the Pis. This didn't work well. Python's memory footprint on a 1 GB RAM Pi was too high. Dependency management was fragile. PyTorch's ARM support was incomplete. And I was spending more time fighting the deployment pipeline than building models.&lt;/p&gt;

&lt;p&gt;I wanted a framework where I could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define and train models with PyTorch-like ergonomics&lt;/li&gt;
&lt;li&gt;Compile to a single static binary&lt;/li&gt;
&lt;li&gt;Cross-compile to ARM&lt;/li&gt;
&lt;li&gt;Run inference at 2-3 MB RSS with no runtime dependencies&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Rust was the obvious choice. The question was whether one person could build enough of a framework to actually be useful.&lt;/p&gt;

&lt;p&gt;The answer, it turns out, is yes -- with caveats I'll get into.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture: 22 Crates
&lt;/h2&gt;

&lt;p&gt;AxonML is structured as a Cargo workspace with 22 crates, organized in layers. Each crate is independently testable and can be pulled in via feature flags.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 1: Compute Foundation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-core&lt;/code&gt;&lt;/strong&gt; provides device abstraction across CPU, CUDA, Vulkan, Metal, and WebGPU. The &lt;code&gt;Device&lt;/code&gt; enum dispatches operations to the appropriate backend. &lt;code&gt;Storage&amp;lt;T&amp;gt;&lt;/code&gt; is the reference-counted raw memory backing for tensors. The CUDA backend implements GPU memory allocation, cuBLAS GEMM for matrix multiply, and 20+ element-wise CUDA kernels compiled from PTX source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-tensor&lt;/code&gt;&lt;/strong&gt; implements N-dimensional tensors generic over scalar type: &lt;code&gt;Tensor&amp;lt;T: Scalar&amp;gt;&lt;/code&gt;. Broadcasting follows NumPy rules. Views and slicing are zero-copy where possible (backed by &lt;code&gt;Arc&amp;lt;Storage&amp;gt;&lt;/code&gt;). 60+ operations including arithmetic, reductions (sum, mean, max, min, prod), sorting (sort, argsort, topk), indexing (gather, scatter, nonzero, unique), shape manipulation (flip, roll, squeeze, unsqueeze, permute), and activations (ReLU, Sigmoid, Tanh, Softmax, GELU, SiLU, ELU, LeakyReLU). Sparse tensor support in COO format.&lt;/p&gt;
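&lt;p&gt;The NumPy broadcasting rule is worth spelling out: align shapes from the trailing dimension, and two dimensions are compatible if they are equal or either is 1. A minimal sketch of that shape resolution (illustrative only, not AxonML's internal code):&lt;/p&gt;

```rust
// Sketch of NumPy-style broadcast shape resolution, as described above.
// Illustrative, not AxonML's actual implementation.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0; n];
    for i in 0..n {
        // shapes are right-aligned; missing leading dims count as 1
        let da = if i < n - a.len() { 1 } else { a[i - (n - a.len())] };
        let db = if i < n - b.len() { 1 } else { b[i - (n - b.len())] };
        if da == db || da == 1 || db == 1 {
            out[i] = da.max(db);
        } else {
            return None; // incompatible shapes
        }
    }
    Some(out)
}

fn main() {
    assert_eq!(broadcast_shape(&[3, 1, 5], &[4, 5]), Some(vec![3, 4, 5]));
    assert_eq!(broadcast_shape(&[2, 3], &[3, 2]), None);
    println!("ok");
}
```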

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-autograd&lt;/code&gt;&lt;/strong&gt; is the reverse-mode automatic differentiation engine. &lt;code&gt;Variable&lt;/code&gt; wraps a tensor and connects it to the computational graph via gradient functions. The graph is tape-based with &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; for shared ownership. Backward pass performs topological sort over the graph and applies the chain rule. Includes Automatic Mixed Precision (autocast context for F16 training) and gradient checkpointing (trade compute for memory). Backward functions cover activations (ReLU, Sigmoid, Tanh, Softmax, LeakyReLU, GELU), arithmetic (Add, Sub, Mul, Div, Neg, Pow, Sum, Mean), and linalg (MatMul, Transpose, Reshape, Cat, Select, Expand, SumDim).&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 2: ML Primitives
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-nn&lt;/code&gt;&lt;/strong&gt; provides the &lt;code&gt;Module&lt;/code&gt; trait (with &lt;code&gt;forward()&lt;/code&gt;, &lt;code&gt;parameters()&lt;/code&gt;, &lt;code&gt;train()&lt;/code&gt;/&lt;code&gt;eval()&lt;/code&gt;) and 37+ layer types across 12 source files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Linear, Conv1d, Conv2d, Embedding, ResidualBlock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pooling:&lt;/strong&gt; MaxPool1d/2d, AvgPool1d/2d, AdaptiveAvgPool2d&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalization:&lt;/strong&gt; BatchNorm1d/2d, LayerNorm, GroupNorm, InstanceNorm2d&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regularization:&lt;/strong&gt; Dropout, Dropout2d, AlphaDropout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recurrent:&lt;/strong&gt; RNN, LSTM, GRU (each with cell variants: RNNCell, LSTMCell, GRUCell)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention:&lt;/strong&gt; MultiHeadAttention, CrossAttention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformer:&lt;/strong&gt; TransformerEncoderLayer, TransformerDecoderLayer, TransformerEncoder, TransformerDecoder, Seq2SeqTransformer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph neural networks:&lt;/strong&gt; GCNConv (Graph Convolutional Network), GATConv (Graph Attention Network)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal processing:&lt;/strong&gt; FFT1d, STFT (Short-Time Fourier Transform)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Loss functions: MSELoss, CrossEntropyLoss, BCELoss, BCEWithLogitsLoss, L1Loss, SmoothL1Loss, NLLLoss.&lt;/p&gt;

&lt;p&gt;Initialization: Xavier/Glorot (uniform and normal), Kaiming/He (uniform and normal), Orthogonal, Sparse, plus uniform, normal, constant, zeros, ones, eye, diag.&lt;/p&gt;

&lt;p&gt;Activations: ReLU, Sigmoid, Tanh, GELU, SiLU, ELU, LeakyReLU, Softmax, LogSoftmax, Identity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-optim&lt;/code&gt;&lt;/strong&gt; implements five optimizers: SGD (with momentum, Nesterov, weight decay, dampening), Adam (with AMSGrad), AdamW (decoupled weight decay), RMSprop (centered, with momentum), and LAMB (layer-wise adaptive moments for large-batch training). GradScaler for mixed-precision gradient scaling. Seven learning rate schedulers: StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, OneCycleLR, WarmupLR, ReduceLROnPlateau.&lt;/p&gt;
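&lt;p&gt;To make the optimizer layer concrete, here is a sketch of the standard SGD-with-momentum update rule (PyTorch-style: &lt;code&gt;v = mu*v + g; p -= lr*v&lt;/code&gt;) -- the textbook formula, not AxonML's actual optimizer code:&lt;/p&gt;

```rust
// Sketch of SGD with momentum (PyTorch-style update), illustrative only.
struct SgdMomentum {
    lr: f32,
    momentum: f32,
    velocity: Vec<f32>, // one velocity entry per parameter
}

impl SgdMomentum {
    fn new(lr: f32, momentum: f32, n: usize) -> Self {
        SgdMomentum { lr, momentum, velocity: vec![0.0; n] }
    }
    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        for i in 0..params.len() {
            // v = mu * v + g;  p -= lr * v
            self.velocity[i] = self.momentum * self.velocity[i] + grads[i];
            params[i] -= self.lr * self.velocity[i];
        }
    }
}

fn main() {
    // minimize f(p) = p^2, whose gradient is 2p
    let mut p = vec![1.0f32];
    let mut opt = SgdMomentum::new(0.1, 0.9, 1);
    for _ in 0..100 {
        let g = vec![2.0 * p[0]];
        opt.step(&mut p, &g);
    }
    assert!(p[0].abs() < 0.05); // converged near the minimum
    println!("p = {}", p[0]);
}
```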

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-data&lt;/code&gt;&lt;/strong&gt; provides the &lt;code&gt;Dataset&lt;/code&gt; trait, &lt;code&gt;DataLoader&lt;/code&gt; with batching and shuffling, samplers (Sequential, Random, SubsetRandom, WeightedRandom, Batch), transforms (Normalize, RandomNoise, RandomCrop, RandomFlip, Scale, Clamp), and collate functions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 3: Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-serialize&lt;/code&gt;&lt;/strong&gt; handles model save/load in the &lt;code&gt;.axonml&lt;/code&gt; binary format, JSON, and SafeTensors. StateDict extraction (PyTorch-compatible concept), checkpoint management with builder pattern, format auto-detection from file extensions and magic bytes, PyTorch key conversion utilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-onnx&lt;/code&gt;&lt;/strong&gt; imports and exports ONNX models with 40+ operator implementations at opset version 17.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-quant&lt;/code&gt;&lt;/strong&gt; provides block-based quantization (32-element blocks) in INT8 (Q8_0), INT4 (Q4_0, Q4_1), INT5 (Q5_0, Q5_1), and F16 with four calibration methods: MinMax, Percentile, Entropy, MeanStd. Q4 quantization achieves roughly 8x model size reduction. Parallel processing via Rayon. Compression stats with RMSE, max error, mean error.&lt;/p&gt;
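&lt;p&gt;The block-quantization idea is simple enough to sketch end to end: split the weights into 32-element blocks, store one f32 scale per block, and round each value to a small integer. A Q8_0-style illustration (one scale + 32 bytes per block, so roughly 3.6x smaller than f32; the Q4 formats halve the payload again) -- illustrative, not AxonML's actual code:&lt;/p&gt;

```rust
// Sketch of Q8_0-style block quantization: 32-element blocks, one f32 scale
// per block, values stored as i8. Illustrative only.
const BLOCK: usize = 32;

fn quantize_q8_0(data: &[f32]) -> Vec<(f32, [i8; BLOCK])> {
    data.chunks(BLOCK)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let mut q = [0i8; BLOCK];
            for (i, v) in chunk.iter().enumerate() {
                q[i] = (v / scale).round() as i8;
            }
            (scale, q)
        })
        .collect()
}

fn dequantize(blocks: &[(f32, [i8; BLOCK])]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, q)| q.iter().map(move |&v| v as f32 * scale))
        .collect()
}

fn main() {
    let data: Vec<f32> = (0..64).map(|i| (i as f32 * 0.37).sin()).collect();
    let blocks = quantize_q8_0(&data);
    let restored = dequantize(&blocks);
    let max_err = data
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // rounding error is bounded by half the block scale
    assert!(max_err < 0.01);
    println!("max quantization error: {max_err}");
}
```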

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-fusion&lt;/code&gt;&lt;/strong&gt; detects and applies kernel fusion patterns automatically: MatMul+Bias, MatMul+Bias+ReLU/GELU, Conv+BatchNorm, Conv+BatchNorm+ReLU, elementwise chains, Add+ReLU, Mul+Add FMA. FusedLinear and FusedElementwise (14 elementwise ops). Configurable optimizer with conservative/aggressive modes. Up to 2x speedup for memory-bound operations.&lt;/p&gt;
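&lt;p&gt;Why fusion helps is easy to show: a Mul, Add, ReLU chain run separately makes three passes over memory and allocates two intermediate buffers, while the fused version makes one pass and allocates nothing extra. A toy illustration (not AxonML's fusion machinery):&lt;/p&gt;

```rust
// Toy illustration of elementwise fusion: same result, one memory traversal
// instead of three. Hypothetical code, not AxonML's API.
fn unfused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    let t1: Vec<f32> = x.iter().map(|v| v * a).collect(); // Mul (buffer 1)
    let t2: Vec<f32> = t1.iter().map(|v| v + b).collect(); // Add (buffer 2)
    t2.iter().map(|v| v.max(0.0)).collect() // ReLU
}

fn fused(x: &[f32], a: f32, b: f32) -> Vec<f32> {
    // Mul+Add+ReLU in a single traversal (the FMA and Add+ReLU patterns above)
    x.iter().map(|v| (v * a + b).max(0.0)).collect()
}

fn main() {
    let x = [-2.0, -0.5, 0.0, 1.5];
    assert_eq!(unfused(&x, 2.0, 1.0), fused(&x, 2.0, 1.0));
    println!("{:?}", fused(&x, 2.0, 1.0));
}
```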

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-jit&lt;/code&gt;&lt;/strong&gt; provides an intermediate representation for computation graphs (40+ ops), operation tracing, graph optimization passes (constant folding, dead code elimination, common subexpression elimination, algebraic simplification, elementwise fusion, strength reduction), LRU function caching, and shape inference with broadcasting. Built on a Cranelift foundation for native codegen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-profile&lt;/code&gt;&lt;/strong&gt; includes memory profiling (allocation tracking, peak usage, leak detection), compute profiling (operation timing, FLOPS, throughput, bandwidth), timeline profiling with Chrome trace export (&lt;code&gt;chrome://tracing&lt;/code&gt;), automatic bottleneck analysis (5 categories: SlowOperation, HighCallCount, MemoryHotspot, MemoryLeak, LowThroughput), and reports in Text/JSON/Markdown/HTML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-distributed&lt;/code&gt;&lt;/strong&gt; implements four parallelism strategies: DistributedDataParallel (DDP) with gradient bucketing, Fully Sharded Data Parallel (FSDP) with ZeRO-2/ZeRO-3 and CPU offload, Pipeline Parallelism with microbatching and configurable schedules, and Tensor Parallelism via ColumnParallelLinear/RowParallelLinear. Collective operations: all-reduce (sum, mean, min, max, product), broadcast, all-gather, reduce-scatter, barrier, ring all-reduce, scatter, gather.&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 4: Domain Libraries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-vision&lt;/code&gt;&lt;/strong&gt; provides image transforms (Resize, CenterCrop, RandomHorizontalFlip, RandomVerticalFlip, RandomRotation, ColorJitter, Grayscale, Normalize, Pad), synthetic datasets (MNIST, CIFAR, Fashion-MNIST, CIFAR-100), and architecture implementations: LeNet, SimpleCNN, MLP, ResNet (18/34), VGG (11/13/16/19), Vision Transformer (ViT). Pretrained weight hub with local caching and ImageNet/MNIST/CIFAR normalization presets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-audio&lt;/code&gt;&lt;/strong&gt; provides MelSpectrogram, MFCC, resampling, time stretching, pitch shifting, noise augmentation (SNR-based), audio normalization, and silence trimming. Synthetic datasets for command recognition, music genre classification, and speaker identification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-text&lt;/code&gt;&lt;/strong&gt; provides six tokenizer types: Whitespace, Character, WordPunct, NGram (word and character n-grams), BPE (with training), and Unigram. Vocabulary management with special tokens (PAD, UNK, BOS, EOS, MASK) and frequency-based filtering. Synthetic datasets for sentiment analysis, seq2seq, and language modeling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-llm&lt;/code&gt;&lt;/strong&gt; implements five LLM architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BERT&lt;/strong&gt; -- encoder with BertForSequenceClassification, BertForMaskedLM, BertPooler, configs for base/large/tiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-2&lt;/strong&gt; -- decoder with GPT2LMHead, configs for small/medium/large/xl/tiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLaMA&lt;/strong&gt; -- LLaMAForCausalLM with RMSNorm, RotaryEmbedding, GroupedQueryAttention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral&lt;/strong&gt; -- MistralForCausalLM with sliding window attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phi&lt;/strong&gt; -- PhiForCausalLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus: FlashAttention with KVCache, MultiHeadSelfAttention, CausalSelfAttention. Text generation with greedy, top-k, top-p/nucleus, temperature, and beam search with repetition penalty. Hugging Face model loader with weight mappers for LLaMA, Mistral, and Phi. Pretrained model hub with configs for LLaMA, Mistral, Phi, and Qwen.&lt;/p&gt;
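&lt;p&gt;Two of those decoding strategies -- temperature scaling and top-k filtering -- reduce to a few lines over the logits. A standalone sketch of the standard formulas (not AxonML's generation code):&lt;/p&gt;

```rust
// Sketch of temperature scaling + top-k filtering over logits, two of the
// decoding strategies listed above. Illustrative only.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().fold(f32::NEG_INFINITY, |m, &v| m.max(v));
    let exps: Vec<f32> = logits.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Keep only the k highest logits; everything else gets -inf (probability 0).
fn top_k_filter(logits: &[f32], k: usize) -> Vec<f32> {
    let mut sorted = logits.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let cutoff = sorted[k - 1];
    logits
        .iter()
        .map(|&v| if v >= cutoff { v } else { f32::NEG_INFINITY })
        .collect()
}

fn main() {
    let logits = [2.0f32, 1.0, 0.5, -1.0];
    let temperature = 0.7; // < 1.0 sharpens the distribution
    let scaled: Vec<f32> = logits.iter().map(|v| v / temperature).collect();
    let probs = softmax(&top_k_filter(&scaled, 2));
    assert_eq!(probs[2], 0.0); // filtered out by top-k
    assert_eq!(probs[3], 0.0);
    assert!(probs[0] > probs[1]); // token 0 remains the most likely
    println!("{probs:?}");
}
```

&lt;p&gt;Top-p/nucleus sampling is the same idea with a cumulative-probability cutoff instead of a fixed count.&lt;/p&gt;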
&lt;h3&gt;
  
  
  Layer 5: Application Stack
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-cli&lt;/code&gt;&lt;/strong&gt; is a comprehensive CLI with 50+ commands covering project scaffolding, training, evaluation, inference, model conversion, ONNX export/import, quantization, workspace management, analysis/reports, data management, bundling/deployment, benchmarking, GPU management, pretrained model hub, Kaggle integration, dataset management (NexusConnectBridge API), Weights &amp;amp; Biases experiment tracking, and dashboard/server lifecycle management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-tui&lt;/code&gt;&lt;/strong&gt; is a ratatui-based terminal dashboard with model architecture visualization, dataset explorer, real-time training monitor with sparkline trends and ETA, interactive loss/accuracy/learning rate graphs, file browser with preview, and vi-style keyboard navigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-dashboard&lt;/code&gt;&lt;/strong&gt; is a Leptos/WASM web frontend (client-side rendered) with 20+ page routes: full authentication flow (login/register/session), MFA setup (TOTP + WebAuthn), training run management with real-time WebSocket metrics, model registry (browse/upload/version), inference endpoint deployment, dark mode toggle, toast notifications, slide-out terminal with WebSocket PTY, responsive design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;axonml-server&lt;/code&gt;&lt;/strong&gt; is an Axum-based REST + WebSocket API backend with 50+ endpoints: JWT authentication with refresh tokens, multi-factor authentication (TOTP/RFC 6238 + WebAuthn/FIDO2 hardware keys + recovery codes), Argon2 password hashing, training run management with WebSocket streaming, model registry with file upload/download, inference endpoint management, terminal WebSocket PTY, CORS, structured tracing, and Prometheus metrics export. Uses Aegis-DB as its backing document store.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Hard Problems
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Autograd Without a Garbage Collector
&lt;/h3&gt;

&lt;p&gt;This was the central technical challenge. PyTorch's computational graph is managed by Python's reference counting and garbage collector. When a tensor goes out of scope, its graph node is freed. When you call &lt;code&gt;loss.backward()&lt;/code&gt;, Python's GC ensures the graph stays alive exactly as long as needed.&lt;/p&gt;

&lt;p&gt;Rust doesn't have a GC. The ownership model is fundamentally different.&lt;/p&gt;

&lt;p&gt;My approach: each &lt;code&gt;Variable&lt;/code&gt; (the autograd-aware tensor wrapper) holds an &lt;code&gt;Arc&lt;/code&gt; reference to its gradient function. Gradient functions hold &lt;code&gt;Arc&lt;/code&gt; references to their input variables. This creates a reference-counted graph that stays alive as long as any variable referencing it is alive.&lt;/p&gt;

&lt;p&gt;Backward traversal collects all reachable nodes, topologically sorts them, and applies gradient functions in reverse order. After backward, the graph can be dropped.&lt;/p&gt;
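&lt;p&gt;The whole scheme fits in a small sketch: a reference-counted node per value, each op recording its parents and local derivatives, and a backward pass that topologically sorts the graph and applies the chain rule in reverse. This toy scalar version uses &lt;code&gt;Rc&amp;lt;RefCell&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; for brevity where AxonML uses &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt;; it is illustrative, not AxonML's code:&lt;/p&gt;

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Minimal scalar reverse-mode autograd illustrating the reference-counted
// graph + topological backward described above. Illustrative only.
#[derive(Clone)]
struct Var(Rc<RefCell<Node>>);

struct Node {
    value: f64,
    grad: f64,
    // (parent, local partial derivative d(out)/d(parent))
    parents: Vec<(Var, f64)>,
}

impl Var {
    fn new(value: f64) -> Var {
        Var(Rc::new(RefCell::new(Node { value, grad: 0.0, parents: vec![] })))
    }
    fn value(&self) -> f64 { self.0.borrow().value }
    fn grad(&self) -> f64 { self.0.borrow().grad }

    fn add(&self, other: &Var) -> Var {
        let out = Var::new(self.value() + other.value());
        out.0.borrow_mut().parents = vec![(self.clone(), 1.0), (other.clone(), 1.0)];
        out
    }
    fn mul(&self, other: &Var) -> Var {
        let out = Var::new(self.value() * other.value());
        out.0.borrow_mut().parents =
            vec![(self.clone(), other.value()), (other.clone(), self.value())];
        out
    }

    // DFS post-order gives a topological sort; then apply the chain rule
    // in reverse over that order.
    fn backward(&self) {
        fn topo(v: &Var, seen: &mut Vec<*const Node>, order: &mut Vec<Var>) {
            let ptr = v.0.as_ptr() as *const Node;
            if seen.contains(&ptr) { return; }
            seen.push(ptr);
            for (p, _) in v.0.borrow().parents.iter() { topo(p, seen, order); }
            order.push(v.clone());
        }
        let (mut seen, mut order) = (vec![], vec![]);
        topo(self, &mut seen, &mut order);
        self.0.borrow_mut().grad = 1.0;
        for v in order.iter().rev() {
            let g = v.0.borrow().grad;
            let parents = v.0.borrow().parents.clone();
            for (p, local) in parents {
                p.0.borrow_mut().grad += g * local;
            }
        }
    }
}

fn main() {
    // y = x*x + x  =>  dy/dx = 2x + 1 = 7 at x = 3
    let x = Var::new(3.0);
    let y = x.mul(&x).add(&x);
    y.backward();
    assert_eq!(y.value(), 12.0);
    assert_eq!(x.grad(), 7.0);
    println!("ok");
}
```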

&lt;p&gt;The &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;&amp;gt;&amp;gt;&lt;/code&gt; pattern adds allocation and locking overhead on every forward operation. Each tensor op creates a new &lt;code&gt;Variable&lt;/code&gt; with a new &lt;code&gt;Arc&lt;/code&gt;-wrapped gradient function. For small models and short sequences this is negligible. For very large models with long computational graphs, it's measurable.&lt;/p&gt;

&lt;p&gt;If I were starting over, I'd explore arena-based allocation -- allocate all graph nodes from a bump allocator that gets reset after each backward pass. This would trade some API complexity for better performance characteristics.&lt;/p&gt;

&lt;p&gt;A critical bug I found and fixed: early versions used &lt;code&gt;Variable::new()&lt;/code&gt; for intermediate results, which creates leaf variables that sever the gradient graph. The fix was &lt;code&gt;Variable::from_operation()&lt;/code&gt;, which creates non-leaf variables that properly participate in backpropagation. This is the kind of bug that's obvious in hindsight but took significant debugging to identify (loss would decrease for a few epochs then plateau because gradients weren't flowing through certain layers).&lt;/p&gt;
&lt;h3&gt;
  
  
  CUDA Integration
&lt;/h3&gt;

&lt;p&gt;The CUDA backend was built incrementally. The core abstraction is &lt;code&gt;CudaStorage&lt;/code&gt; -- GPU-resident memory managed via CUDA's &lt;code&gt;cudaMalloc&lt;/code&gt;/&lt;code&gt;cudaFree&lt;/code&gt;, with deallocation in &lt;code&gt;Drop&lt;/code&gt; so GPU memory is freed deterministically when the last reference goes away -- leaking it would require deliberately opting out of RAII.&lt;/p&gt;

&lt;p&gt;Matrix multiplication dispatches to cuBLAS GEMM. Element-wise operations (add, subtract, multiply, divide, relu, sigmoid, tanh, exp, log, sqrt, abs, neg, clamp, pow, and more) use custom CUDA kernels compiled from PTX source.&lt;/p&gt;

&lt;p&gt;The main challenge is keeping tensors on-device. In PyTorch, &lt;code&gt;.to('cuda')&lt;/code&gt; moves a tensor to GPU and subsequent operations stay on GPU. In AxonML, the &lt;code&gt;Device&lt;/code&gt; enum propagates through operations -- if both inputs are on CUDA, the output stays on CUDA. If the devices mismatch, you get an explicit error rather than a silent host-device transfer.&lt;/p&gt;
&lt;h3&gt;
  
  
  Generic Tensors vs. Dynamic Types
&lt;/h3&gt;

&lt;p&gt;I chose &lt;code&gt;Tensor&amp;lt;T: Scalar&amp;gt;&lt;/code&gt; -- tensors are generic over their scalar type. This means &lt;code&gt;Tensor&amp;lt;f32&amp;gt;&lt;/code&gt; and &lt;code&gt;Tensor&amp;lt;f64&amp;gt;&lt;/code&gt; are distinct types. You can't accidentally add a float tensor to an integer tensor: scalar-type mismatches are caught at compile time.&lt;/p&gt;

&lt;p&gt;The tradeoff: you can't dynamically switch dtypes without enum dispatch. PyTorch lets you call &lt;code&gt;.float()&lt;/code&gt; or &lt;code&gt;.half()&lt;/code&gt; and it returns the same type with different internal representation. In AxonML, changing dtype requires converting to a different concrete type. This adds some API friction but eliminates an entire class of runtime bugs.&lt;/p&gt;
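&lt;p&gt;A stripped-down illustration of why the generic design rules out dtype-mixing bugs -- a toy &lt;code&gt;Tensor&amp;lt;T&amp;gt;&lt;/code&gt;, not AxonML's actual type:&lt;/p&gt;

```rust
// Toy illustration of the Tensor<T: Scalar> design: f32 and f64 tensors are
// distinct types, so mixing them is a compile error. Not AxonML's actual API.
struct Tensor<T> {
    data: Vec<T>,
}

impl<T: Copy + std::ops::Add<Output = T>> Tensor<T> {
    fn add(&self, other: &Tensor<T>) -> Tensor<T> {
        Tensor {
            data: self.data.iter().zip(&other.data).map(|(a, b)| *a + *b).collect(),
        }
    }
}

fn main() {
    let a = Tensor { data: vec![1.0f32, 2.0] };
    let b = Tensor { data: vec![3.0f32, 4.0] };
    let c = a.add(&b);
    assert_eq!(c.data, vec![4.0, 6.0]);

    // let d = Tensor { data: vec![1.0f64] };
    // a.add(&d); // compile error: expected Tensor<f32>, found Tensor<f64>
    println!("ok");
}
```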

&lt;p&gt;For the LLM architectures, where mixed-precision training switches between f32 and f16, I implemented the AMP autocast context that handles the conversion explicitly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cross-Compilation for ARM
&lt;/h3&gt;

&lt;p&gt;The deployment target for my HVAC models is &lt;code&gt;armv7-unknown-linux-musleabihf&lt;/code&gt; -- 32-bit ARM with hardware floating point, statically linked against musl libc. The build command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt; armv7-unknown-linux-musleabihf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces a single binary with no dynamic library dependencies. Copy it to the Raspberry Pi, set it executable, and it runs. No cross-compilation toolchain complexity beyond having the right Rust target installed.&lt;/p&gt;

&lt;p&gt;The inference binaries use pure tensor operations -- no autograd tape, no gradient tracking, no optimizer state. This keeps the binary small and the runtime footprint minimal. Each inference daemon runs at ~2-3 MB RSS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production: HVAC Predictive Maintenance
&lt;/h2&gt;

&lt;p&gt;This is where AxonML proves itself beyond benchmarks and test suites.&lt;/p&gt;

&lt;p&gt;I have 69 trained &lt;code&gt;.axonml&lt;/code&gt; model files across 7 commercial building facilities: FCOG, Warren, Huntington, Akron, Hopebridge, NE Realty, and a unified NexusBMS system. The models cover a wide range of HVAC equipment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air handlers&lt;/strong&gt; -- supply air temperature prediction, mixed air anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boilers&lt;/strong&gt; -- steam/comfort/domestic hot water anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chillers&lt;/strong&gt; -- condenser/evaporator anomaly patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VAV boxes&lt;/strong&gt; -- zone temperature and airflow prediction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan coils&lt;/strong&gt; -- heating/cooling valve anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make-up air units&lt;/strong&gt; -- outside air conditioning monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOAS&lt;/strong&gt; (Dedicated Outdoor Air Systems) -- ventilation anomalies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pumps&lt;/strong&gt; -- flow and pressure anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steam systems&lt;/strong&gt; -- bundle condition and trap monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model architectures are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detectors:&lt;/strong&gt; LSTM autoencoders that learn normal operating patterns and flag deviations. An input sequence of sensor readings goes through an LSTM encoder, gets compressed to a latent representation, then reconstructed by an LSTM decoder. Reconstruction error above a threshold signals anomalous behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure predictors:&lt;/strong&gt; GRU networks that take recent sensor history and predict probability of equipment failure in the near future&lt;/li&gt;
&lt;/ul&gt;
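&lt;p&gt;The decision step at the end of the autoencoder pipeline is just reconstruction error against a threshold. A sketch of that scoring step -- the autoencoder itself is elided, with &lt;code&gt;recon&lt;/code&gt; standing in for the decoder's output; illustrative, not the production code:&lt;/p&gt;

```rust
// Sketch of the anomaly-scoring step described above: mean squared
// reconstruction error compared against a threshold. Illustrative only.
fn reconstruction_error(input: &[f32], recon: &[f32]) -> f32 {
    input
        .iter()
        .zip(recon)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        / input.len() as f32
}

fn is_anomalous(input: &[f32], recon: &[f32], threshold: f32) -> bool {
    reconstruction_error(input, recon) > threshold
}

fn main() {
    let readings = [20.1f32, 20.3, 20.2, 20.4]; // e.g. supply air temps
    let good_recon = [20.0f32, 20.3, 20.3, 20.4]; // model reproduces them well
    let bad_recon = [18.0f32, 19.0, 22.0, 23.0]; // unseen pattern: poor fit
    assert!(!is_anomalous(&readings, &good_recon, 0.05));
    assert!(is_anomalous(&readings, &bad_recon, 0.05));
    println!("ok");
}
```

&lt;p&gt;The threshold is calibrated per equipment type from the training data's error distribution.&lt;/p&gt;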

&lt;p&gt;12 of these models are running live inference on Raspberry Pi edge controllers. The deployment pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Train on server (CPU) using AxonML&lt;/li&gt;
&lt;li&gt;Save model weights as &lt;code&gt;.axonml&lt;/code&gt; files (or quantize to INT4/INT8 for smaller footprint)&lt;/li&gt;
&lt;li&gt;Cross-compile inference daemon to ARM static binary&lt;/li&gt;
&lt;li&gt;Deploy to Pi via the building management system's OTA update pipeline&lt;/li&gt;
&lt;li&gt;PM2 manages the process (auto-restart, log management)&lt;/li&gt;
&lt;li&gt;Daemon polls local NexusEdge controller for sensor data at 1 Hz&lt;/li&gt;
&lt;li&gt;Runs inference, maintains rolling time-series buffers&lt;/li&gt;
&lt;li&gt;Exposes anomaly scores and failure predictions via REST API (&lt;code&gt;/api/inference/latest&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;NexusBMS building management dashboard consumes the API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The NexusBMS system alone has 22 trained models covering every major equipment type in a commercial building. Each model trains in minutes on CPU, serializes to a few hundred KB (or less with quantization), and runs inference in microseconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kaggle: Akkadian-to-English Machine Translation
&lt;/h2&gt;

&lt;p&gt;To exercise the seq2seq and NLP capabilities, I entered the Deep Past Initiative Kaggle competition. The task: translate Akkadian cuneiform text to English. The dataset has ~1,561 parallel sentence pairs with 5,571 unique source tokens.&lt;/p&gt;

&lt;p&gt;The AxonML model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BPE tokenizer for both source and target languages&lt;/li&gt;
&lt;li&gt;Sinusoidal positional encoding&lt;/li&gt;
&lt;li&gt;Transformer encoder-decoder with multi-head attention&lt;/li&gt;
&lt;li&gt;Trained end-to-end through AxonML's training pipeline&lt;/li&gt;
&lt;li&gt;Evaluated on BLEU + chrF++&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire pipeline -- data loading, tokenization, vocabulary building, model definition, training loop, checkpoint management, generation with beam search -- runs through AxonML. No Python anywhere in the pipeline.&lt;/p&gt;

&lt;p&gt;This was a good stress test for the framework's NLP capabilities. seq2seq translation exercises: embedding layers, positional encoding, encoder with self-attention, decoder with masked self-attention and cross-attention, output projection, autoregressive generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time model serving with batched inference.&lt;/strong&gt; The inference server works but doesn't batch across concurrent requests yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expanded CUDA kernel coverage.&lt;/strong&gt; More operations need GPU implementations to reduce CPU fallbacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted pretrained weight hosting.&lt;/strong&gt; Currently using a hub config system; want to host weights directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More pretrained weights and ONNX import improvements.&lt;/strong&gt; Making it easier to convert models from Hugging Face&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Should You Use AxonML?
&lt;/h2&gt;

&lt;p&gt;If you're doing standard ML research with Jupyter notebooks, Hugging Face, and cloud GPUs: probably not. PyTorch's ecosystem is vast and mature, and fighting a smaller ecosystem isn't worth it for research iteration speed.&lt;/p&gt;

&lt;p&gt;If you're in one of these situations, it might be worth evaluating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge deployment.&lt;/strong&gt; You need ML inference on constrained hardware without Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rust applications.&lt;/strong&gt; You're building a Rust application that needs embedded ML inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-binary deployment.&lt;/strong&gt; You want a model that compiles to one file with no dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph neural networks in Rust.&lt;/strong&gt; GCNConv and GATConv layers are implemented and ready to use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning ML internals.&lt;/strong&gt; The codebase is MIT/Apache-2.0 and every layer is implemented from scratch in readable Rust. If you want to understand how autograd, attention, LSTM gates, or graph convolutions actually work, the source is there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HVAC/IoT/industrial.&lt;/strong&gt; You're in a similar domain where models need to run on real hardware in real buildings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AxonML is one developer's work. It's not going to outpace PyTorch's development velocity. But it solves a real problem -- production ML on constrained hardware with compile-time safety -- and it's been doing that in production for real buildings with real equipment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AutomataNexus/AxonML" rel="noopener noreferrer"&gt;github.com/AutomataNexus/AxonML&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Andrew Jewell Sr is the founder of AutomataNexus LLC. AxonML is open source under MIT/Apache-2.0 dual license.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>iot</category>
    </item>
    <item>
      <title>Why I Built a Multi-Paradigm Database in Rust — And Deployed It on 50+ Raspberry Pis</title>
      <dc:creator>Andrew Jewell Sr</dc:creator>
      <pubDate>Sat, 28 Feb 2026 19:12:15 +0000</pubDate>
      <link>https://forem.com/automatanexus/why-i-built-a-multi-paradigm-database-in-rust-and-deployed-it-on-50-raspberry-pis-1o7e</link>
      <guid>https://forem.com/automatanexus/why-i-built-a-multi-paradigm-database-in-rust-and-deployed-it-on-50-raspberry-pis-1o7e</guid>
      <description>&lt;p&gt;&lt;em&gt;By Andrew Jewell Sr&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I built an entire building automation ecosystem from scratch, and Aegis-DB is one piece of it.&lt;/p&gt;

&lt;p&gt;NexusBMS is the central building management platform — won an InfluxDB hackathon with it, runs InfluxDB 3.0 OSS alongside Aegis-DB on the central server. The edge controllers are 50+ Raspberry Pi 4/5s running my custom NexusEdge software: Rust hardware daemons for I2C, BACnet, and Modbus communications, direct HVAC equipment control via analog outputs, 24V triacs, 0-10V inputs, 10K/1K thermistor inputs, and dry contact inputs. Custom control logic per equipment type. Pi 5s have Hailo NPU chips running larger ML models for predictive maintenance; Pi 4s run smaller AxonML Rust inference models (AxonML is my own ML framework — also open source).&lt;/p&gt;

&lt;p&gt;16+ facilities. Taylor University, Element Labs, Byrna Ammunition, St. Jude Catholic School, Heritage Point Retirement Facilities in two different cities, and more. Over 120 pieces of equipment — air handlers, boilers, cooling towers, pumps, DOAS units, natatorium pool units, exhaust fans, greenhouses.&lt;/p&gt;

&lt;p&gt;Each Pi needs to store sensor readings, equipment metadata, session state, and stream alerts in real time. And each Pi has 1-4 GB of RAM shared with the hardware daemons, control logic, and ML inference.&lt;/p&gt;

&lt;p&gt;The original plan was the standard stack: Postgres for relational data, InfluxDB for sensor time series, Redis for caching, some message broker for alerts. On a server, that's fine. On a Raspberry Pi in a mechanical room that's already running BACnet/Modbus daemons and ML models with intermittent network to the central server, it's not viable.&lt;/p&gt;

&lt;p&gt;So I built Aegis-DB — a single Rust binary that handles SQL, key-value, document, time series, graph, and event streaming through one REST API on one port. It runs on every Pi at ~50 MB RSS and replicates to the central NexusBMS server using CRDTs.&lt;/p&gt;

&lt;p&gt;But Aegis-DB isn't just for edge devices. It's also the primary database for my PWAs, mobile apps, and the central NexusBMS server itself. The edge deployment is what forced the design to be efficient — but that efficiency benefits every deployment. It scales from a mechanical room Pi to a full server without changing a line of configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Six Data Models, One Raspberry Pi, Plus Everything Else
&lt;/h2&gt;

&lt;p&gt;Each edge controller in a commercial building is already busy. NexusEdge runs Rust hardware daemons handling I2C, BACnet, and Modbus communications. It drives analog outputs, 24V triacs, 0-10V signals, reads 10K/1K thermistors and dry contacts. Custom HVAC control logic runs per equipment type. Pi 5s run ML models on Hailo NPU chips; Pi 4s run AxonML Rust inference. That's the baseline workload before you add a database.&lt;/p&gt;

&lt;p&gt;On top of that, each controller needs to handle multiple kinds of data simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time series:&lt;/strong&gt; Temperature sensors, pressure readings, flow rates — sampled at 1-10 Hz, stored for trending and anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documents:&lt;/strong&gt; Equipment configurations, maintenance records, BACnet point mappings — semi-structured JSON that changes per equipment type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key-value:&lt;/strong&gt; Session tokens, cached control setpoints, runtime state that needs sub-millisecond reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL:&lt;/strong&gt; Relational metadata — which sensors belong to which equipment, which equipment belongs to which building, scheduling tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; Real-time alerts when a sensor reading crosses a threshold, change data capture for audit trails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph:&lt;/strong&gt; Equipment relationships — this AHU feeds these VAV boxes, this chiller serves these air handlers, this boiler serves these heating coils&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a server you'd run six different databases. On a Pi that's already running hardware daemons and ML inference with 1 GB of RAM, you run Aegis-DB. And on the server, you also run Aegis-DB — same binary, same API, just with more headroom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: 13 Crates, ~60,000 Lines of Rust
&lt;/h2&gt;

&lt;p&gt;Aegis-DB is a Cargo workspace with 13 crates in a layered dependency structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aegis-dashboard    → Leptos/WASM web UI (cluster monitoring, data browsers, query builder)
aegis-cli          → CLI with interactive SQL shell, node registry, multi-format output
aegis-server       → REST API (Axum, port 9090) + middleware stack + all endpoint handlers
    ↓
aegis-client       → Rust SDK with connection pooling
aegis-query        → SQL parser (sqlparser) → analyzer → cost-based planner → volcano executor
aegis-timeseries   → Gorilla compression, delta-of-delta timestamps, retention policies
aegis-document     → JSON document store, MongoDB-style queries, collection indexes
aegis-streaming    → Pub/sub channels, consumer groups, CDC with before/after images
aegis-replication  → Raft consensus, CRDTs, 2PC, consistent hashing, vector clocks
aegis-monitoring   → System metrics (CPU/RAM/disk/network), alerts, query statistics
aegis-updates      → OTA rolling updates with SHA-256 verification and auto-rollback
    ↓
aegis-storage      → Pluggable backends (Memory, LocalFS), WAL, MVCC, block compression
aegis-memory       → Arena allocators, buffer pool with LRU eviction
    ↓
aegis-common       → Shared types, unified AegisError, configuration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each paradigm is a real implementation, not a shim over a KV store.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL Engine
&lt;/h3&gt;

&lt;p&gt;The SQL engine uses the &lt;code&gt;sqlparser&lt;/code&gt; crate for parsing into an AST, then runs through a semantic analyzer, a cost-based query planner, and a volcano-model executor with vectorized batch processing (configurable batch size, default 1024 rows per batch).&lt;/p&gt;
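&lt;p&gt;To make the volcano model concrete, here's a minimal Python sketch (purely illustrative; the real executor is Rust and vectorized): each operator is an iterator that pulls fixed-size batches from its child instead of materializing whole result sets.&lt;/p&gt;

```python
# Illustrative sketch of a volcano-model executor with batched pulls.
# Operator names and structure are hypothetical, not Aegis-DB's API.

BATCH_SIZE = 4  # Aegis-DB defaults to 1024 rows per batch

def scan(rows, batch_size=BATCH_SIZE):
    """Leaf operator: yields rows in fixed-size batches."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def filter_op(child, predicate):
    """Pulls batches from its child and keeps matching rows."""
    for batch in child:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept

def project(child, columns):
    """Keeps only the requested columns of each row."""
    for batch in child:
        yield [{c: row[c] for c in columns} for row in batch]

rows = [{"id": i, "temp": 60 + i} for i in range(10)]
plan = project(filter_op(scan(rows), lambda r: r["temp"] > 65), ["id"])
result = [row for batch in plan for row in batch]
# result == [{"id": 6}, {"id": 7}, {"id": 8}, {"id": 9}]
```

&lt;p&gt;The pull-based shape is what lets limits like &lt;code&gt;max_result_rows&lt;/code&gt; stop a query early: the root simply stops pulling.&lt;/p&gt;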

&lt;p&gt;It supports the full DDL/DML surface you'd expect: CREATE/DROP/ALTER TABLE (with ADD/DROP/RENAME COLUMN, ALTER COLUMN type changes, ADD/DROP CONSTRAINT for PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK), CREATE/DROP INDEX, INSERT, UPDATE, DELETE, SELECT with JOINs, aggregations, subqueries, WHERE/GROUP BY/HAVING/ORDER BY/LIMIT. ALTER TABLE supports IF EXISTS/IF NOT EXISTS clauses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexes:&lt;/strong&gt; B-tree indexes for range queries and hash indexes for point lookups. Indexes accelerate SELECT, UPDATE, and DELETE — not just reads. When an UPDATE's WHERE clause matches an indexed column, Aegis-DB uses the index to find the rows in O(log N) instead of scanning the full table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct execution API:&lt;/strong&gt; For hot paths where SQL parsing overhead matters, Aegis-DB has a closure-based direct execution API (&lt;code&gt;execute_update_indexed_fn&lt;/code&gt;) that bypasses parsing, planning, and expression evaluation entirely. Pre-resolved column indices, combined find+lookup in a single B-tree lock acquisition. This is how the fund transfer benchmark hits 758K TPS — the same operations through the SQL path run at ~40K TPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan cache:&lt;/strong&gt; An LRU cache (1024 entries) stores parsed and planned queries. Repeated SQL statements skip re-parsing and re-planning entirely.&lt;/p&gt;
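&lt;p&gt;The cache behaves like any LRU keyed by SQL text. A hedged Python sketch of the idea (the stored "plan" here is just a placeholder):&lt;/p&gt;

```python
# Sketch of an LRU plan cache keyed by SQL text. Capacity and eviction
# mirror the description (1024 entries, LRU); not Aegis-DB's actual code.
from collections import OrderedDict

class PlanCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, sql):
        plan = self.entries.get(sql)
        if plan is not None:
            self.entries.move_to_end(sql)  # mark as most recently used
        return plan

    def put(self, sql, plan):
        self.entries[sql] = plan
        self.entries.move_to_end(sql)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = PlanCache(capacity=2)
cache.put("SELECT 1", "plan-1")
cache.put("SELECT 2", "plan-2")
cache.get("SELECT 1")            # touch: SELECT 1 is now most recent
cache.put("SELECT 3", "plan-3")  # evicts SELECT 2
assert cache.get("SELECT 2") is None
assert cache.get("SELECT 1") == "plan-1"
```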

&lt;p&gt;&lt;strong&gt;Query safety:&lt;/strong&gt; Configurable &lt;code&gt;max_result_rows&lt;/code&gt; (default 100,000) and &lt;code&gt;query_timeout_secs&lt;/code&gt; (default 30) enforced at the executor level. Queries that exceed either limit return a clear error (HTTP 413 or 408) instead of consuming unbounded memory or CPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  KV Store
&lt;/h3&gt;

&lt;p&gt;DashMap-based concurrent hashmap. 12.3 million reads per second at the engine level, 203K reads/sec over HTTP. Supports optional TTL per key, prefix-based listing, and JSON values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document Store
&lt;/h3&gt;

&lt;p&gt;JSON document collections with full CRUD. MongoDB-style query operators: &lt;code&gt;$eq&lt;/code&gt;, &lt;code&gt;$ne&lt;/code&gt;, &lt;code&gt;$gt&lt;/code&gt;, &lt;code&gt;$gte&lt;/code&gt;, &lt;code&gt;$lt&lt;/code&gt;, &lt;code&gt;$lte&lt;/code&gt;, &lt;code&gt;$in&lt;/code&gt;, &lt;code&gt;$nin&lt;/code&gt;, &lt;code&gt;$exists&lt;/code&gt;, &lt;code&gt;$regex&lt;/code&gt;, &lt;code&gt;$and&lt;/code&gt;, &lt;code&gt;$or&lt;/code&gt;, &lt;code&gt;$contains&lt;/code&gt;, &lt;code&gt;$startsWith&lt;/code&gt;, &lt;code&gt;$endsWith&lt;/code&gt;. Collection-level indexing (hash and B-tree), sort/skip/limit/projection, and schema-optional storage. Index-accelerated queries route equality filters to hash/B-tree indexes for O(1) lookup instead of full-collection scan.&lt;/p&gt;
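&lt;p&gt;A minimal sketch of how a matcher for these operators works, covering a handful of them in Python (illustrative only; the real engine also routes equality filters to indexes instead of scanning):&lt;/p&gt;

```python
# Sketch of a MongoDB-style filter matcher for a few of the listed
# operators. Semantics are simplified (e.g. $exists treats null as missing).
import re

OPS = {
    "$eq":  lambda v, arg: v == arg,
    "$ne":  lambda v, arg: v != arg,
    "$gt":  lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt":  lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$in":  lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$exists": lambda v, arg: (v is not None) == arg,
    "$regex":  lambda v, arg: v is not None and re.search(arg, v) is not None,
}

def matches(doc, query):
    for field, cond in query.items():
        if field == "$and":
            if not all(matches(doc, q) for q in cond):
                return False
        elif field == "$or":
            if not any(matches(doc, q) for q in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(field)
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif doc.get(field) != cond:  # bare value means $eq
            return False
    return True

docs = [{"unit": "AHU-1", "temp": 74}, {"unit": "AHU-2", "temp": 68}]
hot = [d for d in docs if matches(d, {"temp": {"$gt": 70}})]
# hot == [{"unit": "AHU-1", "temp": 74}]
```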

&lt;h3&gt;
  
  
  Time Series Engine
&lt;/h3&gt;

&lt;p&gt;Gorilla compression (delta-of-delta timestamps + XOR floats) for high-density storage. Registered metrics with type information (Counter, Gauge, Histogram, Summary). Tag-based grouping, time-range queries, configurable retention policies with automatic cleanup, and automatic downsampling. Persistence layer with atomic writes and crash recovery. Lazy decompression — only decompresses data points within the requested time bounds.&lt;/p&gt;
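&lt;p&gt;The delta-of-delta trick is worth seeing in miniature: regular sampling makes second-order deltas mostly zero, and near-constant small values are what compress well. A sketch of just the value computation (the real format packs these into variable-width bit fields):&lt;/p&gt;

```python
# Sketch of delta-of-delta timestamp encoding, the idea behind
# Gorilla-style timestamp compression. Computes values only; no bit packing.

def delta_of_delta(timestamps):
    out = [timestamps[0]]            # first timestamp stored raw
    prev_delta = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev_delta = delta
    return out

def reconstruct(encoded):
    ts = [encoded[0]]
    delta = None
    for v in encoded[1:]:
        delta = v if delta is None else delta + v
        ts.append(ts[-1] + delta)
    return ts

# 1 Hz samples with one late reading
stamps = [1000, 1001, 1002, 1004, 1005]
enc = delta_of_delta(stamps)
# enc == [1000, 1, 0, 1, -1]  -- mostly zeros on a steady clock
assert reconstruct(enc) == stamps
```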

&lt;h3&gt;
  
  
  Graph Engine
&lt;/h3&gt;

&lt;p&gt;Nodes and edges with property bags. Adjacency lists for O(degree) traversal instead of O(E) full-edge scan. Label index and relationship index for fast filtered lookups. Batch operations for bulk graph construction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming Engine
&lt;/h3&gt;

&lt;p&gt;Pub/sub channels with persistent subscriptions, consumer groups, and change data capture with before/after images. Event types: Create, Update, Delete, Custom. Channel history retrieval. Atomic stats tracking using lock-free atomics instead of RwLock contention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Database Isolation
&lt;/h3&gt;

&lt;p&gt;Each application gets its own isolated database namespace. Pass &lt;code&gt;{"database": "my_app", "sql": "..."}&lt;/code&gt; on any query — databases are auto-provisioned on first use. Separate persistence files per database, separate table namespaces, no cross-contamination. One Aegis-DB instance serving multiple apps with full isolation.&lt;/p&gt;

&lt;p&gt;634 tests across the workspace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Performance without GC pauses.&lt;/strong&gt; A database is the worst place for garbage collection. Rust's ownership model gives predictable latency. The KV store does 12.3 million reads per second — that doesn't happen if you're stopping the world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fearless concurrency.&lt;/strong&gt; Thousands of connections hit the same data structures, and Rust's type system catches data races at compile time. DashMap for the KV store, RwLock per database for SQL, atomics for stats counters. Concurrency bugs in Aegis-DB have been vanishingly rare as a result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Single binary deployment.&lt;/strong&gt; &lt;code&gt;cargo install aegis-server&lt;/code&gt; gives you a running database. No JVM, no Python, no node_modules. One 27 MB binary you can scp to a Raspberry Pi and run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Cross-compilation.&lt;/strong&gt; &lt;code&gt;cargo build --release --target aarch64-unknown-linux-musl&lt;/code&gt; produces a static binary for ARM64. Copy to Pi, chmod +x, done. This is critical when you're deploying to 50+ devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Port, Every Data Model
&lt;/h2&gt;

&lt;p&gt;One REST API on port 9090. SQL, KV, documents, time series, graph, streaming, admin, auth, compliance, bulk import, backups, OTA updates — all through one endpoint. Over 90 routes, all implemented and tested. The full API reference is in the &lt;a href="https://github.com/AutomataNexus/Aegis-DB/blob/main/docs/USER_GUIDE.md" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Deployment: CRDTs and OTA Updates
&lt;/h2&gt;

&lt;p&gt;This is the part that matters most for the NexusBMS use case.&lt;/p&gt;

&lt;p&gt;Each Raspberry Pi runs Aegis-DB locally. It stores sensor data, serves the local control software, and doesn't depend on network connectivity to the central server. If the network goes down, the Pi keeps collecting data and running inference models. When connectivity returns, it syncs.&lt;/p&gt;

&lt;p&gt;The sync mechanism uses &lt;strong&gt;CRDTs&lt;/strong&gt; — Conflict-free Replicated Data Types. Aegis-DB implements eight CRDT types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GCounter&lt;/strong&gt; — Grow-only counter (event counts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PNCounter&lt;/strong&gt; — Positive-negative counter (connection counts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GSet&lt;/strong&gt; — Grow-only set (discovered devices)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TwoPSet&lt;/strong&gt; — Two-phase set (add and remove, but removed items can't be re-added)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORSet&lt;/strong&gt; — Observed-remove set (active alerts — can add, remove, re-add)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LWWRegister&lt;/strong&gt; — Last-writer-wins register (latest sensor reading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MVRegister&lt;/strong&gt; — Multi-value register (concurrent writes preserved until application resolves)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LWWMap&lt;/strong&gt; — Last-writer-wins map (equipment configuration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CRDTs are mathematically guaranteed to converge: each type defines a deterministic merge, so there is no hand-written conflict resolution logic and no merge conflicts. Two Aegis-DB instances that have been disconnected for hours will converge to the same state when they reconnect, regardless of the order in which operations happened.&lt;/p&gt;
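&lt;p&gt;A GCounter is the simplest way to see why merge order can't matter. Each node increments only its own slot, and merge takes the per-slot max, which is commutative, associative, and idempotent. A Python sketch:&lt;/p&gt;

```python
# Sketch of a grow-only counter (GCounter) CRDT: one slot per node,
# merge is element-wise max, so replicas converge in any merge order.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}

    def increment(self, n=1):
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + n

    def merge(self, other):
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

    def value(self):
        return sum(self.slots.values())

# Two Pis count events while disconnected...
a, b = GCounter("pi-a"), GCounter("pi-b")
for _ in range(3):
    a.increment()
for _ in range(5):
    b.increment()

# ...then sync in either order and land on the same state.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 8
```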

&lt;p&gt;The cluster layer also provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raft consensus&lt;/strong&gt; with leader election, log replication, and automatic failover&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector clocks&lt;/strong&gt; and &lt;strong&gt;hybrid clocks&lt;/strong&gt; (Lamport + physical time) for causality tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent hashing&lt;/strong&gt; — HashRing, JumpHash, and Rendezvous hashing for shard routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-phase commit&lt;/strong&gt; for distributed transactions across nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot replication&lt;/strong&gt; with CRC32 integrity verification&lt;/li&gt;
&lt;/ul&gt;
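&lt;p&gt;Of the three shard-routing schemes, rendezvous (highest-random-weight) hashing is the easiest to sketch: every key goes to the node with the highest hash(key, node) score, so removing a node only remaps the keys that lived on it. An illustrative Python version (not Aegis-DB's actual hash function):&lt;/p&gt;

```python
# Sketch of rendezvous hashing. The hash choice (SHA-256 here) is an
# assumption for illustration; any well-distributed hash works.
import hashlib

def rendezvous(key, nodes):
    def score(node):
        digest = hashlib.sha256(f"{key}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

nodes = ["pi-01", "pi-02", "pi-03"]
owner = rendezvous("sensor:ahu1:temp", nodes)

# Removing any *other* node leaves this key where it was.
for gone in nodes:
    if gone != owner:
        remaining = [n for n in nodes if n != gone]
        assert rendezvous("sensor:ahu1:temp", remaining) == owner
```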

&lt;p&gt;For &lt;strong&gt;OTA updates&lt;/strong&gt;, Aegis-DB supports rolling updates across the cluster. The central server creates an update plan, nodes update one at a time (followers first, leader last), each update stages the binary with SHA-256 checksum verification, and if a node fails health checks after update, it automatically rolls back to the previous version. This is how I push new Aegis-DB releases across 50+ Pis without visiting each building.&lt;/p&gt;
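&lt;p&gt;The per-node update flow can be sketched in a few lines: verify the staged binary's checksum before activating it, keep the previous binary, and roll back if the post-update health check fails. This is an illustrative sketch; the file layout and health check are hypothetical, not NexusBMS's actual update agent.&lt;/p&gt;

```python
# Sketch of staged OTA update with SHA-256 verification and auto-rollback.
import hashlib
import os
import tempfile
from pathlib import Path

def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def apply_update(staged, current, expected_sha, health_check):
    if sha256_of(staged) != expected_sha:
        raise ValueError("checksum mismatch: refusing to install")
    backup = Path(str(current) + ".prev")
    backup.write_bytes(Path(current).read_bytes())      # keep rollback copy
    Path(current).write_bytes(Path(staged).read_bytes())
    if not health_check():
        Path(current).write_bytes(backup.read_bytes())  # auto-rollback
        return "rolled-back"
    return "updated"

d = tempfile.mkdtemp()
cur, new = os.path.join(d, "aegis"), os.path.join(d, "aegis.staged")
Path(cur).write_bytes(b"v1")
Path(new).write_bytes(b"v2")

# A failing health check restores the previous binary.
status = apply_update(new, cur, sha256_of(new), health_check=lambda: False)
assert status == "rolled-back" and Path(cur).read_bytes() == b"v1"
```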

&lt;h2&gt;
  
  
  Clustering: Three Nodes in Three Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Node 1 — Leader&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 9090 &lt;span class="nt"&gt;--node-name&lt;/span&gt; Dashboard &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9091,127.0.0.1:7001

&lt;span class="c"&gt;# Node 2 — Follower&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 9091 &lt;span class="nt"&gt;--node-name&lt;/span&gt; NexusScribe &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9090,127.0.0.1:7001 &lt;span class="nt"&gt;--data-dir&lt;/span&gt; /data/nexus

&lt;span class="c"&gt;# Node 3 — Follower&lt;/span&gt;
aegis-server &lt;span class="nt"&gt;--port&lt;/span&gt; 7001 &lt;span class="nt"&gt;--node-name&lt;/span&gt; AxonML &lt;span class="nt"&gt;--peers&lt;/span&gt; 127.0.0.1:9090,127.0.0.1:9091 &lt;span class="nt"&gt;--data-dir&lt;/span&gt; /data/axon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI discovers nodes automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aegis-client nodes &lt;span class="nb"&gt;sync&lt;/span&gt;                &lt;span class="c"&gt;# Auto-discover all nodes from running cluster&lt;/span&gt;
aegis-client nodes list                &lt;span class="c"&gt;# See name, URL, role, status, node ID&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe query &lt;span class="s2"&gt;"SELECT * FROM sensors"&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml status
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; dashboard shell        &lt;span class="c"&gt;# Interactive SQL shell&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, we run the cluster under PM2 with auto-restart, max restart limits, and restart delays. Nodes track heartbeats, uptime, and version. Admin endpoints let you restart, drain, or remove nodes through the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dashboard: Not an Afterthought
&lt;/h2&gt;

&lt;p&gt;Aegis-DB ships with a full web dashboard built in Leptos/WASM, served on port 8000. It's not a third-party add-on — it's a workspace crate that compiles to WebAssembly and talks to the REST API.&lt;/p&gt;

&lt;p&gt;What's in it: cluster health overview, node monitoring with metrics, KV browser, document collection manager, graph visualization, visual query builder for SQL, data visualization, real-time activity feed, user and role management, settings management, and alerts dashboard.&lt;/p&gt;

&lt;p&gt;It supports MFA login (TOTP with QR code generation and backup codes), and the same RBAC system that protects the API protects the dashboard.&lt;/p&gt;
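&lt;p&gt;TOTP itself is small enough to sketch. RFC 6238 is just HMAC-SHA-1 over a time-step counter with dynamic truncation (a sketch of the algorithm, not the dashboard's implementation, which also handles QR provisioning and backup codes):&lt;/p&gt;

```python
# Sketch of RFC 6238 TOTP (built on RFC 4226 HOTP, SHA-1 variant).
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    return hotp(secret, unix_time // step, digits)

# RFC 6238 test vector: SHA-1, ASCII secret, T=59 -> "94287082"
assert totp(b"12345678901234567890", 59, digits=8) == "94287082"
```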

&lt;h2&gt;
  
  
  Compliance: The Feature Nobody Builds
&lt;/h2&gt;

&lt;p&gt;Most databases treat compliance as documentation — "here's how you &lt;em&gt;could&lt;/em&gt; implement GDPR with our database." Aegis-DB has actual compliance endpoints built into the server. Not middleware you configure. Not plugins you install. REST endpoints under &lt;code&gt;/api/v1/compliance/*&lt;/code&gt; that handle real regulatory workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  GDPR Article 17 — Right to Erasure
&lt;/h3&gt;

&lt;p&gt;When a data subject requests deletion, Aegis-DB doesn't just delete the rows. It searches across all data stores — SQL tables, KV entries, document collections, graph nodes — and removes everything matching the subject identifier. Then it issues a &lt;strong&gt;cryptographic deletion certificate&lt;/strong&gt; — a SHA-256 signed attestation that the data was found and removed, with a timestamp, the requestor identity, and what was deleted. The certificate is independently verifiable through a separate endpoint.&lt;/p&gt;

&lt;p&gt;There's also a full deletion audit trail with integrity verification. Regulators don't want "we deleted it, trust us." They want cryptographic proof.&lt;/p&gt;

&lt;p&gt;The deletion scope is configurable: delete everything, delete from specific collections only, or exclude audit logs (for legal hold requirements).&lt;/p&gt;
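&lt;p&gt;The certificate idea is simple to sketch: hash a canonical JSON attestation so anyone holding the certificate can re-derive and check the digest. Field names below are illustrative, not Aegis-DB's actual schema, and a real deployment would additionally sign the digest with a private key:&lt;/p&gt;

```python
# Sketch of issuing and verifying a deletion certificate over a
# canonical JSON attestation. Hypothetical field names.
import hashlib
import json

def issue_certificate(subject_id, deleted, requestor, timestamp):
    attestation = {
        "subject": subject_id,
        "deleted": sorted(deleted),   # e.g. "sql:users:42", "kv:session:42"
        "requestor": requestor,
        "timestamp": timestamp,
    }
    canonical = json.dumps(attestation, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return {**attestation, "sha256": digest}

def verify_certificate(cert):
    body = {k: v for k, v in cert.items() if k != "sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest() == cert["sha256"]

cert = issue_certificate("user-42", ["sql:users:42", "kv:session:42"],
                         "dpo@example.com", "2026-02-28T20:23:15Z")
assert verify_certificate(cert)
cert["deleted"].append("tampered")   # any edit breaks verification
assert not verify_certificate(cert)
```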

&lt;h3&gt;
  
  
  GDPR Article 20 — Data Portability
&lt;/h3&gt;

&lt;p&gt;Structured export of all data associated with a data subject, in machine-readable JSON and flat CSV formats. Configurable export scope and date range filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  HIPAA PHI Classification
&lt;/h3&gt;

&lt;p&gt;Column-level data classification with six levels: Public, Internal, Confidential, &lt;strong&gt;PHI&lt;/strong&gt;, &lt;strong&gt;PII&lt;/strong&gt;, and Restricted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tag a column as containing Protected Health Information&lt;/span&gt;
POST /api/v1/compliance/classify
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;: &lt;span class="s2"&gt;"patients"&lt;/span&gt;, &lt;span class="s2"&gt;"column"&lt;/span&gt;: &lt;span class="s2"&gt;"diagnosis"&lt;/span&gt;, &lt;span class="s2"&gt;"classification"&lt;/span&gt;: &lt;span class="s2"&gt;"phi"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Query all classifications&lt;/span&gt;
GET /api/v1/compliance/classifications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Classification is enforced at the query engine level. PHI-tagged columns trigger audit logging on every access, and query safety limits prevent bulk extraction of classified data.&lt;/p&gt;

&lt;h3&gt;
  
  
  CCPA Do Not Sell
&lt;/h3&gt;

&lt;p&gt;Tracking and enforcement of consumer opt-out requests via the Do Not Sell endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consent Management
&lt;/h3&gt;

&lt;p&gt;Full consent lifecycle management with 12 purpose types: Marketing, Analytics, ThirdPartySharing, DataProcessing, Personalization, LocationTracking, Profiling, CrossDeviceTracking, Advertising, Research, DoNotSell, and Custom.&lt;/p&gt;

&lt;p&gt;Each consent record tracks: who consented, to what purpose, when, through which channel (WebForm, MobileApp, API, PaperForm, Email, etc.), which privacy policy version they saw, and metadata like IP address and user agent. You can grant, deny, withdraw, renew, and export consent. Full audit trail with actor identification and reason tracking. Consent expiration with automatic validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breach Detection and Notification
&lt;/h3&gt;

&lt;p&gt;Anomaly detection on access patterns with configurable thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failed login detection (default: 5 attempts in 5 minutes)&lt;/li&gt;
&lt;li&gt;Unusual access patterns (default: 100 accesses in 1 minute)&lt;/li&gt;
&lt;li&gt;Mass data operations (default: 1000+ rows affected)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detected breaches are tracked with severity levels (Critical, High, Medium, Low), can be acknowledged and resolved through the API, and generate incident reports. Webhook notifications for real-time alerting to your incident response system.&lt;/p&gt;

&lt;p&gt;These aren't checkboxes. In regulated industries — healthcare, education, financial services, and yes, commercial building management with tenant data — this matters. Every facility I manage has different compliance requirements, and I got tired of bolting compliance onto applications after the fact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Security was built in from the first commit, not bolted on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TLS 1.2/1.3&lt;/strong&gt; via rustls — no OpenSSL dependency, modern cipher suites only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Argon2id&lt;/strong&gt; password hashing — 19 MB memory cost, 2 iterations, unique random salts. Memory-hard, resistant to GPU cracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RBAC&lt;/strong&gt; with three tiers (Admin, Operator, Viewer) and 25+ granular permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2/OIDC&lt;/strong&gt; and &lt;strong&gt;LDAP/Active Directory&lt;/strong&gt; integration — pluggable AuthProvider trait&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MFA&lt;/strong&gt; with TOTP (RFC 6238) — QR code generation, backup codes, enforcement per user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HashiCorp Vault&lt;/strong&gt; integration — Token auth, AppRole, and Kubernetes auth methods. Secrets loaded through a SecretsManager that chains Vault → environment variables → defaults&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; — Token bucket algorithm. 1000 requests/min for API, 30/min for login endpoints, per-client IP (with X-Forwarded-For and X-Real-IP proxy support)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security headers&lt;/strong&gt; on every response — Content-Security-Policy, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, X-XSS-Protection, HSTS when TLS is active&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request IDs&lt;/strong&gt; — UUID per request in x-request-id header for distributed tracing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted backups&lt;/strong&gt; — AES-256-GCM authenticated encryption with random nonces, SHA-256 checksums per file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cryptographic audit log verification&lt;/strong&gt; — tamper-evident audit trail&lt;/li&gt;
&lt;/ul&gt;
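&lt;p&gt;The token bucket behind the rate limiter is worth a quick sketch: tokens refill at a fixed rate up to the bucket capacity, and each request spends one. The numbers below mirror the login-endpoint limit (30/min); the code is illustrative, not the server's middleware:&lt;/p&gt;

```python
# Sketch of a token bucket rate limiter. Time is passed in explicitly
# to keep the example deterministic.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 30 requests/min for login: capacity 30, refill 0.5 tokens/sec.
bucket = TokenBucket(capacity=30, refill_per_sec=0.5)
burst = [bucket.allow(now=0.0) for _ in range(31)]
assert burst.count(True) == 30 and burst[-1] is False
assert bucket.allow(now=2.0)  # 2 s later, one token has refilled
```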

&lt;h2&gt;
  
  
  Storage Engine
&lt;/h2&gt;

&lt;p&gt;Pluggable backend architecture via the &lt;code&gt;StorageBackend&lt;/code&gt; trait. Two implementations: in-memory (for development and testing) and local filesystem (for production with persistence).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-ahead logging&lt;/strong&gt; for durability — every mutation hits the WAL before it's applied. Crash recovery replays the WAL on startup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MVCC with snapshot isolation&lt;/strong&gt; — concurrent reads never block writes, no dirty reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block compression&lt;/strong&gt; — LZ4 (fast), Zstd (balanced), Snappy (Google's fast compressor). Configurable per backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer pool&lt;/strong&gt; with LRU eviction for page caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arena allocators&lt;/strong&gt; for batch memory allocation patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint creation&lt;/strong&gt; for point-in-time recovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VACUUM/Compaction&lt;/strong&gt; — removes dead rows from MVCC, shrinks storage vectors, and fully rebuilds all indexes. Available through &lt;code&gt;POST /api/v1/admin/vacuum&lt;/code&gt; with optional table targeting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a &lt;code&gt;--data-dir&lt;/code&gt; is set, Aegis-DB persists everything to disk: SQL tables, KV data, document collections, time series, graph data, audit logs, and activity records. Daily rotating log files are written to &lt;code&gt;{data_dir}/logs/&lt;/code&gt; via tracing-appender.&lt;/p&gt;
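&lt;p&gt;The WAL discipline is easy to show in miniature: append the mutation to the log first, apply it second, and on startup rebuild state purely by replaying the log. A Python sketch (the real engine logs binary records with checksums; this uses JSON lines):&lt;/p&gt;

```python
# Sketch of write-ahead logging with crash-recovery replay.
import json

class WalStore:
    def __init__(self, wal_lines=None):
        self.wal = []
        self.data = {}
        for line in (wal_lines or []):       # crash recovery: replay the WAL
            self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def execute(self, record):
        self.wal.append(json.dumps(record))  # durability first...
        self._apply(record)                  # ...then apply

store = WalStore()
store.execute({"op": "set", "key": "setpoint:ahu1", "value": 72.0})
store.execute({"op": "set", "key": "setpoint:ahu2", "value": 68.0})
store.execute({"op": "delete", "key": "setpoint:ahu2"})

# Simulate a crash: rebuild purely from the log.
recovered = WalStore(wal_lines=store.wal)
assert recovered.data == {"setpoint:ahu1": 72.0}
```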

&lt;h2&gt;
  
  
  SDKs and Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Python SDK
&lt;/h3&gt;

&lt;p&gt;Async-first client built on aiohttp. Connection pooling, automatic retry with configurable attempts, timeout configuration, streaming query support with batching, and full type definitions. Authentication with username/password or API key, MFA support, database switching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aegisdb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AegisClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AegisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM sensors WHERE location = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AHU-1&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kv_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;setpoint:ahu1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;72.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JavaScript/TypeScript SDK
&lt;/h3&gt;

&lt;p&gt;Fetch API based — works in both browser and Node.js. Promise-based async API, abort signal support for cancellation, full TypeScript type definitions (Row, QueryResult, TableInfo, ColumnInfo, KeyValueEntry, GraphData), token-based session management.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AegisClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aegis-db/client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AegisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM sensors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;Interactive SQL shell, multi-format output (table/JSON/CSV), node registry for cluster shorthand, and KV operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe shell              &lt;span class="c"&gt;# Interactive SQL shell&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; nexusscribe query &lt;span class="s2"&gt;"SELECT 1"&lt;/span&gt;   &lt;span class="c"&gt;# Single query&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml kv &lt;span class="nb"&gt;set &lt;/span&gt;mykey &lt;span class="s2"&gt;"myvalue"&lt;/span&gt;  &lt;span class="c"&gt;# KV operations&lt;/span&gt;
aegis-client &lt;span class="nt"&gt;-d&lt;/span&gt; axonml kv list &lt;span class="nt"&gt;--prefix&lt;/span&gt; &lt;span class="s2"&gt;"sensor:"&lt;/span&gt;
aegis-client nodes &lt;span class="nb"&gt;sync&lt;/span&gt;                        &lt;span class="c"&gt;# Auto-discover cluster&lt;/span&gt;
aegis-client nodes list                        &lt;span class="c"&gt;# Show all nodes with roles&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grafana Data Source Plugin
&lt;/h3&gt;

&lt;p&gt;Native Grafana data source plugin for SQL queries, time series visualization, annotations, and template variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tested on Intel Core Ultra 9 275HX, 55 GB RAM, Rust 1.92.0.&lt;/p&gt;

&lt;h3&gt;
  
  
  Engine-Level (direct calls, no network)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL single-row insert&lt;/td&gt;
&lt;td&gt;223,000 rows/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL batch insert (1000 rows)&lt;/td&gt;
&lt;td&gt;195,000 rows/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV read (64B values)&lt;/td&gt;
&lt;td&gt;12,350,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV write (64B values)&lt;/td&gt;
&lt;td&gt;3,970,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV delete&lt;/td&gt;
&lt;td&gt;2,657,000 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, 0% contention&lt;/td&gt;
&lt;td&gt;758,000 TPS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, high contention (Zipf)&lt;/td&gt;
&lt;td&gt;2,496,000 TPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vs SpacetimeDB
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Aegis-DB&lt;/th&gt;
&lt;th&gt;SpacetimeDB&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, 0% contention&lt;/td&gt;
&lt;td&gt;758,000 TPS&lt;/td&gt;
&lt;td&gt;107,850 TPS&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fund transfer, high contention&lt;/td&gt;
&lt;td&gt;2,496,000 TPS&lt;/td&gt;
&lt;td&gt;103,590 TPS&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fund transfer benchmark is a standard database comparison workload: debit one account, credit another, enforce non-negative balances. SpacetimeDB runs this as compiled WASM modules inside the database. Aegis-DB's direct execution API — closure-based indexed updates with pre-resolved column indices, and a combined find+update in a single B-tree lock acquisition — achieves 7-24x the throughput.&lt;/p&gt;
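&lt;p&gt;The post doesn't show the execution API itself, so here is the pattern in miniature: a hedged sketch (Python for brevity; the names and data are invented) of a closure-based update and a combined find+update done under a single lock acquisition, instead of a read-check-write cycle that takes the lock more than once:&lt;/p&gt;

```python
import threading

# Toy single-lock "table": account id to balance. A stand-in for the
# single B-tree lock acquisition described above; names are invented.
lock = threading.Lock()
rows = {1: 100, 2: 50}

def transfer(src, dst, amount):
    """Find both rows and apply the debit and credit under one lock
    acquisition, enforcing non-negative balances."""
    with lock:
        if rows.get(src, 0) >= amount:
            rows[src] -= amount
            rows[dst] = rows.get(dst, 0) + amount
            return True
        return False  # insufficient funds; nothing changed

def update(key, mutate):
    """Closure-based update: apply `mutate` to the row in place,
    inside the same single critical section as the lookup."""
    with lock:
        if key in rows:
            rows[key] = mutate(rows[key])
            return True
        return False

print(transfer(1, 2, 70))              # True: account 1 holds 100
print(transfer(1, 2, 40))              # False: only 30 left
print(update(2, lambda bal: bal + 5))  # True
print(rows)                            # {1: 30, 2: 125}
```

&lt;p&gt;The real engine works per B-tree lock with pre-resolved column indices rather than one global mutex; the sketch only shows the shape of check-and-mutate in one critical section.&lt;/p&gt;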

&lt;h3&gt;
  
  
  HTTP API (50 concurrent connections, 10s duration)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL insert&lt;/td&gt;
&lt;td&gt;80,450 ops/sec&lt;/td&gt;
&lt;td&gt;620 μs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL read&lt;/td&gt;
&lt;td&gt;40,496 ops/sec&lt;/td&gt;
&lt;td&gt;1.2 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV get&lt;/td&gt;
&lt;td&gt;203,117 ops/sec&lt;/td&gt;
&lt;td&gt;245 μs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 80/20 read/write&lt;/td&gt;
&lt;td&gt;23,868 ops/sec&lt;/td&gt;
&lt;td&gt;2.1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are single-node numbers on a fast dev machine. The Pi controllers obviously don't hit these numbers, but they don't need to — 1 Hz sensor polling doesn't require millions of ops per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Trade-offs
&lt;/h2&gt;

&lt;p&gt;Aegis-DB is a unified platform, not six separate databases crammed into one binary. Each paradigm is purpose-built for the workloads it handles — but the design philosophy prioritizes breadth and operational simplicity over matching every niche feature of single-paradigm databases.&lt;/p&gt;

&lt;p&gt;The SQL engine covers the DDL/DML surface that production applications actually use: JOINs, aggregations, subqueries, indexes, ALTER TABLE, constraints. The document store supports the MongoDB-style query operators that handle real filtering workloads. The graph engine provides adjacency-list traversal with property lookups — the operations that equipment relationship mapping and dependency tracking actually need.&lt;/p&gt;
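&lt;p&gt;As a rough illustration of that surface, this is the kind of statement the SQL engine is described as handling (the table and column names are invented for the example, not taken from the post):&lt;/p&gt;

```sql
-- Hypothetical schema: readings(unit_id, temp), units(id, name).
CREATE INDEX idx_readings_unit ON readings (unit_id);

-- An index-backed JOIN with aggregation, from the surface listed above.
SELECT u.name, AVG(r.temp) AS avg_temp, COUNT(*) AS samples
FROM readings r
JOIN units u ON u.id = r.unit_id
WHERE r.temp &gt; 80.0
GROUP BY u.name
ORDER BY avg_temp DESC;
```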

&lt;p&gt;What you get in return is one binary, one port, one backup, one set of credentials, one monitoring dashboard, and one replication layer across all six data models. For teams running on constrained hardware, in multi-tenant environments, or anywhere operational complexity matters more than squeezing out the last 5% of a single paradigm — that's the right trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Licensing
&lt;/h2&gt;

&lt;p&gt;Aegis-DB uses the Business Source License 1.1. You can use it for anything — development, testing, production, internal tools, SaaS applications — except reselling it as a managed database service. In 2030 it converts to Apache 2.0.&lt;/p&gt;

&lt;p&gt;I chose BSL because I wanted the project to be sustainable. Every company can use it for free. The only restriction is that cloud providers can't take it and sell "Managed Aegis-DB" without a commercial license.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;aegis-server
aegis-server

&lt;span class="c"&gt;# SQL&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"sql": "CREATE TABLE sensors (id INT, name TEXT, location TEXT)"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"sql": "INSERT INTO sensors VALUES (1, '&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'supply_air_temp'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;', '&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'AHU-1'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;')"}'&lt;/span&gt;

&lt;span class="c"&gt;# KV&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/kv/keys &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"key": "setpoint:ahu1", "value": {"temp": 72.0}}'&lt;/span&gt;

&lt;span class="c"&gt;# Time Series&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/timeseries/write &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"metric": "supply_air_temp", "value": 54.2, "tags": {"unit": "AHU-1"}}'&lt;/span&gt;

&lt;span class="c"&gt;# Documents&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/documents/collections &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "equipment"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/documents/collections/equipment/documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"type": "AHU", "name": "AHU-1", "capacity_tons": 25, "serves": ["VAV-1", "VAV-2"]}'&lt;/span&gt;

&lt;span class="c"&gt;# Graph&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/graph/nodes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"label": "Equipment", "properties": {"name": "AHU-1", "type": "air_handler"}}'&lt;/span&gt;

&lt;span class="c"&gt;# Streaming&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/streaming/channels &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name": "alerts"}'&lt;/span&gt;

curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:9090/api/v1/streaming/publish &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"channel": "alerts", "event": {"type": "high_temp", "unit": "AHU-1", "value": 85.2}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All six data models, one port, one binary.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/AutomataNexus/Aegis-DB" rel="noopener noreferrer"&gt;github.com/AutomataNexus/Aegis-DB&lt;/a&gt;&lt;br&gt;
Crates.io: &lt;a href="https://crates.io/crates/aegis-server" rel="noopener noreferrer"&gt;crates.io/crates/aegis-server&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Andrew Jewell Sr is the founder of AutomataNexus LLC. Aegis-DB powers the NexusBMS building management platform across 16+ commercial facilities on 50+ edge controllers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>opensource</category>
      <category>iot</category>
    </item>
  </channel>
</rss>
