Forem

soy

Patent lawyer turned AI engineer. Processed 4M patents with a local LLM on an RTX 5090. Building PatentLLM, an AI-powered patent search. Also ranked #1 on Floodgate (shogi AI). Writing about local LLMs and related topics.

Self-Host Like a Pro: From Security Tools to 100x Faster AI Agent Sandboxing

Comments
3 min read

SQLite, Go/Postgres, & Petabytes: Database Patterns for Builders

1
Comments
4 min read
Local LLMs & Agents: Build Real-time Deepfakes, Agent Frameworks & AI Scientists

Comments
3 min read
Boost Local LLMs: TurboQuant KV Cache, Fast Cold Starts, & Rust GPU Dev

Comments
4 min read
Local LLM Efficiency & Security: TurboQuant Innovations and Supply Chain Alerts

Comments
3 min read
Self-Host Strong, AI Agents Fast, & Master Your JSON Tools

2
Comments
3 min read
Building High-Performance Data Stacks: Vector Search, SQLite Ops, & Open-Source Monitoring

1
Comments
4 min read
Local AI Agents, Voice Models & Self-Hosted Research Tools

Comments
4 min read
Local LLM Power-Ups: Voxtral TTS, TurboQuant, & Sub-Second Cold Starts

Comments
3 min read
GPU-Accelerated LLMs: Serving at 1M Tok/s, Voxtral TTS, & 4-bit Weight Quantization

Comments
3 min read
DIY Compute: Tesla Hacking, RAG Systems, and Blazing Fast AI Agents

Comments
4 min read
Building & Monitoring Data Backends: Tools, Architecture, and Observability

Comments
4 min read
Cutting-Edge AI Agents: Building Multi-Agent Workflows Locally

Comments
3 min read
Local LLM Unleashed: Faster Inference, Instant Starts, & Open TTS

Comments
4 min read
Local LLM Acceleration: Quantization, TTS, and 1M Tokens/Sec

Comments
4 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

1
Comments
4 min read
Databases Are the New AI Moat: Why DB-First Architecture Changes Everything

Comments
5 min read
Local LLM Apps, Persistent Certs & K8s Storage Mastery

Comments
4 min read
DIY Data Stacks: Building, Optimizing, and Self-Hosting Your Data Infrastructure

Comments
3 min read
Next-Gen LLM Dev: APIs, Agents, and Accessible Local Inference

Comments
4 min read
New Arc GPUs, Supply Chain Security, and Deep CUDA Optimization

Comments
3 min read
Local LLMs & Edge AI: Hardware Boost, Security Fixes, and Extreme Compression

Comments
4 min read
Urgent Security Alerts & Self-Hosted Swarm: Building Local LLM Infra Safely

1
Comments
3 min read
New Local-First SQL Tools, Vector Database MVCC, & AI Agent Query Risks

Comments
3 min read
Local AI Agents, Scalable Memory, and Multimodal Creation: Top Dev Tools

Comments
3 min read
Local LLM Security Criticals, Rust on GPU, & Deep Dive into PTX Optimization

Comments
3 min read
Local LLM Revolution: Speed, Security, and Million-Token Contexts

Comments
3 min read
AI's Infrastructure & Agents: From Chips to Code Automation

Comments
4 min read
I Built a SQLite Editor in 180 Lines, Then Rebuilt It in 380 for the Browser

Comments
3 min read
Vision and Hardware Strategy Shaping the Future of AI: From Apple to AGI and AI Chips

Comments
3 min read
The Future of Open Source and Security: From Geopolitics to Threats in the Development Field

Comments
4 min read
The Forefront of Development Efficiency with AI Agents: From OSS to Code Review

Comments
3 min read
The Dawn of the Local AI Era: From iPhone 17 Pro to the Future of NVIDIA RTX

Comments
3 min read
Today's LLM Frontier: From the Breakthrough of Kimi K2.5 to GPT-5.4/Gemini Flash-Lite

Comments
3 min read
Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

Comments
3 min read
Developer Security and AI Industry Trends: Langflow Vulnerability, Cargo Advisory, and the State of AI at GDC

Comments
3 min read
AI and Cloud Infrastructure Convergence: Innovations in Cloudflare Workers AI, Project Nomad, and Trainium

Comments
3 min read
Next-Generation LLM Inference Technology: From Flash-MoE to Gemini Flash-Lite, and Local GPU Utilization

Comments
3 min read
The Wave of Open-Source AI and Investment in Security: Trends from Qwen, MS, and Google

Comments
3 min read
Current Frontline in AI Agent Development: Robust Agent Design and Security Measures

Comments
3 min read
Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

Comments
3 min read
Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

Comments
2 min read
AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures

Comments
3 min read
Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of Astral Acquisition

Comments
3 min read
Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC Latest Trends

Comments
3 min read
AI Questions the Future of the Internet: Freedom, Quality Decline, and Content Reliability

Comments
7 min read
Data Preparation and Security to Accelerate AI Development: Leveraging Open Source Tools

Comments
6 min read
Next-Gen LLMs: Deep Dive into Compact, High-Speed Models and Temporal Reasoning – Gemini 3.1 Flash-Lite, GPT-5.4 mini/nano

Comments
7 min read
AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

Comments
6 min read
2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX

Comments
5 min read
New Synergy of Cloud and AI: Cloudflare Workers AI, Google's OSS Security, and AI Integration in WordPress

Comments
5 min read
RTX 40 Series Makes LLM Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...

Comments
7 min read
The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

1
Comments
9 min read
Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

Comments
4 min read
Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

Comments
6 min read
The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

1
Comments
4 min read
From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

1
Comments
4 min read
SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

Comments
5 min read
Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

3
Comments
4 min read
When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

Comments
3 min read