Forem: Nguyen Dong

How 5 AI Agents Run Our SOC Autonomously — Architecture Deep Dive

Nguyen Dong — Tue, 17 Mar 2026 12:42:42 +0000

We replaced a 6-person SOC team with 5 AI agents running 24/7 for about $5/month per tenant in API costs. Here's the architecture.

The Problem: Alert Fatigue is Killing SMB Security

The average SOC receives 11,000 alerts per day. Enterprise teams with 10+ analysts struggle to keep up. Now imagine an SMB with zero security staff.

That was our starting point. We built VRadar — a cloud SOC platform for SMBs — and quickly realized that collecting alerts is useless if nobody's reading them. A dashboard with 1,000 unread alerts is the same as having no dashboard at all.

So we did something unconventional: we built 5 specialized AI agents, each handling a different aspect of SOC operations. Not one monolithic AI — five focused agents that collaborate.

The 5 Agents

Here's what each agent does and how they interact:

                        ┌─────────────────┐
                        │   AI Operator   │ ← Alert triage (GPT-4o-mini)
                        │   Every 5 min   │   Batch 100 alerts
                        └────────┬────────┘
                                 │ escalate / create incident
                        ┌────────▼────────┐
                        │   AI Monitor    │ ← Infrastructure health
                        │   Every 10 min  │   10 health checks
                        └────────┬────────┘
                                 │ degraded service alert
                        ┌────────▼────────┐
                        │  AI Optimizer   │ ← Resource + threat defense
                        │  & Firewall     │   Auto-block attackers
                        └────────┬────────┘
                                 │ knowledge for responses
                        ┌────────▼────────┐
                        │   AI Care       │ ← Customer support (RAG)
                        │   Real-time     │   Auto-reply chat + social
                        └────────┬────────┘
                                 │ content from knowledge base
                        ┌────────▼────────┐
                        │  AI Marketing   │ ← Content + social media
                        │   On-demand     │   Auto-reply FB comments
                        └─────────────────┘

Agent 1: AI Operator — The Autonomous SOC Analyst

Job: Triage every security alert and decide what to do.

How it works:

Cron job runs every 5 minutes
Pulls up to 100 unprocessed alerts from PostgreSQL
Each alert goes through GPT-4o-mini with function calling
LLM chooses from 5 actions: block_ip, create_incident, acknowledge, escalate, notify_customer
Each action executes real consequences (Wazuh Active Response, incident creation, notifications)
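The five actions above map naturally onto tool definitions in the OpenAI Chat Completions `tools` format. What follows is a minimal sketch, not VRadar's code. The parameter schemas for each action are hypothetical, because the post doesn't show them. So is the `validateToolCall` helper, which checks the model's chosen call before any real action runs.

```typescript
// Tool definitions in the shape the OpenAI Chat Completions API accepts in `tools`.
// The parameter schemas are illustrative assumptions, not VRadar's real ones.
type ParamSchema = { type: string; description?: string; enum?: string[] };

interface ToolDef {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: { type: "object"; properties: Record<string, ParamSchema>; required: string[] };
  };
}

const ACTIONS = ["block_ip", "create_incident", "acknowledge", "escalate", "notify_customer"] as const;
type ActionName = (typeof ACTIONS)[number];

// Every declared parameter is treated as required in this sketch.
function defineTool(action: ActionName, description: string, properties: Record<string, ParamSchema>): ToolDef {
  return {
    type: "function",
    function: { name: action, description, parameters: { type: "object", properties, required: Object.keys(properties) } },
  };
}

const triageTools: ToolDef[] = [
  defineTool("block_ip", "Block a source IP via active response", { ip: { type: "string" }, reason: { type: "string" } }),
  defineTool("create_incident", "Open an incident for this alert", { title: { type: "string" }, severity: { type: "string", enum: ["HIGH", "CRITICAL"] } }),
  defineTool("acknowledge", "Mark the alert as handled", { reason: { type: "string" } }),
  defineTool("escalate", "Hand the alert to a human analyst", { reason: { type: "string" } }),
  defineTool("notify_customer", "Send the tenant a notification", { message: { type: "string" } }),
];

// A tool call as returned by the model: a function name plus JSON-encoded arguments.
interface ToolCall { name: string; arguments: string }

type Dispatch =
  | { ok: true; action: ActionName; args: Record<string, unknown> }
  | { ok: false; error: string };

// Validate a model-chosen tool call before anything executes.
function validateToolCall(call: ToolCall): Dispatch {
  const def = triageTools.find((t) => t.function.name === call.name);
  if (!def) return { ok: false, error: `unknown action: ${call.name}` };
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments);
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const args = parsed as Record<string, unknown>;
  const missing = def.function.parameters.required.filter((k) => !(k in args));
  if (missing.length > 0) return { ok: false, error: `missing: ${missing.join(", ")}` };
  return { ok: true, action: call.name as ActionName, args };
}
```

With this check in place, a call that names an unknown action or omits a required argument is rejected before any active response or notification fires.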

The economics trick — Hybrid AI mode:

90%+ of security alerts are LOW or MEDIUM severity (Windows Event 4624 "successful login", Sysmon process creation, etc.). Sending these to GPT-4o costs $0.002/alert. At 1,000 alerts/day, that's $60/month per tenant.

Our Hybrid mode: LOW + MEDIUM → rule-based auto-acknowledge ($0), HIGH + CRITICAL → LLM triage ($0.0002/alert). Total: ~$2-5/month per tenant. 94% cost savings.

if (processingMode === 'hybrid_ai') {
  const lowMedAlerts = alerts.filter(a => 
    ['LOW', 'MEDIUM'].includes(a.severity));
  // Rule-based: auto-ack, $0
  await autoAcknowledgeLow(lowMedAlerts);

  const highCritAlerts = alerts.filter(a => 
    ['HIGH', 'CRITICAL'].includes(a.severity));
  // LLM: function calling, ~$0.0002/alert
  await processWithLLM(highCritAlerts);
}
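The routing and the cost arithmetic can be checked with a short, self-contained sketch. It partitions a batch in one pass instead of filtering twice, and it estimates monthly spend from a per-alert price. The `Alert` shape, the function names, and the 30-day month are assumptions made for illustration.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
interface Alert { id: string; severity: Severity }

// One pass: LOW/MEDIUM go to the rule-based auto-ack path, HIGH/CRITICAL to the LLM.
function partitionBySeverity(alerts: Alert[]): { ruleBased: Alert[]; llm: Alert[] } {
  const ruleBased: Alert[] = [];
  const llm: Alert[] = [];
  for (const a of alerts) {
    if (a.severity === "LOW" || a.severity === "MEDIUM") ruleBased.push(a);
    else llm.push(a);
  }
  return { ruleBased, llm };
}

// Estimated monthly LLM spend for one tenant, assuming a 30-day month.
function monthlyLlmCostUsd(alertsPerDay: number, shareSentToLlm: number, usdPerAlert: number, days = 30): number {
  return alertsPerDay * shareSentToLlm * usdPerAlert * days;
}

// Baseline from the text: every alert through GPT-4o at $0.002 each.
const everythingToGpt4o = monthlyLlmCostUsd(1000, 1, 0.002); // 60 USD/month
```

To sanity-check a routing change, plug in your own alert volume, LLM share, and per-alert price (which grows with prompt size).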

Human-in-the-loop: Every AI decision is logged in AiOperatorDecision along with a confidence score, and an admin can override any decision. An evaluation system (6 mock scenarios) tests AI accuracy without executing real actions.
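The audit-trail and evaluation ideas can be sketched together. This assumes a hypothetical `Decision` record (the real AiOperatorDecision schema isn't shown) and an in-memory log. In dry-run mode, the executor records what it would have done and never calls the action handlers.

```typescript
// Hypothetical audit record; the real AiOperatorDecision schema is not shown in the post.
interface Decision {
  alertId: string;
  action: string;
  confidence: number;     // 0..1, as reported by the triage step
  dryRun: boolean;
  executed: boolean;
  overriddenBy?: string;  // admin who overrode the decision, if any
}

type ActionHandler = (alertId: string) => void;

class DecisionExecutor {
  readonly log: Decision[] = [];

  constructor(private readonly handlers: Record<string, ActionHandler>, private readonly dryRun: boolean) {}

  // Every decision is logged; the real handler runs only outside dry-run (evaluation) mode.
  execute(alertId: string, action: string, confidence: number): Decision {
    const handler = this.handlers[action];
    let executed = false;
    if (!this.dryRun && handler) {
      handler(alertId);
      executed = true;
    }
    const decision: Decision = { alertId, action, confidence, dryRun: this.dryRun, executed };
    this.log.push(decision);
    return decision;
  }

  // Mark the most recent decision for an alert as overridden, keeping the original entry.
  recordOverride(alertId: string, admin: string): boolean {
    for (let i = this.log.length - 1; i >= 0; i--) {
      const entry = this.log[i];
      if (entry && entry.alertId === alertId) {
        entry.overriddenBy = admin;
        return true;
      }
    }
    return false;
  }
}
```

Keeping overrides as annotations on the original entries, rather than replacing them, means the log still answers "what did the AI decide, and who changed it?"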

Agent 2: AI Monitor — Infrastructure Watchdog

Job: Ensure all 12 Docker containers and security services are healthy.

10 health checks (6 infra + 4 security):

Check	What it does
Docker containers	Auto-discover + verify all 12 containers running
PostgreSQL	Connection + query latency
ClickHouse	Connection + table accessibility
Redis	Connection + memory usage
Disk usage	Alert if > 85%
Memory usage	Alert if > 90%
Anomaly detection	ML service health (IsolationForest + LSTM)
Agent heartbeat	Wazuh agent connectivity check
Failed logins	Brute force detection (suspicious patterns)
SSL certificate	TLS handshake to vradar.io:443, check expiry

Runs every 10 minutes. Results stored in SystemConfig. Uptime trend visualization in the dashboard.
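Threshold-style checks such as disk (> 85%) and memory (> 90%) can be expressed as data and evaluated in one pass. In this sketch the readings are passed in rather than collected from the OS. The `CheckResult` shape is an assumption, not VRadar's actual SystemConfig format.

```typescript
type CheckStatus = "ok" | "alert";

interface ThresholdCheck {
  id: string;
  maxRatio: number; // alert when used/total exceeds this fraction
}

interface CheckResult { id: string; status: CheckStatus; value: number; checkedAt: string }

const THRESHOLDS: ThresholdCheck[] = [
  { id: "disk_usage", maxRatio: 0.85 },
  { id: "memory_usage", maxRatio: 0.9 },
];

// Evaluate each threshold check against the latest readings.
function runThresholdChecks(
  readings: Record<string, { used: number; total: number }>,
  now: Date = new Date(),
): CheckResult[] {
  return THRESHOLDS.map((c) => {
    const r = readings[c.id];
    const value = r && r.total > 0 ? r.used / r.total : NaN;
    // A missing or unreadable metric is reported as an alert, never as healthy.
    const status: CheckStatus = Number.isFinite(value) && value <= c.maxRatio ? "ok" : "alert";
    return { id: c.id, status, value, checkedAt: now.toISOString() };
  });
}
```

Treating a missing reading as an alert matters for a watchdog: a check that silently passes when its data source is down is worse than no check.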

Agent 3: AI Optimizer & Firewall — Self-Defense System

Job: Optimize resources and auto-block attackers.

This agent is unique because it runs on every single HTTP request via middleware:

// threat-defense.middleware.ts — runs on EVERY request
app.use(threatDefenseMiddleware);

What it tracks per IP:

Request rate (Redis counter, 60-second window)
4xx error rate (scanning detection)
Known-bad User-Agent patterns (nmap, sqlmap, nuclei, etc.)

Auto-response: IP exceeds threshold → blocked in Redis → all future requests return 403.
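The per-IP logic can be sketched without Express or Redis. This version makes three substitutions:

- An in-memory `Map`/`Set` stands in for the Redis counter and block list.
- A 60-second fixed window is used for counting.
- The request and 4xx thresholds are hypothetical, since the post doesn't state VRadar's values.

In production the same keys would live in Redis with expiries, so blocks are shared across processes and survive restarts.

```typescript
interface IpState { windowStart: number; requests: number; clientErrors: number }

interface DefenseConfig {
  windowMs: number;    // counting window (60 s in the post)
  maxRequests: number; // hypothetical request threshold per window
  max4xx: number;      // hypothetical 4xx threshold per window (scanning)
}

// Scanner user agents named in the post; matched case-insensitively.
const BAD_UA = /nmap|sqlmap|nuclei/i;

class ThreatDefense {
  private state = new Map<string, IpState>(); // stand-in for Redis counters
  private blocked = new Set<string>();        // stand-in for the Redis block list

  constructor(private cfg: DefenseConfig, private now: () => number = Date.now) {}

  // Before handling a request: returns 403 to short-circuit, or null to continue.
  check(ip: string, userAgent: string): number | null {
    if (this.blocked.has(ip)) return 403;
    if (BAD_UA.test(userAgent)) { this.blocked.add(ip); return 403; }
    const s = this.current(ip);
    s.requests++;
    if (s.requests > this.cfg.maxRequests) { this.blocked.add(ip); return 403; }
    return null;
  }

  // After the response: 4xx responses count toward scanning detection.
  recordStatus(ip: string, status: number): void {
    if (status < 400 || status >= 500) return;
    const s = this.current(ip);
    s.clientErrors++;
    if (s.clientErrors > this.cfg.max4xx) this.blocked.add(ip);
  }

  isBlocked(ip: string): boolean { return this.blocked.has(ip); }

  // Fixed window: counters reset once windowMs has elapsed since the window opened.
  private current(ip: string): IpState {
    const t = this.now();
    let s = this.state.get(ip);
    if (!s || t - s.windowStart >= this.cfg.windowMs) {
      s = { windowStart: t, requests: 0, clientErrors: 0 };
      this.state.set(ip, s);
    }
    return s;
  }
}
```

A fixed window is the simplest choice and maps onto a Redis `INCR` plus expiry. A sliding window smooths out bursts that straddle a window boundary, at the cost of more bookkeeping.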

Resource monitoring (8 sub-checks):

OS disk/RAM usage
Redis memory consumption
ClickHouse table sizes across all tenants
AI cost tracking (LLM API calls in last 24h)
Expired session detection
Device capacity per tenant
Cross-service degradation alerts

Result: We've auto-blocked 2,197 malicious IPs on our VPS without human intervention. Current active blocks: 209.

Agent 4: AI Care — RAG-Powered Customer Support

Job: Auto-reply customer chat messages using Retrieval-Augmented Generation.

Architecture:

Customer sends message → Chat API
    ↓
triggerAICareReply() — async
    ↓
ChromaDB semantic search (all-MiniLM-L6-v2 embeddings)
    ↓
Top 3 relevant knowledge chunks retrieved
    ↓
GPT-4o-mini generates reply with context
    ↓
Confidence check (threshold: 0.7)
    ↓
If confident → auto-reply as AI_CARE bot
If not → escalate to human agent
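The retrieval-and-gate step comes down to three small pieces:

- Rank stored chunks by cosine similarity to the query embedding.
- Keep the top 3.
- Route on a confidence score against the 0.7 threshold.

In this minimal sketch, hand-written vectors stand in for all-MiniLM-L6-v2 embeddings and an in-memory array stands in for ChromaDB. The post doesn't say how the confidence score is derived, so here it is simply an input.

```typescript
interface Chunk { id: string; text: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k = 3): Array<Chunk & { score: number }> {
  return chunks
    .map((c) => ({ ...c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

type Route = { kind: "auto_reply"; reply: string } | { kind: "escalate"; reason: string };

// Auto-reply at or above the threshold (0.7 in the post); otherwise hand off to a human.
function routeReply(reply: string, confidence: number, threshold = 0.7): Route {
  return confidence >= threshold
    ? { kind: "auto_reply", reply }
    : { kind: "escalate", reason: `confidence ${confidence.toFixed(2)} below ${threshold}` };
}
```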

Knowledge base: Kreuzberg (document extraction + OCR) processes uploaded PDFs/DOCX → chunks → ChromaDB vector store. We pre-loaded 820 lines of VRadar product knowledge.
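Chunking sits between extraction and the vector store. The post doesn't give VRadar's chunk size or overlap. This sketch therefore assumes a line-based splitter with a hypothetical maximum chunk length and a configurable line overlap, so text near a chunk boundary can be retrieved from either neighbouring chunk.

```typescript
// Split extracted text into chunks of whole lines. maxChars counts line content only
// (newlines excluded); a single line longer than maxChars becomes its own oversized chunk.
function chunkLines(text: string, maxChars = 500, overlapLines = 1): string[] {
  const lines = text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const out: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const line of lines) {
    if (current.length > 0 && size + line.length > maxChars) {
      out.push(current.join("\n"));
      // Carry the last overlapLines lines into the next chunk (slice(-0) would copy everything).
      current = overlapLines > 0 ? current.slice(-overlapLines) : [];
      size = current.reduce((n, l) => n + l.length, 0);
    }
    current.push(line);
    size += line.length;
  }
  if (current.length > 0) out.push(current.join("\n"));
  return out;
}
```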

Bonus: Works on Facebook Messenger and Zalo OA too. Same RAG pipeline, different input channels.

Agent 5: AI Marketing — Content & Social Manager

Job: Generate marketing content and manage social media interactions.

Content generation: 5 channels (Facebook, LinkedIn, Zalo, Blog, Email) with distinct tones
DALL-E 3 image generation: Branded cybersecurity visuals ($0.04/image)
Facebook comment auto-reply: Webhook receives mentions → RAG-enriched AI response → posts via Graph API
Smart scheduling: Platform-specific optimal posting times

The Shared Brain: Unified LLM Service

All 5 agents share one llm.service.ts:

const response = await callLLMText(prompt, {
  model: 'gpt-4o-mini',     // Default for all agents
  maxTokens: 500,            // Cost control
  systemPrompt: agentPrompt, // Agent-specific context
});

Key design decisions:

Single model everywhere: GPT-4o-mini ($0.15/1M input tokens) instead of GPT-4o ($2.50/1M input tokens). 94% savings, negligible quality difference for SOC tasks.
Redis caching: Knowledge search results (1h TTL) + AI replies (30min TTL). Same question = cached answer = $0.
Graceful degradation: If OpenAI is down, agents log the failure but don't crash. Security monitoring continues without AI triage.
Cost tracking: Every LLM call logged with token count. Dashboard shows daily/weekly AI spend.
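Caching, cost tracking, and graceful degradation fit in one small wrapper. This minimal sketch makes four assumptions:

- An in-memory `Map` stands in for Redis (with Redis you would set the same TTL on the key).
- A hypothetical `callModel` function is injected by the caller.
- The clock is injectable, so expiry can be tested.
- The cache key is the raw prompt.

On a model failure the wrapper returns `null`, so the calling agent can fall back to its non-AI path instead of crashing.

```typescript
// Entries carry their own expiry time (ms since epoch); expired entries are evicted lazily.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.store.delete(key);
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;   // knowledge-search TTL from the post
const THIRTY_MIN_MS = 30 * 60 * 1000; // AI-reply TTL from the post

// Cost of a call at a per-million-token price (e.g. $0.15/1M GPT-4o-mini input tokens).
function tokenCostUsd(tokens: number, usdPerMillion: number): number {
  return (tokens * usdPerMillion) / 1e6;
}

// Hypothetical wrapper: serve from cache, otherwise call the model; on failure,
// log and return null so the caller continues without AI.
async function cachedReply(
  cache: TtlCache<string>,
  prompt: string,
  callModel: (prompt: string) => Promise<string>,
  ttlMs: number = THIRTY_MIN_MS,
): Promise<string | null> {
  const cached = cache.get(prompt);
  if (cached !== undefined) return cached;
  try {
    const reply = await callModel(prompt);
    cache.set(prompt, reply, ttlMs);
    return reply;
  } catch (err) {
    console.error("LLM call failed; continuing without AI", err);
    return null;
  }
}
```

A real deployment would normalise or hash the prompt before using it as a key, so trivially different phrasings of "What is VRadar?" hit the same entry.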

Running Cost Breakdown

Agent	Frequency	Cost/Month (per tenant)
AI Operator (Hybrid)	Every 5 min	~$2-5
AI Monitor	Every 10 min	~$0.50
AI Optimizer	Continuous (middleware)	$0 (rule-based)
AI Care	On-demand (chat)	~$0.10-1.00
AI Marketing	On-demand	~$0.50-2.00
Total		~$3-8/month

Compare this to a 6-person SOC team: $300K-600K/year in salaries alone.

Lessons From Building AI Agents for Production

Don't build one mega-agent. Specialized agents with clear boundaries are easier to debug, test, and iterate. Our AI Operator went through 4 rewrites — without touching the other agents.
Hybrid AI is mandatory. Sending every LOW-severity alert to an LLM is burning money. Rule-based filtering for the 90% + LLM for the 10% = same security, 94% less cost.
Function calling > text parsing. GPT-4o-mini with structured function calling (block_ip, create_incident) is dramatically more reliable than asking it for free-form JSON or parsing its text responses.
Cache aggressively. The same customer asks "What is VRadar?" 50 times. Redis cache = $0 after the first answer.
Log everything. Every AI decision, every confidence score, every action taken. When a customer asks "why did AI block this IP?", you need the audit trail.
Build an evaluation mode. Our AI Operator has 6 test scenarios that run through the full LLM pipeline without executing real actions. Test before you deploy.

Try VRadar

VRadar is live at vradar.io — AI-powered SOC from $25/device/month.

All 5 agents are running in production right now, monitoring real customer networks across ASEAN.

I'm Dong, solo dev from Vietnam. Built VRadar's 5-agent SOC system over 3 months. Happy to deep-dive on any architectural question in the comments.

Tags: #ai #cybersecurity #soc #llm #gpt4 #architecture #startup #buildinpublic #agents #aisecurity #devto #opensource

HIDS + NIDS: Why Your SMB Needs Both (And How We Integrated Wazuh + Suricata in a Single Platform)

Nguyen Dong — Tue, 10 Mar 2026 12:09:42 +0000

Most SMBs think they're "covered" with just antivirus. Here's why that's like locking the front door but leaving every window wide open.

The Blind Spot in SMB Security

I've talked to dozens of SMB owners about their security setup. The conversation usually goes like this:

Me: "What security monitoring do you have?"
Them: "We have antivirus on every computer."
Me: "What about network traffic? Can you see what's going in and out?"
Them: ...silence...

This is the blind spot. Antivirus checks what's ON your computers. But nobody checks what's FLOWING THROUGH your network. A hacker stealing data over DNS tunneling, a compromised device beaconing to a C2 server, lateral movement between machines — antivirus won't catch any of it.

You need two types of monitoring. And no, you don't need a $200K/year SIEM to get them.

HIDS vs. NIDS: A 60-Second Primer

	HIDS (Host-based IDS)	NIDS (Network-based IDS)
What it watches	Individual devices (endpoints)	Network traffic flow
Detects	File changes, process anomalies, login attempts, malware	Port scans, intrusion attempts, data exfiltration, C2 beaconing
Tool	Wazuh Agent	Suricata IDS
Where it runs	On each endpoint	On a network sensor or device
Analogy	Security camera inside each room	Guard at the building entrance

HIDS tells you what happened on a machine. NIDS tells you what's happening on the wire.

You need both. Here's a real example:

A Wazuh alert says "3 failed SSH logins from 182.23.XX.XX". That's HIDS.
Suricata simultaneously sees "182.23.XX.XX is port-scanning 47 services on your network". That's NIDS.

Combined? You know it's not someone mistyping a password. It's an active attacker probing your infrastructure. Block them instantly.
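That correlation is essentially a join on source IP within a time window. This minimal sketch assumes a simplified, hypothetical event shape (real Wazuh and Suricata events carry far more fields), an arbitrary 10-minute window, and documentation-range IPs.

```typescript
// Simplified event shape shared by both sensors (an assumption for illustration).
interface SensorEvent { srcIp: string; ts: number; description: string }

interface Correlation { srcIp: string; hids: SensorEvent[]; nids: SensorEvent[] }

// Keep source IPs seen by BOTH sensors within windowMs of each other.
function correlate(hids: SensorEvent[], nids: SensorEvent[], windowMs = 10 * 60 * 1000): Correlation[] {
  const nidsByIp = new Map<string, SensorEvent[]>();
  for (const n of nids) {
    const list = nidsByIp.get(n.srcIp) ?? [];
    list.push(n);
    nidsByIp.set(n.srcIp, list);
  }
  const out = new Map<string, Correlation>();
  for (const h of hids) {
    const near = (nidsByIp.get(h.srcIp) ?? []).filter((n) => Math.abs(n.ts - h.ts) <= windowMs);
    if (near.length === 0) continue;
    const c = out.get(h.srcIp) ?? { srcIp: h.srcIp, hids: [], nids: [] };
    c.hids.push(h);
    for (const n of near) if (c.nids.indexOf(n) === -1) c.nids.push(n);
    out.set(h.srcIp, c);
  }
  return Array.from(out.values());
}
```

An IP that shows up in only one pipeline stays a single alert. An IP that shows up in both within the window is the "active attacker" case worth blocking.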

What We Built: Wazuh + Suricata → ClickHouse → AI

In VRadar, we integrated both HIDS and NIDS into a single pipeline. Here's how the data flows:

                    HIDS Pipeline
Windows/Linux/Mac ──→ Wazuh Agent ──→ Wazuh Manager
                                            │
                                            ▼ webhook
                                      VRadar Backend ──→ ClickHouse
                                            │              (security_logs)
                                            ▼
                                      AI Operator ──→ Triage + Auto-Response

                    NIDS Pipeline  
Network Traffic ──→ Suricata IDS ──→ eve.json
                                        │
                                  Wazuh Agent (monitors eve.json)
                                        │
                                  Wazuh Manager ──→ Custom Rules 100100-100104
                                        │
                                        ▼ webhook
                                  VRadar Backend ──→ ClickHouse
                                                     (nta_events)

Both pipelines converge into the same backend. One dashboard. One AI engine analyzing everything.

The Technical Integration (For the Engineers)

Getting Suricata to talk to Wazuh cleanly wasn't trivial. Here are the challenges we solved:

1. Interface Detection on Windows

Suricata crashes if you pass it a friendly interface name like "Wi-Fi" or "Ethernet". It needs the NPF device path: \Device\NPF_{GUID}. Our installation script auto-detects this:

# Convert friendly name → NPF device path (Suricata requirement)
$adapter = Get-NetAdapter |
    Where-Object { $_.Status -eq 'Up' -and $_.InterfaceDescription -notmatch 'Loopback|Virtual|Hyper-V' } |
    Select-Object -First 1
# "\" is not an escape in PowerShell, so "\\" stays doubled (what a double-quoted YAML value expects)
$npcapDevice = "\\Device\\NPF_$($adapter.InterfaceGuid)"

2. Rule File Auto-Detection

Suricata ships different rule files depending on version. Our script scans the actual rules/ directory and rewrites suricata.yaml to match:

$rules = Get-ChildItem "$env:ProgramFiles\Suricata\rules" -Filter "*.rules" |
    Where-Object { $_.Name -notmatch 'dnp3|modbus|ipsec' }  # Skip DNP3/Modbus (ICS) and IPsec rule files
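The PowerShell snippet only collects the rule files; the step that rewrites suricata.yaml isn't shown. Here is a hedged sketch of that step in TypeScript. It assumes the stock layout, where `rule-files:` is a top-level key followed by indented `- name.rules` items. It does plain line editing rather than full YAML parsing, so the rest of the file is left untouched.

```typescript
// Replace the list under the top-level `rule-files:` key with the given file names.
// Line-based editing (not a YAML parser); assumes the stock layout:
//   rule-files:
//     - suricata.rules
function rewriteRuleFiles(yaml: string, ruleFiles: string[]): string {
  const eol = yaml.includes("\r\n") ? "\r\n" : "\n"; // keep Windows line endings if present
  const lines = yaml.split(/\r?\n/);
  const items = ruleFiles.map((f) => `  - ${f}`);
  const start = lines.findIndex((l) => /^rule-files:\s*$/.test(l));
  if (start === -1) return [...lines, "rule-files:", ...items].join(eol);

  // Consume indented list items and comments; trailing blank lines stay in place.
  let end = start + 1;
  for (let j = start + 1; j < lines.length; j++) {
    const line = lines[j];
    if (/^\s+(-\s|#)/.test(line)) end = j + 1;
    else if (line.trim() !== "") break;
  }
  return [...lines.slice(0, start + 1), ...items, ...lines.slice(end)].join(eol);
}
```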

3. JSON Decoder Limit

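Suricata's eve.json events are large (800+ bytes) and heavily nested. Wazuh's JSON decoder extracts at most analysisd.decoder_order_size fields per event (default 256) and drops the rest, so we raised the limit in the manager's local_internal_options.conf:

# Maximum number of fields a decoder can extract per event (default 256)
analysisd.decoder_order_size=1024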

4. Custom Wazuh Rules for Suricata

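Suricata events arrive through Wazuh's built-in rule 86600, which is level 0, so by default they never become alerts. We created custom rules 100100-100104 as children of 86600 to elevate them:

<!-- /var/ossec/etc/rules/local_rules.xml (custom rules must sit inside a group) -->
<group name="suricata,">
  <rule id="100100" level="3">
    <if_sid>86600</if_sid>
    <field name="event_type">^flow$</field>
    <description>Suricata: Network flow event</description>
  </rule>
</group>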

5. One-Click Installation

The biggest challenge: making all of this install with one command. Our agent script handles 6 steps automatically:

Clean up any existing Wazuh/Suricata installation
Register with Wazuh Manager
Install Wazuh Agent (version-matched to Manager)
Configure HIDS monitoring
Install Npcap + Suricata IDS
Wire Suricata → Wazuh → VRadar pipeline

Works on Windows, Linux, and macOS.

What You See in the Dashboard

Once both HIDS and NIDS are running, the VRadar dashboard shows:

HIDS Tab (System Alerts)

Security alerts from Wazuh (3,000+ detection rules)
Alert severity distribution (Critical/High/Medium/Low)
AI-powered triage decisions with confidence scores
One-click IP blocking via Wazuh Active Response

NIDS Tab (Network Monitoring)

Suricata IDS events (flow, DNS, HTTP, TLS)
Severity breakdown over 7 days
Protocol distribution and traffic patterns
Source/destination IP analysis with geolocation

Threat Map

Real-time world map showing attacks hitting your network
SVG-based Mercator projection with animated attack lines
Data from both HIDS (login attempts, malware) and NIDS (port scans, intrusion attempts)

Security Score

9-factor scoring including both HIDS and NIDS health
NIDS Monitoring is one of the 9 scoring factors (10% weight)
Getting both working pushes your score above 80 (Grade B → A territory)
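A multi-factor score like this is a weighted average of per-factor scores. The post gives only one weight (NIDS Monitoring, 10%). The example in the test below therefore lumps the other eight factors into one hypothetical placeholder, purely to show the arithmetic.

```typescript
interface Factor { id: string; weight: number; score: number } // score on a 0..100 scale

// Weighted average of factor scores; weights are normalised, so they need not sum to exactly 1.
function securityScore(factors: Factor[]): number {
  const totalWeight = factors.reduce((s, f) => s + f.weight, 0);
  if (totalWeight <= 0) return 0;
  const weighted = factors.reduce((s, f) => s + f.weight * Math.min(100, Math.max(0, f.score)), 0);
  return weighted / totalWeight;
}
```

With a 10% weight, taking NIDS from 0 to 100 moves the overall score by 10 points on its own. For example, with the remaining factors averaging 85, the score goes from 76.5 to 86.5.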

The Cost Argument

Here's what dual HIDS + NIDS monitoring costs at scale:

Vendor	HIDS + NIDS	Monthly Cost (50 devices)
Arctic Wolf	Managed SOC	$3,500+
Blumira	SIEM + IDS	$850+
SentinelOne + Darktrace	EDR + NDR	$2,500+
VRadar	Wazuh + Suricata + AI	$1,250

We can offer this pricing because:

Wazuh and Suricata are open-source — $0 licensing
AI triage via GPT-4o-mini — $0.15/1M tokens (we spend ~$2-5/tenant/month)
ClickHouse for log storage — handles millions of events on a single server
Solo operation — no sales team, no marketing department (yet)

Lessons for Other Builders

If you're building security tooling for SMBs:

Don't make users choose between HIDS and NIDS. They need both. Bundle them.
Auto-install everything. If setup takes more than one command, adoption drops to near zero.
AI triage is table stakes now. GPT-4o-mini costs almost nothing. Use it to reduce alert fatigue.
Suricata on Windows is possible but painful. Budget extra time for NPF device paths, rule-file compatibility, and threshold configs.
Log everything to a columnar DB. ClickHouse handles millions of events for $0 and queries complete in milliseconds.

Try It

VRadar is live at vradar.io — plans start at $25/device/month for dual HIDS + NIDS monitoring with AI-powered threat analysis.

If you're running an SMB with no security monitoring (or just antivirus), you're exactly who we built this for.

I'm Dong, a solo developer from Vietnam building affordable security tools. If you have questions about integrating Wazuh + Suricata or building security products for the SMB market — ask me anything in the comments.

Tags: #cybersecurity #wazuh #suricata #HIDS #NIDS #ai #SOC #startup #opensource #buildinpublic

From Zero to 140 Features: How I Built a Cloud SOC Platform as a Solo Developer

Nguyen Dong — Fri, 06 Mar 2026 14:39:52 +0000

How AI, open-source security tools, and relentless iteration turned a side project into a full-featured Security Operations Center.

The Problem

Here's a number that should scare you: 90% of small and medium businesses in Southeast Asia have zero security monitoring.
Not weak monitoring. Not basic monitoring. Zero.
They run their business, store customer data, process payments — all with no visibility into who's poking around their network. The reason? Traditional SOC platforms cost $100-300 per device per month. For an SMB with 50 endpoints, that's $60,000-$180,000 per year. Most can't justify that, so they just... hope for the best.

I decided to fix this. Alone.

What I Built

VRadar is a cloud-native SOC (Security Operations Center) platform that monitors networks, detects threats using AI, and responds automatically. Here's what 140+ features look like after 36 development phases:

The Core Stack

HIDS: Wazuh Manager 4.14.2 (3,000+ detection rules) — monitors every host
NIDS: Suricata IDS — analyzes network traffic in real-time
Database: PostgreSQL for relational data, ClickHouse for 1M+ security logs (fast analytical queries)
Cache: Redis with JWT blacklist for instant token revocation
Frontend: Next.js 15 with dark cyberpunk dashboard
Backend: Node.js + Express + TypeScript, 32 API modules
AI: GPT-4o-mini for threat analysis + autonomous agents

The Five AI Agents

This is what makes VRadar different. Instead of just collecting logs and showing dashboards, VRadar has 5 autonomous AI agents that actually do things:
AI Operator — Triages every alert automatically. Reads the alert, checks threat intelligence (AbuseIPDB, VirusTotal, MITRE ATT&CK), assigns severity, and decides if it needs human attention. Handles 80% of alerts without human intervention.
AI Monitor — Runs 9 health checks every 30 minutes (6 infrastructure + 3 security). Detects anomalies, generates incidents, escalates via Telegram.
AI Optimizer — Self-defense mechanism. When it detects flooding or scanning patterns, it auto-blocks attacking IPs and adjusts firewall rules.
AI Care — Customer support chatbot powered by RAG (Retrieval-Augmented Generation). Trained on product documentation, answers questions 24/7.
AI Marketing — Generates SEO-optimized blog posts and social media content from knowledge base.

Security That Watches Itself

The platform doesn't just monitor your network — it monitors itself:
HIDS + NIDS running on the VRadar server itself
Auto-escalation: Alert → AI Triage → Incident → Notification → Auto-Response
Threat defense middleware that auto-blocks IPs showing scanning/flooding behavior
Compliance scoring against ISO 27001 (28 controls), PCI DSS v4.0 (27 controls), NIST CSF 2.0 (25 controls)

The Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Wazuh     │────▶│              │────▶│  ClickHouse │
│   Agents    │     │   Backend    │     │  (1M+ logs) │
└─────────────┘     │   (Node.js)  │     └─────────────┘
                    │              │
┌─────────────┐     │  32 Modules  │     ┌─────────────┐
│  Suricata   │────▶│  5 AI Agents │────▶│ PostgreSQL  │
│   (NIDS)    │     │  80 Controls │     │  (Prisma)   │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                    ┌──────▼───────┐
                    │   Next.js    │
                    │  Dashboard   │
                    │ (Dark Theme) │
                    └──────────────┘

12 Docker containers, orchestrated with Docker Compose. The entire platform runs on a single VPS with 23GB RAM.

Lessons Learned Building Solo

1. AI is a Force Multiplier, Not a Replacement

GPT-4o-mini costs ~$2-5 per tenant per month for alert triage. At that price, every SMB can afford AI-powered security analysis. But it's not magic — you need:

Structured prompts with context (alert data + threat intel + historical patterns)
Fallback logic when AI fails (and it will)
Caching (Redis) to avoid redundant API calls — saved 94% on AI costs

2. Open Source is Your Superpower

Wazuh + Suricata give you enterprise-grade HIDS + NIDS for free. The real value I added:
Integration layer (webhook forwarding, ClickHouse storage)
AI triage on top of raw alerts
Multi-tenant SaaS wrapper
One-click agent installation scripts (Windows/Linux/macOS)

3. Security is Non-Negotiable from Day 1

Before going public, I did a full security hardening:
Penetration test: reduced risk from 6.2/10 to 2.8/10
OWASP Top 10: 9/10 pass
3-layer rate limiting (Nginx → Express → per-endpoint)
MFA, JWT blacklist, bcrypt-12, AES-256-GCM encryption
ClickHouse/Redis authentication, Cloudflare DDoS protection
Fail2ban banned 14 attacking IPs within the first 30 minutes of installation

4. Ship Fast, But Track Everything

36 phases in 5 weeks. Every feature documented in CONTEXT.md (1,200+ lines). Every commit purposeful. The key: time-box features to 2-4 hours max, ship, observe, iterate.

The Numbers

Metric	Value
Features	140+
Development phases	36
API modules	32
Compliance controls	80 (ISO + PCI + NIST)
Security logs stored	1,062,253
AI agents	5
Docker services	12
Pentest risk score	2.8/10
QA score	8.0/10
Starting price	$25/device/month
Competitor price	$100-300/device/month

What's Next

VRadar is live and serving customers in Vietnam. We're expanding to ASEAN markets with a simple pitch: get 80% of enterprise SOC capabilities at 20% of the cost.

The platform is built for SMBs with 10-500 endpoints. If you're an IT manager tired of having zero visibility into your security posture, or an MSSP looking for a white-label SOC platform, I'd love your feedback.

🔗 vradar.io

I'm Dong, a developer from Vietnam building security tools for businesses that can't afford a Fortune 500 security budget. Ask me anything in the comments.