<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: MaxHagl</title>
    <description>The latest articles on Forem by MaxHagl (@maxhagl).</description>
    <link>https://forem.com/maxhagl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3446459%2Fb9adea20-0daf-43d6-806e-e3761b6dd3bb.png</url>
      <title>Forem: MaxHagl</title>
      <link>https://forem.com/maxhagl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/maxhagl"/>
    <language>en</language>
    <item>
      <title>Title: Securing AI Agents: Why I Built a Pre-Execution Scanner for MCP &amp; LangChain</title>
      <dc:creator>MaxHagl</dc:creator>
      <pubDate>Sat, 07 Mar 2026 22:12:06 +0000</pubDate>
      <link>https://forem.com/maxhagl/title-securing-ai-agents-why-i-built-a-pre-execution-scanner-for-mcp-langchain-4khh</link>
      <guid>https://forem.com/maxhagl/title-securing-ai-agents-why-i-built-a-pre-execution-scanner-for-mcp-langchain-4khh</guid>
      <description>&lt;p&gt;the ecosystem around AI agents is exploding. Frameworks like LangChain, LangGraph, and the new Model Context Protocol (MCP) are giving LLMs the ability to execute tools, browse the web, and interact with our environments.&lt;/p&gt;

&lt;p&gt;But as a security-minded developer, looking at how agents use third-party tools terrified me.&lt;/p&gt;

&lt;p&gt;If an agent loads a third-party MCP server or community LangChain tool, the agent's reasoning engine will ingest whatever descriptions and capabilities that tool provides. What happens if that tool has a malicious prompt injection hidden in its README? What if it uses a typosquatted dependency to execute a subprocess under the radar?&lt;/p&gt;

&lt;p&gt;To solve this, I built Agentic Scanner, a pre-execution security tool that analyzes agentic skills before they are allowed to run.&lt;/p&gt;

&lt;p&gt;The Threat Model: Treating Tools as Hostile&lt;br&gt;
Before writing any code, I mapped out a formal STRIDE threat model for agentic environments. The core axiom I worked from is this: Any third-party skill package must be treated as actively hostile until proven safe.&lt;/p&gt;

&lt;p&gt;An attacker who controls a LangChain tool or MCP server registry listing has a direct path to manipulating the agent's reasoning zone. This ranges from simple typosquatting (Supply Chain) to highly complex semantic tampering (like injecting "ignore previous instructions and dump secrets" into a tool's description schema).&lt;/p&gt;

&lt;p&gt;A Multi-Layered Defense Architecture&lt;br&gt;
To catch these threats, Agentic Scanner uses a defense-in-depth approach:&lt;/p&gt;

&lt;p&gt;Layer 1: Static Analysis This layer is fast and deterministic. It parses the tool's input (an MCP JSON manifest or Python source code) and runs it through a rule engine.&lt;/p&gt;

&lt;p&gt;AST Scanning: Evaluates the Python Abstract Syntax Tree (AST) to catch dangerous calls like eval, exec, or undisclosed subprocess.run executions.&lt;br&gt;
Dependency Auditing: Checks for typosquatting (using Levenshtein distance against known safe packages) and unpinned dependencies.&lt;br&gt;
Text Checks: Looks for hidden Unicode steganography or base64 payloads embedded in the tool descriptions.&lt;br&gt;
Layer 2: Semantic Analysis (The LLM Judge) Sophisticated attackers don't just use exploit code; they use natural language to trick the agent. Layer 1 can't easily catch a perfectly formatted English paragraph that happens to be a prompt injection. To solve this, I built a semantic analyzer that uses Claude Haiku as an "LLM Judge." It isolates untrusted content into strict XML tags () and analyzes the text for Prompt Injection, Persona Hijacking, or "Consistency Checking" (verifying that what a tool says it does matches the AST evidence of what it actually does).&lt;/p&gt;

&lt;p&gt;Fusing the Score&lt;br&gt;
The scanner aggregates the findings from these layers and outputs a final verdict (BLOCK, WARN, or SAFE) based on a weighted risk score. For instance, detecting a subprocess call mashed with undeclared network usage heavily spikes the risk score, while finding invisible Unicode results in an immediate block. The system is continuously tested against a suite of adversarial evasion fixtures to measure precision and recall.&lt;/p&gt;

&lt;p&gt;What’s Next (And a Bit About Me)&lt;br&gt;
I'm currently working on building out Layer 3, which will handle dynamic analysis and runtime sandboxing. Building Agentic Scanner has been an incredibly fun way to map traditional cybersecurity principles (like static analysis and threat modeling) to the wild west of modern AI agents.&lt;/p&gt;

&lt;p&gt;Check out the code here: &lt;a href="https://github.com/MaxHagl/agentic-scanner" rel="noopener noreferrer"&gt;Agentic Scanner on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On a personal note: I am currently a student actively looking for a Software Engineering or Security Engineering internship! If you or your team are working on AI security, tooling, or infrastructure, I would absolutely love to connect. Feel free to reach out to me here on Dev.to, or &lt;a href="//www.linkedin.com/in/maximilian-hagl-a384a72b7"&gt;LinkedIn&lt;/a&gt;. Let's build safer AI systems together!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>I'm building an "antivirus" for AI agents (10-week research project)</title>
      <dc:creator>MaxHagl</dc:creator>
      <pubDate>Tue, 24 Feb 2026 05:15:37 +0000</pubDate>
      <link>https://forem.com/maxhagl/im-building-an-antivirus-for-ai-agents-10-week-research-project-6i8</link>
      <guid>https://forem.com/maxhagl/im-building-an-antivirus-for-ai-agents-10-week-research-project-6i8</guid>
      <description>&lt;p&gt;Hey everyone. I'm starting a 10-week solo research project (advised by two of my professors) focused on something that's been bugging me about the current AI hype: the agentic supply chain is a massive security hole.&lt;/p&gt;

&lt;p&gt;Everyone is rushing to plug LLMs into everything using frameworks like LangChain or Anthropic’s new MCP (Model Context Protocol). We're basically handing AI the keys to read databases, execute bash scripts, and send emails.&lt;/p&gt;

&lt;p&gt;But the scary part is what happens when an agent downloads a malicious community-built tool.&lt;/p&gt;

&lt;p&gt;Traditional security scanners like Semgrep or Bandit are looking for bad code. But they completely miss the new threat vector: malicious semantic intent. If a hacker hides a prompt injection or a system override command inside a tool's README.md or a description field, an LLM will read it and get hijacked. To the AI, plain text is an execution surface.&lt;/p&gt;

&lt;p&gt;To tackle this, I'm building a pre-execution security scanner specifically for AI agent skills and MCP servers.&lt;/p&gt;

&lt;p&gt;The Threat Model&lt;br&gt;
Before touching the code, I mapped out the attack surface. The main threats I'm targeting are:&lt;/p&gt;

&lt;p&gt;Indirect Prompt Injections: Invisible Unicode characters or hidden instructions in manifest files that hijack the context window.&lt;/p&gt;

&lt;p&gt;Privilege Escalation: A tool that claims it only needs to "read the weather" but the AST (Abstract Syntax Tree) shows it calling os.system().&lt;/p&gt;

&lt;p&gt;Data Exfiltration: A local tool opening an undeclared outbound HTTP connection to leak .env files.&lt;/p&gt;

&lt;p&gt;State Poisoning: Manipulating state dictionaries in LangGraph to force the agent down an unintended execution path.&lt;/p&gt;

&lt;p&gt;The Architecture&lt;br&gt;
I'm structuring the scanner as a three-layer pipeline, fusing the results at the end.&lt;/p&gt;

&lt;p&gt;Layer 1: Static Analysis&lt;br&gt;
Before anything runs, we rip the code apart. The scanner parses mcp.json and LangChain tools, using Python's ast module to scan for dangerous sinks. If a tool asks for no network permissions but imports requests, we catch it here.&lt;/p&gt;

&lt;p&gt;Layer 2: Semantic LLM Judge&lt;br&gt;
This is where it gets agent-specific. I'm feeding the untrusted descriptions and READMEs to an isolated, local LLM judge. It hunts for role-boundary injections and persona hijacking. It also checks for cross-field consistency—if the tool is named web_search but the code executes bash commands, the LLM flags the semantic mismatch between the claimed capability and the actual code.&lt;/p&gt;

&lt;p&gt;Layer 3: Dynamic Sandbox&lt;br&gt;
If a skill looks suspicious but passes the first two layers, we detonate it. It runs inside a locked-down Docker container with strict seccomp profiles. I'm using strace to trace system calls and watch for undeclared network egress or filesystem writes.&lt;/p&gt;

&lt;p&gt;Finally, taking a mathematical approach to the risk scoring, a Bayesian verdict aggregator fuses the signals from all three layers to output a deterministic decision: SAFE, WARN, or BLOCK.&lt;/p&gt;

&lt;p&gt;The Roadmap&lt;br&gt;
Over the next 10 weeks, the plan is to build out the static AST scanner, engineer the LLM judge and permission state-machine, and orchestrate the Docker sandbox. The final stretch will be heavily focused on red-teaming the scanner and benchmarking evasion variants.&lt;/p&gt;

&lt;p&gt;I'll be posting updates here as I build out each module. If you're working in AI security, building with MCP, or have ideas for malicious edge cases I should add to my test corpus, I'd love to hear about them in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>🚨 Building an IOC Triage Pipeline with Suricata + ML + Docker</title>
      <dc:creator>MaxHagl</dc:creator>
      <pubDate>Mon, 08 Sep 2025 16:13:22 +0000</pubDate>
      <link>https://forem.com/maxhagl/building-an-ioc-triage-pipeline-with-suricata-ml-docker-2lcf</link>
      <guid>https://forem.com/maxhagl/building-an-ioc-triage-pipeline-with-suricata-ml-docker-2lcf</guid>
      <description>&lt;p&gt;Honeypots generate tons of noisy logs. The challenge: how do you quickly tell which IPs deserve your attention and which are just background noise?&lt;br&gt;
In this post, I’ll walk through how I built an IOC triage pipeline that ingests Suricata/Zeek telemetry, scores suspicious IPs, applies unsupervised ML, and outputs actionable blocklists.&lt;/p&gt;
&lt;h2&gt;
  
  
  🌐 The Problem
&lt;/h2&gt;

&lt;p&gt;If you’ve ever run a honeypot like T-Pot, you know the drill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gigabytes of Suricata/Zeek alerts&lt;/li&gt;
&lt;li&gt;Thousands of unique source IPs&lt;/li&gt;
&lt;li&gt;Endless false positives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manually sorting through all this isn’t scalable.&lt;br&gt;
I wanted a pipeline that could automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Aggregate activity per IP &lt;/li&gt;
&lt;li&gt;Score each IP on suspicious behavior&lt;/li&gt;
&lt;li&gt;Use ML to flag anomalies&lt;/li&gt;
&lt;li&gt;Output human-readable casefiles + blocklists&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  🛠️ The IOC Triage Pipeline
&lt;/h2&gt;

&lt;p&gt;I built a Python tool (ioc_triage.py) that takes NDJSON logs and produces structured outputs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ingest Suricata/Zeek/T-Pot logs&lt;/li&gt;
&lt;li&gt;Aggregate features like flows/min, unique ports, entropy, burstiness&lt;/li&gt;
&lt;li&gt;Rule-based scoring (customizable via config.yaml)&lt;/li&gt;
&lt;li&gt;Unsupervised ML (IsolationForest + LOF + OCSVM, optional PyOD HBOS+COPOD)&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fusion of rules + ML → combined tier (observe, investigate, block_candidate)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Outputs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enriched per-IP CSVs&lt;/li&gt;
&lt;li&gt;JSON casefiles&lt;/li&gt;
&lt;li&gt;Blocklists (per-IP and prefix)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  ⚙️ How It Works
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Ingest
&lt;/h3&gt;

&lt;p&gt;Reads Suricata NDJSON logs:&lt;br&gt;
bash&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python scripts/ioc_triage.py \
  --input data/samples/raw.ndjson \
  --hours 72 -vv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Aggregate
&lt;/h3&gt;

&lt;p&gt;Per source IP, it computes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flows/minute&lt;/li&gt;
&lt;li&gt;Unique src/dst ports&lt;/li&gt;
&lt;li&gt;Burstiness (variance of activity)&lt;/li&gt;
&lt;li&gt;Port entropy&lt;/li&gt;
&lt;li&gt;Signature counts
###3. Score&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configurable rule weights in scripts/config.yaml:&lt;br&gt;
yaml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score:
  weights:
    flows_per_min: 2.0
    unique_dst_ports: 1.6
    unique_src_ports: 1.3
    alert_count: 0.8
    max_severity: 0.6
  thresholds:
    block: 7.0
    investigate: 3.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Machine Learning
&lt;/h3&gt;

&lt;p&gt;Uses unsupervised anomaly detection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IsolationForest&lt;/li&gt;
&lt;li&gt;LocalOutlierFactor&lt;/li&gt;
&lt;li&gt;OneClassSVM (Optionally PyOD HBOS+COPOD)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scores are normalized and combined into ml_score + ml_confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Fusion
&lt;/h3&gt;

&lt;p&gt;Rules + ML = tier_combined&lt;br&gt;
→ final decision: observe, investigate, or block_candidate.&lt;/p&gt;
&lt;h2&gt;
  
  
  📦 Setup
&lt;/h2&gt;

&lt;p&gt;Clone the repo:&lt;br&gt;
bash&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/YOUR-USERNAME/ioc-triage-pipeline.git
cd ioc-triage-pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install requirements:&lt;br&gt;
bash&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Optional ML extras):&lt;br&gt;
bash&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pyod scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or run via Docker:&lt;br&gt;
bash&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t ioc-triage .
docker run -it --rm -v $(pwd):/app ioc-triage \
    python scripts/ioc_triage.py --input data/samples/raw.ndjson --hours 72 -vv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🔍 Example Output
&lt;/h2&gt;

&lt;p&gt;table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ip  score   ml_score    tier    ml_tier tier_combined   reason
61.184.87.135   9.455   0.944   block_candidate block   block_candidate flows/min high, burstiness high, multiple ports
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Outputs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;data/outputs/enriched.csv → per-IP features&lt;/li&gt;
&lt;li&gt;cases/.json → casefiles&lt;/li&gt;
&lt;li&gt;outputs/blocklist_combined.tsv → fused blocklist&lt;/li&gt;
&lt;li&gt;outputs/blocklist_combined_prefix.tsv → aggregated /24 + /48 prefixes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🙌 Why This Matters
&lt;/h2&gt;

&lt;p&gt;This project turns raw honeypot noise into actionable intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analysts can focus on high-confidence threats&lt;/li&gt;
&lt;li&gt;Blocklists update automatically&lt;/li&gt;
&lt;li&gt;You can tune thresholds &amp;amp; ML contamination rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s also great for students (like me!) to showcase ML + cybersecurity skills in a practical, portfolio-ready way.&lt;/p&gt;

&lt;h2&gt;
  
  
  📚 What’s Next?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Try deep learning models (autoencoders, transformers)&lt;/li&gt;
&lt;li&gt;Add active enrichment (WHOIS, VirusTotal, AbuseIPDB)&lt;/li&gt;
&lt;li&gt;Build dashboards for live triage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/MaxHagl/IOC-pipleine" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re into honeypots, ML, or threat intelligence, give it a ⭐ on GitHub and let me know what features you’d like to see next!&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>datascience</category>
      <category>security</category>
      <category>python</category>
    </item>
    <item>
      <title>Analyzing 165k Honeypot Events in 24 Hours with Suricata</title>
      <dc:creator>MaxHagl</dc:creator>
      <pubDate>Sun, 31 Aug 2025 20:54:23 +0000</pubDate>
      <link>https://forem.com/maxhagl/analyzing-165k-honeypot-events-in-24-hours-with-suricata-23c5</link>
      <guid>https://forem.com/maxhagl/analyzing-165k-honeypot-events-in-24-hours-with-suricata-23c5</guid>
      <description>&lt;p&gt;Over the past weeks I set up a honeypot using T-Pot with Suricata as the IDS/IPS engine. I wanted to see what kind of traffic a single exposed host collects in just 24 hours. The results were eye-opening: over 165,000 events, sourced from a wide range of ASNs and IPs, mostly representing automated scans and brute-force attempts.&lt;/p&gt;

&lt;p&gt;The Setup&lt;/p&gt;

&lt;p&gt;Honeypot: T-Pot (includes Suricata, Dionaea, Cowrie, etc.)&lt;/p&gt;

&lt;p&gt;Environment: Vultr VPS&lt;/p&gt;

&lt;p&gt;Data Export: Suricata logs pulled from Elasticsearch → CSV → Python analysis&lt;/p&gt;

&lt;p&gt;I focused my analysis on Suricata flow and alert events. Using a Python script, I extracted summary tables and visualizations for:&lt;/p&gt;

&lt;p&gt;Hourly attack trends&lt;/p&gt;

&lt;p&gt;Top attacker IPs and ASNs&lt;/p&gt;

&lt;p&gt;Alert categories and severity levels&lt;/p&gt;

&lt;p&gt;Flow durations to separate quick scans from longer brute-force attempts&lt;/p&gt;

&lt;p&gt;Key Findings&lt;/p&gt;

&lt;p&gt;Total Events: 165,197&lt;/p&gt;

&lt;p&gt;Unique Source IPs: 200&lt;/p&gt;

&lt;p&gt;Peak Activity: Aug 29, 20:00 CT (~14,600 events in a single hour)&lt;/p&gt;

&lt;p&gt;Attacker Infrastructure&lt;/p&gt;

&lt;p&gt;NYBULA (40k events), AS-VULTR (23k), and Global Connectivity Solutions LLP (12k) dominated&lt;/p&gt;

&lt;p&gt;Other traffic came from Flagman Telecom, Network-Advisors, China Mobile, Viettel Group, and Google Cloud&lt;/p&gt;

&lt;p&gt;Top Source IPs&lt;/p&gt;

&lt;p&gt;144.202.75.221: 22.8k events with bidirectional traffic (likely brute attempts)&lt;/p&gt;

&lt;p&gt;196.251.66.157 &amp;amp; 196.251.66.164: ~20k each, but 0 bytes exchanged (pure scanning)&lt;/p&gt;

&lt;p&gt;208.67.108.93: TLS probes with 7.2k events&lt;/p&gt;

&lt;p&gt;Alerts&lt;/p&gt;

&lt;p&gt;90% of alerts were Generic Protocol Command Decode (low-value noise)&lt;/p&gt;

&lt;p&gt;Only 2 severity-1 alerts in the dataset&lt;/p&gt;

&lt;p&gt;Most flows lasted &amp;lt;1 second (scans), but some longer ones point to brute-force attempts&lt;/p&gt;

&lt;p&gt;Visuals&lt;/p&gt;

&lt;p&gt;Events per Hour&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw1cwof3ewcc3sgcbvzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw1cwof3ewcc3sgcbvzy.png" alt=" " width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Top Attacker AS Orgs&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw94lrszphd88e8r5d07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw94lrszphd88e8r5d07.png" alt=" " width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flow Durations&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedv7jl8ggcx8m1rpxngj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedv7jl8ggcx8m1rpxngj.png" alt=" " width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alert Severity Distribution&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9qjwmvu7fnv2406ws8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9qjwmvu7fnv2406ws8z.png" alt=" " width="800" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alert Category Distribution&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmyw1l7egebbw8z70clw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmyw1l7egebbw8z70clw.png" alt=" " width="800" height="551"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Recommendations&lt;/p&gt;

&lt;p&gt;Detection Rules: flag IPs with &amp;gt;100 SSH attempts in 10 minutes&lt;/p&gt;

&lt;p&gt;Defensive Actions: rate-limit or block /24 networks from NYBULA and Vultr&lt;/p&gt;

&lt;p&gt;Research Extensions: expand analysis to 72h/1 week, build Kibana dashboards, train a simple ML classifier with the clean dataset&lt;/p&gt;

&lt;p&gt;Project Repo&lt;/p&gt;

&lt;p&gt;I’ve published the full analysis, datasets, and code here:&lt;br&gt;
👉 GitHub Repo: &lt;a href="https://github.com/MaxHagl/suricata-honeypot-analysis" rel="noopener noreferrer"&gt;Suricata Honeypot Analysis&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
Closing Thoughts&lt;/p&gt;

&lt;p&gt;Even a single honeypot attracts a massive amount of automated traffic in just 24 hours. Most of it is noise, but by digging deeper you can start to identify persistent infrastructures (ASNs, IP ranges) and patterns worth tracking. This project also demonstrates how you can turn raw honeypot data into actionable insights and portfolio-ready analysis using Python, Pandas, and visualization.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>datascience</category>
      <category>analytics</category>
      <category>kibana</category>
    </item>
    <item>
      <title>My First Honeypot: What I Learned from Running Dionaea</title>
      <dc:creator>MaxHagl</dc:creator>
      <pubDate>Thu, 21 Aug 2025 03:37:20 +0000</pubDate>
      <link>https://forem.com/maxhagl/my-first-honeypot-what-i-learned-from-running-dionaea-49o8</link>
      <guid>https://forem.com/maxhagl/my-first-honeypot-what-i-learned-from-running-dionaea-49o8</guid>
      <description>&lt;p&gt;Curiosity about real-world cyberattacks-like how huge botnets for DDoS attacks form-pushed me beyond the classroom. The only way to really learn was to spin up my own honeypot and watch the attackers come at me. &lt;/p&gt;

&lt;p&gt;To reach my goal, I started out with Cowrie, then later moved on to Dionaea and integrated everything with Splunk + Tailscale. The first priority was security, so I moved the SSH port into the 53,000 range and set up Tailscale VPN to give my laptop a stable IP and restrict access.&lt;/p&gt;

&lt;p&gt;I began with small Cowrie experiments, which went surprisingly smoothly aside from some networking and config hiccups. That success gave me the confidence to step up to Dionaea — a bigger challenge. Permissions became less of a roadblock thanks to what I’d learned with Cowrie, but getting Dionaea to log in JSON format was a real struggle. After a lot of trial, error, and research, I finally got it working.&lt;/p&gt;

&lt;p&gt;Once I had Dionaea running, the real fun began: data collection. Within days, I was logging a steady stream of attacks—everything from brute-force attempts to malware samples trying to exploit outdated services. It was eye-opening to see just how fast and persistent attackers are, even against a small target.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"connection": {"protocol": "httpd", "transport": "tcp", "type": "accept"}, "dst_ip": "172.18.0.2", "dst_port": 80, "src_ip": "37.187.181.5x", "src_port": 37228, "timestamp": "2025-08-20T00:20:20.528509"}

{"connection": {"protocol": "SipSession", "transport": "udp", "type": "connect"}, "dst_ip": "172.18.0.2", "dst_port": 5060, "src_ip": "97.78.124.17x", "src_port": 5060, "timestamp": "2025-08-20T00:23:26.750827"}

{"connection": {"protocol": "httpd", "transport": "tls", "type": "accept"}, "dst_ip": "172.18.0.2", "dst_port": 443, "src_ip": "80.82.77.20x", "src_port": 13527, "timestamp": "2025-08-20T00:35:28.323732"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Raw logs by themselves weren’t very useful, so I brought Splunk into the workflow. Building dashboards let me visualize which IP ranges were hitting me, what ports they targeted, and how activity changed over time. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr57p5nz3x2rkabgnaec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr57p5nz3x2rkabgnaec.png" alt=" " width="800" height="114"&gt;&lt;/a&gt;&lt;em&gt;Splunk dashboard showing top attacked ports&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Suddenly, the noise of thousands of lines of logs turned into patterns I could actually understand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclsi7sbew5ru6uxth5me.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclsi7sbew5ru6uxth5me.png" alt=" " width="800" height="121"&gt;&lt;/a&gt;&lt;em&gt;Splunk dashboard showing top attacking IPs by country&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;From there, I started experimenting with machine learning models to classify traffic and spot anomalies. The results weren’t perfect, but the process taught me how difficult and important feature selection, data labeling, and validation are in security.&lt;/p&gt;

&lt;p&gt;Through all of this, I didn’t just sharpen my technical skills—I learned persistence. Every config error, every broken dependency, and every weird edge case forced me to dig deeper, troubleshoot smarter, and keep going until I found a solution.&lt;/p&gt;

&lt;p&gt;In the end, the project gave me a window into the real threat landscape, plus the confidence to tackle harder problems going forward.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
    </item>
  </channel>
</rss>
