<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: jtarkington77</title>
    <description>The latest articles on Forem by jtarkington77 (@jtarkington77).</description>
    <link>https://forem.com/jtarkington77</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3637938%2F46463ce0-93ef-48cf-bd80-0207c364d8ee.jpeg</url>
      <title>Forem: jtarkington77</title>
      <link>https://forem.com/jtarkington77</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jtarkington77"/>
    <language>en</language>
    <item>
      <title>When Malware Starts Prompt-Engineering Itself</title>
      <dc:creator>jtarkington77</dc:creator>
      <pubDate>Fri, 05 Dec 2025 13:30:00 +0000</pubDate>
      <link>https://forem.com/jtarkington77/when-malware-starts-prompt-engineering-itself-db6</link>
      <guid>https://forem.com/jtarkington77/when-malware-starts-prompt-engineering-itself-db6</guid>
      <description>&lt;p&gt;Somewhere right now, there’s a sketchy little script on a compromised Windows box asking an AI model how to be sneakier.&lt;/p&gt;

&lt;p&gt;Not a red-teamer. Not a malware dev in Visual Studio. Just the malware itself, pinging an LLM API and basically saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Rewrite me so I stop getting caught.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the shift we’re watching in real time.&lt;/p&gt;

&lt;p&gt;A recent report from Google’s Threat Intelligence Group (GTIG) walks through the first real wave of "AI-enabled" malware seen in actual operations, not just on conference slides. We’re talking about names like &lt;strong&gt;PROMPTFLUX&lt;/strong&gt;, &lt;strong&gt;PROMPTSTEAL&lt;/strong&gt;, and &lt;strong&gt;FRUITSHELL&lt;/strong&gt; — all trying, in different ways, to bolt large language models (LLMs) onto old-school tradecraft.&lt;/p&gt;

&lt;p&gt;The punchline is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The malware code is still pretty rudimentary.
&lt;/li&gt;
&lt;li&gt;The architecture, however, is brilliant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only look at the code quality, you’re going to miss what they’re really doing.&lt;/p&gt;

&lt;p&gt;They aren't building better bombs; they are building smarter delivery systems.&lt;/p&gt;




&lt;h2&gt;What’s Actually New Here?&lt;/h2&gt;

&lt;p&gt;Threat actors have been using AI as a glorified Stack Overflow for a while:&lt;br&gt;&lt;br&gt;
“Write me a PowerShell script to do X,” or “Fix this compile error.”&lt;/p&gt;

&lt;p&gt;That’s boring. That’s just a developer using tools.&lt;/p&gt;

&lt;p&gt;The interesting part in this report is the &lt;strong&gt;runtime&lt;/strong&gt; angle: malware calling out to LLMs &lt;strong&gt;during execution&lt;/strong&gt; to change how it behaves on the victim machine.&lt;/p&gt;

&lt;p&gt;Instead of shipping a fully baked malware family with a static decision tree, they are shipping a &lt;strong&gt;thin client&lt;/strong&gt; and outsourcing the tactical decision-making to Gemini, Hugging Face models, or whatever else is cheap and accessible.&lt;/p&gt;

&lt;p&gt;It’s basically malware with a plug-in.&lt;/p&gt;

&lt;p&gt;The plug-in just happens to be a Large Language Model.&lt;/p&gt;




&lt;h2&gt;Meet the First Wave (It’s Not Just Skiddies)&lt;/h2&gt;

&lt;p&gt;The names might sound like someone let an intern name the projects, but the actors behind them are serious.&lt;/p&gt;

&lt;p&gt;In fact, Google’s reporting ties &lt;strong&gt;PROMPTSTEAL&lt;/strong&gt; to &lt;strong&gt;APT28&lt;/strong&gt; (aka Fancy Bear / Forest Blizzard) — the Russian military intelligence group responsible for some of the most significant cyberattacks of the last decade.&lt;/p&gt;

&lt;p&gt;When nation-states enter the chat, it’s no longer a science experiment.&lt;/p&gt;

&lt;p&gt;Here’s the breakdown of what is actually hitting networks.&lt;/p&gt;

&lt;h3&gt;PROMPTFLUX: The Polymorphic Loop&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PROMPTFLUX&lt;/strong&gt; is a VBScript-based dropper that talks to Google’s Gemini API. Its job isn’t to be clever on its own. Its job is to &lt;strong&gt;ask Gemini to rewrite its own source code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;According to the analysis, PROMPTFLUX attempts to generate new, obfuscated iterations of itself as often as every hour. It then saves these fresh variants into persistence locations like the Startup folder.&lt;/p&gt;

&lt;p&gt;The old way: attackers pack/encrypt the malware once before sending it.&lt;br&gt;&lt;br&gt;
The AI way: the malware repacks itself constantly, asking an LLM to generate unique variations that break signature-based detection.&lt;/p&gt;
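&lt;p&gt;A toy sketch of why per-sample hash signatures lose that race; the two "variants" here are illustrative stand-ins, not real samples:&lt;/p&gt;

```python
import hashlib

# Two functionally identical payloads. An LLM-driven rewrite only needs to
# change cosmetic details (names, whitespace) to produce a brand-new file hash.
variant_a = "total = 0\nfor n in range(10):\n    total += n\n"
variant_b = "s = 0\nfor value in range(10):\n    s += value\n"

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

# A hash-based signature written for variant A never fires on variant B,
# even though both scripts do exactly the same thing.
print(sha256(variant_a) == sha256(variant_b))  # False
```

&lt;p&gt;Behavior stays constant while every byte-level signature goes stale, which is why the hourly rewrite loop matters more than the code quality.&lt;/p&gt;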

&lt;h3&gt;PROMPTSTEAL: The Consultant on Retainer&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PROMPTSTEAL&lt;/strong&gt; is the family linked to APT28. It’s a Python data miner that uses the Hugging Face API to query a Qwen2.5-Coder model for Windows commands.&lt;/p&gt;

&lt;p&gt;Roughly, the loop looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Malware scrapes system info.
&lt;/li&gt;
&lt;li&gt;Malware sends context to the LLM: “Here is the environment. What should I run to find sensitive files?”
&lt;/li&gt;
&lt;li&gt;The LLM replies with specific commands for discovery and exfiltration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The malware isn’t shipping with a hard-coded list of targets.&lt;br&gt;&lt;br&gt;
It’s shipping with a loop that keeps asking an AI for the next best move.&lt;/p&gt;

&lt;h3&gt;FRUITSHELL &amp;amp; PROMPTLOCK&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;FRUITSHELL&lt;/strong&gt; is a reverse shell that ships with hard-coded prompts aimed at evading detection and analysis by LLM-powered security tooling while it keeps its channel open back to the attacker.&lt;/p&gt;

&lt;p&gt;GTIG also calls out &lt;strong&gt;PROMPTLOCK&lt;/strong&gt;, an AI-powered ransomware proof-of-concept. It doesn’t just encrypt files; it feeds hard-coded prompts into a local LLM to generate Lua scripts for scanning, exfiltration, encryption, and even shaping ransom-note content in ways that crank up the psychological pressure on the victim.&lt;/p&gt;

&lt;p&gt;In other words: even the “PoC” stuff is showing where this is going.&lt;/p&gt;




&lt;h2&gt;AI APIs Are the New C2&lt;/h2&gt;

&lt;p&gt;Here’s the mental shift Blue Teams need to make immediately:&lt;/p&gt;

&lt;p&gt;If your endpoint is quietly calling Gemini, Hugging Face, or a custom model on a VPS, that is not just "weird traffic." &lt;strong&gt;That is Command and Control (C2).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It just doesn’t look like the C2 we grew up on. Traditionally, C2 has meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP(S) beacons to sketchy, low-reputation domains
&lt;/li&gt;
&lt;li&gt;Encrypted traffic to bulletproof hosts
&lt;/li&gt;
&lt;li&gt;DNS tunneling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, imagine a world where your malware beacons out to &lt;code&gt;generativelanguage.googleapis.com&lt;/code&gt; or &lt;code&gt;api-inference.huggingface.co&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;From the network firewall’s perspective, it looks like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Oh, this user is just using an AI productivity tool.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you don’t have a strong baseline on &lt;strong&gt;who&lt;/strong&gt; in your environment is allowed to talk to those APIs — from which hosts and using which identities — you’ve basically given attackers a free, encrypted outbound channel with the label &lt;strong&gt;“Innovation”&lt;/strong&gt; slapped on top.&lt;/p&gt;




&lt;h2&gt;Prototype Malware, Real Humans&lt;/h2&gt;

&lt;p&gt;On paper, this is all "experimental" and "nascent." In practice, it tells you a lot about how threat actors are adapting.&lt;/p&gt;

&lt;p&gt;The report calls out a simple, terrifying trick regarding guardrails. When Gemini initially refused to generate offensive code for PROMPTFLUX, the operator didn’t give up. They simply reframed the request as a Capture-The-Flag (CTF) exercise.&lt;/p&gt;

&lt;p&gt;Suddenly, the same model that said “I cannot assist with malware” handed over useful building blocks for obfuscation and persistence.&lt;/p&gt;

&lt;p&gt;We keep talking about "AI-powered threats" like the model is the villain. It’s not.&lt;/p&gt;

&lt;p&gt;The dangerous part is the &lt;strong&gt;human feedback loop&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They learn which prompts bypass safety filters.
&lt;/li&gt;
&lt;li&gt;They learn which open-source models (like Qwen) have zero or minimal safety filters.
&lt;/li&gt;
&lt;li&gt;They learn how far they can push "legitimate" APIs before they get rate-limited or blocked.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;“This Isn’t Skynet” Is Not Comforting&lt;/h2&gt;

&lt;p&gt;Right now, almost every write-up is careful to say some version of:&lt;br&gt;&lt;br&gt;
“AI-enabled malware is still immature and often detectable.”&lt;/p&gt;

&lt;p&gt;And that’s true.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A lot of the samples have sloppy execution.
&lt;/li&gt;
&lt;li&gt;They depend on external network access to work at all.
&lt;/li&gt;
&lt;li&gt;They leave very obvious artifacts for EDR (like massive Python libraries dropped on disk).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But "immature" doesn’t mean "safe." It means &lt;strong&gt;we’re early in the learning curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We’ve seen this movie before with polymorphic engines in the 90s and Malware-as-a-Service in the 2010s. LLMs are just the latest mutator.&lt;/p&gt;

&lt;p&gt;The difference this time is &lt;strong&gt;velocity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of a niche assembly coder painstakingly building a mutation engine, you now have copy-paste access to obfuscation strategies and cheap AI APIs doing the heavy lifting.&lt;/p&gt;




&lt;h2&gt;So What Do We Actually Do About It?&lt;/h2&gt;

&lt;p&gt;The good news: you don’t need a massive "AI for XDR" budget to start taking this seriously. You need to treat AI interactions with the same suspicion you treat PowerShell.&lt;/p&gt;

&lt;h3&gt;1. Treat AI endpoints like high-risk C2&lt;/h3&gt;

&lt;p&gt;Start small and practical. Decide which systems are allowed to talk to AI services at all.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developer boxes? Maybe.
&lt;/li&gt;
&lt;li&gt;Domain controllers and file servers? Absolutely not.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Put egress controls around common AI endpoints. Alert on new processes reaching out to those domains — especially things that shouldn’t be running Python or VBScript in the first place.&lt;/p&gt;
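&lt;p&gt;A minimal sketch of that allowlist idea, assuming you can export (host, process, destination) tuples from your proxy or DNS logs; the hostnames and approved pairs here are invented for illustration:&lt;/p&gt;

```python
# Toy egress-policy check: flag hosts/processes that contact AI APIs but
# aren't on the approved list. Host and process names are illustrative.
AI_DOMAINS = {
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}
# (host, process) pairs explicitly approved for AI egress
ALLOWED = {("dev-laptop-14", "chrome.exe")}

def flag_ai_egress(events):
    """events: iterable of (host, process, destination) from proxy/DNS logs."""
    return [
        (host, proc, dest)
        for host, proc, dest in events
        if dest in AI_DOMAINS and (host, proc) not in ALLOWED
    ]

events = [
    ("dev-laptop-14", "chrome.exe", "generativelanguage.googleapis.com"),
    ("file-server-01", "wscript.exe", "generativelanguage.googleapis.com"),
]
# The file server talking to an LLM API is exactly the alert you want.
print(flag_ai_egress(events))
```

&lt;p&gt;In practice this lives in your SIEM or egress proxy, not a script, but the decision table is the same: destination in the AI set, source not on the approved list, alert.&lt;/p&gt;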

&lt;h3&gt;2. Stop sprinkling long-lived API keys&lt;/h3&gt;

&lt;p&gt;If your organization is experimenting with AI, you are likely leaving API keys hardcoded in scripts or environment variables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralize API calls through a gateway.
&lt;/li&gt;
&lt;li&gt;Scope tokens tightly (rate limits, IP ranges, least privilege).
&lt;/li&gt;
&lt;li&gt;Rotate keys.&lt;/li&gt;
&lt;/ul&gt;
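&lt;p&gt;A toy audit over a key inventory exported from a gateway; the field names and the 90-day threshold are my assumptions, not any vendor's real schema or policy:&lt;/p&gt;

```python
from datetime import date, timedelta

# Flag keys that are overdue for rotation or scoped too broadly.
MAX_AGE = timedelta(days=90)

def audit(keys, today):
    findings = []
    for k in keys:
        if today - k["issued"] > MAX_AGE:
            findings.append((k["id"], "rotate: older than 90 days"))
        if k["scope"] == "*":
            findings.append((k["id"], "tighten: wildcard scope"))
    return findings

keys = [
    {"id": "gw-key-1", "issued": date(2025, 1, 10), "scope": "*"},
    {"id": "gw-key-2", "issued": date(2025, 11, 1), "scope": "inference:read"},
]
print(audit(keys, date(2025, 12, 5)))
```

&lt;p&gt;Even this crude pass surfaces the two failure modes that matter: stale keys and keys that can do everything.&lt;/p&gt;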

&lt;p&gt;The more AI access you casually hand to random machines, the more surface area an attacker has to hijack your paid quota to generate their malware.&lt;/p&gt;

&lt;h3&gt;3. Hunt for the “thinking” latency&lt;/h3&gt;

&lt;p&gt;There is a specific behavioral quirk to this malware: &lt;strong&gt;latency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike a hard-coded script that executes instantly, AI-enabled malware has to pause:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Process starts.
&lt;/li&gt;
&lt;li&gt;Pause (network call to API).
&lt;/li&gt;
&lt;li&gt;Wait (LLM token generation).
&lt;/li&gt;
&lt;li&gt;Execute new command.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That pause — the "thinking time" — is a huntable anomaly. Look for processes that hang with an open network connection to an AI provider before spawning a child process or writing new script content to disk.&lt;/p&gt;
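&lt;p&gt;A rough sketch of that hunt logic, assuming you can pull per-process connect and child-spawn timestamps from your EDR telemetry; the event shape and the 5-second threshold are invented for illustration, not a specific product's schema:&lt;/p&gt;

```python
# Hunt for the "thinking time" pattern: a process opens a connection to an
# AI provider, sits idle, then spawns a child process. Timestamps in seconds.
AI_DOMAINS = {
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}
DWELL_THRESHOLD = 5.0  # seconds between the API call and the follow-on action

def hunt(events):
    """events: per-process dicts with connect/spawn timestamps and destination."""
    return [
        e["process"]
        for e in events
        if e["dest"] in AI_DOMAINS
        and e["spawn_ts"] - e["connect_ts"] >= DWELL_THRESHOLD
    ]

timeline = [
    {"process": "wscript.exe", "dest": "generativelanguage.googleapis.com",
     "connect_ts": 100.0, "spawn_ts": 112.4},  # long pause, then a child: suspicious
    {"process": "chrome.exe", "dest": "generativelanguage.googleapis.com",
     "connect_ts": 200.0, "spawn_ts": 200.3},  # near-instant: normal browsing
]
print(hunt(timeline))  # ['wscript.exe']
```

&lt;p&gt;Tune the threshold to your environment; the point is correlating the AI-API connection with what the process does immediately afterward, not either signal alone.&lt;/p&gt;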




&lt;h2&gt;Don’t Sleep on the Prototypes&lt;/h2&gt;

&lt;p&gt;Right now, PROMPTFLUX and its cousins look like what they are: first attempts. They’re weird. They’re noisy. And in a lot of environments, they’re catchable.&lt;/p&gt;

&lt;p&gt;That’s exactly why they matter.&lt;/p&gt;

&lt;p&gt;They show us what attackers are trying to learn: how to turn AI APIs into on-demand C2, and how to mutate code fast enough to outpace signatures.&lt;/p&gt;

&lt;p&gt;If you’re on the Blue Team, this isn’t the moment to panic.&lt;/p&gt;

&lt;p&gt;It’s the moment to quietly adjust your mental model.&lt;/p&gt;

&lt;p&gt;AI isn’t just a thing your security vendor bolts onto their product slide. It’s now a resource your adversaries can rent by the hour.&lt;/p&gt;

&lt;p&gt;The malware is still dumb.&lt;/p&gt;

&lt;p&gt;The operators aren't.&lt;/p&gt;

&lt;p&gt;Your job is to make sure your defenses aren’t either.&lt;/p&gt;




&lt;h2&gt;See more of my work and tools&lt;/h2&gt;

&lt;p&gt;Portfolio: &lt;a href="https://jtarkington-portfolio.netlify.app" rel="noopener noreferrer"&gt;https://jtarkington-portfolio.netlify.app&lt;/a&gt;&lt;br&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/jtarkington77" rel="noopener noreferrer"&gt;https://github.com/jtarkington77&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>malware</category>
      <category>ai</category>
      <category>blueteam</category>
    </item>
    <item>
      <title>Sealed Box AI: A Runbook for Owning Your Own Local-Only AI Stack</title>
      <dc:creator>jtarkington77</dc:creator>
      <pubDate>Sun, 30 Nov 2025 21:13:34 +0000</pubDate>
      <link>https://forem.com/jtarkington77/sealed-box-ai-a-runbook-for-owning-your-own-local-only-ai-stack-4p4i</link>
      <guid>https://forem.com/jtarkington77/sealed-box-ai-a-runbook-for-owning-your-own-local-only-ai-stack-4p4i</guid>
      <description>&lt;p&gt;I don’t really trust “private AI” that still runs on someone else’s hardware.&lt;/p&gt;

&lt;p&gt;Every vendor has some version of: &lt;em&gt;“Your data is safe, we don’t train on it, trust us.”&lt;/em&gt; But at the end of the day, you’re still piping sensitive work into a black box you don’t control, on infrastructure you can’t see, under policies that can change whenever it’s convenient.&lt;/p&gt;

&lt;p&gt;So I started designing what I actually wanted:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A sealed box, on my own hardware, where AI works &lt;strong&gt;for&lt;/strong&gt; me instead of &lt;strong&gt;on&lt;/strong&gt; me.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That turned into the &lt;strong&gt;Sealed Box AI Runbook&lt;/strong&gt; – a full write-up on how I run a local-only AI stack with a worker model, a watchdog model, local RAG, and agents, all behind my own guardrails.&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/jtarkington77/sealed-box-ai-runbook" rel="noopener noreferrer"&gt;https://github.com/jtarkington77/sealed-box-ai-runbook&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;What “Sealed Box AI” means here&lt;/h2&gt;

&lt;p&gt;This isn’t “install one app and call it a day.” It’s an architecture and a set of habits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worker model&lt;/strong&gt; – the main model that answers questions, writes code, drafts reports, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watchdog model&lt;/strong&gt; – a second model that reads &lt;em&gt;summaries&lt;/em&gt; of what the worker is doing and scores it for risky behavior, policy violations, or weird patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local RAG&lt;/strong&gt; – a retrieval layer (Qdrant in my case) that only indexes content I explicitly feed it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; – tightly scoped tools (internet research, intel sync, lab actions) that the worker can call, but only in specific ways.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict boundaries&lt;/strong&gt; – clear lanes between “things the model can see,” “things the model can touch,” and “things that never leave this box.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn’t “maximum complexity.” It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use powerful models, but &lt;strong&gt;own the stack and the blast radius.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Who this is for&lt;/h2&gt;

&lt;p&gt;If you’re:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running a homelab or small environment and want AI without handing everything to a cloud vendor
&lt;/li&gt;
&lt;li&gt;Doing blue-team / security work and don’t want incident data living in random SaaS logs
&lt;/li&gt;
&lt;li&gt;Building tools where privacy, provenance, and control actually matter
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…this runbook is written for you.&lt;/p&gt;

&lt;p&gt;It’s not a sales deck. It’s “here’s how I actually wire this up at home.”&lt;/p&gt;




&lt;h2&gt;Architecture at a glance&lt;/h2&gt;

&lt;p&gt;The stack is meant to be understandable even if you’re not an ML engineer.&lt;/p&gt;

&lt;p&gt;High-level flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You → Open WebUI&lt;/strong&gt; (or your UI of choice)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open WebUI → Worker model&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Worker can:

&lt;ul&gt;
&lt;li&gt;Call &lt;strong&gt;local tools/agents&lt;/strong&gt; (research, scripts, retrieval)&lt;/li&gt;
&lt;li&gt;Read from &lt;strong&gt;local RAG&lt;/strong&gt; (Qdrant) for your own notes, docs, logs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Each run generates a &lt;strong&gt;summary + metadata&lt;/strong&gt; (what tools were used, what it tried to do, etc.)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;watchdog model&lt;/strong&gt; reads those summaries and:

&lt;ul&gt;
&lt;li&gt;Flags risky behavior or policy violations&lt;/li&gt;
&lt;li&gt;Scores runs, so you can spot “spiky” or odd sessions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Everything lives behind your own network controls:

&lt;ul&gt;
&lt;li&gt;Reverse proxy / zero-trust edge if you expose anything
&lt;/li&gt;
&lt;li&gt;No direct inbound to the models
&lt;/li&gt;
&lt;li&gt;Clear separation between “inside the box” and “outside traffic”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
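&lt;p&gt;To show the shape of the watchdog step (steps 4–5 above): in the real stack a second model does the scoring, so this keyword heuristic is just a stand-in to make the loop concrete:&lt;/p&gt;

```python
# Minimal watchdog sketch: a second pass reads run summaries (never raw data)
# and scores them, so odd sessions stand out. Markers and weights are made up.
RISKY_MARKERS = {"outbound upload": 3, "shell command": 2, "new tool": 1}

def score_run(summary: str) -> int:
    """Higher score = more review-worthy run."""
    s = summary.lower()
    return sum(weight for marker, weight in RISKY_MARKERS.items() if marker in s)

runs = [
    "Answered a question using local RAG only.",
    "Invoked new tool, ran shell command, attempted outbound upload.",
]
scores = [score_run(r) for r in runs]
print(scores)  # [0, 6]
```

&lt;p&gt;The design point is the separation: the watchdog only ever sees summaries and metadata, so a compromised or misbehaving worker can't also rewrite its own report card's inputs unnoticed.&lt;/p&gt;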

&lt;p&gt;Think of it as combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-hosted LLM stack
&lt;/li&gt;
&lt;li&gt;Minimal SIEM-style visibility
&lt;/li&gt;
&lt;li&gt;Old-school “story of the system” runbook&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Hardware tiers: reality, not fantasy&lt;/h2&gt;

&lt;p&gt;The runbook doesn’t assume you own a data center. I break things down by &lt;strong&gt;VRAM tiers&lt;/strong&gt; and what you trade off at each level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;12–16 GB VRAM&lt;/strong&gt; – &lt;em&gt;Bare minimum&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;Smaller models, fewer concurrent agents, more careful prompt design.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16–24 GB VRAM&lt;/strong&gt; – &lt;em&gt;Comfortable for a primary box&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;Better 7B/8B models, more tools, more headroom.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;24+ GB VRAM&lt;/strong&gt; – &lt;em&gt;Where it gets fun&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;Multiple agents, stronger models, more experimentation without everything falling over.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea is: you can start on what you have now, and grow into the bigger build as you go.&lt;/p&gt;
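&lt;p&gt;A back-of-envelope way to place a model in a tier: weights take roughly parameter-count times bytes-per-weight at your quantization level, plus runtime overhead for KV cache and buffers. The 20% overhead factor here is my rough assumption, not a measured number:&lt;/p&gt;

```python
# Quick VRAM estimate for tier planning.
def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """params_b: parameter count in billions; returns an approximate GB figure."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return round(weights_gb * overhead, 1)

# An 8B model at 4-bit quantization fits the 12-16 GB tier with room to spare;
# the same model at fp16 pushes you toward the 24 GB tier.
print(est_vram_gb(8, 4))    # 4.8
print(est_vram_gb(8, 16))   # 19.2
```

&lt;p&gt;Context length and concurrent agents eat into whatever headroom is left, which is why the bigger tiers buy experimentation rather than just bigger models.&lt;/p&gt;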




&lt;h2&gt;What the runbook actually gives you&lt;/h2&gt;

&lt;p&gt;The repo isn’t just “here’s an idea.” It’s a practical guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conceptual design&lt;/strong&gt; – how worker + watchdog + RAG + agents fit together
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model selection notes&lt;/strong&gt; – what I’m using and why, and what you can swap
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network and host layout&lt;/strong&gt; – how I separate concerns and keep the blast radius small
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational habits&lt;/strong&gt; – how to think about logging, summaries, and watching your own AI stack
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build-sheet style notes&lt;/strong&gt; – so you can adapt it to your own hardware instead of copying blindly
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve ever wanted to move from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I hope my vendor’s ‘private AI’ story holds up”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I know exactly where this data lives and what these models can touch”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…that’s the gap this runbook is trying to close.&lt;/p&gt;




&lt;h2&gt;Grab the full runbook&lt;/h2&gt;

&lt;p&gt;If any of this resonates, the full write-up lives here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sealed Box AI Runbook&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/jtarkington77/sealed-box-ai-runbook" rel="noopener noreferrer"&gt;https://github.com/jtarkington77/sealed-box-ai-runbook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ll keep iterating on it as I test new models, refine the watchdog, and tighten the guardrails. Feedback, arguments, and “you’re missing a huge threat” comments are all welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;More of my work &amp;amp; tools:&lt;/em&gt;&lt;br&gt;&lt;br&gt;
Portfolio: &lt;a href="https://jtarkington-portfolio.netlify.app" rel="noopener noreferrer"&gt;https://jtarkington-portfolio.netlify.app&lt;/a&gt; &lt;br&gt;
GitHub: &lt;a href="https://github.com/jtarkington77" rel="noopener noreferrer"&gt;https://github.com/jtarkington77&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>cybersecurity</category>
      <category>homelab</category>
    </item>
  </channel>
</rss>
