<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Thomas </title>
    <description>The latest articles on Forem by Thomas  (@thoams_aidetection).</description>
    <link>https://forem.com/thoams_aidetection</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901081%2F11514e91-7a9c-44fb-a0ce-ac5853f80ade.png</url>
      <title>Forem: Thomas </title>
      <link>https://forem.com/thoams_aidetection</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thoams_aidetection"/>
    <language>en</language>
    <item>
      <title>The liar's dividend has a second payout, and devs helped build it</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Thu, 30 Apr 2026 20:25:36 +0000</pubDate>
      <link>https://forem.com/thoams_aidetection/the-liars-dividend-has-a-second-payout-and-devs-helped-build-it-22ji</link>
      <guid>https://forem.com/thoams_aidetection/the-liars-dividend-has-a-second-payout-and-devs-helped-build-it-22ji</guid>
      <description>&lt;h1&gt;
  
  
  The liar's dividend has a second payout, and devs helped build it
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The "liar's dividend" isn't just about faking things. It's about claiming real things are fake. Detection infrastructure, the very thing we built to fight deepfakes, is now being used as cover. This is a systems design problem as much as a machine learning one.&lt;/p&gt;




&lt;p&gt;I've been sitting with a Forbes piece on digital forensics and deepfakes for a few days, and the part that stuck wasn't the forensics. It was a phrase: "the liar's dividend's second payout."&lt;/p&gt;

&lt;p&gt;The first payout, if you haven't heard the term, comes from &lt;a href="https://scholarship.law.bu.edu/faculty_scholarship/640/" rel="noopener noreferrer"&gt;Chesney and Citron's 2019 paper&lt;/a&gt; on deepfakes and democracy. The idea is simple and brutal: once people know synthetic media exists, a bad actor can claim &lt;em&gt;any&lt;/em&gt; real, damaging media is fake. You don't need to make a convincing deepfake. You just need enough public doubt to muddy the water.&lt;/p&gt;

&lt;p&gt;The second payout is what we built next. And I mean "we" literally — developers, ML engineers, product teams. We built detection tools. Classification APIs. Real-time flagging pipelines. And in doing so, we handed the liars a new prop.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the escape hatch works
&lt;/h2&gt;

&lt;p&gt;Consider the logic a bad actor now has available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF  (incriminating_media EXISTS)
AND (public_awareness_of_deepfakes == HIGH)
AND (detection_tool_output != 100% certainty)
THEN
   claim "this is AI-generated"
   point to ambiguous classifier output as "proof"
   wait for news cycle to move on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't hypothetical. In 2023, a &lt;a href="https://apnews.com/article/slovakia-election-deepfake-audio-bbc-2023" rel="noopener noreferrer"&gt;Slovakian election audio clip&lt;/a&gt; of a candidate allegedly discussing election fraud circulated two days before polls opened. The candidate's party called it AI-generated. Analysts were split. The election happened before anyone reached consensus.&lt;/p&gt;

&lt;p&gt;That's the second payout: the detection ecosystem itself becomes the alibi. A shrug from a classifier is now a press release.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually see when I run stuff through detection
&lt;/h2&gt;

&lt;p&gt;I use &lt;a href="https://www.aiornot.com" rel="noopener noreferrer"&gt;AI or Not&lt;/a&gt; when something looks off to me — it handles images, video, and audio, which covers most of what circulates on social platforms. The output is a confidence score, not a verdict. That matters.&lt;/p&gt;

&lt;p&gt;A 73% "likely AI" rating on a clip is meaningful signal. It is not a court finding. The problem is that a 73% rating is also something a bad actor can screenshot and frame as "even the detectors aren't sure."&lt;/p&gt;

&lt;p&gt;This isn't a flaw in AI or Not specifically. It's a fundamental property of probabilistic classification. Every detection system that produces a confidence score below 100% will have that score weaponized by someone. We built the weapon while trying to build the shield.&lt;/p&gt;




&lt;h2&gt;
  
  
  The four things I'd do differently (as a builder)
&lt;/h2&gt;

&lt;p&gt;If I were shipping something in this space today, here's where I'd change my assumptions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design for legal weight, not just accuracy.&lt;/strong&gt; A 92% confidence score means nothing in a courtroom without a chain of custody, a known model version, and a documented methodology. If your output might ever be used as evidence, treat it that way from day one — not as an afterthought.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log model provenance explicitly.&lt;/strong&gt; Which version of the detector flagged this? What training data was it exposed to? These questions matter the moment someone disputes a finding in public. Most APIs I've worked with don't surface this at all.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in uncertainty communication by default.&lt;/strong&gt; Instead of a single score, surface a distribution. "This result falls in a range where the model produces false positives 18% of the time under these image conditions." Harder to misquote. A minimal sketch combining this with the provenance fields from point 2 follows the list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Think about the adversarial UI, not just the adversarial input.&lt;/strong&gt; We spend a lot of time thinking about adversarial examples that fool detectors. We spend almost no time thinking about how bad actors will &lt;em&gt;present&lt;/em&gt; detector output to audiences who don't understand what it means.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
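
&lt;p&gt;To make points 2 and 3 concrete, here's a minimal sketch of what a detection response could carry beyond a bare number. The field names, the false-positive table, and the run_classifier stub are hypothetical, not any vendor's actual API; the point is that the output is self-describing enough to survive being quoted out of context.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only: a hypothetical payload shape, not a real detection API.
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical table: empirical false-positive rates measured per capture condition.
FP_RATE_BY_CONDITION = {"low_light": 0.18, "screenshot": 0.11, "original_upload": 0.04}

def run_classifier(media_bytes):
    return 0.73  # stub standing in for the actual model call

@dataclass
class DetectionResult:
    score: float                # raw classifier confidence, 0.0 to 1.0
    model_version: str          # exact detector build that produced the score
    training_cutoff: str        # what generation of generators it has seen
    condition: str              # capture/image condition the FP rate refers to
    false_positive_rate: float  # empirical FP rate under that condition
    evaluated_at: str           # timestamp, for chain-of-custody logging

def evaluate(media_bytes, condition):
    return DetectionResult(
        score=run_classifier(media_bytes),
        model_version="detector-2026.04.1",   # assumption: versioned builds
        training_cutoff="2026-01",
        condition=condition,
        false_positive_rate=FP_RATE_BY_CONDITION.get(condition, 0.10),
        evaluated_at=datetime.now(timezone.utc).isoformat(),
    )

# A finding then reads "0.73 likely AI, 18% FP rate under low light, detector-2026.04.1"
# instead of a bare, screenshot-friendly "73%".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;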




&lt;h2&gt;
  
  
  The forensics paradox
&lt;/h2&gt;

&lt;p&gt;Here's the thing about digital forensics being the "only sure answer" to deepfakes: it requires a trusted institution to perform it, a trusted chain of custody for the media, and a public that believes the institution. All three of those are eroding simultaneously.&lt;/p&gt;

&lt;p&gt;A forensic finding from a university lab means less when half your audience thinks universities are politically captured. A chain of custody argument lands differently when the platform hosting the media is actively in a political fight.&lt;/p&gt;

&lt;p&gt;I'm not saying detection tools are useless — I keep using AI or Not because the signal is real and it's gotten my antenna up on things I would have scrolled past. But I've started thinking of detection as one input into a much larger trust problem, not as a solution to it.&lt;/p&gt;

&lt;p&gt;The liar's dividend was always about epistemics, not technology. We built better detectors and handed the epistemics problem a new set of props.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually changes the calculus
&lt;/h2&gt;

&lt;p&gt;A few things that seem underbuilt relative to the detection side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provenance standards.&lt;/strong&gt; The &lt;a href="https://c2pa.org/" rel="noopener noreferrer"&gt;C2PA spec&lt;/a&gt; attaches cryptographic provenance to media at capture time. If the camera signs the frame and the signature breaks on edit, that's a different kind of evidence than a classifier score. It's not widespread yet, but it's the right direction. A toy signing sketch follows this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal frameworks for false claims of AI generation.&lt;/strong&gt; Right now there's almost no cost to wrongly claiming something is a deepfake. A few jurisdictions are looking at this; none have moved fast enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial red-teaming of the human layer.&lt;/strong&gt; We red-team models constantly. We almost never red-team how users and journalists will misread or be manipulated by model output.&lt;/li&gt;
&lt;/ul&gt;
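
&lt;p&gt;As a toy illustration of why capture-time signatures change the evidence story, here is a minimal sketch using an Ed25519 keypair from the cryptography package. It is not the C2PA manifest format; it only shows the property the spec builds on: any edit to the signed bytes makes verification fail outright instead of producing an ambiguous score.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy provenance check, not the C2PA spec: a camera-held private key signs the
# captured bytes; anyone with the public key can verify them later.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

camera_key = Ed25519PrivateKey.generate()   # stays inside the capture device
public_key = camera_key.public_key()        # published for verifiers

frame = b"raw sensor bytes at capture time"
signature = camera_key.sign(frame)

def is_authentic(media, sig):
    try:
        public_key.verify(sig, media)       # binary outcome, no confidence score
        return True
    except InvalidSignature:
        return False

print(is_authentic(frame, signature))                 # True
print(is_authentic(frame + b" edited", signature))    # False: signature breaks on edit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;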

</description>
      <category>ai</category>
      <category>llm</category>
      <category>security</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>LLM Drift: Why Your AI Detection Pipeline is Quietly Decaying (Kimi K2 Benchmark)</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Mon, 27 Apr 2026 21:09:02 +0000</pubDate>
      <link>https://forem.com/thoams_aidetection/llm-drift-why-your-ai-detection-pipeline-is-quietly-decaying-kimi-k2-benchmark-3gml</link>
      <guid>https://forem.com/thoams_aidetection/llm-drift-why-your-ai-detection-pipeline-is-quietly-decaying-kimi-k2-benchmark-3gml</guid>
      <description>&lt;p&gt;A short field report on what current AI detectors actually do when you point them at frontier reasoning model output, and what I changed in my own detection workflow.&lt;/p&gt;

&lt;p&gt;I integrate AI detection into a few small side projects—content moderation pre-filters, writing quality flags, etc. The more I relied on detection, the more concerned I became that I was trusting numbers based on stale benchmarks.&lt;/p&gt;

&lt;p&gt;This week, a benchmark study confirmed my worst fears. It tested two popular detectors against 47 essays generated by Kimi K2 in "thinking mode," which mimics modern, high-variance LLM output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffejw7xpx6alsly100auf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffejw7xpx6alsly100auf.png" alt=" " width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ZeroGPT missed 62% of the AI content. For context, the same study notes that ZeroGPT classifies the 1776 U.S. Declaration of Independence as 99% AI-generated. If a detector flags famously human text as AI, its false-positive rate is high enough to undermine its positives on actual AI text.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Legacy Detection Fails Modern LLMs
&lt;/h2&gt;

&lt;p&gt;If you've shipped &lt;a href="//www.aiornot.com"&gt;AI detection&lt;/a&gt;, you probably integrated it once, picked a confidence threshold, and considered the job done. This is the failure mode the benchmark exposes: Detector accuracy is not stable across model generations.&lt;/p&gt;

&lt;p&gt;Most public detectors were built around three assumptions about older LLM output (a toy sketch of two of them follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Low perplexity: Text is predictable and falls below a certain perplexity score $\rightarrow$ Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uniform structure (Low Burstiness): Sentences have low variance in length and structure $\rightarrow$ Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable features: Use of function-word patterns and standard transition phrases $\rightarrow$ Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
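
&lt;p&gt;To make these assumptions concrete, here is a toy version of the burstiness heuristic (assumption 2), with word repetition as a crude stand-in for the perplexity signal (assumption 1). The thresholds are invented for illustration; this is not any real detector's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy legacy-style heuristic: low sentence-length variance ("burstiness") and
# highly repetitive vocabulary used to be treated as AI signals.
import re
from statistics import mean, pvariance

def legacy_style_flags(text, burstiness_floor=20.0, repetition_ceiling=0.45):
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    burstiness = pvariance(lengths) if len(lengths) &amp;gt; 1 else 0.0

    words = text.lower().split()
    repetition = 1 - (len(set(words)) / len(words)) if words else 0.0

    return {
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        "burstiness": burstiness,    # low variance used to mean: flag as AI
        "repetition": repetition,    # crude stand-in for the perplexity signal
        "legacy_verdict_ai": burstiness &amp;lt; burstiness_floor and repetition &amp;gt; repetition_ceiling,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;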

&lt;p&gt;Reasoning models like Kimi K2, Gemini 2.5 Pro, and GPT-5 break all three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output is contextually adaptive, meaning perplexity varies wildly within a single response.&lt;/li&gt;
&lt;li&gt;Sentence variance increases during exploratory "thinking" passages.&lt;/li&gt;
&lt;li&gt;Token distributions are deliberately broadened to mimic human reasoning rhythms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your detector hasn't been retrained on current reasoning model output, it’s classifying against a distribution that no longer exists in production. ZeroGPT's 38% accuracy is the result of this structural drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Actionable Fixes: Hardening the Detection Pipeline
&lt;/h2&gt;

&lt;p&gt;After re-checking my own setup, here are the four concrete changes I made.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Confidence Threshold Raised to 0.85&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 0.62 mean confidence on a fully AI-positive test set indicates that individual high-looking scores can still be coin flips. For anything that triggers an action (like a submission rejection or account flag), I now require multi-signal corroboration or human review if the score is below 0.85; a minimal sketch of that gate follows.&lt;/p&gt;
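
&lt;p&gt;The threshold values and the secondary signal names in this sketch are my own choices for illustration, not anything the benchmark prescribes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch of the action gate: a detector score below 0.85 never triggers an
# automated action on its own.
ACTION_THRESHOLD = 0.85

def decide(detector_score, secondary_signals):
    # secondary_signals: e.g. [account_age_flag, paste_burst_flag, link_spam_flag], each 0 or 1
    if detector_score &amp;gt;= ACTION_THRESHOLD:
        return "auto_action"        # high-confidence detector hit
    if detector_score &amp;gt;= 0.60 and sum(secondary_signals) &amp;gt;= 2:
        return "auto_action"        # mid-range score, corroborated by other signals
    if detector_score &amp;gt;= 0.60:
        return "human_review"       # mid-range score, uncorroborated
    return "allow"                  # below the noise floor

print(decide(0.62, [0, 1, 0]))      # human_review, not an automatic rejection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;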

&lt;p&gt;&lt;strong&gt;2. Build a Held-Out Test Set from Current Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’m now generating my own validation samples from current frontier models (Kimi K2, Claude Sonnet 4.6, GPT-5, Gemini 2.5 Pro) and running them through my detection layer monthly.&lt;/p&gt;

&lt;p&gt;The set also includes "human-positive" texts (like the Declaration) to constantly monitor the false-positive rate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pseudo-code for the monitoring set I now keep around
HELD_OUT = {
    "ai_positive": [
        # 50 samples each from current frontier models
        kimi_k2_samples,
        claude_sonnet_4_6_samples,
        gpt_5_samples,
        gemini_2_5_pro_samples,
    ],
    "human_positive": [
        # public-domain texts written before 2020
        declaration_of_independence,
        federalist_papers_excerpts,
        public_domain_essays,
    ],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. Treat Detection as a Probabilistic Component&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even 97% accuracy means a 3% misclassification rate at scale. For anything where the cost of an error is real, detection must be a signal, not a verdict.&lt;/p&gt;
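
&lt;p&gt;A quick back-of-the-envelope calculation shows why; the daily volume below is a made-up number for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Even a strong detector produces a steady stream of wrong calls at volume.
accuracy = 0.97
daily_items = 50_000   # hypothetical moderation volume
print(round(daily_items * (1 - accuracy)))   # 1500 misclassified items, every day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;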

&lt;p&gt;&lt;strong&gt;4. Verify Modality Fit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I use AI or Not for image and audio checks in my projects because it covers multiple modalities. The Kimi K2 benchmark gave me a current-model accuracy number for the text side, which closed a vital verification gap I couldn't easily close on my own.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimum-Viable Detector-Monitoring Pattern
&lt;/h2&gt;

&lt;p&gt;If you are running detection in a production pipeline, this is the basic ML hygiene that keeps the integration from silently failing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOOP (monthly):
   for detector in production_pipeline:
       accuracy_ai      = run(detector, HELD_OUT.ai_positive)
       accuracy_human   = run(detector, HELD_OUT.human_positive)
       mean_confidence  = avg_confidence(detector, HELD_OUT.ai_positive)

       if accuracy_ai     &amp;lt; baseline.ai - 0.05:    alert("AI detection regressed")
       if accuracy_human  &amp;lt; baseline.human - 0.05: alert("FP rate increased")
       if mean_confidence &amp;lt; baseline.conf - 0.10:  alert("Detector going uncertain")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Most teams I've seen integrate detection once and never check it again. This pattern is essential because accuracy decays per model generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;97% vs 38% on Kimi K2 essays shows a structural, not a tuning, gap.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Detector accuracy decays per model generation. Re-benchmark quarterly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test false-positive rate against famously human text (the Declaration of Independence is a free check).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Raise your confidence threshold; one number is not a verdict.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a held-out test set from current models and monitor it on cadence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running detection in production and you can't name the generation of model you benchmarked against, you have an invisible calibration gap. The benchmark was the wake-up call; the monitoring pattern is what makes the fix permanent.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
