<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Verifex</title>
    <description>The latest articles on Forem by Verifex (@verifex).</description>
    <link>https://forem.com/verifex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3845333%2F7b385c48-735c-486f-b496-4d0aa2054973.png</url>
      <title>Forem: Verifex</title>
      <link>https://forem.com/verifex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/verifex"/>
    <language>en</language>
    <item>
      <title>How we built a sanctions screening API that outperformed the Federal Reserve's benchmark</title>
      <dc:creator>Verifex</dc:creator>
      <pubDate>Sat, 11 Apr 2026 20:43:22 +0000</pubDate>
      <link>https://forem.com/verifex/how-we-built-a-sanctions-screening-api-that-outperformed-the-federal-reserves-benchmark-57m2</link>
      <guid>https://forem.com/verifex/how-we-built-a-sanctions-screening-api-that-outperformed-the-federal-reserves-benchmark-57m2</guid>
      <description>&lt;p&gt;The Federal Reserve published a sanctions screening &lt;br&gt;
benchmark in September 2025. Their best result using &lt;br&gt;
GPT-4o: 98.95% F1.&lt;/p&gt;

&lt;p&gt;We hit 100%. Here's how.&lt;/p&gt;

&lt;h2&gt;The problem with existing tools&lt;/h2&gt;

&lt;p&gt;90-95% of sanctions screening alerts are false positives. The industry spends an estimated $130B/year investigating alerts that turn out to be wrong.&lt;/p&gt;

&lt;p&gt;The root cause: basic fuzzy matching. Most tools use Jaro-Winkler with a threshold. That's it.&lt;/p&gt;
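&lt;p&gt;To see why a bare threshold fails, here is a from-scratch Jaro-Winkler sketch (our own illustration, not Verifex's code). Both the substring trap and the patronymic case from the sections below clear a typical 0.85 cutoff:&lt;/p&gt;

```python
def jaro(s1, s2):
    """Plain Jaro similarity: match counting within a window, plus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(len1, len2) // 2 - 1
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions: matched characters that appear in a different order.
    transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    m = matches
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3

def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scaling * (1 - j)

print(jaro_winkler("computing", "putin"))   # about 0.852: substring trap
print(jaro_winkler("ivan", "ivanov"))       # about 0.933: patronymic derivative
```

With an 0.85 threshold, "computing" flags against "putin" and "ivan" flags against "ivanov" — exactly the false-positive patterns the penalty layers below are built to suppress.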

&lt;h2&gt;What we built&lt;/h2&gt;

&lt;p&gt;Nine penalty layers target specific false-positive patterns, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Patronymic derivatives (Ivan ≠ Ivanov)&lt;/li&gt;
&lt;li&gt;Business-to-person mismatch&lt;/li&gt;
&lt;li&gt;Substring traps ("Computing" contains "Putin")&lt;/li&gt;
&lt;li&gt;Common name IDF weighting&lt;/li&gt;
&lt;li&gt;Mixed-script rejection&lt;/li&gt;
&lt;li&gt;Zero-width character evasion detection&lt;/li&gt;
&lt;/ul&gt;
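&lt;p&gt;Two of these layers are simple enough to sketch in a few lines. The following is our own minimal illustration (not the production logic): zero-width character detection catches invisible-character evasion, and a crude Latin/Cyrillic check catches mixed-script homoglyph spoofing.&lt;/p&gt;

```python
import unicodedata

# Zero-width and invisible characters sometimes inserted to evade exact matching.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def has_zero_width_evasion(name):
    """Flag names containing invisible characters."""
    return any(ch in ZERO_WIDTH for ch in name)

def is_mixed_script(name):
    """Flag names mixing Latin and Cyrillic letters (homoglyph spoofing).
    Crude heuristic: classify each letter by its Unicode name prefix."""
    scripts = set()
    for ch in name:
        if ch.isalpha():
            label = unicodedata.name(ch, "")
            if label.startswith("LATIN"):
                scripts.add("latin")
            elif label.startswith("CYRILLIC"):
                scripts.add("cyrillic")
    return len(scripts) > 1

print(has_zero_width_evasion("Pu\u200btin"))  # True: hidden zero-width space
print(is_mixed_script("P\u0430ris"))          # True: Cyrillic 'a' among Latin letters
```

In a real screener these would apply score penalties rather than hard rejections, and script detection would cover more than two scripts.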

&lt;h2&gt;The matching pipeline&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Normalization → smartNormalize()&lt;/li&gt;
&lt;li&gt;FAISS MiniLM semantic ANN search&lt;/li&gt;
&lt;li&gt;Jaro-Winkler + Monge-Elkan + Soft TF-IDF&lt;/li&gt;
&lt;li&gt;Double Metaphone phonetic blocking&lt;/li&gt;
&lt;li&gt;9 penalty layers&lt;/li&gt;
&lt;li&gt;LLM cascade (40-85 confidence range)&lt;/li&gt;
&lt;li&gt;Adjudication engine&lt;/li&gt;
&lt;/ol&gt;
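&lt;p&gt;Reading the 40-85 range as the ambiguous band that gets escalated to the LLM (our interpretation; the exact routing is not spelled out above), the cascade step can be sketched as a simple score router:&lt;/p&gt;

```python
def route(score, llm_review=None):
    """Route a scored candidate match (0-100 confidence).
    High scores auto-match, low scores auto-clear, and the ambiguous
    middle band (40-85 here, per the pipeline above) escalates to an
    LLM reviewer when one is supplied."""
    if score >= 85:
        return "match"
    if score >= 40:
        return llm_review(score) if llm_review else "needs_review"
    return "clear"

print(route(92))  # "match"
print(route(60))  # "needs_review" (would go to the LLM cascade)
print(route(12))  # "clear"
```

The point of the band is cost control: the LLM only sees candidates the cheap string and phonetic layers could not confidently decide.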

&lt;h2&gt;The benchmark&lt;/h2&gt;

&lt;p&gt;145 real test cases across 13 categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OFAC, UN, EU, UK sanctions lists&lt;/li&gt;
&lt;li&gt;Arabic/Cyrillic transliteration&lt;/li&gt;
&lt;li&gt;Phonetic matching&lt;/li&gt;
&lt;li&gt;Substring traps&lt;/li&gt;
&lt;li&gt;Adversarial inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: 145/145. 100% F1, 100% recall, 100% precision.&lt;/p&gt;
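&lt;p&gt;For readers who want to score their own runs against the dataset, the quoted metrics come from the standard definitions over true positives, false positives, and false negatives:&lt;/p&gt;

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1(145, 0, 0))  # (1.0, 1.0, 1.0): zero errors
print(precision_recall_f1(140, 3, 2))  # what a near-miss run would score
```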

&lt;p&gt;The Federal Reserve tested organization names only, Latin script only, across 10 countries. They explicitly noted that individual names and non-Latin scripts were "beyond the scope."&lt;/p&gt;

&lt;p&gt;That's exactly what we tested.&lt;/p&gt;

&lt;h2&gt;The dataset is public&lt;/h2&gt;

&lt;p&gt;verifex.dev/benchmark&lt;/p&gt;

&lt;p&gt;Anyone can run it against any provider.&lt;/p&gt;

&lt;p&gt;We're Verifex — sanctions screening API for developers.&lt;br&gt;
$49/month. verifex.dev&lt;/p&gt;

</description>
      <category>fintech</category>
      <category>api</category>
      <category>webdev</category>
      <category>security</category>
    </item>
  </channel>
</rss>
