<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Verifex</title>
    <description>The latest articles on Forem by Verifex (@verifex).</description>
    <link>https://forem.com/verifex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3845333%2F7b385c48-735c-486f-b496-4d0aa2054973.png</url>
      <title>Forem: Verifex</title>
      <link>https://forem.com/verifex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/verifex"/>
    <language>en</language>
    <item>
      <title>How we built a sanctions screening API that outperformed the Federal Reserve's benchmark</title>
      <dc:creator>Verifex</dc:creator>
      <pubDate>Sat, 11 Apr 2026 20:43:22 +0000</pubDate>
      <link>https://forem.com/verifex/how-we-built-a-sanctions-screening-api-that-outperformed-the-federal-reserves-benchmark-57m2</link>
      <guid>https://forem.com/verifex/how-we-built-a-sanctions-screening-api-that-outperformed-the-federal-reserves-benchmark-57m2</guid>
      <description>&lt;p&gt;The Federal Reserve published a sanctions screening &lt;br&gt;
benchmark in September 2025. Their best result using &lt;br&gt;
GPT-4o: 98.95% F1.&lt;/p&gt;

&lt;p&gt;We hit 100%. Here's how.&lt;/p&gt;

&lt;h2&gt;The problem with existing tools&lt;/h2&gt;

&lt;p&gt;90-95% of sanctions screening alerts are false positives. The industry spends an estimated $130B/year investigating alerts that turn out to be wrong.&lt;/p&gt;

&lt;p&gt;The root cause: basic fuzzy matching. Most tools use Jaro-Winkler with a threshold. That's it.&lt;/p&gt;
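&lt;p&gt;To see why a bare threshold fails, here is a from-scratch Jaro-Winkler sketch (our own illustration, not Verifex's code). Both the substring trap and the patronymic case from the sections below clear a typical 0.85 cutoff:&lt;/p&gt;

```python
def jaro(s1, s2):
    """Plain Jaro similarity: match counting within a window, plus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(len1, len2) // 2 - 1
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions: matched characters that appear in a different order.
    transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    m = matches
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3

def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scaling * (1 - j)

print(jaro_winkler("computing", "putin"))   # about 0.852: substring trap
print(jaro_winkler("ivan", "ivanov"))       # about 0.933: patronymic derivative
```

With an 0.85 threshold, "computing" flags against "putin" and "ivan" flags against "ivanov" — exactly the false-positive patterns the penalty layers below are built to suppress.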

&lt;h2&gt;What we built&lt;/h2&gt;

&lt;p&gt;Nine penalty layers target specific false-positive patterns, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Patronymic derivatives (Ivan ≠ Ivanov)&lt;/li&gt;
&lt;li&gt;Business-to-person mismatch&lt;/li&gt;
&lt;li&gt;Substring traps ("Computing" contains "Putin")&lt;/li&gt;
&lt;li&gt;Common name IDF weighting&lt;/li&gt;
&lt;li&gt;Mixed-script rejection&lt;/li&gt;
&lt;li&gt;Zero-width character evasion detection&lt;/li&gt;
&lt;/ul&gt;
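&lt;p&gt;Two of these layers are simple enough to sketch in a few lines. The following is our own minimal illustration (not the production logic): zero-width character detection catches invisible-character evasion, and a crude Latin/Cyrillic check catches mixed-script homoglyph spoofing.&lt;/p&gt;

```python
import unicodedata

# Zero-width and invisible characters sometimes inserted to evade exact matching.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def has_zero_width_evasion(name):
    """Flag names containing invisible characters."""
    return any(ch in ZERO_WIDTH for ch in name)

def is_mixed_script(name):
    """Flag names mixing Latin and Cyrillic letters (homoglyph spoofing).
    Crude heuristic: classify each letter by its Unicode name prefix."""
    scripts = set()
    for ch in name:
        if ch.isalpha():
            label = unicodedata.name(ch, "")
            if label.startswith("LATIN"):
                scripts.add("latin")
            elif label.startswith("CYRILLIC"):
                scripts.add("cyrillic")
    return len(scripts) > 1

print(has_zero_width_evasion("Pu\u200btin"))  # True: hidden zero-width space
print(is_mixed_script("P\u0430ris"))          # True: Cyrillic 'a' among Latin letters
```

In a real screener these would apply score penalties rather than hard rejections, and script detection would cover more than two scripts.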

&lt;h2&gt;The matching pipeline&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Normalization → smartNormalize()&lt;/li&gt;
&lt;li&gt;FAISS MiniLM semantic ANN search&lt;/li&gt;
&lt;li&gt;Jaro-Winkler + Monge-Elkan + Soft TF-IDF&lt;/li&gt;
&lt;li&gt;Double Metaphone phonetic blocking&lt;/li&gt;
&lt;li&gt;9 penalty layers&lt;/li&gt;
&lt;li&gt;LLM cascade (40-85 confidence range)&lt;/li&gt;
&lt;li&gt;Adjudication engine&lt;/li&gt;
&lt;/ol&gt;
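&lt;p&gt;Reading the 40-85 range as the ambiguous band that gets escalated to the LLM (our interpretation; the exact routing is not spelled out above), the cascade step can be sketched as a simple score router:&lt;/p&gt;

```python
def route(score, llm_review=None):
    """Route a scored candidate match (0-100 confidence).
    High scores auto-match, low scores auto-clear, and the ambiguous
    middle band (40-85 here, per the pipeline above) escalates to an
    LLM reviewer when one is supplied."""
    if score >= 85:
        return "match"
    if score >= 40:
        return llm_review(score) if llm_review else "needs_review"
    return "clear"

print(route(92))  # "match"
print(route(60))  # "needs_review" (would go to the LLM cascade)
print(route(12))  # "clear"
```

The point of the band is cost control: the LLM only sees candidates the cheap string and phonetic layers could not confidently decide.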

&lt;h2&gt;The benchmark&lt;/h2&gt;

&lt;p&gt;145 real test cases across 13 categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OFAC, UN, EU, UK sanctions lists&lt;/li&gt;
&lt;li&gt;Arabic/Cyrillic transliteration&lt;/li&gt;
&lt;li&gt;Phonetic matching&lt;/li&gt;
&lt;li&gt;Substring traps&lt;/li&gt;
&lt;li&gt;Adversarial inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: 145/145. 100% F1, 100% recall, 100% precision.&lt;/p&gt;
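&lt;p&gt;For readers who want to score their own runs against the dataset, the quoted metrics come from the standard definitions over true positives, false positives, and false negatives:&lt;/p&gt;

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1(145, 0, 0))  # (1.0, 1.0, 1.0): zero errors
print(precision_recall_f1(140, 3, 2))  # what a near-miss run would score
```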

&lt;p&gt;The Federal Reserve tested organization names only, Latin script only, across 10 countries. They explicitly noted that individual names and non-Latin scripts were "beyond the scope."&lt;/p&gt;

&lt;p&gt;That's exactly what we tested.&lt;/p&gt;

&lt;h2&gt;The dataset is public&lt;/h2&gt;

&lt;p&gt;verifex.dev/benchmark&lt;/p&gt;

&lt;p&gt;Anyone can run it against any provider.&lt;/p&gt;

&lt;p&gt;We're Verifex — sanctions screening API for developers.&lt;br&gt;
$49/month. verifex.dev&lt;/p&gt;

</description>
      <category>fintech</category>
      <category>api</category>
      <category>webdev</category>
      <category>security</category>
    </item>
  </channel>
</rss>
