<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Felipe Carvajal Brown</title>
    <description>The latest articles on Forem by Felipe Carvajal Brown (@fcarvajalbrown).</description>
    <link>https://forem.com/fcarvajalbrown</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3813778%2Fbe74c6e6-9d36-4bac-b311-1b61a0b3cfba.jpeg</url>
      <title>Forem: Felipe Carvajal Brown</title>
      <link>https://forem.com/fcarvajalbrown</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fcarvajalbrown"/>
    <language>en</language>
    <item>
      <title>MaskOps 0.1.0: A Native Polars Plugin for High-Speed PII Masking in Python</title>
      <dc:creator>Felipe Carvajal Brown</dc:creator>
      <pubDate>Mon, 09 Mar 2026 03:55:40 +0000</pubDate>
      <link>https://forem.com/fcarvajalbrown/maskops-010-a-native-polars-plugin-for-high-speed-pii-masking-in-python-850</link>
      <guid>https://forem.com/fcarvajalbrown/maskops-010-a-native-polars-plugin-for-high-speed-pii-masking-in-python-850</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a Rust-powered Polars plugin that detects GDPR-sensitive data (IBAN, EU VAT) at up to ~17 million rows per second and masks it at up to ~2.5 million rows per second, with no NLP models, no spaCy, and no Presidio overhead. &lt;code&gt;pip install maskops&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you work with financial data, healthcare records, or any GDPR-regulated dataset in Python, you've likely hit the same wall: &lt;strong&gt;de-identifying structured data at scale is painfully slow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The go-to solution is &lt;a href="https://github.com/microsoft/presidio" rel="noopener noreferrer"&gt;Microsoft Presidio&lt;/a&gt;. It's powerful, but it's built for unstructured text — it spins up a full spaCy NLP pipeline to find a phone number in a CSV column. For structured DataFrames where you already &lt;em&gt;know&lt;/em&gt; which columns contain PII, that's enormous overhead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Presidio with spaCy NER: ~1,000–5,000 rows/s (estimated)&lt;/li&gt;
&lt;li&gt;Presidio with regex-only recognizers: ~10,000–50,000 rows/s (estimated)&lt;/li&gt;
&lt;li&gt;Pure Python &lt;code&gt;re&lt;/code&gt; module: ~600,000–1,100,000 rows/s, depending on PII density (measured)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these integrate natively with Polars, one of the fastest DataFrame libraries in Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: maskops
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;maskops&lt;/code&gt; is a &lt;strong&gt;native Polars expression plugin&lt;/strong&gt; written in Rust. It extends Polars with two new expressions — &lt;code&gt;mask_pii()&lt;/code&gt; and &lt;code&gt;contains_pii()&lt;/code&gt; — that run directly on Arrow memory buffers with zero Python overhead per row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;polars&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;maskops&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Mask all PII in a column
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# "Transfer to DE89370400440532013000" → "Transfer to DE89******************"
&lt;/span&gt;
&lt;span class="c1"&gt;# Boolean detection — filter rows containing PII
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No model downloads, no engine initialization, no spaCy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tested on 1,000,000 rows, Intel i-series CPU, Python 3.14, Windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  maskops throughput
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Expression&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Rows/s&lt;/th&gt;
&lt;th&gt;MB/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;clean (no PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.404s&lt;/td&gt;
&lt;td&gt;2,477,599&lt;/td&gt;
&lt;td&gt;54.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;clean (no PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.169s&lt;/td&gt;
&lt;td&gt;5,915,970&lt;/td&gt;
&lt;td&gt;130.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense (all PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.385s&lt;/td&gt;
&lt;td&gt;722,104&lt;/td&gt;
&lt;td&gt;15.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense (all PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.059s&lt;/td&gt;
&lt;td&gt;16,987,879&lt;/td&gt;
&lt;td&gt;373.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed (50/50)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.760s&lt;/td&gt;
&lt;td&gt;1,315,407&lt;/td&gt;
&lt;td&gt;28.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed (50/50)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.133s&lt;/td&gt;
&lt;td&gt;7,498,315&lt;/td&gt;
&lt;td&gt;165.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vs pure Python regex (same machine)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;maskops &lt;code&gt;mask_pii&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;Python &lt;code&gt;re&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;clean&lt;/td&gt;
&lt;td&gt;0.404s&lt;/td&gt;
&lt;td&gt;0.925s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.3×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense&lt;/td&gt;
&lt;td&gt;1.385s&lt;/td&gt;
&lt;td&gt;1.653s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.2×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;0.760s&lt;/td&gt;
&lt;td&gt;1.337s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.8×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;On clean and mixed data maskops is roughly twice as fast (2.3× and 1.8×). On dense data, where every row is a full IBAN, the gap narrows to 1.2×: both implementations are regex-bound, so the bottleneck is the pattern itself, not Python overhead.&lt;/p&gt;
&lt;/blockquote&gt;
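
&lt;p&gt;As a quick sanity check, the throughput and speedup figures follow directly from the timings in the tables above. The sketch below recomputes them in plain Python. The dictionary and function names are illustrative only, and small differences from the Rows/s column are to be expected, presumably because the published times are rounded to three decimals.&lt;/p&gt;

```python
# Timings in seconds, copied from the tables above (1,000,000 rows per run).
ROWS = 1_000_000
maskops_mask_pii = {"clean": 0.404, "dense": 1.385, "mixed": 0.760}
python_re = {"clean": 0.925, "dense": 1.653, "mixed": 1.337}


def rows_per_second(seconds):
    """Throughput for one run over ROWS rows."""
    return ROWS / seconds


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Speedup of maskops over pure-Python re, rounded like the table.
summary = {
    profile: round(speedup(python_re[profile], maskops_mask_pii[profile]), 1)
    for profile in maskops_mask_pii
}
print(summary)

# Throughput of the slowest maskops case (dense mask_pii), in rows per second.
print(round(rows_per_second(maskops_mask_pii["dense"])))
```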

&lt;h3&gt;
  
  
  vs Microsoft Presidio (estimated)
&lt;/h3&gt;

&lt;p&gt;Presidio processes structured DataFrames via &lt;code&gt;presidio-structured&lt;/code&gt;, which runs a spaCy NLP pipeline per row. Based on community reports and the architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Throughput (structured data)&lt;/th&gt;
&lt;th&gt;Requires NLP model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;maskops&lt;/td&gt;
&lt;td&gt;~700K–17M rows/s&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Presidio (regex-only recognizers)&lt;/td&gt;
&lt;td&gt;~10–50K rows/s*&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Presidio (spaCy NER)&lt;/td&gt;
&lt;td&gt;~1–5K rows/s*&lt;/td&gt;
&lt;td&gt;Yes (250MB+)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;* Estimated from community benchmarks and Presidio's own documentation noting it is "not optimized for bulk structured data." &lt;a href="https://github.com/microsoft/presidio/discussions/1226" rel="noopener noreferrer"&gt;Microsoft confirmed no official throughput benchmarks exist.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;maskops is purpose-built for structured data pipelines where Presidio's NLP overhead is unnecessary.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The key is the &lt;a href="https://docs.pola.rs/user-guide/plugins/expr_plugins/" rel="noopener noreferrer"&gt;Polars expression plugin system&lt;/a&gt;, introduced in Polars 0.20. It lets you register custom Rust functions that Polars calls directly on its Arrow-backed &lt;code&gt;ChunkedArray&lt;/code&gt; columns, so the hot loop never goes through Python.&lt;/p&gt;

&lt;p&gt;The architecture is three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python (user code)
    ↓  register_plugin_function()
Polars expression engine
    ↓  Arrow ChunkedArray
Rust (maskops core)
    ↓  regex::Regex on &amp;amp;str slices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each PII type lives in its own Rust module (&lt;code&gt;iban.rs&lt;/code&gt;, &lt;code&gt;vat.rs&lt;/code&gt;) with a &lt;code&gt;once_cell::sync::Lazy&amp;lt;Regex&amp;gt;&lt;/code&gt; static, so each regex is compiled once on first use, not per row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Rust side — called directly by Polars on each string slice&lt;/span&gt;
&lt;span class="nd"&gt;#[polars_expr(output_type=String)]&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PolarsResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.str&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StringChunked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt;&lt;span class="nf"&gt;.apply&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;opt_val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;opt_val&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;borrow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Cow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Owned&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mask_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="nf"&gt;.into_series&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
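
&lt;p&gt;For readers who do not write Rust, the same shape can be sketched in plain Python: compile each pattern once, map a masking function over every value, and pass nulls through untouched. This is only an analogue of the kernel above, not the maskops implementation. The two patterns are simplified assumptions made up for this example (the real ones are not shown in this post), and &lt;code&gt;compiled&lt;/code&gt;, &lt;code&gt;keep_prefix&lt;/code&gt; and &lt;code&gt;mask_column&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import re
from functools import lru_cache

# Simplified, hypothetical patterns for illustration only; the real
# maskops patterns are not shown in the article and are likely stricter.
IBAN_SOURCE = r"\b([A-Z]{2}\d{2})([A-Z0-9]{11,30})\b"
VAT_SOURCE = r"\b([A-Z]{2})(\d{8,12})\b"


@lru_cache(maxsize=None)
def compiled(source):
    """Compile a pattern on first use and reuse it afterwards (like Lazy)."""
    return re.compile(source)


def keep_prefix(match):
    """Keep group 1 and replace group 2 with asterisks of equal length."""
    return match.group(1) + "*" * len(match.group(2))


def mask_all(text):
    """Apply every pattern in turn to one string value."""
    for source in (IBAN_SOURCE, VAT_SOURCE):
        text = compiled(source).sub(keep_prefix, text)
    return text


def mask_column(values):
    """Mask each value, passing nulls (None) through unchanged."""
    return [None if value is None else mask_all(value) for value in values]


column = ["Transfer to DE89370400440532013000", None, "No PII here"]
masked = mask_column(column)
print(masked)
```

&lt;p&gt;The Python version still pays interpreter overhead on every value, which is exactly the cost the Rust plugin avoids.&lt;/p&gt;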






&lt;h2&gt;
  
  
  Supported PII Patterns (v0.1.0)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IBAN&lt;/td&gt;
&lt;td&gt;All 36 SEPA countries&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DE89370400440532013000&lt;/code&gt; → &lt;code&gt;DE89******************&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EU VAT&lt;/td&gt;
&lt;td&gt;All 27 EU member states&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DE123456789&lt;/code&gt; → &lt;code&gt;DE*********&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tested against Faker-generated data in 8 EU locales: DE, FR, ES, IT, NL, PL, PT, SE.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Use Polars &lt;code&gt;.str.replace_all()&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;You could write &lt;code&gt;pl.col("x").str.replace_all(pattern, "****")&lt;/code&gt; directly in Polars. The problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You need one expression per PII type&lt;/strong&gt; — maskops applies all patterns in a single pass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No detection&lt;/strong&gt; — Polars has no &lt;code&gt;contains_pii()&lt;/code&gt; equivalent without writing the regex yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No masking logic&lt;/strong&gt;: &lt;code&gt;mask_pii&lt;/code&gt; preserves the IBAN country code and check digits, a common practice for audit trails. A &lt;code&gt;str.replace_all&lt;/code&gt; with a fixed replacement string wipes the whole match, and keeping the prefix means hand-writing capture groups for every pattern.&lt;/li&gt;
&lt;/ol&gt;
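
&lt;p&gt;The third point is easiest to see with a small experiment. The sketch below uses Python's standard &lt;code&gt;re&lt;/code&gt; module, not Polars or maskops, and a simplified IBAN pattern that is an assumption made for this example. It compares a fixed replacement, a backreference that keeps the prefix, and a callback that also preserves the length of the masked tail.&lt;/p&gt;

```python
import re

# Hypothetical, simplified IBAN pattern: group 1 is the country code plus
# check digits, group 2 is the account part. Not the pattern maskops uses.
IBAN = re.compile(r"\b([A-Z]{2}\d{2})([A-Z0-9]{11,30})\b")

text = "Transfer to DE89370400440532013000"

# 1. Fixed replacement string: the whole match is wiped.
wiped = IBAN.sub("****", text)

# 2. Backreference: the prefix survives, but the tail length is lost.
prefix_only = IBAN.sub(r"\1****", text)

# 3. Callback: keep the prefix and emit one asterisk per masked character.
length_preserving = IBAN.sub(
    lambda m: m.group(1) + "*" * len(m.group(2)), text
)


def contains_pii(value):
    """Detection is a search with the same pattern, without rewriting."""
    return IBAN.search(value) is not None


print(wiped)
print(prefix_only)
print(length_preserving)
print(contains_pii(text), contains_pii("No PII here"))
```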




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;v0.1.1&lt;/strong&gt;: Email, phone number, IP address patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0.1.2&lt;/strong&gt;: Format-Preserving Encryption (FPE/FF3-1) for reversible masking + PyPI publish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0.2.0&lt;/strong&gt;: Latin American IDs (Chilean RUT, Brazilian CPF, Mexican CURP)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Install &amp;amp; Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;maskops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;polars&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;maskops&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Payment from DE89370400440532013000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice VAT: DE123456789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No PII here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;masked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has_pii&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┬──────────────────────────────────┬─────────┐
│ transaction                         ┆ masked                           ┆ has_pii │
╞═════════════════════════════════════╪══════════════════════════════════╪═════════╡
│ Payment from DE89370400440532013000 ┆ Payment from DE89*************** ┆ true    │
│ Invoice VAT: DE123456789            ┆ Invoice VAT: DE*********         ┆ true    │
│ No PII here                         ┆ No PII here                      ┆ false   │
└─────────────────────────────────────┴──────────────────────────────────┴─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source code: &lt;a href="https://github.com/fcarvajalbrown/MaskOps" rel="noopener noreferrer"&gt;github.com/fcarvajalbrown/MaskOps&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Rust, pyo3-polars, and maturin. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#rust&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#polars&lt;/code&gt; &lt;code&gt;#gdpr&lt;/code&gt; &lt;code&gt;#dataengineering&lt;/code&gt; &lt;code&gt;#privacy&lt;/code&gt; &lt;code&gt;#pii&lt;/code&gt; &lt;code&gt;#opensource&lt;/code&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>privacy</category>
      <category>python</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
