<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Guy Gontar</title>
    <description>The latest articles on Forem by Guy Gontar (@guy_gontar_7dca4bc5499c48).</description>
    <link>https://forem.com/guy_gontar_7dca4bc5499c48</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1669863%2F3f94f616-b8bf-4e11-a227-01b8439cbcd9.png</url>
      <title>Forem: Guy Gontar</title>
      <link>https://forem.com/guy_gontar_7dca4bc5499c48</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/guy_gontar_7dca4bc5499c48"/>
    <language>en</language>
    <item>
      <title>Our API Was Successfully Hacked. Here's What We Fixed Before the Regulator Showed Up</title>
      <dc:creator>Guy Gontar</dc:creator>
      <pubDate>Tue, 12 May 2026 07:46:47 +0000</pubDate>
      <link>https://forem.com/guy_gontar_7dca4bc5499c48/when-the-pentest-wins-hardening-a-legacy-api-without-breaking-the-business-4dbc</link>
      <guid>https://forem.com/guy_gontar_7dca4bc5499c48/when-the-pentest-wins-hardening-a-legacy-api-without-breaking-the-business-4dbc</guid>
      <description>&lt;p&gt;The report was a disaster.&lt;/p&gt;

&lt;p&gt;During a scheduled Penetration Test, the security firm didn’t just find "theoretical vulnerabilities"—they walked out the digital front door with a database full of real customer PII. They didn't need a complex zero-day exploit; they just used the doors we left wide open.&lt;/p&gt;

&lt;p&gt;This is the story of how a "successful" hack became the starting point for a deep-cleaning mission of a legacy system built on outsourced layers and technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 0: The Outsourcing Relay&lt;/strong&gt;&lt;br&gt;
The system I inherited was a classic "black box."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team A&lt;/strong&gt; (Outsourced) built the foundation under a "speed at all costs" mandate. They were eventually fired for delays.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team B&lt;/strong&gt; (Also outsourced) took over, but followed a strict "don't touch what isn't broken" policy. The API layer was treated as sacred ground, even though it was built on sand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; A backend where business logic worked, but security was non-existent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The "String Interpolation" Disaster&lt;/strong&gt;&lt;br&gt;
The Pentest confirmed our worst fear: SQL Injection was everywhere. The code was littered with raw string interpolations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The hacker's playground&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;sql&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SELECT * FROM account WHERE id = $accountId"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a legacy environment, developers often use $var because it’s fast. But in a financial system, it’s a liability. By passing variables directly into the query string, we were essentially letting the client-side input write our database logic.&lt;/p&gt;

&lt;p&gt;The Hardening: We systematically replaced every raw query with JDBI Named Parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;sql&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SELECT * FROM account WHERE id = :accId"&lt;/span&gt;
&lt;span class="c1"&gt;// Bound securely via .bind("accId", accountId)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change alone neutralized the primary entry point the pentesters used. With bound parameters, the driver treats the value strictly as data and never as SQL text, so user input can no longer change the structure of the query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Unlocked Front Door: Query Parameter Injection&lt;/strong&gt;&lt;br&gt;
After I started binding my SQL queries, I realized a second, more subtle vulnerability: the Filter Builder.&lt;/p&gt;

&lt;p&gt;In an effort to be "flexible," the legacy system had a method that grabbed parameters from the URL and appended them into a list of strings to build the WHERE clause. It was a masterpiece of technical debt that looked like this:&lt;/p&gt;

&lt;p&gt;The "Legacy Special" (Implicit Trust):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Direct concatenation from the URL context&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryParam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"accountId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doc.account_id = $it"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryParam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"startDate"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DATE(dt) &amp;gt;= '$it'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Vulnerability: Even if the final query was "parameterized," the logic itself was being written by the user. If a hacker sent &lt;code&gt;?accountId=1 OR 1=1&lt;/code&gt;, the filter list would happily include a condition that exposed every record in the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Surgical Fix: The "Type Shield"&lt;/strong&gt;&lt;br&gt;
We refactored this into a Type Gateway. Instead of trusting the string, we forced every parameter to prove its identity before it was allowed near the query builder.&lt;/p&gt;

&lt;p&gt;The Hardened Version (Zero Trust):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Numeric Shields (toLongOrNull is your best friend)&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryParam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"accountId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toLongOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doc.account_id = $id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Regex Validation for Strings&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;dateRegex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""^\d{4}-\d{2}-\d{2}$"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryParam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"startDate"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;takeIf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dateRegex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DATE(dt) &amp;gt;= '$date'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// JSONB Hardening&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryParam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"depositId"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toLongOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;// Casting to Long ensures the JSON structure can't be "broken out of"&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""doc.meta @&amp;gt; '{"items":[$id]}'"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this mattered:&lt;br&gt;
The BIGINT Shield: By using toLongOrNull(), any non-numeric "injection" simply returns null, and the filter is ignored. The attack disappears before it's even born.&lt;/p&gt;

&lt;p&gt;Regex as a Filter: If a startDate doesn't match the YYYY-MM-DD pattern perfectly, it’s not invited to the party.&lt;/p&gt;

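&lt;p&gt;Sanitizing the Strings: For fields where we had to allow text (like type or status), we implemented a surgical replace("'", "''") to ensure a single quote couldn't break the SQL string. Quote doubling is a stopgap, though: whether it is sufficient depends on the database's string-escaping settings, so bound parameters or a fixed allow-list of values remain the safer choice for these fields.&lt;/p&gt;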
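
&lt;p&gt;The same gatekeeping idea can be taken one step further by binding the validated values instead of interpolating them into the filter text. What follows is only a minimal sketch, written in Python with the standard-library sqlite3 module standing in for Kotlin, JDBI, and PostgreSQL. The doc table, its columns, the ALLOWED_STATUS set, and the build_filters and find_docs helpers are all assumptions made for illustration, not code from the real system.&lt;/p&gt;

```python
import re
import sqlite3

ALLOWED_STATUS = {"open", "closed"}  # hypothetical allow-list of text values


def build_filters(params):
    """Return (where_sql, bindings) built only from validated values."""
    clauses, bindings = [], {}
    account_id = params.get("accountId", "")
    if re.fullmatch(r"[0-9]{1,18}", account_id):
        clauses.append("account_id = :account_id")
        bindings["account_id"] = int(account_id)
    status = params.get("status", "")
    if status in ALLOWED_STATUS:
        clauses.append("status = :status")
        bindings["status"] = status
    # Assumption: fail closed (match nothing) when no filter validates.
    where_sql = " AND ".join(clauses) if clauses else "1 = 0"
    return where_sql, bindings


def find_docs(conn, params):
    where_sql, bindings = build_filters(params)
    # Only fixed, developer-written fragments are concatenated here.
    sql = "SELECT id FROM doc WHERE " + where_sql + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, bindings)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER, account_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO doc VALUES (?, ?, ?)",
    [(1, 10, "open"), (2, 10, "closed"), (3, 20, "open")],
)
```

&lt;p&gt;In this sketch the request can only choose which pre-written clauses are included and which values are bound to them. A payload such as "10 OR 1=1" fails the numeric check, so it never reaches the SQL text at all. Note that the sketch fails closed when nothing validates, which is a design choice of the example rather than a description of our system.&lt;/p&gt;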

&lt;p&gt;&lt;strong&gt;3. The "Any?" Type-Safety Minefield&lt;/strong&gt;&lt;br&gt;
Security isn't just about SQL; it’s about contracts. Because the system grew through different teams, the core functions were written to be "flexible." One crucial function looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;newMovement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;?,&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why Any?? Because the Account ID could come from a JSON ObjectNode, a String key, or a Long from the DB. Instead of handling types, the code just accepted everything.&lt;/p&gt;

&lt;p&gt;The Production Crash: While hardening this, a colleague added new parameters to the middle of the function's parameter list. Because the existing calls were positional and the parameter type was Any?, the shifted arguments still compiled and the compiler stayed silent. In production, a String was passed into a BIGINT slot.&lt;br&gt;
The Lesson: If your API doesn't enforce types at the front door, "garbage" will eventually reach your database. In our case, it led to a PSQLException that I had to debug over the phone while on a road trip.&lt;/p&gt;

&lt;p&gt;When you can’t refactor the entire app (because the system ID is an Int but the DB expects a Long), you have to build a "Gatekeeper."&lt;/p&gt;

&lt;p&gt;We couldn't fix Team A's architectural choices overnight. Instead, we implemented Surgical Casting at the service boundary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Forcing the contract before it touches the DAO&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;boundId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLong&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
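    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;accountId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLongOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;SecurityException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unauthorized ID format"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;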
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;SecurityException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unauthorized ID format"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensured that no matter how messy the "Outsourced Relay" became, the core database operations were shielded by strict type validation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuqpwhzecbunvexjb1c9.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuqpwhzecbunvexjb1c9.webp" alt=" " width="720" height="479"&gt;&lt;/a&gt;&lt;br&gt;
Hardening a legacy system under regulatory pressure isn’t a sprint — it’s a negotiation. You’re negotiating with old code you didn’t write, with business stakeholders who fear lawsuits and with a regulator who expects a remediation report with actual closure dates. What the pentest gave us, ironically, was leverage. For the first time, the risk wasn’t theoretical. It was a PDF with our customer data in it. That changes conversations. We didn’t fix everything at once — but we fixed what mattered, documented it properly, and passed the next audit. Sometimes that’s exactly what winning looks like.&lt;/p&gt;

</description>
      <category>security</category>
      <category>api</category>
      <category>sql</category>
      <category>backend</category>
    </item>
    <item>
      <title>When os.walk() Freezes: Diagnosing and Fixing Silent Network Hangs on NetApp Shared Storage with Python</title>
      <dc:creator>Guy Gontar</dc:creator>
      <pubDate>Mon, 11 May 2026 07:51:47 +0000</pubDate>
      <link>https://forem.com/guy_gontar_7dca4bc5499c48/when-oswalk-freezes-diagnosing-and-fixing-silent-network-hangs-on-netapp-shared-storage-with-3o1p</link>
      <guid>https://forem.com/guy_gontar_7dca4bc5499c48/when-oswalk-freezes-diagnosing-and-fixing-silent-network-hangs-on-netapp-shared-storage-with-3o1p</guid>
      <description>&lt;p&gt;&lt;u&gt;The Problem&lt;br&gt;
&lt;/u&gt;Our team manages a NetApp shared storage cluster holding operational data for approximately 1,500 entities. Each entity has its own directory subtree, split by record type and date, resulting in a tree of roughly 4,500 leaf directories per date. Each leaf directory held tens of small .txt files — each containing a JSON array of records dumped by a backend service — scattered across the tree with no consolidation.&lt;br&gt;
The file count had grown to the point where NetApp itself started reporting problems. Inodes were being exhausted, quotas were being hit, and our sysadmin was fielding out-of-disk-space errors even after physically expanding the storage. The culprit wasn't raw capacity; it was file count. NetApp, like most enterprise storage systems, caps the number of files a volume can hold through a limited pool of inodes, independent of how many bytes are free. Running out of inodes while still having free disk space is a well-known but frequently misunderstood failure mode.&lt;br&gt;
The solution was straightforward in concept: consolidate the small files per leaf directory into single JSONL files — one JSON object per line — reducing tens of files per directory down to one. A Python script using os.walk() to traverse the tree, parse each JSON array, merge the objects, sort by timestamp, and write a consolidated output file. Standard stuff.&lt;br&gt;
The problem emerged at scale.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Diagnosing the Real Cause&lt;br&gt;
&lt;/u&gt;The first version of the consolidation script ran fine on individual leaf directories and small subtrees. Pointed at the full tree — all 1,500 entities across three dates — it appeared to start normally, printed its initial configuration, and then went silent. No output. No tqdm progress bar. No exception. No traceback. The process was running according to Task Manager but nothing was happening.&lt;br&gt;
The natural first instinct was a code bug. I attached a debugger and set breakpoints at the point where matched directories were being appended to the results list. The breakpoints were not being hit — but the process wasn't stopping either. It was alive, consuming a small amount of CPU, and doing apparently nothing.&lt;br&gt;
This is the behavior of a syscall-level hang.&lt;br&gt;
os.walk() works by calling os.scandir() on each directory it visits. os.scandir() is a thin wrapper over the operating system's directory enumeration syscall — on Windows against a UNC path, this translates to an SMB QUERY_DIRECTORY request sent to the remote share. When the share responds normally, the call returns in milliseconds. When the share is under stress, the response can be delayed by seconds, tens of seconds, or indefinitely — and Python has no timeout mechanism for this. The thread simply blocks, waiting for a network response that may never arrive, with no way to surface this condition as an exception.&lt;br&gt;
The share was under stress because it was approaching capacity. NetApp's performance characteristics above approximately 85% utilization are non-linear — latency increases sharply as the storage system works harder to manage fragmentation, snapshot reserves, and metadata operations simultaneously. Our consolidation task was running against a share that was already struggling, and os.walk()'s sequential single-threaded enumeration was exposing that struggle in the most opaque way possible: total silence.&lt;br&gt;
The debugger not hitting breakpoints was actually the key diagnostic signal. The code wasn't executing incorrectly — it wasn't executing at all. It was stuck below Python's layer, in a blocking network call that the runtime had no visibility into and no ability to interrupt.&lt;/p&gt;
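
&lt;p&gt;One way to confirm this kind of stall from inside the process, without a debugger, is the standard-library faulthandler module, which can dump every thread's Python stack after a delay even while the main thread is blocked. The sketch below is a minimal illustration and not part of the original script: stall_watchdog and demo are hypothetical names, and time.sleep() stands in for a directory listing that does not return.&lt;/p&gt;

```python
import contextlib
import faulthandler
import sys
import tempfile
import time


@contextlib.contextmanager
def stall_watchdog(seconds, file=sys.stderr):
    """Dump every thread's stack to `file` if the block runs past `seconds`."""
    faulthandler.dump_traceback_later(seconds, repeat=True, file=file)
    try:
        yield
    finally:
        # Always disarm the watchdog, whether the block finished or raised.
        faulthandler.cancel_dump_traceback_later()


def demo():
    # time.sleep() is a stand-in for a blocking call on a stressed share.
    with tempfile.TemporaryFile(mode="w+") as log:
        with stall_watchdog(0.2, file=log):
            time.sleep(0.5)
        log.seek(0)
        return log.read()
```

&lt;p&gt;Wrapped around the traversal loop, a dump like this shows the Python line each thread is parked on. If it keeps pointing at the same directory-listing call, the hang is below Python rather than in the script's own logic.&lt;/p&gt;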

&lt;p&gt;&lt;u&gt;&lt;strong&gt;"Just Parallelize It" — But How, Exactly?&lt;/strong&gt;&lt;br&gt;
&lt;/u&gt;Every developer has heard this advice at some point, usually delivered with confidence and very little elaboration. "Parallelize it. It will be faster". The gap between that sentence and actually knowing which concurrency model to apply — and more importantly, why — is where most of the real engineering work lives.&lt;br&gt;
Python offers three primary concurrency models and choosing the wrong one for a given problem can produce no improvement at all, or actively make things worse.&lt;br&gt;
multiprocessing spawns independent OS processes, each with its own Python interpreter and memory space. Because they don't share the GIL, they achieve true parallelism and are the correct tool when the bottleneck is CPU — computation, data transformation, numerical work. The cost is overhead: spawning processes is expensive, and passing data between them requires serialization. For our problem, multiprocessing would have been the instinctive-sounding choice — it's the model most associated with "serious" parallelism — but it would have been wrong.&lt;br&gt;
asyncio uses a single-threaded event loop to interleave I/O operations cooperatively. It's elegant for high-concurrency network workloads but carries a critical constraint: the standard library offers no asynchronous file or directory I/O. The standard open(), read(), and os.scandir() calls are all blocking and will stall the event loop just as thoroughly as they stall a regular thread, and third-party libraries like aiofiles work around this by delegating those calls to a thread pool behind the scenes. Rewriting the entire script around asyncio would have added significant complexity and still ended up running the blocking calls on threads.&lt;br&gt;
threading with ThreadPoolExecutor is the correct model here, and the reason comes down to one specific behavior: Python's GIL is released during I/O syscalls. When a thread is blocked waiting for a network response — precisely what was happening with os.scandir() on a stressed share — other threads are free to run. The GIL, which normally serializes Python execution, steps aside exactly when we need it to. Multiple threads can be simultaneously blocked on different network requests, and whichever one gets a response first continues executing while the others remain blocked. This is genuine concurrency for I/O-bound workloads, achieved with minimal overhead and no data serialization.&lt;br&gt;
The distinction matters because it's not about which tool sounds more powerful. It's about matching the tool to the bottleneck.&lt;/p&gt;
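
&lt;p&gt;The claim that threads overlap their blocking waits is easy to check in isolation. In the minimal sketch below, slow_listing and timed_scan are hypothetical helpers, and time.sleep() plays the role of the blocking directory call because it also releases the GIL while it waits.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_listing(name, delay=0.2):
    # time.sleep() releases the GIL, as a blocking I/O syscall would.
    time.sleep(delay)
    return name


def timed_scan(names, workers):
    """Run slow_listing over names on a thread pool; return (results, seconds)."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though the waits overlap.
        results = list(pool.map(slow_listing, names))
    return results, time.perf_counter() - started
```

&lt;p&gt;With four names and workers=4 the waits overlap, so the elapsed time stays close to a single delay. With workers=1 the same call takes roughly four delays, because each wait has to finish before the next one starts.&lt;/p&gt;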

&lt;p&gt;&lt;u&gt;The Solution&lt;br&gt;
&lt;/u&gt;The fix was to split the directory scan at the top level of the tree — one subtree per entity — and assign each subtree to a thread in a ThreadPoolExecutor pool. Rather than one thread walking 1,500 entity directories sequentially, four threads each walked roughly 375, concurrently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scan_subtree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subdir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subdir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;followlinks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirnames&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirpath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;dirnames&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;found&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;collect_targets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;top_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;root_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterdir&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_dir&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
    &lt;span class="n"&gt;targets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scan_subtree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_level&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_level&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                  &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subtree&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dynamic_ncols&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scanning &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  ERROR scanning &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details in this implementation are worth noting explicitly.&lt;br&gt;
The followlinks=False argument on os.walk() restates the default, and it is spelled out deliberately. If any directory in the tree contains a symlink pointing to a parent directory, a configuration that is entirely possible on NetApp, walking with followlinks=True would recurse without end and raise no error. Making the setting explicit guards against someone flipping it later and introducing a silent correctness bug that only surfaces at the worst possible moment.&lt;br&gt;
The dirnames[:] = sorted(dirnames) in-place assignment is equally important and easy to get wrong. os.walk() uses the dirnames list internally to determine which subdirectories to visit next. Modifying it in place with slice assignment changes the list that os.walk() holds — giving consistent traversal order and the ability to prune subtrees if needed. Reassigning the variable with dirnames = sorted(dirnames) creates a new list that os.walk() never sees, silently doing nothing.&lt;/p&gt;
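
&lt;p&gt;The difference between slice assignment and rebinding is easiest to see on a throwaway tree. This is a minimal sketch: walk_names, demo, and the directory names are made up for illustration, and the tree is built in a temporary directory rather than on a share.&lt;/p&gt;

```python
import os
import tempfile
from pathlib import Path


def walk_names(root, prune, in_place):
    """Return the directory names os.walk() visits, starting at root."""
    visited = []
    for dirpath, dirnames, _ in os.walk(root, topdown=True, followlinks=False):
        kept = sorted(d for d in dirnames if d != prune)
        if in_place:
            dirnames[:] = kept  # mutates the list os.walk() is holding
        else:
            dirnames = kept     # rebinds a local name; os.walk() never sees it
        visited.append(Path(dirpath).name)
    return visited


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "root"
        for leaf in ("keep/2026-05-01", "skip/2026-05-01"):
            (root / leaf).mkdir(parents=True)
        pruned = walk_names(root, "skip", in_place=True)
        unpruned = walk_names(root, "skip", in_place=False)
        return pruned, unpruned
```

&lt;p&gt;Only the in-place variant keeps os.walk() out of the skip subtree. The rebinding variant still descends into it, which is exactly the "silently doing nothing" case.&lt;/p&gt;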

&lt;p&gt;The more important insight from this solution is that the speed improvement was almost a secondary benefit. The primary benefit was resilience. In the single-threaded version, one slow or hung os.scandir() call stalled the entire program. In the concurrent version, a thread blocked on a slow subtree simply waits while the other threads continue making progress. The tqdm progress bar — which had been completely absent in the single-threaded freeze — now advanced continuously, confirming that the system was alive and working even when individual threads were stalled. In a long-running batch operation against an unreliable network resource, that visibility is not cosmetic. It is operational.&lt;/p&gt;
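
&lt;p&gt;One caveat about the listing above: futures yielded by as_completed() have already finished, so future.result(timeout=120) returns immediately and never acts as a deadline. If a bounded wait is wanted, the deadline has to sit on the wait itself. The sketch below shows one possible shape, assuming Python 3.9 or later for cancel_futures. scan_with_deadline and its tasks mapping are hypothetical, and a thread stuck in a syscall still cannot be interrupted from Python, so this reports a stall rather than curing it.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def scan_with_deadline(tasks, workers, deadline):
    """tasks maps a label to a zero-argument callable.

    Returns (results, stalled): finished results keyed by label, and the
    sorted labels that had not finished when the deadline expired.
    """
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(fn): label for label, fn in tasks.items()}
    done, not_done = wait(list(futures), timeout=deadline)
    # Do not block on stuck workers; tasks still queued are cancelled.
    pool.shutdown(wait=False, cancel_futures=True)
    # result() re-raises here if a finished task failed with an exception.
    results = {futures[f]: f.result() for f in done}
    stalled = sorted(futures[f] for f in not_done)
    return results, stalled
```

&lt;p&gt;A caller could log the stalled labels and retry those subtrees later, instead of leaving the whole batch waiting on one unresponsive directory.&lt;/p&gt;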

&lt;p&gt;The script successfully scanned and processed the leaf directories across all 1,500 entities in a fraction of the time the single-threaded version had spent frozen on its first subtree.&lt;/p&gt;

</description>
      <category>python</category>
      <category>netapp</category>
      <category>threadpool</category>
      <category>concurrency</category>
    </item>
  </channel>
</rss>
