<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: David Villamizar</title>
    <description>The latest articles on Forem by David Villamizar (@daws4).</description>
    <link>https://forem.com/daws4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873919%2Fca01c0c1-6591-4f18-a33a-a95e64a7f808.jpg</url>
      <title>Forem: David Villamizar</title>
      <link>https://forem.com/daws4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/daws4"/>
    <language>en</language>
    <item>
      <title>How I Built a Lightning-Fast Face Recognition Batch Processor using Python &amp; Docker</title>
      <dc:creator>David Villamizar</dc:creator>
      <pubDate>Sat, 11 Apr 2026 19:05:57 +0000</pubDate>
      <link>https://forem.com/daws4/how-i-built-a-lightning-fast-face-recognition-batch-processor-using-python-docker-588g</link>
      <guid>https://forem.com/daws4/how-i-built-a-lightning-fast-face-recognition-batch-processor-using-python-docker-588g</guid>
<description>&lt;h2&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Have you ever tried to find a specific person in a folder containing thousands of event photos? Whether it's a wedding, a graduation, or a corporate event, photographers spend hours manually sifting through galleries to deliver personalized photo sets to their clients.&lt;/p&gt;

&lt;p&gt;I wanted to automate this, but I quickly ran into a wall: performing deep learning face recognition on thousands of high-resolution images is computationally expensive and memory-hungry.&lt;/p&gt;

&lt;p&gt;So, I built Py_Faces, a batch face recognition system that solves this by separating the heavy lifting from the actual search process. Here is how I designed the architecture to search through thousands of photos in seconds, and how I tamed Docker memory limits along the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture: Calculate Once, Search Instantly&lt;/strong&gt;&lt;br&gt;
The biggest mistake I could have made was scanning the entire photo folder every time the user wanted to search for a new person. Instead, I split the system into three completely independent steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1: The Heavy Lifting (Encoding Extraction)&lt;/strong&gt;
The first script (escaner_encodings.py) scans every photo in the batch just once. It detects the faces, applies a CLAHE filter (Contrast Limited Adaptive Histogram Equalization) to handle bad lighting, and extracts a 128-dimension facial encoding vector using the face_recognition library.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These vectors—along with metadata and file paths—are saved into a binary .pkl file. This process can take around an hour depending on the CPU, but it's a one-time cost.&lt;/p&gt;
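The on-disk format can be sketched with plain pickle. A minimal sketch; the record fields and function names here are illustrative, not necessarily the exact ones escaner_encodings.py uses:

```python
import pickle

# One record per detected face: source path, face box, 128-d encoding.
# Field names are hypothetical; the real script may use a different schema.
def guardar_encodings(registros, ruta):
    """Persist the batch's face records to a binary .pkl file."""
    with open(ruta, "wb") as f:
        pickle.dump(registros, f)

def cargar_encodings(ruta):
    """Reload every record in one shot for the search step."""
    with open(ruta, "rb") as f:
        return pickle.load(f)

registros = [
    {"archivo": "fotos/IMG_0001.jpg", "caja": (10, 120, 90, 40), "encoding": [0.0] * 128},
]
guardar_encodings(registros, "/tmp/encodings_demo.pkl")
```

Everything the search step needs lives in this one file, so the photos themselves never have to be reopened.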

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Step 2: Defining the Target&lt;/strong&gt;&lt;br&gt;
When the user wants to find someone, they drop a few clear photos of that person into a persona_objetivo folder. The second script (definir_objetivo.py) extracts the encodings from these reference photos and averages them out to create a highly accurate "Target Profile".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Step 3: The Lightning-Fast Search&lt;/strong&gt;&lt;br&gt;
Here is where the magic happens. The third script (buscador_objetivo.py) doesn't look at images at all. It simply loads the massive .pkl file from Step 1 and uses NumPy to calculate the Euclidean distance between the "Target Profile" and every face in the batch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
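Averaging the reference encodings into the "Target Profile" of Step 2 is essentially one NumPy call; a minimal sketch with an illustrative function name:

```python
import numpy as np

def perfil_objetivo(encodings_referencia):
    """Average several 128-d reference encodings into one 'Target Profile'."""
    matriz = np.asarray(encodings_referencia, dtype=np.float64)  # shape: (n_fotos, 128)
    return matriz.mean(axis=0)  # shape: (128,)
```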

&lt;p&gt;Because it's just comparing arrays of numbers, searching through thousands of photos takes about 2 seconds. The script then automatically copies the matching photos into a new folder and generates a detailed Excel report using pandas.&lt;/p&gt;
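The search of Step 3 reduces to one vectorized NumPy expression. A sketch under the same assumptions, using 0.6 as the threshold since that is the face_recognition library's default tolerance:

```python
import numpy as np

def buscar_coincidencias(perfil, encodings, umbral=0.6):
    """Return indices of stored faces within `umbral` Euclidean distance
    of the target profile. No image is opened: it's pure array math."""
    distancias = np.linalg.norm(np.asarray(encodings) - perfil, axis=1)
    return np.flatnonzero(distancias <= umbral)
```

Because the distances for the whole batch come out of a single `np.linalg.norm` call, the cost is one pass over an in-memory array rather than thousands of image reads.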

&lt;p&gt;&lt;strong&gt;Taming the Docker &amp;amp; Memory Beast&lt;/strong&gt;&lt;br&gt;
To make this tool accessible, I wrapped it in a Docker container (python:3.11-slim). This avoids the nightmare of making users install C++ build tools, CMake, and dlib natively on Windows.&lt;/p&gt;

&lt;p&gt;However, this introduced a massive challenge: memory management.&lt;br&gt;
Docker Desktop on Windows (WSL2) caps the RAM available to containers. Processing high-resolution images with HOG or CNN models in parallel quickly exhausts that limit, and the container crashes with BrokenExecutor errors.&lt;/p&gt;

&lt;p&gt;To fix this, I implemented a dynamic worker calculation function that checks the actual available RAM inside the Linux container (/proc/meminfo) before launching the ProcessPoolExecutor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calcular_workers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Estimates safe workers based on free memory in the container.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
    &lt;span class="n"&gt;memoria_por_worker_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;  &lt;span class="c1"&gt;# Estimated RAM per parallel process
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/proc/meminfo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/proc/meminfo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;linea&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MemAvailable:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="n"&gt;mem_kb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                        &lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem_kb&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="c1"&gt;# Reserve at least 0.8 GB for the OS and main orchestrator
&lt;/span&gt;                        &lt;span class="n"&gt;mem_disponible_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="n"&gt;workers_por_ram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_disponible_gb&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;memoria_por_worker_gb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workers_por_ram&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="c1"&gt;# Safe fallback
&lt;/span&gt;    &lt;span class="n"&gt;cpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpus&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets the script scale to its environment: on a 16GB machine it maximizes the number of workers, while in a constrained Docker environment it dials back to prevent crashes. Furthermore, images are resized (e.g., max 1800px or 2400px width) before processing to keep memory spikes in check.&lt;/p&gt;
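The resize cap can be sketched as a pure function (the 1800px figure comes from the text above; the function name is illustrative):

```python
def dimensiones_reducidas(ancho, alto, max_lado=1800):
    """Downscale (width, height) so the longest side is at most max_lado,
    preserving aspect ratio; small images pass through untouched."""
    lado_mayor = max(ancho, alto)
    if lado_mayor <= max_lado:
        return ancho, alto
    escala = max_lado / lado_mayor
    return int(ancho * escala), int(alto * escala)
```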

&lt;p&gt;&lt;a href="https://youtu.be/uYee-ZZ0VTI" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dealing with Real-World Dirty Data&lt;/strong&gt;&lt;br&gt;
When dealing with raw client photos, you learn quickly that data is never clean. I had to implement several fallbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXIF Orientations:&lt;/strong&gt; Photos taken vertically often appear horizontal to dlib. I wrote a utility using Pillow (PIL) to read the EXIF tags and physically rotate the pixel arrays before detection.&lt;/p&gt;
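With Pillow this boils down to `ImageOps.exif_transpose`, which applies the EXIF Orientation tag to the pixel data. A minimal sketch, with an illustrative function name:

```python
from PIL import Image, ImageOps

def abrir_corregida(ruta):
    """Open a photo and physically apply its EXIF Orientation tag,
    so the detector sees the image the way the camera was held."""
    img = Image.open(ruta)
    return ImageOps.exif_transpose(img)
```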

&lt;p&gt;&lt;strong&gt;Sequential Retries:&lt;/strong&gt; If the multiprocessing pool does crash, the script catches the BrokenExecutor error, rescues the failed batch, and processes those images sequentially so the user doesn't lose an hour of progress.&lt;/p&gt;
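The rescue path can be sketched with concurrent.futures (names are illustrative; the real script also has to track which batch failed):

```python
from concurrent.futures import ProcessPoolExecutor, BrokenExecutor

def procesar_lote(funcion, rutas, workers):
    """Try the process pool first; if a worker is OOM-killed and the pool
    breaks, reprocess the same batch sequentially instead of losing it."""
    try:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(funcion, rutas))
    except BrokenExecutor:
        # slower, but bounded memory: one image at a time
        return [funcion(r) for r in rutas]
```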

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Building Py_Faces taught me that sometimes the best way to optimize a slow process isn't to write faster algorithms, but to change the architecture entirely. By decoupling the extraction from the comparison, a heavy machine-learning task became an instant search tool.&lt;/p&gt;

&lt;p&gt;You can check out the full code on my GitHub: &lt;a href="https://github.com/daws-4/pyfaces" rel="noopener noreferrer"&gt;https://github.com/daws-4/pyfaces&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever dealt with memory leaks or dlib crashes in Docker? I'd love to hear how you solved them in the comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
