<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: David Villamizar</title>
    <description>The latest articles on Forem by David Villamizar (@daws4).</description>
    <link>https://forem.com/daws4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873919%2Fca01c0c1-6591-4f18-a33a-a95e64a7f808.jpg</url>
      <title>Forem: David Villamizar</title>
      <link>https://forem.com/daws4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/daws4"/>
    <language>en</language>
    <item>
      <title>How I Built a Lightning-Fast Face Recognition Batch Processor using Python &amp; Docker</title>
      <dc:creator>David Villamizar</dc:creator>
      <pubDate>Sat, 11 Apr 2026 19:05:57 +0000</pubDate>
      <link>https://forem.com/daws4/how-i-built-a-lightning-fast-face-recognition-batch-processor-using-python-docker-588g</link>
      <guid>https://forem.com/daws4/how-i-built-a-lightning-fast-face-recognition-batch-processor-using-python-docker-588g</guid>
<description>&lt;h2&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Have you ever tried to find a specific person in a folder containing thousands of event photos? Whether it's a wedding, a graduation, or a corporate event, photographers spend hours manually sifting through galleries to deliver personalized photo sets to their clients.&lt;/p&gt;

&lt;p&gt;I wanted to automate this, but I quickly ran into a wall: performing deep learning face recognition on thousands of high-resolution images is computationally expensive and memory-hungry.&lt;/p&gt;

&lt;p&gt;So, I built Py_Faces, a batch face recognition system that solves this by separating the heavy lifting from the actual search process. Here is how I designed the architecture to search through thousands of photos in seconds, and how I tamed Docker memory limits along the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture: Calculate Once, Search Instantly&lt;/strong&gt;&lt;br&gt;
The biggest mistake I could have made was scanning the entire photo folder every time the user wanted to search for a new person. Instead, I split the system into three completely independent steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1: The Heavy Lifting (Encoding Extraction)&lt;/strong&gt;
The first script (escaner_encodings.py) scans every photo in the batch just once. It detects the faces, applies a CLAHE filter (Contrast Limited Adaptive Histogram Equalization) to handle bad lighting, and extracts a 128-dimension facial encoding vector using the face_recognition library.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These vectors—along with metadata and file paths—are saved into a binary .pkl file. This process can take around an hour depending on the CPU, but it's a one-time cost.&lt;/p&gt;
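The on-disk format can be sketched with plain pickle. A minimal sketch; the record fields and function names here are illustrative, not necessarily the exact ones escaner_encodings.py uses:

```python
import pickle

# One record per detected face: source path, face box, 128-d encoding.
# Field names are hypothetical; the real script may use a different schema.
def guardar_encodings(registros, ruta):
    """Persist the batch's face records to a binary .pkl file."""
    with open(ruta, "wb") as f:
        pickle.dump(registros, f)

def cargar_encodings(ruta):
    """Reload every record in one shot for the search step."""
    with open(ruta, "rb") as f:
        return pickle.load(f)

registros = [
    {"archivo": "fotos/IMG_0001.jpg", "caja": (10, 120, 90, 40), "encoding": [0.0] * 128},
]
guardar_encodings(registros, "/tmp/encodings_demo.pkl")
```

Everything the search step needs lives in this one file, so the photos themselves never have to be reopened.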

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Step 2: Defining the Target&lt;/strong&gt;&lt;br&gt;
When the user wants to find someone, they drop a few clear photos of that person into a persona_objetivo folder. The second script (definir_objetivo.py) extracts the encodings from these reference photos and averages them out to create a highly accurate "Target Profile".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Step 3: The Lightning-Fast Search&lt;/strong&gt;&lt;br&gt;
Here is where the magic happens. The third script (buscador_objetivo.py) doesn't look at images at all. It simply loads the massive .pkl file from Step 1 and uses NumPy to calculate the Euclidean distance between the "Target Profile" and every face in the batch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
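Averaging the reference encodings into the "Target Profile" of Step 2 is essentially one NumPy call; a minimal sketch with an illustrative function name:

```python
import numpy as np

def perfil_objetivo(encodings_referencia):
    """Average several 128-d reference encodings into one 'Target Profile'."""
    matriz = np.asarray(encodings_referencia, dtype=np.float64)  # shape: (n_fotos, 128)
    return matriz.mean(axis=0)  # shape: (128,)
```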

&lt;p&gt;Because it's just comparing arrays of numbers, searching through thousands of photos takes about 2 seconds. The script then automatically copies the matching photos into a new folder and generates a detailed Excel report using pandas.&lt;/p&gt;
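The search of Step 3 reduces to one vectorized NumPy expression. A sketch under the same assumptions, using 0.6 as the threshold since that is the face_recognition library's default tolerance:

```python
import numpy as np

def buscar_coincidencias(perfil, encodings, umbral=0.6):
    """Return indices of stored faces within `umbral` Euclidean distance
    of the target profile. No image is opened: it's pure array math."""
    distancias = np.linalg.norm(np.asarray(encodings) - perfil, axis=1)
    return np.flatnonzero(distancias <= umbral)
```

Because the distances for the whole batch come out of a single `np.linalg.norm` call, the cost is one pass over an in-memory array rather than thousands of image reads.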

&lt;p&gt;&lt;strong&gt;Taming the Docker &amp;amp; Memory Beast&lt;/strong&gt;&lt;br&gt;
To make this tool accessible, I wrapped it in a Docker container (python:3.11-slim). This avoids the nightmare of making users install C++ build tools, CMake, and dlib natively on Windows.&lt;/p&gt;

&lt;p&gt;However, this introduced a massive challenge: memory management.&lt;br&gt;
Docker Desktop on Windows (WSL2) caps the RAM available to containers. Processing high-resolution images with HOG or CNN models in parallel quickly exhausts that limit, and the container crashes with BrokenExecutor errors.&lt;/p&gt;

&lt;p&gt;To fix this, I implemented a dynamic worker calculation function that checks the actual available RAM inside the Linux container (/proc/meminfo) before launching the ProcessPoolExecutor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calcular_workers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Estimates safe workers based on free memory in the container.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
    &lt;span class="n"&gt;memoria_por_worker_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;  &lt;span class="c1"&gt;# Estimated RAM per parallel process
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/proc/meminfo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/proc/meminfo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;linea&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MemAvailable:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="n"&gt;mem_kb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;linea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                        &lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem_kb&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="c1"&gt;# Reserve at least 0.8 GB for the OS and main orchestrator
&lt;/span&gt;                        &lt;span class="n"&gt;mem_disponible_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="n"&gt;workers_por_ram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem_disponible_gb&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;memoria_por_worker_gb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workers_por_ram&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="c1"&gt;# Safe fallback
&lt;/span&gt;    &lt;span class="n"&gt;cpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cpus&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets the script scale to its environment: on a 16GB machine it maximizes the number of workers, while in a constrained Docker environment it dials back to prevent crashes. Furthermore, images are resized (e.g., max 1800px or 2400px width) before processing to keep memory spikes in check.&lt;/p&gt;
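The resize cap can be sketched as a pure function (the 1800px figure comes from the text above; the function name is illustrative):

```python
def dimensiones_reducidas(ancho, alto, max_lado=1800):
    """Downscale (width, height) so the longest side is at most max_lado,
    preserving aspect ratio; small images pass through untouched."""
    lado_mayor = max(ancho, alto)
    if lado_mayor <= max_lado:
        return ancho, alto
    escala = max_lado / lado_mayor
    return int(ancho * escala), int(alto * escala)
```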

&lt;p&gt;&lt;a href="https://youtu.be/uYee-ZZ0VTI" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dealing with Real-World Dirty Data&lt;/strong&gt;&lt;br&gt;
When dealing with raw client photos, you learn quickly that data is never clean. I had to implement several fallbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXIF Orientations:&lt;/strong&gt; Photos taken vertically often appear horizontal to dlib. I wrote a utility using Pillow (PIL) to read the EXIF tags and physically rotate the pixel arrays before detection.&lt;/p&gt;
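With Pillow this boils down to `ImageOps.exif_transpose`, which applies the EXIF Orientation tag to the pixel data. A minimal sketch, with an illustrative function name:

```python
from PIL import Image, ImageOps

def abrir_corregida(ruta):
    """Open a photo and physically apply its EXIF Orientation tag,
    so the detector sees the image the way the camera was held."""
    img = Image.open(ruta)
    return ImageOps.exif_transpose(img)
```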

&lt;p&gt;&lt;strong&gt;Sequential Retries:&lt;/strong&gt; If the multiprocessing pool does crash, the script catches the BrokenExecutor error, rescues the failed batch, and processes those images sequentially so the user doesn't lose an hour of progress.&lt;/p&gt;
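The rescue path can be sketched with concurrent.futures (names are illustrative; the real script also has to track which batch failed):

```python
from concurrent.futures import ProcessPoolExecutor, BrokenExecutor

def procesar_lote(funcion, rutas, workers):
    """Try the process pool first; if a worker is OOM-killed and the pool
    breaks, reprocess the same batch sequentially instead of losing it."""
    try:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(funcion, rutas))
    except BrokenExecutor:
        # slower, but bounded memory: one image at a time
        return [funcion(r) for r in rutas]
```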

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Building Py_Faces taught me that sometimes the best way to optimize a slow process isn't to write faster algorithms, but to change the architecture entirely. By decoupling the extraction from the comparison, a heavy machine-learning task became an instant search tool.&lt;/p&gt;

&lt;p&gt;You can check out the full code on my GitHub: &lt;a href="https://github.com/daws-4/pyfaces" rel="noopener noreferrer"&gt;https://github.com/daws-4/pyfaces&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever dealt with memory leaks or dlib crashes in Docker? I'd love to hear how you solved them in the comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
