<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Harshil Jani</title>
    <description>The latest articles on Forem by Harshil Jani (@harshil_jani).</description>
    <link>https://forem.com/harshil_jani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F658002%2F9d3a97f6-f5c7-4fa2-a521-a42f7ee5169a.png</url>
      <title>Forem: Harshil Jani</title>
      <link>https://forem.com/harshil_jani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harshil_jani"/>
    <language>en</language>
    <item>
      <title>Introducing fastapi-bgtasks-dashboard : One-liner integration on your FastAPI application</title>
      <dc:creator>Harshil Jani</dc:creator>
      <pubDate>Wed, 24 Sep 2025 20:40:15 +0000</pubDate>
      <link>https://forem.com/harshil_jani/introducing-fastapi-bgtasks-dashboard-one-liner-integration-on-your-fastapi-application-343i</link>
      <guid>https://forem.com/harshil_jani/introducing-fastapi-bgtasks-dashboard-one-liner-integration-on-your-fastapi-application-343i</guid>
      <description>&lt;p&gt;FastAPI is a premier framework for constructing high-performance APIs, favored by tech giants, fintech firms, and e-commerce platforms for its asynchronous capabilities and efficiency. &lt;/p&gt;

&lt;p&gt;One of its standout features is the &lt;code&gt;BackgroundTasks&lt;/code&gt; class, which facilitates "fire-and-forget" operations such as email dispatching, data processing, or external API integrations. These tasks run post-response, ensuring non-blocking user experiences.&lt;/p&gt;

&lt;p&gt;However, these tasks are effectively invisible once dispatched: tracking them requires excessive logging. Constant log monitoring becomes impractical at scale, leading to opaque systems where tasks may fail silently, consume excessive resources, or inflate cloud costs.&lt;/p&gt;
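&lt;p&gt;To make the pain point concrete, here is a minimal sketch of the ad-hoc logging you would otherwise hand-roll around every task function (the &lt;code&gt;tracked&lt;/code&gt; decorator and &lt;code&gt;send_email&lt;/code&gt; task are hypothetical names for illustration, not part of any library):&lt;/p&gt;

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bg-tasks")

def tracked(func):
    # Hand-rolled tracking: log start, duration, and failures for each task.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        logger.info("task %s started", func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("task %s failed", func.__name__)
            raise
        finally:
            logger.info("task %s finished in %.3fs",
                        func.__name__, time.monotonic() - start)
    return wrapper

@tracked
def send_email(to: str) -> str:
    # Stand-in for a real background task.
    return f"sent to {to}"
```

&lt;p&gt;Every task needs this wrapping, and the results still have to be dug out of raw logs, which is exactly the overhead a dashboard removes.&lt;/p&gt;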

&lt;p&gt;To address this problem, I've developed &lt;code&gt;fastapi-bgtasks-dashboard&lt;/code&gt;, an open-source Python package that integrates a real-time dashboard into your FastAPI application with minimal effort. A single line of code in your application enables the entire real-time dashboard that tracks your background tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3675xiluhfyhjwpcenj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3675xiluhfyhjwpcenj.png" alt="Demo of fastapi-bgtasks-dashboard" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Integration and Setup
&lt;/h1&gt;

&lt;p&gt;Incorporating the dashboard is straightforward, requiring just one line of code after installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install fastapi-bgtasks-dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your main application file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastapi import FastAPI
from fastapi_bgtasks_dashboard import mount_bg_tasks_dashboard

app = FastAPI()

# Integrate the dashboard effortlessly
mount_bg_tasks_dashboard(app=app)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Launch your server (e.g., via &lt;code&gt;uvicorn main:app --reload&lt;/code&gt;), and navigate to &lt;code&gt;http://localhost:8000/dashboard&lt;/code&gt;. The interface instantly populates with task details as they execute.&lt;/p&gt;

&lt;p&gt;Under the hood, the package leverages FastAPI's dependency injection to capture tasks added through BackgroundTasks. It records essential metadata without altering your existing codebase. For example, consider a typical endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heavy_computation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing initiated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;heavy_computation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Simulate intensive processing
&lt;/span&gt;    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with actual logic
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Triggering this endpoint queues the task, which appears in the dashboard with attributes like start time, duration (formatted in ms/s/m/h), parameters, status, and any exceptions.&lt;/p&gt;

&lt;h1&gt;
  
  
  Technical Features
&lt;/h1&gt;

&lt;p&gt;The package is designed with enterprise-scale applications in mind. The dashboard offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-Time Monitoring:&lt;/strong&gt; Utilizes WebSockets for live updates, ensuring instantaneous visibility into task progress without page reloads. This is built on Starlette's async infrastructure, aligning perfectly with FastAPI's ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive Controls:&lt;/strong&gt; Sort and filter tasks by function name, status, or duration. Re-execute failed tasks with a single click, preserving original parameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficient Storage:&lt;/strong&gt; Defaults to an in-memory, thread-safe dictionary, optimized for handling millions of tasks on modest hardware (1-2 GB RAM suffices, with metadata footprint in mere MBs). A "clear tasks" feature allows manual flushing to prevent memory bloat, ideal for long-running services.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
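&lt;p&gt;As a rough illustration of the storage idea (a hypothetical sketch, not the package's actual internals), an in-memory, thread-safe task registry can be as small as:&lt;/p&gt;

```python
import threading
import uuid

class TaskStore:
    """Minimal sketch of an in-memory, thread-safe task registry."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks: dict[str, dict] = {}

    def record(self, name: str, status: str = "running") -> str:
        # Register a new task and return its generated id.
        task_id = uuid.uuid4().hex
        with self._lock:
            self._tasks[task_id] = {"name": name, "status": status}
        return task_id

    def update(self, task_id: str, status: str) -> None:
        with self._lock:
            self._tasks[task_id]["status"] = status

    def clear(self) -> None:
        # Mirrors the "clear tasks" feature: flush metadata to cap memory use.
        with self._lock:
            self._tasks.clear()

    def snapshot(self) -> dict:
        # Copy under the lock so readers never see a half-updated view.
        with self._lock:
            return dict(self._tasks)
```

&lt;p&gt;Because only small metadata dictionaries are stored, even millions of entries fit in a modest memory budget, and &lt;code&gt;clear()&lt;/code&gt; caps growth in long-running services.&lt;/p&gt;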

&lt;p&gt;Without such a tool, applications risk undetected inefficiencies—tasks could deadlock, leak memory, or violate SLAs in distributed environments like Kubernetes clusters.&lt;/p&gt;

&lt;h1&gt;
  
  
  Future Development Roadmap
&lt;/h1&gt;

&lt;p&gt;Version 0.1.5 represents a stable release, with all known issues resolved. It focuses on core functionality without external dependencies.&lt;/p&gt;

&lt;p&gt;Upcoming iterations will introduce persistent storage options, targeting Redis for caching in distributed systems and PostgreSQL for robust querying and historical retention.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why Adopt This Tool?
&lt;/h1&gt;

&lt;p&gt;For teams managing critical FastAPI systems, such as real-time analytics or payment gateways, this dashboard mitigates blind spots, reduces debugging overhead, and optimizes resource utilization. It prevents production meltdowns and curtails unnecessary cloud expenditures, fostering reliable, scalable architectures.&lt;/p&gt;

&lt;p&gt;Explore the full documentation on &lt;a href="https://pypi.org/project/fastapi-bgtasks-dashboard/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; or the &lt;a href="https://github.com/Harshil-Jani/fastapi-bgtasks-dashboard" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository. I encourage starring the repo to support visibility and contributing via pull requests—whether enhancing integrations, adding metrics (e.g., Prometheus), or refining the UI.&lt;/p&gt;

&lt;p&gt;My vision is to integrate this into the official FastAPI ecosystem, and improve the background task handling industry-wide. Let's collaborate to make FastAPI applications more resilient and observable. Feedback and experiences with background task challenges are welcome in the comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>backend</category>
      <category>development</category>
    </item>
    <item>
      <title>NumPy’s SIMD-Friendly Design Boosts Performance Over Python Lists</title>
      <dc:creator>Harshil Jani</dc:creator>
      <pubDate>Sat, 06 Sep 2025 23:25:50 +0000</pubDate>
      <link>https://forem.com/harshil_jani/numpys-simd-friendly-design-boosts-performance-over-python-lists-473o</link>
      <guid>https://forem.com/harshil_jani/numpys-simd-friendly-design-boosts-performance-over-python-lists-473o</guid>
      <description>&lt;h1&gt;
  
  
  NumPy’s SIMD-Friendly Design Boosts Performance Over Python Lists
&lt;/h1&gt;

&lt;p&gt;When optimizing Python code for speed, especially in data-heavy applications like machine learning or analytics, the choice of data structure matters a lot.&lt;/p&gt;

&lt;p&gt;Python lists are slow compared to NumPy arrays for numerical tasks. NumPy’s use of contiguous memory enables SIMD (Single Instruction, Multiple Data) vector processing, a hardware feature that operates on multiple data elements in parallel.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explore why NumPy’s design delivers massive performance improvements over Python lists, with simple explanations, a clear code example, and benchmarks to prove it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Memory Layout Matters
&lt;/h2&gt;

&lt;p&gt;The difference between NumPy arrays and Python lists is how they store data:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrgjdtfwvw4bltafgk2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrgjdtfwvw4bltafgk2v.png" alt="Medium-Image" width="640" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python Lists (Scattered)&lt;/strong&gt; : Each element is a separate Python object, stored at potentially different memory addresses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NumPy Arrays (Contiguous)&lt;/strong&gt; : Data is stored in a single, continuous block of memory. This allows the CPU to grab chunks of data efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SIMD benefits from contiguous memory because the CPU can load several values (e.g., 4 or 8 floats) into a vector register with a single instruction. Scattered memory, as in Python lists, forces the CPU to access elements one at a time, preventing SIMD and causing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache Misses&lt;/strong&gt; : Scattered data misses the CPU’s fast cache, fetching from slower main memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Parallelism&lt;/strong&gt; : Individual accesses block SIMD, reducing throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extra Overhead&lt;/strong&gt; : Pointer chasing in scattered memory adds latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NumPy’s contiguous layout aligns perfectly with SIMD, enabling faster, parallel processing.&lt;/p&gt;
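&lt;p&gt;You can inspect this layout directly with standard NumPy array attributes (&lt;code&gt;flags&lt;/code&gt;, &lt;code&gt;itemsize&lt;/code&gt;, &lt;code&gt;strides&lt;/code&gt;):&lt;/p&gt;

```python
import numpy as np

arr = np.arange(8, dtype=np.float32)

# One contiguous block: fixed-size elements, neighbors 4 bytes apart.
assert arr.flags["C_CONTIGUOUS"]
assert arr.itemsize == 4       # each float32 occupies exactly 4 bytes
assert arr.strides == (4,)     # step from one element to the next

# A Python list stores pointers to separate float objects; the objects
# themselves can live anywhere on the heap.
lst = [float(i) for i in range(8)]
addresses = [id(x) for x in lst]  # typically not evenly spaced
```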
&lt;h2&gt;
  
  
  A Simple Benchmark to Show the Difference
&lt;/h2&gt;

&lt;p&gt;Let’s test this with a basic operation by adding a constant to 100,000 numbers. We’ll compare a NumPy array to a Python list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import time

# Setup: 100,000 elements
size = 100_000
numpy_array = np.ones(size, dtype=np.float32)  # Contiguous
python_list = [1.0] * size  # Scattered

# Operation: Add 5 to each element
constant = 5.0

# NumPy (SIMD-friendly)
start = time.time()
numpy_result = numpy_array + constant
numpy_time = time.time() - start

# Python List (Scattered)
start = time.time()
python_result = [x + constant for x in python_list]
python_time = time.time() - start

# Results
print(f"NumPy Array Time: {numpy_time:.6f} seconds")
print(f"Python List Time: {python_time:.6f} seconds")
print(f"NumPy Speedup: {python_time / numpy_time:.2f}x")
print(f"NumPy result sample: {numpy_result[:5]}")
print(f"Python result sample: {python_result[:5]}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Sample Output&lt;/strong&gt;  (varies by system):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NumPy Array Time: 0.000306 seconds
Python List Time: 0.010526 seconds
NumPy Speedup: 34.36x
NumPy result sample: [6. 6. 6. 6. 6.]
Python result sample: [6.0, 6.0, 6.0, 6.0, 6.0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NumPy is &lt;strong&gt;~34x faster&lt;/strong&gt;  here because its contiguous memory enables SIMD to process multiple elements per CPU cycle, while Python lists require slow, sequential access.&lt;/p&gt;

&lt;p&gt;In a real-world application, like processing millions of records in a data pipeline, this speedup can mean the difference between a snappy service and one that struggles under load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is It Just for Integers? Supporting Other Data Types
&lt;/h2&gt;

&lt;p&gt;NumPy is optimized for homogeneous data, so all elements in an array must be of the same type. This uniformity is what enables SIMD and contiguous memory benefits.&lt;/p&gt;

&lt;p&gt;Python lists can hold mixed types (e.g., integers, strings, objects) but lack these performance optimizations because their elements are scattered across memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use NumPy Arrays When&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working with large, homogeneous numerical data (e.g., integers, floats) for performance-critical tasks.&lt;/li&gt;
&lt;li&gt;Needing vectorized operations (e.g., matrix multiplication, statistical computations).&lt;/li&gt;
&lt;li&gt;Integrating with numerical libraries (e.g., SciPy, Pandas).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Python Lists When&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dealing with small datasets or mixed data types (e.g., a list of [1, “text”, 3.14]).&lt;/li&gt;
&lt;li&gt;Prototyping non-numerical logic or needing dynamic resizing.&lt;/li&gt;
&lt;li&gt;Example: Storing configuration settings or a small collection of diverse objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For mixed-type or small-scale tasks, Python lists are more flexible. For numerical performance, especially with SIMD, NumPy’s typed arrays are the way to go. You can always convert a list to a NumPy array with np.array(my_list, dtype=desired_type) to leverage these benefits.&lt;/p&gt;
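&lt;p&gt;A quick check of what conversion gives you in each case (the dtype behavior below follows standard NumPy rules):&lt;/p&gt;

```python
import numpy as np

# Homogeneous list -> typed, contiguous array: SIMD-friendly.
nums = [1, 2, 3, 4]
arr = np.array(nums, dtype=np.float32)

# Mixed-type list -> array of generic Python objects: no SIMD benefit,
# since NumPy stores pointers instead of packed values.
mixed = np.array([1, "text", 3.14], dtype=object)
```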

&lt;p&gt;Encourage your team to experiment with NumPy in small tasks to see the benefits firsthand. A quick profiling session can highlight the performance wins. Do share your performance gains in the comments below.&lt;/p&gt;

</description>
      <category>python</category>
      <category>numpy</category>
      <category>performance</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
