<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ankit Jangwan</title>
    <description>The latest articles on Forem by Ankit Jangwan (@jangwanankit).</description>
    <link>https://forem.com/jangwanankit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1789656%2F53532038-5a9a-45e4-a8e6-7c3631d2d5d3.jpeg</url>
      <title>Forem: Ankit Jangwan</title>
      <link>https://forem.com/jangwanankit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jangwanankit"/>
    <language>en</language>
    <item>
      <title>How to Optimise Backend Performance: A Practical Playbook</title>
      <dc:creator>Ankit Jangwan</dc:creator>
      <pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/jangwanankit/how-to-optimise-backend-performance-a-practical-playbook-1485</link>
      <guid>https://forem.com/jangwanankit/how-to-optimise-backend-performance-a-practical-playbook-1485</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Backend performance work is a loop: observe, profile, fix, verify. This post covers the full cycle. Setting up observability, identifying bottlenecks with percentile metrics, applying targeted fixes (N+1 queries, indexing, caching, async offloading), and verifying improvements against p75/p95/p99 latency targets.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Percentiles Matter More Than Averages
&lt;/h2&gt;

&lt;p&gt;Average response time is misleading. An endpoint averaging 80 ms might seem fine until you realise 5% of your users are waiting 800 ms or more.&lt;/p&gt;

&lt;p&gt;Percentile metrics give you the actual picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What It Tells You&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;p50&lt;/strong&gt; (median)&lt;/td&gt;
&lt;td&gt;The typical user experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p75&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Where the experience starts degrading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p95&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The worst experience for most users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p99&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The tail: your worst case under normal load&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The goal I worked towards: p95 under 200 ms, p99 under 500 ms, and critical queries completing in under 50 ms.&lt;/p&gt;

&lt;p&gt;When you optimise, you're compressing the gap between p50 and p99. A fast median with a slow tail means your system is unpredictable, and users notice unpredictability more than raw speed.&lt;/p&gt;
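
&lt;p&gt;Percentiles are cheap to compute from raw latency samples. A stdlib-only sketch (the sample data is invented to mimic the 80 ms / 800 ms split above):&lt;/p&gt;

```python
import statistics

# Simulated response times (ms): a fast median with a slow 5% tail
latencies_ms = [80] * 95 + [800] * 5

# statistics.quantiles with n=100 returns the 99 percentile cut points
cuts = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"mean={statistics.mean(latencies_ms):.0f} ms, "
      f"p50={p50:.0f} ms, p95={p95:.0f} ms, p99={p99:.0f} ms")
```

&lt;p&gt;The mean lands at 116 ms and looks healthy; the tail percentiles expose the 800 ms experience the average hides.&lt;/p&gt;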




&lt;h2&gt;
  
  
  Step 1: Establish Observability
&lt;/h2&gt;

&lt;p&gt;Before touching any code, you need visibility into what your system is actually doing. I've seen teams spend weeks optimising the wrong endpoint because they didn't have the data to tell them where the real problems were.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Performance Monitoring (APM)
&lt;/h3&gt;

&lt;p&gt;APM tools trace requests end-to-end through your stack. They break down where time goes: application code, database queries, external API calls, template rendering, serialisation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt;: Datadog APM, New Relic, Elastic APM, Jaeger (open-source)&lt;/p&gt;

&lt;p&gt;What to look for in APM data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flame graphs give you a visual breakdown of time spent in each function call&lt;/li&gt;
&lt;li&gt;Trace waterfalls show sequential vs. parallel execution of sub-operations&lt;/li&gt;
&lt;li&gt;Service maps lay out which services call which, and where dependencies bottleneck&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Database Profiling
&lt;/h3&gt;

&lt;p&gt;Most backend latency lives in the database layer. Profiling queries tells you exactly which ones are slow and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt;: Datadog Database Monitoring, pganalyze (PostgreSQL), &lt;code&gt;django-debug-toolbar&lt;/code&gt; (local development). For a hands-on walkthrough of using &lt;code&gt;django-debug-toolbar&lt;/code&gt; and &lt;code&gt;snakeviz&lt;/code&gt; for local profiling, see my &lt;a href="https://ankitjang.one/blog/profiling-django-apis-debug-toolbar-snakeviz" rel="noopener noreferrer"&gt;Case Study: Profiling Django APIs with Debug Toolbar and snakeviz&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Key metrics to track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query execution time: how long the database spends running each query&lt;/li&gt;
&lt;li&gt;Query frequency: a 5 ms query executed 200 times per request is worse than a single 100 ms query&lt;/li&gt;
&lt;li&gt;Lock wait time: queries blocked waiting for row or table locks&lt;/li&gt;
&lt;li&gt;Rows scanned vs. rows returned: a high ratio points to missing indexes&lt;/li&gt;
&lt;/ul&gt;
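
&lt;p&gt;The scanned-vs-returned ratio shows up directly in a query plan. A self-contained SQLite sketch (table and data invented for illustration) of how an index changes the plan:&lt;/p&gt;

```python
import sqlite3

# In-memory database to show how an index changes the query plan
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 50, i * 1.5) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM orders WHERE customer_id = 7"
print(plan(query))   # full table scan: 1000 rows examined for ~20 returned

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(plan(query))   # index search: rows scanned close to rows returned
```

&lt;p&gt;The same exercise works against PostgreSQL with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, which additionally reports actual row counts and timings.&lt;/p&gt;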

&lt;h3&gt;
  
  
  Structured Logging
&lt;/h3&gt;

&lt;p&gt;Logs are your investigation trail. When APM shows a slow trace, logs tell you &lt;em&gt;what happened&lt;/em&gt; during that request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;item_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;cache_hit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_hit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Log with enough context to reconstruct the request path: IDs, durations, counts, cache hit/miss status.&lt;/p&gt;
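
&lt;p&gt;A small timing helper keeps &lt;code&gt;duration_ms&lt;/code&gt; consistent across log lines. A stdlib-only sketch (&lt;code&gt;timed&lt;/code&gt; is a hypothetical helper of my own, not a structlog API):&lt;/p&gt;

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(log_fields):
    # Record wall-clock duration and attach it to the log context
    start = time.perf_counter()
    try:
        yield log_fields
    finally:
        log_fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)

with timed({"event": "order_processed", "order_id": 42}) as fields:
    time.sleep(0.01)  # stand-in for the real work

print(fields)  # includes a duration_ms field ready to pass to the logger
```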

&lt;h3&gt;
  
  
  Dashboards and Alerting
&lt;/h3&gt;

&lt;p&gt;Combine these signals into dashboards. I use Datadog dashboards tracking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p75 / p95 / p99 latency per endpoint over time&lt;/li&gt;
&lt;li&gt;Error rate alongside latency (slow responses often precede errors)&lt;/li&gt;
&lt;li&gt;Database query count per request, where a sudden jump signals a regression&lt;/li&gt;
&lt;li&gt;Queue depth for async workers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set up both threshold-based and rate-of-change alerts. Static thresholds catch known-bad states; rate-of-change alerts catch regressions as they happen.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;dd-dashboard-latency.png&lt;/code&gt;): Datadog dashboard showing p75/p95/p99 latency timeseries for a single endpoint. Capture a view where the three percentile lines are visible and diverging (e.g., p50 flat around 80 ms while p99 spikes to 800 ms). Include the time range selector and the endpoint name in the title. This gives readers a concrete reference for what "observability" looks like in practice.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Reading Profiling Data
&lt;/h2&gt;

&lt;p&gt;Setting up observability tools is step one. Getting useful information out of them is where most people get stuck. Below is how I read the output from the three profiling interfaces I use most: Python's &lt;code&gt;cProfile&lt;/code&gt;, Django Debug Toolbar, and Datadog APM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python cProfile
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;cProfile&lt;/code&gt; is built into Python and requires no dependencies. It profiles function-level execution time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running a profile:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pstats&lt;/span&gt;

&lt;span class="c1"&gt;# Profile a function call
&lt;/span&gt;&lt;span class="n"&gt;cProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my_slow_function()&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output.prof&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Read the results
&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pstats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output.prof&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cumulative&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Top 20 functions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For profiling a Django view in isolation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestFactory&lt;/span&gt;

&lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RequestFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/orders/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;profiler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;my_view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cumulative&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reading the output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    1.842    1.842 views.py:45(order_list)
    200    0.003    0.000    1.650    0.008 models.py:12(get_customer)
    200    1.580    0.008    1.580    0.008 base.py:330(execute)
      1    0.001    0.001    0.180    0.180 serializers.py:88(to_representation)
      1    0.000    0.000    0.012    0.012 pagination.py:22(paginate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ncalls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How many times this function was called&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tottime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time spent inside this function, excluding sub-calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;percall&lt;/code&gt; (first)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tottime / ncalls&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cumtime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total time spent in this function, including sub-calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;percall&lt;/code&gt; (second)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cumtime / ncalls&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How to read this:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start from the top (sorted by &lt;code&gt;cumtime&lt;/code&gt;). In the example above, &lt;code&gt;order_list&lt;/code&gt; takes 1.84 seconds total. Drilling down, &lt;code&gt;get_customer&lt;/code&gt; is called 200 times and accounts for 1.65 seconds — that's 89% of the total. The actual time is spent in &lt;code&gt;base.py:execute&lt;/code&gt;, which is Django's database query executor. This is a textbook N+1: 200 individual queries to fetch customer data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to focus on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High &lt;code&gt;ncalls&lt;/code&gt; on database functions: N+1 queries&lt;/li&gt;
&lt;li&gt;High &lt;code&gt;tottime&lt;/code&gt; on a single function: CPU-bound bottleneck (serialisation, computation)&lt;/li&gt;
&lt;li&gt;High &lt;code&gt;cumtime&lt;/code&gt; with low &lt;code&gt;tottime&lt;/code&gt;: the function itself is fast but calls something slow&lt;/li&gt;
&lt;/ul&gt;
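
&lt;p&gt;When the full table is noisy, &lt;code&gt;pstats&lt;/code&gt; can restrict the report with a regex filter. A runnable sketch (&lt;code&gt;helper&lt;/code&gt; and &lt;code&gt;handler&lt;/code&gt; are invented stand-ins):&lt;/p&gt;

```python
import cProfile
import io
import pstats

def helper(n):
    return sum(i * i for i in range(n))

def handler():
    # Calls helper many times, mimicking a hot inner loop
    return [helper(1000) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# print_stats accepts regex restrictions applied against
# the "filename:lineno(function)" column
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats("helper")
print(out.getvalue())
```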

&lt;p&gt;For a visual alternative to the text output, load the cProfile output file into &lt;code&gt;snakeviz&lt;/code&gt; — it renders the same data as an interactive flame graph in the browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; cProfile &lt;span class="nt"&gt;-o&lt;/span&gt; output.prof my_script.py
snakeviz output.prof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the &lt;code&gt;snakeviz&lt;/code&gt; view top-down. Each box is a function, and boxes nested below others were called by the function above. Wider boxes took more time. Click a box to zoom in, and sort the table below by &lt;code&gt;ncalls&lt;/code&gt; or &lt;code&gt;cumtime&lt;/code&gt; to find outliers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;snakeviz-flamegraph.png&lt;/code&gt;): snakeviz browser output showing a sunburst or icicle chart for a Django view profile. Ideally capture a view where one function (e.g., a database query) is visibly wider than the rest, with the stats table below showing &lt;code&gt;ncalls&lt;/code&gt;, &lt;code&gt;tottime&lt;/code&gt;, and &lt;code&gt;cumtime&lt;/code&gt; columns. Annotate or circle the wide block to show what "this is where the time goes" looks like.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Django Debug Toolbar Profiling
&lt;/h3&gt;

&lt;p&gt;Django Debug Toolbar gives you per-request profiling without writing any code. It has several panels, but for performance work the most useful are the &lt;strong&gt;SQL panel&lt;/strong&gt; and the &lt;strong&gt;Profiling panel&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabling the profiler:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# settings.py (development only)
&lt;/span&gt;&lt;span class="n"&gt;INSTALLED_APPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;MIDDLEWARE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.middleware.DebugToolbarMiddleware&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;DEBUG_TOOLBAR_PANELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.sql.SQLPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.profiling.ProfilingPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.timer.TimerPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.cache.CachePanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;INTERNAL_IPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The SQL Panel:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is usually the first place to look. It shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total number of queries and total time&lt;/li&gt;
&lt;li&gt;Each individual query with its SQL, execution time, and stack trace&lt;/li&gt;
&lt;li&gt;Duplicate queries highlighted (immediate N+1 indicator)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EXPLAIN&lt;/code&gt; output for each query (click to expand)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What to look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Similar" or "Duplicated" badges&lt;/strong&gt; — these are N+1 queries. The toolbar groups identical query patterns and shows how many times each pattern was executed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total query count&lt;/strong&gt; — a list API returning 50 items should not fire 150 queries. If it does, you're missing &lt;code&gt;select_related&lt;/code&gt; or &lt;code&gt;prefetch_related&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query time distribution&lt;/strong&gt; — if one query takes 200 ms and the rest take 1 ms each, that single query is your target&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The stack trace&lt;/strong&gt; — click on any query to see exactly which line of Python code triggered it. This tells you whether the query came from the view, the serialiser, a model method, or a template&lt;/li&gt;
&lt;/ul&gt;
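
&lt;p&gt;In Django the fix is usually &lt;code&gt;select_related&lt;/code&gt; (foreign keys, done as a JOIN) or &lt;code&gt;prefetch_related&lt;/code&gt; (to-many relations, done as a second batched query). The underlying difference is one JOIN versus N+1 round trips, which a plain-SQL sketch can demonstrate (tables and counts invented for illustration):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(50)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i % 50) for i in range(50)])

queries = 0
def run(sql, args=()):
    # Count every round trip to the database
    global queries
    queries += 1
    return conn.execute(sql, args).fetchall()

# N+1: one query for the list, then one per row for the related customer
orders = run("SELECT id, customer_id FROM orders")
for _, cid in orders:
    run("SELECT name FROM customers WHERE id = ?", (cid,))
print("N+1 pattern:", queries, "queries")   # 51

# The JOIN equivalent of select_related: everything in a single query
queries = 0
run("SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id")
print("joined:", queries, "query")          # 1
```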

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;ddt-sql-panel.png&lt;/code&gt;): Django Debug Toolbar SQL panel on a page with an N+1 problem. Capture the panel showing a high query count (e.g., "187 queries in 420 ms") with several queries marked "Duplicated" or "Similar" in red/orange badges. Expand one query to show the SQL text and the stack trace link. This is the most common first encounter with N+1 queries for Django developers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Profiling Panel:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The profiling panel is disabled by default. Click its checkbox in the toolbar to activate it. On Python 3.12+, you need to run the dev server with &lt;code&gt;--nothreading&lt;/code&gt; for it to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python manage.py runserver &lt;span class="nt"&gt;--nothreading&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once enabled, it shows a collapsible call tree for the current request, similar to &lt;code&gt;cProfile&lt;/code&gt; output but rendered as an indented HTML table. Each row shows a function, its cumulative time, own time, and call count. You can expand and collapse levels to drill into the call hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /api/orders/ — 1842 ms
├── OrderListView.get() — 1842 ms (cumtime)
│   ├── OrderQuerySet.all() — 12 ms
│   ├── OrderSerializer.to_representation() — 1650 ms
│   │   ├── CustomerField.to_representation() × 200 — 1580 ms
│   │   │   └── SQL: SELECT * FROM customers WHERE id = %s × 200
│   │   └── ItemSerializer.to_representation() × 200 — 60 ms
│   └── Paginator.paginate() — 180 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nesting&lt;/strong&gt; shows the call hierarchy. A slow parent with a fast own-time means the parent is slow because of its children&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call count&lt;/strong&gt; (× 200) is the key signal. If a function repeats many times inside a loop, you're probably looking at an N+1 or a missing batch operation&lt;/li&gt;
&lt;li&gt;Start from the deepest nodes with the highest cumulative time and work upward&lt;/li&gt;
&lt;li&gt;You can adjust &lt;code&gt;PROFILER_MAX_DEPTH&lt;/code&gt; (default: 10) and &lt;code&gt;PROFILER_THRESHOLD_RATIO&lt;/code&gt; (default: 8) in &lt;code&gt;DEBUG_TOOLBAR_CONFIG&lt;/code&gt; to control how deep the tree goes and which functions get included&lt;/li&gt;
&lt;/ul&gt;
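
&lt;p&gt;Both knobs live in &lt;code&gt;DEBUG_TOOLBAR_CONFIG&lt;/code&gt;. A minimal sketch (the values here are arbitrary examples, not recommendations):&lt;/p&gt;

```python
# settings.py (development only) -- tune how much of the call tree
# the Profiling panel renders
DEBUG_TOOLBAR_CONFIG = {
    "PROFILER_MAX_DEPTH": 15,        # default 10: maximum nesting depth shown
    "PROFILER_THRESHOLD_RATIO": 16,  # default 8: higher keeps more cheap calls
}
```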

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;ddt-profiling-panel.png&lt;/code&gt;): Django Debug Toolbar Profiling panel showing the call tree for a request. Capture a view with several levels expanded, where one branch has a high cumulative time and a high call count (e.g., a serialiser method called 200× inside a loop). The indented table format with CumTime, TotTime, and Per Call columns should be visible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Datadog APM Resource Pages
&lt;/h3&gt;

&lt;p&gt;When you open a resource (endpoint) in Datadog APM, you see several tabs and visualisations. Here's what each one tells you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Resource Page Overview:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The top of the page shows aggregate metrics for the selected endpoint over your chosen time range:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Requests/sec&lt;/strong&gt; — throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; — shown as p50, p75, p90, p95, p99 over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors&lt;/strong&gt; — error rate as a percentage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total time&lt;/strong&gt; — the proportion of your service's total processing time spent on this resource&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Latency Distribution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A histogram showing how response times are distributed. You want a tight cluster on the left. A long tail to the right means outlier requests are much slower than typical ones. Bimodal distributions (two humps) suggest two distinct code paths — for example, cache hits completing in 20 ms and cache misses in 400 ms.&lt;/p&gt;
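
&lt;p&gt;That bimodal shape falls straight out of a get-or-set cache pattern. A toy sketch with scaled-down timings (&lt;code&gt;get_customer&lt;/code&gt; and the sleep are invented stand-ins for a cached lookup):&lt;/p&gt;

```python
import time

cache = {}

def get_customer(cid):
    # Cache hit: fast path. Miss: simulate a slow database fetch, then store.
    if cid in cache:
        return cache[cid]
    time.sleep(0.005)  # stand-in for the slow query on a miss
    cache[cid] = {"id": cid}
    return cache[cid]

def timed_ms(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000

miss = timed_ms(get_customer, 7)   # slow path, first request
hit = timed_ms(get_customer, 7)    # fast path, every request after

print(f"miss={miss:.1f} ms, hit={hit:.3f} ms")
```

&lt;p&gt;Plot enough of these and you get exactly two humps in the histogram, one per code path.&lt;/p&gt;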

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;dd-latency-distribution.png&lt;/code&gt;): Datadog resource page latency distribution histogram for a problematic endpoint. Capture one showing a long tail (bulk of requests clustered around 50–100 ms but a visible tail stretching to 800+ ms). If you have an example of a bimodal distribution (two distinct humps), capture that as a second image (&lt;code&gt;dd-latency-bimodal.png&lt;/code&gt;) — it's a much clearer illustration of the cache-hit vs. cache-miss pattern.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Spans (Trace Waterfall):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you click into an individual trace, you get the span waterfall. Each span represents a unit of work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[django.request]──────────────────────── 1200 ms
  [django.middleware]──── 5 ms
  [django.view]──────────────────────── 1190 ms
    [postgresql.query]── 4 ms    SELECT * FROM orders WHERE ...
    [postgresql.query]── 3 ms    SELECT * FROM customers WHERE id = 1
    [postgresql.query]── 4 ms    SELECT * FROM customers WHERE id = 2
    [postgresql.query]── 3 ms    SELECT * FROM customers WHERE id = 3
    ... (197 more identical spans)
    [serialization]───── 15 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What each span tells you:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Span Attribute&lt;/th&gt;
&lt;th&gt;What It Shows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Which service produced this span (web app, database, cache, external API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The type of work (e.g., &lt;code&gt;postgresql.query&lt;/code&gt;, &lt;code&gt;redis.command&lt;/code&gt;, &lt;code&gt;http.request&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How long this span took&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The specific query, URL, or cache key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error flag&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Whether this span resulted in an error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Child count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Number of child spans (sub-operations)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;How to read the waterfall:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spans stacked vertically with small gaps are executing sequentially. This is normal for database queries within a single-threaded request&lt;/li&gt;
&lt;li&gt;A tall stack of identical spans (same operation, same resource pattern) is an N+1. In the example above, 200 &lt;code&gt;postgresql.query&lt;/code&gt; spans with &lt;code&gt;SELECT * FROM customers WHERE id = ?&lt;/code&gt; is the smoking gun&lt;/li&gt;
&lt;li&gt;Spans with long durations but no children indicate time spent in application code (CPU-bound work, synchronous I/O)&lt;/li&gt;
&lt;li&gt;A single very wide span early in the waterfall followed by fast spans suggests a slow initial query or connection setup&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;dd-trace-waterfall-n1.png&lt;/code&gt;): Datadog trace waterfall view for a request with an N+1 problem. Capture a trace where you can see many identical &lt;code&gt;postgresql.query&lt;/code&gt; spans stacked vertically (each 3–5 ms, but dozens of them). The total trace should be visibly long (1000+ ms). The span colours should show postgresql spans in a distinct colour from the django spans. This is the "smoking gun" visual for N+1.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Span List tab:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The span list groups spans by resource and service, sorted by span count. Instead of a timeline, you see a table with columns for resource name, number of spans, average duration, execution time, and percentage of total trace time. This is useful when you want to quickly answer "which database query ran the most times?" or "which service consumed the most time?" without scrolling through a long waterfall.&lt;/p&gt;

&lt;p&gt;Sort by &lt;code&gt;SPANS&lt;/code&gt; to find N+1 patterns (one query repeated hundreds of times), or by &lt;code&gt;% EXEC TIME&lt;/code&gt; to find the single heaviest operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flame Graph tab:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The flame graph is the default trace visualisation in Datadog. It shows all spans from a trace laid out on a timeline, colour-coded by service.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The x-axis is time. Wider spans took longer&lt;/li&gt;
&lt;li&gt;The y-axis is call depth. Each row is a child of the row above it&lt;/li&gt;
&lt;li&gt;Colours represent services by default (you can switch to group by host or container)&lt;/li&gt;
&lt;li&gt;Spans from different services are visually distinct, so you can tell at a glance whether time is spent in your application code, the database, or an external API call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reading the flame graph:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Look for the widest spans at the deepest level. These are where actual time is spent&lt;/li&gt;
&lt;li&gt;Hover over any span to see the service name, operation, resource, and duration&lt;/li&gt;
&lt;li&gt;Click a span to open the detail panel below, which includes the full query text, error details, and related logs&lt;/li&gt;
&lt;li&gt;Use the legend at the top to see what percentage of total execution time each service accounts for. If postgresql takes 80% of the trace, the database is your bottleneck&lt;/li&gt;
&lt;li&gt;Toggle the &lt;strong&gt;Errors&lt;/strong&gt; checkbox under "Filter Spans" to highlight error spans in the graph&lt;/li&gt;
&lt;li&gt;Compare flame graphs before and after a fix. The previously wide span should be narrower or gone&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;dd-flamegraph.png&lt;/code&gt;): Datadog flame graph for a trace, colour-coded by service. Capture one where the postgresql service takes a large portion of the total width, with many narrow child spans visible. The legend at the top should show the &lt;code&gt;% Exec Time&lt;/code&gt; breakdown per service. If possible, capture a second image (&lt;code&gt;dd-flamegraph-fixed.png&lt;/code&gt;) of the same endpoint after an N+1 fix — the postgresql portion should be visibly smaller.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Walkthrough: Finding a Bottleneck End-to-End
&lt;/h2&gt;

&lt;p&gt;Here's how the full process looks on a real endpoint. I'll use a simplified version of a case I've worked through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The symptom:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Datadog dashboard shows &lt;code&gt;/api/orders/&lt;/code&gt; with p95 at 1200 ms, well above the 200 ms target. The endpoint handles 50,000 requests/day, making it a critical priority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Check the Datadog resource page&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the resource page for &lt;code&gt;/api/orders/&lt;/code&gt;. The latency distribution shows a long tail — p50 is 180 ms, but p95 jumps to 1200 ms. The tail requests correlate with customers who have many orders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Drill into a slow trace&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Filter traces by duration &amp;gt; 1000 ms. Open one. The span waterfall shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[django.request] ────────────────────────── 1180 ms
  [django.view] ─────────────────────────── 1170 ms
    [postgresql.query] ─── 8 ms   SELECT * FROM orders WHERE user_id = 42 ...
    [postgresql.query] ─── 3 ms   SELECT * FROM customers WHERE id = 42
    [postgresql.query] ─── 4 ms   SELECT * FROM order_items WHERE order_id = 101
    [postgresql.query] ─── 3 ms   SELECT * FROM products WHERE id = 55
    [postgresql.query] ─── 4 ms   SELECT * FROM order_items WHERE order_id = 102
    [postgresql.query] ─── 3 ms   SELECT * FROM products WHERE id = 23
    ... (380 more query spans)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total query count: 384. Total database time: ~980 ms. The rest is serialisation overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Identify the pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two N+1 patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each order, a separate query fetches its items (&lt;code&gt;SELECT * FROM order_items WHERE order_id = ?&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;For each item, a separate query fetches the product (&lt;code&gt;SELECT * FROM products WHERE id = ?&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Reproduce locally with Debug Toolbar&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hit the same endpoint locally with &lt;code&gt;django-debug-toolbar&lt;/code&gt; enabled. The SQL panel confirms: 384 queries, with "Duplicated" badges on the &lt;code&gt;order_items&lt;/code&gt; and &lt;code&gt;products&lt;/code&gt; queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Apply the fix&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prefetch_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;queryset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;OrderItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query count drops from 384 to 2: one query for orders with the customer joined in (&lt;code&gt;select_related&lt;/code&gt;), and one for all order items with their products joined via the &lt;code&gt;Prefetch&lt;/code&gt; queryset's &lt;code&gt;select_related('product')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Verify&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Push to staging. Monitor the Datadog resource page for &lt;code&gt;/api/orders/&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p50: 180 ms → 45 ms&lt;/li&gt;
&lt;li&gt;p95: 1200 ms → 95 ms&lt;/li&gt;
&lt;li&gt;p99: 2400 ms → 180 ms&lt;/li&gt;
&lt;li&gt;Query count per request: 384 → 2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latency distribution shifts from a long-tail shape to a tight cluster under 100 ms. Ship to production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;walkthrough-before-after.png&lt;/code&gt;): Side-by-side or stacked comparison of the Datadog latency distribution for &lt;code&gt;/api/orders/&lt;/code&gt; before and after the fix. The "before" should show a long tail; the "after" should show a tight cluster. If a side-by-side isn't possible, use two separate images (&lt;code&gt;walkthrough-before.png&lt;/code&gt; and &lt;code&gt;walkthrough-after.png&lt;/code&gt;). This is the payoff visual — it shows the reader what success looks like in Datadog.&lt;/p&gt;
&lt;/blockquote&gt;
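&lt;p&gt;If you want to sanity-check percentile maths outside Datadog, the standard library is enough. A minimal sketch with synthetic latencies (illustrative numbers, not the real endpoint's data) showing how a healthy-looking average hides a slow tail:&lt;/p&gt;

```python
from statistics import mean, quantiles

def latency_report(samples_ms):
    """p50/p95/p99 from raw latency samples.

    method='inclusive' interpolates between data points, matching the
    common 'linear' percentile definition.
    """
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Synthetic long tail: 90% fast, 8% slow, 2% very slow.
samples = [40] * 90 + [300] * 8 + [1200] * 2

print(mean(samples))            # 84.0 -- the average looks fine
print(latency_report(samples))  # p50=40, p95=300, p99=1200
```

&lt;p&gt;The average (84 ms) would comfortably pass a 200 ms target while 2% of requests take over a second — which is exactly why the targets in this post are stated as percentiles.&lt;/p&gt;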




&lt;h2&gt;
  
  
  Step 2: Identify and Prioritise Bottlenecks
&lt;/h2&gt;

&lt;p&gt;With observability in place, the next step is triage. Not every slow endpoint matters equally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prioritisation Framework
&lt;/h3&gt;

&lt;p&gt;Rank endpoints by impact × frequency:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;p95 Latency&lt;/th&gt;
&lt;th&gt;Requests/day&lt;/th&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/orders/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1200 ms&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Critical&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/users/profile/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;400 ms&lt;/td&gt;
&lt;td&gt;30,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/reports/monthly/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3000 ms&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/dashboard/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;600 ms&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A 3-second report endpoint used 200 times a day is less urgent than a 1.2-second orders endpoint hit 50,000 times. Fix what affects the most users first.&lt;/p&gt;
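&lt;p&gt;The ranking can be made mechanical. A small sketch using the table above (the scoring formula — total user-facing slow time per day — is one reasonable choice, not a standard):&lt;/p&gt;

```python
# Rank endpoints by approximate total slow time per day:
# p95 latency x daily request volume. Data mirrors the table above.

endpoints = [
    {"path": "/api/orders/", "p95_ms": 1200, "req_per_day": 50_000},
    {"path": "/api/users/profile/", "p95_ms": 400, "req_per_day": 30_000},
    {"path": "/api/reports/monthly/", "p95_ms": 3000, "req_per_day": 200},
    {"path": "/api/dashboard/", "p95_ms": 600, "req_per_day": 15_000},
]

def impact_score(ep):
    # Seconds of p95-level waiting inflicted per day: impact x frequency.
    return ep["p95_ms"] * ep["req_per_day"] / 1000

for ep in sorted(endpoints, key=impact_score, reverse=True):
    print(f'{ep["path"]:28} {impact_score(ep):>10,.0f} s/day')
```

&lt;p&gt;The sort reproduces the table's priorities: &lt;code&gt;/api/orders/&lt;/code&gt; first, the monthly report last, despite the report being slowest per request.&lt;/p&gt;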

&lt;h3&gt;
  
  
  Common Bottleneck Patterns
&lt;/h3&gt;

&lt;p&gt;These are the patterns I've run into most often:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;N+1 queries&lt;/strong&gt;: a list endpoint fires one query per item instead of batching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing database indexes&lt;/strong&gt;: full table scans on filtered or sorted columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-fetching&lt;/strong&gt;: loading entire rows when only a few columns are needed, especially with large text or JSON fields&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous blocking&lt;/strong&gt;: waiting on external APIs, email sending, or file processing in the request cycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No caching&lt;/strong&gt;: recomputing identical results on every request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unoptimised serialisation&lt;/strong&gt;: serialisers performing additional queries or heavy computation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As any system scales, these patterns get more noticeable. An N+1 that's invisible with 10 records becomes a real problem at 10,000.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Apply Targeted Fixes
&lt;/h2&gt;

&lt;p&gt;Start with small wins. They're often low-effort but make a disproportionate difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix N+1 Queries
&lt;/h3&gt;

&lt;p&gt;N+1 is probably the most common performance bug in ORM-based backends. It happens when you load a list of objects and then access a related object on each one individually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This fires 1 query for orders + N queries for customer (one per order)
&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Each access = 1 query
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# select_related: single JOIN query for ForeignKey/OneToOne
&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# prefetch_related: two queries for ManyToMany/reverse FK
&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prefetch_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Detecting N+1 queries:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Datadog APM traces showing dozens of identical &lt;code&gt;SELECT&lt;/code&gt; statements per request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;django-debug-toolbar&lt;/code&gt; showing query count spikes on list views&lt;/li&gt;
&lt;li&gt;Middleware that logs query count per request (useful in staging or with a debug flag):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.db&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryCountMiddleware&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_response&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;initial&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;initial&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_query_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: &lt;code&gt;connection.queries&lt;/code&gt; only populates when &lt;code&gt;DEBUG=True&lt;/code&gt;. In production, rely on APM tracing or a package like &lt;code&gt;django-querycount&lt;/code&gt; instead.&lt;/p&gt;
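&lt;p&gt;Whichever tool captures the queries, the tell-tale signature is many statements that differ only in their literal values. A rough sketch that groups captured SQL by shape (a regex heuristic, not a real SQL parser — the threshold is arbitrary):&lt;/p&gt;

```python
import re
from collections import Counter

def normalise(sql: str) -> str:
    """Collapse literal values so queries differing only in their
    parameters group together (a rough heuristic, not a SQL parser)."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip()

def n_plus_one_suspects(queries, threshold=10):
    """Return (shape, count) pairs repeated at least `threshold` times."""
    counts = Counter(normalise(q) for q in queries)
    return [(shape, n) for shape, n in counts.most_common() if n >= threshold]

# Example: the repeated per-order lookup dominates.
captured = ["SELECT * FROM orders WHERE user_id = 42"]
captured += [f"SELECT * FROM order_items WHERE order_id = {i}" for i in range(190)]
for shape, n in n_plus_one_suspects(captured):
    print(n, shape)
```

&lt;p&gt;Feed it the &lt;code&gt;sql&lt;/code&gt; values from &lt;code&gt;connection.queries&lt;/code&gt; locally, or query text exported from trace spans; the shape at the top of the list is usually the N+1.&lt;/p&gt;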

&lt;h3&gt;
  
  
  Add Database Indexes
&lt;/h3&gt;

&lt;p&gt;Indexes are the highest-leverage single change for query performance. Without one, the database scans every row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identifying missing indexes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: find slow queries and their execution plans&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for &lt;code&gt;Seq Scan&lt;/code&gt; in the output. That means no index is being used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adding targeted indexes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Single-column index for filtered lookups&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Composite index for queries that filter + sort&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_status_created&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Partial index for a common filter condition&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_pending&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;In Django migrations:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;operations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AddIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-created_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;idx_order_status_created&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Index trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indexes speed up reads but slow down writes (every INSERT/UPDATE must update the index)&lt;/li&gt;
&lt;li&gt;A plain &lt;code&gt;CREATE INDEX&lt;/code&gt; blocks writes to the table for the duration of the build, which can take minutes on large tables. Use &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; in PostgreSQL to keep writes flowing (at the cost of a slower build)&lt;/li&gt;
&lt;li&gt;Over-indexing wastes storage and makes the query planner's job harder&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stop Over-Fetching Data
&lt;/h3&gt;

&lt;p&gt;Loading columns you don't need wastes memory and network bandwidth, especially with large text or JSON fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Loads ALL columns including a 50KB description field
&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Only fetch what you need
&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;only&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or explicitly defer heavy fields
&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;defer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata_json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# For read-only list views, use values/values_list
&lt;/span&gt;&lt;span class="n"&gt;product_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Implement Caching
&lt;/h3&gt;

&lt;p&gt;Cache results that are expensive to compute and don't change frequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Application-level cache (Redis/Memcached)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.core.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_dashboard_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dashboard_stats:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_expensive_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 5 minutes
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 2: Per-request memoisation with &lt;code&gt;cached_property&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.utils.functional&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cached_property&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderSerializer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serializers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ModelSerializer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@cached_property&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prefetched_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 3: HTTP caching for read-heavy endpoints&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.views.decorators.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cache_page&lt;/span&gt;

&lt;span class="nd"&gt;@cache_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Cache for 5 minutes
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;product_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cache invalidation strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time-based (TTL): simplest, works for data that can tolerate staleness&lt;/li&gt;
&lt;li&gt;Event-based: invalidate on write operations using signals or hooks&lt;/li&gt;
&lt;li&gt;Versioned keys: append a version counter to cache keys, increment on data changes
&lt;/li&gt;
&lt;/ul&gt;
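&lt;p&gt;Versioned keys avoid explicit deletes entirely: writers bump a counter, and readers never see keys built with the old version. A minimal sketch (a plain dict stands in for Redis/Memcached, and the key names are illustrative):&lt;/p&gt;

```python
# Versioned cache keys: invalidation is a single counter increment,
# which makes every key built with the old version unreachable.
cache = {}  # stand-in for Redis/Memcached

def _version(user_id):
    return cache.get(f"orders_version:{user_id}", 1)

def cache_orders(user_id, payload):
    cache[f"orders:{user_id}:v{_version(user_id)}"] = payload

def get_cached_orders(user_id):
    return cache.get(f"orders:{user_id}:v{_version(user_id)}")

def invalidate_orders(user_id):
    # One increment invalidates every entry cached under the old version.
    cache[f"orders_version:{user_id}"] = _version(user_id) + 1

cache_orders(42, ["order-101", "order-102"])
assert get_cached_orders(42) == ["order-101", "order-102"]
invalidate_orders(42)
assert get_cached_orders(42) is None  # old entry is unreachable
```

&lt;p&gt;The trade-off: stale entries linger until evicted, so in a real cache pair versioned keys with a TTL to reclaim the dead space.&lt;/p&gt;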

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.db.models.signals&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;post_save&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.dispatch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;receiver&lt;/span&gt;

&lt;span class="nd"&gt;@receiver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post_save&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invalidate_order_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dashboard_stats:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_detail:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Offload Async Work
&lt;/h3&gt;

&lt;p&gt;Anything that doesn't need to happen before the HTTP response can be moved out of the request cycle. I use this pattern extensively in my &lt;a href="https://ankitjang.one/case-studies/message-scheduler" rel="noopener noreferrer"&gt;Message Scheduler&lt;/a&gt; project — Celery workers handle email and Telegram delivery asynchronously, keeping the API response under 50 ms even during peak load.&lt;/p&gt;

&lt;p&gt;Common candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending emails and notifications&lt;/li&gt;
&lt;li&gt;Generating reports or PDFs&lt;/li&gt;
&lt;li&gt;Processing uploaded files&lt;/li&gt;
&lt;li&gt;Syncing data with external services&lt;/li&gt;
&lt;li&gt;Updating search indexes&lt;/li&gt;
&lt;li&gt;Aggregating analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Using Celery in Django:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tasks.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;celery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shared_task&lt;/span&gt;

&lt;span class="nd"&gt;@shared_task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_order_confirmation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Order #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; confirmed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;render_confirmation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# views.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="n"&gt;send_order_confirmation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Non-blocking
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes email sending (which can take 500 ms–2 s depending on the provider) out of the response path entirely. For a production example of Celery + Redis in action with retry logic and idempotency keys, see the &lt;a href="https://ankitjang.one/case-studies/message-scheduler" rel="noopener noreferrer"&gt;Message Scheduler case study&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Connection Pooling
&lt;/h3&gt;

&lt;p&gt;Opening a new database connection per request is expensive. Connection pooling keeps a pool of reusable connections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Django with PostgreSQL:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# settings.py
&lt;/span&gt;&lt;span class="n"&gt;DATABASES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ENGINE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;django.db.backends.postgresql&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mydb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CONN_MAX_AGE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Keep connections alive for 10 minutes
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;CONN_MAX_AGE&lt;/code&gt; keeps connections alive across requests within a thread. For actual connection pooling with control over pool size, use PgBouncer as an external pooler between Django and PostgreSQL. This matters when you're running multiple application workers.&lt;/p&gt;
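&lt;p&gt;A minimal PgBouncer configuration sketch (all values illustrative, tune for your workload):&lt;/p&gt;

```ini
; pgbouncer.ini (illustrative values)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
; Django's DATABASES HOST/PORT point at PgBouncer, not Postgres
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; release the server connection back to the pool after each transaction
pool_mode = transaction
; server connections per database/user pair
default_pool_size = 20
; total client connections PgBouncer will accept
max_client_conn = 500
```

&lt;p&gt;With &lt;code&gt;pool_mode = transaction&lt;/code&gt;, set &lt;code&gt;CONN_MAX_AGE = 0&lt;/code&gt; in Django and avoid session-level features such as advisory locks, since consecutive transactions from one client may run on different server connections.&lt;/p&gt;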

&lt;h3&gt;
  
  
  Use Read Replicas
&lt;/h3&gt;

&lt;p&gt;For read-heavy workloads, route read queries to replica databases while writes go to the primary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Django database router:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PrimaryReplicaRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;db_for_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replica&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;db_for_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allow_relation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allow_migrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Caveat&lt;/strong&gt;: Read replicas have replication lag (usually milliseconds, but it can spike under load). Don't route reads to replicas immediately after a write if the user expects to see their own changes. This causes "read-your-own-write" inconsistency.&lt;/p&gt;
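&lt;p&gt;One mitigation is to pin reads to the primary for a short window after a write. A minimal sketch, not from the post; a real router would track the last write per request or session (thread-locals or middleware), not on a single shared instance:&lt;/p&gt;

```python
import time

class StickyPrimaryRouter:
    """Sketch: after a write, keep routing reads to the primary for a
    short window, so users always see their own changes even when the
    replica is lagging."""

    STICKY_SECONDS = 5.0

    def __init__(self):
        self._last_write = float("-inf")

    def db_for_write(self, model, **hints):
        self._last_write = time.monotonic()
        return "default"

    def db_for_read(self, model, **hints):
        # Within the sticky window the replica may not have caught up yet.
        if time.monotonic() - self._last_write < self.STICKY_SECONDS:
            return "default"
        return "replica"
```

&lt;p&gt;Pick the window based on your observed replication lag percentiles, not a guess.&lt;/p&gt;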




&lt;h2&gt;
  
  
  Step 4: Query-Level Deep Dives
&lt;/h2&gt;

&lt;p&gt;When quick fixes aren't enough, you need to go deeper into individual query performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using EXPLAIN ANALYZE
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; executes the query and shows the plan the database used. It's where you go when you need to understand exactly why a specific query is slow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to look for in the output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Indicator&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Seq Scan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full table scan&lt;/td&gt;
&lt;td&gt;Add an index on the filtered columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Nested Loop&lt;/code&gt; with high row count&lt;/td&gt;
&lt;td&gt;Looping join on large result set&lt;/td&gt;
&lt;td&gt;Consider a hash join, or add indexes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Sort&lt;/code&gt; with high cost&lt;/td&gt;
&lt;td&gt;Sorting without index&lt;/td&gt;
&lt;td&gt;Add an index that matches the ORDER BY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Rows Removed by Filter&lt;/code&gt; (high number)&lt;/td&gt;
&lt;td&gt;Index not selective enough&lt;/td&gt;
&lt;td&gt;Use a more specific composite index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Buffers: shared read&lt;/code&gt; (high)&lt;/td&gt;
&lt;td&gt;Data not in memory&lt;/td&gt;
&lt;td&gt;Increase &lt;code&gt;shared_buffers&lt;/code&gt; or optimise query to touch fewer pages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Screenshot to add&lt;/strong&gt; (&lt;code&gt;explain-analyze-output.png&lt;/code&gt;): Terminal output from running &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on a query with a &lt;code&gt;Seq Scan&lt;/code&gt;. Highlight or annotate the &lt;code&gt;Seq Scan&lt;/code&gt; node and the &lt;code&gt;rows=&lt;/code&gt; vs &lt;code&gt;Rows Removed by Filter&lt;/code&gt; values. If you have a second capture showing the same query after adding an index (showing &lt;code&gt;Index Scan&lt;/code&gt; instead), include that as &lt;code&gt;explain-analyze-indexed.png&lt;/code&gt; for a before/after comparison.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Reproducing Production Queries Safely
&lt;/h3&gt;

&lt;p&gt;Don't run &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on production directly: it actually executes the query, and on an &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt; it performs the writes (wrap those in &lt;code&gt;BEGIN ... ROLLBACK&lt;/code&gt; if you have no other option). Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy the slow query from Datadog/APM traces&lt;/li&gt;
&lt;li&gt;Run it in a read replica or staging environment with production-like data&lt;/li&gt;
&lt;li&gt;Compare plans between staging and production (data distribution matters)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Connect to read replica&lt;/span&gt;
psql &lt;span class="nt"&gt;-h&lt;/span&gt; replica-host &lt;span class="nt"&gt;-U&lt;/span&gt; readonly_user &lt;span class="nt"&gt;-d&lt;/span&gt; mydb

&lt;span class="c"&gt;# Set statement timeout as a safety net&lt;/span&gt;
SET statement_timeout &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'30s'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

EXPLAIN &lt;span class="o"&gt;(&lt;/span&gt;ANALYZE, BUFFERS, FORMAT TEXT&lt;span class="o"&gt;)&lt;/span&gt;
SELECT ...&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 5: Verify and Monitor
&lt;/h2&gt;

&lt;p&gt;Every fix needs measurement. The process I follow: push to staging, monitor, confirm the improvement, then ship to production; if the numbers don't move, revert.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Verification Process
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Push fix to staging
       ↓
Monitor Datadog dashboards (p75, p95, p99)
       ↓
  ┌─── Improved? ───┐
  ↓                  ↓
 YES                 NO
  ↓                  ↓
Push to prod     Investigate further
  ↓                  ↓
Monitor prod     Iterate or revert
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What to Check After Each Fix
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Did p95/p99 latency actually drop?&lt;/li&gt;
&lt;li&gt;Did N+1 fixes reduce total query count per request?&lt;/li&gt;
&lt;li&gt;Did the fix introduce any new errors? (Performance "fixes" sometimes break things.)&lt;/li&gt;
&lt;li&gt;Did database CPU and I/O improve? Reduced query time should show up here too.&lt;/li&gt;
&lt;li&gt;For caching changes, is the hit ratio trending upward?&lt;/li&gt;
&lt;/ul&gt;
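&lt;p&gt;For spot checks outside the dashboard, for example against raw timings pulled from logs, percentiles are one standard-library call away. A small sketch:&lt;/p&gt;

```python
import statistics

def percentile(latencies_ms, q):
    """Return the q-th percentile (q in 1..99) of raw request timings.

    quantiles(n=100) returns the 99 cut points that split the data
    into 100 equal groups, so index q-1 is the q-th percentile.
    """
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cut_points[q - 1]
```

&lt;p&gt;Run it over the same time window before and after the deploy to confirm that p95 and p99 actually moved.&lt;/p&gt;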

&lt;h3&gt;
  
  
  Tracking Regressions
&lt;/h3&gt;

&lt;p&gt;Performance work isn't something you do once and move on. Set up monitors that alert on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p75, p95, p99 latency exceeding a threshold for more than 5 minutes&lt;/li&gt;
&lt;li&gt;Query count per request increasing by more than 20%&lt;/li&gt;
&lt;li&gt;Cache hit rate dropping below 80%&lt;/li&gt;
&lt;li&gt;New slow queries appearing (anything exceeding 500 ms)&lt;/li&gt;
&lt;/ul&gt;
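&lt;p&gt;For the last one, the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension gives you the same signal straight from the database (column names per PostgreSQL 13+, where timings are reported in milliseconds):&lt;/p&gt;

```sql
-- Statements whose mean execution time exceeds the 500 ms threshold
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 500
ORDER BY mean_exec_time DESC
LIMIT 20;
```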




&lt;h2&gt;
  
  
  Trade-offs and Gotchas
&lt;/h2&gt;

&lt;p&gt;Performance work involves trade-offs. Here are the ones I've dealt with most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing on Large Tables
&lt;/h3&gt;

&lt;p&gt;Adding indexes to tables with hundreds of millions of rows isn't always possible. &lt;code&gt;CREATE INDEX&lt;/code&gt; can lock the table for minutes. &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; avoids locking but takes longer and can fail under high write throughput. Sometimes the answer is redesigning: partition the table, archive old data, or use a materialised view.&lt;/p&gt;
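&lt;p&gt;For reference, the concurrent form looks like this (index name and columns are illustrative):&lt;/p&gt;

```sql
-- Builds without holding a long write lock. Cannot run inside a
-- transaction block, and a failed build leaves an INVALID index behind.
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at DESC);

-- Clean up after a failed build before retrying:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status_created;
```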

&lt;h3&gt;
  
  
  Cache Invalidation
&lt;/h3&gt;

&lt;p&gt;Incorrect invalidation leads to stale data: users seeing outdated information, balance mismatches, ghost records. Start with short TTLs and event-based invalidation. Don't cache data that changes on every request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Over-Optimisation
&lt;/h3&gt;

&lt;p&gt;Not every endpoint needs 50 ms latency. A monthly report endpoint used by 3 internal users can take 5 seconds and nobody will notice. Spend your time where users are, not where the numbers look bad in isolation.&lt;/p&gt;

&lt;p&gt;One useful approach: set alerting thresholds relative to traffic. If an endpoint handles 50,000 requests/day, alert at 150 ms p75, 300 ms p95, 500 ms p99. A low-traffic internal endpoint can have much looser thresholds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read Replica Lag
&lt;/h3&gt;

&lt;p&gt;Replication lag is usually sub-second, but under load it can spike. Design your application to tolerate this. Route reads-after-writes to the primary, not the replica.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;APM / Tracing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Datadog APM, New Relic, Elastic APM, Jaeger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Profiling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Datadog DB Monitoring, pganalyze, &lt;code&gt;django-debug-toolbar&lt;/code&gt; — see also my &lt;a href="https://ankitjang.one/blog/profiling-django-apis-debug-toolbar-snakeviz" rel="noopener noreferrer"&gt;Django profiling walkthrough&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;structlog, ELK Stack, Datadog Logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis, Memcached, Django cache framework&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Async Workers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Celery + Redis/RabbitMQ — production example in &lt;a href="https://ankitjang.one/case-studies/message-scheduler" rel="noopener noreferrer"&gt;Message Scheduler&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection Pooling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PgBouncer, Django &lt;code&gt;CONN_MAX_AGE&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load Testing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Locust, k6, Apache Bench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python Profiling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cProfile&lt;/code&gt;, &lt;code&gt;snakeviz&lt;/code&gt;, Pyinstrument&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, &lt;code&gt;pg_stat_statements&lt;/code&gt;, &lt;code&gt;auto_explain&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For language-specific performance tradeoffs: my &lt;a href="https://ankitjang.one/case-studies/healthlab" rel="noopener noreferrer"&gt;HealthLab project&lt;/a&gt; uses Go for a single-binary deployment with goroutine-based concurrency, which avoids the GIL limitations I hit in Python for CPU-bound bot processing. And my &lt;a href="https://ankitjang.one/case-studies/portfolio" rel="noopener noreferrer"&gt;portfolio system&lt;/a&gt; shows how Jinja2 template rendering with LaTeX achieves a 5x improvement over the previous manual workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Measure first, optimise second. Observability is a prerequisite.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Percentiles over averages. p95 and p99 show what users actually experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fix the boring stuff first. N+1 queries, missing indexes, and over-fetching account for most backend latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it doesn't need to happen before the response, move it out of the request cycle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cache deliberately: short TTLs, event-based invalidation, clear cache keys.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify every change. The loop is observe → fix → measure → ship, not fix → hope.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accept trade-offs. Not everything needs to be fast, and some optimisations create new problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Performance work is incremental. As traffic grows and features ship, new bottlenecks surface. The system is never "done." The point is having a process that catches regressions early and fixes them before users feel it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://ankitjang.one/blog/how-to-optimise-performance" rel="noopener noreferrer"&gt;ankitjang.one&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>performance</category>
      <category>database</category>
      <category>caching</category>
    </item>
    <item>
      <title>Profiling Django APIs with Debug Toolbar and snakeviz</title>
      <dc:creator>Ankit Jangwan</dc:creator>
      <pubDate>Thu, 02 Apr 2026 12:57:28 +0000</pubDate>
      <link>https://forem.com/jangwanankit/profiling-django-apis-with-debug-toolbar-and-snakeviz-1510</link>
      <guid>https://forem.com/jangwanankit/profiling-django-apis-with-debug-toolbar-and-snakeviz-1510</guid>
      <description>&lt;p&gt;You don't need paid monitoring tools to find what's slow in your Django application. Two free, open-source tools cover most of it: &lt;strong&gt;Django Debug Toolbar&lt;/strong&gt; for per-request profiling and &lt;strong&gt;snakeviz&lt;/strong&gt; for visualizing Python's built-in &lt;code&gt;cProfile&lt;/code&gt; data.&lt;/p&gt;

&lt;p&gt;This post walks through how I use both tools to find and fix performance problems, based on patterns from my own &lt;a href="https://ankitjang.one/projects" rel="noopener noreferrer"&gt;projects&lt;/a&gt;. The examples are grounded in a Django API that handles 10,000+ scheduled messages per day with Celery workers and external API calls.&lt;/p&gt;

&lt;p&gt;If you want the broader performance optimization workflow — including production monitoring, caching, and async offloading — I covered that in &lt;a href="https://ankitjang.one/blog/how-to-optimise-performance" rel="noopener noreferrer"&gt;How to Optimise Backend Performance&lt;/a&gt;. This post goes deeper on the local profiling tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up Django Debug Toolbar
&lt;/h2&gt;

&lt;p&gt;Installation takes about two minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;django-debug-toolbar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# settings.py (development only)
&lt;/span&gt;&lt;span class="n"&gt;INSTALLED_APPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;MIDDLEWARE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.middleware.DebugToolbarMiddleware&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;DEBUG_TOOLBAR_PANELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.sql.SQLPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.profiling.ProfilingPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.timer.TimerPanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;debug_toolbar.panels.cache.CachePanel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;INTERNAL_IPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the URL configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# urls.py
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEBUG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;debug_toolbar&lt;/span&gt;
    &lt;span class="n"&gt;urlpatterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__debug__/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug_toolbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;urlpatterns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Python 3.12+, the profiling panel needs the dev server running single-threaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python manage.py runserver &lt;span class="nt"&gt;--nothreading&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The SQL Panel: Your N+1 Detector
&lt;/h2&gt;

&lt;p&gt;The SQL panel is where I spend most of my time in Debug Toolbar. It shows every database query fired during a request, with timing, SQL text, and stack traces.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to look for
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Query count.&lt;/strong&gt; A list endpoint returning 50 items should not fire 150 queries. If it does, you're missing &lt;code&gt;select_related&lt;/code&gt; or &lt;code&gt;prefetch_related&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Duplicated" and "Similar" badges.&lt;/strong&gt; Debug Toolbar groups identical query patterns and flags them. If you see a red "Duplicated" badge next to &lt;code&gt;SELECT * FROM customers WHERE id = ?&lt;/code&gt; repeated 200 times, that's a textbook N+1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stack trace.&lt;/strong&gt; Click any query to see which line of Python triggered it. This tells you whether the query came from the view, a serializer, a model method, or a template. Knowing &lt;em&gt;where&lt;/em&gt; matters as much as knowing &lt;em&gt;what&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query time distribution.&lt;/strong&gt; If one query takes 200 ms and the rest take 1 ms each, that query is your target. Often it's a missing index — the query is doing a sequential scan instead of using an index.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding an N+1 in practice
&lt;/h3&gt;

&lt;p&gt;On a project similar to my &lt;a href="https://ankitjang.one/projects/message-scheduler" rel="noopener noreferrer"&gt;Message Scheduler&lt;/a&gt;, I hit this endpoint locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/messages/?status=pending
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Debug Toolbar showed: &lt;strong&gt;187 queries in 420 ms&lt;/strong&gt;. Several queries had "Duplicated" badges — the same &lt;code&gt;SELECT * FROM users WHERE id = ?&lt;/code&gt; pattern repeated for every message in the list.&lt;/p&gt;

&lt;p&gt;The view was loading messages and then accessing &lt;code&gt;message.user.email&lt;/code&gt; in the serializer. Each access triggered a separate query.&lt;/p&gt;

&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: 187 queries
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After: 2 queries
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the change, Debug Toolbar showed &lt;strong&gt;2 queries in 12 ms&lt;/strong&gt;. One query for messages with a JOIN to users, one for the count.&lt;/p&gt;
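
&lt;p&gt;The same shape is easy to reproduce outside Django. Here's a minimal &lt;code&gt;sqlite3&lt;/code&gt; sketch (hypothetical tables, not the project's real schema) contrasting the per-row lookup with the single JOIN that &lt;code&gt;select_related&lt;/code&gt; generates:&lt;br&gt;
&lt;/p&gt;

```python
# Illustrating the N+1 pattern and its fix at the SQL level.
# Tables and data are hypothetical; the point is the query count.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO messages VALUES (1, 1, 'pending'), (2, 2, 'pending'), (3, 1, 'pending');
""")

# N+1: one query for the messages, then one more per message for its user.
queries = 1
messages = conn.execute(
    "SELECT id, user_id FROM messages WHERE status = 'pending'"
).fetchall()
emails = []
for _msg_id, user_id in messages:
    row = conn.execute('SELECT email FROM users WHERE id = ?', (user_id,))
    emails.append(row.fetchone()[0])
    queries += 1
print(queries)  # 4 queries for 3 messages; grows linearly with the list

# The select_related equivalent: a single JOIN, constant query count.
joined = conn.execute("""
    SELECT m.id, u.email FROM messages m
    JOIN users u ON m.user_id = u.id
    WHERE m.status = 'pending'
""").fetchall()
print(len(joined))  # same 3 rows, 1 query
```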




&lt;h2&gt;
  
  
  The Profiling Panel: Where Time Actually Goes
&lt;/h2&gt;

&lt;p&gt;The SQL panel tells you about database time. The profiling panel tells you about everything else — serialization, template rendering, Python computation, middleware.&lt;/p&gt;

&lt;p&gt;The panel isn't on by default: add &lt;code&gt;debug_toolbar.panels.profiling.ProfilingPanel&lt;/code&gt; to your panels setting, then tick its checkbox in the toolbar. It shows a collapsible call tree for the request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /api/messages/ — 1842 ms
├── MessageListView.get() — 1842 ms (cumtime)
│   ├── MessageQuerySet.all() — 12 ms
│   ├── MessageSerializer.to_representation() — 1650 ms
│   │   ├── UserField.to_representation() × 200 — 1580 ms
│   │   │   └── SQL: SELECT * FROM users WHERE id = %s × 200
│   │   └── ChannelSerializer.to_representation() × 200 — 60 ms
│   └── Paginator.paginate() — 180 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reading the call tree
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Call count is the key signal.&lt;/strong&gt; A function called 200 times inside a loop is almost always an N+1 or a missing batch operation. In the example above, &lt;code&gt;UserField.to_representation()&lt;/code&gt; runs 200 times — once per message in the list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nesting shows the hierarchy.&lt;/strong&gt; A slow parent with fast own-time means the parent is slow because of its children. &lt;code&gt;MessageSerializer.to_representation()&lt;/code&gt; takes 1650 ms, but it's not doing anything slow itself — its child &lt;code&gt;UserField.to_representation()&lt;/code&gt; is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start from the deepest nodes with the highest cumulative time and work upward.&lt;/strong&gt; The actual bottleneck is usually at the bottom of the tree.&lt;/p&gt;

&lt;p&gt;You can adjust the profiling depth with &lt;code&gt;DEBUG_TOOLBAR_CONFIG&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEBUG_TOOLBAR_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PROFILER_MAX_DEPTH&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# default: 10
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PROFILER_THRESHOLD_RATIO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# default: 8
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  snakeviz: Visual Profiling with cProfile
&lt;/h2&gt;

&lt;p&gt;Django Debug Toolbar works great for web requests. But when you need to profile a management command, a Celery task, or a function in isolation, &lt;code&gt;cProfile&lt;/code&gt; + &lt;code&gt;snakeviz&lt;/code&gt; is the tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capturing a profile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestFactory&lt;/span&gt;

&lt;span class="n"&gt;factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RequestFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/messages/?status=pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;profiler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;message_list_view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;profiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message_list.prof&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or from the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; cProfile &lt;span class="nt"&gt;-o&lt;/span&gt; output.prof manage.py some_management_command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
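
&lt;p&gt;If you just need a quick text summary without leaving the terminal, the stdlib &lt;code&gt;pstats&lt;/code&gt; module reads the same &lt;code&gt;.prof&lt;/code&gt; file. A minimal sketch; &lt;code&gt;busy()&lt;/code&gt; here is a stand-in for whatever you actually profiled:&lt;br&gt;
&lt;/p&gt;

```python
# Text-mode inspection of a .prof file with the stdlib pstats module,
# no snakeviz required. busy() is a stand-in for the code you profiled.
import cProfile
import pstats

def busy():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()
profiler.dump_stats('example.prof')

stats = pstats.Stats('example.prof')
stats.sort_stats('cumulative')  # sort by cumtime, the column to read first
stats.print_stats(10)           # top 10 rows
```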



&lt;h3&gt;
  
  
  Viewing with snakeviz
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;snakeviz
snakeviz message_list.prof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;snakeviz opens a browser with an interactive visualization — either a sunburst chart or an icicle chart. Each block is a function. Wider blocks took more time. Blocks nested inside others were called by the parent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading snakeviz output
&lt;/h3&gt;

&lt;p&gt;The text table below the chart shows the same data as &lt;code&gt;cProfile&lt;/code&gt;'s text output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    1.842    1.842 views.py:45(message_list)
    200    0.003    0.000    1.650    0.008 serializers.py:12(get_user)
    200    1.580    0.008    1.580    0.008 base.py:330(execute)
      1    0.001    0.001    0.180    0.180 pagination.py:22(paginate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ncalls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How many times this function was called&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tottime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time inside this function, excluding sub-calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cumtime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total time including sub-calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;percall&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time per call: the first &lt;code&gt;percall&lt;/code&gt; column is &lt;code&gt;tottime / ncalls&lt;/code&gt;, the second is &lt;code&gt;cumtime / ncalls&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How to read this:&lt;/strong&gt; Start from the top (sorted by &lt;code&gt;cumtime&lt;/code&gt;). &lt;code&gt;message_list&lt;/code&gt; takes 1.84 seconds total. &lt;code&gt;get_user&lt;/code&gt; is called 200 times and accounts for 1.65 seconds — 89% of the view's time. The actual time is in &lt;code&gt;base.py:execute&lt;/code&gt;, which is Django's database query executor. Classic N+1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What patterns to look for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High &lt;code&gt;ncalls&lt;/code&gt; on database functions → N+1 queries&lt;/li&gt;
&lt;li&gt;High &lt;code&gt;tottime&lt;/code&gt; on a single function → CPU-bound bottleneck (serialization, computation)&lt;/li&gt;
&lt;li&gt;High &lt;code&gt;cumtime&lt;/code&gt; with low &lt;code&gt;tottime&lt;/code&gt; → the function itself is fast but calls something slow&lt;/li&gt;
&lt;/ul&gt;
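
&lt;p&gt;The &lt;code&gt;cumtime&lt;/code&gt;-versus-&lt;code&gt;tottime&lt;/code&gt; distinction is easy to see with a tiny synthetic profile: a parent that does no work of its own inherits its child's cost as &lt;code&gt;cumtime&lt;/code&gt; while its own &lt;code&gt;tottime&lt;/code&gt; stays near zero. A sketch:&lt;br&gt;
&lt;/p&gt;

```python
# Demonstrating cumtime vs tottime: parent() is cheap itself (low tottime)
# but pays for its child (high cumtime).
import cProfile
import io
import pstats

def child():
    return sum(i * i for i in range(200_000))

def parent():
    return child()  # no real work of its own

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats()
print(stream.getvalue())  # parent: high cumtime, near-zero tottime; child carries the tottime
```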

&lt;h3&gt;
  
  
  Profiling Celery tasks
&lt;/h3&gt;

&lt;p&gt;For my Message Scheduler's delivery tasks, I profile individual task functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cProfile&lt;/span&gt;

&lt;span class="nd"&gt;@shared_task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Normal task code...
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Profile it
&lt;/span&gt;&lt;span class="n"&gt;cProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;send_message(42)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;send_task.prof&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then &lt;code&gt;snakeviz send_task.prof&lt;/code&gt; shows exactly where delivery time goes — API calls to SES, Telegram latency, database reads for message content. This is how I discovered that loading the full message object (including a large &lt;code&gt;metadata&lt;/code&gt; JSON field) was adding unnecessary overhead. Switching to &lt;code&gt;.only('id', 'channel', 'recipient', 'body')&lt;/code&gt; cut the database portion by 60%.&lt;/p&gt;
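
&lt;p&gt;If you profile tasks often, the boilerplate is worth wrapping up. A small stdlib-only decorator sketch (names here are illustrative, not from the scheduler's codebase); with Celery you'd apply it beneath &lt;code&gt;@shared_task&lt;/code&gt; so it wraps the task body:&lt;br&gt;
&lt;/p&gt;

```python
# A reusable profiling decorator: dumps one .prof file per call.
# The file name and wrapped function are illustrative.
import cProfile
import functools

def profiled(path):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                profiler.dump_stats(path)
        return wrapper
    return decorator

@profiled('send_message.prof')
def send_message(message_id):
    # stand-in for the real task body
    return sum(range(message_id * 1000))

send_message(42)  # runs normally and leaves send_message.prof behind
```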




&lt;h2&gt;
  
  
  Deep Dives with EXPLAIN ANALYZE
&lt;/h2&gt;

&lt;p&gt;When Debug Toolbar or snakeviz points to a slow query, &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; tells you &lt;em&gt;why&lt;/em&gt; it's slow at the database level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_at&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the output tells you
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Indicator&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Seq Scan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full table scan — no index used&lt;/td&gt;
&lt;td&gt;Add index on filtered columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Nested Loop&lt;/code&gt; + high rows&lt;/td&gt;
&lt;td&gt;Looping join on large sets&lt;/td&gt;
&lt;td&gt;Check join indexes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Sort&lt;/code&gt; with high cost&lt;/td&gt;
&lt;td&gt;Sorting without index support&lt;/td&gt;
&lt;td&gt;Add index matching ORDER BY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Rows Removed by Filter&lt;/code&gt; (high)&lt;/td&gt;
&lt;td&gt;Index not selective enough&lt;/td&gt;
&lt;td&gt;Use composite or partial index&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Fixing a missing index
&lt;/h3&gt;

&lt;p&gt;Debug Toolbar showed a query on the messages table taking 180 ms. &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; confirmed a sequential scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Seq&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;28453&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1250&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;028&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;178&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1247&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;send_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;Rows&lt;/span&gt; &lt;span class="n"&gt;Removed&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;498753&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scanning 500,000 rows to return 1,247. A partial index fixed it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_messages_pending&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;send_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;idx_messages_pending&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1250&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;015&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;203&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1247&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From 178 ms to 1.2 ms. The partial index is small because it only covers pending messages, so it stays fast even as the table grows.&lt;/p&gt;

&lt;p&gt;In Django migrations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;operations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;migrations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AddIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;send_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Q&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;idx_messages_pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Full Workflow
&lt;/h2&gt;

&lt;p&gt;Here's the process I follow for every slow endpoint:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Hit the endpoint with Debug Toolbar enabled.&lt;/strong&gt;&lt;br&gt;
Check the SQL panel first. High query count with duplicate badges = N+1. Fix with &lt;code&gt;select_related&lt;/code&gt; or &lt;code&gt;prefetch_related&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Check the profiling panel.&lt;/strong&gt;&lt;br&gt;
If query count is fine but the request is still slow, the profiling panel shows where time goes in Python code — serialization, computation, template rendering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Profile in isolation with cProfile + snakeviz.&lt;/strong&gt;&lt;br&gt;
For deeper analysis or non-web-request profiling (management commands, Celery tasks), capture a &lt;code&gt;.prof&lt;/code&gt; file and visualize it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Run EXPLAIN ANALYZE on slow queries.&lt;/strong&gt;&lt;br&gt;
When a specific query is the bottleneck, check the execution plan. Look for &lt;code&gt;Seq Scan&lt;/code&gt; and add targeted indexes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Verify the fix.&lt;/strong&gt;&lt;br&gt;
Hit the endpoint again with Debug Toolbar. Confirm query count dropped, execution time improved. Run &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; again to confirm the index is being used.&lt;/p&gt;

&lt;p&gt;This loop — observe, identify, fix, verify — is the same one I follow across all my projects. I wrote about it in broader context (including production monitoring, caching strategies, and async offloading) in &lt;a href="https://ankitjang.one/blog/how-to-optimise-performance" rel="noopener noreferrer"&gt;How to Optimise Backend Performance&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Beyond Local Profiling
&lt;/h2&gt;

&lt;p&gt;Debug Toolbar and snakeviz are local development tools. They catch problems before code ships. But some issues only appear under production load — connection pool exhaustion, cache stampedes, replication lag.&lt;/p&gt;

&lt;p&gt;For my &lt;a href="https://ankitjang.one/case-studies/message-scheduler" rel="noopener noreferrer"&gt;Message Scheduler&lt;/a&gt;, I use Celery Flower for worker monitoring and structured logging with &lt;code&gt;structlog&lt;/code&gt; for production request tracing. On my &lt;a href="https://ankitjang.one/projects/portfolio" rel="noopener noreferrer"&gt;portfolio's AI chatbot&lt;/a&gt;, the Cloudflare Worker proxy handles error states and I track response latency through server logs.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ankitjang.one/projects/healthlab" rel="noopener noreferrer"&gt;HealthLab&lt;/a&gt; platform uses health check endpoints that verify database connectivity — simple but catches the most common production failure.&lt;/p&gt;

&lt;p&gt;The tools change between local and production, but the principle stays: find where time goes, fix the biggest bottleneck, verify the improvement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Django Debug Toolbar (SQL panel)&lt;/td&gt;
&lt;td&gt;Shows all queries per request with timing and stack traces&lt;/td&gt;
&lt;td&gt;First check on any slow endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Django Debug Toolbar (Profiling panel)&lt;/td&gt;
&lt;td&gt;Call tree with cumulative time per function&lt;/td&gt;
&lt;td&gt;When query count is fine but request is slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cProfile + snakeviz&lt;/td&gt;
&lt;td&gt;Python profiler with visual flame graph&lt;/td&gt;
&lt;td&gt;Management commands, Celery tasks, isolated functions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL execution plan with actual timings&lt;/td&gt;
&lt;td&gt;When a specific query is the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QueryCountMiddleware&lt;/td&gt;
&lt;td&gt;Logs query count per request in staging&lt;/td&gt;
&lt;td&gt;Catching N+1 regressions before they hit production&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
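
&lt;p&gt;The table mentions &lt;code&gt;QueryCountMiddleware&lt;/code&gt; without showing it. The core idea is framework-agnostic: count queries around a unit of work and fail loudly over a budget. A hypothetical stdlib-only sketch using &lt;code&gt;sqlite3&lt;/code&gt; (a Django version would compare &lt;code&gt;len(connection.queries)&lt;/code&gt; before and after the view instead):&lt;br&gt;
&lt;/p&gt;

```python
# Framework-agnostic sketch of the QueryCountMiddleware idea: count queries
# around a unit of work and raise if a budget is exceeded.
import sqlite3
from contextlib import contextmanager

class CountingConnection:
    """Wraps a sqlite3 connection and counts execute() calls."""
    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self._conn.execute(sql, params)

@contextmanager
def query_budget(conn, limit):
    """Raise if the wrapped block exceeds its query budget."""
    start = conn.query_count
    yield
    used = conn.query_count - start
    if used > limit:
        raise AssertionError(f'{used} queries, budget was {limit}')

conn = CountingConnection(sqlite3.connect(':memory:'))
conn.execute('CREATE TABLE t (id INTEGER)')
with query_budget(conn, limit=5):
    for i in range(3):
        conn.execute('INSERT INTO t VALUES (?)', (i,))
print(conn.query_count)  # 4: the CREATE plus three INSERTs
```

&lt;p&gt;In staging you'd log the count per request rather than raise, which is enough to catch an N+1 regression before it reaches production.&lt;/p&gt;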




&lt;p&gt;All my projects — including architecture diagrams, tradeoff analysis, and failure mode documentation — are at &lt;a href="https://ankitjang.one/projects" rel="noopener noreferrer"&gt;ankitjang.one/projects&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About me&lt;/strong&gt;: I'm &lt;a href="https://ankitjang.one" rel="noopener noreferrer"&gt;Ankit Jangwan&lt;/a&gt;, a Senior Software Engineer building backend systems with Django, PostgreSQL, Celery, and Go. See my case studies at &lt;a href="https://ankitjang.one/case-studies" rel="noopener noreferrer"&gt;ankitjang.one/case-studies&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>django</category>
      <category>performance</category>
      <category>database</category>
      <category>profiling</category>
    </item>
  </channel>
</rss>
