<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tomislav Maricevic</title>
    <description>The latest articles on Forem by Tomislav Maricevic (@tmarice).</description>
    <link>https://forem.com/tmarice</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F199683%2F69f764bc-a0a0-4318-8a6e-f015d2630af6.jpeg</url>
      <title>Forem: Tomislav Maricevic</title>
      <link>https://forem.com/tmarice</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tmarice"/>
    <language>en</language>
    <item>
      <title>The Illusion of Simplicity</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Tue, 25 Nov 2025 16:07:56 +0000</pubDate>
      <link>https://forem.com/tmarice/the-illusion-of-simplicity-3kc7</link>
      <guid>https://forem.com/tmarice/the-illusion-of-simplicity-3kc7</guid>
      <description>&lt;p&gt;I really like Django. I would pick Django over any other option for setting up a website regardless of expected complexity. In my opinion, if you fully embrace Django, it will allow you to focus on the product and not fight an uphill battle against the computer.&lt;/p&gt;

&lt;p&gt;I do a bit of freelancing on the side, and it saddens me that I rarely see Django projects in the wild. When I join a project and I'm lucky enough that it's a Python gig, it's usually Flask or FastAPI.&lt;/p&gt;

&lt;p&gt;When I ask why, it's usually something along the lines of: "Oh, we don't need Django, it's too complex. We just need a simple API".&lt;/p&gt;

&lt;p&gt;Yet they need database access, and ORMs are nice, so they bring in SQLAlchemy. And they need user authentication, so they roll their own roles and permissions. And they need JWTs because the frontend is a React app with its own stack. And they need caching, so they roll their own. And they need request validation and OpenAPI JavaScript client generation, so they bring in Pydantic. And of course they need horizontal scalability, so they deploy everything on Kubernetes. And then: "We don't need Celery, it's too complex", so they add APScheduler. And then it turns out they do need simple workflows and CPU-heavy processing, so they roll their own background task manager.&lt;/p&gt;

&lt;p&gt;And here I am, looking at this amalgamation of bytes created in the name of simplicity, thinking: "What a poor reimplementation of Django."&lt;/p&gt;

</description>
      <category>django</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Vim mandates &gt;&gt; AI mandates</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Tue, 16 Sep 2025 09:15:03 +0000</pubDate>
      <link>https://forem.com/tmarice/vim-mandates-ai-mandates-4l98</link>
      <guid>https://forem.com/tmarice/vim-mandates-ai-mandates-4l98</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://x.com/tobi/status/1909251946235437514" rel="noopener noreferrer"&gt;The internet&lt;/a&gt; &lt;a href="https://www.linkedin.com/posts/duolingo_below-is-an-all-hands-email-from-our-activity-7322560534824865792-l9vh/" rel="noopener noreferrer"&gt;is full&lt;/a&gt; &lt;a href="https://x.com/michakaufman/status/1909610844008161380" rel="noopener noreferrer"&gt;of articles&lt;/a&gt; &lt;a href="https://www.bloomberg.com/news/videos/2024-12-12/klarna-ceo-on-us-banking-ambitions-video" rel="noopener noreferrer"&gt;on CEOs&lt;/a&gt; declaring their companies "AI-first", in the name of increasing efficiency. After all, why have 50 engineers if you can have 5 managing a swarm of AI agents?&lt;/p&gt;

&lt;p&gt;I am a heavy user of GenAI assistive tools and they truly do help me achieve results faster. But another thing that helps you achieve results faster is not having to fight an uphill battle against whichever text editor you're using. If you have to lift your hands from the keyboard, you're wasting time.&lt;/p&gt;

&lt;p&gt;Yet I have never heard of any company issuing a "Vim keybindings mandate" and declaring itself "modal-editing first", even though these tools are over 30 years old.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is programming?
&lt;/h2&gt;

&lt;p&gt;If we squint hard enough, defining programming is very simple -- we solve problems by translating abstract processes into textual artifacts we feed into our magic-rune-inscribed melted-sand tablets so they can understand and execute them.&lt;/p&gt;

&lt;p&gt;This process is not linear: we do not first come up with a complete solution in our head, and then type it out, then run it, and call it a day. We iteratively think about the problem, type in some of the code, then think about it some more, then maybe look up something in the documentation, then type some more, etc.&lt;/p&gt;

&lt;p&gt;Our brains are quite efficient at thinking, so most of the friction happens in other steps: typing text and looking things up. For now, the keyboard is still the best tool we have for this. From seasoned engineers to vibe coders, everyone has to move the text from their heads to the computer by typing out code (or prompts!).&lt;/p&gt;

&lt;p&gt;If every time you notice a typo you have to hunt for the mouse, or smash those arrow keys for 10 seconds, you risk losing your train of thought and falling out of the flow state. The more you fight with inputting text, the less you focus on the problem itself.&lt;/p&gt;

&lt;p&gt;Our duty as software engineering professionals is to reduce this friction as much as possible. We learn the ins and outs of programming languages so we can think in appropriate abstractions and avoid frequent lookups. We learn multiple languages so we can deliver efficient solutions without performing acrobatics in languages not appropriate for the task. We utilize GenAI tools to handle boilerplate and cruft. And we optimize our text editors so we can input text as fast as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Vim?
&lt;/h2&gt;

&lt;p&gt;Vim tries to reduce the friction in text editing as much as possible. Your hands stay on the keyboard on the home row all the time. The editor comes with a rich set of built-in shortcuts encompassing every text manipulation you can imagine. For specialized tasks, there are plugins that address them, and you can define your own custom shortcuts for your own workflow needs.&lt;/p&gt;

&lt;p&gt;Building your own configuration is a crucial part of the process, where you get acquainted with the editor and the ecosystem. Realistically, the migration to Vim will be a J-curve: at first you will be less productive, after a week or two you'll be back where you were before, and only if you push through for a couple more weeks will you start seeing gains in productivity.&lt;/p&gt;

&lt;p&gt;The worst and the best part is that if you embrace Vim, it will ruin all other software for you. You will start looking at software through the lens of keyboard-only usability. If you have to reach for the mouse, or if it has its own silly shortcuts, you won't use it. And it really is a slippery slope: Vim is just a gateway drug, leading to tiling window managers and terminal multiplexers, and before you know it, you'll be using Nix and wondering how on Earth you managed to get anything done before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does this leave us?
&lt;/h2&gt;

&lt;p&gt;The "AI mandates" bullshit is purely performative, and the sad thing is everyone knows this -- the CEOs know it, the employees know it, the rest of us observing from the sidelines know it. LLMs truly are wonderful technology, and the productivity gains are real, but if the true goal is productivity, there are already many, many ways to improve it, and no one wrote memos about those. The truly efficient companies are staffed with conscientious engineers who do not have to be mandated to use the best tools available; they seek them out themselves.&lt;/p&gt;

&lt;p&gt;Time is the only real currency in this world, and you're leaving money on the table if you're not using Vim. After all, Vim won't take your job. But someone using Vim will.&lt;/p&gt;

</description>
      <category>vim</category>
    </item>
    <item>
      <title>Configuring PostgreSQL server parameters on GitHub Actions</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Thu, 28 Aug 2025 09:01:51 +0000</pubDate>
      <link>https://forem.com/tmarice/configuring-postgresql-server-parameters-on-github-actions-49p3</link>
      <guid>https://forem.com/tmarice/configuring-postgresql-server-parameters-on-github-actions-49p3</guid>
      <description>&lt;p&gt;If your project is not fully containerized, but you still want to use PostgreSQL in your GitHub Actions workflow, you can use the &lt;a href="https://docs.github.com/en/actions/tutorials/use-containerized-services" rel="noopener noreferrer"&gt;&lt;code&gt;services&lt;/code&gt;&lt;/a&gt; feature of GitHub Actions to easily spin up a PostgreSQL container.&lt;/p&gt;

&lt;p&gt;However, the &lt;code&gt;services&lt;/code&gt; functionality restricts what you can configure declaratively in the workflow file -- namely you cannot configure the PostgreSQL server parameters that you would usually set up in the &lt;code&gt;postgresql.conf&lt;/code&gt; file. Fortunately, most of these can be configured through &lt;code&gt;ALTER SYSTEM&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;For example, this is how to configure &lt;code&gt;max_locks_per_transaction&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:17&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
          &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
          &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;5432:5432&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
          &lt;span class="s"&gt;--name pg_container&lt;/span&gt;
          &lt;span class="s"&gt;--health-cmd pg_isready&lt;/span&gt;
          &lt;span class="s"&gt;--health-interval 10s&lt;/span&gt;
          &lt;span class="s"&gt;--health-timeout 5s&lt;/span&gt;
          &lt;span class="s"&gt;--health-retries 5&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up environment&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;sudo apt-get update&lt;/span&gt;
          &lt;span class="s"&gt;sudo apt-get install -y \&lt;/span&gt;
            &lt;span class="s"&gt;postgresql-client \&lt;/span&gt;
            &lt;span class="s"&gt;wait-for-it&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure PostgreSQL&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;PGUSER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
          &lt;span class="na"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
          &lt;span class="na"&gt;PGHOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;PGPORT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
          &lt;span class="na"&gt;PGDATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;template1&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;psql -c "SHOW max_locks_per_transaction;"&lt;/span&gt;
          &lt;span class="s"&gt;psql -c "ALTER SYSTEM set max_locks_per_transaction = 128;"&lt;/span&gt;
          &lt;span class="s"&gt;docker restart pg_container&lt;/span&gt;
          &lt;span class="s"&gt;wait-for-it localhost:5432 --timeout=30 --strict -- echo "PostgreSQL is up"&lt;/span&gt;
          &lt;span class="s"&gt;psql -c "SHOW max_locks_per_transaction;"&lt;/span&gt;

&lt;span class="c1"&gt;# ... the rest of the workflow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, if you need heavy customization, it makes more sense to skip &lt;code&gt;services&lt;/code&gt; and run your own container through a &lt;code&gt;docker run&lt;/code&gt; step or docker-compose, but for simple use cases, this is a quick way to get started.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>githubactions</category>
      <category>cicd</category>
    </item>
    <item>
      <title>On Usable Documentation</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Sun, 30 Mar 2025 05:53:21 +0000</pubDate>
      <link>https://forem.com/tmarice/on-usable-documentation-ea1</link>
      <guid>https://forem.com/tmarice/on-usable-documentation-ea1</guid>
      <description>&lt;p&gt;Having no documentation is often less harmful than having inaccurate documentation.&lt;/p&gt;

&lt;p&gt;Like code, documentation degrades over time. What was once accurate may now be obsolete. And practices we once ignored might now be part of our daily workflow. Unless maintaining documentation is an intentional process, it will rot, maybe beyond saving.&lt;/p&gt;

&lt;p&gt;In this article I'll outline a few guidelines that worked well for me in the past for keeping the developer documentation usable. They're certainly not universal, but they are a good starting point for a small team of experienced developers. We'll skip the philosophical debates and focus on real-world practices that work for small, fast-moving engineering teams. The aim is to get the most value with the least effort. Documentation doesn't pay the bills.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. Keep Documentation in the Codebase
&lt;/h2&gt;

&lt;p&gt;If your documentation is for developers, the natural thing to do is to keep the documentation close to the code. Create a &lt;code&gt;docs/&lt;/code&gt; folder in the project root and add some markdown files to it. Everyone already has a documentation viewer -- their code editor. As a cherry on top, GitHub renders markdown files in the browser so you can visually browse the documentation. Add a README.md with a short description to every subfolder to make it easier to navigate on the web.&lt;/p&gt;

&lt;p&gt;Greppability is a huge plus -- searching for a class name will return both code and documentation results.&lt;/p&gt;

&lt;p&gt;I usually read the raw markdown files because most of our articles are text-only, but in case of multimedia, I can view it either using my editor's markdown preview, or navigate to GitHub.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Flat is generally better
&lt;/h2&gt;

&lt;p&gt;Engineers tend to value neatness. It's very tempting to start developing a complex hierarchy of folders, enumerating all possible domains, adding placeholder folders and articles. The thing is, documentation is meant to be read. If the structure is too complex, the information becomes fragmented and hard to find, even though it's seemingly well-organized.&lt;/p&gt;

&lt;p&gt;When organizing the documentation, ask yourself how you would find this if production was down and finding this information was the only thing that could help.&lt;/p&gt;

&lt;p&gt;Another benefit of flatter organization is surfacing "unknown unknowns" — useful insights you wouldn’t have searched for but found by proximity to other information.&lt;/p&gt;

&lt;p&gt;Not so good structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/docs
    /architecture
        /infrastructure
            servers.md
            linux.md
            dns.md
            cdn.md
        /backend
            django.md
            celery.md
        /frontend
            components.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/docs
    README.md
    local_setup.md
    infrastructure.md
    deployment.md
    troubleshooting.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Keep It Company-Specific
&lt;/h2&gt;

&lt;p&gt;Your documentation should contain company-specific notes rather than general explanations of engineering concepts. If you have a special way of handling CORS, write about your specifics; do not explain what CORS is. There are plenty of articles on that written by people who know more about it than you.&lt;/p&gt;

&lt;p&gt;Not so good article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CORS stands for Cross-Origin Resource Sharing.
It is a mechanism that allows ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CORS is handled by `django-cors-headers`.
See `settings.py` for list of allowed origins.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it's really important to understand the concept well, link to an external source, but keep in mind that links rot as well, and you will need to update them periodically. Prefer well-established sources like MDN, Django documentation, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Someone Has To Be the Documentation Police
&lt;/h2&gt;

&lt;p&gt;Good documentation is a process, not a one-off task — and every process needs an owner.&lt;/p&gt;

&lt;p&gt;Even in small teams, there are people with different priorities. If we're moving fast, it's OK to skip writing documentation for a while, but someone needs to keep the documentation debt on their mind, and schedule updates, deprecations and additions. This can be anyone who is willing. Tenured engineers are probably a better choice than engineering managers who are more removed from day to day operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Continual Improvement
&lt;/h2&gt;

&lt;p&gt;The best test of your documentation is a new person joining the team. They come from a different way of doing things, and haven't yet become accustomed to "our way". Their first months in the company should be used to re-evaluate the usefulness and correctness of the documentation.&lt;/p&gt;

&lt;p&gt;They might suggest new additions because we're taking some knowledge as a given. They will quickly catch things that are no longer true because they will be following the steps in the documentation to the letter. It would be a shame to not use this opportunity for improvement.&lt;/p&gt;

&lt;p&gt;It's an especially good idea to make documentation review a part of the onboarding process: tell the new hires it's one of their first duties to review the documentation and suggest improvements. Make sure to follow up on their suggestions, providing feedback on acceptance or rejection, and let them implement the changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Periodic Revision
&lt;/h2&gt;

&lt;p&gt;This is closely related to the previous point, but still distinct. You will probably not reorganize the documentation because one person found it unclear. But if it's a pattern, schedule time for it on the roadmap and approach it seriously.&lt;/p&gt;

&lt;p&gt;A 6- or 12-month cadence seems reasonable. Dedicate some time to gather feedback. Take a high-level overview:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which articles are rarely used?&lt;/li&gt;
&lt;li&gt;What's outdated?&lt;/li&gt;
&lt;li&gt;Are any links broken?&lt;/li&gt;
&lt;li&gt;Are any articles too long or unfocused?&lt;/li&gt;
&lt;li&gt;Are we missing any key topics?&lt;/li&gt;
&lt;/ul&gt;
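
&lt;p&gt;The broken-links question is the easiest one to automate. Below is a minimal sketch, assuming the documentation lives in a folder of markdown files and uses ordinary inline &lt;code&gt;[text](target)&lt;/code&gt; links. The function name &lt;code&gt;broken_relative_links&lt;/code&gt; is made up for this example, and external URLs are deliberately skipped rather than fetched.&lt;/p&gt;

```python
import re
from pathlib import Path

# Matches inline markdown links and captures the target: [text](target)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything that starts with a URL scheme such as https: or mailto:
SCHEME_PATTERN = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_relative_links(docs_root):
    """Return (file, target) pairs for relative links whose target is missing.

    Hypothetical helper: only checks links between files on disk,
    it does not fetch external URLs or validate in-page anchors.
    """
    root = Path(docs_root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            if SCHEME_PATTERN.match(target) or target.startswith("#"):
                continue  # external URL or in-page anchor, out of scope here
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken


if __name__ == "__main__":
    for source, target in broken_relative_links("docs"):
        print(f"{source}: broken link to {target}")
```

&lt;p&gt;Run from the project root, this prints one line per dead link, which makes it a cheap thing to do at the start of each revision.&lt;/p&gt;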

&lt;p&gt;Asking publicly in Slack what people think of the current state of the documentation is a good start.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Make Documentation a Habit
&lt;/h2&gt;

&lt;p&gt;Unused documentation is a sign of bad documentation.&lt;/p&gt;

&lt;p&gt;Propagate the documentation. Link to articles instead of providing the answer directly; this increases the chance of discovering unknown unknowns. If the answer is not in the documentation, then write it down. Encourage your colleagues to do the same. Build a culture of writing documentation.&lt;/p&gt;

&lt;p&gt;When done right, usable documentation is a force multiplier that makes onboarding faster, reduces support noise, and helps everyone move faster with confidence.&lt;/p&gt;

</description>
      <category>documentation</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Handling CSRF Login Errors Gracefully in Django</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Sat, 22 Mar 2025 12:37:28 +0000</pubDate>
      <link>https://forem.com/tmarice/handling-csrf-login-errors-gracefully-in-django-1chn</link>
      <guid>https://forem.com/tmarice/handling-csrf-login-errors-gracefully-in-django-1chn</guid>
      <description>&lt;h1&gt;
  
  
  What's CSRF?
&lt;/h1&gt;

&lt;p&gt;Cross-site request forgery (CSRF) is a type of attack where a malicious website tricks a user into performing actions on another site where they're authenticated. This is usually done by embedding a form in the malicious site, and submitting it to the target site.&lt;/p&gt;

&lt;p&gt;An example of this would be a card game website where, when you hit the "Play" button, it sends a POST request to another site with the payload to change your login email address to the attacker's. Since you're logged in to the target site, the request goes through and you lose access to your account.&lt;/p&gt;

&lt;h1&gt;
  
  
  How does it work in Django?
&lt;/h1&gt;

&lt;p&gt;By default, Django serves you a cookie with the CSRF token on the first request. This token (in a masked form) is embedded in every form that Django generates, and is unique to the user and the session.&lt;/p&gt;

&lt;p&gt;The form token is checked on every unsafe request (POST, PUT, DELETE, PATCH). If the token is missing, invalid, or does not match the token in the cookie, the server responds with a 403 Forbidden response.&lt;/p&gt;

&lt;p&gt;This way Django ensures that the request is coming from the site itself, and not from a malicious third party, since no other server can generate valid CSRF tokens.&lt;/p&gt;
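
&lt;p&gt;To make the "masked form" idea concrete, here is a minimal Python sketch of salting a secret so that every rendered form carries a different-looking token that still decodes to the same cookie value. It illustrates the general technique and is not Django's source code; the names &lt;code&gt;mask&lt;/code&gt;, &lt;code&gt;unmask&lt;/code&gt; and &lt;code&gt;tokens_match&lt;/code&gt;, the alphabet and the 32-character length are assumptions made for this example.&lt;/p&gt;

```python
import secrets
import string

# Assumed parameters for this sketch
ALPHABET = string.ascii_letters + string.digits
SECRET_LENGTH = 32


def new_secret():
    """Generate a random secret, as would be stored in the CSRF cookie."""
    return "".join(secrets.choice(ALPHABET) for _ in range(SECRET_LENGTH))


def mask(secret):
    """Return salt + cipher; a fresh salt makes every masked token differ."""
    salt = new_secret()
    cipher = "".join(
        ALPHABET[(ALPHABET.index(s) + ALPHABET.index(m)) % len(ALPHABET)]
        for s, m in zip(secret, salt)
    )
    return salt + cipher


def unmask(token):
    """Split the token into salt and cipher and recover the secret."""
    salt, cipher = token[:SECRET_LENGTH], token[SECRET_LENGTH:]
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(m)) % len(ALPHABET)]
        for c, m in zip(cipher, salt)
    )


def tokens_match(form_token, cookie_secret):
    """Constant-time comparison of the unmasked form token and the cookie."""
    return secrets.compare_digest(unmask(form_token), cookie_secret)


if __name__ == "__main__":
    secret = new_secret()
    # Two forms rendered for the same cookie both validate against it
    print(tokens_match(mask(secret), secret), tokens_match(mask(secret), secret))
```

&lt;p&gt;The point of the salt is that the value in the page changes on every render while the secret in the cookie stays the same, so the check still passes for any form rendered against the current cookie.&lt;/p&gt;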

&lt;h1&gt;
  
  
  The problem
&lt;/h1&gt;

&lt;p&gt;The scenario is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You open a website in one tab&lt;/li&gt;
&lt;li&gt;You open the same website in another tab&lt;/li&gt;
&lt;li&gt;You log in in the second tab, and start using the website&lt;/li&gt;
&lt;li&gt;You go back to the first tab, and try to do something that requires a POST request (like submitting a form)&lt;/li&gt;
&lt;li&gt;You get a 403 Forbidden CSRF Error response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For security reasons, Django cycles CSRF tokens on every login. This means that the token embedded in the form in the first tab is now invalid since it was generated before your login in the second tab.&lt;/p&gt;
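
&lt;p&gt;The failure is easy to reproduce with a toy model. The sketch below is not Django code: &lt;code&gt;BrowserSession&lt;/code&gt; is a hypothetical class that stands in for one browser's cookie jar shared by both tabs, and masking is left out to keep it short.&lt;/p&gt;

```python
import secrets


class BrowserSession:
    """Toy stand-in for one browser: every tab shares the same CSRF cookie."""

    def __init__(self):
        self.csrf_cookie = secrets.token_urlsafe(32)

    def render_form(self):
        # The token embedded in a form is whatever the cookie held at render time
        return self.csrf_cookie

    def login(self):
        # A successful login rotates the token, replacing the cookie
        self.csrf_cookie = secrets.token_urlsafe(32)

    def post(self, form_token):
        # Unsafe requests are accepted only if the form token matches the cookie
        if secrets.compare_digest(form_token, self.csrf_cookie):
            return 200
        return 403


if __name__ == "__main__":
    browser = BrowserSession()
    first_tab_token = browser.render_form()   # steps 1 and 2: page opened
    browser.login()                            # step 3: login in the second tab
    print(browser.post(first_tab_token))       # steps 4 and 5: stale token rejected
    print(browser.post(browser.render_form())) # reloading the page fixes it
```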

&lt;p&gt;Django, being the best web framework out there, even warns you about this if you have &lt;code&gt;DEBUG = True&lt;/code&gt; and you get a CSRF failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rfw6frrpx9ntoo42pgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rfw6frrpx9ntoo42pgh.png" alt="Image description" width="800" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since this can happen to regular users, it's not just a security problem, but also a UX problem. The users are most likely to encounter it on the login page because it is one of the few public forms every site has, and a successful login cycles the token.&lt;/p&gt;

&lt;h1&gt;
  
  
  Solution #1: Pure Django solution
&lt;/h1&gt;

&lt;p&gt;Django allows setting a custom CSRF failure handler view via the &lt;code&gt;CSRF_FAILURE_VIEW&lt;/code&gt; setting. For a seamless UX, in case this happens on the login view, you could redirect the user back to the referrer page. Since they're already logged in, they will be able to access it.&lt;/p&gt;

&lt;p&gt;As a bonus, let's add a nice template for the CSRF failure view that explains what happened and offers a button to go back to the previous page.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# settings.py
&lt;/span&gt;&lt;span class="n"&gt;CSRF_FAILURE_VIEW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;myapp.views.csrf_failure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# views.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HTTPStatus&lt;/span&gt;

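&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.shortcuts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;render&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.urls&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;resolve&lt;/span&gt;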

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;csrf_failure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;referer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;META&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP_REFERER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;url_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;referer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csrf_failure.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;referer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;referer&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HTTPStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FORBIDDEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
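
&lt;p&gt;One caveat: the &lt;code&gt;Referer&lt;/code&gt; header is supplied by the client, so redirecting to it blindly could send the user to another site. Django ships &lt;code&gt;url_has_allowed_host_and_scheme&lt;/code&gt; in &lt;code&gt;django.utils.http&lt;/code&gt; for this kind of check. The standard-library sketch below shows the idea in a simplified form; &lt;code&gt;safe_redirect_target&lt;/code&gt; and its parameters are hypothetical names, and it is not a replacement for the Django helper.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def safe_redirect_target(referer, allowed_hosts, fallback="/"):
    """Return referer if it points at one of our hosts, otherwise fallback.

    Simplified illustration: accepts absolute http(s) URLs on an allowed
    host and plain site-relative paths, and rejects everything else.
    """
    if "\\" in referer:
        return fallback  # some browsers treat backslashes like slashes
    try:
        parts = urlsplit(referer)
    except ValueError:
        return fallback  # malformed URL
    if parts.scheme in ("http", "https") and parts.netloc in allowed_hosts:
        return referer
    if not parts.scheme and not parts.netloc and referer.startswith("/"):
        return referer  # site-relative path such as /profile/
    return fallback


if __name__ == "__main__":
    hosts = {"example.com"}
    print(safe_redirect_target("https://example.com/dashboard/", hosts))
    print(safe_redirect_target("https://evil.example/phish", hosts))
```

&lt;p&gt;In the view above, the result of such a check would be passed to &lt;code&gt;redirect&lt;/code&gt; instead of the raw header value.&lt;/p&gt;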



&lt;h1&gt;
  
  
  Solution #2: JavaScript
&lt;/h1&gt;

&lt;p&gt;Another solution would be to use JavaScript to periodically check whether the CSRF cookie has changed since the initial page load, and warn the user if it has.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// csrf.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;COOKIE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;csrftoken&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getCookie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;=`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;checkCSRFChange&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getCookie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;COOKIE_NAME&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentToken&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;currentToken&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;initialToken&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Your session has changed or expired. Please reload the page to avoid losing changes.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initialToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getCookie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;COOKIE_NAME&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;checkCSRFChange&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>django</category>
      <category>csrf</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Better living through optimized Django</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Sat, 22 Mar 2025 12:32:08 +0000</pubDate>
      <link>https://forem.com/tmarice/better-living-through-optimized-django-59en</link>
      <guid>https://forem.com/tmarice/better-living-through-optimized-django-59en</guid>
      <description>&lt;p&gt;Every engineer that loves Django and has a blog has at least one of these posts.&lt;/p&gt;

&lt;p&gt;Django's ORM is excellent, but given enough time it's easy for approaches that weren't mistakes to grow into mistakes. This is a great thing, because it usually means your company didn't go bankrupt, you're still here and can fix things, and the company is doing well because the scale increased (hopefully your compensation did as well).&lt;/p&gt;

&lt;p&gt;This is a recap of my recent experience optimizing Celery tasks that started out as non-problematic but, with the passage of time, became problematic, causing server and database stability issues.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;code&gt;prefetch_related&lt;/code&gt; + &lt;code&gt;iterator()&lt;/code&gt; problem
&lt;/h1&gt;

&lt;p&gt;Up until Django 4.1, calling &lt;code&gt;iterator()&lt;/code&gt; on a queryset with &lt;code&gt;prefetch_related()&lt;/code&gt; caused the prefetched data to be dropped, causing the N+1 queries problem.&lt;/p&gt;

&lt;p&gt;Since Django 4.1, &lt;code&gt;iterator()&lt;/code&gt; honors &lt;code&gt;prefetch_related()&lt;/code&gt; as long as you pass the &lt;code&gt;chunk_size&lt;/code&gt; argument, which allows us to get the best of both worlds -- avoid pulling the entire dataset into memory while avoiding the N+1 queries problem. Or at least turn the N+1 into N / chunk_size + 1, which is considerably better. &lt;br&gt;
But &lt;code&gt;chunk_size&lt;/code&gt;'s default value is None, which in Django 4.1 and 4.2 reverts to the old behavior and discards the prefetched data (with only a deprecation warning); since Django 5.0 the same call raises an error instead. &lt;/p&gt;
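&lt;p&gt;To make that arithmetic concrete, here is a rough sketch in plain Python (no Django required) that estimates the query counts. The function names and the "one prefetch query per lookup per chunk" model are simplifying assumptions for illustration; real counts depend on the lookups involved.&lt;/p&gt;

```python
import math


def queries_without_prefetch(n_rows):
    # One query for the rows, plus one per row for its related objects (N+1).
    return 1 + n_rows


def queries_with_chunked_prefetch(n_rows, chunk_size, n_lookups=1):
    # One query for the rows, plus one prefetch query per lookup for every
    # chunk of rows. Assumed model: nested lookups would add more queries.
    n_chunks = math.ceil(n_rows / chunk_size)
    return 1 + n_chunks * n_lookups
```

&lt;p&gt;Under these assumptions, 100,000 rows with a chunk size of 2,000 and a single prefetch lookup come out at 51 queries instead of 100,001.&lt;/p&gt;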

&lt;p&gt;The pseudocode from the problematic task looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;profiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;community&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;community&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;select_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# ...a bunch of joins here&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;prefetch_related&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user__groups&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="c1"&gt;# ...a bunch of other m2ms&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterator&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;group_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="n"&gt;primary_group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;determine_primary_group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;denorm_profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;primary_group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;primary_group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;group_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;denorm_profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This task took &lt;em&gt;hours&lt;/em&gt;, and used up a lot of database resources. When it was originally written, there were not that many profiles and there were not many other tasks demanding database resources so it worked well. But, we were lucky: people registered in increasing numbers, other business processes took their own chunk of database's resources, and this really became a bottleneck.&lt;/p&gt;

&lt;p&gt;My first optimization attempt was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Increase the batch size to 10k, to reduce the number of &lt;code&gt;bulk_create()&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;only()&lt;/code&gt; on the Profile queryset to avoid fetching unnecessary data&lt;/li&gt;
&lt;li&gt;Since I believed &lt;code&gt;only()&lt;/code&gt; reduced the memory requirements enough, I dropped the &lt;code&gt;iterator()&lt;/code&gt; to enable &lt;code&gt;prefetch_related()&lt;/code&gt; to do its thing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This optimization turned out to be ... less than optimal.&lt;/p&gt;

&lt;p&gt;One week and one server and database outage later, I was forced to revisit my optimization.&lt;/p&gt;

&lt;p&gt;My first failing was in not examining the entire context in which this code is run. It's part of a Celery task executed by a Celery worker with &lt;code&gt;--concurrency=4&lt;/code&gt;, meaning that it's possible that we try to refresh 4 big communities at the same time. &lt;/p&gt;

&lt;p&gt;Second, I failed to account for some communities having hundreds of thousands of profiles. Removing the &lt;code&gt;iterator()&lt;/code&gt; call means all of these profiles are loaded into memory at once.&lt;/p&gt;

&lt;p&gt;Third, I underestimated the difference in memory consumption between Python model instances (which are still constructed when you use &lt;code&gt;only()&lt;/code&gt;) and Python built-in types.&lt;/p&gt;
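&lt;p&gt;A quick way to get a feel for that difference is to compare shallow sizes in plain Python. This is only a sketch: &lt;code&gt;ProfileLike&lt;/code&gt; is a hypothetical stand-in, not a Django model, and a real model instance typically carries extra bookkeeping (such as its &lt;code&gt;_state&lt;/code&gt; object), so the real gap is likely larger.&lt;/p&gt;

```python
import sys


class ProfileLike:
    # Hypothetical stand-in for a model instance: a plain object whose
    # attributes live in a per-instance __dict__.
    def __init__(self, user_id, community_id, name):
        self.user_id = user_id
        self.community_id = community_id
        self.name = name


def shallow_sizes():
    as_object = ProfileLike(1, 2, "alice")
    as_tuple = (1, 2, "alice")
    # getsizeof() is shallow, so count the instance plus its attribute dict.
    # The attribute values themselves are identical in both cases.
    object_bytes = sys.getsizeof(as_object) + sys.getsizeof(as_object.__dict__)
    tuple_bytes = sys.getsizeof(as_tuple)
    return object_bytes, tuple_bytes
```

&lt;p&gt;Calling &lt;code&gt;shallow_sizes()&lt;/code&gt; returns both byte counts; the exact numbers vary by Python version, but multiplied by hundreds of thousands of rows the per-row overhead adds up.&lt;/p&gt;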

&lt;p&gt;The second optimization attempt was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Recognizing that we don't really need model instances from the prefetched relations, we just need certain values -- we can get &lt;em&gt;much&lt;/em&gt; better performance by using PostgreSQL-specific &lt;code&gt;ArrayAgg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;only()&lt;/code&gt; only marginally reduced the memory footprint and since we don't actually need the Profile model instances, we can get a huge benefit from enumerating all required fields in a &lt;code&gt;values_list()&lt;/code&gt; call and avoid constructing the model instances completely&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final version looked something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;profiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;community&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;community&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;group_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ArrayAgg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user__groups__name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Q&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user__groups__isnull&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;distinct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# ...other values that can be aggregated as well&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;group_names&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ...all other fields that were necessary for DenormProfileRecord&lt;/span&gt;
    &lt;span class="n"&gt;named&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterator&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;primary_group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;determine_primary_group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;group_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;denorm_profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;primary_group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;primary_group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;group_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;denorm_profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;DenormProfileRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way we could keep the &lt;code&gt;iterator()&lt;/code&gt; call since there were no &lt;code&gt;prefetch_related()&lt;/code&gt; calls. The &lt;code&gt;values_list()&lt;/code&gt; optimization wasn't actually necessary because we only had a single row of Profile data in memory at the same time, but I kept it just in case.&lt;/p&gt;

&lt;p&gt;This reduced the memory strain on the server from "fills up the RAM and swap and causes OOM killer to go on a rampage" to "unnoticeable". The runtime dropped from several hours to ~30s.&lt;/p&gt;

&lt;h1&gt;
  
  
  Optimizing Redis access
&lt;/h1&gt;

&lt;p&gt;This one isn't really Django ORM related, but it was done in the same batch of optimizations so I'll touch on it.&lt;br&gt;
We utilize Redis to keep a sorted set of video name prefixes allowing live autocomplete while typing in the search box on the site.&lt;/p&gt;

&lt;p&gt;Populating this Redis sorted set is done on a daily basis: it's completely dropped and recreated from scratch. &lt;/p&gt;

&lt;p&gt;The function looks something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;

&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;autocomplete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;MAX_PREFIX_LENGTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_autocomplete_prefixes&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;REDIS_CACHE_LOCATION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;video_title&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;video_title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_PREFIX_LENGTH&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Every zadd() call is its own network round trip&lt;/span&gt;
            &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upon investigating the &lt;code&gt;redis&lt;/code&gt; library we use, it turned out that each &lt;code&gt;zadd()&lt;/code&gt; call is a single network request. As the number of videos grew, the number of network requests grew as well, until this task took about 15 minutes to complete.&lt;/p&gt;

&lt;p&gt;The optimization approach here was to collect all Redis updates in a single dictionary and push it in a single network call. This approach also allowed moving the delete call much closer to the single &lt;code&gt;zadd()&lt;/code&gt; call, reducing the time where the autocomplete prefixes were only partially available.&lt;/p&gt;
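&lt;p&gt;The collection step on its own can be sketched without Redis or the database. The helper name &lt;code&gt;build_prefixes&lt;/code&gt; is an assumption for illustration, and normalization is done in Python here only to keep the example self-contained.&lt;/p&gt;

```python
def build_prefixes(titles, max_prefix_length=8):
    # Map every prefix to score 0; the dict deduplicates shared prefixes.
    prefixes = {}
    for title in titles:
        normalized = title.strip().lower()
        # Stop at the title's own length so short titles add no repeats.
        upper = min(len(normalized), max_prefix_length)
        for i in range(1, upper + 1):
            prefixes[normalized[:i]] = 0
    return prefixes
```

&lt;p&gt;The resulting mapping is what gets handed to a single &lt;code&gt;zadd()&lt;/code&gt; call, so the number of round trips no longer depends on the number of videos.&lt;/p&gt;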

&lt;p&gt;One small database related improvement was pushing the &lt;code&gt;strip()&lt;/code&gt; and &lt;code&gt;lower()&lt;/code&gt; calls to the database utilizing the &lt;code&gt;Trim()&lt;/code&gt; and &lt;code&gt;Lower()&lt;/code&gt; database functions.&lt;/p&gt;

&lt;p&gt;The rewritten task looks something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.db.models.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Lower&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Trim&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;

&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;autocomplete&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;MAX_PREFIX_LENGTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_autocomplete_prefixes&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;prefixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="c1"&gt;# Titles are lowercased and trimmed by the database, not in Python&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;video_title&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;Video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prefixes&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_title&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_PREFIX_LENGTH&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;REDIS_CACHE_LOCATION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# One zadd() call with the whole mapping instead of one call per prefix&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AUTOCOMPLETE_REDIS_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prefixes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduced the runtime of the task from 15 minutes to ~5 seconds. &lt;/p&gt;

&lt;p&gt;I also entertained the idea of using &lt;code&gt;memoryview&lt;/code&gt;s to avoid constructing new string objects for each prefix, but there were risks associated with handling Unicode characters (which were present in the video titles, while memoryviews operate on bytes), and I wasn't really familiar with how the redis Python library handles the passed data (it could quite easily cast these memoryviews back to strings, annulling any gains). &lt;/p&gt;
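&lt;p&gt;The Unicode risk is easy to reproduce in isolation. The sketch below (the helper name is made up for illustration) slices the UTF-8 encoding of a title by bytes, as a memoryview would, and checks whether the slice still decodes as text.&lt;/p&gt;

```python
def byte_prefix_is_valid_text(title, n_bytes):
    # Slice the UTF-8 bytes the way a memoryview-based prefix would.
    view = memoryview(title.encode("utf-8"))
    try:
        bytes(view[:n_bytes]).decode("utf-8")
    except UnicodeDecodeError:
        # The cut landed in the middle of a multi-byte character.
        return False
    return True
```

&lt;p&gt;For an ASCII title every byte prefix is valid, but a title starting with a two-byte character such as "č" yields an undecodable one-byte prefix, so prefix lengths would have to be tracked in characters rather than bytes.&lt;/p&gt;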

&lt;h1&gt;
  
  
  Optimizing deletion of old records
&lt;/h1&gt;

&lt;p&gt;For debugging purposes, we retain a copy of each email sent to our users. Since we like to keep things simple, this data is kept within a table in PostgreSQL, and the old records are purged from the table daily. Retention policy is 2 weeks, so every day there is a Celery task that identifies old records and deletes them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dateutil.relativedelta&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;relativedelta&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remove_old_emails&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;old_mailer_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Mailer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sent__lte&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;relativedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weeks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;old_emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mailer_id__in&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;old_mailer_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# _raw_delete() is a private QuerySet method: no signals, no cascades&lt;/span&gt;
    &lt;span class="n"&gt;old_emails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_raw_delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_emails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This regularly took 20-30 minutes, even with the &lt;code&gt;_raw_delete&lt;/code&gt; optimization. The table is not humongous, so it definitely should not take that long.&lt;/p&gt;

&lt;p&gt;For historical reasons (when the table was humongous and a join would kill the database) the table doesn't have any foreign key constraints, and all other tables are referenced through soft foreign keys (e.g. &lt;code&gt;mailer_id&lt;/code&gt; is an integer column with an index, without a foreign key constraint). In the meantime we introduced a data retention policy to manage the table's size.&lt;/p&gt;

&lt;p&gt;The problematic part quickly emerged upon inspecting the SQL query: the list of old mailer IDs has 100k members, and is growing daily since it's a list of all mailers ever sent. This makes Postgres' life hard, and degrades every query to a full table sequential scan.&lt;/p&gt;

&lt;p&gt;The solution is clear: reduce the list of mailer IDs to something manageable. Since the task is run daily, it's safe to reduce the list to mailers that were sent between 2 weeks ago and 2 weeks and 1 day ago. We want to have some redundancy, so we widened that range to 2 weeks and 3 days ago, in case something prevents the task from running for a day or two.&lt;/p&gt;

&lt;p&gt;Postgres started using an index scan instead of a sequential scan, and things sped up drastically -- the runtime dropped from 20-30 minutes to 3-5 minutes.&lt;/p&gt;
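&lt;p&gt;The window itself is simple date arithmetic. Here is a minimal sketch using only the standard library; the constant and function names are assumptions for illustration, and the queryset filter in the trailing comment is roughly how the bounds would be applied, not the exact production code.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(weeks=2)
SAFETY_MARGIN = timedelta(days=3)  # assumed name; covers a few missed runs


def purge_window(now):
    # Only mailers sent inside this window are considered for purging.
    newest = now - RETENTION
    oldest = newest - SAFETY_MARGIN
    return oldest, newest


# Applied to the task above, the mailer filter would be roughly:
# Mailer.objects.filter(sent__gt=oldest, sent__lte=newest)
```

&lt;p&gt;Bounding the range on both sides keeps the ID list proportional to a few days of mailers instead of the entire sending history.&lt;/p&gt;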

&lt;h1&gt;
  
  
  Optimizing creation of new many-to-many records
&lt;/h1&gt;

&lt;p&gt;Using &lt;code&gt;Model.objects.create()&lt;/code&gt; in a for loop is a surefire way to degrade the performance of your database -- every &lt;code&gt;create()&lt;/code&gt; call is a network request to the database with an &lt;code&gt;INSERT&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;for_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;impression_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Impression&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;created__date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;for_date&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_type_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_impressions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content_impression&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;impression_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One of the basic ways to improve performance, if memory allows it, is to accumulate unsaved model instances in memory and then create them all at once using a &lt;code&gt;bulk_create()&lt;/code&gt; call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;for_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;impression_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Impression&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;created__date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;for_date&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_type_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_impressions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;daily_impressions_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content_impression&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;impression_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;daily_impressions_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daily_impressions_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
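
&lt;p&gt;If memory does not allow holding every instance at once, one possible variation is to flush the accumulated instances in fixed-size chunks. The sketch below is plain Python with a stand-in for the real &lt;code&gt;bulk_create()&lt;/code&gt; call, so it runs without Django; &lt;code&gt;chunked&lt;/code&gt; and &lt;code&gt;fake_bulk_create&lt;/code&gt; are hypothetical names used only for illustration:&lt;/p&gt;

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in for DailyImpressionStats.objects.bulk_create (an assumption made
# so the sketch is runnable on its own); it only records the batch sizes.
created_batches = []


def fake_bulk_create(objs):
    created_batches.append(len(objs))


# A generator, so only one chunk of rows is materialized at a time.
rows = ({"object_id": i} for i in range(2500))
for chunk in chunked(rows, 1000):
    fake_bulk_create(chunk)

print(created_batches)  # [1000, 1000, 500]
```

&lt;p&gt;Note that &lt;code&gt;bulk_create()&lt;/code&gt; also accepts a &lt;code&gt;batch_size&lt;/code&gt; argument, but that only limits how many objects go into a single query; the full list still has to exist in memory first.&lt;/p&gt;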



&lt;p&gt;This is a fairly common optimization, but here's a follow-up problem: what if we also have to set a many-to-many relationship on the model we want to bulk create? At first, it seems like we cannot use the &lt;code&gt;bulk_create()&lt;/code&gt; approach anymore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content_impression&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;impression_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;daily_impression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_tags_for_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_type_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
        &lt;span class="n"&gt;object_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;daily_impression&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ERROR: daily_impression has to be saved before we can set the M2M relationship
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Luckily, digging a bit deeper into how Django implements the many-to-many relationship offers an answer. When we define a many-to-many relationship between models, Django creates an intermediary table with an accompanying &lt;code&gt;through&lt;/code&gt; model accessible via &lt;code&gt;Model.m2m_field.through&lt;/code&gt;. This allows us to also accumulate the &lt;code&gt;through&lt;/code&gt; model instances in another list and bulk create them as well.&lt;/p&gt;

&lt;p&gt;If the primary key is an &lt;code&gt;AutoField&lt;/code&gt;, &lt;code&gt;bulk_create()&lt;/code&gt; sets the &lt;code&gt;id&lt;/code&gt; field on the model instances when running on PostgreSQL, MariaDB 10.5+ or SQLite 3.35+. Our many-to-many records reference these instances, so we can first bulk create the model instances, and then bulk create the many-to-many instances:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;for_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;impression_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Impression&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;created__date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;for_date&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_type_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;annotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_impressions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;daily_impressions_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;daily_impressions_tag_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;content_impression&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;impression_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;daily_impression&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;daily_impressions_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daily_impression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_tags_for_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_type_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;object_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content_impression&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;daily_impressions_tag_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;through&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;daily_impression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;daily_impression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daily_impressions_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DailyImpressionStats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;through&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daily_impressions_tag_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Epilogue
&lt;/h1&gt;

&lt;p&gt;It's easy to dismiss the problems with the original code snippets as "skill issues", and sometimes they really are. But we need to keep in mind that equally often code starts out as performant and ends up as a bottleneck. Conditions change, scale increases, the tech stack evolves. If you tried to do some of the described optimizations in the initial code push, I would probably be the first one to invoke YAGNI and ask for a simplification. Business first, tech second -- every minute you spend optimizing a query too early is a minute not spent on the product, and the business might not live to see the day your optimization pays off.&lt;/p&gt;

&lt;p&gt;It's not important to write the optimal code on the first go. It's important to be able to write it once it becomes problematic, and in the meantime, build things that matter.&lt;/p&gt;

</description>
      <category>django</category>
      <category>python</category>
      <category>postgres</category>
      <category>webdev</category>
    </item>
    <item>
      <title>On Python's @property decorator</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Mon, 30 Sep 2024 08:56:20 +0000</pubDate>
      <link>https://forem.com/tmarice/on-pythons-property-decorator-el5</link>
      <guid>https://forem.com/tmarice/on-pythons-property-decorator-el5</guid>
      <description>&lt;p&gt;The &lt;code&gt;@property&lt;/code&gt; decorator is an excellent way to reduce the readability of Python code. It obfuscates a perfectly good&lt;br&gt;
function call and tricks readers into thinking they're performing a regular attribute access or assignment.&lt;/p&gt;

&lt;p&gt;Unless there's a really good and explicit reason to do this, don't.&lt;/p&gt;
&lt;h2&gt;
  
  
  List of good and explicit reasons:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Refactoring&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's pretty much it.&lt;/p&gt;

&lt;p&gt;If something (rightfully so) started out as a simple attribute, but with time accrued some more&lt;br&gt;
complex logic, &lt;code&gt;@property&lt;/code&gt; is a good way to gracefully transition from attributes to function calls.&lt;/p&gt;
&lt;h3&gt;
  
  
  Version 1
&lt;/h3&gt;

&lt;p&gt;We start out with a simple attribute. You can get it, you can set it. As a consenting adult, you're free to do with it&lt;br&gt;
whatever you want.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Version 2:
&lt;/h3&gt;

&lt;p&gt;The project gains traction. You need to add two new features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Emit an event whenever the &lt;code&gt;Client.value&lt;/code&gt; attribute is accessed, so other parts of the code can listen to it and
do their own thing&lt;/li&gt;
&lt;li&gt;Validate values being assigned in one central place, to avoid littering the rest of your codebase with
error handling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because we're a &lt;a href="https://grugbrain.dev/" rel="noopener noreferrer"&gt;self-aware smol brain developer&lt;/a&gt;, we like plain old functions. We craft a plan&lt;br&gt;
to change the class interface to use getter/setter functions instead of direct attribute access. But since&lt;br&gt;
we're also responsible and respectful to our colleagues/clients, we don't just change the API abruptly. No, we will be&lt;br&gt;
emitting a deprecation warning for some time, and only introduce breaking changes in the API after we've given everyone&lt;br&gt;
ample time to migrate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# We add a private attribute to hold the value
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@property&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# We can now emit a deprecation warning on 
&lt;/span&gt;        &lt;span class="c1"&gt;# each access, urging our users to migrate to the new API
&lt;/span&gt;        &lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Client.value is deprecated, use Client.get_value() instead!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;DeprecationWarning&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# ... and offload the act of retrieving the value 
&lt;/span&gt;        &lt;span class="c1"&gt;# to the newly-introduced function
&lt;/span&gt;        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

    &lt;span class="nd"&gt;@value.setter&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Client.value is deprecated, use Client.set_value() instead!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;DeprecationWarning&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# We add getter/setter functions with the new logic
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_emit_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value_access&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_validate_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
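
&lt;p&gt;To see the transition period in action, here is a small self-contained sketch. It assumes a trivial validation rule and leaves out event emission entirely (both are placeholders, not part of any real API), and it passes &lt;code&gt;stacklevel=2&lt;/code&gt; so the warning points at the caller's line:&lt;/p&gt;

```python
import warnings


class Client:
    def __init__(self, value):
        self._value = None
        self.set_value(value)

    @property
    def value(self):
        warnings.warn(
            "Client.value is deprecated, use Client.get_value() instead!",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the calling code
        )
        return self.get_value()

    # The setter is registered on the property object named `value`.
    @value.setter
    def value(self, new_value):
        warnings.warn(
            "Client.value is deprecated, use Client.set_value() instead!",
            DeprecationWarning,
            stacklevel=2,
        )
        self.set_value(new_value)

    def get_value(self):
        return self._value

    def set_value(self, new_value):
        # Hypothetical validation rule, purely for illustration.
        if new_value is None:
            raise ValueError("value must not be None")
        self._value = new_value


client = Client(1)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    client.value = 2          # old API: still works, but warns
    old_style = client.value  # old API: still works, but warns

new_style = client.get_value()  # new API: no warning
print(old_style, new_style, len(caught))
```

&lt;p&gt;Both styles return the same value, and only the old attribute-style accesses show up in the captured warnings.&lt;/p&gt;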



&lt;h3&gt;
  
  
  Version 3:
&lt;/h3&gt;

&lt;p&gt;Time has passed, and people have migrated to the new API. We're ready to make our lives easier, and simplify the codebase&lt;br&gt;
by removing the dirty &lt;code&gt;@property&lt;/code&gt;. Life is good again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_emit_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value_access&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_validate_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Going a bit deeper
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;@property&lt;/code&gt; is an example of a descriptor. Descriptors are a neat Python construct that &lt;a href="https://docs.python.org/3/howto/descriptor.html" rel="noopener noreferrer"&gt;"lets objects customize attribute&lt;br&gt;
lookup, storage, and deletion"&lt;/a&gt;. Some of the nicer things in life I&lt;br&gt;
enjoy are made using descriptors, namely Django's ORM.&lt;/p&gt;
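
&lt;p&gt;For a feel of what a hand-written descriptor looks like, here is a minimal sketch. &lt;code&gt;Typed&lt;/code&gt; and &lt;code&gt;Account&lt;/code&gt; are made-up names for illustration; only the &lt;code&gt;__set_name__&lt;/code&gt;, &lt;code&gt;__get__&lt;/code&gt; and &lt;code&gt;__set__&lt;/code&gt; hooks come from the descriptor protocol itself:&lt;/p&gt;

```python
class Typed:
    """Data descriptor that only accepts values of one type (illustrative)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once at class creation; remember where to store the value.
        self.public_name = name
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}"
            )
        setattr(instance, self.storage_name, value)


class Account:
    owner = Typed(str)

    def __init__(self, owner):
        self.owner = owner  # goes through Typed.__set__


account = Account("ada")
print(account.owner)  # ada

try:
    account.owner = 42
except TypeError as exc:
    print(exc)  # owner must be str
```

&lt;p&gt;Every read and write of &lt;code&gt;account.owner&lt;/code&gt; looks like plain attribute access, yet runs a method behind the scenes, which is exactly the trade-off discussed here.&lt;/p&gt;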

&lt;p&gt;But just because you can doesn't mean you should. We always strive for the least complex option, and if you're certain&lt;br&gt;
descriptors will make everyone's (not just yours!) lives easier, then go for it. Most of the time, though, plain&lt;br&gt;
functions are the way to go.&lt;/p&gt;

&lt;p&gt;Stop worrying and learn to love the function call.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why I always assign intermediate values to local variables instead of passing them directly to function calls</title>
      <dc:creator>Tomislav Maricevic</dc:creator>
      <pubDate>Thu, 26 Sep 2024 15:53:08 +0000</pubDate>
      <link>https://forem.com/tmarice/why-i-always-assign-intermediate-values-to-local-variables-instead-of-passing-them-directly-to-function-calls-2kh7</link>
      <guid>https://forem.com/tmarice/why-i-always-assign-intermediate-values-to-local-variables-instead-of-passing-them-directly-to-function-calls-2kh7</guid>
      <description>&lt;p&gt;Instead of&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;res_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;do_something&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inter_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;inter_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;res_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inter_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inter_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first version is much shorter, and when formatted properly, equally readable. &lt;/p&gt;

&lt;p&gt;But the reason I prefer the second approach is that all intermediate steps are saved to local variables. &lt;/p&gt;

&lt;p&gt;Exception tracking tools like Sentry, and even Django's error page that pops up when DEBUG=True is set, capture the local context. On top of that, if you ever have to step through the function with a debugger, you can see the exact return value before stepping out from the function. This is the reason why I even save the final result in a local variable, just before returning it.&lt;/p&gt;
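
&lt;p&gt;A rough way to see what such tools get to work with is to read the frame locals off a traceback by hand. In this sketch &lt;code&gt;fn&lt;/code&gt; and &lt;code&gt;res_fn&lt;/code&gt; are stand-in implementations invented for the example, and real tools collect this data through their own machinery:&lt;/p&gt;

```python
def fn(*args):
    # Stand-in implementation, assumed only for this example.
    return sum(args)


def res_fn(x, y, z):
    # Stand-in implementation that fails when z is zero.
    return (x + y) / z


def do_something(a, b, c):
    inter_1 = fn(a, b)
    inter_2 = fn(b)

    result = res_fn(inter_1, inter_2, c)
    return result


captured = None
try:
    do_something(1, 2, 0)
except ZeroDivisionError as exc:
    # Walk the traceback and copy the locals of the do_something frame.
    tb = exc.__traceback__
    while tb is not None:
        if tb.tb_frame.f_code.co_name == "do_something":
            captured = dict(tb.tb_frame.f_locals)
        tb = tb.tb_next

print(captured)  # {'a': 1, 'b': 2, 'c': 0, 'inter_1': 3, 'inter_2': 2}
```

&lt;p&gt;With the nested-call version, the frame would only hold &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt; and &lt;code&gt;c&lt;/code&gt;, and the intermediate values would have to be recomputed by hand.&lt;/p&gt;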

&lt;p&gt;At the performance cost of a couple of extra variable assignments, and a couple of extra lines of code, this makes debugging much easier.&lt;/p&gt;

</description>
      <category>python</category>
      <category>django</category>
      <category>sentry</category>
    </item>
  </channel>
</rss>
