<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lawrence Cooke</title>
    <description>The latest articles on Forem by Lawrence Cooke (@mrpercival).</description>
    <link>https://forem.com/mrpercival</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1040000%2Fe996541f-637f-45ac-b003-b6dbabe9aad3.jpg</url>
      <title>Forem: Lawrence Cooke</title>
      <link>https://forem.com/mrpercival</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mrpercival"/>
    <language>en</language>
    <item>
      <title>Setting up Claude Code for success</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Fri, 27 Mar 2026 16:42:44 +0000</pubDate>
      <link>https://forem.com/mrpercival/setting-up-claude-code-for-success-4g73</link>
      <guid>https://forem.com/mrpercival/setting-up-claude-code-for-success-4g73</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn72yxkh7nt5sbr9t6a1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn72yxkh7nt5sbr9t6a1.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When first starting a new project using Claude Code, it is easy to jump ahead, diving straight into coding. However, if you spend a bit of time setting up Claude Code, the outcome will be a smoother and more enjoyable development experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating a CLAUDE.md file
&lt;/h2&gt;

&lt;p&gt;When first starting your project, spend time talking through the requirements of the project with Claude. Take your tech stack into account: what language are you writing it in? Are you using a framework?&lt;/p&gt;

&lt;p&gt;Discuss not only the tech stack, but also how you want the project built. How do you want the folder structure to be laid out? How do you want to interact with the database?&lt;/p&gt;

&lt;p&gt;Here is a general list of things you might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Programming Language (and version)&lt;/li&gt;
&lt;li&gt;Framework and routing approach&lt;/li&gt;
&lt;li&gt;Dependency injection patterns&lt;/li&gt;
&lt;li&gt;Database access layer (raw queries, query builder, ORM)&lt;/li&gt;
&lt;li&gt;Coding standard&lt;/li&gt;
&lt;li&gt;Folder and namespace conventions&lt;/li&gt;
&lt;li&gt;Error handling approach&lt;/li&gt;
&lt;li&gt;Unit testing approach&lt;/li&gt;
&lt;li&gt;Third-party packages you might include&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of this conversation is to produce a &lt;strong&gt;CLAUDE.md&lt;/strong&gt; file: a markdown document that lives in your project root and is automatically loaded into Claude's context at the start of every session. It's the source of truth that means you never have to re-explain your stack again.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a CLAUDE.md looks like
&lt;/h2&gt;

&lt;p&gt;Here’s an example CLAUDE.md file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Blueprint&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Language: PHP 8.3
&lt;span class="p"&gt;-&lt;/span&gt; Framework: FlightPHP (micro-framework)
&lt;span class="p"&gt;-&lt;/span&gt; Database: Postgres 18
&lt;span class="p"&gt;-&lt;/span&gt; Coding standard: PSR-12

&lt;span class="gu"&gt;## Database Access&lt;/span&gt;
Use FlightPHP's built-in PDO wrapper with prepared statements exclusively.
All queries live inside the business logic — never in controllers.

&lt;span class="gu"&gt;## Folder Structure&lt;/span&gt;
app/
  Controllers/ # Thin controllers, only reference Logic classes and UI
  Logic/       # Business logic
  config/      # Config files — DO NOT READ OR EDIT

&lt;span class="gu"&gt;## Code Style&lt;/span&gt;
Run PSR-12 checks via: vendor/bin/phpcs --standard=PSR12 app/
Auto-fix via:          vendor/bin/phpcbf --standard=PSR12 app/

&lt;span class="gu"&gt;## What NOT to do&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use static methods on service classes
&lt;span class="p"&gt;-&lt;/span&gt; Never put SQL in controllers
&lt;span class="p"&gt;-&lt;/span&gt; Never read from app/config/config.php
&lt;span class="p"&gt;-&lt;/span&gt; Never commit directly — all git operations are manual
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this file exists, every new Claude Code session starts with this context. Claude knows your conventions without being told.&lt;/p&gt;

&lt;p&gt;The CLAUDE.md file is a time saver when coding across multiple sessions. It also allows more frequent context clearing, which helps keep Claude Code focused on the current ask, lowers costs, and saves the time otherwise spent re-explaining design choices every session.&lt;/p&gt;

&lt;p&gt;You can also add items to the file as you come across issues and technical asks that were not in it initially, building up a repository of information that Claude can use to build the application to your specifications.&lt;/p&gt;

&lt;p&gt;We all code differently. Teaching Claude how you like to code through the CLAUDE.md file results in code similar to your own, which makes reviews easier because the code feels familiar.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guardrails
&lt;/h2&gt;

&lt;p&gt;Claude Code is powerful, which is why it needs guardrails. Left unconstrained, an AI can read your environment files, touch your git history, run database commands, or make network requests you didn’t intend. None of that maliciously, just helpfully, and that’s the problem.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;permissions system&lt;/strong&gt; in Claude Code lets you define exactly what it can and cannot do, locked into your project’s .claude/settings.json. The deny list is your non-negotiable safety layer.&lt;/p&gt;

&lt;p&gt;There are three settings files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;managed-settings.json&lt;/li&gt;
&lt;li&gt;settings.local.json&lt;/li&gt;
&lt;li&gt;settings.json&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are hierarchical.&lt;/p&gt;

&lt;p&gt;managed-settings.json is the top tier. It lives outside the repo, at a system-level path that depends on the OS:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;macOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/Library/Application Support/ClaudeCode/managed-settings.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linux&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/etc/claude-code/managed-settings.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;C:\Program Files\ClaudeCode\managed-settings.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Rules in managed-settings.json cannot be overridden by rules in the repo-level JSON files.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;settings.json should be committed to your repository.&lt;/li&gt;
&lt;li&gt;settings.local.json should not be committed to the repository.&lt;/li&gt;
&lt;li&gt;settings.local.json overrides settings.json.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference between these is that settings.json is shared across developers in a multi-developer setup, while settings.local.json is intended for an individual developer's own rules.&lt;/p&gt;

&lt;p&gt;In a business setting, putting the most critical rules in managed-settings.json and limiting the use of settings.json and settings.local.json sets the system up for success. As an individual developer, using just the settings.local.json file might be sufficient.&lt;/p&gt;
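
&lt;p&gt;As a rough sketch of how the two repo-level files might divide the work, a personal settings.local.json could hold only the rules that apply to your own workflow. The PHPUnit entry below is an assumption for illustration (the example project does not say which test runner it uses); it follows the same &lt;code&gt;Bash(command:*)&lt;/code&gt; pattern used in the full settings file in this article. JSON has no comment syntax, so the assumption is noted here and not inside the file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "permissions": {
    "allow": [
      "Bash(vendor/bin/phpunit:*)"
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping a personal rule like this out of the shared settings.json means teammates are not silently granted a permission they never asked for.&lt;/p&gt;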

&lt;h2&gt;
  
  
  What belongs on the deny list
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Environment files &amp;amp; secrets&lt;/strong&gt;&lt;br&gt;
Your &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.env.*&lt;/code&gt;, certificates (&lt;code&gt;.pem&lt;/code&gt;, &lt;code&gt;.key&lt;/code&gt;, &lt;code&gt;.p12&lt;/code&gt;), and SSH/AWS credential folders should be completely off-limits. Claude has no reason to read them. For config files, a sample config with no secret keys set gives Claude the config structure without exposing the keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Destructive git operations&lt;/strong&gt;&lt;br&gt;
Block &lt;code&gt;git commit&lt;/code&gt;, &lt;code&gt;git push&lt;/code&gt;, &lt;code&gt;git merge&lt;/code&gt;, &lt;code&gt;git reset&lt;/code&gt;, &lt;code&gt;git clean&lt;/code&gt;, and anything else that writes to your history. You own the git history, not Claude. Every commit should be decided on and controlled by a person, which leaves room for code review before anything is committed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct database access&lt;/strong&gt;&lt;br&gt;
No &lt;code&gt;mysql&lt;/code&gt;, &lt;code&gt;psql&lt;/code&gt;, &lt;code&gt;pg_dump&lt;/code&gt;, &lt;code&gt;mysqldump&lt;/code&gt;, or any other direct database CLI commands. Claude should generate migration files and queries in code, not run commands directly against your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network and remote access&lt;/strong&gt;&lt;br&gt;
Block &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;wget&lt;/code&gt;, &lt;code&gt;ssh&lt;/code&gt;, &lt;code&gt;scp&lt;/code&gt;, &lt;code&gt;rsync&lt;/code&gt;, and similar tools. Claude should fetch docs through approved &lt;code&gt;WebFetch&lt;/code&gt; domains, not make arbitrary outbound calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System-level commands&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;rm&lt;/code&gt;, &lt;code&gt;sudo&lt;/code&gt;, &lt;code&gt;chmod&lt;/code&gt;, &lt;code&gt;chown&lt;/code&gt;, &lt;code&gt;kill&lt;/code&gt;, &lt;code&gt;crontab&lt;/code&gt; — anything that can damage your system or escalate privileges.&lt;/p&gt;
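
&lt;p&gt;For the approved &lt;code&gt;WebFetch&lt;/code&gt; domains mentioned under network access, one way to express the idea is an allow rule scoped to a single domain. This is a minimal sketch: docs.flightphp.com is assumed here as the documentation host for the example stack, so substitute whichever domains your project actually relies on.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "permissions": {
    "allow": [
      "WebFetch(domain:docs.flightphp.com)"
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pairing a narrow rule like this with the &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;wget&lt;/code&gt; denials keeps documentation lookups possible without opening up arbitrary outbound requests.&lt;/p&gt;
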
&lt;h3&gt;
  
  
  The full settings file
&lt;/h3&gt;

&lt;p&gt;The allow list is just as important as the deny list. It gives explicit permission for the tools you &lt;em&gt;do&lt;/em&gt; want Claude to be able to run: fetching framework docs, requiring Composer packages, running your linter, and so on.&lt;/p&gt;
&lt;h4&gt;
  
  
  Special Note
&lt;/h4&gt;

&lt;p&gt;Within a single settings file, deny rules override allow rules. When there is a hierarchy of settings files, rules in higher-priority files override rules in lower-priority files. If something is denied in managed-settings.json, adding an allow in settings.json will not override that denial.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(composer require:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(vendor/bin/phpcbf --standard=PSR12 app/)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(vendor/bin/phpcs --standard=PSR12 app/)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(app/config/config.php)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Edit(app/config/config.php)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(.env)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Edit(.env)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.pem)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.key)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.p12)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.pfx)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(~/.ssh/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(~/.aws/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(./.git/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git commit:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git push:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git merge:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git rebase:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git reset:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git clean:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git branch -d:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git branch -D:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git tag -d:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git stash drop:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git stash clear:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git remote add:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git remote remove:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git remote set-url:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git config --global:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rm:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rmdir:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(shred:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(dd:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mkfs:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(fdisk:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(curl:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(wget:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(nc:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(netcat:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(nmap:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(telnet:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(ftp:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(sftp:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(scp:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rsync:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(ssh:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mysql:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mysqldump:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mysqlimport:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mariadb:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(mariadb-dump:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(psql:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pg_dump:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pg_restore:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pg_dumpall:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(redis-cli:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(sudo:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(su:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(chmod:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(chown:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(passwd:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(useradd:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(usermod:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(crontab:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(kill:*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pkill:*)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The pattern to remember:&lt;/strong&gt; tight allow, broad deny. Explicitly permit the specific tools Claude needs; broadly block everything that could cause harm if run without your supervision.&lt;/p&gt;
&lt;/blockquote&gt;
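
&lt;p&gt;To make the precedence rule from the Special Note concrete, here is a small sketch in which a broad allow and a narrow deny overlap within one file. The &lt;code&gt;Read(app/**)&lt;/code&gt; rule is an assumption added purely for illustration; the config path is the one from the example project.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "permissions": {
    "allow": [
      "Read(app/**)"
    ],
    "deny": [
      "Read(app/config/config.php)"
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because deny wins over allow, Claude could read the rest of app/ under these rules but would still be blocked from the config file.&lt;/p&gt;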

&lt;h2&gt;
  
  
  Custom Commands
&lt;/h2&gt;

&lt;p&gt;Custom commands are saved prompts you can use during any session to run a repeatable, structured action across your codebase.&lt;/p&gt;

&lt;p&gt;The most valuable commands aren’t about generating code, they’re about &lt;strong&gt;reviewing what’s been built&lt;/strong&gt;. Running a security audit through a project catches problems before they’re buried under layers of new code.&lt;/p&gt;

&lt;p&gt;Custom commands live in .claude/commands/ as markdown files. Each file is a detailed prompt that Claude executes when you invoke the slash command.&lt;/p&gt;

&lt;p&gt;Claude Code itself can help you draft these command files efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Commands worth building
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/security-audit&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Reviews all files in the application for SQL injection risks, unvalidated input, missing authentication checks, insecure direct object references, and exposed error messages. Outputs a prioritised list of findings with file and line references.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/architecture-review&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Checks that the code follows the architecture and business rules you have set, such as those recorded in CLAUDE.md.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/update-claude&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
A short prompt along the lines of: "Review what we have just discussed and update CLAUDE.md with any new architectural patterns or 'What NOT to do' rules we have discovered."&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: the security audit command
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Perform a security audit on the PHP codebase in app/.

Check for the following issues and report each with file
path, line number, and severity (high/medium/low):
&lt;span class="p"&gt;
1.&lt;/span&gt; SQL injection risks — any string concatenation in
   queries, any unparameterised input
&lt;span class="p"&gt;2.&lt;/span&gt; Missing input validation — user-supplied data used
   without sanitisation or type checking
&lt;span class="p"&gt;3.&lt;/span&gt; Authentication gaps — routes or methods that should
   require auth but don't check for a session
&lt;span class="p"&gt;4.&lt;/span&gt; IDOR risks — fetching records by ID without verifying
   the current user owns that record
&lt;span class="p"&gt;5.&lt;/span&gt; Verbose error exposure — raw exceptions or stack
   traces that could leak system details

Output format:
[SEVERITY] File: path/to/file.php (line N)
Issue: description
Fix: recommended action

Do not fix anything — report only. Fixes happen
in a separate pass once the full list is reviewed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Run audits often.&lt;/strong&gt; The best time to run a security audit is not at the end of the project but during development. Issues caught early are easy to fix. Issues caught after three months of layered code become complex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Claude Ignore
&lt;/h2&gt;

&lt;p&gt;Similar to .gitignore, there is a &lt;strong&gt;.claudeignore&lt;/strong&gt; file. Entries in .claudeignore tell Claude that the listed files and folders are not relevant.&lt;/p&gt;

&lt;p&gt;Folders like node_modules, vendor, logs, and caches are good candidates for .claudeignore.&lt;/p&gt;

&lt;p&gt;.claudeignore is not a replacement for settings.json. Claude can still read items listed in .claudeignore, and you can still ask Claude to read them to gain context.&lt;/p&gt;

&lt;p&gt;.claudeignore saves you tokens by not having Claude read the files upfront.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Compacting
&lt;/h2&gt;

&lt;p&gt;As a Claude Code session grows longer, it will eventually compact, compressing older parts of the conversation to free up context window space. Claude does its best to preserve what matters, but compacting loses context. Architectural decisions made three hours ago and the reasoning behind a particular pattern choice are all at risk of being quietly forgotten.&lt;/p&gt;

&lt;p&gt;This is where the &lt;code&gt;CLAUDE.md&lt;/code&gt; file helps.&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;CLAUDE.md&lt;/code&gt; is loaded at the start of every session, and re-read whenever Claude needs to orient itself, your core context is never actually lost to compacting. It doesn't live in the conversation history. It lives in the file.&lt;/p&gt;

&lt;p&gt;Compacting can chew through hours of back-and-forth. Your architecture conventions are still right there, intact, waiting to be loaded again.&lt;/p&gt;

&lt;p&gt;In long conversations, you may want to update the CLAUDE.md file with new information that came up during the conversation.&lt;/p&gt;

&lt;p&gt;This can change how you work. Instead of using a single long session and worrying about context drift, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;/clear&lt;/code&gt; freely and often to start fresh without losing your architectural context&lt;/li&gt;
&lt;li&gt;Treat each task or feature as its own clean session: describe the scope, build it, clear, repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Working in small, focused sessions with frequent clears is actually a healthier pattern than one long marathon session anyway. It forces you to scope tasks tightly, keeps the context window lean, and means Claude is always working with fresh, uncluttered context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to code
&lt;/h2&gt;

&lt;p&gt;With &lt;code&gt;CLAUDE.md&lt;/code&gt; set up and your system secured, you've created a safe space for Claude Code to be genuinely excellent.&lt;/p&gt;

&lt;p&gt;This lets you focus on working through the architecture and delivering a quality product.&lt;/p&gt;

&lt;p&gt;Setting up Claude Code isn't about restricting the AI. It's about defining the playing field so you can stop worrying about the boundaries and start focusing on the architecture.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Partial Indexes in PostgreSQL</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Sun, 15 Feb 2026 20:56:41 +0000</pubDate>
      <link>https://forem.com/mrpercival/partial-indexes-in-postgresql-24pb</link>
      <guid>https://forem.com/mrpercival/partial-indexes-in-postgresql-24pb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso5825ow262dlkfha5u3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso5825ow262dlkfha5u3.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Partial indexes are refined indexes, used to target specific access patterns. Instead of indexing every row in a table, they only index the rows that match a condition — making them smaller, faster, and more efficient for the right use cases.&lt;/p&gt;

&lt;p&gt;Partial indexes work best on queries that filter first then scan multiple rows, target meaningful subsets of data, or might otherwise hit index thresholds that cause the planner to ignore the index entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;To demonstrate partial indexes, I am using the &lt;a href="https://github.com/datacharmer/test_db" rel="noopener noreferrer"&gt;MySQL sample database&lt;/a&gt; converted to PostgreSQL. The salaries table has about 3 million rows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"employees"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"salaries"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="n"&gt;int4&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"emp_no"&lt;/span&gt; &lt;span class="n"&gt;int4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"salary"&lt;/span&gt; &lt;span class="n"&gt;int4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"from_date"&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"to_date"&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a query with a standard index on the &lt;strong&gt;salary&lt;/strong&gt; field:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt; 
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;to_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'9999-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this sample database, &lt;code&gt;9999-01-01&lt;/code&gt; is used to indicate an active salary.&lt;/p&gt;

&lt;p&gt;This query took about 140ms (cold) and 40ms (hot) to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing when to create a partial index
&lt;/h2&gt;

&lt;p&gt;If queries are often run using the same pattern, they're good candidates for a partial index.&lt;/p&gt;

&lt;p&gt;Before deciding to add one, consider the data itself. How much would be filtered out by adding this partial index? To really benefit, you want significant filtering — the more rows excluded, the better.&lt;/p&gt;

&lt;p&gt;In this case, most of our queries focus on active employees and their current salary. Out of the 3 million rows, only 247,000 are current. That's a solid reduction, making it a good candidate for a partial index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_salaries_salary_todate_partial&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;to_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'9999-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates the index on the salary field again, but this time it only includes active employee salaries.&lt;/p&gt;

&lt;p&gt;Adding this partial index took the query time down to 16ms (both cold and hot).&lt;/p&gt;

&lt;h2&gt;
  
  
  Index thresholds
&lt;/h2&gt;

&lt;p&gt;Here's where partial indexes really start to shine.&lt;/p&gt;

&lt;p&gt;Range queries like the one we've used are susceptible to having the index ignored when the result set gets too large. This is known as the 30% rule — once the result set exceeds roughly 30% of total rows (it's not an exact number), the query planner often chooses a table scan over using the index.&lt;/p&gt;

&lt;p&gt;Changing the query to look for a different salary filter demonstrates this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt; 
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;to_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'9999-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While &lt;code&gt;&amp;gt; 100000&lt;/code&gt; only returned 17,000 rows, &lt;code&gt;&amp;gt; 50000&lt;/code&gt; returns 215,000 out of the 247,000 rows. With only a standard index available, the planner ignores the index and uses a table scan instead.&lt;/p&gt;

&lt;p&gt;In this scenario, the query time (hot) was around 120ms.&lt;/p&gt;

&lt;p&gt;The partial index, however, is still used because it reduces the starting row count upfront. Even with this broader filter, results come back in around 40ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meaningful subsets of data
&lt;/h2&gt;

&lt;p&gt;Another scenario where partial indexes shine is when you're consistently querying a meaningful subset of data.&lt;/p&gt;

&lt;p&gt;A good example is a queue, where you might have an &lt;code&gt;is_processed&lt;/code&gt; boolean field and only care about the ones not yet processed. The number of unprocessed rows should be small compared to the processed ones, and over time that difference only grows larger.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_unprocessed_queue&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;is_processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Index size benefits
&lt;/h2&gt;

&lt;p&gt;An advantage of partial indexes worth mentioning is the index size.&lt;/p&gt;

&lt;p&gt;In our salary example, the full salary index is 58MB while the partial index is just 7MB.&lt;/p&gt;

&lt;p&gt;That said, partial indexes are often used alongside regular indexes rather than replacing them. The partial index solves for a specific access pattern, while the regular index covers other scenarios that the partial index wouldn't help with.&lt;/p&gt;

&lt;h2&gt;
  
  
  When partial indexes might not be as beneficial
&lt;/h2&gt;

&lt;p&gt;Partial indexes are great, but there are times when their benefits are minimal.&lt;/p&gt;

&lt;p&gt;A table where you're already using a unique index is one scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'test@test.com'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;is_active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since email is almost certainly a unique field, it's unlikely that adding a partial index restricted to rows where &lt;code&gt;is_active&lt;/code&gt; is true would produce any real gains in query execution time.&lt;/p&gt;

&lt;p&gt;This doesn't mean a partial index is never worth adding in these cases. If the table is large, a partial index might be used simply to reduce the size of the index. Smaller indexes fit better into the buffer cache, potentially keeping the index in memory where it belongs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Partial indexes are a powerful tool for reducing query times on specific access patterns. They're not right for every situation — but when you've got queries that consistently filter on the same conditions, they're absolutely worth considering. Start by looking at your most common query patterns and checking how much data would be filtered out. If the numbers look good, it might be worth adding a partial index.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>webdev</category>
      <category>sql</category>
      <category>postgressql</category>
    </item>
    <item>
      <title>Journey into Claude Code</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Mon, 26 Jan 2026 19:30:14 +0000</pubDate>
      <link>https://forem.com/mrpercival/journey-into-claude-code-1d6a</link>
      <guid>https://forem.com/mrpercival/journey-into-claude-code-1d6a</guid>
      <description>&lt;p&gt;Using AI in your daily development process requires a shift in how you think about writing code.&lt;/p&gt;

&lt;p&gt;Having built websites for over 30 years, I’ve lived through many changes in how software is developed. AI is not the first major shift developers have had to adapt to, and it certainly won’t be the last. Like previous transitions, it has caused disruption, some of it useful, some of it driven by marketing hype that sets expectations far too high.&lt;/p&gt;

&lt;p&gt;When those expectations aren’t met, the conclusion is often that AI has failed. In reality, it’s not that AI isn’t useful; it’s that it needs to be used differently. It works best as a tool within your workflow, not something you can simply turn on and forget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delving into Claude Code
&lt;/h2&gt;

&lt;p&gt;My first attempt at using Claude Code didn’t go well. I approached it the way it was being marketed: asking it to “build a product.” The scope of that request was far too large. By giving the AI so much latitude, I was asking it to make architectural, stylistic, and framework-level decisions without sufficient context.&lt;/p&gt;

&lt;p&gt;The result was a tangled web of code that often didn’t even respect the framework I had already set up. It would fall back to raw PDO statements instead of using framework tooling. It created methods and classes that were later abandoned but left behind. Coding standards shifted mid-stream, and the overall result felt incoherent.&lt;/p&gt;

&lt;p&gt;This wasn’t a failure of Claude Code so much as a failure in how I was using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Second Attempt: Adapting the Approach
&lt;/h2&gt;

&lt;p&gt;The second attempt required a mental shift. Rather than expecting AI to work instead of me, I needed it to work with me.&lt;/p&gt;

&lt;p&gt;AI isn’t there to replace the developer; it’s there to support the developer. That means staying in control. In practical terms, this meant drastically reducing the scope of what I asked it to do and taking on more of an architectural role, guiding it much as I would guide a junior developer.&lt;/p&gt;

&lt;p&gt;Instead of asking it to build an entire project, I broke the work down into much smaller pieces: a class, a method, or a clearly defined section of functionality. You can expand the scope somewhat, but keeping it small was critical while learning how AI fit into my workflow.&lt;/p&gt;

&lt;p&gt;The results were noticeably better. There were still odd decisions (sometimes it used framework tools, sometimes it didn’t), but at this scale it was easy to catch and correct those issues. The code was generally solid, required some cleanup, and was far quicker to produce than writing everything myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Importance of Review
&lt;/h2&gt;

&lt;p&gt;Throughout this process, code review became even more important.&lt;/p&gt;

&lt;p&gt;Any code you didn’t write yourself should be reviewed carefully, and AI-generated code is no exception. In truth, even code you &lt;em&gt;did&lt;/em&gt; write yourself benefits from review; we all make mistakes.&lt;/p&gt;

&lt;p&gt;Regularly reviewing AI-generated code helps keep things from becoming large and unwieldy. It’s how you maintain control, catch deviations early, and ensure the codebase stays consistent and understandable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt Three: Teaching AI How I Code
&lt;/h2&gt;

&lt;p&gt;While the output from the second attempt was good, it still didn’t feel quite right. The code wasn’t bad; it just wasn’t how I would write it.&lt;/p&gt;

&lt;p&gt;In PHP, there are countless ways to solve the same problem, but most developers have &lt;em&gt;their&lt;/em&gt; way of doing things. If Claude Code could follow my preferences, there were clear benefits: the code would feel familiar, differences would stand out more clearly, and issues would be easier to identify.&lt;/p&gt;

&lt;p&gt;The third attempt focused on creating a &lt;code&gt;CLAUDE.md&lt;/code&gt; file that clearly documented how I wanted code to be written. This included decisions around function and variable naming, whether to use static methods, specific usage patterns for &lt;a href="https://flightphp.com" rel="noopener noreferrer"&gt;Flight PHP&lt;/a&gt; (my framework of choice), the PHP version I target, database choices, dependency injection approaches, and adherence to SOLID principles.&lt;/p&gt;

&lt;p&gt;That file grew to around 500 lines of guidance, including examples of what to do, and what not to do. With those constraints in place, working with Claude Code became a much more pleasant experience.&lt;/p&gt;

&lt;p&gt;Having detailed instructions on &lt;em&gt;how&lt;/em&gt; to code also made it possible to slightly increase the scope of requests, though still not to the level of “build me an entire product.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Whether you love AI or hate it, learning how to use it has become important for long-term relevance as a developer. Not taking the time to understand how it can best support your workflow risks being left behind.&lt;/p&gt;

&lt;p&gt;One unexpected benefit I’ve found is that development feels more restful. Building software is mentally demanding, from designing architecture to writing code to reasoning about database access. Allowing AI to take on some of that load can reduce cognitive strain without sacrificing quality.&lt;/p&gt;

&lt;p&gt;You still can’t let it run unchecked. You’re ultimately responsible for the code it produces. But by guiding it with clear constraints and documentation, and by keeping yourself firmly in control, AI can become a focused, effective, and genuinely helpful part of your development process.&lt;/p&gt;

&lt;p&gt;It's not magic. But used well, it might just make the work a little lighter.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>php</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Using TF-IDF Vectors With PHP &amp; PostgreSQL</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Thu, 27 Mar 2025 23:17:45 +0000</pubDate>
      <link>https://forem.com/mrpercival/using-tf-idf-vectors-with-php-postgresql-3ll1</link>
      <guid>https://forem.com/mrpercival/using-tf-idf-vectors-with-php-postgresql-3ll1</guid>
      <description>&lt;p&gt;Vectors in PostgreSQL are used to compare data to find similarities, outliers, groupings, classifications and other things.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; is a popular extension that adds vector functionality to PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is TF-IDF?
&lt;/h2&gt;

&lt;p&gt;TF-IDF stands for Term Frequency-Inverse Document Frequency. It's a way to measure how important a word is to a document relative to a collection of documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Term Frequency
&lt;/h2&gt;

&lt;p&gt;Term frequency refers to how often a word is used within a document. In a 100-word document, if the word 'test' occurs 5 times, the term frequency is 5/100 = 0.05.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inverse Document Frequency
&lt;/h2&gt;

&lt;p&gt;Inverse Document Frequency measures how unique a word is across a group of documents.  &lt;/p&gt;

&lt;p&gt;Common words like "the" or "and" appear in almost all documents, so they are assigned a low IDF score. Rare, specific words are assigned a higher IDF score.&lt;/p&gt;

&lt;p&gt;The TF-IDF score is &lt;strong&gt;TF * IDF&lt;/strong&gt;.&lt;/p&gt;
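
&lt;p&gt;As a rough worked example, the sketch below scores a rare word and a very common word. The &lt;code&gt;tfidf()&lt;/code&gt; helper and all of the counts are made up for illustration, and the &lt;code&gt;+ 1&lt;/code&gt; smoothing in the IDF is one common choice rather than the only one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper that applies the TF * IDF definition above.
function tfidf(int $count, int $totalWords, int $totalDocuments, int $documentsWithWord): float
{
    // Term frequency: share of the document taken up by this word.
    $tf = $count / $totalWords;
    // Inverse document frequency; the + 1 is a smoothing assumption
    // that avoids division by zero for unseen words.
    $idf = log($totalDocuments / ($documentsWithWord + 1));

    return $tf * $idf;
}

// Rare word: 5 of 100 words, found in 10 of 1,000 documents.
printf("%.4f\n", tfidf(5, 100, 1000, 10));  // 0.2255
// Very common word: 8 of 100 words, found in 999 of 1,000 documents.
printf("%.4f\n", tfidf(8, 100, 1000, 999)); // 0.0000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The common word scores zero even though it appears more often in the document, which is the effect IDF is meant to produce.&lt;/p&gt;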

&lt;h2&gt;
  
  
  Normalizing TF-IDF
&lt;/h2&gt;

&lt;p&gt;A drawback to using TF-IDF is that it unfairly advantages long documents over short documents. &lt;/p&gt;

&lt;p&gt;Longer documents can accumulate higher TF-IDF scores simply because they contain more words, not necessarily because the word is more relevant. &lt;/p&gt;

&lt;p&gt;This can be corrected by normalizing the score based on the document length.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TF-IDF score / total words in document&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  PHP Implementation Guide
&lt;/h2&gt;

&lt;p&gt;To create vectors in PHP, select all articles from a database and loop through them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$articles&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nv"&gt;$articleText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'description'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
     &lt;span class="nv"&gt;$tokenizedDocuments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;tokenizeArticle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$articleText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
     &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;updateDocumentFrequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Break up the document into an array of words. Additional word processing could be done here if required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;tokenizeArticle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;strtolower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;preg_replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/[^\w\s]/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;preg_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/\s+/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an array to keep track of the word frequency across all documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;updateDocumentFrequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$uniqueWords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;array_unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uniqueWords&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;isset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the articles have been processed, create the embedding vector for each one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$articles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$tokenizedDocuments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$totalDocuments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$articles&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$articles&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$articleId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;$words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$tokenizedDocuments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$articleId&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="nv"&gt;$embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;calculateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$totalDocuments&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;calculateEmbedding()&lt;/code&gt; is where the main TF-IDF calculations are done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;calculateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$totalDocuments&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$termFrequencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;array_count_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$totalWords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$words&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;array_fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$termFrequencies&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$word&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$tf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nv"&gt;$totalWords&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$idf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$totalDocuments&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$documentFrequencies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="nv"&gt;$tfidf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$tf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$idf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nv"&gt;$index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$embedding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nv"&gt;$tfidf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;normalizeVector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$embedding&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;u&gt;&lt;strong&gt;Dimensions&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The number chosen for dimensions is critical to good-quality TF-IDF. It should be large enough to hold the number of unique words in any of your documents. 768 or 1536 are good choices for medium-sized documents. As a general rule, about 20 to 30% of the words in a document are unique, so 1536 dimensions equates to roughly a 20 to 30 page document.&lt;/p&gt;
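
&lt;p&gt;The arithmetic behind that rule of thumb can be sketched as follows. The &lt;code&gt;estimateWordCapacity()&lt;/code&gt; helper is hypothetical, and the unique-word ratios are the rough 20 to 30% figures from above rather than measured values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper: estimate how many total words a document can have
// before its unique words outnumber the available dimensions.
function estimateWordCapacity(int $dimension, float $uniqueRatio): int
{
    return (int) round($dimension / $uniqueRatio);
}

echo estimateWordCapacity(1536, 0.30), PHP_EOL; // 5120
echo estimateWordCapacity(1536, 0.20), PHP_EOL; // 7680
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If a page holds somewhere around 250 words (an assumption, since page sizes vary a lot), 5,120 to 7,680 words works out to roughly 20 to 30 pages.&lt;/p&gt;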

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Calculate TF&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Divide the number of times a word occurs in a document by the total words in the document.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$tf = $count / $totalWords;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Calculate IDF&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

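&lt;p&gt;IDF is the logarithm of the inverse document frequency: the total number of documents divided by the number of documents that contain the word. Adding 1 to the divisor avoids division by zero.&lt;/p&gt;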

&lt;p&gt;&lt;code&gt;$idf = log($totalDocuments / ($documentFrequencies[$word] + 1));&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Calculate TF-IDF&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$tfidf = $tf * $idf;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;TF-IDF array&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TF-IDF arrays do not store values in word order. Instead, each value is stored at a calculated array key, which ensures that the same word always lands in the same array position across all documents.&lt;/p&gt;

&lt;p&gt;While two different words can produce the same array key, such collisions are uncommon as long as the dimension size is appropriate for the size of the documents, and their effect on the overall vector is generally small.&lt;/p&gt;

&lt;p&gt;To calculate the position, use crc32 to generate an integer representation of the word and then divide it by the dimension size, and use the remainder as the array key position.&lt;/p&gt;

&lt;p&gt;This will give a good spread of spaces that are filled with the TF-IDF scores.&lt;/p&gt;
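
&lt;p&gt;The position calculation described above can be tried on its own with a small sketch. The &lt;code&gt;wordIndex()&lt;/code&gt; function name and the sample words are illustrative only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper: map a word to a slot in a fixed-size vector.
function wordIndex(string $word, int $dimension): int
{
    // crc32() gives the same integer for the same string every time.
    // abs() guards against negative values on 32-bit PHP builds.
    return abs(crc32($word) % $dimension);
}

$dimension = 1536;

foreach (['postgres', 'index', 'postgres'] as $word) {
    echo $word, ' =&amp;gt; ', wordIndex($word, $dimension), PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both &lt;code&gt;postgres&lt;/code&gt; lines print the same position, which is what lets vectors from different documents be compared slot by slot.&lt;/p&gt;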

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Normalizing&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Earlier we talked about normalizing as the TF-IDF score divided by the document length. While it can be calculated this way, normalizing is more commonly done by dividing each value by the length of the vector, calculated using the &lt;code&gt;Euclidean norm formula&lt;/code&gt;: √(x₁² + x₂² + ... + xₙ²)&lt;/p&gt;

&lt;p&gt;The normalizeVector method is a PHP representation of this formula.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;normalizeVector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$vector&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$magnitude&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;array_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;array_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nv"&gt;$vector&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$magnitude&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;array_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$magnitude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$x&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nv"&gt;$magnitude&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nv"&gt;$vector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final vector may look something like this:&lt;/p&gt;

&lt;p&gt;[0.052876625,0,0,0,0,0,0,0,0,0,0,0,0.013156515,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.012633555,0,0,0,0,0.0065987236,0,0 ...]&lt;/p&gt;

&lt;p&gt;This is known as a sparse vector. A sparse vector has a lot of empty array keys whereas a dense vector is much more filled in.&lt;/p&gt;

&lt;p&gt;A denser vector can improve the quality of comparisons. One method of filling in more of the vector is to include bi-grams along with the single words.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;This is known as a sparse vector&lt;/code&gt; would include each word [this,is,known,as,a,sparse,vector]. Adding bi-grams would also include [this_is,is_known,known_as,as_a,a_sparse,sparse_vector], which adds more context to the words by taking into account the words around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Queries in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Once vectors have been generated for your documents, it's time to store them in PostgreSQL.&lt;/p&gt;

&lt;p&gt;Selecting the right dimension size for your documents is also critical here. Once you choose a dimension size, all vectors going into the column have to be the same dimension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"articles"&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="nv"&gt;"embedding"&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
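
&lt;p&gt;As a rough sketch of what storing a vector looks like, the example below assumes the &lt;code&gt;vector&lt;/code&gt; column type is provided by the pgvector extension. The &lt;code&gt;articles_demo&lt;/code&gt; table and its 3-dimension column are hypothetical, chosen only to keep the values readable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumption: the vector type comes from the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical demo table with a small dimension so the literals stay readable.
CREATE TABLE articles_demo (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title TEXT,
    embedding vector(3)
);

-- A vector is written as a bracketed, comma separated list of numbers.
INSERT INTO articles_demo (title, embedding)
VALUES ('First article', '[0.6, 0.8, 0]');

-- This insert would be rejected: the literal has 2 dimensions, the column expects 3.
-- INSERT INTO articles_demo (title, embedding) VALUES ('Bad', '[0.6, 0.8]');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The rejected insert is the practical reason the dimension size has to be settled before the column is created.&lt;/p&gt;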



&lt;h2&gt;
  
  
  Types of Comparisons
&lt;/h2&gt;

&lt;p&gt;There are three main types of comparison available for vectors in PostgreSQL:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Euclidean (L2) distance&lt;/strong&gt;: &amp;lt;-&amp;gt; : Measures how far apart two vectors are. Smaller numbers mean vectors are more similar. Good for finding similar products etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cosine distance&lt;/strong&gt;: &amp;lt;=&amp;gt; : Measures the angle between vectors, ignoring their magnitude. The operator returns 1 minus the cosine similarity, so smaller numbers mean vectors are more similar in direction. Good for text similarity where length shouldn't matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inner product&lt;/strong&gt;: &amp;lt;#&amp;gt; : Measures how much vectors "align" with each other. A larger inner product means vectors are more similar, but the operator returns the negative inner product, so when ordering by it, smaller numbers still come first and are the most similar. Useful when the vectors are already normalized.&lt;/p&gt;

&lt;p&gt;Try them all with your data to find the one that best suits your use case.&lt;/p&gt;
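
&lt;p&gt;A quick way to get a feel for the three operators is to run them on small literal vectors. This is only an illustration, assuming the pgvector extension is installed; the two vectors are made up and point in the same direction but differ in length.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- [2, 4] is [1, 2] scaled by 2: same direction, different magnitude.
SELECT
    '[1, 2]'::vector &amp;lt;-&amp;gt; '[2, 4]'::vector AS l2_distance,            -- greater than 0, the lengths differ
    '[1, 2]'::vector &amp;lt;=&amp;gt; '[2, 4]'::vector AS cosine_distance,        -- 0, the directions match
    '[1, 2]'::vector &amp;lt;#&amp;gt; '[2, 4]'::vector AS negative_inner_product; -- -10, the negated value of (1*2 + 2*4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cosine distance ignores the difference in length, while the L2 distance does not. That difference is often what decides which operator fits a data set.&lt;/p&gt;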

&lt;h2&gt;
  
  
  Creating a recommendation system
&lt;/h2&gt;

&lt;p&gt;One of the use cases of vectors is to create a recommendation system, in this case to find articles that are related in some way to the one you are currently reading.&lt;/p&gt;

&lt;p&gt;To do this, we need to order the rows by the comparison to find the ones most relevant to the current article.&lt;/p&gt;

&lt;p&gt;In this query, the embedding of the current article is selected first, and the other articles are then compared to it to find the most relevant.&lt;/p&gt;

&lt;p&gt;For a recommendation, filtering out the current article from the query makes sense.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt;
    &lt;span class="n"&gt;search_article&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;search_article&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
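
&lt;p&gt;On larger tables, comparing against every row can become slow. One possible approach, sketched here on the assumption that the pgvector extension and its &lt;code&gt;hnsw&lt;/code&gt; index type are available, is to index the embedding column. The index name is hypothetical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- vector_l2_ops matches the &amp;lt;-&amp;gt; operator used in the recommendation query.
CREATE INDEX idx_articles_embedding_l2
ON articles USING hnsw (embedding vector_l2_ops);

-- For the &amp;lt;=&amp;gt; operator, the matching operator class is vector_cosine_ops.
-- CREATE INDEX idx_articles_embedding_cosine
-- ON articles USING hnsw (embedding vector_cosine_ops);

-- Check whether the planner actually uses the index for this query.
EXPLAIN
SELECT id, title
FROM articles
WHERE id &amp;lt;&amp;gt; 12
ORDER BY embedding &amp;lt;-&amp;gt; (SELECT embedding FROM articles WHERE id = 12)
LIMIT 4;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An index of this kind returns approximate nearest neighbours, so results may differ slightly from a full scan. For a small number of articles, the unindexed query is likely to be fast enough.&lt;/p&gt;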



&lt;h2&gt;
  
  
  Creating a search engine
&lt;/h2&gt;

&lt;p&gt;Vectors can be used to create a search engine for your documents by comparing articles with the user-entered question or keywords.&lt;/p&gt;

&lt;p&gt;To do this, the user-entered question needs to be converted into a vector using the term frequencies of your current articles (it is recommended to store these in the database so you are not recalculating them every time a search query is run). The user vector needs to be the same dimension size as the article vectors.&lt;/p&gt;

&lt;p&gt;Create a query to compare the user vector to the stored vectors&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqhc35l6lmlpiv1n1uma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqhc35l6lmlpiv1n1uma.png" alt="Image description" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;
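
&lt;p&gt;It can also be useful to return the distance itself, for example to hide weak matches. The variation below is a sketch that reuses the same &lt;code&gt;:embedding&lt;/code&gt; placeholder; the &lt;code&gt;cosine_distance&lt;/code&gt; alias is just an illustrative name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Same search, but the distance is returned alongside each row.
-- Smaller values are closer matches.
SELECT id, title, embedding &amp;lt;=&amp;gt; :embedding AS cosine_distance
FROM articles
ORDER BY cosine_distance
LIMIT 4;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The application can then drop any row whose distance is above a cut-off that suits your data.&lt;/p&gt;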

&lt;h2&gt;
  
  
  Other use cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Classifying Articles&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A more complex use case for vectors would be to classify documents to put similar articles together. You may not have specific tags/keywords to classify documents against, but articles can still be classified into similar items.&lt;/p&gt;

&lt;p&gt;Clustering the vectors results in similar articles sharing the same cluster id.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye0x6x1390e123f9n34n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye0x6x1390e123f9n34n.png" alt="Image description" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Finding Anomalies&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If users post articles about tech, and suddenly someone posts an article about places to buy plushies, that would be an anomaly and might be worth checking to see if it fits the site's requirements.&lt;/p&gt;

&lt;p&gt;To implement an anomaly checker, a distance threshold would need to be set and anything further away than the threshold would be flagged for manual review.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; 
    &lt;span class="n"&gt;article_distances&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; 
            &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
                &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance_from_average&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance_from_average&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;article_distances&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;distance_from_average&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance_from_average&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query calculates the "average" embedding across all articles (representing your typical content) and then finds articles that are significantly different from this average.&lt;/p&gt;

&lt;p&gt;Experiment with the threshold to find what is right for your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Vectors are both complex and powerful. Well-planned vectors can help automate many tasks or add features to your website.&lt;/p&gt;

&lt;p&gt;TF-IDF is the method I chose here, but it is not the only way to generate vectors. OpenAI has its own model for generating vectors from text, as does Ollama. These may or may not be better for your use case.&lt;/p&gt;

&lt;p&gt;It's important to experiment with different approaches - test various dimension sizes, comparison methods, and even vector generation techniques to find what works best for your specific needs.&lt;/p&gt;

</description>
      <category>php</category>
      <category>postgres</category>
      <category>postgressql</category>
    </item>
    <item>
      <title>How To Use Materialized Views</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Mon, 23 Dec 2024 16:35:29 +0000</pubDate>
      <link>https://forem.com/mrpercival/how-to-use-materialized-views-49dk</link>
      <guid>https://forem.com/mrpercival/how-to-use-materialized-views-49dk</guid>
      <description>&lt;p&gt;There are times when a query takes a long time to run. While indexes and good query design often help, sometimes the query is inherently slow.&lt;/p&gt;

&lt;p&gt;In cases like this, we need to find an alternative way to collect the data so that these queries do not slow down the website.&lt;/p&gt;

&lt;p&gt;Materialized views can provide significant improvement in query performance, and are especially useful in aggregated and reporting queries.&lt;/p&gt;

&lt;p&gt;To demonstrate this I am importing a 100 million row CSV of mock temperature data from about 400 cities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Amsterdam;2010-06-28;15.4
Kano;2017-04-23;18.7
Calgary;2016-05-07;4.3
Reggane;2014-10-04;32.0
Fukuoka;2010-04-17;22.6
Khartoum;2017-05-29;29.8
Vilnius;2014-06-16;4.2
Murmansk;2011-09-29;3.8
Parakou;2010-09-12;10.6
Cairo;2014-03-29;42.6
Edmonton;2015-04-24;-2.1
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating and importing data into a table
&lt;/h2&gt;

&lt;p&gt;To create the table in PostgreSQL the following table definition can be used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;recorded_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Importing the data can be done through psql. Connect to the database first, then run the \copy command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;-h&lt;/span&gt; localhost &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-W&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="se"&gt;\c&lt;/span&gt;opy weather_data&lt;span class="o"&gt;(&lt;/span&gt;city, recorded_date, temperature&lt;span class="o"&gt;)&lt;/span&gt; FROM &lt;span class="s1"&gt;'measurements.csv'&lt;/span&gt; WITH &lt;span class="o"&gt;(&lt;/span&gt;FORMAT csv, DELIMITER &lt;span class="s1"&gt;';'&lt;/span&gt;, HEADER&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Querying the data
&lt;/h2&gt;

&lt;p&gt;The query used here is one derived from the &lt;a href="https://1brc.dev" rel="noopener noreferrer"&gt;1 billion row challenge&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_temperature&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5odgj03icrzij6w9uwov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5odgj03icrzij6w9uwov.png" alt="Image description" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On 100 million rows, this took about 7 seconds to run. If a query like this were running on a website, it would be too slow to be practical.&lt;/p&gt;
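
&lt;p&gt;To see where that time goes on your own data, one option is to prefix the query with &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt;. This is a sketch rather than the method used to produce the timing above. Note that it executes the query, so it takes just as long to run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Runs the query and reports the plan, the actual timings and the buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT city,
  MIN(temperature) AS min_temperature,
  MAX(temperature) AS max_temperature,
  AVG(temperature) AS avg_temperature
FROM weather_data
GROUP BY city
ORDER BY city;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The "Execution Time" line at the end of the output is the figure to compare before and after any change.&lt;/p&gt;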

&lt;p&gt;An option to consider is to use Redis or Elasticsearch to take the load off the database.&lt;/p&gt;

&lt;p&gt;Another option is to use a materialized view in PostgreSQL to store the returned results. &lt;/p&gt;

&lt;h2&gt;
  
  
  What are Views?
&lt;/h2&gt;

&lt;p&gt;A view in SQL is a virtual table. Rather than storing data, it stores a SQL query. When the view is queried, the stored query is run underneath to return the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_temperature&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To query the view we would use a query like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since this is just running the stored SQL query, this does not provide any query execution time improvement.&lt;/p&gt;

&lt;p&gt;To improve execution time, a materialized view would be a better option.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Materialized Views?
&lt;/h2&gt;

&lt;p&gt;Materialized views are quite different from regular views. Instead of  storing a SQL query and running the query each time it's used, materialized views store the result set on disk. &lt;/p&gt;

&lt;p&gt;This makes accessing the result set much faster than just running the raw query.&lt;/p&gt;

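&lt;p&gt;To create a materialized view, the syntax is almost the same as for a regular view. If you created the regular &lt;code&gt;weather_stats&lt;/code&gt; view above, drop it first or use a different name, since a view and a materialized view cannot share a name.&lt;br&gt;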
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_temperature&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating the materialized view takes about as long as running the query itself. However, querying the materialized view returns results in about 5ms instead of 7 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Refreshing materialized views
&lt;/h2&gt;

&lt;p&gt;Since materialized views store the result set, the data can become stale. &lt;/p&gt;

&lt;p&gt;To prevent the data from becoming too stale, the materialized view can be refreshed as regularly as needed to keep the data up to date.&lt;/p&gt;

&lt;p&gt;To refresh the data, a REFRESH query is run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;REFRESH&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Refreshing the data locks the view, so queries against it are blocked until the refresh finishes, causing downtime.&lt;/p&gt;

&lt;p&gt;Once the refresh is complete, the data will become available again. The downtime would be based on how long the underlying query takes to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concurrent Materialized View Refresh
&lt;/h2&gt;

&lt;p&gt;To avoid downtime, concurrent refreshes can be used.&lt;/p&gt;

&lt;p&gt;A concurrent refresh builds the new result set in a temporary table, compares it with the existing data, and then applies only the differences to the materialized view. This allows access to the data during a data refresh.&lt;/p&gt;

&lt;p&gt;Concurrent refreshes require a unique index to be added to the materialized view.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_weather_stats_city&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the index has been created, a concurrent refresh can be run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;REFRESH&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Updating A Materialized View Schema
&lt;/h2&gt;

&lt;p&gt;Updating the view's schema requires dropping the view and recreating it. Downtime would occur in these cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;weather_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the view has been dropped, a new CREATE query can be run to build the materialized view again and import the result set into the new table definition. &lt;/p&gt;
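
&lt;p&gt;If that downtime is a concern, one possible way to shorten it is to build the replacement under a temporary name and swap the names inside a transaction. This is a sketch only: the &lt;code&gt;weather_stats_new&lt;/code&gt; name and the extra &lt;code&gt;reading_count&lt;/code&gt; column are hypothetical examples of a schema change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Build the new definition while the old view keeps serving queries.
CREATE MATERIALIZED VIEW weather_stats_new AS
SELECT city,
  MIN(temperature) AS min_temperature,
  MAX(temperature) AS max_temperature,
  AVG(temperature) AS avg_temperature,
  COUNT(*) AS reading_count -- hypothetical new column
FROM weather_data
GROUP BY city
ORDER BY city;

CREATE UNIQUE INDEX idx_weather_stats_new_city ON weather_stats_new(city);

-- Swap the names in one transaction so the gap is as short as possible.
BEGIN;
DROP MATERIALIZED VIEW IF EXISTS weather_stats; -- also drops its index
ALTER MATERIALIZED VIEW weather_stats_new RENAME TO weather_stats;
ALTER INDEX idx_weather_stats_new_city RENAME TO idx_weather_stats_city;
COMMIT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The slow part, building the result set, happens before the transaction starts, so queries are only blocked for the duration of the drop and rename. Anything that depends on the old view, such as other views, would still need to be recreated.&lt;/p&gt;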

&lt;h2&gt;
  
  
  Scheduling the materialized view refresh
&lt;/h2&gt;

&lt;p&gt;To schedule the materialized view refresh, an external script could be created and triggered with crontab.&lt;/p&gt;

&lt;p&gt;Alternatively, it's possible to schedule the refresh inside PostgreSQL&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing pg_cron extension
&lt;/h2&gt;

&lt;p&gt;To run cron jobs inside PostgreSQL, the PostgreSQL &lt;code&gt;pg_cron&lt;/code&gt; extension needs to be installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;postgresql-17-cron
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure you install the version compatible with the version of PostgreSQL you have installed.&lt;/p&gt;

&lt;p&gt;Add the extension to postgresql.conf&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/17/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and add the extension at the bottom of the conf file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shared_preload_libraries &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pg_cron'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart PostgreSQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activate the extension in PostgreSQL by running the CREATE EXTENSION query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pg_cron&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can schedule the view refresh&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'refresh_weather'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0 * * * *'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="s1"&gt;'REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will set the cron to refresh the view every hour on the hour. &lt;/p&gt;

&lt;p&gt;Check that the job has been scheduled by executing this query&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;jobid&lt;/th&gt;
&lt;th&gt;schedule&lt;/th&gt;
&lt;th&gt;command&lt;/th&gt;
&lt;th&gt;nodename&lt;/th&gt;
&lt;th&gt;nodeport&lt;/th&gt;
&lt;th&gt;database&lt;/th&gt;
&lt;th&gt;username&lt;/th&gt;
&lt;th&gt;active&lt;/th&gt;
&lt;th&gt;jobname&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0 * * * *&lt;/td&gt;
&lt;td&gt;REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats&lt;/td&gt;
&lt;td&gt;localhost&lt;/td&gt;
&lt;td&gt;5432&lt;/td&gt;
&lt;td&gt;postgres&lt;/td&gt;
&lt;td&gt;postgres&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;refresh_weather&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
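
&lt;p&gt;The cron.job table shows that the job is scheduled, not that it has actually run. As a rough sketch, assuming the installed version of pg_cron is recent enough to include the cron.job_run_details table, the most recent runs can be checked like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumes cron.job_run_details exists in the installed pg_cron version
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The status and return_message columns should show whether each refresh completed, or why it failed.&lt;/p&gt;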

&lt;h2&gt;
  
  
  When not to use materialized views
&lt;/h2&gt;

&lt;p&gt;While materialized views are a useful feature, they are not the right solution for every situation. &lt;/p&gt;

&lt;p&gt;In cases where data changes frequently and/or when data consistency is critical, materialized views are not a good option to use. &lt;/p&gt;

&lt;p&gt;Currency exchange rates, where the data needs to be up to date to the second, would not be a good use case for materialized views unless you were looking to store historical data.&lt;/p&gt;

&lt;p&gt;It's also worth noting that because materialized views are stored on disk, the disk space needed to store the view needs to be considered.&lt;/p&gt;
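
&lt;p&gt;A quick way to see how much space a view is taking is to ask PostgreSQL for its size. This is a minimal sketch using the weather_stats view from earlier; the view_size alias is just an illustrative name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Total on-disk size of the materialized view, including its indexes
SELECT pg_size_pretty(pg_total_relation_size('weather_stats')) AS view_size;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;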

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;While there are a few concepts to learn with materialized views, they can significantly improve query execution time compared to raw queries, which will benefit your website. &lt;/p&gt;

&lt;p&gt;Materialized views are not a good option for all cases. Understanding your data and customer needs will help determine if they are a viable solution or not.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>postgressql</category>
      <category>sql</category>
    </item>
    <item>
      <title>Optimizing SQL Queries</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Sun, 13 Oct 2024 15:41:51 +0000</pubDate>
      <link>https://forem.com/mrpercival/optimizing-sql-queries-1n1c</link>
      <guid>https://forem.com/mrpercival/optimizing-sql-queries-1n1c</guid>
      <description>&lt;p&gt;When writing queries, we should always take time to find the best way to write the query. &lt;/p&gt;

&lt;p&gt;Sometimes this can mean using methods that on the surface seem like they wouldn't be fast, but actually are. &lt;/p&gt;

&lt;p&gt;Query optimization is critical to having an efficient website.&lt;/p&gt;

&lt;p&gt;While query optimization also applies to reporting and analytics, queries that run as part of a web service are the ones most noticed by users of your website.&lt;/p&gt;

&lt;p&gt;For this article I am using the MySQL test employee database: &lt;a href="https://dev.mysql.com/doc/employee/en/" rel="noopener noreferrer"&gt;https://dev.mysql.com/doc/employee/en/&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Schema
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`employees`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;`emp_no`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`birth_date`&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`first_name`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`last_name`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`gender`&lt;/span&gt; &lt;span class="nb"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'M'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'F'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`hire_date`&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`emp_no`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="nv"&gt;`name`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`first_name`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;`last_name`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`salaries`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;`emp_no`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`salary`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`from_date`&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`to_date`&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`emp_no`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;`from_date`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="nv"&gt;`salary`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`emp_no`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;`salary`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The salaries table can contain the same employee multiple times: each time an employee's salary changes, a new row is added to the salaries table.&lt;/p&gt;
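
&lt;p&gt;To get a feel for how many rows an employee can have, a query along these lines can be run against the sample database (salary_rows is just an illustrative alias):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Count how many salary rows each employee has, highest first
SELECT emp_no, COUNT(*) AS salary_rows
FROM salaries
GROUP BY emp_no
ORDER BY salary_rows DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;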

&lt;h2&gt;
  
  
  The task
&lt;/h2&gt;

&lt;p&gt;The task for this query is to return a unique list of employees (emp_no, first_name, last_name) who earn over $50,000 a year.&lt;/p&gt;

&lt;p&gt;Along with selecting the data, we will need to ensure there are no duplicate employees.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using DISTINCT
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In general, the use of DISTINCT is an indication that the query could be written better. &lt;/p&gt;

&lt;p&gt;DISTINCT fetches all the possible rows, and at the end of the query process, strips out duplicate rows it doesn't need. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DISTINCT is applied across all selected columns. This means it's possible to return the same employee more than once in some cases.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An example of when this could occur would be if we included a column whose value differs between an employee's rows, for example &lt;strong&gt;salary&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query Execution Plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-&amp;gt; Table scan on &amp;lt;temporary&amp;gt;  (cost=241946..245972 rows=321886)
   └─&amp;gt; Temporary table with deduplication  (cost=241946..241946 rows=321886)
      └─&amp;gt; Nested loop inner join  (cost=209757 rows=321886)
         ├─&amp;gt; Filter: (salaries.salary &amp;gt; 50000)  (cost=97097 rows=321886)
         │  └─&amp;gt; Index scan on salaries using salary  (cost=97097 rows=965756)
         └─&amp;gt; Single-row index lookup on employees using PRIMARY (emp_no=salaries.emp_no)  (cost=0.25 rows=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution plan shows the use of a temporary table and a high cost. Queries that need a temporary table are generally slower. Temporary tables are necessary at times, but if you can find a way to query without one, it's generally going to be more efficient.&lt;/p&gt;
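
&lt;p&gt;If you want to produce a plan like this for your own queries, one option is the tree format of EXPLAIN. This is a sketch that assumes MySQL 8.0.16 or later, where FORMAT=TREE is available:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- FORMAT=TREE prints the estimated plan without running the query
EXPLAIN FORMAT=TREE
SELECT DISTINCT
    employees.emp_no,
    first_name,
    last_name
FROM
    employees
    INNER JOIN salaries USING (emp_no)
WHERE
    salary &amp;gt; 50000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On MySQL 8.0.18 or later, EXPLAIN ANALYZE goes a step further: it runs the query and adds the actual timings and row counts to the plan.&lt;/p&gt;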

&lt;p&gt;&lt;strong&gt;Average response time:&lt;/strong&gt; 745ms&lt;/p&gt;

&lt;h2&gt;
  
  
  Using GROUP BY
&lt;/h2&gt;

&lt;p&gt;A common method of ensuring unique employees is to use GROUP BY.&lt;/p&gt;

&lt;p&gt;GROUP BY can be faster than DISTINCT, because in some cases it avoids a separate final step of removing duplicates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query Execution Plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-&amp;gt; Table scan on &amp;lt;temporary&amp;gt;  (cost=241946..245972 rows=321886)
   └─&amp;gt; Temporary table with deduplication  (cost=241946..241946 rows=321886)
      └─&amp;gt; Nested loop inner join  (cost=209757 rows=321886)
         ├─&amp;gt; Filter: (salaries.salary &amp;gt; 50000)  (cost=97097 rows=321886)
         │  └─&amp;gt; Index scan on salaries using salary  (cost=97097 rows=965756)
         └─&amp;gt; Single-row index lookup on employees using PRIMARY (emp_no=salaries.emp_no)  (cost=0.25 rows=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the GROUP BY is slightly faster than DISTINCT, the execution plan is the same. With identical plans, a difference this small (about 3%) generally comes down to optimizer internals, caching and ordinary run-to-run variation.&lt;/p&gt;

&lt;p&gt;While execution plans are very useful, they don't always give you the whole story of what is going on internally, which leads to subtle differences between queries that might have the same execution plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average response time:&lt;/strong&gt; 721ms&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Subquery
&lt;/h2&gt;

&lt;p&gt;While subqueries are often viewed as less efficient, there are times when they can reduce the row count, which can make queries faster.&lt;/p&gt;

&lt;p&gt;In this case, we are going to use a subquery to find the employee numbers where salary is over $50,000&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;emp_no&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
            &lt;span class="n"&gt;emp_no&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;salaries&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt;
            &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this method, the query time drops significantly.&lt;/p&gt;

&lt;p&gt;Query Execution Plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-&amp;gt; Nested loop inner join  (cost=89029 rows=33961)
   ├─&amp;gt; Remove duplicates from input sorted on salary  (cost=5161 rows=33961)
   │  └─&amp;gt; Filter: (salaries.salary &amp;gt; 50000)  (cost=5161 rows=33961)
   │     └─&amp;gt; Index scan on salaries using salary  (cost=5161 rows=965756)
   └─&amp;gt; Single-row index lookup on employees using PRIMARY (emp_no=salaries.emp_no)  (cost=80472 rows=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you will see that the query is no longer using a temporary table, and is using a much simpler plan, with a much lower cost value.&lt;/p&gt;

&lt;p&gt;These factors lead to a faster response time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average response time:&lt;/strong&gt; 234ms&lt;/p&gt;

&lt;p&gt;While using a subquery significantly improved the query performance, we may be able to achieve better results by using the EXISTS operator, which offers some advantages over the IN operator used in the subquery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using EXISTS
&lt;/h2&gt;

&lt;p&gt;When using EXISTS, the subquery terminates early once it finds a match. In this case, it stops checking an employee's salary rows as soon as it finds one over $50,000.&lt;/p&gt;

&lt;p&gt;While there are multiple rows in the salaries table for an employee, it does not need to continue checking if that specific employee exists if it has found a matching row, so it stops looking for the employee and moves onto looking for the next one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
            &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
            &lt;span class="n"&gt;salaries&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt;
            &lt;span class="n"&gt;salaries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emp_no&lt;/span&gt;
            &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;em&gt;SELECT 1&lt;/em&gt; in this query because EXISTS only returns TRUE or FALSE, not what the row contains.&lt;/p&gt;

&lt;p&gt;While we could use &lt;em&gt;SELECT emp_no&lt;/em&gt; or &lt;em&gt;SELECT *&lt;/em&gt;, returning a constant makes the intent of the query clearer, and in some cases, can be more efficient. &lt;/p&gt;

&lt;p&gt;Query Execution Plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-&amp;gt; Nested loop inner join  (cost=89029 rows=33961)
   ├─&amp;gt; Remove duplicates from input sorted on salary  (cost=5161 rows=33961)
   │  └─&amp;gt; Filter: (salaries.salary &amp;gt; 50000)  (cost=5161 rows=33961)
   │     └─&amp;gt; Index scan on salaries using salary  (cost=5161 rows=965756)
   └─&amp;gt; Single-row index lookup on employees using PRIMARY (emp_no=salaries.emp_no)  (cost=80472 rows=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



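&lt;p&gt;While this query plan is the same as the subquery query plan, the measured execution time was slightly lower. With identical plans, a gap this small may be run-to-run variation rather than a benefit of early termination, so it is worth measuring on your own data.&lt;/p&gt;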

&lt;p&gt;&lt;strong&gt;Average response time:&lt;/strong&gt; 220ms&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Distinct: 745ms&lt;br&gt;
Group By: 721ms&lt;br&gt;
Subquery: 234ms&lt;br&gt;
Exists  : 220ms&lt;/p&gt;

&lt;p&gt;Using subqueries is not always the most efficient querying method; however, in scenarios like this, it can significantly improve your query.&lt;/p&gt;

&lt;p&gt;While just changing the query can help fix slow queries, there are other optimizations that could be considered.&lt;/p&gt;

&lt;p&gt;Creating better indexes can also help resolve slow queries, but adding indexes should be reserved for times where rewriting the query doesn't help the query to be more efficient.&lt;/p&gt;
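
&lt;p&gt;As an illustration of what an index change could look like here: the existing salary index starts with emp_no, so the salary filter in the plans above has to scan the whole index. A hypothetical alternative is an index that leads with salary. The index name salary_first is made up for this example, and this was not measured in this article, so treat it as a starting point for your own testing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- salary_first is a hypothetical index name, not part of the sample database
ALTER TABLE salaries ADD INDEX salary_first (salary, emp_no);

-- Check whether the optimizer chooses the new index for the salary filter
EXPLAIN FORMAT=TREE
SELECT emp_no FROM salaries WHERE salary &amp;gt; 50000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every extra index adds cost to inserts and updates, which is one reason to try rewriting the query first.&lt;/p&gt;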

&lt;p&gt;It's important to try out different query strategies on your own data. While EXISTS was the most efficient strategy when querying this dataset, results may differ on other datasets, so try out a variety of queries and see which one works best for you.&lt;/p&gt;
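
&lt;p&gt;When comparing strategies, it's also worth confirming that each rewrite returns the same employees. A simple sanity check is to compare counts. This sketch assumes every emp_no in salaries has a matching row in employees, and employees_over_50k is just an illustrative alias:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Number of distinct employees with at least one salary row over 50000
SELECT COUNT(DISTINCT emp_no) AS employees_over_50k
FROM salaries
WHERE salary &amp;gt; 50000;

-- Row count from the EXISTS version; the two numbers should match
SELECT COUNT(*) AS employees_over_50k
FROM employees
WHERE EXISTS (
    SELECT 1
    FROM salaries
    WHERE salaries.emp_no = employees.emp_no
      AND salary &amp;gt; 50000);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;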

</description>
      <category>mysql</category>
      <category>sql</category>
      <category>mariadb</category>
    </item>
    <item>
      <title>Slim and Flight PHP Framework Comparison</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Sat, 14 Sep 2024 00:17:18 +0000</pubDate>
      <link>https://forem.com/mrpercival/slim-and-flight-php-framework-comparison-17am</link>
      <guid>https://forem.com/mrpercival/slim-and-flight-php-framework-comparison-17am</guid>
      <description>&lt;h2&gt;
  
  
  Why use a micro framework?
&lt;/h2&gt;

&lt;p&gt;On social media, new PHP devs often ask "What framework should I use for my project?" and the answers given are generally "Laravel" or "Symfony".&lt;/p&gt;

&lt;p&gt;While these are both good options, the right answer to this question should be "What do you need the framework to do?"&lt;/p&gt;

&lt;p&gt;The right framework should be one that does what you need it to, without loads of features you will never use. &lt;/p&gt;

&lt;p&gt;If you are making a website with one route, using Laravel or Symfony would be over-engineering the site. For a complex site, Laravel or Symfony may be the right choice.&lt;/p&gt;

&lt;p&gt;Micro frameworks are great for building small to medium sized sites that don't need all of the features a full stack framework provides. &lt;/p&gt;

&lt;p&gt;While there are many, Slim and Flight PHP are both great examples of micro frameworks.&lt;/p&gt;




&lt;p&gt;Recently I built a small website that asks the user to solve 10 database related questions. It had three routes, and some basic queries to fetch the questions and compare the answers. &lt;/p&gt;

&lt;p&gt;For a small project like this, a micro framework is a great choice. I built the site on both Slim and Flight PHP to compare them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skeletons
&lt;/h2&gt;

&lt;p&gt;If you haven't used a particular framework before, using the provided skeleton project is generally a great place to start. &lt;/p&gt;

&lt;p&gt;Flight PHP's skeleton project is pretty much what I expected: lightweight, with a simple MVC setup and a folder structure that makes it easy to know where everything should go in the project.&lt;/p&gt;

&lt;p&gt;For someone new to the framework, the learning curve to getting up and running is minimal.&lt;/p&gt;

&lt;p&gt;It is light on Composer libraries: just 5 in total (including the core library), 4 used in production.&lt;/p&gt;

&lt;p&gt;The production size of the skeleton was 1.6MB.&lt;/p&gt;

&lt;p&gt;Slim's skeleton project surprised me. The directory structure was more complex than I had anticipated, geared more towards a larger project than a small one. For a micro framework, this wasn't expected.&lt;/p&gt;

&lt;p&gt;The Slim skeleton was a bit heavier than Flight PHP's: 21 Composer libraries, 9 used in production. The production size of the project was 3.3MB.&lt;/p&gt;

&lt;p&gt;Both worked out of the box with minimal additional configuration needed. &lt;/p&gt;

&lt;h2&gt;
  
  
  Building From Scratch
&lt;/h2&gt;

&lt;p&gt;Instead of using the skeletons, I decided to build the sites by creating my own setup. The advantage of doing this is that I was able to tailor the frameworks to suit my needs, and see how flexible they were to different structures.&lt;/p&gt;

&lt;p&gt;One of the big advantages of using micro frameworks is being able to build them to do exactly what you need without unnecessary overhead, adding features and libraries as they become needed.&lt;/p&gt;

&lt;p&gt;My setup with Flight PHP wasn't significantly different from the skeleton. While I did end up with fewer directories and different Composer libraries, structurally it was similar.&lt;/p&gt;

&lt;p&gt;With Slim, the structure of the project ended up significantly different from the skeleton. &lt;/p&gt;

&lt;p&gt;It was nice that Slim was flexible and wasn't making assumptions about structure and worked just fine with a completely different structure than the skeleton.&lt;/p&gt;

&lt;p&gt;Flight PHP is also flexible in this way, allowing for more complex structures if needed. Adding new libraries into the framework was straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Routing&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From a routing point of view, both were nice to work with. They were both easy to set up without much documentation reading necessary. &lt;/p&gt;

&lt;p&gt;Routes in Flight PHP were slightly simpler to set up than in Slim, and used less code to do so, but neither was difficult to set up.&lt;/p&gt;

&lt;p&gt;Routing groups, regex abilities and middleware options made routes flexible while still being easy to work with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Database Connections&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With Slim, the expectation is that you should use an ORM like &lt;strong&gt;&lt;a href="https://github.com/illuminate/database" rel="noopener noreferrer"&gt;Eloquent&lt;/a&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;a href="https://www.doctrine-project.org" rel="noopener noreferrer"&gt;Doctrine&lt;/a&gt;&lt;/strong&gt; for your database queries. Flight PHP provides a simple wrapper for PDO that can be used if you need it, and optionally &lt;strong&gt;&lt;a href="https://docs.flightphp.com/awesome-plugins/active-record" rel="noopener noreferrer"&gt;Active Record&lt;/a&gt;&lt;/strong&gt; can be added to the project for query building.&lt;/p&gt;

&lt;p&gt;For a small project like the one I was working on, using an ORM seemed to be a bit more than necessary, so I ended up building a small PDO wrapper class for Slim, similar to the one that comes built into Flight PHP.&lt;/p&gt;

&lt;p&gt;ORMs are great, but having the flexibility built in to choose how I wish to code database queries is a good feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;General Coding&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both Slim and Flight PHP Frameworks are good at allowing you to write code your own way. &lt;/p&gt;

&lt;p&gt;Some frameworks tend to force you into coding a specific way and at times it can feel like you are fighting against the framework. &lt;/p&gt;

&lt;p&gt;Frameworks should work with you not against you, and both of these felt like they were working with me.&lt;/p&gt;

&lt;p&gt;Slim also provides a number of handy add-ons including &lt;a href="https://github.com/slimphp/Slim-Csrf/" rel="noopener noreferrer"&gt;CSRF integration&lt;/a&gt; and &lt;a href="https://github.com/slimphp/Slim-HttpCache" rel="noopener noreferrer"&gt;HTTP caching&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Flight PHP provides additional add-ons including &lt;a href="https://docs.flightphp.com/awesome-plugins/permissions" rel="noopener noreferrer"&gt;Permissions&lt;/a&gt; and &lt;a href="https://docs.flightphp.com/awesome-plugins/active-record" rel="noopener noreferrer"&gt;Active Record&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All of these add-ons are helpful additions that save you from having to use 3rd party solutions or build your own.&lt;/p&gt;

&lt;p&gt;Returning JSON as a response is cleaner in Flight PHP than it is in Slim. Slim 3 had a convenient &lt;code&gt;withJson&lt;/code&gt; response method. Slim 4 adheres more closely to PSR-7, which means that building a JSON response requires more code.&lt;/p&gt;

&lt;p&gt;If I was going to be using JSON responses a lot, I would likely create a wrapper to make it more convenient while still adhering to the PSR-7 standard. &lt;/p&gt;

&lt;p&gt;This is a significant difference between the two frameworks: Slim feels like it needs to be tailored more by creating classes to clean up and simplify the codebase, while Flight PHP has already done this for you.&lt;/p&gt;

&lt;p&gt;Slim provides a number of helper middleware. The middleware is required in order to make some features work. &lt;/p&gt;

&lt;p&gt;An example of this is sending data from JavaScript using &lt;code&gt;fetch&lt;/code&gt;. Slim has a method &lt;code&gt;getParsedBody&lt;/code&gt; to create a data array from the POST request.&lt;/p&gt;

&lt;p&gt;However, in order to use it, the &lt;code&gt;addBodyParsingMiddleware&lt;/code&gt; needs to be added to the app.&lt;/p&gt;

&lt;p&gt;It's a little bit of a trap for new devs, but it also keeps features optional, which can lower the framework's overall footprint by only enabling the features you need.&lt;/p&gt;

&lt;p&gt;Flight PHP achieves this through a config file. Some features can be turned on and off through the config rather than by enabling middleware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Speed Tests&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;According to benchmarks, comparing the two has interesting results: Slim edges out Flight PHP in some areas, while Flight PHP edges out Slim in others.&lt;/p&gt;

&lt;p&gt;Putting the two frameworks to a test on my own code showed that Flight PHP had faster and more consistent response times than Slim. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Front End&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3oe2dhk8yh94fdnpl4jo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3oe2dhk8yh94fdnpl4jo.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;GET request returning JSON&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqmmctgo9feu7z837obk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqmmctgo9feu7z837obk.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;POST request returning JSON&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn0wnxm704l8464qbqw4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn0wnxm704l8464qbqw4.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What I found noteworthy was the outlier spikes when using Slim.&lt;/p&gt;

&lt;p&gt;Running these tests multiple times produced similar results each time to the ones I have shown above, with generally good response times for both but with outlier spikes in Slim that didn't occur when testing Flight PHP, and Flight PHP generally having better response times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you haven't ventured into micro frameworks, give them a go. There are a few out there, and it can be a great learning experience to try them out and see what you like and what you don't like in each one.&lt;/p&gt;

&lt;p&gt;Both Slim and Flight PHP are great micro frameworks.&lt;/p&gt;

&lt;p&gt;Slim is a solid framework with some nice-to-have features that will work quietly for you. &lt;/p&gt;

&lt;p&gt;Flight PHP is lighter weight, and its simplicity makes learning the framework really easy. &lt;/p&gt;

&lt;p&gt;Good response times and simpler code to achieve the same thing make Flight PHP a really good choice of micro framework.&lt;/p&gt;

&lt;p&gt;After putting these two side by side, I do prefer Flight PHP over Slim, but as with any framework, give it a go and see if it works for you.&lt;/p&gt;

&lt;p&gt;After all, the right framework is a framework that does what you need it to do. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://flightphp.com" rel="noopener noreferrer"&gt;Flight PHP&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.slimframework.com" rel="noopener noreferrer"&gt;Slim Framework&lt;/a&gt;&lt;/p&gt;

</description>
      <category>php</category>
      <category>beginners</category>
      <category>flightphp</category>
      <category>slimframework</category>
    </item>
    <item>
      <title>Web Developer Burnout</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Thu, 15 Aug 2024 14:00:00 +0000</pubDate>
      <link>https://forem.com/mrpercival/web-developer-burnout-phk</link>
      <guid>https://forem.com/mrpercival/web-developer-burnout-phk</guid>
      <description>&lt;h2&gt;
  
  
  A Little Background
&lt;/h2&gt;

&lt;p&gt;Back when I started my web developer journey, everything was much simpler than it is today. &lt;/p&gt;

&lt;p&gt;I started out using Mosaic web browser and then not too long after moved to the beta version of Netscape. &lt;/p&gt;

&lt;p&gt;Developers at that time learned things together, as new features were added into the browser, we would share how we used them over a snack at the local bakery. &lt;/p&gt;

&lt;p&gt;This led to a decent amount of shared learning. Back then there were basically two things that were handy to know, HTML and Perl. &lt;/p&gt;

&lt;p&gt;HTML for the web design and if there was any web form processing, we would use Perl. &lt;/p&gt;

&lt;p&gt;While Perl has other more beneficial uses, just knowing enough to process a web form was enough for a lot of use cases. &lt;/p&gt;

&lt;p&gt;At the time, database administration was a more specialized job, MySQL and Postgres were not yet around, so the job of a web developer was more concentrated on HTML, Perl and Web Design. &lt;/p&gt;

&lt;p&gt;Not long after, CSS and JavaScript were introduced, and as the 90s rolled on, Cold Fusion, PHP and others were added in. &lt;/p&gt;

&lt;p&gt;PHP was simplistic, and for me, was the next logical choice, as was dipping my toes into databases now that they were more accessible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Web Development Shift
&lt;/h2&gt;

&lt;p&gt;During the next few years, there was a large shift in web design and development. PHP and other server side scripting became more polished and accessible.&lt;/p&gt;

&lt;p&gt;While Perl was still relevant, other options like Python and Ruby gained popularity.&lt;/p&gt;

&lt;p&gt;There was also a shift in front end web design. Faster internet speeds opened up the internet to more graphic heavy design, and web standards were developed and tuned.&lt;/p&gt;

&lt;p&gt;While developers were busy learning new back end technologies, front end development matured into more complex designs requiring better graphic design skills. &lt;/p&gt;

&lt;p&gt;This split web development into back end and front end specialists. &lt;/p&gt;

&lt;h2&gt;
  
  
  Burnout In Developers
&lt;/h2&gt;

&lt;p&gt;Fast forward to today, where it's all too common to see posts on social media, especially from junior developers suffering from burnout. &lt;/p&gt;

&lt;p&gt;Polls taken have shown that up to 83% of developers have suffered burnout, with high work load the most common reason. &lt;/p&gt;

&lt;p&gt;High work load can be a combination of potentially long work hours and pressures to spend time after work learning more just to keep up. &lt;/p&gt;

&lt;p&gt;Back when I started, the journey tended to lead itself, where new technologies were introduced at a rate where learning the new technology was just a natural progression, we were eased into learning new things. &lt;/p&gt;

&lt;p&gt;I still believe that taking time to introduce yourself to a new technology and concentrate on it has a better outcome than introducing multiple technologies at once, where you have to split your time between them. &lt;/p&gt;

&lt;p&gt;In a lot of cases, the pressure to learn multiple technologies to keep up, while balancing workplace expectations, leads to developer burnout and often leads to really good developers leaving the industry for simpler pastures. &lt;/p&gt;




&lt;p&gt;New developers are bombarded with a large range of technologies, technologies that are ever changing, as are coding standards around these technologies. &lt;/p&gt;

&lt;p&gt;Even within technologies there are multiple frameworks and libraries  to learn, while trying to grasp the language itself, best practices, coding styles, and design patterns.&lt;/p&gt;

&lt;p&gt;A look on Indeed at jobs listed as junior developer jobs shows jobs with vastly different tech stack requirements, leading new developers to ask "where should I start?".&lt;/p&gt;

&lt;p&gt;Adding to the problem is that the answers they receive can be wildly different, depending on the responder's own developer journey. &lt;/p&gt;

&lt;p&gt;There is not just one clear path to take, leaving junior developers needing to learn many technologies quickly to be successful. &lt;/p&gt;

&lt;p&gt;To progress, they need opportunities to learn on the job but also take the initiative to continue learning outside work hours, stretching them thin.&lt;/p&gt;

&lt;p&gt;Even as an experienced developer, the need to continually learn still remains. Those who don't spend time learning new things eventually get left behind. &lt;/p&gt;

&lt;p&gt;One developer I recently spoke to had been working his 8 hour days, then spending 3 to 5 hours a night on learning. Wanting to learn is a great ambition to have, but this pace leads to burnout very quickly. We need time to rest and absorb.&lt;/p&gt;

&lt;p&gt;A good balance between learning what we need to learn and our own health and well-being can be hard to find.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Burnout factors
&lt;/h2&gt;

&lt;p&gt;Along with burnout from trying to keep up and meet expectations from the work place, burnout can also come from other factors. &lt;/p&gt;

&lt;p&gt;Home life can affect your developer life. &lt;/p&gt;

&lt;p&gt;When you are dealing with life difficulties, mentally, there is only so much that you can do.&lt;/p&gt;

&lt;p&gt;The needs of the home outweigh development (as they should). At some point you just reach a breaking point.&lt;/p&gt;

&lt;p&gt;How others talk to you at work or online can lead to burnout. There are only so many negative code reviews that could have been written in a more positive manner, only so many times you can do a good job that goes unnoticed, before it takes its toll. &lt;/p&gt;

&lt;p&gt;It's not that developers sit around waiting for appreciation, it's just that hearing it may be the inspiration they needed to keep going.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can we help?
&lt;/h2&gt;

&lt;p&gt;Developer burnout is multi-faceted, but we can all do our bit to keep our fellow developers from suffering, and prevent good developers from leaving the industry. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be willing to share your knowledge&lt;/strong&gt; &lt;br&gt;
Even as new developers, share what you just learned, your excitement can be contagious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have patience with developers&lt;/strong&gt;&lt;br&gt;
While there might be gaps in their knowledge, there are going to be days where they return the favor and are able to teach you something. &lt;/p&gt;

&lt;p&gt;Even experienced developers have gaps, don't look down on them because of their gaps. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be mindful&lt;/strong&gt; &lt;br&gt;
While developers may write code that is not always great, they are learning, and I doubt there is a developer out there who hasn't been where they are and written code they are not proud of. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Reviews&lt;/strong&gt;&lt;br&gt;
While at times code reviews need to be negative, there are ways to write them that lessen the blow to the developer. &lt;/p&gt;

&lt;p&gt;When writing them, think about who you are reviewing, take the time to get to know the person if you can, it will help connect with them in a code review.&lt;/p&gt;

&lt;p&gt;Often in code reviews, we are so fixated on finding what is wrong with the code, we forget to mention what is right with it. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give praise where praise is due&lt;/strong&gt;&lt;br&gt;
Praise doesn't need to just be for big wins, praise for small wins may just be what the developer needed to hear that day. &lt;/p&gt;

&lt;p&gt;It can be as simple as telling someone they did a really good job with something, even if there are flaws in their code, there are still positives that can be highlighted.&lt;/p&gt;

&lt;p&gt;With the challenges new developers face, how we handle our interaction with them can make or break them as a developer. &lt;/p&gt;

&lt;p&gt;This still applies to interacting with experienced developers, the challenges they face may be different from the challenges new developers face, but our interaction can just as easily make or break them as a developer. &lt;/p&gt;

&lt;p&gt;You don't know what else might be going on in a developer's life, and treating people with empathy can help them more than you will ever know.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can we help ourselves?
&lt;/h2&gt;

&lt;p&gt;We also need to remember to take care of ourselves to avoid burnout. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take time for yourself&lt;/strong&gt;&lt;br&gt;
While there may be pressure to learn more and do more, taking time away will rejuvenate you. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Switch off for a bit&lt;/strong&gt;&lt;br&gt;
Do something you like to do that doesn't involve tech. &lt;/p&gt;

&lt;p&gt;For me it's turning off the computer, putting my phone on silent and spending time outdoors, the beach if I can, lakes, forests and waterfalls if I can't get to a beach. A place where I can empty my head a bit.&lt;/p&gt;

&lt;p&gt;Burnout tends to come from either pressure being put on you, or putting pressure on yourself. Take that load off where you can, so you can rest and reset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do something different&lt;/strong&gt;&lt;br&gt;
Learn something new just for yourself, or work on a new side project, something where you can do what you love doing but without any pressures.&lt;/p&gt;

&lt;p&gt;The journey back from burnout is made of small steps. Gaining confidence back in yourself, confidence in your skill set.&lt;/p&gt;

&lt;p&gt;Learning to once again love what you do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find the right balance&lt;/strong&gt; &lt;br&gt;
Find the right balance to be able to still learn, but learn at your own pace. &lt;/p&gt;

&lt;p&gt;Learning at your own pace creates a more enjoyable and rewarding experience. It can also help prevent burnout while still improving your skillset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Developer burnout is a serious issue, affecting both newcomers and experienced professionals. &lt;/p&gt;

&lt;p&gt;The rapidly evolving tech landscape, coupled with high expectations and personal pressure, can be overwhelming. &lt;/p&gt;

&lt;p&gt;Creating a supportive environment, using empathy in our feedback to other developers, and prioritizing our own well-being can help with reducing burnout. &lt;/p&gt;

&lt;p&gt;It is important to continually learn in this industry, but do what works for you. Learn at a pace that ensures you remain in a good space. &lt;/p&gt;

&lt;p&gt;What that pace looks like will differ from person to person. What's right for someone else isn't necessarily right for you. &lt;/p&gt;

&lt;p&gt;You are more beneficial to both your employer and your family if you are mentally and physically feeling 100%.&lt;/p&gt;

&lt;p&gt;If you are struggling, talk to someone about it. &lt;/p&gt;

&lt;p&gt;A fellow developer who you trust, your manager at work if you have a good rapport with them. &lt;/p&gt;

&lt;p&gt;Someone who might have experienced a similar problem and worked through it. They may be able to offer you good advice and insights to help you get back on your feet again. &lt;/p&gt;

</description>
      <category>developers</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>programmers</category>
    </item>
    <item>
      <title>Creating Custom Functions In PostgreSQL</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Thu, 25 Jul 2024 13:19:46 +0000</pubDate>
      <link>https://forem.com/mrpercival/creating-custom-functions-in-postgresql-52bn</link>
      <guid>https://forem.com/mrpercival/creating-custom-functions-in-postgresql-52bn</guid>
      <description>&lt;p&gt;In PostgreSQL, custom functions can be created to solve complex problems.&lt;/p&gt;

&lt;p&gt;These can be written using the default PL/pgSQL scripting language, or they can be written in another scripting language. &lt;/p&gt;

&lt;p&gt;Python, Perl, Tcl and R are some of the scripting languages supported. &lt;/p&gt;

&lt;p&gt;While PL/pgSQL comes with any Postgres installation, using other languages requires some setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the extension
&lt;/h2&gt;

&lt;p&gt;Before an extension can be used, the extension package needs to be installed.&lt;/p&gt;

&lt;p&gt;On Ubuntu you would run:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perl&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get -y install postgresql-plperl-14
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The package name 'postgresql-plperl-14' is specific to PostgreSQL version 14. If you're using a different version of PostgreSQL, you need to change the version number in the package name to match your installed PostgreSQL version.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Python 3&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get install postgresql-plpython3-14
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Activating the extension
&lt;/h2&gt;

&lt;p&gt;To activate the extension in PostgreSQL the extension must be defined using the &lt;code&gt;CREATE EXTENSION&lt;/code&gt; statement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perl&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;plperl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;plpython3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hello world example
&lt;/h2&gt;

&lt;p&gt;Once the extension has been created, a custom function can be created using the extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perl&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;"Hello, $name!"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plperl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;"Hello, "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;"!"&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpython3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking this down line by line&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line is how a function is created in Postgres. By using CREATE OR REPLACE, it will overwrite whatever function is already defined with the name &lt;strong&gt;hello&lt;/strong&gt; with the new function. &lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;CREATE FUNCTION hello(name text)&lt;/code&gt; will prevent the function from overwriting an existing function and will error if the function already exists. &lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This defines which Postgres data type will be returned. The data type specified must be a type recognized by Postgres. A custom data type can be specified if it has already been defined. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$$&lt;/strong&gt; is a delimiter that marks the beginning and end of a block of code. In this line it marks the start of the code block.&lt;/p&gt;

&lt;p&gt;All code between the start and end $$ will be executed by Postgres.&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plperl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;$$&lt;/strong&gt; denotes the end of the script, and the LANGUAGE clause that follows tells Postgres what language the script should be parsed as.&lt;/p&gt;
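
&lt;p&gt;If the function body itself needs to contain &lt;code&gt;$$&lt;/code&gt;, a tag can be placed between the dollar signs so the body is not ended early. The sketch below is illustrative only; the function name &lt;code&gt;price_label&lt;/code&gt; and the &lt;code&gt;$body$&lt;/code&gt; tag are made up for this example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical example: $body$ is a tagged delimiter, so the $$ inside
-- the Perl string does not terminate the function body.
CREATE OR REPLACE FUNCTION price_label(amount text)
RETURNS text AS $body$
    my ($amount) = @_;
    # Single-quoted Perl string, so $$ is kept as literal text.
    return 'Budget tier ($$): ' . $amount;
$body$ LANGUAGE plperl;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;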

&lt;h2&gt;
  
  
  Using the function
&lt;/h2&gt;

&lt;p&gt;Functions can be used like any built-in Postgres function&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'world'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will return a column with the value &lt;code&gt;Hello, world!&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Functions can be part of more complex queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'world'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;greeting&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  More complex example
&lt;/h2&gt;

&lt;p&gt;Here is an example function that accepts text from a field and returns a word count.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paragraph&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="k"&gt;strict&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;+/&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scalar&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'{'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="s1"&gt;'"word_count":'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="s1"&gt;'}'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plperl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns a JSON formatted result with the word count.&lt;/p&gt;




&lt;p&gt;We can add more detailed statistics to the function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paragraph&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="k"&gt;strict&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;+/&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scalar&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;sentence_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="n"&gt;tr&lt;/span&gt;&lt;span class="o"&gt;/!?&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/!?&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;average_words_per_sentence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;sentence_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;sentence_count&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;my&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'{'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="s1"&gt;'"word_count":'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="s1"&gt;'"sentence_count":'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;sentence_count&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="s1"&gt;'"average_words_per_sentence":"'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"%.2f"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;average_words_per_sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="s1"&gt;'}'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plperl&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt; &lt;span class="k"&gt;DEFINER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when we use it in a query&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_field&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will return JSON like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"word_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;116&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"sentence_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"average_words_per_sentence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"7.73"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
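
&lt;p&gt;Because the function returns the &lt;code&gt;json&lt;/code&gt; type, individual values can be pulled out with the JSON operators built into Postgres. This is a minimal sketch that assumes a hypothetical &lt;code&gt;articles&lt;/code&gt; table with a &lt;code&gt;body&lt;/code&gt; text column; swap in your own table and field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical table "articles" and column "body", for illustration only.
-- The subquery calls word_count once per row; the outer query uses
-- the json operator that returns a field as text.
SELECT stats -&amp;gt;&amp;gt; 'word_count' AS words,
       stats -&amp;gt;&amp;gt; 'sentence_count' AS sentences
FROM (SELECT word_count(body) AS stats FROM articles) AS counted;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;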



&lt;h2&gt;
  
  
  Security considerations
&lt;/h2&gt;

&lt;p&gt;When using custom functions or external scripting languages, there are additional security considerations to take into account. It can be a juggling act to get the right balance between usability and security. &lt;/p&gt;

&lt;h3&gt;
  
  
  Security Definer vs Security Invoker
&lt;/h3&gt;

&lt;p&gt;In the previous function, the SECURITY DEFINER option was added to the CREATE FUNCTION statement.&lt;/p&gt;

&lt;p&gt;It's important to think about how you want a function to run from a security point of view.&lt;/p&gt;

&lt;p&gt;The default behavior is to use SECURITY INVOKER. This will run the function with the privileges of the user who is running the function.&lt;/p&gt;

&lt;p&gt;SECURITY DEFINER changes whose privileges apply. In this mode, the function runs with the privileges of the user who owns the function, which by default is the user who created it.&lt;/p&gt;

&lt;p&gt;This can be both good and bad. If a function is owned by a user with limited privileges, there is little harm that can be done to the database.&lt;/p&gt;

&lt;p&gt;If the function is owned by a user with high access privileges, the function will run with those same privileges. Depending on the type of function, this could allow a user to run the function with broader privileges than they have been granted.&lt;/p&gt;

&lt;p&gt;There are times when this is useful. For example, if a user does not have read privileges on a table but the function needs to read from it, SECURITY DEFINER lets the function run with the read privileges it requires.&lt;/p&gt;
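
&lt;p&gt;As a minimal sketch of that last case, the statements below create a SECURITY DEFINER function that reads a table the caller cannot read directly. The &lt;code&gt;public.salaries&lt;/code&gt; table, its &lt;code&gt;amount&lt;/code&gt; column, and the &lt;code&gt;report_reader&lt;/code&gt; role are hypothetical names used only for illustration. Pinning &lt;code&gt;search_path&lt;/code&gt; and revoking the default PUBLIC execute privilege are common precautions for this kind of function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;BEGIN;

-- Hypothetical table, column, and role names; adjust to your schema.
CREATE FUNCTION salary_total() RETURNS numeric AS $$
    SELECT COALESCE(SUM(amount), 0) FROM public.salaries;
$$ LANGUAGE sql
   SECURITY DEFINER
   SET search_path = pg_catalog, pg_temp;

-- Functions are executable by PUBLIC by default, so restrict who can call it.
REVOKE EXECUTE ON FUNCTION salary_total() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION salary_total() TO report_reader;

COMMIT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Creating the function and adjusting its privileges in one transaction avoids a window where any user could call it.&lt;/p&gt;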




&lt;h3&gt;
  
  
  Trusted and untrusted extensions
&lt;/h3&gt;

&lt;p&gt;When creating the extensions above, &lt;code&gt;plperl&lt;/code&gt; was used for the Perl function. It is a trusted extension, and in most circumstances a trusted extension is the right choice.&lt;/p&gt;

&lt;p&gt;Trusted extensions have restricted access to the server's file system and system calls.&lt;/p&gt;

&lt;p&gt;Extensions can also be created with a &lt;strong&gt;u&lt;/strong&gt; suffix (plperlu, plpython3u).&lt;/p&gt;

&lt;p&gt;These are untrusted extensions and allow more access to the server's file system. PostgreSQL only ships PL/Python in untrusted form, so for Python &lt;code&gt;plpython3u&lt;/code&gt; is the only option.&lt;/p&gt;

&lt;p&gt;There may be cases where an untrusted extension is required, for example, if you want to use Perl modules, Python libraries, or system calls.&lt;/p&gt;

&lt;p&gt;In the example above, the JSON output was built as a string. If desired, the Perl JSON module could have been used to encode the data instead, but loading that module requires the untrusted extension.&lt;/p&gt;
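
&lt;p&gt;A rough sketch of what that could look like is shown below. The function name and its parameters are hypothetical, and &lt;code&gt;JSON::PP&lt;/code&gt; is used here because it is bundled with core Perl; it is an assumption, not the module the original function would necessarily have used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Creating an untrusted language and its functions requires superuser rights.
CREATE EXTENSION IF NOT EXISTS plperlu;

-- Hypothetical function; JSON::PP is bundled with core Perl.
CREATE FUNCTION stats_as_json(words integer, sentences integer) RETURNS text AS $$
    use JSON::PP;
    my ($words, $sentences) = @_;
    # Adding 0 makes JSON::PP emit numbers rather than quoted strings.
    return JSON::PP-&amp;gt;new-&amp;gt;canonical-&amp;gt;encode(
        { word_count =&amp;gt; $words + 0, sentence_count =&amp;gt; $sentences + 0 }
    );
$$ LANGUAGE plperlu;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;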

&lt;p&gt;It's advisable to avoid the untrusted extensions. If they are necessary, use them with caution and understand the potential risks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If Perl is being used, Perl will run in &lt;code&gt;taint mode&lt;/code&gt; when the untrusted extension is in use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Being able to take advantage of Perl's advanced text processing and memory management, or Python's data analytics libraries, within PostgreSQL can be really powerful.&lt;/p&gt;

&lt;p&gt;Passing off complex tasks to tools more suited to handling the task can reduce overhead on the database.&lt;/p&gt;

&lt;p&gt;As always, when using custom functions and external scripting languages, take precautions to ensure secure usage.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>perl</category>
      <category>python</category>
    </item>
    <item>
      <title>Migrating from MySQL to PostgreSQL</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Thu, 11 Jul 2024 05:00:00 +0000</pubDate>
      <link>https://forem.com/mrpercival/migrating-from-mysql-to-postgresql-1oh7</link>
      <guid>https://forem.com/mrpercival/migrating-from-mysql-to-postgresql-1oh7</guid>
      <description>&lt;p&gt;Migrating a database from MySQL to Postgres is a challenging process. &lt;/p&gt;

&lt;p&gt;While MySQL and Postgres do a similar job, there are some fundamental  differences between them and those differences can create issues that need addressing for the migration to be successful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pgloader.io" rel="noopener noreferrer"&gt;Pg Loader&lt;/a&gt; is a tool that can be used to move your data to PostgreSQL, however, it's not perfect, but can work well in some cases. It's worth looking at to see if it's the direction you want to go. &lt;/p&gt;

&lt;p&gt;Another approach to take is to create custom scripts. &lt;/p&gt;

&lt;p&gt;Custom scripts offer greater flexibility and scope to address issues specific to your dataset.&lt;/p&gt;

&lt;p&gt;For this article, custom scripts were built to handle the migration process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting the data
&lt;/h2&gt;

&lt;p&gt;How the data is exported is critical to a smooth migration. Using mysqldump in its default setup will lead to a more difficult process.&lt;/p&gt;

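&lt;p&gt;Use the &lt;code&gt;--compatible=ansi&lt;/code&gt; option so that identifiers are exported in double quotes rather than backticks, which is the quoting style PostgreSQL understands.&lt;/p&gt;

&lt;p&gt;To make the migration easier to handle, split up the schema and data dumps so they can be processed separately (mysqldump's &lt;code&gt;--no-data&lt;/code&gt; and &lt;code&gt;--no-create-info&lt;/code&gt; options produce a schema-only and a data-only dump respectively). The processing requirements for each file are very different, and creating a script for each will make it more manageable.&lt;/p&gt;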

&lt;h2&gt;
  
  
  Schema differences
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Data Types
&lt;/h4&gt;

&lt;p&gt;MySQL and PostgreSQL do not offer the same data types. When processing your schema, you will need to decide which PostgreSQL data type works best for each field.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;MySQL&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Numeric&lt;/td&gt;
&lt;td&gt;INT, TINYINT, SMALLINT, MEDIUMINT, BIGINT, FLOAT, DOUBLE, DECIMAL&lt;/td&gt;
&lt;td&gt;INTEGER, SMALLINT, BIGINT, NUMERIC, REAL, DOUBLE PRECISION, SERIAL, SMALLSERIAL, BIGSERIAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT&lt;/td&gt;
&lt;td&gt;CHAR, VARCHAR, TEXT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date and Time&lt;/td&gt;
&lt;td&gt;DATE, TIME, DATETIME, TIMESTAMP, YEAR&lt;/td&gt;
&lt;td&gt;DATE, TIME, TIMESTAMP, INTERVAL, TIMESTAMPTZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary&lt;/td&gt;
&lt;td&gt;BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB&lt;/td&gt;
&lt;td&gt;BYTEA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean&lt;/td&gt;
&lt;td&gt;BOOLEAN (TINYINT(1))&lt;/td&gt;
&lt;td&gt;BOOLEAN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enum and Set&lt;/td&gt;
&lt;td&gt;ENUM, SET&lt;/td&gt;
&lt;td&gt;ENUM (no SET equivalent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;JSON, JSONB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Geometric&lt;/td&gt;
&lt;td&gt;GEOMETRY, POINT, LINESTRING, POLYGON&lt;/td&gt;
&lt;td&gt;POINT, LINE, LSEG, BOX, PATH, POLYGON, CIRCLE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network Address&lt;/td&gt;
&lt;td&gt;No built-in types&lt;/td&gt;
&lt;td&gt;CIDR, INET, MACADDR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UUID&lt;/td&gt;
&lt;td&gt;No built-in type (can use CHAR(36))&lt;/td&gt;
&lt;td&gt;UUID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Array&lt;/td&gt;
&lt;td&gt;No built-in support&lt;/td&gt;
&lt;td&gt;Supports arrays of any data type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;No built-in type&lt;/td&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Range Types&lt;/td&gt;
&lt;td&gt;No built-in support&lt;/td&gt;
&lt;td&gt;int4range, int8range, numrange, tsrange, tstzrange, daterange&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composite Types&lt;/td&gt;
&lt;td&gt;No built-in support&lt;/td&gt;
&lt;td&gt;User-defined composite types&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Tinyint field type
&lt;/h4&gt;

&lt;p&gt;Tinyint doesn't exist in PostgreSQL. You can replace it with either &lt;code&gt;smallint&lt;/code&gt; or &lt;code&gt;boolean&lt;/code&gt;. Choose the data type that best matches how the field is used in the current dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;s/\btinyint(?:\(\d+\))?\b/smallint/gi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
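
&lt;p&gt;If a field only ever holds 0 or 1, one option is to import it as &lt;code&gt;smallint&lt;/code&gt; and convert it to &lt;code&gt;boolean&lt;/code&gt; once the data is loaded. The sketch below assumes a hypothetical &lt;code&gt;users&lt;/code&gt; table with an &lt;code&gt;is_active&lt;/code&gt; field. Any numeric default is dropped first, because PostgreSQL cannot cast it to boolean automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical table and field; the field was imported as smallint holding 0 or 1.
ALTER TABLE "users" ALTER COLUMN "is_active" DROP DEFAULT;
ALTER TABLE "users" ALTER COLUMN "is_active" TYPE boolean
    USING ("is_active"::integer::boolean);
ALTER TABLE "users" ALTER COLUMN "is_active" SET DEFAULT false;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;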



&lt;h4&gt;
  
  
  Enum Field type
&lt;/h4&gt;

&lt;p&gt;Enum fields are a little more complex. While enums exist in PostgreSQL, they require creating custom types.&lt;/p&gt;

&lt;p&gt;To avoid duplicating custom types, it is better to plan out what enum types are required and create the minimum number of custom types needed for your schema. Custom types are not table specific, so one custom type can be used on multiple tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;color_enum&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;ENUM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blue'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'green'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="nv"&gt;"shirt_color"&lt;/span&gt; &lt;span class="n"&gt;color_enum&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'blue'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nv"&gt;"pant_color"&lt;/span&gt; &lt;span class="n"&gt;color_enum&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'green'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The creation of the types would need to be done before the SQL is imported. The script could then be adjusted to use the custom types that have been created. &lt;/p&gt;

&lt;p&gt;If there are multiple fields using enum('blue','green'), these should all be using the same enum custom type. Creating custom types for each individual field would not be good database design.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/"([^"]+)"\s+enum\(([^)]+)\)/&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$column_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$enum_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nv"&gt;$enum_values&lt;/span&gt; &lt;span class="o"&gt;!~&lt;/span&gt; &lt;span class="sr"&gt;/''/&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$enum_values&lt;/span&gt; &lt;span class="o"&gt;.=&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;,''&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;@items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$enum_values&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/'([^']*)'/g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$sorted_enum_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="p"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="p"&gt;',&lt;/span&gt; &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nv"&gt;@items&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$enum_type_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nb"&gt;exists&lt;/span&gt; &lt;span class="nv"&gt;$enum_types&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$sorted_enum_values&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$enum_type_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$enum_types&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$sorted_enum_values&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$enum_type_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;create_enum_type_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$sorted_enum_values&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$enum_types&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$sorted_enum_values&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$enum_type_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;# Add CREATE TYPE statement to post-processing&lt;/span&gt;
        &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@enum_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CREATE TYPE &lt;/span&gt;&lt;span class="si"&gt;$enum_type_name&lt;/span&gt;&lt;span class="s2"&gt; AS ENUM (&lt;/span&gt;&lt;span class="si"&gt;$enum_values&lt;/span&gt;&lt;span class="s2"&gt;);&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace the line with the new ENUM type&lt;/span&gt;
    &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;s/enum\([^)]+\)/$enum_type_name/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Indexes
&lt;/h4&gt;

&lt;p&gt;There are differences in how indexes are created. There are two variations of indexes to deal with: indexes with a character limit (a prefix length) and indexes without one. Both need to be removed from the table definition and written to a separate SQL file that is run after the import is complete (&lt;code&gt;run_after.sql&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/^\s*KEY\s+/i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/KEY\s+"([^"]+)"\s+\("([^"]+)"\)/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$index_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$column_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CREATE INDEX idx_&lt;/span&gt;&lt;span class="si"&gt;${current_table}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;$index_name&lt;/span&gt;&lt;span class="s2"&gt; ON &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; (&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;);&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/KEY\s+"([^"]+)"\s+\("([^"]+)"\((\d+)\)\)/i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$index_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$column_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$prefix_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CREATE INDEX idx_&lt;/span&gt;&lt;span class="si"&gt;${current_table}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;$index_name&lt;/span&gt;&lt;span class="s2"&gt; ON &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; (LEFT(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="si"&gt;$prefix_length&lt;/span&gt;&lt;span class="s2"&gt;));&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
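
&lt;p&gt;To illustrate the second branch, the statement below is what the script would emit for a hypothetical MySQL key named &lt;code&gt;common_name&lt;/code&gt; defined on &lt;code&gt;"common_name"(20)&lt;/code&gt;. Because the result is an expression index, PostgreSQL can only use it for queries that filter on the same expression.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Emitted for a hypothetical prefixed key on "common_name"(20)
CREATE INDEX idx_address_book_common_name ON "address_book" (LEFT("common_name", 20));

-- The planner can use the index only when the query uses the same expression
SELECT "id" FROM "address_book" WHERE LEFT("common_name", 20) = 'Lawrence';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;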



&lt;p&gt;Full text indexes work quite differently in PostgreSQL. To create a full text index, the text must first be converted into a vector (a &lt;code&gt;tsvector&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The vector can then be indexed. There are two index types to choose from when indexing vectors: GIN and GiST. Both have pros and cons, but GIN is generally preferred. While GIN is slower to build, it is faster for lookups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/^\s*FULLTEXT\s+KEY\s+"([^"]+)"\s+\("([^"]+)"\)/i&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$index_name&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$column_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CREATE INDEX idx_fts_&lt;/span&gt;&lt;span class="si"&gt;${current_table}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;$index_name&lt;/span&gt;&lt;span class="s2"&gt; ON &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; USING GIN (to_tsvector('english', &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;));&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;
    &lt;span class="k"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
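
&lt;p&gt;Queries have to change as well. As a sketch, assuming a hypothetical &lt;code&gt;articles&lt;/code&gt; table with a &lt;code&gt;body&lt;/code&gt; field, a search that used MATCH ... AGAINST in MySQL could be written as follows. The &lt;code&gt;to_tsvector&lt;/code&gt; call in the query needs to match the one in the index, including the &lt;code&gt;'english'&lt;/code&gt; configuration, for the index to be used. Note that &lt;code&gt;plainto_tsquery&lt;/code&gt; requires all of the words to match, so results will not be identical to MySQL's.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Index in the form the script emits (hypothetical table and field names)
CREATE INDEX idx_fts_articles_body ON "articles" USING GIN (to_tsvector('english', "body"));

-- Rough counterpart of MySQL's: WHERE MATCH(body) AGAINST('postgres migration')
SELECT "id"
FROM "articles"
WHERE to_tsvector('english', "body") @@ plainto_tsquery('english', 'postgres migration');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;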



&lt;h4&gt;
  
  
  Auto increment
&lt;/h4&gt;

&lt;p&gt;PostgreSQL doesn't have MySQL's AUTO_INCREMENT keyword. The closest equivalent is an identity column, declared with GENERATED ALWAYS AS IDENTITY.&lt;/p&gt;

&lt;p&gt;There is a catch with using GENERATED ALWAYS AS IDENTITY while importing data: it is not designed for importing IDs. When inserting a row into such a table, the ID field cannot be specified, because the value is always auto generated. Trying to insert your own IDs will produce an error.&lt;/p&gt;

&lt;p&gt;To work around this issue, the ID field can be set as SERIAL type instead of &lt;code&gt;int GENERATED ALWAYS AS IDENTITY&lt;/code&gt;. SERIAL is much more flexible for imports, but it is not recommended to leave the field as SERIAL.&lt;/p&gt;

&lt;p&gt;An alternative to using this method would be to add &lt;code&gt;OVERRIDING SYSTEM VALUE&lt;/code&gt; into the insert query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OVERRIDING&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="n"&gt;VALUE&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'A Name'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use SERIAL, some queries will need to be written into &lt;code&gt;run_after.sql&lt;/code&gt; to change the SERIAL to GENERATED ALWAYS AS IDENTITY and reset the internal counter after the schema is created and the data has been inserted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/^\s*"(\w+)"\s+(int|bigint)\s+NOT\s+NULL\s+AUTO_INCREMENT\s*,/i&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$column_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;s/^\s*"$column_name"\s+(int|bigint)\s+NOT\s+NULL\s+AUTO_INCREMENT\s*,/"$column_name" SERIAL,/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ALTER TABLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; ALTER COLUMN &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; DROP DEFAULT;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

    &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DROP SEQUENCE &lt;/span&gt;&lt;span class="si"&gt;${current_table}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;${column_name}&lt;/span&gt;&lt;span class="s2"&gt;_seq;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

    &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ALTER TABLE &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; ALTER COLUMN &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; ADD GENERATED ALWAYS AS IDENTITY;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

    &lt;span class="nb"&gt;push&lt;/span&gt; &lt;span class="nv"&gt;@post_process_lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT setval('&lt;/span&gt;&lt;span class="si"&gt;${current_table}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;${column_name}&lt;/span&gt;&lt;span class="s2"&gt;_seq', (SELECT COALESCE(MAX(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$column_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;), 1) FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="si"&gt;$current_table&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;));&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
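
&lt;p&gt;A further option, not the one used in this migration, is to declare the field as GENERATED BY DEFAULT AS IDENTITY for the import. That form accepts explicit IDs, and it can be switched to GENERATED ALWAYS afterwards without dropping a sequence. The sketch below uses a shortened version of the &lt;code&gt;address_book&lt;/code&gt; table and looks up the sequence name with &lt;code&gt;pg_get_serial_sequence&lt;/code&gt; instead of assuming it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- During the import: explicit IDs are accepted without OVERRIDING SYSTEM VALUE
CREATE TABLE "address_book" (
  "id" int GENERATED BY DEFAULT AS IDENTITY,
  "user_id" varchar(50) NOT NULL,
  PRIMARY KEY ("id")
);

-- After the import: lock the field down and move the counter past the imported IDs
ALTER TABLE "address_book" ALTER COLUMN "id" SET GENERATED ALWAYS;
SELECT setval(pg_get_serial_sequence('address_book', 'id'),
              (SELECT COALESCE(MAX("id"), 1) FROM "address_book"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;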



&lt;h2&gt;
  
  
  Schema results
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Original schema after exporting from MySQL
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cm"&gt;/*!40101 SET @saved_cs_client     = @@character_set_client */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cm"&gt;/*!40101 SET character_set_client = utf8 */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"user_id"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"common_name"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"display_name"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="nv"&gt;"user_id"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Processed main SQL file
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"user_id"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"common_name"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"display_name"&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Run_after.sql
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="n"&gt;SEQUENCE&lt;/span&gt; &lt;span class="n"&gt;address_book_id_seq&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;setval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'address_book_id_seq'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_address_book_user_id&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's worth noting the index naming convention used in the migration: the index name includes both the table name and the column name. In PostgreSQL, index names must be unique across the whole schema, not just within the table the index belongs to, so including the table name and the column name reduces the chance of duplicates in your script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data processing
&lt;/h2&gt;

&lt;p&gt;The biggest hurdle in migrating your database is getting the data into a format PostgreSQL accepts. There are some differences in how PostgreSQL stores data that require extra attention. &lt;/p&gt;

&lt;h4&gt;
  
  
  Character sets
&lt;/h4&gt;

&lt;p&gt;The dataset used for this article predates &lt;code&gt;utf8mb4&lt;/code&gt; and uses the old MySQL default of &lt;code&gt;Latin1&lt;/code&gt;. That charset does not match PostgreSQL's default charset, UTF8, so the data has to be converted. Note that PostgreSQL's UTF8 stores up to 4 bytes per character, like MySQL's &lt;code&gt;utf8mb4&lt;/code&gt; and unlike MySQL's older 3-byte &lt;code&gt;utf8&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The issue with migrating from Latin1 to UTF8 is how the data is stored. In Latin1 each character is a single byte, while in UTF8 the characters can be multibyte, up to 4 bytes.&lt;/p&gt;

&lt;p&gt;An example of this is the word café.&lt;/p&gt;

&lt;p&gt;In Latin1 it is stored as 4 bytes, and in UTF8 as 5 bytes, because é takes two bytes in UTF8. If field lengths are treated as byte counts during the character set conversion, the converted data can end up too long for the field and be truncated. PostgreSQL raises an error rather than truncating silently. &lt;/p&gt;
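
&lt;p&gt;As a quick check of those byte counts, here is a minimal Python sketch. Python is not part of the original migration scripts, which are written in Perl; it is used here only for illustration. It encodes the same word in both character sets and compares the sizes:&lt;/p&gt;

```python
# Illustration only: compare the encoded size of the same text
# in Latin1 and UTF8. "\u00e9" is the precomposed character é.
word = "caf\u00e9"

latin1_bytes = word.encode("latin-1")  # é is the single byte 0xE9
utf8_bytes = word.encode("utf-8")      # é becomes the two bytes 0xC3 0xA9

print(len(word), len(latin1_bytes), len(utf8_bytes))  # 4 4 5
```

&lt;p&gt;The character count stays at 4 in both cases; only the byte count grows.&lt;/p&gt;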

&lt;p&gt;To avoid truncation, increase the length of the affected varchar fields.&lt;/p&gt;

&lt;p&gt;It's worth noting that this same truncation issue could occur if you were changing character sets within MySQL.&lt;/p&gt;

&lt;h4&gt;
  
  
  Character Escaping
&lt;/h4&gt;

&lt;p&gt;It's not uncommon to see backslash-escaped single quotes stored in a database.&lt;/p&gt;

&lt;p&gt;However, PostgreSQL doesn't support this by default. Instead, it uses the ANSI SQL standard method of doubling the single quote.&lt;/p&gt;

&lt;p&gt;If the varchar field contains &lt;code&gt;It\'s&lt;/code&gt;, it needs to be changed to &lt;code&gt;It''s&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;s/\\'/\'\'/g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Table Locking
&lt;/h4&gt;

&lt;p&gt;MySQL dumps include a table locking call before each table's inserts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;LOCK&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="nv"&gt;"address_book"&lt;/span&gt; &lt;span class="k"&gt;WRITE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generally it is unnecessary to manually lock a table in PostgreSQL.&lt;/p&gt;

&lt;p&gt;PostgreSQL handles concurrent transactions using Multi-Version Concurrency Control (MVCC). When a row is updated, a new version of the row is created. Once the old version is no longer in use, it is removed. This means that table locking is often not needed. PostgreSQL still uses locks alongside MVCC where required, but manually setting locks can negatively affect concurrency. &lt;/p&gt;

&lt;p&gt;For this reason, removing the manual locks from the SQL dump and letting PostgreSQL handle the locks as needed is the better choice.   &lt;/p&gt;
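
&lt;p&gt;Removing the lock statements can be sketched as a simple line filter. The example below is in Python for illustration only (the original scripts are Perl), and the helper name &lt;code&gt;strip_lock_statements&lt;/code&gt; and the sample dump lines are hypothetical. It assumes each lock statement sits on its own line, as in the example above:&lt;/p&gt;

```python
import re

# Hypothetical pattern: matches dump lock statements on their own line, e.g.
#   LOCK TABLES "address_book" WRITE;
#   UNLOCK TABLES;
LOCK_STATEMENT = re.compile(r"^\s*(UN)?LOCK TABLES\b.*;\s*$", re.IGNORECASE)


def strip_lock_statements(lines):
    """Return the lines with LOCK TABLES / UNLOCK TABLES statements removed."""
    return [line for line in lines if not LOCK_STATEMENT.match(line)]


# Made-up sample lines standing in for part of a dump file.
dump = [
    'LOCK TABLES "address_book" WRITE;\n',
    "INSERT INTO \"address_book\" VALUES (1,'u1','a','b');\n",
    "UNLOCK TABLES;\n",
]
kept = strip_lock_statements(dump)
print(kept)  # only the INSERT line remains
```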

&lt;h2&gt;
  
  
  Importing data
&lt;/h2&gt;

&lt;p&gt;The next step in the migration process is running the SQL files generated by the script. If the previous steps were done correctly, this part should go smoothly. In practice, the import often picks up problems that went unseen in the prior steps, which means going back, adjusting the scripts and trying again. &lt;/p&gt;

&lt;p&gt;To run the SQL files, connect to the PostgreSQL database using psql and run the &lt;code&gt;\i&lt;/code&gt; command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;converted_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two main errors to watch out for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ERROR: value too long for type character varying(50)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This can be fixed by increasing varchar field character length as mentioned earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ERROR: invalid command \n&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This error can be caused by stray escaped single quotes, or other incompatible data values. To fix these, a regex may need to be added to the data processing script to target the specific problem area.  &lt;/p&gt;

&lt;p&gt;Some of these errors require a harder look at the insert statements to find where the issues are. This can be challenging in a large SQL file. To help with this, write out the INSERT statements that were erroring to a separate, much smaller SQL file, which can more easily be studied to find the issues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;%lines_to_debug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="vg"&gt;$_&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1148&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1195&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
 &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;exists&lt;/span&gt; &lt;span class="nv"&gt;$lines_to_debug&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$current_line_number&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="nv"&gt;$debug_data&lt;/span&gt; &lt;span class="p"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$line&lt;/span&gt;&lt;span class="p"&gt;";&lt;/span&gt;  
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Chunking Data
&lt;/h2&gt;

&lt;p&gt;Regardless of what scripting language you choose to use for your migration, chunking data is going to be important on large SQL files. &lt;/p&gt;

&lt;p&gt;For this script, the data was read in 1 MB chunks, which helped keep the script efficient. You should pick a chunk size that makes sense for your dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight perl"&gt;&lt;code&gt;&lt;span class="k"&gt;my&lt;/span&gt; &lt;span class="nv"&gt;$bytes_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="nv"&gt;$original_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$chunk_size&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
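
&lt;p&gt;One detail to watch when chunking is that a chunk can end in the middle of a line. The sketch below shows one way to handle that, by carrying the partial line over to the next chunk. It is written in Python for illustration only and is not taken from the original Perl script. The function name &lt;code&gt;process_in_chunks&lt;/code&gt; and the sample data are hypothetical, and it reads characters from an in-memory text handle rather than bytes from a file:&lt;/p&gt;

```python
import io


def process_in_chunks(handle, chunk_size, process_line):
    """Read handle in fixed-size chunks and pass each complete line on."""
    leftover = ""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        lines = (leftover + chunk).split("\n")
        # The last piece may be a partial line; keep it for the next chunk.
        leftover = lines.pop()
        for line in lines:
            process_line(line)
    if leftover:
        # Final line without a trailing newline.
        process_line(leftover)


seen = []
source = io.StringIO("INSERT 1;\nINSERT 2;\nINSERT 3;\n")
# A tiny chunk size is used so that lines are split across chunks.
process_in_chunks(source, 8, seen.append)
print(seen)  # ['INSERT 1;', 'INSERT 2;', 'INSERT 3;']
```

&lt;p&gt;For a real dump the chunk size would be much larger, for example 1024 * 1024 for the 1 MB mentioned above.&lt;/p&gt;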



&lt;h2&gt;
  
  
  Verifying Data
&lt;/h2&gt;

&lt;p&gt;There are a few methods of verifying the data:&lt;/p&gt;

&lt;h4&gt;
  
  
  Row Count
&lt;/h4&gt;

&lt;p&gt;Doing a row count is an easy way to ensure at least all the rows were inserted. Count the rows in the old database and compare that to the rows in the new database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;address_book&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Checksum
&lt;/h4&gt;

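&lt;p&gt;Running a checksum across the columns may help, but bear in mind that some fields, especially varchar fields, may have been changed during processing (for example, escaped quotes converted to the ANSI standard format). So while this will work on some fields, it won't be accurate on all fields. Also note that MySQL truncates &lt;code&gt;GROUP_CONCAT&lt;/code&gt; output at &lt;code&gt;group_concat_max_len&lt;/code&gt; (1024 bytes by default), so raise that setting before checksumming a large table.&lt;/p&gt;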

&lt;p&gt;For MySQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;MD5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GROUP_CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;SEPARATOR&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;address_book&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For PostgreSQL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;MD5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STRING_AGG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;address_book&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
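
&lt;p&gt;Both queries hash one long string built from the column values in &lt;code&gt;id&lt;/code&gt; order, so the two digests only match if those strings are byte-for-byte identical, including any separator between values. MySQL's &lt;code&gt;GROUP_CONCAT&lt;/code&gt; joins values with a comma unless told otherwise. The following Python sketch (illustration only, with hypothetical rows and a hypothetical helper name) mimics the calculation to show how much the separator matters:&lt;/p&gt;

```python
import hashlib

# Hypothetical rows of (id, user_id), standing in for address_book.
rows = [(2, "bob"), (1, "alice"), (3, None)]


def column_checksum(rows, separator=""):
    """MD5 of the column values joined in id order, with NULL treated as ''."""
    ordered = sorted(rows, key=lambda row: row[0])
    joined = separator.join(value or "" for _, value in ordered)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


no_separator = column_checksum(rows)      # hashes "alicebob"
with_commas = column_checksum(rows, ",")  # hashes "alice,bob,"
print(no_separator == with_commas)  # False
```

&lt;p&gt;The same data produces different digests as soon as the separator differs, which is why the separator and ordering have to be the same on both sides.&lt;/p&gt;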



&lt;h4&gt;
  
  
  Manual Data Check
&lt;/h4&gt;

&lt;p&gt;You will also want to verify the data manually. Run some queries that make sense for your data and that would be likely to pick up issues with the import. &lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Migrating databases is a large undertaking, but with careful planning and a good understanding of both your dataset and the differences between the two database systems, it can be completed successfully. &lt;/p&gt;

&lt;p&gt;There is more to migrating to a new database than just the import, but a solid dataset migration will put you in a good place for the rest of the transition.&lt;/p&gt;




&lt;p&gt;Scripts created for this migration can be found on &lt;a href="https://github.com/Lawrence72/mysql-to-postgresql" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>postgres</category>
      <category>mysql</category>
      <category>perl</category>
    </item>
    <item>
      <title>Postgres Arrays</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Fri, 21 Jun 2024 02:00:43 +0000</pubDate>
      <link>https://forem.com/mrpercival/postgres-arrays-2nni</link>
      <guid>https://forem.com/mrpercival/postgres-arrays-2nni</guid>
      <description>&lt;h2&gt;
  
  
  What are Postgres arrays?
&lt;/h2&gt;

&lt;p&gt;Arrays are columns that can hold multiple values. They are useful when there is additional data that is tightly coupled to a row of data in a table. &lt;/p&gt;

&lt;p&gt;Tags associated with a row, or values from a web form where multiple options can be selected, are both examples of where you could use an array.&lt;/p&gt;

&lt;p&gt;Arrays do not replace lookup tables. Lookup tables can generally be accessed from multiple rows in a table and are not tightly coupled to a specific row. &lt;/p&gt;

&lt;h2&gt;
  
  
  Example without using arrays
&lt;/h2&gt;

&lt;p&gt;Here is a simplified schema for a migraine tracker that stores both the start and end time, and a list of triggers.&lt;/p&gt;

&lt;p&gt;Main table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;without&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;end_dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;without&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lookup table for trigger type names&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trigger_types&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;character&lt;/span&gt; &lt;span class="nb"&gt;varying&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Table to store selected triggers&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraine_triggers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;migraine_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trigger_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inserting Data
&lt;/h3&gt;

&lt;p&gt;Inserting data requires two separate actions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Insert data into the migraines table&lt;/li&gt;
&lt;li&gt;Insert triggers into the migraine_triggers table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The insert into the migraine_triggers table is likely a multi-row insert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;start_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;end_dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'2024-06-18 09:30:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-06-18 10:30:00'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;migraine_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;trigger_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Updating Data
&lt;/h3&gt;

&lt;p&gt;Updating data is not entirely straightforward. You have to decide which approach to take, or which one works best with your data. &lt;/p&gt;

&lt;p&gt;1) Run a SELECT before UPDATE to find which ones already exist and INSERT items not already in the list. You may need to also delete rows that are no longer in the list.&lt;/p&gt;

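&lt;p&gt;2) Use a conflict resolution insert (this requires a unique index or constraint on the conflict columns, here &lt;code&gt;migraine_id&lt;/code&gt; and &lt;code&gt;trigger_id&lt;/code&gt;)&lt;br&gt;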
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;migraine_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;migraine_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTHING&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may also need to delete rows that are no longer in the list with this method.&lt;/p&gt;

&lt;p&gt;3) Run a delete query to delete all rows related to the migraine_id and INSERT all the new items.&lt;/p&gt;

&lt;p&gt;In all these scenarios multiple queries are required to update the data. &lt;/p&gt;

&lt;h3&gt;
  
  
  Selecting Data
&lt;/h3&gt;

&lt;p&gt;A simple selection might be to find migraines where the migraine was triggered by trigger 3&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trigger_id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraine_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trigger_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a slightly more complex query to bring back the name of the trigger from the &lt;code&gt;trigger_types&lt;/code&gt; table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraine_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; 
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trigger_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;migraine_triggers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trigger_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Example using arrays
&lt;/h2&gt;

&lt;p&gt;Using arrays we can simplify the database design and the queries needed to retrieve the same information in the above examples.&lt;/p&gt;

&lt;p&gt;One of the features of arrays that separates them from JSON or JSONB fields is that the data is strictly typed. &lt;/p&gt;

&lt;p&gt;The data that goes into an array must be the right type of data. &lt;/p&gt;

&lt;p&gt;This ensures that data integrity is maintained in the array.&lt;/p&gt;

&lt;p&gt;In this example the data type would be INTEGER. A CHAR could be used but using an integer and utilizing a lookup table has some advantages over just storing the names in the array.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding an array field
&lt;/h3&gt;

&lt;p&gt;Instead of using the &lt;code&gt;migraine_triggers&lt;/code&gt; table, we can add a column to the &lt;code&gt;migraines&lt;/code&gt; table to hold the trigger_ids selected for the migraine. &lt;/p&gt;

&lt;p&gt;This removes the need for multi-row inserts, deletes and updates. It can also improve select performance because the queries can be simplified in some cases. It also reduces the size of the database by not needing an additional, potentially large table. &lt;/p&gt;

&lt;p&gt;To add an array column, add [] after the column's data type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;without&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;end_dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;without&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trigger_types&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inserting Data
&lt;/h3&gt;

&lt;p&gt;Inserting data now requires just one query. Wrap the values for the array in {} to insert the array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-06-18 09:30:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-06-18 10:00:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'{1,2}'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Updating Data
&lt;/h3&gt;

&lt;p&gt;Updating data is similar: a single query updates both the migraine and its trigger data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;migraines&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;end_dt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'2024-06-18 11:00:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'{1,3}'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Selecting Data
&lt;/h3&gt;

&lt;p&gt;Finding the migraines that were triggered by trigger 3 is now a much simpler query than it was before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, there is no overhead from table joins. &lt;/p&gt;

&lt;p&gt;Here is a more complex query where we want to pull in the trigger name from the &lt;code&gt;trigger_types&lt;/code&gt; table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt; 
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;trigger_id&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;trigger_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trigger_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case we can use &lt;code&gt;unnest&lt;/code&gt; to turn the array into rows and then join those rows with the trigger_types table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing Arrays
&lt;/h3&gt;

&lt;p&gt;To improve performance, you can add an index to an array field. &lt;/p&gt;

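&lt;p&gt;A GIN (Generalized Inverted Index) is most likely the best index type to choose. &lt;/p&gt;

&lt;p&gt;GIN is designed for fields where multiple values are present. Arrays and JSONB are both examples where you might want to use a GIN index. Note that the default GIN operator class for arrays supports the containment and overlap operators (&lt;code&gt;@&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;@&lt;/code&gt;, &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;) rather than &lt;code&gt;= ANY(...)&lt;/code&gt;, so a filter written as &lt;code&gt;trigger_types @&amp;gt; '{3}'&lt;/code&gt; is the form that can use such an index.&lt;br&gt;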
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_gin_triggers&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;migraines&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger_types&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Arrays are not right for every situation, but they can provide an efficient way to store row metadata.&lt;/p&gt;

&lt;p&gt;They can simplify database design and queries, while maintaining data integrity and ease of access.&lt;/p&gt;

&lt;p&gt;Further information on arrays can be found in the &lt;a href="https://www.postgresql.org/docs/current/arrays.html" rel="noopener noreferrer"&gt;Postgres Manual&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>postgressql</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Updating legacy code to php 8.x</title>
      <dc:creator>Lawrence Cooke</dc:creator>
      <pubDate>Tue, 30 Apr 2024 14:41:22 +0000</pubDate>
      <link>https://forem.com/mrpercival/updating-legacy-code-to-php-8x-2jg1</link>
      <guid>https://forem.com/mrpercival/updating-legacy-code-to-php-8x-2jg1</guid>
      <description>&lt;p&gt;When you have really old code, it might still work on PHP 7.4, but getting it ready for PHP 8 is a daunting task. &lt;/p&gt;

&lt;p&gt;But if you take it in small steps, you can do it. &lt;/p&gt;

&lt;p&gt;I have a code base that was originally written 20+ years ago. There were no classes, OOP, frameworks, or any of the tools now at our disposal, so this was old procedural code for the most part, and by some miracle it actually worked on PHP 7.4.&lt;/p&gt;

&lt;p&gt;Knowing where to start is hard, but you have to start somewhere, so start somewhere simple. &lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Cleanup
&lt;/h2&gt;

&lt;p&gt;I started with short open tags. &lt;/p&gt;

&lt;p&gt;Back in the early days of PHP, short open tags were the standard, until issues arose with XML, which is why I still had a bunch of these in this code. &lt;/p&gt;

&lt;p&gt;I updated these manually because I wanted to see what gremlins I would find in the process. &lt;/p&gt;

&lt;p&gt;To find these short open tags, I used a regex.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;\?(?!&lt;/span&gt;&lt;span class="na"&gt;php&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="na"&gt;xml&lt;/span&gt;&lt;span class="err"&gt;|=)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
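
&lt;p&gt;If you would rather run the search from a script than from an editor, a minimal sketch might look like the following. It assumes the code lives in a &lt;code&gt;src&lt;/code&gt; directory and only checks &lt;code&gt;.php&lt;/code&gt; files; both are placeholders, and legacy projects may need other extensions such as &lt;code&gt;.inc&lt;/code&gt; added.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Sketch: list every line that contains a short open tag.
// The 'src' directory is a placeholder; point it at your own code.
$pattern = '/&amp;lt;\?(?!php|xml|=)/';

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('src', FilesystemIterator::SKIP_DOTS)
);

foreach ($files as $file) {
    // Only look at PHP files; extend this check for other extensions.
    if ($file-&amp;gt;getExtension() !== 'php') {
        continue;
    }

    // file() returns the lines of the file, indexed from 0.
    foreach (file($file-&amp;gt;getPathname()) as $number =&amp;gt; $line) {
        if (preg_match($pattern, $line)) {
            echo $file-&amp;gt;getPathname() . ':' . ($number + 1) . ': ' . trim($line) . PHP_EOL;
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only reports matches and leaves the fixing to you, which keeps the manual review described above.&lt;/p&gt;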



&lt;p&gt;This was a helpful process, because I noticed some other issues along the way. &lt;/p&gt;

&lt;p&gt;While it's tempting to try to fix these, don't. Make notes on them instead. If you get distracted from what you are trying to fix, you will end up going down rabbit holes and never finish what you started. &lt;/p&gt;

&lt;p&gt;Once all the short open tags were converted, I found a lot of places where I had:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt; &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"something"&lt;/span&gt;&lt;span class="cp"&gt;?&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this isn't a deal breaker, this process is a chance to clean up your code, and &lt;code&gt;&amp;lt;?php echo&lt;/code&gt; can be written more neatly as &lt;code&gt;&amp;lt;?=&lt;/code&gt;, so why not make the change? &lt;/p&gt;

&lt;p&gt;Now that the open tags are consistent, the echo statements are easy to identify and update with a simple find and replace across the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Cleanup
&lt;/h2&gt;

&lt;p&gt;Now that some basics are out of the way, it's time to clean up the code. While this cleanup doesn't get us any closer to PHP 8, having neat, well laid out code will be a big benefit in this process.&lt;/p&gt;

&lt;p&gt;For this, I set up my &lt;code&gt;phpcs.xml&lt;/code&gt; with just a couple of rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;rule&lt;/span&gt; &lt;span class="na"&gt;ref=&lt;/span&gt;&lt;span class="s"&gt;"PSR12"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;rule&lt;/span&gt; &lt;span class="na"&gt;ref=&lt;/span&gt;&lt;span class="s"&gt;"Generic.Arrays.DisallowLongArraySyntax"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will use the PSR-12 standard for the most part, with one additional rule to convert &lt;code&gt;array()&lt;/code&gt; into &lt;code&gt;[]&lt;/code&gt;. You may feel differently about this, but I prefer the shorter array definition these days. &lt;/p&gt;

&lt;h2&gt;
  
  
  Random Errors
&lt;/h2&gt;

&lt;p&gt;It was during this code cleanup that I started running into trouble. I started getting errors. &lt;/p&gt;

&lt;p&gt;This was a great outcome. The beautifier was finding places in my code that had contained little bugs for years. For some of them I am still wondering why the pages worked at all, but here we are. &lt;/p&gt;

&lt;p&gt;This allowed me to fix these minor issues as they got picked up. There weren't many of these, but it was great to find them. &lt;/p&gt;

&lt;h2&gt;
  
  
  Secondary Cleanup
&lt;/h2&gt;

&lt;p&gt;After the beautifier had run, the code looked neat, but there were places where I had multiple blank lines, probably from removing code over time. &lt;/p&gt;

&lt;p&gt;I'm not sure if PHP_CodeSniffer has a rule for this. I couldn't find one in a quick search, so rather than spending time hunting it down, I ran another regex across the project to find these unnecessary gaps.&lt;/p&gt;

&lt;p&gt;Using the following regex did the job nicely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="n"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,}&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
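
&lt;p&gt;The same cleanup can also be scripted. The sketch below is not the regex above but a variation on it: &lt;code&gt;collapseBlankLines&lt;/code&gt; is a hypothetical helper that collapses any run of two or more blank lines into a single blank line. It assumes the blank lines are truly empty, which should be the case once the beautifier has stripped trailing whitespace.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper: collapse runs of blank lines into one blank line.
function collapseBlankLines(string $source): string
{
    // \R matches any line break (\n, \r\n or \r).
    // Three or more breaks in a row means two or more blank lines,
    // so keep exactly two breaks: the end of the line plus one blank line.
    return preg_replace('/(\R)\R{2,}/', '$1$1', $source);
}

// "a", three blank lines, "b" becomes "a", one blank line, "b".
var_dump(collapseBlankLines("a\n\n\n\nb") === "a\n\nb"); // bool(true)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;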



&lt;p&gt;Finally a clean codebase! &lt;/p&gt;

&lt;h2&gt;
  
  
  Updating to PHP 8
&lt;/h2&gt;

&lt;p&gt;For the next phase of updates, I installed &lt;a href="https://github.com/rectorphp/rector" rel="noopener noreferrer"&gt;PHP Rector&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;There is a temptation to configure Rector and set it to the latest version of PHP 8. &lt;/p&gt;

&lt;p&gt;With how old the code base is, and how much has changed, this led to an overwhelming number of changes.&lt;/p&gt;

&lt;p&gt;In total, doing it this way resulted in over 1000 changed files, which was too many to cope with.&lt;/p&gt;

&lt;p&gt;As great as Rector is, it's not a tool that you can just let do whatever it wants with your code. That will end very badly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Light Bulb Moment
&lt;/h2&gt;

&lt;p&gt;During this process I realized that just because code ran on PHP 7.4 didn't mean it was PHP 7.4 code. This code had been through every version of PHP since PHP 2.0. There was a lot of old code in it, and Rector wanted to fix it all. &lt;/p&gt;

&lt;h2&gt;
  
  
  Back Tracking
&lt;/h2&gt;

&lt;p&gt;To make this task easier, I pushed the Rector PHP version back down as far as I could, to PHP 5.3. &lt;/p&gt;

&lt;p&gt;With Rector targeted at PHP 5.3, a couple of minor issues were picked up: 3 files affected, easy changes. &lt;/p&gt;

&lt;p&gt;When Rector gave a clean bill of health on 5.3, I increased it to PHP 5.4. Again, a couple of minor things in 2 files.&lt;/p&gt;

&lt;p&gt;It wasn't until PHP 5.6 that I caught anything mildly interesting, but going back to 5.3 ensured I was fixing things in small bites, which is really significant to this process. &lt;/p&gt;
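
&lt;p&gt;As a rough illustration of stepping through versions, a &lt;code&gt;rector.php&lt;/code&gt; along these lines could be used. This is a sketch, not my exact configuration: it assumes a recent Rector release that provides &lt;code&gt;RectorConfig::configure()&lt;/code&gt; and &lt;code&gt;withPhpSets()&lt;/code&gt;, it assumes Rector itself runs on PHP 8 (named arguments need it), and the &lt;code&gt;src&lt;/code&gt; path is a placeholder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// rector.php - a sketch, assuming a recent Rector release.

declare(strict_types=1);

use Rector\Config\RectorConfig;

return RectorConfig::configure()
    // Placeholder path; list the directories that hold your code.
    -&amp;gt;withPaths([__DIR__ . '/src'])
    // Start at the lowest level. Once Rector reports nothing more,
    // commit, then change this to php54: true, php55: true, and so on.
    -&amp;gt;withPhpSets(php53: true);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running Rector with its &lt;code&gt;--dry-run&lt;/code&gt; option first shows the proposed changes without touching any files, which makes each step easier to review.&lt;/p&gt;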

&lt;h2&gt;
  
  
  Updating to PHP 7.4
&lt;/h2&gt;

&lt;p&gt;Each time I did a version update in Rector, it found new things to fix. Most of the fixes could be automated, and those that couldn't, I could fix quickly by hand.&lt;/p&gt;

&lt;p&gt;In the process I found lots of little code issues that might only produce warnings in PHP 7.4 but would be errors in PHP 8. &lt;/p&gt;

&lt;p&gt;Rector with PHP 7.1 brought about the first really odd results, and these odd results are why blindly allowing Rector to fix everything is a bad idea. &lt;/p&gt;

&lt;p&gt;Hidden amongst the changes was this really odd change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$show_time_out&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;was changed to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A look at the code revealed that there were variables that were initialized as the wrong type. It also appeared that this change was triggered by variables that were set but never updated, probably remnants left behind when old code was removed.&lt;/p&gt;

&lt;p&gt;One thing to note about Rector is that you can filter rules out. Long term you are going to want to keep most of them in, but when you end up with 150 files to update, automating what is safe to update makes dealing with the rest less daunting. In my case, 143 files had issues under PHP 7.1. Many of them were safe to update automatically, and I filtered out the more difficult rule violations to fix manually. &lt;/p&gt;
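
&lt;p&gt;Filtering a rule out might look something like the sketch below. It assumes a recent Rector release with &lt;code&gt;withSkip()&lt;/code&gt;, and the skipped rule is only an example picked for illustration, not necessarily one I filtered. Check that the class name exists in your installed Rector version before relying on it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// rector.php - a sketch of skipping a rule, assuming a recent Rector release.

declare(strict_types=1);

use Rector\Config\RectorConfig;
// Example rule only; the class name is an assumption about your Rector version.
use Rector\Php71\Rector\FuncCall\RemoveExtraParametersRector;

return RectorConfig::configure()
    // Placeholder path; list the directories that hold your code.
    -&amp;gt;withPaths([__DIR__ . '/src'])
    -&amp;gt;withPhpSets(php71: true)
    -&amp;gt;withSkip([
        // Leave this rule out of the automated run and handle it manually.
        RemoveExtraParametersRector::class,
    ]);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once the safe changes are committed, removing the entry from the skip list brings the harder cases back for manual review.&lt;/p&gt;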

&lt;h2&gt;
  
  
  Updating to PHP 8
&lt;/h2&gt;

&lt;p&gt;Finally, the moment to update to PHP 8. When I first started out, the update to PHP 8 was daunting. There were a lot of issues, too many to deal with. But because I made the changes incrementally, the actual change for PHP 8 was only 25 files instead of 1000+, and most of those were string manipulation changes that could be fixed automatically.&lt;/p&gt;

&lt;p&gt;Both PHP 8.0 and 8.1 had changes in them, but once I reached PHP 8.1, no further code changes were found. I pushed Rector up to 8.4 to check this. &lt;/p&gt;

&lt;h2&gt;
  
  
  Left Overs
&lt;/h2&gt;

&lt;p&gt;Rector can't do everything for you. There are going to be leftovers, and for me those leftovers were related to type mixing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$var&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
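
&lt;p&gt;The reason this kind of comparison matters is that PHP 8 changed how numbers are compared with non-numeric strings. A small sketch of the difference, using the same variable as above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
$var = 0;

// PHP 7.4 and earlier: "" is converted to the number 0, so this is true.
// PHP 8.0 and later: 0 is converted to the string "0", so this is false.
var_dump($var == "");

// Strict comparisons behave the same on both versions.
var_dump($var === 0);  // bool(true)
var_dump($var === ""); // bool(false)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deciding which type the variable is meant to hold, and then comparing strictly against that type, removes the version-dependent behaviour.&lt;/p&gt;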



&lt;p&gt;While I am still working through and testing to find these gremlins, there are only a few in the code base that were not corrected in some way by Rector during the process. &lt;/p&gt;

&lt;p&gt;As a result, the site itself is working. It still needs to be put through the usual testing phase, but the outcome has been better than expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Take your time with the process. Rushing it will end with broken code. &lt;/p&gt;

&lt;p&gt;Commit code regularly. You want to be able to backtrack if needed and try again without having to go too far back. Committing after each PHP version bump made sense to me. &lt;/p&gt;

&lt;p&gt;While I would love to update this code base into a framework, getting it to PHP 8 was a higher priority. The code works, even though it's not the best code. &lt;/p&gt;

&lt;p&gt;I will move it to a framework in stages, but having a working code base to work from will make that transition a much easier one. &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
