<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: mayur</title>
    <description>The latest articles on Forem by mayur (@mayur555b).</description>
    <link>https://forem.com/mayur555b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2082348%2Fef46910f-c6fc-4a0b-864a-cb1600da9f7d.jpg</url>
      <title>Forem: mayur</title>
      <link>https://forem.com/mayur555b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mayur555b"/>
    <language>en</language>
    <item>
      <title>Oracle to Postgres : Happily Ever After edition</title>
      <dc:creator>mayur</dc:creator>
      <pubDate>Sat, 19 Oct 2024 23:17:38 +0000</pubDate>
      <link>https://forem.com/mayur555b/oracle-to-postgres-happily-ever-after-edition-5766</link>
      <guid>https://forem.com/mayur555b/oracle-to-postgres-happily-ever-after-edition-5766</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2qctwv6t9p23m5jb52d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2qctwv6t9p23m5jb52d.png" alt="Image description" width="500" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prelude :&lt;br&gt;
The launch of the Oracle Autonomous Database ignited a lively debate about the future of the database administrator (DBA) role.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvke5g5srq0cnl4eq134h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvke5g5srq0cnl4eq134h.png" alt="Image description" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oracle envisioned a world where the DBA would become obsolete with a snap of their fingers, just like Thanos.&lt;/p&gt;

&lt;p&gt;However, Postgres said to Oracle “Hold my beer”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqjdcyi5hguok2rz3435.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqjdcyi5hguok2rz3435.png" alt="Image description" width="720" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few years before that, I made the leap from the Oracle world to Postgres, seeing Postgres as the Galactus of databases (the Eater of database worlds), ready to devour the competition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx8b7zljy8rwbkr8u81w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx8b7zljy8rwbkr8u81w.png" alt="Image description" width="720" height="378"&gt;&lt;/a&gt;&lt;br&gt;
PG : Eater of the database worlds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7x0l9lr0gmv3og1b1b0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7x0l9lr0gmv3og1b1b0.png" alt="Image description" width="720" height="720"&gt;&lt;/a&gt;&lt;br&gt;
Postgres Ecosystem : Credits to Vonng (Author of PIGSTY)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qkgpf1yo8liwvf4naml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qkgpf1yo8liwvf4naml.png" alt="Image description" width="720" height="405"&gt;&lt;/a&gt;&lt;br&gt;
Credits : @dotnetschizo&lt;/p&gt;

&lt;p&gt;The signs were everywhere. The two communities couldn’t have been more different. Oracle is flashy and glamorous, driven by the latest marketing buzz. In contrast, the Postgres community is a bubbling cauldron of innovation, with contributions flowing in from passionate developers around the globe. It was obvious to me that Postgres’s extreme extensibility would outshine any single company’s vision.&lt;/p&gt;

&lt;p&gt;The “Ever After” :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrmowm53rcd3gkyrs991.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrmowm53rcd3gkyrs991.png" alt="Image description" width="480" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There’s been plenty of chatter about migrating from Oracle to Postgres, but most of the focus is on the planning and execution phases. Rarely do we hear about what happens afterward, like the happily-ever-after in a Disney movie.&lt;/p&gt;

&lt;p&gt;Having been involved in numerous Oracle-to-Postgres migrations, I often find myself called in to address performance issues or to fix critical elements overlooked during the migration. So, I thought, why not have a blog or a talk dedicated to the “ever after” of these migrations? Let’s explore what happens once the dust settles!&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch out for those Materialized Views
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhprcp7p41uqy7mxox15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhprcp7p41uqy7mxox15.png" alt="Image description" width="429" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oracle’s optimizer (planner) boasts a magical query rewrite option with materialized views. If the optimizer detects that any part of a SQL query matches a materialized view definition, it can seamlessly replace that portion of the SQL text with the materialized view. This clever trick saves significant time and resources typically spent on repetitive complex joins and aggregates over large tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4smsf2kxmixm0p4lqvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4smsf2kxmixm0p4lqvy.png" alt="Image description" width="719" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, Postgres doesn’t have an equivalent to this Mview rewrite feature in its planner. This means that while migration tools can help convert code and schema, they can’t detect or provide alternatives to such internal rewrites happening within the database engine.&lt;/p&gt;

&lt;p&gt;Imagine you’re celebrating a smooth Oracle to Postgres migration over the weekend, only to have your pager go off due to a critical incident in the weekly production batch. Oracle developers often leverage this shortcut for query optimization in batch processes or weekend and monthly jobs, where retrieving the most recent data isn’t a priority. Although daily OLTP loads might not benefit significantly from Mview query rewrites, less frequent or off-peak tasks often do. This is one reason why such issues can be hard to detect during pre-go-live smoke tests.&lt;/p&gt;

&lt;p&gt;It’s always better to identify query rewrites while still on Oracle, before the migration, rather than scrambling to optimize a heavy batch process over the weekend post-migration to Postgres. You can easily enable or disable Mview rewrite, allowing you to establish a baseline during load testing that covers all application aspects with the rewrite on, and then conduct another test with the Mview rewrite disabled. This process will highlight any outliers, helping you identify which modules require performance optimization on Postgres. These optimizations can then be tackled in a stress-test environment, ensuring a smoother transition and fewer surprises post-migration.&lt;/p&gt;
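
&lt;p&gt;On the Oracle side, query rewrite can be toggled per session, which makes the with/without-rewrite comparison easy to script during load testing. A minimal sketch in Oracle SQL (the surrounding load-test harness is up to you):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Baseline run: rewrite against materialized views enabled (the default)
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- Comparison run: disable rewrite to see the "true" cost of each SQL,
-- roughly what Postgres will execute after migration
ALTER SESSION SET query_rewrite_enabled = FALSE;

-- Queries whose elapsed time explodes between the two runs are the
-- modules that will need optimization work on Postgres before go-live.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;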

&lt;h2&gt;
  
  
  Multi-node RAC &amp;lt;&amp;gt; Multi-node Patroni cluster
&lt;/h2&gt;

&lt;p&gt;Migration tooling and planning often overlook this critical aspect of database architecture. A three-node Oracle RAC setup doesn’t directly translate to a three-node Postgres Patroni cluster, typically used as a high-availability (HA) solution. The main reason is that all three nodes in an Oracle RAC handle write traffic, whereas Postgres replicas are read-only.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw3n41e3tnz1ko6r2zf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw3n41e3tnz1ko6r2zf5.png" alt="Image description" width="714" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This becomes a headache if your application heavily relies on global temporary tables, as Postgres replicas can’t drop, create, or truncate these tables. The PGTT extension can help reduce code conversion effort by mimicking Oracle’s global temporary table implementation in Postgres. However, it still creates a temporary table on the fly the first time it is used in a session, so even with PGTT, code that uses temp tables must run on the primary. Apart from being a bottleneck when offloading some of the traffic to replicas, frequent drop-create of temp tables causes the Postgres catalog tables to bloat.&lt;/p&gt;
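
&lt;p&gt;For reference, a minimal sketch of the usual PGTT pattern (assuming the pgtt extension is installed; the table definition is illustrative, and you should check the extension’s README for the exact syntax your version supports):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- load the extension in the session (or via session_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pgtt;
LOAD 'pgtt';

-- pgtt registers a persistent definition; the underlying temp table is
-- still created on the fly the first time a session touches it, which
-- is why this only works on the primary
CREATE /*GLOBAL*/ TEMPORARY TABLE batch_scratch (
    id      bigint,
    payload text
) ON COMMIT PRESERVE ROWS;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;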

&lt;p&gt;Oracle RAC implementations are usually designed to minimize global cache fusion (internode communication), meaning the application is logically modularized into multiple services, each interacting with a dedicated node. This setup is not just for high availability but also for load balancing.&lt;/p&gt;

&lt;p&gt;Even if we employ load balancers on the Postgres side, there’s always a limit to how much load can be offloaded to replicas due to their read-only nature. Therefore, when converting a multi-node RAC to a Postgres HA cluster, it’s crucial to anticipate the additional load on the primary node and upgrade the hardware accordingly. This might mean better CPUs, a higher-throughput instance, and higher limits in the connection pool. Moreover, concurrency-related issues might arise, necessitating thorough load testing before going live. So, brace for the load, test rigorously, and ensure your primary node is up to the challenge!&lt;/p&gt;

&lt;p&gt;During the PG-Ibiza conference, someone asked why we don’t use logical replication to create a multi-active, multi-node cluster as a replacement for Oracle RAC. While it’s possible to use a flavor of Bi-Directional Replication (EDB-BDR, PgEdge, AWS Pg_Active) or a custom setup, you would need to manage conflicts, split-brain scenarios, logical replication bugs, and resynchronization issues. &lt;a href="https://www.depesz.com/2023/06/04/a-tale-about-incomplete-upgrade-from-postgresql-12-to-14/" rel="noopener noreferrer"&gt;This blog&lt;/a&gt; by depesz highlights the pain of logical replication at scale. This means transitioning from the relatively low-maintenance environment of Oracle RAC to a multi-active setup that requires significantly more attention and care. For obvious reasons, this approach is unlikely to receive immediate approval from CTOs or infrastructure managers at large companies. However, if you have a single cluster to manage, it is feasible to pursue this solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scalar Subquery Caching
&lt;/h2&gt;

&lt;p&gt;Oracle introduced the fascinating feature of scalar subquery caching starting with version 11g. This nifty feature helps mitigate developer mistakes by optimizing slow subqueries in production. As a result, suboptimal code in Oracle could go unnoticed for ages. But once you migrate to Postgres, it can hit you like a wrecking ball. If these issues arise in parts of the code triggered by specific datasets, they can be tough to detect during load testing.&lt;/p&gt;

&lt;p&gt;Let me share an example. We had a tiny application with a small 150 MB database, running a single query most of the time. Despite its small size, it managed to bring an AWS c5.2xlarge instance (8 vCPUs, 16 GB RAM, SSD io1 storage) to its knees. There were frequent CPU spikes, the application encountered numerous timeout errors, and the server appeared overloaded. Developers blamed Postgres, pointing out that the same application had run smoothly on Oracle for months, even though the tables involved were only a few MBs in size.&lt;/p&gt;

&lt;p&gt;Upon examining the problematic query, I found it to be a simple SELECT statement with a few Common Table Expressions (CTEs) referenced multiple times as scalar subqueries. The explain plan revealed that one particular CTE was the root cause. Thanks to Depesz’s tool, it was clear that most of the time was spent on the CTE scan. This happened because Postgres executed the CTE scan for every row produced by the main query. In Oracle, the same CTE scan was cached via scalar subquery caching, allowing it to narrowly escape detection for a long time.&lt;br&gt;
Since this issue occurred on Postgres 13, we couldn’t use the enable_memoize option available from Postgres 14 onward, which provides a similar caching mechanism. That said, even with enable_memoize we’ve noticed performance degradation on other databases rather than significant benefits in production systems.&lt;/p&gt;

&lt;p&gt;The solution was pretty simple: break the large SQL into smaller parts using temp tables, and index the temp table built from the problematic CTE. There was a further option to put this in an immutable function call and utilize caching in Postgres, but the developers seemed happy with just temp-table indexing.&lt;/p&gt;
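
&lt;p&gt;A minimal sketch of the rewrite (the real schema is obfuscated, so the table and column names below are purely illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- 1. Materialize the problematic CTE once, instead of letting it be
--    re-executed as a scalar subquery for every output row
CREATE TEMP TABLE tmp_policy_status AS
SELECT policy_id, max(status_date) AS last_status_date
FROM   policy_status_history
GROUP  BY policy_id;

-- 2. Index and analyze so the per-row lookups become index scans
CREATE INDEX ON tmp_policy_status (policy_id);
ANALYZE tmp_policy_status;

-- 3. The main query now probes the small indexed temp table
SELECT p.*,
       (SELECT t.last_status_date
          FROM tmp_policy_status t
         WHERE t.policy_id = p.policy_id) AS last_status_date
FROM   policies p;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;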

&lt;p&gt;I had to obfuscate the plan, tables, and query due to company security policy, so the pictures below are messy.&lt;/p&gt;

&lt;p&gt;The offending query was a SELECT on a view; here is a partial view definition:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakurhkdi0ho1vtyhlxh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakurhkdi0ho1vtyhlxh6.png" alt="Image description" width="720" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see cte_policy_status being used in multiple scalar subqueries above.&lt;/p&gt;

&lt;p&gt;The plan is big (300+ lines), so I will just copy the relevant part:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rwjpi7f7za0qwfko9rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rwjpi7f7za0qwfko9rb.png" alt="Image description" width="720" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pj9p73mkup3t8m7tulj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pj9p73mkup3t8m7tulj.png" alt="Image description" width="720" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problematic query took around 3–4 minutes per execution, and there were multiple sessions executing it concurrently. Thanks to &lt;a href="https://explain.depesz.com/" rel="noopener noreferrer"&gt;depesz’s tool&lt;/a&gt;, you can clearly see that cte_policy_status is the main bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foq0gwpkmkpkrpmbkrpeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foq0gwpkmkpkrpmbkrpeq.png" alt="Image description" width="710" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lx3ksav0tnklmuabi4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lx3ksav0tnklmuabi4w.png" alt="Image description" width="720" height="334"&gt;&lt;/a&gt;&lt;br&gt;
Happily Ever After Version&lt;/p&gt;

&lt;p&gt;The problematic query finished in milliseconds after splitting it up with temp tables and indexing/analyzing them; logical I/O dropped significantly, and everyone lived happily ever after.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Plan Baselines and Hints
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3shucks4heel20q2wdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3shucks4heel20q2wdn.png" alt="Image description" width="720" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oracle’s dbms_spm guarantees SQL execution plan stability, a feature we lack in Postgres for locking down a specific execution plan for a SQL statement. Postgres does have pg_hint_plan, but it’s not quite the same as dbms_spm. After migration, SQLs with SPM baselines can surprise you. However, most of these baselines are created to work around missing statistics or incorrect timing of stats collection, which can be resolved by reviewing the process flow and running ANALYZE on tables after any bulk data loads. For the remaining minor cases, pg_hint_plan can be a lifesaver. One example is the transactional outbox pattern implementation in microservice architectures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuftwyb1nlq78o0hfyd8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuftwyb1nlq78o0hfyd8u.png" alt="Image description" width="720" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this pattern, every transaction includes a corresponding insert or update in the outbox. The data volume continuously fluctuates due to deletions based on polling frequency, and the table is constantly read by a message relay. This is a perfect recipe for disaster as it generates many dead tuples. Autoanalyze may collect stats when the outbox table is empty, resulting in subsequent full scans. Tiny cloud instances provisioned for microservices can quickly hit I/O, throughput, and CPU limits. Increasing autovacuum frequency helps but at the cost of instance throughput and CPU capacity. Even so, with many such full scan queries running on a bloated outbox table, the database can become overloaded.&lt;/p&gt;

&lt;p&gt;A simple index hint via pg_hint_plan on the outbox table’s queries can save DBAs from many sleepless nights. By proactively managing these potential pitfalls, you can ensure smoother operations and keep your database running efficiently.&lt;/p&gt;
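
&lt;p&gt;A minimal sketch of such a hint (assuming pg_hint_plan is installed and loaded; the outbox table, index, and relay query are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- The hint comment must lead the statement. Here we pin the relay's
-- polling query to an index scan even when autoanalyze last saw the
-- outbox empty and the planner would otherwise choose a seq scan.
/*+ IndexScan(outbox outbox_unprocessed_idx) */
SELECT id, event_type, payload
FROM   outbox
WHERE  processed = false
ORDER  BY id
LIMIT  100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;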

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79h454cwmuh6bjo63qcb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79h454cwmuh6bjo63qcb.gif" alt="Image description" width="853" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wxdp389ptx34dl9dg0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wxdp389ptx34dl9dg0f.png" alt="Image description" width="720" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  To Err is human, To exception divine.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12e5a3b4wbedrekgulvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12e5a3b4wbedrekgulvu.png" alt="Image description" width="720" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reason for this distinction is evident in the PostgreSQL documentation, showcased in all its glory.&lt;/p&gt;

&lt;p&gt;Oracle and PostgreSQL behave differently in their approach to exceptions. In Oracle, one can perform various mental gymnastics within the exception handling section, utilizing statement-level rollbacks and commits. While I won’t delve into the functional differences here, it’s worth noting that when developers attempt to replicate Oracle’s behavior in PostgreSQL, they often resort to savepoints or unnecessarily complicate the exception handling logic. In PL/pgSQL, every exception block is internally implemented via a savepoint (a subtransaction).&lt;/p&gt;

&lt;p&gt;Using savepoints to mimic exception handling code can lead to catastrophic subtransaction SLRU lock wait events, potentially bringing your application to a complete halt.&lt;/p&gt;

&lt;p&gt;The famous GitLab incident…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rac1547vna0s5zbacjf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rac1547vna0s5zbacjf.png" alt="Image description" width="719" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Their Ruby code roughly translates to the flow below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgupib3tv4ccv6tvjgspw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgupib3tv4ccv6tvjgspw.png" alt="Image description" width="720" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here you can see the performance cliff caused by SLRU waits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka399g95howwoatwqtd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka399g95howwoatwqtd8.png" alt="Image description" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The good news is that starting with PostgreSQL 17, you can increase cache sizes related to SLRU, such as subtransaction_buffers, multixact_member_buffers, and notify_buffers, which helps mitigate the risk of such performance cliffs.&lt;/p&gt;
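
&lt;p&gt;A minimal sketch of those knobs (PostgreSQL 17+; the sizes are illustrative starting points, not recommendations, and they can only be set at server start):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# postgresql.conf
subtransaction_buffers   = '64MB'   # SLRU cache for subtransaction status
multixact_member_buffers = '32MB'   # SLRU cache for multixact members
notify_buffers           = '16MB'   # SLRU cache for LISTEN/NOTIFY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;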

&lt;p&gt;Additionally, you can use clever techniques to avoid exceptions altogether.&lt;br&gt;
For example: instead of an exception section for PK violations, you can use the ON CONFLICT clause.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudy17839zpjus4nd7wnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudy17839zpjus4nd7wnx.png" alt="Image description" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An added benefit: ON CONFLICT ... DO NOTHING reduces dead tuples.&lt;/p&gt;
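
&lt;p&gt;A minimal sketch of the idea (table and columns illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- PL/pgSQL style to avoid: each EXCEPTION block opens a subtransaction,
-- and every rejected insert still leaves a dead tuple behind.
--   BEGIN
--       INSERT INTO accounts (id, name) VALUES (1, 'alice');
--   EXCEPTION WHEN unique_violation THEN
--       NULL;  -- swallow the duplicate
--   END;

-- Plain SQL alternative: no subtransaction, no dead tuple on duplicates
INSERT INTO accounts (id, name)
VALUES (1, 'alice')
ON CONFLICT (id) DO NOTHING;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;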

&lt;h2&gt;
  
  
  Last but not the least
&lt;/h2&gt;

&lt;p&gt;At PGConf-EU 2023, I gave a lightning talk about the hidden objects that can cause portability and potential licensing issues when using cloud vendor-provided tools for Oracle (self-hosted) to Postgres (cloud) migration. I was surprised to find that many DBAs were unaware of these issues in cloud environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r69cvz4bumcwnwmmgmp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r69cvz4bumcwnwmmgmp.png" alt="Image description" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gq04tep1cdl11ijs83k.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gq04tep1cdl11ijs83k.gif" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In just a year, the community has already come up with a solution. &lt;a href="https://github.com/dcgadmin/extensionmigrationassistant" rel="noopener noreferrer"&gt;ExtensionMigrationAssistant&lt;/a&gt; is an open-source tool designed to detect cloud vendor-specific objects and functions in a migrated Postgres schema. Additionally, the creators have established a company, DataCloudGaze, to assist with end-to-end Oracle to Postgres migrations. This rapid response is a testament to Postgres’s vibrant and innovative community.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqw76w4q0pv2slkcbm38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqw76w4q0pv2slkcbm38.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, despite all the challenges moving out of Oracle, rest assured that Postgres’s dynamic community has got your back, constantly evolving and coming up with solutions to make your transition smoother and more efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References :&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blogs.oracle.com/infrastructure/post/how-the-autonomous-database-will-change-the-dba-rolefor-the-better" rel="noopener noreferrer"&gt;https://blogs.oracle.com/infrastructure/post/how-the-autonomous-database-will-change-the-dba-rolefor-the-better&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/racad/introduction-to-oracle-rac.html#GUID-D04AA2A7-2E68-4C5C-BD6E-36C62427B98E" rel="noopener noreferrer"&gt;https://docs.oracle.com/en/database/oracle/oracle-database/19/racad/introduction-to-oracle-rac.html#GUID-D04AA2A7-2E68-4C5C-BD6E-36C62427B98E&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://explain.depesz.com/" rel="noopener noreferrer"&gt;https://explain.depesz.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datacloudgaze.com/post/extension-migration-assistance" rel="noopener noreferrer"&gt;https://www.datacloudgaze.com/post/extension-migration-assistance&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" rel="noopener noreferrer"&gt;https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://microservices.io/patterns/data/transactional-outbox.html?source=post_page-----11a3c27c011f--------------------------------" rel="noopener noreferrer"&gt;https://microservices.io/patterns/data/transactional-outbox.html?source=post_page-----11a3c27c011f--------------------------------&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.depesz.com/2023/06/04/a-tale-about-incomplete-upgrade-from-postgresql-12-to-14/" rel="noopener noreferrer"&gt;https://www.depesz.com/2023/06/04/a-tale-about-incomplete-upgrade-from-postgresql-12-to-14/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>oracle</category>
      <category>postgres</category>
      <category>dbmigration</category>
    </item>
    <item>
      <title>Schrödinger’s Backups: A Lesson in Uncertainty</title>
      <dc:creator>mayur</dc:creator>
      <pubDate>Thu, 17 Oct 2024 14:46:49 +0000</pubDate>
      <link>https://forem.com/mayur555b/schrodingers-backups-a-lesson-in-uncertainty-di9</link>
      <guid>https://forem.com/mayur555b/schrodingers-backups-a-lesson-in-uncertainty-di9</guid>
      <description>&lt;p&gt;Backups are a lot like Schrödinger’s cat. You think you have them, but when it’s time to check, they might not be what you expect! Even with the best tools in your arsenal, like pgBackRest, things can still go hilariously wrong. It’s not that pgBackRest falters; instead, the chaos is caused by human miscommunication and policy missteps. It’s a reminder that no matter how sharp the tool, it’s all about how you wield it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pxmbutw0n3a5lpwt8xm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pxmbutw0n3a5lpwt8xm.png" alt="Image description" width="739" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 1 :&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, here’s the deal. A big miscommunication-mismanagement-misalignment went down among the top-level decision-makers behind the backup retention policy at one of my previous companies. Result? BOOM. Backup retention was set to 60 days in the pgBackRest stanza, instead of the seven years written in some obscure corner of Confluence (and it was a single stanza, no typical hot/cold separation, omg!!). One fine day, I went hunting for an old backup after a developer’s code mistake deleted last quarter’s data (hence the need to recover from an old backup)… and there was nothing! Zero, zilch, zip, nada, nothing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujzfod0j1tqvjrmcso62.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujzfod0j1tqvjrmcso62.gif" alt="Image description" width="360" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pgstef.github.io/2020/05/25/pgbackrest_preview_tour_of_retention_policy_options.html" rel="noopener noreferrer"&gt;This blog&lt;/a&gt; by pgstef is an excellent overview of retention policy options for those new to Postgres. We had weekly automated backup-restore tests configured, but they were ultimately futile in safeguarding against such a significant management mistake in defining the retention policy.&lt;/p&gt;
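
&lt;p&gt;For context, this is roughly the kind of stanza setting that bit us; a minimal pgbackrest.conf sketch (the values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[global]
repo1-retention-full=60            # with daily fulls, roughly 60 days, then gone
repo1-retention-full-type=count    # retention counted in backups, not time

# a seven-year policy really wants a separate cold repo/stanza, e.g. with
# repo1-retention-full-type=time and a far larger retention window
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;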

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8w7cengeddd0rg0qepo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8w7cengeddd0rg0qepo.png" alt="Image description" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident 2:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But wait, it gets better (or worse). The company had a second backup server with a different pgBackRest stanza for a few more databases, where retention was set to seven years as expected. Unfortunately, they had lumped cold and hot backups into a single stanza. What did this lead to? WAL archives going back to the dawn of time itself — like, 3.2 billion years ago. The storage ballooned to petabytes. &lt;strong&gt;PETABYTES&lt;/strong&gt;. Yes, you read that right. It was as if the company was trying to archive the entire universe. Talk about overkill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq27y5ug4kixvjl19yeo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq27y5ug4kixvjl19yeo.png" alt="Image description" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Of course, we eventually separated the cold and hot backup stanzas for each database cluster. You’d think that would resolve the issue, right? Well, the quantum curse wasn’t entirely lifted. A single 32TB database took over an entire day to back up! Imagine the Recovery Time Objective (RTO) for a point-in-time recovery (PITR). 😭 Just contemplating it was enough to break our spirits. But as they say, we don’t bow to the god of death just yet.&lt;br&gt;
We upgraded pgBackRest from v1.1 to v2.46, increased process-max to 12, switched compression to lz4, upgraded the instance to r5b.16xlarge (64 vCPUs, 512GB RAM) and changed the storage to io2 Block Express (64K IOPS). This brought the full backup time down to 4 hours, with incrementals taking about 1 hour. Nonetheless, there’s still a looming threat: if the database growth rate isn’t curbed, the RTO will continue to rise.&lt;/p&gt;
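
&lt;p&gt;The pgBackRest side of that tuning boils down to a couple of lines; a minimal sketch (these values worked for us and are not universal recommendations):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[global]
process-max=12        # parallel processes for backup/restore
compress-type=lz4     # much lighter on CPU than gzip, so throughput goes up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;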

&lt;h2&gt;
  
  
  &lt;strong&gt;Enter PostgreSQL 17 to Save the Day:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Enter PostgreSQL 17 (I wrote this article when PG17 RC1 was released and used it for the demo; you should use the GA version). With this, we could do a full backup once and then rely on incremental backups going forward. No more waiting for days on end. For month-end and year-end backups? Just restore them on ZFS with high compression, and shut down the database when not in use. No more insane delays. Just streamlined, efficient recovery and peace of mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Performing the Initial Full Backup&lt;/strong&gt;&lt;br&gt;
To set up your backup strategy, start with a full backup of your database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Creating Incremental Backups&lt;/strong&gt;&lt;br&gt;
After completing the full backup, you can perform incremental backups to capture only the changes made since the last backup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Using pg_combinebackup to Merge and Propagate Full Backups&lt;/strong&gt;&lt;br&gt;
PostgreSQL 17 introduces the pg_combinebackup utility to manage and combine incremental backups efficiently. It lets you merge incremental backups with a previous full backup, creating a new, up-to-date full backup without having to perform a full backup from scratch.&lt;/p&gt;

&lt;p&gt;Below are the steps demonstrating the above concept, tested on Postgres v17rc1:&lt;/p&gt;

&lt;p&gt;A) Make sure WAL summarization is turned on before you take the first full backup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7pziau4mpknh5b1f2vo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7pziau4mpknh5b1f2vo.png" alt="Image description" width="800" height="149"&gt;&lt;/a&gt;&lt;/p&gt;
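
&lt;p&gt;In case the screenshot is hard to read, this is roughly what that step looks like (run as superuser):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- summarize_wal must be on before the full backup, otherwise later
-- incremental backups have no WAL summaries to work from
ALTER SYSTEM SET summarize_wal = on;
SELECT pg_reload_conf();   -- takes effect without a restart
SHOW summarize_wal;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;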

&lt;p&gt;B) Sample database and dummy data generation for the tests (note: I will add more data between the incremental backups).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfg4b8ijgym4naavoq3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfg4b8ijgym4naavoq3p.png" alt="Image description" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;C) Create the first full backup using pg_basebackup in the usual way&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff28jwjrtqlv6ztxqdt9v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff28jwjrtqlv6ztxqdt9v.png" alt="Image description" width="800" height="53"&gt;&lt;/a&gt;&lt;/p&gt;
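
&lt;p&gt;Roughly, as a sketch (paths and connection options illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# a plain full base backup; the backup_manifest it writes is what the
# incremental backups will later be based on
pg_basebackup -D /backups/full -c fast
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;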

&lt;p&gt;D) Create next incrementals and change data for tests&lt;/p&gt;

&lt;p&gt;The 1st incremental uses the backup_manifest of the first full backup; for the 2nd incremental we can use either the 1st incremental or the first full as the base. Here we use the 1st incremental as the base for the 2nd incremental.&lt;/p&gt;

&lt;p&gt;1st incremental&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0mg7vg0vgjhmiwlkwlh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0mg7vg0vgjhmiwlkwlh.png" alt="Image description" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2nd incremental&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae10iotjmlbk2euohnh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae10iotjmlbk2euohnh0.png" alt="Image description" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;
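
&lt;p&gt;The commands behind those two screenshots look roughly like this (paths illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1st incremental: based on the full backup's manifest
pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest

# 2nd incremental: chained off the 1st incremental's manifest
pg_basebackup -D /backups/incr2 --incremental=/backups/incr1/backup_manifest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;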

&lt;p&gt;E) Use pg_combinebackup to shift the full backup forward in time without taking another full backup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr02gou1f2fit2w3hwaky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr02gou1f2fit2w3hwaky.png" alt="Image description" width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;
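
&lt;p&gt;Again as a sketch (the backups are listed oldest first; paths illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# synthesize a new full backup from the chain full, incr1, incr2
pg_combinebackup /backups/full /backups/incr1 /backups/incr2 -o /backups/combined
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;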

&lt;p&gt;This is a very powerful concept: we are moving the full backup ahead in time, bringing it closer to the most recent incremental. Because of this, we not only save on resources but also reduce recovery time significantly.&lt;/p&gt;

&lt;p&gt;F) Restore new combined full backup and compare data&lt;/p&gt;

&lt;p&gt;I will restore it on another port (6666) for the sake of testing and easy comparison in blog format; you should always restore on a different host to minimize risk.&lt;/p&gt;

&lt;p&gt;Restoring database on port 6666:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeeaeg64vq22oijp5x1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeeaeg64vq22oijp5x1l.png" alt="Image description" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;
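
&lt;p&gt;Roughly, as a sketch (the combined output is a regular data directory; paths, port, and the table being checked are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# fix permissions, then start a second instance off the combined backup
chmod 700 /backups/combined
pg_ctl -D /backups/combined -o "-p 6666" start
psql -p 6666 -c "SELECT count(*) FROM accounts;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;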

&lt;p&gt;Verifying data by comparing restored vs original&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr1zn2nccpmzahfkcdy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr1zn2nccpmzahfkcdy4.png" alt="Image description" width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL 17’s native incremental backup may not address human-induced challenges by default, but it certainly simplifies key aspects of backup management. It enables easier implementation of retention policies, facilitates hot and cold data separation, and allows for a single full backup followed by incremental backups indefinitely, significantly improving recovery time objectives (RTO).&lt;/p&gt;

&lt;p&gt;Finally, we have defeated Schrödinger and kept our backups alive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c6x880g14zoe4oyatp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6c6x880g14zoe4oyatp2.png" alt="Image description" width="448" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PS: Quantum Gravity: Optimizing Cold Storage with ZFS&lt;/strong&gt;&lt;br&gt;
It’s a wild idea, not yet used in production (that I know of). Try ZFS for high compression and effective storage management. You can restore backups on ZFS, apply high compression (we were able to achieve a ~6x compression ratio with zstd), keep the restored DB shut down, isolate the machine from external connections completely, and you have a very efficient cold backup with a 100% guaranteed-restorable Postgres.&lt;/p&gt;
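
&lt;p&gt;A minimal sketch of the ZFS side (pool and dataset names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# dataset tuned for a cold, compressed Postgres restore
zfs create -o compression=zstd tank/pg_cold
# ...restore the backup into /tank/pg_cold, verify it starts, then shut it down
zfs get compressratio tank/pg_cold   # we saw roughly 6x on our data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;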

</description>
      <category>database</category>
      <category>postgressql</category>
      <category>postgres</category>
      <category>dba</category>
    </item>
    <item>
      <title>Autovacuum Tuning in The Cloud Era</title>
      <dc:creator>mayur</dc:creator>
      <pubDate>Mon, 14 Oct 2024 09:30:03 +0000</pubDate>
      <link>https://forem.com/mayur555b/autovacuum-tuning-in-the-cloud-era-17lf</link>
      <guid>https://forem.com/mayur555b/autovacuum-tuning-in-the-cloud-era-17lf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby1ue4w9204olbfgoeug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby1ue4w9204olbfgoeug.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How often do you hear DBAs vent, "Autovacuum used to fly on bare-metal, but in the cloud, it feels like it's dragging forever!" Or they might say, "I've doubled the max_autovacuum_workers, yet dead tuples just keep stacking up," or "Autovacuum isn't getting the throughput our cloud provider promised!" And then there's the developer chiming in, "Every time Autovacuum kicks in on that big table, our application queries start timing out!" These frustrations are all too common in cloud environments, where expectations meet the harsh reality of "you get what you pay for."&lt;/p&gt;

&lt;p&gt;In the cloud era, PostgreSQL's Autovacuum tuning can feel like a tightrope walk - juggling application performance, costs, and autovacuum efficiency due to cloud-imposed limitations. The cloud puts strict caps on I/O and throughput. Push Autovacuum too hard, and it'll devour your application's resources, tanking performance. But dial it down too much, and bloat creeps in, eventually strangling the application's efficiency. It's all about finding that sweet spot.&lt;/p&gt;

&lt;p&gt;Often, DBAs increase autovacuum_max_workers and maintenance_work_mem, or adjust autovacuum_vacuum_scale_factor, but still wonder why autovacuum fails to clean the database effectively. This inefficiency arises because autovacuum throughput is constrained by autovacuum_vacuum_cost_limit and further limited by host restrictions in the cloud. Before we delve into tuning autovacuum, we need some metrics at hand. Autovacuum is known for being I/O-intensive, but just how much strain does it place on your system? Let's explore.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# First, let's see how much IO to expect.
# At most (best case: every page is a buffer hit, costed at vacuum_cost_page_hit),
# an autovacuum can do IO as shown below:
Max Autovacuum throughput = (1000/autovacuum_vacuum_cost_delay) * (autovacuum_vacuum_cost_limit/vacuum_cost_page_hit) * 8 KB (default block_size)

# For PG17 default settings (cost_delay = 2ms, cost_limit = 200, page_hit = 1):
Max Autovacuum throughput = (1000/2) * (200/1) * 8 KB = 800MB per second
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
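
&lt;p&gt;The real ceiling is lower once page misses and dirtied pages enter the cost accounting. You can pull the relevant knobs for your own instance with a quick query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('autovacuum_vacuum_cost_delay',
                'autovacuum_vacuum_cost_limit',
                'vacuum_cost_limit',
                'vacuum_cost_page_hit',
                'vacuum_cost_page_miss',
                'vacuum_cost_page_dirty');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;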



&lt;p&gt;Now let's see what the I/O throughput limits are on different clouds, and at what cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Starting with AWS&lt;/strong&gt; (since it's the cloud most familiar to me):&lt;br&gt;
The answer below is old, but I just wanted to show that the throughput limit in AWS depends on both the instance class and the storage class.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0uinpkrrnqgydx69d03m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0uinpkrrnqgydx69d03m.png" alt="Image description" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Burstable instance class: its throughput is too small and generally not suitable for performance-demanding production workloads, so I'm ignoring it.&lt;/p&gt;

&lt;p&gt;Standard instance class: general-purpose standard instances of the latest Graviton series offer better throughput, up to 2.5GBps.&lt;br&gt;
Gbps/Mbps here is in bits, hence the division by 8 to get GB/s or MB/s.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F935mnz90rlvpiebsdmtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F935mnz90rlvpiebsdmtw.png" alt="Image description" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Up to"&lt;/strong&gt; is a very vague term. I found another piece of the puzzle in the AWS documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foajisauou9bjh3inxlpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foajisauou9bjh3inxlpj.png" alt="Image description" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since maximum performance is guaranteed for only 30 minutes per day, we will focus only on baseline values.&lt;/p&gt;

&lt;p&gt;Next, storage classes and limits:&lt;br&gt;
We will only consider SSDs and the latest storage classes available.&lt;br&gt;
GP3: GP3 offers 15MB/s for tiny (below 400GB) databases and a customizable range of 62.5MB/s to 500MB/s for larger databases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dysxgtk0gq8l26svilb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dysxgtk0gq8l26svilb.png" alt="Image description" width="800" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;io2 :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov4lxevpwua0vezr7c26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov4lxevpwua0vezr7c26.png" alt="Image description" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For io2, we can reach up to 500MB/s as we crank up the IOPS knob.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9e7gqtmv0n1xc7qszvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9e7gqtmv0n1xc7qszvh.png" alt="Image description" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Cost : Addressing the Elephant in the room&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4fa4phn4qsdrl35riib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4fa4phn4qsdrl35riib.png" alt="Image description" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's check in the AWS pricing calculator how much we would need to spend to reach 500MB/s throughput capacity on AWS (in the cheapest region) for a 1TB RDS Postgres database. To simplify calculations and comparisons we will consider single-AZ RDS; you should always use multi-AZ in production.&lt;br&gt;
Instance class m7g.4xlarge (baseline is 625MB/s), storage io2 with 256000 IOPS&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42i56rrihkwpuqansimw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42i56rrihkwpuqansimw.png" alt="Image description" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02o3ckk8tkyq90mub074.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02o3ckk8tkyq90mub074.png" alt="Image description" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Holy IOPS!! 26K for 1TB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zs97vkgsqh1naqw3gnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zs97vkgsqh1naqw3gnp.png" alt="Image description" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Don't worry, gp3 saves the day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp8an1xekr8h1i8sdy54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flp8an1xekr8h1i8sdy54.png" alt="Image description" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxvigzmb6e5jtmzxd882.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxvigzmb6e5jtmzxd882.png" alt="Image description" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That seems like a reasonable cost if you have a high throughput requirement on RDS.&lt;/p&gt;

&lt;p&gt;Next, we will look at the pricing and capacity needed to achieve the same 500 MB/s throughput on Google Cloud SQL for PostgreSQL and Azure Database for PostgreSQL Flexible Server, the two closest counterparts of Amazon RDS for Postgres.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google :&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0nqg96fx3blcli7aqlk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0nqg96fx3blcli7aqlk.png" alt="Image description" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, on GCP, we will need a zonal extreme persistent disk and an N2 VM with 64 vCPUs to get 500 MB/s.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7taafrxjapb25a4y41wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7taafrxjapb25a4y41wn.png" alt="Image description" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Google pricing calculator is quite limited: it does not allow selecting a zonal extreme persistent disk and only offers a generalized SSD cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Azure Database for PostgreSQL — Flexible Server&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will need at least 16 vCores (for example, the D16ds_v5 instance), which can provide up to 600 MB/s of throughput.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffei0tdllsyc2rzghdhm8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffei0tdllsyc2rzghdhm8.png" alt="Image description" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Premium SSD v2 is still in preview as of this writing.&lt;/p&gt;

&lt;p&gt;So the estimated monthly cost would be around $1,400.00, slightly lower than AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ug03gf8qju9o4olbhu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ug03gf8qju9o4olbhu3.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why did I dive into the cost of compute and storage? It's not just about convincing the cloud FinOps team that we need a throughput buffer (though that's part of it). As a DBA, you might find yourself in a production incident where a small capacity bump costs your company only a few hundred dollars but prevents a potential disaster worth thousands! Sometimes the difference between smooth sailing and costly downtime is simply increasing capacity proactively before things go south. 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgw01nmaz1ospp0yjt2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgw01nmaz1ospp0yjt2m.png" alt="Image description" width="500" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Juggling Cost, Autovacuum I/O, and Application Performance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb1fomvx6g3yp5v2hry2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb1fomvx6g3yp5v2hry2.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autovacuum tuning: consider the following two scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Your “Pet” Database:&lt;br&gt;
If you only have a single beloved database to care for, with ample throughput capacity, you can fine-tune autovacuum_vacuum_cost_limit and autovacuum_max_workers to keep the vacuum backlog low. Regularly monitor dead tuples using pg_stat_all_tables to identify potential issues (see the sketch after this list). For very large tables that require special attention, adjust the table-level Autovacuum parameters while maintaining conservative settings for the rest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Microservices Architecture:&lt;br&gt;
In a company operating a microservices architecture with hundreds of independent RDS PostgreSQL databases, cloud FinOps experts aim to optimize costs by selecting RDS instance classes and storage that align with the application's throughput and latency requirements, a practice known as right-sizing. However, this approach often overlooks Autovacuum. On an undersized instance, merely increasing autovacuum_vacuum_cost_limit, reducing autovacuum_vacuum_cost_delay, or adding more workers won't suffice.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
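
&lt;p&gt;A minimal sketch of the “pet database” routine. The thresholds and values are illustrative starting points, not recommendations, and big_hot_table is a hypothetical table name:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Tables with the largest dead-tuple backlog (threshold is illustrative)
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
       last_autovacuum
FROM pg_stat_all_tables
WHERE n_dead_tup &gt; 10000
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Table-level overrides for a very large, frequently updated table,
-- keeping the global defaults conservative for everything else.
-- big_hot_table is a hypothetical name.
ALTER TABLE big_hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows
    autovacuum_vacuum_cost_limit   = 2000,  -- allow more work per cycle
    autovacuum_vacuum_cost_delay   = 1      -- shorter naps between cycles (ms)
);
&lt;/code&gt;&lt;/pre&gt;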

&lt;p&gt;As a DBA, you must think creatively :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consider partitioning problematic tables so Autovacuum can operate on smaller chunks. With indexes local to partitions, index sizes shrink, which helps because Autovacuum vacuums every index of a table on each run.&lt;/li&gt;
&lt;li&gt;Detect and drop unused indexes; the fewer indexes, the quicker Autovacuum can finish (see the queries after this list).&lt;/li&gt;
&lt;li&gt;Minimize long-running transactions to prevent them from holding back the xmin horizon and rendering some Autovacuum runs ineffective.&lt;/li&gt;
&lt;li&gt;Keep autovacuum_work_mem large, around 1 GB.&lt;/li&gt;
&lt;li&gt;If Autovacuum on large tables still hogs throughput and impacts application performance, you may need to throttle it by lowering autovacuum_vacuum_cost_limit and raising autovacuum_vacuum_cost_delay.&lt;/li&gt;
&lt;li&gt;However, don’t go too conservative on cost limit and delay; this risks TXID wraparound, which can bring down your entire database. Always implement an Early Warning System for TXID wraparound (AWS offers &lt;a href="https://aws.amazon.com/blogs/database/implement-an-early-warning-system-for-transaction-id-wraparound-in-amazon-rds-for-postgresql/" rel="noopener noreferrer"&gt;a detailed guide&lt;/a&gt; on this).&lt;/li&gt;
&lt;/ul&gt;
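
&lt;p&gt;A few helper queries for the points above, sketched against the standard statistics views. Validate that idx_scan stays at 0 across a representative workload period (and across stats resets) before actually dropping anything:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Candidate unused indexes (never scanned since the last stats reset).
-- Unique indexes are excluded: they may enforce constraints.
SELECT s.schemaname,
       s.relname,
       s.indexrelname,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;

-- Long-running transactions that can hold back the xmin horizon
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       backend_xmin,
       left(query, 60) AS query_snippet
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Raise autovacuum_work_mem instance-wide (picked up on config reload)
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();
&lt;/code&gt;&lt;/pre&gt;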

&lt;p&gt;Additionally, log these warnings and observe dead-tuple growth trends in your most critical databases. Share these findings with FinOps, advocating funding for more throughput capacity at both the instance and storage level.&lt;/p&gt;
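
&lt;p&gt;On the early-warning side, here is a minimal wraparound check you can schedule and alert on. The percentage is measured against the 2^31 - 1 hard limit; pick your own alert threshold well below it (autovacuum_freeze_max_age defaults to 200 million):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Transaction ID age per database; alert long before wraparound
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483647, 2) AS pct_towards_wraparound
FROM pg_database
ORDER BY xid_age DESC;
&lt;/code&gt;&lt;/pre&gt;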

&lt;p&gt;A picture is worth a thousand words:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesp9sqxdlt945uqdd1m6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesp9sqxdlt945uqdd1m6.png" alt="Image description" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>aws</category>
      <category>azure</category>
    </item>
  </channel>
</rss>
