<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Raouf Chebri</title>
    <description>The latest articles on Forem by Raouf Chebri (@raoufchebri).</description>
    <link>https://forem.com/raoufchebri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F853339%2F3fe742c9-df9d-4d6a-af4a-4e995a896424.JPG</url>
      <title>Forem: Raouf Chebri</title>
      <link>https://forem.com/raoufchebri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/raoufchebri"/>
    <language>en</language>
    <item>
      <title>Build with confidence with Schema Diff &amp; Protected Branches</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Tue, 16 Apr 2024 13:33:07 +0000</pubDate>
      <link>https://forem.com/neon-postgres/build-with-confidence-with-schema-diff-protected-branches-1p68</link>
      <guid>https://forem.com/neon-postgres/build-with-confidence-with-schema-diff-protected-branches-1p68</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3tlN2BAU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/04/image-30-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3tlN2BAU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/04/image-30-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Neon helps teams ship with confidence without compromising development velocity. One of the features that contributes to this is database branching. In this post, we will explore two new features related to database branching: schema diff and protected branches.&lt;/p&gt;

&lt;p&gt;Database branches in Neon are copy-on-write clones of your database that you can use for development, testing, or experimentation without compromising the original database. &lt;/p&gt;

&lt;p&gt;Ever since we introduced the database branching feature to Neon, developers have asked for ways to set permission rules on individual branches and identify differences between parent and child branches. And here they are. &lt;/p&gt;

&lt;p&gt;Today, we’ll explore two of our newest features, which will provide you with greater confidence. &lt;/p&gt;

&lt;h1&gt;
  
  
  Schema Diff
&lt;/h1&gt;

&lt;p&gt;Similar to diffs in Git, the Neon schema diff feature compares schemas between the current and past state of the branch. Schema diffs are important to development workflows as they allow you to easily track how your database schema has evolved for better debugging, code review, and cross-team collaboration. For example, you can compare schemas after your peer has merged their PR and applied migrations.&lt;/p&gt;

&lt;p&gt;Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know what you think and how you use schema diff in your workflows. &lt;/p&gt;

&lt;p&gt;We detailed how Neon storage and ephemeral branches work in the &lt;a href="https://neon.tech/blog/point-in-time-recovery-in-postgres#ephemeral-branches" rel="noopener noreferrer"&gt;Point In Time Recovery Under the Hood in Serverless Postgres&lt;/a&gt; article. In short, Neon’s storage engine saves Write-Ahead-Log records and can reconstruct a Postgres page at any given timestamp or Log-Sequence Number, allowing for time travel queries. &lt;/p&gt;

&lt;p&gt;Under the hood, schema diff creates short-lived, ephemeral branches (TTL=10 seconds) pinned to a specific time (and LSN), queries the Pageservers to retrieve the past schema, and then compares it with the current one to display the changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmS50z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/i8SiccLqfaZc3LGzo9LS2PDUtely5JtRlumiSemhfxWXjpqklpN3UOCZ9FsQ77KRgk1tDxnhQyTRJIhjOq-0cVAqpPoBIEEbfnwdsJdHmEjtofApOcqf1cHPas-2Xe5kn4lFL_yoN_q2koMeXNE_wr8" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmS50z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/i8SiccLqfaZc3LGzo9LS2PDUtely5JtRlumiSemhfxWXjpqklpN3UOCZ9FsQ77KRgk1tDxnhQyTRJIhjOq-0cVAqpPoBIEEbfnwdsJdHmEjtofApOcqf1cHPas-2Xe5kn4lFL_yoN_q2koMeXNE_wr8" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s see an example of schema diff on the Console. For that, we will create a Neon project and run the following query to create a user table:&lt;/p&gt;


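&lt;p&gt;A minimal example of such a query (the table and column names here are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
&lt;/code&gt;&lt;/pre&gt;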

&lt;p&gt;Once done, we can compare the current state of the database, which now contains the user table, to its previous state, in which there were no tables. To do so, we first need to navigate to the Restore page, select a branch, and click on Schema Diff.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pDpMJraF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Mdk92rcHNgmAI6N12lQgq9pJJrP9Z65yIQ91x-UOsJMFeqoFIfSybmvrHVWEdKWXDCBt3PDPd0HVep818pSP3fMuaKqXh_DXfoN4e_wV62DBS_PI2wp3aKkqtkwZ1_t6vhW6W21Lmc9hvU3woNkj9Qw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pDpMJraF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Mdk92rcHNgmAI6N12lQgq9pJJrP9Z65yIQ91x-UOsJMFeqoFIfSybmvrHVWEdKWXDCBt3PDPd0HVep818pSP3fMuaKqXh_DXfoN4e_wV62DBS_PI2wp3aKkqtkwZ1_t6vhW6W21Lmc9hvU3woNkj9Qw" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is what the result will look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Oy1QsmCH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/mR7snpsTg0mn3d-X-DGrVzEYUGBy9ostInduEnvaO1TQv8F08R90KSomLPHY-hSCjhV0jNQGkp-pv7z9R2ciu73fShvxku7cWGkR70ZL8lMSnjY1IWLoULL5bBlMYJPzynA0JgAsw4uNnTpIRUQprM0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Oy1QsmCH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/mR7snpsTg0mn3d-X-DGrVzEYUGBy9ostInduEnvaO1TQv8F08R90KSomLPHY-hSCjhV0jNQGkp-pv7z9R2ciu73fShvxku7cWGkR70ZL8lMSnjY1IWLoULL5bBlMYJPzynA0JgAsw4uNnTpIRUQprM0" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, the previous state of my database is empty because I had just created the project. Let’s now modify my schema by adding a new &lt;code&gt;phone_number&lt;/code&gt; column:&lt;/p&gt;


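&lt;p&gt;For example, assuming a &lt;code&gt;users&lt;/code&gt; table was created earlier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ALTER TABLE users ADD COLUMN phone_number TEXT;
&lt;/code&gt;&lt;/pre&gt;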

&lt;p&gt;Let’s compare again with the state of my database at 10:52 PM, just after I ran the schema changes. The result should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l5IMB_yr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/kMZvzi2VLjUIQP-jVt8atJKrJCU9GQ8Rk3NVNcQq_WOIewfqNP0zzB16JGDQXV-t90Ok1nvmSoQPVMfk0b2J5mNMdZ2rE3muqBeR-L2tiKMetGX6Tl-tnfYr0pN6YGJXh7PzOVrXtgF4nF3Kw8PKMCc" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l5IMB_yr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/kMZvzi2VLjUIQP-jVt8atJKrJCU9GQ8Rk3NVNcQq_WOIewfqNP0zzB16JGDQXV-t90Ok1nvmSoQPVMfk0b2J5mNMdZ2rE3muqBeR-L2tiKMetGX6Tl-tnfYr0pN6YGJXh7PzOVrXtgF4nF3Kw8PKMCc" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can observe that lines 60 and 61 were modified as a result of the schema change.&lt;/p&gt;

&lt;h1&gt;
  
  
  Protected branches
&lt;/h1&gt;

&lt;p&gt;Protected branches prevent unauthorized applications, users, and roles from accessing personally identifiable information (PII) or other sensitive data within your branch. This feature is available for users who are on &lt;a href="https://neon.tech/pricing" rel="noopener noreferrer"&gt;the Scale plan&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first feature that’s available in this release is “IP Allow”, which restricts database access exclusively to trusted IP addresses. We plan on introducing more rules in the future.&lt;/p&gt;

&lt;p&gt;If there are other ways we can protect your database branches, let us know on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You have a limit of 5 protected branches in your project. To set your branch as protected, simply follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the branches page.&lt;/li&gt;
&lt;li&gt;Find the database branch you wish to protect.&lt;/li&gt;
&lt;li&gt;Click on “More” and then select “Set as protected.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RFZLRyqU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hka0TDGJnAVcr7AGqLVClhL_Wg4SL_tzK7HFRdqAingPNEyAK5ZvkjGeGGbGvJU8Roz53hlCOZw_3HMGPxXR5VS0mfsWj9ofLDOXQJJoKvyoImhI2CIuz7_9tO1_-TP4HpGZfyno5QdrV58DOcM-puQ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RFZLRyqU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hka0TDGJnAVcr7AGqLVClhL_Wg4SL_tzK7HFRdqAingPNEyAK5ZvkjGeGGbGvJU8Roz53hlCOZw_3HMGPxXR5VS0mfsWj9ofLDOXQJJoKvyoImhI2CIuz7_9tO1_-TP4HpGZfyno5QdrV58DOcM-puQ" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the documentation for &lt;a href="https://neon.tech/docs/manage/branches#protected-branch" rel="noopener noreferrer"&gt;more details on protected branches&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The addition of Schema Diff and Protected Branches to Neon allows developers to easily identify schema changes and safeguard sensitive data, equipping them to build with confidence.&lt;br&gt;&lt;br&gt;
&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;You can try Schema Diff and Protected Branches on Neon&lt;/a&gt; today. Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can help you build better and ship faster with Neon.&lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>IP Allow with IPv6</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 13 Mar 2024 10:03:44 +0000</pubDate>
      <link>https://forem.com/neon-postgres/ip-allow-with-ipv6-175d</link>
      <guid>https://forem.com/neon-postgres/ip-allow-with-ipv6-175d</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4Yh3A8iG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-5-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4Yh3A8iG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-5-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon. We’re building Postgres that helps you confidently ship reliable and scalable apps. You can&lt;/strong&gt; &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;try Neon now for free&lt;/a&gt;&lt;strong&gt;. We recently added support for IPv6 addresses in the IP Allow feature. This post explains what IPv6 is and its benefits.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  IPv4 limitations
&lt;/h2&gt;

&lt;p&gt;IPv4 has been around for almost half a century. It uses 32-bit addresses, which allows for about 4.3 billion unique addresses. While this number seemed more than sufficient in the early days of the internet, the explosive growth of the internet and the proliferation of smart devices have led to a situation where the world is running out of available IPv4 addresses. This limitation has prompted the need for a solution to accommodate the vast scale of the modern internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: IPv6
&lt;/h2&gt;

&lt;p&gt;IPv6 was developed to address the issue of IPv4 address exhaustion. It uses 128-bit addresses, which significantly expands the number of possible addresses to approximately 340 undecillion (3.4 × 10^38), a virtually inexhaustible supply for the foreseeable future. This vast expansion solves the primary problem of IPv4 address exhaustion, ensuring that every device on the internet can have a unique IP address.&lt;/p&gt;

&lt;p&gt;Here is an example of an IPv6 address: &lt;code&gt;2001:0db8:85a3:0000:0000:8a2e:0370:7334&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;IPv6 addresses consist of 128 bits, written as 8 groups of 4 hexadecimal digits separated by colons.&lt;/p&gt;
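
&lt;p&gt;Leading zeros within a group can be dropped, and one run of consecutive all-zero groups can be collapsed into &lt;code&gt;::&lt;/code&gt;, so the address above can also be written in a shorter, equivalent form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;2001:0db8:85a3:0000:0000:8a2e:0370:7334
2001:db8:85a3::8a2e:370:7334
&lt;/code&gt;&lt;/pre&gt;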

&lt;h2&gt;
  
  
  Why support IPv6 addresses in IP Allow
&lt;/h2&gt;

&lt;p&gt;IP Allow limits database access to only trusted IP addresses, preventing unauthorized access and helping maintain overall data security.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ijZ7Bcm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Yvj7G-G9tyb8n3-Va2yLYKAVId93wppHQXg0mi6pmL63ZmBtaddTtPOSmzG0XqAYruizKblkpYkDm4hQA6Bqr9g8cUye-CPpHnPv3ARCKDp6g4mATuIGlmEtY_ZBZxUN1uCKSdRVCIZuldvI-FNoA5U" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ijZ7Bcm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Yvj7G-G9tyb8n3-Va2yLYKAVId93wppHQXg0mi6pmL63ZmBtaddTtPOSmzG0XqAYruizKblkpYkDm4hQA6Bqr9g8cUye-CPpHnPv3ARCKDp6g4mATuIGlmEtY_ZBZxUN1uCKSdRVCIZuldvI-FNoA5U" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nowadays, most cloud providers allow you to design and deploy a global environment that leverages end-to-end IPv6 connectivity. AWS, for example, provides services with IPv6-only capabilities such as EC2 instances, Elastic Load Balancers, Amazon EKS, and others. &lt;/p&gt;



&lt;p&gt;Adding IPv6 support to Neon allows you to overcome IPv4 limitations and build highly scalable architectures while maintaining backward compatibility with your existing IPv4 workloads. This is particularly useful for large-scale and containerized applications, allowing you to focus on migrating and scaling applications without devoting effort towards overcoming IPv4 limits.&lt;/p&gt;

&lt;p&gt;Thanks for reading. We would love to get your feedback. Follow us on &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can help you build secure, reliable, and scalable applications.&lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>Better Postgres with Prisma Experience</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 07 Mar 2024 14:17:20 +0000</pubDate>
      <link>https://forem.com/neon-postgres/better-postgres-with-prisma-experience-1ki4</link>
      <guid>https://forem.com/neon-postgres/better-postgres-with-prisma-experience-1ki4</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4gGr4bvV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4gGr4bvV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon. We’re building Postgres that helps you confidently ship reliable and scalable apps. We made Postgres on Neon work seamlessly with Prisma. This article explains how we did it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We love &lt;a href="https://www.prisma.io/docs/orm/overview/databases/postgresql" rel="noopener noreferrer"&gt;Prisma&lt;/a&gt;, and so do developers. &lt;a href="https://www.prisma.io/docs/orm/overview/databases/postgresql" rel="noopener noreferrer"&gt;Prisma ORM&lt;/a&gt; makes it easy to perform schema migrations and map &lt;em&gt;any&lt;/em&gt; database objects with your existing JavaScript and TypeScript applications, allowing you to integrate type-safe queries into your codebase.&lt;/p&gt;

&lt;p&gt;Today, we’re pleased to share significant improvements to the developer experience of Neon using Prisma by adding support for schema migrations via pooled connections, making it possible to use Neon’s default connection string both to scale your serverless apps and to run schema migrations. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;You can start using Prisma with Neon for free.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For reference, when we first introduced Neon, Prisma users needed a direct database URL, a pooled database URL, and a shadow database URL. Here is how your &lt;code&gt;schema.prisma&lt;/code&gt; file looked:&lt;/p&gt;


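&lt;p&gt;A &lt;code&gt;datasource&lt;/code&gt; block of that shape (the environment variable names here are illustrative) looked roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider          = "postgresql"
  url               = env("DATABASE_URL")          // pooled connection, with ?pgbouncer=true
  directUrl         = env("DIRECT_DATABASE_URL")   // direct connection for migrations
  shadowDatabaseUrl = env("SHADOW_DATABASE_URL")   // shadow database for prisma migrate dev
}
&lt;/code&gt;&lt;/pre&gt;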

&lt;p&gt;Now, all you need is one database URL. As a bonus, we also removed the need to specify the query parameter &lt;code&gt;pgbouncer=true&lt;/code&gt; when using pooled connections:&lt;/p&gt;


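&lt;p&gt;The simplified block (environment variable name illustrative) looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // pooled connection string, no pgbouncer=true needed
}
&lt;/code&gt;&lt;/pre&gt;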

&lt;p&gt;This article discusses each step of the process and the changes made to Neon, PgBouncer, and Prisma to make this possible, including:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Schema migration support with pooled connections&lt;/li&gt;
&lt;li&gt;Dropping a shadow database WITH (FORCE)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Schema migration support with pooled connections
&lt;/h2&gt;

&lt;p&gt;In enhancing the experience with Prisma, we &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0" rel="noopener noreferrer"&gt;added support for prepared statements&lt;/a&gt; and &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;&lt;code&gt;DISCARD ALL/DEALLOCATE ALL&lt;/code&gt; to PgBouncer&lt;/a&gt; to allow for schema migration using pooled connections. Let’s explore why.&lt;/p&gt;

&lt;p&gt;In Postgres, each connection is a backend process that requires memory allocation, which limits the number of concurrent connections. The solution to this problem is connection pooling with PgBouncer, which helps keep the number of active backend processes low.&lt;/p&gt;

&lt;p&gt;PgBouncer becomes increasingly important at scale when using serverless services such as &lt;a href="https://aws.amazon.com/pm/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; or &lt;a href="https://vercel.com/docs/functions" rel="noopener noreferrer"&gt;Vercel functions&lt;/a&gt;, since each function call establishes a new connection. We call database connections that go through PgBouncer pooled connections.&lt;/p&gt;

&lt;p&gt;Additionally, &lt;a href="https://www.prisma.io/docs/orm/prisma-migrate/getting-started" rel="noopener noreferrer"&gt;&lt;code&gt;prisma migrate&lt;/code&gt;&lt;/a&gt; uses prepared statements to optimize SQL query performance, and &lt;a href="https://github.com/prisma/prisma-engines/blob/4308b705cc0694626ff407996f3145ddef0ad1c6/quaint/src/connector/postgres/native/mod.rs#L507" rel="noopener noreferrer"&gt;&lt;code&gt;DEALLOCATE ALL&lt;/code&gt;&lt;/a&gt; to release all prepared statements in the current session &lt;a href="https://www.prisma.io/docs/orm/prisma-client/setup-and-configuration/databases-connections/pgbouncer#add-pgbouncertrue-to-the-connection-url" rel="noopener noreferrer"&gt;before preparing and executing Prisma Client queries&lt;/a&gt;. More on prepared statements in the &lt;a href="https://neon.tech/blog/pgbouncer-the-one-with-prepared-statements#what-are-prepared-statements" rel="noopener noreferrer"&gt;PgBouncer 1.22.0 support announcement article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before version 1.22.0, if you attempted to run &lt;code&gt;prisma migrate&lt;/code&gt; commands using a pooled connection, you might have seen the following error:&lt;/p&gt;


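&lt;p&gt;The error typically looked something like this (the exact wording varies by Prisma version):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ERROR: prepared statement "s0" already exists
&lt;/code&gt;&lt;/pre&gt;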

&lt;p&gt;To scale using pooled connections and be able to perform schema migrations, you had to specify both pooled and direct database URLs and set &lt;code&gt;pgbouncer&lt;/code&gt; mode as a query parameter in your schema file. &lt;/p&gt;

&lt;p&gt;Here is how the &lt;code&gt;datasource db&lt;/code&gt; block in your &lt;code&gt;schema.prisma&lt;/code&gt; file looked:&lt;/p&gt;


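&lt;p&gt;Roughly like this (environment variable names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider  = "postgresql"
  url       = env("DATABASE_URL")        // pooled, ends with ?pgbouncer=true
  directUrl = env("DIRECT_DATABASE_URL") // direct connection used for migrations
}
&lt;/code&gt;&lt;/pre&gt;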

&lt;p&gt;Here is how it looks now after adding support for prepared statements and &lt;code&gt;DISCARD ALL/DEALLOCATE ALL&lt;/code&gt; to PgBouncer:&lt;/p&gt;


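&lt;p&gt;Roughly like this (environment variable name is illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // a single pooled connection string
}
&lt;/code&gt;&lt;/pre&gt;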

&lt;p&gt;Note that you only need the pooled connection to run Prisma with Postgres. It’s no longer required to specify a direct connection to the database or the &lt;code&gt;pgbouncer=true&lt;/code&gt; query parameter. The pooled connection is used both to scale your queries and to run schema migrations.&lt;/p&gt;

&lt;p&gt;This allows Neon to confidently set the default URL to the pooled connection string on the Console and the &lt;a href="https://neon.tech/docs/changelog/2024-02-23-console#neon-vercel-integration-improvements" rel="noopener noreferrer"&gt;Vercel Integration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  DROP shadow database WITH (FORCE)
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;prisma migrate dev&lt;/code&gt;, Prisma Migrate uses a shadow database to detect schema drifts and generate new migrations. During that process, Prisma creates, introspects, and then drops a shadow database. &lt;a href="https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database" rel="noopener noreferrer"&gt;More on shadow databases on Prisma’s documentation.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, certain cloud providers do not allow dropping and creating databases via SQL, which forces developers to manually create shadow databases and specify them in the &lt;code&gt;schema.prisma&lt;/code&gt; file:&lt;/p&gt;


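&lt;p&gt;For example (environment variable names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider          = "postgresql"
  url               = env("DATABASE_URL")
  shadowDatabaseUrl = env("SHADOW_DATABASE_URL") // manually created shadow database
}
&lt;/code&gt;&lt;/pre&gt;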

&lt;p&gt;We have added &lt;a href="https://neon.tech/blog/prisma-dx-improvements#removing-the-need-for-manually-creating-the-shadow-database" rel="noopener noreferrer"&gt;support for managing roles and databases via SQL&lt;/a&gt; on Neon, which removed the need to manually create a shadow database. Additionally, &lt;a href="https://github.com/prisma/prisma/releases/tag/5.10.0" rel="noopener noreferrer"&gt;Prisma 5.10.0&lt;/a&gt; &lt;a href="https://github.com/prisma/prisma-engines/pull/4722" rel="noopener noreferrer"&gt;introduces support for &lt;code&gt;DROP WITH (FORCE)&lt;/code&gt;&lt;/a&gt; as an alternative drop-database path in the schema engine, which allows it to dispose of shadow databases.&lt;/p&gt;

&lt;p&gt;So, in your &lt;code&gt;schema.prisma&lt;/code&gt; file, you would have:&lt;/p&gt;


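&lt;p&gt;Illustratively, just the single URL:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // shadow database is created and dropped automatically
}
&lt;/code&gt;&lt;/pre&gt;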

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The improvements included in PgBouncer 1.22.0 have significantly streamlined the experience for developers using Postgres on Neon with Prisma, making it more efficient to scale serverless applications and run schema migrations.&lt;/p&gt;

&lt;p&gt;We would love to get your feedback. Follow us on &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know how we can help you build the next generation of web applications.&lt;/p&gt;

&lt;p&gt;Shout out to all contributors for making this possible, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/JelteF" rel="noopener noreferrer"&gt;Jelte Fennema&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dashorst" rel="noopener noreferrer"&gt;Martijn Dashorst&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/knizhnik" rel="noopener noreferrer"&gt;Konstantin Knizhnik&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/sboschman" rel="noopener noreferrer"&gt;Sverre Boschman&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/fschmager" rel="noopener noreferrer"&gt;Frank Schmager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Medvecrab" rel="noopener noreferrer"&gt;Oleg Tselebrovskiy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/skyzh" rel="noopener noreferrer"&gt;Alex Chi Z&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
      <category>prisma</category>
    </item>
    <item>
      <title>Deploy Mistral Large to Azure and create a conversation with Python and LangChain</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Tue, 27 Feb 2024 01:59:48 +0000</pubDate>
      <link>https://forem.com/neon-postgres/deploy-mistral-large-to-azure-and-create-a-conversation-with-python-and-langchain-3dbo</link>
      <guid>https://forem.com/neon-postgres/deploy-mistral-large-to-azure-and-create-a-conversation-with-python-and-langchain-3dbo</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fneon-mistral-large-1024x576.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fneon-mistral-large-1024x576.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon, and we’re redefining the database experience with our cloud-native serverless Postgres solution. If you’ve been looking for a database for your RAG apps that adapts to your application loads, you’re in the right place.&lt;/strong&gt; &lt;a href="https://neon.tech" rel="noopener noreferrer"&gt;Learn more about Neon and give it a try&lt;/a&gt;, &lt;strong&gt;and let us know what you think. Neon is cloud-native Postgres and scales your AI apps to millions of users with pgvector. In this post, Raouf is going to tell you what you need to know about Mistral Large, the most advanced LLM by MistralAI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt;has recently unveiled its most advanced open-source large language model (LLM) yet, &lt;a href="https://mistral.ai/news/mistral-large/" rel="noopener noreferrer"&gt;Mistral Large&lt;/a&gt;, alongside its ChatGPT competitor, &lt;a href="https://chat.mistral.ai/chat" rel="noopener noreferrer"&gt;Le Chat (beta)&lt;/a&gt;. Le Chat includes other models such as Next, and Small, to let you explore Mistral AI’s capabilities. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8gIqi97fGwzEyoTi2KZT0VEu5V5f6K97FSHSZDxienTF5OjhFd6q1iYetEISe_PlEeOZbTrdgPFtY5Gfn39WGj7wGa1UUN3Mqcg4LQIyhuIwyLobLxM1Ny28P5Z3VWzJEEPSVFZP5WDkJYrKQF3Yo4A" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8gIqi97fGwzEyoTi2KZT0VEu5V5f6K97FSHSZDxienTF5OjhFd6q1iYetEISe_PlEeOZbTrdgPFtY5Gfn39WGj7wGa1UUN3Mqcg4LQIyhuIwyLobLxM1Ny28P5Z3VWzJEEPSVFZP5WDkJYrKQF3Yo4A"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those waiting to get their hands on Le Chat but stuck in the queue, this guide will show you how to deploy Mistral Large on Azure and start using it immediately with LangChain.&lt;/p&gt;

&lt;p&gt;Before we dive into the deployment process, let’s briefly explore Mistral Large.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistral Large
&lt;/h2&gt;

&lt;p&gt;Mistral Large is Mistral AI’s most advanced model, with unparalleled reasoning capabilities across multiple languages, including French, Spanish, German, and Italian. It has a generous 32k-token context window, making it interesting for Retrieval-Augmented Generation (RAG) applications. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FcxDxRUeLm6wWrMNszLuhRULvsEzEBhHt8wd2vHiLbtm-DIG918JKnDKrkVoTQX8JZ89sI-fVamTsgczVvu6xvA3SbYtPpAsRnq00Q5qwQPEMj2tVk3S4F67tHbRXMhFxBdb9Fu6at-fiOmKXiIzKdg4" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FcxDxRUeLm6wWrMNszLuhRULvsEzEBhHt8wd2vHiLbtm-DIG918JKnDKrkVoTQX8JZ89sI-fVamTsgczVvu6xvA3SbYtPpAsRnq00Q5qwQPEMj2tVk3S4F67tHbRXMhFxBdb9Fu6at-fiOmKXiIzKdg4"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Comparison measuring massive multitask language understanding&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And most importantly, Mistral Large is pretty good at coding and math. The model ranks the highest on the Mostly Basic Python Problems (MBPP) benchmark, which covers a wide range of difficulty levels and programming concepts and is designed to evaluate models on several fronts, including accuracy and efficiency.&lt;/p&gt;

&lt;p&gt;Mistral Large also ranks the highest on GSM8K, a benchmark of grade-school math word problems that measures the mathematical reasoning capabilities of AI models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8J_GrBLxGXn9Nc6F17IDeM67h7ExNr9aDoXTry0vW6sZZDu89Ik-wOYjDh2KkDL3r5EhEhvx5mrUZ7RIaYhIGBlNdP1kyukckWF8CtGqONU9EJH6Z_LWv40Kc-pDZK0-p9neoJVbGhV0J-rjQ2PU6Mw" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8J_GrBLxGXn9Nc6F17IDeM67h7ExNr9aDoXTry0vW6sZZDu89Ik-wOYjDh2KkDL3r5EhEhvx5mrUZ7RIaYhIGBlNdP1kyukckWF8CtGqONU9EJH6Z_LWv40Kc-pDZK0-p9neoJVbGhV0J-rjQ2PU6Mw"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But don’t believe the benchmarks. Next, we’ll deploy the Mistral Large model to Azure and try it for ourselves.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploy your own Mistral Large model to Azure
&lt;/h2&gt;

&lt;p&gt;As part of the launch, &lt;a href="https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/mistral-large-mistral-ai-s-flagship-llm-debuts-on-azure-ai/ba-p/4066996" rel="noopener noreferrer"&gt;Mistral AI announced its partnership with Microsoft&lt;/a&gt;, making the Mistral Large model available on Azure. Below are the steps to deploy the model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Access Azure AI Studio&lt;/strong&gt;: Sign in to your Azure account and navigate to &lt;a href="https://aka.ms/aistudio/landing/mistral-large" rel="noopener noreferrer"&gt;AI Studio&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy Mistral Large&lt;/strong&gt;: Look for the “Deploy” option and select Mistral Large for deployment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FPfnrRJlLi2QkBbCQduHnQolVAVvIpboPWVS2BbvDI0LNp3urRqKVzZOP6aYaNuD7P3AQMcXgVzqjvvZ8OiEHbbnsvqcozZrt9sRC7C_mhQUpdryrxd9vfsS7xY_jsWXLIOoBS-AynTld7yvsmmwUD1Q" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FPfnrRJlLi2QkBbCQduHnQolVAVvIpboPWVS2BbvDI0LNp3urRqKVzZOP6aYaNuD7P3AQMcXgVzqjvvZ8OiEHbbnsvqcozZrt9sRC7C_mhQUpdryrxd9vfsS7xY_jsWXLIOoBS-AynTld7yvsmmwUD1Q"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Create a Project&lt;/strong&gt;: If you haven’t already, set up a new project, opting for the Pay-As-You-Go plan and choosing France Central as your region.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F0wSELK9qf2A2wRW-X2OL5vOr3xyqESk7w9QV30SvxORwk5nKoY3eKXMlV4H4xjUdX-WfqKaDwuFXtzoHNuGAeoX6g3USIQgQKB8n0mLvOExqmSDXaVxwPCwlB23AYB9Sw0jSrxEpXApkbrRSWAAOCdM" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F0wSELK9qf2A2wRW-X2OL5vOr3xyqESk7w9QV30SvxORwk5nKoY3eKXMlV4H4xjUdX-WfqKaDwuFXtzoHNuGAeoX6g3USIQgQKB8n0mLvOExqmSDXaVxwPCwlB23AYB9Sw0jSrxEpXApkbrRSWAAOCdM"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Review and Create&lt;/strong&gt;: Double-check your resource information before finalizing your AI project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FFKZnA38G-1NsyZSrXPf9blXdJM2HXuiwp2jMkAKze9ikcn_XQGMd8hZKNiyV8gkR4iUWLLvZe4BUaruVHaOHidMvvrPV_UiVkWwLC3VzpGY8DiTuMZJvVqOYdYfeVNCWYq2Qy-lZ4W9PkjGstWAh8vo" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FFKZnA38G-1NsyZSrXPf9blXdJM2HXuiwp2jMkAKze9ikcn_XQGMd8hZKNiyV8gkR4iUWLLvZe4BUaruVHaOHidMvvrPV_UiVkWwLC3VzpGY8DiTuMZJvVqOYdYfeVNCWYq2Qy-lZ4W9PkjGstWAh8vo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Finalize Deployment&lt;/strong&gt;: After creating your AI project, proceed to deploy Mistral Large. Choose a deployment name; it will be displayed on your inference endpoint and serve as its identifier.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FguIBz98FTh8vG_v6taDCz7GcQLj792GnMV-F0waGEMM_u6v6cp-RKR0E3w7JE6nIxy6ticXYMg40bqnJUWjJjaLvV8jYMUQilPYRFy8RuSRK6EfKPMavf_pTBRYT1SXKGXBWtQRtXdyGJ6PPNroI0rs" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FguIBz98FTh8vG_v6taDCz7GcQLj792GnMV-F0waGEMM_u6v6cp-RKR0E3w7JE6nIxy6ticXYMg40bqnJUWjJjaLvV8jYMUQilPYRFy8RuSRK6EfKPMavf_pTBRYT1SXKGXBWtQRtXdyGJ6PPNroI0rs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations 🎉 You’ve successfully deployed Mistral Large on Azure!&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use Mistral Large with LangChain
&lt;/h2&gt;

&lt;p&gt;After deployment, you’ll receive an API endpoint and a security key for making inferences. We’ll use those further below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FpvRNJlPtU4hNkDD2wFZ21Jfzh_-M3URrLdMctrJphRsYZ73F_nYoj2iW7_lpZcAnl194JZx-Gjpu8mybCxJfHnnmle5__LtTszAYEBgKl5r1l-QL71-m72KYYIK0WO1e89Ou0EgjDwwDIygkjRh_LOY" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FpvRNJlPtU4hNkDD2wFZ21Jfzh_-M3URrLdMctrJphRsYZ73F_nYoj2iW7_lpZcAnl194JZx-Gjpu8mybCxJfHnnmle5__LtTszAYEBgKl5r1l-QL71-m72KYYIK0WO1e89Ou0EgjDwwDIygkjRh_LOY"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use Mistral Large with LangChain, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create the project directory:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;mistral-large-example
&lt;span class="nb"&gt;cd &lt;/span&gt;mistral-large-example&lt;span class="s2"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Create and activate a Python environment:&lt;/strong&gt; Run the following commands to create and activate a virtual environment.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv myenv
&lt;span class="nb"&gt;source &lt;/span&gt;myenv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Install packages and project dependencies:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain_mistralai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Create a LangChain conversation:&lt;/strong&gt; First, create a file:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here’s an example of how to create a LangChain conversation chain with Mistral Large:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MessagesPlaceholder&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.schema&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_mistralai.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatMistralAI&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration for prompting
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a chatbot engaging in a conversation with a human, often incorporating French cultural references.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;MessagesPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variable_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;HumanMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{human_input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Memory configuration
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationBufferMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configuring the Mistral model endpoint and API key
&lt;/span&gt;&lt;span class="n"&gt;chat_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatMistralAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&amp;amp;lt;endpoint&amp;amp;gt;.francecentral.inference.ai.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mistral_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;lt;api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Setting up the conversation chain
&lt;/span&gt;&lt;span class="n"&gt;chat_llm_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chat_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_llm_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;human_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there, my friend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy/paste the code above into the &lt;code&gt;main.py&lt;/code&gt; file and run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the output should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 main.py 

&amp;gt; Entering new LLMChain chain...

Prompt after formatting:

System: You are a chatbot engaging &lt;span class="k"&gt;in &lt;/span&gt;a conversation with a human, often incorporating French cultural references.

Human: Hi there, my friend

&amp;gt; Finished chain.

 Hello! It&lt;span class="s1"&gt;'s a pleasure to chat with you. As you'&lt;/span&gt;ve noticed, I enjoy incorporating French cultural references into our conversations. Did you know that the Eiffel Tower, one of France&lt;span class="s1"&gt;'s most iconic landmarks, was initially criticized by some of France'&lt;/span&gt;s leading artists and intellectuals &lt;span class="k"&gt;for &lt;/span&gt;its design when it was first built? How can I assist you today?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There has never been a better time to develop AI-powered applications. With rapid deployments to robust and scalable infrastructures such as Azure’s, developers can create applications that are more intelligent, interactive, and impactful.&lt;/p&gt;

&lt;p&gt;If you are building a RAG application, or simply need a Postgres database that scales, Neon with its autoscaling capabilities offers elastic vector search and fast index build with pgvector, making your AI apps fast and scalable to millions of users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Start building with Neon for free today&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know what you’re working on and how we can help you build better apps. &lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n"&gt;Mixtral 8x7B: What you need to know about Mistral AI’s latest model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6"&gt;Mistral 7B and BAAI on Workers AI vs. OpenAI Models for RAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da"&gt;pgvector: 30x Faster Index Build for your Vector Embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/raoufchebri/building-an-ai-powered-chatbot-using-vercel-openai-and-postgres-5cob-temp-slug-5539820"&gt;Building an AI-powered ChatBot using Vercel, OpenAI, and Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mistral</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Autoscaling in Action: Postgres Load Testing with pgbench</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Fri, 23 Feb 2024 09:25:39 +0000</pubDate>
      <link>https://forem.com/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</link>
      <guid>https://forem.com/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" alt="Blog post cover"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll show Neon autoscaling in action by running a load test using one of Postgres’ most popular benchmarking tools, &lt;code&gt;pgbench&lt;/code&gt;. The test simulates 30 clients running a heavy query. &lt;/p&gt;

&lt;p&gt;While 30 doesn’t sound like a lot, the query involves a mathematical function with high computational overhead, which signals to the autoscaler-agent that it needs to allocate more resources to the VM.&lt;/p&gt;

&lt;p&gt;We will not cover how autoscaling works, but for those interested in knowing the details, you can read more about &lt;a href="https://neon.tech/blog/scaling-serverless-postgres" rel="noopener noreferrer"&gt;how we implemented autoscaling in Neon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this load test, you will need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;A Neon account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://wiki.postgresql.org/wiki/Homebrew" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The load test
&lt;/h2&gt;

&lt;p&gt;Ensuring your production database can perform under varying loads is crucial. That’s why we added autoscaling to Neon, a feature that dynamically adjusts the resources allocated to a database in real time, based on its current workload. &lt;/p&gt;

&lt;p&gt;However, the effectiveness and efficiency of autoscaling are often taken for granted without thorough testing. To showcase autoscaling in action, we turn to Postgres and &lt;code&gt;pgbench&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a benchmarking tool included with Postgres, designed to evaluate the performance of a Postgres server. The tool simulates client load on the server and runs tests to measure how the server handles concurrent data requests. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is executed from the command line, and its usage can vary widely depending on the specific tests or benchmarks being run. Here is the command we will use in our test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 30 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;pgbench&lt;/code&gt; executes the query in &lt;code&gt;test.sql&lt;/code&gt;. The parameter &lt;code&gt;-c 30&lt;/code&gt; specifies 30 client connections, and &lt;code&gt;-T 120&lt;/code&gt; runs the test for 120 seconds against your database. &lt;code&gt;-P 1&lt;/code&gt; specifies that pgbench should report the progress of the test every 1 second. The progress report typically includes the number of transactions completed so far and the number of transactions per second.&lt;/p&gt;
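&lt;p&gt;Since &lt;code&gt;-P 1&lt;/code&gt; prints one progress line per second, you can post-process those lines, for example to plot latency over time. A small sketch, assuming the typical line shape (newer &lt;code&gt;pgbench&lt;/code&gt; versions may append extra fields such as failure counts):&lt;/p&gt;

```python
import re

# A progress line as pgbench typically prints it with -P 1 (shape assumed;
# newer versions may append extra fields such as failure counts)
line = "progress: 1.0 s, 5.0 tps, lat 6000.891 ms stddev 2768.066"

match = re.search(r"([0-9.]+) tps, lat ([0-9.]+) ms stddev ([0-9.]+)", line)
tps, lat_ms, stddev_ms = (float(g) for g in match.groups())
print(tps, lat_ms, stddev_ms)
```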

&lt;p&gt;Thirty clients may not seem like enough to stress a database, but it depends on the query you’re executing, as we’ll see next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query execution plan
&lt;/h2&gt;

&lt;p&gt;Here is the query we’ll use for our load test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mathematically, this query essentially compares the growth rates of the factorials of 32,000 and 20,000 by examining the ratio of their logarithms. &lt;/p&gt;

&lt;p&gt;Remember factorials? The factorial of a number n (denoted as n!) is the product of all positive integers less than or equal to n. For example, the factorial of 5 (5!) is 5 * 4 * 3 * 2 * 1 = 120. Factorials grow very rapidly with increasing numbers. &lt;/p&gt;

&lt;p&gt;To give you a sense of scale, the factorial of just 20 is already a 19-digit number: 20! = 2,432,902,008,176,640,000.&lt;/p&gt;

&lt;p&gt;The logarithm, on the other hand, is the power to which a base must be raised to obtain the value x. (In Postgres, &lt;code&gt;log&lt;/code&gt; is the base-10 logarithm and &lt;code&gt;ln&lt;/code&gt; is the natural one; since the query divides one logarithm by another, the base cancels out.)&lt;/p&gt;
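&lt;p&gt;Because a ratio of two logarithms is the same in any base, we can sanity-check the value the query should return with Python’s &lt;code&gt;math.lgamma&lt;/code&gt;, where &lt;code&gt;lgamma(n + 1)&lt;/code&gt; equals the natural log of n factorial:&lt;/p&gt;

```python
import math

def log_factorial(n):
    # lgamma(n + 1) is ln(n!), computed without the enormous intermediate value
    return math.lgamma(n + 1)

ratio = log_factorial(32000) / log_factorial(20000)
print(round(ratio, 3))
```

&lt;p&gt;The ratio comes out to roughly 1.68; the interesting part of the Postgres query is not the answer but the cost of computing the factorials exactly.&lt;/p&gt;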

&lt;p&gt;In other words, this operation should take a long time to process. How long? Let’s examine the query execution plan using EXPLAIN ANALYZE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;QUERY PLAN                                      

&lt;span class="nt"&gt;-------------------------------------------------------------------------------------&lt;/span&gt;

 Result  &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.00..0.01 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;actual &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.000..0.001 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;

 Planning Time: 1921.630 ms

 Execution Time: 0.005 ms

&lt;span class="o"&gt;(&lt;/span&gt;3 rows&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query was executed on ¼ vCPU. &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; includes the planner’s estimates and real execution metrics. Execution Time appears to be quite fast. However, Planning Time (the time taken by the Postgres query planner to generate the execution plan) takes almost 2 seconds and suggests that preparing to run this mathematical function involves significant computational overhead.&lt;/p&gt;

&lt;p&gt;Combine 30 of those, and we should stress Postgres enough to trigger autoscaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling autoscaling
&lt;/h2&gt;

&lt;p&gt;Autoscaling is the process of automatically increasing or decreasing the CPU and memory allocated to a database based on its current load. It dynamically adjusts the compute resources allocated to a Neon compute instance in response to the current load, eliminating the need for manual intervention. &lt;a href="https://neon.tech/docs/introduction/autoscaling" rel="noopener noreferrer"&gt;Learn more about autoscaling in the docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can enable autoscaling by defining the minimum and maximum compute units (CU) you’d like to allocate to your Postgres instance. This way, you remain in control of your resource consumption. For example, 1 CU allocates 1 vCPU and 4 GB of RAM to your instance.&lt;/p&gt;
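&lt;p&gt;The CU-to-resource arithmetic is simple enough to sketch, using the 1 CU = 1 vCPU / 4 GB ratio mentioned above:&lt;/p&gt;

```python
def cu_to_resources(cu):
    # 1 CU corresponds to 1 vCPU and 4 GB of RAM
    return {"vcpu": cu, "ram_gb": 4 * cu}

# The range used in this load test: 0.25 CU up to 7 CU
print(cu_to_resources(0.25))
print(cu_to_resources(7))
```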

&lt;p&gt;You can set your instance size when you create a new project or by navigating to the Branches page on your Neon Console, clicking on the database branch, and setting the CU range.&lt;/p&gt;



&lt;p&gt;I will set the range for this load test from ¼ to 7 CUs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Executing &amp;amp; monitoring the load test
&lt;/h2&gt;

&lt;p&gt;Let’s run our load test now and observe its effect on our Postgres instance. We recently added graphs to monitor the resources allocated to your Postgres instance and its usage, which will come in handy later. After enabling autoscaling, follow these steps to execute the load test:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create your project folder and test.sql file:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;cd &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"SELECT log(factorial(32000)) / log(factorial(20000));"&lt;/span&gt; &amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt; test.sql&lt;span class="s1"&gt;'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Execute the load test by running the following command:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 8 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;YOUR_CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can create a &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Neon project&lt;/a&gt; if you don’t have a connection string yet.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Navigate to the autoscaling graph to monitor usage:&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;You should observe a rapid change in CPU and memory allocated. The result should look similar to the graph below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" alt="Autoscaling graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The performance summary returned by &lt;code&gt;pgbench&lt;/code&gt; should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;latency average &lt;span class="o"&gt;=&lt;/span&gt; 6000.891 ms
latency stddev &lt;span class="o"&gt;=&lt;/span&gt; 2768.066 ms
initial connection &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 3712.770 ms
tps &lt;span class="o"&gt;=&lt;/span&gt; 4.978907 &lt;span class="o"&gt;(&lt;/span&gt;without initial connection &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On average, each operation took slightly over 6 seconds to complete. A standard deviation of 2768.066 ms means that the latencies of individual operations varied quite a bit around the average latency. A higher standard deviation indicates more variability in how long each operation took to complete.&lt;/p&gt;

&lt;p&gt;Establishing the initial connection took approximately 3.7 seconds before any operations could be performed. A TPS of around 4.98 means that, on average, the database was able to complete nearly five transactions every second during the test, after excluding the initial connection time.&lt;/p&gt;
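&lt;p&gt;As a refresher on what those numbers mean, here is how mean latency and standard deviation are computed from individual samples. The latencies below are illustrative, not the actual pgbench data:&lt;/p&gt;

```python
import statistics

# Illustrative per-transaction latencies in ms, not the actual pgbench samples
latencies_ms = [3100, 4200, 5900, 6100, 8400, 9900]

mean = statistics.mean(latencies_ms)
stddev = statistics.stdev(latencies_ms)
print(round(mean, 1), round(stddev, 1))
```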

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a simple yet powerful tool to test your database and simulate multiple clients running heavy SQL queries. We also saw how to examine the query execution plan with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, which provides insights to optimize your SQL queries.&lt;/p&gt;

&lt;p&gt;If you’re running an application that can be subject to varying workloads, autoscaling offers you the confidence that your database will always perform under the stress of real-world demands.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you are curious about autoscaling, &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;give Neon a try&lt;/a&gt; and join our &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We look forward to seeing you there and hearing your feedback.&lt;/p&gt;

&lt;p&gt;Happy scaling!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>test</category>
      <category>scale</category>
    </item>
    <item>
      <title>Point In Time Recovery Under the Hood in Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 22 Feb 2024 12:44:01 +0000</pubDate>
      <link>https://forem.com/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</link>
      <guid>https://forem.com/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine working on a crucial project when suddenly, due to an unexpected event, you lose significant chunks of your database. Whether it’s a human error, a malicious attack, or a software bug, data loss is a nightmare scenario. But fear not! We recently added support for &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;Point-In-Time Restore (PITR)&lt;/a&gt; to Neon, so you can turn back the clock to a happier moment before things went south.&lt;/p&gt;

&lt;p&gt;In the video below and in the &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;PITR announcement article&lt;/a&gt;, my friend Evan shows how you can recover your data in a few clicks. He also uses Time Travel Assist to observe the state of the database at a given timestamp to confidently and safely run the restore process.&lt;/p&gt;



&lt;p&gt;How is this possible? This article is for those interested in understanding how PITR works under the hood in Neon. To better explain this, we will: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cover the basics of PITR in Postgres &lt;/li&gt;
&lt;li&gt;Explore the underlying infrastructure that allows for PITR in Neon. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end of this post, you’ll be prepared for when disaster strikes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the basics of Point-In-Time Recovery in Postgres
&lt;/h2&gt;

&lt;p&gt;PITR in Postgres is made possible using two key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write-Ahead Logging&lt;/strong&gt;: Postgres uses &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead Logging&lt;/a&gt; (WAL) to record all changes made to the database. Think of WAL as the database’s diary, keeping track of every detail of its day-to-day activities. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base backups&lt;/strong&gt;: Base backups are snapshots of your database at a particular moment in time. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these two elements combined, you can define a strategy to restore your database to any point after the base backup was taken, effectively traveling through your database’s timeline. However, you’d first need to do some groundwork, which consists of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setting up WAL archiving:&lt;/strong&gt; By defining an &lt;code&gt;archive_command&lt;/code&gt; and setting &lt;code&gt;archive_mode&lt;/code&gt; to &lt;code&gt;on&lt;/code&gt;  in your &lt;code&gt;postgresql.conf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating base backups:&lt;/strong&gt; You can use the &lt;code&gt;pg_basebackup&lt;/code&gt; utility to create daily backups.&lt;/li&gt;
&lt;/ol&gt;
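For reference, the groundwork above amounts to a couple of configuration settings plus a scheduled backup command. A minimal sketch (the archive destination and backup directory are placeholders, not recommendations; the `archive_command` shown is the example from the Postgres documentation):

```ini
# postgresql.conf -- minimal WAL-archiving setup (illustrative paths)
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'

# Then schedule base backups, e.g. a daily cron job running:
#   pg_basebackup -D /mnt/backup/base -Ft -z
```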

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If, for any reason, you need to restore your database, you restore the latest backup and replay the WAL on top of it. The same logic applies to restoring from any point in time within the retention period. &lt;/p&gt;

&lt;p&gt;Let’s say we want to restore the database to its state on February 1st at 14:30. We first locate the last backup file created before that target time, restore it, and then replay the WAL up to that time. &lt;/p&gt;
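To make the procedure concrete, here is a small, self-contained sketch (plain Python with made-up data, not real Postgres internals) of the "restore the last backup before the target, then replay WAL up to the target" logic:

```python
from datetime import datetime

def restore_to(target, backups, wal):
    """Pick the newest base backup taken before `target`, then replay
    WAL records up to `target` on top of it.

    backups: list of (timestamp, snapshot-dict), oldest first
    wal:     list of (timestamp, key, value) change records, oldest first
    """
    # 1. Locate the last base backup created before the target time.
    base_time, snapshot = max(
        (b for b in backups if b[0] <= target), key=lambda b: b[0]
    )
    state = dict(snapshot)
    # 2. Replay WAL records written between the backup and the target time.
    for ts, key, value in wal:
        if base_time < ts <= target:
            state[key] = value
    return state

backups = [
    (datetime(2024, 2, 1, 0, 0), {"balance": 100}),
]
wal = [
    (datetime(2024, 2, 1, 10, 0), "balance", 150),
    (datetime(2024, 2, 1, 15, 0), "balance", 0),  # the mistake we want to undo
]

# Restore to February 1st at 14:30 -- just before the bad write.
print(restore_to(datetime(2024, 2, 1, 14, 30), backups, wal))  # {'balance': 150}
```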

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great! We now know how to perform a PITR in Postgres. However, there are a few limitations to this approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You might notice a drop in performance while backups are running.&lt;/li&gt;
&lt;li&gt;Because you have a finite storage capacity, you must define a limit to your archived WAL. This limit is known as the retention period (a.k.a history retention), which determines how far back in time your data can be restored.&lt;/li&gt;
&lt;li&gt;You have a single point of failure (SPOF) since all base backups and WAL archives are in the same location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can enhance our architecture by adopting disaster recovery tools like &lt;a href="https://pgbarman.org/" rel="noopener noreferrer"&gt;Barman&lt;/a&gt; to avoid SPOF and downtime. With Barman, Postgres streams base backups and WAL archives to an external backup server. Or, if you know what you’re doing, you can configure Postgres to stream base backups and WAL archives to an AWS S3 bucket, and add a standby, which serves as an exact copy of your database, to avoid downtime. Your setup would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To sum it up and to perform a PITR in Postgres without downtime, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have a backup server&lt;/li&gt;
&lt;li&gt;Set up WAL archiving and stream it to the backup&lt;/li&gt;
&lt;li&gt;Schedule daily backups&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additionally, you need to install several packages and then configure and maintain this infrastructure yourself, time that could be spent on your application instead. That convenience, simplicity, and confidence in your data is what Neon offers.&lt;/p&gt;

&lt;p&gt;So, how do we make it look so easy? Let’s step back and explain how Neon’s storage engine works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Neon’s architecture
&lt;/h2&gt;

&lt;p&gt;Neon’s philosophy is that the “database is its logs”. In our case: “Postgres is its WAL records”.&lt;/p&gt;

&lt;p&gt;Neon configures Postgres to stream the WAL to a custom Rust-based storage engine. Neon’s storage engine is composed of three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A persistence layer called “&lt;a href="https://github.com/neondatabase/neon/blob/main/docs/rfcs/014-safekeepers-gossip.md" rel="noopener noreferrer"&gt;Safekeepers&lt;/a&gt;” makes sure the written data is never lost, &lt;a href="https://neon.tech/blog/paxos" rel="noopener noreferrer"&gt;using Paxos as a consensus algorithm&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A storage layer called “Pageservers”: multi-tenant storage that can reconstruct the data from WAL and send it to Postgres.&lt;/li&gt;
&lt;li&gt;A second persistence layer to durably store the WAL in AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since all the data is stored in Neon’s storage engine, Postgres doesn’t need to persist data on the local disk. This turns Postgres into a stateless compute instance that can start in under 500ms, making Neon serverless. &lt;/p&gt;

&lt;p&gt;As a result, we no longer require: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A standby: because, in the case of a Postgres crash, we can quickly spin up another instance.&lt;/li&gt;
&lt;li&gt;Backups: Neon’s storage engine stores the WAL and performs &lt;a href="https://en.wikipedia.org/wiki/Compaction" rel="noopener noreferrer"&gt;compactions&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The data flow would look like the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://neon.tech/blog/architecture-decisions-in-neon" rel="noopener noreferrer"&gt;&lt;em&gt;Architecture decisions in Neon&lt;/em&gt; article by Heikki Linnakangas&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;To understand the magic behind PITR in Neon, we’ll explore how the Pageservers work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pageservers: under the hood
&lt;/h2&gt;

&lt;p&gt;Each transaction in the WAL is associated with a Log Sequence Number (LSN), marking the byte position in the WAL stream where the record of that transaction starts. If we follow our initial analogy of WAL being a detailed diary of everything in the database, then the LSN is the page number in that diary.&lt;/p&gt;
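An LSN is conventionally printed as two hexadecimal numbers separated by a slash (for example `0/16B3748`), which together encode a 64-bit byte position in the WAL stream. A quick sketch of that mapping:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN string like '0/16B3748' to its 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def bytes_to_lsn(pos: int) -> str:
    """Inverse: format a byte position back into LSN notation."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"

print(lsn_to_bytes("0/16B3748"))  # 23803720
print(bytes_to_lsn(23803720))     # 0/16B3748
```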

&lt;p&gt;The Pageserver can be represented by a 2-dimensional graph, where the Y-axis is the &lt;code&gt;LSN&lt;/code&gt;, and the X-axis is the &lt;code&gt;key&lt;/code&gt; that points to the database, relation, and block number. A key, for example, can point to certain rows in your database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When data is written in Neon, the role of Pageservers is to accumulate WAL records. Then, when these records reach approximately 1GB in size, Pageservers create two types of immutable layer files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image layers (bars)&lt;/strong&gt;: contain a &lt;strong&gt;&lt;em&gt;snapshot&lt;/em&gt;&lt;/strong&gt; of a key range for a specific LSN. You can see Image Layers as the state of rows in certain tables or indexes at a given time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta layers (rectangles)&lt;/strong&gt;: contain the &lt;strong&gt;&lt;em&gt;incremental changes&lt;/em&gt;&lt;/strong&gt; within a key range. You can see Delta layers as a log of all the changes that happened to your rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Does this sound familiar?&lt;/p&gt;

&lt;p&gt;Indeed, it employs the same principle as the traditional Postgres setups for PITR we’ve previously discussed, which combine base backups and WAL archiving. The main difference here is that you don’t need to initiate a lengthy and complex restore procedure every time you wish to read data from a previous state of the database. This is because Pageservers inherently know how to reconstruct the state of a page at any given LSN on any timeline.&lt;/p&gt;
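As a toy model (plain Python, assuming nothing about Neon's actual layer file format), reconstructing a page at a given LSN means starting from the latest image layer at or below that LSN and applying the delta layers recorded since:

```python
def get_page_at_lsn(key, lsn, image_layers, delta_layers):
    """Toy GetPage@LSN: reconstruct the value of `key` as of `lsn`.

    image_layers: {lsn: {key: value}} snapshots of key ranges
    delta_layers: list of (lsn, key, value) incremental changes
    """
    # Start from the most recent snapshot at or below the requested LSN.
    base_lsn = max(l for l in image_layers if l <= lsn)
    value = image_layers[base_lsn].get(key)
    # Apply deltas recorded between that snapshot and the requested LSN.
    for delta_lsn, delta_key, delta_value in sorted(delta_layers):
        if base_lsn < delta_lsn <= lsn and delta_key == key:
            value = delta_value
    return value

images = {100: {"page_1": "v1"}}
deltas = [(150, "page_1", "v2"), (200, "page_1", "v3")]

print(get_page_at_lsn("page_1", 180, images, deltas))  # v2 -- the state before LSN 200
```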

&lt;h2&gt;
  
  
  Ephemeral branches
&lt;/h2&gt;

&lt;p&gt;We mentioned previously that, in Postgres, each WAL record is associated with an LSN. In Neon, Postgres tracks the last evicted LSN in the buffer cache, so it knows from which point in time it should fetch the data. &lt;/p&gt;

&lt;p&gt;When Postgres requests a page from the Pageserver, it triggers the &lt;a href="https://github.com/neondatabase/neon/blob/main/pageserver/pagebench/src/cmd/getpage_latest_lsn.rs" rel="noopener noreferrer"&gt;GetPage@LSN&lt;/a&gt; function, which returns the state of a given key at that specific LSN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Read the &lt;a href="https://neon.tech/blog/get-page-at-lsn" rel="noopener noreferrer"&gt;Deep dive in Neon’s storage engine&lt;/a&gt; article to learn more about Neon’s architecture.&lt;/p&gt;

&lt;p&gt;In practice, you can access different timelines through database branches. These branches are copy-on-write clones of your database, representing the state of your data at any point in its history. When you create a branch, you specify the LSN (or a timestamp), and Neon’s control plane generates a timeline associated with your project, keeping track of it.&lt;/p&gt;
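For instance, a branch-creation call against the Neon API can be constructed roughly as follows. The endpoint path and field names here are assumptions based on Neon's public API docs (verify them before use), the project ID and LSN are placeholders, and the sketch only builds the request body rather than sending it:

```python
import json

# Sketch: construct (but do not send) a Neon API request that creates a
# branch at a given LSN. Endpoint shape and field names are assumptions.
project_id = "my-project-123"  # placeholder
url = f"https://console.neon.tech/api/v2/projects/{project_id}/branches"
payload = {
    "branch": {
        "name": "restore-before-incident",
        "parent_lsn": "0/16B3748",  # or "parent_timestamp": an ISO-8601 time
    }
}
body = json.dumps(payload)
print(url)
print(body)
```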

&lt;p&gt;We’ve enhanced the Point In Time Recovery (PITR) feature in Neon with Time Travel Assist. This functionality allows you to perform Time Travel queries to review the state of your database at a specific timestamp or LSN, following the same underlying steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a timeline, and&lt;/li&gt;
&lt;li&gt;Running &lt;code&gt;GetPage@LSN&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these branches are ephemeral, having a Time To Live (TTL) of 10 seconds. We refer to these as ephemeral branches, and they will soon become a crucial part of your development workflows.&lt;/p&gt;

&lt;p&gt;Ephemeral branches enable you to connect to a previous state of your database by merely specifying the LSN or timestamp in your connection string. This capability is natively supported by Pageservers, and Neon’s PITR feature is the first step towards making ephemeral connections available to developers. Stay tuned for more development in this area.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While Postgres offers powerful PITR options, with tools like Barman to help with disaster recovery, Neon’s approach makes PITR reliable, accessible, efficient, and integrated into a seamless database management experience. &lt;/p&gt;

&lt;p&gt;By first exploring how to do PITR in Postgres, we’ve learned about the importance of continuous archiving and creating base backups. &lt;/p&gt;

&lt;p&gt;Neon’s storage engine saves WAL records and snapshots of your database and can natively reconstruct data for any point in time in your history. This capability allows for the Time Travel Assist to query your database at a given timestamp before you proceed to its restoration using short-lived or ephemeral branches.&lt;/p&gt;

&lt;p&gt;Ephemeral branches introduce a unique way to interact with your data’s history: developers can access different timelines and run Time Travel queries to review prior states and understand their data’s lifecycle.&lt;/p&gt;

&lt;p&gt;What about you? How often do you use PITR in your projects? Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know how we can enhance your Postgres experience in the cloud.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a href="https://twitter.com/skeptrune" rel="noopener noreferrer"&gt;skeptrune&lt;/a&gt; for reviewing and suggesting adding a mention to Barman.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>recovery</category>
      <category>disaster</category>
    </item>
    <item>
      <title>PgBouncer: The one with prepared statements</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 15 Feb 2024 09:43:20 +0000</pubDate>
      <link>https://forem.com/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</link>
      <guid>https://forem.com/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The latest release of &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer 1.22.0&lt;/a&gt; increases query throughput by 15% to 250% and includes support for &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; and &lt;code&gt;DISCARD ALL&lt;/code&gt;, as well as protocol-level prepared statements released in &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0" rel="noopener noreferrer"&gt;1.21.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore what prepared statements are and how to use PgBouncer to optimize your queries in Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Prepared Statements?
&lt;/h2&gt;

&lt;p&gt;In Postgres, a prepared statement is a feature that allows you to create and optimize an SQL query once and then execute it multiple times with different parameters. It’s a template where you define the structure of your query and later fill in the specific values you want to use.&lt;/p&gt;

&lt;p&gt;Here’s an example of creating a prepared statement with &lt;code&gt;PREPARE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PREPARE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;user_fetch_plan&lt;/code&gt; is the name of the prepared statement, and &lt;code&gt;$1&lt;/code&gt; is a placeholder for the parameter. &lt;/p&gt;

&lt;p&gt;Here is how to execute the prepared statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'alice'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will fetch all columns from the &lt;code&gt;users&lt;/code&gt; table where the &lt;code&gt;username&lt;/code&gt; is &lt;code&gt;alice&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Prepared Statements?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt; : Since the SQL statement is parsed and the execution plan is created only once, subsequent executions can be faster. However, this benefit might be more noticeable in databases with heavy and repeated traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt; : Prepared statements are a great way to avoid SQL injection attacks. Since data values are sent separately from the query, they aren’t executed as SQL, making injecting malicious SQL code difficult.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What is PgBouncer?
&lt;/h2&gt;

&lt;p&gt;Before diving into what PgBouncer is, let’s take a step back and briefly touch on how Postgres operates. &lt;/p&gt;

&lt;p&gt;Postgres runs on a system of several interlinked processes, with the &lt;code&gt;postmaster&lt;/code&gt; taking the lead. This initial process kicks things off, supervises other processes, and listens for new connections. The &lt;code&gt;postmaster&lt;/code&gt; also allocates a shared memory for these processes to interact.&lt;/p&gt;

&lt;p&gt;Whenever a client wants to establish a new connection, the &lt;code&gt;postmaster&lt;/code&gt; creates a new backend process for that client. This new connection starts a session with the backend, which stays active until the client decides to leave or the connection drops.&lt;/p&gt;

&lt;p&gt;Here’s where it gets tricky: Many applications, such as serverless backends, open numerous connections, and most eventually become inactive. Postgres needs to create a unique backend process for each client connection. When many clients try to connect, more memory is needed. In Neon, for example, the default maximum number of &lt;a href="https://neon.tech/docs/connect/connection-pooling#default-connection-limits" rel="noopener noreferrer"&gt;concurrent direct connections is set to 100&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The solution to this problem is connection pooling with PgBouncer, which helps keep the number of active backend processes low.&lt;/p&gt;
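The idea can be sketched in a few lines (a toy pool, not PgBouncer itself): many short-lived clients share a small, fixed set of server connections instead of each opening their own backend process:

```python
class ToyPool:
    """Minimal connection pool: at most `max_size` server connections are
    ever opened, no matter how many clients check one out and back in."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.idle = []    # connections waiting to be reused
        self.opened = 0   # total server connections ever created

    def acquire(self):
        if self.idle:
            return self.idle.pop()       # reuse an existing connection
        if self.opened >= self.max_size:
            raise RuntimeError("pool exhausted -- client must wait")
        self.opened += 1
        return f"conn-{self.opened}"     # stand-in for a real backend process

    def release(self, conn):
        self.idle.append(conn)

pool = ToyPool(max_size=2)
a, b = pool.acquire(), pool.acquire()    # two clients active at once
pool.release(a)
pool.release(b)
for _ in range(100):                     # 100 more sequential clients
    conn = pool.acquire()
    pool.release(conn)
print(pool.opened)  # 2 -- connections are reused, not re-created
```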

&lt;p&gt;PgBouncer is a lightweight connection pooler whose primary function is to manage and maintain a pool of database connections, overcoming Postgres’ connection limitations. Neon projects come with both direct and pooled connections by default. The latter uses PgBouncer and currently offers up to 10,000 connections.&lt;/p&gt;

&lt;p&gt;Depending on your database provider, you'll have different ways to access PgBouncer. On Neon, you can check the “Pooled connection” box in the connection details widget and make sure the connection string contains the &lt;code&gt;-pooler&lt;/code&gt; suffix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres://johndoe:mypassword@ep-billowing-wood-25959289-pooler.us-east-1.aws.neon.tech/neondb"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Prepared Statements with PgBouncer in client libraries:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; supports prepared statements at the protocol level, and therefore, the above SQL-level prepared statement using &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt; will not work with PgBouncer. See &lt;a href="https://www.pgbouncer.org/config.html#max_prepared_statements" rel="noopener noreferrer"&gt;PgBouncer’s documentation&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;However, you can use prepared statements with pooled connections in a client library. Most PostgreSQL client libraries offer support for prepared statements, often abstracting away the explicit use of &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt;. Here’s how you might use it in a few popular languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// using pg  &lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="c1"&gt;// give the query a unique name&lt;/span&gt;
   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fetch-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE username = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In these client libraries, the actual SQL command is parsed and prepared on the server, and then the data values are sent separately, ensuring both efficiency and security.&lt;/p&gt;

&lt;p&gt;Under the hood, PgBouncer examines all the queries sent as a prepared statement by clients and assigns each unique query string an internal name (e.g. PGBOUNCER_123). PgBouncer rewrites each command that uses a prepared statement to use the matching internal name before forwarding the corresponding command to Postgres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;                +-------------+
                | Client |
                +------+------+
                       |
                       | Sends Prepared Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PgBouncer |
                | |
                | 1. Examines and tracks the client&lt;span class="s1"&gt;'s statement. |
                | 2. Assigns an internal name (e.g., PGBOUNCER_123).|
                | 3. Checks if the statement is already prepared |
                | on the PostgreSQL server. |
                | 4. If not, prepares the statement on the server. |
                | 5. Rewrites the client'&lt;/span&gt;s &lt;span class="nb"&gt;command &lt;/span&gt;to use the |
                | internal name. |
                +------^------+
                       |
                       | Forwards Rewritten Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt; as PGBOUNCER_123&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PostgreSQL |
                | Server |
                | |
                | Executes the forwarded statement with the internal name. |
                +-------------+&lt;span class="s2"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
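The bookkeeping described above can be simulated in a few lines (a toy model, not PgBouncer's actual implementation): each distinct query string gets one internal name, and repeated uses map onto the statement already prepared on the server:

```python
class ToyStatementCache:
    """Toy version of PgBouncer's prepared-statement tracking."""

    def __init__(self):
        self.names = {}        # query string -> internal name
        self.prepared = set()  # statements already prepared on the server

    def rewrite(self, query):
        # Assign an internal name to each unique query string.
        if query not in self.names:
            self.names[query] = f"PGBOUNCER_{len(self.names) + 1}"
        name = self.names[query]
        # Prepare on the server only the first time we see this statement.
        newly_prepared = name not in self.prepared
        self.prepared.add(name)
        return name, newly_prepared

cache = ToyStatementCache()
print(cache.rewrite("SELECT * FROM users WHERE id = $1"))   # ('PGBOUNCER_1', True)
print(cache.rewrite("SELECT * FROM users WHERE id = $1"))   # ('PGBOUNCER_1', False)
print(cache.rewrite("SELECT * FROM orders WHERE id = $1"))  # ('PGBOUNCER_2', True)
```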



&lt;h2&gt;
  
  
  In Summary
&lt;/h2&gt;

&lt;p&gt;PgBouncer bridges the gap between the inherent connection limitations of Postgres and the ever-growing demand for higher concurrency in modern applications. &lt;/p&gt;

&lt;p&gt;Leveraging prepared statements can be a valuable asset to boost your Postgres query performance and adds a layer of security against potential SQL injection attacks when using pooled connections. &lt;/p&gt;

&lt;p&gt;You can try prepared statements in PgBouncer with Neon today. We can’t wait to see what you build using it. Happy querying.&lt;br&gt;&lt;br&gt;
If you have any questions or feedback, don’t hesitate to get in touch with us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We’d love to hear from you.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgbouncer</category>
    </item>
    <item>
      <title>pgvector: 30x Faster Index Build for your Vector Embeddings</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 07 Feb 2024 15:43:47 +0000</pubDate>
      <link>https://forem.com/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</link>
      <guid>https://forem.com/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;We are Neon, the serverless Postgres. We power thousands of AI apps with the pgvector extension and separated storage and compute, enabling your database resources to scale independently. In this article, Raouf explains how you can use Neon’s elasticity and pgvector’s parallel HNSW index build (in 0.5.1 for now, and 0.6.0 soon) to scale your AI apps.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Postgres’ most popular vector search extension, pgvector, recently implemented a parallel index build feature, which significantly improves the Hierarchical Navigable Small World (HNSW) index build time by a factor of 30.&lt;/p&gt;

&lt;p&gt;Congratulations to &lt;a href="https://github.com/ankane" rel="noopener noreferrer"&gt;Andrew Kane&lt;/a&gt; and pgvector contributors for this release, which solidifies Postgres’ position as one of the best databases for vector search and allows you to utilize the full power of your database to build the index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" alt="Post image" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tests run by Jonathan Katz using a 10M dataset with 1,536-dimension vectors on a 64 vCPU, 512GB RAM instance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With Neon’s elastic capabilities and its architecture that separates storage and compute, you can allocate additional resources to your Postgres instance specifically for the HNSW index build, from the console or via the Neon API, and then scale back down to match user demand. This makes Neon and pgvector a match made in heaven for efficient AI applications that scale to millions of users.&lt;/p&gt;

&lt;p&gt;This article details how you can use pgvector with Neon.&lt;/p&gt;
&lt;h2&gt;
  
  
  The power of pgvector
&lt;/h2&gt;

&lt;p&gt;Pgvector is Postgres’ most popular extension for vector similarity search. Vector search has become increasingly crucial to semantic search and Retrieval Augmented Generation (RAG) applications, enhancing the long-term memory of large language models (LLMs).&lt;/p&gt;

&lt;p&gt;In both semantic search and RAG use cases, the database contains a knowledge base that the LLM wasn’t trained on, split into a series of texts or chunks. Each text is saved in a row and is associated with a vector generated by an embedding model such as &lt;a href="https://platform.openai.com/docs/guides/embeddings/embedding-models" rel="noopener noreferrer"&gt;OpenAI’s ada-embedding-002&lt;/a&gt; or &lt;a href="https://docs.mistral.ai/platform/client/#embeddings" rel="noopener noreferrer"&gt;Mistral-AI’s mistral-embed&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Vector search is then used to find the texts most similar (closest) to the query vector. Naively, this means comparing the query vector with every row in the database, which makes vector search hard to scale. This is why pgvector implemented &lt;a href="https://en.wikipedia.org/wiki/(1%2B%CE%B5)-approximate_nearest_neighbor_search" rel="noopener noreferrer"&gt;approximate nearest neighbor (ANN) algorithms&lt;/a&gt; (or indexes), which conduct the vector search over a subset of the database to avoid lengthy sequential scans.&lt;/p&gt;
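
&lt;p&gt;As an illustration (the table, column, and vector values here are hypothetical), a pgvector similarity query orders rows by distance to the query vector; without an index, this requires a full sequential scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Find the 5 rows closest to the query vector
-- (&amp;lt;=&amp;gt; is pgvector's cosine distance operator)
SELECT id, content
FROM documents
ORDER BY embedding &amp;lt;=&amp;gt; '[0.1, 0.2, 0.3]'
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;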

&lt;p&gt;One of the most efficient ANN algorithms is the Hierarchical Navigable Small World (HNSW) index. Its graph-based, multi-layered structure is designed for vector search over billions of rows. This makes HNSW extremely fast and efficient at scale and one of the most popular indexes in the vector store market.&lt;/p&gt;
&lt;h2&gt;
  
  
  HNSW’s Achilles heel: memory and build time
&lt;/h2&gt;

&lt;p&gt;HNSW was first introduced by Yu. A. Malkov and D. A. Yashunin in their paper “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.”&lt;/p&gt;

&lt;p&gt;HNSW is a graph-based approach to indexing high-dimensional data. It constructs a hierarchy of graphs, where each layer is a subset of the previous one, which results in a time complexity of &lt;code&gt;O(log(rows))&lt;/code&gt;. During the search, it navigates through these graphs to quickly find the nearest neighbors.&lt;/p&gt;
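
&lt;p&gt;For example, building an HNSW index with pgvector is a single statement (the table name is illustrative; &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;ef_construction&lt;/code&gt; shown here are pgvector’s defaults):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- m: max connections per node; ef_construction: candidate list size during build
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;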

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" alt="Post image" width="716" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As fast and efficient as HNSW is, the index has two drawbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Memory&lt;/strong&gt;: The index requires significantly more memory than other indexes, such as the Inverted File Index (IVFFlat). You can solve the memory issue with a larger database instance, but on standalone Postgres such as AWS RDS, that means over-provisioning just for the index build. With Neon’s scaling capabilities, however, you can scale up, build the HNSW index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" alt="Post image" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build time&lt;/strong&gt;: The HNSW index can take hours to build for million-row datasets, mainly due to the time spent calculating distances between vectors. This is precisely what pgvector 0.6.0 solves by introducing &lt;a href="https://github.com/pgvector/pgvector/issues/409" rel="noopener noreferrer"&gt;Parallel Index Build&lt;/a&gt;. By allocating more CPUs and workers, you can build your HNSW index up to 30x faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" alt="Post image" width="716" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But wait: the HNSW index supports updates, so why is a parallel index build necessary if you only need to build the index once?&lt;/p&gt;

&lt;p&gt;Well, there are two cases where you need to create an HNSW index:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When you want faster queries and to optimize for vector search&lt;/li&gt;
&lt;li&gt;When you already have an HNSW index, and you delete vectors from the table&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The latter might cause the indexed search to return false positives, negatively impacting the quality of the LLM response and the overall performance of your AI application.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scale up and boost index build time
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 speeds up index builds by up to 30x compared to previous versions when using parallel workers. This improvement is especially notable when dealing with large datasets and large vectors, such as OpenAI’s 1,536-dimension embeddings.&lt;/p&gt;

&lt;p&gt;Creating an HNSW index can require significant resources: you need to allocate enough &lt;code&gt;maintenance_work_mem&lt;/code&gt; to fit the index in memory. Otherwise, the HNSW graph takes significantly longer to build, and Postgres warns you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NOTICE:  hnsw graph no longer fits into maintenance_work_mem after 100000 tuples
DETAIL:  Building will take significantly longer.
HINT:  Increase maintenance_work_mem to speed up builds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Neon, you can scale up your Postgres instance using the Console or the API, configure it to build the index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" alt="Neon Console Operation" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To effectively use parallel index build, it’s essential to configure Postgres with suitable settings. Key parameters to consider are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;maintenance_work_mem&lt;/strong&gt;: This parameter determines the memory allocated for creating or rebuilding indexes and directly affects the performance of these operations. Setting it to a high value, such as 8GB, allows for more efficient handling of the index build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET maintenance_work_mem = '8GB';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;max_parallel_maintenance_workers&lt;/strong&gt;: This dictates the number of parallel workers that can be employed for maintenance operations such as index builds. The Postgres default is 2. Setting it higher lets the build utilize more computing resources for faster index builds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET max_parallel_maintenance_workers = 7; -- plus leader
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Neon currently supports pgvector 0.5.1. Our engineering team is working on adding support for 0.6.0. Stay tuned.&lt;/p&gt;
&lt;/blockquote&gt;
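
&lt;p&gt;Putting these settings together, a typical index build session might look like this (the table name is illustrative, and the memory and worker values should be sized to your instance):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Allocate memory and parallel workers for this session only
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7; -- plus the leader process

-- Build the HNSW index using the parallel workers
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;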

&lt;h2&gt;
  
  
  How does this affect recall performance?
&lt;/h2&gt;

&lt;p&gt;Recall is as important a metric as query execution time in RAG applications. Recall is the percentage of correct answers the ANN search returns. In the HNSW index, &lt;code&gt;ef_search&lt;/code&gt; is the parameter that determines the number of neighbors to scan at search time. The higher &lt;code&gt;ef_search&lt;/code&gt;, the higher the recall, but also the higher the query execution time.&lt;/p&gt;
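
&lt;p&gt;For instance, you can raise &lt;code&gt;ef_search&lt;/code&gt; from its default of 40 to trade query latency for recall:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Scan more neighbor candidates per query (default is 40)
SET hnsw.ef_search = 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;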

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector/issues/409#issuecomment-1898605567" rel="noopener noreferrer"&gt;The tests conducted by Johnathan Katz&lt;/a&gt; show that using parallel builds has negligible impact on recall, with most changes swinging positively by over 1%. Despite the substantial speed improvements, this remarkable stability in recall rates highlights the effectiveness of pgvector 0.6.0’s parallel build process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" alt="Recall Performance Graph" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 represents a significant leap forward, proving that Postgres is an important player in the vector search space. By harnessing the power of parallel index building, developers can now construct HNSW indexes more rapidly and efficiently, significantly reducing the time and resources traditionally required for such tasks.&lt;/p&gt;

&lt;p&gt;Neon’s flexible and scalable serverless Postgres offering complements pgvector’s capabilities perfectly. Users can scale their database resources according to their specific needs for index building and then scale down to optimize costs, ensuring an economical yet powerful solution.&lt;/p&gt;

&lt;p&gt;What AI applications are you currently building? &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Try pgvector on Neon today&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can improve your experience with serverless PostgreSQL.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgvector</category>
      <category>vector</category>
      <category>ai</category>
    </item>
    <item>
      <title>Bring Your Own Extensions to Serverless PostgreSQL</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 17 Jan 2024 14:07:36 +0000</pubDate>
      <link>https://forem.com/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</link>
      <guid>https://forem.com/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" alt="Bring Your Own Extensions Cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL are comparable to libraries in programming languages or plugins in web browsers. They are pivotal in the PostgreSQL ecosystem, providing additional functionalities ranging from encryption and AI to handling time series and geospatial data. More complex extensions can transform PostgreSQL into a graph or analytical database, and some companies even create custom private extensions for specific business logic.&lt;/p&gt;

&lt;p&gt;Neon’s compute is stateless PostgreSQL, running as a VM or a Kubernetes pod. The compute image comes with a &lt;a href="https://neon.tech/docs/extensions/pg-extensions" rel="noopener noreferrer"&gt;list of supported extensions&lt;/a&gt;. However, supporting a wide range of PostgreSQL extensions can pose performance and security risks in a multi-tenant serverless environment like Neon. This is why we are excited to announce we added &lt;a href="https://neon.tech/docs/extensions/pg-extensions#custom-built-extensions" rel="noopener noreferrer"&gt;support for private and custom extensions&lt;/a&gt; using Dynamic Extension Loading. &lt;/p&gt;

&lt;p&gt;This feature is currently in beta and available on request only. You can contact support if you want to bring your own extensions to Neon. In this article, we’ll introduce Dynamic Extension Loading, its implementation, its benefits, and our future plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensions in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" alt="PostgreSQL Extension Ecosystem" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL is a robust and versatile database system that is further enhanced by its support for extensions. Some of the most popular extensions are &lt;a href="https://postgis.net/" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; for geolocation, &lt;a href="https://www.postgresql.org/docs/current/pgstatstatements.html" rel="noopener noreferrer"&gt;pg_stat_statements&lt;/a&gt; for query statistics, and &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; for vector similarity search. &lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL come in various forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL Object Packages&lt;/strong&gt;: These are the most common, comprising domain-specific data types, functions, triggers, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural Languages&lt;/strong&gt;: Extensions like PLPython or PLV8 enable the use of different programming languages within PostgreSQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal API Enhancements&lt;/strong&gt;: Written in C, these powerful extensions can introduce new storage methods, volume replication, background jobs, and configuration parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensions in Other Languages&lt;/strong&gt;: Beyond C, extensions can be developed in languages like C++ or Rust, broadening the scope of functionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To use an extension, it must be built against the correct major version of PostgreSQL. The installation involves placing files in the shared directory and library files in the libdir, paths that vary across platforms. After placing the files, the &lt;code&gt;CREATE EXTENSION&lt;/code&gt; command is executed in the database, prompting PostgreSQL to locate and run the installation scripts for the extension.&lt;/p&gt;
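
&lt;p&gt;For example, once the extension files are in place, installation is a single command; the &lt;code&gt;pg_available_extensions&lt;/code&gt; catalog shows what the server can install (pgvector, named &lt;code&gt;vector&lt;/code&gt;, is used here for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- List extensions whose files are installed on the server
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'vector';

-- Run the extension's installation scripts in this database
CREATE EXTENSION IF NOT EXISTS vector;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;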

&lt;h2&gt;
  
  
  Extension support limitations in serverless environments
&lt;/h2&gt;

&lt;p&gt;In Neon's serverless PostgreSQL environment, each compute runs as an ephemeral Kubernetes pod or VM. A compute instance can be scaled up, down, and descheduled whenever the workload changes. Therefore, supporting a wide range of PostgreSQL extensions presents significant challenges such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility&lt;/strong&gt;: Many extensions are not designed for serverless architectures, particularly those needing persistent storage or deep system integration, such as &lt;a href="https://github.com/citusdata/pg_cron" rel="noopener noreferrer"&gt;pg_cron&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" rel="noopener noreferrer"&gt;file_fdw&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Issues&lt;/strong&gt;: Embedding all extensions in the compute image significantly increases its size, leading to slower start times and reduced performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Overhead&lt;/strong&gt;: Traditional methods require frequent updates to the entire compute image for each extension update, causing potential service disruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Risks&lt;/strong&gt;: A larger set of extensions in the base image increases the potential attack surface, especially with extensions that remain unused by many users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Customization&lt;/strong&gt;: The open-source nature of compute images restricts the inclusion of custom or closed-source extensions, limiting tailored solutions for specific customer needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the conventional method of bundling extension files into compute images is impractical due to the sheer number of extensions and the varied needs of users. This led us to rethink how we provide extensions with Dynamic Extension Loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic Extension Loading: A New Approach
&lt;/h2&gt;

&lt;p&gt;At Neon, we've addressed these challenges with our dynamic extension loading mechanism. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building and storing extensions&lt;/strong&gt;: We build extensions in a separate repository and store the resulting files in an S3 bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuring extensions&lt;/strong&gt;: Extensions are configured per user in the Neon control plane, enhancing customization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-demand loading&lt;/strong&gt;: Compute instances download control files at startup, and library files are fetched as needed when extension functions are called.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" alt="Custom Extension download diagram on Neon" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Dynamic Extension Loading, private and default extensions can be added to compute instances without restarting, reducing maintenance overhead. Additionally, it brings performance benefits to Neon. Our plans with Dynamic Extension Loading include moving all default-supported extensions to the extension storage, resulting in a smaller compute image size and faster start times.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to bring your own extension to Neon
&lt;/h2&gt;

&lt;p&gt;To request support for a Postgres extension, paid plan users can &lt;a href="https://console.neon.tech/app/projects?modal=support" rel="noopener noreferrer"&gt;open a support ticket&lt;/a&gt;. Free plan users can submit a request via the feedback channel on our &lt;a href="https://discord.com/invite/92vNTzKDGp" rel="noopener noreferrer"&gt;Discord Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our engineers will then evaluate your extension’s compatibility with Neon, build it, and upload the artifacts to the extension storage once it passes all security tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This feature is currently in beta, with plans for general availability in the near future. This development marks a significant step forward in making PostgreSQL more adaptable and efficient in a serverless environment.&lt;/p&gt;

&lt;p&gt;What about you? Do you use PostgreSQL extensions in your projects? Join us on Discord and let us know which extensions you use and how we can enhance your PostgreSQL experience in the cloud.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>extensions</category>
      <category>cloud</category>
      <category>database</category>
    </item>
    <item>
      <title>Change Data Capture with Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 21 Dec 2023 12:23:07 +0000</pubDate>
      <link>https://forem.com/neon-postgres/change-data-capture-with-serverless-postgres-823</link>
      <guid>https://forem.com/neon-postgres/change-data-capture-with-serverless-postgres-823</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" alt="Cover Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern applications are often built from loosely coupled components and services, which helps teams and systems scale. The data pipelines connecting these components generate continuous data streams that need to be replicated, processed, or analyzed. &lt;/p&gt;

&lt;p&gt;However, moving data between different data stores can seriously compromise the quality and reliability of your decisions, because data can become inconsistent or corrupted during transformation. This is why &lt;a href="https://en.wikipedia.org/wiki/Change_data_capture" rel="noopener noreferrer"&gt;Change Data Capture (CDC)&lt;/a&gt; has emerged as one of the most popular methods to synchronize data across multiple data stores. One way to use CDC in Postgres is with &lt;a href="https://www.postgresql.org/docs/current/logical-replication.html#:~:text=Logical%20replication%20is%20a%20method,byte%2Dby%2Dbyte%20replication." rel="noopener noreferrer"&gt;logical replication&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, we’re excited to announce the release of logical replication in beta on Neon. This feature lets you stream your data hosted on Neon to external data stores, allowing for change data capture and real-time analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CDC matters
&lt;/h2&gt;

&lt;p&gt;CDC refers to the process of capturing changes made to data in a database – such as inserts, updates, and deletes – and then delivering these changes to downstream processes or systems. &lt;/p&gt;

&lt;p&gt;CDC operates by monitoring and capturing data changes in a source database as they occur. This is a departure from traditional batch processing, where data updates are transferred at scheduled intervals. CDC ensures that every change is captured and can be acted upon almost instantaneously.&lt;/p&gt;

&lt;p&gt;Here’s why CDC matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data synchronization: In a distributed system architecture, keeping data synchronized across various platforms and services is critical. CDC facilitates this by providing a mechanism for real-time data replication.&lt;/li&gt;
&lt;li&gt;Minimizing latency: By capturing changes as they happen, CDC minimizes the latency in data transfer. This is essential for applications where even a slight delay in data availability can lead to significant issues, such as financial trading systems.&lt;/li&gt;
&lt;li&gt;Enabling event-driven architectures: CDC is a cornerstone for building event-driven systems. In such architectures, actions are triggered in response to data changes, making real-time data capture essential.&lt;/li&gt;
&lt;li&gt;Data warehousing and real-time analytics: For organizations relying on data warehouses and analytics tools for decision-making, CDC ensures that the data in these systems is current, enhancing the accuracy of insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we understand it better, let’s explore the technical mechanics of how CDC is implemented in Postgres through logical replication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Logical replication: under the hood
&lt;/h2&gt;

&lt;p&gt;In Postgres, logical replication is one of the methods of implementing CDC and streaming data from your database to an external source. It uses a publisher-subscriber model. &lt;/p&gt;

&lt;p&gt;Your Neon database works as a publisher, copying first a snapshot of the data and then streaming changes to one or more target data stores (subscribers). This model allows for selective replication, where only specified tables or even specific columns within a table can be replicated.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" alt="Neon as publisher" width="686" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn more about connecting &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead-Log (WAL)&lt;/a&gt; is a fundamental component in Postgres, designed to ensure data integrity and facilitate recovery. It records every change made to the database, including transactions and their states.&lt;/p&gt;

&lt;p&gt;For logical replication, the WAL serves as the primary data source. The WAL captures the comprehensive sequence of data changes, which are then decoded for replication purposes. Logical replication transforms the WAL to a format accepted by the subscriber through logical decoding, and the &lt;code&gt;walsender&lt;/code&gt; then streams the transformed data using the replication protocol. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;walsender&lt;/code&gt; initiates the logical decoding of the WAL using an output plugin. Postgres ships with several logical decoding plugins that can output the data in various formats. In addition, new plugins can be developed.&lt;/p&gt;

&lt;p&gt;For instance, in a Postgres-to-Postgres logical replication, the standard &lt;code&gt;pgoutput&lt;/code&gt; plugin transforms the data changes to the logical replication protocol. The transformed data is subsequently streamed using the replication protocol, which maps it to local tables and applies the changes in the exact sequence of the original transactions. However, integrations with non-Postgres systems require an output format different from the standard one specifically designed for Postgres-to-Postgres logical replication. &lt;/p&gt;

&lt;p&gt;Today’s data pipelines involve more than one data store type. For example, you can integrate all your Postgres databases into a data warehouse or streaming platform, such as &lt;a href="https://materialize.com/" rel="noopener noreferrer"&gt;Materialize&lt;/a&gt; or &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt;, to process and analyze data at higher scales. This is why, with the release of logical replication on Neon, we added support for &lt;a href="https://github.com/eulerto/wal2json" rel="noopener noreferrer"&gt;wal2json&lt;/a&gt;, which outputs changes in the JSON format to be easily consumed by other systems and data stores.&lt;/p&gt;
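
&lt;p&gt;As a sketch (the slot name is arbitrary, and wal2json must be available on the server), you can create a logical replication slot that uses the wal2json output plugin and inspect decoded changes directly in SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Create a slot that decodes WAL changes with the wal2json plugin
SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'wal2json');

-- Peek at pending JSON-formatted changes without consuming them
SELECT data FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;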

&lt;p&gt;You can read more on &lt;a href="https://neon.tech/blog/cdc-with-materialize" rel="noopener noreferrer"&gt;Change Data Capture using Neon and Materialize&lt;/a&gt; by our friend Marta Paes, to learn how to integrate your database with external systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Logical vs. physical replication
&lt;/h2&gt;

&lt;p&gt;Logical replication differs from physical replication in that it replicates changes at the data level (row-level changes) rather than replicating the entire database at the block level. This allows for more selective replication and reduces the amount of data transferred. Unlike snapshot replication, which provides a full copy of the data at a specific point in time, logical replication ensures continuous streaming of changes, making it more suitable for applications that require near real-time data availability.&lt;/p&gt;

&lt;p&gt;This comparison highlights the distinct characteristics, advantages, and applications of logical and physical replication.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Logical Replication&lt;/th&gt;
&lt;th&gt;Physical Replication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Row-Level Changes&lt;/strong&gt;: focuses on replicating specific row-level changes (INSERT, UPDATE, DELETE) in selected tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Block-Level Replication&lt;/strong&gt;: replicates the entire database at the block level. It creates an exact copy of the source database, including all tables and system catalogs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Offers the flexibility to replicate specific tables and even specific columns within tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Doesn’t allow for selective table replication and requires the same PostgreSQL version on both the primary and standby servers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WAL-based&lt;/strong&gt;: Uses the WAL for capturing changes, but with logical decoding to convert these changes into a readable format for the subscriber.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Streaming Replication&lt;/strong&gt;: Changes are streamed as they are written to the WAL, minimizing lag.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Ideal for situations requiring selective replication, minimal impact on the source database, or cross-version compatibility.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Best suited for creating read-only replicas for load balancing, high availability, and disaster recovery solutions.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Get started with logical replication
&lt;/h2&gt;

&lt;p&gt;To enable logical replication, navigate to your project’s settings in the console, click the “Beta” tab, locate Logical Replication, and click the “Enable” button. &lt;/p&gt;

&lt;p&gt;Note that enabling logical replication will restart your compute instance, which will drop existing connections. A subscriber may also keep the connection to your Neon database active, preventing your Neon instance from scaling to zero.&lt;/p&gt;

&lt;p&gt;This action is also irreversible, and you will not be able to disable logical replication for your project.&lt;/p&gt;

&lt;p&gt;Ensure logical replication is enabled by running the following query in the SQL Editor within the Neon console or using psql on your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW wal_level;

 wal_level 
-----------
 logical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a publication
&lt;/h2&gt;

&lt;p&gt;Let’s assume you have the following users table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE users (

  id SERIAL PRIMARY KEY,

  username VARCHAR(50) NOT NULL,

  email VARCHAR(100) NOT NULL

);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute the following query to create a publication for the users table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION users_publication FOR TABLE users;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
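
&lt;p&gt;On the subscriber side, a Postgres instance that supports subscriptions (a self-hosted Postgres, for example) can consume this publication with &lt;code&gt;CREATE SUBSCRIPTION&lt;/code&gt;. The statement below is a sketch; the connection string is a placeholder for your actual Neon connection string:&lt;/p&gt;

```sql
-- Run on the subscriber, not on Neon. The connection string below is a
-- placeholder; replace it with your Neon connection string.
CREATE SUBSCRIPTION users_subscription
  CONNECTION 'postgresql://user:password@your-neon-hostname/neondb'
  PUBLICATION users_publication;
```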



&lt;p&gt;Learn more about &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;how to connect Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;While logical replication in Neon Postgres offers numerous benefits for real-time data synchronization and flexibility, it has some limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publisher, not a subscriber&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This release of logical replication on Neon is in beta, and for security reasons, it does not include subscriber capabilities at the moment. We are currently working on these security constraints, which should be supported in future releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logical replication and Auto-suspend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a logical replication setup, a subscriber may keep the connection to your Neon publisher database active to poll for changes or perform sync operations, preventing your Neon compute instance from scaling to zero. Some subscribers allow you to configure connection or sync frequency, which may be necessary to continue taking advantage of Neon’s Auto-suspend feature. Please refer to your subscriber’s documentation or contact their support team for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Definition Language (DDL) Operations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Logical replication in Postgres primarily handles Data Manipulation Language (DML) operations like INSERT, UPDATE, and DELETE. However, it does not automatically replicate Data Definition Language (DDL) operations such as CREATE TABLE, ALTER TABLE, or DROP TABLE. This means that schema changes in the publisher database are not directly replicated to the subscriber database.&lt;/p&gt;

&lt;p&gt;Manual intervention is required to replicate DDL changes. This can be done by applying the DDL changes separately in both the publisher and subscriber databases or by using third-party tools that can handle DDL replication.&lt;/p&gt;
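
&lt;p&gt;For example, to add a column to the replicated &lt;code&gt;users&lt;/code&gt; table, you would run the same DDL on both sides, typically on the subscriber first so that incoming rows always fit the schema:&lt;/p&gt;

```sql
-- Run on the subscriber first, then on the Neon publisher:
ALTER TABLE users ADD COLUMN created_at TIMESTAMPTZ DEFAULT now();
```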

&lt;p&gt;&lt;strong&gt;Replication Lag&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In high-volume transaction environments, there is potential for replication lag. This is the time delay between a transaction being committed on the publisher and the same transaction being applied on the subscriber.&lt;/p&gt;

&lt;p&gt;It’s important to monitor replication lag and understand its impact, especially for applications that require near-real-time data consistency. Proper resource allocation and optimizing the network can help mitigate this issue.&lt;/p&gt;
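
&lt;p&gt;As a starting point, you can estimate how far each subscriber is behind by comparing each logical slot’s confirmed position with the current WAL position on the publisher:&lt;/p&gt;

```sql
-- Approximate lag per logical replication slot, in bytes of WAL:
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
```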

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Logical replication is undoubtedly one of the most important features for modern applications. As we continue to develop its capabilities, we encourage you to test, experiment, and push the boundaries of what logical replication can do. Join us on Discord, and share your experiences, suggestions, and challenges with us. &lt;/p&gt;

&lt;p&gt;We can’t wait to see what you build with Neon.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>cdc</category>
      <category>streaming</category>
    </item>
    <item>
      <title>Mixtral 8x7B: What you need to know about Mistral AI’s latest model</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Mon, 11 Dec 2023 17:15:23 +0000</pubDate>
      <link>https://forem.com/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n</link>
      <guid>https://forem.com/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feithsyjh8omjtf3g1wbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feithsyjh8omjtf3g1wbo.png" alt="Mixtral 8x7B What you need to know Cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon, and we’re redefining the database experience with our cloud-native serverless Postgres solution. If you’ve been looking for a database for your RAG apps that adapts to your application loads, you’re in the right place. &lt;a href="https://console.neon.tech/signup" rel="noopener noreferrer"&gt;Give Neon a try&lt;/a&gt;, and let us know what you think. Neon is cloud-native Postgres and scales your AI apps to millions of users with pgvector. In this post, Raouf is going to tell you what you need to know about Mixtral 8x7B, the new LLM by MistralAI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mistral AI, the company behind the Mistral 7B model, has released its latest model: Mixtral 8x7B (Mixtral). The model includes support for 32k tokens, better code generation, and it matches or outperforms GPT3.5 on most standard benchmarks.&lt;/p&gt;

&lt;p&gt;In this article, we'll review the new text-generation and embedding models by Mistral AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Mistral AI has emerged as a strong contender in the open-source large language model sphere with their &lt;a href="https://arxiv.org/pdf/2310.06825.pdf" rel="noopener noreferrer"&gt;Mistral 7B&lt;/a&gt; model, which outperforms existing models like Llama 2 (13B parameters) across multiple benchmarks.&lt;/p&gt;

&lt;p&gt;In a previous comparative analysis, we concluded that, although impressive, the Mistral 7B instruct model optimized for chat needed some improvements before being seen as an alternative to the &lt;code&gt;gpt-*&lt;/code&gt; models.&lt;/p&gt;

&lt;p&gt;Mixtral might change all of that as it’s pushing the frontier of open models. According to a recent benchmark, Mixtral matches or outperforms Llama 2 70B and GPT3.5.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LLaMA 2 70B&lt;/th&gt;
&lt;th&gt;GPT-3.5&lt;/th&gt;
&lt;th&gt;Mixtral 8x7B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MMLU&lt;/strong&gt;&lt;br&gt;(MCQ in 57 subjects)&lt;/td&gt;
&lt;td&gt;69.9%&lt;/td&gt;
&lt;td&gt;70.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;HellaSwag&lt;/strong&gt;&lt;br&gt;(10-shot)&lt;/td&gt;
&lt;td&gt;87.1%&lt;/td&gt;
&lt;td&gt;85.5%&lt;/td&gt;
&lt;td&gt;86.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;ARC Challenge&lt;/strong&gt;&lt;br&gt;(25-shot)&lt;/td&gt;
&lt;td&gt;85.1%&lt;/td&gt;
&lt;td&gt;85.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WinoGrande&lt;/strong&gt;&lt;br&gt;(5-shot)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;81.6%&lt;/td&gt;
&lt;td&gt;81.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MBPP&lt;/strong&gt;&lt;br&gt;(pass@1)&lt;/td&gt;
&lt;td&gt;49.8%&lt;/td&gt;
&lt;td&gt;52.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;GSM-8K&lt;/strong&gt;&lt;br&gt;(5-shot)&lt;/td&gt;
&lt;td&gt;53.6%&lt;/td&gt;
&lt;td&gt;57.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MT Bench&lt;/strong&gt;&lt;br&gt;(for Instruct Models)&lt;/td&gt;
&lt;td&gt;6.86&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.32&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Developing with Mixtral 8x7B Instruct
&lt;/h2&gt;

&lt;p&gt;If you plan to fine-tune Mixtral and run your own inference, it's important to note that Mixtral requires much more RAM and GPU capacity than Mistral 7B. While Mistral 7B works well on a single-GPU instance with 24GB of RAM, Mixtral requires 64GB of RAM and 2 GPUs, which more than triples the cost ($1.30/h vs. $4.50/h).&lt;/p&gt;

&lt;p&gt;Luckily for developers, &lt;a href="https://console.mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI has an API&lt;/a&gt; in beta, currently behind an invite gate. They also provide client libraries for &lt;a href="https://pypi.org/project/mistralai/" rel="noopener noreferrer"&gt;Python&lt;/a&gt; and &lt;a href="https://www.npmjs.com/package/@mistralai/mistralai" rel="noopener noreferrer"&gt;JavaScript&lt;/a&gt; developers.&lt;/p&gt;

&lt;p&gt;Below is an example of code using the Python library.&lt;/p&gt;

&lt;p&gt;Prerequisite: install the &lt;code&gt;mistralai&lt;/code&gt; client library using &lt;code&gt;pip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install mistralai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a code example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = MistralClient(api_key=api_key)

messages = [
    ChatMessage(role="user", content="What is the elephant database?")
]

chat_response = client.chat(
    model=model,
    messages=messages,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re familiar with the OpenAI client library, you will notice the similarity between the two SDKs. The Mistral AI library can be used as a drop-in replacement, which makes migrations seamless.&lt;/p&gt;

&lt;p&gt;Mistral AI API provides three models: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mistral-tiny&lt;/code&gt; based on Mistral-7B-v0.2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral-small&lt;/code&gt; based on Mixtral-8x7B-v0.1&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral-medium&lt;/code&gt; based on an internal prototype model&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mistral-embed: The new embedding model
&lt;/h2&gt;

&lt;p&gt;In addition to the text generation models, Mistral AI’s API gives you access to &lt;code&gt;mistral-embed&lt;/code&gt;, a &lt;a href="https://huggingface.co/BAAI/bge-large-en" rel="noopener noreferrer"&gt;BGE-large&lt;/a&gt;-like 1024-dimension embedding model, also accessible via the client library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mistralai.client import MistralClient

api_key = os.environ["MISTRAL_API_KEY"]

client = MistralClient(api_key=api_key)
embeddings_batch_response = client.embeddings(
      model="mistral-embed",
      input=["I love Postgres!"],
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What does it mean for your AI apps?
&lt;/h2&gt;

&lt;p&gt;Mixtral gives developers an API-compatible alternative to &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; and, in the case of the mistral-tiny and mistral-small models, a cheaper one. &lt;/p&gt;

&lt;p&gt;Below is the price comparison per one million tokens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;mistral-tiny&lt;/th&gt;
&lt;th&gt;mistral-small&lt;/th&gt;
&lt;th&gt;mistral-medium&lt;/th&gt;
&lt;th&gt;gpt-3.5-turbo-1106&lt;/th&gt;
&lt;th&gt;gpt-3.5-turbo-instruct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.64&lt;/td&gt;
&lt;td&gt;$2.68&lt;/td&gt;
&lt;td&gt;$1.0&lt;/td&gt;
&lt;td&gt;$1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;$0.45&lt;/td&gt;
&lt;td&gt;$1.93&lt;/td&gt;
&lt;td&gt;$8.06&lt;/td&gt;
&lt;td&gt;$2.0&lt;/td&gt;
&lt;td&gt;$2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
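
&lt;p&gt;As a quick sanity check of the table above, here is a sketch that computes the cost of a single request from these per-million-token prices. The token counts are made up for illustration:&lt;/p&gt;

```python
# USD per one million tokens (input, output), from the table above.
PRICES = {
    "mistral-tiny": (0.15, 0.45),
    "mistral-small": (0.64, 1.93),
    "gpt-3.5-turbo-1106": (1.0, 2.0),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a request with 2,000 input tokens and 500 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2000, 500):.6f}")
```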

&lt;p&gt;However, if you previously stored ada v2 1536 dimension vector embeddings with &lt;code&gt;pgvector&lt;/code&gt;, you will need to re-create the embeddings to add support for &lt;code&gt;mistral-embed&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embeddings_batch_response = client.embeddings(
      model="mistral-embed",
      input=["text 1", "text 2", "text 3"],
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;mistral-embed&lt;/code&gt; model for text embedding is slightly more expensive than the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; model.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;mistral-embed&lt;/th&gt;
&lt;th&gt;ada v2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;$0.107&lt;/td&gt;
&lt;td&gt;$0.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that &lt;a href="https://docs.mistral.ai/platform/pricing" rel="noopener noreferrer"&gt;Mistral AI’s pricing&lt;/a&gt; is in euros and the tables above reflect adjusted rates to USD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The release of Mixtral 8x7B by Mistral AI represents a significant leap forward for open-source LLMs. With its enhanced capabilities like 32k token support, improved code generation, and competitive performance against &lt;code&gt;gpt-3.5-turbo&lt;/code&gt;, Mixtral is poised to be a game-changer for developers and AI enthusiasts alike.&lt;/p&gt;

&lt;p&gt;While the model’s resource requirements can be a potential barrier for some, those limitations are offset by the Mistral AI API, and the drop-in replacement client libraries in Python and JavaScript.&lt;/p&gt;

&lt;p&gt;The pricing structure of Mixtral, particularly for the mistral-tiny and mistral-small models, presents a more cost-effective alternative to gpt-3.5-* models. This, along with the advanced capabilities of the mistral-embed model for text embedding, makes Mixtral an attractive option for a wide range of AI apps and Retrieval Augmented Generation pipelines.&lt;/p&gt;

&lt;p&gt;However, it's worth noting that transitioning to Mixtral, especially for those who previously used models like ada v2 for embedding, may require some adjustments in terms of re-creating embeddings and accommodating the slightly higher cost of mistral-embed.&lt;/p&gt;

&lt;p&gt;Overall, Mixtral 8x7B marks an exciting development in the AI field, offering powerful and efficient tools for a variety of applications. As Mistral AI continues to innovate and expand its offerings, it will undoubtedly play a crucial role in shaping the future of AI technology.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mistral</category>
      <category>mixtral</category>
    </item>
    <item>
      <title>Mistral 7B and BAAI on Workers AI vs. OpenAI Models for RAG</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Mon, 11 Dec 2023 17:06:54 +0000</pubDate>
      <link>https://forem.com/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6</link>
      <guid>https://forem.com/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xPYbtKfq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/neon-mistral7B1-1024x576.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xPYbtKfq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/neon-mistral7B1-1024x576.jpg" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the rapidly progressing world of artificial intelligence, choosing the right model for AI-powered applications is crucial. This article explores a comparative analysis of the Mistral 7B model, a promising alternative to OpenAI’s GPT models and BAAI models in the context of Retrieval Augmented Generation (RAG) applications. But first, let’s understand the landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding RAG Pipelines
&lt;/h2&gt;

&lt;p&gt;RAG pipelines enhance the capabilities of Large Language Models (LLMs) by providing them with external ‘research’ to inform their responses. Let me explain.&lt;/p&gt;

&lt;p&gt;Imagine you are a pastry chef, and someone asks you how to make a good chocolate cake. That’s an easy one for you since you have made countless chocolate cakes in the past. But what if someone asks you about making Tandoori chicken? You probably do not have the recipe off the top of your head, but with a little bit of research, you will likely be able to answer that question. The same applies to LLMs. &lt;/p&gt;

&lt;p&gt;When the user asks the LLM a question, the RAG pipeline does the search to provide more context and helps steer the model towards a more accurate and helpful answer. Typically, the context is a piece of information stored in a document or a database that the model hasn’t seen during training. &lt;/p&gt;

&lt;p&gt;The below diagram illustrates the RAG process using &lt;a href="https://neon.tech/docs/introduction" rel="noopener noreferrer"&gt;Neon Docs&lt;/a&gt; Chatbot as an example. The diagram shows three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embedding generation: We need an embedding model to turn the user’s query into a query vector.&lt;/li&gt;
&lt;li&gt;Context retrieval: This is the process of looking for the information in a document or a database using similarity search.&lt;/li&gt;
&lt;li&gt;Completion (or text) generation: In this step, the application provides the completion model with the user query and the context to generate an answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TjVuWal6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/dJsnDPFBNP31e1skLxptoAjV93IF6ujYainmGS_kD2O_mEt9MQx54HUUblzwtLK1ZEVBM1t9NCZ7nhxR-JcD59RELZjX0aHeFK2liy7MO24Bxvpykr6Ptp6wWzYE1QnzpmA4ZxaJTv9Qm-YIGt0bYn0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TjVuWal6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/dJsnDPFBNP31e1skLxptoAjV93IF6ujYainmGS_kD2O_mEt9MQx54HUUblzwtLK1ZEVBM1t9NCZ7nhxR-JcD59RELZjX0aHeFK2liy7MO24Bxvpykr6Ptp6wWzYE1QnzpmA4ZxaJTv9Qm-YIGt0bYn0" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now want to explore open-source alternatives for text generation and embedding models. But first, let’s explain why open-source models are becoming increasingly popular.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are open-source AI models, and why should you care?
&lt;/h2&gt;

&lt;p&gt;The short answer is &lt;em&gt;transparency&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI provides developers with GPT and ADA models that are essential for RAG pipelines. However, those models operate as black boxes, which can pose a security concern for some, accelerating the interest in open-source models such as Llama2 and Mistral 7B.&lt;/p&gt;

&lt;p&gt;Being open-source in the AI model context means a transparent training process and output. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt;, for instance, opened the code and weights of its 7-billion-parameter open-source large language model (LLM) &lt;a href="https://docs.mistral.ai/llm/mistral-instruct-v0.1" rel="noopener noreferrer"&gt;Mistral 7B&lt;/a&gt; to the public and explained how the model uses a &lt;a href="https://github.com/mistralai/mistral-src#sliding-window-to-speed-up-inference-and-reduce-memory-pressure" rel="noopener noreferrer"&gt;sliding window to speed up inference and reduce memory pressure&lt;/a&gt;, which gives the model an edge over other open-source models.&lt;/p&gt;

&lt;p&gt;Let’s explore the Mistral 7B model in more detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Mistral 7B: A New Contender
&lt;/h2&gt;

&lt;p&gt;The Mistral 7B model is an alternative to OpenAI’s GPT models, engineered for performance and efficiency. According to a &lt;a href="https://arxiv.org/pdf/2310.06825.pdf" rel="noopener noreferrer"&gt;recent paper&lt;/a&gt;, Mistral 7B outperforms existing models like Llama 2 (13B parameters) across all evaluated benchmarks, showcasing superior performance in areas such as reasoning, mathematics, and code generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T_DuU8iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/16-1024x714.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T_DuU8iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/16-1024x714.png" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SZS9_Qst--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/17-1024x714.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SZS9_Qst--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/17-1024x714.png" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mistral 7B has been released under the Apache 2.0 license, with a reference implementation available for easy deployment on cloud platforms like AWS, GCP, and Azure or locally.&lt;/p&gt;

&lt;p&gt;The paper’s claims are impressive, so I wanted to experience firsthand how good the LLM is. &lt;/p&gt;

&lt;h2&gt;
  
  
  BGE: The open-source Embedding generation model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/BAAI/bge-base-en-v1.5" rel="noopener noreferrer"&gt;BGE embedding&lt;/a&gt; is a general Embedding Model pre-trained using &lt;a href="https://github.com/staoxiao/RetroMAE" rel="noopener noreferrer"&gt;retromae&lt;/a&gt; that can fine-tuned. Interestingly, BGE comes in three dimension sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;small: 384 dimensions&lt;/li&gt;
&lt;li&gt;base: 768 dimensions&lt;/li&gt;
&lt;li&gt;large: 1024 dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that you can reduce your embedding storage by roughly 33-75% relative to 1536-dimension ada v2 embeddings, depending on the BGE model you choose.&lt;/p&gt;
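
&lt;p&gt;As a rough sketch of that reduction, assuming 4 bytes per float dimension (as &lt;code&gt;pgvector&lt;/code&gt;’s vector type stores) and comparing against 1536-dimension ada v2 embeddings:&lt;/p&gt;

```python
# Approximate storage per embedding: 4 bytes per float32 dimension
# (ignoring the small per-row header).
ADA_DIMS = 1536
BGE_DIMS = {"small": 384, "base": 768, "large": 1024}

for name, dims in BGE_DIMS.items():
    saving = 1 - dims / ADA_DIMS
    print(f"bge-{name}: {dims * 4} bytes per vector, {saving:.0%} smaller than ada v2")
```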

&lt;p&gt;I used the base-768 model on &lt;a href="https://developers.cloudflare.com/workers-ai/models/text-embeddings/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt; for Text Embedding generation for this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology of Comparative Analysis
&lt;/h2&gt;

&lt;p&gt;For a hands-on evaluation, I replaced &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; with &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; with &lt;a href="https://github.com/FlagOpen/FlagEmbedding" rel="noopener noreferrer"&gt;BAAI’s bge-base-en-v1.5&lt;/a&gt; in a Neon Docs chatbot. I then compared their responses to eight PostgreSQL-related questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Setup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text generation model&lt;/strong&gt; : Mistral 7B on &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model&lt;/strong&gt; : BGE base-768 on &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment models&lt;/strong&gt; : mistral-7b-instruct-v0.1, gpt-3.5-turbo, text-embedding-ada-002, bge-base-en-v1.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloudflare Workers AI allows you to run text, image, and embedding generation models using serverless GPUs. Note that Workers AI is currently in Open Beta and is not recommended for production data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IIojcNzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/AVMuo4G4s2lTxN6fEaTDhc5DoaDcy9R1ifFj2zzH-TE-hvAlgOCLNk_KCdOmTInDjXHx4hPo2X0Dea_q9sNuzndazf4LpupLpa7-UPD5ykA-L-BLgKVCqcEb1Yn6myBi838g_w3nQJWUBVjLsjlCzpw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IIojcNzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/AVMuo4G4s2lTxN6fEaTDhc5DoaDcy9R1ifFj2zzH-TE-hvAlgOCLNk_KCdOmTInDjXHx4hPo2X0Dea_q9sNuzndazf4LpupLpa7-UPD5ykA-L-BLgKVCqcEb1Yn6myBi838g_w3nQJWUBVjLsjlCzpw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested Mistral 7B on a virtual machine with 24GB of vRAM and NVIDIA GPUs for $1.3 per hour. But our friends at Cloudflare released &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Workers AI&lt;/a&gt;, a GPU-powered serverless environment for running machine learning models, which better fits my use case.&lt;/p&gt;

&lt;p&gt;For those interested in deploying their own Mistral 7B instance, I added instructions at the end of this article to deploy using the HuggingFace inference endpoint.&lt;/p&gt;

&lt;p&gt;In this analysis, we used the &lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1" rel="noopener noreferrer"&gt;mistralai/Mistral-7B-Instruct-v0.1&lt;/a&gt; model, which has been fine-tuned for conversation and answering questions.&lt;/p&gt;

&lt;p&gt;The default max number of tokens per query is 1512. For this test, I had to increase the max input length and number of tokens to 3000. Note that the larger this value, the more memory each request will consume.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and Discussion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context quality
&lt;/h3&gt;

&lt;p&gt;In the RAG pipeline, the context is the concatenation of texts resulting from a semantic search.&lt;/p&gt;

&lt;p&gt;Therefore, the quality of our context is heavily correlated to the quality of the semantic search using &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; embeddings models. So, the question is: how different would my context be if I switched the text embedding model?&lt;/p&gt;

&lt;p&gt;Our analysis showed that the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models provided similar results 46% of the time. A deeper dive using Jaccard and Cosine similarity scores indicated a significant overlap in contexts, suggesting comparable quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---3jOt7K6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/18-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---3jOt7K6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/18-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the number of returned chunks k=3, the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models return similar results only 46% of the time. This number is reduced to 42% with k=10.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8Ym8nVLH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/19-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8Ym8nVLH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/19-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Further analysis using Jaccard and Cosine similarity scores to determine intersecting words and count shows that half of the contexts generated by the two models are similar and often share words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cdce2p4l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/20-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cdce2p4l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/20-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: In the above, we extracted the word’s frequency count in the context using TF-IDF to calculate the cosine similarity score. Typically, cosine similarity is sensitive to the frequency of words, while Jaccard similarity purely focuses on the intersecting words.&lt;/p&gt;
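
&lt;p&gt;To illustrate the difference between the two metrics, here is a sketch with made-up context strings, using raw term counts rather than full TF-IDF weights to keep it short:&lt;/p&gt;

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def jaccard(a, b):
    # Set overlap of distinct words, insensitive to word frequency.
    a, b = set(tokenize(a)), set(tokenize(b))
    return len(a.intersection(b)) / len(a.union(b))

def cosine(a, b):
    # Word-frequency vectors; the analysis above used TF-IDF weights,
    # which additionally down-weight common words.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Invented stand-ins for two retrieved contexts.
context_ada = "neon is serverless postgres with autoscaling and branching"
context_bge = "neon is serverless postgres with branching and bottomless storage"
print(round(jaccard(context_ada, context_bge), 2))  # 0.7
print(round(cosine(context_ada, context_bge), 2))
```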

&lt;p&gt;Analysis of the completions generated by the &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; model shows strong cosine similarity among texts: completions share an average of 40% of their words. &lt;/p&gt;

&lt;p&gt;Even for questions 3, 5, and 7, where the contexts retrieved with the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; and &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; embedding models were quite similar, the generated texts differed. This variation is likely due to the LLM temperature being left at its default of 0.7, which controls the degree of randomness in the response and allows the generated text to vary between runs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OiQh7pq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/21-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OiQh7pq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/21-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Text generation quality
&lt;/h3&gt;

&lt;p&gt;Quality is subjective. We therefore surveyed Postgres and Neon experts to rate the generated text on a scale of 1 (very bad) to 5 (very good). &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; scored an average of 3.8/5, outperforming &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt;’s 2.5/5.&lt;/p&gt;

&lt;p&gt;The answers shared in the survey were generated using context retrieved with the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; embedding model. The ground truth in this experiment is the Postgres and Neon documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qm5mJcO0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/22-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qm5mJcO0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/22-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The BGE model, with its varying dimension sizes, allows for a customizable approach to managing storage and computational resources since smaller vector sizes reduce storage and semantic search query execution time.&lt;/p&gt;
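&lt;p&gt;As a back-of-the-envelope illustration of that storage difference (assuming 4-byte floats and a hypothetical corpus of one million chunks; real tables add per-row and index overhead on top):&lt;/p&gt;

```python
def vector_storage_mib(n_rows: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw storage for n_rows embeddings of the given dimensionality, in MiB."""
    return n_rows * dims * bytes_per_float / (1024 * 1024)

rows = 1_000_000  # hypothetical corpus size
print(vector_storage_mib(rows, 768))   # bge-base-en-v1.5: ~2930 MiB
print(vector_storage_mib(rows, 1536))  # text-embedding-ada-002: ~5859 MiB
```

&lt;p&gt;Halving the dimensionality halves the raw vector storage, and smaller vectors also mean fewer bytes scanned per distance computation.&lt;/p&gt;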

&lt;p&gt;Our analysis revealed that the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models, while similar in results to some extent, display distinct characteristics in their context retrieval. The observed differences in the semantic search results (a 46% similarity rate at k=3) underscore the importance of choosing an embedding model based on the specific requirements of an application.&lt;/p&gt;

&lt;p&gt;In the case of our chatbot, these initial results suggest that the output quality wouldn’t change drastically by migrating to &lt;code&gt;bge-base-en-v1.5&lt;/code&gt;. Migrating text generation models, on the other hand, is a different story.&lt;/p&gt;

&lt;p&gt;The Mistral 7B model stands out as a strong contender. Its ability to outperform models like Llama 2 in reasoning, mathematics, and code generation, coupled with its ease of deployment, makes it a viable option for those seeking an alternative to OpenAI’s GPT models.&lt;/p&gt;

&lt;p&gt;However, the performance difference in our tests, with &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; outperforming &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt;, suggests that while newer models like Mistral 7B are closing the gap, there remains room for improvement and innovation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/Trelis/zephyr-7b-beta-function-calling-v2" rel="noopener noreferrer"&gt;Zephyr-7B-beta&lt;/a&gt;, a fine-tuned version of &lt;code&gt;mistralai/Mistral-7B-v0.1&lt;/code&gt; that was trained on a mix of publicly available and synthetic datasets, looks promising and could further reduce the gap.&lt;/p&gt;

&lt;p&gt;What about you? Which models do you use for your RAG pipelines? Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and tell us about your experience with AI models and what you think.&lt;/p&gt;

&lt;p&gt;Note: A special thanks to &lt;a href="https://twitter.com/_StanGirard" rel="noopener noreferrer"&gt;Stan Girard&lt;/a&gt; for inspiring the topic of this article. His suggestion and enthusiasm for AI have been invaluable in shaping this discussion.&lt;/p&gt;

&lt;h2&gt;
  
  
  BONUS: Deploy Mistral 7B Instruct on HuggingFace
&lt;/h2&gt;

&lt;p&gt;For those interested in deploying the model on HuggingFace:&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A HuggingFace account&lt;/li&gt;
&lt;li&gt;A HuggingFace access token&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;On &lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;, locate the “Deploy” drop-down and click on “Inference Endpoints”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---zlRrVtD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/FbMhMxp29AtMFeJHxjNGdOBJR-xi0NpJO-XsQ_FLjaCVojhqIuy5V5io6MT52PWp-hO9ZjrcoqPHukx5JaG3sqpG3VhLUqWiK2RX6X14Kd1EGStd9ekKHCdKb_Bd2SCgeUltLqbT5udE5lUaYtg-cfE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---zlRrVtD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/FbMhMxp29AtMFeJHxjNGdOBJR-xi0NpJO-XsQ_FLjaCVojhqIuy5V5io6MT52PWp-hO9ZjrcoqPHukx5JaG3sqpG3VhLUqWiK2RX6X14Kd1EGStd9ekKHCdKb_Bd2SCgeUltLqbT5udE5lUaYtg-cfE" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Select the default instance with 24GB of vRAM and an NVIDIA GPU, then click on “Create Endpoint” at the bottom of the page.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gRTwVfsY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/8OUlgWH7dqeMSj7oVF2qGINvgkHojnB_muaHRdtpbVeIo_eOXA7BqBE_tXRGYAHCQvvHO-0e-2LiGgvMDBvgt6YSvyU9zHAQEfoNkKsEm11YYIf4Py5E1rwtaNhTE4FoBrxMKszifkg8vamg3yK-SI0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gRTwVfsY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/8OUlgWH7dqeMSj7oVF2qGINvgkHojnB_muaHRdtpbVeIo_eOXA7BqBE_tXRGYAHCQvvHO-0e-2LiGgvMDBvgt6YSvyU9zHAQEfoNkKsEm11YYIf4Py5E1rwtaNhTE4FoBrxMKszifkg8vamg3yK-SI0" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Once your instance is deployed, you can test your endpoint on HuggingFace: &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---_EOwW4y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/XmWChg3tX5vahJWcc7URrTAikfY72WUAFdgJwIJDP_gpbflpTiAVrDLqEq-pm57Fo4YK4fx2p4jLxBZ7LtjRMa9tiLybImF2oY3yPhBFxjQpr2Im4YCjvbfdCMxIB-45WFhRqp4HmTi7vEYFNyhHggs" width="800" height="199"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Or in code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

API_URL = "https://asdfghjklrtyui.us-east-1.aws.endpoints.huggingface.cloud"

headers = {
    "Authorization": "Bearer XXXXXX",
    "Content-Type": "application/json"
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Who are you?",
})

# A: I'm Mistral, a language model trained by the Mistral AI team.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>openai</category>
      <category>embeddings</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
