<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Raouf Chebri</title>
    <description>The latest articles on Forem by Raouf Chebri (@raoufchebri).</description>
    <link>https://forem.com/raoufchebri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F853339%2F3fe742c9-df9d-4d6a-af4a-4e995a896424.JPG</url>
      <title>Forem: Raouf Chebri</title>
      <link>https://forem.com/raoufchebri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/raoufchebri"/>
    <language>en</language>
    <item>
      <title>Build with confidence with Schema Diff &amp; Protected Branches</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Tue, 16 Apr 2024 13:33:07 +0000</pubDate>
      <link>https://forem.com/neon-postgres/build-with-confidence-with-schema-diff-protected-branches-1p68</link>
      <guid>https://forem.com/neon-postgres/build-with-confidence-with-schema-diff-protected-branches-1p68</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3tlN2BAU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/04/image-30-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3tlN2BAU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/04/image-30-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Neon helps teams ship with confidence without compromising development velocity. One of the features that contributes to this is database branching. In this post, we will explore two new features related to database branching: schema diff and protected branches.&lt;/p&gt;

&lt;p&gt;Database branches in Neon are copy-on-write clones of your database that you can use for development, testing, or experimentation without compromising the original database. &lt;/p&gt;

&lt;p&gt;Ever since we introduced the database branching feature to Neon, developers have asked for ways to set permission rules on individual branches and identify differences between parent and child branches. And here they are. &lt;/p&gt;

&lt;p&gt;Today, we’ll explore two of our newest features, which will provide you with greater confidence. &lt;/p&gt;

&lt;h1&gt;
  
  
  Schema Diff
&lt;/h1&gt;

&lt;p&gt;Similar to diffs in Git, the Neon schema diff feature compares schemas between the current and past state of the branch. Schema diffs are important to development workflows as they allow you to easily track how your database schema has evolved for better debugging, code review, and cross-team collaboration. For example, you can compare schemas after your peer has merged their PR and applied migrations.&lt;/p&gt;

&lt;p&gt;Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know what you think and how you use schema diff in your workflows. &lt;/p&gt;

&lt;p&gt;We detailed how Neon storage and ephemeral branches work in the &lt;a href="https://neon.tech/blog/point-in-time-recovery-in-postgres#ephemeral-branches" rel="noopener noreferrer"&gt;Point In Time Recovery Under the Hood in Serverless Postgres&lt;/a&gt; article. In short, Neon’s storage engine saves Write-Ahead-Log records and can reconstruct a Postgres page at any given timestamp or Log-Sequence Number, allowing for time travel queries. &lt;/p&gt;

&lt;p&gt;Under the hood, schema diff creates short-lived, ephemeral branches (TTL=10 seconds) pinned to a specific time (and LSN), queries the Pageservers to retrieve the past schema, and then compares it with the current one to display the changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmS50z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/i8SiccLqfaZc3LGzo9LS2PDUtely5JtRlumiSemhfxWXjpqklpN3UOCZ9FsQ77KRgk1tDxnhQyTRJIhjOq-0cVAqpPoBIEEbfnwdsJdHmEjtofApOcqf1cHPas-2Xe5kn4lFL_yoN_q2koMeXNE_wr8" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZmS50z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/i8SiccLqfaZc3LGzo9LS2PDUtely5JtRlumiSemhfxWXjpqklpN3UOCZ9FsQ77KRgk1tDxnhQyTRJIhjOq-0cVAqpPoBIEEbfnwdsJdHmEjtofApOcqf1cHPas-2Xe5kn4lFL_yoN_q2koMeXNE_wr8" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s see an example of schema diff on the Console. For that, we will create a Neon project and run the following query to create a user table:&lt;/p&gt;


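&lt;p&gt;A minimal example of such a query (the table and column names here are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
&lt;/code&gt;&lt;/pre&gt;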

&lt;p&gt;Once done, we can compare the current state of the database, which now contains the user table, to its previous state, in which there were no tables. To do so, we first need to navigate to the Restore page, select a branch, and click on Schema Diff.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pDpMJraF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Mdk92rcHNgmAI6N12lQgq9pJJrP9Z65yIQ91x-UOsJMFeqoFIfSybmvrHVWEdKWXDCBt3PDPd0HVep818pSP3fMuaKqXh_DXfoN4e_wV62DBS_PI2wp3aKkqtkwZ1_t6vhW6W21Lmc9hvU3woNkj9Qw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pDpMJraF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Mdk92rcHNgmAI6N12lQgq9pJJrP9Z65yIQ91x-UOsJMFeqoFIfSybmvrHVWEdKWXDCBt3PDPd0HVep818pSP3fMuaKqXh_DXfoN4e_wV62DBS_PI2wp3aKkqtkwZ1_t6vhW6W21Lmc9hvU3woNkj9Qw" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is what the result will look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Oy1QsmCH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/mR7snpsTg0mn3d-X-DGrVzEYUGBy9ostInduEnvaO1TQv8F08R90KSomLPHY-hSCjhV0jNQGkp-pv7z9R2ciu73fShvxku7cWGkR70ZL8lMSnjY1IWLoULL5bBlMYJPzynA0JgAsw4uNnTpIRUQprM0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Oy1QsmCH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/mR7snpsTg0mn3d-X-DGrVzEYUGBy9ostInduEnvaO1TQv8F08R90KSomLPHY-hSCjhV0jNQGkp-pv7z9R2ciu73fShvxku7cWGkR70ZL8lMSnjY1IWLoULL5bBlMYJPzynA0JgAsw4uNnTpIRUQprM0" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, the previous state of my database is empty because I had just created the project. Let’s now modify my schema by adding a new &lt;code&gt;phone_number&lt;/code&gt; column:&lt;/p&gt;


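&lt;p&gt;For example, assuming a &lt;code&gt;users&lt;/code&gt; table was created earlier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ALTER TABLE users ADD COLUMN phone_number TEXT;
&lt;/code&gt;&lt;/pre&gt;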

&lt;p&gt;Let’s compare again with the state of my database at 10:52 PM, just after I ran the schema changes. The result should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l5IMB_yr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/kMZvzi2VLjUIQP-jVt8atJKrJCU9GQ8Rk3NVNcQq_WOIewfqNP0zzB16JGDQXV-t90Ok1nvmSoQPVMfk0b2J5mNMdZ2rE3muqBeR-L2tiKMetGX6Tl-tnfYr0pN6YGJXh7PzOVrXtgF4nF3Kw8PKMCc" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l5IMB_yr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/kMZvzi2VLjUIQP-jVt8atJKrJCU9GQ8Rk3NVNcQq_WOIewfqNP0zzB16JGDQXV-t90Ok1nvmSoQPVMfk0b2J5mNMdZ2rE3muqBeR-L2tiKMetGX6Tl-tnfYr0pN6YGJXh7PzOVrXtgF4nF3Kw8PKMCc" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can observe that lines 60 and 61 were modified as a result of the schema change.&lt;/p&gt;

&lt;h1&gt;
  
  
  Protected branches
&lt;/h1&gt;

&lt;p&gt;Protected branches prevent unauthorized applications, users, and roles from accessing personally identifiable information (PII) or other sensitive data within your branch. This feature is available for users who are on &lt;a href="https://neon.tech/pricing" rel="noopener noreferrer"&gt;the Scale plan&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first feature that’s available in this release is “IP Allow”, which restricts database access exclusively to trusted IP addresses. We plan on introducing more rules in the future.&lt;/p&gt;

&lt;p&gt;If there are other ways we can protect your database branches, let us know on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You have a limit of 5 protected branches in your project. To set your branch as protected, simply follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the branches page.&lt;/li&gt;
&lt;li&gt;Find the database branch you wish to protect.&lt;/li&gt;
&lt;li&gt;Click on “More” and then select “Set as protected.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RFZLRyqU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hka0TDGJnAVcr7AGqLVClhL_Wg4SL_tzK7HFRdqAingPNEyAK5ZvkjGeGGbGvJU8Roz53hlCOZw_3HMGPxXR5VS0mfsWj9ofLDOXQJJoKvyoImhI2CIuz7_9tO1_-TP4HpGZfyno5QdrV58DOcM-puQ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RFZLRyqU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hka0TDGJnAVcr7AGqLVClhL_Wg4SL_tzK7HFRdqAingPNEyAK5ZvkjGeGGbGvJU8Roz53hlCOZw_3HMGPxXR5VS0mfsWj9ofLDOXQJJoKvyoImhI2CIuz7_9tO1_-TP4HpGZfyno5QdrV58DOcM-puQ" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the documentation for &lt;a href="https://neon.tech/docs/manage/branches#protected-branch" rel="noopener noreferrer"&gt;more details on protected branches&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The addition of Schema Diff and Protected Branches to Neon allows developers to easily identify schema changes and safeguard sensitive data, equipping them to build with confidence.&lt;br&gt;&lt;br&gt;
&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;You can try Schema Diff and Protected Branches on Neon&lt;/a&gt; today. Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can help you build better and ship faster with Neon.&lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>IP Allow with IPv6</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 13 Mar 2024 10:03:44 +0000</pubDate>
      <link>https://forem.com/neon-postgres/ip-allow-with-ipv6-175d</link>
      <guid>https://forem.com/neon-postgres/ip-allow-with-ipv6-175d</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4Yh3A8iG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-5-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4Yh3A8iG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-5-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon. We’re building Postgres that helps you confidently ship reliable and scalable apps. You can&lt;/strong&gt; &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;try Neon now for free&lt;/a&gt;&lt;strong&gt;. We recently added support for IPv6 addresses in the IP Allow feature. This post explains what IPv6 is and its benefits.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  IPv4 limitations
&lt;/h2&gt;

&lt;p&gt;IPv4 has been around for almost half a century. It uses 32-bit addresses, which allows for about 4.3 billion unique addresses. While this number seemed more than sufficient in the early days of the internet, the explosive growth of the internet and the proliferation of smart devices have led to a situation where the world is running out of available IPv4 addresses. This limitation has prompted the need for a solution to accommodate the vast scale of the modern internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: IPv6
&lt;/h2&gt;

&lt;p&gt;IPv6 was developed to address the issue of IPv4 address exhaustion. It uses 128-bit addresses, which significantly expands the number of possible addresses to approximately 340 undecillion (3.4 × 10^38), a virtually inexhaustible supply for the foreseeable future. This vast expansion solves the primary problem of IPv4 address exhaustion, ensuring that every device on the internet can have a unique IP address.&lt;/p&gt;

&lt;p&gt;Here is an example of an IPv6 address: &lt;code&gt;2001:0db8:85a3:0000:0000:8a2e:0370:7334&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;IPv6 addresses consist of 128 bits, written as 8 groups of 4 hexadecimal digits separated by colons.&lt;/p&gt;
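
&lt;p&gt;Leading zeros within a group can be dropped, and one run of consecutive all-zero groups can be collapsed into &lt;code&gt;::&lt;/code&gt;, so the address above can also be written in a shorter, equivalent form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;2001:0db8:85a3:0000:0000:8a2e:0370:7334
2001:db8:85a3::8a2e:370:7334
&lt;/code&gt;&lt;/pre&gt;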

&lt;h2&gt;
  
  
  Why support IPv6 addresses in IP Allow
&lt;/h2&gt;

&lt;p&gt;IP Allow limits database access to only trusted IP addresses, preventing unauthorized access and helping maintain overall data security.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ijZ7Bcm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Yvj7G-G9tyb8n3-Va2yLYKAVId93wppHQXg0mi6pmL63ZmBtaddTtPOSmzG0XqAYruizKblkpYkDm4hQA6Bqr9g8cUye-CPpHnPv3ARCKDp6g4mATuIGlmEtY_ZBZxUN1uCKSdRVCIZuldvI-FNoA5U" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ijZ7Bcm_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/Yvj7G-G9tyb8n3-Va2yLYKAVId93wppHQXg0mi6pmL63ZmBtaddTtPOSmzG0XqAYruizKblkpYkDm4hQA6Bqr9g8cUye-CPpHnPv3ARCKDp6g4mATuIGlmEtY_ZBZxUN1uCKSdRVCIZuldvI-FNoA5U" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nowadays, most cloud providers allow you to design and deploy a global environment that leverages end-to-end IPv6 connectivity. AWS, for example, provides services with IPv6-only capabilities such as EC2 instances, Elastic Load Balancers, Amazon EKS, and others. &lt;/p&gt;



&lt;p&gt;Adding IPv6 support to Neon allows you to overcome IPv4 limitations and build highly scalable architectures while maintaining backward compatibility with your existing IPv4 workloads. This is particularly useful for large-scale and containerized applications, allowing you to focus on migrating and scaling applications without devoting effort towards overcoming IPv4 limits.&lt;/p&gt;

&lt;p&gt;Thanks for reading. We would love to get your feedback. Follow us on &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can help you build secure, reliable, and scalable applications.&lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>Better Postgres with Prisma Experience</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 07 Mar 2024 14:17:20 +0000</pubDate>
      <link>https://forem.com/neon-postgres/better-postgres-with-prisma-experience-1ki4</link>
      <guid>https://forem.com/neon-postgres/better-postgres-with-prisma-experience-1ki4</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4gGr4bvV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4gGr4bvV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/03/image-1024x576.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon. We’re building Postgres that helps you confidently ship reliable and scalable apps. We made Postgres on Neon work seamlessly with Prisma. This article explains how we did it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We love &lt;a href="https://www.prisma.io/docs/orm/overview/databases/postgresql" rel="noopener noreferrer"&gt;Prisma&lt;/a&gt;, and so do developers. &lt;a href="https://www.prisma.io/docs/orm/overview/databases/postgresql" rel="noopener noreferrer"&gt;Prisma ORM&lt;/a&gt; makes it easy to perform schema migrations and map &lt;em&gt;any&lt;/em&gt; database objects with your existing JavaScript and TypeScript applications, allowing you to integrate type-safe queries into your codebase.&lt;/p&gt;

&lt;p&gt;Today, we’re pleased to share significant improvements to the developer experience of Neon using Prisma by adding support for schema migrations via pooled connections, making it possible to use Neon’s default connection string both to scale your serverless apps and to run schema migrations. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;You can start using Prisma with Neon for free.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For reference, when we first introduced Neon, Prisma users needed a direct database URL, a pooled database URL, and a shadow database URL. Here is how your &lt;code&gt;schema.prisma&lt;/code&gt; file looked:&lt;/p&gt;


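&lt;p&gt;A &lt;code&gt;datasource&lt;/code&gt; block of that shape (the environment variable names here are illustrative) looked roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider          = "postgresql"
  url               = env("DATABASE_URL")          // pooled connection, with ?pgbouncer=true
  directUrl         = env("DIRECT_DATABASE_URL")   // direct connection for migrations
  shadowDatabaseUrl = env("SHADOW_DATABASE_URL")   // shadow database for prisma migrate dev
}
&lt;/code&gt;&lt;/pre&gt;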

&lt;p&gt;Now, all you need is one database URL. As a bonus, we also removed the need to specify the query parameter &lt;code&gt;pgbouncer=true&lt;/code&gt; when using pooled connections:&lt;/p&gt;


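&lt;p&gt;The simplified block (environment variable name illustrative) looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // pooled connection string, no pgbouncer=true needed
}
&lt;/code&gt;&lt;/pre&gt;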

&lt;p&gt;This article discusses each step of the process and the changes made to Neon, PgBouncer, and Prisma to make this possible, including:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Schema migration support with pooled connections&lt;/li&gt;
&lt;li&gt;Dropping a shadow database WITH (FORCE)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Schema migration support with pooled connections
&lt;/h2&gt;

&lt;p&gt;In enhancing the experience with Prisma, we &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0" rel="noopener noreferrer"&gt;added support for prepared statements&lt;/a&gt; and &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;&lt;code&gt;DISCARD ALL/DEALLOCATE ALL&lt;/code&gt; to PgBouncer&lt;/a&gt; to allow for schema migration using pooled connections. Let’s explore why.&lt;/p&gt;

&lt;p&gt;In Postgres, each connection is a backend process that requires memory allocation, which limits the number of concurrent connections. The solution to this problem is connection pooling with PgBouncer, which helps keep the number of active backend processes low.&lt;/p&gt;

&lt;p&gt;PgBouncer becomes increasingly important at scale when using serverless services such as &lt;a href="https://aws.amazon.com/pm/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; or &lt;a href="https://vercel.com/docs/functions" rel="noopener noreferrer"&gt;Vercel functions&lt;/a&gt;, since each function call establishes a new connection. We call database connections that go through PgBouncer pooled connections.&lt;/p&gt;

&lt;p&gt;Additionally, &lt;a href="https://www.prisma.io/docs/orm/prisma-migrate/getting-started" rel="noopener noreferrer"&gt;&lt;code&gt;prisma migrate&lt;/code&gt;&lt;/a&gt; uses prepared statements to optimize SQL query performance, and &lt;a href="https://github.com/prisma/prisma-engines/blob/4308b705cc0694626ff407996f3145ddef0ad1c6/quaint/src/connector/postgres/native/mod.rs#L507" rel="noopener noreferrer"&gt;&lt;code&gt;DEALLOCATE ALL&lt;/code&gt;&lt;/a&gt; to release all prepared statements in the current session &lt;a href="https://www.prisma.io/docs/orm/prisma-client/setup-and-configuration/databases-connections/pgbouncer#add-pgbouncertrue-to-the-connection-url" rel="noopener noreferrer"&gt;before preparing and executing Prisma Client queries&lt;/a&gt;. More on prepared statements in the &lt;a href="https://neon.tech/blog/pgbouncer-the-one-with-prepared-statements#what-are-prepared-statements" rel="noopener noreferrer"&gt;PgBouncer 1.22.0 support announcement article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before version 1.22.0, if you attempted to run &lt;code&gt;prisma migrate&lt;/code&gt; commands using a pooled connection, you might have seen the following error:&lt;/p&gt;


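&lt;p&gt;The error typically looked something like this (the exact wording varies by Prisma version):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ERROR: prepared statement "s0" already exists
&lt;/code&gt;&lt;/pre&gt;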

&lt;p&gt;To scale using pooled connections and be able to perform schema migrations, you had to specify both pooled and direct database URLs and set &lt;code&gt;pgbouncer&lt;/code&gt; mode as a query parameter in your schema file. &lt;/p&gt;

&lt;p&gt;Here is how the &lt;code&gt;datasource db&lt;/code&gt; block in your &lt;code&gt;schema.prisma&lt;/code&gt; file looked:&lt;/p&gt;


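&lt;p&gt;Roughly like this (environment variable names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider  = "postgresql"
  url       = env("DATABASE_URL")        // pooled, ends with ?pgbouncer=true
  directUrl = env("DIRECT_DATABASE_URL") // direct connection used for migrations
}
&lt;/code&gt;&lt;/pre&gt;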

&lt;p&gt;Here is how it looks now after adding support for prepared statements and &lt;code&gt;DISCARD ALL/DEALLOCATE ALL&lt;/code&gt; to PgBouncer:&lt;/p&gt;


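&lt;p&gt;Roughly like this (environment variable name is illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // a single pooled connection string
}
&lt;/code&gt;&lt;/pre&gt;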

&lt;p&gt;Note that you only need the pooled connection to run Prisma with Postgres. It’s no longer required to specify a direct connection to the database or the &lt;code&gt;pgbouncer=true&lt;/code&gt; query parameter. The pooled connection is used both to scale your queries and to run schema migrations.&lt;/p&gt;

&lt;p&gt;This allows Neon to confidently set the default URL to the pooled connection string on the Console and the &lt;a href="https://neon.tech/docs/changelog/2024-02-23-console#neon-vercel-integration-improvements" rel="noopener noreferrer"&gt;Vercel Integration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  DROP shadow database WITH (FORCE)
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;prisma migrate dev&lt;/code&gt;, Prisma Migrate uses a shadow database to detect schema drifts and generate new migrations. During that process, Prisma creates, introspects, and then drops a shadow database. &lt;a href="https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database" rel="noopener noreferrer"&gt;More on shadow databases on Prisma’s documentation.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, certain cloud providers do not allow dropping and creating databases via SQL, which forces developers to manually create shadow databases and specify them in the &lt;code&gt;schema.prisma&lt;/code&gt; file:&lt;/p&gt;


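&lt;p&gt;For example (environment variable names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider          = "postgresql"
  url               = env("DATABASE_URL")
  shadowDatabaseUrl = env("SHADOW_DATABASE_URL") // manually created shadow database
}
&lt;/code&gt;&lt;/pre&gt;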

&lt;p&gt;We have added &lt;a href="https://neon.tech/blog/prisma-dx-improvements#removing-the-need-for-manually-creating-the-shadow-database" rel="noopener noreferrer"&gt;support for managing roles and databases via SQL&lt;/a&gt; on Neon, which removed the need to manually create a shadow database. Additionally, &lt;a href="https://github.com/prisma/prisma/releases/tag/5.10.0" rel="noopener noreferrer"&gt;Prisma 5.10.0&lt;/a&gt; &lt;a href="https://github.com/prisma/prisma-engines/pull/4722" rel="noopener noreferrer"&gt;introduces support for &lt;code&gt;DROP WITH (FORCE)&lt;/code&gt;&lt;/a&gt; as an alternative drop-database path in the schema engine, which allows it to dispose of shadow databases.&lt;/p&gt;

&lt;p&gt;So, in your &lt;code&gt;schema.prisma&lt;/code&gt; file, you would have:&lt;/p&gt;


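&lt;p&gt;Illustratively, just the single URL:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // shadow database is created and dropped automatically
}
&lt;/code&gt;&lt;/pre&gt;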

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The improvements included in PgBouncer 1.22.0 have significantly streamlined the experience for developers using Postgres on Neon with Prisma, making it more efficient to scale serverless applications and run schema migrations.&lt;/p&gt;

&lt;p&gt;We would love to get your feedback. Follow us on &lt;a href="https://x.com/neondatabase" rel="noopener noreferrer"&gt;X&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know how we can help you build the next generation of web applications.&lt;/p&gt;

&lt;p&gt;Shout out to all contributors for making this possible, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/JelteF" rel="noopener noreferrer"&gt;Jelte Fennema&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/dashorst" rel="noopener noreferrer"&gt;Martijn Dashorst&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/knizhnik" rel="noopener noreferrer"&gt;Konstantin Knizhnik&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/sboschman" rel="noopener noreferrer"&gt;Sverre Boschman&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/fschmager" rel="noopener noreferrer"&gt;Frank Schmager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Medvecrab" rel="noopener noreferrer"&gt;Oleg Tselebrovskiy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/skyzh" rel="noopener noreferrer"&gt;Alex Chi Z&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
      <category>prisma</category>
    </item>
    <item>
      <title>Deploy Mistral Large to Azure and create a conversation with Python and LangChain</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Tue, 27 Feb 2024 01:59:48 +0000</pubDate>
      <link>https://forem.com/neon-postgres/deploy-mistral-large-to-azure-and-create-a-conversation-with-python-and-langchain-3dbo</link>
      <guid>https://forem.com/neon-postgres/deploy-mistral-large-to-azure-and-create-a-conversation-with-python-and-langchain-3dbo</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fneon-mistral-large-1024x576.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fneon-mistral-large-1024x576.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon, and we’re redefining the database experience with our cloud-native serverless Postgres solution. If you’ve been looking for a database for your RAG apps that adapts to your application loads, you’re in the right place.&lt;/strong&gt; &lt;a href="https://neon.tech" rel="noopener noreferrer"&gt;Learn more about Neon and give it a try&lt;/a&gt;, &lt;strong&gt;and let us know what you think. Neon is cloud-native Postgres and scales your AI apps to millions of users with pgvector. In this post, Raouf is going to tell you what you need to know about Mistral Large, the most advanced LLM by MistralAI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt;has recently unveiled its most advanced open-source large language model (LLM) yet, &lt;a href="https://mistral.ai/news/mistral-large/" rel="noopener noreferrer"&gt;Mistral Large&lt;/a&gt;, alongside its ChatGPT competitor, &lt;a href="https://chat.mistral.ai/chat" rel="noopener noreferrer"&gt;Le Chat (beta)&lt;/a&gt;. Le Chat includes other models such as Next, and Small, to let you explore Mistral AI’s capabilities. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8gIqi97fGwzEyoTi2KZT0VEu5V5f6K97FSHSZDxienTF5OjhFd6q1iYetEISe_PlEeOZbTrdgPFtY5Gfn39WGj7wGa1UUN3Mqcg4LQIyhuIwyLobLxM1Ny28P5Z3VWzJEEPSVFZP5WDkJYrKQF3Yo4A" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8gIqi97fGwzEyoTi2KZT0VEu5V5f6K97FSHSZDxienTF5OjhFd6q1iYetEISe_PlEeOZbTrdgPFtY5Gfn39WGj7wGa1UUN3Mqcg4LQIyhuIwyLobLxM1Ny28P5Z3VWzJEEPSVFZP5WDkJYrKQF3Yo4A"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those waiting to get their hands on Le Chat but stuck in the queue, this guide will show you how to deploy Mistral Large on Azure and start using it immediately with LangChain.&lt;/p&gt;

&lt;p&gt;Before we dive into the deployment process, let’s briefly explore Mistral Large.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistral Large
&lt;/h2&gt;

&lt;p&gt;Mistral Large is Mistral AI’s most advanced model, with unparalleled reasoning capabilities across multiple languages, including French, Spanish, German, and Italian. It has a generous 32k-token context window, making it interesting for Retrieval-Augmented Generation (RAG) applications. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FcxDxRUeLm6wWrMNszLuhRULvsEzEBhHt8wd2vHiLbtm-DIG918JKnDKrkVoTQX8JZ89sI-fVamTsgczVvu6xvA3SbYtPpAsRnq00Q5qwQPEMj2tVk3S4F67tHbRXMhFxBdb9Fu6at-fiOmKXiIzKdg4" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FcxDxRUeLm6wWrMNszLuhRULvsEzEBhHt8wd2vHiLbtm-DIG918JKnDKrkVoTQX8JZ89sI-fVamTsgczVvu6xvA3SbYtPpAsRnq00Q5qwQPEMj2tVk3S4F67tHbRXMhFxBdb9Fu6at-fiOmKXiIzKdg4"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Comparison measuring massive multitask language understanding&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And most importantly, Mistral Large is pretty good at coding and math. The model ranks the highest on the Mostly Basic Python Problems (MBPP) benchmark, which covers a wide range of difficulty levels and programming concepts and is designed to evaluate models on several fronts, including accuracy and efficiency.&lt;/p&gt;

&lt;p&gt;Mistral Large also ranks the highest on GSM8K, a benchmark of grade-school math word problems that measures the mathematical reasoning capabilities of AI models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8J_GrBLxGXn9Nc6F17IDeM67h7ExNr9aDoXTry0vW6sZZDu89Ik-wOYjDh2KkDL3r5EhEhvx5mrUZ7RIaYhIGBlNdP1kyukckWF8CtGqONU9EJH6Z_LWv40Kc-pDZK0-p9neoJVbGhV0J-rjQ2PU6Mw" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F8J_GrBLxGXn9Nc6F17IDeM67h7ExNr9aDoXTry0vW6sZZDu89Ik-wOYjDh2KkDL3r5EhEhvx5mrUZ7RIaYhIGBlNdP1kyukckWF8CtGqONU9EJH6Z_LWv40Kc-pDZK0-p9neoJVbGhV0J-rjQ2PU6Mw"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But don’t believe the benchmarks. Next, we’ll deploy the Mistral Large model to Azure and try it for ourselves.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploy your own Mistral Large model to Azure
&lt;/h2&gt;

&lt;p&gt;As part of the launch, &lt;a href="https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/mistral-large-mistral-ai-s-flagship-llm-debuts-on-azure-ai/ba-p/4066996" rel="noopener noreferrer"&gt;Mistral AI announced its partnership with Microsoft&lt;/a&gt;, making the Mistral Large model available on Azure. Below are the steps to deploy the model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Access Azure AI Studio&lt;/strong&gt;: Sign in to your Azure account and navigate to &lt;a href="https://aka.ms/aistudio/landing/mistral-large" rel="noopener noreferrer"&gt;AI Studio&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy Mistral Large&lt;/strong&gt;: Look for the “Deploy” option and select Mistral Large for deployment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FPfnrRJlLi2QkBbCQduHnQolVAVvIpboPWVS2BbvDI0LNp3urRqKVzZOP6aYaNuD7P3AQMcXgVzqjvvZ8OiEHbbnsvqcozZrt9sRC7C_mhQUpdryrxd9vfsS7xY_jsWXLIOoBS-AynTld7yvsmmwUD1Q" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FPfnrRJlLi2QkBbCQduHnQolVAVvIpboPWVS2BbvDI0LNp3urRqKVzZOP6aYaNuD7P3AQMcXgVzqjvvZ8OiEHbbnsvqcozZrt9sRC7C_mhQUpdryrxd9vfsS7xY_jsWXLIOoBS-AynTld7yvsmmwUD1Q"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Create a Project&lt;/strong&gt;: If you haven’t already, set up a new project, opting for the Pay-As-You-Go plan and choosing France Central as your region.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F0wSELK9qf2A2wRW-X2OL5vOr3xyqESk7w9QV30SvxORwk5nKoY3eKXMlV4H4xjUdX-WfqKaDwuFXtzoHNuGAeoX6g3USIQgQKB8n0mLvOExqmSDXaVxwPCwlB23AYB9Sw0jSrxEpXApkbrRSWAAOCdM" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2F0wSELK9qf2A2wRW-X2OL5vOr3xyqESk7w9QV30SvxORwk5nKoY3eKXMlV4H4xjUdX-WfqKaDwuFXtzoHNuGAeoX6g3USIQgQKB8n0mLvOExqmSDXaVxwPCwlB23AYB9Sw0jSrxEpXApkbrRSWAAOCdM"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Review and Create&lt;/strong&gt;: Double-check your resource information before finalizing your AI project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FFKZnA38G-1NsyZSrXPf9blXdJM2HXuiwp2jMkAKze9ikcn_XQGMd8hZKNiyV8gkR4iUWLLvZe4BUaruVHaOHidMvvrPV_UiVkWwLC3VzpGY8DiTuMZJvVqOYdYfeVNCWYq2Qy-lZ4W9PkjGstWAh8vo" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FFKZnA38G-1NsyZSrXPf9blXdJM2HXuiwp2jMkAKze9ikcn_XQGMd8hZKNiyV8gkR4iUWLLvZe4BUaruVHaOHidMvvrPV_UiVkWwLC3VzpGY8DiTuMZJvVqOYdYfeVNCWYq2Qy-lZ4W9PkjGstWAh8vo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Finalize Deployment&lt;/strong&gt;: After creating your AI project, proceed to deploy Mistral Large. Choose a deployment name; it will be displayed on your inference endpoint and serve as its identifier.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FguIBz98FTh8vG_v6taDCz7GcQLj792GnMV-F0waGEMM_u6v6cp-RKR0E3w7JE6nIxy6ticXYMg40bqnJUWjJjaLvV8jYMUQilPYRFy8RuSRK6EfKPMavf_pTBRYT1SXKGXBWtQRtXdyGJ6PPNroI0rs" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FguIBz98FTh8vG_v6taDCz7GcQLj792GnMV-F0waGEMM_u6v6cp-RKR0E3w7JE6nIxy6ticXYMg40bqnJUWjJjaLvV8jYMUQilPYRFy8RuSRK6EfKPMavf_pTBRYT1SXKGXBWtQRtXdyGJ6PPNroI0rs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations 🎉 You’ve successfully deployed Mistral Large on Azure!&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use Mistral Large with LangChain
&lt;/h2&gt;

&lt;p&gt;After deployment, you’ll receive an API endpoint and a security key for making inferences. We’ll use those further below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FpvRNJlPtU4hNkDD2wFZ21Jfzh_-M3URrLdMctrJphRsYZ73F_nYoj2iW7_lpZcAnl194JZx-Gjpu8mybCxJfHnnmle5__LtTszAYEBgKl5r1l-QL71-m72KYYIK0WO1e89Ou0EgjDwwDIygkjRh_LOY" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FpvRNJlPtU4hNkDD2wFZ21Jfzh_-M3URrLdMctrJphRsYZ73F_nYoj2iW7_lpZcAnl194JZx-Gjpu8mybCxJfHnnmle5__LtTszAYEBgKl5r1l-QL71-m72KYYIK0WO1e89Ou0EgjDwwDIygkjRh_LOY"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use Mistral Large with LangChain, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create the project directory:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;mistral-large-example
&lt;span class="nb"&gt;cd &lt;/span&gt;mistral-large-example&lt;span class="s2"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Create and activate a Python environment:&lt;/strong&gt; Run the following commands to create and activate a virtual environment.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv myenv
&lt;span class="nb"&gt;source &lt;/span&gt;myenv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Install packages and project dependencies:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain_mistralai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Create a LangChain conversation:&lt;/strong&gt; First, create a file:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here’s an example of how to create a LangChain conversation chain with Mistral Large:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMChain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MessagesPlaceholder&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.schema&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_mistralai.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatMistralAI&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration for prompting
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a chatbot engaging in a conversation with a human, often incorporating French cultural references.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;MessagesPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variable_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;HumanMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{human_input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Memory configuration
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationBufferMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configuring the Mistral model endpoint and API key
&lt;/span&gt;&lt;span class="n"&gt;chat_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatMistralAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&amp;amp;lt;endpoint&amp;amp;gt;.francecentral.inference.ai.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mistral_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;lt;api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Setting up the conversation chain
&lt;/span&gt;&lt;span class="n"&gt;chat_llm_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chat_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_llm_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;human_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there, my friend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy/paste the code above into the &lt;code&gt;main.py&lt;/code&gt; file and run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the output should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 main.py 

&amp;gt; Entering new LLMChain chain...

Prompt after formatting:

System: You are a chatbot engaging &lt;span class="k"&gt;in &lt;/span&gt;a conversation with a human, often incorporating French cultural references.

Human: Hi there, my friend

&amp;gt; Finished chain.

 Hello! It&lt;span class="s1"&gt;'s a pleasure to chat with you. As you'&lt;/span&gt;ve noticed, I enjoy incorporating French cultural references into our conversations. Did you know that the Eiffel Tower, one of France&lt;span class="s1"&gt;'s most iconic landmarks, was initially criticized by some of France'&lt;/span&gt;s leading artists and intellectuals &lt;span class="k"&gt;for &lt;/span&gt;its design when it was first built? How can I assist you today?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There has never been a better time to develop AI-powered applications. With rapid deployments to robust and scalable infrastructures such as Azure’s, developers can create applications that are more intelligent, interactive, and impactful.&lt;/p&gt;

&lt;p&gt;If you are building a RAG application, or simply need a Postgres database that scales, Neon with its autoscaling capabilities offers elastic vector search and fast index build with pgvector, making your AI apps fast and scalable to millions of users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Start building with Neon for free today&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know what you’re working on and how we can help you build better apps. &lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n"&gt;Mixtral 8x7B: What you need to know about Mistral AI’s latest model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6"&gt;Mistral 7B and BAAI on Workers AI vs. OpenAI Models for RAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da"&gt;pgvector: 30x Faster Index Build for your Vector Embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/raoufchebri/building-an-ai-powered-chatbot-using-vercel-openai-and-postgres-5cob-temp-slug-5539820"&gt;Building an AI-powered ChatBot using Vercel, OpenAI, and Postgres&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mistral</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Autoscaling in Action: Postgres Load Testing with pgbench</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Fri, 23 Feb 2024 09:25:39 +0000</pubDate>
      <link>https://forem.com/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</link>
      <guid>https://forem.com/neon-postgres/autoscaling-in-action-postgres-load-testing-with-pgbench-5e84</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyr5pb09yvp93ankmtyfn.png" alt="Blog post cover"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll show Neon autoscaling in action by running a load test using one of Postgres’ most popular benchmarking tools, &lt;code&gt;pgbench&lt;/code&gt;. The test simulates 30 clients running a heavy query. &lt;/p&gt;

&lt;p&gt;While 30 doesn’t sound like a lot, the query involves a mathematical function with high computational overhead, which signals to the autoscaler-agent that it needs to allocate more resources to the VM.&lt;/p&gt;

&lt;p&gt;We will not cover how autoscaling works, but for those interested in knowing the details, you can read more about &lt;a href="https://neon.tech/blog/scaling-serverless-postgres" rel="noopener noreferrer"&gt;how we implemented autoscaling in Neon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this load test, you will need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;A Neon account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://wiki.postgresql.org/wiki/Homebrew" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The load test
&lt;/h2&gt;

&lt;p&gt;Ensuring your production database can perform under varying loads is crucial. That’s why we added autoscaling to Neon, a feature that dynamically adjusts the resources allocated to a database in real time, based on its current workload. &lt;/p&gt;

&lt;p&gt;However, the effectiveness and efficiency of autoscaling are often taken for granted without thorough testing. To showcase autoscaling in action, we turn to Postgres and &lt;code&gt;pgbench&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a benchmarking tool included with Postgres, designed to evaluate the performance of a Postgres server. The tool simulates client load on the server and runs tests to measure how the server handles concurrent data requests. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is executed from the command line, and its usage can vary widely depending on the specific tests or benchmarks being run. Here is the command we will use in our test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 30 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;pgbench&lt;/code&gt; executes the query in &lt;code&gt;test.sql&lt;/code&gt;. The parameter &lt;code&gt;-c 30&lt;/code&gt; specifies 30 client connections, and &lt;code&gt;-T 120&lt;/code&gt; runs the test for 120 seconds against your database. &lt;code&gt;-P 1&lt;/code&gt; specifies that pgbench should report the progress of the test every 1 second. The progress report typically includes the number of transactions completed so far and the number of transactions per second.&lt;/p&gt;
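&lt;p&gt;Since &lt;code&gt;-P 1&lt;/code&gt; prints one progress line per second, you can post-process those lines, for example to plot latency over time. A small sketch, assuming the typical line shape (newer &lt;code&gt;pgbench&lt;/code&gt; versions may append extra fields such as failure counts):&lt;/p&gt;

```python
import re

# A progress line as pgbench typically prints it with -P 1 (shape assumed;
# newer versions may append extra fields such as failure counts)
line = "progress: 1.0 s, 5.0 tps, lat 6000.891 ms stddev 2768.066"

match = re.search(r"([0-9.]+) tps, lat ([0-9.]+) ms stddev ([0-9.]+)", line)
tps, lat_ms, stddev_ms = (float(g) for g in match.groups())
print(tps, lat_ms, stddev_ms)
```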

&lt;p&gt;Thirty clients may not seem like enough to stress a database, but it depends on the query you’re executing, as we’ll see next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query execution plan
&lt;/h2&gt;

&lt;p&gt;Here is the query we’ll use for our load test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mathematically, this query essentially compares the growth rates of the factorials of 32,000 and 20,000 by examining the ratio of their logarithms. &lt;/p&gt;

&lt;p&gt;Remember factorials? The factorial of a number n (denoted as n!) is the product of all positive integers less than or equal to n. For example, the factorial of 5 (5!) is 5 * 4 * 3 * 2 * 1 = 120. Factorials grow very rapidly with increasing numbers. &lt;/p&gt;

&lt;p&gt;To give you a sense of scale, the factorial of just 20 is already a 19-digit number: 20! = 2,432,902,008,176,640,000.&lt;/p&gt;

&lt;p&gt;The logarithm, on the other hand, is the power to which a base must be raised to obtain the value x. (In Postgres, &lt;code&gt;log&lt;/code&gt; is the base-10 logarithm and &lt;code&gt;ln&lt;/code&gt; is the natural one; since the query divides one logarithm by another, the base cancels out.)&lt;/p&gt;
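&lt;p&gt;Because a ratio of two logarithms is the same in any base, we can sanity-check the value the query should return with Python’s &lt;code&gt;math.lgamma&lt;/code&gt;, where &lt;code&gt;lgamma(n + 1)&lt;/code&gt; equals the natural log of n factorial:&lt;/p&gt;

```python
import math

def log_factorial(n):
    # lgamma(n + 1) is ln(n!), computed without the enormous intermediate value
    return math.lgamma(n + 1)

ratio = log_factorial(32000) / log_factorial(20000)
print(round(ratio, 3))
```

&lt;p&gt;The ratio comes out to roughly 1.68; the interesting part of the Postgres query is not the answer but the cost of computing the factorials exactly.&lt;/p&gt;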

&lt;p&gt;In other words, this operation should take a long time to process. How long? Let’s examine the query execution plan using EXPLAIN ANALYZE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;factorial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;QUERY PLAN                                      

&lt;span class="nt"&gt;-------------------------------------------------------------------------------------&lt;/span&gt;

 Result  &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.00..0.01 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;actual &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.000..0.001 &lt;span class="nv"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;

 Planning Time: 1921.630 ms

 Execution Time: 0.005 ms

&lt;span class="o"&gt;(&lt;/span&gt;3 rows&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query was executed on ¼ vCPU. &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; includes the planner’s estimates and real execution metrics. Execution Time appears to be quite fast. However, Planning Time (the time taken by the Postgres query planner to generate the execution plan) takes almost 2 seconds and suggests that preparing to run this mathematical function involves significant computational overhead.&lt;/p&gt;

&lt;p&gt;Combine 30 of those, and we should stress Postgres enough to trigger autoscaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling autoscaling
&lt;/h2&gt;

&lt;p&gt;Autoscaling is the process of automatically increasing or decreasing the CPU and memory allocated to a database based on its current load. It dynamically adjusts the compute resources allocated to a Neon compute instance in response to the current load, eliminating the need for manual intervention. &lt;a href="https://neon.tech/docs/introduction/autoscaling" rel="noopener noreferrer"&gt;Learn more about autoscaling in the docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can enable autoscaling by defining the minimum and maximum compute units (CU) you’d like to allocate to your Postgres instance. This way, you remain in control of your resource consumption. For example, 1 CU allocates 1 vCPU and 4 GB of RAM to your instance.&lt;/p&gt;
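&lt;p&gt;The CU-to-resource arithmetic is simple enough to sketch, using the 1 CU = 1 vCPU / 4 GB ratio mentioned above:&lt;/p&gt;

```python
def cu_to_resources(cu):
    # 1 CU corresponds to 1 vCPU and 4 GB of RAM
    return {"vcpu": cu, "ram_gb": 4 * cu}

# The range used in this load test: 0.25 CU up to 7 CU
print(cu_to_resources(0.25))
print(cu_to_resources(7))
```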

&lt;p&gt;You can set your instance size when you create a new project or by navigating to the Branches page on your Neon Console, clicking on the database branch, and setting the CU range.&lt;/p&gt;



&lt;p&gt;I will set the range for this load test from ¼ to 7 CUs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Executing &amp;amp; monitoring the load test
&lt;/h2&gt;

&lt;p&gt;Let’s run our load test now and observe its effect on our Postgres instance. We recently added graphs to monitor the resources allocated to your Postgres instance and its usage, which will come in handy later. After enabling autoscaling, follow these steps to execute the load test:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create your project folder and test.sql file:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;cd &lt;/span&gt;pgbench-load-test
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"SELECT log(factorial(32000)) / log(factorial(20000));"&lt;/span&gt; &amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt; test.sql&lt;span class="s1"&gt;'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Execute the load test by running the following command:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql &lt;span class="nt"&gt;-c&lt;/span&gt; 8 &lt;span class="nt"&gt;-T&lt;/span&gt; 120 &lt;span class="nt"&gt;-P&lt;/span&gt; 1 &amp;amp;lt&lt;span class="p"&gt;;&lt;/span&gt;YOUR_CONNECTION_STRING&amp;amp;gt&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can create a &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Neon project&lt;/a&gt; if you don’t have a connection string yet.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Navigate to the autoscaling graph to monitor usage:&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;You should observe a rapid change in CPU and memory allocated. The result should look similar to the graph below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2FScreenshot-2024-02-23-at-10.04.45-982x1024.png" alt="Autoscaling graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The performance summary returned by &lt;code&gt;pgbench&lt;/code&gt; should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;latency average &lt;span class="o"&gt;=&lt;/span&gt; 6000.891 ms
latency stddev &lt;span class="o"&gt;=&lt;/span&gt; 2768.066 ms
initial connection &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 3712.770 ms
tps &lt;span class="o"&gt;=&lt;/span&gt; 4.978907 &lt;span class="o"&gt;(&lt;/span&gt;without initial connection &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On average, each operation took slightly over 6 seconds to complete. A standard deviation of 2768.066 ms means that the latencies of individual operations varied quite a bit around the average latency. A higher standard deviation indicates more variability in how long each operation took to complete.&lt;/p&gt;

&lt;p&gt;Establishing the initial connection took approximately 3.7 seconds before any operations could be performed. A TPS of around 4.98 means that, on average, the database was able to complete nearly five transactions every second during the test, after excluding the initial connection time.&lt;/p&gt;
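&lt;p&gt;As a refresher on what those numbers mean, here is how mean latency and standard deviation are computed from individual samples. The latencies below are illustrative, not the actual pgbench data:&lt;/p&gt;

```python
import statistics

# Illustrative per-transaction latencies in ms, not the actual pgbench samples
latencies_ms = [3100, 4200, 5900, 6100, 8400, 9900]

mean = statistics.mean(latencies_ms)
stddev = statistics.stdev(latencies_ms)
print(round(mean, 1), round(stddev, 1))
```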

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;pgbench&lt;/code&gt; is a simple yet powerful tool to test your database and simulate multiple clients running heavy SQL queries. We also saw how to examine the query execution plan with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, which provides insights to optimize your SQL queries.&lt;/p&gt;

&lt;p&gt;If you’re running an application that can be subject to varying workloads, autoscaling offers you the confidence that your database will always perform under the stress of real-world demands.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you are curious about autoscaling, &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;give Neon a try&lt;/a&gt; and join our &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We look forward to seeing you there and hearing your feedback.&lt;/p&gt;

&lt;p&gt;Happy scaling!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>test</category>
      <category>scale</category>
    </item>
    <item>
      <title>Point In Time Recovery Under the Hood in Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 22 Feb 2024 12:44:01 +0000</pubDate>
      <link>https://forem.com/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</link>
      <guid>https://forem.com/neon-postgres/point-in-time-recovery-under-the-hood-in-serverless-postgres-2dhn</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-28-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine working on a crucial project when suddenly, due to an unexpected event, you lose significant chunks of your database. Whether it’s a human error, a malicious attack, or a software bug, data loss is a nightmare scenario. But fear not! We recently added support for &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;Point-In-Time Restore (PITR)&lt;/a&gt; to Neon, so you can turn back the clock to a happier moment before things went south.&lt;/p&gt;

&lt;p&gt;In the video below and in the &lt;a href="https://dev.to/evanatneon/announcing-point-in-time-restore-864-temp-slug-3100656"&gt;PITR announcement article&lt;/a&gt;, my friend Evan shows how you can recover your data in a few clicks. He also uses Time Travel Assist to observe the state of the database at a given timestamp to confidently and safely run the restore process.&lt;/p&gt;



&lt;p&gt;How is this possible? This article is for those interested in understanding how PITR works under the hood in Neon. To better explain this, we will: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cover the basics of PITR in Postgres &lt;/li&gt;
&lt;li&gt;Explore the underlying infrastructure that allows for PITR in Neon. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end of this post, you’ll be prepared for when disaster strikes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the basics of Point-In-Time Recovery in Postgres
&lt;/h2&gt;

&lt;p&gt;PITR in Postgres is made possible using two key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write-Ahead Logging&lt;/strong&gt;: Postgres uses &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead Logging&lt;/a&gt; (WAL) to record all changes made to the database. Think of WAL as the database’s diary, keeping track of every detail of its day-to-day activities. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base backups&lt;/strong&gt;: Base backups are snapshots of your database at a particular moment in time. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these two elements combined, you can define a strategy to restore your database to any point after the base backup was taken, effectively traveling through your database’s timeline. However, you’d first need to do some groundwork, which consists of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setting up WAL archiving:&lt;/strong&gt; By defining an &lt;code&gt;archive_command&lt;/code&gt; and setting &lt;code&gt;archive_mode&lt;/code&gt; to &lt;code&gt;on&lt;/code&gt;  in your &lt;code&gt;postgresql.conf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating base backups:&lt;/strong&gt; You can use the &lt;code&gt;pg_basebackup&lt;/code&gt; utility to create daily backups.&lt;/li&gt;
&lt;/ol&gt;
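For reference, the groundwork above amounts to a couple of configuration settings plus a scheduled backup command. A minimal sketch (the archive destination and backup directory are placeholders, not recommendations; the `archive_command` shown is the example from the Postgres documentation):

```ini
# postgresql.conf -- minimal WAL-archiving setup (illustrative paths)
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'

# Then schedule base backups, e.g. a daily cron job running:
#   pg_basebackup -D /mnt/backup/base -Ft -z
```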

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FDR4PajEMGMxzyTFgdbCNEUmieSgTLWZsjfaN94aUmc5mdNV1Fa3ZAkr56df29EdFfG-U5kC_8Zg7MDSqP6aJCHf0ZhpjFEfKdKhCXtHlGAUudLiCF4iuXViEXZCZJx7y3pYlo8p5cwvRTiduMn45Xuc"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If, for any reason, you need to restore your database, you restore the latest backup and replay the WAL on top of it. The same logic applies to restoring from any point in time within the retention period. &lt;/p&gt;

&lt;p&gt;Let’s say we want to restore the database to its state on February 1st at 14:30. We first locate the last backup file created before that target time, restore it, and then replay the WAL up to that time. &lt;/p&gt;
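To make the procedure concrete, here is a small, self-contained sketch (plain Python with made-up data, not real Postgres internals) of the "restore the last backup before the target, then replay WAL up to the target" logic:

```python
from datetime import datetime

def restore_to(target, backups, wal):
    """Pick the newest base backup taken before `target`, then replay
    WAL records up to `target` on top of it.

    backups: list of (timestamp, snapshot-dict), oldest first
    wal:     list of (timestamp, key, value) change records, oldest first
    """
    # 1. Locate the last base backup created before the target time.
    base_time, snapshot = max(
        (b for b in backups if b[0] <= target), key=lambda b: b[0]
    )
    state = dict(snapshot)
    # 2. Replay WAL records written between the backup and the target time.
    for ts, key, value in wal:
        if base_time < ts <= target:
            state[key] = value
    return state

backups = [
    (datetime(2024, 2, 1, 0, 0), {"balance": 100}),
]
wal = [
    (datetime(2024, 2, 1, 10, 0), "balance", 150),
    (datetime(2024, 2, 1, 15, 0), "balance", 0),  # the mistake we want to undo
]

# Restore to February 1st at 14:30 -- just before the bad write.
print(restore_to(datetime(2024, 2, 1, 14, 30), backups, wal))  # {'balance': 150}
```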

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FvTul66-QTVPuOMRscFhgCSpHVLZBUxNENuxuIVl0c9Vd8nvuoeFQiqOqW-TpMQ0-ZcmTffmzs4OF8TwE1on5qVQAhYPSPYK7ub9oKPZIkTPlghMzVQu9U8jQCcjQHGqsik8J9_PcYOBPVH1B2bQansA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great! We now know how to perform a PITR in Postgres. However, there are a few limitations to this approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You might notice a drop in performance while backups are running.&lt;/li&gt;
&lt;li&gt;Because you have a finite storage capacity, you must define a limit to your archived WAL. This limit is known as the retention period (a.k.a history retention), which determines how far back in time your data can be restored.&lt;/li&gt;
&lt;li&gt;You have a single point of failure (SPOF) since all base backups and WAL archives are in the same location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can enhance our architecture by adopting disaster recovery tools like &lt;a href="https://pgbarman.org/" rel="noopener noreferrer"&gt;Barman&lt;/a&gt; to avoid SPOF and downtime. With Barman, Postgres streams base backups and WAL archives to an external backup server. Or, if you know what you’re doing, you can configure Postgres to stream base backups and WAL archives to an AWS S3 bucket, and add a standby, which serves as an exact copy of your database, to avoid downtime. Your setup would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2Fhj6272QvVTzQ0MIduTt6MpFoCY7fMJSdDJjo9jE0yzRzokzKaZ4B5A1HymLIIP6g8FbblXxsR5ks73VPWI0yTvHQFCQ8JSiaYIV5YnhFmf4ORS6bwEXS_SCLtMnoHsSZ1mJltkpk13xKRLpnFyZ06nQ"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To sum it up and to perform a PITR in Postgres without downtime, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have a backup server&lt;/li&gt;
&lt;li&gt;Set up WAL archiving and stream it to the backup&lt;/li&gt;
&lt;li&gt;Schedule daily backups&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additionally, you need to install several packages and then configure and maintain this infrastructure yourself, time that could be spent on your application instead. That convenience, simplicity, and confidence in your data is what Neon offers.&lt;/p&gt;

&lt;p&gt;So, how do we make it look so easy? Let’s step back and explain how Neon’s storage engine works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Neon’s architecture
&lt;/h2&gt;

&lt;p&gt;Neon’s philosophy is that the “database is its logs”. In our case: “Postgres is its WAL records”.&lt;/p&gt;

&lt;p&gt;Neon configures Postgres to stream the WAL to a custom Rust-based storage engine. Neon’s storage engine is composed of three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A persistence layer called “&lt;a href="https://github.com/neondatabase/neon/blob/main/docs/rfcs/014-safekeepers-gossip.md" rel="noopener noreferrer"&gt;Safekeepers&lt;/a&gt;” makes sure the written data is never lost, &lt;a href="https://neon.tech/blog/paxos" rel="noopener noreferrer"&gt;using Paxos as a consensus algorithm&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A storage layer called “Pageservers”: multi-tenant storage that can reconstruct the data from WAL and send it to Postgres.&lt;/li&gt;
&lt;li&gt;A second persistence layer to durably store the WAL in AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since all the data is stored in Neon’s storage engine, Postgres doesn’t need to persist data on the local disk. This turns Postgres into a stateless compute instance that can start in under 500ms, making Neon serverless. &lt;/p&gt;

&lt;p&gt;As a result, we no longer require: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A standby: because, in the case of a Postgres crash, we can quickly spin up another instance.&lt;/li&gt;
&lt;li&gt;Backups: Neon’s storage engine stores the WAL and performs &lt;a href="https://en.wikipedia.org/wiki/Compaction" rel="noopener noreferrer"&gt;compactions&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The data flow would look like the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FziTbF_Fwf1anMRcVZwo4f7DbNWmhSFVvJOXqL7x-B2lTZ-zeq6m7eVxwGXFMTg4_8kd8-fociJ-ka4QCKntbS3jj5L7F7HAJ2TXuCCHbixTFo6m0ukn_keRa1ZsLRD0Ryn9vx0Y2xg45-OIQBK7XysI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://neon.tech/blog/architecture-decisions-in-neon" rel="noopener noreferrer"&gt;&lt;em&gt;Architecture decisions in Neon&lt;/em&gt; article by Heikki Linnakangas&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;To understand the magic behind PITR in Neon, we’ll explore how the Pageservers work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pageservers: under the hood
&lt;/h2&gt;

&lt;p&gt;Each transaction in the WAL is associated with a Log Sequence Number (LSN), marking the byte position in the WAL stream where the record of that transaction starts. If we follow our initial analogy of WAL being a detailed diary of everything in the database, then the LSN is the page number in that diary.&lt;/p&gt;
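An LSN is conventionally printed as two hexadecimal numbers separated by a slash (for example `0/16B3748`), which together encode a 64-bit byte position in the WAL stream. A quick sketch of that mapping:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN string like '0/16B3748' to its 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def bytes_to_lsn(pos: int) -> str:
    """Inverse: format a byte position back into LSN notation."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"

print(lsn_to_bytes("0/16B3748"))  # 23803720
print(bytes_to_lsn(23803720))     # 0/16B3748
```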

&lt;p&gt;The Pageserver can be represented by a 2-dimensional graph, where the Y-axis is the &lt;code&gt;LSN&lt;/code&gt;, and the X-axis is the &lt;code&gt;key&lt;/code&gt; that points to the database, relation, and block number. A key, for example, can point to certain rows in your database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FfRlrDbEpnnuLSCTH2XwuiuhsU74euugyHI-ebB7EPrvwR0FbuEDSgkG9HvkzeDZwPyIrF_dQTz2hWIXHEl0NgKILbydD5QPMlJz5sKuFuLDneJKsOWrtyx4oRVJk8AJL58zdY5yLxdAJildhuEOMuAI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When data is written in Neon, the role of Pageservers is to accumulate WAL records. Then, when these records reach approximately 1GB in size, Pageservers create two types of immutable layer files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image layers (bars)&lt;/strong&gt;: contain a &lt;strong&gt;&lt;em&gt;snapshot&lt;/em&gt;&lt;/strong&gt; of a key range for a specific LSN. You can see Image Layers as the state of rows in certain tables or indexes at a given time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta layers (rectangles)&lt;/strong&gt;: contain the &lt;strong&gt;&lt;em&gt;incremental changes&lt;/em&gt;&lt;/strong&gt; within a key range. You can see Delta layers as a log of all the changes that happened to your rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Does this sound familiar?&lt;/p&gt;

&lt;p&gt;Indeed, it employs the same principle as the traditional Postgres setups for PITR we’ve previously discussed, which combine base backups and WAL archiving. The main difference here is that you don’t need to initiate a lengthy and complex restore procedure every time you wish to read data from a previous state of the database. This is because Pageservers inherently know how to reconstruct the state of a page at any given LSN on any timeline.&lt;/p&gt;
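As a toy model (plain Python, assuming nothing about Neon's actual layer file format), reconstructing a page at a given LSN means starting from the latest image layer at or below that LSN and applying the delta layers recorded since:

```python
def get_page_at_lsn(key, lsn, image_layers, delta_layers):
    """Toy GetPage@LSN: reconstruct the value of `key` as of `lsn`.

    image_layers: {lsn: {key: value}} snapshots of key ranges
    delta_layers: list of (lsn, key, value) incremental changes
    """
    # Start from the most recent snapshot at or below the requested LSN.
    base_lsn = max(l for l in image_layers if l <= lsn)
    value = image_layers[base_lsn].get(key)
    # Apply deltas recorded between that snapshot and the requested LSN.
    for delta_lsn, delta_key, delta_value in sorted(delta_layers):
        if base_lsn < delta_lsn <= lsn and delta_key == key:
            value = delta_value
    return value

images = {100: {"page_1": "v1"}}
deltas = [(150, "page_1", "v2"), (200, "page_1", "v3")]

print(get_page_at_lsn("page_1", 180, images, deltas))  # v2 -- the state before LSN 200
```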

&lt;h2&gt;
  
  
  Ephemeral branches
&lt;/h2&gt;

&lt;p&gt;We mentioned previously that, in Postgres, each WAL record is associated with an LSN. In Neon, Postgres tracks the last evicted LSN in the buffer cache, so it knows from which point in time it should fetch the data. &lt;/p&gt;

&lt;p&gt;When Postgres requests a page from the Pageserver, it triggers the &lt;a href="https://github.com/neondatabase/neon/blob/main/pageserver/pagebench/src/cmd/getpage_latest_lsn.rs" rel="noopener noreferrer"&gt;GetPage@LSN&lt;/a&gt; function, which returns the state of a given key at that specific LSN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2F87-1024x456.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Read the &lt;a href="https://neon.tech/blog/get-page-at-lsn" rel="noopener noreferrer"&gt;Deep dive in Neon’s storage engine&lt;/a&gt; article to learn more about Neon’s architecture.&lt;/p&gt;

&lt;p&gt;In practice, you can access different timelines through database branches. These branches are copy-on-write clones of your database, representing the state of your data at any point in its history. When you create a branch, you specify the LSN (or a timestamp), and Neon’s control plane generates a timeline associated with your project, keeping track of it.&lt;/p&gt;
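For instance, a branch-creation call against the Neon API can be constructed roughly as follows. The endpoint path and field names here are assumptions based on Neon's public API docs (verify them before use), the project ID and LSN are placeholders, and the sketch only builds the request body rather than sending it:

```python
import json

# Sketch: construct (but do not send) a Neon API request that creates a
# branch at a given LSN. Endpoint shape and field names are assumptions.
project_id = "my-project-123"  # placeholder
url = f"https://console.neon.tech/api/v2/projects/{project_id}/branches"
payload = {
    "branch": {
        "name": "restore-before-incident",
        "parent_lsn": "0/16B3748",  # or "parent_timestamp": an ISO-8601 time
    }
}
body = json.dumps(payload)
print(url)
print(body)
```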

&lt;p&gt;We’ve enhanced the Point In Time Recovery (PITR) feature in Neon with Time Travel Assist. This functionality allows you to perform Time Travel queries to review the state of your database at a specific timestamp or LSN, following the same underlying steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a timeline, and&lt;/li&gt;
&lt;li&gt;Running &lt;code&gt;GetPage@LSN&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these branches are ephemeral, having a Time To Live (TTL) of 10 seconds. We refer to these as ephemeral branches, and they will soon become a crucial part of your development workflows.&lt;/p&gt;

&lt;p&gt;Ephemeral branches enable you to connect to a previous state of your database by merely specifying the LSN or timestamp in your connection string. This capability is natively supported by Pageservers, and Neon’s PITR feature is the first step towards making ephemeral connections available to developers. Stay tuned for more development in this area.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While Postgres offers powerful PITR options, with tools like Barman to help with disaster recovery, Neon’s approach makes PITR reliable, accessible, efficient, and integrated into a seamless database management experience. &lt;/p&gt;

&lt;p&gt;By first exploring how to do PITR in Postgres, we’ve learned about the importance of continuous archiving and creating base backups. &lt;/p&gt;

&lt;p&gt;Neon’s storage engine saves WAL records and snapshots of your database and can natively reconstruct data for any point in time in your history. This capability allows for the Time Travel Assist to query your database at a given timestamp before you proceed to its restoration using short-lived or ephemeral branches.&lt;/p&gt;

&lt;p&gt;Ephemeral branches introduce a unique way to interact with your data’s history: developers can access different timelines and run Time Travel queries to review prior states and understand their data’s lifecycle.&lt;/p&gt;

&lt;p&gt;What about you? How often do you use PITR in your projects? Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and let us know how we can enhance your Postgres experience in the cloud.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a href="https://twitter.com/skeptrune" rel="noopener noreferrer"&gt;skeptrune&lt;/a&gt; for reviewing and suggesting adding a mention to Barman.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>recovery</category>
      <category>disaster</category>
    </item>
    <item>
      <title>PgBouncer: The one with prepared statements</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 15 Feb 2024 09:43:20 +0000</pubDate>
      <link>https://forem.com/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</link>
      <guid>https://forem.com/neon-postgres/pgbouncer-the-one-with-prepared-statements-198i</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fneondatabase.wpengine.com%2Fwp-content%2Fuploads%2F2024%2F02%2Fimage-26-1024x576.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The latest release of &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer 1.22.0&lt;/a&gt; increases query throughput by 15% to 250% and includes support for &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; and &lt;code&gt;DISCARD ALL&lt;/code&gt;, as well as protocol-level prepared statements released in &lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0" rel="noopener noreferrer"&gt;1.21.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore what prepared statements are and how to use PgBouncer to optimize your queries in Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Prepared Statements?
&lt;/h2&gt;

&lt;p&gt;In Postgres, a prepared statement is a feature that allows you to create and optimize an SQL query once and then execute it multiple times with different parameters. It’s a template where you define the structure of your query and later fill in the specific values you want to use.&lt;/p&gt;

&lt;p&gt;Here’s an example of creating a prepared statement with &lt;code&gt;PREPARE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PREPARE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;user_fetch_plan&lt;/code&gt; is the name of the prepared statement, and &lt;code&gt;$1&lt;/code&gt; is a placeholder for the parameter. &lt;/p&gt;

&lt;p&gt;Here is how to execute the prepared statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;user_fetch_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'alice'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="nv"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query will fetch all columns from the &lt;code&gt;users&lt;/code&gt; table where the &lt;code&gt;username&lt;/code&gt; is &lt;code&gt;alice&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Prepared Statements?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt; : Since the SQL statement is parsed and the execution plan is created only once, subsequent executions can be faster. However, this benefit might be more noticeable in databases with heavy and repeated traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt; : Prepared statements are a great way to avoid SQL injection attacks. Since data values are sent separately from the query, they aren’t executed as SQL, making injecting malicious SQL code difficult.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What is PgBouncer?
&lt;/h2&gt;

&lt;p&gt;Before diving into what PgBouncer is, let’s take a step back and briefly touch on how Postgres operates. &lt;/p&gt;

&lt;p&gt;Postgres runs on a system of several interlinked processes, with the &lt;code&gt;postmaster&lt;/code&gt; taking the lead. This initial process kicks things off, supervises other processes, and listens for new connections. The &lt;code&gt;postmaster&lt;/code&gt; also allocates a shared memory for these processes to interact.&lt;/p&gt;

&lt;p&gt;Whenever a client wants to establish a new connection, the &lt;code&gt;postmaster&lt;/code&gt; creates a new backend process for that client. This new connection starts a session with the backend, which stays active until the client decides to leave or the connection drops.&lt;/p&gt;

&lt;p&gt;Here’s where it gets tricky: Many applications, such as serverless backends, open numerous connections, and most eventually become inactive. Postgres needs to create a unique backend process for each client connection. When many clients try to connect, more memory is needed. In Neon, for example, the default maximum number of &lt;a href="https://neon.tech/docs/connect/connection-pooling#default-connection-limits" rel="noopener noreferrer"&gt;concurrent direct connections is set to 100&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The solution to this problem is connection pooling with PgBouncer, which helps keep the number of active backend processes low.&lt;/p&gt;
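The idea can be sketched in a few lines (a toy pool, not PgBouncer itself): many short-lived clients share a small, fixed set of server connections instead of each opening their own backend process:

```python
class ToyPool:
    """Minimal connection pool: at most `max_size` server connections are
    ever opened, no matter how many clients check one out and back in."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.idle = []    # connections waiting to be reused
        self.opened = 0   # total server connections ever created

    def acquire(self):
        if self.idle:
            return self.idle.pop()       # reuse an existing connection
        if self.opened >= self.max_size:
            raise RuntimeError("pool exhausted -- client must wait")
        self.opened += 1
        return f"conn-{self.opened}"     # stand-in for a real backend process

    def release(self, conn):
        self.idle.append(conn)

pool = ToyPool(max_size=2)
a, b = pool.acquire(), pool.acquire()    # two clients active at once
pool.release(a)
pool.release(b)
for _ in range(100):                     # 100 more sequential clients
    conn = pool.acquire()
    pool.release(conn)
print(pool.opened)  # 2 -- connections are reused, not re-created
```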

&lt;p&gt;PgBouncer is a lightweight connection pooler whose primary function is to manage and maintain a pool of database connections, overcoming Postgres’ connection limitations. Neon projects come with both direct and pooled connections by default. The latter uses PgBouncer and currently offers up to 10,000 connections.&lt;/p&gt;

&lt;p&gt;Depending on your database provider, you'll have different ways to access PgBouncer. On Neon, you can check the “Pooled connection” box in the connection details widget and make sure the connection string contains the &lt;code&gt;-pooler&lt;/code&gt; suffix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres://johndoe:mypassword@ep-billowing-wood-25959289-pooler.us-east-1.aws.neon.tech/neondb"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-us.googleusercontent.com%2FRnGdfOkY2GRjsa2zooCkJkdq838AK63X9LvXn2zvuEjbpNz3Hc3rVbwottAaEwQRkZ1NQd5USaFgMiKDJvtL5HUI5sUh058PTSG5NelFpJyJ8uwHjmQEavFjmgxp2BxmOugIrDpf-I1C-MriITe-Lkk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Prepared Statements with PgBouncer in client libraries:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_22_0" rel="noopener noreferrer"&gt;PgBouncer&lt;/a&gt; supports prepared statements at the protocol level, and therefore, the above SQL-level prepared statement using &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt; will not work with PgBouncer. See &lt;a href="https://www.pgbouncer.org/config.html#max_prepared_statements" rel="noopener noreferrer"&gt;PgBouncer’s documentation&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;However, you can use prepared statements with pooled connections in a client library. Most PostgreSQL client libraries offer support for prepared statements, often abstracting away the explicit use of &lt;code&gt;PREPARE&lt;/code&gt; and &lt;code&gt;EXECUTE&lt;/code&gt;. Here’s how you might use it in a few popular languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;quot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// using pg  &lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="c1"&gt;// give the query a unique name&lt;/span&gt;
   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fetch-user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE username = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In these client libraries, the actual SQL command is parsed and prepared on the server, and then the data values are sent separately, ensuring both efficiency and security.&lt;/p&gt;

&lt;p&gt;Under the hood, PgBouncer examines all the queries sent as a prepared statement by clients and assigns each unique query string an internal name (e.g. PGBOUNCER_123). PgBouncer rewrites each command that uses a prepared statement to use the matching internal name before forwarding the corresponding command to Postgres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;                +-------------+
                | Client |
                +------+------+
                       |
                       | Sends Prepared Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PgBouncer |
                | |
                | 1. Examines and tracks the client&lt;span class="s1"&gt;'s statement. |
                | 2. Assigns an internal name (e.g., PGBOUNCER_123).|
                | 3. Checks if the statement is already prepared |
                | on the PostgreSQL server. |
                | 4. If not, prepares the statement on the server. |
                | 5. Rewrites the client'&lt;/span&gt;s &lt;span class="nb"&gt;command &lt;/span&gt;to use the |
                | internal name. |
                +------^------+
                       |
                       | Forwards Rewritten Statement &lt;span class="o"&gt;(&lt;/span&gt;e.g., &amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;WHERE &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; ?&amp;amp;quot&lt;span class="p"&gt;;&lt;/span&gt; as PGBOUNCER_123&lt;span class="o"&gt;)&lt;/span&gt;
                       |
                +------v------+
                | PostgreSQL |
                | Server |
                | |
                | Executes the forwarded statement with the internal name. |
                +-------------+&lt;span class="s2"&gt;"&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
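The bookkeeping described above can be simulated in a few lines (a toy model, not PgBouncer's actual implementation): each distinct query string gets one internal name, and repeated uses map onto the statement already prepared on the server:

```python
class ToyStatementCache:
    """Toy version of PgBouncer's prepared-statement tracking."""

    def __init__(self):
        self.names = {}        # query string -> internal name
        self.prepared = set()  # statements already prepared on the server

    def rewrite(self, query):
        # Assign an internal name to each unique query string.
        if query not in self.names:
            self.names[query] = f"PGBOUNCER_{len(self.names) + 1}"
        name = self.names[query]
        # Prepare on the server only the first time we see this statement.
        newly_prepared = name not in self.prepared
        self.prepared.add(name)
        return name, newly_prepared

cache = ToyStatementCache()
print(cache.rewrite("SELECT * FROM users WHERE id = $1"))   # ('PGBOUNCER_1', True)
print(cache.rewrite("SELECT * FROM users WHERE id = $1"))   # ('PGBOUNCER_1', False)
print(cache.rewrite("SELECT * FROM orders WHERE id = $1"))  # ('PGBOUNCER_2', True)
```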



&lt;h2&gt;
  
  
  In Summary
&lt;/h2&gt;

&lt;p&gt;PgBouncer bridges the gap between the inherent connection limitations of Postgres and the ever-growing demand for higher concurrency in modern applications. &lt;/p&gt;

&lt;p&gt;Leveraging prepared statements can be a valuable asset to boost your Postgres query performance and adds a layer of security against potential SQL injection attacks when using pooled connections. &lt;/p&gt;

&lt;p&gt;You can try prepared statements in PgBouncer with Neon today. We can’t wait to see what you build using it. Happy querying.&lt;br&gt;&lt;br&gt;
If you have any questions or feedback, don’t hesitate to get in touch with us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. We’d love to hear from you.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgbouncer</category>
    </item>
    <item>
      <title>pgvector: 30x Faster Index Build for your Vector Embeddings</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 07 Feb 2024 15:43:47 +0000</pubDate>
      <link>https://forem.com/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</link>
      <guid>https://forem.com/neon-postgres/pgvector-30x-faster-index-build-for-your-vector-embeddings-46da</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gsml9s8t1p9imi47u93.jpg" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;We are Neon, the serverless Postgres. We power thousands of AI apps with the pgvector extension and separated storage and compute, enabling your database resources to scale independently. In this article, Raouf explains how you can use Neon’s elasticity and pgvector’s parallel HNSW index build (in 0.5.1 for now, and 0.6.0 soon) to scale your AI apps.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Postgres’ most popular vector search extension, pgvector, recently implemented a parallel index build feature, which significantly improves the Hierarchical Navigable Small World (HNSW) index build time by a factor of 30.&lt;/p&gt;

&lt;p&gt;Congratulations to &lt;a href="https://github.com/ankane" rel="noopener noreferrer"&gt;Andrew Kane&lt;/a&gt; and pgvector contributors for this release, which solidifies Postgres’ position as one of the best databases for vector search and allows you to utilize the full power of your database to build the index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y8Cdewke--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/JF6GlzjwCLdIxG0PpOh66Q8GgqU60Ea_dXyGbGoKxjMCPQMtMMjzweMs4o9FeCBXY_ZKYNJQ2TuO8F-tUTFypUmN97XtyqhRgBM1ZjHg1wccgN5-IxTH5fpVQ7xrdM7l10lj99cJsmeYcOMPF-QGd0c" alt="Post image" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tests run by Jonathan Katz using a 10M dataset with 1,536-dimension vectors on a 64 vCPU, 512GB RAM instance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With Neon’s elastic capabilities and its architecture that separates storage and compute, you can allocate additional resources to your Postgres instance specifically for the HNSW index build, from the console or via the Neon API, and then scale back down to match user demand. This makes Neon and pgvector a match made in heaven for efficient AI applications that scale to millions of users.&lt;/p&gt;

&lt;p&gt;This article details how you can use pgvector with Neon.&lt;/p&gt;
&lt;h2&gt;
  
  
  The power of pgvector
&lt;/h2&gt;

&lt;p&gt;Pgvector is Postgres’ most popular extension for vector similarity search. Vector search has become increasingly crucial to semantic search and Retrieval Augmented Generation (RAG) applications, enhancing the long-term memory of large language models (LLMs).&lt;/p&gt;

&lt;p&gt;In both semantic search and RAG use cases, the database contains a knowledge base that the LLM wasn’t trained on, split into a series of texts or chunks. Each text is saved in a row and is associated with a vector generated by an embedding model such as &lt;a href="https://platform.openai.com/docs/guides/embeddings/embedding-models" rel="noopener noreferrer"&gt;OpenAI’s ada-embedding-002&lt;/a&gt; or &lt;a href="https://docs.mistral.ai/platform/client/#embeddings" rel="noopener noreferrer"&gt;Mistral-AI’s mistral-embed&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Vector search is then used to find the texts most similar (closest) to the query vector. Naively, this means comparing the query vector with every row in the database, which makes vector search hard to scale. This is why pgvector implemented &lt;a href="https://en.wikipedia.org/wiki/(1%2B%CE%B5)-approximate_nearest_neighbor_search" rel="noopener noreferrer"&gt;approximate nearest neighbor (ANN) algorithms&lt;/a&gt; (or indexes), which conduct the vector search over a subset of the database to avoid lengthy sequential scans.&lt;/p&gt;
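
&lt;p&gt;As an illustration (the table, column, and vector values here are hypothetical), a pgvector similarity query orders rows by distance to the query vector; without an index, this requires a full sequential scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Find the 5 rows closest to the query vector
-- (&amp;lt;=&amp;gt; is pgvector's cosine distance operator)
SELECT id, content
FROM documents
ORDER BY embedding &amp;lt;=&amp;gt; '[0.1, 0.2, 0.3]'
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;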

&lt;p&gt;One of the most efficient ANN algorithms is the Hierarchical Navigable Small World (HNSW) index. Its graph-based, multi-layered structure is designed for vector search over billions of rows. This makes HNSW extremely fast and efficient at scale and one of the most popular indexes in the vector store market.&lt;/p&gt;
&lt;h2&gt;
  
  
  HNSW’s Achilles heel: memory and build time
&lt;/h2&gt;

&lt;p&gt;HNSW was first introduced by Yu. A. Malkov and D. A. Yashunin in their paper “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.”&lt;/p&gt;

&lt;p&gt;HNSW is a graph-based approach to indexing high-dimensional data. It constructs a hierarchy of graphs, where each layer is a subset of the previous one, which results in a time complexity of &lt;code&gt;O(log(rows))&lt;/code&gt;. During the search, it navigates through these graphs to quickly find the nearest neighbors.&lt;/p&gt;
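
&lt;p&gt;For example, building an HNSW index with pgvector is a single statement (the table name is illustrative; &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;ef_construction&lt;/code&gt; shown here are pgvector’s defaults):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- m: max connections per node; ef_construction: candidate list size during build
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;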

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RvQVmOyF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/KqNnqzGKbZDgAGUD-Mbv_kUs4igPMlxV2t-L-OnHbMONP-KQ91MhNE1VwMhP9XHCjKGXXxFr6wpsBpGxaTR5z8PfiX4cmZPRs6c4MeU3IfvkliMJOQjjS4ghjdekfft16M2SZq7SNAaIBltie-VH7Mg" alt="Post image" width="716" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As fast and efficient as HNSW is, the index has two drawbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Memory&lt;/strong&gt;: The index requires significantly more memory than other indexes, such as the Inverted File Index (IVFFlat). You can solve the memory issue with a larger database instance, but on standalone Postgres such as AWS RDS, that means over-provisioning just for the index build. With Neon’s scaling capabilities, however, you can scale up, build the HNSW index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pKP2VR_3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/amQuw2B6GIie69Mfc_2QH1-13H9oVvur1pMutPy8XjosF8BFYAVtfKFlaqu7hQeE1Z6xU-zjqj_faSelXhj8EzulxztxZdprzCCGFE-HBaqPyvmzz9FZ337Mp-9pAGdWdK4cRq5DlQ7K5J6xRYFzqHA" alt="Post image" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build time&lt;/strong&gt;: The HNSW index can take hours to build for million-row datasets, mainly due to the time spent calculating distances between vectors. This is precisely what pgvector 0.6.0 solves by introducing &lt;a href="https://github.com/pgvector/pgvector/issues/409" rel="noopener noreferrer"&gt;Parallel Index Build&lt;/a&gt;. By allocating more CPUs and workers, you can build your HNSW index up to 30x faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gN38rRGc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/-D_PE-Rd0kcV-U6x52yJjdwconeLHodXZ1MXTddB2p1q5-uFONE5Moem9RYmTLrB71uXKlA_sSyiN-viT1c9Xt26qbHHvFEwvGlXNEDgD1AmIgCak4GZPyvYQsX-4mwNWYAfGpc2nj31rp1cMihUKaM" alt="Post image" width="716" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But wait: the HNSW index supports updates, so why is a parallel index build necessary if you only need to build the index once?&lt;/p&gt;

&lt;p&gt;Well, there are two cases where you need to create an HNSW index:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When you want faster queries and to optimize for vector search&lt;/li&gt;
&lt;li&gt;When you already have an HNSW index, and you delete vectors from the table&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The latter might cause the indexed search to return false positives, negatively impacting the quality of the LLM response and the overall performance of your AI application.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scale up and boost index build time
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 speeds up index builds by up to 30x compared to previous versions when using parallel workers. This improvement is especially notable when dealing with large datasets and large vectors, such as OpenAI’s 1,536-dimension embeddings.&lt;/p&gt;

&lt;p&gt;Creating an HNSW index can require significant resources: you need to allocate enough &lt;code&gt;maintenance_work_mem&lt;/code&gt; to fit the index in memory. Otherwise, the HNSW graph takes significantly longer to build, and Postgres warns you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NOTICE:  hnsw graph no longer fits into maintenance_work_mem after 100000 tuples
DETAIL:  Building will take significantly longer.
HINT:  Increase maintenance_work_mem to speed up builds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Neon, you can scale up your Postgres instance using the Console or the API, configure it to build the index, and then scale back down to save on cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--orTawCEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2024/02/Export-1707305452931.mp4" alt="Neon Console Operation" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To effectively use parallel index build, it’s essential to configure Postgres with suitable settings. Key parameters to consider are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;maintenance_work_mem&lt;/strong&gt;: This parameter determines the memory allocated for creating or rebuilding indexes and directly affects the performance of these operations. Setting it to a high value, such as 8GB, allows for more efficient handling of the index build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET maintenance_work_mem = '8GB';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;max_parallel_maintenance_workers&lt;/strong&gt;: This dictates the number of parallel workers that can be employed for maintenance operations such as index builds. The Postgres default is 2. Setting it higher lets the build utilize more computing resources for faster index builds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET max_parallel_maintenance_workers = 7; -- plus leader
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Neon currently supports pgvector 0.5.1. Our engineering team is working on adding support for 0.6.0. Stay tuned.&lt;/p&gt;
&lt;/blockquote&gt;
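
&lt;p&gt;Putting these settings together, a typical index build session might look like this (the table name is illustrative, and the memory and worker values should be sized to your instance):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Allocate memory and parallel workers for this session only
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7; -- plus the leader process

-- Build the HNSW index using the parallel workers
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;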

&lt;h2&gt;
  
  
  How does this affect recall performance?
&lt;/h2&gt;

&lt;p&gt;Recall is as important a metric as query execution time in RAG applications. Recall is the percentage of correct answers the ANN search returns. In the HNSW index, &lt;code&gt;ef_search&lt;/code&gt; is the parameter that determines the number of neighbors to scan at search time. The higher &lt;code&gt;ef_search&lt;/code&gt;, the higher the recall, but also the higher the query execution time.&lt;/p&gt;
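
&lt;p&gt;For instance, you can raise &lt;code&gt;ef_search&lt;/code&gt; from its default of 40 to trade query latency for recall:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Scan more neighbor candidates per query (default is 40)
SET hnsw.ef_search = 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;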

&lt;p&gt;&lt;a href="https://github.com/pgvector/pgvector/issues/409#issuecomment-1898605567" rel="noopener noreferrer"&gt;The tests conducted by Johnathan Katz&lt;/a&gt; show that using parallel builds has negligible impact on recall, with most changes swinging positively by over 1%. Despite the substantial speed improvements, this remarkable stability in recall rates highlights the effectiveness of pgvector 0.6.0’s parallel build process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JsJxTTw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/P-_6yF4v7-mzDO-_AdCzEQlKNz7KqleEIFz1jje5YNktnWlZ-MU5VyillAUJjdo0CZ-ux2PILd7_llFpE_hawJ_kexmF2b6w9zJ6r2G-mZl0fr3IaKRZgfDiFf5VDmg9y8TKsOK-GdpXXC2ZIPU9c0A" alt="Recall Performance Graph" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgvector 0.6.0 represents a significant leap forward, proving that Postgres is an important player in the vector search space. By harnessing the power of parallel index building, developers can now construct HNSW indexes more rapidly and efficiently, significantly reducing the time and resources traditionally required for such tasks.&lt;/p&gt;

&lt;p&gt;Neon’s flexible and scalable serverless Postgres offering complements pgvector’s capabilities perfectly. Users can scale their database resources according to their specific needs for index building and then scale down to optimize costs, ensuring an economical yet powerful solution.&lt;/p&gt;

&lt;p&gt;What AI applications are you currently building? &lt;a href="https://console.neon.tech" rel="noopener noreferrer"&gt;Try pgvector on Neon today&lt;/a&gt;, join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, and let us know how we can improve your experience with serverless PostgreSQL.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgvector</category>
      <category>vector</category>
      <category>ai</category>
    </item>
    <item>
      <title>Bring Your Own Extensions to Serverless PostgreSQL</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Wed, 17 Jan 2024 14:07:36 +0000</pubDate>
      <link>https://forem.com/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</link>
      <guid>https://forem.com/neon-postgres/bring-your-own-extensions-to-serverless-postgresql-1ba8</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy04pls5fo5okulg8itif.png" alt="Bring Your Own Extensions Cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL are comparable to libraries in programming languages or plugins in web browsers. They are pivotal in the PostgreSQL ecosystem, providing additional functionalities ranging from encryption and AI to handling time series and geospatial data. More complex extensions can transform PostgreSQL into a graph or analytical database, and some companies even create custom private extensions for specific business logic.&lt;/p&gt;

&lt;p&gt;Neon’s compute is stateless PostgreSQL, running as a VM or a Kubernetes pod. The compute image comes with a &lt;a href="https://neon.tech/docs/extensions/pg-extensions" rel="noopener noreferrer"&gt;list of supported extensions&lt;/a&gt;. However, supporting a wide range of PostgreSQL extensions can pose performance and security risks in a multi-tenant serverless environment like Neon. This is why we are excited to announce we added &lt;a href="https://neon.tech/docs/extensions/pg-extensions#custom-built-extensions" rel="noopener noreferrer"&gt;support for private and custom extensions&lt;/a&gt; using Dynamic Extension Loading. &lt;/p&gt;

&lt;p&gt;This feature is currently in beta and available on request only. You can contact support if you want to bring your own extensions to Neon. In this article, we’ll introduce Dynamic Extension Loading, its implementation, its benefits, and our future plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensions in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b4wtxtczmev9fq5d3gr.png" alt="PostgreSQL Extension Ecosystem" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL is a robust and versatile database system that is further enhanced by its support for extensions. Some of the most popular extensions are &lt;a href="https://postgis.net/" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; for geolocation, &lt;a href="https://www.postgresql.org/docs/current/pgstatstatements.html" rel="noopener noreferrer"&gt;pg_stat_statements&lt;/a&gt; for query statistics, and &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; for vector similarity search. &lt;/p&gt;

&lt;p&gt;Extensions in PostgreSQL come in various forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL Object Packages&lt;/strong&gt;: These are the most common, comprising domain-specific data types, functions, triggers, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural Languages&lt;/strong&gt;: Extensions like PLPython or PLV8 enable the use of different programming languages within PostgreSQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal API Enhancements&lt;/strong&gt;: Written in C, these powerful extensions can introduce new storage methods, volume replication, background jobs, and configuration parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensions in Other Languages&lt;/strong&gt;: Beyond C, extensions can be developed in languages like C++ or Rust, broadening the scope of functionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To use an extension, it must be built against the correct major version of PostgreSQL. The installation involves placing files in the shared directory and library files in the libdir, paths that vary across platforms. After placing the files, the &lt;code&gt;CREATE EXTENSION&lt;/code&gt; command is executed in the database, prompting PostgreSQL to locate and run the installation scripts for the extension.&lt;/p&gt;
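
&lt;p&gt;For example, once the extension files are in place, installation is a single command; the &lt;code&gt;pg_available_extensions&lt;/code&gt; catalog shows what the server can install (pgvector, named &lt;code&gt;vector&lt;/code&gt;, is used here for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- List extensions whose files are installed on the server
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'vector';

-- Run the extension's installation scripts in this database
CREATE EXTENSION IF NOT EXISTS vector;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;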

&lt;h2&gt;
  
  
  Extension support limitations in serverless environments
&lt;/h2&gt;

&lt;p&gt;In Neon's serverless PostgreSQL environment, each compute runs as an ephemeral Kubernetes pod or VM. A compute instance can be scaled up, down, and descheduled whenever the workload changes. Therefore, supporting a wide range of PostgreSQL extensions presents significant challenges such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility&lt;/strong&gt;: Many extensions are not designed for serverless architectures, particularly those needing persistent storage or deep system integration, such as &lt;a href="https://github.com/citusdata/pg_cron" rel="noopener noreferrer"&gt;pg_cron&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" rel="noopener noreferrer"&gt;file_fdw&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Issues&lt;/strong&gt;: Embedding all extensions in the compute image significantly increases its size, leading to slower start times and reduced performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Overhead&lt;/strong&gt;: Traditional methods require frequent updates to the entire compute image for each extension update, causing potential service disruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Risks&lt;/strong&gt;: A larger set of extensions in the base image increases the potential attack surface, especially with extensions that remain unused by many users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Customization&lt;/strong&gt;: The open-source nature of compute images restricts the inclusion of custom or closed-source extensions, limiting tailored solutions for specific customer needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the conventional method of bundling extension files into compute images is impractical due to the sheer number of extensions and the varied needs of users. This led us to rethink how we provide extensions with Dynamic Extension Loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic Extension Loading: A New Approach
&lt;/h2&gt;

&lt;p&gt;At Neon, we've addressed these challenges with our dynamic extension loading mechanism. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building and storing extensions&lt;/strong&gt;: We build extensions in a separate repository and store the resulting files in an S3 bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuring extensions&lt;/strong&gt;: Extensions are configured per user in the Neon control plane, enhancing customization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-demand loading&lt;/strong&gt;: Compute instances download control files at startup, and library files are fetched as needed when extension functions are called.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiszbg91ueiv8onrg4fwu.png" alt="Custom Extension download diagram on Neon" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Dynamic Extension Loading, private and default extensions can be added to compute instances without restarting, reducing maintenance overhead. Additionally, it brings performance benefits to Neon. Our plans with Dynamic Extension Loading include moving all default-supported extensions to the extension storage, resulting in a smaller compute image size and faster start times.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to bring your own extension to Neon
&lt;/h2&gt;

&lt;p&gt;To request support for a Postgres extension, paid plan users can &lt;a href="https://console.neon.tech/app/projects?modal=support" rel="noopener noreferrer"&gt;open a support ticket&lt;/a&gt;. Free plan users can submit a request via the feedback channel on our &lt;a href="https://discord.com/invite/92vNTzKDGp" rel="noopener noreferrer"&gt;Discord Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our engineers will then evaluate your extension’s compatibility with Neon, build it, and upload the artifacts to the extension storage once it passes all security tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This feature is currently in beta, with plans for general availability in the near future. This development marks a significant step forward in making PostgreSQL more adaptable and efficient in a serverless environment.&lt;/p&gt;

&lt;p&gt;What about you? Do you use PostgreSQL extensions in your projects? Join us on Discord and let us know which extensions you use and how we can enhance your PostgreSQL experience in the cloud.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>extensions</category>
      <category>cloud</category>
      <category>database</category>
    </item>
    <item>
      <title>Change Data Capture with Serverless Postgres</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Thu, 21 Dec 2023 12:23:07 +0000</pubDate>
      <link>https://forem.com/neon-postgres/change-data-capture-with-serverless-postgres-823</link>
      <guid>https://forem.com/neon-postgres/change-data-capture-with-serverless-postgres-823</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ge716hp6gprsd0d0a.png" alt="Cover Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern applications are often built from loosely coupled components and services, which helps teams and systems scale. The data pipelines connecting these components generate continuous data streams that need to be replicated, processed, or analyzed. &lt;/p&gt;

&lt;p&gt;However, moving data between different data stores can seriously compromise the quality and reliability of your decisions, because data can become inconsistent or corrupted during transformation. This is why &lt;a href="https://en.wikipedia.org/wiki/Change_data_capture" rel="noopener noreferrer"&gt;Change Data Capture (CDC)&lt;/a&gt; has emerged as one of the most popular methods to synchronize data across multiple data stores. One way to use CDC in Postgres is with &lt;a href="https://www.postgresql.org/docs/current/logical-replication.html#:~:text=Logical%20replication%20is%20a%20method,byte%2Dby%2Dbyte%20replication." rel="noopener noreferrer"&gt;logical replication&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, we’re excited to announce the release of logical replication in beta on Neon. This feature lets you stream your data hosted on Neon to external data stores, allowing for change data capture and real-time analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CDC matters
&lt;/h2&gt;

&lt;p&gt;CDC refers to the process of capturing changes made to data in a database – such as inserts, updates, and deletes – and then delivering these changes to downstream processes or systems. &lt;/p&gt;

&lt;p&gt;CDC operates by monitoring and capturing data changes in a source database as they occur. This is a departure from traditional batch processing, where data updates are transferred at scheduled intervals. CDC ensures that every change is captured and can be acted upon almost instantaneously.&lt;/p&gt;

&lt;p&gt;Here’s why CDC matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data synchronization: In a distributed system architecture, keeping data synchronized across various platforms and services is critical. CDC facilitates this by providing a mechanism for real-time data replication.&lt;/li&gt;
&lt;li&gt;Minimizing latency: By capturing changes as they happen, CDC minimizes the latency in data transfer. This is essential for applications where even a slight delay in data availability can lead to significant issues, such as financial trading systems.&lt;/li&gt;
&lt;li&gt;Enabling event-driven architectures: CDC is a cornerstone for building event-driven systems. In such architectures, actions are triggered in response to data changes, making real-time data capture essential.&lt;/li&gt;
&lt;li&gt;Data warehousing and real-time analytics: For organizations relying on data warehouses and analytics tools for decision-making, CDC ensures that the data in these systems is current, enhancing the accuracy of insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we understand it better, let’s explore the technical mechanics of how CDC is implemented in Postgres through logical replication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Logical replication: under the hood
&lt;/h2&gt;

&lt;p&gt;In Postgres, logical replication is one of the methods of implementing CDC and streaming data from your database to an external source. It uses a publisher-subscriber model. &lt;/p&gt;

&lt;p&gt;Your Neon database works as a publisher, copying first a snapshot of the data and then streaming changes to one or more target data stores (subscribers). This model allows for selective replication, where only specified tables or even specific columns within a table can be replicated.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72iu7yda24nhm4jw8tfy.png" alt="Neon as publisher" width="686" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn more about connecting &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead-Log (WAL)&lt;/a&gt; is a fundamental component in Postgres, designed to ensure data integrity and facilitate recovery. It records every change made to the database, including transactions and their states.&lt;/p&gt;

&lt;p&gt;For logical replication, the WAL serves as the primary data source. The WAL captures the comprehensive sequence of data changes, which are then decoded for replication purposes. Logical replication transforms the WAL to a format accepted by the subscriber through logical decoding, and the &lt;code&gt;walsender&lt;/code&gt; then streams the transformed data using the replication protocol. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;walsender&lt;/code&gt; initiates the logical decoding of the WAL using an output plugin. Postgres ships with several logical decoding plugins that can output the data in various formats. In addition, new plugins can be developed.&lt;/p&gt;

&lt;p&gt;For instance, in a Postgres-to-Postgres logical replication, the standard &lt;code&gt;pgoutput&lt;/code&gt; plugin transforms the data changes to the logical replication protocol. The transformed data is subsequently streamed using the replication protocol, which maps it to local tables and applies the changes in the exact sequence of the original transactions. However, integrations with non-Postgres systems require an output format different from the standard one specifically designed for Postgres-to-Postgres logical replication. &lt;/p&gt;

&lt;p&gt;Today’s data pipelines involve more than one data store type. For example, you can integrate all your Postgres databases into a data warehouse or streaming platform, such as &lt;a href="https://materialize.com/" rel="noopener noreferrer"&gt;Materialize&lt;/a&gt; or &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt;, to process and analyze data at higher scales. This is why, with the release of logical replication on Neon, we added support for &lt;a href="https://github.com/eulerto/wal2json" rel="noopener noreferrer"&gt;wal2json&lt;/a&gt;, which outputs changes in the JSON format to be easily consumed by other systems and data stores.&lt;/p&gt;
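
&lt;p&gt;As a sketch (the slot name is arbitrary, and wal2json must be available on the server), you can create a logical replication slot that uses the wal2json output plugin and inspect decoded changes directly in SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Create a slot that decodes WAL changes with the wal2json plugin
SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'wal2json');

-- Peek at pending JSON-formatted changes without consuming them
SELECT data FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;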

&lt;p&gt;You can read more on &lt;a href="https://neon.tech/blog/cdc-with-materialize" rel="noopener noreferrer"&gt;Change Data Capture using Neon and Materialize&lt;/a&gt; by our friend Marta Paes, to learn how to integrate your database with external systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Logical vs. physical replication
&lt;/h2&gt;

&lt;p&gt;Logical replication differs from physical replication in that it replicates changes at the data level (row-level changes) rather than replicating the entire database at the block level. This allows for more selective replication and reduces the amount of data transferred. Unlike snapshot replication, which provides a full copy of the data at a specific point in time, logical replication ensures continuous streaming of changes, making it more suitable for applications that require near real-time data availability.&lt;/p&gt;

&lt;p&gt;This comparison highlights the distinct characteristics, advantages, and applications of logical and physical replication.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Logical Replication&lt;/th&gt;
&lt;th&gt;Physical Replication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Row-Level Changes&lt;/strong&gt;: focuses on replicating specific row-level changes (INSERT, UPDATE, DELETE) in selected tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Block-Level Replication&lt;/strong&gt;: replicates the entire database at the block level. It creates an exact copy of the source database, including all tables and system catalogs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Offers the flexibility to replicate specific tables and even specific columns within tables.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Doesn’t allow for selective table replication and requires the same PostgreSQL version on both the primary and standby servers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WAL-based&lt;/strong&gt;: Uses the WAL for capturing changes, but with logical decoding to convert these changes into a readable format for the subscriber.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Streaming Replication&lt;/strong&gt;: Changes are streamed as they are written to the WAL, minimizing lag.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Ideal for situations requiring selective replication, minimal impact on the source database, or cross-version compatibility.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: Best suited for creating read-only replicas for load balancing, high availability, and disaster recovery solutions.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Get started with logical replication
&lt;/h2&gt;

&lt;p&gt;To enable logical replication, navigate to your project’s settings in the console, click the “Beta” tab, locate Logical Replication, and click the “Enable” button. &lt;/p&gt;

&lt;p&gt;Note that enabling logical replication will restart your compute instance, which will drop existing connections. A subscriber may also keep the connection to your Neon database active, preventing your Neon instance from scaling to zero.&lt;/p&gt;

&lt;p&gt;This action is also irreversible, and you will not be able to disable logical replication for your project.&lt;/p&gt;

&lt;p&gt;Ensure logical replication is enabled by running the following query in the SQL Editor within the Neon console or using psql on your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW wal_level;

 wal_level 
-----------
 logical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a publication
&lt;/h2&gt;

&lt;p&gt;Let’s assume you have the following users table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE users (

  id SERIAL PRIMARY KEY,

  username VARCHAR(50) NOT NULL,

  email VARCHAR(100) NOT NULL

);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute the following query to create a publication for the users table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION users_publication FOR TABLE users;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
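
&lt;p&gt;On the subscriber side, a Postgres instance that supports subscriptions (a self-hosted Postgres, for example) can consume this publication with &lt;code&gt;CREATE SUBSCRIPTION&lt;/code&gt;. The statement below is a sketch; the connection string is a placeholder for your actual Neon connection string:&lt;/p&gt;

```sql
-- Run on the subscriber, not on Neon. The connection string below is a
-- placeholder; replace it with your Neon connection string.
CREATE SUBSCRIPTION users_subscription
  CONNECTION 'postgresql://user:password@your-neon-hostname/neondb'
  PUBLICATION users_publication;
```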



&lt;p&gt;Learn more about &lt;a href="https://neon.tech/docs/guides/logical-replication-guide" rel="noopener noreferrer"&gt;how to connect Neon to different data stores&lt;/a&gt; in the documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;While logical replication in Neon Postgres offers numerous benefits for real-time data synchronization and flexibility, it has some limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publisher, not a subscriber&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This release of logical replication on Neon is in beta, and for security reasons, it does not include subscriber capabilities at the moment. We are currently working on these security constraints, which should be supported in future releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logical replication and Auto-suspend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a logical replication setup, a subscriber may keep the connection to your Neon publisher database active to poll for changes or perform sync operations, preventing your Neon compute instance from scaling to zero. Some subscribers allow you to configure connection or sync frequency, which may be necessary to continue taking advantage of Neon’s Auto-suspend feature. Please refer to your subscriber’s documentation or contact their support team for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Definition Language (DDL) Operations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Logical replication in Postgres primarily handles Data Manipulation Language (DML) operations like INSERT, UPDATE, and DELETE. However, it does not automatically replicate Data Definition Language (DDL) operations such as CREATE TABLE, ALTER TABLE, or DROP TABLE. This means that schema changes in the publisher database are not directly replicated to the subscriber database.&lt;/p&gt;

&lt;p&gt;Manual intervention is required to replicate DDL changes. This can be done by applying the DDL changes separately in both the publisher and subscriber databases or by using third-party tools that can handle DDL replication.&lt;/p&gt;
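
&lt;p&gt;For example, to add a column to the replicated &lt;code&gt;users&lt;/code&gt; table, you would run the same DDL on both sides, typically on the subscriber first so that incoming rows always fit the schema:&lt;/p&gt;

```sql
-- Run on the subscriber first, then on the Neon publisher:
ALTER TABLE users ADD COLUMN created_at TIMESTAMPTZ DEFAULT now();
```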

&lt;p&gt;&lt;strong&gt;Replication Lag&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In high-volume transaction environments, there is potential for replication lag. This is the time delay between a transaction being committed on the publisher and the same transaction being applied on the subscriber.&lt;/p&gt;

&lt;p&gt;It’s important to monitor replication lag and understand its impact, especially for applications that require near-real-time data consistency. Proper resource allocation and optimizing the network can help mitigate this issue.&lt;/p&gt;
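
&lt;p&gt;As a starting point, you can estimate how far each subscriber is behind by comparing each logical slot’s confirmed position with the current WAL position on the publisher:&lt;/p&gt;

```sql
-- Approximate lag per logical replication slot, in bytes of WAL:
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
```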

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Logical replication is undoubtedly one of the most important features for modern applications. As we continue to develop its capabilities, we encourage you to test, experiment, and push the boundaries of what logical replication can do. Join us on Discord, and share your experiences, suggestions, and challenges with us. &lt;/p&gt;

&lt;p&gt;We can’t wait to see what you build with Neon.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>cdc</category>
      <category>streaming</category>
    </item>
    <item>
      <title>Mixtral 8x7B: What you need to know about Mistral AI’s latest model</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Mon, 11 Dec 2023 17:15:23 +0000</pubDate>
      <link>https://forem.com/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n</link>
      <guid>https://forem.com/neon-postgres/mixtral-8x7b-what-you-need-to-know-about-mistral-ais-latest-model-l3n</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feithsyjh8omjtf3g1wbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feithsyjh8omjtf3g1wbo.png" alt="Mixtral 8x7B What you need to know Cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We’re Neon, and we’re redefining the database experience with our cloud-native serverless Postgres solution. If you’ve been looking for a database for your RAG apps that adapts to your application loads, you’re in the right place. &lt;a href="https://console.neon.tech/signup" rel="noopener noreferrer"&gt;Give Neon a try&lt;/a&gt;, and let us know what you think. Neon is cloud-native Postgres and scales your AI apps to millions of users with pgvector. In this post, Raouf is going to tell you what you need to know about Mixtral 8x7B, the new LLM by MistralAI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mistral AI, the company behind the Mistral 7B model, has released its latest model: Mixtral 8x7B (Mixtral). The model includes support for 32k tokens, better code generation, and it matches or outperforms GPT3.5 on most standard benchmarks.&lt;/p&gt;

&lt;p&gt;In this article, we'll review the new text-generation and embedding models by Mistral AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Mistral AI has emerged as a strong contender in the open-source large language model sphere with their &lt;a href="https://arxiv.org/pdf/2310.06825.pdf" rel="noopener noreferrer"&gt;Mistral 7B&lt;/a&gt; model, which outperforms existing models like Llama 2 (13B parameters) across multiple benchmarks.&lt;/p&gt;

&lt;p&gt;In a previous comparative analysis, we concluded that, although impressive, the Mistral 7B instruct model optimized for chat needed some improvements before being seen as an alternative to the &lt;code&gt;gpt-*&lt;/code&gt; models.&lt;/p&gt;

&lt;p&gt;Mixtral might change all of that as it’s pushing the frontier of open models. According to a recent benchmark, Mixtral matches or outperforms Llama 2 70B and GPT3.5.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LLaMA 2 70B&lt;/th&gt;
&lt;th&gt;GPT-3.5&lt;/th&gt;
&lt;th&gt;Mixtral 8x7B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MMLU&lt;/strong&gt;&lt;br&gt;(MCQ in 57 subjects)&lt;/td&gt;
&lt;td&gt;69.9%&lt;/td&gt;
&lt;td&gt;70.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;HellaSwag&lt;/strong&gt;&lt;br&gt;(10-shot)&lt;/td&gt;
&lt;td&gt;87.1%&lt;/td&gt;
&lt;td&gt;85.5%&lt;/td&gt;
&lt;td&gt;86.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;ARC Challenge&lt;/strong&gt;&lt;br&gt;(25-shot)&lt;/td&gt;
&lt;td&gt;85.1%&lt;/td&gt;
&lt;td&gt;85.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WinoGrande&lt;/strong&gt;&lt;br&gt;(5-shot)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;81.6%&lt;/td&gt;
&lt;td&gt;81.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MBPP&lt;/strong&gt;&lt;br&gt;(pass@1)&lt;/td&gt;
&lt;td&gt;49.8%&lt;/td&gt;
&lt;td&gt;52.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;GSM-8K&lt;/strong&gt;&lt;br&gt;(5-shot)&lt;/td&gt;
&lt;td&gt;53.6%&lt;/td&gt;
&lt;td&gt;57.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;MT Bench&lt;/strong&gt;&lt;br&gt;(for Instruct Models)&lt;/td&gt;
&lt;td&gt;6.86&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.32&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Developing with Mixtral 8x7B Instruct
&lt;/h2&gt;

&lt;p&gt;If you plan to fine-tune Mixtral and run your own inference, it's important to note that Mixtral requires much more RAM and GPU capacity than Mistral 7B. While Mistral 7B works well on a single-GPU instance with 24GB of RAM, Mixtral requires 64GB of RAM and 2 GPUs, which more than triples the cost ($1.30/h vs. $4.50/h).&lt;/p&gt;

&lt;p&gt;Luckily for developers, &lt;a href="https://console.mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI has an API&lt;/a&gt; in beta, currently behind an invite gate. They also provide client libraries for &lt;a href="https://pypi.org/project/mistralai/" rel="noopener noreferrer"&gt;Python&lt;/a&gt; and &lt;a href="https://www.npmjs.com/package/@mistralai/mistralai" rel="noopener noreferrer"&gt;JavaScript&lt;/a&gt; developers.&lt;/p&gt;

&lt;p&gt;Below is an example of code using the Python library.&lt;/p&gt;

&lt;p&gt;Prerequisite: install the &lt;code&gt;mistralai&lt;/code&gt; client library using &lt;code&gt;pip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install mistralai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a code example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = MistralClient(api_key=api_key)

messages = [
    ChatMessage(role="user", content="What is the elephant database?")
]

chat_response = client.chat(
    model=model,
    messages=messages,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re familiar with the OpenAI client library, you will notice the similarity between the two SDKs. The Mistral AI library can be used as a drop-in replacement, which makes migrations seamless.&lt;/p&gt;

&lt;p&gt;Mistral AI API provides three models: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mistral-tiny&lt;/code&gt; based on Mistral-7B-v0.2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral-small&lt;/code&gt; based on Mixtral-8x7B-v0.1&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral-medium&lt;/code&gt; based on an internal prototype model&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mistral-embed: The new embedding model
&lt;/h2&gt;

&lt;p&gt;In addition to the text generation models, Mistral AI’s API gives you access to &lt;code&gt;mistral-embed&lt;/code&gt;, a &lt;a href="https://huggingface.co/BAAI/bge-large-en" rel="noopener noreferrer"&gt;BGE-large&lt;/a&gt;-like 1024-dimension embedding model, also accessible via the client library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mistralai.client import MistralClient

api_key = os.environ["MISTRAL_API_KEY"]

client = MistralClient(api_key=api_key)
embeddings_batch_response = client.embeddings(
      model="mistral-embed",
      input=["I love Postgres!"],
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What does it mean for your AI apps?
&lt;/h2&gt;

&lt;p&gt;Mixtral gives developers an API-compatible alternative to &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; and, in the case of the mistral-tiny and mistral-small models, a cheaper one. &lt;/p&gt;

&lt;p&gt;Below is the price comparison per one million tokens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;mistral-tiny&lt;/th&gt;
&lt;th&gt;mistral-small&lt;/th&gt;
&lt;th&gt;mistral-medium&lt;/th&gt;
&lt;th&gt;gpt-3.5-turbo-1106&lt;/th&gt;
&lt;th&gt;gpt-3.5-turbo-instruct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.64&lt;/td&gt;
&lt;td&gt;$2.68&lt;/td&gt;
&lt;td&gt;$1.0&lt;/td&gt;
&lt;td&gt;$1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;$0.45&lt;/td&gt;
&lt;td&gt;$1.93&lt;/td&gt;
&lt;td&gt;$8.06&lt;/td&gt;
&lt;td&gt;$2.0&lt;/td&gt;
&lt;td&gt;$2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
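
&lt;p&gt;As a quick sanity check of the table above, here is a sketch that computes the cost of a single request from these per-million-token prices. The token counts are made up for illustration:&lt;/p&gt;

```python
# USD per one million tokens (input, output), from the table above.
PRICES = {
    "mistral-tiny": (0.15, 0.45),
    "mistral-small": (0.64, 1.93),
    "gpt-3.5-turbo-1106": (1.0, 2.0),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a request with 2,000 input tokens and 500 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2000, 500):.6f}")
```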

&lt;p&gt;However, if you previously stored ada v2 1536 dimension vector embeddings with &lt;code&gt;pgvector&lt;/code&gt;, you will need to re-create the embeddings to add support for &lt;code&gt;mistral-embed&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embeddings_batch_response = client.embeddings(
      model="mistral-embed",
      input=["text 1", "text 2", "text 3"],
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;mistral-embed&lt;/code&gt; model for text embedding is slightly more expensive than the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; model.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;mistral-embed&lt;/th&gt;
&lt;th&gt;ada v2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;$0.107&lt;/td&gt;
&lt;td&gt;$0.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that &lt;a href="https://docs.mistral.ai/platform/pricing" rel="noopener noreferrer"&gt;Mistral AI’s pricing&lt;/a&gt; is in euros and the tables above reflect adjusted rates to USD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The release of Mixtral 8x7B by Mistral AI represents a significant leap forward for open-source LLMs. With its enhanced capabilities like 32k token support, improved code generation, and competitive performance against &lt;code&gt;gpt-3.5-turbo&lt;/code&gt;, Mixtral is poised to be a game-changer for developers and AI enthusiasts alike.&lt;/p&gt;

&lt;p&gt;While the model’s resource requirements can be a potential barrier for some, those limitations are offset by the Mistral AI API, and the drop-in replacement client libraries in Python and JavaScript.&lt;/p&gt;

&lt;p&gt;The pricing structure of Mixtral, particularly for the mistral-tiny and mistral-small models, presents a more cost-effective alternative to gpt-3.5-* models. This, along with the advanced capabilities of the mistral-embed model for text embedding, makes Mixtral an attractive option for a wide range of AI apps and Retrieval Augmented Generation pipelines.&lt;/p&gt;

&lt;p&gt;However, it's worth noting that transitioning to Mixtral, especially for those who previously used models like ada v2 for embedding, may require some adjustments in terms of re-creating embeddings and accommodating the slightly higher cost of mistral-embed.&lt;/p&gt;

&lt;p&gt;Overall, Mixtral 8x7B marks an exciting development in the AI field, offering powerful and efficient tools for a variety of applications. As Mistral AI continues to innovate and expand its offerings, it will undoubtedly play a crucial role in shaping the future of AI technology.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mistral</category>
      <category>mixtral</category>
    </item>
    <item>
      <title>Mistral 7B and BAAI on Workers AI vs. OpenAI Models for RAG</title>
      <dc:creator>Raouf Chebri</dc:creator>
      <pubDate>Mon, 11 Dec 2023 17:06:54 +0000</pubDate>
      <link>https://forem.com/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6</link>
      <guid>https://forem.com/neon-postgres/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag-4pb6</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xPYbtKfq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/neon-mistral7B1-1024x576.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xPYbtKfq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/neon-mistral7B1-1024x576.jpg" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the rapidly progressing world of artificial intelligence, choosing the right model for AI-powered applications is crucial. This article explores a comparative analysis of the Mistral 7B model, a promising alternative to OpenAI’s GPT models and BAAI models in the context of Retrieval Augmented Generation (RAG) applications. But first, let’s understand the landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding RAG Pipelines
&lt;/h2&gt;

&lt;p&gt;RAG pipelines enhance the capabilities of Large Language Models (LLMs) by providing them with external ‘research’ to inform their responses. Let me explain.&lt;/p&gt;

&lt;p&gt;Imagine you are a pastry chef, and someone asks you how to make a good chocolate cake. That’s an easy one for you since you have made countless chocolate cakes in the past. But what if someone asks you about making Tandoori chicken? You probably do not have the recipe off the top of your head, but with a little bit of research, you will likely be able to answer that question. The same applies to LLMs. &lt;/p&gt;

&lt;p&gt;When the user asks the LLM a question, the RAG pipeline does the search to provide more context and helps steer the model towards a more accurate and helpful answer. Typically, the context is a piece of information stored in a document or a database that the model hasn’t seen during training. &lt;/p&gt;

&lt;p&gt;The below diagram illustrates the RAG process using &lt;a href="https://neon.tech/docs/introduction" rel="noopener noreferrer"&gt;Neon Docs&lt;/a&gt; Chatbot as an example. The diagram shows three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embedding generation: We need an embedding model to turn the user’s query into a query vector.&lt;/li&gt;
&lt;li&gt;Context retrieval: This is the process of looking for the information in a document or a database using similarity search.&lt;/li&gt;
&lt;li&gt;Completion (or text) generation: In this step, the application provides the completion model with the user query and the context to generate an answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TjVuWal6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/dJsnDPFBNP31e1skLxptoAjV93IF6ujYainmGS_kD2O_mEt9MQx54HUUblzwtLK1ZEVBM1t9NCZ7nhxR-JcD59RELZjX0aHeFK2liy7MO24Bxvpykr6Ptp6wWzYE1QnzpmA4ZxaJTv9Qm-YIGt0bYn0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TjVuWal6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/dJsnDPFBNP31e1skLxptoAjV93IF6ujYainmGS_kD2O_mEt9MQx54HUUblzwtLK1ZEVBM1t9NCZ7nhxR-JcD59RELZjX0aHeFK2liy7MO24Bxvpykr6Ptp6wWzYE1QnzpmA4ZxaJTv9Qm-YIGt0bYn0" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now want to explore open-source alternatives for text generation and embedding models. But first, let’s explain why open-source models are becoming increasingly popular.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are open-source AI models, and why should you care?
&lt;/h2&gt;

&lt;p&gt;The short answer is &lt;em&gt;transparency&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI provides developers with GPT and ADA models that are essential for RAG pipelines. However, those models operate as black boxes, which can pose a security concern for some, accelerating the interest in open-source models such as Llama2 and Mistral 7B.&lt;/p&gt;

&lt;p&gt;Being open-source in the AI model context means a transparent training process and output. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt;, for instance, opened the code and weights of its 7-billion-parameter open-source large language model (LLM) &lt;a href="https://docs.mistral.ai/llm/mistral-instruct-v0.1" rel="noopener noreferrer"&gt;Mistral 7B&lt;/a&gt; to the public and explained how the model uses a &lt;a href="https://github.com/mistralai/mistral-src#sliding-window-to-speed-up-inference-and-reduce-memory-pressure" rel="noopener noreferrer"&gt;sliding window to speed up inference and reduce memory pressure&lt;/a&gt;, which gives the model an edge over other open-source models.&lt;/p&gt;

&lt;p&gt;Let’s explore the Mistral 7B model in more detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Mistral 7B: A New Contender
&lt;/h2&gt;

&lt;p&gt;The Mistral 7B model is an alternative to OpenAI’s GPT models, engineered for performance and efficiency. According to a &lt;a href="https://arxiv.org/pdf/2310.06825.pdf" rel="noopener noreferrer"&gt;recent paper&lt;/a&gt;, Mistral 7B outperforms existing models like Llama 2 (13B parameters) across all evaluated benchmarks, showcasing superior performance in areas such as reasoning, mathematics, and code generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T_DuU8iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/16-1024x714.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T_DuU8iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/16-1024x714.png" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SZS9_Qst--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/17-1024x714.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SZS9_Qst--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/17-1024x714.png" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mistral 7B has been released under the Apache 2.0 license, with a reference implementation available for easy deployment on cloud platforms like AWS, GCP, and Azure or locally.&lt;/p&gt;

&lt;p&gt;The paper’s claims are impressive, so I wanted to experience firsthand how good the LLM is. &lt;/p&gt;

&lt;h2&gt;
  
  
  BGE: The open-source Embedding generation model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/BAAI/bge-base-en-v1.5" rel="noopener noreferrer"&gt;BGE embedding&lt;/a&gt; is a general Embedding Model pre-trained using &lt;a href="https://github.com/staoxiao/RetroMAE" rel="noopener noreferrer"&gt;retromae&lt;/a&gt; that can fine-tuned. Interestingly, BGE comes in three dimension sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;small: 384 dimensions&lt;/li&gt;
&lt;li&gt;base: 768 dimensions&lt;/li&gt;
&lt;li&gt;large: 1024 dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that you can reduce your embedding storage by roughly 33-75% relative to 1536-dimension ada v2 embeddings, depending on the BGE model you choose.&lt;/p&gt;
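
&lt;p&gt;As a rough sketch of that reduction, assuming 4 bytes per float dimension (as &lt;code&gt;pgvector&lt;/code&gt;’s vector type stores) and comparing against 1536-dimension ada v2 embeddings:&lt;/p&gt;

```python
# Approximate storage per embedding: 4 bytes per float32 dimension
# (ignoring the small per-row header).
ADA_DIMS = 1536
BGE_DIMS = {"small": 384, "base": 768, "large": 1024}

for name, dims in BGE_DIMS.items():
    saving = 1 - dims / ADA_DIMS
    print(f"bge-{name}: {dims * 4} bytes per vector, {saving:.0%} smaller than ada v2")
```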

&lt;p&gt;I used the base-768 model on &lt;a href="https://developers.cloudflare.com/workers-ai/models/text-embeddings/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt; for Text Embedding generation for this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology of Comparative Analysis
&lt;/h2&gt;

&lt;p&gt;For a hands-on evaluation, I replaced &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; with &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; with &lt;a href="https://github.com/FlagOpen/FlagEmbedding" rel="noopener noreferrer"&gt;BAAI’s bge-base-en-v1.5&lt;/a&gt; in a Neon Docs chatbot. I then compared their responses to eight PostgreSQL-related questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Setup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text generation model&lt;/strong&gt; : Mistral 7B on &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model&lt;/strong&gt; : BGE base-768 on &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment models&lt;/strong&gt; : mistral-7b-instruct-v0.1, gpt-3.5-turbo, text-embedding-ada-002, bge-base-en-v1.5.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloudflare Workers AI allows you to run text, image, and embedding generation models using serverless GPUs. Note that Workers AI is currently in Open Beta and is not recommended for production data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IIojcNzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/AVMuo4G4s2lTxN6fEaTDhc5DoaDcy9R1ifFj2zzH-TE-hvAlgOCLNk_KCdOmTInDjXHx4hPo2X0Dea_q9sNuzndazf4LpupLpa7-UPD5ykA-L-BLgKVCqcEb1Yn6myBi838g_w3nQJWUBVjLsjlCzpw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IIojcNzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/AVMuo4G4s2lTxN6fEaTDhc5DoaDcy9R1ifFj2zzH-TE-hvAlgOCLNk_KCdOmTInDjXHx4hPo2X0Dea_q9sNuzndazf4LpupLpa7-UPD5ykA-L-BLgKVCqcEb1Yn6myBi838g_w3nQJWUBVjLsjlCzpw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested Mistral 7B on a virtual machine with 24GB of vRAM and NVIDIA GPUs for $1.3 per hour. But our friends at Cloudflare released &lt;a href="https://developers.cloudflare.com/workers-ai/" rel="noopener noreferrer"&gt;Workers AI&lt;/a&gt;, a GPU-powered serverless environment for running machine learning models, which better fits my use case.&lt;/p&gt;

&lt;p&gt;For those interested in deploying their own Mistral 7B instance, I added instructions at the end of this article to deploy using the HuggingFace inference endpoint.&lt;/p&gt;

&lt;p&gt;In this analysis, we used the &lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1" rel="noopener noreferrer"&gt;mistralai/Mistral-7B-Instruct-v0.1&lt;/a&gt; model, which has been fine-tuned for conversation and answering questions.&lt;/p&gt;

&lt;p&gt;The default max number of tokens per query is 1512. For this test, I had to increase the max input length and number of tokens to 3000. Note that the larger this value, the more memory each request will consume.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and Discussion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context quality
&lt;/h3&gt;

&lt;p&gt;In the RAG pipeline, the context is the concatenation of texts resulting from a semantic search.&lt;/p&gt;

&lt;p&gt;Therefore, the quality of our context is heavily correlated to the quality of the semantic search using &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; embeddings models. So, the question is: how different would my context be if I switched the text embedding model?&lt;/p&gt;

&lt;p&gt;Our analysis showed that the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models provided similar results 46% of the time. A deeper dive using Jaccard and Cosine similarity scores indicated a significant overlap in contexts, suggesting comparable quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---3jOt7K6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/18-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---3jOt7K6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/18-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the number of returned chunks k=3, the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models return similar results only 46% of the time. This number is reduced to 42% with k=10.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8Ym8nVLH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/19-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8Ym8nVLH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/19-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Further analysis using Jaccard and Cosine similarity scores to determine intersecting words and count shows that half of the contexts generated by the two models are similar and often share words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cdce2p4l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/20-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cdce2p4l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/20-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: In the above, we extracted the word’s frequency count in the context using TF-IDF to calculate the cosine similarity score. Typically, cosine similarity is sensitive to the frequency of words, while Jaccard similarity purely focuses on the intersecting words.&lt;/p&gt;
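
&lt;p&gt;To illustrate the difference between the two metrics, here is a sketch with made-up context strings, using raw term counts rather than full TF-IDF weights to keep it short:&lt;/p&gt;

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def jaccard(a, b):
    # Set overlap of distinct words, insensitive to word frequency.
    a, b = set(tokenize(a)), set(tokenize(b))
    return len(a.intersection(b)) / len(a.union(b))

def cosine(a, b):
    # Word-frequency vectors; the analysis above used TF-IDF weights,
    # which additionally down-weight common words.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Invented stand-ins for two retrieved contexts.
context_ada = "neon is serverless postgres with autoscaling and branching"
context_bge = "neon is serverless postgres with branching and bottomless storage"
print(round(jaccard(context_ada, context_bge), 2))  # 0.7
print(round(cosine(context_ada, context_bge), 2))
```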

&lt;p&gt;Analysis of the completions generated by the &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; model shows strong cosine similarity among texts: completions share an average of 40% of their words. &lt;/p&gt;

&lt;p&gt;Even for questions 3, 5, and 7, where the contexts retrieved with the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; and &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; embedding models were quite similar, the generated texts differed. This variation is likely due to the LLM temperature being left at its default of 0.7, which controls the degree of randomness in the response and allows the generated text to vary between runs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OiQh7pq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/21-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OiQh7pq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/21-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Text generation quality
&lt;/h3&gt;

&lt;p&gt;Quality is subjective. We therefore surveyed Postgres and Neon experts to rate the generated text on a scale of 1 (very bad) to 5 (very good). &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; scored an average of 3.8/5, outperforming &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt;’s 2.5/5.&lt;/p&gt;

&lt;p&gt;The answers shared in the survey were generated using context retrieved with the &lt;code&gt;text-embedding-ada-002&lt;/code&gt; embedding model. The ground truth in this experiment is the Postgres and Neon documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qm5mJcO0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/22-1024x549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qm5mJcO0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://neondatabase.wpengine.com/wp-content/uploads/2023/12/22-1024x549.png" title="Chart" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The BGE model, with its varying dimension sizes, allows for a customizable approach to managing storage and computational resources since smaller vector sizes reduce storage and semantic search query execution time.&lt;/p&gt;
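&lt;p&gt;As a back-of-the-envelope illustration of that storage difference (assuming 4-byte floats and a hypothetical corpus of one million chunks; real tables add per-row and index overhead on top):&lt;/p&gt;

```python
def vector_storage_mib(n_rows: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw storage for n_rows embeddings of the given dimensionality, in MiB."""
    return n_rows * dims * bytes_per_float / (1024 * 1024)

rows = 1_000_000  # hypothetical corpus size
print(vector_storage_mib(rows, 768))   # bge-base-en-v1.5: ~2930 MiB
print(vector_storage_mib(rows, 1536))  # text-embedding-ada-002: ~5859 MiB
```

&lt;p&gt;Halving the dimensionality halves the raw vector storage, and smaller vectors also mean fewer bytes scanned per distance computation.&lt;/p&gt;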

&lt;p&gt;Our analysis revealed that the &lt;code&gt;bge-base-en-v1.5&lt;/code&gt; and &lt;code&gt;text-embedding-ada-002&lt;/code&gt; models, while similar in results to some extent, display distinct characteristics in their context retrieval. The observed differences in the semantic search results (a 46% similarity rate at k=3) underscore the importance of choosing an embedding model based on the specific requirements of an application.&lt;/p&gt;

&lt;p&gt;In the case of our chatbot, these initial results suggest that the output quality wouldn’t change drastically by migrating to &lt;code&gt;bge-base-en-v1.5&lt;/code&gt;. Migrating text generation models, on the other hand, is a different story.&lt;/p&gt;

&lt;p&gt;The Mistral 7B model stands out as a strong contender. Its ability to outperform models like Llama 2 in reasoning, mathematics, and code generation, coupled with its ease of deployment, makes it a viable option for those seeking an alternative to OpenAI’s GPT models.&lt;/p&gt;

&lt;p&gt;However, the performance difference in our tests, with &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; outperforming &lt;code&gt;mistral-7b-instruct-v0.1&lt;/code&gt;, suggests that while newer models like Mistral 7B are closing the gap, there remains room for improvement and innovation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/Trelis/zephyr-7b-beta-function-calling-v2" rel="noopener noreferrer"&gt;Zephyr-7B-beta&lt;/a&gt;, a fine-tuned version of &lt;code&gt;mistralai/Mistral-7B-v0.1&lt;/code&gt; that was trained on a mix of publicly available and synthetic datasets, looks promising and could further reduce the gap.&lt;/p&gt;

&lt;p&gt;What about you? Which models do you use for your RAG pipelines? Join us on &lt;a href="https://neon.tech/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and tell us about your experience with AI models and what you think.&lt;/p&gt;

&lt;p&gt;Note: A special thanks to &lt;a href="https://twitter.com/_StanGirard" rel="noopener noreferrer"&gt;Stan Girard&lt;/a&gt; for inspiring the topic of this article. His suggestion and enthusiasm for AI have been invaluable in shaping this discussion.&lt;/p&gt;

&lt;h2&gt;
  
  
  BONUS: Deploy Mistral 7B Instruct on HuggingFace
&lt;/h2&gt;

&lt;p&gt;For those interested in deploying the model on HuggingFace:&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A HuggingFace account&lt;/li&gt;
&lt;li&gt;A HuggingFace access token&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;On &lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;, locate the “Deploy” drop-down and click on “Inference Endpoints”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---zlRrVtD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/FbMhMxp29AtMFeJHxjNGdOBJR-xi0NpJO-XsQ_FLjaCVojhqIuy5V5io6MT52PWp-hO9ZjrcoqPHukx5JaG3sqpG3VhLUqWiK2RX6X14Kd1EGStd9ekKHCdKb_Bd2SCgeUltLqbT5udE5lUaYtg-cfE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---zlRrVtD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/FbMhMxp29AtMFeJHxjNGdOBJR-xi0NpJO-XsQ_FLjaCVojhqIuy5V5io6MT52PWp-hO9ZjrcoqPHukx5JaG3sqpG3VhLUqWiK2RX6X14Kd1EGStd9ekKHCdKb_Bd2SCgeUltLqbT5udE5lUaYtg-cfE" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Select the default instance with 24GB of vRAM and an NVIDIA GPU, then click on “Create Endpoint” at the bottom of the page.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gRTwVfsY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/8OUlgWH7dqeMSj7oVF2qGINvgkHojnB_muaHRdtpbVeIo_eOXA7BqBE_tXRGYAHCQvvHO-0e-2LiGgvMDBvgt6YSvyU9zHAQEfoNkKsEm11YYIf4Py5E1rwtaNhTE4FoBrxMKszifkg8vamg3yK-SI0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gRTwVfsY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/8OUlgWH7dqeMSj7oVF2qGINvgkHojnB_muaHRdtpbVeIo_eOXA7BqBE_tXRGYAHCQvvHO-0e-2LiGgvMDBvgt6YSvyU9zHAQEfoNkKsEm11YYIf4Py5E1rwtaNhTE4FoBrxMKszifkg8vamg3yK-SI0" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Once your instance is deployed, you can test your endpoint on HuggingFace: &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---_EOwW4y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/XmWChg3tX5vahJWcc7URrTAikfY72WUAFdgJwIJDP_gpbflpTiAVrDLqEq-pm57Fo4YK4fx2p4jLxBZ7LtjRMa9tiLybImF2oY3yPhBFxjQpr2Im4YCjvbfdCMxIB-45WFhRqp4HmTi7vEYFNyhHggs" width="800" height="199"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Or in code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

API_URL = "https://asdfghjklrtyui.us-east-1.aws.endpoints.huggingface.cloud"

headers = {
    "Authorization": "Bearer XXXXXX",
    "Content-Type": "application/json"
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Who are you?",
})

# A: I'm Mistral, a language model trained by the Mistral AI team.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>openai</category>
      <category>embeddings</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
