<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Neurelo</title>
    <description>The latest articles on Forem by Neurelo (@neurelo).</description>
    <link>https://forem.com/neurelo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8821%2F77dbba52-5893-4870-815c-7a36b431b298.png</url>
      <title>Forem: Neurelo</title>
      <link>https://forem.com/neurelo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/neurelo"/>
    <language>en</language>
    <item>
      <title>In the land of LLMs, can we do better mock data generation?</title>
      <dc:creator>Parth Parikh</dc:creator>
      <pubDate>Thu, 12 Sep 2024 15:55:37 +0000</pubDate>
      <link>https://forem.com/neurelo/in-the-land-of-llms-can-we-do-better-mock-data-generation-489</link>
      <guid>https://forem.com/neurelo/in-the-land-of-llms-can-we-do-better-mock-data-generation-489</guid>
      <description>&lt;p&gt;"Lorem ipsum is latin, slightly jumbled, the remnants of a passage from Cicero's _de Finibus_ 1.10.32, which begins 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...' &lt;em&gt;[There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain.]. [de Finibus Bonorum et Malorum, written in 45 BC, is a treatise on the theory of ethics very popular in the Renaissance.]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"What I find remarkable is that this text has been the industry's standard dummy text ever since some printed in the 1500s took a galley of type and scrambled it to make a type specimen book; it has survived not only four centuries of letter-by-letter resetting but even the leap into electronic typesetting, essentially unchanged except for an occasional 'ing' or 'y' thrown in. It's ironic that when the then-understood Latin was scrambled, it became as incomprehensible as Greek; the phrase 'it's Greek to me' and 'greeking' have common semantic roots!"&lt;/p&gt;

&lt;p&gt;       - &lt;em&gt;Richard McClintock, in a &lt;a href="https://nwalsh.com/comp.fonts/FAQ/cf_36.htm" rel="noopener noreferrer"&gt;1994 letter to the editor of Before &amp;amp; After magazine&lt;/a&gt;, describing his discovery of Lorem ipsum’s origin&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Mock data or synthetic data &lt;a href="https://arxiv.org/pdf/2205.03257" rel="noopener noreferrer"&gt;is not a replacement for real data&lt;/a&gt;, but it is a technology with significant utility in testing and development. High-fidelity mock data can considerably accelerate software development lifecycles. Yet, if you look at the history of mock data generation, it has remained fairly stagnant for decades and still needs a lot of work to get right. While we have seen evolutionary progress, the revolution has yet to come.&lt;/p&gt;

&lt;p&gt;So, what would this revolution look like? “High-fidelity mock data that just works!” screamed a random programmer holding a Quackles rubber duck. But what does “high fidelity” mean in this context? I believe what we really want is a technology that can deeply understand a user’s schema and its underlying goals to generate high-fidelity synthetic data for a test database in “one-click”.&lt;/p&gt;

&lt;p&gt;At Neurelo, our goal is to make it way easier to build with databases. Many of our users initially connect to us with an empty data source to evaluate our platform. Others work on new features by taking their current schema, making changes, testing them, and committing the changes once they are satisfied with the schema and its corresponding Neurelo auto-generated and Custom APIs. To test these changes, they usually also start with an empty database. Early on, we realized that to give these users a production-like experience when using these APIs, it was essential for us to intelligently mock data for their empty data sources. And that's exactly what we did.&lt;/p&gt;

&lt;p&gt;In this blog post, I will discuss how we went about creating Neurelo’s “&lt;a href="https://docs.neurelo.com/guides/mock-data-generation" rel="noopener noreferrer"&gt;mock data generation&lt;/a&gt;” technology—the nuts and bolts, gears and cogs, silos and pipelines of it all!&lt;/p&gt;

&lt;p&gt;When we started this project, we had five requirements in mind:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It should be diverse, meaning it should work with all three data sources we support: MongoDB, MySQL, and Postgres.
&lt;/li&gt;
&lt;li&gt;It should generate realistic data based solely on the schema, without requiring any external user input—a “one-click” solution with minimal friction.
&lt;/li&gt;
&lt;li&gt;The cost should be as low as possible, balancing scalability with accuracy.
&lt;/li&gt;
&lt;li&gt;Response time should be very fast.
&lt;/li&gt;
&lt;li&gt;It should be implemented entirely in native Rust.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To understand why it was essential for us to use Rust for this project, let’s take a look at a simplified view of Neurelo’s underlying architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9zboqc7w2ea651p5t8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9zboqc7w2ea651p5t8c.png" alt="A simplified view of Neurelo’s underlying architecture" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Neurelo’s management layer hosts our entire frontend and is primarily responsible for communicating various user-specified actions to our operations plane. The operations plane, in turn, is where the actual execution of jobs takes place. In many cases, it delegates these jobs to be run against our query runners.&lt;/p&gt;

&lt;p&gt;Our query runners are at the heart of Neurelo. They are primarily responsible for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Translating incoming data APIs and generating queries against the database,
&lt;/li&gt;
&lt;li&gt;Introspecting an existing database schema,
&lt;/li&gt;
&lt;li&gt;Managing schema changes over time using our migration tools, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;“Mock Data Generation” acts as another “job” in the operations plane, which eventually gets delegated to our query runners. This is crucial because it allows us to leverage our existing Write API code paths to execute queries against the database without having to build new paths. This approach benefits us in terms of both performance and maintainability. Since our entire query runner is written solely in Rust, we needed a native Rust-based mock data generator.&lt;/p&gt;
&lt;h2&gt;
  
  
  LLMs to the rescue?
&lt;/h2&gt;

&lt;p&gt;Right from the outset, we thought LLMs would be a great fit for this problem and asked ourselves how best to leverage them.&lt;/p&gt;

&lt;p&gt;Our initial approach involved using LLMs to generate Rust code that would dynamically create raw &lt;code&gt;INSERT&lt;/code&gt; queries. However, despite several attempts at prompt tuning, we were unable to achieve reproducible accuracy over multiple iterations. Two significant issues plagued this approach: first, the generated Rust code was not always compilable; second, when it did compile, the quality of the synthetic data was subpar, tending to regress to generic formats like “Movie1,” “Movie2,” etc., for movie names, despite explicit prompt tuning. &lt;/p&gt;

&lt;p&gt;We could have tried extending beyond the zero-shot learning methodology and fine-tuned prompts further, but it didn’t take long for us to realize we were spinning our wheels with this approach and needed a more deterministic foundation.&lt;/p&gt;

&lt;p&gt;But curiosity got the better of us, and we wondered, could we do the same in Python? Based on our understanding, LLMs are particularly effective at writing executable Python code. Additionally, prompting them to leverage Python’s “&lt;a href="https://faker.readthedocs.io/en/master/" rel="noopener noreferrer"&gt;faker&lt;/a&gt;” third-party module could help us mitigate the second issue as well.&lt;/p&gt;

&lt;p&gt;This is when we stumbled upon a rabbit hole! You see, when mocking multiple tables for a database schema, &lt;strong&gt;the order of insertion matters&lt;/strong&gt;!&lt;/p&gt;
&lt;h2&gt;
  
  
  Mad Hatter pulls out the “Order of Insertion”
&lt;/h2&gt;

&lt;p&gt;Say you have three tables, &lt;code&gt;A&lt;/code&gt;, &lt;code&gt;B&lt;/code&gt;, and &lt;code&gt;C&lt;/code&gt;, in a database schema. Table &lt;code&gt;A&lt;/code&gt; has a foreign key, &lt;code&gt;B_fk&lt;/code&gt;, corresponding to table &lt;code&gt;B&lt;/code&gt;’s primary key. Similarly, table &lt;code&gt;B&lt;/code&gt; has a foreign key, &lt;code&gt;C_fk&lt;/code&gt;, corresponding to table &lt;code&gt;C&lt;/code&gt;’s primary key. In this scenario, it is crucial to follow the correct insertion order to maintain the referential integrity enforced by the foreign keys.&lt;/p&gt;

&lt;p&gt;For this, we should first mock table &lt;code&gt;C&lt;/code&gt; since it does not have any foreign keys, meaning there are no referential integrity constraints dependent on it. Next, we can use this data and mock table &lt;code&gt;B&lt;/code&gt;, which is now possible because we have ensured that the referenced records in table &lt;code&gt;C&lt;/code&gt; (via &lt;code&gt;C_fk&lt;/code&gt;) exist. Finally, we can mock table &lt;code&gt;A&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;As you can see from the above example, the generation of &lt;code&gt;INSERT&lt;/code&gt; statements for each table must respect their relationships: the order matters. As the French proverb goes, “Il ne faut pas mettre la charrue avant les bœufs” (loosely translated, “We must not put the cart before the horse”).&lt;/p&gt;

&lt;p&gt;So, let us set aside LLMs for a moment and ask ourselves: given a DB schema, how can we ensure that the order of insertion is always correct? Or, in other words, how do we derive an accurate order of insertion from the DB schema?&lt;/p&gt;

&lt;p&gt;The answer - &lt;a href="https://en.wikipedia.org/wiki/Topological_sorting" rel="noopener noreferrer"&gt;&lt;strong&gt;Topological sorting&lt;/strong&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The first step is to create a &lt;a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph" rel="noopener noreferrer"&gt;directed acyclic graph&lt;/a&gt; (DAG) from the relationships present in our DB schema. A DAG is a type of graph where edges have a direction and there are no cycles. This means that you cannot start at one node and follow a path that eventually loops back to the same node.&lt;/p&gt;

&lt;p&gt;Note the word “acyclic.” &lt;em&gt;A topological ordering is possible if and only if the graph has no directed cycles!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The simplest way to perform topological sorting is by using &lt;a href="https://dl.acm.org/doi/pdf/10.1145/368996.369025" rel="noopener noreferrer"&gt;Kahn’s algorithm&lt;/a&gt;. The idea is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start by calculating the in-degree (number of incoming edges) for each node in the graph.&lt;/li&gt;
&lt;li&gt;Next, enqueue all nodes with an in-degree of &lt;code&gt;0&lt;/code&gt;, meaning nodes with no incoming edges.&lt;/li&gt;
&lt;li&gt;While the queue is not empty:

&lt;ol&gt;
&lt;li&gt;Dequeue a node &lt;code&gt;N&lt;/code&gt; and add it to the topological sort’s result.
&lt;/li&gt;
&lt;li&gt;For each of &lt;code&gt;N&lt;/code&gt;’s neighboring nodes, reduce its in-degree by &lt;code&gt;1&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;If any neighboring nodes’ in-degree becomes &lt;code&gt;0&lt;/code&gt;, enqueue it.&lt;/li&gt;
&lt;li&gt;Continue until the queue is empty.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The order in which the nodes are added to the topological sort’s result represents the topological sort of the DAG.&lt;/p&gt;
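&lt;p&gt;The steps above can be sketched as follows (a minimal Python illustration of Kahn’s algorithm; our production implementation is in Rust):&lt;/p&gt;

```python
from collections import deque

def insertion_order(tables, fks):
    """Derive a valid insertion order via Kahn's algorithm.

    tables: list of table names.
    fks: list of (child, parent) pairs, i.e. `child` has a foreign
         key referencing `parent`, so `parent` must be inserted first.
    """
    # Edge parent -> child: parent must appear before child.
    children = {t: [] for t in tables}
    in_degree = {t: 0 for t in tables}
    for child, parent in fks:
        children[parent].append(child)
        in_degree[child] += 1

    # Start from tables with no incoming edges (no FK pointing at them).
    queue = deque(t for t in tables if in_degree[t] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in children[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(tables):
        raise ValueError("cycle detected: topological order impossible")
    return order
```

&lt;p&gt;For the earlier &lt;code&gt;A&lt;/code&gt;/&lt;code&gt;B&lt;/code&gt;/&lt;code&gt;C&lt;/code&gt; schema, &lt;code&gt;insertion_order(["A", "B", "C"], [("A", "B"), ("B", "C")])&lt;/code&gt; yields &lt;code&gt;["C", "B", "A"]&lt;/code&gt;.&lt;/p&gt;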

&lt;p&gt;Here’s an example to demonstrate this:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw45uzei0bri924wl62l3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw45uzei0bri924wl62l3.png" alt="Example to demonstrate how Kahn's algorithm can be used to create an insertion order for Mock data generation." width="800" height="1134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;“But what if my schema has recursion in it?” a curious George asks. Well, as you might have noticed, cycles complicate Kahn’s algorithm. Imagine you had a cycle involving Authors. In such a scenario, none of the in-degrees would start at zero. Moreover, if the cycle were on Books, this would create complications when reducing the in-degrees of the neighboring nodes by 1, potentially leading to a situation where not all nodes are traversed.&lt;/p&gt;

&lt;p&gt;If you think about it, this limitation can actually be a blessing in disguise. After Kahn’s algorithm finishes, if there are still nodes left in the graph that never reached an in-degree of &lt;code&gt;0&lt;/code&gt; (i.e., the sorted result contains fewer nodes than the graph), we can be certain that cycles exist.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stackoverflow.com/questions/14791309/is-it-ok-to-have-cyclic-foreign-key-dependencies-in-a-relational-database?rq=3" rel="noopener noreferrer"&gt;If cyclic relationships are essential&lt;/a&gt;, one way to handle them is by breaking cycles using &lt;code&gt;NULL&lt;/code&gt; values. The first step is to identify where the cycles are occurring, which can be done using &lt;a href="https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm" rel="noopener noreferrer"&gt;Tarjan's strongly connected components algorithm&lt;/a&gt;. When inserting mock data, we can temporarily insert &lt;code&gt;NULL&lt;/code&gt; values in the foreign key fields. After all the mock data has been inserted for each strongly connected component, we can update the &lt;code&gt;NULL&lt;/code&gt; foreign keys with the actual mock data values. Note that this assumes the foreign key fields can be nullable. There are other pitfalls as well, such as what happens if the table has database triggers (like &lt;code&gt;AFTER INSERT&lt;/code&gt;) that depend on foreign key relationships. Inserting &lt;code&gt;NULL&lt;/code&gt; values should not trigger incorrect behavior in such scenarios. Composite keys add another layer of complexity to this. All of these are things we needed to handle.&lt;/p&gt;
&lt;h2&gt;
  
  
  Into the Mine Fields of Mock Data Generation
&lt;/h2&gt;

&lt;p&gt;So, how do we go about creating our “mock data generation” technology? As we’ve seen, generating dynamic code using LLMs that in turn dynamically generates raw &lt;code&gt;INSERT&lt;/code&gt; queries does not seem to be the solution—regardless of the language! However, there are a few positives we gained from this experimentation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leveraging a “faker”-like module can help us generate better quality mock data, elevating us from the generic world of “Name1” to “SpongeBob SquarePants.”
&lt;/li&gt;
&lt;li&gt;We now have a way to derive an insertion order from our schema.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Returning to the drawing board, we chalked up an interesting idea: instead of using LLMs to generate data, how about using them to predict which method to select from a list of methods that faker-like modules expose? For example, a &lt;code&gt;(Column, ColumnType)&lt;/code&gt; of &lt;code&gt;(hashval, string)&lt;/code&gt; can be mapped to “&lt;code&gt;md5()&lt;/code&gt;”, and a &lt;code&gt;(language, string)&lt;/code&gt; can be mapped to “&lt;code&gt;language_name()&lt;/code&gt;”. This approach has its caveats, but it’s a perfect starting point. It’s a fast and low-cost solution.&lt;/p&gt;
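&lt;p&gt;A minimal sketch of this idea in Python (the generator methods and the classification mapping below are hypothetical stand-ins for a faker-like module and the LLM’s output):&lt;/p&gt;

```python
import hashlib
import random

# Hypothetical subset of faker-like generator methods.
GENERATORS = {
    "md5": lambda: hashlib.md5(str(random.random()).encode()).hexdigest(),
    "language_name": lambda: random.choice(["English", "French", "Hindi"]),
    "word": lambda: random.choice(["alpha", "beta", "gamma"]),
}

# Classification as the LLM might return it: each (column, type)
# tuple is mapped to the name of a generator method, not to data.
classification = {
    ("hashval", "string"): "md5",
    ("language", "string"): "language_name",
}

def mock_value(column, col_type):
    # Fall back to a generic generator when no classification exists.
    method = classification.get((column, col_type), "word")
    return GENERATORS[method]()
```

&lt;p&gt;The LLM is only asked to pick a method name per column; the actual values come from deterministic generator code, which keeps the output format reliable.&lt;/p&gt;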

&lt;p&gt;To implement this, we leveraged Neurelo’s JSON-schema spec, which we call &lt;a href="https://docs.neurelo.com/neurelo-schema-language-nsl" rel="noopener noreferrer"&gt;Neurelo Schema Language (NSL)&lt;/a&gt;. Once a user introspects their data source with Neurelo, we use this NSL spec throughout to represent their schema. &lt;/p&gt;

&lt;p&gt;This schema can now become the anchor for us to build our mock data generation logic. Here’s what our 1.0 “Mock Data Generation” logic looked like:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F342yr97blb1rbreyj3um.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F342yr97blb1rbreyj3um.png" alt="Neurelo's 1.0 “Mock Data Generation” pipeline" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The LLM prompt itself classifies an array of tuples of the form &lt;code&gt;(column name, column type)&lt;/code&gt; into a JSON dictionary where the classification results are present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="err"&gt;table_name_&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="err"&gt;column_name_&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mapped_method_&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="err"&gt;column_name_&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mapped_method_&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="err"&gt;table_name_&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="err"&gt;column_name_&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mapped_method_&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="err"&gt;.....&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we couldn’t find any suitable alternative to &lt;a href="https://faker.readthedocs.io/en/master/" rel="noopener noreferrer"&gt;Python’s faker module&lt;/a&gt; in Rust, we ended up writing an equivalent in Rust.&lt;/p&gt;

&lt;p&gt;Tangentially, we encountered an interesting issue while implementing this Rust-based faker module. We faced a scenario where the entire module would suddenly panic. Uncertain about the cause, we began narrowing down the data source and its affected versions, discovering that only systems running MySQL versions earlier than 5.6.4 were affected. This led us to the infamous &lt;a href="https://en.wikipedia.org/wiki/Year_2038_problem" rel="noopener noreferrer"&gt;Year 2038 problem&lt;/a&gt; associated with 32-bit systems. By constraining our mocked timestamp range to between &lt;code&gt;1970-01-01&lt;/code&gt; and &lt;code&gt;2038-01-19&lt;/code&gt;, we resolved the issue. As my colleague George aptly put it, dealing with anything closely tied to timestamps is a direct line to a mental institution.&lt;/p&gt;
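&lt;p&gt;In Python terms, the fix amounts to something like this (a sketch; our actual faker module is in Rust):&lt;/p&gt;

```python
import random
from datetime import datetime, timezone

# MySQL versions earlier than 5.6.4 store TIMESTAMP as a signed
# 32-bit epoch value, so we constrain mocked timestamps to the
# safe range instead of the full datetime domain.
MIN_TS = 0            # 1970-01-01 00:00:00 UTC
MAX_TS = 2**31 - 1    # 2038-01-19 03:14:07 UTC

def mock_timestamp():
    ts = random.randint(MIN_TS, MAX_TS)
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```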

&lt;h2&gt;
  
  
  What’s up with References?
&lt;/h2&gt;

&lt;p&gt;Here is yet another engineering problem: how do we ensure that foreign keys and primary keys are correctly mapped to each other? Moreover, how do we make this work for NoSQL databases such as MongoDB (with object id references)?&lt;/p&gt;

&lt;p&gt;One approach is to keep a global index counter whose value corresponds to the row number being mocked. This can be directly useful when auto-increment is enabled for a primary key.&lt;/p&gt;

&lt;p&gt;For most scenarios, to ensure that foreign keys and primary keys match correctly while mocking data, one can use the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Track mock data: Use a dictionary to keep track of the mock data for all tables related to the current row you’re working on.
&lt;/li&gt;
&lt;li&gt;Mock each table:

&lt;ol&gt;
&lt;li&gt;For each table, generate mock data based on its columns and their types.
&lt;/li&gt;
&lt;li&gt;Identify and set the foreign key and primary key relationships so they match in your dictionary.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Store mock data: Save the generated mock data for each table in your dictionary.&lt;/li&gt;

&lt;/ol&gt;
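&lt;p&gt;A simplified Python sketch of the steps above (the helper and its arguments are hypothetical; it assumes one auto-incremented primary key per table and mocks one row at a time):&lt;/p&gt;

```python
import itertools

# Global counter standing in for an auto-increment primary key.
_next_id = itertools.count(1)

def mock_row(table, columns, generated):
    """Mock one row, wiring foreign keys to already-mocked rows.

    columns: list of (name, fk_table) pairs; fk_table is None for
    ordinary columns. `generated` is the tracking dictionary that
    maps table -> its most recently mocked row.
    """
    row = {"id": next(_next_id)}
    for name, fk_table in columns:
        if fk_table is not None:
            # Point the FK at the primary key of the referenced row,
            # which must already exist thanks to the insertion order.
            row[name] = generated[fk_table]["id"]
        else:
            row[name] = f"{table}_{name}_{row['id']}"
    generated[table] = row
    return row
```

&lt;p&gt;Mocking &lt;code&gt;C&lt;/code&gt;, then &lt;code&gt;B&lt;/code&gt;, then &lt;code&gt;A&lt;/code&gt; through this helper guarantees each foreign key lands on a primary key that already exists in the tracking dictionary.&lt;/p&gt;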

&lt;p&gt;However, this approach cannot be directly applied to MongoDB, as it supports implicit referencing. To address this, we used another layer of LLM! For each MongoDB collection corresponding to a database, we identify properties that contain an &lt;code&gt;ObjectId&lt;/code&gt; and, by default, treat them as part of a “relation.” The LLM prompt then performs another classification to convert these: &lt;code&gt;{collection_name1: [relation1, relation2, ….., relationN], …..}&lt;/code&gt; into a JSON dictionary of &lt;code&gt;{relationX: referencing_collection_nameY, relationA: referencing_collection_nameB, …..}&lt;/code&gt; If a relation cannot be accurately mapped to a collection name, we simply discard it by assigning it a JSON &lt;code&gt;null&lt;/code&gt; value. We then use these output references to ensure that the &lt;code&gt;ObjectId&lt;/code&gt; values match each other.&lt;/p&gt;
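&lt;p&gt;A toy sketch of how the resulting mapping might be applied (the relation and collection names are hypothetical):&lt;/p&gt;

```python
# Hypothetical output of the second LLM classification pass: each
# ObjectId-holding property is mapped to the collection it
# references, or None when no confident match exists.
relation_map = {
    "author_id": "authors",
    "publisher_ref": "publishers",
    "misc_oid": None,  # discarded: no confident mapping
}

def resolve_references(doc, mocked):
    """Replace ObjectId placeholder fields with the _id values of
    already-mocked documents from the referenced collections."""
    resolved = dict(doc)
    for field, collection in relation_map.items():
        if field in resolved and collection is not None:
            resolved[field] = mocked[collection]["_id"]
    return resolved
```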

&lt;p&gt;Et voilà! Referencing just got a pair of Ray-Bans! ;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Ahem, ahem … Unique Constraints?
&lt;/h2&gt;

&lt;p&gt;We used a similar strategy for enforcing unique constraints. However, we encountered an interesting issue when mocking large amounts of data for a foreign key with a unique constraint. Because the LLM maps such a column to a faker method, and these methods can sometimes produce duplicate values, unique constraint violations can occur. For example, if "zip code" is a foreign key generated by a pseudo-random method, duplicates are likely even with just 1,000 rows (zip codes are not very long), causing the constraint to fail.&lt;/p&gt;

&lt;p&gt;This issue triggers a chain reaction: since the current table isn’t populated, subsequent tables that depend on it also fail. The root cause is that some columns simply cannot supply the requested number of distinct values, and the same limit applies to tables themselves. One solution is to select unique values from a pool of pre-generated, distinct data rather than generating them on the fly. However, this approach might not be ideal due to memory consumption.&lt;/p&gt;
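&lt;p&gt;A minimal sketch of the pool-based approach (hypothetical helper; in practice, the pool size also bounds how many rows can safely be mocked):&lt;/p&gt;

```python
import random

def unique_sample(pool, n):
    """Draw n distinct values for a unique column from a
    pre-generated pool of distinct candidates. If the pool is
    smaller than n, the row count is capped rather than risking
    a unique-constraint violation downstream."""
    if n > len(pool):
        n = len(pool)
    # random.sample never repeats elements, unlike repeated
    # pseudo-random generation.
    return random.sample(pool, n)
```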

&lt;p&gt;We have resolved these challenges to a large extent, but there is still more to be done which is something we are actively working on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Even the best tailors get it wrong
&lt;/h2&gt;

&lt;p&gt;All of this work proved to be a solid start for us, and we did manage to ship a pretty good mock data generator into production and test it with our early users. However, it wasn’t long before we started noticing the familiar weed that has previously poisoned many AI models—“&lt;strong&gt;overfitting&lt;/strong&gt;”!&lt;/p&gt;

&lt;p&gt;You see, one shortcoming of this implementation is that the quality of the mock data is directly tied to the accuracy of our LLM classification pipeline. We often encounter a scenario with &lt;code&gt;(Column, ColumnType)&lt;/code&gt; as &lt;code&gt;(name, string)&lt;/code&gt;, which is classified as a “name.” However, this can be a red herring: the column might belong to a “Movie” table, a “mediaType” table, or even a “Brand” table. So our next lesson was that integrating table names into the pipeline is absolutely essential.&lt;/p&gt;

&lt;p&gt;However, even with table names integrated, there are scenarios where we can’t realistically map them to one of our faker modules. Although we previously implemented around 217 faker modules in Rust, this proved to be a drop in the ocean. Customers can present scenarios like &lt;code&gt;(film, description, string)&lt;/code&gt; or &lt;code&gt;(posts, title, string)&lt;/code&gt; and expect us to go beyond the age-old “Lorem Ipsum.” To address this, we developed a novel solution called the “&lt;strong&gt;Genesis Point Strategy&lt;/strong&gt;.”&lt;/p&gt;

&lt;p&gt;The underlying motivation for the Genesis Point Strategy is that while we would like to leverage LLMs to intelligently generate this fake data, doing so can be both time-consuming and costly. So, how can we achieve this in a cheap, fast, and intelligent way?&lt;/p&gt;

&lt;p&gt;In comes the power of a “Cross product”! The idea is that if we want to mock 1,000 rows with the constraint that all rows must be unique, we don’t need to generate 1,000 unique values from GPT. Instead, we need only &lt;code&gt;ceil(sqrt(1000)) = 32&lt;/code&gt; fake data elements. We can then duplicate these into two sets, &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt;, and use the cross product &lt;code&gt;A×B&lt;/code&gt; to produce our final mocked data elements. While this may not be perfectly realistic, it should suffice for most practical purposes.&lt;/p&gt;
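&lt;p&gt;A quick sketch of the cross-product expansion (hypothetical helper names):&lt;/p&gt;

```python
import itertools
import math

def cross_product_values(seeds, n):
    """Expand a small set of LLM-generated seed values into n unique
    combined values via a cross product of the seeds with themselves."""
    pairs = itertools.product(seeds, seeds)
    # Joining distinct seeds with a separator keeps every pair unique.
    return [f"{a} {b}" for a, b in itertools.islice(pairs, n)]

# For 1,000 unique rows we only need ceil(sqrt(1000)) = 32 seeds,
# since 32 x 32 = 1024 >= 1000.
k = math.ceil(math.sqrt(1000))
```

&lt;p&gt;So a single cheap LLM call for 32 values covers a thousand unique rows.&lt;/p&gt;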

&lt;p&gt;Note that we don’t discard the older pipeline. Instead, we use zero-shot prompting to have our LLM classify “would-be-overfitted” columns into a “NoneOfTheAbove” category. We then only need to parse and generate mock data for the columns that fall into this “NoneOfTheAbove” classification.&lt;/p&gt;

&lt;p&gt;Here’s the revised iteration for our 2.0 “Mock Data Generation” logic:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zkov0dph3z6y44nwblv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zkov0dph3z6y44nwblv.png" alt="Neurelo's 2.0 “Mock Data Generation” pipeline" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that’s it, folks! This was a summary of our mock data generation pipeline. Our users can now enjoy one-click, high-fidelity mock data generation based on their schemas for their development and testing needs. &lt;/p&gt;

&lt;p&gt;The future holds exciting possibilities. With the lessons we have learned from these iterations, we are looking forward to tackling more complex challenges, whether it is further optimizing for unique constraints, supporting composite types and multi schemas, or integrating more cost-driven LLM strategies. One thing is clear though, the revolution in mock data generation is underway, and we are thrilled to be leading the charge.&lt;/p&gt;

&lt;p&gt;We encourage you to explore this feature and more at &lt;a href="http://dashboard.neurelo.com" rel="noopener noreferrer"&gt;dashboard.neurelo.com&lt;/a&gt;. To learn more about how you can use our platform, be sure to check out &lt;a href="https://www.neurelo.com/" rel="noopener noreferrer"&gt;our website&lt;/a&gt; and our &lt;a href="https://docs.neurelo.com/tutorials" rel="noopener noreferrer"&gt;getting-started tutorials&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>mongodb</category>
      <category>database</category>
    </item>
    <item>
      <title>Top 8 Managed Postgres Providers</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 07 Aug 2024 19:23:00 +0000</pubDate>
      <link>https://forem.com/neurelo/top-8-managed-postgres-providers-gab</link>
      <guid>https://forem.com/neurelo/top-8-managed-postgres-providers-gab</guid>
      <description>&lt;p&gt;&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt;, often called Postgres, is a very strong and open-source object-relational database system. It's reliable, keeps data safe, has many useful features, and can grow with more data easily. It has become quite popular due to its ability to manage huge amounts of data while supporting &lt;a href="https://www.postgresql.org/docs/current/datatype.html" rel="noopener noreferrer"&gt;advanced data types&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/external-pl.html" rel="noopener noreferrer"&gt;multiple programming interfaces&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;PostgreSQL is also very flexible: developers can define their own functions and operators and create custom data types. This is why PostgreSQL is a popular choice for many kinds of applications, from small mobile apps to large enterprise systems.&lt;/p&gt;

&lt;p&gt;It follows &lt;a href="https://www.w3schools.com/sql/" rel="noopener noreferrer"&gt;SQL&lt;/a&gt; standards closely and also supports JSON and other &lt;a href="https://www.neurelo.com/post/sql-vs-nosql" rel="noopener noreferrer"&gt;NoSQL&lt;/a&gt; features, making it useful for both structured and semi-structured data.&lt;/p&gt;

&lt;p&gt;Handling databases like PostgreSQL can be quite complicated, which is why managed database services have become popular. These services let businesses offload the difficult work of setting up, maintaining, securing, and scaling their databases to outside providers.&lt;/p&gt;

&lt;p&gt;Managed services provide high availability, disaster recovery, and the ability to scale up or down as needed, which is essential for handling unexpected workloads and keeping services running without interruption. They also offer strong security features, such as regular updates and patches, to protect the database from vulnerabilities.&lt;/p&gt;

&lt;p&gt;In this post, you'll learn about managed PostgreSQL services: their advantages and key characteristics, along with a complete view of the top providers in the market. The goal is to help you choose the most suitable managed Postgres service.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Managed Postgres?
&lt;/h2&gt;

&lt;p&gt;Managed PostgreSQL services delegate almost all &lt;a href="https://www.neurelo.com/post/database-observability-introduction" rel="noopener noreferrer"&gt;database&lt;/a&gt; management tasks to a third-party provider. These services cover setting up, running, and maintaining PostgreSQL databases in cloud environments. The main aim is to free client organizations from difficult database administration tasks so they can focus on their core business activities while making sure their databases perform well, stay secure, and remain highly available.&lt;/p&gt;

&lt;p&gt;Using a managed PostgreSQL service has many advantages. A major one is automated backups, which protect data and simplify recovery when something goes wrong. Another is scalability: companies can adjust resources to match demand without manual intervention, which saves money and makes operations more flexible.&lt;/p&gt;

&lt;p&gt;Regular updates and maintenance, done by the service provider, ensure that the system stays safe and works well. This allows internal teams to concentrate on main business tasks instead of managing databases.&lt;/p&gt;

&lt;p&gt;Managing a PostgreSQL database yourself can demand significant resources: you need in-house expertise to set up the system, keep it maintained, and troubleshoot problems as they arise. Managed services reduce this burden with automated tooling and professional support, which frequently results in a lower total cost of ownership and improved reliability.&lt;/p&gt;


&lt;h2&gt;
  
  
  Key Features of Managed Postgres Providers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;—Managed PostgreSQL services let databases grow with the business. Resources can scale automatically with demand, which helps when workloads fluctuate and data volumes grow, with no manual adjustment required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;—Managed services continuously monitor database performance, tracking key &lt;a href="https://www.qlik.com/us/kpi" rel="noopener noreferrer"&gt;performance indicators (KPIs)&lt;/a&gt; to find and fix problems that could affect throughput or responsiveness before they escalate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;—Providers implement strong security measures, such as network protection, data encryption, and access controls, to safeguard sensitive information. Regulatory compliance is also handled so that data management meets legal requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Backup and recovery&lt;/strong&gt;—Regular backups keep data safe, with recovery plans ready if data is lost or corrupted. Dedicated disaster recovery plans minimize downtime and data loss when major failures occur.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;—Managed PostgreSQL services can save money because you don't need dedicated database experts on staff. Pricing is typically subscription-based, so monthly costs are predictable and cover infrastructure, software licenses, and ongoing maintenance, lowering both capital and operating expenses. Easily scalable resources also keep spending aligned with actual needs, avoiding investment in capacity you don't use.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top Managed Postgres Providers
&lt;/h2&gt;

&lt;p&gt;Navigating the world of managed Postgres services is like entering a never-ending carnival with endless options. Over 5,000 forks of PostgreSQL on GitHub indicate developers' eagerness to tinker and experiment. With each new version, a fresh batch of managed service providers emerges, each boasting unique features.&lt;/p&gt;

&lt;p&gt;With such a big and always-growing ecosystem, making an exact list of the best providers is really hard. Factors such as scalability, cost, support, security features, and additional services (like backups, monitoring tools, and automation resources) can greatly influence which provider is most suitable for a specific situation. Therefore, it’s more important to look for the best match rather than seeking out one "best" provider that fits all needs.&lt;/p&gt;

&lt;p&gt;Let’s take a quick look at some of the “top” providers:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Supabase
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; is a platform for backend-as-a-service that provides an open-source option to &lt;a href="https://firebase.google.com/" rel="noopener noreferrer"&gt;Firebase&lt;/a&gt;. It uses PostgreSQL as the database part and aims to make backend development easier with a collection of built-in tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Real-time data synchronization&lt;/li&gt;
&lt;li&gt;Auto-generated APIs&lt;/li&gt;
&lt;li&gt;Authentication services&lt;/li&gt;
&lt;li&gt;Interface for &lt;a href="https://www.neurelo.com/post/simplifying-database-programming-with-neurelo" rel="noopener noreferrer"&gt;database&lt;/a&gt; management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;The platform's real-time capabilities let developers add interactive features with ease. This is a good fit for projects that need to move quickly while keeping the solid foundation PostgreSQL provides.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;Supabase ranks highly because its developer-friendly tools and features speed up backend development while keeping the flexibility of open source. It's a natural fit for startups and projects that need to build and launch products quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Aiven
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aiven.io/" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; provides managed cloud service for PostgreSQL, making sure databases run smoothly, safely, and can grow easily on different cloud platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-cloud support&lt;/li&gt;
&lt;li&gt;Automatic backups&lt;/li&gt;
&lt;li&gt;Strong security focus, including private networking and encryption&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;Aiven runs on multiple clouds (AWS, Google Cloud, Azure), offering flexibility and resilience. This makes it a good fit for companies that need robust disaster recovery plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;Aiven is popular for its strong security features and its availability across cloud platforms, which helps businesses that prioritize data protection and compliance avoid lock-in to a single cloud provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Amazon RDS for PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/rds/postgresql/" rel="noopener noreferrer"&gt;Amazon RDS&lt;/a&gt; is a managed service for relational databases that makes PostgreSQL setup, scaling, and management automatic. This lets developers concentrate on creating applications instead of handling database tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simple scaling&lt;/li&gt;
&lt;li&gt;Automatic backups&lt;/li&gt;
&lt;li&gt;Deep integration with other &lt;a href="https://docs.neurelo.com/guides/provisioning-cloud-databases-for-using-with-neurelo/postgresql/aws-rds-postgresql?_gl=1*evwq1h*_gcl_au*MTg2MjkyODU1OS4xNzIyMDE5ODM3" rel="noopener noreferrer"&gt;AWS&lt;/a&gt; services&lt;/li&gt;
&lt;li&gt;Maintenance tools such as automated patch management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;The service integrates tightly with the AWS ecosystem, adding capabilities such as automatic scaling and managed backups that matter for large applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;Its seamless fit with AWS and the strength of the underlying infrastructure justify its top position, especially for companies already on AWS that need high scalability and reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Google Cloud SQL for PostgreSQL
&lt;/h2&gt;

&lt;p&gt;This is &lt;a href="https://cloud.google.com/sql/docs/postgres" rel="noopener noreferrer"&gt;Google's managed service&lt;/a&gt; for databases that makes it easier to set up, maintain, and manage PostgreSQL databases on Google Cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Provides high availability&lt;/li&gt;
&lt;li&gt;Automatic failover to a standby instance&lt;/li&gt;
&lt;li&gt;Tight integration with Google Cloud services&lt;/li&gt;
&lt;li&gt;Strong data encryption&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;Built on Google's infrastructure, it delivers low latency and high throughput, and it connects well with Google's analytics and machine learning services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;This service is best suited for organizations that work with big data and AI, taking advantage of Google's cutting-edge analytics and machine learning features integrated directly into their database management systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Microsoft Azure Database for PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/products/postgresql" rel="noopener noreferrer"&gt;Microsoft Azure Database&lt;/a&gt; is a completely managed PostgreSQL database service made to manage important workloads with included high availability and ability to grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Automatic performance tuning&lt;/li&gt;
&lt;li&gt;High availability&lt;/li&gt;
&lt;li&gt;Seamless integration with Azure services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;It offers enterprise-grade security, strict regulatory compliance, and high performance, making it a strong choice for large companies that already use other Azure services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;The deep integration with Microsoft's ecosystem and the provision of significant scalability and security features make it a favored option for enterprises needing strong data management capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Heroku Postgres
&lt;/h2&gt;

&lt;p&gt;Available as a fully managed service within the &lt;a href="https://www.heroku.com/" rel="noopener noreferrer"&gt;Heroku cloud platform&lt;/a&gt;, &lt;a href="https://www.heroku.com/postgres" rel="noopener noreferrer"&gt;Heroku Postgres&lt;/a&gt; makes application development easier by combining data services directly with app creation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Features such as fork and follow&lt;/li&gt;
&lt;li&gt;Data rollbacks&lt;/li&gt;
&lt;li&gt;High availability setups&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;Integration with Heroku’s developer tools and workflows makes building and deploying apps feel seamless, so managing the database becomes a natural part of creating the app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;It's very easy to use and fits well with Heroku’s development platform, which makes it especially attractive to developers and smaller businesses that favor agile workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. DigitalOcean Managed Databases for PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/products/managed-databases-postgresql" rel="noopener noreferrer"&gt;DigitalOcean&lt;/a&gt; offers an easy and affordable managed database service designed for developers and small-to-medium businesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simple scaling options&lt;/li&gt;
&lt;li&gt;Automated backups&lt;/li&gt;
&lt;li&gt;A user-friendly management interface&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;Known for its simple, developer-friendly platform, it offers a good balance of performance and ease of use without the complexity of the larger cloud providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;The provider's emphasis on ease of use and low cost makes it appealing to startups and smaller businesses that need a reliable database solution without large expenses.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Rackspace Managed PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.rackspace.com/docs/managed-databases" rel="noopener noreferrer"&gt;Rackspace&lt;/a&gt; provides very flexible PostgreSQL hosting services, focusing on personalized setups and high-level customer support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;24/7 customer support&lt;/li&gt;
&lt;li&gt;Custom configurations tailored to specific business needs&lt;/li&gt;
&lt;li&gt;Proactive performance management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Special Advantages
&lt;/h2&gt;

&lt;p&gt;The focus on custom solutions and dedicated support means businesses get a PostgreSQL environment designed for their specific needs, with expert help along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Justification for Ranking
&lt;/h2&gt;

&lt;p&gt;Rackspace suits businesses that need extensive customization and hands-on assistance, making it a good fit for companies looking beyond standard, off-the-shelf solutions.&lt;/p&gt;

&lt;p&gt;Though these providers often get attention, other services such as &lt;a href="https://scalegrid.io/" rel="noopener noreferrer"&gt;ScaleGrid&lt;/a&gt;, &lt;a href="https://www.crunchydata.com/" rel="noopener noreferrer"&gt;Crunchy Data&lt;/a&gt;, &lt;a href="https://www.ibm.com/products/databases-for-postgresql" rel="noopener noreferrer"&gt;IBM Cloud Databases for PostgreSQL&lt;/a&gt;, and &lt;a href="https://www.elephantsql.com/" rel="noopener noreferrer"&gt;ElephantSQL&lt;/a&gt; also provide useful features. Each one meets different needs, including flexibility, security, or ease of use. This makes them good options to think about for certain business purposes or smaller projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;After reading this post, you're aware of several options for managed PostgreSQL services. These services simplify database operations with features like automatic backups, scalability, and regular updates. They are especially valuable for companies that want to streamline how they work and spend more time on core development instead of database upkeep.&lt;/p&gt;

&lt;p&gt;You've seen top providers such as Supabase, Aiven, and Amazon RDS for PostgreSQL, among others, selected because they meet key criteria and have a solid standing in the market. Each has its own strengths that suit different business requirements and technology stacks. Choosing the right managed PostgreSQL provider should come down to your business's specific needs, budget, and desired technical features.&lt;/p&gt;

</description>
      <category>database</category>
      <category>api</category>
      <category>architecture</category>
      <category>sql</category>
    </item>
    <item>
      <title>Raw SQL or ORM. Which one is better?</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 24 Jul 2024 01:13:23 +0000</pubDate>
      <link>https://forem.com/neurelo/raw-sql-or-orm-which-one-is-better-1pp2</link>
      <guid>https://forem.com/neurelo/raw-sql-or-orm-which-one-is-better-1pp2</guid>
      <description>&lt;p&gt;In the world of database management, there are two main approaches that developers often grapple with: using raw SQL queries or utilizing an ORM (Object Relational Mapper). Both methods have their strengths and weaknesses, leaving developers wondering which one is ultimately better. You’ll hear lots of opinions about both approaches. In this article, we will delve into the intricacies of raw SQL and ORM, analyze their pros and cons, and explore the scenarios where one might be preferred over the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Raw SQL and ORM
&lt;/h2&gt;

&lt;p&gt;Before we can compare the two approaches, it's crucial to have a clear understanding of what each entails. Raw SQL refers to writing SQL queries directly in the code, while an &lt;a href="https://www.neurelo.com/post/what-is-an-orm" rel="noopener noreferrer"&gt;ORM&lt;/a&gt; like SQLAlchemy, Active Record, or Drizzle is a software library that maps objects to relational database tables. &lt;/p&gt;

&lt;p&gt;When opting for raw SQL, developers have more control over the queries executed against the database. This control allows for complex queries and fine-tuning performance. On the other hand, an ORM acts as an abstraction layer, making database interactions more intuitive and reducing the need for repetitive and verbose SQL code.&lt;/p&gt;

&lt;p&gt;Raw SQL queries can be highly efficient when dealing with complex database operations that require specific optimizations. By directly writing SQL statements, developers can leverage database-specific features and optimizations that might not be easily achievable through an ORM. This level of control is particularly useful in scenarios where performance is a critical factor and the query logic needs to be finely tuned to meet strict requirements.&lt;/p&gt;

&lt;p&gt;On the contrary, ORMs provide a higher level of abstraction that simplifies the interaction with the database. By mapping database tables to object-oriented models, developers can work with familiar programming constructs, such as classes and objects, instead of dealing with raw SQL syntax. This abstraction not only enhances code readability but also promotes rapid development by reducing the amount of boilerplate SQL code that needs to be written and maintained.&lt;/p&gt;
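
&lt;p&gt;The contrast can be sketched in a few lines of Python with the standard-library sqlite3 module. The tiny User class below is invented for illustration and is not a real ORM API; libraries like SQLAlchemy do far more, but the mapping idea is the same.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Raw SQL: full control, but SQL strings live in application code.
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

# ORM-style: a minimal mapping from a class to the users table.
class User:
    def __init__(self, name):
        self.name = name

    def save(self):
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (self.name,))
        self.id = cur.lastrowid

    @staticmethod
    def find(user_id):
        row = conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        user = User(row[1])
        user.id = row[0]
        return user

u = User("Grace")
u.save()
print(User.find(u.id).name)  # -> Grace
```

Even this toy mapper shows the trade: callers work with objects and never touch SQL, at the cost of hiding the exact queries being run.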

&lt;h2&gt;
  
  
  Pros and Cons of Using Raw SQL
&lt;/h2&gt;

&lt;p&gt;Raw SQL can be a powerful tool for developers who are well-versed in SQL and want maximum control over their database interactions. It allows for highly optimized queries, resulting in faster execution times. Additionally, raw SQL grants developers the flexibility to work with specific database features and syntax that may not be supported by an ORM.&lt;/p&gt;

&lt;p&gt;On top of these advantages, using raw SQL can also provide a deeper understanding of how databases function at a fundamental level. Developers can fine-tune their queries to leverage database indexes efficiently, leading to significant performance improvements. This level of control can be especially beneficial when working with large datasets or complex data structures.&lt;/p&gt;

&lt;p&gt;However, raw SQL does come with a set of challenges. It can be error-prone, difficult to troubleshoot, and, if mishandled, can lead to security risks like SQL injection attacks. Maintaining and updating raw SQL code can also become cumbersome, especially as the complexity of the project grows. &lt;/p&gt;

&lt;p&gt;Another potential drawback of relying heavily on raw SQL is the lack of portability across different database management systems. SQL syntax and functions can vary between databases, making it challenging to write queries that are compatible with multiple platforms. This limitation can hinder the scalability of a project, as it may necessitate significant rewrites if the database technology needs to be changed in the future. It's essential for developers to weigh the benefits of using raw SQL against these portability concerns when deciding on the approach to database interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and Cons of Using ORMs
&lt;/h2&gt;

&lt;p&gt;Using Object-Relational Mappers (ORMs) in software development offers several advantages that can boost productivity and streamline database interactions. One of the primary benefits is the significant reduction in boilerplate code, which allows developers to focus more on implementing business logic rather than writing repetitive SQL queries. ORMs abstract the database layer, enabling developers to work with objects rather than SQL statements, which aligns well with object-oriented programming principles. This abstraction not only simplifies CRUD operations—Create, Read, Update, and Delete—but also makes it easier to switch between different database systems with minimal code changes. Additionally, ORMs can enhance code maintenance by consolidating schema definitions and relationships in a single location, reducing redundancy and facilitating easier management of schema migrations over time.&lt;/p&gt;

&lt;p&gt;Moreover, an ORM offers built-in security features that help protect against SQL injection attacks. By parameterizing queries and escaping values, ORMs ensure that malicious input cannot compromise the database's integrity.&lt;/p&gt;

&lt;p&gt;On the downside, ORMs can introduce performance overhead due to the abstraction layer, which might not be suitable for applications requiring high-performance or finely-tuned SQL queries. The queries generated by ORMs may not always be as optimized as hand-crafted SQL, potentially affecting application performance. Moreover, the complexity of mastering ORM frameworks and understanding their behavior can be a challenge, particularly for developers new to the technology. The abstraction provided by ORMs can sometimes obscure the understanding of underlying database interactions, making debugging more complex. Additionally, ORMs can limit control over query optimization and execution plans, and they may not immediately support the latest database features or functionalities.&lt;/p&gt;

&lt;p&gt;Furthermore, ORMs can create "leaky abstractions," where underlying database-specific quirks and behaviors still need to be added into the code, which can complicate development. This tight coupling to the ORM might also make it difficult to switch to another data access method if needed. While ORMs offer substantial benefits in terms of productivity, maintainability, and security, these advantages come with trade-offs in performance, control, and complexity. Therefore, the decision to use an ORM should be carefully considered based on the specific needs and constraints of the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Comparison: Raw SQL vs ORM
&lt;/h2&gt;

&lt;p&gt;Performance is a critical factor when deciding between raw SQL and an ORM. In general, raw SQL queries tend to offer faster execution times because developers can optimize them by hand. However, the difference in performance heavily depends on the complexity of the database interactions and the efficiency of the underlying ORM implementation.&lt;/p&gt;

&lt;p&gt;ORMs often compensate for any performance deficiencies by providing caching mechanisms. By caching query results and reducing the number of database calls, ORM-powered applications can achieve better overall performance in certain scenarios.&lt;/p&gt;
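
&lt;p&gt;As a rough sketch of the idea, here is a minimal query cache in Python over the standard-library sqlite3 module; real ORM caches also handle invalidation when data changes, which this toy version deliberately omits.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products (price) VALUES (?)", [(9.5,), (20.0,)])

_cache = {}
calls = {"db": 0}

def cached_query(sql, params=()):
    """Return cached rows for (sql, params), hitting the database only once."""
    key = (sql, params)
    if key not in _cache:
        calls["db"] += 1
        _cache[key] = conn.execute(sql, params).fetchall()
    return _cache[key]

cached_query("SELECT price FROM products WHERE id = ?", (1,))
cached_query("SELECT price FROM products WHERE id = ?", (1,))  # served from cache
print(calls["db"])  # -> 1
```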

&lt;h2&gt;
  
  
  Security Considerations: Raw SQL vs ORM
&lt;/h2&gt;

&lt;p&gt;When it comes to security, both raw SQL and ORM have their considerations. Raw SQL introduces the risk of SQL injection attacks if proper measures are not taken, making it crucial for developers to sanitize user inputs and use prepared statements or query builders.&lt;/p&gt;
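
&lt;p&gt;A minimal Python sketch with the standard-library sqlite3 module shows the difference between interpolating user input into a query string and binding it as a parameter.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (user TEXT, secret TEXT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

malicious = "nobody' OR '1'='1"

# Unsafe: string interpolation lets the input rewrite the query,
# so it matches every row in the table.
unsafe = conn.execute(
    "SELECT user FROM accounts WHERE user = '%s'" % malicious
).fetchall()

# Safe: a parameterized query treats the input as a plain value.
safe = conn.execute(
    "SELECT user FROM accounts WHERE user = ?", (malicious,)
).fetchall()

print(len(unsafe), len(safe))  # -> 2 0
```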

&lt;p&gt;ORMs, while offering protection against SQL injection, can still be susceptible to other security vulnerabilities. Developers need to ensure that the ORM library they choose is actively maintained and regularly updated to address any security issues that arise.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Choose Raw SQL Over ORM
&lt;/h2&gt;

&lt;p&gt;Although ORMs provide numerous advantages, there are situations where raw SQL might be the preferable choice. Raw SQL has the upper hand when it comes to complex queries, as it allows developers to fine-tune and optimize them according to specific database requirements. If the project heavily relies on database-specific features or requires low-level control, raw SQL might be the better option.&lt;/p&gt;

&lt;p&gt;Additionally, raw SQL can be advantageous for performance-critical applications that demand lightning-fast database interactions. In such cases, the control over query optimization and the ability to manually fine-tune the code can make a noticeable impact on performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deciding between raw SQL and an ORM is not a one-size-fits-all approach. It ultimately depends on the specific requirements of the project and the tradeoffs the development team is willing to make. &lt;/p&gt;

&lt;p&gt;For developers seeking to balance the control and performance of raw SQL with the efficiency, maintainability, and code simplicity of ORMs, Neurelo is a cloud &lt;a href="https://www.neurelo.com/features/platform" rel="noopener noreferrer"&gt;data API platform&lt;/a&gt; that offers &lt;a href="https://www.neurelo.com/features/auto-generated-apis" rel="noopener noreferrer"&gt;auto-generated APIs&lt;/a&gt;, together with an &lt;a href="https://www.neurelo.com/features/ai-powered-query-playground" rel="noopener noreferrer"&gt;AI-powered complex query builder&lt;/a&gt;, &lt;a href="https://www.neurelo.com/features/schema-as-code" rel="noopener noreferrer"&gt;Git-like schema management&lt;/a&gt;, &lt;a href="https://www.neurelo.com/features/query-observability" rel="noopener noreferrer"&gt;observability&lt;/a&gt;, and more. Performance and simplicity should not be tradeoffs. &lt;a href="https://dashboard.neurelo.com/register" rel="noopener noreferrer"&gt;Click here&lt;/a&gt; to try it yourself. &lt;/p&gt;

</description>
      <category>database</category>
      <category>api</category>
      <category>architecture</category>
      <category>sql</category>
    </item>
    <item>
      <title>Database Observability: An Introductory Guide</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Tue, 04 Jun 2024 15:00:00 +0000</pubDate>
      <link>https://forem.com/neurelo/database-observability-an-introductory-guide-1pbc</link>
      <guid>https://forem.com/neurelo/database-observability-an-introductory-guide-1pbc</guid>
      <description>&lt;p&gt;Ever wondered what happens behind the scenes when you use an app or website? A crucial part of the magic lies in the database—a vast digital system storing all the information that keeps things running smoothly. But just like any complex system, databases require constant care and attention to ensure optimal performance. This is where database observability comes in. With database observability, it's like there's a guardian watching over your data.&lt;/p&gt;

&lt;p&gt;This post will teach you the importance of database observability, prepare you for the challenges that might be encountered, and equip you with practical strategies for implementing it effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Database Observability?
&lt;/h2&gt;

&lt;p&gt;Database observability, to put it simply, is the process of actively tracking and comprehending the functionality and state of your database systems. It's similar to having a live window into your database, letting you see possible problems early on, maximize efficiency, and make sure your data is always available.&lt;/p&gt;

&lt;p&gt;Database observability relies on three key components to provide this comprehensive view:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics: These are numerical assessments that monitor several facets of the health of your database, including disk use, connection counts, and &lt;a href="https://www.neurelo.com/features/query-observability"&gt;query execution&lt;/a&gt; times. They provide an instantaneous overview of your database's current state.&lt;/li&gt;
&lt;li&gt;Logs: Imagine a detailed record of everything happening within your database. Logs capture events like successful or failed queries, user actions, and error messages. By analyzing logs, you can gain deeper insights into potential problems and identify root causes.&lt;/li&gt;
&lt;li&gt;Traces: Think of traces as the behind-the-scenes story of a query. They capture the entire journey of a query as it travels through your database system, pinpointing any bottlenecks or slowdowns that might be hindering performance.&lt;/li&gt;
&lt;/ul&gt;
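
&lt;p&gt;As a minimal illustration of the metrics-and-logs idea, the Python sketch below wraps each query with a counter, a latency measurement, and a log line, using only the standard library; the metric names are invented for this example.&lt;/p&gt;

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

metrics = {"queries": 0, "total_ms": 0.0}

def observed_query(sql, params=()):
    """Run a query while recording metrics (count, latency) and a log line."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    metrics["queries"] += 1
    metrics["total_ms"] += elapsed_ms
    log.info("query=%r rows=%d elapsed_ms=%.2f", sql, len(rows), elapsed_ms)
    return rows

observed_query("SELECT kind FROM events")
print(metrics["queries"])  # -> 1
```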

&lt;h2&gt;
  
  
  Importance of Database Observability
&lt;/h2&gt;

&lt;p&gt;Consider your database the central nervous system of your application: it houses all the vital data that keeps everything operating. A healthy database is as necessary to your applications and websites as a healthy heart is to a person's well-being. This is where database observability becomes critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive into Production and Application Behavior
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Significance of API and query-level insights: Database observability allows you to see beyond overall database health and delve into granular details. By monitoring &lt;a href="https://www.neurelo.com/features/auto-generated-apis"&gt;API&lt;/a&gt; and query-level metrics, you can pinpoint exactly how specific applications and functionalities interact with your database. This helps you identify areas where queries might be slow or inefficient, impacting the user experience.&lt;/li&gt;
&lt;li&gt;Impact on identifying and solving issues promptly: Traditional monitoring might only alert you after a major issue arises. Database observability empowers you to be proactive. By tracking key metrics and analyzing logs, you can identify potential problems early on—before they snowball into critical failures. This allows for faster troubleshooting and resolution, minimizing downtime and ensuring a smooth user experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Build a Reliable Database Fortress
&lt;/h2&gt;

&lt;p&gt;A sluggish database can significantly impact your application's performance. Database observability helps you identify bottlenecks and performance issues within your database. By analyzing query execution times, connection pools, and resource utilization, you can optimize your database configuration and fine-tune queries, leading to a faster and more responsive system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhanced Scalability
&lt;/h2&gt;

&lt;p&gt;As your application grows, your database needs to keep pace. Database observability provides valuable insights into your database's resource usage, allowing you to proactively scale your infrastructure to meet evolving demands and ensure smooth performance under increasing loads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improved Development and Operations Collaboration
&lt;/h2&gt;

&lt;p&gt;Database observability fosters better communication between developers and operations teams. By providing shared visibility into database health and performance, both teams can work together to optimize &lt;a href="https://docs.neurelo.com/definitions/custom-query-endpoints-as-apis/write-and-commit-custom-queries?_gl=1*xi5ewg*_gcl_au*NzIwNzAxNzY1LjE3MTQ0MzIxMzI."&gt;queries&lt;/a&gt;, identify potential issues early on, and ensure a more efficient development and deployment process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing Resource Utilization
&lt;/h2&gt;

&lt;p&gt;Database observability acts as a resource manager, akin to a wise gardener tending to a flourishing garden. It optimizes resource utilization, ensuring that every byte and cycle is utilized effectively. This not only improves efficiency but also reduces unnecessary expenses, much like turning off lights in unoccupied rooms to save energy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges of Database Observability
&lt;/h2&gt;

&lt;p&gt;While database observability offers immense benefits, it's not without its challenges. Here are some key obstacles you might encounter on your journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Privacy and Security
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Balancing observability with privacy concerns: Database observability involves collecting and analyzing data about your database's operation, which might include sensitive information. It's crucial to strike a balance between gaining valuable insights and protecting user privacy.&lt;/li&gt;
&lt;li&gt;Strategies for safeguarding sensitive information: There are several strategies to ensure data security while maintaining observability. You can implement &lt;a href="https://aws.amazon.com/what-is/data-masking/"&gt;data masking&lt;/a&gt; to hide sensitive data in logs, leverage &lt;a href="https://auth0.com/docs/manage-users/access-control/rbac"&gt;role-based access control&lt;/a&gt; to limit access to sensitive information, and encrypt sensitive data at rest and in transit.&lt;/li&gt;
&lt;/ul&gt;
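
&lt;p&gt;One way to picture the data-masking strategy above is to scrub known-sensitive patterns from every log line before it leaves the application. A small sketch; the two patterns below are illustrative, not an exhaustive PII catalog:&lt;/p&gt;

```python
import re

# Illustrative patterns; a real deployment would cover more PII types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def mask(line):
    """Redact sensitive values before a log line is emitted."""
    line = EMAIL.sub("[email redacted]", line)
    line = CARD.sub("[card redacted]", line)
    return line

raw = "UPDATE users SET email='ada@example.com', card='4111 1111 1111 1111'"
print(mask(raw))
# UPDATE users SET email='[email redacted]', card='[card redacted]'
```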

&lt;h2&gt;
  
  
  Complexity of Design and Maintenance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Navigating intricate database structures: Modern databases can be complex, with intricate structures and relationships between tables. This complexity can make it challenging to determine which metrics and logs are most relevant for monitoring and troubleshooting.&lt;/li&gt;
&lt;li&gt;Addressing challenges in maintaining observability tools: Database observability tools themselves require ongoing maintenance and updates. You'll need to invest time and resources in selecting the right tools, configuring them effectively, and ensuring they stay up-to-date to provide accurate and reliable insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-Time Observability
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Importance of real-time insights: In today's fast-paced world, real-time insights are crucial for identifying and responding to issues promptly. Delays in data collection and analysis can hinder your ability to react quickly to potential problems.&lt;/li&gt;
&lt;li&gt;Overcoming obstacles in achieving real-time observability: Achieving real-time observability can be challenging, especially for large and complex databases. Factors like data volume, processing power, and network latency can all contribute to delays. You can overcome these obstacles by implementing efficient data collection methods, leveraging streaming technologies, and optimizing infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resource Scalability
&lt;/h2&gt;

&lt;p&gt;As data volumes grow, so does the need for scalable observability solutions. Addressing this challenge involves adopting cloud-based solutions and optimizing resource allocation. It's akin to ensuring that your ship not only sails smoothly but also adapts to the ever-changing tides without capsizing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strategies for Implementing Database Observability: A Roadmap to Success
&lt;/h2&gt;

&lt;p&gt;Equipping yourself with the right strategies is essential for unlocking the true power of database observability. These strategies act as your roadmap, guiding you toward a comprehensive understanding of your database's health and performance. Let's delve into some key strategies that will empower you to effectively implement database observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demystifying Production Environments and Application Behavior
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring slow queries: Slow queries can significantly impact user experience. Here's how to tackle them:

&lt;ul&gt;
&lt;li&gt;Identify bottlenecks: Use your observability tools to pinpoint queries with longer execution times. Analyze query plans and execution paths to identify bottlenecks that might be slowing down data retrieval.&lt;/li&gt;
&lt;li&gt;Optimize slow queries: Once you've identified bottlenecks, you can optimize slow queries. This might involve rewriting inefficient queries, creating appropriate indexes, or adjusting database configuration settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Managing queries interfering with one another: Sometimes queries compete for resources and slow each other down. Here's how to address this:

&lt;ul&gt;
&lt;li&gt;Analyze query dependencies: Use your observability tools to track query dependencies and identify situations where one query might be blocking another.&lt;/li&gt;
&lt;li&gt;Implement isolation techniques: Use database features like transactions and locking mechanisms to ensure queries execute without interference, preventing slowdowns.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding Read/Write Patterns
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Analyzing data access patterns: Gaining insights into how data is accessed within your database is crucial. Here's what you need to track:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyze read and write frequencies: Monitor the ratio of read operations to write operations within your database. This helps you understand how your application primarily interacts with the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adjust resources based on usage patterns: Based on your read/write analysis, you might need to adjust resources allocated to your database. For instance, if you have a read-heavy application, scaling your read replicas can improve performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Scaling for Optimal Performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When to scale (scaling up or out): As your application grows, your database might need to scale as well. Here's how to decide:

&lt;ul&gt;
&lt;li&gt;Recognize signs of increased load: Monitor key metrics like CPU usage, memory consumption, and connection pool saturation. When these metrics consistently approach their limits, it's a sign you might need to scale.&lt;/li&gt;
&lt;li&gt;Implement scaling strategies effectively: There are two main approaches: scaling up (adding more resources to a single server) and scaling out (distributing the database load across multiple servers). Choosing the right one depends on your specific needs and infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;What to scale: Not all database components need to be scaled equally.

&lt;ul&gt;
&lt;li&gt;Identify components for scaling: Focus on scaling CPU, memory, or storage based on which resources are reaching their limits.&lt;/li&gt;
&lt;li&gt;Ensure cost-effectiveness in scaling decisions: Consider the cost implications of scaling. Explore cost-effective options like cloud-based database services with auto-scaling features.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing these strategies and tailoring them to your specific database environment, you'll gain a deeper understanding of how your application interacts with your database, optimize performance, and ensure your database scales to meet your growing needs. Remember, database observability is an ongoing journey, and these strategies will serve as your guide as you refine your approach and continuously improve the health and performance of your databases.&lt;/p&gt;
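
&lt;p&gt;To make the read/write analysis and scaling signals above concrete, here's a toy helper that turns coarse workload counters into a suggestion. The thresholds are invented for illustration; real decisions should come from baselines you observe in your own metrics.&lt;/p&gt;

```python
def scaling_hint(reads, writes, cpu_pct, mem_pct):
    """Turn coarse workload counters into a scaling suggestion.

    The thresholds here are illustrative, not universal.
    """
    hints = []
    if reads > writes * 10:
        hints.append("read-heavy: consider read replicas (scale out)")
    if writes > reads * 10:
        hints.append("write-heavy: consider sharding or faster storage")
    if cpu_pct > 80 or mem_pct > 80:
        hints.append("resource pressure: scale up CPU/memory")
    return hints or ["load looks comfortable; keep monitoring"]

print(scaling_hint(reads=50_000, writes=900, cpu_pct=35, mem_pct=60))
# ['read-heavy: consider read replicas (scale out)']
print(scaling_hint(reads=1_000, writes=1_200, cpu_pct=92, mem_pct=70))
# ['resource pressure: scale up CPU/memory']
```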

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You should now have a solid understanding of database observability, a critical practice for ensuring your database runs smoothly and efficiently. We've unpacked its importance, shedding light on how it helps you understand application behavior, improve system reliability, and ensure data remains readily accessible.&lt;/p&gt;

&lt;p&gt;We've also equipped you with practical strategies for implementing database observability. You've learned how to monitor queries, analyze read/write patterns, and effectively scale your database for optimal performance. By following these steps and continuing to explore this essential practice, you can ensure your database remains the strong foundation of your applications and websites.&lt;/p&gt;

&lt;p&gt;This post was written by Gourav Bais. &lt;a href="https://www.analyticsvidhya.com/blog/author/gourav29/?utm_source=social&amp;amp;utm_medium=linkedin&amp;amp;utm_campaign=author_profile"&gt;Gourav&lt;/a&gt; is an applied machine learning engineer skilled in computer vision/deep learning pipeline development, creating machine learning models, retraining systems, and transforming data science prototypes into production-grade solutions.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>mongodb</category>
      <category>mysql</category>
      <category>database</category>
    </item>
    <item>
      <title>Most popular Postgres ORMs</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 29 May 2024 15:00:00 +0000</pubDate>
      <link>https://forem.com/neurelo/postgres-orms-4g91</link>
      <guid>https://forem.com/neurelo/postgres-orms-4g91</guid>
      <description>&lt;p&gt;This blog post will discuss popular Postgres ORM tools and other alternatives.&lt;/p&gt;

&lt;p&gt;Those who are not familiar with ORMs should read "What is an ORM and when developers should and shouldn't use it," but to recap: Object-Relational Mappers (ORMs) abstract the database interactions in applications by converting database records into objects and vice versa.&lt;/p&gt;

&lt;p&gt;This blog post includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Introduction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Postgres ORM tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Popular Postgres ORMs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sequelize&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQLAlchemy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Active Record&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eloquent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hibernate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GORM&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Postgres ORMs - Challenges and limitations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/"&gt;PostgreSQL&lt;/a&gt; (commonly also referred to as Postgres) is one of the most popular open-source relational database management systems (RDBMS). Renowned for its robustness, extensibility, and adherence to SQL standards, PostgreSQL has positioned itself as one of the most reliable and feature-rich relational databases available. Its support for complex data types and advanced indexing methods makes it suitable for handling diverse data requirements. Postgres is also recognized for its extensibility through the use of custom functions, data types, and procedural languages.&lt;/p&gt;


&lt;h2&gt;
  
  
  Postgres ORM Tools
&lt;/h2&gt;

&lt;p&gt;PostgreSQL supports various ORM tools for different programming languages, with some popular ones being SQLAlchemy for Python, Hibernate for Java, and Sequelize for TypeScript/Node.js. These tools provide a set of abstractions that allow developers to interact with the PostgreSQL database using object-oriented concepts.&lt;/p&gt;

&lt;p&gt;This enables developers to model database entities as objects and manipulate them using familiar programming language constructs, reducing the need for raw SQL queries. It also enhances code readability and maintainability while also promoting a more natural development workflow.&lt;/p&gt;


&lt;h2&gt;
  
  
  Popular Postgres ORMs
&lt;/h2&gt;

&lt;p&gt;Here are some popular PostgreSQL ORMs for specific programming languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sequelize (for TypeScript): Sequelize is a widely used ORM for TypeScript and JavaScript. It supports PostgreSQL and a few other databases, and provides a powerful set of features for working with and querying databases.&lt;/li&gt;
&lt;li&gt;SQLAlchemy (for Python): SQLAlchemy is a robust ORM for Python, offering a flexible and expressive way to interact with PostgreSQL databases. It supports various PostgreSQL-specific features and provides a powerful SQL expression language.&lt;/li&gt;
&lt;li&gt;Active Record (for Ruby): Active Record, part of the Ruby on Rails framework, is a popular ORM for Ruby. It simplifies database interactions and supports PostgreSQL seamlessly, allowing developers to work with Ruby classes representing database tables.&lt;/li&gt;
&lt;li&gt;Eloquent (for PHP): Eloquent is the ORM included in the Laravel PHP framework. It simplifies database operations and supports PostgreSQL, offering an elegant syntax for querying databases using PHP.&lt;/li&gt;
&lt;li&gt;Hibernate (for Java): Hibernate is a widely used ORM for Java applications. It supports PostgreSQL and provides a robust framework for mapping Java objects to database tables, along with powerful querying capabilities.&lt;/li&gt;
&lt;li&gt;GORM (for Go): GORM is a popular ORM for the Go programming language. It offers support for PostgreSQL and provides a simple and concise syntax for defining and querying database models in Go.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Postgres ORMs - Challenges and limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Performance Optimization: ORM-generated queries may not always be optimized for specific use cases, requiring developers to manually fine-tune queries in their code for better performance. These performance problems tend to worsen as the application scales.&lt;/li&gt;
&lt;li&gt;Learning Curve: ORMs are specific to a particular programming language and working with an ORM introduces a learning curve. Developers need to understand the intricacies of the ORM tool, programming language framework, and PostgreSQL to use them effectively.&lt;/li&gt;
&lt;li&gt;Abstraction Leaks: Abstraction leaks often occur as applications move beyond basic CRUD queries. This leads to messy code, unexpected behavior and performance issues. Developers must be cautious and carefully consider their application requirements, architecture, and data access patterns.&lt;/li&gt;
&lt;li&gt;Schema Changes and Migrations: Managing schema changes and migrations can be challenging, particularly in large applications. Developers need to plan for seamless database evolution as the application evolves.&lt;/li&gt;
&lt;li&gt;Modern application architectures: ORM tools may not seamlessly align with modern cloud and serverless architectures. As applications scale, ORM's centralized management of database connections and transactions can become a bottleneck, leading to increased latency and decreased overall throughput. Similarly, in serverless setups, ORM's heavyweight abstraction may struggle with short-lived compute instances. Developers need to carefully evaluate the trade-offs between ORM convenience and cloud-native benefits when architecting applications. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Postgres ORM Alternative with Cloud-Based Data APIs
&lt;/h2&gt;

&lt;p&gt;Neurelo’s programming interface for databases addresses many of the challenges associated with Postgres ORMs such as N+1 queries and leaky abstractions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Transparency of Queries:
&lt;/h2&gt;

&lt;p&gt;Traditional ORMs obscure the SQL layer; Neurelo provides full visibility into the queries being executed. This not only makes debugging much easier but also gives developers the information they need to optimize database interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query Strategies:
&lt;/h2&gt;

&lt;p&gt;Neurelo minimizes the number of queries made to the database using intelligent techniques like eager loading (while also offering the option of lazy loading, when needed) which retrieves the data in a single query using joins. This solves N+1 query issues associated with many traditional ORMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Override Capability for Queries:
&lt;/h2&gt;

&lt;p&gt;Flexibility and control are key in application development, so Neurelo allows developers to override the default query generation behavior for APIs when needed, according to the application's requirements. This is particularly useful for optimizing performance or handling edge cases as the application evolves, offering both a short-term benefit (get started quickly with the queries Neurelo has already optimized) and long-term flexibility (debug, understand, and update these queries as your data, use cases, and applications demand).&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending Neurelo APIs for Complex Queries:
&lt;/h2&gt;

&lt;p&gt;Neurelo’s &lt;a href="https://docs.neurelo.com/definitions/custom-query-endpoints-as-apis/ai-assisted-query-generation?_gl=1*1waumcc*_gcl_au*NzIwNzAxNzY1LjE3MTQ0MzIxMzI."&gt;AI-powered custom APIs&lt;/a&gt; go beyond the basic CRUD operations (typically offered by ORM frameworks) with auto-generated APIs that provide the ability to work on single entities (tables/collections) as well as advanced join read/write tasks that go across multiple entities. This means developers can craft bespoke solutions for specific requirements without compromising the integrity or efficiency of the application, thus solving “leaky abstractions”.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schema-as-code and alignment to software development lifecycle
&lt;/h2&gt;

&lt;p&gt;By treating database schemas as code, Neurelo empowers developers to manage schema changes seamlessly within their existing codebase, enhancing collaboration, &lt;a href="https://docs.neurelo.com/definitions/branches-and-commits?_gl=1*yawpku*_gcl_au*NzIwNzAxNzY1LjE3MTQ0MzIxMzI."&gt;version control&lt;/a&gt;, and automation. With &lt;a href="https://www.neurelo.com/features/schema-as-code"&gt;Schema as Code&lt;/a&gt;, developers can define database schemas using familiar language and version control systems, enabling them to track changes, review diffs, and rollback modifications with ease. This approach revolutionizes schema management by integrating it into the software development lifecycle, streamlining workflows, and ensuring consistency between application code and database schema. Neurelo's Schema as Code concept represents a paradigm shift in database development, offering a modern and agile approach to schema management for the next generation of applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dashboard.neurelo.com/register"&gt;‍Try Neurelo&lt;/a&gt; now and never experience ORM limitations.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>orm</category>
      <category>api</category>
      <category>database</category>
    </item>
    <item>
      <title>What is an ORM and when developers should and shouldn't use it</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Sat, 25 May 2024 16:56:08 +0000</pubDate>
      <link>https://forem.com/neurelo/what-is-an-orm-and-when-developers-should-and-shouldnt-use-it-1hm2</link>
      <guid>https://forem.com/neurelo/what-is-an-orm-and-when-developers-should-and-shouldnt-use-it-1hm2</guid>
      <description>&lt;h2&gt;
  
  
  What is an ORM
&lt;/h2&gt;

&lt;p&gt;Object-Relational Mapping (ORM) is a programming technique that facilitates the interaction between a relational database and programming language. It acts as a bridge between the logical, object-oriented representation of data in the code and the physical, relational structure in a database. The primary purpose of an ORM is to eliminate the impedance mismatch that can occur between the object model used in application code and the relational model employed by databases.&lt;/p&gt;
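
&lt;p&gt;That bridge can be pictured with a deliberately tiny, hand-rolled mapper over Python's bundled SQLite. Real ORMs (SQLAlchemy, Hibernate, Active Record, and so on) add change tracking, relationships, and query builders on top, but the core round trip is this; the &lt;code&gt;User&lt;/code&gt; class and table are invented for the illustration:&lt;/p&gt;

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:          # the object-oriented side of the bridge
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def save(user):
    """Object to row: mapping in one direction..."""
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user.id, user.name))

def find(user_id):
    """...and row back to object in the other."""
    row = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    return User(*row) if row else None

save(User(1, "Ada"))
print(find(1))  # User(id=1, name='Ada')
```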


&lt;h2&gt;
  
  
  Why use an ORM
&lt;/h2&gt;

&lt;p&gt;Developers use ORMs for several reasons. Firstly, it simplifies and accelerates the development process by allowing them to work with objects and classes in the programming language rather than writing SQL queries. This abstraction minimizes the need to deal directly with database-specific syntax and intricacies.&lt;/p&gt;

&lt;p&gt;Secondly, an ORM enhances code maintainability by providing some level of an abstraction. Changes to the database schema can be managed more easily, as developers can update the mapping in the ORM layer without extensively modifying application code.&lt;/p&gt;

&lt;p&gt;Thirdly, an ORM promotes code reusability and portability. By abstracting the database interactions, developers can write database-agnostic code, making it easier to switch to a different database system without substantial code changes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Challenges using ORMs
&lt;/h2&gt;

&lt;p&gt;While an Object-Relational Mapping (ORM) layer provides numerous advantages, developers often encounter challenges and trade-offs when deciding to use this approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dev/Build Time Challenges:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;N+1 Queries: ORMs often lead to the N+1 query problem, where multiple database queries are executed instead of a more optimized query, causing performance issues for data access. Developers need to be mindful of the data access patterns to minimize this problem.&lt;/li&gt;
&lt;li&gt;Leakiness: As queries become more advanced and complex, beyond basic CRUD operations, the abstraction provided by an ORM can start to leak, forcing developers to deal directly with database-specific query code and behavior. This can lead to unexpected code maintainability issues, particularly when attempting to change the schema or optimize queries.&lt;/li&gt;
&lt;/ul&gt;
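
&lt;p&gt;The N+1 problem is easiest to see by counting queries. The toy sketch below (stdlib SQLite, invented schema) issues one query per author the naive way, then fetches everything with a single join, which is essentially what an ORM's eager-loading option does for you:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'On engines'), (2, 1, 'On cards'), (3, 2, 'On compilers');
""")

queries = []
def run(sql, params=()):
    queries.append(sql)                    # count every round trip
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the authors, then one more per author.
queries.clear()
for author_id, _ in run("SELECT id, name FROM authors"):
    run("SELECT title FROM posts WHERE author_id = ?", (author_id,))
n_plus_one = len(queries)  # 1 + N = 3 here

# Eager alternative: a single join fetches everything at once.
queries.clear()
run("SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id")
eager = len(queries)  # 1

print(n_plus_one, eager)  # 3 1
```

&lt;p&gt;With 2 authors the difference is trivial; with 10,000 it is the difference between one round trip and 10,001.&lt;/p&gt;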

&lt;h2&gt;
  
  
  Production Performance, Scale, Optimization, and Security:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Performance: While ORM systems generate SQL queries, they may not always be ideal and developers may need to fine-tune or handcraft queries for optimal performance in specific situations. In high-performance scenarios, such optimizations become crucial.&lt;/li&gt;
&lt;li&gt;Scale: As applications scale, ORM-generated queries may not always scale seamlessly. Developers may need to consider database-specific optimizations or even move away from certain ORM features to ensure performance at scale.&lt;/li&gt;
&lt;li&gt;DB Connection Management: Connection pools manage reusable database connections, enhancing performance and scalability. ORMs may not always integrate seamlessly with connection pools, leading to issues such as inefficient connection management, impacting performance. Additionally, configuring the connection pool settings to align with the ORM's requirements and ensuring proper handling of connections within the ORM layer can be complex tasks.&lt;/li&gt;
&lt;li&gt;Security: ORM systems abstract away much of the SQL, but developers must remain vigilant about security. Poorly sanitized inputs, improper use of ORM features, or overlooking security configurations can introduce vulnerabilities.&lt;/li&gt;
&lt;/ul&gt;
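
&lt;p&gt;The security point above is concrete enough to demonstrate: parameter binding keeps attacker-controlled input out of the SQL text, while naive string interpolation splices it in. A small sketch with Python's bundled SQLite driver (the table and payload are invented for the example):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (user TEXT, secret TEXT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 's3cr3t')")

attacker_input = "nobody' OR '1'='1"

# Unsafe: string interpolation splices the payload into the SQL itself.
unsafe_sql = "SELECT secret FROM accounts WHERE user = '%s'" % attacker_input
leaked = conn.execute(unsafe_sql).fetchall()   # returns every row!

# Safe: the driver binds the value; the payload is just an odd username.
safe = conn.execute(
    "SELECT secret FROM accounts WHERE user = ?", (attacker_input,)
).fetchall()

print(len(leaked), len(safe))  # 1 0
```

&lt;p&gt;ORMs generate the parameterized form for you, but the moment you drop to raw SQL or build query fragments from user input, this class of vulnerability is back on the table.&lt;/p&gt;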

&lt;h2&gt;
  
  
  Suitability to SDLC Processes (e.g., CI/CD):
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Continuous Integration/Continuous Deployment (CI/CD): ORMs can pose challenges in CI/CD pipelines, especially when dealing with database schema changes. Migrations and updates need careful consideration to avoid disruptions and ensure smooth deployment processes.&lt;/li&gt;
&lt;li&gt;Adaptation to Change: As applications evolve with new features, the ORM mappings may need frequent updates. Ensuring that these changes do not disrupt existing functionality and that migrations are handled seamlessly becomes a crucial aspect of the development process.&lt;/li&gt;
&lt;li&gt;Modern application architectures: ORM tools may not seamlessly align with modern cloud and serverless architectures. As applications scale, ORM's centralized management of database connections and transactions can become a bottleneck, leading to increased latency and decreased overall throughput. Similarly, in serverless setups, ORM's heavyweight abstraction may struggle with short-lived compute instances. Developers need to carefully evaluate the trade-offs between ORM convenience and cloud-native benefits when architecting applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In conclusion, while an ORM simplifies database interactions and enhances code maintainability, developers should be aware of these challenges and make informed decisions based on the specific needs of their applications. Mitigating these challenges often involves a combination of careful design, optimization, and a deep understanding of both the ORM framework and the underlying database system.&lt;/p&gt;

</description>
      <category>database</category>
      <category>mongodb</category>
      <category>postgres</category>
      <category>mysql</category>
    </item>
    <item>
      <title>Top 4 Alternatives to MySQL</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 15 May 2024 23:32:26 +0000</pubDate>
      <link>https://forem.com/neurelo/mysql-or-top-alternatives-in-2024-and-how-to-choose-one-1oj5</link>
      <guid>https://forem.com/neurelo/mysql-or-top-alternatives-in-2024-and-how-to-choose-one-1oj5</guid>
      <description>&lt;p&gt;In recent years, the landscape of DataBase Management Systems (DBMS) has undergone significant changes, leading to the rise of new and innovative solutions that challenge the traditional dominance of MySQL. As businesses strive for improved developer experience, performance, scalability, and security, it becomes imperative to also explore the top alternatives to MySQL in 2024 and choose the best database for your application needs. &lt;/p&gt;


&lt;h2&gt;
  
  
  Understanding the Need for MySQL Alternatives
&lt;/h2&gt;

&lt;p&gt;Before delving into the world of alternatives, it is crucial to comprehend why this shift is happening in the first place. The evolution of DBMS has played a pivotal role, with traditional relational databases now facing competition from modern and specialized solutions, ensuring organizations have the right tools for the job.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution of Database Management Systems
&lt;/h2&gt;

&lt;p&gt;Over the years, database management systems have transformed from simple storage and retrieval tools to sophisticated systems capable of handling complex data structures and large volumes of information. This evolution has resulted in a diverse array of options, each catering to specific needs and use cases.&lt;/p&gt;

&lt;p&gt;One key aspect of this evolution is the rise of NoSQL databases. Unlike traditional relational databases like MySQL, NoSQL databases offer a flexible data model that allows for the storage and retrieval of unstructured and semi-structured data. This makes them particularly well-suited for applications dealing with large amounts of diverse data, such as social media platforms or IoT devices.&lt;/p&gt;

&lt;p&gt;Another significant development in the world of database management systems is the emergence of cloud-native databases. These databases are designed specifically for cloud environments and offer features like automatic scaling, high availability, and distributed data storage. They provide organizations with the ability to easily scale their databases as their needs grow, without the need for complex infrastructure management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Consider Alternatives to MySQL?
&lt;/h2&gt;

&lt;p&gt;MySQL is a widely-used open-source relational database management system (RDBMS) that offers several advantages for both small and large applications. One of its primary benefits is its speed and efficiency in handling large volumes of data, making it suitable for dynamic websites and applications that require fast data processing. MySQL supports a broad range of operating systems, ensuring flexibility across different environments. It is known for its reliability and robustness, with features that ensure data integrity and security, including transactional support, which allows for consistent data states even in case of system failures. Additionally, MySQL is easy to use and manage, with a vast community and comprehensive documentation available for support.&lt;/p&gt;

&lt;p&gt;While MySQL has long been, and still is, a popular choice due to its ease of use and widespread adoption, there are several reasons why businesses may want to consider alternatives.&lt;/p&gt;

&lt;p&gt;For instance, if your application requires high-performance real-time analytics, the limitations of MySQL's architecture may become a factor. MySQL is primarily designed for transactional workloads, where data consistency is of utmost importance. However, when it comes to analytical workloads that require complex queries and aggregations on large datasets, MySQL may not deliver the required performance. In such cases, alternative databases like Apache Cassandra or Amazon Redshift, which are optimized for analytical workloads, can provide significant performance improvements.&lt;/p&gt;

&lt;p&gt;Furthermore, MySQL's scalability and reliability might not fully meet the demands of rapidly expanding datasets and high-traffic applications. As data volumes grow and user concurrency increases, MySQL's single-node architecture can become a bottleneck. Alternative databases like MongoDB or Apache HBase, which offer distributed architectures and automatic sharding, can provide better scalability and fault tolerance.&lt;/p&gt;

&lt;p&gt;It is crucial to assess your specific requirements and determine whether an alternative DBMS can better address your organization's needs. Consider factors such as the nature of your data, the workload characteristics, and the scalability and performance requirements. By carefully evaluating your options, you can ensure that you choose the right database management system that aligns with your business goals and technical requirements.&lt;/p&gt;


&lt;h2&gt;
  
  
  Overview of Top MySQL Alternatives in 2024
&lt;/h2&gt;

&lt;p&gt;Now that we understand the need for alternative DBMS options, let's explore some of the leading contenders in 2024.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostgreSQL: An Advanced Open-Source Option
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/"&gt;PostgreSQL&lt;/a&gt; has gained significant traction in recent years due to its feature-richness and robustness. As an open-source DBMS, it offers a high degree of customization and scalability, making it a suitable choice for a wide range of applications. With support for advanced data types, spatial data, and full-text search capabilities, PostgreSQL continues to enhance its reputation as a powerful alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  MongoDB: A Leading NoSQL Database
&lt;/h2&gt;

&lt;p&gt;In the world of NoSQL databases, &lt;a href="https://www.mongodb.com/"&gt;MongoDB&lt;/a&gt; stands out as a popular choice. Its flexible document-based data model allows for easy integration of complex and evolving data structures. Moreover, MongoDB's horizontal scalability and automatic sharding make it an excellent option for applications that handle large amounts of data distributed across multiple servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  PlanetScale: A MySQL-Compatible Cloud Database
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://planetscale.com/"&gt;PlanetScale&lt;/a&gt; is a MySQL-compatible database that offers scale, performance, and reliability, and many more powerful database features. Leveraging cloud-native architecture, PlanetScale enables organizations to deploy, manage, and scale MySQL-compatible databases with ease. With features such as automatic sharding, distributed transactions, and high availability, PlanetScale enables businesses to handle large volumes of data and support complex workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQLite: A Lightweight Database System
&lt;/h2&gt;

&lt;p&gt;Unlike the previous alternatives mentioned, &lt;a href="https://www.sqlite.org/"&gt;SQLite&lt;/a&gt; offers a different approach by delivering a lightweight, serverless, and embedded DBMS. Ideal for simple applications and mobile devices, SQLite provides a compact and efficient solution without the need for external dependencies or configurations.&lt;/p&gt;
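
&lt;p&gt;"Serverless" here means there is no separate database process at all: the engine runs inside your application, and the whole database is a single file (or memory). Python ships the driver in its standard library, so the following runs with no installation or configuration:&lt;/p&gt;

```python
import sqlite3

# No server to install or configure: the database lives in a single
# file (here, in memory) and the engine runs inside this process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello from an embedded DB",))
rows = conn.execute("SELECT body FROM notes").fetchall()
print(rows)  # [('hello from an embedded DB',)]
```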


&lt;h2&gt;
  
  
  Key Factors to Consider When Choosing MySQL or an Alternative
&lt;/h2&gt;

&lt;p&gt;With numerous alternatives available, selecting the right DBMS warrants careful consideration of various factors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating Your Database Needs
&lt;/h2&gt;

&lt;p&gt;Understanding your specific application requirements, including the volume of data, performance expectations, and scalability needs, will aid in selecting the most suitable alternative for your organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Importance of Scalability
&lt;/h2&gt;

&lt;p&gt;As your business grows, the ability of your DBMS to scale seamlessly becomes crucial. An alternative that can handle increasing data loads and easily adapt to changing demands ensures long-term viability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Experience
&lt;/h2&gt;

&lt;p&gt;Developers are the main users of databases, connecting applications to the database management system (DBMS). A developer-friendly interface can significantly streamline the development process, allowing developers to focus on building and refining their applications rather than wrestling with intricate database-related tasks.&lt;/p&gt;

&lt;p&gt;It's essential to consider the effort involved in designing data models, automating schema migrations, creating a scalable and maintainable DBMS interface, and constructing SQL queries for CRUD operations and other complex data-interaction queries. Factors such as version control and collaboration across team members and stages of the software development lifecycle (SDLC) must also be taken into account. Prioritizing developer experience not only ensures smoother development cycles and faster time-to-market but also preempts many complexities in production, ultimately leading to greater project success.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Features
&lt;/h2&gt;

&lt;p&gt;Data security remains a top priority for every organization. Consider options that offer robust security features such as encryption, authentication mechanisms, and access control to safeguard your critical information.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Role of Community and Support
&lt;/h2&gt;

&lt;p&gt;Considering the availability of a vibrant community and reliable support resources is essential for smooth operations. Active developer communities often lead to continuous improvements, bug fixes, and available documentation.&lt;/p&gt;


&lt;h2&gt;
  
  
  Managing Your Developers' Access
&lt;/h2&gt;

&lt;p&gt;Regardless of whether you are using MySQL or an alternative, every solution requires significant effort to create APIs your developers can use: knowing the specific query syntax, keeping track of changes to those APIs, testing and debugging them, and monitoring their performance (e.g., timeouts and response times).&lt;/p&gt;

&lt;p&gt;Neurelo gives you a cloud data API platform that is database agnostic. Neurelo auto-detects your schema and auto-generates your APIs. It uses AI that understands your schema to create APIs for complex custom queries in minutes and provides robust version control and observability capabilities. &lt;/p&gt;

&lt;p&gt;With Neurelo, you can give developers streamlined access to your database in under a minute. Try it yourself: &lt;a href="https://dashboard.neurelo.com/register"&gt;https://dashboard.neurelo.com/register&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This content was originally posted at &lt;a href="https://www.neurelo.com/post/mqsql-or-top-alternatives"&gt;https://www.neurelo.com/post/mqsql-or-top-alternatives&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>mysql</category>
      <category>mongodb</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
