<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Avthar Sewrathan</title>
    <description>The latest articles on Forem by Avthar Sewrathan (@avthars).</description>
    <link>https://forem.com/avthars</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F242937%2Fbed60e22-a765-412d-a226-14d139264c4d.jpeg</url>
      <title>Forem: Avthar Sewrathan</title>
      <link>https://forem.com/avthars</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/avthars"/>
    <language>en</language>
    <item>
      <title>🚀 pgai Vectorizer: Automate AI Embeddings With One SQL Command in PostgreSQL</title>
      <dc:creator>Avthar Sewrathan</dc:creator>
      <pubDate>Tue, 29 Oct 2024 13:31:18 +0000</pubDate>
      <link>https://forem.com/tigerdata/pgai-vectorizer-automate-ai-embeddings-with-one-sql-command-in-postgresql-11kp</link>
      <guid>https://forem.com/tigerdata/pgai-vectorizer-automate-ai-embeddings-with-one-sql-command-in-postgresql-11kp</guid>
      <description>&lt;p&gt;&lt;em&gt;Learn how to automate AI embedding creation using the PostgreSQL you know and love.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Managing embedding workflows for AI systems like RAG, search, and AI agents can be a hassle: juggling multiple tools, setting up complex pipelines, and spending hours syncing data, especially if you aren't an ML or AI expert. But it doesn’t have to be that way.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer.md" rel="noopener noreferrer"&gt;pgai Vectorizer&lt;/a&gt;, now in Early Access, you can automate vector embedding creation, keep embeddings automatically in sync as your data changes, and experiment with different AI models -- all with a single SQL command. No extra tools, no complex setups -- just PostgreSQL doing the heavy lifting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create a vectorizer to embed data in the blogs table&lt;/span&gt;
&lt;span class="c1"&gt;-- Use Open AI text-embedding-3-small model&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'public.blogs'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What pgai Vectorizer Does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding creation with SQL:&lt;/strong&gt; generate vector embeddings from multiple text columns with just one command, streamlining a key part of your AI workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic sync:&lt;/strong&gt; embeddings update as your data changes—no manual intervention needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick model switching:&lt;/strong&gt; test different AI models instantly using SQL—no data reprocessing required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test and roll out:&lt;/strong&gt; compare models and chunking techniques, A/B test, and roll out updates with confidence and without downtime.&lt;/li&gt;
&lt;/ul&gt;
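
&lt;p&gt;For illustration, once a vectorizer like the one above is running, the generated embeddings can be queried with plain SQL. The sketch below assumes the vectorizer's default destination view (&lt;code&gt;blogs_embedding&lt;/code&gt;), pgvector's distance operator, and pgai's &lt;code&gt;ai.openai_embed&lt;/code&gt; helper -- check the pgai docs for the exact names in your setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Sketch: semantic search over the auto-generated embedding view
-- (view and column names assume the vectorizer defaults)
SELECT chunk
FROM blogs_embedding
ORDER BY embedding &amp;lt;=&amp;gt; ai.openai_embed('text-embedding-3-small', 'What is AI?')
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;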

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a1wkn7uugpjmg7dsdp9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a1wkn7uugpjmg7dsdp9.png" alt="pgai Vectorizer system architecture –  Pgai Vectorizer automatically creates and updates embeddings from a source data table through the use of work queues and configuration tables housed in PostgreSQL, while embeddings are created in an external worker that interacts with embedding services like the OpenAI API.&amp;lt;br&amp;gt;
" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's an example of testing the RAG output of two different embedding models using pgai Vectorizer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Vectorizer using OpenAI text-embedding-3-small&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="s1"&gt;'public.blogs'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'blogs_embedding_small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatting_python_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Title: $title&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;URL: $url&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;Content: $chunk'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Vectorizer using OpenAI text-embedding-3-large&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="s1"&gt;'public.blogs'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'blogs_embedding_large'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-3-large'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- Note different dimensions&lt;/span&gt;
   &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatting_python_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Title: $title&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;URL: $url&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;Content: $chunk'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Compare results from the two vectorizers on the same RAG query&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
   &lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;generate_rag_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="s1"&gt;'What is AI?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="s1"&gt;'public.blogs_embedding_small'&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
   &lt;span class="s1"&gt;'text-embedding-3-large'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;generate_rag_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="s1"&gt;'What is AI?'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="s1"&gt;'public.blogs_embedding_large'&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Built to Scale
&lt;/h2&gt;

&lt;p&gt;As your datasets grow, pgai Vectorizer scales with you. It automatically optimizes search performance with vector indexes (like HNSW and StreamingDiskANN) once you exceed 100,000 vectors. You’re in control—define chunking and formatting rules to tailor your embeddings to your needs.&lt;/p&gt;

&lt;p&gt;Here's an example of an advanced vectorizer configuration, with an ANN index created once the table exceeds 100k rows and custom chunking for HTML content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Advanced vectorizer configuration&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_vectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="s1"&gt;'public.blogs'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'blogs_embedding_recursive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_openai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="c1"&gt;-- automatically create a StreamingDiskANN index when table has 100k rows&lt;/span&gt;
   &lt;span class="n"&gt;indexing&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indexing_diskann&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_rows&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storage_layout&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'memory_optimized'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="c1"&gt;-- apply recursive chunking with specified settings for HTML content&lt;/span&gt;
   &lt;span class="n"&gt;chunking&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunking_recursive_character_text_splitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;chunk_overlap&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="c1"&gt;-- HTML-aware separators, ordered from highest to lowest precedence&lt;/span&gt;
       &lt;span class="n"&gt;separator&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;/article&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Split on major document sections&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;/div&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- Split on div boundaries&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;/section&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;/p&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;-- Split on paragraphs&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;br&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;-- Split on line breaks&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;/li&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- Split on list items&lt;/span&gt;
           &lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="s1"&gt;'. '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;-- Fall back to sentence boundaries&lt;/span&gt;
           &lt;span class="s1"&gt;' '&lt;/span&gt;          &lt;span class="c1"&gt;-- Last resort: split on spaces&lt;/span&gt;
       &lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="p"&gt;),&lt;/span&gt;
   &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatting_python_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'title: $title url: $url $chunk'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try pgai Vectorizer Today (Early Access)
&lt;/h2&gt;

&lt;p&gt;For companies like MarketReader, pgai Vectorizer has already made AI development faster and more efficient:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“pgai Vectorizer streamlines our AI workflow, from embedding creation to real-time syncing, making AI development faster and simpler -- all in PostgreSQL.” — Web Begole, CTO at MarketReader, an AI Financial Insights Company&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're ready to start building, we are hosting a &lt;a href="https://dev.to/challenges/pgai"&gt;Dev Challenge&lt;/a&gt; with our partners at Ollama, all about building AI apps with open-source software. We're excited to see what the community builds with PostgreSQL and pgai Vectorizer!&lt;/p&gt;

&lt;p&gt;Save time and effort. Focus less on embeddings and more on building your next killer AI app. Try pgai Vectorizer free today: &lt;a href="https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md" rel="noopener noreferrer"&gt;get it on GitHub&lt;/a&gt; or fully managed on &lt;a href="https://console.cloud.timescale.com/signup/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=vectorlaunch&amp;amp;" rel="noopener noreferrer"&gt;Timescale Cloud&lt;/a&gt; (free for a limited time during Early Access).&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vectordatabase</category>
      <category>postgres</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Build More Accurate Grafana Trend Lines: Give Perspective with Series-Override</title>
      <dc:creator>Avthar Sewrathan</dc:creator>
      <pubDate>Thu, 30 Apr 2020 19:58:42 +0000</pubDate>
      <link>https://forem.com/tigerdata/how-to-build-more-accurate-grafana-trend-lines-give-perspective-with-series-override-9i8</link>
      <guid>https://forem.com/tigerdata/how-to-build-more-accurate-grafana-trend-lines-give-perspective-with-series-override-9i8</guid>
      <description>&lt;h2&gt;
  
  
  Problem: Skewed Trends Due to Differences in Data Scale
&lt;/h2&gt;

&lt;p&gt;Many times, we want to plot two variables on the same graph (a useful feature of viz tools like &lt;a href="https://grafana.com" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;) but run into one big problem: the scale of one variable distorts the trend line of the other.&lt;/p&gt;

&lt;p&gt;Case in point is this graph I put together to track COVID-19 cases and deaths in the USA:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frm5wlf4d6p7d3sgb6f6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frm5wlf4d6p7d3sgb6f6e.png" alt="COVID cases and deaths on same Y axis" width="606" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the scale of the total cases makes the trend line for deaths look flat, even though deaths are actually growing rapidly, as shown by the graph below, which plots only COVID-19-related deaths:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqgrc28o0gq56ya6a5rsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqgrc28o0gq56ya6a5rsd.png" alt="COVID US Deaths plotted by itself" width="597" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Viewing two related data points in one graph is extremely useful for creating informationally dense dashboards and comparing related variables, but distorted trends can have large consequences - whether that means viewing the COVID fatality situation more optimistically than we should, or misreading the relationship between our eCommerce site's unique visitors and session crashes.&lt;/p&gt;

&lt;p&gt;We need a way to more accurately represent the trends of both variables while still plotting them on the same graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution: Two Y Axes!
&lt;/h2&gt;

&lt;p&gt;The solution is to use a different Y axis for each variable on our graph. Continuing with my COVID-19 example, this means one for the total cases variable and one for the total deaths variable, as shown in the graph below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F59b03lxhoz9eonsnogz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F59b03lxhoz9eonsnogz7.png" alt="COVID Deaths and Cases plotted on different Y-axes" width="604" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we use two Y axes, one for COVID-19 total cases, on the left, and one for total deaths, on the right. &lt;/p&gt;

&lt;p&gt;Each axis has its own scale, allowing us to more accurately see the growth of each trend line without the scale of one variable (e.g., total volume of reported cases) impacting how another variable (e.g., the growing number of deaths) appears.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself: Implementation in Grafana with Series Override
&lt;/h2&gt;

&lt;p&gt;In this post, I'll show you how to use Grafana’s series override feature to implement two Y axes (and, thus, solve our two-trend line problem).&lt;/p&gt;

&lt;p&gt;We’ll use the example of charting the spread of COVID-19 cases and deaths in the USA, but the concepts apply to any dataset you’d like to visualize in Grafana. We’ll get our COVID-19 data from &lt;a href="https://github.com/nytimes/covid-19-data" rel="noopener noreferrer"&gt;the New York Times’ public dataset&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;To replicate the graph I’ll create in the following steps, you’ll need: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://grafana.com" rel="noopener noreferrer"&gt;Grafana instance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;TimescaleDB database, loaded with the NYT COVID-19 data.&lt;/li&gt;
&lt;li&gt;PostgreSQL datasource, with TimescaleDB enabled, connected to your Grafana instance. See &lt;a href="https://docs.timescale.com/latest/tutorials/tutorial-grafana/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=advocacy-apr-2020&amp;amp;utm_content=grafana-viz-doc" rel="noopener noreferrer"&gt;here&lt;/a&gt; to get this set up. &lt;/li&gt;
&lt;li&gt;Grafana panel with Graph visualization using the PostgreSQL database with the COVID data as the data source. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Create two series
&lt;/h3&gt;

&lt;p&gt;Plotting multiple series in one panel is a handy Grafana feature. Let’s create two series, one for COVID-19 cases and the other for COVID-19 deaths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_cases&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deaths&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_deaths&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;states&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how we alias the summed cases and deaths as total_cases and total_deaths, respectively; Grafana uses these aliases as the series names, which we’ll reference in the override below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Modify our visualization to add a second Y axis
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdxmub3wkfmem2v6laath.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdxmub3wkfmem2v6laath.png" alt="series override configuration settings in Grafana" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, navigate to the visualization panel (pictured above) and select the &lt;code&gt;Add series override&lt;/code&gt; button. &lt;/p&gt;

&lt;p&gt;Next, we select the name of the series we'd like to override, “total_deaths”, from the drop-down menu. To associate the series with the second Y axis, we select the ‘plus’ button and then select Y-Axis 2, as shown below: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fygvophv46auyskiwxbcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fygvophv46auyskiwxbcr.png" alt="How to find Y-axis 2 in Grafana series override settings" width="375" height="615"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we navigate down to the Axes section, we see &lt;code&gt;Left Y&lt;/code&gt; and &lt;code&gt;Right Y&lt;/code&gt;, where we customize the units and scale for each axis. &lt;/p&gt;

&lt;p&gt;In our case, we’ll leave the units as &lt;code&gt;short&lt;/code&gt; and the scale as &lt;code&gt;linear&lt;/code&gt;, since those defaults work for the scalar quantities in our COVID dataset.&lt;/p&gt;

&lt;p&gt;Finally, we save the graph and refresh. We should now see both variables, total cases and deaths, plotted on the same graph, but with differently scaled axes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnelk5simidx2lgoz96wl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnelk5simidx2lgoz96wl.jpg" alt="Before and After Series Override" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that we can now clearly see how quickly COVID-19 deaths in the USA are growing, which was difficult to discern in the original graph, where deaths and total COVID-19 cases shared a single Y axis.&lt;/p&gt;

&lt;p&gt;That’s it! We’ve successfully created a graph with two Y axes, using series-override!&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn More
&lt;/h2&gt;

&lt;p&gt;Found this tutorial useful? Here are two more resources to help you build Grafana dashboards like a pro: &lt;/p&gt;

&lt;h3&gt;
  
  
  #1 Grafana Webinar
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.timescale.com/webinar/guide-to-grafana-101-getting-started-with-alerts/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=advocacy-apr-2020&amp;amp;utm_content=grafana-101-webinar-2-signup" rel="noopener noreferrer"&gt;Join me on May 20 at 10am PT/1pm ET/4pm GMT&lt;/a&gt; where I’ll demo how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use alerts effectively when monitoring metrics in Grafana&lt;/li&gt;
&lt;li&gt;Define alert rules for your panels and dashboards&lt;/li&gt;
&lt;li&gt;Configure different notification channels, like Slack and email&lt;/li&gt;
&lt;li&gt;Take my demo and customize it for your project, team, or organization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll focus on code and step-by-step live demos – and my dashboarding-expert colleagues and I will be available to answer questions throughout the session, plus share ample resources and technical documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  #2 All-in-One Grafana Tutorial
&lt;/h3&gt;

&lt;p&gt;We’ve compiled all our tutorials, tips, and tricks for visualizing PostgreSQL data in Grafana into one doc. You’ll find everything from how to create visuals for Prometheus metrics to how to visualize geospatial data using a World Map. Check it out &lt;a href="https://docs.timescale.com/latest/tutorials/tutorial-grafana/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=advocacy-apr-2020&amp;amp;utm_content=grafana-viz-doc" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>sql</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>Devopsdays NYC 2020 Demo, Open Space Recap &amp; More</title>
      <dc:creator>Avthar Sewrathan</dc:creator>
      <pubDate>Wed, 18 Mar 2020 22:13:23 +0000</pubDate>
      <link>https://forem.com/tigerdata/devopsdays-nyc-2020-demo-open-space-recap-more-3n92</link>
      <guid>https://forem.com/tigerdata/devopsdays-nyc-2020-demo-open-space-recap-more-3n92</guid>
      <description>&lt;p&gt;&lt;strong&gt;Learn about the latest devopsdays event, get our demo, answers to community questions, and more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;(This post was originally published on the &lt;a href="https://blog.timescale.com/blog/devopsdays-nyc-2020-demo-open-space-recap-more/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Timescale Blog&lt;/a&gt; on March 13, 2020.)&lt;/p&gt;

&lt;p&gt;We recently attended the NYC installment of the &lt;a href="https://devopsdays.org/about" rel="noopener noreferrer"&gt;devopsdays event series&lt;/a&gt; (thank you to the local organizers and volunteers!), where we met with community members interested in all things monitoring, infrastructure, software development, and CI/CD.&lt;/p&gt;

&lt;p&gt;Given the cancellation of many industry events to ensure public safety and mitigate COVID-19’s spread (&lt;a href="https://blog.timescale.com/blog/charting-the-spread-of-covid-19-using-timescale/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;check out our blog post if you’re interested in monitoring it yourself&lt;/a&gt;), we’re sharing a bit about our recent experience – what we learned, what we demoed, and what we spoke about – to bring the event experience to the wider community. &lt;/p&gt;

&lt;h1&gt;
  
  
  The Demo
&lt;/h1&gt;

&lt;p&gt;During the event, I demoed how to use TimescaleDB as a long-term store for Prometheus metrics - combining Prometheus, TimescaleDB, and Grafana to monitor a piece of critical infrastructure (in this case, a database). This sort of create-your-own flexibility and customization is becoming more and more common in the conversations I have with developers, and this demo allows you to create a monitoring stack that suits your needs, without adding significant costs.&lt;/p&gt;

&lt;p&gt;Why this scenario? I was inspired by one of our Timescale Cloud customers, who uses TimescaleDB to store and analyze their Prometheus metrics. They told us how it not only saves them money and disk space but also lets them keep their data around and spot trends over longer time periods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=wm9T7lWCgpE" rel="noopener noreferrer"&gt;&lt;em&gt;See the demo in action below:&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/wm9T7lWCgpE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You’ll notice a Grafana dashboard visualizing metrics, with TimescaleDB as the data source powering the dashboard. I focused on the basic monitoring metrics below, but if you try it yourself, you can customize the setup and add metrics that give you deeper insight (e.g., query latency, queries per second, open locks, cache hits, etc.):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage&lt;/li&gt;
&lt;li&gt;Service status&lt;/li&gt;
&lt;li&gt;Disk usage (%)&lt;/li&gt;
&lt;li&gt;Database connections&lt;/li&gt;
&lt;li&gt;Memory usage (%)&lt;/li&gt;
&lt;li&gt;Network status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To replicate the demo, follow these tutorials on &lt;a href="https://docs.timescale.com/latest/tutorials/prometheus-adapter/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;how to store Prometheus metrics in Timescale&lt;/a&gt; and &lt;a href="https://docs.timescale.com/latest/tutorials/tutorial-grafana/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;how to use Timescale as a datasource to power Grafana dashboards&lt;/a&gt;.&lt;/p&gt;
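&lt;p&gt;Once the metrics land in TimescaleDB, querying them is plain SQL. Here’s a rough sketch, assuming the default &lt;code&gt;metrics&lt;/code&gt; view that pg_prometheus creates (with &lt;code&gt;time&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt;, and &lt;code&gt;labels&lt;/code&gt; columns); the metric name is just an illustrative example:&lt;/p&gt;

```sql
-- Sketch: average of a CPU metric per 5-minute bucket over the last day.
-- Assumes pg_prometheus's default "metrics" view; the metric name
-- 'node_cpu_seconds_total' is an illustrative example, not part of the demo.
SELECT time_bucket('5 minutes', time) AS bucket,
       avg(value) AS avg_value
FROM metrics
WHERE name = 'node_cpu_seconds_total'
  AND time > NOW() - INTERVAL '1 day'
GROUP BY bucket
ORDER BY bucket;
```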

&lt;h1&gt;Open Space: DevOps &amp;amp; Data&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FPnjWSJ68_HlMi7aioGnNKxJWVApCpwOC8L5ATZlNlaztfEJzEgdaNu9djwqTaxE12N0VRG-tFbUZh_jLakf5o-4hm1wLLy8tiyIMkwfr3S92_ra3IWQuxt8pGByfiItFtZ7XPRkS" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FPnjWSJ68_HlMi7aioGnNKxJWVApCpwOC8L5ATZlNlaztfEJzEgdaNu9djwqTaxE12N0VRG-tFbUZh_jLakf5o-4hm1wLLy8tiyIMkwfr3S92_ra3IWQuxt8pGByfiItFtZ7XPRkS" alt="Mat leading Open Space on DevOps and Data" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://devopsdays.org/open-space-format/" rel="noopener noreferrer"&gt;Devopsdays “Open Spaces” are a (wonderful) concept&lt;/a&gt; similar to an unconference format: there’s a block of time scheduled for any attendees to discuss topics of their choosing with other interested attendees. Simply propose a topic to the audience that you’d like to discuss for 30 mins and other attendees can pick and choose which sessions they’d like to attend.&lt;/p&gt;

&lt;p&gt;Fellow Timescaler &lt;a href="https://twitter.com/cevianNY" rel="noopener noreferrer"&gt;Matvey Arye&lt;/a&gt; and I hosted an Open Space session about DevOps and data; other sessions covered topics ranging from negotiating pay and other soft skills to DevOps at small companies and DevOps within a particular cloud ecosystem (AWS, Microsoft Azure, Google Cloud, etc.).&lt;/p&gt;

&lt;p&gt;In our session, we heard stories, best practices, and the ways developers from all industries and areas think about the DevOps data they collect.&lt;/p&gt;

&lt;h3&gt;A few highlights and commonalities&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Teams are moving away from managing infrastructure themselves and toward managed services&lt;/strong&gt; (as one person put it: “One of the key criteria when we select a new tool is that we want one less thing to manage”).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps at certain companies can be a lonely and isolating job.&lt;/strong&gt; To remedy that, folks mentioned that they’d joined (and recommend!) a few Slack workspaces: &lt;a href="https://o11y.slack.com" rel="noopener noreferrer"&gt;O11y.slack.com&lt;/a&gt;, &lt;a href="https://signup.hangops.com" rel="noopener noreferrer"&gt;HangOps&lt;/a&gt; and &lt;a href="http://www.coffeeops.org" rel="noopener noreferrer"&gt;Coffee Ops&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data is becoming increasingly central in how teams fuel their post-mortem problem analysis.&lt;/strong&gt; Developers collect data about critical incidents, search for patterns in what’s causing them, and correlate this information with how it impacts clients or users.&lt;/p&gt;

&lt;p&gt;One team’s best practice and advice (they manage a massive consumer messaging app): Take snapshots of high load periods. This way, you get more detailed information to use for planning and to calibrate for the following years. In this team’s case, the New Year’s Eve timeframe is when they see the highest number of messages sent across their global user base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes, as always, was a hot topic.&lt;/strong&gt; Two common pain points stood out (and are things that we can relate to as we &lt;a href="https://blog.timescale.com/blog/new-helm-charts-for-deploying-timescaledb-on-kubernetes/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;build our Kubernetes deployment and multi-node offerings&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Visibility into what’s happening inside clusters and pods. Someone summed it up with, “I don’t just want to know my pod is offline, I want to know what was going on inside it.” We couldn’t agree more.&lt;/li&gt;
&lt;li&gt;Aggregating observability data across clusters to simplify things for Ops teams who handle metrics from multiple application teams.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;Questions &amp;amp; Conversations&lt;/h1&gt;

&lt;p&gt;To me, the best part of any conference is the hallway conversations and hearing what community members are keen to learn. As a company, we’re help-first, so, in the spirit of helping, here are a few questions I heard again and again that may be relevant as you get up and running, or do more advanced things, with TimescaleDB:&lt;/p&gt;

&lt;h3&gt;How does TimescaleDB perform at scale?&lt;/h3&gt;

&lt;p&gt;TimescaleDB scales up well within a single node, and also offers scale-out capabilities if you use our &lt;a href="https://docs.timescale.com/clustering/getting-started/scaling-out/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;multi-node beta&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In our internal benchmarks on standard cloud VMs, we regularly test TimescaleDB to 10+ billion rows while sustaining insert rates of 100-200k rows per second (1-2 million metric inserts per second). On more powerful hardware, we’ve seen users scale a single-node setup to 500 billion rows of data while sustaining 400k row inserts per second. To learn more about how TimescaleDB is architected to achieve this scale, see this &lt;a href="https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;blog explainer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And, in our internal tests, a multi-node beta setup with 9 nodes achieved an insert rate of over &lt;a href="https://blog.timescale.com/blog/building-a-distributed-time-series-database-on-postgresql/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;12 million metrics per second&lt;/a&gt; (and you can read more about our multi-node benchmarking &lt;a href="https://blog.timescale.com/blog/achieving-optimal-query-performance-with-a-distributed-time-series-database-on-postgresql/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;What’s the role of a long-term data store? What types of things does this allow me to do?&lt;/h3&gt;

&lt;p&gt;In order to keep Prometheus simple and easy to operate, its creators intentionally left out some of the scaling features developers typically need. Prometheus stores data locally within the instance and is not replicated. While having both compute and data storage on one node makes it easier to operate, it also makes it harder to scale and ensure high availability.&lt;/p&gt;

&lt;p&gt;More specifically, this means Prometheus data isn’t arbitrarily scalable or durable in the face of disk or node outages.&lt;/p&gt;

&lt;p&gt;Simply put, Prometheus isn’t designed to be a long-term metrics store. However, its creators also made Prometheus extremely extensible, and, thus, you can use TimescaleDB to store metrics for longer periods of time, which helps with capacity planning and system calibration. This combination also enables &lt;a href="https://blog.timescale.com/blog/prometheus-ha-postgresql-8de68d19b6f5/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;high availability&lt;/a&gt; and provides &lt;a href="https://blog.timescale.com/blog/sql-nosql-data-storage-for-prometheus-devops-monitoring-postgresql-timescaledb-time-series-3cde27fd1e07/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;advanced capabilities and features&lt;/a&gt;, such as full SQL, joins and replication (things not available in Prometheus). To learn more, see &lt;a href="https://docs.timescale.com/latest/tutorials/prometheus-adapter/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;why use TimescaleDB and Prometheus&lt;/a&gt;.&lt;/p&gt;
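&lt;p&gt;As a taste of what “full SQL and joins” buys you: because the metrics sit in a relational database, you can join them against your business tables. A hypothetical sketch (the &lt;code&gt;deployments&lt;/code&gt; table and the metric name are made up for illustration):&lt;/p&gt;

```sql
-- Sketch: memory usage in the 30 minutes around each deployment,
-- to eyeball whether releases correlate with memory spikes.
-- The "deployments" table and metric name are hypothetical examples.
SELECT d.deployed_at,
       time_bucket('1 minute', m.time) AS bucket,
       avg(m.value) AS avg_memory_bytes
FROM deployments d
JOIN metrics m
  ON m.time BETWEEN d.deployed_at - INTERVAL '30 minutes'
             AND d.deployed_at + INTERVAL '30 minutes'
WHERE m.name = 'node_memory_Active_bytes'
GROUP BY d.deployed_at, bucket
ORDER BY d.deployed_at, bucket;
```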

&lt;h3&gt;How do I use TimescaleDB and Prometheus? Do I have to use any special connectors?&lt;/h3&gt;

&lt;p&gt;Check out the demo :). I suggest using TimescaleDB as a remote read and write store for Prometheus metrics, whether they come from internal infrastructure or your public-facing e-commerce website. Since TimescaleDB extends Postgres, you use the &lt;a href="https://github.com/timescale/pg_prometheus" rel="noopener noreferrer"&gt;pg_prometheus extension&lt;/a&gt; for Postgres and our &lt;a href="https://github.com/timescale/prometheus-postgresql-adapter" rel="noopener noreferrer"&gt;prometheus_postgresql_adapter&lt;/a&gt;, and you’re ready to get started.&lt;/p&gt;
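&lt;p&gt;On the Prometheus side, the wiring is a short config change. A minimal sketch, assuming the adapter is running locally on its default port (9201):&lt;/p&gt;

```yaml
# prometheus.yml (fragment) -- send samples to, and read them back from,
# the prometheus-postgresql-adapter. localhost:9201 assumes the adapter's
# default listen address; adjust for your deployment.
remote_write:
  - url: "http://localhost:9201/write"
remote_read:
  - url: "http://localhost:9201/read"
```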

&lt;p&gt;Whatever works with Postgres works with TimescaleDB. So, if you want to connect to &lt;a href="https://docs.timescale.com/latest/using-timescaledb/visualizing-data/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;viz tools&lt;/a&gt; (like &lt;a href="https://docs.timescale.com/latest/tutorials/tutorial-grafana/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt; or &lt;a href="https://docs.timescale.com/latest/tutorials/visualizing-time-series-data-in-tableau/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Tableau&lt;/a&gt;), ingest data from sources like &lt;a href="https://blog.timescale.com/blog/create-a-data-pipeline-with-timescaledb-and-kafka/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt;, or insert and analyze data using your favorite programming language (like Python or Go), just use one of the many connectors and libraries in the Postgres ecosystem.&lt;/p&gt;

&lt;h1&gt;Want to learn more?&lt;/h1&gt;

&lt;p&gt;Thank you again to the devopsdays NYC team for your work to pull off such an interactive, fun, and community-first event! We’ll definitely be attending as future events are announced (virtually or otherwise).&lt;/p&gt;

&lt;p&gt;In the meantime, those resources once more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=wm9T7lWCgpE" rel="noopener noreferrer"&gt;Demo Video&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tutorials: &lt;a href="https://docs.timescale.com/latest/tutorials/prometheus-adapter/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, &lt;a href="https://docs.timescale.com/latest/tutorials/tutorial-grafana/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...and, in the event you’d like to see an advanced version of this demo and/or are keen to join some #remote-friendly events, you can join me on March 25 at 12 pm ET for &lt;a href="https://www.timescale.com/webinar/how-to-analyze-your-prometheus-data-in-sql-3-queries-you-need-to-know/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;&lt;strong&gt;“How to Analyze Your Prometheus Data in SQL: 3 Queries You Need to Know.”&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I’ll focus on code and showing vs. telling: You’ll learn how to write custom SQL queries to analyze infrastructure monitoring metrics and create Grafana visualizations to see trends, and I’ll answer any questions that you may have.&lt;/li&gt;
&lt;li&gt;Interested? &lt;a href="https://www.timescale.com/webinar/how-to-analyze-your-prometheus-data-in-sql-3-queries-you-need-to-know/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;Sign up here&lt;/a&gt;. You’ll receive the recording and resources shortly following the session, so &lt;a href="https://www.timescale.com/webinar/how-to-analyze-your-prometheus-data-in-sql-3-queries-you-need-to-know/?utm_source=devto-devopsday2020&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mar-2020-advocacy" rel="noopener noreferrer"&gt;register&lt;/a&gt; even if you can’t attend live.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>eventsinyourcity</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
