<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arunkumar Panneerselvam</title>
    <description>The latest articles on Forem by Arunkumar Panneerselvam (@arunkumar_panneerselvam_2).</description>
    <link>https://forem.com/arunkumar_panneerselvam_2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3490868%2F737bd6fa-1ebc-46e3-aee1-fc7c739a9b11.png</url>
      <title>Forem: Arunkumar Panneerselvam</title>
      <link>https://forem.com/arunkumar_panneerselvam_2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/arunkumar_panneerselvam_2"/>
    <language>en</language>
    <item>
      <title>5 Surprising dbt Truths That Will Change How You Work</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 18 Nov 2025 04:29:30 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/5-surprising-dbt-truths-that-will-change-how-you-work-6p0</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/5-surprising-dbt-truths-that-will-change-how-you-work-6p0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;When you first start with dbt, the learning curve feels straightforward. You master the essentials: dbt run executes your models, and the ref() function magically connects them into a DAG. It feels like you've grasped the core of the tool. But beneath this surface lies a set of powerful, non-obvious features and behaviors that can fundamentally change how you build, test, and maintain your data pipelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article pulls back the curtain on a handful of these surprising and impactful truths about dbt. These aren't just niche tricks; they are fundamental concepts that, once understood, unlock a more reliable, efficient, and scalable way of working.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;dbt build Is More Than Just a Shortcut: It's an Atomic Guardian of Your DAG&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many dbt practitioners start their journey by running dbt run to build models, followed by a separate dbt test to validate them. It seems logical. However, dbt build isn't just a convenient command that bundles these two steps; it's a more powerful, integrated command that operates with a crucial, surprising intelligence.&lt;/p&gt;

&lt;p&gt;The dbt build command executes resources—models, tests, snapshots, and seeds—in their correct DAG order. But its most impactful feature is how it handles test failures. It introduces atomicity into your workflow, ensuring that a failure in an upstream resource prevents downstream resources from ever running.&lt;/p&gt;

&lt;p&gt;Tests on upstream resources run before anything downstream of them, and a test failure causes those downstream resources to be skipped entirely.&lt;br&gt;
This behavior is a game-changer for data pipeline reliability, especially in CI/CD environments. If a quality test on an upstream model fails, dbt build avoids wasting time and compute on costly downstream models that would inevitably be built on corrupted or invalid data. It's an intelligent guardrail that actively protects your data ecosystem, ensuring that corrupted data never pollutes downstream models and saving you compute costs and, more importantly, trust.&lt;/p&gt;
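&lt;p&gt;As a sketch of the workflow difference (the model name fct_orders is a hypothetical example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Two separate passes: tests only run after every model has been built
dbt run
dbt test

# One pass: each test runs right after its resource,
# and a failure skips everything downstream of it
dbt build --select +fct_orders
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;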

&lt;ol start="2"&gt;
&lt;li&gt;Your dbt compile Command Secretly Talks to Your Warehouse&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s a common and intuitive assumption: dbt compile is a purely local operation. You expect it to simply take your Jinja-infused SQL files and render them into the pure, executable SQL that will eventually be sent to the warehouse. It feels like a dry run that shouldn't need any external connections. Surprisingly, this is incorrect. The dbt compile command requires an active connection to your data platform.&lt;/p&gt;

&lt;p&gt;The reason is that compile does more than just render Jinja. It needs to run "introspective queries" against the warehouse to gather metadata. This is essential for tasks like populating dbt’s relation cache (so it knows what tables already exist) and resolving certain powerful macros, such as dbt_utils.get_column_values, which query the database to function.&lt;br&gt;
Understanding this clarifies why a compile might fail due to connection issues and distinguishes it from dbt parse, which is a local operation that can be run without a warehouse connection to validate your project's structure and YAML.&lt;/p&gt;
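&lt;p&gt;A quick way to feel the difference is to run both commands while the warehouse is unreachable (for example, with invalid credentials in profiles.yml):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Local only: validates project structure, YAML, and Jinja syntax
dbt parse

# Opens a warehouse connection for introspective queries,
# so it fails when the connection is unavailable
dbt compile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;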

&lt;ol start="3"&gt;
&lt;li&gt;dbt Snapshots Aren't Backups—They're Time Machines for Your Data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The word "snapshot" often evokes the idea of a database backup—a complete copy of a table at a specific point in time. This leads many to misunderstand the true and far more powerful purpose of dbt's snapshot feature.&lt;/p&gt;

&lt;p&gt;dbt snapshots are not backups. They are dbt's native mechanism for implementing Type-2 Slowly Changing Dimensions (SCDs) over mutable source tables. Their purpose is to record how a specific row in a source table changes over time, especially when that source system overwrites data instead of preserving history.&lt;/p&gt;

&lt;p&gt;Snapshots work by monitoring a source table and creating a new record in a snapshot table every time a row changes. To manage this history, dbt adds special metadata columns, most notably dbt_valid_from and dbt_valid_to, which record the exact timestamp range during which a version of a row was valid. This is profoundly impactful for any analyst who needs to "look back in time" and understand, for example, what a customer's address was a year ago, even if the source database only stores the current address.&lt;/p&gt;
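&lt;p&gt;For illustration, a minimal timestamp-strategy snapshot might look like this (the source, key, and column names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each dbt snapshot run compares the current source rows against the snapshot table and closes out (sets dbt_valid_to on) any version of a row that has changed.&lt;/p&gt;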

&lt;ol start="4"&gt;
&lt;li&gt;Custom Schemas Have a Hidden Prefix (For a Good Reason)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a scenario that trips up nearly every new dbt user. You want to organize your project, so you add schema: marketing to a model's configuration. You run dbt, check your warehouse, and are surprised to find the model not in a schema named marketing, but in one named something like alice_dev_marketing or analytics_prod_marketing.&lt;/p&gt;

&lt;p&gt;This is dbt's default behavior, and it's by design. By default, dbt generates a schema name by combining the target schema from your profiles.yml with the custom schema you configured, creating a final name like target_schema_custom_schema. This is why a model with schema: marketing built by a developer whose target schema is alice_dev lands in a schema named alice_dev_marketing, not marketing.&lt;/p&gt;

&lt;p&gt;The critical reasoning behind this is to enable safe, collaborative development. Each developer works in their own target schema (e.g., alice_dev). This prefixing behavior ensures that when Alice builds the marketing models, they land in her isolated alice_dev_marketing schema, preventing her from overwriting the work of a colleague or, more critically, the production tables. While this behavior can be fully customized for production environments by overriding the generate_schema_name macro, the default is a powerful safeguard for team-based workflows.&lt;/p&gt;
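&lt;p&gt;As a sketch, a common override keeps the safe prefixed default in development but uses the bare custom schema in production. It would live in macros/generate_schema_name.sql; the target name 'prod' is an assumption about how your profiles.yml names its production target:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- elif target.name == 'prod' -%}
        {# production: use the custom schema as-is, e.g. marketing #}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {# development: keep dbt's safe default, e.g. alice_dev_marketing #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;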

&lt;ol start="5"&gt;
&lt;li&gt;The ref() Function Is a Swiss Army Knife for Dependencies&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every dbt user learns the ref() function on day one. It's the function that builds the DAG. But its capabilities extend far beyond this basic, single-argument use. Two advanced patterns in particular unlock more robust and scalable project architectures.&lt;/p&gt;

&lt;p&gt;First, the two-argument ref() is your key to dbt Mesh. When you need to reference a model from another dbt project or an installed package, you can pass two arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{ ref('project_or_package', 'model_name') }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This syntax creates an explicit, unambiguous dependency on a public model maintained by another team or package, which is the foundational pattern for building a scalable, multi-project dbt Mesh architecture.&lt;/p&gt;

&lt;p&gt;Second, you can force dependencies that dbt can't see. Sometimes, a ref() call is placed inside a conditional Jinja block, like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% if execute %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which is only evaluated at run time. During dbt's initial parsing phase, the execute variable is false. This means the parser never steps inside the 'if execute' block, so it is completely blind to the ref() call within it and fails to build the dependency graph correctly. &lt;/p&gt;

&lt;p&gt;To solve this, you can add a simple SQL comment outside the block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- depends_on: {{ ref('model_name') }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;dbt's parser is smart enough to evaluate Jinja inside SQL comments, allowing it to detect the dependency every time while keeping the compiled SQL valid.&lt;/p&gt;
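&lt;p&gt;Putting the pattern together, a model file using this workaround might be sketched as follows (the model name and query are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- depends_on: {{ ref('upstream_model') }}

{% if execute %}
    {# The parser never enters this block, so without the
       comment above it would miss this ref() entirely #}
    {% set statuses = run_query("select distinct status from " ~ ref('upstream_model')) %}
{% endif %}

select 1 as id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;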

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;From revealing that dbt build is an atomic guardian of your DAG, not just a shortcut, to uncovering the hidden network calls of dbt compile, it’s clear that dbt’s most powerful features lie just beneath the surface. These five "truths" are just a starting point. By moving beyond the initial basics, you can build data pipelines that are not only functional but also more reliable, scalable, and easier to maintain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What hidden dbt feature has been a game-changer in your own data workflow?&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Azure Data Factory: Building a Simple Pipeline - Step-by-Step Guide</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 23 Sep 2025 05:31:44 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-building-a-simple-pipeline-step-by-step-guide-ilb</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-building-a-simple-pipeline-step-by-step-guide-ilb</guid>
      <description>&lt;p&gt;Azure Data Factory (ADF) makes it easy to move, transform, and automate data workflows in the cloud. In this post, I will walk through creating a simple ADF pipeline from setting up core resources and GitHub integration to copying data between storage containers and monitoring the entire process.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Creating a Data Factory Instance
&lt;/h3&gt;

&lt;p&gt;To get started, I created a new data factory resource in Azure named coredata-datafactory1.&lt;/p&gt;

&lt;h4&gt;
  
  
  Navigation Steps:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Log in to the Azure Portal.&lt;/li&gt;
&lt;li&gt;Select “Create a resource” &amp;gt; “Analytics” &amp;gt; “Data Factory.”&lt;/li&gt;
&lt;li&gt;Fill in the resource details: subscription, resource group, unique factory name, region, and version (V2), leaving the other tabs at their defaults.&lt;/li&gt;
&lt;li&gt;Refer to Step 2 for Git configuration (this can also be done later).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkc5mwnfn4hgcpaq404j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkc5mwnfn4hgcpaq404j.png" alt="Basics tab" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proceed to the “Review + create” tab and confirm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyj7hoxcssbbcat6d0qn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyj7hoxcssbbcat6d0qn.png" alt="review and create tab" width="800" height="771"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After deployment, hit “Go to resource.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bryv8hjj6lmclxg9zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bryv8hjj6lmclxg9zw.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;
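&lt;p&gt;If you prefer the command line, the same resource can be created with the Azure CLI. This is a sketch: the resource group name and region below are example values, and the datafactory extension must be installed first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One-time: add the Data Factory CLI extension
az extension add --name datafactory

# Create the factory (V2) in an existing resource group
az datafactory create \
  --resource-group coredata-rg \
  --factory-name coredata-datafactory1 \
  --location eastus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;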




&lt;h3&gt;
  
  
  2. Setting Up GitHub Integration
&lt;/h3&gt;

&lt;p&gt;Source control is key for managing changes and collaborating with ease. I set up my GitHub account and created a private repository named coredata-azuredatafactory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F806kkecw1hweukvqqtvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F806kkecw1hweukvqqtvo.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authorize ADF to access the repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46a52jkbrzpqr8wvv7ma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46a52jkbrzpqr8wvv7ma.png" alt=" " width="800" height="1167"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Exploring Azure Data Factory Studio
&lt;/h3&gt;

&lt;p&gt;Next, I launched the Data Factory Studio. The homepage provides a simple navigation pane, making it easy to design and monitor your data flows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe87woyht5b33oj4nr2r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe87woyht5b33oj4nr2r0.png" alt=" " width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Creating a Storage Account
&lt;/h3&gt;

&lt;p&gt;For this demo, I created a storage account named coredatadatastorage1. The account serves as both the data source and destination.&lt;/p&gt;

&lt;h4&gt;
  
  
  Navigation Steps:
&lt;/h4&gt;

&lt;p&gt;In the Azure Portal, click “Create a resource” &amp;gt; “Storage” &amp;gt; “Storage account.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbreypyksbog1zxziclfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbreypyksbog1zxziclfd.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specify the required configuration (resource group, region, account name, redundancy).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z0cnro3bkywlj4n0mo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z0cnro3bkywlj4n0mo3.png" alt=" " width="800" height="851"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the “Review + create” screen, review the settings and click “Create.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8psbdufj1cn14pi06i6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8psbdufj1cn14pi06i6w.png" alt=" " width="800" height="819"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once deployed, go to the resource overview page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu2jl57ft1xo8lgptctb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu2jl57ft1xo8lgptctb.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
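&lt;p&gt;The equivalent Azure CLI command, for reference (the resource group name and region are example values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a locally redundant (LRS) general-purpose storage account
az storage account create \
  --name coredatadatastorage1 \
  --resource-group coredata-rg \
  --location eastus \
  --sku Standard_LRS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;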




&lt;h3&gt;
  
  
  5. Designing the Data Pipeline
&lt;/h3&gt;

&lt;p&gt;I created a simple pipeline called data_copy_pipeline to copy data from an input directory to an output directory in Blob Storage.&lt;/p&gt;

&lt;p&gt;Open Data Factory Studio and click on “Author.”&lt;/p&gt;

&lt;p&gt;Select “Pipeline” and click “New pipeline.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk5gny0n9glteuoiexk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk5gny0n9glteuoiexk.png" alt=" " width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Name your pipeline (data_copy_pipeline).&lt;/p&gt;

&lt;p&gt;Go to "Activities" and under "Move and transform," select the "Copy data"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfey72dxm7sq3ilgqym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfey72dxm7sq3ilgqym.png" alt=" " width="588" height="275"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Creating Datasets
&lt;/h3&gt;

&lt;p&gt;For this pipeline, I needed datasets to represent the source (input file) and sink (output file).&lt;/p&gt;

&lt;h4&gt;
  
  
  Source Dataset:
&lt;/h4&gt;

&lt;p&gt;Choose Azure Blob Storage as the source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7vj674cg1mey9rz5quf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7vj674cg1mey9rz5quf.png" alt=" " width="800" height="1096"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since my file is a plain-text (.txt) file, I selected the Binary format:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98rlgh9f80kxv2z4fgya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98rlgh9f80kxv2z4fgya.png" alt=" " width="800" height="828"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Sink Dataset:
&lt;/h4&gt;

&lt;p&gt;Set Azure Blob Storage as the destination (same as source).&lt;/p&gt;

&lt;p&gt;Name the output file (e.g., test_data_out.log).&lt;/p&gt;




&lt;h3&gt;
  
  
  7. Configuring Linked Services
&lt;/h3&gt;

&lt;p&gt;Each dataset requires a linked service that defines how ADF connects to the underlying storage.&lt;/p&gt;

&lt;p&gt;Create a linked service for the source by providing storage account credentials and selecting the container or directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l18agncwct473n638zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l18agncwct473n638zf.png" alt=" " width="800" height="1091"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the test input file (e.g., test_data.txt).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn0k34jw0n23opzs8drg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn0k34jw0n23opzs8drg.png" alt=" " width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2wwz25snoaqaisspiwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2wwz25snoaqaisspiwy.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat for the sink (output container and path).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlv1qydvxpvsms1pdkvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlv1qydvxpvsms1pdkvu.png" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test the connection to ensure setup is correct.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrssd3qqspsg0keo4mo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrssd3qqspsg0keo4mo.png" alt=" " width="352" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3vadylxf45w0ckdea01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3vadylxf45w0ckdea01.png" alt=" " width="552" height="142"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  8. Running &amp;amp; Monitoring the Pipeline
&lt;/h3&gt;

&lt;p&gt;With everything set, I manually triggered the pipeline using the “Trigger now” button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9o2gqecjfnkpmg64xdm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9o2gqecjfnkpmg64xdm.png" alt=" " width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After execution, navigate to “Monitor” and check the status under “Pipeline runs.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjy9k130mg0fufctum80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjy9k130mg0fufctum80.png" alt=" " width="564" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50kka578u5wszk8avn0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50kka578u5wszk8avn0.png" alt=" " width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Review the activity details under “Activity runs”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kmv2ps8a39vnncs87my.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kmv2ps8a39vnncs87my.png" alt=" " width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inspect the details tab for run-specific metadata and logs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl23czm2oqujloxdiy41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl23czm2oqujloxdiy41.png" alt=" " width="800" height="625"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  9. Verifying the Output
&lt;/h3&gt;

&lt;p&gt;Finally, I verified that the output file was written to the specified sink directory in Blob Storage (test_data_out.log).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h0xln90r6fe6nfov7mr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h0xln90r6fe6nfov7mr.png" alt=" " width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;By following these steps, you can quickly spin up your own Azure Data Factory, connect it with source control, design simple data pipelines, and move data between Azure storage resources. The visual tools and straightforward setup make it incredibly friendly—even for those just getting started with cloud data engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Azure Data Factory empowers users to orchestrate cloud-scale data movement and transformation with minimal setup or code. Its integration with GitHub gives you confidence in version control, and the monitoring features keep you in the loop at every stage. Next, you can explore scheduling, data transformations, and integrating with other services—but even a basic pipeline gives you a strong foundation for bigger data projects.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for reading! If you found this post helpful or inspiring, please leave a comment below with your thoughts or questions. I would love to hear your feedback and experiences. Feel free to share this article with friends or colleagues who might benefit too.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep transforming and exploring new data possibilities with Azure Data Factory!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Azure Data Factory (ADF) - A Beginner's Guide to Cloud Data Integration</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 23 Sep 2025 01:00:44 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-adf-a-beginners-guide-to-cloud-data-integration-53jd</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-adf-a-beginners-guide-to-cloud-data-integration-53jd</guid>
      <description>&lt;p&gt;Azure Data Factory (ADF) is a powerful, fully managed cloud service from Microsoft designed to simplify the process of moving, transforming, and orchestrating data at scale. Whether you are a developer, data engineer, or analyst, ADF provides a versatile platform to build data-driven workflows that automate data intake and processing across many sources and destinations. This blog introduces the core concepts, components, and benefits of Azure Data Factory, helping beginners understand how to get started easily.&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  What is Azure Data Factory?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Azure Data Factory is a serverless data integration platform built for modern cloud and hybrid data scenarios. It helps create automated data pipelines that extract data from diverse sources, perform transformations, and load the data into sinks (like data warehouses or lakes). You can think of ADF as a data orchestrator that ensures the right data moves efficiently and reliably between systems to enable analytics and reporting.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It supports a wide range of data sources from on-premises databases to cloud storage, SaaS applications, and big data stores. ADF also integrates well with other Azure services, making it ideal for enterprises adopting cloud data modernization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Concepts and Components
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Pipeline
&lt;/h4&gt;

&lt;p&gt;A pipeline is a logical grouping of activities that perform a unit of work. For instance, a pipeline might copy data from an Azure Blob storage location and then transform it using a compute service like Azure Databricks. Pipelines enable you to manage the workflow as a single, coordinated job, running steps either sequentially or in parallel.&lt;/p&gt;

&lt;h4&gt;
  
  
  Activity
&lt;/h4&gt;

&lt;p&gt;An activity represents a single task within a pipeline. There are many types, including but not limited to copying data, running a stored procedure, executing a data flow (for transformations), or invoking a REST endpoint. Activities are the building blocks to implement your data processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filtk8pq8irilr5jn33w2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filtk8pq8irilr5jn33w2.png" alt=" " width="206" height="759"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Dataset
&lt;/h4&gt;

&lt;p&gt;Datasets represent data structures within your data stores, like tables or files. They act as inputs or outputs for activities, specifying what data is being processed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7spuknmq4665mw4i3j40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7spuknmq4665mw4i3j40.png" alt=" " width="626" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Linked Service
&lt;/h4&gt;

&lt;p&gt;A linked service is a connection configuration (typically a connection string plus credentials) that defines the connection to a data source, destination, or compute resource. For instance, a linked service might connect ADF to an Azure SQL database or an Amazon S3 bucket. &lt;a href="https://docs.microsoft.com/azure/data-factory/concepts-linked-services" rel="noopener noreferrer"&gt;Learn More&lt;/a&gt;&lt;/p&gt;
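&lt;p&gt;As a hedged illustration of what a linked service definition looks like, here is the JSON shape ADF stores, built as a Python dict; the service name and connection-string placeholders are assumptions, not values from a real account:&lt;/p&gt;

```python
# Illustrative sketch: an Azure Blob Storage linked service expressed as
# the JSON document ADF stores. The name and the connection-string
# placeholders are hypothetical, not from a real account.
linked_service = {
    "name": "MyBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<storage_account>;AccountKey=<key>"
            )
        },
    },
}
```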

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhlu7oz1gdul3j7xswdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhlu7oz1gdul3j7xswdl.png" alt=" " width="622" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration Runtime
&lt;/h4&gt;

&lt;p&gt;This is the compute infrastructure that performs data movement and transformation. You can use Azure-hosted runtimes or self-hosted runtimes to securely access on-premises data. &lt;a href="https://docs.microsoft.com/azure/data-factory/concepts-integration-runtime" rel="noopener noreferrer"&gt;Learn More&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgttqc86k6trs3ndv5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgttqc86k6trs3ndv5s.png" alt=" " width="626" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Flows
&lt;/h4&gt;

&lt;p&gt;Data Flows are visual transformation tools that let you build transformation logic with a drag-and-drop experience. They execute on managed Spark clusters, handling large-scale data processing without writing Spark code directly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Triggers
&lt;/h4&gt;

&lt;p&gt;Triggers define when pipelines run—on a schedule, in response to an event, or manually. You can automate your pipelines to execute daily, hourly, or based on file arrivals.&lt;/p&gt;
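&lt;p&gt;For example, a daily schedule trigger can be sketched as the JSON ADF stores (a hedged example; the trigger name, pipeline name, and start time are placeholders):&lt;/p&gt;

```python
# Illustrative sketch: a daily schedule trigger expressed as the JSON
# ADF stores. "DailyTrigger", "MyPipeline", and the start time are
# hypothetical placeholders.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # run once per day
                "interval": 1,
                "startTime": "2025-01-01T00:00:00Z",
            }
        },
        "pipelines": [
            {"pipelineReference": {"referenceName": "MyPipeline",
                                   "type": "PipelineReference"}}
        ],
    },
}
```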




&lt;h3&gt;
  
  
  Basic Pipeline Example: Copy a Test Data File (.txt) Within Azure Blob Storage (Source &amp;amp; Sink)
&lt;/h3&gt;

&lt;p&gt;Here is an outline of the high-level steps; a detailed walkthrough can follow in a separate post.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Create Linked Services: Define connections for your source (Azure Blob Storage) and sink (Azure Blob Storage).&lt;br&gt;
Create Datasets: Define datasets that point to your source files and target.&lt;br&gt;
Build a Pipeline: Add a Copy Activity that moves data from the source dataset to the sink dataset.&lt;br&gt;
Add a Trigger: Run the pipeline once manually, or schedule it daily to keep your target updated with new data files.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This simple setup can be extended to include data transformations, error handling, and notifications, showcasing ADF’s automation capabilities.&lt;/p&gt;
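&lt;p&gt;The steps above can be sketched as the pipeline JSON ADF would store for the Copy Activity (a hedged example; the pipeline, activity, and dataset names are illustrative placeholders, not names ADF generates):&lt;/p&gt;

```python
# Illustrative sketch: a Copy pipeline expressed as the JSON document
# ADF stores, built as a Python dict. "CopyTextFilePipeline",
# "CopyTextFile", and the dataset names are hypothetical placeholders.
pipeline = {
    "name": "CopyTextFilePipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyTextFile",
                "type": "Copy",
                # Datasets created in the previous step, referenced by name.
                "inputs": [{"referenceName": "SourceTxtDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkTxtDataset",
                             "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}
```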

&lt;h3&gt;
  
  
  Why Use Azure Data Factory?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Scalable and Managed
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;No need to manage servers; scale processing up or down as needed.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Wide Connectivity
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Supports hundreds of connectors for various on-premises and cloud data platforms.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Code-Free UI
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Build complex ETL/ELT workflows with minimal coding.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Advanced Data Flows
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Use Spark-based data flows for heavy transformations.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration with Azure Ecosystem
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Easily connect with Azure Synapse, Databricks, Logic Apps, and more.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Monitoring and Alerting
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Built-in monitoring dashboards provide detailed pipeline run information and failure alerts.&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Azure Data Factory is a versatile tool that empowers data engineers and organizations to automate data workflows across hybrid and cloud environments. Its well-structured components (pipelines, datasets, linked services, activities, and integration runtimes) work together to create seamless data orchestration solutions. Whether ingesting, transforming, or transferring data, ADF offers a scalable, low-maintenance platform with powerful automation and monitoring features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding and leveraging Azure Data Factory can dramatically improve data workflows’ efficiency and reliability. Its rich feature set allows beginners to quickly get started and experts to build complex solutions at scale. By mastering the basics of ADF, you can contribute significantly to any data-driven organization’s success, enabling faster insights and smarter business decisions.&lt;/p&gt;

&lt;p&gt;This overview aims to provide a clear and approachable introduction that invites further exploration and hands-on learning with Azure Data Factory.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for reading! If you found this post helpful or inspiring, please leave a comment below with your thoughts or questions. I would love to hear your feedback and experiences. Feel free to share this article with friends or colleagues who might benefit too.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep transforming and exploring new data possibilities with Azure Data Factory!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>dataengineering</category>
      <category>cloud</category>
      <category>beginners</category>
      <category>azure</category>
    </item>
    <item>
      <title>Automating Your Local DBT &amp; Snowflake Playground with Python</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 16 Sep 2025 05:26:54 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/automating-your-local-dbt-snowflake-playground-with-python-7gl</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/automating-your-local-dbt-snowflake-playground-with-python-7gl</guid>
      <description>&lt;p&gt;&lt;em&gt;&amp;gt; Harnessing Python to reverse-engineer Snowflake metadata as dbt sources an easy, efficient playground setup&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Everyone learns differently. Whether it's diving into textbooks, absorbing visual content, or hands-on tinkering, the path to mastery varies widely. Personally, I find the best way is learning by doing: building, automating, debugging, and iterating until something clicks.&lt;/p&gt;

&lt;p&gt;Recently, I explored how to automatically generate a full dbt project structure from a Snowflake trial account, taking away the toil of manual YAML and SQL file creation. If you’ve dabbled in dbt or Snowflake, you know setting up sources can be time-consuming, especially across many schemas and tables.&lt;/p&gt;

&lt;p&gt;This guide, however, is not a beginner’s tutorial on Snowflake or dbt basics. I assume you’ve got some familiarity already. Instead, it focuses on streamlining setup using Python scripts and Visual Studio Code (or any coding environment you prefer), enabling rapid experimentation and transformation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Automate Source Generation?
&lt;/h4&gt;

&lt;p&gt;Manual source definition is repetitive and error-prone. When working with complex databases, initial setup becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Using a Python script to introspect the Snowflake metadata and generate compliant dbt source files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saves hours of setup time&lt;/li&gt;
&lt;li&gt;Avoids human mistakes in YAML/SQL&lt;/li&gt;
&lt;li&gt;Scales effortlessly as schemas or tables grow&lt;/li&gt;
&lt;li&gt;Sets a foundation for continuous and reproducible transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To set up your development environment, you’ll need a few tools ready:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Python 3.13+: Download &amp;amp; install&lt;br&gt;
Pip: The Python package manager — install if not available&lt;br&gt;
Virtualenv: To isolate your dependencies&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 1 — Install Python&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download Python &lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.python.org/downloads/?source=post_page-----bc81f31c103c---------------------------------------" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Fopengraph-icon-200x200.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.python.org/downloads/?source=post_page-----bc81f31c103c---------------------------------------" rel="noopener noreferrer" class="c-link"&gt;
            Download Python | Python.org
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            The official home of the Python Programming Language
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Ffavicon.ico"&gt;
          python.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Download and install Python first. I recommend installing the latest version of Python 3.&lt;/p&gt;

&lt;p&gt;You can verify that Python installed correctly by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;you should receive something like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python 3.12.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 2 — Install pip&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download pip:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then install it:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python get-pip.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You should see similar to this,&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS C:\Users \arun&amp;gt; curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py                                                                                                                                                   PS C:\Users \arun&amp;gt; python get-pip.py
Collecting pip
  Using cached pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-25.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.2
    Uninstalling pip-25.2:
      Successfully uninstalled pip-25.2
Successfully installed pip-25.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;you should get something like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS C:\Users \arun&amp;gt; python -m pip --version
pip 25.2 from C:\Users \arun\AppData\Local\Programs\Python\Python313\Lib\site-packages\pip (python 3.13)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If you get errors when running pip, the PATH environment variable is probably not set up correctly. Add Python’s Scripts directory to your PATH and reopen the terminal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3 — Install virtualenv&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Virtualenv is a tool in Python that allows you to create isolated environments where you can install packages and dependencies without affecting the global Python installation. It helps manage project-specific dependencies and avoids conflicts between different projects.&lt;/p&gt;

&lt;p&gt;You can run the whole project without virtualenv, but I prefer isolating projects in case I need different Python or library versions.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install --user virtualenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Installation output should be similar to this,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zp60o7zb56b4yuxvo6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zp60o7zb56b4yuxvo6b.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 4 — Create the Virtual Environment&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now we are ready to create our virtual environment: first we create it, then we activate it. Go to the directory where you want to keep this project and run:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv dbt-env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will install the environment under that directory as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ukq4of5kxg7avce6h9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ukq4of5kxg7avce6h9t.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://virtualenv.pypa.io/en/latest/user_guide.html" rel="noopener noreferrer"&gt;user guide &lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 5 — Activate the Virtual Environment&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the virtual environment is created, you need to activate it. In PowerShell, from the directory where you created the environment, activate it like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.\Scripts\activate.ps1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It will look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa89jp254hzzytysvoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa89jp254hzzytysvoo.png" alt="Virtualenv shown in green at the left"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you want to operate within this project, ensure that you have the dbt-env activated, as shown above. Additionally, all subsequent installations and operations must occur within this directory or its subdirectories.&lt;/p&gt;
&lt;h2&gt;
  
  
  Installing DBT
&lt;/h2&gt;

&lt;p&gt;Installing dbt with pip couldn't be simpler. Follow the instructions in the dbt documentation; I summarize them below for Snowflake specifically.&lt;/p&gt;

&lt;p&gt;Install dbt-core:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install dbt-core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then install the Snowflake-specific libraries:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install dbt-snowflake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can see the libraries and the versions under your environment as shown below:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It should show your libraries as below (note: I may have more libraries than you, as I have made additional installations).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrw87kjm3l2b2uqfp2fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrw87kjm3l2b2uqfp2fk.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Snowflake instance
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Creating a Trial Snowflake Account&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you don’t have access to an enterprise Snowflake account, the easiest and cheapest way to complete this step is with a trial account. We will also be generating the data sources from the trial account’s objects.&lt;/p&gt;

&lt;p&gt;Complete the Snowflake Signup &lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://signup.snowflake.com/?referrer=snowsight" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;signup.snowflake.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mej67yk1b6x2zvoavu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mej67yk1b6x2zvoavu4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then select the edition and the cloud provider; I selected Microsoft Azure. You will be asked for the purpose of the trial account, and that’s it: your trial account is created!&lt;/p&gt;

&lt;p&gt;You will receive an email to activate your Snowflake account and the link to your instance. Click on that link and activate it. Upon signup, note your account identifier, username, password, default warehouse, and database details. These are essential for connecting from your scripts and dbt profiles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initialize Your dbt Project
&lt;/h2&gt;

&lt;p&gt;With the environment active, start your dbt project by running the dbt init command as shown below. Make sure the environment is activated as described in Step 5, “Activate the Virtual Environment”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt init snowflake_dbt_main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During setup, choose Snowflake, input your credentials, and specify the default schema, warehouse, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(dbt-main) PS D:\dbt\dbt-main&amp;gt; dbt init snowflake_dbt_main
00:11:48  Running with dbt=1.10.11
00:11:48  
Your new dbt project "snowflake_dbt_main" was created!

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!

00:11:48  Setting up your profile.
Which database would you like to use?
[1] snowflake

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1
account (https://&amp;lt;this_value&amp;gt;.snowflakecomputing.com): xxxxxxx
user (dev username): xxxxxxxx
[1] password
[2] keypair
[3] sso
Desired authentication type option (enter a number): 1
password (dev password):
role (dev role): ACCOUNTADMIN
warehouse (warehouse name): COMPUTE_WH
database (default database that dbt will build objects in): SNOWFLAKE_SAMPLE_DATA
schema (default schema that dbt will build objects in): TPCDS_SF100TCL
threads (1 or more) [1]:
00:15:27  Profile snowflake_dbt_main written to C:\Users\arun\.dbt\profiles.yml using target's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection.
(dbt-main) PS D:\dbt\dbt-main&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Provide a valid account and username. The account identifier has two segments, organization-account_name.&lt;br&gt;
You might get something like:&lt;br&gt;
ED25756-YUNGSKL&lt;br&gt;
where the organization is ED25756 and the account name is YUNGSKL.&lt;br&gt;
You can also find this in the Snowflake account URL or under Profile &amp;gt; Account &amp;gt; Account details.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The project is now initialized, and you will see the following directories,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswlkz4cyyn5pg5m4esv5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswlkz4cyyn5pg5m4esv5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Snowflake Data Sources&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To create the data sources from our newly created trial Snowflake account, the Python script below does exactly that: it connects to your Snowflake account, pulls the tables and views from a specific warehouse, database, and schema, and creates the dbt objects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flui35of3ye8hnzppymb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flui35of3ye8hnzppymb3.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Python Automation Script&lt;/strong&gt;&lt;br&gt;
The heart of this project — a Python script — connects to Snowflake, extracts metadata on tables, views, and columns, then auto-generates:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;YAML files describing data sources&lt;br&gt;
SQL files querying these sources with a standard wrapper&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This eliminates manual YAML/SQL creation.&lt;/p&gt;
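&lt;p&gt;As a minimal stdlib-only sketch of the idea (the function names and exact YAML layout here are illustrative assumptions, not the repo's code), the generation step boils down to rendering two strings per table:&lt;/p&gt;

```python
# Illustrative sketch: render a dbt sources YAML block and a staging SQL
# wrapper from metadata rows. build_source_yaml and stg_sql are
# hypothetical names, not the exact functions in the linked repo.
def build_source_yaml(database, schema, table, columns):
    """Render a dbt sources YAML block for one table.

    columns: list of (column_name, data_type) tuples.
    """
    lines = [
        "version: 2",
        "sources:",
        f"  - name: {schema.lower()}",
        f"    database: {database}",
        f"    schema: {schema}",
        "    tables:",
        f"      - name: {table}",
        "        columns:",
    ]
    for col_name, data_type in columns:
        lines.append(f"          - name: {col_name}")
        lines.append(f'            description: "{data_type}"')
    return "\n".join(lines) + "\n"

def stg_sql(schema, table):
    """Render the standard SQL wrapper model that selects from the source."""
    return f"select * from {{{{ source('{schema.lower()}', '{table}') }}}}\n"

yml = build_source_yaml("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF100", "CUSTOMER",
                        [("C_CUSTKEY", "number"), ("C_NAME", "varchar")])
```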

&lt;p&gt;The script code is &lt;a href="https://github.com/ArunkumarPanneerselvam/DBT_Source_Model_Gen_Automation_Snowflake_Python" rel="noopener noreferrer"&gt;here&lt;/a&gt; for your usage. Just do the modifications according to your Snowflake hierarchy.&lt;/p&gt;

&lt;p&gt;Below are sample code snippets showing the metadata query and file-generation functions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv8cbp1m0nfsnb2jkbyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv8cbp1m0nfsnb2jkbyx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpnpf49ffqy5tf6qmftf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpnpf49ffqy5tf6qmftf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
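&lt;p&gt;A hedged sketch of the kind of metadata query involved (the column aliases mirror the sample output shown later in this post; the repo's actual SQL may differ):&lt;/p&gt;

```python
# Illustrative metadata query builder: the column aliases (schema,
# object_name, object_type, data_type, mandatory) mirror the sample
# script output; the repo's actual SQL may differ.
def metadata_query(schemas):
    """Build an INFORMATION_SCHEMA query listing every column per object."""
    in_list = ", ".join(f"'{s}'" for s in schemas)
    return (
        'select c.table_schema as "schema", c.table_name as object_name, '
        "t.table_type as object_type, c.column_name, "
        "lower(c.data_type) as data_type, "
        "case when c.is_nullable = 'NO' then 'not null' else 'null' end as mandatory "
        "from information_schema.columns c "
        "join information_schema.tables t "
        "on c.table_schema = t.table_schema and c.table_name = t.table_name "
        f"where c.table_schema in ({in_list}) "
        "order by c.table_schema, c.table_name, c.ordinal_position"
    )

query = metadata_query(["TPCH_SF100", "TPCH_SF1000"])
```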

&lt;p&gt;To run this script, you will need the following information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ek95pnvbufyienx8m9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ek95pnvbufyienx8m9n.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Snowflake Account Code, User, and Password. This can be fetched from the earlier steps of the setup or you can get it from the Snowflake account.&lt;/li&gt;
&lt;li&gt;DBT Models path. You can get this by right-clicking in the Models folder in the IDE (Visual Studio Code).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can set up the environment variables before execution by adding the parameter names &amp;amp; values under "Environment variables", as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgk0uf37ttqd2mliaduh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgk0uf37ttqd2mliaduh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create the following environment variables,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DBT_MODELS_PATH&lt;br&gt;
SNOWFL_ACCT&lt;br&gt;
SNOWFL_USER&lt;br&gt;
SNOWFL_PWD&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;with the appropriate values, as the script needs them to execute.&lt;/p&gt;
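&lt;p&gt;A small, hedged sketch of how the script can read and validate these variables at startup (the load_settings helper is illustrative, not the repo's code):&lt;/p&gt;

```python
import os

# Connection settings the script reads from the environment; the variable
# names follow this post (SNOWFL_ACCT, SNOWFL_USER, SNOWFL_PWD,
# DBT_MODELS_PATH). load_settings is a hypothetical helper.
REQUIRED = ["SNOWFL_ACCT", "SNOWFL_USER", "SNOWFL_PWD", "DBT_MODELS_PATH"]

def load_settings(env=os.environ):
    """Return the required settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Failing fast here gives a clear error message instead of a connection failure deep inside the Snowflake connector.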

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakyqlfz6cb0ohyo8vhk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakyqlfz6cb0ohyo8vhk0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, you can configure the environment variables in your shell.&lt;/p&gt;

&lt;p&gt;If you are using PowerShell:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;$env:DBT_MODELS_PATH = "D:\dbt\dbt-main\snowflake_dbt_main\models"&lt;br&gt;
$env:SNOWFLAKE_USER = "your_user"&lt;br&gt;
$env:SNOWFLAKE_PASSWORD = "your_password"&lt;br&gt;
$env:SNOWFLAKE_ACCOUNT = "your_account"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If on Linux:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;export SNOWFLAKE_USER=your_user&lt;br&gt;
export SNOWFLAKE_PASSWORD=your_password&lt;br&gt;
export SNOWFLAKE_ACCOUNT=your_account&lt;br&gt;
export DBT_MODELS_PATH=/full/path/to/your/dbt/models&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Running the Automation&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you run it, since your account is a trial that already contains sample data, the script will automatically generate source definitions for all the tables and views in the SNOWFLAKE_SAMPLE_DATA database, for the schemas below:&lt;/p&gt;

&lt;p&gt;['TPCH_SF100', 'TPCH_SF1000']&lt;/p&gt;

&lt;p&gt;It will use the default warehouse COMPUTE_WH. Look for creation messages in the terminal and verify files under the models directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python .\data_generate.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should look similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS D:\dbt&amp;gt; python .\data_generate.py
Models directory: D:\dbt\dbt-main\snowflake_dbt_main\models
Connected to Snowflake successfully.
Sample metadata:
               schema                 object_name object_type   column_name data_type mandatory
0  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW       GRANTEE   varchar  not null
1  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW     ROLE_NAME   varchar  not null
2  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW    ROLE_OWNER   varchar  not null
3  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW  IS_GRANTABLE   varchar      null
4  INFORMATION_SCHEMA  APPLICATION_SPECIFICATIONS        VIEW         LABEL   varchar  not null
Creating directory: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Creating directory: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Directory creation completed with code: 0
Starting to create YAML files for schema TPCH_SF100...
Generating YAML files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Processing object: CUSTOMER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\CUSTOMER.yml
Processing object: LINEITEM with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\LINEITEM.yml
Processing object: NATION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\NATION.yml
Processing object: ORDERS with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\ORDERS.yml
Processing object: PART with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PART.yml
Processing object: PARTSUPP with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PARTSUPP.yml
Processing object: REGION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\REGION.yml
Processing object: SUPPLIER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\SUPPLIER.yml
Starting to create YAML files for schema TPCH_SF1000...
Generating YAML files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Processing object: CUSTOMER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\CUSTOMER.yml
Processing object: LINEITEM with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\LINEITEM.yml
Processing object: NATION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\NATION.yml
Processing object: ORDERS with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\ORDERS.yml
Processing object: PART with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PART.yml
Processing object: PARTSUPP with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PARTSUPP.yml
Processing object: REGION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\REGION.yml
Processing object: SUPPLIER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\SUPPLIER.yml
Starting to create SQL models for schema TPCH_SF100...
Generating SQL files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Processing SQL model: CUSTOMER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\CUSTOMER.sql
Processing SQL model: LINEITEM (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\LINEITEM.sql
Processing SQL model: NATION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\NATION.sql
Processing SQL model: ORDERS (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\ORDERS.sql
Processing SQL model: PART (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PART.sql
Processing SQL model: PARTSUPP (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PARTSUPP.sql
Processing SQL model: REGION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\REGION.sql
Processing SQL model: SUPPLIER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\SUPPLIER.sql
Starting to create SQL models for schema TPCH_SF1000...
Generating SQL files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Processing SQL model: CUSTOMER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\CUSTOMER.sql
Processing SQL model: LINEITEM (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\LINEITEM.sql
Processing SQL model: NATION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\NATION.sql
Processing SQL model: ORDERS (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\ORDERS.sql
Processing SQL model: PART (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PART.sql
Processing SQL model: PARTSUPP (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PARTSUPP.sql
Processing SQL model: REGION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\REGION.sql
Processing SQL model: SUPPLIER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\SUPPLIER.sql
PS D:\dbt&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
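
Judging from the log above, each generated &lt;code&gt;.yml&lt;/code&gt; describes one source table plus its columns. A minimal sketch of what such a generator could look like, following dbt's documented sources schema (the exact layout &lt;code&gt;data_generate.py&lt;/code&gt; emits may differ; &lt;code&gt;source_yaml&lt;/code&gt; is a hypothetical helper):

```python
def source_yaml(database, schema, table, columns):
    """Render a minimal dbt sources YAML for one table.

    `columns` is a list of (name, data_type, mandatory) tuples, mirroring
    the metadata columns shown in the script output above.
    """
    lines = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {schema}",
        f"    database: {database}",
        f"    schema: {schema}",
        "    tables:",
        f"      - name: {table}",
        "        columns:",
    ]
    for name, data_type, mandatory in columns:
        lines.append(f"          - name: {name}")
        lines.append(f"            data_type: {data_type}")
        # Mandatory columns get a not_null test, so dbt test covers them.
        if mandatory == "not null":
            lines.append("            tests:")
            lines.append("              - not_null")
    return "\n".join(lines) + "\n"

# Illustrative call matching one file from the log (column details invented):
yml = source_yaml(
    "SNOWFLAKE_SAMPLE_DATA", "TPCH_SF100", "CUSTOMER",
    [("C_CUSTKEY", "number", "not null"), ("C_NAME", "varchar", "null")],
)
```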



&lt;p&gt;You should now be able to see the generated models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiezy1nevelqdb6gtras.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiezy1nevelqdb6gtras.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflemltfimmjz73w2h8gc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflemltfimmjz73w2h8gc.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
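
The &lt;code&gt;.sql&lt;/code&gt; files from the second half of the log can be pictured as simple pass-through models over those sources. A hedged sketch (&lt;code&gt;model_sql&lt;/code&gt; is hypothetical, and the SELECT the real script emits may differ, e.g. with explicit column lists):

```python
def model_sql(schema, table):
    """Render a pass-through dbt model that selects from a declared source."""
    # Quadruple braces produce the literal {{ ... }} Jinja delimiters.
    return (
        "select *\n"
        f"from {{{{ source('{schema}', '{table}') }}}}\n"
    )

print(model_sql("TPCH_SF100", "CUSTOMER"))
```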

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Start Transforming!&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With your sources in place, run the usual dbt commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt run
dbt test
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Enjoy a frictionless start to your Snowflake + dbt transformations.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Setting up a dbt environment connected to Snowflake no longer needs to feel tedious or intimidating. With a bit of scripting magic, you can automate the heavy lifting and focus on what truly matters — crafting data transformations and insights.&lt;/p&gt;

&lt;p&gt;Experiment, extend, and share your improvements!&lt;/p&gt;

&lt;p&gt;🚀 Enjoyed this guide? Give it a clap, share your feedback in the comments, or ask questions below! &lt;/p&gt;

&lt;p&gt;💬 Looking forward to your insights and stories — let’s grow together in the data engineering world!&lt;/p&gt;

</description>
      <category>snowflake</category>
      <category>dbt</category>
      <category>python</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
