<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arunkumar Panneerselvam</title>
    <description>The latest articles on Forem by Arunkumar Panneerselvam (@arunkumar_panneerselvam_2).</description>
    <link>https://forem.com/arunkumar_panneerselvam_2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3490868%2F737bd6fa-1ebc-46e3-aee1-fc7c739a9b11.png</url>
      <title>Forem: Arunkumar Panneerselvam</title>
      <link>https://forem.com/arunkumar_panneerselvam_2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/arunkumar_panneerselvam_2"/>
    <language>en</language>
    <item>
      <title>5 Surprising dbt Truths That Will Change How You Work</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 18 Nov 2025 04:29:30 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/5-surprising-dbt-truths-that-will-change-how-you-work-6p0</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/5-surprising-dbt-truths-that-will-change-how-you-work-6p0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;When you first start with dbt, the learning curve feels straightforward. You master the essentials: dbt run executes your models, and the ref() function magically connects them into a DAG. It feels like you've grasped the core of the tool. But beneath this surface lies a set of powerful, non-obvious features and behaviors that can fundamentally change how you build, test, and maintain your data pipelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article pulls back the curtain on a handful of these surprising and impactful truths about dbt. These aren't just niche tricks; they are fundamental concepts that, once understood, unlock a more reliable, efficient, and scalable way of working.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;dbt build Is More Than Just a Shortcut: It's an Atomic Guardian of Your DAG&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many dbt practitioners start their journey by running dbt run to build models, followed by a separate dbt test to validate them. It seems logical. However, dbt build isn't just a convenient command that bundles these two steps; it's a more powerful, integrated command that operates with a crucial, surprising intelligence.&lt;/p&gt;

&lt;p&gt;The dbt build command executes resources—models, tests, snapshots, and seeds—in their correct DAG order. But its most impactful feature is how it handles test failures. It introduces atomicity into your workflow, ensuring that a failure in an upstream resource prevents downstream resources from ever running.&lt;/p&gt;

&lt;p&gt;Tests on upstream resources run before anything downstream of them, and a test failure causes those downstream resources to be skipped entirely.&lt;br&gt;
This behavior is a game-changer for data pipeline reliability, especially in CI/CD environments. If a quality test on an upstream model fails, dbt build avoids wasting time and compute on costly downstream models that would inevitably be built on corrupted or invalid data. It's an intelligent guardrail that actively protects your data ecosystem, ensuring that corrupted data never pollutes downstream models and saving you compute costs and, more importantly, trust.&lt;/p&gt;
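&lt;p&gt;As a sketch of the workflow difference (the model name fct_orders is a hypothetical example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Two separate passes: tests only run after every model has been built
dbt run
dbt test

# One pass: each test runs right after its resource,
# and a failure skips everything downstream of it
dbt build --select +fct_orders
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;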

&lt;ol start="2"&gt;
&lt;li&gt;Your dbt compile Command Secretly Talks to Your Warehouse&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s a common and intuitive assumption: dbt compile is a purely local operation. You expect it to simply take your Jinja-infused SQL files and render them into the pure, executable SQL that will eventually be sent to the warehouse. It feels like a dry run that shouldn't need any external connections. Surprisingly, this is incorrect. The dbt compile command requires an active connection to your data platform.&lt;/p&gt;

&lt;p&gt;The reason is that compile does more than just render Jinja. It needs to run "introspective queries" against the warehouse to gather metadata. This is essential for tasks like populating dbt’s relation cache (so it knows what tables already exist) and resolving certain powerful macros, such as dbt_utils.get_column_values, which query the database to function.&lt;br&gt;
Understanding this clarifies why a compile might fail due to connection issues and distinguishes it from dbt parse, which is a local operation that can be run without a warehouse connection to validate your project's structure and YAML.&lt;/p&gt;
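&lt;p&gt;A quick way to feel the difference is to run both commands while the warehouse is unreachable (for example, with invalid credentials in profiles.yml):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Local only: validates project structure, YAML, and Jinja syntax
dbt parse

# Opens a warehouse connection for introspective queries,
# so it fails when the connection is unavailable
dbt compile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;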

&lt;ol start="3"&gt;
&lt;li&gt;dbt Snapshots Aren't Backups—They're Time Machines for Your Data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The word "snapshot" often evokes the idea of a database backup—a complete copy of a table at a specific point in time. This leads many to misunderstand the true and far more powerful purpose of dbt's snapshot feature.&lt;/p&gt;

&lt;p&gt;dbt snapshots are not backups. They are dbt's native mechanism for implementing Type-2 Slowly Changing Dimensions (SCDs) over mutable source tables. Their purpose is to record how a specific row in a source table changes over time, especially when that source system overwrites data instead of preserving history.&lt;/p&gt;

&lt;p&gt;Snapshots work by monitoring a source table and creating a new record in a snapshot table every time a row changes. To manage this history, dbt adds special metadata columns, most notably dbt_valid_from and dbt_valid_to, which record the exact timestamp range during which a version of a row was valid. This is profoundly impactful for any analyst who needs to "look back in time" and understand, for example, what a customer's address was a year ago, even if the source database only stores the current address.&lt;/p&gt;
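&lt;p&gt;For illustration, a minimal timestamp-strategy snapshot might look like this (the source, key, and column names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each dbt snapshot run compares the current source rows against the snapshot table and closes out (sets dbt_valid_to on) any version of a row that has changed.&lt;/p&gt;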

&lt;ol start="4"&gt;
&lt;li&gt;Custom Schemas Have a Hidden Prefix (For a Good Reason)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a scenario that trips up nearly every new dbt user. You want to organize your project, so you add schema: marketing to a model's configuration. You run dbt, check your warehouse, and are surprised to find the model not in a schema named marketing, but in one named something like alice_dev_marketing or analytics_prod_marketing.&lt;/p&gt;

&lt;p&gt;This is dbt's default behavior, and it's by design. By default, dbt generates a schema name by combining the target schema from your profiles.yml with the custom schema you configured, creating a final name like target_schema_custom_schema. This is why a model with schema: marketing built by a developer whose target schema is alice_dev lands in a schema named alice_dev_marketing, not marketing.&lt;/p&gt;

&lt;p&gt;The critical reasoning behind this is to enable safe, collaborative development. Each developer works in their own target schema (e.g., alice_dev). This prefixing behavior ensures that when Alice builds the marketing models, they land in her isolated alice_dev_marketing schema, preventing her from overwriting the work of a colleague or, more critically, the production tables. While this behavior can be fully customized for production environments by overriding the generate_schema_name macro, the default is a powerful safeguard for team-based workflows.&lt;/p&gt;
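&lt;p&gt;As a sketch, a common override keeps the safe prefixed default in development but uses the bare custom schema in production. It would live in macros/generate_schema_name.sql; the target name 'prod' is an assumption about how your profiles.yml names its production target:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- elif target.name == 'prod' -%}
        {# production: use the custom schema as-is, e.g. marketing #}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {# development: keep dbt's safe default, e.g. alice_dev_marketing #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;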

&lt;ol start="5"&gt;
&lt;li&gt;The ref() Function Is a Swiss Army Knife for Dependencies&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every dbt user learns the ref() function on day one. It's the function that builds the DAG. But its capabilities extend far beyond this basic, single-argument use. Two advanced patterns in particular unlock more robust and scalable project architectures.&lt;/p&gt;

&lt;p&gt;First, the two-argument ref() is your key to dbt Mesh. When you need to reference a model from another dbt project or an installed package, you can pass two arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{ ref('project_or_package', 'model_name') }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This syntax creates an explicit, unambiguous dependency on a public model maintained by another team or package, which is the foundational pattern for building a scalable, multi-project dbt Mesh architecture.&lt;/p&gt;

&lt;p&gt;Second, you can force dependencies that dbt can't see. Sometimes, a ref() call is placed inside a conditional Jinja block, like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% if execute %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which is only evaluated at run time. During dbt's initial parsing phase, the execute variable is false. This means the parser never steps inside the 'if execute' block, so it is completely blind to the ref() call within it and fails to build the dependency graph correctly. &lt;/p&gt;

&lt;p&gt;To solve this, you can add a simple SQL comment outside the block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- depends_on: {{ ref('model_name') }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;dbt's parser is smart enough to evaluate Jinja inside SQL comments, allowing it to detect the dependency every time while keeping the compiled SQL valid.&lt;/p&gt;
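&lt;p&gt;Putting the pattern together, a model file using this workaround might be sketched as follows (the model name and query are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- depends_on: {{ ref('upstream_model') }}

{% if execute %}
    {# The parser never enters this block, so without the
       comment above it would miss this ref() entirely #}
    {% set statuses = run_query("select distinct status from " ~ ref('upstream_model')) %}
{% endif %}

select 1 as id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;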

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;From revealing that dbt build is an atomic guardian of your DAG, not just a shortcut, to uncovering the hidden network calls of dbt compile, it’s clear that dbt’s most powerful features lie just beneath the surface. These five "truths" are just a starting point. By moving beyond the initial basics, you can build data pipelines that are not only functional but also more reliable, scalable, and easier to maintain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What hidden dbt feature has been a game-changer in your own data workflow?&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Azure Data Factory: Building a Simple Pipeline - Step-by-Step Guide</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 23 Sep 2025 05:31:44 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-building-a-simple-pipeline-step-by-step-guide-ilb</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-building-a-simple-pipeline-step-by-step-guide-ilb</guid>
      <description>&lt;p&gt;Azure Data Factory (ADF) makes it easy to move, transform, and automate data workflows in the cloud. In this post, I will walk through creating a simple ADF pipeline from setting up core resources and GitHub integration to copying data between storage containers and monitoring the entire process.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Creating a Data Factory Instance
&lt;/h3&gt;

&lt;p&gt;To get started, I created a new data factory resource in Azure named coredata-datafactory1.&lt;/p&gt;

&lt;h4&gt;
  
  
  Navigation Steps:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Log in to the Azure Portal.&lt;/li&gt;
&lt;li&gt;Select “Create a resource” &amp;gt; “Analytics” &amp;gt; “Data Factory.”&lt;/li&gt;
&lt;li&gt;Fill in the resource details: subscription, resource group, unique factory name, region, and version (V2), leaving the other tabs at their defaults.&lt;/li&gt;
&lt;li&gt;Refer to Step 2 for Git configuration (this can also be done later).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkc5mwnfn4hgcpaq404j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkc5mwnfn4hgcpaq404j.png" alt="Basics tab" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proceed to the “Review + create” tab and confirm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyj7hoxcssbbcat6d0qn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyj7hoxcssbbcat6d0qn.png" alt="review and create tab" width="800" height="771"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After deployment, hit “Go to resource.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bryv8hjj6lmclxg9zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bryv8hjj6lmclxg9zw.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;
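&lt;p&gt;If you prefer the command line, the same resource can be created with the Azure CLI. This is a sketch: the resource group name and region below are example values, and the datafactory extension must be installed first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One-time: add the Data Factory CLI extension
az extension add --name datafactory

# Create the factory (V2) in an existing resource group
az datafactory create \
  --resource-group coredata-rg \
  --factory-name coredata-datafactory1 \
  --location eastus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;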




&lt;h3&gt;
  
  
  2. Setting Up GitHub Integration
&lt;/h3&gt;

&lt;p&gt;Source control is key for managing changes and collaborating with ease. I set up my GitHub account and created a private repository named coredata-azuredatafactory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F806kkecw1hweukvqqtvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F806kkecw1hweukvqqtvo.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authorize ADF to access the repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46a52jkbrzpqr8wvv7ma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46a52jkbrzpqr8wvv7ma.png" alt=" " width="800" height="1167"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Exploring Azure Data Factory Studio
&lt;/h3&gt;

&lt;p&gt;Next, I launched the Data Factory Studio. The homepage provides a simple navigation pane, making it easy to design and monitor your data flows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe87woyht5b33oj4nr2r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe87woyht5b33oj4nr2r0.png" alt=" " width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Creating a Storage Account
&lt;/h3&gt;

&lt;p&gt;For this demo, I created a storage account named coredatadatastorage1. The account serves as both the data source and destination.&lt;/p&gt;

&lt;h4&gt;
  
  
  Navigation Steps:
&lt;/h4&gt;

&lt;p&gt;In the Azure Portal, click “Create a resource” &amp;gt; “Storage” &amp;gt; “Storage account.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbreypyksbog1zxziclfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbreypyksbog1zxziclfd.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specify the required configuration (resource group, region, account name, redundancy).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z0cnro3bkywlj4n0mo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z0cnro3bkywlj4n0mo3.png" alt=" " width="800" height="851"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the “Review + create” screen, review the settings and click “Create.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8psbdufj1cn14pi06i6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8psbdufj1cn14pi06i6w.png" alt=" " width="800" height="819"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once deployed, go to the resource overview page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu2jl57ft1xo8lgptctb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu2jl57ft1xo8lgptctb.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
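&lt;p&gt;The equivalent Azure CLI command, for reference (the resource group name and region are example values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a locally redundant (LRS) general-purpose storage account
az storage account create \
  --name coredatadatastorage1 \
  --resource-group coredata-rg \
  --location eastus \
  --sku Standard_LRS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;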




&lt;h3&gt;
  
  
  5. Designing the Data Pipeline
&lt;/h3&gt;

&lt;p&gt;I created a simple pipeline called data_copy_pipeline to copy data from an input directory to an output directory in Blob Storage.&lt;/p&gt;

&lt;p&gt;Open Data Factory Studio and click on “Author.”&lt;/p&gt;

&lt;p&gt;Select “Pipeline” and click “New pipeline.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk5gny0n9glteuoiexk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk5gny0n9glteuoiexk.png" alt=" " width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Name your pipeline (data_copy_pipeline).&lt;/p&gt;

&lt;p&gt;Go to "Activities" and under "Move and transform," select the "Copy data"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfey72dxm7sq3ilgqym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgqfey72dxm7sq3ilgqym.png" alt=" " width="588" height="275"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Creating Datasets
&lt;/h3&gt;

&lt;p&gt;For this pipeline, I needed datasets to represent the source (input file) and sink (output file).&lt;/p&gt;

&lt;h4&gt;
  
  
  Source Dataset:
&lt;/h4&gt;

&lt;p&gt;Choose Azure Blob Storage as the source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7vj674cg1mey9rz5quf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7vj674cg1mey9rz5quf.png" alt=" " width="800" height="1096"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since my file is a plain-text (.txt) file, I selected the Binary format:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98rlgh9f80kxv2z4fgya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98rlgh9f80kxv2z4fgya.png" alt=" " width="800" height="828"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Sink Dataset:
&lt;/h4&gt;

&lt;p&gt;Set Azure Blob Storage as the destination (same as source).&lt;/p&gt;

&lt;p&gt;Name the output file (e.g., test_data_out.log).&lt;/p&gt;




&lt;h3&gt;
  
  
  7. Configuring Linked Services
&lt;/h3&gt;

&lt;p&gt;Each dataset requires a linked service that defines how ADF connects to the underlying storage.&lt;/p&gt;

&lt;p&gt;Create a linked service for the source by providing storage account credentials and selecting the container or directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l18agncwct473n638zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l18agncwct473n638zf.png" alt=" " width="800" height="1091"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the test input file (e.g., test_data.txt).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn0k34jw0n23opzs8drg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn0k34jw0n23opzs8drg.png" alt=" " width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2wwz25snoaqaisspiwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2wwz25snoaqaisspiwy.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat for the sink (output container and path).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlv1qydvxpvsms1pdkvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlv1qydvxpvsms1pdkvu.png" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test the connection to ensure setup is correct.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrssd3qqspsg0keo4mo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hrssd3qqspsg0keo4mo.png" alt=" " width="352" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3vadylxf45w0ckdea01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3vadylxf45w0ckdea01.png" alt=" " width="552" height="142"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  8. Running &amp;amp; Monitoring the Pipeline
&lt;/h3&gt;

&lt;p&gt;With everything set, I manually triggered the pipeline using the “Trigger now” button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9o2gqecjfnkpmg64xdm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9o2gqecjfnkpmg64xdm.png" alt=" " width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After execution, navigate to “Monitor” and check the status under “Pipeline runs.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjy9k130mg0fufctum80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjy9k130mg0fufctum80.png" alt=" " width="564" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50kka578u5wszk8avn0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50kka578u5wszk8avn0.png" alt=" " width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Review the activity details under “Activity runs”:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kmv2ps8a39vnncs87my.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kmv2ps8a39vnncs87my.png" alt=" " width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inspect the details tab for run-specific metadata and logs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl23czm2oqujloxdiy41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl23czm2oqujloxdiy41.png" alt=" " width="800" height="625"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  9. Verifying the Output
&lt;/h3&gt;

&lt;p&gt;Finally, I verified that the output file was written to the specified sink directory in Blob Storage (test_data_out.log).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h0xln90r6fe6nfov7mr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8h0xln90r6fe6nfov7mr.png" alt=" " width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;By following these steps, you can quickly spin up your own Azure Data Factory, connect it with source control, design simple data pipelines, and move data between Azure storage resources. The visual tools and straightforward setup make it incredibly friendly—even for those just getting started with cloud data engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Azure Data Factory empowers users to orchestrate cloud-scale data movement and transformation with minimal setup or code. Its integration with GitHub gives you confidence in version control, and the monitoring features keep you in the loop at every stage. Next, you can explore scheduling, data transformations, and integrating with other services—but even a basic pipeline gives you a strong foundation for bigger data projects.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for reading! If you found this post helpful or inspiring, please leave a comment below with your thoughts or questions. I would love to hear your feedback and experiences. Feel free to share this article with friends or colleagues who might benefit too.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep transforming and exploring new data possibilities with Azure Data Factory!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Azure Data Factory (ADF) - A Beginner's Guide to Cloud Data Integration</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 23 Sep 2025 01:00:44 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-adf-a-beginners-guide-to-cloud-data-integration-53jd</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/azure-data-factory-adf-a-beginners-guide-to-cloud-data-integration-53jd</guid>
      <description>&lt;p&gt;Azure Data Factory (ADF) is a powerful, fully managed cloud service from Microsoft designed to simplify the process of moving, transforming, and orchestrating data at scale. Whether you are a developer, data engineer, or analyst, ADF provides a versatile platform to build data-driven workflows that automate data intake and processing across many sources and destinations. This blog introduces the core concepts, components, and benefits of Azure Data Factory, helping beginners understand how to get started easily.&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  What is Azure Data Factory?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Azure Data Factory is a serverless data integration platform built for modern cloud and hybrid data scenarios. It helps create automated data pipelines that extract data from diverse sources, perform transformations, and load the data into sinks (like data warehouses or lakes). You can think of ADF as a data orchestrator that ensures the right data moves efficiently and reliably between systems to enable analytics and reporting.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It supports a wide range of data sources from on-premises databases to cloud storage, SaaS applications, and big data stores. ADF also integrates well with other Azure services, making it ideal for enterprises adopting cloud data modernization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Concepts and Components
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Pipeline
&lt;/h4&gt;

&lt;p&gt;A pipeline is a logical grouping of activities that perform a unit of work. For instance, a pipeline might copy data from an Azure Blob storage location and then transform it using a compute service like Azure Databricks. Pipelines enable you to manage the workflow as a single, coordinated job, running steps either sequentially or in parallel.&lt;/p&gt;

&lt;h4&gt;
  
  
  Activity
&lt;/h4&gt;

&lt;p&gt;An activity represents a single task within a pipeline. There are many types, including but not limited to copying data, running a stored procedure, executing a data flow (for transformations), or invoking a REST endpoint. Activities are the building blocks to implement your data processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filtk8pq8irilr5jn33w2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filtk8pq8irilr5jn33w2.png" alt=" " width="206" height="759"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Dataset
&lt;/h4&gt;

&lt;p&gt;Datasets represent data structures within your data stores, like tables or files. They act as inputs or outputs for activities, specifying what data is being processed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7spuknmq4665mw4i3j40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7spuknmq4665mw4i3j40.png" alt=" " width="626" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Linked Service
&lt;/h4&gt;

&lt;p&gt;A linked service is a connection configuration (typically a connection string plus credentials) that defines the connection to a data source, destination, or compute resource. For instance, a linked service might connect ADF to an Azure SQL database or an Amazon S3 bucket. &lt;a href="https://docs.microsoft.com/azure/data-factory/concepts-linked-services" rel="noopener noreferrer"&gt;Learn More&lt;/a&gt;&lt;/p&gt;
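&lt;p&gt;As a hedged illustration of what a linked service definition looks like, here is the JSON shape ADF stores, built as a Python dict; the service name and connection-string placeholders are assumptions, not values from a real account:&lt;/p&gt;

```python
# Illustrative sketch: an Azure Blob Storage linked service expressed as
# the JSON document ADF stores. The name and the connection-string
# placeholders are hypothetical, not from a real account.
linked_service = {
    "name": "MyBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<storage_account>;AccountKey=<key>"
            )
        },
    },
}
```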

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhlu7oz1gdul3j7xswdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhlu7oz1gdul3j7xswdl.png" alt=" " width="622" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration Runtime
&lt;/h4&gt;

&lt;p&gt;This is the compute infrastructure that performs data movement and transformation. You can use Azure-hosted runtimes or self-hosted runtimes to securely access on-premises data. &lt;a href="https://docs.microsoft.com/azure/data-factory/concepts-integration-runtime" rel="noopener noreferrer"&gt;Learn More&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgttqc86k6trs3ndv5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgttqc86k6trs3ndv5s.png" alt=" " width="626" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Flows
&lt;/h4&gt;

&lt;p&gt;Data Flows are visual transformation tools that let you build transformation logic with a drag-and-drop experience. They execute on managed Spark clusters, handling large-scale data processing without writing Spark code directly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Triggers
&lt;/h4&gt;

&lt;p&gt;Triggers define when pipelines run—on a schedule, in response to an event, or manually. You can automate your pipelines to execute daily, hourly, or based on file arrivals.&lt;/p&gt;
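&lt;p&gt;For example, a daily schedule trigger can be sketched as the JSON ADF stores (a hedged example; the trigger name, pipeline name, and start time are placeholders):&lt;/p&gt;

```python
# Illustrative sketch: a daily schedule trigger expressed as the JSON
# ADF stores. "DailyTrigger", "MyPipeline", and the start time are
# hypothetical placeholders.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # run once per day
                "interval": 1,
                "startTime": "2025-01-01T00:00:00Z",
            }
        },
        "pipelines": [
            {"pipelineReference": {"referenceName": "MyPipeline",
                                   "type": "PipelineReference"}}
        ],
    },
}
```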




&lt;h3&gt;
  
  
  Basic Pipeline Example: Copy a Test Data File (.txt) Within Azure Blob Storage (Source &amp;amp; Sink)
&lt;/h3&gt;

&lt;p&gt;Here is an outline of the high-level steps; a detailed walkthrough can follow in a separate post.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Create Linked Services: Define connections for your source (Azure Blob Storage) and sink (Azure Blob Storage).&lt;br&gt;
Create Datasets: Define datasets that point to your source files and target.&lt;br&gt;
Build a Pipeline: Add a Copy Activity that moves data from the source dataset to the sink dataset.&lt;br&gt;
Add a Trigger: Run the pipeline once manually, or schedule it daily to keep your target updated with new data files.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This simple setup can be extended to include data transformations, error handling, and notifications, showcasing ADF’s automation capabilities.&lt;/p&gt;
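&lt;p&gt;The steps above can be sketched as the pipeline JSON ADF would store for the Copy Activity (a hedged example; the pipeline, activity, and dataset names are illustrative placeholders, not names ADF generates):&lt;/p&gt;

```python
# Illustrative sketch: a Copy pipeline expressed as the JSON document
# ADF stores, built as a Python dict. "CopyTextFilePipeline",
# "CopyTextFile", and the dataset names are hypothetical placeholders.
pipeline = {
    "name": "CopyTextFilePipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyTextFile",
                "type": "Copy",
                # Datasets created in the previous step, referenced by name.
                "inputs": [{"referenceName": "SourceTxtDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkTxtDataset",
                             "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}
```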

&lt;h3&gt;
  
  
  Why Use Azure Data Factory?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Scalable and Managed
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;No need to manage servers; scale processing up or down as needed.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Wide Connectivity
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Supports hundreds of connectors for various on-premises and cloud data platforms.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Code-Free UI
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Build complex ETL/ELT workflows with minimal coding.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Advanced Data Flows
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Use Spark-based data flows for heavy transformations.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration with Azure Ecosystem
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Easily connect with Azure Synapse, Databricks, Logic Apps, and more.&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Monitoring and Alerting
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;Built-in monitoring dashboards provide detailed pipeline run information and failure alerts.&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Azure Data Factory is a versatile tool that empowers data engineers and organizations to automate data workflows across hybrid and cloud environments. Its well-structured components (pipelines, datasets, linked services, activities, and integration runtimes) work together to create seamless data orchestration solutions. Whether ingesting, transforming, or transferring data, ADF offers a scalable, low-maintenance platform with powerful automation and monitoring features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding and leveraging Azure Data Factory can dramatically improve data workflows’ efficiency and reliability. Its rich feature set allows beginners to quickly get started and experts to build complex solutions at scale. By mastering the basics of ADF, you can contribute significantly to any data-driven organization’s success, enabling faster insights and smarter business decisions.&lt;/p&gt;

&lt;p&gt;This overview aims to provide a clear and approachable introduction that invites further exploration and hands-on learning with Azure Data Factory.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for reading! If you found this post helpful or inspiring, please leave a comment below with your thoughts or questions. I would love to hear your feedback and experiences. Feel free to share this article with friends or colleagues who might benefit too.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep transforming and exploring new data possibilities with Azure Data Factory!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>dataengineering</category>
      <category>cloud</category>
      <category>beginners</category>
      <category>azure</category>
    </item>
    <item>
      <title>Automating Your Local DBT &amp; Snowflake Playground with Python</title>
      <dc:creator>Arunkumar Panneerselvam</dc:creator>
      <pubDate>Tue, 16 Sep 2025 05:26:54 +0000</pubDate>
      <link>https://forem.com/arunkumar_panneerselvam_2/automating-your-local-dbt-snowflake-playground-with-python-7gl</link>
      <guid>https://forem.com/arunkumar_panneerselvam_2/automating-your-local-dbt-snowflake-playground-with-python-7gl</guid>
      <description>&lt;p&gt;&lt;em&gt;&amp;gt; Harnessing Python to reverse-engineer Snowflake metadata as dbt sources an easy, efficient playground setup&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Everyone learns differently. Whether it's diving into textbooks, absorbing visual content, or hands-on tinkering, the path to mastery varies widely. Personally, I find the best way is learning by doing: building, automating, debugging, and iterating until something clicks.&lt;/p&gt;

&lt;p&gt;Recently, I explored how to automatically generate a full dbt project structure from a Snowflake trial account, taking away the toil of manual YAML and SQL file creation. If you’ve dabbled in dbt or Snowflake, you know setting up sources can be time-consuming, especially across many schemas and tables.&lt;/p&gt;

&lt;p&gt;This guide, however, is not a beginner’s tutorial on Snowflake or dbt basics. I assume you’ve got some familiarity already. Instead, it focuses on streamlining setup using Python scripts and Visual Studio Code (or any coding environment you prefer), enabling rapid experimentation and transformation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Automate Source Generation?
&lt;/h4&gt;

&lt;p&gt;Manual source definition is repetitive and error-prone. When working with complex databases, initial setup becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Using a Python script to introspect the Snowflake metadata and generate compliant dbt source files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saves hours of setup time&lt;/li&gt;
&lt;li&gt;Avoids human mistakes in YAML/SQL&lt;/li&gt;
&lt;li&gt;Scales effortlessly as schemas or tables grow&lt;/li&gt;
&lt;li&gt;Sets a foundation for continuous and reproducible transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To set up your development environment, you’ll need a few tools ready:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Python 3.13+: Download &amp;amp; install&lt;br&gt;
Pip: The Python package manager — install if not available&lt;br&gt;
Virtualenv: To isolate your dependencies&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 1 — Install Python&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download Python &lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.python.org/downloads/?source=post_page-----bc81f31c103c---------------------------------------" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Fopengraph-icon-200x200.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.python.org/downloads/?source=post_page-----bc81f31c103c---------------------------------------" rel="noopener noreferrer" class="c-link"&gt;
            Download Python | Python.org
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            The official home of the Python Programming Language
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Ffavicon.ico"&gt;
          python.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Download and install Python first. I recommend installing the latest version of Python 3.&lt;/p&gt;

&lt;p&gt;You can verify that Python installed correctly by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;you should receive something like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python 3.12.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 2 — Install pip&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download pip:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then install it:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python get-pip.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You should see similar to this,&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS C:\Users \arun&amp;gt; curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py                                                                                                                                                   PS C:\Users \arun&amp;gt; python get-pip.py
Collecting pip
  Using cached pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-25.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.2
    Uninstalling pip-25.2:
      Successfully uninstalled pip-25.2
Successfully installed pip-25.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;you should get something like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS C:\Users \arun&amp;gt; python -m pip --version
pip 25.2 from C:\Users \arun\AppData\Local\Programs\Python\Python313\Lib\site-packages\pip (python 3.13)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If you get errors when running pip, the PATH environment variable is probably not set up correctly. Add Python’s Scripts directory to your PATH and reopen the terminal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3 — Install virtualenv&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Virtualenv is a tool in Python that allows you to create isolated environments where you can install packages and dependencies without affecting the global Python installation. It helps manage project-specific dependencies and avoids conflicts between different projects.&lt;/p&gt;

&lt;p&gt;You can run the whole project without virtualenv, but I prefer isolating projects in case I need different Python or library versions.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install --user virtualenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Installation output should be similar to this,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zp60o7zb56b4yuxvo6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zp60o7zb56b4yuxvo6b.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 4 — Create the Virtual Environment&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now we are ready to create our virtual environment: first we create it, then we activate it. Go to the directory where you want to keep this project and run:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv dbt-env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will install the environment under that directory as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ukq4of5kxg7avce6h9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ukq4of5kxg7avce6h9t.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://virtualenv.pypa.io/en/latest/user_guide.html" rel="noopener noreferrer"&gt;user guide &lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 5 — Activate the Virtual Environment&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the virtual environment is created, you need to activate it. In PowerShell, from the directory where you created the environment, activate it like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.\Scripts\activate.ps1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It will look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa89jp254hzzytysvoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa89jp254hzzytysvoo.png" alt="Virtualenv shown in green at the left"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you want to operate within this project, ensure that you have the dbt-env activated, as shown above. Additionally, all subsequent installations and operations must occur within this directory or its subdirectories.&lt;/p&gt;
&lt;h2&gt;
  
  
  Installing DBT
&lt;/h2&gt;

&lt;p&gt;Installing dbt with pip couldn't be simpler. Follow the instructions in the dbt documentation; I summarize them below for Snowflake specifically.&lt;/p&gt;

&lt;p&gt;Install dbt-core:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install dbt-core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then install the Snowflake-specific libraries:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install dbt-snowflake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can see the libraries and the versions under your environment as shown below:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It should show your libraries as below (note: I may have more libraries than you, as I have made additional installations).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrw87kjm3l2b2uqfp2fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrw87kjm3l2b2uqfp2fk.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Snowflake instance
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Creating a Trial Snowflake Account&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you don’t have access to an enterprise Snowflake account, the easiest and cheapest way to complete this step is with a trial account. We will also be generating the data sources from the trial account’s objects.&lt;/p&gt;

&lt;p&gt;Complete the Snowflake Signup &lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://signup.snowflake.com/?referrer=snowsight" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;signup.snowflake.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mej67yk1b6x2zvoavu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mej67yk1b6x2zvoavu4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then select the edition and the cloud provider; I selected Microsoft Azure. You will be asked for the purpose of the trial account, and that’s it: your trial account is created!&lt;/p&gt;

&lt;p&gt;You will receive an email to activate your Snowflake account and the link to your instance. Click on that link and activate it. Upon signup, note your account identifier, username, password, default warehouse, and database details. These are essential for connecting from your scripts and dbt profiles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initialize Your dbt Project
&lt;/h2&gt;

&lt;p&gt;With the environment active, start your dbt project by running the dbt init command as shown below. Make sure the environment is activated as described in Step 5, “Activate the Virtual Environment”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt init snowflake_dbt_main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During setup, choose Snowflake, input your credentials, and specify the default schema, warehouse, etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(dbt-main) PS D:\dbt\dbt-main&amp;gt; dbt init snowflake_dbt_main
00:11:48  Running with dbt=1.10.11
00:11:48  
Your new dbt project "snowflake_dbt_main" was created!

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!

00:11:48  Setting up your profile.
Which database would you like to use?
[1] snowflake

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1
account (https://&amp;lt;this_value&amp;gt;.snowflakecomputing.com): xxxxxxx
user (dev username): xxxxxxxx
[1] password
[2] keypair
[3] sso
Desired authentication type option (enter a number): 1
password (dev password):
role (dev role): ACCOUNTADMIN
warehouse (warehouse name): COMPUTE_WH
database (default database that dbt will build objects in): SNOWFLAKE_SAMPLE_DATA
schema (default schema that dbt will build objects in): TPCDS_SF100TCL
threads (1 or more) [1]:
00:15:27  Profile snowflake_dbt_main written to C:\Users\arun\.dbt\profiles.yml using target's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection.
(dbt-main) PS D:\dbt\dbt-main&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Provide a valid account and username. The account identifier has two segments, organization-account_name.&lt;br&gt;
You might get something like:&lt;br&gt;
ED25756-YUNGSKL&lt;br&gt;
where the organization is ED25756 and the account name is YUNGSKL.&lt;br&gt;
You can also find this in the Snowflake account URL or under Profile &amp;gt; Account &amp;gt; Account details.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The project is now initialized, and you will see the following directories,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswlkz4cyyn5pg5m4esv5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswlkz4cyyn5pg5m4esv5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Snowflake Data Sources&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To create the data sources from our newly created trial Snowflake account, the Python script below does exactly that: it connects to your Snowflake account, pulls the tables and views from a specific warehouse, database, and schema, and creates the dbt objects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flui35of3ye8hnzppymb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flui35of3ye8hnzppymb3.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Python Automation Script&lt;/strong&gt;&lt;br&gt;
The heart of this project — a Python script — connects to Snowflake, extracts metadata on tables, views, and columns, then auto-generates:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;YAML files describing data sources&lt;br&gt;
SQL files querying these sources with a standard wrapper&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This eliminates manual YAML/SQL creation.&lt;/p&gt;
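&lt;p&gt;As a minimal stdlib-only sketch of the idea (the function names and exact YAML layout here are illustrative assumptions, not the repo's code), the generation step boils down to rendering two strings per table:&lt;/p&gt;

```python
# Illustrative sketch: render a dbt sources YAML block and a staging SQL
# wrapper from metadata rows. build_source_yaml and stg_sql are
# hypothetical names, not the exact functions in the linked repo.
def build_source_yaml(database, schema, table, columns):
    """Render a dbt sources YAML block for one table.

    columns: list of (column_name, data_type) tuples.
    """
    lines = [
        "version: 2",
        "sources:",
        f"  - name: {schema.lower()}",
        f"    database: {database}",
        f"    schema: {schema}",
        "    tables:",
        f"      - name: {table}",
        "        columns:",
    ]
    for col_name, data_type in columns:
        lines.append(f"          - name: {col_name}")
        lines.append(f'            description: "{data_type}"')
    return "\n".join(lines) + "\n"

def stg_sql(schema, table):
    """Render the standard SQL wrapper model that selects from the source."""
    return f"select * from {{{{ source('{schema.lower()}', '{table}') }}}}\n"

yml = build_source_yaml("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF100", "CUSTOMER",
                        [("C_CUSTKEY", "number"), ("C_NAME", "varchar")])
```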

&lt;p&gt;The script code is &lt;a href="https://github.com/ArunkumarPanneerselvam/DBT_Source_Model_Gen_Automation_Snowflake_Python" rel="noopener noreferrer"&gt;here&lt;/a&gt; for your usage. Just do the modifications according to your Snowflake hierarchy.&lt;/p&gt;

&lt;p&gt;Below are sample code snippets showing the metadata query and file-generation functions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv8cbp1m0nfsnb2jkbyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv8cbp1m0nfsnb2jkbyx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpnpf49ffqy5tf6qmftf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpnpf49ffqy5tf6qmftf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
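&lt;p&gt;A hedged sketch of the kind of metadata query involved (the column aliases mirror the sample output shown later in this post; the repo's actual SQL may differ):&lt;/p&gt;

```python
# Illustrative metadata query builder: the column aliases (schema,
# object_name, object_type, data_type, mandatory) mirror the sample
# script output; the repo's actual SQL may differ.
def metadata_query(schemas):
    """Build an INFORMATION_SCHEMA query listing every column per object."""
    in_list = ", ".join(f"'{s}'" for s in schemas)
    return (
        'select c.table_schema as "schema", c.table_name as object_name, '
        "t.table_type as object_type, c.column_name, "
        "lower(c.data_type) as data_type, "
        "case when c.is_nullable = 'NO' then 'not null' else 'null' end as mandatory "
        "from information_schema.columns c "
        "join information_schema.tables t "
        "on c.table_schema = t.table_schema and c.table_name = t.table_name "
        f"where c.table_schema in ({in_list}) "
        "order by c.table_schema, c.table_name, c.ordinal_position"
    )

query = metadata_query(["TPCH_SF100", "TPCH_SF1000"])
```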

&lt;p&gt;To run this script, you will need the following information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ek95pnvbufyienx8m9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ek95pnvbufyienx8m9n.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Snowflake Account Code, User, and Password. This can be fetched from the earlier steps of the setup or you can get it from the Snowflake account.&lt;/li&gt;
&lt;li&gt;DBT Models path. You can get this by right-clicking in the Models folder in the IDE (Visual Studio Code).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can set up the environment variables before execution by adding the parameter names &amp;amp; values under "Environment variables", as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgk0uf37ttqd2mliaduh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgk0uf37ttqd2mliaduh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create the following environment variables,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DBT_MODELS_PATH&lt;br&gt;
SNOWFL_ACCT&lt;br&gt;
SNOWFL_USER&lt;br&gt;
SNOWFL_PWD&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;with the appropriate values, as the script needs them to execute.&lt;/p&gt;
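&lt;p&gt;A small, hedged sketch of how the script can read and validate these variables at startup (the load_settings helper is illustrative, not the repo's code):&lt;/p&gt;

```python
import os

# Connection settings the script reads from the environment; the variable
# names follow this post (SNOWFL_ACCT, SNOWFL_USER, SNOWFL_PWD,
# DBT_MODELS_PATH). load_settings is a hypothetical helper.
REQUIRED = ["SNOWFL_ACCT", "SNOWFL_USER", "SNOWFL_PWD", "DBT_MODELS_PATH"]

def load_settings(env=os.environ):
    """Return the required settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Failing fast here gives a clear error message instead of a connection failure deep inside the Snowflake connector.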

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakyqlfz6cb0ohyo8vhk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakyqlfz6cb0ohyo8vhk0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, you can configure the environment variables in your shell.&lt;/p&gt;

&lt;p&gt;If you are using PowerShell:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;$env:DBT_MODELS_PATH = "D:\dbt\dbt-main\snowflake_dbt_main\models"&lt;br&gt;
$env:SNOWFLAKE_USER = "your_user"&lt;br&gt;
$env:SNOWFLAKE_PASSWORD = "your_password"&lt;br&gt;
$env:SNOWFLAKE_ACCOUNT = "your_account"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If on Linux:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;export SNOWFLAKE_USER=your_user&lt;br&gt;
export SNOWFLAKE_PASSWORD=your_password&lt;br&gt;
export SNOWFLAKE_ACCOUNT=your_account&lt;br&gt;
export DBT_MODELS_PATH=/full/path/to/your/dbt/models&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Running the Automation&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you run it, since your account is a trial that already contains sample data, the script will automatically generate source definitions for all the tables and views in the SNOWFLAKE_SAMPLE_DATA database, for the schemas below:&lt;/p&gt;

&lt;p&gt;['TPCH_SF100', 'TPCH_SF1000']&lt;/p&gt;

&lt;p&gt;It will use the default warehouse COMPUTE_WH. Look for creation messages in the terminal and verify files under the models directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python .\data_generate.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should look similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS D:\dbt&amp;gt; python .\data_generate.py
Models directory: D:\dbt\dbt-main\snowflake_dbt_main\models
Connected to Snowflake successfully.
Sample metadata:
               schema                 object_name object_type   column_name data_type mandatory
0  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW       GRANTEE   varchar  not null
1  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW     ROLE_NAME   varchar  not null
2  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW    ROLE_OWNER   varchar  not null
3  INFORMATION_SCHEMA            APPLICABLE_ROLES        VIEW  IS_GRANTABLE   varchar      null
4  INFORMATION_SCHEMA  APPLICATION_SPECIFICATIONS        VIEW         LABEL   varchar  not null
Creating directory: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Creating directory: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Directory creation completed with code: 0
Starting to create YAML files for schema TPCH_SF100...
Generating YAML files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Processing object: CUSTOMER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\CUSTOMER.yml
Processing object: LINEITEM with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\LINEITEM.yml
Processing object: NATION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\NATION.yml
Processing object: ORDERS with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\ORDERS.yml
Processing object: PART with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PART.yml
Processing object: PARTSUPP with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PARTSUPP.yml
Processing object: REGION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\REGION.yml
Processing object: SUPPLIER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\SUPPLIER.yml
Starting to create YAML files for schema TPCH_SF1000...
Generating YAML files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Processing object: CUSTOMER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\CUSTOMER.yml
Processing object: LINEITEM with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\LINEITEM.yml
Processing object: NATION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\NATION.yml
Processing object: ORDERS with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\ORDERS.yml
Processing object: PART with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PART.yml
Processing object: PARTSUPP with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PARTSUPP.yml
Processing object: REGION with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\REGION.yml
Processing object: SUPPLIER with type BASE TABLE
Created YAML file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\SUPPLIER.yml
Starting to create SQL models for schema TPCH_SF100...
Generating SQL files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src
Processing SQL model: CUSTOMER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\CUSTOMER.sql
Processing SQL model: LINEITEM (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\LINEITEM.sql
Processing SQL model: NATION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\NATION.sql
Processing SQL model: ORDERS (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\ORDERS.sql
Processing SQL model: PART (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PART.sql
Processing SQL model: PARTSUPP (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\PARTSUPP.sql
Processing SQL model: REGION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\REGION.sql
Processing SQL model: SUPPLIER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF100_src\SUPPLIER.sql
Starting to create SQL models for schema TPCH_SF1000...
Generating SQL files in: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src
Processing SQL model: CUSTOMER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\CUSTOMER.sql
Processing SQL model: LINEITEM (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\LINEITEM.sql
Processing SQL model: NATION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\NATION.sql
Processing SQL model: ORDERS (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\ORDERS.sql
Processing SQL model: PART (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PART.sql
Processing SQL model: PARTSUPP (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\PARTSUPP.sql
Processing SQL model: REGION (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\REGION.sql
Processing SQL model: SUPPLIER (type BASE TABLE)
Created SQL file: D:\dbt\dbt-main\snowflake_dbt_main\models\TPCH_SF1000_src\SUPPLIER.sql
PS D:\dbt&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
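
Judging from the log above, each generated &lt;code&gt;.yml&lt;/code&gt; describes one source table plus its columns. A minimal sketch of what such a generator could look like, following dbt's documented sources schema (the exact layout &lt;code&gt;data_generate.py&lt;/code&gt; emits may differ; &lt;code&gt;source_yaml&lt;/code&gt; is a hypothetical helper):

```python
def source_yaml(database, schema, table, columns):
    """Render a minimal dbt sources YAML for one table.

    `columns` is a list of (name, data_type, mandatory) tuples, mirroring
    the metadata columns shown in the script output above.
    """
    lines = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {schema}",
        f"    database: {database}",
        f"    schema: {schema}",
        "    tables:",
        f"      - name: {table}",
        "        columns:",
    ]
    for name, data_type, mandatory in columns:
        lines.append(f"          - name: {name}")
        lines.append(f"            data_type: {data_type}")
        # Mandatory columns get a not_null test, so dbt test covers them.
        if mandatory == "not null":
            lines.append("            tests:")
            lines.append("              - not_null")
    return "\n".join(lines) + "\n"

# Illustrative call matching one file from the log (column details invented):
yml = source_yaml(
    "SNOWFLAKE_SAMPLE_DATA", "TPCH_SF100", "CUSTOMER",
    [("C_CUSTKEY", "number", "not null"), ("C_NAME", "varchar", "null")],
)
```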



&lt;p&gt;You should now be able to see the generated models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiezy1nevelqdb6gtras.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiezy1nevelqdb6gtras.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflemltfimmjz73w2h8gc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflemltfimmjz73w2h8gc.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
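
The &lt;code&gt;.sql&lt;/code&gt; files from the second half of the log can be pictured as simple pass-through models over those sources. A hedged sketch (&lt;code&gt;model_sql&lt;/code&gt; is hypothetical, and the SELECT the real script emits may differ, e.g. with explicit column lists):

```python
def model_sql(schema, table):
    """Render a pass-through dbt model that selects from a declared source."""
    # Quadruple braces produce the literal {{ ... }} Jinja delimiters.
    return (
        "select *\n"
        f"from {{{{ source('{schema}', '{table}') }}}}\n"
    )

print(model_sql("TPCH_SF100", "CUSTOMER"))
```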

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Start Transforming!&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With your sources in place, run the usual dbt commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt run
dbt test
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Enjoy a frictionless start to your Snowflake + dbt transformations.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Setting up a dbt environment connected to Snowflake no longer needs to feel tedious or intimidating. With a bit of scripting magic, you can automate the heavy lifting and focus on what truly matters — crafting data transformations and insights.&lt;/p&gt;

&lt;p&gt;Experiment, extend, and share your improvements!&lt;/p&gt;

&lt;p&gt;🚀 Enjoyed this guide? Give it a clap, share your feedback in the comments, or ask questions below! &lt;/p&gt;

&lt;p&gt;💬 Looking forward to your insights and stories — let’s grow together in the data engineering world!&lt;/p&gt;

</description>
      <category>snowflake</category>
      <category>dbt</category>
      <category>python</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
