Forem: ClintWithK

ETL vs ELT: The Data Pipeline Behind Every Powerful Dashboard

ClintWithK — Wed, 15 Apr 2026 03:31:39 +0000

A Brief History of ETL & ELT Processes

Data integration has long been a critical challenge for businesses seeking to unify and use data from multiple sources across teams and regions. Since the 1960s, when disk storage and early database management systems first made data sharing possible, organizations have struggled to combine disparate data sources efficiently. This challenge led to the emergence of ETL (Extract, Transform, Load) in the 1970s as the standard method for aggregating and transforming enterprise data from complex systems such as payroll, inventory, and ERP platforms. The rise of data warehouses in the 1980s amplified its importance and drove the development of increasingly sophisticated ETL tools, which became more accessible by the 1990s. The arrival of cloud computing in the 2000s then sparked a shift toward ELT (Extract, Load, Transform), allowing businesses to load raw data directly into cloud data warehouses and data lakes and transform it inside the platform. This evolution made large-scale analytics far more practical, enabling faster insights, greater agility, and more data-driven decision-making.

ETL vs ELT in 2026: What’s the Difference and Which Should You Use?

ETL and ELT are the two dominant approaches to moving and preparing data for analysis. Both extract data from source systems; the difference lies in when the transformation happens, and that single decision affects performance, cost, scalability, security, and developer experience.

The Core Difference

ETL (Extract, Transform, Load): Data is extracted, transformed on a separate processing engine, then loaded into the target warehouse. Transformation happens before loading.

ELT (Extract, Load, Transform): Raw data is extracted and loaded directly into the destination (usually a cloud data warehouse), then transformed inside the warehouse using its compute power.

ETL vs ELT: Key Differences

The main difference between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) lies in the order in which data is processed.

In ETL, data is transformed on a separate processing server before being loaded into the data warehouse. In contrast, ELT loads raw data directly into a cloud data warehouse or data lake, where transformations are performed later using the warehouse’s computing power.
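To make the difference in ordering concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the warehouse. The sample rows, the table names raw_sales and sales, and the cleanup rules (title-casing city names, dropping non-numeric amounts) are all hypothetical. The etl function transforms rows in application code before loading them; the elt function loads the raw rows untouched and then runs an equivalent cleanup as SQL inside the database.

```python
import sqlite3

# Hypothetical raw source records: amounts arrive as text and one row is bad.
SOURCE_ROWS = [
    ("2026-01-05", "Nairobi", "1200.50"),
    ("2026-01-05", "mombasa", "800"),
    ("2026-01-06", "Nairobi", "not-a-number"),
]


def etl(conn):
    """ETL: transform in application code, then load only the clean result."""
    clean = []
    for day, city, amount in SOURCE_ROWS:
        try:
            clean.append((day, city.title(), float(amount)))
        except ValueError:
            continue  # bad rows are dropped before they ever reach the warehouse
    conn.execute("CREATE TABLE sales (day TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)


def elt(conn):
    """ELT: load raw rows as-is, then transform with SQL inside the 'warehouse'."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, city TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", SOURCE_ROWS)
    # The transformation runs on the database engine (here, SQLite).
    conn.execute(
        """
        CREATE TABLE sales AS
        SELECT day,
               upper(substr(city, 1, 1)) || lower(substr(city, 2)) AS city,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE amount GLOB '[0-9]*'  -- crude numeric check, enough for this sketch
        """
    )


for pipeline in (etl, elt):
    db = sqlite3.connect(":memory:")
    pipeline(db)
    print(pipeline.__name__, db.execute("SELECT * FROM sales ORDER BY city").fetchall())
```

Both pipelines end with the same two clean rows. The practical difference is that the ELT version keeps raw_sales in the warehouse, so the transformation can be rerun or changed later without extracting the data again.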

Key Advantages of ELT over ETL

Data Compatibility
ETL is best suited for structured data, while ELT can also accommodate semi-structured and unstructured data such as logs, documents, and images, because raw data is stored first and shaped later.
Speed
ELT is generally faster to ingest because it loads raw data immediately and then uses the parallel processing capabilities of modern cloud data warehouses for transformations, which can support near real-time use cases.
Cost
ELT is often more cost-efficient because it needs fewer separate systems, less dedicated transformation infrastructure, and less upfront planning than ETL, although the warehouse compute used for transformations still has a cost.
Security
Modern cloud data warehouses used in ELT provide built-in security features such as granular access control and authentication, reducing the need for custom security implementations.

When to Use ETL Instead

Although ELT is the default for most modern data platforms, ETL is still useful in specific scenarios:

Integrating with legacy databases or third-party systems with fixed data formats
Early-stage data exploration and experimentation
Complex analytics involving multiple diverse data sources (often in hybrid pipelines)
IoT and edge computing use cases where data must be filtered, cleaned, or aggregated before being sent to the cloud
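For the IoT and edge case, the transform step runs before anything leaves the device. Below is a minimal Python sketch, assuming a hypothetical temperature sensor whose valid range is -40 to 85 °C. It filters out missing and out-of-range readings and produces one small summary per interval instead of sending every raw value.

```python
from statistics import mean

# Assumed valid range for a hypothetical temperature sensor, in degrees Celsius.
VALID_RANGE = (-40.0, 85.0)


def summarize_readings(readings, valid_range=VALID_RANGE):
    """Edge-side transform: drop missing or out-of-range readings, then
    aggregate so only a compact summary is sent to the cloud."""
    low, high = valid_range
    valid = [r for r in readings if r is not None and low <= r <= high]
    if not valid:
        return None  # nothing trustworthy to send for this interval
    return {
        "count": len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": round(mean(valid), 2),
    }


print(summarize_readings([21.5, 22.0, None, 999.0, 22.5]))
# {'count': 3, 'min': 21.5, 'max': 22.5, 'mean': 22.0}
```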

Modern Tools Landscape (2026)

The lines between ETL and ELT have blurred thanks to powerful specialized tools:

Category	Popular Tools	Best Used With	Notes
Ingestion	Fivetran, Airbyte, Kafka, Debezium	ELT	Kafka for real-time streaming; Debezium for change data capture
Orchestration	Apache Airflow, Dagster, Prefect	Both	Airflow is the most widely adopted
Transformation	dbt, Spark, SQL	Mostly ELT	dbt is the most popular choice for in-warehouse SQL
Traditional ETL	Informatica, Talend, AWS Glue	ETL	Enterprise-heavy
Warehouse/Lakehouse	Snowflake, BigQuery, Databricks, Redshift	ELT	Compute happens here

Practical Modern Patterns

Most common pattern today:

Airbyte / Fivetran --> Raw layer in warehouse --> dbt (transform) --> Orchestrated by Apache Airflow

Apache Airflow DAG Example (Using BashOperator + PythonOperator)

import subprocess
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def run_dbt_transform():
    """Run dbt transformations; check=True makes the task fail if dbt fails."""
    subprocess.run(["dbt", "run"], check=True)
    print("dbt transformation completed successfully!")


with DAG(
    dag_id="elt_sales_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,
        "owner": "data_team",
    },
) as dag:

    # Extract & Load (EL)
    extract_and_load = BashOperator(
        task_id="extract_and_load",
        bash_command="""
            set -e  # stop at the first failing command so Airflow marks the task as failed
            echo "Starting data ingestion..."
            # Replace with your ingestion command (Airbyte CLI, Fivetran, custom script, etc.)
            python /scripts/ingest_sales_data.py
            echo "Raw data successfully loaded into warehouse"
        """,
    )

    # Transform (T) with dbt
    transform = PythonOperator(
        task_id="transform_with_dbt",
        python_callable=run_dbt_transform,
    )

    # Optional: Run data quality tests
    data_quality = BashOperator(
        task_id="run_dbt_tests",
        bash_command="dbt test",
    )

    # Task dependencies
    extract_and_load >> transform >> data_quality

Conclusion

In 2026, ELT has become the preferred approach for most modern data teams due to its speed, flexibility, and seamless integration with cloud platforms and tools like dbt and Airflow. However, ETL remains relevant for regulated industries, legacy systems, and edge/IoT use cases.
Choose ELT by default for new projects, but don’t hesitate to use ETL or a hybrid model when compliance, security, or legacy constraints require it. Ultimately, the best pipeline is the one that is reliable, maintainable, and serves your business needs.

Happy data building!

Power BI Guide: Data Cleaning, DAX Formulas, and Dashboard Design

ClintWithK — Mon, 02 Mar 2026 11:56:48 +0000

“Data is rarely clean. It’s duplicated, inconsistent, poorly structured, and sometimes misleading. Yet decisions worth millions depend on it.”

In this article, we will learn how to turn messy data into actionable insights with Power BI, covering data cleaning, Data Analysis Expressions (DAX) formulas, and dashboard design.

What is Power BI?

Power BI is Microsoft's business analytics platform that helps you turn data into actionable insights. Whether you're a business user, report creator, or developer, Power BI offers integrated tools and services to connect, visualize, and share data across your organization.
For a deeper dive into Power BI, see Microsoft's official overview: https://learn.microsoft.com/en-us/power-bi/fundamentals/power-bi-overview

Data Cleaning

Data cleaning, also called data cleansing or data scrubbing, is the process of identifying and correcting errors and inconsistencies in raw datasets to improve data quality. It ensures a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness, and fitness for purpose, and it is critical to every data governance initiative within an organization.

Data Cleaning & Transformation (Power Query)

Removing duplicates: When starting with data cleaning, one of the first steps is removing duplicates. However, this must be done carefully: you need to clearly identify which column defines uniqueness in your dataset. For example, my sample hospital dataset below has a Visits table, and I've identified the visit_ID column as its primary key. In Power BI, open Transform data, right-click the column, and select Remove Duplicates; Power BI then removes rows with duplicate values in that column:
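Conceptually, Remove Duplicates on a key column keeps one row per key value. Here is a small Python sketch of that logic, assuming hypothetical Visits rows where visit_ID is the key; this sketch keeps the first occurrence of each key.

```python
def remove_duplicates(rows, key):
    """Keep the first row seen for each value of `key`, dropping later repeats."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical Visits rows; V002 appears twice.
visits = [
    {"visit_ID": "V001", "patient": "A. Otieno", "ward": "Maternity"},
    {"visit_ID": "V002", "patient": "B. Wanjiru", "ward": "Outpatient"},
    {"visit_ID": "V002", "patient": "B. Wanjiru", "ward": "Outpatient"},
    {"visit_ID": "V003", "patient": "C. Kamau", "ward": "Surgery"},
]

print([v["visit_ID"] for v in remove_duplicates(visits, "visit_ID")])
# ['V001', 'V002', 'V003']
```

Choosing the wrong key is the main risk: deduplicating on patient instead of visit_ID here would silently discard legitimate repeat visits.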

Handling nulls: Handling null values also requires careful consideration, because not every null should be treated the same way. In many cases, instead of deleting rows that contain nulls, it's better to replace them with a sensible placeholder such as "Not Provided", "N/A", "Unknown", or 0 (only when a zero is logically appropriate). The image below illustrates how to deal with nulls:
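The same column-by-column thinking can be sketched in Python. The placeholder values below are assumptions for a hypothetical patients table. A column with no placeholder defined (discharge_date here) keeps its nulls, because a blank there may legitimately mean the patient has not been discharged yet.

```python
# Hypothetical per-column placeholders; columns not listed keep their nulls.
PLACEHOLDERS = {"phone": "Not Provided", "insurer": "Unknown", "copay_kes": 0}


def fill_nulls(rows, placeholders):
    """Return new rows with None replaced by the column's placeholder, if one is defined."""
    return [
        {
            col: placeholders[col] if val is None and col in placeholders else val
            for col, val in row.items()
        }
        for row in rows
    ]


patients = [
    {"visit_ID": "V001", "phone": None, "insurer": "Self-pay", "copay_kes": None, "discharge_date": None},
]
print(fill_nulls(patients, PLACEHOLDERS))
```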

The principle: null handling is not just data cleaning; it preserves analytical integrity.

Fixing inconsistent formats: Another common issue in messy datasets is inconsistent formatting, especially in date and numeric columns. For example, a pharmacy transactions table might contain several different date formats in the same column; the fix is to convert every value to one consistent date format, as shown below:
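In Power Query this is usually handled by changing the column's data type, optionally using a specific locale. Outside Power BI, the same idea can be sketched in Python: try each known format in turn and flag anything that matches none of them. The list of formats is an assumption, and order matters for ambiguous values such as 03/04/2026, which this sketch reads day-first.

```python
from datetime import date, datetime

# Assumed formats seen in a hypothetical pharmacy transactions column.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def normalize_date(text):
    """Return a date for the first matching format, or None if nothing matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # leave unparseable values visible for review instead of guessing


for raw in ["2026-03-02", "02/03/2026", "2-Mar-2026", "March 2nd"]:
    print(raw, "->", normalize_date(raw))
```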

Splitting/merging columns: Another common transformation in Power Query is splitting or merging columns. This should only be done when it improves clarity, usability, or analytical accuracy. Splitting applies when data is stored in a combined format that limits analysis; for example, "Kenya - Nairobi" can be split into "Country" and "City" columns. The key principle is to transform data only when it improves modeling, filtering, or reporting, not just because it's possible.
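Power Query's Split Column by Delimiter performs this split. The equivalent logic, sketched in Python under the assumption that the delimiter is always " - ", is shown below. A value without the delimiter gets an empty City (None) rather than raising an error.

```python
def split_location(value, delimiter=" - "):
    """Split 'Country - City' into two fields; a missing city becomes None."""
    country, _, city = value.partition(delimiter)
    return {"Country": country.strip(), "City": city.strip() or None}


print(split_location("Kenya - Nairobi"))  # {'Country': 'Kenya', 'City': 'Nairobi'}
print(split_location("Kenya"))            # {'Country': 'Kenya', 'City': None}
```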

Creating calculated columns: After cleaning and structuring your dataset, the next step may involve creating calculated columns. This is where you derive new information from existing fields at the row level.

A calculated column is computed during data refresh and stored in the model. It evaluates one row at a time.

Note that the example above is only an illustration; adding ID columns arbitrarily is not good practice.
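To see what "row level" means, here is a rough Python analogue of a calculated column such as Profit = Revenue - Cost. The table and column names are hypothetical; the point is that the value is computed once per row and stored alongside the other columns.

```python
# Hypothetical rows standing in for a Power BI table.
crop_sales = [
    {"crop": "Maize", "revenue_kes": 50_000, "cost_kes": 35_000},
    {"crop": "Tea", "revenue_kes": 80_000, "cost_kes": 52_000},
]

# Like a calculated column: evaluated row by row (at refresh time in Power BI)
# and stored on each row, so it adds to model size but is not recomputed per visual.
for row in crop_sales:
    row["profit_kes"] = row["revenue_kes"] - row["cost_kes"]

print([(r["crop"], r["profit_kes"]) for r in crop_sales])
# [('Maize', 15000), ('Tea', 28000)]
```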

DAX Formulas

DAX (Data Analysis Expressions) is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values. DAX helps you create new information from data already in your model.

We will build our understanding of DAX around three fundamental concepts: syntax, functions, and context. DAX has other important concepts, but these three provide the best foundation on which to build your DAX skills.

In the example above, we are looking at:

Total Profit = SUM('Kenya_Crops_Power BI DATASET'[Profit (KES)])

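Syntax:

Total Profit - the name of the new measure
= - the equals sign operator, which marks the beginning of the formula
SUM - a DAX function that adds up all the numbers in a column
'Kenya_Crops_Power BI DATASET' - the referenced table, wrapped in single quotes because its name contains spaces
[Profit (KES)] - the referenced column, wrapped in square brackets
Parentheses () - surround the function's arguments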

In plain language: for the measure named Total Profit, calculate (=) the SUM of the values in the [Profit (KES)] column of the 'Kenya_Crops_Power BI DATASET' table.
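Context is what makes a measure like Total Profit useful: the same formula returns a different number depending on which filters (slicers, rows of a visual) are in effect. Here is a rough Python analogue, with hypothetical county and crop rows standing in for the real dataset; the column names are assumptions.

```python
# Hypothetical rows; column names are assumptions, not the real dataset's.
crops = [
    {"county": "Nakuru", "crop": "Maize", "profit_kes": 15_000},
    {"county": "Nakuru", "crop": "Wheat", "profit_kes": 9_000},
    {"county": "Kericho", "crop": "Tea", "profit_kes": 28_000},
]


def total_profit(rows, **filters):
    """Rough analogue of a SUM measure: sum only the rows left by the current filters."""
    in_context = [r for r in rows if all(r[col] == val for col, val in filters.items())]
    return sum(r["profit_kes"] for r in in_context)


print(total_profit(crops))                   # no filters, like a card visual: 52000
print(total_profit(crops, county="Nakuru"))  # as if a slicer selected Nakuru: 24000
```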

Other DAX functions worth exploring with different datasets include SUMX(), COUNT(), AVERAGE(), AVERAGEX(), IF(), and GEOMEANX(), as well as the AND() and OR() functions (or their && and || operators).

Dashboard Design

Designing an effective Power BI dashboard is more than dropping charts onto a canvas; it's about clarity, readability, and insight delivery. Here are the key principles and techniques to follow:

1. Layout and Structure

Keep it simple and organized: Group related visuals together. Use whitespace effectively.
Use a grid system: Align charts and tiles for balance.
Highlight key metrics: Place the most important KPIs at the top or in prominent positions.

2. Consistent Visual Themes

Stick to consistent fonts, sizes, and styles across your dashboard.
Avoid clutter—only display essential information.

3. Color Usage with HCL Model

Colors are crucial in dashboards: they guide attention and improve comprehension. Power BI lets you customize colors, and choosing them with the HCL (Hue-Chroma-Luminance) color model helps keep them perceptually consistent, because HCL is designed to approximate how people actually perceive color differences.

Understanding HCL:

Hue: The type of color (e.g., red, blue, green).
Chroma: The colorfulness or intensity of the color (similar to saturation).
Luminance: The lightness or darkness of the color.

Why HCL matters:

Colors should differentiate data clearly without misleading perception.
HCL is designed so that equal numeric steps in hue, chroma, or luminance look like roughly equal visual differences.
Avoid using colors that are too bright or too dull, which can make charts harder to read.

Practical Tips in Power BI:

Use contrasting colors for categorical data to make groups distinguishable.
For sequential data, adjust luminance gradually to show progression.
Test colors for accessibility, especially for color-blind users.

Example:

A bar chart showing sales performance could use a sequential green palette whose luminance changes steadily with sales (for example, darker greens for higher sales on a light background).
A categorical chart for product categories could use distinct hues with similar luminance to make categories easy to compare.
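A sequential palette like this can be generated rather than hand-picked. The sketch below holds hue and chroma fixed and steps luminance evenly, then converts each HCL color to a hex code that can be pasted into Power BI's color pickers or a report theme. It assumes the CIELAB-based form of HCL (LCh(ab)) with a D65 white point and the standard sRGB conversion; some tools, such as R's colorspace package, define HCL on CIELUV instead, so their hex values will differ. Out-of-gamut colors are simply clipped here.

```python
import math

# D65 reference white, as used by sRGB.
XN, YN, ZN = 0.95047, 1.00000, 1.08883


def _lab_f_inverse(t):
    """Inverse of the CIELAB companding function."""
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def _to_srgb_channel(c):
    """Clip a linear-light value to [0, 1] and apply the sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def hcl_to_hex(hue, chroma, luminance):
    """HCL (hue in degrees, chroma, luminance 0-100) -> '#rrggbb' via CIELAB."""
    a = chroma * math.cos(math.radians(hue))
    b = chroma * math.sin(math.radians(hue))
    fy = (luminance + 16) / 116
    x = XN * _lab_f_inverse(fy + a / 500)
    y = YN * _lab_f_inverse(fy)
    z = ZN * _lab_f_inverse(fy - b / 200)
    # XYZ -> linear sRGB (standard matrix for a D65 white point).
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    bl = 0.0557 * x - 0.2040 * y + 1.0570 * z
    return "#" + "".join(f"{round(_to_srgb_channel(v) * 255):02x}" for v in (r, g, bl))


def sequential_palette(hue, chroma, n, light=90.0, dark=30.0):
    """n colors with fixed hue and chroma and evenly stepped luminance (light to dark)."""
    if n == 1:
        return [hcl_to_hex(hue, chroma, light)]
    step = (dark - light) / (n - 1)
    return [hcl_to_hex(hue, chroma, light + i * step) for i in range(n)]


# A greenish hue (about 130 degrees in LCh(ab)) for a five-step sales palette.
print(sequential_palette(hue=130, chroma=40, n=5))
```

For a categorical palette, the same function can be used the other way around: keep luminance and chroma fixed and vary the hue.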

4. Interactivity

Use slicers and filters to allow users to drill down into the data.
Include tooltips and hover effects to provide additional context without cluttering visuals.

5. Storytelling

Arrange visuals in a logical order to tell a story with your data.
Use titles, labels, and annotations effectively to guide the user.

Conclusion
Messy data is the norm: full of duplicates, nulls, inconsistencies, and hidden pitfalls, yet critical decisions depend on getting it right. In this Power BI guide, we turned raw chaos into reliable insights: cleaning and shaping data in Power Query, unlocking deeper analysis with essential DAX formulas, and delivering clear, actionable stories through thoughtful dashboard design with perceptually uniform HCL colors. The payoff? Trustworthy visuals that stakeholders actually use to drive better decisions. Grab a messy dataset, follow the steps, and watch garbage-in become golden-out. You've got the full workflow; now go make your data work for you.

Mastering Joins and Window Functions in SQL for Data Engineers

ClintWithK — Mon, 02 Mar 2026 11:38:02 +0000

In this article, we will cover the basic concepts of SQL joins and window functions. We assume that you are already familiar with the fundamentals of Data Definition Language (DDL): creating schemas and tables, understanding data types, and applying constraints.

We will explore how joins let you combine and relate data from multiple tables, and how window functions let you perform calculations across a set of rows related to the current row without collapsing the result set. Typical window-function tasks include running totals (cumulative sums), rankings, and moving averages. By the end of this article, you'll understand when and how to use joins and window functions effectively in real-world SQL queries.

What are SQL Joins?

As the name suggests, a join combines two or more items. In SQL, most joins combine rows from two or more tables into a single result set based on a related column, such as a foreign key that matches another table's primary key.

Interestingly, a table can also be joined to itself, which is useful for hierarchical data or for comparing rows within the same table.

We will explore the different types of joins with practical examples:

When to use them
How to use them
The syntax for each type

We will use four tables: customers, products, sales, and inventory to demonstrate SQL joins, with each table containing relevant data on customers, products, transactions, and stock.

The main types of SQL joins are:

INNER JOIN

When to use: When you want only matching records from two tables.
How to use: Combine two tables on a common column (foreign key).

Syntax & Example: Get a list of all sales with customer and product details:
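A minimal sketch of this query, run through Python's built-in sqlite3 module on tiny hypothetical versions of the customers, products, and sales tables (the column names are assumptions, not the article's actual schema):

```python
import sqlite3

# Hypothetical miniature tables; customer 3 and product 30 have no sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Amina');
    INSERT INTO products  VALUES (10, 'Laptop'), (20, 'Phone'), (30, 'Tablet');
    INSERT INTO sales     VALUES (100, 1, 10, 1), (101, 2, 20, 2), (102, 1, 20, 1);
""")

# INNER JOIN keeps only rows that match on both sides of each join.
rows = conn.execute("""
    SELECT s.sale_id, c.first_name, p.product_name, s.quantity
    FROM sales AS s
    INNER JOIN customers AS c ON c.customer_id = s.customer_id
    INNER JOIN products  AS p ON p.product_id  = s.product_id
    ORDER BY s.sale_id
""").fetchall()
print(rows)
```

Customer 3 and the Tablet never appear, because neither has a matching sale.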

LEFT JOIN

When to use: When you want all records from the main table and optional matches from another table.
How to use: Use LEFT JOIN when the left table is the main table.

Syntax & Example: List all customers and their purchases, even if they haven’t bought anything:
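The same idea as a runnable sketch with Python's sqlite3 module. The tables and column names are hypothetical; customer 3 has never bought anything, so the sale column comes back as NULL (None in Python) for that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Amina');
    INSERT INTO sales     VALUES (100, 1, 10, 1), (101, 2, 20, 2), (102, 1, 20, 1);
""")

# Every customer is kept; unmatched customers get NULL sale columns.
rows = conn.execute("""
    SELECT c.customer_id, c.first_name, s.sale_id
    FROM customers AS c
    LEFT JOIN sales AS s ON s.customer_id = c.customer_id
    ORDER BY c.customer_id, s.sale_id
""").fetchall()
print(rows)  # [(1, 'Amina', 100), (1, 'Amina', 102), (2, 'Brian', 101), (3, 'Amina', None)]
```

Adding WHERE s.sale_id IS NULL to this query is a common way to find customers with no purchases at all.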

RIGHT JOIN

When to use: When you want all records from the right table, even if no match exists in the left table.
How to use: Right join is basically the reverse of LEFT JOIN.

Syntax & Example: List all products and any sales made, even if a product hasn’t been sold:
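A sketch with Python's sqlite3 module and hypothetical tables. RIGHT JOIN is only supported in SQLite 3.39 and later, so the runnable query uses the equivalent LEFT JOIN with the tables swapped, which is exactly the "reverse" relationship described above. The standard RIGHT JOIN form is shown in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales    (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (10, 'Laptop'), (20, 'Phone'), (30, 'Tablet');
    INSERT INTO sales    VALUES (100, 1, 10, 1), (101, 2, 20, 2), (102, 1, 20, 1);
""")

# Standard form (PostgreSQL, SQL Server, MySQL, SQLite 3.39+):
#   SELECT p.product_id, p.product_name, s.sale_id
#   FROM sales AS s RIGHT JOIN products AS p ON p.product_id = s.product_id
# Equivalent LEFT JOIN with the tables swapped, which runs on any SQLite version:
rows = conn.execute("""
    SELECT p.product_id, p.product_name, s.sale_id
    FROM products AS p
    LEFT JOIN sales AS s ON s.product_id = p.product_id
    ORDER BY p.product_id, s.sale_id
""").fetchall()
print(rows)  # the unsold Tablet appears with sale_id None
```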

FULL OUTER JOIN

When to use: When you want all records from both tables, matched where possible.
How to use: Combine LEFT JOIN and RIGHT JOIN behavior.

Syntax & Example: List all sales and all inventory items, even if some products haven’t been sold:
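A sketch with Python's sqlite3 module and hypothetical sales and inventory tables. Here the Phone (product 20) sells but has no inventory row, and product 30 is stocked but never sold. FULL OUTER JOIN needs SQLite 3.39+ and is not available in MySQL, so the runnable query emulates it portably: all sales (matched or not) plus the inventory rows that have no sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, quantity INTEGER);
    CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER);
    INSERT INTO sales     VALUES (100, 1, 10, 1), (101, 2, 20, 2), (102, 1, 20, 1);
    INSERT INTO inventory VALUES (10, 5), (30, 12);
""")

# Standard form (PostgreSQL, SQL Server, SQLite 3.39+):
#   SELECT ... FROM sales AS s FULL OUTER JOIN inventory AS i ON i.product_id = s.product_id
rows = conn.execute("""
    SELECT s.sale_id, s.product_id AS sold_product, i.product_id AS stocked_product, i.stock
    FROM sales AS s
    LEFT JOIN inventory AS i ON i.product_id = s.product_id
    UNION ALL
    SELECT NULL, NULL, i.product_id, i.stock
    FROM inventory AS i
    WHERE NOT EXISTS (SELECT 1 FROM sales AS s WHERE s.product_id = i.product_id)
""").fetchall()
for row in rows:
    print(row)
```

UNION ALL (rather than UNION) is used because the NOT EXISTS filter already guarantees the two halves do not overlap.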

CROSS JOIN

When to use: When you need all combinations of rows from two tables.
How to use: CROSS JOIN does not require a condition.

Syntax & Example: Get every possible combination of customers and products (maybe for a promotion plan):
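A sketch with Python's sqlite3 module and hypothetical customers and products tables. With 3 customers and 3 products, the cross join returns 3 x 3 = 9 rows; result size grows multiplicatively, so use it deliberately on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_name TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Amina');
    INSERT INTO products  VALUES (10, 'Laptop'), (20, 'Phone'), (30, 'Tablet');
""")

# No ON clause: every customer is paired with every product.
rows = conn.execute("""
    SELECT c.first_name, p.product_name
    FROM customers AS c
    CROSS JOIN products AS p
    ORDER BY c.customer_id, p.product_id
""").fetchall()
print(len(rows))  # 9
```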

SELF JOIN

When to use: When a table needs to relate to itself, like finding hierarchical relationships.
How to use: Alias the table twice and join on self-referencing columns.

Example (customers with the same first name):
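A sketch with Python's sqlite3 module and a hypothetical customers table in which two different customers are named Amina. The condition a.customer_id < b.customer_id stops a row from matching itself and keeps each pair from appearing twice (as 1-3 and 3-1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Amina');
""")

# The same table appears twice under different aliases (a and b).
rows = conn.execute("""
    SELECT a.customer_id, b.customer_id, a.first_name
    FROM customers AS a
    JOIN customers AS b
      ON a.first_name = b.first_name
     AND a.customer_id < b.customer_id
""").fetchall()
print(rows)  # [(1, 3, 'Amina')]
```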

What are Window Functions?

Window functions are honestly one of the most powerful (and underrated) tools in SQL.
They let you do calculations across a bunch of rows that are “related” to the current row… but without smashing everything down into one aggregated row like a normal GROUP BY would. You still get every single row back, just with extra useful columns added on.

Core idea in one sentence:
Window functions calculate stuff over a “window” of rows defined by you, while keeping all the original rows intact.

The Big Differences from Regular Aggregates

SUM(), AVG(), COUNT(), etc. with GROUP BY: one row per group

SUM(), AVG(), COUNT(), etc. with OVER(): still one row per original row, with the aggregate calculated per window and shown on each row
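The contrast is easiest to see side by side. Here is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later) and a hypothetical three-row sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, customer TEXT, amount INTEGER);
    INSERT INTO sales VALUES (1, 'A', 100), (2, 'A', 50), (3, 'B', 70);
""")

# GROUP BY collapses rows: one row per customer.
grouped = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()

# OVER keeps every row and attaches the per-customer total to each one.
windowed = conn.execute("""
    SELECT sale_id, customer, amount,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
    FROM sales
    ORDER BY sale_id
""").fetchall()

print(grouped)   # [('A', 150), ('B', 70)]
print(windowed)  # [(1, 'A', 100, 150), (2, 'A', 50, 150), (3, 'B', 70, 70)]
```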

The Must-Know Window Functions

ROW_NUMBER() - gives every row a unique 1,2,3… number

RANK() - like sports rankings (1,2,2,4 if ties)

DENSE_RANK() - same but no gaps (1,2,2,3)

SUM() / AVG() / COUNT() over a window - running totals, moving averages, etc.

LEAD() / LAG() - peek at the next or previous row’s value

NTILE(n), PERCENT_RANK(), CUME_DIST() - more niche but handy for percentiles and bucketing.
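Here is a sketch that runs several of these functions over a hypothetical scores table with a tie at 85 points, again through Python's sqlite3 module (SQLite 3.25+). ROW_NUMBER() and LAG() order by name as a tie-breaker so their output is deterministic, and the running total spells out its ROWS frame rather than relying on the default frame, which treats rows with equal ORDER BY values as peers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (name TEXT, points INTEGER);
    INSERT INTO scores VALUES ('Amina', 90), ('Brian', 85), ('Chen', 85), ('Dara', 70);
""")

rows = conn.execute("""
    SELECT name, points,
           ROW_NUMBER() OVER (ORDER BY points DESC, name) AS row_num,
           RANK()       OVER (ORDER BY points DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)       AS dense_rnk,
           LAG(points)  OVER (ORDER BY points DESC, name) AS prev_points,
           SUM(points)  OVER (ORDER BY points DESC, name
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM scores
    ORDER BY row_num
""").fetchall()
for row in rows:
    print(row)
```

The ranks come out as 1, 2, 2, 4 for RANK() and 1, 2, 2, 3 for DENSE_RANK(), matching the descriptions above, and LAG() returns NULL (None) for the first row because there is no previous row.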

Explore window functions further here.

Conclusion

In this article, we explored the fundamentals of SQL joins and window functions, two powerful tools for combining and analyzing data. Joins allow you to merge tables based on relationships, while window functions let you perform advanced calculations across related rows without losing detail. Mastering these techniques enables data engineers to write efficient queries, uncover insights, and build robust analytics pipelines. By applying joins and window functions to real-world tables like customers, products, sales, and inventory, you can transform raw data into actionable insights with clarity and precision.

Introduction To Schemas & Data Modeling In Power BI

ClintWithK — Mon, 02 Feb 2026 11:06:29 +0000

Power BI is Microsoft's business analytics platform for turning data into actionable insights. Whether you're a business user, report creator, or developer, Power BI offers integrated tools and services to connect, visualize, and share data across your organization. In this article, we take a deeper dive into schemas and data modeling in Power BI.

First, let's answer two questions: What is a schema? And what is data modeling?

Along the way, we will come across key terms such as fact tables, dimension tables, relationships, one-to-many, filter propagation arrows, performance, and usability. We will explore each of them with demonstrations using a sample dataset that can be downloaded here: https://chandoo.org/wp/wp-content/uploads/2024/10/sample-chocolate-sales-data-all.xlsx

Load data into Power BI as shown in the image below:

Select Excel Workbook, which opens a file browser window:

Select the correct Excel workbook

Select the worksheets & Load the data

Now that you have loaded the data into Power BI, let's explore schemas and data modeling with it.
Before jumping into schemas, let's understand what data modeling is; that understanding will give us a clearer grasp of the schema concept.

Data Modeling

Data modeling is the process of structuring and organizing data to define how it is stored, related, and used within a system. In Power BI, it involves creating a logical data model by connecting tables, defining relationships, setting data types, and building calculations (e.g., using DAX) to support accurate and efficient reporting.

Purpose:

Enable meaningful analysis by linking fact tables (quantitative data) with dimension tables (descriptive context).
Support fast query performance and intuitive report design.
Ensure data integrity and reduce redundancy.

Key Components:

Tables: Fact tables (e.g., shipments) and dimension tables (e.g., product, calendar, people, location).

Fact Tables: usually large, depending on the size of the organization (they can grow to billions or even trillions of rows), and narrow, with few columns (attributes).

Dimension Tables: describe the business entities behind the facts, such as products, people, places, and dates. They usually have relatively few rows and many columns (attributes), although their size also depends on the size of the organization.
An illustration is shown below:

Relationships: Typically one-to-many (1:M), defining how tables are connected (e.g., via PID, SPID, cal_date). This is illustrated in the image below:

Key reasons for accurate relationships:

Data Accuracy: Ensures that aggregations (e.g., total sales) are calculated correctly across related tables.
Filter Propagation: Enables interactive filtering—when a user selects a value in one visual, related visuals update accordingly based on the defined cross-filter direction.
Performance Optimization: One-to-many relationships with single-direction filtering improve query speed and reduce ambiguity. Avoiding unnecessary bi-directional filters prevents performance degradation.
Avoiding Ambiguity: Only one relationship between the same pair of tables can be active at a time, so multiple relationships must be handled deliberately. Mark the extra relationships as inactive and use USERELATIONSHIP() in DAX to activate one for a specific calculation, giving precise control over which relationship is used.
Model Integrity: Prevents issues like circular references, incorrect data types (e.g., DateTime vs. Date), and data integrity problems that arise from mismatched or poorly defined links.
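Filter propagation along a one-to-many relationship can be sketched in plain Python: a filter on the dimension (the one side) selects a set of keys, and only fact rows (the many side) carrying those keys are aggregated. The product names, categories, and amounts are hypothetical; only the PID key name comes from the model described above.

```python
# Dimension table (one side): one row per product, keyed by PID. Data is hypothetical.
products = {
    "P01": {"product": "Milk Bars", "category": "Bars"},
    "P02": {"product": "Choco Bites", "category": "Bites"},
    "P03": {"product": "Dark Bars", "category": "Bars"},
}

# Fact table (many side): several shipments can point at the same PID.
shipments = [
    {"PID": "P01", "amount": 1200},
    {"PID": "P02", "amount": 300},
    {"PID": "P01", "amount": 450},
    {"PID": "P03", "amount": 800},
]


def total_amount(category=None):
    """Apply a filter on the dimension, then let it flow to the fact via PID."""
    keys_in_filter = {
        pid for pid, attrs in products.items()
        if category is None or attrs["category"] == category
    }
    return sum(s["amount"] for s in shipments if s["PID"] in keys_in_filter)


print(total_amount())        # 2750 (no filter)
print(total_amount("Bars"))  # 2450 (P01 and P03 shipments only)
```

With single-direction filtering, this flow only runs from the one side to the many side: filtering shipments would not, by itself, narrow down the list of products.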

Best Practices:

Always verify Power BI’s auto-detected relationships; they are not always accurate.
Use one-to-many relationships as the default; avoid one-to-one unless necessary.
Set cross-filter direction to Single unless a specific use case requires Both.
Use bridge tables for many-to-many relationships to maintain model clarity and performance.
Regularly manage and edit relationships via the Manage Relationships dialog to ensure alignment with business logic.

Schema Types: Star schema (recommended), snowflake, or flat models. A well-designed data model is foundational for scalable, high-performance Power BI reports.

Schemas

What is a Schema? A schema is the logical structure or blueprint of a database that defines how data is organized, stored, and related. It includes tables, columns, data types, relationships, constraints, and other elements, but does not contain the actual data.
In Power BI, a schema refers to the data model's structure, showing how tables are connected to support efficient querying and reporting.

Common schema types in Power BI include:

Star Schema: One central fact table linked to multiple dimension tables (recommended for performance). As illustrated in the image below:
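A star schema can be sketched with Python's sqlite3 module: one fact table holding keys and numeric measures, surrounded by dimension tables holding descriptive attributes. The table and column names are loosely modeled on the PID and SPID keys mentioned earlier, but the tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: relatively few rows, descriptive attributes.
    CREATE TABLE products (PID TEXT PRIMARY KEY, product TEXT, category TEXT);
    CREATE TABLE people   (SPID TEXT PRIMARY KEY, sales_person TEXT, team TEXT);

    -- Fact table: keys pointing at each dimension plus numeric measures.
    -- (REFERENCES documents the relationships; SQLite only enforces them
    -- when PRAGMA foreign_keys = ON.)
    CREATE TABLE shipments (
        PID    TEXT REFERENCES products(PID),
        SPID   TEXT REFERENCES people(SPID),
        amount REAL,
        boxes  INTEGER
    );

    INSERT INTO products  VALUES ('P01', 'Milk Bars', 'Bars'), ('P02', 'Choco Bites', 'Bites');
    INSERT INTO people    VALUES ('SP01', 'Asha', 'Team A'), ('SP02', 'Ben', 'Team B');
    INSERT INTO shipments VALUES ('P01', 'SP01', 1200, 10), ('P02', 'SP01', 300, 4),
                                 ('P01', 'SP02', 450, 5);
""")

# A typical report query: join the fact to each dimension once, group by attributes.
rows = conn.execute("""
    SELECT p.category, pe.team, SUM(s.amount) AS total_amount
    FROM shipments AS s
    JOIN products AS p  ON p.PID   = s.PID
    JOIN people   AS pe ON pe.SPID = s.SPID
    GROUP BY p.category, pe.team
    ORDER BY p.category, pe.team
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is the access pattern star schemas are designed for.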

Snowflake Schema: A normalized version of the star schema, in which dimension tables are further split into related tables. It is sometimes described as a star schema variant with extra tables branching off its dimensions.

Flat Schema: All data in a single table (simple but prone to redundancy). For more details and a deeper understanding of schemas in Power BI, see: https://learn.microsoft.com/en-us/power-bi/guidance/star-schema

Conclusion

Proper schema design and data modeling are essential for building accurate and efficient Power BI reports. Using well-structured schemas, such as the star schema, and defining correct relationships ensures reliable calculations, optimal performance, and simpler DAX expressions. A strong data model enables clear insights, supports scalability, and allows decision-makers to trust the results produced in Power BI.

Getting Started With Linux For Data Engineering

ClintWithK — Sun, 25 Jan 2026 18:32:12 +0000

Many people new to Linux feel scared or intimidated, especially because most operations are done through the terminal. This fear is completely normal, and the good news is that once you understand the basics, Linux becomes simple, powerful, and even enjoyable to use.

Linux comes in different flavors known as distributions (distros). Some common ones include:

Ubuntu
Arch Linux
Fedora
Parrot OS
Red Hat

Although these distros may look different, most Linux commands are very similar across them, especially the core ones you’ll use daily.

In this article, we’ll focus on the most minimal but critical Linux commands every beginner — especially an aspiring Data Engineer — should know.

We’ll be working specifically with Ubuntu Server, as it is widely used in data engineering, cloud environments, and production systems.

What We’ll Cover

By the end of this article, you’ll be comfortable with:

Updating and upgrading the system
Navigating directories (folders)
Listing directory contents
Reading and editing files using the terminal
Copying and moving files
Logging into servers using SSH
Transferring files using SCP and SFTP

Let’s take you from beginner to intermediate by exploring these basics together.

Step 1: Identify Your Linux System

Run the command below:

uname -a

This command shows:

The kernel name, release, and version
The machine's hardware architecture (for example, x86_64)
The hostname and operating system name

It’s a quick way to know what system you’re working on, especially when logged into remote servers.

Step 2: Update and Upgrade the System

Keeping your system updated is one of the most important habits in Linux.

Update Package Lists

sudo apt update

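This command refreshes the list of available packages and their versions from your configured repositories.
It does not install or upgrade anything by itself; it ensures that later installs and upgrades use up-to-date package information.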

Upgrade Installed Packages

sudo apt upgrade

This upgrades installed packages to the newest versions available from your configured repositories.
Think of this as patching the OS.

Step 3: Navigating Directories

Linux uses directories instead of folders, but they mean the same thing.

Check Your Current Directory

pwd

List Files and Directories

ls

To see more details:

ls -l

To include hidden files:

ls -a

Move Between Directories

cd directory_name

Go back one level:

cd ..

Go to your home directory:

cd ~

Step 4: Reading and Editing Files from the Terminal

View File Contents

cat filename.txt

For long files:

less filename.txt

Step 5: Editing Files Using Nano and Vi

Using Nano (Beginner-Friendly)

nano filename.txt

Type to edit

Press Ctrl + O to save (then Enter to confirm the file name)
Press Ctrl + X to exit

Using Vi / Vim (Advanced but Powerful)

vi filename.txt

Basic commands:

Press i → Insert mode
Press Esc → Exit insert mode
Type :wq → Save and quit
Type :q! → Exit without saving

Step 6: Copying and Moving Files

Copy Files

cp file1.txt /path/to/destination/

Copy directories:

cp -r folder1 /path/to/destination/

Move or Rename Files

mv oldname.txt newname.txt

Move files:

mv file.txt /new/location/

Step 7: Logging into a Server Using SSH

SSH allows you to securely access remote servers.

ssh username@server_ip

Example:

ssh ubuntu@192.168.1.10

SSH is heavily used in:

Cloud platforms
Data pipelines
Production servers

Step 8: File Transfer Using SCP and SFTP

Copy Files Using SCP

scp file.txt username@server_ip:/remote/path/

Copy files from server to local machine:

scp username@server_ip:/remote/file.txt .

Using SFTP

sftp username@server_ip

Common SFTP commands:

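ls - list files in the current remote directory
get filename - download a file to your local machine
put filename - upload a local file to the server
exit - close the session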

Why Linux Matters for Data Engineers

As a Data Engineer, Linux is unavoidable:
Most data systems run on Linux
Cloud servers are Linux-based
Automation and pipelines rely on terminal commands
Mastering these basics gives you:
Confidence
Speed
Control over systems

Conclusion

Linux might feel overwhelming at first, but you don’t need to know everything at once. Start with the basics, practice daily, and build confidence gradually.

Happy Coding!!!

Understanding Git and GitHub: Pushing, Pulling, and Tracking Code (Beginner-Friendly Guide)

ClintWithK — Fri, 16 Jan 2026 09:20:04 +0000

As you grow in your tech journey, one tool you will definitely encounter early is Git. At first, it might feel confusing or even intimidating, but once you understand why it exists and how it works, it becomes one of the most powerful tools in your workflow.

In this article, we’ll cover:

What Git is and why version control is important
How to push code to GitHub
How to pull code from GitHub
How to track changes using Git

Let’s break it down step by step.

What Is Git and Why Is Version Control Important?

Git is a version control system. In simple terms, it helps you:

Track changes in your code
Save different versions of your project
Collaborate with others without overwriting each other’s work

Imagine working on a project and accidentally deleting a file that worked perfectly yesterday. Without Git, that’s a nightmare. With Git, you can go back in time and restore previous versions.

Why Version Control Matters

Version control allows you to:

See what changed, when, and by whom
Work on features safely
Experiment without fear
Collaborate with teams efficiently

GitHub, on the other hand, is a platform that hosts Git repositories online, making it easier to store, share, and collaborate on code.

Initial Setup (Quick Recap)

Make sure you have:

Git installed
A GitHub account
SSH set up (as covered in the previous article)

Open your terminal using Ctrl + Alt + T.

Step 1: Initialize a Git Repository

Navigate to your project folder:

cd project-folder

Initialize Git:

git init

This creates a hidden .git folder that Git uses to track your project.

Step 2: Tracking Changes Using Git

Check Project Status

git status

This shows:

Untracked files
Modified files
Files ready to be committed

Add Files to Staging

To stage all new and modified files in the current directory:

git add .

Or add a specific file:

git add filename.py

Commit Your Changes

git commit -m "Initial project setup"

A commit is like saving a snapshot of your project at that moment.

Step 3: Push Code to GitHub

Create a Repository on GitHub

Log in to GitHub
Click New repository
Give it a name
Do not initialize with a README (so your first push doesn't conflict with an existing commit)
Copy the SSH repository URL

Connect Local Repo to GitHub

git remote add origin git@github.com:username/repository-name.git

Verify:

git remote -v

Push Your Code

git branch -M main
git push -u origin main

Your code is now live on GitHub.

Step 4: Pull Code from GitHub

Pulling fetches changes from GitHub and merges them into your current local branch.

git pull origin main

This is especially important when:

Working with others
Switching between machines
Updating your local copy

Step 5: Viewing and Tracking Changes

View Commit History

git log

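See File Changes

git diff

This shows changes that are not yet staged; use git diff --staged to see changes you have already added.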

Quick Overview

git status

These commands help you understand what changed, what’s staged, and what’s committed.

A Typical Git Workflow

Here’s how things usually flow:

Make changes to your code
Check status

git status

Add changes

git add .

Commit

git commit -m "Describe what you changed"

Push to GitHub

git push

Simple, repeatable, and powerful.

Final Thoughts

Git is not just a tool — it’s a safety net for your code. Once you get comfortable with pushing, pulling, and tracking changes, your confidence as a developer grows significantly.

If you’re just starting out, don’t rush mastery. Focus on understanding the basics and practicing regularly.

What’s Next?

            Practice! Practice! Practice!

Happy coding!

Setting Up a Secure Connection to Git Using SSH Keys on Ubuntu (Beginner’s Guide)

ClintWithK — Fri, 16 Jan 2026 05:34:21 +0000

In your tech journey, you will definitely come across SSH multiple times, so it’s in your best interest to understand “the what” and “the why.”

SSH (Secure Shell) is a protocol for communicating securely with remote computers and servers, usually from the terminal. It is commonly preferred because it encrypts all traffic and, with key-based authentication, is very hard to compromise without the correct private key.

SSH is widely used for:

Secure file transfers
System administration
Remote server access
Authenticating Git operations (pushing and pulling code)

In this guide, we’ll focus on using SSH keys to securely connect your Ubuntu machine to GitHub, allowing you to push and pull code without repeatedly entering your username and password.

Prerequisites

Before you begin, ensure you have the following:

A PC running Ubuntu
Git installed 👉 Refer to my previous article: https://dev.to/k1int/installing-and-setting-up-git-on-ubuntu-beginners-guide-5bk0
SSH installed
Basic terminal knowledge

Open your terminal using Ctrl + Alt + T.

Step 1: Confirm SSH Installation

Run the command below:

ssh -V

If SSH is installed, you should see output similar to:

OpenSSH_9.6p1 Ubuntu-3ubuntu13.14, OpenSSL 3.0.13 30 Jan 2024

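If SSH Is Not Installed

Ubuntu normally ships with the SSH client. If it's missing, install it using:

sudo apt update
sudo apt install openssh-client

The openssh-server package is only needed if you want other machines to connect to yours over SSH; connecting to GitHub requires just the client. If you do install the server, check that it's running:

sudo systemctl status ssh

And enable it to start on boot:

sudo systemctl enable ssh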

Step 2: Check for Existing SSH Keys

Run:

ls -a ~/.ssh

If keys already exist, you should see something similar to:

.  ..  authorized_keys  id_ed25519  id_ed25519.pub  known_hosts  known_hosts.old

id_ed25519 → Private key (keep this secret ❗)

id_ed25519.pub → Public key (shared with GitHub)

Step 3: Generate an SSH Key (If You Don’t Have One)

If no keys are found (or ~/.ssh doesn't exist yet), generate one:

ssh-keygen -t ed25519 -C "your_email@example.com"

When prompted:

Press Enter to accept the default location

Optionally add a passphrase for extra security
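You can also run ssh-keygen non-interactively to see exactly which files it produces. This sketch writes into a throwaway directory so it never touches your real ~/.ssh. It uses -N "" (an empty passphrase) only to skip the prompt; for your real key, a passphrase is still worth considering.

```shell
# Throwaway directory instead of ~/.ssh (demo assumption)
keydir="$(mktemp -d)"

# Same key type and comment as above; -f chooses the file, -N "" sets no passphrase
ssh-keygen -q -t ed25519 -C "your_email@example.com" -f "$keydir/id_ed25519" -N ""

# Lists the private key (id_ed25519) and the public key (id_ed25519.pub)
ls "$keydir"

# The public key is a single line: key type, base64 key data, comment
cat "$keydir/id_ed25519.pub"

# Fingerprint of the key, handy for matching it against the keys listed on GitHub
ssh-keygen -l -f "$keydir/id_ed25519.pub"
```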

Step 4: Start the SSH Agent and Add Your Key

Start the SSH agent:

eval "$(ssh-agent -s)"

Add your SSH private key:

ssh-add ~/.ssh/id_ed25519

Confirm the key is added:

ssh-add -l
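The agent steps can be tried end to end with a disposable key, ideally in a separate terminal. Running eval "$(ssh-agent -s)" starts a new agent each time and points the current shell at it. The sketch assumes a throwaway key and stops its own agent at the end.

```shell
# Disposable key so the demo doesn't touch ~/.ssh (demo assumption)
keydir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -C "demo@example.com" -f "$keydir/id_ed25519" -N ""

# Start an agent; eval exports SSH_AUTH_SOCK and SSH_AGENT_PID into this shell
eval "$(ssh-agent -s)"

# Load the key (no prompt here because the demo key has no passphrase)
ssh-add "$keydir/id_ed25519"

# One line per loaded key: size, fingerprint, comment, and type (ED25519)
ssh-add -l

# Keep the listing, then shut down the demo agent
loaded_keys="$(ssh-add -l)"
eval "$(ssh-agent -k)"
```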

Step 5: Add the SSH Key to Your GitHub Account

Copy Your Public Key

cat ~/.ssh/id_ed25519.pub

Copy the entire output.

Add the Key on GitHub

- Log in to GitHub

- Click your profile picture → Settings

- Navigate to SSH and GPG keys

- Click New SSH key

Fill in:

- Title: e.g. Ubuntu Laptop

- Key: Paste the copied public key

Then click Add SSH key.

Step 6: Test the SSH Connection

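Verify the setup (the first time you connect, SSH may ask you to confirm GitHub’s host fingerprint; type yes to continue):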

ssh -T git@github.com

If successful, you should see:

Hi username! You've successfully authenticated, but GitHub does not provide shell access.

Congratulations! Your Ubuntu machine is now securely connected to GitHub using SSH.

Conclusion

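- SSH keys are a secure alternative to HTTPS authentication with a password or personal access token

- Once your key is loaded into the SSH agent, you don’t need to enter credentials when pushing or pulling code

- This setup is standard in professional Git workflows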

What’s Next?

In the next article, we’ll walk through:

- Initializing a Git repository

- Making commits

- Pushing and pulling changes from GitHub using SSH

                            Happy Coding!

Installing and Setting Up Git on Ubuntu (Beginner’s Guide)

ClintWithK — Wed, 14 Jan 2026 21:11:17 +0000

When starting your journey in software development, one of the first tools you’ll encounter is Git. Git is a distributed version control system that helps developers track changes in their code, collaborate with others, and manage projects efficiently.

Most developers store their projects in repositories (commonly called repos) on platforms such as GitHub. Git allows you to interact with these repositories directly from your local machine — pushing changes, pulling updates, and managing versions of your code.

In this guide, we’ll walk through how to install Git on Ubuntu and perform the basic configuration needed to get started.

Prerequisites

Before you begin, make sure you have:

A PC running Ubuntu (or another Debian-based Linux distribution)
A GitHub account --> If you don’t have one, sign up at: https://github.com/

Step 1: Update Your System Packages

Open your terminal using Ctrl + Alt + T, then update your package list:

sudo apt update

Step 2: Install Git

Install Git by running:

sudo apt install git

Press Y when prompted to confirm the installation.

Step 3: Verify the Installation

Once the installation is complete, confirm that Git was installed successfully:

git --version

If Git is installed correctly, you’ll see output similar to:

git version 2.x.x

If you see the version number, you’re good to proceed.

Step 4: Configure Your Git Username

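Git records a name with each commit to identify its author. Many developers use their GitHub username here, though it doesn’t have to match: GitHub links commits to your account by the commit email, not by this name.

Run the command below and replace your-github-username with the name you want attached to your commits (for example, your GitHub username):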

git config --global user.name "your-github-username"

Step 5: Configure Your Git Email

Next, set the email address associated with your GitHub account. This email will be attached to your commits.

Replace your-email@example.com with the email you used when signing up on GitHub:

git config --global user.email "your-email@example.com"

Step 6: Confirm Your Git Configuration

To verify that your username and email were set correctly, run:

git config --list

You should see output that includes both your user.name and user.email.

  user.email=your-email@example.com
  user.name=your-github-username
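To read back a single value, or to see which file a setting lives in, git config has --get and --show-origin. This sketch avoids overwriting your real ~/.gitconfig. It assumes a hypothetical git_sandbox helper that redirects every place Git looks for global config into a throwaway folder. On your own machine you'd just run git config --global --get user.name directly.

```shell
# Throwaway "home" so the demo never edits your real global config
demo_home="$(mktemp -d)"

# Hypothetical helper: run git with its global config pointed at demo_home
git_sandbox() {
  HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config" \
    GIT_CONFIG_GLOBAL="$demo_home/.gitconfig" git "$@"
}

git_sandbox config --global user.name "your-github-username"
git_sandbox config --global user.email "your-email@example.com"

# Read back one value instead of scanning the whole list
git_sandbox config --global --get user.name

# --show-origin prefixes each setting with the file it came from
git_sandbox config --global --list --show-origin
```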

What’s Next?

At this point, Git is successfully installed and configured on your machine.
However, to fully interact with GitHub (especially pushing and pulling code), it’s recommended to set up SSH authentication.

In the next article, we’ll cover:

Generating an SSH key
Adding it to GitHub
Connecting securely without typing your password every time

Conclusion

Installing and configuring Git is an essential first step for any developer. With Git set up on your Ubuntu system, you’re now ready to start managing projects, collaborating with others, and contributing to open-source software.

Happy coding!!!