Forem: Edwin Omondi

Demystifying SQL Joins & Window Functions

Edwin Omondi — Sat, 14 Mar 2026 01:55:09 +0000

1. SQL Joins

The Structured Query Language (SQL) Join is a clause that combines records from two or more tables in a database. It is a means of combining data fields from two tables by using values common to each table.

If you work with databases, you'll likely need to use SQL Joins to retrieve data from multiple tables at some point in your work. These clauses let you pull related information from separate tables into one result, so you have the data you need to make a decision.

What is a Join in SQL?

In Structured Query Language (SQL), a Join is used to connect rows from two or more tables within a relational database. As their name suggests, relational databases organize data based on pre-established relationships, which define how data contained in one table relates to data contained within another (or several others).

Fig.1.0

The Join clause retrieves data from related tables in a database. Because it retrieves data from multiple tables, however, the SQL Join clause is more complex than a simple query that retrieves data from a single table.

Types of SQL Joins

There are many different use cases for SQL Joins, and they are crucial when mapping out relationships between tables in your database.

There are four primary types of SQL Joins that you can use: Inner Join, Left Outer Join, Right Outer Join, and Full Outer Join. Explore these four types of JOINs along with some sample SQL Join clauses below:

a) Inner Join

Inner Joins combine two tables based on a shared key. For example, if you have a table with a column called "user id" and each user id is unique to each user, you could join that table to another table with a "user id" column to find the information associated with each user. This example shows how to use an Inner JOIN clause to join two tables.

Fig 1.1

An Inner Join returns only the rows that have matching values in both tables.

SELECT c.first_name, c.last_name
FROM assignment.customers c
JOIN assignment.sales s
ON c.customer_id = s.customer_id;

Fig 1.2

b) Left Outer Join

Left Outer Joins return all rows from the first table and only the rows in the second table that match.

Fig 1.3

A Left Join returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.
For example, we would use the previous syntax that joins the ‘film’ and ‘film_category’ tables. However, this time, we would make two changes. First, we would use a left join instead of an inner join. Second, we would exclude the first three rows from the ‘film_category’ table.

Fig 1.4

The result shows NULL values in the first three category_id entries. This is because the first three rows in the ‘film_category’ table are excluded before the left join occurs, so those films have no matching row on the right side.
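The behaviour described above can be reproduced with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database; the table names follow the text, but the rows, titles, and category ids are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B'), (3, 'Film C'),
                            (4, 'Film D'), (5, 'Film E');
    -- Rows for films 1-3 are deliberately missing from film_category.
    INSERT INTO film_category VALUES (4, 10), (5, 11);
""")

# LEFT JOIN keeps every film; unmatched films get NULL (None in Python).
left_rows = conn.execute("""
    SELECT f.film_id, f.title, fc.category_id
    FROM film f
    LEFT JOIN film_category fc ON f.film_id = fc.film_id
    ORDER BY f.film_id
""").fetchall()

# INNER JOIN keeps only the films that have a category row.
inner_rows = conn.execute("""
    SELECT f.film_id, f.title, fc.category_id
    FROM film f
    JOIN film_category fc ON f.film_id = fc.film_id
    ORDER BY f.film_id
""").fetchall()

for row in left_rows:
    print(row)
```

Comparing `left_rows` with `inner_rows` shows the difference between the two join types: the left join returns all five films, the inner join only the two that have a category.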

c) Right Outer Join

Right Joins are logically the opposite of Left Joins: they return all rows from the second table, and only the rows in the first table that match.

Fig 1.5

A RIGHT JOIN returns all rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.
For example, we would reuse the previous syntax, but this time with a right join instead of a left join.

d) Full Join

Full Joins combine both left and right joins by returning all rows from both tables, whether or not they have a match. Where a row has no match in the other table, the columns from that table are filled with NULL values.
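As a rough illustration, the sketch below builds the result of a full join between hypothetical `customers` and `sales` tables. Because older SQLite builds do not support `FULL OUTER JOIN`, it swaps in a well-known emulation: a left join plus the unmatched rows from the other side. On databases that support it, `FULL OUTER JOIN` would express the same thing directly. All rows and names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
    CREATE TABLE sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Chao');
    -- Sale 102 points at a customer_id that has no row in customers.
    INSERT INTO sales VALUES (100, 1, 50.0), (101, 1, 20.0), (102, 9, 75.0);
""")

full_rows = conn.execute("""
    -- Part 1: every customer, with sales where they exist.
    SELECT c.customer_id, c.first_name, s.sale_id, s.amount
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    UNION ALL
    -- Part 2: sales that matched no customer.
    SELECT c.customer_id, c.first_name, s.sale_id, s.amount
    FROM sales s
    LEFT JOIN customers c ON c.customer_id = s.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

for row in full_rows:
    print(row)
```

Customers without sales and the sale without a customer both appear in the result, padded with NULL (`None`) on the side that has no match.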

2. Window Functions

Window functions, also known as windowing or analytic functions, are a category of functions in SQL that perform calculations across a specified range of rows related to the current row within a result set. These functions operate on a “window” of rows defined by an OVER() clause. Inside OVER(), PARTITION BY splits the rows into groups and ORDER BY sets the order in which the window is processed. Unlike a plain aggregate, a window function does not collapse rows: every input row stays in the output.
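A minimal sketch of the idea, using Python's `sqlite3` module with a hypothetical `sales` table. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The same `SUM()` is used twice: once as a running total and once as a per-region total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_month INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 1, 100), ('East', 2, 150), ('East', 3, 50),
        ('West', 1, 200), ('West', 2, 100);
""")

window_rows = conn.execute("""
    SELECT region, sale_month, amount,
           -- Running total within each region, in month order.
           SUM(amount) OVER (PARTITION BY region ORDER BY sale_month) AS running_total,
           -- Total for the whole region, repeated on every row.
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, sale_month
""").fetchall()

for row in window_rows:
    print(row)
```

All five input rows come back, each with its extra calculated columns. A `GROUP BY region` query would instead return one row per region.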

Now, let’s look at the aggregate functions that are commonly used on their own or, combined with OVER(), as window functions.

SQL Aggregate functions

SQL Aggregate Functions allow summarizing large sets of data into meaningful results, making it easier to analyze patterns and trends across many records. They return a single output value after processing multiple rows in a table.

Perform calculations like totals, averages, minimum or maximum values on data.
Ignore NULL values in most functions except COUNT(*), improving result accuracy.
Work with clauses such as GROUP BY, HAVING and ORDER BY for analysis.
Example: First, we create a demo SQL database and table, on which we use the Aggregate functions.

Aggregate Functions in SQL

Below are the most frequently used aggregate functions in SQL.

a). COUNT()

It is used to count the number of rows in a table. It helps summarize data by giving the total number of entries. It can be used in different ways depending on what you want to count:

COUNT(*): Counts all rows.
COUNT(column_name): Counts non-NULL values in the specified column.
COUNT(DISTINCT column_name): Counts unique non-NULL values in the column.
Query:

COUNT(*) returns the total number of rows in the table, including rows with NULL values.
COUNT(Salary) counts only the rows where Salary is not NULL.
COUNT(DISTINCT Salary) counts unique non-NULL salary values, ignoring duplicates.
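The three forms of COUNT can be compared side by side. This sketch uses Python's `sqlite3` module and a hypothetical `Employee` table with one NULL salary and one duplicate salary; the names and figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Salary INTEGER);
    INSERT INTO Employee VALUES
        ('Ann', 50000), ('Ben', 60000), ('Cy', 50000), ('Di', NULL);
""")

total_rows, salaries, distinct_salaries = conn.execute("""
    SELECT COUNT(*),               -- all rows, NULLs included
           COUNT(Salary),          -- rows where Salary is not NULL
           COUNT(DISTINCT Salary)  -- unique non-NULL salaries
    FROM Employee
""").fetchone()

print(total_rows, salaries, distinct_salaries)  # 4 3 2
```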

b). SUM()

It is used to calculate the total of a numeric column by adding up all non-NULL values in that column. For example, SUM(column_name) returns the sum of all non-NULL values in the specified column.

SUM(Salary) adds all non-NULL salary values to get the total salary amount.
SUM(DISTINCT Salary) adds only unique non-NULL salary values, avoiding duplicates.
NULL values are ignored in both SUM calculations.

c). AVG()

It is used to calculate the average value of a numeric column. It divides the sum of all non-NULL values by the number of non-NULL rows. For example, AVG(column_name) returns the average of all non-NULL values in the specified column.

AVG(Salary) calculates the average of all non-NULL salary values.
AVG(DISTINCT Salary) computes the average only from unique non-NULL salary values.
Both ignore NULL values when performing the calculation.

d). MIN() and MAX()

The MIN() and MAX() functions return the smallest and largest values, respectively, from a column.

MAX(Salary) returns the highest non-NULL salary value from the Employee table.
MIN(Salary) returns the lowest non-NULL salary value from the Employee table.
Both functions ignore NULL values while determining the result.
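The SUM, AVG, MIN and MAX behaviour described above can be checked with one query. This is a sketch using Python's `sqlite3` module and a hypothetical `Employee` table; the rows are invented so that one salary is NULL and one is duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Salary INTEGER);
    INSERT INTO Employee VALUES
        ('Ann', 50000), ('Ben', 60000), ('Cy', 50000), ('Di', NULL);
""")

row = conn.execute("""
    SELECT SUM(Salary), SUM(DISTINCT Salary),
           AVG(Salary), AVG(DISTINCT Salary),
           MIN(Salary), MAX(Salary)
    FROM Employee
""").fetchone()
total, distinct_total, average, distinct_average, lowest, highest = row

# The NULL salary is skipped everywhere: AVG divides by 3 rows, not 4.
print(total, distinct_total)       # 160000 110000
print(average, distinct_average)   # about 53333.33, and 55000.0
print(lowest, highest)             # 50000 60000
```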

How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI

Edwin Omondi — Mon, 09 Feb 2026 19:59:56 +0000

What is Microsoft Power BI?

Microsoft Power BI is a data visualization platform primarily for business intelligence purposes.

Power BI stands for Power Business Intelligence and refers to a collection of software services, tools, and connectors that help you transform data from multiple sources into actionable insights.

Fig. 1.0 Power BI Interface

Fig. 1.1 Power BI Blank Report

What is Power BI used for?

Data Visualization & reporting

Create reports and dashboards that present data sets in multiple ways using visuals

Turn data into a wide range of different visuals, including pie charts, decomposition trees, gauge charts, KPIs, combo charts, bar and column charts, and ribbon charts.

Data Integration

Connect various data sources, such as Excel sheets, on-site data warehouses, and cloud-based data storage, and then transform them into business insights

Integrate Power BI with websites

Business Intelligence

Track key performance indicators (KPIs) and metrics in real time.

Use built-in AI and machine learning to make business predictions based on historical data

Collaboration & Sharing

Provide company-wide access to data, data visualization tools, and insights to create a data-driven work culture

Collaborate on workspaces and shared datasets

Financial Analysis

Create financial statements and balance sheets

Analyze sales performance and profit

Marketing & Sales

Integrate Power BI with the CRM system to analyze customer data and use insights to improve customer experience

Analyze market trends and customer behavior to discover opportunities.

Step-by-Step Guide on How Analysts Transform Messy Data into Real Business Action

Step 1: Understanding the Business Question:

What problem are we trying to solve?
What decisions need to be made?

Examples:
Why are sales dropping?
Which region is underperforming?
Are costs growing faster than revenue?

This step is crucial. Without a clear question, dashboards are just pretty charts.

Step 2: Bring messy data into Power BI:

Data usually comes from many places: Excel, databases, and online systems such as CRM and ERP.

In Power BI, analysts load all the data together, then check for missing, duplicate, or incorrect values.

Step 3: Open Power Query - Where Cleaning Happens

Why this step matters
Power Query is where analysts define the cleaning steps once, so the same steps are reapplied on every data refresh and reports stay clean.

Power BI clicks

Click Transform Data

Power Query Editor opens

Typical cleaning actions

Remove duplicates

Select all columns → Home → Remove Rows → Remove Duplicates

Fix data types

Click column header → choose Date / Whole Number / Decimal

Handle missing values

Replace with “Unknown” or infer logically

Fix obvious errors

Flag negative prices

Cap extreme discounts
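The cleaning actions above are done through the Power Query interface, but the logic behind them can be sketched in a few lines of plain Python. This is only an analogy, not Power Query code: the column names, the sample rows, and the 50% discount cap are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical raw export: everything arrives as text, with a duplicate row,
# a missing region, a negative price and an extreme discount.
raw_rows = [
    {"order_id": "1", "order_date": "2026-01-05", "region": "East",
     "price": "20.0", "discount": "0.10"},
    {"order_id": "1", "order_date": "2026-01-05", "region": "East",
     "price": "20.0", "discount": "0.10"},
    {"order_id": "2", "order_date": "2026-01-06", "region": "",
     "price": "-5.0", "discount": "0.95"},
]

MAX_DISCOUNT = 0.50  # assumed business rule for capping discounts


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:          # remove duplicates (all columns identical)
            continue
        seen.add(key)
        price = float(row["price"])
        cleaned.append({
            "order_id": int(row["order_id"]),                  # fix data types
            "order_date": date.fromisoformat(row["order_date"]),
            "region": row["region"] or "Unknown",              # handle missing values
            "price": price,
            "price_is_negative": price < 0,                    # flag, do not delete
            "discount": min(float(row["discount"]), MAX_DISCOUNT),  # cap extremes
        })
    return cleaned


sales_staging = clean(raw_rows)
for row in sales_staging:
    print(row)
```

Flagging the negative price instead of silently dropping the row keeps the problem visible, so someone can decide whether it is a refund or an error.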

Step 4: Create a Staging Table - Clean Base

What this means
A staging table is just a cleaned version of raw data.

Why do analysts do this

Protects original data

Makes future refreshes safe

Avoids breaking dashboards later

Power BI action

Rename query to something like Sales_Staging

Apply all cleaning steps

Click Close & Apply

Step 5: Add Calculated Columns - Row-Level Logic

Now analysts add meaning row by row.

Examples: Revenue, Cost, Profit, Lead time

Power BI clicks

Go to Data View

Click New Column

Step 6: Build a Clean Data Model

What analysts check

Are tables connected correctly?

Do relationships make sense?

Power BI clicks

Go to Model View

Create relationships:

Date → Sales

Product → Sales

Region → Sales

Step 7: Write DAX Measures

This is where analysis becomes dynamic.

Why measures matter
They change automatically when you filter by:

Date, Region, Product, Channel
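To make the idea of a filter-aware measure concrete, here is a small Python analogy. It is not DAX: the `apply_filters` helper, the measure functions, and the sample rows are hypothetical. The point is that a measure is a formula evaluated over whatever rows the current filters leave visible.

```python
# Hypothetical sales rows.
sales = [
    {"region": "East", "product": "Widget", "revenue": 100.0, "cost": 60.0},
    {"region": "East", "product": "Gadget", "revenue": 200.0, "cost": 150.0},
    {"region": "West", "product": "Widget", "revenue": 300.0, "cost": 180.0},
]


def apply_filters(rows, **filters):
    """Keep only the rows matching every filter, like slicer selections."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def total_revenue(rows):
    """Measure: recalculated over whichever rows are passed in."""
    return sum(r["revenue"] for r in rows)


def total_profit(rows):
    return sum(r["revenue"] - r["cost"] for r in rows)


print(total_revenue(sales))                                   # 600.0
print(total_revenue(apply_filters(sales, region="East")))     # 300.0
print(total_profit(apply_filters(sales, product="Widget")))   # 160.0
```

The formula never changes; only the set of rows it sees does. That is why one measure can serve every visual and slicer combination.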

Step 8: Turn Measures Into Visuals

Now you build visuals with purpose.

Example 1: Is revenue growing?

Line chart

X-axis → Month

Values → Total Revenue

Example 2: Who performs better?

Bar chart

Axis → Region

Values → Total Profit

Example 3: Are discounts hurting margins?

Scatter chart

X → Discount %

Y → Margin %

Step 9: Add Slicers

Why slicers matter
They allow users to ask their own questions.

Power BI clicks

Select Slicer

Drag fields like:

Region

Date

Product Category

Salesperson

Conclusion:

Big Picture

Messy Data → Clean Data → DAX → Dashboard → Decision

That’s the analyst workflow.

Schemas & Data Modelling in Power BI

Edwin Omondi — Sun, 01 Feb 2026 20:59:38 +0000

![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4em6iuxplrbtrwmusi74.jpg)

Data Modelling in Power BI

Data modelling is the process of structuring your tables and defining relationships so Power BI can:

Aggregate data correctly

Filter data efficiently

Produce accurate measures

Perform well, even with large datasets

Think of it as designing the blueprint of your report before decorating it with visuals.

Star Schema Overview

Star schema is a mature modeling approach widely adopted by relational data warehouses.

It requires modelers to classify their model tables as either dimension or fact.

Dimension Tables

Dimension tables describe business entities—the things you model. Entities can include products, people, places, and concepts, including time itself.
The most consistent table in a star schema is the date dimension table. A dimension table contains a key column (or columns) that acts as a unique identifier, and other columns. Other columns support filtering and grouping your data.

Fact Tables

Fact tables store observations or events, and can be sales orders, stock balances, exchange rates, temperatures, and more.
A fact table contains dimension key columns that relate to dimension tables and numeric measure columns.
The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table. For example, consider a fact table designed to store sales targets that has two dimension key columns, Date and ProductKey.
It's easy to understand that the table has two dimensions. The granularity, however, can't be determined without considering the dimension key values.
In this example, consider that the values stored in the Date column are the first day of each month.
In this case, the granularity is at the month-product level.

Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can contain many rows and continue to grow over time.

Fig. 1.1

Normalization vs. denormalization

To understand some star schema concepts described in this article, it's important to know two terms: normalization and denormalization.

Normalization is the term used to describe data that's stored in a way that reduces repetitious data. Consider a table of products that has a unique key value column, like the product key, and other columns that describe product characteristics, like product name, category, color, and size. A sales table is considered normalized when it stores only keys, like the product key. In the following image, notice that only the ProductKey column records the product.

Fig. 1.2

If, however, the sales table stores product details beyond the key, it's considered denormalized. In the following image, notice that the ProductKey and other product-related columns record the product.

Fig. 1.3

When you source data from an export file or data extract, it's likely that it represents a denormalized set of data. In this case, use Power Query to transform and shape the source data into multiple normalized tables.
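In Power BI this reshaping is done in Power Query, but the transformation itself is simple to sketch. The Python below is an analogy only, with made-up column names and rows: it splits a denormalized export into a product dimension and a sales fact table that keeps just the key.

```python
# Hypothetical denormalized export: product details repeat on every sale.
denormalized_sales = [
    {"order_id": 1, "product_key": "P1", "product_name": "Bike",
     "category": "Bikes", "quantity": 2},
    {"order_id": 2, "product_key": "P2", "product_name": "Helmet",
     "category": "Accessories", "quantity": 1},
    {"order_id": 3, "product_key": "P1", "product_name": "Bike",
     "category": "Bikes", "quantity": 1},
]

product_columns = ("product_key", "product_name", "category")

product_dim = {}   # one row per product, keyed by product_key
sales_fact = []    # one row per sale, storing only the key and the measure

for row in denormalized_sales:
    product_dim[row["product_key"]] = {c: row[c] for c in product_columns}
    sales_fact.append({
        "order_id": row["order_id"],
        "product_key": row["product_key"],
        "quantity": row["quantity"],
    })

print(list(product_dim.values()))
print(sales_fact)
```

The product name and category are now stored once per product instead of once per sale, and the fact rows can still reach them through `product_key`.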

As described in this article, you should strive to develop optimized Power BI semantic models with tables that represent normalized fact and dimension data. However, there's one exception where a snowflake dimension might be denormalized in order to produce a single model table.

Star schema relevance to Power BI semantic models

Star schema design and many related concepts introduced in this article are highly relevant to developing Power BI models that are optimized for performance and usability.

Consider that each Power BI report visual generates a query that's sent to the Power BI semantic model. Generally, queries filter, group, and summarize model data. A well-designed model, then, is one that provides tables for filtering and grouping, and tables for summarizing. This design fits well with star schema principles:

Dimension tables enable filtering and grouping.

Fact tables enable summarization.
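The split of roles can be seen in the kind of query a visual might produce. The sketch below uses Python's `sqlite3` module with hypothetical `product` (dimension) and `sales` (fact) tables; it is not the query Power BI actually generates, only an SQL equivalent of the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per product, used for grouping and filtering.
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, category TEXT);
    -- Fact: many rows per product, holding the numbers to summarize.
    CREATE TABLE sales (product_key INTEGER, sales_amount INTEGER);
    INSERT INTO product VALUES (1, 'Bikes'), (2, 'Bikes'), (3, 'Accessories');
    INSERT INTO sales VALUES (1, 500), (2, 300), (3, 40), (3, 60), (1, 200);
""")

totals = conn.execute("""
    SELECT p.category, SUM(s.sales_amount) AS total_sales
    FROM sales s
    JOIN product p ON p.product_key = s.product_key
    GROUP BY p.category        -- grouping comes from the dimension
    ORDER BY p.category
""").fetchall()

print(totals)  # [('Accessories', 100), ('Bikes', 1000)]
```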

There's no table property that modelers set to configure the table type as dimension or fact. It's in fact determined by the model relationships. A model relationship establishes a filter propagation path between two tables, and it's the cardinality property of the relationship that determines the table type. A common relationship cardinality is one-to-many or its inverse many-to-one. The "one" side is always a dimension table while the "many" side is always a fact table.

Fig. 1.4

A well-structured model design includes tables that are either dimension tables or fact tables. Avoid mixing the two types together for a single table. We also recommend that you strive to deliver the right number of tables with the right relationships in place. It's also important that fact tables always load data at a consistent grain.

Lastly, it's important to understand that optimal model design is part science and part art. Sometimes you can break with good guidance when it makes sense to do so.

Snowflake Dimension

A snowflake dimension is a set of normalized tables for a single business entity. For example, Adventure Works classifies products by category and subcategory. Products are assigned to subcategories, and subcategories are in turn assigned to categories. In the Adventure Works relational data warehouse, the product dimension is normalized and stored in three related tables.

Fig. 1.5
If you use your imagination, you can picture the normalized tables positioned outwards from the fact table, forming a snowflake design.

Fig. 1.6
In Power BI Desktop, you can choose to mimic a snowflake dimension design (perhaps because your source data does) or combine the source tables to form a single, denormalized model table. Generally, the benefits of a single model table outweigh the benefits of multiple model tables. The most optimal decision can depend on the volumes of data and the usability requirements for the model.

When you choose to mimic a snowflake dimension design:

Power BI loads more tables, which is less efficient from storage and performance perspectives. These tables must include columns to support model relationships, and it can result in a larger model size.
Longer relationship filter propagation chains need to be traversed, which might be less efficient than filters applied to a single table.
The Data pane presents more model tables to report authors, which can result in a less intuitive experience, especially when snowflake dimension tables contain only one or two columns.
It's not possible to create a hierarchy that comprises columns from more than one table.
When you choose to integrate into a single model table, you can also define a hierarchy that encompasses the highest and lowest grain of the dimension. Possibly, the storage of redundant denormalized data can result in increased model storage size, particularly for large dimension tables.
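Combining the three snowflake tables into one model table amounts to following the keys from product to subcategory to category. The Python sketch below illustrates that flattening step; the table contents and column names are hypothetical, and in Power BI the equivalent would be done with merge steps in Power Query.

```python
# Hypothetical snowflake tables for the product dimension.
categories = {1: "Bikes", 2: "Accessories"}
subcategories = {
    10: {"name": "Mountain Bikes", "category_id": 1},
    20: {"name": "Helmets", "category_id": 2},
}
products = [
    {"product_key": 100, "name": "Mountain-100", "subcategory_id": 10},
    {"product_key": 200, "name": "Sport Helmet", "subcategory_id": 20},
]

# Flatten into a single denormalized dimension table.
product_dim = []
for product in products:
    subcategory = subcategories[product["subcategory_id"]]
    product_dim.append({
        "product_key": product["product_key"],
        "product": product["name"],
        "subcategory": subcategory["name"],
        "category": categories[subcategory["category_id"]],
    })

for row in product_dim:
    print(row)
```

With category, subcategory, and product in one table, a Category > Subcategory > Product hierarchy can be defined on that single table.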

Fig 1.7

Introduction to MS Excel for Data Analytics

Edwin Omondi — Sun, 25 Jan 2026 20:27:58 +0000

Introduction

Microsoft Excel is one of the most used software applications of all time.
You can use Excel to enter all sorts of data and perform financial, mathematical, or statistical calculations.

Data Analysis Process:
Data analysis is the science of analyzing a particular set of data so that an organization can make informed decisions.

There are various types of Data Analytics.
a) Descriptive Analytics - looks at historical data to summarize what already happened (past data - establish trends and patterns)
b) Diagnostic Analytics - focuses on finding the reasons behind outcomes (causes and relationships - identify factors affecting results)
c) Predictive Analytics - uses existing data to forecast future outcomes (future possibilities -make informed decisions)
d) Prescriptive Analytics - recommends actions based on data insights (decision making - suggests best actions)

Data analysis involves the following processes:

Data Collection
Data Processing
Data Cleaning
Data Analysis
Data Communication

Tools used for Data Analysis:
Excel
Power BI
SQL
Python

Data Analysis in Excel:
a) Sort - You can sort your Excel data by one or multiple columns, in either ascending or descending order.
Sort By One Column:
To sort by one column in Excel, execute the following steps:
Click any cell in the column you want to sort.

Fig. 1 Sorting by One Column

To sort in ascending order, on the Data tab, in the Sort & Filter group, click AZ.

Fig. 2 Sorting by Ascending Order

b) Filter - Filter your Excel data to display only the records that meet certain criteria.

Click any single cell inside a data set.
On the Data tab, in the Sort & Filter group, click Filter.
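For readers who also use Python (listed among the analysis tools above), the same sort and filter operations can be sketched on a small list of records. The rows and column names below are made up; this is an analogy to what Excel does, not a replacement for the steps above.

```python
# Hypothetical data set, one dictionary per spreadsheet row.
rows = [
    {"name": "Ann", "department": "Sales", "salary": 52000},
    {"name": "Ben", "department": "IT", "salary": 61000},
    {"name": "Cy", "department": "Sales", "salary": 48000},
]

# Sort by one column, ascending (like clicking AZ) and descending (ZA).
sorted_rows = sorted(rows, key=lambda r: r["salary"])
sorted_desc = sorted(rows, key=lambda r: r["salary"], reverse=True)

# Filter: display only the records that meet a criterion.
sales_only = [r for r in rows if r["department"] == "Sales"]

print([r["name"] for r in sorted_rows])   # ['Cy', 'Ann', 'Ben']
print([r["name"] for r in sorted_desc])   # ['Ben', 'Ann', 'Cy']
print([r["name"] for r in sales_only])    # ['Ann', 'Cy']
```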

Fig. 3 Filter

c) Conditional Formatting - Use conditional formatting in Excel to automatically highlight cells based on their content. Apply a rule or use a formula to determine which cells to format.

Highlight Cells Rules

Fig. 4 Conditional Formatting

d) Pivot Tables - Pivot Tables are one of Excel's most powerful features. A pivot table allows you to extract the significance from a large, detailed data set.

Insert a Pivot Table - to insert a pivot table:
Click any single cell inside the data set.
On the Insert tab, in the Tables group, click PivotTable.
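What a pivot table computes can be sketched in Python: it groups the detailed rows by one field for the rows and another for the columns, then sums a value for each combination. The data and field names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed data set.
orders = [
    {"region": "East", "product": "Apple", "amount": 100},
    {"region": "East", "product": "Banana", "amount": 50},
    {"region": "West", "product": "Apple", "amount": 70},
    {"region": "East", "product": "Apple", "amount": 30},
]

# Rows = region, columns = product, values = sum of amount.
pivot = defaultdict(lambda: defaultdict(int))
for order in orders:
    pivot[order["region"]][order["product"]] += order["amount"]

for region, by_product in pivot.items():
    print(region, dict(by_product))
```

Four detailed rows collapse into three summary cells: East/Apple, East/Banana, and West/Apple.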

Fig. 5 Pivot Tables

e) Pivot Charts - this is one of the most powerful pivot table features Excel has to offer.

Fig. 6 Pivot Chart

Understanding Git for Beginners: Version Control, Tracking Changes, Push & Pull Explained

Edwin Omondi — Sat, 17 Jan 2026 21:14:02 +0000

What Is Version Control?

Version control is a system that helps you keep track of changes made to files over time.

Instead of manually saving multiple copies of a project with different names, version control allows you to:

Save different versions of your work

See what changed and when

Restore older versions if something goes wrong

Work confidently without fear of losing progress

In simple terms, version control acts like a history book for your project.

What Is Git?

Git is a version control system used by developers to manage changes in their projects.

Git works on your local computer and helps you:

Monitor file changes

Save progress in organized steps

Keep a clear record of your work

Experiment safely with new ideas

Every time you save your progress in Git, it creates a snapshot of your project. These snapshots allow you to move back and forth between different versions.

What Is GitHub and How Is It Related to Git?

GitHub is an online platform where Git projects are stored.

While Git works on your computer, GitHub:

Stores your project online

Acts as a backup for your work

Allows multiple people to work on the same project

Makes collaboration possible from anywhere in the world

Think of Git as the tool you use locally, and GitHub as the cloud where your work is shared and saved.

How Git Tracks Changes

Git does not automatically save everything you do. Instead, it follows a structured process that gives you full control.

There are three main stages in Git:

1. Working Area

This is where you make changes to your files. Nothing is saved yet.

2. Staging Area

Here, you choose which changes you want Git to remember. This step allows you to review your work before saving it.

3. Saved Version (Commit)

Once changes are saved, Git records them as a version in your project history.

This process helps prevent mistakes and keeps your project organized.
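The three stages can be pictured with a toy model. The `TinyRepo` class below is purely hypothetical and is not how Git is implemented; it only mimics the flow of editing a file, staging it (what `git add` does), and saving a snapshot (what `git commit` does).

```python
class TinyRepo:
    """Toy model of Git's three stages, for illustration only."""

    def __init__(self):
        self.working = {}   # working area: files as currently edited
        self.staged = {}    # staging area: changes chosen for the next commit
        self.commits = []   # saved versions, oldest first

    def edit(self, name, content):
        # Editing a file changes the working area only; nothing is saved yet.
        self.working[name] = content

    def add(self, name):
        # Comparable to `git add <file>`: pick a change to remember.
        self.staged[name] = self.working[name]

    def commit(self, message):
        # Comparable to `git commit -m "<message>"`: record a snapshot.
        previous = self.commits[-1]["files"] if self.commits else {}
        snapshot = {**previous, **self.staged}
        self.commits.append({"message": message, "files": snapshot})
        self.staged = {}


repo = TinyRepo()
repo.edit("notes.txt", "first draft")
repo.edit("scratch.txt", "not ready yet")
repo.add("notes.txt")            # only notes.txt is staged
repo.commit("Add notes")

print(repo.commits[-1]["files"])  # {'notes.txt': 'first draft'}
```

The unstaged `scratch.txt` stays in the working area and is left out of the commit, which is the control the staging step gives you.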

What Does “Tracking Changes” Mean?

Tracking changes means Git keeps a detailed record of:

What files were changed

What was added or removed

When the change happened

Who made the change

This makes it easy to:

Understand your project’s history

Find errors

Work with others without confusion

For beginners, this is one of Git’s most powerful features.

What Does “Push” Mean in Git?

Push means sending your saved work from your computer to GitHub.

When you push your work:

Your project is backed up online

Others can see your updates

Your changes become part of the shared project

You can think of pushing as uploading your work to the cloud.

What Does “Pull” Mean in Git?

Pull means bringing changes from GitHub down to your computer.

Pulling is useful when:

You are working on more than one device

Other people have updated the project

You want the latest version of the work

Pulling ensures your local project stays up to date.

Push vs Pull (Simple Explanation)

Push sends changes from your computer to GitHub

Pull brings changes from GitHub to your computer

Together, they keep everything synchronized.
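The direction of each operation can be shown with a deliberately simplified model in which a repository is just a list of commit names. The `push` and `pull` functions here are hypothetical stand-ins for `git push` and `git pull`; they ignore branches, merges, and conflicts, which real Git has to handle.

```python
def push(local, remote):
    """Send commits the remote does not have yet (direction: local -> remote)."""
    missing = [c for c in local if c not in remote]
    remote.extend(missing)


def pull(local, remote):
    """Fetch commits the local copy does not have yet (direction: remote -> local)."""
    missing = [c for c in remote if c not in local]
    local.extend(missing)


laptop = ["c1", "c2"]    # two commits made on the laptop
github = ["c1"]          # the online copy is one commit behind
desktop = ["c1"]         # a second device, also behind

push(laptop, github)     # upload the new commit
pull(desktop, github)    # download it on the other device

print(github)   # ['c1', 'c2']
print(desktop)  # ['c1', 'c2']
```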

Why Version Control Is Important for Beginners

Version control helps beginners by:

Preventing loss of work

Encouraging experimentation

Making mistakes easier to fix

Teaching professional development practices

Preparing you for real-world team projects

Even solo developers benefit greatly from using Git.

Common Beginner Misunderstandings

Git does not automatically save your work

Pushing and pulling are different actions

Git and GitHub are not the same thing

You do not need to be an expert to use Git

Understanding these early makes learning easier.

Final Insights:

Git may seem overwhelming at first, but the core ideas are simple:

Git tracks changes

Version control saves history

Push uploads your work

Pull updates your work

You don’t need to know everything at once. Learning Git step by step will make your development journey smoother and more professional.