Forem: Ruto Kipkirui Robert

Master SQL Joins and Window Functions

Ruto Kipkirui Robert — Mon, 02 Mar 2026 13:09:03 +0000

Introduction
SQL joins and window functions are central to querying relational databases. They are two distinct yet powerful mechanisms that support data analysis and let data scientists retrieve the data they need efficiently. SQL joins combine two or more tables based on logical relationships between shared columns, such as primary key to foreign key relationships. A window function, in contrast, performs a calculation across a group of related rows while keeping each individual row visible in the result.
SQL Joins
Joins are SQL clauses that combine rows from one or more tables based on a related column.
SQL joins are crucial because they help:
a) Retrieve connected data stored across multiple tables.
b) Match table records based on common columns.
c) Improve data analysis by combining related information.
d) Create meaningful result sets from separate tables.
Types of SQL Joins
INNER JOIN: Returns only rows that have matching values in both tables. It helps in:
a) Returning only matching rows from both tables.
b) Excluding non-matching data from the result set.
c) Ensuring accurate data relationships between tables.
Syntax:
SELECT table1.column1, table1.column2, table2.column1, ... FROM table1 INNER JOIN table2 ON table1.matching_column = table2.matching_column;

LEFT (OUTER) JOIN: Returns all rows from the left table, and only the matched rows from the right table. It helps in:
a) Returning all records from the left table.
b) Showing matching data from the right table.
c) Displaying NULL values where no match exists in the right table.
Syntax:
SELECT table1.column1, table1.column2, table2.column1, ... FROM table1 LEFT JOIN table2 ON table1.matching_column = table2.matching_column;

RIGHT (OUTER) JOIN: Returns all rows from the right table, and only the matched rows from the left table. It helps in:
a) Returning all records from the right-side table.
b) Showing matching data from the left-side table.
c) Displaying NULL values where no match exists in the left table.
Syntax:
SELECT table1.column1, table1.column2, table2.column1, ... FROM table1 RIGHT JOIN table2 ON table1.matching_column = table2.matching_column;

FULL (OUTER) JOIN: Returns all rows from both tables, pairing them where a match exists and filling in NULL values where it does not. It helps in:
a) Returning all rows from both tables.
b) Showing matching records from each table.
c) Displaying NULL values where no match exists in either table.
Syntax:
SELECT table1.column1, table1.column2, table2.column1, ... FROM table1 FULL JOIN table2 ON table1.matching_column = table2.matching_column;
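To see how INNER JOIN and LEFT JOIN differ in practice, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their rows are made up for illustration. RIGHT JOIN and FULL JOIN follow the same pattern, but SQLite only supports them from version 3.39, so they are left out of this sketch.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Chebet');
    INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 100.0), (103, 2, 75.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer, with NULL where no order matches.
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner_rows)  # [('Amina', 101), ('Amina', 102), ('Brian', 103)]
print(left_rows)   # same rows plus ('Chebet', None)
```

Chebet has no orders, so she appears only in the LEFT JOIN result, with NULL (None in Python) in the order column.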

Window Functions
A window function is used to perform a calculation across a specific set of rows (the 'window' in question), defined by an OVER() clause.
Syntax
SELECT column_1, column_2, column_3, function() OVER (PARTITION BY partition_expression ORDER BY order_expression) AS output_column_name FROM table_name;
In this syntax:
• SELECT: defines the columns to be selected from the table_name table.
• function(): the window function applied.
• OVER: defines the partitioning and ordering of rows in the window.
• PARTITION BY: divides rows into partitions based on the specified partition_expression; if PARTITION BY is omitted, the whole result set is treated as a single partition.
• ORDER BY: defines the order in which rows are processed within each partition; if ORDER BY is omitted, rows are processed in an undefined order.
• Finally, output_column_name is the name of your output column.
_N.B. Window functions are evaluated after the WHERE, GROUP BY, and HAVING clauses have been processed._
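The syntax above can be sketched with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sales table and its values are hypothetical. The query computes a running total per region while every original row stays visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_month INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 1, 100), ('East', 2, 150), ('East', 3, 50),
        ('West', 1, 200), ('West', 2, 100);
""")

# PARTITION BY restarts the total for each region;
# ORDER BY makes the sum accumulate month by month.
running_totals = conn.execute("""
    SELECT region, sale_month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY sale_month) AS running_total
    FROM sales
    ORDER BY region, sale_month
""").fetchall()

for row in running_totals:
    print(row)
```

Unlike GROUP BY, which would collapse the table to one row per region, the result here still has all five rows, each carrying its own running total.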

Types of SQL window functions

Aggregate window functions

a) AVG() returns the average of the values in a group, ignoring null values.
b) MAX() returns the maximum value in the expression.
c) MIN() returns the minimum value in the expression.
d) SUM() returns the sum of all the values, or only the DISTINCT values, in the expression.
e) COUNT() returns the number of items found in a group.
f) STDEV() returns the sample standard deviation of all values in the specified expression.
g) STDEVP() returns the population standard deviation of all values in the specified expression.
h) VAR() returns the sample variance of all values in the specified expression.
i) VARP() returns the population variance of all values in the specified expression.
N.B. STDEV, STDEVP, VAR, and VARP are the SQL Server (T-SQL) names; other databases such as PostgreSQL use STDDEV_SAMP(), STDDEV_POP(), VAR_SAMP(), and VAR_POP().

Ranking window functions

a) ROW_NUMBER() assigns a unique sequential integer to rows within a partition of a result set.
b) RANK() assigns a rank to each row within a partition; tied rows share the same rank, and the ranks that would have followed are skipped, leaving gaps in the sequence.
c) DENSE_RANK() assigns a rank to each row within a partition; tied rows share the same rank, and the next rank follows immediately, with no gaps.
d) PERCENT_RANK() calculates the relative rank of a row within a group of rows.
e) NTILE() distributes rows in an ordered partition into a specified number of approximately equal groups.
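The difference between ROW_NUMBER(), RANK(), and DENSE_RANK() is easiest to see when there are ties. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) and a hypothetical scores table in which two people share the same score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('Ann', 90), ('Ben', 85), ('Cat', 85), ('Dan', 70);
""")

# Ben and Cat tie on 85. The extra 'name' sort key in ROW_NUMBER()
# only makes the numbering deterministic for this example.
ranked = conn.execute("""
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()

for row in ranked:
    print(row)
# ('Ann', 90, 1, 1, 1)
# ('Ben', 85, 2, 2, 2)
# ('Cat', 85, 3, 2, 2)
# ('Dan', 70, 4, 4, 3)
```

ROW_NUMBER() never repeats a number, RANK() jumps from 2 to 4 after the tie, and DENSE_RANK() continues with 3.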

Value window functions

a) LAG() retrieves values from rows that precede the current row in the result set.
b) LEAD() retrieves values from rows that follow the current row in the result set.
c) FIRST_VALUE() returns the first value in an ordered set of values within a partition.
d) LAST_VALUE() returns the last value in an ordered set of values within a partition.
e) NTH_VALUE() returns the value of the nth row in the ordered set of values.
f) CUME_DIST() returns the cumulative distribution of a value in a group of values.
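As a small illustration of the value functions, the sketch below uses LAG() and LEAD() to compare each month with its neighbours. It runs through Python's sqlite3 module (SQLite 3.25 or later); the monthly_sales table and its figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (sale_month INTEGER, amount INTEGER);
    INSERT INTO monthly_sales VALUES (1, 100), (2, 150), (3, 120);
""")

# LAG looks one row back, LEAD one row ahead; both return NULL
# when there is no such row (first and last month here).
comparison = conn.execute("""
    SELECT sale_month, amount,
           LAG(amount)  OVER (ORDER BY sale_month) AS prev_amount,
           LEAD(amount) OVER (ORDER BY sale_month) AS next_amount,
           amount - LAG(amount) OVER (ORDER BY sale_month) AS change
    FROM monthly_sales
    ORDER BY sale_month
""").fetchall()

for row in comparison:
    print(row)
```

The change column is NULL for the first month because there is no earlier row to compare against.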

How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI

Ruto Kipkirui Robert — Mon, 09 Feb 2026 10:18:40 +0000

Introduction
Data analysts, data engineers, and data scientists often handle messy data characterized by duplicates, inconsistent attributes, and incomplete records. Power BI is a widely used tool in modern business intelligence that offers ways of transforming raw data into engaging, interactive dashboards that provide actionable insights to stakeholders. This article explores Power BI from the perspective of data cleaning, DAX measures, and the creation of interactive dashboards.
Data Cleaning
Cleaning, shaping, and transforming messy data is a critical requirement before data analysis, dashboard preparation, and reporting in Power BI. To achieve a clean, analyzable dataset, Power BI provides a powerful tool, Power Query, that enables data analysts, data engineers, and data scientists to clean, transform, and shape the dataset before working with it.
Power Query
The Power Query Editor lets you connect to data sources and shape the data to fit your needs. Once shaped, the data is loaded into Power BI Desktop for analysis, dashboard preparation, and reporting.
To open the Power Query Editor and clean a dataset:
a) Click the Home tab.
b) Click Transform Data to open the Power Query Editor.
c) Select the data source.
d) Apply the transformations needed.
Typical Data Cleaning Approaches

Remove Unnecessary Columns: Datasets can have columns that are not needed for the analysis. In Power Query Editor, select the column > Click Remove Columns. The objective is to optimize query performance.
Rename Columns: Rename columns to enhance the clarity of the dataset. In Power Query Editor, right-click the column > Click Rename.
Split Columns: When several values are stored in a single column, it is prudent to split it into multiple columns so each value can be analyzed on its own. In Power Query Editor, select the column > Click Split Column (by delimiter or number of characters).
Merge Columns: Combine different columns to achieve a specific objective. In Power Query Editor, select multiple columns > Click Merge Columns.
Change Text Case: For consistency, ensure text is uniformly formatted. In Power Query Editor, select a text column > Click Transform > Format > Uppercase/Lowercase/Capitalize Each Word.
Handle Missing Data (Nulls): Deal with missing values before reporting, for more reliable Power BI reports.
Remove Null Values: Select the column > Click Remove Rows > Remove Blank Rows.
Replace Nulls: Select the column > Click Transform > Replace Values.
Handle Duplicates: Select the column > Click Remove Duplicates.
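The cleaning steps above are point-and-click operations in Power Query, but the logic behind them can be sketched in a few lines of plain Python. This is only an analogy, not Power Query code; the column names and sample rows are hypothetical.

```python
# Hypothetical messy rows: stray spaces, mixed case, a null, and a duplicate.
raw_rows = [
    {"cust name": "  alice wanjiru ", "city": "nairobi", "notes": "x", "sales": 120},
    {"cust name": "BRIAN OTIENO", "city": None, "notes": "y", "sales": 80},
    {"cust name": "  alice wanjiru ", "city": "nairobi", "notes": "x", "sales": 120},
]


def clean(rows):
    """Apply the same kinds of steps described above, one row at a time."""
    cleaned, seen = [], set()
    for row in rows:
        record = {
            # Rename the column, trim spaces, and capitalize each word.
            "customer_name": row["cust name"].strip().title(),
            # Replace nulls with a placeholder value.
            "city": (row["city"] or "Unknown").title(),
            "sales": row["sales"],
            # The unneeded 'notes' column is simply not carried over.
        }
        key = tuple(record.items())
        if key not in seen:  # Remove duplicates.
            seen.add(key)
            cleaned.append(record)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Three messy rows go in and two clean, distinct rows come out, which is the same outcome the Power Query steps aim for.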
DAX
Data Analysis Expressions (DAX) is Power BI’s formula and query language for creating custom measures, calculated columns, and calculated tables. DAX handles complex calculations that go beyond typical MS Excel formulas because its expressions are evaluated in row context and filter context.
Measures
Measures are core elements of DAX and are crucial for aggregating data.
They keep reports dynamic because they recalculate as filters and slicers are applied.
Measures are commonly built with aggregation functions such as SUM, AVERAGE, and COUNT.
Accurate measure calculations depend on correct data types and clean data.
Calculated Columns
Calculated columns create new fields for analysis based on values derived from existing columns.
Derived values are calculated row by row and stored in the data model.
They are useful in instances where the result does not need to respond to filters and applied slicers.
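The difference between a calculated column and a measure can be sketched in plain Python. This is a conceptual analogy rather than DAX: the sales rows, the line_total field, and the total_sales function are all hypothetical names chosen for illustration.

```python
sales = [
    {"region": "East", "quantity": 2, "unit_price": 10.0},
    {"region": "East", "quantity": 1, "unit_price": 30.0},
    {"region": "West", "quantity": 5, "unit_price": 4.0},
]

# Calculated column: computed once per row and stored alongside the data.
for row in sales:
    row["line_total"] = row["quantity"] * row["unit_price"]


# Measure: an aggregation that is re-evaluated for whichever rows
# the current filter (here, an optional region) leaves visible.
def total_sales(rows, region=None):
    visible = [r for r in rows if region is None or r["region"] == region]
    return sum(r["line_total"] for r in visible)


print(total_sales(sales))          # 70.0
print(total_sales(sales, "East"))  # 50.0
print(total_sales(sales, "West"))  # 20.0
```

The stored line_total values never change, while the total shifts with the filter, much as a measure responds to a slicer.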
Time Intelligence
DAX provides time intelligence functions for analyzing data across time-based periods.
Examples include TOTALYTD, SAMEPERIODLASTYEAR, and DATEADD, which enable period comparisons.
These functions help generate actionable insights from time-series data.
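To make the idea of a year-to-date total concrete, here is a rough Python analogy of the kind of calculation TOTALYTD performs, assuming a calendar year that ends on 31 December. It is not DAX, and the daily_sales figures and the year_to_date helper are hypothetical.

```python
from datetime import date

# Hypothetical (date, amount) pairs spanning two calendar years.
daily_sales = [
    (date(2025, 11, 30), 40),
    (date(2025, 12, 31), 60),
    (date(2026, 1, 15), 100),
    (date(2026, 2, 10), 50),
]


def year_to_date(rows, as_of):
    """Sum amounts from 1 January of as_of's year up to and including as_of."""
    return sum(
        amount
        for day, amount in rows
        if day.year == as_of.year and day <= as_of
    )


print(year_to_date(daily_sales, date(2026, 2, 28)))   # 150
print(year_to_date(daily_sales, date(2025, 12, 31)))  # 100
```

The total resets at the start of each year, which is what makes year-to-date figures comparable from one year to the next.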
Best Practices to Optimize DAX Formulas
a) Prefer measures over calculated columns where possible.
b) If possible, avoid nested operations.
c) Use simplified relationships.
d) Optimize Cardinality.

Dashboards

Power BI can analyze and convert data into interactive visuals for reporting. A Power BI dashboard is a single-page summary of visuals designed to be explored interactively by the target users. In contrast, reports are detailed, multipage, interactive documents for in-depth analysis and insights.

How To Create a Dashboard
a) Import the data.
b) Explore the data.
c) Choose the chart type based on the question being asked and the insight required:

| Question Type | Purpose | Best Visualizations |
| --- | --- | --- |
| 1. Comparison | Compare values across categories | Bar/Column Chart, Treemap, Table |
| 2. Trend | Trends over time | Line Chart, Area Chart, Ribbon Chart |
| 3. Part-to-Whole | Composition | Donut Chart, Pie Chart, Stacked Bar |
| 4. Relationship | Correlation | Scatter Plot, Bubble Chart |
| 5. Geographical | Location data | Map, Filled Map, Shape Map |
| 6. Key Metric | Single number | Card, KPI Visual, Gauge |
| 7. Process | Steps/Flow | Funnel Chart, Waterfall Chart |

Effective dashboards answer three key questions:
a) What happened?
b) Why did it happen?
c) What action should be taken?
Conclusion
Cleaning messy data before the actual analysis is critical in every analytics process. Power BI offers a single platform for transforming data and generating actionable insights that inform organizational decision-making.

#Data analytics #Data Science #Power Bi

Ruto Kipkirui Robert — Sun, 01 Feb 2026 19:23:21 +0000

Schemas And Data Modelling in Power BI
Introduction
• This article explores data modeling concepts often used to achieve high performance and accurate data analytics in Power BI.
• The article focuses on key schema types in Power BI and on how detailed data modelling improves reporting accuracy.
• Data modelling refers to the procedures used by data analysts, data scientists, and data engineers to structure data in tables based on defined relationships and a logical framework. The objective of data modelling is to achieve effective data cleansing, build accurate calculations, and prepare detailed business intelligence reports.
• A schema refers to the structure and defined relationships of data within a designed data model. Schemas shape how data analysis interacts with the database, influencing dashboard load times and decision-making efficiency.
• The two primary database schemas in Power BI are the star and snowflake schemas.
Star Schema
• A star schema is a data modeling approach in which a central fact table is directly connected to multiple dimension tables.
• A star schema consists of a fact table and multiple dimension tables.
• Tables in a star schema are connected via one-to-many relationships. Every dimension table is on the ‘one’ side, while the fact table is on the ‘many’ side, as indicated in the table below.

In Star Schemas:
Dimension tables
• Represent the business entities being modelled, such as products, places, or people. Each dimension table has a key column that serves as a unique identifier; its other columns are used for filtering and grouping data.
• Dimension tables hold the descriptive, categorical attributes for the keys referenced in the fact table.
• A dimension table does not contain duplicate keys.
Fact tables
• The fact table is the main table of the data model.
• Fact tables store quantitative transactional data, such as sales orders, quantities sold, and related details.
• A fact table contains dimension key columns that relate to the dimension tables, plus numeric measure columns.
• The dimension key columns determine the table's dimensionality, while the dimension key values determine the table's granularity.
• Fact tables are likely to contain repeated key values, because many rows can refer to the same dimension entry.
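A minimal star schema can be sketched with Python's built-in sqlite3 module. The dim_product and fact_sales tables, their columns, and the sample rows are hypothetical; the point is only to show a dimension on the 'one' side, a fact table on the 'many' side, and an aggregation grouped by a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per product, unique key.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    -- Fact table: many rows per product, numeric measures plus the key.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Phone', 'Electronics'),
        (3, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 900.0), (2, 2, 2, 600.0), (3, 1, 1, 950.0), (4, 3, 4, 400.0);
""")

# The dimension supplies the grouping attribute; the fact table supplies the numbers.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS d ON f.product_key = d.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(sales_by_category)  # [('Electronics', 2450.0), ('Furniture', 400.0)]
```

Product key 1 appears twice in the fact table but only once in the dimension table, which is the one-to-many relationship in action.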

Key Concepts for Star Schemas.
Normalization:

• Splitting data into multiple related tables reduces duplication and improves data integrity.
• “One fact, one place.”
• The term describes how data is stored so that the same information is not repeated.
• For example, a sales table that stores only a product key, rather than the product's name and category on every row, is considered normalized.
Denormalization:
• The process of combining tables to reduce joins and simplify analysis.
• “Put related data together.”
Snowflake Schema
• Snowflake refers to a data modelling approach in which a central fact table is connected to multiple dimensions, with one or more dimension tables subdivided into sub-dimension tables.
• A snowflake schema consists of a single fact table and multiple dimension tables.
• Snowflake schemas are unique because dimension tables are normalized, i.e., they are broken down into smaller sub-tables.

Key Difference between Star Schema and Snowflake Schema

Relationships
• Relationships determine how Power BI connects and interacts with tables.
• A relationship definition shows how tables are connected using key columns.
• Typically, Power BI employs one-to-many relationships, where dimension tables are on the ‘one’ side and fact tables are on the ‘many’ side.
• Relationships are characterized by filter directions that control how data flows between tables.
• Proper relationships enhance accurate aggregations and consistent reports.
Why Good Modelling Is Critical for Performance and Accurate Reporting

Enhance query performance.
A star schema compresses data.
Fact and dimension tables minimize data duplicates and improve model efficiency.
Correct relationships enhance accurate aggregations.

MS Excel Data Analysis: Foundational Basics

Ruto Kipkirui Robert — Sun, 25 Jan 2026 12:57:20 +0000

Introduction

Data analysis is one of the most in-demand skills in modern, technology-driven organizations.
MS Excel is a powerful tool for data analysis that facilitates:
• Data processing
• Data manipulation
• Data visualization

Data Preparation

Data cleaning is a critical requirement before any data analysis. MS Excel is a fundamental tool for ensuring that missing values and duplicates are corrected.

How to Clean Data in Excel

• Remove Duplicates: Use Data > Remove Duplicates to eliminate redundancy.
• Use TRIM and CLEAN Functions:
  TRIM removes unnecessary spaces.
  CLEAN removes non-printable characters.
• Sort and Structure Data: Convert your dataset into an Excel Table (Insert > Table) for better organization.
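What TRIM, CLEAN, and Remove Duplicates do to a value can be sketched in plain Python. These helpers are hypothetical stand-ins written for illustration, not Excel functions, and the sample text is made up.

```python
def excel_trim(text):
    """Like TRIM: drop leading/trailing spaces and collapse runs of spaces to one."""
    return " ".join(part for part in text.split(" ") if part)


def excel_clean(text):
    """Like CLEAN: remove non-printable control characters (codes 0 to 31)."""
    return "".join(ch for ch in text if ord(ch) >= 32)


raw = "  Nairobi   Central\x07 "
tidy = excel_trim(excel_clean(raw))
print(repr(tidy))  # 'Nairobi Central'

# Like Remove Duplicates: keep the first occurrence of each value, in order.
cities = ["Nairobi", "Kisumu", "Nairobi"]
unique_cities = list(dict.fromkeys(cities))
print(unique_cities)  # ['Nairobi', 'Kisumu']
```

In Excel the two functions are often nested in the same way, with CLEAN applied first and TRIM wrapped around it.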

Basic Data Analysis Methods.

Charts and Visualization
Charts make it easier to identify trends and relationships in your data:
• Select your dataset and go to Insert > Charts.
• Choose from bar charts, line charts, or pie charts.
• Customize the chart for clarity and impact.
Conditional Formatting
Go to Home > Conditional Formatting.
Select any column from the table. Here we are going to select a Quarter column. After that, go to the Home tab on the ribbon, then in the Styles group choose Conditional Formatting, and then under Highlight Cells Rules select the Greater Than option.
The Greater Than dialog box then appears. First, enter the quarter value to compare against, and then select the color.
Sorting Data
Sorting data makes it easier to immediately view and comprehend your data, organize and locate the facts you need, and ultimately help you make better decisions.
A list of names may be arranged alphabetically, a list of sales numbers can be arranged from highest to lowest, or rows can be sorted by colors or icons.
Using text, numbers, dates, and times, you can sort data in one or more columns by custom list, format, cell color, font color, or icon set.
Step 1: Select Data > Data Tab > Sort
Select any column from the table. Here we are going to select a Month column. After that, go to the data tab at the top of the ribbon, then in the Sort & Filter group, choose Sort.

Step 2: Select the Order
Then a sort dialog box appears. First, select the column, then choose Sort on, and then Order. After that, click OK.

Step 3: Preview Results
As you can see, the months column is now arranged alphabetically.

Filtering Data
Filtering pulls information from a given range or table that satisfies the specified criteria in Excel data analysis.
Step 1: Select your dataset and go to Data > Filter
Select any column from the table. Here we are going to select a Sales column. After that, go to the Data tab at the top of the ribbon, then in the Sort & Filter group, choose Filter.

Step 2: Select the Filter Option
The values in the sales column are then shown in a drop-down box. Here, we will select Number Filters and then choose the Greater Than option.

Step 3: Select the Options
The Custom AutoFilter dialog box then appears. Here, we are going to set the sales value to greater than 70, then click OK.

Step 4: Preview Results
As you can see, only rows with sales greater than 70 are shown.

Essential Excel Functions for Data Analysis

=AND - Returns TRUE only when all of two or more conditions are TRUE
=AVERAGE - Calculates the average (arithmetic mean)
=AVERAGEIF - Calculates the average of a range based on a TRUE or FALSE condition
=AVERAGEIFS - Calculates the average of a range based on one or more TRUE/FALSE conditions
=CONCAT - Links together the content of multiple cells
=COUNT - Counts cells with numbers in a range
=COUNTA - Counts all cells in a range that have values, both numbers and letters
=COUNTBLANK - Counts blank cells in a range
=COUNTIF - Counts cells in a range that meet a specified condition
=COUNTIFS - Counts cells in a range based on one or more TRUE or FALSE conditions
=IF - Returns values based on a TRUE or FALSE condition
=IFS - Returns values based on one or more TRUE or FALSE conditions
=LEFT - Returns a specified number of characters from the left side of a cell
=LOWER - Reformats content to lowercase
=MAX - Returns the highest value in a range
=MEDIAN - Returns the middle value in the data
=MIN - Returns the lowest value in a range
=MODE - Finds the number seen most times. The function always returns a single number
=OR - Returns TRUE when at least one of two or more conditions is TRUE
=RIGHT - Returns a specified number of characters from the right side of a cell
=SUM - Adds together numbers in a range
=SUMIF - Calculates the sum of values in a range based on a TRUE or FALSE condition
=SUMIFS - Calculates the sum of a range based on one or more TRUE or FALSE conditions
=TRIM - Removes irregular spacing, leaving one space between each word
=VLOOKUP - Allows vertical searches for values in a table
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
• Lookup_value: The value to search for, usually the cell where the search criterion is entered.
• Table_array: The whole table range to search; the lookup value must be in its first column.
• Col_index_num: The number of the column, counted from the left of table_array, that holds the value to return.
• Range_lookup: FALSE (0) for an exact match; TRUE (1) for an approximate match, which requires the first column to be sorted in ascending order.
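The effect of the range_lookup argument is easier to see in code. Below is a simplified Python sketch of VLOOKUP's two matching modes; the vlookup function and the grades table are hypothetical, and the sketch returns None where Excel would show #N/A.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index_num, range_lookup=True):
    """Search the first column of `table` and return the value from col_index_num (1-based)."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: the largest key that is <= lookup_value.
        # Assumes the first column is sorted in ascending order.
        position = bisect_right(keys, lookup_value) - 1
        if position < 0:
            return None
        return table[position][col_index_num - 1]
    # Exact match: the first row whose key equals lookup_value.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None


# Hypothetical grade boundaries: minimum score and the grade it earns.
grades = [(0, "F"), (50, "C"), (70, "B"), (85, "A")]

print(vlookup(72, grades, 2))         # B  (approximate: 70 is the largest key <= 72)
print(vlookup(72, grades, 2, False))  # None (no exact key 72)
print(vlookup(85, grades, 2, False))  # A
```

Approximate matching suits banded lookups such as grade or tax brackets, while exact matching is the safer choice for IDs and names.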

Git and Git Bash for Beginners

Ruto Kipkirui Robert — Sun, 25 Jan 2026 12:52:51 +0000

Introduction to Git Bash
Git Bash is a command-line tool for Windows that provides a Bash shell together with Git's command-line functionality.
It enables developers to work in a Bash-like terminal environment while efficiently managing Git repositories.
It supports core Git operations such as cloning repositories, committing changes, pushing and pulling updates, and managing branches.
Overview of Git
Git is a distributed version control system (DVCS).
Git is a tool that helps you:
• save and manage different versions of your files and code.
• work with others, keep track of changes, and undo mistakes.
Git allows each developer to maintain a complete copy of a repository, including its full history, on their local machine.
This decentralized model improves performance, reliability, and collaboration.
Developers can work offline, experiment freely, and merge changes efficiently.
Why Use Git Bash?
Git Bash is widely used because of its compatibility, flexibility, and power.
It is fully compatible with Git and supports all Git commands.
Git Bash also offers a familiar environment for users transitioning from Linux or macOS to Windows, reducing the learning curve.
Installing Git Bash on Windows
Installing Git Bash involves downloading the Git for Windows installer and following a guided setup process. Users can select components, choose an installation directory, and complete the installation. Once installed, Git Bash can be launched from the Start menu or desktop shortcut.
Basic Git Bash Commands
Git Bash supports Git commands that help users navigate directories, manage files, and control version history.
Common Navigation Commands:

ls – Lists files and directories
cd – Changes the current directory
pwd – Displays the current working directory
Common Git Commands:
git init – Initializes a Git repository
git status – Displays repository status
git add . – Stages all changes
git commit -m "message" – Commits changes
git log – Displays commit history
Using Git Bash: Basic Workflow
Using Git Bash begins by configuring Git with a username and email address. After configuration, users navigate to a project directory, initialize a repository, stage files, and commit changes.
Key Commands:
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
Connecting Local Repositories to GitHub
Git Bash allows users to link local repositories to remote GitHub repositories. This enables pushing local changes to GitHub and pulling updates from collaborators.
Key Commands:
git remote add origin <repository-url>
git push origin master
Branch Management in Git Bash
Branches allow multiple developers to work on different features independently. Git Bash supports creating, switching, listing, and deleting branches.
Key Commands:
git branch – Lists branches
git branch branch_name – Creates a new branch
git checkout -b branch_name – Creates and switches to a branch
Merging and Cloning Repositories
Merging combines changes from one branch into another, ensuring code integration. Cloning creates a local copy of a remote repository.
Key Commands:
git merge branch_name
git clone <repository-url>
Undoing Commits
Git Bash allows users to modify the most recent commit using the --amend option. This is useful when files are missed or commit messages need correction.
Key Command:
git commit --amend
Conclusion
Git Bash is a powerful and flexible tool. It supports version control, collaboration, automation, and branch management. With its combination of Git commands and Bash utilities, Git Bash remains an essential tool for developers seeking efficiency and control in modern software development environments.