Forem: Jason Ndalamia

Getting Started with Python: A Practical Introduction for Beginners

Jason Ndalamia — Wed, 06 May 2026 10:16:08 +0000

Welcome to the world of Python! If you are just starting your programming journey, you have chosen the perfect language. This beginner-friendly guide will teach you the fundamentals of Python in a simple, practical, and engaging way.

1. Introduction

What is Python? Python is a powerful, flexible, and dynamically typed programming language created by Guido van Rossum and first released in 1991. It is widely considered one of the most loved and used programming languages in the world.

Why is it Popular? Imagine you are cooking: would you rather follow a long, complex recipe full of jargon, or a simple one written in plain English? Python is designed to be highly readable, looking almost like spoken English rather than complex machine code. It allows you to write fewer lines of code to achieve the same result as other languages, like Java.

Python is highly versatile and serves as a "multi-tool" for a wide range of real-world scenarios. It is used for web development (powering apps like Instagram and YouTube), data analysis, AI, machine learning, and system scripting (automating repetitive computer tasks). It has even been used by NASA for rocket logic!

2. Installation

Before writing code, you need to set up Python on your computer.

Step-by-Step Guide:

Download Python: For Windows users, you can download the latest version (like Python 3.12) directly from the Microsoft Store. You can also download the installer from the official website at https://www.python.org/downloads/.
Install: Run the downloaded installer. It is critical during this step to ensure Python is added to your system's "PATH" environment variable, which allows your system to recognize Python commands in your terminal.

Verify Installation: Once installed, open your Command Prompt (CMD) or terminal. Type the following command and press Enter to verify your installation:
python --version
(Note for Mac and Linux users: You may need to use python3 --version.)

3. Your First Program: Hello World

In Python, writing your first program is incredibly simple and avoids confusing symbols or extra fluff. Let's write a program that displays a message on the screen.
print("Hello, World!")

What the code does: Think of the print() command as a megaphone. You are giving Python a clear, direct instruction: "Hey, shout this message on the screen!". Because Python is an interpreted language, it runs your code line by line and immediately outputs the result.

4. Comments in Python

Comments are notes left in the code for humans to read. Python ignores them when running the program, but they are crucial for explaining what your logic does.

Single-Line Comments: You can create a single-line comment by using the hash symbol #. As a best practice, try to keep your comment lines from exceeding 72 characters.

# This line is ignored by Python
print("Hello!")  # You can also place comments after code

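Multi-Line Comments: Python has no dedicated multi-line comment syntax. For longer explanations, you can either start each line with # or use triple quotes (""") to create a multi-line string. When such a string is not assigned to a variable, Python evaluates it and discards the result, so it behaves like a comment.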

"""
This is a
multi-line comment.
It is great for long explanations!
"""

5. Variables

Imagine your brain is a giant notebook; when you learn something new, you store it under a label. In Python, variables act as labelled boxes or containers where you store data to open and use later.

Rules for Naming Variables:

Must start with a letter (a–z, A–Z) or an underscore _ (e.g., _my_age).
Cannot start with a number (e.g., 1user is invalid).
Can only contain alphanumeric characters and underscores.
Names are case-sensitive (name and NAME are completely different).
Tip: Use descriptive names in snake_case (e.g., user_name instead of just x).
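
The naming rules above can be checked from Python itself. This is a minimal sketch: the helper name is_valid_variable_name and the sample names are made up for illustration. It also rejects reserved words such as class, which is an extra rule not listed above, and str.isidentifier() accepts non-English letters as well.

```python
import keyword


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules (letters, digits,
    # underscores, no leading digit); keyword.iskeyword() rejects
    # reserved words such as `class` or `for`.
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical sample names, chosen to hit each rule
candidates = ["_my_age", "user_name", "1user", "user-name", "class"]
results = {name: is_valid_variable_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```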

Examples of Different Data Types: Because Python is dynamically typed, you don't need to specify the data type—Python figures it out automatically based on the value you assign.

# Integer (int) - Whole numbers like counting apples
age = 25

# Float (float) - Decimal numbers like measuring height
height = 5.9

# String (str) - Text enclosed in single or double quotes
name = "Alice"

# Boolean (bool) - True or False values, like a light switch (ON/OFF)
is_logged_in = True
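
To see dynamic typing in action, you can ask Python which type it picked for each value with the built-in type() function. This short sketch reuses the four example variables from this section and then re-binds one of them to a different type.

```python
age = 25
height = 5.9
name = "Alice"
is_logged_in = True

# type(value).__name__ gives the short name of the type Python inferred
type_names = [type(value).__name__ for value in (age, height, name, is_logged_in)]
print(type_names)

# Dynamic typing: the same variable name can later hold a different type
age = "twenty-five"
print(type(age).__name__)
```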

6. Operators

Operators allow you to perform calculations or compare data in Python.
Arithmetic Operators: Used to perform standard mathematical calculations.

a = 10
b = 3
print(a + b)   # Addition: outputs 13
print(a - b)   # Subtraction: outputs 7
print(a * b)   # Multiplication: outputs 30
print(a / b)   # Division: outputs 3.333...
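
Python has three more arithmetic operators that are not shown above and that beginners often need: floor division, modulo, and exponentiation. A small sketch using the same a and b values:

```python
a = 10
b = 3

floor_quotient = a // b  # Floor division: drops the fractional part
remainder = a % b        # Modulo: what is left over after division
power = a ** b           # Exponentiation: a raised to the power of b

print(floor_quotient, remainder, power)

# Note that / always produces a float, even when the result is whole
print(type(a / b).__name__)
```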

Comparison Operators: Used to compare two values. They act as "Yes/No" questions and always result in a Boolean (True or False).

x = 10
y = 9
print(x > y)   # Greater than: True
print(x == y)  # Equal to: False
print(x != y)  # Not equal to: True

Logical Operators: Used to check multiple conditions at once using and and or.

age = 20
has_ticket = True
# 'and' means both conditions MUST be True
if age >= 18 and has_ticket:
    print("Entry allowed.")
else:
    print("Entry denied.")
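
The example above only uses and. Here is a minimal sketch of or and not, using hypothetical variables (with_guardian and is_vip are invented for illustration):

```python
age = 16
has_ticket = True
with_guardian = True  # hypothetical extra condition
is_vip = False        # hypothetical extra condition

# 'or' needs only ONE of its conditions to be True
can_enter = (age >= 18 or with_guardian) and has_ticket

# 'not' flips a Boolean value
must_queue = not is_vip

print(can_enter)
print(must_queue)
```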

Conclusion

You've just taken your first steps into Python! We covered how Python's readable syntax makes programming feel like writing English, how to install and run your first "Hello, World!" program, and how to use foundational concepts like variables, data types, and operators.

Integrating SQL Databases with Power BI for Advanced Analytics: A Complete Guide

Jason Ndalamia — Sun, 03 May 2026 10:04:07 +0000

Power BI is a powerful business analytics service developed by Microsoft that empowers users to visualise data and share interactive dashboards across their organisation. While Power BI can handle data from various sources, its true potential is unleashed when connected to robust data sources like SQL databases.

SQL databases—such as PostgreSQL, MySQL, and SQL Server—are the industry standard for storing and managing structured analytical data. They offer ACID compliance for reliable transaction processing, making them the perfect backbone for managing critical business information.
In this guide, we will walk you through connecting Power BI to both local and cloud PostgreSQL databases, modelling your data, and leveraging SQL skills for better reporting.

1. Connecting Power BI to a Local PostgreSQL Database

Many data analysts build prototypes using a local PostgreSQL database before deploying them to a production environment. Here is how to establish that connection:

Step 1: Open Power BI Desktop and Select Get Data

Launch Power BI Desktop. On the Home ribbon, click the Get Data button to open the data import dialog.

In the Get Data window, expand Database on the left and select PostgreSQL database.

Step 2: Configure the PostgreSQL Connection

In the connection dialog, you will be prompted for your server and database details. For a local machine, enter localhost (or 127.0.0.1) in the Server field, and type the name of your database (e.g., postgres or sales_db) in the Database field.

You can choose your Data Connectivity mode here—Import is typically recommended to copy tables into Power BI for fast, offline queries. Click OK.

Step 3: Authenticate and Load Tables

When prompted, select Basic authentication and enter your PostgreSQL username and password.

Once authenticated, the Navigator window will display all available schemas and tables. You can click each table to preview its data. Select the tables you want to import (for example, customers, products, sales, and inventory).

Click Load to import them directly or Transform Data to make changes first. Power BI will load the data into its data model.

Step 4: Verify Loaded Tables

After loading, the selected tables appear in the Power BI interface. The Data/Report view now lists the imported tables in the Fields pane on the right.

2. Connecting Power BI to a Cloud Database (Aiven PostgreSQL)

When using a managed cloud database like Aiven PostgreSQL, the process is similar but adds a security step.

Step 1: Gather Aiven Connection Details

In your Aiven web console, open the Overview page. Copy the Host name, Port, Database name, User name, and Password. Also note the SSL mode (usually "require").

Step 2: Download and Install the SSL Certificate

Aiven enforces encrypted (TLS) connections. Find the CA certificate link on the Overview page and download the ca.pem file.

To install the certificate on Windows:

Press Win + R, type certmgr.msc, and press Enter.
Expand the Trusted Root Certification Authorities folder.
Right-click Certificates, select All Tasks > Import.

Browse to select the ca.pem file (change file type to "All Files").
Ensure it is placed in the Trusted Root Certification Authorities store and click Finish.

Step 3: Connect from Power BI Desktop

Back in Power BI Desktop, go to Get Data > PostgreSQL database. Enter the Server as host:port (e.g., pg-instance.aivencloud.com:12345) and the Database name.

Step 4: Load Cloud Tables

Enter the Aiven Username and Password. If the SSL certificate was installed correctly, the Navigator will appear. Select your tables and click Load.

3. Loading Tables and Data Modeling

Once connected, you must define how these tables interact through data modeling.
A standard approach is the star schema, where a central fact table (like sales) connects to surrounding dimension tables (like customers and products). These links are formed by joining Primary Keys and Foreign Keys, creating a one-to-many relationship.

Why are relationships important?
Proper relationships allow Power BI to aggregate and filter metrics correctly. Without them, visuals could return incorrect numbers because the software wouldn't know how to join the data points across different tables.

4. Why SQL Skills Matter for Power BI Analysts

While Power BI's drag-and-drop features are incredibly powerful, foundational SQL skills separate good analysts from great ones. SQL allows you to:
Retrieve Data: Pull only specific columns to reduce memory consumption.
Filter Datasets: Use a WHERE clause to pre-filter data at the source, speeding up loading times.
Perform Aggregations: Use SUM, COUNT, or GROUP BY to push heavy calculations to the database engine.
Prepare and Shape Data: Handle null values, cast data types, and join tables into a single view before importing.
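
The filtering and aggregation points above can be sketched with a small query. To keep the example self-contained it runs against an in-memory SQLite database through Python's built-in sqlite3 module, standing in for PostgreSQL; the sales table, its columns, and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Widget", 100.0, "n/a"),
        ("East", "Gadget", 250.0, "n/a"),
        ("West", "Widget", 75.0, "n/a"),
        ("West", "Gadget", 20.0, "n/a"),
    ],
)

# Select only the needed columns, pre-filter with WHERE, and let the
# database do the aggregation with SUM / COUNT / GROUP BY.
query = """
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM sales
    WHERE amount >= 50
    GROUP BY region
    ORDER BY region
"""
summary = conn.execute(query).fetchall()
conn.close()

for row in summary:
    print(row)
```

Only two summary rows leave the database instead of every raw transaction, which is the saving the list above describes.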

Conclusion

Connecting Power BI to SQL databases unlocks the highest level of business intelligence. By combining Power BI's visual capabilities with SQL's structural precision, analysts can build trustworthy, lightning-fast dashboards that drive strategic decisions.

Beyond the Basics: 5 Game-Changing Secrets of SQL Joins and Window Functions

Jason Ndalamia — Mon, 02 Mar 2026 07:05:39 +0000

1. Introduction: The Data Relationships Hook

Think of a database as a digital filing cabinet. In this architecture, information is organized into drawers—or schemas—such as HR, Finance, or Sales. Within these drawers sit the documents, which represent our individual tables.

While this structure ensures clean organization, real-world business intelligence rarely lives in a single drawer. To answer strategic questions, you must bridge the gaps between disparate tables without creating a "data mess". Whether you are joining sales records with inventory levels or HR data with finance budgets, mastering these relationships is what separates a basic query-writer from a Senior Architect. This guide distills complex SQL operations into high-impact takeaways, focusing on maintaining relational integrity while scaling your analysis.

2. The Horizontal vs. Vertical Divide: Joins vs. Unions

The most fundamental distinction in data architecture is how you choose to expand your result set.

Joins combine tables horizontally. They add columns based on a shared key (like a Primary or Foreign Key), making the result set "wider".
Unions combine tables vertically. They stack rows from one dataset on top of another, making the result set "longer".

For a UNION to be architecturally sound, the query must meet three strict structural requirements: the columns must be in the same order, have the same count, and—most importantly—possess matching data types. As an architect, you must remember that the database engine is indifferent to your column aliases but ruthless regarding data types.

"The database doesn't care about naming but cares about data types in Unions... If a VARCHAR is matched with an INT, the database will throw a mismatch error." — Data with Baraa

The Mental Model: Distinct vs. All. From a strategy perspective, think of UNION as a DISTINCT operation for stacked rows; it automatically removes duplicates. If you need the "data as is" without the overhead of deduplication, UNION ALL is your preferred tool.

Sample Query: Vertical Stacking

-- Combining customer and employee names into a single master list
SELECT first_name, last_name FROM customers
UNION
SELECT first_name, last_name FROM employees
LIMIT 100;
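
To see the difference between UNION and UNION ALL, here is a small runnable sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a production engine, and the names in both tables are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT);
    INSERT INTO customers VALUES ('Ada', 'Lovelace'), ('Alan', 'Turing');
    INSERT INTO employees VALUES ('Alan', 'Turing'), ('Grace', 'Hopper');
""")

# UNION removes the duplicate 'Alan Turing' row
union_rows = conn.execute("""
    SELECT first_name, last_name FROM customers
    UNION
    SELECT first_name, last_name FROM employees
""").fetchall()

# UNION ALL keeps every row exactly as stored
union_all_rows = conn.execute("""
    SELECT first_name, last_name FROM customers
    UNION ALL
    SELECT first_name, last_name FROM employees
""").fetchall()
conn.close()

print(len(union_rows), len(union_all_rows))
```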

3. The "Workhorse" Joins: Why Left is Often Better than Right

In relational logic, the Inner Join is the default state. When you use the bare JOIN keyword, SQL returns only the rows whose values match the condition in the ON clause. If a record in the first table (for example, a customer who hasn't placed an order yet) has no match in the second table, that record "disappears" from the final result set.

For high-level reporting where data integrity is paramount, the Left Join is the industry workhorse. It preserves every record from the left (first) table and fills missing data from the right table with NULL values. This ensures you don't accidentally drop crucial business entities (like customers or products) simply because they lack transaction history.

Architectural Insight: The Right Join is functionally redundant. Any Right Join can be reframed as a Left Join by simply reordering the tables. In production environments, Left Joins are the standard because they align with a left-to-right reading flow, making queries significantly easier to audit and maintain.

Sample Query: Production-Ready Left Join

-- Retrieving all customers and any associated event data
SELECT
  c.customer_id,
  c.first_name,
  e.event_name
FROM customers AS c
LEFT JOIN events AS e
  ON c.customer_id = e.customer_id
WHERE c.customer_id IS NOT NULL
LIMIT 500;
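
The "disappearing" customer is easy to demonstrate. This sketch runs an inner join and a left join side by side on an in-memory SQLite database (via Python's sqlite3 module, as a stand-in for any SQL engine); the three customers and their events are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, customer_id INTEGER, event_name TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Alan'), (3, 'Grace');
    INSERT INTO events VALUES (10, 1, 'signup'), (11, 1, 'purchase'), (12, 2, 'signup');
""")

# Inner join: Grace has no events, so she drops out of the result
inner_rows = conn.execute("""
    SELECT c.first_name, e.event_name
    FROM customers AS c
    JOIN events AS e ON c.customer_id = e.customer_id
    ORDER BY c.customer_id, e.event_id
""").fetchall()

# Left join: Grace is kept, with NULL (None in Python) for event_name
left_rows = conn.execute("""
    SELECT c.first_name, e.event_name
    FROM customers AS c
    LEFT JOIN events AS e ON c.customer_id = e.customer_id
    ORDER BY c.customer_id, e.event_id
""").fetchall()
conn.close()

print(inner_rows)
print(left_rows)
```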

4. The Cartesian Chaos: Understanding the Cross Join

A Cross Join represents the "Cartesian Product" of two tables. Unlike other joins, it ignores matching values entirely, attaching every row of the second table to every row of the first. If you join a 1,000-row table with a 500-row table, you will generate a massive result set of 500,000 rows.

In standard relational reporting, Cross Joins are often avoided because they lack the primary/foreign key bond that defines logical relationships. Joining a car table and an item table by a shared color attribute "wouldn't make any sense" for data integrity. However, from a strategic standpoint, Cross Joins are powerful for generating permutations—such as creating a master grid of every possible product-color combination for an inventory audit.
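
The product-color grid mentioned above is a Cartesian product, which can be sketched in Python with itertools.product. The product and color lists are invented for illustration; in SQL the same grid would come from a CROSS JOIN between two tables.

```python
from itertools import product

# Hypothetical inventory attributes
products = ["T-shirt", "Hoodie", "Cap"]
colors = ["Red", "Blue"]

# Every product paired with every color, like a CROSS JOIN
grid = list(product(products, colors))

for item, color in grid:
    print(item, color)

# The row count is always rows_in_first * rows_in_second
print(len(grid))
```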

5. Window Functions: Grouping Without the "Collapse"

The true "game-changer" for intermediate learners is the OVER() clause. Traditional GROUP BY operations "collapse" your data, rolling multiple rows into a single aggregate level. While useful for summaries, this loses the individual row detail.

Window Functions allow you to perform calculations across a result set while keeping every unique row intact. This allows you to view an individual's salary right next to the department average, or calculate a Rolling Total.

Strategic Use Case: The Rolling Total. In finance and healthcare, tracking cumulative sums is vital. By adding an ORDER BY clause inside the OVER() window, you transform a static sum into a running balance.

Sample Query: Rolling Totals and Averages

-- Calculating a cumulative salary sum and a static average within each gender partition
SELECT
  first_name,
  gender,
  salary,
  SUM(salary) OVER(PARTITION BY gender ORDER BY employee_id) AS rolling_total,
  AVG(salary) OVER(PARTITION BY gender) AS avg_gender_salary
FROM employees;
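
Here is a runnable sketch of the same pattern, so you can see that no rows collapse. It assumes a hypothetical employees table with a department column (the sample query partitions by gender instead) and uses Python's sqlite3 module, which supports window functions when the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name TEXT,
        department TEXT,
        salary INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Ada',   'Finance', 50000),
        (2, 'Alan',  'Finance', 70000),
        (3, 'Grace', 'Sales',   40000),
        (4, 'Linus', 'Sales',   60000);
""")

# ORDER BY inside OVER() makes SUM a running total;
# without it, AVG is computed once per partition.
rows = conn.execute("""
    SELECT
      first_name,
      department,
      salary,
      SUM(salary) OVER (PARTITION BY department ORDER BY employee_id) AS rolling_total,
      AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary
    FROM employees
    ORDER BY employee_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

All four employees are still present in the output, each with their own running total and their department's average beside them.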

6. The Ranking Trio: Row Number, Rank, and Dense Rank

Sequencing data is a core requirement for leaderboards and performance tracking. SQL offers three nuances for handling ties within a window:
• Row Number: Assigns a unique, sequential integer to every row (1, 2, 3, 4). Even in the event of a tie, the numbers will not repeat.
• Rank: Assigns the same number to ties but skips the next position based on the count of duplicates. If two rows tie for 1st, the next is 3rd (1, 1, 3). This is "positional" ranking.
• Dense Rank: Assigns the same number to ties but keeps the next number sequential (1, 1, 2). This is "numerical" ranking.
Architectural Preference: DENSE_RANK is typically preferred for professional reporting. It ensures there are no "gaps" in your leaderboard, maintaining a clean hierarchy regardless of how many entities share the same value.
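
The three functions are easiest to compare side by side on data with a tie. This sketch uses an invented scores table in an in-memory SQLite database (Python's sqlite3 module, SQLite 3.25 or newer). The extra player column in the ROW_NUMBER ordering is an added tie-breaker so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, score INTEGER);
    INSERT INTO scores VALUES ('Ada', 95), ('Alan', 95), ('Grace', 90), ('Linus', 85);
""")

rows = conn.execute("""
    SELECT
      player,
      score,
      ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num,
      RANK()       OVER (ORDER BY score DESC)         AS rnk,
      DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk
    FROM scores
    ORDER BY score DESC, player
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Ada and Alan tie on 95: ROW_NUMBER still gives them 1 and 2, RANK gives both 1 and then jumps to 3 for Grace, and DENSE_RANK gives both 1 and continues with 2.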

Conclusion: Levelling Up Your Query Game

Moving from a beginner to an advanced analyst is less about learning syntax and more about understanding the logic of relationships. Every time you approach a new dataset, you must make a strategic choice: do you need to collapse your data into an aggregate summary with a Group By, or do you need to partition it with a Window Function to maintain row-level detail?

The absolute grounding rule of SQL is that it is a language of relationships. By mastering the nuances of how data stacks vertically via Unions or expands horizontally via Joins, you ensure the relational integrity of your architecture.

How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI

Jason Ndalamia — Sun, 08 Feb 2026 12:07:03 +0000

In the realm of business intelligence, the distance between a raw spreadsheet and a strategic decision is bridged by the data analyst’s technical workflow. Using Power BI, analysts do not merely report numbers; they architect a system that transforms chaotic inputs into clear, actionable insights. This process follows a rigorous path: harmonising messy data, structuring it for performance, applying business logic through DAX, and delivering clarity via interactive dashboards.

1. Taming the Chaos: From Messy Data to Trusted Information

Real-world data is rarely ready for immediate analysis. It arrives full of inconsistencies that can break calculations and skew results. Before any visualisation occurs, analysts use Power Query to clean and transform this raw material into trusted information.
• Harmonising Data: A common issue is the presence of "pseudo-blanks"—text entries like "NA," "error," "blank," or "not provided" mixed into columns. Power BI reads this as valid text rather than missing values. Analysts must use the "Replace Values" function to harmonise these into a single standard category, such as "unknown," to ensure accurate categorisation without deleting potentially valuable raw data.
• Ensuring Precision: Small formatting errors can lead to duplication. For instance, "Kenya " (with a space) and "Kenya" are treated as different values. Analysts use the TRIM function to remove leading and trailing whitespace, ensuring that categories aggregate correctly.
• Data Typing: Attempting to sum a column will fail if the data type is set to text. Analysts must rigorously define columns—setting revenue to "Decimal Number" for calculation while keeping identifiers like phone numbers as "Text" to prevent accidental aggregation.
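
The harmonising and trimming steps above are done in Power Query, but the logic is easy to see in a few lines of Python. This is only an analogy, not Power Query code; the helper name clean_category and the sample values are made up for illustration.

```python
from collections import Counter

# Text entries that really mean "missing" (taken from the examples above)
PSEUDO_BLANKS = {"na", "error", "blank", "not provided"}


def clean_category(value: str) -> str:
    """Trim whitespace, then map pseudo-blanks to a single 'unknown' label."""
    trimmed = value.strip()
    if trimmed.lower() in PSEUDO_BLANKS:
        return "unknown"
    return trimmed


# Hypothetical raw column values
raw = ["Kenya ", "Kenya", " NA", "not provided", "Uganda", "error"]
cleaned = [clean_category(value) for value in raw]

# Without cleaning, "Kenya " and "Kenya" would count as two categories
print(Counter(raw))
print(Counter(cleaned))
```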

2. The Blueprint for Speed: The Star Schema

A major pitfall in data management is the "flat table"—a single, massive spreadsheet containing every detail. This structure leads to duplication, wasted memory, and maintenance nightmares.
To solve this, analysts employ the Star Schema, a modelling technique that separates data into two distinct types:
• Fact Tables: These contain transactional metrics (e.g., Sales, Quantity, Total Revenue) and sit at the centre of the model.
• Dimension Tables: These contain descriptive attributes (e.g., Customers, Products, Stores) and surround the fact table.
This structure allows for "write once, use many" efficiency. When a store relocates from one city to another, the analyst updates a single row in the Dimension table, rather than millions of rows in the Fact table. This model ensures that when stakeholders ask complex questions, the relationships between tables allow filters to flow correctly, providing accurate answers instantly.
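
The store relocation example can be sketched in plain Python, with a dictionary playing the role of the dimension table and a list playing the fact table. All store names, cities, and revenue figures here are invented.

```python
from collections import defaultdict

# Dimension table: one row per store
stores = {
    1: {"name": "Store A", "city": "Nairobi"},
    2: {"name": "Store B", "city": "Kisumu"},
}

# Fact table: many rows per store, holding only the key and the metric
sales = [
    {"store_id": 1, "revenue": 100.0},
    {"store_id": 1, "revenue": 150.0},
    {"store_id": 2, "revenue": 80.0},
]


def revenue_by_city(fact_rows, dimension):
    """Look up each sale's city through the store key and total the revenue."""
    totals = defaultdict(float)
    for row in fact_rows:
        city = dimension[row["store_id"]]["city"]
        totals[city] += row["revenue"]
    return dict(totals)


before = revenue_by_city(sales, stores)

# The store relocates: one dimension row changes, no fact rows are touched
stores[1]["city"] = "Mombasa"
after = revenue_by_city(sales, stores)

print(before)
print(after)
```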

3. The Engine of Analysis: DAX Measures and Logic

Once the data is structured, DAX (Data Analysis Expressions) is the language used to extract business logic. Analysts distinguish between Calculated Columns (row-by-row logic) and Measures (dynamic aggregations) to answer specific business questions.
• Automating Business Logic: Analysts use logical functions like IF and SWITCH to automate categorisation. For example, a nested IF statement or a SWITCH function can scan phone number prefixes (e.g., 254, 256) and automatically classify the country of origin as Kenya or Uganda.
• Time Intelligence: Business decisions rely heavily on historical context. Using time intelligence functions like DATEADD and SAMEPERIODLASTYEAR inside a CALCULATE function, analysts can generate metrics like "Revenue Last Month" or "Revenue Last Year". This shifts the context of the data, allowing a manager to instantly see if performance is trending up or down compared to previous periods without manual recalculation.
• Handling Complexity: Advanced iterators like SUMX allow for calculations that require row-by-row evaluation before aggregating, such as multiplying yield by market price for every single transaction to get a precise total revenue.
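
For readers who know Python better than DAX, the SWITCH-style classification and the SUMX-style row-by-row total can be sketched as follows. This is a Python analogy, not DAX; the phone numbers, field names, and the "Other" fallback are assumptions made for illustration.

```python
def classify_country(phone: str) -> str:
    """Map a phone number's three-digit prefix to a country, like SWITCH in DAX."""
    prefixes = {"254": "Kenya", "256": "Uganda"}
    return prefixes.get(phone[:3], "Other")  # "Other" is a hypothetical default


# Hypothetical transaction rows
transactions = [
    {"phone": "254700000001", "yield_kg": 10, "price_per_kg": 2.5},
    {"phone": "256700000002", "yield_kg": 4, "price_per_kg": 3.0},
    {"phone": "254700000003", "yield_kg": 6, "price_per_kg": 2.0},
]

countries = [classify_country(t["phone"]) for t in transactions]

# SUMX-style: multiply per row first, then add up the row results
total_revenue = sum(t["yield_kg"] * t["price_per_kg"] for t in transactions)

print(countries)
print(total_revenue)
```

Multiplying total yield by an average price would give a different (and wrong) number, which is why the row-by-row evaluation matters.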

4. Visualising the Story: From Grids to Insights

A dashboard is not just a collection of charts; it is a tool for decision-making. Analysts select specific visuals to answer specific questions, ensuring the report is intuitive for non-technical stakeholders.
• Trends and Comparisons: To show how revenue evolves over time, analysts use Line Charts or Area Charts, which emphasise volume and trends. For comparing categories, such as revenue by county, Column Charts (vertical) or Bar Charts (horizontal) are used.
• Correlations: To test hypotheses, such as "Does higher profit correlate with higher revenue?", analysts use Scatter Charts. If the bubbles trend upward, it indicates a positive correlation, validating the business strategy.
• Managing High-Volume Data: When dealing with many categories (e.g., revenue by county and then by crop type), standard pie charts become cluttered. Analysts use Tree Maps or Decomposition Trees to visualise hierarchies and drill down into the data to understand exactly why a number is high or low.

5. The Executive View: The Dashboard

The final output is the Dashboard—a one-page summary designed to answer the most important questions at a glance.
• Immediate Health Checks: Critical numbers (Total Profit, Total Yield) are placed at the top using KPI Cards or Multi-row Cards. This ensures that decision-makers see the most vital metrics immediately.
• Interactivity: Static reports limit discovery. Analysts add Slicers to allow users to filter the entire dashboard by specific segments, such as "County" or "Crop Type." This transforms a generic report into a tailored tool for specific regional managers.
• AI-Driven Insights: Tools like Q&A allow users to type questions in plain English (e.g., "Total yield by crop type") and receive an instant visual answer, bridging the gap between technical data models and ad-hoc business inquiries.

Conclusion

By mastering these steps—cleaning data in Power Query, modelling with Star Schemas, calculating with DAX, and visualising in Power BI—analysts transform raw, messy data into a coherent narrative that drives real-world business action.

The Blueprint of Intelligence: Mastering Data Modelling and Schemas in Power BI

Jason Ndalamia — Sun, 01 Feb 2026 09:00:53 +0000

In the realm of business analytics, creating a visually stunning dashboard is often the final step of a much deeper process. The true backbone of every successful Power BI solution is data modelling.

Data modelling is the process of identifying, organising, and defining the data a business collects and the relationships between them. It involves creating visual representations of data structures to ensure that reports are not only accurate but also performant and scalable. As data volumes grow, the difference between a sluggish, confusing report and a high-speed analytical tool often comes down to the quality of the underlying model.

The Building Blocks: Fact and Dimension Tables

To understand how to build a model, one must first distinguish between the two types of tables that inhabit it: Fact tables and Dimension tables.

1. Fact Tables

A fact table is the "main table" in your model, typically containing events such as sales transactions, hospital visits, or machine readings.
• Characteristics: They contain quantitative attributes (numbers) meant to be aggregated, such as "Revenue," "Yield," or "Quantity Sold".
• Structure: These tables are usually long and narrow. They often contain duplicate values because an event (like a specific product sale) can occur multiple times. They utilise keys (like Product ID) to link out to other tables.

2. Dimension Tables

Dimension tables contain the descriptive attributes used to slice, group, and filter the data found in fact tables.
• Characteristics: These tables hold information such as "Customer Name," "Product Category," or "Geographic Region".
• Structure: Unlike fact tables, dimension tables should contain unique values for the entity they describe (no duplicates). They are generally wider but contain fewer rows than fact tables.

Schema Design: Star vs. Snowflake

The arrangement of these tables is known as the schema. While different designs exist, the Star Schema is universally recognised as the gold standard for Power BI.

The Star Schema

In a star schema, a central fact table is surrounded by multiple dimension tables, resembling a star.
• Why it is preferred: The Power BI engine is optimised to work best with this structure. It reduces the number of joins required to filter data, creating a cleaner, more organised model.
• Benefits: It ensures DAX measures calculate faster, reports refresh more quickly, and the solution remains scalable even as data volume increases into the millions of rows.

The Snowflake Schema

The snowflake schema is a variant of the star schema where dimension tables are further normalised. In this design, dimensions branch off into other dimensions. For example, a "Product" table might link to a separate "Product Subcategory" table, which in turn links to "Product Category".
• The Trade-off: While this can be useful when fact tables exist at different levels of granularity (e.g., sales by product vs. targets by region), it generally adds unnecessary complexity. Extra relationships force filters to propagate through longer chains, which can negatively impact performance.

One Big Table (OBT)

Beginners often attempt to flatten all data into a single table. While this may work for quick prototyping or ad-hoc analysis, it is considered a transitory state. It limits functionality—such as time intelligence and handling multiple data grains—and often leads to performance challenges due to large file sizes and repetitive data storage.

The Glue: Relationships

A data model is only functional if the tables effectively talk to one another. This is achieved through relationships, which are defined by cardinality and cross-filter direction.

Cardinality

Cardinality defines how rows in one table relate to rows in another.
• One-to-Many: This is the ideal relationship for linking a Dimension table (one unique ID) to a Fact table (many transactions).
• Many-to-Many: This relationship type is problematic and should be avoided whenever possible. It typically arises when connecting two fact tables directly or when dimensions are not unique. Misusing this can lead to "ambiguous" results, duplicated totals, and incorrect reporting.

Directionality

• Single Direction: Filters flow from the "one" side (Dimension) to the "many" side (Fact). This is the recommended setting for most scenarios.
• Bi-directional (Both): This allows filters to flow in both directions. While it can solve specific problems (like filtering a slicer based on available data), it is computationally expensive and can produce unpredictable results by introducing ambiguity into the model path.

Why Good Modelling is Critical

The structure of your data determines the performance, flexibility, and accuracy of your reports.

Performance: Poor modelling choices—such as relying on "One Big Table" or using complex snowflake chains—can slow down data refreshes and visual rendering. Conversely, a star schema minimises the work the engine must do, allowing reports to scale to very large datasets without lagging.
Accuracy: Bad relationships jeopardise data integrity. For instance, analysing monthly sales targets against daily sales data using a many-to-many relationship can cause targets to be duplicated across every day, leading to vastly inflated and incorrect totals. A proper model ensures aggregations (sums, averages) are calculated correctly across different contexts.
Usability: A well-designed star schema groups attributes logically (e.g., all customer details in one Customer table). This makes the "Fields" pane cleaner and easier for end-users to navigate compared to searching through a massive, flat table.
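
The inflated-target problem described under Accuracy can be reproduced in a few lines of Python. The dates and figures are invented; the point is only to show how attaching a monthly target to every daily row multiplies it.

```python
# Hypothetical daily sales (fact rows) and a single monthly target
daily_sales = [
    {"date": "2026-01-05", "month": "2026-01", "sales": 300},
    {"date": "2026-01-12", "month": "2026-01", "sales": 450},
    {"date": "2026-01-20", "month": "2026-01", "sales": 250},
]
monthly_targets = {"2026-01": 1000}

# Wrong: the target is repeated on every daily row and then summed
naive_target_total = sum(monthly_targets[row["month"]] for row in daily_sales)

# Right: each month's target is counted once
months_seen = {row["month"] for row in daily_sales}
correct_target_total = sum(monthly_targets[month] for month in months_seen)

total_sales = sum(row["sales"] for row in daily_sales)

print(naive_target_total, correct_target_total, total_sales)
```

With three daily rows the naive total is three times the real target, so a month that exactly met its goal would look like it reached only a third of it.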

Conclusion

While it is tempting to drag and drop raw data directly into visualisations, investing time in data modelling is non-negotiable for professional analysis. By adhering to the star schema, ensuring one-to-many relationships, and clearly distinguishing between facts and dimensions, developers can build Power BI solutions that are robust, accurate, and lightning-fast.

Introduction to MS Excel for Data Analytics

Jason Ndalamia — Sun, 25 Jan 2026 16:08:49 +0000

When people hear Data Analytics, they often think of complex programming languages like Python or SQL. However, the functions found in Excel are generally the same ones found in Power BI, SQL, and Python—the primary difference is just the syntax used for execution.

This guide will introduce you to MS Excel as a powerful analytics tool, covering everything from basic data cleaning to interactive dashboards using Pivot Tables.

1. Organising Your Data: Sorting and Filtering

Before analysing data, you must ensure it is organised.

Data Sorting: Sorting involves arranging data in a specific order.
Text: Arranges data from A to Z or Z to A.
Numbers: Arranges data from smallest to largest or largest to smallest.
Dates: Arranges data from oldest to newest or newest to oldest.

⚠️ Important Tip: When sorting, always expand the selection when prompted. If you don't, Excel might reorder only the selected column, which will disorient the rest of your data and cause records to be mismatched.
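
The mismatch described in the tip can be illustrated in Python, where each tuple stands for one spreadsheet row. The names and salaries are invented sample data.

```python
# Each tuple is one row: (name, salary)
records = [("Carol", 52000), ("Alice", 61000), ("Bob", 48000)]

# "Expand the selection": whole rows move together
sorted_rows = sorted(records, key=lambda row: row[0])

# Sorting only the name column: salaries stay in their old positions
names_only = sorted(name for name, _ in records)
salaries_unmoved = [salary for _, salary in records]
mismatched = list(zip(names_only, salaries_unmoved))

print(sorted_rows)
print(mismatched)
```

In the second result Alice is shown with Carol's salary, which is exactly the kind of silent corruption the tip warns about.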

Data Filtering allows you to temporarily display only the rows that meet specific criteria while hiding the rest. You can toggle filters on or off using the shortcut Control + Shift + L.

Text Filters: Use these to find cells that "contain" specific words or "begin with" a certain letter.
Number Filters: Use these to filter for "Top 10" items or values "Greater Than" a specific number.

[Image: Sort & Filter dropdown menu in the Home tab]

2. Cleaning Data with Text Functions

Raw data is often messy. Excel provides specific functions to clean and standardise text.

TRIM: Removes leading and trailing spaces and reduces repeated spaces between words to a single space. These extra spaces are often invisible but cause errors.
PROPER: Capitalises the first letter of each word (great for fixing names).
UPPER / LOWER: Converts text entirely to uppercase or lowercase.
CONCAT: Combines two or more text strings into one cell. In older versions of Excel, use CONCATENATE instead; the & (ampersand) operator does the same job in every version.

Messy names and a 'Cleaned' column using =PROPER(TRIM(cell))
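As a rough comparison, here is how the same clean-up might look in Python. This is a sketch rather than an exact match for Excel: `split()` treats tabs and line breaks as spaces too, and the function name `clean_name` and the sample names are invented for the example.

```python
def clean_name(raw: str) -> str:
    """Rough equivalent of =PROPER(TRIM(cell))."""
    trimmed = " ".join(raw.split())  # like TRIM: strip the ends, collapse repeated spaces
    return trimmed.title()           # like PROPER: capitalise the first letter of each word


messy = ["  jOHN   doe ", "MARY  wanjiku", "peter otieno  "]
cleaned = [clean_name(name) for name in messy]
print(cleaned)

print("data".upper(), "DATA".lower())  # UPPER / LOWER
print("Jason" + " " + "Ndalamia")      # CONCAT, or & in Excel
```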

3. Automating Decisions with Logical Functions

Logical functions help you categorise data automatically based on rules.

The IF Function: The IF function performs a test. It returns one value if the test is true, and a different value if it is false.

Example: Imagine you want to categorise salaries. If the salary in cell E2 exceeds 80,000, it is "High"; otherwise, it is "Low".

Formula: =IF(E2 > 80000, "High", "Low")

Nested IFs. If you have more than two categories (e.g., Old, Middle-aged, Young), you can use a Nested IF, which places a second IF function inside the first one.

AND / OR Logic: You can combine IF with AND (where both conditions must be met) or OR (where at least one condition must be met).

AND Example: Assign a bonus only if experience > 30 years AND projects > 10.

Formula: =IF(AND(O2 > 30, P2 > 10), "Assign Bonus", "Do not Assign Bonus")
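The three patterns above map closely onto Python's `if` / `elif` / `else` and the `and` keyword. The sketch below mirrors the two formulas from this section. The age cut-offs in `age_group` (60 and 35) are assumptions picked only to show the nesting, and the function names are illustrative.

```python
def salary_band(salary: float) -> str:
    # =IF(E2 > 80000, "High", "Low")
    return "High" if salary > 80000 else "Low"


def age_group(age: int) -> str:
    # Nested IF: the second test only runs when the first one is false.
    # The cut-offs 60 and 35 are assumptions chosen for this example.
    if age >= 60:
        return "Old"
    elif age >= 35:
        return "Middle-aged"
    return "Young"


def bonus_decision(years_experience: float, projects: int) -> str:
    # =IF(AND(O2 > 30, P2 > 10), "Assign Bonus", "Do not Assign Bonus")
    if years_experience > 30 and projects > 10:
        return "Assign Bonus"
    return "Do not Assign Bonus"


print(salary_band(95000), age_group(40), bonus_decision(31, 11))
```

Note that exactly 80,000 falls into "Low", because the test is strictly greater than, just as in the Excel formula.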

4. Connecting Data with Lookup Functions

Data is often split across different tables. Lookup functions allow you to retrieve data from one table and pull it into another.

VLOOKUP (Vertical Lookup): VLOOKUP searches for a value in the first column of a range and returns the value from the same row in a column you specify.

The Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

lookup_value: The ID you are searching for.
table_array: The range containing your data.
col_index_num: The column number containing the answer (e.g., column 5 for Salary).
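range_lookup: Use FALSE for an exact match, which is recommended for IDs. If you leave it out, Excel defaults to TRUE (approximate match), which expects the first column to be sorted.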

A VLOOKUP formula connecting an Employee ID to their Salary
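In Python, an exact-match lookup is often done with a dictionary keyed by the ID. The sketch below uses a hypothetical two-row employee table and an illustrative helper called `vlookup_exact`. It is only a comparison in spirit, not a reimplementation of VLOOKUP.

```python
# Hypothetical lookup table: Employee ID -> row of details
employee_table = {
    "E001": {"name": "Amina", "department": "Sales", "salary": 88000},
    "E002": {"name": "Brian", "department": "IT", "salary": 95000},
}


def vlookup_exact(lookup_value, table, column, not_found="#N/A"):
    """Exact-match lookup, similar in spirit to VLOOKUP(..., FALSE)."""
    row = table.get(lookup_value)  # find the row whose key equals the ID
    if row is None:
        return not_found           # Excel shows #N/A when nothing matches
    return row[column]             # return the requested column from that row


print(vlookup_exact("E002", employee_table, "salary"))
print(vlookup_exact("E999", employee_table, "salary"))
```

Here the column is chosen by name rather than by position, so the lookup keeps working if columns are reordered.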

5. Mastering Date Functions

Excel stores dates as serial numbers, allowing for powerful calculations.

TODAY: Returns the current date.
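NETWORKDAYS: Calculates the number of working days between two dates (counting both the start and end date), automatically excluding weekends. An optional third argument lets you exclude holidays as well.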
DATEDIF: A "hidden" function that calculates the difference between two dates in years ("y"), months ("m"), or days ("d").
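To make the serial-number idea concrete, here is a small Python sketch of the same date calculations. It assumes Excel's default 1900 date system and dates after February 1900. The helper names are illustrative, and `networkdays` expects the start date to be on or before the end date and ignores holidays.

```python
from datetime import date, timedelta

# Assumption: Excel's default 1900 date system, for dates after February 1900
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d: date) -> int:
    """Serial number Excel stores for a date."""
    return (d - EXCEL_EPOCH).days


def networkdays(start: date, end: date) -> int:
    """Count Monday-Friday days from start to end, inclusive (holidays ignored)."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


def full_years(start: date, end: date) -> int:
    """Completed years between two dates, like DATEDIF(start, end, "y")."""
    anniversary_pending = (end.month, end.day) < (start.month, start.day)
    return end.year - start.year - int(anniversary_pending)


print(excel_serial(date(2024, 1, 1)))                    # 45292
print(networkdays(date(2026, 1, 5), date(2026, 1, 16)))  # 10 (two full working weeks)
print(full_years(date(2020, 3, 15), date(2026, 1, 25)))  # 5
```

Because dates are just numbers underneath, subtracting one date from another gives a number of days, in both Excel and Python.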

6. The Power of Pivot Tables

Pivot tables are the ultimate tool for summarising data. They allow you to aggregate thousands of rows into a clear summary table without writing complex formulas.

How to create one:

Click a single cell inside your data range (avoid selecting the whole sheet).

Go to Insert > Pivot Table.

Drag and Drop fields:

Rows: For categories (e.g., Department).

Values: For numbers to calculate (e.g., Sum of Salary, Count of Employees).

Interactive Slicers: To make your report interactive, insert a Slicer. This is a visual button menu that filters your Pivot Table instantly when clicked.

A Pivot Table with a Slicer for 'Department' next to it.
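Under the hood, a Pivot Table groups rows by a category and aggregates a value for each group. A minimal Python sketch of that idea follows. The sample rows and the names `pivot_by_department` and `slice_rows` are invented for illustration, and the slicer is modelled as a plain filter applied before summarising.

```python
from collections import defaultdict

# Hypothetical source data: one dict per employee row
rows = [
    {"department": "Sales", "salary": 72000},
    {"department": "IT", "salary": 95000},
    {"department": "Sales", "salary": 88000},
]


def pivot_by_department(data):
    """Return {department: {"sum_salary": ..., "count": ...}}."""
    summary = defaultdict(lambda: {"sum_salary": 0, "count": 0})
    for row in data:
        group = summary[row["department"]]    # "Rows" field
        group["sum_salary"] += row["salary"]  # "Values": Sum of Salary
        group["count"] += 1                   # "Values": Count of Employees
    return dict(summary)


def slice_rows(data, department):
    """Slicer-like filter: keep one department before summarising."""
    return [row for row in data if row["department"] == department]


print(pivot_by_department(rows))
print(pivot_by_department(slice_rows(rows, "IT")))
```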

Summary

Excel is more than just a spreadsheet; it is a robust data analytics environment. By mastering text cleaning, logical functions, lookups, and Pivot Tables, you can transform raw data into meaningful insights.

A Beginner’s Guide to Git and GitHub: From Installation to Your First Push

Jason Ndalamia — Fri, 16 Jan 2026 12:52:05 +0000

Starting my journey in Data Science, Analysis, and AI at LUXDevHQ felt like learning a new language while trying to build a house. One of the most important tools I’ve discovered along the way is Version Control.

In this guide, I’ll walk you through:

Setting up Git Bash
Connecting Git to GitHub
Mastering essential push and pull commands

1. What Is Git and Why Does It Matter?

Git is a Version Control System (VCS). Think of it as a save-point system for your code.

Why is Git important?

⏪ Time Travel – If you break your code, you can roll back to a version that worked.
🤝 Collaboration – Multiple people can work on the same project without overwriting each other’s work.
🧪 Experimentation – You can create branches to try new features without affecting the main project.

2. Setting Up Your Environment

Step A: Install Git Bash

Go to the official Git website and download Git for your OS (I used Windows).
Run the installer.
💡 Pro tip: You can keep the default settings for most options.
After installation, search for Git Bash in your applications. It looks like a terminal window.

Step B: Configure Your Identity

To let GitHub know who is uploading code, configure your global Git settings:

git config --global user.name "Your Name"
git config --global user.email "your-email@example.com"

3. Secure Your Connection: Setting Up SSH Keys

Using SSH is a widely used professional practice. It authenticates you with a key pair instead of a password, so you don't have to enter credentials every time you push code.

Step 1: Generate Your SSH Key

Open Git Bash and enter (replace with your GitHub email):

ssh-keygen -t ed25519 -C "your_email@example.com"

• File Location: Press Enter to use the default location.
• Passphrase: As a beginner, I left this empty for convenience. Setting one is safer, because it protects the key if someone else gets access to your computer.

Step 2: Add Key to the SSH Agent

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519

Step 3: Add the Public Key to GitHub

Print your public key, then copy the full output:

cat ~/.ssh/id_ed25519.pub

Go to GitHub: Settings → SSH and GPG keys → New SSH Key.
Give it a name (e.g., "My Learning Laptop") and paste your key into the "Key" box.

Step 4: Test the Connection

Run:

ssh -T git@github.com

Success Check! If you see "Hi [YourUsername]! You've successfully authenticated", you are ready!

4. Navigating and Creating Your Project

Learning to navigate via Git Bash makes you much faster than using a mouse! Use these commands to create your first repository:
• Check Location: pwd (Print Working Directory).
• Go to Desktop: cd Desktop
• Create Folder: mkdir my-first-repo
• Enter Folder: cd my-first-repo

5. Tracking Changes (The Core Workflow)

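Before sending code to GitHub, Git needs to "track" it locally. Make sure the folder contains at least one file (an empty folder gives Git nothing to commit), then run these inside your project folder: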

Initialize Git: git init (Starts tracking the folder).
Check Status: git status (See what files Git notices).
Add Files: git add . (Stages all changes to be saved).
Commit: git commit -m "My first commit" (Creates the "save point").
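If you would like to watch the four steps run without touching a real project, the Python sketch below repeats them in a throwaway folder. It assumes Git is installed and on your PATH (it returns None otherwise). The file name notes.txt, the helper names, and the placeholder identity "Demo User" are all made up for the demo.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(args, cwd):
    """Run one git command in `cwd` and return what it printed."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def first_commit_demo():
    """Repeat init -> status -> add -> commit in a temporary folder."""
    if shutil.which("git") is None:
        return None  # Git is not installed, so there is nothing to demonstrate
    with tempfile.TemporaryDirectory() as folder:
        # Git only commits when there is something to save, so create one file
        Path(folder, "notes.txt").write_text("hello git\n")
        run_git(["init"], folder)
        before = run_git(["status", "--short"], folder)  # "??" marks an untracked file
        run_git(["add", "."], folder)
        # The -c flags set a placeholder identity for this one command only
        run_git(
            ["-c", "user.name=Demo User", "-c", "user.email=demo@example.com",
             "commit", "-m", "My first commit"],
            folder,
        )
        log = run_git(["log", "--oneline"], folder)  # one line per save point
        return before, log


print(first_commit_demo())
```

The temporary folder is deleted when the function finishes, so nothing is left behind on your machine.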

6. Pushing Code to GitHub

Pushing sends your local save points to the cloud.

Step A: Create the Repository on GitHub.com

Log into GitHub, click the + icon → New repository.
Name it (e.g., my-first-project) and keep it Public.
Important: Leave "Add a README" unchecked to avoid conflicts.
Click Create repository.

Step B: Connect and Push

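On the GitHub setup page, click SSH and copy the URL. Then run these commands one by one. The second command renames your local branch to main, because some Git installations still name the first branch master, and the push would fail if the names don't match:

git remote add origin git@github.com:your-username/repo-name.git
git branch -M main
git push -u origin main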

7. Pulling Code from GitHub

If you work on a different computer, use Pull to download the latest updates from the cloud:

git pull origin main

📚 Resources to Keep Learning

• Official Git Documentation
• GitHub Skills: Interactive Courses
• Visualizing Git Commands (Game)

Conclusion: Congratulations! You've just set up a professional dev workflow. Git can be tricky at first, but keep practicing and it will become second nature. If you ran into any issues, drop a comment below and let's help each other out!