<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: John Mwendwa</title>
    <description>The latest articles on Forem by John Mwendwa (@jmwendwa).</description>
    <link>https://forem.com/jmwendwa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1868345%2Fc33547bc-a92a-4c40-a40d-cbcaf7c8d333.jpg</url>
      <title>Forem: John Mwendwa</title>
      <link>https://forem.com/jmwendwa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jmwendwa"/>
    <language>en</language>
    <item>
      <title>A Deep Dive Into Boston’s Airbnb Performance</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Thu, 08 Jan 2026 15:47:19 +0000</pubDate>
      <link>https://forem.com/jmwendwa/a-deep-dive-into-bostons-airbnb-performance-4fmk</link>
      <guid>https://forem.com/jmwendwa/a-deep-dive-into-bostons-airbnb-performance-4fmk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The short-term rental market is constantly shifting, and platforms like Airbnb need more than descriptive statistics to stay ahead. For this project, I took a deep dive into Boston’s Airbnb listings to answer three high-impact business questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; What makes some hosts incredibly successful while others struggle?&lt;/li&gt;
&lt;li&gt; Which Boston neighborhoods show strong demand and which ones are oversaturated?&lt;/li&gt;
&lt;li&gt; What drives exceptional guest experiences, and how can Airbnb detect quality risks early?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My goal wasn’t just to analyze data, but to translate it into real market intelligence that can guide growth strategy, host enablement, and guest experience improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The End-to-End Data Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Preparation in Python
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Loaded the raw Airbnb Boston dataset&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Checked data structure, completeness, and quality&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cleaned missing values, removed duplicates, and treated outliers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performed exploratory data analysis (EDA)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyzed pricing patterns, room types, correlations, and neighborhood distribution&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
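&lt;p&gt;As a rough sketch, the cleaning steps above look something like this in pandas (the data and column names here are illustrative, not the actual dataset schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Illustrative data; 'price' stands in for a real listing column
data = pd.DataFrame({'id': [1, 1, 2, 3],
                     'price': [120.0, 120.0, None, 9999.0]})

data = data.drop_duplicates()                                 # remove duplicates
data['price'] = data['price'].fillna(data['price'].median())  # fill missing values
cap = data['price'].quantile(0.99)                            # treat outliers by capping
data['price'] = data['price'].clip(upper=cap)
print(data.shape)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;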

&lt;h3&gt;
  
  
  2. Data Modeling &amp;amp; Analysis in PostgreSQL
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Loaded the cleaned dataset into PostgreSQL using psycopg2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Designed relational schemas for listings, hosts, and neighborhood data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Used SQL and CTEs to calculate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Host success metrics&lt;/li&gt;
&lt;li&gt;Neighborhood market health&lt;/li&gt;
&lt;li&gt;Oversaturation indicators&lt;/li&gt;
&lt;li&gt;Guest engagement via reviews-per-month&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
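&lt;p&gt;For anyone who wants to reproduce the engagement metric without a PostgreSQL instance, here is a rough pandas equivalent (the column names are my assumptions about the dataset):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Toy listings; in the project this came from the listings table
listings = pd.DataFrame({
    'neighbourhood': ['Back Bay', 'Back Bay', 'Dorchester'],
    'reviews_per_month': [3.2, 2.8, 0.9],
})

# Average reviews-per-month per neighbourhood, highest engagement first
engagement = (listings.groupby('neighbourhood')['reviews_per_month']
                      .mean()
                      .sort_values(ascending=False))
print(engagement)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;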

&lt;h3&gt;
  
  
  3. Dashboard Creation in Power BI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Connected PostgreSQL to Power BI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built DAX-based KPIs for host performance, market health, and guest experience&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Designed interactive dashboards with slicers and buttons&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualized insights for hosts, neighborhoods, and reviews&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. High-performing hosts share common behaviors
&lt;/h3&gt;

&lt;p&gt;They set competitive prices, maintain high-quality listings, and benefit strongly from central neighborhoods like Back Bay, Beacon Hill, and the West End.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Not all neighborhoods are equal
&lt;/h3&gt;

&lt;p&gt;Areas like Longwood Medical Area, Bay Village, and Back Bay show strong demand. Others, such as Dorchester, Roxbury, Jamaica Plain, and Fenway, show signs of oversaturation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Guest experience can be predicted
&lt;/h3&gt;

&lt;p&gt;Listings with high review velocity (more reviews per month) consistently reflect better guest satisfaction and listing quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Help hosts succeed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Provide better pricing guidance to match market demand&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support new hosts with onboarding and listing improvement tips&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Boost visibility for small or new hosts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Manage neighborhood growth strategically
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Focus expansion in high-demand areas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be cautious adding new listings in saturated neighborhoods&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Encourage improvements (quality, pricing) for hosts in slow demand regions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Protect guest experience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Track listings with declining reviews-per-month&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benchmark each listing against similar ones in the same neighborhood&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flag risk listings early and intervene before quality drops affect platform reputation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>datascience</category>
      <category>portfolio</category>
      <category>python</category>
    </item>
    <item>
      <title>SQL and I Had Beef 😂So I Built a Trigger</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Mon, 05 May 2025 10:53:24 +0000</pubDate>
      <link>https://forem.com/jmwendwa/sql-and-i-had-beef-so-i-built-a-trigger-26m9</link>
      <guid>https://forem.com/jmwendwa/sql-and-i-had-beef-so-i-built-a-trigger-26m9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Let me be honest: SQL has been giving me a hard time for a while, and I was getting stuck even on the basics. So I decided to stop running away and build a real project from scratch: a student enrollment system.&lt;/p&gt;

&lt;p&gt;I wanted something simple but useful, where I could understand how real systems are built with SQL, including &lt;em&gt;relationships&lt;/em&gt;, &lt;em&gt;foreign keys&lt;/em&gt;, and &lt;em&gt;triggers&lt;/em&gt;. This is how it went down.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the System Does
&lt;/h2&gt;

&lt;p&gt;This project is all about managing a small school system. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storing student info (name, email, date of birth)&lt;/li&gt;
&lt;li&gt;Keeping track of instructors&lt;/li&gt;
&lt;li&gt;Linking courses to instructors&lt;/li&gt;
&lt;li&gt;Enrolling students to courses&lt;/li&gt;
&lt;li&gt;Logging every enrollment using a &lt;em&gt;trigger&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tables I Created
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Students&lt;/td&gt;
&lt;td&gt;Stores student details like name and date of birth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instructors&lt;/td&gt;
&lt;td&gt;Stores instructor info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Courses&lt;/td&gt;
&lt;td&gt;Each course is taught by one instructor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enrollments&lt;/td&gt;
&lt;td&gt;Shows which student is taking which course&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enrollment_Log&lt;/td&gt;
&lt;td&gt;Automatically logs new enrollments using a trigger&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  SQL Code I Used
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Students Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE,
    date_of_birth DATE
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Instructors Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Instructors (
    instructor_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Courses Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100),
    instructor_id INT,
    FOREIGN KEY (instructor_id) REFERENCES Instructors(instructor_id)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Enrollments Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Enrollments (
    enrollment_id INT PRIMARY KEY,
    student_id INT,
    course_id INT,
    grade CHAR(2),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Enrollment Log Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Enrollment_Log (
    log_id INT PRIMARY KEY AUTO_INCREMENT,
    enrollment_id INT,
    log_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Trigger to Log Enrollments
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TRIGGER LogNewEnrollment
AFTER INSERT ON Enrollments
FOR EACH ROW
INSERT INTO Enrollment_Log (enrollment_id)
VALUES (NEW.enrollment_id);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sample Data I Added
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Instructors
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO Instructors (instructor_id, first_name, last_name, email) VALUES 
(1, 'Thomas', 'Ndegwa', 'thomas.ndegwa@example.com'),
(2, 'Lilian', 'Achieng', 'lilian.a@example.com'),
(3, 'George', 'Kariuki', 'george.kariuki@example.com');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Students
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO Students (student_id, first_name, last_name, email, date_of_birth) VALUES 
(101, 'Alice', 'Mwende', 'alice.mwende@example.com', '2002-04-12'),
(102, 'Brian', 'Otieno', 'brian.otieno@example.com', '2001-08-05');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Courses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO Courses (course_id, course_name, instructor_id) VALUES 
(201, 'Database Systems', 1),
(202, 'Python Programming', 2),
(203, 'Data Analytics', 3);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Enrollments
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO Enrollments (enrollment_id, student_id, course_id, grade) VALUES 
(301, 101, 201, 'A'),
(302, 102, 202, 'B');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
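&lt;p&gt;If you want to sanity-check the trigger without a MySQL server, here is a rough equivalent using Python's built-in sqlite3 module. Note that SQLite's syntax differs slightly from the MySQL code above: it uses AUTOINCREMENT instead of AUTO_INCREMENT, and the trigger body must be wrapped in BEGIN ... END:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Enrollments (
    enrollment_id INTEGER PRIMARY KEY,
    student_id INTEGER,
    course_id INTEGER,
    grade TEXT
);
CREATE TABLE Enrollment_Log (
    log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    enrollment_id INTEGER,
    log_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- SQLite triggers need a BEGIN ... END body, unlike MySQL's single statement
CREATE TRIGGER LogNewEnrollment
AFTER INSERT ON Enrollments
FOR EACH ROW BEGIN
    INSERT INTO Enrollment_Log (enrollment_id) VALUES (NEW.enrollment_id);
END;
INSERT INTO Enrollments (enrollment_id, student_id, course_id, grade)
VALUES (301, 101, 201, 'A');
""")
logged = cur.execute('SELECT enrollment_id FROM Enrollment_Log').fetchall()
print(logged)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;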



&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Through this project, I’ve learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How to design related tables using foreign keys&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to keep data clean and connected&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to use triggers to automate tasks (like logging)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The importance of thinking through the whole structure before writing code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is way better than just memorizing queries like:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I Might Add Later
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Add user logins for students and instructors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store attendance and class schedules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a simple dashboard with Power BI &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This was a small project, but it really helped me understand SQL better. I now see why people say the database is the heart of any system.&lt;br&gt;
If you're learning SQL, I encourage you to try something small like this; it makes everything clearer.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>sql</category>
    </item>
    <item>
      <title>From Chaos to Clarity: Excel Made Easy</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Sat, 26 Apr 2025 16:52:19 +0000</pubDate>
      <link>https://forem.com/jmwendwa/from-chaos-to-clarityexcel-made-easy-bhg</link>
      <guid>https://forem.com/jmwendwa/from-chaos-to-clarityexcel-made-easy-bhg</guid>
      <description>&lt;p&gt;Today, Excel is still one of the best tools we use to understand numbers and make good business decisions.&lt;br&gt;
Whether you are checking sales, looking for patterns, or planning ahead, Excel can really help you a lot.&lt;br&gt;
Let me take you through a project I’ve done recently on sales data:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Transformation: Preparing for Analysis&lt;/strong&gt;&lt;br&gt;
The first and most crucial step in any data project is cleaning and preparing the data. In Excel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Format Dates Properly: Use Excel’s date formatting tools to ensure that fields that have dates data are recognized as real dates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure Numeric Fields Are Clean: Highlight the sales_value column, then on the Home tab, in the Number group, set the format to Number.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handle Missing Data: Highlight the whole dataset, open Find &amp;amp; Select, choose Go To Special, then select Blanks. If you find any, either delete them or fill them with the mean of the column.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Organize the Data: Structure the table clearly, with headers and consistent data types, making it easy to perform calculations later.&lt;br&gt;
A clean dataset is the foundation for reliable insights.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Statistical Analysis: Understanding the Numbers&lt;/strong&gt;&lt;br&gt;
Once your data is ready, you can start by finding out the important numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Total Sales Value: Add up all the sales to see how much money the company made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average Sales Value: Divide the total sales by the number of sales to find out the average amount per sale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total Quantity Sold: Add up the number of items sold.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average Quantity Sold: Find out the average number of items sold per sale.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
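&lt;p&gt;If you ever want to sanity-check Excel's numbers in code, the same metrics look like this in pandas (the data and column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

sales = pd.DataFrame({'sales_value': [200.0, 150.0, 250.0],
                      'quantity': [2, 1, 3]})

total_sales = sales['sales_value'].sum()   # like =SUM(...)
avg_sale = sales['sales_value'].mean()     # like =AVERAGE(...)
total_qty = sales['quantity'].sum()
avg_qty = sales['quantity'].mean()
print(total_sales, avg_sale, total_qty, avg_qty)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;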

&lt;p&gt;&lt;strong&gt;3. Data Analysis: Identifying Patterns and Trends&lt;/strong&gt;&lt;br&gt;
Beyond basic stats, Excel’s PivotTables allow you to break down performance&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sales by Region: Find out which regions are bringing in the most money.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sales by Channel: See which sales methods are working best.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sales by Salesperson: Check which salespeople are performing well and who may need support.&lt;br&gt;
This step helps you see clearly where the strengths and weaknesses are.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
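&lt;p&gt;The same PivotTable-style breakdown can also be sketched in pandas with pivot_table (again, the data and column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

sales = pd.DataFrame({'region': ['East', 'East', 'West'],
                      'channel': ['Online', 'Retail', 'Online'],
                      'sales_value': [100.0, 200.0, 50.0]})

# Sales by region, like a PivotTable with region as rows and sum of sales as values
by_region = sales.pivot_table(index='region', values='sales_value', aggfunc='sum')
print(by_region)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;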

&lt;p&gt;&lt;strong&gt;4. Data Visualization: Building Interactive Dashboards&lt;/strong&gt;&lt;br&gt;
A dashboard is a smart way to show your results nicely.&lt;br&gt;
You can use different charts in Excel like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bar Charts: To compare sales between regions, channels, or salespeople.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Line Charts: To see how sales are changing over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pie Charts: To show how sales are shared between different channels or regions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slicers: To make the dashboard interactive e.g., you can click on a region and see only its data.&lt;br&gt;
A good dashboard makes it easy for anyone to understand the numbers quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Summary and Insights&lt;/strong&gt;&lt;br&gt;
After your analysis and dashboard are ready, it’s time to explain your findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Highlight Top Performers: Recognize best performing regions, channels, and salespeople.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Show Areas That Need Help: Point out regions or channels where sales are low.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give Smart Recommendations: Focus resources on high-performing areas and develop strategies to improve weak spots.&lt;br&gt;
When you explain your findings well, people can make better decisions based on real facts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🎯 Conclusion&lt;/strong&gt;&lt;br&gt;
"Good data in → Great insights out."&lt;br&gt;
Invest time upfront cleaning and organizing your data; the results will always be worth it.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Ultimate Guide to Data Science.</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Sun, 25 Aug 2024 16:03:50 +0000</pubDate>
      <link>https://forem.com/jmwendwa/the-ultimate-guide-to-data-science-ad8</link>
      <guid>https://forem.com/jmwendwa/the-ultimate-guide-to-data-science-ad8</guid>
      <description>&lt;p&gt;Data science is transforming how we perceive and interact with information. It combines statistical, technological, and business expertise to transform raw data into meaningful insights. This article gives a thorough explanation of data science, including the fundamentals, important components, tools, and applications.&lt;/p&gt;

&lt;p&gt;Let’s brainstorm this: do you ever wonder what this data science we keep talking about actually is?&lt;br&gt;
Data science is about extracting useful knowledge from data. It entails gathering, cleaning, analysing, and interpreting data to find patterns and insights. The idea is to use these insights to make educated decisions and drive strategic initiatives. Data science is a combination of several fields, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Statistics: For analysing and understanding data.&lt;/li&gt;
&lt;li&gt;Computer Science: Programming and data manipulation.&lt;/li&gt;
&lt;li&gt;Domain Knowledge: For applying data insights to specific fields.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Components of Data Science
&lt;/h2&gt;

&lt;p&gt;1. Data Collection: Gathering information from a variety of sources, including databases, online questionnaires, and sensors. High-quality data collection is required for accurate analysis.&lt;br&gt;
2. Data Cleaning and Preparation: Ensuring that the data is clean and well-organized. This includes handling missing values, rectifying errors, and converting data to a usable format.&lt;br&gt;
3. Exploratory Data Analysis (EDA): Summarising and visualising data in order to understand its structure and uncover trends. Techniques include descriptive statistics, charts, and graphs.&lt;br&gt;
4. Statistical Analysis: Using statistical tools to test hypotheses and reach conclusions. This may include procedures such as regression analysis and hypothesis testing.&lt;br&gt;
5. Machine Learning and Modelling: Creating and training algorithms to predict or classify data. Decision trees and neural networks are examples of machine learning models used to forecast future trends.&lt;br&gt;
6. Data Visualisation: Creating visual representations of data to make difficult information more understandable. Matplotlib, Tableau, and Power BI are useful tools for presenting data in the form of charts, graphs, and interactive dashboards.&lt;br&gt;
7. Deployment and Integration: Using data models in real-world applications and integrating them with existing systems to provide real-time decision assistance.&lt;/p&gt;
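&lt;p&gt;As a toy illustration, the cleaning, EDA, and analysis steps above can be run in a few lines of pandas (the data is made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

raw = pd.DataFrame({'age': [25, None, 40, 40],
                    'income': [30.0, 45.0, 60.0, 60.0]})

clean = raw.drop_duplicates()              # data cleaning: remove duplicates
clean = clean.fillna(clean.mean())         # handle missing values
summary = clean.describe()                 # EDA: summary statistics
corr = clean['age'].corr(clean['income'])  # statistical analysis: correlation
print(round(corr, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;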

&lt;h2&gt;
  
  
  Outstanding Tools and Technologies
&lt;/h2&gt;

&lt;p&gt;1. Python: A versatile programming language popular in data science thanks to its rich libraries, which include Pandas for data processing, NumPy for numerical computations, and Matplotlib for visualisation.&lt;br&gt;
2. R: A language for statistical analysis and visualisation. R provides packages such as ggplot2 to create complex and customisable graphs.&lt;br&gt;
3. Excel: A spreadsheet tool for organising, analysing, and visualising data. It's excellent for performing rapid analyses and making simple charts.&lt;br&gt;
4. SQL: Essential for querying and managing relational databases, as well as efficiently retrieving and processing data.&lt;br&gt;
5. Tableau: Well-known for its ability to generate interactive and shareable dashboards, making it ideal for data visualisation and insight discovery.&lt;br&gt;
6. Power BI: A Microsoft application that works well with other Microsoft products and is used to generate dynamic reports and visualisations.&lt;br&gt;
7. GitHub: Used for version control and collaboration, GitHub enables data scientists to manage code, track changes, and collaborate on projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Application of Data Science
&lt;/h2&gt;

&lt;p&gt;Data science is utilised in many fields to foster innovation and efficiency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Marketing: Predicting customer behaviour and optimising marketing methods.&lt;/li&gt;
&lt;li&gt;Finance: Forecasting market trends and controlling risk.&lt;/li&gt;
&lt;li&gt;Healthcare: Optimising patient outcomes and tailoring treatment methods.&lt;/li&gt;
&lt;li&gt;Retail: Analysing sales data to optimise inventory and pricing.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>THE ULTIMATE GUIDE TO FEATURE ENGINEERING</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Mon, 19 Aug 2024 11:41:27 +0000</pubDate>
      <link>https://forem.com/jmwendwa/the-ultimate-guide-to-feature-engineering-1mpp</link>
      <guid>https://forem.com/jmwendwa/the-ultimate-guide-to-feature-engineering-1mpp</guid>
      <description>&lt;p&gt;"Data Engineering is like you taking all the frustrating parts of being a data analyst and combining them with all the frustrating parts of being a software engineer.&lt;br&gt;
Until payday and you forget about the frustrations for a moment, haha!!!"&lt;/p&gt;

&lt;p&gt;Hey there, if you are into machine learning and want to get the most out of your data, you should probably try feature engineering in your endeavors.&lt;br&gt;
Feature engineering is essentially the process of converting raw data, like numbers and dates, into something that your machine learning model can comprehend and use.&lt;br&gt;
The better the features, the better your model will perform.&lt;br&gt;
Even with the best algorithms, your model is only as good as the information you feed it. Good features equal better accuracy; it's like giving your model glasses so it can see clearly.&lt;/p&gt;

&lt;p&gt;Now let’s take a look at types of features and know what we are missing out on;   &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Numerical Features: Continuous values, like age, wages, or height.&lt;/li&gt;
&lt;li&gt;Categorical Features: Distinct groups, like gender (male/female) or city (Nairobi, Mombasa).&lt;/li&gt;
&lt;li&gt;Ordinal Features: Categorical traits that have a natural order, such as education level or grades (A, B, C).&lt;/li&gt;
&lt;li&gt;Temporal Features: Date- and time-related information, such as timestamps.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s look at some cool feature engineering tricks:&lt;br&gt;
1. Clean Your Data: Handle missing values and outliers.&lt;br&gt;
Missing value handling techniques include imputation (mean, median, mode), the use of algorithms that support missing values, and the removal of rows/columns with missing data.&lt;br&gt;
Outlier detection and removal techniques include statistical methods (e.g., Z-scores, IQR) or model-based approaches to identify and manage outliers.&lt;br&gt;
2. Encode Categories: Convert categories into numbers.&lt;br&gt;
Label Encoding transforms categories into numerical labels (e.g., Male = 0, Female = 1).&lt;br&gt;
One-Hot Encoding creates binary columns for each category (e.g., gender_Male and gender_Female).&lt;br&gt;
Target Encoding replaces categories with the target variable's mean for each category.&lt;br&gt;
3. Scale Features: Ensure that all of your data is on the same scale so that no one feature overpowers another.&lt;br&gt;
Normalization scales features to a specific range, usually between 0 and 1.&lt;br&gt;
Standardization scales features to have zero mean and unit variance.&lt;br&gt;
Robust scaling uses percentiles, making it less vulnerable to outliers.&lt;br&gt;
4. Create New Features: Merge or alter existing features to create new ones.&lt;br&gt;
For example, multiply two features or generate time-based features such as "day of the week."&lt;/p&gt;
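&lt;p&gt;To make tricks 2 and 3 concrete, here is a tiny pandas sketch of one-hot encoding followed by standardization (the data is made up for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

df = pd.DataFrame({'city': ['Nairobi', 'Mombasa', 'Nairobi'],
                   'salary': [50.0, 60.0, 70.0]})

# One-hot encoding: one binary column per city
df = pd.get_dummies(df, columns=['city'])

# Standardization: zero mean, unit variance
df['salary'] = (df['salary'] - df['salary'].mean()) / df['salary'].std()
print(sorted(df.columns))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;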

&lt;p&gt;Here are some best practices:&lt;br&gt;
1. Understand Your Data: Perform some analysis to determine what's going on.&lt;br&gt;
2. Start Simple: Don't overcomplicate things; start with basic features.&lt;br&gt;
3. Test and Iterate: Experiment with different features to find what works best.&lt;br&gt;
4. Avoid Data Leakage: Using future data to forecast the past is like cheating on a test.&lt;br&gt;
5. Document Your Work: Take notes on what you've done so you can repeat it if necessary.&lt;/p&gt;

&lt;p&gt;That’s a lot we have looked at. It’s amazing, right? Amid the excitement, we also have to take caution with two things so that we stay on the safer side:&lt;br&gt;
1. Don't Overfit: Adding too many features may make your model perform well on existing data but poorly on new data.&lt;br&gt;
2. Use Domain Knowledge: Rather than relying just on tools, construct relevant features based on your understanding of the data.&lt;/p&gt;

&lt;p&gt;All the best!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>UNDERSTANDING YOUR DATA: THE ESSENTIALS OF EXPLORATORY DATA ANALYSIS</title>
      <dc:creator>John Mwendwa</dc:creator>
      <pubDate>Sun, 11 Aug 2024 14:02:27 +0000</pubDate>
      <link>https://forem.com/jmwendwa/understanding-your-data-the-essentials-of-exploratory-data-analysis-3bie</link>
      <guid>https://forem.com/jmwendwa/understanding-your-data-the-essentials-of-exploratory-data-analysis-3bie</guid>
      <description>&lt;p&gt;Exploratory Data Analysis (EDA) is an essential phase in data science that allows you to better understand your data, identify trends, and obtain insights.&lt;br&gt;
EDA analyzes and visualizes data to uncover relationships, detect abnormalities, and verify data quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importance of EDA:&lt;/strong&gt;&lt;br&gt;
1. Summarize the main characteristics of the data&lt;br&gt;
2. Gain a better understanding of the data set&lt;br&gt;
3. Uncover relationships between different variables&lt;br&gt;
4. Extract important variables for the problem being solved.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Outlier Detection: Identifies odd data points that may impact the analysis.&lt;/li&gt;
&lt;li&gt;Error Detection: Assists in identifying and correcting errors in data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Tools and Techniques;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Jupyter Notebook: An interactive environment for EDA (has tools like pandas and NumPy)&lt;/li&gt;
&lt;li&gt;Visualization Libraries: Tools like Matplotlib and Seaborn for creating visualizations.&lt;/li&gt;
&lt;li&gt;Summary Statistics: Key metrics to summarize data (e.g., mean, standard deviation).&lt;/li&gt;
&lt;li&gt;Data Transformation: Techniques like normalization to improve data quality.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Steps involved in EDA;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data Collection - Collecting data for your analysis.&lt;br&gt;
Collect data from several sources e.g., databases.&lt;br&gt;
Ensure that the data is correct, complete, and relevant to your situation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Cleaning – Involves Preparing data by removing inconsistencies and inaccuracies.&lt;br&gt;
A) Remove duplicates: Make sure each record is unique and meaningful&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check for duplicate records
duplicate_mask = data.duplicated()
# Count the number of duplicate records
num_duplicates = duplicate_mask.sum()
print(f"Number of duplicate records: {num_duplicates}")
# View duplicate records
if num_duplicates &amp;gt; 0:
    print("Duplicate records:")
    print(data[duplicate_mask])
# Optionally: remove duplicate records
# data = data.drop_duplicates()
# Save the cleaned dataset (if duplicates were removed)
# data.to_csv('cleaned_weather_dataset.csv', index=False)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;B) Handle Missing Values: Determine whether to eliminate or fill in missing data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check for null &amp;amp; missing values
data.isnull().sum()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;C) Standardizing data: Ensuring consistency in data formats&lt;br&gt;
D) Correcting errors: Fixing any data entry mistakes&lt;/p&gt;

&lt;p&gt;3. Data Visualization&lt;br&gt;
Involves using visual tools to investigate the distribution, trends, and relationships in data.&lt;br&gt;
A) Histograms: Illustrate the distribution of a single variable.&lt;br&gt;
B) Box plots: Highlight the distribution and identify outliers.&lt;br&gt;
C) Scatter plots: Investigate the correlations between two variables.&lt;br&gt;
D) Bar charts: Compare categorical data.&lt;/p&gt;
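&lt;p&gt;A minimal histogram (A above) in matplotlib looks like this; the 'Agg' backend lets it run without a display, and the values are made up for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4]
fig, ax = plt.subplots()
ax.hist(values, bins=4)
ax.set_title('Distribution of a single variable')
fig.savefig('histogram.png')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;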

&lt;p&gt;4. Statistical Analysis&lt;br&gt;
Using statistical measures to summarize the data and analyze its key properties.&lt;/p&gt;

&lt;p&gt;A) Summary Statistics: Determine the mean, median, standard deviation, and other metrics.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.describe()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;B) Correlation Analysis: Determine the relationship between variables using a heat map.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#heatmaps to identify relationships between different weather parameters.
correlation_matrix = numeric_data.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5. Interpretation and Insight&lt;br&gt;
Drawing conclusions and producing insights from the analysis.&lt;br&gt;
A) Interpret visualizations and statistics: Understand the data's patterns, trends, and relationships.&lt;br&gt;
B) Generate hypotheses: Create hypotheses based on the EDA to guide future analysis.&lt;br&gt;
C) Document Findings: Clearly document findings, any effects, and any issues discovered in the data.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
