Forem: Juwon?🍀

DIABETES PREDICTION APP WITH MACHINE LEARNING

Juwon?🍀 — Sun, 06 Apr 2025 01:00:37 +0000

PROJECT INTRODUCTION

Diabetes is a condition that affects how your body processes sugar (glucose). Typically, your body uses insulin to help regulate blood sugar levels, but in diabetes, this process gets disrupted. There are two main types:

Type 1 Diabetes: The body produces little or no insulin. It usually develops early in life and requires insulin injections.
Type 2 Diabetes: Occurs when the body either doesn’t produce enough insulin or can’t use it properly. It’s more common and often linked to lifestyle factors like diet and exercise.

If left unmanaged, diabetes can lead to serious health problems, but with the right care, like a balanced diet, exercise, and medication, it can be controlled. That’s where this Diabetes Prediction App comes in, helping people get an early indication and take action!

PROJECT AIM

The dataset for this project was downloaded from Kaggle. The project aims to develop an app that predicts whether a patient is diabetic. Along the way, the data is cleaned and visualized to gain insight. A Logistic Regression model and a Random Forest classifier are trained, and the better-performing model is used to determine whether a patient is diabetic or not.

DATASET OVERVIEW

The dataset is obtained from https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Import Required Libraries

#Import required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

sns.set_style('whitegrid')

Load The Dataset

df = pd.read_csv('diabetes.csv')
df.head(5)

Get Dataset Information

#information of dataset

df.info()

Information about dataset attributes

Pregnancies: To express the Number of pregnancies
Glucose: To express the Glucose level in blood
BloodPressure: To express the Blood pressure measurement
SkinThickness: To express the triceps skin fold thickness
Insulin: To express the Insulin level in blood
BMI: To express the Body mass index
DiabetesPedigreeFunction: To express a score of diabetes likelihood based on family history
Age: To express the age
Outcome: To express the final result, 1 is Yes and 0 is No

Dataset Statistics

#check statistics of dataset

df.describe().T

Observation:

The dataset's statistics show that the minimum values of Glucose, Blood Pressure, Skin Thickness, Insulin, and BMI cannot realistically be 0, so this is a case that must be dealt with.

Data Handling

We would check for missing values in this aspect and handle them accordingly.

#Check for missing values

df.isna().sum()

Observation:

No missing values in the dataset.

Handling Zero Values

In this aspect, we would handle the zeros in the dataset.

Firstly, we would check where the zero appears.

#check for where 0 is present in each column

print(df[df['Glucose'] == 0].shape[0])
print(df[df['BloodPressure'] == 0].shape[0])
print(df[df['SkinThickness'] == 0].shape[0])
print(df[df['Insulin'] == 0].shape[0])
print(df[df['BMI'] == 0].shape[0])

Output: 5 35 227 374 11
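The five separate checks can also be written as one vectorized expression. Here is a minimal sketch on a small made-up frame (the values in `toy` are illustrative and are not taken from the Kaggle file):

```python
import pandas as pd

# Toy stand-in for the diabetes data; values are made up for illustration
toy = pd.DataFrame({
    'Glucose': [148, 0, 183, 89],
    'BloodPressure': [72, 66, 0, 0],
    'BMI': [33.6, 26.6, 23.3, 0.0],
})

# Comparing the whole frame to 0 gives booleans; summing them counts the zeros per column
zero_counts = (toy == 0).sum()
print(zero_counts)
```

On the real data, the same idea would be applied to the five affected columns only, since a 0 in Pregnancies or Outcome is a valid value.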

Next, we would visualize the plot to check the distribution of each column.

#Check distribution of each column in the dataset

df.hist(figsize=(20,20))
plt.show()

Observation:

Some of the columns have skewed distributions, and the mean is more affected by outliers than the median. Glucose and Blood Pressure are approximately normally distributed; hence, we replace 0 values in those columns with the mean. Skin Thickness, Insulin, and BMI have skewed distributions; hence, the median is a better choice as it is less affected by outliers.

#Handling Zero Values

df['Glucose'] = df['Glucose'].replace(0, df['Glucose'].mean())
df['BloodPressure'] = df['BloodPressure'].replace(0, df['BloodPressure'].mean())
df['SkinThickness'] = df['SkinThickness'].replace(0, df['SkinThickness'].median())
df['Insulin'] = df['Insulin'].replace(0, df['Insulin'].median())
df['BMI'] = df['BMI'].replace(0, df['BMI'].median())
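One caveat: `df['Glucose'].mean()` is computed while the zeros are still in the column, so the fill value is pulled slightly downward. A possible alternative, sketched here on made-up values (the frame `toy` is hypothetical), is to mark zeros as missing first so that the statistic only uses valid readings:

```python
import numpy as np
import pandas as pd

# Made-up readings; 0 stands for a missing measurement
toy = pd.DataFrame({
    'Glucose': [100.0, 0.0, 140.0, 120.0],
    'Insulin': [0.0, 80.0, 0.0, 120.0],
})

# Treat zeros as missing so they do not enter the mean/median
toy = toy.replace(0, np.nan)

# Mean for the roughly symmetric column, median for the skewed one
toy['Glucose'] = toy['Glucose'].fillna(toy['Glucose'].mean())
toy['Insulin'] = toy['Insulin'].fillna(toy['Insulin'].median())
print(toy)
```

Whether this changes the model noticeably depends on how many zeros a column has; it matters most for Insulin and SkinThickness, where zeros are frequent.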

Data Visualizations

In this aspect, we would perform a simple visualization where we check the relationship between the target column (Outcome) and the other columns.

#Get numerical columns

num_col = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age']

#Visualize columns in respect to the outcome.

# Number of rows needed (assuming you want 2 histograms per row)
nrows = (len(num_col) + 1) // 2  # this will round up the division

fig, axes = plt.subplots(nrows=nrows, ncols=2, figsize=(10, nrows * 5))

# Flatten axes array to make it easier to iterate over
axes = axes.flatten()

for i, col in enumerate(num_col):
    sns.histplot(df, x=col, hue=df['Outcome'], ax=axes[i])
    axes[i].set_title(f'Distribution of {col} by Outcome')

# Hide any unused subplots if there are an odd number of columns
for j in range(i + 1, len(axes)):
    axes[j].axis('off')

plt.tight_layout()
plt.show()

Correlation Heatmap

#correlation heatmap

plt.figure(figsize=(10,6))
sns.heatmap(df.corr(), annot=True, fmt=' .2f')
plt.title('CORRELATION HEATMAP')
plt.show()

Data Preparation

In this aspect, I would first scale the dataset using the standard scaler and split into X(Feature variable) and y(Target variable).

#Scale data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X = pd.DataFrame(scaler.fit_transform(df.drop(columns=['Outcome'])), columns=df.columns[:-1])

y = df['Outcome']
y

Then, I would split the dataset into train and test splits using scikit-learn's train_test_split.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

Observation:

The dataset was split into Features [X] and Target[y] variable
It was then split into our Train and Test splits using train_test_split.
The dataset was split into 80% train data and 20% test data.
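Note that the code above fits the scaler on all rows before the split, so the test rows influence the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below uses synthetic data and hypothetical names (`X_demo`, `y_demo`, `demo_scaler`); it is not the article's original pipeline:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50, scale=10, size=(100, 3))   # synthetic features
y_demo = np.array([0, 1] * 50)                          # synthetic, balanced labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Statistics (mean, std) come from the training rows only
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
# Test rows reuse the training mean and std, as the deployed app would for new inputs
X_te_scaled = demo_scaler.transform(X_te)
```

This mirrors what happens at prediction time, where a new patient's values are scaled with statistics learned beforehand.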

Model Selection and Evaluation

We used two models for this prediction project:

Logistic Regression: a statistical method used to predict the probability of a binary outcome (like yes/no, 0/1) based on one or more independent variables, essentially predicting the likelihood of an event occurring.
Random Forest Classifier: a machine learning algorithm that uses an ensemble of decision trees to classify data, making predictions by combining the votes of the individual trees. It’s a powerful and versatile tool known for its accuracy and efficiency.

Logistic Regression

Build the model for prediction.

#import required libraries
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

lr = LogisticRegression()
lr.fit(X_train,y_train)

#predictions
train_pred = lr.predict(X_train)  #prediction on training set
test_pred = lr.predict(X_test)    #Prediction on test set

#Accuracy scores
train_acc = accuracy_score(y_train,train_pred)
test_acc = accuracy_score(y_test, test_pred)

print('Train Set Accuracy: ', train_acc * 100)
print('Test Set Accuracy: ', test_acc * 100)

print()

#Confusion matrix and classification report
print('Confusion Matrix:\n', confusion_matrix(y_test,test_pred))
print('Classification Report:\n', classification_report(y_test,test_pred))

#Visualize the Logistic Regression confusion matrix

#compute the matrix from the test predictions instead of typing the numbers by hand
conf_matrix = confusion_matrix(y_test, test_pred)

#convert to dataframe
df_cm = pd.DataFrame(conf_matrix, index=['Actual Negative', 'Actual Positive'], columns=['Predicted Negative', 'Predicted Positive'])

#heatmap
plt.figure(figsize=(8,6))
sns.heatmap(df_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Logistic Regression Confusion Matrix')
plt.show()

Observation:

The model achieves 79.48% accuracy on the training set and 70.78% accuracy on the test set, indicating a moderate drop in performance, which suggests some overfitting.
From the confusion matrix, the model correctly classifies 82 non-diabetic patients but misclassifies 18 as diabetic. It also correctly classifies 27 diabetic patients but misclassifies 27 as non-diabetic, which may indicate difficulty in distinguishing diabetic cases.
The classification report shows that the model has higher precision (0.75) and recall (0.82) for non-diabetic cases compared to diabetic cases (precision = 0.60, recall = 0.50). This suggests that the model is better at identifying non-diabetic patients but struggles with diabetic cases, likely due to class imbalance or feature representation.
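The figures in the classification report follow directly from the four cells of the confusion matrix. As a worked check, this short sketch recomputes them from the Logistic Regression counts reported above (the variable names are just for illustration):

```python
# Logistic Regression confusion matrix from above: rows = actual, columns = predicted
tn, fp = 82, 18   # actual non-diabetic: correctly / wrongly classified
fn, tp = 27, 27   # actual diabetic: missed / correctly classified

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_pos = tp / (tp + fp)   # of those predicted diabetic, how many really are
recall_pos = tp / (tp + fn)      # of those actually diabetic, how many were found
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)

print(f'accuracy      = {accuracy:.4f}')
print(f'diabetic      : precision = {precision_pos:.2f}, recall = {recall_pos:.2f}')
print(f'non-diabetic  : precision = {precision_neg:.2f}, recall = {recall_neg:.2f}')
```

A recall of 0.50 for the diabetic class means half of the diabetic patients in the test set are missed, which is the more costly error for a screening tool.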

Random Forest Classifier

Build the model for prediction.

#import required libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV

#hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'max_depth': [10, 20 ,30],
    'min_samples_split': [2, 5, 10]
}

#Perform gridsearch with cross validation
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train,y_train)

#get the best estimator
print('Best param: ', grid.best_params_)
rfc = grid.best_estimator_

#predictions
rf_train_pred = rfc.predict(X_train)
rf_test_pred = rfc.predict(X_test)

#Accuracy score
rf_train_acc = accuracy_score(y_train,rf_train_pred)
rf_test_acc = accuracy_score(y_test, rf_test_pred)

print('Train Set Accuracy: ', rf_train_acc * 100)
print('Test Set Accuracy: ', rf_test_acc * 100)
print()

#Confusion matrix and classification report
print('Confusion Matrix:\n', confusion_matrix(y_test,rf_test_pred))
print('Classification Report:\n', classification_report(y_test,rf_test_pred))

#visualize the confusion matrix

#compute the matrix from the test predictions instead of typing the numbers by hand
rf_matrix = confusion_matrix(y_test, rf_test_pred)

#convert to dataframe
rf_df = pd.DataFrame(rf_matrix, index=['Actual Negative', 'Actual Positive'], columns=['Predicted Negative', 'Predicted Positive'])

#heatmap
plt.figure(figsize=(8,6))
sns.heatmap(rf_df, annot=True, fmt='d', cmap='Blues')
plt.title('Random Forest Confusion Matrix')
plt.show()

Observation:

The model’s training accuracy rose to 93.16% and test accuracy to 76.62%. Test performance is better than Logistic Regression's, but the wider gap between the two scores points to more overfitting.
The confusion matrix indicates that the model correctly classifies 83 non-diabetic and 35 diabetic patients, with fewer misclassifications compared to the previous model. However, 17 non-diabetic and 19 diabetic patients are still misclassified.
The classification report shows an improvement in detecting diabetic cases (precision = 0.67, recall = 0.65, f1-score = 0.66), meaning the model is now slightly better at identifying diabetes, though it still favors non-diabetic predictions (precision = 0.81, recall = 0.83).

Save The Model

The Random Forest classifier is the better-performing model; it will be saved using the pickle library for use in our app. The standard scaler is also saved so that, when a user enters details, the app can scale the inputs before passing them to the model for prediction.

#import required library
import pickle

#the with-blocks close each file once the object is written
with open('model.pkl', 'wb') as f:
    pickle.dump(rfc, f)
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
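An alternative to keeping two separate files in sync is to bundle the scaler and the classifier into one scikit-learn pipeline and pickle that single object. This is only a sketch under assumptions: the data is synthetic, and the names `pipe`, `X_demo`, and `pipeline_demo.pkl` are hypothetical rather than part of the project.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))           # synthetic stand-in for the 8 features
y_demo = (X_demo[:, 1] > 0).astype(int)     # synthetic labels

# Scaling and classification travel together as one object
pipe = make_pipeline(StandardScaler(),
                     RandomForestClassifier(n_estimators=20, random_state=0))
pipe.fit(X_demo, y_demo)

path = os.path.join(tempfile.mkdtemp(), 'pipeline_demo.pkl')  # hypothetical file name
with open(path, 'wb') as f:
    pickle.dump(pipe, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored pipeline scales raw inputs itself before predicting
same = bool((restored.predict(X_demo) == pipe.predict(X_demo)).all())
print(same)
```

With this layout the app would call `predict` on the raw inputs and would not need a separate `scaler.transform` step.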

BUILD AND DEPLOY THE APP

Now, we would build and deploy the app using STREAMLIT.

import streamlit as st
import pickle
import numpy as np
import time

# Load the trained model and scaler
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)

# Streamlit app styling
st.markdown(
    """
    <style>
        body { background-color: #f4f4f4; }
        .main-title { text-align: center; color: #2c3e50; font-size: 36px; font-weight: bold; }
        .sub-text { text-align: center; font-size: 18px; color: #7f8c8d; }
        .stButton > button { width: 100%; background: linear-gradient(to right, #4CAF50, #2ecc71); color: white; }
        .result-box { padding: 20px; border-radius: 10px; text-align: center; font-weight: bold; }
    </style>
    """,
    unsafe_allow_html=True
)

# Title
st.markdown("""
    <h1 class='main-title'>🩺 Diabetes Prediction App</h1>
    <p class='sub-text'>This app predicts the likelihood of diabetes based on patient medical details.</p>
""", unsafe_allow_html=True)

# Sidebar for user inputs
st.sidebar.header("Enter Patient Details 🏥")

pregnancies = st.sidebar.slider('Pregnancies 🤰', 0, 20, 0)
glucose = st.sidebar.slider('Glucose Level 🍬', 0, 300, 120)
blood_pressure = st.sidebar.slider('Blood Pressure 💉', 0, 200, 80)
skin_thickness = st.sidebar.slider('Skin Thickness 📏', 0, 100, 20)
insulin = st.sidebar.slider('Insulin Level 💊', 0, 900, 80)
bmi = st.sidebar.slider('BMI ⚖️', 0.0, 70.0, 25.0, step=0.1)
dpf = st.sidebar.slider('Diabetes Pedigree Function 🧬', 0.0, 3.0, 0.5, step=0.01)
age = st.sidebar.slider('Age 🎂', 0, 120, 30)

# Prediction button
if st.sidebar.button('🔍 Predict Diabetes'):
    # Create input array
    input_data = np.array([[pregnancies, glucose, blood_pressure, skin_thickness, insulin, bmi, dpf, age]])

    # Scale the data
    input_data_scaled = scaler.transform(input_data)

    # Loading animation
    with st.spinner('Analyzing medical data... ⏳'):
        time.sleep(2)

    # Make prediction
    prediction = model.predict(input_data_scaled)

    # Display the result with better styling and animations
    if prediction[0] == 1:
        st.markdown("""
            <div class='result-box' style='background-color: #ff4d4d; color: white;'>
                🚨 <strong>Prediction: DIABETES DETECTED!</strong><br> Please consult a medical professional.
            </div>
        """, unsafe_allow_html=True)
    else:
        st.markdown("""
            <div class='result-box' style='background-color: #4CAF50; color: white;'>
                ✅ <strong>Prediction: NO DIABETES!</strong><br> Maintain a healthy lifestyle! 🏃‍♂️🥗
            </div>
        """, unsafe_allow_html=True)

Above are images of the app running that predicts No Diabetes or Diabetes Detected.

CONCLUSION

In this article, using the Diabetes dataset, we have demonstrated an end-to-end machine learning and deployment project. Data cleaning and visualization were our first steps. Then, to give the machine learning models better data to train on, the data was scaled using the standard scaler. After that, we built two models, Logistic Regression and the Random Forest Classifier. The Random Forest was the better-performing model, so it was saved and used to build our app with Streamlit. The model can still be improved using more advanced machine learning models, which were not discussed here, as the main purpose of this article is to show the usage of the Random Forest classifier and Streamlit app building.

You can check out the GitHub file here: Raw File

Deployed App: Diabetes Prediction App

PREDICTION OF GLOBAL RESTAURANT RATINGS WITH MACHINE LEARNING

Juwon?🍀 — Mon, 31 Mar 2025 21:31:12 +0000

PROJECT INTRODUCTION

The rating of a restaurant is an important form of feedback, offering either praise or criticism of the restaurant's services. This is a regression analysis project. Its main aims are to understand the dataset and carefully handle missing values, perform Exploratory Data Analysis (EDA), perform feature engineering, and create machine learning models such as Linear Regression, Decision Tree Regression, and Random Forest Regression to help predict the rating of each restaurant.

DATASET OVERVIEW

The dataset is obtained from (https://raw.githubusercontent.com/Oyeniran20/axia_class_cohort_7/refs/heads/main/Dataset%20.csv).

Import Required Libraries

#importing the required libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

%matplotlib inline

sns.set_style('whitegrid')

Load The Dataset

#load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/Oyeniran20/axia_class_cohort_7/refs/heads/main/Dataset%20.csv')
df.head()

Get Dataset Information

#Get information of the dataset

df.info()

The restaurant dataset contains 21 columns and 9551 rows. The columns and their importance are explained below:

Restaurant ID and Restaurant Name are two columns that provide information about each restaurant’s name and ID.
The Country Code, City, Address, Locality, Locality Verbose, Longitude, and Latitude columns provide location information about each restaurant.
Cuisines provides information about each type of cuisine offered in a restaurant.
The range of prices, the average cost for two people, and the currency offer information about the type of money and amount of each dish, which are also key features in predicting the target variable.
Rating color, text, and Votes provide information about each restaurant’s rating and are also key features in predicting the target variable.
The aggregate rating, which is our target variable, is the rating each restaurant receives.

Dataset Statistics

#Check for the statistics of the dataset and Transpose

df.describe().T

Observation:

We can notice outliers in the Average Cost for two column: the max is far larger than the 25%, 50%, and 75% values.
We can also notice outliers in the Votes column: the max is also far larger than the 25%, 50%, and 75% values.

DATA HANDLING

We would check for missing values in this aspect and handle them accordingly.

#check for missing data

df.isna().sum().sort_values(ascending=False)

Handling Missing Values

#Drop the missing values

df.dropna(inplace=True)

Observation:

We have just 9 missing values in the Cuisines column; this is a very small percentage of the data, so it is best to drop those rows.

Now that we have reviewed the dataset, looked at its first rows, gathered information to understand it better, and handled missing values, the next step is our Exploratory Data Analysis (EDA).

EXPLORATORY DATA ANALYSIS

In this aspect, we explore the dataset to understand it better. This part is mostly visualization: showing relationships between different columns and the target variable, and checking for outliers and skewness.

Let's get into it.

Firstly, we would check the distribution of our target variable, which is the Aggregate rating column, using the histogram plot and the boxplot.

#Histogram
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
sns.histplot(df['Aggregate rating'], kde=True, bins=20,color='lightgreen')
plt.title('Aggregate Rating Distribution')
plt.xlabel('Aggregate Rating')

#Boxplot
plt.subplot(1,2,2)
sns.boxplot(x='Aggregate rating', data=df, color='lightblue')
plt.title('Aggregate Rating Boxplot')
plt.xlabel('Aggregate Rating')

plt.show()

Observation:

From the results obtained from the frequency count of the Aggregate rating by the Rating text, it can be seen that 0 appeared 2148 times as not-rated restaurants.
From the histogram, it can be deduced that the Aggregate rating is negatively skewed.
From the boxplot, it can be noticed that 0 is the only outlier in the plot.

Next, we would identify the top 5 cuisines and the top cities.

Top 5 cuisines

#Visualize the Top5 cuisines
top_cuisines = df['Cuisines'].dropna().str.split(', ').explode().value_counts().head(5)
plt.figure(figsize=(10,6))
plot = sns.barplot(x=top_cuisines.index, y=top_cuisines.values, color='lightgreen')
for bars in plot.containers:
    plot.bar_label(bars)
plt.title('Top 5 Cuisines')
plt.show()

Top 5 cities

#Visualize the Top 5 cities
plt.figure(figsize=(10,6))
plot = sns.countplot(x='City', data=df, order=df['City'].value_counts().index[:5],color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars)
plt.title('Top 5 Cities')
plt.xlabel('Cities')
plt.show()

Next, we would compare average ratings across cuisines and cities;

Average rating across cuisines

#Compare Average rating across cuisines

#split cuisines into one row per cuisine, keeping each cuisine aligned with its restaurant's rating
cuisine_df = df.assign(Cuisines=df['Cuisines'].str.split(', ')).explode('Cuisines')

plt.figure(figsize=(10,6))
plot = cuisine_df.groupby('Cuisines')['Aggregate rating'].mean().sort_values(ascending=False).head(10).plot(kind='bar',color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars, fmt='%.2f')
plt.xlabel('Cuisines')
plt.ylabel('Rating')
plt.title('Top 10 Cuisines By Average Ratings')
plt.xticks(rotation=45)
plt.show()
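When a cell holds several cuisines, `DataFrame.explode` gives each one its own row while repeating the other columns, so every cuisine stays attached to the rating of the restaurant it came from. A small sketch on made-up rows (the frame `toy` is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    'Cuisines': ['Italian, Pizza', 'Cafe'],
    'Aggregate rating': [4.0, 3.0],
})

# Split into lists, then give every cuisine its own row; the rating is repeated alongside
long_toy = toy.assign(Cuisines=toy['Cuisines'].str.split(', ')).explode('Cuisines')
mean_by_cuisine = long_toy.groupby('Cuisines')['Aggregate rating'].mean()
print(mean_by_cuisine)
```

Working on a separate long-format frame also leaves the original one-row-per-restaurant data untouched for model training.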

Average rating across cities

#Compare Average rating across cities

plot = df.groupby('City')['Aggregate rating'].mean().sort_values(ascending=False).head(10).plot(kind='bar',color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars,  fmt='%.2f')
plt.xlabel('City')
plt.ylabel('Rating')
plt.title('Top 10 Cities By Average Ratings')
plt.show()

Alright, that looks cool. Our next visualization would be our geospatial Analysis.

Geospatial Analysis

In this visualization step, we would display a map using our latitude and longitude columns.

Firstly, we would display it using a scatterplot to show a simple representation.

#Scatterplot

df.plot(x='Longitude', y='Latitude',c='Aggregate rating',kind='scatter')
plt.title('Restaurant Locations Scatterplot')
plt.show()

Next, we would use the plotly library to display the map,

#Plotly

fig = px.scatter_mapbox(df,
                        lon='Longitude',
                        lat='Latitude',
                        zoom = 2,
                        color ='Aggregate rating',
                        width = 1200,
                        height = 900,
                        title= ('Global Restaurants Location with Rating')
                        )
fig.update_layout(mapbox_style = 'open-street-map')
fig.update_layout(margin = {'r':0, 't':50, 'l':0, 'b':10})
fig.show()

Observation:

It can be seen from the map that the restaurants are located all over the world.

We would do more data visualizations to gain extra insights and understanding.

Check for Outliers and Skewness

In this aspect, we would check for outliers using the boxplot and the skewness using the histogram.

#Get the numerical columns

num_col = ['Average Cost for two','Price range','Aggregate rating','Votes']

#Check for outliers using both histplot and boxplot

#histogram
for col in num_col:
  plt.figure(figsize=(10,6))
  plt.subplot(1,2,1)
  sns.histplot(df[col], kde=True, bins=20,color='lightgreen')
  plt.title(f'{col} Distribution')
  plt.xlabel(col)

  #Boxplot
  plt.subplot(1,2,2)
  sns.boxplot(x= df[col], data=df, color='lightblue')
  plt.title(f'{col} Boxplot')
  plt.xlabel(col)

plt.show()

Observation:

Skewness and Outliers can be seen throughout the histogram and boxplot, respectively.
The Average Cost for two, Price range, and Votes plots each show a positively skewed distribution, and all of them display outliers in their boxplots.
The Aggregate rating plot shows a negatively skewed distribution, with only one outlier in its boxplot.
The skewness can be reduced by applying a log or sqrt transformation to the affected columns, which is necessary when preparing the data for model training.

After checking for outliers and skewness, we would do a few more data visualizations to better understand the relationships between more columns and the target variable.

Determine the relationship between votes and ratings

#Show relationship between the Votes and Aggregate rating using Scatterplot

df.plot(x='Votes', y='Aggregate rating', kind='scatter', color='lightgreen')
plt.show()

Observation:

From the scatterplot:

Restaurants with a rating of 0, i.e. the Not rated restaurants, also have 0 votes.
Most of the votes go to restaurants with Aggregate ratings between 3 and 5, fewer to those rated between 2 and 3, and none to ratings between 0 and 1.
Most restaurants have between 0 and 2000 votes, while a minority have between 2000 and 4000 votes.
Some extreme values are present between 6000 and 10000 votes, which might indicate outliers.

Identify the highest-rated cuisines

In this aspect, we would visualize the highest-rated cuisines in the dataset using a bar plot.

#Identify highest-rated cuisines


plt.figure(figsize=(10,6))
plot = df.assign(Cuisines=df['Cuisines'].str.split(', ')).explode('Cuisines').groupby('Cuisines')['Aggregate rating'].max().sort_values(ascending=False).head(10).plot(kind='bar', color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars)
plt.title('Top 10 Highest Rated Cuisines')
plt.xlabel('Cuisines')
plt.ylabel('Rating')
plt.xticks(rotation=45)
plt.show()

Identify popular cuisines by votes

In this aspect, we would visualize cuisines with the highest votes.

#Identify Popular cuisines by votes


plt.figure(figsize=(10,6))
plot = df.assign(Cuisines=df['Cuisines'].str.split(', ')).explode('Cuisines').groupby('Cuisines')['Votes'].sum().sort_values(ascending=False).head(10).plot(kind='bar', color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars)
plt.title('Top 10 Cuisines by Votes')
plt.xlabel('Cuisines')
plt.ylabel('Vote')
plt.xticks(rotation=45)
plt.show()

Determine which price ranges receive the highest ratings

In this aspect, we would visualize the price ranges that have the highest ratings using a bar plot.

#Price range with highest ratings

plot = df.groupby('Price range')['Aggregate rating'].mean().plot(kind='bar', color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars,  fmt='%.2f')
plt.title('Price Range With Highest Rating')
plt.xlabel('Price Range')
plt.ylabel('Rating')
plt.xticks(rotation=0)
plt.show()

Compare restaurants with and without table booking with Aggregate ratings

In this aspect, we would visualize the comparison of the restaurants with or without table bookings with aggregate ratings.

#Compare Restaurants that Has Table booking by ratings

plot = df.groupby('Has Table booking')['Aggregate rating'].mean().plot(kind='bar', color='lightgreen')
for bars in plot.containers:
  plot.bar_label(bars,  fmt='%.2f')
plt.title('Has Table Booking By Average Rating')
plt.xlabel('Has Table Booking')
plt.xticks(rotation=0)
plt.ylabel('Rating')
plt.show()

Observation:

We can also observe that restaurants that provide the Has Table Booking option tend to have higher ratings than restaurants that do not provide the Has Table Booking option.

Lastly, in our EDA section, we would check the Online delivery column for insights also.

Calculate the percentage of restaurants offering delivery

#percentage of restaurants offering online delivery

restaurant_delivery = round((len(df[df['Has Online delivery'] == 'Yes'])/len(df['Has Online delivery']))*100,2)

print(f'The percentage of restaurants offering Online Delivery is {restaurant_delivery}%')

Output:

The percentage of restaurants offering Online Delivery is 25.69%

Observation:

The result indicates that 25.69% of restaurants provide online delivery.

Analyze availability across different price ranges

#Analyze the availability of online delivery among restaurants with different price ranges.


df.groupby('Has Online delivery')['Price range'].value_counts()

Observation:

We noticed that restaurants with a lower price range do not provide Online Delivery services, while restaurants with a higher price range do provide Online Delivery services.
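Raw counts are hard to compare across price ranges because each range contains a different number of restaurants. One way to make the comparison easier is a row-normalized cross-tabulation, which gives the share of restaurants with and without delivery inside each price range. The sketch below uses made-up rows (the frame `toy` is hypothetical), so its numbers say nothing about the real dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    'Price range': [1, 1, 1, 2, 2, 3, 3, 4],
    'Has Online delivery': ['No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
})

# normalize='index' turns each price-range row into shares that sum to 1
share = pd.crosstab(toy['Price range'], toy['Has Online delivery'], normalize='index')
print(share)
```

Reading the `Yes` column top to bottom then shows directly how delivery availability changes with price range.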

Feature Engineering

In the feature engineering aspect, I would encode the categorical Has Table booking, Has Online delivery, and Is delivering now columns using the pd.get_dummies function. This can also be done with scikit-learn's OneHotEncoder, but here I would be using pd.get_dummies.

#Encoding the categorical columns

df['Has Table booking'] = pd.get_dummies(df['Has Table booking'], drop_first=True, dtype=float)
df['Has Online delivery'] = pd.get_dummies(df['Has Online delivery'], drop_first=True, dtype=float)
df['Is delivering now'] = pd.get_dummies(df['Is delivering now'], drop_first=True, dtype=float)

df.head(5)

Observation:

1 is Yes, 0 is No
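For a strictly Yes/No column, an explicit mapping produces the same 1/0 coding and makes the meaning of each value visible in the code. A sketch on made-up rows (the frame `toy` and the list `binary_cols` are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    'Has Table booking': ['Yes', 'No', 'No'],
    'Has Online delivery': ['No', 'Yes', 'No'],
})

binary_cols = ['Has Table booking', 'Has Online delivery']
for col in binary_cols:
    # Any value other than 'Yes' or 'No' would become NaN, which makes bad data easy to spot
    toy[col] = toy[col].map({'Yes': 1.0, 'No': 0.0})
print(toy)
```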

Handling of Skewness

In this aspect, I would handle the skewed data by trying both the sqrt and log1p transformations, compare which one reduces the skewness better, and use that one in our model training. log1p is used rather than log because it computes log(1 + x), which is defined for the zeros in the columns about to be transformed.

Using the Sqrt Transformation

#Using the sqrt transformation

df['sqrt_Average_Cost_for_two'] = np.sqrt(df['Average Cost for two'])
df['sqrt_votes'] = np.sqrt(df['Votes'])
df['sqrt_Aggregate_rating'] = np.sqrt(df['Aggregate rating'])

Using The Log1p Transformation

#Using the log1p transformation

df['log1p_Average_Cost_for_two'] = np.log1p(df['Average Cost for two'])
df['log1p_votes'] = np.log1p(df['Votes'])
df['log1p_Aggregate_rating'] = np.log1p(df['Aggregate rating'])

Observation:

After using the sqrt and log1p transformations, it can be noticed from the visualizations that the log1p performed better on the Votes and Average cost for two columns.
The log1p was used instead of the regular log because it helps in treating zero values in the dataset.
Only the log1p_votes and log1p_Average_Cost_for_two columns would be used in our model training.
There is no significant change in the Aggregate rating after using both the sqrt and log1p transformations, so we would just make use of the original Aggregate rating column.
The Price range was not transformed at all because no significant outliers were visible.
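Besides comparing plots, the effect of each transformation can be quantified with the `skew()` method, where values closer to 0 indicate a more symmetric distribution. The sketch below uses a synthetic right-skewed column (`votes_demo` is a hypothetical stand-in, not the real Votes data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic right-skewed column standing in for something like Votes
votes_demo = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=2000))

# Skewness before and after each transformation
skews = pd.Series({
    'raw': votes_demo.skew(),
    'sqrt': np.sqrt(votes_demo).skew(),
    'log1p': np.log1p(votes_demo).skew(),
})
print(skews.round(2))
```

Running the same comparison on the real columns gives a numeric basis for choosing between sqrt and log1p.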

Data Preparation

In this aspect, I would first split the data into the Independent variable(X) and the target variable (y)

#Split into X[Features] and y[Target variable]

X = df.drop(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
             'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines','Currency','Switch to order menu',
             'Average Cost for two','sqrt_Average_Cost_for_two','Votes','Rating color', 'Rating text',
             'sqrt_votes', 'sqrt_Aggregate_rating','log1p_Aggregate_rating','Aggregate rating'], axis=1)
y = df['Aggregate rating']

Then, I would split the dataset into train and test splits using scikit-learn's train_test_split.

#split the data into train and test

#import required library
from sklearn.model_selection import train_test_split,GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Observation:

The dataset was split into Features [X] and Target[y] variable
It was then split into our Train and Test splits using train_test_split.
The dataset was split into 80% train data and 20% test data.

Model Selection and Evaluation

We used three models for this prediction project:

Linear Regression: A simple regression model for prediction.
Decision Tree Regression: It uses a tree-like model to predict continuous numerical values by recursively splitting data based on features, ultimately leading to a prediction at the leaf nodes.
Random Forest Regression: It predicts continuous values by averaging the predictions of multiple, independently trained decision trees, each trained on a random subset of the data.

Linear Regression

#build the linear regression model
#import required library
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

lr.fit(X_train,y_train)

#predict the train and test data
train_pred = lr.predict(X_train)
test_pred = lr.predict(X_test)

Evaluation of the Model;

#Evaluating the metrics of the Linear regression model

from sklearn.metrics import mean_squared_error, root_mean_squared_error, r2_score


#the mean square error for the Train and Test data

lr_train_mse = mean_squared_error(y_train,train_pred)
print(f'Training MSE: {lr_train_mse}')
print('\n')
lr_test_mse = mean_squared_error(y_test,test_pred)
print(f'Test MSE: {lr_test_mse} ')
print('\n')


#the root mean square error for the Train and Test data

lr_train_rmse = root_mean_squared_error(y_train,train_pred)
print(f'Training RMSE: {lr_train_rmse}')
print('\n')
lr_test_rmse = root_mean_squared_error(y_test,test_pred)
print(f'Test RMSE: {lr_test_rmse} ')
print('\n')


#the r2 score of the Train and Test data

lr_train_score = r2_score(y_train,train_pred)
print(f'Training SCORE: {lr_train_score}')
print('\n')
lr_test_score = r2_score(y_test,test_pred)
print(f'Test SCORE: {lr_test_score} ')
print('\n')

`Training MSE: 0.6387975061881729

Test MSE: 0.6360581133977781

Training RMSE: 0.7992480880103329

Test RMSE: 0.797532515573991

Training SCORE: 0.7225277903282395

Test SCORE: 0.7222492709572614`

Observation:

The results obtained from the Linear Regression model suggest that the model might not be capturing the data’s complexity well.

MSE (Mean Squared Error)

Training MSE: 0.6387
Test MSE: 0.6360
MSE is nearly the same for training and test sets, indicating no overfitting.
Lower MSE means the model’s predictions are relatively close to actual values

RMSE (Root Mean Squared Error)

Training RMSE: 0.7992
Test RMSE: 0.7975
RMSE is lower than the standard deviation, meaning the model improves upon a naive prediction (e.g., using the mean).

R² Score

Training R²: 0.7225
Test R²: 0.7222
A moderate R² (~0.72) means that the model explains about 72% of the variance in the target variable and leaves roughly 28% unexplained.

The small gap between training and test R² suggests that the model generalizes well to unseen data.
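To make the three metrics above concrete, here is a minimal sketch that computes them by hand in plain Python. The helper names (mse, rmse, r2) and the four sample ratings are made up for illustration; they are not taken from the restaurant dataset, and scikit-learn's own functions remain the ones used in this article.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up ratings, for illustration only
actual = [3.0, 4.0, 2.0, 5.0]
predicted = [2.5, 4.0, 2.5, 4.0]

print(mse(actual, predicted))
print(rmse(actual, predicted))
print(r2(actual, predicted))
```

Because R² is one minus the ratio of the squared error to the variance of the target, an R² of about 0.72 implies that the RMSE is roughly half of the target's standard deviation.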

Decision Tree Regression

Build the model for prediction. This is a decision tree model, so we tune its hyperparameters with a grid search to keep it from overfitting and to find the best estimator for our predictions.

#Build Decision Tree Model
#import required library
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
#Define the parameter grid
param_gr = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
}


#model

dtree = DecisionTreeRegressor()

#Grid search

grid_search = GridSearchCV(estimator=dtree, param_grid=param_gr, n_jobs=-1)
grid_search.fit(X_train, y_train)
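To get a feel for how much work the grid search above does, this small sketch enumerates every combination in the same parameter grid using only the standard library. It assumes scikit-learn's default of 5-fold cross-validation, which applies because no cv argument was passed; the variable names candidates and cv_folds are just for illustration.

```python
from itertools import product

# Same grid as in the decision tree search
param_gr = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
}

# Build one dictionary per combination of parameter values
names = list(param_gr)
candidates = [dict(zip(names, values)) for values in product(*param_gr.values())]

cv_folds = 5  # assumed: scikit-learn's default when cv is not given

print(len(candidates))             # number of candidate settings
print(len(candidates) * cv_folds)  # model fits during cross-validation
print(candidates[0])
```

With 4 x 3 x 3 = 36 candidates and 5 folds, the search fits 180 trees before refitting the best setting on the full training set, which is why n_jobs=-1 is helpful.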

After the grid search finishes, we take the best estimator, which has already been refit on the training data, and use it to make predictions.

#best estimator
best_gri= grid_search.best_estimator_

#Predict the train and test using the best estimator
d_train_pred = best_gri.predict(X_train)
d_test_pred = best_gri.predict(X_test)

Evaluation of the model.

#Evaluating the metrics of the Decision Tree regression model

from sklearn.metrics import mean_squared_error, root_mean_squared_error, r2_score


#the mean square error for the Train and Test data

d_train_mse = mean_squared_error(y_train,d_train_pred)
print(f'Training MSE: {d_train_mse}')
print('\n')
d_test_mse = mean_squared_error(y_test,d_test_pred)
print(f'Test MSE: {d_test_mse} ')
print('\n')


#the root mean square error for the Train and Test data

d_train_rmse = root_mean_squared_error(y_train,d_train_pred)
print(f'Training RMSE: {d_train_rmse}')
print('\n')
d_test_rmse = root_mean_squared_error(y_test,d_test_pred)
print(f'Test RMSE: {d_test_rmse} ')
print('\n')


#the r2 score of the Train and Test data

d_train_score = r2_score(y_train,d_train_pred)
print(f'Training SCORE: {d_train_score}')
print('\n')
d_test_score = r2_score(y_test,d_test_pred)
print(f'Test SCORE: {d_test_score} ')
print('\n')

`Training MSE: 0.09720034928562282

Test MSE: 0.1265090432569965

Training RMSE: 0.3117697055289735

Test RMSE: 0.35568109769426387

Training SCORE: 0.9577794286360224

Test SCORE: 0.9447566531186506
`

Observation:

Decision Tree Regression results show a significant improvement over Linear Regression!

MSE (Mean Squared Error)

Training MSE: 0.0972
Test MSE: 0.1265
Drastic improvement from the previous MSE of 0.638 (Linear Regression).
Slightly Higher Test Error than Training

RMSE (Root Mean Squared Error)

Training RMSE: 0.3117
Test RMSE: 0.3556
Huge improvement from the previous RMSE of 0.7992.
RMSE is way lower than the standard deviation, meaning much better predictions.
The average prediction error is small.
Train and test RMSEs are close (0.3117 vs 0.3556), indicating good generalization.
Predictions are more accurate than before.

R² Score (Model Fit)

Training R²: 0.9577
Test R²: 0.9447
The model captures ~94% of the variance in the target variable on the test set.
It is performing much better than Linear Regression (which had ~0.72 R²).
Decision Tree is outperforming Linear Regression (MSE, RMSE, and R² all improved).
Little sign of overfitting, as train and test R² are close (0.9577 vs 0.9447).
A better model than the Linear Regression model.

Random Forest Regression

Build the model for prediction. This is a random forest model, so we tune its hyperparameters with a grid search to keep it from overfitting and to find the best estimator for our predictions.

from sklearn.ensemble import RandomForestRegressor

# Define the parameter grid
param_grid = {
    'max_depth': [5, 10, 15, None],
    'n_estimators':[2,5,10]
}

#model

forest = RandomForestRegressor(random_state=42)

grid_search = GridSearchCV(forest, param_grid, n_jobs=-1)
grid_search.fit(X_train, y_train)

After the grid search finishes, we take the best estimator, which has already been refit on the training data, and use it to make predictions.

#best estimator
best_grid= grid_search.best_estimator_

#Predict the train and test using the best estimator
r_train_pred = best_grid.predict(X_train)
r_test_pred = best_grid.predict(X_test)

Evaluation of the model.

#Evaluating the metrics of the Random Forest regression model

from sklearn.metrics import mean_squared_error, root_mean_squared_error, r2_score


#the mean square error for the Train and Test data

r_train_mse = mean_squared_error(y_train,r_train_pred)
print(f'Training MSE: {r_train_mse}')
print('\n')
r_test_mse = mean_squared_error(y_test,r_test_pred)
print(f'Test MSE: {r_test_mse} ')
print('\n')


#the root mean square error for the Train and Test data

r_train_rmse = root_mean_squared_error(y_train,r_train_pred)
print(f'Training RMSE: {r_train_rmse}')
print('\n')
r_test_rmse = root_mean_squared_error(y_test,r_test_pred)
print(f'Test RMSE: {r_test_rmse} ')
print('\n')


#the r2 score of the Train and Test data

r_train_score = r2_score(y_train,r_train_pred)
print(f'Training SCORE: {r_train_score}')
print('\n')
r_test_score = r2_score(y_test,r_test_pred)
print(f'Test SCORE: {r_test_score} ')
print('\n')

`Training MSE: 0.08600751284842134

Test MSE: 0.1142196278739341

Training RMSE: 0.2932703749928065

Test RMSE: 0.33796394463601304

Training SCORE: 0.9626412213459802

Test SCORE: 0.9501231345929935 `

Observation:

The Random Forest Regression model improves substantially on Linear Regression and modestly on the Decision Tree.

MSE (Mean Squared Error)

Training MSE: 0.0860
Test MSE: 0.1142
Test MSE decreased from 0.6360 → 0.1265 → 0.1142, meaning Random Forest generalizes better than Decision Tree and Linear Regression.

RMSE (Root Mean Squared Error)

Training RMSE: 0.2932
Test RMSE: 0.3379
Test RMSE dropped from 0.7975 → 0.3556 → 0.3379, meaning Random Forest reduces error significantly.
Training RMSE also improved, making the model more precise.
Prediction errors are small, and the model generalises well.

R² Score (Model Fit)

Training R²: 0.9626
Test R²: 0.9501
Training R² increased from 0.7225 → 0.9577 → 0.9626, meaning Random Forest captures more variance.
Test Score (0.9501) is close to Training Score (0.9626). This means no signs of overfitting, which is great.
The model explains ~95% of the variance. Excellent fit.
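The comparison above can be checked with a few lines of Python. This sketch simply copies the R² scores from the printed outputs (rounded to four places) and computes the train-test gap for each model; the dictionary layout is only for illustration.

```python
# R² scores copied from the printed outputs above, rounded to four places
scores = {
    'Linear Regression': {'train': 0.7225, 'test': 0.7222},
    'Decision Tree': {'train': 0.9578, 'test': 0.9448},
    'Random Forest': {'train': 0.9626, 'test': 0.9501},
}

for name, s in scores.items():
    gap = s['train'] - s['test']
    print(f'{name}: test R² = {s["test"]:.4f}, train-test gap = {gap:.4f}')

# The model with the highest test R² generalizes best on this split
best = max(scores, key=lambda name: scores[name]['test'])
print(f'Best test R²: {best}')
```

A small gap with a high test score is the pattern to look for; a large gap would point to overfitting.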

KEY INSIGHTS

Random Forest outperformed Decision Tree and Linear Regression in every metric (lower RMSE, lower MSE, higher R²).
No signs of overfitting, as the train-test scores are close.
Better generalization, since its test error is the lowest of the three models.
Random Forest is the best-performing model so far.
The Random Forest regression is saved using the pickle library, as it is the best-performing model.
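The saving step is not shown in this article, so the following is only a sketch of how pickle is typically used. A plain dictionary stands in for the fitted estimator (best_grid), and the file name model.pkl is hypothetical.

```python
import os
import pickle
import tempfile

# Placeholder standing in for the fitted estimator (best_grid in the article)
model = {'note': 'placeholder for the fitted estimator'}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'model.pkl')  # hypothetical file name

    # Save the object to disk
    with open(path, 'wb') as f:
        pickle.dump(model, f)

    # Load it back later, for example inside an app
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == model)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.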

Conclusion

In this article, using the Restaurant dataset, we have demonstrated a machine learning project from beginning to end. Data cleaning and visualization were our first steps. Then, to give the model better data to train on, we encoded our categorical data using pd.get_dummies (feature engineering). After that, we built three machine learning models and evaluated them. Random Forest regression was the best-performing model, ahead of Decision Tree Regression and Linear Regression.

You can check out the GitHub file here: Raw File

Build a Heads Or Tails Game With Python

Juwon?🍀 — Tue, 27 Jun 2023 11:30:50 +0000

Python is a programming language that's often used to develop websites, automate tasks, and analyze data. As a general-purpose language, Python can be used to create many different programs, and it is not specialized in any specific field.

In this article, I will implement the Heads or Tails game using the Python programming language. This article aims to provide a deeper and more helpful understanding of Python.

While heads-or-tails is a simple problem to solve in Python, you can learn a lot from it.

Simplify: Structuring your code in functions is a simple idea, but it is a skill that takes a lot of practice.
User Prompt: The game requires handling of user's input.
Randomness: You need to flip a coin and for that, you need randomness.
Control Flow: The game requires conditional statements and loops to control the game flow based on user input and game logic.

Prerequisites

Before we create the Heads or Tails game with Python, ensure you have the following on your system:

An operating system such as Windows, Linux, or macOS
Python installed on your system (an up-to-date version)
A code editor or IDE (Visual Studio Code with the Jupyter notebooks extension)

Let's go!!

Project Description

This is a simple project to create a heads-or-tails game in Python. We need to learn to work with functions.

Game Description.

The user is asked to guess heads or tails.
The game will “flip” a coin to either heads or tails.
The game will write if the user guessed correctly or not.

This article aims to demonstrate how simple and useful functions are.
Note: An advantage of writing functions is that you can test each one in isolation.

Simplifying The Program
I think it is always wise to break down problems into smaller pieces that can be handled separately rather than just starting to write code.

As a result, it will be easier to implement isolated pieces and, the best part is, you can test pieces of code independently.

A great way to organize code is to implement it into functions.

Let’s try to do it for this project.

Step 1 - Prompt User
In this step, I would create a function that prompts the user.

Creating User Input

#Python

def user_input():

    guess = ''

    #keep asking until the user types Head or Tail (in any letter case)
    while guess not in ['Head','Tail']:
        guess = input('Choose Head or Tail: ').capitalize()

    return guess

print(user_input())

Code Output

#Python
#code output

Choose Head or Tail: fry
Choose Head or Tail: Head
Head

A while loop (while guess not in ['Head','Tail']) is used to prompt the user for input, so the function does not return until a valid entry is entered.

Observe that the function did not return until I entered the valid word 'Head' after first entering the invalid word 'fry'.
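One way to try this validation loop without typing at the keyboard is to let the function accept the input source as a parameter. This is only a sketch: the ask parameter is a hypothetical addition, not part of the original function.

```python
def user_input(ask=input):
    """Keep asking until the answer is 'Head' or 'Tail', then return it."""
    guess = ''
    while guess not in ['Head', 'Tail']:
        guess = ask('Choose Head or Tail: ').strip().capitalize()
    return guess

# Scripted answers stand in for the keyboard: one invalid entry, then a valid one
answers = iter(['fry', 'head'])
print(user_input(ask=lambda prompt: next(answers)))
```

Called with no arguments, the function still uses the built-in input(), so the game itself behaves the same.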

Step 2 - Flip a coin
The next step is to implement a function that flips a coin randomly.

To do that we need randomness. Luckily, Python has a standard library module, random, that can help.

We will use randrange as it is a standard go-to function to use to get a random integer.

Creating a function to flip a coin

#Python

from random import randrange

def flip_coin():

    coin = randrange(2)

    if coin == 0:
        return 'Tail'
    else:
        return 'Head'

print(flip_coin())

Output of the flip a coin function

#Python
#code output

Tail

The call randrange(2) will return either 0 or 1. In the case of 0, 'Tail' is returned; in the case of 1, 'Head' is returned.

This output indicates that the randrange(2) chose 0 randomly, which means 'Tail' was returned.
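A single flip says little about randomness, so here is a small sketch that flips the coin many times and counts the results. The seed value 7 and the 1000 flips are arbitrary choices for illustration; the seed only makes the run repeatable.

```python
from collections import Counter
from random import randrange, seed

def flip_coin():
    # randrange(2) returns 0 or 1 with equal probability
    return 'Tail' if randrange(2) == 0 else 'Head'

seed(7)  # fixed seed so the run is repeatable
counts = Counter(flip_coin() for _ in range(1000))
print(counts)
```

Over many flips the two counts should land close to each other, which is a quick sanity check that the function is not stuck on one side.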

Step 3 - Print The Result
Finally, we need to validate if the user’s guess is correct.

This is where the power of functions is great.

Creating the result function

#Python

def result(user_guess,coin):

    if user_guess == coin:
        print('Awesome Choice, You Are Correct!!!')
        print(f'User guessed {user_guess}, computer chose {coin}')

    else:
        print('Wrong Choice')
        print(f'User guessed {user_guess}, computer chose {coin}')

Step 4 - Combining All Functions
As we have already established, functions work independently.

Now it is time to combine it all.

#Python

user_guess = user_input()

coin = flip_coin()

result(user_guess, coin)

Final Output

#Python

Choose Head or Tail: Head
Wrong Choice
User guessed Head, computer chose Tail

In the output above, we can see that 'Head' was input by the user.

We can also see that the computer randomly chose 0, which returned 'Tail', so the user lost the game and the program ended.

Conclusion
The article is about Building a Heads-or-Tails game, which I have broken down step by step for easy readability.

I must admit this is beautiful and simple.

Why is this powerful?
Because it is simple to understand. If you notice something is wrong, say, if it only flips tails, it is easy to identify where in the code you should look, and on top of that, you can test that piece of code in isolation.
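As a sketch of what testing one piece in isolation can look like, the snippet below captures what result() prints and checks it for both outcomes. The captured helper is a hypothetical name, and result() is shortened slightly so the shared line is printed once.

```python
import io
from contextlib import redirect_stdout

def result(user_guess, coin):
    if user_guess == coin:
        print('Awesome Choice, You Are Correct!!!')
    else:
        print('Wrong Choice')
    print(f'User guessed {user_guess}, computer chose {coin}')

def captured(user_guess, coin):
    """Run result() and return everything it printed as one string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result(user_guess, coin)
    return buffer.getvalue()

# No coin flip and no keyboard input are needed to check this function
assert captured('Head', 'Head').startswith('Awesome Choice')
assert captured('Head', 'Tail').startswith('Wrong Choice')
print('result() behaves as expected')
```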

I hope it's educational. It would be greatly appreciated if you followed me, read my previous articles, shared your honest opinions, and reacted and commented.

A Simple Python User Interaction Game

Juwon?🍀 — Sun, 04 Jun 2023 08:46:58 +0000

In this article, I will discuss how to use Python to create a simple user interaction. As you implement this code, you will gain a deeper understanding of Python programming in the following areas:

Input and Output Handling: The game requires handling of user's input and displaying output to the user.
Control Flow: The game requires conditional statements and loops to control the game flow based on user input and game logic.
Debugging: Debugging is a crucial skill in programming. Implementing a simple user interaction code can help programmers practice debugging by identifying and fixing an error in the code.
User Interface Design: Although a simple program such as this does not require complex user interfaces, it can still teach a programmer the essentials of designing user-friendly interfaces for their programs.
Code organization: The program requires the organization of code into functions and modules, which can help the programmer learn how to write modular and reusable code.

Prerequisites

Before creating the simple Python user interaction, ensure you have the following on your system:

An operating system such as Windows, Linux, or macOS
Python installed on your system (an up-to-date version)
A code editor or IDE (Visual Studio Code with the Jupyter notebooks extension)

Project Description

This article will describe the program in detail:

Display a list
Have a user choose an index position and an input value
Replace value at index position with user's input value

Step 1: Displaying List
Let's start by creating the display function.

Creating the game list

#Python
#game list that would be passed into the display function

game_list = ['0','1','2']

Creating the display function

#Python
#A function that displays the game list

def display_list(game_list):
    print('Here is my current list')
    print(game_list)

display_list(game_list)

Output of the display function

#Code output

Here is my current list
['0', '1', '2']

Step 2: Position Choice
In this step, I will create a function that asks the user for a position. I will use a while loop to keep asking for input whenever the user enters a position that is out of range or is not a number.

Creating the position choice function

#Python
#A function that asks the user for input to determine the index position on the list 


def position_choice():

    #This original choice value can be anything that isn't an integer
    choice = ''

    #While choice is not a valid position, keep asking for input.
    while choice not in ['0','1','2']:
        choice = input('Pick a position (0,1 or 2): ')
        if choice not in ['0','1','2']:
            print('Sorry, Invalid choice!')

    return int(choice)

print(position_choice())

Output of the position choice function

#code output

Pick a position (0,1 or 2): two
Sorry, Invalid choice!
Pick a position (0,1 or 2): 24
Sorry, Invalid choice!
Pick a position (0,1 or 2): 2
2

In the output above, I entered the string 'two' and the number '24'. The string 'two' is not one of the accepted positions, so the loop asks again, and 24 is out of range. I finally entered the valid value '2', and the function returned it.
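The check above is tied to a list of exactly three items. As a sketch of how the same idea could work for a list of any length, the hypothetical helper parse_position below returns a valid index or None; it is not part of the original game.

```python
def parse_position(text, size):
    """Return text as a valid index for a list of length size, or None."""
    text = text.strip()
    # isdecimal() is False for words, negative signs, and empty strings
    if text.isdecimal() and int(text) < size:
        return int(text)
    return None

for entry in ['two', '24', '2']:
    print(repr(entry), '->', parse_position(entry, 3))
```

The while loop in position_choice() could then keep asking until the helper returns something other than None.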

Step 3: Replacement Value
In this step, I will create a function that replaces a value in the game_list. This function lets the user replace the item at the chosen index position.

Creating the replacement function

#Python
#A function that asks for a replacement value

def replacement_choice(game_list, position):
    user_placement = input('Type a string to be replaced: ')
    game_list[position] = user_placement

    return game_list

print(replacement_choice(game_list,1))

Output of the replacement function

#code output

Type a string to be replaced: two
['0', 'two', '2']

In the output above, we can see that when the function was executed, it asked the user to input a string.
The function takes game_list and position as arguments.
The user's input replaced the item at the index position that was passed into the function along with the list.
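One detail worth seeing in code is that assigning to an index changes the list in place, so the list that was passed in and the list that is returned are the same object. The sketch below uses a hypothetical replace_at helper that takes the new value as a parameter instead of calling input().

```python
def replace_at(items, position, value):
    """Put value at items[position]; the list is changed in place."""
    items[position] = value
    return items

game_list = ['0', '1', '2']
returned = replace_at(game_list, 1, 'two')

print(returned is game_list)  # the same list object comes back
print(game_list)

# To keep the original untouched, pass a copy instead
original = ['0', '1', '2']
updated = replace_at(list(original), 1, 'two')
print(original, updated)
```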

Step 4: Replay Choice
In this step, I would create a function that asks the player whether to keep playing.

Creating the replay function

#Python
#A function that asks for replay value

def gameon_choice():
    choice = ''

    while choice not in ['Y','N']:
        choice = input('Keep playing, please choose (Y or N): ').capitalize()
        if choice not in ['Y','N']:
            print("Sorry I don't understand, choose Y or N!")

    #True keeps the game running, False ends it
    return choice == 'Y'

print(gameon_choice())

Output of the replay function

#code output

Keep playing, please choose (Y or N): Y
True

Step 5: Game Logic
This is the final step. In this step, we would arrange our functions and apply logic with conditional statements to allow the game to work as we want it to.

Creating the Game Logic

#Python
#Game Logic

game_on = True
game_list = ['0','1','2']

while game_on:
    display_list(game_list)

    position = position_choice()

    game_list = replacement_choice(game_list, position)

    display_list(game_list)

    game_on = gameon_choice()

Output of final code

#code output

Here is my current list
['0', '1', '2']
Pick a position (0,1 or 2): two
Sorry, Invalid choice!
Pick a position (0,1 or 2): 0
Type a string to be replaced: zero
Here is my current list
['zero', '1', '2']
Keep playing, please choose (Y or N): q
Sorry I don't understand, choose Y or N!
Keep playing, please choose (Y or N): N

Using While Loop
The while loop repeats until a certain condition is met. In this case, the loop kept asking for input until the user entered a valid position. The code then replaced the value at index position 0 with the string 'zero' entered by the user and displayed the updated list. Finally, the user was asked whether to continue the game. The first answer was invalid, so the game prompted again; the user then chose 'N', the loop ended, and the game was over.

Conclusion
The article is about the implementation of a simple user interaction game, which I have broken down step by step for easy readability. I hope it's educational. It would be greatly appreciated if you followed me, read my previous articles, shared your honest opinions, and reacted and commented.

ERRORS AND EXCEPTIONS HANDLING IN PYTHON

Juwon?🍀 — Fri, 26 May 2023 02:46:18 +0000

Python is an easy-to-learn and easy-to-use programming language. However, like any other programming language, Python code is prone to errors.
In this article, I will discuss the common types of errors a Python programmer is likely to encounter and how to handle them.

Errors are problems in a program that cause it to stop executing. Exceptions, on the other hand, are raised when an internal event disrupts the normal flow of the program.

Two types of errors occur in Python:

Syntax Errors
Logical Errors (Exceptions)

Syntax Errors In Python

Syntax errors occur when you have a typo or other mistake in your code that causes it to be invalid syntax. These errors are usually caught by Python's interpreter when you try to run the code.

Example:

#python

if x = 10:
    print('x is equal to 10')

Output:
  Cell In[12], line 1
    if x = 10:
       ^
SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='?

In this example, we are trying to assign the value 10 to the variable x using the assignment operator (=) inside an if statement.

But the correct syntax for comparing values in an if statement is to use the comparison operator (==).

So here's how you fix this one:

#python

x = 10

if x == 10:
    print('x is equal to 10')

Here are some tips for avoiding syntax errors:

Double-check your code for typos or other mistakes before running it.
Use a code editor that supports syntax highlighting to help you catch syntax errors.
Read the error message carefully to determine the location of the error.
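As a small illustration of the last tip, the sketch below asks Python to parse a piece of code without running it and then reads the location from the error. The built-in compile() is used here only as a checking tool; the source string is the broken example from above, and syntax_ok is a name chosen for illustration.

```python
source = "if x = 10:\n    print('x is equal to 10')\n"

try:
    compile(source, '<example>', 'exec')  # parse only; nothing is executed
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    # lineno and msg tell you where the problem is and what Python expected
    print(f'SyntaxError on line {err.lineno}: {err.msg}')
```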

Logical Errors In Python (Exceptions)

An error that occurs at runtime, after the code has passed the syntax check, is called an exception (or logical error). For example, when we divide any number by zero, the ZeroDivisionError exception is raised, and when we import a module that does not exist, ImportError is raised.

A few of these errors are discussed below, along with what causes them and how they can be fixed.

Indentation Error In Python:
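One of the most common errors in Python is the indentation error. Unlike many other programming languages, Python uses whitespace to indicate blocks of code, so proper indentation is critical. Strictly speaking, IndentationError is a subclass of SyntaxError, so Python reports it before any of the code runs.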

To avoid indentation errors, it's a good idea to use a code editor that supports automatic indentation, such as PyCharm or Visual Studio Code.

Example:

#python

for i in range(10):
print(i)

In this example, the code inside the for loop is not indented correctly.

Fix:

#python

for i in range(10):
    print(i)

Name Errors In Python:
Name errors occur when you try to use a variable or function that hasn't been defined. For example, if you try to print the value of a variable that hasn't been assigned a value yet, you'll get a name error.

Example:

#python

my_variable = 5
print(my_vairable)

In this example, we misspelled the variable name my_variable as my_vairable.

Fix:

#python

my_variable = 5
print(my_variable)

Type Errors In Python:
Another common error in Python is the type error. Type errors occur when you try to perform an operation on data of the wrong type. For example, you might try to add a string and a number, or you might try to call a value that is not a function.

Example:

#python

x = "5"
y = 10
result = x + y

In this example, we are trying to concatenate a string and an integer, which is not possible.

Fix:

#python

x = "5"
y = 10
result = int(x) + y

Here, we convert the string to an integer using the int() function before performing the addition.

Index Errors In Python:
Index errors occur when you try to access an item in a list or other sequence using an index that is out of range. For example, if you try to access the fifth item in a list that only has four items, you'll get an index error.

Example:

#python

my_list = [1, 2, 3, 4]
print(my_list[5])

In this example, we are trying to access an item at index 5, which is outside the range of the list.

Fix:

#python

my_list = [1, 2, 3, 4]
print(my_list[3])

Here, we access the item at index 3, which is within the range of the list.
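When the index comes from user input or a calculation, you may not know in advance whether it is in range. One possible approach, sketched below with a hypothetical item_at helper, is to catch the IndexError and fall back to a default value.

```python
my_list = [1, 2, 3, 4]

def item_at(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

print(item_at(my_list, 3))
print(item_at(my_list, 5))
print(item_at(my_list, 5, 'no such item'))
print(my_list[-1])  # negative indexes count from the end of the list
```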

Key Errors In Python:
Key errors occur when you try to access a dictionary using a key that doesn't exist. For example, if you try to access the value associated with a key that hasn't been defined in a dictionary, you'll get a key error.

Example:

#python

my_dict = {"name": "John", "age": 25}
print(my_dict["gender"])

In this example, we are trying to access the value for the key "gender", which does not exist in the dictionary.

Fix:

#python

my_dict = {"name": "John", "age": 25}
print(my_dict.get("gender", "Key not found"))

Here, we use the get() method to access the value for the key "gender". The second argument of the get() method specifies the default value to return if the key does not exist.
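Besides get(), there are two other common ways to deal with a key that may be missing. The sketch below shows both, using the same dictionary; which one reads best depends on the situation.

```python
my_dict = {"name": "John", "age": 25}

# Option 1: test for the key before using it
if "gender" in my_dict:
    print(my_dict["gender"])
else:
    print("Key not found")

# Option 2: try the lookup and catch the KeyError
try:
    value = my_dict["gender"]
except KeyError as err:
    value = None
    print(f"Missing key: {err}")
```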

Attribute Errors In Python:
Attribute errors occur when you try to access an attribute of an object that doesn't exist, or when you try to access an attribute in the wrong way.

There are several different types of attributes in Python:

Instance attributes: These are attributes that belong to a specific instance of a class.
Class attributes: These are attributes that belong to a class rather than an instance.
Static attributes: Python has no separate static keyword for attributes; a class attribute already plays this role, because it can be read through the class itself without creating an instance.

Example:

#python

my_list = [1, 2, 3, 4]
my_list.append(5)
my_list.add(6)

In this example, we are trying to add an item to the list using the add() method, which does not exist for lists.

Fix:

#python

my_list = [1, 2, 3, 4]
my_list.append(5)

Here, we use the append() method to add an item to the list.
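If you are not sure whether an object has a given attribute, you can check first or catch the error. The following sketch uses the built-ins hasattr() and getattr() on the same list; the variable name adder is only for illustration.

```python
my_list = [1, 2, 3, 4]

print(hasattr(my_list, 'append'))
print(hasattr(my_list, 'add'))

# getattr() with a default avoids the exception entirely
adder = getattr(my_list, 'add', None)
if adder is None:
    my_list.append(6)  # fall back to the method that lists do have

# Or attempt the call and handle the AttributeError
try:
    my_list.add(7)
except AttributeError as err:
    print(err)

print(my_list)
```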

Error Handling In Python

When an error or exception is raised, we can deal with it using exception handling.

Handling Exceptions With Try/Except/Finally:
We can handle errors with try/except/finally. We write the unsafe code in the try block, the fallback code in the except block, and the code that must always run in the finally block.

Example:

#python

# put the unsafe operation in the try block
try:
    print("code start")

    # unsafe operation: dividing by zero raises ZeroDivisionError
    print(1 / 0)

# if that error occurs, control jumps to the except block
except ZeroDivisionError:
    print("an error occurs")

# the finally block runs whether or not an error occurred
finally:
    print("All Done!!!")

Output:
code start
an error occurs
All Done!!!
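The same pattern can be wrapped in a function, and an else block can be added for the code that should run only when no error occurred. This is a sketch with a hypothetical safe_divide function, not part of the example above.

```python
def safe_divide(a, b):
    """Divide a by b, returning None when b is zero."""
    try:
        answer = a / b
    except ZeroDivisionError:
        print('cannot divide by zero')
        return None
    else:
        # runs only when the try block raised nothing
        print('division succeeded')
        return answer
    finally:
        # runs in both cases, even though the blocks above return
        print('All Done!!!')

print(safe_divide(1, 0))
print(safe_divide(6, 3))
```

Catching a specific exception such as ZeroDivisionError, rather than using a bare except, keeps unrelated bugs from being hidden.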

Raising Exceptions For a Predefined Condition:
When we want our code to enforce a limit or condition, we can raise an exception ourselves.

Example:

#python

# try for unsafe code
try:
    amount = 1999
    if amount < 2999:

        # raise the ValueError
        raise ValueError("please add money in your account")
    else:
        print("You are eligible to purchase DSA Self Paced course")

# if the ValueError was raised, print its message
except ValueError as e:
    print(e)

Output:
please add money in your account
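In practice, the check usually lives in a function and the caller decides how to react. Here is a minimal sketch of that split; the function name check_balance and the default price of 2999 are assumptions made for illustration.

```python
def check_balance(amount, price=2999):
    """Raise ValueError when amount is below price (both values hypothetical)."""
    if amount < price:
        raise ValueError(f'please add {price - amount} more to your account')
    return True

for amount in (1999, 3500):
    try:
        check_balance(amount)
        print(f'{amount}: eligible to purchase')
    except ValueError as err:
        print(f'{amount}: {err}')
```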

Conclusion:
In this article, we covered some of the most common errors in Python and how to fix them. By understanding these errors and how to fix them, you can become a more confident and effective Python programmer.

PLEASE LIKE, COMMENT AND SHARE...

DIFFERENCES BETWEEN FUNCTIONS AND METHODS IN PYTHON

Juwon?🍀 — Wed, 10 May 2023 16:59:56 +0000

In this article, we aim to discuss the difference between FUNCTIONS AND METHODS IN PYTHON PROGRAMMING, to have a clear understanding of both.

To begin with;

WHAT ARE FUNCTIONS IN PYTHON?

A function is a block of code (one or more statements) that performs a specific task and runs only when called. Functions have:

Name
Argument
Return Statement

The return statement and arguments are both optional. A function can have them or not.

There are mainly three types of FUNCTIONS in Python

Built-in Function
User-defined Function
Anonymous Function

Built-in Function: Built-in functions are the pre-defined functions in Python that can be used directly. We do not need to create them; to use them, we just have to call them. Examples: sum(), min(), max(), etc.

An example to show the syntax of built-in functions.

Syntax

#python

li = [1,2,3]
ans = sum(li)
print(ans)

Output:
6

In the above code example, we used two Built-in functions, sum() and print() to find and output the sum of the list li.

User-Defined Function: These are not pre-defined. Users create their own functions to fulfil their specific needs.

Creating a User-Defined Function In Python
We can create a function using the keyword def.

Syntax

#python
def function_name(argument):
    """
    function's logic
    """
    return values

Example:

#python
def join(str1, str2):
    joined_str = str1 + str2
    return joined_str
print(join)

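Output:
<function join at 0x000002E42649D750>

In the above code example, we created a function named join which combines its two parameters and returns a concatenated string. Printing join without parentheses shows the function object itself, because the function has not been called yet.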

Functions are only executed when they are called.

Calling A Function:
Without calling, a function will never run or be executed. To call a function we use the following syntax.

Syntax

#python

function_name(arguments)

Example:

#python

print(join("Hello", "World"))

Output:
HelloWorld

In the above code example, we executed our function join() by calling it with two arguments “Hello” and “World” and it returned their concatenated string “HelloWorld”.

Anonymous Functions in Python: Functions that have no name and are declared without the keyword def are called Anonymous Functions. To create them, we use the keyword lambda, so they are also called Lambda Functions.

Example:

#python 

li = [1,2,3]
new = list(map(lambda x:x+1, li))
print(new)

Output:
[2, 3, 4]
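To show how a lambda relates to an ordinary function, the sketch below writes the same behaviour three ways and then uses a lambda as a sort key. The names add_one, add_one_def, and words are made up for this illustration.

```python
li = [1, 2, 3]

add_one = lambda x: x + 1  # a lambda bound to a name

def add_one_def(x):        # the same behaviour written with def
    return x + 1

print(list(map(add_one, li)))
print(list(map(add_one_def, li)))
print([x + 1 for x in li])  # a list comprehension gives the same result

# Lambdas are handy as short, throwaway key functions
words = ['banana', 'fig', 'apple']
print(sorted(words, key=lambda w: len(w)))
```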

WHAT ARE METHODS IN PYTHON

Functions inside a class are called methods. Methods are associated with a class/object. As Python is an Object-Oriented Programming language, it contains objects, and these objects have different properties and behaviour. Methods in Python are used to define the behaviour of the Python objects.

It facilitates code reusability by creating various methods or functions in the program.
It also improves readability and accessibility to a particular code block.
Creating methods makes code easier for programmers to debug.

Creating a Method in Python:
We use the same syntax as the function but this time, it should be inside a class.

Syntax

#python

class ClassName:
    def method_name(self, parameters):
        # Statements…

Example:

#python

class Addition:

    def add(self, num1, num2):
        return num1 + num2

print(Addition.add)

Output:
<function Addition.add at 0x1088655e0>

Calling a Method in Python:
To use a method, we need to call it. We call the method just like a function but since methods are associated with class/object, we need to use a class/object name and a dot operator to call it.

Syntax

#python 

object_name.method_name(arguments)

Example:

#python

addition1 = Addition() # Object Instantiation
print(addition1.add(2, 3))

Output:
5

In the above code example, we created an object addition1 for our class Addition and used that object to call our method add() with arguments 2 and 3. After calling, our method got executed and returned the value 5.

Being inside a class gives methods a few more abilities. Methods can access the class/object attributes. Since methods can access the class/object attributes, they can also alter them.

Types Of Method In Python:

Instance Method: The most common kind of method in Python, used to set or get details about instances (objects). Its first parameter, self by convention, points to the instance the method was called on.
Class Method: Works with the state of the class itself rather than with the data of a specific instance. It receives the class as its first parameter (cls by convention) and is defined using the @classmethod decorator.
Static Method: Receives neither the instance nor the class automatically, so it does not depend on instance or class data. It is defined using the @staticmethod decorator.
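The three method types can be sketched side by side. The Circle class and its names below are hypothetical and exist only for illustration.

```python
import math


class Circle:
    """Hypothetical class used only to illustrate the three method types."""

    circles_created = 0  # class attribute shared by all instances

    def __init__(self, radius):
        self.radius = radius
        Circle.circles_created += 1

    def area(self):
        # Instance method: works with this object's data through 'self'
        return math.pi * self.radius ** 2

    @classmethod
    def total_created(cls):
        # Class method: works with class-level data through 'cls'
        return cls.circles_created

    @staticmethod
    def is_valid_radius(value):
        # Static method: receives neither 'self' nor 'cls'
        return value > 0


small = Circle(1)
large = Circle(2)
print(large.area())
print(Circle.total_created())
print(Circle.is_valid_radius(-3))
```

The class method and the static method are called on the class itself here, so no instance is needed to use them.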

NOTABLE DIFFERENCES BETWEEN FUNCTION AND METHOD IN PYTHON

A method is always defined inside a class, while a function does not need a class.
A function can have zero parameters, whereas an instance method or class method takes an implicit first parameter (self or cls) that refers to the object or the class. Static methods are the exception.
A method operates on the data of its class or object, while a function works only with the data passed to it.
A function can be called directly by its name, while a method has to be called through a class or an object.
Methods belong to Object-Oriented Programming, while a function is an independent piece of functionality.

By looking at the above differences, we can simply say that all methods are functions, but not all functions are methods.
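A small sketch can make this relationship visible. It redefines the Addition class from earlier so that it runs on its own; the module-level add function is a hypothetical counterpart added for comparison.

```python
class Addition:
    def add(self, num1, num2):
        return num1 + num2


def add(num1, num2):
    # A plain function: defined outside any class
    return num1 + num2


addition1 = Addition()

print(type(add).__name__)            # function
print(type(Addition.add).__name__)   # function (accessed through the class)
print(type(addition1.add).__name__)  # method (bound to the instance)

# The bound method wraps the very same function object defined in the class
print(addition1.add.__func__ is Addition.add)  # True
```

In other words, a method is a function that Python binds to an object when it is looked up through that object.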

CONCLUSION
In this article, we have discussed the functions and methods in Python, the difference between them, and their types.
Hope you will like the article.
Keep Learning!!
Keep Sharing!!

UNDERSTANDING LOOPS IN PYTHON

Juwon?🍀 — Tue, 02 May 2023 15:43:12 +0000

Python is a general-purpose, high-level programming language that is easy to understand and run. It is open source, which means it is free to use. In this article we will be discussing LOOPS IN PYTHON.

Before we go too far, let us get a basic understanding of loops. Loops in Python allow us to execute a group of statements several times.

Python provides two kinds of loops. They offer similar basic functionality, but they differ in their syntax and in how they decide when to stop.

While Loop
For Loop

In this article you will learn how to implement loops in a Python program and sharpen your Python programming skills. It will help you understand how and when to use loops, how to end them, and how they work in Python.

WHILE LOOP

In Python, a while loop is used to execute a block of statements repeatedly for as long as a given condition is true. When the condition becomes false, the line immediately after the loop is executed.

Syntax:

#python
while expression:
     statement(s)

This loop instructs the computer to keep executing a block of code for as long as the condition evaluates to true.

Example of while loop:

#python program to illustrate while loop
n = 0
while (n < 3):
    n = n + 1
    print('This is a while loop')

Output
This is a while loop
This is a while loop
This is a while loop

Else statement with while loop in Python:
The else clause runs only when the while condition becomes false. If you break out of the loop or an exception is raised, it won't be executed.
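The break case can be sketched as follows. This is a minimal illustration; the else_ran flag is a hypothetical helper variable added only to make the result visible.

```python
n = 0
else_ran = False  # hypothetical flag, only used to show whether else ran

while n < 3:
    n = n + 1
    if n == 2:
        # Leaving the loop with break skips the else block
        break
    print('This is a while loop')
else:
    else_ran = True
    print('This line is not printed')

print('else block ran:', else_ran)
```

The loop body prints once, the break fires on the second pass, and the else block is skipped.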

Here is an example of else statement with a while loop:

#python
n = 0
while (n < 3):
    n = n + 1
    print('This is a while loop')
else:
    print('This is an else statement with a while loop')

Output
This is a while loop
This is a while loop
This is a while loop
This is an else statement with a while loop

Infinite while loop in Python
An infinite loop is a loop that keeps executing and never stops. If we want a block of code to execute an infinite number of times, we can use an infinite loop.
Example:

#python to illustrate infinite loop

count = 0
while count == 0:
    print('infinite loop')

Note: Avoid this type of loop unless you really need it. Because the condition is always true, the loop never ends and you would have to forcefully terminate the interpreter (for example with Ctrl+C). To avoid this issue, it's a good idea to take a moment to consider the different values a variable can take. This helps you make sure the loop won't get stuck during iteration.
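When a loop is written with an always-true condition on purpose, one common way to keep it under control is to give it an explicit exit. The sketch below assumes a hypothetical safety limit called max_iterations.

```python
count = 0
max_iterations = 5  # hypothetical safety limit chosen for this sketch

while True:
    count += 1
    print('iteration', count)
    if count >= max_iterations:
        # Explicit exit so the loop cannot run forever
        break

print('loop finished after', count, 'iterations')
```

The loop body changes count on every pass, so the exit condition is guaranteed to be reached.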

FOR LOOP

A for loop iterates over a sequence of values. In Python, there is a "for in" loop which is similar to the "for each" loop in other programming languages.

Syntax

#python
for iterator_val in sequence:
    statement(s)

It can be used to iterate over a range and over other iterable objects such as lists and strings.

Example:

#python
for i in range(5):
    print(i)

Output
0
1
2
3
4

Well, the power of the for loop is that we can use it to iterate over a sequence of values of any type, not just a range of numbers. For example, we can iterate over a list of strings or words:

#python
friends = ['Tommy', 'Johnny', 'Mike']
for name in friends:
    print('Hello', name)

Output
Hello Tommy
Hello Johnny
Hello Mike

The sequence that the for loop iterates over could contain any type of element, not just strings. For example, we could iterate over a list of numbers to calculate the total sum and average.
Example:

#python
values = [2, 3, 4, 5, 60, 30]
# 'total' is used instead of 'sum' so the built-in sum() is not shadowed
total, length = 0, 0
for value in values:
    total += value
    length += 1

print('TOTAL SUM: ' + str(total) + ' AVERAGE: ' + str(round(total / length, 2)))

Output
TOTAL SUM: 104 AVERAGE: 17.33

Else statement with for loop in Python:
We can also combine the else statement with a for loop, as with the while loop. A for loop has no condition of its own and simply ends when the sequence is exhausted, so the else block is executed immediately after the for block finishes iterating. As with the while loop, the else block is skipped if the loop is left with break.
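A search is a typical place where the for/else combination is useful. The sketch below uses a hypothetical helper, describe_search, to show both outcomes: the else block runs only when the loop finishes without hitting break.

```python
def describe_search(items, target):
    """Hypothetical helper: report whether target occurs in items."""
    for item in items:
        if item == target:
            message = 'found ' + target
            break  # leaving with break skips the else block
    else:
        # Reached only when the loop ran to the end without break
        message = target + ' not found'
    return message


print(describe_search(['book', 'for', 'book'], 'for'))
print(describe_search(['book', 'for', 'book'], 'pen'))
```

The first call leaves the loop early and skips the else block, while the second call exhausts the list and falls through to it.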

Example:

#python
# Python program to illustrate
# combining else with for

# 'words' is used instead of 'list' so the built-in list type is not shadowed
words = ["book", "for", "book"]
for index in range(len(words)):
    print(words[index])
else:
    print("Inside Else Block")

Output:
book
for
book
Inside Else Block

HOW TO IDENTIFY WHAT TYPE OF LOOP TO USE:
If you are wondering when you should use FOR and WHILE loops, there is a simple way to tell:

Use FOR loops when there is a sequence of elements that you want to iterate.
Use WHILE loops when you want to repeat an action until a condition changes.
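The two rules of thumb above can be sketched with a pair of small, made-up examples. The variable names and numbers are assumptions chosen only for illustration.

```python
# FOR: the sequence of elements is known up front
prices = [4, 10, 6]
total = 0
for price in prices:
    total += price
print('total:', total)

# WHILE: repeat an action until a condition changes;
# the number of repetitions is not spelled out in advance
balance = 100
withdrawals = 0
while balance >= 30:
    balance -= 30
    withdrawals += 1
print('withdrawals:', withdrawals, 'remaining balance:', balance)
```

In the first loop the list decides how many passes happen. In the second, the loop keeps going only while the balance can still cover another withdrawal.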

NESTED LOOPS
A nested loop is a loop placed inside the body of another loop. Either loop can be a FOR loop or a WHILE loop.

Syntax:

#python
for outer_var in outer_sequence:
    for inner_var in inner_sequence:
        statement(s)
    statement(s)

The syntax for a nested while loop in Python programming:

#python
while expression:
    while expression:
        statement(s)
    statement(s)

Example:

# Python program to illustrate
# nested for loops in Python
for i in range(1, 5):
    for j in range(i):
        print(i, end=' ')
    print()

Output:
1
2 2
3 3 3
4 4 4 4
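The nested while syntax can produce the same triangle. The sketch below wraps it in a hypothetical function, triangle_rows, so that the rows can be inspected as well as printed.

```python
def triangle_rows(limit):
    """Hypothetical helper: build the rows of the number triangle."""
    rows = []
    i = 1
    while i < limit:              # outer loop: one pass per row
        j = 0
        row = []
        while j < i:              # inner loop: repeat the number i, i times
            row.append(str(i))
            j += 1
        rows.append(' '.join(row))
        i += 1
    return rows


for line in triangle_rows(5):
    print(line)
```

Unlike the for version, both counters (i and j) have to be initialized and incremented by hand, which is where unintended infinite loops tend to creep in.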

Loop Control Statements

Loop control statements change the execution of a loop from its normal sequence. Python supports the following control statements.

Continue Statement
The CONTINUE statement in Python skips the rest of the current iteration and returns control to the beginning of the loop for the next iteration.

Example:

#python
# Prints all letters except 'o' and 'k'
for letter in 'bookforbook':
    if letter == 'o' or letter == 'k':
        continue
    print('Current Letter :', letter)

Output:
Current Letter : b
Current Letter : f
Current Letter : r
Current Letter : b

Break Statement
The BREAK statement in Python terminates the loop immediately and passes control to the first statement after the loop.

Example:

#python
for letter in 'bookforbook':

    # break the loop as soon as it sees 'o'
    # or 'k'
    if letter == 'o' or letter == 'k':
        break

# This runs once, after the loop has ended
print('Current Letter :', letter)

Output:
Current Letter : o

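Pass Statement
The PASS statement does nothing. It is used as a placeholder where Python's syntax requires a statement, such as the body of an empty loop.

Example: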

#python
# An empty loop
for letter in 'bookforbook':
    pass
print('Last Letter :', letter)

Output:
Last Letter : k

Common Mistakes With Loops In Python

Iterating over non-iterables: As I've mentioned already, for loops iterate over sequences and other iterable objects. Consequently, Python's interpreter will refuse to iterate over single elements such as integers, raising a TypeError.
Failure to initialize variables: Make sure all the variables used in the loop's condition are initialized before the loop.
Unintended infinite loops: Make sure that the body of the loop modifies the variables used in the condition so that the loop will eventually end for all possible values of the variables.
Forgetting that the upper limit of range() isn't included: range(5) produces 0 to 4, not 0 to 5.
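Two of these mistakes are easy to reproduce. The short sketch below catches the error raised when looping over an integer and prints what range() actually produces; the variable caught is a hypothetical name used only to hold the error message.

```python
# Mistake: iterating over a single integer instead of an iterable
try:
    for digit in 5:
        print(digit)
except TypeError as error:
    caught = str(error)
    print('TypeError:', caught)

# Mistake: expecting range() to include its upper limit
print(list(range(5)))     # [0, 1, 2, 3, 4]
print(list(range(1, 6)))  # [1, 2, 3, 4, 5]
```

If the last value is needed, the upper limit passed to range() has to be one more than that value.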

How does a for loop in Python work internally?

Before proceeding to this section, you should have a prior understanding of Python Iterators.

Firstly, let's see what a simple for loop looks like.

#python
# A simple for-loop example

fruits = ["apple", "orange", "kiwi"]

for fruit in fruits:

    print(fruit)

Output:
apple
orange
kiwi

Here we can see that the for loop iterates over the iterable object fruits, which is a list. Lists, sets, and dictionaries are a few examples of iterable objects, while an integer is not iterable. A for loop can iterate over any iterable object.

Now with the help of the above example, let’s dive deep and see what happens internally here.

Create an iterator object from the list (the iterable) with the help of the iter() function.
Run an infinite while loop and break only when StopIteration is raised.
In the try block, fetch the next element of fruits with the next() function.
After fetching the element, perform the operation on it (i.e. print(fruit)).

#python
fruits = ["apple", "orange", "kiwi"]

# Creating an iterator object
# from that iterable i.e fruits
iter_obj = iter(fruits)

# Infinite while loop
while True:
    try:
        # getting the next item
        fruit = next(iter_obj)
        print(fruit)
    except StopIteration:

        # if StopIteration is raised,
        # break from loop
        break

Output:
apple
orange
kiwi
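The same protocol lets us write our own objects that a for loop can consume. The sketch below defines a hypothetical Countdown class that implements __iter__() and __next__(), the two methods that iter() and next() rely on.

```python
class Countdown:
    """Hypothetical iterator that counts down from 'start' to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # Raising StopIteration tells the for loop to stop
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for number in Countdown(3):
    print(number)
```

The for loop calls iter() on the Countdown object once, then keeps calling next() on it until StopIteration is raised, exactly as in the manual while loop shown earlier.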

Conclusion:
In this article, I explained how to tell a computer to do an action repetitively. Python gives us two different ways to perform repetitive tasks: while loops and for loops.

For loops are best when you want to iterate over a known sequence of elements, but when you want to keep going while a certain condition is true, while loops are the best choice.