<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sagar Das</title>
    <description>The latest articles on Forem by Sagar Das (@sagar25).</description>
    <link>https://forem.com/sagar25</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1358036%2F857a2c40-0930-442d-8845-51a2fbbffc36.jpg</url>
      <title>Forem: Sagar Das</title>
      <link>https://forem.com/sagar25</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sagar25"/>
    <language>en</language>
    <item>
      <title>TDoC 2024 - Day 3: Introduction to Machine Learning</title>
      <dc:creator>Sagar Das</dc:creator>
      <pubDate>Tue, 17 Dec 2024 16:20:20 +0000</pubDate>
      <link>https://forem.com/sagar25/tdoc-2024-day-3-introduction-to-machine-learning-kdk</link>
      <guid>https://forem.com/sagar25/tdoc-2024-day-3-introduction-to-machine-learning-kdk</guid>
      <description>&lt;p&gt;Welcome to Day 3 of &lt;strong&gt;TDoC 2024&lt;/strong&gt;! Today, we embarked on a fascinating journey into the realm of &lt;strong&gt;Machine Learning (ML)&lt;/strong&gt;—a powerful branch of &lt;strong&gt;Artificial Intelligence (AI)&lt;/strong&gt; that allows machines to learn from data and improve their performance over time without explicit programming.&lt;/p&gt;

&lt;p&gt;Machine Learning essentially involves teaching computers to recognize patterns in data so they can make predictions or decisions. In simpler terms, it’s about creating smarter systems that get better with experience.&lt;/p&gt;




&lt;h2&gt;Understanding Machine Learning&lt;/h2&gt;

&lt;p&gt;Machine Learning enables systems to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recognize handwritten digits after being trained with multiple examples.&lt;/li&gt;
&lt;li&gt;Predict outcomes, such as stock prices or weather conditions.&lt;/li&gt;
&lt;li&gt;Make suggestions, like Netflix recommending your next favorite show.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Real-World Applications&lt;/h3&gt;

&lt;p&gt;Here are some practical ML applications that you may already be using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spam Classification&lt;/strong&gt;: Filtering out spam emails (e.g., Gmail).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Movie Recommendations&lt;/strong&gt;: Suggesting shows based on your viewing history (e.g., Netflix).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language Translation&lt;/strong&gt;: Translating text from one language to another (e.g., Google Translate).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grammar Assistance&lt;/strong&gt;: Correcting grammar and spelling errors (e.g., Grammarly).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For our &lt;strong&gt;VocalShift&lt;/strong&gt; project, we’ll leverage Machine Learning to build an intelligent voice-changing AI.&lt;/p&gt;




&lt;h2&gt;Types of Machine Learning&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Supervised Learning&lt;/strong&gt;&lt;br&gt;
The algorithm is trained on labeled data, where each input has a corresponding output.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Discover the relationship between input features and their outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example:&lt;/strong&gt; Predicting house prices based on features like size, location, and amenities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this project we’ll focus on Supervised Learning to train models for voice transformation.&lt;/p&gt;
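&lt;p&gt;As a quick sketch of the house-price example above (using scikit-learn, which we meet properly later, and made-up sizes and prices), a supervised model learns the size-to-price relationship from labeled pairs:&lt;/p&gt;

```python
# Supervised learning sketch: every input (house size) has a known label (price).
# The data below is hypothetical and exactly linear for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[600], [800], [1000], [1200], [1500]])  # feature: sq. ft.
prices = np.array([150, 200, 250, 300, 375])              # label: price in $1000s

model = LinearRegression()
model.fit(sizes, prices)               # learn the size -> price relationship
predicted = model.predict([[1100]])    # price an unseen 1100 sq. ft. house
```

&lt;p&gt;Because the toy data is perfectly linear, the fitted line recovers it exactly; real data would leave some residual error.&lt;/p&gt;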



&lt;p&gt;&lt;strong&gt;2. Unsupervised Learning&lt;/strong&gt;&lt;br&gt;
The algorithm identifies patterns in data without predefined labels.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Cluster or group similar data points based on their features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example:&lt;/strong&gt; Segmenting customers into groups based on purchasing behavior.&lt;/li&gt;
&lt;/ul&gt;
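&lt;p&gt;As a minimal sketch of the customer-segmentation example (with hypothetical spend and visit numbers), scikit-learn's KMeans can group similar customers without any labels at all:&lt;/p&gt;

```python
# Unsupervised learning sketch: KMeans clusters customers by
# [annual spend, visits per month] with no labels provided.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [200, 2], [250, 3], [220, 2],       # low spend, infrequent visitors
    [1500, 20], [1600, 22], [1450, 19], # high spend, frequent visitors
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)  # cluster assignment per customer
```

&lt;p&gt;The algorithm discovers the two spending groups on its own; we only told it how many clusters to look for.&lt;/p&gt;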



&lt;p&gt;&lt;strong&gt;3. Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
The algorithm learns by interacting with an environment, receiving rewards for desired actions and penalties for undesirable ones.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Optimize actions to maximize long-term rewards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example:&lt;/strong&gt; Training an agent to play chess through trial and error.&lt;/li&gt;
&lt;/ul&gt;
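&lt;p&gt;Chess is far too large for a quick demo, but the same idea can be sketched on a toy task: an agent on a five-cell strip learns, by trial and error, that moving right toward the rewarded cell is the best policy. This is a minimal Q-learning sketch, not anything from our project code:&lt;/p&gt;

```python
# Reinforcement learning sketch: tabular Q-learning on a 5-cell strip.
# Reaching cell 4 yields reward 1; every other step yields 0.
import random

random.seed(0)
n_states, actions = 5, [0, 1]           # action 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for _ in range(500):                    # episodes of interaction
    state = 0
    while state != 4:
        if random.random() < epsilon:
            a = random.choice(actions)                   # explore
        else:
            a = max(actions, key=lambda x: Q[state][x])  # exploit best estimate
        nxt = min(state + 1, 4) if a == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == 4 else 0.0
        # Q-update: nudge the estimate toward reward + discounted future value
        Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt

policy = [max(actions, key=lambda x: Q[s][x]) for s in range(n_states)]
```

&lt;p&gt;After training, the learned policy chooses "right" in every non-terminal cell: the agent has maximized long-term reward purely from trial and error.&lt;/p&gt;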


&lt;h2&gt;Steps in a Machine Learning Project&lt;/h2&gt;

&lt;p&gt;A typical ML workflow can be broken into three key stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Collection&lt;/strong&gt;&lt;br&gt;
The success of any ML project begins with gathering high-quality data.&lt;br&gt;
 For example, a dataset of customer purchase records or a list of cricket chirps per minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data Modeling&lt;/strong&gt;&lt;br&gt;
Modeling involves applying ML algorithms to extract insights from data and create predictive models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Model Deployment&lt;/strong&gt;&lt;br&gt;
Once the model is trained, it is integrated into real-world applications, such as a product recommendation engine or a medical diagnostic system.&lt;/p&gt;


&lt;h2&gt;Diving Deeper: Data Preprocessing&lt;/h2&gt;

&lt;p&gt;Before training an ML model, data must be cleaned and prepared.&lt;/p&gt;
&lt;h3&gt;Steps in Data Preprocessing&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;h4&gt;Handling Missing Values&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Replace missing values with statistical measures like mean, median, or mode.&lt;/li&gt;
&lt;li&gt;Drop rows or columns with excessive missing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;Encoding Categorical Data&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Convert non-numeric data (e.g., categories) into numeric formats for model compatibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;Feature Scaling&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Standardize data (mean = 0, std = 1) to ensure uniformity, especially for algorithms sensitive to scale, like neural networks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;Dataset Splitting&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Divide the dataset into training and testing subsets.&lt;/li&gt;
&lt;li&gt;The training set is used to build the model, while the testing set evaluates its performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
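&lt;p&gt;The four steps above can be sketched end-to-end on a small made-up table (the column names and values are purely illustrative):&lt;/p&gt;

```python
# Preprocessing sketch: impute missing values, encode a category,
# scale features, then split into train and test sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, None, 35, 45, 29, 52],
    "city": ["Kolkata", "Delhi", "Kolkata", "Delhi", "Kolkata", "Delhi"],
    "purchased": [0, 1, 0, 1, 0, 1],
})

df["age"] = df["age"].fillna(df["age"].mean())           # 1. handle missing values
df["city"] = df["city"].map({"Kolkata": 0, "Delhi": 1})  # 2. encode the category

X = df[["age", "city"]].values
y = df["purchased"].values
X = StandardScaler().fit_transform(X)                    # 3. scale: mean 0, std 1

X_train, X_test, y_train, y_test = train_test_split(     # 4. split the dataset
    X, y, test_size=0.2, random_state=0)
```

&lt;p&gt;After scaling, every feature column has mean 0 and standard deviation 1, so no single feature dominates scale-sensitive algorithms.&lt;/p&gt;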


&lt;h2&gt;Choosing a Machine Learning Model&lt;/h2&gt;

&lt;p&gt;Selecting the right ML model depends on the type of problem and the nature of the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supervised&lt;/strong&gt; Learning models are suited for labeled data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unsupervised&lt;/strong&gt; Learning models are ideal for unlabeled datasets.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Training the Model&lt;/h3&gt;

&lt;p&gt;Model training involves fitting the ML algorithm to the training dataset. As the model learns, it improves its ability to make accurate predictions on unseen data.&lt;/p&gt;
&lt;h3&gt;Parameter Tuning&lt;/h3&gt;

&lt;p&gt;To achieve optimal results, ML models undergo parameter tuning using techniques like Gradient Descent to minimize the loss function—a measure of prediction error.&lt;/p&gt;
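&lt;p&gt;As a minimal sketch of the idea, Gradient Descent on a one-parameter loss repeatedly steps against the gradient until the loss stops shrinking (the loss function here is a made-up example, not our model's):&lt;/p&gt;

```python
# Gradient Descent sketch on the toy loss L(w) = (w - 3)^2, minimized at w = 3.
def gradient_descent(w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)  # dL/dw of the loss (w - 3)^2
        w -= lr * grad      # step in the direction that lowers the loss
    return w

w_opt = gradient_descent()  # converges toward the minimizer w = 3
```

&lt;p&gt;The same update rule, applied to the mean squared error of a regression line, is exactly what the implementation below does for its weight and bias.&lt;/p&gt;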


&lt;h2&gt;Hands-On: Implementing Linear Regression&lt;/h2&gt;
&lt;h3&gt;Creating a Simple Linear Regression Class&lt;/h3&gt;

&lt;p&gt;Here’s how we can code a Simple Linear Regression model in Python, complete with gradient descent for optimization:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import random
import matplotlib.pyplot as plt

class SimpleLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000, threshold=1e-6):
        self.lr = learning_rate
        self.max_iter = n_iterations
        self.threshold = threshold
        self.weight = None  # Coefficient (slope)
        self.bias = None    # Intercept

    def fit(self, X, Y):
        X = np.array(X)
        Y = np.array(Y)
        if X.ndim == 1:
            X = X.reshape(-1, 1)

        n = X.shape[0]
        loss_history = []

        # Initializing weight and bias
        self.weight = random.uniform(-1, 1)
        self.bias = random.uniform(-1, 1)

        # Initial prediction and loss
        Y_predict = self.predict(X)
        errors = Y_predict - Y
        prev_loss = np.sum(errors ** 2) / (2 * n)
        loss_history.append(prev_loss)

        for i in range(self.max_iter):
            # Compute gradients
            dW = np.sum(errors * X.flatten()) / n  # Gradient for weight
            dB = np.sum(errors) / n                # Gradient for bias

            # Gradient descent step
            self.weight -= self.lr * dW
            self.bias -= self.lr * dB

            # Update predictions and calculate new loss
            Y_predict = self.predict(X)
            errors = Y_predict - Y
            curr_loss = np.sum(errors ** 2) / (2 * n)
            loss_history.append(curr_loss)

            # Check for convergence
            if abs(prev_loss - curr_loss) &amp;lt; self.threshold:
                print(f"Converged after {i + 1} iterations.")
                break
            prev_loss = curr_loss

            # Optional progress output
            if (i + 1) % 100 == 0:
                print(f"Iteration {i + 1}: Loss = {curr_loss:.6f}")

        return loss_history

    def predict(self, X):
        X = np.array(X)
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        return self.weight * X.flatten() + self.bias

    def score(self, X, Y):
        y_pred = self.predict(X)
        u = np.sum((Y - y_pred) ** 2)  # Residual sum of squares
        v = np.sum((Y - np.mean(Y)) ** 2)  # Total sum of squares
        return 1 - (u / v)

    def plot(self, X, Y):
        X = np.array(X)
        Y = np.array(Y)
        if X.ndim == 1:
            X = X.reshape(-1, 1)

        y_pred = self.predict(X)
        plt.scatter(X.flatten(), Y, label='Data Points')
        plt.plot(X.flatten(), y_pred, color='red', label='Regression Line')
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.title('Simple Linear Regression')
        plt.legend()
        plt.show()

    def plot_loss(self, loss_history):
        plt.plot(range(len(loss_history)), loss_history, label='Loss')
        plt.xlabel('Iterations')
        plt.ylabel('Loss')
        plt.title('Loss Over Iterations')
        plt.legend()
        plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;Testing the Implementation&lt;/h4&gt;

&lt;p&gt;We can test the above class using synthetic data:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Generate synthetic data
np.random.seed(42)
X = np.random.rand(100) * 10  # Random numbers between 0 and 10
Y = 2.5 * X + np.random.randn(100) * 2  # Linear relationship with noise

# Train the model
model = SimpleLinearRegression(learning_rate=0.01, n_iterations=1000, threshold=1e-6)
loss_history = model.fit(X, Y)

# Evaluate the model
print("Weight (slope):", model.weight)
print("Bias (intercept):", model.bias)
print("R² Score:", model.score(X, Y))

# Visualize the regression line
model.plot(X, Y)

# Visualize the loss over iterations
model.plot_loss(loss_history)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;Solving a Real-World Problem: Predicting Temperature from Cricket Chirps&lt;/h3&gt;

&lt;p&gt;Using a real-world dataset, let’s predict temperature based on cricket chirps.&lt;/p&gt;

&lt;h4&gt;Step 1: Import and Load the Dataset&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import pandas as pd

dataset = pd.read_csv('cricket_chirps.csv')
print(dataset)
X = dataset.iloc[:, :-1].values  # Independent variable(s): chirps per minute
y = dataset.iloc[:, -1].values   # Dependent variable: temperature (last column)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;Step 2: Split Data into Training and Test Sets&lt;/h4&gt;

&lt;p&gt;Splitting the dataset is a crucial step to ensure the model is trained on one part of the data and evaluated on another.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # fixed seed so the split is reproducible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;Step 3: Train a Linear Regression Model&lt;/h4&gt;

&lt;p&gt;Using the scikit-learn library, we can create a linear regression model:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)  # Train the model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;Step 4: Make Predictions&lt;/h4&gt;

&lt;p&gt;Once the model is trained, it can be used to predict unseen data:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_pred = regressor.predict(X_test)
print(y_pred)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;Step 5: Visualize the Results&lt;/h4&gt;

&lt;p&gt;Visualization helps in understanding the performance of the model:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt

plt.scatter(X_train, y_train, color='green')  # Training data
plt.scatter(X_test, y_test, color='red')      # Actual test data
plt.scatter(X_test, y_pred, color='blue')     # Predicted test data
plt.plot(X_train, regressor.predict(X_train), color='gray')  # Regression line
plt.title('Temperature Prediction Based on Cricket Chirps')
plt.xlabel('Chirps per Minute')
plt.ylabel('Temperature')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;What You Achieved on Day 3&lt;/h3&gt;

&lt;p&gt;By the end of today, you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gained a solid understanding of Machine Learning and its real-world applications.&lt;/li&gt;
&lt;li&gt;Learned about the three main types of ML: Supervised, Unsupervised, and Reinforcement Learning.&lt;/li&gt;
&lt;li&gt;Explored the ML project lifecycle, from data collection to deployment.&lt;/li&gt;
&lt;li&gt;Implemented and visualized a Linear Regression model in Python.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;Resources for Further Learning&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/user_guide.html#user-guide" rel="noopener noreferrer"&gt;scikit-learn user guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;Your Feedback Matters!&lt;/h3&gt;

&lt;p&gt;Feel free to share your results, challenges, or questions in the comments. Happy coding! 🚀&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
