<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Anik Chand</title>
    <description>The latest articles on Forem by Anik Chand (@anikchand461).</description>
    <link>https://forem.com/anikchand461</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1406732%2Fb5f85d80-432f-439f-ae7e-f818caf480f0.jpg</url>
      <title>Forem: Anik Chand</title>
      <link>https://forem.com/anikchand461</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/anikchand461"/>
    <language>en</language>
    <item>
      <title>I Was Tired of Waiting for GridSearchCV. So I Built Something Smarter. 🚀</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Tue, 24 Mar 2026 19:15:00 +0000</pubDate>
      <link>https://forem.com/anikchand461/i-was-tired-of-waiting-for-gridsearchcv-so-i-built-something-smarter-323g</link>
      <guid>https://forem.com/anikchand461/i-was-tired-of-waiting-for-gridsearchcv-so-i-built-something-smarter-323g</guid>
      <description>&lt;p&gt;Have you ever set up a &lt;code&gt;GridSearchCV&lt;/code&gt;, pressed run, watched the little spinner go... and then just &lt;strong&gt;left the room&lt;/strong&gt;? Maybe made tea. Maybe made dinner. Came back — and it was &lt;em&gt;still&lt;/em&gt; running?&lt;/p&gt;

&lt;p&gt;I hit that wall one too many times. Instead of waiting, I started thinking — &lt;em&gt;why does this have to be this slow?&lt;/em&gt; That frustration turned into a late-night coding session, which became &lt;strong&gt;&lt;a href="https://pypi.org/project/lazytune/" rel="noopener noreferrer"&gt;LazyTune&lt;/a&gt;&lt;/strong&gt; — a smarter hyperparameter tuner for scikit-learn that I turned into a proper Python package with a live web app.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with GridSearchCV
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;code&gt;GridSearchCV&lt;/code&gt; does under the hood:&lt;/p&gt;

&lt;p&gt;You give it a parameter grid. Say 4 values for &lt;code&gt;n_estimators&lt;/code&gt;, 4 for &lt;code&gt;max_depth&lt;/code&gt;, 4 for &lt;code&gt;min_samples_split&lt;/code&gt;. That's &lt;strong&gt;64 combinations&lt;/strong&gt;. With 5-fold CV, that's &lt;strong&gt;320 full training runs&lt;/strong&gt;. On your entire dataset. Every single one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RandomizedSearchCV&lt;/code&gt; helps a little — it just picks random combos instead of all of them. But random is &lt;em&gt;dumb&lt;/em&gt;. It has no idea which combinations are promising. Tools like Optuna and Hyperopt are genuinely clever, but they come with their own vocabulary and APIs, and honestly feel like overkill when you just want to tune a Random Forest on a Friday afternoon.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Insight That Started Everything
&lt;/h2&gt;

&lt;p&gt;Here's the thought that clicked at 2am:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most hyperparameter combinations are obviously bad within the first few training rounds. You don't need to fully train them to know they're losers.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of it like a talent show audition. You don't give every contestant an hour-long slot. You do a quick 2-minute round first, figure out who's genuinely talented, &lt;em&gt;then&lt;/em&gt; give the finalists the full slot.&lt;/p&gt;

&lt;p&gt;LazyTune does exactly this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate all combinations&lt;/li&gt;
&lt;li&gt;Do a &lt;strong&gt;quick screening&lt;/strong&gt; on a small data subset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rank&lt;/strong&gt; every combination by early performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prune&lt;/strong&gt; the obvious losers (the bottom X%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fully train&lt;/strong&gt; only the survivors&lt;/li&gt;
&lt;li&gt;Return the winner&lt;/li&gt;
&lt;/ol&gt;
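&lt;p&gt;To make that concrete, here is a minimal sketch of the screen/rank/prune/refit loop in plain scikit-learn. This is illustrative only, not LazyTune's actual internals:&lt;/p&gt;

```python
# Minimal sketch of screen -> rank -> prune -> refit
# (illustrative only; not LazyTune's real implementation).
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split


def screen_and_prune(estimator, param_grid, X, y, prune_ratio=0.5):
    # 1. Generate all combinations
    candidates = list(ParameterGrid(param_grid))

    # 2. Quick screening: cheap 3-fold CV on a 30% subset
    X_small, _, y_small, _ = train_test_split(X, y, train_size=0.3, random_state=0)
    scores = [
        cross_val_score(clone(estimator).set_params(**p), X_small, y_small, cv=3).mean()
        for p in candidates
    ]

    # 3-4. Rank every combination, keep only the top prune_ratio fraction
    order = np.argsort(scores)[::-1]
    n_keep = max(1, int(len(candidates) * prune_ratio))
    survivors = [candidates[i] for i in order[:n_keep]]

    # 5-6. Fully train only the survivors with proper CV; return the winner
    return max(
        survivors,
        key=lambda p: cross_val_score(clone(estimator).set_params(**p), X, y, cv=5).mean(),
    )
```

Even this naive version only pays the full cross-validation cost for the survivors.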

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k5k15kqiz7s9gr61ly9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k5k15kqiz7s9gr61ly9.png" alt="Main diagram" width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The whole thing lives in one class — &lt;code&gt;SmartSearch&lt;/code&gt; — that you use almost identically to &lt;code&gt;GridSearchCV&lt;/code&gt;. No new mental model. No new vocabulary. Just smarter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the Hood — The Full Data Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fek9bweo1any1slfno9li.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fek9bweo1any1slfno9li.png" alt="Dataflow" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 —&lt;/strong&gt; Split your data: 80% training, 20% held-out test set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 —&lt;/strong&gt; Split the training set further: 30% becomes a "small screening subset," 70% becomes a validation pool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 —&lt;/strong&gt; Screen ALL combinations quickly on that 30% subset using 3-fold CV.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 —&lt;/strong&gt; Rank all combinations by validation score — you get a full leaderboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 —&lt;/strong&gt; With &lt;code&gt;prune_ratio=0.1&lt;/code&gt;, keep the top ~10%. The other 90%? Gone. Never fully trained. ✂️&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6 —&lt;/strong&gt; Fully train the surviving configs on the entire training set with proper CV.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7 —&lt;/strong&gt; Evaluate survivors on the held-out 20%. The winner comes back as &lt;code&gt;best_estimator_&lt;/code&gt; with &lt;code&gt;best_params_&lt;/code&gt; and &lt;code&gt;best_score_&lt;/code&gt;.&lt;/p&gt;
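&lt;p&gt;In sklearn terms, Steps 1 and 2 are two nested &lt;code&gt;train_test_split&lt;/code&gt; calls. Treat the seeds below as a sketch of the idea, not LazyTune's literal code:&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, 1000)

# Step 1: 80% training pool / 20% held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: 30% of the training pool becomes the cheap screening subset,
# the remaining 70% is the validation pool
X_screen, X_val, y_screen, y_val = train_test_split(
    X_train, y_train, train_size=0.3, random_state=42
)

print(len(X_train), len(X_test))   # 800 200
print(len(X_screen), len(X_val))   # 240 560
```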

&lt;p&gt;The result: for a 27-combination grid, serious compute goes to roughly 3 surviving configs instead of all 27 — and because the screening was informative, the winner is almost always the same one exhaustive GridSearchCV would have picked.&lt;/p&gt;




&lt;h2&gt;
  
  
  GridSearchCV vs LazyTune — The Visual Difference
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2prxmonu7xjo1ru1y4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2prxmonu7xjo1ru1y4b.png" alt="visual difference" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the left, every combination gets the full expensive treatment. On the right, LazyTune screens all 9 cheaply, crosses out the clear losers, and only invests in the survivors. Same result. Fraction of the compute.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Benchmarks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does LazyTune match GridSearchCV's accuracy?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeelgz1527857ae876e6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbeelgz1527857ae876e6.png" alt="gridsearchcv vs. lazytune" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For all three classifiers — RandomForest, SVC, LogisticRegression — the accuracy bars are practically indistinguishable, with both tuners landing in the 95–97% range. &lt;strong&gt;LazyTune matches GridSearchCV's accuracy almost perfectly.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  LazyTune vs Every Major Tuner
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuguh7pofgyvx4e6i673w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuguh7pofgyvx4e6i673w.png" alt="benchmark" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🥇 &lt;strong&gt;LazyTune&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.940&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.23s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GridSearchCV&lt;/td&gt;
&lt;td&gt;0.909&lt;/td&gt;
&lt;td&gt;5.02s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RandomizedSearchCV&lt;/td&gt;
&lt;td&gt;0.909&lt;/td&gt;
&lt;td&gt;4.83s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optuna&lt;/td&gt;
&lt;td&gt;0.912&lt;/td&gt;
&lt;td&gt;5.43s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hyperopt&lt;/td&gt;
&lt;td&gt;0.913&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;10.37s&lt;/strong&gt; 😬&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LazyTune gets the &lt;strong&gt;highest accuracy of all five&lt;/strong&gt;. Hyperopt — often touted as the smart Bayesian choice — takes 10.37s and still scores lower.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Dataset Benchmark
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vay46iimvuxw4sqesd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vay46iimvuxw4sqesd9.png" alt="large dataset comparison" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🥇 &lt;strong&gt;LazyTune&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.982&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;122.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GridSearchCV&lt;/td&gt;
&lt;td&gt;0.978&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;143.5s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RandomizedSearchCV&lt;/td&gt;
&lt;td&gt;0.978&lt;/td&gt;
&lt;td&gt;24.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optuna&lt;/td&gt;
&lt;td&gt;0.978&lt;/td&gt;
&lt;td&gt;76.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hyperopt&lt;/td&gt;
&lt;td&gt;0.977&lt;/td&gt;
&lt;td&gt;86.6s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LazyTune still leads on accuracy. GridSearchCV is &lt;strong&gt;17% slower&lt;/strong&gt; and still loses. LazyTune consistently gives you the best accuracy-per-second of anything tested.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Write Some Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;lazytune
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic usage — Random Forest
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lazytune&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmartSearch&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_samples_split&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmartSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cv_folds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prune_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best parameters:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best CV score:  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best model:     &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_estimator_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5zh70aj6ywm5phuhotj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5zh70aj6ywm5phuhotj.png" alt="api comparison diagram" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One new parameter — &lt;code&gt;prune_ratio&lt;/code&gt;. One renamed parameter — &lt;code&gt;metric&lt;/code&gt; instead of &lt;code&gt;scoring&lt;/code&gt;. Everything else is identical to sklearn.&lt;/p&gt;
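&lt;p&gt;For comparison, here is the plain-sklearn version of the same search: same estimator and grid, &lt;code&gt;scoring&lt;/code&gt; instead of &lt;code&gt;metric&lt;/code&gt;, and no &lt;code&gt;prune_ratio&lt;/code&gt; because &lt;code&gt;GridSearchCV&lt;/code&gt; always trains every combination:&lt;/p&gt;

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# The plain-sklearn equivalent of the SmartSearch call above.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={
        "n_estimators": [50, 100, 150, 200],
        "max_depth": [5, 10, 15, None],
        "min_samples_split": [2, 3, 4, 5],
    },
    scoring="accuracy",  # SmartSearch calls this `metric`
    cv=3,
    n_jobs=-1,
)
```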

&lt;h3&gt;
  
  
  SVM with F1 score
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmartSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kernel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rbf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gamma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f1_macro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cv_folds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prune_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Understanding &lt;code&gt;prune_ratio&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmna1grn8vmv114wcqcrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmna1grn8vmv114wcqcrn.png" alt="prune ratio" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;prune_ratio&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Keep only top 10%&lt;/td&gt;
&lt;td&gt;Huge grids where you trust fast screening&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Keep top 30%&lt;/td&gt;
&lt;td&gt;Good balance, slightly conservative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Keep top 50%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Start here — recommended default&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Keep everything&lt;/td&gt;
&lt;td&gt;Same as GridSearchCV — comparison only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Always start at &lt;code&gt;0.5&lt;/code&gt;. Once you trust the screening on your dataset, try going lower.&lt;/p&gt;
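&lt;p&gt;To sanity-check how many configs survive a given &lt;code&gt;prune_ratio&lt;/code&gt; before running anything, the math is simple (assuming straight truncation; LazyTune's exact rounding may differ):&lt;/p&gt;

```python
from sklearn.model_selection import ParameterGrid

grid = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [5, 10, 15, None],
    "min_samples_split": [2, 3, 4, 5],
}
n_candidates = len(ParameterGrid(grid))  # 64

for prune_ratio in (0.1, 0.3, 0.5, 1.0):
    survivors = max(1, int(n_candidates * prune_ratio))
    print(f"prune_ratio={prune_ratio}: {survivors} of {n_candidates} fully trained")
```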




&lt;h2&gt;
  
  
  After &lt;code&gt;.fit()&lt;/code&gt; — Everything You Get Back
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;      &lt;span class="c1"&gt;# dict      — the winning hyperparameter combo
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_score_&lt;/span&gt;       &lt;span class="c1"&gt;# float     — best cross-validated score
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_estimator_&lt;/span&gt;   &lt;span class="c1"&gt;# model     — fully fitted, ready to use
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary_&lt;/span&gt;          &lt;span class="c1"&gt;# DataFrame — every trial ranked by score
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cv_results_&lt;/span&gt;       &lt;span class="c1"&gt;# dict      — full CV results per candidate
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since &lt;code&gt;best_estimator_&lt;/code&gt; is a normal sklearn model:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;accuracy&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No adapters. No wrappers. Just works.&lt;/p&gt;




&lt;h2&gt;
  
  
  There's Also a Web App
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://lazytune.vercel.app" rel="noopener noreferrer"&gt;lazytune.vercel.app&lt;/a&gt;&lt;/strong&gt; — upload your CSV, pick a model, enter your parameter ranges, hit run. The same &lt;code&gt;SmartSearch&lt;/code&gt; engine runs on the backend. No local setup required.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's On the Roadmap
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto &lt;code&gt;prune_ratio&lt;/code&gt;&lt;/strong&gt; — calibrate pruning based on grid size and a time budget&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;XGBoost / LightGBM native support&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early stopping within screening&lt;/strong&gt; — kill candidates mid-CV the moment they're failing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual trial landscape&lt;/strong&gt; — a heatmap of the hyperparameter space in the web UI&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timing breakdown in &lt;code&gt;summary_&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open an issue on GitHub if any of these excite you — or if you have an idea I haven't thought of.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Right Now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;lazytune
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;📦 &lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/lazytune/" rel="noopener noreferrer"&gt;pypi.org/project/lazytune&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/anikchand461/lazytune" rel="noopener noreferrer"&gt;github.com/anikchand461/lazytune&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Live Demo&lt;/strong&gt;: &lt;a href="https://lazytune.vercel.app" rel="noopener noreferrer"&gt;lazytune.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This started as frustration at 2am and is now a real thing people can &lt;code&gt;pip install&lt;/code&gt;. If you found it useful — a ❤️ on the post or a ⭐ on GitHub keeps me motivated to keep building.&lt;/p&gt;

&lt;p&gt;Happy tuning! 🚀&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Thu, 09 Oct 2025 14:30:56 +0000</pubDate>
      <link>https://forem.com/anikchand461/-1g5g</link>
      <guid>https://forem.com/anikchand461/-1g5g</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/abhirajadhikary06/building-a-youtube-video-search-app-with-flask-whisper-and-rag-ebl" class="crayons-story__hidden-navigation-link"&gt;Building a YouTube Video Search App with Flask, Whisper, and RAG&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/abhirajadhikary06" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2096578%2F92bca5c8-a4a6-4407-8ff7-e2c0b7e5a9e5.png" alt="abhirajadhikary06 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/abhirajadhikary06" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Abhiraj Adhikary
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Abhiraj Adhikary
                
              
              &lt;div id="story-author-preview-content-2908719" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/abhirajadhikary06" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2096578%2F92bca5c8-a4a6-4407-8ff7-e2c0b7e5a9e5.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Abhiraj Adhikary&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/abhirajadhikary06/building-a-youtube-video-search-app-with-flask-whisper-and-rag-ebl" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Oct 9 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/abhirajadhikary06/building-a-youtube-video-search-app-with-flask-whisper-and-rag-ebl" id="article-link-2908719"&gt;
          Building a YouTube Video Search App with Flask, Whisper, and RAG
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/whisper"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;whisper&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/flask"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;flask&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/mariadb"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;mariadb&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/hackathon"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;hackathon&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/abhirajadhikary06/building-a-youtube-video-search-app-with-flask-whisper-and-rag-ebl" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/abhirajadhikary06/building-a-youtube-video-search-app-with-flask-whisper-and-rag-ebl#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>whisper</category>
      <category>flask</category>
      <category>mariadb</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>How Google Translate &amp; ChatGPT Work: The Transformer, Unboxed</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Thu, 09 Oct 2025 13:50:48 +0000</pubDate>
      <link>https://forem.com/anikchand461/how-google-translate-chatgpt-work-the-transformer-unboxed-3el</link>
      <guid>https://forem.com/anikchand461/how-google-translate-chatgpt-work-the-transformer-unboxed-3el</guid>
      <description>&lt;h1&gt;
  
  
  What &lt;em&gt;Exactly&lt;/em&gt; Is a Transformer? 🤔
&lt;/h1&gt;

&lt;p&gt;Ever used &lt;strong&gt;Google Translate&lt;/strong&gt; 🌍 or chatted with &lt;strong&gt;ChatGPT&lt;/strong&gt; 💬?&lt;br&gt;&lt;br&gt;
Behind both lies the same breakthrough: the &lt;strong&gt;Transformer&lt;/strong&gt; ⚡.&lt;/p&gt;

&lt;p&gt;Imagine an AI that doesn’t read sentences like a robot 🤖—one word at a time—but like a human 🧠:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;instantly grasping how every word connects to every other word&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
That’s the Transformer.&lt;/p&gt;

&lt;p&gt;It’s a revolutionary AI architecture that &lt;strong&gt;understands and generates language by focusing on the most important words at once&lt;/strong&gt;—no slow, step-by-step reading required.&lt;/p&gt;

&lt;p&gt;Born in 2017 from the paper that changed everything—&lt;strong&gt;&lt;em&gt;“Attention Is All You Need”&lt;/em&gt;&lt;/strong&gt;✨—the Transformer ditched old-school methods and bet everything on one powerful idea: &lt;strong&gt;attention&lt;/strong&gt; 👀.&lt;/p&gt;

&lt;p&gt;And it worked. So well, in fact, that it now powers the smartest language tools you use every day.&lt;/p&gt;

&lt;p&gt;📄 &lt;strong&gt;Curious how it all began?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Read the original paper here: &lt;a href="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf" rel="noopener noreferrer"&gt;“Attention Is All You Need”&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  High-Level Architecture: The Two Main Parts
&lt;/h2&gt;

&lt;p&gt;The Transformer has two big sections that work together, like two teams: the &lt;strong&gt;Encoder&lt;/strong&gt; and the &lt;strong&gt;Decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The process works in a few simple steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Input Ready&lt;/strong&gt;: We take the starting sentence (here, the English one). We turn each word into a list of numbers (this is &lt;strong&gt;Embedding&lt;/strong&gt;), and we also add a special position code (&lt;strong&gt;Positional Encoding&lt;/strong&gt;) so the Transformer knows the order of the words.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Encoder's Job&lt;/strong&gt;: The &lt;strong&gt;Encoder&lt;/strong&gt; reads the whole input sentence and figures out the complete meaning and context of every word. It creates a detailed "thought" for the sentence.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Decoder's Job&lt;/strong&gt;: The &lt;strong&gt;Decoder&lt;/strong&gt; starts with a "Start" signal. It looks at the Encoder's "thought" and starts writing the new sentence (here, the Hindi one), one word at a time.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Final Output&lt;/strong&gt;: A simple layer (&lt;strong&gt;Linear and Softmax&lt;/strong&gt;) at the end chooses the most likely word to be the next one in the sentence.&lt;/li&gt;
&lt;/ol&gt;
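&lt;p&gt;The four steps above can be sketched as a schematic pipeline. This is only an illustration: every component below is a toy stand-in function, not a real model.&lt;/p&gt;

```python
def translate(source_tokens, embed, encoder, decoder, output_layer):
    """Schematic Transformer pipeline; each component is passed in as a function."""
    x = embed(source_tokens)              # 1. embedding + positional encoding
    thought = encoder(x)                  # 2. encoder builds the sentence "thought"
    generated = ["<start>"]
    while generated[-1] != "<end>" and len(generated) < 10:
        generated.append(output_layer(decoder(generated, thought)))  # 3 + 4
    return generated[1:-1]

# Toy stand-ins: a lookup table plays the role of the trained model.
table = {1: "तुम", 2: "कैसे", 3: "हो", 4: "<end>"}
out = translate(
    ["How", "are", "you"],
    embed=lambda toks: toks,
    encoder=lambda x: x,
    decoder=lambda gen, thought: len(gen),   # current step index
    output_layer=lambda step: table[step],
)
print(out)  # ['तुम', 'कैसे', 'हो']
```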

&lt;h3&gt;
  
  
  🧑‍🍳 A Quick Analogy: The Bilingual Cooking Show
&lt;/h3&gt;

&lt;p&gt;Imagine a chef who must &lt;strong&gt;recreate a dish from a foreign recipe&lt;/strong&gt;—but doesn’t speak the language.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Encoder&lt;/strong&gt; is like a team of expert tasters who read the whole original recipe at once and create a &lt;strong&gt;Master Flavor Map&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Decoder&lt;/strong&gt; is the recreating chef who:

&lt;ul&gt;
&lt;li&gt;Starts with a &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; note,&lt;/li&gt;
&lt;li&gt;Can only look at what they’ve &lt;strong&gt;already cooked&lt;/strong&gt; (no peeking ahead!),&lt;/li&gt;
&lt;li&gt;And keeps glancing at the &lt;strong&gt;Flavor Map&lt;/strong&gt; to decide the next ingredient.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Finally, a &lt;strong&gt;pantry assistant&lt;/strong&gt; (Linear + Softmax) picks the most likely Hindi word (ingredient) for each step.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is exactly how the Transformer translates &lt;em&gt;"How are you?"&lt;/em&gt; → &lt;em&gt;"तुम कैसे हो?"&lt;/em&gt;—one smart, attentive step at a time!&lt;/p&gt;

&lt;h3&gt;
  
  
  High-Level Block Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cq5zt3is2ggfsrcmd40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cq5zt3is2ggfsrcmd40.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shows the big picture. Now, let's open up these big blocks and see the smaller, powerful layers inside!&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Inside the Input Block (Encoder Input)
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Input Block&lt;/strong&gt; is the very first step on both the &lt;strong&gt;Encoder&lt;/strong&gt; and &lt;strong&gt;Decoder&lt;/strong&gt; side of the Transformer. It takes the original words and prepares them for the attention layers.&lt;/p&gt;

&lt;p&gt;Let’s follow the flow in the diagram — from bottom to top — to see exactly how the input sentence gets ready for the Transformer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz01bbtumh5vwwwo8usq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz01bbtumh5vwwwo8usq.png" alt=" " width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Tokenizer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The sentence &lt;code&gt;"How are you"&lt;/code&gt; goes into the &lt;strong&gt;Tokenizer&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It breaks the sentence into individual words (or subwords):
→ &lt;code&gt;"How"&lt;/code&gt;, &lt;code&gt;"are"&lt;/code&gt;, &lt;code&gt;"you"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj490qonpj5xarj2uillr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj490qonpj5xarj2uillr.png" alt=" " width="800" height="754"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Think of this like cutting a sandwich into pieces before eating it — the model works with individual tokens, not whole sentences.&lt;/p&gt;
&lt;/blockquote&gt;
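&lt;p&gt;A minimal sketch of this step, assuming plain whitespace splitting (real Transformers use subword tokenizers such as BPE or WordPiece instead):&lt;/p&gt;

```python
def tokenize(sentence):
    # Toy whitespace tokenizer; production models split into subwords instead.
    return sentence.split()

tokens = tokenize("How are you")
print(tokens)  # ['How', 'are', 'you']
```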

&lt;h3&gt;
  
  
  Step 2: Embedding (512 dim)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc15kj53dx6yp7eir544.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc15kj53dx6yp7eir544.png" alt=" " width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each word (&lt;code&gt;"How"&lt;/code&gt;, &lt;code&gt;"are"&lt;/code&gt;, &lt;code&gt;"you"&lt;/code&gt;) is sent to the &lt;strong&gt;Embedding&lt;/strong&gt; layer.&lt;/li&gt;
&lt;li&gt;This turns each word into a &lt;strong&gt;512-number list&lt;/strong&gt; (called a &lt;em&gt;vector&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;These are called &lt;strong&gt;Word Embeddings&lt;/strong&gt;: &lt;code&gt;E1&lt;/code&gt;, &lt;code&gt;E2&lt;/code&gt;, &lt;code&gt;E3&lt;/code&gt; (each is &lt;code&gt;(512,)&lt;/code&gt; — a vector of 512 numbers).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Example:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"How"&lt;/code&gt; → &lt;code&gt;E1&lt;/code&gt; = [0.2, -0.8, 0.9, ..., 0.1]&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"are"&lt;/code&gt; → &lt;code&gt;E2&lt;/code&gt; = [0.7, 0.3, -0.6, ..., 0.4]
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"you"&lt;/code&gt; → &lt;code&gt;E3&lt;/code&gt; = [-0.1, 0.9, 0.2, ..., -0.7]&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
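&lt;p&gt;A toy version of the Embedding layer, with a random lookup table standing in for the learned weights (the vocabulary and the values here are made up for illustration):&lt;/p&gt;

```python
import numpy as np

d_model = 512
vocab = {"How": 0, "are": 1, "you": 2}   # toy vocabulary

# A (vocab_size x 512) lookup table; in a real model these weights are learned.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), d_model))

def embed(tokens):
    # Each token becomes one 512-number vector (E1, E2, E3 in the text).
    return np.stack([embedding_matrix[vocab[t]] for t in tokens])

E = embed(["How", "are", "you"])
print(E.shape)  # (3, 512)
```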

&lt;h3&gt;
  
  
  Step 3: Positional Embeddings
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j33b7h91vfh67u41nen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j33b7h91vfh67u41nen.png" alt=" " width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At the same time, each word gets a &lt;strong&gt;Positional Embedding&lt;/strong&gt;: &lt;code&gt;P1&lt;/code&gt;, &lt;code&gt;P2&lt;/code&gt;, &lt;code&gt;P3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;These are &lt;strong&gt;not learned&lt;/strong&gt; — they’re precomputed using special math (sine/cosine waves) so every position has a unique pattern.&lt;/li&gt;
&lt;li&gt;Why? So the model knows that &lt;code&gt;"How"&lt;/code&gt; is first, &lt;code&gt;"are"&lt;/code&gt; is second, &lt;code&gt;"you"&lt;/code&gt; is third.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🧩 Without this, &lt;code&gt;"Cat chases dog"&lt;/code&gt; and &lt;code&gt;"Dog chases cat"&lt;/code&gt; would look identical!&lt;/p&gt;
&lt;/blockquote&gt;
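&lt;p&gt;The sine/cosine recipe from the paper can be computed directly; here is a minimal NumPy sketch:&lt;/p&gt;

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    # PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000, i / d_model)
    P = np.zeros((seq_len, d_model))
    P[:, 0::2] = np.sin(angles)                # even dimensions
    P[:, 1::2] = np.cos(angles)                # odd dimensions
    return P

P = positional_encoding(3)   # P1, P2, P3 for a 3-word sentence
print(P.shape)  # (3, 512)
```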

&lt;h3&gt;
  
  
  Step 4: Add Them Together → Positional Encoded Vectors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;For each word, we &lt;strong&gt;add&lt;/strong&gt; its Word Embedding and Positional Embedding:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;X1 = E1 + P1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X2 = E2 + P2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X3 = E3 + P3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These final vectors — &lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt; — are called &lt;strong&gt;Positional Encoded Vectors&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each is still &lt;strong&gt;512 numbers&lt;/strong&gt; — but now they contain &lt;strong&gt;both meaning AND position&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 This is the magic: the model now has all the info it needs to start paying attention!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  📌 What Happens Next?
&lt;/h3&gt;

&lt;p&gt;These &lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt; vectors are now ready to go into the &lt;strong&gt;first Encoder block&lt;/strong&gt; — where the real “attention” begins!&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 Inside the Encoder Block
&lt;/h2&gt;

&lt;p&gt;The Encoder takes your input sentence (like &lt;em&gt;"How are you?"&lt;/em&gt;) and turns it into a deep, contextual understanding of every word.&lt;br&gt;&lt;br&gt;
It does this in &lt;strong&gt;two main steps&lt;/strong&gt;: first, a &lt;strong&gt;Multi-Head Attention Block&lt;/strong&gt; lets each word understand its relationship to all others. Then, a &lt;strong&gt;Feed Forward Neural Network Block&lt;/strong&gt; refines that meaning further.&lt;br&gt;&lt;br&gt;
This whole process repeats 6 times — each time making the understanding richer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkp63e4yirisg466y2nil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkp63e4yirisg466y2nil.png" alt=" " width="800" height="777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s walk through one full Encoder block using the diagram above — from bottom to top.&lt;/p&gt;

&lt;h3&gt;
  
  
  ➡️ Step 1: Input — Positional Encoded Vectors (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Input shape: &lt;code&gt;(3, 512)&lt;/code&gt; → 3 words, each as a 512-number vector.&lt;/li&gt;
&lt;li&gt;These come from the &lt;strong&gt;Input Block&lt;/strong&gt; (after adding Word + Positional Embeddings).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟢 Step 2: Multi-Head Attention
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foc3swh5xfv4asby1cm3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foc3swh5xfv4asby1cm3t.png" alt=" " width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each word (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;) looks at all other words to understand context.&lt;/li&gt;
&lt;li&gt;Output: &lt;strong&gt;Contextual Embeddings&lt;/strong&gt; → &lt;code&gt;Z1&lt;/code&gt;, &lt;code&gt;Z2&lt;/code&gt;, &lt;code&gt;Z3&lt;/code&gt; (still &lt;code&gt;(3, 512)&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;This is where the model learns that &lt;code&gt;"you"&lt;/code&gt; should pay attention to &lt;code&gt;"How"&lt;/code&gt; and &lt;code&gt;"are"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
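&lt;p&gt;A simplified single-head sketch of the attention computation (the real block uses several heads and learned projections to produce Q, K and V; here we reuse X for all three):&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each row of the weight matrix says how much one word attends to every other.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 512))   # X1, X2, X3
Z, weights = scaled_dot_product_attention(X, X, X)
print(Z.shape)  # (3, 512) -> contextual embeddings Z1, Z2, Z3
```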

&lt;h3&gt;
  
  
  ➕ Step 3: Residual Connection + Layer Normalisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmj5h73ryo1kso5g8yz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmj5h73ryo1kso5g8yz2.png" alt=" " width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add original input back:
&lt;code&gt;Z1' = Z1 + X1&lt;/code&gt;
&lt;code&gt;Z2' = Z2 + X2&lt;/code&gt;
&lt;code&gt;Z3' = Z3 + X3&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Apply &lt;strong&gt;Layer Normalisation&lt;/strong&gt; → &lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ This helps the model train better — keeps information flowing without getting lost.&lt;/p&gt;
&lt;/blockquote&gt;
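&lt;p&gt;A minimal sketch of the residual-plus-normalisation step (the real LayerNorm also has learned scale and shift parameters, omitted here):&lt;/p&gt;

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise each 512-number row to mean 0 and variance 1.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 512))   # original inputs X1, X2, X3
Z = rng.normal(size=(3, 512))   # attention outputs Z1, Z2, Z3
Z_norm = layer_norm(Z + X)      # residual connection, then LayerNorm
print(Z_norm.shape)  # (3, 512)
```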

&lt;h3&gt;
  
  
  🟣 Step 4: Feed Forward Neural Network (FFNN) Block
&lt;/h3&gt;

&lt;p&gt;This is where each word gets its own private “thinking room”:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx13hthmdodrcwedoe4qa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx13hthmdodrcwedoe4qa.png" alt=" " width="800" height="737"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  A. First Linear Layer + ReLU
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt; → &lt;code&gt;(3, 512)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Multiply by weight matrix &lt;code&gt;W1&lt;/code&gt; (size &lt;code&gt;512 × 2048&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add bias &lt;code&gt;B1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Apply &lt;strong&gt;ReLU&lt;/strong&gt; → adds non-linearity → output shape: &lt;code&gt;(3, 2048)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  B. Second Linear Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Multiply by weight matrix &lt;code&gt;W2&lt;/code&gt; (size &lt;code&gt;2048 × 512&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add bias &lt;code&gt;B2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;Y1&lt;/code&gt;, &lt;code&gt;Y2&lt;/code&gt;, &lt;code&gt;Y3&lt;/code&gt; → &lt;code&gt;(3, 512)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Think of this as a small brain for each word — refining its meaning after the group discussion.&lt;/p&gt;
&lt;/blockquote&gt;
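&lt;p&gt;Steps A and B as a NumPy sketch, with random matrices standing in for the learned weights &lt;code&gt;W1&lt;/code&gt; and &lt;code&gt;W2&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.02, np.zeros(d_model)

def feed_forward(x):
    # (3, 512) -> (3, 2048) -> ReLU -> (3, 512), applied to each word independently.
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU adds non-linearity
    return hidden @ W2 + b2

Y = feed_forward(rng.normal(size=(3, d_model)))
print(Y.shape)  # (3, 512)
```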

&lt;h3&gt;
  
  
  ➕ Step 5: Final Residual + Layer Normalisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1b9xht77acinqutpft2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1b9xht77acinqutpft2.png" alt=" " width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the input (&lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt;) back to the FFN output:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Y1' = Y1 + Z1norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Y2' = Y2 + Z2norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Y3' = Y3 + Z3norm&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Then apply &lt;strong&gt;Layer Normalisation&lt;/strong&gt; → &lt;code&gt;Y1norm&lt;/code&gt;, &lt;code&gt;Y2norm&lt;/code&gt;, &lt;code&gt;Y3norm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These become the &lt;strong&gt;final output of one Encoder block&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Important&lt;/strong&gt;: In the original &lt;em&gt;“Attention Is All You Need”&lt;/em&gt; paper, this entire Encoder block is &lt;strong&gt;repeated 6 times in a chain&lt;/strong&gt;:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input → Encoder 1 → Output 1 → Encoder 2 → Output 2 → Encoder 3 → Output 3 → Encoder 4 → Output 4 → Encoder 5 → Output 5 → Encoder 6 → Final Encoder Output
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Each encoder takes the output of the previous one as its input, building deeper and richer understanding at every stage.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧱 Inside the Decoder Input Block
&lt;/h2&gt;

&lt;p&gt;Now that the Encoder has finished its job, it’s time for the &lt;strong&gt;Decoder&lt;/strong&gt; to start writing the output sentence — but not quite yet. First, it needs its own special input.&lt;/p&gt;

&lt;p&gt;This is where the &lt;strong&gt;Decoder Input Block&lt;/strong&gt; comes in — and as the diagram shows, it’s almost identical to the Encoder Input Block… with one &lt;em&gt;very important twist&lt;/em&gt;: the &lt;strong&gt;Right Shift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvaqr8mkjbbob7gyh6n2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvaqr8mkjbbob7gyh6n2g.png" alt=" " width="800" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Decoder Input Block prepares the &lt;em&gt;target&lt;/em&gt; sentence (e.g., &lt;code&gt;"तुम कैसे हो"&lt;/code&gt;) so the Decoder can learn to generate it one word at a time — without cheating by looking ahead.&lt;br&gt;&lt;br&gt;
It does this by adding a &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; token and &lt;strong&gt;shifting everything right&lt;/strong&gt;, so each step only sees what came before.&lt;/p&gt;

&lt;h3&gt;
  
  
  ➡️ Step 1: Right Shift
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc23op26t199cmu5khk7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc23op26t199cmu5khk7q.png" alt=" " width="800" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with the target sentence: &lt;code&gt;"तुम कैसे हो"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add a special &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; token at the &lt;strong&gt;beginning&lt;/strong&gt;:
→ &lt;code&gt;"&amp;lt;start&amp;gt; तुम कैसे हो"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Prepending &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; &lt;strong&gt;shifts the entire sequence one position to the right&lt;/strong&gt;, so the decoder &lt;strong&gt;never sees the word it’s supposed to predict&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a new input sequence for the decoder:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Position&lt;/th&gt;
&lt;th&gt;1&lt;/th&gt;
&lt;th&gt;2&lt;/th&gt;
&lt;th&gt;3&lt;/th&gt;
&lt;th&gt;4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decoder Input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;तुम&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;कैसे&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;हो&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;तुम&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;कैसे&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;हो&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;end&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Why?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
During training, the Decoder uses this shifted input to predict the &lt;strong&gt;next word&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To predict &lt;code&gt;"तुम"&lt;/code&gt;, it only sees &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;To predict &lt;code&gt;"कैसे"&lt;/code&gt;, it sees &lt;code&gt;&amp;lt;start&amp;gt; + तुम&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;never sees "हो"&lt;/strong&gt; when predicting &lt;code&gt;"कैसे"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This forces the model to generate text &lt;strong&gt;causally&lt;/strong&gt;—just like writing a sentence from left to right.&lt;/p&gt;
&lt;/blockquote&gt;
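&lt;p&gt;The shift is easy to express in code; this tiny helper builds the (decoder input, target output) pair shown in the table above:&lt;/p&gt;

```python
def make_decoder_pairs(target_tokens):
    # Prepend <start> for the input and append <end> for the target,
    # so position i of the input predicts position i of the target.
    decoder_input = ["<start>"] + target_tokens
    target_output = target_tokens + ["<end>"]
    return decoder_input, target_output

dec_in, dec_out = make_decoder_pairs(["तुम", "कैसे", "हो"])
print(dec_in)   # ['<start>', 'तुम', 'कैसे', 'हो']
print(dec_out)  # ['तुम', 'कैसे', 'हो', '<end>']
```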

&lt;h3&gt;
  
  
  ➡️ Step 2: Tokenizer
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsw8hyt1a3ff6zygrf2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsw8hyt1a3ff6zygrf2j.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The shifted sequence (&lt;code&gt;&amp;lt;start&amp;gt; तुम कैसे हो&lt;/code&gt;) goes into the &lt;strong&gt;Tokenizer&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It breaks it into individual tokens:
→ &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;, &lt;code&gt;तुम&lt;/code&gt;, &lt;code&gt;कैसे&lt;/code&gt;, &lt;code&gt;हो&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ➡️ Step 3: Embedding (512 dim)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tdpr8sweom6siv4yevt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tdpr8sweom6siv4yevt.png" alt=" " width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each token gets turned into a &lt;strong&gt;512-number vector&lt;/strong&gt; via the &lt;strong&gt;Embedding&lt;/strong&gt; layer.&lt;/li&gt;
&lt;li&gt;These are called &lt;strong&gt;Word Embeddings&lt;/strong&gt;: &lt;code&gt;E1&lt;/code&gt;, &lt;code&gt;E2&lt;/code&gt;, &lt;code&gt;E3&lt;/code&gt;, &lt;code&gt;E4&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Example:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; → &lt;code&gt;E1&lt;/code&gt; = [0.1, -0.9, 0.3, ..., 0.7]
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;तुम&lt;/code&gt; → &lt;code&gt;E2&lt;/code&gt; = [0.8, 0.2, -0.6, ..., 0.1]
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;कैसे&lt;/code&gt; → &lt;code&gt;E3&lt;/code&gt; = [-0.4, 0.9, 0.5, ..., -0.3]
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;हो&lt;/code&gt; → &lt;code&gt;E4&lt;/code&gt; = [0.6, -0.1, 0.8, ..., 0.2]&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  ➡️ Step 4: Positional Embeddings
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixmddskrpu8wdq62g6bg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixmddskrpu8wdq62g6bg.png" alt=" " width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Just like the Encoder, each token also gets a &lt;strong&gt;Positional Embedding&lt;/strong&gt;: &lt;code&gt;P1&lt;/code&gt;, &lt;code&gt;P2&lt;/code&gt;, &lt;code&gt;P3&lt;/code&gt;, &lt;code&gt;P4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;These are precomputed (using sine/cosine waves) to tell the model the position of each token.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ➡️ Step 5: Add Them Together → Positional Encoded Vectors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;For each token, we &lt;strong&gt;add&lt;/strong&gt; its Word Embedding and Positional Embedding:&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X1 = E1 + P1&lt;/code&gt; → for &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X2 = E2 + P2&lt;/code&gt; → for &lt;code&gt;तुम&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X3 = E3 + P3&lt;/code&gt; → for &lt;code&gt;कैसे&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X4 = E4 + P4&lt;/code&gt; → for &lt;code&gt;हो&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These final vectors — &lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;, &lt;code&gt;X4&lt;/code&gt; — are called &lt;strong&gt;Positional Encoded Vectors&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each is still &lt;strong&gt;512 numbers&lt;/strong&gt; — but now they contain &lt;strong&gt;both meaning AND position&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 This is the magic: the Decoder now has all the info it needs to start generating — one word at a time, without peeking ahead!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  📌 What Happens Next?
&lt;/h3&gt;

&lt;p&gt;These &lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;, &lt;code&gt;X4&lt;/code&gt; vectors are now ready to enter the &lt;strong&gt;first Decoder Block&lt;/strong&gt; — where they’ll meet the Encoder’s “thought” through &lt;strong&gt;Cross-Attention&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 Inside the Decoder Block
&lt;/h2&gt;

&lt;p&gt;Now that we have our &lt;strong&gt;Positional Encoded Vectors&lt;/strong&gt; (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;, &lt;code&gt;X4&lt;/code&gt;) from the Decoder Input Block, they’re ready to enter the &lt;strong&gt;Decoder Block&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This block has &lt;strong&gt;three main parts&lt;/strong&gt;, stacked one after another:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Masked Self-Attention Block&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross Attention Block&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feed Forward Neural Network Block&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuggai9ax9zqizapr9449.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuggai9ax9zqizapr9449.png" alt=" " width="800" height="933"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And just like the Encoder, this whole structure is repeated &lt;strong&gt;6 times&lt;/strong&gt; (Decoder 1 → Decoder 6).&lt;/p&gt;

&lt;p&gt;Let’s walk through one full Decoder block — from bottom to top — using a detailed diagram.&lt;/p&gt;

&lt;h3&gt;
  
  
  ➡️ Step 1: Input — Positional Encoded Vectors (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;, &lt;code&gt;X4&lt;/code&gt;)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Input shape: &lt;code&gt;(4, 512)&lt;/code&gt; → 4 words (including &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;), each as a 512-number vector.&lt;/li&gt;
&lt;li&gt;These come from the &lt;strong&gt;Decoder Input Block&lt;/strong&gt; (after adding Word + Positional Embeddings).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟢 Step 2: Masked Multi-Head Attention
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each word (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, &lt;code&gt;X3&lt;/code&gt;, &lt;code&gt;X4&lt;/code&gt;) looks at &lt;strong&gt;all previous words&lt;/strong&gt; — but &lt;strong&gt;not future ones&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Why? Because during training, the model must predict the next word without seeing it!&lt;/li&gt;
&lt;li&gt;This is called &lt;strong&gt;Masked Self-Attention&lt;/strong&gt; — the “mask” blocks out future positions.&lt;/li&gt;
&lt;li&gt;Output: &lt;strong&gt;Contextual Embeddings&lt;/strong&gt; → &lt;code&gt;Z1&lt;/code&gt;, &lt;code&gt;Z2&lt;/code&gt;, &lt;code&gt;Z3&lt;/code&gt;, &lt;code&gt;Z4&lt;/code&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2dfacxk7q8iul9zrbjc.png" alt=" " width="800" height="249"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Example:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When predicting &lt;code&gt;"कैसे"&lt;/code&gt;, it can see &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt; and &lt;code&gt;तुम&lt;/code&gt; — but not &lt;code&gt;हो&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
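&lt;p&gt;The masking trick above can be sketched in NumPy. This is a toy single-head version where, purely for illustration, the learned Query/Key/Value projection matrices are skipped and the inputs are used directly:&lt;/p&gt;

```python
import numpy as np

def masked_self_attention(X):
    # Toy single-head masked self-attention (sketch: the learned
    # Q/K/V projection matrices are omitted, X is used directly).
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                 # (n, n) similarity scores
    future = np.triu(np.ones((n, n)), k=1)        # 1s mark future positions
    scores = np.where(future == 1, -1e9, scores)  # the "mask": block the future
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X, weights                   # contextual Z1..Z4, attention map

X = np.random.randn(4, 512)          # the 4 decoder inputs X1..X4
Z, W = masked_self_attention(X)
print(Z.shape)   # (4, 512)
```

Each row of `W` attends only to itself and earlier positions — every entry above the diagonal ends up at zero.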

&lt;h3&gt;
  
  
  ➕ Step 3: Residual Connection + Layer Normalisation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add original input back:
&lt;code&gt;Z1' = Z1 + X1&lt;/code&gt;
&lt;code&gt;Z2' = Z2 + X2&lt;/code&gt;
&lt;code&gt;Z3' = Z3 + X3&lt;/code&gt;
&lt;code&gt;Z4' = Z4 + X4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Apply &lt;strong&gt;Layer Normalisation&lt;/strong&gt; → &lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt;, &lt;code&gt;Z4norm&lt;/code&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv0ge552rsbv6jcvk3i6.png" alt=" " width="800" height="398"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ This helps the model train better — keeps information flowing without getting lost.&lt;/p&gt;
&lt;/blockquote&gt;
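&lt;p&gt;As a quick sketch (assuming plain layer normalisation, without the learned gain and bias parameters the real layer also carries), the add-and-norm step looks like this:&lt;/p&gt;

```python
import numpy as np

def add_and_norm(Z, X, eps=1e-6):
    # Residual connection: Z1' = Z1 + X1, ... then layer normalisation
    # applied independently to each word vector.
    Zres = Z + X
    mean = Zres.mean(axis=-1, keepdims=True)
    std = Zres.std(axis=-1, keepdims=True)
    return (Zres - mean) / (std + eps)     # Z1norm .. Z4norm

Znorm = add_and_norm(np.random.randn(4, 512), np.random.randn(4, 512))
print(Znorm.shape)   # (4, 512)
```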

&lt;h3&gt;
  
  
  🟠 Step 4: Cross Attention
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens — the &lt;strong&gt;Decoder talks to the Encoder&lt;/strong&gt;!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbe9pptmk76hsi80gdr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbe9pptmk76hsi80gdr1.png" alt=" " width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Decoder takes its own normalized vectors (&lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt;, &lt;code&gt;Z4norm&lt;/code&gt;) as &lt;strong&gt;Queries&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It uses the &lt;strong&gt;Encoder’s final output&lt;/strong&gt; (from Encoder 6) as &lt;strong&gt;Keys and Values&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;This lets the Decoder focus on the most relevant parts of the input sentence.

&lt;ul&gt;
&lt;li&gt;For example: when generating &lt;code&gt;"हो"&lt;/code&gt;, it might look back at the Encoder’s understanding of &lt;code&gt;"you"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Output: &lt;strong&gt;Cross-Attention Embeddings&lt;/strong&gt; → &lt;code&gt;Zc1&lt;/code&gt;, &lt;code&gt;Zc2&lt;/code&gt;, &lt;code&gt;Zc3&lt;/code&gt;, &lt;code&gt;Zc4&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Think of this as the Decoder asking: &lt;em&gt;“Hey Encoder — what part of the English sentence should I focus on right now?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
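&lt;p&gt;A minimal NumPy sketch of that Query/Key/Value flow (again with the learned projection matrices omitted for brevity): the Queries come from the Decoder, while the Keys and Values come from the Encoder's final output.&lt;/p&gt;

```python
import numpy as np

def cross_attention(Znorm, enc_out):
    # Queries from the Decoder, Keys/Values from the Encoder.
    # (Sketch: the learned Wq/Wk/Wv projections are omitted.)
    d = Znorm.shape[-1]
    scores = Znorm @ enc_out.T / np.sqrt(d)       # (4, 3): each Hindi position attends over English words
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)   # softmax over Encoder positions
    return weights @ enc_out                      # Zc1 .. Zc4

dec = np.random.randn(4, 512)   # Z1norm .. Z4norm from the Decoder
enc = np.random.randn(3, 512)   # Encoder 6 output for "How are you"
Zc = cross_attention(dec, enc)
print(Zc.shape)   # (4, 512)
```

Note the shapes: 4 decoder positions each produce a weighted mix of the 3 encoder vectors, so the output stays `(4, 512)`.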

&lt;h3&gt;
  
  
  ➕ Step 5: Residual Connection + Layer Normalisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4d2p9fzr0glgj1wbuu0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4d2p9fzr0glgj1wbuu0.png" alt=" " width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the input (&lt;code&gt;Z1norm&lt;/code&gt;, &lt;code&gt;Z2norm&lt;/code&gt;, &lt;code&gt;Z3norm&lt;/code&gt;, &lt;code&gt;Z4norm&lt;/code&gt;) back to the cross-attention output:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Zc1' = Zc1 + Z1norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Zc2' = Zc2 + Z2norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Zc3' = Zc3 + Z3norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Zc4' = Zc4 + Z4norm&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply &lt;strong&gt;Layer Normalisation&lt;/strong&gt; → &lt;code&gt;Zc1norm&lt;/code&gt;, &lt;code&gt;Zc2norm&lt;/code&gt;, &lt;code&gt;Zc3norm&lt;/code&gt;, &lt;code&gt;Zc4norm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟣 Step 6: Feed Forward Neural Network (FFNN) Block
&lt;/h3&gt;

&lt;p&gt;This is where each word gets its own private “thinking room” — same as in the Encoder:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjiqa3z5x9slltpuix16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjiqa3z5x9slltpuix16.png" alt=" " width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  A. First Linear Layer + ReLU
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;code&gt;Zc1norm&lt;/code&gt;, &lt;code&gt;Zc2norm&lt;/code&gt;, &lt;code&gt;Zc3norm&lt;/code&gt;, &lt;code&gt;Zc4norm&lt;/code&gt; → &lt;code&gt;(4, 512)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Multiply by weight matrix &lt;code&gt;W1&lt;/code&gt; (size &lt;code&gt;512 × 2048&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add bias &lt;code&gt;B1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Apply &lt;strong&gt;ReLU&lt;/strong&gt; → adds non-linearity → output shape: &lt;code&gt;(4, 2048)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  B. Second Linear Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Multiply by weight matrix &lt;code&gt;W2&lt;/code&gt; (size &lt;code&gt;2048 × 512&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add bias &lt;code&gt;B2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;Y1&lt;/code&gt;, &lt;code&gt;Y2&lt;/code&gt;, &lt;code&gt;Y3&lt;/code&gt;, &lt;code&gt;Y4&lt;/code&gt; → &lt;code&gt;(4, 512)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Think of this as refining each word’s meaning after listening to both itself (self-attention) and the Encoder (cross-attention).&lt;/p&gt;
&lt;/blockquote&gt;
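&lt;p&gt;The two layers above amount to just a few matrix operations. Here is a sketch with randomly initialised weights standing in for the learned ones:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.02 * rng.normal(size=(512, 2048)), np.zeros(2048)   # expand 512 -> 2048
W2, b2 = 0.02 * rng.normal(size=(2048, 512)), np.zeros(512)    # project 2048 -> 512

def ffn(Zc_norm):
    # Position-wise feed-forward: each word vector is processed independently.
    h = np.maximum(0.0, Zc_norm @ W1 + b1)    # ReLU, shape (4, 2048)
    return h @ W2 + b2                        # Y1 .. Y4, shape (4, 512)

Y = ffn(rng.normal(size=(4, 512)))
print(Y.shape)   # (4, 512)
```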

&lt;h3&gt;
  
  
  ➕ Step 7: Final Residual + Layer Normalisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfbzj0h5hm3c2i80p8z4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfbzj0h5hm3c2i80p8z4.png" alt=" " width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the input (&lt;code&gt;Zc1norm&lt;/code&gt;, &lt;code&gt;Zc2norm&lt;/code&gt;, &lt;code&gt;Zc3norm&lt;/code&gt;, &lt;code&gt;Zc4norm&lt;/code&gt;) back to the FFN output:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Y1' = Y1 + Zc1norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Y2' = Y2 + Zc2norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Y3' = Y3 + Zc3norm&lt;/code&gt;&lt;br&gt;
 &lt;code&gt;Y4' = Y4 + Zc4norm&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply &lt;strong&gt;Layer Normalisation&lt;/strong&gt; → &lt;code&gt;Y1norm&lt;/code&gt;, &lt;code&gt;Y2norm&lt;/code&gt;, &lt;code&gt;Y3norm&lt;/code&gt;, &lt;code&gt;Y4norm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These become the &lt;strong&gt;final output of one Decoder block&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 Repeat 6 Times
&lt;/h3&gt;

&lt;p&gt;This entire process — Masked Self-Attention → Residual → Norm → Cross-Attention → Residual → Norm → FFN → Residual → Norm — happens &lt;strong&gt;6 times in a row&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After Decoder 6, the model has a &lt;strong&gt;rich, context-aware understanding&lt;/strong&gt; of what to generate next — ready for the &lt;strong&gt;Final Output Block&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Final Output Block: Turning Numbers into Words
&lt;/h2&gt;

&lt;p&gt;After the last Decoder block (Decoder 6), we have four final vectors: &lt;code&gt;Y1fnorm&lt;/code&gt;, &lt;code&gt;Y2fnorm&lt;/code&gt;, &lt;code&gt;Y3fnorm&lt;/code&gt;, &lt;code&gt;Y4fnorm&lt;/code&gt; — each of shape &lt;code&gt;(512,)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d3hp9yupbgubyr3nzxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d3hp9yupbgubyr3nzxx.png" alt=" " width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These vectors are the model’s “best guess” for what each word in the output sentence should be. But they’re still just numbers. To turn them into actual words like &lt;code&gt;"तुम"&lt;/code&gt;, &lt;code&gt;"कैसे"&lt;/code&gt;, &lt;code&gt;"हो"&lt;/code&gt;, and &lt;code&gt;&amp;lt;end&amp;gt;&lt;/code&gt;, we need the &lt;strong&gt;Final Output Block&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This block is &lt;strong&gt;applied once for each output position&lt;/strong&gt; — the same linear + softmax layers are reused for all 4 words.&lt;/p&gt;

&lt;p&gt;Let’s walk through the &lt;strong&gt;first block&lt;/strong&gt; — the one that predicts the very first word: &lt;code&gt;"तुम"&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoz1zq2gv09od5tr5d4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoz1zq2gv09od5tr5d4u.png" alt=" " width="800" height="882"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ➡️ Step 1: Input — &lt;code&gt;Y1fnorm&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;This is the final vector from Decoder 6 for the first position.&lt;/li&gt;
&lt;li&gt;Shape: &lt;code&gt;(512,)&lt;/code&gt; → 512 numbers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟣 Step 2: Linear Layer (512 → V)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The vector goes into a &lt;strong&gt;linear layer&lt;/strong&gt; with weights of size &lt;code&gt;512 × V&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V&lt;/code&gt; = number of unique words in the output vocabulary (e.g., all Hindi words + &lt;code&gt;&amp;lt;start&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;end&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;V values&lt;/code&gt; — one score for every possible word.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Think of this as a giant lookup table: it asks, &lt;em&gt;“Given these 512 numbers, how likely is each word to be the next one?”&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frk8r6mpf9tfq26oqhrh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frk8r6mpf9tfq26oqhrh2.png" alt=" " width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  🟠 Step 3: Softmax
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;V values&lt;/code&gt; go through a &lt;strong&gt;softmax&lt;/strong&gt; function.&lt;/li&gt;
&lt;li&gt;This turns the scores into &lt;strong&gt;probabilities&lt;/strong&gt; — adding up to 1.0.&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;V probability values&lt;/code&gt; — e.g., 90% chance of &lt;code&gt;"तुम"&lt;/code&gt;, 5% of &lt;code&gt;"कैसे"&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🟢 Step 4: Normalisation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cirl48in7yvzjxfygun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cirl48in7yvzjxfygun.png" alt=" " width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Normalisation&lt;/strong&gt; step ensures the scores are well-scaled probabilities that sum to 1.&lt;/li&gt;
&lt;li&gt;In practice, the softmax already does this normalisation — it isn’t a separate learned layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🎯 Step 5: Return Highest Probability Value
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlxf4gyqdgmwyeh09uxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlxf4gyqdgmwyeh09uxv.png" alt=" " width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model picks the word with the &lt;strong&gt;highest probability&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For the first position → it picks &lt;code&gt;"तुम"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ This is how the Transformer generates its first word!&lt;/p&gt;
&lt;/blockquote&gt;
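&lt;p&gt;Steps 2–5 condense into just a few lines. The sketch below uses a toy vocabulary size and random weights for the linear layer (the real &lt;code&gt;V&lt;/code&gt; would be tens of thousands of words, and the weights would be learned):&lt;/p&gt;

```python
import numpy as np

V = 10_000                                  # toy vocabulary size (assumption)
rng = np.random.default_rng(0)
W_out = 0.02 * rng.normal(size=(512, V))    # the 512 x V linear layer

def predict_word(y_fnorm):
    # Linear layer: one score per vocabulary word
    logits = y_fnorm @ W_out
    # Softmax: scores become probabilities summing to 1
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    # Greedy pick: return the highest-probability word index
    return int(probs.argmax()), probs

idx, probs = predict_word(rng.normal(size=512))
print(round(probs.sum(), 6))   # 1.0
```

In a real model, `idx` would be looked up in the vocabulary to recover the actual word, e.g. `"तुम"`.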

&lt;h3&gt;
  
  
  🔁 Repeat for All Positions
&lt;/h3&gt;

&lt;p&gt;The same process happens for the other three positions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Position 2 → &lt;code&gt;Y2fnorm&lt;/code&gt; → predicts &lt;code&gt;"कैसे"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Position 3 → &lt;code&gt;Y3fnorm&lt;/code&gt; → predicts &lt;code&gt;"हो"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Position 4 → &lt;code&gt;Y4fnorm&lt;/code&gt; → predicts &lt;code&gt;&amp;lt;end&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each block is identical — only the input vector (&lt;code&gt;Y1fnorm&lt;/code&gt;, &lt;code&gt;Y2fnorm&lt;/code&gt;, etc.) changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  📌 Why This Works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The model doesn’t generate all words at once — it does one at a time.&lt;/li&gt;
&lt;li&gt;Each prediction is based on the full context built by the Encoder and Decoder.&lt;/li&gt;
&lt;li&gt;The final linear + softmax layer is like a “vocabulary selector” — turning abstract numbers into real words.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the &lt;strong&gt;final step&lt;/strong&gt; in the Transformer — where numbers become language!&lt;/p&gt;




&lt;h2&gt;
  
  
  🌟 Conclusion: The Transformer, Demystified
&lt;/h2&gt;

&lt;p&gt;You’ve just walked through the entire Transformer — from raw words to fluent translation — one block at a time.&lt;/p&gt;

&lt;p&gt;You can view the complete diagram here: &lt;a href="https://drive.google.com/file/d/1lz68fKBnUtsqi9_9q_7J6MrikSu2oA8e/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1lz68fKBnUtsqi9_9q_7J6MrikSu2oA8e/view?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No magic. No mystery. Just &lt;strong&gt;smart design&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attention&lt;/strong&gt; that sees relationships,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positional codes&lt;/strong&gt; that preserve order,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Residual connections&lt;/strong&gt; that keep learning stable,&lt;/li&gt;
&lt;li&gt;And &lt;strong&gt;parallel processing&lt;/strong&gt; that makes it fast.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What started as a sentence — &lt;em&gt;"How are you?"&lt;/em&gt; — became numbers, then context, then meaning, and finally: &lt;em&gt;"तुम कैसे हो?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And the best part?&lt;br&gt;&lt;br&gt;
&lt;strong&gt;You now understand how it works — not just at a high level, but deep down to the vectors, layers, and shapes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Transformer isn’t just a model. It’s the foundation of modern AI — from translation and chatbots to code generation and beyond.&lt;/p&gt;

&lt;p&gt;And you?&lt;br&gt;&lt;br&gt;
You didn’t just read about it.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;You followed the data all the way through.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go ahead — share what you’ve learned.&lt;br&gt;&lt;br&gt;
Because now, you truly &lt;em&gt;see&lt;/em&gt; the machine behind the magic. 💫&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>A Beginner's GAN Adventure with Digits</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Tue, 07 Oct 2025 20:52:23 +0000</pubDate>
      <link>https://forem.com/anikchand461/a-beginners-gan-adventure-with-digits-54g4</link>
      <guid>https://forem.com/anikchand461/a-beginners-gan-adventure-with-digits-54g4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;It was a rainy afternoon in September. The view from my window was all gray and &lt;em&gt;blurry&lt;/em&gt;. That reminded me of the first images my GAN made – just fuzzy chaos on the screen. 🌧️&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'd been learning Deep Learning for months. I wanted to create something new with code. Not just numbers, but something that &lt;em&gt;looks real&lt;/em&gt;.  &lt;/p&gt;

&lt;p&gt;So, I tried &lt;strong&gt;Generative Adversarial Networks&lt;/strong&gt;. I built it from scratch with &lt;strong&gt;Keras&lt;/strong&gt; and &lt;strong&gt;TensorFlow&lt;/strong&gt;, taking help from ChatGPT. 💻&lt;/p&gt;

&lt;p&gt;The dataset? &lt;strong&gt;MNIST&lt;/strong&gt;. It has 70,000 handwritten digits. They look like quick notes from people long ago – scribbled 7s and curvy 8s. 📝&lt;/p&gt;

&lt;p&gt;Think of two neural networks working &lt;em&gt;against each other&lt;/em&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Generator&lt;/strong&gt; takes random noise – just a bunch of random numbers. It tries to turn that into a digit image. At first, it makes blurry shapes. You might guess it's a 3... or something else. 🔄
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Discriminator&lt;/strong&gt; checks the images. It knows real MNIST images. It says &lt;em&gt;"fake"&lt;/em&gt; to the bad ones and &lt;em&gt;"real"&lt;/em&gt; to the good ones. 👮‍♂️
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxwilx7exvjy0eb0kckc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxwilx7exvjy0eb0kckc.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;Source: &lt;a href="https://dzone.com/articles/working-principles-of-generative-adversarial-netwo" rel="noopener noreferrer"&gt;DZone Article on GAN Principles&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Think of GANs like a counterfeiter (Generator) trying to make fake money that fools the police (Discriminator). The bank provides real money for training. Over time, the fakes get better! 💸&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Architecture of a GAN network:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmqobuggadvxtsu07iqa.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmqobuggadvxtsu07iqa.jpeg" alt=" " width="800" height="271"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;Source: &lt;a href="https://jonathan-hui.medium.com/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b" rel="noopener noreferrer"&gt;Jonathan Hui on Medium&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The cool part? They learn &lt;em&gt;together&lt;/em&gt;. The Generator gets better at tricking the Discriminator. The Discriminator gets better at spotting fakes. After many training steps, the fake digits look real. Like they were drawn by hand. 🎨  &lt;/p&gt;

&lt;p&gt;I did this project as an experiment. I wanted to understand how GANs work. It was fun to see it come together. Next, I plan to use this to make &lt;em&gt;realistic human faces&lt;/em&gt;. 😎  &lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Networks 🛠️
&lt;/h2&gt;

&lt;p&gt;I used a Jupyter notebook to build it. The Generator starts with 100 random numbers. It uses dense layers and LeakyReLU to shape them. Then it turns them into a 28x28 image. It's like building a picture from nothing. 🖼️&lt;/p&gt;

&lt;p&gt;The Discriminator takes 28x28 images. It uses Conv2D layers to look for patterns. It ends with a yes or no: real or fake.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parts&lt;/th&gt;
&lt;th&gt;Generator&lt;/th&gt;
&lt;th&gt;Discriminator&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 random numbers&lt;/td&gt;
&lt;td&gt;28x28 image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Main Layers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense then reshape&lt;/td&gt;
&lt;td&gt;Conv2D then flatten&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Activations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LeakyReLU, Tanh&lt;/td&gt;
&lt;td&gt;LeakyReLU, Sigmoid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fake image&lt;/td&gt;
&lt;td&gt;Real (1) or Fake (0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
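&lt;p&gt;Here's a compact Keras sketch of the two networks from the table above. It follows standard DCGAN conventions — the exact kernel sizes and dropout rates are assumptions for illustration, not a copy of my notebook:&lt;/p&gt;

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator():
    # 100 noise numbers -> Dense -> reshape -> upsample to a 28x28 image
    return tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        layers.Dense(7 * 7 * 256),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, 5, strides=1, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(1, 5, strides=2, padding="same",
                               activation="tanh"),   # fake 28x28 image out
    ])

def build_discriminator():
    # 28x28 image -> Conv2D features -> single real/fake probability
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, 5, strides=2, padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),       # real (1) or fake (0)
    ])

print(build_generator().output_shape)      # (None, 28, 28, 1)
print(build_discriminator().output_shape)  # (None, 1)
```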

&lt;p&gt;At the end of this blog, I've explained more about the Discriminator and Generator networks I used in my code.&lt;/p&gt;

&lt;p&gt;Training took time on my GPU. I saved the models so I could start again if needed. I also saved images every few steps to see progress. Later, I made a GIF from them. And I zipped all the images together. ⏳&lt;/p&gt;




&lt;h2&gt;
  
  
  Seeing the Digits Improve 📈
&lt;/h2&gt;

&lt;p&gt;Around epoch 5, I saw a shaky 4 appear. That was exciting. By epoch 25, the digits looked good. 1s had straight lines. 9s had curves. At epoch 100, the Discriminator could hardly tell the fakes from the real images. That meant success.&lt;/p&gt;

&lt;p&gt;Here are some images from training:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pure Noise&lt;/th&gt;
&lt;th&gt;Epoch 1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dazbw679f6zxrjo50wm.png" alt="Pure Noise" width="369" height="369"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjwa48mv4t5l49bg3b6f.png" alt="Epoch 1" width="400" height="400"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;From total randomness to the first blurry hints of digits – like fog lifting just a bit.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Epoch 2&lt;/th&gt;
&lt;th&gt;Epoch 3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6idv0edw87dz5ukjjer.png" alt="Epoch 2" width="400" height="400"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtcq5f8x02wlemife8la.png" alt="Epoch 3" width="400" height="400"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Faint shapes emerging... is that a 6? The Generator is starting to get the idea.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Epoch 9&lt;/th&gt;
&lt;th&gt;Epoch 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnea0em5d56mw5ewkjuh7.png" alt="Epoch 9" width="400" height="400"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wvd4ko62hvprghj5unf.png" alt="Epoch 17" width="400" height="400"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Things are sharpening up! Each digit feels a little more like real handwriting, with its own quirks.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Epoch 25&lt;/th&gt;
&lt;th&gt;Epoch 50&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9d6zdh118znr3b8k662.png" alt="Epoch 25" width="400" height="400"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ky91fwn23k6ft9rfvva.png" alt="Epoch 50" width="400" height="400"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Confidence building – straight lines for 1s, smooth curves for 9s. The duel is heating up.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Epoch 75&lt;/th&gt;
&lt;th&gt;Epoch 100&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb8eusx29y6mi5kc3mwp.png" alt="Epoch 75" width="400" height="400"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5lazljql0gjbs6ezkvo.png" alt="Epoch 100" width="400" height="400"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Nearly there! These fakes could pass for real MNIST scribbles. Magic in the making. ✨&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is a GIF of 16 digits improving over time.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdoynmm4or7r8iy6i3r9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdoynmm4or7r8iy6i3r9.gif" alt="gif" width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It shows how random noise turns into clear digits in 10 seconds.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  GAN Architecture in My Project 🔧
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Discriminator Architecture
&lt;/h3&gt;

&lt;p&gt;The Discriminator network processes 28x28 grayscale images and outputs a probability (0 for fake, 1 for real). Here's the layer-by-layer breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Input Shape&lt;/th&gt;
&lt;th&gt;Output Shape&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;conv2d_6&lt;/td&gt;
&lt;td&gt;Conv2D&lt;/td&gt;
&lt;td&gt;(None, 28, 28, 1)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaky_re_lu_30&lt;/td&gt;
&lt;td&gt;LeakyReLU&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dropout_6&lt;/td&gt;
&lt;td&gt;Dropout&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;conv2d_7&lt;/td&gt;
&lt;td&gt;Conv2D&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaky_re_lu_31&lt;/td&gt;
&lt;td&gt;LeakyReLU&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dropout_7&lt;/td&gt;
&lt;td&gt;Dropout&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;flatten_3&lt;/td&gt;
&lt;td&gt;Flatten&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 6272)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense_11&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;(None, 6272)&lt;/td&gt;
&lt;td&gt;(None, 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This architecture uses convolutional layers for feature extraction, dropout for regularization to prevent overfitting, and LeakyReLU activations to maintain gradient flow.&lt;/p&gt;
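
&lt;p&gt;You can sanity-check the spatial sizes in the table with a few lines of plain Python. Assuming stride-2 convolutions with &lt;code&gt;same&lt;/code&gt; padding (which is what the 28 → 14 → 7 progression implies), each Conv2D halves the spatial dimensions:&lt;/p&gt;

```python
import math

def conv_same_out(size, stride=2):
    """Spatial output size of a strided Conv2D with 'same' padding."""
    return math.ceil(size / stride)

side = 28
side = conv_same_out(side)      # conv2d_6: 28x28 -> 14x14
assert side == 14
side = conv_same_out(side)      # conv2d_7: 14x14 -> 7x7
assert side == 7

# flatten_3: 7 * 7 * 128 feature values -> 6272 inputs to dense_11
assert 7 * 7 * 128 == 6272
print("discriminator shapes check out")
```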

&lt;h3&gt;
  
  
  Generator Architecture
&lt;/h3&gt;

&lt;p&gt;The Generator network starts with random noise input and progressively upsamples it to produce 28x28 grayscale digit images. Here's the layer-by-layer breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Input Shape&lt;/th&gt;
&lt;th&gt;Output Shape&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;dense_9&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;(None, 100)&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;batch_normalization_18&lt;/td&gt;
&lt;td&gt;BatchNormalization&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaky_re_lu_24&lt;/td&gt;
&lt;td&gt;LeakyReLU&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reshape_6&lt;/td&gt;
&lt;td&gt;Reshape&lt;/td&gt;
&lt;td&gt;(None, 12544)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 256)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;conv2d_transpose_18&lt;/td&gt;
&lt;td&gt;Conv2DTranspose&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 256)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;batch_normalization_19&lt;/td&gt;
&lt;td&gt;BatchNormalization&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaky_re_lu_25&lt;/td&gt;
&lt;td&gt;LeakyReLU&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;conv2d_transpose_19&lt;/td&gt;
&lt;td&gt;Conv2DTranspose&lt;/td&gt;
&lt;td&gt;(None, 7, 7, 128)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;batch_normalization_20&lt;/td&gt;
&lt;td&gt;BatchNormalization&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaky_re_lu_26&lt;/td&gt;
&lt;td&gt;LeakyReLU&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;conv2d_transpose_20&lt;/td&gt;
&lt;td&gt;Conv2DTranspose&lt;/td&gt;
&lt;td&gt;(None, 14, 14, 64)&lt;/td&gt;
&lt;td&gt;(None, 28, 28, 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This architecture uses transposed convolutions for upsampling, batch normalization for stability, and LeakyReLU activations to prevent vanishing gradients.&lt;/p&gt;
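
&lt;p&gt;The same arithmetic runs in reverse here: the Dense layer has to emit exactly 7 × 7 × 256 = 12544 values for the Reshape to work, and each stride-2 &lt;code&gt;Conv2DTranspose&lt;/code&gt; with &lt;code&gt;same&lt;/code&gt; padding doubles the spatial size (the first transposed convolution keeps 7×7, so it uses stride 1; these strides are what the shapes imply):&lt;/p&gt;

```python
# dense_9 must emit 7 * 7 * 256 values so reshape_6 can fold them
# into a (7, 7, 256) feature map.
assert 7 * 7 * 256 == 12544

# conv2d_transpose_18 keeps 7x7 (stride 1); the two stride-2 transposed
# convolutions then double the spatial size each time.
side = 7
side *= 2        # conv2d_transpose_19: 7x7 -> 14x14
side *= 2        # conv2d_transpose_20: 14x14 -> 28x28
assert side == 28
print("generator shapes check out")
```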




&lt;p&gt;When the rain stopped, I looked back through all the generated images. This project taught me a lot about GANs, and it showed how trial and error can lead to something genuinely new. Next, I want better ways to measure quality, like FID scores, deeper networks, and maybe color images from CIFAR-10.&lt;/p&gt;

&lt;p&gt;I also want to build Conditional GANs, so I can generate a specific digit on demand, like just 7s.&lt;/p&gt;

&lt;p&gt;Check out the code on &lt;a href="https://github.com/anikchand461/GAN-MNIST" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and try it yourself. What GAN projects have you built? Tell me in the comments.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>beginners</category>
      <category>python</category>
    </item>
    <item>
      <title>Sentiment Analysis, the Classical Way — No Deep Learning</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Wed, 06 Aug 2025 14:12:30 +0000</pubDate>
      <link>https://forem.com/anikchand461/sentiment-analysis-the-classical-way-no-deep-learning-3lc6</link>
      <guid>https://forem.com/anikchand461/sentiment-analysis-the-classical-way-no-deep-learning-3lc6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Can you get high accuracy in sentiment analysis without touching deep learning?”&lt;/em&gt;&lt;br&gt;
That's the question that sparked my curiosity — and led to a project that amazed me with its results.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this blog, I’ll walk you through my journey of building a sentiment analysis system &lt;strong&gt;using only Machine Learning&lt;/strong&gt; — no neural networks, no transformers, just classic ML — and still achieving an impressive &lt;strong&gt;accuracy of 89.12%&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 Problem Statement
&lt;/h2&gt;

&lt;p&gt;The goal: &lt;strong&gt;Sentiment Analysis&lt;/strong&gt; — classifying text as either positive or negative.&lt;/p&gt;

&lt;p&gt;Rather than relying on RNNs, LSTMs, or BERT, I challenged myself to stay within the boundaries of &lt;strong&gt;classical machine learning algorithms&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Dataset
&lt;/h2&gt;

&lt;p&gt;I used the &lt;strong&gt;IMDb movie reviews dataset&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;40,000 training reviews&lt;/li&gt;
&lt;li&gt;10,000 testing reviews&lt;/li&gt;
&lt;li&gt;Binary labels: positive / negative&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧹 Preprocessing Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Lowercasing&lt;/strong&gt; text&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Removing HTML tags&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cleaning punctuation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokenization&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stopword removal&lt;/strong&gt; (using NLTK)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stemming&lt;/strong&gt; with Porter Stemmer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectorization&lt;/strong&gt; using &lt;strong&gt;TF-IDF&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
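
&lt;p&gt;Here's a minimal sketch of steps 1–4 using only the standard library (the NLTK stopword removal, Porter stemming, and TF-IDF from steps 5–7 are left out, and the exact cleaning rules in my project differ slightly):&lt;/p&gt;

```python
import string
from html.parser import HTMLParser

# Minimal sketch of preprocessing steps 1-4. The real pipeline also
# removes NLTK stopwords and applies Porter stemming (steps 5-6).

class TagStripper(HTMLParser):
    """Keeps only the text content of an HTML fragment (step 2)."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def preprocess(review):
    stripper = TagStripper()
    stripper.feed(review.lower())          # step 1: lowercase
    text = " ".join(stripper.parts)        # step 2: HTML tags dropped
    text = text.translate(str.maketrans("", "", string.punctuation))  # step 3
    return text.split()                    # step 4: tokenize

print(preprocess("This movie was ASTONISHING... truly great!"))
# ['this', 'movie', 'was', 'astonishing', 'truly', 'great']
```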




&lt;h2&gt;
  
  
  ⚙️ Baseline Model: GaussianNB
&lt;/h2&gt;

&lt;p&gt;To begin, I tested the simplest model possible: &lt;strong&gt;Gaussian Naive Bayes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;📈 &lt;strong&gt;Accuracy&lt;/strong&gt;: &lt;code&gt;82.00%&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This gave me a quick baseline — but I knew I could push further.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Model Experiments
&lt;/h2&gt;

&lt;p&gt;I tested multiple models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.naive_bayes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MultinomialNB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BernoulliNB&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RidgeClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SGDClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PassiveAggressiveClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.svm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearSVC&lt;/span&gt;

&lt;span class="n"&gt;clf2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultinomialNB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BernoulliNB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;saga&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf5&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearSVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf6&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SGDClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;log_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf7&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RidgeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf8&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PassiveAggressiveClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;early_stopping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The four best accuracies:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlh6epnyiprxu1bi88o4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlh6epnyiprxu1bi88o4.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🧪 GridSearchCV for Tuning
&lt;/h2&gt;

&lt;p&gt;I applied &lt;strong&gt;GridSearchCV&lt;/strong&gt; on the top 4 models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Logistic Regression
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;solver&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;saga&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_iter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="c1"&gt;# Best: {'C': 0.1, 'solver': 'saga'} =&amp;gt; 88.7%
&lt;/span&gt;
&lt;span class="c1"&gt;# LinearSVC
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_iter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="c1"&gt;# Best: {'C': 0.01} =&amp;gt; 88.645%
&lt;/span&gt;
&lt;span class="c1"&gt;# MultinomialNB
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alpha&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="c1"&gt;# Best: {'alpha': 1.0} =&amp;gt; 85.37%
&lt;/span&gt;
&lt;span class="c1"&gt;# SGDClassifier
&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alpha&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hinge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;log_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;penalty&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;l2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;l1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;elasticnet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_iter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;# Best: {'alpha': 0.001, 'loss': 'log_loss', 'penalty': 'l2'} =&amp;gt; 88.75%
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧪 Final Accuracy after Retraining
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn77fxf9rcbbipt5y4fev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn77fxf9rcbbipt5y4fev.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Stacking Ensemble
&lt;/h2&gt;

&lt;p&gt;To boost performance further, I implemented &lt;strong&gt;stacking&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base Models&lt;/strong&gt;: SGD, LogisticRegression, LinearSVC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta Model&lt;/strong&gt;: LogisticRegression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📈 &lt;strong&gt;Stacking Accuracy&lt;/strong&gt;: &lt;code&gt;89.12%&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi64c1bzr1xwf9vkf02po.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi64c1bzr1xwf9vkf02po.png" alt=" " width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Tried Deep Stacking
&lt;/h2&gt;

&lt;p&gt;I also tried multi-layer stacking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1&lt;/strong&gt;: Logistic, LinearSVC, MultinomialNB, SGD&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2&lt;/strong&gt;: LogisticRegression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3&lt;/strong&gt;: RidgeClassifier, SGD → Final &lt;strong&gt;VotingClassifier&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But… accuracy slightly dropped:&lt;/p&gt;

&lt;p&gt;📉 &lt;strong&gt;Accuracy&lt;/strong&gt;: &lt;code&gt;89.09%&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphx214zayudi3cpifgn0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphx214zayudi3cpifgn0.png" alt=" " width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔁 I reverted to the &lt;strong&gt;1-layer stacking&lt;/strong&gt;, which performed best.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 Tools &amp;amp; Libraries Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Scikit-learn&lt;/li&gt;
&lt;li&gt;Pandas&lt;/li&gt;
&lt;li&gt;NLTK&lt;/li&gt;
&lt;li&gt;Matplotlib / Seaborn&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Key Learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Traditional ML can still compete with deep learning in text tasks&lt;/li&gt;
&lt;li&gt;Logistic Regression + TF-IDF = surprisingly powerful&lt;/li&gt;
&lt;li&gt;Ensemble methods like stacking can push the limits&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📝 Final Words
&lt;/h2&gt;

&lt;p&gt;This project wasn't about beating deep learning. It was about &lt;strong&gt;challenging assumptions&lt;/strong&gt; — and proving that, with the right setup, classic ML still holds its ground.&lt;/p&gt;

&lt;p&gt;If you're new to ML or want to understand the fundamentals before diving into deep models, &lt;strong&gt;this path is for you&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Feel free to connect, share thoughts, or collaborate. I’d love to hear your feedback!&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/anikchand461/sentiment-analysis" rel="noopener noreferrer"&gt;https://github.com/anikchand461/sentiment-analysis&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>The Curve That Judges Your ML Model</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Sun, 13 Jul 2025 14:34:01 +0000</pubDate>
      <link>https://forem.com/anikchand461/understanding-the-auc-roc-curve-in-machine-learning-with-python-code-3nlc</link>
      <guid>https://forem.com/anikchand461/understanding-the-auc-roc-curve-in-machine-learning-with-python-code-3nlc</guid>
      <description>&lt;h3&gt;
  
  
  Ever built a model and felt proud of its 95% accuracy, only to find out it’s not that great after all? 😅
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;I used to think the AUC-ROC curve was some complicated graph that only expert data scientists talked about. But once I understood it, I realized it’s actually pretty simple — and super useful!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this blog, I’ll explain the ROC curve in a way that’s easy to understand. We’ll see how it helps you figure out how good your model really is — with simple examples, pictures, and Python code.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Why Accuracy Isn’t Always the Hero
&lt;/h2&gt;

&lt;p&gt;Let’s say you built a model to detect a rare disease. 99 out of 100 people don’t have it.&lt;/p&gt;

&lt;p&gt;Now imagine your model just predicts “No disease” for everyone.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy? 99%&lt;/li&gt;
&lt;li&gt;Helpful? Not at all. You missed the one person who actually has the disease.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where smarter metrics come in — things like Precision, Recall, and the star of today’s show: &lt;strong&gt;AUC-ROC&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✨ So, What’s This ROC Curve Anyway?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;ROC (Receiver Operating Characteristic)&lt;/strong&gt; curve shows how good your model is at distinguishing between two classes — like spam vs. not spam.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X-Axis:&lt;/strong&gt; False Positive Rate (FPR) — how often the model cries wolf&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Y-Axis:&lt;/strong&gt; True Positive Rate (TPR) — how often it catches the real deal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you move the decision threshold, these rates change. The ROC curve just plots these changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvkfhw3un46e6n23lszl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvkfhw3un46e6n23lszl.png" alt=" " width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the curve hugs the top-left corner — you’ve got a great model. If it sticks to the diagonal? Might as well toss a coin.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📝 A Real-Life Example (Email Spam Classifier)
&lt;/h2&gt;

&lt;p&gt;Suppose you built a model to detect spam. It gives probabilities like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Email    Prob(Spam)
A        0.45
B        0.29
C        0.61
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s say your threshold is 0.5:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&amp;gt; 0.5 → Spam&lt;/li&gt;
&lt;li&gt;≤ 0.5 → Not spam&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🎯 Picking the right threshold matters. Why?&lt;/p&gt;

&lt;p&gt;Because there are two types of mistakes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Predicting not spam when it is spam → ⚠️ You miss an actual threat.&lt;/li&gt;
&lt;li&gt;Predicting spam when it’s not → 🤷 Just an annoying false alarm.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Depending on your use case, one error might be worse than the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 Quick Refresher: Confusion Matrix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;             Predicted
            1       0
Actual  1   TP      FN
        0   FP      TN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TP&lt;/strong&gt; = True Positive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FN&lt;/strong&gt; = False Negative&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP&lt;/strong&gt; = False Positive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TN&lt;/strong&gt; = True Negative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From this we get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TPR&lt;/strong&gt; = TP / (TP + FN) — how many real positives you caught&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FPR&lt;/strong&gt; = FP / (FP + TN) — how many times you cried wolf&lt;/li&gt;
&lt;/ul&gt;
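
&lt;p&gt;In code, each rate is one line. Plugging in the spam-filter counts used in the example further down (TP = 80, FN = 20, FP = 20, TN = 80):&lt;/p&gt;

```python
def rates(tp, fn, fp, tn):
    """TPR and FPR from the four confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

# 100 spam emails, 80 caught; 100 legit emails, 20 wrongly flagged
tpr, fpr = rates(tp=80, fn=20, fp=20, tn=80)
print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")  # TPR = 80%, FPR = 20%
```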




&lt;h2&gt;
  
  
  📉 Threshold Changes — What Happens?
&lt;/h2&gt;

&lt;p&gt;Changing your threshold is like adjusting the sensitivity of your spam filter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower threshold → You catch more spam (high TPR), but mislabel legit emails too (high FPR)&lt;/li&gt;
&lt;li&gt;Higher threshold → Fewer false alarms, but you might miss actual spam&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ROC curve shows this trade-off for every possible threshold.&lt;/p&gt;
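
&lt;p&gt;You can watch the trade-off happen with a toy example: sweep the threshold over a handful of made-up spam scores and TPR and FPR rise and fall together:&lt;/p&gt;

```python
# Toy spam scores and true labels (1 = spam, 0 = legit); both made up
scores = [0.95, 0.80, 0.61, 0.45, 0.29, 0.10]
labels = [1,    1,    0,    1,    0,    0]

def tpr_fpr(threshold):
    """TPR and FPR when every email scoring at or above the threshold is flagged."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    return tp / sum(labels), fp / (len(labels) - sum(labels))

for t in (0.2, 0.5, 0.9):
    tpr, fpr = tpr_fpr(t)
    print(f"threshold {t}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Each (FPR, TPR) pair is one point on the ROC curve; plotting them for every threshold traces the whole curve.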




&lt;h2&gt;
  
  
  💡 Spam Detection in Action
&lt;/h2&gt;

&lt;p&gt;Let’s say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have 200 emails: 100 spam, 100 not spam&lt;/li&gt;
&lt;li&gt;Your model detects 80 spam correctly → TPR = 80%&lt;/li&gt;
&lt;li&gt;But it wrongly flags 20 legit emails → FPR = 20%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🎯 Goal: Keep TPR high and FPR low.&lt;/p&gt;

&lt;p&gt;Another fun one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Netflix Churn Prediction&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;You predict who will cancel their subscription&lt;/li&gt;
&lt;li&gt;False positive = predicting a loyal user will leave → Not great for business&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  📈 How to Read a ROC Curve
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Y-axis:&lt;/strong&gt; TPR (catching the good stuff)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X-axis:&lt;/strong&gt; FPR (making false calls)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you tweak the threshold:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low threshold → High TPR and high FPR&lt;/li&gt;
&lt;li&gt;High threshold → Low TPR and low FPR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We want the sweet spot where TPR is high and FPR is low.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✨ AUC: The Area Under That Curve
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AUC (Area Under the ROC Curve)&lt;/strong&gt; tells us how good your model is overall — across all thresholds.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AUC = 1.0 → Perfect model&lt;/li&gt;
&lt;li&gt;AUC = 0.5 → Random guess&lt;/li&gt;
&lt;li&gt;AUC &amp;lt; 0.5 → Your model might be predicting backwards 😅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AUC basically says: &lt;em&gt;Pick any random spam and non-spam email — what’s the chance the model ranks the spam one higher?&lt;/em&gt;&lt;/p&gt;
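
&lt;p&gt;That ranking interpretation is easy to compute by hand: compare every (spam, not-spam) pair of scores and count how often the spam email ranks higher. A tiny sketch with made-up scores:&lt;/p&gt;

```python
from itertools import product

# Hypothetical model scores: higher should mean "more likely spam"
spam_scores     = [0.90, 0.75, 0.60]
not_spam_scores = [0.40, 0.30, 0.65]

# AUC = fraction of (spam, not-spam) pairs the model ranks correctly.
# Ties would conventionally count as half a correct pair (none occur here).
pairs = list(product(spam_scores, not_spam_scores))
auc = sum(1 for s, n in pairs if s > n) / len(pairs)
print(f"pairwise AUC = {auc:.2f}")  # 8 of 9 pairs ordered correctly
```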

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze1as6nd7pz1hm0hxw6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze1as6nd7pz1hm0hxw6e.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model M1: AUC = 0.85&lt;/li&gt;
&lt;li&gt;Model M2: AUC = 0.70&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ M1 is clearly better at separating spam from not spam.&lt;/p&gt;




&lt;h2&gt;
  
  
  👨‍💻 Try It in Python
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_classification&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_curve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auc&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# Dummy data
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Train model
&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Probabilities
&lt;/span&gt;&lt;span class="n"&gt;y_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ROC stuff
&lt;/span&gt;&lt;span class="n"&gt;fpr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tpr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;roc_curve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_probs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;roc_auc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;auc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fpr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tpr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fpr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tpr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AUC = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;roc_auc&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;False Positive Rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;True Positive Rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ROC Curve&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔖 One-Liner to Get AUC
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUC:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_probs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧠 Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy isn’t always enough&lt;/li&gt;
&lt;li&gt;ROC curve helps &lt;strong&gt;visualize&lt;/strong&gt; your classifier’s skill&lt;/li&gt;
&lt;li&gt;AUC gives an &lt;strong&gt;overall score&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Thresholds change how sensitive your model is&lt;/li&gt;
&lt;/ul&gt;
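&lt;p&gt;That last takeaway is easy to see in code. Here is a minimal sketch (the synthetic data and &lt;code&gt;RandomForestClassifier&lt;/code&gt; mirror the earlier example; the three threshold values are just illustrative): lowering the decision threshold flags more samples as positive, so recall (sensitivity) can only go up.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Same style of synthetic, imbalanced data as the ROC example above
X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]

# Sweep the decision threshold: a lower cutoff produces more positive
# predictions, so recall is monotonically non-increasing in the threshold
recalls = {t: recall_score(y_test, (probs >= t).astype(int))
           for t in (0.3, 0.5, 0.7)}
print(recalls)
```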




&lt;h2&gt;
  
  
  🚀 Wrap-Up
&lt;/h2&gt;

&lt;p&gt;AUC-ROC isn’t just a fancy graph — it helps you really &lt;strong&gt;understand&lt;/strong&gt; your model. Whether you’re filtering spam, detecting diseases, or predicting churn — this curve has your back.&lt;/p&gt;

&lt;p&gt;So next time someone mentions AUC, you can nod, smile, and maybe even draw it too. 😉&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this helped you, follow me on &lt;a href="https://github.com/anikchand461" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/anik-chand-3b14b12b6/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more ML breakdowns!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#machinelearning #datascience #python #roc #auc #classification&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>🚢 OneHotEncoder Shape Mismatch Mystery in Titanic Dataset — Solved!</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Mon, 07 Apr 2025 12:48:13 +0000</pubDate>
      <link>https://forem.com/anikchand461/onehotencoder-shape-mismatch-mystery-in-titanic-dataset-solved-3n6</link>
      <guid>https://forem.com/anikchand461/onehotencoder-shape-mismatch-mystery-in-titanic-dataset-solved-3n6</guid>
      <description>&lt;p&gt;Hi everyone! 👋 I'm currently working through the Titanic dataset as part of the &lt;strong&gt;CampusX YouTube course&lt;/strong&gt;, and I ran into an interesting issue involving &lt;code&gt;OneHotEncoder&lt;/code&gt; and &lt;code&gt;SimpleImputer&lt;/code&gt; that I &lt;em&gt;finally&lt;/em&gt; understood after digging into the problem.&lt;/p&gt;

&lt;p&gt;This blog is all about that journey — what caused the shape mismatch between training and testing data, and how I fixed it. If you're also working on preprocessing categorical variables in machine learning, this might save you a few hours of debugging!&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The Setup
&lt;/h2&gt;

&lt;p&gt;We’re using the Titanic dataset for classification (predicting survival), and like most people, I’m preprocessing the &lt;code&gt;Sex&lt;/code&gt; and &lt;code&gt;Embarked&lt;/code&gt; columns using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SimpleImputer&lt;/strong&gt; to handle missing values
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OneHotEncoder&lt;/strong&gt; to convert categorical variables into numerical format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s a snippet of what I had:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.impute&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SimpleImputer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OneHotEncoder&lt;/span&gt;

&lt;span class="n"&gt;si_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleImputer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;most_frequent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ohe_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OneHotEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handle_unknown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Imputation
&lt;/span&gt;&lt;span class="n"&gt;x_train_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;si_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;x_test_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;si_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

&lt;span class="c1"&gt;# Encoding
&lt;/span&gt;&lt;span class="n"&gt;x_train_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ohe_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;x_test_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ohe_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚠️ The Issue
&lt;/h2&gt;

&lt;p&gt;After running this, I checked the shapes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;712&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_test_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;   &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;179&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait — what?!&lt;br&gt;&lt;br&gt;
Why does the train set have 4 columns, and the test set has only 3, even though I used &lt;code&gt;handle_unknown='ignore'&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Wasn’t that supposed to handle unknown categories safely?&lt;/p&gt;


&lt;h2&gt;
  
  
  🕵️‍♂️ Investigating the Root Cause
&lt;/h2&gt;

&lt;p&gt;I ran a few more checks and realized something sneaky:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Output: 2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hmm… that’s weird. I &lt;strong&gt;thought&lt;/strong&gt; I had already imputed missing values. But then I remembered this part of my code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;si_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aha! 💡 I had imputed missing values into a &lt;em&gt;new variable&lt;/em&gt;, &lt;code&gt;x_train_embarked&lt;/code&gt;, but &lt;strong&gt;I never updated the original &lt;code&gt;x_train&lt;/code&gt; DataFrame&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;That means the original &lt;code&gt;x_train['Embarked']&lt;/code&gt; still had &lt;code&gt;NaN&lt;/code&gt; values when I called &lt;code&gt;.fit()&lt;/code&gt; on the encoder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ohe_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caused the encoder to treat &lt;strong&gt;&lt;code&gt;NaN&lt;/code&gt; as a valid category&lt;/strong&gt;, resulting in 4 categories being learned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NaN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
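&lt;p&gt;You can reproduce this behaviour in isolation with a toy column (hypothetical data, not the actual Titanic split; recent scikit-learn versions, 0.24 and later, treat &lt;code&gt;np.nan&lt;/code&gt; as its own category during &lt;code&gt;fit&lt;/code&gt;):&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy column containing one missing value (illustrative only)
col = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S', np.nan]})

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(col)

# NaN shows up as a fourth learned category alongside C, Q, S
print(enc.categories_[0])
```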



&lt;p&gt;But in the test data, there were &lt;strong&gt;no NaN values&lt;/strong&gt;, only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the encoder &lt;strong&gt;ignored the unseen &lt;code&gt;NaN&lt;/code&gt; category&lt;/strong&gt;, resulting in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_test_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;179&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ✅ The Fix
&lt;/h2&gt;

&lt;p&gt;The correct way was to assign the imputed values &lt;em&gt;back&lt;/em&gt; to the original DataFrame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;si_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;si_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when I fit the encoder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ohe_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OneHotEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handle_unknown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_train_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ohe_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;span class="n"&gt;x_test_embarked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ohe_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Embarked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ The shapes finally matched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;x_train_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;712&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_test_embarked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;179&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📌 Bonus: What &lt;code&gt;handle_unknown='ignore'&lt;/code&gt; Really Means
&lt;/h2&gt;

&lt;p&gt;Here’s a quick visual explanation:&lt;/p&gt;

&lt;p&gt;Imagine your training data had these categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Red', 'Blue', 'Green']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And you encode them like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Red&lt;/th&gt;
&lt;th&gt;Blue&lt;/th&gt;
&lt;th&gt;Green&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now your test data contains a &lt;strong&gt;new&lt;/strong&gt; category: &lt;code&gt;'Yellow'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;OneHotEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle_unknown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the encoder will just assign all 0s for &lt;code&gt;'Yellow'&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Red&lt;/th&gt;
&lt;th&gt;Blue&lt;/th&gt;
&lt;th&gt;Green&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;✅ No crash. But also — you now have a row of all zeros!&lt;/p&gt;




&lt;h2&gt;
  
  
  🎓 Final Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Always &lt;strong&gt;handle missing values before encoding&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you’re using &lt;code&gt;SimpleImputer&lt;/code&gt;, &lt;strong&gt;assign the output back&lt;/strong&gt; to your original DataFrame&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;handle_unknown='ignore'&lt;/code&gt; prevents errors, but doesn’t fix shape mismatches caused by unseen categories during &lt;code&gt;.fit()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;This was a great learning moment for me while working through the Titanic dataset with CampusX. Hope this helps anyone else facing the same mystery! 🧩&lt;/p&gt;

&lt;p&gt;Let me know if you've run into similar preprocessing surprises!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Regression in ML Explained! 🚀 The Ultimate Hands-on Guide</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Wed, 19 Mar 2025 11:53:54 +0000</pubDate>
      <link>https://forem.com/anikchand461/regression-in-ml-explained-the-ultimate-hands-on-guide-484f</link>
      <guid>https://forem.com/anikchand461/regression-in-ml-explained-the-ultimate-hands-on-guide-484f</guid>
      <description>&lt;h2&gt;
  
  
  💡 How does Netflix know what you’ll binge-watch next? Or how do businesses predict future sales with impressive accuracy?
&lt;/h2&gt;

&lt;p&gt;The magic behind these predictions is &lt;strong&gt;Regression&lt;/strong&gt;—a fundamental technique in Machine Learning! 🚀  &lt;/p&gt;

&lt;p&gt;Whether it's forecasting &lt;strong&gt;house prices 🏡, stock trends 📈, or weather patterns 🌦️&lt;/strong&gt;, regression plays a crucial role in making data-driven decisions. In this guide, we’ll break it all down—step by step—with easy explanations, real-world examples, and hands-on code.  &lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 What’s in store for you?
&lt;/h3&gt;

&lt;p&gt;We'll explore various &lt;strong&gt;Regression algorithms&lt;/strong&gt;, understand how they work, and see them in action with practical applications. Let’s dive in! 🔥&lt;/p&gt;

&lt;h1&gt;
  
  
  💡 1. Linear Regression: The Foundation of Predictive Modeling
&lt;/h1&gt;

&lt;p&gt;Linear Regression is the most fundamental regression technique, assuming a &lt;strong&gt;straight-line relationship&lt;/strong&gt; between input variables (X) and the output (Y). It is widely used for predicting trends, making forecasts, and understanding relationships between variables.  &lt;/p&gt;

&lt;p&gt;By fitting a linear equation to the observed data, Linear Regression helps in estimating the dependent variable based on independent variables. The equation of a simple linear regression is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq3x2yvzbhiy7piderxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq3x2yvzbhiy7piderxv.png" alt=" " width="600" height="95"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📌 Where:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Y&lt;/strong&gt; = Predicted value (dependent variable)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X&lt;/strong&gt; = Input feature (independent variable)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b₀&lt;/strong&gt; = Intercept (constant term)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b₁&lt;/strong&gt; = Slope (coefficient of X)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ε&lt;/strong&gt; = Error term
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔹 Key Applications of Linear Regression:
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Stock Market Predictions&lt;/strong&gt; 📈&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Sales Forecasting&lt;/strong&gt; 🛍️&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Real Estate Price Estimation&lt;/strong&gt; 🏡&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Medical Research &amp;amp; Risk Analysis&lt;/strong&gt; ⚕️  &lt;/p&gt;


&lt;h2&gt;
  
  
  🖥️ Implementing Linear Regression in Python:
&lt;/h2&gt;

&lt;p&gt;Let's implement &lt;strong&gt;Simple Linear Regression&lt;/strong&gt; using Python and &lt;strong&gt;Scikit-Learn&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="c1"&gt;# Sample dataset
&lt;/span&gt;&lt;span class="n"&gt;data_X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data_Y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Splitting the data
&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Model training
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predictions
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plotting the regression line
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Actual Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Regression Line&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input Feature (X)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output (Y)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Linear Regression Model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  📊 Output Visualization:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtfndrrktk5w2p3x137d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtfndrrktk5w2p3x137d.png" alt="Linear Regression Plot" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This simple example demonstrates how &lt;strong&gt;Linear Regression&lt;/strong&gt; can be implemented using &lt;strong&gt;Scikit-Learn&lt;/strong&gt; in Python. 🚀&lt;br&gt;&lt;br&gt;
Stay tuned as we explore more regression techniques in the next sections! 🔥&lt;/p&gt;




&lt;h3&gt;
  
  
  🔎 &lt;strong&gt;Example Use Case:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;📌 &lt;strong&gt;Predicting house prices based on square footage&lt;/strong&gt; 🏠&lt;br&gt;&lt;br&gt;
Imagine you have a dataset with house sizes and their respective prices. By applying &lt;strong&gt;Linear Regression&lt;/strong&gt;, you can predict the price of a house based on its area!&lt;/p&gt;
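&lt;p&gt;As a minimal sketch of that use case (the sizes and prices below are invented, purely for illustration), the whole workflow fits in a few lines:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house sizes (sq ft) and prices (in $1000s)
sizes = np.array([800, 1000, 1200, 1500, 1800, 2000]).reshape(-1, 1)
prices = np.array([150, 180, 210, 260, 300, 330])

# Fit a line: price = b1 * size + b0
model = LinearRegression()
model.fit(sizes, prices)

# Predict the price of a 1600 sq ft house
predicted = model.predict([[1600]])
print(f"Predicted price: ${predicted[0]:.0f}k")
```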




&lt;blockquote&gt;
&lt;p&gt;📢 &lt;strong&gt;Tip:&lt;/strong&gt; Always check model assumptions like linearity, independence, and normal distribution of residuals before applying Linear Regression in real-world scenarios.&lt;/p&gt;
&lt;/blockquote&gt;
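&lt;p&gt;Two of those checks (residuals centered on zero, and no leftover trend in the residuals) can be eyeballed with plain NumPy. A sketch on synthetic data, for illustration only:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, roughly linear data (illustrative only)
rng = np.random.default_rng(42)
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = 2.5 * X.ravel() + 1 + rng.normal(0, 1, 20)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# With an intercept, OLS residuals average to ~0 by construction
print(f"Mean residual: {residuals.mean():.2e}")

# Residuals should show no trend against X; a correlation near 0 is a good sign
corr = np.corrcoef(X.ravel(), residuals)[0, 1]
print(f"Residual-vs-X correlation: {corr:.2e}")
```

For a formal normality check on the residuals, a test such as Shapiro-Wilk (available in SciPy) is a common follow-up.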

&lt;p&gt;Let’s move on to more advanced regression techniques in the next section! 🚀&lt;/p&gt;







&lt;h2&gt;
  
  
  🚀 &lt;strong&gt;2. Multiple Linear Regression: Expanding Predictive Power&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multiple Linear Regression extends &lt;strong&gt;Simple Linear Regression&lt;/strong&gt; by using several input variables to predict an outcome. Instead of modeling the relationship between a single independent variable and the dependent variable, it considers &lt;strong&gt;two or more independent variables&lt;/strong&gt;, which often improves predictive accuracy.  &lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 &lt;strong&gt;Understanding Multiple Linear Regression&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In Multiple Linear Regression, the relationship between the dependent variable (&lt;strong&gt;Y&lt;/strong&gt;) and multiple independent variables (&lt;strong&gt;X₁, X₂, X₃, ... Xₙ&lt;/strong&gt;) is represented as:  &lt;/p&gt;

&lt;h3&gt;
  
  
  📏 &lt;strong&gt;Equation of Multiple Linear Regression:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbnq75zdur49nnfo1zuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbnq75zdur49nnfo1zuj.png" alt=" " width="576" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Y&lt;/strong&gt; = Dependent variable (what we predict)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X₁, X₂, X₃, ... Xₙ&lt;/strong&gt; = Independent variables (input features)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b₀&lt;/strong&gt; = Intercept (constant term)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b₁, b₂, ..., bₙ&lt;/strong&gt; = Coefficients representing the influence of each variable
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ε&lt;/strong&gt; = Error term
&lt;/li&gt;
&lt;/ul&gt;
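&lt;p&gt;In code, the equation is just a weighted sum. The coefficients and feature values below are made up purely to show the arithmetic:&lt;/p&gt;

```python
import numpy as np

# Hypothetical coefficients: intercept b0 and weights b1, b2, b3
b0 = 50.0
b = np.array([0.08, 10.0, 5.0])

# One observation: X1 = size (sq ft), X2 = bedrooms, X3 = location rating
x = np.array([1500, 3, 8])

# Y = b0 + b1*X1 + b2*X2 + b3*X3 (error term omitted for a point prediction)
y = b0 + np.dot(b, x)
print(y)  # 50 + 120 + 30 + 40 = 240.0
```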




&lt;h3&gt;
  
  
  📊 &lt;strong&gt;Visual Representation:&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1️⃣ Concept of Multiple Regression&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm67fhowrh17m5czvlrf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm67fhowrh17m5czvlrf6.png" alt="Multiple Regression Concept" width="800" height="573"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2️⃣ Regression Plane Representation (for 2 Variables)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7fqkqjlzikkhirlwzg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7fqkqjlzikkhirlwzg5.png" alt="Regression Plane" width="800" height="382"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3️⃣ Multiple Linear Regression Formula Breakdown&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclomzxjk4rsceyh8w8uc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclomzxjk4rsceyh8w8uc.png" alt="Regression Equation" width="800" height="406"&gt;&lt;/a&gt;  &lt;/p&gt;




&lt;h3&gt;
  
  
  🖥️ &lt;strong&gt;Code Implementation: Mean Squared Error (MSE) in Python&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Compute the Mean Squared Error (MSE) cost function.

    Parameters:
    y_actual : np.array : Actual values
    y_pred : np.array : Predicted values (mx + c)

    Returns:
    float : MSE value
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_actual&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Number of data points
&lt;/span&gt;    &lt;span class="n"&gt;mse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_actual&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mse&lt;/span&gt;

&lt;span class="c1"&gt;# Example Data
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Input features
&lt;/span&gt;&lt;span class="n"&gt;y_actual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Actual output values
&lt;/span&gt;
&lt;span class="c1"&gt;# Linear regression parameters
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# Slope
&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Intercept
&lt;/span&gt;
&lt;span class="c1"&gt;# Compute predictions
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;

&lt;span class="c1"&gt;# Compute MSE
&lt;/span&gt;&lt;span class="n"&gt;mse_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_squared_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mean Squared Error (MSE):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mse_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🏠 &lt;strong&gt;Example Use Case: Predicting House Prices&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Features considered:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X₁:&lt;/strong&gt; Size of the house (sq ft)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X₂:&lt;/strong&gt; Number of bedrooms
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X₃:&lt;/strong&gt; Location rating
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Y:&lt;/strong&gt; Predicted house price
&lt;/li&gt;
&lt;/ul&gt;
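&lt;p&gt;A minimal sketch of this use case with scikit-learn, using invented numbers for the features and prices:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size (sq ft), bedrooms, location rating]
X = np.array([
    [1000, 2, 5],
    [1500, 3, 7],
    [1800, 3, 8],
    [2200, 4, 9],
    [2500, 4, 6],
])
y = np.array([200, 290, 340, 410, 420])  # prices in $1000s

model = LinearRegression().fit(X, y)

# Predict the price of a 2000 sq ft, 3-bedroom house with location rating 8
pred = model.predict([[2000, 3, 8]])[0]
print(f"Predicted price: ${pred:.0f}k")
```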

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Advantages of Multiple Linear Regression:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;✔️ Captures the effect of multiple variables for better predictions.&lt;br&gt;&lt;br&gt;
✔️ Useful for complex real-world scenarios like &lt;strong&gt;finance, healthcare, and business analytics&lt;/strong&gt;.  &lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ &lt;strong&gt;Challenges of Multiple Linear Regression:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;⚠️ More features increase &lt;strong&gt;complexity and overfitting risks&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
⚠️ Requires &lt;strong&gt;careful feature selection and normalization&lt;/strong&gt; for accuracy.  &lt;/p&gt;







&lt;h2&gt;
  
  
  🚀 3. Polynomial Regression: Capturing Non-Linear Trends
&lt;/h2&gt;

&lt;p&gt;When data doesn’t follow a straight-line trend, &lt;strong&gt;Polynomial Regression&lt;/strong&gt; helps model &lt;strong&gt;non-linear relationships&lt;/strong&gt; by introducing polynomial terms to the equation. This technique is useful when the relationship between the independent and dependent variables is curved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ubrs633zmoglhcopkr4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ubrs633zmoglhcopkr4.png" alt="Polynomial Regression Visualization" width="800" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📌 &lt;strong&gt;Equation:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Polynomial Regression extends Linear Regression by incorporating higher-degree polynomial terms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far9svebb2zfkx5xi2dqd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far9svebb2zfkx5xi2dqd.png" alt="Polynomial Equation" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Y&lt;/strong&gt; is the predicted output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X&lt;/strong&gt; is the input feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;b₀, b₁, b₂, …, bₙ&lt;/strong&gt; are the regression coefficients&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;n&lt;/strong&gt; is the polynomial degree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ε&lt;/strong&gt; is the error term&lt;/li&gt;
&lt;/ul&gt;
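&lt;p&gt;Under the hood, scikit-learn's &lt;code&gt;PolynomialFeatures&lt;/code&gt; simply expands each input into the powers from the equation above, and a plain &lt;code&gt;LinearRegression&lt;/code&gt; then fits the coefficients. A tiny sketch of the expansion:&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two sample inputs, expanded up to degree 3
X = np.array([[2], [3]])
X_poly = PolynomialFeatures(degree=3).fit_transform(X)

# Each row becomes [1, X, X^2, X^3]; LinearRegression fits b0..b3 on these columns
print(X_poly)
```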

&lt;h3&gt;
  
  
  🔍 &lt;strong&gt;Real-World Applications of Polynomial Regression:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📈 &lt;strong&gt;Salary Prediction:&lt;/strong&gt; Estimating salary growth over time, where experience influences salary in a non-linear fashion.&lt;/li&gt;
&lt;li&gt;🦠 &lt;strong&gt;COVID-19 Trend Forecasting:&lt;/strong&gt; Modeling infection rate trends, which often follow polynomial or exponential growth.&lt;/li&gt;
&lt;li&gt;🚗 &lt;strong&gt;Vehicle Performance Modeling:&lt;/strong&gt; Predicting fuel consumption based on speed and engine performance.&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Economics &amp;amp; Finance:&lt;/strong&gt; Forecasting demand, inflation, and economic trends where relationships are complex.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Advantages:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;✔️ Works well for &lt;strong&gt;curved datasets&lt;/strong&gt; where Linear Regression fails.&lt;br&gt;&lt;br&gt;
✔️ Provides a &lt;strong&gt;better fit&lt;/strong&gt; for non-linear trends when the correct degree is chosen.  &lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ &lt;strong&gt;Disadvantages:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;❌ Can &lt;strong&gt;overfit&lt;/strong&gt; the data if the polynomial degree is too high.&lt;br&gt;&lt;br&gt;
❌ Harder to interpret compared to simple Linear Regression.  &lt;/p&gt;

&lt;h3&gt;
  
  
  🖥️ &lt;strong&gt;Python Code for Polynomial Regression:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PolynomialFeatures&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;

&lt;span class="c1"&gt;# Sample dataset
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;140&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Creating a polynomial model (degree = 2)
&lt;/span&gt;&lt;span class="n"&gt;poly_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PolynomialFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;poly_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;poly_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plot results
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Actual Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Polynomial Regression Line&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input Feature (X)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output (Y)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Polynomial Regression Model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  📌 &lt;strong&gt;Visual Representation:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq51430kehsuhhssyh67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq51430kehsuhhssyh67.png" alt="Polynomial Regression Curve" width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Polynomial Regression allows machine learning models to capture &lt;strong&gt;non-linear relationships&lt;/strong&gt; and make better predictions in real-world scenarios. 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Logistic Regression (For Classification):&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Although it contains &lt;strong&gt;"Regression"&lt;/strong&gt; in its name, Logistic Regression is used for &lt;strong&gt;Classification problems, not Regression.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of predicting continuous values, it predicts probabilities and assigns categories like &lt;strong&gt;Yes/No, Pass/Fail, Spam/Not Spam&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdmlduu1mo8usd1311ac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdmlduu1mo8usd1311ac.png" alt="Logistic Regression" width="800" height="659"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Equation:&lt;/em&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei7kkwuttp2a5ite4ca0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei7kkwuttp2a5ite4ca0.png" alt="Equation" width="762" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;where &lt;strong&gt;P&lt;/strong&gt; is the &lt;strong&gt;probability&lt;/strong&gt; of belonging to a class.&lt;/p&gt;
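&lt;p&gt;The Sigmoid function that produces &lt;strong&gt;P&lt;/strong&gt; is one line of NumPy. A minimal sketch:&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A score of 0 sits on the decision boundary (P = 0.5);
# large positive scores map near 1, large negative scores near 0
print(sigmoid(0))    # 0.5
print(sigmoid(4))    # ~0.982
print(sigmoid(-4))   # ~0.018
```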

&lt;h3&gt;
  
  
  &lt;strong&gt;✅ Example:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Predicting &lt;strong&gt;whether a customer will buy a product (Yes/No).&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Classifying &lt;strong&gt;emails as spam or not.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;✅ Why is it called Regression?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Although it’s used for classification, &lt;strong&gt;Logistic Regression&lt;/strong&gt; first computes a linear combination of the inputs (the same weighted sum used in Linear Regression) and then applies the &lt;strong&gt;Sigmoid function&lt;/strong&gt; to convert that value into a probability.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;🖥️ Python Implementation of Logistic Regression&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classification_report&lt;/span&gt;

&lt;span class="c1"&gt;# Sample dataset (Binary classification: Pass (1) or Fail (0))
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;  &lt;span class="c1"&gt;# Hours studied
&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# 0 = Fail, 1 = Pass
&lt;/span&gt;
&lt;span class="c1"&gt;# Splitting dataset
&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Training Logistic Regression model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Predictions
&lt;/span&gt;&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Evaluating model
&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classification Report:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;classification_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Plotting the sigmoid curve
&lt;/span&gt;&lt;span class="n"&gt;X_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_range&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Actual Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sigmoid Curve&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hours Studied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Probability of Passing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Logistic Regression Model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;📌 Expected Output:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Accuracy: 1.0  # (Might vary slightly depending on random split)
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implementation demonstrates how &lt;strong&gt;Logistic Regression&lt;/strong&gt; is used for binary classification. The model predicts whether a student will &lt;strong&gt;pass or fail&lt;/strong&gt; based on study hours, and we visualize the &lt;strong&gt;sigmoid function curve&lt;/strong&gt;. 📊🔥&lt;/p&gt;
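&lt;p&gt;One way to sanity-check the fitted model is to query predictions for new study-hour values. The sketch below refits the same toy dataset so it runs on its own; exact probabilities depend on the solver and regularization, but the trend should hold:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy dataset as above: hours studied vs. pass (1) / fail (0)
X = np.array([[20], [25], [30], [35], [40], [45], [50], [55], [60], [65]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Predicted probability of passing rises with hours studied
print(model.predict([[20]]))   # [0] -> fail
print(model.predict([[60]]))   # [1] -> pass
```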







&lt;h1&gt;
  
  
  &lt;strong&gt;📌 Conclusion: Regression in Machine Learning&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Regression is a fundamental concept in &lt;strong&gt;Machine Learning&lt;/strong&gt;, enabling us to make &lt;strong&gt;continuous predictions&lt;/strong&gt; based on input features. It is widely used in forecasting, trend analysis, and data-driven decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔹 &lt;strong&gt;Quick Summary of Regression Algorithms&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Algorithm&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Equation Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predicting sales, stock prices&lt;/td&gt;
&lt;td&gt;Linear equation&lt;/td&gt;
&lt;td&gt;Simple relationships between variables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multiple Regression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;House pricing with multiple factors&lt;/td&gt;
&lt;td&gt;Linear (Multiple Inputs)&lt;/td&gt;
&lt;td&gt;Impact of multiple features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Polynomial Regression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Salary growth trends, COVID-19 cases&lt;/td&gt;
&lt;td&gt;Polynomial equation&lt;/td&gt;
&lt;td&gt;Capturing non-linear patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logistic Regression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spam detection, customer conversion&lt;/td&gt;
&lt;td&gt;Sigmoid function&lt;/td&gt;
&lt;td&gt;Classification problems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  🏆 &lt;strong&gt;Key Takeaways&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;✅ Regression is essential for predictive modeling in &lt;strong&gt;real-world applications&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
✅ Choosing the right regression technique depends on &lt;strong&gt;data patterns and relationships&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
✅ Logistic Regression is used for &lt;strong&gt;classification&lt;/strong&gt;, despite its name.  &lt;/p&gt;

&lt;p&gt;Regression models &lt;strong&gt;power AI-driven decision-making&lt;/strong&gt;, forming the backbone of modern analytics and forecasting! 🚀&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>🎮 Crafting Fun with Code: My Journey Building the Hangman Game 🕹️</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Tue, 07 Jan 2025 14:14:48 +0000</pubDate>
      <link>https://forem.com/anikchand461/crafting-fun-with-code-my-journey-building-the-hangman-game-11ih</link>
      <guid>https://forem.com/anikchand461/crafting-fun-with-code-my-journey-building-the-hangman-game-11ih</guid>
      <description>&lt;p&gt;Repo link :  &lt;a href="https://github.com/anikchand461/Hangman-Game" rel="noopener noreferrer"&gt;https://github.com/anikchand461/Hangman-Game&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What motivated me to start blogging?&lt;/strong&gt;&lt;br&gt;
    This is my first blog, and the motivation to start blogging came from my friend, Abhiraj Adhikary. He encouraged me to explore blogging as a way to connect with like-minded people, share knowledge, and engage with a broader community. His insights made me realize that blogging isn’t just about documenting ideas but also a way to clarify and refine my own concepts, particularly in projects. Writing about my work helps me dive deeper into the details, reflect on my learning, and present it in a way that others can benefit from. Inspired by his advice, I decided to start this journey to share my experiences and build meaningful connections with others in the tech and learning community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the Hangman Game?&lt;/strong&gt;&lt;br&gt;
     Hangman is a classic word-guessing game with a simple yet engaging objective: players guess a hidden word by suggesting letters within a limited number of attempts. With each incorrect guess, a part of a stick-figure “hangman” is drawn, increasing the tension. The game tests vocabulary, problem-solving skills, and strategy, making it a popular choice for both casual play and learning activities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What inspired me to create this game?&lt;/strong&gt;&lt;br&gt;
     I was learning Python through Jenny’s Lecture CS IT YouTube channel, where the teacher introduced the Hangman game as a project. However, I decided to take a different approach—I wanted to challenge myself to create the game independently, without watching the tutorial. To start, I played a few Hangman games from apps downloaded from the Play Store, which helped me understand the gameplay and logic. These experiences inspired me to incorporate small changes and enhancements into my version of the game, aiming for a more polished and engaging result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Objective of the game&lt;/strong&gt;&lt;br&gt;
    The main goal of creating my Hangman game was twofold. It was a fun project that allowed players to enjoy a word-guessing game with categories like sports, food, and animals. For me, it was also a valuable learning experience, reinforcing my understanding of Python concepts like conditionals, loops, and the random module for dynamic word selection. Building the game enhanced my logic-building skills, as I designed the gameplay flow, added ASCII art, and implemented features like replay options. This project was a perfect mix of fun and learning, marking an important step in my Python journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features of the Game&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Diverse Word Categories&lt;/strong&gt;&lt;br&gt;
The game includes multiple word categories, such as sports, food, and animals, offering players a varied and engaging experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Word Selection&lt;/strong&gt;&lt;br&gt;
With the use of Python’s random module, words are dynamically selected from the chosen category, ensuring unpredictability and replayability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engaging ASCII Art&lt;/strong&gt;&lt;br&gt;
The game features creative ASCII art for the Hangman and the game header, adding a visual element that enhances the player’s experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7si25jfcg2vlfix636vb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7si25jfcg2vlfix636vb.png" alt=" " width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replay Option&lt;/strong&gt;&lt;br&gt;
Players have the option to replay the game after a round ends, making it easy to enjoy multiple sessions without restarting the program.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvrukjbmdx7el2js41nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvrukjbmdx7el2js41nz.png" alt=" " width="800" height="46"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User-Friendly Enhancements&lt;/strong&gt;&lt;br&gt;
To improve the gameplay experience, I incorporated user-friendly features such as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Error-Handling&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Ensuring the program runs smoothly even if the player makes invalid inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Score-keeping&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Tracking player performance to add a competitive edge and encourage improvement.&lt;/p&gt;


&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  print(f'''
    GAME SUMMARY--
    Player name : {name}
    Difficulty level : {difficulty_level}
    lives used : {6 - life}
    Total attampts : {correct_attampts + wrong_attampts}
    Correct attampts : {correct_attampts}
    Wrong attampts : {wrong_attampts}
    ''')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Technologies Used&lt;/strong&gt;&lt;br&gt;
    Programming Language: Python&lt;br&gt;
The game is built entirely using Python, a versatile and beginner-friendly programming language known for its simplicity and readability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Libraries used&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;

&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random as r
import os
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;random&lt;/strong&gt; :&lt;br&gt;
 This library is utilized to dynamically select words from predefined categories, ensuring that each game offers a unique experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;os&lt;/strong&gt; :&lt;br&gt;
 The os module is used for tasks like clearing the screen between guesses, enhancing the overall gameplay presentation.&lt;/p&gt;
&lt;/blockquote&gt;
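&lt;p&gt;A minimal sketch of how these two modules fit together (the category dictionary below is a made-up stand-in for the game's real word lists):&lt;/p&gt;

```python
import os
import random

# Hypothetical stand-in for the game's real category word lists
words = {
    'sports': ['cricket', 'tennis', 'hockey'],
    'food': ['pizza', 'noodles', 'mango'],
    'animals': ['tiger', 'parrot', 'zebra'],
}

def pick_word(category):
    """Dynamically select a word from the chosen category."""
    return random.choice(words[category])

def clear_screen():
    """Clear the terminal between guesses ('cls' on Windows, 'clear' elsewhere)."""
    os.system('cls' if os.name == 'nt' else 'clear')

print(pick_word('sports'))
```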

&lt;p&gt;&lt;strong&gt;Why Python?&lt;/strong&gt;&lt;br&gt;
Python was chosen for this project because I was actively learning Python at the time. I thought building the Hangman game would be a great way to apply and master the concepts I was learning. The simplicity and versatility of Python made it the perfect choice for creating a fun and engaging project while improving my skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Walkthrough&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;First Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I started by creating a flowchart to map out the game’s logic:&lt;br&gt;
  • I chose a predefined word and created a dashed list, with each dash representing a letter of the word.&lt;br&gt;
  • When the player guessed a letter, it was compared to the word. If a match was found, the dash was replaced with the correct letter. This continued until all letters were revealed or the player ran out of attempts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secondary Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After implementing the basic logic, I enhanced the game by:&lt;br&gt;
  • Allowing players to replace all instances of the guessed letter in the dashed list (e.g., “away” becomes [‘a’, ‘_’, ‘a’, ‘_’] when guessing ‘a’).&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    for i in range(letter_numbers):
        list_word.append('_')
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;• Introducing a lives system, where players start with 6 lives, losing 1 life with each incorrect guess. The game ends when the word is guessed correctly or lives run out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I improved the game’s word selection by:&lt;br&gt;
  • Creating a list of 1000 words using ChatGPT and using the random module to select a word.&lt;br&gt;
  • Adding categories like sports, food, and animals, allowing players to choose based on their interests, making the game more engaging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Additional Development&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To make the game more engaging, I added difficulty levels and visual enhancements:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;     if difficulty_level == 'easy':
        select_word = select_word1(key)
    elif difficulty_level == 'moderate':
        select_word = select_word2(key)
    elif difficulty_level == 'hard':
        select_word = select_word3(key)
    else:
        print('please enter valid input')
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Difficulty Levels:&lt;br&gt;
  • Introduced three levels: easy (1–5 letters), moderate (5–7 letters), and hard (8+ letters).&lt;br&gt;
  • Players could choose a level, and word selection adjusted accordingly for added challenge.&lt;br&gt;
Visual Enhancements:&lt;br&gt;
  • Included ASCII art hangman figures to show progress, with more detail as lives were lost.&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    r"""
      +---+
      |   |  
      O   | 
     /|\  | 
     /    | 
          |
    """,
    r"""
      +---+
      |   |  
      O   | 
     /|\  | 
     / \  | 
          | ☠️
    """
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;• Added win and loss ASCII art for a dramatic and satisfying game ending.&lt;br&gt;
These features made the game more dynamic, fun, and visually appealing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Touches&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To give the game a polished feel, I made key improvements:&lt;br&gt;
  • Added an introductory message with hangman ASCII art.&lt;br&gt;
  • Included player name tracking, lives used, and final results showing the word.&lt;br&gt;
  • Wrapped the game in a &lt;code&gt;while True&lt;/code&gt; loop for infinite rounds, with an option to replay after each round.&lt;br&gt;
These enhancements made the game more engaging, visually appealing, and fun.&lt;/p&gt;
&lt;/blockquote&gt;
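&lt;p&gt;Putting the core ideas from the walkthrough together, the guess-handling step can be sketched roughly like this (simplified names, not the exact repo code):&lt;/p&gt;

```python
def process_guess(guess, word, list_word, life):
    """Reveal every occurrence of the guessed letter, or deduct one life."""
    if guess in word:
        for i, letter in enumerate(word):
            if letter == guess:
                list_word[i] = guess   # replace all matching dashes
    else:
        life -= 1                      # wrong guess costs one life
    return life

word = 'away'
list_word = ['_'] * len(word)          # one dash per letter
life = 6

life = process_guess('a', word, list_word, life)
print(list_word)   # ['a', '_', 'a', '_']

life = process_guess('z', word, list_word, life)
print(life)        # 5

# A round ends when '_' not in list_word (win) or life == 0 (loss)
```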

&lt;p&gt;&lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;1. Problem-Solving Skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building the Hangman game was a great way to enhance my problem-solving skills. Debugging errors and optimizing gameplay required critical analysis and logical thinking. Handling edge cases like duplicate guesses, invalid inputs, and word categories taught me to design robust conditions. Each challenge refined my structured problem-solving approach: identifying issues, breaking them down, and systematically testing solutions. This iterative process improved my code and strengthened my ability to handle complex problems in future projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Time Management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Balancing learning Python while working on this project tested my time management skills. I scheduled time for learning concepts, implementing features, and refining the game. Initially, juggling syntax, debugging, and creative elements like ASCII art felt overwhelming. However, a step-by-step plan and task prioritization kept me organized. I focused on essential features, like basic game logic, before adding enhancements like visuals and error-handling. This disciplined approach improved my efficiency and became a valuable skill in my learning journey.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Enjoy The Game&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrut81wme548imbyz67d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrut81wme548imbyz67d.png" alt=" " width="800" height="472"&gt;&lt;/a&gt;&lt;br&gt;
    Feel free to explore the project on GitHub (repo link: &lt;a href="https://github.com/anikchand461/Hangman-Game" rel="noopener noreferrer"&gt;https://github.com/anikchand461/Hangman-Game&lt;/a&gt;), try the game, or share your feedback and ideas. I’d love to hear your thoughts and connect with others who are passionate about coding and learning. Let’s build something amazing together!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>technical blogs</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Mon, 30 Dec 2024 17:57:10 +0000</pubDate>
      <link>https://forem.com/anikchand461/technical-blogs-nna</link>
      <guid>https://forem.com/anikchand461/technical-blogs-nna</guid>
      <description></description>
      <category>emptystring</category>
    </item>
    <item>
      <title>blog</title>
      <dc:creator>Anik Chand</dc:creator>
      <pubDate>Mon, 30 Dec 2024 17:50:43 +0000</pubDate>
      <link>https://forem.com/anikchand461/blog-1i50</link>
      <guid>https://forem.com/anikchand461/blog-1i50</guid>
      <description></description>
      <category>emptystring</category>
    </item>
  </channel>
</rss>
