<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: yuval mehta</title>
    <description>The latest articles on Forem by yuval mehta (@yuval728).</description>
    <link>https://forem.com/yuval728</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1520449%2F2e1dc912-4ea1-4527-a261-d27dd995c7df.jpg</url>
      <title>Forem: yuval mehta</title>
      <link>https://forem.com/yuval728</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yuval728"/>
    <language>en</language>
    <item>
      <title>I Built a Free Web App with Tools Students Actually Need</title>
      <dc:creator>yuval mehta</dc:creator>
      <pubDate>Thu, 29 May 2025 13:01:49 +0000</pubDate>
      <link>https://forem.com/yuval728/i-built-a-free-web-app-with-tools-students-actually-need-4nf2</link>
      <guid>https://forem.com/yuval728/i-built-a-free-web-app-with-tools-students-actually-need-4nf2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As a developer and student, I constantly found myself searching for simple online tools like paraphrasers, calculators, or essay expanders. Most of what I found was either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overcomplicated
&lt;/li&gt;
&lt;li&gt;Hidden behind paywalls
&lt;/li&gt;
&lt;li&gt;Locked behind sign-up just to access basic features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made me realize there’s room for a better solution—something clean, fast, and genuinely useful. That’s what inspired &lt;strong&gt;Tools for Students&lt;/strong&gt;: a no-login suite of web tools designed to help students be more productive.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://toolsforstudents.netlify.app" rel="noopener noreferrer"&gt;https://toolsforstudents.netlify.app&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is “Tools for Students”?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tools for Students&lt;/strong&gt; is a growing collection of browser-based utilities tailored to academic needs. These tools are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant-use: No login or registration required
&lt;/li&gt;
&lt;li&gt;Lightweight and mobile-friendly
&lt;/li&gt;
&lt;li&gt;Designed with performance and usability in mind&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tools Available
&lt;/h2&gt;

&lt;p&gt;Here are some of the tools currently live:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Paraphrasing Tool&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Helps reword text for essays and assignments while preserving meaning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Essay Expander&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Adds depth and elaboration to brief essay content, useful for drafts and brainstorming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPA Calculator&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Calculates weighted or unweighted GPA quickly and accurately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Basic Calculator&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Simple math operations, always one click away.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
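&lt;p&gt;&lt;em&gt;As an aside, the weighted-GPA arithmetic behind a calculator like this is simple: multiply each grade's points by its credit hours, then divide by total credits. Here is a minimal Python sketch of just the formula (the site itself is client-side JavaScript, and the course data below is made up for illustration):&lt;/em&gt;&lt;/p&gt;

```python
# Illustrative sketch of a weighted GPA: sum(grade_points * credits) / sum(credits).
# The course list is hypothetical example input, not data from the app.
def weighted_gpa(courses):
    """courses: list of (grade_points, credit_hours) tuples."""
    total_points = sum(gp * cr for gp, cr in courses)
    total_credits = sum(cr for _, cr in courses)
    return total_points / total_credits

courses = [(4.0, 3), (3.0, 4), (3.7, 2)]
print(round(weighted_gpa(courses), 2))  # 3.49
```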

&lt;p&gt;&lt;em&gt;Upcoming tools include a text summarizer, citation builder, and grammar checker.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Project Matters
&lt;/h2&gt;

&lt;p&gt;Most online tools either overcomplicate the experience or gatekeep basic functionality behind accounts or paywalls. This project takes a different approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything runs in the browser
&lt;/li&gt;
&lt;li&gt;No accounts or personal data needed
&lt;/li&gt;
&lt;li&gt;Designed specifically for student workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s built with simplicity and speed at the core.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Only&lt;/strong&gt;: HTML, Tailwind CSS, and a lightweight JS framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting&lt;/strong&gt;: Deployed on Netlify as a static site
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Optimized for fast loading and minimal page weight
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Backend&lt;/strong&gt;: Tools operate entirely in-browser, which also makes maintenance easier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture keeps hosting costs low while ensuring high availability.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Expanding the number of tools
&lt;/li&gt;
&lt;li&gt;Adding keyboard shortcuts and UI refinements
&lt;/li&gt;
&lt;li&gt;Integrating more educational utilities based on feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a student or educator, give it a try and let me know what else you'd like to see.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://toolsforstudents.netlify.app" rel="noopener noreferrer"&gt;Visit Tools for Students&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Evolve Your Machine Learning: Automate the Process of Model Selection through TPOT.</title>
      <dc:creator>yuval mehta</dc:creator>
      <pubDate>Sat, 06 Jul 2024 18:45:10 +0000</pubDate>
      <link>https://forem.com/yuval728/evolve-your-machine-learning-automate-the-process-of-model-selection-through-tpot-4445</link>
      <guid>https://forem.com/yuval728/evolve-your-machine-learning-automate-the-process-of-model-selection-through-tpot-4445</guid>
<description>&lt;p&gt;One day, while googling ways to optimize my machine learning projects, I came across the TPOT library. TPOT, which stands for Tree-based Pipeline Optimization Tool, is built on genetic algorithms and automates model selection and hyperparameter tuning. This blog covers what TPOT is, its key features, and a step-by-step guide on how to use it to automate your machine learning process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk8ixmi6d5bc23jp3ffq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk8ixmi6d5bc23jp3ffq.png" alt="Tpot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is TPOT?&lt;/strong&gt;&lt;br&gt;
TPOT is a Python library that uses genetic programming to optimize machine learning pipelines. It tackles two problems that are otherwise time-consuming, model selection and hyperparameter tuning, so that data scientists can spend their effort on the harder parts of a task. TPOT searches over many candidate models, and their hyperparameters are dynamically optimized as new best pipelines are discovered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features of TPOT&lt;/strong&gt;&lt;br&gt;
Automation: TPOT automates both model selection and the tuning of the chosen models' hyperparameters.&lt;br&gt;
Genetic Programming: Uses genetic algorithms to evolve machine learning pipelines.&lt;br&gt;
Scikit-Learn Compatibility: TPOT is implemented in Python on top of scikit-learn, so it integrates well into most existing workflows.&lt;br&gt;
Customizability: Users can also define their own operators and pipeline settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F299mlr7qlru1pg5xehua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F299mlr7qlru1pg5xehua.png" alt="example Machine Learning pipeline"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How TPOT Works&lt;/strong&gt;&lt;br&gt;
TPOT applies genetic programming to evolve machine learning pipelines. It starts with a population of random pipelines and improves them through selection, crossover, and mutation. The fitness function is the key element of the process: it scores each pipeline's performance and determines which pipelines survive into the next generation.&lt;/p&gt;
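&lt;p&gt;&lt;em&gt;To make the evolutionary loop concrete, here is a toy genetic algorithm in plain Python. It evolves a single number toward a made-up fitness peak, so it only illustrates the selection/crossover/mutation cycle described above; it is not TPOT's actual implementation:&lt;/em&gt;&lt;/p&gt;

```python
import random

random.seed(0)

# Toy fitness: pretend model accuracy peaks when the hyperparameter is 0.3.
def fitness(x):
    return 1.0 - abs(x - 0.3)

def evolve(generations=20, pop_size=10):
    # Start from a random population of candidate values in [0, 1].
    population = [random.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as parents (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2             # crossover: blend two parents
            child += random.gauss(0, 0.05)  # mutation: small random nudge
            children.append(min(max(child, 0.0), 1.0))
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best)
```

TPOT does the same thing at a much larger scale, where each individual is a whole pipeline and fitness is a cross-validated score.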

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89m9yg6dbwbps31ho69u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89m9yg6dbwbps31ho69u.png" alt="workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting Started with TPOT&lt;/strong&gt;&lt;br&gt;
Now let us walk through the steps of setting up TPOT to automate most of the ML process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Installing TPOT&lt;br&gt;
You can install TPOT using pip:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install tpot


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: Importing Necessary Libraries&lt;br&gt;
Once installed, you can import TPOT and other necessary libraries.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: Loading and Preparing Data&lt;br&gt;
I am using the MAGIC gamma telescope data set, which I found on Kaggle.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

telescope = pd.read_csv('/kaggle/input/magic-gamma-telescope-dataset/telescope_data.csv')
telescope.drop(telescope.columns[0], axis=1, inplace=True)
telescope.head()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Shuffle the rows and map the class labels g/h to 0/1
telescope_shuffle = telescope.iloc[np.random.permutation(len(telescope))]
telescope = telescope_shuffle.reset_index(drop=True)
telescope['class'] = telescope['class'].map({'g': 0, 'h': 1})


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsehqunmvjiewcb657bb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsehqunmvjiewcb657bb8.png" alt="Data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4&lt;/strong&gt;: Configuring and Running TPOT&lt;br&gt;
Configure the TPOT classifier and fit it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

tele_class = telescope['class'].values
tele_features = telescope.drop('class', axis=1).values
training_data, testing_data, training_classes, testing_classes = train_test_split(
    tele_features, tele_class, test_size=0.25, random_state=42, stratify=tele_class)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

tpot = TPOTClassifier(generations=5, verbosity=2)
tpot.fit(training_data, training_classes)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx77kqulql3fhsl5kachb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx77kqulql3fhsl5kachb.png" alt="Output of tpot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5&lt;/strong&gt;: Evaluating the Best Pipeline&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

tpot.score(testing_data, testing_classes)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryjvpg9xf0pvbdvid18h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryjvpg9xf0pvbdvid18h.png" alt="score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6&lt;/strong&gt;: Understanding the Output&lt;br&gt;
The export function saves the best pipeline as a Python script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import os

os.makedirs('Output', exist_ok=True)
tpot.export('Output/tpot_pipeline.py')


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output file (tpot_pipeline.py):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from tpot.builtins import ZeroCount

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=None)

# Average CV score on the training set was: 0.8779530318962496
exported_pipeline = make_pipeline(
    ZeroCount(),
    RobustScaler(),
    MLPClassifier(alpha=0.001, learning_rate_init=0.01)
)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
TPOT is a powerful tool that applies genetic algorithms to automatically find a well-performing machine learning pipeline. By bringing TPOT into your development environment, you can cut down the time devoted to model selection and hyperparameter fine-tuning in favor of the more intricate parts of your tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/EpistasisLab/tpot" rel="noopener noreferrer"&gt;TPOT Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Genetic_programming" rel="noopener noreferrer"&gt;Genetic Programming&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Unleashing GPU Power: Supercharge Your Data Processing with cuDF</title>
      <dc:creator>yuval mehta</dc:creator>
      <pubDate>Fri, 21 Jun 2024 11:12:49 +0000</pubDate>
      <link>https://forem.com/yuval728/unleashing-gpu-power-supercharge-your-data-processing-with-cudf-2232</link>
      <guid>https://forem.com/yuval728/unleashing-gpu-power-supercharge-your-data-processing-with-cudf-2232</guid>
<description>&lt;p&gt;This time, while scrolling through a blog post about the latest AI advancements, I found out about cuDF, which is part of RAPIDS, a family of software libraries and APIs for accelerating data operations and machine learning on GPUs. cuDF runs DataFrame operations in parallel on NVIDIA GPUs, which can make a real difference for large data workloads. This blog gives an overview of what cuDF is, its major features, and how to perform data manipulation with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj5fftqdnihtzbn54857.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj5fftqdnihtzbn54857.png" alt="Rapids cuDF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is cuDF?
&lt;/h2&gt;

&lt;p&gt;cuDF is a GPU DataFrame library with a pandas-like API for handling data on the GPU. It enables data scientists and engineers to work with large amounts of data entirely in GPU memory, which makes it ideal for pre-processing steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features of cuDF
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;High Performance: By running on the GPU, cuDF performs data operations faster than CPU-based libraries.&lt;/li&gt;
&lt;li&gt;Pandas Compatibility: cuDF mirrors the pandas interface, so pandas users can move to the GPU without learning a new API.&lt;/li&gt;
&lt;li&gt;Seamless Integration: cuDF interoperates with other libraries in the RAPIDS ecosystem, such as cuML for machine learning and cuGraph for graph analytics.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Getting Started with cuDF
&lt;/h2&gt;

&lt;p&gt;Now, without further ado, let’s go over the basic setup and how to use cuDF for data manipulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Installing cuDF&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First of all, note that cuDF requires a supported NVIDIA GPU and a suitable version of the CUDA toolkit. You can find the right install command at &lt;a href="https://docs.rapids.ai/install" rel="noopener noreferrer"&gt;RAPIDS AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is the command the RAPIDS AI installation guide generated for my system:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia  \
    rapids=24.06 python=3.11 cuda-version=12.2


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Importing cuDF&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import cudf
import numpy as np
import pandas as pd


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Creating a cuDF DataFrame&lt;/strong&gt;&lt;br&gt;
You can create a cuDF DataFrame from various data sources, including pandas DataFrames, CSV files, and more.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Create a cuDF DataFrame from a pandas DataFrame
pdf = pd.DataFrame({
    'a': np.random.randint(0, 100, size=10),
    'b': np.random.random(size=10)
})
gdf = cudf.DataFrame.from_pandas(pdf)
print(gdf)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Data Manipulation with cuDF&lt;/strong&gt;&lt;br&gt;
cuDF provides a rich set of functions for data manipulation, similar to pandas.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Adding a new column
gdf['c'] = gdf['a'] + gdf['b']

# Filtering data
filtered_gdf = gdf[gdf['a'] &amp;gt; 50]

# Grouping and aggregation
grouped_gdf = gdf.groupby('a').mean()
print(grouped_gdf)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Reading and Writing Data&lt;/strong&gt;&lt;br&gt;
cuDF supports reading from and writing to various file formats, such as CSV, Parquet, and ORC.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Reading from a CSV file
gdf = cudf.read_csv('data.csv')

# Writing to a Parquet file
gdf.to_parquet('output.parquet')


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Performance Comparison with Pandas&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import time

# Create a large pandas DataFrame
pdf = pd.DataFrame({
    'a': np.random.randint(0, 100, size=100000000),
    'b': np.random.random(size=100000000)
})

# Create a cuDF DataFrame from the pandas DataFrame
gdf = cudf.DataFrame.from_pandas(pdf)

# Timing the pandas operation
start = time.time()
pdf['c'] = pdf['a'] + pdf['b']
end = time.time()
print(f"Pandas operation took {end - start} seconds")

# Timing the cuDF operation
start = time.time()
gdf['c'] = gdf['a'] + gdf['b']
end = time.time()
print(f"cuDF operation took {end - start} seconds")


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp2p2r94eou7gwfzg2ji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp2p2r94eou7gwfzg2ji.png" alt="Comparison output"&gt;&lt;/a&gt;&lt;br&gt;
From the image we can see that cuDF is about 40 times faster than pandas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7: Using cuDF as a no-code-change accelerator for pandas&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

%load_ext cudf.pandas 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Pandas operations now use the GPU!
import pandas as pd
import numpy as np
import time

# Create a large pandas DataFrame
pdf = pd.DataFrame({
    'a': np.random.randint(0, 100, size=100000000),
    'b': np.random.random(size=100000000)
})

# Timing the pandas operation with cudf.pandas
start = time.time()
pdf['c'] = pdf['a'] + pdf['b']
end = time.time()
print(f"Pandas operation with cuDF loaded took {end - start} seconds")


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ydbyj1ixdm6ifrv3kov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ydbyj1ixdm6ifrv3kov.png" alt="Result"&gt;&lt;/a&gt;&lt;br&gt;
We can see from the image that it delivers almost the same performance as using the cuDF API directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt; &lt;br&gt;
cuDF speeds up data processing pipelines by exploiting the parallel computing power of GPUs. Its biggest strength is that its usage corresponds directly to pandas, so users can switch over and start enjoying the performance improvements quickly. Incorporating cuDF into your data science workflow lets you work with larger datasets and perform complex operations faster than on conventional CPU-based setups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rapidsai/cudf" rel="noopener noreferrer"&gt;cuDF Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rapids.ai/" rel="noopener noreferrer"&gt;RAPIDS AI
&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions about cuDF or if you have used it in your project in the past then please feel free to drop the questions and/or experiences in the comments section below. Happy computing!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>From Noise to Art: Building Your First Generative Adversarial Network</title>
      <dc:creator>yuval mehta</dc:creator>
      <pubDate>Mon, 17 Jun 2024 13:00:06 +0000</pubDate>
      <link>https://forem.com/yuval728/from-noise-to-art-building-your-first-generative-adversarial-network-472o</link>
      <guid>https://forem.com/yuval728/from-noise-to-art-building-your-first-generative-adversarial-network-472o</guid>
<description>&lt;p&gt;I was recently introduced to a splendid machine learning idea known as Generative Adversarial Networks (GANs), used especially for image generation. The GAN framework was introduced by Ian Goodfellow in 2014; its underlying architecture pits two neural networks against each other. In this blog I will first explain what a GAN is, and then walk through TensorFlow code for building and training a simple one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0yu4jhpe12gnmby1hst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0yu4jhpe12gnmby1hst.png" alt="Architecture" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are GANs?&lt;/strong&gt;&lt;br&gt;
At its core, a GAN consists of two neural networks: a generator that produces fake data, and a discriminator that learns to distinguish the fake data from the real thing. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generator: Takes random noise as input and transforms it into output data that resembles the patterns of the training data set. &lt;/li&gt;
&lt;li&gt;Discriminator: Takes an input sample and tries to guess whether it was drawn from the training data or synthesized by the generator.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two networks are trained simultaneously in a zero-sum game: the generator tries to fool the discriminator into believing its output is real, while the discriminator tries to tell real data from fake. &lt;/p&gt;
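&lt;p&gt;&lt;em&gt;In loss terms, this game means the discriminator minimizes binary cross-entropy with labels "real = 1, fake = 0", while the generator minimizes the same loss with its fakes labeled as real. Here is a small NumPy sketch of that idea, working on probabilities rather than the raw logits the TensorFlow code below uses:&lt;/em&gt;&lt;/p&gt;

```python
import numpy as np

def bce(labels, probs, eps=1e-7):
    """Binary cross-entropy, averaged over the batch."""
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def discriminator_loss(real_probs, fake_probs):
    # Real samples should score 1, generated samples should score 0.
    return bce(np.ones_like(real_probs), real_probs) + bce(np.zeros_like(fake_probs), fake_probs)

def generator_loss(fake_probs):
    # The generator wants its fakes scored as real (label 1).
    return bce(np.ones_like(fake_probs), fake_probs)

# A discriminator that is fooled (scores fakes near 1) gives the generator low loss.
print(generator_loss(np.array([0.9, 0.8])))  # low
print(generator_loss(np.array([0.1, 0.2])))  # high
```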

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0v4kkhn0e8r2l53dszq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0v4kkhn0e8r2l53dszq.png" alt="Example" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Guide to Building a Simple GAN&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: Setting Up the Environment&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install tensorflow&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Step 2: Import Necessary Libraries&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: Define the Generator&lt;/p&gt;

&lt;p&gt;The generator network will next take a randomly chosen noise vector and map it into a data point that looks like the actual training data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def build_generator():
    model = tf.keras.Sequential()
    model.add(layers.Dense(8*8*128, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((8, 8, 128)))
    assert model.output_shape == (None, 8, 8, 128)  # Note: None is the batch size

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 8, 8, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    assert model.output_shape == (None, 16, 16, 128)

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    assert model.output_shape == (None, 32, 32, 128)

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    assert model.output_shape == (None, 64, 64, 128)

    model.add(layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    print(model.output_shape)

    return model

generator = build_generator()
generator.summary()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4: Define the Discriminator&lt;/p&gt;

&lt;p&gt;The discriminator network will take an input sample and classify it as real or fake.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def build_discriminator():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                                     input_shape=[128, 128, 3]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(256, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Flatten())
    model.add(layers.Dense(1))
    return model

discriminator = build_discriminator()
discriminator.summary()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 5: Test the models&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;noise = tf.random.normal([1,100])
generated_image = generator(noise,training=False)
print(discriminator(generated_image))
plt.imshow((generated_image[0]*127.5+127.5).numpy().astype(np.uint8))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 6: Set up the loss function and optimizer&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cross_entropy=BinaryCrossentropy(from_logits=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def discriminator_loss(real_output,fake_output):
  real_loss = cross_entropy(tf.ones_like(real_output),real_output)
  fake_loss = cross_entropy(tf.zeros_like(fake_output),fake_output)
  total_loss = real_loss + fake_loss
  return total_loss

def generator_loss(fake_output):
  return cross_entropy(tf.ones_like(fake_output),fake_output)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
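&lt;p&gt;Because the discriminator ends in a plain &lt;code&gt;Dense(1)&lt;/code&gt; layer with no sigmoid, it outputs raw logits, which is why &lt;code&gt;from_logits=True&lt;/code&gt; is set above. As a quick sanity check (a sketch with made-up logits, not part of the training code), a confident and correct discriminator should give a near-zero loss, while a confidently wrong one should give a large loss:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Confident and correct: high logit for the real image, low logit for the fake
print(discriminator_loss(tf.constant([[10.0]]), tf.constant([[-10.0]])).numpy())  # near 0
# Confident and wrong: the predictions are flipped
print(discriminator_loss(tf.constant([[-10.0]]), tf.constant([[10.0]])).numpy())  # large
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;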





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 7: Set up checkpointing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;checkpoint_dir = 'training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir,'ckpt')
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
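&lt;p&gt;To resume training later, the latest checkpoint can be restored. A small sketch (&lt;code&gt;tf.train.latest_checkpoint&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt; when nothing has been saved yet):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;latest = tf.train.latest_checkpoint(checkpoint_dir)
if latest:
    checkpoint.restore(latest)
    print(f'Restored from {latest}')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;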



&lt;p&gt;Step 8: Define the training step&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@tf.function
def train_step(images):

    noise=tf.random.normal([batch_size,noise_dims])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as dis_tape:
        generated_images=generator(noise,training=True)

        real_output=discriminator(images,training=True)
        fake_output=discriminator(generated_images,training=True)

        gen_loss=generator_loss(fake_output)
        disc_loss=discriminator_loss(real_output,fake_output)

    gen_gradients=gen_tape.gradient(gen_loss,generator.trainable_variables)
    dis_gradients=dis_tape.gradient(disc_loss,discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gen_gradients,generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(dis_gradients,discriminator.trainable_variables))

    return gen_loss,disc_loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 9: Set up the training loop and save generated images&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from IPython import display
import time

total_gloss=[]
total_dloss=[]
def train(dataset,epochs):
    for epoch in range(epochs):
        disc_loss=gen_loss=0
        start=time.time()
        count=0
        for batch in dataset:
            losses=train_step(batch)
            count+=1
            disc_loss+=losses[1]
            gen_loss+=losses[0]
        total_gloss.append(gen_loss.numpy())
        total_dloss.append(disc_loss.numpy())

        if (epoch+1)%50==0:
            checkpoint.save(file_prefix=checkpoint_prefix)
            display.clear_output(wait=True)
            generate_and_save_output(generator,epoch+1,seed)

        print(f'Time for epoch {epoch + 1} is {time.time()-start}')
        print(f'Gloss: {gen_loss.numpy()/count} , Dloss: {disc_loss.numpy()/count}',end='\n\n')
    display.clear_output(wait=True)
    generate_and_save_output(generator,epochs,seed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_and_save_output(model,epoch,test_input):
    predictions = model(test_input,training=False)
    fig = plt.figure(figsize=(4,4))
    for i in range(predictions.shape[0]):
        plt.subplot(4,4,i+1)
        plt.imshow((predictions[i]*127.5+127.5).numpy().astype(np.uint8))
        plt.axis('off')
    plt.savefig(f'image_at_epoch_{epoch}.png')
    plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 10: Train the GAN&lt;/p&gt;

&lt;p&gt;Let's train our GAN. I used a dog image dataset, which is available on Kaggle: &lt;a href="https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset"&gt;Stanford Dogs Dataset&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EPOCHS = 500
noise_dims = 100
num_egs_to_generate = 16
seed = tf.random.normal([num_egs_to_generate,noise_dims])

train(train_images,EPOCHS)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: To generate good-quality images, the model requires a large number of epochs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwctawbgiawgclxx1ud3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwctawbgiawgclxx1ud3d.png" alt="Image at epoch 500" width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Trying our model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_image = generator(tf.random.normal([1,100]),training=False)
plt.imshow((new_image[0]*127.5+127.5).numpy().astype(np.uint8))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbith61tukcjnrcf2g44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbith61tukcjnrcf2g44.png" alt="Generated image" width="464" height="455"&gt;&lt;/a&gt;&lt;/p&gt;
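&lt;p&gt;The training loop above records per-epoch losses in &lt;code&gt;total_gloss&lt;/code&gt; and &lt;code&gt;total_dloss&lt;/code&gt; but never plots them. A small sketch to visualize them, which helps spot one network overpowering the other:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.figure()
plt.plot(total_gloss, label='Generator loss')
plt.plot(total_dloss, label='Discriminator loss')
plt.xlabel('Epoch')
plt.ylabel('Summed loss per epoch')
plt.legend()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;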

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
GANs are a powerful way to produce realistic synthetic data: a generator and a discriminator are trained against each other until the generator's samples become hard to distinguish from real ones. Building a sensible GAN comes down to keeping the two networks in balance so that neither overpowers the other. This guide aims only to introduce GANs and offer a first taste of what is possible in this active research area. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://arxiv.org/abs/1406.2661"&gt;Ian Goodfellow's Original Paper&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.tensorflow.org/tutorials/generative/dcgan"&gt;TensorFlow Documentation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/yuval728/Dog-image-generation"&gt;My Github Repo&lt;/a&gt;&lt;br&gt;
Feel free to ask questions or share your GAN projects in the comments below!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
