A compilation of Different Machine Learning Algorithms/Models for beginners in Data Science Competitions (Kaggle)

anirudhkannan — Tue, 04 Aug 2020 12:26:48 +0000

Hola,

Before reading, I want the reader to know that I am not an expert in data science. I am an SDE by profession. I have started spending quite a lot of my time on Kaggle and learning about data science in general.

Here I have compiled a list of ML algorithms frequently used by various Kaggle Grandmasters, so that I can look it up quickly and keep adding to it during future competitions. (This post is just meant to be my cache.)

If you consider yourself an expert, please skip this post.

1) Linear Models
1. Especially good for sparse, high-dimensional data.
2. Usually split a given space into two sub-spaces with a line/hyperplane.
3. Linear models are usually regularized in competitions; features are typically scaled during pre-processing so that the penalty treats them evenly.

e.g.:

Logistic Regression
Support Vector Machines

Best Implementations:

scikit-learn
Vowpal Wabbit
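
To make the "line/hyperplane" and regularization points above concrete, here is a minimal from-scratch sketch of logistic regression with an L2 penalty, written in plain Python. The toy data, the function names (train_logistic, predict) and the hyperparameter values are my own assumptions for illustration; in a real competition you would use a library implementation such as scikit-learn's LogisticRegression instead.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, l2=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on the log loss plus an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # l2 * w[j] is the regularization term: it shrinks weights toward zero.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # The decision boundary is the hyperplane w . x + b = 0.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical toy data: class 0 near the origin, class 1 further out.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
preds = [predict(w, b, x) for x in X]
print(preds)
```

Everything on one side of the learned line gets label 1 and everything on the other side gets label 0, which is what "splitting the space into two sub-spaces" means.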

2) Tree-Based Methods (use decision trees to create models)

Here we keep dividing the space into sub-spaces until each sub-space can be assigned a (roughly constant) class probability.
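
As a rough illustration of one such division, the sketch below finds the single best axis-aligned split for a tiny data set and reports the class probability on each side. It assumes binary 0/1 labels and uses Gini impurity as the split criterion; the function name best_split and the data are made up for this example. A full decision tree would repeat the same search recursively inside each side.

```python
def best_split(X, y):
    """Return (feature, threshold, p_left, p_right) with the lowest weighted Gini impurity."""

    def gini(labels):
        # Gini impurity for binary 0/1 labels.
        p = sum(labels) / len(labels)
        return 2.0 * p * (1.0 - p)

    best = None
    n = len(X)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                # Store the fraction of class 1 on each side of the split.
                best = (score, j, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Hypothetical toy data: the classes are separated along feature 0 only.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [6.0, 2.0], [7.0, 5.0], [8.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

feature, threshold, p_left, p_right = best_split(X, y)
print(feature, threshold, p_left, p_right)
```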

e.g.:

Random Forest
Gradient Boosted Decision Trees (each new tree improves the prediction by correcting the errors of the sum of the previous trees)
ExtraTrees Classifier
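
The boosting idea can be sketched in a few lines. Note the assumptions: this is a simplified regression version with squared error and one-split trees (stumps) on a single feature, not the classification setup that XGBoost or LightGBM would use in a competition, and the names fit_stump and boost are hypothetical. Each round fits a stump to the residuals left by the sum of all previous stumps.

```python
def fit_stump(x, residuals):
    """Best one-threshold regressor on a 1-D feature: (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.3):
    base = sum(y) / len(y)          # start from the mean prediction
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are what the current sum of trees still gets wrong.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        # Add a damped copy of the new stump to the running prediction.
        preds = [pi + learning_rate * (lm if xi <= t else rm) for xi, pi in zip(x, preds)]
    return base, stumps, preds


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]

base, stumps, preds = boost(x, y)
initial_error = mse(y, [base] * len(y))
final_error = mse(y, preds)
print(initial_error, final_error)
```

The learning_rate keeps each tree's contribution small, so the model improves gradually instead of letting one tree dominate.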

Disadvantages:

Hard to capture linear dependencies, if they exist, because it takes many axis-aligned splits to approximate a single slanted line.

Best Implementations:

scikit-learn
XGBoost
LightGBM

3) K-NN(K nearest neighbours) methods

Based on the intuition/assumption that nearest neighbours have similar labels.

The best implementations are in scikit-learn.
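
A minimal sketch of that intuition in plain Python: find the k closest training points (Euclidean distance here, which is an assumption; other metrics are possible) and take a majority vote over their labels. The function name knn_predict and the data are made up for illustration; in practice scikit-learn's KNeighborsClassifier does this far more efficiently.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by Euclidean distance to the query point.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]]
train_y = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_X, train_y, [1.2, 1.4])
near_b = knn_predict(train_X, train_y, [6.4, 6.2])
print(near_a, near_b)
```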

4) Neural Networks

According to a Kaggle Grandmaster, the most used ones are feed-forward neural networks, which produce smooth non-linear decision boundaries.
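
To show what "non-linear" means here, below is a tiny feed-forward network with one hidden layer of two sigmoid units that computes XOR, a pattern no single line can separate. The weights are hand-picked by me for illustration and not learned; a real network would learn them by training in one of the libraries listed below.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x1, x2):
    # Hidden layer: two sigmoid units with hand-picked (not learned) weights.
    h1 = sigmoid(20.0 * (x1 + x2) - 10.0)  # fires when at least one input is on
    h2 = sigmoid(20.0 * (x1 + x2) - 30.0)  # fires only when both inputs are on
    # Output layer: "h1 and not h2", which is XOR of the two inputs.
    return sigmoid(20.0 * h1 - 20.0 * h2 - 10.0)


outputs = {(a, b): round(forward(a, b)) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

Because every unit is a smooth sigmoid, the output changes gradually between the corners, which is where the smooth decision boundary comes from.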

Best Implementations:

TensorFlow
Keras
MXNet
PyTorch
Lasagne

Making Inferences from Decision Surfaces

If the boundaries are made of lines parallel to the axes and look fairly smooth, then it's probably a Random Forest.

Important: Choose a model for a particular competition based on the use case, as no model is better than the others in all situations.

A Beginner's guide to Data Science: How I started with Data Science, Kaggle and Machine Learning...Solutions, Tips and much more

anirudhkannan — Sun, 05 Jul 2020 14:36:13 +0000

This is the first post of a series I am writing to help people get started with Data Science. I am not a highly experienced person in Data Science yet, but I keep learning every day and I am gonna keep publishing content to help people get started, as getting started seems like the toughest part.

Sometimes I may automatically publish my work from Kaggle or another website directly through a web page parser I wrote with Python, Selenium and Beautiful Soup, so pardon me for any markdown errors. I hope to write a parser one day that will learn from its mistakes and create a perfect Medium or dev.to post. But that still has a long way to go haha :)

The upcoming series is meant to help newbies with Data Science and get them through the Kaggle Courses. I will discuss the solutions to courses, contests and much more here or in upcoming posts.

Pardon my mistakes, if any, since I am new to publishing written content. I hope to create high-quality work, as I always believe in the highest standards of work.

Please follow me if you like my content, or feel free to reach out to me with any comments/feedback. I always value comments from anyone, irrespective of their experience, as I believe all humans are equal. Let's all grow as humans and create a better world :)

Forem: anirudhkannan
