Forem: Jubaeir Islam

Why Should CSE Graduates Look for Product Management Jobs?

Jubaeir Islam — Mon, 30 Oct 2023 05:04:42 +0000

As a computer science engineer graduating in 2023, you likely have several career paths to choose from. While software engineering may seem like an obvious option, I encourage you to also consider opportunities in product management.

Computer Science Engineering (CSE) graduates have a strong foundation in the technical aspects of software development, making them well-suited for product management roles. In product management, you will be responsible for the entire lifecycle of a product, from ideation to launch to post-launch support. This requires a deep understanding of the technical aspects of the product, as well as the ability to think strategically and creatively.

Here are a few reasons why product management can be a great fit for CSE graduates:

You Understand Technology

As a CSE grad, you have strong technical skills and understand how software and products work under the hood. This gives you an advantage when collaborating with engineers and assessing the technical feasibility of product ideas.

You're Analytical

Product management requires strong analytical skills, from crunching numbers on business models to analyzing user data. Your training in subjects like algorithms and data structures equips you to approach product decisions methodically.

You're Innovative

Product managers are responsible for coming up with creative solutions to problems. The out-of-the-box thinking you gained in school by solving coding challenges applies here.

You See the Big Picture

In computer science programs, you work on complex systems and learn how individual components fit together. This systematic perspective allows you to view products holistically.

You Can Pick Up New Domains

While PMs need to understand their product domain, strong general learning abilities enable you to rapidly gain knowledge in new areas.

Conclusion

Product managers are in high demand. The tech industry is booming, and companies need skilled product managers; CSE graduates already have many of the skills these roles call for, which puts them in a strong position to land these jobs. Product management is also well paid: product managers are among the highest-paid professionals in the tech industry, and according to Glassdoor, the median salary for a product manager in the United States is $122,000. If you're intrigued by blending business, technology, and design to create products users love, consider a career in PM. With your technical foundation and adaptive mindset, you have much to contribute in this role.

The Power of Early Research: Why It's Important for CSE Students to Start Research After Their First Year

Jubaeir Islam — Tue, 04 Jul 2023 08:48:21 +0000

Computer Science and Engineering (CSE) is a dynamic field that thrives on innovation, and research plays a pivotal role in driving advancements in technology. As a CSE student, embarking on research after your first year might seem intimidating or even premature, but the benefits are numerous and far-reaching. In this blog article, we will explore why it is crucial for CSE students to embrace research early in their academic journey and how it can shape their future success in the tech industry.

Gaining a Deeper Understanding of Core Concepts

When you dive into research during your early academic years, you'll be exposed to more complex and nuanced concepts that go beyond the standard curriculum. Engaging in research projects allows you to apply the theoretical knowledge you have acquired to real-world problems, fostering a deeper understanding of the subject matter. This comprehension can prove invaluable as you progress in your studies and tackle advanced CSE topics.

Developing Critical Problem-Solving Skills

Research inherently involves solving problems and overcoming challenges. By participating in research projects, CSE students enhance their analytical and critical thinking abilities. They learn to approach complex issues with creativity and logical reasoning, cultivating problem-solving skills that are essential not only in academia but also in the industry.

Building Hands-On Technical Expertise

Research often requires working with cutting-edge technologies and tools. As a result, early engagement in research exposes CSE students to hands-on experience in using these tools, programming languages, and methodologies. Gaining technical expertise in research is advantageous, as it equips students with practical skills sought after by employers in the tech industry.

Networking and Collaborations

Research projects often involve collaboration with professors, researchers, and peers. Building connections with academics and industry professionals at an early stage can open doors to various opportunities, such as internships, scholarships, and conference presentations. These networks can prove invaluable in shaping a student's future career trajectory.

Starting research early in their academic journey can be a transformative experience for CSE students. It provides a platform to enhance their understanding of core concepts, develop critical problem-solving skills, and gain hands-on technical expertise. Additionally, early research engagement allows students to explore diverse domains, nurture creativity and innovation, and build valuable networks. As a result, CSE students who embrace research after their first year set themselves on a path of continuous growth, increased opportunities, and a rewarding career in the ever-evolving world of computer science and engineering.

How Microsoft Azure is Revolutionizing Machine Learning

Jubaeir Islam — Sat, 06 May 2023 05:24:29 +0000

Introduction

Machine learning is becoming increasingly important in the tech industry, as companies look to make data-driven decisions and gain a competitive edge. Microsoft Azure, a cloud computing service from Microsoft, is at the forefront of the machine learning revolution, offering a range of machine learning capabilities that are transforming the way businesses operate.

What is Machine Learning?

Before delving into Microsoft Azure's machine learning capabilities, it's important to understand what machine learning is and why it's important. Machine learning is a type of artificial intelligence that allows computer systems to learn and improve from experience, without being explicitly programmed. There are different types of machine learning, including supervised learning, unsupervised learning, and reinforcement learning, each with its own applications.

Microsoft Azure's Machine Learning Capabilities

Microsoft Azure offers a range of machine learning capabilities, including pre-built and custom machine learning models, as well as automated machine learning. With Azure's pre-built models, companies can easily incorporate machine learning into their operations, without the need for extensive programming knowledge. For those who need more customized machine learning models, Azure also offers tools to build custom models tailored to specific business needs. Azure's automated machine learning capabilities allow companies to rapidly create and deploy machine learning models, without the need for extensive coding.

Benefits of Microsoft Azure for Machine Learning

One of the key benefits of using Microsoft Azure for machine learning is its scalability. Azure can easily handle large amounts of data, making it ideal for businesses of all sizes. Azure is also cost-effective, with pay-as-you-go pricing options that allow companies to scale their machine learning operations as needed. Additionally, Azure's machine learning capabilities are designed to be user-friendly, with drag-and-drop interfaces and easy-to-understand workflows.

Real-world Applications of Microsoft Azure Machine Learning

Microsoft Azure's machine learning capabilities have been used by companies in a range of industries, from transportation to healthcare. For example, UPS uses Azure machine learning to optimize its package delivery routes, while Uber uses Azure for real-time fraud detection. Azure machine learning can also be used for predictive maintenance, allowing companies to anticipate maintenance needs before they become a problem.

Conclusion

Machine learning is revolutionizing the way businesses operate, and Microsoft Azure is at the forefront of this revolution. With its range of machine learning capabilities, scalability, and cost-effectiveness, Azure is an ideal platform for companies looking to incorporate machine learning into their operations. By leveraging Azure's machine learning capabilities, companies can make more informed decisions, improve efficiencies, and gain a competitive edge in their industries.

Let's use pandas effectively in our code

Jubaeir Islam — Thu, 26 Jan 2023 18:08:28 +0000

In the field of data science, powerful and efficient tools are essential for analyzing and interpreting large datasets. One tool widely used by data scientists is Pandas, a Python library for data analysis and manipulation. It provides fast and flexible data structures, such as DataFrames and Series, that make it easy to work with large datasets. In this blog post, we will explore some of the most popular Pandas methods and how they can be used effectively in data science.

One of the most popular methods in Pandas is the read_csv() function, which is used to read and import data from a CSV file. This function can be used to import data into a Pandas DataFrame and is a quick and easy way to load data for analysis.

import pandas as pd
data = pd.read_csv('data.csv')

Another popular method in Pandas is head(), which returns the first few rows (five by default) of a DataFrame. Use it to quickly inspect the structure and contents of a dataset.

data.head()

Output:

   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
3    10    11    12
4    13    14    15
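data.csv itself is not included with this post, so here is a minimal, self-contained sketch that reads a small, hypothetical CSV from an in-memory string instead. io.StringIO stands in for the file path, and the column names and values are made up to mirror the output above.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'
csv_text = """col1,col2,col3
1,2,3
4,5,6
7,8,9
10,11,12
13,14,15
16,17,18
"""

# read_csv accepts any file-like object, not only a path on disk
data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; head(n) returns the first n
preview = data.head()
print(preview)
print(data.head(2))
```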

You can also use describe(). This method returns a basic statistical summary of the numerical columns in a DataFrame.

data.describe()

Output:

            col1       col2       col3
count  12.000000  12.000000  12.000000
mean   17.500000  18.500000  19.500000
std    10.816654  10.816654  10.816654
min     1.000000   2.000000   3.000000
25%     9.250000  10.250000  11.250000
50%    17.500000  18.500000  19.500000
75%    25.750000  26.750000  27.750000
max    34.000000  35.000000  36.000000
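To see where these statistics come from, the sketch below builds a small, hypothetical DataFrame that follows the same pattern as the head() output (col1 = 1, 4, 7 and so on up to 34, with col2 and col3 offset by 1 and 2) and calls describe() on it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 12 rows where col1 = 1, 4, 7, ..., 34
# and col2/col3 are col1 shifted by 1 and 2
start = np.arange(1, 35, 3)
data = pd.DataFrame({"col1": start, "col2": start + 1, "col3": start + 2})

# describe() summarizes the numeric columns: count, mean, std (sample
# standard deviation), min, quartiles and max
summary = data.describe()
print(summary)

# Individual statistics can be read back by row label and column name
print(summary.loc["mean", "col1"], summary.loc["std", "col1"])
```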

Pandas also provides a variety of methods for data cleaning and preparation, such as the dropna() and fillna() methods. The dropna() method is used to remove rows or columns with missing data, while the fillna() method is used to fill in missing values with a specific value or method.

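data.dropna()         # returns a copy without the rows that contain missing values
data.fillna(value=0)  # returns a copy with missing values replaced by 0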
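Both methods return a new DataFrame rather than modifying data in place. A minimal sketch on a small, hypothetical DataFrame with a few missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two missing values (NaN)
data = pd.DataFrame({
    "age": [21, np.nan, 25, 30],
    "score": [88.0, 92.0, np.nan, 75.0],
})

# dropna() removes every row that contains at least one missing value
no_missing_rows = data.dropna()

# dropna(axis=1) removes columns with missing values instead
no_missing_cols = data.dropna(axis=1)

# fillna() replaces missing values: here with 0, or with each column's mean
filled_zero = data.fillna(value=0)
filled_mean = data.fillna(data.mean())

# Both methods returned new DataFrames; `data` itself still has its NaNs
print(no_missing_rows)
print(filled_mean)
```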

Pandas also provides powerful methods for data manipulation and transformation, such as groupby() and pivot_table(). The groupby() method groups rows that share a value in a specific column so they can be aggregated, while the pivot_table() method reshapes data into a pivot table.

data.groupby('column_name').mean(numeric_only=True)
data.pivot_table(values='column_name', index='grouping_column', aggfunc='mean')

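Let's break it down. The first line groups the rows by column_name and then computes the mean of the other numeric columns within each group; the mean comes from the explicit .mean() call, not from groupby() itself. The second line uses pivot_table(), which creates a new table by grouping rows based on one column (index) and calculating an aggregate value for another column (values) with the function given in aggfunc ('mean' is also its default). Spelling out values, index, and aggfunc in a single call can make the intent of the code clearer.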
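A small, hypothetical sales table makes the relationship concrete: the groupby() and pivot_table() calls below compute the same per-region averages, and the last call shows what pivot_table() adds, namely spreading a second grouping column across the columns of the result.

```python
import pandas as pd

# Hypothetical sales data
data = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "product": ["A", "A", "B", "B", "A"],
    "sales": [100, 80, 120, 60, 140],
})

# groupby(): split rows by region, then average the 'sales' column of each group
by_groupby = data.groupby("region")["sales"].mean()

# pivot_table(): the same aggregation, with values, index and aggfunc spelled out
by_pivot = data.pivot_table(values="sales", index="region", aggfunc="mean")

# pivot_table() can also spread a second column ('product') across the columns
by_region_and_product = data.pivot_table(
    values="sales", index="region", columns="product", aggfunc="sum"
)

print(by_groupby)
print(by_pivot)
print(by_region_and_product)
```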

Amazon Web Services for Data Science and Machine Learning

Jubaeir Islam — Sat, 14 Jan 2023 09:52:13 +0000

AWS, a well-known cloud computing platform, provides a variety of services for data science and machine learning. These services can store and process large volumes of data, and they can be used to build, train, and deploy machine learning models.

Amazon SageMaker

Amazon SageMaker is one of the key services for data science and machine learning on AWS. It is a fully managed service for building, training, and deploying machine learning models. It has numerous tools for preprocessing data, training models, and deploying them, as well as built-in algorithms for common tasks like text and image classification.

Useful services provided by AWS

Amazon Elastic Kubernetes Service (EKS) is another significant service for data science and machine learning on AWS. EKS is a fully managed service that makes it simple to run Kubernetes clusters on AWS, and it can be used to deploy machine learning models in a scalable, highly available way.
AWS also offers a wide range of storage and data processing services, such as Amazon S3 for storing large amounts of data and Amazon Redshift for analyzing and querying data. These services can be used to store and process data for use in machine learning models.
Amazon Comprehend and Amazon Transcribe are just two of the other AWS services that can be used in data science and machine learning. Amazon Comprehend is used for natural language processing, and Amazon Transcribe converts speech to text. Popular machine learning frameworks and libraries such as TensorFlow, PyTorch, and scikit-learn can also be used on AWS.

Conclusion

In conclusion, AWS offers a wide range of services that can be used to build, train, and deploy machine learning models, as well as to store and process large amounts of data. These services can be used together to create a powerful and flexible environment for data science and machine learning.

How to stay productive as a Computer Science Student

Jubaeir Islam — Thu, 29 Dec 2022 18:34:10 +0000

Feeling overwhelmed and unproductive in university? Honestly, I felt the same. I am studying CSE and have just finished my first semester. Even though I had prior experience with programming and building projects, it was quite overwhelming to study something that was once my hobby. But isn’t that amazing? I used to doubt my performance, thinking I wasn’t productive enough. As a computer science student, it is important to stay productive in order to succeed in your studies and, ultimately, in your career. Here I will share some tips I followed to study effectively while developing the skills I need for the fast-paced tech world.

Studying in a Structured Manner

You have a ton of work to complete as a student: academic research, test and exam prep, group projects, or extracurricular activities, and almost every one of these tasks comes with a deadline. Set clear objectives for each study session. When you are clear on what you want to achieve, it is easier to stay motivated and focused.

“You can’t reach for anything new if your hands are still full of yesterday’s junk.” - Louise Smith

The 80/20 Rule

To stay motivated and productive, take regular breaks throughout the day as well as longer breaks over the weekend or holidays. This gives your mind a chance to rest from studying and lets you recharge for the next week’s workload. I used to follow the 80/20 rule: study for 80 minutes without any distractions, then take a 20-minute break. Doing so helped me stay productive throughout my studies. Finally, don’t forget about self-care! Make sure you are taking care of yourself physically and mentally while studying, whether that means eating healthy meals or doing some form of physical exercise.

“Don't underestimate the power of resting. It builds you back unlike anything.” - Hiral Nagda

Stay Inspired

When learning computer science, it is easy to lose motivation, especially when faced with difficult material or a challenging assignment. One method to keep motivated is to remind yourself of your long-term goals and why you are studying computer science.

“Success is the sum of small efforts, repeated day in and day out.” - Robert Collier

Do your own Research

While developing a skill, it’s normal to get stuck on a problem or an error. Always google first: chances are someone has already faced the same problem and shared a solution. Do your own research on Google, Stack Overflow, GitHub, etc. If you can’t find a solution, then seek help from your teacher, your classmates, and developer communities.

“It is better to fail in originality than to succeed in imitation” - Zora Neale Hurston

Skill Development

As a computer science student, you must learn a variety of skills in order to excel in your studies and, ultimately, in your career. Apart from academics, invest some of your time in developing skills; this is your time to explore the possibilities. I have been programming for over 4 years and have explored most areas of computer science (e.g. web development, app development, data science, cyber security). The more you explore and gather knowledge, the sooner you realize which career path you should follow. But whatever you learn, learn it effectively. It’s normal for engineers to switch jobs.

“The beautiful thing about learning is that nobody can take it away from you.” - B.B. King

Time Management

Make good use of your time. Time management is essential to stay productive as a computer science student. Make sure you are not overburdened with commitments and responsibilities, and strive to prioritize your tasks appropriately.

“The shorter way to do many things is to only do one thing at a time.” - Mozart

You can stay productive and flourish as a computer science student if you follow these recommendations. It's crucial to remember that everyone has a different learning style, and what works for one person might not work for another. Be prepared to try new things, and don't hesitate to seek assistance when you need it. With dedication and hard work, you can achieve your goals and reach your full potential as a computer science student. Happy learning!

Why is NumPy so important for Data Science?

Jubaeir Islam — Wed, 14 Dec 2022 18:16:19 +0000

What is NumPy?

NumPy is a Python library that is used for scientific computing. It is designed to work with large arrays of data, and provides functions for working with these arrays efficiently. It is an essential tool for working with data in Python, and is widely used in many different fields such as scientific research, machine learning, and data analysis.

Use Cases of NumPy

Data scientists use NumPy because it provides a number of convenient features for working with large arrays of data. For example, NumPy allows data scientists to perform mathematical operations on entire arrays of data, which is much more efficient than performing the same operations on individual data points. NumPy also provides tools for working with arrays of data that have missing or incomplete values, which is a common issue in many data sets. Additionally, NumPy is fast and efficient, which is important when working with large amounts of data. Overall, NumPy provides data scientists with a powerful and convenient set of tools for working with data in Python.
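Two of the features mentioned above, whole-array arithmetic and handling of missing values, can be sketched in a few lines. The temperature readings are hypothetical, np.nan marks the missing one, and np.nanmean is one of NumPy's NaN-aware reductions.

```python
import numpy as np

# Hypothetical temperature readings in Celsius; one reading is missing (NaN)
celsius = np.array([21.5, 23.0, np.nan, 19.5, 25.0])

# Vectorized arithmetic: one expression converts the whole array, no Python loop
fahrenheit = celsius * 9 / 5 + 32

# Regular reductions propagate NaN ...
plain_mean = np.mean(celsius)

# ... while the NaN-aware variants skip missing values
clean_mean = np.nanmean(celsius)
missing = np.isnan(celsius).sum()

print(fahrenheit)
print(plain_mean, clean_mean, missing)
```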

Python Lists vs NumPy Arrays

NumPy arrays and Python lists are similar in that both store collections of data. However, there are some key differences between the two. NumPy arrays are typically more efficient than Python lists for storing and manipulating large amounts of data, because a NumPy array stores elements of a single data type in a contiguous block of memory, which makes the data fast to access and process. A Python list, in contrast, stores references to separate Python objects, which may be of different types; this makes lists more flexible but less efficient for large amounts of numeric data. Additionally, NumPy provides many convenient functions for working with arrays, which makes it easier to perform mathematical operations on the data. Overall, NumPy arrays are more efficient and convenient for working with large amounts of data, and they are the preferred choice for many data science applications.
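One quick way to see the difference in behavior is to apply the same operators to a list and to an array. This minimal sketch uses only made-up numbers:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# For lists, + concatenates and * repeats the sequence
list_sum = numbers_list + numbers_list      # [1, 2, 3, 1, 2, 3]
list_times_two = numbers_list * 2           # [1, 2, 3, 1, 2, 3]

# For NumPy arrays, the same operators work element by element
array_sum = numbers_array + numbers_array   # array([2, 4, 6])
array_times_two = numbers_array * 2         # array([2, 4, 6])

# A list may mix types; an array converts every element to one dtype
mixed_list = [1, "two", 3.0]
mixed_array = np.array([1, 2.5, 3])         # every element becomes float64

print(list_sum, array_sum.tolist(), mixed_array.dtype)
```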

Numpy in a nutshell

To use NumPy, you first need to import it into your Python environment using the import statement. For example, you can import NumPy using the following code:

import numpy as np

Once you have imported NumPy, you can create arrays using the np.array() function. This function takes a list of numbers as input, and returns a NumPy array containing those numbers. For example, the following code creates a NumPy array containing the numbers 1, 2, and 3:

my_array = np.array([1, 2, 3])

NumPy arrays support a wide range of mathematical operations, such as addition, subtraction, multiplication, and division. These operations can be performed on entire arrays at once, rather than on individual elements. For example, the following code adds two NumPy arrays together:

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

array3 = array1 + array2

In this example, array3 will be equal to [5, 7, 9]. NumPy also provides a number of convenient functions for working with arrays, such as finding the minimum and maximum values, calculating the mean and standard deviation, and more.

Overall, NumPy is a powerful and convenient library for working with large arrays of data in Python. It is an essential tool for many data science applications, and is widely used in fields such as scientific research, machine learning, and data analysis.

Most commonly used built-in NumPy functions

Some of the most commonly used built-in functions in NumPy include:

np.array(): This function is used to create NumPy arrays from a list of numbers.
np.zeros(): This function creates a new NumPy array filled with zeros.
np.ones(): This function creates a new NumPy array filled with ones.
np.full(): This function creates a new NumPy array filled with a specified value.
np.eye(): This function creates a new square NumPy array with the diagonal elements set to one and the rest set to zero.
np.linspace(): This function creates a new NumPy array with a specified number of evenly spaced elements between a start and end value.
np.random.random(): This function creates a new NumPy array of random values in the half-open interval [0.0, 1.0).
np.mean(): This function calculates the mean of the elements in a NumPy array.
np.min(): This function calculates the minimum value of the elements in a NumPy array.
np.max(): This function calculates the maximum value of the elements in a NumPy array.

These are just a few examples of the many built-in functions that NumPy provides. NumPy also offers many other functions for working with arrays, such as functions for sorting, reshaping, and concatenating arrays.
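Most of the functions listed above can be tried out in a few lines. The sketch below uses made-up values and seeds NumPy's legacy global random generator only so that np.random.random() returns the same numbers on every run.

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array of 0.0
ones = np.ones(4)                 # [1., 1., 1., 1.]
sevens = np.full((2, 2), 7)       # 2x2 array filled with 7
identity = np.eye(3)              # 3x3 identity matrix
steps = np.linspace(0, 1, 5)      # [0., 0.25, 0.5, 0.75, 1.]

# Seed the legacy global generator so the "random" values are reproducible
np.random.seed(0)
noise = np.random.random((2, 2))  # values in the half-open interval [0, 1)

data = np.array([4, 8, 15, 16, 23, 42])
print(np.mean(data), np.min(data), np.max(data))  # 18.0 4 42

# Sorting, reshaping and concatenating
print(np.sort(np.array([3, 1, 2])))               # [1 2 3]
print(data.reshape(2, 3))                         # 2 rows x 3 columns
print(np.concatenate([ones, steps]).shape)        # (9,)
```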

Summary

Thanks for reading

Machine Learning for Beginners

Jubaeir Islam — Tue, 15 Nov 2022 13:09:50 +0000

Intro to Machine Learning (for complete beginners)

This post is for those who are interested in obtaining a general overview of machine learning. Let's get started.

What is Machine Learning?

Machine learning is the practice of making a computer learn by analyzing data and statistics. It enables software programs to predict outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values, and in general, the more good-quality data they are fed, the more accurately they predict.

Let's make it easier. Suppose you are teaching a child what a fish is. You show them pictures of different fish and point out their characteristics. Whenever they later see something new with similar characteristics, they immediately recognize it as a fish. We use the same technique to teach a computer what a fish is. We provide it with images of different fish; these are called Training Data. The computer learns the characteristics in these pictures and associates them with the label "fish". Now if we give it a new image and ask what it is, it makes a prediction based on what it learned from the Training Data. Data the model has not seen during training, like this new image, is called Test Data. In short, a computer first learns from training data, and then, given test data, it predicts whether each new image is a fish or not.

Application of Machine Learning

You probably already use machine learning without realizing how much it affects your life. Here are a few applications you should know about:

Social media features: Social media platforms incorporate machine learning algorithms to give you a customized experience. Facebook keeps track of all of your activity, including comments, likes, and the amount of time you spend on various types of content. Based on that activity, the system suggests pages and friends that are personalized for you.

Virtual assistants: Popular virtual personal assistants include Apple's Siri, Amazon's Alexa, and Google Assistant. These voice-activated assistants can do a variety of things, such as looking up flight information, checking your schedule, setting alarms, and more. Machine learning is at the core of these smart speakers and devices: each time you engage with them, they gather information that helps them improve, and they use it to deliver results that are most closely aligned with your preferences.

Product recommendations are a prominent use of machine learning on e-commerce platforms. To suggest items you might be interested in, these websites track your behavior, such as your searches, past purchases, and shopping cart history.

Image recognition is becoming increasingly common across a wide range of industries. You've probably encountered it when posting a photo on social media, where the software can identify the people you tag. It can also be incredibly useful for tracking down missing people, unlocking phones and other mobile devices, and spotting potential thieves or dangers.

Real-life implementation

Let's build a machine learning model with real data. First, how do we do it? There are seven core steps:

Import Data: First we import the dataset on which we are going to build our project.
Clean Data: We remove null values, unnecessary columns, and duplicates for more precise predictions.
Split the data into Training/Test Sets
Algorithm: Select a machine learning algorithm to analyze the data (e.g. Decision Trees).
Train the model
Make predictions
Evaluate and Improve
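Here is a minimal end-to-end sketch of these seven steps using Pandas and Scikit-Learn. It does not use the Musify dataset introduced below: the rows are hypothetical, with age, gender, and genre columns (gender 1 = male, 0 = female), and include one missing value and one duplicate so the cleaning step has something to do. random_state is fixed only to make the split repeatable.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import data (hypothetical rows; gender: 1 = male, 0 = female)
csv_text = """age,gender,genre
20,1,HipHop
23,1,HipHop
25,1,Jazz
26,1,Jazz
27,1,Jazz
30,1,Classical
31,1,Classical
33,1,Classical
20,0,Dance
22,0,Dance
23,0,Dance
25,0,Acoustic
26,0,Acoustic
27,0,Acoustic
30,0,Classical
31,0,Classical
31,0,Classical
35,0,
"""
df = pd.read_csv(io.StringIO(csv_text))

# 2. Clean data: drop rows with missing values and exact duplicate rows
df = df.dropna().drop_duplicates()

# 3. Split into inputs/outputs, then into training and test sets
X = df.drop(columns=["genre"])
y = df["genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Choose an algorithm
model = DecisionTreeClassifier(random_state=42)

# 5. Train the model
model.fit(X_train, y_train)

# 6. Make predictions on rows the model has not seen
predictions = model.predict(X_test)

# 7. Evaluate (then improve: more data, other features or settings)
score = accuracy_score(y_test, predictions)
print(f"accuracy: {score:.2f}")
```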

Tools

The following tools will be needed for our first project:

Jupyter Notebook: Best for data analysis
NumPy: Provides multi-dimensional arrays
Pandas: Python library built on NumPy to analyze data effectively
Matplotlib: Python library for making graphs and plots
Scikit-Learn: Popular machine learning library that provides the most common machine learning algorithms

Music Recommendation based on customers' preferences

Here we have a dataset from an exceptionally popular music streaming platform named Musify. We are going to predict which genre of music albums these users are likely to buy, based on their profiles (the data collected).
You can download it here: Link

Importing our data

First we import all the libraries. For convenience, you can run this project in a Kaggle notebook.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

#Only applicable for kaggle. 
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Working with the data

Now that our environment is set up, we are going to import and analyze our data with Pandas. First, let's see what's inside our dataset. We can easily load it with the Pandas read_csv() function (e.g. pd.read_csv('FILE_NAME.csv')).

df = pd.read_csv('../input/musicpredicts/music.csv') # importing our dataset
df #viewing our data set

Here we loaded our data into the df variable. Writing df as the last line of a cell and running the cell displays the dataset.

Clean data

We are done with importing our data. Now we clean our data. Cleaning means we remove empty cells, irrelevant data, duplicates, null values etc. Let's get into it!
Note: We don't have any null values here, so we don't remove anything.
Now we split the data into an input dataset X and an output dataset y. We want genre as our output, not as an input, so we drop the genre column with the .drop() method.

X = df.drop(columns=['genre'])
X # it's the input

The output contains every column except genre. Note that drop() returns a new DataFrame, so genre is not removed from the original df.

If we place the cursor on drop, the notebook shows its documentation, which explains how to use this method.

We create another variable, y, which holds only the genre column; it will be our output.

y = df['genre']
y # it will be the output
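That drop() leaves df untouched is easy to check on a tiny, hypothetical stand-in for music.csv (the rows are made up; gender 1 = male, 0 = female):

```python
import pandas as pd

# Hypothetical stand-in for music.csv (gender: 1 = male, 0 = female)
df = pd.DataFrame({
    "age": [20, 23, 25, 26],
    "gender": [1, 1, 0, 0],
    "genre": ["HipHop", "HipHop", "Dance", "Acoustic"],
})

X = df.drop(columns=["genre"])  # a new DataFrame without 'genre'
y = df["genre"]                 # a Series holding only 'genre'

# drop() returned a copy, so df still has all three columns
print(list(df.columns))  # ['age', 'gender', 'genre']
print(list(X.columns))   # ['age', 'gender']
print(X.shape, y.shape)  # (4, 2) (4,)
```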

Making predictions

Now that our output dataset is stored in y, let's train our model. As mentioned earlier, we will use Scikit-Learn to implement our machine learning algorithm; specifically, we are going to use Scikit-Learn's DecisionTreeClassifier.

# Import scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
model = DecisionTreeClassifier() # create the classifier object
model.fit(X, y) # teach the model the patterns in our dataset
# Predict what a 21-year-old male (gender 1) and a 22-year-old female (gender 0) like.
# Neither of these users appears in our dataset.
predictions = model.predict([[21, 1], [22, 0]])
predictions

Let's break it down. First we create the model object, then we teach it the patterns in our dataset with .fit(INPUT, OUTPUT). After providing dummy data that is missing from the dataset, we get this output:

array(['HipHop', 'Dance'], dtype=object)

So 21 years old males like HipHop and 22 years old females like Dance. Let's see what is actually decision tree. Decision Tree is the most powerful and popular tool for classification and prediction. A Decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label. In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not contain any further branches. The decisions or the test are performed on the basis of features of the given dataset.It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further branches and constructs a tree-like structure.

Pros and Cons of using a decision tree

Pros

Decision trees are able to generate understandable rules
Decision trees perform classification without requiring much computation.
Decision trees are able to handle both continuous and categorical variables.
Decision trees provide a clear indication of which fields are most important for prediction or classification.

Cons

Decision trees are less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
Decision trees are prone to errors in classification problems with many classes and a relatively small number of training examples.
Decision trees can be computationally expensive to train. At each node, each candidate splitting field must be sorted before its best split can be found. In some algorithms, combinations of fields are used and a search must be made for optimal combining weights. Pruning algorithms can also be expensive, since many candidate sub-trees must be formed and compared.

Now let's measure the accuracy

We split our dataset into two sets: training and testing. As a rule of thumb, we allocate 80% of our data for training and 20% for testing. We have already imported a function called train_test_split, which returns the training and testing sets together, so we unpack them into four variables like this:

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

The first two variables are the input training and testing sets, and the last two are the output training and testing sets. We allocate 20% of our data as test data with the test_size=0.2 parameter of train_test_split.

model.fit(X_train,y_train)
predictions = model.predict(X_test)

Let's calculate the accuracy

from sklearn.metrics import accuracy_score
#Now let's measure the accuracy
score = accuracy_score(y_test,predictions)
score

1.0

We have an accuracy of 100 percent! But train_test_split shuffles the data randomly before splitting, so every time we run this cell we may get a different score. With a dataset this small, the test set holds only a handful of rows, so it's tough for a computer to predict very accurately and for a single score to be reliable. In general, the more data, the more accurate the model. As simple as that.
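Two standard ways to make this evaluation less noisy are sketched below on hypothetical stand-in data: fixing random_state so train_test_split returns the same split every time, and using cross_val_score, which trains and scores the model on several different splits. cv=2 is used only because the smallest class in this toy data has two rows.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv (gender: 1 = male, 0 = female)
df = pd.DataFrame({
    "age": [20, 23, 25, 26, 27, 30, 31, 33, 20, 22, 23, 25, 26, 27, 30, 31],
    "gender": [1] * 8 + [0] * 8,
    "genre": ["HipHop", "HipHop", "Jazz", "Jazz", "Jazz", "Classical",
              "Classical", "Classical", "Dance", "Dance", "Dance",
              "Acoustic", "Acoustic", "Acoustic", "Classical", "Classical"],
})
X = df.drop(columns=["genre"])
y = df["genre"]

# The same random_state always produces the same split, so scores are repeatable
split_a = train_test_split(X, y, test_size=0.2, random_state=7)
split_b = train_test_split(X, y, test_size=0.2, random_state=7)

# Cross-validation: train and score on several different splits, then average
model = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=2)
print(scores, scores.mean())
```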

Making your model more efficient

Training the model again every time we want a prediction is wasteful. Instead, we train it once, save the trained model to a file, and load that file whenever we need to make a prediction. Let's code.

import joblib
model.fit(X,y)
joblib.dump(model,'music-recommender.joblib')

['music-recommender.joblib']

Now we use that file every time we predict something. In real projects we don't retrain the model again and again; we load the already trained model and use it to predict new data.

import joblib
# Import scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
'''
Comment out the training code and load the trained model instead
'''
# model = DecisionTreeClassifier() # create the classifier object
# model.fit(X, y) # teach the model the patterns in our dataset
# Load the trained model
model = joblib.load('music-recommender.joblib')
# Let's see if it works: predict what a 21-year-old male (gender 1) likes
predictions = model.predict([[21, 1]])
predictions

array(['HipHop'], dtype=object)
It works! Now let's visually understand how the decision tree actually makes predictions.
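Before visualizing, the same save-and-load round trip can be checked in isolation. This sketch assumes hypothetical training rows and writes the file to a temporary directory instead of the notebook's working directory:

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv (gender: 1 = male, 0 = female)
X = pd.DataFrame({"age": [20, 23, 25, 20, 22, 25],
                  "gender": [1, 1, 1, 0, 0, 0]})
y = pd.Series(["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"])

# Train once and save the fitted model to disk
model = DecisionTreeClassifier(random_state=0).fit(X, y)
path = os.path.join(tempfile.mkdtemp(), "music-recommender.joblib")
joblib.dump(model, path)

# Later (for example in another script), load it and predict without retraining
loaded = joblib.load(path)
new_users = pd.DataFrame({"age": [21, 22], "gender": [1, 0]})
print(loaded.predict(new_users))
```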

Data Visualization

Let's visualize the trained model to see how the decision tree makes its decisions.

# Import scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

model = DecisionTreeClassifier() # create the classifier object
model.fit(X, y) # teach the model the patterns in our dataset
# Export the trained tree in Graphviz DOT format
tree.export_graphviz(
    model,
    out_file='music_recommender.dot',
    feature_names=['age', 'gender'],
    class_names=sorted(y.unique()),
    label='all',
    rounded=True,
    filled=True,
)

The following file has been generated:

music_recommender.dot
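Now let's open the file in VS Code. Install the "Graphviz (dot) language support for Visual Studio Code" extension (v0.0.6 at the time of writing) to preview the file as a graph of the decision tree. That's how decision trees work: each prediction follows a path of tests from the root node down to a leaf.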
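If installing a VS Code extension is not an option, Scikit-Learn's plot_tree can draw the same kind of tree with Matplotlib. A minimal sketch on hypothetical stand-in data, saving the figure to a PNG file:

```python
import os

import matplotlib

matplotlib.use("Agg")  # draw to a file; no screen needed
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical stand-in for music.csv (gender: 1 = male, 0 = female)
X = pd.DataFrame({"age": [20, 23, 25, 20, 22, 25],
                  "gender": [1, 1, 1, 0, 0, 0]})
y = pd.Series(["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"])

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# filled=True colors each node by its majority class; rounded=True rounds the boxes
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(model, feature_names=["age", "gender"],
          class_names=list(model.classes_), filled=True, rounded=True, ax=ax)

out_path = "music_recommender.png"
fig.savefig(out_path)
plt.close(fig)
print("saved tree to", os.path.abspath(out_path))
```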

Good luck, and enjoy learning how to make machines smart! Good vibes only.