<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Neha Gupta</title>
    <description>The latest articles on Forem by Neha Gupta (@ngneha09).</description>
    <link>https://forem.com/ngneha09</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1418384%2Fca1ec307-ef6a-4338-bc61-f6dfb2e07c60.png</url>
      <title>Forem: Neha Gupta</title>
      <link>https://forem.com/ngneha09</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ngneha09"/>
    <language>en</language>
    <item>
      <title>Why Blockchain was developed?</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Fri, 16 May 2025 16:43:23 +0000</pubDate>
      <link>https://forem.com/ngneha09/why-blockchain-was-developed--3iom</link>
      <guid>https://forem.com/ngneha09/why-blockchain-was-developed--3iom</guid>
      <description>&lt;p&gt;Hey everyone 👋&lt;/p&gt;

&lt;p&gt;Lately I have been diving into Web3. I started studying Blockchain and wondered why it was developed in the first place 😕 so I read some articles and research papers and found the answer.&lt;/p&gt;

&lt;p&gt;In this blog I will tell you the &lt;strong&gt;History of Blockchain&lt;/strong&gt;. So let’s get started 😉&lt;/p&gt;

&lt;h2&gt;
  
  
  Story of Blockchain
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Early Blockchain History (1982–2004)
&lt;/h3&gt;

&lt;p&gt;Back in &lt;strong&gt;1982&lt;/strong&gt;, cryptographer &lt;strong&gt;David Chaum&lt;/strong&gt; first proposed a blockchain-like protocol in his dissertation “&lt;em&gt;Computer Systems Established, Maintained, and Trusted by Mutually Suspicious Groups&lt;/em&gt;”. As the name suggests, it described a transparent and secure system maintained by parties that do not need to trust each other.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;1991&lt;/strong&gt;, &lt;strong&gt;Stuart Haber&lt;/strong&gt; and &lt;strong&gt;W. Scott Stornetta&lt;/strong&gt; developed a cryptographically secured chain of blocks. They wanted to build a system in which document timestamps could not be tampered with, so that no alteration could go unnoticed and document creation would be secure and transparent. In their design, blocks store the timestamps of digital documents.&lt;/p&gt;

&lt;p&gt;The Blockchain idea continued in &lt;strong&gt;1992&lt;/strong&gt; and &lt;strong&gt;Haber&lt;/strong&gt;, &lt;strong&gt;Stornetta&lt;/strong&gt;, and &lt;strong&gt;Dave Bayer&lt;/strong&gt; incorporated &lt;em&gt;Merkle trees&lt;/em&gt; into the design, which improved its efficiency by allowing several document certificates to be collected into one block.&lt;br&gt;
The idea continued thereafter, and many scientists published papers and proposed further improvements.&lt;/p&gt;
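
&lt;p&gt;To make the idea concrete, here is a tiny Python sketch of such a chain of timestamped blocks. This is only a toy illustration of the concept, not Haber and Stornetta’s actual system:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy illustration: each block stores a document hash, a timestamp, and the
# hash of the previous block, so altering any old entry breaks every later hash.
import hashlib, json, time

def make_block(document, prev_hash):
    block = {
        "doc_hash": hashlib.sha256(document.encode()).hexdigest(),
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

chain = [make_block("genesis document", "0" * 64)]
chain.append(make_block("second document", chain[-1]["hash"]))
print(chain[-1]["prev_hash"] == chain[0]["hash"])  # True: the blocks are linked
&lt;/code&gt;&lt;/pre&gt;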

&lt;p&gt;In &lt;strong&gt;2004&lt;/strong&gt;, cryptographic activist &lt;strong&gt;Hal Finney&lt;/strong&gt; introduced a system for digital cash known as “&lt;em&gt;Reusable Proof of Work&lt;/em&gt;”. This step was a game-changer in the history of blockchain and cryptography. The system helped solve the &lt;strong&gt;Double Spending Problem&lt;/strong&gt; (imagine you have 1 coin and try to pay two of your friends with that same coin at the same time) by keeping the ownership of tokens registered on a trusted server.&lt;/p&gt;

&lt;p&gt;So this was the remarkable work done up to that point, but Blockchain was still nowhere near as popular as it is now. The actual breakthrough came after 2008.&lt;/p&gt;

&lt;h3&gt;
  
  
  America’s 2008 Financial Crisis
&lt;/h3&gt;

&lt;p&gt;Back in 2008 the USA faced a major banking crisis that shook the financial health of the country.&lt;/p&gt;

&lt;p&gt;In 2008 banks gave risky home loans (subprime mortgages) even to people who couldn’t afford them. These loans were bundled and sold as “safe investments”, but they were actually very risky. When many people couldn’t repay their loans, the housing market crashed. Big banks and financial institutions collapsed, causing a global financial crisis. Millions lost their jobs, homes, and savings.&lt;/p&gt;

&lt;p&gt;The 2008 crisis exposed how fragile and opaque the centralized financial system was. In response, and to bring transparency and trust back to people, a person or group under the name Satoshi Nakamoto released the Bitcoin white paper in late 2008, and the Bitcoin network went live in 2009. Bitcoin directly challenged the centralized banking system by aiming for trust and transparency. Nakamoto built on the timestamped chain of blocks and the Merkle tree design to create a more secure system that keeps a tamper-evident history of data exchange, using a peer-to-peer network for timestamping. The system proved so effective that cryptography became the backbone of blockchain.&lt;/p&gt;

&lt;p&gt;The year 2014 is often marked as a turning point for blockchain technology: the blockchain was separated from the currency, and “Blockchain 2.0” was born. Financial institutions and other industries started shifting their focus from digital currency to the development of blockchain technologies themselves.&lt;/p&gt;

&lt;p&gt;In 2015, Ethereum Frontier Network was launched, thus enabling developers to write smart contracts and dApps that could be deployed to a live network. In the same year, the Linux Foundation launched the Hyperledger project.&lt;/p&gt;

&lt;p&gt;And from then on Blockchain technology became more popular and different digital currencies started to show up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9ysukzig1odherd1342.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9ysukzig1odherd1342.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In summary the whole idea of Blockchain was proposed to enhance transparency and security and generate trust among people.&lt;/p&gt;

&lt;p&gt;I hope you liked this blog. 💚&lt;/p&gt;

</description>
      <category>web3</category>
      <category>blockchain</category>
      <category>bitcoin</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Convolutional Neural Network || Beginner’s Guide</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Wed, 23 Oct 2024 07:37:01 +0000</pubDate>
      <link>https://forem.com/ngneha09/convolutional-neural-network-beginners-guide-29m9</link>
      <guid>https://forem.com/ngneha09/convolutional-neural-network-beginners-guide-29m9</guid>
      <description>&lt;p&gt;Hey there 👋 Hope you are doing well 😃&lt;/p&gt;

&lt;p&gt;In the journey of &lt;strong&gt;Deep Learning&lt;/strong&gt;, we come across a variety of neural networks. One of the most basic and foundational types is the &lt;strong&gt;Artificial Neural Network (ANN)&lt;/strong&gt;. ANNs are great for solving simple problems, but when it comes to complex data like images, texts, and videos, ANNs might struggle to perform effectively. To handle such complex data, we’ve introduced more advanced architectures, one of which is the &lt;strong&gt;Convolutional Neural Network (CNN)&lt;/strong&gt;. 🎯&lt;/p&gt;

&lt;p&gt;CNNs are designed specifically to work with complex, high-dimensional data, especially in the field of image processing. In this blog, we’ll explore the introduction to CNNs, their history, how they work, and their applications. 🌟&lt;/p&gt;

&lt;p&gt;So, let’s dive right in! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe02pxhsvywos9e9ollzb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe02pxhsvywos9e9ollzb.png" alt="Image description" width="586" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Convolutional Neural Network?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Convolutional Neural Networks are a special kind of neural network used for processing data that has a known grid-like topology, such as time-series data or image data. These networks use convolutional layers to process the data and make predictions from it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopvik692st9jv9od2uyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopvik692st9jv9od2uyv.png" alt="Image description" width="723" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A CNN basically consists of an input layer, convolutional layers, pooling layers, a fully connected (ANN) part, and an output layer.&lt;br&gt;
Don’t worry if you don’t get these points right now. We will discuss them later 😃&lt;/p&gt;

&lt;p&gt;A basic CNN takes an image as input, applies convolution operations on it, and then forwards the result to an ANN to generate the output.&lt;/p&gt;
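
&lt;p&gt;As a quick taste of what this looks like in code, here is a minimal sketch in Keras (the layer sizes are illustrative choices, not something fixed):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal CNN sketch (illustrative layer sizes)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),               # input layer: 28x28 grayscale image
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
    tf.keras.layers.MaxPooling2D((2, 2)),                   # pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),           # fully connected (ANN) part
    tf.keras.layers.Dense(10, activation="softmax"),        # output layer (e.g. 10 digit classes)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
&lt;/code&gt;&lt;/pre&gt;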

&lt;h3&gt;
  
  
  Why do we use CNNs?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Note -: An image is a collection of pixels.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ANNs work very well on 1D data such as loan prediction data, house price prediction data, etc. But when it comes to 2D data such as images, we need to flatten it first and then feed it to the ANN. Suppose our 2D data has shape (256,256); on flattening, its shape becomes (65536,), and the trainable parameters in the first layer will be &lt;strong&gt;65536 * number of neurons in layer 1 + one bias term per neuron&lt;/strong&gt;. Training such a large number of parameters is computationally expensive. Hence ANNs do not scale well to 2D data.&lt;/p&gt;
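
&lt;p&gt;A quick back-of-the-envelope check of that count, assuming (purely for illustration) 128 neurons in the first layer:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pixels = 256 * 256                    # 65536 inputs after flattening
neurons = 128                         # illustrative choice for layer 1
params = pixels * neurons + neurons   # weights + one bias per neuron
print(params)                         # 8388736 trainable parameters in a single layer
&lt;/code&gt;&lt;/pre&gt;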

&lt;p&gt;Another problem that arises with ANNs is the loss of important information such as the spatial arrangement of pixels. When we flatten 2D data, the pixels that were arranged by location lose that arrangement.&lt;br&gt;
With so many parameters, an ANN is also prone to overfitting.&lt;/p&gt;

&lt;p&gt;For these reasons, and to process image data properly, CNNs were introduced.&lt;/p&gt;

&lt;h1&gt;
  
  
  History of CNN
&lt;/h1&gt;

&lt;p&gt;CNNs have evolved significantly over time, starting in the 1960s with &lt;strong&gt;Hubel &amp;amp; Wiesel's discovery of receptive fields&lt;/strong&gt;, which laid the foundation for feature detection. In 1980, &lt;strong&gt;Kunihiko Fukushima&lt;/strong&gt; introduced the &lt;strong&gt;Neocognitron&lt;/strong&gt;, a neural network that could recognize patterns in images. In the 1990s, &lt;strong&gt;Yann LeCun's LeNet-5&lt;/strong&gt; model was a breakthrough in handwritten digit recognition, marking the early success of CNNs in image processing.&lt;/p&gt;

&lt;p&gt;The deep learning revolution began in 2012 with &lt;strong&gt;AlexNet&lt;/strong&gt;, developed by &lt;strong&gt;Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton&lt;/strong&gt;, which significantly advanced image classification. By 2014, &lt;strong&gt;VGGNet&lt;/strong&gt; and &lt;strong&gt;GoogLeNet&lt;/strong&gt; further enhanced CNN architectures, improving efficiency and performance. In 2015, &lt;strong&gt;ResNet&lt;/strong&gt; introduced deeper networks with skip connections, addressing the vanishing gradient problem and becoming a standard in computer vision.&lt;/p&gt;

&lt;p&gt;Today, CNNs power various applications, from &lt;strong&gt;autonomous driving&lt;/strong&gt; to &lt;strong&gt;medical imaging&lt;/strong&gt;, with innovations like &lt;strong&gt;Capsule Networks&lt;/strong&gt;, &lt;strong&gt;EfficientNet&lt;/strong&gt;, and &lt;strong&gt;Transformers&lt;/strong&gt; continuing to reshape deep learning.&lt;/p&gt;

&lt;h1&gt;
  
  
  Working of CNN
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Intuition behind CNN
&lt;/h3&gt;

&lt;p&gt;As we have already seen, CNNs were initially used on handwritten digit data to recognize different digits. Now you might be wondering: different people have different styles of writing ✍ digits, so how can a model recognize a digit? This is why we need to understand the intuition behind CNNs before moving on to how they work.&lt;br&gt;
Suppose we test our model on a handwritten 9️⃣. When we feed this data to our model Ⓜ, it extracts basic features first and then, using these basic features, it extracts more complex ones. The features are extracted to find patterns in the data, and based on these patterns the model recognizes the corresponding digit.&lt;br&gt;
It is like the human brain: when we see an animal 🐤, we classify it based on its physical features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Working of CNN
&lt;/h3&gt;

&lt;p&gt;Now that we’ve understood the basic intuition behind CNNs, a question still arises: how does the model actually work? 🤔&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd7uhaub09a5q7righ5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd7uhaub09a5q7righ5d.png" alt="Image description" width="723" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s break down the basic structure of a CNN. We start with the &lt;strong&gt;input layer&lt;/strong&gt;, which takes in a 2D grid representing a single image. This image is then passed through several layers of the CNN.&lt;/p&gt;

&lt;p&gt;First, we have the &lt;strong&gt;convolutional and pooling layers&lt;/strong&gt;. The convolutional layer applies filters (or kernels) to the image to extract features. This is done through a mathematical operation called &lt;strong&gt;convolution&lt;/strong&gt;, which slides a small filter over the image and computes a weighted sum at each position, and is designed to detect patterns such as edges, textures, or shapes.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;pooling layer&lt;/strong&gt; follows, typically used to downsample the result of the convolutional layer (often called the &lt;strong&gt;feature map&lt;/strong&gt;). Pooling reduces the dimensionality of the data while retaining the most important information. For instance, if we want to detect horizontal edges in an image, the convolutional layer can apply a filter designed to extract horizontal features, and pooling then keeps the strongest responses.&lt;/p&gt;
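
&lt;p&gt;If you want to see the mechanics, here is a small NumPy sketch of a convolution with a horizontal-edge filter followed by 2x2 max pooling. This is a toy example for intuition, not how frameworks actually implement it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy sketch: convolve an image with a horizontal-edge kernel, then 2x2 max-pool
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # sliding multiply-and-sum
    return out

def max_pool(feature_map, size=2):
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    return feature_map[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.zeros((6, 6))
image[3:, :] = 1.0                                                 # dark top half, bright bottom half
kernel = np.array([[-1., -1., -1.], [0., 0., 0.], [1., 1., 1.]])   # horizontal-edge detector
feature_map = convolve2d(image, kernel)
print(max_pool(feature_map))                                       # strong responses along the edge
&lt;/code&gt;&lt;/pre&gt;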

&lt;p&gt;Finally, we have the &lt;strong&gt;fully connected (dense) layers&lt;/strong&gt;, which act similarly to a traditional Artificial Neural Network (ANN). This part of the network takes the high-level features extracted by the convolutional and pooling layers and makes predictions based on those.&lt;/p&gt;

&lt;p&gt;In simple terms, CNNs work by scanning an image with filters to detect essential features and patterns, then passing the information through dense layers to classify or make predictions. &lt;/p&gt;

&lt;p&gt;This explanation provides a high-level overview, as this blog is intended to be an introductory guide.&lt;/p&gt;

&lt;h1&gt;
  
  
  Applications of CNN
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Image Classification&lt;/strong&gt; 🖼️ – CNNs excel at categorizing images, improving accuracy in tasks like object and handwritten digit recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Object Detection&lt;/strong&gt; 🔍 – Used in self-driving cars to detect pedestrians, vehicles, and more by identifying and localizing objects in images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facial Recognition&lt;/strong&gt; 👤 – Powering facial recognition in smartphones and security systems by learning distinct facial features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical Imaging&lt;/strong&gt; 🏥 – Helping in disease diagnosis through X-rays, MRIs, and CT scans by accurately identifying anomalies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Driving Cars&lt;/strong&gt; 🚗 – Performing real-time vision tasks like lane detection and obstacle recognition for safe navigation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image and Video Processing&lt;/strong&gt; 🎥 – Enhancing images, segmenting, and tracking objects in real-time for video analysis and editing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLP&lt;/strong&gt; 💬 – Applied in text classification tasks like sentiment analysis and spam detection using CNNs on word embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Art Generation&lt;/strong&gt; 🎨 – Enabling neural style transfer to create artistic visuals by blending styles and patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robotics&lt;/strong&gt; 🤖 – Assisting robots in recognizing objects and navigating environments using visual data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gaming and AR&lt;/strong&gt; 🎮 – Improving gaming realism and blending virtual and real-world elements through real-time visual data processing.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Convolutional Neural Networks (CNNs) have transformed how we process and understand complex data, especially in the fields of computer vision and beyond. From identifying objects in images to powering facial recognition systems, CNNs have become essential in solving a wide range of real-world problems.&lt;/p&gt;

&lt;p&gt;I hope you have found this blog interesting. Please leave some 💛 and don’t forget to follow me.&lt;/p&gt;

&lt;p&gt;Thankyou 💛&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>tutorial</category>
      <category>interview</category>
      <category>ai</category>
    </item>
    <item>
      <title>729. My Calendar I || Leetcode || Medium</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Fri, 27 Sep 2024 13:59:51 +0000</pubDate>
      <link>https://forem.com/ngneha09/729-my-calendar-i-leetcode-medium-e6b</link>
      <guid>https://forem.com/ngneha09/729-my-calendar-i-leetcode-medium-e6b</guid>
      <description>&lt;p&gt;Hey there 👋&lt;/p&gt;

&lt;p&gt;Hope you are doing well 😃&lt;/p&gt;

&lt;p&gt;In this blog we are going to see the complete intuition behind the Leetcode problem &lt;strong&gt;729. My Calendar I&lt;/strong&gt;. We will understand the problem statement first, then look at the solution of the problem, followed by the code.&lt;/p&gt;

&lt;p&gt;Problem link -: &lt;a href="https://leetcode.com/problems/my-calendar-i/description/" rel="noopener noreferrer"&gt;https://leetcode.com/problems/my-calendar-i/description/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s get started 🔥&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wpeg5lpme1plvsa63ag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wpeg5lpme1plvsa63ag.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;You are implementing a program to use as your calendar. We can add a new event if adding the event will not cause a &lt;strong&gt;double booking&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;double booking&lt;/strong&gt; happens when two events have some non-empty intersection (i.e., some moment is common to both events.).&lt;/p&gt;

&lt;p&gt;The event can be represented as a pair of integers start and end that represents a booking on the half-open interval &lt;code&gt;[start, end)&lt;/code&gt;, the range of real numbers &lt;code&gt;x&lt;/code&gt; such that &lt;code&gt;start &amp;lt;= x &amp;lt; end&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Implement the &lt;code&gt;MyCalendar&lt;/code&gt; class:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;MyCalendar()&lt;/code&gt; Initializes the calendar object.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;boolean book(int start, int end)&lt;/code&gt; Returns true if the event can be added to the calendar successfully without causing a double booking. Otherwise, return false and do not add the event to the calendar.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example 1
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggs1w9sgnryzyu57ikvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggs1w9sgnryzyu57ikvq.png" alt="Image description" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Explanation
&lt;/h2&gt;

&lt;p&gt;The problem is about implementing a calendar which holds events for particular time intervals.&lt;br&gt;
The calendar is implemented such that any given moment can hold only a single booking. For example, suppose a user has booked the time interval [1,4); this complete slot is assigned to that user. Now another user comes to book the slot [2,3), but they won’t be able to get it because this interval falls inside [1,4).&lt;/p&gt;

&lt;p&gt;Note that the slot represents a booking on the &lt;strong&gt;half-open interval&lt;/strong&gt; &lt;code&gt;[start, end)&lt;/code&gt;, the range of real numbers &lt;code&gt;x&lt;/code&gt; such that &lt;code&gt;start &amp;lt;= x &amp;lt; end&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In this problem we are given a stream of time intervals, and for each one we have to check whether that slot can be assigned to the event or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach
&lt;/h2&gt;

&lt;p&gt;In the given problem the events can be booked in any order, for example [[47,50],[33,41]].&lt;br&gt;
So simply adding the slots to a set or map and checking the next slot only against the previously stored slots won’t directly give us the desired solution.&lt;/p&gt;

&lt;p&gt;Now what can we do here?🤔&lt;/p&gt;

&lt;p&gt;In this problem we have to find a way through which we can ensure that a single event is assigned to every slot. This we can do using &lt;strong&gt;prefix sum&lt;/strong&gt; and &lt;strong&gt;ordered map&lt;/strong&gt;. Huh?🙄&lt;/p&gt;

&lt;p&gt;We will make an ordered map (as it keeps entries sorted by key). Whenever a slot is to be booked, we will assume that the slot is valid and book an event between start and end: &lt;code&gt;mp[start]++&lt;/code&gt; marks the starting point of the event and &lt;code&gt;mp[end]--&lt;/code&gt; marks that the event has ended by time &lt;code&gt;end-1&lt;/code&gt; (the interval is half-open). In this way we keep track of how many events cover each point in time.&lt;br&gt;
Now we check that every moment has at most a single booking by computing the cumulative (prefix) sum over the map. Whenever &lt;code&gt;sum&amp;gt;1&lt;/code&gt;, a double booking has occurred, and we remove the time slot we just added from the map.&lt;br&gt;
Note that the slot for which the double booking is encountered is our current slot, which proves our assumption wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtvjlsw2kr0bn5rn0u7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtvjlsw2kr0bn5rn0u7o.png" alt="Image description" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
As you can see, this is how the approach keeps track of slots and bookings.&lt;/p&gt;

&lt;p&gt;And this is why I told you that the problem can be solved using map and prefix sum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpmueolr5xwd0vheqlsy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpmueolr5xwd0vheqlsy.png" alt="Image description" width="800" height="763"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we have made an ordered map, and in the &lt;code&gt;book()&lt;/code&gt; method we assume that the current slot can be booked. Then we calculate the cumulative sum; if sum&amp;gt;1 it indicates a double booking, hence our assumption is wrong, so we remove this entry from the map and return false. If everything works well we return true.&lt;/p&gt;
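
&lt;p&gt;If you prefer text to a screenshot, here is a rough Python sketch of the same idea. Python has no built-in ordered map, so this sketch simply sorts the keys on every call; it is meant to mirror the approach above, not to be the exact code in the image:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class MyCalendar:
    def __init__(self):
        self.delta = {}                      # time point: +1 at a start, -1 at an end

    def book(self, start, end):
        self.delta[start] = self.delta.get(start, 0) + 1   # assume the booking is valid
        self.delta[end] = self.delta.get(end, 0) - 1
        running = 0
        for t in sorted(self.delta):         # sweep the points in time order (prefix sum)
            running += self.delta[t]
            if running &amp;gt; 1:                # some moment is covered twice: double booking
                self.delta[start] -= 1       # undo our assumption and reject
                self.delta[end] += 1
                return False
        return True

cal = MyCalendar()
print(cal.book(10, 20))   # True
print(cal.book(15, 25))   # False, overlaps [10, 20)
print(cal.book(20, 30))   # True, since [start, end) is half-open
&lt;/code&gt;&lt;/pre&gt;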

&lt;p&gt;So this was complete solution to problem &lt;strong&gt;729. My Calendar I&lt;/strong&gt;. I hope you have understood it well.&lt;br&gt;
If you like the blog please leave some ❤&lt;/p&gt;

&lt;p&gt;Thankyou 😃&lt;/p&gt;

</description>
      <category>leetcode</category>
      <category>datastructures</category>
      <category>interview</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Complete Guide to Becoming a Software Development Engineer (SDE)</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Sat, 21 Sep 2024 08:45:17 +0000</pubDate>
      <link>https://forem.com/ngneha09/the-complete-guide-to-becoming-a-software-development-engineer-sde-pn1</link>
      <guid>https://forem.com/ngneha09/the-complete-guide-to-becoming-a-software-development-engineer-sde-pn1</guid>
      <description>&lt;p&gt;Hey reader 👋&lt;/p&gt;

&lt;p&gt;Hope you are doing well 😃&lt;/p&gt;

&lt;p&gt;Becoming a &lt;strong&gt;Software Development Engineer (SDE)&lt;/strong&gt; in a top multinational company, startup, or big tech firm is a dream for many freshers, students, and professionals. However, many aspiring engineers don’t fully understand what an SDE actually does or how to climb the path to this coveted position.&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk you through the complete journey of becoming an SDE — from the skills you need to develop, to the interview preparation strategies, and finally landing your dream job.&lt;/p&gt;

&lt;p&gt;So, let’s dive in and get started on this exciting journey! 🔥&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa12qfy7li3jj2nxdv0af.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa12qfy7li3jj2nxdv0af.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What does an SDE do?🧑‍💻
&lt;/h2&gt;

&lt;p&gt;Before we dive into how to become an SDE, let’s clarify what the role involves. As a Software Development Engineer, you’ll be responsible for designing, developing, testing, and maintaining software systems that solve real-world problems. You could be working on anything from building websites and apps to designing complex backend systems, depending on the company and the project.&lt;/p&gt;

&lt;p&gt;SDEs are the masterminds behind the tech we use every day! They collaborate with cross-functional teams, solve challenging problems, and create solutions that can scale to millions of users. Sounds exciting 🤩, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Levels of SDE
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SDE1 (Software Development Engineer 1)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Experience Level: Entry-level (0–2 years of experience)&lt;/p&gt;

&lt;p&gt;Role and Responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;As an SDE1, you’re just starting out in your software engineering career. This role is focused on learning, growing, and gaining experience in the field.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’ll be working on well-defined tasks under the supervision of senior engineers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your primary responsibility is writing clean, maintainable code and following the software development lifecycle (coding, testing, debugging).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’ll also be involved in team discussions, learning about software design, and understanding how large systems work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expect to be given smaller, more manageable projects or parts of larger projects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills Required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Strong understanding of core programming concepts (OOP, DSA).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ability to write functional and efficient code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Problem-solving skills and familiarity with the tools and technologies used by the team.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Growth Focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’ll be learning how to work in a team and gaining real-world experience with production codebases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDE2 (Software Development Engineer 2)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Experience Level: Mid-level (2–5 years of experience)&lt;/p&gt;

&lt;p&gt;Role and Responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;At this stage, you’re expected to be more independent and capable of handling larger and more complex tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SDE2 engineers are usually responsible for designing and implementing features end-to-end. You’ll work on more critical parts of the system, potentially leading small projects.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You are also expected to make key architectural decisions, review code, and mentor junior engineers (SDE1).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SDE2s have a solid understanding of system design and scalability. You may also collaborate with other teams and understand the broader impact of the code you write.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills Required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Proficiency in programming and in-depth knowledge of the tools used by the team.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Experience in designing systems or components with scalability and performance in mind.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The ability to debug complex problems and suggest solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some knowledge of system design and architecture.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Growth Focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focus on becoming a more independent problem-solver and team player. You’ll be deepening your system design knowledge and learning to manage more complex tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDE3 (Software Development Engineer 3)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Experience Level: Senior-level (5+ years of experience)&lt;/p&gt;

&lt;p&gt;Role and Responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SDE3s are senior engineers who take on the most complex projects and often lead teams of engineers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’ll be responsible for designing and building large-scale systems with high reliability, performance, and scalability in mind.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SDE3s are also responsible for making critical architectural decisions, driving technical direction, and influencing the overall strategy of the company’s engineering efforts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You’re expected to mentor other engineers, drive best practices, and ensure that the team follows industry standards in terms of code quality and architecture.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills Required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Expert-level knowledge of system design, architecture, and scaling solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ability to handle complex challenges and guide the engineering team through problem-solving.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Experience across multiple tech stacks and a deep understanding of how different components of large systems interact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strong leadership, mentorship, and communication skills.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Growth Focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SDE3s are leaders and problem solvers at the highest technical level. The next steps in your career could include moving into engineering management or becoming a principal/lead engineer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But to get here, you need more than just coding skills — let’s explore the roadmap! 😊&lt;/p&gt;

&lt;h2&gt;
  
  
  SDE Roadmap 🚀
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Now that You Know What an SDE Does and the Levels of the Role…&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s time for you to understand the journey of becoming an SDE! The path you take will depend on the level of SDE you’re aiming for, but if you’re just starting out and looking to land an SDE1 position, then this pathway is definitely for you! 😀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Pathway to SDE1: Your Step-by-Step Guide&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with the Basics: Learn Programming&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every journey starts with the fundamentals, and for an SDE1 role, that means mastering a programming language. Whether you’re in college or learning on your own, choose a language that’s widely used in the industry. Common choices include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python: Great for beginners and widely used in backend development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Java: Extremely popular for enterprise-level applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;C++: Known for its speed and control over system resources, great for those who want to work on performance-intensive projects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key here is not to learn every language out there, but to become proficient in one. Once you’re comfortable coding, you can always pick up new languages along the way.&lt;/p&gt;

&lt;p&gt;Learn how to calculate the time and space complexity of a given piece of code, or of the code you are writing. Understand best-case, average-case, and worst-case complexity. This will be helpful in later stages when you try to optimize your code.&lt;/p&gt;
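
&lt;p&gt;For example, here is a tiny contrast between a linear and a binary search, with their complexities noted in the comments (just an illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def linear_search(arr, target):          # O(n) time, O(1) space
    for i, value in enumerate(arr):
        if value == target:
            return i                     # best case: target is first, O(1)
    return -1                            # worst case: scan everything, O(n)

def binary_search(arr, target):          # O(log n) time, O(1) space; needs sorted input
    lo, hi = 0, len(arr) - 1
    while lo &amp;lt;= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] &amp;lt; target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = [2, 5, 8, 12, 16, 23, 38]
print(linear_search(data, 23), binary_search(data, 23))   # 5 5
&lt;/code&gt;&lt;/pre&gt;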

&lt;ol start="2"&gt;
&lt;li&gt;Dive Deep into Data Structures and Algorithms (DSA)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’ve been following the tech world, you’ve probably heard that Data Structures and Algorithms (DSA) are the bread and butter of SDE interviews. Most companies rely heavily on DSA to test your problem-solving skills and efficiency as a developer.&lt;/p&gt;

&lt;p&gt;Here’s your DSA roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Start small: Begin with basic data structures like arrays, linked lists, stacks, and queues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Learn algorithms: Work on sorting algorithms (merge sort, quick sort), searching algorithms (binary search), and more advanced topics like recursion and dynamic programming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Practice: Platforms like LeetCode, HackerRank, and GeeksforGeeks are gold mines for practice. Start with easy problems, and once you get comfortable, move to more challenging ones.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don’t rush — consistency is key! Aim for solving 1–2 problems a day, and you’ll soon see your skills improve.&lt;/p&gt;

&lt;p&gt;You can read this blog to get a complete guide to DSA -: &lt;a href="https://medium.com/@akshatsharma0610/a-data-structures-and-algorithms-guide-get-interview-ready-d2426c5e30c7" rel="noopener noreferrer"&gt;https://medium.com/@akshatsharma0610/a-data-structures-and-algorithms-guide-get-interview-ready-d2426c5e30c7&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Master Core Computer Science Subjects&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Knowledge of the basic computer science subjects will help in interviews as well as in professional work. They help in understanding the complete flow and workings of application development. By mastering these topics you will be able to solve complex problems, create robust software, engage in technical discussions, and more. So take some time and build a good understanding of these. Reference books, courses, college projects, etc. can prove to be helpful while learning them.&lt;/p&gt;

&lt;p&gt;The subjects you should study are -:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DBMS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Computer Networks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OOPs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operating System&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Computer Architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;System Design&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Learn Object-Oriented Programming (OOP) Concepts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Object-Oriented Programming (OOP) is another core area that companies will test in interviews. It’s especially important if you’re applying for positions in companies that use languages like Java, C++, or Python.&lt;/p&gt;

&lt;p&gt;OOP concepts help in designing reusable, scalable, and efficient systems. Focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Classes and Objects: The building blocks of OOP. Understand how to design classes that represent real-world entities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inheritance: Allows you to reuse code by creating hierarchies and extending base classes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Polymorphism: Understand both compile-time (method overloading) and run-time (method overriding) polymorphism.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Encapsulation: Hiding the internal state of objects and allowing access only through methods to protect data integrity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Abstraction: Focus on defining an interface for interactions, while hiding the implementation details.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
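
&lt;p&gt;Here is a small Python sketch that touches each of these ideas; the class names are made up purely for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Small illustration of the OOP ideas above (names are made up for this example)
from abc import ABC, abstractmethod

class Account(ABC):                      # Abstraction: defines an interface
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.__balance = balance         # Encapsulation: internal state is hidden

    def deposit(self, amount):
        self.__balance += amount

    def balance(self):
        return self.__balance

    @abstractmethod
    def monthly_fee(self):               # implementation details left to subclasses
        ...

class SavingsAccount(Account):           # Inheritance: reuses Account's behaviour
    def monthly_fee(self):
        return 0

class CurrentAccount(Account):
    def monthly_fee(self):
        return 10

for acct in (SavingsAccount("Neha"), CurrentAccount("Neha")):
    print(type(acct).__name__, acct.monthly_fee())   # Polymorphism: same call, different result
&lt;/code&gt;&lt;/pre&gt;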

&lt;p&gt;Practice by building small projects or applications that implement OOP principles, such as a banking system or a library management system.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Learn High-Level Design (HLD) and Low-Level Design (LLD) for Interview&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While system design is more heavily tested in SDE2 and SDE3 interviews, it’s important to have a basic understanding for SDE1 as well. You might be asked to design small-scale systems like a URL shortener or a basic messaging system.&lt;/p&gt;

&lt;p&gt;Here’s what you should focus on at this stage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Scalability: Understand how to build systems that can handle an increasing number of users or requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load Balancing: How to distribute network traffic across multiple servers to avoid overload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Techniques to store frequently accessed data for faster retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database Design: How to create normalized database schemas and when to use denormalization for performance optimization.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a good idea to start reading books like Designing Data-Intensive Applications to get a feel for how large systems are built.&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;Build Real-World Projects&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While DSA helps you crack interviews, real-world projects showcase your ability to apply knowledge. Building projects not only strengthens your coding skills but also makes your resume stand out.&lt;/p&gt;

&lt;p&gt;Here are some beginner-friendly project ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Personal Portfolio Website: Build a website showcasing your skills, projects, and achievements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To-Do List App: Create a full-stack application using a backend framework like Node.js or Django.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Weather App: Use public APIs to create an app that displays real-time weather information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Freelance Project: You can also take a freelance project from different platforms or from your family or friends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can also clone an existing website and make useful changes to it. This will really help you a lot.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: Share your projects on GitHub, contribute to open source, and even write blogs about your learning process. Employers love seeing developers who are passionate about coding and sharing knowledge.&lt;/p&gt;

&lt;ol start="7"&gt;
&lt;li&gt;Learn Version Control (Git)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the real world, coding is rarely done in isolation. You’ll be collaborating with other developers, and that’s where Git and version control come into play. Here’s what to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Learn the basics of Git: cloning repositories, committing code, and pushing changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get familiar with GitHub, which is used by companies for collaboration and reviewing code.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you’ve learned Git, make sure to use it in your projects. This shows that you know how to work in a team and manage your code professionally.&lt;/p&gt;

&lt;ol start="8"&gt;
&lt;li&gt;Prepare for Behavioral and HR Interviews&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In addition to technical interviews, companies will often test your communication skills, teamwork, and cultural fit through behavioral interviews. Here’s what to focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Prepare your stories: Think of examples from your past where you’ve demonstrated leadership, collaboration, problem-solving, or dealt with challenges.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;STAR method: When answering, use the STAR method (Situation, Task, Action, Result) to structure your responses.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Show that you’re not only a good coder but also a team player with strong communication skills. This goes a long way in landing your dream job.&lt;/p&gt;

&lt;ol start="9"&gt;
&lt;li&gt;Get Certifications on Trending Technologies&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Professional certifications help in personal growth and learning new concepts from industry experts. These provide the flexibility to learn from anywhere and even professionals can use certification courses to upskill and receive promotions in their current jobs. This investment of time and resources can provide specialization in one domain and enhance our skill set and understanding of that topic.&lt;/p&gt;

&lt;ol start="10"&gt;
&lt;li&gt;Keep Learning and Stay Motivated!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tech world evolves rapidly, and being an SDE means you should always be curious and willing to learn. Whether it’s picking up a new programming language, learning cloud computing, or diving into machine learning, keep pushing yourself to learn and grow.&lt;/p&gt;

&lt;p&gt;The journey to SDE1 is challenging but incredibly rewarding. Stay consistent, practice regularly, and never lose your love for coding!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Ready to Start Your SDE1 Journey?
&lt;/h2&gt;

&lt;p&gt;If you follow this pathway, you’ll not only build the technical skills required to become an SDE1, but you’ll also develop the mindset and confidence to succeed. Remember, it’s not about perfection, it’s about progress. Take it one step at a time, and soon, you’ll be well on your way to landing that dream role as a Software Development Engineer.&lt;/p&gt;

&lt;p&gt;Good luck, and keep coding! 🚀&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>career</category>
      <category>beginners</category>
      <category>interview</category>
    </item>
    <item>
      <title>Mistakes I made while studying Machine Learning</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Tue, 20 Aug 2024 07:13:44 +0000</pubDate>
      <link>https://forem.com/ngneha09/mistakes-i-made-while-studying-machine-learning-5dn9</link>
      <guid>https://forem.com/ngneha09/mistakes-i-made-while-studying-machine-learning-5dn9</guid>
      <description>&lt;p&gt;Hey there 👋 Hope you are doing well 😊&lt;br&gt;
We all know that this is the decade of artificial intelligence, data science, machine learning and the like. These skills are very important, and when added to your resume they can make it stand out from the crowd. But while learning these skills it is very important to follow the right path; misleading paths can waste a lot of your time. In this post I'll tell you what mistakes I made while I was learning ML. This post is helpful for those who are just starting their journey in AI, and it will save you a lot of time 😌&lt;br&gt;
So let's get started 🔥&lt;/p&gt;

&lt;h2&gt;
  
  
  Didn't do much research
&lt;/h2&gt;

&lt;p&gt;When I was starting my ML journey I didn't dedicate time to doing research and collecting resources. I just jumped into it and found myself puzzled. There were times when my basics were not clear and I was overwhelmed by intermediate topics, often scratching my head over different concepts. After all this I realized that it is very important to do research before starting anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Directly jumped into ML algorithms
&lt;/h2&gt;

&lt;p&gt;Now you know that I had not done enough research, and I was so excited to study ML that I didn't bother learning the basics and started learning ML algorithms from day 1. Seriously, it was such an awful mistake. I should have started with Python, followed by maths, then EDA and Feature Engineering, and only then ML algorithms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Following more than one playlist/course at a time
&lt;/h2&gt;

&lt;p&gt;I was studying from YouTube and started with one playlist. At the start everything was going pretty well, but later I found myself distracted from my ongoing course and started learning from the numerous other courses out there. I watched different videos for a single topic, and this took a lot of time. I was also easily drawn in by content: whenever I found a video from MIT or Stanford I would start learning from it, and as a beginner back then, jumping into those was a mistake. So it is very important to stick to the playlist or course you are following or have decided to follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everybody has their own way of getting things done
&lt;/h2&gt;

&lt;p&gt;This is one of the most important things I have understood lately. When we learn anything, we do it in our own way, whether it is web development, data structures, or anything else. When I was learning ML I used to follow different people and their techniques: when someone used a Label Encoder I started using that, and when someone used an Ordinal Encoder I switched to that, and this made me feel like I was drowning in an ocean. With time I have realized that people have their own way of doing things, and I have to find my own way too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not practicing and revising concepts
&lt;/h2&gt;

&lt;p&gt;Back then I was too lazy to implement and revise the concepts that I had learned on a particular day. When enough time had passed it felt like heavy baggage: I started forgetting things and found myself stuck. So it is very important to regularly revise and practice concepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sticking for hours
&lt;/h2&gt;

&lt;p&gt;Whenever I didn't get a concept I kept studying it for hours and even days, and this took a lot of my time. Sticking with things and completing them is very important, but grinding on something for a long time when nothing is working out is a grave mistake. If you don't get something, give it some time or seek help from somebody else; this will save a lot of your time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Leave your damn ego
&lt;/h2&gt;

&lt;p&gt;So yes, I was someone who liked to get things done on my own, and this is why I didn't like seeking help from others. Whenever I got stuck somewhere I never asked anyone to help me out, and this took a lot of my time. So don't repeat this mistake; ask for help whenever needed.&lt;/p&gt;

&lt;p&gt;So these were the mistakes I made during my ML journey. I mention them in this post because I don't want anyone to repeat them at the cost of their time.&lt;/p&gt;

&lt;p&gt;I hope you liked my post. For more follow me.&lt;br&gt;
Thankyou 💚&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>basic</category>
    </item>
    <item>
      <title>How to get started with Machine Learning?</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Thu, 15 Aug 2024 10:20:51 +0000</pubDate>
      <link>https://forem.com/ngneha09/how-to-get-started-to-machine-learning-4ngo</link>
      <guid>https://forem.com/ngneha09/how-to-get-started-to-machine-learning-4ngo</guid>
      <description>&lt;p&gt;Hey there 👋 Hope you are doing well 😊&lt;br&gt;
As you know, the AI wave is everywhere: everyone is trying different AI-based services and getting amazing results. Every platform out there is embedding AI to make itself smarter and more useful. AI is one of the most important skills of this decade, but getting started with it is really difficult and can sometimes be misleading. So it is very important to follow the right path to understand AI better. Getting into AI means getting started with Machine Learning. In this article I am going to tell you how you can start your Machine Learning journey and master it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started with Python
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae7m62ckq1irklihswab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae7m62ckq1irklihswab.png" alt="Image description" width="214" height="148"&gt;&lt;/a&gt;&lt;br&gt;
To get into Machine Learning you should know Python first. Learn the basics first: how variables are created and manipulated, how loops and conditions work, how functions are defined and used, and how lists, arrays, maps, etc. are created and used. Then you should learn about OOP in Python. Finally, you should know the Pandas and NumPy libraries well, as these are going to be very useful in your Data Science journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn Maths
&lt;/h2&gt;

&lt;p&gt;While learning Python, you should also devote time to studying maths, as this subject is the backbone of Machine Learning, Deep Learning, NLP, etc. You should be familiar with Statistics, Linear Algebra, Probability Theory, Hypothesis Testing, Calculus, and Optimization. If you know these topics well (with clarity in the basics), then you will find ML algorithms very easy. Also, don't forget to implement maths functions using Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Study EDA and Feature Engineering
&lt;/h2&gt;

&lt;p&gt;Now it is time to play with data 😈. EDA stands for Exploratory Data Analysis; it gives you important insights into your dataset and is very important for understanding the relationships between different features. Feature Engineering involves manipulating the features in your dataset to make your data more useful. This whole process is about getting to know your data and handling it efficiently. The libraries you should know here are Seaborn, Matplotlib, Missingno, and PyOD.&lt;/p&gt;
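
&lt;p&gt;A tiny sketch of what typical EDA calls look like (the CSV file and column name below are placeholders, not a real dataset):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("house_prices.csv")      # placeholder dataset
df.info()                                 # column types and missing values
print(df.describe())                      # basic statistics per numeric column

sns.histplot(df["price"])                 # distribution of one feature ("price" is a placeholder)
plt.show()
sns.heatmap(df.select_dtypes("number").corr(), annot=True)   # relationships between features
plt.show()
&lt;/code&gt;&lt;/pre&gt;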

&lt;h2&gt;
  
  
  Machine Learning Algorithms
&lt;/h2&gt;

&lt;p&gt;This is the part you have been waiting for. Start studying Machine Learning algorithms now. Study all the supervised and unsupervised learning algorithms: get the mathematical and geometric intuition, implement them from scratch, and then learn the pre-defined libraries. You should know about sklearn (scikit-learn) and its submodules here.&lt;/p&gt;
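
&lt;p&gt;For instance, a minimal scikit-learn workflow on a built-in dataset looks roughly like this (just to show the shape of the API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)     # a simple supervised algorithm
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
&lt;/code&gt;&lt;/pre&gt;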

&lt;h2&gt;
  
  
  Improve Your Model
&lt;/h2&gt;

&lt;p&gt;Now that you know about Machine Learning algorithms, it is time to learn the techniques used to improve them. Learn cross-validation techniques, hyperparameter tuning techniques, and ensembling methods. Get into the details and practice them on datasets.&lt;/p&gt;
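
&lt;p&gt;A small sketch of cross-validation and a grid search with scikit-learn (the parameter grid is an illustrative choice, not a recommendation):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of a single model (an ensemble method, as it happens)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores.mean())

# Grid search over a few hyperparameters, itself evaluated with cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
&lt;/code&gt;&lt;/pre&gt;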

&lt;h2&gt;
  
  
  Auto ML
&lt;/h2&gt;

&lt;p&gt;As you have come a long way, it is now time to automate your tasks. Study AutoML, practice, and get to know about Pipelines, Optuna, etc.&lt;/p&gt;
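
&lt;p&gt;As a taste of the pipeline idea, here is a minimal scikit-learn Pipeline that chains preprocessing and a model into one object (Optuna and full AutoML tools are beyond this tiny sketch):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),   # preprocessing step
    ("model", SVC()),              # estimator step
])
print(cross_val_score(pipe, X, y, cv=5).mean())
&lt;/code&gt;&lt;/pre&gt;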

&lt;h2&gt;
  
  
  Know about Version Control
&lt;/h2&gt;

&lt;p&gt;Learn to use Git so that you can manage your code, collaborate with others, and streamline deployment and project management.&lt;/p&gt;

&lt;p&gt;And the learning path goes on....&lt;br&gt;
If you really want to master it then practice is very important. You can practice ML on different platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Important Platforms
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kaggle -: This is one of the most famous platforms for data science. It hosts competitions and has a wide variety of datasets, tutorials, and a large community.&lt;br&gt;
Link -: &lt;a href="https://www.kaggle.com/"&gt;https://www.kaggle.com/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jupyter -: This is an IDE for creating Data Science projects. You can use Google Colab notebooks too.&lt;br&gt;
Link -: &lt;a href="https://jupyter.org/"&gt;https://jupyter.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub -: This is one of the most famous platforms. It hosts an enormous number of projects and datasets where you can practice and make contributions.&lt;br&gt;
Link -: &lt;a href="https://github.com/"&gt;https://github.com/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Important Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Maths -: StatQuest statistics playlist on YouTube&lt;br&gt;
Link -: &lt;a href="https://youtu.be/qBigTkBLU6g?si=tn5f8dBQo_-xDfqr"&gt;https://youtu.be/qBigTkBLU6g?si=tn5f8dBQo_-xDfqr&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Machine Learning -: CampusX playlist (Hindi), Coursera Andrew Ng ML specialization course.&lt;br&gt;
CampusX&lt;br&gt;
Link -: &lt;a href="https://youtu.be/ZftI2fEz0Fw?si=VrfnThcAc8z0OQ2N"&gt;https://youtu.be/ZftI2fEz0Fw?si=VrfnThcAc8z0OQ2N&lt;/a&gt;&lt;br&gt;
Coursera&lt;br&gt;
Link -: &lt;a href="https://www.coursera.org/"&gt;https://www.coursera.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this was the complete roadmap for Machine Learning.&lt;br&gt;
Thankyou 💚&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Feature Transformation in Machine Learning || Feature Engineering</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Tue, 06 Aug 2024 11:24:41 +0000</pubDate>
      <link>https://forem.com/ngneha09/feature-transformation-in-machine-learning-feature-engineering-4of9</link>
      <guid>https://forem.com/ngneha09/feature-transformation-in-machine-learning-feature-engineering-4of9</guid>
      <description>&lt;p&gt;Hey reader 👋 Hope you are doing well 😊&lt;/p&gt;

&lt;p&gt;As you know, to get accurate predictions, our model should be trained well. For better training, our data should be processed properly. To gain valuable insights from data, we perform Exploratory Data Analysis (EDA). Using EDA, we engage in Feature Engineering to transform our data as required.&lt;/p&gt;

&lt;p&gt;In the process of Feature Engineering, we handle categorical data, missing values, outliers, feature selection, etc. Transforming numerical values is one of the critical tasks in Feature Engineering. This transformation allows us to convert all data into the same unit, making our data more efficient for model training.&lt;/p&gt;

&lt;p&gt;In this blog, we will discuss different types of transformations and their importance. So let's get started 🔥&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Transformation
&lt;/h2&gt;

&lt;p&gt;Feature Transformation refers to the process of converting data from one form to another. For example, transforming categorical data into numerical data, scaling numerical data, and converting data so that it follows the desired statistics of an algorithm (e.g., linear regression works well when the data is normally distributed).&lt;/p&gt;

&lt;p&gt;The different types of Feature Transformation are -:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Function Transformers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Power Transformers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feature Scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Encoding Categorical Data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Missing Value Imputation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Outlier Detection&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why is Feature Transformation Required?
&lt;/h2&gt;

&lt;p&gt;Imagine trying to solve a jigsaw puzzle with pieces that don’t quite fit together. In the same way, raw, unprocessed data might not fit the requirements of your machine-learning algorithms. Feature transformation is the process of reshaping those pieces, making them compatible and coherent, and ultimately, revealing the full picture.&lt;/p&gt;

&lt;p&gt;Machine learning algorithms often work better with features transformed to have similar scales or distributions. Feature transformation can lead to better model performance by improving the model’s ability to learn from the data.&lt;/p&gt;

&lt;p&gt;Feature transformation can reveal hidden patterns or relationships in the data that might not be apparent in the original feature space. By creating new features or modifying existing ones, you can expose valuable information that your model can use to make more accurate predictions.&lt;/p&gt;

&lt;p&gt;In some cases, feature transformation can help reduce the dimensionality of the data. This not only simplifies the modeling process but also helps prevent issues like the curse of dimensionality, which can lead to overfitting.&lt;/p&gt;

&lt;h2&gt;
  
  
  A brief about different Feature Transformation techniques
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Function Transformers&lt;/strong&gt; -: Function transformers apply a fixed mathematical function (such as log, square root or reciprocal) to the data to bring its distribution closer to normal (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Power Transformers&lt;/strong&gt; -: Power transformation techniques apply a power to the data observations to transform them. Techniques like the Box-Cox or Yeo-Johnson transformations are used to make data more normally distributed, which can be beneficial for certain algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feature Scaling&lt;/strong&gt; -: Feature scaling transforms all features onto a single scale. It either scales the data up or down as required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encoding Categorical Data&lt;/strong&gt; -: Most machine learning algorithms work only with numerical data, so it is very important to convert categorical data into numerical form.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing Value Imputation&lt;/strong&gt; -: Sometimes our dataset may contain missing values which can affect our model significantly, so missing values should be handled properly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Outlier Detection&lt;/strong&gt; -: Outliers are data points that behave very differently from the rest of the points in the dataset; they can hinder model performance, so they should be handled properly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
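
&lt;p&gt;A minimal sketch of a function transformer, a power transformer and feature scaling on a toy column (the values are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer, PowerTransformer, StandardScaler

df = pd.DataFrame({"income": [25000, 32000, 40000, 52000, 250000]})
X = df[["income"]].to_numpy(dtype=float)

# Function transformer: apply log1p to reduce the right skew.
df["income_log"] = FunctionTransformer(np.log1p).fit_transform(X).ravel()

# Power transformer: Yeo-Johnson pushes the distribution closer to normal.
df["income_yeo_johnson"] = PowerTransformer(method="yeo-johnson").fit_transform(X).ravel()

# Feature scaling: standardize to zero mean and unit variance.
df["income_scaled"] = StandardScaler().fit_transform(X).ravel()

print(df)
&lt;/code&gt;&lt;/pre&gt;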

&lt;p&gt;So this is it for this blog. In the next blog we will see how &lt;strong&gt;Feature Scaling&lt;/strong&gt; is performed. Till then stay connected and don't forget to follow me.&lt;br&gt;
Thank you 💜&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Handling Missing Values || Feature Engineering || Machine Learning (Part2)</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Fri, 02 Aug 2024 08:50:09 +0000</pubDate>
      <link>https://forem.com/ngneha09/handling-missing-values-feature-engineering-machine-learning-part2-37l0</link>
      <guid>https://forem.com/ngneha09/handling-missing-values-feature-engineering-machine-learning-part2-37l0</guid>
      <description>&lt;p&gt;Hey reader👋Hope you are doing well😊&lt;br&gt;
We know that feature engineering is a crucial step for improving the performance of a machine learning model. One of the most important tasks in feature engineering is handling missing values. In this blog we are going to do a detailed discussion on handling missing values. So let's get started 🔥.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Case Analysis
&lt;/h2&gt;

&lt;p&gt;Complete Case Analysis (CCA), also known as &lt;strong&gt;"listwise deletion"&lt;/strong&gt;, is a method used to handle missing data. In this technique all the rows that contain one or more missing values are excluded from the dataset. &lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxu9fukol08eqziwx89i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxu9fukol08eqziwx89i7.png" alt="Image description" width="540" height="235"&gt;&lt;/a&gt;&lt;br&gt;
So here in final dataset only those rows are included that contain complete data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Assumptions for CCA&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Data should be missing completely at random.&lt;br&gt;
Suppose you have a dataset with 1000 rows and 5 columns, and 50 of those rows have missing values. If these 50 rows are random rows, you can remove them.&lt;br&gt;
Because the removed rows are random, the distribution of the data will remain (approximately) unchanged.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the proportion of missing data is small.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When simplicity and ease of implementation are prioritized.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Points of Complete Case Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Simplicity: CCA is straightforward to implement and understand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bias: If the missing data are not missing completely at random (MCAR), CCA can introduce bias into the analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficiency: Excluding data with missing values reduces the sample size, which can lead to a loss of statistical power.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application: Commonly used in regression analysis, where only cases with complete data for all predictors are included.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use CCA when the missing data in a particular column is at most 5% of the rows. If a column has 95% or more of its values missing, you can drop that entire column instead. The sketch below shows how to check this before the full implementation.&lt;/p&gt;
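
&lt;p&gt;A minimal sketch for checking the percentage of missing values per column before deciding (the toy DataFrame is only an example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 31, 40, 29],
    "salary": [50000, 60000, np.nan, 80000, 75000],
})

# Percentage of missing values in each column.
missing_pct = df.isnull().mean() * 100
print(missing_pct)
# Columns with a small share of missing rows are candidates for CCA;
# columns that are almost entirely missing are candidates for dropping.
&lt;/code&gt;&lt;/pre&gt;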

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

# Load your dataset
# Replace 'your_dataset.csv' with the actual file path
df = pd.read_csv('your_dataset.csv')

# Display the original data
print("Original Data:")
print(df.head())

# Filter out rows with any missing data (Complete Case Analysis)
df_complete_case = df.dropna()

# Display data after applying CCA
print("\nData after Complete Case Analysis:")
print(df_complete_case)

# Check the number of rows before and after CCA
print("\nNumber of rows before CCA:", len(df))
print("Number of rows after CCA:", len(df_complete_case))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that CCA can introduce bias into the data, because removing rows carries the risk of losing important information.&lt;br&gt;
Check the following notebook for an implementation of handling missing values -:&lt;br&gt;
&lt;a href="https://www.kaggle.com/code/nehagupta09/beginner-s-guide-to-handle-missing-values"&gt;https://www.kaggle.com/code/nehagupta09/beginner-s-guide-to-handle-missing-values&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope you have understood how missing values are handled in our dataset. In the next blog we will take our discussion on feature engineering further. Till then stay connected and don't forget to follow me.&lt;/p&gt;

&lt;p&gt;Thank you 💙&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Handling Missing Values || Feature Engineering || Machine Learning (Part1)</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Sat, 20 Jul 2024 07:56:22 +0000</pubDate>
      <link>https://forem.com/ngneha09/handling-missing-values-feature-engineering-machine-learning-part1-4h7b</link>
      <guid>https://forem.com/ngneha09/handling-missing-values-feature-engineering-machine-learning-part1-4h7b</guid>
      <description>&lt;p&gt;Hey reader👋Hope you are doing well😊&lt;br&gt;
We know that feature engineering is a crucial step for improving the performance of a machine learning model. One of the most important tasks in feature engineering is handling missing values. In this blog we are going to do a detailed discussion on handling missing values. So let's get started 🔥.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Missing Values?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Missing values are data points that are absent for a specific variable in a dataset. They can be represented in various ways, such as blank cells, null values, or special symbols like “NA” or “unknown.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These missing data points pose a significant challenge in data analysis and can lead to inaccurate or biased results.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid4xvgcaedqg9nxc31yi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid4xvgcaedqg9nxc31yi.png" alt="Image description" width="727" height="370"&gt;&lt;/a&gt;&lt;br&gt;
There are many reasons for a dataset to contain missing values-:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Due to technical issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the data comes from a survey then many people can leave blank response which can lead to missing values in the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data processing issues, privacy concerns etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Types of Missing Values
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing Completely at Random (MCAR)&lt;/strong&gt;&lt;br&gt;
MCAR is a specific type of missing data in which the probability of a data point being missing is entirely random and independent of any other variable in the dataset. In simpler terms, whether a value is missing or not has nothing to do with the values of other variables or the characteristics of the data point itself.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing at Random (MAR)&lt;/strong&gt;&lt;br&gt;
MAR is a type of missing data where the probability of a data point missing depends on the values of other variables in the dataset, but not on the missing variable itself. For example, if someone lost a schedule, then it may be replaced by a schedule taking at random from the set of filled schedules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing not at random (MNAR)&lt;/strong&gt;&lt;br&gt;
MNAR is the most challenging type of missing data to deal with. It occurs when the probability of a data point being missing is related to the missing value itself. This means that the reason for the missing data is informative and directly associated with the variable that is missing. For example, when smoking status is not recorded in patients admitted as an emergency, who are also more likely to have worse outcomes from surgery.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How missing values impact our dataset?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It can reduce the size of the sample or dataset.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lack of information: if the dataset has a large amount of missing values, there is a high chance of losing useful information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the missing data is not handled properly, it can bias the results of your analysis (for example, the model may not train properly on the dataset).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some statistical techniques require complete data for all variables, making them inapplicable when missing values are present.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Identify missing values
&lt;/h2&gt;

&lt;p&gt;There are different methods in Python's pandas library to identify missing values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.isnull()&lt;/code&gt;&lt;/strong&gt; -: Identifies missing values in a Series or DataFrame.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.notnull()&lt;/code&gt;&lt;/strong&gt; -: Check for missing values in a pandas Series or DataFrame. It returns a boolean Series or DataFrame, where True indicates non-missing values and False indicates missing values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.isna()&lt;/code&gt;&lt;/strong&gt; -: An alias of &lt;code&gt;.isnull()&lt;/code&gt;; it returns True for missing values and False for non-missing values (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
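
&lt;p&gt;A minimal sketch of these methods on a toy Series:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

print(s.isnull())        # True where the value is missing
print(s.notnull())       # True where the value is present
print(s.isna())          # alias of isnull(): True where the value is missing
print(s.isnull().sum())  # number of missing values (2 here)
&lt;/code&gt;&lt;/pre&gt;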

&lt;h2&gt;
  
  
  Treating Missing Values
&lt;/h2&gt;

&lt;p&gt;There are various techniques used to treat missing values in a dataset. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Remove all the missing data&lt;/strong&gt;&lt;br&gt;
If the dataset doesn't contain a significant amount of missing data, it is reasonable to remove all the missing data. The method used in Python is-:&lt;br&gt;
&lt;strong&gt;&lt;code&gt;dropna()&lt;/code&gt;&lt;/strong&gt; -: Drops rows or columns containing missing values based on custom criteria.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Imputation&lt;/strong&gt;&lt;br&gt;
Imputation means replacing a missing value with another value based on a reasonable estimate. This has a chance of introducing bias.&lt;br&gt;
Some common Imputation methods are -:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mean Imputation&lt;/strong&gt; -: Replace missing values with the mean of the relevant variable. This strategy can be heavily affected by outliers.
Implementation -:
Method 1-:
&lt;code&gt;df[column_name].fillna(df[column_name].mean())&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Method 2 -:&lt;br&gt;
Using &lt;code&gt;SimpleImputer()&lt;/code&gt;-:&lt;br&gt;
It is defined in the sklearn library. It replaces missing values using a descriptive statistic (e.g. mean, median, or most frequent) along each column, or using a constant value.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbey0sfjumttotosq3rp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbey0sfjumttotosq3rp.png" alt="Image description" width="800" height="224"&gt;&lt;/a&gt;&lt;br&gt;
Here we have imported numpy and SimpleImputer and then created an instance of SimpleImputer named &lt;code&gt;imp_mean&lt;/code&gt;, which replaces missing values (&lt;code&gt;np.nan&lt;/code&gt;) with the mean (&lt;code&gt;strategy="mean"&lt;/code&gt;). Then we have fitted the data to the imputer and transformed it.&lt;br&gt;
We can use different strategies to impute missing values here.&lt;/p&gt;
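
&lt;p&gt;A minimal sketch of SimpleImputer with the mean strategy (the array values are only placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[7.0], [np.nan], [10.0], [4.0]])

imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp_mean.fit_transform(X)  # the NaN becomes the column mean, 7.0
print(X_filled)
&lt;/code&gt;&lt;/pre&gt;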

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Median Imputation&lt;/strong&gt; -: Replace missing values with the median of the relevant variable. &lt;br&gt;
Implementation -:&lt;br&gt;
&lt;code&gt;df[column_name].fillna(df[column_name].median())&lt;/code&gt;&lt;br&gt;
We can also use SimpleImputer; all we need to do is set strategy="median".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mode Imputation&lt;/strong&gt; -: Replace missing values with the mode of the relevant variable. &lt;br&gt;
Implementation -:&lt;br&gt;
&lt;code&gt;df[column_name].fillna(df[column_name].mode())&lt;/code&gt;&lt;br&gt;
We can also use SimpleImputer; all we need to do is set strategy="most_frequent".&lt;br&gt;
This strategy can be challenging in case of multimodal data (having more than one mode).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Forward and Backward Fill&lt;/strong&gt; &lt;br&gt;
Replace missing values with the previous or next non-missing value in the same variable.&lt;br&gt;
These fill methods are particularly useful when there is a logical sequence or order in the data, and missing values can be reasonably assumed to follow a pattern. The method parameter in &lt;code&gt;fillna()&lt;/code&gt; allows you to specify the filling strategy, and here it’s set to ‘ffill’ for forward fill and ‘bfill’ for backward fill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forward Fill&lt;/strong&gt;&lt;br&gt;
It replaces missing values with the last observed non-missing value in the column.&lt;br&gt;
Implementation-:&lt;br&gt;
&lt;code&gt;forward_fill = df[column_name].fillna(method='ffill')&lt;/code&gt;&lt;br&gt;
The result is stored in the variable forward_fill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backward Fill&lt;/strong&gt;&lt;br&gt;
It replaces missing values with the next observed non-missing value in the column.&lt;br&gt;
Implementation-:&lt;br&gt;
&lt;code&gt;backward_fill = df[column_name].fillna(method='bfill')&lt;/code&gt;&lt;br&gt;
The result is stored in the variable backward_fill.&lt;/p&gt;
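
&lt;p&gt;A minimal sketch of both fills on a toy column (newer versions of pandas also expose these directly as &lt;code&gt;ffill()&lt;/code&gt; and &lt;code&gt;bfill()&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, np.nan, 40, np.nan])

forward_fill = s.ffill()   # each gap takes the last observed value
backward_fill = s.bfill()  # each gap takes the next observed value

print(forward_fill.tolist())   # [10.0, 10.0, 10.0, 40.0, 40.0]
print(backward_fill.tolist())  # [10.0, 40.0, 40.0, 40.0, nan]
&lt;/code&gt;&lt;/pre&gt;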

&lt;p&gt;There are two more techniques which we will see in the next blog.&lt;br&gt;
I hope you have understood how missing values are handled in our dataset. In the next blog we will take our discussion further. Till then stay connected and don't forget to follow me.&lt;br&gt;
Thank you 💙&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Handling Outliers|| Feature Engineering || Machine Learning</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Wed, 17 Jul 2024 05:56:08 +0000</pubDate>
      <link>https://forem.com/ngneha09/handling-outliers-feature-engineering-machine-learning-3316</link>
      <guid>https://forem.com/ngneha09/handling-outliers-feature-engineering-machine-learning-3316</guid>
      <description>&lt;p&gt;Hey reader👋Hope you are doing well😊&lt;br&gt;
We know that feature engineering is a crucial step for improving the performance of a machine learning model. One of the most important tasks in feature engineering is handling outliers. In this blog we are going to do a detailed discussion on handling outliers. So let's get started 🔥.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Outliers?
&lt;/h2&gt;

&lt;p&gt;Outliers are extreme values that differ from most other data points in a dataset. They can have a big impact on statistical analysis and skew the result of any hypothesis test.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsafgxn1zh8gxpdwa3731.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsafgxn1zh8gxpdwa3731.png" alt="Image description" width="617" height="450"&gt;&lt;/a&gt;&lt;br&gt;
To understand it better let's consider an example-:&lt;br&gt;
Dataset A = [1,2,3,4,5,6]&lt;br&gt;
Mean =&amp;gt; 3.5&lt;br&gt;
Now let's add some more data points to the dataset.&lt;br&gt;
A = [1,2,3,4,5,6,100,101]&lt;br&gt;
Mean =&amp;gt; 27.75&lt;br&gt;
So here we can see that the mean shoots up just by adding two points, and these two points are very different from the rest of the points in the dataset, so they are definitely outliers.&lt;br&gt;
Outliers can negatively affect our data and modeling, so it is very important to handle them properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Outliers are introduced in Data?
&lt;/h2&gt;

&lt;p&gt;Outliers in a dataset can be introduced through various mechanisms, both intentional and unintentional. Here are some common ways outliers can be introduced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Human Error: Manual data entry mistakes, such as typing errors, can lead to outliers. For example, entering an extra zero or a decimal point in the wrong place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Instrument Error: Faulty measurement instruments or sensors can produce erroneous values that stand out as outliers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rare Events: Some outliers occur naturally due to rare events or extreme conditions. For example, an unusually high sales figure during a holiday season.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merging Datasets: Combining datasets with different scales or units without proper alignment or adjustment can introduce outliers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Intentional Manipulation: In some cases, outliers might be introduced intentionally, such as in fraudulent financial reporting or tampering with experimental data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Types of Outliers
&lt;/h2&gt;

&lt;p&gt;Based on their characteristics, outliers or anomalies can be divided into three categories -:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Global Outliers&lt;/strong&gt;&lt;br&gt;
Any observations or data points are considered as global outliers if they deviate significantly from the rest of the observations or data points in a dataset. For example, if you are collecting observations of temperatures in a city, then a value of 100 degrees would be considered an outlier, as it is an extreme as well as impossible temperature value for a city.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ykkjygjc118m7yrzzjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ykkjygjc118m7yrzzjc.png" alt="Image description" width="582" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Contextual Outliers&lt;/strong&gt;&lt;br&gt;
Any data points or observations are considered as contextual outliers if their value significantly deviates from the rest of the data points in a particular context. It means that the same values may not be considered an outlier in a different context. For example, if you have observations of temperatures in a city, then a value of 40 degrees would be considered an outlier in winter, but the same value might be part of the normal observations in summer.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjghex797xmxxfp9wg6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjghex797xmxxfp9wg6y.png" alt="Image description" width="728" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Collective Outliers&lt;/strong&gt;&lt;br&gt;
Any group of observations or data points within a data set is considered collective outliers if these observations as a collection deviate significantly from the entire data set. It means that these values, individually without collection with other data points, are not considered as either contextual or global outliers.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpv220s661qsdr9h6gem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpv220s661qsdr9h6gem.png" alt="Image description" width="452" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying Outliers
&lt;/h2&gt;

&lt;p&gt;There are four ways of identifying outliers -:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Percentile Method&lt;/strong&gt; &lt;br&gt;
The percentile method identifies outliers in a dataset by comparing each observation to the rest of the data using percentiles. In this method, We first define the upper and lower bounds of a dataset using the desired percentiles. &lt;br&gt;
For example, we may use the 5th and 95th percentile for a dataset's lower and upper bounds, respectively. Any observations or data points that reside beyond and outside of these bounds can be considered outliers.&lt;br&gt;
This method is simple and useful for identifying outliers in symmetrical and normal distributions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Interquartile Range (IQR) Method&lt;/strong&gt;&lt;br&gt;
This method is similar to the percentile method; the slight difference is that here we define an interquartile range for detecting outliers.&lt;br&gt;
Q1 = 25th percentile&lt;br&gt;
Q3 = 75th percentile&lt;br&gt;
IQR = Q3-Q1&lt;br&gt;
Upper bound = Q3+1.5*(IQR)&lt;br&gt;
Lower bound = Q1-1.5*(IQR)&lt;br&gt;
We check every data point: if the point lies in the range [Lower bound, Upper bound] it is a valid point, otherwise it is an outlier.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3gswyiff9nmrwy84xfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3gswyiff9nmrwy84xfa.png" alt="Image description" width="800" height="278"&gt;&lt;/a&gt;&lt;br&gt;
We are considering 25th and 75th percentile here because we are assuming that our data is normally distributed and most of our data resides in this range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Using Visualization&lt;/strong&gt;&lt;br&gt;
In python we can use box plot or whisker plot to detect outliers in a dataset.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wbnwzhq766edpr7ldw8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wbnwzhq766edpr7ldw8.png" alt="Image description" width="799" height="420"&gt;&lt;/a&gt;&lt;br&gt;
The box plot just gives the visualization of IQR method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Using Z score method&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;For a given value, the z-score represents its distance from the mean in units of the standard deviation. For example, a z-score of 2 represents that the data point is 2 standard deviations away from the mean. To detect the outliers using the z-score, we can define the lower and upper bounds of the dataset. The upper bound is defined as z = 3, and the lower bound is defined as z = -3. This means any value more than 3 standard deviations away from the mean will be considered an outlier.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j3877wtojs3ib1q9b16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j3877wtojs3ib1q9b16.png" alt="Image description" width="748" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Implementation for detecting outliers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkizpasolmjh7lcqvvkn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkizpasolmjh7lcqvvkn.png" alt="Image description" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Outliers
&lt;/h2&gt;

&lt;p&gt;Depending on the dataset there are various ways to handle outliers-:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Removing Outliers&lt;/strong&gt;&lt;br&gt;
If the outliers are the result of manual error, it is better to remove them entirely from the dataset. However, if the dataset contains a large number of outliers, removing them may result in a loss of data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transforming Outliers&lt;/strong&gt;&lt;br&gt;
The impact of outliers can be reduced or eliminated by transforming the feature. For example, a log transformation of a feature can reduce the skewness in the data, reducing the impact of outliers.&lt;br&gt;
(We will read about transformations in upcoming blogs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Impute Outliers&lt;/strong&gt;&lt;br&gt;
In this approach outliers are treated as missing values, and we can replace them with the mean, median, mode, nearest neighbour, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use robust statistical methods&lt;/strong&gt;&lt;br&gt;
Some of the statistical methods are less sensitive to outliers and can provide more reliable results when outliers are present in the data. For example, we can use median and IQR for the statistical analysis as they are not affected by the outlier’s presence. This way we can minimize the impact of outliers in statistical analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python Implementation of Handling Outliers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlxrq9x5owg5yyhcrqn6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlxrq9x5owg5yyhcrqn6.png" alt="Image description" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope you have understood how outliers are handled in our dataset. In the next blog we are going to read about how to handle missing values. Till then stay connected and don't forget to follow me.&lt;br&gt;
Thank you 💙&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Feature Engineering in ML</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Tue, 16 Jul 2024 09:27:47 +0000</pubDate>
      <link>https://forem.com/ngneha09/feature-engineering-in-ml-35id</link>
      <guid>https://forem.com/ngneha09/feature-engineering-in-ml-35id</guid>
      <description>&lt;p&gt;Hey reader👋&lt;br&gt;
We know that we train machine learning model on a dataset and generate prediction on any unseen data based on training. The data which we are using here must be structured and well defined so that our algorithm can work efficiently. To make our data more meaningful and useful for our algorithm we perform Feature Engineering on our dataset. Feature Engineering is one of the most important steps in Machine Learning.&lt;br&gt;
In this blog we are going to learn about Feature Engineering and its importance. So let's get started🔥&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Engineering
&lt;/h2&gt;

&lt;p&gt;Feature Engineering is the process of using domain knowledge to extract features from raw data. These features can be used to improve the performance of a machine learning algorithm.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26e1wbe1imipuze6wix4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26e1wbe1imipuze6wix4.png" alt="Image description" width="800" height="288"&gt;&lt;/a&gt;&lt;br&gt;
So here you can see that we are working on a dataset: as a very first step we process the data, then we extract the important features using feature engineering, then we scale the features, i.e. transform them into the same unit. Once feature engineering is performed on the dataset, we apply the algorithm and then evaluate the metrics. For better model performance we perform feature engineering on the dataset again until we get a good model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Feature Engineering?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Improves Model Performance&lt;/strong&gt;: Well-crafted features can significantly enhance the predictive power of our models. The better the features, the more likely the model will capture the underlying patterns in the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduces Complexity&lt;/strong&gt;: By creating meaningful features, we can simplify the model's task, which often leads to better performance and reduced computational cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhances Interpretability&lt;/strong&gt;: Good features can make our model more interpretable, allowing us to understand and explain how the model makes its predictions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Techniques in Feature Engineering
&lt;/h2&gt;

&lt;p&gt;The key techniques of Feature Engineering are -:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Feature Transformation&lt;/strong&gt; -: We can transform features so that our model can perform effectively on them and give better results. This generally involves -:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Missing Value Imputation -: Techniques include imputation (filling missing values with mean, median, or mode), or using algorithms that can handle missing data directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handling Categorical Data -: Converting categorical variables into numerical ones using methods like one-hot encoding or label encoding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Outlier Detection -: Identifying and removing outliers can help in creating robust models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feature Scaling -:  Scaling features to a standard range or distribution can improve model performance, especially for distance-based algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Feature Construction&lt;/strong&gt; -: Sometimes, to make our data more meaningful, we add some extra information to our data based on the existing information. This process is called Feature Construction (a short sketch follows the list below). It can be done in the following ways -:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Polynomial Features: Creating interaction terms or polynomial terms of existing features to capture non-linear relationships.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain-Specific Features: Using domain knowledge to create features that capture essential characteristics of the data. For example, in a financial dataset, creating features like debt-to-income ratio or credit utilization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Datetime Features: Extracting information such as day, month, year, or even whether a date falls on a weekend or holiday can provide valuable insights.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
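
&lt;p&gt;A minimal sketch of constructing datetime, domain-style and polynomial features (the columns and values are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-14", "2024-03-09"]),
    "length": [2.0, 3.0, 4.0],
    "width":  [1.0, 5.0, 2.0],
})

# Datetime features: expose parts of the date the model can use directly.
df["order_month"] = df["order_date"].dt.month
df["is_weekend"] = df["order_date"].dt.dayofweek.ge(5).astype(int)

# Domain-style feature: combine existing columns into a more meaningful one.
df["area"] = df["length"] * df["width"]

# Polynomial / interaction features on the numeric columns.
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df[["length", "width"]])
print(poly.get_feature_names_out(["length", "width"]))
print(poly_features)
&lt;/code&gt;&lt;/pre&gt;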

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Feature Selection&lt;/strong&gt; -: Feature Selection is the process of selecting a subset of relevant features from the dataset to be used in a machine learning model. The different techniques we use for feature selection are -:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Filter Method: Based on the statistical measure of the relationship between the feature and the target variable. Features with a high correlation are selected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrapper Method: Based on the evaluation of the feature subset using a specific machine learning algorithm. The feature subset that results in the best performance is selected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedded Method: Based on the feature selection as part of the training process of the machine learning algorithm.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Feature Extraction&lt;/strong&gt; -: Feature Extraction is the process of creating new features from existing ones to provide more relevant information to the machine learning model. This is important because the representation of the features can strongly affect the performance of the model. The various techniques used for feature extraction are -:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dimensionality Reduction: Reducing the number of features by transforming the data into a lower-dimensional space while retaining important information. Examples are PCA and t-SNE.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feature Combination: Combining two or more existing features to create a new one. For example, the interaction between two features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feature Aggregation: Aggregating features to create a new one. For example, calculating the mean, sum, or count of a set of features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feature Transformation: Transforming existing features into a new representation. For example, log transformation of a feature with a skewed distribution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this was an introduction to feature engineering. In the upcoming blogs we are going to study each technique separately. Till then stay connected and don't forget to follow me.&lt;br&gt;
Thank you ❤&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Handling Categorical Values|| Machine Learning</title>
      <dc:creator>Neha Gupta</dc:creator>
      <pubDate>Wed, 03 Jul 2024 17:25:37 +0000</pubDate>
      <link>https://forem.com/ngneha09/handling-categorical-values-machine-learning-a2</link>
      <guid>https://forem.com/ngneha09/handling-categorical-values-machine-learning-a2</guid>
      <description>&lt;p&gt;Hey reader👋 Hope you are doing well😊&lt;br&gt;
We know that machine learning is all about training our models on a given dataset and generating accurate output for any similar unseen data. There are algorithms (regression algorithms, for example) that work on numerical data only, and we know that a dataset may contain numerical as well as categorical data. So how can we use algorithms that only work on numerical data on such a dataset? To use regression algorithms on categorical data we need to transform the categorical data into numerical form. But how can we do that?🤔&lt;br&gt;
Don't worry, in this blog I am going to tell you how we can handle categorical data.&lt;br&gt;
So let's get started🔥&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Categorical Data
&lt;/h2&gt;

&lt;p&gt;Categorical data refers to the categories in the data. Example -&amp;gt; male, female, red, green, yes or no.&lt;br&gt;
(To understand the types of data that we can encounter, please read this article: &lt;a href="https://dev.to/ngneha09/day-2-of-machine-learning-582g"&gt;https://dev.to/ngneha09/day-2-of-machine-learning-582g&lt;/a&gt;)&lt;br&gt;
There are different techniques in Python's sklearn library to handle categorical data. Let's read about them-:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Label Encoder&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Label Encoder identifies the unique categories within a categorical variable and then assigns a unique integer to each category. There is no strict rule on how these numerical labels are assigned; one common method is to assign labels based on the alphabetical order of the categories. It is best suited to ordinal categorical variables.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqctylo3s6trm2pzhcyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqctylo3s6trm2pzhcyj.png" alt="Image description" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf4uylgq525ft26gtxog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf4uylgq525ft26gtxog.png" alt="Image description" width="800" height="165"&gt;&lt;/a&gt;&lt;br&gt;
So here you can see that we have imported the LabelEncoder from sklearn's &lt;code&gt;preprocessing&lt;/code&gt; module, then created its instance and transformed the categories into numerical labels using &lt;code&gt;fit_transform&lt;/code&gt;.&lt;/p&gt;
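
&lt;p&gt;A minimal text sketch of the same steps (the colour values are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green", "red"]

le = LabelEncoder()
encoded = le.fit_transform(colors)

print(list(le.classes_))  # ['blue', 'green', 'red'] -- labels follow alphabetical order
print(list(encoded))      # [2, 1, 0, 1, 2]
&lt;/code&gt;&lt;/pre&gt;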

&lt;p&gt;Disadvantage-:&lt;br&gt;
Due to arbitrary assignment this technique may not reflect meaningful relationships in the data. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. One Hot Encoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This technique creates a binary feature for each category in the original variable.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgqqhfz9wr5pc7oz1xak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgqqhfz9wr5pc7oz1xak.png" alt="Image description" width="800" height="224"&gt;&lt;/a&gt;&lt;br&gt;
So here you can see that the first row has the color red, so color_red is assigned 1 and the other columns are assigned 0.&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuln2b822rxpr17yygq7v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuln2b822rxpr17yygq7v.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;br&gt;
Here we have imported OneHotEncoder, fitted it on the data and transformed the categories.&lt;/p&gt;
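
&lt;p&gt;A minimal text sketch of one-hot encoding (assuming a reasonably recent scikit-learn for &lt;code&gt;get_feature_names_out&lt;/code&gt;; &lt;code&gt;pd.get_dummies&lt;/code&gt; is a quick pandas alternative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["color"]]).toarray()  # one binary column per category

print(ohe.get_feature_names_out(["color"]))
print(encoded)

# pandas one-liner that does the same job:
print(pd.get_dummies(df, columns=["color"]))
&lt;/code&gt;&lt;/pre&gt;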

&lt;p&gt;Disadvantages-:&lt;br&gt;
With high-cardinality categorical variables this can create a sparse matrix, a matrix where most of the elements are 0. It can also result in increased dimensionality of the data. Also, it is not good for ordinal data as it doesn't preserve order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Binary Encoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This technique is a combination of hashing and binary representation. In this technique the unique categories are assigned unique integers, which are then converted into binary code (bit representation), and each bit becomes a separate column.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2ixtdmdsf8rypsw87pe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2ixtdmdsf8rypsw87pe.png" alt="Image description" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9a0wg4g67w6zetbr1ip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9a0wg4g67w6zetbr1ip.png" alt="Image description" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
Now you can see that the number of extra columns equals the number of bits needed to represent the largest integer assigned to the categories.&lt;br&gt;
This technique is best for nominal data where we have large number of categories.&lt;/p&gt;

&lt;p&gt;Disadvantage-:&lt;br&gt;
This technique is not good for ordinal data as it does not follow any order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Ordinal Encoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The critical aspect of Ordinal Encoding is to respect the inherent ordering of the categories. The integers should be assigned in such a way that the order of categories is preserved.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr6hojabbhn7mmupt8lv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr6hojabbhn7mmupt8lv.png" alt="Image description" width="327" height="137"&gt;&lt;/a&gt;&lt;br&gt;
So here you can see that Poor is assigned 1 then Good is assigned 2 and so on. So here the ordering of the categories is preserved.&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm32olm9okv1aqxlu1rk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm32olm9okv1aqxlu1rk.png" alt="Image description" width="800" height="293"&gt;&lt;/a&gt;&lt;br&gt;
Here the encoder takes a 2D array. We can see that the encoded data is in alphabetical order; this is because we have not given the encoder any particular order, so it encodes the data on the basis of alphabetical order.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdouewz4y4s3oxwjn0bck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdouewz4y4s3oxwjn0bck.png" alt="Image description" width="800" height="259"&gt;&lt;/a&gt;&lt;br&gt;
Here we have created an OrdinalEncoder instance with the specified order of categories.&lt;/p&gt;
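
&lt;p&gt;A minimal text sketch of ordinal encoding with an explicit order (the category names here are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.preprocessing import OrdinalEncoder

ratings = [["Poor"], ["Good"], ["Excellent"], ["Good"]]

# Pass the desired order explicitly so it is preserved in the encoding.
oe = OrdinalEncoder(categories=[["Poor", "Good", "Excellent"]])
print(oe.fit_transform(ratings))  # [[0.], [1.], [2.], [1.]]
&lt;/code&gt;&lt;/pre&gt;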

&lt;p&gt;Disadvantages-:&lt;br&gt;
This encoding is not suitable for nominal variables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Frequency Encoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is used for nominal categorical variables with high cardinality. In this technique we calculate the frequency of each category, and the encoded value is the count of that category divided by the total number of observations.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65bdu5wud8dlhuygf7k5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65bdu5wud8dlhuygf7k5.png" alt="Image description" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkv3cvcflrjc1xxg6p70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkv3cvcflrjc1xxg6p70.png" alt="Image description" width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Disadvantage-:&lt;br&gt;
The major disadvantage of this technique is that multiple categories can have the same frequency and, as a result, will get the same encoding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Mean Encoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this technique each category in the feature variable is replaced with the mean value of the target variable for that category. &lt;br&gt;
Example -: Suppose we are predicting the price of a car (target variable) and we have a categorical variable 'Color'. If the average price of red cars is $20,000, then 'Red' would be replaced by 20,000 in the encoded feature.&lt;br&gt;
It is useful when dealing with high cardinality Categorical features.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foahuk8ebv0527wyhrn2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foahuk8ebv0527wyhrn2n.png" alt="Image description" width="800" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Implementation-:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjrfnyr454gth0t1kg43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjrfnyr454gth0t1kg43.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;br&gt;
Here we have calculated the mean of the target variable for each category, mapped the original categories to their corresponding means and replaced each category with the computed mean.&lt;br&gt;
It has high chances of capturing any existing relationship between category and target variable.&lt;/p&gt;
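
&lt;p&gt;A minimal text sketch of mean (target) encoding with pandas (the colours and prices are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue"],
    "price": [21000, 15000, 19000, 17000, 16000],
})

# Mean of the target for each category, mapped back onto the feature column.
means = df.groupby("color")["price"].mean()
df["color_encoded"] = df["color"].map(means)
print(df)
&lt;/code&gt;&lt;/pre&gt;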

&lt;p&gt;Disadvantages-:&lt;br&gt;
Mean encoding can lead to overfitting, especially when categories have few observations. Regularization techniques, such as smoothing, can help mitigate this risk.&lt;/p&gt;

&lt;p&gt;So this is how we handle categorical values. I hope you have understood it well. For more, don't forget to follow me.&lt;br&gt;
Thank you❤&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
