<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: krish</title>
    <description>The latest articles on Forem by krish (@krishsharma0413).</description>
    <link>https://forem.com/krishsharma0413</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1219096%2Ff8e98872-b1ba-4c85-8105-ce2557269769.png</url>
      <title>Forem: krish</title>
      <link>https://forem.com/krishsharma0413</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/krishsharma0413"/>
    <language>en</language>
    <item>
      <title>[Project Breakdown] Syncord: Encrypted File Storage via Discord</title>
      <dc:creator>krish</dc:creator>
      <pubDate>Mon, 29 Dec 2025 22:08:35 +0000</pubDate>
      <link>https://forem.com/krishsharma0413/project-breakdown-syncord-using-discord-as-file-storage-system-2l0c</link>
      <guid>https://forem.com/krishsharma0413/project-breakdown-syncord-using-discord-as-file-storage-system-2l0c</guid>
      <description>&lt;p&gt;In today’s episode of project breakdown (yes, I’m starting this just now), we’ll look at one of my recent projects: &lt;strong&gt;Syncord&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Syncord?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/krishsharma0413/syncord" rel="noopener noreferrer"&gt;Syncord&lt;/a&gt; is a CLI-based tool that uses Discord as encrypted file storage.&lt;/p&gt;

&lt;p&gt;It lets you partition and encrypt files or entire directories locally, upload them to Discord, and later download and reconstruct them, all through a simple command-line interface or an optional TUI dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26rkldcu9w68nu8covew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26rkldcu9w68nu8covew.png" alt="start-up TUI image" width="627" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short, it is more or less Google Drive, except that instead of Google it's Discord, and instead of a drive it's encrypted partitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Even Create This Project?
&lt;/h2&gt;

&lt;p&gt;I’ve seen similar projects before, but most of them relied on a web server (Flask, Express.js) and a browser-based workflow. I personally didn’t enjoy hosting a server just to upload or download files.&lt;/p&gt;

&lt;p&gt;Syncord was built to be &lt;strong&gt;purely command-line driven&lt;/strong&gt;: something I could put in my PATH and use instantly without opening a browser.&lt;/p&gt;

&lt;p&gt;There are objectively better file storage solutions, such as Google Drive or OneDrive, and I’d recommend those for most users. Syncord exists primarily as a &lt;strong&gt;learning project and experiment&lt;/strong&gt;, built during my college winter break.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;People might question whether this violates Discord’s ToS. Syncord uses a user-owned bot, standard attachment uploads, and documented APIs, and it does not exploit bugs. However, excessive or abusive usage could still result in rate limiting or action against the bot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Discord?
&lt;/h2&gt;

&lt;p&gt;Well, Discord allows uploads of up to 8MB per message, and while it does not guarantee long-term retention, in practice files persist unless they are removed or moderated (I still have photos I sent to a friend on Discord back in 2019). So why not use it?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am thinking of adding Telegram or other "free" storage providers too, but that will most likely not happen anytime soon.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Core Idea
&lt;/h2&gt;

&lt;p&gt;Project structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;syncord/
├── core/
│   ├── ascii_hell.py
│   ├── db_manager.py
│   ├── discord_handler.py
│   ├── encrypter.py
│   ├── partition.py
│   ├── setup.py
│   └── tui.py
├── main.py
├── setup.yaml
├── syncord.db
└── syncord.exe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let us talk about the 8MB upload limit first.&lt;/p&gt;

&lt;h4&gt;
  
  
  File size &amp;lt;= 8MB
&lt;/h4&gt;

&lt;p&gt;If a file is 8MB or smaller, it can be uploaded to Discord as-is without any issues.&lt;/p&gt;

&lt;h4&gt;
  
  
  File size &amp;gt; 8MB
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;The problem starts when the file is larger than 8MB&lt;/em&gt;. To tackle that, I created a &lt;strong&gt;partition system&lt;/strong&gt; that splits files into fixed-size chunks and a &lt;strong&gt;SQLite DB&lt;/strong&gt; that tracks each partition's number and ID.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is this partition system? &lt;code&gt;core/partition.py&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;When a file is uploaded via &lt;a href="https://github.com/krishsharma0413/syncord" rel="noopener noreferrer"&gt;syncord&lt;/a&gt;, it is split into 5.5MB chunks and the metadata is stored in a database.&lt;/p&gt;

&lt;p&gt;The chunk files are named &lt;code&gt;{number}.bin&lt;/code&gt;, since each one is essentially a slice of the file's raw binary data.&lt;/p&gt;
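
&lt;p&gt;A minimal sketch of the chunking idea, assuming "5.5MB" means 5.5 * 1024 * 1024 bytes. The function names &lt;code&gt;split_bytes&lt;/code&gt; and &lt;code&gt;join_chunks&lt;/code&gt; are hypothetical and not taken from &lt;code&gt;core/partition.py&lt;/code&gt;, so the real implementation may differ.&lt;/p&gt;

```python
CHUNK_SIZE = int(5.5 * 1024 * 1024)  # assumed meaning of "5.5MB"


def split_bytes(data, chunk_size=CHUNK_SIZE):
    """Split raw bytes into numbered chunks: {1: b"...", 2: b"...", ...}."""
    chunks = {}
    for number, start in enumerate(range(0, len(data), chunk_size), start=1):
        chunks[number] = data[start:start + chunk_size]
    return chunks


def join_chunks(chunks):
    """Rebuild the original bytes by concatenating chunks in numeric order."""
    return b"".join(chunks[number] for number in sorted(chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # 2560 bytes of sample data
    parts = split_bytes(payload, chunk_size=1000)
    for number, part in parts.items():
        print(f"{number}.bin holds {len(part)} bytes")
    print(join_chunks(parts) == payload)
```

&lt;p&gt;Numbering the chunks is what makes reconstruction possible later: as long as the order is known, joining them gives back the original bytes.&lt;/p&gt;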

&lt;p&gt;&lt;strong&gt;DB Table:&lt;/strong&gt; &lt;code&gt;core/database_manager.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;syncord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parition_number&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parition_uuid&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_id&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;folder_name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_size_bytes&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table has the following columns (yes, the typo is in the DB itself and I never fixed it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;parition_number&lt;/strong&gt;: Which number partition this record represents. (e.g. 1.bin, 2.bin, etc)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parition_uuid&lt;/strong&gt;: Identifies which file this partition belongs to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;message_id&lt;/strong&gt;: The ID of the Discord message, which we get after uploading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;file_name&lt;/strong&gt;: The original file name along with its extension.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;folder_name&lt;/strong&gt;: Since &lt;a href="https://github.com/krishsharma0413/syncord" rel="noopener noreferrer"&gt;syncord&lt;/a&gt; supports uploading entire folders, this records where the file sits in the folder tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;file_size_bytes&lt;/strong&gt;: Currently used for stats. It could later be used as a basic integrity check on the file.&lt;/li&gt;
&lt;/ul&gt;
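
&lt;p&gt;To see how these columns work together, here is a small sketch that uses the schema above with an in-memory SQLite database. The rows and the helper &lt;code&gt;message_ids_for&lt;/code&gt; are made up for illustration and are not from the project's code.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
conn.execute(
    "CREATE TABLE IF NOT EXISTS syncord("
    "parition_number integer, parition_uuid text, "
    "message_id text primary key, file_name text, "
    "folder_name text, file_size_bytes integer)"
)

# Hypothetical rows: one file split into three partitions, stored out of order.
rows = [
    (2, "uuid-a", "1002", "video.mp4", "default", 12000000),
    (1, "uuid-a", "1001", "video.mp4", "default", 12000000),
    (3, "uuid-a", "1003", "video.mp4", "default", 12000000),
]
conn.executemany("INSERT INTO syncord VALUES (?, ?, ?, ?, ?, ?)", rows)


def message_ids_for(conn, folder_name, file_name):
    """Return the message IDs of a file's partitions, in partition order."""
    cursor = conn.execute(
        "SELECT message_id FROM syncord "
        "WHERE folder_name = ? AND file_name = ? "
        "ORDER BY parition_number",
        (folder_name, file_name),
    )
    return [row[0] for row in cursor]


print(message_ids_for(conn, "default", "video.mp4"))
```

&lt;p&gt;Ordering by &lt;code&gt;parition_number&lt;/code&gt; is the important part: the message IDs come back in the order the chunks have to be joined.&lt;/p&gt;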

&lt;h3&gt;
  
  
  What about privacy?
&lt;/h3&gt;

&lt;p&gt;While uploading to Discord sounds like a wonderful idea to some people, others might question how safe it is, since anyone who joins your Discord server could see the files. For that, I have implemented an encryption system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Encryption System &lt;code&gt;core/encrypter.py&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;I used &lt;strong&gt;Fernet&lt;/strong&gt;, a symmetric, authenticated encryption scheme from Python's cryptography library. It uses a single secret key for both encryption and decryption, so the data can't be read or tampered with unless you have the key. Each chunk's bytes are encrypted before being written to the &lt;code&gt;.bin&lt;/code&gt; file that gets uploaded. The key is stored locally (in setup.yaml) on the PC itself rather than on Discord.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Since Fernet adds its own metadata, padding, and base64 encoding on top of the data, it increases the file size by roughly 33%, which is why I chunk at 5.5MB rather than at 8MB.&lt;/em&gt;&lt;/p&gt;
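
&lt;p&gt;The size arithmetic can be sketched from the published Fernet token layout: a 1-byte version, an 8-byte timestamp, a 16-byte IV, the AES-CBC ciphertext with PKCS7 padding, and a 32-byte HMAC, all base64url-encoded. This assumes "MB" means 1024 * 1024 bytes, and &lt;code&gt;fernet_token_size&lt;/code&gt; is a hypothetical helper, not part of syncord or the cryptography library.&lt;/p&gt;

```python
MIB = 1024 * 1024


def fernet_token_size(plaintext_len):
    """Size in bytes of a Fernet token for a plaintext of the given length."""
    # PKCS7 always adds 1 to 16 bytes to reach a multiple of the 16-byte block.
    ciphertext_len = (plaintext_len // 16 + 1) * 16
    # version (1) + timestamp (8) + IV (16) + ciphertext + HMAC (32)
    raw_len = 1 + 8 + 16 + ciphertext_len + 32
    # base64 turns every 3 raw bytes into 4 output characters (with padding)
    return 4 * ((raw_len + 2) // 3)


chunk = int(5.5 * MIB)
token = fernet_token_size(chunk)
print(f"{chunk} plaintext bytes become {token} token bytes")
print(f"overhead: {100 * (token - chunk) / chunk:.2f}%")
print(f"headroom under 8 MiB: {8 * MIB - token} bytes")
```

&lt;p&gt;Almost all of the growth comes from base64, which emits 4 characters for every 3 bytes. By the same arithmetic, a full 8MB chunk would grow well past the upload limit once encrypted, so the chunk size has to leave room for it.&lt;/p&gt;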

&lt;h3&gt;
  
  
  How is it stored on discord?
&lt;/h3&gt;

&lt;p&gt;I am using &lt;a href="https://github.com/Pycord-Development/pycord" rel="noopener noreferrer"&gt;py-cord&lt;/a&gt;, a library commonly used for building Discord bots. A channel named &lt;code&gt;syncord&lt;/code&gt; is created in the guild with the provided guild ID. All the files are sent together via the API, and their message IDs are stored in the DB so the files can be retrieved again on download.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;setup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;bot_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_bot_token"&lt;/span&gt;
  &lt;span class="na"&gt;encryption_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto_generated_using_fernet"&lt;/span&gt;
  &lt;span class="na"&gt;guild_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1234567890123456789&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To sum up, this is what happens in the background.&lt;br&gt;
&lt;strong&gt;Upload&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;File is partitioned into 5.5MB chunks (if needed)&lt;/li&gt;
&lt;li&gt;Each chunk is encrypted using Fernet&lt;/li&gt;
&lt;li&gt;Chunks are uploaded via a Discord bot&lt;/li&gt;
&lt;li&gt;Message IDs and metadata are stored locally in SQLite&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Download&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Metadata is read from the local database&lt;/li&gt;
&lt;li&gt;Chunks are fetched using stored message IDs&lt;/li&gt;
&lt;li&gt;Data is decrypted locally&lt;/li&gt;
&lt;li&gt;Original file is reconstructed and saved&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;
&lt;h4&gt;
  
  
  Prebuilt Binary (Recommended)
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Download &lt;code&gt;syncord.exe&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Place it in a directory included in your system PATH&lt;/li&gt;
&lt;li&gt;Verify installation:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;syncord can now be invoked from any directory.&lt;/p&gt;
&lt;h3&gt;
  
  
  Mandatory Initial Setup
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Syncord requires setup before first use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both the Discord bot token and Guild ID are mandatory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Required Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Discord Bot Token&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord Guild ID&lt;/strong&gt; (server where files are uploaded)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption Key&lt;/strong&gt; (generated locally)
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord setup &lt;span class="nt"&gt;--token&lt;/span&gt; YOUR_DISCORD_BOT_TOKEN &lt;span class="nt"&gt;--guild-id&lt;/span&gt; YOUR_GUILD_ID &lt;span class="nt"&gt;--encryption-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Important Notes
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The encryption key is critical.&lt;/li&gt;
&lt;li&gt;Losing the key means &lt;strong&gt;permanent data loss&lt;/strong&gt;. &lt;/li&gt;
&lt;li&gt;Discord cannot decrypt or recover your files.&lt;/li&gt;
&lt;li&gt;syncord will not function without a valid token and guild ID.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Uploading Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upload a Single File&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord upload file.ext
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Upload an Entire Directory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkp8mjzzf7866t7g85frp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkp8mjzzf7866t7g85frp.png" alt="TUI with folders and files image" width="800" height="681"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord upload folder_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Behavior&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All files inside the directory are processed&lt;/li&gt;
&lt;li&gt;Folder structure is logically preserved&lt;/li&gt;
&lt;li&gt;Each directory becomes its own Syncord folder namespace&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Downloading Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord download folder_on_syncord/file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Syncord will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Locate all required partitions&lt;/li&gt;
&lt;li&gt;Download them from Discord&lt;/li&gt;
&lt;li&gt;Decrypt locally&lt;/li&gt;
&lt;li&gt;Reassemble the original file&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If no folder is provided, Syncord assumes the default upload namespace &lt;code&gt;default&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  TUI Dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewpgl8x7izn0kjm3oiol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewpgl8x7izn0kjm3oiol.png" alt="TUI image" width="800" height="536"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stored file overview&lt;/li&gt;
&lt;li&gt;Folder and file downloads&lt;/li&gt;
&lt;li&gt;Storage usage&lt;/li&gt;
&lt;li&gt;Interactive navigation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Usage Statistics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;syncord stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Displays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total files uploaded&lt;/li&gt;
&lt;li&gt;Total downloads&lt;/li&gt;
&lt;li&gt;Storage usage&lt;/li&gt;
&lt;li&gt;Other tracked metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Command Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syncord setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mandatory initial configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syncord upload&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Upload a file or directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syncord download&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Download and reconstruct a file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syncord dashboard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Launch terminal UI dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;syncord stats&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Show usage statistics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Limitations &amp;amp; Tradeoffs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Discord is not a storage service and offers no durability guarantees.&lt;/li&gt;
&lt;li&gt;Heavy usage may trigger rate limits or bot restrictions.&lt;/li&gt;
&lt;li&gt;Loss of the encryption key results in permanent data loss.&lt;/li&gt;
&lt;li&gt;Metadata (filenames, folder names) is stored locally but not encrypted.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This was my first time building a TUI-based application, coming from a background focused mainly on back-end API development. The project was fun.&lt;/p&gt;

&lt;p&gt;Writing this blog also motivated me to start documenting my past and future projects. I hope the &lt;strong&gt;dev.to&lt;/strong&gt; community enjoys these breakdowns. I plan to make them a regular thing.&lt;/p&gt;

&lt;p&gt;This was also one of my first blog posts written specifically about a project. I apologize if it was a little challenging to read or comprehend; I intend to improve with time and practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Link&lt;/strong&gt;: &lt;a href="https://github.com/krishsharma0413/syncord" rel="noopener noreferrer"&gt;https://github.com/krishsharma0413/syncord&lt;/a&gt; ⭐ if you like what you see.&lt;br&gt;
&lt;strong&gt;personal rating&lt;/strong&gt;: 8.5/10&lt;br&gt;
Here is a &lt;a href="https://github.com/krishsharma0413/syncord/releases/tag/v0.0.1" rel="noopener noreferrer"&gt;video&lt;/a&gt; of the entire project. &lt;/p&gt;

</description>
      <category>programming</category>
      <category>sideprojects</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Different Encoding Methods for your Dataset.</title>
      <dc:creator>krish</dc:creator>
      <pubDate>Tue, 16 Jul 2024 04:27:32 +0000</pubDate>
      <link>https://forem.com/krishsharma0413/different-encoding-methods-for-your-dataset-3ceh</link>
      <guid>https://forem.com/krishsharma0413/different-encoding-methods-for-your-dataset-3ceh</guid>
      <description>&lt;p&gt;Hey there, data enthusiasts! 🎀&lt;br&gt;&lt;br&gt;
In the exciting world of data science and machine learning, one of the first and most crucial steps is turning raw data into a format that our models can understand and learn from. This process, called data preprocessing, involves several important steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Cleaning&lt;/strong&gt;: Removal of noise and inconsistent data. Say a feature has 80% null values: would you still keep it? What about 20% null values? Those can often be filled using statistics such as the mean (for numerical data) or the mode (for categorical data).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt;: Combine multiple data sources for better predictions. E.g., combining a driver's medical records with race and season data to predict their position in an F1 race. Health data on its own may not be very helpful, but using it as a weight for previous race positions can make it far more useful!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Selection&lt;/strong&gt;: Select the important and useful data. Use feature engineering to find the best features for your model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Transformation&lt;/strong&gt;: Data is transformed and consolidated for mining through encoding and feature engineering. I consider this the most important step before data mining, since most models cannot make use of data that has not been encoded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Mining&lt;/strong&gt;: Intelligent methods are applied to extract data patterns &lt;strong&gt;OR&lt;/strong&gt; &lt;em&gt;Extraction of implicit, previously unknown and potentially useful information from data&lt;/em&gt;. E.g., using the race year and the driver's DOB to compute the driver's age, which provides a new insight while removing 2 columns from the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Evaluation&lt;/strong&gt;: Identify the truly interesting patterns using various evaluation metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Presentation&lt;/strong&gt;: Create graphs and stats such as charts, heatmaps, and much more. Understand your data and improve wherever needed using the steps above.&lt;/li&gt;
&lt;/ol&gt;
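
&lt;p&gt;As a quick illustration of the data cleaning step, here is a rough sketch of filling missing values using only Python's standard library. The columns and the helper &lt;code&gt;fill_missing&lt;/code&gt; are made up for this example; in practice you would more likely use pandas for this.&lt;/p&gt;

```python
from statistics import mean, mode

# Hypothetical columns with missing values marked as None.
ages = [31, None, 27, 35, None]
teams = ["Ferrari", "McLaren", None, "Ferrari", "Ferrari"]


def fill_missing(values, statistic):
    """Replace None entries with a statistic computed from the known values."""
    known = [v for v in values if v is not None]
    fill_value = statistic(known)
    return [fill_value if v is None else v for v in values]


print(fill_missing(ages, mean))   # numerical column: fill with the mean
print(fill_missing(teams, mode))  # categorical column: fill with the mode
```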

&lt;p&gt;Central to this preprocessing is the task of encoding. This blog delves into the various encoding methodologies, providing a comprehensive analysis of them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Importance of Encoding
&lt;/h2&gt;

&lt;p&gt;Encoding is a crucial step in the data preprocessing pipeline, especially when dealing with categorical data. Categorical variables, which represent data that can be divided into specific groups or categories, often need to be converted into a numerical format for machine learning algorithms to process them effectively. This conversion process is known as encoding. Machine learning models typically require numerical input because they are based on mathematical calculations that cannot interpret categorical data directly. By transforming categorical data into numerical values through various encoding techniques, we can ensure that our models can leverage all available information, leading to better performance and more accurate predictions. Encoding not only makes data suitable for analysis but also helps preserve the relationships and characteristics inherent in the original categorical variables.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;No sane person codes on paper, he who codes on paper has mastered the essence of coding or the truth behind the universe itself.&lt;/em&gt; - ME🎀&lt;/p&gt;

&lt;p&gt;Install the following required &lt;strong&gt;Python libraries&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install scikit-learn pandas category_encoders
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Different datasets require different encoding methods, so each encoding method below may use a different example.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Encoding
&lt;/h2&gt;

&lt;p&gt;While there are many encoding methods, we will focus on the most important and widely used ones.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Multi-Hot Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Label Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ordinal Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Binary Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Target Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Frequency Encoding&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Multi-Hot Encoding
&lt;/h3&gt;

&lt;p&gt;This method converts categorical data into a binary-like format. Each set of categorical values is mapped to a binary vector whose length equals the number of categories. &lt;em&gt;This method is usually used in classification models.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example: Imagine you have a dataset of music tracks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Artist&lt;/th&gt;
&lt;th&gt;Genre&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fly Me to the Moon&lt;/td&gt;
&lt;td&gt;The Macarons Project&lt;/td&gt;
&lt;td&gt;["slow", "acoustic", "pop"]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mad at Disney&lt;/td&gt;
&lt;td&gt;Salem ilese&lt;/td&gt;
&lt;td&gt;["dance", "pop"]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code&gt;genre&lt;/code&gt; is the feature we need to encode, since a model cannot work effectively with an array of genre names.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MultiLabelBinarizer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Creating the dataframe with list of genres per song
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fly Me to the Moon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mad at Disney&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;artist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The Macarons Project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Salem ilese&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acoustic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Using MultiLabelBinarizer to handle the list of genres
&lt;/span&gt;&lt;span class="n"&gt;mlb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiLabelBinarizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Creating the encoded dataframe
&lt;/span&gt;&lt;span class="n"&gt;encoded_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_encoded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mlb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classes_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Concatenating the original columns with the encoded genres
&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;encoded_df&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;artist&lt;/th&gt;
&lt;th&gt;acoustic&lt;/th&gt;
&lt;th&gt;dance&lt;/th&gt;
&lt;th&gt;pop&lt;/th&gt;
&lt;th&gt;slow&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fly Me to the Moon&lt;/td&gt;
&lt;td&gt;The Macarons Project&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mad at Disney&lt;/td&gt;
&lt;td&gt;Salem ilese&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The data is now encoded by genre, where &lt;strong&gt;1&lt;/strong&gt; means &lt;strong&gt;HOT&lt;/strong&gt; (&lt;em&gt;or present&lt;/em&gt;) and &lt;strong&gt;0&lt;/strong&gt; means &lt;strong&gt;COLD&lt;/strong&gt; (&lt;em&gt;or absent&lt;/em&gt;). &lt;strong&gt;One-Hot Encoding&lt;/strong&gt; takes a similar approach for features that hold a single category per row, but &lt;em&gt;Binary Encoding&lt;/em&gt; or &lt;em&gt;Label Encoding&lt;/em&gt; is often the better choice in those cases.&lt;/p&gt;
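
&lt;p&gt;To make the idea concrete, here is a minimal pure-Python sketch of multi-hot encoding that reproduces the table above. The function &lt;code&gt;multi_hot&lt;/code&gt; is a hypothetical helper written for illustration; it is not how &lt;code&gt;MultiLabelBinarizer&lt;/code&gt; is implemented.&lt;/p&gt;

```python
def multi_hot(rows):
    """Encode lists of labels as 0/1 vectors over the sorted label set."""
    classes = sorted({label for labels in rows for label in labels})
    vectors = [[1 if c in labels else 0 for c in classes] for labels in rows]
    return classes, vectors


genres = [["slow", "acoustic", "pop"], ["dance", "pop"]]
classes, vectors = multi_hot(genres)
print(classes)
for vector in vectors:
    print(vector)
```

&lt;p&gt;Each position in a vector corresponds to one genre in the sorted class list, so every song gets a vector of the same length no matter how many genres it has.&lt;/p&gt;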
&lt;h3&gt;
  
  
  Label Encoding
&lt;/h3&gt;

&lt;p&gt;This method converts each categorical value into an integer.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It is similar to multi-hot encoding in a way. The key difference is that Label Encoding might inadvertently introduce ordinal relationships where none exist, which can mislead some algorithms. Multi-hot encoding avoids this by treating each category independently.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example: Companies sell shirts in different sizes and colours at a price of &lt;code&gt;X&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Colour&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;red&lt;/td&gt;
&lt;td&gt;L&lt;/td&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;blue&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;ACM&lt;/td&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;red&lt;/td&gt;
&lt;td&gt;XL&lt;/td&gt;
&lt;td&gt;Zara&lt;/td&gt;
&lt;td&gt;568&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;green&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Gucci&lt;/td&gt;
&lt;td&gt;927&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here we need to encode all 3 categorical columns: &lt;code&gt;Colour&lt;/code&gt;, &lt;code&gt;Size&lt;/code&gt;, and &lt;code&gt;Company&lt;/code&gt;. We will use Label Encoding, since the bias it adds can help the model predict with better accuracy.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LabelEncoder&lt;/span&gt;

&lt;span class="c1"&gt;# Creating the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;green&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;L&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;XL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Max&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Zara&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Gucci&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;230&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;568&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;927&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Label Encoding for 'Colour', 'Size', and 'Company'
&lt;/span&gt;&lt;span class="n"&gt;label_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Drop the original categorical columns after encoding
&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Colour_encoded&lt;/th&gt;
&lt;th&gt;Size_encoded&lt;/th&gt;
&lt;th&gt;Company_encoded&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;568&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;927&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By default, the numerical values are assigned by sorting the categories (alphabetically or numerically). If we want to deliberately impose our own order on the encoding, we should look into &lt;em&gt;Ordinal Encoding&lt;/em&gt;.&lt;/p&gt;
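&lt;p&gt;The sorting behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation, and the helper name &lt;code&gt;label_encode&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of categories."""
    classes = sorted(set(values))
    mapping = {category: index for index, category in enumerate(classes)}
    return [mapping[value] for value in values], mapping


colours = ['red', 'blue', 'red', 'green']
sizes = ['L', 'S', 'XL', 'S']
companies = ['Max', 'ACM', 'Zara', 'Gucci']

colour_codes, colour_map = label_encode(colours)
size_codes, size_map = label_encode(sizes)
company_codes, company_map = label_encode(companies)

print(colour_map)     # {'blue': 0, 'green': 1, 'red': 2}
print(size_codes)     # [0, 1, 2, 1]
print(company_codes)  # [2, 0, 3, 1]
```

&lt;p&gt;Note that alphabetical sorting places &lt;code&gt;L&lt;/code&gt; before &lt;code&gt;S&lt;/code&gt;, so the codes do not follow the real size order. That is exactly the kind of accidental ordering to watch out for.&lt;/p&gt;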
&lt;h3&gt;
  
  
  Ordinal Encoding
&lt;/h3&gt;

&lt;p&gt;Similar to &lt;strong&gt;Label Encoding&lt;/strong&gt;, with the only difference that we ourselves provide a specific order of importance for the categories (unlike the label encoder, which sorted all categories to number them).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; In the Label Encoding example, the companies should be encoded in an order we choose, since we know companies like &lt;strong&gt;Gucci&lt;/strong&gt; or &lt;strong&gt;Zara&lt;/strong&gt; tend to sell T-shirts at higher prices.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Colour&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;red&lt;/td&gt;
&lt;td&gt;L&lt;/td&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;blue&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;ACM&lt;/td&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;red&lt;/td&gt;
&lt;td&gt;XL&lt;/td&gt;
&lt;td&gt;Zara&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;green&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Gucci&lt;/td&gt;
&lt;td&gt;927&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's use &lt;code&gt;["ACM", "Max", "Zara", "Gucci"]&lt;/code&gt; as our order, from the cheapest to the most expensive T-shirts.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrdinalEncoder&lt;/span&gt;

&lt;span class="c1"&gt;# Creating the dataframe
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;green&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;L&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;XL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Max&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Zara&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Gucci&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;230&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;241&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;927&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Label Encoding for 'Colour' and 'Size'
&lt;/span&gt;&lt;span class="n"&gt;label_encoder_colour&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;label_encoder_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_encoder_colour&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_encoder_size&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Ordinal Encoding for 'Company' with the specified cheap-to-expensive order
&lt;/span&gt;&lt;span class="n"&gt;company_order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Zara&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gucci&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ordinal_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrdinalEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;company_order&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company_encoded&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ordinal_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

&lt;span class="c1"&gt;# Drop the original categorical columns after encoding
&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_final&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Colour_encoded&lt;/th&gt;
&lt;th&gt;Size_encoded&lt;/th&gt;
&lt;th&gt;Company_encoded&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;927&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model now sees the companies ranked from cheapest to most expensive, which deliberately biases its predictions by company name.&lt;/p&gt;
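&lt;p&gt;The same idea can be sketched without scikit-learn: each value is replaced by its position in an order we supply. The helper &lt;code&gt;ordinal_encode&lt;/code&gt; and the size order &lt;code&gt;["S", "L", "XL"]&lt;/code&gt; are assumptions made for this illustration.&lt;/p&gt;

```python
def ordinal_encode(values, order):
    """Encode each value as its position in the caller-supplied order."""
    rank = {category: position for position, category in enumerate(order)}
    return [rank[value] for value in values]


companies = ['Max', 'ACM', 'Zara', 'Gucci']
sizes = ['L', 'S', 'XL', 'S']

# Company order from the article: cheapest to most expensive.
company_codes = ordinal_encode(companies, ["ACM", "Max", "Zara", "Gucci"])

# Assumed size order (smallest to largest), not taken from the article's code.
size_codes = ordinal_encode(sizes, ["S", "L", "XL"])

print(company_codes)  # [1, 0, 2, 3]
print(size_codes)     # [1, 0, 2, 0]
```

&lt;p&gt;Unlike the sorted label codes, these numbers carry the ranking we actually intend.&lt;/p&gt;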
&lt;h3&gt;
  
  
  Binary Encoding
&lt;/h3&gt;

&lt;p&gt;This method converts each categorical value into binary digits (0s and 1s) and then stores them as separate columns. This is useful when you have many categories to encode and want to reduce dimensionality compared to multi-hot encoding.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It converts each category into a binary code and then splits the binary digits into separate columns. This results in roughly log2(N) columns, whereas multi-hot encoding would produce N columns.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Encoding just the colours into binary columns.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Colour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Green&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;category_encoders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BinaryEncoder&lt;/span&gt;

&lt;span class="c1"&gt;# Sample data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Green&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

&lt;span class="c1"&gt;# Create a BinaryEncoder object
&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BinaryEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Colour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Encode the categorical feature
&lt;/span&gt;&lt;span class="n"&gt;encoded_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoded_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Colour_0&lt;/th&gt;
&lt;th&gt;Colour_1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Most of the time, if there are only a few categories, we should use multi-hot encoding or label encoding instead.&lt;/em&gt;&lt;/p&gt;
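&lt;p&gt;The binary digits in the table above can be reproduced by hand. The sketch below numbers the categories from 1 in order of first appearance, which is an assumption chosen to match that table rather than a statement about the library's internals. The helper name &lt;code&gt;binary_encode&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def binary_encode(values):
    """Return one row of bits per value, plus the number of bit columns."""
    # Assumption: categories are numbered 1..N in order of first appearance.
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1

    # Number of bits needed to write the largest code.
    width = max(codes.values()).bit_length()

    rows = [
        [int(bit) for bit in format(codes[value], f"0{width}b")]
        for value in values
    ]
    return rows, width


rows, width = binary_encode(['Red', 'Green', 'Blue', 'Red'])
print(rows)   # [[0, 1], [1, 0], [1, 1], [0, 1]]
print(width)  # 2

# With 100 distinct categories only 7 bit columns are needed.
_, wide = binary_encode([f"cat{i}" for i in range(100)])
print(wide)   # 7
```

&lt;p&gt;A multi-hot or one-hot layout would need 100 columns for that last case, which is where binary encoding pays off.&lt;/p&gt;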
&lt;h3&gt;
  
  
  Target Encoding
&lt;/h3&gt;

&lt;p&gt;Also known as &lt;em&gt;Mean Encoding&lt;/em&gt; or &lt;em&gt;Likelihood Encoding&lt;/em&gt;. This method encodes categorical values by replacing each category with a statistic of the target variable for that category, usually its mean.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Highly recommended and very useful for handling high cardinality categorical variables. This captures relationship between the categorical variables and the target variable more effectively than one-hot encoding.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;


&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;\text{Encoding Value} = \frac{(n \times \text{Categorical Mean}) + (m \times \text{Global Mean})}{n + m}&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;Here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n: number of samples in the category.&lt;/li&gt;
&lt;li&gt;m: smoothing parameter; a larger m pulls the encoded value towards the global mean.&lt;/li&gt;
&lt;/ul&gt;
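&lt;p&gt;As a quick check of the formula, here is a minimal sketch that evaluates it directly. The function name &lt;code&gt;smoothed_target_encoding&lt;/code&gt; and the input numbers are illustrative assumptions.&lt;/p&gt;

```python
def smoothed_target_encoding(n, category_mean, m, global_mean):
    """Blend the category mean with the global mean, weighted by n and m."""
    return (n * category_mean + m * global_mean) / (n + m)


# Illustrative numbers: a category with 2 samples averaging 550000,
# a global mean of 520000, and a smoothing parameter of 1.
value = smoothed_target_encoding(
    n=2, category_mean=550_000, m=1, global_mean=520_000
)
print(value)  # 540000.0

# With m = 0 there is no smoothing, so the category mean is returned as is.
unsmoothed = smoothed_target_encoding(
    n=2, category_mean=550_000, m=0, global_mean=520_000
)
print(unsmoothed)  # 550000.0
```

&lt;p&gt;Smoothing matters most for rare categories, where a mean over only a few samples is unreliable.&lt;/p&gt;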

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; In a house price prediction model, encoding neighborhood names with the mean house price in each area would provide more insight than plain label encoding.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;House Number&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Neighborhood&lt;/th&gt;
&lt;th&gt;Size (sq meter)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;500000&lt;/td&gt;
&lt;td&gt;Downtown&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;350000&lt;/td&gt;
&lt;td&gt;Suburb&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;700000&lt;/td&gt;
&lt;td&gt;City Center&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;450000&lt;/td&gt;
&lt;td&gt;Suburb&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;600000&lt;/td&gt;
&lt;td&gt;Downtown&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Original dataset
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;House Number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;350000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;450000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Neighborhood&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Downtown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Suburb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City Center&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Suburb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Downtown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Size (sq meter)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate mean price for each neighborhood
&lt;/span&gt;&lt;span class="n"&gt;neighborhood_means&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Neighborhood&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Map mean prices back to the original dataset
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Neighborhood&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Neighborhood&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;neighborhood_means&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Display the encoded dataset
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;House Number&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Neighborhood&lt;/th&gt;
&lt;th&gt;Size (sq meter)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;500000&lt;/td&gt;
&lt;td&gt;550000.0&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;350000&lt;/td&gt;
&lt;td&gt;400000.0&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;700000&lt;/td&gt;
&lt;td&gt;700000.0&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;450000&lt;/td&gt;
&lt;td&gt;400000.0&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;600000&lt;/td&gt;
&lt;td&gt;550000.0&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
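
&lt;p&gt;The code above learns the neighborhood means and applies them to the same rows. As a minimal sketch of how the idea might be split into a fit step and an encode step, the plain-Python version below learns the means from the rows in the table and then reuses them on new values. The helper names &lt;code&gt;fit_target_means&lt;/code&gt; and &lt;code&gt;encode_targets&lt;/code&gt;, the "Harbor" neighborhood, and the fallback to the overall mean price for unseen neighborhoods are all assumptions made for illustration.&lt;/p&gt;

```python
from collections import defaultdict

# (Neighborhood, Price) pairs taken from the table above
train_rows = [
    ("Downtown", 500000),
    ("Suburb", 350000),
    ("City Center", 700000),
    ("Suburb", 450000),
    ("Downtown", 600000),
]


def fit_target_means(rows):
    """Return (per-category mean price, overall mean price)."""
    grouped = defaultdict(list)
    for category, price in rows:
        grouped[category].append(price)
    means = {cat: sum(prices) / len(prices) for cat, prices in grouped.items()}
    overall = sum(price for _, price in rows) / len(rows)
    return means, overall


def encode_targets(categories, means, fallback):
    """Replace each category with its learned mean.

    Unseen categories get `fallback` (an assumption made for this sketch).
    """
    return [means.get(category, fallback) for category in categories]


means, overall_mean = fit_target_means(train_rows)
print(encode_targets(["Downtown", "Suburb", "City Center"], means, overall_mean))
print(encode_targets(["Harbor"], means, overall_mean))  # hypothetical unseen value
```

&lt;p&gt;Keeping the learned means in a dictionary makes it possible to encode later data without looking at its prices.&lt;/p&gt;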
&lt;h3&gt;
  
  
  Frequency Encoding
&lt;/h3&gt;

&lt;p&gt;This method replaces each categorical value with its frequency or count within the training dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;\text{Frequency}(\text{category}) = \frac{\text{Count}(\text{category})}{\text{Total observations}}&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Encoding cities based on the number of times each city appears in the dataset.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transaction ID&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;th&gt;City&lt;/th&gt;
&lt;th&gt;Product Category&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;New York&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;Los Angeles&lt;/td&gt;
&lt;td&gt;Clothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;Chicago&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;New York&lt;/td&gt;
&lt;td&gt;Groceries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;Chicago&lt;/td&gt;
&lt;td&gt;Clothing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Example dataset with customer transactions
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Transaction ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Los Angeles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Chicago&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Chicago&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Product Category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Electronics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Clothing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Electronics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Groceries&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Clothing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Using 'City' as a parameter (simple example)
&lt;/span&gt;&lt;span class="n"&gt;selected_city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Filter the dataset for the selected city
&lt;/span&gt;&lt;span class="n"&gt;filtered_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;selected_city&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data for transactions in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;selected_city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filtered_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Applying frequency encoding to 'City'
&lt;/span&gt;&lt;span class="n"&gt;city_frequency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;City&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city_frequency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transaction ID&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;th&gt;City&lt;/th&gt;
&lt;th&gt;Product Category&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;0.2&lt;/td&gt;
&lt;td&gt;Clothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;td&gt;Groceries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;0.4&lt;/td&gt;
&lt;td&gt;Clothing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
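
&lt;p&gt;The 0.4 and 0.2 values in the table follow directly from the formula: New York and Chicago each appear 2 times out of 5, Los Angeles once. A minimal sketch without pandas, assuming hypothetical helper names &lt;code&gt;fit_frequencies&lt;/code&gt; and &lt;code&gt;encode_frequencies&lt;/code&gt;, and assuming that a city never seen in the training data (here "Boston") is encoded as 0.0:&lt;/p&gt;

```python
from collections import Counter

# 'City' column from the example table
train_cities = ["New York", "Los Angeles", "Chicago", "New York", "Chicago"]


def fit_frequencies(values):
    """Return a dict mapping each category to count / total observations."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def encode_frequencies(values, frequencies, default=0.0):
    """Replace each category with its learned frequency.

    Categories never seen during fitting get `default` (an assumption made
    for this sketch; the article does not prescribe a fallback).
    """
    return [frequencies.get(value, default) for value in values]


frequencies = fit_frequencies(train_cities)
print(encode_frequencies(train_cities, frequencies))
print(encode_frequencies(["Chicago", "Boston"], frequencies))
```

&lt;p&gt;Learning the frequencies once and reusing them keeps the encoding tied to the training dataset, as the definition above describes.&lt;/p&gt;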

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With this, all the important and necessary encoding methods are covered! Choosing the right encoding method can significantly impact the performance of your machine learning models.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
    </item>
    <item>
      <title>If I could, THIS is how I will learn Python again 🐍</title>
      <dc:creator>krish</dc:creator>
      <pubDate>Sun, 26 Nov 2023 11:09:45 +0000</pubDate>
      <link>https://forem.com/krishsharma0413/if-i-could-this-is-how-i-will-learn-python-again-1k67</link>
      <guid>https://forem.com/krishsharma0413/if-i-could-this-is-how-i-will-learn-python-again-1k67</guid>
      <description>&lt;p&gt;&lt;strong&gt;tldr;&lt;/strong&gt; &lt;a href="https://docs.python.org/3/tutorial/index.html" rel="noopener noreferrer"&gt;official python docs&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unlocking the door to knowledge requires the right keys: curiosity, persistence, and a sprinkle of passion. Learning isn't a task, it's an adventure waiting to be embraced.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Python, one of the most widely used programming languages, is surprisingly also one of the easiest to learn.&lt;/p&gt;

&lt;p&gt;Before we start, we need to set you up with your Python interpreter as well as the IDE you will be using.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Windows Users
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;https://www.python.org/downloads/&lt;/a&gt; and find the version that suits you (or that your project or company requires). If you want the latest version, the &lt;strong&gt;Download Python 3.12.0&lt;/strong&gt; button is all you need to get started.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhl67qbc6pxbaknhuy8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhl67qbc6pxbaknhuy8q.png" alt="Screenshot of python.org page" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After downloading the installer, run it to install Python on your PC. &lt;br&gt;
I recommend ticking the checkbox that offers to add Python to the PATH variable.&lt;/p&gt;
&lt;h3&gt;
  
  
  Linux Users
&lt;/h3&gt;

&lt;p&gt;Most Linux distributions come with Python preinstalled, but it may be an older release (on older systems, Python 2 rather than Python 3).&lt;br&gt;
You can use the terminal to install a newer version. On Ubuntu, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install python3.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  macOS Users
&lt;/h3&gt;

&lt;p&gt;Some macOS versions come with a version of Python already installed, which you can use.&lt;br&gt;
To get the latest version, download Python from the official website.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.python.org/downloads/macos/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Fopengraph-icon-200x200.png" height="200" class="m-0" width="200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.python.org/downloads/macos/" rel="noopener noreferrer" class="c-link"&gt;
            
Python Releases for macOS | Python.org

          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            The official home of the Python Programming Language
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.python.org%2Fstatic%2Ffavicon.ico" width="48" height="48"&gt;
          python.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;If you want a different version, you can install pyenv with Homebrew and use it to install the version you need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install pyenv
pyenv install 3.12.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Most of the time you don't need the terminal to install Python at all; the installer from python.org is enough.&lt;/p&gt;
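
&lt;p&gt;Whichever route you took, it can help to confirm which interpreter you actually ended up with. The sketch below uses the standard &lt;code&gt;sys.version_info&lt;/code&gt;; the helper name &lt;code&gt;describe_interpreter&lt;/code&gt; and its messages are made up for illustration.&lt;/p&gt;

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Summarize the interpreter version in one sentence."""
    major, minor = version_info[0], version_info[1]
    label = f"Python {major}.{minor}"
    if major == 3:
        return f"{label} is installed and ready to use."
    return f"{label} found; this guide assumes Python 3."


# Save as check_version.py and run it with the interpreter you just installed
print(describe_interpreter())
```

&lt;p&gt;If the reported version is not the one you expected, another Python installation probably comes first on your PATH.&lt;/p&gt;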



&lt;h2&gt;
  
  
  Which IDE?
&lt;/h2&gt;

&lt;p&gt;IDK Just use whatever you like...&lt;br&gt;
Personally, I use &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;VSCode&lt;/a&gt;. Don't ask why. I just use it.&lt;br&gt;
If you are serious about Python, it may be better to pick a good IDE from the start, stick with it, and learn its shortcuts, since shortcuts boost your productivity far more than you might expect.&lt;br&gt;
Here are some of the IDEs out there.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual Studio Code
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://code.visualstudio.com/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcode.visualstudio.com%2Fopengraphimg%2Fopengraph-home.png" height="420" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer" class="c-link"&gt;
            Visual Studio Code - The open source AI code editor
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Visual Studio Code redefines AI-powered coding with GitHub Copilot for building and debugging modern web and cloud applications. Visual Studio Code is free and available on your favorite platform - Linux, macOS, and Windows.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcode.visualstudio.com%2Fassets%2Ffavicon.ico" width="256" height="256"&gt;
          code.visualstudio.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;/li&gt;

&lt;li&gt;PyCharm

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.jetbrains.com/pycharm/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fresources.jetbrains.com%2Fstorage%2Fproducts%2Fpycharm%2Fimg%2Fmeta%2Fpreview.png" height="450" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.jetbrains.com/pycharm/" rel="noopener noreferrer" class="c-link"&gt;
            PyCharm: The only Python IDE you need

          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Built for web, data, and AI/ML professionals. Supercharged with an AI-enhanced IDE experience.

          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.jetbrains.com%2Ffavicon.ico%3Fr%3D1234" width="64" height="64"&gt;
          jetbrains.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;/li&gt;

&lt;li&gt;Spyder

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://www.spyder-ide.org/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;spyder-ide.org&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;/li&gt;

&lt;li&gt;Thonny

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://thonny.org/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;thonny.org&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;/li&gt;

&lt;/ul&gt;



&lt;h2&gt;
  
  
  Reading Material
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Official Python Documentation
&lt;/h3&gt;

&lt;p&gt;Nothing is as good as the &lt;strong&gt;official Python documentation&lt;/strong&gt; itself. It is &lt;strong&gt;slow paced&lt;/strong&gt; and teaches the core of the language properly.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://docs.python.org/3/tutorial/index.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.python.org%2F3.14%2F_images%2Fsocial_previews%2Fsummary_tutorial_index_4224eef5.png" height="418" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://docs.python.org/3/tutorial/index.html" rel="noopener noreferrer" class="c-link"&gt;
            The Python Tutorial — Python 3.14.2 documentation
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax an...
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.python.org%2F3%2F_static%2Fpy.svg" width="16" height="16"&gt;
          docs.python.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Python Playground
&lt;/h3&gt;

&lt;p&gt;Maybe you are not into plain reading and want some practice while you learn. For that, you can use the following link. It has a &lt;strong&gt;playground&lt;/strong&gt; feature where you can practice with an online interpreter.&lt;br&gt;
&lt;a href="https://www.learnpython.org/" rel="noopener noreferrer"&gt;https://www.learnpython.org/&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Lecture Material
&lt;/h2&gt;
&lt;h3&gt;
  
  
  YouTube
&lt;/h3&gt;

&lt;p&gt;YouTube is full of videos that teach Python, some &lt;a href="https://www.youtube.com/watch?v=XKHEtdqhLK8" rel="noopener noreferrer"&gt;12 hours&lt;/a&gt; long and others only &lt;a href="https://www.youtube.com/watch?v=x7X9w_GIm1s" rel="noopener noreferrer"&gt;100 seconds&lt;/a&gt;. Personally, I recommend &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=XKHEtdqhLK8" rel="noopener noreferrer"&gt;this one&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/XKHEtdqhLK8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Paid Courses
&lt;/h3&gt;

&lt;p&gt;One of the best paid &lt;strong&gt;Python specializations&lt;/strong&gt; out there.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.coursera.org/specializations/python" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs3.amazonaws.com%2Fcoursera_assets%2Fmeta_images%2Fgenerated%2FXDP%2FXDP~SPECIALIZATION%21~python%2FXDP~SPECIALIZATION%21~python.jpeg" height="418" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.coursera.org/specializations/python" rel="noopener noreferrer" class="c-link"&gt;
            Python for Everybody | Coursera
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Offered by University of Michigan. Learn to Program and Analyze Data with Python. Develop programs to gather, clean, analyze, and visualize ... Enroll for free.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd3njjcbhbojbot.cloudfront.net%2Fweb%2Fimages%2Ffavicons%2Ffavicon-v2-194x194.png" width="194" height="194"&gt;
          coursera.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;h2&gt;
  
  
  Further Advancing in Python
&lt;/h2&gt;

&lt;p&gt;Python is all about its libraries, so learning how to use the built-in ones is a plus. I recommend learning libraries like asyncio, math, datetime, time, hashlib, itertools, functools and unittest; they can greatly help you in competitive programming or your open-source career.&lt;/p&gt;

&lt;p&gt;Refer to the following link for the official documentation on the &lt;strong&gt;built-in libraries&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://docs.python.org/3/py-modindex.html" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;docs.python.org&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
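
&lt;p&gt;To give a taste of what a few of these modules do, here is a small sketch that touches &lt;code&gt;functools&lt;/code&gt;, &lt;code&gt;itertools&lt;/code&gt; and &lt;code&gt;hashlib&lt;/code&gt;. The function name &lt;code&gt;fib&lt;/code&gt; and the sample inputs are chosen purely for illustration.&lt;/p&gt;

```python
import functools
import hashlib
import itertools


@functools.lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number; lru_cache remembers earlier results."""
    return n if n in (0, 1) else fib(n - 1) + fib(n - 2)


# All 2-element combinations of a small list, in input order
pairs = list(itertools.combinations(["a", "b", "c"], 2))

# SHA-256 digest of a byte string, as a hexadecimal string
digest = hashlib.sha256(b"hello").hexdigest()

print(fib(30))
print(pairs)
print(len(digest))
```

&lt;p&gt;Each of these would take noticeably more code to write by hand, which is why knowing the standard library pays off.&lt;/p&gt;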


&lt;p&gt;Learn how to &lt;strong&gt;install external libraries using pip&lt;/strong&gt; from here.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://docs.python.org/3/installing/index.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.python.org%2F3.14%2F_images%2Fsocial_previews%2Fsummary_installing_index_332aefa8.png" height="418" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://docs.python.org/3/installing/index.html" rel="noopener noreferrer" class="c-link"&gt;
            Installing Python Modules — Python 3.14.2 documentation
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Email, distutils-sig@python.org,. As a popular open source development project, Python has an active supporting community of contributors and users that also make their software available for other...
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.python.org%2F3%2F_static%2Fpy.svg" width="16" height="16"&gt;
          docs.python.org
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Something quite important in programming is learning &lt;strong&gt;Data Structures and Algorithms&lt;/strong&gt;, although I don’t recommend Python for competitive programming (CP).&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/pkYVOmU3MgA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;



&lt;h2&gt;
  
  
  Practice Practice Practice...
&lt;/h2&gt;

&lt;p&gt;Without practice you won’t remember what you have learnt. You can look on GitHub for open-source projects to contribute to, which will improve your documentation and unit-testing skills, or you can use the following websites to practice with problem sets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codewars:&lt;/strong&gt; Good for &lt;strong&gt;beginners and professionals alike&lt;/strong&gt;. The site has a large collection of problems that you can use to practice and get better at Python.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://www.codewars.com" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;codewars.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;CodingBat:&lt;/strong&gt; A code practice website. Quite basic, but it gets the job done.&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://codingbat.com/python" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;codingbat.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;I personally don’t recommend &lt;strong&gt;LeetCode&lt;/strong&gt; or &lt;strong&gt;HackerRank&lt;/strong&gt; until you have completed &lt;strong&gt;DSA&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Even after all this, your Python journey isn’t complete. There is still a lot to learn, mainly domain-specific libraries.&lt;/p&gt;

&lt;p&gt;I am a Python developer with over 4 years of experience. You can find me &lt;a href="https://linktr.ee/krishsharma0413" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
