<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Likith N</title>
    <description>The latest articles on Forem by Likith N (@likith_n).</description>
    <link>https://forem.com/likith_n</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3764263%2F4b87f404-6700-480e-b020-1a0eb951d4d4.jpg</url>
      <title>Forem: Likith N</title>
      <link>https://forem.com/likith_n</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/likith_n"/>
    <language>en</language>
    <item>
      <title>AutoCleanML – Intelligent ML Data Preprocessing Automation (pip install autocleanml)</title>
      <dc:creator>Likith N</dc:creator>
      <pubDate>Tue, 10 Feb 2026 12:52:38 +0000</pubDate>
      <link>https://forem.com/likith_n/autocleanml-intelligent-ml-data-preprocessing-automation-pip-install-autocleanml-1g2k</link>
      <guid>https://forem.com/likith_n/autocleanml-intelligent-ml-data-preprocessing-automation-pip-install-autocleanml-1g2k</guid>
      <description>&lt;p&gt;If you’ve ever built a machine learning project, you already know the truth:&lt;br&gt;
&lt;strong&gt;80% of ML work is data cleaning.&lt;br&gt;
And 80% of that cleaning is… repetitive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Handling missing values, encoding categoricals, scaling features, fixing data types — every new dataset, same boilerplate, different notebook.&lt;/p&gt;
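
&lt;p&gt;For a sense of what that boilerplate looks like, here is a minimal sketch in plain Python, using only the standard library so it runs anywhere. The toy rows, column names, and helper functions are made up for illustration and are not AutoCleanML’s code or API. In a real notebook the same steps would more likely be written with pandas and scikit-learn.&lt;/p&gt;

```python
from statistics import mean, median, pstdev

# Toy rows standing in for a raw CSV (hypothetical data); None marks a missing value.
rows = [
    {"age": "25", "city": "Paris", "income": 40000.0},
    {"age": None, "city": "Lyon", "income": 52000.0},
    {"age": "35", "city": "Paris", "income": None},
    {"age": "45", "city": "Nice", "income": 61000.0},
]


def to_float(value):
    """Fix data types: turn numeric strings into floats, keep None as missing."""
    return None if value is None else float(value)


def impute_median(values):
    """Handle missing values: replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale a feature to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(prefix, values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


# Note: the statistics are computed on all rows here for brevity. In a real
# project, fit them on the training split only to avoid data leakage.
ages = standardize(impute_median([to_float(r["age"]) for r in rows]))
incomes = standardize(impute_median([to_float(r["income"]) for r in rows]))
cities = one_hot("city", [r["city"] for r in rows])

features = {"age": ages, "income": incomes, **cities}
for name, column in features.items():
    print(name, [round(x, 2) for x in column])
```

&lt;p&gt;Four small helpers for four rows of data, and every new dataset needs its own variation of them.&lt;/p&gt;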

&lt;p&gt;After repeating this cycle one too many times, I decided to automate it.&lt;br&gt;
That’s how AutoCleanML was born.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem I Faced&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As a student working on multiple ML projects and datasets, I noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing the same preprocessing code again and again&lt;/li&gt;
&lt;li&gt;Inconsistent cleaning logic across projects&lt;/li&gt;
&lt;li&gt;Hard-to-maintain notebooks&lt;/li&gt;
&lt;li&gt;Beginners getting stuck before even training a model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;I wanted something that:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works out of the box&lt;/li&gt;
&lt;li&gt;Follows best practices&lt;/li&gt;
&lt;li&gt;Is modular, reusable, and simple&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;✨ Introducing AutoCleanML&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AutoCleanML&lt;/strong&gt; is a &lt;strong&gt;Python library&lt;/strong&gt; that helps you automatically clean and preprocess datasets for machine learning with minimal code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s built for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Students&lt;/li&gt;
&lt;li&gt;ML beginners&lt;/li&gt;
&lt;li&gt;Data science interns&lt;/li&gt;
&lt;li&gt;Anyone tired of rewriting preprocessing logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Using AutoCleanML&lt;/strong&gt;&lt;br&gt;
With AutoCleanML, you can go from a raw dataset to train-test splits in just a few lines.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from autocleanml import AutoCleanML

# Load dataset
df = pd.read_csv("data.csv")

# Initialize cleaner
cleaner = AutoCleanML(target_column="target")

# Clean data and split automatically
X_train, X_test, y_train, y_test, report = cleaner.fit_transform(df)

# Check preprocessing summary
print(report)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
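
&lt;p&gt;To picture what a train-test split involves, here is a hypothetical stand-in written with the standard library. The function name &lt;code&gt;train_test_split_rows&lt;/code&gt; and its parameters are invented for illustration. This is a sketch of the general idea (shuffle once with a fixed seed, then cut), not a description of how AutoCleanML does it internally.&lt;/p&gt;

```python
import random


def train_test_split_rows(features, targets, test_size=0.25, seed=42):
    """Shuffle row indices once with a fixed seed, then cut them into two parts."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")

    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        # Select the rows at the given positions, keeping X and y aligned.
        return [seq[i] for i in idx]

    return (
        pick(features, train_idx),
        pick(features, test_idx),
        pick(targets, train_idx),
        pick(targets, test_idx),
    )


# Toy data (hypothetical): 8 rows, with a target derived from the first feature.
X = [[i, i * 2] for i in range(8)]
y = [i % 2 for i in range(8)]

X_train, X_test, y_train, y_test = train_test_split_rows(X, y)
print(len(X_train), len(X_test))  # 6 2
```

&lt;p&gt;Because features and targets are selected through the same shuffled indices, each row stays paired with its label on both sides of the split.&lt;/p&gt;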



&lt;p&gt;If data cleaning feels repetitive or slows down your ML projects, give AutoCleanML a try and see how much time it saves you.&lt;/p&gt;

&lt;p&gt;🔗 GitHub: &lt;a href="https://github.com/likith-n/AutoCleanML" rel="noopener noreferrer"&gt;https://github.com/likith-n/AutoCleanML&lt;/a&gt;&lt;br&gt;
📦 PyPI: &lt;a href="https://pypi.org/project/AutoCleanML/" rel="noopener noreferrer"&gt;https://pypi.org/project/AutoCleanML/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d genuinely love feedback from the community — whether it’s ideas, issues, or improvements.&lt;br&gt;
If you find it useful, consider ⭐ starring the repo or sharing the post so others can benefit too.&lt;/p&gt;

&lt;p&gt;Open source grows through people, not just code ❤️&lt;br&gt;
Happy cleaning &amp;amp; happy modeling!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>automation</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
