DEV Community

Cover image for 🔍 Data Cleaning Techniques Using SQL
Mohammed jobair Hossain
Mohammed jobair Hossain

Posted on

1

🔍 Data Cleaning Techniques Using SQL

Data cleaning is a critical step in the data preparation process. Whether it is analytics, business intelligence, or data engineering, clean data ensures more precise and reliable insights.

1️⃣ Convert Text to Lower/Upper Case
Ensure consistency in categorical fields like names or categories.

-- Convert to lowercase
SELECT LOWER(column_name) AS cleaned_column FROM table_name;

-- Convert to uppercase
SELECT UPPER(column_name) AS cleaned_column FROM table_name;

Enter fullscreen mode Exit fullscreen mode

2️⃣ Remove Extra Spaces from Text Fields Trim leading/trailing spaces using TRIM()

SELECT TRIM(column_name) AS cleaned_column FROM table_name;

Enter fullscreen mode Exit fullscreen mode

3️⃣ Convert Date Strings to a Consistent Format
Transform text-based dates into a usable date format:

SELECT STR_TO_DATE(column_name, '%m/%d/%Y') AS formatted_date FROM table_name;

Enter fullscreen mode Exit fullscreen mode

4️⃣Identify & Manage Outliers

Filter numeric values within a defined range:

SELECT * FROM table_name 
WHERE column_name BETWEEN lower_limit AND upper_limit;

Enter fullscreen mode Exit fullscreen mode

5️⃣ Remove Special Characters

Strip out unwanted symbols using regular expressions:

SELECT REGEXP_REPLACE(column_name, '[^a-zA-Z0-9 ]', '') AS cleaned_column 
FROM table_name;

Enter fullscreen mode Exit fullscreen mode

6️⃣ Standardize Categorical Values

Unify inconsistent text representations:

UPDATE table_name
SET column_name = 'Male'
WHERE column_name IN ('M', 'male');

Enter fullscreen mode Exit fullscreen mode

7️⃣ Replace NULLs with Default Values

SELECT COALESCE(column_name, 'DefaultValue') AS column_name 
FROM table_name;

Enter fullscreen mode Exit fullscreen mode

8️⃣ Delete Duplicate Rows

WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
    FROM table_name
)
DELETE FROM table_name
WHERE id IN (
    SELECT id FROM CTE WHERE row_num > 1
);

Enter fullscreen mode Exit fullscreen mode

💡 Clean data = better insights.
SQL makes it easy to standardize, validate, and transform your data—right where it lives.

If you found this helpful, feel free to share or drop a comment with your favorite data cleaning tip! 💬

Runner H image

Overwhelmed? Let an AI Handle Your Tasks

Runner H clears your inbox, summarizes Slack threads, and plans your week — without you lifting a finger. You delegate once. It handles the rest.

Try Runner H

Top comments (0)

Feature flag article image

Create a feature flag in your IDE in 5 minutes with LaunchDarkly’s MCP server ⏰

How to create, evaluate, and modify flags from within your IDE or AI client using natural language with LaunchDarkly's new MCP server. Follow along with this tutorial for step by step instructions.

Read full post

👋 Kindness is contagious

Dive into this thoughtful piece, beloved in the supportive DEV Community. Coders of every background are invited to share and elevate our collective know-how.

A sincere "thank you" can brighten someone's day—leave your appreciation below!

On DEV, sharing knowledge smooths our journey and tightens our community bonds. Enjoyed this? A quick thank you to the author is hugely appreciated.

Okay