DEV Community

Cover image for Creating Reports to Find Missing Data During the Normalization Process in SQL
Seth Keddy
Seth Keddy

Posted on

Creating Reports to Find Missing Data During the Normalization Process in SQL

Introduction

Writing reports in SQL using report builder can be a lot of fun, but there are any ways to check your reporting to ensure you are doing it correctly. Data normalization helps reduce redundancy and improve data integrity by structuring data across related tables. However, during or after normalization, data quality issues such as missing references, null values, incorrect data types, or logic inconsistencies can arise.

This article outlines how to build SQL-based reports that detect and quantify data anomalies during the normalization process—using common SQL patterns like NOT EXISTS, IS NULL, CASE, and boolean-style flags (1 = true, 0 = false) for easy integration with dashboards and automated systems.


Objectives

  1. Identify missing or orphaned records.

  2. Detect nulls or incomplete critical fields.

  3. Report data type mismatches using CASE.

  4. Flag inconsistent or invalid logic with 1 (true) / 0 (false) values for reporting.


Common Reporting Patterns

1. Detect Orphaned Records Using NOT EXISTS

Find Orders referencing non-existent customers.

SELECT o.OrderID, o.CustomerID FROM Orders o WHERE  NOT  EXISTS ( SELECT  1  FROM Customers c WHERE c.CustomerID = o.CustomerID); 
Enter fullscreen mode Exit fullscreen mode

2. Detect Nulls in Required Fields

Check for incomplete records after normalization.

SELECT CustomerID, FirstName, LastName, CASE  WHEN FirstName IS  NULL  OR LastName IS  NULL  THEN  1  ELSE  0  END  AS MissingNameFlag FROM Customers; 
Enter fullscreen mode Exit fullscreen mode

This MissingNameFlag can be used to easily filter or aggregate issues in reports.


Using CASE to Address Data Type or Logic Issues

3. Data Stored as Text but Expected as Numeric or Boolean

Example: A Yes/No field stored as text ('yes', 'no', 'y', 'n', NULL)

SELECT CustomerID, PreferredContact, CASE  WHEN  LOWER(PreferredContact) IN ('yes', 'y') THEN  1  WHEN  LOWER(PreferredContact) IN ('no', 'n') THEN  0  ELSE  NULL  END  AS PreferredContactFlag FROM Customers; 
Enter fullscreen mode Exit fullscreen mode

You can use this pattern to normalize inconsistent values into a 1/0 format for dashboards or analytics.


4. Flagging Invalid Date or Numeric Values Stored as Strings

Example: A ZIP code column that should be 5 digits but contains alphabetic characters.

SELECT CustomerID, ZipCode, CASE  WHEN TRY_CAST(ZipCode AS  INT) IS  NULL  THEN  1  ELSE  0  END  AS InvalidZipCodeFlag FROM Customers; 
Enter fullscreen mode Exit fullscreen mode

Use TRY_CAST or ISNUMERIC() (in SQL Server) to detect improper data formats.


5. Identify Duplicate Keys or Unexpected Multiples

SELECT EmailAddress, COUNT(*) AS Occurrences, CASE  WHEN  COUNT(*) >  1  THEN  1  ELSE  0  END  AS DuplicateFlag FROM Customers GROUP  BY EmailAddress;
Enter fullscreen mode Exit fullscreen mode

Example: Consolidated Data Integrity Report

SELECT o.OrderID,
    o.CustomerID,
    c.CustomerID AS FoundCustomer, CASE  WHEN c.CustomerID IS  NULL  THEN  1  ELSE  0  END  AS OrphanedOrderFlag, CASE  WHEN TRY_CAST(o.TotalAmount AS  FLOAT) IS  NULL  THEN  1  ELSE  0  END  AS InvalidAmountFlag, CASE  WHEN o.ShipDate IS  NULL  THEN  1  ELSE  0  END  AS MissingShipDateFlag FROM Orders o LEFT  JOIN Customers c ON o.CustomerID = c.CustomerID;
Enter fullscreen mode Exit fullscreen mode

Automating the Reporting Pipeline

  • Output results to dashboards: Power BI, Grafana, Tableau, etc., using the 1/0 flags.

  • Schedule reports via SQL Agent or cron jobs.

  • Send flagged rows via email or log into audit tables.


Best Practices

  • Always enforce foreign key constraints and check constraints where applicable.

  • Use TRY_CAST to catch invalid data without throwing errors.

  • Standardize output with boolean-style flags (1 for true/problem, 0 for false/clean).

  • Centralize repeated logic in views or stored procedures.


Conclusion

By integrating CASE statements, TRY_CAST, and true/false flags, your SQL-based data integrity reports become significantly more actionable and dashboard-friendly. These enhancements allow technical and non-technical stakeholders to quickly understand where data integrity issues exist and respond effectively.

Top comments (0)