Introduction
Writing reports in SQL using report builder can be a lot of fun, but there are any ways to check your reporting to ensure you are doing it correctly. Data normalization helps reduce redundancy and improve data integrity by structuring data across related tables. However, during or after normalization, data quality issues such as missing references, null values, incorrect data types, or logic inconsistencies can arise.
This article outlines how to build SQL-based reports that detect and quantify data anomalies during the normalization process—using common SQL patterns like NOT EXISTS
, IS NULL
, CASE
, and boolean-style flags (1 = true
, 0 = false
) for easy integration with dashboards and automated systems.
Objectives
Identify missing or orphaned records.
Detect nulls or incomplete critical fields.
Report data type mismatches using
CASE
.Flag inconsistent or invalid logic with
1
(true) /0
(false) values for reporting.
Common Reporting Patterns
1. Detect Orphaned Records Using NOT EXISTS
Find Orders
referencing non-existent customers.
SELECT o.OrderID, o.CustomerID FROM Orders o WHERE NOT EXISTS ( SELECT 1 FROM Customers c WHERE c.CustomerID = o.CustomerID);
2. Detect Nulls in Required Fields
Check for incomplete records after normalization.
SELECT CustomerID, FirstName, LastName, CASE WHEN FirstName IS NULL OR LastName IS NULL THEN 1 ELSE 0 END AS MissingNameFlag FROM Customers;
This MissingNameFlag
can be used to easily filter or aggregate issues in reports.
Using CASE
to Address Data Type or Logic Issues
3. Data Stored as Text but Expected as Numeric or Boolean
Example: A Yes/No
field stored as text ('yes', 'no', 'y', 'n', NULL)
SELECT CustomerID, PreferredContact, CASE WHEN LOWER(PreferredContact) IN ('yes', 'y') THEN 1 WHEN LOWER(PreferredContact) IN ('no', 'n') THEN 0 ELSE NULL END AS PreferredContactFlag FROM Customers;
You can use this pattern to normalize inconsistent values into a 1/0
format for dashboards or analytics.
4. Flagging Invalid Date or Numeric Values Stored as Strings
Example: A ZIP code column that should be 5 digits but contains alphabetic characters.
SELECT CustomerID, ZipCode, CASE WHEN TRY_CAST(ZipCode AS INT) IS NULL THEN 1 ELSE 0 END AS InvalidZipCodeFlag FROM Customers;
Use TRY_CAST
or ISNUMERIC()
(in SQL Server) to detect improper data formats.
5. Identify Duplicate Keys or Unexpected Multiples
SELECT EmailAddress, COUNT(*) AS Occurrences, CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END AS DuplicateFlag FROM Customers GROUP BY EmailAddress;
Example: Consolidated Data Integrity Report
SELECT o.OrderID,
o.CustomerID,
c.CustomerID AS FoundCustomer, CASE WHEN c.CustomerID IS NULL THEN 1 ELSE 0 END AS OrphanedOrderFlag, CASE WHEN TRY_CAST(o.TotalAmount AS FLOAT) IS NULL THEN 1 ELSE 0 END AS InvalidAmountFlag, CASE WHEN o.ShipDate IS NULL THEN 1 ELSE 0 END AS MissingShipDateFlag FROM Orders o LEFT JOIN Customers c ON o.CustomerID = c.CustomerID;
Automating the Reporting Pipeline
Output results to dashboards: Power BI, Grafana, Tableau, etc., using the
1/0
flags.Schedule reports via SQL Agent or cron jobs.
Send flagged rows via email or log into audit tables.
Best Practices
Always enforce foreign key constraints and check constraints where applicable.
Use
TRY_CAST
to catch invalid data without throwing errors.Standardize output with boolean-style flags (
1
for true/problem,0
for false/clean).Centralize repeated logic in views or stored procedures.
Conclusion
By integrating CASE
statements, TRY_CAST
, and true/false flags, your SQL-based data integrity reports become significantly more actionable and dashboard-friendly. These enhancements allow technical and non-technical stakeholders to quickly understand where data integrity issues exist and respond effectively.
Top comments (0)