DEV Community

Judy
Judy

Posted on

3 1 1 1 1

Deduplicate between an uncertain number of columns:SQL VS SPL #15

A certain database table has multiple fields, each storing an email address, which may be null or duplicated.

Image description
Now we need to merge multiple fields into one, remove null and duplicate values, and merge them with commas.

Image description
In a database/computing platform that supports functions such as array, for example, in azure-databricks, SQL can be written as follows:

SELECT concat_ws(',', array_distinct(array(Mail_1, Mail_2, Mail_3, Mail_4, Mail_5)), NULL) AS Mail
FROM my_data
Enter fullscreen mode Exit fullscreen mode

SQL Server and other databases do not have an array function, and generally require indirect implementation using statements such as cross apply:

SELECT 
    STRING_AGG(DistinctEmails.Mail, ',') AS Mail
FROM my_data
CROSS APPLY (
    SELECT DISTINCT Mail
    FROM (VALUES (Mail_1), (Mail_2), (Mail_3), (Mail_4), (Mail_5)) AS EmailList(Mail)
    WHERE Mail IS NOT NULL
) AS DistinctEmails
GROUP BY my_data.Mail_1, my_data.Mail_2, my_data.Mail_3, my_data.Mail_4, my_data.Mail_5
Enter fullscreen mode Exit fullscreen mode

SQL must write column names, which is inflexible. To deduplicate between an uncertain number of columns, it is necessary to dynamically generate column names using stored procedures and then execute SQL, which will make the architecture more complex. SPL does not need to write column names, and the code is the same for different data sources:
Try.Demo

Image description
A1: Load data.

A2: Take the current record, convert it into a set, take unique values and remove null, and merge it with commas. ~ indicates the current record, function ID is used for deduplication, @0 means null is removed.

esProc SPL is free,Download

Heroku

Built for developers, by developers.

Whether you're building a simple prototype or a business-critical product, Heroku's fully-managed platform gives you the simplest path to delivering apps quickly — using the tools and languages you already love!

Learn More

Top comments (0)

ITRS image

See What Users Experience in The Browser — Anywhere, Anytime

Simulate logins, checkouts, and payments on SaaS, APIs, and internal apps. Catch issues early, baseline web performance, and stay ahead of incidents. Easily record user journeys right from your browser.

Start Free Trial