Link to home
Start Free TrialLog in
Avatar of jose11au
jose11auFlag for Australia

asked on

SQL - Data quality records that are missing addresses

Hi All,

Could I please ask if this query is correct?

Basically I am looking for data quality issues where a record is missing address details. If either address_1, address_2 or address_3  is populated then the record is okay.

select  *
from Sales
where address_1 Is Null or address_2 Is Null or address_3 Is Null
AND
First_name is not null or Last_name is not null

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Ryan Chong
Ryan Chong
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jose11au

ASKER

May I please ask what the is difference of using and not using brackets?
I am learning SQL
it's part of the arithmetic operation, just like:

(1 + 1) x 2 is different from  1 + (1 x 2)
Thank  you so much.

Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Not sure your logic is correct.

>>where (address_1 Is Null or address_2 Is Null or address_3 Is Null)


If all addresses are null, that returns true.  So, it will return the row.

I might also suggest COALESCE.  It takes multiple columns and returns the first non null value.

Something like this:
where coalesce(address_1, address_2, address_3) Is not Null
AND
coalesce(First_name, Last_name) is not null

Open in new window

I think you need something more like this:

SELECT *, STUFF(
    CASE WHEN (address_1 IS NULL AND address_2 IS NULL AND address_3 IS NULL) THEN ', Address is missing' ELSE '' END +
    CASE WHEN (First_name IS NULL AND Last_name IS NULL) THEN ', Name is missing' ELSE '' END, 1, 2, '') AS what_data_is_bad
FROM dbo.Sales
WHERE (address_1 IS NULL AND address_2 IS NULL AND address_3 IS NULL)
OR (First_name IS NULL AND Last_name IS NULL)

For example:

;WITH Sales AS (
    SELECT * FROM (VALUES
        (1, 'address_1', NULL, NULL, 'first_name', NULL),
        (2, NULL, NULL, NULL, NULL, 'last_name'),
        (3, NULL, NULL, 'address_3', NULL, NULL),
        (4, NULL, NULL, NULL, NULL, NULL),
        (5, NULL, NULL, 'address_3', NULL, 'last_name')
    ) AS sales(id, address_1, address_2, address_3, first_name, last_name)
)        
SELECT *, STUFF(
    CASE WHEN (address_1 IS NULL AND address_2 IS NULL AND address_3 IS NULL) THEN ', Address is missing' ELSE '' END +
    CASE WHEN (First_name IS NULL AND Last_name IS NULL) THEN ', Name is missing' ELSE '' END, 1, 2, '') AS what_data_is_bad
FROM Sales
WHERE (address_1 IS NULL AND address_2 IS NULL AND address_3 IS NULL)
OR (First_name IS NULL AND Last_name IS NULL)