
SQL query to merge near dupes and preserve Ids

I need a query to merge near-identical rows in SQL, but retain the IDs of the removed rows in a delimited list in an additional column.

My source data looks something like this (assume every column except Id is an exact match):

18047    1 3rd St    Suite A  Hometown    NJ  08000
1045136  1 3rd St    Suite A  Hometown    NJ  08000
2321     1 3rd St    Suite A  Hometown    NJ  08000
3311     1 3rd St    Suite A  Hometown    NJ  08000
3681     1 3rd St    Suite A  Hometown    NJ  08000
1750     1 Baker Rd  Ste C    Happy Hill  NJ  08111
1822     1 Baker Rd  Ste C    Happy Hill  NJ  08111
1935     1 Baker Rd  Ste C    Happy Hill  NJ  08111

The results I'm looking for are:
18047  1 3rd St    Suite A  Hometown    NJ  08000  1045136|2321|3311|3681
1750   1 Baker Rd  Ste C    Happy Hill  NJ  08111  1822|1935

-- OR --

Including the persisting Id in the list would be even better:
18047  1 3rd St    Suite A  Hometown    NJ  08000  18047|1045136|2321|3311|3681
1750   1 Baker Rd  Ste C    Happy Hill  NJ  08111  1750|1822|1935

Thanks!

Jan Louwerens

SELECT
   ID, Address_1, Address_2, City, State, Zip_Code, Ids
FROM
   Your_Table_Name,
   (
      SELECT
         ID AS First_ID,
         LISTAGG(ID, '|') WITHIN GROUP (ORDER BY ID) OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code) AS Ids,
         ROW_NUMBER() OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code ORDER BY ID) AS Idx
      FROM
         Your_Table_Name
   )
WHERE
   ID = First_ID AND
   Idx = 1
ORDER BY ID


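You may need to use STRING_AGG instead of LISTAGG, depending on your database: LISTAGG is the Oracle name, while SQL Server 2017 and later use STRING_AGG.

Also, make sure you use your actual table name (replace Your_Table_Name) and column names (replace Address_1, Address_2, City, State, Zip_Code).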
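
For reference, here is a rough sketch of what a STRING_AGG version might look like. It assumes SQL Server 2017 or later (STRING_AGG does not exist in earlier versions) and reuses the placeholder table and column names from the query above. It also assumes that the lowest ID in each group is the one to keep, which matches the ROW_NUMBER ordering above but not necessarily the order of the rows in the question.

```sql
-- Sketch only: assumes SQL Server 2017+ and the placeholder names used above.
-- GROUP BY collapses each set of matching addresses into one row;
-- NULLs in a grouping column are treated as one group.
SELECT
   MIN(ID) AS ID,                -- assumption: keep the lowest ID per group
   Address_1, Address_2, City, State, Zip_Code,
   STRING_AGG(CAST(ID AS varchar(20)), '|')
      WITHIN GROUP (ORDER BY ID) AS Ids   -- pipe-delimited, ordered by ID
FROM
   Your_Table_Name
GROUP BY
   Address_1, Address_2, City, State, Zip_Code
ORDER BY
   MIN(ID);
```

Because this is a plain aggregate with GROUP BY, it needs neither the self-join nor the ROW_NUMBER filter.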

ASKER CERTIFIED SOLUTION

PortletPaul

Sailing_12

ASKER

Using the LISTAGG query above throws the following error: The function 'LISTAGG' may not have a WITHIN GROUP clause.

Using STRING_AGG throws: 'string_agg' is not a recognized built-in function name.

Sailing_12

ASKER

PortletPaul - your first query is working, except that I get no IdList in cases where address2 is NULL (I didn't show this case in my examples, but it is expected). How do I account for a missing address2?

Sailing_12

ASKER

This seems to be working:

SELECT
    *
FROM (
    SELECT DISTINCT
        address1
      , isnull(address2, '') as address2
      , city
      , state
      , zip
    FROM mytable
) d
CROSS APPLY (
    SELECT
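        STUFF((
            SELECT
                -- '|' with no trailing space, so STUFF below strips exactly the leading delimiter
                '|' + CAST(t.ID AS varchar(20))
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = isnull(t.address2, '')  -- NULL and '' are treated as the same address2
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR xml PATH ('')
        )
        , 1, 1, '')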
) ca (IDList)
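
The query above returns the address columns and the Id list, but not the persisting Id shown in the desired results. One possible variation, sketched here with the same table and column names, swaps DISTINCT for GROUP BY so that MIN(id) can be returned alongside the list. Treating the lowest id as the one to keep is an assumption; the question's example keeps 18047, which is the first row listed rather than the lowest value.

```sql
-- Sketch only: same FOR XML PATH technique, with GROUP BY instead of DISTINCT
-- so the surviving Id can be selected. Names are taken from the query above.
SELECT
    d.id
  , d.address1
  , d.address2
  , d.city
  , d.state
  , d.zip
  , ca.IDList
FROM (
    SELECT
        MIN(id) AS id                       -- assumption: lowest id survives
      , address1
      , ISNULL(address2, '') AS address2    -- NULL and '' grouped together
      , city
      , state
      , zip
    FROM mytable
    GROUP BY
        address1
      , ISNULL(address2, '')
      , city
      , state
      , zip
) d
CROSS APPLY (
    SELECT
        STUFF((
            SELECT
                '|' + CAST(t.id AS varchar(20))
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = ISNULL(t.address2, '')
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR XML PATH ('')
        )
        , 1, 1, '')                         -- drop the leading '|'
) ca (IDList)
ORDER BY d.id;
```

The list produced this way includes the surviving id itself, which corresponds to the second ("even better") form of the desired output.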
