SQL query to merge near dupes and preserve Ids

I need a query to merge near-identical rows in SQL, but retain the IDs of the removed rows in a delimited list in an additional column.

My source data looks something like this (let's assume all but the Id column is an exact match):

18047             1 3rd St      Suite A      Hometown      NJ      08000
1045136             1 3rd St              Suite A      Hometown      NJ      08000
2321             1 3rd St              Suite A      Hometown      NJ      08000
3311             1 3rd St              Suite A      Hometown      NJ      08000
3681             1 3rd St              Suite A      Hometown      NJ      08000
1750              1 Baker Rd      Ste C      Happy Hill      NJ      08111
1822             1 Baker Rd      Ste C      Happy Hill      NJ      08111
1935             1 Baker Rd      Ste C      Happy Hill      NJ      08111

The results I'm looking for are:
18047      1 3rd St                  Suite A      Hometown      NJ      08000   1045136|2321|3311|3681
1750      1 Baker Rd          Ste C              Happy Hill      NJ      08111   1822|1935

-- OR --

Including the persisting Id in the list would be even better:
18047      1 3rd St                  Suite A      Hometown      NJ      08000   18047|1045136|2321|3311|3681
1750      1 Baker Rd          Ste C              Happy Hill      NJ      08111   1750|1822|1935


Thanks!
Sailing_12PirateAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jan LouwerensSoftware EngineerCommented:
SELECT
   ID, Address_1, Address_2, City, State, Zip_Code, Ids
FROM
   Your_Table_Name,
   (
      SELECT
         ID AS First_ID,
         LISTAGG(ID, '|') WITHIN GROUP (ORDER BY ID) OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code) AS Ids,
         ROW_NUMBER() OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code ORDER BY ID) AS Idx
      FROM
         Your_Table_Name
   )
WHERE
   ID = First_ID AND
   Idx = 1
ORDER BY ID

Open in new window


You may need to use STRING_AGG instead of LISTAGG.

Also, make sure you use your actual table name (replace Your_Table_Name) and column names (replace Address_1, Address_2, City, State, Zip_Code)
PortletPaulEE Topic AdvisorCommented:
The query below will achieve the expected result.  It summarises the common data using "select distinct" then through a subsequent cross apply concatenates the ids into a list.  If you are using SQL Server 2017 (or later) then instead of the approach seen below you could use string_agg
SELECT
    *
FROM (
    SELECT DISTINCT
        address1
      , address2
      , city
      , state
      , zip
    FROM mytable
) d
CROSS APPLY (
    SELECT
        STUFF((
            SELECT
                '| ' + CAST(ID AS varchar)
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = t.address2
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR xml PATH ('')
        )
        , 1, 1, '')
) ca (IDList)

Open in new window

|    |  address1  | address2 |    city    | state | zip  |              IDList               |
|----|------------|----------|------------|-------|------|-----------------------------------|
|  1 | 1 3rd St   | Suite A  | Hometown   | NJ    | 8000 |  2321| 3311| 3681| 18047| 1045136 |
|  2 | 1 Baker Rd | Ste C    | Happy Hill | NJ    | 8111 |  1750| 1822| 1935                 |

Open in new window

Please note I had to assume all column names and the table name. In future questions please include the relevant names as it will be easier for all concerned.
CREATE TABLE mytable(
   ID       INTEGER  NOT NULL PRIMARY KEY 
  ,ADDRESS1 VARCHAR(19) NOT NULL
  ,ADDRESS2 VARCHAR(23) NOT NULL
  ,CITY     VARCHAR(18) NOT NULL
  ,STATE    VARCHAR(10) NOT NULL
  ,ZIP      INTEGER  NOT NULL
);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (18047,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1045136,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (2321,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (3311,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (3681,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1750,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1822,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1935,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);

Open in new window

Using string_agg it is much simpler:
SELECT
    address1
  , address2
  , city
  , state
  , zip
  , string_agg(id, '|') IDList
FROM mytable
GROUP BY
    address1
  , address2
  , city
  , state
  , zip

Open in new window

also see https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c0ebce24f9927a3f50b8ff81e3f9afe9

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Sailing_12PirateAuthor Commented:
Using the LISTAGG query above throws the following error:  The function 'LISTAGG' may not have a WITHIN GROUP clause.

Using STRING_AGG throws: 'string_agg' is not a recognized built-in function name.
Sailing_12PirateAuthor Commented:
PortletPaul - your first query is working, except that I get no IdLIst in cases where address2 is null (didn't show this case in my examples but expected) how do I account for no address2?
Sailing_12PirateAuthor Commented:
This seems to be working:

SELECT
    *
FROM (
    SELECT DISTINCT
        address1
      , isnull(address2, '') as address2
      , city
      , state
      , zip
    FROM mytable
) d
CROSS APPLY (
    SELECT
        STUFF((
            SELECT
                '| ' + CAST(ID AS varchar)
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = isnull(t.address2, '')
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR xml PATH ('')
        )
        , 1, 1, '')
) ca (IDList)

Open in new window

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.