SQL query to merge near dupes and preserve Ids

Sailing_12
Sailing_12 used Ask the Experts™
on
I need a query to merge near-identical rows in SQL, but retain the IDs of the removed rows in a delimited list in an additional column.

My source data looks something like this (let's assume all but the Id column is an exact match):

18047             1 3rd St      Suite A      Hometown      NJ      08000
1045136             1 3rd St              Suite A      Hometown      NJ      08000
2321             1 3rd St              Suite A      Hometown      NJ      08000
3311             1 3rd St              Suite A      Hometown      NJ      08000
3681             1 3rd St              Suite A      Hometown      NJ      08000
1750              1 Baker Rd      Ste C      Happy Hill      NJ      08111
1822             1 Baker Rd      Ste C      Happy Hill      NJ      08111
1935             1 Baker Rd      Ste C      Happy Hill      NJ      08111

The results I'm looking for are:
18047      1 3rd St                  Suite A      Hometown      NJ      08000   1045136|2321|3311|3681
1750      1 Baker Rd          Ste C              Happy Hill      NJ      08111   1822|1935

-- OR --

Including the persisting Id in the list would be even better:
18047      1 3rd St                  Suite A      Hometown      NJ      08000   18047|1045136|2321|3311|3681
1750      1 Baker Rd          Ste C              Happy Hill      NJ      08111   1750|1822|1935


Thanks!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Jan LouwerensSoftware Engineer

Commented:
SELECT
   ID, Address_1, Address_2, City, State, Zip_Code, Ids
FROM
   Your_Table_Name,
   (
      SELECT
         ID AS First_ID,
         LISTAGG(ID, '|') WITHIN GROUP (ORDER BY ID) OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code) AS Ids,
         ROW_NUMBER() OVER (PARTITION BY Address_1, Address_2, City, State, Zip_Code ORDER BY ID) AS Idx
      FROM
         Your_Table_Name
   )
WHERE
   ID = First_ID AND
   Idx = 1
ORDER BY ID

Open in new window


You may need to use STRING_AGG instead of LISTAGG.

Also, make sure you use your actual table name (replace Your_Table_Name) and column names (replace Address_1, Address_2, City, State, Zip_Code)
EE Topic Advisor
Most Valuable Expert 2014
Awarded 2013
Commented:
The query below will achieve the expected result.  It summarises the common data using "select distinct" then through a subsequent cross apply concatenates the ids into a list.  If you are using SQL Server 2017 (or later) then instead of the approach seen below you could use string_agg
SELECT
    *
FROM (
    SELECT DISTINCT
        address1
      , address2
      , city
      , state
      , zip
    FROM mytable
) d
CROSS APPLY (
    SELECT
        STUFF((
            SELECT
                '| ' + CAST(ID AS varchar)
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = t.address2
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR xml PATH ('')
        )
        , 1, 1, '')
) ca (IDList)

Open in new window

|    |  address1  | address2 |    city    | state | zip  |              IDList               |
|----|------------|----------|------------|-------|------|-----------------------------------|
|  1 | 1 3rd St   | Suite A  | Hometown   | NJ    | 8000 |  2321| 3311| 3681| 18047| 1045136 |
|  2 | 1 Baker Rd | Ste C    | Happy Hill | NJ    | 8111 |  1750| 1822| 1935                 |

Open in new window

Please note I had to assume all column names and the table name. In future questions please include the relevant names as it will be easier for all concerned.
CREATE TABLE mytable(
   ID       INTEGER  NOT NULL PRIMARY KEY 
  ,ADDRESS1 VARCHAR(19) NOT NULL
  ,ADDRESS2 VARCHAR(23) NOT NULL
  ,CITY     VARCHAR(18) NOT NULL
  ,STATE    VARCHAR(10) NOT NULL
  ,ZIP      INTEGER  NOT NULL
);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (18047,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1045136,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (2321,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (3311,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (3681,'1 3rd St','Suite A','Hometown','NJ',08000);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1750,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1822,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);
INSERT INTO mytable(ID,ADDRESS1,ADDRESS2,CITY,STATE,ZIP) VALUES (1935,'1 Baker Rd','Ste C','Happy Hill','NJ',08111);

Open in new window

Using string_agg it is much simpler:
SELECT
    address1
  , address2
  , city
  , state
  , zip
  , string_agg(id, '|') IDList
FROM mytable
GROUP BY
    address1
  , address2
  , city
  , state
  , zip

Open in new window

also see https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c0ebce24f9927a3f50b8ff81e3f9afe9
Sailing_12Pirate

Author

Commented:
Using the LISTAGG query above throws the following error:  The function 'LISTAGG' may not have a WITHIN GROUP clause.

Using STRING_AGG throws: 'string_agg' is not a recognized built-in function name.
Sailing_12Pirate

Author

Commented:
PortletPaul - your first query is working, except that I get no IdLIst in cases where address2 is null (didn't show this case in my examples but expected) how do I account for no address2?
Sailing_12Pirate

Author

Commented:
This seems to be working:

SELECT
    *
FROM (
    SELECT DISTINCT
        address1
      , isnull(address2, '') as address2
      , city
      , state
      , zip
    FROM mytable
) d
CROSS APPLY (
    SELECT
        STUFF((
            SELECT
                '| ' + CAST(ID AS varchar)
            FROM mytable AS t
            WHERE d.address1 = t.address1
            AND d.address2 = isnull(t.address2, '')
            AND d.city = t.city
            AND d.state = t.state
            AND d.zip = t.zip
            ORDER BY t.id
            FOR xml PATH ('')
        )
        , 1, 1, '')
) ca (IDList)

Open in new window

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial