asked on

Keep only one row in database with multiple duplicate values

Hello Experts,

I have a table with about 900k+ postal codes and lat/long values in different countries. I am trying to create a geospatial index and use country+postalcode+longitude for it. Unfortunately there are many overlapping values where one postalcode has multiple small cities in the same latitude. I can select them like this:

SELECT ta.zipcode,
       ta.city   
      ,ta.country
      ,ta.longitude
FROM [global_geo].[dbo].[ZipCodes] ta
WHERE (SELECT COUNT(*)
       FROM [global_geo].[dbo].[ZipCodes] ta2
       WHERE ta.zipcode=ta2.zipcode
       AND ta.country=ta2.country
       AND ta.longitude=ta2.longitude
      )>1

Open in new window

Now I would like to keep only the FIRST result of each double. How do I do that?

Thank you!

ASKER CERTIFIED SOLUTION

knsp

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

nikomanek

ASKER

Where do I put the TOP 1? I tried in different places and got a syntax error. What I ment by keeping the first in the DB is deleting the doubles.

ggzfab

Why not use a Group By like:

SELECT ta.zipcode,
       ta.city   
      ,ta.country
      ,Min(ta.longitude) as MinLat
      ,Max(ta.longitude) as MaxLat
FROM [global_geo].[dbo].[ZipCodes] ta
GROUP BY  ta.zipcode,
       ta.city   
      ,ta.country

Open in new window

This will transform the single longitude into two similar values and the dupes (2, 3, etc.) into the MIN and MAX value found.