Link to home
Start Free TrialLog in
Avatar of nikomanek
nikomanekFlag for Afghanistan

asked on

Keep only one row in database with multiple duplicate values

Hello Experts,

I have a table with about 900k+ postal codes and lat/long values in different countries. I am trying to create a geospatial index and use country+postalcode+longitude for it. Unfortunately there are many overlapping values where one postalcode has multiple small cities in the same latitude. I can select them like this:
SELECT ta.zipcode,
       ta.city   
      ,ta.country
      ,ta.longitude
FROM [global_geo].[dbo].[ZipCodes] ta
WHERE (SELECT COUNT(*)
       FROM [global_geo].[dbo].[ZipCodes] ta2
       WHERE ta.zipcode=ta2.zipcode
       AND ta.country=ta2.country
       AND ta.longitude=ta2.longitude
      )>1

Open in new window


Now I would like to keep only the FIRST result of each double. How do I do that?

Thank you!
ASKER CERTIFIED SOLUTION
Avatar of knsp
knsp

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nikomanek

ASKER

Where do I put the TOP 1? I tried in different places and got a syntax error. What I ment by keeping the first in the DB is deleting the doubles.
Avatar of ggzfab
ggzfab

Why not use a Group By like:
SELECT ta.zipcode,
       ta.city   
      ,ta.country
      ,Min(ta.longitude) as MinLat
      ,Max(ta.longitude) as MaxLat
FROM [global_geo].[dbo].[ZipCodes] ta
GROUP BY  ta.zipcode,
       ta.city   
      ,ta.country

Open in new window

This will transform the single longitude into two similar values and the dupes (2, 3, etc.) into the MIN and MAX value found.