Removing duplicate cities in huge geo database using cfloop. Is there a faster way to do this?

I've got a huge database of all city names, long, lat, etc for every zip code in North America.  Many cities have multiple zip codes and therefore many entries.  What I'm trying to do is make my database smaller by removing all the duplicate rows based on state and city name.  This way, I will only have 1 record for each city.

Here is what I have now.  The problem is it is deathly slow.  Is there a better way I could be doing this without using Coldfusion to loop over each record?  MySQL 5, Coldfusion 9
<cfquery name="states">
SELECT states
FROM states
</cfquery>

<cfloop query="states">

<cfquery name="cities">
SELECT DISTINCT(city)as city
FROM cities
WHERE region1 = '#states.state#'
ORDER BY city
</cfquery>

<cfoutput query="cities">

<cfquery name="info">
SELECT *
FROM cities
WHERE region1 = '#state#'
AND city = '#cities.city#'
LIMIT 1
</cfquery>

<cfquery>
INSERT INTO cities_new
(country, language, regISO2, Region1, Region2, Region3, Region4, zip, city, area1, area2, lat, lng, tz, utc, dst)
VALUES
('#UCASE(info.country)#', '#UCASE(info.language)#', '#UCASE(info.regiso2)#', '#UCASE(info.region1)#', '#UCASE(info.region2)#', '#UCASE(info.region3)#', '#UCASE(info.region4)#', '#UCASE(info.zip)#', '#UCASE(info.city)#', '#UCASE(info.area1)#', '#UCASE(info.area2)#', #info.lat#, #info.lng#, '#UCASE(info.tz)#', '#UCASE(info.utc)#', '#UCASE(info.dst)#')
</cfquery>

</cfoutput>
</cfloop>

Open in new window

MFredinAsked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
gdemariaConnect With a Mentor Commented:

The other easy way to do it is to select distinct city, region, etc...

List all the columns that you need (excluding the primary key)

Insert it into another tale that will generate a new primary key.

Not a good solution if you have foreign keys counting on the original primary key, but if you don't this is really fast and effective, it's also safer because a delete gone bad will remove more records than what you want.  This approach isn't deleting it is creating a new table you can test before swapping in for the old.

insert into newTempTable (column list...)
select distinct (column list....)

0
 
jet-blackCommented:
DELETE FROM your_table WHERE id NOT IN (SELECT id FROM your_table GROUP BY Region1, city)
0
 
MFredinAuthor Commented:
I tried this... but getting the following error:  You can't specify target table 'cities' for update in FROM clause

<cfquery>
DELETE
FROM cities
WHERE country = 'US'
AND id NOT IN (SELECT id FROM cities WHERE country = 'US' GROUP BY Region1, city)
</cfquery>
0
Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

 
gdemariaCommented:
> DELETE FROM your_table WHERE id NOT IN (SELECT id FROM your_table GROUP BY Region1, city)

seems to me this will not delete any records, the inner SELECT will fetch every single ID so there are no IDs not included by the inner select... therefore, no records will be removed

how about this code...

I hope you've made a backup or are testing in a copy of the table or something.. !


delete cities
WHERE country = 'US'
and ID In (select min(id)
   from cities
   group by city, region
   having count(*) > 1)

Open in new window

0
 
gdemariaCommented:
oh, I forgot to mention.   That will deleted one record of the duplicate.  You keep running it until it returns 0 results.   Some cities may have 3 or 4 duplicates, if they do, just run the script until no records are returned.    Of course not the best solution for automation, but an easy way if you're doing it manually just one time.

0
 
jet-blackConnect With a Mentor Commented:
Sorry, I forgot that currently you can't use the same table in subqueries on update actions in MYSQL.
You should try another ways. (maybe you can create temprorary table and use this temprorary table in subquery)

http://www.sitepoint.com/forums/showthread.php?t=541339
http://www.xaprb.com/blog/2006/06/23/how-to-select-from-an-update-target-in-mysql/
0
 
gdemariaCommented:
wow, that's a nasty restriction for a database

 select distinct... can avoid that, see my previous post
0
 
MFredinAuthor Commented:
Thanks!  Inserting into a new table was the best solution for me.  Fast too! Thanks!
0
All Courses

From novice to tech pro — start learning today.