Link to home
Start Free TrialLog in
Avatar of ambientsbs
ambientsbs

asked on

Faster MySQL Update Query to update field in table based on other table

Hello Experts,

I reached out here earlier to get some code for the following problem, and it's worked great, however I'm now using it on a table with about 3 Million records and it takes forever and eventually times out.

Here's what I have. I have a table with Cities and States and an empty Zip field. I'm using the statement below to populate the zip field based on the name of the city and state and a table that stores all my city, state, zip info:

UPDATE Z_MyTable S 
STRAIGHT_JOIN CityStateZip_LatLong SA
ON S.State = SA.state
SET S.Zip = SA.zipcode
WHERE S.City LIKE CONCAT('%',SA.city,'%')
AND S.State = SA.state

Open in new window


Is there any way that I can maybe order the Z_MyTable within this query in order to make it run faster? I'm at a loss, but it just doesn't seem to work great with that many records...

Thanks for the help!
Avatar of acbxyz
acbxyz
Flag of Germany image

The wildcard in your where is not good if it comes to performance, especially the wildcard at the beginning.

If this must be like it is, you can add a "where s.zip is null" and limit the number of rows being updated in one step.
This won't reduce the overall time, but you can successfully update all records without timeouts.
ASKER CERTIFIED SOLUTION
Avatar of Theo Kouwenhoven
Theo Kouwenhoven
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This WHERE clause is incorrect and unoptimal as you are asking MySQL
    to do a substring search within a string:
       WHERE S.City LIKE CONCAT('%',SA.city,'%')


Incorrect? Now, we don't know depends on the contents  of the city fields
Unoptimal ! Yes sure, but every where, order by or group by is Unoptimal if the index in not available, But using wildcard is very unoptimal :-)

btw ambientsbs

Why do you use this %like% ?
I am presuming the CityStateZip_LatLong table (which is aliased to "SA")
is a reference table of cities, states, zip codes, etc .

I am presuming that the city field contains the name of each city in the
U.S. and therefore starts in column 1. It is therefore more correct -- as
I stated -- for proposed query to use a WHERE clause like this:
    WHERE S.City LIKE CONCAT('%',SA.city,'%')

As it turns out it should be more optimal as the MySQL optimizer can now
internally do a match against a city index (should it exist) instead of
a substring search through the entire table.
In that case you need the '%',SA.city,'%'

SA.city,'%' is not enough, there are loads of countries that have the ZIP code in front of the city name.

By the way, this will not work if one of the cities is also part of a larger city name:
eg. "new york" like '%york%' is also correct

yes I know it's a different state, but I'm not smart enough to find a better example :-)
Avatar of ambientsbs
ambientsbs

ASKER

Perhaps I'm doing all of this wrong and there's a simpler way to do it.....let me clarify what my data looks like. I have a list of a couple million vehicles with their cities and states but no zip codes. The cities and states will show things like:


city                                                     state
san francisco bay area                    CA
new york city                                    NY
greenville/upstate                            SC


I also have the CityStateZip_LatLong table that has a listing of every single city, state, and zip code in the US.

I'm using the LIKE statement because I never know what the city is going to look like, but I think it will work fine to just use the WHERE S.City LIKE CONCAT(SA.city,'%') statement.


Is there a cleaner/faster way to assign a zip code to every record based on city/state than what I am doing? Hope this helps clarify....


I'll work on the suggestions given and provide feedback. Thanks for the initial responses!