Solved

MySQL Multiple table update statement performance question.

Posted on 2008-10-02
13
850 Views
Last Modified: 2013-12-12
Hey folks,

New user here, so I'm not sure if I have the proper forum to ask this question, so my apologies in advance if this belongs elsewhere.

Essentially, here's my scenario. We have a web app that functions as a warehouse management system for locations of products, quantities, etc. A mini ERP basically, but far more simplistic.

I have 3 tables with the following structure: (there are more fields, only the relevant ones are shown)

product:
PKProductID, FKWarehouseLocationID

productgroup:
SKU, FKProductID

Warehouse:
PKWarehouseLocationID, WareHouseLocationName

The foreign key FKProductID from productgroup references the primary key PKProductID in the product table, and the Foreign key FKWarehouseLocationID in the product table references the primary key PKWarehouseLocationID in the warehouse table.

Now, I'm using this query to update individual warehouse locations for a given sku:

UPDATE warehouse, product, productgroup PG
JOIN product p ON (PG.`FKProductID` = p.`PKProductID`)
JOIN warehouse w ON (p.`FKWareHouseLocationID` = w.`PKWareHouseLocationID`)
set p.`FKWareHouseLocationID` = w.`PKWareHouseLocationID`
WHERE PG.`SKU` = 'enter_sku_here'
AND w.`WareHouseLocationName` = 'enter_location_of_product_here'

That query works, but it's ugly; it takes a good 12-13 seconds per record to update.  On a per record basis, that's not a *huge* issue, but we've gotten to the point where there's too many changes to efficiently do singly.

So I created a temp table I use for importing new sheets of warehouse locations with the structure of:

temptable:
SKU, WareHouseLocation

I adjusted the above query to basically update locations based on the values entered into the temptable. the new query looks like this:

UPDATE warehouse, product, temptable tt, productgroup PG
JOIN product p ON (PG.`FKProductID` = p.`PKProductID`)
JOIN warehouse w ON (p.`FKWareHouseLocationID` = w.`PKWareHouseLocationID`)
set p.`FKWareHouseLocationID` = w.`PKWareHouseLocationID`
WHERE PG.`SKU` = 'tt.sku'
AND w.`WareHouseLocationName` = 'tt.WareHouseLocation'

Only problem is, as you can see from my performance numbers, 12-13 seconds per record when you have a sheet of 1,000 rows just isn't viable.

I'm not an SQL wizard by any stretch, so I'm sure there's got to be a better way to accomplish this. Can anyone suggest a better way to write queries that involve multiple, large tables like these? Even a point in the right direction would be appreciated.
0
Comment
Question by:jmoriarty
  • 7
  • 6
13 Comments
 
LVL 2

Expert Comment

by:storeytime
ID: 22631677
personally I can;t even figure out how that works but I am assuming that what you are trying to do is find the product with a SKU and then update its location if so try the following if not please clarify:

UPDATE product p
JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`)
SET p.`FKWareHouseLocationID` =
        (SELECT `PKWareHouseLocationID`
         FROM warehouse w
         WHERE w.`WareHouseLocationName` = 'enter_location_of_product_here')
WHERE pg.`SKU` = 'enter_sku_here';


Was't sure on the data structures used for each I simulated with INT UNSIGNED for each and it works.
0
 

Author Comment

by:jmoriarty
ID: 22635635
Hi storeytime,

Thanks for the fast reply! I actually tried using something similar to that using a sub-query to choose the productid, but the problem with sub-queries is that, when pulling a large list of skus from a temporary table like in the latter half of my post, the sub-query would return more than one result. Or at least that's the error MySQL was giving me.

For example, using your above query, I would write it similar to this to facilitate the above paragraph:

UPDATE product p
JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`)
SET p.`FKWareHouseLocationID` =
        (SELECT `PKWareHouseLocationID`
         FROM warehouse w
         WHERE w.`WareHouseLocationName` = 'temptable.warehousename')
WHERE pg.`SKU` = 'temptable.sku';

which returns 0 rows affected (presumably due to the sub-query returning multiple results?)

And you mostly were right about the data types - my apologies for forgetting such a crucial detail. All of them are INT UNSIGNED with the exception of WareHouseLocationName which is VARCHAR

Thanks for taking the time to assist me; I appreciate it greatly.
0
 
LVL 2

Expert Comment

by:storeytime
ID: 22636452
If sub query returns multiple results then there is error in data not in the code.  All the IDs should be unique and defined as keys/ indexes.  Run the subquery alone to test that theory also run one with SKU WHERE statement
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 

Author Comment

by:jmoriarty
ID: 22639003
Ah, I see, that starts to make sense now. Running the sub-query alone as shown below returns 0 results, even though there are matching warehouse location names in both tables. I.E., if I run a select statement for a specific warehouse name on both tables, I get results, but if I run the query as shown below, it returns 0 results. Could that be due to no indexes/key definitions on the temptable? Basically, I was just creating temptable with the fields as placeholders to run queries against, I hadn't defined any keys/indexes as of yet.

SELECT `PKWareHouseLocationID`
         FROM warehouse w
         WHERE w.`WareHouseLocationName` = 'temptable.warehousename'


Thank you again for taking the time to assist me.
0
 

Author Comment

by:jmoriarty
ID: 22639008
Actually, I take back part of what I said in the previous comment - I do have a primary key and index on the temptable assigned to the sku field.
0
 

Author Comment

by:jmoriarty
ID: 22639127
Small update - running the subquery as shown below (and above) does work. I had accidentally enclosed the table name in single quotes in the WHERE clause.

So running this:

select w.`pkwarehouseid`
from warehouse w, temptable
where w.WareHouseLocationName = temptable.warehousename

Does produce the right results and shows the pkwarehouseid of each warehousename in temptable, but plugging the sub-query into the update statement like this:

UPDATE product p
JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`)
SET p.`FKWareHouseID` =
        (select w.`pkwarehouseid`
from warehouse w, temptable
where w.WareHouseLocationName = temptable.warehousename)
WHERE pg.`SKU` = temptable.sku;

Throws a mysql error of:

Unknown column 'temptable.sku' in 'where clause'

I'm guessing it's something simple I'm not aware of that needs to be included in the update statement to reference the temptable?


Thanks again!

0
 
LVL 2

Expert Comment

by:storeytime
ID: 22639580
you have temptable defined only in the from in the subquery but access it in the WHERE of the UPDATE.

What was the speed of the command using without temptable was it still 12 sec

Best query so far

UPDATE
(((warehouse w INNER JOIN product p ON (w.`PKWareHouseLocationID` = p.`FKWareHouseLocationID`))
INNER JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`))
INNER JOIN temptable tt ON (tt.SKU = pg.SKU))
set p.WID = (select w.`PKWareHouseLocationID`
                     FROM warehouse w
                     WHERE w.`WareHouseLocationName`  = tt.warehousename) ;
0
 
LVL 2

Expert Comment

by:storeytime
ID: 22639602
UPDATE
(((warehouse w LEFT INNER JOIN product p ON (w.`PKWareHouseLocationID` = p.`FKWareHouseLocationID`))
LEFT INNER JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`))
LEFT INNER JOIN temptable tt ON (tt.SKU = pg.SKU))
set p.WID = (select w.`PKWareHouseLocationID`
                     FROM warehouse w
                     WHERE w.`WareHouseLocationName`  = tt.warehousename) ;



I forced LEFT INNER JOIN its suppose to be identical but I also read that it can cut down on permutations so it might speed it up with large tables
0
 

Author Comment

by:jmoriarty
ID: 22639666
Ah, now I see, you join all tables on common fields, and then update the single record you want to change. Since we're joining temptable to productgroup using sku, it's essentially the same as saying "update warehouseid where this sku is the same as this sku" since only the records that match in both tables are joined when using an inner join, correct?

This query worked well:

UPDATE
(((warehouse w INNER JOIN product p ON (w.`PKWareHouseLocationID` = p.`FKWareHouseLocationID`))
INNER JOIN productgroup pg ON (pg.`FKProductID` = p.`PKProductID`))
INNER JOIN temptable tt ON (tt.SKU = pg.SKU))
set p.WID = (select w.`PKWareHouseLocationID`
                     FROM warehouse w
                     WHERE w.`WareHouseLocationName`  = tt.warehousename) ;

Took about 4.5 seconds to update 11 rows, which is far better than the results I was getting. That's only ~7 minutes or so for 1,000 record update.

In response to your question, on a per SKU basis without running through temptable, the update statement took ~750 MS, so even on an individual update basis, it was still much faster.

Interestingly, your statement using LEFT INNER JOIN failed; could be a data issue?

I'm wondering though, even at ~7 minutes per 1,000, I'm beginning to think it might be faster to run the query through a PHP script and loop through each result of a select statement and update each record individually. What's your opinion?


Thanks!
0
 
LVL 2

Accepted Solution

by:
storeytime earned 500 total points
ID: 22639733
I think you are starting to pick up on it.  I haven't done in sql in years so it is slowly coming back to me

I doubt that would be faster depending on how the script is done it will have to be reparsed each time it goes through the loop which adds time to the query

UPDATE
((( productgroup pg INNER JOIN temptable tt ON (tt.SKU = pg.SKU))
INNER JOIN product p ON (pg.`FKProductID` = p.`PKProductID`))
INNER JOIN warehouse w  ON (w.`PKWareHouseLocationID` = p.`FKWareHouseLocationID`))
set p.WID = (select w.`PKWareHouseLocationID`
                     FROM warehouse w
                     WHERE w.`WareHouseLocationName`  = tt.warehousename) ;


I rearranged the joins since temptable is the smallest table is should be in the first join to decrease overall table size should speed it up somemore
0
 

Author Comment

by:jmoriarty
ID: 22640499
Ah, I see - so you join from from smallest to largest, smaller tables on the outside, larger on the inside, because your smallest table designated first is what's going to create the initial table. One of the things I never thought of.  And the parenthesis I assume are to ensure that the tables are joined in the proper order?

I've gained a lot of insight from your responses, I thank you once again.
0
 
LVL 2

Expert Comment

by:storeytime
ID: 22641583
no problem what is the speed down too
0
 

Author Comment

by:jmoriarty
ID: 22643039
Much better with the second query where you adjusted the smaller tables to be first - I just finished testing various sheets of records, ~12.54 seconds for 100 records, thats less than 3 minutes for 1,000 records, which is a 10 fold increase over what I was working with. The largest sheets are usually only around 2-3000 records at most, so that's not bad performance considering what I was working with.

Very much appreciated.
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question