Solved

How to speed up a SELECT JOIN query for large tables

Posted on 2008-06-20
1,992 Views
Last Modified: 2010-04-21
How do I speed up a SELECT JOIN query for two large tables (teststock and prdbesarch)?

I've been using this code for a while now and it seems to be working fine.

One table is teststock and looks like this:
------------------------------------------------
Field Name   Field Type    AllowNull  Key  Default    Extra
ID           int(12)       NO         PRI             auto_increment
CombArtNr    varchar(9)    NO
KlantNr      int(5)        NO              0
besnr        int(6)        NO              0
itemnr       int(4)        NO              0

prdbesarch looks like this:
------------------------------
Field Name   Field Type     AllowNull  Key  Default    Extra
ID           int(25)        NO         PRI             auto_increment
besdate      date           NO
klantnr      int(5)         NO
klantnaam    varchar(64)    NO              geen naam
fland        varchar(64)    NO              geen land
opmerkingen  varchar(512)   NO              geen
faktuurnr    int(6)         NO              0
proformanr   int(6)         NO              0
leveringsnr  int(6)         NO              0
finstatus    varchar(128)   NO
besnr        int(6)         NO
refklant     varchar(64)    NO
itemnr       int(4)         NO
artcode      varchar(6)     NO
klrcode      varchar(3)     NO
omschr       varchar(256)   NO
qty          decimal(6,2)   NO
price        decimal(8,2)   NO
totaal       decimal(8,2)   NO              0.00
kg           decimal(6,2)   NO
intrastat    int(8)         NO


Today I've updated teststock to the point where it has around 300,000 rows.
prdbesarch has around 15,000 rows.


The code basically shows me the content of prdbesarch, and then for each row it gives me the total stock (FromStock) and the available stock (RestStock).

Is it maybe because I'm pushing the limits of MySQL? Or is it because my code is just too complex for two tables?

This is an example of the code I use.
SELECT l.*, 
SUM(CASE g.KlantNr WHEN l.klantnr THEN CASE g.besnr WHEN l.besnr THEN CASE g.itemnr WHEN l.itemnr THEN 1 ELSE 0 END ELSE 0 END ELSE 0 END) "FromStock", 
SUM(CASE g.KlantNr WHEN '0' THEN 1 ELSE 0 END)  "RestStock", 
COUNT(CombArtNr) "AllStock" 
FROM ( 
    SELECT 
    prdbesarch.ID, 
    prdbesarch.besdate, 
    prdbesarch.klantnr, 
    prdbesarch.klantnaam, 
    prdbesarch.fland, 
    prdbesarch.opmerkingen, 
    prdbesarch.faktuurnr, 
    prdbesarch.proformanr, 
    prdbesarch.leveringsnr, 
    prdbesarch.finstatus, 
    prdbesarch.besnr, 
    prdbesarch.refklant, 
    prdbesarch.itemnr, 
    prdbesarch.artcode, 
    prdbesarch.klrcode, 
    prdbesarch.omschr, 
    prdbesarch.qty, 
    prdbesarch.price, 
    prdbesarch.totaal, 
    prdbesarch.kg, 
    prdbesarch.intrastat 
    FROM prdbesarch 
    WHERE (artcode LIKE 'A1101') 
    ) l 
    LEFT JOIN teststock g 
    ON CONCAT(l.artcode,'_',l.klrcode) = g.CombArtNr 
    GROUP BY g.CombArtNr, 
    l.ID, 
    l.klantnr, 
    l.besnr, 
    l.refklant, 
    l.itemnr, 
    l.artcode, 
    l.klrcode, 
    l.qty 
ORDER BY l.ID ASC


Question by:24Carat
4 Comments
 
LVL 9

Accepted Solution

by: Rurne earned 350 total points
ID: 21833821
The code is unoptimized. Typically, subqueries (lines 6-29) slow performance, especially when used in joins, because MySQL has to create a temporary table in order to perform the LEFT JOIN on the alias ("l", line 30). The join is then performed on a function (line 32, "CONCAT(l.artcode,'_',l.klrcode)"), so the string has to be built and compared for every row and any index on g.CombArtNr is ignored. Depending on the data definitions for prdbesarch and teststock, the GROUP BY (lines 33-41) is probably being performed on columns without indexes, which takes longer. Your result set also depends on two aggregate functions containing extended switching statements (lines 2-3), which take considerable logic for the query planner to map and execute. Finally, you are ordering on a column from a temporary table (line 42), which has no index from the original table.

One question I do have: line 29 uses the LIKE conditional, but you don't use any '%' wildcards to indicate an incomplete phrase. Should prdbesarch.artcode equal 'A1101' exactly, or should it simply start with 'A1101' and have characters after it? I assume the latter in the code below. In either case, since you are both filtering on and applying GROUP BY to prdbesarch.artcode, I'd recommend creating an index on it, if you have not already done so.
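If it helps, a plain single-column index would look something like the following (the index name is just a placeholder). Because the pattern starts with a literal prefix rather than a leading '%', MySQL can still use the index for that LIKE filter.

-- speeds up the WHERE ... LIKE 'A1101%' filter and the GROUP BY on artcode
CREATE INDEX idx_prdbesarch_artcode ON prdbesarch (artcode);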

Below is a start, though to get optimum performance we're still going to have to address the CONCAT issue not optimizing on an index. That's going to be a huge performance drain, but fixing it will require modifying prdbesarch. I'd also recommend looking into MySQL's EXPLAIN syntax a bit more to help pinpoint where the time is being spent.
SELECT prdbesarch.ID, prdbesarch.klantnr, prdbesarch.besnr, prdbesarch.refklant, prdbesarch.itemnr, prdbesarch.artcode, prdbesarch.klrcode, prdbesarch.qty, 
SUM(CASE teststock.KlantNr WHEN prdbesarch.klantnr THEN CASE teststock.besnr WHEN prdbesarch.besnr THEN CASE teststock.itemnr WHEN prdbesarch.itemnr THEN 1 ELSE 0 END ELSE 0 END ELSE 0 END) "FromStock", 
SUM(CASE teststock.KlantNr WHEN '0' THEN 1 ELSE 0 END) "RestStock", 
COUNT(teststock.CombArtNr) "AllStock" 
FROM prdbesarch 
LEFT JOIN teststock 
ON CONCAT(prdbesarch.artcode,'_',prdbesarch.klrcode) = teststock.CombArtNr 
WHERE prdbesarch.artcode LIKE 'A1101%'
GROUP BY teststock.CombArtNr, 
    prdbesarch.ID, 
    prdbesarch.klantnr, 
    prdbesarch.besnr, 
    prdbesarch.refklant, 
    prdbesarch.itemnr, 
    prdbesarch.artcode, 
    prdbesarch.klrcode, 
    prdbesarch.qty 
ORDER BY prdbesarch.ID ASC

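To see what the optimizer actually does, you can prefix the query with EXPLAIN; here is a trimmed-down sketch (the exact output columns vary by MySQL version):

-- check the "key" column (which index, if any, is chosen per table) and the "rows" estimates
EXPLAIN
SELECT prdbesarch.ID, COUNT(teststock.CombArtNr) "AllStock"
FROM prdbesarch
LEFT JOIN teststock
    ON CONCAT(prdbesarch.artcode,'_',prdbesarch.klrcode) = teststock.CombArtNr
WHERE prdbesarch.artcode LIKE 'A1101%'
GROUP BY prdbesarch.ID;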

 
LVL 45

Assisted Solution

by: Kent Olsen earned 150 total points
ID: 21834215

You've got a potential issue in the join operation, too.

CombArtNr is a VARCHAR (9)

ArtCode is a VARCHAR (6)
KlrCode is a VARCHAR (3)

CONCAT(ArtCode, '_', KlrCode) can therefore be up to 10 characters long, and any such 10-character value can never match CombArtNr.
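A quick way to check whether any of your rows are actually affected (purely a diagnostic query against the current columns):

-- rows whose concatenated key exceeds 9 characters can never match teststock.CombArtNr (varchar(9))
SELECT COUNT(*) AS too_long
FROM prdbesarch
WHERE CHAR_LENGTH(CONCAT(artcode, '_', klrcode)) > 9;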

I'm a freak for normalization AND for integer join keys.  (They go together.)  You can make your database a lot more efficient if you'll utilize some of these techniques.

If it makes sense with regard to other processes, you should consider building another table with the CombArtNr values, with an auto_increment primary key.  In the teststock and prdbesarch tables, store the primary key from this table instead of the character values.  Index the column in both tables.

Now the join operation is based on indexed integers.  You'll see a tremendous performance improvement.
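A rough sketch of that restructuring, with purely illustrative table and column names:

-- lookup table: one row per distinct article/colour combination
CREATE TABLE combart (
    ID      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    artcode VARCHAR(6) NOT NULL,
    klrcode VARCHAR(3) NOT NULL,
    UNIQUE KEY uq_combart (artcode, klrcode)
);

INSERT INTO combart (artcode, klrcode)
SELECT DISTINCT artcode, klrcode FROM prdbesarch;

-- integer reference columns, indexed in both tables
ALTER TABLE prdbesarch ADD COLUMN combart_id INT NOT NULL DEFAULT 0, ADD INDEX idx_pb_combart (combart_id);
ALTER TABLE teststock  ADD COLUMN combart_id INT NOT NULL DEFAULT 0, ADD INDEX idx_ts_combart (combart_id);

-- back-fill teststock from the existing character values
UPDATE teststock t
JOIN combart c ON t.CombArtNr = CONCAT(c.artcode, '_', c.klrcode)
SET t.combart_id = c.ID;

After back-filling prdbesarch the same way, the join becomes ON prdbesarch.combart_id = teststock.combart_id, which can use the new integer indexes.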


Good Luck,
Kent
 

Author Closing Comment

by:24Carat
ID: 31469251
Optimizing the query has improved the performance a lot.

I've also created indexes on the teststock table, and this helps even more; however, I'm inexperienced with indexes.

Which is the best way to create indexes? Should I simply create one index named TSindex on all the field names, or should I create indexes for each field?

How would you create the indexes on these tables?

Teststock is a table that changes frequently; is that also an issue?
 
LVL 9

Expert Comment

by:Rurne
ID: 21835485
The main benefit of a multicolumn index (especially in MyISAM tables) is to use it as a unique constraint. A multicolumn index is only used when the query filters on its leftmost column(s), so you are better off creating individual indexes on each of the columns in the GROUP BY statement.
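For the query above, that could look like the following on teststock (index names are placeholders); CombArtNr is the join column, so its index is the one most likely to matter here:

CREATE INDEX idx_ts_combartnr ON teststock (CombArtNr);
CREATE INDEX idx_ts_klantnr   ON teststock (KlantNr);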

Then again, it is a matter of perspective. As you've noted that teststock has frequent inserts/updates, too many indexes can slow performance, since each index has to be updated after every data-manipulation query. You may want to see how it performs with and without each of the indexes, and use some sample data to benchmark the insert and update queries:
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_benchmark
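As a small illustration of that function (it only times a scalar expression, so it cannot run INSERT/UPDATE statements directly; the single-value subquery below is just an assumed example):

-- evaluates the expression 1000 times; BENCHMARK() itself returns 0,
-- the number to look at is the elapsed time the client reports
SELECT BENCHMARK(1000, (SELECT COUNT(*) FROM teststock WHERE KlantNr = 0));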
