LEFT OUTER JOIN VS. RIGHT OUTER JOIN

tantormedia
tantormedia used Ask the Experts™
on
Dear experts,

I have a query:
SELECT ...
MAX(CASE WHEN r.r_type = 'Audio' THEN r.Awards END) AudioReviews,
MAX(CASE WHEN r.r_type = 'Print' THEN r.Awards END) PrintReviews,
...
FROM BookList b
...
LEFT OUTER JOIN
(SELECT BookStem, r_type, GROUP_CONCAT(DISTINCT CASE WHEN Award IS NOT NULL THEN
CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', Award, '</a>')
ELSE CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">Review</a>') END SEPARATOR '<br />') Awards
FROM Reviews
GROUP BY BookStem, r_type) r
ON b.BookStem = r.BookStem

That outer join part makes my query very slow. I replace the LEFT OUTER JOIN with RIGHT OUTER JOIN and swapped tables accordingly:
SELECT ...
MAX(CASE WHEN r.r_type = 'Audio' THEN r.Awards END) AudioReviews,
MAX(CASE WHEN r.r_type = 'Print' THEN r.Awards END) PrintReviews,
...
FROM
(SELECT BookStem, r_type, GROUP_CONCAT(DISTINCT CASE WHEN Award IS NOT NULL THEN
CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', Award, '</a>')
ELSE CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">Review</a>') END SEPARATOR '<br />') Awards
FROM Reviews
GROUP BY BookStem, r_type) r
RIGHT OUTER JOIN
BookList b
ON b.BookStem = r.BookStem
...
To my surprise, now the query runs 3 times faster, though the result seems to be the same. Could you please explain this to me? I always thought it there should be no difference.
Also, maybe you could give me any suggestion how to improve the query further?
Thanks.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Kevin CrossChief Technology Officer
Most Valuable Expert 2011

Commented:
Hmm.

That is very interesting and, yes, the results and performance of query should be the same.  To see what is going on, try to run EXPLAIN on both queries and see if there is a different plan developed based on the RIGHT versus LEFT OUTER JOIN.

Here is a nice reference for EXPLAIN and MySQL query performance in general:
http://www.experts-exchange.com/Database/MySQL/A_1250-3-Ways-to-Speed-Up-MySQL.html

As for further tuning, the results of EXPLAIN should help as well as the reference above.  Look for index issues and try to simplify when possible.  For example, instead of the CASE in the GROUP_CONCAT, try using COALESCE -- if nothing else, should be easier to maintain code.

...
(SELECT BookStem, r_type, GROUP_CONCAT(DISTINCT CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') SEPARATOR '<br />') Awards
FROM Reviews
GROUP BY BookStem, r_type) r
...

Open in new window


Hope that helps!

Author

Commented:
Thank you for your answer. I did try EXPLAIN.
Here is the result for the LOJ:
1, 'PRIMARY', 'b', 'range', 'Index_PublicationDate', 'Index_PublicationDate', '9', '', 1036, 'Using where; Using temporary; Using filesort'
3, 'DERIVED', 'Reviews', 'ALL', '', '', '', '', 5859, 'Using filesort'
1, 'PRIMARY', '<derived3>', 'ALL', '', '', '', '', 2117, ''

For the ROJ:
1, 'PRIMARY', 'b', 'ALL', 'Index_PublicationDate', '', '', '', 2325, 'Using where; Using temporary; Using filesort'
2, 'DERIVED', 'Reviews', 'ALL', '', '', '', '', 5859, 'Using filesort'
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 2117, ''

It seems like RIGHT OUTER JOIN should be slower, but it is faster...
Chief Technology Officer
Most Valuable Expert 2011
Commented:
Okay, see how these queries perform versus each other:
SELECT BookStem, r_type, GROUP_CONCAT(DISTINCT CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') SEPARATOR '<br />') Awards
FROM Reviews
GROUP BY BookStem, r_type

Open in new window

SELECT BookStem
     , GROUP_CONCAT(DISTINCT CASE r_type WHEN 'Audio' THEN CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') END SEPARATOR '<br />') AudioReviews
     , GROUP_CONCAT(DISTINCT CASE r_type WHEN 'Print' THEN CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') END SEPARATOR '<br />') PrintReviews
FROM Reviews
WHERE r_type IN ('Print', 'Audio')
GROUP BY BookStem

Open in new window


Note you should remove DISTINCT from both if not truly necessary.  If you are using ID in the concatenation, consider if the ID value is unique for each Award even if it is a duplicative entry.  If so, then it may be a waste of resources and unnecessary blow to performance to use DISTINCT which will have to sort the results to eliminate duplicates most likely.

At any rate, if faste you can plug in the second query to help speed things up as it should remove the need to aggregate in the outer query.  If you still need to GROUP BY BookStem and r_type in outer query anyway, you may want to consider just a straight JOIN of the data in the two tables involved and then do aggregates all at once taking advantage of conditional aggregates as I have shown in my example for things like separating Print and Audio reviews.
11/26 Forrester Webinar: Savings for Enterprise

How can your organization benefit from savings just by replacing your legacy backup solutions with Acronis' #CyberProtection? Join Forrester's Joe Branca and Ryan Davis from Acronis live as they explain how you can too.

Author

Commented:
The first query ran in 0.0242, the second in 0.0021. But they are doing different things. The first query returns 2,117 rows, the second returns 1,483. Still, I tried to change my query using the second subquery, but the result was about the same as before.
Kevin CrossChief Technology Officer
Most Valuable Expert 2011

Commented:
Yes, because you were filtering out the non Audio and Print reviews in the other query before by using the conditional MAX(...) statements.  If you truly only need those two r_type values, then the second query limits the resultset to that and thus takes advantage of any indexing on that column versus doing a full table scan on Reviews.  Even if it does, your later query has 600+ less rows it has to scan/deal with.

If the outer query retains its speed, then the issue is something on the outside.  Can you please share the new full query with the WHERE clause especially.

Author

Commented:
OK, here is the new query:
SELECT b.BookStem, au.Submitted, bc.Subject AS BookGenre, au.AudiesGenre, DATE_FORMAT(au.AudiesDate, '%m/%d/%y') AS AudiesDate, b.Title,
      GROUP_CONCAT(DISTINCT CONCAT(a.LastName, ', ', a.FirstName) SEPARATOR '<br />') as Author,
      GROUP_CONCAT(DISTINCT CONCAT(n.LastName, ', ', n.FirstName) SEPARATOR '<br />') as Narrator,
      DATE_FORMAT(b.ShipDate, '%m/%d/%y') AS ShipDate, DATE_FORMAT(b.HCDate, '%m/%d/%y') AS HCDate,
      (CASE WHEN b.Released = 'x' THEN 'Released'
      WHEN nl.TextSent IS NULL THEN 'Unassigned'
      WHEN CURDATE() >= nl.`From` AND CURDATE() <= nl.`To` THEN 'Recording'
      WHEN CURDATE() < nl.From THEN 'Not Started'
      WHEN nl.AudioReceive IS NOT NULL THEN '1st Audio'
      WHEN nl.PickupSent IS NOT NULL THEN 'Waiting For Pickups'
      WHEN nl.PickupReceived IS NOT NULL THEN 'Final Audio'
      END) AS `Status`,
      ar.ProoferComments,
      r.AudioReviews,
      r.PrintReviews,
      b.DL_Rank, b.HC_Rank
      FROM BookList b
      LEFT OUTER JOIN Audies au ON b.BookStem = au.BookStem
      LEFT OUTER JOIN (BisacLinks bl JOIN BisacCodes bc ON bl.BisacCode = bc.BisacCode)
            ON b.BookStem = bl.BookStem
      LEFT OUTER JOIN (AuthorLinks al JOIN Authors a ON al.AuthorStem = a.AuthorStem)
            ON b.BookStem = al.BookStem
      LEFT OUTER JOIN (NarratorLinks nl JOIN Narrators n ON nl.NarratorStem = n.NarratorStem)
            ON b.BookStem = nl.BookStem
      LEFT OUTER JOIN
        (SELECT BookStem, CASE WHEN COUNT(ID) > 0 THEN '<a href="/manage/AudioList.asp" target="_blank">Comments</a>' ELSE NULL END AS ProoferComments
        FROM AudioReviews GROUP BY BookStem) ar
      ON b.BookStem = ar.BookStem
      LEFT OUTER JOIN
        (SELECT BookStem
     , GROUP_CONCAT(DISTINCT CASE r_type WHEN 'Audio' THEN CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') END SEPARATOR '<br />') AudioReviews
     , GROUP_CONCAT(DISTINCT CASE r_type WHEN 'Print' THEN CONCAT('<a href="/manage/review.asp?review=', CAST(ID AS CHAR), '" target="_blank">', COALESCE(Award, 'Review'), '</a>') END SEPARATOR '<br />') PrintReviews
FROM Reviews
WHERE r_type IN ('Print', 'Audio')
GROUP BY BookStem) r
      ON b.BookStem = r.BookStem  WHERE b.PublicationDate >= CURDATE() - INTERVAL 12 MONTH GROUP BY b.BookStem  ORDER BY b.BookStem

Open in new window

Kevin CrossChief Technology Officer
Most Valuable Expert 2011

Commented:
Okay, real quick the only columns it appear to be aggregated any more are coming from tables 'a' and 'n'.  Try aggregating those in a derived table or view ahead of time also and see if that helps.

Ensure that BookStem has a good index on it in all tables as it appears to be what is joining everything.

Check on index on PublicationDate.

Author

Commented:
Thanks

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial