Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Left Join much slower than Join

Posted on 2008-10-25
13
Medium Priority
?
1,230 Views
Last Modified: 2012-05-05
SELECT * FROM table1
JOIN table2 USING (key2)
LEFT JOIN table3 JOIN table4 USING (key4) USING (key3)
WHERE table1.x=100
AND table2.y=200

All keys are primary keys and so indexed, and table1.x and table2.y are also indexed. VACUUM ANALYZE has been run.

This query takes a very slow 17 seconds. I can see from the query plan that this is due to sequence scans on table3 and table4.

If I change it from a LEFT JOIN to a JOIN then it takes under 0.5 seconds and it correctly uses the indexes on table3.key3 and table4.key4.

It's also fast (<0.5s) if I remove the join to table4, OR the join to table2 OR the last "AND table2.y=2".

So, as its any of a number of things that drastically speed it up, it seems its the complexity of the query rather than any lack of indexing that's causing the slow behaviour.

Any ideas why this would be ? Is there any way I can force it to use the indexes on table3 and table4 ? Am I missing something ?

Regards,
Mark
0
Comment
Question by:sl311
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 4
  • 2
13 Comments
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 750 total points
ID: 22802245
what about thiws:
SELECT * 
FROM table1 t1
JOIN table2 ON t1.key2 = t2.key2 AND t2.y=200
LEFT JOIN table3 t3
  ON t3.key3 = t2.key3
LEFT JOIN table4 t4
  ON t4.key4 = t3.key4
WHERE table1.x=100

Open in new window

0
 

Author Comment

by:sl311
ID: 22802298
Hi,

Well I have to say when I saw your response I was a little skeptical that it would even run because moving the order of the table4 join and changing it to a left join looked to me like it was in the wrong place, but I was wrong, and it returns the right results, but more importantly in <0.5s. The moving of t2.y=200 wasn't the issue because I moved that back to where it was originally and it was still fast.

So

LEFT JOIN table3 USING (key3)
LEFT JOIN table4 ON (table4.key4=table3.key4)

Is significantly (x34) faster than:

LEFT JOIN table3 JOIN table4 USING (key4) USING (key3)

The second one looks more correct to me because it's table3 and table4 that are really being joined and the USING clause is more compact  than ON, but I can't see a problem with the way you suggest.

Any ideas why there is such a leap in performance for what is in effect the same query ? Why would one use the indexes and not the other ?

Thanks,
Mark
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22802317
I have to be honest, I don't know postgre sql at all. I only suggested the "what I know in mssql works fine" :)
still, even there, I don't know why a left join sometimes gives bad results, however the reason in your case is that the database FIRST did the join between table3 and table4, and then the join with the rest.
however, joining first table3 to the results of the other joins, and then table4 is eventually more effective, as less rows will have to be handled most likely
0
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

 

Author Comment

by:sl311
ID: 22802356
Thanks, well even if it joined table3 to table4 first why wouldn't it use the index on table4.key4 for that join rather than doing a Seq Scan on table4 first which is what the query plan is showing.

I use the format of "LEFT JOIN table JOIN table" a lot in various queries so I'd rather not change them all especially as it's still a little unclear why this speeds things up, so I'll leave this question open for a few days, and hopefully someone can shed some light on why the Postgres query optimiser chooses Seq Scans when indexes are available and evidently much faster.

Thanks for your help,
Mark
0
 
LVL 3

Expert Comment

by:brycen
ID: 22860181
For deeper help, post the results of "explain analyze" with your selects.  Usually analysis of that output makes it clear what the underlying issue is.  In particular this will show the number of rows "examined" but later filtered out (which is no doubt the root of the problem here).
0
 

Author Comment

by:sl311
ID: 22860746
Hi,

We think the underlying issue is that the shared_buffers is not big enough for the query planner to plan using indexing. We have to adjust the shared memory in the Linux kernel, before we can upgrade this in Postgres (otherwise it gives us an error). This is going to take a week or so before we can do it, so once we've done that I'll post back whether that was indeed the problem.

If it isn't then I will post the explain analyze output.

Thanks for your input.
Mark
0
 

Author Comment

by:sl311
ID: 23027366
Hi,

Well we made the change but it wasn't the shared_buffers at all - so back to the drawing board !

I think this article clears up some of the problem:
http://www.postgresql.org/docs/8.2/interactive/explicit-joins.html

Although I'm still confused - under the bit:
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

it says:
"Therefore the planner has no choice of join order here: it must join B to C and then join A to that result."

I'm not sure I agree with that - why couldn't it join b to a to find those records from b that are to be returned, and then join c to the records from b just discovered ? That would mean 2 index scans, rather than an index scan and a sequence scan wouldn't it ? I can't see any ambiguity that would prevent this ?

Thanks,
Mark
0
 
LVL 3

Expert Comment

by:brycen
ID: 23032278
Also give the results of the following.  Too small a stats target can throw off the planner.



# show default_statistics_target;
+---------------------------+
| default_statistics_target |
+---------------------------+
| 50                        |
+---------------------------+

Open in new window

0
 

Author Comment

by:sl311
ID: 23035577
Thanks, it's currently the default of 10 - I'll up it to 50 this evening, do an Analyze and see what happens.
0
 
LVL 3

Expert Comment

by:brycen
ID: 23039323
"explain analyze"
Up it to 100
"explain analyze" again,
and compare the planner results.

Then you can back off from 1000 as you see fit.
0
 
LVL 3

Expert Comment

by:brycen
ID: 23039325
Make that 1,000.  Start at 1000, and see what difference it makes...
0
 

Author Closing Comment

by:sl311
ID: 31509912
Gave me a workaround - didn't tell me why the workaround works though.
0

Featured Post

Learn how to optimize MySQL for your business need

With the increasing importance of apps & networks in both business & personal interconnections, perfor. has become one of the key metrics of successful communication. This ebook is a hands-on business-case-driven guide to understanding MySQL query parameter tuning & database perf

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Best database to use for Maps is PostgreSQL. This is an open source database. Comes as a package with most Linux OS. For more info visit the following site: http://www.postgresql.org/ (http://www.postgresql.org/) This requires some add-o…
Many developers have database experience, but are new to PostgreSQL. It has some truly inspiring capabilities. I have several years' experience with Microsoft's SQL Server. When I began working with MySQL, I wanted a quick-reference to MySQL (htt…
Steps to create a PostgreSQL RDS instance in the Amazon cloud. We will cover some of the default settings and show how to connect to the instance once it is up and running.
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…

704 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question