Solved

Left Join much slower than Join

Posted on 2008-10-25
13
1,218 Views
Last Modified: 2012-05-05
SELECT * FROM table1
JOIN table2 USING (key2)
LEFT JOIN table3 JOIN table4 USING (key4) USING (key3)
WHERE table1.x=100
AND table2.y=200

All keys are primary keys and so indexed, and table1.x and table2.y are also indexed. VACUUM ANALYZE has been run.

This query takes a very slow 17 seconds. I can see from the query plan that this is due to sequence scans on table3 and table4.

If I change it from a LEFT JOIN to a JOIN then it takes under 0.5 seconds and it correctly uses the indexes on table3.key3 and table4.key4.

It's also fast (<0.5s) if I remove the join to table4, OR the join to table2 OR the last "AND table2.y=2".

So, as its any of a number of things that drastically speed it up, it seems its the complexity of the query rather than any lack of indexing that's causing the slow behaviour.

Any ideas why this would be ? Is there any way I can force it to use the indexes on table3 and table4 ? Am I missing something ?

Regards,
Mark
0
Comment
Question by:sl311
  • 6
  • 4
  • 2
13 Comments
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 250 total points
ID: 22802245
what about thiws:
SELECT * 
FROM table1 t1
JOIN table2 ON t1.key2 = t2.key2 AND t2.y=200
LEFT JOIN table3 t3
  ON t3.key3 = t2.key3
LEFT JOIN table4 t4
  ON t4.key4 = t3.key4
WHERE table1.x=100

Open in new window

0
 

Author Comment

by:sl311
ID: 22802298
Hi,

Well I have to say when I saw your response I was a little skeptical that it would even run because moving the order of the table4 join and changing it to a left join looked to me like it was in the wrong place, but I was wrong, and it returns the right results, but more importantly in <0.5s. The moving of t2.y=200 wasn't the issue because I moved that back to where it was originally and it was still fast.

So

LEFT JOIN table3 USING (key3)
LEFT JOIN table4 ON (table4.key4=table3.key4)

Is significantly (x34) faster than:

LEFT JOIN table3 JOIN table4 USING (key4) USING (key3)

The second one looks more correct to me because it's table3 and table4 that are really being joined and the USING clause is more compact  than ON, but I can't see a problem with the way you suggest.

Any ideas why there is such a leap in performance for what is in effect the same query ? Why would one use the indexes and not the other ?

Thanks,
Mark
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22802317
I have to be honest, I don't know postgre sql at all. I only suggested the "what I know in mssql works fine" :)
still, even there, I don't know why a left join sometimes gives bad results, however the reason in your case is that the database FIRST did the join between table3 and table4, and then the join with the rest.
however, joining first table3 to the results of the other joins, and then table4 is eventually more effective, as less rows will have to be handled most likely
0
 

Author Comment

by:sl311
ID: 22802356
Thanks, well even if it joined table3 to table4 first why wouldn't it use the index on table4.key4 for that join rather than doing a Seq Scan on table4 first which is what the query plan is showing.

I use the format of "LEFT JOIN table JOIN table" a lot in various queries so I'd rather not change them all especially as it's still a little unclear why this speeds things up, so I'll leave this question open for a few days, and hopefully someone can shed some light on why the Postgres query optimiser chooses Seq Scans when indexes are available and evidently much faster.

Thanks for your help,
Mark
0
 
LVL 3

Expert Comment

by:brycen
ID: 22860181
For deeper help, post the results of "explain analyze" with your selects.  Usually analysis of that output makes it clear what the underlying issue is.  In particular this will show the number of rows "examined" but later filtered out (which is no doubt the root of the problem here).
0
 

Author Comment

by:sl311
ID: 22860746
Hi,

We think the underlying issue is that the shared_buffers is not big enough for the query planner to plan using indexing. We have to adjust the shared memory in the Linux kernel, before we can upgrade this in Postgres (otherwise it gives us an error). This is going to take a week or so before we can do it, so once we've done that I'll post back whether that was indeed the problem.

If it isn't then I will post the explain analyze output.

Thanks for your input.
Mark
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Author Comment

by:sl311
ID: 23027366
Hi,

Well we made the change but it wasn't the shared_buffers at all - so back to the drawing board !

I think this article clears up some of the problem:
http://www.postgresql.org/docs/8.2/interactive/explicit-joins.html

Although I'm still confused - under the bit:
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

it says:
"Therefore the planner has no choice of join order here: it must join B to C and then join A to that result."

I'm not sure I agree with that - why couldn't it join b to a to find those records from b that are to be returned, and then join c to the records from b just discovered ? That would mean 2 index scans, rather than an index scan and a sequence scan wouldn't it ? I can't see any ambiguity that would prevent this ?

Thanks,
Mark
0
 
LVL 3

Expert Comment

by:brycen
ID: 23032278
Also give the results of the following.  Too small a stats target can throw off the planner.



# show default_statistics_target;

+---------------------------+

| default_statistics_target |

+---------------------------+

| 50                        |

+---------------------------+

Open in new window

0
 

Author Comment

by:sl311
ID: 23035577
Thanks, it's currently the default of 10 - I'll up it to 50 this evening, do an Analyze and see what happens.
0
 
LVL 3

Expert Comment

by:brycen
ID: 23039323
"explain analyze"
Up it to 100
"explain analyze" again,
and compare the planner results.

Then you can back off from 1000 as you see fit.
0
 
LVL 3

Expert Comment

by:brycen
ID: 23039325
Make that 1,000.  Start at 1000, and see what difference it makes...
0
 

Author Closing Comment

by:sl311
ID: 31509912
Gave me a workaround - didn't tell me why the workaround works though.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Best database to use for Maps is PostgreSQL. This is an open source database. Comes as a package with most Linux OS. For more info visit the following site: http://www.postgresql.org/ (http://www.postgresql.org/) This requires some add-o…
Many developers have database experience, but are new to PostgreSQL. It has some truly inspiring capabilities. I have several years' experience with Microsoft's SQL Server. When I began working with MySQL, I wanted a quick-reference to MySQL (htt…
Steps to create a PostgreSQL RDS instance in the Amazon cloud. We will cover some of the default settings and show how to connect to the instance once it is up and running.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now