Avatar of SybaseUk
SybaseUk

asked on 

Join Algorithms and Subquery Flattening

What is subquery flattening and what is its relevance to Sybase IQ?? Can you give an example of a subquery being flattened?

Also, I understand that Sybase IQ has several join algorithms available to the optimizer. Can you explain the purpose of these and give an example of each one??

Thanks.
Sybase Database

Avatar of undefined
Last Comment
Joe Woodhouse
Avatar of Joe Woodhouse
Joe Woodhouse

Flattening is where a query containing a subquery is internally rewritten so the subquery becomes a join. This doesn't change the result set of the overall query; the motivation is purely performance.

Subqueries don't give the optimiser any choice about what order to process the tables in - the subqueries have to be done first. But rewriting as a join means the optimiser can now consider picking each table as the "outer" table for the join... and "the other way around" might end up being much much faster than doing it as a subquery.

For example:

select A.name
from   A
where A.id in (select B.id from B)

As written, the optimiser must run the subquery on table B first before it can start looking at table A... but there is nothing here preventing this from being rewritten as a join:

select A.name
from   A, B
where A.id = B.id

Both queries return the same results, but the join gives the optimiser more options in terms of how to run it.

Re. join algorithms - they are written up in the IQ manuals, and I don't know of anywhere that describes them better than that. Since under normal circumstances this is not something you have any control over (the optimiser decides for you) I think Sybase haven't bothered trying to explain them in great detail.
Avatar of grant300
grant300

Joe,

Great answer, I have just one nit to pick...

Using the IN clause, you may or may not get the same result as the flattened query depending upon whether or not the ID field is unique in table B.  In the first query, you will get one instance of the row from table A regardless of how many times the corresponding ID appears in table B.  With a join, any duplicates of an ID in table B will cause multiple instances of the corresponding row from table A.

I am not sure if the query optimizer is smart enough to flatten out the IN subquery or not.  Sybase is pretty smart so I am sure that it will either find a way to remove the ambiguity or simply not flatten the query.  I see three ways for it to do so though.
 - look for a unique index on B.id
 - create a work table as select distinct ID from B and then do the join
 - do the join and create a work table with A.name and B.id and then return the results, something like...
   select A.name from (select distinct A.name, B.id from worktable)

An unambiguous example would be...

select A.name
   from A
where A.id = (select B.id from B)

Regards,
Bill
ASKER CERTIFIED SOLUTION
Avatar of Joe Woodhouse
Joe Woodhouse

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of SybaseUk
SybaseUk

ASKER

Hi Guys,

Is it ok if i share out the points between the 2 of you?

Also, can you kindly point me to the IQ manual which explains the join algorithms please? I have read that there are certain options which can influence the optimizer's decision of which algorithm to choose??

What do the 'push down' algorithms do differently and can u give an example please?

Thanks
Avatar of grant300
grant300

Please give the points to Joe; he answered your question and added additional information.

Regards,
Bill
Avatar of Joe Woodhouse
Joe Woodhouse

Looking again at the IQ manuals, I see they don't really define the algorithms, just tell you that they exist and for the main part it's the optimiser's job to pick them.

There is an option you can set to favour or to avoid a particular join type. You should only use this as part of investigating a specific performance problem or if Sybase Tech Support have asked you to. In 10+ years of using IQ I've never yet found it necessary to nudge the IQ optimiser's decisions on join types (or indeed, anything else).

The idea of "push down" in general is to do some things out of order, and where possible to rule out processing sooner rather than later.

A very very basic example, let's suppose my query includes these clauses:

WHERE A.id = B.id
and B.id = C.id
and C.qty > 1000000

If the optimiser knows there are no rows in C with qty > 1000000, it can "push down" this knowledge into the other joins, and say there will be no rows matching in C, and I'm only interested in the rows from B that match C, and I know that's zero, so there are no rows from B either, which means I don't even need to bother joining A to B even if there are plenty of rows that match, since I'll be throwing them all away anyway due to the other clauses.

... I think it's something like that. Basically the push-down versions are the same as the regular version, just with additional opportunities to take short-cuts.
Sybase Database
Sybase Database

Sybase, a subsidiary of SAP, builds a client/server relational database management system. Products include Adaptive Server Enterprise (ASE), Adaptive Server Anywhere (ASA), Sybase Unwired Platform (SUP) for mobile applications, Afaria for enterprise mobile device management and IQ for data warehouse and big data applications.

5K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo