Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
Solved

# SQL join question

Posted on 2009-07-12
Medium Priority
860 Views
Hi experts,
Which one of the following query is more efficient, and why?
Appreciate it if you can provide some analysis too

``````-- Query A
SELECT *
FROM Author
JOIN Book
ON Author.ID = Book.AuthorID
AND LEFT (Author.Name, 1) = 'A'

-- Query B
SELECT *
FROM Author
JOIN Book
ON Author.ID = Book.AuthorID
WHERE
LEFT (Author.Name, 1) = 'A'

-- Query C
SELECT *
FROM Author, Book
WHERE Author.ID = Book.AuthorID
AND LEFT (Author.Name, 1) = 'A'
``````
0
Question by:philipjonathan
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 6
• 6
• 5
• +3

LVL 31

Expert Comment

ID: 24836935
well, there is not performance difference between query B and query C. both are same, just belongs to different SQL syntax. but query A will have faster performance as your condition will executed even before WHERE clause, FROM is a close which used to execute first so ultimately you are passing less records to other processes in query execution flow.
0

LVL 22

Accepted Solution

8080_Diver earned 400 total points
ID: 24836962
Queries BC are NOT the same.  Instead, A and C are equivalents.  Query B will give the worst performance because the LEFT function on the only column in the WHERE clause is going to preclude the use of any indexes.  It is probably that there will be an index on the ID columns, though, so both A and C are going to take advantage of that index and then check the LEFT(AuthorName, 1) constraint..
As to whether the first "half" of Query A will perform before anything else or before the constraint being applied, that all depends upon the query engine and, absent an indication as to which engine is involved (e.g. Oracle, mySQL, Teradata), one cannot make a definitive statement of regarding that aspect.  What one can say, though, is that it is probably that Query A will result in an index seek rather than a table (or, ven, an index) scan, which is what the Query B will cause.
Also, one can say that Query A conforms to the newer "rules" for writing "well formed and well formatted" queries. ;-)
0

LVL 25

Assisted Solution

imitchie earned 400 total points
ID: 24837153
I read differently.  As far as I can tell, all three queries will perform the same.  The usage of SQL-99 compliant syntax (using JOIN syntax) is for readability purposes and to assist the query optimizer.  However, it is obvious that there are only two tables, so SQL Server [2008] will work out an equivalent plan.  The proof is in the pudding, paste the whole lot (all 3) in query analyzer and hit Ctrl-L and check the plans picture for picture.
0

LVL 31

Expert Comment

ID: 24837404
well there are few differences between experts here but I guess nobody will disagree that Query A will best suitable for every aspects, more readable and good performance. right?
0

LVL 18

Author Comment

ID: 24837428
Thanks all

@8080_Diver:
How does the query optimizer behave in MS SQL Server 2008? Will it be the same in MS SQL Server 2005?

@imitchie:
Good call on the query analyzer. I will try and post again
0

LVL 31

Expert Comment

ID: 24837448
>> As far as I can tell, all three queries will perform the same<<

I don't agree with this statement, If we can get all three execution plan, we can discuss more in this.
0

LVL 21

Assisted Solution

pinaldave earned 400 total points
ID: 24837895
I tend to agree with RiteshShah here.

Query A is way better and for the reason, he has explained in his very first comment of ID:24836935.

All the query will give you same result and may look same in execution plan because query analizer is very smart and will convert other two (B and C) to type of A and will display that in execution plan.

Kind Regards,
Pinal Dave
0

LVL 18

Author Comment

ID: 24838361
I ran the execution plan analyzer (screenshot attached). It turns out that all 3 gives the same results. Is it expected? Would it behave differently had the query been more complex?

Join.PNG
0

LVL 31

Expert Comment

ID: 24838380
I am still stick with my very first statement, Query A is a good way to go for, As Pinal Dave  said "query analizer is very smart and will convert other two (B and C) to type of A and will display that in execution plan", QA did exactly like he said but in first query, you are passing less data to other processes in query flow so ideally it should be the right way to go for.
0

LVL 31

Assisted Solution

RiteshShah earned 400 total points
ID: 24838396
for more clarification, have a look at Pinal's blog at:

http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/

which will show you FROM clause will be executed first so if you are filtering data between FROM and WHERE, less data will be passed to WHERE and other clauses which will ultimately resulted in good performance.
0

LVL 22

Expert Comment

ID: 24840324
How does the query optimizer behave in MS SQL Server 2008? Will it be the same in MS SQL Server 2005?
In this particular case, it probably will; however, there is no guarantee regarding all other cases. ;-)
@All y'all
All of you seem to have immediately assumed that this is a SQL Server targeted query.  However, this is a generic SQL forum and, therefore, one cannot immediately assume that the target DBMS is SQL Server.  The performance of other query engines may or may not operate in a manner similar to that of SQL Server. Therefore, the nature of the results may or may not be equivalent.  For some engines I have worked with, the absence of the linkage by Author.ID to Book.AuthorID can have a significant impact.
By the way, from looking at the Query Plans, it is apparent that the AuthorID does not seem to have an index on it in the Author table and is also, apparently, not in the first position of an index with the Name in the Author table.
@philipjonathan,
Try adding a unique index on the AuthorID in the Author table and a non-unique index on AuthorID in the Book table and then rerun the queries and pull the execution plans.
0

LVL 51

Assisted Solution

Mark Wills earned 400 total points
ID: 24840461
Hi Guys,

Mind if I put my 2 cents in ?

Query A is the best on the infromation provided so far...

Query C being an equi-join will most likely translate into query A. Depends a bit on indexes and statistics. Regardless it is "yucky" syntax, and really best to declare your joins more formally to give the query optimizer every opportunity to resolve the data source requirements by more clearly indentifying those data sources in a formalised "FROM" statement.

Query B is applying a filter after the join so has potentially more work to do.

In reality, it is pretty hard to second guess the query optimizer, and it is getting very clever. As pinal Dave said above, they are all likely to use the same query plan in the end...

From Books on line we find the basic steps that SQL Server uses to process a single SELECT statement include the following:

1) The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.

2) A query tree, sometimes referred to as a sequence tree, is built describing the logical steps needed to transform the source data into the format required by the result set.

3) The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.

4) The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.

5) The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.

The process of selecting one execution plan from potentially many possible plans is referred to as optimization. The query optimizer is one of the most important components of a SQL database system. While some overhead is used by the query optimizer to analyze the query and select a plan, this overhead is typically saved several-fold when the query optimizer picks an efficient execution plan.

The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The query optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the query optimizer does not analyze all possible combinations. Instead, it uses complex algorithms to find an execution plan that has a cost reasonably close to the minimum possible cost.

The SQL Server query optimizer does not choose only the execution plan with the lowest resource cost; it chooses the plan that returns results to the user with a reasonable cost in resources and that returns the results the fastest. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.

The query optimizer relies on distribution statistics when it estimates the resource costs of different methods for extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column.

So, you see, it is actually a bit difficult to decide. To the point that SQL may well decide that a table scan is faster because there is relatively little data to select from. Conversely,  if the index statistics and column statistics are not current, the query optimizer may falsely choose the wrong plan.

So, the only way to be sure of the best running queries, is to use very good syntax and structure, aim to minimise row selection, use columns that have indexes as predicates in joins and filters, and keep your statistics up to date...

The, finally, run the execution plan and see what it says - or even analyse the query in the database tuning advisor. But apply what you "actually" know about the data in terms of real business use when looking at the various recommendations.

But then, ritesh and other experts have pretty much all said Query "A" - slight differences of opinion about Query "C" and most agree that Query "B" is undesirable.

Just one suggestion, instead of doing LEFT(Author,1) = 'A' why not do Author like 'A%'

0

LVL 51

Expert Comment

ID: 24840546
@8080_diver, it does say SQL 2008, so I think it is perfectly reasonable to "assume" it is in fact Microsoft SQL Server.

0

LVL 51

Expert Comment

ID: 24840673
Most "normal" queries are more likely to be written in the style of Query B, and think that it might have been a little maligned as to it's relative performance.

In reality the Author.name like 'A%' has nothing to do with the join on Books. That is pretty much all Books for the selected Authors. The fact that the added qualifier for authors has no real impact on the books index per se.

However, query A could have been written :

SELECT *
FROM Book
JOIN Author
ON Author.ID = Book.AuthorID AND Author.Name like 'A%'

And that does have a more profound affect on the joined table without changing the results.
0

LVL 18

Author Comment

ID: 24846044
Dear all, thanks for the lively discussion.

@RiteshShah, pinaldave:
Thanks for the pointer, and pinaldave's blog is helpful to give me general outline of query execution sequence.

@8080_Diver:
I've added non-unique index to Book table, and the resulting execution plan, as attached here, looks the same for all 3. The Author table already has primary key (the ID column), but I just add another unique index on the ID column. I hope this will do? I'm primarily a programmer, so please pardon my ignorance in database area.

Btw, there is already a relationship between Author.ID and Book.AuthorID.

@mark_wills:
Thanks for the detailed analysis, and for pointing out the problem with the Author.Name. I've changed the query to use LIKE 'A%' as you proposed in your comment #24840673, and added index to Author.Name column. It improves a lot on querying the author table. Now remains the question about the performance of the join itself (and how the where clause affect the join)
JoinIndex.PNG
0

LVL 18

Author Comment

ID: 25075263
Hmm, my only conclusion is that the Query Optimiser is smart enough to recognise that all 3 queries are actually the same.
0

LVL 18

Author Closing Comment

ID: 31602689
Thanks all. I'm not sure how to distribute the points, but I understood mark_wills' answer most clearly, so I'll accept him, and just distribute points equally
0

LVL 18

Author Comment

ID: 25075314
Hmm, seems that there is problem with the 'Accept Multiple' function. I've reported a bug with EE here:
http://www.experts-exchange.com/Community_Support/EE_Bugs/Q_24645293.html
0

LVL 51

Expert Comment

ID: 25075391
Well, yes it does, but hope you can also see that sometimes a small restructure / thinking about the structure of your query can have a bigger impact than leaving up to the very excellent Query Optimiser, and also how hard it is to second guess what it will do :)

Cheers, and thanks for the assist - much appreciated.

Mark Wills
0

LVL 31

Expert Comment

ID: 25075971
Hi Mark,

You did really great job by explaining details in the form of essay. you deserve "Accepted"
0

LVL 51

Expert Comment

ID: 25076448
Ritesh, thanks, and you already know I suffer from too much typing :) Not to everyones taste though :) :)

My ZA comment above was really in response to the "Request Attention" question posted in community support. I wasn't fishing to get "Accepted", and quite happy with "Assisted".
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

What if you have to shut down the entire Citrix infrastructure for hardware maintenance, software upgrades or "the unknown"? I developed this plan for "the unknown" and hope that it helps you as well. This article explains how to properly shut down â€¦