Question

SQL join question

Asked by: philipjonathan

Hi experts,
Which one of the following query is more efficient, and why?
Appreciate it if you can provide some analysis too

-- Query A
SELECT *
FROM Author
JOIN Book
  ON Author.ID = Book.AuthorID
    AND LEFT (Author.Name, 1) = 'A'
 
 
-- Query B
SELECT *
FROM Author
JOIN Book
  ON Author.ID = Book.AuthorID
WHERE
  LEFT (Author.Name, 1) = 'A'
 
 
-- Query C
SELECT *
FROM Author, Book
WHERE Author.ID = Book.AuthorID
  AND LEFT (Author.Name, 1) = 'A'

                                  
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:

Select allOpen in new window

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2009-07-12 at 19:47:42ID24564228
Tags

sql

,

join

,

performance

Topics

SQL Query Syntax

,

MS SQL Server

,

SQL Server 2008

Participating Experts
5
Points
500
Comments
22

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. Lag analysis
    I need to create a lag analysis for 3, 6, and 12 months. For example, I need to take the CurrentDolrs/EndingGrossAR from 12 months ago What would be the best way to do this? Query and/or Function?

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: RiteshShahPosted on 2009-07-12 at 20:10:08ID: 24836935

well, there is not performance difference between query B and query C. both are same, just belongs to different SQL syntax. but query A will have faster performance as your condition will executed even before WHERE clause, FROM is a close which used to execute first so ultimately you are passing less records to other processes in query execution flow.

 

by: 8080_DiverPosted on 2009-07-12 at 20:21:54ID: 24836962

Queries B & C are NOT the same.  Instead, A and C are equivalents.  Query B will give the worst performance because the LEFT function on the only column in the WHERE clause is going to preclude the use of any indexes.  It is probably that there will be an index on the ID columns, though, so both A and C are going to take advantage of that index and then check the LEFT(AuthorName, 1) constraint..

As to whether the first "half" of Query A will perform before anything else or before the constraint being applied, that all depends upon the query engine and, absent an indication as to which engine is involved (e.g. Oracle, mySQL, Teradata), one cannot make a definitive statement of regarding that aspect.  What one can say, though, is that it is probably that Query A will result in an index seek rather than a table (or, ven, an index) scan, which is what the Query B will cause.  

Also, one can say that Query A conforms to the newer "rules" for writing "well formed and well formatted" queries. ;-)

 

by: imitchiePosted on 2009-07-12 at 21:37:34ID: 24837153

I read differently.  As far as I can tell, all three queries will perform the same.  The usage of SQL-99 compliant syntax (using JOIN syntax) is for readability purposes and to assist the query optimizer.  However, it is obvious that there are only two tables, so SQL Server [2008] will work out an equivalent plan.  The proof is in the pudding, paste the whole lot (all 3) in query analyzer and hit Ctrl-L and check the plans picture for picture.

 

by: RiteshShahPosted on 2009-07-12 at 22:44:24ID: 24837404

well there are few differences between experts here but I guess nobody will disagree that Query A will best suitable for every aspects, more readable and good performance. right?

 

by: philipjonathanPosted on 2009-07-12 at 22:57:31ID: 24837428

Thanks all

@8080_Diver:
How does the query optimizer behave in MS SQL Server 2008? Will it be the same in MS SQL Server 2005?

@imitchie:
Good call on the query analyzer. I will try and post again

 

by: RiteshShahPosted on 2009-07-12 at 23:04:20ID: 24837448

>> As far as I can tell, all three queries will perform the same<<

I don't agree with this statement, If we can get all three execution plan, we can discuss more in this.

 

by: pinaldavePosted on 2009-07-13 at 01:30:57ID: 24837895

I tend to agree with RiteshShah here.

Query A is way better and for the reason, he has explained in his very first comment of ID:24836935.

All the query will give you same result and may look same in execution plan because query analizer is very smart and will convert other two (B and C) to type of A and will display that in execution plan.

Kind Regards,
Pinal Dave

 

by: philipjonathanPosted on 2009-07-13 at 03:11:39ID: 24838361

I ran the execution plan analyzer (screenshot attached). It turns out that all 3 gives the same results. Is it expected? Would it behave differently had the query been more complex?

 

by: RiteshShahPosted on 2009-07-13 at 03:15:57ID: 24838380

I am still stick with my very first statement, Query A is a good way to go for, As Pinal Dave  said "query analizer is very smart and will convert other two (B and C) to type of A and will display that in execution plan", QA did exactly like he said but in first query, you are passing less data to other processes in query flow so ideally it should be the right way to go for.

 

by: RiteshShahPosted on 2009-07-13 at 03:18:59ID: 24838396

for more clarification, have a look at Pinal's blog at:

http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/

which will show you FROM clause will be executed first so if you are filtering data between FROM and WHERE, less data will be passed to WHERE and other clauses which will ultimately resulted in good performance.

 

by: 8080_DiverPosted on 2009-07-13 at 07:58:58ID: 24840324

How does the query optimizer behave in MS SQL Server 2008? Will it be the same in MS SQL Server 2005?

In this particular case, it probably will; however, there is no guarantee regarding all other cases. ;-)

@All y'all

All of you seem to have immediately assumed that this is a SQL Server targeted query.  However, this is a generic SQL forum and, therefore, one cannot immediately assume that the target DBMS is SQL Server.  The performance of other query engines may or may not operate in a manner similar to that of SQL Server. Therefore, the nature of the results may or may not be equivalent.  For some engines I have worked with, the absence of the linkage by Author.ID to Book.AuthorID can have a significant impact.  

By the way, from looking at the Query Plans, it is apparent that the AuthorID does not seem to have an index on it in the Author table and is also, apparently, not in the first position of an index with the Name in the Author table.

@philipjonathan,

Try adding a unique index on the AuthorID in the Author table and a non-unique index on AuthorID in the Book table and then rerun the queries and pull the execution plans.

 

by: mark_willsPosted on 2009-07-13 at 08:11:26ID: 24840461

Hi Guys,

Mind if I put my 2 cents in ?

Query A is the best on the infromation provided so far...

Query C being an equi-join will most likely translate into query A. Depends a bit on indexes and statistics. Regardless it is "yucky" syntax, and really best to declare your joins more formally to give the query optimizer every opportunity to resolve the data source requirements by more clearly indentifying those data sources in a formalised "FROM" statement.

Query B is applying a filter after the join so has potentially more work to do.

In reality, it is pretty hard to second guess the query optimizer, and it is getting very clever. As pinal Dave said above, they are all likely to use the same query plan in the end...

From Books on line we find the basic steps that SQL Server uses to process a single SELECT statement include the following:

1) The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.

2) A query tree, sometimes referred to as a sequence tree, is built describing the logical steps needed to transform the source data into the format required by the result set.

3) The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.

4) The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.

5) The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.

And reading further :

The process of selecting one execution plan from potentially many possible plans is referred to as optimization. The query optimizer is one of the most important components of a SQL database system. While some overhead is used by the query optimizer to analyze the query and select a plan, this overhead is typically saved several-fold when the query optimizer picks an efficient execution plan.

The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The query optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the query optimizer does not analyze all possible combinations. Instead, it uses complex algorithms to find an execution plan that has a cost reasonably close to the minimum possible cost.

The SQL Server query optimizer does not choose only the execution plan with the lowest resource cost; it chooses the plan that returns results to the user with a reasonable cost in resources and that returns the results the fastest. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.

The query optimizer relies on distribution statistics when it estimates the resource costs of different methods for extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column.


So, you see, it is actually a bit difficult to decide. To the point that SQL may well decide that a table scan is faster because there is relatively little data to select from. Conversely,  if the index statistics and column statistics are not current, the query optimizer may falsely choose the wrong plan.

So, the only way to be sure of the best running queries, is to use very good syntax and structure, aim to minimise row selection, use columns that have indexes as predicates in joins and filters, and keep your statistics up to date...

The, finally, run the execution plan and see what it says - or even analyse the query in the database tuning advisor. But apply what you "actually" know about the data in terms of real business use when looking at the various recommendations.

But then, ritesh and other experts have pretty much all said Query "A" - slight differences of opinion about Query "C" and most agree that Query "B" is undesirable.

Just one suggestion, instead of doing LEFT(Author,1) = 'A' why not do Author like 'A%'

 

by: mark_willsPosted on 2009-07-13 at 08:19:16ID: 24840546

@8080_diver, it does say SQL 2008, so I think it is perfectly reasonable to "assume" it is in fact Microsoft SQL Server.

 

by: mark_willsPosted on 2009-07-13 at 08:30:03ID: 24840673

Most "normal" queries are more likely to be written in the style of Query B, and think that it might have been a little maligned as to it's relative performance.

In reality the Author.name like 'A%' has nothing to do with the join on Books. That is pretty much all Books for the selected Authors. The fact that the added qualifier for authors has no real impact on the books index per se.

However, query A could have been written :

SELECT *
FROM Book
JOIN Author
ON Author.ID = Book.AuthorID AND Author.Name like 'A%'

And that does have a more profound affect on the joined table without changing the results.

 

by: philipjonathanPosted on 2009-07-13 at 20:06:59ID: 24846044

Dear all, thanks for the lively discussion.

@RiteshShah, pinaldave:
Thanks for the pointer, and pinaldave's blog is helpful to give me general outline of query execution sequence.


@8080_Diver:
I've added non-unique index to Book table, and the resulting execution plan, as attached here, looks the same for all 3. The Author table already has primary key (the ID column), but I just add another unique index on the ID column. I hope this will do? I'm primarily a programmer, so please pardon my ignorance in database area.

Btw, there is already a relationship between Author.ID and Book.AuthorID.


@mark_wills:
Thanks for the detailed analysis, and for pointing out the problem with the Author.Name. I've changed the query to use LIKE 'A%' as you proposed in your comment #24840673, and added index to Author.Name column. It improves a lot on querying the author table. Now remains the question about the performance of the join itself (and how the where clause affect the join)

 

by: philipjonathanPosted on 2009-08-11 at 19:25:28ID: 25075263

Hmm, my only conclusion is that the Query Optimiser is smart enough to recognise that all 3 queries are actually the same.

 

by: philipjonathanPosted on 2009-08-11 at 19:29:21ID: 31602689

Thanks all. I'm not sure how to distribute the points, but I understood mark_wills' answer most clearly, so I'll accept him, and just distribute points equally

 

by: philipjonathanPosted on 2009-08-11 at 19:37:04ID: 25075314

Hmm, seems that there is problem with the 'Accept Multiple' function. I've reported a bug with EE here:
http://www.experts-exchange.com/Community_Support/EE_Bugs/Q_24645293.html

 

by: mark_willsPosted on 2009-08-11 at 19:55:23ID: 25075391

Well, yes it does, but hope you can also see that sometimes a small restructure / thinking about the structure of your query can have a bigger impact than leaving up to the very excellent Query Optimiser, and also how hard it is to second guess what it will do :)

Cheers, and thanks for the assist - much appreciated.

Mark Wills

 

by: RiteshShahPosted on 2009-08-11 at 22:43:13ID: 25075971

Hi Mark,

You did really great job by explaining details in the form of essay. you deserve "Accepted"

 

by: mark_willsPosted on 2009-08-12 at 00:46:16ID: 25076448

Ritesh, thanks, and you already know I suffer from too much typing :) Not to everyones taste though :) :)

My ZA comment above was really in response to the "Request Attention" question posted in community support. I wasn't fishing to get "Accepted", and quite happy with "Assisted".

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...