Avatar of alexk23
alexk23 asked on

select top n runs faster then select * on large table

We have query that joins 4 tables. Each table is a multi million row table.

sample query to give you an idea (real tables names and column were replaced with generics).

SELECT
   *
FROM Table1 t1 with (NOLOCK)
INNER JOIN Table2 t2 with (NOLOCK)
       t1.pk = t2.fk and (t2.col = 'W' or (t2.col3 = 'S' and t2.col4 = 5))
INNER JOIN Table3 t3 with (NOLOCK)
       on t3.fk = t1.pk and t3.col6 = 1
       and (t3.col3 is null or (t1.pk = t3.col5 and t3.col2 = 1))
       and (t3.col7 = 'Y' or (7 > 0 and t3.col9 <= getDate() + 7))
INNER JOIN Table4 t4 with (NOLOCK)
        t1.pk = t4.fk
WHERE t1.col9 = 'E'
  and t1.col10 <> 1
  and t1.col11 IS NOT NULL;

there is an index on every field that is used in every table in this query.
the 3rd join is very complex and creates internally some tables and results in some table scans (probably b/c of the OR conditions).
this query on average returns from 5-100 rows of data.

to optimize the query without changing anything all I did was add Top 1000 after the select keyword. Performance improvement is 500% to the original. Query plan does not show any table scans, only index seeks.

I cannot explain this behavior. Can someone help me understand why there is such a dramatic improvement? Isn't the top clause supposed to operate as the last step of the process, so it wouldn't effect performance at all?
Microsoft SQL Server 2005

Avatar of undefined
Last Comment
alexk23

8/22/2022 - Mon
chapmandew

Are the execution plans the same between the two statements?
ASKER
alexk23

totally different. that's what's baffling.
ASKER
alexk23

im going to post a screen capture of the two plans in a second
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
adathelad

Are you clearing the data cache/execution plan cache between performance tests?

If not, then the apparent improvement may not be as high as it could be using cached data in memory (see: http://beta.experts-exchange.com/articles/Microsoft/Development/MS-SQL-Server/Comparing-SQL-Query-Optimisations-Fair-and-consistent-benchmarking.html)

It would be interesting to see what the performance difference is when you are running the query with a clean cache (only do on a test server, not recommended you clear the cache on production server).



ASKER
alexk23

attached is the query plan. notice the ONLY difference between the queries is TOP 1000 in the select clause
queryplan.gif
ASKER
alexk23

wrt to cache. I tried running the queries with different input parameters and performance gain was the same. not sure if this is the same as clearing the cache. i think the fact that the query plan is different demonstrates that it isn't simply a cache-fetch, but its actually doing something completely different
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
alexk23

my original post has a typo. performance improvement is a whopping 5000% not, 500%. the execution plan shows 98%/2% original vs optimized.
chapmandew

you're 100% right....and its hard to explain why really.  SQL Server will try to determine the best exec plan that it can.  Probably adding the TOP() statement just makes SQL figure out a better way to run the query...this won't be the case for ALL queries, but it seems to be that way for this one.  Also, could be changing the query at all  (recompile) has made sql figure out a different plan.
SOLUTION
adathelad

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
See how we're fighting big data
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Éric Moreau

How many rows are returned when you don't have the TOP clause?

It seems that when you have the TOP clause, the WHERE clause is evaluated first to limit the number of rows and the JOINs are done (on less rows then you get a performance boost).
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
ASKER
alexk23

in both cases same number of rows is returned. on average from 10-100 rows are returned by both queries.
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
adathelad

The best resource I've found for understanding how SQL produces/chooses execution plans in in the Inside SQL Server 2005 books:
http://www.insidesqlserver.com/thebooks.html

In particular, the "Query Tuning and Optimization" and "T-SQL Querying" books.

But overall, my experience with query optimisation in a nutshell is:
1) try different things, you never quite know what will perform best unless you try it for your specific environment!
2) be fair/consistent with performance comparisons

Re: emoreau's comment, the order in which the various elements in a query are evaluated by the optimiser, I believe is done in a defined order (which I don't have to hand, but I know it's in the "T-SQL Querying" book - I'll check later), and I *think* the TOP condition is one of the last parts to be evaluated.

I'd also agree with chapmandew that don't assume adding TOP will always make queries faster!
ASKER CERTIFIED SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.