Solved

What are the performance implications of using derived tables?

Posted on 2004-10-09
10
933 Views
Last Modified: 2012-05-05
In the ASP.NET section of experts exchange, I have asked a question about how to get the (for example) 200th through 250th row of a result set in a MSSQL database.  The actual table is very large, and has duplicate key values, so I am struggling with how to page through it.  

I believe that temporary tables (create a table with an identity value) are too high overhead, because the base table has 4 million records.

The following query has been suggested.  I am not familiar with the DERIVED TABLE syntax, and I am not sure what the performance implications would be.  

Can you please comment on how efficient this query would be on a 4M row table, and suggest a better method if you know of one?

Here's the query:

SELECT * FROM(SELECT TOP 50 * FROM (SELECT TOP 250 * FROM (SELECT TOP 250 * FROM tablename ORDER BY field) DERIVEDTBL ORDER BY field DESC) DERIVEDTBL) DERIVEDTBL ORDER BY field

The actual query would have where clauses in it to retreive only rows that match a particular key or key range, but there could be thousands of records returned.  I am trying to avoid a several thousand row result set for performance reasons.

Will the query optimizer figure out what this means and do the right thing, or will it it create two temporary tables, and if so, should I worry about the overhead of doing this?

Thanks,

Kevin
0
Comment
Question by:kevinmackenzie
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 6

Expert Comment

by:robertjbarker
ID: 12267696
DERIVEDTBL is not a key word. It could just as well be "ABC" and the query you have would act just the same.

The first thing that is going to happen is that "(SELECT TOP 250 * FROM tablename ORDER BY field)" will be executed. From then on you are just working with 250 rows, and that is pretty small.

Getting that 250 rows is the thing to optimize, and the first thing to do there is to make sure "field" is indexed.  If it is not, you will do a full table scan to search through every value of that field.
0
 

Author Comment

by:kevinmackenzie
ID: 12268121
Thank you.  Is this syntax (without the DERIVEDTBL) the most efficient way to do this, or would creating a temporary table (more than one SQL statement) be better, or do you have another suggestion?

In other words, is this the best way to do this from a  performance standpoint, in your opinion?

Kevin
0
 
LVL 34

Expert Comment

by:arbert
ID: 12268879
Have you looked at the option of using ADO.NET paging?  Granted it has its own set of overhead and limitations, but it is an option.
0
 

Author Comment

by:kevinmackenzie
ID: 12268891
On ADO.NET paging, yes I have looked at it.  ASP.NET apparently does not completely expose the interface, as the only thing I can find is a Dataset, which reads every row that satisfies the query into memory on the server, which is unacceptable on a large table.  I'm told that ASP offered better services in this regard, perhaps later versions of ASP.NET will catch up.

Thanks for the suggestion.

Kevin
0
 

Author Comment

by:kevinmackenzie
ID: 12268899
Razo,

I also found, when actually trying to use this syntax today, that I couldn't make it work in a complete application without knowing what the total number of records in the table is.  So using it requires two queries (not just two select statements).  The first, a COUNT(*) query, and the second the line shown previously in this question.  Why two?  Because the argument for the TOP clause cannot be a variable. So I have to do the first query, then build the string for the second SQL command.

So, for you database experts out there, how much overhead is there to a COUNT(*) operation?  I suspect it is a complete pass of the database, starting at the first indexed point in the table, and continuing to the upper bound of the WHERE clause.  That starts to make this method unacceptable...

Kevin
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 34

Accepted Solution

by:
arbert earned 500 total points
ID: 12268962
Take a quick look at the MS document on .NET paging (Sorry if you've already looked at it--not trying to waste your time).  Notice, in the document, they have a stored proc that runs database side to get the result set.  The only downside is the dynamic SQL.  However, I don't think that's any worse than running a query twice (once for the count, and again to use the count):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenethowto05.asp

Brett
0
 
LVL 34

Expert Comment

by:arbert
ID: 12268963
(oh ya, you're right, the future is going to be much better with SQL2005)
0
 
LVL 6

Expert Comment

by:robertjbarker
ID: 12269244
What I meant was that the identifier for the intermediate results could be anything, for example:

SELECT * FROM(SELECT TOP 50 * FROM (SELECT TOP 250 * FROM (SELECT TOP 250 * FROM tablename ORDER BY field) ABC ORDER BY field DESC) ABC) ABC ORDER BY field

The identifiers still have to be there, or there is a syntax error.

Anyway, I am not an ASP kind of guy. But from an SQL viewpoint, I can't think of a better way to do what you want, as long as you have an index defined for each "field" you intend to use.
0
 
LVL 8

Expert Comment

by:plq
ID: 12269316
Something I found on another thread which might be useful for this

select top 250 *, identity(int, 1,1) 'fred'
into #temp
from asset

select * from #temp where fred > 200

drop table #temp


this returns rows 201 to 250


Personally I just use 'top' and cut out records 1 to 200 using rs.absoluteposition
0
 

Author Comment

by:kevinmackenzie
ID: 12271082
Arbert,

I had not found that help file, and it is exactly what I need.

For anyone else looking into this problem, the solution I eventually decided to use is located here:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=27297

and is referred to as "Jeff's solution".

Here is his pseudo code:

-- for each column in your sort, you need a variable to hold
-- the "starting values". In our case, we need two:
declare @startingDate datetime;
declare @startingID int;

-- again, we want to returns resutls from row @a to row @b:
declare @a int;
declare @b int;

set @a = 200 -- start at row 200
set @b = 250 -- end at row 250

-- get the starting date and starting ID to return results:
set rowcount @a
select @StartingDate = DateCol, @startingID = ID
from yourtable
order by DateCol ASC,ID ASC

-- find out how many rows to return, and set the rowcount:
set @b = @b - @a
set rowcount @b

-- now return the results:
select * from yourtable
where
DateCol > @StartingDate OR
(DateCol = @StartingDate AND ID > @StartingID)
order by DateCol ASC, ID ASC

-- clean up:
set rowcount 0
 
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now