Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 951
  • Last Modified:

What are the performance implications of using derived tables?

In the ASP.NET section of experts exchange, I have asked a question about how to get the (for example) 200th through 250th row of a result set in a MSSQL database.  The actual table is very large, and has duplicate key values, so I am struggling with how to page through it.  

I believe that temporary tables (create a table with an identity value) are too high overhead, because the base table has 4 million records.

The following query has been suggested.  I am not familiar with the DERIVED TABLE syntax, and I am not sure what the performance implications would be.  

Can you please comment on how efficient this query would be on a 4M row table, and suggest a better method if you know of one?

Here's the query:

SELECT * FROM(SELECT TOP 50 * FROM (SELECT TOP 250 * FROM (SELECT TOP 250 * FROM tablename ORDER BY field) DERIVEDTBL ORDER BY field DESC) DERIVEDTBL) DERIVEDTBL ORDER BY field

The actual query would have where clauses in it to retreive only rows that match a particular key or key range, but there could be thousands of records returned.  I am trying to avoid a several thousand row result set for performance reasons.

Will the query optimizer figure out what this means and do the right thing, or will it it create two temporary tables, and if so, should I worry about the overhead of doing this?

Thanks,

Kevin
0
kevinmackenzie
Asked:
kevinmackenzie
  • 4
  • 3
  • 2
  • +1
1 Solution
 
robertjbarkerCommented:
DERIVEDTBL is not a key word. It could just as well be "ABC" and the query you have would act just the same.

The first thing that is going to happen is that "(SELECT TOP 250 * FROM tablename ORDER BY field)" will be executed. From then on you are just working with 250 rows, and that is pretty small.

Getting that 250 rows is the thing to optimize, and the first thing to do there is to make sure "field" is indexed.  If it is not, you will do a full table scan to search through every value of that field.
0
 
kevinmackenzieAuthor Commented:
Thank you.  Is this syntax (without the DERIVEDTBL) the most efficient way to do this, or would creating a temporary table (more than one SQL statement) be better, or do you have another suggestion?

In other words, is this the best way to do this from a  performance standpoint, in your opinion?

Kevin
0
 
arbertCommented:
Have you looked at the option of using ADO.NET paging?  Granted it has its own set of overhead and limitations, but it is an option.
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
kevinmackenzieAuthor Commented:
On ADO.NET paging, yes I have looked at it.  ASP.NET apparently does not completely expose the interface, as the only thing I can find is a Dataset, which reads every row that satisfies the query into memory on the server, which is unacceptable on a large table.  I'm told that ASP offered better services in this regard, perhaps later versions of ASP.NET will catch up.

Thanks for the suggestion.

Kevin
0
 
kevinmackenzieAuthor Commented:
Razo,

I also found, when actually trying to use this syntax today, that I couldn't make it work in a complete application without knowing what the total number of records in the table is.  So using it requires two queries (not just two select statements).  The first, a COUNT(*) query, and the second the line shown previously in this question.  Why two?  Because the argument for the TOP clause cannot be a variable. So I have to do the first query, then build the string for the second SQL command.

So, for you database experts out there, how much overhead is there to a COUNT(*) operation?  I suspect it is a complete pass of the database, starting at the first indexed point in the table, and continuing to the upper bound of the WHERE clause.  That starts to make this method unacceptable...

Kevin
0
 
arbertCommented:
Take a quick look at the MS document on .NET paging (Sorry if you've already looked at it--not trying to waste your time).  Notice, in the document, they have a stored proc that runs database side to get the result set.  The only downside is the dynamic SQL.  However, I don't think that's any worse than running a query twice (once for the count, and again to use the count):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenethowto05.asp

Brett
0
 
arbertCommented:
(oh ya, you're right, the future is going to be much better with SQL2005)
0
 
robertjbarkerCommented:
What I meant was that the identifier for the intermediate results could be anything, for example:

SELECT * FROM(SELECT TOP 50 * FROM (SELECT TOP 250 * FROM (SELECT TOP 250 * FROM tablename ORDER BY field) ABC ORDER BY field DESC) ABC) ABC ORDER BY field

The identifiers still have to be there, or there is a syntax error.

Anyway, I am not an ASP kind of guy. But from an SQL viewpoint, I can't think of a better way to do what you want, as long as you have an index defined for each "field" you intend to use.
0
 
plqCommented:
Something I found on another thread which might be useful for this

select top 250 *, identity(int, 1,1) 'fred'
into #temp
from asset

select * from #temp where fred > 200

drop table #temp


this returns rows 201 to 250


Personally I just use 'top' and cut out records 1 to 200 using rs.absoluteposition
0
 
kevinmackenzieAuthor Commented:
Arbert,

I had not found that help file, and it is exactly what I need.

For anyone else looking into this problem, the solution I eventually decided to use is located here:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=27297

and is referred to as "Jeff's solution".

Here is his pseudo code:

-- for each column in your sort, you need a variable to hold
-- the "starting values". In our case, we need two:
declare @startingDate datetime;
declare @startingID int;

-- again, we want to returns resutls from row @a to row @b:
declare @a int;
declare @b int;

set @a = 200 -- start at row 200
set @b = 250 -- end at row 250

-- get the starting date and starting ID to return results:
set rowcount @a
select @StartingDate = DateCol, @startingID = ID
from yourtable
order by DateCol ASC,ID ASC

-- find out how many rows to return, and set the rowcount:
set @b = @b - @a
set rowcount @b

-- now return the results:
select * from yourtable
where
DateCol > @StartingDate OR
(DateCol = @StartingDate AND ID > @StartingID)
order by DateCol ASC, ID ASC

-- clean up:
set rowcount 0
 
0

Featured Post

Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

  • 4
  • 3
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now