Link to home
Start Free TrialLog in
Avatar of JackOfPH
JackOfPHFlag for Philippines

asked on

Which is better query to optimize speed of execution

I have a 30000 row data in the table employee, I want to retrieve the row data by 1000 per page?
Option 1:
 
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
where RowNum <= 4000
) as y where row_num >= 3000
 
Option 2:
 
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
) as y where row_num between 3000 and 4000
 
Option 3:
 
SELECT TOP 1000 * FROM Employees 
WHERE EmployeeID 
	NOT IN (SELECT TOP 3000 EMployeeID FROM Employees)
ORDER BY FirstName, LastName

Open in new window

Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Option 2 is a little bit better than Option 1. Option 3 is not correct (invalid order in subquery).
Avatar of JackOfPH

ASKER

SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
        NOT IN (SELECT TOP 3000 EMployeeID FROM Employees)
ORDER BY FirstName, LastName


What is wrong with this query?

Sorry, Option 1 is also incorrect because the RowNum column is not possible to use in the first where clause.
The NOT IN condition is wrong in Option 3.

SELECT TOP 3000 EMployeeID FROM Employees

Selects 3000 employees without any order specified. You could change it to:

SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
        NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName
Can you explain why it is wrong in layman's term?
Do you have a suggested Query?
Since I change option 3 to this :

SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
        NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName

Which is now better query to optimize execution? Option 2 or option 3?

Option 2 is:

select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
) as y where row_num between 3000 and 4000
 
ASKER CERTIFIED SOLUTION
Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
The best performance will provide new indexed column containing the Row number, of course.

It has many disadvantages. E.g. such column must be recalculated after each employee table update.
>>The best performance will provide new indexed column containing the Row number, of course.

What do you mean? Do you have query in mind?
No, I don't have query in mind.

You just have to imagine what SQL engine is doing when executing your query in Option 2. It must order the whole table each time it is executed. If you add one more column to the Employee table and populate it by Row number then you may write a simple query:

SELECT * FROM Employee WHERE Rownumber BETWEEN 3000 AND 4000

which is fully optimized when this new column is indexed.

Disadvantages of this solution are clear.

Avatar of mazher
mazher

Option 3
use execution plan or this approach 
SET STATISTICS TIME ON
SET STATISTICS TIME OFF

DBCC FREEPROCCACHE

DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY

select top 100 * from ACCOUNTS
PRINT 'MS of CPU fro loop1: ' +
            cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)

set @CPU_START = @@CPU_BUSY

PRINT 'MS of CPU fro loop1: ' +
            cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)

-- misuring IO statistics of query
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON

SET STATISTICS IO OFF


DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON
SET STATISTICS IO ON
 
DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY

Open in new window

sorry use this approach
for all three options you will get which query is fastest and spends less resources 

DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON
SET STATISTICS IO ON
 
DECLARE @CPU_START int;
 
set @CPU_START = @@CPU_BUSY
 
--your query option 1
 
PRINT 'MS of CPU fro loop1: ' +
 cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
 
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

Open in new window

I will check all your suggestions tomorrow when I return from to office...
SELECT * FROM Employee WHERE Rownumber BETWEEN 3000 AND 4000

This query does not work!
Row_Number is not directly accessable out of select clause. Other option is to use identity column with temp table.


USE [AdventureWorks] 
GO 
SELECT  *
from    ( SELECT --TOP 5000
                    ROW_NUMBER( ) OVER ( ORDER BY employeeID ) AS RowNumber
          FROM      [HumanResources].[Employee] AS e
        ) Emp
WHERE   [RowNumber] BETWEEN  3000 AND 4000

Open in new window