JackOfPH
asked on
Which is better query to optimize speed of execution
I have a 30000 row data in the table employee, I want to retrieve the row data by 1000 per page?
Option 1:
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
where RowNum <= 4000
) as y where row_num >= 3000
Option 2:
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
) as y where row_num between 3000 and 4000
Option 3:
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees)
ORDER BY FirstName, LastName
Option 2 is a little bit better than Option 1. Option 3 is not correct (invalid order in subquery).
ASKER
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees)
ORDER BY FirstName, LastName
What is wrong with this query?
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees)
ORDER BY FirstName, LastName
What is wrong with this query?
Sorry, Option 1 is also incorrect because the RowNum column is not possible to use in the first where clause.
The NOT IN condition is wrong in Option 3.
SELECT TOP 3000 EMployeeID FROM Employees
Selects 3000 employees without any order specified. You could change it to:
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName
SELECT TOP 3000 EMployeeID FROM Employees
Selects 3000 employees without any order specified. You could change it to:
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName
ASKER
Can you explain why it is wrong in layman's term?
Do you have a suggested Query?
Do you have a suggested Query?
ASKER
Since I change option 3 to this :
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName
Which is now better query to optimize execution? Option 2 or option 3?
Option 2 is:
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
) as y where row_num between 3000 and 4000
SELECT TOP 1000 * FROM Employees
WHERE EmployeeID
NOT IN (SELECT TOP 3000 EMployeeID FROM Employees ORDER BY FirstName, LastName)
ORDER BY FirstName, LastName
Which is now better query to optimize execution? Option 2 or option 3?
Option 2 is:
select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from employees) x
) as y where row_num between 3000 and 4000
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
The best performance will provide new indexed column containing the Row number, of course.
It has many disadvantages. E.g. such column must be recalculated after each employee table update.
It has many disadvantages. E.g. such column must be recalculated after each employee table update.
ASKER
>>The best performance will provide new indexed column containing the Row number, of course.
What do you mean? Do you have query in mind?
What do you mean? Do you have query in mind?
No, I don't have query in mind.
You just have to imagine what SQL engine is doing when executing your query in Option 2. It must order the whole table each time it is executed. If you add one more column to the Employee table and populate it by Row number then you may write a simple query:
SELECT * FROM Employee WHERE Rownumber BETWEEN 3000 AND 4000
which is fully optimized when this new column is indexed.
Disadvantages of this solution are clear.
You just have to imagine what SQL engine is doing when executing your query in Option 2. It must order the whole table each time it is executed. If you add one more column to the Employee table and populate it by Row number then you may write a simple query:
SELECT * FROM Employee WHERE Rownumber BETWEEN 3000 AND 4000
which is fully optimized when this new column is indexed.
Disadvantages of this solution are clear.
Option 3
use execution plan or this approach
SET STATISTICS TIME ON
SET STATISTICS TIME OFF
DBCC FREEPROCCACHE
DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY
select top 100 * from ACCOUNTS
PRINT 'MS of CPU fro loop1: ' +
cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
set @CPU_START = @@CPU_BUSY
PRINT 'MS of CPU fro loop1: ' +
cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
-- misuring IO statistics of query
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SET STATISTICS IO OFF
SET STATISTICS TIME ON
SET STATISTICS TIME OFF
DBCC FREEPROCCACHE
DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY
select top 100 * from ACCOUNTS
PRINT 'MS of CPU fro loop1: ' +
cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
set @CPU_START = @@CPU_BUSY
PRINT 'MS of CPU fro loop1: ' +
cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
-- misuring IO statistics of query
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SET STATISTICS IO OFF
DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON
SET STATISTICS IO ON
DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY
sorry use this approach
for all three options you will get which query is fastest and spends less resources
for all three options you will get which query is fastest and spends less resources
DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON
SET STATISTICS IO ON
DECLARE @CPU_START int;
set @CPU_START = @@CPU_BUSY
--your query option 1
PRINT 'MS of CPU fro loop1: ' +
cast ((@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000 AS CHAR)
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
ASKER
I will check all your suggestions tomorrow when I return from to office...
ASKER
SELECT * FROM Employee WHERE Rownumber BETWEEN 3000 AND 4000
This query does not work!
This query does not work!
Row_Number is not directly accessable out of select clause. Other option is to use identity column with temp table.
USE [AdventureWorks]
GO
SELECT *
from ( SELECT --TOP 5000
ROW_NUMBER( ) OVER ( ORDER BY employeeID ) AS RowNumber
FROM [HumanResources].[Employee] AS e
) Emp
WHERE [RowNumber] BETWEEN 3000 AND 4000