AHTDEV

asked on

In SQL Server, what would cause a query's duration to increase when I/O and CPU time are cut by 50%?

I have three queries that I've worked to optimize, and my efforts have reduced CPU usage and I/O counts by half. (I verified these values via Profiler and statistics in QA.) When I execute the "optimized" queries after executing "dbcc dropcleanbuffers", the duration is roughly double the original duration. When I execute the queries without dropping the buffers, the duration is roughly half the original value.

Here are some values returned by Profiler after dropping the buffers:
CPU:   6969 (old), 3250 (new)
Reads:  537075 (old), 191632 (new)
Duration: 13235 (old), 36990 (new)

When looking at physical I/O via "set statistics io on", the number of physical I/O operations drops significantly. So how can duration go up when CPU usage goes down, logical I/O goes down, and physical I/O goes down? What can I do to prove/fix the problem?
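For reference, here is roughly the pattern I'm using to take the cold-cache measurements. dbo.MyQuery_Optimized is just a placeholder for the real statements, and the CHECKPOINT is only there to flush dirty pages so dropcleanbuffers actually empties the whole buffer pool:

-- Cold-cache run: flush dirty pages, then discard the clean buffers so the
-- next execution has to read from disk.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO ON;    -- logical/physical read counts per statement
SET STATISTICS TIME ON;  -- CPU time and elapsed time per statement

EXEC dbo.MyQuery_Optimized;   -- placeholder for the real query: cold-cache timing

EXEC dbo.MyQuery_Optimized;   -- run again without dropping buffers: warm-cache timing

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;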
chapmandew

Because when you use dbcc dropcleanbuffers, it has to recompile a new execution plan... that is what is taking your time.
Actually, dropping the buffers means that on the next queries, data HAS to be read from disk again, instead of from the memory buffers.
I don't think the execution plan of the query is recompiled; that is the behaviour of the DBCC FREEPROCCACHE statement.
Whoops... read it wrong. My eyes read dropcleanbuffers, but my mind read freeproccache. :) You're right.
And I think that is the real question. If dropcleanbuffers forces reads from disk again (and I agree that is what should be happening), then why is I/O half of what it was? Bottom line, I think it was trying very hard to re-establish the execution plan, and that is why elapsed time increased even though every other measure dropped.
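For what it's worth, the two DBCC statements do different things; a minimal sketch of each:

-- Throws away cached execution plans: the next execution pays the compile cost again.
DBCC FREEPROCCACHE;

-- Flush dirty pages to disk, then discard the (now all clean) pages from the buffer
-- pool: the next execution pays the physical-read cost again but keeps its cached plan.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;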

To really answer your question, a few more stats are probably required, including what is happening in memory, along the lines of:

Cache Hit Ratio (SQLServer:Plan Cache, SQL Plans instance): < 70% indicates low plan reuse.
Buffer Cache Hit Ratio (SQLServer:Buffer Manager): < 97% indicates potential memory pressure.
SQL Compilations/sec (SQLServer:SQL Statistics): trend; compare to Batch Requests/sec.
SQL Re-Compilations/sec (SQLServer:SQL Statistics): trend; compare to Batch Requests/sec.
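If you would rather pull a couple of these from inside SQL Server than from Perfmon, something along these lines against sys.dm_os_performance_counters should do it (the ratio counters have to be divided by their matching "base" counter):

-- Buffer cache hit ratio, computed from the raw counter and its base counter.
SELECT CAST(r.cntr_value AS float) / NULLIF(b.cntr_value, 0) * 100.0 AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters r
    JOIN sys.dm_os_performance_counters b
        ON r.object_name = b.object_name
WHERE r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'
  AND r.object_name LIKE '%Buffer Manager%'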

You can also use this piece of SQL - not sure where it originally came from, but I have used it a lot...

SELECT
    -- Carve the individual statement out of its batch text (the offsets are byte
    -- offsets into Unicode text, hence the /2 and +1 adjustments).
    substring(st.text, qs.statement_start_offset/2 + 1,
        (CASE
            WHEN qs.statement_end_offset = -1 THEN len(convert(nvarchar(max), st.text)) * 2
            ELSE qs.statement_end_offset
        END - qs.statement_start_offset)/2 + 1) as statement_text
    ,qs.plan_generation_num as recompiles
    ,qs.execution_count as execution_count
    ,qs.total_elapsed_time - qs.total_worker_time as total_wait_time
    ,qs.total_worker_time as cpu_time
    ,qs.total_logical_reads as reads
    ,qs.total_logical_writes as writes
FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    LEFT JOIN sys.dm_exec_requests r        -- matches statements that are currently executing
        ON qs.sql_handle = r.sql_handle
ORDER BY 3 DESC                             -- most-executed statements first
AHTDEV

ASKER

I've done a little more testing to (I think) answer the questions raised above.  I'm executing this on a machine with no other users so I'm not concerned about any kind of contention.  The perfmon numbers appear to be in line with expectations.  (No recompiles; proc cache hit)

I ran perfmon to look at the statistics mentioned above and everything appears to be in line with what I'd expect except for the "avg disk queue length".  It spikes for the "optimized" version and stays at the "peak" for about 3x as long as the "unoptimized" version's peak.

Optimized version:  500 CPU; 59820 reads; 0 writes; 21294 duration
Unoptimized version:  2469 CPU; 114911 reads; 108 writes; 8637 duration
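To back up the "no recompiles; proc cache hit" observation I also checked the cached plans with something like the query below; the LIKE filter is just a placeholder for my statement text. usecounts climbs on every execution, so a rising count against a single cached entry means the plan is being reused rather than recompiled:

SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%MyQuery%'   -- placeholder filter for the statements in question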

You have an I/O bottleneck; now you need to work out why it is so busy waiting.


Are you sure you don't have byte-read and byte-write monitoring enabled in your virus software?
Are the disks local? Is available disk space getting close to the 20% free threshold (15% free at an absolute minimum), regardless of disk size?
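To see where the stall time is actually going, it may also be worth sampling the per-file I/O statistics; roughly something like the query below. The values are cumulative since the instance started, so snapshot them before and after the test run and compare the deltas:

-- Cumulative reads/writes and stall (wait) time per database file.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
    JOIN sys.master_files mf
        ON vfs.database_id = mf.database_id
       AND vfs.file_id = mf.file_id
ORDER BY vfs.io_stall_read_ms DESC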


check the following...

SQL Server performance depends heavily on the I/O subsystem. Unless your database fits into physical memory, SQL Server constantly brings database pages in and out of the buffer pool. This generates substantial I/O traffic. Similarly, the log records need to be flushed to the disk before a transaction can be declared committed. And finally, SQL Server uses tempdb for various purposes such as to store intermediate results, to sort, to keep row versions and so on. So a good I/O subsystem is critical to the performance of SQL Server.
Access to log files is sequential, except when a transaction needs to be rolled back, while data files, including tempdb, are accessed randomly. So as a general rule, you should have log files on a separate physical disk from data files for better performance. The focus of this paper is not how to configure your I/O devices but to describe ways to identify whether you have an I/O bottleneck. Once an I/O bottleneck is identified, you may need to reconfigure your I/O subsystem.
If you have a slow I/O subsystem, your users may experience performance problems such as slow response times, and tasks that abort due to timeouts.
You can use the following performance counters to identify I/O bottlenecks. Note, these AVG values tend to be skewed (to the low side) if you have an infrequent collection interval. For example, it is hard to tell the nature of an I/O spike with 60-second snapshots. Also, you should not rely on one counter to determine a bottleneck; look for multiple counters to cross check the validity of your findings.

"PhysicalDisk Object: Avg. Disk Queue Length represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your I/O system is overloaded, more read/write operations will be waiting. If your disk queue length frequently exceeds a value of 2 during peak usage of SQL Server, then you might have an I/O bottleneck.

"Avg. Disk Sec/Read is the average time, in seconds, of a read of data from the disk. Any number
Less than 10 ms - very good
Between 10 - 20 ms - okay
Between 20 - 50 ms - slow, needs attention
Greater than 50 ms  Serious I/O bottleneck

"Avg. Disk Sec/Write is the average time, in seconds, of a write of data to the disk. Please refer to the guideline in the previous bullet.

"Physical Disk: %Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.

"Avg. Disk Reads/Sec is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.

"Avg. Disk Writes/Sec is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.

When using the above counters, you may need to adjust the values for RAID configurations using the following formulas:
Raid 0 -- I/Os per disk = (reads + writes) / number of disks
Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2
Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks
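For example, a (hypothetical) 4-disk RAID 5 array handling 400 reads/sec and 100 writes/sec works out to (400 + (4 * 100)) / 4 = 200 I/Os per disk, and it is that per-disk figure that should stay under 85 percent of a single drive's rated capacity.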
ASKER CERTIFIED SOLUTION
AHTDEV

SOLUTION