rlarian
asked on
sql query to get user timesheet events and calculate time between events
Need assistance speeding up performance. This part of the procedure is taking 25+ seconds to run, returning 25k+ rows.
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
ISNULL((SELECT MIN(c2.dtime)
FROM [clock] c2
WHERE c2.dtime > c1.dtime
AND c2.name=c1.name ),GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
For these specific queries, you'd get the best performance by:
1) clustering the clock table on ( dtime )
2) adding a nonclustered index on ( name, dtime )
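A sketch of what those two indexes might look like (index names are illustrative, and this assumes [clock] has no existing clustered index):

```sql
-- Cluster the table on dtime so the date-range filter (c1.dtime > today)
-- becomes a clustered index range scan. Not declared unique: SQL Server
-- will add a hidden uniquifier to duplicate dtime values.
CREATE CLUSTERED INDEX IX_clock_dtime
    ON [clock] (dtime);

-- Nonclustered index to support the per-name "next event" lookup
-- (c2.name = c1.name AND c2.dtime > c1.dtime).
CREATE NONCLUSTERED INDEX IX_clock_name_dtime
    ON [clock] (name, dtime);
```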
Rebuilding the table is an interesting idea. I'm not sure if a primary key index could be put on a value such as dtime. Time is thought of as nondeterministic. I don't know if it would be unique.
You could look at the actual execution plan. If you are able to post it (however due to company policy you may not be), that may help us. To add to my #1 item, and if you really want to be proactive, you could place the nonclustered index on a separate hard disk with a separate disk controller. This would avoid disk contention.
I wonder if you couldn't rewrite this to use a variable like
DECLARE @rectangle datetime2 = GETUTCDATE()
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime), @rectangle) as endTime
FROM [clock] c1
INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime
Question to Vitor: Is there a reason to use the date data type as opposed to datetime2? I think datetime2 has more precision and uses less storage space in the table than date.
This way GETUTCDATE is only called once. Reading a variable from memory instead of calling the function for every row may speed it up. I like the idea of using an INNER JOIN (which is faster). Hopefully the GROUP BY clause doesn't degrade performance. Ideally the server isn't memory constrained; if it is, TEMPDB should be on its own fast storage device (e.g., 15k RPM disks, or some high-quality SAN), since when RAM is fully utilized TEMPDB can be written to more frequently than it should be.
>> Question to Vitor: Is there a reason to use the date data type as opposed to datetime2? I think datetime2 has more precision and uses less storage space in the table than date. <<
You need to ask that to the author. I just rewrote the query to avoid the subquery. Kept all the rest as it was.
>> I'm not sure if a primary key index could be put on a value such as dtime. ... I don't know if it would be unique. <<
No, not the PK, the clustered index. No, it certainly wouldn't be unique, but it could very well still be the best clustered index, esp. if you (almost) always specify a dtime range when you process the table. A narrow, unique clustering key is a general rule and should not be blindly followed, no matter what some people claim.
Indexes should definitely be added or modified if they are not optimal.
Have you had a chance to try the query I posted? (comment #2)
ASKER
I'm getting much better speeds now. The one part that is not working is the current event: since the WHERE is looking for c2.dtime > c1.dtime and the next event hasn't started yet, there is no endTime.
I am not getting the current event - clocked out for lunch, but have not clocked back in. The current time at lunch will be the time the next event starts, or the current time:
ISNULL(MIN(c2.dtime), GETUTCDATE()) as endTime
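One common way to keep the current (still-open) event is to make the self-join optional, so a row with no later event still comes through and ISNULL supplies the current time. A hedged sketch of that fix, reusing the single-variable idea from the rewritten query above (variable name illustrative):

```sql
DECLARE @utcNow datetime2 = GETUTCDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime,
       ISNULL(MIN(c2.dtime), @utcNow) AS endTime
FROM [clock] c1
LEFT JOIN [clock] c2
       ON c2.name = c1.name
      AND c2.dtime > c1.dtime   -- must stay in the ON clause; putting it in
                                -- WHERE would turn the outer join back into
                                -- an inner join and drop the open event
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime;
```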
clustering the clock table on ( dtime ) (or, if there are dozens+ of dups for one dtime, on ( dtime, $IDENTITY ), if there is an identity column)
In my opinion, this may or may not help. By creating a complex key (the dtime with something else), you may get a deterministic composite key for which you can implement a primary key constraint. This will then be the clustered index (the table itself). However, while the composite key will potentially cover the query, there is a downside: the index may be unnecessarily complex and lead to worse performance. The actual execution plan may show a greater cost when the clustered index is used.
>> I'm getting much better speeds now <<
rlarian, which query are you working with?
ASKER
Thanks for your help. I used a combination of the different ideas to improve the speed.
One thing odd: if I use a variable in the WHERE
SET @timeLocal = GETDATE()
WHERE c1.dtime > @timeLocal
it is 10+ slower than
WHERE c1.dtime > GETDATE()
I wanted to comment on this closed question. In an earlier comment, I said this:
>> By creating a complex key (the dtime with something else), you may get a deterministic composite key for which you can implement a primary key constraint. <<
I'm not sure if a composite key with a nondeterministic column could be deterministic. That may or may not be true. If the combination of columns were nondeterministic, they couldn't form a composite key for the purposes of an index. A table without an index is a heap. Just an FYI.
It doesn't have to be a PK, just the clustering key. You need to stop conflating the two, because they are not necessarily the same thing.
>> By creating a complex key (the dtime with something else) ... The index may be unnecessarily complex and lead to worse performance. <<
I think that shows a fundamental misunderstanding of how SQL uses indexes, particularly clustered indexes. I've got clustering keys that are 5 columns wide that work beautifully.
Clustered indexes do a great job for range searches, which fits dtime perfectly here. If a dtime range is (almost) always specified, that should be the clustering key. If rows insert rapidly enough that there could be a very large number of duplicate dtimes, then you can add an identity to the key so that you can explicitly specify it as "unique", removing SQL from having to generate a "uniquifier" for the dup key values.
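Assuming the table has an identity column (called id here purely for illustration), that would look like:

```sql
-- Appending the identity column lets the index be declared UNIQUE,
-- so SQL Server never has to generate its own hidden uniquifier for
-- rows that share the same dtime.
CREATE UNIQUE CLUSTERED INDEX IX_clock_dtime_id
    ON [clock] (dtime, id);
```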
>> WHERE c1.dtime > @timeLocal ,
it is 10+ slower than
WHERE c1.dtime > GETDATE() <<
That seems much more likely if dtime is only a nonclustered index. In that case, it's probably worth forcing a recompile for every query, as otherwise you could get a bad query plan when the lookup value changes.
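That per-statement recompile can be requested with a query hint; a minimal sketch against the earlier filter:

```sql
DECLARE @timeLocal datetime = GETDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime
FROM [clock] c1
WHERE c1.dtime > @timeLocal
OPTION (RECOMPILE);  -- plan is compiled using @timeLocal's actual value,
                     -- rather than a generic estimate for an unknown variable
```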