rlarian
asked on
sql query to get user timesheet events and calculate time between events
Need assistance speeding up performance. This part of the procedure is taking 25+ seconds to run, returning 25k+ rows.
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
ISNULL((SELECT MIN(c2.dtime)
FROM [clock] c2
WHERE c2.dtime > c1.dtime
AND c2.name=c1.name ),GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
For these specific queries, you'd get the best performance by:
1) clustering the clock table on ( dtime )
2) adding a nonclustered index on ( name, dtime )
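A sketch of what those two indexes might look like (index names are illustrative, and this assumes [clock] has no existing clustered index):

```sql
-- Cluster the table on dtime so the date-range filter (c1.dtime > today)
-- becomes a clustered index range scan. Not declared unique: SQL Server
-- will add a hidden uniquifier to duplicate dtime values.
CREATE CLUSTERED INDEX IX_clock_dtime
    ON [clock] (dtime);

-- Nonclustered index to support the per-name "next event" lookup
-- (c2.name = c1.name AND c2.dtime > c1.dtime).
CREATE NONCLUSTERED INDEX IX_clock_name_dtime
    ON [clock] (name, dtime);
```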
Rebuilding the table is an interesting idea. I'm not sure if a primary key index could be put on a value such as dtime. Time is thought of as nondeterministic. I don't know if it would be unique.
You could look at the actual execution plan. If you are able to post it (however due to company policy you may not be), that may help us. To add to my #1 item, and if you really want to be proactive, you could place the nonclustered index on a separate hard disk with a separate disk controller. This would avoid disk contention.
I wonder if you couldn't rewrite this to use a variable like
DECLARE @rectangle datetime2 = GETUTCDATE()
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime), @rectangle) as endTime
FROM [clock] c1
INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime
Question to Vitor: Is there a reason to use the date data type as opposed to datetime2? I think datetime2 has more precision and uses less storage space in the table than date.
This way GETUTCDATE is only called once. Reading a variable from memory instead of calling the function for every row may speed it up. I like the idea of using an INNER JOIN (which is faster). Hopefully the GROUP BY clause doesn't degrade performance. Ideally the server isn't memory constrained; if it is, TEMPDB should be on its own fast storage device (e.g., 15k RPM disks, or some high-quality SAN), since when RAM is fully utilized TEMPDB can be written to more frequently than it should be.
>> Question to Vitor: Is there a reason to use the date data type as opposed to datetime2? I think datetime2 has more precision and uses less storage space in the table than date. <<
You need to ask that to the author. I just rewrote the query to avoid the subquery. Kept all the rest as it was.
>> I'm not sure if a primary key index could be put on a value such as dtime. ... I don't know if it would be unique. <<
No, not the PK, the clustered index. No, it certainly wouldn't be unique, but it could very well still be the best clustered index, esp. if you (almost) always specify a dtime range when you process the table. A narrow, unique clustering key is a general rule and should not be blindly followed, no matter what some people claim.
Indexes should definitely be added or modified if they are not optimal.
Have you had a chance to try the query I posted? (comment #2)
ASKER
I'm getting much better speeds now. The one part that is not working is the current event: since the WHERE is looking for c2.dtime > c1.dtime and the next event hasn't started yet, there is no endTime.
I am not getting the current event - clocked out for lunch, but have not clocked back in. The current time at lunch will be the time the next event starts, or the current time:
ISNULL(MIN(c2.dtime), GETUTCDATE()) as endTime
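One common way to keep the current (still-open) event is to make the self-join optional, so a row with no later event still comes through and ISNULL supplies the current time. A hedged sketch of that fix, reusing the single-variable idea from the rewritten query above (variable name illustrative):

```sql
DECLARE @utcNow datetime2 = GETUTCDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime,
       ISNULL(MIN(c2.dtime), @utcNow) AS endTime
FROM [clock] c1
LEFT JOIN [clock] c2
       ON c2.name = c1.name
      AND c2.dtime > c1.dtime   -- must stay in the ON clause; putting it in
                                -- WHERE would turn the outer join back into
                                -- an inner join and drop the open event
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime;
```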
clustering the clock table on ( dtime ) (or, if there are dozens+ of dups for one dtime, on ( dtime, $IDENTITY ), if there is an identity column)
In my opinion, this may or may not help. By creating a complex key (the dtime with something else), you may get a deterministic composite key for which you can implement a primary key constraint. This will then be the clustered index (the table itself). However, while the composite key will potentially cover the query, there is a downside: the index may be unnecessarily complex and lead to worse performance. The actual execution plan may show a greater cost when the clustered index is used.
>> I'm getting much better speeds now <<
rlarian, which query are you working with?
ASKER
Thanks for your help. I used a combination of the different ideas to improve the speed.
One thing odd: if I use a variable in the WHERE
SET @timeLocal = GETDATE()
WHERE c1.dtime > @timeLocal
it is 10+ slower than
WHERE c1.dtime > GETDATE()
I wanted to comment on this closed question. In an earlier comment, I said this:
>> By creating a complex key (the dtime with something else), you may get a deterministic composite key for which you can implement a primary key constraint. <<
I'm not sure if a composite key with a nondeterministic column could be deterministic. That may or may not be true. If the combination of columns were nondeterministic, they couldn't form a composite key for the purposes of an index. A table without an index is a heap. Just an FYI.
It doesn't have to be a PK, just the clustering key. You need to stop conflating the two, because they are not necessarily the same thing.
>> By creating a complex key (the dtime with something else) ... The index may be unnecessarily complex and lead to worse performance. <<
I think that shows a fundamental misunderstanding of how SQL uses indexes, particularly clustered indexes. I've got clustering keys that are 5 columns wide that work beautifully.
Clustered indexes do a great job for range searches, which fits dtime perfectly here. If a dtime range is (almost) always specified, that should be the clustering key. If rows insert rapidly enough that there could be a very large number of duplicate dtimes, then you can add an identity to the key so that you can explicitly specify it as "unique", removing SQL from having to generate a "uniquifier" for the dup key values.
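Assuming the table has an identity column (called id here purely for illustration), that would look like:

```sql
-- Appending the identity column lets the index be declared UNIQUE,
-- so SQL Server never has to generate its own hidden uniquifier for
-- rows that share the same dtime.
CREATE UNIQUE CLUSTERED INDEX IX_clock_dtime_id
    ON [clock] (dtime, id);
```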
>> WHERE c1.dtime > @timeLocal ,
it is 10+ slower than
WHERE c1.dtime > GETDATE() <<
That seems much more likely if dtime is only a nonclustered index. In that case, it's probably worth forcing a recompile for every query, as otherwise you could get a bad query plan when the lookup value changes.
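That per-statement recompile can be requested with a query hint; a minimal sketch against the earlier filter:

```sql
DECLARE @timeLocal datetime = GETDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime
FROM [clock] c1
WHERE c1.dtime > @timeLocal
OPTION (RECOMPILE);  -- plan is compiled using @timeLocal's actual value,
                     -- rather than a generic estimate for an unknown variable
```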