Solved

SQL query to get user timesheet events and calculate the time between events

Posted on 2014-12-02
302 Views
Last Modified: 2014-12-07
Need assistance in speeding up performance: this part of the procedure takes 25+ seconds to run and returns 25k+ rows.


SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
       ISNULL((SELECT MIN(c2.dtime)
               FROM [clock] c2
               WHERE c2.dtime > c1.dtime
                 AND c2.name = c1.name), GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
Question by:rlarian
15 Comments
 
LVL 7

Assisted Solution

by:Phil Davidson
Phil Davidson earned 167 total points
#1  Make sure there is an index on name. Ideally you would have a covering index that includes the other columns; it takes more space, and index maintenance puts some load on the engine, but this query will be faster. You may even want two indexes, stored in different locations on disk: one on name and dtime (perfect for the aggregate subquery), and one on name that includes all the columns used in the rest of the query. (See the sketch at the end of this comment.)

#2  Make sure statistics get updated regularly.

#3
The subquery uses an aggregate: the SELECT MIN is fairly expensive. You may want to rewrite it to stage the results in a temporary table inside the procedure; if disk space isn't an issue, this may be advisable. You may also want to look into disk contention as the culprit. If RAM is plentiful and disk contention may be an issue, use a table variable instead of a temporary table, since the temporary table takes up more space. The ISNULL function has to be evaluated each time after this expensive subquery is run. This is probably the pain point.

#4  
Is the server's RAM always above 95% utilization?

#5  Are you using datetime2?  This is the best date/time data type in SQL Server 2008.

#6
Do you have excessive logging enabled?  CDC and other types of logging may be unnecessary.

#7
I would try to rewrite the WHERE clause to avoid the nested CASTs. Could the comparison against c1.dtime be done a different way?
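
As promised in #1, a rough sketch of the two indexes (the index names are made up, and the table and columns are assumed from the question):

CREATE NONCLUSTERED INDEX IX_clock_name_dtime
    ON [clock] (name, dtime);   -- serves the MIN(c2.dtime) subquery

CREATE NONCLUSTERED INDEX IX_clock_name_covering
    ON [clock] (name) INCLUDE (dtime, event, reason);   -- covers the outer query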
 
LVL 15

Expert Comment

by:danrosenthal
Give this a try:

WITH clockData (name, event, reason, dtime, rowNum)
AS (
	SELECT name, event, reason, dtime
	,ROW_NUMBER() OVER(PARTITION BY name ORDER BY dtime) AS rowNum
	FROM [clock]
	WHERE dtime > CAST(GETDATE() AS date)
)
SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime, ISNULL(c2.dtime, GETUTCDATE()) AS endTime
FROM clockData c1
LEFT JOIN clockData c2
	ON c1.name = c2.name
	AND c2.rowNum = c1.rowNum + 1
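
If your server is on SQL Server 2012 or later, LEAD() can produce the next event's time in a single pass, with no self-join at all (a sketch, using the same assumed [clock] columns):

SELECT name, event, reason, dtime AS startTime,
       -- LEAD() returns NULL for each name's last event, so ISNULL supplies the current time
       ISNULL(LEAD(dtime) OVER(PARTITION BY name ORDER BY dtime), GETUTCDATE()) AS endTime
FROM [clock]
WHERE dtime > CAST(GETDATE() AS date)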

 
LVL 45

Accepted Solution

by:Vitor Montalvão
Vitor Montalvão earned 167 total points
It would help to know the table's schema (indexes included).
Anyway, check if this code runs faster:
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime), GETUTCDATE()) as endTime
FROM [clock] c1
	INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime

 
LVL 69

Expert Comment

by:ScottPletcher
For these specific queries, you'd get the best performance by:

1) clustering the clock table on ( dtime )
2) adding a nonclustered index on ( name, dtime )
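
For example (a sketch; the index names are made up):

CREATE CLUSTERED INDEX CL_clock ON [clock] (dtime);
CREATE NONCLUSTERED INDEX IX_clock_name_dtime ON [clock] (name, dtime);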
 
LVL 7

Expert Comment

by:Phil Davidson
Rebuilding the table is an interesting idea. I'm not sure a primary key could be put on a value such as dtime: time values are thought of as nondeterministic, and I don't know if they would be unique.

You could look at the actual execution plan. If you are able to post it (company policy may not allow it), that may help us. To add to my #1 item: if you really want to be proactive, you could place the nonclustered index on a separate hard disk with a separate disk controller, which would avoid disk contention.

I wonder if you couldn't rewrite this to use a variable, like:

DECLARE @rectangle datetime2 = GETUTCDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime), @rectangle) as endTime
FROM [clock] c1
	INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime



Question to Vitor: Is there a reason to use the date data type as opposed to datetime2?  I think datetime2 has more precision and can use less storage space in the table than datetime.

This way GETUTCDATE() is only called once. Not calling the function for each row, relying instead on a variable in memory, may speed it up. I like the idea of using an INNER JOIN (which is faster). Hopefully the GROUP BY clause doesn't degrade performance. Ideally the server isn't memory constrained; if it is, TEMPDB should be on its own storage device with fast disks (e.g., 15k RPM, or a high-quality SAN). When RAM is fully utilized, TEMPDB can be written to more often than it should be.
 
LVL 45

Expert Comment

by:Vitor Montalvão
>> Question to Vitor: Is there a reason to use the date data type as opposed to datetime2?  I think datetime2 has more precision and can use less storage space in the table than datetime. <<
You'd need to ask the author that. I just rewrote the query to avoid the subquery and kept all the rest as it was.
 
LVL 69

Expert Comment

by:ScottPletcher
>> I'm not sure if a primary key index could be put on a value such as dtime. ... I don't know if it would be unique. <<

No, not the PK, the clustered index. It certainly wouldn't be unique, but it could very well still be the best clustered index, especially if you (almost) always specify a dtime range when you process the table. A narrow, unique clustering key is a general rule and should not be blindly followed, no matter what some people claim.

 
LVL 15

Expert Comment

by:danrosenthal
Indexes should definitely be added or modified if they are not optimal.
Have you had a chance to try the query I posted? (comment #2)
 
LVL 4

Author Comment

by:rlarian
I'm getting much better speeds now. The one part that is not working is the current event: since the WHERE looks for c2.dtime > c1.dtime and the next event hasn't been started yet, there is no endTime.

For example, I am not getting the current event for someone who clocked out for lunch but has not clocked back in. The endTime should be the time the next event starts, or else the current time:

 ISNULL(MIN(c2.dtime),GETUTCDATE()) as endTime
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 166 total points
Your original query should handle that, assuming you've got the proper indexes:
1) clustering the clock table on ( dtime ) (or, if there can be dozens+ of dups for one dtime, on ( dtime, $IDENTITY ) if there is an identity column)
2) a nonclustered index on ( name, dtime )

For this logic, you'll want to use a subquery and not a join or CTE. You can change it to a CROSS APPLY if you like, but the results should be the same.


SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
       ISNULL((SELECT MIN(c2.dtime)
               FROM [clock] c2
               WHERE c2.dtime > c1.dtime
                 AND c2.name = c1.name), GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() as date) as datetime)
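
For reference, the CROSS APPLY form would look something like this (a sketch; because the aggregate subquery always returns exactly one row, even an all-NULL one, no events are lost):

SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime,
       ISNULL(ca.next_dtime, GETUTCDATE()) AS endTime
FROM [clock] c1
CROSS APPLY (
    SELECT MIN(c2.dtime) AS next_dtime   -- next event for this name, if any
    FROM [clock] c2
    WHERE c2.name = c1.name
      AND c2.dtime > c1.dtime
) ca
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)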
 
LVL 7

Expert Comment

by:Phil Davidson
>> clustering the clock table on ( dtime ) (or, if there can be dozens+ of dups for one dtime, on ( dtime, $IDENTITY ) if there is an identity column) <<

In my opinion, this may or may not help. By creating a composite key (dtime plus something else), you may get a deterministic composite key on which you can implement a primary key constraint. That would then be the clustered index (the table itself). However, while the composite key may cover the query, there is a downside: the index may be unnecessarily complex and lead to worse performance. The actual execution plan may show a greater cost when the clustered index is used.
 
LVL 45

Expert Comment

by:Vitor Montalvão
>> I'm getting much better speeds now <<
rlarian, which query are you working with?
 
LVL 4

Author Closing Comment

by:rlarian
Thanks for your help. I used a combination of the different ideas to improve the speed.
One odd thing: if I use a variable in the WHERE

SET @timeLocal = GETDATE()
WHERE c1.dtime > @timeLocal

it is 10+ times slower than

WHERE c1.dtime > GETDATE()
 
LVL 7

Expert Comment

by:Phil Davidson
I wanted to comment on this closed question. In an earlier comment, I said this:

>> By creating a composite key (dtime plus something else), you may get a deterministic composite key on which you can implement a primary key constraint. <<

I'm not sure a composite key containing a nondeterministic column can itself be deterministic. That may or may not be true. If the combination of columns were nondeterministic, it couldn't form a composite key for the purposes of an index. A table without a clustered index is a heap. Just an FYI.
 
LVL 69

Expert Comment

by:ScottPletcher
It doesn't have to be a PK, just the clustering key. You need to stop conflating the two, because they are not necessarily the same thing.

>> By creating a composite key (dtime plus something else) ... The index may be unnecessarily complex and lead to worse performance. <<
I think this shows a fundamental misunderstanding of how SQL uses indexes, particularly clustered indexes. I've got clustering keys that are 5 columns wide that work beautifully.

Clustered indexes do a great job for range searches, which fits dtime perfectly here. If a dtime range is (almost) always specified, that should be the clustering key. If rows insert rapidly enough that there could be a very large number of duplicate dtimes, then you can add an identity column to the key so that you can explicitly declare it "unique", saving SQL from having to generate a "uniquifier" for the duplicate key values.
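
For example, if [clock] has an identity column, the clustering key can be declared unique explicitly (a sketch; "id" is a stand-in for whatever the identity column is actually named):

CREATE UNIQUE CLUSTERED INDEX CL_clock ON [clock] (dtime, id);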


>> WHERE c1.dtime > @timeLocal

it is 10+ times slower than

WHERE c1.dtime > GETDATE() <<

That seems much more likely if dtime is only a nonclustered index. In that case, it's probably worth forcing a recompile for every execution, as otherwise you could get a bad query plan when the lookup value changes.
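
A sketch of what that looks like (using rlarian's @timeLocal variable from the closing comment):

DECLARE @timeLocal datetime = GETDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime
FROM [clock] c1
WHERE c1.dtime > @timeLocal
OPTION (RECOMPILE);   -- compile a fresh plan for the current @timeLocal value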
