Solved

sql query to get user timesheet events and calculate time between events

Posted on 2014-12-02
348 Views
Last Modified: 2014-12-07
Need assistance in speeding up performance. This part of the procedure is taking 25+ seconds to run, returning 25k+ rows.


SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
       ISNULL((SELECT MIN(c2.dtime)
               FROM [clock] c2
               WHERE c2.dtime > c1.dtime
                 AND c2.name = c1.name), GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
Question by:rlarian
15 Comments
 
LVL 7

Assisted Solution

by:Phil Davidson
Phil Davidson earned 167 total points
ID: 40477890
#1  Make sure that there is an index on name. Ideally you would have a covering index that includes the other columns. This takes up more space, and keeping its statistics updated adds some load on the database engine, but your query will be faster. You may want two indexes stored in different locations on the disk: one on name and dtime (perfect for the aggregate subquery), and another on name that includes all the columns used in the outer query.
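As a sketch of #1, the two indexes might look something like this (the index names are made up, and the included columns assume the query shown in the question):

```sql
-- Narrow index for the aggregate subquery (finds MIN(dtime) per name):
CREATE NONCLUSTERED INDEX IX_clock_name_dtime
    ON [clock] (name, dtime);

-- Covering index for the outer query: same key, but INCLUDEs
-- every other column the query references.
CREATE NONCLUSTERED INDEX IX_clock_name_dtime_covering
    ON [clock] (name, dtime)
    INCLUDE (event, reason);
```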

#2  Make sure statistics get updated regularly.

#3
The subquery uses an aggregate: the SELECT MIN is fairly expensive. You may want to rewrite it to stage the per-name minimums in a temporary table inside a stored procedure; if disk space isn't an issue, this may be advisable. You may also want to look into disk contention as the culprit. If RAM is plentiful and disk contention may be an issue, consider a table variable instead of a temporary table, since the temporary table takes up more space. The ISNULL function has to be evaluated for every row after this expensive subquery is run. This is probably the pain point.

#4  
Is the server's RAM always above 95% utilization?

#5  Are you using datetime2?  It is the preferred date/time data type as of SQL Server 2008.

#6
Do you have excessive logging enabled?  CDC and other types of logging may be unnecessary.

#7
I would try to rewrite the WHERE clause to avoid the nested casts. Can the comparison against c1.dtime be done against a value computed once, before the query runs?
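A sketch of #7: compute the day boundary once, outside the query (the variable name is made up; note that using a variable can itself change the query plan, which the thread discusses further down):

```sql
DECLARE @midnight datetime = CAST(GETDATE() AS date);  -- implicit widening from date to datetime

SELECT c1.name, c1.event, c1.reason, c1.dtime
FROM [clock] c1
WHERE c1.dtime > @midnight;
```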
 
LVL 15

Expert Comment

by:danrosenthal
ID: 40478067
Give this a try:

WITH clockData (name, event, reason, dtime, rowNum)
AS (
	SELECT name, event, reason, dtime 
	,ROW_NUMBER() OVER(PARTITION BY name ORDER BY dtime) AS rowNum
	FROM [clock] 
	WHERE dtime > CAST(GETDATE() AS date)
)
SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime, ISNULL(c2.dtime,GETUTCDATE()) AS endTime
FROM clockData c1
LEFT JOIN clockData c2
ON c1.name = c2.name 
AND c2.rowNum = c1.rowNum + 1

 
LVL 48

Accepted Solution

by:
Vitor Montalvão earned 167 total points
ID: 40478376
It would help to know the table's schema (indexes included).
Anyway, check if this code runs faster:
SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime),GETUTCDATE()) as endTime
FROM [clock] c1 
	INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime

 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40478904
For these specific queries, you'd get best performance by:

1) clustering the clock table on ( dtime )
2) adding a nonclustered index on ( name, dtime )
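If you follow that suggestion, the DDL would be roughly as below (index names invented; note that building a clustered index rewrites the whole table, so do it in a maintenance window):

```sql
CREATE CLUSTERED INDEX CIX_clock_dtime
    ON [clock] (dtime);        -- non-unique is fine; great for dtime range scans

CREATE NONCLUSTERED INDEX IX_clock_name_dtime
    ON [clock] (name, dtime);  -- serves the per-name MIN(dtime) lookup
```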
 
LVL 7

Expert Comment

by:Phil Davidson
ID: 40480132
Rebuilding the table is an interesting idea. I'm not sure a primary key index could be put on a column such as dtime; time values are effectively nondeterministic, and I don't know that they would be unique.

You could look at the actual execution plan. If you are able to post it (company policy may prevent that), it may help us. To add to my #1 item: if you really want to be proactive, you could place the nonclustered index on a separate hard disk with a separate disk controller. This would avoid disk contention.

I wonder if you couldn't rewrite this to use a variable, like

DECLARE @rectangle datetime2 = GETUTCDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime, ISNULL(MIN(c2.dtime), @rectangle) as endTime
FROM [clock] c1 
	INNER JOIN [clock] c2 ON c2.name = c1.name
WHERE c2.dtime > c1.dtime AND c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
GROUP BY c1.name, c1.event, c1.reason, c1.dtime



Question to Vitor: Is there a reason to use the date data type as opposed to datetime2?  I think datetime2 has more precision than date, at the cost of a little more storage in the table.

This way GETUTCDATE is only called once. Relying on a variable in memory instead of calling the function each time may speed it up. I like the idea of using an INNER JOIN (which is faster). Hopefully the GROUP BY clause doesn't degrade performance. Ideally the server isn't memory constrained; if it is, tempdb should be on its own storage device with a fast disk (e.g., 15k RPM, or a good-quality SAN). When RAM is fully utilized, tempdb can be written to more frequently than it should be.
 
LVL 48

Expert Comment

by:Vitor Montalvão
ID: 40480220
Question to Vitor: Is there a reason to use the date data type as opposed to datetime2?  I think datetime2 has more precision than date, at the cost of a little more storage in the table.
You'd need to ask the author that. I just rewrote the query to avoid the subquery and kept everything else as it was.
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40480946
>> I'm not sure if a primary key index could be put on a value such as dtime. ... I don't know if it would be unique. <<

No, not the PK, the clustered index.  No, it certainly wouldn't be unique, but it could very well still be the best clustered index, especially if you (almost) always specify a dtime range when you process the table.  A narrow, unique clustering key is a general rule and should not be blindly followed, no matter what some people claim.
 
LVL 15

Expert Comment

by:danrosenthal
ID: 40481344
Indexes should definitely be added or modified if they are not optimal.
Have you had a chance to try the query I posted? (comment #2)
 
LVL 4

Author Comment

by:rlarian
ID: 40481545
I'm getting much better speeds now. The one part that is not working is the current event: since the WHERE is looking for c2.dtime > c1.dtime and the next event hasn't started yet, there is no endTime.

I am not getting the current event. For example, someone clocked out for lunch but has not clocked back in; the endTime should be the time the next event starts, or else the current time:

 ISNULL(MIN(c2.dtime),GETUTCDATE()) as endTime
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 166 total points
ID: 40481566
Your original query should handle that, assuming you've got the proper indexes:
1) clustering the clock table on ( dtime ) (or, if there are dozens+ of dups for one dtime and there is an identity column, on ( dtime, $IDENTITY ))
2) a nonclustered index on ( name, dtime )

For this logic, you'll want to use a subquery and not a join or CTE.  You can change it to a CROSS APPLY if you like, but the results should be the same.


SELECT c1.name, c1.event, c1.reason, c1.dtime as startTime,
       ISNULL((SELECT MIN(c2.dtime)
               FROM [clock] c2
               WHERE c2.dtime > c1.dtime
                 AND c2.name = c1.name), GETUTCDATE()) as endTime
FROM [clock] c1
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime)
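For reference, the CROSS APPLY version mentioned above might look like the sketch below. Since MIN() always returns exactly one row (possibly NULL), CROSS APPLY keeps every c1 row, so the ISNULL fallback to GETUTCDATE() still fires for the current event:

```sql
SELECT c1.name, c1.event, c1.reason, c1.dtime AS startTime,
       ISNULL(nxt.nextTime, GETUTCDATE()) AS endTime
FROM [clock] c1
CROSS APPLY (
    -- earliest later event for the same name, NULL if none yet
    SELECT MIN(c2.dtime) AS nextTime
    FROM [clock] c2
    WHERE c2.name = c1.name
      AND c2.dtime > c1.dtime
) nxt
WHERE c1.dtime > CAST(CAST(GETDATE() AS date) AS datetime);
```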
 
LVL 7

Expert Comment

by:Phil Davidson
ID: 40482257
clustering the clock table on ( dtime ) (or if there a dozens+ of dups for one dtime, then ( dtime, $IDENTITY ), if there is an identity column

In my opinion, this may or may not help.  By creating a composite key (dtime with something else), you may get a deterministic composite key on which you can implement a primary key constraint.  That would then be the clustered index (the table itself).  However, while the composite key may potentially cover the query, there is a downside: the index may be unnecessarily complex and lead to worse performance.  The actual execution plan may show a greater cost when the clustered index is used.
 
LVL 48

Expert Comment

by:Vitor Montalvão
ID: 40482371
I'm getting much better speeds now
rlarian, which query are you working with?
 
LVL 4

Author Closing Comment

by:rlarian
ID: 40483045
Thanks for your help. I used a combination of the different ideas to improve the speed.
One thing odd: if I use a variable in the WHERE

SET @timeLocal = GETDATE()
WHERE c1.dtime > @timeLocal

it is 10+ slower than

WHERE c1.dtime > GETDATE()
 
LVL 7

Expert Comment

by:Phil Davidson
ID: 40484241
I wanted to comment on this closed question.  In an earlier comment, I said this:

By creating a composite key (dtime with something else), you may get a deterministic composite key on which you can implement a primary key constraint.

I'm not sure a composite key containing a nondeterministic column could be deterministic; that may or may not be true.  If the combination of columns were nondeterministic, they couldn't form a composite key for the purposes of an index.  A table without a clustered index is a heap.  Just an FYI.
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40486178
It doesn't have to be a PK, just the clustering key.  You need to stop conflating the two, because they are not necessarily the same thing.

>> By creating a complex key (the dtime with something else) ... The index may be unnecessarily complex and lead to worse performance. <<
I think that shows a fundamental misunderstanding of how SQL Server uses indexes, particularly clustered indexes.  I've got clustering keys that are 5 columns wide that work beautifully.

Clustered indexes do a great job for range searches, which fits dtime perfectly here.  If a dtime range is (almost) always specified, that should be the clustering key.  If rows insert rapidly enough that there could be a very large number of duplicate dtimes, then you can add an identity column to the key so that you can explicitly declare it "unique", saving SQL Server from having to generate a "uniquifier" for the duplicate key values.
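For example, assuming the table has an identity column (clock_id is a made-up name here), declaring the clustering key unique looks like:

```sql
-- Unique clustered index: SQL Server no longer has to append
-- a hidden 4-byte "uniquifier" to duplicate dtime values.
CREATE UNIQUE CLUSTERED INDEX CIX_clock_dtime_id
    ON [clock] (dtime, clock_id);
```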


>> WHERE c1.dtime > @timeLocal ,

 it is 10+ slower than

 WHERE c1.dtime > GETDATE()  <<

That seems much more likely if dtime is covered only by a nonclustered index.  In that case, it's probably worth forcing a recompile for every execution, as otherwise you could get a bad query plan when the lookup value changes.
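A minimal sketch of forcing that recompile on the author's simplified query shape:

```sql
DECLARE @timeLocal datetime = GETDATE();

SELECT c1.name, c1.event, c1.reason, c1.dtime
FROM [clock] c1
WHERE c1.dtime > @timeLocal
OPTION (RECOMPILE);  -- plan is built using the variable's actual value, as if it were a constant
```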
