random lag time in SQL query

Hi, I have a customer that have a major problem that comes and go for months now.    When doing query, all is going well most of the time, the request takes less than a second to return results.    Problem is that from time to time the same query takes up to 10 seconds to return results, making the program very frustrating to use if not almost useless.   This can happen many times during a day (more than 20 times at the worst days) or not at all.   I'm using sql server 2012 running on a windows server 2012 essential.   Problem can occur even when running directly on the server, it also happens even if there is just one computer opened.

I was not using indexs (except primary keys) and I thought this might be the problem, I have created some and the problem stopped right away for a month and came back for no reason.   I have tried to use the ALTER INDEX... REBUILD command but this had no effect at all.

I have used up pretty much all solutions I had in mind, I don't know where to go from here.    If anyone has any idea what this is or any suggestions on tools I can use to monitor this behavior and detect the source, that would be greatly appreciated.
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

We deal with similar symptoms on our production environment except that we ha a lots of connections and requests at the same time. However,  all is good and nice and requests are answered fast but once i a while there is a pile up of processes and waiting time that generate significant log.

What we noticed on is that every time that happens we observed a sudden drop of PLE(Page Life Expectancy) from high values, in the thousands, to very low values to under 100 even. That means that the entire buffer cache(SQL memory) is refreshed with data from the hard drive which cause al lot if I/O activity and waiting time for everyone. We learnd that this could be a known issue for SQL 2014 SP1 and it needs some cumulative patches to be installed, mainly SP1 CU4. We are on our way to actually install the SP2 pack.

We don't know yet that this will solve the problem and in your case could be something else. In order to go deeper into this issue to find the root cause you will definetely need some monitoring in place. I recommend you to use what is mentioned by me in this question:


That will have you to pinpoit things much faster and also to see if the PLE could be the cause in your case too. Start with this monitoring tool:


It installes full but after 14 days you can still use if for free as a limited version, which is still very useful. I am using it for years now.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Another more simple tool to use, which i recommended in the same answer, is Adam Mechanic's sp_whoisactive stored procedure which will give you etails about active processes. You just need to execute that procedure while your lag happens, or put in place an automated detection and logging. Here are the details:


How to use it:

How to use it to log activity:

Getting locking info:
Diamond IT AdministratorSystems AdministratorCommented:
I am not a Database expert by any means but have enough experience with them to know that Zberteoc was mentioning I\O's which could very be your issue. How large is the database? Is the server a physical or Virtual. The way you described your environment it sounds like its a physical server. How fast are your hard drives? If your doing a-lot of read writes to the database and have slow enough hard drives, that can cause the periodic lag. It could be that most times the read writes are few enough that the drives are okay keeping up, but when you have those days where there are a-lot of read writes your drive(s) can't keep pace. Have you tracked the days that it happens? Is there a pattern to the days? (ie. always on the 3rd wednesday of the month or every other Thursday). If there is a pattern, it may help determine the root cause whether its hardware related or the database itself.

Although unlikely, another possible issue could be that it is network related, Have you looked at your network traffic at the time the lags happen? I would recommend wireshark if you do not have a monitoring tool in place, as a temporary monitoring tool. If you have anything "hinky" going on Wireshark will see it.
Protecting & Securing Your Critical Data

Considering 93 percent of companies file for bankruptcy within 12 months of a disaster that blocked access to their data for 10 days or more, planning for the worst is just smart business. Learn how Acronis Backup integrates security at every stage

Vitor MontalvãoMSSQL Senior EngineerCommented:
Problem is that from time to time the same query takes up to 10 seconds to return results
This sounds like a concurrent process issue. Did you check during these low performance executions if any blocking locks were occurring in the database?

I was not using indexs (except primary keys) and I thought this might be the problem, I have created some and the problem stopped right away for a month and came back for no reason.
Yes, indexing can boost the queries and avoid blocking locks. But when using indexes you need to maintain them so Reindex and Rebuild jobs are needed to run daily to keep it sharp. Do you have any maintenance plan for this?
Buropro-CitationAuthor Commented:
Hi James, the database is quite small, when doing a backup I see that the size is 12 mégabytes, this a a physical server on SATA hard drives (two disk that mirror each other).  

Since the problem is occuring even when working on the server Workstation, I thought that it was not network related.   There is two work stations that are using the software, there is some read and writes but the problem can occur even if there's only one person working on a saturday morning when there is not much going on.    The problem occurs in particular in a calendar, so they are clicking one day after the other, I would say a query is done each second in those times.

There is no pattern on which day are worst, it happens pretty much randomly any day of the week or week-end.    Some time I can go dome days without any lag, I even had a month without lag but recently I would say this is happening 4 to 7 days a week.
Buropro-CitationAuthor Commented:
Hi Vitor, I don't know how to monitor locks, is there a tool in particular that can help me?   For the moment I log manually how much time the query takes so I know which days this happens and at which hour exactly, if I could after this compare with some logs from the sql server that would be wonderful.

There is no maintenance for now of the indexs, I had done a rebuild once for every indexes to see if it makes any difference but it did not.
Vitor MontalvãoMSSQL Senior EngineerCommented:
(..)the problem can occur even if there's only one person working on a saturday morning when there is not much going on.
This is also a good clue. Things that you should check:
There's any maintenance plan job running when the problem occurs? Check for full backups and/or reindexes jobs
Check the database properties for options like auto close and auto shrink. Both should be disabled
Vitor MontalvãoMSSQL Senior EngineerCommented:
I don't know how to monitor locks, is there a tool in particular that can help me?
You can use SSMS Activity Monitor or run the following query and check the BlkBy column:
EXEC sp_who2
If not empty means that the process is being blocked by the session number that appear in the column.

Please pay attention to my post and try to use those tools. I will assure you that will make your life easier. That Solarwinds tool could take you to the exact issue, especially if you don't have the necessary experience to do it by at hand tools from SQL server.
Scott PletcherSenior DBACommented:
Typically it's a blocking issue.  The best way to reduce blocking is with the best clustering index on each table (note that the best clus index not usually an identity column, no matter what some people say).

It's sometimes easy to determine the best clus index, sometimes not.  You have to look at the missing index stats and index usage stats (for a full review, you would also consider the operational stats, but you can skip that part since you're not familiar with it).

If you see large numbers of seeks on the same column, over and over, typically you would cluster on that column, although of course there are exceptions.
Buropro-CitationAuthor Commented:
Hi ScottPletcher, I have looked at missing index stats but it is empty, I don't know if it's normal?   I have used the command SELECT TOP 10 * FROM sys.dm_db_missing_index_group_stats
ORDER BY avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)DESC;

@Zberteoc, I'll be sure to take a look at the solarwind database monitor, I'll be installing this shortly on my customer's server.    I'll see next week the results.
Buropro-CitationAuthor Commented:
I have some news regarding this case, last case of lag was on oct 27th and there was no problem until november 16th, I had installed the monitoring software but since it was a 14 days trial the period was over I don't have much data.    

Fortunatly I was logging in a database the PLE value everytime I'm calling the SQL query that is most used, I have seen that it was going up each days but then it dropped from 233432 to  8 causing a lag at this precise moment.    

Since that moment, the PLE has begun to go up again, there has been 18 more lags, 3 on the same night that the PLE dropped and 15 the next day.    The PLE of the last lag was at 72217.    

Should I do something about the PLE, is it normal behavior that I'm experiencing?   If so, what should I look for next?
Yes, that is exactly what used to happen to us only more often, could be 2-3 times per week. The solution was to increase the memory with 50% and at the same time we upgraded from SQL Server 2012 Service Pack 1, SP1, to Service Pack 2, SP2.

If you are still with SP1 then I recommend you to upgrade to SP2.
Buropro-CitationAuthor Commented:
The error has not been seen since october even though nothing was changed.    I have installed a monitoring for a month and noticed a PLU drop sometimes when the problem occurs.    This is yet to be solved but I have to close the question since no new development have occured for months.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.